Lecture Notes in Artificial Intelligence Subseries of Lecture Notes in Computer Science Edited by J. G. Carbonell and J. Siekmann
Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1730
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo
Michael Gelfond Nicola Leone Gerald Pfeifer (Eds.)
Logic Programming and Nonmonotonic Reasoning 5th International Conference, LPNMR ’99 El Paso, Texas, USA, December 2-4, 1999 Proceedings
Series Editors Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Volume Editors Michael Gelfond University of Texas at El Paso Department of Computer Science El Paso, TX 79916, USA E-mail:
[email protected] Nicola Leone Gerald Pfeifer Technische Universit¨at Wien Institut f¨ur Informationssysteme 184/2 Favoritenstraße 9-11, A-1040 Vienna, Austria E-mail: {leone,pfeifer}@dbai.tuwien.ac.at
Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme Logic programming and nonmonotonic reasoning : 5th international conference ; proceedings / LPNMR ’99, El Paso, Texas, USA, December 2 - 4, 1999. Michael Gelfond . . . (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999 (Lecture notes in computer science ; Vol. 1730 : Lecture notes in artificial intelligence) ISBN 3-540-66749-0
CR Subject Classification (1998): I.2.3-4, F.4.1, D.1.6 ISBN 3-540-66749-0 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. © Springer-Verlag Berlin Heidelberg 1999 Printed in Germany Typesetting: Camera-ready by author SPIN: 10704004 06/3142 – 5 4 3 2 1 0
Printed on acid-free paper
Preface

This volume consists of the refereed papers presented at the Fifth International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR’99) held at El Paso, Texas, in December 1999. LPNMR’99 is the fifth in a series of international meetings on logic programming and nonmonotonic reasoning. Four previous meetings were held in Washington, U.S.A., in 1991, in Lisbon, Portugal, in 1993, in Lexington, U.S.A., in 1995, and in Dagstuhl, Germany, in 1997. The aim of the LPNMR conferences is to facilitate interactions between researchers interested in logic-based programming languages and database systems and researchers who work in the areas of knowledge representation and nonmonotonic reasoning. In addition to presentations of accepted papers the conference will feature talks by four invited speakers: Marco Cadoli, Vladimir Lifschitz, David McAllester, and Leora Morgenstern. Many people contributed to the success of the LPNMR’99 conference. Special thanks are due to the program committee and the additional reviewers for their careful evaluation of the submitted papers. We would also like to thank Gopal Gupta and Danny De Schreye for their efforts in coordinating the schedules of ICLP’99 and LPNMR’99, and Georg Gottlob, chair of the LPNMR steering committee, who provided continuous advice and support to the program chairs. The conference was financially supported by the University of Texas at El Paso, and Compulog Net provided support for a European invited speaker.
December 1999
Michael Gelfond Nicola Leone Gerald Pfeifer
Conference Organization
Program Co-Chairs
Michael Gelfond (University of Texas at El Paso, USA)
Nicola Leone (Vienna University of Technology, Austria)
Program Committee
Jose Julio Alferes (Universidade de Evora, Portugal)
Chitta Baral (University of Texas at El Paso, USA)
Nicole Bidoit (Université de Bordeaux 1, France)
Jürgen Dix (University of Koblenz, Germany)
Thomas Eiter (Vienna University of Technology, Austria)
Fangzhen Lin (The Hong Kong University of Science and Technology, China)
Jack Minker (University of Maryland, USA)
Anil Nerode (Cornell University, USA)
Ilkka Niemelä (Helsinki University of Technology, Finland)
Dino Pedreschi (University of Pisa, Italy)
Pasquale Rullo (University of Calabria, Rende, Italy)
Chiaki Sakama (Wakayama University, Japan)
V.S. Subrahmanian (University of Maryland, USA)
Francesca Toni (Imperial College, London, U.K.)
Miroslaw Truszczynski (University of Kentucky at Lexington, USA)
Hudson Turner (University of Minnesota at Duluth, USA)
Moshe Y. Vardi (Rice University, USA)
Jia-Huai You (University of Alberta, Canada)
Publicity Chair Gerald Pfeifer (Vienna University of Technology, Austria)
Additional Reviewers
Roberto Barbuti, Stefan Brass, Krysia Broda, Francesco Buccafurri, Carlos Damásio, Alexander Dekhtyar, Phan Minh Dung, Uwe Egly, Wolfgang Faber, Sergio Greco, Jeff Horty, Katsumi Inoue, Tomi Janhunen, Chris Johnson, Antonis Kakas, Hirofumi Katsuno, Vladimir Lifschitz, Jorge Lobo, Thomas Lukasiewicz, Sofian Maabout, Giuseppe Manco, Victor Marek, Cristinel Mateis, Yuji Matsumoto, Iara Mora, Mirco Nanni, Luigi Palopoli, Luis Moniz Pereira, Gerald Pfeifer, Inna Pivkina, Salvatore Ruggieri, Fariba Sadri, Francesco Scarcello, Hirohisa Seki, Dietmar Seipel, Patrik Simons, Terry Swift, Hans Tompits, Kewen Wang, Ulrich Zukowski
Table of Contents

Contributed Papers

Fixed Parameter Complexity in AI and Nonmonotonic Reasoning . . . . . 1
G. Gottlob, F. Scarcello, M. Sideri

Classifying Semi-Normal Default Logic on the Basis of its Expressive Power . . . . . 19
T. Janhunen

Locally Determined Logic Programs . . . . . 34
D. Cenzer, J. B. Remmel, A. Vanderbilt

Annotated Revision Programs . . . . . 49
V. Marek, I. Pivkina, M. Truszczyński

Belief, Knowledge, Revisions, and a Semantics of Non-Monotonic Reasoning . . . . . 63
J. Sefranek

An Argumentation Framework for Reasoning about Actions and Changes . . . . . 78
A. Kakas, R. Miller, F. Toni

Representing Transition Systems by Logic Programs . . . . . 92
V. Lifschitz, H. Turner

Transformations of Logic Programs Related to Causality and Planning . . . . . 107
E. Erdem, V. Lifschitz

From Causal Theories to Logic Programs (Sometimes) . . . . . 117
F. Lin, K. Wang

Monotone Expansion of Updates in Logical Databases . . . . . 132
M. Dekhtyar, A. Dikovsky, S. Dudakov, N. Spyratos

Updating Extended Logic Programs through Abduction . . . . . 147
C. Sakama, K. Inoue

LUPS – A Language for Updating Logic Programs . . . . . 162
J. J. Alferes, L. M. Pereira, H. Przymusinska, T. Przymusinski
Pushing Goal Derivation in DLP Computations . . . . . 177
W. Faber, N. Leone, G. Pfeifer

Linear Tabulated Resolution for Well Founded Semantics . . . . . 192
Y. Shen, L. Yuan, J. You, N. Zhou

A Case Study in Using Preference Logic Grammars for Knowledge Representation . . . . . 206
B. Cui, T. Swift, D. S. Warren

Minimal Founded Semantics for Disjunctive Logic Programming . . . . . 221
S. Greco

On the Role of Negation in Choice Logic Programs . . . . . 236
M. De Vos, D. Vermeir

Approximating Reiter’s Default Logic . . . . . 247
T. Linke, T. Schaub

Coherent Well-founded Annotated Logic Programs . . . . . 262
C. V. Damásio, L. M. Pereira, T. Swift

Many-Valued Disjunctive Logic Programs with Probabilistic Semantics . . . . . 277
T. Lukasiewicz

Extending Disjunctive Logic Programming by T-norms . . . . . 290
C. Mateis

Extending the Stable Model Semantics with More Expressive Rules . . . . . 305
P. Simons

Stable Model Semantics for Weight Constraint Rules . . . . . 317
I. Niemelä, P. Simons, T. Soininen

Towards First-Order Nonmonotonic Reasoning . . . . . 332
R. Rosati

Comparison of Sceptical NAF-Free Logic Programming Approaches . . . . . 347
G. Antoniou, M.J. Maher, D. Billington, G. Governatori
Characterizations of Classes of Programs by Three-Valued Operators . . . . . 357
P. Hitzler, A. K. Seda

Invited Talks

Using LPNMR for Problem Specification and Code Generation (Abstract) . . . . . 372
M. Cadoli

Answer Set Planning (Abstract) . . . . . 373
V. Lifschitz

World-Modeling vs. World-Axiomatizing . . . . . 375
D. McAllester

Practical Nonmonotonic Reasoning: Extended Inheritance Techniques to Solve Real-World Problems . . . . . 389
L. Morgenstern

Author Index . . . . . 391
Fixed-Parameter Complexity in AI and Nonmonotonic Reasoning

Georg Gottlob¹, Francesco Scarcello¹, and Martha Sideri²

¹ Institut für Informationssysteme, Technische Universität Wien, A-1040 Wien, Paniglgasse 16, Austria
{gottlob,scarcell}@dbai.tuwien.ac.at
² Department of Computer Science, Athens University of Economics and Business, Athens, Greece
[email protected]
Abstract We study the fixed-parameter complexity of various problems in AI and nonmonotonic reasoning. We show that a number of relevant parameterized problems in these areas are fixed-parameter tractable. Among these problems are constraint satisfaction problems with bounded treewidth and fixed domain, restricted satisfiability problems, propositional logic programming under the stable model semantics where the parameter is the dimension of a feedback vertex set of the program’s dependency graph, and circumscriptive inference from a positive k-CNF restricted to models of bounded size. We also show that circumscriptive inference from a general propositional theory, when the attention is restricted to models of bounded size, is fixed-parameter intractable and is actually complete for a novel fixed-parameter complexity class. Keywords: Complexity, Fixed-parameter Tractability, Nonmonotonic Reasoning, Constraint Satisfaction, Prime Implicants, Logic Programming, Stable Models, Circumscription.
1 Introduction
Many hard decision or computation problems are known to become tractable if a problem parameter is fixed or bounded by a fixed value. For example, the well-known NP-hard problems of checking whether a graph has a vertex cover of size at most k, and of computing such a vertex cover if so, become tractable if the integer k is a fixed constant, rather than being part of the problem instance. Similarly, the NP-complete problem of finding a clique of size k in a graph becomes tractable for every fixed k. Note, however, that there is an important difference between these problems:
– The vertex cover problem is solvable in linear time for every fixed constant k. Thus the problem is not only polynomially solvable for each fixed k, but, moreover, can be solved in time bounded by a polynomial p_k whose degree does not depend on k.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 1–18, 1999. © Springer-Verlag Berlin Heidelberg 1999
– The best known algorithms for finding a clique of size k in a graph are all exponential in k (typically, they require runtime n^{Ω(k/2)}). Thus, for fixed k, the problem is solvable in time bounded by a polynomial p_k whose degree depends crucially on k.

Problems of the first type are called fixed-parameter (short: fp) tractable, while problems of the second type can be classified as fixed-parameter intractable [9]. It is clear that fixed-parameter tractability is a highly desirable feature. The theory of parameterized complexity, mainly developed by Downey and Fellows [9,7,6], deals with general techniques for proving that certain problems are fp-tractable, and with the classification of fp-intractable problems into a hierarchy of fixed-parameter complexity classes.

In this paper we study the fixed-parameter complexity of a number of relevant AI and NMR problems. In particular, we show that the following problems are all fixed-parameter tractable (the parameters to be fixed are added in square brackets after the problem description):
– Constraint Satisfiability and computation of the solution to a constraint satisfaction problem (CSP) [fixed parameters: (cardinality of) domain and treewidth of constraint scopes].
– Satisfiability of CNF [fixed parameter: treewidth of variable connection graph].
– Prime Implicants of a q-CNF [fixed parameters: maximal number q of literals per clause and size of the prime implicants to be computed].
– Propositional logic programming [fixed parameter: size of a minimal feedback vertex set of the atom dependency graph].
– Circumscriptive inference from a positive q-CNF [fixed parameters: maximal number q of literals per clause and size of the models to be considered].

We believe that these results are useful both for a better understanding of the computational nature of the above problems and for the development of smart parameterized algorithms for the solution of these and related problems.
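The contrast drawn above between vertex cover and clique can be made concrete. The following is the textbook bounded-search-tree algorithm for vertex cover (an illustrative sketch, not taken from the paper): any cover must contain an endpoint of every edge, so branching on the two endpoints of an uncovered edge gives a search tree of depth at most k, hence O(2^k · m) total work for m edges.

```python
def vertex_cover(edges, k):
    """Return a vertex cover of size <= k, or None if none exists.

    Bounded search tree: pick any uncovered edge (u, v); every cover
    must contain u or v, so branch on both choices.  The recursion
    depth is at most k and the branching factor is 2, so the runtime
    is O(2^k * m) -- the degree of the polynomial part does not
    depend on k.
    """
    if not edges:
        return set()          # nothing left to cover
    if k == 0:
        return None           # edges remain but budget exhausted
    u, v = edges[0]
    for w in (u, v):
        # remove all edges covered by w and recurse with budget k - 1
        rest = [e for e in edges if w not in e]
        sub = vertex_cover(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None
```

By contrast, the obvious algorithm for k-clique tries all C(n, k) vertex subsets, and no known algorithm removes k from the exponent.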
We also study the complexity of circumscriptive inference from a general propositional theory when the attention is restricted to models of size k. This problem, referred to as small model circumscription (SMC), is easily seen to be fixed-parameter intractable, but it does not seem to be complete for any of the fp-complexity classes defined by Downey and Fellows. We introduce the new class Σ2W[SAT] as a miniaturized version of the class Σ2P of the polynomial hierarchy, and prove that SMC is complete for Σ2W[SAT]. This seems to be natural, given that the nonparameterized problem corresponding to SMC is Σ2P-complete [10]. Note, however, that completeness results for parameterized classes are more difficult to obtain. In fact, for obtaining our completeness result we had to resort to the general version of circumscription (called P;Z-circumscription) where the propositional letters of the theory to be circumscribed are partitioned into two subsets P and Z, and only the atoms in P are minimized, while those in Z can float. The restricted problem, where P consists of all atoms and Z is empty, does not seem to be complete for Σ2W[SAT], even though its non-parameterized version is still Σ2P-complete [10].
The paper is organized as follows. In Section 2 we state the relevant formal definitions related to fixed parameter complexity. In Section 3 we deal with constraint satisfaction problems. In Section 4 we study fp-tractable satisfiability problems. In Section 5 we deal with logic programming. Finally, in Section 6 we study the problem of circumscriptive inference with small models.
2 Parameterized Complexity
Parameterized complexity [9] deals with parameterized problems, i.e., problems with an associated parameter. Any instance S of a parameterized problem P can be regarded as consisting of two parts: the “regular” instance I_S, which is usually the input instance of the classical (non-parameterized) version of P; and the associated parameter k_S, usually of integer type.

Definition 1. A parameterized problem P is fixed-parameter tractable if there is an algorithm that correctly decides, for input S, whether S is a yes instance of P in time f(k_S) · O(n^c), where n is the size of I_S (|I_S| = n), k_S is the parameter, c is a constant, and f is an arbitrary function.

A notion of problem reduction proper to the theory of parameterized complexity has been defined.

Definition 2. A parameterized problem P fp-reduces to a parameterized problem P′ by an fp-reduction if there exist two functions f, f′ and a constant c such that we can associate to any instance S of P an instance S′ of P′ satisfying the following conditions: (i) the parameter k_S′ of S′ is f(k_S); (ii) the regular instance I_S′ is computable from S in time f′(k_S)|I_S|^c; (iii) S is a yes instance of P if and only if S′ is a yes instance of P′.

A parameterized class of problems C is a (possibly infinite) set of parameterized problems. A problem P is C-complete if P ∈ C and every problem P′ ∈ C is fp-reducible to P. A hierarchy of fp-intractable classes, called the W-hierarchy, has been defined to properly characterize the degree of fp-intractability associated with different parameterized problems. The relationship among the classes of problems belonging to the W-hierarchy is given by the following chain of inclusions: W[1] ⊆ W[2] ⊆ . . . ⊆ W[SAT] ⊆ W[P], where, for each natural number t > 0, the definition of the class W[t] is based on the degree t of the complexity of a suitable family of Boolean circuits.
The most prominent W [1]-complete problem is the parameterized version of clique, where the parameter is the clique size. W [1] can be characterized as the class of parameterized problems that fp-reduce to parameterized CLIQUE. Similarly, W [2] can be characterized as the class of parameterized problems that fp-reduce to parameterized Hitting Set, where the parameter is the size of the hitting set.
A k-truth value assignment for a formula E is a truth value assignment which assigns true to exactly k propositional variables of E. Consider the following problem Parameterized SAT:

Instance: A Boolean formula E.
Parameter: k.
Question: Does there exist a k-truth value assignment satisfying E?

W[SAT] is the class of parameterized problems that fp-reduce to parameterized SAT. W[SAT] is contained in W[P], where Boolean circuits are used instead of formulae. It is not known whether any of the above inclusions is proper; however, all the classes are conjectured to be distinct. The AW-hierarchy has been defined in order to deal with some problems that do not fit the W-classes [9]. The AW-hierarchy represents in a sense the parameterized counterpart of PSPACE in the classical complexity setting. In this paper we are mainly interested in the class AW[SAT]. Consider the following problem Parameterized QBFSAT:

Instance: A quantified Boolean formula Φ = Q1^{k1} x1 Q2^{k2} x2 · · · Qn^{kn} xn E.
Parameter: k = ⟨k1, k2, . . . , kn⟩.
Question: Is Φ valid?

(Here, ∃^{ki} x denotes the choice of some ki-truth value assignment for the variables x, and ∀^{kj} x denotes all choices of kj-truth value assignments for the variables x.) AW[SAT] is the class of parameterized problems that fp-reduce to parameterized QBFSAT.
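The naive algorithm for Parameterized SAT makes the source of hardness visible: enumerating all assignments that set exactly k variables to true costs C(n, k) = O(n^k) candidates, so the exponent grows with k. A small sketch (our own clause representation, with literals as (variable, polarity) pairs; not from the paper):

```python
from itertools import combinations

def has_k_assignment(clauses, variables, k):
    """Naively decide Parameterized SAT for a CNF formula.

    `clauses` is a list of clauses, each a list of (var, polarity)
    literals.  We try every assignment that sets exactly k variables
    to true: C(n, k) = O(n^k) candidates, so the degree of the
    polynomial depends on k -- the behaviour that W[SAT]-style
    classes capture.
    """
    for true_vars in combinations(variables, k):
        tv = set(true_vars)
        # literal (v, pol) is satisfied iff v's value equals pol
        if all(any((v in tv) == pol for v, pol in cl) for cl in clauses):
            return True
    return False
```

For example, (x1 ∨ x2) ∧ (¬x1 ∨ x3) has a 1-truth value assignment (set only x2 to true), which the function finds.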
3 Constraint Satisfaction Problems, Bounded Treewidth, and FP-Tractability
In this section we prove that constraint satisfaction problems of bounded treewidth over a fixed domain are fp-tractable. In order to get this result we need a number of definitions. In Section 3.1 we give a very general definition of CSPs; in Section 3.2 we define the treewidth of CSP problems and quote some recent results; in Section 3.3 we show the main tractability result.

3.1 Definition of CSPs
An instance of a constraint satisfaction problem (CSP) (also constraint network) is a triple I = (V ar, U, C), where V ar is a finite set of variables, U is a finite domain of values, and C = {C1 , C2 , . . . , Cq } is a finite set of constraints. Each constraint Ci is a pair (Si , ri ), where Si is a list of variables of length mi called the constraint scope, and ri is an mi -ary relation over U , called the constraint relation. (The tuples of ri indicate the allowed combinations of simultaneous values for the variables Si ). A solution to a CSP instance is a substitution ϑ : V ar −→ U , such that for each 1 ≤ i ≤ q, Si ϑ ∈ ri . The problem of deciding whether a CSP instance has any solution is called constraint satisfiability (CS). (This definition is taken almost verbatim from [17].)
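A direct reading of this definition yields a brute-force solver (an illustrative sketch, not part of the paper): enumerate all |U|^{|Var|} substitutions ϑ and check that Sϑ ∈ r for every constraint. This exponential blow-up in the number of variables is exactly what the treewidth-based results below avoid.

```python
from itertools import product

def solve_csp(variables, domain, constraints):
    """Brute-force CSP solver matching the definition above.

    A solution is a substitution theta: Var -> U such that for each
    constraint (S, r) the tuple S.theta belongs to r.  Constraints
    are (scope, relation) pairs, with scope a tuple of variable
    names and relation a set of value tuples.
    """
    for values in product(domain, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(tuple(theta[x] for x in scope) in rel
               for scope, rel in constraints):
            return theta
    return None
```

For instance, a single "inequality" constraint over {0, 1} encodes 2-coloring an edge: `solve_csp(["X", "Y"], [0, 1], [(("X", "Y"), {(0, 1), (1, 0)})])` returns an assignment with X ≠ Y.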
To any CSP instance I = (Var, U, C), we associate a hypergraph H(I) = (V, H), where V = Var, and H = {var(S) | C = (S, r) ∈ C}, where var(S) denotes the set of variables in the scope S of the constraint C. Let H(I) = (V, H) be the constraint hypergraph of a CSP instance I. The primal graph of I is a graph G(I) = (V, E), having the same set of variables (vertices) as H(I) and an edge connecting any pair of variables X, Y ∈ V such that {X, Y} ⊆ h for some h ∈ H.

3.2 Treewidth of CSPs

The treewidth of a graph is a measure of its degree of cyclicity.

Definition 3 ([20]). A tree decomposition of a graph G = (V, F) is a pair ⟨T, λ⟩, where T = (N, E) is a tree, and λ is a labeling function associating to each vertex p ∈ N a set of vertices λ(p) ⊆ V, such that the following conditions are satisfied:
1. for each vertex b of G, there is a p ∈ N such that b ∈ λ(p);
2. for each edge {b, d} ∈ F, there is a p ∈ N such that {b, d} ⊆ λ(p);
3. for each vertex b of G, the set {p ∈ N | b ∈ λ(p)} induces a (connected) subtree of T.

The width of the tree decomposition is max_{p∈N} |λ(p)| − 1. The treewidth of G is the minimum width over all its tree decompositions.

Bodlaender [3] has shown that, for each fixed k, there is a linear-time algorithm for checking whether a graph G has treewidth bounded by k and, if so, computing a tree decomposition of G having width at most k. Thus, the problem of computing a tree decomposition of a graph of width k is fp-tractable in the parameter k. The treewidth of a CSP instance I is the treewidth of its primal graph G(I). Accordingly, a tree decomposition of I is a tree decomposition of G(I).

3.3 FP-Tractable CSPs
Constraint Satisfaction is easily seen to be NP-complete. Moreover, the parameterized version, where the parameter is the total size of all constraint scopes, is W[1]-complete, and thus not fp-tractable. This follows from well-known results on conjunctive query evaluation [8,19], which is equivalent to constraint satisfaction (cf. [2,18,15]). Therefore, bounded-treewidth CSP is also fp-intractable and W[1]-hard. Indeed, the CSPs having total size of the constraint scopes ≤ k form a subclass of the CSPs having treewidth ≤ k. Note that, for each fixed k, CSPs of width ≤ k can be evaluated in time O(n^k log n) [16]. In this section we show, however, that if we additionally fix the size of the domain U as a parameter, then bounded-treewidth CSP is fixed-parameter tractable. It is worthwhile noting that the general CSP problem remains NP-complete even for a constant domain U. (See, e.g., the 3-SAT problem discussed below.)
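The three conditions of Definition 3 are easy to verify mechanically. The following checker is an illustrative sketch (the representation of bags and tree edges is our own, not the paper's): `bags` maps each tree node p to the set λ(p), and `tree_edges` is assumed to form a tree over the bag keys.

```python
def is_tree_decomposition(graph_edges, vertices, tree_edges, bags):
    """Check the three tree-decomposition conditions of Definition 3."""
    # 1. every vertex of G occurs in some bag
    if not all(any(b in bag for bag in bags.values()) for b in vertices):
        return False
    # 2. every edge of G is contained in some bag
    if not all(any({b, d} <= bag for bag in bags.values())
               for b, d in graph_edges):
        return False
    # 3. for each vertex b, the tree nodes whose bag contains b
    #    induce a connected subtree: flood-fill within that node set
    for b in vertices:
        nodes = {p for p, bag in bags.items() if b in bag}
        start = next(iter(nodes))        # nonempty by condition 1
        seen, stack = {start}, [start]
        while stack:
            p = stack.pop()
            for q, r in tree_edges:
                for a, c in ((q, r), (r, q)):
                    if a == p and c in nodes and c not in seen:
                        seen.add(c)
                        stack.append(c)
        if seen != nodes:
            return False
    return True
```

The width of a decomposition is then simply `max(len(bag) for bag in bags.values()) - 1`, matching the definition above.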
Theorem 1. Constraint Satisfaction with parameters treewidth k and universe size u = |U| is fp-tractable. So is the problem of computing a solution of a CSP problem with parameters k and u.

Proof. (Sketch.) Let I = (Var, U, C) be a CSP instance having treewidth k and |U| = u. We exhibit an fp-transformation from I to an equivalent instance I′ = (Var, U, C′). We assume w.l.o.g. that no constraint scope S in I contains multiple occurrences of variables. (In fact, such occurrences can easily be removed by a simple preprocessing of the input instance.) Note that, from the bound k on the treewidth, it follows that each constraint scope contains at most k variables, and thus the constraint relations have arity at most k. Let ⟨T = (V, E), λ⟩ be a k-width tree decomposition of G(I) such that |V| ≤ c|G(I)|, for a fixed predetermined constant c. (This is always possible because Bodlaender’s algorithm runs in linear time.) For each vertex p ∈ V, I′ has a constraint C_p = (S, r) ∈ C′, where the scope S is a list containing the variables belonging to λ(p), and r is the associated relation, computed as described below. The relations associated to the constraints of I′ are computed through the following two steps:

1. For each constraint C′ = (S′, r′) ∈ C′, initialize r′ as U^{|var(S′)|}, i.e., the |var(S′)|-fold cartesian product of the domain U with itself.
2. For each constraint C = (S, r) ∈ C, let C′ = (S′, r′) ∈ C′ be any constraint of I′ such that var(S) ⊆ var(S′). Such a constraint must exist by definition of tree decomposition of the primal graph G(I). Modify r′ as follows: r′ := {t′ ∈ r′ | ∃ a substitution ϑ s.t. S′ϑ = t′ and Sϑ ∈ r}. (In database terms, r′ is semijoin-reduced by r.)

It is not hard to see that the instance I′ is equivalent to I, in that they have exactly the same set of solutions. Note that the size of I′ is ≤ |U|^k (c|G(I)|), and even computing I′ from I is feasible in linear time (for fixed k and u). Thus the reduction is actually an fp-reduction. The resulting instance I′ is an acyclic constraint satisfaction problem which is equivalent to an acyclic conjunctive query over a fixed database [15]. Checking whether such a query has a nonempty result and, in the positive case, computing a single tuple of the result, is feasible in linear time by Yannakakis’ well-known algorithm [24]. □

Note that, since CSP is equivalent to conjunctive query evaluation, the above result immediately gives us a corollary on the program complexity of conjunctive queries, i.e., the complexity of evaluating conjunctive queries over a fixed database [23]. The following result complements some recent results on fixed-parameter tractability of database problems by Papadimitriou and Yannakakis [19].

Corollary 1. The evaluation of Boolean conjunctive queries is fp-tractable if the parameters are the treewidth of the query and the size of the database universe. Moreover, evaluating a nonboolean conjunctive query is fp-tractable in the input and output size w.r.t. the treewidth of the query and the size of the database universe.
4 FP-Tractable Satisfiability Problems

4.1 Bounded-Width CNF Formulae
As an application of our general result on fp-tractable CSPs we show that a relevant satisfiability problem is also fp-tractable. The graph G(F) of a CNF formula F has as vertices the set of propositional variables occurring in F and has an edge {x, y} iff the propositional variables x and y occur together in a clause of F. The treewidth of F is defined to be the treewidth of the associated graph G(F).

Theorem 2. CNF Satisfiability with parameter treewidth k is fp-tractable. So is the problem of computing a model of a CNF formula with parameter k.

Proof. (Sketch.) We fp-transform a CNF formula F into a constraint satisfaction instance I(F) = (Var, U, C) defined as follows. Var contains a variable Xp for each propositional variable p occurring in F; U = {0, 1}; and for each clause D of F, I(F) contains a constraint (S, r) where the constraint scope S is the list containing all variables Xp such that p is a propositional variable occurring in D, and the constraint relation r ⊆ U^{|D|} consists of all tuples corresponding to truth value assignments satisfying D. It is obvious that every model of F corresponds to a solution of I(F) and vice versa. Thus, in particular, F is satisfiable if and only if I(F) is a positive CSP instance. Since G(F) is isomorphic to G(I(F)), both F and I(F) have the same treewidth. Moreover, any CNF formula F of treewidth k has clauses of cardinality at most k. Therefore, our reduction is feasible in time O(2^k |F|) and is thus an fp-reduction w.r.t. parameter k. By this fp-reduction, fp-tractability of CNF-SAT with the treewidth parameter follows from the fp-tractability of CSPs w.r.t. treewidth, as stated in Theorem 1. □

4.2 CNF with Short Prime Implicants
The problem of finding the prime implicants of a CNF formula is relevant to a large number of different areas, e.g., in diagnosis, knowledge compilation, and many other AI applications. Clearly, the set of the prime implicants of a CNF formula F can be viewed as a compact representation of the satisfying truth assignments for F . It is worthwhile noting that the restriction of Parameterized SAT to CNF formulae is fp-intractable. More precisely, deciding whether a q-CNF formula F has a k-truth value assignment is W [2]-complete [9]. (We recall that a k-truth value assignment assigns true to exactly k propositional variables.) Nevertheless, we identified a very natural parameterized version of satisfiability which is fp-tractable. We simply take as the parameter the length of the prime implicants of the Boolean formula. Given a q-CNF formula F , the Short Prime Implicants problem (SPI) is the problem of computing the (consistent) prime implicants of F having length ≤ k, with parameters k and q.
Theorem 3. SPI is fixed-parameter tractable.

Proof. (Sketch.) Let F be a q-CNF formula. W.l.o.g., assume that F does not contain tautological clauses. We generate a set IM_k(F) of implicants of F from which it is possible to compute the set of all prime implicants of F having length ≤ k. (This is very similar to the well-known procedure of generating vertex covers of bounded size, cf. [5,9].) Pick an arbitrary clause C of F. Clearly, each implicant I of F must contain at least one literal of C. We construct an edge-labeled tree t whose vertices are clauses in F as follows. The root of t is C. Each nonleaf vertex D has an edge labeled ℓ to a descendant, for each literal ℓ ∈ D. As the child at this edge, attach any clause E of F which does not intersect the set of all edge labels from the root to the current position. A branch is closed if such a clause does not exist or the length of the path is k. For each root-leaf branch β of the tree, let I(β) be the set containing the ≤ k literals labeling the edges of β. Check whether I(β) is a consistent implicant of F and add I(β) to the set IM_k(F) if so. It is easy to see that the size of the tree t is bounded by q^k and that for every prime implicant S of F having length ≤ k, S ⊆ I holds, for some implicant I ∈ IM_k(F). Moreover, note that there are at most q^k implicants in IM_k(F). For each implicant I ∈ IM_k(F), the set of all consistent prime implicants of F included in I can be easily obtained in time O(2^k |F|) from I. It follows that SPI is fp-tractable w.r.t. parameters q and k. □
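The search tree of this proof can be sketched as follows. This is an illustrative rephrasing (our own literal representation, as (variable, polarity) pairs): at each node we branch on the literals of some clause not yet hit, close a branch at depth k, and check consistency at the leaves, mirroring the construction of IM_k(F).

```python
def short_implicants(clauses, k):
    """Bounded-search-tree enumeration of implicants of length <= k.

    A term implies a (tautology-free) CNF iff it is consistent and
    contains a literal of every clause.  Branching on the <= q
    literals of an uncovered clause, with depth <= k, gives a tree
    of at most q^k leaves -- fp-tractable in q and k.
    """
    results = set()

    def consistent(lits):
        # no variable occurs with both polarities
        return not any((v, not p) in lits for v, p in lits)

    def grow(lits):
        open_clauses = [c for c in clauses
                        if not any(lit in lits for lit in c)]
        if not open_clauses:
            if consistent(lits):
                results.add(frozenset(lits))
            return
        if len(lits) == k:
            return                      # branch closed at depth k
        for lit in open_clauses[0]:     # branch on each literal
            grow(lits | {lit})

    grow(frozenset())
    return results
```

As in the proof, the returned set may contain non-prime implicants; the prime implicants of length ≤ k can then be extracted from it by checking subsets, at cost O(2^k |F|) per implicant.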
5 Logic Programs with Negation
Logic programming with negation under the stable model semantics [14] is a well-studied form of nonmonotonic reasoning. A literal L is either an atom A (called positive) or a negated atom ¬A (called negative). Literals A and ¬A are complementary; for any literal L, we denote by ¬.L its complementary literal, and for any set Lit of literals, ¬.Lit = {¬.L | L ∈ Lit}. A normal clause is a rule of the form

A ← L1, . . . , Lm    (m ≥ 0)    (1)
where A is an atom and each Li is a literal. A normal logic program is a finite set of normal clauses. A normal logic program P is stratified [1] if there is an assignment str(·) of integers 0, 1, . . . to the predicates p in P, such that for each clause r in P the following holds: if p is the predicate in the head of r and q the predicate in an Li from the body, then str(p) ≥ str(q) if Li is positive, and str(p) > str(q) if Li is negative. The reduct of a normal logic program P by a Herbrand interpretation I [14], denoted P^I, is obtained from P as follows: first remove every clause r with a negative literal L in the body such that ¬.L ∈ I, and then remove all negative literals from the remaining rules. An interpretation I of a normal logic program P is a stable model of P [14] if I is the least Herbrand model of P^I.
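The reduct and the stable-model test translate almost directly into code. A minimal sketch (our own representation, not the paper's: each rule is a triple of head atom, positive body atoms, and atoms appearing under negation):

```python
def reduct(program, interpretation):
    """Gelfond-Lifschitz reduct P^I of a normal program.

    Rules are (head, positive_atoms, negated_atoms) triples.  Drop
    every rule with a negated atom that is in I, then strip the
    negative literals from the remaining rules.
    """
    return [(head, pos) for head, pos, neg in program
            if not any(a in interpretation for a in neg)]

def least_model(positive_program):
    """Least Herbrand model of a negation-free program (naive
    fixpoint iteration of the immediate-consequence operator)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_program:
            if head not in model and all(a in model for a in pos):
                model.add(head)
                changed = True
    return model

def is_stable(program, interpretation):
    """I is a stable model of P iff I is the least model of P^I."""
    return least_model(reduct(program, interpretation)) == interpretation
```

For the classic program {p ← not q; q ← not p}, both {p} and {q} pass this test while ∅ and {p, q} fail, matching the observation below that a program may have several stable models.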
Fixed-Parameter Complexity in AI and Nonmonotonic Reasoning
9
In general, a normal logic program P may have zero, one, or multiple (even exponentially many) stable models. Denote by stabmods(P) the set of stable models of P. It is well known that every stratified logic program has a unique stable model, which can be computed in linear time. The following problems are the main decision and search problems in the context of logic programming.
Main logic programming problems. Let P be a logic program.
1. Consistency: Determine whether P admits a stable model.
2. Brave Reasoning: Check whether a given literal is true in some stable model of P.
3. Cautious Reasoning: Check whether a literal is true in every stable model of P.
4. SM Computation: Compute an arbitrary stable model of P.
5. SM Enumeration: Compute the set of all stable models of P.
For a normal logic program P, the dependency graph G(P) is a labeled directed graph (V, A), where V is the set of atoms occurring in P and A is a set of edges such that (p, q) ∈ A iff there exists a rule r ∈ P having p in its head and q in its body. Moreover, if q appears negatively in the body, then the edge (p, q) is labeled with the symbol ¬. The undirected dependency graph G*(P) of P is the undirected version of G(P). A feedback vertex set of an undirected (directed) graph G is a subset S of the vertices of G such that every cycle (directed cycle, respectively) contains at least one vertex of S. Clearly, if a feedback vertex set is removed from G, then the resulting graph is acyclic. The feedback width of G is the minimum size over its feedback vertex sets. It was shown by Downey and Fellows [9,5] that determining whether an undirected graph has feedback width ≤ k and, in the positive case, finding a feedback vertex set of size ≤ k is fp-tractable w.r.t. the parameter k. Let P be a logic program defined over a set U of propositional atoms. A partial truth value assignment (p.t.a.) for P is a truth value assignment to a subset U′ of U. If τ is a p.t.a.
for P, denote by P[τ] the program obtained from P as follows:
– eliminate all rules whose body contains a literal contradicting τ;
– eliminate from every rule body all literals that are made true by τ.
The following lemma is easy to verify.
Lemma 1. Let M be a stable model of some logic program P, and let τ be a p.t.a. consistent with M. Then M is a stable model of P[τ].
Theorem 4. The logic programming problems (1–5) listed above are all fp-tractable w.r.t. the feedback width of the dependency graph of the logic program.
Proof. (Sketch.) Given a logic program P whose graph G*(P) has feedback width k, compute in linear time (see [9]) a feedback vertex set S for G*(P) s.t. |S| = k. Consider the set T of all the 2^k partial truth value assignments to the atoms in S.
10
G. Gottlob, F. Scarcello, and M. Sideri
For each p.t.a. τ ∈ T, P[τ] is a stratified program whose unique stable model Mτ can be computed in linear time. For each τ ∈ T, compute Mτ and check whether Mτ ∈ stabmods(P), where stabmods(P) denotes the set of all stable models of P (this check, too, can be done in linear time if suitable data structures are used). Let Σ = {Mτ | Mτ ∈ stabmods(P)}. By definition of Σ, it suffices to note that every stable model M of P belongs to Σ. Indeed, let τ be the p.t.a. on S determined by M. By Lemma 1, it follows that M is a stable model of P[τ] and hence M ∈ Σ. Thus, P has at most 2^k stable models, whose computation is fp-tractable and actually feasible in linear time. Therefore, problem 5 above (SM Enumeration) is fp-tractable. The fp-tractability of all the other problems follows. ⊓⊔
It appears that an overwhelmingly large number of "natural" logic programs have very low feedback width; thus, the technique presented here seems to be very useful in practice. Note, however, that the technique does not apply to some important and rather obvious cases. In fact, the method does not take into account the direction and the labeling of the arcs in the dependency graph G(P). Hence, positive programs with large feedback width are not recognized as tractable, although they are trivially tractable. The same applies, for instance, to stratified programs having large feedback width, or to programs whose high feedback width is exclusively due to positive cycles. Unfortunately, it is not known whether computing feedback vertex sets of size k is fixed-parameter tractable for directed graphs [9]. Another observation leading to a possible improvement is the following. Call an atom p of a logic program P malignant if it lies on at least one simple cycle of G(P) containing a marked (=negated) edge. Call an atom benign if it is not malignant. It is easy to see that only malignant atoms can be responsible for a large number of stable models.
In particular, every stratified program contains only benign atoms and has exactly one stable model. This suggests the following improved procedure:
– Identify the set of benign atoms occurring in P;
– Drop these benign vertices from G*(P), yielding H(P);
– Compute a feedback vertex set S of size ≤ k of H(P);
– For each p.t.a. τ over S, compute the unique stable model Mτ of P[τ], check whether this is actually a stable model of P, and if so, output Mτ.
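The last step is the same enumeration over S as in the proof of Theorem 4, and it can be sketched in a self-contained way; the encoding is mine, and for brevity the unique stable model of the stratified P[τ] is found by brute force here, standing in for the linear-time stratified evaluation used in the proof:

```python
# Sketch of the enumeration underlying Theorem 4: try all 2^|S| truth
# assignments tau over a feedback vertex set S, build P[tau], find its
# stable model(s), and keep each one iff it is stable for the original
# program. Rules are (head, body) with '-a' marking negation.

from itertools import product, combinations

def _least(pos):
    M, changed = set(), True
    while changed:
        changed = False
        for h, b in pos:
            if h not in M and all(x in M for x in b):
                M.add(h)
                changed = True
    return M

def _stable(prog, I):
    red = [(h, [l for l in b if not l.startswith('-')])
           for h, b in prog
           if not any(l.startswith('-') and l[1:] in I for l in b)]
    return _least(red) == I

def stable_models_via_fvs(prog, S):
    atoms = sorted({h for h, _ in prog} |
                   {l.lstrip('-') for _, b in prog for l in b})
    found = set()
    for bits in product([True, False], repeat=len(S)):
        tau = dict(zip(S, bits))

        def val(l):                      # truth value of literal l under tau
            a = l.lstrip('-')
            return None if a not in tau else (tau[a] != l.startswith('-'))

        ptau = [(h, [l for l in b if val(l) is not True])
                for h, b in prog if not any(val(l) is False for l in b)]
        for r in range(len(atoms) + 1):  # brute-force stand-in for the
            for M in map(set, combinations(atoms, r)):  # stratified evaluation
                if _stable(ptau, M) and _stable(prog, M):
                    found.add(frozenset(M))
    return found
```

For P = {p ← not q;  q ← not p;  r ← p}, the singleton S = {p} breaks the only cycle of G*(P), and the two assignments to p yield exactly the stable models {p, r} and {q}.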
It is easy to see that the above procedure correctly computes the stable models of P. Unfortunately, as shown by the next theorem, it is unlikely that this procedure can run in polynomial time.
Theorem 5. Determining whether an atom of a propositional logic program is benign is NP-complete.
Proof. (Sketch.) This follows by a rather simple reduction from the NP-complete problem of deciding whether, for two pairs of vertices ⟨x1, y1⟩ and ⟨x2, y2⟩ of a directed graph G, there are two vertex-disjoint paths linking x1 to y1 and x2 to y2 [12]. A detailed explanation will be given in the full paper. ⊓⊔
We thus propose a related improvement, which is somewhat weaker, but tractable. An atom p of a logic program P is called weakly malignant if it lies on at least one simple cycle of G*(P) containing a marked (=negated) edge. An atom is called strongly benign if it is not weakly malignant.
Lemma 2. Determining whether an atom of a propositional logic program is strongly benign or weakly malignant can be done in polynomial time.
Proof. (Sketch.) It is sufficient to show that determining whether a vertex p of an undirected graph G with Boolean edge labels lies on a simple cycle containing a marked edge can be done in polynomial time. This can be solved by checking, for each marked edge ⟨y1, y2⟩ of G and for each pair of neighbours x1, x2 of p, whether the graph G − {p} contains two vertex-disjoint paths linking x1 to y1 and x2 to y2, respectively. The latter can be decided in polynomial time by a result of Robertson and Seymour [21]. ⊓⊔
We next present an improved algorithm for enumerating the stable models of a logic program P based on the feedback width of a suitable undirected graph associated with P.
Modular Stable Model Enumeration procedure (MSME).
1. Compute the set C of the strongly connected components (s.c.c.) of G(P);
2. For each s.c.c. C ∈ C, let PC be the set of rules of P that "define" atoms belonging to C, i.e., PC contains every rule r ∈ P whose head belongs to C;
3. Determine the set UC ⊆ C of the s.c.c.s of G(P) whose corresponding program PC is not stratified;
4. For each s.c.c. C ∈ UC, compute the set SB(C) of strongly benign atoms occurring in PC;
5. Let P′ = ⋃_{C∈UC} PC;
6. Let H(P′) be the subgraph of G*(P′) obtained by dropping every vertex p occurring in some set SB(C) of strongly benign atoms, for some C ∈ UC;
7. Compute a feedback vertex set S of size ≤ k of H(P′);
8. For each p.t.a. τ over S, compute the unique stable model Mτ of P[τ] and check whether this is actually a stable model of P, and if so, output Mτ.
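Step 3 requires a stratifiability test. At the propositional level this reduces to a graph condition: a normal program is stratified iff no negative edge of G(P) lies on a directed cycle, i.e., iff for no negative edge (p, q) can q reach p back. A brute-force Python sketch (plain reachability instead of a linear-time s.c.c. computation; the encoding is mine):

```python
# Stratifiability test (step 3 of MSME, sketched): a normal program is
# stratified iff no negative edge (p, q) of the dependency graph G(P)
# lies on a directed cycle, i.e. iff q cannot reach p back.
# Rules are (head, body) with '-a' marking a negative body literal.

def _reachable(edges, start):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for x, y, _neg in edges:
            if x == u and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def is_stratified(program):
    # edge (p, q, neg): p depends on q; neg marks a negative dependency
    edges = [(h, l.lstrip('-'), l.startswith('-'))
             for h, b in program for l in b]
    return not any(neg and p in _reachable(edges, q)
                   for p, q, neg in edges)
```

For example, {p ← not q; q ← not p} has a cycle through a negative edge and is not stratified, whereas {p ← not q; q ← r; r ←} uses negation but is acyclic and hence stratified.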
The feedback width of the graph H(P′) is called the weak feedback-width of the dependency graph of P. The following theorem follows from the fp-tractability of computing feedback vertex sets of size k for undirected graphs and from well-known modular computation methods for stable model semantics [11]. Theorem 6. The logic programming problems (1–5) listed above are all fp-tractable w.r.t. the weak feedback-width of the dependency graph of the logic program. Note that the methods used in this section can be adapted to show fixed-parameter tractability results for extended versions of logic programming, such as disjunctive logic programming, and for other types of nonmonotonic reasoning. In the case of disjunctive logic programming, it is sufficient to extend the dependency graph to contain a labeled directed edge between every pair of atoms occurring together in a rule head.
A different perspective on the computation of stable models has recently been considered in [22], where the size of stable models is taken as the fixed parameter. It turns out that computing small stable models is fixed-parameter intractable, whereas computing large stable models is fixed-parameter tractable if the parameter is the number of rules in the program.
6 The Small Model Circumscription Problem
In this section we study the fixed-parameter complexity of a tractable parametric variant of circumscription, where the attention is restricted to models of small cardinality.
6.1 Definition of Small Model Circumscription
The Small Model Circumscription Problem (SMC) is defined as follows. Given a propositional theory T over a set of atoms A = P ∪ Z, and given a propositional formula ϕ over the vocabulary A, decide whether ϕ is satisfied in a model M of T such that:
– M is of small size, i.e., at most k propositional atoms are true in M (written |M| ≤ k); and
– M is P;Z-minimal w.r.t. all other small models¹, i.e., there is no model M′ of T such that |M′| ≤ k and M′ ∩ P ⊊ M ∩ P.
This problem appears to be a miniaturization of the classical problem of (brave) reasoning with minimal models. We believe that SMC is useful since, in many contexts, one has large theories but is mainly interested in small models (e.g. in abductive diagnosis). Clearly, for each fixed k, SMC is tractable: it suffices to enumerate the O(|A|^k) candidate interpretations in an outer loop and, for each such interpretation M, check whether M |= T, M |= ϕ, and M is P;Z-minimal. The latter can be done by an inner loop enumerating all small interpretations and performing some easy checking tasks. It is also not hard to see that SMC is fp-intractable. In fact, the Hitting Set problem, which was shown to be W[2]-complete [9], can be fp-reduced to SMC and can actually be regarded as the restricted version of SMC where P = A, Z = ∅, and T consists of a CNF having only positive literals. In Section 6.2 we present an fp-tractable subclass of this version of SMC, where the maximum clause length in the theory is taken as an additional parameter. However, in Section 6.3 we show that, as soon as the set Z of floating variables is not empty, this problem becomes fp-intractable. Since brave reasoning under minimal models was shown to be Σ2^P-complete in [10], and is thus one level above the complexity of classical reasoning, it would be interesting to determine the precise fixed-parameter complexity of the general version of SMC w.r.t. the parameter k. This problem, too, is tackled in Section 6.3.
¹ In this paper, whenever we speak about P;Z-minimality, we mean minimality as defined here.
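On very small instances the two nested enumeration loops just described can be written directly; in the Python sketch below (encoding mine), the theory T and the formula ϕ are given as predicates over the set of true atoms:

```python
# Brute-force SMC on tiny instances (illustration only): T and phi are
# Python predicates over a set of true atoms; P and Z are sets of atoms.

from itertools import combinations

def smc(T, phi, P, Z, k):
    atoms = sorted(P | Z)
    small = [set(M) for r in range(k + 1)
             for M in combinations(atoms, r) if T(set(M))]
    # M is P;Z-minimal among small models iff no small model has a
    # strictly smaller P-part
    return any(phi(M) and not any((Mp & P) < (M & P) for Mp in small)
               for M in small)
```

With T the positive CNF (x1 ∨ x2) ∧ (x2 ∨ x3), P = {x1, x2, x3} and Z = ∅, the small minimal models for k = 2 are {x2} and {x1, x3}; hence x1 is bravely entailed but x1 ∧ x2 is not — the Hitting Set flavour mentioned above.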
6.2 A Tractable Restriction of SMC
We restrict SMC by requiring that the theory T be a q-CNF with no negative literal occurring in it, and by minimizing over all atoms occurring in the theory. The problem Restricted Small Model Circumscription (RSMC) is thus defined as SMC except that T is required to be a purely positive q-CNF formula, the "floating" set Z is empty, and the parameters are the maximum size k of the models to be considered and the maximum number q of literals in the largest conjunct (=clause) of T.
Theorem 7. RSMC is fixed-parameter tractable.
Proof. (Sketch.) Since T is positive and Z = ∅, the minimal models of T to be considered are exactly the prime implicants of T having size ≤ k. By Theorem 3, computing these prime implicants for a q-CNF theory is fp-tractable w.r.t. parameters k and q. Thus, the theorem easily follows. ⊓⊔
6.3 The Fixed-Parameter Complexity of SMC
We first show that the slight modification of the fp-tractable problem RSMC where Z ≠ ∅ is fp-intractable and in fact W[SAT]-hard. The problem Positive Small Model Circumscription (PSMC) is defined as SMC except that T is required to be a purely positive q-CNF formula, and the parameters are the maximum size k of the models to be considered and the maximum clause length q. Let us define the Boolean formula count_k(x), where x = (x1, …, xn) is a list of variables:

A = ⋀_{1≤i≤n} ⋀_{1≤j≤min{i,k+1}} [ q_i^j ≡ ⋁_{j−1≤t≤i−1} ( q_t^{j−1} ∧ ⋀_{t+1≤s≤i−1} ¬x_s ∧ x_i ) ]

B = ⋀_{k+1≤r≤n} ¬q_r^{k+1}

C = ⋁_{k≤r≤n} q_r^k

count_k(x) = A ∧ B ∧ C

Intuitively, in any satisfying truth value assignment for count_k(x), the propositional variable q_i^j gets the value true iff x_i is the j-th true variable among x1, …, x_i. Note that the size of count_k(x) is O(kn²). The variables x1, …, xn in the formula above are called the external variables of the formula, while all the other variables occurring in the formula are called private variables. Whenever a theory T contains a count subformula, we assume w.l.o.g. that the private variables of this subformula do not occur in T outside the subformula. In particular, if T contains two count subformulas, then their sets of private variables are disjoint.
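The intended semantics can be checked mechanically on a tiny instance. The Python sketch below evaluates A, B, C over given assignments to the external variables x and the private variables q_i^j; the base case q_0^0 = true (and q_t^0 = false for t ≥ 1) is my reading of the recurrence, not spelled out in the text:

```python
# Evaluate count_k(x) = A and B and C for given truth values of the
# external variables x[1..n] and the private variables q[(i, j)].
# Assumed base case for the recurrence: q_0^0 is true, q_t^0 (t >= 1) false.

def count_k(x, q, k):
    n = len(x) - 1                      # x[0] is an unused placeholder

    def qv(i, j):
        if (i, j) == (0, 0):
            return True
        return q.get((i, j), False)

    A = all(
        q.get((i, j), False) == any(
            qv(t, j - 1)
            and all(not x[s] for s in range(t + 1, i))
            and x[i]
            for t in range(j - 1, i))
        for i in range(1, n + 1)
        for j in range(1, min(i, k + 1) + 1))
    B = all(not q.get((r, k + 1), False) for r in range(k + 1, n + 1))
    C = any(q.get((r, k), False) for r in range(k, n + 1))
    return A and B and C
```

Brute-forcing all assignments for n = 3 and k = 1 confirms Lemma 3 on this instance: count_k is satisfiable iff exactly k of the x's are true, and in that case the satisfying assignment to the q's is unique.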
Lemma 3. Let F be a formula and x a list of variables occurring in F. Then
– F ∧ count_k(x) is satisfiable if and only if there exists a truth value assignment σ for F assigning true to exactly k variables from x.
– Every k-truth value assignment σ satisfying F can be extended in a unique way to an assignment σ′ satisfying F ∧ count_k(x).
– Every satisfying truth value assignment for F ∧ count_k(x) assigns true to exactly k private variables of count_k(x) and true to exactly k variables from x.
Theorem 8. PSMC is W[SAT]-hard. The problem remains hard even for 2-CNF theories.
Proof. (Sketch.) Let Φ be a Boolean formula over propositional variables {x1, …, xn}. We fp-reduce the W[SAT]-complete problem of deciding whether there exists a k-truth value assignment satisfying Φ to an instance of PSMC where the maximum model size is 2k + 1 and the maximum clause length is 2. Let Φ′ = Φ ∧ count_k(x1, …, xn), and let y1, …, ym be the private variables of the count_k subformula. Moreover, let T be the following 2-CNF positive theory: (p ∨ x1) ∧ ⋯ ∧ (p ∨ xn) ∧ (p ∨ y1) ∧ ⋯ ∧ (p ∨ ym). We take P = {p} and Z = {x1, …, xn, y1, …, ym}. Note that a set M is a P;Z-minimal model of T having size ≤ 2k + 1 if and only if M = {p} ∪ S, where S is any subset of Z such that |S| ≤ 2k. From Lemma 3, every satisfying truth value assignment for Φ′ must make true exactly k variables from {x1, …, xn} and exactly k variables from the set of private variables of count_k. It follows that there exists a P;Z-minimal model M of T such that |M| ≤ 2k + 1 and M satisfies Φ′ if and only if there exists a k-truth value assignment satisfying Φ. ⊓⊔
Let us now focus on the general SMC problem, where arbitrary theories are considered and floating variables are permitted. It does not appear that SMC is contained in W[SAT].
On the other hand, it can be seen that SMC is contained in AW[SAT], but it does not seem to be hard (and thus complete) for this class. In fact, AW[SAT] is the miniaturization of PSPACE and not of Σ2^P. No classes corresponding to the levels of the polynomial hierarchy have been defined so far in the theory of fixed-parameter intractability. Nonmonotonic reasoning problems, such as SMC, seem to require the definition of such classes. We next define the exact correspondent of Σ2^P at the fixed-parameter level.
Definition of the class Σ2W[SAT]. Σ2W[SAT] is defined similarly to AW[SAT], but the quantifier prefix is restricted to Σ2.
Parameterized QBF2SAT.
Instance: A quantified Boolean formula ∃k1 x ∀k2 y E.
Parameter: k = ⟨k1, k2⟩.
Question: Is ∃k1 x ∀k2 y E valid? (Here, ∃k1 x denotes the choice of some k1-truth value assignment for the variables x, and ∀k2 y denotes all choices of k2-truth value assignments for the variables y.)
Definition 4. Σ2W[SAT] is the set of all problems that fp-reduce to Parameterized QBF2SAT.
Membership of SMC in Σ2W[SAT]. Let the problem Parameterized QBF2SAT≤ be the variant of Parameterized QBF2SAT in which the quantifiers ∃k1 x and ∀k2 y are replaced by quantifiers ∃≤k1 x and ∀≤k2 y with the following meaning: ∃≤k1 x α means that there exists a truth value assignment making at most k1 propositional variables from x true such that α is valid; symmetrically, ∀≤k2 y α means that α is valid for every truth value assignment making at most k2 propositional variables from y true.
Lemma 4. Parameterized QBF2SAT≤ is in Σ2W[SAT].
Proof. (Sketch.) It suffices to show that Parameterized QBF2SAT≤ is fp-reducible to Parameterized QBF2SAT. Let Φ = ∃≤k1 x1 x2 … xn ∀≤k2 y1 y2 … ym E(x1, …, xn, y1, …, ym) be an instance of Parameterized QBF2SAT≤. It is easy to see that the following instance Φ′ of Parameterized QBF2SAT is equivalent to Φ:

∃2k1 x1 x2 … xn x′1 x′2 … x′n ∀2k2 y1 y2 … ym y′1 y′2 … y′m E(x1 ∧ x′1, …, xn ∧ x′n, y1 ∧ y′1, …, ym ∧ y′m),
where x′1, x′2, …, x′n, y′1, y′2, …, y′m are new variables and E(x1 ∧ x′1, …, xn ∧ x′n, y1 ∧ y′1, …, ym ∧ y′m) is obtained from E by substituting xi ∧ x′i for xi (1 ≤ i ≤ n) and yj ∧ y′j for yj (1 ≤ j ≤ m). ⊓⊔
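The substitution trick of Lemma 4 can be sanity-checked by brute force on a tiny matrix E; the quantifier helpers below are my own encoding, with a "choice" represented as the set of variables made true:

```python
# Brute-force check of Lemma 4's reduction on a small instance:
# E is a predicate of (true-x-set, true-y-set). The left-hand side uses
# "at most k" choices, the right-hand side "exactly 2k" choices over
# doubled variable sets, with x_i simulated by x_i AND x_i'.

from itertools import combinations

def choices_le(vars_, k):
    return [set(c) for r in range(k + 1) for c in combinations(vars_, r)]

def choices_eq(vars_, k):
    return [set(c) for c in combinations(vars_, k)]

def lhs(E, xs, ys, k1, k2):          # exists<=k1 x, forall<=k2 y: E
    return any(all(E(X, Y) for Y in choices_le(ys, k2))
               for X in choices_le(xs, k1))

def rhs(E, xs, ys, k1, k2):          # exists 2k1 of x,x', forall 2k2 of y,y'
    xs2 = xs + [v + "'" for v in xs]
    ys2 = ys + [v + "'" for v in ys]
    eff = lambda S, vs: {v for v in vs if v in S and v + "'" in S}
    return any(all(E(eff(X, xs), eff(Y, ys))
                   for Y in choices_eq(ys2, 2 * k2))
               for X in choices_eq(xs2, 2 * k1))
```

On small matrices E the two sides agree, as the lemma claims; e.g. for E(X, Y) = (x1 ∈ X) ∨ (y1 ∈ Y) with k1 = k2 = 1, both sides are valid.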
Theorem 9. SMC is in Σ2W[SAT].
Proof. (Sketch.) By Lemma 4 it is sufficient to show that every SMC instance S can be fp-reduced to an equivalent instance Φ(S) of Parameterized QBF2SAT≤. Let S = (A = P ∪ Z, T(P, Z), ϕ, k) be an SMC instance, where P = {p1, …, pn} and Z = {z1, …, zm}. Let P′ = {p′1, …, p′n} and Z′ = {z′1, …, z′m} be two sets of fresh variables. Φ(S) is defined as follows:

∃≤k p1 … pn z1 … zm ∀≤k p′1 … p′n z′1 … z′m
T(P, Z) ∧ ϕ ∧ ( T(P′, Z′) ⇒ ( ⋀_{1≤i≤n} (pi ≡ p′i) ∨ ⋁_{1≤i≤n} (p′i ∧ ¬pi) ) ),
where T(P′, Z′) is obtained from T(P, Z) by substituting p′i for pi (1 ≤ i ≤ n) and z′j for zj (1 ≤ j ≤ m).
The first part of Φ(S) guesses a model M of T with at most k true atoms among P ∪ Z which satisfies ϕ. The second part makes sure that M is P;Z-minimal by checking that each model M′ of T is either equivalent to M over the P variables, or has at least one P variable true whereas the same variable is false in M. Hence T bravely entails ϕ under small-models P;Z circumscription if and only if Φ(S) is valid. ⊓⊔
Σ2W[SAT]-hardness of SMC.
Theorem 10. SMC is Σ2W[SAT]-hard, and thus Σ2W[SAT]-complete.
Proof. (Sketch.) We show that Parameterized QBF2SAT is fp-reducible to SMC. Let Φ be the following instance of Parameterized QBF2SAT: ∃k1 x1 x2 … xn ∀k2 y1 y2 … ym E(x1, …, xn, y1, …, ym). We define a corresponding instance of SMC, S(Φ) = (A = P ∪ Z, T, ϕ = w, k = 2k1 + 2k2 + 1), where w is a fresh variable, T = (E(x, y) ⇒ w) ∧ count_{k1}(x) ∧ count_{k2}(y), P = x ∪ {w}, and Z consists of all the other variables occurring in T, namely the variables in y and the private variables of the two count subformulae. We prove that Φ is valid if and only if S(Φ) is a yes-instance of SMC.
(Only-if part.) Assume Φ is valid. Then there exists a k1-truth value assignment σ to the variables x such that for every k2-truth value assignment to the variables y, the formula E is satisfied. Let M be an interpretation for T constructed as follows: M contains the k1 variables from x which are made true by σ and the first k2 variables of y; in addition, M contains w and the k1 + k2 private variables which make the two count subformulae true. This is possible by Lemma 3. It is easy to see that M is a model of T. We now show that M is a P;Z-minimal model of T. Assume that M′ is a P;Z-smaller model. Due to the count_{k1}(x) subformula, M′ must contain exactly k1 atoms from x, and therefore M and M′ coincide w.r.t. the x atoms. It follows that w ∉ M′. However, by the validity of Φ and the construction of M, M′ |= E holds, and therefore M′ |= w as well.
Contradiction.
(If part.) Assume there exists a P;Z-minimal model M of T such that M entails w and |M| ≤ k. Note that, by Lemma 3, M must contain exactly k1 true variables from x and exactly k2 true variables from y. Towards a contradiction, assume that Φ is not valid. Then, for every k1-truth value assignment σ to the variables x, there exists a k2-truth value assignment σ′ to the variables y such that σ ∪ σ′ falsifies E. In particular, for the k1 variables from x which are true according to M, it is possible to make true exactly k2 variables from y such that the formula E is not satisfied. Consider now the interpretation M′ containing these k1 + k2 true variables plus the k1 + k2 private variables made true by the two count subformulae. M′ is a model of T whose P variables coincide with those of M, except for w, which belongs to M but not to M′. Therefore, M is not P;Z-minimal, a contradiction. Finally, note that the transformation from Φ to S(Φ) is an fp-reduction: it is feasible in polynomial time, and the new parameter 2k1 + 2k2 + 1 is linear in k. ⊓⊔
Corollary 2. Parameterized QBF2SAT≤ is Σ2W[SAT]-complete.
Proof. (Sketch.) Completeness follows from the fact that, as shown in Lemma 4, this problem belongs to Σ2W[SAT], and by Theorem 10, which shows that the Σ2W[SAT]-hard problem SMC is fp-reducible to Parameterized QBF2SAT≤. ⊓⊔
Downey and Fellows [9] pointed out that completeness proofs for fixed-parameter intractability classes are generally more involved than classical intractability proofs. Note that this is also the case for the above proof, where we had to deal with subtle counting issues. A straightforward downscaling of the standard Σ2^P-completeness proof for propositional circumscription appears not to be possible. In particular, observe that we have obtained our completeness result for a very general version of propositional minimal model reasoning, where there are variables to be minimized (P) and floating variables (Z). It is well known that minimal model reasoning remains Σ2^P-complete even if all variables of a formula are minimized (i.e., if Z is empty). This result does not seem to carry over to the setting of fixed-parameter intractability. Clearly, this problem, being a restricted version of SMC, is in Σ2W[SAT]. Moreover, it is easy to see that the problem is hard for W[2] and thus fixed-parameter intractable. However, we were not able to show that the problem is complete for any class in the range from W[2] to Σ2W[SAT], and we leave this issue as an open problem.
Open Problem. Determine the fixed-parameter complexity of SMC when all variables of the theory T are to be minimized.
References
1. K. Apt, H. Blair, and A. Walker. Towards a Theory of Declarative Knowledge. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pp. 89–148. Morgan Kaufmann, Washington DC, 1988.
2. W. Bibel. Constraint Satisfaction from a Deductive Viewpoint. Artificial Intelligence, 35:401–413, 1988.
3. H. L. Bodlaender. A Linear-Time Algorithm for Finding Tree-Decompositions of Small Treewidth. SIAM Journal on Computing, 25(6):1305–1317, 1996.
4. A.K. Chandra and P.M. Merlin. Optimal Implementation of Conjunctive Queries in Relational Databases. In ACM Symp. on Theory of Computing (STOC'77), pp. 77–90, 1977.
5. R.G. Downey and M.R. Fellows. Fixed Parameter Tractability and Completeness. Congressus Numerantium, 87:161–187, 1992.
6. R.G. Downey and M.R. Fellows. Fixed Parameter Intractability (Extended Abstract). In Proc. of Structure in Complexity Theory, IEEE, pp. 36–50, 1992.
7. R.G. Downey and M.R. Fellows. Fixed Parameter Tractability and Completeness I: Basic Results. SIAM J. Comput., 24:873–921, 1995.
8. R.G. Downey and M.R. Fellows. On the Parametric Complexity of Relational Database Queries and a Sharper Characterization of W[1]. In Combinatorics, Complexity and Logics, Proceedings of DMTCS'96, pp. 164–213, Springer, 1996.
9. R.G. Downey and M.R. Fellows. Parameterized Complexity. Springer, New York, 1999.
10. T. Eiter and G. Gottlob. Propositional Circumscription and Extended Closed World Reasoning are Π2^P-complete. Theoretical Computer Science, 114(2):231–245, 1993. Addendum 118:315.
11. T. Eiter, G. Gottlob, and H. Mannila. Disjunctive Datalog. ACM Trans. on Database Syst., 22(3):364–418, September 1997.
12. S. Fortune, J.E. Hopcroft, and J. Wyllie. The Directed Subgraph Homeomorphism Problem. Theoretical Computer Science, 10(2):111–121, 1980.
13. M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York, 1979.
14. M. Gelfond and V. Lifschitz. The Stable Model Semantics for Logic Programming. In Logic Programming: Proc. Fifth Int'l Conference and Symposium, pp. 1070–1080, Cambridge, Mass., 1988. MIT Press.
15. G. Gottlob, N. Leone, and F. Scarcello. The Complexity of Acyclic Conjunctive Queries. Technical Report DBAI-TR-98/17, available on the web as: http://www.dbai.tuwien.ac.at/staff/gottlob/acyclic.ps, or by email from the authors. An extended abstract concerning part of this work has been published in Proc. of the IEEE Symposium on Foundations of Computer Science (FOCS'98), pp. 706–715, Palo Alto, CA, 1998.
16. G. Gottlob, N. Leone, and F. Scarcello. A Comparison of Structural CSP Decomposition Methods. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp. 394–399, Stockholm, 1999.
17. P. Jeavons, D. Cohen, and M. Gyssens. Closure Properties of Constraints. JACM, 44(4), 1997.
18. Ph. G. Kolaitis and M. Y. Vardi. Conjunctive-Query Containment and Constraint Satisfaction. In Proc. of Symp. on Principles of Database Systems (PODS'98), 1998.
19. C.H. Papadimitriou and M. Yannakakis. On the Complexity of Database Queries. In Proc. of Symp. on Principles of Database Systems (PODS'97), pp. 12–19, Tucson, Arizona, 1997.
20. N. Robertson and P.D. Seymour. Graph Minors II. Algorithmic Aspects of Tree-Width. J. Algorithms, 7:309–322, 1986.
21. N. Robertson and P.D. Seymour. Graph Minors XX. Wagner's Conjecture. To appear.
22. M. Truszczyński. Computing Large and Small Stable Models. In Proc. of the 16th International Conference on Logic Programming (ICLP'99), Las Cruces, New Mexico. To appear.
23. M. Vardi. Complexity of Relational Query Languages. In Proceedings 14th STOC, pp. 137–146, San Francisco, 1982.
24. M. Yannakakis. Algorithms for Acyclic Database Schemes. In Proc. of Int. Conf. on Very Large Data Bases (VLDB'81), pp. 82–94, C. Zaniolo and C. Delobel Eds., Cannes, France, 1981.
Classifying Semi-Normal Default Logic on the Basis of its Expressive Power
Tomi Janhunen*
Helsinki University of Technology, Laboratory for Theoretical Computer Science, P.O. Box 5400, FIN-02015 HUT, Finland
[email protected]
Abstract. This paper reports on systematic research which aims to classify non-monotonic logics by their expressive power. The classification is based on translation functions that satisfy three important criteria: polynomiality, faithfulness and modularity (PFM for short). The basic method for classification is to prove that PFM translation functions exist (or do not exist) between certain logics. As a result, non-monotonic logics can be arranged to form a hierarchy. This paper gives an overview of the current expressive power hierarchy (EPH) and investigates semi-normal default logic as well as prerequisite-free and semi-normal default logic in order to locate their exact positions in the hierarchy.
1 Introduction
Non-monotonic reasoning (NMR) has a rich variety of formalizations, such as McCarthy's circumscription [18], Moore's autoepistemic logic [19] and Reiter's default logic [21]. These non-monotonic logics were proposed about twenty years ago, and since then their interrelations have been extensively studied [3,5,6,7,9,11,12,22]. This line of research has concentrated on finding translation functions that transform a theory of one non-monotonic logic into a theory of another such that the sets of conclusions associated with the former theory are preserved (to a reasonable degree) in this transformation. A number of variants of non-monotonic logics have also been proposed. Let us just mention some of these approaches: parallel circumscription by Lifschitz [13], strong autoepistemic logic by Marek and Truszczyński [16] and syntactically restricted forms of default logic such as normal default logic and prerequisite-free default logic (see e.g. [4]). Naturally, the interconnections of these variants to their predecessors have also been analyzed (see e.g. [1,5,6,8,15,16,20,23]). The translation functions proposed in the literature provide means to measure the expressive power of the non-monotonic logics involved: a non-monotonic logic can capture expressions of another non-monotonic logic via a translation function. The tightness of this relationship depends on the requirements imposed on translation functions. Our recent experiences indicate that these requirements
* The support from Academy of Finland (project 43963) is gratefully acknowledged.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 19–33, 1999. c Springer-Verlag Berlin Heidelberg 1999
affect the results on expressiveness in very delicate ways [9]. We have adopted three requirements from earlier approaches – namely polynomiality, faithfulness and modularity (PFM) – as the basis of our framework. In particular, the modularity requirement has turned out to be useful when one wants to differentiate non-monotonic logics by their expressive power [5,6,9]. The author has used PFM translation functions systematically for classifying non-monotonic logics on the basis of their expressive power. So far, our analysis [10] has covered eight non-monotonic logics, giving rise to a hierarchy (EPH; see Fig. 1). In this paper, we analyze the expressive powers of semi-normal default logic (SNDL) as well as prerequisite-free and semi-normal default logic (PSNDL) in order to locate the exact positions of these logics in EPH. We proceed as follows. In Section 2, we describe the requirements for PFM translation functions and give an overview of the current EPH. Five syntactic variants of default logic are introduced in Section 3. The expressive powers of semi-normal default logic and semi-normal and prerequisite-free default logic are then analyzed in Sections 4 and 5, respectively. The paper ends with conclusions in Section 6.
2 PFM Translations and Expressive Power Hierarchy
In this section, we introduce the three basic requirements for translation functions. Let us introduce some notation and terminology needed in order to formulate these requirements. We write L(A) to introduce a propositional language L based on a set of propositional atoms A. We let ⟨X, T⟩ stand for a non-monotonic theory where T ⊆ L is its propositional subtheory and X stands for any set(s) of syntactic elements which are specific to the non-monotonic logic L in question (such as a set of defaults D in Reiter's default logic). The sets of conclusions associated with a non-monotonic theory ⟨X, T⟩ are called extensions (or expansions) of ⟨X, T⟩, and they determine the semantics of ⟨X, T⟩. We consider only finite non-monotonic theories, and let ||⟨X, T⟩|| denote the length of ⟨X, T⟩, i.e. the number of symbol occurrences needed to represent ⟨X, T⟩. The three requirements for translation functions are formulated as follows. Definition 1. A translation function Tr : L1 → L2 is (i) polynomial iff for all ⟨X, T⟩ ∈ L1, the time required to compute Tr(⟨X, T⟩) is polynomial in ||⟨X, T⟩||; (ii) faithful iff for all ⟨X, T⟩ ∈ L1, the propositionally consistent extensions of ⟨X, T⟩ ∈ L1 and Tr(⟨X, T⟩) ∈ L2 are in one-to-one correspondence and coincide up to the propositional language L of T; and (iii) modular iff for all ⟨X, T⟩ ∈ L1, the translation Tr(⟨X, T⟩) = ⟨X′, T′ ∪ T⟩ where ⟨X′, T′⟩ = Tr(⟨X, ∅⟩). Polynomiality is a reasonable requirement from the computational point of view: translations should be computable in polynomial time and space. A faithful translation preserves the semantics of a non-monotonic theory ⟨X, T⟩, which is determined by its extensions. We require a one-to-one correspondence of propositionally consistent extensions, which supports both brave and cautious reasoning with extensions in a straightforward way. Moreover, the languages associated with ⟨X, T⟩ and Tr(⟨X, T⟩) may extend the propositional language L(A) of T,
Classifying Semi-Normal Default Logic on the Basis of its Expressive Power
but a faithful translation function is supposed to preserve extensions of ⟨X, T⟩ up to L. The modularity requirement demands that a translation function provides a fixed translation for X which is independent of T. Consequently, there is no need to recompute Tr(⟨X, ∅⟩) whenever T is updated. A more detailed discussion and a comparison (e.g. with [5,17]) can be found in [9,10]. For the sake of brevity, we say that a translation function is PFM if it satisfies the three requirements of Definition 1. In particular, we note that any composition of PFM translation functions is also PFM [9]. PFM translation functions provide us with a framework for analyzing the relative expressive power of non-monotonic logics: if there is a PFM translation function from one non-monotonic logic L1 to another non-monotonic logic L2, then L2 is considered to be at least as expressive as L1 (denoted by L1 −→ L2). If – in addition – there are no PFM translation functions in the opposite direction (denoted by L2 ↛ L1), then we say that L1 is less expressive than L2 (denoted by L1 =⇒ L2). If there are PFM translation functions in both directions, then L1 and L2 are equally expressive (denoted by L1 ←→ L2). If L1 ↛ L2 and L2 ↛ L1 hold for two logics L1 and L2 simultaneously, then L1 and L2 are incomparable with respect to −→ (denoted by L1 ↮ L2). Note that −→ is a preorder on (non-monotonic) logics and =⇒ is a strict partial order among the equivalence classes induced by −→.
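To illustrate how the derived relations =⇒, ←→ and ↮ are obtained from the basic relation −→, the following sketch (an illustration of ours, not part of the formal development) encodes a set of direct translatability arrows following the hierarchy discussed below, closes it into a preorder, and computes the derived relations. The arrow set and all names are assumptions made for this toy model; in particular, "incomparable" is modeled as mutual non-reachability within the assumed arrows.

```python
from itertools import product

# Direct PFM-translatability arrows (assumed, following the hierarchy EPH
# described in the text); reflexivity and transitivity are closed below.
arrows = {
    ("CL", "PNDL"), ("PNDL", "CIRC"), ("CIRC", "PNDL"),
    ("PNDL", "NDL"), ("PNDL", "PDL"),
    ("PDL", "AEL"), ("AEL", "PDL"),
    ("NDL", "DL"), ("PDL", "DL"),
    ("DL", "SAEL"), ("SAEL", "DL"), ("DL", "PL"), ("PL", "DL"),
}
logics = {l for arc in arrows for l in arc}

def closure(arcs, universe):
    """Reflexive-transitive closure: the preorder generated by the arrows."""
    closed = set(arcs) | {(l, l) for l in universe}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closed), list(closed)):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed

reaches = closure(arrows, logics)          # L1 --> L2

def less_expressive(l1, l2):               # L1 ==> L2
    return (l1, l2) in reaches and (l2, l1) not in reaches

def equally_expressive(l1, l2):            # L1 <--> L2
    return (l1, l2) in reaches and (l2, l1) in reaches

def incomparable(l1, l2):                  # neither direction translatable
    return (l1, l2) not in reaches and (l2, l1) not in reaches

print(less_expressive("NDL", "DL"))
print(equally_expressive("AEL", "PDL"))
print(incomparable("NDL", "PDL"))
```

Under the assumed arrow set, the derived relations reproduce the classes of the hierarchy, e.g. NDL falls strictly below DL while NDL and PDL are incomparable.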
The author has analyzed the interrelations of eight non-monotonic logics in terms of PFM translation functions [10]. Using the relations −→ and =⇒ it is possible to form a hierarchy of non-monotonic logics, shown in Figure 1. Classical propositional logic (CL) is included in the hierarchy in order to complete our view. The most expressive class contains default logic (DL) [21], strong autoepistemic logic (SAEL) as well as priority logic (PL) [24]. Below this class, there are two less expressive but mutually incomparable classes. The one on the left contains normal default logic (NDL), while the one on the right contains autoepistemic logic (AEL) [19] and prerequisite-free default logic (PDL), which are of equal expressive power. Below these two classes, there is a less expressive class containing parallel circumscription (CIRC) [13] as well as prerequisite-free and normal default logic (PNDL), which are equally expressive.

[Fig. 1: Expressive Power Hierarchy of Non-monotonic Logics (EPH). The vertical axis indicates increasing expressive power; from top to bottom: {DL, SAEL, PL}; the incomparable classes {NDL} and {AEL, PDL}; {PNDL, CIRC}; and CL at the bottom.]

The classes of EPH indicate some astonishing relationships in light of earlier expressiveness results (cf. [1,5]): AEL and PDL are of equal expressive power and less expressive than DL and SAEL. NDL, PDL and PNDL are all syntactically restricted forms of default logic. It is clear from the classes of EPH that syntactic restrictions tend to decrease the expressive power of the corresponding variants of DL. This motivates the goal of this paper, which is to investigate the expressiveness of DL under the semi-normality restriction.
T. Janhunen

3 Syntactic Variants of Default Logic
In order to define syntactic variants of default logic, we begin with a short introduction to Reiter's default logic [21]. A default theory is a pair ⟨D, T⟩ where T ⊆ L and D is a set of default rules (or defaults) of the form α:β1,...,βn / γ such that n ≥ 0 and the prerequisite α, the justifications β1, ..., βn and the consequent γ of the rule are sentences of L. We let Jf(D) (Cq(D)) stand for the set of justifications (consequents) that appear in a set of defaults D. Marek and Truszczyński [17] reduce a set of defaults D with respect to a propositional theory E ⊆ L to a set of inference rules D_E which contains an inference rule α/γ whenever there is a default rule α:β1,...,βn / γ ∈ D such that E ∪ {βi} is consistent for all 0 < i ≤ n. The closure of a theory T ⊆ L under a set of inference rules R is denoted by Cn^R(T). This theory is the least theory E ⊆ L which (C1) contains T, (C2) is closed under propositional consequence and (C3) is closed under the rules of R, i.e. whenever α/γ ∈ R and α ∈ E, then also γ ∈ E [17, Theorem 3.7]. It is possible to capture the closure Cn^R(T) by introducing a notion of a proof from T as follows. A sequence α1/γ1, ..., αn/γn of rules of R is an R-proof of φ ∈ L from T ⊆ L iff T ∪ {γ1, ..., γn} ⊨ φ and T ∪ {γ1, ..., γi−1} ⊨ αi holds for each 0 < i ≤ n (cf. [17] for a slightly different system where the rules of R are incorporated into a propositional proof system). The following definition of extensions for a default theory ⟨D, T⟩ is equivalent to Reiter's original definition [21].
Definition 2 (Marek and Truszczyński [17]). A theory E ⊆ L is an extension of a default theory ⟨D, T⟩ in L if and only if E = Cn^{D_E}(T).

Normality and semi-normality are examples of syntactic restrictions proposed for defaults (see e.g. [4,14]). Normal defaults have the form α:γ / γ, while semi-normal defaults are of the form α:γ∧β / γ. A default theory ⟨D, T⟩ is called normal if D contains only normal defaults. The fragment of DL corresponding to normal default theories under Reiter's extensions is called normal DL (NDL). Semi-normal default theories and semi-normal DL (SNDL) are defined analogously. A default of the form ⊤:β1,...,βn / γ is called prerequisite-free and a shorthand :β1,...,βn / γ is often used for such a default. A default theory ⟨D, T⟩ is called prerequisite-free if every default of D is prerequisite-free. Prerequisite-free DL (PDL) is the fragment of DL corresponding to prerequisite-free default theories under Reiter's extensions. It is also possible to combine prerequisite-freedom with the normality and semi-normality conditions. This gives rise to prerequisite-free and normal default logic (PNDL) and prerequisite-free and semi-normal default logic (PSNDL). In these logics, defaults are of the forms :β / β and :γ∧β / γ, respectively. The expressive powers of PDL, NDL and PNDL have already been analyzed [10]. This paper extends the analysis to cover SNDL and PSNDL.

Lemma 1 (Marek and Truszczyński [17]). If E ⊆ L is an extension of a default theory ⟨D, T⟩ in L, then E = Cn(T ∪ Γ) where Γ ⊆ {γ | α/γ ∈ D_E}.
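As a concrete illustration of Definition 2 and Lemma 1, consider the classic normal default theory about flying birds (an illustrative example of ours, not from the original text):

```latex
\[
D = \left\{ \frac{\mathit{bird} : \mathit{flies}}{\mathit{flies}} \right\},
\qquad T = \{\mathit{bird}\}.
\]
Guess $E = \mathrm{Cn}(\{\mathit{bird}, \mathit{flies}\})$. Since
$\mathit{flies}$ is consistent with $E$, the reduct is
$D_E = \{\mathit{bird}/\mathit{flies}\}$, and
$\mathrm{Cn}^{D_E}(T) = \mathrm{Cn}(\{\mathit{bird},\mathit{flies}\}) = E$,
so $E$ is an extension. The guess $E' = \mathrm{Cn}(\{\mathit{bird}\})$
fails, because $D_{E'}$ still contains $\mathit{bird}/\mathit{flies}$,
so $\mathrm{Cn}^{D_{E'}}(T) \supsetneq E'$. Hence $E$ is the unique
extension, in accordance with Lemma~1 with $\Gamma = \{\mathit{flies}\}$.
```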
4 Classifying SNDL in EPH
The goal of this section is to locate the exact position of SNDL in EPH. We start by explaining how the current hierarchy provides lower and upper bounds for the expressiveness of SNDL. Firstly, a semi-normal default is a special case of an ordinary default, and this implies that DL is at least as expressive as SNDL. Secondly, a semi-normal default α:γ∧β / γ becomes equivalent to a normal one when β equals ⊤ (it is customary to omit β in this case). Thus semi-normal defaults are able to express anything that normal defaults do – indicating that SNDL is at least as expressive as NDL. This is how we end up with a preliminary setting NDL −→ SNDL −→ DL, but the strictness of these relationships remains open. There is a significant difference between NDL and SNDL: a normal default theory always has at least one extension [17, p. 107], but this is not guaranteed for a semi-normal default theory. This is demonstrated by our next example. A variant of C without prerequisites appears in the literature (see e.g. [2,17]).

Example 1. Consider a semi-normal set of defaults C = { f:p∧¬q / p , f:q∧¬r / q , f:r∧¬p / r } and theories ∅ and T = {f}. The default theory ⟨C, ∅⟩ has exactly one extension: E = Cn(∅), in which none of the given defaults is applicable, since the common prerequisite f of the rules cannot be derived. However, if this critical prerequisite is given in T directly, the default theory ⟨C, T⟩ has no extensions. This is because the consequents and justifications of the three rules are circularly interdependent, so that an extension cannot be established.

The set of defaults C provides us with a means to rule out extensions of any default theory ⟨D, T⟩, provided that f, p, q and r are new atoms with respect to ⟨D, T⟩. For instance, if we want to exclude the extensions of ⟨D, T⟩ that contain a but not b, we extend D to the set of defaults D′ = D ∪ C ∪ { a:¬b / f }. Given an extension E of ⟨D, T⟩ that contains a but not b, it can be shown that E is not an extension of ⟨D′, T⟩. In fact, the possibility of nonexistent extensions is a distinctive feature of SNDL, and it suffices to separate SNDL from NDL with respect to expressiveness.

Theorem 1. SNDL ↛ NDL.

Proof. Let us assume that there is a PFM translation function Tr that transforms semi-normal default theories into normal ones. Recall the set of defaults C given in Example 1 and define a set of semi-normal defaults D = C ∪ { :f∧¬a / f }. Let ⟨D′, T′⟩ be the translation Tr(⟨D, ∅⟩). Note that ⟨D, ∅⟩ does not have extensions, but the normality of D′ guarantees the existence of an extension E′ for ⟨D′, T′⟩ [17, p. 106]. As Tr is faithful, E′ must be inconsistent. As shown by Marek and Truszczyński [17, p. 106], a normal default theory ⟨D′, T′⟩ has an inconsistent extension E′ if and only if T′ is inconsistent. Thus T′ must be inconsistent. Then consider a theory T = {a}. The default theory ⟨D, T⟩ has a unique extension E = Cn({a}) which is also propositionally consistent. The translation Tr(⟨D, T⟩) is ⟨D′, T′ ∪ T⟩ by the modularity of Tr. However, the theory T′ ∪ T is also inconsistent and thus ⟨D′, T′ ∪ T⟩ has only an inconsistent extension E′ = L′. But this contradicts the faithfulness of the translation function Tr. □
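The behaviour of the set C can be checked mechanically. The following self-contained sketch enumerates candidate extensions of a finite default theory by the guess-and-check method justified by Lemma 1 (guess Γ ⊆ Cq(D), then verify E = Cn^{D_E}(T)). It handles only conjunctions of literals over a fixed set of atoms, which suffices for Example 1 and for the theory D used in the proof of Theorem 1; all function and variable names are our own.

```python
from itertools import product, chain, combinations

ATOMS = ("f", "p", "q", "r", "a")

# A formula is a conjunction of literals: a tuple of (atom, polarity) pairs.
# A theory is a collection of such conjunctions.

def models(theory):
    """All assignments (as frozensets of true atoms) satisfying every conjunct."""
    result = set()
    for bits in product((False, True), repeat=len(ATOMS)):
        val = dict(zip(ATOMS, bits))
        if all(all(val[atom] == pol for atom, pol in c) for c in theory):
            result.add(frozenset(a for a in ATOMS if val[a]))
    return result

def entails(theory, c):
    return all(all((atom in m) == pol for atom, pol in c) for m in models(theory))

def consistent_with(theory, c):
    return any(all((atom in m) == pol for atom, pol in c) for m in models(theory))

def reduct(defaults, ext_theory):
    """Marek-Truszczynski reduction D_E: keep alpha/gamma when every
    justification is consistent with the candidate extension."""
    return [(pre, cons) for pre, justs, cons in defaults
            if all(consistent_with(ext_theory, j) for j in justs)]

def closure(T, rules):
    """Cn^R(T) restricted to rule consequents: fire rules to a fixpoint."""
    theory = list(T)
    changed = True
    while changed:
        changed = False
        for pre, cons in rules:
            if cons not in theory and entails(theory, pre):
                theory.append(cons)
                changed = True
    return theory

def extensions(defaults, T):
    """Guess-and-check via Lemma 1: E = Cn(T + Gamma), Gamma <= Cq(D)."""
    consequents = [cons for _, _, cons in defaults]
    found = []
    for gamma in chain.from_iterable(
            combinations(consequents, k) for k in range(len(consequents) + 1)):
        E = list(T) + list(gamma)
        if models(closure(T, reduct(defaults, E))) == models(E):
            if models(E) not in [models(e) for e in found]:
                found.append(E)
    return found

TOP = ()  # the empty conjunction, i.e. the trivially true prerequisite
lit = lambda s: (s.lstrip("-"), not s.startswith("-"))
conj = lambda *ls: tuple(lit(l) for l in ls)

# Example 1: C = { f:p&-q/p, f:q&-r/q, f:r&-p/r }
C = [(conj("f"), [conj("p", "-q")], conj("p")),
     (conj("f"), [conj("q", "-r")], conj("q")),
     (conj("f"), [conj("r", "-p")], conj("r"))]

print(len(extensions(C, [])))            # <C, {}>: exactly one extension
print(len(extensions(C, [conj("f")])))   # <C, {f}>: none

# Theorem 1: D = C + { :f&-a / f } has no extensions with T = {},
# while <D, {a}> has the unique extension Cn({a}).
D = C + [(TOP, [conj("f", "-a")], conj("f"))]
print(len(extensions(D, [])))
print(len(extensions(D, [conj("a")])))
```

Running the sketch confirms the counts claimed in Example 1 and in the proof of Theorem 1.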
Theorem 1 indicates that NDL =⇒ SNDL, i.e. NDL is less expressive than SNDL. This suggests that we should next analyze whether the relationship SNDL −→ DL is strict or not. Thus we have to consider the possibilities of obtaining a PFM translation from standard DL into SNDL. The very problem is how a default rule α:β1,...,βn / γ is represented in terms of semi-normal defaults. The main questions that arise in this respect are (i) how the consistency of the justifications β1, ..., βn is tested and (ii) on what conditions the consequent γ is supposed to be inferable. To give answers to these questions we propose a translation function as follows. The translation function TrSN introduces a new atom cβ (meaning that β is consistent) for each justification β ∈ Jf(D) and a new atom bd (meaning that d is blocked) for each default d ∈ D.

Definition 3. Let D be any set of defaults and C the set of defaults of Example 1. For an individual default d = α:β1,...,βn / γ ∈ D, the translation TrSN(d) = { :bd∧¬cβ1 / bd , ..., :bd∧¬cβn / bd } ∪ { α:γ∧¬bd / γ , α∧¬γ:f∧¬bd / f }. For the default theory ⟨D, T⟩, the translation TrSN(⟨D, T⟩) = ⟨C ∪ { :cβ∧β / cβ | β ∈ Jf(D)} ∪ ⋃_{d∈D} TrSN(d), T⟩.
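Read operationally, Definition 3 is a purely syntactic rewriting. The sketch below implements TrSN over defaults represented as (prerequisite, justifications, consequent) triples of formula strings; the string-based representation, the helper names, and the lack of simplification (e.g. of ⊤∧¬γ) are our own choices for illustration.

```python
# A default is (prerequisite, [justifications], consequent), formulas as strings.
TOP = "⊤"

# The fixed set C from Example 1 (over the new atoms f, p, q, r).
C = [("f", ["p∧¬q"], "p"), ("f", ["q∧¬r"], "q"), ("f", ["r∧¬p"], "r")]

def tr_sn_default(d, name):
    """Translate one default per Definition 3, using fresh atoms
    b_<name> ('d is blocked') and c_<beta> ('beta is consistent')."""
    alpha, justs, gamma = d
    b = f"b_{name}"
    blocked = [(TOP, [f"{b}∧¬c_{beta}"], b) for beta in justs]
    rewrite = (alpha, [f"{gamma}∧¬{b}"], gamma)          # α : γ∧¬b_d / γ
    guard = (f"{alpha}∧¬{gamma}", [f"f∧¬{b}"], "f")      # α∧¬γ : f∧¬b_d / f
    return blocked + [rewrite, guard]

def tr_sn(D, T):
    """TrSN(<D, T>): C, the consistency defaults, and the per-default rules.
    Note that the default part does not depend on T (modularity)."""
    justifications = {beta for _, justs, _ in D for beta in justs}
    consistency = [(TOP, [f"c_{beta}∧{beta}"], f"c_{beta}")   # :c_β∧β / c_β
                   for beta in sorted(justifications)]
    translated = [r for i, d in enumerate(D) for r in tr_sn_default(d, i)]
    return (C + consistency + translated, T)

# Example 2: the justification-free default ⊤:/a .
D2, T2 = tr_sn([(TOP, [], "a")], ["¬a"])
for rule in D2[len(C):]:
    print(rule)
```

For the default ⊤:/a of Example 2 the translation adds exactly the two defaults :a∧¬b_d / a and (unsimplified) ⊤∧¬a:f∧¬b_d / f, matching the set D′ discussed after Example 2.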
The defaults introduced by TrSN have the following purposes in regard to justifications β ∈ Jf(D) and defaults d = α:β1,...,βn / γ ∈ D. (i) Semi-normal defaults of the form :cβ∧β / cβ test the consistency of the justifications that appear in D. For each consistent justification β, the atom cβ is concluded by the rule. (ii) For each default d, the semi-normal defaults :bd∧¬cβ1 / bd , ..., :bd∧¬cβn / bd test whether each of the justifications β1, ..., βn is consistent. If not, then bd is derived by one of the rules to indicate that d is blocked (as one of its justifications is not consistent). (iii) The semi-normal default α:γ∧¬bd / γ is a rewrite of the original default d. The consistency of the justifications β1, ..., βn is verified by checking that ¬bd is consistent, i.e. that bd cannot be derived. Note that this amounts to testing that d is not blocked. (iv) The consequent γ of d appears as an additional justification of the preceding default in order to establish semi-normality. This leads to a complication that has to be relaxed by introducing the rules of C as well as a semi-normal default of the form α∧¬γ:f∧¬bd / f. Such a default detects cases where d is applicable (α can be derived and each of β1, ..., βn is consistent with E), but γ is inconsistent with E (also ¬γ can be derived). Because α:γ∧¬bd / γ tests the consistency of γ, it is unable to derive γ or a propositional contradiction¹ in this case. This is why α∧¬γ:f∧¬bd / f and C are needed to ensure that no extension can result in this case. This is how we utilize the nonexistence of extensions as a substitute for the propositional inconsistency of extensions. These are equivalent under the notion of faithfulness introduced in Section 2. Let us demonstrate the consistency checking mechanism in practice as follows.

Example 2. Consider a set of defaults D = { ⊤: / a } and theories T1 = ∅ and T2 = {¬a} based on the language L({a}). The default theory ⟨D, T1⟩ has a unique extension E1 = Cn({a}), while the default theory ⟨D, T2⟩ has only one extension E2 = L, which is propositionally inconsistent.
¹ This is possible with justification-free defaults, as demonstrated in Example 2.
The translation TrSN(⟨D, Ti⟩) = ⟨C ∪ D′, Ti⟩ where D′ is the set of defaults TrSN(⊤: / a) = { :a∧¬bd / a , ¬a:f∧¬bd / f }. The default theory ⟨C ∪ D′, T1⟩ has a unique extension E1′ = Cn({a}) so that E1 = E1′ ∩ L. On the other hand, the default theory ⟨D′, T2⟩ has an anomalous extension E2′ = Cn({¬a, f}), but the defaults in C ensure that ⟨C ∪ D′, T2⟩ has no extensions (recall Example 1 and the discussion after it). Thus we have a one-to-one correspondence between the propositionally consistent extensions of ⟨D, Ti⟩ and those of the translation TrSN(⟨D, Ti⟩) = ⟨C ∪ D′, Ti⟩ such that the extensions coincide up to the language L of Ti.

Example 2 suggests that TrSN is faithful. Our next objective is to show that this is indeed the case. We need a subsidiary result on the effects of adding a set of new literals L to a propositional theory T (note that M ⊆ A is a model of T and M′ ⊆ A′ is a model of L ⇔ M ∪ M′ ⊆ A ∪ A′ is a model of T ∪ L).

Lemma 2. Let T ⊆ L(A) and let L be a set of literals based on a set of atoms A′ such that A ∩ A′ = ∅. Thus T ∪ L is a propositional theory in L′(A ∪ A′). Let φ ∈ L be any sentence and l any literal based on A′.
– If L is consistent, then (i) T ∪ L ⊨ φ ⇔ T ⊨ φ, (ii) φ is consistent with T ∪ L ⇔ φ is consistent with T and (iii) T is consistent ⇔ T ∪ L is consistent.
– If T is consistent, then (i) T ∪ L ⊨ l ⇔ L ⊨ l, (ii) l is consistent with T ∪ L ⇔ l is consistent with L and (iii) T ∪ L is consistent ⇔ L is consistent.
– If L is consistent, then L ⊨ l ⇔ l ∈ L.
– T ∪ L is consistent ⇔ T is consistent and L is consistent.

To make precise the relationship between the propositionally consistent extensions of ⟨D, T⟩ and those of the translation TrSN(⟨D, T⟩), we introduce mappings ExtSN and Ext (for propositionally consistent and closed theories) that are later shown to establish a one-to-one correspondence between these classes of extensions.

Definition 4. Let ⟨D, T⟩ be a default theory in L(A) and A′ = {f, p, q, r} ∪ {cβ | β ∈ Jf(D)} ∪ {bd | d ∈ D}. Let ⟨D′, T⟩ be the translation TrSN(⟨D, T⟩), which has the language L′(A ∪ A′). For every propositionally closed theory E ⊆ L, let ExtSN(E) = Cn(E ∪ A) ⊆ L′ where A ⊆ A′ is a set of atoms containing (i) the atom cβ for each justification β ∈ Jf(D) that is consistent with E and (ii) the atom bd for each default d = α:β1,...,βn / γ ∈ D having a justification βi which is not consistent with E. For every propositionally closed theory E′ ⊆ L′, let Ext(E′) = E′ ∩ L.

Let us then precompute the reduction D′_{E′} as far as possible, given propositionally consistent and closed theories E ⊆ L and E′ = ExtSN(E).
Lemma 3. Assume the definitions given in Definition 4. If E ⊆ L(A) is a propositionally consistent and closed theory and A ⊆ A′ is a set of new atoms, then D′ and E′ = Cn(E ∪ A) ⊆ L′(A ∪ A′) satisfy for all justifications β ∈ Jf(D) and for all defaults d = α:β1,...,βn / γ ∈ D the following: (R1) ⊤/cβ ∈ D′_{E′} ⇔ β is consistent with E, (R2) ⊤/bd ∈ D′_{E′} ⇔ cβi ∉ A for some justification βi, (R3) α/γ ∈ D′_{E′} ⇔ (γ is consistent with E and bd ∉ A), (R4) α∧¬γ/f ∈ D′_{E′} ⇔ bd ∉ A, and (R5) f/p ∈ D′_{E′} ⇔ q ∉ A, f/q ∈ D′_{E′} ⇔ r ∉ A, and f/r ∈ D′_{E′} ⇔ p ∉ A.
Proof sketch. Use the definition of D′_{E′} and Lemma 2 repeatedly. □
Lemma 4. Let ⟨D′, T⟩ be the translation TrSN(⟨D, T⟩) as given in Definition 3. If E′ ⊆ L′(A ∪ A′) is a propositionally consistent extension of ⟨D′, T⟩, then (i) E = E′ ∩ L = Cn(T ∪ Γ) is propositionally closed and consistent, and (ii) E′ = Cn(T ∪ Γ ∪ A) = Cn(E ∪ A) where Γ ⊆ {γ ∈ L | α/γ ∈ D′_{E′}} and A = {a ∈ A′ | ⊤/a ∈ D′_{E′}}. Moreover, A ∩ {f, p, q, r} = ∅.
Proof sketch. Let E′ = Cn^{D′_{E′}}(T) be a propositionally consistent extension of ⟨D′, T⟩, implying that E = E′ ∩ L is propositionally closed and consistent. Then Lemma 1 implies that E′ = Cn(T ∪ Γ′) where Γ′ ⊆ {γ | α/γ ∈ D′_{E′}}. The possible members of D′_{E′} are listed in Lemma 3. Thus we can partition Γ′ into two disjoint sets of consequents Γ = Γ′ ∩ L and A = Γ′ ∩ A′. Note that Γ ⊆ {γ ∈ L | α/γ ∈ D′_{E′}} and A ⊆ {a ∈ A′ | ⊤/a ∈ D′_{E′}}. Then E′ = Cn(T ∪ Γ ∪ A) and E = E′ ∩ L implies E = Cn(T ∪ Γ) by Lemma 2. Thus also E′ = Cn(E ∪ A) holds by closure properties. Moreover, if we assume that f ∈ A, we can establish that p ∈ A ⇔ f/p ∈ D′_{E′} (and similarly for q and r by symmetry). Together with R5, we obtain p ∈ A ⇔ f/p ∈ D′_{E′} ⇔ q ∉ A ⇔ f/q ∉ D′_{E′} ⇔ r ∈ A ⇔ f/r ∈ D′_{E′} ⇔ p ∉ A, a contradiction. Hence f ∉ A is the case and f ∉ E′ follows by Lemma 2.

By Lemma 3, the only rule of D′_{E′} having p as a consequent is f/p (if present by R5). Since E′ = Cn^{D′_{E′}}(T) is consistent and f ∉ E′, we know that p ∉ E′ and p ∉ A by Lemma 2. Thus q ∉ A and r ∉ A for symmetry reasons and A ∩ {f, p, q, r} = ∅. Lemma 3 implies that A = {a ∈ A′ | ⊤/a ∈ D′_{E′}}, as the rules of D′_{E′} that have f, p, q or r as a consequent are not applicable. □
Propositions 1 and 2 establish that ExtSN and Ext are mappings between the propositionally consistent extensions of a default theory ⟨D, T⟩ and the propositionally consistent extensions of the translation TrSN(⟨D, T⟩).

Proposition 1. If a theory E ⊆ L is a propositionally consistent extension of a default theory ⟨D, T⟩ in L(A), then ExtSN(E) ⊆ L′(A ∪ A′) given in Definition 4 is a propositionally consistent extension of TrSN(⟨D, T⟩).

Proof sketch. Assume the definitions given in Definition 4. Let E = Cn^{D_E}(T) be a propositionally consistent extension of ⟨D, T⟩. It follows by Lemma 2 that also E′ = ExtSN(E) = Cn(E ∪ A) is propositionally consistent. The conditions R1–R5 in Lemma 3 are also satisfied. The proof of E′ = Cn^{D′_{E′}}(T) follows.

(⊆) Let us establish A ⊆ Cn^{D′_{E′}}(T) at first. (i) If cβ ∈ A for a justification β ∈ Jf(D), it follows that β is consistent with E by the definition of A. Thus ⊤/cβ ∈ D′_{E′} by R1 and cβ ∈ Cn^{D′_{E′}}(T) follows. (ii) If bd ∈ A for a default d = α:β1,...,βn / γ ∈ D, then some justification βi of d is not consistent with E and cβi ∉ A by the definition of A. Thus ⊤/bd ∈ D′_{E′} by R2 and bd ∈ Cn^{D′_{E′}}(T) is the case. It follows by (i) and (ii) that A ⊆ Cn^{D′_{E′}}(T). Still E = Cn^{D_E}(T) ⊆ Cn^{D′_{E′}}(T) should be established. It can be proved by induction on the lengths of D_E-proofs that if φ ∈ L is D_E-provable from T in k steps, then φ ∈ Cn^{D′_{E′}}(T). Thus E ∪ A ⊆ Cn^{D′_{E′}}(T), implying that also E′ = Cn(E ∪ A) ⊆ Cn^{D′_{E′}}(T).
(⊇) It suffices to show that E′ satisfies the closure properties C1–C3 of Cn^{D′_{E′}}(T). It is clear that T ⊆ E′, as T ⊆ E ⊆ E′, and that E′ is propositionally closed. Moreover, it can be shown that E′ = Cn(E ∪ A) is closed under the rules of D′_{E′} using R1–R5. This requires one to check all the rules α′/γ′ of the forms ⊤/cβ, ⊤/bd, α/γ, α∧¬γ/f, f/p, f/q and f/r. If we assume that α′/γ′ ∈ D′_{E′} (the conditions are given by R1–R5) and α′ ∈ E′, we can establish that γ′ ∈ E′ as well. □
Proposition 2. Let ⟨D, T⟩ be a default theory in L(A) and let ⟨D′, T⟩ be the translation TrSN(⟨D, T⟩) as given in Definition 3. If a theory E′ ⊆ L′(A ∪ A′) is a propositionally consistent extension of ⟨D′, T⟩, then the theory E = Ext(E′) = E′ ∩ L is a propositionally consistent extension of ⟨D, T⟩.

Proof sketch. Let E′ ⊆ L′(A ∪ A′) be a propositionally consistent extension of ⟨D′, T⟩ and E = E′ ∩ L. It follows by Lemma 4 that E is propositionally closed as well as consistent and E′ = Cn(E ∪ A) for a set of atoms A = {a ∈ A′ | ⊤/a ∈ D′_{E′}} so that A ∩ {f, p, q, r} = ∅. The proof of E = Cn^{D_E}(T) follows.

(⊆) Using the conditions R1–R5, it can be shown for all φ ∈ L that φ ∈ Cn^{D′_{E′}}(T) implies φ ∈ Cn^{D_E}(T) by induction on the lengths of D′_{E′}-proofs.

(⊇) Let us establish that E = E′ ∩ L has the closure properties C1–C3 of Cn^{D_E}(T). (C1) Since T ⊆ L and T ⊆ E′, it holds that T ⊆ E = E′ ∩ L. (C2) As noted already, E is propositionally closed. (C3) Consider any d = α:β1,...,βn / γ ∈ D such that α/γ ∈ D_E and α ∈ E. The former implies that each justification βi is consistent with E. Thus ⊤/cβi ∈ D′_{E′} by R1 and cβi ∈ A for each βi. It follows by R2 that ⊤/bd ∉ D′_{E′}, so that bd ∉ A by the definition of A. Thus α∧¬γ/f ∈ D′_{E′} by R4. Now assuming that γ is not consistent with E implies that ¬γ ∈ E, α ∧ ¬γ ∈ E ⊆ E′, and f ∈ E′, a contradiction. Hence γ is consistent with E and α/γ ∈ D′_{E′} by R3. Then α ∈ E implies α ∈ E′ as well as γ ∈ E′ = Cn^{D′_{E′}}(T). Since γ ∈ L, it follows that γ ∈ E. Thus E is closed under the rules of D_E. □
Using the mappings ExtSN and Ext, we can establish the desired one-to-one correspondence for propositionally consistent extensions.

Proposition 3. The propositionally consistent extensions of a default theory ⟨D, T⟩ and of the translation TrSN(⟨D, T⟩) are in one-to-one correspondence.

Proof sketch. Let E1 and E2 be two propositionally consistent extensions of ⟨D, T⟩ such that ExtSN(E1) = ExtSN(E2). It follows by Definition 4 and Lemma 2 that Cn(E1 ∪ A1) = Cn(E2 ∪ A2) where A1 ⊆ A′, A2 ⊆ A′, E1 = Cn(E1 ∪ A1) ∩ L and E2 = Cn(E2 ∪ A2) ∩ L. Thus E1 = E2 and ExtSN is injective.

Let E1′ and E2′ be two propositionally consistent extensions of TrSN(⟨D, T⟩) such that Ext(E1′) = Ext(E2′), i.e. E1′ ∩ L = E2′ ∩ L = E. Let i ∈ {1, 2}. It holds by Lemma 4 that Ei′ = Cn(E ∪ Ai) where Ai = {a ∈ A′ | ⊤/a ∈ D′_{Ei′}}. Thus ⊤/cβ ∈ D′_{E1′} ⇔ ⊤/cβ ∈ D′_{E2′} and cβ ∈ A1 ⇔ cβ ∈ A2 by R1 and the definitions of A1 and A2. Consequently, also bd ∈ A1 ⇔ bd ∈ A2 by R2 and the preceding equivalence. Recall also that A1 ∩ {f, p, q, r} = A2 ∩ {f, p, q, r} = ∅ by Lemma 4. Thus A1 = A2 as well as E1′ = E2′.
The mappings ExtSN and Ext are inverses of each other, as Ext(ExtSN(E)) = ExtSN(E) ∩ L = Cn(E ∪ A) ∩ L = E by Lemma 2. □

Let us now state the main result of this paper: DL and SNDL are of equal expressive power according to the measure provided by PFM translations.

Theorem 2. DL ←→ SNDL.

Proof. (−→) TrSN is obviously polynomial and modular. The one-to-one correspondence of propositionally consistent extensions is established in Propositions 1–3. Proposition 2 implies that these extensions coincide up to L. Therefore TrSN is also faithful. (←−) The identity translation function TrI(⟨D, T⟩) = ⟨D, T⟩ is PFM. □

Marek and Truszczyński [17, Theorem 5.19] propose a translation function TrMT that transforms a default theory ⟨D, T⟩ into a weak semi-normal default theory ⟨D′, T⟩ where D′ contains semi-normal defaults or justification-free defaults (of the form α: / γ) that correspond to monotonic inference rules. The function TrMT translates a default α:β1,...,βn / γ ∈ D into the following defaults: α:cd¹∧β1 / cd¹ , ..., α:cdⁿ∧βn / cdⁿ and cd¹∧...∧cdⁿ: / γ. The default for checking the consistency of a justification βi is semi-normal and almost like ours, except that α is used as a prerequisite. The last rule (which controls the derivation of the consequent γ of the original rule) is not semi-normal. Marek and Truszczyński establish that TrMT is PFM, so that DL ←→ WSNDL. However, this result does not yet establish that DL and SNDL are of equal expressive power. This is because weak semi-normal theories have a richer syntax and are therefore at least as expressive as semi-normal default theories (i.e. SNDL −→ WSNDL). In order to establish SNDL ←→ WSNDL, the key problem is to express justification-free defaults in terms of semi-normal ones. The translational technique behind TrSN provides a solution: a justification-free default α: / γ can be expressed using the semi-normal defaults α:γ / γ and α∧¬γ:f / f together with the set of defaults of C in Example 1.

From the historical perspective, it is also worth mentioning earlier work [14] by Łukaszewicz, who considers the possibilities of translating a default α:β / γ into a semi-normal default α:γ∧β / γ as well as into a normal default α:γ∧β / γ∧β. This is because he argues that normal defaults are the only defaults that one needs in practice. The first step yields a default that resembles the default α:γ∧¬bd / γ introduced by TrSN, but Łukaszewicz does not provide a consistency checking mechanism (as TrSN does). Thus the set of defaults addressed in Example 2 cannot be faithfully captured in his approach. On the other hand, Theorem 1 and EPH indicate that the second translation considered by Łukaszewicz cannot be faithful.
5 Classifying PSNDL in EPH
As already explained in Section 3, there is a further way to constrain defaults, namely by denying prerequisites. This is how we end up with defaults of the form :γ∧β / γ, which are special cases of both semi-normal and prerequisite-free defaults.
This suggests that PSNDL −→ SNDL and PSNDL −→ PDL. However, the author [10] has already established that PDL =⇒ DL, which implies PDL =⇒ SNDL by Theorem 2. Moreover, assuming that SNDL −→ PSNDL would imply SNDL −→ PDL by the relationship PSNDL −→ PDL and the compositionality of PFM translation functions. But this contradicts PDL =⇒ SNDL. Thus SNDL ↛ PSNDL is the case and PDL gives an upper bound for the expressiveness of PSNDL. To examine whether PDL provides a tight bound, we have to consider the possibilities of systematically translating prerequisite-free defaults of the form :β1,...,βn / γ into prerequisite-free and semi-normal ones.

The function TrSN provides a natural starting point for our considerations, as it produces semi-normal defaults. The set of defaults C involved in TrSN has prerequisites, however, and we have to make some rearrangements in order to translate C into a set of prerequisite-free and semi-normal defaults (denoted by C′ below).

Example 3. Consider a prerequisite-free and semi-normal set of defaults C′ = { :p∧¬q / p , :q∧¬r / q , :r∧¬p / r , :p∧¬f / p , :q∧¬f / q , :r∧¬f / r } and theories ∅ and T = {f}. The default theory ⟨C′, ∅⟩ has exactly one extension: E = Cn({p, q, r}), in which the last three of the given defaults are applicable and the justifications of the first three rules are inconsistent with E. However, if the atom f is present in T directly, it prevents the applicability of the last three defaults, so that p, q and r cannot be derived by them. Then the first three defaults are again circularly interdependent (in analogy to the defaults of C when f can be derived), so that no extension results.

The translation function TrSN also produces defaults of the forms α:γ∧¬bd / γ and α∧¬γ:f∧¬bd / f that are not prerequisite-free. Whenever α = ⊤ these defaults reduce to defaults of the forms :γ∧¬bd / γ and ¬γ:f∧¬bd / f. The latter category still has a prerequisite and we have to express the consistency check of γ in some other way. For these reasons, the translation function TrPSN introduces a new atom cγ (meaning that γ is consistent) for consequents γ ∈ Cq(D) as well.
Definition 5. Let D be any set of prerequisite-free defaults and C′ the set of defaults of Example 3. For an individual default d = :β1,...,βn / γ ∈ D, the translation TrPSN(d) = { :bd∧¬cβ1 / bd , ..., :bd∧¬cβn / bd } ∪ { :γ∧¬bd / γ , :f∧¬bd∧¬cγ / f }, and for ⟨D, T⟩, TrPSN(⟨D, T⟩) = ⟨C′ ∪ { :cφ∧φ / cφ | φ ∈ Jf(D) ∪ Cq(D)} ∪ ⋃_{d∈D} TrPSN(d), T⟩.
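Like TrSN, the translation of Definition 5 is a syntactic rewriting, and a sketch analogous to the earlier one can make it concrete. The string-based representation and the helper names below are our own, and no formula simplification is attempted; the demo applies the translation to the justification-free, prerequisite-free default :/a, which exercises the new cγ atom.

```python
# Prerequisite-free defaults as ([justifications], consequent), formulas as strings.

# The fixed prerequisite-free set C' from Example 3.
C_PRIME = [(["p∧¬q"], "p"), (["q∧¬r"], "q"), (["r∧¬p"], "r"),
           (["p∧¬f"], "p"), (["q∧¬f"], "q"), (["r∧¬f"], "r")]

def tr_psn_default(d, name):
    """Translate one default per Definition 5, using fresh atoms
    b_<name> ('d is blocked') and c_<phi> ('phi is consistent')."""
    justs, gamma = d
    b = f"b_{name}"
    blocked = [([f"{b}∧¬c_{beta}"], b) for beta in justs]
    rewrite = ([f"{gamma}∧¬{b}"], gamma)            # :γ∧¬b_d / γ
    guard = ([f"f∧¬{b}∧¬c_{gamma}"], "f")           # :f∧¬b_d∧¬c_γ / f
    return blocked + [rewrite, guard]

def tr_psn(D, T):
    """TrPSN(<D, T>): C', consistency defaults for Jf(D) u Cq(D), and the
    per-default rules. The default part does not depend on T (modularity)."""
    phis = {beta for justs, _ in D for beta in justs} | {g for _, g in D}
    consistency = [([f"c_{phi}∧{phi}"], f"c_{phi}") for phi in sorted(phis)]
    translated = [r for i, d in enumerate(D) for r in tr_psn_default(d, i)]
    return (C_PRIME + consistency + translated, T)

rules, T = tr_psn([([], "a")], [])
for r in rules[len(C_PRIME):]:
    print(r)
```

For :/a the translation contains the consistency default :c_a∧a / c_a together with :a∧¬b_d / a and the guard :f∧¬b_d∧¬c_a / f, which fires (deriving f and thereby engaging C′) exactly when d is unblocked but a is inconsistent.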
In this translation, the semi-normal default ¬γ:f∧¬bd / f is expressed using a default :f∧¬bd∧¬cγ / f which is prerequisite-free and semi-normal. If the refined justification ¬bd∧¬cγ is consistent, then the justifications of the translated default d = :β1,...,βn / γ are all consistent, but γ is not consistent. This is exactly the case when :γ∧¬bd / γ cannot be applied due to the semi-normality of the default (although it should be applied to derive a propositional contradiction). Thus it is natural to derive f by the former rule in this case in order to prevent an extension where d is not properly applied. Let us then define a mapping (a revision of ExtSN).
Definition 6. Let ⟨D, T⟩ be a prerequisite-free default theory in L(A) and A′ = {f, p, q, r} ∪ {cφ | φ ∈ Jf(D) ∪ Cq(D)} ∪ {bd | d ∈ D}. Let ⟨D′, T⟩ be the translation TrPSN(⟨D, T⟩) in L′(A ∪ A′). For every propositionally closed theory E ⊆ L, let ExtPSN(E) = Cn(E ∪ A) ⊆ L′ where A ⊆ A′ is a set of atoms containing (i) the atom cφ for each justification or consequent φ ∈ Jf(D) ∪ Cq(D) that is consistent with E, (ii) the atom bd for each default d = :β1,...,βn / γ ∈ D having a justification βi which is not consistent with E and (iii) the atoms p, q and r.

The following three lemmas correspond to Lemmas 3, 1 and 4, respectively.

Lemma 5. If E ⊆ L(A) is propositionally consistent and closed and A ⊆ A′, then E′ = Cn(E ∪ A) ⊆ L′(A ∪ A′) satisfies for all φ ∈ Jf(D) ∪ Cq(D) and for all d = :β1,...,βn / γ ∈ D the following: (PR1) ⊤/cφ ∈ D′_{E′} ⇔ φ is consistent with E, (PR2) ⊤/bd ∈ D′_{E′} ⇔ cβi ∉ A for some justification βi, (PR3) ⊤/γ ∈ D′_{E′} ⇔ (γ is consistent with E and bd ∉ A), (PR4) ⊤/f ∈ D′_{E′} ⇔ (bd ∉ A and cγ ∉ A), (PR5) ⊤/p ∈ D′_{E′} ⇔ f ∉ A, ⊤/q ∈ D′_{E′} ⇔ f ∉ A, ⊤/r ∈ D′_{E′} ⇔ f ∉ A, and (PR6) ⊤/p ∈ D′_{E′} ⇔ q ∉ A, ⊤/q ∈ D′_{E′} ⇔ r ∉ A, and ⊤/r ∈ D′_{E′} ⇔ p ∉ A.

Lemma 6. If E ⊆ L is an extension of a prerequisite-free default theory ⟨D, T⟩ in L, then E = Cn(T ∪ Γ) where Γ = {γ | ⊤/γ ∈ D_E}.

Lemma 7. Let ⟨D′, T⟩ be the translation TrPSN(⟨D, T⟩) as given in Definition 5. If E′ ⊆ L′(A ∪ A′) is a propositionally consistent extension of ⟨D′, T⟩, then (i) E = E′ ∩ L = Cn(T ∪ Γ) is propositionally closed and consistent and (ii) E′ = Cn(T ∪ Γ ∪ A) = Cn(E ∪ A) where Γ = {γ ∈ L | ⊤/γ ∈ D′_{E′}} and A = {a ∈ A′ | ⊤/a ∈ D′_{E′}}. Moreover, f ∉ A, but {p, q, r} ⊆ A.
Proof sketch. Let E′ = Cn^{D′_{E′}}(T) be a propositionally consistent extension of ⟨D′, T⟩, so that E = E′ ∩ L is propositionally closed and consistent. Lemma 6 implies that E′ = Cn(T ∪ Γ′) where Γ′ = {γ | ⊤/γ ∈ D′_{E′}}, as D′ is prerequisite-free. Let us partition Γ′ into Γ = Γ′ ∩ L and A = Γ′ ∩ A′ by the structure of D′. Then E′ = Cn(T ∪ Γ ∪ A) and E = E′ ∩ L imply E = Cn(T ∪ Γ) and E′ = Cn(E ∪ A) by Lemma 2. If f ∈ A, PR5 and PR6 imply that p ∈ A ⇔ ⊤/p ∈ D′_{E′} ⇔ q ∉ A ⇔ ⊤/q ∉ D′_{E′} ⇔ r ∈ A ⇔ ⊤/r ∈ D′_{E′} ⇔ p ∉ A, a contradiction. Hence f ∉ A and {⊤/p, ⊤/q, ⊤/r} ⊆ D′_{E′} (by PR5), implying that {p, q, r} ⊆ A. □

We are ready to establish that ExtPSN and Ext are mappings between the propositionally consistent extensions of a default theory ⟨D, T⟩ and the propositionally consistent extensions of the translation TrPSN(⟨D, T⟩).

Proposition 4. If a theory E ⊆ L is a propositionally consistent extension of a prerequisite-free default theory ⟨D, T⟩ in L(A), then ExtPSN(E) ⊆ L′(A ∪ A′) given in Definition 6 is a propositionally consistent extension of TrPSN(⟨D, T⟩).

Proof. Assume the definitions given in Definition 5. Let E ⊆ L be a propositionally consistent extension of ⟨D, T⟩. Since E is propositionally consistent, it is clear by
Classifying Semi-Normal Default Logic on the Basis of its Expressive Power
Lemma 2 that also E′ = Ext_PSN(E) = Cn(E ∪ A) is propositionally consistent. Moreover, the conditions PR1–PR6 given in Lemma 5 are met. On the basis of these conditions and the definition of A, the reduction D′_{E′} contains for each φ ∈ Jf(D) ∪ Cq(D) and d = :β1, . . . , βn/γ ∈ D (i) the rule ⊤/c_φ if φ is consistent with E, (ii) the rule ⊤/b_d if some of the justifications βi is not consistent with E, (iii) the rule ⊤/γ if γ is consistent with E and each of the justifications β1, . . . , βn is consistent with E, and (iv) the rules ⊤/p, ⊤/q and ⊤/r. It follows by the items above and the definition of A that Cn^{D′_{E′}}(T) = Cn(T ∪ Γ ∪ A), where Γ contains the consequent γ of a default :β1, . . . , βn/γ if and only if γ is consistent with E and ⊤/γ ∈ D_E. Let us then establish Cn(E ∪ A) = Cn^{D′_{E′}}(T). The key observation is that E satisfies E = Cn^{D_E}(T) = Cn(T ∪ {γ | ⊤/γ ∈ D_E}) by Lemma 6. Since E is propositionally consistent, it follows that each consequent γ for which ⊤/γ ∈ D_E is necessarily consistent with E. Thus E = Cn(T ∪ Γ) and E′ = Cn(E ∪ A) = Cn(T ∪ Γ ∪ A) = Cn^{D′_{E′}}(T) is a propositionally consistent extension of ⟨D′, T⟩. □

Proposition 5. Let ⟨D, T⟩ be a prerequisite-free default theory in L(A) and let ⟨D′, T⟩ be the translation Tr_PSN(⟨D, T⟩) as given in Definition 6. If a theory E′ ⊆ L′(A ∪ A′) is a propositionally consistent extension of ⟨D′, T⟩, then the theory E = Ext(E′) = E′ ∩ L is a propositionally consistent extension of ⟨D, T⟩.

Proof. Let E′ be a propositionally consistent extension of ⟨D′, T⟩. It follows by Lemma 7 that E′ = Cn^{D′_{E′}}(T) = Cn(E ∪ A), where A ⊆ A′ and E = E′ ∩ L is propositionally closed and consistent. The proof of E = Cn^{D_E}(T) follows.

(⊆) Consider any φ ∈ E. It follows that φ ∈ L and T ∪ Γ |= φ by Lemma 7. Note that γ ∈ Γ whenever d = :β1, . . . , βn/γ ∈ D is such that ⊤/γ ∈ D′_{E′}. By PR3, this is the case ⇔ γ is consistent with E and b_d ∉ A. This implies that ⊤/b_d ∉ D′_{E′} by Lemma 7. Thus {c_{β1}, . . . , c_{βn}} ⊆ A by PR2 and {⊤/c_{β1}, . . . , ⊤/c_{βn}} ⊆ D′_{E′} by Lemma 7. Then PR1 implies that each of β1, . . . , βn is consistent with E, so that ⊤/γ ∈ D_E and γ ∈ Cn^{D_E}(T). Thus Γ ⊆ Cn^{D_E}(T). Since also T ⊆ Cn^{D_E}(T) and T ∪ Γ |= φ holds, it follows that φ ∈ Cn^{D_E}(T), because Cn^{D_E}(T) is propositionally closed.

(⊇) (C1) As T ⊆ L and T ⊆ E′, it holds that T ⊆ E = E′ ∩ L. (C2) As noted already, E is propositionally closed. (C3) Let d = :β1, . . . , βn/γ be any default from D and assume that ⊤/γ ∈ D_E. It follows that each of the justifications β1, . . . , βn is consistent with E. Then the rules ⊤/c_{β1}, . . . , ⊤/c_{βn} belong to D′_{E′} (by PR1), implying that {c_{β1}, . . . , c_{βn}} ⊆ A. But then PR2 implies ⊤/b_d ∉ D′_{E′} as well as b_d ∉ A. Moreover, ⊤/γ ∈ D_E implies that γ ∈ E and that γ is consistent with E, as E is consistent. It follows by PR3 that ⊤/γ ∈ D′_{E′} and γ ∈ E′. Since γ ∈ L, we know that γ ∈ E holds as well. Thus E is closed under D_E. □
The one-to-one correspondence of extensions is established in analogy to Proposition 3. By this relationship of extensions, the function Tr_PSN is PFM, and PSNDL and PDL reside in the same class of EPH (this is a straightforward analog of Theorem 2, and we omit the proof of Theorem 3).
T. Janhunen
Proposition 6. The propositionally consistent extensions of a prerequisite-free default theory ⟨D, T⟩ and the propositionally consistent extensions of the translation Tr_PSN(⟨D, T⟩) are in one-to-one correspondence.

Theorem 3. PDL ←→ PSNDL.
6 Conclusions
This paper continues earlier research by the author on classifying non-monotonic logics on the basis of their expressive power. The framework [10] is based on the notion of a polynomial, faithful and modular (PFM) translation function that systematically maps theories of one non-monotonic logic into theories of another such that the semantics of theories is preserved. It is then possible to use the existence or non-existence of such translations between non-monotonic logics as a criterion to rank non-monotonic logics on the basis of their expressive power. This gives rise to the expressive power hierarchy (EPH) of non-monotonic logics. This paper analyzes semi-normal default logic in order to locate its position in EPH. A PFM translation function is presented in order to establish the main result of this paper (Theorem 2): semi-normal default logic (SNDL) and Reiter's default logic (DL) are of equal expressive power, i.e. SNDL ←→ DL. Thus semi-normality is an example of a syntactic restriction that does not affect the expressiveness of DL. In contrast to this, normal and prerequisite-free defaults are already less expressive, as NDL and PDL reside lower in EPH. The result of Theorem 2 has some interesting consequences. The first one relates to Reiter's original definition of defaults, which are assumed to have at least one justification (n > 0), as noted in [17, p. 71]. Recall that we allow justification-free defaults (n ≥ 0). However, Theorem 2 indicates that justification-free defaults do not increase the expressiveness of DL. Moreover, Theorem 2 implies that it is possible to translate arbitrary defaults into defaults that have only a single justification (n = 1). This tightens Marek and Truszczyński's result that unitary defaults (0 ≤ n ≤ 1) are sufficient [17, Corollary 5.20]. The equality of SNDL and DL implies also that SNDL ↛ NDL, i.e. there is no PFM translation function from SNDL to normal default logic (NDL), since it is already known that DL ↛ NDL [10].
Nevertheless, a direct counter-example is given in this paper for illustrative purposes (see Theorem 1). The structure of EPH implies that SNDL has greater expressive power than Moore's autoepistemic logic (AEL), so that SNDL ↛ AEL holds in harmony with [5]. The effects of prerequisite-freedom are also evaluated in this paper in conjunction with semi-normality. The resulting logic PSNDL turns out to be less expressive than SNDL. In fact, PSNDL resides in the same class as PDL, as indicated by Theorem 3. This is intuitive: since DL ←→ SNDL, it is natural to expect that PDL ←→ PSNDL, as the syntaxes of PDL and PSNDL are obtained by constraining those of DL and SNDL in the same way. One of the implications of Theorem 3 is also that NDL ↮ PSNDL. This indicates that NDL has features (rules that are close to monotonic inference rules) that cannot be captured in PSNDL, and vice versa (existence of extensions is not guaranteed in PSNDL).
References

1. P.A. Bonatti and T. Eiter. Querying disjunctive databases through nonmonotonic logics. Theoretical Computer Science, 160:321–363, 1996.
2. D.W. Etherington. Formalizing nonmonotonic reasoning systems. Artificial Intelligence, 31:41–85, 1987.
3. D.W. Etherington. Relating default logic and circumscription. In Proceedings of IJCAI'87, pages 489–494, Milan, Italy, August 1987. Morgan Kaufmann.
4. D.W. Etherington. Reasoning with Incomplete Information. Pitman, London, 1988.
5. G. Gottlob. Translating default logic into standard autoepistemic logic. Journal of the Association for Computing Machinery, 42(2):711–740, 1995.
6. T. Imielinski. Results on translating defaults to circumscription. Artificial Intelligence, 32:131–146, 1987.
7. T. Janhunen. Representing autoepistemic introspection in terms of default rules. In Proceedings of ECAI'96, pages 70–74, Budapest, Hungary, 1996. John Wiley.
8. T. Janhunen. Separating disbeliefs from beliefs in autoepistemic reasoning. In J. Dix, U. Furbach, and A. Nerode, editors, Proceedings of LPNMR'97, pages 132–151, Dagstuhl, Germany, July 1997. Springer-Verlag. LNAI 1265.
9. T. Janhunen. On the intertranslatability of autoepistemic, default and priority logics, and parallel circumscription. In Proceedings of JELIA'98, pages 216–232, Dagstuhl, Germany, October 1998. Springer-Verlag. LNAI 1489.
10. T. Janhunen. On the intertranslatability of non-monotonic logics. Annals of Mathematics and Artificial Intelligence (issue on JELIA'98). Accepted for publication.
11. K. Konolige. On the relation between default and autoepistemic logic. Artificial Intelligence, 35:343–382, 1988.
12. K. Konolige. On the relation between autoepistemic logic and circumscription. In Proceedings of IJCAI'89, pages 1213–1218, Detroit, Michigan, USA, August 1989.
13. V. Lifschitz. Computing circumscription. In Proceedings of IJCAI'85, pages 121–127, Los Angeles, California, USA, August 1985. Morgan Kaufmann.
14. W. Lukaszewicz.
Two results on default logic. In Proceedings of IJCAI'85, pages 459–461, Los Angeles, California, August 1985.
15. W. Marek, G.F. Schwarz, and M. Truszczyński. Modal nonmonotonic logics: Ranges, characterization, computation. Journal of the ACM, 40(4):963–990, 1993.
16. W. Marek and M. Truszczyński. Modal logic for default reasoning. Annals of Mathematics and Artificial Intelligence, 1:275–302, 1990.
17. W. Marek and M. Truszczyński. Nonmonotonic Logic: Context-Dependent Reasoning. Springer-Verlag, Berlin, 1993.
18. J. McCarthy. Circumscription—a form of non-monotonic reasoning. Artificial Intelligence, 13:27–39, 1980.
19. R.C. Moore. Semantical considerations on nonmonotonic logic. In Proceedings of IJCAI'83, pages 272–279, Karlsruhe, FRG, August 1983. Morgan Kaufmann.
20. I. Niemelä. A unifying framework for nonmonotonic reasoning. In Proceedings of ECAI'92, pages 334–338, Vienna, Austria, August 1992. John Wiley.
21. R. Reiter. A logic for default reasoning. Artificial Intelligence, 13:81–132, 1980.
22. G. Schwarz. On embedding default logic into Moore's autoepistemic logic. Artificial Intelligence, 80:349–359, 1996.
23. M. Truszczyński. Modal interpretations of default logic. In Proceedings of IJCAI'91, pages 393–398, Sydney, Australia, August 1991. Morgan Kaufmann.
24. X. Wang, J.-H. You, and L.Y. Yuan. Nonmonotonic reasoning by monotonic inferences with priority constraints. In Proceedings of the 2nd International Workshop on Non-Monotonic Extensions of LP, pages 91–109. Springer, 1996. LNAI 1216.
Locally Determined Logic Programs

Douglas Cenzer¹, Jeffrey B. Remmel², and Amy Vanderbilt¹

¹ Department of Mathematics, University of Florida, P.O. Box 118105, Gainesville, Florida 32611
[email protected], fax: 352-392-8357
² Department of Mathematics, University of California at San Diego, La Jolla, CA 92093
[email protected]
Abstract. In general, the set of stable models of a recursive propositional logic program can be quite complex. For example, it follows from results of Marek, Nerode, and Remmel [8] that there exist finite predicate logic programs and recursive propositional logic programs which have stable models but no hyperarithmetic stable models. In this paper, we shall define several conditions which ensure that a recursive logic program has a stable model which is recursive.
1 Introduction
The stable model semantics of logic programs has been extensively studied. Unfortunately, the set of stable models of a recursive propositional logic program with negation, or even of a finite predicate logic program with negation, can be quite complex. For example, in [7], it is shown that for any recursive propositional logic program P, there is an infinite branching recursive tree T_P such that there is an effective 1:1 degree preserving correspondence between the set of stable models of P and the set of infinite paths through T_P. In [8] it is shown that given any infinite branching recursive tree T, there exists a recursive propositional logic program P_T such that there is an effective 1:1 correspondence between the set of infinite paths through T and the set of stable models of P_T. Moreover, in [8], it is shown that the same result holds if we replace recursive logic programs by finite predicate logic programs with negation. These results imply that the set of stable models of a recursive propositional logic program or a finite predicate logic program can be extremely complex. For example, it follows from these results that there is a finite predicate logic program which has a stable model but has no stable model which is hyperarithmetic. The main motivation for this paper was to develop conditions on recursive logic programs P which would guarantee the existence of a well-behaved stable model for P, i.e. a stable model of P which is recursive or possibly even polynomial time. In this paper, we shall give several conditions which guarantee that a recursive propositional logic program has a recursive stable model. We should note that there are several conditions in the literature which guarantee that a recursive propositional logic program has a stable model of relatively low complexity with respect to the arithmetic hierarchy. Clearly, the first such condition
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 34–48, 1999.
© Springer-Verlag Berlin Heidelberg 1999
is to consider recursive Horn logic programs. In that case, it is implicit in [11] and explicitly proved in [1] that the least model of a recursive Horn program is a recursively enumerable (r.e.) set and that every r.e. set can appear as the least model of a recursive Horn program. Another important class of logic programs is stratified logic programs, where one can single out a particular model called the perfect model. This model is the unique stable model of such a program. Apt and Blair [3] showed that a recursive logic program with n strata must have a perfect model which is Σ^0_n and that there is a recursive logic program P with n strata such that the perfect model of P is Σ^0_n complete. In [8], Marek, Nerode, and Remmel considered the two conditions of being (i) locally finite, that is, each atom of the Herbrand base of P has at most finitely many minimal derivations from P, and (ii) rps, that is, there is an effective procedure to find these possible derivations. They showed that these conditions ensure that there is a highly recursive tree T_P and an effective 1:1 degree preserving correspondence between the set of stable models of P and the set of infinite paths through T_P. Here a tree T is highly recursive if T is recursive, finitely branching, and there is an effective procedure which, given any node η ∈ T, produces the set of nodes in T which immediately extend η. One consequence of this fact is that a recursive rps logic program which has a stable model always has a stable model M whose jump is recursive in 0′. In addition, Marek, Nerode, and Remmel [9] generalized Reiter's concept of normal default theories to logic programs, FC-normal logic programs in the language of [9], and showed that FC-normal logic programs always have a stable model which is r.e. in 0′. The outline of this paper is as follows. In section 1, we shall define the concept of proof schemes and of FC-normal logic programs which will be crucial for later developments.
In section 2, we shall introduce the new notion of a locally determined logic program. Given a recursive logic program P and an effective listing HP = {a0, a1, . . .} of the atoms of the Herbrand base of P, we say that n is a level of P if, roughly, whenever there is a proof scheme p for a sentence ai with i ≤ n, then there exists a proof scheme q involving only elements from {a0, . . . , an} and their negations such that the restraint set of q is contained in the restraint set of p. We then say that P is locally determined if for every k ≥ 0, there is an nk ≥ k such that nk is a level of P. We say that P is effectively locally determined if one can effectively find such an nk from k. We shall show in section 2 that if P is an effectively locally determined recursive logic program, then there is a highly recursive tree T_P such that there is an effective 1:1 degree preserving correspondence between the stable models of P and the set of infinite paths through T_P. Thus being effectively locally determined is another condition, much like the rps property, which reduces the complexity of the set of stable models of P. In section 3, we shall introduce several strengthenings of local determinedness which will ensure that a recursive logic program always has a recursive stable model.
2 Propositional Logic Programs, Proof Schemes, and Normality
In this section, we shall introduce several key notions which will be used in later sections. In particular, we shall carefully define the notion of recursive logic programs. Then we shall define the notion of proof schemes, which will lead to the definitions of locally finite programs and rps programs. Finally, we shall describe the extension of Reiter's concept of normal default theories to recursive logic programs, following [9]. A program clause is an expression of the form

C = p ← q1, . . . , qn, ¬r1, . . . , ¬rm    (1)
where p, q1, . . . , qn, r1, . . . , rm are atomic formulas in some propositional language L. A program is a set of clauses of the form (1). A clause C is called a Horn clause if m = 0. We let Horn(P) denote the set of all Horn clauses of P. HP is the Herbrand base of P, that is, the set of all atomic formulas of the language of P. If P is a program and M ⊆ HP is a subset of the Herbrand base, define the operator T_{P,M}: P(HP) → P(HP) where T_{P,M}(I) is the set of all p such that there exists a clause C = p ← q1, . . . , qn, ¬r1, . . . , ¬rm in P such that q1 ∈ I, . . . , qn ∈ I and {r1, . . . , rm} ∩ M = ∅. The operator T_{P,M} is a monotonic finitizable operator, see [2], and hence possesses a least fixpoint F_{P,M}. Given a program P and M ⊆ HP, the Gelfond-Lifschitz reduct of P is defined as follows. For every clause C of P, execute the following operation: if some atom a belongs to M and its negation ¬a appears in C, then eliminate C altogether. In the remaining clauses that have not been eliminated by the operation above, eliminate all the negated atoms. The resulting program P^{GL}_M is a Horn propositional program (possibly infinite). The program P^{GL}_M possesses a least Herbrand model. If that least model of P^{GL}_M coincides with M, then M is called a stable model for P. Gelfond and Lifschitz [6] proved that every stable model of P is a minimal model of P and that M is a stable model of P iff M = F_{P,M}. Having characterized stable models as fixpoints of (parametrized) operators, consider the form of elements of F_{P,M}. A P,M-derivation of an atom p is a sequence ⟨p1, . . . , ps⟩ such that (i) ps = p and (ii) for every i ≤ s, either "pi ←" is a member of P or there is a clause C = "pi ← q1, . . . , qn, ¬r1, . . . , ¬rm" such that C ∈ P, q1, . . . , qn ∈ {p1, . . . , pi−1}, and r1, . . . , rm ∉ M. It is easy to show that F_{P,M} is the set of all atoms possessing a P,M-derivation.
Thus M is a stable model of the program P if and only if M consists exactly of those atoms which possess a P,M-derivation. The property that a sequence ⟨p1, . . . , ps⟩ is a P,M-derivation of an atom p does not depend on the whole set M but only on the intersection of M and a certain finite set of atoms that occur in the derivation. In order that the sequence ⟨p1, . . . , ps⟩ be a P,M-derivation of an atom ps, some atoms must be left out of the set M. Each derivation depends on a finite number of such omitted atoms.
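For a finite program, the Gelfond-Lifschitz construction above can be checked directly. The sketch below is not from the paper: the three-clause program is a made-up example, and the search simply tries every candidate set M, computing the reduct and then the least Herbrand model of the resulting Horn program by fixpoint iteration.

```python
from itertools import chain, combinations

# A clause p <- q1,...,qn, not r1,...,rm is encoded as (p, pos, neg).
# Hypothetical three-clause program (not from the paper).
P = [
    ("a", (), ()),           # a <-
    ("b", ("a",), ("c",)),   # b <- a, not c
    ("c", ("a",), ("b",)),   # c <- a, not b
]

def gl_reduct(program, M):
    """Gelfond-Lifschitz reduct: delete clauses whose negated atoms meet M,
    then strip the remaining negative literals."""
    return [(p, pos) for (p, pos, neg) in program if not set(neg) & M]

def least_model(horn):
    """Least Herbrand model of a Horn program, by fixpoint iteration."""
    I = set()
    while True:
        new = {p for (p, pos) in horn if set(pos) <= I}
        if new <= I:
            return I
        I |= new

def is_stable(program, M):
    """M is stable iff the least model of the reduct equals M."""
    return least_model(gl_reduct(program, M)) == M

atoms = sorted({p for (p, _, _) in P})
candidates = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
stable = [set(s) for s in candidates if is_stable(P, set(s))]
print(stable)  # the two stable models: {a, b} and {a, c}
```

The example also illustrates that stable models need not be unique: the mutual negative loop between b and c yields two of them.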
In other words, if we classify the atoms according to whether they are "in" or "out" of M, the property that a sequence ⟨p1, . . . , ps⟩ is a P,M-derivation depends only on whether a finite number of elements are out of M. The notion of a proof scheme formalizes this idea. A (P-)proof scheme for an atom p is a sequence S = ⟨⟨pi, Ci, Ui⟩⟩_{i=1}^{s} of triples such that for each triple ⟨pi, Ci, Ui⟩, pi ∈ HP, Ci ∈ P is a clause with head pi, and Ui is a finite subset of HP. Such a sequence S is a proof scheme for p if (1) ps = p, and for every i, (2) Ci = pi ← q1, . . . , qn, ¬r1, . . . , ¬rm, where {q1, . . . , qn} ⊆ {p1, . . . , pi−1} and Ui = Ui−1 ∪ {r1, . . . , rm}. We call p the conclusion of S, written p = cln(S), and the set Us the support of S, written supp(S). We say that a subset M ⊆ HP admits a proof scheme S = ⟨⟨pi, Ci, Ui⟩⟩_{i=1}^{s} if M ∩ Us = ∅. The following proposition, due to Marek, Nerode, and Remmel [7], characterizes stable models in terms of the existence of proof schemes.

Proposition 1. Let M ⊆ HP. Then M is a stable model of P if and only if (1) for every p ∈ M, there is a proof scheme S for p such that M admits S, and (2) for every p ∉ M, there is no proof scheme S for p such that M admits S.

As stated in the introduction, restrictions on the number of proof schemes greatly reduce the possible complexity of the set of stable models of a recursive logic program P. But how many derivation schemes for an atom p can there be? If we allow P to be infinite, then it is easy to construct an example with infinitely many derivations of a single atom. Moreover, given two proof schemes, one can insert one into the other (increasing appropriately the sets Ui in this process, with obvious restrictions). Thus various clauses Ci may be immaterial to the purpose of deriving p. This leads us to introduce a natural relation ≺ on proof schemes, using a well-known device from proof theory.
Namely, we define S1 ≺ S2 if S1, S2 have the same conclusion and every clause appearing in S1 also appears in S2. Then a minimal proof scheme for p is defined to be a proof scheme S for p such that whenever S′ is a proof scheme for p and S′ ≺ S, then S ≺ S′. Note that ≺ is reflexive and transitive, but ≺ is not antisymmetric. However, it is well-founded. That is, given any proof scheme S, there is an S′ such that S′ ≺ S and, for every S′′, if S′′ ≺ S′, then S′ ≺ S′′. Moreover, the associated equivalence relation, S ≡ S′, defined by S ≺ S′ and S′ ≺ S, has finite equivalence classes.

Example 1. Let P1 be the following program:
C1: p(0) ← ¬q(Y).
C2: nat(0) ← .
C3: nat(s(X)) ← nat(X).
Then the atom p(0) possesses infinitely many minimal proof schemes. For instance, each one-element sequence Si = ⟨⟨p(0), C1Θi, {s^i(0)}⟩⟩, where Θi is the operation of substituting s^i(0) for Y, is a minimal proof scheme for p(0). However, if the program P2 is the result of replacing clause C1 by C1′: q(s(Y)) ← ¬q(Y), each atom possesses only finitely many minimal proof schemes.
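For a finite propositional program, the supports of minimal proof schemes can be computed by a simple fixpoint iteration. The sketch below is hypothetical: the four-clause program is invented, and "minimal" is simplified to inclusion-minimal supports rather than the full ≺ order on schemes.

```python
# Hypothetical propositional program; clauses are (head, positive, negative).
P = [
    ("p", (), ("q",)),        # p <- not q
    ("q", (), ("r",)),        # q <- not r
    ("r", (), ()),            # r <-
    ("p", ("r",), ("s",)),    # p <- r, not s
]

def minimal_supports(program):
    """For each atom, the inclusion-minimal supports (restraint sets) of its
    proof schemes, computed by iterating until no new support appears."""
    supp = {}
    changed = True
    while changed:
        changed = False
        for (head, pos, neg) in program:
            if any(q not in supp for q in pos):
                continue  # some positive body atom is not yet derivable
            # combine the clause's own restraints with one support per body atom
            candidates = [frozenset(neg)]
            for q in pos:
                candidates = [c | u for c in candidates for u in supp[q]]
            for c in candidates:
                cur = supp.setdefault(head, set())
                if c not in cur and not any(u <= c for u in cur):
                    cur.add(c)  # keep only supports with no known subset
                    changed = True
    return supp

supp = minimal_supports(P)
print({a: sorted(map(sorted, us)) for a, us in supp.items()})
```

On this program, p ends up with two supports, {q} and {s}, reflecting its two clauses; the sets D_p of the next paragraph are just the unions of these supports.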
We shall call a program P locally finite if for every atom p, there are only finitely many minimal proof schemes with conclusion p. If P is locally finite and p ∈ HP, we let D_p denote the union of all supports of minimal proof schemes of p. Clearly, for any M ⊆ HP, the question of whether p has a P,M-derivation depends only on M ∩ D_p. This implies that if P is locally finite, when we attempt to construct a subset M ⊆ HP which is a stable model for P, we can apply a straightforward (although still infinite) tree construction to produce such an M, if such an M exists at all. Next, we need to make the notion of a recursive program precise. First, assume that we have a Gödel numbering of the elements of the Herbrand base HP. Thus, we can think of each element of the Herbrand base as a natural number. If p ∈ HP, write c(p) for the code or Gödel number of p. Let ω = {0, 1, 2, . . .}. Assume [,] is a fixed recursive pairing function which maps ω × ω onto ω and which has recursive projection functions π1 and π2, defined by πi([x1, x2]) = xi for all x1 and x2 and i ∈ {1, 2}. Code a finite sequence ⟨x1, . . . , xn⟩ for n ≥ 3 by the usual inductive definition [x1, . . . , xn] = [x1, [x2, . . . , xn]]. Next, code finite subsets of ω via "canonical indices". The canonical index of the empty set, ∅, is the number 0, and the canonical index of a nonempty set {x0, . . . , xn}, where x0 < . . . < xn, is Σ_{j=0}^{n} 2^{x_j}. Let F_k denote the finite set whose canonical index is k. Once finite sets and sequences of natural numbers have been coded, we can code more complex objects such as clauses, proof schemes, etc. as follows. Let the code c(C) of a clause C = p ← q1, . . . , qn, ¬r1, . . . , ¬rm be [c(p), k, l], where k is the canonical index of the finite set {c(q1), . . . , c(qn)} and l is the canonical index of the finite set {c(r1), . . . , c(rm)}. Similarly, let the code c(S) of a proof scheme S = ⟨⟨pi, Ci, Ui⟩⟩_{i=1}^{s} be [s, [[c(p1), c(C1), c(U1)], . . . , [c(ps), c(Cs), c(Us)]]], where for each i, c(Ui) is the canonical index of the finite set of codes of the elements of Ui. The first coordinate of the code of a proof scheme is the length of the proof scheme. Once we have defined the codes of proof schemes, then for locally finite programs we can define the code of the set D_p consisting of the union of the supports of all minimal proof schemes for p. Finally, we code recursive sets as natural numbers. Let φ0, φ1, . . . be an effective list of all partial recursive functions, where φe is the partial recursive function computed by the e-th Turing machine. By definition, a (recursive) index of a recursive set R is an e such that φe is the characteristic function of R. Call a program P recursive if the set of codes of the Herbrand universe HP is recursive and the set of codes of the clauses of the program P is recursive. If P is a recursive program, then by an index of P we mean the code of a pair [u, p] where u is an index of the recursive set of all codes of elements in HP and p is an index of the recursive set of the codes of all clauses in P. For the rest of this paper, we shall identify an object with its code as described above. This means that we shall think of the Herbrand universe of a program, and the program itself, as subsets of ω, and clauses, proof schemes, etc. as elements of ω. We also need to define various types of recursive trees and Π^0_1 classes. Let ω^{<ω} be the set of all finite sequences from ω and let 2^{<ω} be the set of all finite sequences of 0's and 1's. Given α = ⟨α1, . . . , αn⟩ and β = ⟨β1, . . . , βk⟩ in ω^{<ω}, write α ⊑ β if α is an initial segment of β, i.e., if n ≤ k and αi = βi for i ≤ n. In this paper, we identify each finite sequence α = ⟨α1, . . . , αn⟩ with its code c(α) = [n, [α1, . . . , αn]] in ω. Let 0 be the code of the empty sequence ∅. When we say that a set S ⊆ ω^{<ω} is recursive, recursively enumerable, etc., what we mean is that the set {c(α) : α ∈ S} is recursive, recursively enumerable, etc. Define a tree T to be a nonempty subset of ω^{<ω} such that T is closed under initial segments. Call a function f: ω → ω an infinite path through T provided that for all n, ⟨f(0), . . . , f(n)⟩ ∈ T. Let [T] be the set of all infinite paths through T. Call a set A of functions a Π^0_1-class if there exists a recursive predicate R such that A = {f: ω → ω : ∀n R(n, [f(0), . . . , f(n)])}. Call a Π^0_1-class A recursively bounded if there exists a recursive function g: ω → ω such that ∀f ∈ A ∀n (f(n) ≤ g(n)). It is not difficult to see that if A is a Π^0_1-class, then A = [T] for some recursive tree T ⊆ ω^{<ω}. A tree T ⊆ ω^{<ω} is highly recursive if T is a recursive finitely branching tree and also there is a recursive procedure which, applied to α = ⟨α1, . . . , αn⟩ in T, produces a canonical index of the set of immediate successors of α in T. Then if C is a recursively bounded Π^0_1-class, it is easy to show that C = [T] for some highly recursive tree T ⊆ ω^{<ω}, see [5]. For any set A ⊆ ω, the set A′ = {e : φ^A_e(e) is defined} is called the jump of A; we let 0′ denote the jump of the empty set ∅. We write A ≤_T B if A is Turing reducible to B and A ≡_T B if A ≤_T B and B ≤_T A. A function f: C → D is said to be degree-preserving if A ≡_T f(A) for all A ∈ C. Even if P is a locally finite program, there is no guarantee that the global behavior of the function p ↦ D_p, mapping ω into ω, has any sort of effective properties. Thus we are led to define the following.
We say that a locally finite recursive program P possesses a recursive proof structure (rps) if (1) P is locally finite, and (2) the function p ↦ D_p is recursive. A locally finite recursive program with an rps is called an rps program. We end this section with the notion of FC-normal logic programs as defined by Marek, Nerode, and Remmel [9]. Recall that Horn(P) is the set of Horn clauses of a logic program P. We let T_{Horn(P)} denote the immediate provability operator associated with Horn(P), see [2]. Call a family Con of subsets of HP a consistency property over P if it satisfies the following conditions:
1. ∅ ∈ Con.
2. If A ⊆ B and B ∈ Con, then A ∈ Con.
3. Con is closed under directed unions.
4. If A ∈ Con, then A ∪ T_{Horn(P)}(A) ∈ Con.
Conditions (1)–(3) are Scott's conditions for information systems. Condition (4) connects "consistent" sets of atoms to the Horn part of the program, i.e., if A is consistent, then adding atoms provable from A by means of the Horn part of the program preserves "consistency". The following fact is easy to prove.

Proposition 2. If Con is a consistency property with respect to P and A ∈ Con, then T_{Horn(P)} ⇑ ω(A) ∈ Con.
Here, for a Horn program Q, T_Q ⇑ ω(A) is the cumulative fixpoint of T_Q over A. Proposition 2 says that condition (4) in the definition of a consistency property implies that the cumulative closure of a "consistent" set of atoms under T_{Horn(P)} is still "consistent". Given a consistency property, we define the concept of an FC-normal program with respect to that property. Here FC stands for "Forward Chaining".

Definition 1. (a) Let P be a program and let Con be a consistency property with respect to P. Call P FC-normal with respect to Con if for every clause C = p ← q1, . . . , qn, ¬r1, . . . , ¬rm such that C ∈ ground(P) − ground(Horn(P)) and every consistent fixpoint A of T_{Horn(P)}, whenever q1, . . . , qn ∈ A and p, r1, . . . , rm ∉ A, then
(1) A ∪ {p} ∈ Con and
(2) A ∪ {p, ri} ∉ Con for all 1 ≤ i ≤ m.
(b) P is called FC-normal if there exists a consistency property Con such that P is FC-normal with respect to Con.

Example 2. Let the Herbrand base consist of the atoms a, b, c, d, e, f. Let the consistency property be defined by the following condition: A ∉ Con if and only if either {c, d} ⊆ A or {e, f} ⊆ A. Now consider the following program.
(1) a ←,
(2) b ← c,
(3) c ← b,
(4) c ← a, ¬d,
(5) e ← c, ¬f .
It is not difficult to check that this program is FC-normal with respect to the consistency property described above. Moreover, one can easily check that there is a unique stable model M = {a, b, c, e}. If we add to this program the clause f ← c, ¬e, the resulting program is still FC-normal, but now there are two stable models, M1 = {a, b, c, e} and M2 = {a, b, c, f}. Marek, Nerode, and Remmel [9] showed that FC-normal programs have many of the properties that are possessed by normal default theories.

Theorem 1. If P is an FC-normal program, then P possesses a stable model.

Theorem 2. If P is an FC-normal program with respect to the consistency property Con and I ∈ Con, then P possesses a model I′ such that I ⊆ I′.

Marek, Nerode, and Remmel proved Theorems 1 and 2 via a forward chaining algorithm which can be applied to FC-normal programs of any cardinality. Since in our case we are dealing with only recursive, and hence countable, programs, we shall give only the countable version of their forward chaining construction. That is, suppose we fix some well-ordering ≺ of ground(P) − ground(Horn(P)) of order type ω. Thus, the well-ordering ≺ determines some listing {cn : n ∈ ω} of the clauses of ground(P) − ground(Horn(P)). The forward chaining construction then defines an increasing sequence of sets {T^≺_n}_{n∈ω} in stages and a set T^≺ = ∪_n T^≺_n as follows.
Stage 0. Let T^≺_0 = T_{Horn(P)} ⇑ ω(∅).
Stage n + 1. Let ℓ(n + 1) be the least s ∈ ω such that c_s = φ ← α1, . . . , αk, ¬β1, . . . , ¬βm where α1, . . . , αk ∈ T^≺_n and β1, . . . , βm, φ ∉ T^≺_n. If there is no such ℓ(n + 1), let T^≺_{n+1} = T^≺_n. Otherwise, let T^≺_{n+1} = T_{Horn(P)} ⇑ ω(T^≺_n ∪ {p_{ℓ(n+1)}}), where p_{ℓ(n+1)} is the head of c_{ℓ(n+1)}.

Example 3. If we consider the final extended program of Example 2, it is easy to check that any ordering ≺1 in which the clause C1 = e ← c, ¬f precedes the clause C2 = f ← c, ¬e will have T^{≺1} = M1, while any ordering ≺2 in which C2 precedes C1 will have T^{≺2} = M2.

This given, Marek, Nerode, and Remmel proved the following results.

Theorem 3. If P is a countable FC-normal program and ≺ is any well-ordering of ground(P) − ground(Horn(P)) of order type ω, then
(1) T^≺ is a stable model of P, where T^≺ is constructed via the countable forward chaining algorithm.
(2) (completeness of the construction) Every stable model of P is of the form T^≺ for a suitably chosen ordering ≺ of ground(P) − ground(Horn(P)) of order type ω, where T^≺ is constructed via the countable forward chaining algorithm.

Theorem 4. If P is an FC-normal logic program with respect to Con, then
(1) every stable model M of P is in Con and
(2) if E1 and E2 are two distinct stable models of P, then E1 ∪ E2 ∉ Con.

FC-normal programs also possess a "semi-monotonicity" property.

Theorem 5. Let P1, P2 be two programs such that P1 ⊆ P2 but H(P1) = H(P2). Assume, in addition, that both are FC-normal with respect to the same consistency property. Then for every stable model M1 of P1, there is a stable model M2 of P2 such that (1) M1 ⊆ M2 and (2) NG(M1, P1) ⊆ NG(M2, P2). Here, given a logic program P and a stable model M, we let NG(M, P) equal the set of all clauses c = φ ← α1, . . . , αk, ¬β1, . . . , ¬βm in ground(P) such that α1, . . . , αk ∈ M and β1, . . . , βm ∉ M.
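For a finite program, the stage construction above can be sketched directly. The code below is only a sketch: it assumes the input program is FC-normal (the consistency property Con is not represented), and it uses the final extended program of Example 2, for which the two clause orderings of Example 3 reproduce M1 and M2.

```python
def horn_closure(horn, A):
    """T_Horn(P) ⇑ ω(A): close the set A under the Horn clauses."""
    I = set(A)
    while True:
        new = {p for (p, pos, _neg) in horn if set(pos) <= I}
        if new <= I:
            return I
        I |= new

def forward_chain(program, order):
    """Countable forward-chaining construction, restricted to a finite
    program: non-Horn clauses are tried in the given order, and each
    clause application is followed by a Horn closure."""
    horn = [c for c in program if not c[2]]
    nonhorn = sorted((c for c in program if c[2]), key=order.index)
    T = horn_closure(horn, set())          # Stage 0
    changed = True
    while changed:                         # Stages n + 1
        changed = False
        for (p, pos, neg) in nonhorn:
            if set(pos) <= T and p not in T and not set(neg) & T:
                T = horn_closure(horn, T | {p})
                changed = True
                break                      # restart from the least clause
    return T

# Final extended program of Example 2; clauses are (head, positive, negative).
C4 = ("c", ("a",), ("d",))   # c <- a, not d
C5 = ("e", ("c",), ("f",))   # e <- c, not f
C6 = ("f", ("c",), ("e",))   # f <- c, not e
P = [("a", (), ()), ("b", ("c",), ()), ("c", ("b",), ()), C4, C5, C6]

M1 = forward_chain(P, [C4, C5, C6])  # C5 before C6, as in Example 3
M2 = forward_chain(P, [C4, C6, C5])  # C6 before C5
```

Running the two orderings yields M1 = {a, b, c, e} and M2 = {a, b, c, f}, matching Example 3 and illustrating the completeness clause of Theorem 3.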
3 Locally Determined Propositional Logic Programs
In this section, we shall introduce the key notion of a locally determined logic program P. Informally, a locally determined logic program P is one in which the existence of a proof scheme for an atom ai (or the lack thereof) can be determined by examining only clauses or proof schemes involving some initial segment of the Herbrand base of P. More formally, fix some countable logic program P and some listing a0, a1, . . . of the atoms of
42
D. Cenzer, J.B. Remmel, and A. Vanderbilt
Herbrand base of P without repetitions. (We make the convention that if P is a recursive logic program, then there is some recursive function h such that h(i) = ai.) Then, given a proof scheme or a clause φ, we write max(φ) for max({i : ai occurs in φ}). We write Pn for the set of all clauses C ∈ P such that max(C) ≤ n, and let An = {a0, . . . , an}.

Definition 2. We say that n is a level of P if for all S ⊆ {a0, . . . , an} and all i ≤ n, whenever there exists a proof scheme ψ such that cln(ψ) = ai and supp(ψ) ∩ S = ∅, then there exists a proof scheme φ such that cln(φ) = ai, supp(φ) ∩ S = ∅, and max(φ) ≤ n.

Note that by definition, the Herbrand base HPn of Pn is contained in An.

Theorem 6. Suppose that n is a level of P and E is a stable model of P. Then En = E ∩ {a0, . . . , an} is a stable model of Pn.

Proof. If E is a stable model of P, then for any ai ∈ En, there is a proof scheme ψ such that cln(ψ) = ai and supp(ψ) ∩ E = ∅. Thus, in particular, supp(ψ) ∩ En = ∅, so that since n is a level, there exists a proof scheme φ such that max(φ) ≤ n, cln(φ) = ai, and supp(φ) ∩ En = ∅. Thus φ is a proof scheme of Pn and En admits φ. Vice versa, if i ≤ n and ai is not in En, then there can be no proof scheme φ of Pn such that cln(φ) = ai, max(φ) ≤ n, and supp(φ) ∩ En = ∅, since this would violate the fact that E is a stable model of P. Thus En is a stable model of Pn. □

Definition 3. 1. We say that a logic program P is locally determined if P is countable and there are infinitely many n such that n is a level of P.
2. Suppose that P is a recursive logic program. Then we say that P is effectively locally determined if P is locally determined and there is a recursive function f such that for all i, f(i) ≥ i and f(i) is a level of P.

Notation: If P is locally determined, we let lev(P) = {n : n is a level of P}.
In [7,8], Marek, Nerode, and Remmel showed that the problem of finding a stable model of a locally finite recursive logic program can be reduced to finding an infinite path through a finitely branching recursive tree, and that the problem of finding a stable model of a rsp logic program can be reduced to finding an infinite path through a highly recursive tree. A locally determined logic program is not always locally finite, since it is possible that a given atom has infinitely many proof schemes which involve arbitrarily large atoms. Vice versa, it is possible to give examples of locally finite logic programs which are not locally determined. Nevertheless, we shall see that we get results similar to those of Marek, Nerode, and Remmel for locally determined and effectively locally determined recursive logic programs.

Theorem 7. Let P be a recursive logic program.
1. If P is locally determined, then there is a recursive finitely branching tree T and a 1:1 degree-preserving correspondence between the set of stable models E of P and [T], and
2. If P is effectively locally determined, then there is a highly recursive finitely branching tree T and a 1:1 degree-preserving correspondence between the set of stable models E of P and [T].

Proof. There is no loss of generality in assuming that HP = ω and that a0 = 0, a1 = 1, . . .. Next observe that for each n, Pn has only finitely many minimal proof schemes, so that we can effectively list all minimal proof schemes φ0 < φ1 < . . . in such a way that
1. if max(φk) = i and max(φl) = j and i < j, then k < l. (This says that if i < j, then the proof schemes whose max is i come before those proof schemes whose max is j.)
2. if max(φk) = max(φl) = i, then k < l if and only if c(φk) < c(φl), where c(φ) denotes the index assigned to a proof scheme φ under our effective Gödel numbering of the proof schemes.

We shall encode a stable model M of P by a path πM = (π0, π1, . . .) through the complete ω-branching tree ω<ω as follows. First, for all i ≥ 0, π2i = χM(i). That is, at stage 2i we encode the information whether i belongs to M. Next, if π2i = 0, then π2i+1 = 0. But if π2i = 1, so that i ∈ M, then π2i+1 = qM(i), where qM(i) is the least q such that cln(φq) = i and supp(φq) ∩ M = ∅. Thus φqM(i) is the least proof scheme which shows that i ∈ FP,M. Clearly M ≤T πM, since it is enough to look at the values of πM at the even levels to read off M. Now given an M-oracle, it should be clear that for each i ∈ M, we can use the M-oracle to find qM(i) effectively. This means that πM ≤T M. Thus the correspondence M ↦ πM is an effective degree-preserving correspondence. It is trivially one-to-one.

Next we construct a recursive tree T ⊆ ω<ω such that [T] = {πE : E is a stable model of P}. Let Lk = max({i : max(φi) ≤ k}). It is easy to see that since P is a locally determined recursive logic program, we can effectively calculate Lk from k. We have to say which finite sequences belong to our tree T. To this end, given a sequence σ = (σ(0), . . .
, σ(k)) ∈ ω<ω, set Iσ = {i : 2i ≤ k ∧ σ(2i) = 1} and Oσ = {i : 2i ≤ k ∧ σ(2i) = 0}. Now we define T by putting σ into T if and only if the following four conditions are met:
(a) ∀i(2i + 1 ≤ k ∧ σ(2i) = 0 ⇒ σ(2i + 1) = 0).
(b) ∀i(2i + 1 ≤ k ∧ σ(2i) = 1 ⇒ σ(2i + 1) = q, where φq is a minimal proof scheme such that cln(φq) = i and supp(φq) ∩ Iσ = ∅).
(c) ∀i(2i + 1 ≤ k ∧ σ(2i) = 1 ⇒ there is no c ≤ L⌊k/2⌋ such that cln(φc) = i, supp(φc) ⊆ Oσ, and c < σ(2i + 1)). (Here ⌊·⌋ is the number-theoretic "floor" function.)
(d) ∀i(2i ≤ k ∧ σ(2i) = 0 ⇒ there is no c ≤ L⌊k/2⌋ such that cln(φc) = i and supp(φc) ⊆ Oσ).
It is immediate that if σ ∈ T and τ ⊑ σ, then τ ∈ T. Moreover, it is clear from the definition that T is a recursive subset of ω<ω. Thus T is a recursive
tree. Also, it is easy to see that our definitions ensure that, for any stable model E of P, the sequence πE is a branch through T, that is, πE ∈ [T].

We shall now show that every infinite branch through T is of the form πE for a suitably chosen stable model E. To this end, assume that β = (β(0), β(1), . . .) is an infinite branch through T. There is only one candidate for E, namely Eβ = {i : β(2i) = 1}. Two items have to be checked, namely, (I) Eβ is a stable model of P and (II) π(Eβ) = β. To prove (I), first observe that if i ∈ Eβ, then σ(2i) = 1 and σ(2i + 1) = q, where φq is a proof scheme such that cln(φq) = i. Moreover, condition (c) and the fact that σn = (β0, β1, . . . , βn) ∈ T for all n ≥ 2i + 1 easily imply that supp(φq) ∩ Iσn = ∅ for all such n and hence supp(φq) ∩ Eβ = ∅. In addition, condition (c) ensures that φq is the least proof scheme with this property. Similarly, if i ∉ Eβ, then condition (d) and the fact that σn = (β0, β1, . . . , βn) ∈ T for all n ≥ 2i + 1 easily imply that there can be no proof scheme φq with cln(φq) = i and supp(φq) ∩ Eβ = ∅. It then follows from Proposition 1 that Eβ is a stable model of P and that π(Eβ) = β.

The key fact needed to establish the branching properties of T is that for any sequence σ ∈ T and any i, either σ(2i) = σ(2i + 1) = 0, or σ(2i) = 1 and σ(2i + 1) codes a minimal proof scheme for i. We just note that when a proof scheme ψ = σ(2i + 1) does not correspond to a path πE, then there will be some k such that σ has no extension in T of length k. This will happen once we either find a smaller code for a proof scheme or find some u > i in the support of ψ such that all possible extensions τ of σ have τ(2u) = 1. We claim that T is always finitely branching and that if P is effectively locally determined, then T is highly recursive. Clearly the only case of interest is when 2i + 1 ≤ k and σ(2i) = 1.
In this case we will let σ(2i + 1) = c, where cln(φc) = i, supp(φc) ∩ Iσ = ∅, and there is no a < c such that cln(φa) = i and supp(φa) ∩ Iσ = ∅. Now suppose that p is a level and i < p. Then by definition, there must be a minimal proof scheme ψ such that max(ψ) ≤ p, cln(ψ) = i, and supp(ψ) ∩ Iσ = ∅. Thus ψ = φq for some q ≤ Lp. It follows that c ≤ Lp, where p is the least level greater than or equal to i. Thus T is always finitely branching. Now if P is effectively locally determined and this is witnessed by the recursive function f, then it will always be the case that c ≤ Lf(i), so that T will be highly recursive. □

Corollary 1. Suppose that P is a countable locally determined logic program such that there are infinitely many n such that Pn has a stable model En. Then P has a stable model.

Proof. Consider the tree T constructed for P as in Theorem 7. Here we again can construct our sequence of minimal proof schemes φ0, φ1, . . . recursive in P just as we did in Theorem 7. However, we can only conclude that T is recursive in P. Nevertheless, we are guaranteed that T is finitely branching, which is all we need for our argument. Now fix some level n and consider some m ≥ n such that Pm has a stable model Em. Then by the exact same argument as in Theorem 6, En = Em ∩ {0, . . . , n}
will be a stable model of Pn. Now consider the node σEn = (σ(0), . . . , σ(2n + 1)) such that
1. σ(2i) = 1 if i ∈ En and σ(2i) = 0 if i ∉ En,
2. σ(2i + 1) = 0 if σ(2i) = 0, and
3. if σ(2i) = 1, then σ(2i + 1) = c, where c is the least number such that max(φc) ≤ n, cln(φc) = i, and supp(φc) ∩ En = ∅. (Note that it follows from our ordering of minimal proof schemes that φc is the least proof scheme φ such that cln(φ) = i and supp(φ) ∩ En = ∅.)

It is easy to check that our construction of T ensures that σEn ∈ T. It follows that T is an infinite, finitely branching tree and hence T has an infinite path π by König's Lemma. Our proof of Theorem 7 shows that Eπ is a stable model of P. □

Using known results from the theory of recursively bounded Π⁰₁ classes [5], one can also prove the following.

Corollary 2. Suppose that P is an effectively locally determined recursive logic program which has at least one stable model. Then
1. P has a stable model whose Turing jump is recursive in 0′.
2. If P has no recursive stable model, then P has 2^ℵ₀ stable models.
3. If P has only finitely many stable models, then each of these stable models is recursive.
4. There is a stable model E of P in an r.e. degree.
5. There exist stable models E1 and E2 of P such that any function recursive in both E1 and E2 is recursive.
6. If P has no recursive stable model, then there is a nonzero r.e. degree a such that P has no stable model recursive in a.
4 Conditions Which Ensure the Existence of Recursive Stable Models
In this section, we give two conditions which ensure the existence of recursive stable models.

Definition 4. 1. Let P be a locally determined logic program and lev(P) = {l0 < l1 < . . .}. Then we say that P has the level extension property if for all k, whenever Ek is a stable model of Plk, there exists a stable model Ek+1 of Plk+1 such that Ek+1 ∩ {a0, . . . , alk} = Ek.
2. A level n of P is a strong level of P if for any level m < n of P and any stable model Em of Pm, if there is no stable model E of P with E ∩ {a0, . . . , am} = Em, then there is no stable model En of Pn with En ∩ {a0, . . . , am} = Em.
3. P has effectively strong levels if P has infinitely many strong levels and there is a computable function f such that for each i, i ≤ f(i) and f(i) is a strong level.
In general, it is not easy to ensure that a locally determined logic program P has the level extension property. However, there are many natural examples where this condition is satisfied. One way to generate such programs is to consider the following result from recursive combinatorics. Bean [4] showed that there exist highly recursive connected graphs which are 3-colorable but not recursively k-colorable for any k. However, Bean also showed that every infinite connected k-colorable highly recursive graph G is recursively 2k-colorable. Here a graph G = (V, E) is highly recursive if the vertex set V is a recursive subset of ω, the set of codes of the sets {x, y} ∈ E is recursive, G is locally finite, i.e., the degree of any vertex v ∈ V is finite, and there is an effective procedure which, given any vertex x ∈ V, produces a code of N(x) = {y ∈ V : {x, y} ∈ E}.

A recursive 2k-coloring of G can be produced as follows. Given any set W ⊆ V, let N(W) = {y ∈ V − W : (∃x ∈ W)({x, y} ∈ E)}. Then given any x ∈ V, define an effective increasing sequence of finite sets ∅ = A0, A1, A2, . . ., where A1 = N(x) ∪ {x} and, for all k > 1, Ak = Ak−1 ∪ N(Ak−1). It is easy to see that there can be no edge from an element of Ak to an element of Ak+2 − Ak+1. Since G is k-colorable, the induced graph determined by Ai − Ai−1 is k-colorable for all i ≥ 1. We then define a recursive 2k-coloring of G as follows. (Step 1) Find a coloring of A1 using colors {1, . . . , k}. (Step 2) Find a coloring of A2 − A1 using colors {k + 1, . . . , 2k}. (Step 3) Find a coloring of A3 − A2 using colors {1, . . . , k}. (Step 4) Find a coloring of A4 − A3 using colors {k + 1, . . . , 2k}, etc. One can easily write a logic program to implement this procedure, and it will naturally be effectively locally determined and have the level extension property.

It also turns out that all locally determined FC-normal logic programs have the level extension property. That is, we can prove the following.

Theorem 8.
Suppose that P is a locally determined FC-normal logic program. Then P has the level extension property.

Theorem 9. 1. Suppose that P is an effectively locally determined recursive logic program with the level extension property. Then for every level n and stable model En of Pn, there is a recursive stable model E of P such that E ∩ {a0, . . . , an} = En.
2. Suppose that P is a recursive logic program with effectively strong levels. Then for every level n and stable model En of Pn, if there is a stable model E of P with E ∩ {a0, . . . , an} = En, then there is a recursive stable model E of P such that E ∩ {a0, . . . , an} = En.

Proof. For (1), fix a level n of P and a stable model En of Pn. Suppose that f is the function which witnesses the fact that P is effectively locally determined. Then let b0, b1, b2, . . . be the sequence n, f(n), f(f(n)), . . .. It is easy to see that our level extension property implies that we can effectively construct a sequence of sets Eb0, Eb1, Eb2, . . . such that (1) Eb0 = En, (2) for all j > 0, Ebj is a stable model of Pbj, and (3) for all j ≥ 0, Ebj+1 ∩ {a0, . . . , abj} = Ebj. Now consider the tree T and the nodes σEbj as constructed in Corollary 1. It is easy to check that
for all i, σEbi ∈ T and that σEb0 ⊑ σEb1 ⊑ σEb2 ⊑ · · ·. It follows that there is a unique path β in [T] which extends all σEbi and that Eβ = ∪i≥0 Ebi is a stable model of P. Moreover, Eβ is recursive, because to decide if aj ∈ Eβ, one need only find k such that bk ≥ j, in which case aj ∈ Eβ ⇐⇒ aj ∈ Ebk.

For (2), assume that f is the recursive function which witnesses the fact that P has effectively strong levels and let b0, b1, b2, . . . be defined as above. We claim that the property of strong levels once again lets us construct an effective sequence Eb0, Eb1, Eb2, . . . such that (1) Eb0 = En, (2) for all j > 0, Ebj is a stable model of Pbj, and (3) for all j ≥ 0, Ebj+1 ∩ {a0, . . . , abj} = Ebj. That is, suppose that we have constructed Ebk such that there exists a stable model S of P such that S ∩ {a0, . . . , abk} = Ebk. Now consider the strong level bk+2. Our definition of strong level ensures that there must be some stable model Fk+2 of Pbk+2 such that Fk+2 ∩ {a0, . . . , abk} = Ebk. Then let Ebk+1 = Fk+2 ∩ {a0, . . . , abk+1}. The argument in Theorem 6 shows that Ebk+1 is a stable model of Pbk+1. Moreover, since bk+2 is a strong level, there must be a stable model S′ of P such that Ebk+1 = S′ ∩ {a0, . . . , abk+1}, since otherwise there could be no stable model F of Pbk+2 such that Ebk+1 = F ∩ {a0, . . . , abk+1}. This given, we can then construct our desired recursive stable model Eβ exactly as in (1). □

We end this section with another, more direct approach to producing a recursive stable model.

Definition 5. We say that a recursive logic program P has witnesses with effective delay if there is a recursive function f such that for all n, f(n) > n and, whenever there is a set S ⊆ {a0, . . . , an} such that there is a stable model E of P with E ∩ {a0, . . . , an} = S but there is no stable model F of P such that F ∩ {a0, . . .
, an, an+1} = S, then either (i) there is a proof scheme φ with max(φ) ≤ n + 1 such that cln(φ) = an+1 and supp(φ) ⊆ {a0, . . . , an} − S, or (ii) for all sets T ⊆ {an+2, . . . , af(n)}, there is a proof scheme ψT with max(ψT) ≤ f(n) such that supp(ψT) ⊆ {a0, . . . , af(n)} − (T ∪ S) and cln(ψT) ∈ {a0, . . . , af(n)} − (T ∪ S).

Note that in case (i), the proof scheme φ witnesses that we must have an+1 in any stable model E such that E ∩ {a0, . . . , an} = S. In case (ii), the proof schemes ψT show that we cannot have a stable model E of P such that E ∩ {a0, . . . , af(n)} = S ∪ T, so that we are again forced to have an+1 in any stable model E such that E ∩ {a0, . . . , an} = S.

Theorem 10. Suppose that P is a recursive logic program which has witnesses with effective delay and has at least one stable model. Then the lexicographically least stable model E of P is recursive.

Proof. We can construct the lexicographically least stable model E of P by induction as follows. Suppose that for a given n we have constructed En = E ∩ {a0, . . . , an}. Then En+1 = En unless either
(i) there is a proof scheme φ with max(φ) ≤ n + 1 such that supp(φ) ⊆ {a0, . . . , an} − En and cln(φ) = an+1, or
(ii) for all sets T ⊆ {an+2, . . . , af(n)}, there is a proof scheme ψT with max(ψT) ≤ f(n) such that supp(ψT) ⊆ {a0, . . . , af(n)} − (T ∪ En) and cln(ψT) ∈ {a0, . . . , af(n)} − (T ∪ En),
in which case En+1 = En ∪ {an+1}. Note that since there are only finitely many minimal proof schemes φ with max(φ) ≤ k for any given k, we can check conditions (i) and (ii) effectively. Since there is a stable model, it is easy to see that our definitions ensure that En is always contained in the lexicographically least stable model of P. Thus E = ∪n En is recursive. □

By putting suitable effective bounds on the effective levels and/or the witnesses with effective delay, one can readily come up with conditions that force P to have exponential time, NP, or P-time stable models. These topics will be pursued in subsequent papers.
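For a finite propositional program one can at least exhibit the object that Theorem 10 constructs. The sketch below uses our own representations and brute force over all candidate sets via the Gelfond–Lifschitz reduct, not the effective-delay construction of the proof; the lexicographic order prefers excluding earlier atoms, as in the induction above.

```python
# Brute-force illustration (exponential; finite programs only).
# A clause is (head, pos, neg), read as head <- pos, not neg.
from itertools import product

def is_stable(clauses, m):
    # Gelfond-Lifschitz reduct: delete clauses whose negative body meets m,
    # drop negative bodies, then compare the Horn closure with m.
    reduct = [(h, pos) for h, pos, neg in clauses if not (set(neg) & m)]
    t, changed = set(), True
    while changed:
        changed = False
        for h, pos in reduct:
            if h not in t and all(a in t for a in pos):
                t.add(h)
                changed = True
    return t == m

def lex_least_stable(clauses, atoms):
    # Enumerate characteristic vectors in lexicographic order, so that
    # excluding an earlier atom is preferred, as in Theorem 10.
    for bits in product([0, 1], repeat=len(atoms)):
        m = {a for a, b in zip(atoms, bits) if b}
        if is_stable(clauses, m):
            return m
    return None   # no stable model

clauses = [("a", [], []), ("b", ["a"], []), ("c", ["a", "b"], []),
           ("e", ["c"], ["f"]), ("f", ["c"], ["e"])]
# The stable models are {a,b,c,e} and {a,b,c,f}; under the atom order
# a, b, c, e, f the lexicographically least one is the one excluding e:
print(sorted(lex_least_stable(clauses, ["a", "b", "c", "e", "f"])))
# ['a', 'b', 'c', 'f']
```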
References

1. H. Andreka, I. Nemeti. The Generalized Completeness of Horn Predicate Logic as a Programming Language. Acta Cybernetica 4 (1978), pp. 3-10.
2. K.R. Apt. Logic Programming. Handbook of Theoretical Computer Science (J. van Leeuwen, ed.), Cambridge, MA: MIT Press, 1990.
3. K.R. Apt, H.A. Blair. Arithmetical Classification of Perfect Models of Stratified Programs. Fundamenta Informaticae 13 (1990), pp. 1-17.
4. D.R. Bean. Effective Coloration. Journal of Symbolic Logic 41 (1976), pp. 469-480.
5. D. Cenzer and J.B. Remmel. Π⁰₁ Classes in Mathematics. Handbook of Recursive Mathematics: Volume 2 (Yu. L. Ershov, S.S. Goncharov, A. Nerode, and J.B. Remmel, eds.), Studies in Logic and the Foundations of Mathematics, vol. 139, pp. 623-822, Elsevier, 1998.
6. M. Gelfond and V. Lifschitz. The Stable Semantics for Logic Programs. Proceedings of the 5th International Symposium on Logic Programming, MIT Press, pp. 1070-1080, 1988.
7. W. Marek, A. Nerode, and J.B. Remmel. How Complicated is the Set of Stable Models of a Recursive Logic Program? Annals of Pure and Applied Logic 56 (1992), pp. 119-135.
8. W. Marek, A. Nerode, and J.B. Remmel. The Stable Models of Predicate Logic Programs. Journal of Logic Programming 21 (1994), pp. 129-154.
9. W. Marek, A. Nerode, and J.B. Remmel. Context for Belief Revision: FC-Normal Nonmonotonic Rule Systems. Annals of Pure and Applied Logic 67 (1994), pp. 269-324.
10. R. Reiter. A Logic for Default Reasoning. Artificial Intelligence 13 (1980), pp. 81-132.
11. R.M. Smullyan. Theory of Formal Systems. Annals of Mathematics Studies, no. 47, Princeton, N.J.
Annotated Revision Programs

Victor Marek, Inna Pivkina, and Mirosław Truszczyński

Department of Computer Science, University of Kentucky, Lexington, KY 40506-0046
marek|inna|
[email protected]
Abstract. Revision programming was introduced as a formalism to describe and enforce updates of belief sets and databases. Revision programming was extended by Fitting, who assigned annotations to revision atoms. Annotations provide a way to quantify the certainty (likelihood) that a revision atom holds. The main goal of our paper is to reexamine the work of Fitting, argue that his semantics does not always provide results consistent with intuition, and propose an alternative treatment of annotated revision programs. Our approach differs from that proposed by Fitting in two key aspects: we change the notion of a model of a program and we change the notion of a justified revision. We show that under this new approach fundamental properties of justified revisions of standard revision programs extend to the case of annotated revision programs.
1 Introduction
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 49–62, 1999. © Springer-Verlag Berlin Heidelberg 1999

Revision programming is a formalism to specify and enforce constraints on databases, belief sets and, more generally, on arbitrary sets of elements. Revision programming was introduced and studied in [MT95,MT98]. The formalism was shown to be closely related to logic programming with stable model semantics [MT98,PT97]. In [MPT99], a simple correspondence of revision programming with the general logic programming system of Lifschitz and Woo [LW92] was discovered. Roots of another recent formalism of dynamic programming [ALP+98] can also be traced back to revision programming.

Revision rules come in two forms, in-rules and out-rules:

in(a) ← in(a1), . . . , in(am), out(b1), . . . , out(bn)    (1)

and

out(a) ← in(a1), . . . , in(am), out(b1), . . . , out(bn).    (2)

Expressions in(a) and out(a) are called revision atoms. Informally, the atom in(a) stands for "a is in the current set" and out(a) stands for "a is not in the current set." The rules (1) and (2) have the following imperative, or computational, interpretation: whenever elements ak, 1 ≤ k ≤ m, belong to the current set (database, belief set) and none of the elements bl, 1 ≤ l ≤ n, belongs to the current set then, in the case of rule (1), the item a must be added to the set (if it is not there already), and in the case of rule (2), a must be eliminated from the
database (if it is there). The rules (1) and (2) also have an obvious declarative interpretation. To provide a precise semantics to revision programs, that is, collections of revision rules, the concept of a justified revision was introduced in [MT95,MT98]. Informally, given an initial set BI and a revision program P, a justified revision of BI with respect to P (or, simply, a P-justified revision of BI) is obtained from BI by adding some elements to BI and by removing some other elements from BI so that each change is, in a certain sense, justified by the program.

The formalism of revision programs was extended by Fitting [Fit95] to the case when revision atoms are assigned annotations. These annotations can be interpreted as the degree of confidence that a revision atom holds. For instance, an annotated atom (in(a):0.2) can be regarded as the statement that a is in the set with probability 0.2. In his paper, Fitting described the concept of a justified revision of an annotated program and studied properties of that notion.

The main goal of our paper is to reexamine the work of Fitting, argue that his semantics does not always provide results consistent with intuition, and propose an alternative treatment of annotated revision programs. Our approach differs from that proposed by Fitting in two key aspects: we change the notion of a model of a program and we change the notion of a justified revision. We show that under this new approach all fundamental properties of justified revisions of standard revision programs extend to the case of annotated revision programs. We also show that annotated revision programming can be given a more uniform treatment if the syntax of revision programs is somewhat modified. The new syntax yields a formalism that is equivalent to the original formalism of annotated revision programs.
The advantage of the new syntax is that it allows us to generalize the shifting theorem proved in [MPT99] and used there to establish the equivalence of revision programming with the general logic programming of Lifschitz and Woo [LW92]. Finally, we also briefly address the issue of disjunctive annotated programs and other possible research directions.
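The declarative reading of rules (1) and (2) can be phrased as a simple satisfaction check: whenever the body of a rule holds in a database B, its head must hold as well. A minimal sketch, with our own representations (justified revisions are a stronger, nonmonotonic notion in which every change must additionally be grounded in the program; this check captures only the declarative reading):

```python
# A revision atom is ('in', a) or ('out', a); a rule is (head, body).

def holds(atom, b):
    kind, a = atom
    return (a in b) if kind == 'in' else (a not in b)

def is_model(rules, b):
    """True if database b satisfies every rule declaratively:
    either the head holds or some body atom fails."""
    return all(holds(head, b) or not all(holds(q, b) for q in body)
               for head, body in rules)

# in(a) <- in(b), out(c)
rules = [(('in', 'a'), [('in', 'b'), ('out', 'c')])]
print(is_model(rules, {'a', 'b'}))   # True: body holds and a is present
print(is_model(rules, {'b'}))        # False: body holds but a is absent
```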
2 Preliminaries
Throughout the paper we consider a fixed universe U whose elements are referred to as atoms. Expressions of the form in(a) and out(a), where a ∈ U, are called revision atoms. In the paper we assign annotations to revision atoms. These annotations are members of a complete distributive lattice with the de Morgan complement (an order-reversing involution). Throughout the paper this lattice is denoted by T. The partial ordering on T is denoted by ≤ and the corresponding meet and join operations by ∧ and ∨, respectively. The de Morgan complement of a ∈ T is denoted by ā. An annotated revision atom is an expression of the form (in(a):α) or (out(a):α), where a ∈ U and α ∈ T. An annotated revision rule is an expression of the form

p ← q1, . . . , qn,
where p, q1, . . . , qn are annotated revision atoms. An annotated revision program is a set of annotated revision rules.

A T-valuation is a mapping from the set of revision atoms to T. A T-valuation v describes our information about the membership of the elements from U in some (possibly unknown) set B ⊆ U. For instance, v(in(a)) = α can be interpreted as saying that a ∈ B with certainty α. A T-valuation v satisfies an annotated revision atom (in(a):α) if v(in(a)) ≥ α. Similarly, v satisfies (out(a):α) if v(out(a)) ≥ α. The T-valuation v satisfies a list or a set of annotated revision atoms if it satisfies each member of the list or the set. A T-valuation satisfies an annotated revision rule if it satisfies the head of the rule whenever it satisfies the body of the rule. Finally, a T-valuation satisfies an annotated revision program (is a model of the program) if it satisfies all rules in the program.

Given a revision program P, we can assign to it an operator on the set of all T-valuations. Let tP(v) be the set of the heads of all rules in P whose bodies are satisfied by v. We define an operator TP as follows:

TP(v)(l) = ∨{α | (l:α) ∈ tP(v)}

(note that ⊥ is the join of an empty set of lattice elements). The operator TP is a counterpart of the well-known van Emden-Kowalski operator from logic programming and it will play an important role in our paper.

It is clear that under T-valuations, the information about an element a ∈ U is given by a pair of elements from T that are assigned to the revision atoms in(a) and out(a). Thus, in the paper we will also consider an algebraic structure T² with the domain T × T and with an ordering ≤k defined by:

⟨α1, β1⟩ ≤k ⟨α2, β2⟩ if α1 ≤ α2 and β1 ≤ β2.

If a pair ⟨α1, β1⟩ is viewed as a measure of our information about the membership of a in some unknown set B, then α1 ≤ α2 and β1 ≤ β2 imply that the pair ⟨α2, β2⟩ represents a higher degree of knowledge about a.
Thus, the ordering ≤k is often referred to as the knowledge or information ordering. Since the lattice T is complete, T² is a complete lattice with respect to the ordering ≤k.¹ The operations of meet, join, top, and bottom under ≤k are denoted ⊗, ⊕, ⊤, and ⊥, respectively. In addition, we make use of an additional operation, conflation, defined as −⟨α, β⟩ = ⟨β̄, ᾱ⟩. An element A ∈ T² is consistent if A ≤k −A.

A T²-valuation is a mapping from atoms to elements of T². If B(a) = ⟨α, β⟩ under some T²-valuation B, we say that under B the element a is in a set with certainty α and it is not in the set with certainty β. We say that a T²-valuation is consistent if it assigns a consistent element of T² to every atom in U.
¹ There is another ordering that can be associated with T². We can define ⟨α1, β1⟩ ≤t ⟨α2, β2⟩ if α1 ≤ α2 and β1 ≥ β2. This ordering is often called the truth ordering. Since T is a distributive lattice, T² with both orderings ≤k and ≤t forms a bilattice (see [Gin88,Fit99] for a definition). In this paper we will use neither the ordering ≤t nor the fact that T² is a bilattice.
In the paper, T²-valuations will be used to represent current information about sets (databases) as well as change that needs to be enforced. Let B be a T²-valuation representing our knowledge about a certain set and let C be a T²-valuation representing change that needs to be applied to B. We define the revision, B′, of B by C by

B′ = (B ⊗ −C) ⊕ C.

The intuition is as follows. After the revision, the new valuation must contain at least as much knowledge about atoms being in and out as C. On the other hand, this amount of knowledge must not exceed the implicit bounds present in C and expressed by −C, unless C directly implies so (if C(a) = ⟨α, β⟩, then the evidence for in(a) must not exceed β̄ and the evidence for out(a) must not exceed ᾱ, unless C directly implies so). Since we prefer the explicit evidence of C to the implicit evidence expressed by −C, we perform the change by first using −C and then applying C (however, let us note here that the order matters only if C is inconsistent; if C is consistent, (B ⊗ −C) ⊕ C = (B ⊕ C) ⊗ −C). This specification of how a change modeled by a T²-valuation is enforced plays a key role in our definition of justified revisions in Section 4.

There is a one-to-one correspondence θ between T-valuations (of revision atoms) and T²-valuations (of atoms). For a T-valuation v, the T²-valuation θ(v) is defined by θ(v)(a) = ⟨v(in(a)), v(out(a))⟩. The inverse mapping of θ is denoted by θ⁻¹. Clearly, using the mapping θ, the notions of satisfaction defined earlier for T-valuations can be extended to T²-valuations. Similarly, the operator TP gives rise to a related operator T̂P, defined on the set of all T²-valuations by T̂P = θ ◦ TP ◦ θ⁻¹. The key property of the operator T̂P is its ≤k-monotonicity.

Theorem 1. Let P be an annotated revision program and let B and B′ be two T²-valuations such that B ≤k B′. Then T̂P(B) ≤k T̂P(B′).
By the Tarski–Knaster Theorem, it follows that the operator T̂P has a least fixpoint in T² [KS92]. This fixpoint is an analogue of the concept of a least Herbrand model of a Horn program. It represents the set of annotated revision atoms that are implied by the program and, hence, must be satisfied by any revision under P of any initial valuation. Given an annotated revision program P, we will refer to the least fixpoint of the operator T̂P as the necessary change of P and will denote it by NC(P). The present concept of the necessary change generalizes the corresponding notion introduced in [MT95,MT98] for the original unannotated revision programs.

To illustrate the concepts and results of the paper, we will consider two special lattices. The first of them is the lattice with domain [0, 1] (the interval of reals), with the standard ordering ≤ and the standard complement operation. We will denote this lattice by T[0,1]. Intuitively, the annotated revision atom (in(a):x), where x ∈ [0, 1], stands for the statement that a is "in" with likelihood (certainty) x.
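Over the lattice T[0,1] (join = max, bottom = 0), the operator TP admits a direct finite sketch; the encoding of programs and valuations below is ours:

```python
# A rule is (head_atom, head_annotation, body), where body is a list of
# (atom, annotation) pairs and atoms are ('in', a) / ('out', a).

def satisfies(v, atom, alpha):
    # v(atom) >= alpha; atoms missing from v default to bottom = 0.0
    return v.get(atom, 0.0) >= alpha

def T_P(program, v):
    """One step of T_P: the join (= max in T_[0,1]) of the head
    annotations of all rules whose bodies are satisfied by v."""
    out = {}
    for head, alpha, body in program:
        if all(satisfies(v, q, a) for q, a in body):
            out[head] = max(out.get(head, 0.0), alpha)
    return out

# (in(a):0.7) <- (out(b):0.5)
prog = [(('in', 'a'), 0.7, [(('out', 'b'), 0.5)])]
print(T_P(prog, {('out', 'b'): 0.6}))   # {('in', 'a'): 0.7}
print(T_P(prog, {('out', 'b'): 0.3}))   # {}: body not satisfied
```

For a finite program, iterating T_P from the empty valuation climbs to the least fixpoint, i.e. the necessary change NC(P), by the monotonicity of Theorem 1.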
Annotated Revision Programs
53
The second lattice is the Boolean algebra of all subsets of a given set X. It will be denoted by T_X. We will think of the elements of X as experts. The annotated revision atom (out(a):Y), where Y ⊆ X, will be understood as saying that a is believed to be "out" by those experts that are in Y (the atom (in(a):Y) has a similar meaning).
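To make these operations concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of T²-values over the experts lattice T_X for X = {p, q}: the join ⊕ and meet ⊗ are taken componentwise, the conflation −⟨α, β⟩ = ⟨β̄, ᾱ⟩ encodes the implicit bounds discussed above, and revise implements B′ = (B ⊗ −C) ⊕ C.

```python
# A T^2-value <alpha, beta> over the experts lattice T_X is a pair of subsets of X.
X = frozenset({"p", "q"})  # the experts

def join(u, v):
    """(+): componentwise least upper bound w.r.t. <=_k (union of evidence)."""
    return (u[0] | v[0], u[1] | v[1])

def meet(u, v):
    """(x): componentwise greatest lower bound w.r.t. <=_k."""
    return (u[0] & v[0], u[1] & v[1])

def conflation(u):
    """-<alpha, beta> = <complement(beta), complement(alpha)>."""
    return (X - u[1], X - u[0])

def revise(b, c):
    """B' = (B (x) -C) (+) C: cut B down to the implicit bounds of C, then add C."""
    return join(meet(b, conflation(c)), c)

# Example: B says p believes in(a) and q believes out(a); C says q believes in(a).
B = (frozenset({"p"}), frozenset({"q"}))
C = (frozenset({"q"}), frozenset())
B_rev = revise(B, C)  # == (frozenset({"p", "q"}), frozenset())
```

Since this C is consistent, applying C and −C in either order yields the same result, matching the remark above.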
3
Models and c-Models
The semantics of annotated revision programs will be based on the notion of a model as defined in the previous section. The following result provides a characterization of the concept of a model in terms of the operator T̂_P.

Theorem 2. A T²-valuation B of an annotated revision program P is a model of P (satisfies P) if and only if B ≥_k T̂_P(B).

Given an annotated revision program P, its necessary change NC(P) satisfies NC(P) = T̂_P(NC(P)). Hence, NC(P) is a model of P. As we will argue now, not all models are appropriate for describing the meaning of an annotated revision program. The problem is that T²-valuations may contain inconsistent information about elements from U. When studying the meaning of an annotated revision program, we will be interested only in those models whose inconsistencies are limited by the information explicitly or implicitly present in the program.

Consider the annotated revision program P consisting of the following rule (the literals are annotated with elements of the lattice T_{p,q}):

(in(a):{q}) ← (out(a):{p})

Some models of this program are consistent (for instance, the T²-valuation that assigns ⟨{q}, {p}⟩ to a). However, P also has inconsistent models. Let us first consider the T²-valuation B₁ such that B₁(a) = ⟨{p, q}, {p}⟩. Clearly, B₁ is a model of P. Moreover, it is an inconsistent model: the expert p believes both in(a) and out(a). Let us notice, though, that this inconsistency is not disallowed by the program. The rule (in(a):{q}) ← (out(a):{p}) is applicable with respect to B₁ and, thus, provides explicit evidence that q believes in(a). This fact implicitly precludes q from believing out(a). However, this rule does not preclude expert p from believing out(a). In addition, since no rule in the program provides any information about out(a), it prevents neither p nor q from believing in(a).
To summarize, the program allows p to have inconsistent beliefs (however, q's beliefs must be consistent). Next, consider the T²-valuation B₂ such that B₂(a) = ⟨{p, q}, {p, q}⟩. This valuation is also a model of P. In B₂, both p and q are inconsistent in their beliefs. As before, the inconsistent beliefs of p are not disallowed by P. However, reasoning as before, we see that the program disallows q from believing out(a). Thus the inconsistent beliefs of expert q cannot be reconciled with P. In our study of annotated revision programs we will restrict ourselves only to consistent models
54
V. Marek, I. Pivkina, and M. Truszczyński
and to those inconsistent models all of whose inconsistencies are not disallowed by the program.

Speaking more formally, by direct (or explicit) evidence we mean the evidence provided by the heads of the program rules applicable with respect to B. It can be described as T̂_P(B). The implicit bound on allowed annotations is given by a version of the closed world assumption: if the evidence for a revision atom l provided by the program is α, then the evidence for the dual revision atom l^D (in(a), if l = out(a), or out(a), otherwise) must not exceed ᾱ (unless explicitly forced by the program). Thus, the implicit upper bound on allowed annotations is given by −T̂_P(B). Hence, a model B of a program P contains no more evidence than what is implied by P given B if B ≤_k T̂_P(B) ⊕ (−T̂_P(B)). This discussion leads us to a refinement of the notion of a model of an annotated revision program.

Definition 1. Let P be an annotated revision program and let B be a T²-valuation. We say that B is a c-model of P if T̂_P(B) ≤_k B ≤_k T̂_P(B) ⊕ (−T̂_P(B)).

Thus, coming back to our example, the T²-valuation B₁ is a c-model of P and B₂ is not. The "c" in the term c-model is to emphasize that c-models are "as consistent as possible", that is, their inconsistencies are limited to those that are not explicitly or implicitly disallowed by the program. The notion of a c-model will play an important role in our considerations. Clearly, by Theorem 2, a c-model of P is a model of P. In addition, it is easy to see that the necessary change of an annotated program P is a c-model of P (this follows directly from the fact that NC(P) = T̂_P(NC(P))). The distinction between models and c-models appears only in the context of inconsistent information. This observation is stated formally below.

Theorem 3. Let P be an annotated revision program. A consistent T²-valuation B is a c-model of P if and only if B is a model of P.
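The c-model condition of Definition 1 can be checked mechanically. The sketch below (our own illustration, not code from the paper) encodes the one-rule program (in(a):{q}) ← (out(a):{p}) from the example above and tests T̂_P(B) ≤_k B ≤_k T̂_P(B) ⊕ (−T̂_P(B)) for the valuations B₁ and B₂.

```python
# Values are pairs <in-evidence, out-evidence> of subsets of X = {p, q}.
X = frozenset({"p", "q"})

def leq_k(u, v):
    """Knowledge ordering <=_k: componentwise set inclusion."""
    return u[0] <= v[0] and u[1] <= v[1]

def join(u, v):
    return (u[0] | v[0], u[1] | v[1])

def conflation(u):
    return (X - u[1], X - u[0])

def t_hat(b):
    """T^_P(B) for the single atom a under the one-rule program
    (in(a):{q}) <- (out(a):{p}): fire the rule if its body is satisfied."""
    if frozenset({"p"}) <= b[1]:                 # body (out(a):{p}) holds in b
        return (frozenset({"q"}), frozenset())   # head contributes (in(a):{q})
    return (frozenset(), frozenset())            # bottom: no evidence derived

def is_c_model(b):
    t = t_hat(b)
    return leq_k(t, b) and leq_k(b, join(t, conflation(t)))

B1 = (frozenset({"p", "q"}), frozenset({"p"}))       # c-model: p's inconsistency is allowed
B2 = (frozenset({"p", "q"}), frozenset({"p", "q"}))  # not a c-model: q's is not
```

Running is_c_model on B₁ and B₂ reproduces the verdicts of the example: B₁ qualifies, B₂ does not.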
4
Justified Revisions
In this section, we will extend to the case of annotated revision programs the notion of a justified revision introduced for revision programs in [MT95]. The reader is referred to [MT95,MT98] for the discussion of motivation and intuitions behind the concept of a justified revision and of the role of the inertia principle (a version of the closed world assumption). There are several properties that one would expect to hold when the notion of justified revision is extended to the case of programs with annotations. Clearly, the extended concept should specialize to the original definition if annotations can be dropped. Next, all main properties of justified revisions studied in [MT98,MPT99] should have their counterparts in the case of justified revisions of annotated programs. In particular, justified revisions of an annotated logic
program should satisfy it. Finally, there is one other requirement that naturally arises in the context of programs with annotations. Consider two annotated revision rules r and r′ that are exactly the same except that the body of r contains two annotated revision atoms l:β₁ and l:β₂, while the body of r′ contains the annotated revision atom l:β₁ ∨ β₂ instead of l:β₁ and l:β₂:

r  = … ← …, (l:β₁), …, (l:β₂), …
r′ = … ← …, (l:β₁ ∨ β₂), …
It is clear that for any T²-valuation B, B satisfies (l:β₁) and (l:β₂) if and only if B satisfies (l:β₁ ∨ β₂). Consequently, replacing rule r by rule r′ (or vice versa) in an annotated revision program should have no effect on justified revisions. In fact, any reasonable semantics for annotated revision programs should be invariant under such an operation, and we will refer to this property of a semantics of annotated revision programs as invariance under join.

In this section we introduce the notion of the justified revision of an annotated revision program and contrast it with an earlier proposal by Fitting [Fit95]. In the following section we show that our concept of a justified revision satisfies all the requirements listed above.

Let a T²-valuation B_I represent our current knowledge about some subset of the universe U. Let an annotated revision program P describe an update that B_I should be subject to. The goal is to identify a class of T²-valuations that could be viewed as representing the updated information about the subset, obtained by revising B_I by P. As argued in [MT95,MT98], each appropriately "revised" valuation B_R must be grounded in P and in B_I, that is, any difference between B_I and the revised T²-valuation B_R must be justified by means of the program and the information available in B_I.

To determine whether B_R is grounded in B_I and P, we use the reduct of P with respect to the two valuations. The construction of the reduct consists of two steps and mirrors the original definition of the reduct of an unannotated revision program [MT98]. In the first step, we eliminate from P all rules whose bodies are not satisfied by B_R (their use does not have an a posteriori justification with respect to B_R). In the second step, we take into account the initial valuation B_I. How can we use the information about the initial T²-valuation B_I at this stage? Assume that B_I provides evidence α for a revision atom l.
Assume also that an annotated revision atom (l:β) appears in the body of a rule r. In order to satisfy this premise of the rule, it is enough to derive, from the program resulting from step 1, an annotated revision atom (l:γ), where α ∨ γ ≥ β. The least such element γ exists (due to the fact that T is complete and distributive). Let us denote it by pcomp(α, β) (this operation is known in lattice theory as the relative pseudocomplement; see [RS70]). Thus, in order to incorporate the information about a revision atom l contained in the initial T²-valuation B_I, which is given by α = (θ⁻¹(B_I))(l), we proceed as follows. In the bodies of the rules of the program obtained after step 1, we replace
each annotated revision atom of the form (l:β) by the annotated revision atom (l:pcomp(α, β)).

Now we are ready to formally introduce the notion of the reduct of an annotated revision program P with respect to a pair of T²-valuations: the initial one, B_I, and a candidate for a revised one, B_R.

Definition 2. The reduct P_{B_R|B_I} is obtained from P by
1. removing every rule whose body contains an annotated atom that is not satisfied in B_R,
2. replacing each annotated atom (l:β) in the body of each remaining rule by the annotated atom (l:γ), where γ = pcomp((θ⁻¹(B_I))(l), β).

We now define the concept of a justified revision. Given an annotated revision program P, we first compute the reduct P_{B_R|B_I} of the program P with respect to B_I and B_R. Next, we compute the necessary change for the reduced program. Finally, we apply the change thus computed to the T²-valuation B_I. A T²-valuation B_R is a justified revision of B_I if the result of these three steps is B_R. Thus we have the following definition.

Definition 3. B_R is a P-justified revision of B_I if B_R = (B_I ⊗ −C) ⊕ C, where C = NC(P_{B_R|B_I}) is the necessary change for P_{B_R|B_I}.

We will now contrast the above approach with the one proposed by Fitting in [Fit95]. In order to do so, we recall the definitions introduced in [Fit95]. The key difference is in the way Fitting defines the reduct of a program. The first step is the same in both approaches. However, the second steps, in which the initial valuation is used to simplify the bodies of the rules not eliminated in the first step of the construction, differ.

Definition 4 (Fitting). Let P be an annotated revision program and let B_I and B_R be T²-valuations. The F-reduct of P with respect to (B_I, B_R) (denoted P^F_{B_R|B_I}) is defined as follows:
1. Remove from P every rule whose body contains an annotated revision atom that is not satisfied in B_R.
2. From the body of each remaining rule, delete any annotated revision atom that is satisfied in B_I.
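Step 2 of Definition 2 hinges on the operation pcomp(α, β), the least γ with α ∨ γ ≥ β. For the two example lattices it has a simple closed form; the following sketch is our own illustration, with hypothetical function names.

```python
def pcomp_sets(alpha, beta):
    """In the subset lattice T_X, the least gamma with alpha | gamma >= beta
    is the set difference beta - alpha."""
    return frozenset(beta) - frozenset(alpha)

def pcomp_unit(alpha, beta):
    """In the linear lattice T_[0,1] (join = max), the least gamma with
    max(alpha, gamma) >= beta is 0 if alpha already covers beta, else beta."""
    return 0.0 if alpha >= beta else beta
```

For instance, if B_I already provides evidence {p} for l, a body atom (l:{p, q}) reduces to (l:{q}); in T_[0,1], initial evidence 0.7 makes any bound β ≤ 0.7 vacuous.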
The notion of justified revision as defined by Fitting differs from our notion only in that the necessary change of the F-reduct is used. We call the justified revision using the notion of F-reduct the F-justified revision. In the remainder of this section we show that the notion of the F-justified revision does not, in general, satisfy some basic requirements that we would like justified revisions to have. In particular, F-justified revisions under an annotated revision program P are not always models of P.
Example 1. Consider the lattice T_{p,q}. Let P be a program consisting of the following rules: (in(a):{p}) ← (in(b):{p, q})
and
(in(b):{q}) ←
and let B_I be an initial valuation such that B_I(a) = ⟨∅, ∅⟩ and B_I(b) = ⟨{p}, ∅⟩. Let B_R be a valuation given by B_R(a) = ⟨∅, ∅⟩ and B_R(b) = ⟨{p, q}, ∅⟩. Clearly, P^F_{B_R|B_I} = P, and B_R is an F-justified revision of B_I (under P). However, B_R does not satisfy P.

The semantics of F-justified revisions also fails to satisfy the invariance under join property.

Example 2. Let P be a revision program consisting of the following rules: (in(a):{p}) ← (in(b):{p, q})
and
(in(b):{q}) ←
and let P′ consist of (in(a):{p}) ← (in(b):{p}), (in(b):{q})
and
(in(b):{q}) ←
Let the initial valuation B_I be given by B_I(a) = ⟨∅, ∅⟩ and B_I(b) = ⟨{p}, ∅⟩. The only F-justified revision of B_I (under P) is a T²-valuation B_R, where B_R(a) = ⟨∅, ∅⟩ and B_R(b) = ⟨{p, q}, ∅⟩. The only F-justified revision of B_I (under P′) is a T²-valuation B′_R, where B′_R(a) = ⟨{p}, ∅⟩ and B′_R(b) = ⟨{p, q}, ∅⟩. Thus, replacing in the body of a rule the annotated revision atom (in(b):{p, q}) by (in(b):{p}) and (in(b):{q}) affects F-justified revisions.

However, in some cases the two definitions of justified revision coincide. The following result provides a complete characterization of those cases.

Theorem 4. F-justified revisions and justified revisions coincide if and only if the lattice T is linear (that is, for any two elements a, b ∈ T, either a ≤ b or b ≤ a).

Theorem 4 explains why the difference between justified revisions and F-justified revisions is not seen when we limit our attention to revision programs as those considered in [MT98]. Namely, the lattice TWO = {f, t} of boolean values is linear. Similarly, the lattice of reals from the segment [0, 1] is linear, and there the differences cannot be seen either.
5
Properties of Justified Revisions
In this section we study basic properties of justified revisions. We show that key properties of justified revisions in the case of revision programs without annotations have their counterparts in the case of justified revisions of annotated revision programs.
First, we observe that revision programs as defined in [MT95] can be encoded as annotated revision programs (with annotations taken from the lattice TWO = {f, t}). Namely, a revision rule p ← q₁, …, q_m (where p and all the q_i's are revision atoms) can be encoded as

(p:t) ← (q₁:t), …, (q_m:t)

In [Fit95], Fitting argued that under this encoding the semantics of F-justified revisions generalizes the semantics of justified revisions introduced in [MT95]. Since the approach by Fitting and the approach presented in this paper coincide for lattices whose ordering is linear, and since the ordering of TWO is linear, the semantics of justified revisions discussed here extends the semantics of justified revisions from [MT95].

Next, let us recall that in the case of revision programs without annotations, justified revisions under a revision program P are models of P. In the case of annotated revision programs we have a similar result.

Theorem 5. Let P be an annotated revision program and let B_I and B_R be T²-valuations. If B_R is a P-justified revision of B_I, then B_R is a c-model of P (and, hence, also a model of P).

In the case of revision programs without annotations, a model of a program P is its unique P-justified revision. In the case of programs with annotations, the situation is slightly more complicated. The next result characterizes those models of an annotated revision program that are their own justified revisions.

Theorem 6. Let a T²-valuation B_I be a model of an annotated revision program P. Then B_I is a P-justified revision of itself if and only if B_I is a c-model of P.

As we observed above, in the case of programs without annotations, models of a revision program are their own unique justified revisions. This property does not hold, in general, in the case of annotated revision programs.

Example 3. Consider an annotated revision program P (with annotations belonging to T_{p,q}) consisting of the clauses: (out(a):{q}) ←
and
(in(a):{q}) ← (in(a):{q})
Consider a T²-valuation B_I such that B_I(a) = ⟨{q}, {q}⟩. It is easy to see that B_I is a c-model of P. Hence, B_I is its own justified revision (under P). However, B_I is not the only P-justified revision of B_I. Consider the T²-valuation B_R such that B_R(a) = ⟨∅, {q}⟩. We have P_{B_R|B_I} = {(out(a):{q}) ←}. Let us denote the corresponding necessary change, NC(P_{B_R|B_I}), by C. Then C(a) = ⟨∅, {q}⟩. Hence, (−C)(a) = ⟨{p}, {p, q}⟩ and ((B_I ⊗ −C) ⊕ C)(a) = ⟨∅, {q}⟩ = B_R(a). Consequently, B_R is a P-justified revision of B_I.
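The arithmetic of Example 3 can be replayed directly with the lattice operations (a sketch under the same set-based encoding as before; not code from the paper):

```python
X = frozenset({"p", "q"})

def join(u, v): return (u[0] | v[0], u[1] | v[1])
def meet(u, v): return (u[0] & v[0], u[1] & v[1])
def conflation(u): return (X - u[1], X - u[0])

B_I = (frozenset({"q"}), frozenset({"q"}))  # B_I(a) = <{q}, {q}>
C   = (frozenset(),      frozenset({"q"}))  # necessary change of the reduct {(out(a):{q}) <-}

assert conflation(C) == (frozenset({"p"}), frozenset({"p", "q"}))
B_R = join(meet(B_I, conflation(C)), C)     # (B_I (x) -C) (+) C
assert B_R == (frozenset(), frozenset({"q"}))  # = <empty, {q}>, as claimed
```

The asserts confirm each intermediate value stated in the example.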
The same behavior can be observed in the case of programs annotated with elements of other lattices.

Example 4. Let P be an annotated revision program (annotations belong to the lattice T_[0,1]) consisting of the rules: (out(a):1) ←
and
(in(a):0.4) ← (in(a):0.4)
Let B_I be a valuation such that B_I(a) = ⟨0.4, 1⟩. Then B_I is a c-model of P and, hence, it is its own P-justified revision. Consider a valuation B_R such that B_R(a) = ⟨0, 1⟩. We have P_{B_R|B_I} = {(out(a):1) ←}. Let us denote the necessary change NC(P_{B_R|B_I}) by C. Then C(a) = ⟨0, 1⟩ and (−C)(a) = ⟨0, 1⟩. Thus, ((B_I ⊗ −C) ⊕ C)(a) = ⟨0, 1⟩ = B_R(a). That is, B_R is a P-justified revision of B_I.

Note that in both examples the additional justified revision B_R of B_I is smaller than B_I with respect to the ordering ≤_k. This is not coincidental, as our next result demonstrates.

Theorem 7. Let B_I be a model of an annotated revision program P. Let B_R be a P-justified revision of B_I. Then, B_R ≤_k B_I.

Finally, we observe that if a consistent T²-valuation is a model (or a c-model; these notions coincide in the class of consistent valuations) of a program, then it is its unique justified revision.

Theorem 8. Let B_I be a consistent model of an annotated revision program P. Then, B_I is the only P-justified revision of itself.

To summarize, when we consider inconsistent valuations (they appear naturally, especially when we measure the beliefs of groups of independent experts), we encounter an interesting phenomenon. An inconsistent valuation B_I, even when it is a model of a program, may have different justified revisions. However, all these additional revisions must be less informative than B_I. In the case of consistent models this phenomenon does not occur: if a valuation B is consistent and satisfies P, then it is its unique P-justified revision.
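The same computation for Example 4, now over T_[0,1], where join is max, meet is min, and the complement of x is 1 − x (again only an illustrative sketch, not code from the paper):

```python
def join(u, v): return (max(u[0], v[0]), max(u[1], v[1]))
def meet(u, v): return (min(u[0], v[0]), min(u[1], v[1]))
def conflation(u): return (1.0 - u[1], 1.0 - u[0])  # -<alpha, beta> = <1-beta, 1-alpha>

B_I = (0.4, 1.0)   # B_I(a) = <0.4, 1>
C   = (0.0, 1.0)   # necessary change of the reduct {(out(a):1) <-}

assert conflation(C) == (0.0, 1.0)
B_R = join(meet(B_I, conflation(C)), C)   # (B_I (x) -C) (+) C
assert B_R == (0.0, 1.0)                  # = B_R(a), as claimed in Example 4
```

Note that B_R is below B_I in the knowledge ordering, illustrating Theorem 7.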
6
An Alternative Way of Describing Annotated Revision Programs and an Order-Isomorphism Theorem
We will now provide an alternative description of annotated revision programs. Instead of separately evaluating revision atoms (i.e., expressions of the form in(a) and out(a)), we will evaluate atoms. However, instead of evaluating revision atoms in T, we will evaluate atoms in T² (i.e., T × T). This alternative presentation will allow us to obtain a result on the preservation of justified revisions under order isomorphisms of T². This result is a generalization of the "shift theorem" of [MPT99].
An expression of the form a⟨α, β⟩, where ⟨α, β⟩ ∈ T², will be called an annotated atom (thus, annotated atoms are not annotated revision atoms). Intuitively, the atom a⟨α, β⟩ stands for both (in(a):α) and (out(a):β). An annotated rule is an expression of the form p ← q₁, …, q_n, where p, q₁, …, q_n are annotated atoms. An annotated program is a set of annotated rules. A T²-valuation B satisfies an annotated atom a⟨α, β⟩ if ⟨α, β⟩ ≤_k B(a). This notion of satisfaction can be extended to annotated rules and annotated programs.

We will now define the notions of reduct, necessary change, and justified revision for the new kind of program. The reduct of a program P with respect to two valuations B_I and B_R is defined in a manner similar to Definition 2. Specifically, we keep only the rules whose bodies are satisfied by B_R, and in the remaining rules we reduce the annotated atoms (except that now the transformation θ is no longer needed!). Next, we compute the least fixpoint of the operator associated with the reduced program. Finally, as in Definition 3, we define the concept of a justified revision of a valuation B_I with respect to a revision program P.

It turns out that this new syntax does not lead to a new notion of justified revision. Since we talk about two different syntaxes, we will use the term "old syntax" for the revision programs as defined in Section 2, and "new syntax" for the programs introduced in this section. Specifically, we now exhibit two mappings. The first of them, tr₁, assigns to each "old" in-rule

(in(a):α) ← (in(b₁):α₁), …, (in(b_m):α_m), (out(s₁):β₁), …, (out(s_n):β_n)

the "new" rule

a⟨α, ⊥⟩ ← b₁⟨α₁, ⊥⟩, …, b_m⟨α_m, ⊥⟩, s₁⟨⊥, β₁⟩, …, s_n⟨⊥, β_n⟩.

The encoding of an "old" out-rule

(out(a):β) ← (in(b₁):α₁), …, (in(b_m):α_m), (out(s₁):β₁), …, (out(s_n):β_n)

is analogous:

a⟨⊥, β⟩ ← b₁⟨α₁, ⊥⟩, …, b_m⟨α_m, ⊥⟩, s₁⟨⊥, β₁⟩, …, s_n⟨⊥, β_n⟩.
The translation tr₂, in the other direction, replaces a "new" rule by one in-rule and one out-rule. Specifically, a "new" rule

a⟨α, β⟩ ← a₁⟨α₁, β₁⟩, …, a_n⟨α_n, β_n⟩

is replaced by two "old" rules (with identical bodies but different heads)

(in(a):α) ← (in(a₁):α₁), (out(a₁):β₁), …, (in(a_n):α_n), (out(a_n):β_n)

and

(out(a):β) ← (in(a₁):α₁), (out(a₁):β₁), …, (in(a_n):α_n), (out(a_n):β_n).

The translations tr₁ and tr₂ can be extended to programs. We then have the following theorem.
Theorem 9. Both transformations tr₁ and tr₂ preserve justified revisions. That is, if B_I, B_R are valuations in T² and P is a program in the "old" syntax, then B_R is a P-justified revision of B_I if and only if B_R is a tr₁(P)-justified revision of B_I. Similarly, if B_I, B_R are valuations in T² and P is a program in the "new" syntax, then B_R is a P-justified revision of B_I if and only if B_R is a tr₂(P)-justified revision of B_I.

In the case of unannotated revision programs, the shifting theorem proved in [MPT99] shows that for every revision program P and every two initial databases B and B′ there is a revision program P′ such that there is a one-to-one correspondence between the P-justified revisions of B and the P′-justified revisions of B′. In particular, it follows that the study of justified revisions (for unannotated programs) can be reduced to the study of justified revisions of empty databases. We will now present a counterpart of this result for annotated revision programs.

The situation here is more complex. It is no longer true that a T²-valuation can be "shifted" to any other T²-valuation. However, the shift is possible if the two valuations are related to each other by an order isomorphism of the lattice of all T²-valuations. There are many examples of order isomorphisms on the lattice of T²-valuations. For instance, the mapping ψ : T² → T² defined by ψ(⟨α, β⟩) = ⟨β, α⟩ is an order isomorphism of T². In the case of a specific lattice T_X, other order isomorphisms of T²_X are generated by permutations of the set X. An order isomorphism on T² can be extended to annotated atoms, programs, and valuations. The extension to valuations is again an order isomorphism, this time on the lattice of all T²-valuations. The following result generalizes the shifting theorem of [MPT99].

Theorem 10. Let ψ be an order isomorphism on the set of T²-valuations.
Then B_R is a P-justified revision of B_I if and only if ψ(B_R) is a ψ(P)-justified revision of ψ(B_I).
7
Conclusions and Further Research
The main contribution of our paper is a new definition of the reduct (and hence of justified revision) for the annotated programs considered by Fitting in [Fit95]. This new definition eliminates some anomalies (specifically, the fact that the justified revisions of [Fit95] do not have to be models of the program). We also found that in cases where the intuition of [Fit95] is very clear (for instance, when the annotations are numerical degrees of belief), the two concepts coincide.

Due to the limited space of this extended abstract, some results were not included. Below we briefly mention two research areas that are not discussed here but that will be discussed in the full version of the paper.

First, annotated programs can be generalized to the disjunctive case, that is, to programs admitting "nonstandard disjunctions" in the heads of rules. It turns out that a definition of justified revisions by means of such programs is
possible, and one can prove that disjunctive revisions for programs whose heads consist of just one literal reduce to the formalism described above. Second, one can extend the formalism of annotated revision programs to the case when the lattice of annotations is not distributive. However, in such a case only some of the results discussed here still hold.
8
Acknowledgments
This work was partially supported by the NSF grants CDA-9502645 and IRI-9619233.
References

ALP+98. J.J. Alferes, J.A. Leite, L.M. Pereira, H. Przymusinska, and T.C. Przymusinski. Dynamic logic programming. In Proceedings of KR'98: Sixth International Conference on Principles of Knowledge Representation and Reasoning, Trento, Italy, pages 98–110. Morgan Kaufmann, 1998.
Fit95. M.C. Fitting. Annotated revision specification programs. In Logic Programming and Nonmonotonic Reasoning (Lexington, KY, 1995), volume 928 of Lecture Notes in Computer Science, pages 143–155. Springer-Verlag, 1995.
Fit99. M.C. Fitting. Fixpoint semantics for logic programming – a survey. Theoretical Computer Science, 1999. To appear.
Gin88. M.L. Ginsberg. Multivalued logics: a uniform approach to reasoning in artificial intelligence. Computational Intelligence, 4:265–316, 1988.
KS92. M. Kifer and V.S. Subrahmanian. Theory of generalized annotated logic programs and its applications. Journal of Logic Programming, 12:335–367, 1992.
LW92. V. Lifschitz and T.Y.C. Woo. Answer sets in general nonmonotonic reasoning. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning, KR'92, pages 603–614, San Mateo, CA, 1992. Morgan Kaufmann.
MPT99. W. Marek, I. Pivkina, and M. Truszczyński. Revision programming = logic programming + integrity constraints. In Computer Science Logic, 12th International Workshop, CSL'98, volume 1584 of Lecture Notes in Computer Science, pages 73–89. Springer, 1999.
MT95. W. Marek and M. Truszczyński. Revision programming, database updates and integrity constraints. In Proceedings of the 5th International Conference on Database Theory — ICDT 95, volume 893 of Lecture Notes in Computer Science, pages 368–382. Springer-Verlag, 1995.
MT98. W. Marek and M. Truszczyński. Revision programming. Theoretical Computer Science, 190(2):241–277, 1998.
PT97. T.C. Przymusinski and H. Turner. Update by means of inference rules. Journal of Logic Programming, 30(2):125–143, 1997.
RS70. H. Rasiowa and R. Sikorski. The Mathematics of Metamathematics. PWN—Polish Scientific Publishers, Warsaw, 1970.
Belief, Knowledge, Revisions, and a Semantics of Non-Monotonic Reasoning

Ján Šefránek

Institute of Informatics, Comenius University
Mlynská dolina, 842 15 Bratislava, Slovakia
[email protected]
Abstract. Przymusinski's Autoepistemic Logic of Knowledge and Belief (AELKB) is a unifying framework for various non-monotonic formalisms. In this paper we present a semantic characterization of AELKB in terms of Dynamic Kripke Structures (DKS). A DKS is composed of two components: a static one (a Kripke structure) and a dynamic one (a set of transformations). Transformations between possible worlds correspond to hypotheses generation and to revisions. Therefore, they make it possible to define a semantics of insertions into and revisions of AELKB-theories. A computation of the transformations (between possible worlds) is based on (an enhanced form of) model checking. The transformations may be used as a method of computing static autoepistemic expansions.

Keywords: non-monotonic reasoning, autoepistemic logic of knowledge and belief, dynamic Kripke structure, belief revision, model checking
1
Introduction
This paper aims to present Dynamic Kripke Structures (DKS, [10]) as a rather general tool for the semantic characterization of non-monotonic reasoning. A DKS consists of two components: a static one (a Kripke structure) and a dynamic one (a set of transformations between possible worlds).

The situations in which new knowledge is acquired and, as a consequence, a piece of knowledge (accepted before) should be revised are crucial from the non-monotonic reasoning point of view. DKS provide a semantic characterization of these situations. A transformation of one possible world to another represents a change in our knowledge. The transformation is defined on a set of possible worlds, and the set of possible worlds produced by the transformation represents the set of epistemic alternatives after the transformation (after some insertions and some revisions forced by the insertions).

The technical core of the paper is a semantic characterization of Przymusinski's Autoepistemic Logic of Knowledge and Belief [9] (AELKB) in terms of DKS. Moreover, DKS also provide a semantics of revisions (of knowledge and belief theories). A framework for belief revision of knowledge and belief theories was presented in [1].

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 63–77, 1999.
© Springer-Verlag Berlin Heidelberg 1999
Przymusinski augmented Moore's autoepistemic logic (employing the knowledge operator K) with an additional belief operator B.¹ Przymusinski's extension of AELK to AELKB reflects the intuition that besides reasoning about statements which are known to be true, we also need to reason about those statements that are only believed to be true. The semantics of the B operator is determined by minimal entailment (or, more generally, by a non-monotonic entailment). Expressibility is a strong point of AELKB: AELKB is a unifying framework for several major non-monotonic formalisms [9]. Therefore, a semantics of AELKB in terms of DKS supports the ambition to use DKS as a tool for a general semantic characterization of non-monotonic reasoning (with an incorporated belief revision).

The paper is organized as follows. First we describe the language and the basic concepts of AELKB (Sections 2 and 3). Thereafter, in Section 4, we define DKS. Section 5 reviews known results of [8] and [2] concerning characterizations of AELK and AELB in terms of Kripke structures. The results of this paper are presented in Section 6 (a possible-world semantics of AELKB), in Section 7 (insertions into knowledge and belief theories are characterized in terms of DKS, and it is outlined how to compute static autoepistemic expansions and how to use model checking as a method of computing transformations between possible worlds), and in Section 8 (a semantic specification of revisions of AELKB-theories and their computation by enhanced model checking is presented; the computation uses an idea of [3]).
2
Preliminaries
We assume a fixed propositional language L with the standard connectives (¬, ⇒, ∧, …), a countable set of propositional letters P = {p₁, …, p_n, …}, and a special propositional letter ⊥ denoting false. Propositional atoms, literals, and formulae are defined as usual.

Let L_A, an extension of L, be defined as follows. Two (modal) operators K and B are added to the set of symbols. Each atom, literal, and formula of L is an (objective) atom, literal, and formula of L_A, respectively. If φ is a formula of L_A, then Bφ, Kφ are (subjective) atoms, and Bφ, Kφ, ¬Bφ, and ¬Kφ are (subjective) literals of L_A. Each (subjective) literal of L_A is a formula of L_A. If φ and ψ are formulae of L_A, then φ ∧ ψ and ¬φ are formulae of L_A (L_A-formulae). The formulae which contain the K or B operators are called subjective formulae.

Definition 1 (Knowledge and belief theory, [9]) A knowledge and belief theory in L_A (AELKB-theory) is a (possibly infinite) set of formulae of the form

β₁ ∧ ⋯ ∧ β_k ∧ Bφ₁ ∧ ⋯ ∧ Bφ_l ∧ Kψ₁ ∧ ⋯ ∧ Kψ_m ⇒ α₁ ∨ ⋯ ∨ α_n ∨ Bχ₁ ∨ ⋯ ∨ Bχ_r ∨ Kτ₁ ∨ ⋯ ∨ Kτ_s,
¹ We will use the abbreviations AELKB, AELK, and AELB for the logics employing both operators, only K, and only B, respectively. There is a slight difference between the symbols/abbreviations introduced here and the usual usage.
Belief, Knowledge, Revisions, and a Semantics of Non-Monotonic Reasoning
where the αi, βi are propositional atoms and the φi, ψi, χi, τi are arbitrary formulae of LA. □ Let us denote by PLA the set of all atoms of LA. An interpretation of LA is a subset of PLA. A valuation of an LA-formula in an interpretation I may be defined precisely as the two-valued propositional valuation:
Definition 2 Let I be an interpretation:
– if φ is an atom (objective or subjective) of LA, then valI(φ) = 1 iff φ ∈ I,
– if φ is a literal ¬ψ, then valI(φ) = 1 iff ψ ∉ I,
– otherwise φ is a boolean combination of literals and valI(φ) is computed according to the rules for boolean combinations.
If X is a set of formulae, then valI(X) = 1 iff valI(φ) = 1 for each φ ∈ X, and we say that I is a model of X (X is satisfied in I). □
A convention: we will sometimes use an alternative notation for interpretations. If I is an interpretation of LA, it can be denoted by I ∪ N, where N = {¬φ : φ ∈ PLA \ I}.
Definition 3 Let us consider interpretations I, J. I ≺ J iff for each objective atom α: if α ∈ I, then α ∈ J. Let X be a set of interpretations and I ∈ X. Then I is minimal in X iff there is no J ∈ X (other than I) which coincides with I on subjective literals and is such that J ≺ I. If a formula φ is true in all minimal models of a knowledge and belief theory T, then we say that φ is minimally entailed by T (notation: T |=min φ). □
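As a concrete illustration (ours, not the paper's), the two-valued valuation of Definition 2 and the minimality test of Definition 3 can be prototyped over a small string encoding of LA-atoms; subjective atoms are written "K p" / "B p", and only atoms (not full subjective literals) are compared in the minimality test, a simplification.

```python
# Formulas as nested tuples (this encoding is ours, not the paper's): atoms
# are strings such as "p", "K q", "B q"; ("not", f) and ("and", f, g) build
# literals and boolean combinations.

def val(I, phi):
    """Two-valued valuation of Definition 2; I is a set of atoms of L_A."""
    if isinstance(phi, str):                 # objective or subjective atom
        return 1 if phi in I else 0
    if phi[0] == "not":                      # literal/negation
        return 1 - val(I, phi[1])
    if phi[0] == "and":                      # boolean combination
        return min(val(I, phi[1]), val(I, phi[2]))
    raise ValueError(f"unknown formula: {phi!r}")

def objective(I):
    return {a for a in I if not a.startswith(("K ", "B "))}

def subjective(I):
    return {a for a in I if a.startswith(("K ", "B "))}

def is_minimal(I, X):
    """Definition 3: I is minimal in X iff no other J in X agrees with I on
    the subjective atoms and has a strictly smaller objective part."""
    return not any(J != I and subjective(J) == subjective(I)
                   and objective(J) < objective(I) for J in X)

I1 = frozenset({"p", "K q"})
assert val(I1, ("and", "p", ("not", "r"))) == 1
X = [frozenset({"K q"}), I1]
assert is_minimal(frozenset({"K q"}), X) and not is_minimal(I1, X)
```

The last two assertions mirror the definition: among interpretations agreeing on K q, the one without the objective atom p is the minimal one.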
3
Static Autoepistemic Expansions
Truth values of the subjective atoms are independent of the truth values of their arguments. Intuitively, subjective atoms are true only if their arguments are known or believed. Evidence of what is known and/or believed can be represented by a set of subjective atoms (a belief set).2
Definition 4 Let I be an interpretation of propositional letters and S be a set of subjective atoms. We define a function val which assigns a value from the set {0, 1} to each pair (I, S) and each LA-formula:
– if φ is an objective atom, then valIS(φ) = 1 iff φ ∈ I
– if φ is a subjective atom, then valIS(φ) = 1 iff φ ∈ S
– if φ is ¬ψ, then valIS(φ) = 1 iff valIS(ψ) = 0
– if φ is ψ ∧ τ, then valIS(φ) = 1 iff valIS(ψ) = 1 and valIS(τ) = 1
2 Later we will use a more general notion of belief set. A decoupling of subjective and objective literals was used in Definition 3 (minimal interpretations).
J. Šefránek
Let S be fixed. We define valS(τ) = 1 iff valIS(τ) = 1 for each I. If X is a set of LA-formulae, then valIS(X) = 1 iff valIS(φ) = 1 for each φ ∈ X, and we say that I is a model of X (X is satisfied in I). □
In the following we will repeatedly use the scheme of Definition 4; the only point of difference will be how the set S is specified. We do not intend to use arbitrary belief sets. It is appropriate to restrict the possible belief sets somehow (a belief set should be a reasonable one). There is a variety of possibilities for such a restriction; some of them are used in this paper.
Definition 5 (Formulae derivable from an AELKB-theory, [9]) Let T be an AELKB-theory. We denote by CnA(T) the smallest set of formulae which contains the theory T and all instances of the Consistency Axiom ¬B⊥ and the Normality Axiom B(φ ⇒ ψ) ⇒ (Bφ ⇒ Bψ), and which is closed under propositional consequence and under the Necessitation Inference Rule: from φ infer Bφ. □
A consequence operator is a function which assigns a set of formulae to a set of formulae. We will use two consequence operators: CnA and CnPL (the propositional consequence operator). Each set of formulae derivable from an AELKB-theory is, in a sense, a reasonable belief set. The "introspective content" of an AELKB-theory T can be viewed as an AELKB-theory T∗, called a static autoepistemic expansion.
Definition 6 (Static autoepistemic expansion) A theory T∗ is called a static autoepistemic expansion (SAE) of a knowledge and belief theory T iff T∗ = CnA(T ∪ {Kφ : T∗ |= φ} ∪ {¬Kφ : T∗ ⊭ φ} ∪ {Bφ : T∗ |=min φ}) □
Notice that we distinguish three levels of a logical characterization of AELKB-theories:
– two-valued models (and CnPL-consequence)
– CnA-consequence
– static autoepistemic expansions
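The decoupled valuation of Definition 4 can be sketched as follows; the formula encoding (strings for atoms, subjective atoms prefixed "K " or "B ", tuples for ¬ and ∧) is our own simplification, not the paper's notation.

```python
def val_IS(I, S, phi):
    """Definition 4's decoupled valuation: objective atoms are read off the
    interpretation I, subjective atoms off the belief set S."""
    if isinstance(phi, str):
        if phi.startswith(("K ", "B ")):     # subjective atom: consult S
            return 1 if phi in S else 0
        return 1 if phi in I else 0          # objective atom: consult I
    if phi[0] == "not":
        return 1 - val_IS(I, S, phi[1])
    return min(val_IS(I, S, phi[1]), val_IS(I, S, phi[2]))

# "K p" is true because S says so, even though p itself is false in I:
# the truth value of a subjective atom is decoupled from its argument.
assert val_IS({"q"}, {"K p"}, "K p") == 1
assert val_IS({"q"}, {"K p"}, "p") == 0
assert val_IS({"q"}, {"K p"}, ("and", "K p", ("not", "p"))) == 1
```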
4
Dynamic Kripke Structures
We can now proceed to the central semantic construction used in this paper. First a rather general concept of Kripke structures is defined. (Later we will use some of its specializations.)
Definition 7 A Kripke structure is a triple (W, R, m), where W is a set of possible worlds, R = {ρ : ρ ⊆ W × W} is a set of accessibility relations and m is a (meaning) function assigning an interpretation to each possible world. □
Definition 8 A monoid is a triple (M, ◦, e), where M is a set, ◦ : M × M −→ M is an associative operation, e ∈ M, and e ◦ x = x = x ◦ e holds for every x ∈ M. □
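For a finite carrier, the monoid laws of Definition 8 can be checked by brute force; the following sketch (finiteness is our added assumption, the definition allows arbitrary sets) makes the three conditions explicit.

```python
from itertools import product

def is_monoid(M, op, e):
    """Brute-force check of Definition 8 on a finite carrier M."""
    closed = all(op(x, y) in M for x, y in product(M, M))
    assoc = all(op(op(x, y), z) == op(x, op(y, z))
                for x, y, z in product(M, M, M))
    unital = e in M and all(op(e, x) == x == op(x, e) for x in M)
    return closed and assoc and unital

# ({0, 1}, *, 1) is a monoid; subtraction is not even closed on {0, 1}.
assert is_monoid({0, 1}, lambda x, y: x * y, 1)
assert not is_monoid({0, 1}, lambda x, y: x - y, 0)
```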
We are ready to define DKS. The structure consists of a monoid part and a Kripke-structure part. The main idea is the transformation of possible worlds into possible worlds; the transformations are specified by monoid elements.
Definition 9 A dynamic Kripke structure is a pair (M, W), where M is a monoid and W is a Kripke structure, and for every x ∈ M there is a function3 fx : W −→ W such that fe is the identity mapping and for every x, y ∈ M and every w ∈ W, fx◦y(w) = fx(fy(w)). □
Dynamic Kripke structures were introduced in [10], together with a demonstration that database updates and the Closed World Assumption are expressible in terms of DKS. The motivation (and ambition) behind the concept is that DKS seem to provide a useful tool for a (unifying) semantic characterization of non-monotonic reasoning. The proposed approach is based on the belief that non-monotonicity is a consequence of some fundamental properties4 of hypothetical and context-dependent reasoning, and of belief revision. A close relationship between belief revision and inference is emphasized. The most significant feature of DKS is the transformations between possible worlds. The transformations correspond intuitively to hypothesis generation and to revisions (a hypothesis may be true in the image world but not in the source world, and vice versa). Sometimes the accessibility relation is "changed" by a transformation (more precisely: for worlds w1, w2, an accessibility relation ρ and a transformation f, it may hold that (w1, w2) ∈ ρ and (f(w1), f(w2)) ∉ ρ, or vice versa). Therefore, if a consequence operator Cn depends on the accessibility relation, then a transformation results in a non-monotonic Cn. In a sense, DKS is a construction explaining the non-monotonicity of reasoning. From the DKS point of view: if non-monotonicity is a symptom, then the addition of hypotheses and the revisions forced by the addition are the essence (of non-standard, hypothetical, context-dependent reasoning).
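A toy instance of Definition 9 (our construction, not one from the paper) takes sets of atoms as the monoid, composed by union and with the empty set as unit ("insert nothing"), acting on worlds by filtering out interpretations; the two action laws can then be verified exhaustively.

```python
from itertools import combinations

# Worlds are sets of interpretations over atoms {"a", "b"}; f_u keeps the
# interpretations of w that contain every atom of u.
atoms = ("a", "b")
Int = [frozenset(c) for r in range(3) for c in combinations(atoms, r)]
W = [frozenset(s) for r in range(len(Int) + 1) for s in combinations(Int, r)]

def f(u, w):
    return frozenset(I for I in w if u <= I)

e = frozenset()
elems = [frozenset(), frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]
for w in W:
    assert f(e, w) == w                          # f_e is the identity
    for u in elems:
        for v in elems:
            assert f(u | v, w) == f(u, f(v, w))  # f_{x∘y} = f_x ∘ f_y
```

The second assertion holds because filtering by u ∪ v is the same as filtering by v and then by u, which is exactly the compatibility of the action with the monoid operation.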
5
Possible World Semantics
A characterization of AELK in terms of Kripke structures was given by Moore, see [8]. Similarly, Kripke structures were used as a tool for a characterization of AELB in [2]. In this section we summarize the results of [8] and [2], particularly a characterization of the SAE of AELK- and AELB-theories in terms of Kripke structures. Let us restrict the language LA in such a way that we do not use belief atoms (knowledge atoms) of the form Bφ (Kφ). We denote the resulting language by LAK (LAB).5 We will refer to the formulae of these languages as LAK- (LAB-) formulae.
3 It is said that there is an action of M on W.
4 ". . . non-monotonic behaviour . . . is a symptom, rather than the essence of nonstandard inference", see [11].
5 That is, LAK = {φ ∈ LA : the B operator does not occur in φ}; similarly for LAB.
Definition 10 An AELK- (AELB-) theory TAK (TAB) in LAK (LAB) is a K- (B-) restriction of an AELKB-theory T iff TAK = {φ ∈ T : φ is an LAK-formula} (TAB = {φ ∈ T : φ is an LAB-formula}). □
5.1
Possible World Semantics for AELK
Definition 11 A complete S5-frame is a Kripke structure (W, ρ) such that ρ = W × W.6 □
Each possible world is accessible from each possible world in a complete S5-frame, and a complete S5-frame is uniquely determined by its set of possible worlds W.
Definition 12 A set S of LAK-formulae is stable iff
– S = CnPL(S)
– if φ ∈ S, then Kφ ∈ S
– if φ ∉ S, then ¬Kφ ∈ S □
We now introduce a specialization of Definition 4. Let M be a complete S5-frame and let w ∈ W be an interpretation of propositional letters (as introduced in Definition 2). We will use the function valwS as defined in Definition 4, but with S = {Kφ : (∀w ∈ W) valwS(φ) = 1} ∪ {¬Kφ : (∃w ∈ W) valwS(φ) = 0}. Note that we use valIM as a synonym of valwS.
Let us recall that a formula φ is true in a complete S5-frame M if valwS(φ) = 1 for each w ∈ W; notation: valS(φ) = 1 or, alternatively, valM(φ) = 1.
Theorem 1 ([8], [7]) A set of LAK-formulae S is stable iff S is the set of all LAK-formulae which are true in some complete S5-frame. □
We can now define an interpretation consisting of two components: one is an ordinary propositional interpretation, the second is a complete S5-frame (a reasonable belief set is the set of all formulae satisfied in a complete S5-frame).
Definition 13 A possible-world autoepistemic interpretation is a pair PW = (I, M), where I is an ordinary interpretation of the propositional letters of L and M is a complete S5-frame. □
A possible-world model is defined in the obvious way.
Definition 14 Let X be a set of formulae and φ a formula. X |=PW φ iff φ is true in every possible-world model of X. □
6 This is our first special case of Kripke structures. For simplicity, we use the symbol ρ instead of {ρ}, and we identify the set of possible worlds with a set of interpretations: possible worlds are interpretations. (Formally, the function m is the identity, but we omit an explicit record of this function.)
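The belief set S induced by a complete S5-frame (Kφ for formulas true at every world, ¬Kφ otherwise) can be sketched as follows; restricting attention to a finite candidate set of formulas and delegating valuation to a callback are simplifications of ours.

```python
def stable_kernel(worlds, candidates, val):
    """Collect Kφ for each candidate φ true in every world of the frame,
    and ¬Kφ (encoded ("notK", φ)) for each candidate false somewhere."""
    S = set()
    for phi in candidates:
        if all(val(w, phi) == 1 for w in worlds):
            S.add(("K", phi))
        else:
            S.add(("notK", phi))
    return S

# A two-world complete S5-frame: every world sees every world, so only the
# set of worlds matters, as noted after Definition 11.
worlds = [{"p", "q"}, {"p"}]
val = lambda w, atom: 1 if atom in w else 0
S = stable_kernel(worlds, ["p", "q"], val)
assert ("K", "p") in S and ("notK", "q") in S
```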
We are now able to express a characterization of the SAE of AELK-theories in terms of possible-world interpretations.
Theorem 2 ([7]) Let T be an AELK-theory. A set S of LAK-formulae is a K-restriction of a SAE of T iff S = {φ : (T ∪ {Kψ : ψ ∈ S0} ∪ {¬Kψ : ψ ∈ L0 \ S0}) |=PW φ}, where S0 is the set of all objective formulae from S and L0 is the set of all objective formulae from LAK. □
5.2
Possible World Semantics for AELB
B-restrictions of SAE can also be characterized in terms of Kripke structures. The result is due to [2]. Let K be a Kripke structure (W, ρ), where W is a set of propositional interpretations (a set of sets of objective literals). The functions valwS and valS are defined as above, where for each w ∈ W, S = {Bφ : ∃w′ ((w, w′) ∈ ρ ∧ valw′(φ) = 1)}. We will also write valK and valwK instead of valS and valwS.
Theorem 3 ([2]) Let T be an arbitrary AELB-theory and (W, ρ) be a Kripke structure satisfying
– for every w ∈ W there is a w′ ∈ W such that w ρ w′
– each w ∈ W is a model of T
– for all w, w′ ∈ W such that w ρ w′, w′ is a minimal model of T
Then T∗ = {φ ∈ LAB : (∀w ∈ W) valwK(φ) = 1} is a B-restriction of a SAE of T. □
6
AELKB-structures
We are now ready to construct an appropriate Kripke structure which enables a characterization of the SAE of (full) AELKB-theories. The possible worlds of our Kripke structures are complete S5-frames, and one of the accessibility relations leads to minimal models. In what follows we assume only a language LA with a finite set of propositional letters and finite sets of finite interpretations.7 There are two reasons for the limitation to finite structures. First, we are interested in a correspondence between sets of models (possible worlds) and AELKB-theories (sets of all LA-formulae true in the given possible world). But there is a countable set M of propositional models8 such that there is no set S of LA-formulae for which M is the set of all models of S, see [4]. Only for finite sets of propositional models does the following hold: if w is a (finite) set of models, then there is a set S of LA-formulae such that w is the set of all models of S. Second, we propose model checking as a computational method for DKS, therefore the limitation to finite structures is a natural one.
7 We consider only relevant interpretations.
8 Note that we use the concept of two-valued interpretations (models) as defined in Definition 2.
Definition 15 (AELKB-structure) Let Int be the set of all interpretations of an AELKB-theory in a language LA. An AELKB-structure is a triple (W, R, m), where W = P(Int) is the set of all subsets of Int, R = {ρ1, ρ2}, ρ1 = {(w, w′) : w ≠ w′ ∧ (∃I ∈ Int) w = w′ ∪ {I}}, and ρ2 = {(w, w′) : w′ = {I : I is minimal in w}}. Finally, m is defined as follows:9
– for an objective formula φ: mw(φ) = 1 if (∀I ∈ w) valI(φ) = 1, mw(φ) = 0 if (∀I ∈ w) valI(φ) = 0, and otherwise mw(φ) = 1/2
– mw(Kφ) = 1 iff mw(φ) = 1
– mw(¬Kφ) = 1 iff mw(φ) ≠ 1
– mw(Bφ) = 1 iff (w, w′) ∈ ρ2 → mw′(φ) = 1, and otherwise mw(Bφ) = 0
– if φ and ψ are LA-formulae, then mw(¬φ) = 1 − mw(φ) and mw(φ ∧ ψ) = min{mw(φ), mw(ψ)}.
If T is a knowledge and belief theory, then mw(T) = 1 iff (∀φ ∈ T) mw(φ) = 1. □
Note that a three-valued valuation of objective formulae was defined. We motivate this decision as follows: each consistent SAE of an arbitrary AELKB-theory T contains exactly one of the complementary literals Kφ, ¬Kφ for each LA-formula φ. Therefore, we have to define mw in such a way that for each formula φ either Kφ or ¬Kφ holds. However, if neither φ nor ¬φ is true in every interpretation of w, then it is natural to accept both mw(¬Kφ) = 1 and mw(¬K¬φ) = 1. This means that we have to introduce a third truth value. Two-valued valuations are used for subjective formulae.
Notation: Let T be an AELKB-theory and w be a set of models. We denote by Mod(T) the set of all models of T and by Th(w) the set of all formulae true in each model of w. Obviously, Th(w) = CnA(Th(w)) and w = Mod(Th(w)).
Theorem 4 Let T be an AELKB-theory, K = (W, {ρ1, ρ2}, m) be an AELKB-structure and wT ∈ W be the set of all models of T. Let w⊥ be the empty set of interpretations. Then there is a possible world w′ ∈ W, where w⊥ ⊆ w′ ⊆ wT, such that T∗ = {φ : mw′(φ) = 1} is a SAE of T. □
Proof Sketch: First we prove that T∗ = {φ : mw′(φ) = 1} ⊆ CnA(T ∪ {Kφ : T∗ |= φ} ∪ {¬Kφ : T∗ ⊭ φ} ∪ {Bφ : T∗ |=min φ}). Let mw′(φ) = 1. If φ is of the form Kψ, then mw′(ψ) = 1. This means that each model of T∗ is a model of ψ; therefore φ ∈ {Kτ : T∗ |= τ}. Similarly for φ = ¬Kψ and φ = Bψ. The closure of T∗ under CnPL and under CnA is obvious. Finally, each subset of wT satisfies T, so w′ satisfies T as well. As a consequence, T ⊆ T∗. This means that T∗ is a subset of a SAE. Conversely, assume φ ∈ CnA(T ∪ {Kφ : T∗ |= φ} ∪ {¬Kφ : T∗ ⊭ φ} ∪ {Bφ : T∗ |=min φ}). It is straightforward to show that mw′(φ) = 1. □
Of course, if T∗ is a consistent SAE of T, then w⊥ ⊂ w′ ⊆ wT.
9 m assigns an interpretation m(w) to each possible world w. We denote the application of the interpretation m(w) to a formula φ by mw(φ).
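A minimal sketch of the three-valued valuation mw of Definition 15, restricted to objective formulas and K-atoms; the two-valued valuation of single interpretations is supplied by the caller, and the encoding is ours.

```python
from fractions import Fraction

def m_w(w, phi, val):
    """Three-valued valuation at world w (a set of interpretations):
    1 if phi is true in every I in w, 0 if false in every I, 1/2 otherwise."""
    values = {val(I, phi) for I in w}
    if values == {1}:
        return Fraction(1)
    if values == {0}:
        return Fraction(0)
    return Fraction(1, 2)

def m_K(w, phi, val):
    """m_w(Kφ) = 1 iff m_w(φ) = 1 (so m_w(¬Kφ) = 1 exactly when this is 0)."""
    return 1 if m_w(w, phi, val) == 1 else 0

w = [{"p"}, {"p", "q"}]
val = lambda I, atom: 1 if atom in I else 0
assert m_w(w, "p", val) == 1
assert m_w(w, "q", val) == Fraction(1, 2)   # q is neither known nor refuted
assert m_K(w, "p", val) == 1 and m_K(w, "q", val) == 0
```

Note how the middle value 1/2 is exactly what forces both ¬Kq and ¬K¬q to come out true, the motivation given in the text for the third truth value.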
The theorem provides an existential characterization of AELKB-theories (and their SAE) in terms of Kripke structures (AELKB-structures). Our next goal is to present a more constructive method of SAE characterization. Let wT be the set of all models of an AELKB-theory T. Consider two sets of formulae: S = {φ : mwT(φ) = 1} and CnA(T). The next example shows that S \ CnA(T) ≠ ∅ for some AELKB-theories.
Example 1 ([9]) Let T be {B¬b ∧ B¬f ⇒ r, ¬Kb ∧ ¬Kf ⇒ d}. Some of the members of wT are ν1 = {B¬b, B¬f, ¬Kb, ¬Kf, r, d, b, f} and ν2 = {¬B¬b, ¬B¬f, Kb, Kf, ¬r, ¬d, ¬b, ¬f}. Hence, mwT(¬Kb) = 1 and mwT(¬Kf) = 1, but ¬Kb ∉ CnA(T) and ¬Kf ∉ CnA(T). Let wT′ be the set of all minimal models of T. If ν ∈ wT′, then ¬b ∈ ν and ¬f ∈ ν. Therefore, mwT(B¬b) = 1 = mwT(B¬f), but B¬b, B¬f ∉ CnA(T). □
A non-empty S \ CnA(T) may contain literals of two forms: ¬Kφ or Bφ. Intuitively, the function mw generates two kinds of (defeasible) hypotheses (sentences which do not belong among the CnA-consequences of T): belief formulae and introspective formulae stating that something is not known. It remains to show that we can provide a more constructive method of SAE characterization. Next we define a monotonic mapping on a complete lattice.
Theorem 5 Let W = P(Int). Then the mapping Φ : W −→ W defined as Φ(w) = Mod({φ : mw(φ) = 1}) is monotonic.10
Proof: If w ⊆ w′, then {φ : mw(φ) = 1} ⊇ {φ : mw′(φ) = 1}, i.e. Mod({φ : mw(φ) = 1}) ⊆ Mod({φ : mw′(φ) = 1}). □
Remark 1 From the monotonicity it follows that Φ has a least fixpoint and a greatest fixpoint. We are now able to give a deeper characterization of SAE.
Theorem 6 Let T be an AELKB-theory, K = (W, {ρ1, ρ2}, m) be an AELKB-structure and wT ∈ W be the set of all models of T. Then for each possible world w ∈ W with w⊥ ⊆ w ⊆ wT the following holds: if Φ(w) = w, then CnA(Th(w)) is a SAE of T (we will say that w determines a SAE of T).
There is a naive (and inefficient) method of verifying whether some possible world w determines a SAE of T.
10 There is a relation between Φ and the belief closure operator ΨT of [9]. A forthcoming paper devoted to a more detailed study of computational aspects will discuss the relation.
Definition 16 Let w0, . . . , wk be a sequence of possible worlds such that for each i = 0, . . . , k − 1, (wi, wi+1) ∈ ρ1. We say that the sequence is a ρ1-path. □
Obviously, for each pair (wi, wj) with i < j, wj ⊆ wi. The naive method consists in searching all ρ1-paths and checking, for each w on a ρ1-path, whether {φ : mw(φ) = 1} is satisfied in w. A more promising method consists in (non-deterministically) selecting some formulae from the set S \ CnA(T), inserting them into T, and verifying whether the insertion leads to a SAE of T. In simple cases the first attempt is successful:
Example 2 Let us return to Example 1. ¬Kb, ¬Kf, B¬b, B¬f ∈ S \ CnA(T). If T′ = T ∪ {¬Kb, ¬Kf, B¬b, B¬f} and wT′ is the set of all models of T′, then wT′ is a fixpoint of Φ, hence CnA(Th(wT′)) is a SAE of T. □
In general, some iteration of insertions is needed; we outline a recursive procedure later. We have seen that a computation of SAE consists in some insertions into T and checking whether a possible world (the set of all models of the extended theory) is a fixpoint of Φ. We are now motivated to study insertions into AELKB-theories. Moreover, a semantic characterization of insertions is interesting in its own right: insertions exhibit the non-monotonic features of autoepistemic theories (or, more generally, of any knowledge representation framework).
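The naive method can be sketched as a brute-force enumeration: since every step of a ρ1-path removes one interpretation, the worlds reachable from wT are exactly the subsets of wT. Computing the real Φ(w) = Mod({φ : mw(φ) = 1}) needs a propositional reasoner, so the sketch takes Φ as a caller-supplied parameter (our simplification).

```python
from itertools import combinations

def fixpoints_below(w_T, Phi):
    """Enumerate every w ⊆ w_T and keep those with Φ(w) = w; by Theorem 6
    each such w determines a SAE of T."""
    models = sorted(w_T, key=str)
    found = []
    for r in range(len(models) + 1):
        for c in combinations(models, r):
            w = frozenset(c)
            if Phi(w) == w:
                found.append(w)
    return found

# With the constant map Φ(w) = w_T, the only fixpoint is w_T itself.
w_T = frozenset({frozenset({"a"}), frozenset({"b"})})
assert fixpoints_below(w_T, lambda w: w_T) == [w_T]
```

This is inefficient (2^|wT| candidates), which is exactly why the text moves on to the more promising insertion-based method.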
7
Dynamic AELKB-structures
In this section we provide a characterization of insertions into AELKB-theories in terms of dynamic Kripke structures. Let us begin with a continuation of Example 1:
Example 3 Let T again be {B¬b ∧ B¬f ⇒ r, ¬Kb ∧ ¬Kf ⇒ d}. Let us insert into T the formula b ∨ f, i.e. T′ = T ∪ {b ∨ f}. If wT is the set of all models of T and wT′ is the set of all models of T′, then wT′ does not contain the models from wT with both ¬b and ¬f. Therefore the set of minimal models of T′ is changed, and the corresponding B-consequences, too. The change may be specified by a transformation. If w is a possible world, then fb∨f(w) is the possible world w′ = {I ∈ w : valI(b ∨ f) = 1}, i.e. fb∨f(w) is the set of all models from w which satisfy b ∨ f (obviously, if there is no such model, then fb∨f(w) = w⊥). □
Our characterization of insertions in terms of dynamic Kripke structures is based on some well-known relations between sets of models and sets of formulae. The relations also provide, in a sense, a connection between insertions into theories and the corresponding models. They are expressed by the following facts:
Fact 1 Let T be an AELKB-theory and w = Mod(T). Let w′ be a set of models with w ⊇ w′. Then w′ = Mod(T ∪ T′), where T′ is a set of LA-formulae.
Therefore, a function f defined on W such that f(w) ⊆ w is a promising candidate for an appropriate transformation of DKS.
Fact 2 Let T, T′ be AELKB-theories such that T ⊂ T′. If w = Mod(T) and w′ = Mod(T′), then there is a ρ1-path from w to w′.
We can now propose a DKS: a transition from a possible world to another possible world should correspond to an insertion of formulae into theories (and vice versa). Our next goal is to define an appropriate monoid (and the corresponding transformations). The basic intuitions: U, a set of insertions, will be represented by a set of formulae. We propose U as a monoid: a concatenation of two insertions is an insertion, the concatenation of insertions is associative, and the insertion of no proposition plays the role of the unit (of the monoid). To each monoid member a mapping from possible worlds to possible worlds is assigned (see Example 3). Let w be a set of interpretations and fu(w) = w′ for some u. We need the transformation to be defined in a unique way: if u ≡ v, then fu(w) = fv(w) should hold for each w ∈ W.
Definition 17 Let u be an LA-formula and [u]≡ = {x ∈ LA : x ≡ u}. We assume a selection function σ that assigns to each [u]≡ exactly one representative.
Definition 18 (i-monoid) Let U = {u : ∃[u]≡ σ([u]≡) = u} be the set of representatives. We define a monoid (called an i-monoid) U over U: for u, v ∈ U let u ◦ v = σ([u ∧ v]≡). Clearly, the operation ◦ is associative, and the empty formula ε plays the role of the monoid unit: u ◦ ε = u = ε ◦ u for each u ∈ U. By convention we may consider ε the representative of the class of all propositional tautologies.
Definition 19 (Dynamic AELKB-Structure) A dynamic AELKB-structure is a pair (U, K), where K = (W, {ρ1, ρ2}, m) is an AELKB-structure and U is an i-monoid. The action of the monoid U on W is defined as follows: for u ∈ U, fu(w) = w′ = Mod(CnA(Th(w) ∪ {u})).
Of course, w′ is the (unique) value of fu(w):
Fact 3 Let T be an AELKB-theory and K = (W, {ρ1, ρ2}, m) be an AELKB-structure. Let w ∈ W be the set of all models of T and T′ = T ∪ {u}. Then there is exactly one w′ in W such that w′ = Mod(CnA(T′)).
Fact 4 Let a dynamic AELKB-structure be given. The following hold:
– fε(w) = w (ε the empty insertion)
– fu◦v(w) = fu(fv(w))
– if fu◦v(w) = w′, then Th(w′) = CnA(Th(w) ∪ {u ∧ v}) = CnA(Th(w) ∪ {u} ∪ {v})
We are ready to outline an insertion-based procedure for computing SAE. Let an AELKB-theory T and a corresponding dynamic AELKB-structure K be given, and let wT ∈ W be the set of all models of T.
– select a hypothesis h from S \ CnA(T)
– compute fh(wT) = w′
– if w′ is a fixpoint of Φ, then return the computed SAE (and search for another SAE); otherwise select a hypothesis h′ from S′ \ CnA(T′), where S′ = {φ : mw′(φ) = 1} and T′ = CnA(T ∪ {h}),11 and continue the (recursive) computation
Remark 2 Backtracking is assumed; it may be useful to revise the initial selection. For example, a premature selection of formulae of the form ¬Kφ sometimes leads directly to the construction of an inconsistent SAE.
It remains to show that the computation of fh may be based on model checking. Let an AELKB-theory T and a possible world wT = Mod(T) be given. We can use (an adaptation of) the model checking algorithm of [5]12 in order to compute the value of fh(wT). We search through all ρ1-paths (breadth-first search is necessary) until we find a possible world w such that mw(h) = 1.13 Therefore, fh(wT) = w and CnA(Th(w)) = CnA(T ∪ {h}).
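The insertion-based procedure can be sketched as a recursive skeleton in which all of the logical work (hypothesis selection, fh, the fixpoint test) is delegated to caller-supplied functions (our simplification; a real implementation would need a reasoner for CnA and Φ).

```python
def compute_sae(w, hypotheses, f, is_fixpoint, found):
    """Backtracking skeleton of the procedure above: hypotheses(w) yields
    candidate formulas from S \\ Cn_A(T), f(h, w) computes f_h(w),
    is_fixpoint(w) tests Φ(w) = w, and found(w) records a world that
    determines a SAE. Exhaustive branching realizes the backtracking."""
    if is_fixpoint(w):
        found(w)
        return
    for h in hypotheses(w):
        compute_sae(f(h, w), hypotheses, f, is_fixpoint, found)

# Toy run: each "insertion" removes one element; singletons play fixpoints.
results = []
compute_sae(frozenset({1, 2, 3}),
            hypotheses=lambda w: sorted(w),
            f=lambda h, w: w - {h},
            is_fixpoint=lambda w: len(w) == 1,
            found=results.append)
assert set(results) == {frozenset({1}), frozenset({2}), frozenset({3})}
```

Termination follows because each recursive call strictly shrinks the world, mirroring the fact that ρ1-paths only remove interpretations.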
8
Revisions
Finally, we give a characterization of revisions in terms of DKS. The power of AELKB (more precisely, of AELB) is also demonstrated by the belief revision framework presented in [1]. We try to use DKS as a tool for the specification and computation of revisions, and we compare the results reached here with the results of [1]. In what follows we assume only AELB-theories.
Example 4 ([1]) Let T = {B¬broken ⇒ runs}. The set of all models of T is w = {{B¬b, r, b}, {B¬b, r, ¬b}, {¬B¬b, r, b}, {¬B¬b, r, ¬b}, {¬B¬b, ¬r, b}, {¬B¬b, ¬r, ¬b}}.
11 If S′ \ CnA(T′) = ∅, then w′ is a fixpoint of Φ.
12 Symbolic model checking may be used in real applications.
13 The relation ρ1 allows us to define a semantics of branching time. From this point of view, the application of the algorithm consists in checking the formula EF h, which means that there is some ρ1-path from wT to some w such that h holds at w.
Let u = {¬runs}. It holds that fu(w) = w′, where w′ = {{¬B¬b, ¬r, b}, {¬B¬b, ¬r, ¬b}} is the set of all models of T′ = T ∪ {¬runs}. The set of minimal models is w′min = {{¬B¬b, ¬r, ¬b}}. Hence T′∗, the only SAE of T′, is inconsistent: T′ |=min ¬broken, B¬broken ∈ T′∗, T′∗ |= ¬runs ∧ runs. □
We have seen that the semantics of minimal models has some undesirable consequences in a context of incomplete information. In our example the inconsistency was caused by the hypothesis B¬broken. The hypothesis is a member of a SAE (defined in the standard way). It seems that we need an idea of reasonable hypotheses modified with respect to SAE and minimal entailment. Our proposal consists in a reconstruction of the given dynamic AELKB-structure. The SAE of the reconstructed structure satisfies our intuitive requirements.
Example 5 Consider the reason for the inconsistency observed in Example 4. The minimal model {¬B¬b, ¬r, ¬b} is, in a sense, a pathological one. It contains the pair (¬B¬b, ¬b), which we call a gang (following [6]), with a potential conflict between claiming ¬b and disbelieving ¬b. We repair the pathology using a technique of [3]. The essence of the technique is a modification of the accessibility relation ρ2. The modification consists in removing the pair (w′, wmin) from ρ2 and inserting an improvement of the pair into ρ2. The goal of the improvement is a minimization of undesirable consequences. The basic idea of the improvement is to replace the gang by a more rational choice. For example, the more rational choice may be wrat = {¬B¬b, ¬r, b} (the interpretation {B¬b, ¬r, ¬b} is not a model of T′). Therefore, we may insert (w′, wrat) into ρ2. After the revision of ρ2 (the new ρ2′ is (ρ2 \ {(w′, wmin)}) ∪ {(w′, wrat)}), T′ |=min b holds, therefore Bb ∈ T′∗ and T′∗ ⊭ ¬r ∧ r.
I is called rational iff each of the following rationality conditions is satisfied: Kφ ∈ I ⇒ φ ∈ I Bφ ∈ I ⇒ φ ∈ I ¬Kφ ∈ I ⇒ φ 6∈ I ¬Bφ ∈ I ⇒ φ 6∈ I Definition 21 Let φ be an objective and ψ a subjective literal. A gang is a pair of literals (φ, ψ) such that it does not satisfy a rationality condition. 0 0 A rational modification of a gang (φ, ψ) is a pair of literals (φ , ψ) or (φ, ψ ), 0 0 where φ is a complementary literal to φ and ψ to ψ. If an interpretation I of an AELKB-theory T contains a gang, then a repair of I is a set S of interpretations J such that some14 gangs of I are in J replaced by their rational modifications and each J is a model of T . Let an AELKB-structure K = (W, {ρ1 , ρ2 }, m) be given. Another AELKB0 0 structure K = (W, {ρ1 , ρ2 }, m)15 is called a reconstruction of K, if there is at 14
15
14 There is a freedom in improving the impact of a gang. Our goal is not to use only rational interpretations (in order to avoid some non-intuitive consequences).
15 The only difference between K and K′ is in the accessibility relation: ρ2 ≠ ρ2′.
least one pair (w1, w2) ∈ ρ2 \ ρ2′ and a possible world w2′ containing a repair of an interpretation I ∈ w2 such that (w1, w2′) ∈ ρ2′ \ ρ2.
A computation of a reconstruction (of an AELKB-structure): if the model checking algorithm yields w⊥ for some u and w, we can proceed as follows. Let wrat contain a repair of an interpretation I ∈ wmin. Repeat: put ρ2 := (ρ2 \ {(w, wmin)}) ∪ {(w, wrat)} and compute the SAE again (until a consistent SAE is obtained).
A summary: we do not change the concept of SAE, but the underlying semantic structure is changed. The modified AELKB-structure determines a modification of (minimal) entailment. Therefore, the set of derivable hypotheses of the form Bφ is changed. The reasoning specified by the semantics can be called dynamic preferential entailment. (If some facts from the knowledge base contradict derivable beliefs, then we modify the given semantic specification of the entailment.)
We now compare our results concerning the revisions of AELB-theories with the results of [1]. A concept of careful SAE is introduced in [1]: first we define Y / X as Z, if Z is a maximal subset of X such that Y ∪ Z is consistent; otherwise Y / X is ∅.
Definition 22 A careful static autoepistemic expansion of an AELB-theory T is T∗ = CnA(T ∪ (T∗ / {Bφ : T∗ |=min φ})). A set R(T∗) = {φ : (T∗ |=min φ) ∧ (Bφ ∉ T∗)} is called a revision set.
The next theorem corresponds to the Fundamental Theorem of Belief Revision of [1].
Theorem 7 Let K = (W, {ρ1, ρ2}, m) be an AELKB-structure. Let T be a consistent AELB-theory and wT = Mod(T). Then the following holds: if {φ : mwT(φ) = 1}, where mwT is computed according to K, is an inconsistent SAE of T, then there is an AELKB-structure K′, a reconstruction of K, such that for mwT computed according to K′ the set {φ : mwT(φ) = 1} is a careful SAE of T. Conversely, if a careful SAE of T is given, we can compute it as a SAE specified by a reconstruction of the corresponding AELKB-structure. □
Theorem 8 Let T, K and wT be as in Theorem 7. Let T∗ be a careful SAE of T. Then there is a K′ = (W, {ρ1, ρ2′}, m), a reconstruction of K, such that T∗ = {φ : mwT(φ) = 1}, where mwT is computed according to K′.
Proof Sketch: Let T be an AELKB-theory and (wT, w′) ∈ ρ2. Select a literal φ ∈ R(T∗) and make a repair S of a model I from w′ such that mwrat(φ) ≠ 1, where wrat = w′ \ {I} ∪ S. Reconstruct the underlying AELKB-structure. Repeat until T∗ = {φ : mwT(φ) = 1}. □
Finally, a remark concerning a comparison of the presented approach with the other results of [1]: both the belief revision by theory change and the belief completion of [1] may be simulated by modifying the wT-component of pairs (wT, wmin) ∈ ρ2 (by transforming wT to f(wT), the set of all models of the changed theory).
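As a small self-contained illustration (the string encoding of literals is ours, not the paper's), the rationality conditions of Definition 20, and hence the gang of Example 5, can be checked mechanically.

```python
def is_rational(I):
    """Rationality conditions of Definition 20 over string-encoded literals:
    'p' / '-p' are objective literals; 'K p', 'B p', '-K p', '-B p' are the
    subjective ones."""
    for lit in I:
        if lit.startswith(("K ", "B ")) and lit[2:] not in I:
            return False                      # Kφ/Bφ ∈ I but φ ∉ I
        if lit.startswith(("-K ", "-B ")) and lit[3:] in I:
            return False                      # ¬Kφ/¬Bφ ∈ I but φ ∈ I
    return True

# The pathological minimal model of Example 5 contains the gang (¬B¬b, ¬b):
assert not is_rational({"-B -b", "-r", "-b"})
# The "more rational choice" w_rat of Example 5 passes:
assert is_rational({"-B -b", "-r", "b"})
```

A repair in the sense of Definition 21 would replace the offending pair by one of its rational modifications until this check succeeds.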
9
Conclusions
Summary of the results: we have introduced AELKB-structures and provided a characterization of static autoepistemic expansions (SAE) of AELKB-theories in terms of AELKB-structures. A method of computing the SAE of AELKB-theories was outlined. Further, a DKS-characterization of insertions into AELKB-theories (together with a corresponding computation using model checking) was presented. Finally, a characterization of revisions of AELKB-theories (and a computation using enhanced model checking) was described. The approach of Section 8 motivates a generalization of DKS: DKS may be extended by a set of mappings from accessibility relations to accessibility relations. Moreover, other transformations may be added, such as transformations extending the sets of possible worlds or transformations extending the vocabularies associated with possible worlds. Other goals of future research include a detailed study of dynamic preferential entailment (modifications of minimal entailment in the presence of incomplete knowledge), the computation of static autoepistemic expansions of AELKB-theories, a characterization of deletions (from full AELKB-theories) in terms of DKS, default reasoning in DKS, and a semantic characterization of reasoning about actions in terms of DKS.
References
1. Alferes, J., Pereira, L., Przymusinski, T.: Belief Revision in Non-Monotonic Reasoning and Logic Programming. Fundamenta Informaticae 1 (1996)
2. Brass, S., Dix, J., Przymusinski, T.: Super Logic Programs. Technical Report, Universität Hannover, Institut für Informatik, 1997
3. Buccafurri, F., Eiter, T., Gottlob, G., Leone, N.: Enhancing Symbolic Model Checking by AI Techniques. IFIG Research Report 9701, September 1997
4. Chang, C., Keisler, H.: Model Theory. North-Holland Publishing Company, 1973
5. Clarke, E.M., Emerson, E.A., Sistla, A.P.: Automatic Verification of Finite-State Concurrent Systems Using Temporal Logic Specifications. ACM Transactions on Programming Languages and Systems 8(2), April 1986, 244-263
6. Kifer, M., Lozinskii, E.: RI: A Logic for Reasoning with Inconsistency. Proc. of LICS 1989, IEEE Computer Society Press
7. Łukaszewicz, W.: Non-Monotonic Reasoning. Formalization of Commonsense Reasoning. Ellis Horwood Ltd., 1990
8. Moore, R.: Possible-World Semantics for Autoepistemic Logic. Proc. AAAI Workshop on Non-Monotonic Reasoning, New Paltz, NY, 344-354
9. Przymusinski, T.: Autoepistemic logic of knowledge and beliefs. Artificial Intelligence 95 (1997), 115-154
10. Šefránek, J.: Dynamic Kripke structures. CAEPIA'97. Actas de la VII Conferencia de la Asociación Española para la Inteligencia Artificial, Málaga 1997, 271-283
11. van Benthem, J.: Semantic Parallels in Natural Language and Computation. In: Logic Colloquium '87, eds. Ebbinghaus, H.-D., et al., North-Holland, Amsterdam, 1989, 331-375
An Argumentation Framework for Reasoning about Actions and Change

Antonis Kakas¹, Rob Miller², and Francesca Toni³

¹ Department of Computer Science, University of Cyprus
[email protected]
² School of Library, Archive and Information Studies, University College London, UK
[email protected]
³ Department of Computing, Imperial College of Science, Technology and Medicine, London, UK
[email protected]
Abstract. We show how a class of domains written in the Language E, a high-level language for reasoning about actions, narratives and change, can be translated into the argumentation framework of Logic Programming without Negation as Failure (LPwNF). This translation enables us (1) to understand default persistence as captured by various temporal reasoning frameworks in a simple and natural way, by assigning higher priority to information about effects of later actions over effects of earlier actions; and (2) to develop an argumentation-based computational model for this type of reasoning in a goal-driven, logic programming style.
1 Introduction
The idea of specialised action description languages was first introduced in [7,8] (and here exemplified by the Language A) with the aim that such languages could serve as specifications for theories of action and change written in different, general purpose or computation oriented formalisms. One such language is the Language E [10], designed to describe domains involving narrative information, i.e. information about actual occurrences of actions, using a basic ontology of actions, fluents and time-points inspired by the Event Calculus [12,13]. This paper contributes to the research agenda suggested in [7,8] by showing how a significant class of E domains may be translated into the argumentation framework of Logic Programming without Negation as Failure (LPwNF) [4] and thus be given an argumentation-based, goal-driven proof theory. LPwNF uses a priority relation between potentially conflicting or incompatible pieces of information to measure the validity of arguments supporting conclusions of interest against potentially conflicting arguments. Like many formalisms for reasoning about action, the Language E incorporates a notion of default persistence over time to address the frame problem. In the LPwNF translation, the default nature of this persistence is reflected in the nature of the priority
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 78–91, 1999.
© Springer-Verlag Berlin Heidelberg 1999
relation, which assigns higher priority to change brought about by later actions than to change brought about by earlier actions. The full Language E uses a more-or-less arbitrary structure of time, so that in some ways it generalises both "branching time" formalisms (such as the Situation Calculus) and linear time formalisms (such as the Event Calculus). However, for the purposes of this paper we have made the simplifying assumption that the structure of time is isomorphic either to the integer or to the real number line. For such structures of time, it is reasonable to assume that the number of action occurrences in a given domain is finite. Various previous works [8,5,3,2,10] address the problem of translating various declarative specifications for reasoning about actions and change into a computation-oriented form. Most of these works translate Language A theories into (some extension of) Logic Programming. On the whole these translations are less direct than the one presented here, as they rely on a complex use of Negation as Failure to capture default persistence. In contrast, our argumentation-based translation provides an intuitive reformulation as well as a sound and complete computational model. The paper is structured as follows. Section 2 reviews the basic Language E (without "ramification" statements) used in this paper. Section 3 reviews argumentation for LPwNF. Section 4 gives the translation from E domains to argumentation and proves it sound and complete. Section 5 describes the argumentation-based proof theory for the result of the transformation. Proofs of all technical results can be found in an extended version of this paper at: http://laotzu.doc.ic.ac.uk/UserPages/staff/ft/bibliography.html.
2 A Review of the Basic Language E
The Language E is really a collection of languages. The particular vocabulary of each language depends on the domain being represented, but always includes a set of fluent constants, a set of action constants, and a partially ordered set ⟨Π, ⪯⟩ of time-points. A fluent literal is either a fluent constant F or its negation ¬F. Given time-points T1 and T2, T1 ≺ T2 stands for T1 ⪯ T2 and T1 ≠ T2. Domain descriptions in the Language E are collections of statements of three kinds (where A is an action constant, T is a time-point, F is a fluent constant, L is a fluent literal and C is a set of fluent literals): t-propositions ("t" for "time-point"), of the form L holds-at T; h-propositions ("h" for "happens"), of the form A happens-at T; c-propositions ("c" for "causes"), of the form A initiates F when C or A terminates F when C. Any such c-proposition should be regarded as meaning "C is a minimally sufficient set of conditions for an occurrence of A to have an initiating or terminating effect on F". When C is empty, the c-propositions can be written as "A initiates F" and "A terminates F" respectively. In this paper, unless explicitly stated otherwise, we will consider only domains without t-propositions.
The semantics of E is based on simple definitions of interpretations and models. Since we are primarily interested in inferences about what holds at particular time-points (i.e. the truth value of t-propositions that might or might not occur explicitly in the domain), it is sufficient to define an interpretation as a mapping of fluent/time-point pairs to truth values: an interpretation is a mapping H : Φ × Π → {true, false}, where Φ is the set of fluent constants and Π is the set of time-points in E. Given a set of fluent literals C and a time-point T, an interpretation H satisfies C at T iff for each fluent constant F ∈ C, H(F, T) = true, and for each fluent constant F′ such that ¬F′ ∈ C, H(F′, T) = false. The definition of model is parametric on the definitions of initiation- and termination-points, i.e. time-points where a c-proposition and an h-proposition combine to describe a direct effect: a time-point T is an initiation-point (termination-point, resp.) for a fluent constant F in an interpretation H relative to a domain description D iff there is an action constant A such that (i) there is both an h-proposition in D of the form "A happens-at T" and a c-proposition in D of the form "A initiates (terminates, resp.) F when C", and (ii) H satisfies C at T. Then, an interpretation H is a model of a given domain description D without t-propositions iff, for every fluent constant F and time-points T1 ≺ T3:

1. If there is no initiation- or termination-point T2 for F in H relative to D such that T1 ⪯ T2 ≺ T3, then H(F, T1) = H(F, T3), i.e. fluents change their truth values only via occurrences of initiating or terminating actions;
2. If T1 is an initiation-point for F in H relative to D, and there is no termination-point T2 for F in H relative to D such that T1 ≺ T2 ≺ T3, then H(F, T3) = true, i.e. initiating a fluent establishes its truth value as true;
3. If T1 is a termination-point for F in H relative to D, and there is no initiation-point T2 for F in H relative to D such that T1 ≺ T2 ≺ T3, then H(F, T3) = false, i.e. terminating a fluent forces its truth value to false.

A domain description is consistent iff it has a model. A domain description D entails the t-proposition "F holds-at T" ("¬F holds-at T", resp.) iff for every model H of D, H(F, T) = true (H(F, T) = false, resp.). As an example we formulate a car engine domain, with action constants TurnOn and Empty and fluent constants Running and Petrol. The following domain Dc expresses that, in general, turning on the engine is only effective if the tank has not been emptied of petrol, and, in particular, that the tank has been emptied at time 3 and a TurnOn action has been performed at time 5:

TurnOn initiates Running when {Petrol}   (Dc1)
Empty terminates Petrol   (Dc2)
Empty terminates Running   (Dc3)
Empty happens-at 3   (Dc4)
TurnOn happens-at 5   (Dc5)
It is easy to see, for example, that Dc entails ¬Running holds-at 7. For domain descriptions with t-propositions, the definition of model is extended by the condition:
4. For all t-propositions in D of the form "F holds-at T", H(F, T) = true, and for all t-propositions of the form "¬F holds-at T′", H(F, T′) = false.

Thus, in effect, the t-propositions are like "static" constraints that interpretations must satisfy in order to be deemed models. For example, consider the domain description D′c obtained from Dc by replacing (Dc3) and (Dc4) with the single t-proposition ¬Running holds-at 7. Then, for an interpretation H to be a model of D′c, necessarily H(Running, 7) = false. More interestingly, H(Petrol, 5) = false, as otherwise H(Running, 7) = true (by condition 2 in the definition of model). In the sequel we will assume that the ordered set of time-points ⟨Π, ⪯⟩ is isomorphic either to the integers or to the real numbers. This ensures that the relation ⪯ is a total order and that time extends infinitely forwards into the future and backwards into the past. This assumption simplifies considerably the various results, which would otherwise need to be qualified to avoid anomalous (looping, backwardly branching, fragmented, disconnected, etc.) time structures.
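Over integer time, the model conditions 1–3 can be operationalised as forward propagation from an initial state: within a window starting before the earliest action occurrence, a model is determined by its values at the start of the window. The following sketch (ours, not the authors'; the data encoding and the finite horizon are our own choices) enumerates the candidate models of the car engine domain Dc in this way and confirms that Dc entails ¬Running holds-at 7:

```python
from itertools import product

# c-propositions of Dc: (action, fluent, effect, conditions); a condition
# ("Petrol", True) means Petrol must hold for the effect to take place.
C_PROPS = [
    ("TurnOn", "Running", "initiates", [("Petrol", True)]),   # (Dc1)
    ("Empty",  "Petrol",  "terminates", []),                  # (Dc2)
    ("Empty",  "Running", "terminates", []),                  # (Dc3)
]
H_PROPS = {3: ["Empty"], 5: ["TurnOn"]}                       # (Dc4), (Dc5)
FLUENTS = ["Running", "Petrol"]

def models(horizon=10):
    """Yield every model H : fluent x time -> bool over times 0..horizon."""
    for init in product([True, False], repeat=len(FLUENTS)):
        H = {(f, 0): v for f, v in zip(FLUENTS, init)}
        for t in range(horizon):
            for f in FLUENTS:                  # condition 1: default persistence
                H[(f, t + 1)] = H[(f, t)]
            for act, f, eff, conds in C_PROPS: # conditions 2 and 3: direct effects
                if act in H_PROPS.get(t, []) and \
                   all(H[(g, t)] == v for g, v in conds):
                    H[(f, t + 1)] = (eff == "initiates")
        yield H

# Dc entails "¬Running holds-at 7": Running is false at 7 in every model.
print(all(H[("Running", 7)] is False for H in models()))  # True
```

The TurnOn occurrence at 5 has no effect because Petrol has already been terminated at 3, so every model makes Running false from time 4 onwards.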
3 A Review of Argumentation
Argumentation has recently proved to be a unifying mechanism for most existing non-monotonic formalisms [1,6]. In this section we review (a version of) the LPwNF [4] argumentation framework. Let a monotonic logic be a pair (L, ⊢) consisting of a formal language L and a monotonic derivability notion ⊢ between sentences of the formal language. In the sequel we will assume:

– L consisting of all sentences (called rules) λ0 ← λ1, . . . , λn (n ≥ 0), with each λi a classical literal, and all variables in the λi implicitly universally quantified from the outside; λ0 is called the head and λ1, . . . , λn the body of any such rule;
– ⊢ obtained by repeatedly applying the classical modus ponens inference rule: from X ← Y and Y infer X, with X ← Y any ground instance of a sentence in L.

Then, an argumentation program (relative to (L, ⊢)) is a tuple (B, A, A0, <) consisting of a background theory B, i.e. a (possibly empty) set of sentences in L, an argumentation theory A, i.e. a set of sentences in L (the argument rules), an argument base A0 ⊆ A, and a priority relation, i.e. an irreflexive and antisymmetric relation < on the ground instances of the argument rules, where φ < ψ means that φ has lower priority than ψ. Intuitively, any subset of the argument base can be used to extend non-monotonically the background theory in the underlying monotonic logic, if the extension satisfies some requirements. The sentences in the background theory can be seen as non-defeasible argument rules which must belong to any extension. One possible requirement that extensions of the background theory must satisfy is that the extension be admissible, namely non-self-attacking and able to
counterattack any (set of) argument rules attacking it (see below for the formal definitions). Whereas an admissible set consists only of argument rules in the argument base, attacks against an admissible set of rules are allowed to be subsets of the larger argumentation theory. The argumentation-theoretic notion of attack is formally defined as follows: a set of argument rules S′ attacks another non-empty such set S iff there are a literal λ and sets S1 ⊆ S′, S2 ⊆ S such that (i) B ∪ S1 ⊢min λ and B ∪ S2 ⊢min ¬λ, and (ii) if ∃ r′ ∈ S1, r ∈ S2 s.t. r′ < r then ∃ r′ ∈ S1, r ∈ S2 s.t. r < r′, where B ∪ X ⊢min α iff B ∪ X ⊢ α and ∄ X′ ⊂ X s.t. B ∪ X′ ⊢ α. Note that, by minimality, S2 contains a unique rule r with head ¬λ. We say that S′ attacks S on the rule r with head ¬λ. Intuitively, a set of argument rules attacks another such set if the two sets are in conflict, by deriving in the underlying logic complementary literals λ and ¬λ, respectively, and the subset of the attacking set (minimally) responsible for the derivation of λ is not overall lower in priority than the subset of the attacked set (minimally) responsible for the derivation of ¬λ. Note that the notion of attack is monotonic, namely any superset of a set of argument rules attacking another set still attacks it, because of the minimality requirement. Finally, note that a set S is contradictory, i.e. it derives together with the background theory B a literal λ and its complement ¬λ, iff S attacks itself. The notion of admissible extension is formally defined as follows. Let a set S of argument rules be closed if it contains no rule whose body is not derived (via ⊢) from the background theory extended by S. Then, a closed subset S of A0 is admissible iff B ∪ S is non-contradictory and, for any S′ ⊆ A, if S′ attacks S then S attacks S′. For any maximal (wrt set inclusion) admissible subset S, ∆ = B ∪ S is called an admissible extension of (B, A, A0, <).
The admissible extensions of an argumentation program P determine the (non-monotonic) consequences of P: a ground literal λ is a skeptical (resp. credulous) consequence of P iff ∆ ⊢ λ for every (resp. some) admissible extension ∆ of P.
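For finite ground programs the attack relation just defined is fully effective and can be prototyped directly. Below is a minimal sketch (our own code and toy example, not from the paper): rules are (head, body) pairs over string literals, "-" marks classical negation, and the familiar bird/penguin conflict plays the role of two incompatible argument rules, with the penguin rule given higher priority:

```python
from itertools import chain, combinations

def neg(lit):
    """Complement of a classical literal; '-' marks negation."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def closure(rules):
    """All literals derivable by modus ponens from ground rules (head, body)."""
    derived, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def derives_min(B, X, lit):
    """B ∪ X ⊢min lit: derivable from X, but from no proper subset of X."""
    if lit not in closure(B | X):
        return False
    return all(lit not in closure(B | set(Y)) for Y in subsets(X) if set(Y) < X)

def attacks(B, Sp, S, lt):
    """Does Sp attack S?  lt(r1, r2) encodes the priority relation r1 < r2."""
    for lam in closure(B | Sp):
        for S1 in map(set, subsets(Sp)):
            if not derives_min(B, S1, lam):
                continue
            for S2 in map(set, subsets(S)):
                if not derives_min(B, S2, neg(lam)):
                    continue
                # condition (ii): if some rule of S1 is weaker than some rule
                # of S2, then some rule of S2 must be weaker than one of S1
                if any(lt(r1, r2) for r1 in S1 for r2 in S2) and \
                   not any(lt(r2, r1) for r1 in S1 for r2 in S2):
                    continue
                return True
    return False

# Toy program: tweety is a bird and a penguin; the "does not fly" rule
# is given higher priority than the "flies" rule.
B = {("bird", ()), ("penguin", ())}          # background facts
r_fly = ("fly", ("bird",))                   # argument rule: birds fly
r_nofly = ("-fly", ("penguin",))             # argument rule: penguins don't
lt = lambda r1, r2: (r1, r2) == (r_fly, r_nofly)

print(attacks(B, {r_nofly}, {r_fly}, lt))    # True: higher-priority attack
print(attacks(B, {r_fly}, {r_nofly}, lt))    # False: cannot attack back
```

Since {r_nofly} attacks {r_fly} but not vice versa, {r_nofly} is admissible here while {r_fly} is not, matching the intuition that the higher-priority rule wins.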
4 Translating E into Argumentation
In translating a given domain description D into an argumentation program, all individual h- and c-proposition translations are included in the background theory.

Definition 1. The background theory for D is the theory B(D) such that
– for all time-points T and T′ and action constants A: B(D) ⊢ T ⪯ T′ iff T ⪯ T′, B(D) ⊢ T ≺ T′ iff T ≺ T′, and B(D) ⊢ HappensAt(A, T) iff "A happens-at T" is in D;
– for each c-proposition "A initiates F when {L1, . . . , Ln}" in D, B(D) contains the rule Initiation(F, t) ← HappensAt(A, t), Λ(L1), . . . , Λ(Ln), and
– for each c-proposition "A terminates F when {L1, . . . , Ln}" in D, B(D) contains the rule Termination(F, t) ← HappensAt(A, t), Λ(L1), . . . , Λ(Ln),
where Λ(Li) = HoldsAt(Fi, t) if Li = Fi, and Λ(Li) = ¬HoldsAt(Fi, t) if Li = ¬Fi, for some fluent constant Fi, and finally
– B(D) contains no other rules.

As an example, consider the domain description Dc in section 2. Then, B(Dc) ⊢ HappensAt(Empty, 3), B(Dc) ⊢ HappensAt(TurnOn, 5), and B(Dc) contains the rules

Initiation(Running, t) ← HappensAt(TurnOn, t), HoldsAt(Petrol, t)
Termination(Petrol, t) ← HappensAt(Empty, t)
Termination(Running, t) ← HappensAt(Empty, t).

The remaining components of the argumentation program corresponding to D are domain independent.

Definition 2. The argumentation program of D is (B(D), AE, A0E, <E), referred to in short as PE(D), where
• AE consists of
Generation rules:
HoldsAt(f, t2) ← Initiation(f, t1), t1 ≺ t2   (PG[f, t2; t1])
¬HoldsAt(f, t2) ← Termination(f, t1), t1 ≺ t2   (NG[f, t2; t1])
Persistence rules:
HoldsAt(f, t2) ← HoldsAt(f, t1), t1 ≺ t2   (PP[f, t2; t1])
¬HoldsAt(f, t2) ← ¬HoldsAt(f, t1), t1 ≺ t2   (NP[f, t2; t1])
Assumptions:
HoldsAt(f, t)   (PA[f, t])
¬HoldsAt(f, t)   (NA[f, t])
• A0E consists of all the generation rules and assumptions only.
• <E is given by (for any fluent constant F and time-points T, T′, T1, T2):
(i) if T1 ⪯ T2 then NP[F, T; T1] <E PG[F, T; T2] and PP[F, T; T1] <E NG[F, T; T2];
(ii) if T1 ≺ T2 then NG[F, T; T1] <E PG[F, T; T2] and PG[F, T; T1] <E NG[F, T; T2];
(iii) PA[F, T] <E NG[F, T; T′], PA[F, T] <E NP[F, T; T′], NA[F, T] <E PG[F, T; T′] and NA[F, T] <E PP[F, T; T′].

In other words, the effects of later events take priority over the effects of earlier ones and thus (i) persistence rules have lower priority than "conflicting" and "later" generation rules, (ii) "earlier" generation rules have lower priority than
"conflicting" "later" generation rules. Moreover, (iii) assumptions have lower priority than "conflicting" generation and persistence rules. For example, given Dc as in section 2, PA[Running, 5] <E NG[Running, 5; 3] and NG[Running, 7; 3] <E PG[Running, 7; 5]. We will show that the above translation is sound and complete, with respect to the semantics for the Language E, for fluent-independent domain descriptions (see the definition below) with a finite number of h-propositions. We will need the following two definitions, which provide the Language E with a notion analogous to that of e-consistency introduced in [3] for the Language A.

Definition 3. The action constants A1 and A2 conflict in D iff D contains c-propositions of the form "A1 initiates F when C1" and "A2 terminates F when C2" and there is no fluent constant F′ in E such that both F′ ∈ C1 ∪ C2 and ¬F′ ∈ C1 ∪ C2. When A1 = A2 = A we say that the action constant A self-conflicts in D.

Definition 4. D is fluent-independent iff (i) there are no time-point T and h-propositions in D of the form "A1 happens-at T" and "A2 happens-at T" such that A1 and A2 conflict in D, and (ii) there is no h-proposition in D of the form "A happens-at T" such that A self-conflicts in D.

The soundness and completeness results are given as follows: for any fluent-independent domain description D with only a finite number of h-propositions, the corresponding argumentation program PE(D) is semantically equivalent to D, since there is a one-to-one correspondence between models of D and admissible extensions of PE(D), such that true t-propositions in the models match HoldsAt literals derivable from the corresponding admissible extensions. Clearly, a fluent-independent domain description with a finite number of h-propositions (for totally ordered time) is consistent. Then:

Theorem 1 (Soundness). Let D be a fluent-independent domain description with a finite number of h-propositions, and let M be a model of D.
Then there exists an admissible extension ∆ of PE(D) such that for any time-point T and fluent constant F:
M(F, T) = true iff ∆ ⊢ HoldsAt(F, T)
M(F, T) = false iff ∆ ⊢ ¬HoldsAt(F, T).

Theorem 2 (Completeness). Let D be a fluent-independent domain description with a finite number of h-propositions, and let ∆ be an admissible extension of PE(D). Then, the interpretation H defined as follows is a model of D. For any time-point T and fluent constant F:
H(F, T) = true iff ∆ ⊢ HoldsAt(F, T)
H(F, T) = false iff ∆ ⊢ ¬HoldsAt(F, T).
Note that these results imply that any admissible extension ∆ is total, i.e. such that, for any fluent F and time-point T, ∆ ⊢ HoldsAt(F, T) or ∆ ⊢ ¬HoldsAt(F, T). Indeed, we use this property to prove the results above. These results continue to hold when D contains t-propositions, by simply considering only the admissible sets that confirm all the t-propositions. Thus, the semantic role of the t-propositions in D is like that of integrity constraints, i.e. meta-level properties that must hold in any extension of the program. Note that, by expressing the requirement imposed by the t-propositions in this (meta-level) manner, the translation of the domain description D into an argumentation program does not need to take into account the t-propositions in D. Alternatively, the semantic role of the t-propositions can also be captured by extending the translation of D so that the background theory B(D) includes a sentence given(F, T) (resp. given(neg(F), T)) for every t-proposition F holds-at T (resp. ¬F holds-at T) in D, and adding the following generation rules to the argumentation theory AE but not to the argument base A0E in PE(D):

HoldsAt(f, t2) ← given(f, t1), t1 ⪯ t2
¬HoldsAt(f, t2) ← given(neg(f), t1), t1 ⪯ t2

and by extending the priority ordering accordingly (as for the basic generation rules). Note that, since these additional rules are not in the argument base A0E, they cannot be used in building an admissible extension. Indeed, allowing them as part of admissible extensions would trivially allow all the t-propositions of the domain to be confirmed, and this would be incorrect in cases where there is no other independent way of confirming the t-propositions.
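Going back to Definition 2, the domain-independent priority relation <E is mechanical enough to encode directly. In the sketch below (our own encoding, not the authors'), a rule label is a tuple such as ("PG", f, t, t1) for PG[f, t; t1] or ("PA", f, t) for PA[f, t], and time-points are integers:

```python
def lower_priority(r1, r2):
    """r1 <E r2 per clauses (i)-(iii) of Definition 2 (integer time)."""
    k1, k2 = r1[0], r2[0]
    if k1 in ("PA", "NA"):
        # (iii) assumptions yield to conflicting generation/persistence rules
        conflicting = ("NG", "NP") if k1 == "PA" else ("PG", "PP")
        return k2 in conflicting and r1[1:3] == r2[1:3]
    if r1[1:3] != r2[1:3]:
        return False        # (i)-(ii) compare rules for the same fluent and time
    t1, t2 = r1[3], r2[3]
    if (k1, k2) in (("NP", "PG"), ("PP", "NG")):
        return t1 <= t2     # (i) persistence yields to later-or-equal generation
    if (k1, k2) in (("NG", "PG"), ("PG", "NG")):
        return t1 < t2      # (ii) earlier generation yields to later generation
    return False

# Later effects take priority over earlier ones (Dc, with R = Running):
print(lower_priority(("PA", "R", 5), ("NG", "R", 5, 3)))     # True
print(lower_priority(("NG", "R", 7, 3), ("PG", "R", 7, 5)))  # True
print(lower_priority(("PG", "R", 7, 5), ("NG", "R", 7, 3)))  # False
```

The last call is False because the earlier generation rule NG[R, 7; 3] cannot outrank the later PG[R, 7; 5]; this asymmetry is what lets the later TurnOn effect defeat the earlier Empty effect when both bear on the same fluent and time-point.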
5 Proof Theory
The translation of domain descriptions into argumentation programs allows us to employ existing computational models for argumentation, such as [11] or, more directly for LPwNF, [4], to compute entailed t-propositions by computing the sceptical consequences of the corresponding argumentation programs. In this section we adapt the abstract computational framework in [11] for the Language E. As an example, consider the domain description Dc in section 2. We can use the corresponding argumentation program PE(D) to show (argue) that ¬HoldsAt(R, 7) (here and below R stands for Running and P for Petrol). The background theory B(D) together with any set S such that S ⊇ {NG[R, 7; 3]} can (monotonically) derive this conclusion. S is attacked by A = {PG[R, 7; 5], PA[P, 5]}. S can attack back A on the rule PA[P, 5] by bringing in the argument NG[P, 5; 3], giving the extended set S1 = S ∪ {NG[P, 5; 3]}. S1 still derives the required conclusion ¬HoldsAt(R, 7). Moreover, S1 is admissible and hence ¬HoldsAt(R, 7) is a credulous consequence of PE(D). In order to show that it is also a sceptical consequence we can show that we cannot build an admissible set for its negation HoldsAt(R, 7). This can be done by showing
that for any set S′ deriving HoldsAt(R, 7), there is at least one attack against S′ (in this case given by the set S1) that cannot be attacked back by any non-contradictory extension of S′. In other words, we can show that there is no admissible argument set S′ deriving HoldsAt(R, 7). The general argumentation-based computational framework developed in [11] is expressed in abstract terms via derivations of trees, whose nodes are sets of arguments attacking the arguments in their parent nodes, and the construction of "acceptable" trees. We develop a proof theory for E in a similar way, in terms of derivations of trees which construct admissible trees.

Definition 5. Given an argumentation program (B, A, A0, <), an admissible tree for a set of arguments S ⊆ A0 is an and-tree of depth 2 where
– each node is a closed set of arguments;
– the root (node of level 0) is S;
– the root has as children (nodes of level 1) all sets of arguments A ⊆ A such that A is a minimal attack against S;
– every node A of level 1 has (exactly) one child D ⊆ A0 (node of level 2) and D is a minimal attack against A;
– no node of level 1 is a subset of the root S;
– every node of level 2 is a subset of the root S.

Nodes of level 1 play the role of attacks against the given set S. These attacks are counterattacked by the nodes of level 2, which play the role of defences for S. The last two conditions correspond to the two properties of an admissible set S: that it is non-contradictory and that it defends itself against any attack. Figure 1(a) shows the structure of a generic admissible tree.
... @ @
A0
A1
D0
D1
A2
D2
S
@ @
A1 (T ). . . A2 . . .A3 (T ) . . . S0
S
S1
Fig. 1. (a) Generic admissible tree, and (b) Admissible tree for the Dc domain
Theorem 3. A set of argument rules is admissible iff it is the root of an admissible tree.

The proof theory detects whether a given literal (¬)HoldsAt(F, T) holds in some admissible extension ∆, and hence is a credulous consequence of the argumentation program, by constructing
1. a non-contradictory (and closed) S0 such that B(D) ∪ S0 ⊢ (¬)HoldsAt(F, T), and
2. extending S0 to a set S ⊇ S0 such that S is the root of an admissible tree.

By monotonicity of ⊢, B(D) ∪ S ⊢ (¬)HoldsAt(F, T). Moreover, the following lemma guarantees that S ⊆ ∆ for some admissible extension ∆ = B(D) ∪ S* with S* ⊇ S. Thus, again by monotonicity of ⊢, ∆ ⊢ (¬)HoldsAt(F, T), and so (¬)HoldsAt(F, T) is a credulous consequence of the argumentation program.

Lemma 1. For every admissible set of argument rules S there exists an admissible extension containing S.

In the sequel we assume that S0 is given. The proof theory constructs an admissible tree with root S ⊇ S0 incrementally, via derivations of partial trees. A partial tree T is a subtree of an admissible tree T′ such that both the root and all nodes of level 2 of T are subsets of the root of T′. In a derivation, which starts with the partial tree consisting only of the root S0, two operations can be performed on the current partial tree: the tree can be expanded vertically by lengthening its branches, adding defences against a group of attack nodes; or it can be expanded horizontally by first extending the root with new argument rules occurring in some group of defence nodes and then adding new children of the root corresponding to the new attacks against the extended root. In the definition below we assume a selection strategy identifying the "group" of nodes of the current partial tree to be handled next, and we mark nodes that should not be selected further. We assume that the selection strategy satisfies the following conditions: (i) all nodes in the selected group are at the same level, (ii) if the level is 1 then there exists a single set of argument rules attacking each node in the group, and (iii) if the level is 2 then the sets of argument rules at all nodes of the group coincide.

Definition 6. A derivation for a set of argument rules S0 is a sequence of partial trees T1, . . . , Tn, . . .
such that T1 consists only of the (unmarked) root S0 and, given Ti (i ≥ 1), if N is the selected group of (unmarked) nodes in Ti and R(Ti) is the root of Ti, then Ti+1 is obtained as follows:
(α) if the nodes in N are of level 1, choose a (minimal) attack D against N − R(Ti) for all nodes N in N. Then, Ti+1 is Ti where
1. all the nodes in N are marked, and
2. D is added as the (unmarked) child node of each node N in N.
(δ) if N is a group of nodes of level 0 or 2, then Ti+1 is Ti where
1. all nodes in N are marked,
2. the root is extended by N − R(Ti), where N is the set of argument rules at each node in N, and
3. if A1, . . . , Am, . . . (m ≥ 0) are all the (new) minimal attacks against N ∪ R(Ti) such that each Aj does not attack R(Ti) alone, then A1, . . . , Am, . . . are added as additional (unmarked) children nodes of the root.
Note that in a step of type (δ), if the set of arguments in the selected defence node(s) is a subset of the current root, then there will be no new attacks against the extended root and hence no extra attack nodes as children of the root. As an example, consider again the domain Dc, and suppose we want to construct a derivation for S0 = {NG[R, 7; 3]} (deriving, together with B(Dc), the conclusion ¬HoldsAt(R, 7)). One possible such derivation is as follows. T1 is the tree consisting uniquely of the root S0. There are two groups of (minimal) attacks against S0 that it is necessary to consider:

A1(T) = {PP[R, 7; T], PA[R, T]} for any time-point T with 3 ≺ T ≺ 7,
A2 = {PG[R, 7; 5], PA[P, 5]}.

Note that there are additional groups of (minimal) attacks, e.g.

A′1(T, T′) = {PP[R, 7; T], PP[R, T; T′], PA[R, T′]} for any time-points T, T′ with 3 ≺ T′ ≺ T ≺ 7,
A″1(T, T′, T″) = {PP[R, 7; T], PP[R, T; T′], PP[R, T′; T″], PA[R, T″]} for any time-points T, T′, T″ with 3 ≺ T″ ≺ T′ ≺ T ≺ 7, etc.,
A′2(T) = {PG[R, 7; 5], PP[P, 5; T], PA[P, T]} for any time-point T such that T ≺ 5,
A″2(T, T′) = {PG[R, 7; 5], PP[P, 5; T], PP[P, T; T′], PA[P, T′]} for any time-points T, T′ such that T′ ≺ T ≺ 5, etc.

However, any such additional attacks, involving chains of persistence rules, can be counterattacked in the same way as the corresponding simpler attacks involving a single persistence rule, and so may be disregarded. Then, by a step of type (δ), T2 is T1 with the root marked and all attacks A1(T) and A2 added as children of the root. All attacks A1(T) are counterattacked by S0 itself, since their argument rules are non-comparable under the priority ordering. Then, selecting all attacks A1(T), a step of type (α) gives T3, amounting to T2 with S0 added as the child of each attack node A1(T) and each A1(T) marked. All newly added defence nodes (with S0) can be selected next and marked, by a step of kind (δ), thus giving T4. S0 alone does not counterattack the attack node A2 or any of the attack nodes in the group A′2(T). The set S = S0 ∪ {NG[P, 5; 3]} can attack each one of these nodes. Then, by a step of type (α) where A2 and each A′2(T) node are selected together, T5 is T4 with A2 and all nodes A′2(T) marked and S added as the child of A2 and of each attack node A′2(T). Further, since S − S0 is attacked by A3(T) = {PP[P, 5; T], PA[P, T]}, for any time-point T such that 3 ≺ T ≺ 5, by a step of type (δ) selecting the defence
node S, T6 is T5 with S marked, S − S0 added to the root, and all attacks A3(T) added as additional children of the root.¹ Every attack A3(T) is counterattacked by S1 = {NG[P, 5; 3]} (as again their rules are non-comparable) and thus by the current root. Then, by a step of type (α) selecting all attack nodes A3(T), T7 is T6 with every A3(T) marked and S1 added as a child of every A3(T). Finally, a step of type (δ) selecting the group of these new defence nodes gives T8, amounting to T7 with all these defence nodes marked. No more steps can be applied to T8, as all its nodes are marked. The computed tree, obtained from the final tree T8 by ignoring all marks, is illustrated in figure 1(b). Note that this tree is an admissible tree for S.

Definition 7. A successful derivation from S0 to S is a (possibly infinite) derivation T1, . . . , T for S0 such that all nodes in T are marked, all leaves in T are of even level and S is the root of T.

Theorem 4 (Soundness and Completeness). If there exists a successful derivation from S0 to S then S0 ⊆ S and S is admissible. Conversely, if there exists an admissible set S containing S0, then there exists a successful derivation from S0 to S′, for some admissible set S′ such that S0 ⊆ S′ ⊆ S.

Definition 8. A finitely failed derivation from S0 to S is a finite derivation T1, . . . , Tk for S0 such that S is the root of Tk, all nodes in Tk are marked and some leaves in Tk are of odd level.

Corollary 1 (Soundness of Finite Failure). If every derivation from S0 is finitely failed, then there exists no set S such that S0 ⊆ S and S is admissible.

The given proof theory can be used to compute sceptical consequences as follows:

Theorem 5. Let D be a domain description. Let HoldsAt(F, T) (resp. ¬HoldsAt(F, T)) be given, and let S0 be a set of argument rules such that B(D) ∪ S0 ⊢ HoldsAt(F, T) (resp. B(D) ∪ S0 ⊢ ¬HoldsAt(F, T)).
If there exists a successful derivation for S0 and, for every set S′0 of argument rules such that B(D) ∪ S′0 ⊢ ¬HoldsAt(F, T) (resp. B(D) ∪ S′0 ⊢ HoldsAt(F, T)), every derivation for S′0 finitely fails, then HoldsAt(F, T) (resp. ¬HoldsAt(F, T)) is a sceptical consequence of PE(D).

The abstract proof theory forms the basis for developing concrete top-down proof procedures for query evaluation, such as the one in [4]. The proof procedures are obtained by adopting specific selection strategies and ways of computing attacks in the concrete argumentation framework. For example, [4] adopts a resolution-based computation of attacks. The selection of a group of attack or defence
¹ Again, there are additional groups of attacks against S − S0, involving chains of persistence rules accounting for Petrol holding at 5. And again, these additional groups of attacks are defended against in the same way as the simpler (single persistence rule) attacks in A3(T).
A. Kakas, R. Miller, and F. Toni
nodes, in the particular argumentation framework for E, can be guided by time constraints. In this way the computation essentially reduces to the satisfaction of appropriate sets of time constraints. Note that there are several redundancies in the attacks considered by the proof theory. For example, attack nodes whose rules are non-comparable with those in the root need not be considered, as they can always be counter-attacked by the root itself. Hence, in a resolution-based computation of attacks, we need only consider attacks whose top-most rule is of higher priority than that of the corresponding conflicting rule in the root. This means that in the case of E we can restrict our attention to (i) attacks via generation rules only against the root on a generation rule, and (ii) attacks via generation or persistence rules against the root on an assumption rule. In the earlier example this would mean that the groups of attacks A1(T) and A3(T) would not need to be considered. Consider now the case where domain descriptions contain t-propositions. A simple (but naive) way to extend the proof theory is to add all t-propositions to every query. If the extended query succeeds (resp. fails) then the original query can (resp. cannot) be satisfied simultaneously with the t-propositions in the domain, and hence the proof theory is still correct. To illustrate this, consider a simple domain description D which contains only the t-proposition F holds-at 5. If we pose the query ¬HoldsAt(F, 2), then the proof theory would succeed from S0 = {NA[F, 2]}, returning S = S0 ∪ {NA[F, T] | T ≺ 2}. However, when the query is extended with the t-proposition HoldsAt(F, 5), the proof theory would fail. Indeed, in order to derive the t-proposition the argument set S0 would need to contain the rule PA[F, 5]. This gives a new attack A = {NA[F, 2], NP[F, 5; 2]} which can only be defended by PA[F, 2]. If we attempt to add this, the root would become contradictory.
We are currently investigating more efficient proof theories for domains with t-propositions, which identify the t-propositions relevant to a given query by some limited form of forward reasoning from the query.
6 Conclusions
We have shown how reasoning about actions and change as formalized by the Language E can be understood in argumentation terms within the framework of LPwNF. This re-formulation rests on the fact that, for default persistence, later information has priority over earlier conflicting information. Although in this paper we have restricted our attention to totally ordered time, we believe that analogous results hold for more general time structures, such as the branching time structure employed by the Situation Calculus and the Language A. An important consequence of the argumentation formulation of E is that it provides a computational view on which we can build an automated proof theory for reasoning about actions and change. The proof theory is a Logic Programming style query-oriented proof procedure. Various previous works [8,5,3,2,10] address the problem of translating various declarative specifications for reasoning about actions and change into a computation-oriented form. Whereas these
An Argumentation Framework for Reasoning about Actions and Change
translations rely on a complex use of Negation as Failure to capture default persistence, our translation is more direct and, arguably, more intuitive. We are currently investigating ways to improve the efficiency of the proof theory proposed in this paper. In particular, we are examining the possibility of further restricting the type of attacks that need to be considered, as well as introducing a limited form of forward reasoning to allow us to consider only t-propositions relevant to the query at hand. The latter feature is particularly desirable when we want to consider E domains with “ramification” statements [9].
Acknowledgements This research has been partially supported by the EC Keep-In-Touch Project “Computational Logic for Flexible Solutions to Applications”. The third author has been supported by the UK EPSRC Project “Logic-based multi-agent systems”.
References

1. A. Bondarenko, P. M. Dung, R. A. Kowalski, F. Toni. An abstract, argumentation-theoretic framework for default reasoning. AI 93(1-2) 63–101, 1997.
2. S.-E. Bornscheuer and M. Thielscher. Explicit and Implicit Indeterminism: Reasoning about uncertain and contradictory specifications of dynamic systems. JLP 31(1-3) 119–155, 1997.
3. M. Denecker and D. De Schreye. Representing Incomplete Knowledge in Abductive Logic Programming. ILPS’93, MIT Press.
4. Y. Dimopoulos and A.C. Kakas. Logic Programming without Negation as Failure. ILPS’95, 369–383, MIT Press.
5. P. M. Dung. Representing Actions in Logic Programming and its Applications in Database Updates. ICLP’93, 222–238, MIT Press.
6. P. M. Dung. The acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. AI 77 321–357, 1995.
7. M. Gelfond and V. Lifschitz. Representing Actions in Extended Logic Programming. JICSLP’92, 560, MIT Press.
8. M. Gelfond and V. Lifschitz. Representing Action and Change by Logic Programs. JLP 17(2,3,4) 301–322, 1993.
9. A.C. Kakas and R.S. Miller. Reasoning about Actions, Narratives and Ramifications. Journal of Electronic Transactions on Artificial Intelligence 1(4), 1997.
10. A.C. Kakas and R.S. Miller. A Simple Declarative Language for Describing Narratives with Actions. JLP 31(1–3) (Special Issue on Reasoning about Action and Change) 157–200, 1997.
11. A. C. Kakas and F. Toni. Computing argumentation in logic programming. To appear in JLC, Oxford University Press, 1999.
12. R.A. Kowalski and M. Sergot. A Logic-Based Calculus of Events. New Generation Computing 4, 267, 1986.
13. M. Shanahan. Solving the Frame Problem: A Mathematical Investigation of the Commonsense Law of Inertia. MIT Press, 1997.
Representing Transition Systems by Logic Programs

Vladimir Lifschitz¹ and Hudson Turner²
¹ Department of Computer Sciences, University of Texas at Austin, Austin, TX 78712, USA
[email protected]
² Department of Computer Science, University of Minnesota at Duluth, Duluth, MN 55812, USA
[email protected]
Abstract. This paper continues the line of research on representing actions, on the automation of commonsense reasoning and on planning that deals with causal theories and with action language C. We show here that many of the ideas developed in that work can be formulated in terms of logic programs under the answer set semantics, without mentioning causal theories. The translations from C into logic programming that we investigate serve as a basis for the use of systems for computing answer sets to reason about action domains described in C and to generate plans in such domains.
1 Introduction
This paper continues the line of research on representing actions, on the automation of commonsense reasoning and on planning described in [11], [7] and [12]. A large part of that work deals with a new nonmonotonic formalism—“causal theories.” We show here that many of the ideas developed in those papers can be formulated also in terms of logic programs under the answer set semantics [5], without even mentioning causal theories. Specifically, we investigate here translations from action language C into logic programming. These translations serve as a basis for the use of systems for computing answer sets, such as smodels [13]¹ and dlv [3]², for planning in action domains described in C, as proposed in [9].³ In [2] smodels is used to generate plans for action domains described in the language of STRIPS, which is not as expressive as C. One such translation can be obtained by composing the translation from C into the language of causal theories defined in [7] with the translation from
¹ http://saturn.hut.fi/pub/smodels
² http://www.dbai.tuwien.ac.at/proj/dlv
³ See also http://www.cs.utexas.edu/users/esra/experiments/experiments.html
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 92–106, 1999. c Springer-Verlag Berlin Heidelberg 1999
causal theories into logic programs given by Proposition 6.1 from [10]. We call this translation lpn. Our basic translation lp is similar (and equivalent) to lpn but a little simpler, and our proof of the soundness of both translations is direct: it does not refer to causal theories. A modification of the “literal completion” procedure from [11] allows us to describe answer sets for these translations by propositional formulas. This fact provides an alternative explanation for some of the computational procedures that ccalc⁴ uses for temporal reasoning and planning. The new explanation is entirely in terms of logic programming; it does not appeal to any specialized logic of causal reasoning. After a review of the syntax and semantics of C (Section 2) and of the concept of an answer set (Section 3), we define the basic translation from C into logic programming and state a theorem expressing its adequacy (Section 4). In Section 5 we show that the result of the basic translation can be simplified if the given domain description has a nontrivial “split mapping.” Literal completion is discussed in Section 6. Proofs are postponed to Section 7.
2 Review of C
This review of action language C follows [7] and [6]. Consider a set σ of propositional symbols partitioned into the fluent names σ^fl and the elementary action names σ^act. An action is an interpretation of σ^act. There are two kinds of propositions in C: static laws of the form

caused F if G    (1)

and dynamic laws of the form

caused F if G after U,    (2)
where F, G are formulas of signature σ^fl and U is a formula of signature σ. In a proposition of either kind, the formula F is called the head. An action description is a set of propositions. Consider an action description D. A state is an interpretation of σ^fl that satisfies G ⊃ F for every static law (1) in D. A transition is any triple ⟨s, a, s′⟩ where s, s′ are states and a is an action; s is the initial state of the transition, and s′ is its resulting state. A formula F is caused in a transition ⟨s, a, s′⟩ if it is (i) the head of a static law (1) from D such that s′ satisfies G, or (ii) the head of a dynamic law (2) from D such that s′ satisfies G and s ∪ a satisfies U. A transition ⟨s, a, s′⟩ is causally explained by D if its resulting state s′ is the only interpretation of σ^fl that satisfies all formulas caused in this transition.
⁴ http://www.cs.utexas.edu/users/mccain/cc
V. Lifschitz and H. Turner
The transition system described by an action description D is the directed graph which has the states of D as nodes, and which includes an edge from s to s′ labeled a for every transition ⟨s, a, s′⟩ that is causally explained by D. Consider two examples. The first describes the action of opening a spring-loaded door using the fluent name Closed and the elementary action name OpenDoor. In the notation introduced in [6, Section 6], this action description can be written as

default Closed,
OpenDoor causes ¬Closed    (3)

which is an abbreviation for⁵

caused Closed if Closed,
caused ¬Closed if ⊤ after OpenDoor.

The transition system described by (3) has 2 states (Closed and ¬Closed) and 4 causally explained transitions:

⟨¬Closed, ¬OpenDoor, Closed⟩,
⟨¬Closed, OpenDoor, ¬Closed⟩,
⟨Closed, ¬OpenDoor, Closed⟩,
⟨Closed, OpenDoor, ¬Closed⟩.
The first of these transitions shows that the door is spring-loaded: it closes by itself when we do nothing (¬OpenDoor). The other example describes the effect of putting an object in water. It involves the fluent names InWater and Wet and the elementary action name PutInWater. In abbreviated notation, its propositions are:

PutInWater causes InWater,
caused Wet if InWater,
inertial InWater, ¬InWater, Wet, ¬Wet.    (4)

(InWater is treated here as a direct effect of the action, and Wet is an indirect effect.) Written out in full, (4) becomes:

caused InWater if ⊤ after PutInWater,
caused Wet if InWater,
caused InWater if InWater after InWater,
caused ¬InWater if ¬InWater after ¬InWater,
caused Wet if Wet after Wet,
caused ¬Wet if ¬Wet after ¬Wet.
The corresponding transition system has 3 states

InWater Wet,  ¬InWater Wet,  ¬InWater ¬Wet

and 6 causally explained transitions:
⁵ We assume that the language contains the 0-place connectives ⊤ (true) and ⊥ (false).
⟨InWater Wet, PutInWater, InWater Wet⟩,
⟨InWater Wet, ¬PutInWater, InWater Wet⟩,
⟨¬InWater Wet, PutInWater, InWater Wet⟩,
⟨¬InWater Wet, ¬PutInWater, ¬InWater Wet⟩,
⟨¬InWater ¬Wet, PutInWater, InWater Wet⟩,
⟨¬InWater ¬Wet, ¬PutInWater, ¬InWater ¬Wet⟩.
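Because description (4) is so small, the definition of a causally explained transition can be checked mechanically. The following Python sketch (our own encoding for illustration, not from the paper) brute-forces the condition "the set of formulas caused in the transition is exactly the resulting state" and recovers the six transitions of (4).

```python
from itertools import product

# Brute-force check of the "causally explained" condition for the
# PutInWater description (4). Fluent names: InWater, Wet; elementary
# action name: PutInWater. An interpretation maps each symbol to a bool.
FLUENTS = ["InWater", "Wet"]

def states():
    # A state must satisfy InWater => Wet (from: caused Wet if InWater).
    for vals in product([False, True], repeat=len(FLUENTS)):
        s = dict(zip(FLUENTS, vals))
        if s["Wet"] or not s["InWater"]:
            yield s

def caused(s, a, s1):
    # Fluent literals caused in the transition <s, a, s1> by the laws of (4).
    c = set()
    if a:                          # caused InWater if T after PutInWater
        c.add(("InWater", True))
    if s1["InWater"]:              # caused Wet if InWater
        c.add(("Wet", True))
    for f in FLUENTS:              # the four inertia ("inertial") laws
        if s[f] == s1[f]:
            c.add((f, s1[f]))
    return c

# <s, a, s1> is causally explained iff the caused literals are exactly s1.
transitions = [(s, a, s1)
               for s in states() for a in (False, True) for s1 in states()
               if caused(s, a, s1) == {(f, s1[f]) for f in FLUENTS}]
print(len(transitions))            # 6, matching the list above
```

Every PutInWater transition ends in the state InWater Wet, and every ¬PutInWater transition leaves the state unchanged, as in the list above.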
The translations defined in this note are applicable to an action description D only if it satisfies the following condition: for all static laws (1) and dynamic laws (2) in D, (i) F is a literal or the symbol ⊥, (ii) G and U are conjunctions of literals (possibly empty, understood as ⊤). If this condition is satisfied, we will say that D is definite.⁶ For instance, descriptions (3) and (4) are definite. We will identify an interpretation I of a signature with the set of literals of that signature that are satisfied by I. If D is definite then the condition

s′ is the only interpretation of σ^fl that satisfies all formulas caused in ⟨s, a, s′⟩

in the definition of a causally explained transition can be equivalently replaced by

the set of formulas caused in ⟨s, a, s′⟩ is s′.
3 Review of the Answer Set Semantics
The definitions below diverge from [5] in how they treat constraints and inconsistent sets of literals. Consider a set of propositional symbols, called atoms. A literal is an expression of the form B or ¬B, where B is an atom. A rule element is an expression of the form L or not L where L is a literal. The symbol ¬ is called classical negation, and the symbol not is negation as failure. A rule is a pair Head ← Body where Head is a literal or the symbol ⊥, and Body is a finite set of rule elements. Thus a rule has the form

Head ← L1, . . . , Lm, not Lm+1, . . . , not Ln    (5)

where n ≥ m ≥ 0; we drop { } around the elements of the body. A rule (5) is a constraint if Head = ⊥. A program is a set of rules. The notion of an answer set is defined first for programs whose rules do not contain negation as failure. Let Π be such a program, and let X be a consistent set of literals. We say that X is closed under Π if, for every rule Head ← Body
⁶ Part (ii) of this definition is not essential: it can be dropped at the cost of making the translations slightly more complicated.
in Π, Head ∈ X whenever Body ⊆ X. (For a constraint, this condition means that the body is not contained in X.) We say that X is an answer set for Π if X is minimal among the sets closed under Π. It is clear that a program without negation as failure can have at most one answer set. To extend this definition to arbitrary programs, take any program Π, and let X be a consistent set of literals. The reduct Π^X of Π relative to X is the set of rules

Head ← L1, . . . , Lm

for all rules (5) in Π such that Lm+1, . . . , Ln ∉ X. Thus Π^X is a program without negation as failure. We say that X is an answer set for Π if X is an answer set for Π^X. (Note that, according to these definitions, all answer sets are consistent.) A set X of literals is complete if for every atom B, B ∈ X or ¬B ∈ X. The use of the answer set semantics in this paper is similar to its use in [14] in that we will be interested in complete answer sets only. It is clear that the incomplete answer sets for any program can be eliminated by adding the constraints

← not B, not ¬B    (6)

for all atoms B.
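These definitions translate directly into a brute-force checker for small ground programs. The sketch below (Python; the names and encoding are ours, not from the paper) represents a literal as an (atom, sign) pair and a rule as (head, positive body, naf body), builds the reduct relative to a candidate set X, and tests whether X is the answer set of the reduct via its least closed set.

```python
from itertools import product

# Literal: (atom, sign). Rule: (head, pos, naf) where head is a literal
# or None (standing for ⊥), pos holds the literals L1..Lm, and naf the
# literals Lm+1..Ln that appear under negation as failure.

def least_closed(rules):
    """Least set closed under a program without negation as failure;
    None if a constraint's body becomes satisfied or the set is inconsistent."""
    x, changed = set(), True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= x and head not in x:
                if head is None:
                    return None          # constraint ⊥ <- Body is violated
                x.add(head)
                changed = True
    if any((atom, not sign) in x for atom, sign in x):
        return None                      # inconsistent
    return x

def answer_sets(rules, atoms):
    found = []
    # Each atom is True, False, or absent, so candidates are consistent.
    for choice in product([True, False, None], repeat=len(atoms)):
        x = {(a, v) for a, v in zip(atoms, choice) if v is not None}
        # Reduct: drop rules with some "not L" where L is in X.
        reduct = [(h, pos) for h, pos, naf in rules if not (naf & x)]
        if least_closed(reduct) == x:
            found.append(x)
    return found

# p <- not ¬p.   ¬p <- not p.   ⊥ <- ¬p.
p, np = ("p", True), ("p", False)
prog = [(p, set(), {np}), (np, set(), {p}), (None, {np}, set())]
print(answer_sets(prog, ["p"]))   # [{('p', True)}]
```

Without the constraint, the first two rules alone have the two answer sets {p} and {¬p}; the constraint eliminates {¬p}, as the definitions above require.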
4 Basic Translation
Let D be a definite action description. We will define, for every positive integer T, a logic program lp T (D) whose answer sets correspond to “histories”—paths of length T in the transition system described by D. The language of lp T (D) has atoms of two kinds: (i) fluent atoms—the fluent names of D followed by (t) where t = 0, . . . , T, and (ii) action atoms—the action names of D followed by (t) where t = 0, . . . , T − 1. Thus every literal in this language ends with (t) for some natural number t. This number will be called the time stamp of the literal. Program lp T (D) consists of the following rules:

(i) for every static law

caused F if L1 ∧ · · · ∧ Lm

in D, the rules

F(t) ← not L̄1(t), . . . , not L̄m(t)    (7)

for all t = 0, . . . , T (we understand F(t) as ⊥ if F is ⊥; L̄ stands for the literal complementary to L),

(ii) for every dynamic law

caused F if L1 ∧ · · · ∧ Lm after Lm+1 ∧ · · · ∧ Ln

in D, the rules

F(t + 1) ← not L̄1(t + 1), . . . , not L̄m(t + 1), Lm+1(t), . . . , Ln(t)    (8)

for all t = 0, . . . , T − 1,
(iii) the rules
¬B ← not B , B ← not ¬B
where B is a fluent atom with the time stamp 0 or an action atom. For instance, the translation of (3) consists of all rules of the forms

Closed(t) ← not ¬Closed(t),
¬Closed(t + 1) ← OpenDoor(t),
Closed(0) ← not ¬Closed(0),
¬Closed(0) ← not Closed(0),
OpenDoor(t) ← not ¬OpenDoor(t),
¬OpenDoor(t) ← not OpenDoor(t).    (9)
The translation of (4) is

InWater(t + 1) ← PutInWater(t),
Wet(t) ← not ¬InWater(t),
InWater(t + 1) ← not ¬InWater(t + 1), InWater(t),
¬InWater(t + 1) ← not InWater(t + 1), ¬InWater(t),
Wet(t + 1) ← not ¬Wet(t + 1), Wet(t),
¬Wet(t + 1) ← not Wet(t + 1), ¬Wet(t),
InWater(0) ← not ¬InWater(0),
¬InWater(0) ← not InWater(0),
Wet(0) ← not ¬Wet(0),
¬Wet(0) ← not Wet(0),
PutInWater(t) ← not ¬PutInWater(t),
¬PutInWater(t) ← not PutInWater(t).    (10)
Proposition 1. A complete set X of literals is an answer set for lp T (D) iff it has the form

[ ⋃_{t=0}^{T−1} {L(t) : L ∈ st ∪ at} ] ∪ {L(T) : L ∈ sT}

for some path ⟨s0, a0, s1, . . . , sT−1, aT−1, sT⟩ in the transition system described by D.

As discussed at the end of Section 3, the restriction to complete sets can be dropped if we extend the program by constraints (6). In the case of lp T (D), it is sufficient to add these constraints for the fluent atoms with nonzero time stamps. The case of T = 1 deserves a special mention:

Corollary 1. A complete set X of literals is an answer set for lp 1 (D) iff it has the form

{L(0) : L ∈ s ∪ a} ∪ {L(1) : L ∈ s′}

for some transition ⟨s, a, s′⟩ causally explained by D.
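As a concrete check of Corollary 1, one can write out lp 1 (D) for the spring-loaded door description (3) and enumerate its complete answer sets by brute force. The sketch below (Python; our own encoding, not part of the paper) finds exactly four, one per causally explained transition of (3). (The clause (i) instance for t = 0 duplicates a clause (iii) rule and is omitted.)

```python
from itertools import product

# Atoms of lp_1 for description (3): Closed(0), Closed(1), OpenDoor(0).
C0, C1, O0 = "Closed(0)", "Closed(1)", "OpenDoor(0)"
ATOMS = [C0, C1, O0]

def lit(a, s=True):
    return (a, s)

# Rules as (head, pos, naf); compare (9) with T = 1.
RULES = [
    (lit(C0), set(), {lit(C0, False)}),    # Closed(0) <- not ¬Closed(0)
    (lit(C1), set(), {lit(C1, False)}),    # Closed(1) <- not ¬Closed(1)
    (lit(C1, False), {lit(O0)}, set()),    # ¬Closed(1) <- OpenDoor(0)
    (lit(C0, False), set(), {lit(C0)}),    # ¬Closed(0) <- not Closed(0)
    (lit(O0), set(), {lit(O0, False)}),    # OpenDoor(0) <- not ¬OpenDoor(0)
    (lit(O0, False), set(), {lit(O0)}),    # ¬OpenDoor(0) <- not OpenDoor(0)
]

def least_closed(rules):
    # Least set closed under a program without negation as failure.
    x, changed = set(), True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= x and head not in x:
                x.add(head)
                changed = True
    return x if not any((a, not s) in x for a, s in x) else None

def complete_answer_sets():
    for vals in product([True, False], repeat=len(ATOMS)):  # complete X only
        x = {(a, v) for a, v in zip(ATOMS, vals)}
        reduct = [(h, pos) for h, pos, naf in RULES if not (naf & x)]
        if least_closed(reduct) == x:
            yield x

sets_found = list(complete_answer_sets())
print(len(sets_found))   # 4, one per causally explained transition of (3)
```

For instance, the answer set {¬Closed(0), Closed(1), ¬OpenDoor(0)} is the "spring-loaded" transition: the door closes by itself when nothing is done.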
5 Simplifying the Basic Translation
A split mapping for a definite action description D is a function λ from fluent literals⁷ to ordinals such that, for every static law

caused F if L1 ∧ · · · ∧ Lm

and every dynamic law

caused F if L1 ∧ · · · ∧ Lm after Lm+1 ∧ · · · ∧ Ln

in D such that F is not ⊥,

λ(L1), . . . , λ(Lm) ≤ λ(F).

If λ is a split mapping for D, then we can sometimes eliminate some occurrences of negation as failure in the translation of static and dynamic laws: by replacing not L̄i(t) (1 ≤ i ≤ m) with Li(t) whenever λ(F) > λ(Li).

Proposition 2. If λ is a split mapping for a definite action description D, then in rules (7) and (8) of lp T (D) with fluent literal heads we can replace any expressions of the form not L̄i(t) (1 ≤ i ≤ m) such that λ(F) > λ(Li) with Li(t) without affecting the complete answer sets. Similarly, in rules (7) and (8) of lp T (D) with head ⊥ we can replace any expressions of the form not L̄i(t) (1 ≤ i ≤ m) with Li(t).

For instance, a split mapping for (4) can be defined by

λ(InWater) = λ(¬InWater) = 0,  λ(Wet) = λ(¬Wet) = 1.

It follows that the second rule

Wet(t) ← not ¬InWater(t)

in the translation (10) of (4) can be equivalently replaced by

Wet(t) ← InWater(t).

A result analogous to Proposition 2, applied to an extension of causal theories, appears as Theorem 5.15 in [15]. There the issue is when a formula CF ⊃ CG, read “G is caused whenever F is caused,” can be replaced by F ⊃ CG without affecting the “causally explained interpretations.” The first of these formulas corresponds to the treatment of static causal laws in the language B from [6] and in the translation of static causal laws into logic programming from [14].
⁷ A fluent literal is a literal containing a fluent name; an action literal is a literal containing an action name. We apply this terminology both to the language of action descriptions and to the language of their translations into logic programming.
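The effect of Proposition 2 can be spot-checked on a small fragment. The sketch below (Python; our own encoding, not from the paper) compares a program containing the rule Wet ← not ¬InWater with the variant containing Wet ← InWater instead, together with default rules for both fluents, under the split mapping λ(InWater) = λ(¬InWater) = 0 < 1 = λ(Wet); the complete answer sets coincide.

```python
from itertools import product

# Spot check of the split-mapping replacement on a 2-atom fragment:
# lambda(InWater) = lambda(¬InWater) = 0 < 1 = lambda(Wet) licenses
# replacing "Wet <- not ¬InWater" by "Wet <- InWater".
IW, W = "InWater", "Wet"
iw, niw, w, nw = (IW, True), (IW, False), (W, True), (W, False)

BASE = [
    (iw, set(), {niw}),    # InWater <- not ¬InWater
    (niw, set(), {iw}),    # ¬InWater <- not InWater
    (nw, set(), {w}),      # ¬Wet <- not Wet
]
ORIG = BASE + [(w, set(), {niw})]   # Wet <- not ¬InWater
SIMP = BASE + [(w, {iw}, set())]    # Wet <- InWater

def least_closed(rules):
    # Least closed set; no candidate here derives a complementary pair,
    # so the consistency check is omitted for brevity.
    x, changed = set(), True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= x and head not in x:
                x.add(head)
                changed = True
    return x

def complete_answer_sets(rules):
    out = []
    for vals in product([True, False], repeat=2):
        x = {(IW, vals[0]), (W, vals[1])}
        reduct = [(h, pos) for h, pos, naf in rules if not (naf & x)]
        if least_closed(reduct) == x:
            out.append(x)
    return out

assert complete_answer_sets(ORIG) == complete_answer_sets(SIMP)
print(complete_answer_sets(ORIG))   # the two sets {InWater, Wet}, {¬InWater, ¬Wet}
```

Both programs have exactly the complete answer sets {InWater, Wet} and {¬InWater, ¬Wet}, as Proposition 2 predicts.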
6 Literal Completion
Programs lp T (D) are “positive-order-consistent,” or “tight.” This concept is defined in [4] for the special case of programs without classical negation; for tight programs without classical negation, the answer set semantics is shown in that paper to be equivalent to the completion semantics defined in [1]. According to Proposition 3 below, complete answer sets for a finite tight program can be characterized by the propositional formulas generated from the program by “literal completion.” This process, similar to Clark’s completion, is defined in [11] for causal theories, and here we show how this idea applies to tight programs. A level mapping is a function from literals to ordinals. (For finite programs, we can assume, without loss of generality, that the values of a level mapping are nonnegative integers.) A program Π is tight if there exists a level mapping λ such that, for every rule (5) in Π that is not a constraint,

λ(L1), . . . , λ(Lm) < λ(Head).

Note that this condition does not impose any restriction on the rule elements that include negation as failure. Consider a finite program Π. If H is a literal or the symbol ⊥, by Bodies(H) we denote the set of the bodies of all rules in Π whose head is H. For any finite set Body of rule elements, the propositional formula pf(Body) is defined by the equation

pf(L1, . . . , Lm, not Lm+1, . . . , not Ln) = L1 ∧ · · · ∧ Lm ∧ L̄m+1 ∧ · · · ∧ L̄n.

The literal completion of Π consists of the formulas

H ≡ ⋁_{Body ∈ Bodies(H)} pf(Body)
for all H. (The range of values of H includes all literals of the underlying language, even those that do not occur in Π, and the symbol ⊥.)

Proposition 3. For any finite tight program Π and any complete set X of literals, X is an answer set for Π iff X is an interpretation satisfying the literal completion of Π.

Consider, for instance, the program

p ← not ¬p,
¬p ← not p,
q ← not ¬q,
¬q ← not q,
r ← q.    (11)

This program has 4 answer sets: {p, q, r}, {p, ¬q}, {¬p, q, r}, {¬p, ¬q};
two of them are complete. The literal completion for (11) consists of the formulas

p ≡ p,  ¬p ≡ ¬p,  q ≡ q,  ¬q ≡ ¬q,  r ≡ q,  ¬r ≡ ⊥.

This set of formulas is equivalent to q ∧ r. Consequently, it is satisfied by two interpretations, {p, q, r} and {¬p, q, r}. In accordance with Proposition 3, these are the same as the complete answer sets for (11). It is clear that lp T (D) is tight: take λ(L) to be the time stamp of L. Consequently, the complete answer sets for this program can be characterized as the models of its literal completion. This fact is important for two reasons. First, it shows that one does not need a system for computing answer sets to plan in a domain described in C; a propositional solver, such as sato [16], will suffice. This is, in fact, how ccalc operates when the available actions are described by a definite action description in C. Second, this fact can be used to prove that some modifications of lp T (D), for finite D, are essentially equivalent to lp T (D), in the sense that they have the same complete answer sets as lp T (D). Consider, for instance, a program obtained from lp T (D) by replacing some of the rule elements Li(t) (m < i ≤ n) in the translations (8) of dynamic laws by not L̄i(t). The result is a finite tight program with the same literal completion; Proposition 3 implies that it has the same complete answer sets as lp T (D). In particular, this replacement can be applied to every rule element in lp T (D) that does not contain negation as failure. We will denote the result by lpn T (D). For instance, if D is (3) then lpn T (D) is

Closed(t) ← not ¬Closed(t),
¬Closed(t + 1) ← not ¬OpenDoor(t),
Closed(0) ← not ¬Closed(0),
¬Closed(0) ← not Closed(0),
OpenDoor(t) ← not ¬OpenDoor(t),
¬OpenDoor(t) ← not OpenDoor(t).

(Program lpn T (D) is, in fact, the alternative translation from C into logic programming mentioned in the introduction.)
Proposition 3 shows that lp T (D) and lpn T (D) have the same complete answer sets, for finite D. In the next section, we prove Proposition 3 on the basis of a similar result applicable even when D is not finite. The same result plays a role in the proofs of Propositions 1 and 2.
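Proposition 3 is easy to test on program (11). The sketch below (Python; our own encoding, not from the paper) evaluates the literal completion over all complete interpretations of {p, q, r} and compares the models with the complete answer sets obtained from the reduct definition of Section 3; both methods give the same two sets.

```python
from itertools import product

ATOMS = ["p", "q", "r"]

def l(a, s=True):
    return (a, s)

# Program (11): rules as (head, pos, naf) over literals (atom, sign).
PROG = [
    (l("p"), set(), {l("p", False)}),
    (l("p", False), set(), {l("p")}),
    (l("q"), set(), {l("q", False)}),
    (l("q", False), set(), {l("q")}),
    (l("r"), {l("q")}, set()),
]

def complete_interps():
    for vals in product([True, False], repeat=len(ATOMS)):
        yield {(a, v) for a, v in zip(ATOMS, vals)}

def satisfies_completion(x):
    # x |= H iff some body of a rule with head H is true in x; heads with
    # no rules (e.g. ¬r) must be false. pf turns "not L" into L's complement.
    for atom in ATOMS:
        for sign in (True, False):
            h = (atom, sign)
            bodies = [pos | {(a, not s) for a, s in naf}
                      for head, pos, naf in PROG if head == h]
            if (h in x) != any(b <= x for b in bodies):
                return False
    return True

def least_closed(rules):
    x, changed = set(), True
    while changed:
        changed = False
        for head, pos in rules:
            if pos <= x and head not in x:
                x.add(head)
                changed = True
    return x

def is_answer_set(x):
    reduct = [(h, pos) for h, pos, naf in PROG if not (naf & x)]
    return least_closed(reduct) == x

models = [x for x in complete_interps() if satisfies_completion(x)]
answer = [x for x in complete_interps() if is_answer_set(x)]
assert models == answer and len(models) == 2   # {p, q, r} and {¬p, q, r}
```

The two complete interpretations found are exactly {p, q, r} and {¬p, q, r}, the complete answer sets named in the text.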
7 Proofs
We begin with definitions and a theorem from [8] concerning tight programs. Take any program Π, and a consistent set X of literals. We say that X is closed under Π if, for every rule

Head ← L1, . . . , Lm, not Lm+1, . . . , not Ln    (12)
Head ∈ X whenever L1, . . . , Lm ∈ X and Lm+1, . . . , Ln ∉ X. We say that X is supported by Π if, for every L ∈ X, there is a rule (12) in Π such that Head = L, L1, . . . , Lm ∈ X and Lm+1, . . . , Ln ∉ X.

Proposition 4 ([4,8]). For any tight program Π, a consistent set X of literals is an answer set for Π iff X is closed under and supported by Π.

Since Proposition 4 appears in [4] in a less general form, and in [8] without proof, we include a proof at the end of this section.

7.1 Proof of Proposition 1
Proposition 1. A complete set X of literals is an answer set for lp T (D) iff it has the form

[ ⋃_{t=0}^{T−1} {L(t) : L ∈ st ∪ at} ] ∪ {L(T) : L ∈ sT}    (13)

for some path ⟨s0, a0, s1, . . . , sT−1, aT−1, sT⟩ in the transition system described by D.

To prove this fact, we will establish three lemmas about the modification lpn T of the translation lp T defined in Section 6. Recall that, in the modified translation, a dynamic law

caused F if L1 ∧ · · · ∧ Lm after Lm+1 ∧ · · · ∧ Ln

is represented by the rules

F(t + 1) ← not L̄1(t + 1), . . . , not L̄m(t + 1), not L̄m+1(t), . . . , not L̄n(t)

instead of

F(t + 1) ← not L̄1(t + 1), . . . , not L̄m(t + 1), Lm+1(t), . . . , Ln(t)

(t = 0, . . . , T − 1). The first lemma shows that programs lp T (D) and lpn T (D) have the same complete answer sets. The third lemma differs from Proposition 1 only in that lp T (D) in its statement is replaced by lpn T (D).

Lemma 1. Programs lp T (D) and lpn T (D) have the same complete answer sets.

Proof. Both programs are tight. It is easy to verify that a complete, consistent set of literals is closed under one iff it is closed under the other. It is also easy to verify that they have the same complete supported sets. So, by Proposition 4, they have the same complete answer sets.

Lemma 2. If X has the form (13), for some states s0, s1, . . . , sT−1, sT and some interpretations a0, . . . , aT−1 of σ^act, then for every fluent literal L and t ∈ {0, . . . , T − 1}, L(t + 1) ← ∈ lpn T (D)^X iff L is caused in ⟨st, at, st+1⟩.
Proof. Assume that L(t + 1) ← ∈ lpn T (D)^X. At least one of the following two cases holds.

Case 1: D contains a static law caused L if L1 ∧ · · · ∧ Lm such that L̄1(t + 1), . . . , L̄m(t + 1) ∉ X. Then L1(t + 1), . . . , Lm(t + 1) ∈ X, and consequently L1, . . . , Lm ∈ st+1. It follows that L is caused in ⟨st, at, st+1⟩.

Case 2: D contains a dynamic law caused L if L1 ∧ · · · ∧ Lm after Lm+1 ∧ · · · ∧ Ln such that

L̄1(t + 1), . . . , L̄m(t + 1), L̄m+1(t), . . . , L̄n(t) ∉ X.

Then

L1(t + 1), . . . , Lm(t + 1), Lm+1(t), . . . , Ln(t) ∈ X,

and consequently L1, . . . , Lm ∈ st+1 and Lm+1, . . . , Ln ∈ st. It follows that L is caused in ⟨st, at, st+1⟩. A similar argument shows that if L(t + 1) ← ∉ lpn T (D)^X, then L is not caused in ⟨st, at, st+1⟩.

Lemma 3. A complete set X of literals is an answer set for lpn T (D) iff it has form (13) for some path ⟨s0, a0, s1, . . . , sT−1, aT−1, sT⟩ in the transition system described by D.

Proof. Left-to-right: Assume that X is a complete answer set for lpn T (D). Clearly X has the form (13), for some interpretations s0, s1, . . . , sT−1, sT of σ^fl and a0, . . . , aT−1 of σ^act. Since X is closed under lpn T (D)^X, we know that for every rule (7) obtained from a static causal law caused F if L1 ∧ · · · ∧ Lm, if L̄1(t), . . . , L̄m(t) ∉ X, then F(t) ∈ X. Hence, for all t ∈ {0, . . . , T}, st satisfies L1 ∧ · · · ∧ Lm ⊃ F, for every static causal law caused F if L1 ∧ · · · ∧ Lm. That is, each st is a state. It remains to show that for each t ∈ {0, . . . , T − 1}, the set of formulas caused in ⟨st, at, st+1⟩ is st+1. This follows easily from Lemma 2.

Right-to-left: Assume that ⟨s0, a0, s1, . . . , sT−1, aT−1, sT⟩ is a path in the transition system described by D, and let X be the associated complete set of literals of form (13). We complete the proof by showing that L(t) ∈ X iff L(t) ← ∈ lpn T (D)^X. Consider three cases.

Case 1: L(t) is an action literal. The claim is trivial in this case, since in lpn T (D) such literals appear in the heads only of rules obtained in clause (iii) of the translation.

Case 2: L(t) is a fluent literal, and t = 0. It is clear that if L(0) ∈ X, then L(0) ← ∈ lpn T (D)^X, because of the rules obtained in clause (iii) of the translation. Assume L(0) ∉ X. Then, with regard to the rule L(0) ← not L̄(0) obtained by clause (iii) of the translation, notice that L̄(0) ∈ X. All other rules
in lpn T (D) with L(0) in the head have the form

L(0) ← not L̄1(0), . . . , not L̄m(0)

for some static causal law caused L if L1 ∧ · · · ∧ Lm, and since s0 is a state to which L does not belong, we can conclude that at least one of L̄1, . . . , L̄m belongs to s0. Hence, at least one of L̄1(0), . . . , L̄m(0) belongs to X. Consequently, L(0) ← ∉ lpn T (D)^X.

Case 3: L(t) is a fluent literal, and t ≠ 0. The claim in this case follows from Lemma 2.

7.2 Proof of Proposition 2
Proposition 2. If λ is a split mapping for a definite action description D, then in rules (7) and (8) of lp T (D) with fluent literal heads we can replace any expressions of the form not L̄i(t) (1 ≤ i ≤ m) such that λ(F) > λ(Li) with Li(t) without affecting the complete answer sets. Similarly, in rules (7) and (8) of lp T (D) with head ⊥ we can replace any expressions of the form not L̄i(t) (1 ≤ i ≤ m) with Li(t).

Proof. Let λ be the split mapping for D, and let Π be the program obtained from lp T (D) by replacing some rule elements not L̄i(t) with Li(t). It is easy to verify that the same complete, consistent sets of literals are closed under and supported by the two programs. As previously observed, lp T (D) is tight, so we can conclude by Proposition 4 that the two programs have the same complete answer sets, if we can show that Π is tight also. We do this by constructing a suitable level mapping λ*. Take

α = sup {λ(L) : L is a fluent literal}.

For all fluent literals L and t ∈ {0, . . . , T}, let

λ*(L(t)) = (α + 1) · t + λ(L).

For all action literals L and t ∈ {0, . . . , T − 1}, we define λ*(L(t)) = 0. First observe that for any fluent literals L, L′, and any t ∈ {0, . . . , T − 1},

λ*(L(t)) < λ*(L′(t + 1)),

since

(α + 1) · t + λ(L) < (α + 1) · t + (α + 1) = (α + 1) · t + (α + 1) · 1 = (α + 1) · (t + 1) ≤ (α + 1) · (t + 1) + λ(L′).

(The first step uses the fact that λ(L) < α + 1, along with the right monotonicity of ordinal addition. The third step uses the fact that ordinal multiplication distributes from the left over addition.) Hence, level mapping λ* establishes
V. Lifschitz and H. Turner
that lp T (D) itself is tight. It remains to show that the allowed replacements of rule elements preserve tightness. Consider any rule in lp T (D) with a fluent literal head L(t) in which a rule element not Li(t) has been replaced with Li(t) in Π. In this case, we know that λ(Li) < λ(L), and we complete the proof by observing that, consequently,

λ∗(Li(t)) = (α + 1) · t + λ(Li) < (α + 1) · t + λ(L) = λ∗(L(t))

(again by the right monotonicity of ordinal addition).
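In the finite case the ordinal arithmetic above reduces to ordinary integer arithmetic, and the construction of λ∗ can be checked mechanically. The sketch below uses a hypothetical assignment of fluent levels (the names p, q, r and their levels are illustrations, not taken from the paper):

```python
# Finite-case sketch of the level mapping lambda* from the proof of
# Proposition 2. Hypothetical split mapping on fluent literals:
lam = {"p": 0, "-p": 0, "q": 1, "-q": 1, "r": 2, "-r": 2}
alpha = max(lam.values())

def lam_star(literal, t):
    """Level of the time-stamped literal L(t): (alpha + 1) * t + lam(L)."""
    return (alpha + 1) * t + lam[literal]

# Any literal at time t sits strictly below any literal at time t + 1 ...
assert all(lam_star(L, 1) < lam_star(Lp, 2) for L in lam for Lp in lam)

# ... and within one time stamp, levels are ordered as by lam itself,
# which is what justifies replacing "not q(t)" with "q(t)" in a rule
# whose head r(t) satisfies lam(r) > lam(q).
assert lam_star("q", 3) < lam_star("r", 3)
```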
7.3 Proof of Proposition 3
Proposition 3. For any finite tight program Π and any complete set X of literals, X is an answer set for Π iff X is an interpretation satisfying the literal completion of Π.

Proof. Assume X is a complete answer set for Π. Since Π is tight, we know by Proposition 4 that X is closed under and supported by Π. Let H be any literal or ⊥. We must show that X satisfies

H ≡ ⋁_{Body ∈ Bodies(H)} pf(Body).
Case 1: H ∈ X. Since X is supported by Π, there is at least one rule H ← L1, . . . , Lm, not Lm+1, . . . , not Ln in Π such that L1, . . . , Lm ∈ X and Lm+1, . . . , Ln ∉ X. It follows that X satisfies the corresponding element of Bodies(H). Case 2: H ∉ X. Since X is closed under Π, we know that for every rule H ← L1, . . . , Lm, not Lm+1, . . . , not Ln in Π either {L1, . . . , Lm} ⊈ X or {Lm+1, . . . , Ln} ∩ X ≠ ∅. It follows that X satisfies no element of Bodies(H). The proof in the other direction is similar. It may be worth noting that Proposition 3 holds even when Π is not finite, as long as there are finitely many rules in Π with any given head (so that the literal completion can be defined). Moreover, even this restriction can be dropped in the case of constraints (rules with head ⊥), if we modify the definition of literal completion slightly, so that H ranges only over literals, and we add to the resulting propositional theory the formula ¬pf(Body) for each constraint ⊥ ← Body in Π.
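The two case analyses above can be replayed exhaustively on a small example. The sketch below uses a hypothetical two-rule program without classical negation or constraints (not from the paper) and checks that a set of literals satisfies the literal completion exactly when it is closed under and supported by the program:

```python
# Brute-force check of the equivalence behind Proposition 3 on a tiny
# hypothetical program: p <- not q and q <- not p.
from itertools import combinations

# Rules as (head, positive body, negated body).
rules = [("p", [], ["q"]), ("q", [], ["p"])]
literals = ["p", "q"]

def satisfies_completion(X):
    for h in literals:
        bodies = [(pos, neg) for head, pos, neg in rules if head == h]
        # pf(Body) holds in X iff pos is in X and neg is disjoint from X
        rhs = any(set(pos) <= X and not set(neg) & X for pos, neg in bodies)
        if (h in X) != rhs:
            return False
    return True

def closed_and_supported(X):
    closed = all(head in X
                 for head, pos, neg in rules
                 if set(pos) <= X and not set(neg) & X)
    supported = all(any(head == h and set(pos) <= X and not set(neg) & X
                        for head, pos, neg in rules)
                    for h in X)
    return closed and supported

subsets = [set(c) for r in range(3) for c in combinations(literals, r)]
assert all(satisfies_completion(X) == closed_and_supported(X)
           for X in subsets)
```

Here {p} and {q} pass both tests while ∅ and {p, q} fail both, mirroring Cases 1 and 2 of the proof.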
Representing Transition Systems by Logic Programs
7.4 Proof of Proposition 4
Proposition 4. For any tight program Π, a consistent set X of literals is an answer set for Π iff X is closed under and supported by Π.

Lemma 4. For any tight program Π without negation as failure, and any consistent set X of literals, if X is closed under and supported by Π, then X is an answer set for Π.

Proof. We need to show that X is minimal among the sets closed under Π. Suppose otherwise; let Y be a proper subset of X that is also closed under Π. Let λ be a level mapping establishing that Π is tight. Choose a literal L ∈ X \ Y such that λ(L) is minimal. Since X is supported by Π, there is a rule L ← L1, . . . , Lm in Π such that L1, . . . , Lm ∈ X. Since Π is tight, λ(L1), . . . , λ(Lm) < λ(L). Hence, by the choice of L, we can conclude that L1, . . . , Lm ∈ Y, which shows that Y is not closed under Π, contrary to the choice of Y.

Proof of Proposition 4: The left-to-right direction is straightforward and does not rely on tightness. For the other direction, assume X is closed under and supported by Π. It follows that X is closed under and supported by Π^X. Since Π is tight, so is Π^X. Hence, by Lemma 4, X is an answer set for Π^X and, consequently, an answer set for Π.
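To make Lemma 4 concrete, here is a brute-force check on a small hypothetical positive (negation-free) program with a level mapping witnessing tightness; the program and the levels are illustrations, not taken from the paper:

```python
# Lemma 4 on a tiny positive program: a set closed under and supported
# by a tight program is minimal among the closed sets.
from itertools import combinations

rules = [("a", []), ("b", ["a"]), ("c", ["a", "b"])]  # (head, body)
lam = {"a": 0, "b": 1, "c": 2}
# tightness: levels strictly drop from head to every body literal
assert all(all(lam[l] < lam[h] for l in body) for h, body in rules)

def closed(X):
    return all(h in X for h, body in rules if set(body) <= X)

def supported(X):
    return all(any(h == l and set(body) <= X for h, body in rules)
               for l in X)

X = {"a", "b", "c"}
assert closed(X) and supported(X)

# No proper subset of X is closed, so X is an answer set.
proper_subsets = [set(c) for r in range(len(X)) for c in combinations(X, r)]
assert not any(closed(Y) for Y in proper_subsets)
```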
Acknowledgements Thanks to Esra Erdem for comments on a draft of this note. Some of these results were presented by the first author in his seminar on planning at the University of Texas in the Spring semester of 1999, and he is grateful to the participants of the seminar for interesting discussions. His work was partially supported by the National Science Foundation under grant IRI-9732744. The second author is partially supported by University of Minnesota Grant-in-Aid of Research, Artistry & Scholarship #17831.
References 1. Keith Clark. Negation as failure. In Hervé Gallaire and Jack Minker, editors, Logic and Data Bases, pages 293–322. Plenum Press, New York, 1978. 2. Yannis Dimopoulos, Bernhard Nebel, and Jana Koehler. Encoding planning problems in non-monotonic logic programs. In Proc. European Conf. on Planning 1997, pages 169–181, 1997. 3. Thomas Eiter, Nicola Leone, Cristinel Mateis, Gerald Pfeifer, and Francesco Scarcello. The KR system dlv: Progress report, comparisons and benchmarks. In Anthony Cohn, Lenhart Schubert, and Stuart Shapiro, editors, Proc. Sixth Int’l Conf. on Principles of Knowledge Representation and Reasoning, pages 406–417, 1998.
4. François Fages. Consistency of Clark’s completion and existence of stable models. Journal of Methods of Logic in Computer Science, 1:51–60, 1994. 5. Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365–385, 1991. 6. Michael Gelfond and Vladimir Lifschitz. Action languages. Electronic Transactions on AI, 3, 1998. Available at http://www.ep.liu.se/ea/cis/1998/016/. 7. Enrico Giunchiglia and Vladimir Lifschitz. An action language based on causal explanation: Preliminary report. In Proc. AAAI-98, pages 623–630, 1998. 8. Vladimir Lifschitz. Foundations of logic programming. In Principles of Knowledge Representation, pages 69–127. CSLI Publications, 1996. 9. Vladimir Lifschitz. Action languages, answer sets and planning. In The Logic Programming Paradigm: a 25-Year Perspective, pages 357–373. Springer Verlag, 1999. 10. Norman McCain. Causality in Commonsense Reasoning about Actions. PhD thesis, University of Texas at Austin, 1997. 11. Norman McCain and Hudson Turner. Causal theories of action and change. In Proc. AAAI-97, pages 460–465, 1997. 12. Norman McCain and Hudson Turner. Satisfiability planning with causal theories. In Anthony Cohn, Lenhart Schubert, and Stuart Shapiro, editors, Proc. Sixth Int’l Conf. on Principles of Knowledge Representation and Reasoning, pages 212–223, 1998. 13. Ilkka Niemelä and Patrik Simons. Efficient implementation of the well-founded and stable model semantics. In Proc. Joint Int’l Conf. and Symp. on Logic Programming, pages 289–303, 1996. 14. Hudson Turner. Representing actions in logic programs and default theories: a situation calculus approach. Journal of Logic Programming, 31:245–298, 1997. 15. Hudson Turner. Causal Action Theories and Satisfiability Planning. PhD thesis, University of Texas at Austin, 1998. 16. Hantao Zhang. An efficient propositional prover. In Proc. CADE-97, 1997.
Transformations of Logic Programs Related to Causality and Planning Esra Erdem and Vladimir Lifschitz Department of Computer Sciences University of Texas at Austin Austin, TX 78712, USA {esra,vl}@cs.utexas.edu
Abstract. We prove two properties of logic programs under the answer set semantics that may be useful in connection with applications of logic programming to representing causality and to planning. One theorem is about the use of disjunctive rules to express that an atom is exogenous. The other provides an alternative way of expressing that a plan does not include concurrently executed actions.
1 Introduction
In this note we prove two properties of logic programs under the answer set semantics [3] that may be useful in connection with applications of logic programming to representing causality and to planning. According to the first of the two theorems, replacing a disjunctive rule of the form

p ; ¬p ←   (1)

(where ; is the disjunction symbol and ¬ is classical negation) by the nondisjunctive rules

p ← not ¬p
¬p ← not p   (2)

is an equivalent transformation, as far as consistent answer sets are concerned. Under some conditions, this fact follows from [1]; our Theorem 1 is more general. Rules (1) and (2) are of interest in connection with the system of causal logic introduced by McCain and Turner [9]. This logic is based on the “principle of universal causation,” according to which, in a causally possible world history, every fact that obtains has a cause. McCain and Turner point out that the principle of universal causation needs to be relaxed in applications. For instance, in reasoning about actions, the causes for the occurrences and the non-occurrences of actions are usually not given. Instead, an occurrence of an action a at time t is postulated to have a cause whenever a occurs at t. In the language of [9], this is expressed by a_t ⇒ a_t. Similarly, the non-occurrence of an action a at time t is caused whenever a does not occur at t: ¬a_t ⇒ ¬a_t. In [9], the authors say that occurrences of actions are “exogenous” to the causal theory.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 107–116, 1999.
© Springer-Verlag Berlin Heidelberg 1999

Values of fluents at
time 0 are usually treated as exogenous also. Generally, the assumption that an atom p is exogenous is expressed by

p ⇒ p   (3)
¬p ⇒ ¬p.   (4)
Proposition 6.1 from [8] shows that the language of causal logic used in these examples is closely related to logic programming. It shows, in particular, that causal rules (3) and (4) can be translated into logic programming as rules (2). From Theorem 1 proved below we learn that there is another way to express in logic programming that p is exogenous: disjunctive rule (1) can be used instead. This possibility played a role in our experiments with the use of the system dlv [2] for planning, as discussed in the next section.¹ The second theorem proved in this note has to do with expressing uniqueness assumptions in logic programming. In a language containing a binary predicate constant p and no function constants, the constraints

← p(c, d), p(c′, d′)   (c ≠ c′ or d ≠ d′)   (5)
(c, d, c′, d′ range over the object constants) eliminate the answer sets that contain more than one atom of the form p(c, d). Essentially the same result can be achieved using the rules

p1(c) ← p(c, d),
p2(d) ← p(c, d),
← p1(c), p1(c′)   (c ≠ c′),
← p2(d), p2(d′)   (d ≠ d′),

where p1, p2 are auxiliary predicates. Theorem 2 is a generalization of this fact. Like Theorem 1, it is related to applications of dlv and similar systems to planning. When the goal is to find a plan in which actions are executed sequentially, the logic programming representation of the problem has to contain a “no-concurrency constraint” similar to (5). The equivalent transformation provided by Theorem 2 may allow us to state this constraint in a way that provides some computational advantages. An example is given in the next section.
2 Planning in the Blocks World
We define here a logic programming formalization of the blocks world in which on(B, L) expresses that a block B is at a location L, where L can be a block or the table. There is an action of moving a block B onto a location L, denoted by move(B, L). Time is represented by an initial segment of the nonnegative integers 0, . . . , tmax.¹
¹ For details of our experiments with the use of the systems dlv and smodels [11] to solve planning problems in the blocks world, see http://www.cs.utexas.edu/users/esra/experiments/experiments.html .
The first rule of the program describes the effect of moving a block:

on(B, L, T1) ← move(B, L, T )   (T1 = T + 1).
The commonsense law of inertia is postulated in the form

on(B, L, T1) ← on(B, L, T ), not ¬on(B, L, T1)   (T1 = T + 1).
A block can be moved only when it's clear:

← move(B, L, T ), on(B1, B, T ).

No two blocks can be on the same block at the same time:

← on(B1, B, T ), on(B2, B, T )   (B1 ≠ B2).
Wherever a block is, it's not anywhere else:

¬on(B, L1, T ) ← on(B, L, T )   (L1 ≠ L).
Every block is part of a tower supported by the table:

supported(B, T ) ← on(B, table, T ),
supported(B, T ) ← on(B, B1, T ), supported(B1, T )   (B ≠ B1),
← not supported(B, T ).
No actions are executed at time tmax:

← move(B, L, tmax).

Both the initial values of fluents and the occurrences of actions are exogenous:

on(B, L, 0) ← not ¬on(B, L, 0),
¬on(B, L, 0) ← not on(B, L, 0),
move(B, L, T ) ← not ¬move(B, L, T ),
¬move(B, L, T ) ← not move(B, L, T ).

Actions cannot be executed concurrently:

← move(B, L, T ), move(B1, L1, T )   (B ≠ B1 or L ≠ L1).
Every possible “history” of the blocks world over the time interval 0, . . . , tmax corresponds to an answer set for this program. Given this formalization, dlv can be used to solve planning problems in the blocks world.
The two theorems proved in this note allow us to modify the last two groups of rules. According to Theorem 1, we can express that the initial values and actions are exogenous by disjunctive rules:

on(B, L, 0) ; ¬on(B, L, 0) ←,
move(B, L, T ) ; ¬move(B, L, T ) ←.

Theorem 2 shows that the no-concurrency constraints can be rewritten as follows:

move1(B, T ) ← move(B, L, T ),
move2(L, T ) ← move(B, L, T ),
← move1(B1, T ), move1(B2, T )   (B1 ≠ B2),
← move2(L1, T ), move2(L2, T )   (L1 ≠ L2).
Since the other rules of the program make it impossible to move a block to two different places at the same time, the rules involving move2 are actually redundant, and the no-concurrency constraint can be expressed by the rules in the first and the third lines. This representation of nonconcurrency in the blocks world is used in [10]. Niemelä's modification has a significant effect on the size of the program after grounding. For the original program, this size grows as n^4 with the number n of blocks; for the modified program, it grows as n^3. As to the computation time of dlv, each of the two modifications led to substantial improvements for the November 21, 1998 version of the system. With the additional optimizations incorporated in the June 11, 1999 version of dlv, the efficiency is not affected by either transformation in an essential way. In our recent experiments with smodels, the computation time was found to be significantly affected by the second transformation. In one of the planning experiments with 8 blocks, the computation time was reduced from 219 seconds to 27 seconds. For a planning problem involving 11 blocks, the corresponding numbers were 637 seconds and 113 seconds.
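The gap in grounding size can be reproduced by counting ground instances. The sketch below counts only the two no-concurrency encodings for n blocks and n + 1 locations at a single time step; the overall n^3 growth of the full modified program reported above also involves the remaining rules, which are not counted here:

```python
# Ground-instance counts for the two no-concurrency representations,
# per time step, with n blocks and n + 1 locations (blocks + table).

def original_constraints(n):
    """<- move(B,L,T), move(B1,L1,T) over distinct ordered atom pairs."""
    atoms = n * (n + 1)          # ground move atoms at one time step
    return atoms * (atoms - 1)   # grows as n**4

def modified_rules(n):
    """move1/move2 projections plus the two pairwise constraints."""
    defs = 2 * n * (n + 1)                 # definitions of move1, move2
    cons = n * (n - 1) + (n + 1) * n       # distinct ordered pairs
    return defs + cons                     # grows only as n**2

assert original_constraints(8) == 5112 and modified_rules(8) == 272
assert original_constraints(11) == 17292 and modified_rules(11) == 506
```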
3 Programs
In this note, a program is a set of rules of the form

L1 ; . . . ; Lk ← Lk+1, . . . , Lm, not Lm+1, . . . , not Ln   (6)

where 0 ≤ k ≤ m ≤ n, and each Li is a literal. (A literal is a propositional atom possibly preceded by classical negation ¬; a literal possibly preceded by negation as failure is called a rule element.) If k = 0, the rule is called a constraint. We denote the set of literals in the language of a program Π by lit(Π). For the definition of the reduct Π^X of a program Π relative to a subset X of lit(Π), and for other definitions related to the answer set semantics, see [3] or [6].
4 Theorem 1
Theorem 1. For any program Π and any atom p, the programs

Π ∪ { p ; ¬p ← }   (7)

and

Π ∪ { p ← not ¬p,  ¬p ← not p }   (8)

have the same consistent answer sets.

It is essential that the statement of the theorem is restricted to consistent answer sets. For instance, if Π consists of the rules

p ← ¬p
¬p ← p

then the set of all literals is an answer set for (7), but not for (8). Both the theorem and the proof below can be extended to programs with negation as failure allowed in the heads of rules [6]. In the following lemma, used in the proof of Theorem 1, we denote program (7) by Π1 and program (8) by Π2.

Lemma 1. For any consistent set X of literals and any subset Y of X, Y is closed under Π1^X iff Y is closed under Π2^X.

Proof. Let X be a consistent set of literals, and let Y be a subset of X. Since X is consistent, at least one of the literals p, ¬p does not belong to X. Consider three cases.

Case 1: p ∉ X and ¬p ∉ X. Then Π1^X is Π^X ∪ { p ; ¬p ← } and Π2^X is Π^X ∪ { p ←, ¬p ← }. Since Y ⊆ X, p ∉ Y and ¬p ∉ Y. Consequently, Y is closed neither under Π1^X nor under Π2^X.

Case 2: p ∈ X and ¬p ∉ X. Then Π1^X is Π^X ∪ { p ; ¬p ← }
and Π2^X is Π^X ∪ { p ← }. The “if” part: assume that Y is closed under Π2^X. Then Y is closed under Π^X. In addition, p is in Y. Therefore, Y is also closed under Π1^X. The “only if” part: assume that Y is closed under Π1^X. Then Y is closed under Π^X. As ¬p is not in X due to the case assumption, and Y ⊆ X, ¬p is not in Y either. Then p is in Y. Therefore, Y is closed under Π2^X also.

Case 3: p ∉ X and ¬p ∈ X. Similar to Case 2.

Proof of Theorem 1: Let X be a consistent set of literals. We want to show that X is an answer set for Π1^X iff X is an answer set for Π2^X. Recall that a consistent set X of literals is an answer set for a disjunctive program without negation as failure iff it is a minimal set closed under this program. Notice first that X is closed under Π1^X iff X is closed under Π2^X, due to Lemma 1 for Y = X. It remains to check that (a) X is minimal among the sets closed under Π1^X iff (b) X is minimal among the sets closed under Π2^X. Assume (a), and consider any subset Y of X that is closed under Π2^X. Due to Lemma 1, Y is closed under Π1^X as well. By (a), it follows that X = Y. The proof in the other direction is similar.
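Theorem 1 can be verified by brute force on a small instance. The sketch below takes the hypothetical program Π = {q ←} (not from the paper) and enumerates the consistent answer sets of programs (7) and (8), using the characterization from the proof: a consistent X is an answer set iff it is a minimal set closed under the reduct relative to X. Literals are strings, with "-p" standing for ¬p:

```python
# Brute-force comparison of programs (7) and (8) for Pi = { q <- }.
from itertools import combinations

LIT = ["p", "-p", "q", "-q"]

def neg(l):
    return l[1:] if l.startswith("-") else "-" + l

def reduct(prog, X):
    # Delete rules whose "not" part intersects X; keep (heads, pos body).
    return [(heads, pos) for heads, pos, negs in prog if not set(negs) & X]

def closed(X, posprog):
    # Whenever a rule's body holds in X, some head literal must be in X.
    return all(set(heads) & X
               for heads, pos in posprog if set(pos) <= X)

def is_answer_set(X, prog):
    if any(neg(l) in X for l in X):        # consider consistent sets only
        return False
    pp = reduct(prog, X)
    if not closed(X, pp):
        return False
    subs = (set(c) for r in range(len(X))
            for c in combinations(sorted(X), r))
    return not any(closed(Y, pp) for Y in subs)   # minimality

def answer_sets(prog):
    cands = (set(c) for r in range(len(LIT) + 1)
             for c in combinations(LIT, r))
    return {frozenset(X) for X in cands if is_answer_set(X, prog)}

Pi = [(["q"], [], [])]                                 # Pi = { q <- }
P1 = Pi + [(["p", "-p"], [], [])]                      # program (7)
P2 = Pi + [(["p"], [], ["-p"]), (["-p"], [], ["p"])]   # program (8)
assert answer_sets(P1) == answer_sets(P2) == {frozenset({"q", "p"}),
                                              frozenset({"q", "-p"})}
```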
5 Theorem 2
In the statement of Theorem 2, Π is a program, C1, . . . , Cn (n > 0) are sets, and p is a function such that for all c1 ∈ C1, . . . , cn ∈ Cn its values p(c1, . . . , cn) are pairwise distinct atoms in the language of Π. The expressions pi(c), where 1 ≤ i ≤ n and c ∈ Ci, are assumed to be pairwise distinct atoms that do not belong to the language of Π.

Theorem 2. If X is an answer set for the program Π1 obtained from Π by adding the rules

pi(ci) ← p(c1, . . . , cn)   (9)
← pi(c), pi(c′)   (10)

(1 ≤ i ≤ n, c1 ∈ C1, . . . , cn ∈ Cn, c, c′ ∈ Ci, c ≠ c′), then

X ∩ lit(Π)   (11)
is an answer set for the program Π2 obtained from Π by adding the rules

← p(c1, . . . , cn), p(c′1, . . . , c′n)   (c1, c′1 ∈ C1, . . . , cn, c′n ∈ Cn, ⟨c1, . . . , cn⟩ ≠ ⟨c′1, . . . , c′n⟩).   (12)
Moreover, every consistent answer set for Π2 can be represented in form (11) for some consistent answer set X for Π1.

The theorem asserts, in other words, that replacing constraints (12) with rules (9) and (10) may extend the answer sets by new literals pi(c), ¬pi(c), but the parts of the answer sets that belong to the language of Π remain the same. To apply Theorem 2 to the blocks world example (Section 2), we use it to replace the no-concurrency constraints by the rules involving move1 and move2, consecutively for T = 0, T = 1, etc., each time with

– n = 2,
– C1 equal to the set of block constants,
– C2 equal to the set of location constants,
– p(c1, c2) equal to move(c1, c2, T ).
The proof of Theorem 2 is based on two facts. One is the splitting set theorem (see [7] for terminology and notation). The other is a property of constraints that easily follows from the definition of an answer set. Recall that a set X of literals is said to violate a constraint

← L1, . . . , Lm, not Lm+1, . . . , not Ln

if L1, . . . , Lm ∈ X and Lm+1, . . . , Ln ∉ X. The fact about constraints that we need is that the effect of adding a set of constraints to a program is to eliminate the answer sets that violate at least one of these constraints. For any subset Y of lit(Π), we define:

Y ∗ = { pi(ci) : c1 ∈ C1, . . . , cn ∈ Cn, p(c1, . . . , cn) ∈ Y }.

Clearly

Y ∗ ∩ lit(Π) = ∅.   (13)
Lemma 2. A consistent set X of literals is an answer set for the union of Π with rules (9) iff X = Y ∪ Y ∗ for some consistent answer set Y for Π.

Proof. Take U = lit(Π). This set splits the union of Π with rules (9), and Π is the bottom of this union relative to U. By the splitting set theorem, X is an answer set for the union program iff X can be represented as the union of an answer set Y for Π with an answer set for the program eU(Π, Y ). The latter consists of the rules

pi(ci) ←   (14)

for all c1, . . . , cn such that p(c1, . . . , cn) ∈ Y. Consequently, its only answer set is Y ∗.
Proof of Theorem 2: Let X be an answer set for Π1. Then X is an answer set for the union of Π with rules (9). As any answer set for a program with constraints without negation as failure, X is consistent. By Lemma 2, X can be represented as Y ∪ Y ∗, where Y is an answer set for Π. It is clear that

X ∩ lit(Π) = Y.   (15)

We will show that Y is an answer set for Π2. As Y is an answer set for Π, it is sufficient to show that Y does not violate any of constraints (12). Assume that it does. That means that Y contains a pair of distinct atoms p(c1, . . . , cn) and p(c′1, . . . , c′n). Take an i such that ci ≠ c′i. Both pi(ci) and pi(c′i) are in Y ∗, and consequently in X. It follows that X violates (10), contrary to the assumption that it is an answer set for Π1.

Now we will show that any consistent answer set Y for Π2 can be represented as X ∩ lit(Π) for some consistent answer set X for Π1. Take X = Y ∪ Y ∗. By (15), to complete the proof we only need to show that X is a consistent answer set for Π1. By Lemma 2, X is a consistent answer set for the union of Π with rules (9). Assume that X violates constraints (10). Then X contains a pair of atoms pi(c), pi(c′) with c ≠ c′. Since pi(c) ∈ X = Y ∪ Y ∗ and Y ⊆ lit(Π), it follows that pi(c) ∈ Y ∗. This means that c = ci for some atom p(c1, . . . , cn) in Y. Similarly, c′ = c′i for some atom p(c′1, . . . , c′n) in Y. It follows that Y violates (12), contrary to the assumption that it is an answer set for Π2.
6 Related Work
Gelfond et al. [4] compare a disjunctive logic program Π with the nondisjunctive program Π′ obtained by replacing each rule of form (6) by k rules:

L1 ← Lk+1, . . . , Lm, not Lm+1, . . . , not Ln, not L2, . . . , not Lk
. . .
Lk ← Lk+1, . . . , Lm, not Lm+1, . . . , not Ln, not L1, . . . , not Lk−1

Here each Li is a literal, i.e., an atom possibly preceded by classical negation, and 0 ≤ k ≤ m ≤ n. They show that each answer set for Π′ is also an answer set for Π. Ben-Eliyahu and Dechter [1] show that Π is equivalent to Π′ if Π is “head-cycle free”. Their proof is based on translating Π and Π′ into propositional logic; for the two programs, the results happen to be the same. Our Theorem 1 is different from that of [1] in two ways. First, it allows us to replace a single disjunctive rule (1) (and consequently any finite set of rules of this form) by nondisjunctive rules. This is not the same as completely eliminating all disjunctive rules from a program. Second, the disjunctive programs we consider are not required to be head-cycle free.
Consider, for instance, the program

p ; q ←   (16)
p ; ¬p ←   (17)
p ← q   (18)
q ← p.   (19)
By Theorem 1, we can replace rule (17) by nondisjunctive rules (2) and leave rule (16) as it is. The theorem from [1] would not allow us to justify this replacement. It is not applicable to program (16)–(19) because this program is not head-cycle free: its dependency graph has a cycle that goes through the atoms p and q, which belong to the head of the same disjunctive rule. In satisfiability planning, auxiliary atoms similar to pi(c) from our Theorem 2 are sometimes used to eliminate action symbols altogether [5].
Acknowledgments We would like to thank Wolfgang Faber, Nicola Leone, Norman McCain, Ilkka Niemelä, Gerald Pfeifer, Patrik Simons and Hudson Turner for useful discussions related to representing the blocks world in logic programming and to the use of dlv and smodels. This work was partially supported by the National Science Foundation under grant IIS-9732744.
References 1. Rachel Ben-Eliyahu and Rina Dechter. Propositional semantics for disjunctive logic programs. Annals of Mathematics and Artificial Intelligence, 12:53–87, 1994. 2. Thomas Eiter, Nicola Leone, Cristinel Mateis, Gerald Pfeifer, and Francesco Scarcello. A deductive system for non-monotonic reasoning. In Proceedings of the 4th International Conference on Logic Programming and Nonmonotonic Reasoning, pages 363–374. Springer-Verlag, 1997. 3. Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365–385, 1991. 4. Michael Gelfond, Halina Przymusinska, Vladimir Lifschitz, and Miroslaw Truszczyński. Disjunctive defaults. In Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning, pages 230–237, 1991. 5. Henry Kautz and Bart Selman. Pushing the envelope: planning, propositional logic, and stochastic search. In Proceedings of the 13th National Conference on AI, pages 1194–1201, 1996. 6. Vladimir Lifschitz. Foundations of logic programming. Principles of Knowledge Representation, pages 69–127, 1996. 7. Vladimir Lifschitz and Hudson Turner. Splitting a logic program. In Proceedings of the Eleventh International Conference on Logic Programming, pages 23–37, 1994. 8. Norman McCain. Causality in Commonsense Reasoning About Actions. PhD thesis, The University of Texas at Austin, 1997.
9. Norman McCain and Hudson Turner. Causal theories of action and change. In Proceedings of AAAI-97, pages 460–465, 1997. 10. Ilkka Niemelä. Logic programs with stable model semantics as a constraint programming paradigm. Annals of Mathematics and Artificial Intelligence, 1999. To appear. 11. Ilkka Niemelä and Patrik Simons. Efficient implementation of the well-founded and stable model semantics. In Proceedings of the Joint International Conference and Symposium on Logic Programming, pages 289–303, 1996.
From Causal Theories to Logic Programs (Sometimes) Fangzhen Lin and Kewen Wang Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon {flin, kwang}@cs.ust.hk
Abstract. In this paper, we shall provide a translation of a class of causal theories in (Lin [3]) to Gelfond and Lifschitz's disjunctive logic programs with classical negation [1]. We found this translation interesting for at least the following two reasons: it provides a basis on which a wide class of causal theories in ([3]) can be computed; and it sheds some new light on the nature of the causal theories in [3]. Our translation is in many ways similar to the one given in [9,2]. Our main result is a theorem that shows how action precondition and fully instantiated successor state axioms can be computed from the answer sets of the translated logic program. Keywords: disjunctive logic programs with classical negation; causal theories of actions; situation calculus.
1 Introduction
The ramification problem in reasoning about action concerns the implications that domain constraints have for action effects. For instance, in the blocks world, given that stack(A, B) causes A to be on B and the constraint that a block can only be at one location at any given time, we can deduce that after the action is performed, A will not be on any block other than B. Current work on the ramification problem centers on the idea that those domain constraints that can entail additional action effects should be encoded as causal constraints. There are currently several closely related theories of causal constraints. In this paper, we shall pursue the approach developed in (Lin [3]), and provide a translation of a class of causal theories in (Lin [3]) to Gelfond and Lifschitz's disjunctive logic programs with classical negation [1]. We found this translation interesting for at least the following two reasons: it provides a basis on which a wide class of causal theories in ([3]) can be computed; and it sheds some new light on the nature of the causal theories in [3]. For instance, the causal theories in [3] use a two-stage minimization policy that first minimizes causation and then maximizes action precondition. In contrast, the intended meaning of the translated logic programs is captured using Gelfond and Lifschitz's answer set semantics alone.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 117–131, 1999.
© Springer-Verlag Berlin Heidelberg 1999
This paper is organized as follows. In section 2, we define our language and introduce some notation. In section 3, we reformulate the causal theories of [3,4] in this language. In section 4, we briefly review Gelfond and Lifschitz's disjunctive logic programs with classical negation. In section 5, we provide a translation from a class of causal theories to logic programs and prove the correctness of this translation. In section 6, we illustrate our translation with some examples. Finally, we conclude in section 7.
2 The Language
Given an action and a starting situation, our goal in this paper is to determine whether the action is possible in the starting situation, and if so, what the effects of this action are. We assume that the action and the starting situation are fixed and are implicit. We shall assume a propositional language, since logic programs are essentially propositional when only Herbrand models are considered. We assume a set of fluents F. Fluents, i.e. elements of F, are not propositions in the language. They are used to generate propositions in the following ways:

– if f ∈ F, then init(f ), meaning that fluent f holds in the initial situation, is a proposition.
– if f ∈ F, then succ(f ), meaning that fluent f holds in the successor situation of performing the action in the initial situation, is a proposition.
– if f ∈ F, then ctinit(f ), meaning that fluent f is caused to be true in the initial situation, is a proposition.
– if f ∈ F, then cfinit(f ), meaning that fluent f is caused to be false in the initial situation, is a proposition.
– if f ∈ F, then ctsucc(f ), meaning that fluent f is caused to be true in the successor situation, is a proposition.
– if f ∈ F, then cfsucc(f ), meaning that fluent f is caused to be false in the successor situation, is a proposition.

In the following, the last four types of propositions are called causal propositions. In addition, we assume that poss is a proposition meaning that the action is possible in the starting situation.¹ In the following, we take a model to be a set of propositions, that is, the set of propositions that are true in the model.
3 Causal Theories
With respect to this language, a causal theory according to (Lin [3,4]) is a set of sentences of the following form:

¹ As one of the anonymous referees pointed out, we do not really need poss in this paper, because we assume a fixed action and, as we shall see later in Theorems 1 and 2, we consider only models in which poss holds. We however keep it here, as one of the main tasks of this paper is to compute an explicit poss axiom from answer sets.
– an explicit action precondition axiom of the form

poss ⊃ ϕ0,

where ϕ0 is a formula consisting only of propositions of the form init(f ) for f ∈ F. (In the situation calculus, this corresponds to ∀s.Poss(A, s) ⊃ ϕ, where A is the assumed action and ϕ is a situation calculus formula obtained from ϕ0 by replacing in it each occurrence of init(f ) by Holds(f, s).)
– a set of effect axioms of the form

poss ∧ ϕ0 ⊃ l1 ∨ · · · ∨ lk,

where ϕ0 is a formula consisting of propositions of the form init(f ) for f ∈ F, and each li, 1 ≤ i ≤ k, is either ctsucc(f ) or cfsucc(f ) for some f ∈ F.
– a set of causal constraints of the form

ϕ0 ∧ ϕ1,

where ϕ0 is a formula consisting of propositions of the form init(f ), ctinit(f ), or cfinit(f ), for f ∈ F, and ϕ1 is obtained from ϕ0 by replacing init(f ) by succ(f ), ctinit(f ) by ctsucc(f ), and cfinit(f ) by cfsucc(f ). (In the situation calculus, this corresponds to the causal constraint ∀s.ϕ(s), where ϕ(s) is obtained from ϕ0 by replacing init(f ) by Holds(f, s), ctinit(f ) by Caused(f, true, s), and cfinit(f ) by Caused(f, false, s).)

Notice that when ϕ0 does not mention causal propositions, ϕ0 ∧ ϕ1 is an acausal state constraint.

Example 1. Consider the suitcase domain in (Lin [3]). In this domain, there is a suitcase with two latches, such that the suitcase will open up when both of the two latches are in the up position. We have the following fluents: up1 (the first latch is up), up2 (the second latch is up), and open (the suitcase is open). Now consider the action press1 that will turn up the first latch if it was not up initially. The causal theory for this action is the following set of axioms:

– explicit action precondition:

poss ⊃ ¬init(up1).

– action effect axiom:
poss ⊃ ctsucc(up1).
– causal constraints:
init(up1) ∧ init(up2) ⊃ ctinit(open),
succ(up1) ∧ succ(up2) ⊃ ctsucc(open).
Given a causal theory T, its semantics is given by its completion, comp(T), which is the circumscription of ¬poss in precomp(T) with the propositions in I = {init(f) | f ∈ F} fixed, where precomp(T) is the set of the following axioms:
F. Lin and K. Wang
– The circumscription of all causal propositions {ctinit(f), cfinit(f), ctsucc(f), cfsucc(f) | f is a fluent} simultaneously, with all other propositions fixed.
– For each fluent f, the following basic causation axioms, which say that if a proposition is caused to have a certain truth value, then it must have that truth value:
ctinit(f) ⊃ init(f),
cfinit(f) ⊃ ¬init(f),
ctsucc(f) ⊃ succ(f),
cfsucc(f) ⊃ ¬succ(f).
– For each fluent f, the following generic successor state axiom, which says that a fluent is true in the successor situation if and only if it either is caused to be true or was true and is not caused to be false:
poss ⊃ [succ(f) ≡ ctsucc(f) ∨ (init(f) ∧ ¬cfsucc(f))].
For instance, for our suitcase domain in Example 1, the completion of the causal theory yields the following axioms:
poss ≡ ¬init(up1),
poss ⊃ [succ(up1) ≡ true],
poss ⊃ [succ(up2) ≡ init(up2)],
poss ⊃ [succ(open) ≡ init(up2) ∨ init(open)].
4 Extended Disjunctive Logic Programs
According to Gelfond and Lifschitz [1], an extended disjunctive logic program is one that can have disjunction in the head of a rule and two kinds of negation: so-called classical negation and negation-as-failure. More precisely, an extended disjunctive logic program P is a set of rules of the form
l1 ∨ · · · ∨ lk ← lk+1, . . . , lm, not lm+1, . . . , not ln,
where each li, 1 ≤ i ≤ n, is a literal, i.e. either a proposition p or the (classical) negation ¬p of a proposition. In the following, we let Lit be the set of all literals. The semantics of an extended disjunctive logic program P is given by its answer sets as follows:
1. For any set S of literals, define P^S, the GL-transformation of P, to be the program obtained from P in the following way:
a) if a rule has some not l in its body such that l ∈ S, then delete this rule;
b) if for some l, not l occurs in the body of a rule and l ∉ S, then delete not l from this rule.
It is clear that the resulting P^S is a program that does not mention 'not'.
2. A set S of literals is an answer set of P if S is a minimal consequence set of P^S, i.e. S is a minimal set of literals satisfying the following properties:
a) if l1 ∨ · · · ∨ lk ← lk+1, . . . , lm is a rule in P^S and lk+1, . . . , lm ∈ S, then li ∈ S for some i ∈ {1, . . . , k};
b) if S contains a pair of complementary literals p and ¬p, then S = Lit.
An answer set E is consistent if it does not contain two contradictory literals, i.e. E ≠ Lit.
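The GL-transformation in step 1 is easy to state operationally. The following Python sketch is our own illustration (the data representation is an assumption, not from the paper): a rule is a triple of head literals, body literals, and literals under not, with a '-' prefix encoding classical negation.

```python
def reduct(program, s):
    """GL-transformation P^S: (a) drop every rule whose body contains
    some 'not l' with l in S; (b) delete the remaining 'not' literals."""
    result = []
    for head, body, nbody in program:      # rule: head <- body, not nbody
        if any(l in s for l in nbody):
            continue                       # step (a): rule deleted
        result.append((head, body))        # step (b): 'not' part dropped
    return result

# p <- not q   ('-q' would denote the classically negated literal ¬q)
prog = [(["p"], [], ["q"])]
print(reduct(prog, {"p"}))   # [(['p'], [])]  -- 'not q' removed
print(reduct(prog, {"q"}))   # []             -- rule deleted
```

An answer set is then a minimal consequence set of this reduct, as defined in step 2.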
5 From Causal Theories to Logic Programs (Sometimes)
We shall now provide a translation of causal theories of a certain form into Gelfond and Lifschitz's extended disjunctive logic programs.
Definition 1. A causal theory is clausal if it satisfies the following conditions:
– The formula ϕ0 in the explicit action precondition axiom and in all effect axioms is of the form l1 ∧ · · · ∧ ln, where each li, 1 ≤ i ≤ n, is either init(f) or ¬init(f) for some fluent f.
– The formula ϕ0 in each causal constraint has the following form: l1 ∧ · · · ∧ lk ⊃ c1 ∨ · · · ∨ ct, where each li is either init(f) or ¬init(f), and each ci is either ctinit(f) or cfinit(f), for some fluent f.
In other words, a clausal causal theory is a set of sentences of the following forms:
– a set of explicit action precondition axioms of the form
poss ⊃ l1 ∧ · · · ∧ ln, (1)
where each li is either init(f) or ¬init(f).
– a set of action effect axioms of the form
poss ∧ l1 ∧ · · · ∧ ln ⊃ c˜1 ∨ · · · ∨ c˜m, (2)
where each li is either init(f) or ¬init(f), and each c˜i is either ctsucc(f) or cfsucc(f).
– a set of causal constraints of the form
l1 ∧ · · · ∧ lk ⊃ c1 ∨ · · · ∨ ct, (3)
˜l1 ∧ · · · ∧ ˜lk ⊃ c˜1 ∨ · · · ∨ c˜t, (4)
where each li is either init(f) or ¬init(f), each ci is either ctinit(f) or cfinit(f), ˜li is obtained from li by replacing init(f) with succ(f), and c˜i is obtained from ci by replacing ctinit(f) with ctsucc(f) and cfinit(f) with cfsucc(f).
For instance, the causal theory given in Example 1 is a clausal causal theory. We now proceed to translate a clausal causal theory T into an extended disjunctive logic program πT. First of all, independently of any particular causal theory, we have the following rules for each fluent f:
init(f) ← ctinit(f) (5)
¬init(f) ← cfinit(f) (6)
succ(f) ← ctsucc(f) (7)
¬succ(f) ← cfsucc(f) (8)
succ(f) ← init(f), not cfsucc(f), poss (9)
¬succ(f) ← ¬init(f), not ctsucc(f), poss (10)
poss ← not ¬poss (11)
init(f) ← not ¬init(f) (12)
¬init(f) ← not init(f) (13)
It is clear that the first four rules correspond to the basic causation axioms. Rules (9) and (10) correspond to the generic successor state axioms, that is, the inertia axioms. Rule (11) maximizes poss. Rules (12) and (13) assume complete knowledge about the initial situation and were adapted from (Turner [8]). These two rules are crucial for our translation. In addition to the above domain-independent rules, πT contains the following domain-specific rules obtained from T:
– if T contains an explicit action precondition axiom (1), then πT contains the following rules:
¬poss ← not li, i = 1, . . . , n. (14)
– if T contains an action effect axiom (2), then πT contains the following rule:
c˜1 ∨ · · · ∨ c˜m ← l1, . . . , ln, poss (15)
– if T contains causal constraints (3) and (4), then πT contains the following rules:
c1 ∨ · · · ∨ ct ← l1, . . . , lk (16)
c˜1 ∨ · · · ∨ c˜t ← not ˆl1, . . . , not ˆlk (17)
where ˆli, 1 ≤ i ≤ k, is the complement of ˜li, i.e. if ˜li is an atom p, then ˆli is ¬p, and if ˜li is a negated atom ¬p, then ˆli is p. Notice that in the last rule, not ˆli cannot be replaced by ˜li; otherwise the rule would be too weak, as the following example illustrates:
Example 2. Consider the following causal theory T consisting of a single cyclic causal rule, which says that if q is true, then q is caused to be true:
init(q) ⊃ ctinit(q),
succ(q) ⊃ ctsucc(q).
These two sentences are translated into the following two rules:
ctinit(q) ← init(q)
ctsucc(q) ← not ¬succ(q)
The logic program consisting of these two rules and the domain-independent ones has three consistent answer sets:
S1 = {poss, ¬init(q), ¬succ(q)},
S2 = {poss, ¬init(q), succ(q), ctsucc(q)},
S3 = {poss, init(q), succ(q), ctinit(q), ctsucc(q)}.
From these answer sets,² we see that the action is always possible, and that if q is true initially, then q is still true afterward, but if q is not true initially, then the action is indeterminate about q afterward. This is the same conclusion entailed by comp(T). However, if we replace not ¬succ(q) by succ(q) in the second rule:
ctinit(q) ← init(q)
ctsucc(q) ← succ(q)
then we would get a different result: S2 above is no longer an answer set of the new logic program.
This translation is provably correct, in the sense that there is a one-to-one correspondence between models of the completion of a causal theory and answer sets of its corresponding logic program, provided the models and the answer sets satisfy poss:
Theorem 1. Let T be a clausal causal theory and πT its corresponding extended disjunctive logic program. We have:
1. If N is a model of comp(T) and poss ∈ N, then SN = N ∪ N¬ is a consistent answer set of πT, where³
N¬ = {¬init(f) | init(f) ∉ N} ∪ {¬succ(f) | succ(f) ∉ N}.
2. If E is a consistent answer set of πT with poss ∈ E, then ME = BT ∩ E is a model of comp(T), where BT is the set of atoms in T.
² See Theorem 2 below.
³ We thank one of the anonymous referees for suggesting this definition of N¬, which is simpler and more intuitive than our previous one.
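For a single fluent, the translated program is small enough that its answer sets can be computed by brute force, directly from the definitions in Section 4. The following Python sketch is our own illustration (not the authors' implementation); rules are triples of head literals, body literals, and literals under not, with a '-' prefix for classical negation, and it reproduces the three consistent answer sets S1–S3 of Example 2.

```python
from itertools import combinations

def neg(l):
    return l[1:] if l.startswith('-') else '-' + l

def reduct(program, s):
    """GL-transformation: drop rules defeated by S, strip 'not' literals."""
    return [(h, b) for h, b, nb in program if not any(l in s for l in nb)]

def closed(s, rules, lit):
    """Conditions (a) and (b) of a consequence set of the reduct."""
    if any(neg(l) in s for l in s) and s != lit:
        return False                                    # condition (b)
    return all(any(h in s for h in head)                # condition (a)
               for head, body in rules
               if all(l in s for l in body))

def answer_sets(program, atoms):
    lit = frozenset(atoms) | frozenset('-' + a for a in atoms)
    items = sorted(lit)
    found = []
    for n in range(len(items) + 1):
        for c in combinations(items, n):
            s = frozenset(c)
            rules = reduct(program, s)
            if not closed(s, rules, lit):
                continue
            sub = sorted(s)                             # minimality check
            if any(closed(frozenset(t), rules, lit)
                   for k in range(len(sub))
                   for t in combinations(sub, k)):
                continue
            found.append(s)
    return found

# Domain-independent rules (5)-(13) for the single fluent q,
# plus the two rules translated from Example 2's causal theory.
P = [(['init(q)'],  ['ctinit(q)'], []),                    # (5)
     (['-init(q)'], ['cfinit(q)'], []),                    # (6)
     (['succ(q)'],  ['ctsucc(q)'], []),                    # (7)
     (['-succ(q)'], ['cfsucc(q)'], []),                    # (8)
     (['succ(q)'],  ['init(q)', 'poss'], ['cfsucc(q)']),   # (9)
     (['-succ(q)'], ['-init(q)', 'poss'], ['ctsucc(q)']),  # (10)
     (['poss'],     [], ['-poss']),                        # (11)
     (['init(q)'],  [], ['-init(q)']),                     # (12)
     (['-init(q)'], [], ['init(q)']),                      # (13)
     (['ctinit(q)'], ['init(q)'], []),                     # Example 2
     (['ctsucc(q)'], [], ['-succ(q)'])]                    # Example 2

atoms = ['poss', 'init(q)', 'succ(q)',
         'ctinit(q)', 'cfinit(q)', 'ctsucc(q)', 'cfsucc(q)']
consistent = {s for s in answer_sets(P, atoms)
              if not any(neg(l) in s for l in s)}
assert consistent == {
    frozenset({'poss', '-init(q)', '-succ(q)'}),
    frozenset({'poss', '-init(q)', 'succ(q)', 'ctsucc(q)'}),
    frozenset({'poss', 'init(q)', 'succ(q)', 'ctinit(q)', 'ctsucc(q)'})}
print('S1, S2, S3 verified')
```

Replacing the rule ctsucc(q) ← not ¬succ(q) with ctsucc(q) ← succ(q) and re-running the same enumeration drops S2, matching the discussion in Example 2.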
Proof. See Appendix.
We can use this theorem to compute action precondition and successor state axioms for clausal causal theories as follows. For any set E of literals, let init(E) be the conjunction of all literals in E about the initial situation:
init(E) = ⋀_{init(f) ∈ E} init(f) ∧ ⋀_{¬init(f) ∈ E} ¬init(f).
The following consequence of Theorem 1 shows how action precondition and successor state axioms of a clausal causal theory can be recovered from the answer sets of the corresponding logic program:
Theorem 2. Let T and πT be as in Theorem 1. The following are consequences of comp(T):
poss ≡ ⋁_{E ∈ Eposs} init(E), (18)
poss ∧ succ(f) ⊃ ⋁_{E ∈ Ef} init(E), (19)
poss ∧ ¬succ(f) ⊃ ⋁_{E ∈ E¬f} init(E), (20)
where f is a fluent, and
Eposs = {E | E is a consistent answer set of πT and poss ∈ E},
Ef = {E | E is a consistent answer set of πT, succ(f) ∈ E, and poss ∈ E},
E¬f = {E | E is a consistent answer set of πT, ¬succ(f) ∈ E, and poss ∈ E}.
Furthermore, for any fluent f, there is a sentence ϕ about the initial situation such that
comp(T) |= poss ⊃ (succ(f) ≡ ϕ) (21)
if and only if
|= ⋁_{E ∈ Ef} init(E) ⊃ ¬ ⋁_{E ∈ E¬f} init(E). (22)
Proof. We prove (18) first. Let N be a model of comp(T ). If N |= poss, then by Theorem 1, there is an answer set SN of πT such that poss ∈ SN . By the definition of SN , it is easy to see N |= init(SN ). Thus N satisfies the right side of (18) as well. Conversely, if N satisfies the right side, then there is an answer set E ∈ Eposs such that N |= init(E). By Theorem 1, ME is a model of comp(T ) and poss ∈ ME . Recall that comp(T ) is the result of minimizing ¬poss in precomp(T ) with propositions in I = {init(f ) | f ∈ F} fixed. Because N |= init(E), N and ME must agree on propositions in I. So poss must be in N as well.
Similarly, (19) and (20) can be proven. It remains to prove the last part of the theorem:
(22)⇒(21): this follows trivially from (19) and (20) by letting ϕ be ⋁_{E ∈ Ef} init(E).
(21)⇒(22): Suppose (21) holds. If (22) does not hold, then there is an interpretation N such that N |= ⋁_{E ∈ Ef} init(E) ∧ ⋁_{E ∈ E¬f} init(E). Thus there are two answer sets E ∈ Ef and E′ ∈ E¬f such that N |= init(E) ∧ init(E′). This implies that init(E) = init(E′). By Theorem 1, both ME and ME′ are models of comp(T) and poss. So by (21), both of them satisfy succ(f) ≡ ϕ. But this is impossible because ME |= succ(f), ME′ |= ¬succ(f), and, since init(E) = init(E′), ME |= ϕ iff ME′ |= ϕ. Therefore, (22) must hold as well.
Except for the fact that we use causal propositions, the translation to logic programs given in this section is very similar to the one given in [9,2] for the causal logic of (Turner [9]). In fact, Turner [9] shows that there is a mapping from our definite clausal causal theories to his causal theories. However, the clausal causal theories here can have disjunctions in the head.
6 Some Examples
Using Theorem 2, we can compute action preconditions and fully instantiated successor state axioms⁴ using systems like Smodel [6] and dlv [5]. In fact, all examples in this paper that do not have disjunctions have been checked using Smodel.
Example 3. Consider again the suitcase domain in Example 1. The logic program rules corresponding to the causal theory are as follows:
¬poss ← not ¬init(up1)
ctsucc(up1) ← poss
ctinit(open) ← init(up1), init(up2)
ctsucc(open) ← not ¬succ(up1), not ¬succ(up2)
The logic program consisting of these rules and the domain-independent ones has the following four answer sets that contain poss:
E1 = {poss, ¬init(up1), ¬init(up2), init(open), succ(up1), ¬succ(up2), succ(open), ctsucc(up1)},
E2 = {poss, ¬init(up1), ¬init(up2), ¬init(open), succ(up1), ¬succ(up2), ¬succ(open), ctsucc(up1)},
E3 = {poss, ¬init(up1), init(up2), ¬init(open), succ(up1), succ(up2), succ(open), ctsucc(up1), ctsucc(open)},
E4 = {poss, ¬init(up1), init(up2), init(open), succ(up1), succ(up2), succ(open), ctsucc(up1), ctsucc(open)}.
⁴ Successor state axioms in [7] quantify over actions. Here we deal with actions one at a time, so what we are interested in are successor state axioms with the action variable fully instantiated.
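The computation of poss and of the successor state axiom for open from E1–E4 can be cross-checked mechanically. The Python sketch below is our own illustration (the '-' prefix encoding of classical negation is an assumption of the sketch): it takes the init-literals of E1–E4 as data and verifies, by truth table over the three init-atoms, that init(E1) ∨ · · · ∨ init(E4) is equivalent to ¬init(up1), and that condition (22) holds for the fluent open.

```python
from itertools import product

atoms = ['init(up1)', 'init(up2)', 'init(open)']

# init-literals of the answer sets E1..E4 ('-' encodes classical negation)
E1 = {'-init(up1)', '-init(up2)', 'init(open)'}
E2 = {'-init(up1)', '-init(up2)', '-init(open)'}
E3 = {'-init(up1)', 'init(up2)', '-init(open)'}
E4 = {'-init(up1)', 'init(up2)', 'init(open)'}

def holds(lit, val):
    return not val[lit[1:]] if lit.startswith('-') else val[lit]

def init_disjunction(sets, val):
    """Truth value of the disjunction of init(E) over the given answer sets."""
    return any(all(holds(l, val) for l in e) for e in sets)

for bits in product([False, True], repeat=3):
    val = dict(zip(atoms, bits))
    # poss == init(E1) v init(E2) v init(E3) v init(E4) == not init(up1)
    assert init_disjunction([E1, E2, E3, E4], val) == (not val['init(up1)'])
    # condition (22) for open: E_open = {E1, E3, E4}, E_not_open = {E2}
    if init_disjunction([E1, E3, E4], val):
        assert not init_disjunction([E2], val)
print('poss and condition (22) for open verified')
```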
We have
init(E1) = ¬init(up1) ∧ ¬init(up2) ∧ init(open),
init(E2) = ¬init(up1) ∧ ¬init(up2) ∧ ¬init(open),
init(E3) = ¬init(up1) ∧ init(up2) ∧ ¬init(open),
init(E4) = ¬init(up1) ∧ init(up2) ∧ init(open).
Thus by Theorem 2, we have poss ≡ init(E1) ∨ init(E2) ∨ init(E3) ∨ init(E4), which is equivalent to poss ≡ ¬init(up1).
We can similarly use the theorem to compute successor state axioms. We consider the fluent open; the others are similar. Notice that Eopen, the set of answer sets containing succ(open), is {E1, E3, E4}. After some simplification, init(E1) ∨ init(E3) ∨ init(E4) is
¬init(up1) ∧ (init(up2) ∨ init(open)). (23)
Now E¬open is {E2}, and init(E2) is
¬init(up1) ∧ ¬init(up2) ∧ ¬init(open). (24)
It can be seen that |= (23) ⊃ ¬(24), so we have the following successor state axiom for open:
poss ⊃ [succ(open) ≡ ¬init(up1) ∧ (init(up2) ∨ init(open))],
which is equivalent to
poss ⊃ [succ(open) ≡ init(up2) ∨ init(open)]
because poss ≡ ¬init(up1).
The following example, adapted from [3], shows a causal rule that can entail implicit action preconditions.
Example 4. In this domain, we have two fluents: w (the agent is walking) and a (the agent is alive), and a causal rule which says that if the agent is not alive, then it is not walking:
¬init(a) ⊃ cfinit(w),
¬succ(a) ⊃ cfsucc(w).
Now consider the action start-walk that initiates walking:
poss ⊃ ¬init(w),
poss ⊃ ctsucc(w),
where the first one says that for this action to be possible, the agent should not already be walking, and the second says that this action, if possible, causes the agent to be walking. The logic program rules that correspond to the above axioms are as follows:
cfinit(w) ← ¬init(a)
cfsucc(w) ← not succ(a)
¬poss ← not ¬init(w)
ctsucc(w) ← poss
Put together with the domain-independent rules, this logic program yields a unique consistent answer set containing poss:
S = {poss, init(a), ¬init(w), succ(a), succ(w), ctsucc(w)}.
So according to Theorem 2, we have the following action precondition axiom:
poss ≡ init(a) ∧ ¬init(w),
where the condition init(a) is the implicit action precondition entailed by the causal rule.
The following example, adapted from [4], illustrates our translation for an indeterminate action.
Example 5. (Reiter's Dropping Pin Example) In this domain, an agent can drop a pin on a checkerboard, and this causes the pin to land either inside a white square, inside a black square, or touching both. To axiomatize this domain, we introduce two fluents: white (all or part of the pin is in a white square) and black (all or part of the pin is in a black square). Now consider the action drop that drops the pin on the checkerboard. Its effect is given by the following effect axiom:
poss ⊃ (ctsucc(white) ∧ cfsucc(black)) ∨ (cfsucc(white) ∧ ctsucc(black)) ∨ (ctsucc(white) ∧ ctsucc(black)),
which is logically equivalent to the following three effect axioms:
poss ⊃ ctsucc(white) ∨ ctsucc(black),
poss ⊃ ctsucc(white) ∨ cfsucc(black),
poss ⊃ cfsucc(white) ∨ ctsucc(black).
For the causal theory consisting only of the above three axioms, the corresponding logic program consists of the following rules together with the domain-independent rules about the fluents white and black:
ctsucc(white) ∨ ctsucc(black) ← poss
ctsucc(white) ∨ cfsucc(black) ← poss
cfsucc(white) ∨ ctsucc(black) ← poss
This program has 12 answer sets that contain poss. They are all possible unions of one of the following sets:
{poss, init(white), ¬init(black)},
{poss, init(white), init(black)},
{poss, ¬init(white), init(black)},
{poss, ¬init(white), ¬init(black)}
with one of the following sets:
{succ(white), succ(black), ctsucc(white), ctsucc(black)},
{succ(white), ¬succ(black), ctsucc(white), cfsucc(black)},
{¬succ(white), succ(black), cfsucc(white), ctsucc(black)}.
Using Theorem 2, we can then compute that poss ≡ true, which means the action is always possible. For the fluent white, both ⋁_{E ∈ Ewhite} init(E) and ⋁_{E ∈ E¬white} init(E) are equivalent to true, so condition (21) in Theorem 2 does not hold, and we do not have a successor state axiom for white. All we have are the two axioms (19) and (20):
poss ∧ succ(white) ⊃ true,
poss ∧ ¬succ(white) ⊃ true,
which are tautologies. This means that the action is always indeterminate about the fluent white and, symmetrically, about the fluent black as well.
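This structure of the answer sets can also be verified mechanically. The Python sketch below (our own illustration, using a '-' prefix for classical negation) builds the 12 answer sets as the unions just described and checks that the init-disjunctions for Eposs, Ewhite, and E¬white are all equivalent to true, so that condition (22) indeed fails for white.

```python
from itertools import product

init_parts = [{'init(white)', '-init(black)'},
              {'init(white)', 'init(black)'},
              {'-init(white)', 'init(black)'},
              {'-init(white)', '-init(black)'}]
succ_parts = [{'succ(white)', 'succ(black)', 'ctsucc(white)', 'ctsucc(black)'},
              {'succ(white)', '-succ(black)', 'ctsucc(white)', 'cfsucc(black)'},
              {'-succ(white)', 'succ(black)', 'cfsucc(white)', 'ctsucc(black)'}]
answer_sets = [frozenset({'poss'} | i | s)
               for i in init_parts for s in succ_parts]
assert len(answer_sets) == 12

atoms = ['init(white)', 'init(black)']

def holds(lit, val):
    return not val[lit[1:]] if lit.startswith('-') else val[lit]

def init_disjunction(sets, val):
    """Disjunction of init(E), keeping only the init-literals of each E."""
    return any(all(holds(l, val)
                   for l in e if l.lstrip('-').startswith('init('))
               for e in sets)

E_poss = answer_sets                                    # all 12 contain poss
E_white = [e for e in answer_sets if 'succ(white)' in e]
E_n_white = [e for e in answer_sets if '-succ(white)' in e]

for bits in product([False, True], repeat=2):
    val = dict(zip(atoms, bits))
    assert init_disjunction(E_poss, val)       # poss == true
    assert init_disjunction(E_white, val)      # left side of (22) is true...
    assert init_disjunction(E_n_white, val)    # ...and so is its negand
print('poss is a tautology; condition (22) fails for white')
```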
7 Concluding Remarks
We have provided a translation of the so-called clausal causal theories of [3] into Gelfond and Lifschitz's logic programs with classical negation and disjunction. Our translation is in many ways similar to the one given in [9,2]. Our key technical result is Theorem 2, which shows how action precondition and fully instantiated successor state axioms can be computed from the answer sets of the translated logic program. However, the main computational impediment in applying this theorem to large causal theories is that the number of answer sets grows exponentially with the number of fluents in the domain. For instance, if we add n irrelevant fluents to the suitcase domain, then the number of answer sets is on the order of 2^n. There is probably nothing that we can do in the worst case. However, we are working on strategies that would handle cases where the number of "relevant" answer sets is much smaller.
Acknowledgments. We would like to thank the anonymous referees for their useful comments. This work was supported in part by the Research Grants Council of Hong Kong under Competitive Earmarked Research Grants HKUST6091/97E and HKUST6145/98E.
References
1. M. Gelfond, V. Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365-385, 1991.
2. V. Lifschitz. Action languages, answer sets and planning. In: The Logic Programming Paradigm: a 25-Year Perspective, Springer-Verlag, 1999.
3. F. Lin. Embracing causality in specifying the indirect effects of actions. In: Proc. IJCAI'95, pages 1985-1991, 1995.
4. F. Lin. Embracing causality in specifying the indeterminate effects of actions. In: Proc. AAAI'96, pages 670-676, 1996.
5. N. Leone, G. Pfeifer and W. Faber. The dlv project: A disjunctive datalog system. http://www.dbai.tuwien.ac.at/proj/dlv/
6. I. Niemelä, P. Simons. Efficient implementation of the well-founded and stable model semantics. In: Proc. JICSLP'96, pages 289-303, 1996.
7. R. Reiter. The frame problem in the situation calculus: a simple solution (sometimes) and a completeness result for goal regression. In V. Lifschitz, editor, Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pages 418-420. Academic Press, San Diego, CA, 1991.
8. H. Turner. Representing actions in logic programs and default theories: A situation calculus approach. Journal of Logic Programming, 31(1-3):245-298, 1997.
9. H. Turner. A logic of universal causation. Artificial Intelligence, 1998.
Appendix: Proof of Theorem 1
Theorem 1. Let T be a clausal causal theory and πT its corresponding extended disjunctive logic program. We have:
1. If N is a model of comp(T) and poss ∈ N, then SN = N ∪ N¬ is a consistent answer set of πT, where N¬ = {¬init(f) | init(f) ∉ N} ∪ {¬succ(f) | succ(f) ∉ N}.
2. If E is a consistent answer set of πT with poss ∈ E, then ME = BT ∩ E is a model of comp(T), where BT is the set of atoms in T.
Proof. For any rule R, the rule obtained from R by deleting all default literals in its body is written as R+.
1. Assume that N |= comp(T). By the construction of SN, SN does not contain any complementary literals. We can directly verify from N |= comp(T) that SN is a model of πT^SN: for instance, the rule (9)+: succ(f) ← init(f), poss is in πT^SN if cfsucc(f) is not in SN. If init(f) ∈ SN, then init(f) ∈ N and cfsucc(f) ∉ N, since SN and N have the same sets of I-atoms and S-atoms. Since N satisfies the inertia axioms, we have succ(f) ∈ N, and thus succ(f) ∈ SN. This implies that the rule (9)+: succ(f) ← init(f), poss in πT^SN is satisfied by SN (poss is in SN). The satisfiability of the other rules in πT^SN can be verified similarly. To show that SN is an answer set of πT, it is enough to show that SN is a minimal consequence set of πT^SN. Let E0 be a minimal consequence set of πT^SN such that E0 ⊆ SN.
It suffices to show that SN ⊆ E0. Consider eight possible cases:
Case 1. init(f) ∈ SN: then ¬init(f) ∉ SN, and thus the rule (12)+: init(f) ← is in πT^SN. Hence, init(f) ∈ E0.
Case 2. ¬init(f) ∈ SN: then init(f) ∉ SN, and thus the rule (13)+: ¬init(f) ← is in πT^SN. Hence, ¬init(f) ∈ E0.
Case 3. ctinit(f) ∈ SN: then ctinit(f) ∈ N. Since N |= precomp(T), there exists an axiom of form (3): l1 ∧ · · · ∧ lk ∧ ct+1 ∧ . . . ∧ cn ⊃ ctinit(f) ∨ c1 ∨ · · · ∨ ct such that l1, . . . , lk, ct+1, . . . , cn ∈ N but c1, . . . , ct ∉ N. Let this axiom of form (3) be translated into the rule R in πT. Then the rule R+ = R is in πT^SN. This implies that one of the literals in ctinit(f) ∨ c1 ∨ · · · ∨ ct must be in E0. Since c1, . . . , ct ∉ SN and E0 ⊆ SN, we have c1, . . . , ct ∉ E0. Thus it forces ctinit(f) ∈ E0.
Case 4. cfinit(f) ∈ SN: similarly to Case 3, we can prove cfinit(f) ∈ E0.
Case 5. ctsucc(f) ∈ SN: then ctsucc(f) ∈ N. Since N is a model of precomp(T), there is at least one axiom in T such that (1) the left side of the axiom is true, (2) ctsucc(f) appears in the right side of the axiom, and (3) the other disjuncts besides ctsucc(f) are not in N. Therefore, there are two possible subcases:
Subcase 1. There exists an axiom of form (2): poss ∧ l1 ∧ · · · ∧ ln ⊃ ctsucc(f) ∨ c˜1 ∨ · · · ∨ c˜m such that poss, l1, . . . , ln ∈ N and c˜1, . . . , c˜m ∉ N. This means the rule ctsucc(f) ∨ c˜1 ∨ · · · ∨ c˜m ← l1, . . . , ln, poss is in πT^SN with l1, . . . , ln ∈ SN but c˜1, . . . , c˜m ∉ E0. Thus it forces ctsucc(f) ∈ E0.
Subcase 2. There exists an axiom of form (4): ˜l1 ∧ · · · ∧ ˜lk ∧ c˜t+1 ∧ · · · ∧ c˜n ⊃ ctsucc(f) ∨ c˜1 ∨ · · · ∨ c˜t such that ˜l1, . . . , ˜lk, c˜t+1, . . . , c˜n ∈ N and c˜1, . . . , c˜t ∉ N. Then ¬˜l1, . . . , ¬˜lk ∉ SN. Thus the rule ctsucc(f) ∨ c˜1 ∨ · · · ∨ c˜t ← c˜t+1, . . . , c˜n is in πT^SN, and so we also have ctsucc(f) ∈ E0.
Case 6. cfsucc(f) ∈ SN: similar to Case 5.
Case 7.
succ(f) ∈ SN: there are again two subcases:
Subcase 1. ctsucc(f) ∈ SN: by Case 5, ctsucc(f) ∈ E0. Since the rule (7)+ = (7): succ(f) ← ctsucc(f) is in πT^SN, we have succ(f) ∈ E0.
Subcase 2. ctsucc(f) ∉ SN: then ctsucc(f) ∉ N. Since poss ∈ N, we have init(f) ∈ N and cfsucc(f) ∉ N. Hence, init(f) ∈ SN and cfsucc(f) ∉ SN. The last fact means that the rule (9)+: succ(f) ← init(f), poss is in πT^SN. By Case 1, init(f) ∈ E0. Therefore, succ(f) ∈ E0.
Case 8. ¬succ(f) ∈ SN: according to the definition of SN, succ(f) ∉ N. There are two possible subcases:
Subcase 1. init(f) ∉ N and ctsucc(f) ∉ N: then ¬init(f) ∈ SN and ctsucc(f) ∉ SN. This means the rule (10)+ is in πT^SN. Therefore, ¬succ(f) ∈ E0.
Subcase 2. cfsucc(f) ∈ N: the rule (8)+ = (8): ¬succ(f) ← cfsucc(f) is in πT^SN, and cfsucc(f) ∈ E0 by Case 6. Thus, ¬succ(f) ∈ E0.
Combining the eight cases above, we have E0 = SN. That is, SN is an answer set of πT.
2. If E is a consistent answer set of πT, we want to show that ME = BT ∩ E is a model of comp(T).
It can be directly verified that ME satisfies T and the caused-axioms (see the end of Section 3). It remains to prove that ME satisfies the inertia axiom for succ, that is, ME |= ctsucc(f) ∨ (init(f) ∧ ¬cfsucc(f)) ≡ succ(f). By (7) and (9), ME |= ctsucc(f) ∨ (init(f) ∧ ¬cfsucc(f)) ⊃ succ(f). For the opposite implication, if succ(f) ∈ E, then cfsucc(f) cannot be in E. Thus, if succ(f) ∈ ME and ctsucc(f) ∉ ME, then the rule (9)+: succ(f) ← init(f), poss is in πT^E and succ(f) can only be supported by this rule in πT^E. That is, init(f) must be in ME. This implies ME |= succ(f) ⊃ ctsucc(f) ∨ (init(f) ∧ ¬cfsucc(f)).
Now we prove ME |= precomp(T), where precomp(T) = Circ(T; C). Suppose N is a model of precomp(T) such that N ⊆ ME. Then the following conditions are satisfied: (1) N and ME have the same sets of I-atoms and S-atoms, (2) every C-atom in N is also in ME, and (3) poss ∈ N. We claim the following two facts:
Fact 1. SN is also a model of πT^E, and
Fact 2. SN ⊆ E.
Fact 1 follows directly from conditions (1)-(3) above. To verify Fact 2, first note that SN = N ∪ N¬. It is obvious that N ⊆ ME ⊆ E. It remains to show that N¬ ⊆ E. In fact, for any L ∈ N¬, there are two possible cases by the definition of N¬:
Case 1. L = ¬init(f) with init(f) ∉ N: by condition (1) above, init(f) ∉ ME. Thus, init(f) ∉ E. This implies ¬init(f) ∈ E, since πT^E then contains the rule (13)+: ¬init(f) ←.
Case 2. L = ¬succ(f) with succ(f) ∉ N. This case can again be divided into two subcases:
Subcase 1. init(f) ∉ N and ctsucc(f) ∉ N: similarly to Case 1, ¬init(f) ∈ E. If we have shown that ctsucc(f) ∉ E, then the rule (10)+: ¬succ(f) ← ¬init(f), poss is in πT^E, and thus ¬succ(f) ∈ E. It remains to show that ctsucc(f) ∉ E under the assumption of Subcase 1. On the contrary, suppose that ctsucc(f) ∈ E. Then succ(f) ∈ E by (7). This implies succ(f) ∈ N.
On the other hand, the bodies of both (7)+ and (9)+ are not satisfied by N. Thus, succ(f) ∉ N, a contradiction.
Subcase 2. cfsucc(f) ∈ N: then cfsucc(f) ∈ ME, and thus cfsucc(f) ∈ E. By (8), ¬succ(f) ∈ E.
Combining the cases above, we conclude that Fact 2 holds. By the minimality of E, we have SN = E, and therefore N = ME. This means that ME is a model of precomp(T). Since poss ∈ ME, we have thus shown that ME is a model of comp(T).
Monotone Expansion of Updates in Logical Databases*
Michael Dekhtyar¹, Alexander Dikovsky², Sergey Dudakov¹, and Nicolas Spyratos³
¹ Dept. of CS, Tver State Univ., 33 Zheljabova str., Tver, Russia, 170000
[email protected], [email protected]
² Université de Nantes, IRIN, UPREF, EA No 2157, 2, rue de la Houssinière, BP 92208, F-44322 Nantes cedex 3, France,
[email protected]
and Keldysh Institute for Applied Math., 4 Miusskaya sq., Moscow, Russia, 125047
³ Université de Paris-Sud, LRI, U.R.A. 410 du CNRS, Bât. 490, F-91405 Orsay Cedex, France,
[email protected]
Abstract. Finding a minimal real change after an update of a database with integrity constraints (IC) expressed by a generalized logic program with explicit negation is proven to be a Σ₂ᵖ-complete problem. We define a class of operators expanding the input updates correctly with respect to the IC. The particular monotone expansion operator we describe is computed incrementally in quadratic time. It provides a practical optimization of the standard complete choice algorithm resolving the update problem.
1 Introduction
Since the early 1980s, database updates have been a focus of attention for both researchers and practitioners. There are several mainstreams in this activity. One of the first approaches was to specify updates in some algorithmic algebra with iteration or recursion in its signature (cf. [1]). Its main interest lies in expressivity and complexity analysis. Quite close is another approach, which proposes logical means for verifying the results of updates specified by expressions with operational semantics over primitive updates (see e.g. [3]). Still another approach, developed most intensively, applies to knowledge bases (KB) presented by logic programs. Sometimes KB extensions (states) are distinguished from KB intensions (views), sometimes not. In the first case, various methods were developed for updating views through abduction (see e.g. [9,11,8]), sometimes reinforced by bottom-up hypothesis generation for integrity checking (e.g. [4]). In the second case, either the models are updated (in order to indirectly change the knowledge base) (see e.g.
* This work was sponsored by the Russian Fundamental Studies Foundation (Grants 97-01-00973, 98-01-00204).
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 132-147, 1999. © Springer-Verlag Berlin Heidelberg 1999
[13,14]), or the logic program specifying the KB is directly changed by another logic program (an update program, [2]). In the case where negative knowledge is treated, both approaches work exclusively with some sort of intended models: for instance, stable or well-founded.
In this paper we develop the approach to updates proposed in [5,6,10]. We tackle the following problem: given an extended logic program Φ which formalizes the integrity constraints (IC), a correct initial state I |= Φ, and an external update ∆ which specifies the facts D+ to be added to I and the facts D− to be deleted from it, one should find the minimal real change Ψ(I) of the state I, sufficient to accomplish ∆ and to restore Φ if and when it is violated, i.e. to guarantee that D+ ⊆ Ψ(I), Ψ(I) ∩ D− = ∅, and Ψ(I) |= Φ. This setting substantially changes the requirements on the models under consideration. Indeed, the program Φ has nothing to do with knowledge definition; it only specifies the conflicts to avoid after the update. So the use of exclusively "intended" models would lead in this case to the loss of information or to unjustified conflict resolution failures. We illustrate this difference by the following simplified example.
Example 1. The IC Φ below expresses a typical case of an exception from a general rule. It consists of two clauses. The first one expresses the general rule: "children (proposition children) can bathe (bathe) if with parents (parents)". The other one expresses an exception from this rule: "children cannot bathe during the ebb tide (proposition ebb), even when with parents":
bathe ← children, parents
¬bathe ← children, parents, ebb.
In the state I = {children, ebb}, suppose that the parents arrive, which is expressed as the addition of parents to I. The solution is simple but nontrivial: inclusion of parents causes replacement of ebb by bathe, the result being I1 = {children, parents, bathe}. It seems that the "intended model" methods fail to find this solution.
Indeed, since there are no rules with parents in the head, there is no direct refutation or abduction proof of the added fact. I is not a stable model of Φ, nor is I1. One could propose to add the clauses children ← and ebb ← to Φ and then to update the resulting program Φ′ by the update program {parents ←}. In this case I would become a stable model of Φ′, but again the inertia rules of [2,14] would prevent the ebb tide from ending (i.e. prevent inferring ¬ebb).
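The three correctness conditions D+ ⊆ Ψ(I), Ψ(I) ∩ D− = ∅ and Ψ(I) |= Φ can be checked by brute force on Example 1. The Python sketch below is our own illustration; it only enumerates the correct candidate states and does not implement the conservative minimality criterion discussed later. It confirms that I1 is a correct result of the update, while any state keeping children, parents, and ebb together is ruled out by Φ.

```python
from itertools import chain, combinations

ATOMS = ['children', 'parents', 'bathe', 'ebb']

# Phi from Example 1: pairs (head, body); a '-' prefix on the head encodes
# explicit negation, which on total states means "this atom must be absent".
PHI = [('bathe',  ['children', 'parents']),
       ('-bathe', ['children', 'parents', 'ebb'])]

def models(state, phi):
    def holds(l):
        return l[1:] not in state if l.startswith('-') else l in state
    return all(holds(h) or not all(holds(l) for l in b) for h, b in phi)

def correct_results(phi, d_plus, d_minus):
    """All states J with D+ included in J, J disjoint from D-, and J |= Phi."""
    cands = chain.from_iterable(combinations(ATOMS, n)
                                for n in range(len(ATOMS) + 1))
    return [set(j) for j in cands
            if d_plus <= set(j)
            and not (d_minus & set(j))
            and models(set(j), phi)]

I = {'children', 'ebb'}
assert models(I, PHI)                                   # I |= Phi
results = correct_results(PHI, {'parents'}, set())
assert {'children', 'parents', 'bathe'} in results      # I1 accomplishes Delta
# no correct result keeps children, parents and ebb together:
assert all(not {'children', 'parents', 'ebb'} <= j for j in results)
```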
Moreover, our problem of minimal real change with respect to the IC is harder from the complexity point of view than the problem of intended knowledge update. The latter is of the type "guess and check" (which corresponds to NP), whereas the former is proven in Section 3 of this paper to be Σ₂ᵖ-complete. We present in that section a natural standard complete choice algorithm D search which resolves this problem in linear space and exponential time. In our earlier papers [5,6] we introduced a broad class of update operators Ψ(Φ, ∆, I) (we call them conservative), based on a mixed minimal change criterion which combines the maximal intersection and the minimal symmetric difference of states. We consider both partial and total models, and we describe nondeterministic and deterministic conservative update algorithms based on conflict resolution techniques. In this paper we deal with the case of classical total models, and we propose a new practical method of speeding up the standard choice algorithm D search.
134
M. Dekhtyar et al.
Our method is based on the simple idea that the initial update ∆ = (D+, D−) can be incrementally and correctly expanded to a broader update by being iteratively propagated into the IC Φ. In fact, we define in Section 4 a broad class of such update extension operators which narrow the choice space of D_search and simultaneously specialize and simplify the IC Φ with respect to the expanded updates. We show that these operators have elegant properties. The particular extension operators we consider propagate positive and negative elementary updates in both directions: from the bodies to the heads of the clauses and back. The combined fixpoint of the composition of these operators stabilizes after a finite number of steps and is computed in quadratic time. In Section 5 we show how this operator can be used for both static and dynamic optimization of D_search. This practical method can be applied to databases with IC expressed in the form of function-free logic programs. It also applies to knowledge bases in which the knowledge has an invariable part.
2 Notation
We assume that the reader is familiar with the basic concepts and terminology of logic programming (see [12]).
Language. We fix a first order signature S with an infinite set of constants C and no other function symbols. A domain is a finite subset D of C. For each domain D, by A(D), L(D), B(D) and LB(D) we denote respectively the sets of all atoms, all literals, all ground atoms, and all ground literals in the signature S with constants in D. The literal contrary to a literal l is denoted by ¬.l. We set ¬.M = {¬.l | l ∈ M}.
Logic programs. We consider generalized logic programs in S with explicit negation, i.e. finite sets of clauses of the form r = (l ← l1, ..., ln), where n ≥ 0 and l, li ∈ L(D) (note that negative literals are possible in the bodies and in the heads of the clauses). For a clause r, head(r) denotes its head and body(r) its body. We will treat body(r) as a set of literals. Integrity constraints (IC) are expressed by a logic program Φ of this kind. IC(D) denotes the set of all ground integrity constraints in the signature S with constants in D. We consider the following simplification order on IC(D): Φ1 ≼ Φ2 if ∀r ∈ Φ1 ∃r′ ∈ Φ2 (head(r) = head(r′) & body(r) ⊆ body(r′)).
Correct DB states. In this paper we consider classical total interpretations of ICs over closed domains. This means that a certain domain D is fixed for each problem. A (total) interpretation or a DB state I over domain D is a subset of B(D). Given an IC Φ ∈ IC(D) and a DB state I over D, a (ground) clause r = (l ← l1, ..., ln) of Φ is valid in I (denoted I |= r) if I |= l whenever I |= li for each 1 ≤ i ≤ n. Here I |= a means a ∈ I, and I |= ¬.a means a ∉ I. I is a correct DB state or a model of Φ (denoted I |= Φ) if all clauses of Φ are valid in I.
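On this setting, clause validity in a total DB state is a one-line check. A minimal Python sketch (the string encoding "-a" for the explicit negation ¬a and the pair representation of clauses are our own):

```python
def holds(I, lit):
    # I |= a iff a is in I; I |= ¬a iff a is not in I (total interpretation)
    atom = lit.lstrip("-")
    return (atom not in I) if lit.startswith("-") else (atom in I)

def valid(I, clause):
    # I |= (l <- l1,...,ln): the head holds whenever the whole body holds
    head, body = clause
    return holds(I, head) or not all(holds(I, l) for l in body)

def is_model(I, phi):
    # I is a correct DB state for phi if every clause is valid in I
    return all(valid(I, r) for r in phi)
```

For instance, a clause with an empty body forces its head, and a clause whose body fails is vacuously valid.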
Monotone Expansion of Updates in Logical Databases
135
Updates. A pair ∆ = (D+, D−), where D+, D− are subsets of LB(D) and D+ ∩ D− = ∅, is called an update. Intuitively, the atoms of D+ are to be added to the DB state I, and those of D− are to be removed from I. UP(D) will denote the set of all updates in the signature S with constants in D. We say that ∆ = (D+, D−) is accomplished in I if D+ ⊆ I and D− ∩ I = ∅. The updates in UP(D) will be partially ordered by the componentwise inclusion relation: ∆1 ⊑ ∆2 iff D1+ ⊆ D2+ and D1− ⊆ D2−.
Update operators. Let Γ be an operator of the type Γ : IC(D) × UP(D) → IC(D) × UP(D), Γ(Φ, ∆) = (Φ′, ∆′), and ∆′ = (D′+, D′−). In the definitions to follow Φ′, ∆′, D′+, and D′− are denoted respectively by Γ(Φ, ∆)ic, Γ(Φ, ∆)up, Γ(Φ, ∆)+, and Γ(Φ, ∆)−. We denote by Γ^n the n-fold composition of Γ and by Γ^ω the operator Γ^ω(Φ, ∆) = lim_{n→∞} Γ^n(Φ, ∆).
In the sequel we will omit D when this causes no ambiguity; in place of A(D), L(D), B(D), LB(D) we then write A, L, B, LB.
3 Conservative Update Operators and Their Complexity
In general an update may contradict a constraint. So a reasonable definition of an update operator should either contain a requirement of "compatibility" of an update and a constraint, or specify a part of the update "compatible" with the constraint. The requirement of compatibility is easy to formalize.
Definition 1 For Φ ∈ IC and ∆ ∈ UP let us denote by Acc(Φ, ∆) the set of all models I |= Φ in which ∆ is accomplished. An update ∆ is compatible with an IC Φ if Acc(Φ, ∆) ≠ ∅.
Proposition 1 The property Comp = {(Φ, ∆) | Φ ∈ IC & ∆ ∈ UP & (∆ is compatible with Φ)} is NP-complete.
Proof. It is evident that Comp ∈ NP. In order to prove NP-hardness of Comp we show that 3-SAT ≤p Comp. Let α = β1 & ... & βm be some 3-CNF, where βj = l1j ∨ l2j ∨ l3j and lij ∈ {x1, ¬x1, ..., xn, ¬xn} for all 1 ≤ j ≤ m and 1 ≤ i ≤ 3. We consider the set of atoms B = {a, x1, x′1, ..., xn, x′n} and construct from α the pair (Φα, ∆α), where ∆α = ({a}, ∅) and Φα has (2n + m) clauses:
ri = (¬a ← xi, x′i), i = 1, ..., n
pi = (¬a ← ¬xi, ¬x′i), i = 1, ..., n
bj = (¬a ← l̃1j, l̃2j, l̃3j), j = 1, ..., m,
where l̃ij = xk if lij = xk, and l̃ij = x′k if lij = ¬xk.
Suppose that (Φα, ∆α) ∈ Comp and I ∈ Acc(Φα, ∆α). Then a ∈ I, and the clauses ri and pi assure that for each 1 ≤ i ≤ n either xi ∈ I or x′i ∈ I, but not both. Let σ be the truth assignment with σ(xi) = true iff x′i ∈ I and σ(xi) = false iff xi ∈ I. Since each clause bj, 1 ≤ j ≤ m, is valid in I, there is some 1 ≤ i ≤ 3 such that I ⊭ l̃ij, and hence l̃ij ∉ I. So σ(lij) = true and therefore σ(α) = true.
Now suppose that α ∈ 3-SAT and let σ be a truth assignment such that σ(α) = true. Consider the interpretation I = {xi | σ(xi) = false} ∪ {x′i | σ(xi) = true} ∪ {a}. It is easy to check that I ∈ Acc(Φα, ∆α). □
In [5] we propose the following minimal change criterion implementing the intention to keep as many initial facts as possible, and then to add as few new facts as possible:
Definition 2 Let I, I1 be two DB states, and K be a class of DB states. We say that I1 is minimally deviating from I with respect to K if ∀I2 ∈ K (¬(I ∩ I1 ⊊ I ∩ I2) & ((I ∩ I1 = I ∩ I2) → ¬(I2 \ I ⊊ I1 \ I))).
In terms of this minimal change criterion the conservative update operators we consider have been defined in [6] as follows.
Definition 3 Let ∆ be a given update which is compatible with an IC Φ. An operator Ψ = Ψ[Φ, ∆] on the set of DB states is a conservative update operator if for each DB state I:
• Ψ(I) is a model of Φ,
• ∆ is accomplished in Ψ(I),
• Ψ(I) is minimally deviating from I with respect to Acc(Φ, ∆).
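Definition 2 can be checked directly on small state spaces. A hedged Python sketch (states are arbitrary finite sets; K is any finite class of candidate states):

```python
def minimally_deviating(I, K):
    # keep each I1 whose preserved part I ∩ I1 is maximal wrt inclusion in K and,
    # among equal preserved parts, whose set of additions I1 \ I is minimal
    K = [frozenset(s) for s in K]
    I = frozenset(I)
    return [I1 for I1 in K
            if all(not (I & I1 < I & I2) and
                   (I & I1 != I & I2 or not (I2 - I < I1 - I))
                   for I2 in K)]
```

Note that the criterion is inclusion-based, not cardinality-based, so several incomparable minimally deviating states may coexist.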
3.1 Computational Complexity of Conservative Updates
In order to measure the complexity of conservative updates we use two standard algorithmic problems: the Optimistic and the Pessimistic Fall-Into Problem (OFIP and PFIP respectively) (cf. [7]).
OFIP: Given some ∆ ∈ UP compatible with Φ ∈ IC, an initial state I, and a literal l ∈ LB, check whether there exists a DB state I1 such that: (a) I1 ∈ Acc(Φ, ∆), (b) I1 is minimally deviating from I with respect to Acc(Φ, ∆), and (c) I1 |= l.
PFIP: requires (c) to be true for all models I1 satisfying (a), (b).
We consider the combined complexity of these problems with respect to the problem size evaluated as N = |D| + |I| + |∆| + |Φ| + |l| (| | being the size of constant or literal sets, and of programs in some standard encoding). We denote respectively by OFIP and PFIP the sets of all solutions (I, ∆, Φ, l) of these problems. It is easy to see that they are in fact co-sets (i.e. (I, ∆, Φ, l) ∈ PFIP iff (I, ∆, Φ, ¬.l) ∉ OFIP). The following theorem classifies their complexity depending on the nonmonotonic means used: negation in Φ and deletions in ∆.
Theorem 1 ¹
(1) If there are no negations in Φ and no deletions in ∆ (i.e. Φ is a definite logic program and D− = ∅), then both OFIP and PFIP belong to P.
(2) If Φ is a definite logic program, then OFIP is NP-complete and PFIP is co-NP-complete.
(3) In the general case OFIP is Σ₂ᵖ-complete and PFIP is Π₂ᵖ-complete.
¹ It is worth noting that these complexity bounds are similar to those established in [7] for updates of propositional KBs.
Proof. Point (1) being quite evident, we present some details of the proofs of (2) and (3).
Under the conditions of (2), I1 satisfies OFIP (b) iff (b′): I1 equals the minimal model of (Φ ∪ D+ ∪ (I1 ∩ I)). So it is enough to guess I1, then to check that it satisfies (a) and (b′), and finally to check that l ∈ I1. Therefore OFIP ∈ NP. To show the lower bound, we reduce 3-SAT to OFIP. Let α = (l11 ∨ l12 ∨ l13) ∧ · · · ∧ (ln1 ∨ ln2 ∨ ln3) be a 3-CNF and x1, . . . , xm be the variables in α. Then we set ∆ = ({c}, {a}), I = {x1, x′1, . . . , xm, x′m, a}, and define Φ as the set of clauses:
a ← xi, x′i, 1 ≤ i ≤ m;
bi ← c, l̃ij, 1 ≤ i ≤ n, j = 1, 2, 3;
b ← b1, . . . , bn,
where l̃ij = xi if lij = xi, and l̃ij = x′i if lij = ¬xi. One can verify that (I, ∆, Φ, b) ∈ OFIP iff α is satisfiable.
For (3), the upper bound is provided by the following ∃∀-algorithm:
1) Guess an interpretation I1 such that I1 |= l, and check that I1 ∈ Acc(Φ, ∆).
2) For each interpretation I2 ≠ I1 check that either I2 ∉ Acc(Φ, ∆), or the condition of Definition 2 is satisfied.
In order to show the lower bound, we reduce to OFIP the Σ₂ᵖ-complete set of valid quantified propositional formulas of the form ∃x1...∃xk ∀y1...∀ym α(x̄, ȳ), where α is in 3-DNF. Let φ be such a formula with α(x̄, ȳ) = ⋁_{j=1}^{r} (l1j ∧ l2j ∧ l3j), where lij ∈ {x1, ¬x1, ..., xk, ¬xk, y1, ¬y1, ..., ym, ¬ym} for all 1 ≤ j ≤ r, 1 ≤ i ≤ 3.
Let B = {a, b, x1, x′1, ..., xk, x′k, y1, y′1, ..., ym, y′m}. φ is encoded by the quadruple (Φ, ∆, I, b), where ∆ = ({a}, ∅), I = {x1, x′1, ..., xk, x′k}, and Φ has 2k + 4m + r clauses:
rxi: (¬a ← xi, x′i); pxi: (¬a ← ¬xi, ¬x′i), i = 1, ..., k;
ryj: (b ← yj, y′j); pyj: (b ← ¬yj, ¬y′j), j = 1, ..., m;
tj: (yj ← b); t′j: (y′j ← b), j = 1, ..., m;
bs: (b ← l̃1s, l̃2s, l̃3s), s = 1, ..., r,
where l̃is = lis if lis ∈ {x1, ..., xk, y1, ..., ym}, and l̃is = (¬.lis)′ if lis ∈ {¬x1, ..., ¬xk, ¬y1, ..., ¬ym}.
Then the lower bound of point (3) follows from the next claim: φ is valid iff (I, Φ, ∆, b) ∈ OFIP. □
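As a sanity check on these constructions, the Proposition 1 reduction can be tested by brute force on tiny instances. A Python sketch (assumptions of ours: input clauses are given DIMACS-style as integer triples over variables 1..n, and the primed atoms x′i are encoded as yi):

```python
from itertools import product

def holds(I, lit):
    a = lit.lstrip("-")
    return (a not in I) if lit.startswith("-") else (a in I)

def is_model(I, phi):
    return all(holds(I, h) or not all(holds(I, l) for l in b) for h, b in phi)

def reduction(cnf, n):
    # build (Phi_alpha, Delta_alpha) from a 3-CNF over variables 1..n
    phi = [("-a", [f"x{i}", f"y{i}"]) for i in range(1, n + 1)]      # clauses r_i
    phi += [("-a", [f"-x{i}", f"-y{i}"]) for i in range(1, n + 1)]   # clauses p_i
    for clause in cnf:                                               # clauses b_j
        phi.append(("-a", [f"x{l}" if l > 0 else f"y{-l}" for l in clause]))
    atoms = (["a"] + [f"x{i}" for i in range(1, n + 1)]
                   + [f"y{i}" for i in range(1, n + 1)])
    return phi, ({"a"}, set()), atoms

def compatible(phi, delta, atoms):
    # Acc(Phi, Delta) is nonempty, by exhaustive enumeration of DB states
    d_plus, d_minus = delta
    return any(d_plus <= I and not (d_minus & I) and is_model(I, phi)
               for bits in product([0, 1], repeat=len(atoms))
               for I in [{a for a, bit in zip(atoms, bits) if bit}])
```

On a satisfiable formula the constructed pair is compatible; on an unsatisfiable one it is not, in line with the reduction.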
3.2 Directed Search Implementation of Conservative Updates
Since DB states and updates are finite, and the domain is closed, the space of DB states resulting from updates is finite as well. For each subset X of this space let us fix a topological order with respect to set inclusion on the subsets of X, with successor function nextX; we set nextX(X) = ⊥ for some constant ⊥. The following algorithm implements the complete search for a conservative update, directed by this order and consistent with our minimal change criterion.
Algorithm D_search(I, Φ, ∆)
Input: a DB state I, and an update ∆ = (D+, D−) compatible with Φ ∈ IC.
Local variables: Ĩ (initial state update), Hadd (positive one-step state update), Hdel
(negative one-step state update), H+ (search space for positive updates Hadd), H− (search space for negative updates Hdel), b, c (boolean stabilization flags).
Output: I1.
Ĩ := (I ∪ D+) \ D−;
% search spaces for the current state updates Hdel and Hadd:
H− := Ĩ \ D+;        % search space for Hdel
H+ := B \ (Ĩ ∪ D−);  % search space for Hadd
Hdel := ∅; b := false;
WHILE ¬b AND Hdel ≠ ⊥ DO
  Hadd := ∅; c := false;
  WHILE ¬c AND Hadd ≠ ⊥ DO
    I1 := (Ĩ \ Hdel) ∪ Hadd;
    IF there is a clause r ∈ Φ such that I1 ⊭ r
    THEN Hadd := nextH+(Hadd)
    ELSE c := true
    END IF
  END DO
  IF c THEN b := c ELSE Hdel := nextH−(Hdel) END IF
END DO
Output I1.
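A direct Python rendering of this search (a sketch of ours: the explicit negation ¬a is encoded as "-a", and subsets are enumerated by size, one possible linearization of the inclusion order, so which of several minimally deviating states is returned depends on this choice):

```python
from itertools import combinations

def holds(I, lit):
    a = lit.lstrip("-")
    return (a not in I) if lit.startswith("-") else (a in I)

def is_model(I, phi):
    return all(holds(I, h) or not all(holds(I, l) for l in b) for h, b in phi)

def subsets(xs):
    # one topological linearization of set inclusion: by size, then lexicographically
    xs = sorted(xs)
    for k in range(len(xs) + 1):
        for c in combinations(xs, k):
            yield set(c)

def d_search(I, phi, delta, atoms):
    d_plus, d_minus = delta
    i0 = (set(I) | d_plus) - d_minus          # the tentative state I~
    h_minus = i0 - d_plus                     # search space for deletions Hdel
    h_plus = set(atoms) - i0 - d_minus        # search space for additions Hadd
    for h_del in subsets(h_minus):
        for h_add in subsets(h_plus):
            i1 = (i0 - h_del) | h_add
            if is_model(i1, phi):
                return i1
    return None                               # delta incompatible with phi
```

On the data of Example 1 it returns a model of Φ accomplishing the update; which minimal state is produced depends on the enumeration order.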
Theorem 2 Algorithm D_search implements conservative update operators in linear space and in time O(2^{d+a} N), where N = |Φ| + |∆|, d is the size (the number of literals) of H−, and a is the size of H+.
Proof. It is easy to check that for every I′ ∈ Acc(Φ, ∆) there are H′del ⊆ H− and H′add ⊆ H+ such that I′ = (Ĩ \ H′del) ∪ H′add. Since Acc(Φ, ∆) ≠ ∅ and the algorithm D_search(I, Φ, ∆) tries all pairs (Hdel, Hadd) such that Hdel ⊆ H− and Hadd ⊆ H+, it eventually stops and outputs some model I1 ∈ Acc(Φ, ∆). The order on pairs (Hdel, Hadd) induced by the functions nextH− and nextH+ ensures that I1 is minimally deviating from I with respect to Acc(Φ, ∆). The complexity bounds are easy to verify. □
Theorem 1 shows that the nature of conservative update operators leaves no hope of substantially optimizing this algorithm in general. However, there may exist practical methods of speeding up this standard search directed by our minimal change criterion. Below we propose one such efficient method. Our idea is to reduce the search spaces H+, H− by correctly expanding the sets D+, D−, propagating them into Φ.
4 Update Expansion Operators
In this section we define several operators on constraints and updates which correctly expand the sets of added and deleted facts according to the IC implemented by a logic program Φ, and simultaneously specialize and simplify Φ according to the expanded updates.
Each update ∆ = (D+, D−) induces the set of literals l which it requires (∆ |= l) and the set of those it contradicts (∆ ⊭ l). The idea behind our expansion operators is that this primary positive and negative information can be incrementally augmented by being propagated into Φ. Table 1 below describes the primary relations |= and ⊭ between ∆ and ground atoms a ∈ D+ and a ∈ D−. These relations can be extended to conjunctions of ground literals l1, ..., lk ∈ LB: ∆ |= l1, ..., lk if ∀j (∆ |= lj), and ∆ ⊭ l1, ..., lk if ∃j (∆ ⊭ lj). In particular, ∆ |= ∅, and ∆ ⊭ ∅ is not true.

Table 1. Relations ∆ |= l and ∆ ⊭ l.
  a ∈ D+ :  ∆ |= a,   ∆ ⊭ ¬a
  a ∈ D− :  ∆ |= ¬a,  ∆ ⊭ a

These definitions allow us to carry over the validity of literals from updates to the interpretations in which the updates are accomplished. Namely, the following evident property holds.
Lemma 1 Let ∆ be accomplished in I. Then:
(1) if ∆ |= l1, ..., lk, then I |= l1, ..., lk, and if ∆ ⊭ l1, ..., lk, then ¬(I |= l1, ..., lk);
(2) if I |= l1, ..., lk, then it is not the case that ∆ ⊭ l1, ..., lk.
Evidently, the relations |= and ⊭ defined by Table 1 are monotone with respect to updates.
Lemma 2 For any two updates ∆1, ∆2 ∈ UP such that ∆1 ⊑ ∆2, and for any set {l1, ..., lk} ⊆ LB:
(1) if ∆1 |= l1, ..., lk, then ∆2 |= l1, ..., lk;
(2) if ∆1 ⊭ l1, ..., lk, then ∆2 ⊭ l1, ..., lk.
The simplification order on programs we use conforms with the following residue operator simplifying logic programs via updates.
Definition 4 The residue of Φ ∈ IC with respect to ∆ is defined as: res(Φ, ∆) = {l ← α | ∃r ∈ Φ (head(r) = l & ¬(∆ |= l) & ¬(∆ ⊭ body(r)) & α ⊆ body(r) & body(r) \ α = max{β ⊆ body(r) | ∆ |= β})}.
Example 2 For the IC Φ consisting of the clauses r1: a ← b, ¬c; r2: ¬b ← ¬a, d; r3: c ← ¬b, ¬d; and r4: ¬c ← b, d, and the update ∆ = ({b}, {c}), the residue res(Φ, ∆) has two clauses: r′1: a ← and r′2: ¬b ← ¬a, d.
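The relations of Table 1 and the residue operator are easy to transcribe. A Python sketch (our own encoding: "-a" for ¬a, updates as pairs of atom sets), reproducing Example 2:

```python
def entails(delta, lit):
    # Delta |= lit, per Table 1
    d_plus, d_minus = delta
    a = lit.lstrip("-")
    return a in (d_minus if lit.startswith("-") else d_plus)

def contradicts(delta, lit):
    # the relation written "Delta does not |= lit" in Table 1
    d_plus, d_minus = delta
    a = lit.lstrip("-")
    return a in (d_plus if lit.startswith("-") else d_minus)

def res(phi, delta):
    # drop clauses with an entailed head or a contradicted body literal;
    # erase from the remaining bodies every literal already entailed by delta
    out = []
    for head, body in phi:
        if entails(delta, head) or any(contradicts(delta, l) for l in body):
            continue
        out.append((head, [l for l in body if not entails(delta, l)]))
    return out

PHI = [("a", ["b", "-c"]), ("-b", ["-a", "d"]),
       ("c", ["-b", "-d"]), ("-c", ["b", "d"])]
RESIDUE = res(PHI, ({"b"}, {"c"}))
```

Running this yields exactly the two residue clauses of Example 2: a ← (body fully entailed) and ¬b ← ¬a, d (nothing resolved).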
The following evident properties of res ensure its incremental computation. Lemma 3 (1) Acc(Φ, ∆) = Acc(res(Φ, ∆), ∆) for all Φ ∈ IC and ∆ ∈ UP. (2) Let an update ∆ = (D+ , D− ) be partitioned into two updates ∆1 = (D1+ , D1− ) and ∆2 = (D2+ , D2− ), where D+ = D1+ ∪ D2+ and D− = D1− ∪ D2− . Then res(Φ, ∆) = res(res(Φ, ∆1 ), ∆2 ) for all Φ ∈ IC.
(3) Let Φ = Φ1 ∪ Φ2 be an IC partition. Then res(Φ, ∆) = res(Φ1, ∆) ∪ res(Φ2, ∆) for all ∆ ∈ UP.
(4) Let |Φ| be the size of Φ ∈ IC (the number of literals in all clauses of Φ). Then res(Φ, ∆) ≼ Φ, and therefore |res(Φ, ∆)| ≤ |Φ| for all ∆ ∈ UP.
We propose the following general definition of expansion operators.
Definition 5 (1) Let Φ, Φ′ be ICs, and ∆, ∆′ be updates. The pairs (Φ, ∆) and (Φ′, ∆′) are update-equivalent (notation: (Φ, ∆) ≡u (Φ′, ∆′)) if Acc(Φ, ∆) = Acc(Φ′, ∆′).
(2) An operator Γ : IC × UP → IC × UP is an update expansion operator if for all compatible Φ ∈ IC and ∆ = (D+, D−) ∈ UP
• ∆ ⊑ Γ(Φ, ∆)up,
• Γ(Φ, ∆)ic ≼ Φ,
• (Φ, ∆) ≡u Γ(Φ, ∆).
From this definition it follows immediately that the set of all update expansion operators is closed under composition. The particular update expansion operators we propose are defined through operators which propagate the relations ∆ |= l and ∆ ⊭ l in opposite directions: from bodies to heads and back.
Forward operator F on Φ ∈ IC and ∆ = (D+, D−) ∈ UP:
F(Φ, ∆)+ = D+ ∪ {a ∈ B | ∃r ∈ Φ (a = head(r) & ∆ |= body(r))}
F(Φ, ∆)− = D− ∪ {a ∈ B | ∃r ∈ Φ (¬a = head(r) & ∆ |= body(r))}.
Backward operator B on Φ ∈ IC and ∆ = (D+, D−) ∈ UP:
B(Φ, ∆)+ = D+ ∪ {a ∈ B | ∃r ∈ Φ (∆ ⊭ head(r) & body(r) = ¬a α & ∆ |= α)}
B(Φ, ∆)− = D− ∪ {a ∈ B | ∃r ∈ Φ (∆ ⊭ head(r) & body(r) = a α & ∆ |= α)}.
Clearly, both operators F and B are monotone. They are also invariant with respect to the residue operator res and do not change the models.
Lemma 4 For any Φ ∈ IC and ∆ ∈ UP the following equalities hold:
(1) F(res(Φ, ∆), ∆) = F(Φ, ∆), B(res(Φ, ∆), ∆) = B(Φ, ∆).
(2) Acc(Φ, ∆) = Acc(Φ, F(Φ, ∆)), Acc(Φ, ∆) = Acc(Φ, B(Φ, ∆)).
We now use these operators to define the forward and backward update expansions.
Forward update expansion Γf:
γf^0(Φ, ∆) = (Φ, ∆)
γf(Φ, ∆) = (res(Φ, ∆), F(res(Φ, ∆), ∆))
γf^{n+1}(Φ, ∆) = γf(γf^n(Φ, ∆))
Γf(Φ, ∆) = lim_{n→∞} γf^n(Φ, ∆).
Backward update expansion Γb:
γb^0(Φ, ∆) = (Φ, ∆)
γb(Φ, ∆) = (res(Φ, ∆), B(res(Φ, ∆), ∆))
γb^{n+1}(Φ, ∆) = γb(γb^n(Φ, ∆))
Γb(Φ, ∆) = lim_{n→∞} γb^n(Φ, ∆).
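A Python sketch of the one-step operators F and B on a string encoding of our own ("-a" for ¬a, updates as pairs of atom sets). In B, a clause contributes when its head is contradicted and all of its body literals but one are entailed; the remaining literal must then be falsified:

```python
def entails(delta, lit):
    d_plus, d_minus = delta
    a = lit.lstrip("-")
    return a in (d_minus if lit.startswith("-") else d_plus)

def contradicts(delta, lit):
    d_plus, d_minus = delta
    a = lit.lstrip("-")
    return a in (d_plus if lit.startswith("-") else d_minus)

def forward(phi, delta):
    # F: fire every clause whose whole body is entailed by delta
    d_plus, d_minus = set(delta[0]), set(delta[1])
    for head, body in phi:
        if all(entails(delta, l) for l in body):
            (d_minus if head.startswith("-") else d_plus).add(head.lstrip("-"))
    return (d_plus, d_minus)

def backward(phi, delta):
    # B: the single unresolved body literal of a head-contradicted clause
    # must be falsified, so its complement joins the update
    d_plus, d_minus = set(delta[0]), set(delta[1])
    for head, body in phi:
        if not contradicts(delta, head):
            continue
        for lit in body:
            if all(entails(delta, m) for m in body if m is not lit):
                (d_plus if lit.startswith("-") else d_minus).add(lit.lstrip("-"))
    return (d_plus, d_minus)
```

For instance, on the first two clauses of Example 2 with ∆ = ({b}, {c}), F adds a; and B applied to the single clause ¬a ← e under ∆ = ({a}, ∅) forces the deletion of e.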
The operators F, B, and res we have introduced have the properties that guarantee the existence of limits for the directed sets of their iterations.
Lemma 5 For any pair Φ ∈ IC and ∆ ∈ UP
(1) there is n ≥ 0 such that Γf(Φ, ∆) = γf^n(Φ, ∆),
(2) there is m ≥ 0 such that Γb(Φ, ∆) = γb^m(Φ, ∆).
Proof. In order to establish point (1) let us consider for each pair (Φ, ∆), ∆ = (D+, D−), the number N(Φ, ∆) = |Φ| + |LB| − |D+| − |D−| ≥ 0. From Lemma 3 and the definition of γf it follows that for every n, if γf^{n+1}(Φ, ∆) ≠ γf^n(Φ, ∆), then N(γf^{n+1}(Φ, ∆)) < N(γf^n(Φ, ∆)). Therefore, for some m ≤ N(Φ, ∆) the equality N(γf^{m+1}(Φ, ∆)) = N(γf^m(Φ, ∆)) holds and Γf(Φ, ∆) = γf^m(Φ, ∆). A similar argument proves point (2). □
Theorem 3 (1) Γf, Γb, and Γlim = (Γf ◦ Γb)^ω are update expansion operators.
(2) There are algorithms f_expand and b_expand which correctly compute the operators Γf and Γb in linear time.
(3) Γlim is computable in quadratic time.
Proof. 1. Let us show that Γf is an expansion operator. From Lemma 2 it follows that the operators γf and γb are monotone with respect to updates. Then by standard induction on n we prove that so are the operators γf^n and γb^n. Therefore, by Lemma 5, Γf and Γb are also monotone with respect to updates. Now, in order to prove that Γf is an update expansion operator we show that Acc(Φ, ∆) = Acc(Γf(Φ, ∆)). In fact, it suffices to establish the equality Acc(Φ, ∆) = Acc(γf(Φ, ∆)); the previous equality then follows from Lemma 5 by evident induction on m.
IF-part: Acc(Φ, ∆) ⊆ Acc(γf(Φ, ∆)). Suppose that I ∈ Acc(Φ, ∆). By definition, γf(Φ, ∆) = (res(Φ, ∆), F(res(Φ, ∆), ∆)). First, we show that I |= res(Φ, ∆). Let r′ = l ← α be a clause in res(Φ, ∆). Then by the definition of res, there are a clause r ∈ Φ and a set of literals β such that head(r) = l, body(r) = α ∪ β, and ∆ |= β. By Lemma 1, I |= β. Now, if I |= α, then I |= body(r) and I |= l (since I |= r). So I |= r′, and I |= res(Φ, ∆).
Next, we show that the update γf(Φ, ∆)up = F(res(Φ, ∆), ∆) is accomplished in I. Let a be any atom from F(res(Φ, ∆), ∆)+. Then either a ∈ D+, or a = head(r) for some clause r ∈ res(Φ, ∆) such that ∆ |= body(r). In the first case, a ∈ I since ∆ is accomplished in I. In the second case, by Lemma 1 I |= body(r), and I |= a since I |= res(Φ, ∆). Therefore, a ∈ I and F(res(Φ, ∆), ∆)+ ⊆ I. If a ∈ F(res(Φ, ∆), ∆)−, then either a ∈ D−, or ¬a = head(r) for some clause r ∈ res(Φ, ∆) such that ∆ |= body(r). Again we conclude that I |= head(r) = ¬a and therefore a ∉ I. Then I ∩ F(res(Φ, ∆), ∆)− = ∅. Hence, the update γf(Φ, ∆)up is accomplished in I.
ONLY-IF-part: Acc(γf(Φ, ∆)) ⊆ Acc(Φ, ∆). Suppose that I ∈ Acc(γf(Φ, ∆)). Since F is monotone with respect to updates, and F(res(Φ, ∆), ∆) is accomplished in I, ∆ is accomplished in I as well. Now let r be a clause of Φ with head(r) = l, and suppose that I |= body(r). Then by Lemma 1(2), it is not the case that ∆ ⊭ body(r). If ∆ |= l, then by Lemma 1(1), I |= l and I |= r. Otherwise, there is a clause r′ = l ← α ∈ res(Φ, ∆) such that body(r) = α ∪ β for some set of literals β with ∆ |= β. Since I |= body(r), we have I |= α, and I |= l because I |= r′. Therefore I |= r. So I |= Φ and I ∈ Acc(Φ, ∆). This ends the proof that Γf is an update expansion operator.
2. Let us show that Γb is an expansion operator. A standard induction argument shows that it is enough to establish this fact for γb. So we should prove the equality Acc(Φ, ∆) = Acc(γb(Φ, ∆)).
IF-part: Acc(Φ, ∆) ⊆ Acc(γb(Φ, ∆)). Suppose that I ∈ Acc(Φ, ∆). By definition, γb(Φ, ∆) = (res(Φ, ∆), B(res(Φ, ∆), ∆)). Let us check that γb(Φ, ∆)up = B(res(Φ, ∆), ∆) is accomplished in I.
Let a ∈ B(res(Φ, ∆), ∆)+. If a ∈ D+ then I |= a, since ∆ is accomplished in I. Let a ∉ D+. Then there is a clause r ∈ res(Φ, ∆) such that r = l′ ← ¬a α, ∆ ⊭ l′, and ∆ |= α. By Lemma 1, I ⊭ l′ and I |= α. Since I |= r and I ⊭ l′, we have I ⊭ body(r). Therefore I ⊭ ¬a, i.e. a ∈ I. So B(res(Φ, ∆), ∆)+ ⊆ I.
Let a ∈ B(res(Φ, ∆), ∆)−. If a ∈ D−, then a ∉ I. Let a ∉ D−. Then there is a clause r ∈ res(Φ, ∆) such that r = l′ ← a α, ∆ ⊭ l′, and ∆ |= α. By Lemma 1, I ⊭ l′ and I |= α. Since I |= r and I ⊭ l′, I ⊭ body(r) holds. Hence I ⊭ a, and a ∉ I. So B(res(Φ, ∆), ∆)− ∩ I = ∅ and therefore γb(Φ, ∆)up is accomplished in I. Together with I |= res(Φ, ∆), which was proven above, this implies Acc(Φ, ∆) ⊆ Acc(γb(Φ, ∆)). The proof of the ONLY-IF-part is similar to that for the operator γf.
3.
By a standard induction argument we infer from points 1 and 2 and from Lemma 5 that Γlim is an update expansion operator.
We omit here the quite standard algorithm f_expand for Γf, and we show a sequential linear-time computation for Γb. It keeps track of the set BR of the clauses whose heads contradict the current update, and places into the queue Q those clauses in this set whose bodies consist of a single literal (which therefore also contradicts ∆). In the course of the main loop one clause in Q is invoked, and the literal in its body contradicting ∆ is used to expand the current update (M+, M−) and to reduce the current IC Φ1.
Algorithm b_expand computing the backward update expansion
Input: some compatible update ∆ = (D+, D−) and Φ ∈ IC.
Local variables: BR (clauses whose heads contradict the current update expansion), Φ1 (the resulting IC residue, incrementally computed), M+ (the expanded positive update, incrementally computed), M− (the expanded negative update, incrementally computed), ∆1 (the resulting update expansion, incrementally computed), Q (the queue of clauses in BR with single-literal bodies), r (ranges over the clauses).
Output: (Φ1, ∆1).
Φ1 := Φ; M− := D−; M+ := D+;
Delete from Φ1 all clauses r with head(r) ∈ (M+ ∪ ¬.M−) or with body(r) ∩ (¬.M+ ∪ M−) ≠ ∅;
FOR EACH clause r ∈ Φ1 DO body(r) := body(r) \ (M+ ∪ ¬.M−) END DO
BR := {r ∈ Φ1 | head(r) = l & ∆ ⊭ l};
FOR EACH clause r = (l′ ← l′1) ∈ BR DO
  Q := ENQUEUE(Q, r); BR := BR \ {r}
END DO
% residue and expansion computation loop:
WHILE Q is not empty DO
  r := FRONT(Q); let r = l ← l1; Q := DEQUEUE(Q, r);
  IF l1 ∈ B THEN M− := M− ∪ {l1} ELSE M+ := M+ ∪ {¬.l1} END IF;
  Delete from Φ1 and from BR all clauses with l1 in their bodies;
  Delete from Φ1 all clauses with ¬.l1 in their heads;
  Delete ¬.l1 from the bodies of clauses in Φ1 and BR;
  BR := BR ∪ {r ∈ Φ1 | head(r) = l1};
  FOR EACH clause r = (l′ ← l′1) ∈ BR DO
    Q := ENQUEUE(Q, r); BR := BR \ {r}
  END DO
END DO
∆1 := (M+, M−);
Output (Φ1, ∆1).
Γlim is computed by an evident quadratic-time algorithm whose loop makes successive calls of f_expand and b_expand until stabilization. □
As it turns out, the limit operator Γlim is more powerful than both operators Γf and Γb, and than their compositions.
Theorem 4 For each n ≥ 1 there exist a compatible IC Φ and an update ∆ such that (Γf ◦ Γb)^{n+1}(Φ, ∆) ≠ (Γf ◦ Γb)^n(Φ, ∆).
Proof. Let us consider the following IC Φ consisting of 2(n + 1) clauses:
r1: a1 ← b0    s1: ¬b0 ← a1, ¬b1
...
r_{i+1}: a_{i+1} ← b_i    s_{i+1}: ¬a_i ← a_{i+1}, ¬b_{i+1}
...
r_{n+1}: a_{n+1} ← b_n    s_{n+1}: ¬a_n ← a_{n+1}, ¬b_{n+1}.
Let ∆ = ({b0}, ∅). Then it is easy to check that (Γf ◦ Γb)^i(Φ, ∆) = (Φi, ∆i), where Φi = {r_{i+1}, s_{i+1}, ..., r_{n+1}, s_{n+1}} and ∆i = ({b0, a1, b1, ..., ai, bi}, ∅) for all 1 ≤ i ≤ n + 1. Therefore,
(Γf ◦ Γb)^{n+1}(Φ, ∆) = (∅, ({b0, a1, b1, ..., a_{n+1}, b_{n+1}}, ∅)) ≠ (Γf ◦ Γb)^n(Φ, ∆) = ({r_{n+1}, s_{n+1}}, ({b0, a1, b1, ..., a_n, b_n}, ∅)). □
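The whole expansion pipeline fits in a few dozen lines of Python and can be checked against this chain IC. A self-contained sketch of ours (the "-a" string encoding for ¬a; the fixpoint iterations follow the definitions of γf, γb and Γlim = (Γf ◦ Γb)^ω), run here with n = 1:

```python
def entails(d, lit):
    a = lit.lstrip("-")
    return a in (d[1] if lit.startswith("-") else d[0])

def contradicts(d, lit):
    a = lit.lstrip("-")
    return a in (d[0] if lit.startswith("-") else d[1])

def res(phi, d):
    # residue: drop clauses with entailed heads or contradicted bodies,
    # erase entailed literals from the remaining bodies
    return [(h, [l for l in b if not entails(d, l)])
            for h, b in phi
            if not entails(d, h) and not any(contradicts(d, l) for l in b)]

def forward(phi, d):
    dp, dm = set(d[0]), set(d[1])
    for h, b in phi:
        if all(entails(d, l) for l in b):
            (dm if h.startswith("-") else dp).add(h.lstrip("-"))
    return (dp, dm)

def backward(phi, d):
    dp, dm = set(d[0]), set(d[1])
    for h, b in phi:
        if contradicts(d, h):
            for lit in b:
                if all(entails(d, m) for m in b if m is not lit):
                    (dp if lit.startswith("-") else dm).add(lit.lstrip("-"))
    return (dp, dm)

def expand(step, phi, d):
    # iterate gamma = (res, step after res) to its finite fixpoint (Lemma 5)
    while True:
        phi1 = res(phi, d)
        d1 = step(phi1, d)
        if (phi1, d1) == (phi, d):
            return phi, d
        phi, d = phi1, d1

def gamma_lim(phi, d):
    # alternate the forward and backward expansions until stabilization
    while True:
        phi1, d1 = expand(backward, *expand(forward, phi, d))
        if (phi1, d1) == (phi, d):
            return phi, d
        phi, d = phi1, d1

# Theorem 4's chain IC with n = 1
CHAIN = [("a1", ["b0"]), ("-b0", ["a1", "-b1"]),
         ("a2", ["b1"]), ("-a1", ["a2", "-b2"])]
PHI_OUT, D_OUT = gamma_lim(CHAIN, ({"b0"}, set()))
```

With n = 1 the combined fixpoint needs two Γf ◦ Γb rounds and ends with an empty residue and the full expanded update, as the theorem predicts.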
5 Speeding-up the Directed Search
The simplest way to speed up the standard algorithm of directed search is to reduce its search space by using the IC and the update optimized by Γlim in place of the initial IC and update. The resulting operator Ψ(I) = D_search(I, Γlim(Φ, ∆)) is a conservative update operator, and it is more efficient. Meanwhile, we can also optimize D_search itself by applying the extension operators inside its loops. The idea is to optimize the inner WHILE-loop of this algorithm. We apply Γlim twice in this loop: the first time after the new choice of Hdel, and the second time after the new choice of Hadd. The first application allows us to precompute in advance the consequences of the choice of Hdel. If the choice is correct, this simplifies Φ and narrows the search space H+ for all Hadd in the loop. If the choice is not correct, we cut the inner WHILE-loop at this early phase. The second application can narrow the search space H+ for Hadd or give the result immediately.
Algorithm CU computing a conservative update operator
Input: a DB state I, an update ∆ = (D+, D−) compatible with an IC Φ.
Output: DB state I1.
% Initial update expansion and IC simplification:
(Φω, (M+, M−)) := Γlim(Φ, ∆);
Ĩ := (I ∪ M+) \ M−;
% New search space for Hdel:
H− := Ĩ \ M+;
% The outer loop on Hdel in the narrowed search space H−:
Hdel := ∅; b := false;
WHILE ¬b AND Hdel ≠ ⊥ DO
  Hadd := ∅; c := false;
  % Computation of the initial state update determined by the choice of Hdel;
  % when correct, it is good for all choices of Hadd in the inner loop:
  D+del := Ĩ \ Hdel; D−del := M− ∪ Hdel;
  % Precomputation of D_search(I, Γlim(Φω, (D+del, D−del))):
  (Φdel, (M+del, M−del)) := Γlim(Φω, (D+del, D−del));
  IF (M+del ∩ M−del = ∅) % checking non-contradictory choice of Hdel
  THEN % non-contradictory choice
    % search space definition for the current state updates Hadd:
    H+ := B \ (M+del ∪ M−del);
    % the inner loop on Hadd in the choice space reduced by the precomputation
    % of the residue Φdel and the expansion (M+del, M−del):
    WHILE ¬c AND Hadd ≠ ⊥ DO
      % The new update (D+add, D−add) determined by the choice of Hadd:
      D+add := M+del ∪ Hadd; D−add := M−del;
      % An attempt to compute a new DB state through Γlim:
      (Φadd, (M+add, M−add)) := Γlim(Φdel, (D+add, D−add));
      % contradiction check:
      IF (M+add ∩ M−add ≠ ∅ OR M+add ⊭ Φadd)
      THEN % contradictory choice of Hadd
        Hadd := nextH+(Hadd) % Hadd abandoned
      ELSE IF M+add \ D+add ≠ ∅
      THEN % Decreasing search space for Hadd:
        H+ := M+add \ M+del; Hadd := nextH+(Hadd)
      ELSE % minimal correct state found
        I1 := M+add; c := true
      END IF END IF
    END DO
  END IF
  IF c THEN b := c
  ELSE % Hdel abandoned and the inner loop cut
    Hdel := nextH−(Hdel)
  END IF
END DO
Output I1.
Theorem 5 The algorithm CU implements a conservative update operator on compatible ICs and updates.
The following artificial example illustrates this algorithm.
Example 3 Let Φ have the clauses r1: b ← a, e, f; r2: d ← ¬e, ¬g, ¬i; r3: ¬a ← ¬e, ¬h, i; r4: g ← ¬d, h; r5: c ← a, ¬b; r6: b ← c, d, and let the update ∆ = ({a}, {b}) be applied to the DB state I = {b, d, e, f}. Then CU runs as follows. Before the main loop Γf removes the rule r5, and Γb removes r6. The result is: M+ = {a, c}, M− = {b, d}, and Φω = {r′1: b ← e, f; r2: d ← ¬e, ¬g, ¬i; r3: ¬a ← ¬e, ¬h, i; r′4: g ← h}. Then Ĩ = {a, c, e, f} and H− = {e, f}. The first WHILE loop is started with Hdel = ∅, D+del = Ĩ, D−del = {b, d}. Then, due to r′1, Γf adds b to M+del. Hence the condition of the first IF is false, so the new Hdel is nextH−(∅) = {e}. Now in the main loop we get M+del = {a, c, f}, M−del = {b, d, e}, and Φdel = {r′2: d ← ¬g, ¬i; r′3: ¬a ← ¬h, i; r′4: g ← h}. Therefore H+ = {g, h, i}, and for Hadd = ∅ the first execution of the inner WHILE loop discovers the contradiction: M+add = M+del ⊭ Φadd. Now Hadd = nextH+(∅) = {g} is considered, and in the course of the second execution of the inner loop we obtain D+add = {a, c, f, g} = M+add and Φadd = {r′3, r′4}. So M+add |= Φadd. Here CU finishes with the result I1 = {a, c, f, g}.
6 Conclusion
Even though we have proven that conservative updates are intractable in general, the method we propose extracts a reasonably large part of them which is incrementally computed in quadratic time. It provides both static and dynamic optimization of the complete search. The next step would be to find more powerful, yet still tractable expansion operators. E.g., we could use a more powerful backward operator if we abandoned the requirement that a single literal in the body be refuted. However, the resulting operator would be nondeterministic. It is an interesting theoretical problem to implement efficiently some sort of such disjunctive backward operators.
References
1. Abiteboul, S.: Updates, a New Frontier. In: Proc. of the Second International Conference on the Theory of Databases, ICDT'88. LNCS 326 (1988) 1-18.
2. Alferes, J.J., Pereira, L.M.: Update-Programs Can Update Programs. In: J. Dix, L.M. Pereira, T.C. Przymusinski, editors: Second International Workshop, NMELP'96. Selected Papers. LNCS 1216 (1997) 110-131.
3. Bonner, A.J., Kifer, M.: An Overview of Transaction Logic. Theoretical Computer Science 133(2) (1994) 205-265.
4. Decker, H.: An extension of SLD by abduction and integrity maintenance for view updating in deductive databases. In: Proc. of the 1996 International Conference on Logic Programming. MIT Press (1996) 157-169.
5. Dekhtyar, M., Dikovsky, A., Spyratos, N.: On Conservative Enforced Updates. In: Dix, J., Furbach, U., Nerode, A., editors: Proceedings of the 4th International Conference, LPNMR'97, Dagstuhl Castle, Germany. LNCS 1265 (1997) 244-257.
6. Dekhtyar, M., Dikovsky, A., Spyratos, N.: On Logically Justified Updates. In: J. Jaffar, editor: Proc. of the 1998 Joint International Conference and Symposium on Logic Programming. MIT Press (1998) 250-264.
7. Eiter, T., Gottlob, G.: On the complexity of propositional knowledge base revision, updates, and counterfactuals. Artificial Intelligence 57 (1992) 227-270.
8. Eshghi, K., Kowalski, R.A.: Abduction Compared with Negation by Failure. In: Proc. of the 1989 International Conference on Logic Programming. (1989)
9. Guessoum, A., Lloyd, J.W.: Updating knowledge bases. New Generation Computing 8 (1990) 71-89.
10. Halfeld Ferrari Alves, M., Laurent, D., Spyratos, N., Stamate, D.: Update rules and revision programs. Rapport de Recherche, Université de Paris-Sud, Centre d'Orsay, LRI 1010 (12/1995).
11. Kakas, A.C., Mancarella, P.: Database updates through abduction. In: Proc. 16th VLDB Conference. (1990) 650-661.
12. Lloyd, J.W.: Foundations of Logic Programming. Second, Extended Edition. Springer-Verlag (1993)
13. Marek, V.W., Truszczyński, M.: Revision programming, database updates and integrity constraints. In: International Conference on Database Theory, ICDT. LNCS 893 (1995) 368-382.
14. Przymusinski, T.C., Turner, H.: Update by Means of Inference Rules. In: V.W. Marek, A. Nerode, M. Truszczyński, editors: Logic Programming and Nonmonotonic Reasoning. Proc. of the Third Int. Conf. LPNMR'95, Lexington, KY, USA (1995) 166-174.
Updating Extended Logic Programs through Abduction Chiaki Sakama1 and Katsumi Inoue2 1
Department of Computer and Communication Sciences Wakayama University Sakaedani, Wakayama 640 8510, Japan
[email protected] 2 Department of Electrical and Electronics Engineering Kobe University Rokkodai, Nada, Kobe 657 8501, Japan
[email protected]
Abstract. This paper introduces techniques for updating knowledge bases represented as extended logic programs. Three different types of updates, view updates, theory updates, and inconsistency removal, are considered. We formulate these updates through abduction and provide methods for computing them with update programs. An update program is an extended logic program which specifies changes on abductive hypotheses; updates are then computed from the U-minimal answer sets of the update program. The proposed technique provides a uniform framework for these different types of updates, and each update is computed using existing procedures of logic programming.
1 Introduction
A knowledge base must be updated when new information arrives. There are three cases in updating a knowledge base. The first is that a knowledge base contains two different kinds of knowledge: variable knowledge and invariable knowledge. In this case, updates are permitted only on the variable knowledge. Updates on the invariable part are translated into updates on the variable part. An example of this type of update is a view update in deductive databases. The second is that there is no such distinction and the whole knowledge base is subject to change. In this case, an update is done by directly introducing new information to the knowledge base. When there are conflicts between the current knowledge and the new knowledge, a higher priority is put on the new one to produce a consistent theory as a whole. We call this type of update a theory update. Third, suppose a knowledge base contains inconsistent information. In this situation, the knowledge base must be updated to restore consistency by removing the sources of inconsistency. This type of update is called inconsistency removal. There are many studies which consider the update issue in logic programming and deductive databases, but they are mostly individual techniques, each realizing only one of these types of updates.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 147–161, 1999. © Springer-Verlag Berlin Heidelberg 1999
In this paper we propose a unified framework for realizing view updates, theory updates, and inconsistency removal in extended logic programs. We first formulate these different types of updates through abduction, then introduce an update program, which is an extended logic program specifying changes on abductive hypotheses. We show that each update is realized by computing the U-minimal answer sets of an update program. This paper is organized as follows. Section 2 introduces the theoretical framework used in this paper. Section 3 presents a technique for computing view updates using update programs. Section 4 provides methods for computing theory updates and inconsistency removal. Section 5 analyzes computational complexity. Section 6 presents related work, and Section 7 concludes the paper.
2 Preliminaries
A knowledge base is represented as an extended logic program (ELP) [9], which is a set of rules of the form: L0 ← L1, . . . , Lm, not Lm+1, . . . , not Ln where the Li are literals and not is negation as failure. The literal L0 is the head and the conjunction L1, . . . , Lm, not Lm+1, . . . , not Ln is the body. The head is possibly empty. A rule is called a fact if the body is empty. An ELP is a normal logic program (NLP) if every Li is an atom. Throughout the paper, a program means an ELP unless stated otherwise. The semantics of an ELP is defined by the answer set semantics [9]. Let L_P be the set of all ground literals in the language of a program P. The answer sets of an ELP are defined in two steps. First, let P be a not-free ELP (i.e., m = n for each rule) and S ⊆ L_P. Then, S is an answer set of P if S is a minimal set satisfying the conditions: 1. For each ground rule L0 ← L1, . . . , Lm from P, {L1, . . . , Lm} ⊆ S implies L0 ∈ S. In particular, {L1, . . . , Lm} ⊈ S if L0 is empty. 2. If S contains a pair of complementary literals L and ¬L, then S = L_P. Second, let P be any ELP and S ⊆ L_P. Then, define a not-free ELP P^S as follows: a rule L0 ← L1, . . . , Lm is in P^S iff there is a ground rule L0 ← L1, . . . , Lm, not Lm+1, . . . , not Ln from P such that {Lm+1, . . . , Ln} ∩ S = ∅. For programs of the form P^S, their answer sets have already been defined. Then, S is an answer set of P if S is an answer set of P^S. An answer set is consistent if it is not L_P. A program P is consistent if it has a consistent answer set; otherwise P is inconsistent. If a literal L is included in every answer set of P, we write P |= L; otherwise P ⊭ L. When P is inconsistent, we define P |= false. The abductive framework used in this paper is the extended abduction introduced by Inoue and Sakama [11]. An abductive program is a pair ⟨P, A⟩ where P is an ELP and A ⊆ L_P is a set of literals from the language of P called
abducibles.1 Any instance A of an element from A is also called an abducible and is written as A ∈ A. Without loss of generality, we assume that any rule from P having an abducible in its head is always a fact.2 Let ⟨P, A⟩ be an abductive program and G a ground literal. A pair (E, F) is an explanation (resp. anti-explanation) of an observation G wrt ⟨P, A⟩ if 1. (P ∪ E) \ F |= G (resp. (P ∪ E) \ F ⊭ G), 2. (P ∪ E) \ F is consistent, 3. E ⊆ A \ P and F ⊆ A ∩ P. Thus, to explain observations, extended abduction can not only introduce hypotheses to a program but also remove them from it. On the other hand, when a given fact is not observed anymore, anti-explanations are used to unexplain the observation. Note that traditional abduction considers only introducing hypotheses to explain an observation, i.e., P ∪ E |= G where P ∪ E is consistent. We call such traditional abduction normal abduction to distinguish it from the extended one. An (anti-)explanation (E, F) of an observation G is called minimal if for any (anti-)explanation (E′, F′) of G, E′ ⊆ E and F′ ⊆ F imply E′ = E and F′ = F. Example 2.1. Let ⟨P, A⟩ be an abductive program such that P : flies(x) ← bird(x), not ab(x), ab(x) ← broken-wing(x), bird(tweety) ←, bird(opus) ←, broken-wing(tweety) ←. A : broken-wing(x). Then, the observation G = flies(tweety) has the minimal explanation (E, F) = (∅, {broken-wing(tweety)}), while the observation G = flies(opus) has the minimal anti-explanation ({broken-wing(opus)}, ∅).
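As a concrete illustration of these definitions, the following Python sketch computes the answer sets of a small ground program via the two-step reduct construction above, and searches for minimal explanations and anti-explanations of extended abduction by brute force. The rule encoding, function names, and the grounding of Example 2.1 over {tweety, opus} are our own illustrative choices, not part of the paper; the enumeration is exponential and only meant for tiny propositional programs.

```python
from itertools import combinations

# A ground rule is (head, pos_body, naf_body); head None encodes an empty head
# (an integrity constraint). Classical negation "¬L" is written with a "-" prefix.

def compl(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def least_set(not_free_rules, universe):
    """Answer set of a not-free program: the least closed set, the whole set of
    literals if it contains a complementary pair, or None if a constraint fires."""
    s, changed = set(), True
    while changed:
        changed = False
        for head, pos in not_free_rules:
            if set(pos) <= s:
                if head is None:
                    return None
                if head not in s:
                    s.add(head)
                    changed = True
    return set(universe) if any(compl(l) in s for l in s) else s

def answer_sets(rules, universe):
    """Brute force: S is an answer set iff S equals the answer set of P^S."""
    found = []
    for r in range(len(universe) + 1):
        for cand in combinations(sorted(universe), r):
            s = set(cand)
            reduct = [(h, pos) for h, pos, naf in rules if not set(naf) & s]
            if least_set(reduct, universe) == s:
                found.append(s)
    return found

def entails(rules, universe, g):
    ans = answer_sets(rules, universe)
    return bool(ans) and all(g in s for s in ans)

def is_consistent(rules, universe):
    return any(s != set(universe) for s in answer_sets(rules, universe))

def explanations(p_rules, abducibles, universe, g, anti=False):
    """All (E, F) with E ⊆ A\\P, F ⊆ A∩P such that (P ∪ E) \\ F is consistent
    and entails G (does not entail G when anti=True)."""
    facts = {h for h, pos, naf in p_rules if not pos and not naf}
    addable = sorted(set(abducibles) - facts)
    removable = sorted(set(abducibles) & facts)
    found = []
    for i in range(len(addable) + 1):
        for e in combinations(addable, i):
            for j in range(len(removable) + 1):
                for f in combinations(removable, j):
                    q = [r for r in p_rules if r[0] not in f or r[1] or r[2]]
                    q += [(a, (), ()) for a in e]
                    if is_consistent(q, universe) and entails(q, universe, g) != anti:
                        found.append((set(e), set(f)))
    return found

def minimal_pairs(pairs):
    return [(e, f) for e, f in pairs
            if not any(e2 <= e and f2 <= f and (e2, f2) != (e, f) for e2, f2 in pairs)]

# Example 2.1, grounded over the constants tweety and opus:
P = [("bird(tweety)", (), ()), ("bird(opus)", (), ()),
     ("broken-wing(tweety)", (), ())]
for x in ("tweety", "opus"):
    P.append((f"flies({x})", (f"bird({x})",), (f"ab({x})",)))
    P.append((f"ab({x})", (f"broken-wing({x})",), ()))
A = ["broken-wing(tweety)", "broken-wing(opus)"]
universe = sorted({h for h, _, _ in P} | set(A))

exp_min = minimal_pairs(explanations(P, A, universe, "flies(tweety)"))
anti_min = minimal_pairs(explanations(P, A, universe, "flies(opus)", anti=True))
```

Under this encoding the sketch reproduces Example 2.1: the only minimal explanation of flies(tweety) removes the fact broken-wing(tweety), and the only minimal anti-explanation of flies(opus) introduces broken-wing(opus).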
3 View Updates
3.1 Update Programs
Suppose a knowledge base which contains variable knowledge and invariable knowledge. When there is a request for inserting/deleting knowledge to/from the knowledge base, an update request on invariable knowledge is translated into updates on variable knowledge. This type of update is called a view update.
1 In [11], P and A are given as autoepistemic theories. Here we consider a formulation in logic programming.
2 If there is a rule A ← Γ with an abducible A and a non-empty body Γ, then A is made a non-abducible by introducing a rule A ← A′ with a new abducible A′.
Definition 3.1. Let P be a program, V the set of variable rules in the language of P, and G a literal. Then, a program P′ accomplishes a view update for the insertion (resp. deletion) of G to/from P if
1. P′ is consistent,
2. P′ |= G (resp. P′ ⊭ G),
3. P′ \ V = P \ V,
4. there is no consistent program P′′ s.t. P′′ |= G (resp. P′′ ⊭ G), P′′ \ V = P \ V, and [(P ∩ V) ∼ (P′′ ∩ V)] ⊂ [(P ∩ V) ∼ (P′ ∩ V)], where P ∼ Q = (P \ Q) ∪ (Q \ P).
In the above, the third condition states that the invariable part of P is unchanged, while the fourth condition states that P′ minimally changes the variable part. In particular, when G ∈ V \ P (resp. G ∈ V ∩ P), the insertion (resp. deletion) is done by directly introducing (resp. deleting) G to/from P. In this section, we consider a knowledge base in which facts are variable and other rules are invariable. Also, an update request is either an insertion or deletion of a ground literal to/from a knowledge base.3 With this setting, the view update problem is naturally expressed by an abductive program ⟨P, A⟩, where the program P represents a knowledge base and the abducibles A represent updatable literals.
Proposition 3.1 [11] Let ⟨P, A⟩ be an abductive program and G a ground literal. Then, (P ∪ E) \ F accomplishes a view update for inserting (resp. deleting) G iff (E, F) is a minimal explanation (resp. minimal anti-explanation) of the observation G wrt ⟨P, A⟩.
The goal of this section is to provide a computational method for the view updates characterized above. To this end, we introduce the notion of update programs.
Definition 3.2. Given an abductive program ⟨P, A⟩, the set UR of update rules is defined as follows.
1. For any literal a ∈ A, the following rules are in UR: a ← not ā, ā ← not a, where ā is a newly introduced atom uniquely associated with a. For notational convenience, this pair of rules is written abd(a) hereafter.
2. For any literal a ∈ A \ P, the following rule is in UR:4 +a ← a.
3 When G = (H ← B), inserting (resp. deleting) G to/from P is rephrased as inserting (resp. deleting) a literal G′ to/from P ∪ { H ← B, G′ } (resp. (P \ G) ∪ { H ← B, G′ } ∪ { G′ ← }).
4 When p(x) ∈ A, p(a) ∈ P and p(t) ∉ P for t ≠ a, the rule precisely becomes +p(t) ← p(t) for any t ≠ a. In such a case, the rule is written shortly as +p(x) ← p(x), x ≠ a. Generally, the rule becomes +p(x) ← p(x), x ≠ t1, . . . , x ≠ tn for n such instances.
3. For any literal a ∈ A ∩ P, the following rule is in UR: −a ← not a.
Here, +a and −a are uniquely associated with any a ∈ A and are called update atoms. The atom ā becomes true iff a is not true. Then, the pair of rules in abd(a) specifies the situation that an abducible a is true or not [15,10]. The rule +a ← a derives the atom +a if an abducible a which is not in P is to be true. In contrast, the rule −a ← not a derives the atom −a if an abducible a which is in P is not to be true. The set of all update atoms associated with A is denoted by UA. We define UA = UA+ ∪ UA−, where UA+ (resp. UA−) is the set of update atoms of the form +a (resp. −a).
Definition 3.3. Given an abductive program ⟨P, A⟩, its update program UP is defined as an ELP such that UP = (P \ A) ∪ UR.
Definition 3.4. An answer set S of UP is called U-minimal if there is no answer set T of UP such that T ∩ UA ⊂ S ∩ UA.
A U-minimal answer set represents a minimal change in P. When there is no update request, the following property holds.
Proposition 3.2 Let ⟨P, A⟩ be an abductive program in which P is consistent, and UP its update program. If S is a U-minimal answer set of UP, it holds that S ∩ UA = ∅ and there is an answer set T of P s.t. T = S ∩ L_P. Conversely, if T is an answer set of P, there is a U-minimal answer set S of UP s.t. S ∩ UA = ∅ and S ∩ L_P = T.
Proof. When S is a U-minimal answer set of UP, a ∈ S ∩ A iff a ∈ A ∩ P, and a ∈ A \ S iff a ∈ A \ P. Hence, S ∩ UA = ∅ holds. Put T = S ∩ L_P. Then, P^T = { H ← B | (H ← B) ∈ UP^S and H ∈ L_P }. As S is an answer set of UP^S, T becomes an answer set of P^T. Hence, T is an answer set of P. Conversely, let T be an answer set of P. Put S = T ∪ { ā | a ∈ A \ P }. Then, UP^S = P^T ∪ { ā ← | ā ∈ S }. As T is an answer set of P^T, S becomes an answer set of UP. Here S ∩ UA = ∅ holds, hence S is also U-minimal. □
Example 3.1. Let ⟨P, A⟩ be an abductive program such that P : p ← b, q ← a, not b, a ←.
A : a, b .
Then, UP becomes UP : p ← b, q ← a, not b, abd(a), abd(b), −a ← not a, +b ← b. Here, UP has four answer sets: S1 = { a, b, +b, p }, S2 = { ā, b, −a, +b, p }, S3 = { a, b̄, q }, and S4 = { ā, b̄, −a }. Of these, S3 is the U-minimal answer set and S3 ∩ L_P coincides with the answer set of P. Next we consider an update request for inserting/deleting ground literals. An insertion of a literal G is represented as the rule ← not G, which represents the constraint that "G should be true". On the other hand, a deletion of a literal G is represented as the rule ← G, which represents the constraint that "G must not be true". To perform the insertion of p in the program of Example 3.1, consider the program UP ∪ { ← not p }. It has two answer sets, S1 and S2, of which S1 is the U-minimal answer set. Observe that the observation p has the unique minimal explanation ({b}, ∅) wrt ⟨P, A⟩. This situation is expressed by the update atom +b in S1. On the other hand, to perform the deletion of q, consider the program UP ∪ { ← q }. It has three answer sets, S1, S2, and S4, of which S1 and S4 are the U-minimal answer sets. Here, the observation q has two minimal anti-explanations ({b}, ∅) and (∅, {a}) wrt ⟨P, A⟩. These situations are respectively expressed by the update atoms +b in S1 and −a in S4. Note that when the insertion of p and the deletion of q are requested at the same time, S1 becomes the unique U-minimal answer set of UP ∪ { ← not p } ∪ { ← q }.5 Thus, the U-minimal answer sets are used to compute minimal explanations which realize view updates in a knowledge base. Note that the constraint ← not G extracts answer sets in which G is true, but this does not imply that G is true in every answer set of (P ∪ E) \ F in general. To know that (E, F) is an explanation of G, we need an additional test for checking the entailment of G from (P ∪ E) \ F.
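Definitions 3.2–3.4 and Example 3.1 can be checked mechanically. The following Python sketch is our own encoding, not from the paper: the fresh atom ā is written bar_a, the update atoms +a/−a are written plus_a/minus_a, and a brute-force answer set enumerator (usable only for tiny ground programs) builds the update program of Example 3.1 and selects its U-minimal answer sets.

```python
from itertools import combinations

# Ground rules are (head, pos_body, naf_body) triples; head None is a constraint.
# Naming convention (ours): "bar_a" is the fresh atom ā of abd(a),
# "plus_a"/"minus_a" are the update atoms +a/−a.

def least_set(not_free_rules):
    s, changed = set(), True
    while changed:
        changed = False
        for head, pos in not_free_rules:
            if set(pos) <= s:
                if head is None:
                    return None          # a constraint fires: no answer set
                if head not in s:
                    s.add(head)
                    changed = True
    return s

def answer_sets(rules, universe):
    found = []
    for r in range(len(universe) + 1):
        for cand in combinations(sorted(universe), r):
            s = set(cand)
            reduct = [(h, pos) for h, pos, naf in rules if not set(naf) & s]
            if least_set(reduct) == s:
                found.append(s)
    return found

def update_rules(abducibles, p_facts):
    """UR of Definition 3.2 for abducibles A, given the abducible facts in P."""
    ur = []
    for a in abducibles:
        ur += [(a, (), ("bar_" + a,)), ("bar_" + a, (), (a,))]   # abd(a)
        if a in p_facts:
            ur.append(("minus_" + a, (), (a,)))                  # -a <- not a
        else:
            ur.append(("plus_" + a, (a,), ()))                   # +a <- a
    return ur

def u_minimal(sets_found, ua):
    keyed = [(s, s & ua) for s in sets_found]
    return [s for s, k in keyed if not any(k2 < k for _, k2 in keyed)]

# Example 3.1:  P = { p <- b,  q <- a, not b,  a <- },  A = {a, b}.
A = ["a", "b"]
UP = [("p", ("b",), ()), ("q", ("a",), ("b",))] + update_rules(A, {"a"})
UA = {"minus_a", "plus_b"}
universe = ["p", "q", "a", "b", "bar_a", "bar_b", "plus_b", "minus_a"]

ans = answer_sets(UP, universe)     # the four answer sets S1..S4
um = u_minimal(ans, UA)             # S3 = {a, bar_b, q}: no update atoms needed
# Insertion of p: add the constraint "<- not p" and take U-minimal answer sets.
ins_p = u_minimal(answer_sets(UP + [(None, (), ("p",))], universe), UA)
```

The U-minimal answer set of the insertion request contains the update atom plus_b, matching the unique minimal explanation ({b}, ∅) in the text.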
A pair of abducibles (E, F) is called a pre-explanation of an observation G if (P ∪ E) \ F has a consistent answer set in which G is true. A pre-explanation of G is minimal if for any pre-explanation (E′, F′) of G, E′ ⊆ E and F′ ⊆ F imply E′ = E and F′ = F.6
5 Generally, when there are insertion requests of p1, . . . , pm and deletion requests of q1, . . . , qn, instead of considering the (m + n) goals ← not pi and ← qj, the same effect is achieved by introducing the rule g ← p1, . . . , pm, not q1, . . . , not qn to UP and considering the single goal ← not g.
6 The term pre-explanation means that it does not explain G skeptically (as in the definition of explanations) but explains G credulously. Note that every skeptical explanation is a pre-explanation (credulous explanation) (cf. Proposition 3.3).
Proposition 3.3 Let ⟨P, A⟩ be an abductive program and G an observation.
1. Any explanation (E, F) of G is a pre-explanation of G.
2. Any pre-explanation (E, F) of G is an explanation of G if ((P ∪ E) \ F) ∪ { ← G } is inconsistent.
Proof. 1. When (E, F) is an explanation of G in P, (P ∪ E) \ F |= G and (P ∪ E) \ F is consistent. Then, (P ∪ E) \ F has a consistent answer set in which G is true. 2. When (E, F) is a pre-explanation of G in P, (P ∪ E) \ F has a consistent answer set in which G is true. In addition, if ((P ∪ E) \ F) ∪ { ← G } is inconsistent, every answer set of (P ∪ E) \ F contains G. Hence, (P ∪ E) \ F |= G and (E, F) is an explanation of G. □
In what follows, given sets E ⊆ A and F ⊆ A, we define E+ = { +a | a ∈ E } and F− = { −a | a ∈ F }. Conversely, given sets E+ ⊆ UA+ and F− ⊆ UA−, define E = { a | +a ∈ E+ } and F = { a | −a ∈ F− }.
Lemma 3.4 Let ⟨P, A⟩ be an abductive program, UP its update program, and G a ground literal. Then, there is a (minimal) pre-explanation (E, F) of G iff UP ∪ { ← not G } has a consistent (U-minimal) answer set S s.t. E+ = S ∩ UA+ and F− = S ∩ UA−.
Proof. Let S be a consistent answer set of UP ∪ { ← not G } s.t. E+ = S ∩ UA+ and F− = S ∩ UA−. Then, for each +a ∈ E+ and −b ∈ F−, a ← and b̄ ← are produced by abd(a) and abd(b) in UP^S. This means that to make G true in S, a ∈ A \ P is introduced to P and b ∈ A ∩ P is deleted from P. In this case, UP^S contains a rule H ← B with H ∈ L_P iff ((P ∪ E) \ F)^S has the same rule. Put T = S ∩ L_P. Then, T is a consistent answer set of (P ∪ E) \ F in which G is true. When S is U-minimal, suppose that the pair (E, F) is not minimal. Then there is a pair (E′, F′) s.t. E′ ⊆ E, F′ ⊆ F and (E′ ≠ E or F′ ≠ F), and (P ∪ E′) \ F′ has an answer set T′ in which G is true. In this case, there is an answer set S′ of UP ∪ { ← not G } s.t. T′ = S′ ∩ L_P and S′ ∩ UA ⊂ S ∩ UA. This contradicts the assumption that S is U-minimal.
Conversely, for a pair (E, F) of abducibles, let T be a consistent answer set of (P ∪ E) \ F in which G is true. Then, there is a program UP^T in which a ← and b̄ ← are produced by abd(a) and abd(b) for each a ∈ E and b ∈ F. In this case, UP^T contains a rule H ← B with H ∈ L_P iff ((P ∪ E) \ F)^T has the same rule. Put S = T ∪ { +a | a ∈ E } ∪ { −b, b̄ | b ∈ F }. Then, S is a consistent answer set of UP in which G is true. When the pair (E, F) is minimal, it is shown in a similar way as above that S is also U-minimal. □
Theorem 3.5. Let ⟨P, A⟩ be an abductive program, UP its update program, and G a ground literal. Then, (P ∪ E) \ F accomplishes an insertion of G iff
1. S is a consistent answer set of UP ∪ { ← not G } s.t. E+ = S ∩ UA+, F− = S ∩ UA−, and ((P ∪ E) \ F) ∪ { ← G } is inconsistent; and
2. S is U-minimal among those satisfying condition 1.
Proof. Suppose that S satisfies the conditions. By the first condition, (E, F) is a pre-explanation of G (Lemma 3.4), and also an explanation of G (Proposition 3.3). By the second condition, (E, F) is a minimal explanation of G. Then, (P ∪ E) \ F accomplishes the insertion of G. Conversely, suppose that (P ∪ E) \ F accomplishes the insertion of G. By Proposition 3.3 and Lemma 3.4, there is a consistent answer set S of UP ∪ { ← not G } s.t. E+ = S ∩ UA+, F− = S ∩ UA−, and ((P ∪ E) \ F) ∪ { ← G } is inconsistent. Thus, the first condition holds. To see that S is U-minimal among those satisfying the first condition, suppose that there is an answer set S′ satisfying the first condition s.t. S′ ∩ UA ⊂ S ∩ UA. Put J+ = S′ ∩ UA+ and K− = S′ ∩ UA−. Then, J+ ⊂ E+ or K− ⊂ F−, and (P ∪ J) \ K |= G holds by Proposition 3.3 and Lemma 3.4. This contradicts the assumption that (E, F) is minimal. □
In the above theorem, the first condition selects (possibly non-minimal) explanations from the pre-explanations of G, which are computed by the consistent answer sets of UP ∪ { ← not G } (Lemma 3.4). Then, the second condition selects minimal ones from those explanations. When a program is a locally stratified NLP [14], it has at most one answer set (called a perfect model). Then, Theorem 3.5 is simplified as follows.
Corollary 3.6 Let ⟨P, A⟩ be an abductive program in which P is a locally stratified NLP, and UP its update program. Given a ground atom G, (P ∪ E) \ F accomplishes the insertion of G iff the program UP ∪ { ← not G } has a consistent U-minimal answer set S s.t. E+ = S ∩ UA+ and F− = S ∩ UA−.
Proof. When P is a locally stratified NLP, so is (P ∪ E) \ F. Then (P ∪ E) \ F has at most one answer set. Hence, the result holds by Lemma 3.4. □
For deletion, the next result holds for any abductive program.
Theorem 3.7. Let ⟨P, A⟩ be an abductive program, UP its update program, and G a ground literal. Then, (P ∪ E) \ F accomplishes the deletion of G iff UP ∪ { ← G } has a consistent U-minimal answer set S s.t. E+ = S ∩ UA+ and F− = S ∩ UA−.
Proof. Deletion is done as follows: (P ∪ E) \ F ⊭ G iff (E, F) is a minimal anti-explanation of G wrt ⟨P, A⟩ iff (E, F) is a minimal pre-explanation of G′ wrt ⟨P ∪ { G′ ← not G }, A⟩, where G′ is an atom appearing nowhere in P, iff UP ∪ { G′ ← not G } ∪ { ← not G′ } has a consistent U-minimal answer set S ∪ { G′ } s.t. E+ = S ∩ UA+ and F− = S ∩ UA− (by Lemma 3.4) iff UP ∪ { ← G } has a consistent U-minimal answer set S s.t. E+ = S ∩ UA+ and F− = S ∩ UA−. □
3.2 Updates with Rules
In view updates, if one wants to insert/delete not only facts but also rules, it is done in the following manner.
Let ⟨P, A⟩ be an abductive program in which both P and A are ELPs. The rules in A, called abducible rules, are hypothetical rules that are used for abducing an observation together with the background knowledge from P [10]. The abductive programs introduced in Section 2 are then a special case in which each rule in A is a fact. In this extended framework, an (anti-)explanation for an observation is defined as in Section 2, with the only difference that it is given as a pair (E, F) where E and F are sets of abducible rules.
Example 3.2. Let ⟨P, A⟩ be an abductive program such that P : flies(x) ← bird(x), bird(x) ← penguin(x), penguin(tweety) ←. A : flies(x) ← bird(x), ¬flies(x) ← penguin(x). Then, the observation G = ¬flies(tweety) has the explanation (E, F) = ({ ¬flies(x) ← penguin(x) }, { flies(x) ← bird(x) }).
An abductive program with abducible rules can be transformed into an abductive program with abducible facts [10]. Given an abductive program with abducible rules ⟨P, A⟩, let R = { H ← B | (H ← B) ∈ A and B ≠ ∅ }. Then, define
P′ = (P \ R) ∪ { H ← B, γR ; γR ← | R = (H ← B) ∈ R ∩ P } ∪ { H ← B, γR | R = (H ← B) ∈ R \ P },
A′ = (A \ R) ∪ { γR | R ∈ R },
where γR is a newly introduced atom uniquely associated with each abducible rule R in R. The resulting abductive program ⟨P′, A′⟩ is semantically equivalent to the original abductive program ⟨P, A⟩.
Example 3.3. The abductive program of Example 3.2 is transformed into an abductive program ⟨P′, A′⟩ where P′ : flies(x) ← bird(x), γ1(x), bird(x) ← penguin(x), ¬flies(x) ← penguin(x), γ2(x), penguin(tweety) ←, γ1(x) ←. A′ : γ1(x), γ2(x). Here, γ1(x) and γ2(x) are newly introduced abducibles associated with the rules flies(x) ← bird(x) and ¬flies(x) ← penguin(x), respectively. In this program, G = ¬flies(tweety) has the explanation ({ γ2(x) }, { γ1(x) }), which corresponds to the explanation (E, F) of Example 3.2.
With this transformation, the technique of view updates in Section 3.1 is directly applied to view updates with abducible rules.
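The naming transformation itself is purely syntactic and easy to sketch. In the Python fragment below, rules are encoded as (head, body) pairs of strings, "¬" is written as a "-" prefix, and the fresh names gamma_1(x), gamma_2(x), ... are our own naming choice; the fragment reproduces the transformation of Example 3.2 into Example 3.3.

```python
def name_abducible_rules(p_rules, a_rules):
    """Name each non-fact abducible rule H <- B as H <- B, gamma_R, where
    gamma_R is a fresh atom; gamma_R <- is added to P when R was already in P,
    and the gamma_R become the new abducible facts (cf. Section 3.2)."""
    named = [r for r in a_rules if r[1]]          # R: abducible rules with non-empty body
    gamma = {r: f"gamma_{i}(x)" for i, r in enumerate(named, 1)}
    p_new = [r for r in p_rules if r not in named]
    a_new = [r for r in a_rules if r not in named]
    for head, body in named:
        p_new.append((head, body + (gamma[(head, body)],)))   # H <- B, gamma_R
        if (head, body) in p_rules:
            p_new.append((gamma[(head, body)], ()))           # gamma_R <- , if R in P
        a_new.append((gamma[(head, body)], ()))               # gamma_R is abducible
    return p_new, a_new

# Example 3.2:
P = [("flies(x)", ("bird(x)",)),
     ("bird(x)", ("penguin(x)",)),
     ("penguin(tweety)", ())]
A = [("flies(x)", ("bird(x)",)),       # abducible rule, also a rule of P
     ("-flies(x)", ("penguin(x)",))]   # abducible rule, not in P
P2, A2 = name_abducible_rules(P, A)
```

The result matches Example 3.3: the flies rule gains γ1(x) together with the fact γ1(x) ←, the ¬flies rule gains γ2(x) without such a fact, and A′ consists of the two naming atoms.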
4 Theory Updates
4.1 Update with Programs
Next we consider the situation in which new information arrives at a knowledge base whose whole content is subject to change. Given a program P which represents the current knowledge base and another program Q which represents new information, a theory update should satisfy the following conditions.
Definition 4.1. Given programs P and Q, P′ accomplishes a theory update of P by Q if
1. P′ is consistent,
2. Q ⊆ P′ ⊆ P ∪ Q,
3. there is no consistent program P′′ s.t. P′ ⊂ P′′ ⊆ P ∪ Q.
By definition, the updated program P′ is the union of the new information Q and a maximal subset of the original program P that is consistent with Q. The first condition also implies that the new information Q itself must be consistent; updating with inconsistent information makes no sense. To realize theory updates, an abductive framework is used for specifying priorities between the current knowledge and the new knowledge. Consider the abductive program ⟨P ∪ Q, P \ Q⟩, where the program is given as P ∪ Q and any rule in the original program P other than the new information Q is specified as a variable abducible rule.
Proposition 4.1 Let ⟨P ∪ Q, P \ Q⟩ be an abductive program. Then, P′ accomplishes a theory update of P by Q iff P′ = (P ∪ Q) \ F where (∅, F) is a minimal anti-explanation of the observation G = false wrt ⟨P ∪ Q, P \ Q⟩.
Proof. P′ accomplishes a theory update of P by Q iff P′ = (P ∪ Q) \ F where F is a minimal set s.t. F ⊆ P \ Q and (P ∪ Q) \ F ⊭ false iff P′ = (P ∪ Q) \ F where (∅, F) is a minimal anti-explanation of the observation G = false wrt ⟨P ∪ Q, P \ Q⟩. □
The abductive program ⟨P ∪ Q, P \ Q⟩ is transformed into an abductive program with abducible facts using the naming technique of Section 3.2. Then, we can compute theory updates using the update programs and U-minimal answer sets introduced in Section 3.1.
When UP is the update program of ⟨P ∪ Q, P \ Q⟩, a minimal anti-explanation of G = false is computed by a consistent U-minimal answer set of UP ∪ { ← false } (Theorem 3.7). Here, the constraint ← false merely imposes consistency on UP, so the problem is equivalent to computing a consistent U-minimal answer set of UP itself.
Example 4.1. [2] Given the current knowledge base P1 : sleep ← not tv on, watch tv ← tv on, tv on ←,
we want to update P1 with7 P2 : ¬tv on ← power failure, power failure ←. The situation is expressed by the abductive program ⟨P1 ∪ P2, P1 \ P2⟩. The update program UP of ⟨P1 ∪ P2, P1 \ P2⟩ then becomes UP : ¬tv on ← power failure, power failure ←, sleep ← not tv on, γ1, watch tv ← tv on, γ2, abd(tv on), abd(γ1), abd(γ2), −tv on ← not tv on, −γ1 ← not γ1, −γ2 ← not γ2, where γ1 and γ2 are the names of the abducible rules in P1. Then, UP has the unique U-minimal answer set { power failure, ¬tv on, sleep, tv on̄, −tv on, γ1, γ2 }, which represents the deletion of the fact tv on from P1 ∪ P2. As a result, the theory update of P1 by P2 becomes P3 = (P1 ∪ P2) \ { tv on }. Next, suppose that another update P4 : ¬power failure ← is given to P3, stating that power is back again. The situation is expressed by the abductive program ⟨P3 ∪ P4, P3 \ P4⟩, and its update program becomes UP : ¬power failure ←, sleep ← not tv on, γ1, watch tv ← tv on, γ2, ¬tv on ← power failure, γ3, abd(power failure), abd(γi) (i = 1, 2, 3), −power failure ← not power failure, −γi ← not γi (i = 1, 2, 3). Then, UP has the unique U-minimal answer set { ¬power failure, sleep, γ1, γ2, γ3, power failurē, −power failure }, which implies that the result of the update is (P3 ∪ P4) \ { power failure }.8 Note that in ⟨P ∪ Q, P \ Q⟩ it holds that (P \ Q) \ (P ∪ Q) = ∅, so UP contains no rule of the form +a ← a of Definition 3.2(2). The above observation is formally stated as follows.
Theorem 4.2. Let P and Q be programs, and UP the update program of ⟨P ∪ Q, P \ Q⟩. Then, (P ∪ Q) \ F accomplishes a theory update of P by Q iff UP has a consistent U-minimal answer set S and F is the set of rules whose names are in S ∩ UA.9
Proof. The result holds by Proposition 4.1 and Theorem 3.7. □
7 In [2] the rule ¬tv on ← power failure is given as not tv on ← power failure.
8 If the rule tv on ← not power failure is in the program, tv on becomes true, as in [2].
9 For convenience, we consider that an abducible fact a ← has the name a.
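Definition 4.1 can also be checked directly by brute force on small ground programs. The following Python sketch (our own encoding and names, not from the paper; exponential, so only for tiny examples) enumerates the ⊆-minimal sets F of removable rules such that (P ∪ Q) \ F is consistent, and reproduces Example 4.1, where the only repair is to retract the fact tv on ←.

```python
from itertools import combinations

# Ground rules: (head, pos_body, naf_body) triples; "¬L" is written "-L".

def compl(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def least_set(not_free_rules, universe):
    s, changed = set(), True
    while changed:
        changed = False
        for head, pos in not_free_rules:
            if set(pos) <= s and head not in s:
                s.add(head)
                changed = True
    # a complementary pair collapses the answer set to the whole literal set
    return set(universe) if any(compl(l) in s for l in s) else s

def is_consistent(rules, universe):
    """True iff the program has an answer set other than the whole literal set."""
    full = set(universe)
    for r in range(len(universe) + 1):
        for cand in combinations(sorted(universe), r):
            s = set(cand)
            reduct = [(h, pos) for h, pos, naf in rules if not set(naf) & s]
            if least_set(reduct, universe) == s and s != full:
                return True
    return False

def theory_updates(p_rules, q_rules, universe):
    """Definition 4.1 by brute force: the ⊆-minimal F ⊆ P \\ Q such that
    (P ∪ Q) \\ F is consistent; each F yields one updated program P'."""
    removable = [r for r in p_rules if r not in q_rules]
    ok = []
    for k in range(len(removable) + 1):
        for f in combinations(removable, k):
            keep = [r for r in removable if r not in f]
            if is_consistent(q_rules + keep, universe):
                ok.append(set(f))
    return [f for f in ok if not any(g < f for g in ok)]

# Example 4.1:  P1 = { sleep <- not tv_on, watch_tv <- tv_on, tv_on <- },
#               P2 = { -tv_on <- power_failure, power_failure <- }.
P1 = [("sleep", (), ("tv_on",)),
      ("watch_tv", ("tv_on",), ()),
      ("tv_on", (), ())]
P2 = [("-tv_on", ("power_failure",), ()),
      ("power_failure", (), ())]
atoms = {"sleep", "watch_tv", "tv_on", "power_failure"}
universe = sorted(atoms | {"-" + a for a in atoms})

removed = theory_updates(P1, P2, universe)
```

The update-program method of Theorem 4.2 computes the same repairs via U-minimal answer sets; this sketch merely validates the declarative specification.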
4.2 Inconsistency Removal
When a program contains inconsistent information, it must be updated to recover consistency. This type of update is called inconsistency removal.
Definition 4.2. Let P be an inconsistent program. Then, a program P′ accomplishes an inconsistency removal of P if
1. P′ is consistent,
2. P′ ⊆ P,
3. there is no consistent program P′′ s.t. P′ ⊂ P′′ ⊂ P.
By definition, inconsistency removal is captured as a special case of theory updates where P is inconsistent and Q is empty in Definition 4.1. By putting Q = ∅ in ⟨P ∪ Q, P \ Q⟩, inconsistency removal is characterized by the abductive program ⟨P, P⟩. The next proposition directly follows from Proposition 4.1.
Proposition 4.3 Let ⟨P, P⟩ be an abductive program. Then, P′ accomplishes an inconsistency removal of P iff P′ = P \ F where (∅, F) is a minimal anti-explanation of the observation G = false wrt ⟨P, P⟩.
Example 4.2. Let P = { p ← not p, q ← }. Then, G = false has the minimal anti-explanation (E, F) = (∅, { p ← not p }) wrt ⟨P, P⟩. As a result, P′ = { q ← } accomplishes an inconsistency removal of P.
The following result holds by Theorem 4.2.
Theorem 4.4. Let P be an inconsistent program and UP the update program of ⟨P, P⟩. Then, P \ F accomplishes an inconsistency removal of P iff UP has a consistent U-minimal answer set S and F is the set of rules whose names are in S ∩ UA.
Example 4.3. Let P be the program pacifist ← quaker, ¬pacifist ← republican, quaker ←,
republican ← ,
which has the answer set L_P. Then, consider the update program UP of the abductive program ⟨P, P⟩: UP : pacifist ← quaker, γ1, ¬pacifist ← republican, γ2, abd(γ1), abd(γ2), abd(quaker), abd(republican), −γ1 ← not γ1, −γ2 ← not γ2, −quaker ← not quaker, −republican ← not republican. Then, UP has four U-minimal answer sets: { quaker, republican, pacifist, γ1, γ̄2, −γ2 }, { quaker, republican, ¬pacifist, γ̄1, γ2, −γ1 }, { quaker̄, republican, γ1, γ2, ¬pacifist, −quaker }, { quaker, republican̄, pacifist, γ1, γ2, −republican }, which represent that deleting any one of the rules (or facts) from P makes the program consistent.
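Since inconsistency removal is the Q = ∅ case of theory update, the same brute-force check applies. The sketch below (again our own encoding, exponential, for tiny ground programs only) finds all ⊆-minimal rule deletions that restore consistency for the quaker program of Example 4.3; as in the text, deleting any one of the four rules suffices.

```python
from itertools import combinations

# Ground rules: (head, pos_body, naf_body); "¬L" is written "-L".

def compl(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def least_set(not_free_rules, universe):
    s, changed = set(), True
    while changed:
        changed = False
        for head, pos in not_free_rules:
            if set(pos) <= s and head not in s:
                s.add(head)
                changed = True
    return set(universe) if any(compl(l) in s for l in s) else s

def is_consistent(rules, universe):
    full = set(universe)
    for r in range(len(universe) + 1):
        for cand in combinations(sorted(universe), r):
            s = set(cand)
            reduct = [(h, pos) for h, pos, naf in rules if not set(naf) & s]
            if least_set(reduct, universe) == s and s != full:
                return True
    return False

def inconsistency_removals(p_rules, universe):
    """Minimal F ⊆ P with P \\ F consistent (Definition 4.2: Definition 4.1
    with Q = ∅); each F gives one repaired program P' = P \\ F."""
    ok = []
    for k in range(len(p_rules) + 1):
        for f in combinations(p_rules, k):
            if is_consistent([r for r in p_rules if r not in f], universe):
                ok.append(set(f))
    return [f for f in ok if not any(g < f for g in ok)]

# Example 4.3 (quaker/republican):
P = [("pacifist", ("quaker",), ()),
     ("-pacifist", ("republican",), ()),
     ("quaker", (), ()),
     ("republican", (), ())]
atoms = {"pacifist", "quaker", "republican"}
universe = sorted(atoms | {"-" + a for a in atoms})

repairs = inconsistency_removals(P, universe)
```

Each of the four repairs deletes exactly one rule or fact, mirroring the four U-minimal answer sets of the update program above.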
5 Computational Complexity
In this section, we analyze the computational complexity of updating ELPs. Throughout the section, we consider propositional abductive programs, i.e., abductive programs ⟨P, A⟩ where P is a finite ELP containing no variables and A is a finite set of ground literals. We first present the complexity of extended abduction. To this end, we introduce transformations between extended abduction and normal abduction. Let ⟨P, A⟩ be an abductive program, G an observation, E ⊆ A \ P and F ⊆ A ∩ P. It holds that (P ∪ E) \ F |= G iff (P \ A) ∪ E ∪ ((P ∩ A) \ F) |= G. Then, G has an explanation (E, F) wrt ⟨P, A⟩ (under extended abduction) iff G has an explanation H = E ∪ ((P ∩ A) \ F) wrt ⟨P \ A, A⟩ (under normal abduction). Here, (E, F) is easily extracted from H as E = H ∩ (A \ P) and F = (P ∩ A) \ H. Conversely, put P′ = P ∪ { ← not a | a ∈ A ∩ P }. Then, G has an explanation E wrt ⟨P, A⟩ (under normal abduction) iff G has an explanation (E, ∅) wrt ⟨P′, A⟩ (under extended abduction). On the other hand, G has an anti-explanation (E, F) wrt ⟨P, A⟩ iff an observation G′ has a pre-explanation (E, F) wrt ⟨P ∪ { G′ ← not G }, A⟩ (Theorem 3.7). Using the above transformations between extended abduction and normal abduction, any pre-explanation (E, F) of G′ is efficiently computed by normal abduction, and vice versa. In what follows, the problem of finding an (anti-)explanation (resp. minimal (anti-)explanation) of an observation G means producing a pair (E, F) of abducibles and checking whether it is an (anti-)explanation (resp. minimal (anti-)explanation) of G.
Theorem 5.1. Given a propositional abductive program ⟨P, A⟩ and a ground literal G:
(a) Finding an explanation (resp. minimal explanation) of an observation G is Σ2P-complete (resp. Σ3P-complete).
(b) Finding an anti-explanation (resp. minimal anti-explanation) of an observation G is NP-complete (resp. Σ2P-complete).
Proof. By the polynomial-time transformations between normal abduction and extended abduction, the complexity class of a decision problem under extended abduction is the same as that of the corresponding problem under normal abduction. The results of (a) and (b) then follow from the complexity results for normal abduction [7]. □
The above results imply the following complexity of updating ELPs.
Theorem 5.2.
(a) Finding a program which accomplishes a given view update is Σ3P-complete for insertion and Σ2P-complete for deletion.
(b) Finding a program which accomplishes a given theory update is Σ2P-complete.
(c) Finding a program which accomplishes inconsistency removal is Σ2P-complete.
C. Sakama and K. Inoue
Proof. The problem of finding a program which accomplishes the view update for inserting (resp. deleting) G is equivalent to the problem of finding a minimal explanation (resp. minimal anti-explanation) of G. On the other hand, the problem of finding a program which accomplishes a theory update or inconsistency removal is equivalent to the problem of finding a minimal anti-explanation of G = false. The results then hold by Theorem 5.1. □

To conclude, in view updates insertion is generally harder than deletion by one level of the polynomial hierarchy, while theory updates and inconsistency removal are as hard as deletion in view updates.
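The extraction of an extended-abduction explanation (E, F) from a normal-abduction explanation H, described before Theorem 5.1, is a pure set computation. The following sketch is our own illustration (not from the paper): programs and sets of abducibles are modelled simply as Python sets of ground items.

```python
# A small sketch of the correspondence used in Section 5 between extended
# and normal abduction: a normal explanation H of G wrt <P \ A, A> yields
# the extended explanation (E, F) of G wrt <P, A>, and conversely.
# (Our own encoding: P and A are sets of ground items.)

def extended_from_normal(P, A, H):
    """E = H ∩ (A \\ P), F = (P ∩ A) \\ H."""
    E = H & (A - P)
    F = (P & A) - H
    return E, F

def normal_from_extended(P, A, E, F):
    """H = E ∪ ((P ∩ A) \\ F)."""
    return E | ((P & A) - F)

# Round trip on a toy instance: abducibles a, b, where a is already in P.
P = {"a", "r1"}          # "r1" stands for a non-abducible rule of P
A = {"a", "b"}
H = {"b"}                # normal explanation: add b, drop a
E, F = extended_from_normal(P, A, H)
print(E, F)              # {'b'} {'a'}
assert normal_from_extended(P, A, E, F) == H
```

This makes explicit why the two frameworks have the same complexity: passing between (E, F) and H costs only polynomial-time set operations.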
6 Related Work
It is well known that abduction is used for view updates in deductive databases. Kakas and Mancarella [13] characterize view updates through abduction and use Eshghi and Kowalski's abductive procedure for computation. Decker [6] introduces an abductive procedure for view updates and integrity maintenance. These procedures are top-down, and their correctness is guaranteed for locally stratified programs. Generally, view updates based on an SLDNF-like top-down procedure impose restrictions on the program syntax. By contrast, the proposed technique is applicable to any ELP, and updates are computed by a procedure for computing answer sets of ELPs with the additional task of selecting the U-minimal answer sets. Bry [3] characterizes intensional updates through abduction and specifies update procedures in a meta-program that is computed in a bottom-up manner. Console et al. [4] provide an abductive procedure for view updates based on Clark's completion. They realize view updates based on traditional abduction but do not consider theory updates. Update programs can compute explanations for extended abduction. Inoue and Sakama [12] introduce transaction programs for computing extended abduction, though their application is limited to acyclic abductive programs. Fagin et al. [8] discuss view updates and theory updates in the context of first-order theories. Inoue and Sakama [11] characterize view updates in normal logic programs and the theory updates of [8] in terms of extended abduction. The update programs proposed in this paper provide methods for computing them. Further, the theory updates considered in this paper are more general than those of Fagin et al., in the sense that Fagin et al. consider updating a theory with only a single formula. Alferes and Pereira [1] introduce update programs for updating normal and extended logic programs. Alferes et al.
[2] propose a framework of dynamic logic programming which performs theory updates for generalized logic programs containing negation as failure in the head. Our approach differs from these two works in that we formulate updates in terms of abductive programs, and update programs are used for computing view updates as well as theory updates. Theory updates with ELPs are also discussed in [16]; there, priorities are associated with rules to resolve conflicts between them, which is different from our abductive approach. Damásio and Pereira [5] resolve inconsistency in ELPs by changing the truth-value of abducibles under the three-valued well-founded semantics.
Updating Extended Logic Programs through Abduction
7 Summary
This paper introduced methods for computing updates in extended logic programs. Three different types of updates, namely view updates, theory updates, and inconsistency removal, were characterized in terms of abductive programs. All of these updates were computed using update programs. The results of this paper show that abduction can play a fundamental role in various update problems. In future work we will extend the techniques to knowledge bases containing disjunctions.
References

1. J. J. Alferes and L. M. Pereira. Update programs can update programs. In: Nonmonotonic Extensions of Logic Programming, Lecture Notes in Artificial Intelligence 1216, pages 110–131, Springer, 1997.
2. J. J. Alferes, J. A. Leite, L. M. Pereira, H. Przymusinska, and T. C. Przymusinski. Dynamic logic programming. In: Proc. 6th Int'l Conf. Principles of Knowledge Representation and Reasoning, pages 98–109, Morgan Kaufmann, 1998.
3. F. Bry. Intensional updates: abduction via deduction. In: Proc. 7th Int'l Conf. Logic Programming, pages 561–575, MIT Press, 1990.
4. L. Console, M. L. Sapino, and D. T. Dupré. The role of abduction in database view updating. J. Intelligent Information Systems 4:261–280, 1995.
5. C. V. Damásio and L. M. Pereira. Abduction over 3-valued extended logic programs. In: Proc. LPNMR'95, Lecture Notes in Artificial Intelligence 928, pages 29–42, Springer, 1995.
6. H. Decker. An extension of SLD by abduction and integrity maintenance for view updating in deductive databases. In: Proc. 1996 Joint Int'l Conf. & Symp. Logic Programming, pages 157–169, MIT Press, 1996.
7. T. Eiter, G. Gottlob, and N. Leone. Abduction from logic programs: semantics and complexity. Theoretical Computer Science 189(1-2):129–177, 1997.
8. R. Fagin, J. D. Ullman, and M. Y. Vardi. On the semantics of updates in databases (preliminary report). In: Proc. 2nd ACM SIGACT-SIGMOD Symp. Principles of Database Systems, pages 352–365, 1983.
9. M. Gelfond and V. Lifschitz. Logic programs with classical negation. In: Proc. 7th Int'l Conf. Logic Programming, pages 579–597, MIT Press, 1990.
10. K. Inoue. Hypothetical reasoning in logic programs. J. Logic Programming 18:191–227, 1994.
11. K. Inoue and C. Sakama. Abductive framework for nonmonotonic theory change. In: Proc. IJCAI-95, pages 204–210, Morgan Kaufmann, 1995.
12. K. Inoue and C. Sakama. Specifying transactions for extended abduction. In: Proc. 6th Int'l Conf. Principles of Knowledge Representation and Reasoning, pages 394–405, Morgan Kaufmann, 1998.
13. A. C. Kakas and P. Mancarella. Database updates through abduction. In: Proc. 16th Int'l Conf. Very Large Databases, pages 650–661, Morgan Kaufmann, 1990.
14. T. C. Przymusinski. On the declarative semantics of deductive databases and logic programs. In: J. Minker (ed.), Foundations of Deductive Databases and Logic Programming, pages 193–216, Morgan Kaufmann, 1988.
15. K. Satoh and K. Iwayama. Computing abduction by using the TMS. In: Proc. 8th Int'l Conf. Logic Programming, pages 505–518, MIT Press, 1991.
16. Y. Zhang and N. Y. Foo. Updating logic programs. In: Proc. 13th European Conf. Artificial Intelligence, pages 403–407, Wiley, 1998.
LUPS – A Language for Updating Logic Programs

José Júlio Alferes^{1,2}, Luís Moniz Pereira^2, Halina Przymusinska^{3,4}, and Teodor C. Przymusinski^4

^1 Dept. de Matemática, Univ. Évora, Rua Romão Ramalho, 59, P-7000 Évora, Portugal, [email protected]
^2 Centro de Inteligência Artificial, Fac. Ciências e Tecnologia, Univ. Nova de Lisboa, P-2825-114 Caparica, Portugal, [email protected], Voice: +351 1 294 8533, Fax: +351 1 294 8541
^3 Dept. Computer Science, California State Polytechnic Univ., Pomona, CA 91768, USA, [email protected]
^4 Dept. Computer Science, Univ. of California, Riverside, CA 92521, USA, [email protected]
Abstract. Most of the work conducted so far in the field of logic programming has focused on representing static knowledge, i.e. knowledge that does not evolve with time. To overcome this limitation, in a recent paper, the authors introduced the concept of dynamic logic programming. There, they studied and defined the declarative and operational semantics of sequences of logic programs (or dynamic logic programs), P0 ⊕ . . . ⊕ Pn . Each such program contains knowledge about some given state, where different states may, e.g., represent different time periods or different sets of priorities. The role of dynamic logic programming is to employ relationships existing between the possibly mutually contradictory sequence of programs to precisely determine, at any given state, the declarative and procedural semantics of their combination. But how, in concrete situations, is a sequence of logic programs built? For instance, in the domain of actions, what are the appropriate sequences of programs that represent the performed actions and their effects? Whereas dynamic logic programming provides a way for determining what should follow, given the sequence of programs, it does not provide a good practical language for the specification of updates or changes in the knowledge represented by successive logic programs. In this paper we define a language designed for specifying changes to logic programs (LUPS – “Language for dynamic updates”). Given an initial knowledge base (in the form of a logic program) LUPS provides a way for sequentially updating it. The declarative meaning of a sequence of sets of update actions in LUPS is defined using the semantics of the dynamic logic program generated by those actions. We also provide a translation of the sequence of update statements sets into a single generalized logic program written in a meta-language, so that the stable models of the resulting program correspond to the previously defined declarative semantics. 
This meta-language is used in the actual implementation, although this is not the subject of this paper. Finally, we briefly mention related work (lack of space prevents us from presenting more detailed comparisons).

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 162–176, 1999.
© Springer-Verlag Berlin Heidelberg 1999
1 Introduction
Several authors [9,10,2] have addressed the issue of updates of logic programs and deductive databases, most of them following the so-called "interpretation update" approach. This approach, proposed in [11,7], is based on the idea of reducing the problem of finding an update of a knowledge base DB by another knowledge base U to the problem of finding updates of its individual interpretations or models. More precisely, a knowledge base DB′ is considered to be the update of a knowledge base DB by U if the set of models of DB′ coincides with the set of updated models of DB. As pointed out in [1], such an approach suffers from several important drawbacks: first, it requires the computation of all models of DB before computing the update; second, the resulting knowledge base DB′ is only indirectly characterized (as one whose models are all the updated models of the original DB), and no direct definition of DB′ is provided; last, and most importantly, it leads to counterintuitive results when the intensional part of the knowledge base (i.e. the set of rules) changes. In [2] the authors eliminated the first two drawbacks by showing how, given a program P, to construct another program P′ whose models are exactly the interpretation updates of the models of P. However, the last, and most important, drawback still remained: no method to update logic programs consisting of rules, not just extensional facts, was provided.

Example 1.1. Consider the logic program:

  free ← not jail
  jail ← abortion

whose only stable model is M = {free}. Suppose now that the update U states that abortion becomes true, i.e. U = {abortion ←}. According to the interpretation approach to updating, we would obtain {free, abortion} as the only update of M by U.
However, by inspecting the initial program and the update, we are likely to conclude that, since free was true only because jail could be assumed false, and that was the case because abortion was false, now that abortion has become true jail should also have become true, and free should be removed from the conclusions. Suppose now that the law changes, so that abortion no longer implies jail. That could, for example, be described by the new (update) program U2 = {not jail ← abortion}. We should now expect jail to become false and, consequently, free to become true (again). This example suggests that the principle of inertia should be applied not just to individual literals but rather to whole rules of the knowledge base. It also suggests that the update of a knowledge base by another one should not just depend on their semantics but also on their syntax. It also illustrates the need for some way of representing negative conclusions.
In [1], the authors investigated the problem of updating knowledge bases represented by generalized logic programs1 and proposed a new approach to this problem that eliminates the drawbacks of previously proposed solutions. It starts by defining the update of a generalized program P by another generalized program U , P ⊕U . The semantics of P ⊕U avoids the above mentioned problems by applying the inertia principle not just to atoms but to entire program rules. This notion of updates is then extended to sequences of programs, thereby defining the so-called dynamic logic programming. A dynamic logic program is a (finite or infinite) sequence P0 ⊕ . . . ⊕ Pn ⊕ . . ., representing consecutive updates of logic programs by logic programs. The semantics defined in [1] assigns meaning to such sequences. However, dynamic logic programming does not by itself provide a proper language for specifying (or programming) changes of logic programs. If knowledge is already represented by logic programs, dynamic programs simply represent the evolution of knowledge. But how is that evolving knowledge specified? What makes knowledge evolve? Since logic programs describe knowledge states, it’s only fit that logic programs describe transitions of knowledge states as well. It is natural to associate with each state a set of transition rules to obtain the next state. As a result, an interleaving sequence of states and rules of transition will be obtained. Imperative programming specifies transitions and leaves states implicit. Logic programming, up to now, could not specify state transitions. With the language of dynamic updates LUPS we make both states and their transitions declarative. Usually updates are viewed as actions or commands that make the knowledge base evolve from one state to another. This is the classical view e.g. in relational databases: the knowledge (data) is expressed declaratively via a set of relations; updates are commands that change the data. 
¹ i.e. logic programs which allow default negation not only in rule bodies but also in the heads.

In [1], updates were viewed declaratively as a given update store consisting of the sequence of programs. They were more in the spirit of state transition rules, rather than commands. Of course, one could say that the update commands were implicit. For instance, in example 1.1, the sequence P ⊕ U ⊕ U2 could be viewed as the result of, starting from P, performing first the update command assert abortion, and then the update command assert not jail ← abortion. But, if viewed as a language for (implicitly) specifying update commands, dynamic logic programming is quite poor. For instance, it does not provide any mechanism for saying that some rule (or fact) should be asserted only whenever some conditions are satisfied. This is essential in the domain of actions, to specify direct effects of actions. For example, suppose we want to state that wake up should be added to our knowledge base whenever alarm rings is true. As a language for specifying updates, dynamic logic programming does not provide a way of specifying such an update command. Note that the command is distinct from assert wake up ← alarm rings. With the latter, if the alarm stops ringing (i.e. if not alarm rings is later asserted), wake up becomes false. In the former, we expect wake up to remain true (by inertia) even after the alarm stops ringing. As a matter of fact, in this case, we don't want to add the rule saying that wake up is true whenever alarm rings is also true. We simply want to add the fact wake up as soon as alarm rings is true. From there on, no connection between wake up and alarm rings should persist. This simple one-rule example also highlights another limitation of dynamic logic programming as a language for specifying update commands: one must explicitly say to which program in the sequence a rule belongs. Sometimes, in particular in the domain of actions, there is no way to know a priori to which state (or program) a rule should belong. Where should we assert the fact wake up? This is not known a priori because we don't know when alarm rings. In this paper we define (in section 3) a language for specifying logic program updates: LUPS – "Language of dynamic updates". The object language of LUPS is that of generalized logic programs. A sentence U in LUPS is a set of simultaneous update commands (or actions) that, given a pre-existing sequence of logic programs P0 ⊕ . . . ⊕ Pn (i.e. a dynamic logic program), whose semantics corresponds to our knowledge at a given state, produces a sequence with one more program, P0 ⊕ . . . ⊕ Pn ⊕ Pn+1, corresponding to the knowledge that results from the previous sequence after performing all the simultaneous commands. A program in LUPS is a sequence of such sentences. Given a program in LUPS, its semantics is first defined (in section 4) by means of a dynamic logic program generated by the sequence of commands. In section 5, we also describe a translation of a LUPS program into a generalized logic program, whose stable models exactly correspond to the semantics of the original LUPS program. Finally, in section 6, we briefly discuss some of the similarities and differences between LUPS and "Action Languages".
2 Object Language
In order to represent negative information in logic programs and their updates, we need more general logic programs, allowing default negation not A not only in the premises of clauses but also in their heads². In [1], such programs are dubbed generalized logic programs, and their semantics is defined as a generalization of the stable model semantics of normal logic programs [4]³. For convenience, generalized logic programs are syntactically represented as propositional Horn theories. In particular, default negation not A is represented as a standard propositional variable (atom). Suppose that K is an arbitrary set of propositional variables whose names do not begin with "not".

² See [1] for an explanation of why default negation is needed in rule heads, rather than explicit negation. Note that a default negated atom in a rule's head means that the atom should no longer be assumed true, whilst an explicitly negated atom would mean that the atom should become false. In an update context this difference is similar to the difference between deleting a fact and asserting its complement.
³ The class of generalized logic programs can be viewed as a special case of a yet broader class of programs introduced earlier in [5].

By the
propositional language LK generated by the set K we mean the language whose set of propositional variables consists of {A : A ∈ K} ∪ {not A : A ∈ K}. Atoms A ∈ K are called objective atoms, while the atoms not A are called default atoms. From the definition it follows that the two sets are disjoint. By "literals" we mean objective or default atoms in LK.

Definition 2.1 (Generalized Logic Program). A generalized logic program P in the language LK is a (possibly infinite) set of propositional rules of the form

  L ← L1, . . . , Ln

where L, L1, . . . , Ln are literals. If none of the literals appearing in the heads of rules of P are default ones, then we say that the logic program P is normal.

By a (2-valued) interpretation M of LK we mean any set of atoms from LK satisfying the condition that for any A in K, precisely one of the atoms A or not A belongs to M. Given an interpretation M we define:

  M⁺ = {A ∈ K : A ∈ M}   and   M⁻ = {not A : not A ∈ M} = {not A : A ∉ M}

By a (2-valued) model M of a generalized logic program we mean a (2-valued) interpretation that satisfies all of its clauses.

Definition 2.2 (Stable models of generalized logic programs). We say that a (2-valued) interpretation M of LK is a stable model of a generalized logic program P if M is the least model of the Horn theory P ∪ M⁻, or, equivalently, if M = {L : L is a literal and P ∪ M⁻ ⊢ L}.

As proven in [1], the class of stable models of generalized logic programs extends the class of stable models of normal programs [4], in the sense that, for the special case of normal programs, both semantics coincide.
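Definition 2.2 reduces the stability check to computing the least model of a Horn theory, which can be done by forward chaining. The following sketch (our own brute-force illustration for finite propositional programs, not the paper's implementation) treats default atoms "not X" as ordinary propositional atoms, exactly as in the Horn-theory representation above:

```python
# Sketch of Definition 2.2 for propositional generalized logic programs.
# Rules are (head, body) pairs; literals are strings, with default atoms
# written as "not X".

from itertools import combinations

def least_model(horn_rules, facts):
    """Least model of a Horn theory: forward chaining to a fixpoint."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in horn_rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def is_stable(program, atoms, m_plus):
    """M is stable iff M equals the least model of P ∪ M⁻ (Def. 2.2)."""
    m = set(m_plus) | {f"not {a}" for a in atoms if a not in m_plus}
    m_minus = {l for l in m if l.startswith("not ")}
    return m == least_model(program, m_minus)

def stable_models(program, atoms):
    """Enumerate stable models (as M⁺) by brute force."""
    out = []
    for r in range(len(atoms) + 1):
        for m_plus in combinations(sorted(atoms), r):
            if is_stable(program, atoms, set(m_plus)):
                out.append(set(m_plus))
    return out

# Example 1.1 from the introduction: free ← not jail, jail ← abortion.
P = [("free", ["not jail"]), ("jail", ["abortion"])]
print(stable_models(P, {"free", "jail", "abortion"}))  # [{'free'}]
```

Note that, as in the Horn representation, A and "not A" are distinct atoms, so the least model of P ∪ M⁻ may contain both; stability fails in that case because it differs from the interpretation M.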
3 Language for Updates
In our update framework, knowledge evolves from one knowledge state to another as a result of update commands stated in the object language. Knowledge states KSi represent dynamically evolving states of our knowledge. They undergo change due to update actions. Without loss of generality (as will become clear below) we assume that the initial knowledge state, KS0, is empty and that in it all predicates are false by default. This is the default knowledge state. Given the current knowledge state KS, its successor knowledge state KS[U] is produced as a result of the occurrence of a non-empty set U of simultaneous updates. Each of the updates can be viewed as a set of (parallel) actions, and consecutive knowledge states are obtained as

  KSn = KS0[U1][U2] . . . [Un]

where the Ui's represent consecutive sets of updates. We also denote this state by:

  KSn = U1 ⊗ U2 ⊗ . . . ⊗ Un

Sequences of updates so defined will be called update programs. In other words, an update program is a finite sequence U = {Us : s ∈ S} of updates indexed by
the set S = {1, 2, . . . , n}. Each update is a set of update commands. Update commands (to be defined below) specify assertions or retractions to the current knowledge state. By the current knowledge state we mean the one resulting from the last update performed. Knowledge can be queried at any state q ≤ n, where n is the index of the current knowledge state. A query will be denoted by:

  holds(B1, . . . , Bk, not C1, . . . , not Cm) at q?

and is true iff the conjunction of its literals holds at the state KSq. If q = n, we simply skip the state reference "at q".

3.1 Update Commands
Update commands cause changes to the current knowledge state, leading to a new successor state. The simplest command consists of adding a rule to the current state:

  assert L ← L1, . . . , Lk

For example, when a law stating that abortion is punished by jail is approved, the knowledge state might be updated via the command: assert jail ← abortion. In general, the addition of a rule to a knowledge state may depend upon some precondition. To allow for that, an assert command in LUPS has the form:

  assert L ← L1, . . . , Lk when Lk+1, . . . , Lm   (1)
The meaning of such an assert rule is that if the precondition Lk+1, . . . , Lm is true in the current knowledge state, then the rule L ← L1, . . . , Lk should belong to the successor knowledge state. Normally, the rule so added persists, or is in force, from then on by inertia, until possibly defeated by some future update or until retracted. This is the case for the assert command above: the rule jail ← abortion remains in effect by inertia from the successor state onwards unless later invalidated. However, there are cases where this persistence by inertia should not be assumed. Take, for instance, the alarm ring discussed in the introduction. This fact is a one-time event that should not persist by inertia, i.e. it is not supposed to hold by inertia after the successor state. In general, facts that denote names of events or actions should be non-inertial. Both are true in the state in which they occur, and do not persist by inertia for later states. Accordingly, the rule within the assert command may be preceded by the keyword event, indicating that the added rule is non-inertial. Assert commands are thus of the form (1) or of the form⁴:

  assert event L ← L1, . . . , Lk when Lk+1, . . . , Lm   (2)

⁴ In both cases, if the precondition is empty we just skip the whole when subclause.

While some update commands, such as assert republican congress, represent newly incoming information, and are thus one-time non-persistent update commands (whose effect, i.e. the truth of republican congress, may nevertheless
persist by inertia), some other update commands are liable to be persistent, i.e., to remain in force until cancelled. For example, an update like: assert jail ← abortion when rep congress, rep president or
assert wake up when alarm sounds
might be always true, or at least true until cancelled. Enabling the possibility of such updates allows our system to dynamically change without any truly new updates being received. For example, the persistent update command:

  assert set hands(T) when get hands(C) ∧ get time(T) ∧ (T − C) > ∆

defines a perpetually operating clock whose hands move to the actual time position whenever the difference between the clock time and the actual time is sufficiently large. In order to specify such persistent update commands (which we call laws) we introduce the syntax:

  always [event] L ← L1, . . . , Lk when Lk+1, . . . , Lm   (3)
For cancelling persistent update commands, we use:

  cancel L ← L1, . . . , Lk when Lk+1, . . . , Lm   (4)
The first statement means that, in addition to any new set of arriving update commands, we are also supposed to keep executing this persistent update command. The second statement cancels this persistent update when the conditions for cancellation are met. The existence of persistent update commands requires a "trivial" update, which does not specify any truly new updates but simply triggers all the already defined persistent updates to fire, thus resulting in a new modified knowledge state. Such a "no-operation" update ensures that the system continues to evolve, even when no truly new updates are specified, and may be represented by assert true. It stands for the tick of the clock that drives the world being modelled. To deal with the deletion of rules, we introduce the retraction command:

  retract [event] L ← L1, . . . , Lk when Lk+1, . . . , Lm   (5)
meaning that, subject to precondition Lk+1 , . . . , Lm , the rule L ← L1 , . . . , Lk is either retracted from now on, or just retracted temporarily in the next state (non-inertial retract, i.e. an event of retraction, triggered by the event keyword). The cancelling of an update command is not equivalent to retracting a rule. Cancelling an update just means it will no longer be added as a command to updates, it does not cancel the inertial effects of its previous application(s). However, retracting an update causes any of its inertial effects to be cancelled from
now on, as well as cancelling a persistent law. Also, note that "retract event . . ." does not mean the retracting of an event, because events persist only for one state and thus do not require retraction. It represents a temporary removal of a rule from the successor state (a temporary retraction event).

Definition 3.1 (LUPS). An update program in LUPS is a finite sequence of updates, where an update is a set of commands of the form (1) to (5).

Example 3.1. Consider the scenario: once Republicans take over both Congress and the Presidency they establish a law stating that abortions are punishable by jail; once Democrats take over both Congress and the Presidency they abolish such a law; in the meantime, there are no changes in the law because always either the President or the Congress vetoes such changes; performing an abortion is an event, i.e. a non-inertial update. Consider the following update history: (1) a Democratic Congress and a Republican President (Reagan); (2) Mary performs abortion; (3) Republican Congress is elected (Republican President remains in office: Bush); (4) Kate performs abortion; (5) Clinton is elected President; (6) Ann performs abortion; (7) Gore is elected President and Democratic Congress is in place (year 2000?); (8) Susan performs abortion. The specification in LUPS would be:

Persistent update commands:

  always jail(X) ← abn(X) when repC ∧ repP
  always not jail(X) ← abn(X) when not repC ∧ not repP

Alternatively, instead of the second clause, in this example, we can use a retract statement:

  retract jail(X) ← abn(X) when not repC ∧ not repP

Note that, in this example, since there is no other rule implying jail, retracting the rule is safely equivalent to retracting its conclusion. These rules state that we are always supposed to update the current state with the rule jail(X) ← abn(X) provided repC and repP hold true and that we are supposed to assert the opposite (or just retract this rule) provided not repC and not repP hold true.
Such rules should be added to U1.

Sequence of non-persistent update commands:

  U1: assert repP, assert not repC
  U2: assert event abn(mary)
  U3: assert repC
  U4: assert event abn(kate)
  U5: assert not repP
  U6: assert event abn(ann)
  U7: assert not repC
  U8: assert event abn(susan)

Of course, in the meantime we could have a lot of trivial update events representing ticks of the clock, or any other irrelevant updates.
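For concreteness, an update history like the one above can be written down as plain data. The encoding below is our own hypothetical illustration, not LUPS concrete syntax: a command is a tuple of kind, rule, and precondition, with a rule given as a head and a body.

```python
# Hypothetical plain-data encoding of Example 3.1 (our own illustration,
# not the paper's syntax). A command is (kind, rule, precondition);
# a rule is (head, body); literals are plain strings.

laws = [
    ("always", ("jail(X)", ("abn(X)",)), ("repC", "repP")),
    ("always", ("not jail(X)", ("abn(X)",)), ("not repC", "not repP")),
]

updates = [
    laws + [("assert", ("repP", ()), ()),
            ("assert", ("not repC", ()), ())],     # U1
    [("assert event", ("abn(mary)", ()), ())],     # U2
    [("assert", ("repC", ()), ())],                # U3
    [("assert event", ("abn(kate)", ()), ())],     # U4
    [("assert", ("not repP", ()), ())],            # U5
    [("assert event", ("abn(ann)", ()), ())],      # U6
    [("assert", ("not repC", ()), ())],            # U7
    [("assert event", ("abn(susan)", ()), ())],    # U8
]

# Eight updates in the history, with the two persistent laws issued in U1:
print(len(updates))  # 8
```

A representation of this shape is what the inductive translation of section 4 consumes, one update at a time.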
Example 3.2. Consider the spring-loaded suitcase with two latches situation [8], where the suitcase opens whenever both latches are up, and there is an action of toggling each latch. This situation can be represented in LUPS by adding to U1 the persistent update commands:

  always open ← up(l1), up(l2)
  always up(L) when not up(L), toggle(L)
  always not up(L) when up(L), toggle(L)

A concrete situation, where initially l1 is down, l2 is up and the suitcase is closed, plus the action of toggling l1, would be represented by the sequence of updates:

  U1 = {assert not up(l1), assert up(l2), assert not open}
  U2 = {assert event toggle(l1)}

Note how an initial state can easily be characterized by asserting appropriate rules in U1. To model a situation where both l1 and l2 are toggled simultaneously, simply add assert event toggle(l2) to U2.
4 Semantics of LUPS
In this section we provide update programs with a meaning, by translating them into dynamic logic programs. The semantics of an update program is then determined by the semantics of the dynamic program so obtained. We recall that a dynamic program is a sequence P0 ⊕ . . . ⊕ Pn (also denoted ⊕P, where P is a set of generalized logic programs indexed by 1, . . . , n and P0 = {}). The notion of "dynamic program update at a given state s", represented by ⊕s P, precisely characterizes a generalized logic program whose stable models correspond to the meaning of the dynamic program when queried at state s. If some literal or conjunction of literals φ holds in all stable models of ⊕s P, we write ⊕s P |=sm φ. Lack of space prevents us from giving here the complete definition of the semantics of dynamic programs. The reader is referred to [1], where such programs are defined and their precise semantic characterization is provided. The translation of an update program into a dynamic program is made by induction, starting from the empty program P0 and, for each update Ui, given the already built dynamic program P0 ⊕ . . . ⊕ Pi−1, determining the resulting program P0 ⊕ . . . ⊕ Pi−1 ⊕ Pi. To cope with persistent update commands we further consider, associated with every dynamic program in the inductive construction, a set containing all currently active persistent commands, i.e. all those that were introduced and not cancelled up to that point in the construction. To be able to retract rules, we need to uniquely identify each such rule. This is achieved by augmenting the language of the resulting dynamic program with a new propositional variable "rule(L ← L1, . . . , Ln)" for every rule L ← L1, . . . , Ln appearing in the original LUPS program⁵.

⁵ Note that, by definition, all such rules are ground and thus the new variable uniquely identifies the rule, where rule/1 is a reserved predicate.
Definition 4.1 (Translation into dynamic programs). Let U = U1 ⊗ . . . ⊗ Un be an update program. The corresponding dynamic program Υ (U) = P = P0 ⊕ . . . ⊕ Pn is obtained by the following inductive construction, using at each step i an auxiliary set of persistent commands P Ci : Base step: P0 = {} with P C0 = {}. Inductive step: Let Pi = P0 ⊕ . . . ⊕ Pi with the set of persistent commands P Ci be the translation of Ui = U1 ⊗ . . . ⊗ Ui . The translation of Ui+1 = U1 ⊗ . . . ⊗ Ui+1 is Pi+1 = P0 ⊕ . . . ⊕ Pi+1 with the set of persistent commands P Ci+1 , where: P Ci+1 = P Ci ∪ ∪{assert R when C : always R when C ∈ Ui+1 } ∪{assert event R when C : always event R when C ∈ Ui+1 L} −{assert [event] R when C : cancel R when D ∈ Ui+1 ∧ Li Pi |=sm D} −{assert [event] R when C : retract R when D ∈ Ui+1 ∧ i Pi |=sm D} N Ui+1 = Ui+1 ∪ P Ci+1
  Pi+1 = {R, rule(R) : assert [event] R when C ∈ NUi+1 ∧ ⊕i Pi |=sm C}
       ∪ {not rule(R) : retract [event] R when C ∈ NUi+1 ∧ ⊕i Pi |=sm C}
       ∪ {not rule(R) : assert event R when C ∈ NUi ∧ ⊕i−1 Pi−1 |=sm C}
       ∪ {rule(R) : retract event R when C ∈ NUi ∧ ⊕i−1 Pi−1 |=sm C, rule(R)}

where R denotes a generalized logic program rule, and C and D conjunctions of literals. assert [event] R when C and retract [event] R when C are used for notational convenience, and stand for either the assert or the assert-event command (resp. retract or retract-event). So, for example, in the first line of the definition of Pi+1, R and rule(R) must be added either if there exists a command assert R when C or a command assert event R when C obeying the conditions there. In the inductive step, if i = 0 the last two lines are omitted; in that case NUi does not exist.

Definition 4.2 (LUPS semantics). Let U be an update program. A query holds(L1, . . . , Ln) at q is true in U iff ⊕q Υ(U) |=sm L1, . . . , Ln.

From the results on dynamic programs in [1], it is clear that LUPS generalizes the language of updates of "revision programs" defined in [9]:

Theorem 4.1 (LUPS generalizes revision programs). Let I be an interpretation and R a revision program. Let U = U1 ⊗ U2 be the update program where:

  U1 = {assert A : A ∈ I}
  U2 = {assert A ← B1, . . . , not Bn : in(A) ← in(B1), . . . , out(Bn) ∈ R}
     ∪ {assert not A ← B1, . . . , not Bn : out(A) ← in(B1), . . . , out(Bn) ∈ R}

Then M is a stable model of Υ(U) iff M is an interpretation update of I by R in the sense of [9].
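The bookkeeping of the persistent-command set PCi in Definition 4.1 can be sketched in Python. This is our illustration only, not part of the paper: rules and conditions are opaque strings, the holds callback stands in for the stable-model test ⊕i Pi |=sm D, and the assert/assert-event distinction is dropped for brevity.

```python
def step_persistent_commands(pc, update, holds):
    """pc: set of (rule, cond) pairs for currently active persistent commands.
    update: dict with optional keys 'always', 'always_event', 'cancel',
    'retract', each mapping to a list of (rule, cond) pairs. Returns PC_{i+1}."""
    new_pc = set(pc)
    # new 'always' and 'always event' commands become active
    new_pc |= set(update.get("always", []))
    new_pc |= set(update.get("always_event", []))
    # cancel/retract commands whose condition D holds deactivate matching rules
    dropped = {r for (r, d) in update.get("cancel", []) + update.get("retract", [])
               if holds(d)}
    return {(r, c) for (r, c) in new_pc if r not in dropped}

# toy run: one persistent command introduced, then cancelled one step later
pc0 = set()
pc1 = step_persistent_commands(pc0, {"always": [("p <- q", "true")]}, lambda d: True)
pc2 = step_persistent_commands(pc1, {"cancel": [("p <- q", "true")]}, lambda d: True)
```

After the first step the command is active; after the cancel it is gone, mirroring the set-difference clauses of the definition.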
J.J. Alferes et al.

5   Translation into Generalized Logic Programs
The previous section established a semantics for LUPS. However, its definition is based on a translation into dynamic logic programs, and is not purely syntactic. Indeed, to obtain the translated dynamic program, one needs to compute, at each step of the inductive process, the consequences of the previous one. In this section we present a translation of update programs and queries into normal logic programs written in a meta-language. The translation is purely syntactic, and is correct in the sense that a query holds in an update program iff the translation of the query belongs to all stable models of the translation of the update program. This translation also directly provides a mechanism for implementing update programs: with a pre-processor performing the translations, query answering is reduced to that of normal logic programs. Such a preprocessor and a meta-interpreter for answering queries have been implemented.⁶

The translation presented here assumes the existence of a sequence of consecutive updates. Nevertheless, it is easy to see that the translation is modular (i.e. adding an extra update does not modify what has already been translated). Thus, in practice, the various updates can be iteratively translated, one at a time. The translation uses a meta-language generated by the language of the update programs. For each objective atom A in the language of the update program, and each special propositional symbol rule_{L←Body} or cancel_{L←Body} (where these symbols are added to the language for each rule L ← Body in the update program), the meta-language includes the following symbols: A(s, t), A^u(s, t), ¬A(s, t), and ¬A^u(s, t), where ¬A denotes the new atom read "A is false", and s and t range over the indexes of the update program.
Intuitively, these new symbols mean, respectively: A is true at state s considering all states until t; A is true due to the update program at state s, considering all states until t; A is false at state s considering all states until t; not A is true due to the update program at state s, considering all states until t. The first index argument added to atoms stands for the update state where the atom has been introduced. So, according to the transformation below, in non-persistent asserts the first argument of atoms in the head of rules is instantiated with the index of the update state where the rule was asserted. In persistent asserts, the argument ranges over the indexes where the rule should be asserted (i.e. all those greater than the state where the corresponding always command is). The second index argument stands for the query state. Accordingly, when translating (non-event) asserts, the second argument of atoms in the head of rules ranges over all states greater than that where the rule was asserted. For event asserts, the second argument is instantiated with the index of the update state where the event was asserted. This guarantees that the event is only true when queried in that state (it does not remain, by inertia, in subsequent query states).
⁶ The system, running under XSB-Prolog, is available at http://centria.di.fct.unl.pt/~jja/lups.p. Rather than stable models, the well-founded semantics is used.
Inertia rules are added to allow for the usage of rules asserted in states before the query one. Such rules say that one way to prove L at state s with query state t is by proving L at state s − 1 with the same query state (unless its complement is proven at state s, thus blocking the inertia of L). Literals in the body of asserted rules are translated such that both arguments are instantiated with the query state. This guarantees that body literals are always evaluated in the query state. Literals in the when clause have both arguments instantiated with the state prior to that at which the rule was asserted. This guarantees that those literals are always evaluated considering that state as the query state.

Definition 5.1 (Translation of update programs). By the translation of an update program U = U1 ⊗ . . . ⊗ Un in the language L, Tr(U), we mean the normal logic program consisting of the following rules, in the meta-language above:

Default knowledge state rules. For all objective atoms A ∈ L, and t ≥ 0:

  ¬A(0, t)

These rules state that in the initial state all objective atoms are false.

Update rules. For all objective atoms A ∈ L, and s, t ≥ 0:

  A(s, t) ← A^u(s, t)
  ¬A(s, t) ← ¬A^u(s, t)

These update rules state that A is true (resp. false) at state s if A (resp. not A) is true due to the update program at state s.

Inertia rules. For all objective atoms A ∈ L, and s, t > 0:

  A(s, t) ← A(s − 1, t), not ¬A^u(s, t)
  ¬A(s, t) ← ¬A(s − 1, t), not A^u(s, t)

Inertia rules say that A is true (resp. false) if it is true (resp. false) in the previous state and its complement is not true due to the update at s.

Translation of asserts. For all update commands assert L ← B1, . . . , not Bk when C1, . . . , not Cm ∈ Us, for any 1 ≤ s ≤ n and t > s:

  rule^u_{L←B1,...,not Bk}(s + 1, t) ← C1(s, s), . . . , ¬Cm(s, s)
  TL ← B1(t, t), . . . , ¬Bk(t, t), rule_{L←B1,...,not Bk}(t, t), C1(s, s), . . . , ¬Cm(s, s)

where TL = A^u(s + 1, t) if L is an objective atom A, and TL = ¬A^u(s + 1, t) if L is a default atom not A. The rule L ← B1, . . . , not Bk is added at state s + 1 provided condition C1, . . . , not Cm holds at state s (considering only states up to s). It will remain true by inertia for all t ≥ s + 1 unless the literal rule_{L←B1,...,not Bk}(t, t) becomes false. Moreover, beginning at state s + 1, rule_{L←B1,...,not Bk}(t, t) is true (and so remains by inertia).
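The interplay of the default, update, and inertia rules can be illustrated with a small Python sketch. This is our illustration, not the paper's system: it collapses the two-index scheme to a single state index, ignores rule bodies, and takes as given the sets of states at which A^u resp. ¬A^u were derived.

```python
def truth(s, asserted_true, asserted_false):
    """Truth of atom A at update state s. asserted_true / asserted_false are
    the (hypothetical) sets of states where A^u resp. neg-A^u were derived."""
    if s == 0:
        return False                 # default knowledge state rule: not-A(0, t)
    if s in asserted_true:
        return True                  # update rule: A(s, t) <- A^u(s, t)
    if s in asserted_false:
        return False                 # update rule: not-A(s, t) <- not-A^u(s, t)
    # inertia rules: carry the previous state's value forward
    return truth(s - 1, asserted_true, asserted_false)

# A asserted at state 2 and asserted false at state 4:
values = [truth(s, {2}, {4}) for s in range(6)]
```

The atom is false by default, becomes true at state 2, persists by inertia through state 3, and is false again from state 4 on.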
Translation of retracts. For all update commands retract L ← B1, . . . , not Bk when C1, . . . , not Cm ∈ Us, for any 1 ≤ s ≤ n and t > s:

  ¬rule^u_{L←B1,...,not Bk}(s + 1, t) ← C1(s, s), . . . , ¬Cm(s, s)
  cancel^u_{L←B1,...,not Bk}(s + 1, t) ← C1(s, s), . . . , ¬Cm(s, s)
The rule L ← B1, . . . , not Bk is retracted at state s + 1 provided that condition C1, . . . , not Cm holds at state s. Retractions also cancel persistent update rules.

Translation of persistent asserts. For all update commands always L ← B1, . . . , not Bk when C1, . . . , not Cm ∈ Us, for any 1 ≤ s ≤ n and t, q + 1 > s:

  ¬cancel^u_{L←B1,...,not Bk}(s + 1, t) ←
  rule^u_{L←B1,...,not Bk}(q + 1, t) ← C1(q, q), . . . , ¬Cm(q, q), ¬cancel_{L←B1,...,not Bk}(q + 1, q + 1)
  TL ← B1(t, t), . . . , ¬Bk(t, t), rule_{L←B1,...,not Bk}(t, t), ¬cancel_{L←B1,...,not Bk}(q + 1, q + 1), C1(q, q), . . . , ¬Cm(q, q)

where TL = A^u(q + 1, t) if L is an objective atom A, and TL = ¬A^u(q + 1, t) if L is a default atom not A. The rule L ← B1, . . . , not Bk is added at every state q + 1 greater than s, provided condition C1, . . . , not Cm holds at the respective state q, and will remain true by inertia for all t > q, unless retracted or cancelled.

Translation of cancellation rules. For all update commands cancel L ← B1, . . . , not Bk when C1, . . . , not Cm ∈ Us, for any 1 ≤ s ≤ n and t > s:

  cancel^u_{L←B1,...,not Bk}(s + 1, t) ← C1(s, s), . . . , ¬Cm(s, s)
The persistent update of rule L ← B1, . . . , not Bk is cancelled at state s + 1 provided condition C1, . . . , not Cm holds at state s.

Translation of assert events. For all update commands assert event L ← B1, . . . , not Bk when C1, . . . , not Cm ∈ Us, for any 1 ≤ s ≤ n:

  TL ← B1(s + 1, s + 1), . . . , ¬Bk(s + 1, s + 1), C1(s, s), . . . , ¬Cm(s, s)

where TL = A^u(s + 1, s + 1) if L is an objective atom A, and TL = ¬A^u(s + 1, s + 1) if L is a default atom not A. The rule L ← B1, . . . , not Bk is added at state s + 1, but does not remain true through inertia.
Translation of retract events. For all update commands retract event L ← B1, . . . , not Bk when C1, . . . , not Cm ∈ Us, for any 1 ≤ s ≤ n:

  ¬rule^u_{L←B1,...,not Bk}(s + 1, s + 1) ← C1(s, s), . . . , ¬Cm(s, s)
  cancel^u_{L←B1,...,not Bk}(s + 1, s + 1) ← C1(s, s), . . . , ¬Cm(s, s)
The rule L ← B1, . . . , not Bk is retracted at state s + 1 under the conditions. The retraction does not remain in effect through inertia.

Translation of persistent assert events. For all update commands always event L ← B1, . . . , not Bk when C1, . . . , not Cm ∈ Us, for any 1 ≤ s ≤ n and t, q + 1 > s:

  ¬cancel^u_{L←B1,...,not Bk}(s + 1, t) ←
  TL ← B1(q + 1, q + 1), . . . , ¬Bk(q + 1, q + 1), ¬cancel_{L←B1,...,not Bk}(q + 1, q + 1), C1(q, q), . . . , ¬Cm(q, q)

where TL = A^u(q + 1, q + 1) if L is an objective atom A, and TL = ¬A^u(q + 1, q + 1) if L is a default atom not A.

The translation of update program queries is similar to the translation of conditions in update commands:

Definition 5.2 (Translation of queries). Let Q = holds(B1, . . . , Bk, not C1, . . . , not Cm) at q be a query to an update program U with language L. The translation of Q, Tr(Q), is:

  B1(q, q), . . . , Bk(q, q), ¬C1(q, q), . . . , ¬Cm(q, q)

Theorem 5.1 (Correctness of the translation). Let U be an update program. A query Q is true in U iff Tr(U) |=sm Tr(Q).
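Since the query translation is purely syntactic, it admits a direct Python sketch. This is our illustration only; the neg_ prefix is our textual rendering of the complement atoms of the meta-language.

```python
def translate_query(literals, q):
    """Map the literals of a query 'holds(..., not C, ...) at q' to
    meta-language atoms indexed by the query state q (both indexes)."""
    out = []
    for lit in literals:
        if lit.startswith("not "):
            # negative query literals become complement atoms
            out.append(f"neg_{lit[4:]}({q}, {q})")
        else:
            out.append(f"{lit}({q}, {q})")
    return out

tq = translate_query(["b1", "b2", "not c1"], 3)
```

For the query holds(b1, b2, not c1) at 3 this produces the three meta-atoms indexed by state 3.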
6   Comparisons
The language defined in this paper shares some similarities (and common motivations) with "Action Languages".⁷ Indeed, both LUPS and "Action Languages" specify how knowledge changes as an effect of actions or commands. Thus, a comparison with such languages would be in order. However, lack of space prevents us from carrying out such comparisons here. Some work in this area, as well as further examples of the usage of LUPS, can be found in [3].
⁷ For an overview of action languages see [6].
Let it be said that the most notable difference between LUPS and "Action Languages" is that while the latter deal only with updates of propositional knowledge states, LUPS updates knowledge states that consist of knowledge rules, i.e. the outcome of a LUPS update is not a simple set of propositional literals but rather a set of rules. Thus, inertia applies to knowledge rules, not just to propositional fluents. In addition, our approach makes it easier to specify so-called "static laws", and to deal with indirect effects of actions. Moreover, with LUPS, "static laws" need not be forever static: the laws that allow for indirect effects can themselves be subject to change. Another issue easily dealt with in LUPS is that of simultaneous actions. However, unlike most "Action Languages", LUPS does not cater for nondeterministic actions. Indeed, in LUPS, updates are just linear sequences of sets of commands.
Acknowledgements

This work was partially supported by PRAXIS XXI project MENTAL, by FCT project ACROPOLE, by the National Science Foundation grant #IRI-9313061, and by a NATO grant for L. M. Pereira to visit Riverside.
References

1. J. J. Alferes, J. A. Leite, L. M. Pereira, H. Przymusinska, and T. Przymusinski. Dynamic logic programming. In A. Cohn and L. Schubert, editors, KR'98. Morgan Kaufmann, 1998.
2. J. J. Alferes and L. M. Pereira. Update-programs can update programs. In J. Dix, L. M. Pereira, and T. Przymusinski, editors, NMELP'96. Springer, 1996.
3. J. J. Alferes, L. M. Pereira, T. Przymusinski, H. Przymusinska, and P. Quaresma. Preliminary exploration on actions as updates. In Joint Conference on Declarative Programming, AGP'99, 1999. To appear.
4. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In R. Kowalski and K. A. Bowen, editors, ICLP'88. MIT Press, 1988.
5. M. Gelfond and V. Lifschitz. Answer sets in general non-monotonic reasoning (preliminary report). In B. Nebel, C. Rich, and W. Swartout, editors, KR'92. Morgan Kaufmann, 1992.
6. M. Gelfond and V. Lifschitz. Action languages. Linköping Electronic Articles in Computer and Information Science, 3(16), 1998.
7. H. Katsuno and A. Mendelzon. On the difference between updating a knowledge base and revising it. In J. Allen, R. Fikes, and E. Sandewall, editors, KR'91. Morgan Kaufmann, 1991.
8. F. Lin. Embracing causality in specifying the indirect effects of actions. In IJCAI'95, pages 1985–1991. Morgan Kaufmann, 1995.
9. V. Marek and M. Truszczynski. Revision specifications by means of programs. In C. MacNish, D. Pearce, and L. M. Pereira, editors, JELIA'94. Springer, 1994.
10. T. Przymusinski and H. Turner. Update by means of inference rules. In V. Marek, A. Nerode, and M. Truszczynski, editors, LPNMR'95. Springer, 1995.
11. M. Winslett. Reasoning about action using a possible models approach. In AAAI'88, 1988.
Pushing Goal Derivation in DLP Computations⋆

Wolfgang Faber, Nicola Leone, and Gerald Pfeifer

Institut für Informationssysteme, TU Wien, A-1040 Wien, Austria
{faber,leone,pfeifer}@dbai.tuwien.ac.at
Abstract. dlv is a knowledge representation system, based on disjunctive logic programming, which offers front-ends to several advanced KR formalisms. This paper describes new techniques for the computation of answer sets of disjunctive logic programs that have been developed and implemented in the dlv system. These techniques try to "push" the query goals into the process of model generation (query goals are often present either explicitly, as in planning and diagnosis, or implicitly, in the form of integrity constraints). This way, many useless models are discarded "a priori", and the computation converges rapidly toward the generation of the "right" answer set. A few preliminary benchmarks show dramatic efficiency gains due to the new techniques.

Keywords: Disjunctive Logic Programming, Algorithms, Heuristics.
1   Introduction
dlv is a knowledge representation system, based on disjunctive logic programming (DLP) [13,8], which offers front-ends to several advanced KR formalisms [5,4,2]. A strong point of dlv is its highly expressive language, which allows one to represent very hard problems (even Σ₂ᴾ-hard problems) in an elegant and natural fashion. An efficient support for such an expressive language requires the use of smart algorithms and optimization techniques that are able to deal with hard computational tasks. This paper describes new techniques for the computation of disjunctive logic programs that have been developed and implemented in the dlv system. We start from the observation that the DLP encoding of problems, and particularly of AI problems, often contains query goals. Frequently, these goals are explicit. In the DLP encoding of planning problems [6,11], for instance, we usually look for a particular answer set where a query goal, representing the desired evolution of the system, is true. Similarly, in abductive diagnosis we look for answer sets where the observation (encoded as a query goal) is true [2]. Sometimes the query goals are implicitly expressed by integrity constraints [14,12]. Consider, for instance, the program Php in Figure 1, which we will use
⋆ This work was supported by FWF (Austrian Science Funds) under the projects P11580-MAT and Z29-INF.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 177–191, 1999. © Springer-Verlag Berlin Heidelberg 1999
(i)   reached(X) :- start(X).
(ii)  reached(X) :- reached(Y), inPath(Y,X).
(iii) inPath(X,Y) ∨ outPath(X,Y) :- arc(X,Y).
(iv)  :- inPath(X,Y), inPath(X,Y1), Y <> Y1.
      :- inPath(X,Y), inPath(X1,Y), X <> X1.
(v)   :- node(X), not reached(X).

Fig. 1. The Hamiltonian path program Php
as a running example throughout the paper. The program computes the Hamiltonian paths of a directed graph starting at a given node. (Recall that a Hamiltonian path of a directed graph is a path through that graph that reaches each node exactly once.) In our example, the graph is given by relations arc(X,Y) and node(X), denoting its arcs and nodes, respectively, plus a starting node start(X). Rule (iii) "guesses" a set S of arcs of the graph. Constraints (iv) enforce a path property on S. Constraint (v) imposes that all nodes are reached by S. The latter constraint is like a query reached(1), ..., reached(n)?, where 1..n are the nodes of the input graph,¹ asking to compute an answer set in which all these atoms are true.²

The techniques proposed in this paper try to "push" query goals into the DLP computation, driving it towards the generation of the desired answer sets. The contributions of the paper are the following:

– We define a new truth value, called must-be-true (mbt), for the partial interpretations we use during the computation. Intuitively, the mbt truth value is assigned to atoms that cannot be derived from any program rule at the current computation step, but that must eventually be true in the answer set to be computed (e.g., in the example above the atom reached(k) is mbt, for each node k of the graph). These atoms are not immediately taken as true in order to guarantee the "supportedness" of the interpretation at hand (i.e., that every true atom is derivable from some program rule). Supportedness is an important peculiarity of the logic programming semantics, and ensuring this property is a main difference between SAT checkers and logic programming systems.
– We provide a new function det cons(I) which extends an interpretation I at hand in a deterministic way, such that every answer set containing I also contains its extension computed by det cons.
This function "propagates" mbt truth values as much as possible, also performing a sort of backward chaining when it is possible in a deterministic way (e.g., undefined atoms a1, ..., an become mbt if b :- a1, ..., an is the only rule having the mbt atom b in the head).

¹ Queries are indeed translated into constraints in dlv.
² Note that the starting node is not strictly required to be the end node of an arc of the Hamiltonian path in this encoding.
– Most importantly, we propose a new heuristic function, driving the computation according with the criterion of “finding support” for the mbt atoms (i.e., making them true through a program rule deriving them). – We implement the above ideas in the dlv system, and report some results of experiments. Even if the results are preliminary, and further experiments have to be done, these experimental results undoubtedly evidence the usefulness of the proposed techniques, through a dramatic performance gain on relevant problems. It is worthwhile noting that the function det cons employs a disjunctive extension of Fitting’s operator [7] and extends to the disjunctive case a number of techniques already used in smodels [15] and XSB [1] for the computation of disjunction-free logic programs. The main contribution of the paper is the definition and the experimentation of the heuristics based on the new notion of mbt; we are not aware of any similar heuristics for nonmonotonic systems.
2   The Language of dlv
An atom is an expression of the form a(t1, . . . , tf), where a is the predicate name of the atom and each ti (1 ≤ i ≤ f) is either a constant or a variable. A literal is an expression of the form L or not L, where not is the negation as failure symbol and L is an extended atom, i.e., either an atom A, or an atom preceded by the strong negation symbol '−' (−A). A dlv program is a set of rules and constraints. A rule r has the form

  a1 ∨ . . . ∨ am :- b1, . . . , bk, not bk+1, . . . , not bn

where m ≥ 1 and n ≥ k ≥ 0. The part to the left of :- is called the head of the rule, the part to the right is the body. We denote by H(r) resp. B(r) the set of literals occurring in the head resp. body of r; B+(r) (resp. B−(r)) denotes the subset of positive (resp. negative) literals in B(r) (for the rule above we have H(r) = {a1, . . . , am}, B+(r) = {b1, . . . , bk}, B−(r) = {not bk+1, . . . , not bn}, and B(r) = B+(r) ∪ B−(r)). A constraint is a rule with an empty head (m = 0). Ground queries of the form b1, . . . , bk, not bk+1, . . . , not bn? are allowed and are translated to constraints internally. For example, the query a, not b? will generate the two constraints :- not a. and :- b. The semantics of a dlv program is given by its consistent answer sets [8].

It is worth noting that, in dlv computations, integrity constraints are simply treated as rules always having a false head. Moreover, every strongly negated atom −p(t̄) is replaced by a new atom p′(t̄), where p′ is a fresh predicate symbol. For any such new atom p′(t̄), a constraint :- p(t̄), p′(t̄) is added to the program. Thus, though the system takes as input programs written in extended disjunctive DLP, they are treated internally as "traditional" disjunctive DLP programs.
Deriving Deterministic Consequences
In this section, we describe the function det cons which, given a ground program P and a (partial) interpretation I, derives certain knowledge concerning the
180
W. Faber, N. Leone, and G. Pfeifer
literals which are not yet decided in I. In other words, det cons extends the interpretation I in a deterministic way, such that every answer set containing I also contains its extension computed by det cons. This function is crucial for the efficiency of the system: the larger the interpretation it derives, the smaller the remaining search space. During the computation, we deal with four-valued interpretations, where we consider the following truth values: true (T), must-be-true (M or mbt), undefined (U), and false (F). Negation acts as follows on these truth values: not T = F , not M = F , not U = U , and not F = T . We associate the total ordering T > M > U > F to the truth values. An interpretation I for P is a total mapping from BP to {T, M, U, F }. We denote by I T (resp. I M , I U , I F ) the set of atoms whose truth value according to I is true (resp. must-be-true, undefined, false). Intuitively, the mbt truth value is assigned to atoms that cannot be derived from any program rule at the current computation step, but must eventually be true in the answer set to be computed. These atoms are not immediately taken as true in order to guarantee the “supportedness” of the interpretation at hand. This is a main peculiarity of answer sets w.r.t. ordinary models: Any atom p belonging to an answer set of P has a rule which supports p. Formally, p is true w.r.t. a given answer set S if and only if there exists a rule r ∈ P such that p ∈ H(r), H(r) − {p} is f alse w.r.t. S, and B(r) is true w.r.t. S − {p}. Thus, enforcing supportedness is a principal difference between DLP systems and satisfiability solvers (like, e.g., the Davis-Putnam procedure). Given an interpretation I for P, the function valI is defined as follows: For any atom p ∈ BP , valI (p) = I(p), and valI (not p) = not valI (p). Accordingly, for a ground rule r ∈ P, we define valI (H(r)) (resp., valI (B(r))) as the maximum (minimum) value assigned by valI over the literals in H(r) (B(r)). 
If H(r) = ∅, i.e., r is a constraint, then valI (H(r)) = F . r is satisfied w.r.t. I if the truth value of its head is not less than the truth value of its body, i.e. valI (H(r)) ≥ valI (B(r)). Moreover, for every atom p, we define support(p) as the set of rules in ground(P) such that valI (H(r) − {p}) < M and valI (B(r)) > F , i.e., the rules which can be potentially used to derive the truth of p “starting” from the interpretation I. |support(p)| denotes the cardinality of support(p). The procedure det cons is shown in Figure 2. Given an interpretation I for P, it extends I by what we call the deterministic consequences of I w.r.t. P. It can assign F , M or T to any undefined atom, but can only assign T to mbt atoms. det cons also detects inconsistencies, e.g. if some mbt atom should get the value F . As long as det cons has modified the interpretation I for the ground program P, the Boolean variable modif ied is true at the end of the repeat loop of Step 2. If any inconsistency is detected, the procedure immediately aborts by means of an exit instruction. Steps 4–12 focus on rules which are not satisfied (in the 4-valued sense described above). If the head of the rule is false and its body is either true or mbt, then the procedure exit returning contradiction = true, because there is no way
Procedure det cons(P: Program; var I: Interpretation; var contradiction: Boolean)
(* Computes the deterministic consequences for P w.r.t. I *)
var modified: Boolean;
begin
(1)   contradiction := false;
(2)   repeat
(3)     modified := false;
        (* Enforce satisfaction of all rules *)
(4)     for each rule r ∈ ground(P) not satisfied w.r.t. I do
(5)       if valI(B(r)) ≥ M and valI(H(r)) = F then
(6)         contradiction := true; exit procedure;
(7)       else if valI(B(r)) ≥ M and valI(H(r) − {p}) = F for a p ∈ H(r) then
(8)         I(p) := valI(B(r)); modified := true;
(9)       if valI(H(r)) = F and valI(B(r) − {L}) ≥ M for some undefined literal L ∈ B(r) then
(10)        modified := true;
(11)        if L is a positive literal p then I(p) := F;
(12)        else (* L is a negative literal not p *) I(p) := M;
        end for;
        (* Ensure supportedness *)
(13)    if |support(p)| = 0 and I(p) ≥ M for some atom p then
(14)      contradiction := true; exit procedure;
(15)    for each atom p s.t. I(p) = U and |support(p)| = 0 do
(16)      I(p) := F;
(17)    for each atom p s.t. I(p) ≥ M and |support(p)| = 1 do
          Let r be the (unique) rule in support(p);
(18)      for each undefined atom q ∈ (H(r) − {p}) do
(19)        I(q) := F; modified := true;
(20)      for each undefined positive literal q ∈ B(r) do
(21)        I(q) := M; modified := true;
(22)      for each undefined negative literal not q ∈ B(r) do
(23)        I(q) := F; modified := true;
        end for;
(24)  until not modified
end procedure
Fig. 2. Function for computing the deterministic consequences
to satisfy r (recall that true and false atoms cannot be changed and mbt can evolve only into true). Steps 7–12 enforce the satisfaction of a rule r ∈ P if this can be done deterministically, i.e., by changing the value of exactly one literal occurring in r. Consider Steps 7–8: if the truth value of B(r) according to I is X, where X is at least M, i.e., mbt or true, and every atom in the head of r is false, except for one atom p, we can draw a deterministic consequence. We enforce the satisfaction of r by incrementing the truth value of p up to the value of B(r).
For instance, if p is either undefined or mbt w.r.t. I and valI(B(r)) = T, then I is modified by assigning the value T to p, denoted by I(p) := T in the algorithm. Note that this is the only step which can assign the value true to an atom, that is, det cons assigns the value true to an atom p only if p is "supported". Now, consider Steps 9–12: if the head of r is false, but its body is at least mbt, except for one undefined literal L, then L should get the truth value false in order to satisfy r. Note that, if L is a negative literal not p, this is accomplished by setting I(p) := M. Indeed, declaring p true would not guarantee the "supportedness" of this atom.

Steps 13–23 draw deterministic conclusions following the "supportedness" principle; they are a main novelty of our approach. In particular, if a true or mbt atom has no supporting rule according to the interpretation I, then we get a contradiction (Step 14), while an undefined atom without any supporting rule can be declared false (Steps 15–16). If a true or mbt atom p has only one supporting rule r, i.e., support(p) = {r}, then r must be able to derive the truth of p. Thus, we enforce that p is derivable from r by assigning suitable truth values to every undefined literal occurring in r (see Steps 18–23). This is a sort of backward propagation step: from the truth of the head we derive that all body literals must be true.

Example 1. Consider the program from Figure 1 applied to (the encoding of) the graph of Figure 3, starting with the "empty" interpretation I, where I^T = I^M = I^F = ∅ and I^U = BP. By rule (i), reached(a) is immediately derived (by Steps 4–8). The constraint (v) essentially serves as a query that assures that all nodes are indeed reached. As node(X) is true for all nodes, reached(X) is derived as mbt by means of Steps 9–12, for each X ∈ {a, b, c, d, e}. The mbt atom reached(b) is only derivable by a single ground instance³ of rule (ii), namely reached(b) :- reached(a), inPath(a,b).
At this point, the backward propagation step described above comes into play and sets inPath(a,b) to mbt (lines 17–21). Then, support(outPath(a,b)) becomes empty, since the only rule with outPath(a,b) in the head also contains the mbt atom inPath(a,b). Thus, outPath(a,b) is derived as false (lines 15–16). In turn, this causes Steps 4–8 to derive inPath(a,b) as true. Now we easily derive reached(b) as true from (ii). Also inPath(c,d) and outPath(c,d) are derived as true and false, respectively, in analogy to inPath(a,b) and outPath(a,b). Each node in a Hamiltonian path has exactly one outgoing arc, and indeed Steps 9–12 derive inPath(a,c) and inPath(a,e) as false, which then leads to outPath(a,c) and outPath(a,e) being set to true. Now we derive inPath(b,c) and inPath(d,e) as true in the same way we derived inPath(a,b) above, and eventually we are able to obtain reached(c), reached(d), and reached(e). That is, starting from an "empty" interpretation, a single invocation of det cons has deterministically and efficiently found the Hamiltonian path for this graph.

³ Note that the instantiation procedure of dlv generates only ground rules that are constructible from the facts in the input ([4]).
Fig. 3. Example graph 1
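The guess-and-check reading of Php can be cross-checked with a brute-force Python analogue. This is our illustration only (exponential, and not how dlv computes answer sets): it enumerates arc subsets (rule iii), enforces the functionality constraints (iv), computes reachability (rules i–ii), and applies constraint (v).

```python
from itertools import combinations

def hamiltonian_paths(nodes, arcs, start):
    sols = []
    for k in range(len(arcs) + 1):
        for in_path in combinations(arcs, k):
            # (iv): no node may have two outgoing or two incoming path arcs
            if len({x for x, _ in in_path}) < len(in_path):
                continue
            if len({y for _, y in in_path}) < len(in_path):
                continue
            # (i)+(ii): reached = closure of the start node under in_path
            reached, frontier = {start}, [start]
            while frontier:
                x = frontier.pop()
                for a, b in in_path:
                    if a == x and b not in reached:
                        reached.add(b)
                        frontier.append(b)
            # (v): every node must be reached
            if reached == set(nodes):
                sols.append(set(in_path))
    return sols

paths = hamiltonian_paths({"a", "b", "c"}, [("a", "b"), ("b", "c"), ("a", "c")], "a")
```

On this toy graph (our own, not Figure 3) the only surviving guess is the path a → b → c.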
Recall that a total model S for P is an interpretation satisfying the following properties: (i) every rule of P is satisfied w.r.t. S, and (ii) S^M ∪ S^U = ∅, i.e., every atom is either true or false w.r.t. S. The partial order ≼ on the four truth values is defined through the following relationships: U ≼ F, U ≼ M, U ≼ T, and M ≼ T; moreover, X ≼ X for any X ∈ {F, U, M, T}. We say that an interpretation I′ extends an interpretation I if, for each atom p, I(p) ≼ I′(p). Intuitively, I′ represents more concrete knowledge than I does. The correctness of our algorithm relies on the following property of det cons.

Theorem 1. Let P and I be the program and the interpretation, resp., given as inputs to det cons, and denote by I′ and contradiction′ the values of the variables I and contradiction, resp., at the end of the procedure. Then:
1. I′ extends I;
2. every answer set S for P which extends I also extends I′;
3. if contradiction′ holds, then no answer set S′ for P extends I.

It is worthwhile noting that det cons has been implemented very carefully in our system. By using sophisticated data structures for representing rules and interpretations, it runs in linear time, i.e., in time O(‖P‖ + ‖I‖), where ‖·‖ denotes the size of an object.
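The four-valued machinery of this section (the total ordering, the negation table, and valI over heads and bodies) can be rendered as a small Python sketch. The string encoding of atoms and literals is our own, purely for illustration.

```python
ORDER = {"F": 0, "U": 1, "M": 2, "T": 3}      # total ordering T > M > U > F
NOT = {"T": "F", "M": "F", "U": "U", "F": "T"}  # negation on the four values

def val(I, lit):
    # literals are "p" or "not p"; I maps atoms to "T", "M", "U" or "F"
    return NOT[I[lit[4:]]] if lit.startswith("not ") else I[lit]

def val_head(I, head):
    # maximum value over the head; F for an empty head (a constraint)
    return max((val(I, l) for l in head), key=ORDER.get, default="F")

def val_body(I, body):
    # minimum value over the body; T for an empty body
    return min((val(I, l) for l in body), key=ORDER.get, default="T")

def satisfied(I, head, body):
    return ORDER[val_head(I, head)] >= ORDER[val_body(I, body)]

I = {"a": "U", "b": "F", "c": "T", "d": "F"}
```

For the rule a ∨ b :- c, not d under this interpretation, the head evaluates to U against a body value of T, so the rule is not yet satisfied.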
4 Overall Model Generation Algorithm
In this section we briefly review the model generation algorithm (Model Generator) of the dlv system, in order to show how the new notion of mbt and the new function det_cons are employed. The Model Generator (MG) produces a set of interpretations that are "candidates" for answer sets, which are then submitted to the Model Checker for verification. The Model Generator essentially relies on a backtracking technique which explores the search space for computing all answer sets.
W. Faber, N. Leone, and G. Pfeifer
Basically, the MG works as follows: (1) derive what is deterministically derivable from the program, (2) make an "educated" guess for one of those literals which have not been decided yet, and (3) propagate the consequences of this choice. This process is applied recursively until either a contradiction arises or no further guess can be made. In the former case, MG backtracks and modifies the last choice; in the latter case, we have an answer set candidate and the Model Checker is called. If the candidate is not an answer set, backtracking is performed.

To formalize what we have called an "educated guess" above, we introduce the concept of a possibly-true (PT) literal:

Definition 1. Let I be an interpretation for P. A positive PT literal of P w.r.t. I is a positive literal p such that U ≤ I(p) ≤ M and there exists a rule r ∈ ground(P) for which all of the following conditions hold:
1. p ∈ H(r);
2. val_I(H(r)) < T (i.e., the head is not true w.r.t. I);
3. val_I(B(r)) = T (i.e., the body is true w.r.t. I).

A negative PT literal of P w.r.t. I is an undefined negative literal not q such that there exists a rule r ∈ ground(P) for which all of the following conditions hold:
1. not q ∈ B⁻(r);
2. val_I(H(r)) < T (i.e., the head is not true w.r.t. I);
3. val_I(B⁺(r)) = T (i.e., the positive body is true w.r.t. I);
4. val_I(B⁻(r)) ≥ U (i.e., no negative literal of the body is false w.r.t. I).
The set of all (positive and negative) PT literals of P w.r.t. I is denoted by PT_P(I).

Example 2. Consider the program P = {a ∨ b :- c, not d.   e :- c, not f.} and let I = {c, not d} be an interpretation for P. Then we have three PT literals of P w.r.t. I: a, b and not f.

The actual algorithm for computing answer sets is shown in Figure 4. There, isAnswerSet is a function which returns true iff I^T is an answer set for P. It is worth noting that the essence of the MG, based on the notion of PT, has not significantly changed w.r.t. previous versions; the reader is referred to [10,3,4] for further details on the other features.
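Definition 1 can be sketched in a few lines of Python. This is illustrative only; the rule and interpretation representations are our own, not dlv's:

```python
# A ground rule is (head, pos_body, neg_body); an interpretation maps each
# atom to "T", "F", "M" (must-be-true) or "U" (undefined).

def body_true(rule, I):
    head, pos, neg = rule
    return all(I[a] == "T" for a in pos) and all(I[a] == "F" for a in neg)

def head_true(rule, I):
    return any(I[a] == "T" for a in rule[0])

def pt_literals(program, I):
    pts = set()
    for rule in program:
        head, pos, neg = rule
        if head_true(rule, I):
            continue                      # condition: head not true
        if body_true(rule, I):
            # positive PT: undefined or mbt head atom of a rule whose body is true
            pts |= {a for a in head if I[a] in ("U", "M")}
        elif all(I[a] == "T" for a in pos) and all(I[a] != "T" for a in neg):
            # negative PT: "not q" with q undefined, positive body true,
            # and no negative body literal false
            pts |= {("not", q) for q in neg if I[q] == "U"}
    return pts

# Example 2 from the text: P = {a v b :- c, not d.   e :- c, not f.}
P = [(["a", "b"], ["c"], ["d"]), (["e"], ["c"], ["f"])]
I = {"a": "U", "b": "U", "c": "T", "d": "F", "e": "U", "f": "U"}
assert pt_literals(P, I) == {"a", "b", ("not", "f")}
```
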
5 Heuristics
In this section we focus on the question of how to select PT literals in line (6) of ComputeAnswerSets in Figure 4 such that the likelihood of finding an answer set is maximized.
Algorithm ComputeAnswerSets
Input: A ground DLP program P.
Output: The answer sets of P (if any).

Procedure ComputeAnswerSets(I: Interpretation)
(* The procedure outputs all answer sets of P *)
var Q: SetOfLiterals; L: Literal;
(1)  det_cons(P, I, contradiction);
(2)  if contradiction then exit procedure;
(3)  if (PT_P(I) = ∅) then   (* I^T ∪ I^M is a model of P *)
(4)      if (I^M = ∅) and isAnswerSet(P, I^T) then
(5)          output I^T;   (* I^T is an answer set *)
     else
(6)      Take a literal L from PT_P(I);   (* Assume the truth of a PT literal *)
(7)      if L is a negative literal not p then
(8)          I(p) := F;
         else   (* L is a positive literal *)
(9)          I(L) := T;
(10)     ComputeAnswerSets(I);
         (* At this point all answer sets containing I ∪ {L} have been generated *)
         (* L must be false in following computations *)
(11)     if L is a negative literal not p then
(12)         I(p) := M;
         else
(13)         I(L) := F;
(14)     ComputeAnswerSets(I);
end procedure

var I: Interpretation;
begin   (* Main *)
    I^T := ∅; I^F := ∅; I^M := ∅; I^U := B_P;
    ComputeAnswerSets(I);
end.

Fig. 4. Algorithm for the Computation of Answer Sets
To this end we employ so-called "lookahead": we temporarily assume the truth of one PT literal at a time⁴ and perform the deterministic derivations, i.e., we apply the det_cons function. On the basis of the changes which have been derived during this lookahead, we then decide which PT literal should be taken. (Note that the smodels system [15,16] also employs lookahead, but it uses a completely different heuristic.)

Definition 2. An mbt atom p is said to be of level n (w.r.t. an interpretation I) if |support(p)| = n (w.r.t. I).

⁴ For a negative literal this means assigning false to its atom.
During the lookahead we record the following counters for each PT literal p:

mbt⁻(p): the overall number of eliminated mbt atoms (mbt atoms which became true).
mbt⁺(p): the overall number of inserted mbt atoms (undefined atoms which became mbt).
mbt₂⁻(p): the number of eliminated mbt atoms of level 2.
mbt₂⁺(p): the number of inserted mbt atoms of level 2.
mbt₃⁻(p): the number of eliminated mbt atoms of level 3.
mbt₃⁺(p): the number of inserted mbt atoms of level 3.

The respective level is taken w.r.t. the interpretation at the moment the mbt atom is assigned true. In addition, we define some difference functions:

∆mbt(p) = mbt⁻(p) − mbt⁺(p)
∆mbt₂(p) = mbt₂⁻(p) − mbt₂⁺(p)
∆mbt₃(p) = mbt₃⁻(p) − mbt₃⁺(p)

Concerning the heuristic itself, we have defined a heuristic relation over the set of PT literals as follows:

Definition 3. Given two PT literals a and b, we define an ordering relation > as follows. If (mbt⁻(a) = 0 ∧ mbt⁻(b) > 0) ∨ (mbt⁻(a) > 0 ∧ mbt⁻(b) = 0), then a > b ⇔ mbt⁻(a) > mbt⁻(b); otherwise a > b holds if one of the following conditions applies:
1. ∆mbt(a) > ∆mbt(b)
2. ∆mbt₂(a) > ∆mbt₂(b) ∧ ∆mbt(a) = ∆mbt(b)
3. ∆mbt₃(a) > ∆mbt₃(b) ∧ ∆mbt(a) = ∆mbt(b) ∧ ∆mbt₂(a) = ∆mbt₂(b)

Further, let a = b be true if a ≯ b ∧ b ≯ a.

In other words, if exactly one of mbt⁻(a) and mbt⁻(b) is zero, we prefer the PT literal for which mbt⁻ is non-zero. Otherwise (i.e., both mbt⁻(a) and mbt⁻(b) are zero or both are non-zero), we prefer the one for which the overall number of mbt atoms becomes smaller. If this number is equal, we prefer the one for which the number of mbt atoms of level 2 becomes smaller. If also this number is equal, we use the number of mbt atoms of level 3. Otherwise, we consider them equal.

The reasoning behind this relation is that mbt atoms can be viewed as constraints which are not yet satisfied but eventually have to be satisfied in any answer set.
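The ordering relation of Definition 3 translates directly into a comparison function. The following Python sketch (names illustrative, not dlv code) checks it against the two top rows of Table 1:

```python
# Each PT literal carries the six lookahead counters mbt-, mbt+, mbt2-,
# mbt2+, mbt3-, mbt3+ recorded during propagation.

def better(a, b):
    """True iff a > b w.r.t. the heuristic relation of Definition 3."""
    # if exactly one of mbt-(a), mbt-(b) is zero, prefer the non-zero one
    if (a["mbt-"] == 0) != (b["mbt-"] == 0):
        return a["mbt-"] > b["mbt-"]
    da, db = a["mbt-"] - a["mbt+"], b["mbt-"] - b["mbt+"]
    if da != db:                      # 1. compare Delta_mbt
        return da > db
    d2a, d2b = a["mbt2-"] - a["mbt2+"], b["mbt2-"] - b["mbt2+"]
    if d2a != d2b:                    # 2. then Delta_mbt2
        return d2a > d2b
    d3a, d3b = a["mbt3-"] - a["mbt3+"], b["mbt3-"] - b["mbt3+"]
    return d3a > d3b                  # 3. then Delta_mbt3

# inPath(a,b) beats outPath(a,e): equal Delta_mbt (4), but Delta_mbt2 1 > 0
in_ab  = {"mbt-": 7, "mbt+": 3, "mbt2-": 1, "mbt2+": 0, "mbt3-": 0, "mbt3+": 0}
out_ae = {"mbt-": 8, "mbt+": 4, "mbt2-": 0, "mbt2+": 0, "mbt3-": 0, "mbt3+": 0}
assert better(in_ab, out_ae) and not better(out_ae, in_ab)
```
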
So the fewer mbt atoms there are, the smaller the distance to an answer set. Additionally, mbt atoms of level 2 and 3 are the ones which are the "hardest" to satisfy (observe that mbt atoms of level 1 are always derived by det_cons). The purpose of the test whether exactly one of mbt⁻(a) or mbt⁻(b) is zero is that in this case we want to avoid preferring a PT literal which only introduces
new mbt atoms but does not eliminate any, over one which eliminates some but introduces more (the former is like a "null action"). The guessing step in the Model Generator (line (6) of ComputeAnswerSets in Figure 4) takes a PT literal which is a maximum w.r.t. ≥.

Example 3. Consider again the program for computing Hamiltonian paths shown in Figure 1, now together with the encoding of the graph depicted in Figure 5 plus the fact start(a).

Fig. 5. Example graph 2 (a directed graph on nodes a–e)
By the first call to det_cons, only reached(a) is set to true, while reached(b), reached(c), reached(d), and reached(e) are assigned mbt because of the single-literal constraints obtained from (v) (see Appendix A). The choice rule (iii) is instantiated with the arcs (see Appendix A); these rules supply the PT literals (all of which are positive).

Note that the rules which define the predicate reached are instantiated in such a way that reached(n) occurs in the head of exactly two rules for each node n (apart from a). This is because each of these nodes has exactly two incoming arcs. Each of the reached(n) (n ∈ {b, . . . , e}) needs support, but it is not yet known which of the two rules will eventually supply it.

To evaluate the heuristic relation, we perform a lookahead: for each PT literal L, we assume L true, compute its deterministic consequences (by a call to det_cons), and store the values of the respective mbt counters.

Let us first consider the PT literal inPath(a,b): upon assuming it true, we immediately derive reached(b) as true, and thus eliminate an mbt atom of level 2 (since it occurs in the head of two unsatisfied rules). By statements (9)–(11) in det_cons we derive falsity for inPath(a,c), inPath(a,d), inPath(a,e), and inPath(d,b), reflecting the fact that no two arcs in the Hamiltonian path may begin in the same node or end in the same node (constraints (iv) in Figure 1). After that, for each of reached(c), reached(d), and reached(e) only one supporting rule is left, so we can infer that the yet undefined positive body literals of these rules (inPath(b,c), inPath(c,d), inPath(d,e)) are mbt. Moreover, since each of them occurs in the head of exactly one rule and the body of this rule is true, we infer them as true immediately afterwards, and eventually we also infer reached(c), reached(d), and reached(e) as true. These steps are visualized in Figure 6, where bold arcs are in the Hamiltonian path, while dashed arcs are not.
Fig. 6. Steps during lookahead for inPath(a,b)
In total, the deterministic derivation has generated 3 new mbt atoms (all of which have subsequently been derived as true) and eliminated 7 mbt atoms, one of which was of level 2. All PT literals and their corresponding heuristic-relevant function values, ordered by ≥, are shown in Table 1. Those which are not listed (inPath(a,c), inPath(a,d), outPath(b,c), outPath(c,d)) generate an inconsistency during propagation.

PT literal     mbt⁻  mbt⁺  mbt₂⁻  mbt₂⁺  mbt₃⁻  mbt₃⁺
inPath(a,b)      7     3     1      0      0      0
outPath(a,e)     8     4     0      0      0      0
outPath(d,b)     8     4     0      0      0      0
inPath(d,e)      7     3     0      0      0      0
inPath(a,e)      4     3     1      0      0      0
outPath(a,b)     5     4     0      0      0      0
inPath(d,b)      4     3     0      0      0      0
outPath(d,e)     5     4     0      0      0      0
outPath(a,c)     1     1     0      0      0      0
outPath(a,d)     1     1     0      0      0      0
inPath(b,c)      0     0     0      0      0      0
inPath(c,d)      0     0     0      0      0      0
Table 1. PT literals and their values, ordered by ≥
Thus, following the heuristics, the PT literal inPath(a,b) is chosen by our computation. Its propagation by det_cons then immediately leads to the computation of the Hamiltonian path. Thanks to the heuristics, only one choice was sufficient! Note that performing lookahead has an additional merit: if an inconsistency is detected during the propagation of a PT literal, we can set it to false and apply det_cons, thus pruning the search tree considerably.
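The lookahead-based selection just described can be sketched as follows. This is illustrative: `propagate` is a hypothetical callback standing in for det_cons, and for brevity the counters are collapsed to a single ∆mbt value rather than the full relation of Definition 3:

```python
def choose_pt(pts, propagate, better):
    """pts: candidate PT literals; propagate(L) -> (consistent, counters).

    Returns the chosen PT literal (a maximum w.r.t. the heuristic) and the
    literals whose propagation was inconsistent, which can be set to false
    right away, pruning the search tree.
    """
    best, falsified = None, []
    for lit in pts:
        consistent, counters = propagate(lit)
        if not consistent:
            falsified.append(lit)       # inconsistency: lit must be false
            continue
        if best is None or better(counters, best[1]):
            best = (lit, counters)
    return (best[0] if best else None), falsified

# toy lookahead results: literal -> (consistent, Delta_mbt)
table = {"p": (True, 4), "q": (True, 1), "r": (False, 0)}
chosen, false_lits = choose_pt(["p", "q", "r"],
                               lambda l: table[l],
                               lambda a, b: a > b)
assert chosen == "p" and false_lits == ["r"]
```
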
6 Some Experimental Results
We have conducted a number of experiments in order to show the usefulness of the various techniques introduced in this paper. To this end we have compared several versions of dlv. The first one is the release of February 10, 1999. It contains only a small part of det_cons, notably without the notion of mbt atoms and also without the part ensuring supportedness. Heuristics are not included either.
The second one is the release of April 6, 1999. It contains the fully implemented first part of det_cons, i.e., statements (4)–(12). The part ensuring supportedness is missing (as in the previous version), and heuristics are not yet included either. The third version is the release of May 28, 1999. It contains the full implementation of det_cons, but heuristics are not included. Finally, the fourth version is the previous one, enriched by heuristics. The public release of this version is dated June 8, 1999.

We have benchmarked a set of blocksworld instances, most of which are taken from [6] (except for P5). We use an encoding of the problem domain which is different from the one in [6], but which is also derived from an encoding in an action language. The domain encoding and the instances can be found in Appendix B.

[Figure: "Blocksworld Examples P1 to P5" — runtime in seconds (0–800) over instances 1–5, comparing four versions: without enhancements; with must-be-true (without supportedness); with full det_cons; with full det_cons and heuristics.]
Since we chose the Hamiltonian path problem as a running example, we have also picked a random graph with 25 nodes and 60 arcs⁵ and run the program of Figure 1 on it with an arbitrarily picked starting node (node 0). Version 1 could not find a Hamiltonian path within 1000 seconds, while version 2 found one in 716 seconds, and version 3 took 750 seconds. With heuristics enabled, dlv was able to find a path in 12.7 seconds!
References

1. W. Chen and D. S. Warren. Computation of Stable Models and Its Integration with Logical Query Processing. IEEE Transactions on Knowledge and Data Engineering, 8(5):742–757, 1996.
2. T. Eiter, W. Faber, N. Leone, and G. Pfeifer. The Diagnosis Frontend of the dlv System. AI Communications – The European Journal on Artificial Intelligence, 12(1–2):99–111, 1999.
3. T. Eiter, N. Leone, C. Mateis, G. Pfeifer, and F. Scarcello. A Deductive System for Nonmonotonic Reasoning. In Proc. LPNMR '97, pages 363–374.
4. T. Eiter, N. Leone, C. Mateis, G. Pfeifer, and F. Scarcello. Progress Report on the Disjunctive Deductive Database System dlv. In Proc. FQAS '98, pages 145–160.
5. T. Eiter, N. Leone, C. Mateis, G. Pfeifer, and F. Scarcello. The KR System dlv: Progress Report, Comparisons and Benchmarks. In Proc. KR '98, pages 406–417.
⁵ Generated by the Stanford GraphBase [9] using random_graph(25,60,0,0,0,0,0,1,1,60).
6. E. Erdem. Applications of Logic Programming to Planning: Computational Experiments. Unpublished draft, 1999.
7. M. Fitting. A Kripke-Kleene semantics for logic programs. Journal of Logic Programming, 2(4):295–312, 1985.
8. M. Gelfond and V. Lifschitz. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365–385, 1991.
9. D. E. Knuth. The Stanford GraphBase: A Platform for Combinatorial Computing. ACM Press, New York, 1994.
10. N. Leone, P. Rullo, and F. Scarcello. Disjunctive stable models: Unfounded sets, fixpoint semantics and computation. Information and Computation, 135(2):69–112, June 1997.
11. V. Lifschitz. Action Languages, Answer Sets and Planning. In K. Apt, V. W. Marek, M. Truszczyński, and D. S. Warren, editors, The Logic Programming Paradigm – A 25-Year Perspective, pages 357–373. Springer Verlag, 1999.
12. V. W. Marek and M. Truszczyński. Stable Models and an Alternative Logic Programming Paradigm. In K. Apt, V. W. Marek, M. Truszczyński, and D. S. Warren, editors, The Logic Programming Paradigm – A 25-Year Perspective, pages 375–398. Springer Verlag, 1999.
13. J. Minker. On Indefinite Data Bases and the Closed World Assumption. In Proc. CADE '82, pages 292–308.
14. I. Niemelä. Logic Programs with Stable Model Semantics as a Constraint Programming Paradigm. In Proceedings of the Workshop on Computational Aspects of Nonmonotonic Reasoning, May 1998.
15. I. Niemelä and P. Simons. Smodels – an implementation of the stable model and well-founded semantics for normal logic programs. In Proc. LPNMR '97, pages 420–429.
16. P. Simons. Towards constraint satisfaction through logic programs and the stable model semantics. Research Report A47, Digital Systems Laboratory, Department of Computer Science, Helsinki University of Technology, Finland.
A Instantiation of the Hamiltonian Path Program
Here are the rules of the Hamiltonian path program in Figure 1, instantiated with example graph 2 of Figure 5:

:- inPath(a,b), inPath(a,c).
:- inPath(a,b), inPath(a,d).
:- inPath(a,b), inPath(a,e).
:- inPath(a,c), inPath(a,b).
:- inPath(a,c), inPath(a,d).
:- inPath(a,c), inPath(a,e).
:- inPath(a,d), inPath(a,b).
:- inPath(a,d), inPath(a,c).
:- inPath(a,d), inPath(a,e).
:- inPath(d,b), inPath(d,e).
:- inPath(d,e), inPath(d,b).

:- inPath(a,b), inPath(d,b).
:- inPath(d,b), inPath(a,b).
:- inPath(a,c), inPath(b,c).
:- inPath(b,c), inPath(a,c).
:- inPath(c,d), inPath(a,d).
:- inPath(a,d), inPath(c,d).
:- inPath(d,e), inPath(a,e).
:- inPath(a,e), inPath(d,e).

:- not reached(b).
:- not reached(c).
:- not reached(d).
:- not reached(e).

inPath(a,b) ∨ outPath(a,b).
inPath(a,c) ∨ outPath(a,c).
inPath(a,d) ∨ outPath(a,d).
inPath(a,e) ∨ outPath(a,e).
inPath(b,c) ∨ outPath(b,c).
inPath(c,d) ∨ outPath(c,d).
inPath(d,b) ∨ outPath(d,b).
inPath(d,e) ∨ outPath(d,e).

reached(b) :- reached(a), inPath(a,b).
reached(b) :- reached(d), inPath(d,b).
reached(c) :- reached(a), inPath(a,c).
reached(c) :- reached(b), inPath(b,c).
reached(d) :- reached(c), inPath(c,d).
reached(d) :- reached(a), inPath(a,d).
reached(e) :- reached(d), inPath(d,e).
reached(e) :- reached(a), inPath(a,e).

B The Blocksworld Domain and Instances
% specification of the move action
move(B,L,T) v -move(B,L,T) :- block(B), location(L), actiontime(T), B <> L.

% the effects of moving a block
on(B,L,T1) :- move(B,L,T), #succ(T,T1).
-on(B,L,T1) :- move(B,_,T), on(B,L,T), #succ(T,T1).

% move preconditions
% a block can be moved only when it's clear
:- move(B,L,T), on(B1,B,T).
% if a block is moved onto another block, the latter must be clear
:- move(B,B1,T), on(B2,B1,T), block(B1).
% concurrent actions are not allowed
:- move(B,_,T), move(B1,_,T), B <> B1.
:- move(_,L,T), move(_,L1,T), L <> L1.

% inertia
on(B,L,T1) :- on(B,L,T), not -on(B,L,T1), #succ(T,T1).

% time at which actions can be initiated
actiontime(T) :- T < #maxint, #int(T).

% location definition (blocks are defined in the problem instances)
true.
location(t) :- true.
location(B) :- block(B).

[Figures: initial and goal block configurations for the instances P1–P5.]
Problem  blocks  steps
P1        4       4
P2        5       6
P3        8       8
P4       11       9
P5       11      11
Linear Tabulated Resolution for the Well-Founded Semantics

Yi-Dong Shen⋆¹, Li-Yan Yuan², Jia-Huai You², and Neng-Fa Zhou³

¹ Department of Computer Science, Chongqing University, Chongqing 400044, P.R. China. E-mail: [email protected]
² Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2H1. E-mail: {yuan, you}@cs.ualberta.ca
³ Department of Computer and Information Science, Brooklyn College, The City University of New York, New York, NY 11210-2889, USA. E-mail: [email protected]
Abstract. Global SLS-resolution and SLG-resolution are two representative mechanisms for top-down evaluation of the well-founded semantics of general logic programs. Global SLS-resolution is linear but suffers from infinite loops and redundant computations. In contrast, SLG-resolution resolves infinite loops and redundant computations by means of tabling, but it is not linear. The distinctive advantage of a linear approach is that it can be implemented using a simple, efficient stack-based memory structure like that in Prolog. In this paper we present a linear tabulated resolution for the well-founded semantics, which resolves the problems of infinite loops and redundant computations while preserving the linearity. For non-floundering queries, the proposed method is sound and complete for general logic programs with the bounded-term-size property.
1 Introduction
Two representative methods have been presented in the literature for top-down evaluation of the well-founded semantics of general logic programs: Global SLS-resolution [5,6] and SLG-resolution [2,3]. Global SLS-resolution is a direct extension of SLDNF-resolution [4], which treats infinite derivations as failed and infinite recursions through negation as undefined. Like SLDNF-resolution, it is linear in the sense that for any derivation G0 ⇒C1,θ1 G1 ⇒ ... ⇒Ci,θi Gi with Gi the latest generated goal, it makes the next derivation step either by expanding Gi, resolving a subgoal in Gi with a program clause, i.e. Gi ⇒Ci+1,θi+1 Gi+1, or by expanding Gi−1 via backtracking. The distinctive advantage of a linear approach is that it can be implemented using a simple, efficient stack-based memory structure (like that in Prolog). However, Global SLS-resolution inherits

⋆ Currently on leave at Department of Computing Science, University of Alberta, Canada. E-mail: [email protected]
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 192–205, 1999.
© Springer-Verlag Berlin Heidelberg 1999
from SLDNF-resolution two serious problems: infinite loops and redundant computations.

SLG-resolution (similarly, Tabulated SLS-resolution [1]) is a tabling mechanism for top-down evaluation of the well-founded semantics. The main idea of tabling is to store intermediate results of relevant subgoals and then use them to solve variants of the subgoals whenever needed. Since no variant subgoals will be recomputed by applying the same set of program clauses, infinite loops can be avoided and redundant computations can be substantially reduced. Like other existing tabling mechanisms, SLG-resolution adopts the solution-lookup mode: all nodes in a search tree/forest are partitioned into two subsets, solution nodes and lookup nodes. Solution nodes produce child nodes using program clauses, whereas lookup nodes produce child nodes using answers in the tables. As an illustration, consider the derivation p(X) ⇒Cp1,θ1 q(X) ⇒Cq1,θ2 p(Y). Assume that no answers of p(X) have been derived. Since p(Y) is a variant of p(X) and thus a lookup node, the next derivation step is to expand p(X) against a program clause, instead of expanding the latest generated goal p(Y). Clearly, such a derivation is not linear. Because of this non-linearity, SLG-resolution can neither be implemented using an efficient stack-based memory structure nor utilize useful strictly sequential operators such as cuts in Prolog. This has been evidenced by the fact that a well-known tabling system, XSB, which is an implementation of SLG-resolution [7,8,9], disallows clauses like p(.) ← ..., t(.), !, ... where t(.) is a tabled subgoal, because the tabled predicate t occurs in the scope of a cut [9].

The objective of our research is to develop a linear tabling method for top-down evaluation of the well-founded semantics of general logic programs, which resolves infinite loops and redundant computations without sacrificing the linearity of SLDNF-resolution.
In an earlier paper [11], we presented a linear tabling mechanism called TP-resolution for positive logic programs ("TP" for "Tabulated Prolog"). In TP-resolution, each node in a search tree can act both as a solution node and as a lookup node, regardless of when and where it is generated. This represents an essential difference from existing tabling approaches. The main idea is as follows: for any selected subgoal A at a node Ni labeled with a goal Gi, we first try to use an answer I in the table of A to generate a child node Ni+1, which is labeled by the resolvent of Gi and I. If such answers are not available in the table, we then resolve A against program clauses in a top-down order, except for the case where the derivation has stepped into a loop at Ni. In such a case, the subgoal A will skip the clause that is being used by its ancestor subgoal that is a variant of A. For example, for the derivation p(X) ⇒Cp1,θ1 q(X) ⇒Cq1,θ2 p(Y), we will expand p(Y) by resolving it against the program clause next to Cp1. Thanks to its linearity, TP-resolution can be implemented by an extension to any existing Prolog abstract machine such as WAM [14] or ATOAM [15].

In this paper, we extend TP-resolution to TPWF-resolution, which computes the well-founded semantics of general logic programs. The extension is nontrivial because of possible infinite recursions through negation. In addition to
the strategy for clause selection adopted by TP-resolution, TPWF-resolution uses two critical mechanisms to deal with infinite recursions through negation. One is making assumptions for negative loop subgoals whose truth values are currently undecided, and the other is doing answer iteration to derive complete answers of loop subgoals. For non-floundering queries, TPWF-resolution is sound and complete for general logic programs with the bounded-term-size property.

Section 2 gives an illustrative example to outline these main ideas, and Section 3 defines TPWF-trees based on these strategies. Section 4 presents the definition of TPWF-resolution and discusses its properties.

1.1 Notation and Terminology
Variables begin with a capital letter, and predicates, functions and constants with a lower-case letter. By E we denote a list/tuple (E1, ..., Em) of elements. Let X = (X1, ..., Xm) be a tuple of variables and I = (I1, ..., Im) a tuple of terms. By X/I we denote an mgu {X1/I1, ..., Xm/Im}. By p(.) we refer to any atom with the predicate p, and by p(X) to an atom p(.) that contains the list X of distinct variables. For instance, if p(X) = p(W, a, f(Y, W), Z), then X = (W, Y, Z). By a variant of an atom (resp. subgoal or term) A we mean an atom (resp. subgoal or term) A′ that is the same as A up to variable renaming.¹ A set of atoms (resp. subgoals or terms) that are variants of each other are called variant atoms (resp. variant subgoals or variant terms). Moreover, for any element E, by E being in a set S we understand that a variant of E is in S.

For convenience of describing our method, we use the four truth values t (true), f (false), u (undefined), and u∗ (temporarily undefined), with ¬t = f, ¬f = t, ¬u = u, and ¬u∗ = u∗. As its name suggests, u∗ will be used as a temporary truth value when the truth value (t, f or u) of a subgoal is currently undecided (due to the occurrence of loops). In addition to f ∧ V = f and t ∧ V = V for any V ∈ {t, f, u, u∗}, we have u ∧ u∗ = u∗. Let A be an atom; by A∗ we refer to an answer A with truth value u∗. Finally, clauses in a program with the same head predicate p are numbered sequentially, with Cpi referring to the i-th such clause (i > 0).
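The truth-value algebra just introduced can be written down directly. The following Python sketch is illustrative; the case u∗ ∧ u∗ = u∗ is our own assumption, as it is not stated explicitly above:

```python
# The four truth values: t, f, u, and the temporary value u*.

def neg(v):
    """Negation: not t = f, not f = t, not u = u, not u* = u*."""
    return {"t": "f", "f": "t", "u": "u", "u*": "u*"}[v]

def conj(a, b):
    """Conjunction: f /\ V = f, t /\ V = V, u /\ u* = u*."""
    if "f" in (a, b):
        return "f"
    if a == "t":
        return b
    if b == "t":
        return a
    # both in {u, u*}; we assume u* /\ u* = u* (not stated in the text)
    return "u*" if "u*" in (a, b) else "u"

assert neg("u*") == "u*"
assert conj("f", "u*") == "f"
assert conj("t", "u") == "u"
assert conj("u", "u*") == "u*"
```
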
2 Main Ideas
In this section, we outline the main ideas of TPWF-resolution through an illustrative example.

Example 1. Consider the following program:

P1:  p(X) ← q(X).      Cp1
     p(a).             Cp2
     q(X) ← ¬r.        Cq1
     q(X) ← w.         Cq2
     q(X) ← p(X).      Cq3
     r ← ¬s.           Cr1
     s ← ¬r.           Cs1
     w ← ¬w, v.        Cw1

¹ By this definition, A is a variant of itself.
Let G0 = ← p(X) be the query (top goal). Reasoning in the same way as Prolog,² we successively generate the nodes N0–N7 shown in Fig. 1. Obviously, Prolog would repeat the loop between N3 and N7 infinitely. We break the loop by disallowing N7 to select the clause Cr1, which is being used by N3. This leaves N7 with no clause to unify against, which leads to backtracking. Since the loop is negative in the sense that it goes through negation, N7 should not be failed by falsifying r at this moment. Instead, r is assumed to be temporarily undefined (i.e., r = u∗). By definition, r = u∗ (at N7) means ¬r = u∗ (at N6), so that s = u∗ (at N5) is derived. For the same reason, r = u∗ (at N3) is derived.
Fig. 1. TPWF-derivations.
We use two data structures, UA and UD, to keep atoms that are assumed and derived to be temporarily undefined, respectively. Therefore, after these steps, UA = {r} and UD = {s, r}. We are then back at N3. Since N3 is the top node of the loop, before failing it via backtracking we need to be sure that r has got its complete set of answers (r = t, r = u or r = f). This is achieved by performing answer iteration via the loop. That is, we regenerate the loop to see if any new answers can be derived, until we reach a fixpoint.

We use a flag variable NEW, with NEW = 0 initially. Whenever a new answer with truth value t or u for any subgoal is derived, NEW is set to 1. Before starting an iterate, we set NEW = 0 and UA = UD = {}. The answer iteration stops by the end of some iterate where NEW = 0 and (UA ⊆ UD or UD = {}). The fact that NEW = 0 and UA ⊆ UD indicates that the truth values of all atoms in UA totally depend on how they are assumed in the negative loop, which, under the well-founded semantics [12], amounts to saying that these truth values are undefined. Since up to now no answer with truth value t or u has been derived (i.e., NEW = 0) and UA = {r} ⊂ UD = {s, r}, the termination condition of answer iteration is satisfied. Therefore, we change the truth values of all atoms in UD from temporarily undefined to undefined (i.e., r = s = u) and memorize the new answers in the respective tables. After the completion of answer iteration, we set UA = UD = {}.

By definition, r = u (at N3) means ¬r = u (at N2), which leads to an answer node U for the top goal (see Fig. 1). That is, we have q(X) = u and p(X) = u, which are memorized in their tables.

Now we backtrack to q(X) at N1. Applying Cq2 and Cw1 leads to N8–N10, which form another negative loop. In the same way as above, we assume w = u∗ and put w into UA. So ¬w = u∗, which leads to the node N11. Since v is false, we backtrack to N9 and then to N8, with NEW = 0, UA = {w} and UD = {}. Again, before leaving N8 via backtracking, we need to complete the answers of w by means of answer iteration via the loop. Obviously, the termination condition of answer iteration is satisfied. Here NEW = 0, w ∈ UA and w ∉ UD suggest that w cannot be inferred from the program whatever truth values we assign to the temporarily undecided subgoals in UA. This, under the well-founded semantics, implies that w is false. So we set w = f and come back to N1 again. Applying Cq3 leads to N12.

² That is, we use the following control strategy: depth-first (for goal selection) + left-most (for subgoal selection) + top-down (for clause selection) + last-first (for backtracking).
We see that there is a loop between N0 and N12. Instead of selecting Cp1, which is being used by N0, we use Cp2 to unify against p(X), which leads to an answer node T with mgu X/a. That is, p(a) = t and q(a) = t, which are added to the tables of p(X) and q(X), respectively (NEW is then set to 1). Since the loop N0 → N1 → N12 is positive, we backtrack to N1 and then to N0, making no assumption. This time, we have NEW = 1, UA = {} and UD = {}. Since N0 is the top loop node and NEW = 1, we do answer iteration by regenerating the loop, which leads to N0 → N13 → N14. Since Cp1 is being used by N0 and Cp2 has already been used before (by N12, with the answer stored in the table of p(X)), p(X) at N14 has no clause to unify with. So we backtrack to N13 and then to N0. Now NEW = 0 and UA = UD = {}, so we end the iteration. Since N0 is the root, the evaluation of G0 terminates. The derived answers are: p(a) = q(a) = t, p(b) = q(b) = u for any b ≠ a, r = s = u, and w = v = f. ⊓⊔

We see that these answers constitute the well-founded model of P1.
The tabulated resolution shown in Example 1 is obviously linear. Meanwhile, we see that it resolves infinite loops and redundant computations without losing any answers. The main points are summarized as follows:

1. Tabling. Tables are used to store intermediate results, which is the basis of all tabulated resolutions.

2. Clause selection. Without loops, clauses are selected in the same way as in Prolog, except that clauses that have been used before will not be reapplied, because the complete set of answers derived via those clauses has already been memorized in the related tables. For example, N14 skips Cp2 because the clause has already been used by N12. This avoids redundant computations. When a loop occurs, however, clauses that are being used by ancestor loop subgoals will be skipped. For example, Cr1, Cw1 and Cp1 are skipped by N7, N10 and N12, respectively. This breaks infinite loops.

3. Assumption. For a positive loop subgoal, backtracking proceeds in the same way as in Prolog (see N12 and N14). A negative loop subgoal whose truth value is currently undecided, however, will be assumed temporarily undefined before being failed (see N7 and N10). Temporarily undefined values will be removed (from tables) when their t or u counterparts are derived. This guarantees the correctness of answers.

4. Answer iteration. Before leaving a loop by failing its top loop node (e.g. N3, N8 and N0), iteration will be carried out to derive the complete answers of loop subgoals. Without iteration, we would miss answers, because some clauses have been skipped to break infinite loops.

The process of answer iteration is briefly described as follows. Let Nt be the top loop node. We first check if the termination condition is satisfied (i.e., NEW = 0 and (UA ⊆ UD or UD = {})). If not, we start an iterate by setting NEW = 0 and UA = UD = {}. The iterate will regenerate the loop (e.g. N0 → N13 → N14 in Fig. 1). During the iterate, NEW, UA and UD will be updated accordingly.
By the end of the iterate, i.e. when we come back to the top loop node Nt again and try to fail it via backtracking, we distinguish among the following cases:
– NEW = 1, which means at least one new answer, with truth value t or u, has been derived (and added to the related table) during the iterate. So we start a new iterate to seek more answers.
– NEW = 0 and (UA ⊆ UD or UD = {}). Stop the iteration with all temporarily undefined answers replaced by undefined ones. After this, the answers of all subgoals involved in the loop are completed (i.e. the tables of these subgoals contain all of their answers). We attach a flag COMP to each table, with COMP = 1 standing for being completed. For any subgoal A whose table flag COMP is 1, its instance A′ is true if A′ = t is in the table of A, undefined if A′ = u is in the table but A′ = t is not, and false if neither is in the table. For instance, in the above example, p(a) = t and p(X) = u being in the table of p(X) shows that p(a) is true and p(b) is undefined for any b ≠ a.
– Otherwise. Let UC = UA − UD. Since NEW = 0, for all subgoals in UC we cannot infer any new answers for them from the program whatever
truth values we assign to the temporarily undecided subgoals in UA. This implies that the answers of these subgoals have been completed, so we set the flag COMP of their tables to 1. Since the subgoals in UD − UC are still temporarily undecided, we start the next iterate. The iteration will terminate provided that the program has the bounded-term-size property [13].
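The control flow of answer iteration just described can be sketched as follows. The helper run_iterate is hypothetical: it stands for one regeneration of the loop (e.g. N0 → N13 → N14 in Fig. 1) and reports the NEW, UA and UD flags it produced; marking tables COMP = 1 is elided.

```python
def answer_iteration(run_iterate):
    """Sketch of answer iteration at a top loop node.

    run_iterate() re-derives the loop once and returns the flags
    (NEW, UA, UD) observed after that iterate.
    """
    while True:
        NEW, UA, UD = run_iterate()
        if NEW == 1:
            continue              # a new t/u answer was derived: iterate again
        if UA <= UD or not UD:
            return "complete"     # termination condition of the text holds
        # Otherwise: the answers of the subgoals in UC = UA - UD are
        # complete; set their tables' COMP flag to 1 (omitted here) and
        # start the next iterate for the subgoals still in UD - UC.
```

Only the case analysis is modelled; the actual resolution work happens inside the regeneration of the loop.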
3 TPWF-Trees
In this section we define TPWF-trees, which are the basis of TPWF-resolution. We begin by defining tables.

3.1 Tables
Let P be a logic program and p(X) an atom. Let P contain exactly Np clauses with a head p(.). A table for p(X), denoted TB(p(X)), is a four-tuple (p(X), T, C, COMP), where
1. T = {T1, T2}, with T1 and T2 storing answers of p(X) with truth values t and u, respectively.
2. C is a vector of Np elements, keeping the status of the Cpi's w.r.t. p(X). C[i] = 0 (resp. = 1) represents that the clause Cpi is no longer available (resp. still available) to p(X).
3. COMP ∈ {0, 1}, with COMP = 1 indicating that the answers of p(X) have been completed.
For convenience, we use TB(p(X)) → t_answer[i] and TB(p(X)) → u_answer[i] to refer to the i-th answer in T1 and T2, respectively, TB(p(X)) → clause_status[i] to refer to the status of Cpi w.r.t. p(X), and TB(p(X)) → COMP to refer to the flag COMP. When a table TB(p(X)) is created, T1 = T2 = {}, the status of all clauses is initialized to 1, and COMP = 0. Answers in a table will be read sequentially from T1 followed by T2. When T1 = T2 = {} and COMP = 1, p(X) = f.

Example 2. Consider again the program P1 in Example 1. After node N14 is generated (see Fig. 1), we have the following tables:

TB(p(X)) : (p(X), {{p(a)}, {p(X)}}, {1, 0}, 0),
TB(q(X)) : (q(X), {{q(a)}, {q(X)}}, {0, 0, 1}, 0),
TB(r)    : (r, {{}, {r}}, {0}, 1),
TB(s)    : (s, {{}, {s}}, {0}, 1),
TB(w)    : (w, {{}, {}}, {0}, 1).
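The four-tuple table can be rendered as a small data structure; the field names below follow the paper's accessor notation (t_answer, u_answer, clause_status, COMP), while the Python representation itself is an assumption made for illustration.

```python
class Table:
    """Minimal sketch of TB(p(X)) = (p(X), T, C, COMP)."""
    def __init__(self, subgoal, n_clauses):
        self.subgoal = subgoal
        self.t_answer = []                    # T1: answers with truth value t
        self.u_answer = []                    # T2: answers with truth value u
        self.clause_status = [1] * n_clauses  # 1 = clause still available
        self.COMP = 0                         # 1 = answers are complete

# Mirroring TB(p(X)) from Example 2: p(a) = t has been derived and
# clause Cp2 has been exhausted.
tb = Table("p(X)", 2)
tb.t_answer.append("p(a)")
tb.clause_status[1] = 0
```

Reading answers "sequentially from T1 followed by T2" then amounts to iterating over t_answer before u_answer.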
From Fig. 1 we observe that each node in the tree has a unique name (index) Ni that is labeled by a goal Gi , so that the left-most subgoal A1 = A (or A1 = ¬A) of Gi is uniquely determined by Ni . In order to keep track of A1 that resolves against both program clauses and tabled answers, we attach to Ni
three pointers. Ni → t_answer_ptr and Ni → u_answer_ptr point to an answer in TB(A) → t_answer and TB(A) → u_answer, respectively. Ni → clause_ptr points to a clause whose head is unifiable with A. This leads to the following.

Definition 1. Let Gi be a goal ← A1, ..., Am (m ≥ 1). By “register a node Ni with Gi” we do the following: (1) label Ni with Gi; (2) create the above three pointers for Ni, which unless otherwise specified are initialized to null.

We assume two table functions: memo(.) and lookup(.). Let Ni be a node with the left-most subgoal A. Let I be an answer of A with truth type S ∈ {t, u, u*}. When TB(A) contains no answer with truth value t that is a variant of or more general than I, memo(Ni, I, S) adds I to TB(A) in the following way. When S = t, add I to the end of TB(A) → t_answer, set TB(A) → COMP = 1 if I is a variant of A, and remove from TB(A) → u_answer all J/J* with J an instance/variant of I. Otherwise, if S = u (resp. S = u*), add I (resp. I*) to the end of TB(A) → u_answer provided that it contains no answer that is a variant of or more general than I (resp. I*).

Let Ni and A be as above, and let I and S be variables that are used for caching an answer and its truth type. lookup(Ni, I, S) fetches from TB(A) an answer and its truth type into I and S, respectively. If no answer is available in TB(A), I = null.
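The bookkeeping performed by memo(.) can be sketched as below. The table is rendered as a plain dict here, and subsumes(a, b) ("a is a variant of or more general than b") is passed in as a parameter: a real implementation would use unification, which is not reproduced in this sketch.

```python
def memo(tb, answer, truth, subsumes, call):
    """Sketch of the table function memo(.) for a table dict tb."""
    if any(subsumes(t, answer) for t in tb["t"]):
        return                                   # already entailed with truth t
    if truth == "t":
        tb["t"].append(answer)
        if subsumes(answer, call) and subsumes(call, answer):
            tb["COMP"] = 1                       # the answer is a variant of the call
        # remove u/u* counterparts that are instances/variants of the answer
        tb["u"] = [u for u in tb["u"] if not subsumes(answer, u)]
    elif not any(subsumes(u, answer) for u in tb["u"]):  # truth is u or u*
        tb["u"].append(answer)
```

For ground subgoals, plain equality is an adequate stand-in for the variance test, as in the usage below.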
3.2 Resolvants
We now discuss how to resolve subgoals against program clauses as well as tabled answers. Let Ni be a node labeled by a goal Gi = ← A1, ..., Am (m ≥ 1) with A1 = p(X). Consider evaluating A1 using a program clause C = A ← B1, ..., Bn (n ≥ 0), where A1θ = Aθ.3 In Prolog, we will generate a new node labeled with the goal Gi+1 = (B1, ..., Bn, A2, ..., Am)θ, where we see that the mgu θ is consumed by all the Aj's (j > 1), although the proof of A1θ has not yet been completed (produced). In our tabulated resolution, however, we apply the PMF (for Prove-Memorize-Fetch) mode to resolve subgoals against clauses and tabled answers [11]. That is, we first prove (B1, ..., Bn)θ. If it is true with some mgu θ1, which means A1θθ1 is true, we memorize the answer in the table TB(A1) if it is new. We then fetch an answer p(I) with truth type S from TB(A1) and apply it to the remaining subgoals of Gi. The process is depicted more clearly in Fig. 2. Obviously the PMF mode preserves the original set of answers of A1. Moreover, since only new answers of A1 are added to the table, all repeated answers of A1 will be precluded from being applied to the remaining subgoals of Gi. The PMF mode can readily be realized by using the two table functions memo(.) and lookup(.). That is, after resolving the subgoal A1 with the clause C, Ni gives a child node Ni+1 labeled with the goal Gi+1 = ← (B1, ..., Bn)θ, memo(Ni, p(X)θ, t), lookup(Ni, Ii, Si), A2, ..., Am. Note that the propagation of
Here and throughout, we assume that C has been standardized apart to share no variables with Gi .
[Fig. 2 (schematic): Resolve A1 against C (A1θ = Aθ) → Prove (B1, ..., Bn)θ → when A1θθ1 is true, Memorize A1θθ1 → Fetch an answer p(I) with (I, S) from TB(A1) ⇒ Apply (X/I, S) to A2, ..., Am.]
Fig. 2. The PMF mode for resolving subgoals.
θ is blocked by the subgoal lookup(Ni, Ii, Si) because the consumption (fetch) must come after the production (prove and memorize). Observe that after the proof of A1 is reduced to the proof of (B1, ..., Bn)θ, memo(Ni, p(X)θ, t), lookup(Ni, Ii, Si) by applying a program clause C, the truth value of an answer of A1 to be memorized must be the logical AND of the truth values of the answers of all the Bjθ's. Such an AND computation is carried out incrementally. Initially we have memo(Ni, p(X)θ, S′) with S′ = t. Then, from j = 1 to j = n, if Bjθ gets an answer Bjθθ′ with truth type S, the memo(.) subgoal is updated to memo(Ni, p(X)θθ′, S′ ∧ S). This leads to the following definition.

Definition 2. Let G1 = ← A1, ..., Am be a goal, θ an mgu, and S ∈ {t, u, u*}. The resultant of applying (θ, S) to G1 is the goal G2 = ← (A1, ..., Ak−1)θ, A′kθ, Ak+1, ..., Am, where Ak is the left-most subgoal of the form memo(.) (if G1 contains no memo(.), k = m) and A′k is Ak with its answer type S′ ∈ {t, u, u*} changed to S′ ∧ S.

The concept of resolvants of TPWF-resolution is then defined based on the PMF mode.

Definition 3. Let Ni be a node labeled by a goal Gi = ← A1, ..., Am (m ≥ 1).
1. If A1 = p(X), let C be a clause A ← B1, ..., Bn with Aθ = A1θ; then
a) The resolvant of Gi and C is the goal Gi+1 = ← (B1, ..., Bn)θ, memo(Ni, p(X)θ, t), lookup(Ni, Ii, Si), A2, ..., Am.
b) Let p(I) be an answer of A1 with truth type S; then the resolvant of Gi and p(I) with S is the resultant of applying (X/I, S) to ← A2, ..., Am.
2. If A1 = ¬B with B a ground atom, let B be the answer with truth type S ∈ {f, u, u*}; then the resolvant of Gi and B with S is the resultant of applying ({}, ¬S) to ← A2, ..., Am.
3. If A1 is memo(Nh, q(I), S) and A2 is lookup(Nh, Ih, Sh), let q(X) be the left-most subgoal at node Nh; then (after executing the two functions) the resolvant of Gi and Ih with truth type Sh is the resultant of applying (X/Ih, Sh) to ← A3, ..., Am.
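The incremental AND over truth types can be modelled as a minimum in an ordering on {t, u, u*}. Treating u* as the weakest value (t > u > u*) is an assumption consistent with its role as "temporarily undefined"; the paper does not spell the ordering out explicitly.

```python
RANK = {"t": 2, "u": 1, "u*": 0}   # assumed ordering t > u > u*

def truth_and(s1, s2):
    # logical AND of two truth types, taken as the minimum in the ordering
    return min(s1, s2, key=RANK.get)
```

Under this reading, an answer memorized for A1 is t only if every Bjθ contributed a t answer, and becomes u (or u*) as soon as any body answer does.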
3.3 Ancestor Lists and Loops
Loop checking is a principal feature of TPWF-resolution (see Example 1). Positive and negative loops are determined based on ancestor lists that are associated with subgoals.

Definition 4. ([10] with slight modification) An ancestor list ALA is associated with each subgoal A in a tree (see the TPWF-tree below), which is defined recursively as follows.
1. If A is at the root, then ALA = {}.
2. Let A be at node Ni+1. If A inherits a subgoal A′ (by copying or instantiation) from its parent node Ni, then ALA = ALA′; else if A is in the resolvant of a subgoal B at node Ni and a clause B′ ← A1, ..., An with Bθ = B′θ (i.e. A = Aiθ for some 1 ≤ i ≤ n), then ALA = {(Ni, B)} ∪ ALB.
3. Let Gi = ← ¬A, ... be the goal at Ni, which has a child node Ni+1 labeled by the goal Gi+1 = ← A (the edge from Ni to Ni+1 is dotted; see Fig. 1). Then ALA = {¬} ∪ AL¬A.

Let Gi at node Ni and Gk at node Nk be two goals in a derivation, and let A and A′ be the left-most subgoals of Gi and Gk, respectively. If A is in the ancestor list of A′, i.e. (Ni, A) ∈ ALA′, the proof of A needs the proof of A′. In such a case, we call A (resp. Ni) an ancestor subgoal of A′ (resp. an ancestor node of Nk). In particular, if A is both an ancestor subgoal and a variant, i.e. an ancestor variant subgoal, of A′, we say the derivation goes into a loop. The loop is negative if there is a ¬ ahead of (Ni, A) in ALA′; otherwise, it is positive. For example, the ancestor list of the subgoal r at N7 in Fig. 1 is ALr = {¬, (N5, s), ¬, (N3, r), ¬, (N1, q(X)), (N0, p(X))} and the ancestor list of the subgoal p(X) at N12 is ALp(X) = {(N1, q(X)), (N0, p(X))}. There is a negative loop between N3 and N7, and a positive loop between N0 and N12.
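The loop classification of Definition 4 can be sketched directly. Encoding ancestor lists as Python sequences (most recent entry first, with "¬" markers interleaved) and using equality in place of a real variance test are simplifications made for this illustration; equality suffices for ground subgoals.

```python
def classify_loop(subgoal, AL, variant=lambda a, b: a == b):
    """Return 'negative'/'positive' if AL contains an ancestor variant
    of subgoal (with/without a ¬ ahead of it), else None."""
    negated = False
    for entry in AL:
        if entry == "¬":
            negated = True
        elif variant(entry[1], subgoal):
            return "negative" if negated else "positive"
    return None  # no ancestor variant subgoal: no loop

# The ancestor list of r at N7 from the example above:
AL_r = ["¬", ("N5", "s"), "¬", ("N3", "r"),
        "¬", ("N1", "q(X)"), ("N0", "p(X)")]
```

On AL_r the subgoal r is classified as a negative loop, matching the N3/N7 loop described in the text.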
3.4 Control Strategy
Although in principle the tabulated approach presented in this paper is effective for any fixed control strategy, we choose to use the so-called TP-strategy, which is the Prolog control strategy enhanced with mechanisms for selecting tabled answers.

Definition 5 ([11]). By TP-strategy we mean: Depth-first (for goal selection) + Left-most (for subgoal selection) + Table-first (for program and table selection) + Top-down (for the selection of tabled answers and program clauses) + Last-first (for backtracking).
3.5 Algorithm for Building TPWF-Trees
In order to simplify the presentation, we assume that every subgoal has a table and that the flag variables COMP (in tables) and NEW are updated automatically. Moreover, we assume that whenever an atom A is assumed undefined (i.e. A = u* is assumed), A is added to UA, and that whenever A = u* is derived (memorized), A is added to UD (automatically). We assume that when selecting clauses to resolve with subgoals, all clauses whose status is “no longer available” are automatically skipped. Finally, we assume a function return(A, S), which returns an answer A with truth type S. The truth type of return(A, S) is updated in the same way as that of memo(_, _, S). TPWF-trees are constructed based on the TP-strategy using the following algorithm.

Definition 6 (TPWF-Algorithm). Let P be a logic program, A an atom, and G0 = ← A, return(A, t). Let ALA = {} be the ancestor list of A. The TPWF-tree TFG0 of P ∪ {G0} is constructed by applying the following algorithm until the answer NO or FLOUND is returned.

tpwf(G0, ALA):
1. Root Node: Register the root N0 with G0 and goto 2.
2. Node Expansion: Let Ni be the latest registered node labeled by Gi = ← A1, ..., Am (m > 0). Register Ni+1 as a child of Ni with Gi+1 if Gi+1 can be obtained as follows.
Case 2.1: A1 is return(A′, S). Return (A′, S) if S ≠ u*. When S = t or S = u, set Gi+1 = T and Gi+1 = U, respectively. Goto 3 with N = Ni.
Case 2.2: A1 is memo(Nh, I, S) and A2 is lookup(Nh, Ih, Sh). Execute the two table functions. If Ih = null, then goto 3 with N = Ni; else set Gi+1 to the resolvant of Gi and Ih with truth type Sh and goto 2.
Case 2.3: A1 = ¬B. If B is non-ground, set Gi+1 = FD and return FLOUND. Get an answer from TB(B). Let I be the answer with truth type S.
Case 2.3.1: I ≠ null. If S = t, then goto 3 with N = Ni; else set Gi+1 to the resolvant of Gi and I with S and goto 2.
Case 2.3.2: I = null.
When TB(B) → COMP = 1, if TB(B) → u_answer ≠ {}, then goto 3 with N = Ni; else set Gi+1 = ← A2, ..., Am and goto 2. Otherwise, let G′0 = ← B, return(B, t) and ALB = {¬} ∪ ALA1. Call tpwf(G′0, ALB) until NO or FLOUND is returned. If FLOUND is returned, then set Gi+1 = FD and return FLOUND; else apply the answers in TB(B) to A1 (repeat Case 2.3) and then goto 3 with N = Ni.
Case 2.4: A1 = p(X). Get an answer I with truth type S from TB(A1). If I ≠ null, then set Gi+1 to the resolvant of Gi and I with S and goto 2; else
Case 2.4.1: TB(A1) → COMP = 1. Goto 3 with N = Ni.
Case 2.4.2: Ni is a top loop node. Do answer iteration and then goto 3 with N = Ni.
Case 2.4.3: A1 ∈ UA. If A1* is in TB(A1) → u_answer, then goto 3 with N = Ni; else set Gi+1 to the resolvant of Gi and A1 with truth type u*, and goto 2.4
Case 2.4.4: Otherwise. If no loop occurs (i.e. A1 has no ancestor variant subgoal), then resolve A1 with the first clause available; else resolve A1 with the first clause below the one that is being used by its closest ancestor variant subgoal. If such a clause Cpj exists, then set Gi+1 to the resolvant of Gi and Cpj and goto 2; else goto 3 with N = Ni, while assuming A1 = u* if the loop is negative.
3. Backtracking: If N is the root, return NO. Let Nf be the parent node of N with the left-most subgoal Af. If Af is a function, goto 3 with N = Nf. Otherwise, if N was generated from Nf by resolving Af with a clause Cj, then if Af is not involved in any loop, set TB(Af) → clause_status[j] = 0. Goto 2 with Nf as the latest registered node.

The input of TPWF-Algorithm includes a top goal G0 = ← A, return(A, t) and an ancestor list ALA. Its output is either FLOUND, indicating that G0 is floundered, or NO, showing that there are no more answers for A, or (A′, S), meaning that A′ is an answer of A with truth type S ∈ {t, u}. Observe that, like SLDNF-resolution [4], when A1 = ¬B we may build a new tree for B (Case 2.3.2). In SLDNF-resolution, the two SLDNF-trees are totally independent. This leads to possible infinite negative loops. TPWF-resolution, however, connects the two TPWF-trees via the ancestor list {¬} ∪ ALA1, so that negative loops can be detected effectively (see Fig. 1).
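The clause-status update of step 3 (Backtracking) can be sketched in isolation. Representing the table as a dict and passing loop membership explicitly are bookkeeping assumptions; the algorithm itself leaves that bookkeeping implicit.

```python
def backtrack_mark(table, j, af_in_loop):
    """On backtracking over a node obtained by resolving Af with clause Cj:
    if Af is not involved in any loop, Cj becomes unavailable to Af."""
    if not af_in_loop:
        table["clause_status"][j] = 0   # Cj will be skipped for Af from now on

tb = {"clause_status": [1, 1]}
backtrack_mark(tb, 0, af_in_loop=False)
```

For loop subgoals the status is deliberately left at 1, since answer iteration may need to reapply the clause.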
4 TPWF-Resolution
Definition 7. Let TFG0 be a TPWF-tree of P ∪ {G0}. All leaves of TFG0 labeled by T, U or FD are success, undefined and flounder leaves, respectively, and all other leaves are failure leaves. A TPWF-derivation is a partial branch in TFG0 starting at the root, which is successful, floundered, undefined or failed if it ends respectively with a success leaf, a flounder leaf, an undefined leaf or a failure leaf. The process of constructing TPWF-derivations is called TPWF-resolution. A goal G0 is floundered if it has a floundered TPWF-derivation.

Let G0 be a non-floundered goal and I′ be a variant of or more general than I. Then G0 is true with an answer I if there is a successful TPWF-derivation with (I′, t) returned; undefined with I if it is not true with any instance of I but there is an undefined TPWF-derivation with (I′, u) returned; and false with I if it is neither true nor undefined with any instance of I.

The following theorem follows from the basic fact that, for any logic program with the bounded-term-size property, (1) the set of answers in any table is finite, (2) every TPWF-derivation is finite, and (3) answer iteration must reach a fixpoint.
4 For this case, no further backtracking will be allowed at this node.
Theorem 1 (Termination of TPWF-resolution). Let P be a logic program with the bounded-term-size property and G0 = ← A, return(A, t) a top goal. TPWF-Algorithm terminates with a finite TPWF-tree.

TPWF-resolution cuts infinite loops and infinite recursions through negation by means of assumption and answer iteration. Positive loops are cut simply by backtracking, whereas negative loop subgoals whose truth values are currently undecided will be assumed temporarily undefined before being failed via backtracking. Temporarily undefined values will be removed (from tables) after their t or u counterparts are derived. This guarantees the correctness of loop cutting. Meanwhile, before leaving a loop by failing its top loop node, iteration will be carried out to derive the complete answers of loop subgoals. For logic programs with the bounded-term-size property, the iteration must terminate with a fixpoint of answers. This leads to the following.

Theorem 2 (Soundness and Completeness of TPWF-resolution). Let P be a logic program with the bounded-term-size property and G0 = ← A, return(A, t) a non-floundered goal. Let WF(P) be the well-founded model of P. Then
1. WF(P) |= ∃(A) iff G0 is true with an instance of A;
2. WF(P) |= ¬∃(A) iff G0 is false with A;
3. WF(P) |= ∀(Aθ) iff G0 is true with Aθ;
4. WF(P) ⊭ ∃(A) and WF(P) ⊭ ¬∃(A) iff G0 is undefined with A.
Acknowledgments We thank the anonymous referees for their helpful comments. The first author is supported in part by Chinese National Natural Science Foundation and Trans-Century Training Programme Foundation for the Talents by the Chinese Ministry of Education.
References
1. Bol, R. N., Degerstedt, L.: Tabulated Resolution for the Well-Founded Semantics. Journal of Logic Programming 34:2 (1998) 67-109
2. Chen, W. D., Swift, T., Warren, D. S.: Efficient Top-Down Computation of Queries under the Well-Founded Semantics. Journal of Logic Programming 24:3 (1995) 161-199
3. Chen, W. D., Warren, D. S.: Tabled Evaluation with Delaying for General Logic Programs. J. ACM 43:1 (1996) 20-74
4. Lloyd, J. W.: Foundations of Logic Programming. 2nd edn. Springer-Verlag, Berlin (1987)
5. Przymusinski, T.: Every Logic Program Has a Natural Stratification and an Iterated Fixed Point Model. In: Proc. of the 8th ACM Symposium on Principles of Database Systems (1989) 11-21
6. Ross, K.: A Procedural Semantics for Well-Founded Negation in Logic Programs. Journal of Logic Programming 13:1 (1992) 1-22
7. Sagonas, K., Swift, T., Warren, D. S.: XSB as an Efficient Deductive Database Engine. In: Proc. of the ACM SIGMOD Conference on Management of Data. Minneapolis (1994) 442-453
8. Sagonas, K., Swift, T., Warren, D. S.: An Abstract Machine for Tabled Execution of Fixed-Order Stratified Logic Programs. ACM Transactions on Programming Languages and Systems 20:3 (1998)
9. Sagonas, K., Swift, T., Warren, D. S., Freire, J., Rao, P.: The XSB Programmer’s Manual (Version 1.8) (1998)
10. Shen, Y. D.: An Extended Variant of Atoms Loop Check for Positive Logic Programs. New Generation Computing 15:2 (1997) 317-341
11. Shen, Y. D., Yuan, L. Y., You, J. H., Zhou, N. F.: Linear Tabulated Resolution Based on Prolog Control Strategy. Submitted for publication (1999)
12. Van Gelder, A., Ross, K., Schlipf, J.: The Well-Founded Semantics for General Logic Programs. J. ACM 38:3 (1991) 620-650
13. Van Gelder, A.: Negation as Failure Using Tight Derivations for General Logic Programs. Journal of Logic Programming 6:1&2 (1989) 109-133
14. Warren, D. H. D.: An Abstract Prolog Instruction Set. Technical Report 309, SRI International (1983)
15. Zhou, N. F.: Parameter Passing and Control Stack Management in Prolog Implementation Revisited. ACM Transactions on Programming Languages and Systems 18:6 (1996) 752-779
A Case Study in Using Preference Logic Grammars for Knowledge Representation Baoqiu Cui, Terrance Swift, and David S. Warren Department of Computer Science SUNY at Stony Brook Stony Brook, NY 11794-4400, U.S.A. {cbaoqiu,tswift,warren}@cs.sunysb.edu
Abstract. Data standardization is the commercially important process of extracting useful information from poorly structured textual data. This process includes correcting misspellings and truncations, extraction of data via parsing, and correcting inconsistencies in extracted data. Prolog programming offers natural advantages for standardizing: definite clause grammars can be used to parse data; Prolog rules can be used to correct inconsistencies; and Prolog’s simple syntax allows rules to be generated to correct misspellings and truncations of keywords. These advantages can be seen as rudimentary mechanisms for knowledge representation, and at least one commercial standardizer has exploited these advantages. However, advances in implementation and in knowledge representation, in particular the addition of preferences to logical formalisms, allow even more powerful and declarative standardizers to be constructed. In this paper a simple preference logic, that of [7], is considered. A fixed point semantics is defined for this logic and its tabled implementation within XSB is described. Development of a commercial standardizer using the preference logic of [7] is then documented. Finally, detailed comparisons are made between the preference logic standardizer and the previous Prolog standardizer, illustrating how an advance in knowledge representation can lead to improved commercial software.
1 Introduction
Horn clauses have proven remarkably useful for parsing when their syntactic variant, definite clause grammars (DCGs), is employed. DCGs are commonly used to construct LL parses in Prolog, but DCGs can also implement the more powerful class of LR parses in Prologs that include tabling, such as XSB or YAP. Even LR parses, however, can prove cumbersome for implementing grammars that contain potential ambiguities such as the “dangling else” problem, which can arise with nested if-then-else statements in imperative programming languages. While LR grammars can be written to deterministically parse such potential ambiguities, the determinism comes at a cost to the conciseness of the grammar. This problem is especially important for natural language applications, where ambiguities often occur and which may require a high degree of maintenance when a grammar written for one corpus of text is re-applied to a new corpus.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 206–220, 1999. © Springer-Verlag Berlin Heidelberg 1999
As proposed in [7], a natural way to resolve the dangling else ambiguity is to declare preferences for one parse over another by adding preference clauses to Horn clauses. The resulting framework is called Preference Logic Programs (PLPs) or, in their grammar form, Preference Logic Grammars (PLGs) [3]. PLPs contain a syntactic restriction that ensures that the preferences can be precomputed “statically”. The semantics of PLPs is also oriented to the subclass of programs that contain the optimal subproblem property, which intuitively means that a preferred goal depends only on preferred subgoals. The PLPs of [7] are in this sense weaker than other formalisms such as that of Brewka [1]. On the other hand, PLPs are relatively easy to implement efficiently and are arguably easier for programmers to understand than more general frameworks. Despite the above restrictions, the usefulness of PLGs over DCGs for practical natural language analysis can be striking, particularly for the important commercial application of data standardization [12]. The problem of data standardization is to extract meaningful, standardized information from formatted textual strings. For instance, data standardization might seek to extract street address or telephone information from address strings contained in a relational database or XML page. To take a simple but concrete example, a relational database may contain the following (misspelled) textual string1:

TO THE ORDR OF ZZZ AUTOPARTS INC 129 WASHING TON STREET EL SEGUNDO CA

A name and address standardizer might extract the company name, address, city and postal zip code, all in a standard format, as the following record indicates:

Organization: ZZZ AUTOPARTS
Street: 129 WASHINGTON ST
PO BOX:
City: EL SEGUNDO
State: CA
Zip: 90245
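In the real system these steps are done with Prolog DCGs and generated spelling-correction rules. As a toy illustration only, the same steps on this one string can be mimicked with regular expressions; every pattern below is invented for this single example and is not part of the commercial standardizer.

```python
import re

PREAMBLES = [r"TO THE ORDR? OF "]   # tolerate the ORDR misspelling

def standardize(s):
    # strip a recognized preamble
    for p in PREAMBLES:
        s = re.sub("^" + p, "", s)
    # rejoin a token split by a stray space (a hard-coded toy correction)
    s = s.replace("WASHING TON", "WASHINGTON")
    # split off organization, street, city and state
    m = re.match(r"(.*) INC (\d+ .*) (EL SEGUNDO) (CA)$", s)
    org, street, city, state = m.groups()
    return {"Organization": org, "Street": street,
            "City": city, "State": state}
```

Even this toy shows why a knowledge-based approach is needed: the preamble, the misspelling and the field boundaries are all domain knowledge, not general string structure.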
Data standardization thus relies on parsing to extract the company name, street number and so on from the string; on techniques to infer missing information, to provide the proper zip code for the string; on facilities to correct badly entered information, to correct the street name; and on detailed knowledge of a narrow domain, to understand that the phrase TO THE ORDR OF is a preamble and not part of a company name. At the same time, data standardization does not require techniques for understanding deep linguistic structures, and is performed over a relatively narrow semantic domain. Since nearly any large organization must maintain data about names and addresses (of suppliers, customers, etc.), name and address standardization is of great commercial significance, and various tools exist that extract and standardize textual names and addresses. For instance, a commercial Prolog standardizer has been developed by the second author [12] and used to standardize
1 All examples are from commercial data, with minor changes to protect the privacy of the entities involved.
data for several commercial and government organizations. This standardizer required a significant implementation effort, and consists of about 100,000 lines of Prolog code. The most widely used address standardizer, however, is written by the U.S. Postal Service and does not use logic programming techniques. Comparisons between the Postal standardizer and the standardizer of [12] indicate complementary strengths: the Prolog standardizer is much better at extracting addresses from free text, at parsing the various address components, and at handling foreign addresses. Because it works off of a better knowledge base, the Postal standardizer is better at correcting address data once the address components have been identified (e.g. the street, post-office box and city). Indeed, these two standardizers have worked together to commercial advantage. Name and address standardization thus provides an example of an important commercial problem for which logic programming techniques offer significant advantages over other existing methods. It should also be noted that the domain of names and addresses is only one domain for which data standardization has commercial importance; standardizers have been written for other domains, including aircraft part information, transportation records, and cargo descriptions. There are no other known standardizers for these domains. This paper studies in detail how the addition of simple preferences to grammars has been used to improve the Prolog standardizer mentioned above. The structure of the paper is as follows. We first briefly describe the PLPs of [7] with a view to developing a fixed point semantics for a large class of PLPs. Based on this semantics we outline how PLPs can be evaluated using tabling, and sketch their implementation in XSB. The rest of the paper is specific to the application of name and address standardization.
The domain of names and addresses is described in detail, followed by a brief description of the Prolog name and address standardizer. We then examine how PLPs (and tabling) can be used to improve the Prolog standardizer, and show that when rewritten using PLPs the resulting standardizer is much more concise, particularly for portions of the code that parse and resolve ambiguities, with only a moderate loss in performance time.
2 Preference Logic Programs
For our purposes, a preference logic program (PLP) P can be seen as a set H(P) of Horn clauses formed over a language LP containing a finite number of predicate and function symbols, augmented with a set of preference clauses

prefer(p(t1, ..., tn), p(u1, ..., un)) ← A1, ..., An.

where the predicate symbol prefer/2 does not occur in LP, but all other atoms are formed over LP. Predicates that appear as arguments of the heads of preference rules are called optimization predicates. The set of derived predicates is the smallest set of predicates in H(P) for which at least one clause of each predicate contains either a derived predicate or an optimization predicate in its body. All other predicates are termed base predicates, and are defined by base clauses. Atoms in the body of preference clauses are restricted to base predicates. This
syntactic restriction in the bodies of preference clauses guarantees that the computation of preferences will not depend on the application of other preferences, greatly simplifying computation.

Example 1. Let P1 be the preference logic program

prefer(p(a),p(a)).          prefer(p(c),p(d)) :- b(1).
p(a).                       b(1).
d(X) :- p(X).               p(c) :- p(d).
                            p(d).
p/1 is an optimization predicate, b/1 a base predicate, and d/1 a derived predicate.

A Fixed-Point Semantics for PLPs. In order to define a fixed point semantics for PLPs, we consider a program to be represented by its ground instantiation, and term ground atoms of the form prefer(p(t1, ..., tn), p(u1, ..., un)) preference atoms. Similarly, we refer to the optimization atoms, derived atoms, and base atoms of a (ground) PLP P. Neither preference atoms nor base atoms depend on an application of preferences, and both can be constructed via the usual least fixed point construction for Horn clauses. We thus refer to the canonical pre-interpretation of a PLP P as the least model of the preference and base clauses of P considered as Horn clauses. Thus, for any PLP P, its canonical pre-interpretation is uniquely determined and is denoted as CP. The preference atoms in CP, when augmented by a transitivity axiom, induce a relation, which we denote by ≺.
Consider the behavior of TP(J, I) for fixed J. Optimization atoms in the bodies of rules in P are satisfied with respect to the fixed interpretation J; filtering of optimization atoms in satisfied rules is done with respect to J and to the pre-computed preference ordering ≺.
Lemma 1. Let P be a PLP and C be the canonical pre-interpretation of P. Let J be a set of optimization atoms of P. Then λI.TP(J, I) is monotonic with respect to the set-inclusion relation between interpretations.

By the Tarski fixed point theorem, λI.TP(J, I) has a least fixed point, denoted as lfp(λI.TP(J, I)). Using this least fixed point, the operator λJ.TP(J) is defined.

Definition 2. Let P be a PLP and J be a set of optimization atoms of P. TP(J) = {O | (O ∈ lfp(λI.TP^C(J, I)) and O is an optimization atom, or O ∈ J) and ∄O′ ∈ lfp(λI.TP^C(J, I)) such that O ≺ O′}.
TP(J) thus constructs the set of optimization atoms lfp(λI.TP^C(J, I)) ∪ J, and retains only those optimization atoms that are maximal with respect to ≺. The operator λJ.TP(J), however, is not monotonic, as the following program illustrates:

prefer(q(b),q(a)).
q(b).
q(c):- q(a).
Let J1 = {q(b)}, J2 = {q(b), q(a)}. In terms of set-inclusion, J1 ⊆ J2, TP(J1) = {q(b)}, TP(J2) = {q(c)}, and {q(b)} ⊈ {q(c)}. Similarly, for J1 = {q(a)}, J2 = {q(b)}, J1 ≺P J2, but TP(J1) = {q(c)}, TP(J2) = {q(b)}, and {q(c)} ⊀P {q(b)}.

[7] informally defines the optimal subproblem property as the property that the solution to any optimal (preferred) goal should depend only on preferred subgoals. In our framework the optimal subproblem property can be cast as a form of monotonicity.

Proposition 1. Let P be a PLP. P has the optimal subproblem property if J1 ≺P J2 implies that TP(J1) ≺P TP(J2). Then, if P has the optimal subproblem property, TP∞(∅) is the least fixed point of λJ.TP(J).

As terminology, if P is a PLP with the optimal subproblem property, the least fixed point of λJ.TP(J) is called the distinguished model of P.

Example 3. The canonical pre-interpretation of P1 of Example 1 is {prefer(p(a), p(a)), prefer(p(c), p(d)), b(1)}. The least fixed point of λI.TP1(∅, I) is {p(d), d(d)}, and TP1∞(∅), the least fixed point of λJ.TP1(J), is {p(c)}. Finally, the distinguished model of P1 is {p(c), d(c)}.

The approach taken above differs significantly from that taken in [7], which provides a possible worlds semantics for PLPs. We briefly describe a portion of their semantics to justify our use of distinguished models.
A Case Study in Using Preference Logic Grammars
Definition 3. Let P be a PLP with canonical pre-interpretation C; let M be the unique minimal model of H(P) (the Horn clauses in P) taken as a definite program; and let MO be the set of optimization atoms in M. Then a subset W of MO is reduced if it does not contain an O1 such that O1
That R is a unique strongly optimal world follows immediately from the monotonicity requirement of Definition 1. The proof is straightforward by induction on iterations of λJ.TP(J) and of the standard TP operator [9] for definite programs.
2.1 Tabled Resolution of PLPs
Just as PLPs are an extension of Horn clause programs, the tabled implementation of PLPs is an extension of the tabled implementation of Horn clauses. For reasons of space, we present the main ideas of tabling PLPs at an informal level; SLG terminology is used (see [2] or [11] for more detail). Tabling can be seen as a method of dynamically factoring out redundant derivations for subgoals so that these derivations are performed only once during the course of an evaluation. Accordingly, rather than modeling an evaluation using a tree, as with SLD, tabling uses a forest of trees, one for each tabled subgoal encountered during the evaluation. When a tabled evaluation encounters a new (up to variance) subgoal S, a new tree is created with root S :- S, using the new subgoal operation. The body is a list of goals that remain to be resolved in the derivation of S, while the head accumulates bindings required by the resolution steps for the goals in the goal list. Figure 1 shows part of a tabled resolution for a query ?- d(X) to program P1 of Example 1 (evaluation of the canonical pre-interpretation of P1 is not shown). Three subgoals are encountered during this evaluation, d(X), p(X) and p(d), and trees are created for each of them. Children of the roots of trees are produced by the program clause resolution step, accounting for the creation of nodes 1, 3, 5, 6, and 8 in Figure 1. Answers are simply leaf nodes with empty goal lists (nodes 8, 9, and 11). All other nodes are produced by answer resolution, which resolves answers against the selected literals of these nodes. For example, node 11 is produced by
B. Cui, T. Swift, and D.S. Warren

[Figure 1 appears here. Fig. 1. Tabling Forest for Query ?- d(X) to Program of Example 1. Its nodes are: 0) d(X):- d(X); 1) d(X):- p(X); 2) p(X):- p(X); 3) p(a); 4) fail; 5) p(d); 6) p(c):- p(d); 7) p(d):- p(d); 8) p(d); 9) p(c); 10) fail; 11) d(c).]
resolving the answer in node 9 with the selected literal of the goal list of node 1.

To extend tabled resolution to handle preference logic programs with the optimal subproblem property, one need only ensure that, for optimization predicates, an answer A is used for resolution at a state S of an evaluation only if there are no answers preferred to A in S. Thus, the operation preferred answer filtering is used to remove non-preferred answers. This is indicated in Figure 1 by the creation of failure nodes as children of the removed answers. Via the canonical pre-interpretation of P1, the answer p(a) (node 3) can be seen to be non-preferred as soon as it is derived (in fact, it is preferred to itself), while p(d) is not seen to be non-preferred until node 9 is derived. We call this framework SLG_PLP.

We note that the optimal subproblem property of PLPs is critical for the correctness of their tabled implementation. Without this property, the answers derived depend on the ordering of the various tabling operations. For instance, if the query ?- q(X) to program P2 of Example 2 is evaluated, it may derive either the answer q(c) or the answer q(b), depending on the ordering of the resolution steps. Given formal definitions of the SLG_PLP operations, it can be shown that answers produced by SLG_PLP are sound and complete with respect to optimization predicates of PLPs with the optimal subproblem property.

A small technical difficulty remains in using SLG_PLP to compute queries to distinguished models. If derived predicates are tabled, a final SLG_PLP system may contain answers for derived predicates that are not part of the distinguished model. In a practical evaluation, this can be solved by not tabling derived predicates, or by making the derived predicates into optimization predicates.

Tabling helps the evaluation of PLPs in two ways.
First, when derived, base, or optimization predicates are tabled, they gain the same properties that definite clauses gain from tabling, e.g. termination and polynomial data complexity for datalog programs. For grammars, tabled DCGs can provide the same complexity and termination properties as Earley recognition [5]. Second, specific to PLPs, when optimization predicates are tabled, only answer resolution is used for calls to these predicates. In a practical evaluation these answers are stored in a table separate from the evaluation environment. If an answer is derived that is not preferred, that answer need never be used for resolution. Similarly, if the derivation of an answer A causes another answer A′ to be non-preferred (because A is preferred to A′), the answer A′ can be removed from the table so that it will no longer
be used for resolution. Thus use of the table helps to aggregate different answer derivations for calls to optimization predicates.²

2.2 Implementing PLPs in XSB

PLPs may be implemented by extending the tabling mechanism of XSB to implement the preferred answer filtering operation. In XSB, SLDNF resolution is used on predicates by default; if tabled resolution is required, a declaration such as :- table <predicate/arity> is required. The first step in implementing PLPs in XSB is to declare the predicate prefer/2 as tabled. The next step is to replace occurrences of each optimization predicate p(v) in the bodies of derived or optimization predicates by the atom preferred_p(v), which in turn is defined as

preferred_p(v′) :- filterPO(p(v′), prefer), cycle_check(p(v′), prefer).

where v′ is an argument vector to which no bindings have been made. The tabled predicate filterPO/2 is part of the XSB aggregates library (version 2.1) and only returns an answer if it is preferred according to the relation prefer/2. The predicate cycle_check/2 simply ensures that the answer returned is not involved in a cyclic preference. If computation of PLGs is desired, a DCG translation is first performed, followed by the PLP transformation.

It can easily be seen that tabled evaluations will be most efficient for PLPs when the evaluation minimizes resolution that uses non-preferred answers. For instance, in Figure 1 the evaluation postpones resolution of the selected literal of node 1 using the answer p(d) until the derivation stemming from node 6 is finished. In the course of this derivation, another answer, p(c), is derived which is preferred to p(d). At an intuitive level, if an optimization subgoal S1 occurs in a lower recursive component than an optimization subgoal S2, the evaluation can wait until all possible answers for S1 have been derived (until S1 is completely evaluated) before returning these answers to S2.
However, if S1 and S2 occur in the same recursive component, this may not always be possible, since preferred answers for S2 may depend on preferred answers for S1 and vice-versa. These notions are made precise in [6], which describes a method, local evaluation, that postpones the return of answers out of recursive components until these components are completely evaluated. Local evaluation is implemented in XSB as a user-selectable scheduling strategy. However, the predicate filterPO/2 is designed so that it reduces the use of non-preferred answers for resolution to a large extent, even when local evaluation is not used.
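Preferred answer filtering can be approximated in a batch fashion over a completed answer table. The following Python sketch is my own illustration, not XSB's incremental filterPO/2; it encodes the canonical preferences of Example 1, prefer(p(a),p(a)) and prefer(p(c),p(d)), as pairs:

```python
def filter_preferred(answers, prefer):
    """Keep only answers to which no derived answer is preferred.
    A self-preference such as (p(a), p(a)) also removes the answer,
    mirroring the cyclic-preference check."""
    return {a for a in answers
            if not any((b, a) in prefer for b in answers)}

# Canonical preferences of Example 1, as (better, worse) pairs.
PREFER = {('p(a)', 'p(a)'), ('p(c)', 'p(d)')}
```

On the answers of Figure 1 this yields only p(c): p(a) is removed by its self-preference, and p(d) is removed once p(c) has been derived.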
3 PLPs and Name and Address Standardization
We now turn to how PLPs can be used to improve a commercial name and address standardizer written in Prolog. We begin by briefly describing the architecture of the Prolog standardizer (further details may be found in [12,10]), before turning to the standardizer based on PLGs.

² For an illustration of how SLG_PLP can be formulated precisely, see [11], in which an aggregation operation is added to the tabling operations for Horn clause programs to implement Annotated Programs [8], which share many features of PLPs.

3.1 Prolog Standardizer Architecture
As indicated in the introduction, the input of the Prolog standardizer is a textual string, while the output is a structure consisting of standardized elements. Accordingly, the architecture of the Prolog standardizer consists of four stages:

– An initial tokenization phase, which converts the free-text record into a stream of tokens.
– A bottom-up parse, which corrects the spelling of tokens and is responsible for grouping designated token sequences into supertokens.
– A top-down, frame-oriented parse implemented using Prolog Definite Clause Grammars (DCGs).
– A final post-processing phase, which corrects badly parsed entities and handles inconsistent or missing data.

We discuss each of the last three phases in turn.

Bottom-up Parser. The bottom-up parse is responsible for simple correction and grouping of tokens, when the correction or grouping does not depend on encountering the tokens within a particular context. The bottom-up parse performs several functions:

– Explicit Translation, for instance translating keywords in foreign languages, such as 'AEROPORTO' to 'AIRPORT';
– Correcting Misspellings, such as correcting 'WISCONSON' to 'WISCONSIN';
– Supertokenization of sequences of tokens. One example of this is grouping the sequence 'SALT','LAKE','CITY' into 'SALT LAKE CITY', a town in Utah. If these tokens were not grouped, later stages of the parser might inadvertently recognize 'LAKE CITY', a town in Pennsylvania, as the city field. We call this grouping supertokenization.
– Correcting Line Breaks, such as correcting 'WASHING', |, 'TON' to 'WASHINGTON', where | denotes a line-break carriage-return pair.

The bottom-up parser is implemented as a series of list-processing routines that take the output of the raw tokenization and successively apply grouping and correction steps. Most of the code for the bottom-up parsing phase is automatically generated by declaring keywords such as cities, provinces, and so on [12].
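Supertokenization as described above amounts to a longest-match rewrite over the token stream. The following Python sketch is a hedged illustration; the greedy longest-match policy and the dictionary shape are my assumptions, not the product's actual code:

```python
def supertokenize(tokens, groups):
    """Greedily replace known multi-token sequences with a supertoken.

    groups: dict mapping tuples of tokens to their supertoken, e.g.
            ('SALT', 'LAKE', 'CITY') -> 'SALT LAKE CITY'.
    """
    out, i = [], 0
    longest = max((len(k) for k in groups), default=0)
    while i < len(tokens):
        for n in range(longest, 1, -1):          # prefer the longest match
            key = tuple(tokens[i:i + n])
            if len(key) == n and key in groups:
                out.append(groups[key])
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Preferring the longest match is what keeps 'SALT','LAKE','CITY' from being captured as 'LAKE CITY'.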
Top-down Parser. The Prolog top-down parser is structured as an LL(k) parser and coded using DCGs. The goal of the top-down parser is to fill in an entity frame, which is abstractly represented by the Prolog term:

frame(Type, entity(Name,Title,Rest),
      address(Room,Building,Street,PoBox,Town,State,Country,Zip,Rest),
      Telephone, Attention_name, Other)
To take a full example, consider an input sentence:

XYZ INC. ATTENTION MANUFACTURING DEPARTMENT
4 GATEWAY AVENUE MAPLE NC 27956

This will be parsed and standardized as:

Entity:
  Name:  XYZ
  Title: INC
Address:
  Attn:     MFRG DEPT
  Str/Dist: 4 GATEWAY AVE
  Town:     MAPLE
  State:    NC
  Zip:      27956
In the above parse, the entity name is standardized from XYZ INC. to XYZ, while the street number, city, state, and zipcode are all properly extracted. The field Attn indicates that a particular person or department within the company is also specified.

At an operational level, each stage of the parse is associated with a given element of the entity frame, into which tokens are placed by default. In the above example, XYZ is not a recognized keyword, so the top-down parser begins by assuming that it is parsing an entity name. When the parser encounters the token INC, it recognizes that INC may constitute the end of an organization name. Next, the parser encounters the keyword ATTN (produced from ATTENTION by a transformation in the bottom-up parse). By default it then enters a state in which it adds unknown tokens to the Attention field, and it remains in that state until it hits the number 4, which is the first token of various address elements, including street number. Parsing continues until the entire string has been consumed.

In the case of 'XYZ INC' the use of default entity-frame elements above is correct. In general, however, their use leads to complicated parsing code. The production for post-office boxes, for instance, must be placed into several different address parsing routines: routines whose default is to consider a token an Attention field, routines whose default is a street name, a city, and so on.

Post-processing. The top-down parser attempts to find the best entity-frame element for tokens by using its present context plus a short lookahead of the input token stream. Such guesses turn out to be wrong in a significant minority of cases, and need to be rectified in the post-processing phase. As an example, consider the string:

ALLIED INDUSTRY PA

Perhaps the most natural way for a human to parse this string is to take the organization name as ALLIED INDUSTRY, the city as empty, and the state as Pennsylvania.
However, INDUSTRY is in fact a town in Pennsylvania, and this information may lead us to conclude that the company name is actually ALLIED. Such post-processing is done by rules which have the form:
post_process_entity_name(
    ent_addr(Rel, entity(Name,Title,Rest),
             address(Rm,Bld,Str,Po,City,State,Country,Zp,Rst),
             Tel, Attn, Flags, Other),
    ent_addr(Rel, entity(Newname,Title,Rest),
             address(Rm,Bld,Str,Po,Penult,Ult,Country,Zp,Rst),
             Tel, Attn, Flags, Other)) :-
    is_null(City), is_null(State),
    last_two(Name, Penult, Ult, Newname),
    consistent_city_state(Penult, Ult).
This rule, which has been somewhat simplified for presentation, can be read as follows. If neither a city nor a state was found during the top-down parse, the rule checks whether the last two tokens of the name field form a consistent city/state pair. If so, those tokens are stripped from the entity name and added to the appropriate elements of the address.

Abstracting from the foregoing example, the fact that no city had been parsed was used to disambiguate the parse. Indeed, when global information is needed to disambiguate a parse, it is most easily done in the post-processing stage. The post-processing phase is also responsible for applying consistency checks to the output of the top-down parser. These consistency checks are based largely on the following fact bases:

– 42,000 United States cities with their states and 5-digit zip codes;
– the 500 largest Canadian cities with their provinces;
– 10,000 additional city-country pairs.

Depending on the fact base used, the post-processing phase can check the validity of various locations. If the standardizer does not recognize a valid location, it attempts to correct the spelling of the city name using a more aggressive algorithm than is permitted in earlier stages. To take a concrete example, if the city name in the parsed output is PITSBURG, the zipcode is 15123, and the country is US, we determine that the city corresponding to zipcode 15123 is PITTSBURGH. To make this transformation, the standardizer checks whether the string-edit distance is less than a predefined threshold (which is a function of the string length) and corrects the city name if so. Related algorithms are used for non-US cities.

3.2 Standardizing with PLPs
While an effective standardizer has been constructed using Prolog, it can be difficult to maintain due to the many default contexts needed by the top-down parse and to the lack of use of global information until the post-processing phase. In particular, choices of how to disambiguate strings like the ALLIED INDUSTRY PA example mentioned above may differ from one corpus (i.e., a commercial client) to another. The code can be made more maintainable by reducing the use of default frame elements into which tokens are parsed, and by more cleanly separating the task of parsing from the disambiguation and correction phase. We
now turn to how this has been accomplished via tabling and preference logic. Figure 2 illustrates how the address parser of Section 3.1 can be re-implemented using tabled DCGs. Given an input list, the optimization predicate preferred_address/1 is true if Addr unifies with a preferred address, whose definition is given below. Rather than using difference lists, as Prolog DCGs do, tabled DCGs work off of tokens that have been asserted to a database (see [4]). The predicate preferred_address/1 is then called, which finds all preferred address parses within the sentence. In order to do this, a scanning predicate, scan_address/3, calls the parsing routine address/1 at every possible position of the sentence.³
preferred_address(Addr):filterPO(scan_address(Address),prefer_address). :- table scan_address/1. scan_address(Addr):- scan_address(Addr,0,_).
% Begin with first token % of input sentence.
scan_address(Addr) --> address(Addr). scan_address(Addr) --> [_], scan_address(Addr). address([Elem|Rest])--> addr_element(Elem), address(Rest). address([]) --> tnot(addr_element(_Elem)). :- table addr_element/3. addr_element(room(Rm)) --> room(Rm). addr_element(building(Bld)) --> building(Bld). addr_element(street(Str)) --> street(Str). addr_element(pobox(PO)) --> pobox(PO). addr_element(csz(Csz)) --> city_state_zip(CSZ). addr_element(country(Ctry)) --> country(Ctry).
Fig. 2. Parsing an Address Using Tabling
³ In version 2.0 of XSB, the call tnot/1 in Figure 2 must be replaced by a call to 't_not'/1 in order to execute non-ground tabled negation.
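The scanning strategy of Figure 2, trying the address grammar at every start position of the sentence, can be mimicked outside of Prolog. This Python sketch is purely illustrative; `toy_address` is a hypothetical stand-in for the tabled address//1 grammar:

```python
def scan_addresses(tokens, parse_address):
    """Collect every parse obtained by starting the grammar at each
    position of the token list (cf. scan_address in Figure 2)."""
    results = []
    for start in range(len(tokens)):
        parsed = parse_address(tokens, start)
        if parsed is not None:
            results.append(parsed)
    return results

def toy_address(tokens, start):
    """Stand-in parser: an 'address' is a maximal run of uppercase tokens."""
    run = []
    for tok in tokens[start:]:
        if tok.isupper():
            run.append(tok)
        else:
            break
    return run or None
```

In the real system the preference rules then select among the collected parses; here the scan merely enumerates them.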
As will be substantiated in Section 4, the address parsing of Figure 2 is much simpler than the LL-parser of Section 3.1. Rather than duplicating productions within the default production for company, for street name, and so on, it simply constructs all possible parses and filters the preferred ones through the predicate preferred_address/1. As seen in Figure 2, preferred_address/1 calls the predicate filterPO/2, which performs preferred answer filtering. One clause for the preference rule prefer_address/2 is:

prefer_address(Address1, Address2):-
    Address1 \== Address2,
    get_csz(Address1, CSZ1), get_csz(Address2, CSZ2),
    weigh_csz(CSZ1, W1), weigh_csz(CSZ2, W2),
    W1 >= W2.
This rule calls a routine that assigns a weight to a triple of city, state, and zip elements, in which the weight depends on whether city is a valid city name, whether city is actually located in state, and whether zipcode is correct for the city. Other rules of the PLP standardizer, which are not shown, weigh an address depending on the validity of a street address, on whether a valid room number is present, and so on.

Pruning Using Preference Logic. Compared to the approach of Section 3.1, the standardizer just described is simple but inefficient. In particular, all addresses are generated before preferences are applied, so that no advantage is taken of pruning. However, in the case of addresses with valid city, state, zip triples, pruning can be programmed in a simple manner. To begin with, the fifth clause of addr_element/3 in Figure 2 can be modified to use a derived predicate

addr_element(csz(Csz)) --> preferred_city_state_zip(Csz).

so that only preferred city, state, zip triples will be propagated into addresses. The new predicates needed to implement preferred_city_state_zip/1 are analogous to those needed for preferred_address/1. Note that the use of pruning relies on the optimal subproblem property.
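The weighting of a city/state/zip triple can be sketched as a simple additive score. The fact-base contents and the one-point-per-check scoring below are my assumptions for illustration (the paper only states which checks the weight depends on):

```python
# Hypothetical fact base; the real standardizer uses ~42,000 US cities.
CITY_STATE = {('MAPLE', 'NC')}
ZIP_CITY = {'27956': 'MAPLE'}

def weigh_csz(city, state, zipcode):
    """Score a city/state/zip triple: one point per satisfied check."""
    score = 0
    if any(city == c for c, _ in CITY_STATE):
        score += 1                      # known city name
    if (city, state) in CITY_STATE:
        score += 1                      # city actually located in that state
    if ZIP_CITY.get(zipcode) == city:
        score += 1                      # zipcode agrees with the city
    return score
```

A prefer_address-style rule would then compare these scores for two candidate parses.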
4 Comparison of the Two Standardizers
Table 1 provides insight into the amount of code in each standardizer. Clearly, most code comprises domain information (mostly tables of cities, states, zip codes, countries, and so on), along with rules for the bottom-up parse, which, as mentioned in Section 3.1, is largely automatically generated based on declarations of keywords. The most elaborate code is in the top-down parse and in the post-processing; each of these sections of code is reduced. Indeed, the post-processing step is almost eliminated: some of it is moved into preference rules and reclassified
under the parsing phase, but much of it is avoided altogether. Thus, while using the new standardizer architecture does not lead to a large reduction in overall standardizer code, it greatly reduces the amount of code needed by later phases of standardization — the code that requires the most programmer maintenance.

Function                   Clauses   Lines
Tokenization                    94     412
Bottom-up Parse              26205   26205
Domain Information           59150   59150
Control and Utilities          727    1345
(Prolog) Top-Down              724    2082
(Prolog) Post-processing       604    2838
(PLP) Top-Down                 198     686
(PLP) Post-processing            7     106

Table 1. Code sizes for Prolog and XSB Standardizers
Testing on Defense Department data indicates that the PLP standardizer works correctly about 96-97% of the time, a rate that is virtually identical to that of the Prolog standardizer.⁴ Table 2 indicates the performance of the various standardizers in terms of records per second standardized on a PC. We note that the two standardizers differ slightly in their functionality, so the numbers in each table should be taken as approximate comparisons. Even with this disclaimer, it can be seen that the PLP standardizer drastically reduces code in the top-down parsing and post-processing stages. This is due both to the simpler grammatical forms that tabling allows and to the declarative use of preference rules that are combined with the grammar rather than applied after the entire string has been parsed. While the PLP standardizer is about 3 times slower than the Prolog standardizer, the tradeoff of speed for declarativity is beneficial for this application, since the costs of maintenance outweigh the performance costs as long as the latter remain reasonable. In any case, low-level optimizations to filterPO/2 have been identified and are scheduled to be implemented in XSB, so that the performance loss of the PLP standardizer may be reduced. In addition, the standardizer recoding was done by manually coding the application of preferences on DCGs. A library that implements the full PLG transformation is planned for XSB, consisting of the DCG transformation together with the PLP transformation sketched in Section 2.
5 Discussion
Commercial entities are often reluctant to use Prolog for program development, let alone extensions of Prolog that include preferences or other uncommon techniques for knowledge representation. We believe that it is only by developing
⁴ Verification is performed by human analysis of a random sample of data.
                     Prolog Stdzr   PLP Stdzr (no pruning)   PLP Stdzr (pruning)
Records per second             54                       14                    19

Table 2. Performance of Various Standardizers
efficient implementations of these techniques that their research and commercial applications can be discovered and tested — and that it is through such applications that the significance of the knowledge representation techniques will be judged. We have shown here how a simple logic for preferences can be implemented and applied to a commercial problem. Efficient implementation and large-scale application of more powerful logics for preferences, such as that of [1], which includes dynamic preferences, remain open.

Acknowledgements. The authors would like to thank Bharat Jayaraman and Kannan Govindarajan for their comments on a preliminary version of this paper. This work was partially supported by NSF grants CCR-9702581, EIA-97-5998, and INT-96-00598.
References

1. G. Brewka. Well-founded semantics for extended logic programs with dynamic preferences. Journal of Artificial Intelligence Research, 4:19–36, 1996.
2. W. Chen and D. S. Warren. Tabled Evaluation with Delaying for General Logic Programs. JACM, 43(1):20–74, January 1996.
3. C. Crowner, K. Govindarajan, B. Jayaraman, and S. Mantha. Preference logic grammars. Computer Languages, 1999. To appear.
4. B. Cui, T. Swift, and D. S. Warren. Using tabled logic programs and preference logic for data standardization. Available at http://www.cs.sunysb.edu/~tswift, 1998.
5. J. Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, 1970.
6. J. Freire, T. Swift, and D. S. Warren. Beyond depth-first: Improving tabled logic programs through alternative scheduling strategies. Journal of Functional and Logic Programming, 1998(3), 1998.
7. K. Govindarajan, B. Jayaraman, and S. Mantha. Preference logic programming. In ICLP, pages 731–746. MIT Press, 1995.
8. M. Kifer and V. S. Subrahmanian. Theory of generalized annotated logic programming and its applications. J. Logic Programming, 12(4):335–368, 1992.
9. J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, Germany, 1984.
10. I. V. Ramakrishnan, A. Roychoudhury, and T. Swift. A standardization tool for data warehousing. In Practical Applications of Prolog, 1997.
11. T. Swift. Tabling for non-monotonic programming. Annals of Mathematics and Artificial Intelligence, 1999. To appear.
12. T. Swift, C. Henderson, R. Holberger, J. Murphey, and E. Neham. CCTIS: an expert transaction processing system. In Sixth Conference on Industrial Applications of Artificial Intelligence, pages 131–140, 1994.
Minimal Founded Semantics for Disjunctive Logic Programming*

Sergio Greco

DEIS, Univ. della Calabria, 87030 Rende, Italy
[email protected]
Abstract. In this paper, we propose a new semantics for disjunctive logic programming and deductive databases. The semantics, called minimal founded, generalizes stable model semantics for normal (i.e. non-disjunctive) programs but differs from disjunctive stable model semantics (the extension of stable model semantics for disjunctive programs). Compared with disjunctive stable model semantics, the minimal founded semantics seems to be, in some cases, more intuitive; it gives meaning to programs which are meaningless under stable model semantics, and it is not harder to compute. We study the expressive power of the semantics and show that for general disjunctive datalog programs it has the same power as disjunctive stable model semantics. We also present a variation of the minimal founded semantics, called strongly founded, which on stratified programs coincides with the perfect model semantics.
1 Introduction
Several different semantics have been proposed for normal and disjunctive logic programs. Stable model semantics, first proposed for normal (i.e. disjunction-free) programs, has subsequently been extended to disjunctive programs. For normal programs, stable model semantics has been widely accepted since it captures the intuitive meaning of programs, and for stratified programs it coincides with the perfect model semantics, which is the standard semantics for this class of programs. For positive programs, stable model semantics coincides with the minimal model semantics, which is the standard semantics for positive disjunctive programs. However, the introduction of disjunction in the heads of rules does not guarantee uniqueness of the minimal model, even in the case of negation-free programs. For general disjunctive programs several semantics have been proposed. We mention here the (extended) generalized closed world assumption [17], the perfect model semantics [18], particularly suited for stratified programs, the disjunctive stable model semantics [12,19], and partial stable model semantics [19,7]. Disjunctive stable model semantics is widely accepted since i) it gives a good intuition of the meaning of programs, and ii) for normal programs it coincides with
* This work has been partially supported by ISI-CNR, by an EC grant under the project “Contact” and by MURST grants under the projects “Interdata” and “Telcal”.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 221–235, 1999.
© Springer-Verlag Berlin Heidelberg 1999
stable model semantics, and for positive programs it coincides with the minimal model semantics. However, disjunctive stable model semantics has some drawbacks. It is defined for a restricted class of programs, and there are several reasonable programs which are meaningless, i.e. they do not have stable models.

Motivating Examples. The following examples present some programs whose intuitive meaning is not captured by disjunctive stable model semantics.

Example 1. Consider the following simple disjunctive program P1:

a ∨ b ∨ c ←
← ¬a.
← ¬b.

where the second and third rules are constraints, i.e. rules which are satisfied only if the body is false. These rules can be rewritten into equivalent standard rules.¹ P1 has a unique minimal model M1 = {a, b}, but M1 is not stable. □

Thus, under stable model semantics the above program is meaningless. However, the intuitive meaning is captured by the unique minimal model, since the constraints force us to infer more than one atom from the disjunctive rule. The next example presents another program which has no stable models.

Example 2. Consider the following disjunctive program P2:

a ∨ b ∨ c ←
a ← ¬b.
b ← ¬c.
c ← ¬a.

P2 has three minimal models, {a, b}, {a, c} and {b, c}, but none of them is stable. □

The intuitive meaning of the above program is captured by the three alternative minimal models. Indeed, the non-disjunctive rules state that from the first rule we must infer at least two atoms among a, b and c. Intuitively, the problem with stable model semantics is that in some cases the inclusive disjunction is interpreted as an exclusive disjunction. Thus, in order to overcome some drawbacks of stable model semantics and to give semantics to a larger class of programs, we propose a different extension of stable model semantics for normal programs, called minimal founded semantics.
¹ A constraint rule of the form ← b1, ..., bk can be rewritten as p(X) ← b1, ..., bk, ¬p(X), where p is a new predicate symbol and X is the list of all distinct variables appearing in the source rule.
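The claims in Example 1 can be machine-checked by brute force over the finite Herbrand base. The following Python sketch is my own encoding (rule representation, function names, and the Gelfond-Lifschitz-style reduct test are assumptions about how one would check the example, not code from the paper):

```python
from itertools import chain, combinations

# P1 from Example 1: rules as (head_atoms, positive_body, negative_body).
# A constraint is a rule with an empty head: unsatisfiable when its body holds.
P1 = [({'a', 'b', 'c'}, set(), set()),   # a v b v c <-
      (set(), set(), {'a'}),             # <- not a
      (set(), set(), {'b'})]             # <- not b

ATOMS = {'a', 'b', 'c'}

def satisfies(m, rules):
    """A rule is satisfied when its head intersects m or its body is false."""
    return all(h & m or not (p <= m and not (n & m)) for h, p, n in rules)

def models(rules):
    subsets = chain.from_iterable(combinations(sorted(ATOMS), r)
                                  for r in range(len(ATOMS) + 1))
    return [frozenset(s) for s in subsets if satisfies(frozenset(s), rules)]

def minimal(ms):
    return [m for m in ms if not any(m2 < m for m2 in ms)]

def stable(rules):
    """Disjunctive stable models: minimal models of the reduct w.r.t. m."""
    out = []
    for m in models(rules):
        reduct = [(h, p, set()) for h, p, n in rules if not (n & m)]
        if m in minimal(models(reduct)):
            out.append(m)
    return out
```

Running this confirms the example: {a, b} is the unique minimal model of P1, yet P1 has no stable model.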
Contributions. The main contributions of the paper are the following:

– We introduce a new semantics for disjunctive programs. The proposed semantics seems to be more intuitive than stable model semantics, and it gives meaning to programs which are meaningless under disjunctive stable model semantics.
– We show that the new semantics coincides with stable model semantics for normal (i.e. disjunction-free) and positive programs. Therefore, the proposed semantics differs from stable model semantics only for programs containing both disjunctive rules and negation.
– We formally define the expressive power and complexity of the new semantics for datalog programs and show that the proposed semantics has the same expressive power and complexity as disjunctive stable model semantics.

Organization of the Paper. The sequel of the paper is organized as follows. Section 2 presents preliminaries on disjunctive datalog and on minimal and stable model semantics. Section 3 introduces the minimal founded semantics and investigates its relationship with stable model semantics. Section 4 presents results on the expressive power of minimal founded semantics, whereas Section 5 presents the data complexity results. Finally, Section 6 presents our conclusions.
2 Preliminaries
A (disjunctive datalog) rule r is a clause of the form

a1 ∨ ... ∨ an ← b1, ..., bk, ¬bk+1, ..., ¬bm,    n ≥ 1, m ≥ 0.
a1, ..., an, b1, ..., bm are atoms of the form p(t1, ..., tn), where p is a predicate of arity n and the terms t1, ..., tn are constants or variables. The disjunction a1 ∨ ... ∨ an is the head of r, while the conjunction b1, ..., bk, ¬bk+1, ..., ¬bm is the body of r. Moreover, if n = 1 we say that the rule is normal, i.e. not disjunctive. We denote by H(r) the set {a1, ..., an} of head atoms, and by B(r) the set {b1, ..., bk, ¬bk+1, ..., ¬bm} of body literals. We often use upper-case letters, say L, to denote literals. As usual, a literal is an atom A or a negated atom ¬A; in the former case it is positive, and in the latter negative. Two literals are complementary if they are of the form A and ¬A, for some atom A. For a literal L, ¬L denotes its complementary literal, and for a set S of literals, ¬S = {¬L | L ∈ S}. Moreover, B+(r) and B−(r) denote the sets of positive and negative literals occurring in B(r), respectively. A (disjunctive) logic program is a finite set of rules. A ¬-free (resp. ∨-free) program is called positive (resp. normal). A term (resp. an atom, a literal, a rule or a program) is ground if no variables occur in it. In the following we also
224
S. Greco
assume the existence of rules with an empty head, which define constraints², i.e. rules which are satisfied only if the body is false. Moreover, a rule defining a constraint of the form ← B(X), where B(X) denotes the body conjunction and X denotes the list of variables appearing in the body of the rule, can be rewritten as a normal rule of the form p(X) ← B(X), ¬p(X), where p is a new predicate symbol. The Herbrand Universe UP of a program P is the set of all constants appearing in P, and its Herbrand Base BP is the set of all ground atoms constructed from the predicates appearing in P and the constants from UP. A rule r0 is a ground instance of a rule r if r0 is obtained from r by replacing every variable in r with some constant in UP. We denote by ground(P) the set of all ground instances of the rules in P. An interpretation of P is any subset of BP. The value of a ground atom L w.r.t. an interpretation I, valueI(L), is true if L ∈ I and false otherwise. The value of a ground negated literal ¬L is ¬valueI(L). The truth value of a conjunction of ground literals C = L1, . . . , Ln is the minimum over the values of the Li, i.e., valueI(C) = min({valueI(Li) | 1 ≤ i ≤ n}), while the value valueI(D) of a disjunction D = L1 ∨ ... ∨ Ln is their maximum, i.e., valueI(D) = max({valueI(Li) | 1 ≤ i ≤ n}); if n = 0, then valueI(C) = T and valueI(D) = F. Finally, a ground rule r is satisfied by I if valueI(H(r)) ≥ valueI(B(r)). Thus, a rule r with empty body is satisfied by I if valueI(H(r)) = T, whereas a rule r0 with empty head is satisfied by I if valueI(B(r0)) = F. An interpretation M for P is a model of P if M satisfies each rule in ground(P). Minker proposed in [17] a model-theoretic semantics for positive P, which assigns to P the set of its minimal models MM(P), where a model M for P is minimal if no proper subset of M is a model for P. Accordingly, the program P = {a ∨ b ←} has the two minimal models {a} and {b}, i.e.
MM(P) = { {a}, {b} }. The more general disjunctive stable model semantics also applies to programs with (unstratified) negation [12,19]. Disjunctive stable model semantics generalizes stable model semantics [11], previously defined for normal programs.
Definition 1. For any interpretation I, denote with P^I the ground positive program derived from ground(P)
1. by removing all rules that contain a negative literal ¬a in the body and a ∈ I, and
2. by removing all negative literals from the remaining rules.
An interpretation M is a (disjunctive) stable model of P if and only if M ∈ MM(P^M). □
For general P, the stable model semantics assigns to P the set SM(P) of its stable models. It is well known that stable models are minimal models (i.e. SM(P) ⊆ MM(P)) and that for negation-free programs minimal and stable model semantics coincide (i.e. SM(P) = MM(P)).
² Under total semantics.
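The reduct-based definitions above lend themselves to a naive generate-and-test prototype. The sketch below is illustrative only (the function names are mine, not from the paper, and the exhaustive enumeration is exponential, so it is suitable only for small ground programs): it computes minimal models and disjunctive stable models of a ground program given as triples (head, positive body, negative body).

```python
from itertools import chain, combinations

# A ground disjunctive rule is a triple (head, pos_body, neg_body) of frozensets.
def rule(head, pos=(), neg=()):
    return (frozenset(head), frozenset(pos), frozenset(neg))

def interpretations(base):
    atoms = sorted(base)
    return [frozenset(s) for s in
            chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))]

def is_model(i, program):
    # I satisfies a rule when a true body forces at least one head atom into I.
    return all(head & i or not (pos <= i and not neg & i)
               for head, pos, neg in program)

def minimal_models(program, base):
    models = [i for i in interpretations(base) if is_model(i, program)]
    return [m for m in models if not any(n < m for n in models)]

def reduct(program, i):
    # Gelfond-Lifschitz reduct P^I: drop rules whose negative body intersects I,
    # then drop the remaining negative literals (Definition 1).
    return [(h, p, frozenset()) for h, p, n in program if not n & i]

def stable_models(program, base):
    # M is stable iff M is a minimal model of the reduct P^M.
    return [i for i in interpretations(base)
            if i in minimal_models(reduct(program, i), base)]
```

On P = {a ∨ b ←} this yields the two minimal models {a} and {b}, both stable, matching the example above.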
Minimal Founded Semantics for Disjunctive Logic Programming
225
An extension of the perfect model semantics for stratified datalog programs to disjunctive programs has been proposed in [19]. A disjunctive datalog program P is said to be locally stratified if there exists a decomposition S1, ..., Sω of the Herbrand base such that for every (ground instance of a) clause A1 ∨ ... ∨ Ak ← B1, ..., Bm, ¬C1, ..., ¬Cn in P, there exists an l, called the level of the clause, so that:
1. ∀i ≤ k stratum(Ai) = l,
2. ∀i ≤ m stratum(Bi) ≤ l, and
3. ∀i ≤ n stratum(Ci) < l,
where stratum(A) = i iff A ∈ Si. The set of clauses in ground(P) having level i (resp. ≤ i) is denoted by Pi (resp. Pi*). Any such decomposition of the ground instantiation of a program P is called a local stratification of P. The preference order ≺ on the models of P is defined as follows: M ≺ N iff M ≠ N and for each a ∈ M − N there exists a b ∈ N − M such that stratum(a) > stratum(b). Intuitively, stratum(a) > stratum(b) means that a has higher priority than b.
Definition 2. Let P be a (locally) stratified disjunctive datalog program. A model M for P is perfect if there is no model N such that N ≺ M. The collection of all perfect models of P is denoted by PM(P). □
Consider for instance the program consisting of the clause

a ∨ b ← ¬c.

The minimal models are M1 = {a}, M2 = {b} and M3 = {c}. Since stratum(a) > stratum(c) and stratum(b) > stratum(c), we have that M1 ≺ M3 and M2 ≺ M3. Therefore, only M1 and M2 are perfect models. Notice that M ⊂ N implies M ≺ N; thus, for stratified P, PM(P) ⊆ MM(P). Moreover, for positive P, MM(P) = PM(P) and for stratified P, PM(P) = SM(P) ⊆ MM(P). The computation of the perfect model semantics of a program P can be done by considering a decomposition (P1, ..., Pω) of P and computing the minimal models of all subprograms, one at a time, following the linear order [10].
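The preference order ≺ and the resulting perfect models can be checked directly on a list of candidate models. A minimal sketch (the function names are mine; `stratum` is assumed to be a map from atoms to their level):

```python
def prefers(m, n, stratum):
    # M ≺ N: M ≠ N and every atom of M − N is traded for an atom of N − M
    # lying in a strictly lower stratum.
    m, n = set(m), set(n)
    return m != n and all(any(stratum[a] > stratum[b] for b in n - m)
                          for a in m - n)

def perfect_models(models, stratum):
    # A model is perfect when no other candidate model is preferable to it.
    return [m for m in models if not any(prefers(n, m, stratum) for n in models)]
```

For the program {a ∨ b ← ¬c} with stratum(c) = 0 and stratum(a) = stratum(b) = 1, this keeps exactly {a} and {b}, as in the discussion above.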
3 Minimal Founded Semantics
In this section we introduce a new semantics for disjunctive programs.
Definition 3. Let P be a positive disjunctive program and let M be an interpretation. Then, SP(M) = {a ∈ BP | ∃r ∈ ground(P) such that a ∈ H(r) and B(r) ⊆ M}. S_P^ω(∅) denotes the least fixpoint of the operator SP. □
The operator SP extends the classical immediate consequence operator TP to disjunctive programs. It is obvious that the operator SP, for positive P, is monotonic and continuous and, therefore, it admits a least fixpoint.
Definition 4. Let P be a disjunctive program and let M be an interpretation. Then, P(M) denotes the positive program derived from ground(P) as follows: for each rule r : a1 ∨ ... ∨ ak ← b1, ..., bm, ¬c1, ..., ¬cn,
1. delete r if there is some ci ∈ M;
2. delete all remaining negated literals ¬ci s.t. ci ∉ M;
3. delete all head atoms ai ∉ M if there is some aj ∈ M. □
The difference between P(M) and P^M is in Item 3. Thus, in the generation of P(M) we also delete the atoms appearing in the head of rules which are false in the interpretation M, provided that the head of the rule contains some other atom true in M. Clearly, for normal programs P(M) = P^M.
Example 3. Consider for instance the program P1 of Example 1 and the interpretation M1 = {a, b}. P1(M1) consists of the unique rule

a ∨ b ←

Consider now the program P2 of Example 2 and the interpretation M21 = {a, b}. The program P2(M21) consists of the rules

a ∨ b ←
b ← □
Definition 5. Let P be a disjunctive program and let M be a model for P. Then, M is a founded model if it is contained in S_{P(M)}^ω(∅). Moreover, M is said to be minimal founded if it is a minimal model of P and it is also founded. □
Example 4. The program P1 of Example 1 has a unique minimal model M1 = {a, b}, which is also founded since it is the least fixpoint of the operator S_{P1(M1)}. The program P2 of Example 2 has three minimal models M21 = {a, b}, M22 = {a, c} and M23 = {b, c}, which are all minimal founded since M21, M22, M23 are fixpoints of S_{P2(M21)}, S_{P2(M22)} and S_{P2(M23)}, respectively. □
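Definitions 3-5 suggest a direct check: build P(M), iterate SP to its least fixpoint, and test M ⊆ S_{P(M)}^ω(∅). A naive sketch (the names are illustrative, not from the paper; ground rules are triples of frozensets, and enumeration of candidate models is left out):

```python
def s_fixpoint(positive):
    # Least fixpoint of S_P: collect every head atom of a rule whose body holds.
    derived = set()
    while True:
        new = {a for head, body in positive if body <= derived for a in head}
        if new <= derived:
            return derived
        derived |= new

def p_of_m(program, m):
    # The transformation P(M) of Definition 4, on ground rules (head, pos, neg).
    out = []
    for head, pos, neg in program:
        if neg & m:                       # item 1: a negated atom is true in M
            continue
        true_head = head & m              # item 3: keep only the true head atoms,
        out.append((true_head if true_head else head, pos))  # if any are true
    return out

def is_founded(program, m):
    # Definition 5: M is founded when M ⊆ S^ω_{P(M)}(∅).
    m = set(m)
    return m <= s_fixpoint(p_of_m(program, m))
```

For instance, on the program {a ∨ b ←, a ←, c ← ¬b}, both {a, c} and {a, b} come out founded, while {a, b, c} does not.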
In the following we shall denote the set of minimal founded models by MF(P). The following result states that for disjunction-free programs, stable model semantics and minimal founded semantics coincide.
Proposition 1. Let P be a normal program. Then, SM(P) = MF(P).
Proof. Clearly, for any normal program P and any interpretation M of P, P^M = P(M). If M is a minimal model and the unique minimal model of P(M) is M, then M is also a stable model. Moreover, if M is a stable model for P, M is a minimal model and since P(M) = P^M, it is also minimal founded. □
The following example presents a disjunctive program where stable and minimal founded semantics coincide.
Example 5. Consider the following simple disjunctive program P5:

a ∨ b ∨ c ←
a ← ¬b, ¬c.
b ← ¬a.
c ← ¬a.

This program has two stable models M51 = {a} and M52 = {b, c}, which are also minimal founded. □
Moreover, for general programs containing both disjunction and negation, stable and minimal founded semantics do not coincide. The relation between the two semantics is given by the following result.
Theorem 1. Let P be a disjunctive program. Then, SM(P) ⊆ MF(P).
Proof (Sketch). It is well known that stable models are also minimal models. It is sufficient to show that every stable model is founded, i.e. M ⊆ S_{P(M)}^ω(∅). Clearly, every minimal model of P^M is contained in S_{P(M)}^ω(∅) and, therefore, since M ∈ MM(P^M), M ⊆ S_{P(M)}^ω(∅), i.e. M is founded. Therefore, SM(P) ⊆ MF(P). □
As shown by the previous examples, the containment may be strict: there are programs having minimal founded models which are not stable.
Corollary 1. Let P be a positive program. Then, MM(P) = MF(P).
Proof. From Theorem 1, SM(P) ⊆ MF(P). Moreover, by definition MF(P) ⊆ MM(P). Since for positive programs SM(P) = MM(P), we conclude that MF(P) = MM(P). □
Therefore, for positive programs, minimal model semantics, stable model semantics and minimal founded semantics coincide.
Proposition 2.
Let P be a stratified program. Then, MF(P) ≠ ∅. □
The above result states that, under minimal founded semantics, stratified programs have a well-defined meaning. However, also for stratified programs the sets of stable and minimal founded models may be different.
Example 6. Consider the program P6:

a ∨ b ←
a ←
c ← ¬b

This program has two minimal founded models M61 = {a, c} and M62 = {a, b}, but only M61 is stable. □
The previous results state that all programs having a stable model semantics also have a minimal founded semantics although, as shown by our examples, there are programs which have a well-defined meaning under minimal founded semantics but are meaningless under stable model semantics. It is worth noting that both stable and minimal founded semantics consider minimal models whose atoms can be 'derived' from the program. Stable model semantics is more restrictive since for a given program P it considers only minimal models M which belong to MM(P^M), whereas the minimal founded semantics considers all minimal models whose atoms can be derived from the program, i.e. all minimal models M contained in S_{P(M)}^ω(∅). It could be interesting to compare the two semantics on the basis of abstract properties [2].
4 Expressive Power and Complexity
In this section we present some results on the expressive power and the data complexity of minimal founded semantics for disjunctive datalog programs. We first introduce some preliminary definitions and notation and then present our results. Predicate symbols are partitioned into the two sets of base (EDB) and derived (IDB) predicates. Base predicate symbols correspond to database relations on a countable domain U and do not occur in rule heads. Derived predicate symbols appear in the heads of rules. Possible constants in a program are taken from the domain U. A program P has an associated relational database scheme DBP = {r | r is an EDB predicate symbol of P}; thus EDB predicate symbols are seen as relation symbols. A database D on DBP is a set of finite relations, one for each r in DBP, denoted by D(r). The set of all databases on DBP is denoted by DP. Given a database D ∈ DP, PD denotes the following logic program: PD = P ∪ {r(t) ← | r ∈ DBP ∧ t ∈ D(r)}. The Herbrand universe UPD is a finite subset of U and consists of all constants occurring in P or in D (active domain). If D is empty and no constant occurs in P, then UPD is assumed to be equal to {a}, where a is any constant in U.
Definition 6. A (bound) query Q is a pair ⟨P, G⟩, where P is a disjunctive program and G is a ground literal (the query goal). □
The result of a query Q = ⟨P, G⟩ on an input database D is defined in terms of the minimal founded models of PD, by taking either the union of all models (possible inference, ∃MF) or the intersection (certain inference, ∀MF).
Definition 7. Given a program P and a database D, a ground atom G is true, under possible semantics, if there exists a minimal founded model M for PD such that G ∈ M. Analogously, G is true, under certain semantics, if G is true in every minimal founded model. The set of all queries is denoted by Q. □
Definition 8. Let Q = ⟨P, G⟩ be a query. Then the database collection of Q w.r.t. the set of minimal founded models MF is:
(a) under the possible version of minimal founded semantics, the set of all databases D in DP such that G is true in PD under the possible version of minimal founded semantics; this set is denoted by EXP∃MF(Q);
(b) under the certain version of minimal founded semantics, the set of all databases D in DP such that G is true in PD under the certain version of minimal founded semantics; this set is denoted by EXP∀MF(Q).
The expressive power of a given version (either possible or certain) of minimal founded semantics is given by the family of the database collections of all possible queries, i.e., EXP∃MF[Q] = {EXP∃MF(Q) | Q ∈ Q} and EXP∀MF[Q] = {EXP∀MF(Q) | Q ∈ Q}. □
It is well known that the database collection of every query is indeed a generic set of databases [1]. Recall that a set D of databases on a database scheme DB with domain U is (K-)generic [4,1] if there exists a finite subset K of U such that for any D in D and for any isomorphism θ on relations extending a permutation on U − K, θ(D) is in D as well; informally, all constants not in K are not interpreted and the relationships among them are only those explicitly provided by the databases. Note that for a query Q = ⟨P, G⟩, K consists of all constants occurring in P and in G. From now on, any generic set of databases will be called a database collection. Following the data complexity approach of [4,22], in which the query is assumed to be constant while the database is the input, the expressive power coincides with the complexity class of the problems of recognizing each query database collection. The expressive power of each semantics will be compared with database complexity classes, defined as follows. Given a Turing machine complexity class C (for instance P or NP), a relational database scheme DB, and a database collection D on DB, D is C-recognizable if the problem of deciding whether D is in D is in C. The database complexity class DB-C is the family of all C-recognizable database collections (for instance, DB-P is the family of all database collections that are recognizable in polynomial time). If the expressive
power of a given semantics coincides with some complexity class DB-C, we say that the given semantics captures (or expresses all queries in) DB-C. Recall that the classes ΣkP, ΠkP of the polynomial hierarchy [21] are defined by Σ0P = Π0P = P, Σi+1P = NP^(ΣiP), and ΠiP = co-ΣiP, for all i ≥ 0. In particular, Σ1P = NP and Π1P = co-NP. By Fagin's Theorem [9] and its generalization in [21], complexity and second-order definability are linked as follows.
Fact 1 ([9,21]) A database collection D over a scheme R is in DB-ΣkP, k ≥ 1, iff it is definable by a second-order formula (∃A1)(∀A2) · · · (Qk Ak)φ on R, where the Ai are lists of predicate variables preceded by alternating quantifiers and φ is first-order. □

4.1 Expressive Power
It is well known that disjunctive datalog under total stable model semantics captures the complexity classes Σ2P and Π2P under the possible and certain semantics, respectively. The following example presents a program which defines a Σ2P-complete problem [3]. The definition of the problem by means of a disjunctive program has been taken from [8].
Example 7. A holding owns companies, each of which produces some goods. Moreover, several companies may have joint control over another company. Now, some companies should be sold, under the constraint that all goods can still be produced and that no company is sold which would still be controlled by the holding after the transaction. A company is strategic if it belongs to a strategic set, which is a minimal set of companies satisfying these constraints. The query consists in checking whether a given company "a" is strategic. This query can be expressed as ⟨SC, st(a)⟩, where SC is defined as follows:

st(C1) ∨ st(C2) ← pb(P, C1, C2).
st(C) ← cb(C, C1, C2, C3), st(C1), st(C2), st(C3).

Here st(C) means that C is strategic, pb(P, C1, C2) that product P is produced by companies C1 and C2, and cb(C, C1, C2, C3) that C is jointly controlled by C1, C2 and C3; following [3], we assume that each product is produced by at most two companies and each company is jointly controlled by at most three other companies. The problem consists in checking whether the company a is strategic, i.e. whether there is a stable model containing st(a). □
Thus, the strategic companies problem can be defined by means of the disjunctive program reported above under the possible version of disjunctive stable model semantics (see [8]).
Theorem 2. EXP∃MF[Q] = DB-Σ2P.
Proof. We first prove that for any query Q = ⟨P, G⟩ in Q, recognizing whether a database D is in EXP∃MF(Q) is in Σ2P. D is in EXP∃MF(Q) iff there exists a minimal founded model M of PD such that G ∈ M. To check this, we may guess an interpretation M of PD and verify that: 1) M is a minimal model of PD, 2) M is founded, and 3) G ∈ M. To solve Step 1 we can verify in polynomial time that M is a model of PD and use an NP oracle to ask whether M is not minimal (the oracle guesses an interpretation N ⊂ M and checks that N is a model for PD). If the answer of the oracle is "no" (i.e. M is a minimal model), we check in polynomial time Steps 2 and 3. Therefore, recognizing whether a database D is in EXP∃MF(Q) is in Σ2P. To prove completeness it is sufficient to show that there is some Σ2P-complete problem which can be expressed by disjunctive datalog under the possible version of minimal founded semantics. The strategic companies problem of Example 7 is Σ2P-complete and is expressed by means of a positive disjunctive datalog program under the possible version of stable model semantics [8]. Since for positive disjunctive programs the sets of stable and minimal founded models coincide, we conclude that this program defines the strategic companies problem also under the possible version of minimal founded semantics. □
Theorem 3. EXP∀MF[Q] = DB-Π2P.
Proof (Sketch). We first prove that for any query Q = ⟨P, G⟩ in Q, recognizing whether a database D is in EXP∀MF(Q) is in Π2P. To this end, let us consider the complementary problem: is it true that D is not in EXP∀MF(Q)? Now, D is not in EXP∀MF(Q) iff there exists a minimal founded model M of PD such that G ∉ M. Following the line of the proof of Theorem 2, we can easily see that the latter problem is in Σ2P. Hence, recognizing whether a database D is in EXP∀MF(Q) is in Π2P. Let us now prove that every Π2P-recognizable database collection D on a database scheme DB is in EXP∀MF[Q].
By Fact 1, D is defined by a second-order formula of the form (∀R1)(∃R2)Φ(R1, R2). Using the usual transformation technique, the above formula is equivalent to a second-order Skolem form formula (∀S1)(∃S2)Γ(S1, S2), where Γ(S1, S2) = (∀X)(∃Y)(Θ1(S1, S2, X, Y) ∨ . . . ∨ Θk(S1, S2, X, Y)), and S1 and S2 are two lists of respectively m1 and m2 predicate symbols, containing all symbols in R1 and R2, respectively. Consider the following program P:

r1: s1j(Wj1) ∨ ŝ1j(Wj1) ←                    (1 ≤ j ≤ m1)
r2: s2j(Wj2) ∨ ŝ2j(Wj2) ←                    (1 ≤ j ≤ m2)
r3: q(X) ← Θi(S1, S2, X, Y)                  (1 ≤ i ≤ k)
r4: g ← ¬q(X).
r5: g ← s2j(Wj2), ŝ2j(Wj2)                   (1 ≤ j ≤ m2)
r6: ŝ2j(Wj2) ← g.                            (1 ≤ j ≤ m2)
r7: s2j(Wj2) ← g                             (1 ≤ j ≤ m2)
where, intuitively, ŝ1j(Wj1) corresponds to ¬s1j(Wj1) and ŝ2j(Wj2) corresponds to ¬s2j(Wj2). Now it is easy to show that the formula (∀S1)(∃S2)Γ(S1, S2) is valid iff g is false in all minimal founded models of P. □
Therefore, the expressive power of disjunctive datalog under minimal founded and stable model semantics is the same.

4.2 Data Complexity
Data complexity is usually closely tied to the expressive power and, in particular, it provides an upper bound for the expressive power.
Theorem 4. Given a disjunctive program P, a database D on DBP, and an interpretation M for PD, deciding whether M is a minimal founded model for PD is coNP-complete.
Proof (Sketch). Let M be an interpretation and consider the complementary problem Π: is it true that M is not a minimal founded model? Π is in NP since we can guess an interpretation N and verify in polynomial time that (i) N is a model for PD and (ii) either M is not a model for PD or N is a proper subset of M; moreover, checking whether M is founded can be done in polynomial time. Hence the original problem is in coNP. Deciding whether M is a stable model for PD is also coNP-complete; hardness can be proved in a similar way (cf. [5]). □
The results on the data complexity of queries under minimal founded semantics are immediate consequences of the expressiveness results.
Theorem 5. Let Q = ⟨P, G⟩ be a query and D a database. Deciding whether PD has a minimal founded model is Σ2P-complete. □
Theorem 6. (Possible and certain inference) Let Q be a query and D a database. Deciding whether Q is true under the possible version of the minimal founded semantics is Σ2P-complete, whereas under the certain version it is Π2P-complete. □
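Definitions 6-8 reduce query answering to enumerating minimal founded models. For intuition, a brute-force evaluator can be sketched as follows (all names are illustrative and mine, not the paper's; the enumeration is exponential and only suitable for tiny ground programs, consistent with the completeness results above):

```python
from itertools import chain, combinations

def interpretations(base):
    atoms = sorted(base)
    return [frozenset(s) for s in
            chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))]

def is_model(i, program):
    return all(head & i or not (pos <= i and not neg & i)
               for head, pos, neg in program)

def s_fixpoint(positive):
    derived = set()
    while True:
        new = {a for head, body in positive if body <= derived for a in head}
        if new <= derived:
            return derived
        derived |= new

def mf_models(program, base):
    # Minimal models of P that are founded, i.e. contained in S^ω_{P(M)}(∅).
    models = [i for i in interpretations(base) if is_model(i, program)]
    minimal = [m for m in models if not any(n < m for n in models)]
    founded = []
    for m in minimal:
        pm = [((head & m or head), pos)                      # P(M), Definition 4
              for head, pos, neg in program if not neg & m]
        if m <= s_fixpoint(pm):
            founded.append(m)
    return founded

def possible(program, base, goal):   # ∃MF inference: goal true in some MF model
    return any(goal in m for m in mf_models(program, base))

def certain(program, base, goal):    # ∀MF inference: goal true in every MF model
    return all(goal in m for m in mf_models(program, base))
```

On the program {a ∨ b ←, a ←, c ← ¬b}, the atom c is a possible but not a certain consequence, while a is certain.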
5 Strongly Founded Semantics
As shown by the program of Example 6, the minimal founded semantics admits models which are not intuitive. Indeed, the intuitive meaning of stratified programs is captured by the perfect model semantics. Thus, in this section we introduce a refinement of the minimal founded semantics, called strongly founded semantics, which on stratified programs coincides with the perfect model semantics. Let P be a disjunctive datalog program and let S1, ..., Sω be a decomposition of the Herbrand base such that for every (ground instance of a) clause A1 ∨ ... ∨ Ak ← B1, ..., Bm, ¬C1, ..., ¬Cn in P, there exists an l, called the level of the clause, so that:
1. ∀i ≤ k stratum(Ai) = l;
2. ∀i ≤ m stratum(Bi) ≤ l;
3. ∀i ≤ n stratum(Ci) = l if there is some Aj → Ci (1 ≤ j ≤ k), and stratum(Ci) < l otherwise;
where stratum(A) = i iff A ∈ Si. The set of clauses in ground(P) having level i (resp. ≤ i) is denoted by Pi (resp. Pi*). Any such decomposition of the ground instantiation of a program P is called an ordered decomposition of P. Observe that the level of ground clauses as defined above slightly differs from the one used in the definition of local stratification (the two definitions differ in Item 3, since we also consider unstratified programs; for stratified programs the two definitions coincide). The preference order on the models of P is defined as follows: M ≺ N (M is preferable to N) iff M ≠ N and for each a ∈ M − N there exists a b ∈ N − M such that stratum(a) > stratum(b). Intuitively, stratum(a) > stratum(b) means that a has higher priority than b. A model M is said to be preferred if there is no model N ≺ M.
Definition 9. Let P be a disjunctive datalog program. A model M for P is said to be strongly founded if it is founded and there is no model N such that N ≺ M. The collection of all strongly founded models of P is denoted by SF(P). □
Theorem 7. Let P be a disjunctive datalog program. Then, SM(P) ⊆ SF(P) ⊆ MF(P).
Proof. SF(P) ⊆ MF(P) is obvious since strongly founded models are restricted minimal founded models. Let us now prove that SM(P) ⊆ SF(P), i.e., that for each M ∈ SM(P), M ∈ SF(P). Assume that this is not true, i.e. that there is a model N ∈ MF(P) such that N ≺ M. Let Ni = N ∩ Si (resp. Mi = M ∩ Si) and Ni* = N ∩ Si* (resp. Mi* = M ∩ Si*), where Si* = ∪_{j≤i} Sj. Let k be the first ordinal such that N*_{k+1} ⊂ M*_{k+1} and for h ≤ k Nh = Mh (i.e., N*_{k+1} ⊂ M*_{k+1} and N*_k = M*_k). Since M is a stable model we have that M ∈ MM(P^M) and that M*_{k+1} ∈ MM((P*_{k+1})^(M*_k)). But M*_{k+1} ∈ MM((P*_{k+1})^(M*_k)) implies that N*_{k+1} ⊄ M*_{k+1}. Therefore there is no k such that N*_{k+1} ⊂ M*_{k+1} and, consequently, there is no minimal founded model N ≺ M. □
Example 8. The program of Example 6 has two minimal founded models M61 = {a, c} and M62 = {a, b}, but only M61 is strongly founded, since M61 ≺ M62. As observed in Example 6, M61 is also stable. The program of Example 1 has a unique minimal founded model, which is strongly founded. The program of Example 2 has three minimal founded models, which are also strongly founded. □
Theorem 8. Let P be a locally stratified program. Then, SF(P) = PM(P).
Proof. For (locally) stratified programs the definitions of local stratification and ordered decomposition coincide. The definitions of perfect model and preferred model coincide too and, therefore, SF(P) ⊇ PM(P). We show that SF(P) ⊆ SM(P) = PM(P), i.e. that for every M ∈ SF(P), M ∈ SM(P), or equivalently M ∈ MM(P^M). Assume the existence of a model N of P^M with N ⊂ M. Let Ni = N ∩ Si (resp. Mi = M ∩ Si) and Ni* = N ∩ Si* (resp. Mi* = M ∩ Si*), where Si* = ∪_{j≤i} Sj. Let k be the first ordinal such that N_{k+1} ⊂ M_{k+1} (for h ≤ k, Nh = Mh). We have that M*_{k+1} ∈ MM((P*_{k+1})^(M*_k)) and N*_{k+1} ∈ MM((P*_{k+1})^(N*_k)). Since M*_k = N*_k, M*_{k+1} = N*_{k+1} ⊆ N and, therefore, there is no k such that M*_{k+1} ⊉ N. Thus, M ⊆ N, i.e. M is a minimal model of P^M. □
The above theorem states that for stratified programs, stable model semantics and strongly founded semantics coincide.
Corollary 2. Let P be a positive disjunctive datalog program. Then MM(P) = SF(P).
Proof. SF(P) ⊆ MF(P) ⊆ MM(P), by Theorem 7 and since MF(P) ⊆ MM(P) by definition. For positive disjunctive programs, SM(P) = MM(P) and, therefore, SF(P) = MF(P) = MM(P). □
Corollary 3. Let P be a standard datalog program. Then SM(P) = SF(P).
Proof. SM(P) ⊆ SF(P) ⊆ MF(P) by Theorem 7. For standard datalog programs, SM(P) = MF(P) (by Proposition 1). Therefore, SM(P) = SF(P). □
We conclude this section by mentioning that the strongly founded and minimal founded semantics have the same expressive power and the same data complexity. The formal results on the expressive power and data complexity of the strongly founded semantics can be found in the extended version of the paper [14].
6 Conclusion
The semantics proposed in this paper are essentially an extension of stable model semantics for normal programs and of the perfect model semantics for disjunctive programs. The aim of our proposal is to solve some drawbacks of disjunctive stable model semantics which, in some cases, interprets inclusive disjunction as exclusive disjunction. Several problems which need further research have been left open. For instance, further research could be devoted to i) the characterization of head-cycle-free programs, ii) the identification of fragments of disjunctive datalog for which one minimal founded model can be computed in polynomial time, and iii) the investigation of abstract properties for disjunctive datalog under minimal founded semantics [2].
References
1. Abiteboul, S., Hull, R., Vianu, V. (1994), Foundations of Databases. Addison-Wesley.
2. S. Brass and J. Dix. Classifying semantics of disjunctive logic programs. Proc. JICSLP-92, pp. 798–812, 1993.
3. Cadoli, M., T. Eiter and G. Gottlob, Default Logic as a Query Language, IEEE Transactions on Knowledge and Data Engineering, 9(3), 1997, 448–463.
4. Chandra, A., D. Harel. Structure and Complexity of Relational Queries. Journal of Computer and System Sciences, 25, pp. 99–128, 1982.
5. T. Eiter and G. Gottlob. Complexity aspects of various semantics of disjunctive databases, Proc. Int. Conf. on Principles of Database Systems, 158–166, 1993.
6. T. Eiter, G. Gottlob and H. Mannila, Disjunctive Datalog, ACM Transactions on Database Systems, 22(3):364–418, 1997.
7. T. Eiter, N. Leone and D. Saccà. Expressive Power and Complexity of Partial Models for Disjunctive Deductive Databases, Theoretical Computer Science, 1997.
8. Eiter, T., N. Leone, C. Mateis, G. Pfeifer and F. Scarcello. The KR System dlv: Progress Report, Comparisons and Benchmarks. Proc. of 6th Int. Conf. on Principles of Knowledge Representation, 1998, pp. 406–417.
9. Fagin, R. Generalized First-Order Spectra and Polynomial-Time Recognizable Sets, in Complexity of Computation, SIAM-AMS Proc., Vol. 7, pp. 43–73, 1974.
10. Fernandez, J. A., and Minker, J. Computing perfect models of disjunctive stratified databases. In Proc. ILPS'91 Workshop on Disjunctive Logic Programming, pp. 110–117, 1991.
11. Gelfond, M., Lifschitz, V. The Stable Model Semantics for Logic Programming, in Proc. of Fifth Conf. on Logic Programming, pp. 1070–1080, 1988.
12. Gelfond, M. and Lifschitz, V. (1991), Classical Negation in Logic Programs and Disjunctive Databases, New Generation Computing, 9, 365–385.
13. Greco, S., Binding Propagation in Disjunctive Databases, Proc. Int. Conf. on Very Large Data Bases, 1997.
14. Greco, S., Strongly founded semantics for disjunctive logic programming, Technical Report, 1999.
15. Leone, N., P. Rullo and F. Scarcello. Disjunctive Stable Models: Unfounded Sets, Fixpoint Semantics and Computation, Information and Computation, Vol. 135, No. 2, pp. 69–112, 1997.
16. Marek, W., Truszczyński, M., Autoepistemic Logic, Journal of the ACM, 38, 3, pp. 518–619, 1991.
17. Minker, J. On Indefinite Data Bases and the Closed World Assumption, in Proc. of the 6th Conference on Automated Deduction (CADE-82), pp. 292–308, 1982.
18. Przymusinski, T. On the Declarative Semantics of Deductive Databases and Logic Programming, in Foundations of Deductive Databases and Logic Programming, Minker, J. ed., ch. 5, pp. 193–216, 1988.
19. Przymusinski, T. Stable Semantics for Disjunctive Programs, New Generation Computing, 9, 401–424, 1991.
20. D. Saccà. The Expressive Powers of Stable Models for Bound and Unbound DATALOG Queries. Journal of Computer and System Sciences, Vol. 54, No. 3, June 1997, pp. 441–464.
21. Stockmeyer, L.J. The Polynomial-Time Hierarchy. Theoretical Computer Science, 3, pp. 1–22, 1977.
22. Vardi, M.Y., The Complexity of Relational Query Languages, Proc. ACM Symp. on Theory of Computing, pp. 137–146, 1982.
On the Role of Negation in Choice Logic Programs

Marina De Vos* and Dirk Vermeir
Dept. of Computer Science, Free University of Brussels, VUB
Pleinlaan 2, Brussels 1050, Belgium
Tel: +32 2 6293308  Fax: +32 2 6293525
{marinadv,dvermeir}@tinf.vub.ac.be
http://tinf2.vub.ac.be
Abstract. We introduce choice logic programs as negation-free datalog programs that allow rules to have exclusive-only (possibly empty) disjunctions in the head. Such programs naturally model decision problems where, depending on a context, agents must make a decision, i.e. an exclusive choice out of several alternatives. It is shown that such a choice mechanism is in a sense equivalent to negation as supported in semi-negative ("normal") datalog programs. We also discuss an application where strategic games can be naturally formulated as choice programs: it turns out that the stable models of such programs capture exactly the set of Nash equilibria. We then consider the effect of choice on "negative information" that may be implicitly derived from a program. Based on an intuitive notion of unfounded set for choice programs, we show that several results from (semi-negative) disjunctive programs can be strengthened, characterizing the position of choice programs as intermediate between simple positive programs and programs that allow for the explicit use of negation in the body of a rule.
Keywords: Logic programming, choice, unfounded sets, game theory
1 Choice Logic Programs for Modeling Decision Making
When modeling agents using logic programs, one often has to describe a situation where an agent needs to make a decision based on some context. A decision can be thought of as a single choice between several competing alternatives, thus naturally leading to a notion of nondeterminism. Using semi-negative (also called "normal") programs, such a choice can be modeled indirectly by using stable model semantics, as has been argued convincingly before [10,8]. E.g. a program such as

p ← ¬q
q ← ¬p
* Wishes to thank the FWO for their support.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 236–246, 1999.
© Springer-Verlag Berlin Heidelberg 1999
has no (unique) total well-founded model, but it has two total stable models, namely {p, ¬q} and {¬p, q}, representing a choice between p and q (note that this choice is, however, not exclusive, as e.g. p may very well lead to q in a larger program). In this paper, we simplify matters by providing for explicit choice sets in the head of a rule. Using p ⊕ q to denote an exclusive choice between p and q, the example above can be rewritten as

p ⊕ q ←

Intuitively, ⊕ is interpreted as "exclusive or", i.e. either p or q, but not both, should be accepted in the above program.

Definition 1. A choice logic program is a finite set of rules¹ of the form A ← B where A, the head, and B, the body, are finite sets of atoms. Intuitively, atoms in A are assumed to be xor'ed together while B is read as a conjunction. In examples, we often use ⊕ to denote exclusive or, while "," is used to denote conjunction. If we want to single out an atom in the head of a rule we sometimes write A ⊕ a to denote A ∪ {a}.

The semantics of choice logic programs can be defined very simply.

Definition 2. Let P be a choice logic program. The Herbrand base of P, denoted B_P, is the set of all atoms occurring in the rules of P. A set of atoms I ⊆ B_P is a model of P if for every rule A ← B, B ⊆ I implies that I ∩ A is a singleton, i.e. |A ∩ I| = 1.² A model of P is called stable iff it is minimal (according to set inclusion).

Note that the above definitions allow for constraints to be expressed as rules where the head is empty.

Example 1 (Graph 3-colorability). Given a graph, assign each node one of three colors such that no two adjacent nodes have the same color. This problem is known as graph 3-colorability and can easily be transformed into the following choice program:

col(X, r) ⊕ col(X, g) ⊕ col(X, b) ← node(X)
← edge(X, Y), col(X, C), col(Y, C)

The first rule states that every node should take one and only one of the three available colors (r, g or b).
The second demands that two adjacent nodes have different colors. To this program we only need to add the facts (rules with empty body) that encode the graph, to make sure that the stable models of this program reflect the possible solutions of the graph's 3-colorability problem. The facts look either like node(a) ← or edge(a, b) ←.

1. In this paper, we identify a program with its grounded version, i.e. the set of all ground instances of its clauses. This keeps the program finite as we do not allow function symbols (i.e. we stick to datalog).
2. We use |X| to denote the cardinality of a set X.
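Definitions 1 and 2 are concrete enough to prototype directly. The following sketch (Python; the representation and helper names are our own, not from the paper) enumerates subsets of the Herbrand base, keeps the models in the sense of Definition 2, and retains the inclusion-minimal ones as stable models; it is exponential and only meant to illustrate the semantics.

```python
from itertools import combinations

# A choice rule is a pair (head, body) of frozensets of atoms;
# an empty head encodes a constraint.
def is_model(rules, interp):
    # Def. 2: whenever the body holds, exactly one head atom must hold.
    return all(not body <= interp or len(head & interp) == 1
               for head, body in rules)

def stable_models(rules):
    base = sorted(set().union(*(h | b for h, b in rules)))
    candidates = [set(c) for r in range(len(base) + 1)
                  for c in combinations(base, r)]
    models = [m for m in candidates if is_model(rules, m)]
    # stable = minimal with respect to set inclusion
    return [m for m in models if not any(n < m for n in models)]

# p ⊕ q ←  : the exclusive choice from the introduction
print(stable_models([(frozenset({"p", "q"}), frozenset())]))  # → [{'p'}, {'q'}]
```

Note that a constraint rule prunes models exactly as the definition prescribes: an empty head can never intersect an interpretation in a singleton, so its body must not hold.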
                   Does not confess   Confess
Does not confess        3, 3           0, 4
Confess                 4, 0           1, 1

Fig. 1. The prisoner's dilemma (Ex. 2)
The following example shows how choice logic programs can be used to represent strategic games [6].

Example 2 (The Prisoner's Dilemma). Two suspects of a crime (which they jointly committed) are arrested and interrogated separately. The maximum sentence for their crime is four years of prison. But if one betrays the other while the latter keeps quiet, the former is released while the silent one receives the maximum penalty. If they both confess, they are both convicted to three years of prison. In case they both remain silent, they are convicted for a minor felony and sent to prison for only a year. In game theory this problem can be represented as a strategic game with a graphical notation as in Fig. 1. One player's actions are identified with the rows and the other player's with the columns. The two numbers in the box formed by row r and column c are the players' payoffs (e.g., the years gained with respect to the maximum sentence); when the row player chooses r and the column player chooses c, the first component represents the payoff of the row player. It is easy to see that the best action for both suspects is to confess, because otherwise there is a possibility that they obtain the full four years. This is called a Nash equilibrium. This game can easily be transformed to the following choice logic program, where di stands for "suspect i does not confess" and ci means "suspect i confesses":

d1 ⊕ c1 ←
d2 ⊕ c2 ←
c1 ← d2
c1 ← c2
c2 ← d1
c2 ← c1

The first two rules express that both suspects have to decide upon a single action. The last four indicate which action is the most appropriate given the other suspect's actions. This program has a single stable model corresponding to the Nash equilibrium of the game, namely {c1, c2}. In [3], it was shown that every finite strategic game can be converted to a choice logic program whose stable models correspond with the game's Nash equilibria.

Definition 3 ([6]). A strategic game is a tuple ⟨N, (Ai)i∈N, (≥i)i∈N⟩ where

1. N is a finite set of players;
        Head   Tail
Head    1, 0   0, 1
Tail    0, 1   1, 0

Fig. 2. Matching Pennies (Ex. 3)
2. for each player i ∈ N, Ai is a nonempty set of actions that are available to her (we assume that Ai ∩ Aj = ∅ whenever i ≠ j); and
3. for each player i ∈ N, ≥i is a preference relation on A = ×_{j∈N} Aj.

An element a ∈ A is called a profile. For a profile a we use ai to denote the component of a in Ai. For any player i ∈ N, we define A−i = ×_{j∈N\{i}} Aj. Similarly, an element of A−i will often be denoted a−i. For a−i ∈ A−i and ai ∈ Ai we abbreviate as (a−i, ai) the profile a′ ∈ A which is such that a′i = ai and a′j = aj for all j ≠ i. A Nash equilibrium of a strategic game ⟨N, (Ai)i∈N, (≥i)i∈N⟩ is a profile a∗ satisfying

∀i ∈ N · ∀ai ∈ Ai · (a∗−i, a∗i) ≥i (a∗−i, ai)

Intuitively, a profile a∗ is a Nash equilibrium if no player can unilaterally improve upon his choice. Put in another way, given the other players' actions a∗−i, a∗i is the best player i can do.³ Not every strategic game has a Nash equilibrium, as demonstrated by the next example.

Example 3 (Matching Pennies). Two persons are tossing a coin. Each of them has to choose between Head or Tail. If the choices differ, person 1 pays person 2 a Euro; if they are the same, person 2 pays person 1 a Euro. Each person cares only about the amount of money that she receives. The game modeling this situation is depicted in Fig. 2. This game does not have a Nash equilibrium. The corresponding choice logic program would look like:

h1 ⊕ t1 ←
h2 ⊕ t2 ←
h1 ← h2
t1 ← t2
h2 ← t1
t2 ← h1
This program has no stable model, as the game has no Nash equilibrium. Notice that this would not have been the case had we used inclusive disjunctions instead of exclusive ones.

Theorem 1. For every strategic game G = ⟨N, (Ai)i∈N, (≥i)i∈N⟩ there exists a choice logic program PG such that the set of stable models of PG coincides with the set of Nash equilibria of G.

3. Note that the actions of the other players are not actually known to i.
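The two examples can be sanity-checked with a small brute-force search for pure-strategy Nash equilibria (a Python sketch; the payoff tables follow Figs. 1 and 2, everything else is our own):

```python
from itertools import product

def nash_equilibria(payoffs):
    """Pure-strategy Nash equilibria of a finite two-player game.
    payoffs maps (row_action, col_action) -> (row_payoff, col_payoff)."""
    rows = {r for r, _ in payoffs}
    cols = {c for _, c in payoffs}
    eq = []
    for r, c in product(sorted(rows), sorted(cols)):
        u_r, u_c = payoffs[(r, c)]
        # (r, c) is an equilibrium iff neither player gains by deviating alone
        if all(payoffs[(r2, c)][0] <= u_r for r2 in rows) and \
           all(payoffs[(r, c2)][1] <= u_c for c2 in cols):
            eq.append((r, c))
    return eq

pd = {("d", "d"): (3, 3), ("d", "c"): (0, 4),   # Fig. 1: d = stay silent,
      ("c", "d"): (4, 0), ("c", "c"): (1, 1)}   #         c = confess
mp = {("h", "h"): (1, 0), ("h", "t"): (0, 1),   # Fig. 2
      ("t", "h"): (0, 1), ("t", "t"): (1, 0)}
print(nash_equilibria(pd))  # [('c', 'c')]
print(nash_equilibria(mp))  # []
```

The outputs mirror the discussion above: the prisoner's dilemma has the single equilibrium where both confess, and matching pennies has none.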
The choice logic program PG obtained for a game, as one can see from the examples, consists of rules expressing that each player has to make a single choice out of her action set, and rules expressing the best action for a player given the different actions of the other players.
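The construction just described can be sketched as follows (Python; the function name and data layout are our own). For each player we emit one choice rule over her actions, and, for every joint action of the other players, one rule per best response:

```python
from itertools import product

def game_to_choice_program(actions, payoff):
    """actions: {player: list of action atoms}; payoff: profile tuple
    (ordered by sorted player) -> {player: payoff}. Returns (head, body) rules."""
    players = sorted(actions)
    # one exclusive choice rule per player
    rules = [(frozenset(actions[p]), frozenset()) for p in players]
    for p in players:
        others = [q for q in players if q != p]
        for rest in product(*(actions[q] for q in others)):
            def value(a):
                prof = dict(zip(others, rest))
                prof[p] = a
                return payoff[tuple(prof[q] for q in players)][p]
            top = max(value(a) for a in actions[p])
            # one rule per best response against the others' joint action
            rules += [(frozenset({a}), frozenset(rest))
                      for a in actions[p] if value(a) == top]
    return rules

# the Prisoner's Dilemma of Ex. 2 (payoffs from Fig. 1)
actions = {1: ["d1", "c1"], 2: ["d2", "c2"]}
payoff = {("d1", "d2"): {1: 3, 2: 3}, ("d1", "c2"): {1: 0, 2: 4},
          ("c1", "d2"): {1: 4, 2: 0}, ("c1", "c2"): {1: 1, 2: 1}}
for head, body in game_to_choice_program(actions, payoff):
    print(sorted(head), "<-", sorted(body))
```

On the prisoner's dilemma this reproduces exactly the six rules of Example 2.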
2 Negation in Choice Logic Programs
While negation is not explicitly present in choice logic programs, it does appear implicitly. E.g. deciding on a in a rule a ⊕ b ← implicitly excludes b from any model, which can be read as "¬b is true". A similar effect can be observed for constraints: if e.g. a is true, then the presence of a rule ← a, b implies that b must be false. Still, there is a difference with seminegative programs because, although implicitly implied negative information may prevent the further application of certain rules, such information can never be used to enable the inference of further atoms. The latter is possible e.g. in seminegative logic programs or disjunctive logic programs where the body of a rule may contain negated atoms. Hence choice logic programs can be regarded as an interesting intermediate system between purely positive logic programs, where a model can be computed without taking into account any negative information,⁴ and systems that allow for explicit negation in (the body of) a rule. In the remainder of this paper we compare the role of negation in choice logic programs with both seminegative logic programs and seminegative disjunctive logic programs.

2.1 Simulating Seminegative Logic Programs
It turns out that choice logic programs can simulate semi-negative datalog programs, using the following transformation, which resembles the one used in [9] or [7] for the transformation of general disjunctive programs into negation-free disjunctive programs.

Definition 4. Let P be a semi-negative logic program. The corresponding choice logic program P⊕ can be obtained from P by replacing each rule r : a ← B, ¬C from P with B ∪ C ⊆ B_P and C ≠ ∅, by

a_r ⊕ K_C ← B             (r′1)
a ← a_r                   (r′2)
∀c ∈ C · K_C ← c          (r′3)

where a_r and K_C are new atoms that are uniquely associated with the rule r. A model M of P⊕ is called rational iff

∀K_C ∈ M · M ∩ C ≠ ∅

4. Of course, as a last step, the complement of the positive interpretation can be declared false as a consequence of the closed world assumption.
Intuitively, K_C is an "epistemic" atom which stands for "the (non-exclusive) disjunction of atoms from C is believed". If the positive part of a rule in the original program P is true, P⊕ will choose (rules r′1) between accepting the conclusion and K_C, where C is the negative part of the body; the latter prevents rule application. Each conclusion is tagged with the corresponding rule (r′2), so that rules for the same conclusion can be processed independently. Finally, the truth of any member of C implies the truth of K_C (rules r′3). Intuitively, a rational model contains a justification for every accepted K_C.

Proposition 1. Let P be a semi-negative datalog program. M is a rational stable model of P⊕ iff M ∩ B_P is a (total) stable model of P.

The rationality restriction is necessary to prevent K_C from being accepted without any of the elements of C being true. For positive-acyclic programs, we can get rid of this restriction.

Definition 5. A semi-negative logic program P is called positive-acyclic⁵ iff there is an assignment of positive integers to each element of B_P such that the number of the head of any rule is greater than any of the numbers assigned to any non-negated atom appearing in the body.

Proposition 2. Let P be a semi-negative positive-acyclic logic program. There exists a choice logic program Pc such that M is a stable model of Pc iff M ∩ B_P is a stable model of P.

The reverse transformation is far less complicated.

Proposition 3. Let P⊕ be a choice program. There exists a semi-negative datalog program P (containing constraints) such that M is a stable model of P⊕ iff M is a stable model of P.
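The transformation of Definition 4 is mechanical. A small Python sketch (the concrete atom-naming scheme for a_r and K_C is ours) applied to the introductory program p ← ¬q, q ← ¬p:

```python
def to_choice_program(program):
    """Sketch of Definition 4. program: list of semi-negative rules (a, B, C)
    with head atom a, positive body B and negated body C (frozensets)."""
    rules = []
    for idx, (a, B, C) in enumerate(program):
        if not C:                       # purely positive rules are kept as-is
            rules.append((frozenset({a}), B))
            continue
        a_r = f"{a}@r{idx}"             # fresh atom a_r, unique to rule r
        K_C = "K{" + ",".join(sorted(C)) + "}"  # fresh atom K_C for the set C
        rules.append((frozenset({a_r, K_C}), B))          # (r'_1) a_r ⊕ K_C ← B
        rules.append((frozenset({a}), frozenset({a_r})))  # (r'_2) a ← a_r
        rules += [(frozenset({K_C}), frozenset({c}))      # (r'_3) K_C ← c
                  for c in sorted(C)]
    return rules

P = [("p", frozenset(), frozenset({"q"})),   # p ← ¬q
     ("q", frozenset(), frozenset({"p"}))]   # q ← ¬p
for head, body in to_choice_program(P):
    print(sorted(head), "<-", sorted(body))
```

Each rule with a nonempty negated body yields one r′1 rule, one r′2 rule, and one r′3 rule per atom of C, so the introductory program produces six choice rules.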
2.2 Unfounded Sets and Seminegative Disjunctive Programs
In this section, we formalize implicit negative information by defining an appropriate notion of "unfounded set" for choice logic programs, and we investigate its properties and usefulness for the computation of stable models. It turns out that many of the results of [5] remain valid or can even be strengthened:

1. For choice logic programs, the greatest unfounded set is defined on any interpretation, which is not the case for disjunctive programs.
2. Contrary to disjunctive programs, the results for choice programs remain valid in the presence of constraints.
3. For choice logic programs, the R_P,I operator (see Definition 9), when repeatedly applied to B_P, always yields the greatest unfounded set w.r.t. I.

5. In [5] a similar notion is called "head-cycle free".
4. Because of (1) above, the W_P operator (see Definition 8) can be used in the computation of a stable model. For disjunctive programs, this is not possible because there is no guarantee that an intermediate interpretation has a greatest unfounded set.

Definition 6. Let P be a choice logic program. An interpretation is any consistent⁶ subset of (B_P ∪ ¬B_P). We use I_P to denote the set of all interpretations of P. An interpretation I is total iff⁷ I⁺ ∪ I⁻ = B_P. A total interpretation M is called a (stable) model iff M⁺ is a (stable) model of P. A set X ⊆ B_P is an unfounded set for P w.r.t. an interpretation I iff for each p ∈ X one of the following three conditions holds:

1. ∃r : A ⊕ p ← B ∈ P such that A ∩ I ≠ ∅ and B ⊆ I, or
2. ∃r : ← B, p ∈ P such that B ⊆ I, or
3. ∀r : A ⊕ p ← B ∈ P at least one of the following conditions is satisfied:
   a) B ∩ ¬I ≠ ∅, or
   b) B ∩ X ≠ ∅, or
   c) A ∩ B ≠ ∅

The set of all unfounded sets for P w.r.t. I is denoted U_P(I). The greatest unfounded set w.r.t. I, denoted GUS_P(I), is defined by GUS_P(I) = ⋃_{X∈U_P(I)} X. I is called unfounded-free iff I ∩ GUS_P(I) = ∅.

Condition (1) above expresses the fact that choice is exclusive and thus alternatives to the actual choice are to be considered false. Condition (2) implies that any atom that would cause a constraint to be violated may be considered false. Condition (3) resembles the traditional definition of unfounded set by expressing when a rule cannot be used to infer a new atom: in case (a), the rule is "blocked" by the current interpretation; in case (b), the rule's application depends on an unfounded literal; while case (c) indicates that the rule is useless [2] since the body contains one of the choices in the head.

The next proposition shows that the name "greatest unfounded set" is well-chosen for the union of all unfounded sets, GUS_P(I).

Proposition 4. Let I be an interpretation for the choice logic program P. Then, GUS_P(I) ∈ U_P(I).
Moreover, GUS_P is a monotonic operator; i.e. if I1 ⊆ I2, then GUS_P(I1) ⊆ GUS_P(I2).

Note that the above proposition is false for disjunctive logic programs [5]. In fact, for such programs, GUS_P(I) ∈ U_P(I) is only guaranteed if I is unfounded-free or d-unfounded-free [2].

Proposition 5. Let M be a model for the choice logic program P. Then M⁻ ∈ U_P(M).

6. For X a set of literals, we use ¬X to denote {¬p | p ∈ X}, where ¬¬a = a for any atom a. X is consistent iff X ∩ ¬X = ∅.
7. For a subset X ⊆ (B_P ∪ ¬B_P), we define X⁺ = X ∩ B_P and X⁻ = ¬(X ∩ ¬B_P).
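Definition 6 can be checked mechanically. A small Python predicate (our own prototype; an interpretation is passed as its sets of true and false atoms):

```python
def is_unfounded(P, true, false, X):
    """Is X an unfounded set for choice program P w.r.t. I = (true, false)?
    P: rules (head, body) over frozensets of atoms; empty head = constraint."""
    X = set(X)
    for p in X:
        # (1) an alternative of p is already chosen and the body holds
        c1 = any(p in h and (h - {p}) & true and b <= true for h, b in P)
        # (2) p would violate a constraint whose remaining body holds
        c2 = any(not h and p in b and (b - {p}) <= true for h, b in P)
        # (3) every rule with p in the head is unusable: blocked by I,
        #     dependent on X, or its body meets the rest of its head
        c3 = all(b & false or b & X or (h - {p}) & b
                 for h, b in P if p in h)
        if not (c1 or c2 or c3):
            return False
    return True

rule = (frozenset({"a", "b"}), frozenset())            # a ⊕ b ←
print(is_unfounded([rule], {"a"}, set(), {"b"}))       # True: choosing a excludes b
print(is_unfounded([rule], set(), set(), {"a", "b"}))  # False: the rule can still fire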
Unfortunately, the converse does not hold, as can be seen from the interpretation {a, b} of the single-rule program a ⊕ b ←, which is not a model although its complement (the empty set) is trivially unfounded. For seminegative disjunctive logic programs, the converse does hold [5].

Proposition 6. Let P be a choice logic program. A total interpretation is a stable model iff it is unfounded-free.

Combining Propositions 5 and 6 yields a characterization of stable models in terms of unfounded sets which also holds for disjunctive programs.

Corollary 1. Let P be a choice logic program. An interpretation M is a stable model for P iff GUS_P(M) = M⁻.

Definition 7. Let P be a choice logic program. The immediate consequence operator T_P : 2^(B_P ∪ ¬B_P) → 2^B_P is defined by

T_P(I) = {a ∈ B_P | ∃A ⊕ a ← B ∈ P · A ⊆ ¬I ∧ B ⊆ I}

This operator adds those atoms that are definitely needed in any model extension of I. It is clearly monotonic. The W_P operator, which uses the same intuition as the one defined in [4], uses T_P to extend I⁺ and GUS_P to extend I⁻.

Definition 8. Let P be a choice logic program. The operator W_P : I_P → 2^(B_P ∪ ¬B_P) is defined by

W_P(I) = T_P(I) ∪ ¬GUS_P(I)

Note that W_P is monotonic and skeptical, as it only adds literals that must be in any model extension of I. The following result also holds for disjunctive programs (without constraints).

Proposition 7. Let P be a choice logic program and let M be a total interpretation for it. M is a stable model iff M is a fixpoint of W_P.

The least fixpoint W_P^ω(∅) of W_P can, if it exists,⁸ be regarded as the "kernel" of any stable model.

Proposition 8. Let P be a choice logic program. If W_P^ω(∅) exists, then W_P^ω(∅) ⊆ M for each stable model M. If W_P^ω(∅) does not exist, then P has no stable models.

Because W_P is deterministic, and contrary to the case of e.g. seminegative (disjunctive-free) programs, W_P^ω(∅) may not be a model, even if it is consistent.

Corollary 2. Let P be a choice logic program.
If W_P^ω(∅) is a total interpretation, then it is the unique stable model of P.

8. The fixpoint may not exist because W_P^n(I) may not be consistent, i.e. outside of the domain of W_P, for some n > 0.
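The operator of Definition 7 is directly executable; a sketch (Python; our own representation, with an interpretation given as its sets of true and false atoms):

```python
def immediate_consequences(P, true, false):
    """T_P (Def. 7): derive a head atom once all its alternatives are
    false and the body is true. P: rules (head, body) over frozensets."""
    return {a for head, body in P for a in head
            if (head - {a}) <= false and body <= true}

rule = (frozenset({"a", "b"}), frozenset())          # a ⊕ b ←
print(immediate_consequences([rule], set(), {"b"}))  # {'a'}: b false forces a
print(immediate_consequences([rule], set(), set()))  # set(): nothing forced yet
```

W_P of Definition 8 then simply pairs this positive step with the negative information coming from the greatest unfounded set.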
The following monotonically decreasing operator can be used to check the unfounded-free property of total interpretations.

Definition 9. Let P be a choice logic program and let I be an interpretation for it. The operator R_P,I : 2^B_P → 2^B_P is defined by

R_P,I(X) = { a ∈ X |  ∃r : A ⊕ a ← B ∈ P · A ∩ I ≠ ∅ ∧ B ⊆ I, or
                      ∃ ← B, a ∈ P · B ⊆ I, or
                      ∀r : A ⊕ a ← B ∈ P · B ∩ (¬I ∪ X) ≠ ∅ or (A ∪ {a}) ∩ B ≠ ∅ }

Intuitively, R_P,I(J) gathers all atoms that are contained in both J and some unfounded set w.r.t. I.

Proposition 9. Let I be a total interpretation for a choice logic program P. Then, R_P,I^ω(I⁺) = ∅ iff I is unfounded-free.

Moreover, R_P,I can be used to compute the greatest unfounded set GUS_P(I).

Proposition 10. Let P be a choice logic program and let I be an interpretation for it. Then, R_P,I^ω(B_P) = GUS_P(I).

The above result does not hold for disjunctive logic programs.
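Proposition 10 yields a direct way to compute greatest unfounded sets: iterate R_P,I downward from the Herbrand base until a fixpoint is reached. A sketch (Python; our reading of Definition 9, with the same rule representation as before):

```python
def R_step(P, true, false, X):
    """One application of R_P,I: keep the atoms of X that one of the
    three conditions of Def. 9 still marks as unfounded."""
    return {a for a in X
            if any(a in h and (h - {a}) & true and b <= true for h, b in P)
            or any(not h and a in b and (b - {a}) <= true for h, b in P)
            or all(b & (false | X) or h & b for h, b in P if a in h)}

def greatest_unfounded_set(P, true, false):
    """GUS_P(I) as the limit R^ω_{P,I}(B_P) (Prop. 10)."""
    X = set().union(*(h | b for h, b in P))   # start from the Herbrand base
    while True:
        Y = R_step(P, true, false, X)
        if Y == X:
            return X
        X = Y

rule = (frozenset({"a", "b"}), frozenset())          # a ⊕ b ←
print(greatest_unfounded_set([rule], {"a"}, {"b"}))  # {'b'}
print(greatest_unfounded_set([rule], set(), set()))  # set()
```

For the total interpretation with a true and b false, the result {b} equals I⁻, confirming via Corollary 1 that it is a stable model.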
3 Computing Stable Models
With the help of the above results, an intuitive and relatively efficient "backtracking fixpoint" algorithm can be designed to compute the stable models of a choice logic program. Essentially, the algorithm of Fig. 3 keeps a "current interpretation" (initialized to the empty set) and a stack of choice points (initially empty). It consists of a loop comprising two stages:

1. In the first stage, W_P is applied on the current interpretation until a fixpoint interpretation is reached or an inconsistency is detected. In the latter case, the algorithm backtracks to the previous choice point (if any) and tries a different choice.
2. In the second stage, a choice is made from the applicable rules (that have a true body in the current interpretation) that are not yet applied. If there are no such rules, the current interpretation is a stable model. For the selected rule, a choice is made for a literal from the head to be added to the current interpretation, thus making the rule applied (the choice must be such that the new interpretation remains consistent). The other literals are immediately assumed false. Such a combination of literals is called a "possibly-true conjunction" [5]. We use PT_P(I) to denote the set of such choices that are available, given the interpretation I.

Given the results of the previous section, it is clear that this algorithm finds all stable models of a given choice logic program. It generalizes a corresponding algorithm in [5] because it also handles constraints. In addition, it can afford to be more skeptical than the algorithm in [5] (checking consistency at each step in stage 1) because of Proposition 4.
Input: A choice logic program P.
Output: The stable models of P.

Procedure Compute-Stable(I_n : SetOfLiterals);
var X, I′_n, I′_{n+1} : SetOfLiterals;
begin
  if PT_P(I_n) = ∅  (* no choices available *)
  then output "I_n is a stable model of P";
  else
    for each X ∈ PT_P(I_n) do
      I′_{n+1} := I_n ∪ X;  (* assume the truth of a possibly-true conjunction *)
      repeat
        I′_n := I′_{n+1};
        I′_{n+1} := T_P(I′_n) ∪ ¬R^ω_{P,I′_n}(B_P);  (* = W_P(I′_n) *)
      until I′_{n+1} = I′_n or I′_{n+1} ∩ ¬I′_{n+1} ≠ ∅;
      if I′_{n+1} ∩ ¬I′_{n+1} = ∅  (* I′_{n+1} is consistent *)
      then Compute-Stable(I′_{n+1})
      end-if
    end-for
  end-if
end-procedure

var I, J : SetOfLiterals; G : SetOfAtoms;
begin (* Main *)
  I := ∅;
  repeat  (* computation of W_P^ω(∅), if it exists *)
    J := I;
    G := GUS_P(J);  (* by means of R^ω_{P,J}(B_P) *)
    if G ∩ J ≠ ∅  (* J not unfounded-free *) then exit end-if;
    I := T_P(J) ∪ ¬G;  (* = W_P(J) *)
  until I = J;
  if PT_P(I) = ∅
  then output "I is the unique stable model of P";
  else Compute-Stable(I)
  end-if
end.

Fig. 3. Algorithm for the Computation of Stable Models for choice logic programs.
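For readers who want to experiment, here is a compact Python rendering of the two-stage procedure (our own reconstruction of the algorithm of Fig. 3; all names are ours, and it is a sketch rather than a validated implementation):

```python
def herbrand_base(P):
    return set().union(*(h | b for h, b in P))

def t_p(P, true, false):
    # immediate consequences (Def. 7)
    return {a for h, b in P for a in h if (h - {a}) <= false and b <= true}

def gus(P, true, false):
    # greatest unfounded set via R^ω_{P,I}(B_P) (Prop. 10)
    X = herbrand_base(P)
    while True:
        Y = {a for a in X
             if any(a in h and (h - {a}) & true and b <= true for h, b in P)
             or any(not h and a in b and (b - {a}) <= true for h, b in P)
             or all(b & (false | X) or h & b for h, b in P if a in h)}
        if Y == X:
            return X
        X = Y

def propagate(P, true, false):
    # stage 1: iterate W_P = T_P ∪ ¬GUS_P until fixpoint or inconsistency
    while True:
        nt, nf = true | t_p(P, true, false), false | gus(P, true, false)
        if nt & nf:
            return None
        if (nt, nf) == (true, false):
            return true, false
        true, false = nt, nf

def choices(P, true, false):
    # stage 2: possibly-true conjunctions of the first applicable, unapplied rule
    for h, b in P:
        if h and b <= true and not h & true:
            return [(frozenset({a}), h - {a}) for a in sorted(h) if a not in false]
    return None

def stable_models(P, true=frozenset(), false=frozenset()):
    fix = propagate(P, set(true), set(false))
    if fix is None:
        return []
    true, false = fix
    pt = choices(P, true, false)
    if pt is None:
        return [set(true)]
    return [m for xp, xf in pt for m in stable_models(P, true | xp, false | xf)]

# Prisoner's Dilemma program of Ex. 2
def r(head, *body):
    return frozenset(head), frozenset(body)
pd = [r(["d1", "c1"]), r(["d2", "c2"]), r(["c1"], "d2"),
      r(["c1"], "c2"), r(["c2"], "d1"), r(["c2"], "c1")]
print(stable_models(pd))   # the single stable model {c1, c2}
```

On the matching pennies program of Example 3 the same function returns the empty list, as expected.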
4 Conclusions and Directions for Further Research
We introduced choice logic programs as a convenient and simple formalism for modeling decision making. Such programs can e.g. be used to model strategic games. We investigated the implicit support for negation that is present in such programs, due to the exclusive nature of the choices and the support for constraints. It turns out that choice programs can reasonably simulate seminegative
logic programs. On the other hand, many results that are known for (seminegative) disjunctive programs (without constraints) can be carried over (or even strengthened) to choice programs (with constraints), resulting in a simple algorithm to compute the stable models of a choice program. It is worth noting that, although [1] introduces constraints for disjunctive logic programs, these are checked only after the usual algorithm (for programs without constraints) finishes, while our algorithm uses constraints directly, which should result in more eager pruning of candidate interpretations. Future research will attempt to extend the notion of choice programs to allow for the expression of epistemic restrictions. At present, all the knowledge of the decision-making agents is stored in a single program which is visible to each agent (this fact lies at the basis of Theorem 1); an assumption which is often not realistic.
References

1. Francesco Buccafurri, Nicola Leone and Pasquale Rullo. Strong and Weak Constraints in Disjunctive Datalog. In Jürgen Dix, Ulrich Furbach and Anil Nerode, editors, 4th International Conference on Logic Programming and Non-Monotonic Reasoning (LPNMR'97), volume 1265 of Lecture Notes in Computer Science, pages 2–17. Springer, 1997.
2. Marina De Vos and Dirk Vermeir. Forcing in Disjunctive Logic Programs. In Kamal Karlapalem, Amin Y. Noaman and Ken Barker, editors, Proceedings of the Ninth International Conference on Information and Computation, pages 167–174, Winnipeg, Manitoba, Canada, June 1998.
3. Marina De Vos and Dirk Vermeir. Choice Logic Programs and Nash Equilibria in Strategic Games. In Annual Conference of the European Association for Computer Science Logic (CSL'99), September 20–25, 1999, Madrid, Spain. Lecture Notes in Computer Science, Springer.
4. Allen Van Gelder, Kenneth A. Ross and John S. Schlipf. The Well-Founded Semantics for General Logic Programs. Journal of the Association for Computing Machinery, 38(3) (1991) 620–650.
5. Nicola Leone, Pasquale Rullo and Francesco Scarcello. Disjunctive Stable Models: Unfounded Sets, Fixpoint Semantics, and Computation. Information and Computation, 135(2) (1997) 69–112.
6. M. J. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press, 1994.
7. Carolina Ruiz and Jack Minker. Computing Stable and Partial Stable Models of Extended Disjunctive Logic Programs. Lecture Notes in Computer Science, 927 (1995). Springer.
8. D. Saccà. Deterministic and Non-Deterministic Stable Models. Journal of Logic and Computation, 5 (1997) 555–579.
9. Chiaki Sakama and Katsumi Inoue. An Alternative Approach to the Semantics of Disjunctive Logic Programs and Deductive Databases. Journal of Automated Reasoning, 13 (1994) 145–172.
10. D. Saccà and C. Zaniolo. Stable Models and Non-Determinism for Logic Programs with Negation.
In Proceedings of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 205-218. Association for Computing Machinery, 1990.
Default Reasoning via Blocking Sets

Thomas Linke and Torsten Schaub

Institut für Informatik, Universität Potsdam, Postfach 60 15 53, D-14415 Potsdam
Abstract. We present a new approach to reasoning with default logic that aims at Reiter’s original approach, whenever there is no source for incoherence. We accomplish this by shifting the emphasis from the application of individual default rules to that of the joint application of a default rule together with rules supporting this application. This allows for reasoning in an incremental yet compositional fashion, without giving up the expressiveness needed for knowledge representation. Technically, our approach differs from others in that it guarantees the existence of extensions without requiring semi-monotonicity.
1 Introduction
Default logic [20] is one of the best known and most widely studied formalizations of default reasoning, due to its very expressive and lucid language. In default logic, knowledge is represented as a default theory, which consists of a set of formulas and a set of default rules for representing default information. Possible sets of conclusions from a default theory are given in terms of extensions of that theory. A default theory can possess no, one, or multiple extensions, because different ways of resolving conflicts among default rules lead to different alternative extensions. Such extensions are formed in a context-sensitive (yet self-referential) way by requiring that all drawn inferences are already consistent with the final extension. Interestingly, Reiter already anticipated in [20, p. 83] that "providing an appropriate formal definition of this consistency requirement is perhaps the thorniest issue in defining a logic for default reasoning". At this stage, this was insofar foreseeable as the original approach relied on complex fixed-point constructions that denied any incremental constructibility and that sometimes had no solutions (i.e. extensions) at all. As a consequence, several variants of default logic were proposed, addressing either purportedly counterintuitive or technical problems of the original approach, beginning with Lukaszewicz' variant [14], over those of Brewka [2], Delgrande et al. [5], Mikitiuk and Truszczyński [17], Przymusinska and Przymusinski [19] and Giordano and Martinelli [9], up to the proposal by Brewka and Gottlob [3]. Many of these variants put forward the formal property of semi-monotonicity because it guarantees the existence of extensions and allows for an incremental constructibility that is advantageous from a computational point of view. On the other hand, Brewka has shown in [2] that semi-monotonicity diminishes the expressive power of default logic.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 247–261, 1999.
© Springer-Verlag Berlin Heidelberg 1999

Intuitively, this is because semi-monotonicity limits the contextual scope of
inferences (as made precise in Section 2). Consequently, we were up to now faced with the dilemma of choosing between full expressive power and full incremental constructibility. We address this shortcoming by proposing a compromising approach that allows for (compositional) incremental constructions and that guarantees the existence of extensions without requiring semi-monotonicity. This gives us a useful trade-off between the feasibility of inference and the expressiveness of representation. (The term feasibility should not be conflated with that of complexity; it rather refers to the degree of incrementality.) As a result, we obtain an approach to default logic that aims at deviating from the original approach on incoherent theories only. This is made more precise in Section 4. The intuitive idea is to substitute the usual fixed-point constructions by rather conflict-driven constructions that are delineated by pre-compiled interaction patterns between default rules. For this purpose, we draw on the notions of blocking sets (and block graphs) introduced in [12]. There, these concepts were used for characterizing default theories guaranteeing the existence of extensions and for supporting queryanswering. (An explicit reference is made for each contribution due to [12].) Our interest lies here, however, on rather different topics, namely the development of a new conception for extensions of default theories and the elaboration of new structural relationships among related approaches.
2 Background
A default rule is an expression of the form α : β / γ,¹ where α, β and γ are propositional formulas. We sometimes denote the prerequisite α of a default rule δ by p(δ), its justification β by j(δ), and its consequent γ by c(δ).² A rule is called normal if β is equivalent to γ; it is called semi-normal if β implies γ. A set of default rules D and a set of formulas W form a default theory³ ∆ = (D, W), that may induce one, multiple or no extensions:

Definition 1. [20] Let ∆ = (D, W) be a default theory. For any set of formulas S, let Γ_∆(S) be the smallest set of formulas S′ such that

DL1 W ⊆ S′,
DL2 Th(S′) = S′,
DL3 for any α : β / γ ∈ D, if α ∈ S′ and ¬β ∉ S then γ ∈ S′.

A set of formulas E is an R-extension of ∆ iff Γ_∆(E) = E.

Observe that E is a fixed point of Γ_∆. Any such extension represents a possible set of beliefs about the world.
1. Reiter [20] considers default rules having finite sets of justifications. [16] show that any such default rule can be transformed into a set of default rules having a single justification.
2. This generalizes to sets of default rules in the obvious way.
3. If clear from the context, we sometimes refer to (D, W) as ∆ and vice versa.
For simplicity, we assume for the rest of the paper that default theories (D, W) comprise finite sets only. Additionally, we assume that for each default rule δ in D, we have that W ∪ {j(δ)} is consistent. This can be done without loss of generality, because we can clearly eliminate all rules δ from D for which W ∪ {j(δ)} is inconsistent, without altering the set of extensions. Consider the standard example where birds fly, birds have wings, penguins are birds, and penguins don't fly, along with a formalization through default theory (D1, W1) where

D1 = { b : ¬ab_b / f ,  b : w / w ,  p : b / b ,  p : ¬ab_p / ¬f }     (1)

and W1 = {¬f → ab_b, f → ab_p, p}. We let δf, δw, δb, δ¬f abbreviate the previous default rules by appeal to their consequents. Our example yields two extensions, viz. E1 = Th(W1 ∪ {b, w, ¬f}) and E2 = Th(W1 ∪ {b, w, f}), while theory (D1 ∪ { : x / ¬x }, W1) has no extension. We call a default theory coherent if it has some extension. A default theory (D, W) is semi-monotonic if, for any D′ ⊆ D, we have that if E′ is an extension of (D′, W), then there is an extension E of (D, W) where E′ ⊆ E. Note that semi-monotonicity implies coherence but not vice versa. A default logic is said to enjoy coherence or semi-monotonicity if its interpretation of default theories guarantees the respective property for all default theories. It is well-known that semi-monotonicity does not hold for Reiter's default logic. Now we can make precise the dilemma between incremental constructibility and full expressiveness, depending on whether semi-monotonicity holds or not. On the one hand, it should be clear that semi-monotonicity allows for incremental constructions, because we can gradually extend a set of default rules without running the danger of invalidating former conclusions. Although this does not affect worst-case complexity (cf.
[10]), it makes inferencing more feasible since it allows us to validate the application of a default rule with respect to previously applied rules only (while ignoring all other rules). On the other hand, semi-monotonicity reduces expressiveness. To explain this, let us add p : ab_b / ab_b to (1) in order to eliminate extension E2. While this works in Reiter's default logic, it fails for semi-monotonic default logics. To see this, simply take E′ and (D′, W) above as E2 and (D1, W1). Now semi-monotonicity ensures that either E2 or one of its supersets is an extension of theory (D1 ∪ { p : ab_b / ab_b }, W1). In fact, E2 is an extension of this theory in the semi-monotonic variant of Lukaszewicz (see below). This shows that semi-monotonicity disables the possibility of blocking rules like b : ¬ab_b / f through default conclusions such as ab_b. Note the difference to the addition of p → ab_b to W1, which eliminates extension E2 no matter whether we deal with a semi-monotonic system or not. Further, define for a set of formulas S and a set of defaults D the set of generating default rules as GDR(D, S) = {δ ∈ D | S ⊢ p(δ) and S ⊬ ¬j(δ)}. We call a set of default rules D grounded in a set of formulas S iff there exists an enumeration ⟨δi⟩i∈I of D such that we have for all i ∈ I that S ∪ c({δ0, ..., δi−1}) ⊢ p(δi). As proposed by [11,8], we call a set of default rules D weakly regular w.r.t. a set of formulas S iff we have for each δ ∈ D that S ∪ c(D) ⊬ ¬j(δ). A set of rules D
250
T. Linke and T. Schaub
is called strongly regular wrt S iff S ∪ c(D) ∪ j(D) ⊬ ⊥. A default logic is said to enjoy one of these properties according to its treatment of default rules that generate extensions. While all variants mentioned in the introductory section lead to grounded sets of generating default rules, Reiter's and Lukaszewicz' variants enjoy weak regularity, while those in [2,5,17] are strongly regular. Lukaszewicz gives in [14] the following alternative definition of extensions:

Definition 2. Let (D, W) be a default theory. For any pair of sets of formulas (S, T) let Ψ(S, T) be the pair of smallest sets of formulas (S′, T′) such that
LDL1 W ⊆ S′,
LDL2 S′ = Th(S′),
LDL3 for any α : β / γ ∈ D, if α ∈ S′ and ¬η ∉ Th(S ∪ {γ}) for all η ∈ T ∪ {β}, then γ ∈ S′ and β ∈ T′.
A set of formulas E is an L-extension of (D, W) wrt a set of formulas J iff Ψ(E, J) = (E, J).

Interestingly, given a theory (D, W), maximal sets D′ ⊆ D of grounded and weakly regular default rules induce L-extensions, as shown in [21]. That is, for each such D′, Th(W ∪ c(D′)) forms an L-extension wrt J = j(D′).⁴ We refer to the set of default rules (here D′) generating an L-extension E wrt J (here j(D′)) as GDL(D, E, J). For capturing the interaction between default rules under weak regularity, [12] introduced the concept of blocking sets:

Definition 3. [12] Let ∆ = (D, W) be a default theory. For δ ∈ D and B ⊆ D, we define
1. B as a potential blocking set of δ, written B ↦∆ δ, iff a) W ∪ c(B) ⊢ ¬j(δ) and b) B is grounded in W.
2. B as an essential blocking set of δ, written B ↦•∆ δ, iff a) B ↦∆ δ and b) (B \ {δ′}) ↦∆ δ′′ for no δ′ ∈ B and no δ′′ ∈ B ∪ {δ}.

Observe that for constructing blocking sets the justifications of the default rules are ignored. Hence defaults are treated as monotonic inference rules.⁵ Let B∆(δ) = {B | B ↦•∆ δ} be the set of all essential blocking sets of δ. These blocking sets provide candidate sets for denying the application of δ.
The second condition on essential blocking sets, namely (2b), assures that B∆(δ) contains only ultimately necessary blocking sets: First, members of B∆(δ) are (set-inclusion) minimal among the blocking sets of δ. Second, no blocking set in B∆(δ) contains any blocking sets for its constituent rules. We give the sets of blocking sets obtained in our example at the end of this section.

⁴ J is used to distinguish identical L-extensions generated by different sets of defaults D′.
⁵ Monotonic inference rules are also considered in [15].
Default Reasoning via Blocking Sets
251
In what follows, we let the term blocking set refer to essential blocking sets. This is justified by our first result, showing that essential blocking sets are indeed sufficient for characterizing the notion of consistency used in Reiter's default logic:

Theorem 1. Let ∆ = (D, W) be a default theory and let D′ ⊆ D be grounded in W. We have that D′ is weakly regular wrt W iff we have for each δ′ ∈ D′ and each B ⊆ D′ that B ∉ B∆(δ′).

The problem with blocking sets is that there may be exponentially many in the worst case. This is why [12] put forward the notion of a block graph, as a compact abstraction of actual blocking sets:

Definition 4. [12] Let ∆ = (D, W) be a default theory. The block graph G(∆) = (V∆, A∆) of ∆ is a directed graph with vertices V∆ = D and arcs A∆ = {(δ′, δ) | δ′ ∈ B for some B ∈ B∆(δ)}.

(Recall that a directed graph G is a pair G = (V, A) such that V is a finite, non-empty set of vertices and A ⊆ V × V is a set of arcs.) We observe that the space complexity of block graphs is quadratic in the number of default rules; its construction⁶ faces the same time complexity as the extension-membership problem. Note that the efforts put into constructing a block graph are, however, meant to amortize over subsequent tasks; notably its construction (and reduction, see below) are both incremental. A default theory is said to be non-conflicting, well-ordered, or even, depending on whether its block graph has no arcs, no cycles, or only even cycles, respectively. [12] show that these three classes guarantee the existence of R-extensions. A default theory is said to be odd if its block graph has some odd cycle. For a default theory ∆ = (D, W) and sets B, B′ ⊆ D, we abuse our notation and write B′ ↦•∆ B if there is some δ ∈ B such that B′ ∈ B∆(δ). With this, we define the concept of supporting sets:

Definition 5. [12] Let ∆ = (D, W) be a default theory. We define the set of all supporting sets for δ ∈ D as
S∆(δ) = {B′1 ∪ ... ∪ B′n | B′i ⊆ D s.t. B′i ↦•∆ Bi and B∆(δ) = {B1, ..., Bn}}

provided B∆(δ) ≠ ∅. Otherwise, we define S∆(δ) = {∅}. Supporting sets are meant to cover the safe application of default rules in focus. We draw on them in the next section as a means for ruling out blocking sets as subsets of the generating default rules, because once a supporting set for some rule has been applied, the rule itself can be applied safely. Default theory (1) yields the following blocking and supporting sets:

B∆(δf) = {{δ¬f}}         S∆(δf) = {{δb, δf}}
B∆(δw) = ∅               S∆(δw) = {∅}
B∆(δb) = ∅               S∆(δb) = {∅}
B∆(δ¬f) = {{δb, δf}}     S∆(δ¬f) = {{δ¬f}}

⁶ That is, a corresponding decision problem.
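The computation behind Definition 5 can be sketched in a few lines. The sketch below is not from the paper: the dictionary encoding and the string rule names are ours, and the propositional reasoning is assumed to have been done already when the blocking sets were computed. From the blocking sets alone it reproduces the supporting sets listed above, including the text's later observation that adding δabb leaves δf without any supporting set:

```python
from itertools import product

def supporting_sets(blocking, delta):
    """Definition 5: for each essential blocking set B_i of delta, pick
    some B_i' that blocks a member of B_i, and take the union of the
    picks.  If delta has no blocking sets at all, S(delta) = {∅}."""
    bs = blocking.get(delta, [])
    if not bs:
        return {frozenset()}
    choices = []
    for b in bs:
        # candidates countering b: any blocking set of some rule in b
        cand = [bp for rule in b for bp in blocking.get(rule, [])]
        if not cand:          # some blocking set cannot be countered
            return set()      # hence no supporting set exists
        choices.append(cand)
    return {frozenset().union(*pick) for pick in product(*choices)}

# Blocking sets of example (1), taken from the text.
blocking = {
    "δf":  [frozenset({"δ¬f"})],
    "δw":  [],
    "δb":  [],
    "δ¬f": [frozenset({"δb", "δf"})],
}
for d in blocking:
    print(d, supporting_sets(blocking, d))

# After adding δabb, δf gains the blocking set {δabb}, which has no
# counter-blocker, so S(δf) collapses to ∅ as stated in the text.
aug = {**blocking, "δf": [frozenset({"δ¬f"}), frozenset({"δabb"})], "δabb": []}
print(supporting_sets(aug, "δf"))
```

Note the difference (stressed in the text) between the result `set()` for the augmented δf (no supporting set at all) and `{frozenset()}` for unblockable rules such as δw (one supporting set, namely the empty one).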
We get a block graph with vertex set D1 (indicated by white nodes) and (solid) arcs (δ¬f, δf), (δf, δ¬f) and (δb, δ¬f):

[Figure: the block graph over the white nodes δf, δw, δb, δ¬f, together with the light-gray nodes δabb and δ¬x added below, with the arcs described in the text.]
The addition of δabb = p : abb / abb to (1) augments B∆(δf) as well as S∆(δ¬f) by {δabb}, whereas it reduces S∆(δf) to ∅, indicating that δf has no supporting sets anymore. We get additionally B∆(δabb) = ∅ and S∆(δabb) = {∅}, reflecting the fact that δabb is unblockable, that is, applicable without consistency check. Note the crucial difference between an empty supporting set and one containing the empty set. The addition to the block graph is indicated by the (light-gray) node δabb and the (dashed) arc (δabb, δf). The further addition of δ¬x = :x / ¬x to (1) leaves the above blocking sets unaffected and yields additionally B∆(δ¬x) = {{δ¬x}} and S∆(δ¬x) = {{δ¬x}}, reflecting self-blockage. This leads to an additional (light-gray) node δ¬x and a (dotted) odd loop (δ¬x, δ¬x) in the augmented block graph.
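Definition 4 and the cycle-based classification of theories can be sketched as follows. This is our own toy encoding, not the paper's implementation: blocking sets are taken as given data, and the DFS cycle enumeration is only meant for small graphs like the running example:

```python
from collections import defaultdict

def block_graph(defaults, blocking):
    """Arcs per Definition 4: (d', d) whenever d' occurs in some
    essential blocking set of d."""
    return {(dp, d) for d in defaults
                    for b in blocking.get(d, [])
                    for dp in b}

def cycle_lengths(defaults, arcs):
    """Lengths of all simple cycles, found by naive DFS."""
    succ = defaultdict(set)
    for u, v in arcs:
        succ[u].add(v)
    lengths = set()
    def dfs(start, u, path):
        for v in succ[u]:
            if v == start:
                lengths.add(len(path))
            elif v not in path:
                dfs(start, v, path + [v])
    for n in defaults:
        dfs(n, n, [n])
    return lengths

def classify(defaults, blocking):
    arcs = block_graph(defaults, blocking)
    if not arcs:
        return "non-conflicting"
    ls = cycle_lengths(defaults, arcs)
    if not ls:
        return "well-ordered"
    return "even" if all(l % 2 == 0 for l in ls) else "odd"

blocking = {"δf": [{"δ¬f"}], "δ¬f": [{"δb", "δf"}]}
D1 = ["δf", "δw", "δb", "δ¬f"]
print(block_graph(D1, blocking))   # the three arcs from the figure
print(classify(D1, blocking))      # only the 2-cycle δf ↔ δ¬f: "even"

# the self-blocking rule δ¬x introduces an odd loop
blocking2 = {**blocking, "δ¬x": [{"δ¬x"}]}
print(classify(D1 + ["δ¬x"], blocking2))   # "odd"
```

The output agrees with the figure: the base theory (1) is even (hence has an R-extension), while adding δ¬x makes it odd.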
3 Supported default logic
Our new conception of extensions is defined by appeal to blocking and supporting sets:

Definition 6. Let ∆ = (D, W) be a default theory and E a set of formulas. We define E as an S-extension of ∆ iff E = Th(W ∪ c(D′)) for some maximal set D′ ⊆ D s.t.
SDL1 D′ is grounded in W,
SDL2 B ⊆ D′ for no B ∈ B∆(δ) and every δ ∈ D′,
SDL3 S ⊆ D′ for some S ∈ S∆(δ) and every δ ∈ D′.

Observe that SDL2 and SDL3 are actually parameterized by ∆. We refer to the set of default rules (here D′) generating an S-extension E of some theory (D, W) as GDS(D, E). First of all, we observe that S-extensions do not rely on fixed-point definitions. In contrast to R-extensions, where global consistency is guaranteed at once by appeal to all applying default rules (comprised in the extension, which is the fixed point), S-extensions ensure consistency by avoiding conflicts (separately) among the generating default rules. While SDL2 implements weak regularity (see Theorem 1) by eliminating all blocking sets of generating default rules, SDL3
provides reasons for doing so. That is, by requiring the presence of some supporting set for each generating default rule δ, it keeps out all blocking sets of δ. This is actually the salient difference between our approach and the standard way of constructing extensions: While all existing variants focus on the applicability of individual rules, we shift the emphasis to the joint application of a rule together with one of its supporting sets. Hence we call the resulting system supported default logic. Consider our initial example in (1). In fact, both R-extensions E1 and E2 are also S-extensions of (D1, W1). To see this, let us verify that the underlying sets of generating defaults GDR(D, E1) = {δw, δb, δ¬f} and GDR(D, E2) = {δf, δw, δb}, respectively, do also fulfill the conditions stipulated for D′ in Definition 6. Clearly, both of them satisfy SDL1 (groundedness) and SDL2 (weak regularity) by virtue of being generating default rules for R-extensions. To see that both also fulfill SDL3, it is sufficient to verify that each of their constituent rules comes with one of its supporting sets. For instance, we have δf ∈ GDR(D, E2) and {δb, δf} ⊆ GDR(D, E2) for {δb, δf} ∈ S∆(δf). Now consider (D1 ∪ {δabb}, W1). We have seen at the end of the previous section that δf has no supporting set in (D1 ∪ {δabb}, W1) which would protect it against δabb. This disqualifies GDR(D, E2) as a generator of an S-extension, since it contains a default rule without a supporting set. Hence E2 is no S-extension of the augmented theory, as opposed to E1, which is still an S-extension. This is because the supporting sets of all members of GDR(D, E1) remain intact when adding δabb (and no new blocking sets for them appear). In both cases, we have obtained in supported default logic the same extensions as in Reiter's default logic. Notably, in both default logics E2 is ruled out by the addition of δabb. This is due to the following fact.

Property 1. Supported default logic is not semi-monotonic.

Here is another property shared with Reiter's approach:

Theorem 2. Supported default logic is weakly regular.

For further illustration, consider the theory used in [7] to show that semi-normal theories may lack extensions:

(D2, W2) = ({ : a∧¬b / a ,  : b∧¬c / b ,  : c∧¬a / c }, ∅).    (2)

While this theory has no R-extension, it has S-extension Th(∅). In fact, the block graph of this theory comprises an odd cycle. This makes it impossible to jointly apply a rule together with its supporting set (given by the singleton containing the pre-predecessor in the block graph). Consequently, none of the rules can contribute to an S-extension, which results in S-extension Th(∅). This comportment becomes more apparent when examining theories like (D1 ∪ {:x / ¬x}, W1) or (D1 ∪ D2, W1 ∪ W2). In both cases, we obtain no R-extensions, although there is arguably a part of the theory, viz. (D1, W1), that would give rise to reasonable conclusions. However, in each example the respective odd cycle destroys all conclusions, although its rules are unrelated to the rest of the
theory. This is different from supported default logic, which yields in both cases the two extensions E1 and E2 already obtained from (D1, W1). So supported default logic lets the reasonable conclusions go through, whereas rules belonging to (harmful) odd cycles are discarded during extension formation. Notably, the elimination of odd cycles applies to harmful ones only. For instance, theory (D2, W2 ∪ {c → b}) has, despite its odd cycle in the block graph, the identical R- and S-extension Th({c}). The capacity of discarding harmful odd cycles leads to the following result.

Theorem 3. Every default theory has an S-extension.

We complete this section by showing that the extension construction process coincides with that of conventional default logics on normal theories:

Theorem 4. Let ∆ be a normal default theory and E a set of formulas. Then E is an R-extension of ∆ iff E is an S-extension of ∆.

Clearly, this result extends to all variants enjoying the same correspondence with Reiter's default logic. Moreover, it provides us with complexity results: For instance, by using normal default theories, Gottlob shows in [10] that the extension-membership problem (for R-extensions) is Σ2P-complete. Hence, considering normal default theories, Theorem 4 makes this result applicable to supported default logic.
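For a small finite theory, the maximal sets D′ of Definition 6 can be enumerated by brute force. The sketch below is our own encoding, not the paper's: prerequisites and consequents are treated at the literal level, which suffices for example (1) because the implications in W1 only derive ab-literals that never occur as prerequisites, and the blocking/supporting sets are taken as precomputed data:

```python
from itertools import combinations

# Defaults of example (1) at the literal level: prerequisite and consequent.
# Justifications enter only through the precomputed blocking/supporting sets.
prereq = {"δf": "b", "δw": "b", "δb": "p", "δ¬f": "p"}
concl  = {"δf": "f", "δw": "w", "δb": "b", "δ¬f": "¬f"}
facts  = {"p"}                                   # the literal facts of W1
blocking   = {"δf": [{"δ¬f"}], "δ¬f": [{"δb", "δf"}], "δw": [], "δb": []}
supporting = {"δf": [{"δb", "δf"}], "δ¬f": [{"δ¬f"}],
              "δw": [set()], "δb": [set()]}

def grounded(ds):        # SDL1: the rules can be applied in some order
    derived, rest = set(facts), set(ds)
    while True:
        fire = {d for d in rest if prereq[d] in derived}
        if not fire:
            return not rest
        derived |= {concl[d] for d in fire}
        rest -= fire

def sdl2(ds):            # no member's blocking set lies inside ds
    return all(not b <= ds for d in ds for b in blocking[d])

def sdl3(ds):            # each member has some supporting set inside ds
    return all(any(s <= ds for s in supporting[d]) for d in ds)

D = list(prereq)
ok = [frozenset(c) for r in range(len(D) + 1) for c in combinations(D, r)
      if grounded(c) and sdl2(set(c)) and sdl3(set(c))]
gens = [s for s in ok if not any(s < t for t in ok)]   # the maximal ones
print(gens)   # Th(W1 ∪ c(D′)) for these D′ gives the S-extensions
```

The two maximal sets found, {δf, δw, δb} and {δw, δb, δ¬f}, are exactly the generating defaults of E2 and E1 discussed above.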
4 Elaboration in context
This section continues with the elaboration of supported default logic and its underlying concepts in the context of Reiter's and Lukaszewicz' default logic. We need the following definition. For default theory ∆ = (D, W) and D′ ⊆ D, define ∆|D′ = (D \ (D′ ∪ D̄′), W ∪ c(D′)), where⁷ D̄′ = {δ ∈ D | W ∪ c(D′) ⊢ ¬j(δ)}. The next result shows that operator | allows for filtering out extensions that are generated by a given rule set:

Theorem 5. Let ∆ = (D, W) be a default theory and let E be a set of formulas. Further, let D′ ⊆ GDR(D, E) be grounded in W. We have that E is an R-extension of ∆ iff E is an R-extension of ∆|D′.

R- and S-extensions. To begin with, we show that Reiter's conception of default logic coincides with ours whenever there are no odd cycles in the block graph:

Theorem 6. Let ∆ be an even default theory and let E be a set of formulas. We have that E is an R-extension of ∆ iff E is an S-extension of ∆.
⁷ D̄′ eliminates defaults whose justification is inconsistent with the facts of ∆|D′.
This explains further why we obtain the same R- and S-extensions from (D1, W1). In the general case, both approaches coincide whenever the generating defaults induce an arcless block graph:

Theorem 7. Let ∆ = (D, W) be a default theory and let E be a set of formulas. We have that E is an R-extension of ∆ iff E is an S-extension of ∆ and G(∆|GDS(D, E)) is arcless.

In fact, one can show that if E is an S-extension but not an R-extension, then there is an odd cycle in G(∆|GDS(D, E)) and hence also in G(∆). In other words, R- and S-extensions coincide whenever there is no source for incoherence. In less technical terms, we have in general the following corollary:

Corollary 1. Every R-extension is an S-extension, but not vice versa.

In fact, the generating default rules of R-extensions do always induce arcless block graphs:

Theorem 8. Let ∆ = (D, W) be a default theory. If E is an R-extension of ∆, then G(∆|GDR(D, E)) is arcless.

This is different from S-extensions, which leave behind harmful odd cycles in the block graph. For instance, the generating default rules of both S-extensions E1 and E2 of (D1 ∪ {δ¬x}, W1) induce block graphs containing odd cycle (δ¬x, δ¬x).

L- and S-extensions. Let us now turn to the relationship between our approach and that of Lukaszewicz. First of all, we note that we obtain identical L- and S-extensions from normal default theories. In analogy to Theorem 7, we have the following result:

Theorem 9. Let ∆ = (D, W) be a default theory and let E be a set of formulas. If D′ = GDS(D, E) = GDL(D, E, J) for some J ⊆ j(D) and G(∆|D′) is arcless, then E is an S-extension of ∆ iff E is an L-extension of ∆ and D′ satisfies SDL3.

That is, whenever the generating default rules induce an arcless block graph, then an L-extension is an S-extension if it satisfies SDL3. More precisely, we have the following relationship.

Theorem 10. Let ∆ = (D, W) be a default theory and let E be a set of formulas.
We have that
– if E is an L-extension of ∆ and GDL(D, E, J) satisfies SDL3, then E is an S-extension of ∆, and
– if E is an S-extension of ∆ and GDS(D, E) is maximal in SDL1 and SDL2, then E is an L-extension of ∆.

Both types of extensions are induced by grounded (SDL1) and weakly regular (SDL2) sets of default rules, so that their difference boils down to SDL3. This condition enforces that the application of each default is inseparably
connected with that of one of its supporting sets. The absence of SDL3 leads to semi-monotonicity, which allows defaults to support themselves when forming L-extensions. To see this, recall that E2 = Th(W1 ∪ {b, w, f}) is an L-extension of (D1 ∪ {p : abb / abb}, W1), although it is no S-extension (and no R-extension). This is because the contribution of δf to L-extension E2 is ensured by semi-monotonicity, while it is ruled out by SDL3 in supported default logic (and R-default logic, see below). Since both the existence of L- and S-extensions is guaranteed, the question arises how the underlying approaches handle odd cycles destroying R-extensions. In fact, we obtain S-extension Th(∅) from theory (D2, W2), whereas there are three L-extensions, viz. Th({a}), Th({b}), and Th({c}). This shows how semi-monotonicity unfolds the odd cycle in Lukaszewicz' variant, whereas our approach simply ignores the rules belonging to the harmful cycle. This is advantageous whenever there are multiple odd cycles, because they induce an exponential number of L-extensions in the worst case.

R- and L-extensions. Let us finally exploit our instruments even further for making the relationship between Reiter's and Lukaszewicz' conception of default logic more precise. Lukaszewicz already showed in [14] that every R-extension is an L-extension, but not vice versa. Also, it is well known that both approaches coincide on normal default theories. To begin with, we show that default theories with arcless block graphs yield the same R- and L-extensions:

Theorem 11. Let ∆ be a non-conflicting default theory and let E be a set of formulas. We have that E is an R-extension of ∆ iff E is an L-extension of ∆.

Note that weakly regular and strongly regular default logics differ on non-conflicting theories, like ({ : a / b ,  : ¬b / c }, ∅). Our result is therefore orthogonal to the general equivalence of these default logics on normal default theories.
The last result already fails to hold for well-ordered theories, such as ({ : a / a ,  : b∧¬a / b }, ∅). This theory has one extension containing a under Reiter's interpretation, while a second one containing b emerges in Lukaszewicz' default logic. We have the following result for the general case, which provides (to the best of our knowledge) the first "iff" result between R- and L-extensions.

Theorem 12. Let ∆ = (D, W) be a default theory and let E be a set of formulas. We have that E is an R-extension of ∆ iff E is an L-extension of ∆, GDL(D, E, J) satisfies SDL3, and G(∆|GDL(D, E, J)) is arcless.

In addition to SDL3, the difference between L- and R-extensions boils down to the induction of an arcless block graph (already observed between R- and S-extensions). In fact, the last result is not only of theoretical importance, but moreover of practical relevance, since it furnishes an easy procedure for constructing R-extensions from L-extensions. For this, we first construct (incrementally) an
L-extension and then verify by recourse to the block graph whether the corresponding generating default rules satisfy the two additional conditions. This is detailed next.

Constructing extensions. The last series of results has not only shed light on the relationships between the considered variants, but it has moreover provided a new view on the respective extension construction processes. In fact, we can directly read off Theorem 12 the following recipe for constructing R-extensions:

Procedure R-extension(∆ = (D, W) : Default theory)
0. Construct the block graph G(∆) of ∆.
1. Construct a maximal set D′ ⊆ D of default rules satisfying SDL1 and SDL2.
2. If D′ satisfies SDL3 and G(∆|D′) is arcless, then return Th(W ∪ c(D′)).

Interestingly, our above results show that one could integrate the verification of the two conditions in Step 2 into the maximization in Step 1. Then, however, Step 1 would go beyond the construction of L-extensions. For constructing L-extensions it is clearly sufficient to replace Step 2 by:
2. Return Th(W ∪ c(D′)).

For constructing S-extensions, we must integrate the verification of SDL3 into Step 1, while the condition on G(∆|D′) is dropped:

Procedure S-extension(∆ = (D, W) : Default theory)
0. Construct the block graph G(∆) of ∆.
1. Construct a maximal set D′ ⊆ D of default rules satisfying SDL1, SDL2 and SDL3.
2. Return Th(W ∪ c(D′)).

As opposed to L-extensions, we must account for SDL3 when computing R- and S-extensions. In fact, the plain condition imposed by SDL3 comprises a "don't know" choice for the supporting set S ∈ S∆(δ) accompanying rule δ. Interestingly, this turns out to be a "don't care" choice whenever {δ} ∪ S satisfies SDL1, SDL2 and SDL3. We make this precise below in Theorem 15. An issue common to all three procedures is the construction of the block graph at Step 0. Apart from its explicit inspection when constructing R-extensions, the block graph plays an important pragmatic role for verifying SDL2 and SDL3.
This is because it delineates the respective search space: Given a default rule, its blocking sets are necessarily found among its predecessors in the block graph, while its supporting sets are among its pre-predecessors. [13] contains case-studies showing that for instance the encoding of the Hamiltonian cycle problem given in [4] yields a rather dense graph, while the encoding of graph coloring [4] and taxonomic knowledge results in rather sparse graphs. The block graph’s role as an instrument indicating rules relevant to the application of other rules is further elaborated upon next.
Restricted semi-monotonicity. Apart from the plain fact that Reiter's default logic does not enjoy semi-monotonicity (except for restricted theories), there has as yet been no further elaboration of semi-monotonicity under Reiter's interpretation. We address this shortcoming by providing a conditioned semi-monotonicity property for Reiter's default logic. For this, we need the following definitions: For a block graph G(∆) = (D, A) and vertex v ∈ D, define the reachable predecessors of v as γ∆(v) = ⋃i≥0 γ∆^i(v), where γ∆^0(v) = {v} and γ∆^i(v) = {u | (u, w) ∈ A and w ∈ γ∆^{i−1}(v)} for i ≥ 1. Finally, for D′ ⊆ D, define γ∆(D′) = ⋃v∈D′ γ∆(v). Then we can show the following property of restricted semi-monotonicity for Reiter's default logic:

Theorem 13. Let ∆ = (D, W) be a default theory and let D′ ⊆ D be a set of defaults. If (γ∆(D′), W) has an R-extension E′ and ∆|GDR(γ∆(D′), E′) is coherent, then ∆ has an R-extension E with E′ ⊆ E.

If D′ is the set of generating defaults of E′, then there is an R-extension E of ∆ with E′ ⊆ E, provided that ∆|D′ is coherent (which is verifiable by appeal to block graph G(∆|D′)). Since odd loops in G(∆|D′) cannot harm S-extensions, we may drop the coherence condition in supported default logic:

Theorem 14. Let ∆ = (D, W) be a default theory and let D′ ⊆ D be a set of defaults. If (γ∆(D′), W) has an S-extension E′, then ∆ has an S-extension E with E′ ⊆ E.

The last two theorems exploit the structure of block graphs for capturing the nature of semi-monotonicity in Reiter's and supported default logic. While full semi-monotonicity starts out from an arbitrary subset D′ ⊆ D, we must additionally account for the reachable predecessors of D′ in block graph G(∆) in order to guarantee the continued existence of partial (R- and) S-extension E′. The lack of coherence in Reiter's approach necessitates moreover the inspection of the remaining rules in D \ γ∆(D′) by examining ∆|GDR(γ∆(D′), E′).
Although the coherence of this theory is often verifiable by appeal to its block graph (cf. Section 2), the mere possibility of a hidden incoherence in D \ γ∆(D′) causes the computational inconvenience that all default rules must in some way or another be inspected for reasoning under Reiter's interpretation (for ensuring an encompassing extension). In contrast to this, full semi-monotonicity allows for constructing L-extensions by gradually adding one default after another as long as SDL1 and SDL2 are satisfied. In fact, a similar procedure is possible for constructing S-extensions, yet at another level of granularity:

Theorem 15 (Compositional incrementality). Let ∆ = (D, W) be a default theory and let D′ ⊆ D be a set of defaults. If D′ satisfies SDL1, SDL2 and SDL3 (with respect to ∆), then ∆ has an S-extension E with D′ ⊆ GDS(D, E).
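The reachable predecessors γ∆ used in Theorems 13 and 14 are an ordinary backward reachability computation on the block graph. A BFS sketch (our own encoding), run on the arcs of example (1):

```python
from collections import defaultdict, deque

def reachable_predecessors(arcs, start):
    """γ(D'): the rules of D' together with every rule from which some
    rule of D' is reachable, computed by BFS over reversed arcs."""
    pred = defaultdict(set)
    for u, w in arcs:
        pred[w].add(u)
    seen, queue = set(start), deque(start)
    while queue:
        v = queue.popleft()
        for u in pred[v] - seen:
            seen.add(u)
            queue.append(u)
    return seen

# Block graph of example (1): arcs (δ¬f,δf), (δf,δ¬f), (δb,δ¬f).
arcs = {("δ¬f", "δf"), ("δf", "δ¬f"), ("δb", "δ¬f")}
print(reachable_predecessors(arcs, {"δf"}))   # δf, δ¬f and δb
print(reachable_predecessors(arcs, {"δw"}))   # δw has no predecessors
```

So a partial extension built from δf alone must also take δ¬f and δb into account, whereas δw can be considered in isolation.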
The important consequence of this result is that S-extensions are constructible by progressively adding grounded and weakly regular sets of defaults that contain a supporting set for each constituent rule. We refer to such sets, like D′, as supported sets. A strategy would be to start with a rule δ and one of its supporting sets S ∈ S∆(δ). While conditions SDL1 and SDL2 depend merely on the rule set in focus, one may have to supplement additional rules, say S′, for SDL3. Once a supported set like {δ} ∪ S ∪ S′, satisfying all three criteria, has been applied, it has the same incontestable status as an applied individual rule δ under full semi-monotonicity. Hence, for constructing S-extensions of (D1, W1), we may rely on supported sets {δf, δb}, {δw}, {δb}, {δ¬f}, while (D1 ∪ {δabb}, W1) gives {δw}, {δb}, {δ¬f}, {δ¬f, δabb}, {δabb}. All of them are freely combinable unless their union violates SDL2 or SDL3. This leads finally to the respective sets of generating defaults.
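The free combination of supported sets described above can be sketched as a greedy procedure. This is our own encoding, not the paper's: SDL1 and SDL3 hold automatically on unions here, because each supported set is grounded in W and already carries supports for its own members, so only SDL2 must be rechecked:

```python
# Supported sets of (D1, W1) from the text, plus the blocking sets of (1).
blocking = {"δf": [{"δ¬f"}], "δ¬f": [{"δb", "δf"}], "δw": [], "δb": []}
supported = [{"δf", "δb"}, {"δw"}, {"δb"}, {"δ¬f"}]

def sdl2(ds):
    """SDL2: no member of ds has one of its blocking sets inside ds."""
    return all(not set(b) <= ds for d in ds for b in blocking.get(d, []))

def grow(current, rest):
    """Greedily add supported sets while SDL2 stays satisfied."""
    for i, s in enumerate(rest):
        if sdl2(current | s):
            return grow(current | s, rest[i + 1:])
    return current

print(grow(set(), supported))                        # starts from {δf, δb}
print(grow({"δ¬f"}, [{"δf", "δb"}, {"δw"}, {"δb"}]))  # starts from {δ¬f}
```

Depending on which supported set is taken first, the procedure arrives at {δf, δb, δw} or {δ¬f, δw, δb}, i.e. the generating defaults of the two S-extensions of (D1, W1).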
5 Conclusion
We presented an approach to default logic that deviates from the original approach merely on odd default theories (cf. Theorems 6 and 7). Our approach aims at balancing the expressiveness of Reiter's default logic with the notion of feasibility found in semi-monotonic variants. While it complies with Reiter's approach in enabling blockage via default conclusions, it provides incremental constructions using (supported) sets of default rules rather than individual rules, as in Lukaszewicz' variant. We thus shift the emphasis from the application of individual defaults to the joint application of a default together with one of its supporting sets. We observe that violating this may either lead to the destruction of (R-)extensions or to a tremendous increase in the number of (L-)extensions. A rather different approach to feasibility is pursued in [1,19,3] by using ideas borrowed from well-founded semantics. These approaches differ from ours in several respects. First, they are interested in conclusions belonging to all extensions rather than in extensions themselves. Second, the two former approaches are rather weak approximations, as shown in [3]. Finally, the latter approach is only defined for coherent theories, which takes it out of the focus of our approach. On the other hand, these approaches are well studied as regards computational complexity. For semi-monotonic variants, one may draw on their usual equivalence to Reiter's approach on normal default theories, since the central complexity proofs in [10] rely on prerequisite-free normal default theories. What is definitely needed here is a more fine-grained complexity analysis, addressing constructive issues and distinguishing different treatments of general default rules, as done for instance in [6]. This shortcoming applies also to our work and makes it an issue of future research. In [18], finite sets of justifications, so-called full sets, are used to characterize R-extensions.
Full sets contain those justifications that are consistent with the set obtained by closing the initial set of facts under classical inferences and the defaults (used as monotonic inference rules) whose justifications belong to the full set. Blocking sets also use default rules as monotonic inference rules, but
here the negated justifications of other defaults are derived. In this sense, blocking sets and full sets can be considered dual. However, there is another important difference between full and blocking sets: Whereas the former characterize entire R-extensions, the latter are just potential parts of some R-extensions. The distinction between coherence and semi-monotonicity has so far been neglected in the literature. Our approach is thus unique in that it guarantees the existence of extensions without requiring semi-monotonicity. In fact, so far the major distinguishing properties of default logics were given by cumulativity, regularity and semi-monotonicity [8]; coherence was always subsumed by semi-monotonicity, as one of its consequences. This was insofar appropriate as, up to now, existing variants enjoyed either both semi-monotonicity and coherence or neither of them. So how does supported default logic fit into the picture? Actually, as regards formal properties, it is indistinguishable from Reiter's approach when odd theories are not at issue (cf. Theorem 6). That is, it enjoys weak regularity, whereas it satisfies neither semi-monotonicity nor cumulativity (as verifiable by the standard example). Our elaboration has also revealed structural dependencies that shed light on existing approaches. In particular, we have clarified the relationship between R- and L-extensions, and we have given a non-fixed-point definition of R-extensions along with a recipe for constructing R-extensions from L-extensions.
Acknowledgements We would like to thank the anonymous referees and Hans Tompits for commenting on a previous version of this paper.
References

1. C. Baral and V. Subrahmanian. Duality between alternative semantics of logic programs and nonmonotonic formalisms. In First International Workshop on Logic Programming and Nonmonotonic Reasoning, pages 69–86. MIT Press, 1991.
2. G. Brewka. Cumulative default logic: In defense of nonmonotonic inference rules. Artificial Intelligence, 50(2):183–205, 1991.
3. G. Brewka and G. Gottlob. Well-founded semantics for default logic. Fundamenta Informaticae, 31(3-4):221–236, 1997.
4. P. Cholewiński, V. Marek, A. Mikitiuk, and M. Truszczyński. Experimenting with nonmonotonic reasoning. In Proceedings of the International Conference on Logic Programming. MIT Press, 1995.
5. J. Delgrande, T. Schaub, and W. Jackson. Alternative approaches to default logic. Artificial Intelligence, 70(1-2):167–237, 1994.
6. Y. Dimopoulos. The computational value of joint consistency. In L. Pereira and D. Pearce, editors, European Workshop on Logics in Artificial Intelligence (JELIA'94), volume 838 of Lecture Notes in Artificial Intelligence, pages 50–65. Springer Verlag, 1994.
7. D. Etherington. Reasoning with Incomplete Information: Investigations of Non-Monotonic Reasoning. PhD thesis, Department of Computer Science, University of British Columbia, Vancouver, BC, 1986. Revised version appeared as: Research Notes in AI, Pitman.
8. C. Froidevaux and J. Mengin. Default logic: A unified view. Computational Intelligence, 10(3):331–369, 1994.
9. L. Giordano and A. Martinelli. On cumulative default logics. Artificial Intelligence, 66(1):161–179, 1994.
10. G. Gottlob. Complexity results for nonmonotonic logics. Journal of Logic and Computation, 2(3):397–425, June 1992.
11. F. Lévy. Computing extensions of default theories. In R. Kruse and P. Siegel, editors, Proceedings of the European Conference on Symbolic and Quantitative Approaches for Uncertainty, volume 548 of Lecture Notes in Computer Science, pages 219–226. Springer Verlag, 1991.
12. T. Linke and T. Schaub. An approach to query-answering in Reiter's default logic and the underlying existence of extensions problem. In J. Dix, L. Fariñas del Cerro, and U. Furbach, editors, Logics in Artificial Intelligence, Proceedings of the Sixth European Workshop on Logics in Artificial Intelligence, volume 1489 of Lecture Notes in Artificial Intelligence, pages 233–247. Springer Verlag, 1998.
13. T. Linke. New Foundations for Automation of Default Reasoning. Dissertation, University of Bielefeld, 1999.
14. W. Lukaszewicz. Considerations on default logic — an alternative approach. Computational Intelligence, 4:1–16, 1988.
15. W. Marek and M. Truszczyński. Nonmonotonic Logic: Context-Dependent Reasoning. Artificial Intelligence. Springer Verlag, 1993.
16. W. Marek and M. Truszczyński. Normal form results for default logics. In G. Brewka, K. Jantke, and P. Schmitt, editors, Nonmonotonic and Inductive Logic, volume 659 of Lecture Notes in Artificial Intelligence, pages 153–174. Springer Verlag, 1993.
17. A. Mikitiuk and M. Truszczyński. Rational default logic and disjunctive logic programming. In A. Nerode and L. Pereira, editors, Proceedings of the Second International Workshop on Logic Programming and Non-monotonic Reasoning, pages 283–299. MIT Press, 1993.
18. I. Niemelä. Towards efficient default reasoning. In C. Mellish, editor, Proceedings of the International Joint Conference on Artificial Intelligence, pages 312–318. Morgan Kaufmann Publishers, 1995.
19. H. Przymusinska and T. Przymusinski. Stationary default extensions. Fundamenta Informaticae, 21(1-2):76–87, 1994.
20. R. Reiter. A logic for default reasoning. Artificial Intelligence, 13(1-2):81–132, 1980.
21. V. Risch. Analytic tableaux for default logics. Journal of Applied Non-Classical Logics, 6(1):71–88, 1996.
Coherent Well-founded Annotated Logic Programs

Carlos Viegas Damásio¹, Luís Moniz Pereira², and Terrance Swift³

¹ A.I. Centre, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2825-114 Caparica, Portugal. ([email protected])
² A.I. Centre, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2825-114 Caparica, Portugal. ([email protected])
³ Department of Computer Science, University of Maryland, College Park, MD, USA. ([email protected])
Abstract. Extended logic programs and annotated logic programs are two important extensions of normal logic programs that allow for a more concise and declarative representation of knowledge. Extended logic programs add explicit negation to the default negation of normal programs in order to distinguish what can be shown to be false from what cannot be proven true. Annotated logic programs generalize the set of truth values over which a program is interpreted by explicitly annotating atoms with elements of a new domain of truth values. In this paper coherent well-founded annotated programs are defined, and shown to generalize both consistent and paraconsistent extended programs, along with several classes of annotated programs.
1 Introduction
The ability to concisely represent knowledge by a logic program, along with the ability to efficiently evaluate that program, can lead to important applications of logic programming. This has been seen to be the case in diagnosis, model checking, grammar processing, and many other applications. Indeed, a stream of research has focused on how logic programming can be employed to better represent knowledge. For instance, extended logic programs add explicit negation to normal programs, and gain the fundamental ability to distinguish what can be shown to be false from what is false by default because it cannot be proven true. This distinction can be useful in representing knowledge that derives from separate, possibly contradictory, sources. These two negations are conveniently related through the principle of coherence, which states that an atom that is explicitly proven false must be considered default false as well. In fact, the coherence principle underlies two main semantics for extended programs: the answer set semantics [9] and the well-founded semantics with explicit negation [2]. A separate line of research, into annotated logic programs, has extended the domain of truth-values over which logic programs are interpreted. Rather than mapping atoms into true, false, or undefined, they are mapped into domains that allow paraconsistent or quantitative information to be represented. This research direction is represented by formalisms such as GAPs [11] and Amalgamation Logic [13].

Each of these extensions is powerful in itself, but suffers from some deficiencies regarding knowledge representation. Extended logic programs, per se, cannot easily represent quantitative information such as probabilities or degrees of belief; annotated logic programs, per se, cannot easily relate what is explicitly known to be false to what is not known to be true. In this paper we propose a framework for coherent well-founded annotated programs that combines the expressivity of both annotated and extended logic programs. We show that several classes of annotated programs can be embedded into coherent well-founded annotated programs, as can both consistent and paraconsistent extended programs.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 262–276, 1999. © Springer-Verlag Berlin Heidelberg 1999
2 Generalized Annotated Logic Programs
Generalized annotated logic programs (GAPs) are an extension of ordinary definite logic programs. The language, semantics, and query answering procedures are covered in the joint article by Kifer and Subrahmanian [11]. In this section we recall their fundamental results required by our study.

A GAP is defined with respect to an underlying upper semi-lattice of truth-values (T, ≼), representing a partial ordering among truth-values. This lattice can be used to represent fuzzy truth-values, time intervals, paraconsistent logics, qualitative degrees of truth, and the like [11,13,1]. In our work we assume this lattice is complete, and therefore the existence of the minimum and maximum elements is always guaranteed, represented by ⊥ and ⊤ respectively.

Truth-values are referred to in programs by means of annotation terms, while the basic syntactic elements of generalized annotated logic programs are annotated atoms. Given a set of atoms A, an annotated atom has the form A : µ, where A is an atom in A and µ is an annotation term. For instance, low_temp : 0.82 may mean that the temperature is low with a certainty of at least 82%, or box(a) : 4 could signify that object a is a box with a confidence level of at least 4. Formally:

Definition 1 (Annotation terms). Let (T, ≼) be a complete lattice, F a set of function symbols of arity n ≥ 1, and V a set of variables ranging over the truth-values T. Then,
1. Every element of T is a (simple) annotation term;
2. Every annotation variable of V is a (simple) annotation term;
3. If f is an n-ary annotation function symbol of F and t1, ..., tn are annotation terms, then f(t1, ..., tn) is an annotation term;
4. Nothing else is an annotation term.

An annotation function symbol is assumed to be computable and continuous, and hence monotonic [11].

Definition 2 (Generalized Annotated Logic Program). A generalized annotated logic program is a set of annotated clauses of the form:

A0 : µ0 ← A1 : µ1 & ... & An : µn

where A0 : µ0 is an annotated atom, and the Ai : µi (1 ≤ i ≤ n) are atoms annotated with simple annotation terms. In a ground annotated clause all annotations are truth-values of T.

For our purposes the above syntax suffices. However, in the original paper a full-blown first-order-like syntax is introduced, with the usual connectives and quantification symbols. The details can be found in [11]. The reading of an annotated clause of the form A0 : µ0 ← A1 : µ1 & ... & An : µn is "if A1 is at least µ1 and ... and An is at least µn, then A0 is at least µ0." Note that function symbols may only appear in the heads of annotated clauses. Furthermore, one can instantiate all the annotation variables and evaluate all the function symbol annotations in the heads of the resulting ground program, i.e. where all annotated clauses are replaced by all their ground instances. For simplicity, we assume from now on that this grounding operation has been performed on every program, which may result in an infinite program. This program is dubbed a "strictly ground instance" in [11].

An interpretation is a mapping from the set of atoms to the set of truth-values T. This corresponds to the restricted interpretations of [11]. Given an interpretation, it is straightforward to define a satisfaction relation:

Definition 3 (Ground satisfaction). Let I be an interpretation on (T, ≼). We define the ground satisfaction relation, denoted by ⊨, as follows, where all annotations are ground:
– I ⊨ A : µ iff I(A) ≽ µ;
– I ⊨ A1 : µ1 & ... & An : µn iff I ⊨ A1 : µ1 and ... and I ⊨ An : µn;
– I ⊨ A0 : µ0 ← A1 : µ1 & ... & An : µn iff I ⊭ A1 : µ1 & ... & An : µn or I ⊨ A0 : µ0.

An interpretation I is a model of a ground GAP iff it satisfies all the annotated clauses in the program.

The ordering of the underlying lattice of truth-values is easily extended to the point-wise ordering between interpretations. As usual, we are interested in the minimal model of the program. It can be obtained by extending the TP operator of van Emden and Kowalski [6] to this more general setting:

Definition 4 (Immediate consequences operator). Let P be a generalized annotated logic program on the complete lattice (T, ≼). The immediate consequences operator, a function mapping interpretations into interpretations, is defined by:

T_P(I)(A) = lub { µ | I ⊨ A1 : µ1 & ... & An : µn, where A : µ ← A1 : µ1 & ... & An : µn belongs to P }
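To make Definition 4 concrete, the following Python sketch (our illustration, not part of the original paper) computes the least fixpoint of T_P for a finite ground GAP; the lattice representation, clause encoding, and example program are hypothetical stand-ins.

```python
# Sketch: computing the least model of a finite ground GAP by iterating
# the immediate consequences operator T_P of Definition 4. The program
# encoding and lattice representation are our own, for illustration.

def lub(values, leq, elems):
    """Least upper bound of `values` in the finite lattice (elems, leq)."""
    ubs = [x for x in elems if all(leq(v, x) for v in values)]
    for u in ubs:                      # the least among the upper bounds
        if all(leq(u, x) for x in ubs):
            return u
    raise ValueError("not a complete lattice")

def tp(program, I, leq, elems):
    """T_P(I): each atom gets the lub of the head annotations of clauses
    whose bodies I satisfies (I |= A:mu iff mu is below I(A))."""
    return {A: lub([mu for (h, mu, body) in program
                    if h == A and all(leq(m, I[a]) for (a, m) in body)],
                   leq, elems)
            for A in I}

def least_model(program, atoms, leq, elems, bottom):
    """Iterate T_P from the least interpretation until a fixpoint."""
    I = {a: bottom for a in atoms}
    J = tp(program, I, leq, elems)
    while J != I:
        I, J = J, tp(program, J, leq, elems)
    return I

# a two-valued example over L2 = {0, 1}:  p.   q <- p.   r <- s.
prog = [('p', 1, []), ('q', 1, [('p', 1)]), ('r', 1, [('s', 1)])]
M = least_model(prog, {'p', 'q', 'r', 's'}, lambda a, b: a <= b, [0, 1], 0)
print(sorted(M.items()))  # [('p', 1), ('q', 1), ('r', 0), ('s', 0)]
```

Note that lub applied to an empty set of fired heads correctly yields ⊥, mirroring the definition above.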
Because this operator is monotonic by the Knaster-Tarski fixpoint theorem, i.e. if I 4 J then TP (I) 4 TP (J), we can conclude that TP has a least fixpoint, which corresponds to the least model of P . It can be found by iterating from the least interpretation, where all atoms are initially assigned the truth-value ⊥. However, this operator is not continuous. See [11] for more details. We now provide some examples to illustrate the above concepts. Example 1. Definite logic programs are easily captured by generalized annotated logic programs. Let L2 = {⊥, >} with ⊥ ≺ >. For instance the classical member/2 predicate, written as a GAP over L2 is: member(X, [X| ]) : >.
member(X, [ |Y ]) : > ← member(X, Y ) : >.
In general, we obtain an equivalence between GAPs over L2 and definite logic programs by adding the annotation “: >” to every predicate symbol of the latter. Example 2. [10] Consider Belnap’s logic FOUR = ({⊥, f , t, >}, {⊥ ≺ f , ⊥ ≺ t, f ≺ >, t ≺ >}). The tweety example can be encoded as: f lies(X) : t ← bird(X) : t. f lies(X) : f ← penguin(X) : t. bird(X) : t ← penguin(X) : t.
penguin(f red) : t. bird(tweety) : t.
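Example 2 can be checked mechanically by iterating T_P. In the sketch below (our own illustration) FOUR is encoded as the powerset of the evidence marks '+' (true) and '-' (false), so the lattice order is set inclusion and lub is union:

```python
# Example 2 checked by iterating T_P over FOUR, encoded as the powerset
# of evidence marks: bottom = {}, t = {'+'}, f = {'-'}, top = {'+','-'}.
# Encoding and clause format are ours, for illustration.
from functools import reduce

BOT, T, F, TOP = frozenset(), frozenset('+'), frozenset('-'), frozenset('+-')

# ground clauses: (head_atom, head_annotation, body of (atom, annotation))
program = [
    ('flies(tweety)', T, [('bird(tweety)', T)]),
    ('flies(fred)',   T, [('bird(fred)', T)]),
    ('flies(fred)',   F, [('penguin(fred)', T)]),
    ('bird(fred)',    T, [('penguin(fred)', T)]),
    ('penguin(fred)', T, []),
    ('bird(tweety)',  T, []),
]
atoms = {h for (h, _, _) in program} | \
        {a for (_, _, body) in program for (a, _) in body}

def tp(I):
    """One step of T_P; lub of fired head annotations is set union."""
    return {A: reduce(frozenset.union,
                      [mu for (h, mu, body) in program
                       if h == A and all(m <= I[a] for (a, m) in body)],
                      BOT)
            for A in atoms}

I = {a: BOT for a in atoms}
while tp(I) != I:                 # iterate from bottom to the least model
    I = tp(I)
print(I['flies(tweety)'] == T, I['flies(fred)'] == TOP)  # True True
```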
In this example we conclude that f lies(tweety) : t and f lies(f red) : >. Mark that the corresponding first-order theory does not have any model. Example 3. By defining C to be a lattice of probability intervals, GAPs can be used to implement probabilistic reasoning. Specifically, if probabilities associated with atoms are assumed to be independent, the join operation of C can be defined as the intersection < max(Low1 , Low2 ), min(High1 , High2 ) > of two intervals [Low1 , High1 ] and [Low2 , High2 ]. Expanding on this idea, GAPs can be used to implement a significant subset of Hybrid Probabilistic Programs [5]. Under the lattice C, GAPs have been used to model probabilistic association rules in a deductive database about aircraft spare parts implemented in XSB [7]. An instance of such a rule is: process(P art,0 CADM IU M P LAT IN G0 , Source) : [94.7, 100] ← nomenclature(P art,0 BELL CRAN K 0 , Source) : [100, 100] & f ederal supply class(P art,0 A05000 , Source)) : [100, 100]. This rule allows one to infer the finishing process of a part given other definite true facts about the part that may be present in a database.
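The join on the interval lattice C described above can be sketched as follows; this is our illustration, and the collapse of an empty intersection into a single inconsistent top element is our assumption:

```python
# Join (lub) on the lattice of probability intervals: narrower intervals
# carry more information, so lub is interval intersection. An empty
# intersection is collapsed to an assumed inconsistent top element.
TOP = ('top',)   # hypothetical representation of the top of the lattice

def interval_lub(i1, i2):
    """lub of two intervals: [max of lows, min of highs], or TOP."""
    (l1, h1), (l2, h2) = i1, i2
    low, high = max(l1, l2), min(h1, h2)
    return (low, high) if low <= high else TOP

def interval_leq(i1, i2):
    """i1 carries no more information than i2 (i2 is a subinterval)."""
    if i2 == TOP:
        return True
    if i1 == TOP:
        return False
    return i1[0] <= i2[0] and i2[1] <= i1[1]

print(interval_lub((94.7, 100.0), (100.0, 100.0)))  # (100.0, 100.0)
```

With this order, [0, 1] is the bottom element (no information) and the rule head above combines with any more specific evidence by narrowing the interval.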
3 Coherent Well-founded Annotated Programs
Both common sense and expert knowledge may be positive (stating the veracity of facts and conclusions) or negative (expressing their falsity). It is also important to have the ability to assume truth or falsity of facts non-monotonically.
Generalized Annotated Logic Programs provide ease of expression of (monotonic) negative knowledge by means of epistemic negation. Epistemic negation, represented by the symbol "¬", is a unary operator on the truth-value lattice, subject to no other additional constraints. This negation corresponds to the notion of explicit negation in several semantics for extended logic programs.

Example 4. Continuing from Ex. 3, it is known that the primary material of certain types of parts must be either all Steel, all Aluminum, or all Magnesium. This gives us a rule, using explicit negation, for instance of the form

¬material(Part, 'STEEL', Source) : Prob ←
    material(Part, 'ALUMINUM', Source) : Prob &
    nomenclature(Part, 'STRUT', Source) : [100, 100].

The definition of the negation operator is ¬[Low, High] = [1 − High, 1 − Low].

As originally presented [11], the GAP framework lacks a form of (non-monotonic) default negation (called ontological negation in [10]), i.e. a non-monotonic closed world assumption. This has been remedied in the more recent work [13], where well-founded-like [8] and answer-sets-like [9] semantics have extended GAPs with a default negation operator. However, the semantics of [13] ignores a fundamental relationship that default and explicit negation should obey. Namely, that if something is stated false then it should be assumed false: the coherence principle¹. This principle has been extensively advocated in [2,3,4]. We will adopt coherence from now on and examine its consequences within the setting of annotated programs.

Example 5. Continuing from Ex. 4, it is also known that cadmium plating is only used on steel parts, so that, in the absence of more specific information about a part's material, the part may be inferred to be of generic steel. This requires default negation to be added to GAPs.

material(Part, 'STEEL', Source) : Prob ←
    process(Part, 'CADMIUM PLATING', Source) : Prob &
    not more_specific_material(Part, 'STEEL', Source) : Prob.

more_specific_material(Part, Mat, Source) : Prob ←
    material(Part, Mat1, Source) : Prob &
    subclass(Mat1, Mat, Source) : [100, 100].

In these rules about parts, GAPs are used to reason with the probabilistic data mining rules of Ex. 3, default negation is used to allow default inferences, while explicit negation is used to allow representation of contrary information in the rule of Ex. 4. Moreover, coherence ensures that default literals of the form not material(Part, 'STEEL', Source) : Prob are true in the deductive database by virtue of explicit negative information.

¹ Even though answer-sets are coherent, their paraconsistent [12] and annotated [13] extensions are not. For details consult [4].
Let us start by clarifying the notion of explicit negation.

Definition 5 (Explicit negation). Let (T, ≼) be a complete lattice. An explicit negation operator "¬" is a total mapping from T into T such that the following two conditions are satisfied:
1. for every µ ∈ T we have ¬¬µ = µ;
2. if µ ≼ ϑ then ¬µ ≼ ¬ϑ, for every µ, ϑ ∈ T.

An explicit negation operator enforces a symmetry transformation on the truth-values lattice. The use of negation is already covered by the original syntax of [11]. We extend the syntax with a default negation operator.

Definition 6 (Annotated objective and default literals). Let A : µ be an annotated atom constructed from the complete lattice (T, ≼) with an explicit negation operator "¬". For simplicity, assume that µ ∈ T. Then
– A : µ and ¬A : µ = A : ¬µ are annotated objective literals. We use the notation L : µ to refer to this type of literal;
– not(A : µ) and not(¬A : µ) = not(A : ¬µ) are annotated default literals. Similarly, we use not(L : µ) to denote annotated default literals.

The extension of the satisfaction relation to objective literals is straightforward. By definition ¬A : µ equals A : ¬µ. Therefore, I ⊨ ¬A : µ iff I ⊨ A : ¬µ iff I(A) ≽ ¬µ. Notice that ¬µ is an element of T. We conclude that in annotated programs without default negation, the explicit negation operator is just syntactic sugar. The syntax of Generalized Annotated Programs is appropriately extended:

Definition 7 (Normal Annotated Logic Programs). A normal annotated logic program is a set of annotated clauses of the form:

L0 : µ0 ← L1 : µ1 & ... & Lm : µm & not(H1 : ϑ1) & ... & not(Hn : ϑn)    (m, n ≥ 0)

where the Li : µi are annotated objective literals and the not(Hj : ϑj) are annotated default literals.

For default negation, the definition of the satisfaction relation is more intricate. One cannot simply define I ⊨ not(A : µ) via I ⊭ A : µ, as can be seen from the next example.

Example 6. Consider the normal annotated logic program on the lattice L2:

a : ⊤ ← not(a : ⊤)

The single model of this program is I(a) = ⊤. However, this is contrary to the usual requirement of a logic program: every true literal should be supported, roughly meaning that it should be implied only by the set of rules with true body for it, whose conjuncts are each supported. One might conclude that under these conditions this program has no computationally relevant model, as this single rule becomes an equivalence, but the body is not supported.
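The two conditions of Definition 5 can be checked mechanically on a finite lattice. The helper below is our own illustration; it verifies the involution and monotonicity conditions for the FOUR negation used later (¬⊥ = ⊥, ¬f = t, ¬t = f, ¬⊤ = ⊤), with FOUR again encoded as a powerset:

```python
# Check the two conditions of Definition 5 for a candidate explicit
# negation operator on a finite lattice given by its order relation.
# Illustrative sketch, not from the paper.
from itertools import product

def is_explicit_negation(neg, leq, elems):
    involutive = all(neg[neg[x]] == x for x in elems)        # condition 1
    monotone = all(leq(neg[x], neg[y])                       # condition 2
                   for x, y in product(elems, repeat=2) if leq(x, y))
    return involutive and monotone

# FOUR encoded as the powerset of {+, -} ordered by inclusion
BOT, T, F, TOP = frozenset(), frozenset('+'), frozenset('-'), frozenset('+-')
neg = {BOT: BOT, T: F, F: T, TOP: TOP}
print(is_explicit_negation(neg, frozenset.issubset, [BOT, T, F, TOP]))  # True
```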
One of the competing solutions to this problem is given by the well-founded semantics, for which the literal becomes undefined in the above situation. We adhere to this stream, and proceed by defining the meaning of the default negation operator via an alternating fixpoint construction, very similar to the original one of the well-founded semantics [8] and of amalgamation logic [13]. The crux of this technique is the notion of a Gelfond-Lifschitz-like operator:

Definition 8 (Γ_P^T operator). Let P be a normal annotated logic program over the complete lattice (T, ≼). Let I be an interpretation for P. The division of P by I over T is the generalized annotated logic program

P /T I = { L0 : µ0 ← L1 : µ1 & ... & Lm : µm |
           L0 : µ0 ← L1 : µ1 & ... & Lm : µm & not(H1 : ϑ1) & ... & not(Hn : ϑn) ∈ P
           and I ⊭ H1 : ϑ1 and ... and I ⊭ Hn : ϑn }

Then the operator Γ_P^T maps interpretations to interpretations as follows:

Γ_P^T(I) = lfp T_{P /T I}

That is, the least fixpoint of the immediate consequences operator T applied to the division of P by I with respect to T.

Proposition 1 (Anti-monotonicity of Γ_P^T). [13] Let P be a normal annotated logic program over the complete lattice (T, ≼). Let I and J be interpretations for P. Then I ≼ J implies that Γ_P^T(J) ≼ Γ_P^T(I).

With this operator, the well-founded semantics can be extended to the more general setting of normal annotated logic programs. This result is provided in [13]. Basically, the true atoms in the well-founded annotated semantics are given by the least fixpoint T = Γ_P^T Γ_P^T(T), and the default ones are obtained from F = H_P^T − Γ_P^T(T), where H_P^T is the set of all annotated objective literals (the annotated Herbrand base). We refer to this fixpoint semantics as the well-founded annotated semantics.

Example 7. Consider the normal annotated logic program over FOUR:

a : t ← not(b : t).
b : t ← not(a : t).
b : f.

According to the well-founded annotated semantics, these literals are entailed by the program:

{a : ⊥, b : f, b : ⊥} ∪ not {a : f, a : ⊤}
For lattice FOUR there is a natural explicit negation operator, where ¬⊥ = ⊥, ¬f = t, ¬t = f, and ¬⊤ = ⊤. What is odd about the above result is that though we have b : f we do not have not(¬b : f), which is the same as not(b : t), and therefore we cannot conclude a : t! This example shows the well-founded annotated semantics does not comply with the coherence principle, which would entail not(¬b : f) from b : f. In our opinion, this is unsatisfactory.

Nevertheless, the coherence property cannot be easily enforced on the well-founded annotated semantics. A naïve approach would resort to the semi-normal program, an approach used by the well-founded semantics with explicit negation (WFSX) [2]. However, in some situations the resulting semantics might not be coherent, in particular when we have undefined literals in the model. The next example illustrates this.

Example 8. Consider the lattice over the set of elements {⊥, t1, t2, f1, f2, ⊤} with ordering relation ⊥ ≺ t1, ⊥ ≺ f1, t1 ≺ t2, f1 ≺ f2, t2 ≺ ⊤, f2 ≺ ⊤. The explicit negation operator ¬ is given by ¬⊥ = ⊥, ¬t1 = f1, ¬f1 = t1, ¬t2 = f2, ¬f2 = t2, ¬⊤ = ⊤. Let P be the program:

a : t1.
a : f2 ← b : t1.
b : t1 ← not(b : t1).

Its semi-normal version Ps is:

a : t1 ← not(a : f1).
b : t1 ← not(b : t1) & not(b : f1).
a : f2 ← b : t1 & not(a : t2).

For extended programs without annotations, the well-founded semantics can be derived as the ≼-least fixpoint of Γ_P Γ_Ps, where Ps denotes the semi-normal rewrite of P. The computation of the least fixpoint of Γ_P Γ_Ps proceeds as follows:

I0               = {a = ⊥, b = ⊥}
Γ_Ps I0          = {a = ⊤, b = t1}
I1 = Γ_P Γ_Ps I0 = {a = t1, b = ⊥}
Γ_Ps I1          = {a = ⊤, b = t1}
I2 = Γ_P Γ_Ps I1 = I1

The annotated literals true in the model are {a : ⊥, a : t1, b : ⊥} ∪ not {b : f1, b : t2, b : f2, b : ⊤}. Thus we have a : t1 but not(a : f1) is not entailed! Coherence is not satisfied.

As the example shows, coherence is impeded because a : f2 is undefined. a : f2 is not being falsified via its semi-normal rule because a : t1 is not strong enough; a : t2 is required to do so. One way of guaranteeing coherence, and the one we follow in this paper, is to avoid these situations. We achieve that by removing rules from the program which can destroy coherence. This is accomplished by an extension of the semi-normal program called the down semi-normal program, whose construction requires that the down-set² of every element in the lattice be finite. Another solution, to be expounded elsewhere, is to introduce a tuneable coherence in complete lattices.

² In an ordered set P the down-set of x, denoted by ↓x, is {y ∈ P | y ≤ x}.
Definition 9 (Down semi-normal program). Let P be a normal annotated logic program over the finite lattice (T, ≼) with explicit negation operator "¬". The down semi-normal version of P, denoted by Pds, is the normal annotated logic program obtained as follows. If

L0 : µ0 ← L1 : µ1 & ... & Lm : µm & not(H1 : ϑ1) & ... & not(Hn : ϑn) ∈ P

then let {τ1, ..., τo} = (↓¬µ0) − {⊥}. The following rule is in Pds:

L0 : µ0 ← L1 : µ1 & ... & Lm : µm & not(H1 : ϑ1) & ... & not(Hn : ϑn) & not(L0 : τ1) & ... & not(L0 : τo)

Note that the finiteness condition is necessary to guarantee that each body in the down semi-normal program is finite. This simplifies the presentation in the finite case. The down semi-normal program can then be used to define the new operator on programs Γ_{Pds}^T. However, a more general operator can easily be defined to work on arbitrary complete lattices, by including the down semi-normalization condition directly in the program division operation:

Definition 10 (z_P^T operator). Let P be a normal annotated logic program over the complete lattice (T, ≼). Let I be an interpretation for P. The down division of P by I over T is the generalized annotated logic program

P ⇓T I = { L0 : µ0 ← L1 : µ1 & ... & Lm : µm |
           L0 : µ0 ← L1 : µ1 & ... & Lm : µm & not(H1 : ϑ1) & ... & not(Hn : ϑn) ∈ P
           and I ⊭ H1 : ϑ1 and ... and I ⊭ Hn : ϑn
           and for all τ ∈ (↓¬µ0) − {⊥} we have I ⊭ L0 : τ }

The operator z_P^T, mapping interpretations to interpretations, is defined by:

z_P^T(I) = lfp T_{P ⇓T I}

The proof of anti-monotonicity of z_P^T is straightforward.

Proposition 2 (Anti-monotonicity of z_P^T). Let P be a normal annotated logic program over the complete lattice (T, ≼). Let I and J be interpretations for P. Then I ≼ J implies that z_P^T(J) ≼ z_P^T(I).

Proof. Our proof relies on the fact that when I ≼ J, the program P ⇓T J has fewer rules than P ⇓T I. By monotonicity of the immediate consequences operator on the program, the result then immediately follows (with more rules one can derive more truths). Assume that a rule of P with head L : µ is removed in program P ⇓T I. We show that this rule is also removed in P ⇓T J. This is due to at least one of the following cases:

1. There is a default annotated literal not(H : ϑ) in the body of the rule such that I ⊨ H : ϑ. But since I ≼ J, then J ⊨ H : ϑ. Therefore the rule also does not belong to P ⇓T J.
2. There is a τ ∈ (↓¬µ) − {⊥} such that I ⊨ L : τ. But then J ⊨ L : τ. Therefore the rule does not appear in P ⇓T J.

We finally obtain the intended alternating fixpoint operator construction:

Proposition 3 (Monotonicity of Γ_P^T z_P^T). Let P be a normal annotated logic program over the complete lattice (T, ≼) with explicit negation operator "¬". Let I and J be two interpretations for P. Then I ≼ J implies Γ_P^T z_P^T(I) ≼ Γ_P^T z_P^T(J).

When the program and associated truth-value lattice are clear from context we omit them from the operators. Also, it should be clear to the reader that z_P^T coincides with Γ_{Pds}^T when T is finite. To further simplify notation we denote the combination of operators Γ_P^T z_P^T by Γ Γds, whenever confusion does not arise. Since the alternating fixpoint construction Γ Γds is monotonic, it always has a least fixpoint, which can be "obtained" by iterating from the least interpretation ∆, where for every atom A in the language we have ∆(A) = ⊥. The semantics of normal annotated logic programs follows.

Definition 11 (Down coherent well-founded semantics). Let P be a normal annotated logic program over the complete lattice (T, ≼) with explicit negation operator "¬". Let M be the least fixpoint of Γ Γds. Its down coherent well-founded semantics is given by

{A : µ | M(A) = ϑ and µ ≼ ϑ} ∪ {not(A : µ) | (Γds M)(A) = ϑ and µ ⋠ ϑ}

The least fixpoint M of Γ Γds determines the true annotated literals, while the default ones are those not entailed by Γds M.

Example 9. Consider the program and lattice of Ex. 7. First, note that the down semi-normal version of P is:

a : t ← not(b : t) & not(a : f).
b : t ← not(a : t) & not(b : f).
b : f ← not(b : t).

The semantics of the program is iteratively obtained as follows:

I0            = ∆ = {a = ⊥, b = ⊥}
Γds I0        = lfp {a : t ←; b : t ←; b : f ←} = {a = t, b = ⊤}
I1 = Γ Γds I0 = lfp {b : f ←} = {a = ⊥, b = f}
Γds I1        = lfp {a : t ←; b : f ←} = {a = t, b = f}
I2 = Γ Γds I1 = lfp {a : t ←; b : f ←} = {a = t, b = f}
Γds I2        = lfp {a : t ←; b : f ←} = {a = t, b = f}
I3 = Γ Γds I2 = I2
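The iteration above can be replayed mechanically. The following sketch (our own code, reusing the powerset encoding of FOUR) implements the down division of Definition 10 and checks that lfp(Γ Γds) for this program is a = t, b = f:

```python
# Down coherent well-founded fixpoint lfp(Gamma . Gamma_ds) over FOUR,
# using the down division of Definition 10. Encoding as in the earlier
# sketches; this is our illustration, not the paper's implementation.
from functools import reduce

BOT, T, F, TOP = frozenset(), frozenset('+'), frozenset('-'), frozenset('+-')
ELEMS = [BOT, T, F, TOP]
NEG = {BOT: BOT, T: F, F: T, TOP: TOP}    # explicit negation on FOUR

# clause: (head_atom, head_annotation, positive_body, default_body)
program = [
    ('a', T, [], [('b', T)]),   # a:t <- not(b:t)
    ('b', T, [], [('a', T)]),   # b:t <- not(a:t)
    ('b', F, [], []),           # b:f.
]
atoms = {'a', 'b'}

def least_model(clauses):
    I = {a: BOT for a in atoms}
    while True:
        J = {A: reduce(frozenset.union,
                       [mu for (h, mu, b) in clauses
                        if h == A and all(m <= I[x] for (x, m) in b)],
                       BOT)
             for A in atoms}
        if J == I:
            return I
        I = J

def division(I, down=False):
    kept = []
    for (h, mu, pos, neg) in program:
        if any(m <= I[x] for (x, m) in neg):
            continue                        # a default literal is blocked
        if down:                            # extra coherence conditions
            downset = [tau for tau in ELEMS
                       if tau <= NEG[mu] and tau != BOT]
            if any(tau <= I[h] for tau in downset):
                continue
        kept.append((h, mu, pos))
    return kept

gamma    = lambda I: least_model(division(I))
gamma_ds = lambda I: least_model(division(I, down=True))

M = {a: BOT for a in atoms}
while gamma(gamma_ds(M)) != M:              # iterate from bottom
    M = gamma(gamma_ds(M))
print(M)  # a = t, b = f
```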
Applying now Def. 11 we get the model:

M = {a : ⊥, a : t, b : ⊥, b : f} ∪ not {a : f, a : ⊤, b : t, b : ⊤}

One can easily check this is the expected model, and that coherence is verified.

Example 10. Let us return to the program and lattice of Ex. 8. The down semi-normal version Pds is:

b : t1 ← not(b : t1) & not(b : f1).
a : t1 ← not(a : f1).
a : f2 ← b : t1 & not(a : t2) & not(a : t1).

Note that in the last rule we have added the default literals not(a : t2) and not(a : t1) to the body of the rule. We now get the expected results:

I0            = ∆ = {a = ⊥, b = ⊥}
Γds I0        = lfp {a : t1 ←; a : f2 ← b : t1; b : t1 ←} = {a = ⊤, b = t1}
I1 = Γ Γds I0 = lfp {a : t1 ←; a : f2 ← b : t1} = {a = t1, b = ⊥}
Γds I1        = lfp {a : t1 ←; b : t1 ←} = {a = t1, b = t1}
I2 = Γ Γds I1 = I1

The literals true in the model are:

{a : ⊥, a : t1, b : ⊥} ∪ not {a : f1, a : t2, a : f2, a : ⊤, b : f1, b : t2, b : f2, b : ⊤}

Clearly coherence is obeyed. However, the semantics is in some cases "overly" coherent. The following illustrates this fact.

Example 11. Consider the lattice over the set of elements {⊥, t1, t2, ⊤1, f1, f2, ⊤2} with ordering relation ⊥ ≺ t1, ⊥ ≺ f1, t1 ≺ ⊤1, f1 ≺ ⊤1, ⊤1 ≺ t2, ⊤1 ≺ f2, t2 ≺ ⊤2, f2 ≺ ⊤2. The explicit negation operator ¬ is given by ¬⊥ = ⊥, ¬t1 = f1, ¬f1 = t1, ¬⊤1 = ⊤1, ¬t2 = f2, ¬f2 = t2, ¬⊤2 = ⊤2. The program consisting of the single fact a : t2 has model M:

{a : ⊥, a : t1, a : f1, a : ⊤1, a : t2} ∪ not {a : ⊥, a : t1, a : f1, a : ⊤1, a : t2, a : f2, a : ⊤2}

in which both a : t2 and not(a : t2) are present. The presence of both a : t2 and not(a : t2) in the model M requires some explanation. Consider the atom a : ⊤1. Since M ⊨ a : t2, by Definition 3 M ⊨ a : ⊤1. But ¬(a : ⊤1) = a : ⊤1, so that M ⊨ ¬(a : ⊤1). By coherence, we should then have not(a : ⊤1); and since any interpretation satisfying a : t2 also satisfies a : ⊤1 (Definition 3), this forces not(a : t2) as well, accounting for the paraconsistency.
This approach to coherence may be termed strong, in that it dictates that if a literal is false to some degree then it must be false for all higher degrees. In some cases strong coherence may be desirable, but not in others. We are currently working on a spectrum of annotated semantics where the intended degree of coherence can be tuned, e.g. by not requiring propagation of falsity to all higher degrees. In particular, one may want to allow undefinedness at a higher truth value not to be overridden by falsity at a weaker truth value. Thus, the introduction of coherence into annotated programs raises some non-trivial issues in the propagation of paraconsistency.
4 Embeddings
We next show how down coherent well-founded semantics extends several wellknown semantics of logic and annotated programs. We assume the reader is acquainted with the syntax and definitions of the following semantics. An embedding of the well-founded annotated semantics with a complete truth-value lattice having an explicit negation operator is given below. The rationale is to put two copies of the truth-value lattice side by side, merging their two bottom elements, and putting a new top element over both sub-lattices. The negation operator maps an element onto its corresponding element at the other lattice copy, and so provides the desired symmetry along the vertical axis. Proposition 4 (Well-founded annotated semantics). Let P be a normal annotated logic program over the complete lattice (T , 4). Construct the new lattice (2T , 42 ) as follows. Let 2T = {(⊥, ⊥)} ∪ {(f , µ) | µ ∈ T − {⊥}} ∪ {(t, µ) | µ ∈ T − {⊥}} ∪ {(>, >)} and the ordering on 2T and the explicit negation operator ¬ be defined by: – For every (µ, ϑ) ∈ 2T we have (⊥, ⊥) 42 (µ, ϑ), and ¬(⊥, ⊥) = (⊥, ⊥); – For every (f , µ), (f , ϑ) ∈ 2T we have (f , µ) 42 (f , ϑ) iff µ 4 ϑ. Furthermore, ¬(f , µ) = (t, µ); – For every (t, µ), (t, ϑ) ∈ 2T we have (t, µ) 42 (t, ϑ) iff µ 4 ϑ. Furthermore, ¬(t, µ) = (f , µ); – For every (µ, ϑ) ∈ 2T we have (µ, ϑ) 42 (>, >), and ¬(>, >) = (>, >). Construct program P 2 over the new lattice, from P , by substituting every occurrence of ⊥ by (⊥, ⊥) and every other literal µ by (t, µ). Then a literal is derived from program P over lattice T under the well-founded annotated semantics iff its corresponding literal – substituting its annotation either to (⊥, ⊥) or (t, µ) – is derived from program P 2 over lattice 2T under the down coherent well-founded semantics. The technique of Prop. 4 should now be clear. Note that literals annotated with (>, >) or (f , µ) never appear in P 2 , and therefore objective literals annotated with those truth-values are never derived from P 2 . 
Thus the extra default
274
C.V. Dam´ asio, L.M. Pereira, and T. Swift
annotated literals introduced in the down semi-normal program are never false. It is this fact that ensures that the fixpoint of ΓPT zTP over the lattice 2T coincides with that of ΓPT ΓPT over T , and so guarantees the validity of the embedding. The embedding requires the smallest addition of new truth values to the original lattice so that the embedding into down coherent well-founded semantics is valid. This is important since it is desirable to keep the lattice as simple and as close as possible to the original one. Obviously, the lattice L2 of Ex. 1 provides an embedding of Well-founded Semantics into the Well-founded Annotated Semantics. By resorting to the above result and letting T = L2 we obtain an embedding of well-founded semantics into down coherent well-founded semantics. Notice that 2L2 = {(⊥, ⊥), (f , >), (t, >), (>, >)} Accordingly, the ordering relation is: (⊥, ⊥) 42 (f , >)
(⊥, ⊥) 42 (t, >)
(f , >) 42 (>, >)
(t, >) 42 (>, >)
and the explicit negation operator behaves as follows: ¬(⊥, ⊥) = (⊥, ⊥) ¬(f , >) = (t, >) ¬(t, >) = (f , >) ¬(>, >) = (>, >) The reader may verify that (2L2 , 42 ) is isomorphic to Belnap’s logic FOUR. This justifies the following corollary: Corollary 1 (Well-founded semantics). Let P be a normal logic program, and P W F S the following normal annotated logic program over the lattice FOUR with the usual explicit negation operator: P W F S = {A0 : t ← A1 : t& . . . , An : t¬ (B1 : t)& . . . ¬ (Am : t) such that A0 ← A1 , . . . , An , not B1 , . . . , not Bm belongs to P } Then A, respectively not A, belongs to the well-founded model of P iff A : t, respectively not (A : t), belongs to the down coherent well-founded model of P WFS. The transformation guarantees that all semi-normalization literals introduced in the semi-normal program transformation are of the form not(A : f ). All rules in P W F S have a head annotated with t, therefore neither > nor f are derivable, and therefore not(A : f ) is always true. Thus, the Γ F OU R ΓsF OU R alternating fixpoint construction coincides with the original Γ Γ construction of [8]. More importantly, the same construction can be used to extend an arbitrary lattice with an explicit negation operator, where it is possible to categorically state the truth or falsity of literals and have coherence enforced. A similar effect can be obtained with the lattice operator |, where T |T is the lattice 2T without elements (f , >) and (t, >). This corresponds to merging the two top elements of the two lattice instances, as explained before. The lattice T |T is normally used when there is no need to distinguish between the veracity and falsity of the top
Coherent Well-founded Annotated Logic Programs
275
element of T; in most situations this corresponds to interpreting ⊤ in T already as "contradiction".
An application of the previous techniques and results gives a natural embedding of the Paraconsistent Extended Well-founded Semantics [3,4] into Down Coherent Well-Founded Annotated Programs. The Paraconsistent Extended Well-founded Semantics (denoted by WFMp) is obtained from the Well-founded Annotated Semantics by using the complete set of truth values of the lattice FOUR.
Proposition 5 (Paraconsistent extended well-founded semantics). [3] Consider the extended logic program P. Let P¬ be the normal annotated logic program over FOUR obtained from P by substituting every occurrence of ¬A by A : f, and of A by A : t, and let M be its coherent well-founded model. Then
– A belongs to WFMp(P) iff A : t belongs to M;
– ¬A belongs to WFMp(P) iff A : f belongs to M;
– not A belongs to WFMp(P) iff not (A : t) belongs to M;
– not ¬A belongs to WFMp(P) iff not (A : f) belongs to M.
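The syntactic transformation behind Proposition 5 is purely mechanical and can be sketched directly. The Python fragment below is an illustrative sketch of ours, not code from the paper; the string encoding of literals (a leading "~" for explicit negation, a "not " prefix for default negation) and the function names are our own assumptions.

```python
def annotate(literal):
    """Map an objective literal of an extended program to an annotated atom."""
    if literal.startswith("~"):           # explicitly negated atom ~A  ->  A : f
        return (literal[1:], "f")
    return (literal, "t")                 # plain atom A  ->  A : t

def transform_rule(head, body):
    """Map one extended rule to its annotated counterpart over FOUR.

    Default-negated body literals stay default-negated; only their
    objective part gets annotated.
    """
    new_body = []
    for lit in body:
        if lit.startswith("not "):
            new_body.append(("not", annotate(lit[4:])))
        else:
            new_body.append(("pos", annotate(lit)))
    return (annotate(head), new_body)
```

For instance, the extended rule ¬p ← q, not ¬r becomes the annotated rule p : f ← q : t, not (r : f) under this encoding.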
If programs contain only literals annotated with ⊥ or t, we obtain the well-founded semantics (as expected from Prop. 4). Moreover, if programs contain no default negated literals, then the objective literals labeled with t in the down coherent well-founded model coincide with the ones true in the minimal Herbrand model of the corresponding definite program. We have thus shown how to move from generalized annotated logic programs or from extended logic programs to their natural paraconsistent and coherent well-founded annotated semantics.
5 Conclusions
The theorems in the previous section show that coherent well-founded annotated programs incorporate both extended and annotated logic programs. The practical importance of such a combination has been indicated by several of the examples. Example 3 uses annotations to formulate probabilistic data mining rules; Example 4 uses explicit negation to represent a contradiction in information arising from different sources; and Example 5 uses default negation as an instance of default reasoning based on probabilistic rules. Thus, annotations, explicit negation, and default negation are all required together for this deductive database example. The generality and practical applicability of coherent well-founded annotated programs with strong coherency as so far described indicates that their efficient implementation is a worthwhile task; the simplicity of their fixpoint definition suggests that they can be implemented by extending a system, such as XSB, that computes the well-founded semantics, a task which is now underway.
276
C.V. Damásio, L.M. Pereira, and T. Swift
Acknowledgements
We thank PRAXIS XXI project MENTAL (Mental Agents Architecture in Logic) and FLAD-NSF project REAP for their support. This work was also partially supported by NSF grants CCR-9702581, EIA-97-5998, and INT-96-00598. We also thank José Alferes for his helpful comments.
References
1. S. Adali and V. S. Subrahmanian. Amalgamating knowledge bases, III: Algorithms, data structures, and query processing. J. of Logic Programming, 28(1):45–88, 1996.
2. J. J. Alferes and L. M. Pereira. Reasoning with Logic Programming. LNAI volume 1111, Springer-Verlag, 1996.
3. Carlos Viegas Damásio. Paraconsistent Extended Logic Programming with Constraints. PhD thesis, Universidade Nova de Lisboa, October 1996.
4. Carlos Viegas Damásio and Luís Moniz Pereira. A survey of paraconsistent semantics for logic programs. In D. Gabbay and P. Smets, editors, Handbook of Defeasible Reasoning and Uncertainty Management Systems, volume 2, pages 241–320. Kluwer, 1998.
5. A. Dekhtyar and V. S. Subrahmanian. Hybrid probabilistic programs. In International Conference on Logic Programming 1997, pages 391–495, 1997.
6. M. Van Emden and R. Kowalski. The semantics of predicate logic as a programming language. Journal of the ACM, 23(4):733–742, 1976.
7. J. Freire, P. Rao, K. Sagonas, T. Swift, and D. S. Warren. XSB: A system for efficiently computing the well-founded semantics. In Fourth LPNMR, pages 430–440, 1997.
8. A. Van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. Journal of the ACM, 38(3):620–650, 1991.
9. M. Gelfond and V. Lifschitz. Logic programs with classical negation. In Warren and Szeredi, editors, 7th International Conference on Logic Programming, pages 579–597. MIT Press, 1990.
10. M. Kifer and E. Lozinskii. A logic for reasoning with inconsistency. J. of Automated Reasoning, 8:179–215, 1992.
11. M. Kifer and V. S. Subrahmanian. Theory of generalized annotated logic programming and its applications. J. of Logic Programming, 12:335–367, 1992.
12. C. Sakama and K. Inoue. Paraconsistent stable semantics for extended disjunctive programs. J. of Logic and Computation, 5(3):265–285, 1995.
13. V. S. Subrahmanian. Amalgamating knowledge bases. ACM Transactions on Database Systems, 19(2):291–331, 1994.
Many-Valued Disjunctive Logic Programs with Probabilistic Semantics
Thomas Lukasiewicz
Institut für Informationssysteme, Technische Universität Wien
Treitlstraße 3, A-1040 Wien, Austria
[email protected]
Abstract. We present many-valued disjunctive logic programs in which classical disjunctive logic program clauses are extended by a truth value that respects the material implication. Interestingly, these many-valued disjunctive logic programs have both a probabilistic semantics in probabilities over possible worlds and a truth-functional semantics. We then define minimal, perfect, and stable models and show that they have the same properties as their classical counterparts. In particular, perfect and stable models are always minimal models. Under local stratification, the perfect model semantics coincides with the stable model semantics. Finally, we show that some special cases of propositional many-valued disjunctive logic programming under minimal, perfect, and stable model semantics have the same complexity as their classical counterparts.
1 Introduction
In the logic programming framework, there exist at least two main streams in handling uncertain knowledge. Many-valued and probabilistic logic programming aims to handle numerical uncertainty, whereas disjunctive logic programming deals with disjunctive knowledge and nonmonotonic negation. In this paper, we propose a combination of both of them in a uniform framework. This paper relies on probability theory as a commonly accepted formalism for handling numerical uncertainty. Probabilistic propositional logics and related languages are thoroughly studied in the literature (see especially [26] and [7]). Their extensions to probabilistic first-order logics can be classified into first-order logics in which probabilities are defined over a set of possible worlds and those in which probabilities are given over the domain (see especially [2] and [9]). The former are suitable for representing degrees of belief, while the latter are appropriate for describing statistical knowledge. In the present paper, we assume that probabilities are defined over a set of possible worlds. Probabilistic reasoning in its full generality is quite a tricky task and very different from classical reasoning (see especially [19], [15], and [14]). It should generally be performed by global linear programming methods, rather than by local inference techniques. For this reason, it is generally also computationally more complex than classical reasoning. M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 277–289, 1999. © Springer-Verlag Berlin Heidelberg 1999
In particular, the model and fixpoint characterization and the proof theory of classical definite logic programming generally do not carry over to probabilistic definite logic programming (as presented in [14]). Moreover, the tractability of special cases of classical logic programming generally does not carry over to the corresponding special cases of probabilistic logic programming. However, we would like an approach to many-valued disjunctive logic programming that does not ignore the years of work in classical disjunctive logic programming. Furthermore, it would be nice if query processing in many-valued disjunctive logic programs were not computationally more complex than query processing in classical disjunctive logic programs. The key to achieving all this is to augment the axioms of probability with an axiom that brings probabilistic logics closer to truth-functional logics [17]. In detail, our many-valued disjunctive logic programs have a probabilistic semantics in probabilities over possible worlds. Furthermore, the truth values of all clauses are truth-functionally defined on the truth values of atoms. We showed in [17] and [18] that many-valued definite logic programming with this probabilistic semantics has a model and fixpoint characterization and a proof theory similar to classical definite logic programming. Moreover, special cases of many-valued logic programming with this semantics were shown to have the same computational complexity as their classical counterparts. Many-valued definite logic programming with this probabilistic semantics has an important companion in the literature. More precisely, van Emden's quantitative deduction [31] can be given a probabilistic semantics by probabilities over possible worlds under the additional axiom. However, van Emden's quantitative deduction is based on a conditional probability semantics of the implication connective, while [17], [18], and the present paper use the material implication semantics.
Interestingly, it turns out that the material implication is much closer to classical logic programming. In particular, the material implication is more suitable for additionally handling disjunction and nonmonotonic negation. It is also important to point out that both many-valued definite logic programming with probabilistic semantics and van Emden's quantitative deduction are approximations of probabilistic logic programming. More precisely, our approach is an approximation of probabilistic logic programming under the material implication [18], while van Emden's quantitative deduction can be understood as an approximation of probabilistic logic programming under the conditional probability implication (as defined in [14]). The literature contains many other approaches to many-valued logic programming (see, for example, [11], [31], [3], [8], and [20]) and probabilistic logic programming (see, for example, [23], [27], [24], [25], [4], [14], and [22]). To our knowledge, this paper is the first to integrate numerical uncertainty in the form of probabilities over possible worlds, disjunction, and nonmonotonic negation in a uniform framework close to classical disjunctive logic programming. The work closest in spirit to this paper is perhaps the one by Mateis [20]. It also combines numerical uncertainty, disjunctive knowledge, and nonmonotonic negation. Its uncertainty formalism, however, is based on t-norms and not on probabilities over possible worlds. Ngo [25] also combines numerical uncertainty and disjunction. However, he does not consider nonmonotonic negation. Moreover, he does not allow numerical uncertainty on the rule level. Furthermore, his approach is closer to Bayesian networks than to classical disjunctive logic programming. Finally, Ng and Subrahmanian [23] also deal with the combination of numerical uncertainty, disjunctive knowledge, and nonmonotonic negation. However, they also do not allow numerical uncertainty on the rule level. Moreover, their work is perhaps best described as logic programming under nonmonotonic negation about probabilistic disjunctions and conjunctions of atoms. The main contributions of this paper can be summarized as follows:
• We present many-valued disjunctive logic programs in which classical disjunctive logic program clauses are extended by a truth value that respects the material implication. These programs have both a probabilistic semantics in probabilities over possible worlds and a truth-functional semantics.
• We define minimal, perfect, and stable models and show that they have the same properties as their classical counterparts. In particular, perfect and stable models are always minimal. Furthermore, under local stratification, the perfect model semantics coincides with the stable model semantics.
• We show that the problems of deciding whether a ground program has a minimal, perfect, or stable model have the same complexity as their classical counterparts. Moreover, we show that some special cases of propositional query processing under minimal, perfect, and stable model semantics have the same complexity as their classical counterparts.
The rest of this paper is organized as follows. In Section 2, we describe the technical background in probabilistic first-order logics over possible worlds.
Sections 3 and 4 introduce many-valued disjunctive logic programs. In Section 5, we focus on their minimal, perfect, and stable models. Section 6 concentrates on the complexity of many-valued disjunctive logic programming. In Section 7, we summarize the main results and give an outlook on future research. Note that all proofs are given in full detail in [16].
2 Technical Preliminaries
In this section, we focus on the technical background. We briefly describe first-order logics of probability and their semantics in Pr- and Pr?-interpretations.

2.1 Pr-Interpretations
We now briefly summarize how (a quantifier-free fragment of) classical first-order logics can be given a probabilistic semantics in which probabilities are defined
over a set of possible worlds. We basically follow the work by Halpern [9], which we adapt to our needs in the logic programming framework. Let Φ be a first-order vocabulary that contains a set of function symbols and a set of predicate symbols (as usual, constant symbols are function symbols of arity zero). Let X be a set of variables. We define terms by induction as follows. A term is a variable from X or an expression f (t1 , . . . , tk ), where f is a function symbol of arity k ≥ 0 from Φ and t1 , . . . , tk are terms. We define classical formulas by induction as follows. If p is a predicate symbol of arity k ≥ 0 from Φ and t1 , . . . , tk are terms, then p(t1 , . . . , tk ) is a classical formula (called atom). If F and G are classical formulas, then ¬F and (F ∧ G) are classical formulas. A probabilistic formula is an expression prob(F ) ≥ c, where F is a classical formula and c is a real number from [0, 1]. We abbreviate (F ∨ G) and (F ← G) by ¬(¬F ∧ ¬G) and ¬(¬F ∧ G), respectively. We adopt the usual conventions to eliminate parentheses in combination with these abbreviations. Literals, positive literals, and negative literals are defined as usual. Terms, classical formulas, and probabilistic formulas are ground iff they do not contain any variables. The notions of substitutions, ground substitutions, and ground instances of classical formulas are defined as usual. The latter is assumed to be canonically extended to probabilistic formulas. An interpretation I is a subset of the Herbrand base HB Φ over Φ. A variable assignment σ is a mapping that assigns to each variable from X an element from the Herbrand universe HU Φ over Φ. It is by induction extended to terms by σ(f (t1 , . . . , tk )) = f (σ(t1 ), . . . , σ(tk )) for all terms f (t1 , . . . , tk ). The truth of classical formulas F in I under σ, denoted I |=σ F , is inductively defined as follows (we write I |= F if F is ground): • I |=σ p(t1 , . . . , tk ) iff p(σ(t1 ), . . . , σ(tk )) ∈ I. 
• I |=σ ¬F iff not I |=σ F, and I |=σ (F ∧ G) iff I |=σ F and I |=σ G.
A probabilistic interpretation (Pr-interpretation) p = (I, µ) consists of a set I of classical interpretations (called possible worlds) and a discrete probability function µ on I (that is, a mapping µ from I to the real interval [0, 1] such that all µ(I) with I ∈ I sum up to 1 and that the number of all I ∈ I with µ(I) > 0 is countable). The truth value pσ(F) of a formula F in the Pr-interpretation p under a variable assignment σ is defined by (we write p(F) if F is ground):

pσ(F) = Σ_{I ∈ I, I |=σ F} µ(I) .  (1)
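The definition of pσ(F) as the probability mass of the satisfying worlds can be prototyped in a few lines. The sketch below is an illustrative encoding of ours (worlds as frozensets of atom names, formulas as nested tuples), not the paper's notation.

```python
def satisfies(world, formula):
    """Classical satisfaction of a ground formula over atoms, ¬, and ∧."""
    op = formula[0]
    if op == "atom":
        return formula[1] in world
    if op == "not":
        return not satisfies(world, formula[1])
    if op == "and":
        return satisfies(world, formula[1]) and satisfies(world, formula[2])
    raise ValueError(op)

def truth_value(worlds, mu, formula):
    """Eq. (1): p(F) is the sum of mu(I) over the worlds I satisfying F."""
    return sum(mu[w] for w in worlds if satisfies(w, formula))
```

For example, with worlds {a}, {a, b}, and {} weighted 0.5, 0.3, and 0.2, the atom a gets truth value 0.8 and the conjunction a ∧ b gets 0.3.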
A probabilistic formula prob(F ) ≥ c is true in p under σ iff pσ (F ) ≥ c. The formula prob(F ) ≥ c is true in p, or p is a model of prob(F ) ≥ c, denoted p |= prob(F ) ≥ c, iff prob(F ) ≥ c is true in p under all variable assignments σ. The Pr-interpretation p is a model of a set of probabilistic formulas P, denoted p |= P, iff p is a model of all probabilistic formulas in P. The set of probabilistic formulas P is satisfiable iff a model of P exists. The formula prob(F ) ≥ c is a tight logical consequence of P, denoted P |=tight prob(F ) ≥ c, iff c is the infimum of pσ (F ) subject to all models p of P and all variable assignments σ.
For Pr-interpretations p = (I, µ) with µ(I) = 1 for some possible world I ∈ I, we use classical(p) to denote this I. For a set of probabilistic formulas P, we use classical(P) to denote the set of all F with prob(F) ≥ 1 ∈ P.

2.2 Pr?-Interpretations
We now define Pr?-interpretations by restricting Pr-interpretations (that is, by assuming another axiom besides the axioms of probability): A Pr?-interpretation is a Pr-interpretation p with:

p(A ∧ B) = min(p(A), p(B)) for all A, B ∈ HB Φ .  (2)
Note that the condition p(A ∧ B) = min(p(A), p(B)) is just assumed for ground atoms A and B. This condition brings probabilistic logics over possible worlds closer to truth-functional logics. It is important to point out that we do not assume that (2) always holds in the part of the real world that we want to model. The axiom (2) is simply a technical assumption that carries us to a form of many-valued logic programming that approximates probabilistic logic programming (see Section 5.1). It makes a global probabilistic semantics over possible worlds match the truth-functionality that stands behind logic programming techniques. Interestingly, the axiom (2) is equivalent to the assumption of a subset relationship between possible worlds as follows.
Theorem 1. Let p = (I, µ) be a Pr-interpretation. Let I+ = {I ∈ I | µ(I) > 0} and for all ground atoms A let I+(A) = {I ∈ I+ | I |= A}. Then the condition (2) is equivalent to each of the following conditions (3) and (4):

I+(A) ⊆ I+(B) or I+(A) ⊇ I+(B) for all A, B ∈ HB Φ  (3)
I1 ⊆ I2 or I1 ⊇ I2 for all I1, I2 ∈ I+ .  (4)
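Condition (4) can be checked numerically: when the positive-probability worlds form a chain under set inclusion, the min axiom (2) falls out of the sum in Eq. (1). The model below is an illustrative sketch of ours, not an example from the paper.

```python
def p_atom(worlds, mu, atom):
    """Probability of a ground atom: mass of the worlds containing it."""
    return sum(mu[w] for w in worlds if atom in w)

def p_conj(worlds, mu, a, b):
    """Probability of the conjunction of two ground atoms."""
    return sum(mu[w] for w in worlds if a in w and b in w)

# A chain of worlds as in condition (4): {} ⊆ {a} ⊆ {a, b}
worlds = [frozenset(), frozenset({"a"}), frozenset({"a", "b"})]
mu = {worlds[0]: 0.2, worlds[1]: 0.5, worlds[2]: 0.3}

# The min axiom (2) holds for every pair of atoms in this chain model
for x in ("a", "b"):
    for y in ("a", "b"):
        assert abs(p_conj(worlds, mu, x, y)
                   - min(p_atom(worlds, mu, x), p_atom(worlds, mu, y))) < 1e-9
```

Here p(a) = 0.8 and p(b) = 0.3, so p(a ∧ b) = 0.3 = min(p(a), p(b)), as Theorem 1 predicts for nested worlds.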
The next theorem shows that the truth value of certain ground formulas under Pr?-interpretations is truth-functionally defined on the truth values of their components. In particular, the truth value of all ground classical clauses is truth-functionally defined on the truth values of their ground atoms. Note that the truth functions are the same as in the nondenumerable infinite-valued Lukasiewicz logic Lℵ1 (see [30] for a survey).
Theorem 2. For all Pr?-interpretations p = (I, µ), all ground classical formulas F, and all ground classical formulas G and H that are built without the logical connectives ¬ and ←:

p(¬F) = 1 − p(F)  (5)
p(G ∧ H) = min(p(G), p(H))  (6)
p(G ∨ H) = max(p(G), p(H))  (7)
p(G ← H) = min(1, p(G) − p(H) + 1) .  (8)
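The truth functions (5)-(8) transcribe directly into code. The sketch below (function names are ours) simply restates Theorem 2:

```python
def t_not(f):
    """(5): truth of ¬F."""
    return 1 - f

def t_and(g, h):
    """(6): truth of G ∧ H."""
    return min(g, h)

def t_or(g, h):
    """(7): truth of G ∨ H."""
    return max(g, h)

def t_impl(g, h):
    """(8): truth of the material implication G <- H."""
    return min(1, g - h + 1)
```

Note that (8) is the Lukasiewicz implication: it is fully true whenever the head is at least as true as the body, and degrades linearly otherwise, e.g. t_impl(0.5, 0.8) is 0.7.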
The following theorem shows that Pr?-interpretations give a natural probabilistic semantics to van Emden's quantitative deduction [31] in which the implication connective is interpreted as conditional probability (note that this result implies that van Emden's quantitative deduction is an approximation of probabilistic logic programming under the conditional probability implication).
Theorem 3. For all Pr?-interpretations p, all real numbers c ∈ [0, 1], and all ground atoms H, B1, . . . , Bk with k ≥ 0: p(H) ≥ c · min(p(B1), . . . , p(Bk)) iff p(B1 ∧ · · · ∧ Bk) = 0 or p(H | B1 ∧ · · · ∧ Bk) ≥ c.
Note that for p(B1 ∧ · · · ∧ Bk) > 0, the expression p(H | B1 ∧ · · · ∧ Bk) is defined as p(H ∧ B1 ∧ · · · ∧ Bk) / p(B1 ∧ · · · ∧ Bk). Note also that for k = 0, we naturally define both min(p(B1), . . . , p(Bk)) and p(B1 ∧ · · · ∧ Bk) as 1. Finally, we show that Pr?-interpretations are already uniquely determined by the truth values they give to all ground atoms:
Theorem 4. Let p = (I, µ) be a Pr?-interpretation with µ(I) > 0 for all I ∈ I. Then p is uniquely determined by all pairs (A, p(A)) with A ∈ HB Φ.
Hence, Pr?-interpretations can be identified with mappings from HB Φ to the real interval [0, 1]. Since such mappings can also be viewed as fuzzy sets, we get the following natural subset relation on Pr?-interpretations. For Pr?-interpretations p and q, we say p is a subset of q, denoted p ⊆ q, iff p(A) ≤ q(A) for all A ∈ HB Φ. We use p ⊂ q as an abbreviation for p ⊆ q and p ≠ q. For sets of probabilistic formulas P and probabilistic formulas prob(F) ≥ c, we write P |=?tight prob(F) ≥ c iff c is the infimum of pσ(F) subject to all Pr?-interpretations p that are models of P and all variable assignments σ.
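Theorem 3 admits a direct numeric check for non-empty bodies; the sketch below is illustrative and ours, not code from the paper. Under the min axiom, the probability of a conjunction of ground atoms is the minimum of their probabilities, so both sides of the equivalence reduce to arithmetic on atom probabilities:

```python
def lhs_holds(pH, body_probs, c):
    # p(H) >= c * min(p(B1), ..., p(Bk))
    return pH >= c * min(body_probs)

def rhs_holds(pH, body_probs, c):
    # By axiom (2), p(body) = min over the body atoms, and
    # p(H & body) = min(p(H), p(body)); vacuously true if p(body) = 0.
    p_body = min(body_probs)
    if p_body == 0:
        return True
    return min(pH, p_body) / p_body >= c

# Exhaustive check on a small grid of values
for pH in (0, 0.25, 0.5, 0.75, 1):
    for body in ((0.0, 0.5), (0.4, 0.9), (1.0,)):
        for c in (0, 0.3, 0.7, 1):
            assert lhs_holds(pH, body, c) == rhs_holds(pH, body, c)
```

The grid is only a sanity check, of course; the full proof is in [16].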
3 Many-Valued Disjunctive Logic Programs
We are now ready to define many-valued disjunctive logic programs. We start by defining many-valued disjunctive logic program clauses, which are special probabilistic formulas that are interpreted under Pr? -interpretations: A many-valued disjunctive logic program clause (or simply clause) is a probabilistic formula of the following kind: prob(A1 ∨ · · · ∨ Al ← B1 ∧ · · · ∧ Bm ∧ ¬C1 ∧ · · · ∧ ¬Cn ) ≥ c , where A1 , . . . , Al , B1 , . . . , Bm , C1 , . . . , Cn are atoms, l, m, n ≥ 0, and c ∈ [0, 1] is rational. It is abbreviated by (A1 ∨ · · · ∨ Al ← B1 , . . . , Bm , ¬C1 , . . . , ¬Cn )[c, 1]. We call A1 ∨ · · · ∨ Al its head, B1 , . . . , Bm , ¬C1 , . . . , ¬Cn its body, and c its truth value. Such a clause is called an integrity clause iff l = 0, a fact iff l > 0 and m + n = 0, and a rule iff l > 0 and m + n > 0. A many-valued disjunctive logic program (or simply program) is a finite set of clauses.
Given a program P, we identify Φ with the vocabulary Φ(P) that consists of all the function and predicate symbols in P. We use HB P to denote the Herbrand base over Φ(P). We use ground(P) to denote the set of all ground instances of clauses from P with respect to Φ(P). Given a program P, we do not need all the real numbers in [0, 1] to characterize the semantics of P. More precisely, the least set of equidistant rational numbers from [0, 1] that contains 0, 1, and all the rational numbers occurring in P is sufficient (see Theorem 7). Hence, we define the set of truth values of P, denoted TV(P), as the least set of rational numbers {0/(n−1), 1/(n−1), . . . , (n−1)/(n−1)}, where n ≥ 2 is a natural number, that contains all the rational numbers occurring in P. The program P is n-valued iff |TV(P)| = n. Crucially, the truth value of all ground clauses under Pr?-interpretations is truth-functionally defined on the truth values of their ground atoms:
Theorem 5. A ground clause (A1 ∨ · · · ∨ Al ← B1, . . . , Bm, ¬C1, . . . , ¬Cn)[c, 1] is true in a Pr?-interpretation p iff the following condition holds:
max(p(A1), . . . , p(Al), p(C1), . . . , p(Cn)) ≥ c − 1 + min(p(B1), . . . , p(Bm)) .
Note that the maximum and the minimum of an empty list of arguments are canonically defined as 0 and 1, respectively. We finally define queries and their correct and tight answers:
A many-valued query (or simply query) is an expression ∃(F)[t, 1], where F is a ground classical formula and t is a variable or a rational number from [0, 1]. We call the query ∃(G)[t, 1] a positive query and the query ∃(¬G)[t, 1] a negative query if G is built without the logical connectives ¬ and ←. Given the queries ∃(F)[c, 1] and ∃(F)[x, 1] to a program P, where c ∈ [0, 1] and x ∈ X, we define their desired semantics in terms of correct and tight answers with respect to a set M(P) of models of P as follows.
The correct answer for ∃(F )[c, 1] to P under M(P ) is Yes if c ≤ inf{p(F ) | p ∈ M(P )} and No otherwise. The tight answer for ∃(F )[x, 1] to P under M(P ) is the substitution θ = {x/d}, where d = inf{p(F ) | p ∈ M(P )}. Many-valued query processing generalizes the classical cautious inference: Theorem 6. Let P be a 2-valued program and let M(P ) be a set of models (I, µ) of P with µ(I) ⊆ {0, 1}. The correct answer for the query ∃(F )[1, 1] to P under M(P ) is Yes iff F is true in all models from classical(M(P )).
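The truth condition of Theorem 5 is easy to evaluate mechanically. The sketch below is an illustrative encoding of ours: a Pr?-interpretation is given as a mapping from ground atoms to [0, 1], and a ground clause as its lists of head atoms, positive body atoms, negated body atoms, and its truth value c.

```python
def clause_true(p, heads, pos_body, neg_body, c):
    """Theorem 5: (A1 v ... v Al <- B1, ..., Bm, ~C1, ..., ~Cn)[c, 1] in p.

    Empty max defaults to 0 and empty min to 1, as in the paper.
    """
    lhs = max([p[a] for a in heads] + [p[a] for a in neg_body] + [0])
    rhs = c - 1 + min([p[a] for a in pos_body] + [1])
    return lhs >= rhs
```

For instance, with p(a) = 0.6 and p(b) = 0.8, the clause (a ← b)[0.7, 1] is true (0.6 ≥ 0.7 − 1 + 0.8) but (a ← b)[0.9, 1] is not.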
4 Example
Assume that we have the following knowledge about roads and the reachability of places through roads: The probability that the road r is closed or that the road s is closed is greater than 0.5. The probability that r connects the place a with the place b is greater than 0.8. The probability that s connects b with c is greater than 0.7. The probability that we can reach Y through X if there is a
road from X to Y that is not closed is greater than 0.9. The probability that we can reach Z through X if we can reach Z through Y and Y through X is greater than 0.9. This knowledge can be expressed by the following program P (r, s, a, b, and c are constant symbols and R, X, Y, and Z are variables):
P = {(closed(r) ∨ closed(s) ← )[0.5, 1],
(road(r, a, b) ← )[0.8, 1],
(road(s, b, c) ← )[0.7, 1],
(reach(X, Y) ← road(R, X, Y), ¬closed(R))[0.9, 1],
(reach(X, Z) ← reach(X, Y), reach(Y, Z))[0.9, 1]} .
We may ask for the tight lower bound of the probability that we can reach c through a. This can be expressed by the query ∃(reach(a, c))[U, 1], where U is a variable. To give its tight answer, we must specify a set of models of P. Some models p1, p2, p3, and p4 of P are shown in Table 1 (we assume that pi(A) = 0 for all ground atoms A that are not mentioned). The tight answer for ∃(reach(a, c))[U, 1] to P under {p1, p2, p3, p4} is given by {U/0}, whereas the tight answer for ∃(reach(a, c))[U, 1] to P under {p1, p2} is given by {U/0.5}. Hence, as far as the query ∃(reach(a, c))[U, 1] is concerned, {p1, p2} seems to describe the intended meaning of P better than {p1, p2, p3, p4}.

Table 1. Some models of the program P

     closed(r)  closed(s)  road(r,a,b)  road(s,b,c)  reach(a,b)  reach(b,c)  reach(a,c)
p1   0.5        0          0.8          0.7          0.7         0.6         0.5
p2   0          0.5        0.8          0.7          0.7         0.6         0.5
p3   0          0.6        0.8          0.7          0.7         0           0
p4   0          0.7        0.8          0.7          0           0           0
The models p1 , p2 , p3 , and p4 are some minimal models of P (with respect to the subset relationship defined in Section 2.2), whereas the models p1 and p2 are the only perfect and stable models of the locally stratified program P . We will introduce these notions in the following section.
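The tight-answer computation above can be sketched in a few lines. Only the truth values of reach(a, c) in the models of Table 1 matter for this query; the dictionaries below restate just that column and are an illustrative encoding of ours.

```python
p1 = {"reach(a,c)": 0.5}
p2 = {"reach(a,c)": 0.5}
p3 = {"reach(a,c)": 0.0}
p4 = {"reach(a,c)": 0.0}

def tight_answer(models, atom):
    """Tight answer {U/d}: d is the infimum of the atom's value over the models."""
    return min(m.get(atom, 0.0) for m in models)
```

Evaluating tight_answer over {p1, p2, p3, p4} gives 0, while over {p1, p2} it gives 0.5, matching the two tight answers stated in the text.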
5 Model Semantics
In this section, we define minimal, perfect, and stable models of many-valued disjunctive logic programs, and we discuss some of their properties.

5.1 Minimal Models
We now define minimal models of many-valued disjunctive logic programs. A model p of a program P is a minimal model of P iff no model of P is a proper subset of p. MM(P ) denotes the set of all minimal models of P .
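Minimality can be prototyped by brute force over a finite truth-value set, which is justified by Theorem 7 below: for minimal models, searching TV(P) suffices. The clause encoding (heads, positive body, negated body, truth value) and all names are illustrative assumptions of ours; clause_true restates the condition of Theorem 5.

```python
from itertools import product

def clause_true(p, heads, pos, neg, c):
    # Truth condition of Theorem 5, with empty max = 0 and empty min = 1
    lhs = max([p[a] for a in heads] + [p[a] for a in neg] + [0])
    return lhs >= c - 1 + min([p[a] for a in pos] + [1])

def minimal_models(atoms, tv, clauses):
    """Enumerate interpretations atoms -> tv, keep models, filter minimal ones."""
    models = []
    for vals in product(tv, repeat=len(atoms)):
        p = dict(zip(atoms, vals))
        if all(clause_true(p, *cl) for cl in clauses):
            models.append(p)

    def subset(p, q):               # p(A) <= q(A) for all A
        return all(p[a] <= q[a] for a in atoms)

    return [m for m in models
            if not any(subset(n, m) and n != m for n in models)]
```

For the single disjunctive fact (a ∨ b)[1, 1] over the truth values {0, 0.5, 1}, the minimal models are exactly the two interpretations that make one disjunct fully true and the other false.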
Crucially, as far as minimal models of a program P are concerned, we can restrict our attention to the finite number of truth values in TV(P):
Theorem 7. All minimal models of a program P map into TV(P).
Given a positive query to a program P, the tight answer under MM(P) describes a tight logical consequence under Pr?-interpretations. Moreover, it approximates a tight logical consequence under Pr-interpretations. That is, inference in Pr?-interpretations is an approximation of inference in Pr-interpretations.
Theorem 8. Let P be a program. a) The tight answer for a positive query ∃(G)[x, 1] to P under MM(P) is given by {x/d}, where d is such that P |=?tight prob(G) ≥ d. b) If the tight answer for a positive query ∃(G)[x, 1] to P under MM(P) is given by {x/d}, then [0, d] contains the unique c with P |=tight prob(G) ≥ c.
Finally, many-valued minimal models generalize classical minimal models:
Theorem 9. Let P be a 2-valued program. The set classical(MM(P)) coincides with the set of all minimal models of classical(P).

5.2 Perfect Models
We now extend the notion of perfect models [28] to many-valued disjunctive logic programs. For this purpose, we must first define a priority relation on ground atoms and a preference relation on Pr?-interpretations. The priority relation on ground atoms is simply defined as in [28]: For a program P, the priority relation ≺ and the auxiliary relation ≼ are the least binary relations on HB P with the following properties. If ground(P) contains a clause with the atom A in the head and the negative literal ¬C in the body, then A ≺ C. If ground(P) contains a clause with the atom A in the head and the positive literal B in the body, then A ≼ B. If ground(P) contains a clause with the atoms A and A′ in the head, then A ≼ A′. If A ≺ B, then A ≼ B. If A ≼ B and B ≼ C, then A ≼ C. If A ≼ B and B ≺ C, then A ≺ C. If A ≺ B and B ≼ C, then A ≺ C. We say that the ground atom B has higher priority than the ground atom A iff A ≺ B. The preference relation on Pr?-interpretations is defined as follows. For Pr?-interpretations p and q, we say p is preferable to q, denoted p << q, iff p ≠ q and for each A ∈ HB P with p(A) > q(A) there is some B ∈ HB P with q(B) > p(B) and A ≺ B. We write p ≤≤ q iff p << q or p = q. We are now ready to define perfect models. A model q of a program P is a perfect model of P iff no model of P is preferable to q. We use PM(P) to denote the set of all perfect models of P. Every many-valued perfect model is a minimal model:
Theorem 10. Every perfect model of a program P is a minimal model of P.
Many-valued perfect models generalize classical perfect models:
Theorem 11. Let P be a 2-valued program. The set classical(PM(P)) coincides with the set of all perfect models of classical(P).

5.3 Perfect Models under Local Stratification
We now concentrate on perfect models of locally stratified programs. Locally stratified classical disjunctive logic programs without integrity clauses always have a perfect model [28]. We now show that the same holds for locally stratified many-valued disjunctive logic programs without integrity clauses. A program P without integrity clauses is locally stratified iff HB P can be partitioned into sets H1 , H2 , . . . (called strata) such that for each clause (A1 ∨ · · · ∨ Al ← B1 , . . . , Bm , ¬C1 , . . . , ¬Cn )[c, 1] ∈ ground (P ) , there exists an i ≥ 1 such that all A1 , . . . , Al belong to Hi , all B1 , . . . , Bm belong to H1 ∪ · · · ∪ Hi , and all C1 , . . . , Cn belong to H1 ∪ · · · ∪ Hi−1 . Given such a partition H1 , H2 , . . . of HB P (which is called a local stratification of P ) with i ≥ 1, we use Pi to denote the set of all clauses from ground (P ) whose heads belong to Hi . Moreover, we define Hi? = H1 ∪ · · · ∪ Hi , Pi? = P1 ∪ · · · ∪ Pi , and h?i = HB P |Hi? , where HB P = {(A, 1) | A ∈ HB P }. Every model of a locally stratified program is subsumed by a perfect model: Theorem 12. For every model q of a locally stratified program P , there exists a perfect model p of P such that p ≤≤ q. The next theorem shows that each perfect model of a locally stratified program has a natural characterization by iterative minimal models. Theorem 13. Let P be a program and let H1 , H2 , . . . be a local stratification of P . The Pr? -interpretation q is a perfect model of P iff 1. the Pr? -interpretation q|H1 is a minimal model of P1 and 2. for all i ≥ 2, the Pr? -interpretation q|Hi? is a minimal element in the set of ? ? all models o ⊆ h?i of Pi with o|Hi−1 = q|Hi−1 . Finally, the following theorem shows that locally stratified programs without disjunction always have a unique perfect model. Theorem 14. Every disjunction-free locally stratified program P has a unique perfect model p such that p ≤≤ q for all models q of P . 5.4
5.4 Stable Models
We now extend the notion of stable models [29] to many-valued disjunctive logic programs. For this purpose, we must slightly generalize clauses as follows. An extended many-valued disjunctive logic program clause (or simply extended clause) is an expression of the following kind: (A1 ∨ · · · ∨ Al ; d ← B1 , . . . , Bm , ¬C1 , . . . , ¬Cn )[c, 1] ,
where A1, . . . , Al, B1, . . . , Bm, C1, . . . , Cn are atoms, l, m, n ≥ 0, c is a rational number from [0, 1], and d is a real number from [0, 1]. It is true in a Pr⋆-interpretation p under a variable assignment σ iff

max(pσ(A1), . . . , pσ(Al), pσ(C1), . . . , pσ(Cn), d) ≥ c − 1 + min(pσ(B1), . . . , pσ(Bm)) .

We next generalize the classical Gelfond-Lifschitz transformation: For a program P and a Pr⋆-interpretation q, the expression P/q denotes the set of extended clauses that is obtained from ground(P) by replacing every clause

(A1 ∨ · · · ∨ Al ← B1, . . . , Bm, ¬C1, . . . , ¬Cn)[c, 1]

by the extended clause

(A1 ∨ · · · ∨ Al ; max(q(C1), . . . , q(Cn)) ← B1, . . . , Bm)[c, 1] .

We are now ready to define stable models as follows. A Pr⋆-interpretation q is a stable model of a program P iff q is a minimal model of P/q. We use SM(P) to denote the set of all stable models of P. Every stable model is also a minimal model:

Theorem 15. Every stable model of a program P is a minimal model of P.

The next theorem shows that for locally stratified programs, the notion of stable models coincides with the notion of perfect models.

Theorem 16. The set of stable models of a locally stratified program P coincides with the set of perfect models of P.

Many-valued stable models generalize classical stable models:

Theorem 17. Let P be a 2-valued program. The set classical(SM(P)) coincides with the set of all stable models of classical(P).
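The transform P/q can be sketched as follows. The tuple encoding of clauses and the convention d = 0 for clauses without negative literals are our illustrative assumptions, not the paper's.

```python
# Ground clauses (A1 v ... v Al <- B1,...,Bm, not C1,...,not Cn)[c,1] are
# encoded as tuples (heads, pos, neg, c); interpretations are dicts from
# ground atoms to truth values in [0, 1].
def gl_transform(clauses, q):
    """Many-valued Gelfond-Lifschitz transform P/q: the negative body is
    replaced by the extra head disjunct d = max(q(C1), ..., q(Cn));
    d = 0 for clauses without negative literals (an assumption here)."""
    return [(heads, max((q[c] for c in neg), default=0.0), pos, c)
            for heads, pos, neg, c in clauses]

def true_in(p, ext_clause):
    """Truth of an extended clause (heads; d <- pos)[c, 1] in p."""
    heads, d, pos, c = ext_clause
    lhs = max([p[a] for a in heads] + [d])
    rhs = c - 1 + min((p[b] for b in pos), default=1.0)
    return lhs >= rhs

# (a <- b, not c)[0.75, 1] under q = {a: 0.8, b: 1.0, c: 0.25}
P = [({"a"}, {"b"}, {"c"}, 0.75)]
q = {"a": 0.8, "b": 1.0, "c": 0.25}
Pq = gl_transform(P, q)
print(Pq)                 # [({'a'}, 0.25, {'b'}, 0.75)]
print(true_in(q, Pq[0]))  # max(0.8, 0.25) >= 0.75 - 1 + 1.0: True
```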
6 Computational Complexity
We now show that some decision problems related to many-valued disjunctive logic programs have the same complexity as their classical counterparts [6]. We first concentrate on the problems of deciding whether a ground program has a minimal, perfect, or stable model.

Theorem 18. a) The problem of deciding whether a ground program P has a minimal model is NP-complete. b) The problem of deciding whether a ground program P has a perfect model is Σ2P-complete. c) The problem of deciding whether a ground program P has a stable model is Σ2P-complete.

We next focus on some decision problems related to propositional query processing under minimal, perfect, and stable model semantics.

Theorem 19. The problem of deciding whether Yes is the correct answer for a ground positive or negative query ∃(F)[c, 1] to a ground program P under M(P) is Π2P-complete for every M(P) among MM(P), PM(P), and SM(P).
7 Summary and Outlook
We presented many-valued disjunctive logic programs with probabilistic semantics, in which classical disjunctive logic program clauses are extended by a truth value that respects the material implication. We showed that they have a natural minimal, perfect, and stable model semantics, which generalize the minimal, perfect, and stable model semantics of classical disjunctive logic programs. We also showed that some decision problems related to ground many-valued disjunctive logic programs under minimal, perfect, and stable model semantics have the same computational complexity as their classical counterparts.

An interesting topic of future research is to explore other semantics of nonmonotonic negation in many-valued disjunctive logic programs. Moreover, it would be very interesting to elaborate a fixpoint semantics and a proof theory for many-valued disjunctive logic programs.
Acknowledgments I am very grateful to Thomas Eiter, Georg Gottlob, Nicola Leone, and Cristinel Mateis for useful discussions. Some of this work was done while I was supported by a DFG grant.
References 1. K. R. Apt. Logic programming. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 10, pages 493–574. MIT Press, 1990. 2. F. Bacchus, A. Grove, J. Y. Halpern, and D. Koller. From statistical knowledge bases to degrees of beliefs. Artif. Intell., 87:75–143, 1996. 3. J. F. Baldwin. Evidential support logic programming. Fuzzy Sets Syst., 24:1–26, 1987. 4. A. Dekhtyar and V. S. Subrahmanian. Hybrid probabilistic programs. In Proc. of the 14th International Conference on Logic Programming, pages 391–405, 1997. 5. T. Eiter and G. Gottlob. Complexity aspects of various semantics for disjunctive databases. In Proc. of the 12th ACM Symposium on Principles of Database Systems, pages 158–167. ACM Press, 1993. 6. T. Eiter and G. Gottlob. On the computational cost of disjunctive logic programming: Propositional case. Ann. Math. Artif. Intell., 15:289–323, 1995. 7. R. Fagin, J. Y. Halpern, and N. Megiddo. A logic for reasoning about probabilities. Inf. Comput., 87:78–128, 1990. 8. M. Fitting. Bilattices and the semantics of logic programming. J. Logic Program., 11(1–2):91–116, 1991. 9. J. Y. Halpern. An analysis of first-order logics of probability. Artif. Intell., 46:311– 350, 1990. 10. M. Kifer and V. S. Subrahmanian. Theory of generalized annotated logic programming and its applications. J. Logic Program., 12(3–4):335–367, 1992. 11. J.-L. Lassez and M. J. Maher. Optimal fixedpoints of logic programs. Theor. Comput. Sci., 39:15–25, 1985. 12. J. W. Lloyd. Foundations of Logic Programming. Springer, Berlin, 2nd ed., 1987.
13. J. Lobo, J. Minker, and A. Rajasekar. Foundations of Disjunctive Logic Programming. MIT Press, Cambridge, MA, 1992. 14. T. Lukasiewicz. Probabilistic logic programming. In Proc. of the 13th Biennial European Conf. on Artificial Intelligence, pages 388–392. J. Wiley & Sons, 1998. 15. T. Lukasiewicz. Local probabilistic deduction from taxonomic and probabilistic knowledge-bases over conjunctive events. Int. J. Approx. Reas., 21(1):23–61, 1999. 16. T. Lukasiewicz. Many-valued disjunctive logic programs with probabilistic semantics. Technical Report 1843-99-09, Institut f¨ ur Informationssysteme, Technische Universit¨ at Wien, 1999. ftp://ftp.kr.tuwien.ac.at/pub/tr/rr9909.ps.gz. 17. T. Lukasiewicz. Many-valued first-order logics with probabilistic semantics. In Proc. of the Annual Conference of the European Association for Computer Science Logic, 1998, volume 1584 of LNCS, pages 415–429. Springer, 1999. 18. T. Lukasiewicz. Probabilistic and truth-functional many-valued logic programming. In Proc. of the 29th IEEE International Symposium on Multiple-Valued Logic, pages 236–241, 1999. 19. T. Lukasiewicz. Probabilistic deduction with conditional constraints over basic events. J. Artif. Intell. Res., 10:199–241, 1999. 20. C. Mateis. A Quantitative Extension of Disjunctive Logic Programming. Doctoral Dissertation, Technische Universit¨ at Wien, 1998. 21. J. Minker. Overview of disjunctive logic programming. Ann. Math. Artif. Intell., 12:1–24, 1994. 22. R. T. Ng. Semantics, consistency, and query processing of empirical deductive databases. IEEE Trans. Knowl. Data Eng., 9(1):32–49, 1997. 23. R. T. Ng and V. S. Subrahmanian. A semantical framework for supporting subjective and conditional probabilities in deductive databases. J. Autom. Reasoning, 10(2):191–235, 1993. 24. R. T. Ng and V. S. Subrahmanian. Stable semantics for probabilistic deductive databases. Inf. Comput., 110:42–83, 1994. 25. L. Ngo. Probabilistic disjunctive logic programming. In Proc. of the 12th Conf. 
on Uncertainty in Artificial Intelligence, pages 397–404. Morgan Kaufmann, 1996. 26. N. J. Nilsson. Probabilistic logic. Artif. Intell., 28:71–88, 1986. 27. D. Poole. Probabilistic Horn abduction and Bayesian networks. Artif. Intell., 64:81–129, 1993. 28. T. C. Przymusinski. On the declarative semantics of stratified deductive databases and logic programs. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 193–216. Morgan Kaufmann, 1988. 29. T. C. Przymusinski. Stable semantics for disjunctive programs. New Generation Comput., 9:401–424, 1991. 30. N. Rescher. Many-valued Logic. McGraw-Hill, New York, 1969. 31. M. H. van Emden. Quantitative deduction and its fixpoint theory. J. Logic Program., 3(1):37–53, 1986.
Extending Disjunctive Logic Programming by T-norms⋆

Cristinel Mateis
Information Systems Department, TU Vienna
A-1040 Vienna, Austria
[email protected]
Abstract. This paper proposes a new knowledge representation language, called QDLP, which extends DLP to deal with uncertain values. A certainty degree interval (a subinterval of [0, 1]) is assigned to each (quantitative) rule. Triangular norms (T-norms) are employed to define calculi for propagating uncertainty information from the premises to the conclusion of a quantitative rule. Negation is considered and the concept of stable model is extended to QDLP. Different T-norms induce different semantics for one given quantitative program. In this sense, QDLP is parameterized and each choice of a T-norm induces a different QDLP language. Each T-norm is eligible for events with determinate relationships (e.g., independence, exclusiveness) between them. Since there are infinitely many T-norms, it turns out that there is a family of infinitely many QDLP languages. This family is carefully studied and the set of QDLP languages which generalize traditional DLP is precisely singled out. Finally, the complexity of the main decisional problems arising in the context of QDLP (i.e., Model Checking, Stable Model Checking, Consistency, and Brave Reasoning) is analyzed. It is shown that the complexity of the relevant fragments of QDLP coincides exactly with the complexity of DLP. That is, reasoning with uncertain values is more general but not harder than reasoning with boolean values.
1 Introduction
Disjunctive logic programs are logic programs where disjunction is allowed in the heads of the rules and negation may occur in the bodies of the rules. Such programs are nowadays widely recognized as a valuable tool for knowledge representation and commonsense reasoning [3,16,22]. An important merit of disjunctive logic programming (DLP) is its capability to model incomplete knowledge [3,22].

DLP has a very high expressive power. In [14] it is proved that, under stable model semantics, disjunctive programs capture the complexity class Σ2P, that is, they allow us to express every property which is decidable in nondeterministic polynomial time with an oracle in NP. Thus, DLP can express real-world situations that cannot be represented by disjunction-free programs.
⋆ Work partially supported by the Austrian Science Fund (FWF) under grants N Z29-INF and P12344-INF.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 290–304, 1999.
© Springer-Verlag Berlin Heidelberg 1999
However, real-life applications often need to deal with uncertain information and quantitative data which cannot be represented in DLP. The usual logical reasoning in terms of the truth values true and false is insufficient for the purposes of several real-life applications. Image databases, sensor data, temporal indeterminacy, and information retrieval are only a few of the domains where uncertainty occurs [20]. Consider for instance a robot which moves and changes direction according to a prefixed route and to the coordinates received from a sensor. Since sensor data may be subject to error and sensors may have different reliability, a formalism able to deal with uncertain information is needed to encode the control mechanism of the robot. (See Section 4 for an example on this subject.)

Many frameworks for multivalued logic programming have been proposed to handle uncertain information. There is a split in the AI community between (i) those who attempt to deal with uncertainty using non-numerical techniques [8,9,12], (ii) those who use numerical representations of uncertainty but, believing that probability calculus is inadequate for the task, invent entirely new calculi, such as Dempster-Shafer calculus [10,17,32] and fuzzy logic [6,7,15,18,19,33,34], and (iii) those who remain within the traditional framework of probability theory, while attempting to equip the theory with the computational facilities needed to perform AI tasks [2,24,25,26,27,29,30]. We propose an approach to define the representation, inference, and control of uncertain information in the framework of DLP which is closely related to the second of the above categories.

The main contributions of the paper are the following.

– We define a new knowledge representation language, called Quantitative Disjunctive Logic Programming (QDLP), extending DLP to deal with uncertain values.
– We define a mechanism of reasoning with uncertainty through rule chaining by using the well-studied and mathematically clean notion of T-norm. In particular, we consider a p-parameterized family of T-norms. Each T-norm is eligible for events with determinate relationships (e.g., independence, exclusiveness) between them. Different T-norms induce different semantics for one given quantitative program. Thus, QDLP is parameterized and each choice of a T-norm induces a different QDLP language. There are infinitely many T-norms, hence there are infinitely many QDLP languages. Importantly, the T-norm may be chosen according to the level of knowledge of the relationships between the atoms (events) of the program.
– We single out precisely the fragments from the QDLP family which are generalizations of DLP. Basically, a fragment QF of QDLP induced by a T-norm T(p), p ∈ [−∞, +∞], is a generalization of DLP iff to each program P from DLP there corresponds a program QP in QF such that the set of all stable models of P is exactly the set of all stable models of QP under the semantics induced by T(p).
– We show that the Quantitative Logic Programming language proposed by van Emden in [34] coincides with the disjunction-free fragment of QDLP induced by the T-norm T3.
– We analyze the complexity of the main decisional problems arising in QDLP. We classify precisely (i.e., by completeness results) the complexity of all relevant fragments of QDLP (i.e., of the QDLP languages which truly generalize DLP) for the T-norm T3. Importantly, the addition of uncertainty does not cause any computational overhead, as the complexity of QDLP is exactly the same as the complexity of DLP. In other words, uncertainty comes for free!

For space limitations, we omit the proofs of the results reported in Sections 6.2 and 7. The proofs of all results, along with further material and details, are reported in the long version of the paper [23], which can be retrieved from the mentioned web address.
2 Preliminaries: Triangular Norms and Conorms
The triangular norms (T-norms) and conorms (T-conorms) form the basis for the various uncertainty calculi discussed in this paper. We will denote a T-norm by T and a T-conorm by S. One of the advantages of these operators is their low computational complexity. The T-norms and T-conorms are functions T, S : [0, 1] × [0, 1] → [0, 1] which satisfy the following properties:

T(a, 0) = T(0, a) = 0            S(1, a) = S(a, 1) = 1                  [boundary]
T(a, 1) = T(1, a) = a            S(0, a) = S(a, 0) = a                  [boundary]
T(a, b) ≤ T(c, d)                S(a, b) ≤ S(c, d)  if a ≤ c, b ≤ d    [monotonicity]
T(a, b) = T(b, a)                S(a, b) = S(b, a)                      [commutativity]
T(a, T(b, c)) = T(T(a, b), c)    S(a, S(b, c)) = S(S(a, b), c)          [associativity]

Intuitively, T(a, b) (resp., S(a, b)) assigns a certainty value to the composition of two events e1 and e2 whose certainty values are a and b. Usually, the composition of e1 and e2 is the conjunction (resp., disjunction) under certain conditions (e.g., independence, mutual exclusiveness). Although defined as two-place functions, the T-norms and T-conorms can be used to represent the composition of a larger number of events. Because of the associativity property, it is possible to define recursively T(x1, . . . , xn, xn+1) and S(x1, . . . , xn, xn+1) for x1, . . . , xn+1 ∈ [0, 1] as:

T(x1, . . . , xn, xn+1) = T(T(x1, . . . , xn), xn+1)
S(x1, . . . , xn, xn+1) = S(S(x1, . . . , xn), xn+1)

Some typical T-norms and T-conorms are the following:
T0(a, b) = min(a, b) if max(a, b) = 1, 0 otherwise
T1(a, b) = max(0, a + b − 1)
T1.5(a, b) = max(0, √a + √b − 1)²
T2(a, b) = ab
T2.5(a, b) = ab / (a + b − ab)
T3(a, b) = min(a, b)

S0(a, b) = max(a, b) if min(a, b) = 0, 1 otherwise
S1(a, b) = min(1, a + b)
S1.5(a, b) = 1 − max(0, √(1 − a) + √(1 − b) − 1)²
S2(a, b) = a + b − ab
S2.5(a, b) = (a + b − 2ab) / (1 − ab)
S3(a, b) = max(a, b)
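These operators translate directly into code. The sketch below (function names are ours) also realizes the n-ary extension by folding, as licensed by associativity.

```python
from functools import reduce
from math import sqrt

# The T-norms and T-conorms listed above (names follow the paper's indices).
def t0(a, b):  return min(a, b) if max(a, b) == 1 else 0.0
def t1(a, b):  return max(0.0, a + b - 1)
def t15(a, b): return max(0.0, sqrt(a) + sqrt(b) - 1) ** 2
def t2(a, b):  return a * b
def t25(a, b): return a * b / (a + b - a * b) if a + b else 0.0
def t3(a, b):  return min(a, b)

def s0(a, b):  return max(a, b) if min(a, b) == 0 else 1.0
def s1(a, b):  return min(1.0, a + b)
def s2(a, b):  return a + b - a * b
def s3(a, b):  return max(a, b)

def nary(op, *xs):
    # n-ary extension, well defined by associativity
    return reduce(op, xs)

vals = [t(0.6, 0.7) for t in (t0, t1, t15, t2, t25, t3)]
print(vals == sorted(vals))    # the chain T0 <= T1 <= ... <= T3 at (0.6, 0.7): True
print(nary(t2, 0.5, 0.5, 0.5)) # 0.125
```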
It is important to note that

T0 ≤ T1 ≤ T1.5 ≤ T2 ≤ T2.5 ≤ T3
S3 ≤ S2.5 ≤ S2 ≤ S1.5 ≤ S1 ≤ S0

T1 is appropriate to perform the intersection of lower probability bounds (uncertainty values) and captures the notion of the worst case, where the arguments are considered as mutually exclusive as possible. T3 is appropriate to represent the intersection of upper probability bounds and captures the notion of the best case, where one argument attempts to subsume the others. T2 is the classical probabilistic operator that assumes independence of arguments, and its dual T-conorm S2 is the usual additive measure for the union.

Schweizer and Sklar [31] proposed a parameterized family, denoted T(a, b, p), where a and b are the T-norm's arguments and p is the parameter that spans the space of T-norms from T0 to T3:

T(a, b, p) = (a^(−p) + b^(−p) − 1)^(−1/p)   if a^(−p) + b^(−p) ≥ 1, when p < 0
T(a, b, p) = 0                              if a^(−p) + b^(−p) ≤ 1, when p < 0
T(a, b, p) = lim_{p′→0} T(a, b, p′) = ab    when p → 0
T(a, b, p) = (a^(−p) + b^(−p) − 1)^(−1/p)   when p > 0

Let R = [−∞, +∞] and R+ = [0, +∞]. Given a real number p ∈ R, we denote by T(p) the member of the family of T-norms induced by p. Note that we allow p to be assigned the infinite values −∞ and +∞. Figure 1 illustrates how T(p) spans over the real numbers, so for example T(−∞) = T0, T(−1) = T1, T(0) = T2, and T(+∞) = T3.
p           −∞    −1    −0.5    0     1      +∞
T(a, b, p)  T0    T1    T1.5    T2    T2.5   T3

Fig. 1. Spanning of the T-norms over the real numbers
For suitable negation operators N(a), such as N(a) = 1 − a, T-norms and T-conorms are duals in the sense of the following generalization of DeMorgan's law:

S(a, b) = N(T(N(a), N(b)))        T(a, b) = N(S(N(a), N(b)))

This duality implies that, given the negation operator N(a) = 1 − a, the selection of a T-norm uniquely constrains the selection of the T-conorm. The dual parameterized family of T-conorms, denoted S(a, b, p), is defined as S(a, b, p) = 1 − T(1 − a, 1 − b, p). Given a real number p ∈ R, we denote by S(p) the member of the family of T-conorms induced by p. So, for example, S(−∞) = S0, S(−1) = S1, S(0) = S2, and S(+∞) = S3.

Theorem 1. The evaluation of the T-norms and T-conorms at the extremes of the unit interval [0, 1] satisfies the truth tables of the logical operators AND and OR, respectively. □
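The parameterized family and its dual can be sketched as follows; the explicit handling of the infinite and zero values of p is our implementation choice. The checks confirm the correspondences T(−1) = T1, T(0) = T2, T(+∞) = T3 and Theorem 1's boolean extremes.

```python
from math import inf

# A sketch of the Schweizer-Sklar family T(a, b, p) and its dual
# S(a, b, p) = 1 - T(1 - a, 1 - b, p); limit cases handled explicitly.
def T(a, b, p):
    if p == -inf:                              # T0 (drastic product)
        return min(a, b) if max(a, b) == 1 else 0.0
    if p == inf:                               # T3 (minimum)
        return min(a, b)
    if p == 0:                                 # limit for p -> 0: product
        return a * b
    if a == 0 or b == 0:
        return 0.0
    s = a ** -p + b ** -p - 1
    if p < 0 and s <= 0:                       # second case of the definition
        return 0.0
    return s ** (-1 / p)

def S(a, b, p):
    return 1 - T(1 - a, 1 - b, p)

a, b = 0.6, 0.7
print(abs(T(a, b, -1) - max(0, a + b - 1)) < 1e-12)   # T(-1) = T1: True
print(abs(T(a, b, 0) - a * b) < 1e-12)                # T(0)  = T2: True
print(T(a, b, inf) == min(a, b))                      # T(+inf) = T3: True
# Theorem 1: at the extremes {0, 1}, (T, S) behave like AND and OR
print([(T(x, y, 2), S(x, y, 2)) for x in (0, 1) for y in (0, 1)])
```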
3 Syntax of QDLP
A term is either a constant or a variable (function symbols are not considered in this work). An atom is a(t1, ..., tn), where a is a predicate of arity n and t1, ..., tn are terms. A literal is either a positive literal p or a negative literal ¬p, where p is an atom. A positive (disjunctive) quantitative rule r is a clause of the form:

h1 ∨ · · · ∨ hn ←[x,y] b1, · · · , bk ,    n ≥ 1, k ≥ 0

where h1, · · · , hn, b1, · · · , bk are atoms and 0 < x ≤ y ≤ 1. The interval [x, y] is the certainty degree interval of the rule (i.e., the strength of the rule implication) and it is a measure of the reliability of the rule. h1 ∨ · · · ∨ hn is the head of the quantitative rule and it is a non-empty disjunction of atoms. b1, · · · , bk is the body of the quantitative rule and it is a (possibly empty) conjunction of atoms. If the body is empty (i.e., k = 0) and the head contains exactly one atom (i.e., n = 1), the rule is a fact whose certainty degree interval coincides with the strength of the implication. A positive (disjunctive) quantitative program is a finite set of positive quantitative disjunctive rules.
4 Semantics of QDLP
Let P be a positive disjunctive quantitative program. The Herbrand universe UP, the Herbrand base BP, and ground(P) of P are defined as in DLP. Once we have defined the syntax of quantitative rules, we need to evaluate the satisfiability of premises, to propagate uncertainty through rule chaining, and to consolidate the same conclusion derived from different rules.

A quantitative interpretation I of P is a mapping which assigns to each atom A ∈ BP a certainty degree interval [xA, yA] ⊆ [0, 1]. We write I(A) = [xA, yA], ∀A ∈ BP, [xA, yA] ⊆ [0, 1]. It is worth noting that a quantitative program P has infinitely many quantitative interpretations, because each atom A ∈ BP can be assigned infinitely many intervals [xA, yA] ⊆ [0, 1]. This is an important difference w.r.t. (function-free) DLP, where each program always has a finite number of Herbrand interpretations.

Let p be any real number inducing T(p) from the family of T-norms. We denote by T(p) (resp., S(p)) the generalization of the T-norm T(p) (resp., T-conorm S(p)) whose arguments are intervals instead of single values, e.g., T(p)([a, b], [c, d]) = [T(p)(a, c), T(p)(b, d)]. Now that we know what a quantitative interpretation I is, the first thing to straighten out is when a rule r is true w.r.t. I and what is the role of p. To this end, we first define the way the certainty degree intervals of the atoms of a conjunction or disjunction are combined. In particular, we define:
1. The certainty degree interval of a (possibly empty) conjunction C of atoms from BP, C = b1 ∧ . . . ∧ bm, w.r.t. I and p:

   I(p)(C) = [1, 1] if m = 0 (i.e., C = ∅)
   I(p)(C) = T(p)(I(b1), . . . , I(bm)) if m > 0

2. The certainty degree interval of a non-empty disjunction D of atoms from BP, D = h1 ∨ . . . ∨ hn, w.r.t. I and p: I(p)(D) = S(p)(I(h1), . . . , I(hn)).

Given two certainty degree intervals [a, b] and [p, q], [a, b] ≤ [p, q] iff a ≤ p and b ≤ q. Moreover, [a, b] < [p, q] iff (i) [a, b] ≤ [p, q], and (ii) a < p or b < q.

We say that a rule r ∈ ground(P), H(r) ←[x,y] B(r), is p-satisfied w.r.t. I iff the following inequality is satisfied:

   I(p)(H(r)) ≥ T2(I(p)(B(r)), [x, y])    (1)

The right-hand side of inequality (1) represents the certainty degree interval propagated through the rule w.r.t. I and p. The head event H(r) depends on two events: (i) the rule reliability event, expressed through [x, y], and (ii) the reliability event of the body of r w.r.t. I and p, given by I(p)(B(r)). Intuitively, we can assume that the rule reliability is independent of the certainty degree intervals of the body literals, so that the two events are to be considered independent, and for this reason we use T2 in (1).

A quantitative p-model of P is a quantitative interpretation M of P such that each rule r ∈ ground(P) is p-satisfied w.r.t. M. Since the definition of quantitative p-model relies completely on the instantiation ground(P) of P, for simplicity, throughout the rest of this paper we assume that P is a ground program (either ground originally, or the instantiation ground(P′) of a program P′). The set of all p-models of P is denoted by M(p)(P).

As previously noted, a quantitative program P has infinitely many quantitative interpretations. Thus, P may have (infinitely) many p-models. Therefore, it is useful to define an order relation between the p-models of P which makes it possible to prefer some p-models to others. Since a p-model assigns certainty degree intervals to all atoms in BP, an order relation between p-models should be defined in terms of an order relation between intervals. Given M1, M2 ∈ M(p)(P), M1 ≤ M2 iff M1(A) ≤ M2(A) for each A ∈ BP. Moreover, M1 < M2 iff (i) M1 ≤ M2, and (ii) ∃A ∈ BP s.t. M1(A) < M2(A).

We are now in a position to define what a minimal p-model is. A p-model M ∈ M(p)(P) is minimal iff there is no N ∈ M(p)(P) such that N < M. The minimal p-model semantics of P is the set of all minimal p-models of P and is denoted by MM(p)(P).

Once we fix p, we uniquely select a T-norm and its dual T-conorm, which completely describe an uncertainty calculus.
That is, according to the previous definitions, once we fix p, we define a semantics for P, called the p-semantics. In
this sense, we say that the semantics of the quantitative programs is parameterized and the choice of a T-norm induces the semantics of a quantitative program. Moreover, different T-norms induce different semantics in general. Since we can fix p in infinitely many ways, we can define infinitely many semantics for P. The T-norm may be chosen according to the level of knowledge of the relationships between the atoms of P.

Example 1. Consider the ground program P consisting of the following rules

a ∨ c ←[0.9,1] .        u ←[0.8,0.8] a, b .        w ←[0.5,0.6] u .
b ←[0.5,0.5] .          v ←[0.4,0.8] b .           w ←[1,1] v .

and the interpretations I1, I2 and I3,

I1 = {a : [0.9, 1], b : [0.5, 0.5], c : [0, 0], u : [0.4, 0.4], v : [0.2, 0.4], w : [0.2, 0.4]}
I2 = {a : [0.9, 1], b : [0.5, 0.5], c : [0, 0], u : [0.4, 0.6], v : [0.2, 0.4], w : [0.2, 0.4]}
I3 = {a : [0.9, 1], b : [0.5, 0.5], c : [0, 0], u : [0.2, 0.5], v : [0.2, 0.4], w : [0.2, 0.4]}

If p = +∞ (i.e., T(p) = T3) then I1, I2 ∈ M(p)(P). I3 ∉ M(p)(P) because the rule u ←[0.8,0.8] a, b is not p-satisfied w.r.t. I3. Moreover, I1 < I2 and I1 is minimal. □

Example 2. Consider a robot which moves and changes direction according to a prefixed route and to the coordinates received from a sensor. Sensor data is subject to error and different sensors may have different reliabilities. The control mechanism of the robot can be encoded in QDLP as follows. Consider the atoms moveToRight, moveToLeft, moveUp, moveDown, xCoord(X), yCoord(Y), sensorX(X), and sensorY(Y). At regular intervals of time, the sensors return instances of the atoms sensorX(X) and sensorY(Y), which are used to derive the actual coordinates according to the following quantitative rules

xCoord(X) ←[0.9,1] sensorX(Z), |X − Z| ≤ 0.5
yCoord(Y) ←[0.8,1] sensorY(Z), |Y − Z| ≤ 0.5

where the strength of the implication of each rule represents the reliability of the corresponding sensor in normal environment conditions (e.g., good visibility, low level of usage, etc.). The built-in predicates always have the maximal reliability (i.e., [1, 1]). The atoms sensorX(X) and sensorY(Y) are assigned reliabilities according to the current environment conditions. For each turning point (x, y) of the assigned route, we define a rule like

atom ←[1,1] xCoord(x), yCoord(y)

where atom ∈ {moveToRight, moveToLeft, moveUp, moveDown}. The robot turns to the right when the certainty degree interval of moveToRight is at least [0.75, 1], and so on. □
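Example 1 can be checked mechanically. The sketch below uses our own encoding of rules as (heads, body, [x, y]) triples, with the interval extensions of T3 for bodies, S3 for disjunctive heads, and T2 against the rule strength, as in inequality (1).

```python
# Intervals are (lo, hi) pairs; rules are (heads, body, strength) triples.
def t3_int(*xs):
    # interval extension of T3 (min), applied componentwise
    return (min(x[0] for x in xs), min(x[1] for x in xs))

def s3_int(*xs):
    # interval extension of S3 (max), for disjunctive heads
    return (max(x[0] for x in xs), max(x[1] for x in xs))

def t2_int(x, y):
    # interval extension of T2 (product)
    return (x[0] * y[0], x[1] * y[1])

def leq(x, y):
    # [a, b] <= [p, q] iff a <= p and b <= q
    return x[0] <= y[0] and x[1] <= y[1]

def satisfied(I, heads, body, strength):
    body_val = t3_int(*(I[b] for b in body)) if body else (1.0, 1.0)
    return leq(t2_int(body_val, strength), s3_int(*(I[h] for h in heads)))

# the program and interpretations of Example 1
P = [(["a", "c"], [], (0.9, 1)), (["b"], [], (0.5, 0.5)),
     (["u"], ["a", "b"], (0.8, 0.8)), (["v"], ["b"], (0.4, 0.8)),
     (["w"], ["u"], (0.5, 0.6)), (["w"], ["v"], (1, 1))]
I1 = {"a": (0.9, 1), "b": (0.5, 0.5), "c": (0, 0),
      "u": (0.4, 0.4), "v": (0.2, 0.4), "w": (0.2, 0.4)}
I3 = dict(I1, u=(0.2, 0.5))
print(all(satisfied(I1, *r) for r in P))  # I1 is a p-model for p = +inf: True
print(all(satisfied(I3, *r) for r in P))  # I3 is not (u <- a, b fails): False
```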
5 QDLP with Negation
Several real-world situations can be represented much more naturally if negation is allowed [21]. It is therefore necessary to define a general (disjunctive) quantitative rule r which allows negative literals in its body:

h1 ∨ · · · ∨ hn ←[x,y] b1, · · · , bk, ¬bk+1, · · · , ¬bk+m ,    n ≥ 1, k, m ≥ 0
where h1, · · · , hn, b1, · · · , bk+m are atoms and 0 < x ≤ y ≤ 1. We show next how the definitions of p-satisfiability and (minimal) p-model change when negative literals are allowed in the rules' bodies. Moreover, we will see that the quantitative minimal model semantics is not the natural meaning to be assigned to a negative quantitative program, and we define the quantitative stable model semantics.

We have to redefine only relation (1), on which the p-satisfiability of a positive rule depends, to take into consideration the case when the body of a rule also contains negative literals; all other definitions remain unchanged. A natural question that arises is: given I(A) = [x, y], how do we evaluate the certainty degree of the negative literal ¬A, that is, what is I(¬A)? The answer is I(¬A) = [N(y), N(x)] = [1 − y, 1 − x], where N is the negation operator N : [0, 1] → [0, 1], N(x) = 1 − x. Thus, the certainty degree interval of the body of r w.r.t. I and p is given by

I(p)(B(r)) = [1, 1] if k + m = 0
I(p)(B(r)) = T(p)(I(b1), . . . , I(bk), I(¬bk+1), . . . , I(¬bk+m)) if k + m > 0

As in DLP, the quantitative minimal model semantics is also applicable to negative quantitative programs, but it does not capture the meaning of negation by failure (i.e., CWA). We define a new semantics, called the quantitative stable model semantics, which involves the notion of stable p-model. Before defining this new notion, we define the extended quantitative program and the quantitative version (qGL) of the Gelfond-Lifschitz transformation (GL).

An extended quantitative program is a quantitative program Pe where subintervals of the unit interval [0, 1] may occur as body atoms in the rules of Pe and are treated like normal atoms. It is worth noting that such atoms are not in BPe. We assume that every quantitative interpretation I of Pe assigns to each atom [x, y] occurring in the body of a rule the certainty degree interval [x, y], that is, I([x, y]) = [x, y]. Given a quantitative interpretation I for P, the qGL-transformation P/I of P w.r.t. I is the positive extended quantitative program obtained from P by replacing in the body of every rule each negative literal ¬Bi by the constant interval I(¬Bi).

Let M be a p-model of P, for some p ∈ R. M is a stable p-model of P iff M is a minimal p-model of P/M. The stable p-model semantics of P is the set of all stable p-models of P and is denoted by SM(p)(P). Note that if P is positive then MM(p)(P) = SM(p)(P) for each p ∈ R.
Example 3. Let P = {a ←[0.5,0.6] ¬b}, p ∈ R (the value of p is irrelevant, since the body of the single rule of P contains only one literal), and the minimal p-model M = {a : [0.5, 0.6], b : [0, 0]}. Note that P/M = {a ←[0.5,0.6] [1, 1]} and that M is a minimal p-model of P/M, hence M is a stable p-model of P.

Consider now the minimal p-model N = {a : [0.25, 0.36], b : [0.4, 0.5]}. Thus, P/N = {a ←[0.5,0.6] [0.5, 0.6]} and N′ = {a : [0.25, 0.36], b : [0, 0]} is a p-model of P/N. Since N′ < N, N is not minimal for P/N, hence N is not a stable p-model of P. □
6 Generalization Results

6.1 Van Emden's Approach
One of the most relevant earlier works in this field was accomplished by van Emden in [34]. There, a quantitative rule r is of the form A ←f B1, . . . , Bn, where n ≥ 0, A, B1, . . . , Bn are all positive atoms, and f is a real number in the interval (0, 1]. r is true in a quantitative interpretation I iff I(A) ≥ f × min{ I(Bi) | i ∈ {1, . . . , n} }.

Theorem 2. The language proposed by van Emden is a particular case of the p-model semantics, where p = +∞ (i.e., T(p) = T3). □

There are important differences between our approach and that of van Emden. First of all, the programs considered in [34] are positive and without disjunction. Moreover, unlike in our approach, each clause implication receives a scalar and not an interval. Finally, van Emden defines a unique uncertainty calculus, based on the T-norm T3.
6.2 Traditional Disjunctive Logic Programming
From the syntactic point of view, QDLP is an extension of DLP. Each program P in DLP can be transformed into a program P′ in QDLP, called the quantitative version of P, by assigning [1, 1] as the strength of the implication of each rule (fact) of P. Recall that in DLP implications are strictly logical, and the logical value true is represented as [1, 1] in QDLP. Thus, P is equivalent to P′ from the syntactic point of view.

Example 4. Consider the logic program P = {a ← ; b ← ; c ∨ d ← a, b}. The quantitative version of P is P′ = {a ←−[1,1] ; b ←−[1,1] ; c ∨ d ←−[1,1] a, b}. □
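The transformation into a quantitative version is mechanical. A minimal sketch, assuming a hypothetical (head, positive body, negative body) rule representation:

```python
# Sketch of the syntactic transformation described above: every DLP rule
# gets strength [1, 1]. The rule representation is hypothetical, not from
# the paper.

def quantitative_version(program):
    # Attach the certainty interval [1, 1] to each rule's implication.
    return [(head, pos, neg, (1, 1)) for (head, pos, neg) in program]

# Example 4: P = {a <- ; b <- ; c v d <- a, b}
P = [(['a'], [], []), (['b'], [], []), (['c', 'd'], ['a', 'b'], [])]
P1 = quantitative_version(P)
assert all(r[3] == (1, 1) for r in P1)
assert len(P1) == len(P)
```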
We now wish to see whether QDLP is an extension of DLP also from the semantic point of view. We say that the stable p-semantics of QDLP is a generalization of the stable model semantics of DLP iff SM(p)(P′) = SM(P) for each P in DLP, where P′ is the quantitative version of P. Given p, it is not guaranteed a priori that the p-semantics of QDLP generalizes the DLP semantics. It is highly desirable that the QDLP semantics coincide with the DLP semantics on boolean quantitative programs. Whether the p-semantics of a given class
Extending Disjunctive Logic Programming by T -norms
299
of boolean quantitative programs coincides with the DLP semantics depends strongly on the value of p and on the features (e.g., positive, stratified negative, disjunctive, etc.) of the QDLP class. We single out the classes of QDLP and the values of p for which the p-semantics on the boolean quantitative programs of these classes coincides with the DLP semantics.

Table 1. QDLP fragments generalizing DLP

             | { } | { ¬s } | { ¬ } | { ∨h } | { ∨ } | { ∨h, ¬s } | { ∨, ¬s } | { ∨, ¬ }
p = −∞       | YES | YES    | NO    | YES    | NO    | YES        | NO        | NO
p ∈ (−∞, 0)  | YES | YES    | NO    | NO     | NO    | NO         | NO        | NO
p ∈ [0, +∞]  | YES | YES    | NO    | YES    | YES   | YES        | YES       | NO
The results on generalizations are summarized in Table 1. Each column of the table collects the results for a specific class of programs, for the T-norms induced by the values of p on the rows. The symbol ¬s refers to stratified negation, while ∨h refers to head cycle free (HCF) disjunction.² For instance, the last column of the table refers to the (unstratified) negative (non-HCF) disjunctive programs. A box of the table contains the answer YES if the p-semantics generalizes the DLP semantics on the class of quantitative programs given by the corresponding column header, for the values of p given by the corresponding row header, and NO otherwise.

For the class of non-disjunctive programs, in the positive and stratified cases, every p ∈ R induces a quantitative extension of DLP. For the class of disjunctive programs, as in the non-disjunctive case, there are values of p which induce quantitative extensions of DLP for positive and stratified programs; but unlike in the non-disjunctive case, where p ranges over all of R, p is restricted to {−∞} ∪ [0, +∞] in the HCF case and to [0, +∞] in the non-HCF case. Other values of p do not support generalizations of the HCF and non-HCF programs. Thus, generalization is guaranteed in most cases where recursion through negation and disjunction is forbidden (stratified and HCF programs). This is a pleasant result, because stratified HCF programs have a very clear and intuitive declarative meaning (while unstratified negation and recursion through disjunction can be confusing).

Intuitively, the fact that a fragment QF of QDLP is not a generalization of the corresponding fragment F of DLP is due to (i) the disjunctive rules' heads, and (ii) the fact that some values of p induce T-conorms for which it is not necessary that the certainty degree interval of every atom be [1, 1] or [0, 0] in order to derive [1, 1] as the certainty degree interval of the disjunction. For these values of p, the quantitative version P′ in QF of a program P in F has purely quantitative stable p-models in QDLP which clearly cannot be accepted as stable models of P in DLP. Only the T-conorms, and not the T-norms, corresponding to these values of p are responsible for the failure to obtain generalizations of DLP.

² The notions of Stratified Negation [1] and of Head Cycle Free Disjunction [4,5] are extended from traditional DLP to QDLP in a straightforward manner. Their formal definitions are given in Appendix A.
7 Complexity Results
As for traditional DLP, four main decision problems arise in the context of QDLP. In particular, given a quantitative program P and p ∈ R:

1. Is a given quantitative interpretation I of P a p-model for P? (p-Model Checking)
2. Is a given p-model M of P a stable p-model for P? (Stable p-Model Checking)
3. Does there exist a stable p-model for P? (p-Consistency)
4. Given an atom A ∈ BP and a certainty interval [x, y], does there exist a stable p-model M for P such that M(A) ≥ [x, y]? (Brave p-Reasoning)

We have analyzed the complexity of the above decision problems for the classes of QDLP which are generalizations of the corresponding DLP classes, the other fragments being of little practical interest. The results for non-disjunctive and disjunctive quantitative programs are summarized in Tables 2 and 3, respectively. A box in the tables contains the complexity of the decision problem given by the corresponding column header for the fragment of QDLP given by the corresponding row header.

Table 2. Complexity of non-disjunctive QDLP fragments for p ∈ R

        | p-Model Checking | Stable p-Model Checking | p-Consistency | Brave p-Reasoning
{ }     | P                | P                       | Ensured       | P
{ ¬s }  | P                | P                       | Ensured       | P
The results in Table 2 for the non-disjunctive fragments are valid for every p ∈ R. For both the positive and the stratified classes, all decision problems, apart from p-Consistency which is O(1), are polynomial.

Determining precisely the complexity of disjunctive QDLP is much more difficult. In this paper, we have concentrated on the QDLP fragments relative to the T-norm T3 (p = +∞). This T-norm is of particular interest, as it is the norm for which QDLP also generalizes the quantitative language of van Emden (see Section 6.1). The results for disjunctive QDLP are shown in Table 3. The first column reports the complexity of p-Model Checking for the various disjunctive fragments of QDLP. In all cases the complexity is polynomial.
Table 3. Complexity of QDLP fragments for p = +∞ (T-norm T3)

            | p-Model Checking | Stable p-Model Checking | p-Consistency | Brave p-Reasoning
{ ∨h }      | P                | P                       | Ensured       | NP-complete
{ ∨ }       | P                | coNP-complete           | Ensured       | Σ2P-complete
{ ∨h, ¬s }  | P                | P                       | Ensured       | NP-complete
{ ∨, ¬s }   | P                | coNP-complete           | Ensured       | Σ2P-complete
The second column reports the complexity of Stable p-Model Checking. The "hardest" QDLP fragments for this problem are the classes of positive and stratified negative (non-HCF) disjunctive programs, for which the problem is coNP-complete. In the other two considered cases the complexity is polynomial. The third column reports the complexity of p-Consistency. In all considered cases the complexity is O(1), because the existence of a stable p-model is ensured. Finally, the fourth column reports the complexity of Brave p-Reasoning. We note an increase in complexity from NP-completeness in the HCF case to Σ2P-completeness in the non-HCF case.

Note that the classes of QDLP with stratified negation considered here do not increase the complexity of any of the four decision problems w.r.t. the corresponding positive classes. This is visible in Table 3 in the row pairs (1, 3) and (2, 4), which contain the same complexity results in all columns. Remarkably, our results for QDLP coincide precisely with the results for DLP obtained by Eiter et al. in [13,14]; that is, reasoning under multiple-valued logics is more general but not harder than reasoning under boolean logics. Uncertainty comes for free!
Acknowledgments I am very grateful to Georg Gottlob and Nicola Leone for their useful criticism and numerous fruitful discussions on the manuscript.
References

1. K.R. Apt, H.A. Blair, and A. Walker. Towards a Theory of Declarative Knowledge. In Foundations of Deductive Databases and Logic Programming, J. Minker (ed.), Morgan Kaufmann, Los Altos, 1987.
2. F. Bacchus. Representing and Reasoning with Probabilistic Knowledge. Research Report CS-88-31, University of Waterloo, 1988.
3. C. Baral and M. Gelfond. Logic Programming and Knowledge Representation. Journal of Logic Programming, 19/20:73–148, May/July 1994.
4. R. Ben-Eliyahu and R. Dechter. Propositional Semantics for Disjunctive Logic Programs. Annals of Mathematics and Artificial Intelligence, 12:53–87, 1994.
5. R. Ben-Eliyahu and L. Palopoli. Reasoning with Minimal Models: Efficient Algorithms and Applications. In Proc. KR-94, pp. 39–50, 1994.
6. H.A. Blair and V.S. Subrahmanian. Paraconsistent Logic Programming. Theoretical Computer Science, 68:35–54, 1987.
7. P. Bonissone. Summarizing and Propagating Uncertain Information with Triangular Norms. International Journal of Approximate Reasoning, 1:71–101, 1987.
8. P.R. Cohen and M.R. Grinberg. A Framework for Heuristic Reasoning about Uncertainty. In Proc. IJCAI ’83, pp. 355–357, Karlsruhe, Germany, 1983.
9. P.R. Cohen and M.R. Grinberg. A Theory of Heuristic Reasoning about Uncertainty. AI Magazine, 4(2):17–23, 1983.
10. A.P. Dempster. A Generalization of Bayesian Inference. J. of the Royal Statistical Society, Series B, 30:205–247, 1968.
11. J. Dix. Semantics of Logic Programs: Their Intuitions and Formal Properties. An Overview. In Logic, Action and Information, pp. 241–329. DeGruyter, 1995.
12. J. Doyle. Methodological Simplicity in Expert System Construction: the Case of Judgements and Reasoned Assumptions. AI Magazine, 4(2):39–43, 1983.
13. T. Eiter, G. Gottlob, and H. Mannila. Disjunctive Datalog. ACM Transactions on Database Systems, 22(3):364–417, September 1997.
14. T. Eiter and G. Gottlob. On the Computational Cost of Disjunctive Logic Programming: Propositional Case. Annals of Mathematics and Artificial Intelligence, 15(3/4):289–323, 1995.
15. M.C. Fitting. Bilattices and the Semantics of Logic Programming. J. Logic Programming, 11:91–116, 1991.
16. M. Gelfond and V. Lifschitz. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365–385, 1991.
17. M. Ishizuka. Inference Methods Based on Extended Dempster-Shafer Theory for Problems with Uncertainty/Fuzziness. New Generation Computing, 1(2):159–168, 1983.
18. M. Kifer and A. Li. On the Semantics of Rule-Based Expert Systems with Uncertainty. In 2nd International Conference on Database Theory, Springer Verlag LNCS 326, pp. 102–117, 1988.
19. M. Kifer and V.S. Subrahmanian. Theory of Generalized Annotated Logic Programming and its Applications. J. Logic Programming, 12:335–367, 1992.
20. L.V.S. Lakshmanan, N. Leone, R. Ross, and V.S. Subrahmanian. ProbView: A Flexible Probabilistic Database System. ACM Transactions on Database Systems, 22(3):419–469, 1997.
21. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 1987.
22. J. Lobo, J. Minker, and A. Rajasekar. Foundations of Disjunctive Logic Programming. MIT Press, Cambridge, MA, 1992.
23. C. Mateis. A Quantitative Extension of Disjunctive Logic Programming. Technical Report, available on the web as: http://www.dbai.tuwien.ac.at/staff/mateis/gz/qdlp.ps.
24. R.T. Ng and V.S. Subrahmanian. Probabilistic Logic Programming. Information and Computation, 101:150–201, 1992.
25. R.T. Ng and V.S. Subrahmanian. Empirical Probabilities in Monadic Deductive Databases. In Proc. Eighth Conf. Uncertainty in AI, pp. 215–222, Stanford, 1992.
26. R.T. Ng and V.S. Subrahmanian. A Semantical Framework for Supporting Subjective and Conditional Probabilities in Deductive Databases. J. of Automated Reasoning, 10(2):191–235, 1993.
27. R.T. Ng and V.S. Subrahmanian. Stable Semantics for Probabilistic Deductive Databases. Information and Computation, 110:42–83, 1994. 28. R.T. Ng and V.S. Subrahmanian. Non-monotonic Negation in Probabilistic Deductive Databases. In Proc. 7-th Conf. Uncertainty in AI, pp. 249–256, Los Angeles, 1991. 29. N.J. Nilsson. Probabilistic Logic. Artificial Intelligence, vol. 28, pp. 71–87, 1986. 30. J. Pearl. Probabilistic Reasoning in Intelligent Systems – Networks of Plausible Inference. Morgan Kaufmann, 1988. 31. B. Schweizer and A. Sklar. Associative Functions and Abstract Semi-Groups. Publicationes Mathematicae Debrecen, 10:69–81, 1963. 32. G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976. 33. E. Shapiro. Logic Programs with Uncertainties: A Tool for Implementing Expert Systems. In Proc. IJCAI ’83, pp. 529–532, 1983. 34. M.H. van Emden. Quantitative Deduction and its Fixpoint Theory. The Journal of Logic Programming, 1:37–53, 1986. 35. L.A. Zadeh. Fuzzy Sets. Inform. and Control, 8:338–353, 1965.
A Stratified and Head Cycle Free QDLP
The stratified and head-cycle-free (HCF) quantitative programs are important classes of quantitative programs and, as we will see, they have nice properties.

The stratified quantitative programs are defined in the classical way, as introduced by Apt et al. in [1]. A quantitative program P is (locally) stratified iff it is possible to partition the set of its atoms into strata ⟨S1, . . . , Sr⟩ such that for every rule h1 ∨ · · · ∨ hk ←−[x,y] b1, . . . , bl, ¬bl+1, . . . , ¬bl+m in P the following holds: (i) Strat(a) = i iff a ∈ Si, (ii) Strat(bs) ≤ Strat(ht) for all 1 ≤ s ≤ l, 1 ≤ t ≤ k, and (iii) Strat(bs) < Strat(ht) for all l + 1 ≤ s ≤ l + m, 1 ≤ t ≤ k. Note that if P is stratified then there is a partition P = P1 ∪ . . . ∪ Pr, where r is the number of strata and Pi contains the rules of P defining the atoms of Si, 1 ≤ i ≤ r. In the sequel, if a negative program is not explicitly said to be stratified, it is assumed to be unstratified.

Example 5. Consider the program P consisting of the following rules:

a ←−[0.4,0.4] .        a ∨ b ←−[0.6,0.6] ¬c.
c ∨ d ←−[0.5,0.5] .    e ←−[0.8,0.8] b, ¬d.

P is stratified. A partition of BP into strata is ⟨S1, S2⟩ with S1 = {c, d} and S2 = {a, b, e}. The partition of P corresponding to the partition of BP is P = P1 ∪ P2 with P1 = {c ∨ d ←−[0.5,0.5] } and P2 = {a ←−[0.4,0.4] ; a ∨ b ←−[0.6,0.6] ¬c; e ←−[0.8,0.8] b, ¬d}. □
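The stratification conditions (ii) and (iii) are easy to check mechanically. A sketch, using a hypothetical (head, positive body, negative body, strength) rule representation:

```python
# Sketch of the stratification test in Appendix A: for each rule, positive
# body atoms must lie in a stratum no higher than every head atom, and
# negated body atoms in a strictly lower stratum. Representation hypothetical.

def is_stratified(program, strat):
    for head, pos, neg, _strength in program:
        for h in head:
            if any(strat[b] > strat[h] for b in pos):
                return False  # violates condition (ii)
            if any(strat[b] >= strat[h] for b in neg):
                return False  # violates condition (iii)
    return True

# Example 5 with S1 = {c, d}, S2 = {a, b, e}:
P = [(['a', 'b'], [], ['c'], (0.6, 0.6)),
     (['a'], [], [], (0.4, 0.4)),
     (['e'], ['b'], ['d'], (0.8, 0.8)),
     (['c', 'd'], [], [], (0.5, 0.5))]
strat = {'c': 1, 'd': 1, 'a': 2, 'b': 2, 'e': 2}
assert is_stratified(P, strat)

# Recursion through negation breaks stratification: a <- not a.
assert not is_stratified([(['a'], [], ['a'], (1, 1))], {'a': 1})
```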
With every program P we associate a directed graph DGP = (N, E), called the dependency graph of P, in which (i) each predicate of P is a node in N, and (ii) there is an arc in E directed from a node a to a node b iff there is a rule r in P such that b and a are the predicates of a positive literal appearing in H(r) and B(r), respectively. DGP singles out the dependencies of the head predicates of a rule r on the positive predicates in its body.³

Example 6. Consider the program P1 consisting of the following rules:⁴

a ∨ b ←−[0.6,0.6] .        c ←−[0.6,0.6] a.        c ←−[0.6,0.6] b.
DGP1 is depicted in Figure 2a. (Note that, since the sample program is propositional, the nodes of the graph are atoms, as atoms coincide with predicates in this case.) Consider now the program P2, obtained by adding to P1 the rules

d ∨ e ←−[0.8,0.8] a.        d ←−[0.4,0.4] e.        e ←−[0.5,0.5] d, ¬b.

The dependency graph DGP2 is shown in Figure 2b. □
Fig. 2. Dependency Graph (DGP): (a) DGP1, (b) DGP2
The HCF quantitative programs are an important class of quantitative programs with disjunction in the head, and are defined in the classical way, following [4,5]. A program P is HCF iff there is no rule r in P such that two predicates occurring in the head of r are in the same cycle of DGP. In the sequel, if a disjunctive program is not explicitly said to be HCF, it is assumed to be non-HCF.

Example 7. The dependency graphs given in Figure 2 reveal that program P1 of Example 6 is HCF and that P2 is not HCF, as the rule d ∨ e ←−[0.8,0.8] a contains in its head two predicates belonging to the same cycle of DGP2. □

³ Note that negative literals cause no arc in DGP.
⁴ We point out again that we use propositional programs for simplicity, but the results are valid for the general case of (function-free) programs with variables.
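The HCF test of Example 7 can be mechanized: build the dependency graph and reject a program whenever some rule has two head atoms that are mutually reachable (i.e., on a common cycle). A sketch under the same hypothetical rule representation:

```python
# Sketch of the HCF test: arcs run from positive body atoms to head atoms
# (negative literals add no arcs), and a program is HCF iff no rule head
# contains two atoms lying on a common cycle. Representation hypothetical.

from itertools import combinations

def dependency_graph(program):
    arcs = set()
    for head, pos, _neg, _s in program:
        arcs |= {(a, h) for a in pos for h in head}
    return arcs

def reachable(arcs, src, dst):
    seen, stack = set(), [src]
    while stack:
        n = stack.pop()
        for (u, v) in arcs:
            if u == n and v not in seen:
                seen.add(v)
                stack.append(v)
    return dst in seen

def is_hcf(program):
    arcs = dependency_graph(program)
    for head, *_ in program:
        for x, y in combinations(head, 2):
            if reachable(arcs, x, y) and reachable(arcs, y, x):
                return False
    return True

# Example 6/7: P1 is HCF; P2 (with d v e <- a and the d/e cycle) is not.
P1 = [(['a', 'b'], [], [], (0.6, 0.6)),
      (['c'], ['a'], [], (0.6, 0.6)),
      (['c'], ['b'], [], (0.6, 0.6))]
P2 = P1 + [(['d', 'e'], ['a'], [], (0.8, 0.8)),
           (['d'], ['e'], [], (0.4, 0.4)),
           (['e'], ['d'], ['b'], (0.5, 0.5))]
assert is_hcf(P1)
assert not is_hcf(P2)
```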
Extending the Stable Model Semantics with More Expressive Rules

Patrik Simons⋆

Department of Computer Science and Engineering
Helsinki University of Technology, FIN-02015 HUT, Finland
[email protected], http://www.tcs.hut.fi/~psimons

Abstract. The rules associated with propositional logic programs and the stable model semantics are not expressive enough to let one write concise programs. This problem is alleviated by introducing some new types of propositional rules. Together with a decision procedure that has been used as a base for an efficient implementation, the new rules supplant the standard ones in practical applications of the stable model semantics.
1 Introduction
Logic programming with the stable model semantics has emerged as a viable method for solving constraint satisfaction problems [4,5]. The state-of-the-art system smodels [6] can often handle non-stratified programs with tens of thousands of rules. However, propositional logic programs cannot compactly encode several types of constraints. For example, expressing the subsets of size k of an n-sized set as stable models requires on the order of n^k rules. In order to remedy this problem, we improve upon the techniques of smodels by extending the semantics with some new types of propositional rules:

– choice rules for encoding subsets of a set,
– constraint rules for enforcing cardinality limits on the subsets, and
– weight rules for writing inequalities over weighted linear sums.

The extended semantics is not based on subset-minimal models, as is the case for disjunctive logic programs. For instance, the choice rule is more of a generalization of the disjunctive rule of the possible model semantics [7]. A system that computes the stable models of programs containing the new rules has been implemented [9], and it has successfully been applied to deadlock and reachability problems in a class of Petri nets [3]. Other problem domains, such as planning and configuration, will benefit from the improved rules as well. The system is based on smodels 1.10, from which it evolved.

The new rules and the stable model semantics are introduced in Section 2. A decision procedure for the extended syntax is presented in Section 3, and some important implementation details are described in Section 4. Experimental results are found in Section 5.
⋆ The financial support of the Academy of Finland (project nr 43963) and the Helsinki Graduate School in Computer Science and Engineering is gratefully acknowledged.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 305–316, 1999.
© Springer-Verlag Berlin Heidelberg 1999
2 The Stable Model Semantics
Let Atoms be a set of primitive propositions, or atoms, and consider logic programs consisting of rules of the form

h ← a1, . . . , an, not b1, . . . , not bm,

where the head h and the atoms a1, . . . , an, b1, . . . , bm in the body are members of Atoms. Call the expression not b a not-atom; atoms and not-atoms together are referred to as literals.

The stable model semantics for a logic program P is defined as follows [2]. The reduct P^A of P with respect to the set of atoms A is obtained by

1. deleting each rule in P that has a not-atom not x in its body such that x ∈ A, and by
2. deleting all not-atoms in the remaining rules.

Definition 1. A set of atoms S is a stable model of P if and only if S is the deductive closure of P^S when the rules in P^S are seen as inference rules.

In order to facilitate the definition of more general forms of rules, we introduce an equivalent characterization of the stable model semantics.

Proposition 1. We say that fP : 2^Atoms → 2^Atoms is a closure if

fP(S) = {h | h ← a1, . . . , an, not b1, . . . , not bm ∈ P, a1, . . . , an ∈ fP(S), b1, . . . , bm ∉ S}.

Let

gP(S) = ∩ {fP(S) | fP : 2^Atoms → 2^Atoms is a closure}.

Then, S is a stable model of the program P if and only if S = gP(S).

Proof. Note that the deductive closure of the reduct P^S is a closure, and note that for every fP that is a closure, the deductive closure of P^S is a subset of fP(S).

A stable model is therefore a model that follows from itself by means of the smallest possible closure. In other words, a stable model is a supported model, and this is the essence of the semantics.

Definition 2. A basic rule r is of the form

h ← a1, . . . , an, not b1, . . . , not bm

and is interpreted by the function fr : 2^Atoms × 2^Atoms → 2^Atoms as follows:

fr(S, C) = {h | a1, . . . , an ∈ C, b1, . . . , bm ∉ S}.
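Definition 1 suggests a brute-force procedure for small ground programs: guess S, form the reduct P^S, and compare its deductive closure with S. A sketch, with a rule representation of our own choosing:

```python
from itertools import combinations

# Brute-force sketch of Definition 1: a candidate set S is stable iff S
# equals the deductive closure of the reduct P^S. Rules are (head, pos, neg)
# triples; this representation is hypothetical, not from the paper.

def closure(rules):
    # Deductive closure of definite rules (head, pos_body).
    S, changed = set(), True
    while changed:
        changed = False
        for h, pos in rules:
            if set(pos) <= S and h not in S:
                S.add(h)
                changed = True
    return S

def stable_models(P, atoms):
    models = []
    for r in range(len(atoms) + 1):
        for cand in combinations(sorted(atoms), r):
            S = set(cand)
            # Reduct: drop rules blocked by S, then drop the not-atoms.
            reduct = [(h, pos) for (h, pos, neg) in P if not (set(neg) & S)]
            if closure(reduct) == S:
                models.append(S)
    return models

# P = {a <- not b, b <- not a} has exactly the stable models {a} and {b}.
P = [('a', [], ['b']), ('b', [], ['a'])]
ms = stable_models(P, {'a', 'b'})
assert {frozenset(m) for m in ms} == {frozenset({'a'}), frozenset({'b'})}
```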
The function fr produces the result of a deductive step when applied to a candidate stable model S and its consequences C.

Definition 3. A constraint rule r is of the form

h ← k {a1, . . . , an, not b1, . . . , not bm}

and is interpreted by

fr(S, C) = {h | |{a1, . . . , an} ∩ C| + |{b1, . . . , bm} − S| ≥ k}.

The constraint rule can be used for testing the cardinality of a set of atoms. The rule h1 ← 2 {a, b, c, d} states that h1 is true if at least 2 atoms in the set {a, b, c, d} are true. The rule h2 ← 1 {not a, not b, not c, not d}, on the other hand, states that h2 is true if at most 3 atoms in the set are true.

Definition 4. A choice rule r is of the form

{h1, . . . , hk} ← a1, . . . , an, not b1, . . . , not bm

and is interpreted by

fr(S, C) = {h | h ∈ {h1, . . . , hk} ∩ S, a1, . . . , an ∈ C, b1, . . . , bm ∉ S}.

The choice rule is typically used when one wants to implement optional choices. The rule {a} ← b, not c declares that if b is true and c is false, then a can be either true or false.

Definition 5. Finally, a weight rule r is of the form

h ← {a1 = wa1, . . . , an = wan, not b1 = wb1, . . . , not bm = wbm} ≥ w,

for wai, wbi ≥ 0, and is interpreted by

fr(S, C) = {h | Σ_{ai ∈ C} wai + Σ_{bi ∉ S} wbi ≥ w}.

The weight rule is a generalization of the constraint rule. If every literal in the body of a weight rule has weight 1, then the rule behaves precisely as a constraint rule.

Definition 6. Let P be a set of rules. As before, we say that fP : 2^Atoms → 2^Atoms is a closure if

fP(S) = ∪_{r ∈ P} fr(S, fP(S)),

and we define

gP(S) = ∩ {fP(S) | fP : 2^Atoms → 2^Atoms is a closure}.

Then, S is a stable model of the program P if and only if S = gP(S).
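The interpretations fr of Definitions 3 to 5 translate almost literally into code. A sketch with hypothetical representations for the three rule types:

```python
# Sketches of the interpretations f_r from Definitions 3-5, applied to a
# candidate stable model S and its consequences C. Representations are ours.

def fr_constraint(h, atoms, natoms, k, S, C):
    # h <- k {a1,...,an, not b1,...,not bm}
    return {h} if len(set(atoms) & C) + len(set(natoms) - S) >= k else set()

def fr_choice(heads, pos, neg, S, C):
    # {h1,...,hk} <- a1,...,an, not b1,...,not bm
    if set(pos) <= C and not (set(neg) & S):
        return set(heads) & S
    return set()

def fr_weight(h, wa, wb, w, S, C):
    # h <- {a1=wa1, ..., not b1=wb1, ...} >= w
    total = sum(v for a, v in wa.items() if a in C) \
          + sum(v for b, v in wb.items() if b not in S)
    return {h} if total >= w else set()

# h1 <- 2 {a, b, c, d}: at least two of the atoms must hold.
assert fr_constraint('h1', ['a', 'b', 'c', 'd'], [], 2, set(), {'a', 'c'}) == {'h1'}
# {a} <- b, not c: a is derived only if chosen (a in S) and the body holds.
assert fr_choice(['a'], ['b'], ['c'], {'a'}, {'b'}) == {'a'}
assert fr_choice(['a'], ['b'], ['c'], set(), {'b'}) == set()
# A weight rule with unit weights behaves as a constraint rule.
assert fr_weight('h', {'a': 1, 'b': 1}, {'c': 1}, 2, set(), {'a'}) == {'h'}
```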
The motivation for defining constraint, choice, and weight rules is that they can be easily and efficiently implemented and that they are quite expressive. For example, the constraint rule

h ← k {a1, . . . , an, not b1, . . . , not bm}

replaces the program

{h ← ai1, . . . , aik1, not bj1, . . . , not bjk2 | k1 + k2 = k, 1 ≤ i1 < · · · < ik1 ≤ n, 1 ≤ j1 < · · · < jk2 ≤ m},

which contains (n+m choose k) rules.

Thus, a constraint rule guarantees that if the sum of the number of atoms in its body that are in a stable model and the number of not-atoms in its body that are not is at least k, then the head is in the model. Similarly, if the body of a choice rule agrees with a stable model, then the rule motivates the inclusion of any number of atoms from its head. A weight rule

h ← {a1 = wa1, . . . , an = wan, not b1 = wb1, . . . , not bm = wbm} ≥ w,

in turn, will force the head to be a member of a stable model S if

Σ_{ai ∈ S} wai + Σ_{bi ∉ S} wbi ≥ w.

Example 1. The stable models of the program

{a1, . . . , an} ←
false ← {a1 = w1, . . . , an = wn} ≥ w
true ← {a1 = v1, . . . , an = vn} ≥ v

containing the atom true but not the atom false correspond to the ways one can pack a subset of a1, . . . , an in a bin such that the total weight is less than w and the total value is at least v. The individual weights and values of the items are given by w1, . . . , wn and v1, . . . , vn, respectively.

Example 2. The satisfying assignments of the formula

(a ∨ b ∨ ¬c) ∧ (¬a ∨ b ∨ ¬d) ∧ (¬b ∨ c ∨ d)

correspond to the stable models of the program

{a, b, c, d} ←
false ← not a, not b, c
false ← a, not b, d
false ← b, not c, not d

that do not contain false.
3 The Decision Procedure
For an atom a, let not(a) = not a, and for a not-atom not a, let not(not a) = a. For a set of literals A, define not(A) = {not(a) | a ∈ A}. Let A+ = {a ∈ Atoms | a ∈ A} and let A− = {a ∈ Atoms | not a ∈ A}. Define Atoms(A) = A+ ∪ A−, and for a program P, define Atoms(P) = Atoms(L), where L is the set of literals that appear in the program. A set of literals A is said to cover a set of atoms B if B ⊆ Atoms(A), and B is said to agree with A if

A+ ⊆ B and A− ⊆ Atoms − B.

Algorithm 1 displays a decision procedure for the stable model semantics. The function smodels(P, A) returns true whenever there is a stable model of P agreeing with A, and it relies on the three functions expand(P, A), conflict(P, A), and lookahead(P, A). Let A′ = expand(P, A). We assume that

E1 A ⊆ A′, and
E2 every stable model of P that agrees with A also agrees with A′.

Moreover, we assume that the function conflict(P, A) satisfies the two conditions

C1 if A covers Atoms(P) and there is no stable model that agrees with A, then conflict(P, A) returns true, and
C2 if conflict(P, A) returns true, then there is no stable model of P that agrees with A.

In addition, lookahead(P, A) is expected to return literals not covered by A.

Theorem 1. Let P be a set of rules and let A be a set of literals. Then, there is a stable model of P agreeing with A if and only if smodels(P, A) returns true.

Let S be a stable model of P agreeing with the set of literals A. Then, fr(S, S) ⊆ S for r ∈ P, and we make the following observations. Let

min_r(A) = ∩_{A+ ⊆ C, A− ∩ C = ∅} fr(C, C)

be the inevitable consequences of A, and let

max_r(A) = ∪_{A+ ⊆ C, A− ∩ C = ∅} fr(C, C)

be the possible consequences of A. Then,
Algorithm 1 A decision procedure for the stable model semantics

function smodels(P, A)
    A′ := expand(P, A)
    if conflict(P, A′) then
        return false
    else if A′ covers Atoms(P) then
        return true        {A′+ is a stable model}
    else
        x := lookahead(P, A′)
        if smodels(P, A′ ∪ {x}) then
            return true
        else
            return smodels(P, A′ ∪ {not(x)})
        end if
    end if.

function expand(P, A)
    repeat
        A′ := A
        A := Atleast(P, A)
        A := A ∪ {not x | x ∈ Atoms(P) and x ∉ Atmost(P, A)}
    until A = A′
    return A.

function conflict(P, A)
    {Precondition: A = expand(P, A)}
    if A+ ∩ A− ≠ ∅ then
        return true
    else
        return false
    end if.

function lookahead(P, A)
    B := Atoms(P) − Atoms(A)
    B := B ∪ not(B)
    while B ≠ ∅ do
        take any literal x ∈ B
        A′ := expand(P, A ∪ {x})
        if conflict(P, A′) then
            return not(x)
        else
            B := B − A′
        end if
    end while
    return heuristic(P, A).
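The control flow of Algorithm 1 can be illustrated by a heavily simplified sketch for basic rules only: here expand and the Atleast/Atmost propagation are omitted entirely, lookahead degenerates to a fixed variable order, and stability is verified directly at complete assignments via the reduct. The real smodels is far stronger; this only shows the branching skeleton.

```python
# Simplified, hedged sketch of the structure of Algorithm 1 for basic rules
# (head, pos, neg): branch on an uncovered atom, and at a complete
# assignment check stability via the reduct of Definition 1.

def closure(rules):
    S, changed = set(), True
    while changed:
        changed = False
        for h, pos in rules:
            if set(pos) <= S and h not in S:
                S.add(h)
                changed = True
    return S

def is_stable(P, S):
    reduct = [(h, pos) for (h, pos, neg) in P if not (set(neg) & S)]
    return closure(reduct) == S

def smodels(P, atoms, pos, neg):
    if pos | neg == atoms:                 # A covers Atoms(P)
        return pos if is_stable(P, pos) else None
    x = min(atoms - pos - neg)             # trivial stand-in for lookahead/heuristic
    for branch in (pos | {x}, pos):        # try x, then not(x)
        found = smodels(P, atoms, branch, neg | ({x} - branch))
        if found is not None:
            return found
    return None

# P = {a <- not b, b <- not a}: the search finds one of the stable models.
P = [('a', [], ['b']), ('b', [], ['a'])]
m = smodels(P, {'a', 'b'}, set(), set())
assert m in ({'a'}, {'b'})
```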
1. For all r ∈ P, S agrees with min_r(A).
2. If there is an atom a such that for all r ∈ P, a ∉ max_r(A), then S agrees with {not a}.
3. If the atom a ∈ A, if there is only one r ∈ P for which a ∈ max_r(A), and if there exists a literal x such that a ∉ max_r(A ∪ {x}), then S agrees with {not(x)}.
4. If not a ∈ A and if there exists a literal x such that for some r ∈ P, a ∈ min_r(A ∪ {x}), then S agrees with {not(x)}.

The four statements help us deduce additional literals that are in agreement with S. Define Atleast(P, A) as the smallest set of literals containing A that cannot be enlarged using 1–4 above, i.e., let Atleast(P, A) be the least fixed point of the operator

f(B) = A ∪ B ∪ {a ∈ min_r(B) | r ∈ P}
       ∪ {not a | a ∈ Atoms(P) and for all r ∈ P, a ∉ max_r(B)}
       ∪ {not(x) | there exists a ∈ B such that a ∈ max_r(B) for only one r ∈ P and a ∉ max_r(B ∪ {x})}
       ∪ {not(x) | there exists not a ∈ B and r ∈ P such that a ∈ min_r(B ∪ {x})}.

We conclude:

Proposition 2. If the stable model S of P agrees with A, then S agrees with Atleast(P, A).

Furthermore, we can bound the stable models from above.

Proposition 3. For a choice rule r of the form

{h1, . . . , hk} ← a1, . . . , an, not b1, . . . , not bm,

let

f′r(S, C) = {h ∈ {h1, . . . , hk} | a1, . . . , an ∈ C, b1, . . . , bm ∉ S},

and for any other type of rule, let f′r(S, C) = fr(S, C). Let S be a stable model of P that agrees with A. Define Atmost(P, A) as the least fixed point of

f′(B) = ∪_{r ∈ P} f′r(A+, B − A−) − A−.

Then, S ⊆ Atmost(P, A).

It follows that expand(P, A) satisfies the conditions E1 and E2. The function conflict(P, A) obviously fulfills C2, and the next proposition shows that C1 holds as well.

Proposition 4. If A = expand(P, A) covers the set Atoms(P) and A+ ∩ A− = ∅, then A+ is a stable model of P.
3.1 Looking Ahead and the Heuristic
Besides Atleast(P, A) and Atmost(P, A), there is a third way to prune the search space. If the stable model S agrees with A but not with A ∪ {x} for some literal x, then S agrees with A ∪ {not(x)}. One can therefore avoid futile choices if one looks ahead and tests whether A ∪ {x} gives rise to a conflict for some literal x. Since x′ ∈ expand(P, A ∪ {x}) implies

expand(P, A ∪ {x′}) ⊆ expand(P, A ∪ {x})

due to the monotonicity of Atleast(P, A) and Atmost(P, A), it is not even necessary to examine all literals not covered by A. That is, if we have tested x, then we do not have to test the literals in expand(P, A ∪ {x}).

When looking ahead fails to find a literal that causes a conflict, one falls back on a heuristic. For a literal x, let Ap = expand(P, A ∪ {x}) and An = expand(P, A ∪ {not(x)}). Assume that the search space is a full binary tree of height H, and let p = |Ap − A| and n = |An − A|. Then,

2^(H−p) + 2^(H−n) = 2^H (2^n + 2^p) / 2^(p+n)

is an upper bound on the size of the remaining search space. Minimizing this number is equal to minimizing

log((2^n + 2^p) / 2^(p+n)) = log(2^n + 2^p) − (p + n).

Since 2^max(n,p) < 2^n + 2^p ≤ 2^(max(n,p)+1) is equivalent to

max(n, p) < log(2^n + 2^p) ≤ max(n, p) + 1,

and thus to

−min(n, p) < log(2^n + 2^p) − (p + n) ≤ 1 − min(n, p),

it suffices to maximize min(n, p). If two different literals have equal minimums, then one chooses the one with the greater maximum, max(n, p).
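The resulting branching rule (maximize min(n, p), break ties by max(n, p)) is one line of code; the (p, n) counts are assumed here to be supplied by a caller that has already run expand for each candidate literal.

```python
# Sketch of the branching heuristic derived above. The scores interface
# (literal -> (p, n)) is a hypothetical convenience, not smodels' API.

def choose_literal(scores):
    # scores: {literal: (p, n)} with p = |Ap - A|, n = |An - A|
    return max(scores,
               key=lambda x: (min(scores[x]), max(scores[x])))

scores = {'a': (3, 1), 'b': (2, 2), 'c': (4, 2)}
# min: a -> 1, b -> 2, c -> 2; the tie between b and c goes to c (larger max).
assert choose_literal(scores) == 'c'
```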
4 Implementation Details
The deductive closures Atleast(P, A) and Atmost(P, A) can both be implemented using two versions of a linear-time algorithm of Dowling and Gallier [1]. The basic algorithm associates with each rule a counter that keeps track of how many literals in the body of the rule are not yet included in a partially computed closure. If a counter reaches zero, then the head of the corresponding rule is included in the closure. The inclusion in turn changes other counters, and in this manner membership in the closure is propagated.

We begin with basic rules of the form h ← a1, . . . , an, not b1, . . . , not bm. For every rule r we create a literal counter r.literal, which is used as above, and an inactivity counter r.inactive. If the set A is a partial closure, then the inactivity counter records the number of literals in the body of r that are in not(A). The counter r.inactive is therefore positive, and the rule r is inactive, if one can neither now nor later use r to deduce its head. For every atom a we create a head counter a.head that holds the number of active rules with head a.

Recall that a literal can be brought into Atleast(P, A) in four different ways. We handle the four cases with the help of the three counters.

1. If r.literal reaches zero, then the head of r is added to the closure.
2. If a.head reaches zero, then not a is added to the closure.
3. If a.head is equal to one and a is in the closure, then every literal in the body of the only active rule with head a is added to the closure.
4. Finally, if a is the head of r, if not a is in the closure, and if r.literal = 1 and r.inactive = 0, then there is precisely one literal x in the body of r that is not in the closure, and not(x) is added to the closure.

Constraint rules and choice rules are easily incorporated into the same framework.
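The counter scheme for case 1 (basic rules with purely positive bodies) can be sketched as follows; cases 2 to 4 and the inactivity counters are omitted, and the rule representation is ours:

```python
# Minimal sketch of the Dowling-Gallier-style counter propagation: each
# rule keeps a counter of body literals not yet in the partial closure;
# when it hits zero the head is derived and propagation continues.

def propagate(rules, facts):
    # rules: list of (head, body) with body a list of atoms (positive only)
    literal = [len(body) for _head, body in rules]
    watch = {}
    for i, (_h, body) in enumerate(rules):
        for a in body:
            watch.setdefault(a, []).append(i)
    closure, queue = set(), list(facts)
    while queue:
        a = queue.pop()
        if a in closure:
            continue
        closure.add(a)
        for i in watch.get(a, []):
            literal[i] -= 1          # one more body literal satisfied
            if literal[i] == 0:
                queue.append(rules[i][0])
    return closure

rules = [('b', ['a']), ('c', ['a', 'b']), ('d', ['e'])]
assert propagate(rules, ['a']) == {'a', 'b', 'c'}
```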
For choice rules, one uses neither the first nor the fourth case, and for a constraint rule h ← k {a1, . . . , an, not b1, . . . , not bm} one compares the literal and inactivity counters not with zero but with m + n − k.

A weight rule

h ← {a1 = wa1, . . . , an = wan, not b1 = wb1, . . . , not bm = wbm} ≥ w

is managed using the upper and lower bounds of the sum of the weights in its body. Given a set of literals A, the lower bound is

Σ_{ai ∈ A+} wai + Σ_{bi ∈ A−} wbi

and the upper bound is

Σ_{ai ∉ A−} wai + Σ_{bi ∉ A+} wbi.
314
P. Simons
If the upper bound is less than w, then the rule is inactive, and if the lower bound is at least w, then the head is in the closure.

Notice that the implementation provides for incremental updates to the closure Atleast(P, A) as A changes. This is crucial for achieving high performance. Since the function Atmost(P, A) is anti-monotonic, it shrinks as A grows. It would be wasteful to compute Atmost(P, A) anew each time A is modified. Instead, all atoms that might not be in the newer and smaller closure are found using a variant of the basic algorithm. By inspecting these atoms it is possible to decide which ones must be in the closure, and then the basic algorithm can again be used to compute the final closure. A small example will make the method clear.

Example 3. Suppose P is the program

a ← b
a ← not c
b ← a
a ← not d,

and suppose A has changed from the empty set to {d}. Then we have already computed Atmost(P, ∅) = {a, b}, and we want to find Atmost(P, A). If r is the rule a ← not d, then the counter of r is at first zero and then changes to one as d becomes a member of A. Therefore, we deduce that a is possibly not a part of the new closure. The basic algorithm proceeds to increment the counters of b ← a, removing b, and of a ← b, where it stops. At this point the counter of the rule a ← not c is still zero, and we note that a must be part of the closure. Including a causes the counter of b ← a to decrease to zero. Consequently, b is added to the closure and the counter of a ← b is decremented. Since nothing more remains to be done, the final closure is {a, b}.

One can argue, in this particular example, that a follows from the rule a ← not c and need not be removed in the first stage of the procedure. However, in general it is not possible to decide whether an atom is in the final closure by inspecting only the rules of which it is the head. Nevertheless, we can make improvements based on this observation. For every atom a, create a source pointer whose mission is to point to the first rule that causes a to be included in the closure. During the part of the computation when atoms are removed from the closure, we only remove atoms whose removal is justified by the rule in their source pointer. For if the rule in a source pointer does not justify the removal of an atom, then the atom is reentered into the closure in the second phase of the computation. In practice, this simple trick yields a substantial speedup in the computation of Atmost(P, A).
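The semantics of Atmost can be illustrated with a deliberately naive from-scratch fixpoint; smodels updates Atmost(P, A) incrementally with counters and source pointers, which this sketch does not model. A rule is (head, positive body, negative body), and an atom is in Atmost(P, A) if it is derivable by rules none of whose negative body atoms occur in A:

```python
def atmost(rules, A):
    """Least closure of the rules whose negative body avoids A."""
    closure, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if (head not in closure and set(pos) <= closure
                    and not set(neg) & A):
                closure.add(head)
                changed = True
    return closure

# the program of Example 3
rules = [("a", ["b"], []), ("a", [], ["c"]),
         ("b", ["a"], []), ("a", [], ["d"])]
assert atmost(rules, set()) == {"a", "b"}     # Atmost(P, {}) = {a, b}
assert atmost(rules, {"d"}) == {"a", "b"}     # a survives via a <- not c
assert atmost(rules, {"c", "d"}) == set()     # both negative rules inactive
```

The last assertion shows the anti-monotonicity mentioned above: as A grows, the closure can only shrink.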
5 Experiments

We will search for sets of binary words of length n such that the Hamming distance between any two words is at least d. The size of the largest such set is denoted by A(n, d). For example, A(5, 3) = 4, so any 5-bit one-error-correcting code contains at most 4 words. One such code is {00000, 00111, 11001, 11110} =
Extending the Stable Model Semantics with More Expressive Rules
315
{0, 7, 25, 30}. Finding codes becomes very hard very quickly. For instance, it was only recently proved that A(10, 3) = 72 [10]. Construct a program that includes a rule

wi ← not wj1, . . . , not wjk

for every word i = 0, . . . , 2^n − 1, where j1, . . . , jk are the words whose distance to i is positive and less than d. Then the stable models of the program are the maximal codes with Hamming distance d. Add the rule

true ← m {w0, . . . , w2^n−1}

and every model containing true is a code of size at least m. To make the problem somewhat more tractable, we only consider codes that include the zero word.

The test results are tabulated below. The minimum, maximum, and average times are given in seconds and are calculated from ten runs on randomly shuffled instances of the program. All tests were run under Linux 2.2.6 on a 233MHz Pentium II with 128MB of memory. Results for d = 3:

Problem         Min    Max     Average
A(5, 3) ≥ 4     0.01   0.02    0.02
A(5, 3) < 5     0.00   0.02    0.02
A(6, 3) ≥ 8     0.02   0.04    0.03
A(6, 3) < 9     0.16   0.18    0.17
A(7, 3) ≥ 16    0.14   14.19   6.77
A(7, 3) < 17    69.08  72.29   70.55
A(8, 3) ≥ 20    6.39   202.41  55.98
A(8, 3) < 21    > 1 week
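The example code above can be checked mechanically; a small sketch (the helper name is our own):

```python
def hamming(x, y):
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")

# {00000, 00111, 11001, 11110} = {0, 7, 25, 30}: pairwise distance >= 3,
# witnessing A(5, 3) >= 4
code = [0b00000, 0b00111, 0b11001, 0b11110]
assert all(hamming(x, y) >= 3
           for i, x in enumerate(code) for y in code[i + 1:])
assert hamming(0b00111, 0b11110) == 3
```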
Results for d = 5:

Problem         Min      Max      Average
A(6, 5) ≥ 2     0.02     0.03     0.03
A(6, 5) < 3     0.02     0.03     0.02
A(7, 5) ≥ 2     0.05     0.07     0.06
A(7, 5) < 3     0.04     0.07     0.06
A(8, 5) ≥ 4     0.29     0.36     0.34
A(8, 5) < 5     2.64     2.75     2.71
A(9, 5) ≥ 6     3.18     8.71     4.81
A(9, 5) < 7     1127.03  1162.10  1145.85

6 Conclusion
We have presented some new and more expressive propositional rules for the stable model semantics. A decision procedure, which has been used as the basis for an efficient implementation, has also been described. We note that the decision problem for the extended semantics is NP-complete, as a proposed stable model can be tested in polynomial time. Accordingly, the exponential worst-case time complexity of the decision procedure comes as no surprise. The literals that smodels(P, A) can branch on are, in this paper, the literals that do not cover Atoms(P) − Atoms(A). In previous work, for instance in Niemelä and Simons [6,8], the eligible literals have also been required to appear in the form of not-atoms in the program. This additional restriction can reduce the search space, and a similar requirement is, of course, also possible here. The question of which literals one necessarily must consider as branch points is left to future research.
References

1. W.F. Dowling and J.H. Gallier. Linear-time algorithms for testing the satisfiability of propositional Horn formulae. Journal of Logic Programming, 3:267–284, 1984.
2. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Proceedings of the 5th International Conference on Logic Programming, pages 1070–1080, Seattle, USA, August 1988. The MIT Press.
3. K. Heljanko. Using logic programs with stable model semantics to solve deadlock and reachability problems for 1-safe Petri nets. In Tools and Algorithms for the Construction and Analysis of Systems, volume 1579 of Lecture Notes in Computer Science, pages 240–254, Amsterdam, The Netherlands, March 1999. Springer-Verlag.
4. V.W. Marek and M. Truszczyński. Stable models and an alternative logic programming paradigm. The Computing Research Repository, http://xxx.lanl.gov/archive/cs/, September 1998. cs.LO/9809032.
5. I. Niemelä. Logic programs with stable model semantics as a constraint programming paradigm. In Proceedings of the Workshop on Computational Aspects of Nonmonotonic Reasoning, pages 72–79. Research Report A52, Helsinki University of Technology, May 1998.
6. I. Niemelä and P. Simons. Efficient implementation of the well-founded and stable model semantics. In Proceedings of the 1996 Joint International Conference and Symposium on Logic Programming, pages 289–303, Bonn, Germany, September 1996. The MIT Press.
7. C. Sakama and K. Inoue. An alternative approach to the semantics of disjunctive logic programs and deductive databases. Journal of Automated Reasoning, 13:145–172, 1994.
8. P. Simons. Towards constraint satisfaction through logic programs and the stable model semantics. Research Report A47, Helsinki University of Technology, August 1997.
9. P. Simons. Smodels 2.10. http://www.tcs.hut.fi/pub/smodels/, 1999. A system for computing the stable models of logic programs.
10. P. Östergård, T. Baicheva, and E. Kolev. Optimal binary one-error-correcting codes of length 10 have 72 codewords. IEEE Transactions on Information Theory, 45(4):1229–1231, May 1999.
Stable Model Semantics of Weight Constraint Rules

Ilkka Niemelä¹, Patrik Simons¹, and Timo Soininen²

¹ Helsinki University of Technology, Dept. of Computer Science and Eng., Laboratory for Theoretical Computer Science, P.O. Box 5400, FIN-02015 HUT, Finland
{Patrik.Simons,Ilkka.Niemela}@hut.fi
² Helsinki University of Technology, TAI Research Center and Lab. of Information Processing Science, P.O. Box 9555, FIN-02015 HUT, Finland
[email protected]
Abstract. A generalization of logic program rules is proposed in which rules are built from weight constraints with type information for each predicate instead of simple literals. Such constraints are useful for concisely representing different kinds of choices as well as cardinality, cost and resource constraints in combinatorial problems such as product configuration. A declarative semantics for the rules is presented which generalizes the stable model semantics of normal logic programs. It is shown that for ground rules the complexity of the relevant decision problems stays in NP. The first implementation of the language handles a decidable subset where function symbols are not allowed. It is based on a new procedure for computing stable models of ground rules extending normal programs with choice and weight constructs, together with a compilation technique by which a weight rule with variables is transformed to a set of such simpler ground rules.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 317–331, 1999.
© Springer-Verlag Berlin Heidelberg 1999

1 Introduction

The implementation techniques for normal logic programs with the stable model semantics have advanced considerably in recent years. The performance of state-of-the-art implementations, e.g. the smodels system [12,13], is approaching the level needed in realistic applications. Recently, logic program rules with the stable model semantics have also been proposed as a methodology for expressing constraints capturing, for example, combinatorial, graph and planning problems; see, e.g., [9,11]. This indicates that interesting applications can be handled using normal programs and stable models. However, there are important aspects of combinatorial problems which do not seem to have a compact representation using normal rules. We explain these difficulties by first introducing the basic ideas behind the methodology of using rules for problem solving [9,11]. Then we examine a number of examples involving cardinality, cost and resource
318
I. Niemelä, P. Simons, and T. Soininen
constraints which are difficult to express using normal programs, i.e., programs consisting of rules without disjunction but with default negation in the body. On the basis of the examples we present an extension of normal rules in which a generalized notion of cardinality constraints is used and which is suitable for handling choices with cardinality, cost and resource constraints in the examples.

When solving, e.g., a combinatorial problem using the stable model semantics, the idea is to write a program such that the stable models of the program correspond to the solutions of the problem [9,11]. As an example consider the 3-coloring problem: given a graph, we can build a program where for each vertex v in the graph we take the three rules on the left and for each edge (v, u) the three rules on the right:

v(1) ← not v(2), not v(3)        ← v(1), u(1)
v(2) ← not v(1), not v(3)        ← v(2), u(2)
v(3) ← not v(1), not v(2)        ← v(3), u(3)
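The encoding is mechanical; a sketch that emits the rules of this program as text for a concrete graph (the rule strings use "<-" for "←", and the naming follows the text):

```python
def coloring_program(vertices, edges):
    """Generate the 3-coloring rules above as plain-text strings."""
    rules = []
    for v in vertices:
        for n in (1, 2, 3):
            others = [m for m in (1, 2, 3) if m != n]
            body = ", ".join(f"not {v}({m})" for m in others)
            rules.append(f"{v}({n}) <- {body}")          # choice of a color
    for v, u in edges:
        for n in (1, 2, 3):
            rules.append(f"<- {v}({n}), {u}({n})")       # no clash on an edge
    return rules

prog = coloring_program(["v", "u"], [("v", "u")])
assert len(prog) == 2 * 3 + 1 * 3
assert prog[0] == "v(1) <- not v(2), not v(3)"
assert "<- v(2), u(2)" in prog
```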
Now a stable model of the program, which is a set of atoms of the form v(n), gives a legal coloring of the graph where a node v is colored with the color n iff v(n) is included in the stable model. These kinds of logic programming encodings of various combinatorial, constraint satisfaction and planning problems can be found, e.g., in [9,11]. The encodings nicely demonstrate the expressivity of normal programs. However, there are a number of conditions which are hard to capture using normal programs. For example, in the product configuration domain [14], choices with cardinality, cost and resource constraints need to be handled. Next we consider some motivating examples demonstrating the difficulties and show that extending normal rules by a suitable notion of cardinality constraints is an interesting approach to handling the problems. By a cardinality constraint we mean an expression written in the form

L ≤ {a1, . . . , an, not b1, . . . , not bm} ≤ U .    (1)

The intuitive idea is that such a constraint is satisfied by any model (a set of atoms) for which the cardinality of the subset of the literals satisfied by the model is between the integers L and U. For example, the cardinality constraint 1 ≤ {a, not b, not c} ≤ 2 is satisfied by the model {a, b} but not by {a}. These kinds of cardinality constraints are useful in a number of settings, and rules extended with such constraints can be used to express different kinds of choices and cardinality restrictions. For example, vertex covers of size less than K can be captured in the following way. For a given graph, we build a program by including for each edge (v, u) a rule

1 ≤ {v, u} ←

and then adding an integrity constraint

← K ≤ {v1, . . . , vn}

where {v1, . . . , vn} is the set of vertices in the graph. The first rule expresses a choice saying that at least one end point of each edge should be selected, and
the second rule states a cardinality restriction saying that the cover must have size less than K. Now the stable models of the program directly represent the vertex covers of the graph. It seems that the choice rule cannot be expressed by normal rules without introducing additional atoms into the program, and that there is no compact encoding of the cardinality restriction using normal rules.

For applications it is important to be able to work with first-order rules having variables. Hence, this kind of cardinality constraint needs to be generalized to the first-order case, where the set on which the constraint is imposed can be given compactly using expressions with variables. Consider, e.g., the problem of capturing the cliques of a graph given by two relations vertex and edge, i.e., two sets of ground facts vertex(v) and edge(v, u) specifying the vertices and edges of the graph, respectively. The idea is to define the set of ground atoms in the constraint by attaching conditions to non-ground literals which are local to each constraint, i.e., using conditional literals, for example in the following way:

0 ≤ {clique(X) : vertex(X)} ←    (2)
where the set of atoms in the constraint consists of those instances of clique(v) for which vertex(v) holds. Such a rule chooses an arbitrary subset of the vertices as members of the clique. Cliques, i.e., subsets of vertices where each pair of vertices is connected by an edge, can be captured by including the rule

← clique(X), clique(Y ), not (X = Y ), not edge(X, Y ) .

It is also useful to allow both local and global variables in a rule. The scope of a local variable is one constraint, as for the variable X in (2), but the scope of a global variable is the whole rule. The first of the following rules, capturing the colorings of a graph, demonstrates the usefulness of this distinction:

1 ≤ {colored (V, C) : color (C)} ≤ 1 ← vertex (V )    (3)
← edge(V, U ), colored (V, C), colored (U, C)    (4)
Here V is a global variable in the first rule, stating the requirement that for each vertex v exactly one instance of colored(v, c) should be chosen such that color(c) holds for the term c. The set of facts color(c) provides the available colors. As the examples show, cardinality constraints are quite expressive and useful in practice. However, in applications such as product configuration [14] there are conditions which are hard to capture even using cardinality constraints. One important class is resource or cost constraints. A typical example is the knapsack problem, where the task is to choose a set of items ij, each having a weight wj and value vj, such that the sum of the weights of the chosen items does not exceed a given limit W but the sum of the values exceeds a given limit V. It turns out that these kinds of constraints can be captured by generalizing cardinality constraints in a suitable way, which becomes obvious by noticing that a cardinality constraint of the form (1) can be seen as a linear inequality

L ≤ a1 + · · · + an + b̄1 + · · · + b̄m ≤ U
where ai, bj are variables with values 0 or 1 such that x + x̄ = 1 for all variables x. We can generalize this by allowing a real-valued coefficient for each variable, i.e., a weight for each atom in the cardinality constraint. Hence we are considering constraints of the form

L ≤ {a1 = wa1, . . . , an = wan, not b1 = wb1, . . . , not bm = wbm} ≤ U    (5)
where, e.g., wa1 is a real-valued weight for the atom a1. The idea is that a stable model satisfies the constraint if the sum of the weights of the literals satisfied by the model is between L and U. For example, 1.02 ≤ {a = 1.0, b = 0.02, not c = 0.04} ≤ 1.03 is satisfied by {a, b, c} but not by {a}. Hence, a weight constraint of the form (5) corresponds to a linear inequality

L ≤ wa1 × a1 + · · · + wan × an + wb1 × b̄1 + · · · + wbm × b̄m ≤ U    (6)
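Both kinds of constraints can be checked by summing the weights of the satisfied literals; a cardinality constraint (1) is the special case where every weight is 1. A minimal sketch verifying the examples above (the tuple representation is our own, not the smodels input format):

```python
def weight(lits, model):
    """Sum the weights of the literals satisfied by the model.
    lits is a list of (sign, atom, weight), sign being "pos" or "not"."""
    return sum(w for sign, atom, w in lits
               if (atom in model) == (sign == "pos"))

def satisfies(lb, lits, ub, model):
    return lb <= weight(lits, model) <= ub

# cardinality constraint 1 <= {a, not b, not c} <= 2 (all weights 1)
card = [("pos", "a", 1), ("not", "b", 1), ("not", "c", 1)]
assert satisfies(1, card, 2, {"a", "b"})       # 2 literals satisfied
assert not satisfies(1, card, 2, {"a"})        # 3 literals satisfied

# weight constraint 1.02 <= {a = 1.0, b = 0.02, not c = 0.04} <= 1.03
wc = [("pos", "a", 1.0), ("pos", "b", 0.02), ("not", "c", 0.04)]
assert satisfies(1.02, wc, 1.03, {"a", "b", "c"})   # weight 1.02
assert not satisfies(1.02, wc, 1.03, {"a"})         # weight 1.04
```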
Using weight constraints, the knapsack problem can be captured by the following rules:

0 ≤ {i1 = w1, . . . , in = wn} ≤ W ←
← {i1 = v1, . . . , in = vn} ≤ V

In the light of the examples, it seems that weight constraints provide an expressive and uniform framework for handling large classes of combinatorial problems. In this paper we present a novel rule language which extends normal rules by taking weight constraints as the basic building blocks of rules. Hence, the extended rules, which we call weight rules, are of the form

C0 ← C1, . . . , Cn .    (7)
Here each Ci is a weight constraint

L ≤ {a1 : c1 = w1, . . . , an : cn = wn, not an+1 : cn+1 = wn+1, . . . , not am : cm = wm} ≤ U    (8)
where ai, ci are atomic formulae possibly containing variables. These kinds of constraints are a first-order generalization of weight constraints of the form (5). The weight rules are given a declarative nonmonotonic semantics that extends the stable model semantics of normal logic programs [4] and generalizes the propositional choice rules presented in [16] to the first-order case, where type information and weight constraints can be used. Unlike approaches based on associating priorities, preferences, costs, probabilities or certainty factors with rules (see, e.g., [1,8,10,6] and the references therein), our aim is to provide a relatively simple way of associating weights or costs with atoms and representing constraints using the weights. Approaches such as NP-SPEC [3], constraint logic programs (CLP) and constraint satisfaction problems are not based on stable model semantics like ours and thus do not include default negation. In addition, our semantics treats constraints, rules and choices uniformly, unlike
the CLP and NP-SPEC approaches. There is also some related work based on stable models. For example, in [2] priorities are added to integrity constraints. However, this is done to express weak constraints, of which as many as possible should be satisfied, not weight constraints, which must all be satisfied. In [5] several types of aggregates are integrated into Datalog in a framework based on stable models in order to express dynamic programming optimization problems. This contrasts with our approach, which is not primarily intended to capture optimization. In addition, their approach covers only the subclass of programs with stratified negation and choice constructs. Our approach also differs from the main semantics of disjunctive logic programs in that they are based on subset-minimal choices through disjunction, while we support a general notion of cardinality constraints.

The computational complexity of the decision problem for the language is analyzed and found to remain in NP for ground rules. The first implementation of the language handles a decidable subset of weight rules where function symbols are not allowed. Although the semantics of the language is based on real-valued weights, the implementation handles only integer weights in order to avoid problems arising from the finite precision of real number arithmetic. The implementation is based on the smodels-2 procedure [15], a new extended version of the smodels procedure [12,13]. It computes stable models for ground logic programs but supports several types of rules extending normal logic programs. Our language extends that handled by smodels-2 further: it is first-order with conditional literals, variables, and built-in functions; both upper and lower bounds of a constraint can be given; and a weight constraint is also allowed in the head of a rule. However, we show that it is possible to translate a set of weight rules containing variables to a set of simple ground rules supported by smodels-2. This provides the basis for our implementation.
2 Weight Constraint Rules
We extend logic program rules by allowing weight constraints of the type (8) with conditional literals that have real-valued weights. First we develop a semantics for ground rules, and then we show how to generalize it to rules with variables.

2.1 Ground Rules
The basic building block of a weight constraint is a conditional atom, an expression of the form p : q where the proper part p and the conditional part q are atomic formulae. In ground rules the formulae p and q are variable-free (ground) atoms. If q is ⊤, i.e., always valid, it is typically omitted. A conditional literal is a conditional atom or its negation, an expression of the form not p : q. Note that not is intended as nonmonotonic, default negation. A weight constraint C is an expression of the form

l(C) ≤ lit(C) ≤ u(C)
where lit(C) is a set of conditional literals and l(C), u(C) are two real numbers denoting the lower and upper bounds, respectively. The bounds l(C), u(C) can also be missing, in which case we take l(C) = −∞ and u(C) = ∞, respectively. To each constraint C we associate a local weight function w(C) from the set of literals in C to the real numbers, typically specified directly as in the constraint C below:

2.1 ≤ {p : d1 = 1.1, not q : d2 = 1.0001}

where, e.g., w(C)(not q) = 1.0001 and u(C) = ∞. The extension to allow < in the constraints is straightforward, but for brevity we discuss only ≤. Finally, a weight program is a set of weight rules, i.e., expressions of the form (7) where each Ci is a weight constraint and where the head C0 contains no negative literals.

Our semantics for weight rules generalizes the stable model semantics of normal logic programs and is given in terms of models that are sets of atoms. First we define when a model satisfies a rule and then, using this concept, the notion of stable models.

Definition 1. A set of atoms S satisfies a weight constraint C (S |= C) iff l(C) ≤ W(C, S) ≤ u(C) holds for the weight W(C, S) of C in S, where

W(C, S) = Σ_{p ∈ plit(C,S)} w(C)(p) + Σ_{not p ∈ nlit(C,S)} w(C)(not p)

with plit(C, S) = {p | p : q ∈ lit(C), {p, q} ⊆ S} and nlit(C, S) = {not p | not p : q ∈ lit(C), p ∉ S, q ∈ S}, which are the positive and negative literals satisfied by S, respectively. A rule r of the form (7) is satisfied by S (S |= r) iff S satisfies C0 whenever it satisfies C1, . . . , Cn.

We also allow integrity constraints, i.e., rules without the head constraint C0, which are satisfied if at least one of the body constraints C1, . . . , Cn is not.

Example 1. Consider the weight constraints

C1 : 2 ≤ {p : d1 = 1, not q : d1 = 2, r : d2 = 1.5} ≤ 5
C2 : 2 ≤ {p : d2 = 1, not q : d2 = 2, r : d1 = 1.5} ≤ 5

and a set of atoms S = {p, d1, r}. Now plit(C1, S) = {p} and nlit(C1, S) = {not q} and, hence, W(C1, S) = 1 + 2 = 3. Similarly, W(C2, S) = 1.5. Thus, S |= C1 but S ⊭ C2, and S |= C1 ← C2 but S ⊭ C2 ← C1. Moreover, S |= ← C1, C2 but S ⊭ ← C1.

We define stable models first for weight programs with non-negative weights. We then show how the general case, i.e., programs with negative weights, reduces to this case. In the definition we need the notion of the deductive closure of rules in a special form

P ← C1, . . . , Cn
where P is a ground atom and each weight constraint Ci contains only positive literals and non-negative weights, and has only a lower bound condition. We call such rules Horn weight rules. A set of atoms is closed under a set of rules if each rule is satisfied by the atom set. A set of Horn weight rules P has a unique smallest set of atoms closed under P. We call it the deductive closure and denote it by cl(P). Uniqueness is implied by the fact that Horn weight rules are monotonic, i.e., if the body of a rule is satisfied by a model S, then it is satisfied by any superset of S. Note that the closure can be constructed iteratively by starting from the empty set of atoms, iterating over the set of rules, and updating the set of atoms with the head of a rule not yet satisfied until no unsatisfied rules are left.

Example 2. Consider a set of Horn weight rules P

a ← 1 ≤ {a = 1}
b ← 0 ≤ {b = 100}
c ← 6 ≤ {b = 5, d = 1}, 2 ≤ {b = 2, a = 2}

The deductive closure of P is the set of atoms {b}, which can be constructed iteratively by starting from the empty set and observing that the body of the second rule is satisfied by the empty set and, hence, b should be added to the closure. The resulting set is already closed under the rules. If a rule

d ← 1 ≤ {a = 1, b = 1, c = 1}

is added, then the closure is {b, d, c}.

Stable models for programs with non-negative weights are defined in the following way using the concept of a reduct. The idea is to define a stable model of a program P as an atom set S that satisfies all rules of P and that is the deductive closure of a reduct of P w.r.t. S. The role of the reduct is to provide the possible justifications for the atoms in S. Each atom in a stable model is justified by the program P in the sense that it is derivable from the reduct. We introduce the reduct in two steps. First we define the reduct of a constraint and then generalize this to rules.

The reduct C^S of a constraint C with respect
to a set of atoms S is the constraint

L′ ≤ {p : q = w | p : q = w ∈ lit(C)}

where L′ = l(C) − Σ_{not p ∈ nlit(C,S)} w(C)(not p). Hence, in the reduct all negative literals and the upper bound are removed, and the lower bound is decreased by w for each not p : q = w satisfied by S. The idea is that for negative literals satisfied by S, their weights contribute to satisfying the lower bound. However, this does not yet capture the condition parts of the negative literals satisfied by S. In order to guarantee that the conditions are justified by the program, a set j(C, S) of justification constraints is used:

j(C, S) = {1 ≤ {q = 1} | not p : q = w ∈ lit(C), p ∉ S, q ∈ S}
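The weight W(C, S) of Definition 1 and the iterative closure construction used in Example 2 can both be sketched in a few lines. This is a hedged illustration with our own data structures, not the smodels input format; a conditional literal is (sign, proper atom, condition atom, weight), with condition None standing for the omitted, always-valid condition:

```python
def W(lits, S):
    """Weight of a constraint's literal set in the atom set S (Definition 1)."""
    total = 0
    for sign, p, q, w in lits:
        if q is not None and q not in S:
            continue                          # condition part not satisfied
        if (p in S) == (sign == "pos"):
            total += w                        # literal satisfied by S
    return total

# Example 1: S satisfies C1 (weight 3 in [2, 5]) but not C2 (weight 1.5)
C1 = [("pos", "p", "d1", 1), ("not", "q", "d1", 2), ("pos", "r", "d2", 1.5)]
C2 = [("pos", "p", "d2", 1), ("not", "q", "d2", 2), ("pos", "r", "d1", 1.5)]
S = {"p", "d1", "r"}
assert W(C1, S) == 3 and W(C2, S) == 1.5

# Horn weight rules: (head, [(lower_bound, {atom: weight}), ...]); the
# closure is built iteratively exactly as described before Example 2.
def cl(rules):
    closure, changed = set(), True
    while changed:
        changed = False
        for head, body in rules:
            if head not in closure and all(
                    sum(w for a, w in c.items() if a in closure) >= lb
                    for lb, c in body):
                closure.add(head)
                changed = True
    return closure

P = [("a", [(1, {"a": 1})]),
     ("b", [(0, {"b": 100})]),
     ("c", [(6, {"b": 5, "d": 1}), (2, {"b": 2, "a": 2})])]
assert cl(P) == {"b"}
assert cl(P + [("d", [(1, {"a": 1, "b": 1, "c": 1})])]) == {"b", "d", "c"}
```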
For example, for the constraint C:

3 ≤ {not p : q = 2, not r : p = 3, p : q = 1} ≤ 4

and the set S = {q}, we get the reduct and justification constraint

C^S = 1 ≤ {p : q = 1}    j(C, S) = {1 ≤ {q = 1}}

The reduct P^S of a program P w.r.t. a set of atoms S is a set of Horn weight rules which contains a rule r′ with an atom p as the head if p ∈ S, there is a rule r ∈ P such that p : q = w appears in the head of r with q ∈ S, and the upper bounds of the constraints in the body of r are satisfied by S. The condition q is moved to the body, as q is the justification condition for p, and the body of r′ is obtained by taking the reducts of the constraints in the body of r and adding the corresponding justification constraints. Formally, the reduct is defined as follows.

Definition 2. Let P be a weight program with non-negative weights and S a set of atoms. The reduct P^S of P w.r.t. S is defined by

P^S = {p ← 1 ≤ {q = 1}, C1^S, j(C1, S), . . . , Cn^S, j(Cn, S) | C0 ← C1, . . . , Cn ∈ P, p : q = w ∈ lit(C0), {p, q} ⊆ S, and W(Ci, S) ≤ u(Ci) for all i = 1, . . . , n}

Definition 3. Let P be a weight program with non-negative weights. Then S is a stable model of P iff the following two conditions hold: (i) S |= P, and (ii) S = cl(P^S).

Example 3. Consider first the program P1, which demonstrates the role of justification constraints:

0 ≤ {p : p = 2} ≤ 2 ←
2 ≤ {p = 2} ≤ 2 ← 2 ≤ {not q : p = 3}

The empty set is a stable model of P1 because it satisfies both rules and the reduct P1^∅ = ∅. For S = {p} the reduct P1^S is

p ← 1 ≤ {p = 1}
p ← −1 ≤ {}, 1 ≤ {p = 1}

Now cl(P1^S) = {}, implying that S is not a stable model although it satisfies P1. Consider next the program P2:

2 ≤ {b = 2, c = 3} ≤ 4 ← 2 ≤ {not a = 2, b = 4} ≤ 5

The definition of stable models guarantees that the atoms in a model must be justifiable by the program in terms of the reduct; thus, e.g., P2 cannot have a stable model containing a. The empty set is not a stable model, as {} ⊭ P2. The same holds for S = {b}, because the reduct P2^S is empty since the upper bound in the body is exceeded. However, S = {c} is a stable model, as S |= P2 and cl(P2^S) = {c}, where P2^S = {c ← 0 ≤ {b = 4}}.
Note that as there are no conditional literals, no justification constraints are needed.
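Definitions 2 and 3 can be sketched directly for this restricted case: ground rules with non-negative weights and without conditional literals, so that no justification constraints arise. This is our own illustrative encoding (integrity constraints are omitted for brevity); a constraint is (lb, literals, ub) with literals (sign, atom, weight), and a rule is (head constraint, list of body constraints):

```python
def weight(lits, S):
    return sum(w for sign, a, w in lits if (a in S) == (sign == "pos"))

def satisfied(con, S):
    lb, lits, ub = con
    return lb <= weight(lits, S) <= ub

def reduct(P, S):
    horn = []
    for head, body in P:
        # drop the rule if some upper bound in the body is exceeded
        if any(weight(lits, S) > ub for _, lits, ub in body):
            continue
        # reduct of a body constraint: keep the positive literals, lower the
        # bound by the weights of the negative literals satisfied by S
        rbody = [(lb - sum(w for s, a, w in lits if s == "not" and a not in S),
                  [(a, w) for s, a, w in lits if s == "pos"])
                 for lb, lits, ub in body]
        for s, a, w in head[1]:
            if s == "pos" and a in S:
                horn.append((a, rbody))
    return horn

def cl(horn):
    closure, changed = set(), True
    while changed:
        changed = False
        for h, body in horn:
            if h not in closure and all(
                    sum(w for a, w in lits if a in closure) >= lb
                    for lb, lits in body):
                closure.add(h)
                changed = True
    return closure

def stable(P, S):
    return all(satisfied(h, S) or not all(satisfied(c, S) for c in body)
               for h, body in P) and cl(reduct(P, S)) == S

# Example 3, program P2: 2 <= {b=2, c=3} <= 4  <-  2 <= {not a=2, b=4} <= 5
P2 = [((2, [("pos", "b", 2), ("pos", "c", 3)], 4),
       [(2, [("not", "a", 2), ("pos", "b", 4)], 5)])]
assert stable(P2, {"c"})        # reduct is {c <- 0 <= {b = 4}}, cl = {c}
assert not stable(P2, set())    # the empty set does not satisfy the rule
assert not stable(P2, {"b"})    # reduct empty: body upper bound exceeded
```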
Our definition is a generalization of the stable model semantics for normal programs, as a simple literal l in a normal program can be seen as a shorthand for 1 ≤ {l = 1} ≤ 1. Thus, e.g., a normal rule a ← b, not c is a shorthand for

1 ≤ {a = 1} ≤ 1 ← 1 ≤ {b = 1} ≤ 1, 1 ≤ {not c = 1} ≤ 1 .

The reduct of the rule w.r.t. S = {a, b} is a ← 1 ≤ {b = 1}, 0 ≤ {}, whose closure is {} and, hence, S is not a stable model of the rule although it satisfies the rule. We use this abbreviation frequently and, furthermore, we often omit the weight of a literal if it is 1.

Definition 3 does not cover constraints with negative weights. However, it turns out that these can be transformed to constraints with non-negative weights by simple linear algebraic manipulation, which translates a constraint C

L ≤ {a1 = wa1, . . . , an = wan, not b1 = wb1, . . . , not bm = wbm} ≤ U

to an equivalent form C′ with only non-negative weights:

L + Σ_{wai < 0} |wai| + Σ_{wbi < 0} |wbi|
  ≤ {ai = wai (for wai ≥ 0), not al = |wal| (for wal < 0),
     not bk = wbk (for wbk ≥ 0), bj = |wbj| (for wbj < 0)}
  ≤ U + Σ_{wai < 0} |wai| + Σ_{wbi < 0} |wbi|

where literals with negative weights are complemented and the sum of the absolute values of all negative weights is added to both bounds. The equivalence of C and C′ can be seen using the linear inequality (6) for C. We can eliminate any negative weight wai by adding

|wai| × (ai + āi) = |wai|    (since ai + āi = 1)

to the inequality. This leaves the term |wai| × āi in the middle, corresponding to not ai = |wai|. Similarly, all negative weights wbi can be eliminated.

Example 4. Consider the rule

a ← −1 ≤ {a = −4, not b = −1} ≤ 0

where we can eliminate the negative weights in the body using the method above. The resulting rule is

a ← 4 ≤ {not a = 4, b = 1} ≤ 5

Let S = {a}. Then the reduct of the resulting rule is {a ← 4 ≤ {b = 1}}. Hence, S is not a stable model of the rule.
We have already demonstrated the expressiveness of weight constraints by a number of examples in the introduction. Here we show how they capture propositional logic and the rule-based configuration language of [16].

Example 5. (i) We can reduce propositional satisfiability to the problem of finding a stable model in the following way, without introducing any additional atoms. Consider a set T of propositional clauses containing the atoms a1, . . . , ak. If we construct a program with a rule

0 ≤ {a1, . . . , ak} ←

together with a rule

← not a1, . . . , not an, an+1, . . . , am

for each clause a1 ∨ · · · ∨ an ∨ ¬an+1 ∨ · · · ∨ ¬am ∈ T, then the resulting program has a stable model iff T is satisfiable. Furthermore, each stable model corresponds directly to a propositional model (the atoms in the stable model are true and the other atoms are false).

(ii) Weight rules also generalize the rule-based configuration language of [16]. For example, an inclusive choice rule a | b | c ← d can be represented as

1 ≤ {a, b, c} ← d

and an exclusive choice rule a ⊕ b ⊕ c ← not d is captured by

1 ≤ {a, b, c} ≤ 1 ← not d

2.2 First-Order Rules
Now we consider the first-order case, where rules have variables. The semantics is obtained via Herbrand models. The Herbrand universe of a program is defined as usual, i.e., it consists of the terms constructible from the constants and functions appearing in the program. The Herbrand base is the set of ground atoms constructible from the predicate symbols and the Herbrand universe of the program. As noted in the introduction, it is useful to provide local variables for a constraint as well as global, i.e., universally quantified, variables for a rule. A constraint C with local variables X1, . . . , Xn is written

l(C) ≤ ⟨X1, . . . , Xn⟩ lit(C) ≤ u(C)

and the variables not local to a constraint are global. With this distinction we define the Herbrand instantiation of a weight program, which consists of all ground rules obtainable in the following way. First, each global variable in a rule is substituted with a ground term from the Herbrand universe. Now the rule contains only local variables. Then, for each constraint C, the set of literals in the ground instance of C is obtained by taking every substitution instance of the literals where the local variables are replaced by terms from the Herbrand universe. For example, in the rule

1 ≤ ⟨X⟩{p(X, Y ) : d(X, Y )} ≤ 1 ← q(Y )
Stable Model Semantics of Weight Constraint Rules
Y is a global variable and X is a local variable for the constraint in the head. If the Herbrand universe is {a, b}, the Herbrand instantiation of the rule is

1 ≤ {p(a, a) : d(a, a), p(b, a) : d(b, a)} ≤ 1 ← q(a)
1 ≤ {p(a, b) : d(a, b), p(b, b) : d(b, b)} ≤ 1 ← q(b)

With local and global variables many problems can be expressed quite succinctly. Consider, e.g., rule (3) for assigning colors to the vertices of a graph. Notice that it is not necessary to state the local variables of a constraint explicitly if we use the convention that all variables appearing in more than one constraint are global and all other variables are local. This convention is used in (3) and also in the rest of the paper. The stable models of a weight program with variables are defined using the Herbrand instantiation of the program.

Definition 4. Let P be a weight program with variables. Then a set of ground atoms S is a stable model of P iff it is a stable model of the Herbrand instantiation of P.

Note that the definition allows a fairly dynamic notion of weights, since a local weight function is associated with each constraint in every ground instance of a rule.
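The instantiation procedure just described is mechanical. A minimal Python sketch (ours, not the authors' implementation; the rule shape and the format-string encoding of bindings are our assumptions) grounds a rule of the shape 1 ≤ ⟨X⟩{p(X,Y) : d(X,Y)} ≤ 1 ← q(Y) over a given Herbrand universe:

```python
from itertools import product

def ground_rule(universe, global_vars, local_vars, literal, body):
    """Herbrand instantiation of a '1 <= <X>{lit} <= 1 :- body' rule:
    first substitute the global variables, then expand the conditional
    literal over all substitutions of the local variables."""
    rules = []
    for gvals in product(universe, repeat=len(global_vars)):
        env = dict(zip(global_vars, gvals))
        lits = []
        for lvals in product(universe, repeat=len(local_vars)):
            env.update(zip(local_vars, lvals))
            lits.append(literal.format(**env))
        rules.append("1 <= {%s} <= 1 :- %s"
                     % (", ".join(lits), body.format(**env)))
    return rules

rs = ground_rule(['a', 'b'], ['Y'], ['X'],
                 "p({X},{Y}) : d({X},{Y})", "q({Y})")
assert rs == ["1 <= {p(a,a) : d(a,a), p(b,a) : d(b,a)} <= 1 :- q(a)",
              "1 <= {p(a,b) : d(a,b), p(b,b) : d(b,b)} <= 1 :- q(b)"]
```

The output reproduces the two ground rules shown above for the universe {a, b}.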
3 Computational Aspects
Although weight programs considerably extend, e.g., normal logic programs, the computational complexity is unaffected, i.e., it stays in NP.

Theorem 1. The problem of deciding whether a ground weight program has a stable model is NP-complete.

Proof. NP-hardness follows from the fact that weight programs generalize normal logic programs under the stable model semantics, for which NP-completeness has been shown [7]. Containment in NP follows from the property that, given a set of atoms, it can be checked in polynomial time whether the set is a stable model of a given program. The crucial step here is the computation of the closure of the reduct, which can be done iteratively in polynomial time: start from the empty set of atoms S and iterate over the set of rules, updating S with the heads of the rules not yet satisfied, until no unsatisfied rules are left.
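The iterative closure computation used in the containment argument can be sketched as follows (our illustration; the rules of the reduct are taken here as plain definite rules (head, body)):

```python
def closure(rules):
    """Least set of atoms closed under definite rules (head, body):
    iterate over the rules, adding the head of any rule whose body
    already holds but whose head does not, until no unsatisfied rule
    is left.  At most |rules| passes, so polynomial time."""
    s = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in s and set(body) <= s:
                s.add(head)
                changed = True
    return s

assert closure([('a', []), ('b', ['a']), ('c', ['d'])]) == {'a', 'b'}
```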
3.1 Implementation
The full first-order case of weight rules is clearly undecidable. We have developed an implementation for a decidable subclass in which function symbols are not allowed. This subclass offers an interesting trade-off between expressiveness and implementability which seems adequate for many practical purposes. The implementation is based on the smodels-2 procedure [15] (available at
I. Niemelä, P. Simons, and T. Soininen
http://www.tcs.hut.fi/pub/smodels/) and a compilation technique in which a rule with variables is transformed to a set of simpler ground propagation rules. The smodels-2 procedure, a new extended version of the smodels procedure [12,13], computes stable models for ground logic programs but supports new types of rules that extend the normal rules. The implementation, available at http://www.tcs.hut.fi/smodels/lparse/, is a front-end that maps a general weight program to ground rules, from which the stable models of the original program are computed by the smodels-2 procedure. We first define the subclass of rules that our current implementation accepts and then explain the compilation technique. The current implementation works with domain-restricted rules, where each variable in a rule must appear in a positive domain predicate in the same rule, and for each conditional literal the condition part is a domain predicate. A domain predicate is one that is defined non-recursively using other domain predicates with normal logic program rules. This means that a domain predicate can appear as the head of a rule only when each constraint in the rule is a simple literal.

Example 6. Consider the rules (3–4) capturing the colorings of a graph. Assume that the predicates vertex, edge, and color are defined non-recursively, e.g., by a set of ground facts. Then they can be taken as domain predicates, and rule (3) is domain-restricted but (4) is not, because it contains a variable C not appearing in a domain predicate in the rule. Rule (4) can be transformed to a domain-restricted one by adding a domain predicate for C, e.g., as follows:

← edge(V, U), colored(V, C), colored(U, C), color(C)

It is straightforward to extend domain predicates with built-in functions and predicates, e.g., for arithmetic, and this extension is supported by our implementation. This allows rules such as

area(C, W ∗ L) ← width(C, W), length(C, L)        (9)
0 ≤ {circuit(C) : area(C, A) = A} ≤ 90 ←          (10)
where area is a domain predicate defined by the domain predicates width and length (giving the width and length of a circuit), and the second rule specifies a choice of a subset of circuits with the sum of the areas at most 90. Our implementation also allows expressing weights by rules involving domain predicates, as in the example. In order to avoid complications arising from the finite precision of real-number arithmetic, our current implementation supports only integer weights. Domain-restrictedness enables efficient compilation of a program with variables to a set of ground rules which is typically considerably smaller than the Herbrand instantiation of the program but still has exactly the same stable models. This is because for the set of domain predicates D there is a unique set D0 of ground instances of predicates in D that is common to all the stable models of the program. The set D0 can be computed efficiently using database techniques, because domain predicates are similar to view definitions in databases. The set D0 has the property that program P has the same stable models as
PD0, where PD0 contains those ground instances of rules in P in which each ground instance of a domain predicate is in D0. Furthermore, given D0, PD0 can be computed efficiently by processing one rule in P at a time. As an example, consider the program P with rules (9–10) and the predicates width and length given as the set of facts F = {width(c1, 5), length(c1, 10), width(c2, 3), length(c2, 30)}. Our implementation detects automatically that width, length, and area can be taken as domain predicates and that the rules are domain-restricted. It computes the set D0 = F ∪ {area(c1, 50), area(c2, 90)} and from that PD0, where, e.g., for (10) only one ground instance is included:

0 ≤ {circuit(c1) : area(c1, 50) = 50, circuit(c2) : area(c2, 90) = 90} ≤ 90 ←

The whole compilation works as follows. Given a program P, the domain predicates D are determined and the set D0 is computed. Then the set PD0 is constructed and the condition parts of the literals are removed. Finally, the set of ground rules obtained in this way is transformed to a set of simpler rules accepted by smodels-2. We finish the section by explaining this last phase. The smodels-2 procedure supports many types of extended rules, of which we employ two: choice and weight rules. A choice rule

{h1, ..., hk} ← a1, ..., an, not b1, ..., not bm

states that a subset of {h1, ..., hk} is in a stable model if the body is satisfied by the model. A weight rule

h ← {a1 = wa1, ..., an = wan, not b1 = wb1, ..., not bm = wbm} ≥ w

with positive weights wai, wbi implies the inclusion of the head h into a stable model S whenever Σ_{ai ∈ S} wai + Σ_{bi ∉ S} wbi ≥ w. A weight constraint rule C0 ← C1, ..., Cn is encoded as rules handled by smodels-2 as follows. For each constraint Ci

l(Ci) ≤ {ci1 = w(Ci)(ci1), ..., cik = w(Ci)(cik)} ≤ u(Ci)

we construct two weight rules encoding whether the lower bound is satisfied (Cil) and whether the upper bound is violated (Ciu):

Cil ← {ci1 = w(Ci)(ci1), ..., cik = w(Ci)(cik)} ≥ l(Ci)
Ciu ← {ci1 = w(Ci)(ci1), ..., cik = w(Ci)(cik)} > u(Ci)

where Cil, Ciu are new atoms. Since only integer weights are allowed in the implementation, the latter rule can be expressed using '≥' instead of '>' by increasing u(Ci) by one. Then we add a choice rule and two normal rules

{c01, ..., c0k} ← C1l, not C1u, ..., Cnl, not Cnu
← not C0l, C1l, not C1u, ..., Cnl, not Cnu
← C0u, C1l, not C1u, ..., Cnl, not Cnu
where the first rule selects a subset of {c01, ..., c0k} and the two other rules enforce that the lower and upper bounds of the head of the rule hold whenever the body of the rule is satisfied. Finally, negative weights are eliminated as described in Section 2.

Example 7. To give an example, the weight constraint rule

1 ≤ {a, b} ≤ 1 ← 1 ≤ {a, b, not c} ≤ 2

is translated into the program

C0l ← {a = 1, b = 1} ≥ 1
C0u ← {a = 1, b = 1} > 1
C1l ← {a = 1, b = 1, not c = 1} ≥ 1
C1u ← {a = 1, b = 1, not c = 1} > 2
{a, b} ← C1l, not C1u
← not C0l, C1l, not C1u
← C0u, C1l, not C1u
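As a sanity check of this translation, the sketch below (ours; it checks only the bound conditions on candidate atom sets, not the full stability test) verifies that, on every subset of {a, b, c}, the bounds of the original rule and the bound atoms of the translated rules agree:

```python
from itertools import combinations

def wsum(s, pos, neg):
    """Value of a weight expression {p=1, ..., not n=1, ...} in set s."""
    return sum(p in s for p in pos) + sum(n not in s for n in neg)

def original_ok(s):
    """Bound conditions of  1 <= {a,b} <= 1  <-  1 <= {a,b,not c} <= 2."""
    body = 1 <= wsum(s, ['a', 'b'], ['c']) <= 2
    head = wsum(s, ['a', 'b'], []) == 1
    return (not body) or head

def translated_ok(s):
    """Same conditions via the new atoms C0l, C0u, C1l, C1u and the two
    constraints  <- not C0l, C1l, not C1u  and  <- C0u, C1l, not C1u."""
    c0l = wsum(s, ['a', 'b'], []) >= 1
    c0u = wsum(s, ['a', 'b'], []) > 1
    c1l = wsum(s, ['a', 'b'], ['c']) >= 1
    c1u = wsum(s, ['a', 'b'], ['c']) > 2
    body = c1l and not c1u
    return (not body) or (c0l and not c0u)

subsets = [set(c) for r in range(4) for c in combinations('abc', r)]
assert all(original_ok(s) == translated_ok(s) for s in subsets)
```

For instance, the empty set violates both versions (the body holds but the head's lower bound fails), while {a} satisfies both.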
In order to give an idea of the performance of the implementation we provide some test results for the pigeonhole problem. In Figure 1 running times (w-rules) are shown for deciding that n + 1 pigeons cannot be put into n holes using the program on the left for n = 8, 9, 10 (where pigeons and holes are given as facts p(i)/h(j)). The results compare favorably to those (n-rules) for solving the same problems using normal logic programs as described in [11].
1 ≤ {in(P, H) : h(H)} ≤ 1 ← p(P)
← 2 ≤ {in(P, H) : p(P)}, h(H)
pigeons/holes   w-rules   n-rules
9/8             2.4 s     25.1 s
10/9            22 s      258 s
11/10           225 s     2600 s
Fig. 1. The pigeonhole problem and test results for it.
4 Conclusions
We have presented a novel rule language extending normal logic programs with conditional and weighted literals and weight constraints. The declarative semantics of the language generalizes the stable model semantics of normal programs. Despite the extensions, the complexity of finding a stable model remains in NP for the ground case. An implementation of a computationally attractive and useful subset of the language based on the smodels-2 procedure is described. The language seems to be particularly suitable for product configuration problems, and an interesting topic for further research is to apply the language in such problems along the lines presented in [16].

Acknowledgements. The work of the first and second authors has been funded by the Academy of Finland (Project 43963), that of the second and third authors by the Helsinki Graduate School in Computer Science and Engineering, and that of the third author by the Technology Development Centre Finland. We thank Tommi Syrjänen for implementing the front-end to the smodels-2 procedure.
References

1. G. Brewka and T. Eiter. Preferred answer sets for extended logic programs. In Principles of Knowledge Representation and Reasoning: Proceedings of the Sixth International Conference, pages 86–97, 1998.
2. F. Buccafurri, N. Leone, and P. Rullo. Strong and weak constraints in disjunctive datalog. In Proceedings of the 4th International Conference on Logic Programming and Non-Monotonic Reasoning, pages 2–17, 1997.
3. M. Cadoli, L. Palopoli, A. Schaerf, and D. Vasile. NP-SPEC: An executable specification language for solving all problems in NP. In Practical Aspects of Declarative Languages, LNCS 1551, pages 16–30, 1999.
4. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Proceedings of the 5th International Conference on Logic Programming, pages 1070–1080, 1988.
5. S. Greco. Dynamic programming in Datalog with aggregates. IEEE Transactions on Knowledge and Data Engineering, 11(2):265–283, 1999.
6. J. Lu, A. Nerode, and V. Subrahmanian. Hybrid knowledge bases. IEEE Transactions on Knowledge and Data Engineering, 8(5):773–785, 1996.
7. W. Marek and M. Truszczyński. Autoepistemic logic. Journal of the ACM, 38:588–619, 1991.
8. W. Marek and M. Truszczyński. Logic programming with costs. Manuscript, available at http://www.cs.engr.uky.edu/~mirek/papers.html, 1999.
9. W. Marek and M. Truszczyński. Stable models and an alternative logic programming paradigm. In The Logic Programming Paradigm: a 25-Year Perspective, pages 375–398. Springer-Verlag, 1999.
10. R. Ng and V. Subrahmanian. Stable semantics for probabilistic deductive databases. Information and Computation, 110:42–83, 1994.
11. I. Niemelä. Logic programs with stable model semantics as a constraint programming paradigm. In Proceedings of the Workshop on Computational Aspects of Nonmonotonic Reasoning, pages 72–79, 1998. Available at http://www.tcs.hut.fi/pub/reports/A52abstract.html. An extended version is to be published in Annals of Mathematics and Artificial Intelligence.
12. I. Niemelä and P. Simons. Efficient implementation of the well-founded and stable model semantics. In Proceedings of the Joint International Conference and Symposium on Logic Programming, pages 289–303, 1996.
13. I. Niemelä and P. Simons. Smodels – an implementation of the stable model and well-founded semantics for normal logic programs. In Proceedings of the 4th International Conference on Logic Programming and Non-Monotonic Reasoning, pages 420–429, 1997.
14. D. Sabin and R. Weigel. Product configuration frameworks – a survey. IEEE Intelligent Systems & Their Applications, pages 42–49, July/August 1998.
15. P. Simons. Extending the stable model semantics with more expressive rules. In Proceedings of the 5th International Conference on Logic Programming and Non-Monotonic Reasoning, 1999.
16. T. Soininen and I. Niemelä. Developing a declarative rule language for applications in product configuration. In Practical Aspects of Declarative Languages, LNCS 1551, pages 305–319, 1999.
Towards First-Order Nonmonotonic Reasoning

Riccardo Rosati
Dipartimento di Informatica e Sistemistica
Università di Roma "La Sapienza"
Via Salaria 113, I-00198 Roma, Italy
[email protected]
Abstract. We investigate the problem of reasoning in nonmonotonic extensions of first-order logic. In particular, we study reasoning in firstorder MKNF, the modal logic of minimal knowledge and negation as failure introduced by Lifschitz. MKNF can be considered as a unifying framework for several nonmonotonic formalisms, including default logic, autoepistemic logic, circumscription, and logic programming. By suitably extending deduction methods for propositional nonmonotonic logics, we define techniques for reasoning in significant subsets of first-order MKNF, which allow for characterizing decidable fragments of first-order nonmonotonic modal logics. Due to the expressive abilities of MKNF, such techniques can be seen as general reasoning methods for several nonmonotonic formalisms based on first-order logic. We also analyze the relationship between such decidable fragments of MKNF and disjunctive Datalog.
1 Introduction
In recent years the computational aspects of nonmonotonic reasoning have been extensively studied (see, e.g., [2,7,19,5]). However, while reasoning in propositional nonmonotonic formalisms has been exhaustively analyzed, yielding a detailed picture of its computational properties, the problem of reasoning in nonmonotonic extensions of first-order logic has not been thoroughly investigated. In particular, reasoning in nonmonotonic modal extensions of first-order logic has been only partially studied: among the most important studies in this direction, it is worth mentioning [12] and [21], which analyze reasoning in a fragment of first-order autoepistemic logic (without "quantifying-in"), and [13], which provides a computational study of a framework in which a modal logic of "only knowing" is added to a formal model of limited reasoning. In this paper we study reasoning in first-order MKNF, the modal logic of minimal knowledge and negation as failure [15,16]. This logic is built by adding to first-order logic two distinct modalities, a "minimal knowledge" modality K and a "negation as failure" modality not. The logic thus obtained is characterized in terms of a nice model-theoretic semantics. MKNF has been used to give a declarative semantics to very general classes of logic programs

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 332–346, 1999.
© Springer-Verlag Berlin Heidelberg 1999
[17,25,10], which generalize the stable model/answer set semantics of negation as failure in logic programming [6]. Due to its ability to express many features of nonmonotonic logics, MKNF is generally considered a unifying framework for several nonmonotonic formalisms, including default logic, autoepistemic logic, circumscription, epistemic queries, and logic programming (see [16,25]). Known reasoning techniques for MKNF only concern its propositional modal fragment [22] and a very restricted subset of the first-order case [1]. By suitably extending deduction techniques for propositional nonmonotonic modal logics, we define techniques for reasoning in significant subsets of first-order MKNF, which allow for characterizing decidable fragments of first-order nonmonotonic modal logics. Specifically, we first prove that extending a decidable subset of first-order logic with the ability to reason about minimal knowledge and negation as failure preserves decidability of reasoning, as long as quantifying-in, i.e., the presence of modalities inside quantifiers, is not allowed. Moreover, we define a general method for reasoning in MKNF theories without quantifying-in. To the best of our knowledge, such an algorithm is the first terminating procedure for reasoning about minimal knowledge and negation as failure in any decidable fragment of first-order logic. Then, we deal with the case of quantifying-in, and show that it is possible to reduce reasoning in MKNF theories with a restricted form of quantifying-in to reasoning without quantifying-in, through a pre-processing step which eliminates the quantifiers that occur outside modal operators. Interestingly, this technique is based on a notion of safeness for quantifying-in which can be seen as a generalization of the notion of rule safeness in deductive databases.
Besides its theoretical interest, the importance of defining reasoning techniques (and establishing decidability results) for fragments of first-order MKNF lies in the fact that, due to the remarkable expressive abilities of this logic, such techniques can be seen as general reasoning methods able to deal with first-order extensions of the most important nonmonotonic formalisms. The paper is structured as follows. In the next section we briefly recall the logic MKNF. In Section 3, we study reasoning in the quantifying-in-free fragment of first-order MKNF, while in Section 4 we deal with MKNF theories with quantifying-in. Finally, in Section 5 we relate quantifying-in in first-order MKNF with (disjunctive) Datalog. Due to space limitations, we omit formal proofs of theorems in the present version of the paper.
2 The Logic MKNF
In this section we briefly recall the logic MKNF [16], which is a modal logic with two epistemic operators: a "minimal knowledge" modality K and an "autoepistemic assumption" modality A.¹ We use L to denote the set of first-order sentences built in the usual way upon the connectives ∧, ¬ and the existential quantifier ∃ (the symbols ∨, ⊃ and the
¹ Following [18], we use the modality A, corresponding to ¬not in Lifschitz's original proposal.
universal quantifier ∀ are used as abbreviations), an infinite set of variables, an infinite set A of propositional symbols, an infinite set of predicate symbols of every arity, and an infinite set of function symbols. We assume that A contains the symbols true, false. We call objective any sentence from L. We call interpretation a usual first-order interpretation for L. An interpretation is also called a world. For each interpretation I, I(true) = TRUE and I(false) = FALSE. The evaluation I(ϕ) of a sentence ϕ in an interpretation I is defined in the usual way. We say that a sentence ϕ ∈ L is satisfiable if there exists an interpretation I such that I(ϕ) = TRUE (which we also denote as I |= ϕ).

Definition 1. We denote as LM the modal extension of L with the modalities K and A, inductively defined as follows:
1. if ϕ ∈ L, then ϕ ∈ LM;
2. if ϕ ∈ LM, then ¬ϕ ∈ LM;
3. if ϕ1 ∈ LM and ϕ2 ∈ LM, then ϕ1 ∧ ϕ2 ∈ LM;
4. if ϕ ∈ LM, then Kϕ ∈ LM and Aϕ ∈ LM;
5. nothing else belongs to LM.

Hence, LM denotes a quantifying-in-free first-order bimodal language. We call a modal formula ϕ from LM a K-formula (resp. A-formula) if the modality A (resp. K) does not occur in ϕ, and denote as LK the set of K-formulas from LM. Hence, ordinary first-order formulas are both K-formulas and A-formulas. We now recall the notion of MKNF model. Satisfiability of a formula in a structure (I, Mk, Ma), where I is an interpretation (also called the initial world) and Mk, Ma are sets of interpretations (worlds), is defined inductively as follows:
1. if ϕ is an objective sentence, ϕ is satisfied by (I, Mk, Ma) iff ϕ is satisfied by I;
2. ¬ϕ is satisfied by (I, Mk, Ma) iff ϕ is not satisfied by (I, Mk, Ma);
3. ϕ1 ∧ ϕ2 is satisfied by (I, Mk, Ma) iff ϕ1 is satisfied by (I, Mk, Ma) and ϕ2 is satisfied by (I, Mk, Ma);
4. Kϕ is satisfied by (I, Mk, Ma) iff, for every J ∈ Mk, ϕ is satisfied by (J, Mk, Ma);
5. Aϕ is satisfied by (I, Mk, Ma) iff, for every J ∈ Ma, ϕ is satisfied by (J, Mk, Ma).

An MKNF structure is a pair of sets of interpretations (Mk, Ma). We say that a theory Σ ⊆ LM is satisfied by an MKNF structure (Mk, Ma) (and write (Mk, Ma) |= Σ) iff, for each I ∈ Mk, each formula from Σ is satisfied by (I, Mk, Ma). If ϕ is a K-formula, then its evaluation is insensitive to the set Ma, and we write Mk |= ϕ. In order to relate MKNF structures to standard interpretation structures in modal logic (i.e., Kripke structures), we remark that, due to the above notion of satisfiability, we can consider the sets Mk, Ma in an MKNF interpretation structure as two distinct universal Kripke structures, i.e., possible-world structures in which each world is connected to all worlds of the structure. In fact,
since the accessibility relation in such a structure is universal, without loss of generality it is possible to identify a universal Kripke structure with the set of interpretations contained in it. The nonmonotonic character of MKNF is obtained by imposing the following preference semantics over the interpretation structures satisfying a given theory.

Definition 2. A structure (M, M), where M ≠ ∅, is an MKNF model of a theory Σ ⊆ LM iff (M, M) |= Σ and, for each set of interpretations M′, if M′ ⊃ M then (M′, M) ⊭ Σ.

We say that a formula ϕ is entailed (or logically implied) by Σ in MKNF (and write Σ |=MKNF ϕ) iff ϕ is satisfied by every MKNF model of Σ. In order to simplify notation, we denote the MKNF model (M, M) by M.

Example 1. Let ϕ be a satisfiable first-order sentence, and let Σ = {Kϕ}. The only MKNF model of Σ is M = {I : I |= ϕ}. Hence, Σ |=MKNF Kϕ, and Σ |=MKNF ¬Kψ for each ψ ∈ L such that ϕ ⊃ ψ is not a valid first-order sentence. Therefore, the agent modeled by Σ has minimal knowledge, in the sense that she only knows ϕ and the objective knowledge logically implied by ϕ.

Example 2. Let ϕ be a satisfiable first-order sentence, and let

Σ = {¬A(∀x.p(x)) ⊃ Kϕ}

It is easy to see that the only MKNF model of Σ is M = {I : I |= ϕ}, since ∀x.p(x) can be assumed not to hold by the agent modeled by Σ, which is then able to conclude Kϕ. Notably, the meaning of Σ is analogous to the default rule : ∃x.¬p(x) / ϕ in Reiter's default logic [16]. Also, let

Σ = {K bird ∧ ¬A¬flies ⊃ K flies, K bird}

In a way analogous to the previous case, it can be shown that the only MKNF model of Σ is M = {I : I |= bird ∧ flies}. Therefore, Σ |=MKNF K flies. As shown in [16], Σ corresponds to the default theory

({ bird : flies / flies }, {bird})

It turns out that, when restricting to theories composed of K-formulas, MKNF corresponds to the modal logic of minimal knowledge due to Halpern and Moses [9], also known as ground nonmonotonic modal logic S5G [11,4]. Moreover, it has been shown [18,22] that, under the restriction that the theory be composed of A-formulas, MKNF corresponds to Moore's autoepistemic logic [20]. Consequently, the logic MKNF can be interpreted as a generalization of both Halpern and Moses' logic of minimal knowledge and Moore's autoepistemic logic.
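For the propositional fragment, the satisfaction clauses of Section 2 and the preference of Definition 2 can be tested by brute force over a finite set of worlds. The sketch below (ours, not from the paper; formulas are nested tuples over atom strings, worlds are frozensets of true atoms) reproduces Example 1 for ϕ = p:

```python
from itertools import combinations

def holds(phi, i, mk, ma):
    """Satisfaction of phi in (I, Mk, Ma), per the five clauses of Sect. 2.
    Formulas: atom 'p', ('not', f), ('and', f, g), ('K', f), ('A', f)."""
    if isinstance(phi, str):
        return phi in i
    op = phi[0]
    if op == 'not':
        return not holds(phi[1], i, mk, ma)
    if op == 'and':
        return holds(phi[1], i, mk, ma) and holds(phi[2], i, mk, ma)
    assert op in ('K', 'A')
    worlds = mk if op == 'K' else ma
    return all(holds(phi[1], j, mk, ma) for j in worlds)

def mknf_models(sigma, atoms):
    """Brute-force MKNF models of a theory sigma per Definition 2:
    nonempty M with (M, M) |= sigma and no strict superset M'
    such that (M', M) |= sigma."""
    worlds = [frozenset(w) for r in range(len(atoms) + 1)
              for w in combinations(atoms, r)]
    cands = [frozenset(m) for r in range(1, len(worlds) + 1)
             for m in combinations(worlds, r)]
    def sat(mk, ma):  # (Mk, Ma) |= sigma
        return all(holds(f, i, mk, ma) for f in sigma for i in mk)
    return [m for m in cands
            if sat(m, m) and not any(sat(mp, m) for mp in cands if m < mp)]

# Example 1 with phi = p: the unique MKNF model is the set of all
# worlds satisfying p
ms = mknf_models([('K', 'p')], ['p', 'q'])
assert ms == [frozenset({frozenset({'p'}), frozenset({'p', 'q'})})]
```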
3 Reasoning Without Quantifying-In
In this section we study reasoning in MKNF theories without quantifying-in. Specifically, we first present a finite characterization of the models of a sentence Σ ∈ LM,² based on the use of partitions of the modal sentences occurring in Σ. Based on such a characterization, we are then able to define an algorithm for reasoning in the quantifying-in-free fragment of MKNF, showing that extending a decidable fragment of first-order logic with minimal knowledge and negation as failure preserves decidability of reasoning.
3.1 Characterizing MKNF Models
We now present a finite characterization of the MKNF models of a formula Σ ∈ LM. This characterization extends the one defined in [22] for the propositional fragment of MKNF, and is based on a correspondence between the preferred models of a theory and the partitions of the set of modal subformulas of the theory. In fact, such partitions can be used to provide a finite characterization of a universal Kripke structure: specifically, a partition satisfying certain properties identifies a particular universal Kripke structure M, by uniquely determining an objective theory such that M is the set of all interpretations satisfying that theory. First, we introduce some preliminary definitions. Following [19], we call modal atom a formula of the form Kϕ or Aϕ, with ϕ ∈ LM.

Definition 3. Given a formula Σ ∈ LM, we call modal atoms of Σ (and denote as MA(Σ)) the set composed of the formula KΣ and the set of modal atoms occurring in Σ.

Following [8], we say that an occurrence of a subformula ψ in a formula ϕ ∈ LM is strict if it does not lie within the scope of a modal operator. E.g., let Σ = Kϕ ∧ A(Kψ ∨ ξ). The occurrence of Kϕ in Σ is strict, while the occurrence of Kψ is not.

Definition 4. Let Σ ∈ LM and let (P, N) be a partition of a set of modal atoms. We denote as Σ(P, N) the sentence obtained from Σ by substituting each strict occurrence in Σ of a sentence in P with true, and each strict occurrence in Σ of a sentence in N with false.

Observe that only the occurrences in Σ of modal subformulas which are not within the scope of another modality are replaced; notice also that, if P ∪ N contains MA(Σ), then Σ(P, N) is an objective sentence. In this case, the pair (P, N) identifies a guess on the modal subformulas of Σ, i.e., P contains the modal subformulas of Σ assumed to hold, while N contains the modal subformulas of Σ assumed not to hold.
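The substitution Σ(P, N) of Definition 4 replaces only strict occurrences, i.e., it stops descending at the first modality. A small sketch (ours; propositional formulas as nested tuples over atom strings):

```python
def subst(phi, p, n):
    """Sigma(P, N) of Definition 4: replace each STRICT occurrence of a
    modal atom in p by True and in n by False; do not descend below a
    modality.  Formulas: atom 'a', True/False, ('not', f), ('and', f, g),
    ('K', f), ('A', f)."""
    if phi in p:
        return True
    if phi in n:
        return False
    if isinstance(phi, tuple) and phi[0] in ('not', 'and'):
        return (phi[0],) + tuple(subst(x, p, n) for x in phi[1:])
    return phi  # atoms, constants, and modal atoms outside P ∪ N

sigma = ('and', ('K', 'a'), ('not', ('A', ('K', 'b'))))
p = {('K', 'a'), ('K', 'b')}
n = {('A', ('K', 'b'))}
# ('K', 'b') also belongs to p, but its occurrence inside the A-atom is
# not strict, so it is left untouched
assert subst(sigma, p, n) == ('and', True, ('not', False))
```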
² From now on we only deal with finite MKNF theories; hence we restrict our attention to single MKNF formulas, since a finite theory corresponds to the formula obtained by conjoining all formulas in the theory.
Definition 5. Let Σ ∈ LM and let (P, N) be a partition of MA(Σ). We denote as ob(P, N) the objective sentence

ob(P, N) = ⋀_{Kϕ ∈ P} ϕ(P, N)
Roughly speaking, the objective sentence ob(P, N) represents the "objective knowledge" implied by the guess (P, N) on the sentences of the form Kϕ belonging to P: in fact, ob(P, N) corresponds to the objective knowledge implied on M in each structure (M, M′) satisfying the guess on the modal atoms given by (P, N), since in each such structure the objective sentence ob(P, N) is satisfied in each interpretation J ∈ M, i.e., J |= ob(P, N).

Example 3. Let

Σ = (K(∃x.p1(x)) ∨ ¬A(p2(a) ∧ p3(b, c))) ∧ ((∀x.¬p4(x)) ∨ ¬K(¬p5(d) ∨ p6(e, f, g)))

Then, MA(Σ) = {KΣ, K(∃x.p1(x)), A(p2(a) ∧ p3(b, c)), K(¬p5(d) ∨ p6(e, f, g))}. Now, let

P = {KΣ, K(∃x.p1(x))}
N = {K(¬p5(d) ∨ p6(e, f, g)), A(p2(a) ∧ p3(b, c))}

Then, ob(P, N) = (true ∨ ¬false) ∧ ((∀x.¬p4(x)) ∨ ¬false) ∧ (∃x.p1(x)), which is equivalent to ∃x.p1(x).

Definition 6. We say that a pair of sets of interpretations (M, M′) induces the partition (P, N) of MA(Σ) if, for each modal atom ξ ∈ MA(Σ), ξ ∈ P iff (M, M′) |= ξ.

It can be shown that, if M is an MKNF model of Σ which induces the partition (P, N) of MA(Σ), then the sentence ob(P, N) completely characterizes the set of interpretations M.

Theorem 1. Let Σ ∈ LM, let M be an MKNF model of Σ, and let (P, N) be the partition of MA(Σ) induced by (M, M). Then, M = {J : J |= ob(P, N)}.

Informally, the above theorem states that each MKNF model of Σ can be associated with a partition (P, N) of the modal atoms of Σ; moreover, the objective sentence ob(P, N) exactly characterizes the set of interpretations of an MKNF model M, in the sense that M is the set of all interpretations satisfying ob(P, N). This provides a finite characterization of all MKNF models of Σ.

Definition 7. Let Σ ∈ LM, ϕ1, ϕ2 ∈ L. We denote as prt(Σ, ϕ1, ϕ2) the partition of MA(Σ) induced by (M1, M2), where

M1 = {I : I |= ϕ1}
M2 = {I : I |= ϕ2}
In order to simplify notation, we denote as prt(Σ, ϕ) the partition prt(Σ, ϕ, ϕ). The following theorem establishes a correspondence between the MKNF models of a sentence Σ ∈ LM and the partitions of MA(Σ) satisfying a given set of conditions.

Theorem 2. Let Σ ∈ LM. Then, Σ is MKNF-satisfiable iff there exists a partition (P, N) of MA(Σ) which satisfies the following conditions:
(a) (P, N) = prt(Σ, ob(P, N));
(b) KΣ ∈ P;
(c) for each partition (P′, N′) ≠ (P, N) of MA(Σ), at least one of the following conditions holds:
    (c1) KΣ ∈ N′;
    (c2) (P′, N′) ≠ prt(Σ, ob(P′, N′), ob(P, N));
    (c3) the objective sentence ob(P, N) ∧ ¬ob(P′, N′) is satisfiable.

Notice that the above theorem provides a finite, sufficient set of conditions which allows for identifying all MKNF models of a formula Σ ∈ LM.
3.2 Reasoning Method
As for effective methods for reasoning in first-order MKNF, we recall that MKNF-satisfiability in unrestricted LM is not a decidable problem, since establishing MKNF-satisfiability of objective sentences corresponds to solving the satisfiability problem for full first-order logic. However, the characterization provided by Theorem 2 allows for the definition of an algorithm for reasoning in subsets of LM built upon decidable fragments of first-order logic. In the following, we say that a language L′ ⊆ L is closed under boolean composition if, for each ϕ1, ϕ2 ∈ L′, ϕ1 ∧ ϕ2 ∈ L′ and ¬ϕ1 ∈ L′. Moreover, we denote as L′M the subset of LM built upon L′, i.e., the modal extension of L′ obtained according to Definition 1. The following lemma provides a constructive way to build the partition prt(Σ, ϕ, ψ), given the formulas Σ, ϕ, ψ.

Lemma 1. Let Σ ∈ LM, ϕ, ψ ∈ L. Let (P, N) be the partition of MA(Σ) built as follows:
1. start from P = N = ∅;
2. for each modal atom Kξ in MA(Σ) such that ξ ∈ L, if the objective sentence ϕ ⊃ ξ is valid, then add Kξ to P and replace in MA(Σ) all occurrences of Kξ with true, otherwise add Kξ to N and replace in MA(Σ) all occurrences of Kξ with false;
3. for each modal atom Aξ in MA(Σ) such that ξ ∈ L, if the objective sentence ψ ⊃ ξ is valid, then add Aξ to P and replace in MA(Σ) all occurrences of Aξ with true, otherwise add Aξ to N and replace in MA(Σ) all occurrences of Aξ with false;
4. iteratively apply the above rules until all modal atoms in MA(Σ) have been replaced.
Algorithm MKNF-Sat(Σ)
Input: sentence Σ ∈ L′M
Output: true if Σ is MKNF-satisfiable, false otherwise.
begin
  if there exists a partition (P, N) of MA(Σ) such that
    (a) (P, N) = prt(Σ, ob(P, N)) and
    (b) KΣ ∈ P and
    (c) for each partition (P′, N′) ≠ (P, N) of MA(Σ),
        (c1) KΣ ∈ N′ or
        (c2) (P′, N′) ≠ prt(Σ, ob(P′, N′), ob(P, N)) or
        (c3) ob(P, N) ∧ ¬ob(P′, N′) is satisfiable
  then return true
  else return false
end

Fig. 1. Algorithm MKNF-Sat.
Then, (P, N) = prt(Σ, ϕ, ψ).

The above lemma allows us to prove decidability of MKNF-satisfiability for subsets of LM built upon decidable subsets of the first-order language L. In fact, since the set MA(Σ) is finite, there is only a finite number of partitions of MA(Σ); moreover, by Lemma 1, for each such partition (P, N), conditions (a) and (c) in Theorem 2 can be checked through a finite number of satisfiability checks in L′. In particular, given a partition (P, N) (that is, the objective sentence ob(P, N)), Lemma 1 allows for building the partition prt(Σ, ob(P, N)), while, given the partitions (P, N) and (P′, N′) (namely, the objective sentences ob(P, N) and ob(P′, N′)), the same lemma allows for building the partition prt(Σ, ob(P′, N′), ob(P, N)). Hence, the following property holds.

Theorem 3. Let L′ ⊂ L. If L′ is closed under boolean composition and satisfiability in L′ is decidable, then MKNF-satisfiability in L′M is decidable.

Thus, reasoning about minimal knowledge and negation as failure in the modal extension (without quantifying-in) of a decidable fragment of first-order logic closed under boolean composition is decidable. In Figure 1 we present the algorithm MKNF-Sat for computing satisfiability in any fragment L′M of LM satisfying the conditions of Theorem 3. The algorithm is based on Theorem 2 and relies on Lemma 1, which provides a constructive way to build the partition prt(Σ, ϕ, ψ) starting from the sentences Σ, ϕ, and ψ, again using a procedure for checking satisfiability in L′. The algorithm thus computes MKNF-satisfiability in L′M by reducing this problem to a number of satisfiability problems in L′.
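For a propositional fragment of L′, the classification step of Lemma 1 can be sketched as follows. The brute-force validity check and the encoding of formulas as Python predicates are illustrative assumptions (a real implementation would call a decision procedure for L′), and the sketch covers only non-nested modal atoms.

```python
from itertools import product

def valid(f, atoms):
    """Brute-force propositional validity check over the given atoms."""
    return all(f(dict(zip(atoms, vals)))
               for vals in product([True, False], repeat=len(atoms)))

def classify(modal_atoms, phi, psi, atoms):
    """Steps 1-3 of Lemma 1 for non-nested modal atoms: Kxi goes to P iff
    phi -> xi is valid, Axi goes to P iff psi -> xi is valid."""
    P, N = [], []
    for op, name, xi in modal_atoms:
        base = phi if op == 'K' else psi
        implied = valid(lambda v, b=base, x=xi: (not b(v)) or x(v), atoms)
        (P if implied else N).append((op, name))
    return P, N

# Toy instance: MA = {Ka, Kb, A(a or b)} with phi = psi = a.
atoms = ['a', 'b']
phi = lambda v: v['a']
ma = [('K', 'Ka', lambda v: v['a']),
      ('K', 'Kb', lambda v: v['b']),
      ('A', 'A(a or b)', lambda v: v['a'] or v['b'])]
P, N = classify(ma, phi, phi, atoms)
```

Since MA(Σ) is finite, enumerating all 2^n partitions and classifying each one as above already yields the finite search space used by the algorithm of Figure 1.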
340
R. Rosati
Informally, the algorithm checks whether there exists a partition (P, N) of MA(Σ) satisfying the three conditions (a), (b), (c). Intuitively, conditions (a) and (b) state that the partition cannot be self-contradictory and must be consistent with Σ: in particular, the condition (P, N) = prt(Σ, ob(P, N)) establishes that the objective knowledge implied by the partition (P, N) (that is, the sentence ob(P, N)) identifies a set of interpretations M = {I : I |= ob(P, N)} such that (M, M) induces the same partition (P, N) of MA(Σ). Condition (c) then corresponds to checking whether such a structure (M, M) identifies an MKNF model for Σ according to Definition 2, i.e., whether there is no set M′ such that M′ ⊃ M and (M′, M) satisfies Σ. Again, the search for such a structure is performed by examining whether there exists a partition of MA(Σ), different from (P, N), which does not satisfy any of the conditions (c1), (c2), (c3): in particular, condition (c3) is satisfied iff the set M′ identified by the partition (P′, N′) is not a superset of M. Correctness of the algorithm follows immediately from Theorem 2. We illustrate the algorithm through the following propositional example.

Example 4. Suppose

Σ = K(a ∨ Kb) ∧ (¬A(¬c ∨ ¬d) ∨ KAb) ∧ c

Then,
MA(Σ) = {KΣ, K(a ∨ Kb), Kb, A(¬c ∨ ¬d), KAb, Ab}
Now suppose that (P, N) = (P1, N1), where

P1 = {KΣ, K(a ∨ Kb)}
N1 = {Kb, KAb, A(¬c ∨ ¬d), Ab}

Then, ob(P, N) = (true ∧ (¬false ∨ false) ∧ c) ∧ (a ∨ false), which is equivalent to c ∧ a. Now, let M = {I : I |= c ∧ a}: it is easy to see that (M, M) satisfies the modal atoms in P, while it does not satisfy the modal atoms in N, hence (P, N) = prt(Σ, ob(P, N)), thus satisfying condition (a) of the algorithm. Then, since KΣ ∈ P, condition (b) of the algorithm is satisfied. Finally, it is easy to verify that either condition (c1) or condition (c2) holds for each partition of MA(Σ) different from (P1, N1), with the exception of (P2, N2), where

P2 = {KΣ, K(a ∨ Kb), Kb, Ab, KAb}
N2 = {A(¬c ∨ ¬d)}

So let (P′, N′) = (P2, N2): since KΣ ∈ P′, (P′, N′) does not satisfy condition (c1) in the algorithm. Moreover, ob(P′, N′) = (true ∧ (¬false ∨ true) ∧ c) ∧ (a ∨ true) ∧ b ∧ true, which is equivalent to c ∧ b. It is immediate to see that (P′, N′) = prt(Σ, ob(P′, N′), ob(P, N)), thus condition (c2) of the algorithm is not satisfied by (P′, N′). Finally, since ob(P, N) is equivalent to c ∧ a, ob(P, N) ∧ ¬ob(P′, N′) is equivalent to the satisfiable formula c ∧ a ∧ (¬c ∨ ¬b), therefore condition (c3) holds for (P′, N′) = (P2, N2), which implies that condition (c) holds for (P, N) = (P1, N1). Consequently, MKNF-Sat(Σ) returns true. ⊓⊔
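The propositional computations in Example 4 can be checked mechanically by enumerating truth assignments over {a, b, c, d}; the encoding of the objective sentences as Python predicates is an assumption of this sketch.

```python
from itertools import product

ATOMS = ('a', 'b', 'c', 'd')

def assignments():
    for vals in product([True, False], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, vals))

# ob(P, N) for (P1, N1) as computed in the example, and its simplification.
ob_raw = lambda v: (True and ((not False) or False) and v['c']) and (v['a'] or False)
ob_simp = lambda v: v['c'] and v['a']

# ob(P', N') for (P2, N2), and its simplification.
ob2_raw = lambda v: ((True and ((not False) or True) and v['c'])
                     and (v['a'] or True) and v['b'] and True)
ob2_simp = lambda v: v['c'] and v['b']

# Condition (c3): ob(P, N) ∧ ¬ob(P', N') must be satisfiable.
c3_witnesses = [v for v in assignments() if ob_raw(v) and not ob2_raw(v)]
```

The witnesses found for (c3) are exactly the assignments with a and c true and b false, matching the satisfiable formula c ∧ a ∧ (¬c ∨ ¬b) in the example.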
As for other reasoning problems, we now show that entailment can be reduced to unsatisfiability in MKNF. We remark that such a reduction is not trivial, since in general the deduction theorem does not hold in nonmonotonic logics; in particular, it does not hold in MKNF. In the following, we denote with ϕ[K/A] the positive formula obtained from ϕ ∈ LM by substituting each occurrence of K with A. The following theorem is obtained by extending an analogous result for the propositional fragment of MKNF [23].

Theorem 4. Let Σ ∈ LM, ϕ ∈ LK (as in Lifschitz's original proposal [15], we restrict to entailment of K-formulas). Then, Σ |=MKNF ϕ iff the formula Σ ∧ ¬A(ϕ[K/A]) is MKNF-unsatisfiable.

The above theorem implies a duality between the semantics of the modality used in expressions on the right-hand side of the entailment relation and the assumption operator of MKNF: a formula can pass from the right-hand side to the left-hand side of the entailment relation in MKNF by negating it and replacing all its modalities with the operator A. This property provides a constructive and easy way to reduce (in linear time) entailment to unsatisfiability in MKNF. We can thus use the algorithm MKNF-Sat to compute entailment in MKNF. We point out that the algorithm MKNF-Sat does not rely on a theorem prover for a modal logic: thus, “modal reasoning” is not actually needed for reasoning in MKNF. This is an interesting feature that MKNF shares with other nonmonotonic modal formalisms, like autoepistemic logic [20] or the autoepistemic logic of knowledge [24].
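The substitution ϕ[K/A] and the reduction of Theorem 4 are purely syntactic; a minimal sketch, with formulas represented as nested tuples (this representation is an assumption of the sketch, not the paper's notation):

```python
def subst_K_A(phi):
    """Compute ϕ[K/A]: replace each occurrence of the modality K with A."""
    op = phi[0]
    if op in ('K', 'A'):
        return ('A', subst_K_A(phi[1]))
    if op == 'not':
        return ('not', subst_K_A(phi[1]))
    if op in ('and', 'or'):
        return (op, subst_K_A(phi[1]), subst_K_A(phi[2]))
    return phi  # objective atoms are left untouched

def entailment_query(sigma, phi):
    """Theorem 4: Σ |=MKNF ϕ iff Σ ∧ ¬A(ϕ[K/A]) is MKNF-unsatisfiable.
    Build the formula whose unsatisfiability is to be checked."""
    return ('and', sigma, ('not', ('A', subst_K_A(phi))))

# Example: ϕ = K(p ∧ Kq); then ϕ[K/A] = A(p ∧ Aq).
phi = ('K', ('and', ('atom', 'p'), ('K', ('atom', 'q'))))
```

The traversal visits each subformula once, which matches the linear-time claim for the reduction.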
4
Reasoning with Safe Quantifying-In
In this section we study reasoning in MKNF theories with quantifying-in. Due to lack of space, we only provide a very succinct presentation of our study in the present version of the paper. In order to overcome the semantic problems related to quantifying-in, we have to impose some conditions on both the syntax and the semantics of the first-order language (see e.g. [14] for a discussion of this topic). As for the syntax, we restrict the objective language to the function-free fragment of the first-order language. As for the semantics, we follow [14], and assume that each first-order interpretation is defined over the same countably infinite domain of elements ∆. Moreover, we interpret constants in the language as standard names, namely: (i) a constant denotes the same element of ∆ in each interpretation; (ii) two constants with different names denote different elements of ∆. In the following, we say that a formula ϕ is subjective if each occurrence of an objective subformula in ϕ lies within the scope of a modal operator. We denote as LQM the full modal extension (with quantifying-in) of the function-free fragment of L, in which the following restriction holds: if ϕ ∈ LQM, then every occurrence of a quantifier in ϕ has either an objective or a subjective formula within its scope. For instance, a formula of the form ∀x.¬Kp1(x) ∨ Kp2(x) belongs to LQM, since the formula within the scope of the quantifier is subjective, while a formula of the form ∀x.p1(x) ∨ Kp2(x) does not belong to LQM, since the formula within the scope of the quantifier is neither objective nor subjective. The above restriction allows us to extend the characterization of MKNF models, provided for the quantifying-in-free case by Theorem 2, to theories with quantifying-in.

Definition 8. Let Σ ∈ LQM. We denote as MA∆(Σ) the infinite set of modal atoms obtained from MA(Σ) by adding, for each modal atom Kϕ(x) (resp. Aϕ(x)) occurring in Σ and such that x is the set of free variables in ϕ, the set of modal atoms obtained from Kϕ(x) (resp. Aϕ(x)) by replacing each variable in x with an element of ∆.

Then, in order to generalize the evaluation Σ(P, N) of a sentence Σ ∈ LQM with respect to a partition of modal atoms (P, N), we extend Definition 4 by taking into account subformulas with quantifying-in (that is, subformulas of the form ∃x.ψ(x) where ψ(x) is subjective) as follows: in the evaluation of Σ(P, N), the subformula ∃x.ψ(x) is replaced by true if there exists a substitution t of x with elements of ∆ such that ψ(t)(P, N) is equivalent to true, otherwise ∃x.ψ(x) is replaced by false. Moreover, since MA∆(Σ) is in general infinite, we have to slightly modify Definition 5, considering ob(P, N) as a set of sentences instead of a single sentence.

Definition 9. Let Σ ∈ LQM and let (P, N) be a partition of MA∆(Σ). We denote as ob(P, N) the set of objective sentences

ob(P, N) = ⋃_{Kϕ∈P} ϕ(P, N)
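Since ∆ is infinite, MA∆(Σ) can only be materialized over a finite prefix of the domain; the following sketch (whose tuple encoding of modal atoms is an assumption made for illustration) shows the instantiation step of Definition 8.

```python
from itertools import product

def ground_modal_atoms(modal_atoms, domain):
    """Instantiate the free variables of each modal atom with elements of a
    finite prefix of the domain ∆ (Definition 8 adds these instances to MA(Σ)).

    modal_atoms: triples (op, predicate, free_vars), e.g. ('K', 'p', ('x',)).
    """
    grounded = set()
    for op, pred, free_vars in modal_atoms:
        for elems in product(domain, repeat=len(free_vars)):
            grounded.add((op, pred, elems))
    return grounded

# Kp(x) and Aq(x, y) instantiated over a three-element prefix of ∆.
ma = [('K', 'p', ('x',)), ('A', 'q', ('x', 'y'))]
g = ground_modal_atoms(ma, ['d1', 'd2', 'd3'])
```

With |domain| = d, a modal atom with k free variables yields d^k instances, which is why MA∆(Σ) is infinite over the full domain ∆.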
Using the infinite set of modal atoms MA∆(Σ), we are then able to state a characterization analogous to the one given by Theorem 2: it is possible to decide satisfiability of a formula Σ ∈ LQM by looking for a partition of MA∆(Σ) satisfying a set of conditions analogous to those reported in Theorem 2. In the following, prt∆(Σ, ϕ1, ϕ2) denotes the partition of MA∆(Σ) induced by (M1, M2), where M1 = {I : I |= ϕ1}, M2 = {I : I |= ϕ2}, and prt∆(Σ, ϕ) denotes the partition prt∆(Σ, ϕ, ϕ).

Theorem 5. Let Σ ∈ LQM. Then, Σ is MKNF-satisfiable iff there exists a partition (P, N) of MA∆(Σ) which satisfies the following conditions:
(a) (P, N) = prt∆(Σ, ob(P, N));
(b) KΣ ∈ P;
(c) for each partition (P′, N′) ≠ (P, N) of MA∆(Σ), at least one of the following conditions holds:
  (c1) KΣ ∈ N′;
  (c2) (P′, N′) ≠ prt∆(Σ, ob(P′, N′), ob(P, N));
  (c3) the objective theory ob(P, N) ∪ {¬ϕ | ϕ ∈ ob(P′, N′)} is satisfiable.

Unfortunately, unlike the quantifying-in-free case, this characterization cannot be directly turned into a decidable satisfiability procedure for formulas in fragments of LQM built upon decidable subsets of L, since the set MA∆(Σ) is infinite. However, it is possible to impose further restrictions on the form of quantifying-in, such that only a finite subset of ∆ needs to be considered in the generation of the set MA∆(Σ).

Definition 10. Let LSM be the set of formulas from LQM of the form Σ = ΣM ∧ ΣQ1 ∧ . . . ∧ ΣQh, where ΣM is a formula from LM and each ΣQi (1 ≤ i ≤ h) is a formula satisfying one of the following conditions:
(a) ΣQi is of the form ∃x.ϕ(x), where ϕ(x) is a subjective formula without quantifying-in;
(b) ΣQi is of the form ∀x.ϕ(x), where ϕ(x) is a subjective formula of the form ¬Kψ1(x) ∨ . . . ∨ ¬Kψn(x) ∨ ϕ′(x), in which ϕ′(x) is a subjective formula without quantifying-in, each ψi(x) is an objective formula, and for each x ∈ x there exists i (1 ≤ i ≤ n) such that the objective sentence ∃(x − {x}).(∀x.ψi(x)) is not entailed by ΣM, i.e., ΣM ⊭MKNF ∃(x − {x}).(∀x.ψi(x)).

Under the above conditions, it is possible to show that a characterization analogous to Theorem 2 also holds if, in the construction of MA∆(Σ), we restrict the instantiation of the free variables of modal atoms occurring in Σ to a finite subset ∆F of ∆, of the form

∆F = ∆Σ ∪ {i1, . . . , ik}

where k is the number of existentially quantified variables x in subformulas ΣQi of the form (a), ∆Σ is the set of constants occurring in Σ, and i1, . . . , ik are new constant names such that ij ∉ ∆Σ and, if j ≠ l, then ij ≠ il, for each 1 ≤ j ≤ k, 1 ≤ l ≤ k.
Roughly speaking, the restriction imposed upon a formula Σ from LSM guarantees that each subformula of Σ with quantifying-in may have effect only on a finite number of elements of the interpretation domain. For existentially quantified formulas with quantifying-in, this property is trivially satisfied, since it is easy to see that such a formula may affect only the set of constants explicitly mentioned in Σ plus at most one new element of the domain for each existentially quantified variable. And since satisfiability is independent of the particular elements chosen, we can consider without loss of generality the k new constants i1, . . . , ik. On the other hand, for universally quantified formulas with quantifying-in, this property in general does not hold: e.g., the formula ∀x.Kp(x) constrains every element of ∆ to satisfy the property Kp. Now, it is possible to show that condition (b) implies that the formula ΣQi only affects elements from ∆F. In fact, by condition (b), for each universally quantified variable x and for each choice (say t) of the other universally quantified variables, there exists a disjunct ¬Kψi(x) in ΣQi such that the objective sentence ∀x.ψi(t, x) is not implied by the formula ΣM. Due to the form of Σ, this in turn implies that only a choice t from elements of ∆F may be such that each sentence Kψi(t) is implied by Σ: from this fact and the minimal knowledge semantics of MKNF, it follows that the sentence ∀x.¬Kψ1(x) ∨ . . . ∨ ¬Kψn(x) ∨ ϕ′(x) only affects elements from the set ∆F. Intuitively, this is explained by the fact that, in such a sentence, the disjuncts ¬Kψi(x) are “preferred” over the disjunct ϕ′(x); hence the above formula imposes some knowledge (i.e., the property ϕ′) only for those choices t such that it is inconsistent to assume any of the sentences ¬Kψi(t). Based on the above property, it is possible to reduce satisfiability of MKNF formulas from LSM to satisfiability in LM, through a pre-processing step which eliminates quantifiers occurring outside modal operators, by instantiating the quantified variables with the elements of ∆F. In the following, we denote as L′SM the subset of LSM built upon L′, i.e., the modal extension of L′ obtained according to Definition 10.

Theorem 6. Let L′ ⊂ L. If L′ is closed under boolean composition and satisfiability in L′ is decidable, then MKNF-satisfiability in L′SM is decidable.
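The construction of the finite subdomain ∆F can be sketched directly; the naming scheme for the fresh standard names below is an assumption of this sketch.

```python
def finite_subdomain(sigma_constants, k):
    """∆F = ∆Σ ∪ {i1, ..., ik}: the constants occurring in Σ plus one fresh,
    pairwise-distinct standard name per existentially quantified variable
    in subformulas of form (a) of Definition 10."""
    delta_f = list(dict.fromkeys(sigma_constants))  # ∆Σ, duplicates dropped
    fresh, j = [], 1
    while len(fresh) < k:
        name = f'i{j}'
        if name not in delta_f:   # the ij must not occur in ∆Σ
            fresh.append(name)
        j += 1
    return delta_f + fresh

df = finite_subdomain(['tweety', 'sam'], 2)
```

Satisfiability is independent of which fresh elements of ∆ are chosen, so any k names outside ∆Σ serve equally well.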
5
Relationship with Datalog
The interest of reasoning in MKNF theories with quantifying-in arises from the fact that reasoning in several well-known nonmonotonic logics reduces to reasoning in such MKNF theories. For instance, some forms of open defaults can be naturally formalized through MKNF theories with quantifying-in (see [15]). Moreover, it has recently been proven that some implemented nonmonotonic features of frame-based knowledge representation systems can be formally reconstructed in terms of MKNF theories with quantifying-in [3]. In particular, it is worth pointing out the relationship between quantifying-in in MKNF and (disjunctive) Datalog. Indeed, referring to Definition 10 and to the translation, provided in [16], of extended disjunctive logic programs in terms of MKNF formulas, it is immediate to see that the translation of a disjunctive Datalog program corresponds to a formula Σ in LSM, such that the quantifying-in-free part ΣM corresponds to the conjunction of the facts in the extensional database, and for each program rule ri of the form

p1(x1) ∨ . . . ∨ pn(xn) ← q1(y1), . . . , qm(ym), not r(z1), . . . , not r(zl)

there is a subformula ΣQi of the form

∀w.¬Kq1(y1) ∨ . . . ∨ ¬Kqm(ym) ∨ Ar(z1) ∨ . . . ∨ Ar(zl) ∨ Kp1(x1) ∨ . . . ∨ Kpn(xn)

in which w denotes the set of all variables appearing in each xi, yi, zi. Notably, it is easy to see that ΣQi satisfies condition (b) in Definition 10, since, due to Datalog's rule safeness, each variable in w must appear in at least one of the yi.
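The rule-to-ΣQi translation is a purely syntactic mapping and can be sketched as follows; the string-based atom encoding, and the simplification that all rule arguments are variables, are assumptions of this sketch.

```python
def rule_to_mknf(head_atoms, pos_body, neg_body):
    """Build the subformula ΣQi for one disjunctive Datalog rule:
    positive body atoms q become ¬Kq, negated atoms become A-atoms, and
    head atoms p become Kp, under a universal closure over all variables.
    Atoms are strings like 'p(x,y)'; arguments are assumed to be variables."""
    def vars_of(atom):
        return atom[atom.index('(') + 1:atom.index(')')].split(',')

    w = []  # all variables, in first-occurrence order
    for atom in head_atoms + pos_body + neg_body:
        for v in vars_of(atom):
            if v not in w:
                w.append(v)
    disjuncts = (['¬K' + q for q in pos_body] +
                 ['A' + s for s in neg_body] +
                 ['K' + p for p in head_atoms])
    return '∀' + ','.join(w) + '.' + ' ∨ '.join(disjuncts)

# p(x) ← q(x,y), not r(y)
f = rule_to_mknf(['p(x)'], ['q(x,y)'], ['r(y)'])
```

Rule safeness would additionally require every variable in w to occur in some positive body atom; the sketch leaves that check to the caller.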
Hence, our notion of “safeness” for quantifying-in in MKNF, provided by Definition 10, can be seen as a generalization of the notion of rule safeness in deductive databases. Notice that rule safeness in Datalog is necessary in order to naturally interpret a Datalog program over a finite interpretation domain (the Herbrand universe of the program, namely the constants appearing in the program). Conversely, in the case of MKNF formulas from LSM, the semantics is based on an infinite interpretation domain, while safeness forces formulas with quantifying-in to affect only a finite subset of such a domain. These correspondences suggest that the relationship between disjunctive Datalog and safe quantifying-in in MKNF should be further analyzed.

Acknowledgments

This research has been partially supported by CNR grant 203.15.10.
References

1. A. Beringer and T. Schaub. Minimal belief and negation as failure: a feasible approach. In Proc. of the 11th Nat. Conf. on Artificial Intelligence (AAAI'93), pages 400–405, 1993.
2. M. Cadoli and M. Schaerf. A survey of complexity results for non-monotonic logics. J. of Logic Programming, 17:127–160, 1993.
3. F. M. Donini, D. Nardi, and R. Rosati. Autoepistemic description logics. In Proc. of the 15th Int. Joint Conf. on Artificial Intelligence (IJCAI'97), pages 136–141, 1997.
4. F. M. Donini, D. Nardi, and R. Rosati. Ground nonmonotonic modal logics. J. of Logic and Computation, 7(4):523–548, Aug. 1997.
5. T. Eiter and G. Gottlob. Propositional circumscription and extended closed world reasoning are Π^p_2-complete. Theoretical Computer Science, 114:231–245, 1993.
6. M. Gelfond and V. Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365–385, 1991.
7. G. Gottlob. Complexity results for nonmonotonic logics. J. of Logic and Computation, 2:397–425, 1992.
8. G. Gottlob. NP trees and Carnap's modal logic. J. of the ACM, 42(2):421–457, 1995.
9. J. Y. Halpern and Y. Moses. Towards a theory of knowledge and ignorance: Preliminary report. In K. Apt, editor, Logic and Models of Concurrent Systems. Springer-Verlag, 1985.
10. K. Inoue and C. Sakama. On positive occurrences of negation as failure. In Proc. of the 4th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR'94), pages 293–304. Morgan Kaufmann, Los Altos, 1994.
11. M. Kaminski. Embedding a default system into nonmonotonic logics. Fundamenta Informaticae, 14:345–354, 1991.
12. K. Konolige. On the relation between autoepistemic logic and circumscription. In Proc. of the 11th Int. Joint Conf. on Artificial Intelligence (IJCAI'89), pages 1213–1218, 1989.
13. G. Lakemeyer. Limited reasoning in first-order knowledge bases with full introspection. Artificial Intelligence, 84:209–255, 1996.
14. H. J. Levesque. All I know: a study of autoepistemic logic. Artificial Intelligence, 42:263–310, 1990.
15. V. Lifschitz. Nonmonotonic databases and epistemic queries. In Proc. of the 12th Int. Joint Conf. on Artificial Intelligence (IJCAI'91), pages 381–386, Sydney, 1991.
16. V. Lifschitz. Minimal belief and negation as failure. Artificial Intelligence, 70:53–72, 1994.
17. V. Lifschitz and T. Woo. Answer sets in general nonmonotonic reasoning (preliminary report). In Proc. of the 3rd Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR'92), pages 603–614. Morgan Kaufmann, Los Altos, 1992.
18. F. Lin and Y. Shoham. Epistemic semantics for fixed-point non-monotonic logics. Artificial Intelligence, 57:271–289, 1992.
19. W. Marek and M. Truszczyński. Nonmonotonic Logics – Context-Dependent Reasoning. Springer-Verlag, 1993.
20. R. C. Moore. Semantical considerations on nonmonotonic logic. Artificial Intelligence, 25:75–94, 1985.
21. I. Niemelä. On the decidability and complexity of autoepistemic reasoning. Fundamenta Informaticae, 17(1,2):117–156, 1992.
22. R. Rosati. Reasoning with minimal belief and negation as failure: Algorithms and complexity. In Proc. of the 14th Nat. Conf. on Artificial Intelligence (AAAI'97), pages 430–435. AAAI Press/The MIT Press, 1997.
23. R. Rosati. Reducing query answering to satisfiability in nonmonotonic logics. In Proc. of the 15th Nat. Conf. on Artificial Intelligence (AAAI'98), pages 853–858. AAAI Press/The MIT Press, 1998.
24. G. Schwarz. Autoepistemic logic of knowledge. In Proc. of the 1st Int. Workshop on Logic Programming and Non-monotonic Reasoning (LPNMR'91), pages 260–274. The MIT Press, 1991.
25. G. Schwarz and V. Lifschitz. Extended logic programs as autoepistemic theories. In Proc. of the 2nd Int. Workshop on Logic Programming and Non-monotonic Reasoning (LPNMR'93), pages 101–114. The MIT Press, 1993.
A Comparison of Sceptical NAF-Free Logic Programming Approaches

G. Antoniou, M.J. Maher, D. Billington, and G. Governatori

CIT, Griffith University, Nathan, QLD 4111, Australia
{ga,mjm,db,guido}@cit.gu.edu.au
Abstract. Recently there has been increased interest in logic programming-based default reasoning approaches which do not use negation-as-failure in their object language. Instead, default reasoning is modelled by rules and a priority relation among them. Historically the first logic in this class was Defeasible Logic. In this paper we study its relationship to other approaches which also rely on the idea of using logic rules and priorities. In particular we study sceptical LPwNF, courteous logic programs, and priority logic.
1
Introduction
Recently there has been increased interest in modelling default reasoning by means of rules without negation as failure, and a priority relation. Defeasible Logic [12,13] is an early approach to sceptical nonmonotonic reasoning [1] based on rules without negation as failure, plus a priority relation. In fact it has an implementation as a straightforward extension of Prolog [5]. LPwNF (Logic Programming without Negation as Failure) is a recent approach, introduced in [6]. It supports both credulous and sceptical reasoning, unlike defeasible logic, and has an argumentation-theoretic characterisation. The main contribution of this paper is to compare defeasible logic with sceptical LPwNF and to discuss how the two approaches differ. The main difference is that LPwNF does not take into account teams of rules [7] supporting the same conclusion (a team contains all rules with a certain head), but rather views rules individually. By doing so, LPwNF fails to draw desirable conclusions that defeasible logic can, as we show in this paper. On the other hand, defeasible logic can prove everything that sceptical LPwNF can. We also compare defeasible logic with courteous logic programs [7] and priority logic [17,18]. Finally we point out an earlier result which establishes a relationship between defeasible logic and inheritance networks.
2
Defeasible Logic
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 347–356, 1999.
© Springer-Verlag Berlin Heidelberg 1999

In this paper we restrict attention to propositional defeasible logic, and assume that the reader is familiar with the notation and basic notions of propositional logic. If q is a literal, ∼q denotes the complementary literal (if q is a positive literal p then ∼q is ¬p; and if q is ¬p, then ∼q is p). A rule r consists of its antecedent A(r) (written on the left; A(r) may be omitted if it is the empty set), which is a finite set of literals, an arrow, and its consequent (or head) C(r), which is a literal. In writing rules we omit set notation for antecedents. There are three kinds of rules: Strict rules are denoted by A → p and represent indisputable conclusions (“Emus are birds”); defeasible rules are denoted by A ⇒ p and represent conclusions that can be defeated by contrary evidence (“Birds usually fly”); and defeaters are denoted by A ; p and represent knowledge which might prevent the conclusion ¬p from being drawn without directly supporting the conclusion p (“Heavy animals may not fly”). Given a set R of rules, we denote the set of all strict rules in R by Rs, and the set of strict and defeasible rules in R by Rsd. R[q] denotes the set of rules in R with consequent q. In the following we use the formalization of [4]. A superiority relation on R is an acyclic relation > on R (that is, the transitive closure of > is irreflexive), and is used to represent priority information among rules. A defeasible theory T is a triple (F, R, >) where F is a finite set of literals (called facts), R a finite set of rules, and > a superiority relation on R. A conclusion of T is a tagged literal and can have one of the following four forms:
– +∆q, which is intended to mean that q is definitely provable in T.
– −∆q, which is intended to mean that we have proved that q is not definitely provable in T.
– +∂q, which is intended to mean that q is defeasibly provable in T.
– −∂q, which is intended to mean that we have proved that q is not defeasibly provable in T.

A derivation (or proof) in T = (F, R, >) is a finite sequence P = (P(1), . . .
P(n)) of tagged literals satisfying the following conditions (P(1..i) denotes the initial part of the sequence P of length i):

+∆: If P(i + 1) = +∆q then either
  q ∈ F or
  ∃r ∈ Rs[q] ∀a ∈ A(r) : +∆a ∈ P(1..i)

−∆: If P(i + 1) = −∆q then
  q ∉ F and
  ∀r ∈ Rs[q] ∃a ∈ A(r) : −∆a ∈ P(1..i)

+∆ denotes forward chaining provability, and −∆ denotes its strong negation, that is, finite failure to prove definitely.

+∂: If P(i + 1) = +∂q then either
  (1) +∆q ∈ P(1..i) or
  (2) ∃r ∈ Rsd[q] such that
  (2.1) ∀a ∈ A(r) : +∂a ∈ P(1..i) and
  (2.2) −∆∼q ∈ P(1..i) and
  (2.3) ∀s ∈ R[∼q], either
    (2.3.1) ∃a ∈ A(s) : −∂a ∈ P(1..i) or
    (2.3.2) ∃t ∈ Rsd[q] such that ∀a ∈ A(t) : +∂a ∈ P(1..i) and t > s

−∂: If P(i + 1) = −∂q then
  (1) −∆q ∈ P(1..i) and
  (2) (2.1) ∀r ∈ Rsd[q] ∃a ∈ A(r) : −∂a ∈ P(1..i) or
    (2.2) +∆∼q ∈ P(1..i) or
    (2.3) ∃s ∈ R[∼q] such that
      (2.3.1) ∀a ∈ A(s) : +∂a ∈ P(1..i) and
      (2.3.2) ∀t ∈ Rsd[q] either ∃a ∈ A(t) : −∂a ∈ P(1..i) or t ≯ s

We give a brief explanation of the +∂ rule. One way of proving q defeasibly is to prove q definitely. The other way requires us to find a rule with head q whose antecedents have already been proven defeasibly (2.1). In addition, we must consider and discard potential attacks against q: we must be sure that the negation of q is not definitely provable (2.2), and for every attack on q by a rule with head ∼q there must be a stronger (counterattacking) rule with head q.¹

The elements of a derivation are called lines of the derivation. We say that a tagged literal L is provable (or derivable) in T = (F, R, >), denoted T ⊢ L, iff there is a derivation in T such that L is a line of a proof P. Even though the definition seems complicated, it follows ideas which are intuitively appealing. For an explanation of this definition see [11]. In the remainder of this paper we will only need to consider defeasible rules and a superiority relation; facts, strict rules and defeaters will not be necessary. We conclude this section with an example, adapted from [6].

r1: bird(X) ⇒ fly(X)
r2: penguin(X) ⇒ ¬fly(X)
r3: walkslikepeng(X) ⇒ penguin(X)
r4: ¬flatfeet(X) ⇒ ¬penguin(X)
r5: penguin(X) ⇒ bird(X)
f1: bird(tweety)
f2: walkslikepeng(tweety)
f3: ¬flatfeet(tweety)
r2 > r1
r4 > r3
We can derive +∂¬penguin(tweety) because both rules r3 and r4 are applicable (with instantiation tweety) and r4 is stronger than r3. For the same reason we can derive −∂penguin(tweety). The fact f1 allows us to derive +∆bird(tweety), thus also +∂bird(tweety). Therefore rule r1 (with instantiation tweety) is applicable. Moreover rule r2, the only possible way of proving ¬fly(tweety), cannot be applied because we have already derived −∂penguin(tweety). Thus we can derive +∂fly(tweety).

¹ It should also be noted that defeaters are only used as potential attacks on conclusions, but are never used to support a conclusion (directly or in a counterattack). This treatment is consistent with the intuitive idea of a defeater as explained previously.
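The defeasible provability conditions above can be prototyped for ground, acyclic theories consisting only of facts and defeasible rules; the sketch below makes several simplifying assumptions (strict rules and defeaters are omitted, and failure to prove is conflated with −∂), and is instantiated with the tweety example.

```python
# Ground instances (for tweety) of the example's rules; facts f1-f3 are
# kept in FACTS. '~' marks classical negation.
RULES = [
    ('r1', ['bird'], 'fly'),
    ('r2', ['penguin'], '~fly'),
    ('r3', ['walkslikepeng'], 'penguin'),
    ('r4', ['~flatfeet'], '~penguin'),
    ('r5', ['penguin'], 'bird'),
]
FACTS = {'bird', 'walkslikepeng', '~flatfeet'}
SUP = {('r2', 'r1'), ('r4', 'r3')}        # r2 > r1 and r4 > r3

def neg(l):
    return l[1:] if l.startswith('~') else '~' + l

def rules_for(q):
    return [r for r in RULES if r[2] == q]

def provable(q, seen=frozenset()):
    """Simplified +∂q: facts are definitely provable; otherwise some rule
    for q must fire, and every attacking rule must fail or be overridden."""
    if q in seen:                 # cycle guard (this theory is acyclic)
        return False
    if q in FACTS:                # +∆q, hence +∂q
        return True
    if neg(q) in FACTS:           # clause (2.2): ∼q is definitely provable
        return False
    seen = seen | {q}
    for _, body, _ in rules_for(q):
        if not all(provable(a, seen) for a in body):
            continue              # clause (2.1) fails for this rule
        attacks_fail = all(
            not all(provable(a, seen) for a in s_body)         # (2.3.1)
            or any((t_name, s_name) in SUP                     # (2.3.2)
                   and all(provable(a, seen) for a in t_body)
                   for t_name, t_body, _ in rules_for(q))
            for s_name, s_body, _ in rules_for(neg(q)))
        if attacks_fail:
            return True
    return False
```

On this theory the sketch reproduces the conclusions derived above: fly and ¬penguin are defeasibly provable, while penguin and ¬fly are not.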
3
LPwNF
In LPwNF [6], a logic program consists of a set of rules of the form p ← q1, . . . , qn, where p, q1, . . . , qn are literals, together with an irreflexive and transitive priority relation > among the rules. A proof theory and a corresponding argumentation framework were introduced in [6]. The main idea of LPwNF is the following: in order to prove a literal q, a type A derivation must be found which proves q. One part of this derivation is a top-level proof of q in the sense of logic programming (SLD-resolution). But additionally every attack on this argument must be counterattacked. Attacks are generated in type B derivations. For an A derivation to succeed, all B derivations must fail. In general, a rule r in a type B derivation can attack a rule r′ in a type A derivation if they have complementary heads and r is not weaker than r′, that is, r′ ≯ r. On the other hand, a rule r in a type A derivation can attack a rule r′ in a type B derivation if they have complementary heads and r > r′. This reflects the notion of scepticism: it should be easier to attack a positive argument than to counterattack (i.e. attack the attacker). For example, consider the following program, which is the same as the example in the previous section, but for variations of syntax.

r1: fly(X) ← bird(X)
r2: ¬fly(X) ← penguin(X)
r3: penguin(X) ← walkslikepeng(X)
r4: ¬penguin(X) ← ¬flatfeet(X)
r5: bird(X) ← penguin(X)
r6: bird(tweety) ←
r7: walkslikepeng(tweety) ←
r8: ¬flatfeet(tweety) ←
r2 > r1
r4 > r3
Here it is possible to prove fly(tweety). Firstly, there is a standard SLD refutation (A derivation) of ← fly(tweety) via the rules r1 and r6. Additionally we need to consider all possible attacks on this refutation. In our case, r1 can be attacked by r2. Thus we start a B derivation with goal ← ¬fly(tweety) (with first rule r2), and have to show that this proof fails. This happens because the rule r3 is successfully counterattacked by r4. There are no other attacks on the original derivation. The following figure illustrates how the reasoning proceeds; below we give the formal definition. LPwNF can support either credulous or sceptical reasoning. Since in this paper we are interested in a comparison with defeasible logic, we restrict ourselves to the sceptical case (as we have already done so far in this section). Also, our presentation is slightly simpler than that of [6]. The reason is that in their paper, Dimopoulos and Kakas showed the soundness of their proof theory w.r.t. an argumentation framework, and had to make the definition of derivations more complicated in order to collect the rules used to build an appropriate argument. This is not our concern here, so we just focus on the derivation of formulae.
Fig. 1. A derivation in LPwNF: the argument (A derivation) refutes ← fly(tweety) via r1 and r6; the attack (B derivation) starts from ← ¬fly(tweety) via r2 and continues with r3; the counter-attack (A derivation) refutes ← ¬penguin(tweety) via r4 and r8.
A type A derivation from (G1, r) to (Gn, r) is a sequence ((G1, r), (G2, r), . . . , (Gn, r)), where r is a rule, and each Gi has the form ← q, Q, where q is the selected literal and Q a sequence of literals. For Gi, i ≥ 1, if there is a rule ri such that either
1. i = 1, ri > r, ri resolves with Gi on q, and there is a type B derivation from ({←∼q}, ri) to (∅, ri), or
2. i > 1, ri resolves with Gi on q, and there is a type B derivation from ({←∼q}, ri) to (∅, ri),
then Gi+1 is the resolvent of ri with Gi.

A type B derivation from (F1, r) to (Fn, r) is a sequence (F1, r), (F2, r), . . . , (Fn, r), where every Fi is of the form Fi = {← q, Q} ∪ Fi′, with q the selected literal, and Fi+1 is constructed from Fi as follows:
1. For i = 1, F1 must have the form ← q. Let R be the set of rules ri which resolve with ← q and which satisfy the condition ri ≮ r. Let C be the set of resolvents of ← q with the rules in R. If [] ∉ C then F2 = C; otherwise there is no F2.
2. For i > 1, let R be the set of rules ri which resolve with ← q, Q on q. Let R′ be the subset of R containing all rules ri such that there is no A derivation from (←∼q, ri) to ([], ri). Let C be the set of all resolvents of the rules in R′ with the rule ← q, Q, by resolving on q. If [] ∉ C then Fi+1 = C ∪ Fi′; otherwise there is no Fi+1.
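The two attack conditions used in these derivations are deliberately asymmetric; a minimal sketch of just these two checks, with rule heads and the priority relation given as plain Python data (an encoding assumed here for illustration):

```python
def neg(l):
    """Complementary literal, with '~' marking classical negation."""
    return l[1:] if l.startswith('~') else '~' + l

def may_attack_in_B(r, r_a, heads, sup):
    """A rule r in a type B derivation may attack rule r_a of an A
    derivation iff their heads are complementary and r is not weaker
    than r_a, i.e. (r_a, r) is not in the priority relation."""
    return heads[r] == neg(heads[r_a]) and (r_a, r) not in sup

def may_counterattack_in_A(r, r_b, heads, sup):
    """A rule r in a type A derivation may counterattack rule r_b of a B
    derivation iff their heads are complementary and r > r_b strictly."""
    return heads[r] == neg(heads[r_b]) and (r, r_b) in sup

# The fly/penguin program: r2 > r1 and r4 > r3.
heads = {'r1': 'fly', 'r2': '~fly', 'r3': 'penguin', 'r4': '~penguin'}
sup = {('r2', 'r1'), ('r4', 'r3')}
```

The asymmetry (not-weaker for attacks, strictly-stronger for counterattacks) is what makes the reasoning sceptical: attacking a positive argument is easier than defending it.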
G. Antoniou

4 A Comparison of LPwNF and Defeasible Logic
Given a logic program without negation as failure P, let T(P) be the defeasible theory containing the same rules as P, written as defeasible rules, and the same superiority relation. In other words, rules in LPwNF are represented as defeasible rules in defeasible logic. First we show that every conclusion provable in LPwNF can be derived in defeasible logic. The proof goes by induction on the length of a derivation and is found in the full version of this paper.

Theorem 1. Let q be a literal which can be sceptically proven in the logic program without negation as failure P, that is, there is a type A derivation from (← q, r) to ([], r) for some rule r. Then T(P) ⊢ +∂q.

However the reverse of the theorem is not true. The reason is that LPwNF argues on the basis of individual rules, whereas defeasible logic argues on the basis of teams of rules with the same head. The difference can be illustrated by the following simple example.

r1: monotreme(X) ⇒ mammal(X)
r2: hasFur(X) ⇒ mammal(X)
r3: laysEggs(X) ⇒ ¬mammal(X)
r4: hasBill(X) ⇒ ¬mammal(X)
r1 > r3
r2 > r4

monotreme(platypus)
hasFur(platypus)
laysEggs(platypus)
hasBill(platypus)
Intuitively we conclude that platypus is a mammal because for every reason against this conclusion (r3 and r4) there is a stronger reason for mammal(platypus) (r1 and r2 respectively). It is easy to see that +∂mammal(platypus) is indeed provable in defeasible logic: there is a rule in support of mammal(platypus), and every rule for ¬mammal(platypus) is overridden by a rule for mammal(platypus). On the other hand, the corresponding logic program without negation as failure is unable to prove mammal(platypus): if we start with r1, trying to build an A derivation, then we must counter the attack r4 (which is not inferior to r1) used in a B derivation. But LPwNF does not allow counterattacks on r4 by another rule with head mammal(platypus), but only by an attack on the body of r4. The latter is impossible in our case (there is no rule matching ¬hasBill(platypus)). Thus the attack via r4 succeeds and the proof of mammal(platypus) via r1 fails. Similarly, the proof of mammal(platypus) via r2 fails, due to an attack via rule r3. Thus mammal(platypus) cannot be proven. Our analysis so far has shown that defeasible logic is stronger than LPwNF because it allows attacks to be counterattacked by different rules. But note that a counterattacking rule needs to be stronger than the attacking rule. Thus it is not surprising that if the priority relation is empty, both approaches coincide.

Theorem 2. Let P be a logic program without negation as failure with empty priority relation. Then a literal q can be sceptically proven in P iff T(P) ⊢ +∂q.
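The contrast between team defeat and rule-by-rule defeat can be made concrete. The following Python sketch is an illustration only, not the official +∂ inference condition; rule and fact names follow the platypus example, with '~' marking classical negation.

```python
# A sketch contrasting team defeat (defeasible logic) with rule-by-rule
# defeat (LPwNF) on the ground platypus example.

RULES = {
    'r1': ('monotreme', 'mammal'),
    'r2': ('hasFur',    'mammal'),
    'r3': ('laysEggs',  '~mammal'),
    'r4': ('hasBill',   '~mammal'),
}
SUP = {('r1', 'r3'), ('r2', 'r4')}        # (a, b) means a > b
FACTS = {'monotreme', 'hasFur', 'laysEggs', 'hasBill'}

def comp(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def team_defeat(lit):
    """Each attacker must be overridden by SOME supporting rule."""
    support = [n for n, (b, h) in RULES.items() if h == lit and b in FACTS]
    attack = [n for n, (b, h) in RULES.items()
              if h == comp(lit) and b in FACTS]
    return bool(support) and all(
        any((s, a) in SUP for s in support) for a in attack)

def rule_vs_rule(lit):
    """A single supporting rule must itself beat every attacker."""
    attack = [n for n, (b, h) in RULES.items()
              if h == comp(lit) and b in FACTS]
    return any(
        all((s, a) in SUP for a in attack)
        for s, (b, h) in RULES.items() if h == lit and b in FACTS)

print(team_defeat('mammal'))     # the team r1, r2 overrides r3 and r4
print(rule_vs_rule('mammal'))    # no single rule beats both attackers
```

The team check succeeds because r1 beats r3 and r2 beats r4, while neither r1 nor r2 on its own beats both attackers.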
5 Other Approaches

5.1 Courteous Logic Programs
Courteous logic programs [7] share some basic ideas of defeasible logic. In particular, the approach is logic programming based, implements sceptical reasoning, and is based on competing teams of rules and a priority relation. It imposes a total stratification on the logic program by demanding that the atom dependency graph be acyclic. This ensures that each stratum contains only rules with head p or ¬p. An answer set is built gradually, stratum by stratum. Compared to defeasible logic, courteous logic programs are more specialized in the following respects: (i) the atom dependency graph of a courteous logic program must be acyclic; this condition is central in the courteous logic program framework, but is not necessary in defeasible logic; (ii) defeasible logic distinguishes between strict and defeasible conclusions, courteous logic programs do not; thus defeasible logic is more fine-grained; (iii) defeasible logic has the concept of a defeater, courteous logic programs do not; thus defeasible logic offers greater flexibility in the expression of information. On the other hand, there seems to be a major difference between the two approaches, in that courteous logic programs may use negation as failure. However, a courteous logic program with negation as failure C can be modularly translated into a program C′ without negation as failure, using a technique suggested in [10]: every rule

r: L ← L1 ∧ . . . ∧ Ln ∧ fail M1 ∧ . . . ∧ fail Mk

can be replaced by the rules:

r: L ← L1 ∧ . . . ∧ Ln ∧ pr
pr ←
¬pr ← M1
. . .
¬pr ← Mk

where pr is a new propositional atom. If we restrict attention to the language of C, the programs C and C′ have the same answer set. Thus, without loss of generality we may assume that a courteous logic program C does not use negation as failure. The corresponding defeasible theory df(C) is obtained by representing every rule in C′ by an equivalent defeasible rule, and by using the same priority relation as C.
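The rule-by-rule replacement above can be sketched mechanically. In the following Python fragment the function name and the tuple encoding (name, head, positive body, failed literals) are ours; the priority relation is left untouched and not modelled.

```python
# A sketch of the modular elimination of negation as failure: each rule
# with fail-literals gets a fresh control atom p_r that holds by
# default and is blocked by each M_j.

def eliminate_naf(rules):
    out = []
    for name, head, pos, failed in rules:
        if not failed:
            out.append((head, pos))
            continue
        p = 'p_' + name                    # fresh atom for rule `name`
        out.append((head, pos + [p]))      # r: L <- L1, ..., Ln, p_r
        out.append((p, []))                # p_r <-
        for m in failed:
            out.append(('~' + p, [m]))     # ~p_r <- M_j
    return out

for head, body in eliminate_naf([('r', 'flies', ['bird'], ['penguin'])]):
    print(head, '<-', ', '.join(body) or 'true')
```

On the (hypothetical) rule flies ← bird ∧ fail penguin this emits the three NAF-free rules flies ← bird ∧ p_r, p_r ←, and ¬p_r ← penguin.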
Then we are able to show that courteous logic programs are a special case of defeasible logic:

Theorem 3. Let C be a courteous logic program. A literal q is in the answer set of C iff df(C) ⊢ +∂q.
5.2 Priority Logic
Priority logic [17,18] is a knowledge representation language where a theory consists of logic programming-like rules, and a priority relation among them. The meaning of the priority relation is that once a rule r is included in an argument, all rules inferior to r are automatically blocked from being included in the same argument. The semantics of priority logic is based on the notion of a stable argument for the credulous case, and the well-founded argument for the sceptical case. Priority logic is a general framework with many instantiations (based on so-called extensibility functions), and supports both credulous and sceptical reasoning. To allow a fair comparison to defeasible logic, one has to impose the following restrictions:
(i) We will only consider defeasible rules in the sense of defeasible logic. That is, we will not distinguish between strict and defeasible rules, and we will restrict attention to rules in which only propositional literals occur (but not more general formulae, as in priority logic). Also, there will be no defeaters.
(ii) The priority/superiority relation will only be defined on pairs of rules with complementary heads.
(iii) We will consider the two basic instantiations of priority logic, as determined by the extensibility functions R1 and R2 (see [17,18] for details).
(iv) We will compare defeasible logic to the sceptical interpretation of priority logic.
Under these conditions, the difference between defeasible logic and priority logic is highlighted by the following example:

r1: quaker ←
r2: republican ←
r3: pacifist ← quaker
r4: ¬pacifist ← republican
r5: footballfan ← republican
r6: antimilitary ← pacifist
r7: ¬antimilitary ← footballfan

The priority relation is empty.
(Obviously in defeasible logic we consider r1–r7 to be defeasible rules.) In priority logic, if we use the extensibility function R1, then the well-founded argument is the set of all rules, and therefore inconsistent. On the other hand, in the defeasible logic version T of the priority logic program, T ⊬ +∂pacifist, so the approaches are different. And if we use the extensibility function R2, then priority logic does not allow one to prove ¬antimilitary. But defeasible logic can prove +∂¬antimilitary. The difference is caused by the fact that defeasible logic does not propagate ambiguity, as extension-based formalisms like priority logic do (for a discussion of this issue see [15]).

5.3 Inheritance Networks
Nonmonotonic inheritance networks [14,9] were an early nonmonotonic reasoning approach which had powerful implementations, even though they lacked declarativity. Moreover they are based on the use of rules and an implicit notion of priority among rules. In [3] it was shown that inheritance networks as defined in [8] can be represented in defeasible logic. We outline the translation below.
A nonmonotonic inheritance network consists of a set of objects, a set of properties, and a set of arcs which is acyclic. Below is a list of the possible kinds of arcs, where a is an object, and p and q are properties (we use a variation of syntax to be consistent with this paper):

a ⇒ p, meaning that a has the property p.
a ⇏ p, meaning that a does not have property p.
p ⇒ q, meaning that an object with property p typically has property q.
p ⇏ q, meaning that an object with property p typically does not have property q.

A nonmonotonic inheritance network N is naturally translated into a defeasible theory T(N):

For every arc a ⇒ p in N include the fact p(a) in T(N).
For every arc a ⇏ p in N include the fact ¬p(a) in T(N).
For every path a ⇒ . . . ⇒ p ⇒ q in N include the rule p(a) ⇒ q(a) in T(N).
For every path a ⇒ . . . ⇒ p ⇏ q in N include the rule p(a) ⇒ ¬q(a) in T(N).

We have omitted the definition of the superiority relation which simulates specificity in the inheritance networks of [8]. The complicated definition is found in [3]. That paper also proposes a way of compiling specificity into the definition of a derivation, which can be used to make the translation of a nonmonotonic inheritance network into a defeasible theory modular.

Result 5.2. Let N be a nonmonotonic inheritance network. Then we may construct a defeasible theory T(N) such that, for every literal q, q is supported by N iff T(N) ⊢ +∂q.
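The arc-to-rule translation can be sketched as follows. The network encoding, the function name, and the example network (the familiar penguin triangle) are ours, and the superiority relation is omitted, as in the text.

```python
from collections import deque

# A sketch of the translation of a nonmonotonic inheritance network
# into (the facts and rules of) a defeasible theory T(N).  Arcs are
# (src, dst, sign), with sign True for => and False for =/>.

def translate(net, obj):
    # facts: direct arcs from the object
    facts = {dst if sign else '~' + dst
             for src, dst, sign in net if src == obj}
    # properties reachable from obj through positive arcs (a => ... => p)
    reach, queue = set(), deque(d for s, d, sg in net if s == obj and sg)
    while queue:
        p = queue.popleft()
        if p in reach:
            continue
        reach.add(p)
        queue.extend(d for s, d, sg in net if s == p and sg)
    # one defeasible rule p(a) => (~)q(a) per arc leaving a reachable p
    rules = {(p, dst if sign else '~' + dst)
             for p in reach
             for src, dst, sign in net if src == p}
    return facts, rules

net = [('tweety', 'penguin', True), ('penguin', 'bird', True),
       ('bird', 'canfly', True), ('penguin', 'canfly', False)]
facts, rules = translate(net, 'tweety')
print(facts)            # the fact penguin(tweety)
print(sorted(rules))    # rules from penguin, bird for the object tweety
```

Here the path tweety ⇒ penguin ⇏ canfly yields the rule penguin(tweety) ⇒ ¬canfly(tweety), alongside the positive rules along the chain.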
6 Conclusion
We have looked at the relationship between four logic programming-based formalisms that employ a priority relation among rules and take a sceptical approach to inference. Three, defeasible logic, LPwNF and courteous logic programs, belong to the same “school” of conservative reasoning in the classification of [16], while priority logic takes a fundamentally different approach, which is evident in its propagation of ambiguity. In addition, a class of nonmonotonic inheritance networks can be embedded into defeasible logic, so it belongs, too, to the school of conservative reasoning, even though it is not a logical formalism. Of the three formalisms in the conservative reasoning school, defeasible logic is the most powerful. It is able to draw more conclusions (from the same rules) than LPwNF can, principally because it argues on the basis of teams of rules. Courteous logic programs also employ teams of rules, but the approach is severely restricted in that the atom dependency graph is required to be acyclic. In addition, of course, defeasible logic makes a distinction between definite knowledge (obtained by facts and strict rules) and defeasible knowledge.
The results of this paper indicate that defeasible logic deserves more attention. In other papers [2,11] we have studied the logic as a formal system, including representation results, properties of the inference relation, and semantics.
References
1. G. Antoniou. Nonmonotonic Reasoning. MIT Press, 1997.
2. G. Antoniou, D. Billington and M.J. Maher. Normal forms for defeasible logic. In Proc. 1998 Joint International Conference and Symposium on Logic Programming, MIT Press, 1998.
3. D. Billington, K. de Coster and D. Nute. A modular translation from defeasible nets to defeasible logic. Journal of Experimental and Theoretical Artificial Intelligence 2 (1990): 151–177.
4. D. Billington. Defeasible Logic is Stable. Journal of Logic and Computation 3 (1993): 370–400.
5. M.A. Covington, D. Nute and A. Vellino. Prolog Programming in Depth. Prentice Hall, 1997.
6. Y. Dimopoulos and A. Kakas. Logic Programming without Negation as Failure. In Proc. ICLP-95, MIT Press, 1995.
7. B.N. Grosof. Prioritized Conflict Handling for Logic Programs. In Proc. Int. Logic Programming Symposium, J. Maluszynski (ed.), 197–211, MIT Press, 1997.
8. J.F. Horty, R.H. Thomason and D. Touretzky. A skeptical theory of inheritance in nonmonotonic semantic networks. In Proc. AAAI-87, 358–363.
9. J.F. Horty. Some direct theories of nonmonotonic inheritance. In D.M. Gabbay, C.J. Hogger and J.A. Robinson (eds.): Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3, Clarendon Press, 1994, 111–187.
10. A.C. Kakas, P. Mancarella and P.M. Dung. The Acceptability Semantics for Logic Programs. In Proc. Eleventh International Conference on Logic Programming (ICLP'94), 504–519, MIT Press, 1994.
11. M.J. Maher, G. Antoniou and D. Billington. A Study of Provability in Defeasible Logic. In Proc. 11th Australian Joint Conference on Artificial Intelligence, LNAI 1502, Springer, 1998, 215–226.
12. D. Nute. Defeasible Reasoning. In Proc. 20th Hawaii International Conference on Systems Science, IEEE Press, 1987, 470–477.
13. D. Nute. Defeasible Logic. In D.M. Gabbay, C.J. Hogger and J.A. Robinson (eds.): Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3, Oxford University Press, 1994, 353–395.
14. D. Touretzky. The Mathematics of Inheritance Systems. Morgan Kaufmann, 1986.
15. D. Touretzky, J.F. Horty and R.H. Thomason. A clash of intuitions: The current state of nonmonotonic multiple inheritance systems. In Proc. IJCAI-87, 476–482, Morgan Kaufmann, 1987.
16. G. Wagner. Ex contradictione nihil sequitur. In Proc. 12th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 1991.
17. X. Wang, J. You and L. Yuan. Nonmonotonic reasoning by monotonic inferences with priority constraints. In Nonmonotonic Extensions of Logic Programming, J. Dix, P. Pereira and T. Przymusinski (eds.), LNAI 1216, Springer, 1997, 91–109.
18. X. Wang, J. You and L. Yuan. Logic programming without default negation revisited. In Proc. IEEE International Conference on Intelligent Processing Systems, IEEE, 1997.
Characterizations of Classes of Programs by Three-Valued Operators

Pascal Hitzler and Anthony Karel Seda
National University of Ireland, Cork, Ireland
[email protected], WWW home page: http://maths.ucc.ie/~pascal/index.html
[email protected], WWW home page: http://maths.ucc.ie/~seda/index.html
Abstract. Several important classes of normal logic programs, including the classes of acyclic, acceptable, and locally hierarchical programs, have the property that every program in the class has a unique two-valued supported model. In this paper, we call such classes unique supported model classes. We analyse and characterize these classes by means of operators on three-valued logics. Our studies will motivate the definition of a larger unique supported model class which we call the class of Φ∗-accessible programs. Finally, we show that the class of Φ∗-accessible programs is computationally adequate in that every partial recursive function can be implemented by such a program.
1 Introduction
A good deal of recent research in logic programming has been put into the determination of standard, or intended, models for normal logic programs. Some standard semantics, such as the well-founded semantics ([14]) or the stable model semantics ([15]), are applicable to very large classes of programs. However, whilst the general applicability of these semantics is certainly desirable, the study of these large classes of programs has a natural practical limitation: it is possible to assign standard models to logic programs for which useful interpreters have not yet been implemented, and for which it is questionable whether or not this ever will be possible. It is therefore reasonable to study smaller classes of programs whose behaviour is more controlled, so long as these classes are large enough for practical purposes. On the other hand, certain classes of logic programs have been defined purely in order to study termination and computability properties. For instance, the acyclic programs of Cavedon [8] (initially called locally ω-hierarchical programs by him) are precisely the terminating programs, and were shown by Bezem [7] to be able to compute all the total computable functions; see also [1]. Next, the class of acceptable programs ([3]) was introduced by Apt and Pedreschi. Such programs are left-terminating and, conversely, left-terminating non-floundering programs are acceptable. In fact, the class of all acceptable programs strictly contains the acyclic programs but, nevertheless, is not computationally adequate, i.e. not every partial recursive function can be implemented by such a program. Finally, the class of all locally hierarchical programs was introduced in [8]. However, this class, which also contains all acyclic programs, is computationally adequate under Prolog if the use of safe cuts is allowed ([23]).

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 357–371, 1999.
© Springer-Verlag Berlin Heidelberg 1999

All the programs contained in the classes mentioned in the previous paragraph have a common property: they have unique supported models. These classes will be called here unique supported model classes. In fact, they even have unique three-valued models under Fitting's Kripke-Kleene semantics ([11]). Thus, the programs in question leave little doubt about the semantics, i.e. the model, which is to be assigned to them as standard model and, in addition, they have interesting computational properties under existing interpreters, as noted above. In this paper, we will analyse and characterize unique supported model classes by means of certain three-valued logics, and study computability properties of these. In particular, in Section 2 we will introduce three different three-valued logics and their associated consequence operators, and study the relationships between them. In Sections 3.1 and 3.2, we will characterize acceptable and locally hierarchical programs by means of the behaviour of these operators. We will also give constructions of their canonical level mappings. Prompted by the studies of acceptable and locally hierarchical programs, we will define a new class of programs which we call the Φ∗-accessible programs. We study this class in Section 3.3, where it is shown that the Φ∗-accessible programs contain the acceptable and the locally hierarchical programs.
Moreover, we will show that each Φ∗-accessible program has a unique supported model, that each has a canonical level mapping, and that the class of Φ∗-accessible programs is computationally adequate under SLDNF-resolution. Many-valued logics have been employed in several studies of the semantics of logic programs. In particular, they have been used to assign special truth values to atoms which possess certain computational behaviour such as being non-terminating ([11,20]), being ill-typed ([21]), being floundering ([4]), or failing when backtracking ([6]). The motivation for the definitions of the three-valued logics we will be using in this paper comes from a couple of sources. Primarily, these logics are formulated in order to allow for easy analysis and characterization of the programs or classes of programs in question by using the logic to mimic the defining property of the program or class of programs. This idea is akin to some of those considered in the papers just cited, see also [6], and is a component of work being undertaken by the authors in [16], where a program transformation which outputs a locally hierarchical program, when input an acceptable one, is used in the characterization of acceptable programs given in [16]. Natural questions, partly answered here, then arise as to the different ways that different classes of programs can be characterized. On the other hand, the present work can also be viewed as a contribution to the asymmetric semantics proposed by Fitting and Ben-Jacob in [13] where it is noted that certain differences between Pascal,
LISP and Prolog, for example, are easily described in terms of three-valued logic. Thus, [13] is also a source of motivation for our definitions. However, we note that all programs analysed herein do have unique supported models; therefore the third truth value undefined will only be used for obtaining the unique supported two-valued model. Hence, interpretations of undefined from the point of view of computation (such as non-halting) are not actually necessary in this paper.

Preliminaries and Notation

Our notation basically follows [18], but we will include next a short review of the main terminology used. Given a normal logic program P, we work over an arbitrary preinterpretation J (complete generality is needed in [16] and hence also in this companion paper). We refer to variable assignments which map into the domain D of J as J-variable assignments; the underlying first order language of P will be denoted by L. By BP,J, we denote the set of all J-ground instances of atoms in L. Thus, BP,J is the set of all p(d1, . . . , dn), where p is an n-ary predicate symbol in L and d1, . . . , dn ∈ D. An element A = p(d1, . . . , dn) of BP,J is called a J, v-(ground) instance or J-(ground) instance of an atomic formula A′ = p(t1, . . . , tn) in L if there exists a J-variable assignment v such that A′ | v = A, meaning that ti | v = di for i = 1, . . . , n, where t | v is the denotation of a term t relative to J and v. Since each ti | v ∈ D, any J-instance of A′ is variable free. This extends easily to literals L, where L = ¬A′ = ¬p(t1, . . . , tn), say. Thus, the symbol ¬p(d1, . . . , dn) is called a J, v-(ground) instance or J-(ground) instance of the literal L if there exists a J-variable assignment v such that p(t1, . . . , tn) | v = p(d1, . . . , dn). We often loosely refer to J-ground instances of atoms and of literals as J-ground atoms and J-ground literals respectively, or even as ground atoms and ground literals respectively if J is understood.
In accordance with [22, Definition 1], we write groundJ(P) for the set of all J-(ground) instances of clauses, or J-ground clauses, or simply ground clauses, in P; the latter term being used, of course, when again J is understood. Thus, typically, if A′ ← L1, . . . , Ln is a clause in P, then A′ | v ← L1 | v, . . . , Ln | v is an element of groundJ(P), where v is a J-variable assignment such that A = A′ | v is a J-instance of A′ and Li | v is a J-instance of Li for i = 1, . . . , n. All elements of groundJ(P) are obtained thus from some clause and some J-variable assignment.

Example 1. As an example of a normal logic program, we give the following program from [3] for computing the transitive closure of a graph.

r(X, Y, E, V) ← m([X, Y], E)
r(X, Z, E, V) ← m([X, Y], E), ¬m(Y, V), r(Y, Z, E, [Y|V])
m(X, [X|T]) ←
m(X, [Y|T]) ← m(X, T)
e(a) ←        for all a ∈ N
Here, N denotes a finite set containing the nodes appearing in the graph as elements. In the program, uppercase letters denote variable symbols, lowercase
letters constant symbols, and lists are written using square brackets as usual under Prolog. One evaluates a goal ← r(x, y, e, [x]) where x and y are nodes and e is a graph specified by a list of pairs denoting its edges. The goal is supposed to succeed when x and y can be connected by a path in the graph. The predicate m implements membership of a list. The last argument of the predicate r acts as an accumulator which collects the list of nodes which have already been visited in an attempt to reach y from x. The transitive closure program has been studied in detail in [3,12]. The set of all two-valued interpretations based on J for a given normal program P will be denoted by IP,J. Elements of IP,J are called J-interpretations and are called J-models of P if they are also models of P. The set IP,J is a complete lattice with respect to the ordering ⊆ defined by I ⊆ K if and only if I |= A implies K |= A for every A ∈ BP,J. In order to simplify notation, we note that IP,J can be identified with the power set 2^BP,J and the ordering ⊆ is then indeed set-inclusion. For I ∈ IP,J, we set ∁I = BP,J \ I. With this convention and following [22, Section 2], in classical two-valued logic we write I |= p(d1, . . . , dn) (respectively I |= ¬p(d1, . . . , dn)) if p(d1, . . . , dn) ∈ I (respectively p(d1, . . . , dn) ∉ I). By abusing the meaning of conjunction, and its notation, in the obvious way (see [22, Section 2]), it is now meaningful to write I |= L1 | v, . . . , Ln | v, where L1 | v, . . . , Ln | v denotes a "conjunction" L1 | v ∧ . . . ∧ Ln | v of J-instances of literals. The immediate consequence operator TP,J for a given program P is defined as usual as a mapping on IP,J as follows (where body denotes a conjunction of J-instances of literals):

TP,J(I) = {A ∈ BP,J | there exists A ← body in groundJ(P) with I |= body}.
Finally, recall from [2] that a two-valued J-interpretation M is a supported J-model of P if and only if M (together with Clark's Equality Theory) is a J-model of the Clark completion of P, if and only if TP,J(M) = M.
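As a small illustration (the encoding with '~' for negation as failure is ours), the following sketch computes TP on a ground propositional program and tests TP(M) = M. The chosen program has two supported models, which is exactly the situation that unique supported model classes rule out.

```python
# A sketch of the two-valued immediate consequence operator T_P on a
# ground propositional program, and the supported-model test T_P(M) = M.

def t_p(program, interp):
    """program: list of (head, body) ground clauses; interp: set of
    atoms taken to be true."""
    def holds(lit):
        if lit.startswith('~'):
            return lit[1:] not in interp
        return lit in interp
    return {head for head, body in program if all(holds(l) for l in body)}

def supported(program, interp):
    return t_p(program, interp) == interp

P = [('p', ['~q']), ('q', ['q'])]
print(supported(P, {'p'}))    # {p} is a supported model
print(supported(P, {'q'}))    # so is {q}: no unique supported model here
```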
2 Three-Valued Semantics
A three-valued J-interpretation of a program P is a pair (T, F) of disjoint sets T, F ⊆ BP,J. Given such a J-interpretation I = (T, F), a J-ground atom A is true (t) in I if A ∈ T, false (f) in I if A ∈ F, and undefined (u) otherwise; ¬A is true in I iff A is false in I, ¬A is false in I iff A is true in I, and ¬A is undefined in I iff A is undefined in I. Given I = (T, F), we denote T by I+ and F by I−. Thus, I = (I+, I−). If I+ ∪ I− = BP,J, we call I a total three-valued J-interpretation of the program P. Total three-valued interpretations can be identified with elements of IP,J. Given a program P, the set IP,J,3 of all three-valued J-interpretations of P forms a complete partial order (in fact, a complete semi-lattice) with the ordering ≤ defined by

I ≤ K if and only if I+ ⊆ K+ and I− ⊆ K−
with least element (∅, ∅), which we will denote by ⊥. Notice that total three-valued J-interpretations are maximal elements in this ordering. In our present context, it will be sufficient to give truth tables for conjunction and disjunction, and we will make use of three different three-valued logics which we are now going to define. It should be noted here that the truth tables for disjunction are the same in all three logics and that disjunction is commutative. The first logic, which we will denote by L1, evaluates conjunction as in Fitting's Kripke-Kleene semantics [11] (in fact, as in Kleene's strong three-valued logic, see [13]). Fitting's work built on [20] and was subsequently studied in the literature by Kunen in [17], Apt and Pedreschi in [3], and Naish in [21]. Disjunction will be evaluated differently though, as indicated by the truth table in Table 1.

Table 1. Truth tables for the logics L1, L2, and L3
         Logic L1     Logic L2     Logic L3
p q      p∧q  p∨q     p∧q  p∨q     p∧q  p∨q
t t       t    t       t    t       t    t
t u       u    u       u    u       u    u
t f       f    t       f    t       f    t
u t       u    u       u    u       u    u
u u       u    u       u    u       u    u
u f       f    u       u    u       u    u
f t       f    t       f    t       f    t
f u       f    u       f    u       u    u
f f       f    f       f    f       f    f
Operator ΦP,1         ΦP,2         ΦP,3
The second three-valued logic, L2, will be used for studying acceptable programs and is non-commutative under conjunction. It will be sufficient to evaluate u ∧ f to u instead of f, leaving the truth table for L1 otherwise unchanged. This way of defining conjunction was employed in [4] and [6], see also the discussion of LISP in [13]. The truth table is again given in Table 1. The third logic, L3, will be used for studying locally hierarchical and acyclic programs. For this purpose, we use a commutative version of L2 where we evaluate f ∧ u to u instead of f, see the discussion in [13] of Kleene's weak three-valued logic in relation to Pascal. The truth table is shown in Table 1. Let P be a normal logic program, and let Li denote one of the three-valued logics above, where i = 1, 2 or 3. Corresponding to each of these logics we define an operator FP,J on IP,J,3 as follows. For I ∈ IP,J,3, let FP,J(I) = (T, F) where T denotes the set

{A ∈ BP,J | there is A ← body ∈ groundJ(P) s.t. body is truei in I},

and F denotes the set
{A ∈ BP,J | for every A ← body ∈ groundJ(P), body is falsei in I}.

Of course, truei and falsei here denote truth respectively falsehood in the logic Li. Notice that if A is not the head of any clause in P, then A is false in FP,J(I) for any I. It is clear that FP,J is monotonic in all three cases. We set

FP,J ↑ 0 = ⊥,
FP,J ↑ α = FP,J(FP,J ↑ (α − 1)) for α a successor ordinal, and
FP,J ↑ α = ∪β<α FP,J ↑ β for α a limit ordinal.

Since FP,J is monotonic, it has a least fixed point which is equal to FP,J ↑ α for some ordinal α called the closure ordinal of P (for the chosen logic Li). Throughout the sequel, we will denote FP,J by ΦP,1, ΦP,2 or ΦP,3 if the chosen logic is correspondingly L1, L2 or L3. The appropriate symbol is also included in Table 1 for ease of reference. Note that the behaviour of each of these operators depends only on the evaluation of conjunction. In fact, ΦP,1 is the very same operator as used in [11].

Proposition 1. Let P be a normal logic program and let I, I′, I″ ∈ IP,J,3 be such that I ≤ I′ ≤ I″. Then we have ΦP,3(I) ≤ ΦP,2(I′) ≤ ΦP,1(I″).

Proof. The following observations are clear from the given truth tables, and indeed suffice. If a body of a clause is true (false) in L3, then it is true (false) in L2. If it is true (false) in L2, then it is true (false) in L1.

The following result is taken from [16], generalizing a result in [3].

Proposition 2. Let P be a normal logic program and let I = (I+, ∁I+) be a total three-valued J-interpretation for P. Then I is a fixed point of ΦP,1 if and only if I+ is a fixed point of TP,J. Furthermore, if ΦP,1 has exactly one fixed point M and M is total, then M+ is the unique fixed point of TP,J.

Proposition 3. Let P be a normal logic program, let FP,J denote ΦP,i, for i = 1, 2, 3, and assume that M = FP,J ↑ α is total, where α is the corresponding closure ordinal of P. Then M+ is the unique two-valued supported J-model of P.

Proof. By totality of M and the previous results we obtain M+ as a fixed point of TP,J. Since M is the least fixed point of FP,J and is maximal in IP,J,3, it is the unique fixed point of FP,J, which finishes the proof.
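As a small illustration of Proposition 3 (the encoding is ours; ground propositional case only), the following sketch iterates ΦP,1 to its least fixed point for the program {p ← ¬q}. The fixed point is total, so its positive part is the unique supported model.

```python
# A sketch of the Kripke-Kleene iteration Φ_{P,1} ↑ n for the ground
# program {p ← ¬q}, with '~' for ¬ and programs as dicts mapping each
# head to its list of bodies.  Since q heads no clause it becomes
# false, the least fixed point is total, and {p} is the unique
# supported model.

def phi(program, atoms, interp):
    true, false = interp
    def v(lit):                       # three-valued value of a literal
        a = lit.lstrip('~')
        pos, neg = a in true, a in false
        if lit.startswith('~'):
            pos, neg = neg, pos
        return 't' if pos else 'f' if neg else 'u'
    def body_val(body):               # strong Kleene conjunction (L1)
        vals = [v(l) for l in body]
        return 'f' if 'f' in vals else 'u' if 'u' in vals else 't'
    t = {a for a in atoms
         if any(body_val(b) == 't' for b in program.get(a, []))}
    f = {a for a in atoms
         if all(body_val(b) == 'f' for b in program.get(a, []))}
    return t, f

P, atoms = {'p': [['~q']]}, {'p', 'q'}
I = (set(), set())                    # Φ ↑ 0 = ⊥
while phi(P, atoms, I) != I:          # iterate to the least fixed point
    I = phi(P, atoms, I)
print(I)                              # a total interpretation
```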
Given a J-ground atom A which occurs as the head of an element A ← C of groundJ (P ), we form the J-pseudo clause, or simply pseudo clause, A ← ∨i Ci whose body ∨i Ci is the (possibly infinite) disjunction of the bodies Ci of all clauses in groundJ (P ) whose head is A; we call A the head of the pseudo clause A ← ∨i Ci . The set of all such pseudo clauses will be denoted by P ∗ . It will be
convenient to assign "truth" values to ∨iCi relative to the logics Li by in fact assigning truth values to arbitrary disjunctions of literals and then employing the same sort of abuse for "disjunctions" of J-ground literals which was established earlier for conjunction. This is done as follows: ∨iCi will be assigned value true (t) iff at least one Ci is true and none are undefined; it will be assigned value undefined (u) iff at least one Ci is undefined; it will be assigned value false (f) iff all the Ci are false. These definitions are the natural extension to possibly infinite disjunctions of the values given iteratively to finite disjunctions by the truth tables in Table 1. Letting FP,J denote any one of the ΦP,i, for i = 1, 2, 3, we define an operator FP∗ on IP,J,3 as follows. For I ∈ IP,J,3, set FP∗(I) = (T, F), where T is the set of all J-ground atoms which occur as the head of a pseudo clause in P∗ whose body is true in I, and F is the set of all J-ground atoms which occur as the head of a pseudo clause whose body is false in I. As before, ΦP∗,i will denote FP∗ when the chosen logic is Li, i = 1, 2, 3. Note that FP∗ is again monotonic for any choice of underlying logic. Ordinal powers FP∗ ↑ α are defined as for FP,J.

Example 2. We give an example illustrating the program transformation P∗. Let P be the (propositional) program

a ← b
a ← c
b ←
c ← c

then P∗ is

a ← b ∨ c
b ←
c ← c

Let I be the three-valued interpretation ({b}, ∅). Then ΦP,1(I) = ({a, b}, ∅), which is also the least fixed point of ΦP,1. However, since c is undefined in I, we have ΦP∗,1(I) = ({b}, ∅), which is the least fixed point of ΦP∗,1. The difference between ΦP,1 and ΦP∗,1 results from the way in which disjunction is defined; see the following proposition, Proposition 4. In fact, in this context it is worth noting an observation made by one of the referees of this paper, as follows.
In classical two-valued logic, the programs (a ← b) ∧ (a ← c) and a ← (b ∨ c) are equivalent simply because of the distributive laws and De Morgan's law that ¬b ∧ ¬c and ¬(b ∨ c) are equivalent. In the logics Li, i = 1, 2, 3, ¬b ∧ ¬c and ¬(b ∨ c) are not equivalent, as can easily be verified by, for example, taking b to be true and c to be undefined. In fact, the rule a ← (b ∨ c) with disjunctive body is weaker (leaves more undefined) than the two separate rules a ← b and a ← c.

Proposition 4. Let P be a normal logic program and let I, I′, I″ ∈ IP,J,3 be such that I ≤ I′ ≤ I″. Then we have
364
P. Hitzler and A.K. Seda
ΦP∗,3(I) ≤ ΦP∗,2(I′) ≤ ΦP∗,1(I″), and for F denoting any of the Φi, for i = 1, 2, 3, we have FP∗(I) ≤ FP,J(I)
and
FP∗(I)− = FP,J(I)−.
Proof. The proof is along the same lines as the proof of Proposition 1 noting that in a disjunction ∨i Ci which is true, no Ci is undefined.
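The truth tables of Table 1 are not reproduced in this excerpt, so the following sketch assumes Kleene's strong three-valued conjunction for L1; the disjunction is the one defined above. With that assumption, Example 2 can be checked mechanically for positive propositional programs (all Python names are ours, not the paper's):

```python
# Three-valued interpretations are pairs (true_set, false_set); atoms in
# neither set are undefined.
T, U, F = "t", "u", "f"

def atom_val(a, interp):
    true_set, false_set = interp
    return T if a in true_set else F if a in false_set else U

def conj_L1(vals):
    # Assumed Kleene strong conjunction for L1: false dominates undefined.
    return F if F in vals else U if U in vals else T

def disj(vals):
    # The disjunction defined in the text: any undefined disjunct makes
    # the whole disjunction undefined (unlike Kleene's strong disjunction).
    return U if U in vals else T if T in vals else F

def phi(program, interp):
    # Fitting-style operator for P: a head becomes true if some clause
    # body is true, false if every clause body for it is false.
    heads = {h for h, _ in program}
    vals = {h: [conj_L1([atom_val(a, interp) for a in body])
                for hh, body in program if hh == h] for h in heads}
    return ({h for h, vs in vals.items() if T in vs},
            {h for h, vs in vals.items() if all(v == F for v in vs)})

def phi_star(program, interp):
    # Operator for P*: one pseudo clause per head, whose body is the
    # disjunction of all the clause bodies for that head.
    heads = {h for h, _ in program}
    val = {h: disj([conj_L1([atom_val(a, interp) for a in body])
                    for hh, body in program if hh == h]) for h in heads}
    return ({h for h, v in val.items() if v == T},
            {h for h, v in val.items() if v == F})

# Example 2:  a <- b.  a <- c.  b.  c <- c.
P = [("a", ["b"]), ("a", ["c"]), ("b", []), ("c", ["c"])]
I = ({"b"}, set())
print(phi(P, I))       # a and b true, as computed for Phi_{P,1} in Example 2
print(phi_star(P, I))  # only b true: the undefined c blocks the disjunction
```

Running the sketch reproduces Example 2: the clause a ← b alone makes a true under ΦP,1, while the pseudo clause a ← b ∨ c leaves a undefined because c is.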
3 Unique Supported Model Classes

3.1 Acceptable Programs
Acceptable programs were introduced in [3] and were shown to be strongly related to left-terminating programs. Given a normal logic program P , a level mapping for P is a mapping l from J-ground atoms to an ordinal α. We always assume that l is extended to J-ground literals by setting l(¬A) = l(A) for every J-ground atom A. A level mapping which maps into ω will be called an ω-level mapping. Following [3], we define a subset P − of P as follows. Definition 1. Let P be a normal logic program and let p, q be predicate symbols occurring in P . (i) We say that p refers to q if there is a clause in P with p in its head and q in its body. (ii) We say that p depends on q if (p, q) is in the reflexive, transitive closure of the relation refers to. (iii) The set of predicate symbols in P which occur in a negative literal in the body of a clause in P is denoted by NegP . (iv) The set of predicate symbols in P on which the predicate symbols in NegP depend is denoted by Neg∗P . For convenience, we will denote this set simply by N. (v) We define P − to be the set of clauses in P which contain a predicate symbol from N in the head. The following definition is the generalization to an arbitrary preinterpretation J of the definition of acceptability given in [3] for the Herbrand preinterpretation. Definition 2. Let P be a normal logic program, let l be an ω-level mapping for P , and let I be a (two-valued) J-model for P whose restriction to the predicate symbols in N is a supported J-model of P − . Then P is called J-acceptable with respect to l and I if, for every clause A ← L1 , . . . , Ln in groundJ (P ), the following implication holds for all i ∈ {1, . . . , n}: if I |=
L1 ∧ · · · ∧ Li−1 then l(A) > l(Li).
A program is called J-acceptable with respect to l if l is a level mapping and there exists a J-model I such that the program is J-acceptable with respect to l and I. A program is called J-acceptable, or just acceptable if J is understood, if it is J-acceptable with respect to some level mapping and some J-model.
Example 3. The transitive closure program given in Example 1 is Herbrand-acceptable; for details of the model and level mapping required, see [3].

We are able to characterize J-acceptable programs by means of the operator ΦP∗,2, and we do this next. We will need the following proposition from [16].

Proposition 5. Suppose that P is J-acceptable with respect to a level mapping l. Then MP,J = ΦP,1 ↑ ω is total, M⁺P,J is the unique supported J-model of P, and P is J-acceptable with respect to l and M⁺P,J.

Lemma 1. Let P be J-acceptable. Then M = ΦP∗,2 ↑ ω is total. Furthermore, M = ΦP,2 ↑ ω, and M⁺ is the unique supported J-model of P.

Proof. Let l be a level mapping with respect to which P is J-acceptable. By Proposition 5, P is J-acceptable with respect to l and M⁺P,J. Assume that there is a J-ground atom A which is undefined in M. Without loss of generality we can assume that l(A) is minimal. Then by definition of L2, there is precisely one pseudo clause in P∗ of the form A ← ∨i Ci in which at least one of the Ci, say C1, is undefined. Thus, there must occur a left-most J-ground body literal B in C1 which is undefined in M, and this ground literal is to the left in C1 of the first ground literal which is false in M. Hence, all ground literals occurring to the left of B must be true in M. Since M ≤ MP,J by Proposition 4, all these ground literals must also be true in M⁺P,J. By acceptability of P we therefore conclude that l(B) < l(A), contradicting the minimality of l(A). By Proposition 4, the second statement holds. The last statement follows from Proposition 3.

Definition 3. Let P be J-acceptable. Define its canonical level mapping as follows: lP(A) is the lowest ordinal α such that A is not undefined in ΦP∗,2 ↑ (α + 1).

Proposition 6. Let P be J-acceptable. Then lP is an ω-level mapping and P is J-acceptable with respect to lP and MP,J.
Furthermore, if l is another level mapping with respect to which P is J-acceptable, then lP(A) ≤ l(A) for all A ∈ BP,J. In particular, lP is exactly the canonical level mapping defined in [16].

Proof. By the previous lemma, lP is indeed an ω-level mapping. Let A be the head of a J-ground clause C in P with lP(A) = n. Then the body ∨i Ci of the corresponding pseudo clause in P∗ is either true or false (i.e. is not undefined) in N = ΦP∗,2 ↑ n. If ∨i Ci is true, each Ci evaluates to true or false in N. If Ci evaluates to true in N (and at least one must), then all J-ground literals in Ci are true in N, and therefore have level less than or equal to n − 1. If Ci evaluates to false in N, then there must be a ground literal in Ci which is false in N such that all ground literals occurring to the left of it are true in N. Moreover all these ground literals are not undefined in N and hence have level less than or equal to n − 1. A similar argument applies if ∨i Ci is false in N. Since N ≤ MP,J, it is now clear that the clause C satisfies the condition of acceptability given in Definition 2 with respect to lP and MP,J.
Now let l be another level mapping with respect to which P is J-acceptable. By Proposition 5, P is J-acceptable with respect to l and MP,J. Let A ∈ BP,J with l(A) = n. We show by induction on n that l(A) ≥ lP(A). If n = 0, then A appears only as the head of unit clauses, and therefore lP(A) = 0. Now let n > 0. Then in every clause with head A, the left prefix of the corresponding body, up to and including the first ground literal which is false in MP,J, contains only ground literals L with l(L) < n. By the induction hypothesis, lP(L) < n for all these ground literals L and, consequently, lP(A) ≤ l(A) by definition of lP. The last statement follows from [16], where it is shown that the given minimality property characterizes lP.

We are now in a position to characterize J-acceptable programs.

Theorem 1. Let P be a normal logic program. Then P is J-acceptable if and only if M = ΦP∗,2 ↑ ω is total.

Proof. By Lemma 1 it remains to show that totality of M implies acceptability. Define the ω-level mapping lP for P as in Definition 3. Since M is total, lP is indeed an ω-level mapping for P. We will show that P is J-acceptable with respect to lP and M. Arguing as in the proof of the previous proposition, let A be the head of a J-ground clause C in P with lP(A) = n. Then the corresponding body evaluates to true or false in N = ΦP∗,2 ↑ n. If it evaluates to true in N, then all J-ground literals in C are true in N, and therefore have level less than or equal to n − 1. If it evaluates to false in N, then there must be a ground literal in C which is false in N such that all ground literals occurring to the left of it are true in N. Again, all these ground literals are not undefined in N and hence have level less than or equal to n − 1. Since N ≤ M, the clause C satisfies the condition of acceptability given in Definition 2.
In [19], it was shown that the class of programs which terminate under Chan's constructive negation ([10]) coincides with the class of programs which are acceptable with respect to a model based on a preinterpretation whose domain is the Herbrand universe and contains infinitely many constant and function symbols. We therefore obtain the following result.

Theorem 2. A normal logic program P terminates under Chan's constructive negation if and only if ΦP∗,2 ↑ ω is total, where ΦP∗,2 is computed with respect to a preinterpretation whose domain is the Herbrand universe and contains infinitely many constant and function symbols.

3.2 Locally Hierarchical Programs
Locally hierarchical programs were introduced in [8], for the special case of the Herbrand base, as a natural generalization of acyclic programs. They were further studied in [9] and in [23] (and also called strictly level-decreasing there). Here, we consider them over an arbitrary preinterpretation J and our definition and subsequent results are therefore completely general.
Definition 4. A normal logic program P is called locally hierarchical if there exists a level mapping l : BP,J → α, where α is some countable ordinal, such that for every clause A ← L1, . . . , Ln in groundJ(P) we have l(A) > l(Li) for all i. If, further, α = ω, we call P acyclic.

We will now give a new characterization of these programs along the lines of Theorem 1, using the operator ΦP∗,3.

Lemma 2. Let P be locally hierarchical with respect to the level mapping l and let A ∈ BP,J be such that l(A) = α. Then A is true or false in ΦP∗,3 ↑ (α + 1). In particular, there exists an ordinal αP such that ΦP∗,3 ↑ αP is total.

Proof. The proof is by transfinite induction on α. The base case follows directly from the fact that if α = 0, then A appears as head of unit clauses only. Now let α = β + 1 be a successor ordinal. Then all J-ground literals appearing in bodies of clauses with head A have level less than or equal to β. By the induction hypothesis, they are all not undefined in ΦP∗,3 ↑ (β + 1) and therefore A is either true or false in ΦP∗,3 ↑ (α + 1). If α is a limit ordinal, then all ground literals occurring in bodies of clauses with head A have level strictly less than α. Hence, by the induction hypothesis and since α is a limit ordinal, all these ground body literals are not undefined in ΦP∗,3 ↑ α, and therefore A is true or false in ΦP∗,3 ↑ (α + 1).

Corollary 1. Let P be a locally hierarchical program with level mapping l : BP,J → α and let M = ΦP,1 ↑ α. Then M is total and MP,J = M⁺ is the unique supported J-model of P.

Proof. By Propositions 1 and 4, we have ΦP∗,3 ↑ β ≤ ΦP,3 ↑ β ≤ ΦP,1 ↑ β for all ordinals β. Since ΦP∗,3 ↑ α is total by Lemma 2, the given statement holds using Proposition 3.

Definition 5. Let P be locally hierarchical. Define the canonical level mapping lP of P as a function lP : BP,J → αP where lP(A) is the least ordinal α such that A is true or false in ΦP∗,3 ↑ (α + 1).

Proposition 7.
Let P be locally hierarchical with respect to some level mapping l. Then lP is a level mapping for P and, for all A ∈ BP,J, we have lP(A) ≤ l(A). Furthermore, the notion of canonical level mapping as defined here coincides with the same notion defined by different methods in [23].

Proof. The mapping lP is indeed a level mapping by Lemma 2. Let A ∈ BP,J with l(A) = α. We show the given minimality statement by transfinite induction on α. If α = 0, then A appears as the head of unit clauses only, and so lP(A) = 0. If α = β + 1 is a successor ordinal, then all J-ground literals L occurring in bodies of clauses with head A have level l(L) ≤ β. By the induction hypothesis, we obtain lP(L) ≤ β for all those ground literals, and so lP(A) ≤ α = l(A) by construction of lP. If α is a limit ordinal, then all ground literals L occurring in bodies of clauses with head A have level l(L) < α. Since lP(L) ≤ l(L) and since
α is a limit ordinal, we obtain that all these ground literals L are not undefined in ΦP∗,3 ↑ α and therefore lP(A) ≤ α = l(A) as desired. The last statement follows since the minimality property just proved characterizes the canonical level mapping, as was shown in [23].

Note that it is an easy corollary of the previous results that if a program P is acyclic, then ΦP∗,3 ↑ ω is total.

Theorem 3. A normal logic program P is locally hierarchical if and only if ΦP∗,3 ↑ α is total for some ordinal α. It is acyclic if and only if ΦP∗,3 ↑ ω is total.

Proof. Let P be a normal logic program such that ΦP∗,3 ↑ α is total for some α. We define a mapping l : BP,J → α by analogy with the definition of the canonical level mapping for locally hierarchical programs. From the definition of L3 it is now obvious that P is indeed locally hierarchical with canonical level mapping l. The reverse was shown in the previous proposition. The statement for acyclic programs now follows immediately as well.

3.3 Φ∗-Accessible Programs
Our investigations of J-acceptable and locally hierarchical programs suggest we define a class of programs by the property that ΦP∗,1 ↑ α is total for some ordinal α. We will do this next and show also that this class is computationally adequate.

Definition 6. A normal logic program P will be called a Φ∗-accessible program if ΦP∗,1 ↑ α is total for some ordinal α.

Theorem 4. Every Φ∗-accessible program has a unique supported J-model. Furthermore, the class of Φ∗-accessible programs contains all J-acceptable and all locally hierarchical programs.

Proof. Immediate by Propositions 3 and 4.

Definition 7. The canonical level mapping l∗ for a given Φ∗-accessible program is defined as follows. For every A ∈ BP,J, set l∗(A) = α, where α is the minimal ordinal such that A is true or false in ΦP∗,1 ↑ (α + 1).

The following is immediate by Proposition 4.

Proposition 8. If P is J-acceptable or locally hierarchical with canonical level mapping lP, then l∗(A) ≥ lP(A) for all J-ground atoms A.

Proposition 9. Let P be Φ∗-accessible with unique supported J-model M. Let C be an arbitrary element of groundJ(P), let A be its head, and let l∗(A) = α. Then the following property (∗) holds: Either the body of C is true in M, in which case every J-ground literal L in this body has level l∗(L) < α, or there exists a ground body literal B in C which is false in M, and in this case l∗(B) < α. Furthermore, if l is a level mapping for P which satisfies (∗), then l∗(A) ≤ l(A) for every A ∈ BP,J.
Proof. Since P is Φ∗-accessible, every body of every J-ground clause with head A is either true or false in ΦP∗,1 ↑ α. In particular, the body of C is true or false in ΦP∗,1 ↑ α. If it is true, then all J-ground literals L in the body are true in ΦP∗,1 ↑ α and so l∗(L) < α by definition of l∗. If the body is false, then there is a ground body literal B which is false in ΦP∗,1 ↑ α, and again by definition of l∗ we obtain l∗(B) < α. The minimality property of l∗ is shown by transfinite induction along the same lines as in the proofs of Propositions 6 and 7.

It was shown in [23] that the class of all locally hierarchical programs is computationally adequate in the sense that every partial recursive function can be computed with such a program if the use of safe cuts is allowed. For Φ∗-accessible programs, the cut need not be used, and we will show this next. The proof basically shows that given a partial recursive function, there is a definite program as given in [18] which computes that function. This program will turn out to be a Φ∗-accessible program.

Theorem 5. Let f be a partial recursive function. Then there exists a definite Φ∗-accessible program which computes f.

Proof. We will make use of the definite program Pf given in [18, Theorem 9.6], and we refer the reader to the proof of this theorem for details. It is easily seen that we have to consider the minimalization case only. In [18], the following program Pf was given as an implementation of a function f which is the result of applying the minimalization operator to a partial recursive function g, which is in turn implemented by a predicate pg. We abbreviate X1, . . . , Xn by X.

pf(X, Y) ← pg(X, 0, U), r(X, 0, U, Y)
r(X, Y, 0, Y) ←
r(X, Y, s(V), Z) ← pg(X, s(Y), U), r(X, s(Y), U, Z)

This program is not Φ∗-accessible. However, we can replace it with a program P′f which has the same procedural behaviour and is Φ∗-accessible.
In fact, we replace the definition of r by

r(X, Y, 0, Y) ←
r(X, Y, s(V), Z) ← pg(X, s(Y), U), r(X, s(Y), U, Z), lt(Y, Z)

where the predicate lt is in turn defined as

lt(0, s(X)) ←
lt(s(X), s(Y)) ← lt(X, Y)

and is obviously Φ∗-accessible. By a straightforward analysis of the original program Pf, it is clear that the addition of lt(Y, Z) in the second defining clause of r does not alter the behaviour of the program. Since lt and pg are Φ∗-accessible, it is now easy to see that r is Φ∗-accessible, and so therefore is P′f.
It is worth noting that negation is not needed here in order to obtain full computational power, so Theorem 5 strengthens the result of [18] referred to in its proof. By contrast, as already noted, definite locally hierarchical programs seem not to provide full computational power. Regardless of some known drawbacks in SLDNF-resolution, it is interesting to know that relative to it the class of all Φ∗-accessible programs has full computational power – neither the class of acyclic nor even the class of J-acceptable programs has this property.
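The fact, noted in the Conclusions, that not every definite program is Φ∗-accessible can be seen on the smallest example: for p ← p the body of the pseudo clause stays undefined at every stage, so ΦP∗,1 ↑ α is never total. A self-contained sketch (again assuming Kleene's strong conjunction for L1; names are ours):

```python
# A totality check for Phi_{P*,1} on finite propositional programs; for
# such programs the iteration stabilizes after finitely many steps, so
# no transfinite stages are needed.
T, U, F = "t", "u", "f"

def atom_val(a, interp):
    true_set, false_set = interp
    return T if a in true_set else F if a in false_set else U

def conj_L1(vals):
    # Assumed Kleene strong conjunction for L1.
    return F if F in vals else U if U in vals else T

def disj(vals):
    # The disjunction defined in the text: undefined dominates.
    return U if U in vals else T if T in vals else F

def phi_star_1(program, interp):
    heads = {h for h, _ in program}
    val = {h: disj([conj_L1([atom_val(a, interp) for a in body])
                    for hh, body in program if hh == h]) for h in heads}
    return ({h for h, v in val.items() if v == T},
            {h for h, v in val.items() if v == F})

def is_phi_star_accessible(program):
    atoms = {h for h, _ in program} | {a for _, b in program for a in b}
    interp = (set(), set())
    while (nxt := phi_star_1(program, interp)) != interp:
        interp = nxt
    return interp[0] | interp[1] == atoms

print(is_phi_star_accessible([("p", ["p"])]))             # definite, yet False
print(is_phi_star_accessible([("p", ["q"]), ("q", [])]))  # True
```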
4 Conclusions
The rather simple characterizations of the classes discussed in this paper are a contribution to exploring the "space" of all normal programs, a task which appears not yet to have been addressed very extensively. Both the class of locally hierarchical programs and the class of J-acceptable programs are natural generalizations of acyclic programs; the first can be understood as a generalization in semantical terms, and the second as a generalization expressing termination. The results presented in this paper establish a common framework which highlights more clearly the differences and the similarities between these generalizations: each can be obtained uniquely by suitably defining conjunction in the underlying three-valued logic whilst retaining a fixed meaning for disjunction. Our approach then leads naturally to the definition of the class of all Φ∗-accessible programs, by choosing yet another definition of conjunction. This class is remarkable for two reasons: (i) each program in it has a unique supported J-model, and (ii) the class itself has full computational power under SLDNF-resolution whilst containing all J-acceptable and all locally hierarchical programs, but not all definite programs. However, a simple syntactical description of this class and how it relates to other better known classes is not yet known to us, nor is the complexity of deciding if a program is Φ∗-accessible. Other classes of programs may well be susceptible to the sort of analysis presented here, and this also is ongoing research of the authors. As already noted in the Introduction, such an investigation carries forward the suggestion made in [13] that asymmetric semantics is worthy of further study.

Acknowledgements. The authors wish to thank three anonymous referees for their comments, which substantially helped to improve the style of this paper. The first named author acknowledges financial support under grant SC/98/621 from Enterprise Ireland.
References

1. Apt, K.R., Bezem, M.: Acyclic Programs. In: Warren, D.H.D., Szeredi, P. (Eds.): Proceedings of the Seventh International Conference on Logic Programming. MIT Press, Cambridge MA, 1990, pp. 617–633
2. Apt, K.R., Blair, H.A., Walker, A.: Towards a Theory of Declarative Knowledge. In: Minker, J. (Ed.): Foundations of Deductive Databases and Logic Programming. Morgan Kaufmann Publishers Inc., Los Altos, 1988, pp. 89–148
3. Apt, K.R., Pedreschi, D.: Reasoning about Termination of Pure Prolog Programs. Information and Computation 106 (1993) 109–157
4. Andrews, J.H.: A Logical Semantics for Depth-first Prolog with Ground Negation. Theoretical Computer Science 184 (1–2) (1997) 105–143
5. Bidoit, N., Froidevaux, C.: Negation by Default and Unstratifiable Logic Programs. Theoretical Computer Science 78 (1991) 85–112
6. Barbuti, R., De Francesco, N., Mancarella, P., Santone, A.: Towards a Logical Semantics for Pure Prolog. Science of Computer Programming 32 (1–3) (1998) 145–176
7. Bezem, M.: Characterizing Termination of Logic Programs with Level Mappings. In: Lusk, E.L., Overbeek, R.A. (Eds.): Proceedings of the North American Conference on Logic Programming. MIT Press, Cambridge MA, 1989, pp. 69–80
8. Cavedon, L.: Continuity, Consistency, and Completeness Properties for Logic Programs. In: Levi, G., Martelli, M. (Eds.): Proceedings of the 6th International Conference on Logic Programming. MIT Press, Cambridge MA, 1989, pp. 571–584
9. Cavedon, L.: Acyclic Logic Programs and the Completeness of SLDNF-Resolution. Theoretical Computer Science 86 (1991) 81–92
10. Chan, D.: Constructive Negation Based on the Completed Database. In: Proc. of the 5th Int. Conf. and Symp. on Logic Programming, 1988, pp. 111–125
11. Fitting, M.: A Kripke-Kleene Semantics for General Logic Programs. J. Logic Programming 2 (1985) 295–312
12. Fitting, M.: Metric Methods: Three Examples and a Theorem. J. Logic Programming 21 (3) (1994) 113–127
13. Fitting, M., Ben-Jacob, M.: Stratified, Weak Stratified, and Three-Valued Semantics. Fundamenta Informaticae XIII (1990) 19–33
14. Van Gelder, A., Ross, K.A., Schlipf, J.S.: The Well-Founded Semantics for General Logic Programs. Journal of the ACM 38 (3) (1991) 620–650
15. Gelfond, M., Lifschitz, V.: The Stable Model Semantics for Logic Programming. In: Kowalski, R.A., Bowen, K.A.
(Eds.): Proceedings of the 5th International Conference and Symposium on Logic Programming. MIT Press, 1988, pp. 1070–1080
16. Hitzler, P., Seda, A.K.: Acceptable Programs Revisited. Preprint, Department of Mathematics, University College Cork, Cork, Ireland, 1999, pp. 1–15
17. Kunen, K.: Negation in Logic Programming. J. Logic Programming 4 (1987) 289–308
18. Lloyd, J.W.: Foundations of Logic Programming. Second Edition, Springer, Berlin, 1988
19. Marchiori, E.: On Termination of General Logic Programs with respect to Constructive Negation. J. Logic Programming 26 (1) (1996) 69–89
20. Mycroft, A.: Logic Programs and Many-valued Logic. In: Fontet, M., Mehlhorn, K. (Eds.): STACS 84, Symposium of Theoretical Aspects of Computer Science, Paris, France, 1984, Proceedings. Lecture Notes in Computer Science, Vol. 166, Springer, 1984, pp. 274–286
21. Naish, L.: A Three-Valued Semantics for Horn Clause Programs. Technical Report 98/4, University of Melbourne, pp. 1–11
22. Seda, A.K.: Topology and the Semantics of Logic Programs. Fundamenta Informaticae 24 (4) (1995) 359–386
23. Seda, A.K., Hitzler, P.: Strictly Level-decreasing Logic Programs. In: Butterfield, A., Flynn, S. (Eds.): Proceedings of the Second Irish Workshop on Formal Methods 1998 (IWFM'98), Electronic Workshops in Computing, British Computer Society, 1999, pp. 1–18
Using LPNMR for Problem Specification and Code Generation

Marco Cadoli
Dipartimento di Informatica e Sistemistica
Università di Roma "La Sapienza"
Via Salaria 113, I-00198 Roma, Italy
[email protected]
WWW home page: http://www.dis.uniroma1.it/~cadoli
In an ongoing research project¹ we use a form of LPNMR as the formal basis for some code generation tools, which take as input the specification for a problem, and give as output the code to solve it, in C++ or Prolog. Formally, we defined a logic-based specification language, called np-spec, extending negation-free datalog by allowing a limited use of some second-order predicates of predefined forms. The semantics of np-spec is fully declarative, and is based on the notion of minimal model, typical of circumscription. np-spec programs specify solutions to problems in a very abstract and concise way, and are executable. As an example, this is the np-spec program for the "graph 3-coloring" problem, which specifies both an instance (i.e., a graph, in the DATABASE section), and the question (in the SPECIFICATION section).

DATABASE
  NODE = {1..6};
  EDGE = {(1,2), (1,3), (2,3), (6,2), (6,5), (5,4), (3,5)};
SPECIFICATION
  Partition(NODE,coloring,3).
  non_3_colorable <-- edge(X,Y), coloring(X,C), coloring(Y,C).
In this case, the instance is a graph with six nodes and seven edges, which is 3-colorable. Intuitively, non_3_colorable becomes true iff for all possible extensions of coloring (which is declared to partition the domain NODE in 3 subsets) the rule fires. In the present prototype the specification is compiled to Prolog code, which is run to construct outputs. Using second-order predicates of suitable forms, we were able to give the specifications of several NP-complete problems. An important theoretical result concerns the expressive power of np-spec, which is characterized as to express exactly the problems in the class NP. In another prototype called NpC++ (which outputs a C++ program) we aimed at a tight coupling with a typed programming language, i.e., C++. The specification of the problem is given in an enriched C++, so that both the instance and the output can be any C++ object. Current research focuses on improving the efficiency of the generated code.
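The compiled Prolog code is not shown in the abstract; as an illustration of what the Partition declaration amounts to semantically, here is a brute-force Python check of the same 3-coloring question (our code, not np-spec output):

```python
from itertools import product

def is_3_colorable(nodes, edges):
    # Mirrors the declarative reading: try every assignment of the nodes
    # to 3 color classes; the graph is 3-colorable iff some assignment
    # makes the rule body (a monochromatic edge) fail everywhere.
    for colors in product(range(3), repeat=len(nodes)):
        col = dict(zip(nodes, colors))
        if all(col[x] != col[y] for x, y in edges):
            return True
    return False

nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 3), (2, 3), (6, 2), (6, 5), (5, 4), (3, 5)]
print(is_3_colorable(nodes, edges))  # True, as stated for this instance
```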
¹ Preliminary results appear in: M. Cadoli, L. Palopoli, A. Schaerf, and D. Vasile. np-spec: An executable specification language for solving all problems in NP. In Proc. of the 1st Intl. Workshop on Practical Aspects of Declarative Languages (PADL'99), number 1551 in Lecture Notes in Artificial Intelligence. Springer-Verlag, 1999.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, p. 372–372, 1999.
© Springer-Verlag Berlin Heidelberg 1999
Answer Set Planning⋆ (Abstract)

Vladimir Lifschitz
Department of Computer Sciences
University of Texas at Austin
Austin, TX 78712, USA
[email protected]
In "answer set programming" [5,7] solutions to a problem are represented by answer sets (known also as stable models), and not by answer substitutions produced in response to a query, as in conventional logic programming. Instead of Prolog, answer set programming uses software systems capable of computing answer sets. Four such systems were demonstrated at the Workshop on Logic-Based AI held in June of 1999 in Washington, DC: dlv¹, smodels², DeReS³ and ccalc⁴.

This paper is about applications of answer set programming to planning. In a planning problem, we look for a sequence of actions that leads from a given initial state to a given goal state. An important class of planning algorithms is based on the idea of reducing a planning problem to the problem of finding a satisfying interpretation for a set of propositional formulas. This is known as satisfiability planning [2]. Answer set planning differs from satisfiability planning in that it uses logic programming rules instead of propositional formulas. An important advantage of answer set planning is that the representation of properties of actions is easier when logic programs are used instead of classical logic, in view of the nonmonotonic character of negation as failure. The idea of answer set planning is due to Subrahmanian and Zaniolo [8], and the results of computational experiments that use smodels to compute answer sets are reported in [1,7]. The method presented in this paper is based on some of the ideas of [6,9,3,4].

The key element of answer set planning is the representation of the planning domain in the form of a "history program": a program whose answer sets represent possible "histories", or evolutions of the system, over a fixed time interval. This program is extended by the constraints representing the initial state and the goal state of the problem.
The full paper will appear in the Proceedings of the 1999 International Conference on Logic Programming (ICLP-99), to be published by the MIT Press.

⋆ Joint invited talk of LPNMR'99 and ICLP'99
¹ http://www.dbai.tuwien.ac.at/proj/dlv/
² http://www.tcs.hut.fi/Software/smodels/
³ http://www.cs.engr.uky.edu/~lpnmr/DeReS.html
⁴ http://www.cs.utexas.edu/users/mccain/cc
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 373–374, 1999.
© Springer-Verlag Berlin Heidelberg 1999
374
V. Lifschitz
References

1. Yannis Dimopoulos, Bernhard Nebel, and Jana Koehler. Encoding planning problems in non-monotonic logic programs. In Proc. European Conf. on Planning 1997, pages 169–181, 1997.
2. Henry Kautz and Bart Selman. Planning as satisfiability. In Proc. ECAI-92, pages 359–363, 1992.
3. Vladimir Lifschitz. Action languages, answer sets and planning. In The Logic Programming Paradigm: a 25-Year Perspective, pages 357–373. Springer Verlag, 1999.
4. Vladimir Lifschitz and Hudson Turner. Representing transition systems by logic programs. This volume.
5. Victor Marek and Mirosław Truszczyński. Stable models and an alternative logic programming paradigm. In The Logic Programming Paradigm: a 25-Year Perspective, pages 375–398. Springer Verlag, 1999.
6. Norman McCain and Hudson Turner. Causal theories of action and change. In Proc. AAAI-97, pages 460–465, 1997.
7. Ilkka Niemelä. Logic programs with stable model semantics as a constraint programming paradigm. Annals of Mathematics and Artificial Intelligence, 1999. To appear.
8. V.S. Subrahmanian and Carlo Zaniolo. Relating stable models and AI planning domains. In Proc. ICLP-95, 1995.
9. Hudson Turner. Representing actions in logic programs and default theories: a situation calculus approach. Journal of Logic Programming, 31:245–298, 1997.
World-Modeling vs. World-Axiomatizing

David McAllester
AT&T Labs-Research
180 Park Ave
Florham Park NJ 07932
[email protected]
http://www.research.att.com/~dmac
Abstract. A logic allows one to express statements (axioms) that are, perhaps approximately, true of the world. A model is a particular object that is similar to, or "models", the world. For example, the growing field of model checking involves formal models of the behavior of physical computer chips. Bayesian networks, MDPs, and POMDPs are models of (real) probabilistic environments. This paper argues that world-modeling is more natural than world-axiomatizing. The main technical result is an algorithm for exactly computing the asymptotic average reward of a robot controller written in a high level programming language when run in a world model also defined in a high level language.
1 Introduction
In an axiomatic approach to automated reasoning one writes down axioms that one takes to be true of the world and then uses automated reasoning or theorem proving to draw conclusions from those axioms. In the formal verification community, however, there has been a growing movement away from this "axiomatic approach" toward a "modeling approach" [1,8,2]. In the modeling approach one specifies a particular model (structure) which approximates, or "models", the world. Most models used in AI include "world states" or "situations". The situation calculus [7] represents knowledge about states and actions in the form of first order axioms — the situation calculus takes an axiomatic approach. In a modeling approach, on the other hand, one defines the set of states and actions. For example, in formal model checking of a computer chip a state is defined to be a bit string representing the electrical state of a set of transistors. An initial state can be given and the set of allowed state transitions can be specified as a binary relation on bit strings represented by a Boolean decision diagram (BDD).

Modeling has also been gaining ground over axiomatizing in the area of reasoning under uncertainty. A stochastic model is one in which probabilities occur as part of the fundamental data of the model. Bayesian networks, Markov Decision Processes (MDPs), Hidden Markov Models (HMMs), n-gram language models, and probabilistic context free grammars (PCFGs) are all examples of stochastic models. In each case some aspect of the world is modeled rather than axiomatized.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 375–388, 1999.
© Springer-Verlag Berlin Heidelberg 1999
D. McAllester
Semantics, as studied by logicians, is the relationship between statements and models — a given statement is either true or false when interpreted in a given model. Philosophers tend to think of semantics as a relationship between statements and “the world”. But it seems clearer to work with abstract formally defined models rather than the world itself. This is not just a technical convenience. Most, and perhaps all, “real” objects seem to reside in conceptual world-models rather than the world itself. For example, consider this paper. I see words on a computer screen as I type this, but certainly the pixels on the screen are not “the paper”. The paper is a document that can be emailed. Perhaps only the pixels on my screen and the bits in the computer’s memory really exist. But the pixels themselves are only models of the real stuff — phosphor excited by electrons. But electrons are only a model of the true reality — the unified electroweak field. And so on, to some hypothetical ultimate model of physics. The world consists of models all the way down. Does this paper really exist? I don’t care. I use a conceptual model where it does. The model is constructed, perhaps even defined, in my head.
First order logic, or even higher order logic, seems inappropriate for modeling. Modeling languages, such as the PROMELA language for constructing models in the SPIN system [3], are similar to programming languages. A computer program can be viewed as a system of definitions. To see the fundamental difference between first order logic and a programming language, consider the following first order axiom for the predicate PATH, where PATH(x, y) is intended to mean that there is a path from x to y in the graph defined by the predicate ARC.

    ∀x, y, z PATH(x, y) ← ARC(x, y) ∨ [PATH(x, z) ∧ PATH(z, y)]

Note that this formula is true if we interpret PATH as the universally true predicate. The formula fails to define the predicate PATH.
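Under least fixed point semantics, by contrast, the clause does pin PATH down. A minimal bottom-up sketch in Python (the three-arc graph and the function name are illustrative, not from the paper):

```python
# Naive bottom-up evaluation of
#   PATH(x, y) <- ARC(x, y) or (PATH(x, z) and PATH(z, y))
# under least fixed point semantics: start from the ARC facts and
# close under the rule until no new PATH facts are derivable.

def least_fixed_point_path(arcs):
    path = set(arcs)  # base case: every arc is a path
    while True:
        new = set(path)
        for (x, z1) in path:
            for (z2, y) in path:
                if z1 == z2:
                    new.add((x, y))  # compose two paths through z
        if new == path:  # fixed point reached: nothing new derivable
            return path
        path = new

arcs = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(least_fixed_point_path(arcs)))
```

The universally true predicate also satisfies the clause, but the iteration above converges to the least predicate that does, which is the intended one.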
Of course, any set of Horn clauses has a unique least fixed point which gives, in this case, the intended meaning of PATH. So Horn clauses under least fixed point semantics are a better modeling language than general first order logic. Horn clauses under least fixed point semantics can be viewed as a kind of programming language. Of course one could use a second order axiom to say that PATH is required to be the least predicate satisfying the above formula. But this seems much more awkward than simply defining PATH in an appropriate definitional language, such as Horn clauses under least fixed point semantics. In general, programming languages are designed to express definitions while logics are designed to express axioms.
This paper presents a particular technical approach, based on modeling rather than axiomatizing, for constructing and validating high level robot controllers. We propose a particular form of world model which we call a partially observable semi-Markovian decision process (POSDP). This general model class includes MDPs and POMDPs as special cases. We also propose a particular compact representation of POSDPs where the allowed actions are represented by programs defined in a high level programming language. We also give a high level language for expressing policies (robot controllers) for selecting actions in a POSDP. This high level “action language” allows for the definition of complex
World-Modeling vs. World-Axiomatizing
subroutines which can then be used as additional possible actions. The action language can be viewed as a generalization of the notion of option developed for MDPs [9]. The main technical result of this paper is an algorithm for exactly computing (without stochastic simulation) the asymptotic average reward of a policy defined in this high level action language when run in a POSDP which is also defined in a high level programming language. This algorithm provides a way of formally validating robot controllers.
Other researchers have constructed methods of formally validating robot controllers. In particular, the Golog project [5,4] provides a programming language for expressing policies and a logic for reasoning about such policies. There are three significant differences between Golog and the approach taken here. First, the Golog approach uses a mixture of axiomatic and modeling methods. In the Golog approach the effect of primitive actions on the environment is specified axiomatically using a set of “frame axioms”. Converting frame axioms to actual operations on states requires solving the so-called McCarthy frame problem [7]. In a modeling approach each action is defined and the McCarthy frame problem does not arise. A second difference between this paper and the Golog work is stochasticity — Golog does not support stochastic world models. Finally, although both approaches allow policies expressed as (essentially) computer programs, quite different technical approaches are taken. The Golog programming language is formalized as a translation of declarative statements about programs into formulas of second order logic. Here the programming language is formalized in a more traditional manner — a traditional syntax and operational semantics are given.
This paper builds on the work of Koller, McAllester and Pfeffer on analysis algorithms for stochastic programs [6].
The high level language used to compactly represent POSDPs is somewhat similar to the language developed by Koller et al. The main difference is that here a more familiar semantics is given based on call by value rather than call by need. This paper does not address the problem of learning a world-model or policy from interaction with the world. Learning a sophisticated symbolic POSDP from interaction with the world seems difficult but not fundamentally impossible. Before addressing learning world models and policies, however, it seems useful to first formalize a language in which sophisticated world-models and sophisticated policies for behaving in those models can be expressed compactly. The first step is the specification of an appropriate high level stochastic programming language.
2 Stochastic Programs
This section defines a simple first order stochastic programming language. This language serves two purposes in this paper. First, it is a kind of warm-up exercise for the more conceptually challenging action language of section 3. Second, as shown in section 5, it provides a way of compactly representing world models. The programming language is designed to be as simple as possible while allowing an exposition of the conceptual issue of stochasticity. More sophisticated
languages are clearly possible and desirable but are not discussed here. The syntax of the programming language is defined by the following grammar.

    v ::= f[v1, . . . , vn]
    p ::= x | f[p1, . . . , pn]
    e ::= x | f[e1, . . . , en] | g(e1, . . . , en) | FLIP(α) | CASE e0 OF p1 : e1, . . . , pn : en
    P ::= [] | g(x1, . . . , xn) ≡ e ; P

In the grammar f ranges over constructor function symbols each with a specified number of arguments (possibly zero) called the arity of f. The grammar for v defines the set of “values” — each value is a Herbrand term built from constants (constructors of no arguments) and applications of constructors to other Herbrand terms. In the grammar x represents a variable. The grammar for p defines the set of “patterns” — terms that can be constructed from variables, constants, and constructors. In the grammar g ranges over non-constructor function symbols each with a specified number of arguments (possibly zero). In an expression of the form FLIP(α) we must have that α is a floating point number in the interval [0, 1]. Note that P ranges over programs, i.e., sequences of definitions for non-constructor function symbols. The following is a simple example of a syntactically well formed program.

    APPEND(x, y) ≡ CASE x OF
        ADJOIN[z, w] : ADJOIN[z, APPEND(w, y)]
        z : y

    TAIL(x) ≡ CASE x OF
        ADJOIN[y, w] : IF(FLIP(1/2), x, TAIL(w))
        z : x
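The TAIL definition above can also be read operationally. The following hypothetical Python sampler mimics that reading, encoding Herbrand terms as nested tuples (the encoding is ours, not the paper's):

```python
import random

def flip(alpha):
    # FLIP(alpha): T with probability alpha, F with probability 1 - alpha
    return random.random() < alpha

def tail(x):
    # CASE x OF ADJOIN[y, w] : IF(FLIP(1/2), x, TAIL(w))  z : x
    if isinstance(x, tuple) and x and x[0] == "ADJOIN":
        return x if flip(0.5) else tail(x[2])
    return x  # the variable pattern z matches anything

lst = ("ADJOIN", 1, ("ADJOIN", 2, ("ADJOIN", 3, ("NIL",))))
print(tail(lst))  # a stochastically chosen suffix of lst
```

Each call returns some suffix of the list: the whole list with probability 1/2, with one more element dropped for each further coin flip that comes up F, until the list runs out.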
The case construct does pattern matching, i.e., the value of the expression CASE e0 OF p1 : e1, . . . , pn : en is derived by finding the first pattern pi that matches the value of e0 and then taking the value of ei under a variable interpretation derived from the match. If no pattern matches then the value of the case expression is the constant NOCASE. In the second program IF(e0, e1, e2) is used as an abbreviation for CASE e0 OF T : e1, F : e2. The value of an expression of the form FLIP(α) is selected stochastically: the constant T with probability α and the constant F with probability 1 − α. A variable x occurs free in CASE e0 OF p1 : e1, . . . , pn : en if either x occurs free in e0 or there is some pi : ei such that x does not occur in the pattern
P(⟨x, ρ⟩ → w) = 1 if w = ρ(x); 0 otherwise

P(⟨FLIP(α), ρ⟩ → w) = α if w = T; 1 − α if w = F; 0 otherwise

P(⟨f[e1, . . . , en], ρ⟩ → g[v1, . . . , vk]) =
    ∏_{i=1..n} P(⟨ei, ρ⟩ → vi)   if g = f
    0                            otherwise

P(⟨g(e1, . . . , en), ρ⟩ → w) =
    Σ_{v1, . . . , vn} [ ∏_{i=1..n} P(⟨ei, ρ⟩ → vi) ] P(⟨u, ρ′⟩ → w)
    where g(x1, . . . , xn) ≡ u ∈ P
    and ρ′ is the environment mapping xi to vi

P(⟨CASE e0 OF p1 : e1, . . . , pn : en, ρ⟩ → w) =
    Σ_v P(⟨e0, ρ⟩ → v) ×
        P(⟨ei, ρ′⟩ → w)   if pi is the first pattern matching v and
                          ρ′ is ρ augmented by matching pi to v
        1                 if no pi matches v and w = NOCASE
        0                 otherwise

Fig. 1. Semantic equations for program expressions.
pi but does occur free in ei. In a definition g(x1, . . . , xn) ≡ e we must have that e is an expression (as defined by the grammar) with no free variables other than x1, . . ., xn. In a program P we must have that no non-constructor symbol is defined twice and that every non-constructor symbol appearing in the program is defined in the program, i.e., if P contains the definition g(x1, . . . , xn) ≡ e, and g′ appears in e, then P also contains a definition of the form g′(x1, . . . , xk) ≡ e′. Whenever a particular program P is being discussed we will implicitly restrict our discussion to program expressions e such that every non-constructor function symbol appearing in e is defined in P.
A binding environment is a mapping ρ from a finite set of program variables, called the domain of ρ, to Herbrand terms. Given a program expression e and a binding environment ρ whose domain includes the free variables of e, we can compute a value for e in environment ρ in the standard way. This evaluation process either fails to terminate or computes a particular Herbrand term as the value. When the program P is clear from context we let P(⟨e, ρ⟩ → v) denote the probability that evaluating e under binding ρ and program P terminates and returns v as the value. These probabilities satisfy the equations shown in figure 1.
The equations in figure 1 have a unique least fixed point which gives the intended meaning of P(⟨e, ρ⟩ → v). More formally, we define P^0(⟨e, ρ⟩ → v) to be 0 and define P^{i+1}(⟨e, ρ⟩ → v) by the equations in figure 1, where P^{i+1} is used on the left hand side of each equation and P^i on the right hand side. One can prove by induction on i that P^{i+1}(⟨e, ρ⟩ → v) ≥ P^i(⟨e, ρ⟩ → v) and that for any e and ρ we have Σ_v P^i(⟨e, ρ⟩ → v) ≤ 1. We can then define P(⟨e, ρ⟩ → v) to be lim_{i→∞} P^i(⟨e, ρ⟩ → v). The fact that the values of P^i are monotonically increasing and bounded implies that this limit exists. Furthermore, the limit satisfies the equations in figure 1 and is the intended meaning of P. If e is a closed term, i.e., one not containing free variables, we will write P(e → v) as an abbreviation for P(⟨e, ρ⟩ → v) where ρ is arbitrary. Because it is possible to write nonterminating programs, it is possible that Σ_v P(e → v) is strictly less than one. The program P is called consistent if for all non-constructor function symbols g defined in P, and all Herbrand terms v1, . . ., vn, we have that Σ_v P(g(v1, . . . , vn) → v) = 1. Note that the following definition is consistent even though there exists a nonterminating computation.
AT-LEAST(x) ≡ IF(FLIP(1/2), x, AT-LEAST(SUCC[x]))
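The least fixed point iteration can be checked numerically for this example. The sketch below uses an encoding of ours (the value SUCC^k[x] is represented by the offset k) and watches the total probability mass approach 1:

```python
# Iterate P^0 = 0, then P^{i+1} from P^i, for the AT-LEAST definition.
# probs[k] approximates P(AT-LEAST(x) -> SUCC^k[x]).

def at_least_distribution(iterations):
    probs = [0.0] * iterations           # P^0 is identically zero
    for _ in range(iterations):
        new = [0.0] * iterations
        new[0] = 0.5                     # FLIP yields T: return x itself
        for k in range(1, iterations):
            new[k] = 0.5 * probs[k - 1]  # FLIP yields F: recurse on SUCC[x]
        probs = new
    return probs

probs = at_least_distribution(50)
print(sum(probs))  # tends to 1: the definition is consistent
```

After i iterations the mass at offset k is (1/2)^{k+1} for k < i, so the total mass is 1 − (1/2)^i, which converges to 1 as expected.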
3 World Models and Robot Controllers
A robot must interact with an environment that is only partially under its control. Here we model the environment (the world) as a partially observable semi-Markovian decision process (POSDP). More formally, we adopt a discrete notion of time and assume the environment passes through an infinite sequence of states s0, s1, s2, . . .. The robot is assumed to have a certain set of “primitive actions” where a particular execution of a primitive action takes some integral amount of time during which the environment is in some sequence of states. We take a primitive action to be an expression of the form h{v1, . . . , vn} where h is a primitive action function and v1, . . ., vn are Herbrand terms. For example, if PUT-ON is a primitive action function, and A and B are Herbrand constants, then PUT-ON{A, B} is a primitive action. We define a POSDP to be a set of possible states together with an interpretation of each primitive action as a stochastic function mapping a state si at time i to a pair ⟨⟨s_{i+1}, . . . , s_{i+n}⟩, w⟩ where n is the time taken by the action and w is a Herbrand term called the observation yielded by the action. For example, the primitive action PUT-ON{A, B} can take some integral amount of time and yield an observation such as the constant SUCCEEDED or the constant FAILED. A POSDP must be Markovian in the sense that the stochastic behavior of an action depends only on the state of the environment at the time the action is performed. The robot’s selection of the next action can depend only on the observations returned by previous actions — the robot has no direct knowledge of the environment state. If M is a POSDP we write P(⟨s, h{v1, . . . , vn}⟩ → ⟨⟨u1, . . . , un⟩, w⟩ | M) to denote the probability that taking action h{v1, . . . , vn} in state s takes n time steps during which the environment is in the states u1, . . ., un and then yields the observation w.
If n = 0 then the action is interpreted as a pure sensing action and is modeled as taking no time at all. The definition of a POSDP generalizes the more familiar notions of Markov decision processes (MDP), partially observable Markov decision process
(POMDP), and semi-Markovian decision process. A POMDP is (essentially) a POSDP in which every action takes one time step. A semi-Markovian decision process is a POSDP where the observation always uniquely determines the resulting state. An MDP is a POSDP with both these properties, i.e., every action takes one time step and the observation uniquely determines the resulting state. Here we work with general POSDPs. Complex actions can be constructed from primitive actions. For example, consider the following action expression. CASE PUT-ON(x, y) OF SUCCEEDED : PUT-ON(u, x), FAILED : FAILED This action attempts to put x on y and if that succeeds then attempts to put u on x. A Herbrand constant such as A can be viewed as a trivial action that takes no time, and hence leaves the state unchanged, and always yields itself as the observation. A variable such as x can also be viewed as an action which takes no time and yields its value as the observation. The syntax of action expressions is defined by the following grammar where Herbrand terms v and patterns p are defined as before.
a ::= x | f[a1, . . . , an] | g(a1, . . . , an) | h{a1, . . . , an} | FLIP(α) | CASE a0 OF p1 : a1, . . . , pn : an
Q ::= [] | g(x1, . . . , xn) ≡ a ; Q
In this grammar f ranges over Herbrand constructors, g ranges over defined action functions — non-constructor functions other than primitive action functions — and h ranges over primitive action functions. Note that, except for the introduction of a new form of function application for primitive action functions, the grammar is identical to the grammar for stochastic programs. Again we require that an action program Q have the property that the body of a definition does not have free variables other than the arguments of the defined function, that no defined action function is defined more than once, and that every defined action function appearing in the program is defined in the program. Note that by including actions of the form FLIP(α) we allow for stochastic action selection. Action expressions can be executed in essentially the same way as program expressions except that calls to primitive action functions are handled by the POSDP environment rather than by the robot. Consider the following composite action function for clearing a block. Here we assume a primitive sensing action function BLOCK-ON which takes a block and
P(⟨s, x, ρ⟩ → ⟨σ, w⟩) = 1 if σ = ⟨⟩ and w = ρ(x); 0 otherwise

P(⟨s, FLIP(α), ρ⟩ → ⟨σ, w⟩) = α if σ = ⟨⟩ and w = T; 1 − α if σ = ⟨⟩ and w = F; 0 otherwise

P(⟨s, f[e1, . . . , en], ρ⟩ → ⟨σ, g[v1, . . . , vk]⟩) =
    Σ_{γ1···γn = σ} ∏_{i=1..n} P(⟨ℓ(sγ1···γi−1), ei, ρ⟩ → ⟨γi, vi⟩)   if g = f
    0                                                                otherwise

P(⟨s, h{e1, . . . , en}, ρ⟩ → ⟨σ, w⟩) =
    Σ_{v1, . . . , vn; γ1···γnδ = σ} [ ∏_{i=1..n} P(⟨ℓ(sγ1···γi−1), ei, ρ⟩ → ⟨γi, vi⟩) ]
        × P(⟨ℓ(sγ1···γn), h{v1, . . . , vn}⟩ → ⟨δ, w⟩ | M)

P(⟨s, g(e1, . . . , en), ρ⟩ → ⟨σ, w⟩) =
    Σ_{v1, . . . , vn; γ1···γnδ = σ} [ ∏_{i=1..n} P(⟨ℓ(sγ1···γi−1), ei, ρ⟩ → ⟨γi, vi⟩) ]
        × P(⟨ℓ(sγ1···γn), u, ρ′⟩ → ⟨δ, w⟩)
    where g(x1, . . . , xn) ≡ u ∈ Q and ρ′ is the environment mapping xi to vi

P(⟨s, CASE e0 OF p1 : e1, . . . , pn : en, ρ⟩ → ⟨σ, w⟩) =
    Σ_{v; γδ = σ} P(⟨s, e0, ρ⟩ → ⟨γ, v⟩) ×
        P(⟨ℓ(sγ), ei, ρ′⟩ → ⟨δ, w⟩)   if pi is the first pattern matching v and
                                      ρ′ is ρ augmented by matching pi to v
        1                             if no pi matches v, δ = ⟨⟩, and w = NOCASE
        0                             otherwise

Fig. 2. Semantic equations for action expressions. If α is a non-empty sequence of states, then ℓ(α) denotes the last state in the sequence α.
returns the block on top of that block as an observation about the world.

    CLEAR(x) ≡ CASE BLOCK-ON(x) OF
        NONE : SUCCESS
        FAILURE[m] : FAILURE[m]
        BLOCK[y] : SEQ(CLEAR(y), PUT-ON(y, TABLE))
In this program we let SEQ(a1, a2) abbreviate the following.

    CASE a1 OF
        FAILURE[m] : FAILURE[m]
        z : CASE a2 OF
                FAILURE[m] : FAILURE[AFTER[⌈a1⌉, m]]
                z : z

We have that the action SEQ(a1, a2) first performs a1. If the observation from a1 is a failure message then that message is returned. Otherwise the action a2 is performed and if that fails a failure message is returned noting that a1 succeeded before the failure occurred. The expression ⌈a⌉ is intended to denote an expression whose value is a Herbrand term representation of the action term a. For example, if a is CLEAR(y) then ⌈a⌉ is CLEAR[y].
Given a POSDP M, it is possible to formally assign a meaning to action expressions so that each such expression denotes a stochastic function which takes a state and stochastically produces a pair of a sequence of states and an observation. When Q and M are clear from context we write P(⟨s, a, ρ⟩ → ⟨α, w⟩) to denote the probability that executing action a in state s under binding environment ρ results in the sequence of states α and the observation w. These probabilities can be formally defined in a manner similar to the definition of P(⟨e, ρ⟩ → v) in section 2. In particular these probabilities can be defined as the least fixed point of the equations in figure 2. The intuition should be clear, however, independent of figure 2.
4 Policies for POSDPs
It is standard in the MDP literature to call a robot control program a “policy”. Here we define a policy (control program) for a POSDP M to be a pair ⟨Q, g⟩ where Q is an action program as defined above and g is a defined action function of one argument satisfying the condition that for any Herbrand term v we have that, with probability 1, the action g(v) takes time greater than zero and terminates. This can be expressed formally by the following equation, where |α| denotes the length of the sequence α.

    Σ_{|α| > 0, w} P(⟨s, g(v)⟩ → ⟨α, w⟩) = 1
We run a policy g as follows. We let v0 be the “observation” START. We then compute an infinite sequence of observations by letting vi+1 be the observation resulting from the execution of the action g(vi ). Note that observations are Herbrand terms and hence this definition allows the observation vi to be the entire history of primitive actions and their observations. If M is a finite MDP then vi might be a single symbol naming the current state. In an elevator controller or a robosoccer controller the action g(v) might rely entirely on sensing and ignore the observation v.
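The run loop just described is simple to sketch. Everything below is a hypothetical stand-in: g is a trivial policy and execute_action replaces the formal POSDP semantics with a toy two-state environment:

```python
def run_policy(g, execute_action, state, steps):
    # v0 = START; v_{i+1} is the observation returned by executing g(v_i)
    obs = "START"
    history = []
    for _ in range(steps):
        state, obs = execute_action(g(obs), state)
        history.append(obs)
    return history

def g(obs):
    return "TOGGLE"  # a policy that ignores its observation

def execute_action(action, state):
    # toy environment: TOGGLE flips between states 0 and 1
    new_state = 1 - state
    return new_state, ("SAW", new_state)

print(run_policy(g, execute_action, 0, 4))
```

Note that the robot sees only the observations in history; the state argument is threaded through the environment alone, matching the partial observability of a POSDP.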
A policy g and an initial state s0 determine a probability distribution over infinite sequences of states. We can evaluate the utility of a policy by introducing a reward function on states. A reward function maps each state to a real number called the reward of that state. Intuitively, the reward function expresses the goal of the robot — it should behave so as to maximize reward. Here we will be concerned with (undiscounted) asymptotic average reward. A given behavior of the robot leads, ultimately, to an infinite sequence of states s0, s1, s2, . . .. We define the asymptotic average reward of such a sequence to be the following quantity, where r is the reward function.

    lim_{k→∞} (1/k) Σ_{i=0}^{k} r(s_i)
In general there can be a nonzero probability over the choice of the infinite sequence that this limit does not exist (even for bounded reward). However, if the set of states is finite and the set of observations passed between runs of the policy g is also finite, then with probability 1 over the generation of the sequence the limit exists. In section 7 we will assume that the sets of states and observations reachable by the policy from the start state s0 are finite. Of course the semantics of action expressions and policies supports other methods of evaluating a policy. We could consider discounted reward, or the expected time to reach a goal state. However, for the formalism developed here, asymptotic average reward turns out to be most easily computed.
5 Symbolic POSDPs
In this section we give a method of constructing POSDPs. We define a symbolic POSDP to be a pair ⟨P, A⟩ where P is a consistent stochastic program as defined in section 2 and A is a set of non-constructor function symbols defined in P which we identify with the primitive action functions. In the program P an n-ary primitive action function is defined as an (n + 1)-ary function — the last argument is interpreted as the state in which the action is executed. To formally define the semantics of a symbolic POSDP we first define an internal action value to be a Herbrand term of the following form.

    PAIR[INSERT[s1, . . . , INSERT[sn, EMPTY] . . .], w]

If u is a term of this form then we define s(u) to be the state sequence ⟨s1, . . . , sn⟩ and we define o(u) to be the observation w. If u is not of this form then we define s(u) to be EMPTY and o(u) to be FAILURE[BAD-ACTION-VALUE]. We now define the semantics of a symbolic POSDP M consisting of P and A by taking the set of states to be the set of Herbrand terms and by defining the semantics of primitive actions with the following equation.

    P(⟨s, h{v1, . . . , vn}⟩ → ⟨α, w⟩ | M) = Σ_{u : s(u) = α, o(u) = w} P(h(v1, . . . , vn, s) → u)
It is interesting to note that the McCarthy frame problem does not arise in this approach to constructing world models. For example, a natural representation of a state of the world is a list of assertions — a list of Herbrand terms each of which intuitively represents some aspect of the world. A blocks world state might include assertions such as ON[A, B] and COLOR[A, GREEN]. We can “implement” a primitive action MOVE-FROM-TO so that if s contains CLEAR[x], ON[x, y] and CLEAR[z], then MOVE-FROM-TO(x, y, z, s) returns (the Herbrand representation of) ⟨⟨u⟩, SUCCESS⟩ where u is the state resulting from removing the assertions ON[x, y] and CLEAR[z] and adding the assertions CLEAR[y] and ON[x, z]. We can also easily arrange that if the required conditions on the input state are not met then MOVE-FROM-TO(x, y, z, s) returns ⟨⟨⟩, FAILURE[PRECONDITIONS-NOT-MET]⟩. Note that this action will automatically not affect assertions about colors — there is no need to list the properties unaffected by the action. The need to list unaffected properties does not arise in the modeling approach.
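A hypothetical Python rendering of this MOVE-FROM-TO action makes the point concrete (assertion lists of tuple-encoded Herbrand terms; the encoding and helper name are ours, not the paper's):

```python
def move_from_to(x, y, z, s):
    # state s is a list of assertions; unrelated assertions (e.g. COLOR
    # facts) are simply carried over, so nothing need be said about what
    # the action does NOT change
    pre = {("CLEAR", x), ("ON", x, y), ("CLEAR", z)}
    if not pre <= set(s):
        return ([], ("FAILURE", "PRECONDITIONS-NOT-MET"))
    u = [a for a in s if a not in {("ON", x, y), ("CLEAR", z)}]
    u += [("CLEAR", y), ("ON", x, z)]
    return ([u], "SUCCESS")

s = [("CLEAR", "A"), ("ON", "A", "B"), ("CLEAR", "C"), ("COLOR", "A", "GREEN")]
states, obs = move_from_to("A", "B", "C", s)
print(obs, states[0])  # the COLOR assertion survives untouched
```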
6 Computing Value Distributions for Program Expressions
This section gives an algorithm for computing the probability distribution over the values of a given program expression. This will be a required step in later algorithms and also provides a warm-up exercise for the slightly more complex computation of value distributions for action expressions.
Given a closed program expression e we define the computation graph of e to be the least set of assertions containing the following. The computation graph contains all assertions of the form Eval(e′, ρ) such that there is a nonzero probability that the evaluation of e will cause an evaluation of ⟨e′, ρ⟩. The computation graph also includes all assertions of the form ⟨e′, ρ⟩ → v such that it includes Eval(e′, ρ) and the evaluation of ⟨e′, ρ⟩ has a nonzero probability of returning the value v.
The computation graph of a given expression e can be computed using a bottom-up logic program, i.e., a set of rules for deriving new assertions. We start with the single assertion Eval(e, ∅) where e is the given top level expression and ∅ is the empty variable substitution. We then add new assertions as they become derivable under the rules. For example, there is a rule stating that if we derive Eval(g(e1, . . . , en), ρ) then we also derive Eval(ei, ρ) for each ei. Furthermore, if we derive Eval(g(e1, . . . , en), ρ) and ⟨e1, ρ⟩ → v1, . . ., ⟨en, ρ⟩ → vn then we derive Eval(u, ρ′) where g(x1, . . . , xn) ≡ u ∈ P and ρ′ is the environment mapping xi to vi. It is possible to write down “evaluation rules” for each of the five types of program expressions. This generation process terminates if and only if the computation graph of e is finite. Our algorithm for computing value distributions requires that the computation graph be finite.
As an example, suppose that we have defined a function NEXT-STATE such that for any Herbrand expression v we have that NEXT-STATE(v) stochastically
computes one of a finite set of Herbrand constants representing a finite set of states. Now suppose we define the following program.

    TERMINAL-STATE(s) ≡ CASE s OF
        A : A
        B : B
        z : TERMINAL-STATE(NEXT-STATE(s))

Now suppose we take the top level assertion to be TERMINAL-STATE(C). If the transition matrix defined by the function NEXT-STATE is ergodic then the procedure TERMINAL-STATE terminates with probability 1. Furthermore, it has only two possible values — the constants A and B. We wish to compute the relative probabilities of these two possible outcomes. Assuming that calls to NEXT-STATE produce finite computation graphs, calls to TERMINAL-STATE also produce finite computation graphs. The graph consists, essentially, of assertions of the form NEXT-STATE(D) → E and TERMINAL-STATE(D) → A.
We now give a general algorithm for computing value distributions for expressions with finite computation graphs. For each “edge” ⟨e′, ρ⟩ → v in the computation graph we compute the probability of that edge, i.e., the probability that the evaluation of ⟨e′, ρ⟩ gives value v. This is done with a numerical least fixed point calculation on the (finite) computation graph. More specifically, for each edge in the graph we define P^0(⟨e′, ρ⟩ → v) to be zero. For each such edge we then compute P^{i+1}(⟨e′, ρ⟩ → v) using the equations of figure 1, with P replaced by P^{i+1} on the left hand side of each equation and by P^i on the right hand side. The edge probability P(⟨e′, ρ⟩ → v) equals the limit as i → ∞ of P^i(⟨e′, ρ⟩ → v). In practice the numerical computation can be terminated when the edge probabilities have stabilized. In the above example, this process will essentially compute all probabilities of the form P(NEXT-STATE(D) → E) and P(TERMINAL-STATE(D) → A).
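This numerical least fixed point computation can be sketched directly. The chain below is a hypothetical NEXT-STATE (with A and B absorbing); the iteration computes the edge probabilities P(TERMINAL-STATE(s) → A) and P(TERMINAL-STATE(s) → B):

```python
# P(NEXT-STATE(s) -> s') for a small hypothetical chain
NEXT = {
    "C": {"A": 0.25, "D": 0.75},
    "D": {"B": 0.5, "C": 0.5},
}

def terminal_state_probs(iterations=200):
    # p[s][t] approximates the edge probability P(TERMINAL-STATE(s) -> t);
    # start from P^0 = 0 and iterate the semantic equations
    p = {s: {"A": 0.0, "B": 0.0} for s in ("A", "B", "C", "D")}
    for _ in range(iterations):
        new = {"A": {"A": 1.0, "B": 0.0},  # CASE s OF A : A
               "B": {"A": 0.0, "B": 1.0}}  # CASE s OF B : B
        for s in ("C", "D"):               # z : TERMINAL-STATE(NEXT-STATE(s))
            new[s] = {t: sum(pr * p[s2][t] for s2, pr in NEXT[s].items())
                      for t in ("A", "B")}
        p = new
    return p

print(terminal_state_probs()["C"])  # converges to A: 0.4, B: 0.6
```

For this chain the exact answer is easy to verify by hand: writing a = P(C → A) gives a = 0.25 + 0.75 · (0.5 · a), so a = 0.4, and the iteration converges to it geometrically.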
7 Computing Value Distributions for Action Expressions
We now assume a given symbolic POSDP defined by a stochastic program P and assume a given action program Q. We will compute distributions of “values” for actions. For any state s and action expression a we define the computation graph of ⟨s, a⟩ to be the least set of assertions containing the following. First, we include all assertions of the form Eval(s′, a′, ρ) such that running the action a in state s has a nonzero probability of causing a′ to run in s′ under environment ρ. Second, we include all “edges” of the form ⟨s′, a′, ρ⟩ → ⟨s″, w⟩ such that the computation graph contains Eval(s′, a′, ρ) and there is a nonzero probability that running a′ in state s′ under environment ρ produces observation w and a state sequence ending in the state s″. Third, we include the computation graph for all terms of the form h(v1, . . . , vn, s_{n+1}) such that the graph includes assertions of the form Eval(s1, h{e1, . . . , en}, ρ) and ⟨s1, e1, ρ⟩ → ⟨s2, v1⟩, . . ., ⟨sn, en, ρ⟩ → ⟨s_{n+1}, vn⟩. The computation graph for ⟨s, a⟩ can be computed with a bottom-up logic program for generating these assertions. We require that the computation graph for ⟨s, a⟩ be finite.
If ⟨s, a⟩ has a finite computation graph then we can compute a probability for each edge ⟨s′, a′, ρ⟩ → ⟨s″, w⟩ using a numerical least fixed point calculation analogous to that in section 6.
8 Computing Asymptotic Average Reward
Finally, we define the computation graph for a policy g and initial state s0 to be the least set of assertions containing the following. First it contains the computation graph of the state-action pair ⟨s0, g(START)⟩. Second, if the graph contains an edge of the form ⟨s, g(v), ∅⟩ → ⟨s′, v′⟩ then it also includes the computation graph for ⟨s′, g(v′)⟩. We require that the computation graph of ⟨s0, g⟩ be finite. The edge probabilities for each edge in this graph can be computed using the numerical least fixed point calculation mentioned in the previous section.
Given a finite computation graph for ⟨s0, g⟩ with computed edge probabilities we now compute two additional numbers for each edge ⟨s, a, ρ⟩ → ⟨s′, w⟩. First we compute the expected time of the edge, i.e., the expected number of states in the state sequence generated by the execution of ⟨s, a, ρ⟩ given that the execution produces ⟨s′, w⟩. Given the edge probabilities (which are all nonzero), the expected times of the edges can be computed using a numerical least fixed point calculation on the (finite) computation graph. Finally, for each edge we compute the expected total reward of that edge, i.e., the expected sum of the rewards for the states in the state sequence generated by the execution of ⟨s, a, ρ⟩ given that the execution produces ⟨s′, w⟩. Given the edge probabilities, the expected rewards of the edges can again be computed by a numerical least fixed point calculation.
Given the probability, expected time, and expected total reward of each edge we can compute the asymptotic average reward as follows. We define S to be the set of pairs ⟨s, v⟩ such that the computation graph contains Eval(s, g(v), ∅). We define a probability transition matrix M on S where the probability of the transition from ⟨s, v⟩ to ⟨s′, v′⟩ is the probability of the edge ⟨s, g(v), ∅⟩ → ⟨s′, v′⟩ if the computation graph contains this edge and zero otherwise.
Let D0 be the probability distribution on S concentrating all mass on the element ⟨s0, START⟩. Now define D_{i+1} to be (D0 + i·Di·M)/(i + 1). It is possible to show that the limit as i → ∞ of Di exists, is a stationary distribution of M, and equals the long-term distribution of the elements of S under the transitions defined by M (whether or not M is ergodic). We let D be this limit distribution. We let T be the average time per transition, i.e., the quantity

    Σ_{⟨s, v⟩, ⟨s′, v′⟩} D(⟨s, v⟩) M(⟨s, v⟩, ⟨s′, v′⟩) T(⟨s, v⟩, ⟨s′, v′⟩)

where T(⟨s, v⟩, ⟨s′, v′⟩) is the expected transition time for ⟨s, g(v), ∅⟩ → ⟨s′, v′⟩. Similarly, we define R to be the average transition reward, i.e., the quantity

    Σ_{⟨s, v⟩, ⟨s′, v′⟩} D(⟨s, v⟩) M(⟨s, v⟩, ⟨s′, v′⟩) R(⟨s, v⟩, ⟨s′, v′⟩)
where R(⟨s,v⟩, ⟨s′,v′⟩) is the expected transition reward for ⟨s, g(v), ∅⟩ → ⟨s′, v′⟩. The asymptotic average reward is now just R/T. To see this for the case where M is ergodic, consider sampling an infinite run of the policy starting at state s₀. For any finite number k, let R(k) be the sum of the rewards up to time k and let n(k) be the number of top-level iterations of the policy up to time k. We now have the following:

    lim_{k→∞} R(k)/k = lim_{k→∞} (R(k)/n(k)) / (k/n(k)) = (lim_{k→∞} R(k)/n(k)) / (lim_{k→∞} k/n(k)) = R/T
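The whole computation described above, the Cesàro-style iteration for D followed by the two weighted sums and the ratio R/T, can be sketched numerically. The matrix encoding (index 0 for ⟨s₀, START⟩) and the fixed iteration count are illustrative assumptions, not part of the paper:

```python
import numpy as np

def average_reward(M, T, R, n_iter=10000):
    """Asymptotic average reward R/T for a finite computation graph.

    M[i][j] is the probability of the edge from pair i to pair j (rows
    sum to 1); T[i][j] and R[i][j] are that edge's expected time and
    expected total reward.  D is approximated by iterating the stated
    recurrence D_{i+1} = (D_0 + i * D_i * M) / (i + 1), starting from the
    distribution concentrated on <s0, START>, taken here to be index 0.
    """
    n = M.shape[0]
    D0 = np.zeros(n)
    D0[0] = 1.0
    D = D0.copy()
    for i in range(n_iter):
        D = (D0 + i * (D @ M)) / (i + 1)
    # weighted sums over all edges: sum_{i,j} D_i * M_ij * T_ij (resp. R_ij)
    avg_T = float(np.sum(D[:, None] * M * T))
    avg_R = float(np.sum(D[:, None] * M * R))
    return avg_R / avg_T
```

Note that the iteration converges only at a Cesàro rate, which is what makes it insensitive to whether M is ergodic; in practice one would solve for a stationary distribution of M directly when M is known to be ergodic.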
9 Conclusions
We have argued that world knowledge is often more usefully expressed as a world model rather than as world axioms. A particular formalism for expressing world models, symbolic POSDPs, has been defined, as well as a high level language for writing policies for these models. Finally, an algorithm has been given for computing asymptotic average reward. There are many directions for further research. It should be possible to give an algorithm for computing expected discounted reward or expected time to a goal state. It should also be possible to enrich the programming language with types, exceptions, and concurrency. Finally, it should be possible to write more sophisticated analysis algorithms, such as algorithms for verifying the consistency of stochastic programs.
Practical Nonmonotonic Reasoning: Extending Inheritance Techniques to Solve Real-World Problems

Leora Morgenstern
IBM T.J. Watson Research
30 Saw Mill River Drive
Hawthorne, NY 10532
[email protected]
Despite the obvious relevance of plausible reasoning to real-world problem solving, nonmonotonic logics are rarely used in commercial applications or large-scale commonsense reasoning systems. This is largely because few efficient algorithms and tools have thus far been developed. A notable exception is nonmonotonic inheritance, which provides a natural model for commonsense taxonomic reasoning, and for which low-order polynomial algorithms are available (Horty et al., 1990; Stein, 1992). However, inheritance is not sufficiently powerful to model the reasoning needed in many real-world applications. This talk discusses how the paradigm of nonmonotonic inheritance can be extended to a broader and more powerful kind of nonmonotonic reasoning. This is done by introducing formula-augmented semantic networks (FANs), semantic networks that attach well-formed formulae to nodes. The problem of inheriting well-formed formulae within this structure is explored, and an algorithm, based on selecting preferred maximal consistent subsets of wffs subject to various preference criteria, is given and discussed. We examine in detail several real-world problems which have been or can be solved using FANs. These include benefits inquiry in the medical insurance domain (Morgenstern, 1998; Morgenstern and Singh, 1997), rapid search of large knowledge bases for helpdesk applications (Hantler et al.), and legal reasoning in the tax domain using a combination of taxonomic and case-based reasoning techniques (Ashley et al.).
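As a toy illustration of the selection step (not the actual FAN algorithm, which handles arbitrary wffs and richer consistency checks), a preferred maximal consistent subset can be built greedily when the attached formulae are restricted to propositional literals; every name here is hypothetical:

```python
def preferred_consistent_subset(wffs, preference):
    """Greedy preferred maximal consistent subset over literal wffs.

    Toy sketch: each wff is a propositional literal (name, truth_value)
    and a set is consistent iff it contains no complementary pair.  The
    preference function ranks wffs (lower = more preferred, e.g. coming
    from a more specific node in the network).  A real FAN would attach
    arbitrary wffs and need a genuine consistency check.
    """
    chosen = []
    # visit wffs from most to least preferred; keep each one that stays
    # consistent with everything chosen so far (here: no complement chosen)
    for name, value in sorted(wffs, key=preference):
        if (name, not value) not in chosen:
            chosen.append((name, value))
    return chosen
```

For example, with the classic Tweety conflict, a "flies = False" literal from the more specific penguin node outranks "flies = True" from the bird node, so the former survives and the latter is rejected, while unrelated literals are retained.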
References
1. Ashley, K., Horty, J., Morgenstern, L., and Thomason, R.: work in progress.
2. Hantler, S., Laker, M., and Morgenstern, L.: work in progress.
3. Horty, J., Thomason, R., and Touretzky, D.: A skeptical theory of inheritance in nonmonotonic semantic networks. Artificial Intelligence 42 (1990): 311-349.
4. Morgenstern, L.: Inheritance comes of age: applying nonmonotonic techniques to problems in industry. Artificial Intelligence 103 (1998): 237-271.
5. Morgenstern, L. and Singh, M.: An expert system using nonmonotonic techniques for benefits inquiry in the insurance industry. Proceedings IJCAI-97, Morgan Kaufmann, San Francisco, 655-661, 1997.
6. Stein, L.: Resolving ambiguity in nonmonotonic inheritance hierarchies. Artificial Intelligence 55 (1992): 259-310.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, p. 389, 1999.
© Springer-Verlag Berlin Heidelberg 1999
Author Index

Alferes, J. J.  162
Antoniou, G.  347
Billington, D.  347
Cadoli, M.  372
Cenzer, D.  34
Cui, B.  206
Damásio, C. V.  262
De Vos, M.  236
Dekhtyar, M.  132
Dikovsky, A.  132
Dudakov, S.  132
Erdem, E.  107
Faber, W.  177
Gottlob, G.  1
Governatori, G.  347
Greco, S.  221
Hitzler, P.  357
Inoue, K.  147
Janhunen, T.  19
Kakas, A.  78
Leone, N.  177
Lifschitz, V.  92, 107, 373
Lin, F.  117
Linke, T.  247
Lukasiewicz, T.  277
Maher, M. J.  347
Marek, V.  49
Mateis, C.  290
McAllester, D.  375
Miller, R.  78
Morgenstern, L.  389
Niemelä, I.  317
Pereira, L. M.  162, 262
Pfeifer, G.  177
Pivkina, I.  49
Przymusinska, H.  162
Przymusinski, T.  162
Remmel, J. B.  34
Rosati, R.  332
Sakama, C.  147
Scarcello, F.  1
Schaub, T.  247
Seda, A. K.  357
Sefranek, J.  63
Shen, Y.  192
Sideri, M.  1
Simons, P.  305, 317
Soininen, T.  317
Spyratos, N.  132
Swift, T.  206, 262
Toni, F.  78
Truszczyński, M.  49
Turner, H.  92
Vanderbilt, A.  34
Vermeir, D.  236
Wang, K.  117
Warren, D. S.  206
You, J.  192
Yuan, L.  192
Zhou, N.  192