)] ≥ 2/3. We want to test periodicity in the family of functions defined on N. To make the problem finite, we fix an upper bound on the period. Then, a function f : {0, ..., T − 1} → S is q-periodic, for 1 ≤ q < T, if f(x + aq) = f(x) for every x, a ∈ N such that x + aq < T. The problem we now want to test is whether there exists a period less than some given number t. More precisely, we define, for integers 2 ≤ t ≤ T,

INT-PERIOD(T, t) = {f : {0, ..., T − 1} → S | ∃q : 1 ≤ q < t, f is q-periodic}.

Here we do not require that q divides T since we do not have any finite group structure.
Quantum Testers for Hidden Group Properties
427
Test Integer period_f(T, t, δ)
1. N ← Ω((log T)²/δ).
2. For i = 1, ..., N do y_i ← Fourier sampling_f(Z_T), and use the continued fractions method to round y_i/T to the nearest fraction a_i/b_i with b_i < t.
3. p ← lcm{b_i : 1 ≤ i ≤ N}.
4. If p ≥ t, reject.
5. T_p ← ⌊T/p⌋p.
6. M ← Ω(1/δ).
7. For i = 1, ..., M let a_i, x_i ∈_R Z_{T_p}.
8. Accept iff (1/M)·|{i : f(x_i + a_i p mod T_p) ≠ f(x_i)}| < δ/2.

Theorem 3. For 0 < δ < 1, and integers 2 ≤ t ≤ T such that T/(log T)⁴ = Ω((t log t/δ)²), Test Integer period(T, t, δ) is a δ-tester with two-sided error for INT-PERIOD(T, t) on the family of functions from {0, ..., T − 1} to S, with O((log T)²/δ) query complexity and (log T/δ)^O(1) time complexity.
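Step 2 is the same classical continued-fraction step used in Shor-style period finding: round the sample y_i/T to a nearby fraction with denominator below t. A minimal Python sketch, under the assumption that convergents of y/T suffice (best approximations of the second kind, as in the standard period-recovery analysis); `best_approx` is an illustrative name, not from the paper:

```python
def best_approx(y, T, t):
    """Convergent a/b of y/T with the largest denominator b < t.

    Computed from the continued-fraction expansion of y/T; convergents
    are the best rational approximations of the second kind, which is
    what the period-recovery analysis needs (no semiconvergents).
    """
    num, den = y, T
    h_pp, k_pp = 0, 1          # convergent h_{-2}/k_{-2}
    h_p, k_p = 1, 0            # convergent h_{-1}/k_{-1}
    best = (0, 1)
    while den != 0:
        a = num // den         # next partial quotient
        h, k = a * h_p + h_pp, a * k_p + k_pp
        if k >= t:             # denominator bound reached: stop
            break
        best = (h, k)
        h_pp, k_pp, h_p, k_p = h_p, k_p, h, k
        num, den = den, num - a * den
    return best
```

For instance, with T = 1000, t = 10 the sample y = 333 is rounded to 1/3, and y = 250 to 1/4.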
5 Common Coset Range
In this section, G denotes a finite group and S a finite set. Let f0, f1 be functions from G to S. For a normal subgroup H ⊴ G, we say that f0 and f1 are H-similar if on all cosets of H the ranges of f0 and f1 are the same, that is, the multiset equality f0(xH) = f1(xH) holds for every x ∈ G. Consider the function f : G × Z2 → S, where by definition f(x, b) = f_b(x). We will use f for (f0, f1) when it is convenient in the coming discussion. We denote by Range(H) the set of functions f such that f0 and f1 are H-similar. We say that H is (k, t)-generated, for some positive integers k, t, if |H| ≤ k and it is the normal closure of a subgroup generated by at most t elements. The aim of this section is to establish that for any positive integers k and t, the family COMMON-COSET-RANGE(k, t) (for short CCR(k, t)), defined as the set

{f : G × Z2 → S | ∃H ⊴ G : H is (k, t)-generated, f0 and f1 are H-similar},

can be tested by the following quantum test. Note that a subgroup of size k is always generated by at most log k elements, therefore we always assume that t ≤ log k. In the testing algorithm, we assume that we have a quantum oracle for the function f : G × Z2 → S.

Test Common coset range_f(G, k, t, δ)
1. N ← 2kt log(|G|)/δ.
2. For i = 1, ..., N do (ρ_i, b_i) ← Fourier sampling_f(G × Z2).
3. Accept iff ∃H ⊴ G : H is (k, t)-generated and ∀i (b_i = 1 ⇒ ρ_i ∉ H⊥).

We first prove the robustness of the property that when Fourier sampling_f(G × Z2) outputs (ρ, 1), where G is any finite group, H ⊴ G and f ∈ Range(H), then ρ is not in H⊥.
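The notion of H-similarity is easy to check classically on small instances. A Python sketch for G = Z_n (abelian, so every subgroup is normal and coincides with its normal closure); the function names are ours, not from the paper:

```python
from collections import Counter

def subgroup(n, gens):
    """Subgroup of Z_n generated by gens.  In the abelian group Z_n
    every subgroup is normal and equals its own normal closure."""
    H, frontier = {0}, [0]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = (x + g) % n
            if y not in H:
                H.add(y)
                frontier.append(y)
    return H

def h_similar(n, f0, f1, H):
    """f0 and f1 (lists indexed by Z_n) are H-similar iff the multisets
    f0(x + H) and f1(x + H) agree on every coset x + H."""
    return all(Counter(f0[(x + h) % n] for h in H) ==
               Counter(f1[(x + h) % n] for h in H)
               for x in range(n))
```

For example, with n = 6 and H = {0, 2, 4}, two functions that permute each other's values inside each coset are H-similar, while they need not be equal pointwise (i.e. {1}-similar).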
428
Katalin Friedl et al.
Lemma 5. Let S be a finite set and G a finite group. Let f : G × Z2 → S and H ⊴ G. Then dist(f, Range(H)) ≤ |H| · Pr[Fourier sampling_f(G × Z2) outputs (ρ, 1) such that ρ ∈ H⊥]. Our next theorem implies that CCR(k, t) is query efficiently testable when k is polynomial in log|G|.

Theorem 4. For any finite set S, finite group G, integers k ≥ 1, 1 ≤ t ≤ log k, and 0 < δ < 1, Test Common coset range(G, k, t, δ) is a δ-tester for CCR(k, t) on the family of all functions from G × Z2 to S, with O(kt log(|G|)/δ) query complexity.

The proof technique of Theorem 4.2 of [2] yields:

Theorem 5. Let G be a finite Abelian group and let k be the exponent of G. For testing CCR(k, 1) on G, any classical randomized bounded error query algorithm on G requires Ω(√|G|) queries.
References
1. C. H. Bennett and G. Brassard. Quantum cryptography: Public key distribution and coin tossing. In Proc. IEEE International Conference on Computers, Systems, and Signal Processing, pages 175–179, 1984.
2. H. Buhrman, L. Fortnow, I. Newman, and H. Röhrig. Quantum property testing. In Proc. ACM-SIAM Symposium on Discrete Algorithms, 2003.
3. M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. J. Comput. System Sci., 47(3):549–595, 1993.
4. W. van Dam, F. Magniez, M. Mosca, and M. Santha. Self-testing of universal and fault-tolerant sets of quantum gates. In Proc. 32nd ACM STOC, pages 688–696, 2000.
5. M. Ettinger and P. Høyer. On quantum algorithms for noncommutative hidden subgroups. Adv. in Appl. Math., 25(3):239–251, 2000.
6. E. Fischer. The art of uninformed decisions: A primer to property testing. The Computational Complexity Column, Bulletin of the EATCS, 75:97–126, 2001.
7. K. Friedl, G. Ivanyos, F. Magniez, M. Santha, and P. Sen. Hidden translation and orbit coset in quantum computing. In Proc. 35th ACM STOC, 2003.
8. O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653–750, 1998.
9. L. Hales. The Quantum Fourier Transform and Extensions of the Abelian Hidden Subgroup Problem. PhD thesis, University of California, Berkeley, 2002.
10. L. Hales and S. Hallgren. An improved quantum Fourier transform algorithm and applications. In Proc. 41st IEEE FOCS, pages 515–525, 2000.
11. A. Kitaev. Quantum measurements and the Abelian Stabilizer Problem. Technical report no. 9511026, Quantum Physics e-Print archive, 1995.
12. D. Mayers and A. Yao. Quantum cryptography with imperfect apparatus. In Proc. 39th IEEE FOCS, pages 503–509, 1998.
13. M. Nielsen and I. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.
14. R. Rubinfeld and M. Sudan. Robust characterizations of polynomials with applications to program testing. SIAM J. Comp., 25(2):23–32, 1996.
15. P. Shor. Algorithms for quantum computation: Discrete logarithm and factoring. SIAM J. Comp., 26(5):1484–1509, 1997.
Local LTL with Past Constants Is Expressively Complete for Mazurkiewicz Traces

Paul Gastin¹, Madhavan Mukund², and K. Narayan Kumar²

¹ LIAFA, Université Paris 7, 2, place Jussieu, F-75251 Paris Cedex 05, France
[email protected]
² Chennai Mathematical Institute, 92 G N Chetty Road, Chennai 600 017, India
{madhavan,kumar}@cmi.ac.in
Abstract. To obtain an expressively complete linear-time temporal logic (LTL) over Mazurkiewicz traces that is computationally tractable, we need to interpret formulas locally, at individual events in a trace, rather than globally, at configurations. Such local logics necessarily require past modalities, in contrast to the classical setting of LTL over sequences. Earlier attempts at defining expressively complete local logics have used very general past modalities as well as filters (side-conditions) that "look sideways" and talk of concurrent events. In this paper, we show that it is possible to use unfiltered future modalities in conjunction with past constants and still obtain a logic that is expressively complete over traces.
Keywords: Temporal logics, Mazurkiewicz traces, concurrency
1 Introduction
Linear-time temporal logic (LTL) [17] has established itself as a useful formalism for specifying the interleaved behaviour of reactive systems. To combat the combinatorial blow-up involved in describing computations of concurrent systems in terms of interleavings, there has been a lot of interest in using temporal logic more directly on labelled partial orders. Mazurkiewicz traces [13] are labelled partial orders generated by dependence alphabets of the form (Σ, D), where D is a dependence relation over Σ. If (a, b) ∉ D, a and b are deemed to be independent actions that may occur concurrently. Traces are a natural formalism for describing the behaviour of static networks of communicating finite-state agents [24]. LTL over Σ-labelled sequences is equivalent to FO_Σ(<), the first-order logic over Σ-labelled linear orders [12], and thus defines the class of aperiodic languages over Σ. Though FO_Σ(<) permits assertions about both the past and the future, future modalities suffice for establishing the expressive completeness of LTL with respect to FO_Σ(<) [8]. From a practical point of view, a finite-state program may be checked against an LTL specification relatively efficiently.
Partial support of CEFIPRA-IFCPAR Project 2102-1 (ACSMV) is gratefully acknowledged.
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 429–438, 2003.
© Springer-Verlag Berlin Heidelberg 2003
The first expressively complete temporal logic over traces was described in [6] for finite traces and in [19] for infinite traces. The result was refined in [4] to show expressive completeness without past modalities, using an extension of the proof technique developed for LTL in [23]. Formulas in both these logics are defined at global configurations (maximal antichains). Unfortunately, reasoning at the level of global configurations makes the complexity of deciding satisfiability non-elementary [21]. Computational tractability seems to require interpreting formulas at local states—effectively at individual events. Recently, in [10], a local temporal logic has been defined over traces and shown to be expressively complete and tractable (the satisfiability problem is in Pspace). This logic uses both future and past modalities (similar to the until and since operators of LTL) which are further equipped with filters (side-conditions). It was also shown that for finite traces, a restricted form of past modalities suffices, but only in conjunction with filtered future modalities. Another proposal is presented in [1] and this logic also uses the since operator. LTL without any past operators is expressively complete over words but this cannot be the case for traces: there exist two first-order inequivalent traces that cannot be distinguished using only future modalities [22]. In this paper, we show that a very limited ability to talk about the past is sufficient to obtain expressive completeness over traces. Our logic uses unfiltered future modalities and a finite number of past constants. (In particular, there is no nesting of past operators and for that matter even future formulas cannot be nested into past formulas.) As in [3,4,10], we show expressive completeness using an extension to traces of the proof technique introduced in [23] for LTL over sequences. 
From the recent general result proved in [9], it follows that the satisfiability problem for this new logic is also in Pspace. The paper is organized as follows. We begin with some preliminaries about traces. In Section 3 we define our new temporal logic. Section 4 describes a syntactic partition of traces that is used in Section 5 to establish expressive completeness. Many proofs have had to be omitted in this extended abstract. A full version of the paper is available in [11].
2 Preliminaries
We briefly recall some notions about Mazurkiewicz traces (see [5] for background). A dependence alphabet is a pair (Σ, D) where the alphabet Σ is a finite set of actions and the dependence relation D ⊆ Σ × Σ is reflexive and symmetric. The independence relation I is the complement of D. For A ⊆ Σ, the set of letters independent of A is denoted by I(A) = {b ∈ Σ | (a, b) ∈ I for all a ∈ A} and the set of letters depending on (some action in) A is denoted by D(A) = Σ \ I(A). A Mazurkiewicz trace is a labelled partial order t = [V, ≤, λ] where V is a set of vertices labelled by λ : V → Σ and ≤ is a partial order over V satisfying the following conditions: for all x ∈ V, the downward set ↓x = {y ∈ V | y ≤ x} is finite; (λ(x), λ(y)) ∈ D implies x ≤ y or y ≤ x; and x ⋖ y implies (λ(x), λ(y)) ∈ D, where ⋖ = < \ <² is the immediate successor relation in t.
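Concretely, a finite trace can be built from any of its linearizations: positions i < j are directly ordered whenever their letters depend, and ≤ is the transitive closure of these direct orderings. A small Python sketch (illustrative, not from the paper):

```python
def trace_order(word, D):
    """Order matrix of the trace of `word` over the dependence relation D
    (a set of unordered letter pairs; equal letters always depend, since
    D is reflexive).  le[i][j] is True iff position i is below j."""
    n = len(word)
    le = [[i == j for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = word[i], word[j]
            if a == b or (a, b) in D or (b, a) in D:
                le[i][j] = True        # dependent letters keep word order
    for k in range(n):                 # transitive closure
        for i in range(n):
            for j in range(n):
                le[i][j] = le[i][j] or (le[i][k] and le[k][j])
    return le
```

Over the path dependence a − b − c − d used later in the paper, the word "adb" yields a trace whose a and d vertices are unordered minimal events, with the b above the a only.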
The alphabet of a trace t is the set alph(t) = λ(V) ⊆ Σ and its alphabet at infinity, alphinf(t), is the set of letters occurring infinitely often in t. The set of all traces is denoted by R(Σ, D) or simply by R. A trace t is called finite if V is finite. For t = [V, ≤, λ] ∈ R, we define min(t) ⊆ V as the set of all minimal vertices of t. We can also read min(t) ⊆ Σ as the set of labels of the minimal vertices of t. It will be clear from the context what we actually mean. Let t1 = [V1, ≤1, λ1] and t2 = [V2, ≤2, λ2] be a pair of traces such that alphinf(t1) × alph(t2) ⊆ I. We then define the concatenation of t1 and t2 to be t1·t2 = [V, ≤, λ] where V = V1 ∪ V2 (assuming wlog that V1 ∩ V2 = ∅), λ = λ1 ∪ λ2 and ≤ is the transitive closure of the relation ≤1 ∪ ≤2 ∪ ((V1 × V2) ∩ λ⁻¹(D)). The set of finite traces is then a monoid, denoted M(Σ, D) or simply M, with the empty trace 1 = (∅, ∅, ∅) as unit. Here is some useful notation for subclasses of traces. For C ⊆ Σ, let R_C = {t ∈ R | alph(t) ⊆ C} and M_C = M ∩ R_C. Also, (alph = C) = {t ∈ R | alph(t) = C}, (alphinf = C) = {t ∈ R | alphinf(t) = C} and (min = C) = {t ∈ R | min(t) = C}. For A, C ⊆ Σ, we set R^A_C = R_C ∩ (alphinf = A). Observe that M_C = R^∅_C. The first-order theory of traces FO_Σ(<) is given by the syntax: ϕ ::= P_a(x) | x < y | ¬ϕ | ϕ ∨ ϕ | ∃x ϕ, where a ∈ Σ and x, y ∈ Var are first-order variables. Given a trace t = [V, ≤, λ] and a valuation σ : Var → V, t, σ |= ϕ denotes that t satisfies ϕ under σ. We interpret each predicate P_a by the set {x ∈ V | λ(x) = a} and the relation < as the strict partial order relation of t. The semantics then lifts to all formulas as usual. Since the meaning of a closed formula (sentence) ϕ is independent of the valuation σ, we can associate with each sentence ϕ the language L(ϕ) = {t ∈ R | t |= ϕ}. We say that a trace language L ⊆ R is expressible in FO_Σ(<) if there exists a sentence ϕ ∈ FO_Σ(<) such that L = L(ϕ).
We denote by FO_{(Σ,D)}(<) the set of trace languages L ⊆ R(Σ, D) that are expressible in FO_Σ(<). For n > 0, FO^n_Σ(<) denotes the set of formulas with at most n distinct variables (note that each variable may be bound and reused several times). We use the algebraic notion of recognizability. Let h : M → S be a morphism to a finite monoid S. For t, u ∈ R, we say that t and u are h-similar, denoted t ∼_h u, if either t, u ∈ M and h(t) = h(u), or t and u have infinite factorizations into non-empty finite traces t = t1t2···, u = u1u2··· with h(t_i) = h(u_i) for all i. The transitive closure ≈_h of ∼_h is an equivalence relation. Since S is finite, this equivalence relation is of finite index with at most |S|² + |S| equivalence classes. A trace language L ⊆ R is recognized by h if it is saturated by ≈_h (or equivalently by ∼_h), i.e., t ∈ L implies [t]_{≈h} ⊆ L for all t ∈ R. Let L ⊆ R be recognized by a morphism h : M → S. For B ⊆ Σ, L ∩ M_B and L ∩ R_B are recognized by h|_{M_B}, the restriction of h to M_B. A finite monoid S is aperiodic if there is an n ≥ 0 such that sⁿ = s^{n+1} for all s ∈ S. A trace language L ⊆ R is aperiodic if it is recognized by some morphism to a finite and aperiodic monoid. First-order definability coincides with aperiodicity for traces.
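The aperiodicity condition sⁿ = s^{n+1} can be checked directly on a finite monoid given by its multiplication: for each element s, follow the powers of s until they repeat and test whether the eventual cycle has length 1. A Python sketch (our naming, assuming the monoid is given as a multiplication function and an element list):

```python
def is_aperiodic(mult, elements):
    """A finite monoid is aperiodic iff for every s some power satisfies
    s^n = s^(n+1), i.e. the eventual cycle among the powers of s is a
    single fixed point."""
    for s in elements:
        seen, x = [], s
        while x not in seen:           # walk s, s^2, s^3, ... until a repeat
            seen.append(x)
            x = mult(x, s)
        if mult(x, s) != x:            # repeated power is not a fixed point:
            return False               # the cycle has length > 1
    return True
```

For instance, ({0, 1}, · mod 2) is aperiodic, while the group (Z_3, + mod 3) is not, since the powers of 1 cycle with period 3.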
Theorem 1 ([6,7]). A language L ⊆ R(Σ, D) is expressible in FO_Σ(<) if and only if it is aperiodic.
3 Local Temporal Logic
We denote by LocTL^i_Σ the set of (internal) formulas over the alphabet Σ. They are given by the following syntax:

ϕ ::= a ∈ Σ | ¬ϕ | ϕ ∨ ϕ | EX ϕ | ϕ U ϕ | ¬a S b,  with a, b ∈ Σ

Let t = [V, ≤, λ] ∈ R be a finite or infinite trace and let x ∈ V be some vertex of t. We write t, x |= ϕ to denote that trace t at node x satisfies the formula ϕ ∈ LocTL^i_Σ. This is defined inductively as follows:

t, x |= a        if λ(x) = a
t, x |= ¬ϕ       if t, x |= ϕ does not hold
t, x |= ϕ ∨ ψ    if t, x |= ϕ or t, x |= ψ
t, x |= EX ϕ     if ∃y. x ⋖ y and t, y |= ϕ
t, x |= ϕ U ψ    if ∃z ≥ x. [t, z |= ψ and ∀y. (x ≤ y < z) ⇒ t, y |= ϕ]
t, x |= ¬a S b   if ∃z ≤ x. [λ(z) = b and ∀y. (z < y ≤ x) ⇒ λ(y) ≠ a]

(A schematic figure illustrating the semantics of ϕ U ψ and ¬a S b is omitted here.)
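On a finite trace, the semantics above can be evaluated by direct recursion on the formula. A naive Python sketch, with the trace given by its order matrix and labelling, and formulas as nested tuples (all names are ours; this is an illustration, not an efficient model checker):

```python
def sat(le, lab, x, phi):
    """Evaluate a LocTL^i formula at vertex x of a finite trace.

    The trace is its order matrix le (le[i][j] iff i <= j) and labelling
    lab.  Formulas: ('lab', a), ('not', f), ('or', f, g), ('EX', f),
    ('U', f, g), and the past constant ('S', a, b) for "(not a) S b".
    """
    n, op = len(lab), phi[0]
    if op == 'lab':
        return lab[x] == phi[1]
    if op == 'not':
        return not sat(le, lab, x, phi[1])
    if op == 'or':
        return sat(le, lab, x, phi[1]) or sat(le, lab, x, phi[2])
    if op == 'EX':
        # y is an immediate successor of x: x < y, nothing strictly between
        return any(le[x][y] and x != y
                   and not any(z not in (x, y) and le[x][z] and le[z][y]
                               for z in range(n))
                   and sat(le, lab, y, phi[1])
                   for y in range(n))
    if op == 'U':
        # exists z >= x with psi, and phi at every y with x <= y < z
        return any(le[x][z] and sat(le, lab, z, phi[2])
                   and all(sat(le, lab, y, phi[1])
                           for y in range(n)
                           if le[x][y] and le[y][z] and y != z)
                   for z in range(n))
    if op == 'S':
        # exists z <= x labelled b, with no a strictly between z and x
        a, b = phi[1], phi[2]
        return any(le[z][x] and lab[z] == b
                   and all(lab[y] != a
                           for y in range(n)
                           if le[z][y] and le[y][x] and y != z)
                   for z in range(n))
    raise ValueError(op)
```

On the two-vertex chain a ⋖ b, for instance, EX b and a U b hold at the a-vertex, and the past constant ¬a S b holds at the b-vertex but not at the a-vertex.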
The modality U is the "universal" until operator defined in [3]. The modality S is the corresponding since operator. Note that we only use the operator S in the very restricted form of a fixed number of past constants. Past modalities are essential, as indicated by the following example from [22], where the dependence relation is a − b − c − d. These two traces are not first-order equivalent but are bisimilar at the level of events and thus cannot be distinguished by purely future modalities.

a → b → c → b → c ···        d → c → b → c → b ···
    ↑                            ↑
a → b                        d → c

As usual, we can derive useful operators such as universal next AX ϕ = ¬ EX ¬ϕ, eventually in the future F ϕ = ⊤ U ϕ and always in the future G ϕ = ¬ F ¬ϕ. The modality F^∞ a = F a ∧ G(a ⇒ EX F a) expresses the existence of infinitely many vertices labelled with a above the current vertex.

Traces as Models of Formulas: We now turn our attention to defining when a trace satisfies a formula. For LTL over sequences, lifting satisfaction at positions to satisfaction by a word is quite simple: a word models a formula if its initial position models the formula. Since a trace, in general, does not have a unique
initial position, we need to use initial formulas as introduced in [3]. These are boolean combinations of formulas EM ϕ, each of which asserts the existence of a minimal vertex in a trace satisfying the internal formula ϕ. More precisely, the set LocTLΣ of initial formulas over the alphabet Σ is defined as follows:

α ::= ⊥ | EM ϕ with ϕ ∈ LocTL^i_Σ | ¬α | α ∨ α

The semantics of EM is given by: t |= EM ϕ if ∃x. (x ∈ min(t) and t, x |= ϕ). An initial formula α ∈ LocTLΣ defines the trace language L(α) = {t ∈ R | t |= α}. We can then express various alphabetic properties using initial formulas: L(EM a) = {t ∈ R | a ∈ min(t)}, L(EM F a) = {t ∈ R | a ∈ alph(t)}, and L(EM F^∞ a) = {t ∈ R | a ∈ alphinf(t)}. Therefore, for C ⊆ Σ, trace languages such as (alph = C), (alphinf = C) and (min = C) are expressible in LocTLΣ. The following result is immediate from the definition of LocTLΣ.

Proposition 2. If a trace language is expressible in LocTLΣ, then it is expressible in FO³_Σ(<).

We now show that the "filtered" modalities EX_b and F_b from [10], with the following semantics, are both expressible in LocTL^i_Σ:

t, x |= EX_b ϕ  if ∃y. [x ⋖ y and t, y |= ϕ and ∀z. (z ≤ y ∧ λ(z) = b) ⇒ z ≤ x]
t, x |= F_b ϕ   if ∃y. [x ≤ y and t, y |= ϕ and ∀z. (z ≤ y ∧ λ(z) = b) ⇒ z ≤ x]

Proposition 3. For any trace t over some alphabet Σ, any position x in t and any formula ϕ of LocTL^i_Σ,

t, x |= EX_b ϕ ⟺ t, x |= (b ∧ EX(ϕ ∧ ¬b)) ∨ ⋁_{a≠b} (a ∧ EX(ϕ ∧ ¬(¬a S b)))

Let the formula Safe_b = (b ∧ AX ¬b) ∨ ⋁_{a≠b} (a ∧ AX ¬(¬a S b)). Further, let F⁰_b ϕ = Safe_b U ϕ and F^{k+1}_b ϕ = Safe_b U EX_b(F^k_b ϕ).

Proposition 4. For any trace t ∈ R(Σ, D), any position x in t and any formula ϕ of LocTL^i_Σ,

t, x |= F_b ϕ ⟺ t, x |= ⋁_{k≤|Σ|} F^k_b ϕ

Now we establish some important lemmas that are critical in proving the expressive completeness of LocTLΣ.

Lemma 5. Let A ⊆ Σ and b ∈ Σ with b ∉ A. For all ϕ ∈ LocTL^i_A, there is a formula ϕ̂ ∈ LocTL^i_{A∪{b}} such that for all t = t1 b t2 t3 ∈ R with t1 ∈ R, t2 ∈ R_A, min(t2) ⊆ D(b) and min(t3) ⊆ {b}, and for all x ∈ bt2, we have bt2, x |= ϕ iff t, x |= ϕ̂.
(Figure: the factorization t = t1 · b · t2 · t3 of Lemma 5, drawn as consecutive blocks t1, b, t2, t3, where t3 begins with b.)
Proof Sketch. We define ϕ̂ by induction: â = a, (¬ϕ)̂ = ¬ϕ̂, (ϕ ∨ ψ)̂ = ϕ̂ ∨ ψ̂, (EX ϕ)̂ = EX_b ϕ̂, (ϕ U ψ)̂ = ⋁_{d∈A∪{b}} ((ϕ̂ U (d ∧ ψ̂)) ∧ F_b(d ∧ ψ̂)), and (¬c S d)̂ = (¬c S d) ∧ ¬(¬d S b).

Lemma 6. Let A ⊆ Σ and b ∈ Σ with b ∉ A. For all α ∈ LocTL_A, there exists a formula α̂ ∈ LocTL^i_{A∪{b}} such that for all t = t1 b t2 t3 ∈ R with t1 ∈ R, t2 ∈ R_A, min(t2) ⊆ D(b) and min(t3) ⊆ {b}, we have t2 |= α if and only if t, min(bt2t3) |= α̂.

Proof Sketch. We have (¬α)̂ = ¬α̂, (α ∨ β)̂ = α̂ ∨ β̂ and (EM ϕ)̂ = EX(ϕ̂ ∧ ¬b), where ϕ̂ is the formula given by Lemma 5.

Lemma 7. Let A ⊆ Σ and b ∈ Σ with b ∉ A. For all α ∈ LocTL_A, there exists a formula α̃ ∈ LocTL_{A∪{b}} such that for all t = t1 t2 with t1 ∈ R_A and min(t2) ⊆ {b}, we have t1 |= α if and only if t |= α̃.

Proof Sketch. Let (ϕ ∨ ψ)̃ = ϕ̃ ∨ ψ̃, (¬ϕ)̃ = ¬ϕ̃, (EX ϕ)̃ = EX(ϕ̃ ∧ ¬(¬b S b)), (ϕ U ψ)̃ = ϕ̃ U (ψ̃ ∧ ¬(¬b S b)) and (¬c S d)̃ = ¬c S d. Then, for all t = t1 t2 with t1 ∈ R_A, min(t2) ⊆ {b} and for all x ∈ t1, we have t1, x |= ϕ if and only if t, x |= ϕ̃. Finally, let (EM ϕ)̃ = EM(ϕ̃ ∧ ¬b).
4 Decomposition of Traces
The proof of our main result is a case analysis based on partitioning the set of traces according to the structure of the trace. Fix a letter b ∈ Σ and set B = Σ \ {b}. Using the notation introduced in Section 2, let Γ^A = {t ∈ R^A_B | min(t) ⊆ D(b)}, Γ = Γ^∅, and Ω_A = {t ∈ R_{I(A)} | min(t) ⊆ {b}}. Each trace t ∈ R has a unique finite or infinite factorization t = t0 b t1 b t2 ··· with t0 ∈ R_B and t_i ∈ R_B ∩ (min ⊆ D(b)) for all i > 0. In particular, we have

(min = {b}) = (bΓ)⁺ ∪ (bΓ)^ω ∪ ⋃_{∅≠A⊆B} (bΓ)* bΓ^A Ω_A

The following two results will allow us to use this decomposition effectively in proving the expressive completeness of our logic. For this, we use F^∞_b a = F_b a ∧ ¬ F_b(a ∧ ¬ EX_b F_b a).

Lemma 8. Let t = t0 t' with t0, t' ∈ R and min(t') = {b}. Then,

1. t' ∈ (bΓ)^∞ \ {1} if and only if t, min(t') |= β with

β = ⋁_C ( ⋀_{c∈C} F^∞ c ∧ ⋀_{c∉C} ¬ F^∞ c )

where C ranges over connected subsets of Σ such that b ∈ C if C ≠ ∅.
2. t' ∈ (bΓ)* bΓ^A Ω_A if and only if t, min(t') |= γ with

γ = ⋁_{C⊆Σ} ( ⋀_{c∈C} F^∞ c ∧ ⋀_{c∉C} ¬ F^∞ c ∧ F( b ∧ ⋀_{a∈A} F^∞_b a ∧ ⋀_{a∉A} ¬ F^∞_b a ) )

Note that "the" b in bΓ^A Ω_A is characterized by the formula b ∧ F^∞_b a, where a is any letter in A.

Lemma 9. Let A ⊆ Σ and let L ⊆ R be a trace language recognized by a morphism h from M into a finite monoid S. Then,

L ∩ (bΓ)* bΓ^A Ω_A = ⋃_{finite} (L1 ∩ (bΓ)*) b (L2 ∩ Γ^A) (L3 ∩ Ω_A)

where the union is finite and the trace languages L_i ⊆ R are recognized by h.
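Specialized to words (linear traces), the factorization t = t0 b t1 b t2 ··· at the start of this section is just the split at occurrences of b. A Python sketch (illustrative, not from the paper):

```python
def b_factorize(word, b):
    """Factorization w = t0 (b t1) (b t2) ... of a word at the letter b:
    t0 is the (possibly empty) b-free prefix, and every later factor is
    one b followed by a maximal b-free segment."""
    i = word.find(b)
    if i < 0:
        return [word]          # no b at all: the whole word is t0
    blocks, start = [word[:i]], i
    for j in range(i + 1, len(word)):
        if word[j] == b:
            blocks.append(word[start:j])
            start = j
    blocks.append(word[start:])
    return blocks
```

For instance, "aabcabbc" splits at b into t0 = "aa" followed by the blocks "bca", "b", "bc"; concatenating the blocks recovers the word, mirroring the uniqueness of the factorization.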
5 Expressive Completeness
If T is a finite alphabet, we define the linear temporal logic LTL_T(XU) by the syntax: f ::= u ∈ T | f XU f | ¬f | f ∨ f. The length of a finite or infinite word w = w1w2··· ∈ T^∞ is |w| ∈ N ∪ {ω}. For a word w = w1w2··· ∈ T^∞ the semantics of LTL_T(XU) is given by

w |= u        if |w| > 0 and w1 = u
w |= f XU g   if ∃j ∈ N with 1 < j ≤ |w| + 1 such that wjwj+1··· |= g and wkwk+1··· |= f for all 1 < k < j.

Note that if w |= f XU g then w is nonempty. A formula f ∈ LTL_T(XU) defines the word language L(f) = {w ∈ T^∞ | w |= f}. We use the following proposition, which is a consequence of several results on the equivalence between aperiodic word languages, star-free word languages, first-order definable word languages and word languages expressible in LTL_T(XU) [18,12,14,20,8,15,16,2].

Proposition 10. Every aperiodic word language K ⊆ T^∞ is expressible in LTL_T(XU).

We fix T = h(bΓ) and we define the mapping σ : (bΓ)^∞ → T^∞ by σ(t) = h(bt1)h(bt2)··· if t = bt1bt2··· with ti ∈ Γ for i ≥ 1. Note that the mapping σ is well-defined since each trace t ∈ (bΓ)^∞ has a unique factorization t = bt1bt2··· with ti ∈ Γ for i ≥ 1.

Lemma 11. Let L ⊆ R be recognized by h. Then,
1. L ∩ (bΓ)^ω = σ⁻¹(K) for some K expressible in LTL_T(XU).
2. L ∩ (bΓ)⁺ = σ⁻¹(K) for some K expressible in LTL_T(XU).

Next we show how to lift an LTL_T(XU) formula for K ⊆ T^∞ to a LocTL^i formula for σ⁻¹(K) ⊆ (bΓ)^∞.

Lemma 12. Suppose that any aperiodic trace language over B is expressible in LocTL_B. Then, for all f ∈ LTL_T(XU) there exists f̂ ∈ LocTL^i_Σ such that for all t = t1 t' with t1 ∈ R and t' ∈ (bΓ)^∞ \ {1}, we have σ(t') |= f iff t, min(t') |= f̂.
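The semantics of XU above transcribes directly into an evaluator on finite words, with the 1-based positions of the definition becoming 0-based suffix indices. A Python sketch (our naming; finite words only, although the definition also covers infinite ones):

```python
def xu_sat(w, f):
    """Semantics of LTL_T(XU) on a finite word w (a string over T).

    Formulas: ('lab', u), ('not', f), ('or', f, g), ('XU', f, g).
    The witness position j = |w| + 1 of the definition corresponds to
    the empty suffix w[len(w):].
    """
    op = f[0]
    if op == 'lab':
        return len(w) > 0 and w[0] == f[1]
    if op == 'not':
        return not xu_sat(w, f[1])
    if op == 'or':
        return xu_sat(w, f[1]) or xu_sat(w, f[2])
    if op == 'XU':
        along, target = f[1], f[2]
        # strict until: the witness suffix starts at index j >= 1, and
        # `along` holds at every strictly earlier future suffix
        return any(xu_sat(w[j:], target)
                   and all(xu_sat(w[k:], along) for k in range(1, j))
                   for j in range(1, len(w) + 1))
    raise ValueError(op)
```

For example, a XU b holds on "ab" and on "aab" but not on "acb" (the c breaks the along-condition) and not on the single-letter word "a", matching the remark that a satisfied XU needs a strict future witness.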
Proof Sketch. The formula f̂ is defined by structural induction. We let (f ∨ g)̂ = f̂ ∨ ĝ, (¬f)̂ = ¬f̂, and (f XU g)̂ = EX((¬b ∨ f̂) U (b ∧ ĝ)). The difficult case is when f = s ∈ T. For all r ∈ S, the trace language h⁻¹(r) ∩ M_B is aperiodic and therefore expressible in LocTL_B by the hypothesis of the lemma: we find α_r ∈ LocTL_B such that for all t' ∈ M_B, h(t') = r if and only if t' |= α_r. Let α̂_r ∈ LocTL^i_Σ be the formula obtained using Lemma 6. We let ŝ = ⋁_{h(b)·r=s} α̂_r.

Lemma 13. Suppose that any aperiodic trace language over B is expressible in LocTL_B. Let A ⊆ Σ be non-empty and let f ∈ LTL_T(XU). There exists f̂ ∈ LocTL^i_Σ such that for all t = t1 t2 t3 with t1 ∈ R, t2 ∈ (bΓ)*, t3 ∈ bΓ^A Ω_A, we have σ(t2) |= f iff t, min(t2t3) |= f̂.

Lemma 14. Suppose that for any proper subset A of Σ, any aperiodic trace language over A is expressible in LocTL_A. Let L ⊆ R be an aperiodic trace language over Σ. Then, for all b ∈ Σ, there exists ϕ ∈ LocTL^i_Σ such that for all t = t0 t' with t0, t' ∈ R and min(t') = {b}, t' ∈ L iff t, min(t') |= ϕ.

Proof. We prove this lemma by induction on the size of the alphabet Σ. If Σ = ∅ then there is nothing to prove. Now, suppose that Σ ≠ ∅ and let b ∈ Σ. We assume that L is recognized by the aperiodic morphism h : M → S. Now, L ∩ (min = {b}) can be written as

(L ∩ ((bΓ)^∞ \ {1})) ∪ ⋃_{∅≠A⊆B} (L ∩ (bΓ)* bΓ^A Ω_A).

By Lemma 11 we get L ∩ ((bΓ)^∞ \ {1}) = σ⁻¹(L(f)) for some f ∈ LTL_T(XU). From the hypothesis, aperiodic languages over B are expressible in LocTL_B. Hence, we can apply Lemma 12 and we get f̂ such that for all t = t0 t' with t0 ∈ R and t' ∈ (bΓ)^∞ \ {1}, we have σ(t') |= f iff t, min(t') |= f̂. We conclude this case taking ϕ = β ∧ f̂ where β is defined in Lemma 8.

Now, we consider L ∩ (bΓ)* bΓ^A Ω_A where ∅ ≠ A ⊆ B. By Lemma 9,

L ∩ (bΓ)* bΓ^A Ω_A = ⋃_{finite} (L1 ∩ (bΓ)*) b (L2 ∩ Γ^A) (L3 ∩ Ω_A)

where each L_i is an aperiodic language recognized by h. Thus, it suffices to show that for aperiodic languages L1, L2 and L3 recognized by h, there is a formula ϕ such that for all t = t0 t' with t0, t' ∈ R and min(t') = {b}, we have t, min(t') |= ϕ if and only if t' ∈ (L1 ∩ (bΓ)*) b (L2 ∩ Γ^A)(L3 ∩ Ω_A). By Lemma 11 we get L1 ∩ (bΓ)* = σ⁻¹(L(f1)) for some f1 ∈ LTL_T(XU). From the hypothesis, aperiodic languages over B are expressible in LocTL_B. Hence, we can apply Lemma 13 and we get f̂1 such that for all t = t0 t1 t' with t0 ∈ R, t1 ∈ (bΓ)*, and t' ∈ bΓ^A Ω_A, we have t1 ∈ L1 iff t, min(t1t') |= f̂1. Using again the hypothesis of the lemma, we get some formula α2 ∈ LocTL_B such that L2 ∩ R_B = L(α2). By Lemma 6 we find α̂2 ∈ LocTL^i_Σ such that for all t = t0 t1 b t2 t3 with t0 ∈ R, t1 ∈ (bΓ)*, t2 ∈ Γ^A and t3 ∈ Ω_A, we have t2 ∈ L2 iff t, min(bt2t3) |= α̂2.
Finally, L3 is an aperiodic trace language over a smaller alphabet (since A ≠ ∅, I(A) is a proper subset of Σ) and hence by the induction hypothesis there is a formula ϕ3 such that for all t = t0 t1 b t2 t3 with t0 ∈ R, t1 ∈ (bΓ)*, t2 ∈ Γ^A and t3 ∈ Ω_A with t3 ≠ 1, we have t3 ∈ L3 iff t, min(t3) |= ϕ3. Putting these three pieces together, we let

ψ = f̂1 ∧ F(b ∧ F^∞_b a ∧ α̂2 ∧ (ϕ4 ∨ F_b EX(b ∧ ϕ3)))

with ϕ4 = ⊥ if 1 ∉ L3 and ϕ4 = ¬ EX F b otherwise. Then, for all t = t0 t1 b t2 t3 with t0 ∈ R, t1 ∈ (bΓ)*, t2 ∈ Γ^A and t3 ∈ Ω_A, we get from the above discussion that t1bt2t3 ∈ L1bL2L3 if and only if t, min(t1bt2t3) |= ψ. We complete the proof with ϕ = γ ∧ ψ where γ is the formula defined in Lemma 8.

Theorem 15. Any aperiodic real trace language over R(Σ, D) is expressible in LocTLΣ.

Proof. The proof proceeds by induction on the size of Σ. When Σ = {a} is a singleton, L is either a finite set or the union of a finite set and a set of the form aⁿa* for some n ≥ 0. In both cases, it is easy to check that L is expressible in LocTLΣ. For the inductive step, assume that the theorem holds for any aperiodic language over any proper subset of Σ. Let L be recognized by an aperiodic morphism h : M → S. Let b ∈ Σ and B = Σ \ {b} as usual. We can show as in Lemma 9 that L can be written as follows:

L = ⋃_{finite} (L1 ∩ R_B)(L2 ∩ (min ⊆ {b}))
where L1 and L2 are languages recognized by the same aperiodic morphism h. Since the decomposition of any trace t ∈ R as t1t2 with t1 ∈ R_B and t2 ∈ (min ⊆ {b}) is unique, the above decomposition can be rewritten as

L = ⋃_{finite} ((L1 ∩ R_B)(min ⊆ {b})) ∩ (R_B(L2 ∩ (min ⊆ {b})))

Now, by the induction hypothesis, there is a formula α1 in LocTL_B such that for t1 ∈ R_B, t1 |= α1 if and only if t1 ∈ L1. Thus, by Lemma 7, there is a formula α̃1 in LocTLΣ such that t |= α̃1 if and only if t1 |= α1 whenever t = t1t2 with t1 ∈ R_B and min(t2) ⊆ {b}. Thus, (L1 ∩ R_B)(min ⊆ {b}) = L(α̃1).

Since we have assumed expressive completeness for every proper subset of Σ, by Lemma 14 there is a formula ϕ2 such that for any t = t1t2 with min(t2) = {b}, t2 ∈ L2 if and only if t, min(t2) |= ϕ2. Consider the formula

α = α' ∨ EM((b ∧ ϕ2) ∨ (¬b ∧ F_b EX(b ∧ ϕ2)))

where α' = ⊥ if 1 ∉ L2 and α' = ¬ EM F b otherwise. Then, t |= α if and only if either t ∈ R_B and 1 ∈ L2, or there is a minimal b-event x in the trace t and t, x |= ϕ2, that is, t = t1t2 with t1 ∈ R_B, min(t2) = {b} and t2 ∈ L2. Thus R_B(L2 ∩ (min ⊆ {b})) = L(α) is also expressible in LocTLΣ.
References
1. B. Adsul and M. Sohoni. Complete and tractable local linear time temporal logics over traces. In Proc. of ICALP'02, LNCS 2380, 926–937. Springer Verlag, 2002.
2. J. Cohen, D. Perrin, and J.-E. Pin. On the expressive power of temporal logic. Journal of Computer and System Sciences, 46:271–295, 1993.
3. V. Diekert and P. Gastin. Local temporal logic is expressively complete for cograph dependence alphabets. In Proc. of LPAR'01, LNAI 2250, 55–69. Springer Verlag, 2001.
4. V. Diekert and P. Gastin. LTL is expressively complete for Mazurkiewicz traces. Journal of Computer and System Sciences, 64:396–418, 2002.
5. V. Diekert and G. Rozenberg, editors. The Book of Traces. World Scientific, Singapore, 1995.
6. W. Ebinger. Charakterisierung von Sprachklassen unendlicher Spuren durch Logiken. Dissertation, Institut für Informatik, Universität Stuttgart, 1994.
7. W. Ebinger and A. Muscholl. Logical definability on infinite traces. Theoretical Computer Science, 154:67–84, 1996.
8. D. Gabbay, A. Pnueli, S. Shelah, and J. Stavi. On the temporal analysis of fairness. In Proc. of PoPL'80, 163–173, Las Vegas, Nev., 1980.
9. P. Gastin and D. Kuske. Satisfiability and model checking for MSO-definable temporal logics are in PSPACE. To appear in Proc. of CONCUR'03.
10. P. Gastin and M. Mukund. An elementary expressively complete temporal logic for Mazurkiewicz traces. In Proc. of ICALP'02, LNCS 2380, 938–949. Springer Verlag, 2002.
11. P. Gastin, M. Mukund, and K. Narayan Kumar. Local LTL with past constants is expressively complete for Mazurkiewicz traces. Tech. Rep. 2003-008, LIAFA, Université Paris 7 (France), 2003.
12. J.A.W. Kamp. Tense Logic and the Theory of Linear Order. PhD thesis, University of California, Los Angeles, California, 1968.
13. A. Mazurkiewicz. Concurrent program schemes and their interpretations. DAIMI Rep. PB 78, Aarhus University, Aarhus, 1977.
14. R. McNaughton and S. Papert. Counter-Free Automata. MIT Press, 1971.
15. D. Perrin. Recent results on automata and infinite words. In Proc. of MFCS'84, LNCS 176, 134–148. Springer Verlag, 1984.
16. D. Perrin and J.-E. Pin. First order logic and star-free sets. Journal of Computer and System Sciences, 32:393–406, 1986.
17. A. Pnueli. The temporal logic of programs. In FOCS'77, 46–57, 1977.
18. M.-P. Schützenberger. On finite monoids having only trivial subgroups. Information and Control, 8:190–194, 1965.
19. P.S. Thiagarajan and I. Walukiewicz. An expressively complete linear time temporal logic for Mazurkiewicz traces. In Proc. of LICS'97, 183–194, 1997.
20. W. Thomas. Star-free regular sets of ω-sequences. Information and Control, 42:148–156, 1979.
21. I. Walukiewicz. Difficult configurations – on the complexity of LTrL. In Proc. of ICALP'98, LNCS 1443, 140–151. Springer Verlag, 1998.
22. I. Walukiewicz. Local logics for traces. Journal of Automata, Languages and Combinatorics, 7(2):259–290, 2002.
23. Th. Wilke. Classifying discrete temporal properties. In Proc. of STACS'99, LNCS 1563, 32–46. Springer Verlag, 1999.
24. W. Zielonka. Notes on finite asynchronous automata. R.A.I.R.O. — Informatique Théorique et Applications, 21:99–135, 1987.
LTL with Past and Two-Way Very-Weak Alternating Automata

Paul Gastin and Denis Oddoux

LIAFA, Université Paris 7, 2, place Jussieu, F-75251 Paris Cedex 05, France
{Paul.Gastin,Denis.Oddoux}@liafa.jussieu.fr

Abstract. In this paper, we propose a translation procedure of PLTL (LTL with past modalities) formulas to Büchi automata using two-way very-weak alternating automata (2VWAA) as an intermediary step. Our main result is an efficient translation of 2VWAA to generalized Büchi automata (GBA).
1 Introduction
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 439–448, 2003. © Springer-Verlag Berlin Heidelberg 2003

Nowadays, computer systems (both hardware and software) play a central role in most human activities, including safety-critical areas. It is therefore essential to improve their reliability. For this, we need to formally specify their expected behaviors. Choosing the specification language is a crucial factor for the overall validation process. We need a language that makes it easy to express all desired properties. Moreover, the specification language must be easily amenable to validation techniques such as model checking. Temporal logics [13,1,12] are among the most widely used specification languages, the most popular ones being branching time temporal logics (CTL, CTL*) and linear time temporal logic (LTL). These logics are based on pure future modalities, i.e., modalities that do not depend on what happened before the current time. Adding past modalities to LTL does not increase its expressive power [6,2,10], but PLTL (LTL with past modalities) is exponentially more succinct than LTL [9]. In this paper, we focus our attention on PLTL. The drawback of using past modalities is that the satisfiability and the model checking problems are harder to solve. Not harder from a theoretical point of view, since both problems are PSPACE-complete [15] regardless of whether we use past modalities or not, but harder from a practical point of view. Model checking algorithms for PLTL have already been proposed [7,14,18], but few have been implemented and used in an actual model checker. This is surprising, since past modalities make specifications more succinct and, more importantly, much easier and more natural [11]. We believe that the reason is the difficulty of efficiently building a Büchi automaton associated with a PLTL formula, which is a crucial step for a model checker using PLTL as a specification language. This is precisely the problem addressed in the present paper. The easiest construction is the so-called tableau construction (see e.g. [7,11,15]), but a straightforward implementation, called declarative tableau, is highly inefficient. Implementations are based on incremental tableau. They construct
only reachable states and apply some simplification techniques. An incremental tableau construction for PLTL [7] requires backtracking and is much harder to implement than for LTL [5]. Following the automata-theoretic approach of [17], another technique to efficiently generate a generalized Büchi automaton (GBA) from an LTL formula is to use very-weak alternating automata (VWAA) as an intermediary step. As demonstrated in [3], this yields an implementation which is dramatically faster than those based on the tableau construction. In this paper, we develop the same technique for PLTL. Since we have to deal with past modalities, we use two-way very-weak alternating automata (2VWAA). Two-way alternating automata (2AA) were already proposed for specification languages that are much more expressive than PLTL [16,8], and a translation procedure from general 2AA to GBA is given in [8]. Since PLTL formulas are sufficient to specify most interesting properties and can be easily translated to 2VWAA, it is very important to develop an efficient translation procedure restricted to 2VWAA. The main result of this paper is an efficient translation procedure of progressing 2VWAA to GBA. Starting from a progressing 2VWAA with n states, we construct a GBA with at most 2^n states. Since the translation from a PLTL formula ϕ gives a progressing 2VWAA with at most |ϕ| + 1 states, we get for this formula a GBA with at most 2^(|ϕ|+1) states. As a comparison, the algorithm of [8] gives an automaton with 2^O(|ϕ|²) states (recall that the specification language considered in [8] is much more expressive than PLTL). Due to space constraints, some proofs have had to be omitted in this extended abstract. A full version of the paper is available in [4].
2 Two-Way Very-Weak Alternating Automata (2VWAA)
A two-way alternating automaton (2AA) is a six-tuple A = (Q, Σ, δ, I, F, R) where Q is the set of states, Σ is the alphabet, δ : Q → 2^(2^Q × 2^Σ × 2^Q) is the transition function, I ⊆ 2^Q gives the initial condition, F ⊆ Q is the set of final states, and R ⊆ Q is the set of repeated states. A more classical way of defining 2AAs would use I ∈ B⁺(Q) and δ : Q × Σ → B⁺(Q × {−1, 1}), where −1 and 1 indicate whether the head moves left or right. While the two approaches are equivalent, our definition allows for more compact representations of the automata and for faster algorithms. Notice in particular that in the transition function we use 2^Σ instead of Σ, so that transitions that differ only by actions can be gathered. However, the automaton still reads finite and infinite words in Σ^∞ = Σ⁺ ∪ Σ^ω. Runs of 2AAs are defined using Q-forests. A Q-forest over a word u ∈ Σ^∞ is a labeled forest (V, E, σ, ν) where V is the set of vertices, E ⊆ V × V is the set of edges, σ : V → Q is the state labeling function, and ν : V → N is the position labeling function: it indicates the letter the automaton A is reading. For all x ∈ V, 0 ≤ ν(x) ≤ |u| + 1 (where |u| ∈ N ∪ {ω} denotes the length of u), and for all (x, y) ∈ E, either ν(y) = ν(x) + 1 if A goes forward on u, or ν(y) = ν(x) − 1 if A goes backward on u.
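The six-tuple A = (Q, Σ, δ, I, F, R) can be sketched as a small data structure; this is a hedged illustration, and all Python names here are my own, not the paper's:

```python
from dataclasses import dataclass

# delta maps a state to a set of triples (X_left, alpha, X_right) of
# frozensets, mirroring delta : Q -> 2^(2^Q x 2^Sigma x 2^Q).

@dataclass
class TwoWayAA:
    states: frozenset      # Q
    alphabet: frozenset    # Sigma
    delta: dict            # q -> {(frozenset, frozenset, frozenset), ...}
    initial: frozenset     # I: a set of frozensets of states
    final: frozenset       # F
    repeated: frozenset    # R

# Tiny example: one state p that moves forward on letter 'a'.
p = 'p'
aut = TwoWayAA(
    states=frozenset({p}),
    alphabet=frozenset({'a'}),
    delta={p: {(frozenset(), frozenset({'a'}), frozenset({p}))}},
    initial=frozenset({frozenset({p})}),
    final=frozenset(),
    repeated=frozenset({p}),
)
```

Using frozensets keeps transition triples hashable, so sets of transitions can themselves be stored in Python sets.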
We shall use the following notations: λ = (σ, ν) : V → Q × N is the labelling function, E(x) = {y ∈ V | (x, y) ∈ E} is the set of sons of x, E⃖(x) = {y ∈ E(x) | ν(y) = ν(x) − 1} is the set of left sons of x, and E⃗(x) = E(x) \ E⃖(x) is the set of right sons of x; left(x) = E*(E⃖(x)) ∪ {x}, ]x, y] = {z ∈ E⁺(x) | y ∈ E*(z)}, and similarly for [x, y], ]x, y[ and [x, y[. A run ρ of a 2AA A on a word u = u1u2 · · · ∈ Σ^∞ is a Q-forest (V, E, σ, ν) over u such that the roots satisfy the initial condition: if Γ ⊆ V is the set of roots of the forest, then σ(Γ) ∈ I and ν(Γ) ⊆ {1}; and the sons of any node satisfy the transition function: for all x ∈ V, if ν(x) = 0 or ν(x) = |u| + 1 then E(x) = ∅, otherwise there exists α ∈ 2^Σ such that uν(x) ∈ α and (σ(E⃖(x)), α, σ(E⃗(x))) ∈ δ(σ(x)). A run ρ is accepting if for all x ∈ V, ν(x) = 0 or ν(x) = |u| + 1 implies σ(x) ∈ F, and if every infinite branch of ρ goes infinitely often through R (note that ρ may have infinite branches even if u is finite). Finally, L(A) is the set of words on which there exists an accepting run of A. A two-way very-weak alternating automaton (2VWAA) is a 2AA for which there exists a partial order ⪯ on Q such that for all p ∈ Q, all (X⃖, α, X⃗) ∈ δ(p), and all q ∈ X⃖ ∪ X⃗, we have q ⪯ p. Actually, we use progressing (or loop-free) 2VWAA, for which we describe in Section 4 an efficient translation to Büchi automata. A run ρ is progressing if every infinite branch x0 . . . xn . . . in ρ satisfies the following property: ∀N ≥ 0, ∃i ≥ 0, ν(xi) ≥ N. A run ρ has a loop if two nodes on the same branch have the same label: ∃x ∈ V, ∃y ∈ E⁺(x), λ(x) = λ(y). Otherwise ρ is loop-free. A 2AA is progressing (respectively loop-free) if all runs (accepting or not) of this automaton are progressing (respectively loop-free). Note that a loop-free run on a finite word has no infinite branches and is therefore progressing. More generally, any loop-free run is progressing. The converse is trivially false. Still, we have Proposition 1.
A 2AA is progressing iff it is loop-free. Also, given a 2VWAA with n states we can effectively construct a progressing 2VWAA with at most 2n states, accepting the same language. When combined with the translation of progressing 2VWAA to GBA presented in Section 4, this yields an efficient translation from general 2VWAA to GBA.
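The loop test behind Proposition 1 can be sketched as follows; a hedged illustration, with my own encoding of a run as a child map plus a label map, where a label is a pair (state, position):

```python
# A run has a loop iff some branch repeats a label lambda(x) = (state, position).
# children: node -> list of child nodes; label: node -> (state, position).
def has_loop(children, label, roots):
    def dfs(x, seen):
        if label[x] in seen:               # same label twice on one branch
            return True
        seen = seen | {label[x]}
        return any(dfs(y, seen) for y in children.get(x, []))
    return any(dfs(r, frozenset()) for r in roots)
```

On finite runs this terminates directly; for an automaton-level check one would have to reason about all runs, which is what the effective construction mentioned in Proposition 1 addresses.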
3 PLTL to Progressing 2VWAA
The syntax and the semantics of PLTL are classical [12]. Here we use the following syntax: ϕ ::= ⊥ | p | ¬ϕ | ϕ ∨ ϕ | X ϕ | Y ϕ | ϕ U ϕ | ϕ S ϕ, where ⊥ stands for false and p ranges over the set AP of atomic propositions. An LTL formula is a PLTL formula that uses neither Y nor S. Let Σ = 2^AP. We write L(ϕ) ⊆ Σ^∞ for the set of words satisfying a PLTL formula ϕ. We also use the dual operators ⊤, ∧, X̄, Ȳ, Ū, S̄. For instance, X̄ϕ = ¬X¬ϕ and ϕ1 S̄ ϕ2 = ¬(¬ϕ1 S ¬ϕ2). Note that X̄ϕ is not equivalent to Xϕ on finite words: we have X̄ϕ = ¬X⊤ ∨ Xϕ. A PLTL formula can be written in negative normal form, where negations are applied to predicates only. A formula in negative normal form uses ⊥, ⊤, predicates in
AP, their negations, the operators ∨, X, U, Y, S, and their duals. A PLTL formula in negative normal form that is neither a disjunction (∨) nor a conjunction (∧) is called a temporal formula. Notice that transforming a formula into negative normal form does not change the number of temporal operators of the formula. From now on, we suppose that every PLTL formula is in negative normal form.

This section is devoted to the first step of our algorithm: from a PLTL formula with n temporal operators we can compute a progressing 2VWAA with at most n + 1 states. We use the following notations. For J1, J2 ∈ 2^(2^Q × 2^Σ × 2^Q) we let J1 ⊗ J2 = {(X⃖1 ∪ X⃖2, α1 ∩ α2, X⃗1 ∪ X⃗2) | (X⃖1, α1, X⃗1) ∈ J1, (X⃖2, α2, X⃗2) ∈ J2}. For a PLTL formula ψ in negative normal form we define ψ̄ inductively by: ψ̄ = {{ψ}} if ψ is a temporal formula, ψ̄ = ψ̄1 ∪ ψ̄2 if ψ = ψ1 ∨ ψ2, and ψ̄ = {X1 ∪ X2 | X1 ∈ ψ̄1 and X2 ∈ ψ̄2} if ψ = ψ1 ∧ ψ2. It is actually not completely immediate to deal smoothly with the special cases raised when checking a past modality at the beginning of a word, or a future modality at the end of a finite word. We solve this problem using a special state END, which is reached in an accepting run when the current position ν is outside the word u (σ(x) = END implies ν(x) = 0 or ν(x) = |u| + 1).

Definition 2 (ϕ to Aϕ). For any PLTL formula ϕ in negative normal form over the set AP, let Aϕ = (Q, Σ, δ, I, F, R) be the 2AA defined by Q = sub(ϕ) ∪ {END}, where sub(ϕ) is the set of temporal subformulae of ϕ, Σ = 2^AP, I = ϕ̄, F = {END}, R is the set of subformulae in sub(ϕ) that are not of the form ϕ1 U ϕ2, and δ is defined below (∆ extends δ to B⁺(Q)):

δ(⊥) = ∅
δ(⊤) = {(∅, Σ, ∅)}
δ(p) = {(∅, Σp, ∅)} where Σp = {a ∈ Σ | p ∈ a}
δ(¬p) = {(∅, Σ¬p, ∅)} where Σ¬p = Σ \ Σp
δ(X ψ) = {(∅, Σ, e) | e ∈ ψ̄}
δ(X̄ ψ) = {(∅, Σ, e) | e ∈ ψ̄} ∪ {(∅, Σ, {END})}
δ(Y ψ) = {(e, Σ, ∅) | e ∈ ψ̄}
δ(Ȳ ψ) = {(e, Σ, ∅) | e ∈ ψ̄} ∪ {({END}, Σ, ∅)}
δ(ψ1 U ψ2) = ∆(ψ2) ∪ (∆(ψ1) ⊗ {(∅, Σ, {ψ1 U ψ2})})
δ(ψ1 Ū ψ2) = ∆(ψ2) ⊗ (∆(ψ1) ∪ {(∅, Σ, {ψ1 Ū ψ2}), (∅, Σ, {END})})
δ(ψ1 S ψ2) = ∆(ψ2) ∪ (∆(ψ1) ⊗ {({ψ1 S ψ2}, Σ, ∅)})
δ(ψ1 S̄ ψ2) = ∆(ψ2) ⊗ (∆(ψ1) ∪ {({ψ1 S̄ ψ2}, Σ, ∅), ({END}, Σ, ∅)})
δ(END) = ∅
∆(ψ) = δ(ψ) if ψ is a temporal formula
∆(ψ1 ∨ ψ2) = ∆(ψ1) ∪ ∆(ψ2)
∆(ψ1 ∧ ψ2) = ∆(ψ1) ⊗ ∆(ψ2)

Theorem 3 (PLTL to Progressing 2VWAA). For any PLTL formula ϕ in negative normal form with n temporal subformulae, the automaton Aϕ is a progressing 2VWAA with at most n + 1 states and L(Aϕ) = L(ϕ).

Proof. Let us define ψ1 ⪯ ψ2 if ψ1 = END or if ψ1 is a subformula of ψ2. It is easy to see with this partial order that Aϕ is very-weak.
Now, we show that Aϕ is progressing. Suppose that a node x and its son y have the same state-label ψ in a run of Aϕ. Then, either y ∈ E⃗(x) and (ψ = ψ1 U ψ2 or ψ = ψ1 Ū ψ2), or y ∈ E⃖(x) and (ψ = ψ1 S ψ2 or ψ = ψ1 S̄ ψ2). Now let x0, x1, . . . be an infinite branch of a run ρ of Aϕ. Since Aϕ is very-weak, the sequence σ(x0), σ(x1), . . . is ultimately constant. From the previous statement, the limit of this sequence is necessarily a formula ψ1 U ψ2 or ψ1 Ū ψ2, and hence ν is strictly increasing from a certain node on the branch: Aϕ is progressing. The proof that L(Aϕ) = L(ϕ) is omitted. It is similar to the classical proof used for the analogous translation of LTL formulas to VWAA.
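Two ingredients of this section lend themselves to a small sketch; a hedged illustration in which the tuple encoding of formulas and the '~' tags for the dual operators are my own, not the paper's: pushing negations inward to negative normal form, and the decomposition ψ̄ used in Definition 2.

```python
# Formulas are nested tuples: ('not', f), ('or', f, g), ('and', f, g),
# ('X', f), ('Y', f), ('U', f, g), ('S', f, g); dual operators get a '~'.
# Atoms and the constants 'true'/'false' are plain strings.

def nnf(f, neg=False):
    """Negative normal form: negations end up on predicates only."""
    dual = {'X': 'X~', 'X~': 'X', 'Y': 'Y~', 'Y~': 'Y',
            'U': 'U~', 'U~': 'U', 'S': 'S~', 'S~': 'S',
            'or': 'and', 'and': 'or'}
    if isinstance(f, str):
        if f in ('true', 'false'):
            return {'true': 'false', 'false': 'true'}[f] if neg else f
        return ('not', f) if neg else f          # atomic proposition
    if f[0] == 'not':
        return nnf(f[1], not neg)                # double negation cancels
    op = dual[f[0]] if neg else f[0]
    return (op,) + tuple(nnf(g, neg) for g in f[1:])

def bar(f):
    """The decomposition psi-bar: a set of sets of temporal formulas."""
    if isinstance(f, tuple) and f[0] == 'or':
        return bar(f[1]) | bar(f[2])
    if isinstance(f, tuple) and f[0] == 'and':
        return {x | y for x in bar(f[1]) for y in bar(f[2])}
    return {frozenset({f})}                      # temporal formula or literal
```

As in the text, `nnf` leaves the number of temporal operators unchanged, and `bar` behaves like a disjunctive-normal-form split at the propositional level.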
4 Progressing 2VWAA to GBA
The general translation from a 2AA to a BA is rather involved. Here we take full advantage of the fact that we start from a progressing 2VWAA. The basic idea, starting from an accepting run ρ of a 2AA A over a word u = u1u2 . . . ∈ Σ^∞, is to consider the sets Xi = σ(ν−1(i)), and to build a kind of automaton B such that the sequence X0, X1, X2, . . . is an accepting run of B on u. Transitions of B should reflect the fact that each node x of ρ satisfies the transition function of A. If ν(x) = i, this involves the sets Xi−1, Xi, Xi+1 and the letter ui. Hence it is natural to consider quadruples (Xi−1, Xi, Xi+1, ui) as transitions of B. This is why we introduce here generalized BA having such quadruples as transitions. We also use several acceptance conditions on transitions instead of states, since this makes the construction simpler. A GBA is a six-tuple G = (Q, Σ, T, I, F, 𝒯) where Q is the set of states, Σ is the alphabet, T ⊆ Q × Q × Q × 2^Σ is the transition function, I ⊆ Q × Q is the set of initial states, F ⊆ Q × Q is the set of final states, and 𝒯 = {T1, . . . , Tr}, where each Tj ⊆ T is an acceptance table. A run σ of a GBA G on a word u = u1u2 . . . ∈ Σ^ω is an infinite sequence q0, q1, α1, . . . , qn, αn, . . . such that (q0, q1) ∈ I and, for all i ≥ 1, ui ∈ αi and (qi−1, qi, qi+1, αi) ∈ T. A run on a finite word u ∈ Σ⁺ is defined similarly. The run σ is accepting if either u ∈ Σⁿ and (qn, qn+1) ∈ F, or u ∈ Σ^ω and σ uses infinitely many transitions from each Tj, 1 ≤ j ≤ r. L(G) is the set of words on which there exists an accepting run of G.

Definition 4 (A to GA¹). For any progressing 2VWAA A = (Q, Σ, δ, I, F, R), we define the GBA GA¹ = (Q′, Σ, T′, I′, F′, 𝒯′) by Q′ = 2^Q,
T′ = {(X⃖, X, X⃗, α) | for every q ∈ X there exists (Y⃖, β, Y⃗) ∈ δ(q) with Y⃖ ⊆ X⃖, Y⃗ ⊆ X⃗ and α ⊆ β},
I′ = {(X, Y) ∈ Q′ × Q′ | X ⊆ F and ∃Z ∈ I with Z ⊆ Y}, F′ = {(X, Y) ∈ Q′ × Q′ | Y ⊆ F}, and 𝒯′ = {T′q | q ∈ Q \ R} where
T′q = {(X⃖, X, X⃗, α) ∈ T′ | q ∉ X or ∃(Y⃖, β, Y⃗) ∈ δ(q) with Y⃖ ⊆ X⃖, Y⃗ ⊆ X⃗, β ⊇ α and q ∉ Y⃗}.
Indeed, the hard part in a translation from a 2AA A to a GBA G is to make sure that each accepting run of G, which is a flat sequence of nodes, can be lifted
to a Q-forest which defines an accepting run of A. It is remarkable that if A is a progressing 2VWAA, we can simply use sequences X0, X1, X2, . . . as runs of G and still be able to lift them to accepting runs of A.

Proposition 5. For any progressing 2VWAA A, L(GA¹) ⊆ L(A).

Actually L(GA¹) = L(A), but the other inclusion follows directly from results that will be proved afterwards, hence we will not prove it here; see Theorem 9 for the final result.

Proof. Let X0, X1, α1, X2, α2, . . . be an accepting run of GA¹ on a word u = u1u2 . . . ∈ Σ^∞. We are going to define a labelled graph ρ = (V, E, σ, ν). Let V = {(q, i) | i ≥ 0, q ∈ Xi}. By definition of T′, for all 1 ≤ i ≤ |u| and all q ∈ Xi, there exist Y⃖ ⊆ Xi−1, Y⃗ ⊆ Xi+1 and α ∈ 2^Σ with αi ⊆ α and (Y⃖, α, Y⃗) ∈ δ(q), and such that q ∉ Y⃗ whenever q ∉ R and (Xi−1, Xi, Xi+1, αi) ∈ T′q. Let E((q, i)) = Y⃖ × {i − 1} ∪ Y⃗ × {i + 1}. If i ∈ {0, |u| + 1} then let E((q, i)) = ∅. For all (q, i) ∈ V, let σ((q, i)) = q and ν((q, i)) = i. Since (X0, X1) ∈ I′, there exists Γ ⊆ X1 × {1} such that σ(Γ) ∈ I. Now if we "unfold" the labelled graph ρ = (V, E, σ, ν) from the set of roots Γ, we obtain a Q-forest ρ′ = (V′, E′, σ′, ν′). Let x ∈ V′. If ν′(x) ∈ {0, |u| + 1} then E′(x) = ∅. Otherwise, let q = σ′(x) and i = ν′(x), and let Y⃖ = σ′(E⃖′(x)) = σ(E(q, i) ∩ Q × {i − 1}) and Y⃗ = σ′(E⃗′(x)) = σ(E(q, i) ∩ Q × {i + 1}). By construction of ρ there exists α ∈ 2^Σ with αi ⊆ α and (Y⃖, α, Y⃗) ∈ δ(q). Since ui ∈ αi, we have proved that the sons of x satisfy the transition function. Hence ρ′ is a run of A on u. We shall now check that ρ′ is accepting:
– σ′(ν′−1(0)) ⊆ σ(ν−1(0)) = X0 ⊆ F, since (X0, X1) ∈ I′;
– if u is finite of length n, then σ′(ν′−1(n + 1)) ⊆ σ(ν−1(n + 1)) = Xn+1 ⊆ F, since (Xn, Xn+1) ∈ F′;
– assume that ρ′ contains an infinite branch x0, x1, . . . ultimately state-labelled in Q \ R: there exist q ∈ Q \ R and N > 0 such that for all i ≥ N, σ(xi) = q.
Since A is progressing, from Proposition 1 we know that ρ′ is loop-free, and u is infinite. Since X0, X1, . . . is accepting, there exists k ≥ ν(xN) such that (Xk−1, Xk, Xk+1, αk) ∈ T′q. As ρ′ is loop-free, ν is necessarily strictly increasing on xN, xN+1, . . ., and there exists j ≥ N such that λ(xj) = (q, k) and λ(xj+1) = (q, k + 1). But since σ(xj) = q and (Xk−1, Xk, Xk+1, αk) ∈ T′q, we chose E((q, k)) in ρ such that (q, k + 1) ∉ E((q, k)). This is a contradiction, and hence ρ′ cannot contain such a branch.

It is quite easy to see that GA¹ is still too big to be used in an efficient implementation. It contains many useless states, and keeping only the accessible states would not be enough, since the initial states are already too numerous. We introduced GA¹ merely to prove more easily that our construction is correct. The intuition for how to get a smaller GBA from a 2VWAA is as follows: by removing useless parts of runs of a 2VWAA we obtain minimal runs, such that removing any subtree of the forest makes the run invalid. Ideally, we would like
to construct a GBA GA² which is the restriction of GA¹ to transitions of the form (σ(ν−1(i − 1)), σ(ν−1(i)), σ(ν−1(i + 1)), ui) obtained from minimal runs. For this, we start from a small set of initial states, and we compute the states and transitions accessible from these states. We need to store both the current set of states Y and the previous one X. Since A is two-way, it may happen that the set X is not big enough to fulfil all requirements imposed by Y. In this case, we have to backtrack and enlarge X.

Definition 6 (A to GA²). For any progressing 2VWAA A = (Q, Σ, δ, I, F, R), let GA² = (Q′, Σ, T′, I′, F′, 𝒯′) be the GBA computed as follows.
Initialization: I′ = {F} × I, ∇ = {F} × I, T′ = ∅. Then, we apply the following saturation procedure for each state (X, Y) ∈ ∇ until we reach a fixed point:
for each (X′, α, Z) ∈ ⊗q∈Y δ(q):
  if X′ ⊆ X then
    (a) if (Y, Z) ∉ ∇ then add (Y, Z) to ∇;
        if (X, Y, Z, α) ∉ T′ then add (X, Y, Z, α) to T′
  else
    (b) for each (F, X, Y, β) ∈ T′ with (F, X) ∈ I′:
          if (F, X ∪ X′) ∉ ∇ then add (F, X ∪ X′) to ∇;
          add (F, X ∪ X′) to I′
    (c) for each (V, W, X, γ), (W, X, Y, β) ∈ T′:
          if (W, X ∪ X′) ∉ ∇ then add (W, X ∪ X′) to ∇;
          if (V, W, X ∪ X′, γ) ∉ T′ then add (V, W, X ∪ X′, γ) to T′
Finally, we set Q′ = {X ∈ 2^Q | ∃Y ∈ 2^Q, (X, Y) ∈ ∇ or (Y, X) ∈ ∇}, F′ = (2^Q × 2^F) ∩ (Q′ × Q′), and 𝒯′ = {T′q | q ∈ Q \ R} where
T′q = {(X⃖, X, X⃗, α) ∈ T′ | q ∉ X or ∃(Y⃖, β, Y⃗) ∈ δ(q) with Y⃖ ⊆ X⃖, Y⃗ ⊆ X⃗, β ⊇ α and q ∉ Y⃗}.

Proposition 7. For any progressing 2VWAA A, L(GA²) ⊆ L(GA¹).

Proof. It is easy to notice that every state and every transition of GA² also occurs in GA¹. Hence any accepting run of GA² on a word u is also an accepting run of GA¹ on the same word.

Proposition 8. For any progressing 2VWAA A, L(A) ⊆ L(GA²).
Proof. Let ρ = (V, E, σ, ν) be an accepting run of A on a word u. We build by induction on k a sequence ρk = X0 , (Xi , αi )0
Otherwise, for all q ∈ Xn, choose xq ∈ V such that λ(xq) = (q, n), and such that if q ∉ R then q ∉ σ(E⃗(xq)) whenever this is possible. Let X = {xq | q ∈ Xn}, X′n−1 = σ(E⃖(X)) and X′n+1 = σ(E⃗(X)). Since ρ is a run of A on u and n ≤ |u|, there exists α′n with un ∈ α′n such that (X′n−1, α′n, X′n+1) ∈ ⊗q∈Xn δ(q). Since ρk satisfies the inductive hypothesis, we have (Xn−1, Xn) ∈ ∇, and three cases can occur:
(a) If X′n−1 ⊆ Xn−1, then from the construction of GA², (Xn, X′n+1) ∈ ∇ and (Xn−1, Xn, X′n+1, α′n) ∈ T′. Moreover, for each q ∈ Xn \ R, if there exists x ∈ ν−1(n) with σ(x) = q and q ∉ σ(E⃗(x)), then there exist β ⊇ α′n, X′ ⊆ X′n−1 and Z′ ⊆ X′n+1 \ {q} with (X′, β, Z′) ∈ δ(q), hence (Xn−1, Xn, X′n+1, α′n) ∈ T′q. Therefore, ρk+1 = ρk, α′n, X′n+1 satisfies the inductive hypothesis.
We now suppose that X′n−1 ⊄ Xn−1. Note that this implies n ≥ 2, because if n = 1 then, since ρ is accepting, X′0 ⊆ F = X0.
(b) If n = 2, then (X0, X1, X2, α1) ∈ T′ and (X0, X1) ∈ I′, so from the construction of GA², (X0, X1 ∪ X′1) ∈ I′ ⊆ ∇. Hence ρk+1 = X0, X1 ∪ X′1 satisfies the inductive hypothesis.
(c) Assume now that n ≥ 3. Then we have (Xn−3, Xn−2, Xn−1, αn−2) ∈ T′ and (Xn−2, Xn−1, Xn, αn−1) ∈ T′. From the construction of GA², (Xn−3, Xn−2, Xn−1 ∪ X′n−1, αn−2) ∈ T′ and (Xn−2, Xn−1 ∪ X′n−1) ∈ ∇. Moreover, if (Xn−3, Xn−2, Xn−1, αn−2) ∈ T′q then (Xn−3, Xn−2, Xn−1 ∪ X′n−1, αn−2) ∈ T′q. Therefore, ρk+1 = X0, . . . , Xn−2, αn−2, Xn−1 ∪ X′n−1 satisfies the inductive hypothesis.
Now, we show that the sequence (ρk)k≥0 converges. For this, consider the alphabet A = 2^Q × 2^Σ, partially ordered by (X, α) ⪯ (Y, β) if X ⊆ Y. Let us write ρk = (X0, ∅), (Xi, αi)1≤i≤n, (Xn+1, ∅) ∈ A*. The sequence of words (ρk)k≥0 is strictly increasing for the lexicographic order induced by ⪯. Hence, we can easily show that this sequence is either finite (if the construction terminates) or infinite and converging to an infinite word ρ̂ ∈ A^ω.
If u is finite, then the words ρk are of length at most |u| + 2. Therefore the construction terminates at some step k: let ρ̂ = ρk. Since ρ is accepting, σ(ν−1(|u| + 1)) ⊆ F, so (X|u|, X|u|+1) ∈ F′. Hence ρ̂ is an accepting run of GA² on u.
If u is infinite, then the sequence (ρk)k≥0 must be infinite and converges to an infinite word ρ̂ ∈ A^ω. Since any prefix of ρ̂ is also a prefix of ρk for some k, ρ̂ is a run of GA² on u. Suppose that ρ̂ is not accepting: there exists q ∈ Q \ R such that after a given rank N, all transitions used in ρ̂ are not in T′q. Hence for all x ∈ V, if λ(x) = (q, i) with i ≥ N, then q ∈ σ(E⃗(x)). Moreover, since (XN−1, XN, XN+1, αN) ∉ T′q, we have q ∈ XN. So we can find in ρ an infinite branch ultimately labelled by q, which is a contradiction with ρ being accepting.
As a consequence of Propositions 5, 7 and 8 we obtain:

Theorem 9 (Progressing 2VWAA to GBA). For any progressing 2VWAA A, L(GA¹) = L(GA²) = L(A). Hence from any progressing 2VWAA with n states and r repeated states we can construct a GBA accepting the same language with at most 2^n states and n − r acceptance tables.
We conclude this section by explaining several simplifications that are used in the implementation of the algorithm described in Definition 6.

Saturation procedure: In order to generate all needed transitions, we have to repeat the saturation procedure until no more changes occur. This has to be carefully implemented to avoid redundant computations. The idea here is to timestamp states and transitions with integers. When they are first created, new elements of ∇ are timestamped with 0, whereas new transitions in T′ get the current timestamp. Each time a pair (X, Y) is considered for saturation, the current timestamp is incremented, and when this pair has been treated, it gets the current timestamp. Now, step (a) is executed only when the timestamp of (X, Y) is 0. In step (b) we consider only transitions (F, X, Y, β) whose timestamps are greater than or equal to that of (X, Y). Similarly, in step (c), we only consider pairs of transitions for which at least one timestamp is greater than or equal to that of (X, Y).

Transition simplification: The idea here is to remove redundant transitions. When there exist two transitions (X, Y, Z, α) and (X, Y, Z, β) with α ⊆ β, the first transition is more restrictive than the second one and can thus be deleted. This simplification can be done on the fly, that is, when the algorithm adds a new transition to T′.

State simplification: The idea here is to merge equivalent pairs of states. When two pairs of states (X, Y) and (X′, Y′) have the same outgoing transitions ((X, Y, Z, α) ∈ T′ ⟺ (X′, Y′, Z, α) ∈ T′, and for all q ∈ Q \ R, (X, Y, Z, α) ∈ T′q ⟺ (X′, Y′, Z, α) ∈ T′q), they can be merged, and any transition pointing to one of the two states can then point to the merged state. The overall simplification alternates transition simplification and state simplification until a fixed point is reached.

Experimental Results: We have implemented two algorithms, building respectively GA¹ and GA² from a PLTL formula ϕ. Here are the results of some computations on the formula ϕn = ¬G(p1 → F⁻¹(p2 ∧ F⁻¹(p3 ∧ . . . F⁻¹ pn) . . .)), stating that each p1 is preceded by p2, itself preceded by p3, and so on until pn. This demonstrates the striking improvement of the on-the-fly algorithm.
        Aϕ      GA¹                                       GA²
        states  before       after    time     space      before  after   time  space
ϕ2      3       28, 82       6, 11    0.03     <380       7, 10   4, 6    0.01  <380
ϕ3      4       100, 544     12, 36   0.83     <380       10, 19  6, 13   0.01  <380
ϕ4      5       364, 3630    27, 102  230      1,700      13, 31  8, 23   0.08  635
ϕ5      6       1348, 24830  58, 264  130,000  39,000     16, 46  10, 36  9.40  32,000

In order of appearance: tested formula; number of states of the VWAA Aϕ; and for each GAⁱ, i ∈ {1, 2}: number of states and transitions before and after simplification, computation time in seconds, and memory used in KB.
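The transition simplification described in this section, dropping a more restrictive transition (X, Y, Z, α) whenever some (X, Y, Z, β) with α ⊆ β exists, can be sketched as follows; the encoding of transitions as 4-tuples with frozenset letter sets is my own:

```python
def simplify(transitions):
    """Drop transitions strictly subsumed by another with the same (X, Y, Z)."""
    keep = set()
    for t in transitions:
        x, y, z, a = t
        # a < b is Python's strict-subset test on frozensets.
        if any((x, y, z) == (x2, y2, z2) and a < b
               for (x2, y2, z2, b) in transitions if (x2, y2, z2, b) != t):
            continue
        keep.add(t)
    return keep
```

This quadratic scan is only an illustration; an on-the-fly implementation, as the text suggests, would perform the check when each transition is first added.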
Model-Checking: At this point, from any PLTL formula ϕ we are able to compute a GBA accepting the models of ϕ. In order to apply the usual model checking techniques, we can transform our GBA G = (Q, Σ, T, I, F, 𝒯) into a more conventional automaton G′ = (Q′, Σ, T′, I, F, 𝒯′) where T′ = {((q−, q), α, (q, q+)) |
(q−, q, q+, α) ∈ T}, similarly for the acceptance tables, and Q′ is the set of all pairs (q−, q) or (q, q+) appearing in T′. We can actually view G as a convenient encoding of G′. Classical techniques also allow us to obtain a Büchi automaton (BA) by replacing the acceptance tables with a single set of repeated states.
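The re-encoding of quadruple transitions into pair states can be illustrated directly; a hedged sketch whose function name is mine:

```python
def to_pair_automaton(T):
    """Turn quadruples (q-, q, q+, alpha) into ((q-, q), alpha, (q, q+))."""
    T2 = {((qm, q), alpha, (q, qp)) for (qm, q, qp, alpha) in T}
    Q2 = {end for (src, _, tgt) in T2 for end in (src, tgt)}   # all pairs occurring
    return Q2, T2
```

The GBA is thus a compact encoding of the conventional automaton: it never materializes the pair states until this final step.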
References
1. E.A. Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 16, pages 995–1072. Elsevier Science, 1990.
2. D. Gabbay, A. Pnueli, S. Shelah, and J. Stavi. On the temporal analysis of fairness. In Proc. of PoPL'80, pages 163–173, Las Vegas, Nev., 1980.
3. P. Gastin and D. Oddoux. Fast LTL to Büchi automata translation. In Proc. of CAV'01, number 2102 in LNCS, pages 53–65. Springer Verlag, 2001.
4. P. Gastin and D. Oddoux. LTL with past and two-way very-weak alternating automata. Tech. Rep. LIAFA 2003–010, Université Paris 7 (France).
5. R. Gerth, D. Peled, M.Y. Vardi, and P. Wolper. Simple on-the-fly automatic verification of linear temporal logic. In Protocol Specification Testing and Verification, pages 3–18, Warsaw, Poland, 1995. Chapman & Hall.
6. J.A.W. Kamp. Tense Logic and the Theory of Linear Order. PhD thesis, University of California, Los Angeles, California, 1968.
7. Y. Kesten, Z. Manna, H. McGuire, and A. Pnueli. A decision algorithm for full propositional temporal logic. In Proc. of CAV'93, number 697 in LNCS, pages 97–109. Springer Verlag, 1993.
8. O. Kupferman, N. Piterman, and M.Y. Vardi. Extended temporal logic revisited. In Proc. of CONCUR'01, number 2154 in LNCS, pages 519–535. Springer Verlag, 2001.
9. F. Laroussinie, N. Markey, and Ph. Schnoebelen. Temporal logic with forgettable past. In Proc. of LICS'02, pages 383–392, 2002.
10. F. Laroussinie and Ph. Schnoebelen. A hierarchy of temporal logics with past. Theoretical Computer Science, 148:303–324, 1995.
11. O. Lichtenstein, A. Pnueli, and L.D. Zuck. The glory of the past. In Proc. of the 3rd Workshop on Logics of Programs, number 193 in LNCS, pages 196–218. Springer Verlag, 1985.
12. Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer Verlag, Berlin-Heidelberg-New York, 1992.
13. A. Pnueli. The temporal logic of programs. In Proc. of FOCS'77, pages 46–57, 1977.
14. Y.S. Ramakrishna, L.E. Moser, L.K. Dillon, P.M. Melliar-Smith, and G. Kutty. An automata-theoretic decision procedure for propositional temporal logic with Since and Until. Fundamenta Informaticae, 17:271–282, 1992.
15. A.P. Sistla and E.M. Clarke. The complexity of propositional linear time logic. Journal of the Association for Computing Machinery, 32:733–749, 1985.
16. M.Y. Vardi. Reasoning about the past with two-way automata. In Proc. of ICALP'98, number 1443 in LNCS, pages 628–641. Springer Verlag, 1998.
17. M.Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In Proc. of LICS'86, pages 332–344, 1986.
18. M.Y. Vardi and P. Wolper. Reasoning about infinite computations. Information and Computation, 115:1–37, 1994.
Match-Bounded String Rewriting Systems

Alfons Geser¹, Dieter Hofbauer², and Johannes Waldmann³

¹ National Institute of Aerospace, 144 Research Drive, Hampton, Virginia 23666, USA, [email protected]
² Fachbereich Mathematik/Informatik, Universität Kassel, D-34109 Kassel, Germany, [email protected]
³ Fakultät für Mathematik und Informatik, Universität Leipzig, D-04109 Leipzig, Germany, [email protected]
Abstract. We investigate rewriting systems on strings by annotating letters with natural numbers, so-called match heights. A position in a reduct will get height h + 1 if the minimal height of all positions in the redex is h. In a match-bounded system, match heights are globally bounded. Exploiting recent results on deleting systems, we prove that it is decidable whether a given rewriting system has a given match bound. Further, we show that match-bounded systems preserve regularity of languages. Our main focus, however, is on termination of rewriting. Match-bounded systems are shown to be linearly terminating and, more interestingly, for inverses of match-bounded systems, termination is decidable. These results provide new techniques for automated proofs of termination.
1 Introduction
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 449–459, 2003. © Springer-Verlag Berlin Heidelberg 2003

Rewriting is a model of computation. It allows us to handle questions like termination (there is no infinite computation), normalization (a final configuration is reachable), and correctness (no erroneous configuration is reachable). These questions can be formulated in terms of sets of descendants: if R is a rewriting system and L is a language, then R*(L) = {y | x ∈ L, x →*R y}. Now R is correct for L iff R*(L) ∩ Err = ∅, and R is normalizing for L iff L ⊆ R−*(Final), with Err (resp. Final) denoting the set of erroneous (resp. final) configurations. Starting from classical program analysis, recent applications include verification of XML transformations and cryptographic protocols [6]. From the point of view of these applications, it is highly desirable that the reachability relation R* effectively respects language classes with good decidability and closure properties. In the present paper, we achieve this by restricting the flow of information in rewriting systems, using match bounds. We can then apply recent results on deleting systems to obtain closure and termination results. All constructions are effective (and we have indeed implemented them), so they can be used in automated proofs of termination. For instance, we can
automatically verify termination of Zantema's System {a²b² → b³a³}, a task that is notoriously difficult even for a human. To obtain automated termination proofs, we transform rewriting systems as follows: we annotate letters with numbers, so-called match heights. A position in a reduct will get height h + 1 if the minimal height of all positions in the redex is h. A rewriting system is match-bounded if the match heights in derivations are globally bounded. We give the definition and examples in Sections 3 and 4, while in Section 5 we discuss how to verify or refute a match bound. Results follow from our recent research [10] on deleting systems. A string rewriting system R is called deleting if there exists a partial ordering on its alphabet such that each letter in the right-hand side of a rule is less than some letter in the corresponding left-hand side. Deleting systems can be regarded as the inverses of context-limited grammars as defined and investigated by Hibbard [9]. Deleting rewriting systems terminate, and they have linearly bounded derivational complexity. In Section 6, we show that inverses of deleting string rewriting systems have decidable termination and uniform termination problems. This carries over to inverse match-bounded systems immediately: match-bounded systems are terminating, and inverse match-bounded systems have decidable termination and uniform termination problems. An application is given in Section 6. We conclude by discussing ramifications for further research in Section 7.
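A minimal illustration of the rewrite relation and of the match-height annotation just described; the encodings are my own, not the authors' implementation:

```python
# One-step rewriting x ->_R y for a finite SRS R given as (l, r) pairs.
def step(x, R):
    out = set()
    for l, r in R:
        i = x.find(l)
        while i != -1:
            out.add(x[:i] + r + x[i + len(l):])
            i = x.find(l, i + 1)
    return out

# One match-height rewrite step: strings are lists of (letter, height) pairs;
# every letter of the reduct r gets height h+1, where h is the minimal height
# occurring in the replaced redex.
def match_step(s, l, r, i):
    assert ''.join(c for c, _ in s[i:i + len(l)]) == l
    h = min(height for _, height in s[i:i + len(l)])
    return s[:i] + [(c, h + 1) for c in r] + s[i + len(l):]

s0 = [(c, 0) for c in 'aabb']                # a^2 b^2, all heights 0
s1 = match_step(s0, 'aabb', 'bbbaaa', 0)     # Zantema's rule a^2 b^2 -> b^3 a^3
```

Iterating `match_step` and tracking the maximal height that ever appears is the idea behind checking a match bound, though deciding boundedness requires the constructions of the later sections.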
2 Preliminaries
We mostly stick to standard notations for strings and string rewriting, as e.g. in [2]. A string rewriting system (SRS) over an alphabet Σ is a relation R ⊆ Σ* × Σ*, inducing the rewrite relation →_R = {(xℓy, xry) | x, y ∈ Σ*, (ℓ, r) ∈ R} on Σ*. Unless indicated otherwise, all rewriting systems are finite. Pairs (ℓ, r) from R are frequently referred to as rules ℓ → r. By lhs(R) and rhs(R) we denote the sets of left (resp. right) hand sides of R. The reflexive and transitive closure of →_R is →*_R, often abbreviated as R*, and →⁺_R or R⁺ denotes the transitive closure. A rewriting rule ℓ → r is context-free if |ℓ| ≤ 1, and an SRS is context-free if all its rules are. We use ε for the empty string, and |x| is the length of a string x. Further, for a language L ⊆ Σ*, let factor(L) = {y ∈ Σ* | ∃x, z ∈ Σ*: xyz ∈ L}. For standard results on rational transductions we refer to [1]. For a relation ρ ⊆ A × B let ρ(a) = {b ∈ B | (a, b) ∈ ρ} for a ∈ A and ρ(A′) = ⋃_{a∈A′} ρ(a) for A′ ⊆ A, so the set of descendants of a language L ⊆ Σ* modulo R is R*(L). The inverse of ρ is ρ⁻ = {(b, a) | (a, b) ∈ ρ} ⊆ B × A, and we say that ρ satisfies the property inverse P if ρ⁻ satisfies P. Define Inf(ρ) = {a ∈ A | ρ(a) is infinite}; the relation ρ is finitely branching if Inf(ρ) = ∅. For a relation ρ ⊆ Σ* × Σ* and a set ∆ ⊆ Σ, let ρ|_∆ denote ρ ∩ (∆* × ∆*). Note the difference between R*|_∆ and (R|_∆)* for an SRS R. E.g., for R = {a → b, b → c} over Σ = {a, b, c} and ∆ = {a, c} we have (a, c) ∈ R*|_∆, but (a, c) ∉ (R|_∆)*. A relation s ⊆ Σ* × Γ* is a substitution if s(ε) = {ε} and s(xy) = s(x)s(y) for x, y ∈ Σ*, so s is uniquely determined by the languages s(a) for a ∈ Σ. For a family of languages L over Γ, the substitution s is an L-substitution if s(a) ∈ L
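To make the distinction between R*|_∆ and (R|_∆)* concrete, here is a small illustrative sketch (plain Python, not part of the paper) that computes descendant sets by exhaustive closure; the helper names are ours, and the closure only terminates because this example system has finitely many descendants:

```python
def one_step(rules, s):
    # all strings obtained by rewriting one occurrence of a left hand side in s
    out = set()
    for l, r in rules:
        i = s.find(l)
        while i != -1:
            out.add(s[:i] + r + s[i + len(l):])
            i = s.find(l, i + 1)
    return out

def descendants(rules, language):
    # R*(L), computed by closure; terminates because this example is acyclic
    todo, seen = set(language), set(language)
    while todo:
        s = todo.pop()
        for t in one_step(rules, s):
            if t not in seen:
                seen.add(t)
                todo.add(t)
    return seen

R = [("a", "b"), ("b", "c")]
delta = set("ac")
R_delta = [(l, r) for l, r in R if set(l + r) <= delta]   # R|∆ is empty here
print("c" in descendants(R, {"a"}))        # True: (a, c) ∈ R*, hence in R*|∆
print("c" in descendants(R_delta, {"a"}))  # False: (a, c) ∉ (R|∆)*
```

Both rules of R mention the letter b ∉ ∆, so R|_∆ is empty and c is unreachable from a without leaving ∆*.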
for a ∈ Σ. For instance, if L is the family of finite (context-free) languages, then s is a finite (resp. context-free) substitution. If ε ∉ s(a) for every a ∈ Σ, then s is epsilon-free. Note that a finite substitution is finitely branching, and the same holds for the inverse of a finite and epsilon-free substitution. Now we recall definitions and results regarding deleting string rewriting systems. This topic originates with Hibbard [9]. A string rewriting system R over an alphabet Σ is >-deleting for a precedence > on Σ if ε ∉ lhs(R), and if for each rule ℓ → r in R and for each letter a in r, there is some letter b in ℓ with b > a. The system R is deleting if it is >-deleting for some precedence >. If R is deleting, then R is terminating, and R has linear derivational complexity. Theorem 1 ([10]). Let R be a deleting string rewriting system over Σ. Then there are an extended alphabet Γ ⊇ Σ, a finite substitution s ⊆ Σ* × Γ*, and a context-free string rewriting system C over Γ such that R* = (s ◦ C⁻*)|_Σ. This decomposition result implies that deleting systems effectively preserve regularity, and that inverse deleting systems effectively preserve context-freeness, the latter being already shown in [9].
3 Match-Bounded String Rewriting Systems
We will now apply the theory of deleting systems to obtain results for match-bounded rewriting. A derivation is match-bounded if dependencies between rule applications are limited. To make this precise, we annotate positions in strings with natural numbers that indicate their match height. Positions in a reduct get height h + 1 if the minimal height of all positions in the corresponding redex was h. Given an alphabet Σ, define the morphisms lift_c: Σ* → (Σ × ℕ)* for c ∈ ℕ by lift_c: a ↦ (a, c), base: (Σ × ℕ)* → Σ* by base: (a, c) ↦ a, and height: (Σ × ℕ)* → ℕ* by height: (a, c) ↦ c. For a string rewriting system R over Σ we define the rewriting system match(R) = {ℓ′ → lift_c(r) | (ℓ → r) ∈ R, base(ℓ′) = ℓ, c = 1 + min(height(ℓ′))} over the alphabet Σ × ℕ. For instance, the system match({ab → bc}) contains the rules {a0 b0 → b1 c1, a0 b1 → b1 c1, a1 b0 → b1 c1, a1 b1 → b2 c2, a0 b2 → b1 c1, ...}, writing x_h as an abbreviation for (x, h). Note that this is an infinite system. Every derivation modulo match(R) corresponds to a derivation modulo R (for x, y ∈ (Σ × ℕ)*, if x →_match(R) y then base(x) →_R base(y)), and vice versa (for v, w ∈ Σ* and x ∈ (Σ × ℕ)* with base(x) = v, if v →_R w, then there is y ∈ (Σ × ℕ)* such that base(y) = w and x →_match(R) y). Definition 1. A string rewriting system R over Σ is called match-bounded for L ⊆ Σ* by c ∈ ℕ if ε ∉ lhs(R) and max(height(x)) ≤ c for every x ∈ match(R)*(lift_0(L)). If we omit L, then it is understood that L = Σ*. Note that max(height(x)) (resp. min(height(ℓ′)) in the definition of match(R)) denotes the maximum (resp. minimum) of the corresponding set of heights. We set max(∅) = 0.
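As an illustration (not part of the paper's formalism), the finite fragments of match(R) up to a height bound can be enumerated directly; the sketch below reproduces some of the rules of match({ab → bc}) listed above, representing an annotated letter (x, h) as a Python pair:

```python
from itertools import product

def match_rules(R, c):
    # the fragment of match(R) whose rules use heights <= c only:
    # annotate the redex letters with heights 0..c and lift the
    # right hand side to 1 + the minimal redex height
    rules = []
    for l, r in R:
        for hs in product(range(c + 1), repeat=len(l)):
            h = 1 + min(hs)
            if h <= c:
                lhs = tuple(zip(l, hs))
                rhs = tuple((a, h) for a in r)
                rules.append((lhs, rhs))
    return rules

rules = match_rules([("ab", "bc")], 2)
print(len(rules))   # 8 rules with heights up to 2 (only a2 b2 is excluded)
```

For example, the rules a0 b0 → b1 c1 and a1 b1 → b2 c2 from the text both appear in the enumerated fragment.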
For those systems R that are indeed match-bounded, it is enough to consider finite restrictions of the infinite system match(R). Denote by match_c(R) the restriction of match(R) to the alphabet Σ × {0, 1, ..., c}. Proposition 1. If R is match-bounded by c, then R* = lift_0 ◦ match_c(R)* ◦ base. Proposition 2. For all R and all c ∈ ℕ, the system match_c(R) is deleting. Proof. Use the ordering > on Σ × {0, ..., c} where (a, m) > (b, n) iff m < n. (Letters of minimal match height are maximal in the deletion ordering.) Corollary 1. If R is match-bounded for L, then R is terminating on L. Proof. An infinite R-derivation can be transformed into an infinite match(R)-derivation which, given that R is match-bounded by c, is a match_c(R)-derivation. However, match_c(R) is terminating, since it is deleting. We conclude this section with a few examples of match-bounded rewriting systems; for non-examples, see Section 5. A large class of examples comes from the following observation, dual to Proposition 2: Proposition 3. If R is deleting, then R is match-bounded. Proof. Assume R over Σ is deleting for the ordering > on Σ. Then R is match-bounded by the maximal depth (i.e., length of a descending chain) in (Σ, >). Example 1. The system {ba → cb, bd → d, cd → de} is match-bounded by 2, since it is deleting for the ordering a > b > d, a > c > e, c > d. Example 2. The system {ab → bac} is match-bounded by 1, {ab → ac, ca → bc} is match-bounded by 2, and {ab → ac, ca → b} is match-bounded by 3. (None of these systems is deleting.) See Section 5 for verification of the bounds.
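The bound claimed for {ab → bac} in Example 2 can be spot-checked by brute force on short start strings: this system is finitely branching and terminating, so the match(R)-closure of each lifted start string is finite. A small sketch (helper names are ours):

```python
from itertools import product

def match_step(R, s):
    # one rewrite step of match(R) on an annotated string s = ((letter, h), ...)
    out = set()
    for l, r in R:
        for i in range(len(s) - len(l) + 1):
            if "".join(a for a, _ in s[i:i + len(l)]) == l:
                h = 1 + min(h for _, h in s[i:i + len(l)])
                out.add(s[:i] + tuple((a, h) for a in r) + s[i + len(l):])
    return out

def max_height(R, start):
    # explore the full match(R)-closure of lift0(start); finite for this example
    todo = {tuple((a, 0) for a in start)}
    seen, hi = set(todo), 0
    while todo:
        s = todo.pop()
        hi = max([hi] + [h for _, h in s])
        for t in match_step(R, s):
            if t not in seen:
                seen.add(t)
                todo.add(t)
    return hi

R = [("ab", "bac")]
print(max(max_height(R, "".join(w)) for n in range(6)
          for w in product("ab", repeat=n)))   # 1, as claimed in Example 2
```

This of course only checks start strings up to length 5; Section 5 gives the actual decision procedure for a given bound.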
4 Match-Bounded Systems Preserve Regularity
Theorem 2. If R is match-bounded, then R preserves regularity. Proof. By Theorem 1, deleting systems preserve regularity. By Proposition 1, all we need is two more rational transducers to do the encoding and decoding. Example 3. The system R = {ab → ba} on Σ = {a, b} is not regularity preserving, since R*((ab)*) ∩ a*b* = {aⁿbⁿ | n ≥ 0} is not regular. So Theorem 2 implies that R is not match-bounded. (See Example 5 for a more direct proof.) Example 4. Peg solitaire is a one-person game. The objective is to remove pegs from a board. A move consists of one peg X hopping over an adjacent peg Y, landing on the empty space on the other side of Y. After the hop, Y is removed. Peg solitaire on a one-dimensional board corresponds to the rewriting system P = {●●○ → ○○●, ○●● → ●○○},
where ● stands for "peg" and ○ for "empty". One is interested in the language of all positions that can be reduced to one single peg, which is P⁻*(○*●○*). Regularity of P⁻*(○*●○*) is a "folklore theorem"; see [16] for its history. The system P⁻ is match-bounded by 2, so we obtain yet another proof of that result. The automata A_c constructed for match_c(P⁻)*(○*●○*) have sizes 2, 14, and 30 for A_0, A_1, and A_2, respectively. Remark 1. Ravikumar [17] proved that P⁻ preserves regularity by considering the system's change-bound (which is 4). Change-boundedness is a concept strongly related to match-boundedness. Given a length-preserving string rewriting system R (viz. |ℓ| = |r| for every rule ℓ → r), define the system change(R) = {ℓ → r | (base(ℓ) → base(r)) ∈ R, height(succ(ℓ)) = height(r)} over the alphabet Σ × ℕ, where succ is the morphism succ: (Σ × ℕ)* → (Σ × ℕ)* induced by succ: (a, h) ↦ (a, h + 1). Ravikumar proves that if change(R) has bounded height, then R preserves regularity. Our results both generalize and strengthen this, the main improvement being that the definition of match also applies to systems that are not length-preserving. For length-preserving systems, match(R) always gives lower or equal heights, so our result implies Ravikumar's.
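The hop moves of one-dimensional peg solitaire can be explored exhaustively, since they preserve string length. The sketch below is illustrative only; it writes a peg as 'x' and an empty hole as 'o', and assumes the two hop rules peg-peg-empty → empty-empty-peg and its mirror image, which is our reading of Example 4:

```python
# Pegs are 'x', empty holes 'o'; the two hop rules of the game (our reading).
P = [("xxo", "oox"), ("oxx", "xoo")]

def solvable(s):
    # search: can s be rewritten modulo P to a position with a single peg?
    todo, seen = {s}, {s}
    while todo:
        t = todo.pop()
        if t.count("x") == 1:
            return True
        for l, r in P:
            i = t.find(l)
            while i != -1:
                u = t[:i] + r + t[i + 3:]
                if u not in seen:
                    seen.add(u)
                    todo.add(u)
                i = t.find(l, i + 1)
    return False

print(solvable("xxo"))    # True: a single hop wins, xxo -> oox
print(solvable("xxxo"))   # False: the only move leaves two isolated pegs
```

Since moves preserve length, the search space for a fixed board is finite; the point of the match-bound argument above is that the set of all solvable positions, over all lengths, is regular.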
5 Verification and Refutation of Match-Bounds
Theorem 3. The following problem is decidable: Given: a string rewriting system R, a regular language L, and c ∈ ℕ. Question: Is R match-bounded by c for L? Proof. Construct (a finite automaton for) L_c = (lift_0 ◦ match_c(R)*)(L), using Proposition 1. Then decide whether L_c contains a string x that has a factor lift_c(ℓ), for some rule ℓ → r in R. If this is not the case, then L_c = L_{c+1} = ··· and R is match-bounded by c for L. Otherwise, we have found a "high redex" in x, thus there is a string y with x →_match(R) y and max(height(y)) = c + 1. For an implementation, the enormous growth of |match_c(R)| as a function of c is problematic. If we are computing match_c(R)*(lift_0(L)), then we should restrict attention to those rules of match_c(R) that are accessible in derivations starting from lift_0(L). For a language L ⊆ Σ*, a system R over Σ, and a system S ⊆ match(R), define accessible(L, R, S) = match(R) ∩ (factor(S*(lift_0(L))) × (Σ × ℕ)*). Note that this construction is effective if a finite system S and a regular language L are effectively given. We construct a sequence of rewriting systems R_i by R_0 = ∅ and R_{i+1} = accessible(L, R, R_i). Induction on i shows R_i ⊆ match_i(R) for i ≥ 0. In particular, every system R_i is finite. By induction on i, using
monotonicity of S ↦ accessible(L, R, S), one also proves that R_i ⊆ R_{i+1}. Define R_∞ = ⋃_{i∈ℕ} R_i. Clearly, R_∞*(lift_0(L)) = match(R)*(lift_0(L)). If R is match-bounded by c, then R_∞ is a subset of match_c(R); so R_∞ is finite, and there is an index N such that R_N = R_{N+1} = ···. If R is not match-bounded, then R_∞ contains a rule of height c for each c, and is thus infinite. The enumeration of R_i up to i = |match_c(R)| + 1 can be used as an alternative decision procedure for Theorem 3. In some cases we can also verify automatically that a given rewriting system R is not match-bounded for a language L. For this purpose, we try to find a self-embedding set of witnesses, as follows. The set raised_c(R, L) consists of all strings that occur as the base of a "high factor" (with all positions of height > 0) of a string that is reachable by a match_c(R)-derivation starting from lift_0(L): raised_c(R, L) = base(factor(match_c(R)*(lift_0(L))) ∩ (Σ × {1, 2, ...})*). First we observe that a match(R)-derivation can be raised to larger heights. For u′, u ∈ (Σ × ℕ)* we write u′ ≥ u if base(u′) = base(u) and height(u′) ≥_n height(u), where ≥_n denotes the pointwise greater-or-equal ordering on ℕⁿ. Lemma 1. If u′ ≥ u →_match(R) v, then u′ →_match(R) v′ ≥ v for some string v′. Proposition 4. Let R be a string rewriting system and let L be a language, both over Σ. If there are c ∈ ℕ and a language W ⊆ L ∩ raised_c(R, W) with W ⊈ {ε}, then R is not match-bounded for L. Proof. We call u ∈ Σ* a witness for height h if there is a match(R)-derivation from lift_0(u) to some string in (Σ × ℕ)* that contains at least one position of height ≥ h. We will show that for each h ∈ ℕ, there is some witness u ∈ W for height h. For h = 0, there is nothing to prove. By induction, assume u ∈ W is a witness for height h. Since W ⊆ raised_c(R, W), there is some v ∈ W such that u ∈ raised_c(R, {v}). We claim that v is a witness for height h + 1.
By the definition of raised_c, there is a match(R)-derivation D from lift_0(v) to some string xu′y with base(u′) = u and min(height(u′)) ≥ 1. Since u is a witness for h, there is a match(R)-derivation E from lift_0(u) to some word w with maximum height ≥ h. This derivation can be relabelled to a derivation from succ(lift_0(u)) = lift_1(u) to succ(w), where succ is the morphism defined in Section 4 that increases the height of each position by 1. By Lemma 1 and u′ ≥ lift_1(u), this derivation can be raised to a derivation E′: u′ →* w′ for some string w′ ≥ succ(w). Now, D and E′ can be combined to lift_0(v) →* xu′y →* xw′y, such that max(height(w′)) ≥ h + 1. Note that the condition in Proposition 4 can be effectively checked if a finite SRS R, a number c ∈ ℕ, and regular languages W and L are effectively given. Example 5. The system R = {ab → ba} (cf. Example 3) is not match-bounded for Σ*. Take W = (ab)⁺. Then raised_1(R, W) = factor((ba)⁺) ⊇ W. Example 6. Neither is R = {aabb → ba} match-bounded, as witnessed by W = {a, b}* = raised_1(R, W). See Example 8 for a similar system with different behaviour.
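The sets raised_c can be explored mechanically on bounded instances. The sketch below (illustrative only; all names are ours) computes, for R = {ab → ba}, the bases of high factors reachable from a lifted start string, confirming that ab ∈ raised_1(R, {abab}) as in Example 5:

```python
def matchc_step(R, s, c):
    # one match(R) step, keeping only rules of height <= c, i.e. match_c(R)
    out = set()
    for l, r in R:
        for i in range(len(s) - len(l) + 1):
            if "".join(a for a, _ in s[i:i + len(l)]) == l:
                h = 1 + min(h for _, h in s[i:i + len(l)])
                if h <= c:
                    out.add(s[:i] + tuple((a, h) for a in r) + s[i + len(l):])
    return out

def raised_bases(R, w, c):
    # bases of non-empty factors with all heights >= 1, over every string
    # reachable by a match_c(R)-derivation from lift0(w)
    todo = {tuple((a, 0) for a in w)}
    seen, bases = set(todo), set()
    while todo:
        s = todo.pop()
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                if min(h for _, h in s[i:j]) >= 1:
                    bases.add("".join(a for a, _ in s[i:j]))
        for t in matchc_step(R, s, c):
            if t not in seen:
                seen.add(t)
                todo.add(t)
    return bases

R = [("ab", "ba")]
print("ab" in raised_bases(R, "abab", 1))   # True: W = (ab)+ re-embeds
print("ab" in raised_bases(R, "ab", 1))     # False: one redex is not enough
```

Capping the heights at c keeps the annotated alphabet, and hence the closure, finite; Proposition 4 then turns this re-embedding of W into a refutation of match-boundedness.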
We have implemented the algorithms according to Theorem 3 and Proposition 4; see http://theo1.informatik.uni-leipzig.de/~joe/bounded/.
6 Deciding Termination for Inverse Deleting and Inverse Match-Bounded Systems
In this section, we will prove that termination is decidable for inverse deleting string rewriting systems, and conclude that the same holds for inverse match-bounded systems. Lemma 2. Let s ⊆ Σ* × Γ* be a substitution, and let K be a regular language over Γ. Then Inf(s ∩ (Σ* × K)) is regular. Proof. Consider a finite automaton A with state set Q that accepts K. Denote by L(A, p, q) the set of strings x for which there is a path from p to q in A labelled x. We define an automaton B over the alphabet Σ × {F, I} as follows. The sets of states, initial states, and final states of B and A coincide. For p, q ∈ Q and a ∈ Σ, B contains the transition
– p –(a,I)→ q iff the language s(a) ∩ L(A, p, q) is infinite,
– p –(a,F)→ q iff the language s(a) ∩ L(A, p, q) is finite and non-empty.
We claim that a_1...a_n ∈ Inf(s ∩ (Σ* × K)) for a_i ∈ Σ if and only if there is an accepting path in B that is labelled by (a_1, b_1)...(a_n, b_n) where at least one b_i equals I. Therefore, Inf(s ∩ (Σ* × K)) = π(L(B) \ (Σ × {F})*), where π: (Σ × {I, F})* → Σ* is the morphism induced by π: (a, b) ↦ a. Lemma 3. Let Σ, Σ_0, Γ, Γ_0 be alphabets, let s ⊆ Σ* × Γ* be a substitution, and let T_1 ⊆ Σ_0* × Σ* and T_2 ⊆ Γ* × Γ_0* be finitely branching rational transductions. Then Inf(T_1 ◦ s ◦ T_2) is regular. Proof. By Lemma 2, since Inf(T_1 ◦ s ◦ T_2) = T_1⁻(Inf(s ∩ (Σ* × T_2⁻(Γ_0*)))).
Remark 2. The regularity results in Lemma 2 and Lemma 3 are effective if s is an L-substitution for a family L of languages that is closed under intersection with regular sets and for which emptiness and finiteness are decidable. This is the case, e.g., for the family of context-free languages, as in the proof of Proposition 5 below. Proposition 5. For an inverse deleting SRS R, Inf(R*) is effectively regular. Proof. Let R be a system over an alphabet Σ such that R⁻ is deleting. First we exclude some trivial cases. Since R is inverse deleting, we have ε ∉ rhs(R). And if R contains a rule ε → r with r ≠ ε, then Inf(R*) = Σ*. So from now on, we may assume ε ∉ lhs(R) ∪ rhs(R). By Theorem 1 we have R⁻* = (s ◦ C⁻*) ∩ (Σ* × Σ*), where s ⊆ Σ* × Γ* is a finite substitution into an extended alphabet Γ ⊇ Σ, and C is a context-free
rewriting system over Γ. Reviewing the construction in [10], we find that no ε can occur on either side of s and C, so C⁻* is a context-free substitution c ⊆ Γ* × Γ*, and s⁻ ⊆ Γ* × Σ* is the inverse of a finite and epsilon-free substitution. We have R* = e ◦ c ◦ s⁻, where e ⊆ Σ* × Γ* is the embedding of Σ* into Γ*, therefore the claim follows by Lemma 3 and Remark 2. Theorem 4. The following problem is decidable: Given: a regular language L over Σ and an inverse deleting SRS R over Σ. Question: Is there an infinite R-derivation starting from a string in L? Proof. A finitely branching binary relation ρ is well-founded if and only if ρ* is finitely branching and ρ⁺ is irreflexive. Note that if R⁻ is deleting then R⁻⁺ is well-founded, hence irreflexive. So there is an infinite R-derivation starting from a string in L if, and only if, Inf(R*) ∩ L ≠ ∅. By Proposition 5, Inf(R*) is regular, so emptiness of Inf(R*) ∩ L is decidable. Corollary 2. Termination and uniform termination are decidable for inverse deleting string rewriting systems. Proof. Choose L = {x} to decide whether there is an infinite derivation starting with the string x, and choose L = Σ* to decide uniform termination. Example 7. McNaughton [13] proves decidability of termination and of uniform termination for the following class of string rewriting systems: a system R is called an inhibitor system if there is a letter a ∉ Σ such that ℓ ∈ Σ⁺ and r ∈ (Σ ∪ {a})* \ Σ* for every rule ℓ → r in R. (Inhibitor systems play a vital role in solving the uniform termination problem of well-behaved SRSs [13].) We can give an alternative proof by observing that an inhibitor system R is inverse deleting for the ordering that makes a greater than every other letter. Hence decidability of (uniform) termination follows from Corollary 2. As a bonus, we get context-freeness of R*(x) for x ∈ Σ*, a result by Ginsburg and Greibach [8].
This shows once more that language classes and the uniform termination problem are intrinsically related. Theorem 5. Termination and uniform termination are decidable for string rewriting systems R for which R⁻ is match-bounded. Proof. Assume R⁻ is match-bounded by c. Then each derivation modulo R corresponds to a derivation modulo S := match_c(R⁻)⁻, by the remark before Definition 1. So termination of R and of S coincide. By Proposition 2, S is an inverse deleting system, and by Corollary 2, (uniform) termination of S is decidable. Example 8. Proving termination of the one-rule system Z = {aabb → bbbaaa} is known as Zantema's Problem. This is a "modern classic" in rewriting [3,4,12,19,20,22], as it provides a test case where most of the automated methods for termination proofs fail. The match-bound of Z⁻ is 2, therefore termination can be mechanically verified. (Recall that the fact that Z⁻ is match-bounded is in itself not a proof of termination for Z.) The computation of match(Z⁻)*(Σ*) according to Section 5 takes five iterations (i.e.,
the systems stabilize with (Z⁻)_4 = (Z⁻)_5 = (Z⁻)_6 = ···). In our implementation (Haskell code compiled with ghc-5.04.2), this takes about 70 CPU seconds on a 2.4 GHz Pentium. The resulting automaton has 199 states. The intermediate constructions according to Theorem 1 involve much larger automata (up to 1576 states with 15999 transitions), over much larger alphabets (up to 283 letters).
7 Discussion
If the flow of information during rewriting is suitably restricted, nice properties hold: termination, bounded derivational complexity, or preservation of regular languages. For instance, McNaughton [13] and, independently, Ferreira and Zantema [5] use extra letters to indicate the absence of information flow through certain positions. Kobayashi et al. [11] restrict derivations by using markers for the start and the end of a redex. Sénizergues [19] constructs finite automata to solve the termination problem for certain one-rule string rewriting systems. Moczydlowski and Geser [14,15] restrict the way the right hand side of a rule may be consumed in order to simulate the rewrite relation by the computation of a pushdown automaton. Our concepts of deleting and match-bounded rewriting aim at extending these approaches to a systematic theory of termination by language properties. The concept of match-bounded string rewriting opens two novel approaches to automated termination proofs: match-bounded systems are terminating, and for inverse match-bounded systems, termination is decidable. These methods can be further strengthened by considering match-boundedness not for all strings over the respective alphabet, but only for suitably chosen subsets. As we have demonstrated elsewhere [7], the right hand sides of forward closures are one such suitable subset. We expect these powerful tools to enable some major progress on the problem of deciding uniform termination for one-rule string rewriting systems, an open problem for 13 years [12]; see [18, Problem 21]. Single-player games like peg solitaire can be analyzed through the construction of reachability sets. It is very challenging to extend this approach to two-player rewriting games [21]. Instead of termination (which is required anyway for a well-defined game), for instance, one would like to know whether winning sets are regular. Even the impartial case is hard; here the central question is whether Grundy values are bounded.
It seems natural to carry over the notion of match-boundedness to term rewriting, in order to obtain both closure properties and new automated termination proof methods.
Acknowledgements This research was supported in part by the National Aeronautics and Space Administration (NASA) while the last two authors were visiting scientists at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center (LaRC), Hampton, VA, in September 2002.
References
1. J. Berstel. Transductions and Context-Free Languages. Teubner, Stuttgart, 1979.
2. R. V. Book and F. Otto. String-Rewriting Systems. Texts and Monographs in Computer Science. Springer-Verlag, New York, 1993.
3. T. Coquand and H. Persson. A proof-theoretical investigation of Zantema's problem. In M. Nielsen and W. Thomas (Eds.), 11th Annual Conf. of the EACSL CSL-97, Lect. Notes Comp. Sci. Vol. 1414, pp. 177–188. Springer-Verlag, 1998.
4. N. Dershowitz and C. Hoot. Topics in termination. In C. Kirchner (Ed.), Proc. 5th Int. Conf. Rewriting Techniques and Applications RTA-93, Lect. Notes Comp. Sci. Vol. 690, pp. 198–212. Springer-Verlag, 1993.
5. M. C. F. Ferreira and H. Zantema. Dummy elimination: Making termination easier. In H. Reichel (Ed.), 10th Int. Symp. Fundamentals of Computation Theory FCT-95, Lect. Notes Comp. Sci. Vol. 965, pp. 243–252. Springer-Verlag, 1995.
6. T. Genet and F. Klay. Rewriting for Cryptographic Protocol Verification. In D. A. McAllester (Ed.), 17th Int. Conf. Automated Deduction CADE-17, Lect. Notes Artificial Intelligence Vol. 1831, pp. 271–290. Springer-Verlag, 2000.
7. A. Geser, D. Hofbauer, and J. Waldmann. Match-bounded string rewriting systems and automated termination proofs. 6th Int. Workshop on Termination WST-03, Valencia, Spain, 2003.
8. S. Ginsburg and S. A. Greibach. Mappings which preserve context sensitive languages. Inform. and Control, 9(6):563–582, 1966.
9. T. N. Hibbard. Context-limited grammars. J. ACM, 21(3):446–453, 1974.
10. D. Hofbauer and J. Waldmann. Deleting string rewriting systems preserve regularity. In Proc. 7th Int. Conf. Developments in Language Theory DLT-03, Lect. Notes Comp. Sci., Springer-Verlag, 2003. To appear.
11. Y. Kobayashi, M. Katsura, and K. Shikishima-Tsuji. Termination and derivational complexity of confluent one-rule string-rewriting systems. Theoret. Comput. Sci., 262(1-2):583–632, 2001.
12. W. Kurth. Termination und Konfluenz von Semi-Thue-Systemen mit nur einer Regel. Dissertation, Technische Universität Clausthal, Germany, 1990.
13. R. McNaughton. Semi-Thue systems with an inhibitor. J. Automat. Reason., 26:409–431, 2001.
14. W. Moczydlowski Jr. Jednoregułowe systemy przepisywania słów. Masters thesis, Warsaw University, Poland, 2002.
15. W. Moczydlowski Jr. and A. Geser. Termination of single-threaded one-rule Semi-Thue systems. Technical Report TR 02-08 (273), Warsaw University, Dec. 2002. Available at http://research.nianet.org/~geser/papers/single.html.
16. C. Moore and D. Eppstein. One-dimensional peg solitaire, and duotaire. In R. J. Nowakowski (Ed.), More Games of No Chance, Cambridge Univ. Press, 2003.
17. B. Ravikumar. Peg-solitaire, string rewriting systems and finite automata. In H.-W. Leong, H. Imai, and S. Jain (Eds.), Proc. 8th Int. Symp. Algorithms and Computation ISAAC-97, Lect. Notes Comp. Sci. Vol. 1350, pp. 233–242. Springer-Verlag, 1997.
18. The RTA list of open problems. http://www.lsv.ens-cachan.fr/rtaloop/.
19. G. Sénizergues. On the termination problem for one-rule semi-Thue systems. In H. Ganzinger (Ed.), Proc. 7th Int. Conf. Rewriting Techniques and Applications RTA-96, Lect. Notes Comp. Sci. Vol. 1103, pp. 302–316. Springer-Verlag, 1996.
20. E. Tahhan Bittar. Complexité linéaire du problème de Zantema. C. R. Acad. Sci. Paris Sér. I Inform. Théor., t. 323:1201–1206, 1996.
21. J. Waldmann. Rewrite games. In S. Tison (Ed.), Proc. 13th Int. Conf. Rewriting Techniques and Applications RTA-02, Lect. Notes Comp. Sci. Vol. 2378, pp. 144–158. Springer-Verlag, 2002.
22. H. Zantema and A. Geser. A complete characterization of termination of 0^p 1^q → 1^r 0^s. Appl. Algebra Engrg. Comm. Comput., 11(1):1–25, 2000.
Probabilistic and Nondeterministic Unary Automata
Gregor Gramlich
Institut für Informatik, Johann Wolfgang Goethe-Universität Frankfurt
Robert-Mayer-Straße 11-15, 60054 Frankfurt am Main, Germany
[email protected]
Fax: +49-69-798-28814
Abstract. We investigate unary regular languages and compare deterministic finite automata (DFA's), nondeterministic finite automata (NFA's) and probabilistic finite automata (PFA's) with respect to their size. Given a unary PFA with n states and an ε-isolated cutpoint, we show that the minimal equivalent DFA has at most n^(1/(2ε)) states in its cycle. This result is almost optimal, since for any α < 1 a family of PFA's can be constructed such that every equivalent DFA has at least n^(α/(2ε)) states. Thus we show that for the model of probabilistic automata with a constant error bound, there is only a polynomial blowup for cyclic languages. Given a unary NFA with n states, we show that efficiently approximating the size of a minimal equivalent NFA within the factor √n/ln n is impossible unless P = NP. This result even holds under the promise that the accepted language is cyclic. On the other hand, we show that we can approximate a minimal NFA within the factor ln n if we are given a cyclic unary n-state DFA.
1 Introduction
Regular languages, and finite state automata as their acceptance devices, are well studied objects. We consider DFA's, NFA's and PFA's with isolated cutpoint and compare their sizes. For an n-state PFA with ε-isolated cutpoint, the equivalent DFA needs at most (1 + 1/(2ε))^(n−1) states [9]. For a unary alphabet, Milani and Pighizzini [8] show the tight bound of Θ(e^√(n ln n)) for the number of states in the cycle of the minimal DFA. This result does not depend on the size of the isolation, and the proof of the lower bound actually relies on an isolation that tends to zero. We show that the isolation plays a crucial role, namely that L can be accepted by a DFA with at most n^(1/(2ε)) states in its cycle. Thus, for constant isolation ε, we improve the upper bound of Milani and Pighizzini to a polynomial in n.
Partially supported by DFG project SCHN503/2-1
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 460–469, 2003. © Springer-Verlag Berlin Heidelberg 2003
The minimization problem for DFA's can be solved efficiently. But for a given DFA, the problem of determining the minimal number of states of an equivalent NFA is PSPACE-complete [5]. A result of Stockmeyer and Meyer [10] shows that the problem of minimizing a given NFA is PSPACE-complete for a binary alphabet and NP-complete for a unary alphabet. We show that, given an n-state NFA accepting L, it is impossible to efficiently approximate the number of states of a minimal NFA accepting L within a factor of √n/ln n unless P = NP. This result holds even under the promise that L is a unary cyclic language, and it can be extended to PFA's with isolated cutpoint. On the other hand, we show that if we are given a unary cyclic n-state DFA accepting L, then we can efficiently construct an equivalent NFA with at most k · (1 + ln n) states, where k is the number of states of a minimal NFA accepting L. This contrasts with a result of Jiang et al. [4], who show that the number of states of a minimal NFA equivalent to a given unary DFA cannot be computed in polynomial time unless NP ⊆ DTIME(n^O(ln n)). This result even holds if we restrict the DFA to accept only cyclic languages. The next section gives a short introduction to unary NFA's and unary PFA's. Unary PFA's with ε-isolated cutpoint, resp. unary NFA's, are investigated in Sections 3 and 4, respectively.
2 Preliminaries
We consider unary languages L ⊆ {a}*. A unary regular language is recognized by a DFA that starts with a possibly empty path and ends in a non-empty cycle. A language L is ultimately d-cyclic if there is a μ ∈ ℕ₀ such that (a^j ∈ L ⇔ a^{j+d} ∈ L) holds for every j ≥ μ, and we say that d is an ultimate period of L. A smallest ultimate period is called the minimal ultimate period c(L), and every ultimate period is a multiple of the minimal ultimate period. L is called cyclic if the path of the minimal DFA for L is empty. For cyclic languages we use the term period instead of ultimate period, and d-cyclic (resp. minimally d-cyclic) instead of ultimately d-cyclic (resp. minimally ultimately d-cyclic). The size of an automaton A is the number of states of A. For a given regular language L, we use nsize(L) for the minimal size of an NFA accepting L. A normal form for unary NFA's is established by Chrobak in [1]. His construction converts a given NFA N with n states into an equivalent NFA N′ consisting of a deterministic path and several deterministic cycles. Only the last state of the path branches nondeterministically into one state of each cycle. The path of N′ has length O(n²), and the number of all states in the cycles is bounded by n. Chrobak proves that L(N′) is ultimately d-cyclic, where d is the least common multiple of the lengths of the cycles in N′. For cyclic languages we introduce union automata as automata in Chrobak normal form with an empty path. Definition 1. A union automaton U is described by a collection (A_1, ..., A_k) of cyclic DFA's. U accepts an input w iff there is an A_i such that A_i accepts w. The size of U is defined as Σ_{i=1}^k s_i, where s_i is the number of states of A_i.
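Since every period of a cyclic language is a multiple of the minimal one, the minimal period c(L) of a cyclic language can be found by checking the divisors of the cycle length; a minimal sketch (function name is ours, not from the paper), taking one full cycle of the characteristic sequence as input:

```python
def minimal_period(bits):
    # bits[j] == 1 iff a^j ∈ L, for one full cycle of a cyclic language L;
    # the minimal period divides the cycle length, so only divisors are tried
    n = len(bits)
    for d in range(1, n + 1):
        if n % d == 0 and all(bits[j] == bits[(j + d) % n] for j in range(n)):
            return d
    return n

print(minimal_period([1, 0, 1, 0, 1, 0]))  # 2
print(minimal_period([1, 1, 0, 1, 1, 0]))  # 3
```
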
To convert a union automaton U into an NFA with a single initial state, we simply add one state q₀ and transitions from q₀ to each state that succeeds an initial state of one of the deterministic automata that U consists of. Jiang, McDowell and Ravikumar [4] show a structural result about minimal unary NFA's accepting cyclic languages. Fact 1. [4] Let L be a minimally D-cyclic unary language. Every minimal NFA accepting L can be obtained by converting some minimal union automaton U accepting L into an NFA. Moreover, D is the least common multiple of the cycle lengths of U. Consider the prime factorization D = p₁^{α₁} · ... · p_r^{α_r}, where the p_i are distinct primes and α_i ∈ ℕ; then every NFA accepting L has at least p₁^{α₁} + ... + p_r^{α_r} states. This result offers some clues about the composition of the (ultimate) period of a unary language, which also apply to probabilistic finite automata, defined as follows. A unary PFA M with a set Q of n states is described by a stochastic n × n matrix A, a stochastic row vector π representing the initial distribution, and a column vector η ∈ {0, 1}ⁿ indicating the final states. Observe that πA^j η is the acceptance probability for input a^j. The language accepted by M with respect to a cutpoint λ ∈ [0, 1] is L(M, λ) = {a^j | πA^j η > λ}. We call the cutpoint λ ε-isolated if for every j ∈ ℕ₀: |πA^j η − λ| ≥ ε. We call a cutpoint isolated if there is an ε > 0 such that it is ε-isolated. We regard A as the stochastic matrix of a finite Markov chain M, with rows and columns indexed by states, and consider the representation of M as a directed graph G_A = (V, E) with V = Q. An arc from state q to state p exists in G_A if A_{p,q} > 0. We call a strongly connected component B ⊆ Q of G_A ergodic¹ if, starting in any state q ∈ B, we cannot reach any state outside of B. States within an ergodic component are called ergodic states; non-ergodic states are called transient.
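Fact 1's lower bound can be computed directly from the prime factorization of the minimal period D; a small sketch (function name is ours), summing the maximal prime power for each prime dividing D:

```python
def nfa_lower_bound(D):
    # sum of the maximal prime powers dividing D: by Fact 1, every NFA for a
    # minimally D-cyclic language has at least this many states
    total, d = 0, 2
    while d * d <= D:
        if D % d == 0:
            pk = 1
            while D % d == 0:
                D //= d
                pk *= d
            total += pk
        d += 1
    if D > 1:
        total += D   # remaining factor is a prime power p^1
    return total

print(nfa_lower_bound(12))   # 4 + 3 = 7
print(nfa_lower_bound(30))   # 2 + 3 + 5 = 10
```

This also explains why small NFA's favor periods built from several small distinct primes: the lcm D can be huge while the sum of prime powers stays small.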
For an ergodic component B, the period of q ∈ B is defined as dq = gcd{j| starting in q one can reach q with exactly j steps}. All states q ∈ B have the same period d = dq , which we call the period of B. Factorization and primality play an important role for (ultimate) periods. To estimate the size of the i-th prime number we use the following fact. Fact 2. [3] If pi is the i-th prime number, then i ln i ≤ pi ≤ 2i ln i for i ≥ 3.
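The period of an ergodic component can be computed by a standard search-tree argument: assign each state a level along a spanning tree from an arbitrary start state; the period is the gcd of level[u] + 1 − level[v] over all internal arcs u → v. This is our own sketch, not a construction from the paper:

```python
from math import gcd

def component_period(adj, comp):
    """Period of a strongly connected component `comp` of the digraph `adj`:
    gcd of level[u] + 1 - level[v] over all arcs u -> v inside the component."""
    start = next(iter(comp))
    level = {start: 0}
    queue, d = [start], 0
    while queue:
        u = queue.pop()
        for v in adj[u]:
            if v not in comp:
                continue
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
            d = gcd(d, level[u] + 1 - level[v])
    return d

# a pure 3-cycle has period 3
assert component_period({0: [1], 1: [2], 2: [0]}, {0, 1, 2}) == 3
```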
3
Unary PFA’s with ε-Isolated Cutpoint
In [8] Milani and Pighizzini show that the ergodic components of a unary PFA with isolated cutpoint basically play the same role as the cycles of an NFA in Chrobak normal form. The least common multiple D of the periods of these components is an ultimate period of the language L(M, λ) accepted by the PFA. This result does not take the isolation into account and yields an exponential upper bound for the ultimate period, namely c(L(M, λ)) = O(e^√(n ln n)), where n
¹ Unlike some authors we do not require an ergodic component to be aperiodic.
Probabilistic and Nondeterministic Unary Automata
is the number of states in the PFA. We show that the ultimate period c(L(M, λ)) decreases significantly with increasing isolation, and this results in a polynomial upper bound for c(L(M, λ)) if ε is a constant. As a first step, Lemma 1 shows that the period di of an ergodic component Bi with absorption probability ri < 2ε, where

ri := lim_{t→∞} Σ_{p∈Bi} (πA^t)_p = prob(a random walk is eventually absorbed into Bi),

does not play a role for c(L(M, λ)); neither do the periods of collections of ergodic components with small combined absorption probability.

Lemma 1. Let B1, . . . , Bm be the ergodic components of a Markov chain with periods di and absorption probabilities ri, respectively. If the corresponding PFA M accepts L := L(M, λ) with ε-isolated cutpoint, then for any I ⊆ {1, . . . , m} with Σ_{i∈I} ri > 1 − 2ε, D(I) := lcm{di | i ∈ I} is an ultimate period of L and thus a multiple of c(L).

Proof (Sketch). For an ultimate period D of L the limit A^∞ := lim_{t→∞} (A^D)^t exists, where we require convergence in each entry of the matrix. This can be shown by bringing the matrix A into a normal form (see Gantmacher [2]), so that the stochastic submatrix Ai for each ergodic component Bi forms a block within A. If Bi has period di, then lim_{t→∞} (Ai^{di})^t exists. Since D is a multiple of every di, the limit of (A^D)^t exists. As a consequence of [8] and of the existence of this limit, for every δ there must be a µδ ∈ IN, such that for every j ≥ µδ, a^j ∈ L ⇔ a^{j+D} ∈ L and

Σ_{q∈Q} |(πA^j)_q − (πA^{j mod D} A^∞)_q| < δ.
Let I ⊆ {1, . . . , m} be a set of indices with Σ_{i∈I} ri > 1 − 2ε. Assume that D(I) is not an ultimate period of L. Then there is some j > µδ with a^j ∈ L and a^{j+D(I)} ∉ L. So πA^j η ≥ λ + ε and πA^{j+D(I)} η ≤ λ − ε, and thus π(A^j − A^{j+D(I)}) η ≥ 2ε. Let (x)⁺ = x if x > 0, and let (x)⁺ = 0 otherwise. Remember that η ∈ {0, 1}^n. Then, with QI := ∪_{i∈I} Bi ∪ {q | q transient}, we have

2ε ≤ Σ_{q∈Q} (π(A^j − A^{j+D(I)}))⁺_q ≤ Σ_{q∈QI} (π(A^j − A^{j+D(I)}))⁺_q + Σ_{q∉QI} (π(A^j − A^{j+D(I)}))⁺_q .   (1)

The proof of the existence of A^∞ also shows that if we restrict the matrix A to all the states in QI and call the resulting substochastic matrix AI, then the limit lim_{t→∞} (AI^{D(I)})^t exists as well. And so, for δ = 2ε − Σ_{i∉I} ri and for any j ≥ µδ, we get

Σ_{q∈QI} (π(A^j − A^{j+D(I)}))⁺_q < δ .   (2)
But on the other hand, for any j ≥ 0,

Σ_{q∉QI} (π(A^j − A^{j+D(I)}))⁺_q ≤ Σ_{q∉QI} (πA^j)_q ≤ Σ_{i∉I} ri = 2ε − δ .   (3)
The second inequality follows, since the absorption probability is the limit of a monotonically increasing sequence. So we have reached a contradiction, since the sum of (3) and (2) does not satisfy (1).

We can now exclude some prime powers as potential divisors of c(L(M, λ)).

Definition 2. Let M be a PFA with ergodic periods di and absorption probabilities ri. We call a prime power q = p^s ε-essential (for M), if

Σ_{i: q divides di} ri ≥ 2ε   and   Σ_{i: q·p divides di} ri < 2ε.
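Definition 2 translates directly into a small computation. The sketch below (helper names are ours; we assume the periods and absorption probabilities are given explicitly) enumerates the prime powers dividing some period and multiplies together the ε-essential ones:

```python
def prime_powers(ds):
    """All prime powers (p, p^s) dividing some period in ds."""
    out = set()
    for d in ds:
        n, p = d, 2
        while p * p <= n:
            if n % p == 0:
                q = 1
                while n % p == 0:
                    n //= p
                    q *= p
                pw = p
                while pw <= q:
                    out.add((p, pw))
                    pw *= p
            p += 1
        if n > 1:
            out.add((n, n))
    return out

def essential_period(ds, rs, eps):
    """Product of the eps-essential prime powers (the D of Lemma 2)."""
    D = 1
    for p, q in prime_powers(ds):
        big = sum(r for d, r in zip(ds, rs) if d % q == 0)
        small = sum(r for d, r in zip(ds, rs) if d % (q * p) == 0)
        if big >= 2 * eps and small < 2 * eps:
            D *= q
    return D

assert essential_period([4, 6], [0.5, 0.5], 0.2) == 12   # D = lcm(4, 6)
assert essential_period([4, 6], [0.5, 0.5], 0.3) == 2    # stronger isolation, smaller D
```

For periods 4 and 6 with equal absorption probability, isolation ε = 0.2 keeps the full period 12, while ε = 0.3 shrinks the guaranteed ultimate period to 2, illustrating how the period decreases with increasing isolation.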
Lemma 2. If λ is ε-isolated for a PFA M, then

D = Π_{q is ε-essential} q

is an ultimate period of L = L(M, λ). Hence D is a multiple of c(L).

Proof. Assume that c(L) is a multiple of a prime power p^k which does not divide any ε-essential prime power. Let J = {i | p^k divides di}, and let I = {1, . . . , m} \ J be the complement of J. Then p^k does not divide any di with i ∈ I, and thus p^k does not divide D(I) = lcm{di | i ∈ I}. Since p^k does not divide any ε-essential prime power, we have that Σ_{i∈J} ri < 2ε, and so Σ_{i∈I} ri > 1 − 2ε. According to Lemma 1, D(I) is a multiple of c(L). But on the other hand D(I) is not a multiple of p^k. This is a contradiction, since p^k was assumed to divide c(L).

Now we show the tight upper bound for the minimal ultimate period of a language accepted by an ε-isolated PFA.

Theorem 1. a) For any unary PFA M with n states and ε-isolated cutpoint λ,

c(L(M, λ)) ≤ n^{1/(2ε)}.

b) For any 0 ≤ α < 1 and any ε = 1/(2m) with m ∈ IN, there is a PFA M with n states and ε-isolated cutpoint λ, such that c(L(M, λ)) > n^{α/(2ε)}.
Proof. a) Let M have m ergodic components with periods d1, . . . , dm. Set D := Π_{q is ε-essential} q and remember that Σ_{i: q divides di} ri ≥ 2ε for any ε-essential q. Then

D^{2ε} = Π_{q is ε-essential} q^{2ε} ≤ Π_{q is ε-essential} q^{Σ_{i: q divides di} ri} = Π_{i=1}^{m} Π_{q is ε-essential, q divides di} q^{ri} ≤ Π_{i=1}^{m} di^{ri}.

Now, since Σ_{i=1}^{m} ri = 1, the weighted arithmetic mean is at least as large as the geometric mean, and thus

Σ_{i=1}^{m} ri di ≥ Π_{i=1}^{m} di^{ri}.

Since D ≥ c(L(M, λ)) by Lemma 2, we obtain

n ≥ Σ_{i=1}^{m} di ≥ Σ_{i=1}^{m} ri di ≥ Π_{i=1}^{m} di^{ri} ≥ D^{2ε} ≥ c(L(M, λ))^{2ε}.
And the claim follows.

b) Let p1, p2, . . . be the sequence of prime numbers. We define the languages

Lk,m = { a^j | j ≡ 0 mod Π_{i=k}^{k+m−1} pi }

for k, m ≥ 1. Obviously c(Lk,m) = Π_{i=k}^{k+m−1} pi ≥ pk^m ≥ (k ln k)^m.

On the other hand Lk,m can be accepted by a PFA with isolation ε = 1/(2m) and cutpoint λ = 1 − 1/(2m) as follows. We define a "union automaton with an initial distribution" by setting up m disjoint cycles of length pk, pk+1, . . . , pk+m−1, respectively. The transition probability from one state to the next in a cycle is 1. There is exactly one final state in each cycle, and the initial distribution places probability 1/m on each final state. For every word a^z ∈ Lk,m we have z ≡ 0 (mod pi) for every k ≤ i ≤ k + m − 1, and for every word a^z ∉ Lk,m there is at least one i with z ≢ 0 (mod pi). Thus a word is either accepted with probability 1, or it can reach acceptance probability at most 1 − 1/m. Applying Fact 2, the number of states in the PFA is

nk,m = Σ_{i=k}^{k+m−1} pi ≤ 2 Σ_{i=k}^{k+m−1} i ln i ≤ 2 ∫_{k}^{k+m} x ln x dx
= 2 [ (x²/2) ln x − x²/4 ]_{x=k}^{x=k+m}
≤ (k² + 2km + m²) ln(k + m) − k² ln k
= k² ln(1 + m/k) + (2km + m²) ln(k + m).

But since k ln(1 + m/k) = ln (1 + m/k)^k ≤ ln e^m = m,

nk,m ≤ km + (2km + m²) ln(k + m) ≤ (3km + m²) ln(k + m).

Thus for any 0 ≤ α < 1, any constant m = 1/(2ε) and a sufficiently large k, we have

c(Lk,m) ≥ (k ln k)^m > ((3km + m²) ln(k + m))^{αm} ≥ nk,m^{αm},

and the claim follows.
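The construction behind part b) is easy to simulate. A sketch for the toy choice k = 2, m = 3 (primes 3, 5, 7), verifying both the cutpoint decision and the ε-isolation:

```python
# Sketch of the PFA from Theorem 1 b): m disjoint cycles of prime lengths
# p_k, ..., p_{k+m-1}, with initial probability 1/m on the final state of
# each cycle (toy instance, our own choice of parameters).
primes = [3, 5, 7]
m = len(primes)
lam, eps = 1 - 1 / (2 * m), 1 / (2 * m)

def accept_prob(z):
    # after reading a^z, cycle i sits on its final state iff z ≡ 0 (mod p_i)
    return sum(z % p == 0 for p in primes) / m

P = 3 * 5 * 7                           # the period of L_{k,m}
for z in range(2 * P):
    assert (accept_prob(z) > lam) == (z % P == 0)    # cutpoint decision
    assert abs(accept_prob(z) - lam) >= eps - 1e-12  # eps-isolation
```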
Our result shows that for a fixed isolation ε, the ultimate period of the language accepted by a PFA M with n states is only polynomial in n.
4
Approximating the Size of a Minimal NFA
Stockmeyer and Meyer [10] show that the universe problem L(N) = Σ∗ is NP-complete for regular expressions and NFA's N, even if we consider only unary languages. Since our argument is based on their construction, we show the proof.

Fact 3. [10] For a unary NFA N, it is NP-hard to decide if L(N) = {a}∗.

Proof. We reduce 3SAT to the universe problem for unary NFA's. Let Φ be a 3CNF-formula over n variables with m clauses. Let p1, . . . , pn be the first n primes and set D := Π_{i=1}^{n} pi. According to the Chinese remainder theorem, the function µ : IN0 → IN0^n with µ(x) = (x mod p1, . . . , x mod pn) is injective, if we restrict the domain to {0, . . . , D − 1}. We call x a code (for an assignment), if µ(x) ∈ {0, 1}^n. We construct a union automaton NΦ that accepts {a}∗ iff Φ is not satisfiable. We first make sure that L0,Φ = {a^k | k is not a code} is accepted. Therefore, for every prime pi (pi > 2) we construct a cycle that accepts the words a^j with j ≢ 0 (mod pi) ∧ j ≢ 1 (mod pi). So there are 2 non-final states and (pi − 2) final states in the cycle. For every clause C of Φ with variables xi1, xi2, xi3 we construct a cycle C∗ of length pi1 pi2 pi3. C∗ will accept {a^k | the assignment k mod pij for xij (j = 1, 2, 3) does not satisfy C}. Since the falsifying assignment is unique for the three variables in question, exactly one state is accepting in C∗. The construction can be done in time polynomial in the length of Φ. If there is a word a^j ∉ L(NΦ), then j is a code for a satisfying assignment. On the other hand every satisfying assignment has a code j, and a^j is not accepted by NΦ.

We set LΦ = L(NΦ) for the automaton NΦ constructed above. Observe that LΦ is a union of cyclic languages and hence itself cyclic. Obviously, if Φ ∉ 3SAT, then the minimal NFA for LΦ has size 1. We will show that for Φ ∈ 3SAT every NFA accepting LΦ must have at least Σ_{i=2}^{n} pi states, which implies Theorem 2.

Theorem 2.
Given an NFA N with n states, it is impossible to efficiently approximate nsize(L(N)) within a factor of √n / ln n, unless P = NP.

We first determine a lower bound for the period of LΦ.

Lemma 3. For any given 3CNF-formula Φ ∈ 3SAT the minimal period of LΦ is either D := Π_{i=2}^{n} pi or 2D.

Proof. LΦ is 2D-cyclic, since 2D is the least common multiple of the cycle lengths of NΦ. Assume that neither D nor 2D is the minimal period of LΦ. Then there is i ≥ 2, such that d = D/pi is a period of LΦ. We know that a^{q·pi+2} ∈ L0,Φ for every q ∈ IN, because q·pi + 2 does not represent a code. Since L0,Φ ⊆ LΦ and we assume that LΦ is d-cyclic, a^{q·pi+2+rd} belongs to LΦ for every r ∈ IN as well. On the other hand, since LΦ ≠ {a}∗, there is an a^l ∉ LΦ, and so a^{l+td} ∉ LΦ for every t ∈ IN. It is a contradiction, if we find q, r, t ∈ IN0, so that q·pi + 2 + rd = l + td, since the corresponding word has to be in LΦ because of the left-hand side of the equation and cannot be in LΦ because of the right-hand side.

∃q, r, t : q·pi + 2 + rd = l + td
⇔ ∃q, r, t : q·pi = l − 2 + (t − r)d
⇔ ∃q : q·pi ≡ l − 2 (mod d)
⇔ ∃q : q ≡ (l − 2)·pi^{−1} (mod d)

The multiplicative inverse of pi modulo d exists, since gcd(pi, d) = 1, and we have obtained the desired contradiction.

We will need a linear relation between the number of clauses and variables in the CNF-formula.

Fact 4. Let E3SAT−E5 be the satisfiability problem for formulae with exactly 3 literals in every clause and every variable appearing in exactly 5 distinct clauses; then E3SAT−E5 is NP-complete.

The following lemma determines a lower bound for the size of an NFA equivalent to NΦ, if Φ is satisfiable.

Lemma 4. Let Φ ∈ E3SAT−E5 and assume that Φ consists of m clauses. Then nsize(L(NΦ)) ≥ c·m² ln m for some constant c.

Proof. We know from Lemma 3 that L(NΦ) is either minimally D-cyclic or minimally 2D-cyclic with D = Π_{i=2}^{n} pi, where n is the number of variables in Φ. Applying Fact 1, the size of a minimal NFA accepting LΦ is at least Σ_{i=2}^{n} pi. We observe that

Σ_{i=2}^{n} pi ≥ Σ_{i=1}^{n} i ln i ≥ ∫_{1}^{n} x ln x dx ≥ (n²/4) ln n.

We have 5n = 3m and thus nsize(LΦ) ≥ c·m² ln m for some constant c.
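The reduction of Fact 3 can be sketched by deciding membership of a^j in L(NΦ) directly from the residues j mod pi; the formula below is a hypothetical example of ours, not taken from the paper:

```python
# Sketch of the reduction in Fact 3: a^j is rejected by N_Phi iff j is a
# code of a satisfying assignment (variable x_i gets the value j mod p_i).
primes = [2, 3, 5]                      # the first n primes, n = 3 variables
clauses = [(1, 2, -3), (-1, 2, 3)]      # hypothetical 3-CNF; +i / -i literals

def in_L(j):
    vals = [j % p for p in primes]
    if any(v > 1 for v in vals):
        return True                     # j is not a code: accepted via L_{0,Phi}
    assign = {i + 1: bool(v) for i, v in enumerate(vals)}
    sat = all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)
    return not sat                      # rejected iff j codes a sat. assignment

D = 2 * 3 * 5
universe = all(in_L(j) for j in range(D))
assert universe is False                # Phi is satisfiable, so L(N_Phi) != {a}*
```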
Finally we determine an upper bound for the size of the NFA NΦ.

Lemma 5. Let Φ be a 3CNF formula with m clauses and exactly 5 appearances of every variable. Then the NFA NΦ has size Θ(m⁴ (ln m)³).

Proof. The number of states in a cycle for a clause is a product of three primes. So there are at most m · pn³ = Θ(m (m ln m)³) states in all of these cycles. The cycles recognizing L0,Φ have Σ_{i=2}^{n} pi = Θ(n² ln n) states, where n is the number of variables of Φ. Since n = Θ(m), the claim follows.

Proof (of Theorem 2). Assume that the polynomial time deterministic algorithm A approximates nsize(L(N)) within the factor √s / ln s for an NFA N with s states. We show that the satisfiability problem can be decided in polynomial time. Let Φ be the given input for the E3SAT−E5 problem, where we assume that Φ has n variables and m clauses. We construct the NFA NΦ as in Fact 3. If Φ is not satisfiable, then nsize(LΦ) = 1, and according to Lemma 5 the algorithm A claims that an equivalent NFA with at most

√s / ln s = √(Θ(m⁴ (ln m)³)) / ln(Θ(m⁴ (ln m)³)) = o(m² ln m)
468
Gregor Gramlich
states exists. Since Σ_{i=2}^{n} pi = Θ(m² ln m), the claimed number of states is asymptotically smaller than nsize(LΨ) for any satisfiable formula Ψ with the same number of clauses as Φ. Hence with the help of A, we can decide if Φ is satisfiable within polynomial time.

Remark 1. For every 0 < ε ≤ 1 the same construction as in the proof of Theorem 2 can be used to show that it is not possible to approximate the size of a minimal PFA with isolation ε equivalent to a given n-state PFA with isolation c · n^{−1/4} within the factor √n / ln n.

For a given formula Φ with m clauses we construct the PFA MΦ with m cycles² and uniform initial distribution for the initial states of each cycle. We define the cutpoint as λ = 1/(2m). Hence a word is accepted by MΦ iff it is accepted by at least one cycle. Thus the cutpoint λ is δ-isolated with δ = 1/(2m) ≥ c · n^{−1/4} for some appropriate c, and MΦ behaves like a union automaton. Since L(MΦ, λ) is the same language as considered before, it is 1-cyclic if Φ is not satisfiable and has period D = Π_{i=2}^{n} pi or 2D if Φ is satisfiable. Every PFA with isolated cutpoint that accepts a language with period Π_{i=2}^{n} pi has at least Σ_{i=2}^{n} pi states [7], independent of the actual isolation.

The approximation complexity changes if a unary cyclic language is specified by a DFA M, although the decision problem, namely to decide whether there is a k-state NFA accepting the cyclic language L(M), is not efficiently solvable unless NP ⊆ DTIME(n^{O(ln n)}) [4].

Theorem 3. Given a unary cyclic DFA accepting L with D states, an NFA for L with at most nsize(L) · (1 + ln D) states can be computed in polynomial time. Observe that nsize(L) · (1 + ln D) = O(nsize(L)^{3/2} √(ln nsize(L))).

Proof. We reduce the optimization problem for a given cyclic DFA M to an instance of the weighted set cover problem. We can assume M to be a minimal cyclic D-state DFA with the set of states Q = {0, . . . , D − 1}, 0 as the initial state, and final states F ⊆ Q.
Then L(M) = {a^{j+kD} | j ∈ F, k ∈ IN0}. For every dl that divides D we construct a deterministic cycle Cl with period dl. The union automaton consisting of these cycles will accept L(M), if we choose the final states of Cl as follows: for each a^j ∈ L with 0 ≤ j < dl, we let Cl accept a^j iff a^{j+k·dl} ∈ L(M) for all 0 ≤ k < D/dl. Remember that we don't have to check a^x for x ≥ D, since L(M) is D-cyclic and dl divides D. At this stage the union automaton will have a lot of unnecessary cycles. Therefore we define an instance of the set cover problem, where we introduce a set Tl := {j | 0 ≤ j < D, a^j is accepted by Cl} of weight wl := dl for every cycle Cl. The universe is {j | 0 ≤ j < D, a^j ∈ L(M)}. The instance can be constructed in polynomial time, since the number of divisors of D is less than D, and thus the set cover problem consists of at most D sets with at most D elements. If N is a minimal NFA accepting L(M), then we know from Fact 1 that N is a union automaton (with an additional initial state) that consists of cycles²
² To check the validity of a code we can also use the clause cycles.
with periods that divide D. Every cycle C∗ of N corresponds to a set Tl, and the accepted words of C∗ up to length D − 1 are contained in Tl. So a minimal union automaton with n states can be expressed by a set cover of weight n. On the other hand, every set cover can be considered to be a union automaton. Thus a minimal set cover corresponds to a minimal NFA. The greedy algorithm for the weighted set cover problem approximates the optimal set cover within the factor H(k) = Σ_{i=1}^{k} 1/i ≤ 1 + ln k, where k is the size of the largest set [6]. For an n-state NFA N, Chrobak [1] bounds c(L(N)) by the Landau function and obtains D = O(e^√(n ln n)).
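The reduction in the proof of Theorem 3, together with the greedy set-cover step, can be sketched as follows (a toy implementation under our own naming; `final` plays the role of the set F of the DFA):

```python
def greedy_nfa_from_cyclic_dfa(D, final):
    """Theorem 3 sketch: cover {j < D : a^j in L} by divisor-cycle sets T_l,
    chosen greedily by weight/coverage ratio (weight w_l = d_l)."""
    divisors = [d for d in range(1, D + 1) if D % d == 0]
    sets = {}
    for d in divisors:
        # C_d accepts a^j (j < d) iff a^{j+k*d} is in L for all k < D/d
        residues = {j for j in range(d)
                    if all((j + k * d) in final for k in range(D // d))}
        sets[d] = {j for j in range(D) if j % d in residues}
    cover, uncovered = [], set(final)
    while uncovered:
        d = min((d for d in divisors if sets[d] & uncovered),
                key=lambda d: d / len(sets[d] & uncovered))
        cover.append(d)
        uncovered -= sets[d]
    return cover                        # cycle lengths of the constructed NFA

# L = {a^j : j ≡ 0 (mod 2) or j ≡ 0 (mod 3)}, given as a 6-state cyclic DFA
assert sorted(greedy_nfa_from_cyclic_dfa(6, {0, 2, 3, 4})) == [2, 3]
```

The cycle of length D always covers the whole universe, so the greedy loop terminates; the returned lengths give an NFA of size within the H(k) ≤ 1 + ln D factor of the optimum.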
5
Conclusions and Open Problems
In Theorem 1 we have shown that PFA's with constant isolation lead to only polynomially smaller automata in comparison to cyclic unary DFA's. It is not hard to observe that PFA's with constant isolation are negatively exponentially smaller than DFA's for non-cyclic unary languages. The size relation between minimal PFA's and minimal DFA's for non-cyclic unary languages is to be further explored.

The hardness result of Theorem 2 for minimizing unary NFA's is tight within a square, since size √n / ln n is excluded for a given NFA of size n. Is Theorem 2 "essentially" optimal?

Jiang and Ravikumar [5] state the open problem of approximating a minimal NFA given a DFA: specifically, to determine the complexity of designing an NFA accepting L(M) with at most nsize(L(M))^k states for a given DFA M and a given k. We have answered the question for the case of unary cyclic DFA's and k > 3/2 in Theorem 3.
References

1. Chrobak, M.: Finite automata and unary languages, Theoretical Computer Science 47, 1986, pp. 149–158.
2. Gantmacher, F.R.: Theory of Matrices, Vol. II, Chelsea, New York, 1959.
3. Graham, R., Knuth, D., Patashnik, O.: Concrete Mathematics, Addison Wesley, Reading, Massachusetts, 1989.
4. Jiang, T., McDowell, E., Ravikumar, B.: The structure and complexity of minimal NFA's over a unary alphabet, Int. J. Found. of Comp. Sci. 2, 1991, pp. 163–182.
5. Jiang, T., Ravikumar, B.: Minimal NFA problems are hard, SIAM Journal on Computing 22 (6), 1993, pp. 1117–1141.
6. Hochbaum, D. (editor): Approximation algorithms for NP-hard problems, PWS Publishing Company, Boston, 1997.
7. Mereghetti, C., Palano, B., Pighizzini, G.: On the succinctness of deterministic, nondeterministic, probabilistic and quantum finite automata, DCAGRS 2001.
8. Milani, M., Pighizzini, G.: Tight bounds on the simulation of unary probabilistic automata by deterministic automata, DCAGRS 2000.
9. Rabin, M.: Probabilistic automata, Information and Control, 1963, pp. 230–245.
10. Stockmeyer, L., Meyer, A.: Word Problems Requiring Exponential Time, Proc. of the 5th Ann. ACM Symposium on Theory of Computing, New York, 1973, pp. 1–9.
On Matroid Properties Definable in the MSO Logic

Petr Hliněný

Institute of Mathematics and Comp. Science (MÚ SAV), Matej Bel University and Slovak Academy of Sciences, Severná ul. 5, 974 00 Banská Bystrica, Slovakia
[email protected]
Abstract. It has been proved by the author that all matroid properties definable in the monadic second-order (MSO) logic can be recognized in polynomial time for matroids of bounded branch-width which are represented by matrices over finite fields. (This result extends the so-called "MS2-theorem" of graphs by Courcelle and others.) In this work we review the MSO theory of finite matroids and show some interesting matroid properties which are MSO-definable. In particular, all minor-closed properties are recognizable in such a way.

Keywords: matroid, branch-width, MSO logic, parametrized complexity.
1
Introduction
The theory of parametrized complexity provides a background for the analysis of difficult algorithmic problems which is finer than classical complexity theory. We postpone formal definitions till Section 3. Briefly speaking, a problem is called "fixed-parameter tractable" if there is an algorithm whose running time has the (possible) super-polynomial part separated in terms of some natural "parameter", which is supposed to be small even for large inputs in practice. (Successful practical applications of this concept are known, for example, in computational biology or in database theory.) We are interested in algorithmic problems that are parametrized by a "tree-like" structure of the input objects. Graph "branch-width" is closely related to the well-known tree-width [13], but a branch decomposition does not refer to vertices, and so branch-width directly generalizes from graphs to matroids. It follows from the works of Courcelle [2] and Bodlaender [1] that all graph problems definable in the monadic second-order logic can be solved in linear time for graphs of bounded tree-width. Those include many notoriously hard problems like 3-colouring, Hamiltonicity, etc.
Parts of this research have been done during author’s stay at the Victoria University of Wellington in New Zealand. From August 2003 also Department of Computer Science, Technical University Ostrava, Czech Republic.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 470–479, 2003. c Springer-Verlag Berlin Heidelberg 2003
We study and present analogous results for matroids representable over finite fields. The motivation of our research is mainly theoretical — to show how the mentioned complexity phenomenon extends from graphs to a much larger class of combinatorial objects, and to stimulate further research interest in matroid branch-width and the complexity of matroid problems. (Unfortunately, wide generality of our approach leads to impractically huge constants involved in the algorithms, such as in Theorem 4.1.) Since not all computer scientists are familiar with structural matroid theory or with parametrized complexity, we give a basic overview of necessary concepts in the next two sections.
2
Matroids and Branch-Width
We refer to Oxley [12] for matroid terminology. A matroid is a pair M = (E, B) where E = E(M) is the ground set of M (elements of M), and B ⊆ 2^E is a nonempty collection of bases of M. Moreover, matroid bases satisfy the "exchange axiom": if B1, B2 ∈ B and x ∈ B1 − B2, then there is y ∈ B2 − B1 such that (B1 − {x}) ∪ {y} ∈ B. We consider only finite matroids. Subsets of bases are called independent sets, and the remaining sets are dependent. Minimal dependent sets are called circuits. All bases have the same cardinality, called the rank r(M) of the matroid. The rank function rM : 2^E → N of M tells the maximal cardinality rM(X) of an independent subset of a set X ⊆ E(M). If G is a graph, then its cycle matroid on the ground set E(G) is denoted by M(G). The bases of M(G) are the (maximal) spanning forests of G, and the circuits of M(G) are the cycles of G. Another example of a matroid is a finite set of vectors with usual linear dependency. If A is a matrix, then the matroid formed by the column vectors of A is called the vector matroid of A, and denoted by M(A). The matrix A is a representation of a matroid M ≃ M(A). We say that the matroid M(A) is F-represented if A is a matrix over a field F. The dual matroid M∗ of M is defined on the same ground set E, and the bases of M∗ are the set-complements of the bases of M. The dual rank function satisfies rM∗(X) = |X| − r(M) + rM(E − X). A set X is coindependent in M if it is independent in M∗. An element e of M is called a loop (a coloop), if {e} is dependent in M (in M∗). The matroid M \ e obtained by deleting a non-coloop element e is defined as (E − {e}, B−) where B− = {B : B ∈ B, e ∉ B}. The matroid M/e obtained by contracting a non-loop element e is defined using duality M/e = (M∗ \ e)∗. (This corresponds to contracting an edge in a graph.) A minor of a matroid is obtained by a sequence of deletions and contractions of elements.
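The vector-matroid notions above can be made concrete with a tiny GF(2) independence test (our own sketch, not from the paper; `gf2_rank` is an ad-hoc helper doing bit-packed Gaussian elimination):

```python
def gf2_rank(vectors):
    """Rank over GF(2) of 0/1 vectors, via bit-packed elimination."""
    basis = []                          # bitmasks with distinct leading bits
    for v in vectors:
        x = int("".join(map(str, v)), 2)
        for b in sorted(basis, reverse=True):
            if x ^ b < x:               # b's leading bit is set in x: reduce
                x ^= b
        if x:
            basis.append(x)
    return len(basis)

def indep(cols, X):
    """Is the subset X of column indices independent in M(A)?"""
    return gf2_rank([cols[i] for i in X]) == len(X)

# Columns of a matrix A over GF(2): the elements of the vector matroid M(A)
cols = [(1, 0), (0, 1), (1, 1)]
assert indep(cols, [0, 1]) and indep(cols, [1, 2])
assert not indep(cols, [0, 1, 2])      # the three columns form a circuit
```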
Since these operations naturally commute, a minor M′ of a matroid M can be uniquely expressed as M′ = M \ D/C, where D are the coindependent deleted elements and C are the independent contracted elements. A matroid family M is minor-closed if M ∈ M implies that all minors of M are in M. A matroid N is called an excluded minor (also known as "forbidden") for a minor-closed family M if N ∉ M but N′ ∈ M for all proper minors N′ of N. The connectivity function λM of a matroid M is defined for all subsets A ⊆ E = E(M) by λM(A) = rM(A) + rM(E − A) − r(M) + 1. Notice that λM(A) =
Fig. 1. Two examples of width-3 branch decompositions of the Pappus matroid (top left, rank 3) and of the binary affine cube (bottom left, rank 4). Here the lines depict linear dependencies between matroid elements.
λM(E − A). It is also routine to verify that λM(A) = λM∗(A), i.e. matroid connectivity is dual-invariant. A subset A ⊆ E is k-separating if λM(A) ≤ k. A partition (A, E − A) is called a k-separation if A is k-separating and both |A|, |E − A| ≥ k. For n > 1, the matroid M is called n-connected if it has no k-separation for k = 1, 2, . . . , n − 1, and |E(M)| ≥ 2n − 2. (A connected matroid corresponds to a vertex 2-connected graph. The geometric interpretation of a k-separation (A, B) is that the spans of A and of B intersect in a subspace of rank less than k.) Let ℓ(T) denote the set of leaves of a tree T. A branch decomposition of a matroid M is a pair (T, τ) where T is a tree of maximal degree three, and τ is a bijection of E(M) onto ℓ(T). Let f be an edge of T, and T1, T2 be the connected components of T − f. The width of the edge f in T is λM(A) = λM(B), where A = τ^{−1}(ℓ(T1)) and B = τ^{−1}(ℓ(T2)). The width of the branch decomposition (T, τ) is the maximum of the widths of all edges of T, and the branch-width of M is the minimal width over all branch decompositions of M. If T has no edge, then we take its width as 0. An example of a branch decomposition is presented in Fig. 1. Notice that matroid branch-width is invariant under duality. It is straightforward to verify that branch-width does not increase when taking minors: Let (T, τ) be a branch decomposition of a matroid M. Say, up to duality, that M′ = M \ e. We form T′ from T by deleting the leaf τ(e), and set τ′ to be τ restricted to E(M′). Then, for any partition (A, B) of E(M) given by an edge f in T, we have the obvious λM′(A − {e}) ≤ λM(A), and so the width of (T′, τ′) is not bigger than the width of (T, τ) for M.
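The width of a single branch-decomposition edge can be computed directly from the definition λM(A) = rM(A) + rM(E − A) − r(M) + 1. A sketch for F = GF(2) (helper names are ours; the rank routine is a bit-packed GF(2) elimination):

```python
def gf2_rank(vectors):
    basis = []
    for v in vectors:
        x = int("".join(map(str, v)), 2)
        for b in sorted(basis, reverse=True):
            if x ^ b < x:
                x ^= b
        if x:
            basis.append(x)
    return len(basis)

def width(cols, A_part):
    """lambda_M(A) = r_M(A) + r_M(E-A) - r(M) + 1 for the vector matroid M."""
    E = list(range(len(cols)))
    B_part = [i for i in E if i not in A_part]
    r = lambda S: gf2_rank([cols[i] for i in S])
    return r(A_part) + r(B_part) - r(E) + 1

# Cycle matroid of a triangle, represented over GF(2) by its edge vectors
cols = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
assert width(cols, [0]) == 2            # 1 + 2 - 2 + 1
assert width(cols, [0, 1, 2]) == 1      # the trivial partition
```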
We remark that branch-width of a graph G is defined analogously, using the connectivity function λG where λG(F) for F ⊆ E(G) is the number of vertices incident both with F and with E(G) − F. Clearly, branch-width of a graph G is never smaller than branch-width of its cycle matroid M(G). It is still an open conjecture that these numbers are actually equal. On the other hand, branch-width is within a constant factor of tree-width in graphs [13].

Lastly in this section we mention a few words about the relations of matroid theory to computer science. As the reader surely knows, a greedy algorithm on a matroid is one of the basic tools in combinatorial optimization. That is why matroids naturally arise in a number of optimization problems, such as the minimum spanning tree or job assignment problems. More involved applications of matroids in combinatorial optimization can be found in numerous works of Edmonds, Cunningham and others. Besides that, the concept of branch-width has attracted increasing attention among matroid theorists recently, and several deep results of Robertson-Seymour's graph minor theory have been extended from graphs to matroids representable over finite fields; such as [6]. Robertson-Seymour's theory has been followed by many interesting algorithmic applications on graphs (mostly related to tree-width or branch-width). Therefore we think it is the right time now to look at complexity aspects of branch-width in matroid problems. For example, we have given a straightforward polynomial algorithm for the computation of the Tutte polynomial [10] on a representable matroid of bounded branch-width. (It seems that matroids present a more suitable model than graphs for computing the Tutte polynomial on structures of bounded tree-/branch-width.) As yet another motivation we remark that linear codes over a finite field F are in a direct correspondence with F-represented matroids.
3
Parametrized Complexity
When speaking about parametrized complexity, we closely follow Downey and Fellows [4]. Here we present the basic definition of parametrized tractability. For simplicity, we restrict the definition to decision problems, although an extension to computation problems is straightforward. Let Σ be the input alphabet. A parametrized problem is an arbitrary subset Ap ⊆ Σ∗ × N. For an instance (x, k) ∈ Ap, we call k the parameter and x the input for the problem. (The parameter is sometimes implicit in the context.) We say that a parametrized problem Ap is (nonuniformly) fixed-parameter tractable if there is a sequence of algorithms {Ai : i ∈ N} and a constant c, such that (x, k) ∈ Ap iff the algorithm Ak accepts (x, k), and the running time of Ak on (x, k) is O(|x|^c) for each k. Similarly, a parametrized problem Ap is uniformly fixed-parameter tractable if there is an algorithm A, a constant c, and an arbitrary function f : N → N, such that (x, k) ∈ Ap iff the algorithm A accepts (x, k), and the running time of A on (x, k) is O(f(k) · |x|^c). There is a natural correspondence of a parametrized problem Ap to an ordinary problem A = {⟨x, k⟩ : (x, k) ∈ Ap} (for example, the problem of a
k-vertex cover in a graph), or to a problem A′ = {x : ∃k (x, k) ∈ Ap} if k is not "directly involved" in the question (such as a Hamiltonian cycle in a graph of tree-width k). On the other hand, an ordinary problem may have several natural parametrized versions respecting different parameters. We remark that the parameter is formally a natural number, but that may encode arbitrary finite structures in a standard way. As we have already noted above, our interest is in parametrized problems where the parameter is branch-width (tree-width). Inspired by the algorithm of Bodlaender [1], we have shown that branch-width of matroids represented over finite fields is fixed-parameter tractable, and that, moreover, we can efficiently construct a branch decomposition. Let Bt denote the class of all matroids of branch-width at most t. We have proved the following:

Theorem 3.1. (PH [9]) Let t ≥ 1 be fixed, and let F be a finite field. Suppose that A is an r × n matrix over F (r ≤ n) such that the represented matroid M(A) ∈ Bt. Then there is an algorithm that finds a branch decomposition of the matroid M(A) of width at most 3t in time O(n³).

Actually, our algorithm directly constructs a so called "parse tree" for the mentioned branch decomposition. Unfortunately, the algorithm in Theorem 3.1 does not necessarily produce the optimal branch decomposition. On the other hand, there are finitely many excluded minors for the class Bk for each k, and these excluded minors can be constructed algorithmically since they have size at most (6^{k+1} − 1)/5 by [5]. Hence, in this particular case, we can extend the idea in Theorem 5.2 to show:

Corollary 3.2. Let F be a finite field. Suppose that A is a given matrix over F. Then branch-width of the matroid M(A) is uniformly fixed-parameter tractable.
4
MSO Logic of Matroids
The monadic second-order (MSO) theory of matroids uses a language based on the monadic second-order logic. The syntax includes variables for matroid elements and element sets, the quantifiers ∀, ∃ applicable to these variables, the logical connectives ∧, ∨, ¬, and the following predicates:
1. =, the equality for elements and their sets,
2. e ∈ F, where e is an element variable and F is an element set variable,
3. indep(F), where F is an element set variable, and the predicate tells whether F is independent in the matroid.
Moreover, we write φ → ψ to stand for ¬φ ∨ ψ, and X ⊆ Y for ∀x(x ∉ X ∨ x ∈ Y). Notice that the "universe" of a formula (the model in logic terms) in the above theory is one particular matroid. To give a better feeling for the MSO theory of matroids, we provide a few simple predicates now. We write

basis(B) ≡ indep(B) ∧ ∀e (e ∈ B ∨ ¬indep(B ∪ {e})),

where indep(B ∪ {e}) is a shortcut for the obvious

∃X (indep(X) ∧ e ∈ X ∧ B ⊆ X ∧ ∀x(x = e ∨ x ∈ B ∨ x ∉ X)).

Similarly,
On Matroid Properties Definable in the MSO Logic
475
we write a predicate

circuit(C) ≡ ¬indep(C) ∧ ∀e (e ∈ C → indep(C − {e}))

where indep(C − {e}) is a shortcut for

∃X (indep(X) ∧ e ∉ X ∧ X ⊆ C ∧ ∀x(x = e ∨ x ∉ C ∨ x ∈ X)).

Let us now look at the (graph) property of being Hamiltonian. In matroid language, that means to have a circuit containing a basis. So we may write a sentence

hamilton ≡ ∃C (circuit(C) ∧ ∃e basis(C − {e})).

A related matroidal property is to be a paving matroid M, i.e., to have all circuits C in M of size |C| ≥ r(M). Let us explain this sample property in detail. Since C − {e} is independent for each e ∈ C by definition of a circuit, we have |C| ≤ r(M) + 1 for any circuit C in M. Considering a basis B ⊇ C − {e} and the inequality |C| ≥ r(M) = |B| valid in a paving matroid, we conclude that there is an element f such that B ⊆ C ∪ {f}. The converse also holds. Hence we express

paving ≡ ∀C (circuit(C) → ∃f, B (B ⊆ C ∪ {f} ∧ basis(B))).

The reason why we are looking for properties definable in the MSO logic of matroids is that such properties can be recognized in polynomial time for matroids of bounded branch-width over finite fields. The following result is based on a finite-state recognizability of matroidal MSO properties, proved by the author in [8], and on Theorem 3.1.

Theorem 4.1. (PH [7,8,9]) Let F be a finite field. Assume that M is a class of matroids defined in one of the following ways: (a) there is an MSO sentence φ such that M ∈ M iff φ is true on M, or (b) there is a sequence of MSO sentences {φk : k = 1, 2, . . .} and, for all k ≥ 1 and matroids M ∈ Bk, we have M ∈ M iff φk is true on M. Suppose that A is an n-column matrix over F such that M(A) ∈ Bt where t ≥ 1 is fixed. Then there is an algorithm deciding whether M(A) ∈ M in time O(n³), and this algorithm can be constructed from the given sentence(s) φ or φt for all t.

Remark.
In the language of parametrized complexity, Theorem 4.1 says that the class of F-represented matroids defined by MSO sentences φ or φt is fixed-parameter tractable with respect to the combined parameter ⟨F, t⟩. Moreover, in the case (a), or in the case (b) when the sentences φk are constructible by an algorithm, the class M is uniformly fixed-parameter tractable. So it follows that the properties of being Hamiltonian or a paving matroid can be efficiently recognized on F-represented matroids of bounded branch-width. Other simple matroidal properties definable in the MSO logic are, for example, the properties of being identically self-dual, or being a “free spike” [11]. Moreover, all properties definable in the extended MSO theory of graphs (MS2) are also MSO-definable over graphic matroids [8]. Several more interesting classical matroid properties are shown to be MSO-definable in the next sections.
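As a concrete aside (our own illustration, not part of the paper), the predicates of this section can be evaluated by brute force on a small matroid given as an independence oracle; all names and the example matroid U2,4 below are assumptions made for the sketch:

```python
from itertools import combinations

# Example matroid as an independence oracle: the uniform matroid U_{2,4}
# (4 elements, a set is independent iff it has at most 2 elements).
E = frozenset(range(4))
def indep(X):
    return len(X) <= 2

subsets = [frozenset(S) for r in range(len(E) + 1) for S in combinations(E, r)]
rank = max(len(S) for S in subsets if indep(S))       # r(M) = 2 here

def basis(B):
    # basis(B) = indep(B) and no outside element keeps B independent
    return indep(B) and all(not indep(B | {e}) for e in E - B)

def circuit(C):
    # circuit(C) = C is dependent but C - {e} is independent for every e in C
    return not indep(C) and all(indep(C - {e}) for e in C)

circuits = [C for C in subsets if circuit(C)]

# hamilton: some circuit C contains a basis C - {e}
hamilton = any(any(basis(C - {e}) for e in C) for C in circuits)
# paving: every circuit C has size at least r(M)
paving = all(len(C) >= rank for C in circuits)

print(len(circuits), hamilton, paving)  # 4 True True
```

Exhaustive search is of course exponential; the point of Theorem 4.1 is that on represented matroids of bounded branch-width the same questions are answered in time O(n³).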
5
Minor-Closed Properties
It is easy to see that the class of F-representable matroids is minor-closed, and so is the class Bt of matroids of branch-width at most t. We say that a set S is well-quasi-ordered (WQO) if there are neither infinite antichains nor infinite strictly
476
Petr Hliněný
descending chains in S. By a deep result of [6], matroids of bounded branch-width which are representable over a fixed finite field F are WQO in the minor order. (However, unlike graphs, matroids are not WQO in general.) So it follows that any minor-closed matroid family M has a finite number of F-representable excluded minors in Bt. We now show that the presence of one particular minor can be described by an MSO sentence.

Lemma 5.1. Let N be a matroid. There is a (computable) MSO sentence ψN such that ψN is true on a matroid M if and only if M has an N-minor.

Proof. N is a minor of M if and only if there are two sets C, D such that C is independent and D is coindependent in M, and N = M \ D/C. Suppose that N = M \ D/C holds. Then a set X ⊆ E(N) is dependent in N if and only if there is a dependent set Y ⊆ E(M) in M such that Y − X ⊆ C. (This simple claim may be more obvious when viewed over the dual matroid M* — a set is dependent in M iff it intersects each basis of M*, and N* = M*/D \ C.) Since N is fixed, we may identify the elements of the (supposed) N-minor in M by variables x1, . . . , xn in order, where n = |E(N)|. Then, knowing the contract set C (and implicit D), we are able to say which subsets of {x1, . . . , xn} are dependent in M \ D/C. For each J ⊆ [1, n], we write

mdep(xj : j ∈ J; C) ≡ ∃Y (¬indep(Y) ∧ ∀y (y ∉ Y ∨ y ∈ C ∨ ⋁_{j∈J} y = xj)).
Now, M \ D/C is isomorphic to N iff the dependent subsets of {x1 , . . . , xn } exactly match the dependent sets of N . Hence we express ψN as
ψN ≡ ∃C ∃x1, . . . , xn ( ⋀_{J∈J+} ¬mdep(xj : j ∈ J; C) ∧ ⋀_{J∈J−} mdep(xj : j ∈ J; C) ),
where J+ is the set of all J ⊆ [1, n] such that {xj : j ∈ J} actually is independent in N, and where J− is the complement of J+. □

Hence, in connection with Theorem 4.1, we conclude:

Theorem 5.2. Let t ≥ 1 be fixed, let F be a finite field, and let M be a minor-closed family. Given a matrix A over F with n columns such that M(A) ∈ Bt, one can decide whether the matroid M(A) belongs to M in time O(n³).

Proof. As already noted above, the family M has a finite number of F-representable excluded minors X1, . . . , Xp ∈ Bt. Keeping in mind that all minors of M(A) also belong to Bt, we see that M(A) ∈ M iff M(A) has no minors isomorphic to X1, . . . , Xp. (For formal completeness, we may verify M(A) ∈ Bt using Corollary 3.2.) We write φt ≡ ¬ψX1 ∧ . . . ∧ ¬ψXp using Lemma 5.1. Finally, we apply Theorem 4.1(b). □

Applications of this theorem include determining the exact branch-width (cf. Section 3) or tree-width of a matroid, or deciding matroid orientability and representability over another field.
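The N-minor test of Lemma 5.1 can likewise be prototyped by exhaustive search over the contract set C and the delete set D. A rough sketch (hypothetical names; tiny uniform matroids as examples, all our own):

```python
from itertools import combinations, permutations

def subsets(E):
    return [frozenset(S) for r in range(len(E) + 1) for S in combinations(E, r)]

def has_minor(E_M, indep_M, E_N, indep_N):
    # Try all contract sets C (independent) and delete sets D; D is not
    # forced to be coindependent here, which is harmless for existence.
    E_M, E_N = frozenset(E_M), sorted(E_N)
    for C in subsets(E_M):
        if not indep_M(C):
            continue
        for D in subsets(E_M - C):
            F = E_M - C - D
            if len(F) != len(E_N):
                continue
            # independence in M \ D / C: X is independent iff X ∪ C is in M
            def minor_indep(X):
                return indep_M(frozenset(X) | C)
            # try every identification of E(N) with F (fine for tiny matroids)
            for img in permutations(sorted(F)):
                to_M = dict(zip(E_N, img))
                if all(indep_N(X) == minor_indep({to_M[x] for x in X})
                       for X in subsets(frozenset(E_N))):
                    return True
    return False

indep2 = lambda X: len(X) <= 2
print(has_minor(range(5), indep2, range(4), indep2))  # True: U_{2,5} has a U_{2,4}-minor
print(has_minor(range(4), indep2, range(5), indep2))  # False: too few elements
```

The MSO sentence ψN replaces this doubly exponential search by a fixed formula that the algorithm of Theorem 4.1 can evaluate efficiently on bounded branch-width.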
Remark. Unfortunately, the proof of Theorem 5.2 is non-constructive: there is in general no way to compute the excluded minors X1, . . . , Xp, or even their number or size. So we cannot speak about uniform fixed-parameter tractability here.
6
Matroid Connectivity
Another interesting task is to describe matroid connectivity in the MSO logic. That can be done quite easily.

Lemma 6.1. Let M be a matroid on the ground set E, and let k ≥ 1. There is an MSO formula σk(X) which is true for X ⊆ E if and only if λM(X) ≥ k + 1.

Proof. By definition, λM(X) ≥ k + 1 iff rM(X) + rM(E − X) ≥ r(M) + k. Using standard matroidal arguments, this is equivalent to stating that there exist two bases B1, B2 of M such that B2 ∩ X ⊆ B1 and |(B1 − B2) ∩ X| ≥ k. We may formalize this statement as
σk(X) ≡ ∃B1, B2 ( basis(B1) ∧ basis(B2) ∧ ∀x ((x ∈ B2 ∧ x ∈ X) → x ∈ B1) ∧
∃z1, . . . , zk ( ⋀_{i≠j} zi ≠ zj ∧ ⋀_i zi ∈ X ∧ ⋀_i zi ∈ B1 ∧ ⋀_i zi ∉ B2 ) ). □
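For a quick sanity check of Lemma 6.1 on a small example, the connectivity function λM can be computed from a rank oracle; the helper names and the example matroid U2,4 are our own:

```python
from itertools import combinations

def make_rank(E, indep):
    # rank of a subset = size of a largest independent subset (brute force)
    def r(X):
        X = frozenset(X)
        return max(len(S) for k in range(len(X) + 1)
                   for S in combinations(X, k) if indep(frozenset(S)))
    return r

E = frozenset(range(4))
indep = lambda X: len(X) <= 2          # the uniform matroid U_{2,4} again
r = make_rank(E, indep)

def lam(X):
    # the connectivity function used in Lemma 6.1
    X = frozenset(X)
    return r(X) + r(E - X) - r(E) + 1

print(lam({0, 1}), lam(set()))  # 3 1
```

Here λ({0, 1}) = 2 + 2 − 2 + 1 = 3, so σ2({0, 1}) holds in U2,4, matching the lemma.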
So we may finish this section with the next immediate result: Corollary 6.2. For each n > 1, there is an MSO sentence κn which is true on a matroid M if and only if M is n-connected.
7
Transversal Matroids
A matroid M is transversal if there is a bipartite graph G with vertex parts V = E(M) and W, such that the rank of any set X in M equals the largest size of a matching incident with X in G. (Equivalently, a transversal matroid is a union of rank-1 matroids.) We consider transversal matroids here mainly because they have a long history of research, but not much is known about their relation to branch-width. Two elements e, f in a matroid M are parallel if {e, f} forms a circuit, and e, f are in series if e, f are parallel in the dual M*. A series minor of a matroid M is obtained by a sequence of contractions of series elements and arbitrary deletions of elements in M. A matroid having a representation over GF(2) is called a binary matroid. The trouble with transversal matroids is that they are not closed under taking minors or duals. However, series minors of transversal matroids are transversal again. We cannot use a “series” analogue of Theorem 5.2 since there is no well-quasi-ordering property of series minors, even at bounded branch-width. Still, we can say a bit:
Theorem 7.1. There is an MSO sentence τ which is true on a matroid M if and only if M is a binary transversal matroid.

Sketch of proof. Let Ck² denote the graph obtained from a cycle Ck of length k by adding one parallel edge to each edge of Ck. According to [3], the following is true: A matroid M is both binary and transversal if and only if M has no series minor isomorphic to either the 4-element line U2,4, or the graphic matroids M(K4) or M(Ck²) for k ≥ 3. Let N = M \ D/C be a minor of M, and let F = E(N). It is straightforward to express that N is a series minor of M, i.e. that C consists of series elements of M \ D. (For simplicity, we assume no coloops.) We write

∀x ∈ C ∃y ∈ F ∀Z ((Z ⊆ F ∪ C ∧ basis(Z)) → (x ∈ Z ∨ y ∈ Z)).

Now let P be a matroid. We may express whether P is isomorphic to M(Ck²) (regardless of the value of k) as follows:

∃Z (circuit(Z) ∧ ∀x ∈ Z ∃y ∉ Z circuit(x, y) ∧ ∀y ∉ Z ∃!x (x ∈ Z ∧ circuit(x, y))),

where ∃!x Π(x) is a shortcut for ∃x (Π(x) ∧ ∀x, x′ (x = x′ ∨ ¬Π(x) ∨ ¬Π(x′))). The rest of the proof proceeds by combining the previous formulas with the ideas in the proof of Lemma 5.1. (Considering matroid P as a minor of M, we use the predicate mdep from that proof to express circuit in the above formula.) We leave the technical details to the reader. □

Since the proof of Theorem 7.1 is very specific to binary matroids, we doubt that it could be extended to all matroids. Thus we ask:

Problem 7.2. Is the property of being a transversal matroid MSO-definable?
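For completeness, the defining rank function of a transversal matroid can be evaluated directly by bipartite matching; a minimal augmenting-path sketch (illustrative only; the bipartite graph and all names are our own assumptions):

```python
def transversal_rank(adj, X):
    # adj maps each element of E(M) to its neighbours in W; the rank of X is
    # the size of a maximum matching using elements of X only
    match = {}                              # W-vertex -> matched element
    def augment(v, seen):
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                if w not in match or augment(match[w], seen):
                    match[w] = v
                    return True
        return False
    return sum(augment(v, set()) for v in X)

# elements a, b, c; a and b see only w1, while c sees w1 and w2
adj = {"a": {"w1"}, "b": {"w1"}, "c": {"w1", "w2"}}
print(transversal_rank(adj, {"a", "b", "c"}))  # 2
print(transversal_rank(adj, {"a", "b"}))       # 1
```

In this example a and b are parallel (the pair {a, b} is a circuit of rank 1), which is exactly the kind of local structure the formulas in the proof of Theorem 7.1 speak about.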
Acknowledgement

I would like to thank Prof. Geoff Whittle from Victoria University for introducing me to the beauties of structural matroid theory, and Prof. Rod Downey for pointing my research towards the parametrized complexity of matroid problems. Moreover, I am grateful to the NZ Marsden Fund and the Victoria University of Wellington for supporting my stay in New Zealand.
References

1. H.L. Bodlaender, A Linear Time Algorithm for Finding Tree-Decompositions of Small Treewidth, SIAM J. Computing 25 (1996), 1305–1317.
2. B. Courcelle, The Monadic Second-Order Logic of Graphs I. Recognizable Sets of Finite Graphs, Information and Computation 85 (1990), 12–75.
3. J. de Sousa, D.J.A. Welsh, A Characterisation of Binary Transversal Matroids, J. Math. Anal. Appl. 40 (1972), 55–59.
4. R.G. Downey, M.R. Fellows, Parametrized Complexity, Springer-Verlag, 1999.
5. J.F. Geelen, A.H.M. Gerards, N. Robertson, G.P. Whittle, On the Excluded Minors for the Matroids of Branch-Width k, J. Combin. Theory Ser. B, to appear (2003).
6. J.F. Geelen, A.H.M. Gerards, G.P. Whittle, Branch-Width and Well-Quasi-Ordering in Matroids and Graphs, J. Combin. Theory Ser. B 84 (2002), 270–290.
7. P. Hliněný, Branch-Width, Parse Trees, and Monadic Second-Order Logic for Matroids (Extended Abstract), In: STACS 2003, Lecture Notes in Computer Science 2607, Springer-Verlag (2003), 319–330.
8. P. Hliněný, Branch-Width, Parse Trees, and Monadic Second-Order Logic for Matroids, submitted, 2002.
9. P. Hliněný, A Parametrized Algorithm for Matroid Branch-Width, submitted, 2002.
10. P. Hliněný, The Tutte Polynomial for Matroids of Bounded Branch-Width, submitted, 2002.
11. P. Hliněný, It is Hard to Recognize Free Spikes, submitted, 2002.
12. J.G. Oxley, Matroid Theory, Oxford University Press, 1992, 1997.
13. N. Robertson, P.D. Seymour, Graph Minors X. Obstructions to Tree-Decomposition, J. Combin. Theory Ser. B 52 (1991), 153–190.
Characterizations of Catalytic Membrane Computing Systems (Extended Abstract)

Oscar H. Ibarra¹, Zhe Dang², Omer Egecioglu¹, and Gaurav Saxena¹

¹ Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. [email protected], Fax: 805-893-8553
² School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA
Abstract. We look at 1-region membrane computing systems which only use rules of the form Ca → Cv, where C is a catalyst, a is a noncatalyst, and v is a (possibly null) string of noncatalysts. There are no rules of the form a → v. Thus, we can think of these systems as “purely” catalytic. We consider two types: (1) when the initial configuration contains only one catalyst, and (2) when the initial configuration contains multiple (not necessarily distinct) catalysts. We show that systems of the first type are equivalent to communication-free Petri nets, which are also equivalent to commutative context-free grammars. They define precisely the semilinear sets. This partially answers an open question in [19]. Systems of the second type define exactly the recursively enumerable sets of tuples (i.e., Turing machine computable). We also study an extended model where the rules are of the form q : (p, Ca → Cv) (where q and p are states), i.e., the application of the rules is guided by a finite-state control. For this generalized model, type (1) as well as type (2) with some restriction correspond to vector addition systems. Keywords: membrane computing, catalytic system, semilinear set, vector addition system, reachability problem.
1
Introduction
In recent years, there has been a burst of research in the area of membrane computing [16], which identifies an unconventional computing model (namely a P system) from natural phenomena of cell evolutions and chemical reactions [2]. Due to the built-in nature of maximal parallelism inherent in the model, P systems have a great potential for implementing massively concurrent systems in an efficient way, once future biotechnology (or silicon technology) gives way to a practical bio-realization (or a chip-realization). In this sense, it is important to study the computing power of the model.
This research was supported in part by NSF Grants IIS-0101134 and CCR02-08595.
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 480–489, 2003.
© Springer-Verlag Berlin Heidelberg 2003
Two fundamental questions one can ask of any computing device (such as a Turing machine) are: (1) What kinds of restrictions/variations can be placed on the device without reducing its computing power? (2) What kinds of restrictions/variations can be placed on the device which will reduce its computing power? For Turing machines, the answer to (1) is that Turing machines (as well as variations like multitape, nondeterministic, etc.) accept exactly the recursively enumerable (r.e.) languages. For (2), there is a wide spectrum of well-known results concerning various sub-Turing computing models that have been introduced during the past half century: to list a few, there are finite automata, pushdown automata, linearly bounded automata, various restricted counter automata, etc. Undoubtedly, these sub-Turing models have enhanced our understanding of the computing power of Turing machines and have provided important insights into the analysis and complexity of many problems in various areas of computer science.

We believe that studying the computing power of P systems would lend itself to the discovery of new results if a similar methodology is followed. Indeed, much research work has shown that P systems and their many variants are universal (i.e., equivalent to Turing machines) [4,16,17,3,6,8,19] (surveys are found in [12,18]). However, there is little work addressing the sub-Turing computing power of restricted P systems. To this end, we present some new results in this paper, specifically focusing on catalytic P systems.

A P system S consists of a finite number of membranes, each of which contains a multiset of objects (symbols). The membranes are organized as a Venn diagram or a tree structure where one membrane may contain zero or more membranes. The dynamics of S is governed by a set of rules associated with each membrane. Each rule specifies how objects evolve and move into neighboring membranes.
The rule set can also be associated with priority: a lower priority rule does not apply if one with a higher priority is applicable. A precise definition of S can be found in [16]. Since, from a recent result in [19], P systems with one membrane (i.e., 1-region P systems) and without priority are already able to simulate two counter machines and hence are universal [14], for the purposes of this paper we focus on catalytic 1-region P systems, or simply catalytic systems (CS’s) [16,19]. A CS S operates on two types of symbols: catalytic symbols called catalysts (denoted by capital letters C, D, etc.) and noncatalytic symbols called noncatalysts (denoted by lower case letters a, b, c, d, etc.). An evolution rule in S is of the form Ca → Cv, where C is a catalyst, a is a noncatalyst, and v is a (possibly null) string (an obvious representation of a multiset) of noncatalysts. A CS S is specified by a finite set of rules together with an initial multiset (configuration) w0, which is a string of catalysts and noncatalysts. As with the standard semantics of P systems [16], each evolution step of S is a result of applying all the rules in S in a maximally parallel manner. More precisely, starting from the initial configuration w0, the system goes through a sequence of configurations, where each configuration is derived from the directly preceding configuration in one step by the application of a subset of rules, which are chosen nondeterministically. Note that a rule Ca → Cv is applicable if there is a C and an a in the preceding configuration. The result of applying this rule is the replacement of a by v. If there is another occurrence of C and another occurrence of a, then the same rule or another rule with Ca on the left hand side can be applied. We require that the chosen subset of rules to apply must be
maximally parallel in the sense that no other applicable rule can be added to the subset. A configuration w is reachable if it appears in some execution sequence; w is halting if none of the rules is applicable. The set of all reachable configurations is denoted by R(S). The set of all halting reachable configurations (which is a subset of R(S)) is denoted by Rh(S).

We show that CS’s whose initial configuration contains only one catalyst are equivalent to communication-free Petri nets, which are also equivalent to commutative context-free grammars [5,11]. They define precisely the semilinear sets. Hence R(S) and Rh(S) are semilinear. This partially answers an open problem in [19], where it was shown that when the initial configuration contains six catalysts, S is universal; [19] raised the question of what is the optimal number of catalysts for universality. Our result shows that one catalyst is not enough. We also study an extended model where the rules are of the form q : (p, Ca → Cv) (where q and p are states), i.e., the application of the rules is guided by a finite-state control. For this generalized model, systems with one catalyst in the initial configuration, as well as systems with multiple catalysts in the initial configuration but with some restriction, correspond to vector addition systems.

We conclude this section by recalling the definitions of semilinear sets and Parikh maps [15]. Let N be the set of nonnegative integers and k be a positive integer. A set S ⊆ N^k is a linear set if there exist vectors v0, v1, . . . , vt in N^k such that S = {v | v = v0 + a1v1 + . . . + atvt, ai ∈ N}. The vectors v0 (referred to as the constant vector) and v1, v2, . . . , vt (referred to as the periods) are called the generators of the linear set S. A set S ⊆ N^k is semilinear if it is a finite union of linear sets. The empty set is a trivial (semi)linear set, where the set of generators is empty.
Every finite subset of N^k is semilinear – it is a finite union of linear sets whose generators are constant vectors. Clearly, semilinear sets are closed under union and projection. It is also known that semilinear sets are closed under intersection and complementation. Let Σ = {a1, a2, . . . , an} be an alphabet. For each string w in Σ*, define the Parikh map of w to be ψ(w) = (|w|a1, |w|a2, . . . , |w|an), where |w|ai is the number of occurrences of ai in w. For a language (set of strings) L ⊆ Σ*, the Parikh map of L is ψ(L) = {ψ(w) | w ∈ L}.
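The Parikh map and linear sets recalled above are straightforward to experiment with; a small sketch (the function names and the bounded enumeration are our own):

```python
from itertools import product

def parikh(w, alphabet):
    # the Parikh map: count the occurrences of each alphabet symbol in w
    return tuple(w.count(a) for a in alphabet)

def linear_set(v0, periods, bound):
    # enumerate v0 + a1*v1 + ... + at*vt with every coefficient ai <= bound
    pts = set()
    for coeffs in product(range(bound + 1), repeat=len(periods)):
        v = list(v0)
        for c, p in zip(coeffs, periods):
            v = [x + c * y for x, y in zip(v, p)]
        pts.add(tuple(v))
    return pts

print(parikh("abba", "ab"))                      # (2, 2)
print(sorted(linear_set((1, 0), [(2, 1)], 3)))   # [(1, 0), (3, 1), (5, 2), (7, 3)]
```

A full linear set is infinite; the `bound` parameter simply truncates the enumeration of coefficient vectors.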
2
1-Region Catalytic Systems
In this section, we study 1-region membrane computing systems which use only rules of the form Ca → Cv, where C is a catalyst, a is a noncatalyst, and v is a (possibly null) string of noncatalysts. Note that we do not allow rules of the form a → v as in a P System. Thus, we could think of these systems as “purely” catalytic. As defined earlier, we denote such a system by CS. Let S be a CS and w be an initial configuration (string) representing a multiset of catalysts and noncatalysts. A configuration x is a reachable configuration if S can reach x starting from the initial configuration w. Call x a halting configuration if no rule is applicable on x. Unless otherwise specified, “reachable configuration” will mean any reachable configuration, halting or not. Note that a non-halting reachable configuration x is an intermediate configuration in a possibly infinite computation. We denote by R(S)
the set of Parikh maps of reachable configurations with respect to noncatalysts only. Since catalysts do not change in a computation, we do not include them in the Parikh map. Also, for convenience, when we talk about configurations, we sometimes do not include the catalysts. R(S) is called the reachability set of S. Rh(S) will denote the set of all halting reachable configurations.

2.1 The Initial Configuration Has Only One Catalyst

In this subsection, we assume that the initial configuration of the CS has only one catalyst C. A noncatalyst a is evolutionary if there is a rule in the system of the form Ca → Cv; otherwise, a is non-evolutionary. Call a CS simple if each rule Ca → Cv has at most one evolutionary noncatalyst in v. Our first result shows that semilinear sets and simple CS’s are intimately related.

Theorem 1. 1. Let Q ⊆ N^k. If Q is semilinear, then there is a simple CS S such that Q is definable by S, i.e., Q is the projection of Rh(S) on k coordinates. 2. Let S be a simple CS. Then Rh(S) and R(S) are semilinear.

Later, in Section 4, we will see that, in fact, the above theorem holds for any CS whose initial configuration has only one catalyst. Suppose that we extend the model of a CS so that the rules are now of the form q : (p, Ca → Cv), i.e., the application of the rules is guided by a finite-state control. The rule means that if the system is in state q, application of Ca → Cv will land the system in state p. We call this system a CS with states, or CSS. In addition, we allow the rules to be prioritized, i.e., there is a partial order on the rules: a rule r cannot be applied if some applicable rule r′ has higher priority than r. We refer to such a system as a CSSP. For both systems, the computation starts at (q0, w), where q0 is a designated start state, and w is the initial configuration consisting of catalyst C and noncatalysts. In Section 4, we will see that a CSS can define only a recursive set of tuples.
In contrast, the following result shows that a CSSP can simulate a Turing machine.

Theorem 2. Let S be a CSSP with one catalyst and two noncatalysts. Then S can simulate a Turing machine.

Directly from Theorem 2, we have:

Corollary 1. Let S be a CSSP with one catalyst and two noncatalysts. Then R(S) ⊆ N² need not be a semilinear set.

We will see later that, in contrast to the above result, when the rules are not prioritized, i.e., we have a CSS S with one catalyst and two noncatalysts, R(S) is semilinear.

2.2 The Initial Configuration Has Multiple Catalysts

In this subsection, we assume that the initial configuration of the CS can have multiple catalysts.
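To make the maximally parallel semantics with several catalysts concrete, here is a small single-step simulator (a sketch under our own naming, not from the paper): every occurrence of a catalyst serves at most one rule instance per step, rules are chosen nondeterministically until none is applicable, and the products of a step only become available afterwards.

```python
from collections import Counter
import random

def step(config, rules, rng):
    # one maximally parallel step of a catalytic system; a rule is a triple
    # (C, a, v) standing for Ca -> Cv
    cfg = Counter(config)
    free = Counter({C: cfg[C] for (C, _, _) in rules})   # catalyst availability
    produced = Counter()
    while True:
        applicable = [(C, a, v) for (C, a, v) in rules
                      if free[C] > 0 and cfg[a] > 0]
        if not applicable:
            break
        C, a, v = rng.choice(applicable)                 # nondeterministic choice
        cfg[a] -= 1                                      # consume the noncatalyst
        free[C] -= 1                                     # this catalyst is now busy
        produced.update(v)                               # products appear later
    cfg.update(produced)
    return {s: n for s, n in cfg.items() if n > 0}

# two catalysts: Ca -> Cbb doubles a into two b's, Db -> D erases one b
rules = [("C", "a", "bb"), ("D", "b", "")]
print(step("CDab", rules, random.Random(0)))  # {'C': 1, 'D': 1, 'b': 2}
```

In this example the outcome is independent of the nondeterministic choices, since the two catalysts work on disjoint noncatalysts within one step.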
In general, we say that a noncatalyst is k-bounded if it appears at most k times in any reachable configuration. It is bounded if it is k-bounded for some k. Consider a CSSP whose initial configuration has multiple catalysts. Assume that except for one noncatalyst, all other noncatalysts are bounded or make at most r (for some fixed r) alternations between nondecreasing and nonincreasing multiplicity in any computation. Call this a reversal-bounded CSSP. Corollary 2. If S is a reversal-bounded CSSP, then Rh (S) and R(S) are semilinear. Without the reversal-bounded restriction, a CSSP can simulate a TM. In fact, a CS (with multiple catalysts in its initial configuration) can simulate a TM. It was shown in [19] that a CS augmented with noncooperating rules of the form a → v, where a is a noncatalyst and v is a (possibly null) string of noncatalysts is universal in the sense that such an augmented system with 6 catalysts can define any recursively enumerable set of tuples. A close analysis of the proof in [19] shows that all the rules can be made purely catalytic (i.e., of the form Ca → Cv) using at most 8 catalysts. Actually, this number 8 can be improved further using the newest results in [7]: Corollary 3. A CS with 7 catalysts can define any recursively enumerable set of tuples. There is another restriction on a CSSP S that makes it define only a semilinear set. Let T be a sequence of configurations corresponding to some computation of S starting from a given initial configuration w (which contains multiple catalysts). A noncatalyst a is positive on T if the following holds: if a occurs in the initial configuration or does not occur in the initial configuration but later appears as a result of some catalytic rule, then the number of occurrences (multiplicity) of a in any configuration after the first time it appears is at least 1. 
(There is no bound on the number of times the multiplicity of a alternates between nondecreasing and nonincreasing, as long as it stays at least 1.) We say that a is negative on T if it is not positive on T, i.e., the number of occurrences of a in configurations in T can be zero. Any sequence T of configurations for which every noncatalyst is bounded or positive is called a positive computation.

Corollary 4. Any semilinear set is definable by a CSSP where every computation path is positive.

Conversely, we have:

Corollary 5. Let S be a CSSP. Suppose that every computation path of S is positive. Then Rh(S) and R(S) are semilinear.

The previous corollary can be strengthened further.

Corollary 6. Let S be a CSSP. Suppose we allow one (and only one) noncatalyst, say a, to be negative. This means that a configuration with a positive occurrence (multiplicity) of a can lead to a configuration with no occurrence of a. Suppose that every computation path of S is positive, except for a. Then Rh(S) and R(S) are semilinear.
3
Characterizations in Terms of Vector Addition Systems
An n-dimensional vector addition system (VAS) is a pair G = ⟨x, W⟩, where x ∈ N^n is called the start point (or start vector) and W is a finite set of vectors in Z^n, where Z is the set of all integers (positive, negative, zero). The reachability set of the VAS ⟨x, W⟩ is the set R(G) = {z | for some j, z = x + v1 + ... + vj, where, for all 1 ≤ i ≤ j, each vi ∈ W and x + v1 + ... + vi ≥ 0}. The halting reachability set is Rh(G) = {z | z ∈ R(G), z + v ≱ 0 for every v in W}. An n-dimensional vector addition system with states (VASS) is a VAS ⟨x, W⟩ together with a finite set T of transitions of the form p → (q, v), where q and p are states and v is in W. The meaning is that such a transition can be applied at point y in state p and yields the point y + v in state q, provided that y + v ≥ 0. The VASS is specified by G = ⟨x, T, p0⟩, where p0 is the starting state. The reachability problem for a VASS (respectively, VAS) G is to determine, given a vector y, whether y is in R(G). The equivalence problem is to determine, given two VASS (respectively, VAS) G and G′, whether R(G) = R(G′). Similarly, one can define the reachability problem and equivalence problem for halting configurations. We summarize the following known results concerning VAS and VASS [20,9,1,10,13]:

Theorem 3.
1. Let G be an n-dimensional VASS. We can effectively construct an (n + 3)-dimensional VAS G′ that simulates G.
2. If G is a 2-dimensional VASS, then R(G) is an effectively computable semilinear set.
3. There is a 3-dimensional VASS G such that R(G) is not semilinear.
4. If G is a 5-dimensional VAS, then R(G) is an effectively computable semilinear set.
5. There is a 6-dimensional VAS G such that R(G) is not semilinear.
6. The reachability problem for VASS (and hence also for VAS) is decidable.
7. The equivalence problem for VAS (and hence also for VASS) is undecidable.

Clearly, it follows from part 6 of the theorem above that the halting reachability problem for VASS (respectively, VAS) is decidable.
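Although R(G) may be infinite, a truncated breadth-first search illustrates the definition of the VAS reachability set directly (our own sketch; the cap and the example system are assumptions):

```python
from collections import deque

def reachable(start, W, cap):
    # breadth-first enumeration of the VAS reachability set R(G), truncated
    # to vectors whose coordinates are all <= cap (R(G) itself may be infinite)
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for v in W:
            y = tuple(a + b for a, b in zip(x, v))
            if min(y) >= 0 and max(y) <= cap and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

# a 2-dimensional VAS: start (1, 0), addition vectors (-1, 2) and (1, -1)
R = reachable((1, 0), [(-1, 2), (1, -1)], cap=2)
print(sorted(R))  # [(0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]
```

Truncation only removes points, so everything enumerated is genuinely in R(G); deciding membership in the full set is exactly the (decidable, but hard) reachability problem of Theorem 3.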
3.1 The Initial Configuration Has Only One Catalyst

We first consider CSS (i.e., CS with states) whose initial configuration has only one catalyst. There is an example of a 3-dimensional VASS G in [10] such that R(G) is not semilinear: G = ⟨x, T, p⟩, where x = (0, 0, 1), and the transitions in T are:

p → (p, (0, 1, −1))
p → (q, (0, 0, 0))
q → (q, (0, −1, 2))
q → (p, (1, 0, 0))

Thus, there are only two states p and q. The following was shown in [10]:
1. (x1, x2, x3) is reachable in state p if and only if 0 < x2 + x3 ≤ 2^{x1}.
2. (x1, x2, x3) is reachable in state q if and only if 0 < 2x2 + x3 ≤ 2^{x1+1}.
Hence R(G) is not semilinear. From this example, we can show:

Corollary 7. There is a CSS S with 1 catalyst, 3 noncatalysts, and two states such that R(S) is not semilinear.
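The 3-dimensional VASS above is easy to simulate, and a bounded exploration lets one spot-check the stated characterization for state p (a sketch with assumed names; truncation only shrinks the explored set, so every explored pair is genuinely reachable):

```python
from collections import deque

# the 3-dimensional VASS of [10]: transitions as (state, vector, next_state)
T = [("p", (0, 1, -1), "p"), ("p", (0, 0, 0), "q"),
     ("q", (0, -1, 2), "q"), ("q", (1, 0, 0), "p")]

def explore(cap):
    # truncated breadth-first search from the start configuration (p, (0,0,1))
    seen = {("p", (0, 0, 1))}
    queue = deque(seen)
    while queue:
        s, x = queue.popleft()
        for (s0, v, s1) in T:
            if s0 != s:
                continue
            y = tuple(a + b for a, b in zip(x, v))
            if min(y) >= 0 and max(y) <= cap and (s1, y) not in seen:
                seen.add((s1, y))
                queue.append((s1, y))
    return seen

# spot-check the characterization for state p: 0 < x2 + x3 <= 2**x1
ok = all(0 < x2 + x3 <= 2 ** x1
         for (s, (x1, x2, x3)) in explore(8) if s == "p")
print(ok)  # True
```

The exponential bound 2^{x1} is exactly what rules out semilinearity of R(G).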
In fact, as shown below, each CSS corresponds to a VASS and vice versa.

Lemma 1. 1. Let S be a CSS. We can effectively construct a VASS G such that R(G) = R(S). 2. Every VASS can be simulated by a CSS.

From Theorem 3 part 6, we have:

Corollary 8. The reachability problem for CSS is decidable.

Clearly a reachable configuration is halting if no rule is applicable on the configuration. It follows from the above result that the halting reachability problem (i.e., determining if a configuration is in Rh(S)) is also decidable. A VASS is communication-free if for each transition q → (p, (j1, ..., jk)) in the VASS, at most one ji is negative, and if negative its value is −1. From Lemma 1 and the observation that the VASS constructed for the proof of Lemma 1 can be made communication-free, we have:

Theorem 4. The following systems are equivalent in the sense that each system can simulate the others: CSS, VASS, communication-free VASS.

Now consider a communication-free VASS without states, i.e., a VAS where in every transition at most one component is negative, and if negative, its value is −1. Call this a communication-free VAS. Communication-free VAS’s are equivalent to communication-free Petri nets, which are also equivalent to commutative context-free grammars [5,11]. It is known that they have effectively computable semilinear reachability sets [5]. It turns out that communication-free VAS’s characterize CS’s.

Theorem 5. Every communication-free VAS G can be simulated by a CS, and vice versa.

Corollary 9. If S is a CS, then R(S) and Rh(S) are effectively computable semilinear sets.

The following is obvious, as we can easily construct a VAS from the specification of the linear set.

Corollary 10. If Q is a linear set, then we can effectively construct a communication-free VAS G such that R(G) = Q. Hence, every semilinear set is a union of the reachability sets of communication-free VAS’s.
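One direction of Theorem 5 amounts to reading each rule Ca → Cv as a vector over the noncatalysts: −1 for the consumed symbol a and the multiplicities of v elsewhere, which is communication-free by construction. A sketch (our own encoding, not the paper's construction; with a single catalyst, at most one rule instance fires per maximally parallel step, so the sequential VAS semantics matches):

```python
def cs_to_vas(rules, alphabet):
    # each rule Ca -> Cv becomes a vector over the noncatalysts: -1 for the
    # consumed symbol a plus the multiplicities of v (the catalyst is implicit)
    vectors = []
    for (a, v) in rules:
        vec = tuple(v.count(s) - (1 if s == a else 0) for s in alphabet)
        vectors.append(vec)
    return vectors

# rules Ca -> Cbb and Cb -> C over the noncatalysts {a, b}
print(cs_to_vas([("a", "bb"), ("b", "")], "ab"))  # [(-1, 2), (0, -1)]
```

Each produced vector has at most one negative entry, and that entry is −1, matching the definition of a communication-free VAS.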
From the NP-completeness of the reachability problem for communication-free Petri nets (which are equivalent to commutative context-free grammars) [11,5], we have:

Corollary 11. The reachability problem for CS is NP-complete.

We have already seen that a CSS S with prioritized rules (CSSP) and with two noncatalysts can simulate a TM (Theorem 2); hence R(S) need not be semilinear. Interestingly, if we drop the requirement that the rules are prioritized, such a system has a semilinear reachability set.

Corollary 12. Let S be a CSS with two noncatalysts. Then R(S) and Rh(S) are effectively computable semilinear sets.

Open Problem: Suppose S has only rules of the form Ca → Cv and its initial configuration has exactly one catalyst. Suppose the rules are prioritized. How is R(S) related to VASS?
Characterizations of Catalytic Membrane Computing Systems
3.2 The Initial Configuration Has Multiple Catalysts

We have seen that a CS with multiple catalysts can simulate a TM. Consider the following restricted version: instead of "maximal parallelism" in the application of the rules at each step of the computation, we only allow "limited parallelism" by organizing the rules to apply in one step in the following form (called a matrix rule):

    (D1 b1 → D1 v1, ..., Ds bs → Ds vs)

where the Di's are catalysts (need not be distinct), the bi's are noncatalysts (need not be distinct), the vi's are strings of noncatalysts (need not be distinct), and s is the degree of the matrix. The matrix rules in a given system may have different degrees. The meaning of a matrix rule is that it is applicable if and only if each component of the matrix is applicable. The system halts if no matrix rule is applicable. Call this system a matrix CS, or MCS for short. We shall also consider MCS with states (called MCSS), where now the matrix rules have states and are of the form:

    p : (q, (D1 b1 → D1 v1, ..., Ds bs → Ds vs))

Now the matrix is applicable if the system is in state p and all the matrix components are applicable. After the application of the matrix, the system enters state q.

Lemma 2. Given a VAS (VASS) G, we can effectively construct an MCS (MCSS) S such that R(S) = R(G) × {1}.

Lemma 3. Given an MCSS S over n noncatalysts, we can effectively construct an (n + 1)-dimensional VASS G such that R(S) = projn(R(G) ∩ (N^n × {1})).

The VASS in Lemma 3 can be converted to a VAS. It was shown in [10] that if G is an n-dimensional VASS with states q1, ..., qk, then we can construct an (n + 3)-dimensional VAS G′ with the following property: if the VASS G is at (i1, ..., in) in state qj, then the VAS G′ will be at (i1, ..., in, aj, bj, 0), where aj = j for j = 1 to k, bk = k + 1, and bj = bj+1 + k + 1 for j = 1 to k − 1.
The last three coordinates keep track of the state changes, and G′ has additional transitions for updating these coordinates. However, these additional transitions only modify the last three coordinates. Define the finite set of tuples Fk = {(j, (k − j + 1)(k + 1)) | j = 1, ..., k} (note that k is the number of states of G). Then we have:

Corollary 13. Given an MCSS S over n noncatalysts, we can effectively construct an (n + 4)-dimensional VAS G′ such that R(S) = projn(R(G′) ∩ (N^n × {1} × Fk × {0})), for some effectively computable k (which depends only on the number of states and number of rules in G).

From Theorem 4, Lemmas 2 and 3, and the above corollary, we have:

Theorem 6. The following systems are equivalent in the sense that each system can simulate the others: CSS, MCS, MCSS, VAS, VASS, communication-free VASS.

Corollary 14. It is decidable to determine, given an MCSS S and a configuration α, whether α is a reachable configuration (halting or not).
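As a concrete illustration of the matrix rules defined above (a sketch under our own assumptions, not code from the paper), a configuration can be represented as a multiset of noncatalyst symbols and a matrix rule as a list of components (b_i, v_i); the catalysts D_i are left implicit since they are never consumed. The components are applied in sequence here, which is one reasonable reading of the condition that every component must be applicable.

```python
from collections import Counter

def apply_matrix_rule(config, rule):
    """Apply a matrix rule to a multiset configuration (a Counter over
    noncatalyst symbols).  `rule` is a list of components (b_i, v_i): each
    consumes one copy of the noncatalyst b_i and produces the noncatalysts
    in the string v_i.  Returns the new configuration, or None if some
    component is not applicable (so the whole matrix is not applicable)."""
    new = Counter(config)
    for b, v in rule:
        if new[b] == 0:          # component not applicable => matrix fails
            return None
        new[b] -= 1              # consume one copy of b_i
        new.update(v)            # produce the symbols of v_i
    return +new                  # drop zero counts
```

Whether the components consume symbols simultaneously or sequentially does not matter when the b_i's are distinct; the sequential reading above is only our modeling choice.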
Oscar H. Ibarra et al.
Corollary 15. It is decidable to determine, given an MCSS S and a configuration α, whether α is a halting reachable configuration.

From Lemma 2 and Theorem 3 part 7, we have:

Corollary 16. The equivalence and containment problems for MCSS are undecidable.
4 Closure Properties
Let S be a catalytic system of any type introduced in the previous sections. For the purposes of investigating closure properties, we will say that S defines a set Q ⊆ N^k (or Q is definable by S) if Rh(S) = Q × {0}^r for some given r. Thus, the last r coordinates of the (k + r)-tuples in Rh(S) are zero, and the first k components are exactly the tuples in Q. Fix the noncatalysts to be a1, a2, a3, .... Thus, any system S has noncatalysts a1, ..., at for some t. We say that a class of catalytic systems of a given type is closed under:

1. Intersection if given two systems S1 and S2, which define sets Q1 ⊆ N^k and Q2 ⊆ N^k, respectively, there exists a system S′ which defines Q = Q1 ∩ Q2.
2. Union if given two systems S1 and S2, which define sets Q1 ⊆ N^k and Q2 ⊆ N^k, respectively, there exists a system S′ which defines Q = Q1 ∪ Q2.
3. Complementation if given a system S which defines a set Q ⊆ N^k, there exists a system S′ which defines Q′ = N^k − Q.
4. Concatenation if given two systems S1 and S2, which define sets Q1 ⊆ N^k and Q2 ⊆ N^k, respectively, there exists a system S′ which defines Q = Q1 Q2, where Q1 Q2 = {(i1 + j1, ..., ik + jk) | (i1, ..., ik) ∈ Q1, (j1, ..., jk) ∈ Q2}.
5. Kleene + if given a system S which defines a set Q ⊆ N^k, there exists a system S′ which defines Q′ = ∪_{n≥1} Q^n.
6. Kleene * if given a system S which defines a set Q ⊆ N^k, there exists a system S′ which defines Q′ = ∪_{n≥0} Q^n.

Other unary and binary operations can be defined similarly.

Theorem 7. The class CS with only one catalyst in the initial configuration is closed under intersection, union, complementation, concatenation, and Kleene+ (or Kleene∗).

Investigation of closure properties of other types of catalytic systems is a subject for future research.
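The concatenation operation defined above is just componentwise addition of one tuple from each set; a minimal sketch:

```python
from itertools import product

def concat(Q1, Q2):
    """Concatenation of two subsets of N^k in the sense of the text:
    the set of componentwise sums of one tuple from each set."""
    return {tuple(i + j for i, j in zip(u, v)) for u, v in product(Q1, Q2)}

Q1 = {(1, 0), (0, 2)}
Q2 = {(0, 1)}
assert concat(Q1, Q2) == {(1, 1), (0, 3)}
```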
Acknowledgment We would like to thank Dung Huynh and Hsu-Chun Yen for their comments and for pointing out some of the references concerning vector addition systems. We also appreciate the comments and encouragement of Gheorghe Paun and Petr Sosik on this work.
References

1. H. G. Baker. Rabin's proof of the undecidability of the reachability set inclusion problem for vector addition systems. C.S.C. Memo 79, Project MAC, MIT, 1973.
2. G. Berry and G. Boudol. The chemical abstract machine. In POPL'90, pages 81–94. ACM Press, 1990.
3. P. Bottoni, C. Martin-Vide, Gh. Paun, and G. Rozenberg. Membrane systems with promoters/inhibitors. Acta Informatica, 38(10):695–720, 2002.
4. J. Dassow and Gh. Paun. On the power of membrane computing. Journal of Universal Computer Science, 5(2):33–49, 1999.
5. J. Esparza. Petri nets, commutative context-free grammars, and basic parallel processes. In FCT'95, volume 965 of LNCS, pages 221–232. Springer, 1995.
6. R. Freund and M. Oswald. P systems with activated/prohibited membrane channels. In WMC-CdeA'02, volume 2597 of LNCS, pages 261–269. Springer, 2003.
7. R. Freund, M. Oswald, and P. Sosik. Reducing the number of catalysts needed in computationally universal P systems without priorities. In the 5th Descriptional Complexity of Formal Systems Workshop (DCFS), July 12–14, 2003, Budapest, Hungary.
8. P. Frisco and H. Jan Hoogeboom. Simulating counter automata by P systems with symport/antiport. In WMC-CdeA'02, volume 2597 of LNCS, pages 288–301. Springer, 2003.
9. M. H. Hack. The equality problem for vector addition systems is undecidable. C.S.C. Memo 121, Project MAC, MIT, 1975.
10. J. Hopcroft and J.-J. Pansiot. On the reachability problem for 5-dimensional vector addition systems. TCS, 8(2):135–159, 1979.
11. D. T. Huynh. Commutative grammars: The complexity of uniform word problems. Information and Control, 57:21–39, 1983.
12. C. Martin-Vide and Gh. Paun. Computing with membranes (P systems): Universality results. In MCU, volume 2055 of LNCS, pages 82–101. Springer, 2001.
13. E. Mayr. Persistence of vector replacement systems is decidable. Acta Informatica, 15:309–318, 1981.
14. M. Minsky. Recursive unsolvability of Post's problem of Tag and other topics in the theory of Turing machines. Ann. of Math., 74:437–455, 1961.
15. R. Parikh. On context-free languages. Journal of the ACM, 13:570–581, 1966.
16. Gh. Paun. Computing with membranes. JCSS, 61(1):108–143, 2000.
17. Gh. Paun. Computing with membranes (P systems): A variant. International Journal of Foundations of Computer Science, 11(1):167–181, 2000.
18. Gh. Paun and G. Rozenberg. A guide to membrane computing. TCS, 287(1):73–100, 2002.
19. P. Sosik and R. Freund. P systems without priorities are computationally universal. In WMC-CdeA'02, volume 2597 of LNCS, pages 400–409. Springer, 2003.
20. J. van Leeuwen. A partial solution to the reachability problem for vector addition systems. In STOC'74, pages 303–309.
Augmenting Local Edge-Connectivity between Vertices and Vertex Subsets in Undirected Graphs

Toshimasa Ishii and Masayuki Hagiwara

Department of Information and Computer Sciences, Toyohashi University of Technology, Aichi 441-8580, Japan
{ishii,masa}@algo.ics.tut.ac.jp
Abstract. Given an undirected multigraph G = (V, E), a family W of sets W ⊆ V of vertices (areas), and a requirement function rW : W → Z+ (where Z+ is the set of positive integers), we consider the problem of augmenting G by the smallest number of new edges so that the resulting graph has at least rW(W) edge-disjoint paths between v and W for every pair of a vertex v ∈ V and an area W ∈ W. So far this problem was shown to be NP-hard in the uniform case of rW(W) = 1 for each W ∈ W, and polynomially solvable in the uniform case of rW(W) = r ≥ 2 for each W ∈ W. In this paper, we show that the problem can be solved in O(m + p r∗ n^5 log(n/r∗)) time, even in the general case of rW(W) ≥ 3 for each W ∈ W, where n = |V|, m = |{{u, v}|(u, v) ∈ E}|, p = |W|, and r∗ = max{rW(W) | W ∈ W}. Moreover, we give an approximation algorithm which finds a solution with at most one surplus edge over the optimal value, in the same time complexity, in the general case of rW(W) ≥ 2 for each W ∈ W.
1 Introduction
In a communication network, graph connectivity is a fundamental measure of its robustness. The problem of achieving high connectivity between every (or specified) two vertices has been extensively studied as the network design problem and so on (see [2,12] for surveys). Most of those studies have dealt with connectivity between two vertices in a graph. However, in many real-world networks, the connectivity between every two vertices is not necessarily required. For example, in a multimedia network, for a set W of vertices offering a certain service i, such as mirror servers, a user at a vertex v can use service i by communicating with one vertex w ∈ W through a path between w and v. In such networks, it is desirable that the network have some pairwise disjoint paths from the vertex v to at least one of the vertices in W. This means that the measure of reliability is the connectivity between a vertex and a set of vertices rather than that between two vertices.

B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 490–499, 2003. © Springer-Verlag Berlin Heidelberg 2003

From this point of view, H. Ito et al. considered the node-to-area connectivity (NA-connectivity, for
Fig. 1. Illustration of an instance of rW -NA-ECAP. (i) An initial graph G = (V, E) with a family W = {W1 = {v4 , v7 , v11 }, W2 = {v1 , v8 , v9 }, W3 = {v1 , v2 , v10 }} of areas, where a requirement function rW : W → Z + satisfies rW (W1 ) = 2, rW (W2 ) = 3, and rW (W3 ) = 4. (ii) An rW -NA-edge-connected graph obtained from G by adding a set of edges drawn as broken lines; there are at least rW (W ) edge-disjoint paths between every pair of a vertex v ∈ V and an area W ∈ W.
short) as a concept that represents the connectivity between vertices and sets of vertices (areas) in a graph [5,6,7].

In this paper, given a multigraph G = (V, E) with a family W of sets W of vertices (areas) and a requirement function rW : W → Z+, we consider the problem of augmenting G by adding the smallest number of new edges so that the resulting graph has at least rW(W) pairwise edge-disjoint paths between v and W for every pair of a vertex v ∈ V and an area W ∈ W. We call this problem the rW-NA-edge-connectivity augmentation problem (for short, rW-NA-ECAP). Figure 1 gives an instance of rW-NA-ECAP with rW(W1) = 2, rW(W2) = 3, and rW(W3) = 4.

So far, r-NA-ECAP in the uniform case that rW(W) = r holds for every area W ∈ W has been studied, and several algorithms have been developed. It was shown by H. Miwa et al. [9] that 1-NA-ECAP is NP-hard, whereas r-NA-ECAP is polynomially solvable in the case of r = 2 by H. Miwa et al. [9], and in the case of r ≥ 3 by T. Ishii et al. [4]. However, it was still open whether the problem with general requirements rW(W) ≥ 2, W ∈ W, is polynomially solvable or not. The above two algorithms for r-NA-ECAP are based on algorithms for solving the classical edge-connectivity augmentation problem, which augments the edge-connectivity of a graph, but they are essentially different; the former follows the method based on the minimum cut structure by T. Watanabe et al. [13], and the latter follows the so-called 'splitting off' method by A. Frank [1].

In this paper, by extending the approach in [4] and establishing a min-max formula for rW-NA-ECAP, we show that rW-NA-ECAP with general requirements rW(W) ≥ 3 for each W ∈ W can be solved in O(m + p r∗ n^5 log(n/r∗)) time, where n = |V|, m = |{{u, v}|(u, v) ∈ E}|, p = |W|, and r∗ = max{rW(W) | W ∈ W}.
We also give an approximation algorithm for rW -NA-ECAP with general requirements rW (W ) ≥ 2, W ∈ W which delivers a solution with at most one edge over the optimal in the same time complexity. Some of the proofs will be omitted from this extended abstract.
2 Problem Definition
Let G = (V, E) stand for an undirected graph with a set V of vertices and a set E of edges. An edge with end vertices u and v is denoted by (u, v). We denote |V| by n and |{{u, v}|(u, v) ∈ E}| by m. A singleton set {x} may be simply written as x, and "⊂" implies proper inclusion while "⊆" means "⊂" or "=". In G = (V, E), its vertex set V and edge set E may be denoted by V(G) and E(G), respectively. For a subset V′ ⊆ V in G, G[V′] denotes the subgraph induced by V′. For an edge set E′ with E′ ∩ E = ∅, we denote the augmented graph (V, E ∪ E′) by G + E′. For an edge set E′, we denote by V[E′] the set of all end vertices of edges in E′.

An area graph is defined as a graph G = (V, E) with a family W of vertex subsets W ⊆ V which are called areas (see Figure 1). We denote an area graph G with W by (G, W). In the sequel, we may denote (G, W) simply by G if no confusion arises.

For two disjoint subsets X, Y ⊂ V of vertices, we denote by EG(X, Y) the set of edges e = (x, y) such that x ∈ X and y ∈ Y, and denote |EG(X, Y)| by dG(X, Y). A cut is defined as a subset X of V with ∅ ≠ X ≠ V, and the size of a cut X is defined by dG(X, V − X), which may also be written as dG(X). Moreover, we define d(∅) = 0. For two cuts X, Y ⊂ V with X ∩ Y = ∅ in G, we denote by λG(X, Y) the minimum size of cuts which separate X and Y, i.e., λG(X, Y) = min{dG(S) | S ⊇ X, S ⊆ V − Y}. For two cuts X, Y ⊂ V with X ∩ Y ≠ ∅ in G, we define λG(X, Y) = ∞. The edge-connectivity of G, denoted by λ(G), is defined as min_{X⊂V, Y⊂V} λG(X, Y).

For a vertex v ∈ V and a set W ⊆ V of vertices, the node-to-area edge-connectivity (NA-edge-connectivity, for short) between v and W is defined as λG(v, W). Note that λG(v, W) = ∞ holds for v ∈ W. Also note that by Menger's theorem, λG(v, W) ≥ r holds if and only if there exist at least r edge-disjoint paths between v and W.
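By the Menger-type characterization above, λG(v, W) can be computed as a maximum flow from v to a super-sink obtained by contracting the area W, with one unit of capacity per parallel edge. The following is a minimal sketch (our naming and edge-list representation, not the paper's), using a simple augmenting-path search:

```python
from collections import defaultdict, deque

def na_edge_connectivity(edges, v, W):
    """lambda_G(v, W) for a multigraph given as a list of undirected edges
    (u, w): the minimum size of a cut separating v from the area W, computed
    as a max flow from v to a super-sink standing for all of W (Menger).
    Returns float('inf') when v already lies in W."""
    if v in W:
        return float('inf')
    sink = object()                      # super-sink contracting the area W
    cap = defaultdict(int)
    for a, b in edges:
        a = sink if a in W else a
        b = sink if b in W else b
        if a != b:
            cap[(a, b)] += 1             # each parallel edge adds unit capacity
            cap[(b, a)] += 1
    flow = 0
    while True:
        parent = {v: None}               # BFS for an augmenting path
        queue = deque([v])
        while queue and sink not in parent:
            x = queue.popleft()
            for (a, b), c in list(cap.items()):
                if a == x and c > 0 and b not in parent:
                    parent[b] = x
                    queue.append(b)
        if sink not in parent:
            return flow
        node = sink                      # augment by one unit along the path
        while parent[node] is not None:
            cap[(parent[node], node)] -= 1
            cap[(node, parent[node])] += 1
            node = parent[node]
        flow += 1
```

This augmenting-path sketch is far from the O(mn log(n^2/m)) bound quoted later in the paper; it only illustrates the definition.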
For an area graph (G, W) and a function rW : W → Z+ ∪ {0}, we say that (G, W) is rW-NA-edge-connected if λG(v, W) ≥ rW(W) holds for every pair of a vertex v ∈ V and an area W ∈ W. Note that the area graph (G, W) in Figure 1(ii) is rW-NA-edge-connected, where rW(W1) = 2, rW(W2) = 3, and rW(W3) = 4. In this paper, we consider the following problem, called rW-NA-ECAP.

Problem 1. (rW-NA-edge-connectivity augmentation problem, rW-NA-ECAP)
Input: An area graph (G, W) and a requirement function rW : W → Z+.
Output: A set E∗ of new edges with the minimum cardinality such that G + E∗ is rW-NA-edge-connected.
3 Lower Bound on the Optimal Value
For an area graph (G, W) and a fixed function rW : W → Z + , let opt(G, W, rW ) denote the optimal value to rW -NA-ECAP in (G, W), i.e., the minimum size |E ∗ | of a set E ∗ of new edges such that G + E ∗ is rW -NA-edge-connected. In this section, we derive lower bounds on opt(G, W, rW ) to rW -NA-ECAP with (G, W). In the sequel, let W = {W1 , W2 , . . . , Wp }, rW (Wi ) = ri , and r1 ≤ r2 ≤ · · · ≤ rp if no confusion occurs.
A family X = {X1, . . . , Xt} of cuts in G is called a subpartition of V if every two cuts Xi, Xj ∈ X with i ≠ j satisfy Xi ∩ Xj = ∅ and ∪X∈X X ⊆ V holds. For an area graph (G, W) and an area Wi ∈ W, a cut X with X ∩ Wi = ∅ is called type (Ai), and a cut X with X ⊇ Wi is called type (Bi) (note that a cut X of type (Bi) satisfies X ≠ V by the definition of a cut). We easily see the following property.

Lemma 1. An area graph (G, W) is rW-NA-edge-connected if and only if all cuts X ⊂ V of type (Ai) or (Bi) satisfy dG(X) ≥ ri for each area Wi ∈ W.
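The subpartition condition above (pairwise disjoint cuts whose union lies in V) is straightforward to check; a small sketch under our own set-of-sets representation:

```python
def is_subpartition(family, V):
    """Check the subpartition conditions from the text: the cuts in `family`
    are pairwise disjoint and their union is contained in the vertex set V."""
    union = set()
    for X in family:
        if union & X:            # overlaps an earlier cut
            return False
        union |= X
    return union <= V            # union contained in V
```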
Let X be a cut in (G, W). If X is a cut of type (Ai) or (Bi) with dG(X) < ri for some area Wi ∈ W, then it is necessary to add at least ri − dG(X) edges between X and V − X. This follows since if X is of type (Ai) (resp., type (Bi)), then the NA-edge-connectivity between a vertex in X (resp., in V − X) and an area Wi ∈ W with Wi ∩ X = ∅ (resp., Wi ⊆ X) needs to be augmented to at least ri. Here we define αG,W,rW(X) as follows; it indicates the number of necessary edges which join two vertices, one from a cut X and one from the cut V − X (note that r1 ≤ r2 ≤ · · · ≤ rp holds).

Definition 1. For each cut X of type (Aj) or (Bj) for some area Wj, we define iX as the maximum index i such that X is of type (Ai) or (Bi), and define αG,W,rW(X) = max{0, riX − dG(X)}. For any other cut X, we define αG,W,rW(X) = 0.

Lemma 2. For each cut X, it is necessary to add at least αG,W,rW(X) edges between X and V − X.
Let

    α(G, W, rW) = max_X { Σ_{X∈X} αG,W,rW(X) },   (1)
where the maximization is taken over all subpartitions X of V. Then any feasible solution to rW-NA-ECAP with (G, W) must contain, for each cut X with αG,W,rW(X) > 0, at least αG,W,rW(X) edges joining X and V − X. Since adding one edge can contribute to at most two such 'cut deficiencies' in a subpartition of V, we see the following lemma.

Lemma 3. opt(G, W, rW) ≥ α(G, W, rW)/2 holds.
The area graph (G, W) in Figure 1(i) satisfies α(G, W, rW) = 8. We have Σ_{X∈X} αG,W,rW(X) = 8 for the subpartition X = {{v1}, {v2}, {v4}, {v6, v7, v8}, {v9, v11}, {v10}} of V. We remark that there is an area graph (G, W) with opt(G, W, rW) > α(G, W, rW)/2. Figure 2 gives an instance for r = r1 = r2 = r3 = 2. Each cut {vi}, i = 1, 2, 4, is of type (A3) and satisfies r − dG(vi) = 1, and the cut {v3} is of type (A1) and satisfies r − dG(v3) = 1. Then we see α(G, W, rW)/2 = 2. In order to make (G, W) rW-NA-edge-connected by adding two new edges, we must add e = (v1, v2) and e′ = (v3, v4) without loss of generality. However, G + {e, e′} is not rW-NA-edge-connected, since λ_{G+{e,e′}}(v1, W3) = 1. We will show that all such instances can be completely characterized as follows.
Fig. 2. Illustration of an area graph (G, W) with opt(G, W, rW) = α(G, W, rW)/2 + 1, where rW(Wi) = 2 holds for i = 1, 2, 3.
Definition 2. We say that an area graph (G, W) has property (P) if α(G, W, rW) is even and there is a subpartition X = {X1, . . . , Xt} of V with Σ_{X∈X} αG,W,rW(X) = α(G, W, rW) satisfying the following conditions (P1)–(P3):
(P1) Each cut X ∈ X is of type (Ai) for some Wi ∈ W.
(P2) The cut X1 satisfies αG,W,rW(X1) = 1 and X1 ⊂ C1 for some component C1 of G with Xℓ ∩ C1 = ∅ for each ℓ = 2, 3, . . . , t.
(P3) For each ℓ = 2, 3, . . . , t, there is a cut Yℓ of type (Bj) with some Wj ∈ W such that we have Xℓ ∪ X1 ⊆ Yℓ and Σ_{X∈X, X⊂Yℓ} αG,W,rW(X) ≤ (rj + 1) − dG(Yℓ), and that every cut X ∈ X satisfies X ⊂ Yℓ or X ∩ Yℓ = ∅.
Intuitively, the above condition (P3) indicates that for any feasible solution E′, if the number of edges e ∈ E′ incident to Yℓ is equal to Σ_{X∈X, X⊂Yℓ} αG,W,rW(X), then any edge e ∈ E′ must have its end vertex also in V − Yℓ, from dG+E′(Yℓ) ≥ rj. Note that (G, W) in Figure 2 has property (P) because α(G, W, rW) = 4 holds and the subpartition X = {X1 = {v4}, X2 = {v1}, X3 = {v2}, X4 = {v3}} of V satisfies Y2 = C1 ∪ {v1}, Y3 = C1 ∪ {v2}, and Y4 = C1 ∪ {v3} for the component C1 of G containing v4.

Lemma 4. If (G, W) has property (P), then opt(G, W, rW) ≥ α(G, W, rW)/2 + 1 holds.

Proof. Assume by contradiction that (G, W) has property (P) and there is an edge set E∗ with |E∗| = α(G, W, rW)/2 such that G + E∗ is rW-NA-edge-connected (note that α(G, W, rW) is even). Let X = {X1, . . . , Xt} denote a subpartition of V satisfying Σ_{X∈X} αG,W,rW(X) = α(G, W, rW) and the above (P1)–(P3). Since |E∗| = α(G, W, rW)/2 holds, each cut X ∈ X satisfies dG+E∗(X) = riX, and hence dG′(X) = riX − dG(X) = αG,W,rW(X), where G′ = (V, E∗). Therefore, any edge (x′, x″) ∈ E∗ satisfies x′ ∈ X′ and x″ ∈ X″ for some two cuts X′, X″ ∈ X with X′ ≠ X″. From this, there exists a cut Xs ∈ X with s ≠ 1 and EG′(Xs, X1) ≠ ∅. Since (G, W) satisfies property (P), there is a cut Ys of type (Bj) which satisfies (P3), and hence Σ_{v∈Ys} dG′(v) ≤ (rj + 1) − dG(Ys). Since G′[Ys] contains one edge in EG′(Xs, X1), we have dG′(Ys) ≤ (rj − 1) − dG(Ys), which implies that dG+E∗(Ys) = dG(Ys) + dG′(Ys) ≤ rj − 1. Hence a vertex v ∈ V − Ys satisfies λG+E∗(v, Wj) ≤ rj − 1, contradicting that G + E∗ is rW-NA-edge-connected (note that Ys is of type (Bj) and hence we have Wj ⊆ Ys).
In this paper, we prove that rW-NA-ECAP enjoys the following min-max theorem and is polynomially solvable.

Theorem 1. For rW-NA-ECAP with rW(W) ≥ 3 for each area W ∈ W, opt(G, W, rW) = α(G, W, rW)/2 holds if (G, W) does not have property (P), and opt(G, W, rW) = α(G, W, rW)/2 + 1 holds otherwise. Moreover, a solution E∗ with |E∗| = opt(G, W, rW) can be obtained in O(m + p rp n^5 log(n/rp)) time.
Theorem 2. For rW-NA-ECAP with rW(W) ≥ 2 for each area W ∈ W, a solution E∗ with |E∗| ≤ opt(G, W, rW) + 1 can be obtained in O(m + p rp n^5 log(n/rp)) time.
4 Algorithm
Based on the lower bounds in the previous section, we give an algorithm, called rW-NAEC-AUG, which finds a feasible solution E′ to rW-NA-ECAP with |E′| = opt(G, W, rW), for a given area graph (G, W) and a requirement function rW : W → Z+ − {1, 2}. It finds a feasible solution E′ with |E′| = α(G, W, rW)/2 + 1 if (G, W) has property (P), and with |E′| = α(G, W, rW)/2 otherwise.

For a graph H = (V ∪ {s}, E) and a designated vertex s ∉ V, an operation called edge-splitting (at s) is defined as deleting two edges (s, u), (s, v) ∈ E and adding one new edge (u, v). That is, the graph H′ = (V ∪ {s}, (E − {(s, u), (s, v)}) ∪ {(u, v)}) is obtained from such an edge-splitting operation. Then we say that H′ is obtained from H by splitting the pair of edges (s, u) and (s, v). A sequence of splittings is complete if in the resulting graph H′ the vertex s has no remaining neighbor. Conversely, we say that H′ is obtained from H by hooking up an edge (u, v) ∈ E(H − s) at s, if we construct H′ by replacing the edge (u, v) with the two edges (s, u) and (s, v) in H. The edge-splitting operation is known to be a useful tool for solving connectivity augmentation problems [1].

An outline of our algorithm is as follows. We first add a new vertex s and the minimum number of new edges between s and the area graph (G, W) to construct an rW-NA-edge-connected graph H, and we then convert H into an rW-NA-edge-connected graph without s by splitting off the edges incident to s and eliminating s. More precisely, we describe the algorithm below, and introduce three theorems necessary to justify the algorithm, whose proofs are omitted due to space limitation. An example of the computational process of rW-NAEC-AUG is shown in Figure 3.

Algorithm rW-NAEC-AUG.
Input: An area graph (G = (V, E), W = {W1, W2, . . . , Wp}) and a requirement function rW : W → Z+ − {1, 2}.
Output: A set E∗ of new edges with |E∗| = opt(G, W, rW) such that G + E∗ is rW-NA-edge-connected.
Step 1: We add a new vertex s and a set F1 of new edges between s and V such that in the resulting graph H = (V ∪ {s}, E ∪ F1),

    all cuts X ⊂ V of type (Ai) or (Bi) satisfy dH(X) ≥ ri for each Wi ∈ W,   (2)
Fig. 3. Computational process of algorithm rW-NAEC-AUG applied to the area graph (G, W) in Figure 1 and (rW(W1), rW(W2), rW(W3)) = (2, 3, 4). The lower bound in Section 3 is α(G, W, rW)/2 = 4. (i) H = (V ∪ {s}, E ∪ F1) obtained by Step 1. Edges in F1 are drawn as broken lines. Then λH(v, W) ≥ rW(W) holds for every pair of v ∈ V and W ∈ W. (ii) H1 = (H − {(s, v1), (s, v2)}) ∪ {(v1, v2)} obtained from H by an admissible splitting of (s, v1) and (s, v2). (iii) H2 = (H1 − {(s, v3), (s, v4)}) ∪ {(v3, v4)} obtained from H1 by an admissible splitting of (s, v3) and (s, v4). (iv) H3 obtained from H2 by a complete admissible splitting at s. The graph G3 = H3 − s is rW-NA-edge-connected.
and no F′ ⊂ F1 satisfies this property (as will be shown, |F1| = α(G, W, rW) holds). If dH(s) is odd, then we add to F1 one extra edge between s and V.

Step 2: We split two edges incident to s while preserving (2) (such a splitting pair is called admissible). We continue to execute admissible edge-splittings at s until no pair of two edges incident to s is admissible. Let H2 = (V ∪ {s}, E ∪ E2 ∪ F2) be the resulting graph, where F2 = EH2(s, V) and E2 denotes the set of split edges. If F2 = ∅ holds, then halt after outputting E∗ := E2. Otherwise dH2(s) = 4 holds and the graph H2 − s has two components C1 and C2 with dH2(s, C1) = 3 and dH2(s, C2) = 1, where EH2(s, C2) = {(s, u∗)}. We have the following four cases (a)–(d).

(a) The vertex u∗ is contained in no cut X ⊆ C2 of type (Ai) with dH2(X) = ri for any i. Then after replacing (s, u∗) with a new edge (s, v) for some vertex v ∈ C1 while preserving (2), execute a complete admissible splitting at s. Output the set E∗ of all split edges, where |E∗| = α(G, W, rW)/2 holds.
(b) E2 ∩ E(H2[V − C1]) ≠ ∅ holds. Then after hooking up one edge e ∈ E2 ∩ E(H2[V − C1]), execute a complete admissible splitting at s. Output the set E∗ of all split edges, where |E∗| = α(G, W, rW)/2 holds.

(c) There is a set E′ ⊆ E2 of at most two split edges such that the graph H3 resulting from hooking up the set E′ of edges in H2 has an admissible pair {(s, u∗), f} for some f ∈ EH3(s, V). After a complete admissible splitting at s in H3, output the set E∗ of all split edges, where |E∗| = α(G, W, rW)/2 holds.

(d) None of (a)–(c) holds. Then we can prove that (G, W) has property (P). After adding one new edge e∗ to EH2(C1, C2), execute a complete admissible splitting at s in H2 + {e∗}. Output the edge set E∗ := E3 ∪ {e∗}, where E3 denotes the set of all split edges and |E∗| = α(G, W, rW)/2 + 1 holds.
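The edge-splitting and hooking-up operations used throughout Steps 1 and 2 can be sketched on an edge-list representation as follows (our sketch; the admissibility checks, which require the cut conditions (2), are not modeled here):

```python
def split_off(edges, s, u, v):
    """Edge-splitting at s: replace the two edges (s,u),(s,v) by one new
    edge (u,v).  `edges` is a list of undirected edges; a ValueError is
    raised when the required edges at s are missing."""
    edges = list(edges)
    for x in (u, v):
        try:
            edges.remove((s, x))
        except ValueError:
            edges.remove((x, s))     # the edge may be stored reversed
    edges.append((u, v))
    return edges

def hook_up(edges, s, u, v):
    """Inverse operation: replace an edge (u,v) by the pair (s,u),(s,v)."""
    edges = list(edges)
    try:
        edges.remove((u, v))
    except ValueError:
        edges.remove((v, u))
    return edges + [(s, u), (s, v)]
```

Since the two operations are inverse to each other, hooking up a previously split pair restores the earlier graph, which is exactly how the algorithm backtracks in cases (b) and (c).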
To justify the algorithm rW-NAEC-AUG, it suffices to show the following three theorems.

Theorem 3. Let (G = (V, E), W = {W1, . . . , Wp}) be an area graph, and 0 ≤ r1 ≤ · · · ≤ rp be integers. Let H = (V ∪ {s}, E ∪ F1) be a graph with s ∉ V and F1 = EH(s, V) such that H satisfies (2) and no F′ ⊂ F1 satisfies this property. Then |F1| = α(G, W, rW) holds.
Theorem 4. Let (G = (V, E), W = {W1, . . . , Wp}) be an area graph, and 2 ≤ r1 ≤ · · · ≤ rp be integers. Let H = (V ∪ {s}, E ∪ F) with F = EH(s, V) ≠ ∅, s ∉ V, and dH(s) even, satisfy (2). If no pair of two edges in F is admissible, then we have dH(s) = 4 and G has two components C1 and C2 with dH(s, C1) = 3 and dH(s, C2) = 1. Moreover, in the graph H + e∗ obtained by adding one arbitrary new edge e∗ to EG(C1, C2), there is a complete admissible splitting at s.
Theorem 5. Let (G, W) and H satisfy the assumption of Theorem 4, and 3 ≤ r1 ≤ · · · ≤ rp be integers. Let H∗ be a graph obtained by a sequence of admissible splittings at s from H such that EH∗(s, V) ≠ ∅ holds and no pair of two edges in EH∗(s, V) is admissible in H∗. Let C1 and C2 be the two components in H∗ − s with dH∗(s, C1) = 3 and dH∗(s, C2) = 1 (they exist by Theorem 4). Then if H∗ satisfies one of the following conditions (a)–(c), then H has a complete admissible splitting at s after replacing at most one edge in EH(s, V); otherwise (G, W) has property (P).
(a) For {(s, u∗)} = EH∗(s, C2), u∗ is contained in no cut X ⊆ C2 of type (Ai) with dH∗(X) = ri for any i.
(b) E1 ∩ E(H∗[V − C1]) ≠ ∅ holds, where E1 denotes the set of all split edges.
(c) There is a set E′ ⊆ E1 of at most two split edges such that the graph H′ resulting from hooking up the set E′ of edges in H∗ has an admissible pair {(s, u∗), f} for some f ∈ EH′(s, V).
By Theorems 4 and 5, for the set E∗ of edges obtained by algorithm rW-NAEC-AUG, the graph H∗ = (V ∪ {s}, E ∪ E∗) satisfies (2), i.e., all cuts X ⊂ V of type (Ai) or (Bi) satisfy dH∗(X) ≥ ri for each area Wi ∈ W. By dH∗(s) = 0, all cuts X ⊂ V satisfy dG+E∗(X) = dH∗(X). By Lemma 1, this implies that G + E∗ is rW-NA-edge-connected. By Theorems 3 and 5, we have |E∗| =
α(G, W, rW)/2 + 1 in the case where the initial area graph (G, W) has property (P), and |E∗| = α(G, W, rW)/2 otherwise. By Lemmas 3 and 4, we have |E∗| = opt(G, W, rW).

Finally, we analyze the time complexity of algorithm rW-NAEC-AUG. By the maximum flow technique in [3], we can compute λG(v, W) for a vertex v ∈ V and an area W ∈ W in O(mn log(n^2/m)) time. Hence it can be checked in O(mpn^2 log(n^2/m)) time whether H satisfies (2) or not. In Step 1, for each vertex v ∈ V, after deleting all edges between s and v, we check whether the resulting graph H′ satisfies (2) or not. If (2) is violated, then we add max_{x∈V, Wi∈W} {ri − λH′(x, Wi)} edges between s and v in H′. In Step 2, for each pair {u, v} ⊆ V, after splitting min{dH(s, u), dH(s, v)} pairs {(s, u), (s, v)}, we check whether the resulting graph H′ satisfies (2) or not. If (2) is violated, then we hook up ⌈max_{x∈V, Wi∈W} {ri − λH′(x, Wi)}/2⌉ pairs in H′. The procedures (a)–(d) can also be executed in polynomial time since the number of hooking-up operations is O(n^4). By further analysis, we can prove that hooking up split edges O(n^2) times suffices for these procedures, but we omit the details here. Therefore, we see that algorithm rW-NAEC-AUG can be implemented to run in O(mpn^4 log(n^2/m)) time. As a result, this total complexity can be reduced to O(m + p rp n^5 log(n/rp)) by applying the procedure to a sparse spanning subgraph of G with O(rp n) edges, where such sparsification takes O(m + n log n) time [10,11]. Summarizing the argument given so far, Theorem 1 is now established.

Notice that the assumption of r1 ≥ 3 is necessary only for Theorem 5. Therefore, even in the case of r1 = 2, we see by Theorem 4 that we can obtain a feasible solution E′ to rW-NA-ECAP with |E′| ≤ α(G, W, rW)/2 + 1 ≤ opt(G, W, rW) + 1. This implies Theorem 2. Actually, we remark that there are some differences between the case of r1 = 2 and the case of r1 ≥ 3.
For example, the graph (G′ = (V ∪ {v}, E), W) obtained from the graph (G, W) in Figure 2 by adding an isolated vertex v does not have property (P), but satisfies opt(G′, W, rW) > α(G′, W, rW)/2.
5 Conclusion
In this paper, given an area multigraph (G = (V, E), W) and a requirement function rW : W → Z+, we have proposed a polynomial time algorithm for rW-NA-ECAP in the case where each area W ∈ W satisfies rW(W) ≥ 3. The time complexity of our algorithm is O(m + p r∗ n^5 log(n/r∗)). Moreover, we have shown that in the case of rW(W) ≥ 2, W ∈ W, a solution with at most one edge over the optimal can be found in the same time complexity. However, it is still open whether the problem in the case of rW(W) ≥ 2, W ∈ W, is polynomially solvable. We finally remark that our method in this paper cannot be applied to the problem of augmenting a given simple graph while preserving the simplicity of the graph. For such simplicity-preserving problems, it was shown [8] that even the edge-connectivity augmentation problem is NP-hard.
Acknowledgments This research is supported by a Grant-in-Aid for the 21st Century COE Program “Intelligent Human Sensing” from the Ministry of Education, Culture, Sports, Science, and Technology.
References
1. A. Frank, Augmenting graphs to meet edge-connectivity requirements, SIAM J. Discrete Math., 5(1), (1992), 25–53.
2. A. Frank, Connectivity augmentation problems in network design, in Mathematical Programming: State of the Art 1994, J.R. Birge and K.G. Murty (Eds.), The University of Michigan, Ann Arbor, MI, (1994), 34–63.
3. A. V. Goldberg and R. E. Tarjan, A new approach to the maximum flow problem, J. Assoc. Comput. Mach., 35, (1988), 921–940.
4. T. Ishii, Y. Akiyama, and H. Nagamochi, Minimum augmentation of edge-connectivity between vertices and sets of vertices in undirected graphs, Electr. Notes Theo. Comp. Sci., vol. 78, Computing Theory: The Australian Theory Symposium (CATS'03), (2003).
5. H. Ito, Node-to-area connectivity of graphs, Transactions of the Institute of Electrical Engineers of Japan, 11C(4), (1994), 463–469.
6. H. Ito, Node-to-area connectivity of graphs, in M. Fushimi and K. Tone, editors, Proceedings of APORS94, World Scientific Publishing, (1995), 89–96.
7. H. Ito and M. Yokoyama, Edge connectivity between nodes and node-subsets, Networks, 31(3), (1998), 157–164.
8. T. Jordán, Two NP-complete augmentation problems, Preprint no. 8, Department of Mathematics and Computer Science, Odense University, (1997).
9. H. Miwa and H. Ito, Edge augmenting problems for increasing connectivity between vertices and vertex subsets, 1999 Technical Report of IPSJ, 99-AL-66(8), (1999), 17–24.
10. H. Nagamochi and T. Ibaraki, A linear-time algorithm for finding a sparse k-connected spanning subgraph of a k-connected graph, Algorithmica, 7, (1992), 583–596.
11. H. Nagamochi and T. Ibaraki, Computing edge-connectivity of multigraphs and capacitated graphs, SIAM J. Discrete Math., 5, (1992), 54–66.
12. H. Nagamochi and T. Ibaraki, Graph connectivity and its augmentation: applications of MA orderings, Discrete Applied Mathematics, 123(1), (2002), 447–472.
13. T. Watanabe and A. Nakamura, Edge-connectivity augmentation problems, J. Comput. System Sci., 35, (1987), 96–144.
Scheduling and Traffic Allocation for Tasks with Bounded Splittability

Piotr Krysta¹, Peter Sanders¹, and Berthold Vöcking²

¹ Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, Saarbrücken, Germany
{krysta,sanders}@mpi-sb.mpg.de
² Dept. of Computer Science, Universität Dortmund, Baroper Str. 301, 44221 Dortmund, Germany
[email protected]
Abstract. We investigate variants of the problem of scheduling tasks on uniformly related machines to minimize the makespan. In the k-splittable scheduling problem each task can be broken into at most k ≥ 2 pieces to be assigned to different machines. In the more general SAC problem each task j has its own splittability parameter kj ≥ 2. These problems are NP-hard, and previous research focuses mainly on approximation algorithms. Our motivation to study these scheduling problems is traffic allocation for server farms based on a variant of the Internet Domain Name Service (DNS) that uses a stochastic splitting of request streams. We show that the traffic allocation problem with standard latency functions from Queueing Theory cannot be approximated in polynomial time within any finite factor because of the extreme behavior of these functions. Our main result is a polynomial time, exact algorithm for the k-splittable scheduling problem as well as the SAC problem with a fixed number of machines. The running time of our algorithm is exponential in the number of machines but only linear in the number of tasks. This result is the first proof that bounded splittability reduces the complexity of scheduling, since unsplittable scheduling is known to be NP-hard already for two machines. Furthermore, since our algorithm solves the scheduling problem exactly, it also solves the traffic allocation problem.
1 Introduction
A server farm is a collection of servers delivering data to a set of clients. Large scale server farms are distributed all over the Internet and deliver various types of site content including graphics, streaming media, downloadable files, and HTML on behalf of other content providers who pay for an efficient and reliable delivery of their site data. To satisfy these requirements, one needs advanced traffic management that takes care of the assignment of traffic streams to individual servers. Such streams can be formed, e.g., by traffic directed to the same page,
Partially supported by DFG grants Vo889/1-1, Sa933/1-1, and the IST program of the EU under contract IST-1999-14186 (ALCOM-FT).
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 500–510, 2003. © Springer-Verlag Berlin Heidelberg 2003
traffic directed to pages of the same content provider, by the traffic requested from clients in the same geographical region or domain, or by combinations of these criteria. The objective is to distribute these streams as evenly as possible over all servers in order to ensure site availability and optimal performance. For each traffic stream there is a corresponding stream of requests sent from the clients to the server farm. Current implementations of commercial Web server farms use the Internet Domain Name Service (DNS) to direct the requests to the server that is responsible for delivering the data of the corresponding traffic stream. The DNS can answer a query such as "What is www.uni-dortmund.de?" with a short list of IP addresses rather than only a single IP address. The original idea behind returning this list is that, in case of failures, clients can redirect their requests to alternative servers. Nowadays, slightly deviating from this idea, these lists are also used for the purpose of load balancing among replicated servers (cf., e.g., [8]). When clients make a DNS query for a name mapped to a list of addresses, the server responds with the entire list of IP addresses, rotating the ordering of addresses for each reply. As clients typically send their HTTP requests to the IP address listed first, DNS rotation distributes the requests more or less evenly among all the replicated servers in the list. Suppose each request stream is formed by a sufficiently large number of clients, so that it is reasonably well described by a Poisson process. Let λj denote the rate of stream j, i.e., the expected number of requests in some specified time interval. Under this assumption, rotating a list of ℓ servers corresponds to splitting stream j into ℓ substreams, each having rate λj/ℓ.
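The round-robin effect of DNS rotation can be sketched in a few lines; the function name and toy setup below are ours, not part of any DNS implementation:

```python
from collections import Counter

def dns_rotate(servers, num_requests):
    """Simulate DNS rotation: the address list is rotated after each reply
    and clients contact the first listed address, so requests are spread
    (roughly) evenly over the list."""
    counts = Counter()
    order = list(servers)
    for _ in range(num_requests):
        counts[order[0]] += 1          # client uses the first listed address
        order = order[1:] + order[:1]  # rotate the list for the next reply
    return counts
```

With ℓ servers and a multiple of ℓ requests, every server receives exactly a 1/ℓ fraction, which mirrors the equal-rate splitting described above.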
We propose a slightly more sophisticated stochastic splitting policy that allows for better load balancing and additionally preserves the Poisson property of the request streams. Suppose the DNS attaches a vector (pj1, . . . , pjℓ) with Σi pji = 1 to the list of each stream j. In this way, every individual request in stream j can be directed to the ith server on this list with probability pji. This policy breaks Poisson stream j into ℓ Poisson streams of rates pj1 λj, . . . , pjℓ λj, respectively. The possibility to split streams into smaller substreams can obviously reduce the maximum latency. It is not obvious, however, whether it is easier or more difficult to find an optimal assignment if every stream is allowed to be broken into a bounded number of substreams. Observe that the allocation problem above is a variant of machine scheduling in which streams correspond to jobs and servers to machines. In the context of machine scheduling, bounded splittability has been investigated before with the motivation to speed up the execution of parallel programs. We first introduce the relevant background in scheduling. Scheduling on Uniformly Related Machines. Suppose a set of jobs [n] = {1, . . . , n} needs to be scheduled on a set of machines [m] = {1, . . . , m}. Jobs are described by sizes λ1, . . . , λn ∈ ℝ>0, and machines are described by their speeds s1, . . . , sm ∈ ℝ>0. In the classical, unsplittable scheduling problem on uniformly related machines, every job must be assigned to exactly one machine. This mapping can be described by an assignment matrix (xij)i∈[m],j∈[n], where xij is an indicator variable with xij = 1 if job j is assigned to machine i and 0 otherwise. The objective is to minimize the makespan z = max_{i∈[m]} Σ_{j∈[n]} λj xij / si. It is
well known that this problem is strongly NP-hard. Hochbaum and Shmoys [5,6] gave the first polynomial time approximation schemes (PTAS) for this problem. If the number of machines is fixed, then the problem is only weakly NP-hard and admits a fully polynomial time approximation scheme (FPTAS) [7]. A fractional relaxation of the problem leads to splittable scheduling. In the fully splittable scheduling problem the variables xij can take arbitrary real values from [0, 1] subject to the constraints Σ_{i∈[m]} xij ≥ 1, for every j ∈ [n]. This problem is trivially solvable, e.g., by assigning to each machine a piece of each job whose size is proportional to the speed of the machine. k-Splittable Machine Scheduling and the SAC Problem. In the k-splittable machine scheduling problem each job can be broken into at most k ≥ 2 pieces that must be placed on different machines, i.e., at most k of the variables xij ∈ [0, 1], for every j, are allowed to be positive. Recently, Shachnai and Tamir [12] introduced a generalization of this problem, called scheduling with machine allotment constraints (SAC). In this problem, each job j has its own splittability parameter kj ≥ 1. In our study, we will mostly assume kj ≥ 2, for every j ∈ [n]. Shachnai and Tamir [12] prove that, in contrast to the fully splittable scheduling problem, the k-splittable machine scheduling problem is strongly NP-hard even on identical machines. They also give a PTAS for the SAC problem whose running time, however, is impractical, as the splittability appears doubly exponentially in it. As a more practical result, they present a very fast maxj(1 + 1/kj)-approximation algorithm. This result suggests that, in fact, approximation should get easier when the splittability is increased.
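To make the definitions concrete, here is a small sketch (our own naming) of the makespan objective and of the trivial proportional solution for the fully splittable case:

```python
def makespan(x, lam, s):
    """Makespan z = max_i sum_j lam[j]*x[i][j] / s[i] for assignment matrix x,
    job sizes lam, and machine speeds s."""
    m, n = len(s), len(lam)
    return max(sum(lam[j] * x[i][j] for j in range(n)) / s[i] for i in range(m))

def fully_splittable(lam, s):
    """Trivial optimum when jobs are fully splittable: give machine i a
    fraction s[i]/sum(s) of every job, which equalizes all machine loads."""
    total_speed = sum(s)
    return [[si / total_speed for _ in lam] for si in s]
```

The proportional assignment makes every machine finish at time sum(lam)/sum(s), the obvious lower bound for any fractional schedule.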
We should mention here that there is a related scheduling problem in which preemption is allowed, that is, jobs can be split arbitrarily but pieces of the same job cannot be processed at the same time on different machines. Shachnai and Tamir also study combinations of SAC and scheduling with preemption in which jobs can be broken into a bounded number of pieces and additionally there are bounds on the number of pieces that can be executed at the same time. Further variants of scheduling with different notions of splittability, with motivations from parallel computing and production planning, can be found in [12] and [13]. Scheduling with Non-linear Latency Functions. The only difference between the k-splittable scheduling and traffic allocation problems is that the latency occurring at servers may not be linear. A typical example of a latency function at a server of speed s with an incoming Poisson stream at rate λ is fs(λ) = λ/(s(s − min{s, λ})). This family of functions can be derived from the formula for the waiting time on an M/M/1 queueing system. Of course, M/M/1 waiting time is only one out of many examples of latency functions that can be obtained from Queueing Theory. In fact, a typical property of such functions is that the latency goes to infinity when the injection rate approaches the service rate. Instead of focusing on particular latency functions, we will set up a more general framework to analyze the effects of non-linearity. The k-splittable traffic allocation problem is a variant of k-splittable scheduling. Streams are described by rates λ1, . . . , λn, and servers by bandwidths or service rates s1, . . . , sm. Hence, traffic streams can be identified with jobs and servers with machines.
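As an illustration, the M/M/1 waiting-time function mentioned above can be written down directly; the function name is ours:

```python
def mm1_latency(s, lam):
    """M/M/1 waiting-time latency f_s(lam) = lam / (s * (s - min(s, lam))),
    taken as infinite once the injection rate reaches the service rate."""
    if lam >= s:
        return float('inf')
    return lam / (s * (s - lam))
```

Note the extreme behavior: the value blows up as lam approaches s, which is exactly what rules out finite approximation factors in Section 5.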
The latencies occurring at the servers are described by a family of latency functions F = {fs : ℝ≥0 → ℝ≥0 ∪ {∞} | s ∈ ℝ>0}, where fs denotes a non-decreasing latency function for a server with service rate s. Scheduling under non-linear latency functions has been considered before. Alon et al. [2] give a PTAS for makespan minimization on identical machines with certain well-behaved latency functions. This was extended to a PTAS for makespan minimization on uniformly related machines by Epstein and Sgall [4]. In both studies, the latency functions must fulfill some analytical properties like convexity and uniform continuity under a logarithmic scale. Unfortunately, the uniform continuity condition excludes typical functions from Queueing Theory. Our Results. The main result of this paper is a fixed-parameter tractable algorithm for the k-splittable scheduling problem and the more general SAC problem with splittability at least two for every job. Our algorithm has polynomial running time for every fixed number of machines. This result is remarkable as unsplittable scheduling is known to be NP-hard already on two machines. In fact, our result is the first proof that bounded splittability reduces the complexity of scheduling. In more detail, given any upper bound T on the makespan of an optimal assignment, our algorithm computes a feasible assignment with makespan at most T in time O(n + m^(m+m/(k0−1))) with k0 = min{k1, k2, . . . , kn}. Furthermore, despite the possibility to split the jobs into pieces of non-rational size, we prove that the optimal makespan can be represented by a rational number with only a polynomial number of bits. Thus the optimal makespan can be found by using binary search techniques over the rationals. This yields an exact, polynomial-time algorithm for SAC with a fixed number of machines. (We have recently improved the running time in the case of identical machines [1].)
Note that this problem is strongly NP-hard when the number of machines is not fixed and k0 ≥ 2 [12]. In addition, we study the effects due to the non-linearity of latency functions. The algorithm above can be adapted to work efficiently for a wide class of latency functions containing even such extreme functions as M/M/1 waiting time. On the negative side, we prove that latency functions like M/M/1 do not admit polynomial time approximation algorithms with finite approximation ratio if the number of machines is unbounded. The latter result is an ultimate rationale for our approach of devising efficient algorithms for a fixed number of machines.
2 An Exact Algorithm for SAC with Given Makespan
We present here an exact algorithm for SAC with kj ≥ 2 for every job. Our algorithm has polynomial running time for any fixed number of machines. We assume that an upper bound on the optimal makespan is given. This upper bound defines a capacity for each machine. The capacity of machine i is denoted by ci. The computed schedule has to satisfy Σ_{j∈[n]} λj xij ≤ ci, for every i ∈ [m]. A difficult subproblem to be solved is to decide into which pieces of which size the jobs should be cut. In principle, the number of possible cuts is unbounded. We will show that it suffices to consider only those cuts that "saturate" a machine.
Let πij = λj xij denote the size of the piece of job j allocated to machine i. Machine i is saturated by job j if πij = ci. Our algorithm (Algorithm 1) schedules the bulkiest job j first, where the bulkiness of j is λj/(kj − 1). Using backtracking, it tries all ways to cut one piece from job j such that a machine is saturated. The saturated machine is removed from the problem; the splittability and size of j are reduced accordingly. The remaining problem is solved recursively. Two special cases arise. If j is too small to saturate kj machines, all remaining jobs can be scheduled using a simple greedy approach known as McNaughton's rule [10]. Since the splittability kj of a job is decreased whenever a piece is cut off, a remaining piece can eventually become unsplittable. Since this remaining piece will be infinitely bulky, it will be scheduled next. In this case, all machines that can accommodate the piece are tried. For the precise description see Fig. 1.

I := [m]; J := [n]                        -- Machines to be saturated; Jobs to be scheduled
if Σ_{j∈J} λj > Σ_{i∈I} ci ∨ ¬solve() then output "no solution possible"
else output nonzero πij values

Function solve() : Boolean
  if J = ∅ then return true
  find a j ∈ J that maximizes λj/(kj − 1)
  if kj = 1 then                          -- Unsplittable remaining piece
    forall i ∈ I with ci ≥ λj do
      πij := λj; ci := ci − λj; J := J \ {j}          -- (*)
      if solve() then return true
      undo changes made in line (*)
  else                                    -- Job j is splittable
    if λj/(kj − 1) ≤ min{ci : i ∈ I} then McNaughton(); return true
    forall i ∈ I with ci < λj do
      πij := ci; λj := λj − ci; kj := kj − 1; I := I \ {i}     -- (**)
      if solve() then return true
      undo changes made in line (**)
  return false

Procedure McNaughton()                    -- Schedule greedily
  pick any i ∈ I
  foreach j ∈ J do
    while ci ≤ λj do
      πij := ci; λj := λj − ci; I := I \ {i}; pick any new i ∈ I
    πij := λj; ci := ci − λj

Fig. 1. Algorithm 1: Find a schedule of n jobs with splittabilities kj on m machines.
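The pseudocode of Fig. 1 can be turned into a compact backtracking program. The sketch below is our own simplified rendering (dictionary-based piece matrix, no priority queue for selecting the bulkiest job), not the authors' implementation:

```python
import math

def sac_schedule(lam, k, cap):
    """Backtracking search in the spirit of Algorithm 1: jobs with sizes
    lam[j] and splittabilities k[j], machine capacities cap[i]. Returns a
    dict {(machine, job): piece size} or None if no schedule exists."""
    lam, k, cap = list(lam), list(k), list(cap)
    I, J, pi = set(range(len(cap))), set(range(len(lam))), {}
    if sum(lam) > sum(cap):
        return None

    def mcnaughton():
        # Greedy wrap-around fill; valid under the precondition of Lemma 1
        # (exact integer/rational arithmetic assumed).
        avail = sorted(I)
        i = avail.pop()
        for j in sorted(J):
            while lam[j] > 0:
                piece = min(cap[i], lam[j])
                pi[(i, j)] = pi.get((i, j), 0) + piece
                cap[i] -= piece
                lam[j] -= piece
                if cap[i] == 0 and avail:
                    i = avail.pop()

    def solve():
        if not J:
            return True
        # bulkiest job; an unsplittable remainder (k=1) is infinitely bulky
        j = max(J, key=lambda t: math.inf if k[t] == 1 else lam[t] / (k[t] - 1))
        if k[j] == 1:                       # place the unsplittable piece
            for i in sorted(I):
                if cap[i] >= lam[j]:
                    pi[(i, j)] = lam[j]; cap[i] -= lam[j]; J.remove(j)
                    if solve():
                        return True
                    J.add(j); cap[i] += lam[j]; del pi[(i, j)]   # undo (*)
            return False
        if lam[j] / (k[j] - 1) <= min(cap[i] for i in I):
            mcnaughton()
            return True
        for i in sorted(I):                 # cut a piece that saturates i
            if cap[i] < lam[j]:
                saved = cap[i]
                pi[(i, j)] = saved; lam[j] -= saved; k[j] -= 1; I.remove(i)
                if solve():
                    return True
                I.add(i); k[j] += 1; lam[j] += saved; del pi[(i, j)]  # undo (**)
        return False

    return pi if solve() else None
```

For instance, two jobs of size 3 with splittability 2 fit on three machines of capacity 2, while a single unsplittable job of size 5 does not fit on two machines of capacity 2.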
Theorem 1. Algorithm 1 finds a feasible solution for SAC with a given possible makespan, provided that the splittability of each job is at least two. It can be implemented to run in time O(n + m^(m+m/(k0−1))), where k0 = min{k1, . . . , kn}.

Proof. All the necessary data structures can be initialized in time O(m + n) if we use a representation of the piece size matrix (πij) that only stores nonzero entries. There can be at most m recursive calls that saturate a machine and at
most m/(k0 − 1) recursive calls made for unsplittable pieces that remain after a job j was split kj − 1 times. All in all, the backtrack tree considers no more than m! · m^(m/(k0−1)) possibilities. The selection of the bulkiest job can be implemented to run in time O(log m) independent of n: only the m largest jobs can ever be candidates. Hence it suffices to select these jobs initially using an O(n) time algorithm [3] and keep them in a priority queue data structure. Greedy scheduling using McNaughton's rule takes time O(n + m). Overall, we get an execution time of O(n + m + m! · m^(m/(k0−1)) log m) = O(n + m^(m+m/(k0−1))).

The algorithm also produces only correct schedules. In particular, solve() maintains the invariant Σ_{j∈J} λj ≤ Σ_{i∈I} ci, and when McNaughton's rule is called with λj/(kj − 1) ≤ min{ci : i ∈ I}, it can complete the schedule because no remaining job is large enough to saturate more than kj − 1 of the remaining machines:

Lemma 1. McNaughton's rule computes a correct schedule if Σ_{j∈J} λj ≤ Σ_{i∈I} ci and ∀i ∈ I, j ∈ J : λj/(kj − 1) ≤ ci.

Proof. The only thing that can go wrong is that a job j is split more than kj − 1 times, i.e., into ≥ kj + 1 pieces. Then it completely fills at least kj − 1 machines with capacity at least min_{i∈I} ci, contradicting λj/(kj − 1) ≤ ci.

Now we come to the interesting part of the proof. We have to show that the search succeeds if a feasible schedule exists. We show the stronger claim that the algorithm is correct even if unsplittable jobs are present. (In this case only the above running time analysis would fail.) The proof is by induction on m. For m = 1 this is trivial since no splits are necessary. Consider the case m > 1. If there are unsplittable jobs, they are infinitely bulky and so are scheduled first. Since all possible placements for them are tried, nothing can be missed for them.
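McNaughton's greedy wrap-around rule referenced in Lemma 1 admits a very short standalone implementation; this sketch (our naming) assumes the capacity precondition Σλj ≤ Σci and exact arithmetic:

```python
def mcnaughton(lam, cap):
    """Greedy wrap-around rule [10]: fill machines one after another,
    spilling the remainder of a job over to the next machine.
    Returns pieces as {(machine, job): size}."""
    pieces = {}
    lam, cap = list(lam), list(cap)
    i = 0
    for j in range(len(lam)):
        while lam[j] > 0:
            piece = min(cap[i], lam[j])   # fill machine i as far as possible
            pieces[(i, j)] = piece
            cap[i] -= piece
            lam[j] -= piece
            if cap[i] == 0 and i + 1 < len(cap):
                i += 1                    # machine saturated; move on
    return pieces
```

Each job is cut only when a machine boundary is crossed, so under the lemma's precondition no job is split into more than kj pieces.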
When a splittable job is bulkiest, only those splits are considered that saturate one machine. Lemma 2 shows that if there is a feasible schedule, there must also be one with this property. The recursive call leaves a problem with one machine less and the induction hypothesis is applicable.

Lemma 2. If a feasible schedule exists and the bulkiest job is large enough to saturate a machine, then there is a feasible schedule where the bulkiest job saturates a machine.

Our approach to proving Lemma 2 is to show that any feasible schedule can be transformed into a feasible schedule where the bulkiest job saturates a machine. To simplify this task, we first establish a toolbox of simpler transformations. We begin with two very simple transformations that affect only two jobs and obviously maintain feasibility. See Fig. 2-(a) and 2-(b) for illustrations.

Lemma 3. For any feasible schedule, consider two jobs p and q sharing machine i′, i.e., π_{i′p} > 0 and π_{i′q} > 0. For any machine i such that π_{i′q} < π_{ip} there is a feasible schedule where the overlapping piece of q is moved to machine i, i.e., (π_{i′p}, π_{ip}, π_{i′q}, π_{iq}) := (π_{i′p} + π_{i′q}, π_{ip} − π_{i′q}, 0, π_{iq} + π_{i′q}).
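The move of Lemma 3 is easy to check mechanically: both machine loads are invariant and q loses one piece. A small sketch with our own naming:

```python
def move_piece(pi, p, q, i_shared, i_dest):
    """Apply the Lemma 3 move: shift q's piece on the shared machine
    i_shared to i_dest, compensated by p's piece on i_dest, so that both
    machine loads are unchanged. `pi` maps (machine, job) -> piece size.
    Requires pi[(i_shared, q)] < pi[(i_dest, p)]."""
    delta = pi[(i_shared, q)]
    assert delta < pi[(i_dest, p)]
    pi[(i_shared, p)] = pi.get((i_shared, p), 0) + delta
    pi[(i_dest, p)] -= delta
    pi[(i_shared, q)] = 0
    pi[(i_dest, q)] = pi.get((i_dest, q), 0) + delta
    return pi
```

The assertion mirrors the lemma's precondition; after the move, job p still occupies both machines while q has vacated the shared one.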
Fig. 2. Manipulating schedules. Lines represent jobs. (Bent) boxes represent machines. (a): The move from Lemma 3; (b): The swap from Lemma 4; (c): Saturation using Lemma 5; (d): The rotation from Lemma 6; (e): Moving j away from r.
Lemma 4. For any feasible schedule, consider two jobs p and q sharing machine i, i.e., π_{ip} > 0 and π_{iq} > 0. Furthermore, consider two other pieces π_{i_p p} and π_{i_q q} of p and q on machines i_p and i_q, respectively. If π_{i_p p} ≤ π_{iq} + π_{i_q q} and π_{i_q q} ≤ π_{ip} + π_{i_p p}, then there is a feasible schedule where the pieces π_{i_p p} and π_{i_q q} are swapped as follows:

(π_{i_p p}, π_{ip}, π_{i_q p}, π_{i_p q}, π_{iq}, π_{i_q q}) := (0, π_{ip} + π_{i_p p} − π_{i_q q}, π_{i_q p} + π_{i_q q}, π_{i_p q} + π_{i_p p}, π_{iq} + π_{i_q q} − π_{i_p p}, 0)

As a first application of Lemma 3 we now explain how a large job j allocated to at most kj − 1 machines can "take over" a small machine.

Lemma 5. Consider a job j and a machine i such that λj/(kj − 1) ≥ ci. If there is a feasible schedule where j is scheduled on at most kj − 1 machines, then there is a feasible schedule where j saturates machine i.

Proof. Let i′ denote a machine index that maximizes π_{i′j} and note that π_{i′j} ≥ λj/(kj − 1) ≥ ci. We can now apply Lemma 3 to subsequently move all the pieces on machine i to machine i′. Lemma 3 remains applicable because π_{i′j} is large enough to saturate machine i. See Fig. 2-(c) for an illustration.

After the above local transformations, we come to a global transformation that greatly simplifies the kind of schedules we have to consider.

Definition 1. Job j is called split if |{i : πij > 0}| > 1. The split graph corresponding to a schedule is an undirected hypergraph G = ([m], E) where each split job j corresponds to a hyperedge {i : πij > 0} ∈ E.

Lemma 6. If a feasible schedule exists, then there is a feasible schedule whose split graph is a forest.

Proof. It suffices to show that for a feasible schedule whose split graph G contains a cycle there is also a feasible schedule whose corresponding split graph has a smaller value of Σ_{e∈E} |e|. Then it follows that a feasible schedule that minimizes Σ_{e∈E} |e| is a forest. So suppose G contains a cycle involving ℓ edges. Let succ(j) stand for (j mod ℓ) + 1. By appropriately renumbering machines and jobs we can assume
without loss of generality that this cycle is made up of jobs 1 to ℓ and machines 1 to ℓ such that, for j ∈ [ℓ], π_{jj} > 0, π_{succ(j)j} > 0, and δ = π_{11} = min_{j∈[ℓ]} min{π_{jj}, π_{succ(j)j}}. Fig. 2-(d) depicts this normalized situation. Now we rotate the pieces in the cycle by decreasing π_{jj} by δ and increasing π_{succ(j)j} by the same amount. The schedule remains feasible since the load of the machines in the cycle remains unchanged. Since the first job is now split into one piece less, Σ_{e∈E} |e| decreases.

Now we have all the necessary tools to establish Lemma 2.

Proof. Consider any feasible schedule; let j denote the bulkiest job and let there be a machine i0 with λj/(kj − 1) ≥ c_{i0}. We transform this schedule in several steps. We first apply Lemma 6 to obtain a schedule whose split graph is a forest. We now concentrate on the tree T where j is allocated. If job j is allocated to at most kj − 1 machines, we can saturate i0 using Lemma 5 and we are done. If one piece of j is allocated to a leaf i of T, then all other jobs mapped to machine i are allocated there entirely. Let i′ denote another machine j is mapped to. We apply Lemma 3 to move small jobs from i to i′. When this is no longer possible, either job j saturates machine i and we are done, or there is a job j′ with λ_{j′} = π_{ij′} > π_{i′j}. Now we can apply Lemma 4 to the pieces π_{ij}, π_{i′j}, π_{ij′}, and a zero size piece of job j′. This transformation reduces the number of pieces of job j so that we can saturate machine i0 using Lemma 5. Finally, j could be allocated to machines that are all interior nodes of T. We focus on the two largest pieces π_{ij} and π_{i′j}, so that π_{ij} + π_{i′j} ≥ 2λj/kj. Now fix a leaf r that is connected to i′ via a path that does not involve j as an edge. This is possible since j is connected to interior nodes only. Now we intend to move job j away from r, i.e., we transform the schedule such that the path between node r and job j becomes longer.
(The path between a node v and a job e in a tree starts at v and uses edges e′ ≠ e until a node is reached that has e as an incident edge.) We do this iteratively until j is incident to a leaf in T. Then we can apply the transformations described above and we are done. We first apply Lemma 3 to move small pieces of jobs allocated to machine i to machine i′. Although this changes the shape of T, it leaves the distance between job j and r invariant unless j ends up in machine i completely, in which case we can apply Lemma 5 and we are done. When Lemma 3 is no longer applicable, either j saturates machine i and we are done, or there is a job q with π_{iq} > π_{i′j}. In that case we consider the smallest other piece π_{i_q q} of job q. More precisely, if q is split into at most kq − 1 nonzero pieces we pick some i_q with π_{i_q q} = 0. Otherwise we pick i_q = min{ℓ ≠ i : π_{ℓq} > 0}. In either case π_{i_q q} ≤ λq/(kq − 1). Recall that π_{ij} + π_{i′j} ≥ 2λj/kj since this sum is invariant under the move operations we have performed. Furthermore, j is the bulkiest job, so that

π_{i_q q} ≤ λq/(kq − 1) ≤ λj/(kj − 1) = 2λj/((kj − 1) + (kj − 1)) ≤ 2λj/((kj − 1) + 1) = 2λj/kj ≤ π_{ij} + π_{i′j}.
So we can apply Lemma 4 to the pieces of job j on machines i and i′ and to the pieces of job q on machines i and i_q. This increases the distance from job j to machine r, as desired. Fig. 2-(e) gives an example where we apply Lemma 3 once and then Lemma 4.
3 Finding the Optimal Makespan
We assumed so far that an upper bound on the optimal makespan is known. The obvious idea now is to find the optimal makespan using binary search. In order to show that this search terminates, one needs to prove that the optimal makespan is a rational number. This is not completely obvious as, in principle, jobs might be broken into pieces of non-rational size. The following lemma, however, shows that the optimal makespan can be represented by rational numbers of polynomial length. Let Qℓ denote the set consisting of the non-negative rational numbers that can be represented by an ℓ-bit numerator and an ℓ-bit denominator, together with the symbol ∞. The proof of the next lemma is omitted in this extended abstract.

Lemma 7. There is a constant κ > 0 s.t. the value of an optimum solution to the SAC problem with kj ≥ 2 (for all j) is in Q_{N^κ}, where N is the problem size.

By Lemma 7, the optimal makespan can be found by binary search methods over the rationals (see, e.g., [9,11]) with Algorithm 1 as a decision oracle. Thus:

Corollary 1. For every fixed number of machines, there is an exact polynomial time optimization algorithm for the SAC problem with splittability at least two.
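The binary search suggested above can be sketched as follows. This is a plain bisection that snaps to a nearby short rational rather than one of the exact rational-search methods of [9,11], and all names are ours:

```python
from fractions import Fraction

def optimal_makespan(feasible, upper, max_denominator=10**6):
    """Binary-search the smallest makespan T in [0, upper] for which the
    monotone decision oracle `feasible(T)` (e.g. Algorithm 1) succeeds,
    then snap the result to a nearby short rational. Lemma 7 justifies
    that the true optimum is a rational of polynomial length."""
    lo, hi = Fraction(0), Fraction(upper)
    for _ in range(60):          # fixed number of bisection rounds
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid             # invariant: feasible(hi) holds
        else:
            lo = mid
    return hi.limit_denominator(max_denominator)
```

With an optimum of 3/7, sixty rounds leave hi within 2^-60 of the optimum, so `limit_denominator` recovers 3/7 exactly.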
4 Solving the Traffic Allocation Problem
In this section, we show how to apply the binary search approach to the traffic allocation problem, i.e., we solve the SAC problem with non-linear latency functions. We need to make some very modest assumptions about these functions. A latency function is monotone if it is positive and non-decreasing. The functions need not be continuous or strictly increasing, e.g., step functions are covered. For a monotone function f : ℝ≥0 → ℝ≥0 ∪ {∞}, let the inverse of f be defined by f⁻¹(y) = sup{λ | f(λ) ≤ y}, for y ≥ f(0), and f⁻¹(y) = 0, for y < f(0). We say that a function f is polynomially length-bounded if for every λ ∈ Qℓ, f(λ) ∈ Q_{poly(ℓ)}. For example, the M/M/1 waiting time function is polynomially length-bounded although lim_{λ→s⁻} fs(λ) = ∞. This is because, for λ, s ∈ Qℓ with λ < s, one can show (s − λ) ∈ Q_{2ℓ}, s(s − λ) ∈ Q_{4ℓ}, and λ/(s(s − λ)) ∈ Q_{8ℓ}, so that fs(λ) ∈ Q_{8ℓ}. We say that a family of latency functions F is efficiently computable if, for every s, λ ∈ Qℓ, fs(λ) and fs⁻¹(λ) can be calculated in time polynomial in ℓ. Observe that the functions from an efficiently computable family must also be polynomially length-bounded. It is easy to check that the M/M/1 waiting time family and other typical function families from Queueing Theory are efficiently computable. We obtain the following result, whose proof is omitted.
Theorem 2. Let F be any efficiently computable family of monotone functions. Consider the SAC problem with latency functions from F and splittability at least two. Suppose the best possible maximum latency can be represented by a number from Qℓ. Then an optimal solution can be found in time O(poly(ℓ) · (n + m^(m+m/(k0−1)))) with k0 = min{kj : j = 1, 2, . . . , n}.

Note that ℓ is an obvious lower bound on the running time of any algorithm computing the exact, optimal makespan. It is unclear if there exist latency functions s.t. ℓ cannot be bounded polynomially in the input length. If an appropriate upper bound on ℓ is not known in advance, we can use geometric search, which can be stopped after computing the optimal latency with the desired precision.
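For a concrete family such as M/M/1, the inverse f⁻¹ required of an efficiently computable family is available in closed form; the sketch below (our naming) solves λ = y · s(s − λ) for λ:

```python
def mm1_inverse(s, y):
    """Inverse of the M/M/1 waiting-time function f_s(lam) = lam/(s*(s-lam)):
    f^{-1}(y) = sup{lam : f_s(lam) <= y}. Solving lam = y*s*(s - lam) gives
    lam = y*s^2 / (1 + y*s); for y below f_s(0) = 0 the inverse is 0."""
    if y < 0:
        return 0.0
    return y * s * s / (1 + y * s)
```

Note that the result is always strictly below s and tends to s as y grows, matching the pole of the latency function at the service rate.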
5 Non-approximability for Non-linear Scheduling
The M/M/1 waiting time cost function family defined in Section 1 has a pole going to infinity as λ → s⁻. Intuitively, this pole reflects the capacity restriction on the servers, and it is typical also for other families that can be derived from Queueing Theory. The following theorem, whose proof is omitted, shows that non-linear k-splittable scheduling, even with identical servers, is completely inapproximable.

Theorem 3. Let F be an efficiently computable family of monotone latency functions. Suppose there is s ∈ ℝ>0 s.t. lim_{λ→s⁻} fs(λ) = ∞. Then there does not exist a polynomial time approximation algorithm with finite approximation ratio for the non-linear k-splittable scheduling problem under F, provided P ≠ NP.
References
1. A. Agarwal, T. Agarwal, S. Chopra, A. Feldmann, N. Kammenhuber, P. Krysta and B. Vöcking. An Experimental Study of k-Splittable Scheduling for DNS-Based Traffic Allocation. To appear in Proc. of the 9th EUROPAR, 2003.
2. N. Alon, Y. Azar, G. J. Woeginger and T. Yadid. Approximation schemes for scheduling on parallel machines. Journal of Scheduling, 1:55–66, 1998.
3. M. Blum, R. Floyd, V. Pratt, R. Rivest, and R. Tarjan. Time bounds for selection. J. Computer and System Science, 7(4):448–461, August 1973.
4. L. Epstein and J. Sgall. Approximation schemes for scheduling on uniformly related and identical parallel machines. Proc. of the 7th ESA, 151–162, 1999.
5. D. S. Hochbaum and D. B. Shmoys. Using dual approximation algorithms for scheduling problems: theoretical and practical results. J. ACM, 34:144–162, 1987.
6. D. S. Hochbaum and D. B. Shmoys. A polynomial approximation scheme for scheduling on uniform processors: using the dual approximation approach. SIAM Journal on Computing, 17:539–551, 1988.
7. E. Horowitz and S. K. Sahni. Exact and approximate algorithms for scheduling nonidentical processors. J. ACM, 23:317–327, 1976.
8. J. F. Kurose and K. W. Ross. Computer networking: a top-down approach featuring the Internet. Addison-Wesley, 2001.
9. St. Kwek and K. Mehlhorn. Optimal search for rationals. Information Processing Letters, 86:23–26, 2003.
510
Piotr Krysta, Peter Sanders, and Berthold V¨ ocking
10. R. McNaughton. Scheduling with deadlines and loss functions. Management Science, 6:1–12, 1959. 11. C. H. Papadimitriou. Efficient search for rationals. Information Processing Letters, 8:1–4, 1979. 12. H. Shachnai and T. Tamir. Multiprocessor Scheduling with Machine Allotment and Parallelism Constraints. Algorithmica, 32(4): 651–678, 2002. 13. W. Xing and J. Zhang. Parallel machine scheduling with splitting jobs. Discrete Applied Mathematics, 103: 259–269, 2000.
Computing Average Value in Ad Hoc Networks

Mirosław Kutyłowski¹ and Daniel Letkiewicz²

¹ Inst. of Mathematics, Wrocław University of Technology, [email protected]
² Inst. of Engineering Cybernetics, Wrocław University of Technology, [email protected]
Abstract. We consider a single-hop sensor network with n = Θ(N ) stations using R independent communication channels. Communication between the stations can fail at random or be scrambled by an adversary so that it cannot be distinguished from a random noise. Assume that each station Si holds an integer value Ti . The problem that we consider is to replace the values Ti by their average (rounded to integer values). A typical situation is that we have a local sensor network that needs to make a decision based on the values read by sensors by computing the average value or some kind of voting. We design a protocol that solves this problem in O(N/R · log N ) steps. The protocol is robust: a constant random fraction of messages can be lost (by communication channel failure, by action of an adversary or by synchronization problems). Also a constant fraction of stations may go down (or be destroyed by an adversary) without serious consequences for the rest. The algorithm is well suited for dynamic systems, for which the values Ti may change and the protocol once started works forever. Keywords: mobile computing, radio network, sensor network
1 Introduction

Ad hoc networks that communicate via radio channels gain importance due to many new application areas: sensor networks used to monitor the environment, self-organizing networks of mobile devices, and mobile networks used in military and rescue operations. Ad hoc networks provide many features that are very interesting from a practical point of view. They have no global control (which could be either attacked or accidentally destroyed) and should keep working if some stations leave or join the network. So systems based on ad hoc networks are robust (once they work). However, it is quite difficult to design efficient algorithms for ad hoc networks. Classical distributed algorithms have been designed for wired environments with quite different communication features. For instance, in many cases one can assume that an ad hoc network works synchronously (due to GPS signals); if the network works in a small area, then two stations may communicate directly (single-hop model) and there is no communication latency. On the other hand, stations compete for access to a limited number of radio channels. They may disturb each other, making the transmission
This research was partially supported by Komitet Badań Naukowych grant 8T11C 04419.
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 511–520, 2003. © Springer-Verlag Berlin Heidelberg 2003
unreadable if they broadcast at the same time on the same channel. Therefore quite a different algorithmic approach is necessary. Recently, there has been a lot of research on fundamental issues for ad hoc networks (as a starting point for references see [7]).

Problem Statement. In this paper we consider the following task. Each station Si of a network initially holds an integer value Ti. The goal is to compute the average value of all numbers Ti so that each station changes its Ti into the average value. We demand that the numbers held by the stations remain integers and that their sum is not changed (so at the end some small differences may be inevitable, and the stations hold not exactly the average value but values close to it). The intuition is that Ti might be a physical value measured by a sensor, or a preference of Si expressed as an integer value (for instance, 0 meaning totally against, 50 meaning undecided, and 100 a fully supporting voice). The network may have to compute the average in order to get an output of a group of sensors or to make a common decision regarding its behavior. This task, which can be solved trivially in most computing systems (for instance, by collecting data, simple arithmetic, and broadcasting the result), becomes nontrivial in ad hoc networks.

Computation Model. We consider networks consisting of identical stations with no IDs (the idea is that it is unpredictable which stations appear in the network and the devices are bulk produced). However, we assume that the stations know n, the number of stations in the network, within a constant factor. This parameter is called N for the rest of the paper. (Let us remark that N can be derived by an efficient algorithm [5].) Communication between stations is accomplished through R independent communication channels labeled by the numbers 0 through R−1.
A station may either send a message through a chosen channel or listen to a chosen channel (but not both at the same time, according to the IEEE 802.11 standard). If more than one station is sending on the same channel, then a collision occurs and the messages on this channel are scrambled. We assume that a station listening to a channel on which there is a collision receives noise and cannot even recognize that a collision has occurred (no-collision-detection model). In this paper we consider only networks that are concentrated in a local area: we assume that if a station sends a message, then every station (except the sender) can hear it. So we talk about a single-hop network. The computation of each station consists of steps. During a step a station may perform a local computation and either send or receive messages through a chosen channel. For the sake of simplicity of presentation we assume that the computation is synchronous. However, our results also hold for asynchronous systems, provided that the stations work with comparable speeds and lack of synchronization may only result in failure of a constant fraction of communication rounds. We do not use any global clock available to all stations. (In fact, our algorithm can be used to agree upon a common time for other purposes.)

Design Goals. We design a protocol that has to remain stable and efficient in the following sense:
– Each station may break down or leave the network. However, we assume that Ω(N) stations remain active.
– A message sent by one station is not received by a listening station with probability not lower than p, where p < 1 is fixed.
– An adversary, who knows all details of the algorithm, may scramble communication for a constant fraction of the total communication time over all communication channels. (So no "hot spots" in the communication pattern of the protocol are admissible; they would be easily attacked by an adversary.)
– The protocol has to be suited for dynamic systems, which once started compute the average forever. So it has to be applicable in systems where the values Ti may change. (For a discussion of dynamic systems see [8].) Preferably, a solution should be periodic, with the same code executed repeatedly.
– The protocol should ensure that the number of steps at which a station transmits a message or listens is minimized. Also, there should be no station that is involved in communication substantially longer than an average station. This is because energy is consumed mainly by radio communication, and battery-operated devices have only limited energy resources.

Former Results. Computing an average value is closely related to load balancing in distributed systems. (In fact, in the latter case we have not only to compare the loads but also to forward some tasks.) However, those algorithms are designed for wired networks. An exception is a solution proposed by Ghosh and Muthukrishnan [4]: they propose a simple protocol based on random matchings in the connection graph. In one round of their protocol the load is balanced between the nodes connected by an edge of a matching. This approach is very different from straightforward algorithms that try to gather information and then make decisions based on it.
They provide an estimation of the convergence of this algorithm to the equal load based on global characteristics of the graph (its degree and the second eigenvalue). Their proof shows the decrease of a certain potential describing how much the load is unbalanced. The protocol of Ghosh and Muthukrishnan has been reused for permuting at random in a distributed system [2]. However, in this case the analysis is based on a rapid-mixing property of a corresponding Markov chain and a refined path coupling approach [3].

New Results. We extend the results from [4] and adapt them to the case of a single-hop ad hoc network. The issue is that for the protocol of Ghosh and Muthukrishnan one needs to show not only that a certain potential decreases fast, but also that there are no "bad points" in the network that have results far from the valid ones (if the potential is low, we can only guarantee that the number of bad points is small).

Theorem 1. Consider an ad hoc network consisting of Θ(N) stations using R communication channels. Let D denote the maximum difference of the form Ti − Tj at the beginning of the protocol. With high probability, i.e. probability at least 1 − 1/N, after executing O(N/R · (log N + log D)) steps of the protocol:
– the sum of all values Ti remains unchanged and each station keeps one value,
– either all stations keep the same value Ti, or these values differ by at most 1, or they differ by at most 2 and the number of stations that keep the biggest and the smallest values is bounded by ε · N for a small constant ε.
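For intuition, the matching-based balancing of [4] can be sketched in a few lines, specialized to the complete graph (the single-hop setting); the floor/ceil split on odd sums is an arbitrary sum-preserving choice, not necessarily the paper's exact rule.

```python
import random

def balance_round(vals: list[int], rng: random.Random) -> None:
    """One round: pair the nodes by a random perfect matching and
    average each matched pair, keeping values integral and the sum fixed."""
    idx = list(range(len(vals)))
    rng.shuffle(idx)
    for a, b in zip(idx[::2], idx[1::2]):
        s = vals[a] + vals[b]
        vals[a], vals[b] = s // 2, s - s // 2   # floor / ceil split

rng = random.Random(1)
vals = [rng.randrange(100) for _ in range(64)]
total = sum(vals)
for _ in range(40):
    balance_round(vals, rng)
print(sum(vals) == total, max(vals) - min(vals))
```

After a logarithmic number of rounds the spread is down to a couple of units while the sum is untouched, which is exactly the shape of the guarantee in Theorem 1.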
2 Protocol Description

The protocol repeats a single stage consisting of 3N/R steps; three consecutive steps are called a round. A stage is in fact one step of the protocol of Ghosh and Muthukrishnan.

Description of a Stage.
1. Each station Si chooses t, t′ ∈ {1, . . . , N} and a bit b uniformly at random (the choices of t, t′ and b are stochastically independent).
2. Station Si performs the following actions during round ⌈t/R⌉ on the channel t mod R.
– If b = 0, then Si transmits Ti at step 1. Otherwise, it transmits Ti at step 2.
– At step 3, station Si listens. If a message comes from another station with the transmitted Ti and another value, say Tj, then Si changes Ti as follows:
• if Ti + Tj is even, then station Si puts Ti := (Ti + Tj)/2,
• if Ti + Tj is odd, then station Si puts Ti := ⌈(Ti + Tj)/2⌉ if its b equals 0, and Ti := ⌊(Ti + Tj)/2⌋ otherwise.
3. If t′ ≠ t, then during round ⌈t′/R⌉ station Si uses channel t′ mod R:
– it listens during the first two steps,
– it concatenates the messages heard and sends them during the third step.

The idea of the protocol is that the 3N/R steps of a single stage are used as N slots in which pairs of stations can balance their values. If everything works fine, then for a given channel at a given round:
– during step 1 a station Su for which b = 0 sends Tu,
– during step 2 a station Sv for which b = 1 sends Tv,
– step 3 is used to avoid Byzantine problems [6]: another station Sw repeats Tu and Tv. Otherwise, neither Su nor Sv could be sure that its message came through. (An update of only one of the values Tu or Tv would violate the condition that the sum of all values must not change.)

Of course, such a situation happens only for some slots. However, standard considerations (see for instance [3]) show the following fact:

Lemma 1. With high probability, during a stage balancing does occur at step 3 for at least c · N slots, where c is a fixed constant, 0 < c < 1.
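A runnable sketch of a stage (the channel/round bookkeeping is collapsed to one loop over N slots, and the rule that the b = 0 station takes the ceiling on an odd sum is an assumption; only sum preservation matters):

```python
import random

def stage(vals: list[int], N: int, rng: random.Random) -> None:
    """One stage: N slots; a slot balances one b=0 sender with one b=1
    sender, provided exactly one third station repeats both values."""
    n = len(vals)
    t = [rng.randrange(N) for _ in range(n)]    # slot used for sending
    tp = [rng.randrange(N) for _ in range(n)]   # slot used for repeating
    b = [rng.randrange(2) for _ in range(n)]
    for slot in range(N):
        s0 = [i for i in range(n) if t[i] == slot and b[i] == 0]
        s1 = [i for i in range(n) if t[i] == slot and b[i] == 1]
        rep = [i for i in range(n) if tp[i] == slot and t[i] != slot]
        if len(s0) == len(s1) == 1 and len(rep) == 1:
            i, j = s0[0], s1[0]
            tot = vals[i] + vals[j]
            # assumed tie-break: the b=0 station takes the ceiling
            vals[i], vals[j] = -(-tot // 2), tot // 2

rng = random.Random(7)
vals = [rng.randrange(50) for _ in range(32)]
total = sum(vals)
for _ in range(300):
    stage(vals, 32, rng)
print(sum(vals) == total, max(vals) - min(vals))
```

Only slots with exactly one sender of each bit and exactly one repeater succeed; a constant fraction of the N slots per stage does, which is the content of Lemma 1.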
Note that Lemma 1 also holds if, for a constant fraction of slots, a communication failure occurs or an adversary scrambles the messages. Since the stations communicate at randomly chosen moments, it is difficult for an adversary to attack only some group of stations.
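A back-of-the-envelope estimate for the constant c of Lemma 1, assuming n = N and a Poisson approximation of the slot occupancies (the exact constant in the paper may differ):

```python
import math

# Poisson back-of-envelope for the per-slot success probability when
# n = N: a slot balances iff exactly two stations pick it for sending,
# their bits b differ, and exactly one station picks it for repeating.

def p_success(lam_send: float = 1.0, lam_repeat: float = 1.0) -> float:
    p_two_senders = math.exp(-lam_send) * lam_send ** 2 / 2   # Poisson, k = 2
    p_bits_differ = 0.5
    p_one_repeater = math.exp(-lam_repeat) * lam_repeat       # Poisson, k = 1
    return p_two_senders * p_bits_differ * p_one_repeater

print(p_success())
```

Under these assumptions the value comes out to e⁻²/4, a small but fixed constant, so a constant fraction of the N slots balances per stage.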
3 Analysis of the Protocol

The analysis consists of three different phases (even though the stations behave in exactly the same way all the time). In Phase I we show that some potential function reaches a certain low level; this part is borrowed from [4]. In Phase II we guarantee with high probability that all stations deviate by at most β from the average value. Then Phase III is used to show that with high probability all stations hold one of at most 3 consecutive values. In order to simplify the presentation we call the stations S1, . . . , Sn, even though the algorithm does not use any IDs of the stations.
3.1 Phase I

Let $T_{t,j}$ denote the value of $T_j$ held by $S_j$ immediately after executing stage $t$. Let $\bar{T}$ denote the average value, that is, $\bar{T} = \frac{1}{n}\sum_{i=1}^n T_{0,i}$. We examine the values $x_{t,j} = T_{t,j} - \bar{T}$. In order to examine the differences from the average value we consider the following potential function:
$$\Delta_t = \sum_{i=1}^n x_{t,i}^2.$$
Claim A. $E[\Delta_{t+1}] \le \rho \cdot \Delta_t + \frac{n}{4}$ for some constant $\rho < 1$.
Proof. We first assume that the new values held by two stations after balancing become equal (possibly reaching non-integer values). Then we make an adjustment to the real situation, where the values must remain integers. By linearity of expectation,
$$E[\Delta_{t+1}] = E\left[\sum_{i=1}^n x_{t+1,i}^2\right] = \sum_{i=1}^n E\left[x_{t+1,i}^2\right].$$
So now we inspect a single $E[x_{t+1,i}^2]$. As already mentioned, with a constant probability $\delta$ station $S_i$ balances $T_{t,i}$ with some other station, say with $S_s$. Assume that the values held by $S_i$ and $S_s$ become equal. So $x_{t+1,i}$ and $x_{t+1,s}$ become equal to $z = \frac{1}{2}(x_{t,i} + x_{t,s})$. Therefore
$$E\left[x_{t+1,i}^2\right] = (1-\delta)\cdot x_{t,i}^2 + \delta\cdot E\left[\left(\frac{x_{t,i}+x_{t,s}}{2}\right)^2\right]
= (1-\delta)\cdot x_{t,i}^2 + \delta\cdot E\left[\tfrac{1}{4}x_{t,i}^2 + \tfrac{1}{4}x_{t,s}^2 + \tfrac{1}{2}x_{t,i}\,x_{t,s}\right]
= (1-\tfrac{3}{4}\delta)\cdot x_{t,i}^2 + \tfrac{1}{4}\delta\cdot E\left[x_{t,s}^2\right] + \tfrac{1}{2}\delta\cdot E[x_{t,i}\cdot x_{t,s}].$$
Since $s$ is uniformly distributed over $\{1,\ldots,n\}$, we get
$$E\left[x_{t,s}^2\right] = \frac{1}{n}\sum_{j=1}^n x_{t,j}^2 = \frac{1}{n}\cdot\Delta_t.$$
The next expression we have to evaluate is
$$E[x_{t,i}\cdot x_{t,s}] = \frac{1}{n}\sum_{j=1}^n (x_{t,i}\cdot x_{t,j}) = \frac{1}{n}\cdot x_{t,i}\cdot\sum_{j=1}^n x_{t,j}.$$
Obviously, $\sum_{j=1}^n x_{t,j} = 0$. So $E[x_{t,i}\cdot x_{t,s}]$ equals 0 and finally
$$E\left[x_{t+1,i}^2\right] = (1-\tfrac{3}{4}\delta)\cdot x_{t,i}^2 + \tfrac{1}{4}\delta\cdot\frac{1}{n}\cdot\Delta_t.$$
When we sum up all the expectations $E[x_{t+1,i}^2]$ we get
$$E[\Delta_{t+1}] = (1-\tfrac{3}{4}\delta)\cdot\sum_{i=1}^n x_{t,i}^2 + \tfrac{1}{4}\delta\cdot\Delta_t = (1-\tfrac{3}{4}\delta)\cdot\Delta_t + \tfrac{1}{4}\delta\cdot\Delta_t = (1-\tfrac{1}{2}\delta)\cdot\Delta_t.$$
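The per-station identity at the heart of the computation above can be checked numerically (a sanity check, not part of the proof):

```python
import random

# Numeric check of the per-station step in the proof of Claim A:
#   (1-d)*x_i^2 + d*E_s[((x_i+x_s)/2)^2] = (1-(3/4)d)*x_i^2 + (1/4)d*Delta/n
# whenever the x_j sum to zero and s is uniform over all n stations.

rng = random.Random(0)
n, d = 50, 0.3
x = [rng.uniform(-10, 10) for _ in range(n)]
mean = sum(x) / n
x = [v - mean for v in x]                # enforce sum(x) = 0
Delta = sum(v * v for v in x)

for xi in x:
    lhs = (1 - d) * xi ** 2 + d * sum(((xi + xs) / 2) ** 2 for xs in x) / n
    rhs = (1 - 0.75 * d) * xi ** 2 + 0.25 * d * Delta / n
    assert abs(lhs - rhs) < 1e-9
print("per-station identity verified")
```

The cross term vanishes precisely because the deviations sum to zero, which is what makes the potential contract by a constant factor.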
Now, let us consider the case when $T_{t,i} + T_{t,s}$ (or equivalently $x_{t,i} + x_{t,s}$) is odd. Let us see how it contributes to the change of $\Delta_{t+1}$ compared to the value computed previously. In the simplified case, $S_i$ and $S_s$ contribute
$$2\cdot\left(\frac{x_{t,i}+x_{t,s}}{2}\right)^2$$
to the value of $\Delta_{t+1}$. Now, this contribution could be
$$\left(\frac{x_{t,i}+x_{t,s}+1}{2}\right)^2 + \left(\frac{x_{t,i}+x_{t,s}-1}{2}\right)^2.$$
For every $y$,
$$\left(\frac{y+1}{2}\right)^2 + \left(\frac{y-1}{2}\right)^2 = 2\cdot\left(\frac{y}{2}\right)^2 + \frac{1}{2},$$
so we have to increase the value computed for $\Delta_{t+1}$ by at most $\frac{1}{2}$ for each pair that has established a link. It follows finally that
$$E[\Delta_{t+1}] \le \rho\cdot\Delta_t + \frac{n}{4} \qquad (1)$$
for $\rho = (1 - \frac{1}{2}\delta)$.
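The rounding identity used above is elementary but easy to verify mechanically:

```python
# Mechanical check of the identity behind the n/4 slack in (1):
#   ((y+1)/2)^2 + ((y-1)/2)^2 = 2*(y/2)^2 + 1/2   for every y.

def rounding_slack(y: float) -> float:
    return ((y + 1) / 2) ** 2 + ((y - 1) / 2) ** 2 - 2 * (y / 2) ** 2

for k in range(-1000, 1001):
    assert abs(rounding_slack(k / 10) - 0.5) < 1e-9
print("rounding identity holds")
```

Since at most n/2 pairs balance in a stage, the total integrality loss per stage is at most n/4, which is the additive term in (1).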
Claim B. After $\tau_0 = O(\log D + \log n)$ stages, $\Delta_{\tau_0} \le \alpha \cdot n + 1$ with probability $1 - O(\frac{1}{n^2})$.

Proof. Let $\nabla_t = \Delta_t - \alpha n$, for $\alpha = \frac{1}{4(1-\rho)}$. By inequality (1),
$$E[\nabla_{t+1}] = E[\Delta_{t+1} - \alpha n] \le \rho\cdot\Delta_t + \frac{n}{4} - \alpha n = \rho\cdot\nabla_t + \rho\alpha n + \frac{n}{4} - \alpha n = \rho\cdot\nabla_t.$$
It follows that $E[\nabla_{t+1}] \le \rho\cdot E[\nabla_t]$. Let $\tau_0 = \lceil\log_{\rho^{-1}}(\nabla_0\cdot n^2)\rceil$; since $\nabla_0 \le \Delta_0 \le nD^2$, we have $\tau_0 = O(\log D + \log n)$. Then $E[\nabla_{\tau_0}] \le n^{-2}$. So by the Markov inequality, $\Pr[\nabla_{\tau_0} \ge 1] \le n^{-2}$. We conclude that $\Pr[\Delta_{\tau_0} < 1 + \alpha n]$ is at least $1 - n^{-2}$.
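To see the $O(\log D + \log n)$ behavior of $\tau_0$ concretely, one can evaluate the bound with the source's $\rho = 1 - \delta/2$ and the crude estimate $\Delta_0 \le nD^2$; the value $\delta = 0.1$ is an illustrative assumption, the paper only says $\delta$ is a constant.

```python
import math

# Evaluating tau0 = ceil(log_{1/rho}(Delta0 * n^2)) from Claim B,
# with rho = 1 - delta/2 and the crude estimate Delta0 <= n * D^2.

def tau0(n: int, D: int, delta: float = 0.1) -> int:
    rho = 1 - delta / 2
    delta0_bound = n * D * D
    return math.ceil(math.log(delta0_bound * n * n) / math.log(1 / rho))

for n, D in [(100, 10), (10_000, 10), (100, 10_000)]:
    print(f"n={n:>6} D={D:>6} tau0={tau0(n, D)}")
```

Squaring n or D only adds a constant multiple of the logarithm, so the stage count grows logarithmically in both parameters.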
3.2 Phase II
We assume that $\Delta_{\tau_0} < 1 + \alpha n$. Let $\beta = 2\sqrt{\alpha}$. Let $B = B_t$ be the set of stations $S_i$ such that $|x_{t,i}| > \beta$, and $G = G_t$ be the set of stations $S_j$ such that $|x_{t,j}| = \beta$.

Claim C. $|B \cup G| < \frac{n}{4} + O(1)$ for each $t \ge \tau_0$.

Proof. The stations from $B \cup G$ contribute at least $|B \cup G|\cdot\beta^2 = |B \cup G|\cdot 4\alpha$ to $\Delta_t$. Since $\Delta_t \le \Delta_{\tau_0} < \alpha n + 1$, we must have $|B \cup G| < \frac{n}{4} + \frac{1}{4\alpha}$.

Now we define a potential function $\Delta'$ used to measure the size of $B_t$.

Definition 1. For a station $S_i \in B_t$ define $\tilde{x}_{i,t} = |x_{i,t}| - \beta$, and $\tilde{x}_{i,t} = 0$ otherwise. Then
$$\Delta'_t = \sum_{i \in B_t} \tilde{x}_{i,t}^2.$$
Claim D. $E[\Delta'_{t+1}] \le \mu \cdot \Delta'_t$ for some constant $\mu < 1$.
Proof. We consider a single station $S_i \in B_t$, with $x_{i,t} > \beta$. By Claim C, with a constant probability it communicates with a station $S_j \notin B_t \cup G_t$. In this case, station $S_j$ may join $B$, but let us consider the contribution of stations $S_i$ and $S_j$ to $\Delta'_{t+1}$. Let $\delta' = \beta - x_{j,t} > 0$. Then:
$$\tilde{x}_{i,t+1}^2 + \tilde{x}_{j,t+1}^2 \le 2\cdot\left(\frac{\tilde{x}_{i,t}-\delta'}{2}\right)^2 < 2\cdot\left(\frac{\tilde{x}_{i,t}}{2}\right)^2 = \frac{\tilde{x}_{i,t}^2}{2}.$$
If $S_i$ communicates with a station $S_j \in B_t \cup G_t$, then obviously $\tilde{x}_{i,t+1}^2 + \tilde{x}_{j,t+1}^2 \le \tilde{x}_{i,t}^2 + \tilde{x}_{j,t}^2$. We may apply a simple bookkeeping technique assigning the contributions of the stations to $\Delta'_{t+1}$ so that the expected value of the contribution of $S_i$ is bounded by $\mu\cdot\tilde{x}_{i,t}^2$ for some constant $\mu < 1$ (we assign to $S_i$ the contribution of $S_i$ and $S_j$ in the first case, and the contribution of $S_i$ alone in the second case). Since by linearity of expectation $E[\Delta'_{t+1}]$ is the sum of the expected values of these contributions, $E[\Delta'_{t+1}] \le \mu\cdot E[\Delta'_t]$.

Claim E. For $t \ge \tau_0 + T$, where $T = O(\log n)$, the set $B_t$ is empty with probability $1 - O(\frac{1}{n^2})$.

Proof. By Claim D, $E[\Delta'_{\tau_0+t}] \le \mu^t\cdot E[\Delta'_{\tau_0}]$. On the other hand, $E[\Delta'_{\tau_0}] \le E[\Delta_{\tau_0}] < \alpha\cdot n + 1$. Let $T = \log_{\mu^{-1}}(2\alpha\cdot n^3)$ and $t \ge T$. Then $E[\Delta'_{\tau_0+t}] \le n^{-2}$. So by the Markov inequality, $\Pr[\Delta'_{\tau_0+t} \ge 1] \le n^{-2}$. Observe that $\Delta'$ takes only integer values, so $B_{\tau_0+t}$ is empty if $\Delta'_{\tau_0+t} < 1$.

3.3 Phase III

Let $\tau_1 = \tau_0 + T$, where $T$ satisfies Claim E. Then for $t > \tau_1$ we may assume that no station has $|x_{i,t}| \ge \beta$. Our goal now is to reduce the maximal value of $|x_{i,t}|$. We achieve this in at most $2\beta - 1$ subphases, each consisting of $O(\log n)$ stages: during each subphase we "cut off" one of the values that can be taken by $x_{i,t}$, always the smallest or the biggest one. Let $V(s) = V_t(s)$ be the set of stations for which $x_{i,t}$ takes the value $s$. Consider $t_1 > \tau_1$. Let $l = \min\{x_{i,t_1} : i \le n\}$ and $g = \max\{x_{i,t_1} : i \le n\}$. Assume that $l + 1 < g - 1$ (so there are at least four values of the numbers $x_{i,t_1}$). We show that for $t = t_1 + O(\log n)$ either $V_t(l) = \emptyset$ or $V_t(g) = \emptyset$.
Obviously, no station may join $V_t(l)$ or $V_t(g)$, so their sizes are non-increasing. Now consider a single stage. Observe that $|V_t(l) \cup V_t(l+1)| \le \frac{n}{2}$ or $|V_t(g) \cup V_t(g-1)| \le \frac{n}{2}$. W.l.o.g. we may assume that $|V_t(g) \cup V_t(g-1)| \le \frac{n}{2}$. Consider a station $S_i \in V_t(g)$. With a constant probability $S_i$ communicates with a station $S_j$ that does not belong to $V_t(g) \cup V_t(g-1)$. Then station $S_i$ leaves $V_t(g)$ and $S_j$ remains outside $V_t(g)$. Indeed, the values $x_{i,t}$ and $x_{j,t}$ differ by at least 2, so $x_{i,t+1}, x_{j,t+1} \le x_{i,t} - 1$. It follows that $E[|V_{t+1}(g)|] \le \psi\cdot|V_t(g)|$ for some $\psi < 1$. We see that in a single stage we expect either $|V_t(l)|$ or $|V_t(g)|$ to shrink by a constant factor. Using the Markov inequality as in the proof of Claim E we may then easily derive the following property:
Claim F. For some $T = O(\log n)$, if $t > T$, then with probability $1 - O(\frac{1}{n^2})$ either the set $V_{\tau_1+t}(l)$ or the set $V_{\tau_1+t}(g)$ is empty.

By Claim F, after $O(\beta\log n) = O(\log n)$ stages we end up in a situation in which there are at most three values taken by $x_{i,t}$. Even then, we may proceed in the same way as before in order to reduce the sizes of $V_t(l)$ or $V_t(g)$, as long as one of these sets has size $\Omega(n)$. So we can derive the following claim, which concludes the proof of Theorem 1:

Claim G. For some $T = O(\log n)$, for $\tau_2 = \tau_1 + T$ and $t \ge \tau_2$, with probability $1 - O(\frac{1}{n^2})$ either $x_{i,t}$ takes only two values, or there are three values and the number of stations holding the smallest and the largest values is at most $\gamma\cdot n$.
4 Properties of the Protocol and Discussion

Changes in the Network. By a simple examination of the proof we get the following additional properties:
– The result holds even if a constant fraction of messages is lost. This only increases the number of stages by a constant factor.
– If some number of stations goes down during the execution of the protocol, then the final values do not concentrate around the average of the original values of the stations that have survived, but they still differ by at most 2. If a new station joins the network and its value deviates from the values held by the rest of the stations, then we may proceed with the same analysis. Conclusions regarding the rate of convergence and the rate at which new stations emerge can be derived as before.

Energy Efficiency. Time complexity is not the most important complexity measure for mobile networks. Since the devices are powered by batteries, it is important to design algorithms that are energy efficient (otherwise, the network may fail due to exhaustion of batteries). The main usage of energy is for transmitting messages and listening to the communication channel. Energy usage of internal computations and sensors is substantially smaller and can be neglected. Surprisingly, comparable amounts of energy are necessary for transmitting and for listening. A properly designed algorithm should require a small number of messages (not only messages sent, but also messages awaited by the stations). Additionally, the differences between the numbers of messages sent or awaited by different stations should be as small as possible. The reason is that with similar energy resources no station should be at a higher risk of going down due to energy exhaustion. In our algorithm the energy usage of each station is Θ(log n).
This is optimal, since that many sending trials are needed to ensure that a station's value has been transmitted successfully with high probability in the presence of a constant probability of transmission failure.

Protocol Extensions – Getting Exactly One Value. Our algorithm leaves the network in a state in which there might be 3 different values. The bound from Theorem 1 regarding the behavior of the algorithm cannot be improved. Indeed, consider the following example:
assume that initially exactly one station holds value T − 1, exactly one station holds T + 1, and the rest hold value T. Then, in order to get into the state where all stations hold value T, the station with value T − 1 must communicate with the station with value T + 1. However, the probability that this happens during a single stage is Θ(1/N). Therefore, the probability that these two stations encounter each other within a logarithmic number of stages is O(log N/N).

Once we are left with values that differ from the average value by less than two, it is quite reasonable to start a procedure for computing the minimum over all active stations. In fact, it suffices to redefine part 3 of a stage: instead of computing the average of two values, both stations are assigned the smaller of their two values. Arguing as in Claim E, we may easily show that after O(log n) stages with high probability all stations know the minimum.

If each station knows the number n of the active stations, a simple trick may be applied to compute the average value exactly. At the beginning, each value is multiplied by n. Then the average value becomes $s = \sum_i T_i$ and it is an integer. So after executing the protocol we end up in the second situation described in Theorem 1, or all stations hold the same value. In order to get rid of the former situation, a simple protocol may broadcast the minimal and maximal values to all stations within O(log N) steps. Then all stations may find s and thereby the average s/n.

Dynamic Processes. For a dynamic process, in which the values considered are changing (think for instance about the output of sensors), we may observe that the protocol works quite well. For the sake of discussion assume that the values may only increase. If we wish to ignore the effect of increments of the values, we may think about old units of the values, existing at the beginning of the protocol, and new ones, due to incrementing the values.
When executing part 3 of a stage and "allocating" the units to stations A and B, we may assume that first the same (up to 1) amount of old units is given to A and B, and afterwards the new units are assigned. In this way, the new units do not influence the behavior of the old ones. So a good distribution of the old units will be achieved, as stated by Theorem 1, despite the fact that the values have changed.

Security Issues. Since the stations exchange information at random moments, an adversary which only disturbs communication can only slow down the rate of convergence. However, if it knows that there is a station with a value X that differs a lot from the average, it may increase its chances a little bit: if X has not occurred so far during a stage, then it might be advantageous for the adversary to scramble the rest of the stage, making sure that X remains untouched. Of course, serious problems occur when an adversary can fake messages of the legitimate stations. If the legitimate stations have a common secret, say K, then these problems can be avoided. Faking messages becomes hard when the messages are secured with a MAC code using K. In order to avoid the first problem it is necessary to encipher all messages (together with random nonces). In this case an adversary cannot tell which values are exchanged by the algorithm. The only information that might be derived is the fact that somebody has transmitted at a given time. But this seems not to bring any substantial advantage, except that then it could be advantageous to attack the third step
of a round. Encryption with a symmetric algorithm should be no problem regarding speed differences between transmission and internal computations.
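The exact-average trick described earlier (multiply every value by n, agree on the integer s = ΣTi, divide locally) can be sketched as follows; the distributed agreement itself is abstracted into exact arithmetic here.

```python
from fractions import Fraction

# The multiply-by-n trick: scale every value by n so that the target
# average s = sum(T_i) becomes an integer, agree on s (the distributed
# protocol is abstracted away here), then divide by n locally at every
# station.

def exact_average(values: list[int]) -> Fraction:
    n = len(values)
    scaled = [v * n for v in values]   # each station multiplies locally
    s = sum(scaled) // n               # = sum(values): the integer all agree on
    return Fraction(s, n)

print(exact_average([3, 4, 6]))        # stations agree on s = 13, answer 13/3
```

The point of the scaling is that the average of the scaled values is an exact integer, so the "all stations equal" outcome of the protocol becomes achievable.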
Acknowledgment We thank Artur Czumaj for some ideas developed together and contained in this paper.
References
1. Chlebus, B. S.: Randomized communication in radio networks. In: Handbook of Randomized Computing (P. M. Pardalos, S. Rajasekaran, J. H. Reif, J. D. P. Rolim, Eds.), Kluwer Academic Publishers, 2001, vol. I, 401–456
2. Czumaj, A., Kanarek, P., Kutyłowski, M., Loryś, K.: Distributed stochastic processes for generating random permutations. ACM-SIAM SODA '99, 271–280
3. Czumaj, A., Kutyłowski, M.: Generating random permutations and delayed path coupling method for mixing time of Markov chains. Random Structures and Algorithms 17 (2000), 238–259
4. Ghosh, B., Muthukrishnan, S.: Dynamic load balancing in parallel and distributed networks by random matchings. JCSS 53(3) (1996), 357–370
5. Jurdziński, T., Kutyłowski, M., Zatopiański, J.: Energy-efficient size approximation for radio networks with no collision detection. COCOON 2002, LNCS 2387, Springer-Verlag, 279–289
6. Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. ACM TOPLAS 4 (1982), 382–401
7. Stojmenović, I. (Ed.): Handbook of Wireless Networks and Mobile Computing, Wiley, 2002
8. Upfal, E.: Design and analysis of dynamic processes: a stochastic approach. ESA '98, LNCS 1461, Springer-Verlag, 26–34
A Polynomial-Time Algorithm for Deciding True Concurrency Equivalences of Basic Parallel Processes

Sławomir Lasota

Institute of Informatics, Warsaw University, Poland
[email protected]

Abstract. A polynomial-time algorithm is presented to decide distributed bisimilarity of Basic Parallel Processes. As a direct conclusion, several other non-interleaving semantic equivalences are also decidable in polynomial time for this class of processes, since they coincide with distributed bisimilarity.
1 Introduction

One important problem in the verification of concurrent systems is to check whether two given systems P and Q are equivalent under a chosen notion of equivalence. For process algebras generating infinite-state systems the equivalence-checking problem cannot be decidable in general; therefore restricted classes of processes have been defined and investigated. We study here the class of Basic Parallel Processes [9] (BPP), an extension of recursively defined finite-state systems by parallel composition. Strong bisimilarity [25] is a well-accepted behavioural equivalence, which often remains decidable for infinite-state systems. An elegant proof of decidability of bisimilarity for BPP, and even for BPPτ, the extension of BPP by communication, was given in [10]. A PSPACE lower bound has recently been proved in [26], followed by the PSPACE-completeness result of Jančar [18]. On the other hand, all other equivalences in van Glabbeek's spectrum are undecidable [17]. BPP is the natural class of processes in which to investigate non-interleaving equivalences, intended to capture truly concurrent computations of a system. One of the bisimulation-like non-interleaving equivalences is distributed bisimilarity [6], taking into account the spatial distribution of a process. Already in [8] distributed bisimilarity was shown to be decidable on BPPτ by means of a sound and complete tableau proof system. Concerning complexity, the tableau depth was only bounded exponentially. In this paper we design a polynomial-time decision procedure for distributed bisimilarity. It strongly relies on the polynomial-time algorithm for deciding strong bisimilarity on normed BPP processes proposed in [16]. Distributed bisimilarity is therefore very likely to be computationally more feasible than interleaving bisimilarity, in the light of the recent PSPACE lower bound for the latter by Srba [26]. Further interesting conclusions follow from the fact that many non-interleaving equivalences coincide on BPP.
As mentioned in [12], Kiehn proved [21] that location equivalence [7], causal equivalence [11] and distributed bisimilarity all coincide
A part of this work was performed during a post-doc stay at Laboratoire Spécification et Vérification, ENS Cachan. Partially supported by the KBN grant 7 T11C 002 21 and the EC Research Training Network "Games and Automata for Synthesis and Validation" (GAMES).
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 521–530, 2003. © Springer-Verlag Berlin Heidelberg 2003
on CPP, a sublanguage of BPPτ without explicit τ, hence also on BPP. Furthermore, causal equivalence and history preserving bisimilarity [13] coincide on BPP by the result of Aceto [1]; moreover, Fröschle showed the coincidence of distributed and history preserving bisimilarity on BPP [12]. The coincidence with performance equivalence, a timed bisimulation equivalence proposed by [14], has been shown in [23], completing the picture¹. As a direct conclusion from all these results, all the mentioned equivalences can also be decided in polynomial time on BPP. Related results are the decision procedures of [15] for causal equivalence, location equivalence and ST-bisimulation equivalence of BPPτ, as well as for their weak versions on a subset of BPPτ. However, complexity issues were not addressed there. Furthermore, the polynomial-time complexity of performance equivalence extends the result of [5], shown only for timed BPP in full standard form [9]. Surprisingly, the polynomial-time complexity of history preserving bisimilarity on BPP can be contrasted with its EXPTIME-completeness on finite-state systems (finite 1-safe nets) [19]. Similarly, the decidability of hereditary history preserving bisimilarity [4] on BPP, proved in [12], can be contrasted with its undecidability on finite 1-safe nets, shown by Jurdziński and Nielsen in [20]. We start with Section 2, containing definitions and some basic facts, and then outline our algorithm in Section 3. The algorithm works for BPP processes in standard form, similarly as in [9]. A polynomial-time preprocessing procedure transforming a process into standard form can be found in the full version of the paper [24].
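Before the formal definitions, a minimal executable view of the BPP syntax may help: the grammar given in the next section (empty process, constants, action prefix, finite choice, parallel composition, left merge) as a Python AST, plus the guardedness check the algorithm assumes; all class names are illustrative.

```python
from dataclasses import dataclass

# Illustrative AST for BPP process expressions.
@dataclass
class Nil:                      # the empty process 0
    pass

@dataclass
class Const:                    # a process constant X
    name: str

@dataclass
class Prefix:                   # action prefix a.P
    action: str
    cont: object

@dataclass
class Choice:                   # finite nondeterministic choice
    branches: tuple

@dataclass
class Par:                      # parallel composition
    left: object
    right: object

@dataclass
class LeftMerge:                # left merge: first action on the left
    left: object
    right: object

def guarded(e) -> bool:
    """True iff every constant occurrence is under an action prefix."""
    if isinstance(e, Nil):
        return True
    if isinstance(e, Const):
        return False            # bare constant at top level: unguarded
    if isinstance(e, Prefix):
        return True             # everything below the prefix is guarded
    if isinstance(e, Choice):
        return all(guarded(p) for p in e.branches)
    return guarded(e.left) and guarded(e.right)   # Par, LeftMerge

# X = a.(Y || X) + b.Z is guarded; X = (a.Y) || X + b.Z is not.
good = Choice((Prefix("a", Par(Const("Y"), Const("X"))), Prefix("b", Const("Z"))))
bad = Choice((Par(Prefix("a", Const("Y")), Const("X")), Prefix("b", Const("Z"))))
print(guarded(good), guarded(bad))
```

Guardedness is exactly the assumption placed on input process definitions below, so a checker like this would be the natural first step of a preprocessing pass.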
2 Basic Definitions and Facts Let Act be a finite set of actions, ranged over by a, b, etc. and let Const be a finite set of process constants, ranged over by X, Y , etc. The set of BPP process expressions [9] over Act and Const is given by: Pi | P P | P P (1) P ::= 0 | X | a.P | i∈I
where 0 denotes the empty process, a. is an action prefix, ∑_{i∈I} Pi is a finite nondeterministic choice for a finite nonempty set I, and ‖ stands for parallel composition. The only operator not present in CCS [25] is the left merge ⌊, which differs from the parallel composition only in that the very first action must be performed in the left argument. The purpose of considering ⌊ here is the standard form (3) below. A BPP process definition ∆ consists of a finite set Act(∆) of actions, a finite set Const(∆) of constants and a finite number of recursive process equations

X def= P,
one for each constant X ∈ Const(∆), where P is a process expression over Act(∆) and Const(∆). The sets Const(∆) and Act(∆) are often not even mentioned explicitly, as they can be deduced from the process equations. In the sequel we shall assume that the process definition our algorithm inputs is guarded, i.e., that each occurrence of a constant on the right-hand side is within the scope of an action prefix. For instance, X def= a.Y ‖ X + b.Z is not guarded.

¹ Related results are [2], where Aceto proved that distributed bisimilarity coincides with timed bisimilarity on BPP without recursion, and [22], where decidability of strong bisimilarity for timed BPP was shown, which generalizes the result of [10].

By a BPP process we mean a pair (P, ∆), where ∆ is a BPP process definition and P is a BPP process expression over Act(∆) and Const(∆). When ∆ is evident from the context, P itself is called a process too. Distributed bisimilarity was introduced in [6], but here we follow [9]. Given a BPP process definition ∆, consider the following SOS transition rules:
a.P →a [P, 0]

if (X def= P) ∈ ∆ and P →a [P′, P′′], then X →a [P′, P′′]

if Pj →a [P′, P′′] for some j ∈ I, then ∑_{i∈I} Pi →a [P′, P′′]

if P →a [P′, P′′], then P ⌊ Q →a [P′, P′′ ‖ Q]

if P →a [P′, P′′], then P ‖ Q →a [P′, P′′ ‖ Q]

if Q →a [Q′, Q′′], then P ‖ Q →a [Q′, P ‖ Q′′]    (2)
We write P →a [P′, P′′] if this transition can be derived from the above rules. The rules reflect a view of a process as distributed in space. Each transition P →a [P′, P′′] gives rise to a local derivative P′, which intuitively records the location at which the action is observed, and a concurrent derivative P′′, recording the part of the process separated from the local component. BPP processes (P1, ∆1) and (P2, ∆2) are distributed bisimilar, denoted (P1, ∆1) ∼ (P2, ∆2), if they are related by some distributed bisimulation R, i.e., a binary relation over BPP process expressions such that whenever (P, Q) ∈ R, for each a ∈ Act:

– if P →a [P′, P′′] then Q →a [Q′, Q′′] for some Q′, Q′′ such that (P′, Q′) ∈ R and (P′′, Q′′) ∈ R,
– if Q →a [Q′, Q′′] then P →a [P′, P′′] for some P′, P′′ such that (P′, Q′) ∈ R and (P′′, Q′′) ∈ R.

In the next section we prove polynomial-time complexity of the problem of checking distributed bisimilarity for a given pair of constants. We do not lose generality, as checking P ∼ Q for arbitrary P, Q is equivalent to checking X_P ∼ X_Q, where X_P and X_Q are fresh constants with defining equations X_P def= a.P and X_Q def= a.Q, for an arbitrary a. Moreover, w.l.o.g. we assume that both constants share a process definition.

Problem: DISTRIBUTED BISIMILARITY FOR BPP
Instance: A BPP process definition ∆ and X, Y ∈ Const(∆)
Question: (X, ∆) ∼ (Y, ∆)?

Christensen presented in [8] a sound and complete tableau proof system for ∼ on BPPτ and proved an exponential upper bound for the depth of a tableau.

Theorem 1 ([8, 9]). Distributed bisimilarity is decidable on BPP.
Christensen [9] showed also that each BPP process definition ∆ can be effectively transformed into an equivalent process definition ∆′ in standard form, i.e., consisting exclusively of process equations in the restricted form

X def= ∑_{i∈I} (ai.Pi) ⌊ Qi,    (3)
where all Pi and Qi are merely a parallel composition of constants, i.e., of the form

X1 ‖ X2 ‖ … ‖ Xn,  for n > 0 and X1, …, Xn ∈ Const(∆′).    (4)
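As an aside not from the paper: since a basic process is, up to ∼, determined by the multiplicities of its constants, Python's Counter multisets model basic processes and their parallel composition directly (the function name par is ours, for illustration):

```python
from collections import Counter

def par(*procs):
    """Parallel composition of basic processes: multiset union of constants."""
    out = Counter()
    for p in procs:
        out += p
    return out

# X1 ‖ X1 ‖ X2, written in two different orders:
P = Counter({'X1': 2, 'X2': 1})
Q = Counter({'X2': 1, 'X1': 2})
assert P == Q  # ‖ is associative and commutative, so only multiplicities matter
```

Storing these multiplicities in binary is also the idea behind the succinct representation used in Section 3.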
Note that (3) is not guarded in general. We omit brackets in (4) as ‖ is associative and commutative w.r.t. ∼ (and w.r.t. any other known semantical equivalence, in fact). A parallel composition of constants (4) is called a basic process (expression) in the sequel. Observe that the processes Pi in (3) are precisely the local derivatives of X and the processes Qi are precisely its concurrent derivatives. Hence the left merge operator allows one to syntactically separate the two kinds of derivatives. Consequently, in the next section we will only consider basic processes, since both local and concurrent derivatives of a basic process are basic again. Since ∆ is guarded, the process definition ∆′ produced by Christensen's transformation is bounded in the following sense:

Definition 1. Define a binary relation ≺1 over Const(∆′) as follows: Y ≺1 X iff Y appears in some concurrent derivative of X. We say that ∆′ is bounded if the transitive closure ≺1⁺ of ≺1 is irreflexive, i.e., no constant satisfies X ≺1⁺ X.
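Boundedness amounts to acyclicity of ≺1, so it can be checked by ordinary depth-first cycle detection; a sketch under the assumption that ≺1 has been extracted into a dictionary (the encoding is ours, for illustration):

```python
def is_bounded(prec1):
    """prec1 maps each constant to the set of constants related to it by ≺1.
    The definition is bounded iff this directed graph has no cycle,
    i.e. the transitive closure of ≺1 is irreflexive."""
    nodes = set(prec1) | {y for ys in prec1.values() for y in ys}
    color = {x: 0 for x in nodes}  # 0 = unvisited, 1 = on the DFS stack, 2 = done

    def visit(x):
        color[x] = 1
        for y in prec1.get(x, ()):
            if color[y] == 1 or (color[y] == 0 and not visit(y)):
                return False  # found some constant X with X ≺1⁺ X
        color[x] = 2
        return True

    return all(color[x] != 0 or visit(x) for x in nodes)
```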
Christensen's transformation produces a ∆′ that contains only basic processes consisting of at most three constants. The price for this constant bound is the exponential size of ∆′ w.r.t. the size of ∆, defined as the number of process equations in ∆ plus the sum of the lengths of the right-hand sides. This is why we proved (Theorem 2 below) that the transformation to the standard form (3) can be done in polynomial time.
Theorem 2. There exists a polynomial-time algorithm that transforms a guarded process definition ∆ into a process definition ∆′ such that:
1. ∆′ is in standard form (3),
2. ∆′ is bounded,
3. Const(∆) ⊆ Const(∆′) and for each X ∈ Const(∆), (X, ∆) ∼ (X, ∆′).
(The proof is given in the full version of this paper [24].)

Strong bisimilarity is defined w.r.t. single-derivative transitions P →a P′ obtained from the rules (2) when the distribution of a process is ignored – for details we refer e.g. to [25]. A remarkable result is that strong bisimilarity can be decided for BPP processes in polynomial time [16], but only when all the constants are normed. A constant, or generally a process P, is normed if an inactive process is reachable from P, i.e., if there is a finite sequence of transitions P →a1 P1 →a2 … →an Pn, n ≥ 0, such that there is no further transition from Pn. The norm of P is the length of the shortest such sequence. Strong bisimilarity is less restrictive than distributed bisimilarity. Hence a process definition ∆ can be transformed into a strongly bisimilar process definition ∆′ in full standard form [9], which is more restrictive than (3). Full standard form admits exclusively process equations of the form X def= ∑_{i∈I} ai.Pi, where all Pi are basic again.

Theorem 3 ([16]). There exists a polynomial-time algorithm to decide strong bisimilarity on normed BPP in full standard form.
3 Algorithm

Throughout this section we fix ∆, and hence also the sets of constants and actions. ∆ is assumed to be in standard form and bounded. A more succinct representation is possible for ∆ in standard form: due to associativity and commutativity of ‖, it is sufficient to remember the number of occurrences of each constant in each basic expression Pi and Qi in the right-hand sides of the equations (3) of ∆, encoded in binary. In this section, complexity is measured w.r.t. the size of ∆, defined as the number of process equations plus the sum of the lengths of the succinct representations of the right-hand sides. Theorem 3 is still valid for this definition of size.

The reachability relation over process expressions is defined as the smallest transitive relation such that each P is related to all its local and concurrent derivatives, i.e., whenever P →a [P′, P′′], the relation contains the pairs (P, P′) and (P, P′′). We say that Q is reachable from P if the pair (P, Q) is in the reachability relation. Let us denote by P the set of all process expressions reachable from the constants. As observed previously, all P ∈ P are basic. Unless stated otherwise, all the relations mentioned below are binary relations over P. This includes also ∼, which is from now on restricted to P. Unless explicitly stated otherwise, P, Q, etc., range over P.

An exponential-time algorithm can be easily derived as follows. First define two monotone operators. The operator B2 acts on a pair of relations: (P, Q) ∈ B2(R′, R′′) iff for each a ∈ Act,

– if P →a [P′, P′′] then Q →a [Q′, Q′′] for some Q′, Q′′ such that (P′, Q′) ∈ R′ and (P′′, Q′′) ∈ R′′,
– if Q →a [Q′, Q′′] then P →a [P′, P′′] for some P′, P′′ such that (P′, Q′) ∈ R′ and (P′′, Q′′) ∈ R′′.

Then the operator B1 is defined by B1(R) := B2(R, R). Now, define the approximating equivalences ∼i as follows:

– P ∼0 Q for all P and Q,
– ∼i+1 := B1(∼i), for i ≥ 0.

Distributed bisimulations R are exactly the post-fixed points of B1, i.e., relations satisfying R ⊆ B1(R). Hence ∼, being the union of all distributed bisimulations, is the greatest fixed point of B1, by the Knaster-Tarski theorem. Recall that BPP is image-finite, that is, each process expression has only finitely many local and concurrent derivatives. Thus by a standard argument the decreasing chain {∼i} converges to ∼:

∼ = ⋂_{i∈N} ∼i.    (5)
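On an explicitly given finite transition structure this chain can be computed directly: start from the full relation ∼0 and apply B1 until a fixed point is reached. The sketch below is the naive exponential procedure of this paragraph, not the polynomial algorithm developed later; the encoding of transitions as triples (action, local derivative, concurrent derivative) is our own.

```python
from itertools import product

def distributed_bisimilarity(procs, trans):
    """procs: finite collection of process ids;
    trans[p]: set of triples (a, local, concurrent) for p's transitions.
    Returns the greatest fixed point of B1 as a set of pairs."""
    rel = set(product(procs, procs))  # ~0 relates everything

    def matches(p, q, rel):
        # every move of p is answered by a move of q with related derivatives
        return all(any(b == a and (lp, lq) in rel and (cp, cq) in rel
                       for (b, lq, cq) in trans[q])
                   for (a, lp, cp) in trans[p])

    while True:
        # one application of B1 (rel stays symmetric, so two matches suffice)
        new = {(p, q) for (p, q) in rel
               if matches(p, q, rel) and matches(q, p, rel)}
        if new == rel:
            return rel
        rel = new
```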
Furthermore, observe that each local derivative of a basic process X1 ‖ … ‖ Xn is a local derivative of some Xj, i.e., is equal to some process Pi appearing in a process equation (3) of ∆. In consequence, the number of local derivatives of all basic processes is polynomial. Let us denote the set of all of them by L. Moreover, there are only exponentially many processes reachable from each P ∈ L – this follows easily from the boundedness of ∆. Consequently, the cardinality N of the whole P is exponential. Hence ∼
can be computed over P in exponential time, e.g., as the limit of the sequence {∼i} of equivalences, since the sequence stabilizes after at most N−1 steps. We have not focused on the details of the exponential-time algorithm, as in the rest of this section we argue that one can do better: the problem can be solved in polynomial time. Essentially, this is possible due to a "quicker" convergence to the greatest fixed point, as explained in Lemma 2 below and thereafter. Then, in the crucial Lemma 4 we reduce distributed bisimilarity to strong bisimilarity between normed processes. To this aim we incorporate local derivatives into actions and obtain single-derivative transitions. We start with a couple of definitions and simple facts.

Definition 2. Given a binary relation S, a distributed bisimulation w.r.t. S is any binary relation R such that R ⊆ B2(S, R). P and Q are distributed bisimilar w.r.t. S, denoted P ∼^S Q, if they are related by some distributed bisimulation w.r.t. S.

Definition 3. We say that a relation R is a distributed bisimilarity w.r.t. itself if R = ∼^R. Let ≈ denote the greatest distributed bisimilarity w.r.t. itself.

A relation is a distributed bisimilarity w.r.t. itself precisely if it is a fixed point of the monotone mapping R ↦ ∼^R. Hence the greatest distributed bisimilarity w.r.t. itself always exists.

Lemma 1. ∼ and ≈ coincide.

Proof. For one inclusion, recall that ∼ is the union of all distributed bisimulations while ∼^∼ is the union of all distributed bisimulations w.r.t. ∼. Since each distributed bisimulation is a distributed bisimulation w.r.t. ∼, we have ∼ ⊆ ∼^∼, i.e., ∼ is a post-fixed point of the mapping R ↦ ∼^R. As ≈ is the greatest fixed point of that mapping, we obtain ∼ ⊆ ≈.

For the other inclusion, assume a relation S is a distributed bisimilarity w.r.t. itself, S = ∼^S. Since ∼^S is the union of all distributed bisimulations w.r.t. S, it is the (greatest) fixed point of the monotone mapping R ↦ B2(S, R), i.e., ∼^S = B2(S, ∼^S).
Substituting S in place of ∼^S we get S = B2(S, S), i.e., S is a fixed point of B1. Hence S ⊆ ∼. As S was chosen arbitrarily, we have shown that each distributed bisimilarity w.r.t. itself is included in ∼. In particular ≈ ⊆ ∼. □

We have proved that ≈ is just another formulation of ∼. But ≈ gives rise to another sequence of approximating equivalences {≈i} that converges more rapidly than {∼i}, namely after a polynomial number of iterations:

Lemma 2. ≈ = ⋂_{i∈N} ≈i, where the sequence {≈i} is defined by:
– P ≈0 Q for all P and Q,
– ≈i+1 := ∼^{≈i}.

Proof. Obviously ≈ ⊆ ⋂_{i∈N} ≈i, so we only need to show the opposite inclusion. Similarly to ∼, the relation ∼^S is the greatest fixed point of the monotone mapping R ↦ B2(S, R), for any fixed S. So ∼^S is also the limit of a decreasing sequence of approximations:

∼^S = ⋂_{i∈N} ∼^S_i,    (6)
where the relations ∼^S_i are defined by:
– P ∼^S_0 Q for all P and Q,
– ∼^S_{i+1} := B2(S, ∼^S_i).

Having this, by an easy induction we show ≈i ⊆ ∼i, for all i ≥ 0. As the induction assumption suppose ≈i ⊆ ∼i, for a fixed i ≥ 0. Now substitute ≈i in place of S in (6) and in the definition of ∼^S_j, j ≥ 0. Due to monotonicity of B2 we derive, by another easy induction on j ≤ i, that ∼^{≈i}_{j+1} = B2(≈i, ∼^{≈i}_j) ⊆ B2(∼j, ∼j) = ∼j+1, since ≈i ⊆ ∼i ⊆ ∼j by the induction assumption and by j ≤ i. Hence ∼^{≈i}_{i+1} ⊆ ∼i+1. By (6) we know that ≈i+1 = ∼^{≈i} ⊆ ∼^{≈i}_{i+1}, hence we conclude ≈i+1 ⊆ ∼i+1. This completes the induction step. Having ≈i ⊆ ∼i for all i ≥ 0, we apply Lemma 1 and (5). □

Equipped with Lemmas 1 and 2, we are ready to describe the polynomial-time algorithm. Recall that L denotes the set of all local derivatives. The algorithm consists of two phases, outlined in the figure below. By R ∩ (L×L) we mean here the restriction of a relation R to pairs from L.

PHASE 1:  let ≅0 := L×L
          REPEAT FOR n = 0, 1, …: compute ≅n+1 ⊆ L×L as ≅n+1 := ∼^{≅n} ∩ (L×L)
          UNTIL ≅n = ≅n+1
PHASE 2:  decide whether X ∼^{≅n} Y

The first phase is the crucial one, but it amounts simply to computing an initial part of {≈i} up to the position n where it eventually stabilizes. The trick is that ≈i is computed only for local derivatives. Then, in the second phase, we only need to check whether the input pair (X, Y) belongs to ∼^{≅n}. Assuming that the first phase of the algorithm terminates, the outcome ∼^{≅n} coincides with ≈.

Lemma 3. If ≅n = ≅n+1 then ∼^{≅n} = ≈.

Proof. Assuming ≅n = ≅n+1, we will show ∼^{≅n} ⊆ ≈ and ∼^{≅n} ⊇ ≈. For both inclusions we will silently use an obvious fact: when two relations S1 and S2 coincide on L×L, i.e., S1 ∩ (L×L) = S2 ∩ (L×L), then ∼^{S1} = ∼^{S2}.

For ∼^{≅n} ⊆ ≈, it is sufficient to show that ∼^{≅n} is a distributed bisimilarity w.r.t. itself. Indeed, ∼^{∼^{≅n}} = ∼^{∼^{≅n} ∩ (L×L)} = ∼^{≅n+1} = ∼^{≅n}.

For ∼^{≅n} ⊇ ≈, we show by induction that for all i ≥ 0, (a) ≅i = ≈i ∩ (L×L) and (b) ∼^{≅i} = ≈i+1. For i = 0 this is obvious, so assume i > 0 and that (a) and (b) hold for i−1. We prove (a) first: ≅i = ∼^{≅i−1} ∩ (L×L) = ≈i ∩ (L×L), where the second equality is by (b) for i−1. Then (b) follows easily from (a): ∼^{≅i} = ∼^{≈i ∩ (L×L)} = ∼^{≈i} = ≈i+1. Now ∼^{≅n} ⊇ ≈ follows from (b), since ≈n+1 ⊇ ≈ by Lemma 2. □
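The two phases can be sketched as follows, treating the procedure of Lemma 4 as an oracle decide_sim(S, p, q) for p ∼^S q (both function names are ours, for illustration):

```python
def phase1(L, decide_sim):
    """Iterate  next := (pairs of L related by ~ w.r.t. cur)  until stabilisation,
    starting from the full relation on L."""
    cur = {(p, q) for p in L for q in L}
    while True:
        nxt = {(p, q) for p in L for q in L if decide_sim(cur, p, q)}
        if nxt == cur:
            return cur  # the stable relation of Lemma 3
        cur = nxt

def distributed_bisimilar(X, Y, L, decide_sim):
    """Phase 2: by Lemmas 1 and 3, X ~ Y iff the oracle accepts (X, Y)
    w.r.t. the stable relation computed in phase 1."""
    return decide_sim(phase1(L, decide_sim), X, Y)
```

Since the computed sequence is non-increasing over the pairs of L×L, the loop body runs at most |L×L| + 1 times.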
Termination of the first phase of the algorithm after a polynomial number of iterations of the main loop is guaranteed, as the sequence {≅i} is non-increasing, ≅0 ⊇ ≅1 ⊇ …, and each ≅i contains only polynomially many pairs. What we still need to show is that a single iteration of the loop body, i.e., the computation of ≅n+1 from ≅n, can be done in polynomial time. To this aim we will prove the following Lemma 4; in its proof we profit from Theorem 3.

Lemma 4. Let S ⊆ L×L be an equivalence such that there exists a polynomial-time algorithm (w.r.t. the size of ∆) to decide whether (P, Q) ∈ S, for given P, Q ∈ L. Then there exists a polynomial-time algorithm to decide P ∼^S Q, for given P, Q ∈ P.

Proof. As the first stage, we construct from ∆ a new process definition ∆′ in full standard form, equivalent to ∆ in the following sense: for all P, Q ∈ P,

(P, ∆) ∼^S (Q, ∆)  iff  (P, ∆′) is strongly bisimilar to (Q, ∆′).    (7)
The construction of ∆′ is as follows:

Const(∆′) := Const(∆),    Act(∆′) := Act(∆) × L/S,
where L/S denotes the set of equivalence classes of S and can be computed in polynomial time. Furthermore, whenever ∆ contains a process equation

X def= ∑_{i∈I} (ai.Pi) ⌊ Qi,    (8)
∆′ contains

X def= ∑_{i∈I} (ai, [Pi]_S).Qi,    (9)
where [Pi]_S denotes the equivalence class of Pi in S. Having this, (7) is clear by the very definitions of the two bisimilarities involved. Now, the crucial point is that ∆′ is always normed, since ∆ is bounded. We have Y ≺1 X (cf. Section 2) iff Y appears in some Qi on the right-hand side of the process equation (8) defining X. As the transitive closure ≺1⁺ is irreflexive, the following equations (10) and (11) are well-defined and give the norms of all the constants. First, for each constant X defined by (9) in ∆′,

norm(X) = 1 + min_{i∈I} norm(Qi).    (10)
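Together with the additivity stated as (11) just below, equation (10) yields a short mutually recursive computation of the norms; a sketch, with each constant assumed to map to the list of its concurrent derivatives Qi, each a tuple of constants (boundedness guarantees that the recursion terminates):

```python
from functools import lru_cache

def make_norm(eqs):
    """eqs[X]: list of the concurrent derivatives Qi of the summands
    defining X, each Qi a tuple of constants (a basic process)."""
    @lru_cache(maxsize=None)
    def norm_const(X):
        # (10): norm(X) = 1 + minimum over the summands of norm(Qi)
        return 1 + min(norm_basic(Q) for Q in eqs[X])

    def norm_basic(Q):
        # (11): the norm is additive w.r.t. parallel composition
        return sum(norm_const(X) for X in Q)

    return norm_const, norm_basic
```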
Second, the norm is additive w.r.t. parallel composition:

norm(P ‖ Q) = norm(P) + norm(Q),    (11)
and this implies that the norm of each concurrent derivative Qi in equation (10) is the sum of the norms of all its parallel components. Now we apply Theorem 3 to ∆′ and by (7) get a polynomial-time procedure to decide ∼^S. □

Evidently each ≅i is an equivalence, hence the lemma applies and we conclude that the body of the main loop of the first phase requires only polynomial time: it amounts to invoking the decision procedure for strong bisimilarity on normed BPP polynomially many times, since the set L×L has polynomial cardinality. By Lemma 4 the second phase can be computed in polynomial time as well. Correctness of the algorithm follows by Lemmas 3 and 1. This completes the proof of the following:
Theorem 4. There exists a polynomial-time algorithm to decide distributed bisimilarity for BPP processes.
4 Final Remarks

We have proposed a polynomial-time decision procedure for distributed bisimilarity on BPP. As mentioned in the introduction, many non-interleaving equivalences coincide on BPP. Therefore, we directly conclude from Theorem 4:

Corollary 1. There exists a polynomial-time algorithm to decide the following equivalences for BPP processes: location equivalence, causal equivalence, history preserving equivalence, performance equivalence.

Consider BPPτ, an extension of BPP with communication between parallel components, expressed by one additional rule:

if P →a [P′, P′′] and Q →ā [Q′, Q′′], then P ‖ Q →τ [P′ ‖ Q′, P′′ ‖ Q′′].    (12)
A local derivative of a τ-transition can be composed of two local derivatives of parallel components. Hence local derivatives cannot be encoded directly into actions, and the reduction of distributed bisimilarity to strong bisimilarity in the proof of Lemma 4 fails.

A crucial ingredient of our decision procedure is the polynomial-time transformation of a process definition to the standard form, described in the full version of this paper [24]. It is different from the transformation proposed by Christensen in [9], since the process definition in standard form yielded by the latter is of exponential size.

Our algorithm needs Θ(n²) calls to the polynomial-time algorithm of [16] in each iteration of the first phase, where n stands for the size of ∆. At most n iterations are needed, since all ≅i are equivalences, so the total cost is Θ(n³) calls to the procedure of [16]. On the other hand, P-completeness of the problem follows easily, since it subsumes strong bisimilarity for finite-state systems and the latter is P-complete [3]. An interesting continuation of this work would be to develop a more efficient direct algorithm, not referring to the procedure of [16].
Acknowledgements The author is very grateful to Philippe Schnoebelen for many fruitful discussions.
References

1. L. Aceto. History preserving, causal and mixed-ordering equivalence over stable event structures. Fundamenta Informaticae, 17:319–331, 1992.
2. L. Aceto. Relating distributed, temporal and causal observations of simple processes. Fundamenta Informaticae, 17:369–397, 1992.
3. J. Balcázar, J. Gabarró, and M. Sántha. Deciding bisimilarity is P-complete. Formal Aspects of Computing, (6A):638–648, 1992.
4. M. Bednarczyk. Hereditary history preserving bisimulation or what is the power of the future perfect in program logics. Technical report, Polish Academy of Sciences, Gdańsk, 1991.
5. B. Bérard, A. Labroue, and P. Schnoebelen. Verifying performance equivalence for timed Basic Parallel Processes. In Proc. FOSSACS'00, LNCS 1784, pages 35–47, 2000.
6. I. Castellani. Bisimulations for Concurrency. PhD thesis, University of Edinburgh, 1988.
7. I. Castellani. Process algebras with localities. In J. Bergstra, A. Ponse, S. Smolka, eds., Handbook of Process Algebra, chapter 15, pages 945–1046, 2001.
8. S. Christensen. Distributed bisimilarity is decidable for a class of infinite state systems. In Proc. 3rd Int. Conf. Concurrency Theory (CONCUR'92), LNCS 630, pages 148–161, 1992.
9. S. Christensen. Decidability and Decomposition in Process Algebras. PhD thesis, Dept. of Computer Science, University of Edinburgh, UK, 1993.
10. S. Christensen, Y. Hirshfeld, and F. Moller. Bisimulation equivalence is decidable for Basic Parallel Processes. In Proc. CONCUR'93, LNCS 713, pages 143–157, 1993.
11. P. Darondeau and P. Degano. Causal trees. In Proc. ICALP'89, LNCS 372, pages 234–248, 1989.
12. S. Fröschle. Decidability of plain and hereditary history-preserving bisimulation for BPP. In Proc. EXPRESS'99, volume 27 of ENTCS, 1999.
13. R. van Glabbeek and U. Goltz. Equivalence notions for concurrent systems and refinement of actions. In Proc. MFCS'89, LNCS 379, pages 237–248, 1989.
14. R. Gorrieri, M. Roccetti, and E. Stancampiano. A theory of processes with durational actions. Theoretical Computer Science, 140(1):73–94, 1995.
15. M. Hennessy and A. Kiehn. On the decidability of non-interleaving process equivalences. In Proc. 5th Int. Conf. Concurrency Theory (CONCUR'94), pages 18–33, 1994.
16. Y. Hirshfeld, M. Jerrum, and F. Moller. A polynomial time algorithm for deciding bisimulation equivalence of normed basic parallel processes. Mathematical Structures in Computer Science, 6:251–259, 1996.
17. H. Hüttel. Undecidable equivalences for basic parallel processes. In Proc. TACS'94, LNCS 789, pages 454–464, 1994.
18. P. Jančar. Bisimilarity of basic parallel processes is PSPACE-complete. In Proc. LICS'03, to appear, 2003.
19. L. Jategaonkar and A. R. Meyer. Deciding true concurrency equivalences on safe, finite nets. Theoretical Computer Science, 154:107–143, 1996.
20. M. Jurdziński and M. Nielsen. Hereditary history preserving bisimilarity is undecidable. In Proc. STACS'00, LNCS 1770, pages 358–369, 2000.
21. A. Kiehn. A note on distributed bisimulations. Unpublished draft, 1999.
22. S. Lasota. Decidability of strong bisimilarity for timed BPP. In Proc. 13th Int. Conf. on Concurrency Theory (CONCUR'02), LNCS 2421, pages 562–578. Springer-Verlag, 2002.
23. S. Lasota. On coincidence of distributed and performance equivalence for Basic Parallel Processes. http://www.mimuw.edu.pl/~sl/papers/, unpublished draft, 2002.
24. S. Lasota. A polynomial-time algorithm for deciding true concurrency equivalences of Basic Parallel Processes. Research Report LSV-02-13, LSV, ENS de Cachan, France, 2002.
25. R. Milner. Communication and Concurrency. Prentice Hall, 1989.
26. J. Srba. Strong bisimilarity and regularity of Basic Parallel Processes is PSPACE-hard. In Proc. STACS'02, LNCS 2285, 2002.
Solving the Sabotage Game Is PSPACE-Hard

Christof Löding and Philipp Rohde
Lehrstuhl für Informatik VII, RWTH Aachen
{loeding,rohde}@informatik.rwth-aachen.de
Abstract. We consider the sabotage game as presented by van Benthem. In this game one player moves along the edges of a finite multi-graph and the other player takes out a link after each step. One can consider usual algorithmic tasks like reachability, Hamilton path, or complete search as winning conditions for this game. As the game necessarily ends after at most as many steps as there are edges, it is easy to see that solving the sabotage game for the mentioned tasks takes at most PSPACE in the size of the graph. In this paper we establish the PSPACE-hardness of this problem. Furthermore, we introduce a modal logic over changing models to express tasks corresponding to the sabotage games, and we show that model checking this logic is PSPACE-complete.
1 Introduction

In some fields of computer science, especially the control of reactive systems, an interesting sort of task arises, concerning temporal changes of a system itself. In contrast to the usual tasks over reactive systems, where movements within a system are considered, an additional process comes into play: the dynamic change of the system itself. Hence we have two different processes: a local movement within the system and a global change of the system. Consider, for example, a network where connections or servers may break down. Some natural questions arise for such a system: is it possible – regardless of the removed connections – to interchange information between two designated servers? Is there a protocol which guarantees that the destination can be reached? Another example of a task of this kind was recently given by van Benthem [1]; it can be described as the real Travelling Salesman Problem: is it possible to find your way between two cities within a railway network where a malevolent demon starts cancelling connections?

As usual, one can model this kind of reactive system as a two-person game, where one player tries to achieve a certain goal given by a winning condition and the other player tries to prevent this. As winning conditions one can consider algorithmic tasks over graphs such as, e.g., reachability, Hamilton path, or complete search. Determining the winner of these games gives us the answers to our original tasks. In this paper we show that solving sabotage games where one player (the Runner) moves along edges in a multi-graph and the other player (the Blocker) removes an edge in each round is PSPACE-hard for the three mentioned winning
conditions. The main aspect of the sabotage game is that the Runner can only act locally, by moving one step further from his actual position, whereas the Blocker can act globally on the arena of the game. So the sabotage game is in fact a match between a local and a global player. This distinguishes the sabotage game from the classical games studied in combinatorial game theory (see [2] for an overview).

In Sect. 2 we introduce the basic notions of the sabotage game. In Sect. 3 we show the PSPACE-hardness for the sabotage game with the reachability condition on undirected graphs by giving a polynomial-time reduction from the PSPACE-complete problem of Quantified Boolean Formulas to these games. In Sect. 4 we give polynomial-time reductions from sabotage games with the reachability condition to the other winning conditions. In the last section we introduce the extension SML of modal logic over transition systems which captures the concept of removing edges, i.e., SML is a modal logic over changing models. We give the syntax and semantics of SML and provide a translation to first-order logic. By applying the results of the first part we show that model checking for this logic is PSPACE-complete.

We would like to thank Johan van Benthem and Peter van Emde Boas for several ideas and comments on the topic.
2 The Sabotage Game

In this section we give the definition of the sabotage game and we recall three algorithmic tasks over graphs which can be considered as winning conditions for this game.

A multi-graph is a pair (V, e) where V is a non-empty, finite set of vertices and e : V × V → N is an edge multiplicity function, i.e., e(u, v) denotes the number of edges between the vertices u and v; e(u, v) = 0 means that u and v are not connected. In the case of an undirected graph we have in addition e(u, v) = e(v, u) for all u, v ∈ V. A single-graph is given by a multiplicity function with e(u, v) ≤ 1 for all vertices u, v ∈ V. The size of a multi-graph (V, e) is given by |V| + |E|, where we set |E| := ∑_{u,v∈V} e(u, v) for directed graphs and |E| := ½ ∑_{u,v∈V} e(u, v) for undirected graphs.

Let (V, e0) be a multi-graph and v0 ∈ V an initial vertex. The two-person sabotage game is played as follows: initially the game arena is A0 = (V, e0, v0). The two players, whom we call Runner and Blocker, move alternately, and the Runner starts his run from vertex v0. At the start of round n the Runner moves one step further along an existing edge of the graph, i.e., if vn is his actual position, he chooses a vn+1 ∈ V with en(vn, vn+1) > 0 and moves to vn+1. Afterwards the Blocker removes one edge of the graph, i.e., he chooses two vertices u and v somewhere in the graph with en(u, v) > 0. In the directed case we define en+1(u, v) := en(u, v) − 1 and en+1(·, ·) := en(·, ·) otherwise. In the undirected case we let en+1(u, v) := en+1(v, u) := en(u, v) − 1. The multi-graph An+1 = (V, en+1, vn+1) becomes the arena for the next round. The game ends if either the Runner cannot make a move, i.e., there is no link starting from his actual position, or if the winning condition is fulfilled.
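These rules are easy to prototype. The following minimal solver for the reachability condition (reach a designated goal vertex) searches all plays exhaustively; since one edge disappears per round, plays have length at most |E|, which also illustrates the PSPACE upper bound mentioned in the abstract. The dictionary encoding of the multi-graph is our own choice:

```python
def one_edge_removals(edges):
    """All graphs the Blocker can produce by removing a single edge."""
    for e, m in edges.items():
        d = dict(edges)
        if m == 1:
            del d[e]
        else:
            d[e] = m - 1
        yield d

def runner_wins(edges, pos, goal):
    """Sabotage reachability on an undirected multi-graph.
    edges: dict {(u, v): multiplicity} with u <= v."""
    if pos == goal:
        return True
    for (u, v), m in edges.items():
        if m > 0 and pos in (u, v):
            nxt = v if pos == u else u
            if nxt == goal:
                return True  # goal reached before the Blocker can move
            # Runner commits to nxt and must survive every Blocker removal
            if all(runner_wins(rest, nxt, goal)
                   for rest in one_edge_removals(edges)):
                return True
    return False
```

On a two-edge path to the goal the Blocker wins by cutting the far edge, while doubling the edge at the goal saves the Runner.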
As winning conditions for the sabotage game on an undirected or directed graph one can consider the usual tasks over graphs, for example:

1. Reachability: the Runner wins iff he can reach a given vertex (which we call the goal).
2. Hamilton Path or Travelling Salesman: the Runner wins iff he can move along a Hamilton path, i.e., he visits each vertex exactly once.
3. Complete Search: the Runner wins iff he can visit each vertex (possibly more than once).

It is easy to see that for the reachability game with one single goal the use of multi-graphs is crucial, but we can bound the multiplicity uniformly by two or, if we allow a second goal vertex, we can even transform every multi-graph game into a single-graph game:

Lemma 1. Let G be a sabotage game with reachability condition on a multi-graph arena A. Then there are games G′, G′′ on arenas A′, A′′ with a size polynomial in the size of A such that the Runner wins G iff he wins G′, resp. G′′, where A′ is a single-graph with two goals and A′′ is a multi-graph with one goal and only single or double edges, the double edges occurring only at the goal.

Proof. We only sketch the proof for directed graphs. To obtain A′ one adds a new goal and replaces each edge between vertices u and v with multiplicity k > 0 by the construction depicted in Fig. 1 (with k new vertices). We actually need a new goal if v is the original goal. The arena A′′ is constructed similarly: if v is not the original goal we apply the same construction (Fig. 1), but reusing the existing goal instead of adding a new one. If v is the goal then we add double edges from the new vertices to v (see Fig. 2). Note that the Blocker does not gain additional moves because all new vertices are directly connected to the goal.
•
···
u •
•
v Fig. 1. Replacement for A
•
•
• 2
2
··· v
2
•
•
2
Fig. 2. Replacement for A
Since edges are only deleted but not added during the play, the following fact is easy to see:

Lemma 2. If the Runner has a winning strategy in the sabotage game with reachability condition, then he can win without visiting any vertex twice.

In the sequel we will introduce several game arenas where we use edges with a multiplicity 'high enough' to ensure that the Blocker cannot win the game
by reducing these edges. In figures these edges are represented by a curly link between two vertices. For the moment we can consider these links to be 'unremovable'. Due to the previous lemma we have: if the Runner can win the reachability game at all, then he can do so within at most |V| − 1 rounds. Hence we can set the multiplicity of the 'unremovable' edges to |V| − 1. To bound the multiplicity of edges uniformly one can apply Lemma 1.
3
PSPACE-Hardness for Sabotage Reachability Games
In this section we prove that the PSPACE-complete problem of Quantified Boolean Formulas (cf. [3]), QBF for short, can be reduced by a polynomial time reduction to sabotage games on undirected graphs with the reachability condition. Let ϕ ≡ ∃x1 ∀x2 ∃x3 . . . Qxn ψ be an instance of QBF, where Q is ∃ for n odd and ∀ otherwise, and ψ is a quantifier-free Boolean formula in conjunctive normal form. We will construct an undirected game arena for a sabotage game Gϕ with a reachability condition such that the Runner has a winning strategy in the game iff the formula ϕ is satisfiable. A reduction like the classical one from QBF to the Geography Game (cf. [3]) does not work here, since the Blocker may destroy connections in a part of the graph which should be visited only later in the game. This could be solved by blowing up the distances, but that approach results in an arena with a size exponential in the size n of ϕ. So we have to restrict the liberty of the Blocker in a more sophisticated way, i.e., to force him to remove edges only 'locally'. The game arena of Gϕ consists of two parts: a chain of n gadgets where first the Runner chooses an assignment for x1, then the Blocker chooses an assignment for x2 before the Runner chooses an assignment for x3, and so on. The second part gives the Blocker the possibility to select one of the clauses of ψ. The Runner must certify that this clause is indeed satisfied by the chosen assignment: he can reach the goal vertex and win the game iff at least one literal in the clause is true under the assignment. Figure 5 shows an example of the sabotage game Gϕ for the formula ϕ ≡ ∃x1 ∀x2 ∃x3 (c1 ∧ c2 ∧ c3 ∧ c4), where we assume that each clause consists of exactly three literals. In the following we describe in detail the several components of Gϕ and their arrangement. The main difficulty of the construction is to take care of the Blocker's opportunity to remove edges anywhere in the graph.

The ∃-Gadget.
The gadget where the Runner chooses an assignment for the xi with i odd is displayed in Fig. 3. We assume that the run reaches this gadget at vertex A for the first time. Vertex B is intended to be the exit. In the complete construction there are also edges from Xi, resp. X̄i, leading to the last gadget of the graph, represented as dotted lines labelled by back. We will see later that taking these edges as a shortcut, i.e., moving from the ∃-gadget directly to the last gadget, is useless for the Runner. The only meaningful direction is coming from the last gadget back to the ∃-gadget. So we temporarily assume that
Solving the Sabotage Game Is PSPACE-Hard

[Fig. 3. ∃-gadget for xi with i odd. Fig. 4. ∀-gadget for xi with i even. Fig. 5. The arena for ∃x1 ∀x2 ∃x3 (c1 ∧ c2 ∧ c3 ∧ c4); types of edges: unremovable link, edge of multiplicity n, single edge.]
536
Christof L¨ oding and Philipp Rohde
the Runner does not take these edges. In the sequel we further assume, due to Lemma 2, that the Runner does not move backwards. The Runner makes his choice simply by moving from A either to the left or to the right: he moves towards Xi if he wants xi to be false, or towards X̄i if he wants xi to be true. We consider only the first case. The Blocker has exactly four steps to remove all the links between Xi and the goal before the Runner reaches this vertex. On the other hand, the Blocker cannot remove edges anywhere else in the graph without losing the game. Why we use four steps here will be clarified later on. If the Runner has reached Xi and moves towards B, then the Blocker has to delete the edge between B and X̄i, since otherwise the Runner could reach the goal this way (there are still four edges left between X̄i and the goal).

The ∀-Gadget. The gadget where the Blocker chooses an assignment for the xi with i even is a little more sophisticated. Figure 4 shows the construction. If the Blocker wants xi to be false he tries to lead the Runner towards Xi. In this case he simply removes the three edges between C and X̄i during the first three steps. Then the Runner has to move across D, and in the meantime the Blocker deletes the four edges between Xi and the goal to ensure that the Runner cannot win directly. As above, he removes in the last step the link between B and X̄i to prevent a premature end of the game. If the Blocker wants to assign true to xi he should lead the Runner towards X̄i. To achieve this aim he removes three of the four links between X̄i and the goal before the Runner reaches C. Nevertheless the Runner has the free choice at vertex C whether he moves towards Xi or towards X̄i, i.e., the Blocker cannot guarantee that the run goes across X̄i. But let us consider the two possible cases: first we assume that the Runner moves as intended and uses an edge between C and X̄i.
In this round the Blocker removes the last link from X̄i to the goal. Then the Runner moves to B and the Blocker deletes the edge from B to Xi. Now assume that the Runner 'misbehaves' and moves from C to D and further towards Xi. Then the Blocker first removes the four edges between Xi and the goal. When the Runner now moves from Xi to B, the Blocker has to take care that the Runner cannot reach the goal via the link between B and X̄i (there is still one edge left from X̄i to the goal). For that he can delete the last link between X̄i and the goal and isolate the goal completely within this gadget.

The Verification Gadget. The last component of the arena is a gadget where the Blocker can choose one of the clauses of the formula ψ. Before we give the representation of this gadget let us explain the idea. If the Blocker chooses the clause c then the Runner can select for his part one literal xi of c. There is an edge back to the ∃-gadget if i is odd, or to the ∀-gadget if i is even, namely to Xi if xi occurs positively in c, resp. to X̄i if xi occurs negatively in c. So if the chosen assignment satisfies ψ, then every clause of ψ contains at least one literal which is true. Since the path through the assignment gadgets visits the opposite truth values, this means that there is at least one edge back to an Xi, resp. X̄i, which itself is connected to the goal by an edge with a multiplicity of four (assuming
that the Runner did not misbehave in the ∀-gadget). Therefore the Runner can reach the goal and wins the game. Conversely, if the chosen assignment does not satisfy ψ, then there is a clause c in ψ such that every literal in c is assigned false. If the Blocker chooses this clause c then every edge back to the assignment gadgets ends in an Xi, resp. X̄i, which is disconnected from the goal. If we show that there is no other way to reach the goal, this means that the Runner loses the game. But we have to be very careful neither to allow any shortcuts for the Runner nor to give the Blocker too much liberty. Figure 5 contains the verification gadget for ψ ≡ c1 ∧ c2 ∧ c3 ∧ c4, where each clause ci has exactly three literals. The curly edges at the bottom of the gadget lead back to the corresponding literals of each clause. The Blocker chooses the clause ck by first removing the edges from Aj to Cj for j < k one after the other. Then he cuts the link between Ak and Ak+1, resp. between Ak and the goal if ck is the last clause. By Lemma 2 it is useless for the Runner to go back, thus he can only follow the given path to Ck. If he reaches this vertex the Blocker must remove the link from Ck to the goal to prevent the win for the opponent. In the next step the Runner selects a literal xi, resp. ¬xi, in ck, moves towards the corresponding vertex and afterwards along the curly edge back to the assignment gadgets as described above. At this point the Blocker has exactly two moves left, i.e., he is allowed to remove two edges somewhere in the graph. But we have: if the 'right' assignment for this literal has been chosen then there are exactly four edges left connecting the corresponding vertex and the goal. So the Blocker does not have the opportunity to isolate the goal and the Runner wins the game. Otherwise, if the 'wrong' assignment has been chosen then there is no link from Xi, resp. X̄i, to the goal left.
Any continuation the Runner could take either leads him back to an already visited vertex (which is a loss by Lemma 2) or, by taking another back-edge in the 'wrong' direction, to another vertex in the verification gadget. We handle the latter case in general: if the Runner uses a shortcut starting from a literal vertex and moves directly to the bottom of the verification gadget, then the Blocker can prevent the continuation of the run by removing the corresponding single edge between the clause vertex Ck and the vertex beneath it, and the Runner has to move back. So the Runner wins the game if and only if he wins it without using any shortcut. If the Runner reaches a vertex Ak and the Blocker removes either the edge between Ak and Ck, or the one between Ck and the goal, or one of the edges leading to the vertices beneath Ck (one for each literal in ck), then the Runner moves towards Ak+1, resp. towards the goal if ck is the last clause. The Runner has to do so since, in the latter two cases, entering the 'damaged' area around Ck could be a disadvantage for him. Finally we consider the case that the Blocker removes an edge somewhere else in the graph instead. This behaviour is only reasonable if the chosen assignment satisfies ψ. So consider the round when the Runner reaches for the first time an Ak such that the edge from Ak to Ak+1, resp. to the goal, as well as all edges connected to Ck, are still left. If ck is the last clause then the Runner just reaches
the goal and wins the game. Otherwise he moves to Ck and chooses an appropriate literal xi, resp. ¬xi, such that at least three edges from the corresponding vertex are still left (at least one literal of this kind exists in each clause). Since Ak is the first vertex with this property, the Blocker has gained only one additional move, so at least one edge from the vertex Xi, resp. X̄i, to the goal nevertheless remains. So if the Runner can choose a satisfying assignment at all, then the Blocker cannot prevent the win for the Runner by this behaviour. This explains the multiplicity of four within the assignment gadgets. This completes the construction of the game Gϕ. Obviously, this construction can be done in polynomial time. Therefore, we obtain the following results.

Lemma 3. The Runner has a winning strategy in the sabotage game Gϕ iff ϕ is satisfiable.

Theorem 4. There is a polynomial time reduction from QBF to sabotage games with reachability winning condition on undirected graphs. In particular, solving these games is PSPACE-hard.

Since each edge of the game Gϕ has an 'intended direction', it is straightforward to check that a similar construction works for directed graphs as well. The construction can also be adapted to prove the PSPACE-hardness of other variants of the game, e.g., if the Blocker is allowed to remove up to n edges in each round for a fixed number n, or if the Blocker removes vertices instead of edges. For the details we refer the reader to [4].
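For contrast with the PSPACE-hardness target, the source problem of the reduction is easy to state operationally. A toy recursive evaluator for QBF instances of the shape used above (strictly alternating quantifiers starting with ∃, ψ in CNF; the literal encoding +i for xi and −i for ¬xi is our own convention):

```python
# Toy QBF evaluator (illustration only, not part of the reduction): decides
# ∃x1 ∀x2 ∃x3 ... ψ with strictly alternating quantifiers starting with ∃,
# where ψ is a CNF given as a list of clauses of integer literals.
def qbf_true(n, clauses, assignment=()):
    if len(assignment) == n:                       # all variables fixed: check ψ
        return all(any((lit > 0) == assignment[abs(lit) - 1] for lit in c)
                   for c in clauses)
    branch = lambda v: qbf_true(n, clauses, assignment + (v,))
    if len(assignment) % 2 == 0:                   # x1, x3, ...: ∃-quantified
        return branch(True) or branch(False)
    return branch(True) and branch(False)          # x2, x4, ...: ∀-quantified

# ∃x1 ∀x2 ((x1 ∨ x2) ∧ (x1 ∨ ¬x2)) is true: set x1 = true.
print(qbf_true(2, [[1, 2], [1, -2]]))   # True
```

The naive evaluator runs in exponential time but only polynomial space — the same profile the sabotage game inherits through the reduction.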
4
The Remaining Winning Conditions
In this section we give polynomial time reductions from sabotage games with reachability condition to the ones with complete search condition and with Hamilton path condition. We only consider games on undirected graphs. Let G be a sabotage game on an undirected arena A = (V, e, v0) with the reachability condition. We present an arena B such that the Runner wins G iff he wins the game G′ on B with the complete search condition iff he wins the game G′′ on B with the Hamilton path condition. To obtain B we add several vertices to A: let m := |V| − 2 and let v1, . . . , vm be an enumeration of all vertices in A except the initial vertex and the goal. We add a sequence P1, . . . , Pm of new vertices to A together with several chains of new vertices such that each chain has length max{|V|, |E|} and its nodes are linked among each other by 'unremovable' edges. We add these chains from Pi as well as from Pi+1 to vertex vi for i < m, and one chain from Pm to vertex vm. Furthermore, we add for i < m shortcuts from the last vertices in the chains between Pi and vi to the last vertices in the chains between Pi+1 and vi, to give the Runner the possibility to skip the visit of vi. Additionally there is one link with multiplicity |V| from P1 to the goal in A, see Fig. 6. If the Runner can reach the goal in the original game G then by Lemma 2 he can do so within at most |V| − 1 steps. In this case there is at least one link
[Fig. 6. Game arena B: the original arena A with start and goal, the vertices v1, v2, v3, the new vertices P1, P2, P3 joined to them by chains of length max{|V|, |E|}, and a link of multiplicity |V| from P1 to the goal.]
to P1, which he uses to reach P1. He follows the chain to v1. If he has already visited v1 on his way to the goal he uses the shortcut at the last vertex in the chain, otherwise he visits v1. Afterwards he moves to P2 using the next chain. Continuing like this he reaches Pm and moves towards the last vertex vm. If he has already visited vm he just stops one vertex before; otherwise he stops at vm. Moving this way he visits each vertex of B exactly once and wins both games G′ and G′′. For the converse: if the Runner cannot reach the goal in G then he cannot do so in the games G′ and G′′ either. If he tries to use a shortcut via some Pi, the Blocker has enough time on the way to Pi to cut all the links between the goal and P1. On the Runner's way back from some Pj to a vertex in A he is able to remove all edges in the original game arena A to isolate the goal completely. Thus the Runner loses both games G′ and G′′ on B. So we have:

Theorem 5. There is a polynomial time reduction from sabotage games with reachability condition to sabotage games with complete search condition, resp. with Hamilton path condition. In particular, solving these games is PSPACE-hard.
5
A Sabotage Modal Logic
In [1] van Benthem considered a 'sabotage modal logic', i.e., a modal logic over changing models, to express tasks corresponding to sabotage games. He introduced a cross-model modality referring to submodels from which objects have been removed. In this section we give a formal definition of a sabotage modal logic with a 'transition-deleting' modality and show how to apply the results of the previous sections to determine the complexity of uniform model checking for this logic. To capture multi-graphs we interpret the logic over edge-labelled transition systems. By applying Lemma 1 the complexity results for the reachability game can be obtained for multi-graphs with a uniformly bounded multiplicity. Hence a finite alphabet Σ suffices.
540
Christof L¨ oding and Philipp Rohde
Definition 6. Let p be a unary predicate symbol and a ∈ Σ. Formulae of the sabotage modal logic SML over transition systems are defined by

ϕ ::= p | ¬ϕ | ϕ ∨ ϕ | ♦a ϕ | ♦̄a ϕ

The dual modality □a and the label-free versions ♦, □ are defined as usual. The modalities □̄a, ♦̄ and □̄ are defined analogously. Let T = (S, {Ra | a ∈ Σ}, L) be a transition system. For t, t′ ∈ S and a ∈ Σ we define the submodel T^a_(t,t′) := (S, {Rb | b ∈ Σ \ {a}} ∪ {Ra \ {(t, t′)}}, L). For a given state s ∈ S the semantics of SML is defined as for usual modal logic together with

(T, s) |= ♦̄a ϕ iff there is (t, t′) ∈ Ra such that (T^a_(t,t′), s) |= ϕ
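The semantics above can be turned into a naive model checker almost verbatim (a sketch; the tuple encoding of formulae and the dict-of-sets representation of the relations Ra are our own ad-hoc choices, and the blow-up on nested sabotage modalities reflects the PSPACE bound rather than avoiding it):

```python
# Naive SML model checker.  Formulae as nested tuples:
# ("prop", "p"), ("not", phi), ("or", phi, psi),
# ("dia", a, phi) for ♦a, and ("sdia", a, phi) for the sabotage modality ♦̄a.
def sml_holds(phi, R, label_of, s):
    op = phi[0]
    if op == "prop":
        return phi[1] in label_of(s)
    if op == "not":
        return not sml_holds(phi[1], R, label_of, s)
    if op == "or":
        return (sml_holds(phi[1], R, label_of, s)
                or sml_holds(phi[2], R, label_of, s))
    if op == "dia":                                   # ♦a: follow an a-edge
        _, a, sub = phi
        return any(sml_holds(sub, R, label_of, t2)
                   for (t1, t2) in R.get(a, ()) if t1 == s)
    if op == "sdia":                                  # ♦̄a: delete one a-edge
        _, a, sub = phi
        for e in set(R.get(a, ())):
            reduced = {b: set(ts) for b, ts in R.items()}
            reduced[a].discard(e)                     # evaluate in the submodel
            if sml_holds(sub, reduced, label_of, s):
                return True
        return False
    raise ValueError(f"unknown operator {op!r}")
```

For example, with R = {"a": {(0, 1)}} and p holding only in state 1, ♦a p holds at 0, but ♦̄a ♦a p does not: deleting the only a-edge destroys the witness.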
For a transition system T let T̂ be the corresponding FO-structure. Similar to usual modal logic, one can translate SML into first-order logic. Since FO-model checking is in PSPACE we obtain (see [4] for a proof):

Theorem 7. For every SML-formula ϕ there is an effectively constructible FO-formula ϕ̂(x) such that for every transition system T and state s of T one has (T, s) |=SML ϕ iff T̂ |=FO ϕ̂[s]. The size of ϕ̂(x) is polynomial in the size of ϕ. In particular, SML-model checking is in PSPACE.

We can express the winning of the Runner in the sabotage game G on directed graphs with the reachability condition by an SML-formula. For that we consider the game arena as a transition system T(G) such that the multiplicity of edges is captured by the edge labelling and such that the goal vertex of the game is viewed as the only state with predicate p. We inductively define the SML-formulae γi by γ0 := p and γi+1 := (♦□̄γi) ∨ p. Then we obtain the following lemma (see [4] for a proof) and, in combination with Theorem 4, the PSPACE-completeness of SML model checking.

Lemma 8. The Runner has a winning strategy from vertex s in the sabotage game G iff (T(G), s) |= γn, where n is the number of edges of the game arena.

Theorem 9. Model checking for the sabotage logic SML is PSPACE-complete.
References

1. van Benthem, J.: An essay on sabotage and obstruction. In Hutter, D., Werner, S., eds.: Festschrift in Honour of Prof. Jörg Siekmann. LNAI. Springer (2002)
2. Demaine, E.D.: Playing games with algorithms: Algorithmic combinatorial game theory. In: Proceedings of MFCS 2001. Volume 2136 of LNCS, Springer (2001) 18–32
3. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley (1994)
4. Löding, C., Rohde, P.: Solving the sabotage game is PSPACE-hard. Technical Report AIB-05-2003, RWTH Aachen (2003)
The Approximate Well-Founded Semantics for Logic Programs with Uncertainty

Yann Loyer¹ and Umberto Straccia²

¹ PRiSM, Université de Versailles, 45 Avenue des Etats-Unis, 78035 Versailles, France
² I.S.T.I. - C.N.R., Via G. Moruzzi 1, I-56124 Pisa, Italy
Abstract. The management of uncertain information in logic programs becomes important whenever the real world information to be represented is of imperfect nature and the classical crisp {true, false} approximation is not adequate. A general framework, called Parametric Deductive Databases with Uncertainty (PDDU) [10], was proposed as a unifying umbrella for many existing approaches towards the manipulation of uncertainty in logic programs. We extend PDDU with (non-monotonic) negation, a well-known and important feature of logic programs. We show that, dealing with uncertain and incomplete knowledge, atoms should be assigned only approximations of uncertainty values, unless some assumption is used to complete the knowledge. We rely on the closed world assumption to infer as much default "false" knowledge as possible. Our approach also leads to novel characterizations, both epistemic and operational, of the well-founded semantics in PDDU, and preserves the continuity of the immediate consequence operator, a major feature of the classical PDDU framework.
1
Introduction
[B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 541–550, 2003. © Springer-Verlag Berlin Heidelberg 2003]

The management of uncertainty within deduction systems is an important issue whenever the real world information to be represented is of imperfect nature. In logic programming, the problem has attracted the attention of many researchers and numerous frameworks have been proposed. Essentially, they differ in the underlying notion of uncertainty (e.g. probability theory [9,13,14,15], fuzzy set theory [16,17,19], multivalued logic [7,8,10], possibilistic logic [2]) and in how uncertainty values, associated to rules and facts, are managed. Lakshmanan and Shiri have recently proposed a general framework [10], called Parametric Deductive Databases with Uncertainty (PDDU), that captures and generalizes many of the preceding approaches. In [10], a rule is of the form A ←^α B1, ..., Bn. Computationally, given an assignment I of certainties to the Bi, the certainty of A is computed by taking the "conjunction" of the certainties I(Bi) and then somehow "propagating" it to the rule head, taking into account the certainty α of the implication. However, despite its generality, one fundamental issue that remains unaddressed in PDDU is non-monotonic negation, a well-known and important feature in logic programming. In this paper, we extend PDDU [10] to normal logic programs, i.e. logic programs with negation. In order to deal with knowledge that is usually not only uncertain but also incomplete, we believe that one should rely on approximations of uncertainty values only. Then we study the problem of assigning a semantics to a normal logic program in such
542
Yann Loyer and Umberto Straccia
a framework. We first consider the least model and show that it extends the Kripke-Kleene semantics [4] from Datalog programs to normal logic programs, but that it is usually too weak. We then explain how one should try to determine approximations as precise as possible, by completing the program's knowledge with a kind of default reasoning based on the well-known Closed World Assumption (CWA). Our approach consists in determining how much knowledge "extracted" from the CWA can "safely" be used to "complete" a logic program. Our approach leads to novel characterizations, both epistemic and operational, of the well-founded semantics [3] for logic programs and extends that semantics to PDDU. Moreover we show that the continuity of the immediate consequence operator, used for inferring information from the program, is preserved. This is important as it is a major feature of classical PDDU, as opposed to classical frameworks like [8]. Negation has already been considered in some deductive databases with uncertainty frameworks. In [13,14], the stable semantics has been considered, but limited to the case where the underlying uncertainty formalism is probability theory. That semantics has been considered also in [19], where a semi-possibilistic logic has been proposed, a particular negation operator has been introduced, and a fixed min/max-evaluation of conjunction and disjunction is adopted. To the best of our knowledge, there is no work dealing with default negation within PDDU, except our previous attempt [11]. The semantics defined in [11] is weaker than the one presented in this paper, as in the latter approach more knowledge can be extracted from a program; moreover, [11] has no epistemic characterization and relies on a less natural treatment of negation. In the remainder, we proceed as follows. In the following section, the syntax of PDDU with negation, called normal parametric programs, is given; Section 3 contains the definitions of interpretation and model of a program.
In Section 4, we present the fundamental notion of support of a program provided by the CWA with respect to (w.r.t.) an interpretation. Then we propose novel characterizations of the well-founded semantics and compare our approach with the usual semantics. Section 5 concludes.
2
Preliminaries
Consider an arbitrary first-order language that contains infinitely many variable symbols, finitely many constants and predicate symbols, but no function symbols. The predicate symbol π(A) of an atomic formula A given by A = p(X1, . . . , Xn) is defined by π(A) = p. The truth-space is given by a complete lattice: atomic formulae are mapped into elements of a certainty lattice L = ⟨T, ⪯, ⊗, ⊕⟩ (a complete lattice), where T is the set of certainty values, ⪯ is a partial order, and ⊗ and ⊕ are the meet and join operators, respectively. With ⊥ and ⊤ we denote the least and greatest element in T. With B(T) we denote the set of finite multisets (denoted {| · |}) over T. For instance, a typical certainty lattice is L[0,1] = ⟨T, ⪯, ⊗, ⊕⟩, where T = [0, 1], α ⪯ β iff α ≤ β, α ⊗ β = min(α, β), α ⊕ β = max(α, β), ⊥ = 0 and ⊤ = 1. While the language does not contain function symbols, it contains symbols for families of conjunction (Fc), propagation (Fp) and disjunction functions (Fd), called combination functions. Roughly, as we will see below, the conjunction function (e.g. ⊗) determines the certainty of the conjunction of L1, ..., Ln (the body) of a logic program rule like A ←^α L1, ..., Ln; a propagation function (e.g. ⊗) determines how to "propagate" the certainty, resulting from the evaluation of the body L1, ..., Ln, to the head A, by taking into account the certainty α of the implication,
The Approximate Well-Founded Semantics for Logic Programs with Uncertainty
543
while the disjunction function (e.g. ⊕) dictates how to combine the certainties in case an atom appears in the heads of several rules (it evaluates a disjunction). Examples of conjunction, propagation and disjunction functions over L[0,1] are fc(x, y) = min(x, y), fp(x, y) = xy and fd(x, y) = x + y − xy. Formally, a propagation function is a mapping from T × T to T, and a conjunction or disjunction function is a mapping from B(T) to T. Each combination function is monotonic and continuous w.r.t. each one of its arguments. Conjunction and disjunction functions are commutative and associative. Additionally, each kind of function must verify some of the following properties¹: (i) bounded-above: f(α1, α2) ⪯ αi, for i = 1, 2, ∀α1, α2 ∈ T; (ii) bounded-below: f(α1, α2) ⪰ αi, for i = 1, 2, ∀α1, α2 ∈ T; (iii) f({α}) = α, ∀α ∈ T; (iv) f(∅) = ⊥; (v) f(∅) = ⊤; and (vi) f(α, ⊤) = α, ∀α ∈ T. The following should be satisfied: a conjunction function in Fc should satisfy properties (i), (iii), (v) and (vi); a propagation function in Fp should satisfy properties (i) and (vi); a disjunction function in Fd should satisfy properties (ii), (iii) and (iv). We also assume that there is a function from T to T, called negation function, denoted ¬, that is anti-monotone w.r.t. ⪯ and satisfies ¬¬α = α, ∀α ∈ T. E.g., in L[0,1], ¬α = 1 − α is quite typical. Finally, a literal is an atomic formula or its negation.

Definition 1 (Normal Parametric Program [10]). A normal parametric program P (np-program) is a 5-tuple ⟨L, R, C, P, D⟩, whose components are defined as follows: (i) L = ⟨T, ⪯, ⊗, ⊕⟩ is a complete lattice, where T is a set of certainties partially ordered by ⪯, ⊗ is the meet operator and ⊕ the join operator; (ii) R is a finite set of normal parametric rules (np-rules), each of which is a statement of the form r : A ←^αr L1, ..., Ln, where A is an atomic formula, L1, ..., Ln are literals or values in T, and αr ∈ T \ {⊥} is the certainty of the rule; (iii) C maps each np-rule to a conjunction function in Fc; (iv) P maps each np-rule to a propagation function in Fp; (v) D maps each predicate symbol in P to a disjunction function in Fd.
For ease of presentation, we write r : A ←^αr L1, ..., Ln; ⟨fd, fp, fc⟩ to represent an np-rule in which fd ∈ Fd is the disjunction function associated with π(A), and fc ∈ Fc and fp ∈ Fp are respectively the conjunction and propagation functions associated with r. Note that, by Definition 1, rules with the same head must have the same associated disjunction function. The following example illustrates the notion of np-program.

Example 1. Consider an insurance company, which has information about its customers used to determine the risk coefficient of each customer. The company has: (i) data grouped into a set F of facts; and (ii) a set R of rules. Suppose the company has the following database (which is an np-program P = F ∪ R), where a value of the risk coefficient may already be known, but has to be re-evaluated (the client may be a new client and his risk coefficient is given by his previous insurance company). The certainty lattice is L[0,1], with fp(x, y) = xy.
F = { Experience(John) ←^1 0.7 ⟨⊕, fp, ⊗⟩,
      Risk(John) ←^1 0.5 ⟨⊕, fp, ⊗⟩,
      Sport car(John) ←^1 0.8 ⟨⊕, fp, ⊗⟩ }

¹ For simplicity, we formulate the properties treating any function as a binary function on T.
R=
1 Good driver(X) ← Experience(X), ¬Risk(X) ⊕, ⊗, ⊗ 0.8 Risk(X)
Risk(X) Risk(X)
← Young(X), ⊕, fp , ⊗
0.8
← Sport car(X) ⊕, fp , ⊗ 1
← Experience(X), ¬Good driver(X) ⊕, fp , ⊗
Using another disjunction function for the rules with head Risk, such as fd(x, y) = x + y − xy, might have been more appropriate in this example (i.e. we would accumulate the risk factors, rather than take the max only), but we will use ⊕ in order to facilitate the reader's comprehension later on when we compute the semantics of P. We further define the Herbrand base BP of an np-program P as the set of all instantiated atoms corresponding to atoms appearing in P, and define P* to be the Herbrand instantiation of P, i.e. the set of all ground instantiations of the rules in P (P* is finite). Note that a Datalog program with negation P is equivalent to the np-program constructed by replacing each rule in P of the form A ← L1, ..., Ln by the rule A ←^t L1, ..., Ln; ⟨⊕, ⊗, ⊗⟩, where the classical certainty lattice L{t,f} = ⟨T, ⪯, ⊗, ⊕⟩ is considered, with T = {t, f}, ⪯ defined by f ⪯ t, ⊕ = max, ⊗ = min, ¬f = t and ¬t = f, ⊥ = f and ⊤ = t.
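As a quick numeric check on Example 1 (an illustration only; the two rules involving negation are deliberately left out, since their treatment is the subject of the rest of the paper), the combination functions over L[0,1] can be exercised directly:

```python
# Combination functions of Example 1 over L_[0,1]:
# fc = min (the meet ⊗), fp(x, y) = x*y, and the disjunction for Risk is ⊕ = max.
def fp(alpha, body):
    return alpha * body

facts = {"Experience(John)": 0.7, "Risk(John)": 0.5, "Sport car(John)": 0.8}

# Two negation-free contributions to Risk(John):
from_fact = fp(1.0, 0.5)                                   # the stored fact, 0.5
from_sport_car = fp(0.8, min([facts["Sport car(John)"]]))  # 0.8 * 0.8 ≈ 0.64
risk_lower_bound = max(from_fact, from_sport_car)
print(round(risk_lower_bound, 2))   # 0.64
```

With the accumulating disjunction fd(x, y) = x + y − xy mentioned above, the same two contributions would combine to 0.5 + 0.64 − 0.5·0.64 = 0.82 instead.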
3
Interpretations of Programs
The semantics of a program P is determined by selecting a particular interpretation of P in the set of models of P, where an interpretation I of an np-program P is a function that assigns to all atoms of the Herbrand base of P a value in T. In Datalog programs, as well as in PDDU, the chosen model is usually the least model of P w.r.t. ⪯.² Unfortunately, the introduction of negation may have the consequence that some logic programs do not have a unique minimal model, as shown in the following example.

Example 2. Consider the certainty lattice L[0,1] and the program P = {(A ← ¬B), (B ← ¬A), (A ← 0.2), (B ← 0.3)}. Informally, an interpretation I is a model of the program if it satisfies every rule, while I satisfies a rule X ← Y if I(X) ⪰ I(Y).³ So this program has an infinite number of models Ixy, where 0.2 ⪯ x ⪯ 1, 0.3 ⪯ y ⪯ 1, y ≥ 1 − x, Ixy(A) = x and Ixy(B) = y. There is also an infinite number of minimal models: the minimal models Ixy are exactly those with y = 1 − x.

Concerning the previous example we may note that the certainty of A in the minimal models is in the interval [0.2, 0.7], while for B the interval is [0.3, 0.8]. An obvious question is: what should be the answer to a query A to the program proposed in Example 2? There are at least two answers: (i) the certainty of A is undefined, as there is no unique minimal model. This is clearly a conservative approach, which in case of ambiguity prefers to leave A unspecified; (ii) the certainty of A is in [0.2, 0.7], which means that even if there is no unique value for A, in all minimal models the certainty of A is in [0.2, 0.7]. In this approach we still try to provide some information. Of course, some
² ⪯ is extended to the set of interpretations as follows: I ⪯ J iff for all atoms A, I(A) ⪯ J(A).
³ Roughly, X ← Y dictates that "X should be at least as true as Y".
care should be used. Indeed from I(A) ∈ [0.2, 0.7] and I(B) ∈ [0.3, 0.8] we should not conclude that I(A) = 0.2 and I(B) = 0.3 is a model of the program. Applying a usual approach, like the well-founded semantics [18] or the Kripke-Kleene semantics [4], would lead us to choose the conservative solution 1. This was also the approach in our early attempt to deal with normal parametric programs [11]. Such a semantics seems to be too weak, in the sense that it loses some knowledge (e.g. the value of A should be at least 0.2). In this paper we address solution 2. To this end, we propose to rely on T × T . Any element of T × T is denoted by [a; b] and interpreted as an interval on T , i.e. [a; b] is interpreted as the set of elements x ∈ T such that a x b. For instance, turning back to Example 2 above, in the intended model of P , the certainty of A is “approximated” with [0.2; 0.7], i.e. the certainty of A lies in between 0.2 and 0.7 (similarly for B). Formally, given a complete lattice L = T , , ⊗, ⊕, we construct a bilattice over T ×T , according to a well-known construction method (see [3,6]). We recall that a bilattice is a triple B, t , k , where B is a nonempty set and t , k are both partial orderings giving to B the structure of a lattice with a top and a bottom [6]. We consider B = T ×T with orderings: (i) the truth ordering t , where [a1 ; b1 ] t [a2 ; b2 ] iff a1 a2 and b1 b2 ; and (ii) the knowledge ordering k , where [a1 ; b1 ] k [a2 ; b2 ] iff a1 a2 and b2 b1 . The intuition of those orders is that truth increases if the interval contains greater values (e.g. [0.1; 0.4] t [0.2; 0.5]), whereas the knowledge increases when the interval (i.e. in our case the approximation of a certainty value) becomes more precise (e.g. [0.1; 0.4] k [0.2; 0.3], i.e. we have more knowledge). The least and greatest elements of T × T are respectively (i) f = [⊥; ⊥] (false) and t = [; ] (true), w.r.t. t ; and (ii) ⊥ = [⊥; ] (unknown – the less precise interval, i.e. 
the atom’s certainty value is unknown) and ⊤ = [⊤; ⊥] (inconsistent – the empty interval) w.r.t. ⪯k. The meet, join and negation on T × T w.r.t. both orderings are defined by extending the meet, join and negation from T to T × T in the natural way: let [a1; b1], [a2; b2] ∈ T × T, then
– [a1; b1] ⊗t [a2; b2] = [a1 ⊗ a2; b1 ⊗ b2] and [a1; b1] ⊕t [a2; b2] = [a1 ⊕ a2; b1 ⊕ b2];
– [a1; b1] ⊗k [a2; b2] = [a1 ⊗ a2; b1 ⊕ b2] and [a1; b1] ⊕k [a2; b2] = [a1 ⊕ a2; b1 ⊗ b2];
– ¬[a1; b1] = [¬b1; ¬a1].
⊗t and ⊕t (resp. ⊗k and ⊕k) denote the meet and join operations on T × T w.r.t. the truth (resp. knowledge) ordering. For instance, taking L[0,1], [0.1; 0.4] ⊕t [0.2; 0.5] = [0.2; 0.5], [0.1; 0.4] ⊗t [0.2; 0.5] = [0.1; 0.4], [0.1; 0.4] ⊕k [0.2; 0.5] = [0.2; 0.4], [0.1; 0.4] ⊗k [0.2; 0.5] = [0.1; 0.5] and ¬[0.1; 0.4] = [0.6; 0.9]. Finally, we extend in a similar way the combination functions from T to T × T. Let fc (resp. fp and fd) be a conjunction (resp. propagation and disjunction) function over T and [a1; b1], [a2; b2] ∈ T × T: (i) fc([a1; b1], [a2; b2]) = [fc(a1, a2); fc(b1, b2)]; (ii) fp([a1; b1], [a2; b2]) = [fp(a1, a2); fp(b1, b2)]; and (iii) fd([a1; b1], [a2; b2]) = [fd(a1, a2); fd(b1, b2)]. It is easy to verify that these extended combination functions preserve the original properties of combination functions. The following theorem holds.

Theorem 1. Consider T × T with the orderings ⪯t and ⪯k. Then (i) ⊗t, ⊕t, ⊗k, ⊕k and the extensions of combination functions are continuous (and, thus, monotonic) w.r.t. ⪯t and ⪯k; (ii) any extended negation function is monotonic w.r.t. ⪯k; and (iii) if the negation function satisfies the de Morgan laws, i.e. ∀a, b ∈ T. ¬(a ⊕ b) = ¬a ⊗ ¬b, then the extended negation function is continuous w.r.t. ⪯k.
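To make the operations concrete, here is a small Python sketch (ours, not from the paper) of the bilattice operations over the certainty lattice L[0,1], where ⊗ = min, ⊕ = max and ¬x = 1 − x; the printed values reproduce the worked examples above.

```python
# Intervals [a; b] over L[0,1] are modeled as pairs (a, b).

def meet_t(x, y):  # [a1;b1] ⊗t [a2;b2] = [a1⊗a2; b1⊗b2]
    return (min(x[0], y[0]), min(x[1], y[1]))

def join_t(x, y):  # [a1;b1] ⊕t [a2;b2] = [a1⊕a2; b1⊕b2]
    return (max(x[0], y[0]), max(x[1], y[1]))

def meet_k(x, y):  # [a1;b1] ⊗k [a2;b2] = [a1⊗a2; b1⊕b2]  (less knowledge: wider interval)
    return (min(x[0], y[0]), max(x[1], y[1]))

def join_k(x, y):  # [a1;b1] ⊕k [a2;b2] = [a1⊕a2; b1⊗b2]  (more knowledge: tighter interval)
    return (max(x[0], y[0]), min(x[1], y[1]))

def neg(x):        # ¬[a;b] = [¬b; ¬a]
    return (1 - x[1], 1 - x[0])

# The examples from the text:
print(join_t((0.1, 0.4), (0.2, 0.5)))  # (0.2, 0.5)
print(meet_t((0.1, 0.4), (0.2, 0.5)))  # (0.1, 0.4)
print(join_k((0.1, 0.4), (0.2, 0.5)))  # (0.2, 0.4)
print(meet_k((0.1, 0.4), (0.2, 0.5)))  # (0.1, 0.5)
print(neg((0.1, 0.4)))                 # (0.6, 0.9)
```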
Yann Loyer and Umberto Straccia
Proof: We prove only the last item, as the others are immediate. Consider a chain of intervals x0 ⪯k x1 ⪯k . . ., where xj = [aj; bj] with aj, bj ∈ T. To show the continuity of the extended negation function w.r.t. ⪯k, we show that ¬⊕k_{j≥0} xj = ⊕k_{j≥0} ¬xj: ¬⊕k_{j≥0} xj = ¬[⊕_{j≥0} aj; ⊗_{j≥0} bj] = [¬⊗_{j≥0} bj; ¬⊕_{j≥0} aj] = [⊕_{j≥0} ¬bj; ⊗_{j≥0} ¬aj] = ⊕k_{j≥0} [¬bj; ¬aj] = ⊕k_{j≥0} ¬[aj; bj] = ⊕k_{j≥0} ¬xj. We can now extend interpretations over T to the above specified “interval” bilattice.

Definition 2 (Approximate Interpretation). Let P be an np-program. An approximate interpretation of P is a total function I from the Herbrand base BP to the set T × T. The set of all the approximate interpretations of P is denoted CP.

Intuitively, assigning the logical value [a; b] to an atom A means that the exact certainty value of A lies in between a and b with respect to ⪯. Our goal will be to determine for each atom of the Herbrand base of P the most precise interval that can be inferred. First, we extend the two orderings on T × T to the set of approximate interpretations CP in the usual way: let I1 and I2 be in CP, then (i) I1 ⪯t I2 iff I1(A) ⪯t I2(A), for all ground atoms A; and (ii) I1 ⪯k I2 iff I1(A) ⪯k I2(A), for all ground atoms A. Under these two orderings CP becomes a complete bilattice. The meet and join operations over T × T for both orderings are extended to CP in the usual way (e.g. for any atom A, (I ⊕k J)(A) = I(A) ⊕k J(A)). Negation is extended similarly: for any atom A, ¬I(A) = I(¬A), and approximate interpretations are extended to T : for any α ∈ T, I(α) = [α; α]. Second, we identify the models of a program. The definition extends the one given in [10] to intervals.

Definition 3 (Models of a Logic Program). Let P be an np-program and let I be an approximate interpretation of P.
1. I satisfies a ground np-rule r : A ←^{αr} L1, ..., Ln; fd, fp, fc in P, denoted |=I r, iff fp([αr; αr], fc({|I(L1), . . . , I(Ln)|})) ⪯t I(A);
2. I is a model of P, or I satisfies P, denoted |=I P, iff for all atoms A ∈ BP, fd(X) ⪯t I(A), where fd is the disjunction function associated with π(A) and X = {|fp([αr; αr], fc({|I(L1), . . . , I(Ln)|})) : A ←^{αr} L1, ..., Ln; fd, fp, fc ∈ P∗|}.

Third, among all possible models of an np-program, we have now to specify which one is the intended model. The characterization of that model will require the definition of an immediate consequence operator that will be used to infer knowledge from a program. That operator is a simple extension from T to T × T of the immediate consequence operator defined in [10] to give semantics to classical PDDU.

Definition 4. Let P be any np-program. The immediate consequence operator TP is a mapping from CP to CP, defined as follows: for every interpretation I and every ground atom A, TP(I)(A) = fd(X), where fd is the disjunction function associated with π(A) and X = {|fp([αr; αr], fc({|I(L1), . . . , I(Ln)|})) : A ←^{αr} L1, ..., Ln; fd, fp, fc ∈ P∗|}.

Note that from the property (iv) of combination functions satisfied by all disjunction functions, it follows that if an atom A does not appear as the head of a rule, then TP(I)(A) = f. Note also that any fixpoint of TP is a model of P. We have

Theorem 2. For any np-program P, TP is monotonic and, if the de Morgan laws hold, continuous w.r.t. ⪯k.
Proof: The proof of monotonicity is easy. To prove the continuity w.r.t. ⪯k, consider a chain of interpretations I0 ⪯k I1 ⪯k . . .. We show that for any A ∈ BP, TP(⊕k_{j≥0} Ij)(A) = ⊕k_{j≥0} TP(Ij)(A) (Eq. 1). As CP is a complete lattice, the sequence I0 ⪯k I1 ⪯k . . . has a least upper bound, say Ī = ⊕k_{j≥0} Ij. For any B ∈ BP, we have ⊕k_{j≥0} Ij(B) = Ī(B) and, from Theorem 1, ⊕k_{j≥0} Ij(¬B) = ⊕k_{j≥0} ¬Ij(B) = ¬⊕k_{j≥0} Ij(B) = ¬Ī(B); thus, for any literal or certainty value L, ⊕k_{j≥0} Ij(L) = Ī(L) (Eq. 2). Now, consider the finite set (P∗ is finite) of all ground rules r1, . . . , rk having A as head, where ri = A ←^{αi} L^i_1, . . . , L^i_{ni}; fd, f^i_p, f^i_c. Let us evaluate the left hand side of Equation (1): TP(⊕k_{j≥0} Ij)(A) = TP(Ī)(A) = fd({|f^i_p([αi; αi], f^i_c({|Ī(L^i_1), . . . , Ī(L^i_{ni})|})) : 1 ≤ i ≤ k|}). On the other side, ⊕k_{j≥0} TP(Ij)(A) = ⊕k_{j≥0} fd({|f^i_p([αi; αi], f^i_c({|Ij(L^i_1), . . . , Ij(L^i_{ni})|})) : 1 ≤ i ≤ k|}). But fd, f^i_p and f^i_c are continuous and, thus, by Equation (2), ⊕k_{j≥0} TP(Ij)(A) = fd({|⊕k_{j≥0} f^i_p([αi; αi], f^i_c({|Ij(L^i_1), . . . , Ij(L^i_{ni})|})) : 1 ≤ i ≤ k|}) = fd({|f^i_p([αi; αi], ⊕k_{j≥0} f^i_c({|Ij(L^i_1), . . . , Ij(L^i_{ni})|})) : 1 ≤ i ≤ k|}) = fd({|f^i_p([αi; αi], f^i_c({|⊕k_{j≥0} Ij(L^i_1), . . . , ⊕k_{j≥0} Ij(L^i_{ni})|})) : 1 ≤ i ≤ k|}) = fd({|f^i_p([αi; αi], f^i_c({|Ī(L^i_1), . . . , Ī(L^i_{ni})|})) : 1 ≤ i ≤ k|}). Therefore, Equation (1) holds and, thus, TP is continuous.
4 Semantics of Normal Logic Programs
Usually, the semantics of a normal logic program is the least model of the program w.r.t. the knowledge ordering. That model always exists and coincides with the least fixed-point of TP with respect to ⪯k (which exists as TP is monotonic w.r.t. ⪯k). Note that this least model with respect to ⪯k corresponds to an extension of the classical Kripke-Kleene semantics [4] of Datalog programs with negation to normal parametric programs: if we restrict our attention to Datalog with negation, then we have to deal with four values [f; f], [t; t], [f; t] and [t; f] that correspond to the truth values false, true, unknown and inconsistent, respectively. Then, our bilattice coincides with Belnap’s logic [1] and for any Datalog program with negation P, the least fixed-point of TP w.r.t. ⪯k is a model of P that coincides with the Kripke-Kleene semantics of P. To illustrate the different notions introduced in the paper, we rely on Example 3.

Example 3 (Running example). The certainty lattice is L[0,1] and the np-program is

P = {(A ←^1 B, 0.6; ⊕, ⊗, ⊗), (B ←^1 B; ⊕, ⊗, ⊗), (A ←^1 0.3; ⊕, ⊗, ⊗)}.
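A possible Python encoding of TP for this program (a hypothetical sketch, not the authors' code; representing each rule as a (head, rule certainty, body) triple is our assumption). Over L[0,1] we take ⊕ = max and ⊗ = min; iterating TP from I⊥, which maps every atom to [0; 1], yields the Kripke-Kleene semantics KKP computed in the text.

```python
# Each rule: (head, certainty alpha_r, body literals); a float constant c in a
# body stands for the interval [c; c].
RULES = [
    ("A", 1.0, ["B", 0.6]),
    ("B", 1.0, ["B"]),
    ("A", 1.0, [0.3]),
]
ATOMS = ["A", "B"]

def val(I, lit):
    return (lit, lit) if isinstance(lit, float) else I[lit]

def conj(ivs):   # fc = min, componentwise on intervals
    return (min(a for a, _ in ivs), min(b for _, b in ivs))

def prop(x, y):  # fp = min, componentwise
    return (min(x[0], y[0]), min(x[1], y[1]))

def tp(I):       # TP(I)(A) = fd over all rules with head A (fd = max); no rule -> f = [0; 0]
    J = {}
    for A in ATOMS:
        cs = [prop((alpha, alpha), conj([val(I, l) for l in body]))
              for head, alpha, body in RULES if head == A]
        J[A] = (max(a for a, _ in cs), max(b for _, b in cs)) if cs else (0.0, 0.0)
    return J

# Kripke-Kleene semantics: iterate TP from the k-least interpretation [0; 1].
I = {A: (0.0, 1.0) for A in ATOMS}
while True:
    J = tp(I)
    if J == I:
        break
    I = J
print(I)  # {'A': (0.3, 0.6), 'B': (0.0, 1.0)}
```

The fixpoint matches KKP = {A: [0.3; 0.6], B: [0; 1]} from the text; note that both placements of the 0.6 (as a body constant or as the rule certainty) yield the same intervals on this example.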
For ease of presentation, we represent an interpretation as a set of expressions of the form A: [x; y], where A is a ground atom, indicating that I(A) = [x; y]. E.g. the following sequence of interpretations I0, I1, I2 shows how the Kripke-Kleene semantics, KKP, of the running Example 3 is computed (as the iterated fixed-point of TP, starting from I0 = I⊥, the ⪯k-minimal interpretation that maps any A ∈ BP to [⊥; ⊤], and In+1 = TP(In)): I0 = {A: [0; 1], B: [0; 1]}, I1 = {A: [0.3; 0.6], B: [0; 1]}, I2 = I1 = KKP. In that model, which is minimal w.r.t. ⪯k and contains only the knowledge provided by P, the certainty of B lies between 0 and 1, i.e. is unknown, and the certainty of A then lies between 0.3 and 0.6. As is well known, that semantics is usually considered too weak. We propose to consider the Closed World Assumption (CWA) to complete our knowledge (the CWA assumes that all atoms whose value cannot be inferred from the program are false by default). This is done by defining the notion of support, introduced
in [12], of a program w.r.t. an interpretation. Given a program P and an interpretation I, the support of P w.r.t. I, denoted CP(I), determines in a principled way how much false knowledge, i.e. how much knowledge provided by the CWA, can “safely” be joined to I w.r.t. the program P. Roughly speaking, a part of the CWA is an interpretation J such that J ⪯k If, where If maps any A ∈ BP to [⊥; ⊥], and we consider that such an interpretation can be safely added to I if J ⪯k TP(I ⊕k J), i.e. if J does not contradict the knowledge represented by P and I.

Definition 5. The support of an np-program P w.r.t. an interpretation I, denoted CP(I), is the maximal interpretation J w.r.t. ⪯k such that J ⪯k If and J ⪯k TP(I ⊕k J).

It is easy to note that CP(I) = ⊕k{J | J ⪯k If and J ⪯k TP(I ⊕k J)}. The following theorem provides an algorithm for computing the support.

Theorem 3. CP(I) coincides with the iterated fixpoint of the function FP,I, beginning the computation with If, where FP,I(J) = If ⊗k TP(I ⊕k J).

From Theorems 1 and 2, it can be shown that FP,I is monotone and, if the de Morgan laws hold, continuous w.r.t. ⪯k. It follows that the iteration of the function FP,I starting from If decreases w.r.t. ⪯k. We will refer to CP as the closed world operator.

Corollary 1. Let P be an np-program. The closed world operator CP is monotone and, if the de Morgan laws hold, continuous w.r.t. the knowledge order ⪯k.

The following sequence of interpretations J0, J1, J2 shows the computation of CP(KKP), i.e. the additional knowledge that can be considered using the CWA on the Kripke-Kleene semantics KKP of the running Example 3 (I = KKP, J0 = If and Jn+1 = FP,I(Jn)): J0 = {A: [0; 0], B: [0; 0]}, J1 = {A: [0; 0.3], B: [0; 0]}, J2 = J1 = CP(KKP). CP(KKP) asserts that, according to the CWA and w.r.t. P and KKP, the certainty of A should be at most 0.3, while that of B is exactly 0.
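The iterated-fixpoint computation of Theorem 3 can be sketched as follows (a hypothetical Python encoding, ours; it reuses a (head, certainty, body) rule representation for the running example, with fc = fp = min and fd = max over L[0,1]).

```python
RULES = [("A", 1.0, ["B", 0.6]), ("B", 1.0, ["B"]), ("A", 1.0, [0.3])]
ATOMS = ["A", "B"]

def join_k(x, y): return (max(x[0], y[0]), min(x[1], y[1]))   # ⊕k
def meet_k(x, y): return (min(x[0], y[0]), max(x[1], y[1]))   # ⊗k

def tp(I):  # immediate consequence operator on interval interpretations
    J = {}
    for A in ATOMS:
        cs = []
        for head, alpha, body in RULES:
            if head != A:
                continue
            ivs = [(l, l) if isinstance(l, float) else I[l] for l in body]
            fc = (min(a for a, _ in ivs), min(b for _, b in ivs))   # fc = min
            cs.append((min(alpha, fc[0]), min(alpha, fc[1])))       # fp = min
        J[A] = (max(a for a, _ in cs), max(b for _, b in cs)) if cs else (0.0, 0.0)
    return J

def support(I):  # iterated fixpoint of F_{P,I}(J) = I_f ⊗k TP(I ⊕k J), from I_f
    I_f = {A: (0.0, 0.0) for A in ATOMS}   # CWA: everything false
    J = I_f
    while True:
        T = tp({A: join_k(I[A], J[A]) for A in ATOMS})
        J2 = {A: meet_k(I_f[A], T[A]) for A in ATOMS}
        if J2 == J:
            return J
        J = J2

KK = {"A": (0.3, 0.6), "B": (0.0, 1.0)}    # Kripke-Kleene semantics of P
print(support(KK))  # {'A': (0.0, 0.3), 'B': (0.0, 0.0)}
```

The result reproduces CP(KKP) = {A: [0; 0.3], B: [0; 0]} from the text.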
We now have two ways to infer information from an np-program P and an approximate interpretation I: using TP and using CP. To maximize the knowledge derived from P and the CWA, but without introducing any other extra knowledge, we propose to choose as the semantics of P the least model of P containing its own support, i.e. the least model that cannot be completed anymore according to the CWA. This consideration leads to the following epistemic definition of the semantics of a program P.

Definition 6. The approximate well-founded semantics of an np-program P, denoted WP, is the least model I of P w.r.t. ⪯k such that CP(I) ⪯k I.

Now we provide a fixpoint characterization and, thus, a way to compute the approximate well-founded semantics. It is based on an operator, called the approximate well-founded operator, that combines the two operators defined above. Given an interpretation I, we complete it with its support provided by the CWA, and then activate the rules of the program on the obtained interpretation using the immediate consequence operator.

Definition 7. Let P be an np-program. The approximate well-founded operator, denoted AWP, takes in input an approximate interpretation I ∈ CP and returns AWP(I) ∈ CP defined by AWP(I) = TP(I ⊕k CP(I)).
From [12], the following theorems can be shown.

Theorem 4. Let P be an np-program. Any fixed-point I of AWP is a model of P.

Using the properties of monotonicity and continuity of TP and CP w.r.t. the knowledge order ⪯k over CP, and the fact that CP is a complete lattice w.r.t. ⪯k, by the well-known Knaster-Tarski theorem it follows that

Theorem 5. Let P be an np-program. The approximate well-founded operator AWP is monotone and, if the de Morgan laws hold, continuous w.r.t. the knowledge order ⪯k. Therefore, AWP has a least fixed-point w.r.t. ⪯k. Moreover, that least fixpoint coincides with the approximate well-founded semantics WP of P.

The following sequence of interpretations shows the computation of WP of Example 3 (I0 = I⊥ and In+1 = AWP(In)). The certainty of A is 0.3 and the certainty of B is 0. Note that KKP ⪯k WP, i.e. the well-founded semantics contains more knowledge than the Kripke-Kleene semantics, since it was completed with some default knowledge from the CWA. I0 = {A: [0; 1], B: [0; 1]}, CP(I0) = {A: [0; 0.3], B: [0; 0]}, I1 = {A: [0.3; 0.3], B: [0; 0]}, CP(I1) = {A: [0; 0.3], B: [0; 0]}, I2 = I1 = WP.

Example 4. Consider the program P = R ∪ F given in Example 1. The computation of the approximate well-founded semantics WP of P gives the following result4: WP = {R(J): [0.64; 0.7], S(J): [0.8; 0.8], Y(J): [0; 0], G(J): [0.3; 0.36], E(J): [0.7; 0.7]}, which establishes that John’s degree of Risk is in between 0.64 and 0.7.

Finally, our approach captures and extends the usual semantics of logic programs.

Theorem 6. If we restrict our attention to PDDU, then for any program P the approximate well-founded semantics WP assigns exact values to all atoms and coincides with the semantics of P proposed in [10].

Theorem 7.
If we restrict our attention to Datalog with negation, then we have to deal with Belnap’s bilattices [1] and for any Datalog program with negation P , (i) any stable model [5] of P is a fixpoint of AWP , and (ii) the approximate well-founded semantics WP coincides with the well-founded semantics of P [18].
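Putting the pieces together, the least fixpoint of AWP for the running Example 3 can be computed as below (again a hypothetical Python sketch of ours, not the authors' code; same rule encoding and lattice operations as in the earlier sketches).

```python
RULES = [("A", 1.0, ["B", 0.6]), ("B", 1.0, ["B"]), ("A", 1.0, [0.3])]
ATOMS = ["A", "B"]

def join_k(x, y): return (max(x[0], y[0]), min(x[1], y[1]))   # ⊕k
def meet_k(x, y): return (min(x[0], y[0]), max(x[1], y[1]))   # ⊗k

def tp(I):  # immediate consequence operator (fc = fp = min, fd = max)
    J = {}
    for A in ATOMS:
        cs = []
        for head, alpha, body in RULES:
            if head != A:
                continue
            ivs = [(l, l) if isinstance(l, float) else I[l] for l in body]
            fc = (min(a for a, _ in ivs), min(b for _, b in ivs))
            cs.append((min(alpha, fc[0]), min(alpha, fc[1])))
        J[A] = (max(a for a, _ in cs), max(b for _, b in cs)) if cs else (0.0, 0.0)
    return J

def support(I):  # closed world operator C_P (iterated fixpoint of F_{P,I})
    I_f = {A: (0.0, 0.0) for A in ATOMS}
    J = I_f
    while True:
        T = tp({A: join_k(I[A], J[A]) for A in ATOMS})
        J2 = {A: meet_k(I_f[A], T[A]) for A in ATOMS}
        if J2 == J:
            return J
        J = J2

def awp(I):  # AWP(I) = TP(I ⊕k CP(I))
    C = support(I)
    return tp({A: join_k(I[A], C[A]) for A in ATOMS})

I = {A: (0.0, 1.0) for A in ATOMS}   # I0 = I_bot
while True:
    J = awp(I)
    if J == I:
        break
    I = J
print(I)  # {'A': (0.3, 0.3), 'B': (0.0, 0.0)}
```

The fixpoint matches WP from the text: A is exactly 0.3 and B is exactly 0.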
5 Conclusions
We presented a novel characterization, both epistemic and operational, of the well-founded semantics in PDDU [10], a unifying umbrella for many existing approaches towards the manipulation of uncertainty in logic programs, and we extended it with non-monotonic (default) negation. The main features of our extension are: (i) to deal with uncertain and incomplete knowledge, atoms are assigned approximations of uncertainty values; (ii) the CWA is used to complete the knowledge and to infer the most precise approximations possible, relying on a natural management of negation; (iii) the continuity of the immediate consequence operator is preserved (a major feature of the classical PDDU framework); and (iv) our approach extends to PDDU with negation not only the semantics proposed in [10] for PDDU, but also the usual semantics of Datalog with negation: the well-founded semantics and the Kripke-Kleene semantics.
4 For ease of presentation, we use the first letter of predicates and constants only.
References
1. N. D. Belnap. How a computer should think. In Gilbert Ryle, editor, Contemporary Aspects of Philosophy, pages 30–56. Oriel Press, Stocksfield, GB, 1977.
2. D. Dubois, J. Lang, and H. Prade. Towards possibilistic logic programming. In Proc. of the 8th Int. Conf. on Logic Programming (ICLP-91), pages 581–595, 1991.
3. M. Fitting. The family of stable models. J. of Logic Programming, 17:197–225, 1993.
4. M. Fitting. A Kripke-Kleene semantics for general logic programs. J. of Logic Programming, 2:295–312, 1985.
5. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Proc. of the 5th Int. Conf. on Logic Programming, pages 1070–1080, 1988.
6. M. L. Ginsberg. Multi-valued logics: a uniform approach to reasoning in artificial intelligence. Computational Intelligence, 4:265–316, 1988.
7. M. Kifer and A. Li. On the semantics of rule-based expert systems with uncertainty. In Proc. of the Int. Conf. on Database Theory (ICDT-88), LNCS 326, pages 102–117, 1988.
8. M. Kifer and V. S. Subrahmanian. Theory of generalized annotated logic programming and its applications. J. of Logic Programming, 12:335–367, 1992.
9. L. V. S. Lakshmanan and N. Shiri. Probabilistic deductive databases. In Int. Logic Programming Symposium, pages 254–268, 1994.
10. L. V. S. Lakshmanan and N. Shiri. A parametric approach to deductive databases with uncertainty. IEEE Transactions on Knowledge and Data Engineering, 13(4):554–570, 2001.
11. Y. Loyer and U. Straccia. The well-founded semantics in normal logic programs with uncertainty. In Proc. of the 6th Int. Symposium on Functional and Logic Programming (FLOPS-2002), LNCS 2441, pages 152–166, 2002.
12. Y. Loyer and U. Straccia. The well-founded semantics of logic programs over bilattices: an alternative characterisation. Technical Report ISTI-2003-TR-05, Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy, 2003. Submitted.
13. T. Lukasiewicz. Fixpoint characterizations for many-valued disjunctive logic programs with probabilistic semantics. In LNCS 2173, pages 336–350, 2001.
14. R. Ng and V. S. Subrahmanian. Stable model semantics for probabilistic deductive databases. In Proc. of the 6th Int. Symposium on Methodologies for Intelligent Systems (ISMIS-91), LNAI 542, pages 163–171, 1991.
15. R. Ng and V. S. Subrahmanian. Probabilistic logic programming. Information and Computation, 101(2):150–201, 1993.
16. E. Y. Shapiro. Logic programs with uncertainties: A tool for implementing rule-based systems. In Proc. of the 8th Int. Joint Conf. on Artificial Intelligence (IJCAI-83), pages 529–532, 1983.
17. M. H. van Emden. Quantitative deduction and its fixpoint theory. J. of Logic Programming, 4(1):37–53, 1986.
18. A. van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. J. of the ACM, 38(3):620–650, January 1991.
19. G. Wagner. Negation in fuzzy and possibilistic logic programs. In T. Martin and F. Arcelli, editors, Logic Programming and Soft Computing. Research Studies Press, 1998.
Which Is the Worst-Case Nash Equilibrium?

Thomas Lücking1, Marios Mavronicolas2, Burkhard Monien1, Manuel Rode1, Paul Spirakis3,4, and Imrich Vrto5

1 Faculty of Computer Science, Electrical Engineering and Mathematics, University of Paderborn, Fürstenallee 11, 33102 Paderborn, Germany, {luck,bm,rode}@uni-paderborn.de
2 Department of Computer Science, University of Cyprus, P. O. Box 20537, Nicosia CY-1678, Cyprus, [email protected]
3 Computer Technology Institute, P. O. Box 1122, 261 10 Patras, Greece, [email protected]
4 Department of Computer Engineering and Informatics, University of Patras, Rion, 265 00 Patras, Greece
5 Institute of Mathematics, Slovak Academy of Sciences, 841 04 Bratislava 4, Dúbravská 9, Slovak Republic, [email protected]
Abstract. A Nash equilibrium of a routing network represents a stable state of the network where no user finds it beneficial to unilaterally deviate from its routing strategy. In this work, we investigate the structure of such equilibria within the context of a certain game that models selfish routing for a set of n users each shipping its traffic over a network consisting of m parallel links. In particular, we are interested in identifying the worst-case Nash equilibrium – the one that maximizes social cost. Worst-case Nash equilibria were first introduced and studied in the pioneering work of Koutsoupias and Papadimitriou [9]. More specifically, we continue the study of the Conjecture of the Fully Mixed Nash Equilibrium, henceforth abbreviated as FMNE Conjecture, which asserts that the fully mixed Nash equilibrium, when existing, is the worst-case Nash equilibrium. (In the fully mixed Nash equilibrium, the mixed strategy of each user assigns (strictly) positive probability to every link.) We report substantial progress towards identifying the validity, methodologies to establish, and limitations of, the FMNE Conjecture.
1 Introduction
Motivation and Framework. Nash equilibrium [12,13] is arguably the most important solution concept in (non-cooperative) Game Theory1 . It represents
This work has been partially supported by the IST Program of the European Union under contract numbers IST-1999-14186 (ALCOM-FT) and IST-2001-33116 (FLAGS), by funds from the Joint Program of Scientific and Technological Collaboration between Greece and Cyprus, by research funds at University of Cyprus, and by the VEGA grant No. 2/3164/23.
Graduate School of Dynamic Intelligent Systems.
1 See [14] for a concise introduction to contemporary Game Theory.
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 551–561, 2003. © Springer-Verlag Berlin Heidelberg 2003
Thomas Lücking et al.
a stable state of the play of a strategic game in which each player holds an accurate opinion about the (expected) behavior of other players and acts rationally. Understanding the combinatorial structure of Nash equilibria is a necessary prerequisite to either designing efficient algorithms to compute them, or establishing corresponding hardness and thereby designing (efficient) approximation algorithms2 . In this work, we embark on a systematic study of the combinatorial structure of Nash equilibria in the context of a simple routing game that models selfish routing over a non-cooperative network such as the Internet. This game was originally introduced in a pioneering work of Koutsoupias and Papadimitriou [9]; that work defined coordination ratio (also known as price of anarchy [15]) as a worst-case measure of the impact of the selfish behavior of users on the efficiency of routing over a non-cooperative network operating at a Nash equilibrium. As a worst-case measure, the coordination ratio bounds the maximum loss of efficiency due to selfish behavior of users at the worst-case Nash equilibrium; in sharp contrast, the principal motivation of our work is to identify the actual worst-case Nash equilibrium of the selfish routing game. Within the framework of the selfish routing game of Koutsoupias and Papadimitriou [9], we assume a collection of n users, each employing a mixed strategy, which is a probability distribution over m parallel links, to control the shipping of its own assigned traffic. For each link, a capacity specifies the rate at which the link processes traffic. In a Nash equilibrium, each user selfishly routes its traffic on those links that minimize its expected latency cost, given the network congestion caused by the other users. The social cost of a Nash equilibrium is the expectation, over all random choices of the users, of the maximum, over all links, latency through a link. The worst-case Nash equilibrium is one that maximizes social cost. 
Our study distinguishes between pure Nash equilibria, where each user chooses exactly one link (with probability one), and mixed Nash equilibria, where the choices of each user are modeled by a probability distribution over links. Of special interest to our work is the fully mixed Nash equilibrium [10], where each user chooses each link with non-zero probability; henceforth, denote F the fully mixed Nash equilibrium. We will also introduce and study disjointly mixed Nash equilibria, where (loosely speaking) mixed strategies of different users do not intersect. Allowing link capacities to vary arbitrarily gives rise to the standard model of related links, also known as model of uniform links in the scheduling literature (cf. Gonzales et al. [5]); the name is due to the fact that the order of the delays a user experiences on each of the links is the same across all users. A special case of the model of related links is the model of identical links, where all link capacities are equal (cf. Graham [6]); thus, in this model, each user incurs the same delay on all links. We also consider the model of unrelated links, where instead of associating a traffic and a capacity with each user and link, respec2
Computation of Nash equilibria has been long observed to be a very challenging, yet notoriously hard algorithmic problem; see [15] for an advocation.
tively, we assign a delay for each pair of a user and a link in an arbitrary way (cf. Horowitz and Sahni [7]); thus, in the unrelated links model, there is no relation between the delays incurred to a user on different links. Reciprocally, in the model of identical traffics, all user traffics are equal; they may vary arbitrarily in the model of arbitrary traffics. We are interested in understanding the impact of model assumptions on links and users on the patterns of the worst-case Nash equilibria for the selfish routing game we consider. Results and Contribution. In this work, we embark on a systematic study of a natural conjecture due to Gairing et al. [4], which asserts that the fully mixed Nash equilibrium is the worst-case Nash equilibrium (with respect to social cost). Fully Mixed Nash Equilibrium Conjecture [4]. Consider the model of arbitrary traffics and related links. Then, for any traffic vector w such that the fully mixed Nash equilibrium F exists, and for any Nash equilibrium P, SC (w, P) ≤ SC (w, F). Henceforth, abbreviate the Fully Mixed Nash Equilibrium Conjecture as the FMNE Conjecture. Our study reports substantial progress towards the settlement of the FMNE Conjecture: – We prove the FMNE Conjecture for several interesting special cases of it (within the model of related links). – In doing so, we provide proof techniques and tools which, while applicable to interesting special cases of it, may suffice for the general case as well. – We reveal limitations of the FMNE Conjecture by establishing that it is not, in general, valid over the model of unrelated links; we present both positive and negative instances for the conjecture. Related Work, Comparison and Significance. The selfish routing game considered in this paper was first introduced and studied in the pioneering work of Koutsoupias and Papadimitriou [9]. 
This game was subsequently studied in the work of Mavronicolas and Spirakis [10], where fully mixed Nash equilibria were introduced and analyzed. Both works focused mainly on proving bounds on coordination ratio. Subsequent works that provided bounds on coordination ratio include [1,2,8]. The work of Fotakis et al. [3] was the first to study the combinatorial structure and the computational complexity of Nash equilibria for the selfish routing game we consider; that work was subsequently extended by Gairing et al. [4]. (See details below.) The closest to our work are the one by Fotakis et al. [3] and the one by Gairing et al. [4]. – The FMNE Conjecture has been inspired by two results due to Fotakis et al. [3] that confirm or support the conjecture. First, Fotakis et al. [3, Theorem 6] establish the Fully Mixed Nash Equilibrium Conjecture for the model of identical links and assuming that n = 2; Theorem 3 in this work extends this
result to the model of related links, still assuming that n = 2 while assuming, in addition, that traffics are identical. Second, Fotakis et al. [3, Theorem 7] prove that, for the model of related links and of identical traffics, the social cost of any Nash equilibrium is no more than 49.02 times the social cost of the fully mixed Nash Equilibrium. – The FMNE Conjecture was explicitly stated in the work of Gairing et al. [4, Conjecture 1.1]. In the same paper, two results are shown that confirm or support the conjecture. First, Gairing et al. [4, Theorem 4.2] establish the validity of the FMNE Conjecture when restricted to pure Nash equilibria. Second, Gairing et al. [4, Theorem 5.1] prove that for the model of identical links, the social cost of any Nash equilibrium is no more than 6 + ε times the social cost of the fully mixed Nash equilibrium, for any constant ε > 0. (Note that since this result does not assume identical traffics, it is incomparable to the related result by Fotakis et al. [3, Theorem 7] (for the model of related links) which does.) The ultimate settlement of the FMNE Conjecture (for the model of related links) may reveal an interesting complexity-theoretic contrast between the worstcase pure and the worst-case mixed Nash equilibria. On one hand, identifying the worst-case pure Nash equilibrium is an N P-hard problem [3, Theorem 4]; on the other hand, if the FMNE Conjecture is valid, identification of the worstcase mixed Nash equilibrium is immediate in the cases where the fully mixed Nash equilibrium exists. (In addition, the characterization of the fully mixed Nash equilibrium shown in [10, Theorem 14] implies that such existence can be checked in polynomial time.) Road Map. The rest of this paper is organized as follows. Section 2 presents our definitions and some preliminaries. The case of disjointly mixed Nash equilibria is treated in Section 3. Section 4 considers the case of identical traffics and related links with n = 2. 
The reciprocal case of identical traffics and identical links with m = 2 is studied in Section 5. Section 6 examines the case of unrelated links. We conclude, in Section 7, with a discussion of our results and some open problems.
2 Framework
Most of our definitions are patterned after those in [10, Section 2], [3, Section 2] and [4, Section 2], which, in turn, were based on those in [9, Sections 1 & 2]. Mathematical Preliminaries and Notation. Throughout, denote for any integer m ≥ 2, [m] = {1, . . . , m}. For a random variable X, denote E(X) the expectation of X. General. We consider a network consisting of a set of m parallel links 1, 2, . . . , m from a source node to a destination node. Each of n network users 1, 2, . . . , n, or users for short, wishes to route a particular amount of traffic along a (non-fixed) link from source to destination. (Throughout, we will be using subscripts for users and superscripts for links.) In the model of related links, denote wi the
traffic of user i ∈ [n], and W = Σi∈[n] wi. Define the n × 1 traffic vector w in the natural way. Assume throughout that m > 1 and n > 1. Assume also, without loss of generality, that w1 ≥ w2 ≥ . . . ≥ wn. In the model of unrelated links, denote Cij the cost of user i ∈ [n] on link j ∈ [m]. Define the n × m cost matrix C in the natural way. A pure strategy for user i ∈ [n] is some specific link. A mixed strategy for user i ∈ [n] is a probability distribution over pure strategies; thus, a mixed strategy is a probability distribution over the set of links. The support of the mixed strategy for user i ∈ [n], denoted support(i), is the set of those pure strategies (links) to which i assigns positive probability. A pure strategy profile is represented by an n-tuple ⟨ℓ1, ℓ2, . . . , ℓn⟩ ∈ [m]^n; a mixed strategy profile is represented by an n × m probability matrix P of nm probabilities pji, i ∈ [n] and j ∈ [m], where pji is the probability that user i chooses link j. For a probability matrix P, define indicator variables Iij ∈ {0, 1}, where i ∈ [n] and j ∈ [m], such that Iij = 1 if and only if pji > 0. Thus, the support of the mixed strategy for user i ∈ [n] is the set {j ∈ [m] | Iij = 1}. For each link j ∈ [m], define the view of link j, denoted view(j), as the set of users i ∈ [n] that potentially assign their traffics to link j; so, view(j) = {i ∈ [n] | Iij = 1}. For each link j ∈ [m], denote V j = |view(j)|. Syntactic Classes of Mixed Strategies. A mixed strategy profile P is disjointly mixed if for all links j ∈ [m], |{i ∈ view(j) : pji < 1}| ≤ 1, that is, there is at most one non-pure user on each link. A mixed strategy profile P is fully mixed [10, Section 2.2] if for all users i ∈ [n] and links j ∈ [m], Iij = 1.³ Throughout, we will cast a pure strategy profile as a special case of a mixed strategy profile in which all (mixed) strategies are pure. System, Models and Cost Measures.
In the model of related links, denote c_ℓ > 0 the capacity of link ℓ ∈ [m], representing the rate at which the link processes traffic, and C = Σ_{ℓ∈[m]} c_ℓ. So, the latency for traffic w through link ℓ equals w/c_ℓ. In the model of identical capacities, all link capacities are equal to c, for some constant c > 0; link capacities may vary arbitrarily in the model of arbitrary capacities. Assume throughout, without loss of generality, that c1 ≥ c2 ≥ . . . ≥ cm. In the model of identical traffics, all user traffics are equal to 1; user traffics may vary arbitrarily in the model of arbitrary traffics. For a pure strategy profile ⟨ℓ1, ℓ2, . . . , ℓn⟩, the latency cost for user i ∈ [n], denoted λ_i, is the latency cost of the link it chooses, that is, (Σ_{k:ℓ_k=ℓ_i} w_k)/c_{ℓ_i}. For a mixed strategy profile P, denote δ^ℓ the actual traffic on link ℓ ∈ [m]; so, δ^ℓ is a random variable. For each link ℓ ∈ [m], denote θ^ℓ the expected traffic on link ℓ; thus, θ^ℓ = E(δ^ℓ) = Σ_{i=1}^n p_i^ℓ w_i. For a mixed strategy profile P, the expected latency cost for user i ∈ [n] on link ℓ ∈ [m], denoted λ_i^ℓ, is the expectation, over all random choices of the remaining users, of the latency cost for user i had its traffic been assigned to link ℓ; thus,
³ An earlier treatment of fully mixed strategies in the context of bimatrix games can be found in [16], called there completely mixed strategies. See also [11] for a subsequent treatment in the context of strategically zero-sum games.
Thomas Lücking et al.
$$\lambda_i^\ell \;=\; \frac{w_i + \sum_{k=1,\,k\neq i}^{n} p_k^\ell\, w_k}{c_\ell} \;=\; \frac{(1-p_i^\ell)\,w_i + \theta^\ell}{c_\ell}.$$
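The two forms of this expression can be checked against each other numerically; a minimal Python sketch (function and variable names are ours, purely illustrative):

```python
# Expected latency cost of user i on link l: once directly from the definition,
# once via the closed form ((1 - p_i^l) w_i + theta^l) / c_l. Illustrative names.

def expected_latency(P, w, c, i, l):
    others = sum(P[k][l] * w[k] for k in range(len(w)) if k != i)
    return (w[i] + others) / c[l]

def expected_latency_closed(P, w, c, i, l):
    theta = sum(P[k][l] * w[k] for k in range(len(w)))  # expected traffic on l
    return ((1 - P[i][l]) * w[i] + theta) / c[l]

P = [[0.5, 0.5], [0.25, 0.75]]   # 2 users, 2 links
w, c = [2.0, 1.0], [1.0, 2.0]
for i in range(2):
    for l in range(2):
        assert abs(expected_latency(P, w, c, i, l)
                   - expected_latency_closed(P, w, c, i, l)) < 1e-12
```

The equality holds because θ^ℓ already contains the term p_i^ℓ w_i, which the factor (1 − p_i^ℓ) removes again.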
For each user i ∈ [n], the minimum expected latency cost, denoted λ_i, is the minimum, over all links ℓ ∈ [m], of the expected latency cost for user i on link ℓ; thus, λ_i = min_{ℓ∈[m]} λ_i^ℓ. Associated with a traffic vector w and a mixed strategy profile P is the social cost [9, Section 2], denoted SC(w, P), which is the expectation, over all random choices of the users, of the maximum (over all links) latency of traffic through a link; thus,
$$\mathsf{SC}(\mathbf{w},\mathbf{P}) \;=\; \mathbf{E}\!\left[\max_{\ell\in[m]} \frac{\sum_{k:\ell_k=\ell} w_k}{c_\ell}\right] \;=\; \sum_{\langle \ell_1,\ldots,\ell_n\rangle\in[m]^n} \left(\prod_{k=1}^{n} p_k^{\ell_k}\right)\cdot \max_{\ell\in[m]} \frac{\sum_{k:\ell_k=\ell} w_k}{c_\ell}\,.$$
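As a sanity check, the social cost can be evaluated by direct enumeration of all pure profiles in [m]^n, exactly as in the sum above; a minimal sketch (helper names are ours, not the paper's):

```python
# Social cost SC(w, P): expected maximum link latency, computed by enumerating
# all pure profiles in [m]^n weighted by their probabilities (illustrative).
from itertools import product

def social_cost(P, w, c):
    n, m = len(P), len(c)
    total = 0.0
    for profile in product(range(m), repeat=n):
        prob = 1.0
        for k, l in enumerate(profile):
            prob *= P[k][l]
        loads = [sum(w[k] for k, lk in enumerate(profile) if lk == l) / c[l]
                 for l in range(m)]
        total += prob * max(loads)
    return total

# Two identical users, two identical links, mixing uniformly: with probability
# 1/2 they collide (max load 2), with probability 1/2 they do not (max load 1).
P = [[0.5, 0.5], [0.5, 0.5]]
assert abs(social_cost(P, [1, 1], [1, 1]) - 1.5) < 1e-12
```

The enumeration has m^n terms, so this is only a checking device for small instances, not an efficient algorithm.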
Note that SC(w, P) reduces to the maximum latency through a link in the case of pure strategies. On the other hand, the social optimum [9, Section 2] associated with a traffic vector w, denoted OPT(w), is the least possible maximum (over all links) latency of traffic through a link. Note that while SC(w, P) is defined in relation to a mixed strategy profile P, OPT(w) refers to the optimum pure strategy profile. In the model of unrelated links, the latency of user i on link ℓ is its cost C_i^ℓ. Thus, the expected latency cost of user i on link ℓ translates to λ_i^ℓ = C_i^ℓ + Σ_{k=1,k≠i}^n p_k^ℓ C_k^ℓ, and the social cost, now depending on C and the strategy profile P, is defined by

$$\mathsf{SC}(\mathbf{C},\mathbf{P}) \;=\; \sum_{\langle \ell_1,\ldots,\ell_n\rangle\in[m]^n} \left(\prod_{k=1}^{n} p_k^{\ell_k}\right)\cdot \max_{\ell\in[m]} \sum_{k:\ell_k=\ell} C_k^{\ell}\,.$$

Nash Equilibria. We are interested in a special class of mixed strategies called Nash equilibria [13] that we describe below. Formally, the probability matrix P is a Nash equilibrium [9, Section 2] if for all users i ∈ [n] and links ℓ ∈ [m], λ_i^ℓ = λ_i if I_i^ℓ = 1, and λ_i^ℓ ≥ λ_i if I_i^ℓ = 0. Thus, each user assigns its traffic with positive probability only to links for which its expected latency cost is minimized; this implies that there is no incentive for a user to unilaterally deviate from its mixed strategy in order to avoid links on which its expected latency cost is higher than necessary. The coordination ratio [9] is the maximum value, over all traffic vectors w and Nash equilibria P, of the ratio SC(w, P)/OPT(w). In the model of unrelated links, the coordination ratio translates to the maximum value of SC(C, P)/OPT(C). Mavronicolas and Spirakis [10, Lemma 15] show that in the model of identical capacities, all links are equiprobable in a fully mixed Nash equilibrium.

Lemma 1 (Mavronicolas and Spirakis [10]). Consider the fully mixed case under the model of identical capacities. Then, there exists a unique Nash equilibrium with associated Nash probabilities p_i^ℓ = 1/m, for any user i ∈ [n] and link ℓ ∈ [m].

Gairing et al.
[4, Lemma 4.1] show that in the model of related links, the minimum expected latency cost of any user i ∈ [n] in a Nash equilibrium P is bounded by its minimum expected latency cost in the fully mixed Nash equilibrium F.
Lemma 2 (Gairing et al. [4]). Fix any traffic vector w, mixed Nash equilibrium P and user i. Then, λi (w, P) ≤ λi (w, F).
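The Nash-equilibrium condition above (positive probability only on latency-minimizing links) is easy to check numerically; a minimal sketch with illustrative helper names, not from the paper:

```python
# Check the Nash condition: for every user, every link used with positive
# probability attains that user's minimum expected latency (related links).

def expected_latency(P, w, c, i, l):
    others = sum(P[k][l] * w[k] for k in range(len(w)) if k != i)
    return (w[i] + others) / c[l]

def is_nash(P, w, c, eps=1e-9):
    n, m = len(P), len(c)
    for i in range(n):
        lam = [expected_latency(P, w, c, i, l) for l in range(m)]
        lam_min = min(lam)
        for l in range(m):
            if P[i][l] > 0 and lam[l] > lam_min + eps:
                return False  # positive probability on a non-minimal link
    return True

# The uniform fully mixed profile on identical links is a Nash equilibrium;
# both users pure on the same link is not.
assert is_nash([[0.5, 0.5], [0.5, 0.5]], [1, 1], [1, 1])
assert not is_nash([[1, 0], [1, 0]], [1, 1], [1, 1])
```

The tolerance `eps` only guards against floating-point noise; the condition itself is exact.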
3
Disjointly Mixed versus Fully Mixed Nash Equilibria
In this section, we restrict ourselves to the case of disjointly mixed Nash equilibria, and we establish the FMNE Conjecture for this case. We prove:

Theorem 1. Fix any traffic vector w such that F exists, and any disjointly mixed Nash equilibrium P. Then, SC(w, P) ≤ SC(w, F).

Corollary 1. Consider the model of related links, and assume that n = 2 and m = 2. Then, the FMNE Conjecture is valid.
4
Identical Traffics, Related Links and n = 2
In this section we restrict to 2 users with identical traffics, that is, w1 = w2. Without loss of generality we assume w1 = w2 = 1 and c1 ≥ · · · ≥ cm. In the following, we denote by support(1) and support(2) the supports of users 1 and 2, respectively, and by p_i^j and f_i^j the probabilities for user i to choose link j in P and F, respectively. Since we consider two users with identical traffics, we have f_1^j = f_2^j for all j ∈ [m], and we write f^j = f_i^j. In order to prove the FMNE Conjecture for this type of Nash equilibria we will use the following formula for the social cost of any Nash equilibrium P in this setting.

Theorem 2. In case of two users with identical traffics on m related links, the social cost of any Nash equilibrium P is
$$\mathsf{SC}(\mathbf{w},\mathbf{P}) \;=\; \lambda_2(\mathbf{P}) \;+\; \sum_{1\le i<j\le m} p_2^{i}\, p_1^{j} \left(\frac{1}{c_j} - \frac{1}{c_i}\right).$$
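Theorem 2 can be spot-checked against a brute-force evaluation of the social cost; a sketch under stated assumptions (the fully mixed probability f1 below is derived from the two-user equilibrium condition, and all helper names are ours):

```python
# Compare the Theorem 2 formula with brute-force SC for two identical users
# on m related links, at the fully mixed Nash equilibrium (illustrative).
from itertools import product

def social_cost(P, w, c):
    n, m = len(P), len(c)
    total = 0.0
    for prof in product(range(m), repeat=n):
        prob = 1.0
        for k, l in enumerate(prof):
            prob *= P[k][l]
        total += prob * max(
            sum(w[k] for k, lk in enumerate(prof) if lk == l) / c[l]
            for l in range(m))
    return total

def theorem2_sc(P, c):
    m = len(c)
    lam2 = min((1 + P[0][l]) / c[l] for l in range(m))   # lambda_2(P), w = 1
    corr = sum(P[1][i] * P[0][j] * (1.0 / c[j] - 1.0 / c[i])
               for i in range(m) for j in range(i + 1, m))
    return lam2 + corr

# Fully mixed equilibrium of 2 identical users on 2 links, c = [1.5, 1]:
# equating the two expected latencies gives f1 = (2 c1 - c2) / (c1 + c2).
c = [1.5, 1.0]
f1 = (2 * c[0] - c[1]) / (c[0] + c[1])   # = 0.8
P = [[f1, 1 - f1], [f1, 1 - f1]]
assert abs(social_cost(P, [1, 1], c) - theorem2_sc(P, c)) < 1e-9
```

On identical capacities the correction sum vanishes, and the formula reduces to λ₂(P) alone.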
We now show that we only have to consider Nash equilibria P of a certain structure.

Lemma 3. For any Nash equilibrium P ≠ F of two users with identical traffics on m related links the following holds:
1. The supports of the two users are support(1) = [r] ∪ I1 and support(2) = [r] ∪ I2, where I1, I2 are disjoint sets of links not containing a link i ∈ [r], such that [r] ∪ I1 ∪ I2 = [r + |I1| + |I2|].
2. All links in I1 (I2) have the same capacity.
In order to prove the FMNE Conjecture for two users with identical traffics on m related links in Theorem 3, we show that the following lemma holds.

Lemma 4. Let G be the fully mixed Nash equilibrium of two users with identical traffics on m related links with capacities c1 ≥ . . . ≥ cm. Furthermore, let the last s ≥ 1 links have the same capacity, and let F be the fully mixed Nash equilibrium of the instance obtained by increasing the capacities of the last s links to c_{m−s}. Then SC(w, F) ≤ SC(w, G).

Theorem 3. Consider the model of identical traffics and related links, and assume that n = 2. Then, the FMNE Conjecture is valid.
5
Identical Traffics, Identical Links and m = 2
We show:

Theorem 4. Consider the model of identical traffics and identical links, and assume that m = 2 and n is even. Then, the FMNE Conjecture is valid.

Proof. Since both the traffics and the link capacities are identical, we can assume without loss of generality that w_i = 1 for all i ∈ [n] and c_j = 1 for all j ∈ [m]. Recall that in the case of identical capacities, the fully mixed Nash equilibrium F always exists (that is, for all traffic vectors w). Hence, we will show that for any other Nash equilibrium P, SC(w, P) ≤ SC(w, F). Fix any Nash equilibrium P. We can identify three sets of users in P: U1 = {i : support(i) = {1}}, U2 = {i : support(i) = {2}} and U12 = {i : support(i) = {1, 2}}. There are u = min(|U1|, |U2|) (pure) users which choose link 1 and link 2, respectively, with probability 1. Therefore, SC(w, P) = SC(w, P′) + u, where P′ is the Nash equilibrium derived from P by omitting those 2u users. We will show that SC(w, F′) ≥ SC(w, P′) for the fully mixed Nash equilibrium F′ of n − 2u users. As SC(w, F) > SC(w, F′) + u (Lemma 5), this will prove the theorem. Without loss of generality, we can assume that P′ is of the following form: r (pure) users go on link 1 with probability 1, and n − r users choose both links with positive probability. We write P_r for this kind of Nash equilibrium.
Lemma 5. For the fully mixed Nash equilibrium F,

$$\mathsf{SC}(\mathbf{w},\mathbf{F}) \;=\; \frac{n}{2} \;+\; \frac{n}{2}\,\binom{n-1}{n/2}\,\frac{1}{2^{\,n-1}}.$$

Lemma 6. For the Nash equilibrium P_r with two sets of users U1 = {i : support(i) = {1}} and U12 = {i : support(i) = {1, 2}} with |U1| = r < n and |U12| = n − r, the Nash probabilities are

$$p := p_i^1 = \frac{1}{2} - \frac{r}{2(n-r-1)} \qquad\text{and}\qquad q := p_i^2 = \frac{1}{2} + \frac{r}{2(n-r-1)}$$

for all users i ∈ U12. Furthermore, n > 2r + 1 holds.
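Lemmas 5 and 6 can be verified numerically for small even n; a minimal sketch (helper names are ours, and the equilibrium check follows the identical-links latency definition):

```python
# Check Lemma 5's closed form against direct enumeration, and Lemma 6's
# probabilities against the Nash condition (illustrative sketch).
from itertools import product
from math import comb

def sc_identical(probs):
    """Expected max load on 2 identical links; probs[k] = P(user k picks link 1)."""
    n = len(probs)
    sc = 0.0
    for prof in product((0, 1), repeat=n):
        pr = 1.0
        for k, l in enumerate(prof):
            pr *= probs[k] if l == 0 else 1 - probs[k]
        load1 = sum(1 for l in prof if l == 0)
        sc += pr * max(load1, n - load1)
    return sc

n = 6  # even, as Theorem 4 requires
lemma5 = n / 2 + (n / 2) * comb(n - 1, n // 2) / 2 ** (n - 1)
assert abs(sc_identical([0.5] * n) - lemma5) < 1e-12

# Lemma 6: r pure users on link 1, n - r users mixing with probability p.
r = 1
p = 0.5 - r / (2 * (n - r - 1))
lam1 = 1 + r + (n - r - 1) * p        # expected latency of a mixed user on link 1
lam2 = 1 + (n - r - 1) * (1 - p)      # ... and on link 2
assert abs(lam1 - lam2) < 1e-12       # both support links are latency-minimal
assert n > 2 * r + 1                  # the side condition of Lemma 6
```

For n = 6 both sides of Lemma 5 equal 3.9375, the expected maximum of a Binomial(6, 1/2) load and its complement.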
Lemma 7. The social cost of the Nash equilibrium P_r is given by

$$\mathsf{SC}(\mathbf{w},\mathbf{P}_r) \;=\; \frac{n}{2}\binom{n-r}{\frac{n}{2}-r}\, p^{\frac{n}{2}-r} q^{\frac{n}{2}} \;+\; \sum_{i=\frac{n}{2}+1}^{n} i\binom{n-r}{i-r} p^{\,i-r} q^{\,n-i} \;+\; \sum_{i=\frac{n}{2}+1}^{n-r} i\binom{n-r}{i} p^{\,n-r-i} q^{\,i}.$$
The proof is completed by showing that ∆ := SC (w, F) − SC (w, Pr ) ≥ 0.
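The Lemma 7 formula and the sign of Δ can both be spot-checked by summing over the binomially distributed number of mixed users that join link 1; a sketch with illustrative helper names:

```python
# Check Lemma 7 against direct enumeration, and Delta = SC(F) - SC(P_r) >= 0
# for small even n (illustrative sketch, identical traffics and links, m = 2).
from math import comb

def sc_pr_direct(n, r, p):
    """r pure users on link 1; n - r users on link 1 w.p. p. Expected max load."""
    q = 1 - p
    return sum(comb(n - r, b) * p ** b * q ** (n - r - b)
               * max(r + b, n - r - b)
               for b in range(n - r + 1))

def sc_pr_lemma7(n, r, p):
    q, h = 1 - p, n // 2
    total = h * comb(n - r, h - r) * p ** (h - r) * q ** h          # tie at n/2
    total += sum(i * comb(n - r, i - r) * p ** (i - r) * q ** (n - i)
                 for i in range(h + 1, n + 1))                       # link 1 is max
    total += sum(i * comb(n - r, i) * p ** (n - r - i) * q ** i
                 for i in range(h + 1, n - r + 1))                   # link 2 is max
    return total

for n in (4, 6, 8):
    for r in range((n - 1) // 2):          # Lemma 6 requires n > 2r + 1
        p = 0.5 - r / (2 * (n - r - 1))
        assert abs(sc_pr_direct(n, r, p) - sc_pr_lemma7(n, r, p)) < 1e-9
        full = n / 2 + (n / 2) * comb(n - 1, n // 2) / 2 ** (n - 1)  # Lemma 5
        assert full - sc_pr_direct(n, r, p) >= -1e-9                 # Delta >= 0
```

The three terms of Lemma 7 correspond to the tie (both links carry n/2), to link 1 carrying the maximum load i = r + b, and to link 2 carrying the maximum load i = n − r − b, respectively.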
6
Unrelated Links
In this section, we consider the case of unrelated links. We prove:

Proposition 1. Consider the model of unrelated links. Fix any cost matrix C for which F exists, and a pure Nash equilibrium P. Assume that n ≤ m. Then, for any user i, λ_i(P) < λ_i(F).

Theorem 5. Consider the model of unrelated links. Assume that n ≤ m. Consider any cost matrix C such that the fully mixed Nash equilibrium F exists, and any pure Nash equilibrium P. Then, SC(C, P) ≤ SC(C, F).

Proof. Clearly, the social cost of any pure Nash equilibrium P is equal to the selfish cost of some user, while the social cost of a fully mixed Nash equilibrium F is at least the selfish cost of any user. Hence, Proposition 1 implies the claim.

Proposition 2. Consider the model of unrelated links. Assume that n = 2. Fix any cost matrix C for which F exists, and any Nash equilibrium P. Then, for any user i ∈ [2], λ_i(P) ≤ λ_i(F).

Theorem 6. Consider the model of unrelated links. Assume that n = 2 and m = 2. Then, the FMNE Conjecture is valid.

We remark that Theorem 6 generalizes Corollary 1 to the case of unrelated links. We finally prove:

Theorem 7 (Counterexample to the FMNE Conjecture). Consider the model of unrelated links. Then, the FMNE Conjecture is not valid even if n = 3 and m = 2.
7
Conclusion and Directions for Further Research
We have verified the FMNE Conjecture over several interesting restrictions of the selfish routing game we considered for the case of related links. We have also investigated the FMNE Conjecture in the case of unrelated links, for which we have identified instances of the game that validate and falsify the FMNE Conjecture, respectively. The most obvious problem left open by our work is to
establish the FMNE Conjecture in its full generality for the case of related links. We hope that several of the combinatorial techniques introduced in this work for settling special cases of the conjecture may be handy for the general case. The FMNE Conjecture attempts to study a possible order on the set of Nash equilibria (for the specific selfish routing game we consider) that is defined with respect to their social costs; in the terminology of partially ordered sets, the FMNE Conjecture asserts that the fully mixed Nash equilibrium is a maximal element of the defined order. We feel that this order deserves further study. For example, what are the minimal elements of the order? More generally, is there a characterization of measures on Nash equilibria such that the fully mixed Nash equilibrium is a maximal element of the order defined with respect to any specific measure? (Our study considers the social cost as one such measure of interest.)
Acknowledgments We thank Rainer Feldmann and Martin Gairing for several helpful discussions.
References
1. A. Czumaj and B. Vöcking, "Tight Bounds for Worst-Case Equilibria", Proceedings of the 13th Annual ACM Symposium on Discrete Algorithms, pp. 413–420, 2002.
2. R. Feldmann, M. Gairing, T. Lücking, B. Monien and M. Rode, "Nashification and the Coordination Ratio for a Selfish Routing Game", Proceedings of the 30th International Colloquium on Automata, Languages and Programming, 2003.
3. D. Fotakis, S. Kontogiannis, E. Koutsoupias, M. Mavronicolas and P. Spirakis, "The Structure and Complexity of Nash Equilibria for a Selfish Routing Game", Proceedings of the 29th International Colloquium on Automata, Languages and Programming, LNCS 2380, pp. 123–134, 2002.
4. M. Gairing, T. Lücking, M. Mavronicolas, B. Monien and P. Spirakis, "Extreme Nash Equilibria", submitted for publication, March 2003. Also available as Technical Report FLAGS-TR-02-5, Computer Technology Institute, Patras, Greece, November 2002.
5. T. Gonzalez, O. H. Ibarra and S. Sahni, "Bounds for LPT schedules on uniform processors", SIAM Journal on Computing, Vol. 6, No. 1, pp. 155–166, 1977.
6. R. L. Graham, "Bounds on Multiprocessing Timing Anomalies", SIAM Journal on Applied Mathematics, Vol. 17, pp. 416–426, 1969.
7. E. Horowitz and S. Sahni, "Exact and approximate algorithms for scheduling nonidentical processors", Journal of the Association for Computing Machinery, Vol. 23, No. 2, pp. 317–327, 1976.
8. E. Koutsoupias, M. Mavronicolas and P. Spirakis, "Approximate Equilibria and Ball Fusion", Proceedings of the 9th International Colloquium on Structural Information and Communication Complexity, 2002; accepted to Theory of Computing Systems.
9. E. Koutsoupias and C. H. Papadimitriou, "Worst-case Equilibria", Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science, LNCS 1563, pp. 404–413, 1999.
Which Is the Worst-Case Nash Equilibrium?
561
10. M. Mavronicolas and P. Spirakis, "The Price of Selfish Routing", Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pp. 510–519, 2001.
11. H. Moulin and L. Vial, "Strategically Zero-Sum Games: The Class of Games whose Completely Mixed Equilibria Cannot be Improved Upon", International Journal of Game Theory, Vol. 7, Nos. 3/4, pp. 201–221, 1978.
12. J. F. Nash, "Equilibrium Points in N-Person Games", Proceedings of the National Academy of Sciences, Vol. 36, pp. 48–49, 1950.
13. J. F. Nash, "Non-cooperative Games", Annals of Mathematics, Vol. 54, No. 2, pp. 286–295, 1951.
14. M. J. Osborne and A. Rubinstein, A Course in Game Theory, MIT Press, 1994.
15. C. H. Papadimitriou, "Algorithms, Games and the Internet", Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pp. 749–753, 2001.
16. T. E. S. Raghavan, "Completely Mixed Strategies in Bimatrix Games", Journal of the London Mathematical Society, Vol. 2, No. 2, pp. 709–712, 1970.
A Unique Decomposition Theorem for Ordered Monoids with Applications in Process Theory (Extended Abstract)

Bas Luttik

Dept. of Theoretical Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1081a, NL-1081 HV Amsterdam, The Netherlands, [email protected], http://www.cs.vu.nl/~luttik
Abstract. We prove a unique decomposition theorem for a class of ordered commutative monoids. Then, we use our theorem to establish that every weakly normed process definable in ACPε with bounded communication can be expressed as the parallel composition of a multiset of weakly normed parallel prime processes in exactly one way.
1
Introduction
The Fundamental Theorem of Arithmetic states that every element of the commutative monoid of positive natural numbers under multiplication has a unique decomposition (i.e., can be expressed as a product of prime numbers uniquely determined up to the order of the primes). It has been an invaluable tool in number theory ever since the days of Euclid. In the realm of process theory, unique decomposability with respect to parallel composition is crucial in the proofs that bisimulation is decidable for normed BPP [5] and normed PA [8]. It also plays an important rôle in the analysis of axiom systems involving an operation for parallel composition [1,6,12]. Milner and Moller [10] were the first to establish the unique decomposition property for a commutative monoid of finite processes with a simple operation for parallel composition. In [11], Moller presents an alternative proof of this result which he attributes to Milner; we shall henceforth refer to it as Milner's technique. Moller explains that the reason for presenting Milner's technique is that it serves "as a model for the proof of the same result in more complicated languages which evade the simpler proof method" of [10]. He refines Milner's technique twice. First, he adds communication to the operational semantics of the parallel operator. Then, he turns from strong bisimulation semantics to weak bisimulation semantics. Christensen [4] shows how Milner's technique can be further refined so that also certain infinite processes can be dealt with. He proves unique decomposition theorems for the commutative monoids of weakly normed BPP and of weakly normed BPPτ expressions modulo strong bisimulation. Milner's technique hinges on some special properties of the operational semantics of parallel composition. The main contribution of this paper is to place these properties in a general algebraic context. Milner's technique employs a well-founded subrelation of the transition relation induced on processes by the
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 562–571, 2003. © Springer-Verlag Berlin Heidelberg 2003
operational semantics. We consider commutative monoids equipped with a well-founded partial order (rather than an arbitrary well-founded relation) to tie in with the theory of ordered monoids as put forward, e.g., in [3,7]. In Section 2 we propose a few simple conditions on ordered commutative monoids, and we prove that they imply the unique decomposition property (Theorem 13). Then, to prove that a commutative monoid has the unique decomposition property, it suffices to define a partial order and establish that it satisfies our conditions. From Section 3 onwards, we illustrate this technique, discussing unique decomposability for the process theory ACPε [13]. ACPε is more expressive than any of the process theories for which unique decomposition was investigated previously. Firstly, it distinguishes two forms of termination (successful and unsuccessful). Secondly, it has a more general communication mechanism (an arbitrary number of parallel components may participate in a single communication, and communication does not necessarily result in τ). These two features make the extension of Milner's technique to ACPε nontrivial; in fact, they both lead to counterexamples obstructing a general unique decomposition result (see Examples 16 and 19). In Section 4 we introduce for ACPε an appropriate notion of weak normedness that takes into account the distinction between successful and unsuccessful termination, and we propose a requirement on the communication mechanism. In Section 5 we prove that if the communication mechanism meets the requirement, then the commutative monoid of weakly normed ACPε expressions modulo bisimulation satisfies the abstract specification of Section 2, and hence admits a unique decomposition theorem. Whether or not a commutative monoid satisfies the conditions put forward in Section 2 is independent of the nature of its elements (be it natural numbers, bisimulation equivalence classes of process expressions, or objects of any other kind).
Thus, in particular, our unique decomposition theorem for ordered monoids is independent of a syntax for specifying processes. We think that it will turn out to be a convenient tool for establishing unique decomposability results in a wide range of process theories, and for a wide range of process semantics. For instance, we intend to investigate next whether our theorem can be applied to establish unique decomposition results for commutative monoids of processes definable in ACPε modulo weak- and branching bisimulation, and of processes definable in the π-calculus modulo observation equivalence.
2
Unique Decomposition in Commutative p.o. Monoids
A positively ordered monoid (a p.o. monoid) is a nonempty set M endowed with: (i) an associative binary operation ⊗ on M with an identity element ι ∈ M; the operation ⊗ stands for composition and ι represents the empty composition; (ii) a partial order ⪯ on M that is compatible with ⊗, i.e., x ⪯ y implies x ⊗ z ⪯ y ⊗ z and z ⊗ x ⪯ z ⊗ y for all x, y, z ∈ M, and for which the identity ι is the least element, i.e., ι ⪯ x for all x ∈ M. A p.o. monoid is commutative if its composition is commutative.
An example of a commutative p.o. monoid is the set N of natural numbers with addition (+) as binary operation, 0 as identity element and the less-than-or-equal relation (≤) as (total) order; we call it the additive p.o. monoid of natural numbers. Another example is the set N* of positive natural numbers with multiplication (·) as binary operation, 1 as identity element and the divisibility relation (|) as (partial) order; we call it the multiplicative p.o. monoid of positive natural numbers. In the remainder of this section we shall use N and N* to illustrate the theory of decomposition in commutative p.o. monoids that we are about to develop. However, they are not meant to motivate it; the motivating examples stem from process theory. In particular, note that N and N* are so-called divisibility monoids [3] in which x ⪯ y is equivalent to ∃z(x ⊗ z = y). The p.o. monoids arising from process theory generally do not have this property.

Definition 1. An element p of a monoid M is called prime if p ≠ ι and p = x ⊗ y implies x = ι or y = ι.

Example 2. The natural number 1 is the only prime element of N. The prime elements of N* are the prime numbers.

Let x1, . . . , xn be a (possibly empty) sequence of elements of a monoid M; we formally define its composition x1 ⊗ · · · ⊗ xn by the following recursion: (i) if n = 0, then x1 ⊗ · · · ⊗ xn = ι; and (ii) if n > 0, then x1 ⊗ · · · ⊗ xn = (x1 ⊗ · · · ⊗ xn−1) ⊗ xn. Occasionally, we shall write ⊗_{i=1}^n x_i instead of x1 ⊗ · · · ⊗ xn. Furthermore, we write x^n for the n-fold composition of x.

Definition 3. If x is an element of a monoid M and p1, . . . , pn is a sequence of prime elements of M such that x = p1 ⊗ · · · ⊗ pn, then we call the expression p1 ⊗ · · · ⊗ pn a decomposition of x in M. Two decompositions p1 ⊗ · · · ⊗ pm and q1 ⊗ · · · ⊗ qn of x are equivalent if there is a bijection σ : {1, . . . , m} → {1, . . . , n} such that p_i = q_{σ(i)} for all 1 ≤ i ≤ m; otherwise, they are distinct.
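In N* these notions are entirely concrete: primes are the prime numbers, a decomposition is a prime factorization, and equivalence of decompositions is equality of multisets of factors. A minimal sketch (the trial-division helper is ours, purely illustrative):

```python
# Decomposition in the multiplicative p.o. monoid N*: a decomposition of k is
# its multiset of prime factors; equivalence = multiset equality (illustrative).
from collections import Counter

def decomposition(k):
    """Prime factorization of k >= 1 by trial division, in ascending order."""
    factors, d = [], 2
    while d * d <= k:
        while k % d == 0:
            factors.append(d)
            k //= d
        d += 1
    if k > 1:
        factors.append(k)
    return factors

# 60 = 2 * 2 * 3 * 5, uniquely up to reordering:
assert Counter(decomposition(60)) == Counter([2, 2, 3, 5])
# The identity 1 has the empty decomposition; primes decompose as themselves:
assert decomposition(1) == []
assert decomposition(13) == [13]
```

The uniqueness asserted here is exactly the Fundamental Theorem of Arithmetic; the point of this section is that it follows from abstract order-theoretic conditions.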
The identity element ι has the composition of the empty sequence of prime elements as a decomposition, and every prime element has itself as a decomposition. We now proceed to discuss the existence and uniqueness of decompositions in commutative p.o. monoids. We shall present two conditions that together guarantee that every element of a commutative p.o. monoid has a unique decomposition.

Definition 4. Let M be a commutative p.o. monoid; by a stratification of M we understand a mapping | | : M → N from M into the additive p.o. monoid N of natural numbers that is a strict homomorphism, i.e., (i) |x ⊗ y| = |x| + |y|, and (ii) x ≺ y implies |x| < |y| (where ≺ and < are the strict relations corresponding to ⪯ and ≤, respectively).
A commutative p.o. monoid M together with a stratification | | : M → N we call a stratified p.o. monoid; the number |x| thus associated with every x ∈ M is called the norm of x. Observe that |x| = 0 iff x = ι (since |ι| + |ι| ≤ |ι ⊗ ι| = |ι| by the first condition in Definition 4, it follows that |ι| = 0, and if x ≠ ι, then ι ≺ x, whence 0 = |ι| < |x| by the second condition in Definition 4).

Example 5. The additive p.o. monoid N is stratified with the identity mapping id_N on N as stratification. The multiplicative p.o. monoid N* is stratified with | | : N* → N defined by |k| = max{n ≥ 0 : ∃k0 < k1 < · · · < kn (1 = k0 | k1 | · · · | kn = k)}.

Proposition 6. In a stratified commutative p.o. monoid every element has a decomposition.

Proof. Straightforward by induction on the norm.

The next two propositions are straightforward consequences of the definition of stratification; we need them later on.

Proposition 7. If M is a stratified commutative p.o. monoid, then M is strict: x ≺ y implies x ⊗ z ≺ y ⊗ z and z ⊗ x ≺ z ⊗ y for all x, y, z ∈ M.

Proposition 8. The order ⪯ of a stratified p.o. monoid M is well-founded: every nonempty subset of M has a ⪯-minimal element.

Definition 9. We call a p.o. monoid M precompositional if for all x, y, z ∈ M: x ⪯ y ⊗ z implies that there exist y′ ⪯ y and z′ ⪯ z such that x = y′ ⊗ z′.

Example 10. That N* is precompositional can be shown using the well-known property that if p is a prime number such that p | k · l, then p | k or p | l (see, e.g., [9, p. 11]).

If x ≺ y, then x is called a predecessor of y, and y a successor of x. If there is no z ∈ M such that x ≺ z ≺ y, then x is an immediate predecessor of y, and y an immediate successor of x. The following two lemmas establish a crucial relationship between the immediate predecessors of a composition and certain immediate predecessors of its components.

Lemma 11. Let M be a precompositional stratified commutative p.o. monoid, and let x, y and z be elements of M.
If x is a predecessor of y of maximal norm, then x ⊗ z is an immediate predecessor of y ⊗ z. Lemma 12. Suppose that x = x1 ⊗ . . . ⊗ xn and y are elements of a precompositional stratified commutative p.o. monoid M . If y is an immediate predecessor of x, then there exist i ∈ {1, . . . , n} and an immediate predecessor yi of xi such that y = x1 ⊗ · · · ⊗ xi−1 ⊗ yi ⊗ xi+1 ⊗ · · · ⊗ xn .
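Returning to Example 5, the stratification of N* can be checked mechanically: the longest divisibility chain from 1 to k has length equal to the number of prime factors of k counted with multiplicity, which makes condition (i) of Definition 4 evident. A minimal sketch (helper names are ours):

```python
# The stratification of N* from Example 5: |k| is the length of the longest
# divisibility chain 1 = k0 | k1 | ... | kn = k. We compare it with the number
# of prime factors counted with multiplicity (illustrative sketch).
from functools import lru_cache

@lru_cache(maxsize=None)
def norm(k):
    """Length of the longest divisibility chain from 1 up to k (brute force)."""
    if k == 1:
        return 0
    return 1 + max(norm(d) for d in range(1, k) if k % d == 0)

def omega(k):
    """Number of prime factors of k, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= k:
        while k % d == 0:
            count, k = count + 1, k // d
        d += 1
    return count + (1 if k > 1 else 0)

for k in range(1, 60):
    assert norm(k) == omega(k)
# Condition (i) of Definition 4, the strict homomorphism property:
assert norm(6 * 10) == norm(6) + norm(10)
```

Condition (ii) also follows: a proper divisor has strictly fewer prime factors, hence strictly smaller norm.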
Theorem 13 (Unique Decomposition). In a stratified and precompositional commutative p.o. monoid every element has a unique decomposition.

Proof. Let M be a stratified and precompositional commutative p.o. monoid. By Proposition 6, every element of M has a decomposition. To prove uniqueness, suppose, to the contrary, that the subset of elements of M with two or more distinct decompositions is nonempty. Since ⪯ is well-founded by Proposition 8, this subset has a ⪯-minimal element a. That a has at least two distinct decompositions means that there must be a sequence p, p1, . . . , pn of distinct primes, and sequences k, k1, . . . , kn and l, l1, . . . , ln of natural numbers such that

(A) a = p^k ⊗ p1^{k1} ⊗ · · · ⊗ pn^{kn} and a = p^l ⊗ p1^{l1} ⊗ · · · ⊗ pn^{ln};
(B) k < l; and
(C) |p| < |pi| implies ki = li for all 1 ≤ i ≤ n.

That a is ⪯-minimal means that the predecessors of a, i.e., the elements of the initial segment I(a) = {x ∈ M : x ≺ a} of M determined by a, all have a unique decomposition. Let x be an element of I(a). We define #p(x), the multiplicity of p in x, as the number of occurrences of the prime p in the unique decomposition of x. The index of p in x, denoted by [x : p], is the maximum of the multiplicities of p in the weak predecessors of x, i.e., [x : p] = max{#p(y) : y ⪯ x}. We now use that a = p^k ⊗ p1^{k1} ⊗ · · · ⊗ pn^{kn} to give an upper bound for the multiplicity of p in an element x of I(a). Since M is precompositional there exist y1, . . . , yk ⪯ p and z_{i1}, . . . , z_{ik_i} ⪯ pi (1 ≤ i ≤ n) such that x = ⊗_{i=1}^{k} y_i ⊗ ⊗_{i=1}^{n} ⊗_{j=1}^{k_i} z_{ij}. From yi ⪯ p it follows that #p(yi) ≤ [p : p] = 1, and from z_{ij} ⪯ pi it follows that #p(z_{ij}) ≤ [pi : p], so for all x ∈ I(a)

$$\#_p(x) \;=\; \sum_{i=1}^{k} \#_p(y_i) \;+\; \sum_{i=1}^{n}\sum_{j=1}^{k_i} \#_p(z_{ij}) \;\le\; k + \sum_{i=1}^{n} k_i \cdot [p_i : p]. \qquad (1)$$
We shall now distinguish two cases, according to the contribution of the second term to the right-hand side of the above inequality, and show that either case leads inevitably to a contradiction with condition (B) above. First, suppose that Σ_{i=1}^{n} k_i · [p_i : p] > 0; then [p_j : p] > 0 for some 1 ≤ j ≤ n. Let x1, . . . , xn be such that x_i ⪯ p_i and #p(x_i) = [p_i : p] for all 1 ≤ i ≤ n, and x = p^l ⊗ x1^{l1} ⊗ · · · ⊗ xn^{ln}. Since #p(p_i) = 0, if #p(x_i) > 0 then x_i ≺ p_i. In particular, since #p(x_j) = [p_j : p] > 0, this means that x is an element of I(a) (use that a = p^l ⊗ p1^{l1} ⊗ · · · ⊗ pn^{ln} and apply Proposition 7), and hence, that #p(x) is defined, by

$$\#_p(x) \;=\; l + \sum_{i=1}^{n} l_i \cdot [p_i : p].$$
We combine this definition with the inequality in (1) to conclude that

$$l + \sum_{i=1}^{n} l_i \cdot [p_i : p] \;\le\; k + \sum_{i=1}^{n} k_i \cdot [p_i : p].$$
To arrive at a contradiction with condition (B), it therefore suffices to prove that k_i · [p_i : p] = l_i · [p_i : p] for all 1 ≤ i ≤ n. If [p_i : p] = 0, then this is clear at once. If [p_i : p] > 0, then, since #p(p_i) = 0, there exists x ≺ p_i such that #p(x) = [p_i : p] > 0. Every occurrence of p in the decomposition of x contributes |p| to the norm of x, so |p| ≤ |x| < |p_i|, from which it follows by condition (C) that k_i · [p_i : p] = l_i · [p_i : p]. This settles the case that Σ_{i=1}^{n} k_i · [p_i : p] > 0. We continue with the hypothesis that Σ_{i=1}^{n} k_i · [p_i : p] = 0. First, assume l_i > 0 for some 1 ≤ i ≤ n; then, by Proposition 7, p^l is a predecessor of a, but that implies l = #p(p^l) ≤ k, a contradiction with (B). In the case that remains, we may assume that l_i = 0 for all 1 ≤ i ≤ n, and consequently, since a = p^l cannot be prime, that l > 1. Clearly, p^{l−1} is a predecessor of a, so 0 < l − 1 = #p(p^{l−1}) ≤ k; it follows that k > 0. Now, let y be a predecessor of p of maximal norm; by Lemma 11, it gives rise to an immediate a-predecessor x = y ⊗ p^{k−1} ⊗ p1^{k1} ⊗ · · · ⊗ pn^{kn}. Then, since a = p^l, it follows by Lemma 12 that there exists an immediate predecessor z of p such that x = z ⊗ p^{l−1}. We conclude that k − 1 = #p(x) = l − 1, again a contradiction with condition (B).
3
ACPε
We fix two disjoint sets of constant symbols A and V; the elements of A we call actions; the elements of V we call process variables. With a ∈ A, X ∈ V and H ranging over finite subsets of A, the set P of process expressions is generated by

P ::= ε | δ | a | X | P · P | P + P | ∂_H(P) | P ∥ P | P|P | P ⌊⌊ P.

If X is a process variable and P is a process expression, then the expression X =def P is called a process equation defining X. A set of such expressions is called a process specification if it contains precisely one defining process equation for each X ∈ V. For the remainder of this paper we fix a guarded process specification S: every occurrence of a process variable in a right-hand side P of an equation in S occurs in a subexpression of P of the form a · Q with a ∈ A. We also presuppose a communication function, a commutative and associative partial mapping γ : A × A ⇀ A. It specifies which actions may communicate: if γ(a, b) is undefined, then the actions a and b cannot communicate, whereas if γ(a, b) = c then they can, and c stands for the event that they do. The transition system specification in Table 1 defines on the set P a unary predicate ↓ and binary relations −a→ (a ∈ A). A bisimulation is a symmetric binary relation R on P such that P R Q implies (i) if P↓, then Q↓; and (ii) if P −a→ P′, then there exists Q′ such that Q −a→ Q′ and P′ R Q′.
Table 1. The transition system specification for ACPε.

Termination rules:
  ε↓;  if P↓ and Q↓, then (P · Q)↓;  if P↓, then (P + Q)↓ and (Q + P)↓;
  if P↓ and Q↓, then (P ∥ Q)↓ and (Q ∥ P)↓;  if P↓, then ∂_H(P)↓.

Transition rules:
  a −a→ ε;
  if P −a→ P′, then P + Q −a→ P′ and Q + P −a→ P′;
  if P −a→ P′, then P · Q −a→ P′ · Q;
  if P↓ and Q −a→ Q′, then P · Q −a→ Q′;
  if P −a→ P′ and (X =def P) ∈ S, then X −a→ P′;
  if P −a→ P′, then P ∥ Q −a→ P′ ∥ Q and Q ∥ P −a→ Q ∥ P′;
  if P −b→ P′, Q −c→ Q′ and a = γ(b, c), then P ∥ Q −a→ P′ ∥ Q′;
  if P −b→ P′, Q −c→ Q′ and a = γ(b, c), then P|Q −a→ P′ ∥ Q′;
  if P −a→ P′, then P ⌊⌊ Q −a→ P′ ∥ Q;
  if P −a→ P′ and a ∉ H, then ∂_H(P) −a→ ∂_H(P′).
Process expressions P and Q are said to be bisimilar (notation: P ↔ Q) if there exists a bisimulation R such that P R Q. The relation ↔ is an equivalence relation; we write [P] for the equivalence class of process expressions bisimilar to P, and we denote by P/↔ the set of all such equivalence classes. Baeten and van Glabbeek [2] prove that ↔ has the substitution property with respect to ∥, and that P ∥ (Q ∥ R) ↔ (P ∥ Q) ∥ R, P ∥ ε ↔ ε ∥ P ↔ P and P ∥ Q ↔ Q ∥ P. Hence, we have the following proposition.

Proposition 14. The set P/↔ with ⊗ and ι defined by [P] ⊗ [Q] = [P ∥ Q] and ι = [ε] is a commutative monoid.
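On a finite transition system, bisimilarity can be computed by naive partition refinement: start by separating terminating from non-terminating states (clause (i) of the bisimulation definition), then repeatedly split blocks of states whose outgoing transitions reach different blocks (clause (ii)). A minimal sketch, not the paper's machinery; state names and the tiny example LTS are ours:

```python
# Naive partition refinement for strong bisimilarity on a finite LTS with a
# termination predicate (illustrative sketch, hypothetical state names).

def bisimulation_classes(states, term, trans):
    """term: set of terminating states; trans: state -> set of (action, target)."""
    blocks = [frozenset(s for s in states if s in term),
              frozenset(s for s in states if s not in term)]
    blocks = [b for b in blocks if b]
    changed = True
    while changed:
        block_of = {x: i for i, b in enumerate(blocks) for x in b}
        def signature(s):
            return frozenset((a, block_of[t]) for a, t in trans.get(s, set()))
        new_blocks = []
        for b in blocks:
            groups = {}
            for s in b:
                groups.setdefault(signature(s), set()).add(s)
            new_blocks.extend(frozenset(g) for g in groups.values())
        changed = len(new_blocks) != len(blocks)  # refinement only ever splits
        blocks = new_blocks
    return blocks

# a versus a + a.delta: the latter also has an a-transition into deadlock.
states = {'a', 'a+ad', 'eps', 'delta'}
term = {'eps'}
trans = {'a': {('a', 'eps')}, 'a+ad': {('a', 'eps'), ('a', 'delta')}}
classes = bisimulation_classes(states, term, trans)
block_of = {s: i for i, b in enumerate(classes) for s in b}
assert block_of['a'] != block_of['a+ad']   # a and a + a.delta are not bisimilar
```

The example pair is exactly the one used in Example 16 below: the deadlocking a-transition of a + a·δ cannot be matched by a.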
4
Weakly Normed ACPε with Bounded Communication
In this section we present three counterexamples obstructing a general unique decomposition theorem for the monoid P/↔ defined in the previous section. They will guide us in defining a submonoid of P/↔ which does admit a unique decomposition theorem, as we shall prove in the next section. The first counterexample already appears in [10]; it shows that perpetual processes need not have a decomposition.
Example 15. Let a be an action, let γ(a, a) be undefined and let X def= a · X. One can show that X ↔ P1 ‖ · · · ‖ Pn implies Pi ↔ X for some 1 ≤ i ≤ n. It follows that [X] has no decomposition in P/↔. For suppose that [X] = [P1] ⊗ · · · ⊗ [Pn]; then [Pi] = [X], whereas [X] is not a prime element of P/↔ (e.g., X ↔ a ‖ X). The second counterexample employs the distinction between successful and unsuccessful termination characteristic of ACP-like process theories. Example 16. Let a be an action; then [a], [a + a · δ] and [a · δ + ε] are prime elements of P/↔. Moreover, a ↮ a + a · δ (the transition a + a · δ −a→ δ cannot be
A Unique Decomposition Theorem for Ordered Monoids
simulated by a). However, it is easily verified that a ‖ (a · δ + ε) ↔ (a + a · δ) ‖ (a · δ + ε), so a decomposition in P/↔ need not be unique.
Let w ∈ A∗, say w = a1 · · · an; we write P −w→ P′ if there exist P0, . . . , Pn such that P = P0 −a1→ · · · −an→ Pn = P′. To exclude the problems mentioned in Examples 15 and 16 above we use the following definition. Definition 17. A process expression P is weakly normed if there exist w ∈ A∗ and a process expression P′ such that P −w→ P′ ↔ ε. The set of weakly normed process expressions is denoted by Pε. It is straightforward to show that bisimulation respects the property of being weakly normed, and that a parallel composition is weakly normed iff its parallel components are. Hence, we have the following proposition. Proposition 18. The set Pε/↔ is a submonoid of P/↔. Moreover, if [P ‖ Q] ∈ Pε/↔, then [P] ∈ Pε/↔ and [Q] ∈ Pε/↔. Christensen et al. [5] prove that every element of the commutative monoid of weakly normed BPP expressions modulo bisimulation has a unique decomposition. Presupposing a communication function γ that is everywhere undefined, the operational semantics for BPP expressions is as given in Table 1. So, in BPP there is no communication between parallel components. Christensen [4] extends this result to a unique decomposition theorem for the commutative monoid of weakly normed BPPτ expressions modulo bisimulation. His BPPτ is obtained by replacing the parallel operator of BPP by a parallel operator that allows a restricted form of handshaking communication. Our next example shows that the more general communication mechanism of ACPε gives rise to weakly normed process expressions without a decomposition.
Example 19. Let a be an action, suppose that a = γ(a, a) and X def= a · X + a. Then one can show that X ↔ P1 ‖ · · · ‖ Pn implies that Pi ↔ X for some 1 ≤ i ≤ n, from which it follows by a similar argument as in Example 15 that [X] has no decomposition in P/↔. The communication function in the above example allows an unbounded number of copies of the action a to participate in a single communication. To exclude this phenomenon, we use the following definition. Definition 20. A communication function γ is bounded if every action can be assigned a weight ≥ 1 in such a way that a = γ(b, c) implies that the weight of a is the sum of the weights of b and c.
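Definition 20 can be checked mechanically for a concrete communication function once a candidate weight assignment is supplied. A small sketch (the dictionary encoding and the helper name are our own illustrative assumptions):

```python
# Check the boundedness condition of Definition 20 for a given weight map:
# every weight is >= 1, and weight(gamma(b, c)) = weight(b) + weight(c)
# for every defined communication.

def is_bounded_witness(gamma, weight):
    """gamma: dict mapping (b, c) -> a for the defined communications;
    weight: dict mapping each action to a candidate weight."""
    if any(w < 1 for w in weight.values()):
        return False
    return all(weight[a] == weight[b] + weight[c] for (b, c), a in gamma.items())
```

For the γ of Example 19, where γ(a, a) = a, no witness exists: the condition would force weight(a) = 2 · weight(a), i.e. weight(a) = 0, contradicting weight ≥ 1.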
5
Unique Decomposition in P ε /↔
We now prove that every element of the commutative monoid Pε/↔ of weakly normed process expressions modulo bisimulation has a unique decomposition, provided that the communication function is bounded. We proceed by defining on Pε/↔ a partial order and a stratification |·| : Pε/↔ → N turning it into
a precompositional stratified commutative p.o. monoid. That every element of Pε/↔ has a unique decomposition then follows from the theorem of Section 2. Throughout this section we assume that the presupposed communication function γ is bounded, so that every action has a unique weight assigned to it (cf. Definition 20). We use it to define the weighted length ℓ(w) of w ∈ A∗ inductively as follows: if w is the empty sequence, then ℓ(w) = 0; and if w = w′a and a is an action of weight i, then ℓ(w) = ℓ(w′) + i. This definition takes into account that a communication stands for the simultaneous execution of multiple actions. It allows us to formulate the following crucial property of the operational semantics of ACPε. Lemma 21. If P, Q and R are process expressions such that P ‖ Q −w→ R, then there exist P′, Q′ ∈ P and u, v ∈ A∗ such that R = P′ ‖ Q′, P −u→ P′, Q −v→ Q′ and ℓ(u) + ℓ(v) = ℓ(w).
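On a finite transition system, the least weighted length of a trace from a process to a successfully terminating one (the quantity measured by the norm of Definition 22 below) can be computed by a least-cost search over weighted action steps. A sketch under the assumption that the state space is finite and encoded explicitly; states bisimilar to ε are approximated by terminating states with no outgoing transitions:

```python
import heapq

def norm(trans, term, p):
    """Least total action weight of a path from p to a state that terminates
    and has no further transitions; returns None if no such path exists
    (i.e. p is not weakly normed).  trans: state -> iterable of
    (weight, successor); term: set of states satisfying the predicate ↓."""
    dist = {p: 0}
    heap = [(0, p)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist[s]:
            continue  # stale queue entry
        if s in term and not trans.get(s):
            return d  # reached a state behaving like ε
        for w, t in trans.get(s, ()):
            if d + w < dist.get(t, float("inf")):
                dist[t] = d + w
                heapq.heappush(heap, (d + w, t))
    return None
```

For a perpetual process such as the X of Example 15 the search never reaches a terminating state, matching the fact that X is not weakly normed.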
Definition 22. The norm |P| of a weakly normed process expression P is the least natural number n such that there exist w ∈ A∗ of weighted length n and a process expression P′ such that P −w→ P′ ↔ ε. Lemma 23. If P ↔ Q, then |P| = |Q| for all P, Q ∈ Pε. Lemma 24. |P ‖ Q| = |P| + |Q| for all P, Q ∈ Pε. We define on Pε binary relations →i (i ≥ 1) and → by
P →i Q ⇐⇒ there exists a ∈ A of weight i s.t. P −a→ Q and |P| = |Q| + i; P → Q ⇐⇒ P →i Q for some i ≥ 1. The reflexive-transitive closure →∗ of → is a partial order on Pε. Definition 25. We write [P] ⪯ [Q] iff there exist P′ ∈ [P] and Q′ ∈ [Q] such that Q′ →∗ P′. It is straightforward to verify that ⪯ is a partial order on Pε/↔. Furthermore, that ⪯ is compatible with ⊗ can be established by means of Lemma 24, and that ι is its least element essentially follows from weak normedness. Hence, we get the following proposition. Proposition 26. The set Pε/↔ is a commutative p.o. monoid. By Lemmas 23 and 24, the mapping |·| : (Pε/↔) → N defined by [P] → |P| is a strict homomorphism. Proposition 27. The mapping |·| : (Pε/↔) → N is a stratification of Pε/↔. Lemma 28. If P ‖ Q →∗ R, then there exist P′ and Q′ such that P →∗ P′, Q →∗ Q′ and R = P′ ‖ Q′. The following proposition is an easy consequence of the above lemma.
Proposition 29. The p.o. monoid Pε/↔ is precompositional. According to Propositions 26, 27 and 29, Pε/↔ is a stratified and precompositional commutative p.o. monoid, so by Theorem 13 we get the following result. Theorem 30. In the p.o. monoid Pε/↔ of weakly normed process expressions modulo bisimulation, every element has a unique decomposition, provided that the communication function is bounded.
Acknowledgment The author thanks Clemens Grabmayer, Jeroen Ketema, Vincent van Oostrom, Simona Orzan and the referees for their comments.
References
1. L. Aceto and M. Hennessy. Towards action-refinement in process algebras. Inform. and Comput., 103(2):204–269, 1993.
2. J. C. M. Baeten and R. J. van Glabbeek. Merge and termination in process algebra. In K. V. Nori, editor, Proc. of FST TCS 1987, LNCS 287, pages 153–172, 1987.
3. G. Birkhoff. Lattice theory, volume XXV of American Mathematical Society Colloquium Publications. American Mathematical Society, third edition, 1967.
4. S. Christensen. Decidability and Decomposition in Process Algebras. PhD thesis, University of Edinburgh, 1993.
5. S. Christensen, Y. Hirshfeld, and F. Moller. Decomposability, decidability and axiomatisability for bisimulation equivalence on basic parallel processes. In Proc. of LICS 1993, pages 386–396. IEEE Computer Society Press, 1993.
6. W. J. Fokkink and S. P. Luttik. An ω-complete equational specification of interleaving. In U. Montanari, J. D. P. Rolim, and E. Welzl, editors, Proc. of ICALP 2000, LNCS 1853, pages 729–743, 2000.
7. L. Fuchs. Partially Ordered Algebraic Systems, volume 28 of International Series of Monographs on Pure and Applied Mathematics. Pergamon Press, 1963.
8. Y. Hirshfeld and M. Jerrum. Bisimulation equivalence is decidable for normed process algebra. In J. Wiedermann, P. van Emde Boas, and M. Nielsen, editors, Proc. of ICALP 1999, LNCS 1644, pages 412–421, 1999.
9. T. W. Hungerford. Algebra, volume 73 of GTM. Springer, 1974.
10. R. Milner and F. Moller. Unique decomposition of processes. Theoret. Comput. Sci., 107:357–363, January 1993.
11. F. Moller. Axioms for Concurrency. PhD thesis, University of Edinburgh, 1989.
12. F. Moller. The importance of the left merge operator in process algebras. In M. S. Paterson, editor, Proc. of ICALP 1990, LNCS 443, pages 752–764, 1990.
13. J. L. M. Vrancken. The algebra of communicating processes with empty process. Theoret. Comput. Sci., 177:287–328, 1997.
Generic Algorithms for the Generation of Combinatorial Objects

Conrado Martínez and Xavier Molinero

Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, E-08034 Barcelona, Spain
{conrado,molinero}@lsi.upc.es
Abstract. This paper briefly describes our generic approach to the exhaustive generation of unlabelled and labelled combinatorial classes. Our algorithms receive a size n and a finite description of a combinatorial class A using combinatorial operators such as union, product, set or sequence, in order to list all objects of size n in A. The algorithms work in constant amortized time per generated object and thus they are suitable for rapid prototyping or for inclusion in general libraries.
1
Introduction
Exhaustively generating all the objects of a given size is an important problem with numerous applications that has attracted the interest of combinatorialists and computer scientists for many years. There is a vast literature on the topic and many ingenious techniques and efficient algorithms have been devised for the generation of objects of relevant combinatorial classes (permutations, trees, sets, necklaces, words, etc.). Indeed, it is common to find introductory material in many textbooks on algorithms (see for instance [8]). Furthermore, several distinct natural (and useful) orderings have been considered for the generation of combinatorial classes, for example, lexicographic and Gray ordering. Many state-of-the-art algorithms for exhaustive generation can be found (and executed) in the Combinatorial Object Server (www.theory.csc.uvic.ca/~cos), where the interested reader can also find further references. The ultimate goal is to achieve algorithms with constant amortized time per generated object, that is, the cost of generating all N objects of size n takes time proportional to N. Many such algorithms are known, but there is still on-going and active research on this topic. In this work, we combine some well-known principles and a few novel ideas in a generic framework to design algorithms that solve the problem of exhaustive generation, given the size and a finite specification of the combinatorial class whose elements are to be listed. This kind of approach was pioneered by Flajolet et al. [2] for the random generation of combinatorial objects and later applied
This research was supported by the Future and Emergent Technologies programme of the EU under contract IST-1999-14186 (ALCOM-FT) and the Spanish “Ministerio de Ciencia y Tecnolog´ıa” programme TIC2002-00190 (AEDRI II).
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 572–581, 2003. © Springer-Verlag Berlin Heidelberg 2003
by the authors for the unranking problem [6] and for the generation of labelled objects [5]. Somewhat different, but with a similar spirit, is the general approach of Kemp [4] for the generation of words in lexicographic order. We show that all our algorithms work in constant amortized time and provide a general framework for the analysis of the performance of these algorithms in the form of a calculus or set of rules. Most existing algorithms exploit particular characteristics of the combinatorial class to be generated, thus achieving improved performance over naïve or brute-force methods. The main contribution of this work is to provide a few generic algorithms which solve the problem of iteration over the subset of objects of a given size, given the size and a finite specification of the combinatorial class. These finite specifications are built from basic ε- and atomic classes, and combinatorial operators like unions ('+'), Cartesian products ('×'), sequences ('S'), multisets ('M'), cycles ('C'), etc. Our algorithms, deprived of specific knowledge of the problem at hand, are likely to be a bit worse than their specific counterparts, but still have competitive performance, making them good candidates for rapid prototyping and for inclusion into general combinatorial libraries such as the combstruct package [1] for Maple and MuPAD-combinat for MuPAD (mupad-combinat.sourceforge.net). Typically, complex objects in a given class are composed of smaller units, called atoms. Atoms are objects of size 1 and the size of an object is the number of atoms it contains. For instance, a string is composed by the concatenation of symbols, where each of these is an atom, and the size of the string is its length or the number of symbols it is composed of. Similarly, a tree is built out of nodes – its atoms – and the size of the tree is its number of nodes. Objects of size 0 and 1 will be generically denoted by ε and Z, respectively.
Unlabelled objects are those whose atoms are indistinguishable. On the contrary, each of the n atoms of a labelled object of size n bears a distinct label drawn from the numbers 1 to n. For the rest of this paper, we will use calligraphic uppercase letters to denote classes (A, B, C, . . . ). Given a class A, An will denote the subset of objects of size n in A and an the number of such objects. We use the corresponding uppercase roman letter (A, B, C, . . . ) to denote the counting generating functions (ordinary GFs for unlabelled classes, exponential GFs for labelled classes). The n-th coefficient of A(z) is denoted [z^n]A(z); hence, an = [z^n]A(z) if A(z) is ordinary and an = n! · [z^n]A(z) if A(z) is exponential. As it will become apparent, our approach to the exhaustive generation problem requires an efficient algorithm for counting, that is, given a specification of a class and a size, compute the number of objects with the given size. Hence, we will only deal with so-called admissible combinatorial classes [10,9]. Those are constructed from admissible operators, operations over classes that yield new
1 The current implementation of combstruct offers a routine allstructs to generate all objects of a given size and a finite specification of the class; but it does the job by repeatedly generating objects at random until all of them have been generated.
2 Also, we will use these symbols to denote not only objects but the classes that contain just one object of size 0 and of size 1, respectively.
574
Conrado Mart´ınez and Xavier Molinero
classes, and such that the number of objects of a given size in the new class can be computed from the number of objects of that size or smaller sizes in the constituent classes. Tables 1 and 2 give a few examples of both labelled and unlabelled admissible classes; as such, our algorithms are able to generate all their objects, given the size and the specification of the class.

Table 1. Examples of labelled classes and their specifications

Cayley trees: A = Z ⋆ M(A)
Binary plane trees: B = Z + B ⋆ B
Hierarchies: C = Z + M(C, card ≥ 2)
Surjections: D = S(M(Z, card ≥ 1))
Functional graphs: E = M(C(A))
Any combinatorial object belonging to an admissible class can be suitably represented as a string of symbols or as an expression tree whose leaves correspond to the object's atoms and whose internal nodes are labelled by admissible operators. However, such a representation is not the most convenient for the exhaustive generation problem; our algorithms will act upon a new kind of objects, which we call iterators. An iterator contains a combinatorial object (represented as a tree-like structure), but also additional information which helps and speeds up the generation process. This additional information is also organized as a tree-like structure – which we call deep structure – and reflects that of the corresponding combinatorial object, but each node contains information about the class, the rank, the size, the labelling of the "subobject" rooted at the node in the case of labelled objects, etc. Furthermore, there are pointers between the object's representation and the deep structure to allow fast access and update of the object's representation.

Table 2. Examples of unlabelled classes and their specifications

Binary sequences: A = S(Z + Z)
Necklaces: B = C(M(Z, card ≥ 1))
Rooted unlabelled trees: C = Z × M(C)
Non-plane ternary trees: D = Z + M(D, card = 3)
Integer partitions without repetition: E = P(S(Z, card ≥ 1))
From the user’s point of view, we shall offer the following four routines: 1) a procedure to initialize the iterator (init iter), which given a finite description of a combinatorial class A and a size n, returns the iterator corresponding to the first object in An ; 2) a function next, which given an iterator modifies it so that it corresponds to the object following the previous one; 3) a function get obj to retrieve the combinatorial object from the iterator, in order to print or process it as needed; 4) a boolean function is last to check whether the
iterator corresponds to the past-the-end object: a fictitious sentinel which follows the last object in An. These will be typically used as follows:

it := init_iter(A, n);
while not is_last(it) do
    print(get_obj(it));
    it := next(it);
end

In the next section we will describe our algorithms and their performance for the generation of admissible unlabelled combinatorial classes. Section 3 briefly considers the generation of labelled objects, based on our previous work [5], thus integrating the generation of labelled and unlabelled classes into a single and elegant framework. Except for the case of unlabelled cycles (not described here), most common combinatorial constructions can be dealt with within this framework. In Section 4 we comment on our current work, extensions and future developments.
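The iterator interface above can be mimicked in Python by a generator that yields objects in the order the algorithms induce: for a union, all objects of A before those of B; for a product, pairs grouped by the size of the first component. This is a simplified sketch, not the paper's constant-amortized-time iterator with its deep structure; the tuple encoding of specifications and the single-character atoms are our own conventions:

```python
def generate(spec, n):
    """Yield all objects of size n of the class described by `spec`.
    A one-character string is an atomic class with one object of size 1;
    ("+", A, B), ("x", A, B) and ("S", A) encode unions, products and
    sequences (illustrative encoding only)."""
    if isinstance(spec, str):              # atomic class
        if n == 1:
            yield spec
        return
    op = spec[0]
    if op == "+":                          # all of A first, then all of B
        yield from generate(spec[1], n)
        yield from generate(spec[2], n)
    elif op == "x":                        # grouped by size k of the first component
        for k in range(n + 1):
            for first in generate(spec[1], k):
                for second in generate(spec[2], n - k):
                    yield (first, second)
    elif op == "S":                        # sequences of components of size >= 1
        if n == 0:
            yield ()
        for k in range(1, n + 1):
            for head in generate(spec[1], k):
                for tail in generate(("S", spec[1]), n - k):
                    yield (head,) + tail
```

For instance, `generate(("S", ("+", "a", "b")), n)` lists the 2^n binary sequences of length n, starting with the all-"a" sequence.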
2
Unlabelled Classes
Here, by an admissible class we mean that the class can be finitely specified using the ε class (the class with a single object of size 0), atomic classes (classes that contain a single object of size 1), and disjoint unions ('+'), products ('×'), sequences ('S'), multisets ('M') and powersets ('P') of admissible classes. We have also developed a generic algorithm for cycles ('C') of admissible unlabelled classes; however both the algorithm and its analysis use rather different ideas and techniques from the other operators and will not be explained here because of space limitations. There exist other interesting admissible operations, but we shall restrict our attention to those mentioned above. Even within this restricted framework, many important combinatorial classes can be specified. Furthermore, the techniques and results presented here can be easily extended to other admissible operators such as substitutions, sequences, multisets and powersets of restricted cardinality, etc.

2.1
The Algorithms
The problem of generating the ε class and atomic classes is trivial. We shall only consider the function next, as the other functions are more or less straightforward. We assume that we have a function count(A, n) which returns the number of objects of size n in the combinatorial class A [2]. The function next actually uses a recursive routine which receives a pointer p to some node of the deep structure; the initial recursive call is with p pointing to the root of the deep structure. However, we will use the same name for the recursive routine which actually performs the job. If the current object (of size n) belongs to a class A + B, we need only to check whether the current object is the last of A or not. If it is, then the next object will be the first object in B; otherwise, we generate the next object in the appropriate class (A if the current rank is smaller than or equal to count(A, n) − 1, B if the
current rank is greater than or equal to count(A, n) − 1). This actually means recursively applying the procedure to the unique subtree hanging from p. All the checks above can be easily done as the node pointed to by p in the deep structure contains the specification, the rank of the current object, its size, etc. On the other hand, if the current subobject of size n corresponds to a product, say A × B, we check if the second component (given by p.second) is the last object of size n − k of its class B, where k = p.first.size. If it is not, we recursively obtain the next object of the second component. If the second component is the last object of size n − k in B, but the first component (pointed to by p.first) is not the last object of size k in A, then the algorithm is recursively applied to the first component; we also reset the information in the current node and set the second component of the new object to be the first object in B of size n − k. If both first and second components were the last objects of sizes k and n − k in A and B, respectively, then a loop looks for the smallest k′ > k such that Ak′ × Bn−k′ is not empty. After that, the first objects in Ak′ and Bn−k′ are generated and the appropriate information is updated in the current node p. Multisets are dealt with in a similar spirit to products. The basis of the algorithm is the isomorphism ΘM(A) = ∆ΘA × M(A), where ∆A denotes the diagonal or stacking of the class A, that is, ∆A = {α | α ∈ A} + {(α, α) | α ∈ A} + {(α, α, α) | α ∈ A} + · · · , and ΘA is the pointing (marking) of the class A, that is, the class that we obtain by making k copies of each object of size k in A, but marking a different atom in each copy. If we mark an atom in a multiset we might think that we have marked the object that contains the atom, say the m-th copy of the object; on the right hand side we produce the marked object, make m copies of the marked object and attach a multiset.
A multiset γ consists of two parts: a first component α ∈ A of size k together with its number of occurrences, say ℓ, and a second component β which is a multiset of size n − ℓk. This second component contains objects in A whose size is less than or equal to k; in the latter case, their rank is strictly smaller than the rank of α (implying that they have been used as the first component previously). In order to get the next object, we check whether there exist multisets satisfying the conditions above, that is, whether there is some object following β. If not, we obtain the object following ⟨α, ℓ⟩ in ∆Aj. When the first component in the current object is the last object in ∆Aj, we loop until we find a suitable size j′ > j = ℓk, and obtain the respective first objects. The generation of the object following ⟨α, ℓ⟩ is also easy: we obtain the object following α in Ak if it exists; if not, we look for the smallest divisor k′ of j = ℓk which is larger than k and produce the first object in Ak′, attaching the appropriate number of occurrences ℓ′ = j/k′. Powersets are generated in the same vein. For a fixed first component α of size j, we produce all powersets of size n − j made up of objects of size smaller than or equal to j and whose rank is strictly smaller than the rank of α; if there are no more such powersets, we recursively apply the procedure to α. If α is the last object of size j then we look for the next available size j′ for the first component. The isomorphism here is given by ΘP(A) = ∆[odd]ΘA × P(A) − ∆[even]ΘA × P(A), where ∆[odd] and ∆[even] are like the diagonal operator, but for odd and even
numbers of copies, respectively. The proof of this isomorphism is a bit more involved, and exploits the principle of inclusion-exclusion to guarantee that no element is repeated. On a purely formal basis, we can introduce ∆̂ = ∆[odd] − ∆[even], so that we could say ΘP(A) = ∆̂ΘA × P(A). The operator ∆̂ allows for more convenient symbolic manipulations when computing the cost of the algorithms, but has no combinatorial meaning, though.

2.2
The Performance

Let ΛAn = Σ_{α∈An} cn(α), where cn(α) denotes the cost of applying next to the object α. Then µA,n = ΛAn/an is the amortized cost per generated object. We will not include in the cost the preprocessing time needed to compute the tables of counts nor the cost of constructing the first object (and associated information) of a given size in a given class. These can be precomputed just once (or computed the first time they are needed and stored into tables for later reuse) and their contribution to the overall performance will be neglected. Also, we will not include the time to parse the specification and transform it to standard form either, as this cost does not depend on n.

Lemma 1. Given an admissible unlabelled class A, let ΛA(z) denote the ordinary generating function of the cumulated costs {ΛAn}n≥0. Then

1. Λ∅ = Λε = ΛZ = 0,
2. Λ(A + B) = ΛA + ΛB + [[A]] + [[B]] − [[A + B]],
3. Λ(A × B) = ΛA · [[B]] + A · ΛB + [[A]] · [[B]] − [[A × B]],
4. ΛΘA = ΘΛA + Θ[[A]] − [[ΘA]],
5. Λ∆A = ∆ΛA + ∆[[A]] − [[∆A]],
6. Λ∆[t]A = ∆[t]ΛA + ∆[t][[A]] − [[∆[t]A]], t ∈ {odd, even},
where ΘA(z) ≡ z dA/dz and ΘA denotes the pointing or marking of the class A; ∆A(z) ≡ Σ_{k>0} A(z^k) and ∆A denotes the diagonal of the class A; ∆[odd]A(z) = Σ_{k>0} A(z^{2k−1}) and ∆[even]A(z) = Σ_{k>0} A(z^{2k}), where ∆[odd]A and ∆[even]A denote the odd and even diagonals of the class A, respectively; and [[A]] = Σ_{n≥0} [[an ≠ 0]] z^n, with [[P]] = 1 if the predicate P is true and 0 otherwise.

Proof. For classes that contain just one item, the cumulated cost is 0, as we do not count the cost of generating the first object. The rule for unions is straightforward, but we must take care to charge the cost corresponding to computing the next of the last element in A; this is accounted for by the terms [[A]] + [[B]] − [[A + B]]. For products, we generate pairs in Ak × Bn−k for k = 0, . . . , n. For a fixed object α of size k we generate all objects in Bn−k to form all pairs whose first component is α; since there are ak objects of size k and we have to consider all possible values of k, this contribution is given by A · ΛB. The other main contribution to the cost comes from the generation of all objects in A with sizes k from 0 to n (and such that there are objects in Bn−k to form at least a pair). This is given by the term ΛA · [[B]]. The remaining terms account for the
application of next to the last pair in Ak × Bn−k whenever there exists such a pair, but not for the first pair in A × B. The algorithm for the marking is also straightforward: list all elements in A of size n with the first atom marked, then list all of them again but with the second atom marked, and so on. The terms Θ[[A]] − [[ΘA]] account for the cost of passing from the first listing to the second, from the second to the third, etc. To generate all objects of size n in the diagonal of the class A, recall that we loop through all divisors d of n such that there are objects of size d. For each such d, we list all objects of size d and for each one we attach the number of copies (n/d) that make up the corresponding object in ∆A. Thus Λ∆An = Σ_{d divides n} (ΛAd + [[ad ≠ 0]]) − [[∆An ≠ ∅]]. The rules for the odd and even diagonals of A are similarly obtained.

From Lemma 1 we can easily obtain rules for sequences, sets and multisets.

Corollary 1. Let A be an admissible class such that ε ∉ A and let A be its counting generating function. Then

1. Let S(A) = 1/(1 − A). Then ΛS(A) = S(A) · (1 + (ΛA + [[A]] − 1)[[S(A)]]).

2. Let M(A) = exp(Σ_{k>0} A(z^k)/k). Then

ΛM(A) = M(A) · ∫₀^z ( ∆Θ(ΛA + [[A]]) · [[M(A)]] − Θ[[M(A)]] ) dz / (z · M(A)).

3. Let P(A) = exp(Σ_{k>0} (−1)^{k−1} A(z^k)/k). Then

ΛP(A) = P(A) · ∫₀^z ( ∆̂Θ(ΛA + [[A]]) · [[P(A)]] − Θ[[P(A)]] + [[ΘP(A)]] − [[∆[odd]ΘA × P(A)]] + [[∆[even]ΘA × P(A)]] ) dz / (z · P(A)).

Proof. It suffices to use the isomorphisms S(A) = ε + A × S(A), ΘM(A) = ∆ΘA × M(A) and ΘP(A) = ∆[odd]ΘA × P(A) − ∆[even]ΘA × P(A), and apply rules 1–6 in the statement of Lemma 1. In the case of multisets and powersets, the rules can be obtained applying Λ to both sides of the isomorphisms given above, inverting Θ and Λ with rule 4, and solving the resulting linear differential equations. In the case of powersets, the sought cost arises from the difference of costs; we have thus ΛΘP(A) = Λ(∆[odd]ΘA × P(A)) − Λ(∆[even]ΘA × P(A)).

We have the following theorem, which can be easily established either from the rules that we have just derived or directly by reasoning about the algorithms.

Theorem 1. For any unlabelled admissible class A which can be finitely specified using ε, Z, +, ×, S, M and P, we have µA,n = ΛAn/an = Θ(1).
Proof (Sketch). The proof is by structural induction on the specification of the class and on the size of the objects. We consider thus what happens for unions, products, sequences, etc., and assume that the statement is true for smaller sizes. Since we charge just one "time" unit for the update of one node in the deep structure and assume that the initialization and calls to the count routine are free, we actually have µA,n → 1 as n → ∞. In practice, µA,n is different for different classes if we take into account that the (constant) overhead associated with each operator varies. We conclude with a few simple examples of application of these rules.

1. K-shuffles. A K-shuffle is a sequence of a's and b's that contains exactly K b's. Let LK = S(a) × (b × LK−1) for K > 0 and L0 = S(a). It is not difficult to show that ΛLK ∼ z^K/(1 − z)^{K+1} near the singularity z = 1; hence the amortized cost µLK,n → 1 since there are exactly (n choose K) K-shuffles of size n. We get the same asymptotic performance if we use alternative specifications, e.g., LK = LK−1 × (b × S(a)).

2. Motzkin trees. For M = ε + Z × M + Z × M × M, one readily gets ΛM ∼ −(1/2)(3 − 2√2) √(1 − 6z + z²) + O((1 − 6z + z²)^{3/2}) near the dominant singularity at z = 3 − 2√2; hence ΛM ∼ M and µM,n = 1 + o(1).
3
Labelled Classes
By admissible labelled classes we mean those that can be finitely specified using the ε-class, atomic labelled classes, unions (+), labelled products (⋆), sequences (Seq), sets (Set) and cycles (Cycle) of admissible labelled classes. As in the previous section, there exist other admissible operators over labelled classes, but we shall restrict our attention to those mentioned above. Again, many important combinatorial classes can be specified within this framework and the ideas that we present here carry over to other admissible operators such as substitutions, sequences, sets and cycles of restricted cardinality, etc. For example, the class C of Cayley trees is admissible, as it can be specified by C = Z ⋆ Set(C), where Z denotes an atomic class. The class F of functional graphs is also admissible, since a functional graph is a set of cycles of Cayley trees; therefore, F = Set(Cycle(C)). The exhaustive generation of labelled combinatorial objects uses similar ideas to those sketched in Section 2; in fact, we face easier problems since sets and cycles can be specified by means of the so-called boxed product, denoted by □⋆ [3]. We recall that in a boxed product, we obtain a collection of labelled objects from a pair of given objects, much in the same manner as for the usual labelled product, but the smallest label must always correspond to an atom belonging to the first object in the pair. Boxed products are related to the pointing (see Subsection 2.1) of a class A by ΘA = ΘB ⋆ C ⇐⇒ A = B □⋆ C. The isomorphisms for sequences, sets and cycles in terms of the other constructors (union, product and boxed product) that we use in our algorithms are the following: 1) Seq(A) = ε + A ⋆ Seq(A), 2) Set(A) = ε + A □⋆ Set(A), and 3) Cycle(A) = A □⋆ Seq(A). Thus every admissible specification (a finite set of equations specifying admissible
580
Conrado Martínez and Xavier Molinero
classes, like in the example of functional graphs) can be transformed into an equivalent specification that involves only unions, products and boxed products. The algorithms for unions and products are very similar to those for unlabelled classes, and the algorithm for boxed products works much like the algorithm for products. In the case of labelled and boxed products, we change the partition or relabelling of the current object if possible; otherwise we recursively apply the next routine to the second component or the first component of the object. In order to "traverse" the C(n, k) possible partitions of the labels³ of a pair ⟨α, β⟩, where n is the size of the objects to be generated and k is the size of the first component of the current object, we use Nijenhuis and Wilf's routine for the next k-subset [7] (alternatively, we can use the algorithm by Kemp [4]). Also, as for unlabelled classes, we can set up a calculus for the complexity of our algorithms, with rules such as ΘΛ(A□ ⋆ B) = ΘΛA · [[B]] + ΘA · ΛB + Θ[[A]] · [[B]] − Θ[[A□ ⋆ B]] for boxed products. Here, [[A]] = Σ_{n≥0} [[a_n ≠ 0]] zⁿ/n! and ΛA is the exponential generating function of the total costs to generate all elements of each size. The rules for the other combinatorial constructions are similar in spirit and can be easily derived from the rules for unions, products and boxed products. We make here the same assumptions as in the analysis of the performance of the algorithms for unlabelled classes; moreover, we take into account the fact that the algorithm for the generation of k-subsets works in constant amortized time [7]. Then it is not difficult to show that this cost can be easily "absorbed" by terms like A · ΛB in the rule for products, and there is no need to include a term of the type c · A · B. Using the same techniques as in the proof of Theorem 1, it is not hard to establish an analogous result for labelled generation. ∎
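For illustration, a plain lexicographic successor for k-subsets suffices to traverse all C(n, k) label partitions; the sketch below is a simple stand-in (ours) for the constant-amortized-time Nijenhuis-Wilf routine cited above, whose actual implementation is different.

```python
def next_k_subset(s, n):
    """Lexicographic successor of the k-subset s (a sorted list of
    elements of {0, ..., n-1}), or None after the last subset.
    A simple stand-in for the Nijenhuis-Wilf routine cited in the text."""
    s = list(s)
    k = len(s)
    i = k - 1
    # find the rightmost element that can still be incremented
    while i >= 0 and s[i] == n - k + i:
        i -= 1
    if i < 0:
        return None
    s[i] += 1
    for j in range(i + 1, k):      # reset the tail to the smallest values
        s[j] = s[j - 1] + 1
    return s

# walk through all C(5,3) = 10 subsets, i.e. all label choices for
# the first component of a pair of total size 5 with k = 3
subsets = []
s = [0, 1, 2]
while s is not None:
    subsets.append(tuple(s))
    s = next_k_subset(s, 5)
print(len(subsets))  # 10
```

Each call touches only the suffix that changes, which is why routines of this kind can achieve constant amortized cost per subset.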
The detailed account of the complexity calculus for the generation of labelled objects and of the proof of the following theorem will be given in the full version of this extended abstract.
Theorem 2. For any admissible labelled class A which can be finitely specified using ε, Z, +, ⋆, Seq, Set and Cycle, we have µ_{A,n} = ΛA_n/a_n = Θ(1).
4 Current and Future Work
As we have mentioned earlier, we already have a constant amortized time algorithm to generate unlabelled cycles of A's. However, this algorithm for the generation of unlabelled cycles is based upon techniques quite different from the ones presented here, and it does not fit nicely into the framework sketched here (in sharp contrast with labelled cycles). We have implemented all the algorithms described here for the Maple system, on top of the basic routines provided by the combstruct package. Also,
³ There are only C(n−1, k−1) partitions of the labels in the case of boxed products.
Generic Algorithms for the Generation of Combinatorial Objects
581
there are plans for a port of these routines to the MuPAD-combinat package in the near future. Furthermore, we also have routines for the generation of labelled substitutions and for labelled sequences, sets and cycles when their cardinalities are restricted. We have conducted extensive experiments to assess the practical performance of our algorithms. These experiments show that the practical performance is in good agreement with the theoretical predictions (namely, the cost grows linearly with the total number N of generated objects, if N is sufficiently large; the slope of the plot is independent of the size of the objects being generated). Our current work is centered on the extension of the techniques presented here to other admissible operators. We are also trying to design an algorithm for unlabelled cycles that fits within the framework sketched here. If we obtained such an algorithm, it would immediately suggest an efficient answer for the unranking of unlabelled cycles, a question that, to the best of the authors' knowledge, still remains open. We are also working on alternative isomorphisms and orderings which could improve the efficiency of the generation algorithms (similar ideas yield significant improvements for the random generation and unranking of objects, see [2,6]).
References
1. Ph. Flajolet and B. Salvy. Computer algebra libraries for combinatorial structures. J. Symbolic Computation, 20:653–671, 1995.
2. Ph. Flajolet, P. Zimmerman, and B. Van Cutsem. A calculus for the random generation of combinatorial structures. Theoret. Comput. Sci., 132(1-2):1–35, 1994.
3. D.H. Greene. Labelled Formal Languages and Their Uses. PhD thesis, Computer Science Dept., Stanford University, 1983.
4. R. Kemp. Generating words lexicographically: An average-case analysis. Acta Informatica, 35(1):17–89, 1998.
5. C. Martínez and X. Molinero. Generic algorithms for the exhaustive generation of labelled objects. In Proc. Workshop on Random Generation of Combinatorial Structures and Bijective Combinatorics (GASCOM), pages 53–58, 2001.
6. C. Martínez and X. Molinero. A generic approach for the unranking of labelled combinatorial classes. Random Structures & Algorithms, 19(3–4):472–497, 2001.
7. A. Nijenhuis and H.S. Wilf. Combinatorial Algorithms. Academic Press, 1978.
8. E.M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, Englewood Cliffs, NJ, 1977.
9. R. Sedgewick and Ph. Flajolet. An Introduction to the Analysis of Algorithms. Addison-Wesley, Reading, MA, 1996.
10. J.S. Vitter and Ph. Flajolet. Average-case analysis of algorithms and data structures. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, chapter 9. North-Holland, 1990.
On the Complexity of Some Problems in Interval Arithmetic
K. Meer
Department of Mathematics and Computer Science, Syddansk Universitet, Campusvej 55, 5230 Odense M, Denmark
[email protected], Fax: 0045 6593 2691
Abstract. We study some problems in interval arithmetic treated in Kreinovich et al. [6]. First, we consider the best linear approximation of a quadratic interval function. This problem is known to be NP-hard in the Turing model; we analyze its complexity in the real number model and the analogous class NP_R. We give new upper complexity bounds by locating the decision version in DΣ_R² (a real analogue of Σ²) and solve a problem left open in [6].
1 Introduction
Problems in interval arithmetic model situations in which input data is only known within a certain accuracy. Starting from an exact description with input values a_i, i ∈ I (say a_i ∈ R or ∈ Q, I an index set), a corresponding formalization in terms of interval arithmetic would only supply the information that the a_i's belong to some given intervals [a̲_i, ā_i] ⊂ R. This framework provides a way to formalize and study problems related to the presence of uncertainties. The latter includes both data errors occurring during measurements and rounding errors arising during the execution of computer algorithms. Interval arithmetic thus can be seen as an approach to validating numerical calculations. The computational complexity of solving a problem in the interval setting might be significantly larger than that of solving the corresponding problem with accurate input data. In fact, many such results are known in interval arithmetic. As an example, consider the solvability problem for a linear equation system A · x = b. If A and b are given precisely, Gaussian elimination efficiently yields a solution (or proves its non-existence). This holds both for the bit measure and the algebraic cost measure, see below. However, if we only know that the entries of A and b belong to given intervals, then the complexity changes dramatically: deciding whether concrete choices for A and b exist within the given (rational) interval bounds such that the resulting system for these choices is solvable is NP-complete, see [6].
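The basic interval operations underlying this setting are easy to sketch. The following minimal Python class (ours, not a library API) propagates interval bounds through +, − and ×, and already exhibits the overestimation effects that make questions about all or some realizations of the data delicate.

```python
class Interval:
    """Closed interval [lo, hi]; a minimal sketch of the arithmetic
    used to propagate data uncertainty (not any library's API)."""
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi
    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)
    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        # the product interval is spanned by the four endpoint products
        ps = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(ps), max(ps))
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

x = Interval(-1, 2)
print(x * x)   # [-2, 4]: the rule ignores that both factors are the same x
print(x - x)   # [-3, 3], not [0, 0]: the so-called dependency problem
```

Evaluating an expression interval-wise is cheap but only gives an enclosure; deciding sharp questions about the set of realizations is where the NP-hardness discussed above enters.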
Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT) and by the Danish Natural Science Research Council SNF.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 582–591, 2003. c Springer-Verlag Berlin Heidelberg 2003
From a logical point of view this increase in complexity (provided P ≠ NP) is due to the presence of additional quantifiers in the interval problem description. For example, in the above linear system problem the interval approach prefixes the linear equation problem with additional existential quantifiers ranging over the intervals given by the accuracy bounds and asking for the existence of the "right" coefficients. Since the new quantifiers range over real intervals, they in principle introduce a quantified formula in the first-order theory of the reals (considered as a real closed field; this of course only holds for algebraic problems). Even though this theory is decidable, the complexity of the currently known algorithms is tremendous. It is well known that already the existential theory over R is an NP_R-complete problem; here, NP_R denotes the real analogue of NP in the framework of real Turing (or Blum-Shub-Smale, for short: BSS) machines, see [1]. Therefore, it is natural to consider interval problems in that framework. In this paper we want to analyze whether, for interval problems known to be hard in the Turing setting, the shift from the Turing to the real number model implies NP_R-completeness or NP_R-hardness in the latter as well. We shall substantiate that in the real number model a finer complexity analysis can be done. More precisely, for some problems the interval formulation will likely not lead to NP_R-hardness, even though the restriction to rational inputs and the Turing model implies NP-hardness (or completeness, respectively). This is due to the fact that even though formally the new quantifiers range over the reals, in certain situations they can be replaced by Boolean quantifiers only, i.e., quantifiers ranging over {0, 1}. To clarify our approach we study the following problem treated in [6]: best approximation of quadratic interval functions by linear ones.
The problem is known to be NP-hard in the Turing model [7]; note, however, that membership in the polynomial hierarchy is not established in [7].
Definition 1. (a) Let B := [b̲_1, b̄_1] × . . . × [b̲_n, b̄_n] be a box in Rⁿ, b̲_i < b̄_i for 1 ≤ i ≤ n. An interval function f on B is a mapping which assigns to each point y ∈ B an interval f(y) := [f̲(y), f̄(y)] ⊆ R. If both functions f̲ and f̄ are linear or quadratic functions, i.e., if they are polynomials of degree 1 or 2, respectively, we call f a linear respectively a quadratic interval function.
(b) Given a box B as above, a linear interval function X := [X̲, X̄] and a quadratic interval function f := [f̲, f̄] on B, we say that X approximates f on B iff [f̲(y), f̄(y)] ⊆ [X̲(y), X̄(y)] for all y ∈ B.
Definition 2. (a) The problem BLAQIF (best linear approximation of a quadratic interval function) is defined as follows: given n ∈ N, a box B ⊆ Rⁿ, a quadratic interval function f(y) := [f̲(y), f̄(y)] on B and a bound M ∈ R, is there a linear approximation X = [X̲, X̄] of f on B such that max_{y∈B} X̄(y) − X̲(y) ≤ M?
(b) The computational version of BLAQIF asks to compute min_X max_{y∈B} X̄(y) − X̲(y) under the constraint that X = (X̲, X̄) is an approximation of f.
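For intuition, the containment required by Definition 1(b) can at least be checked numerically. The sketch below (function names and the particular 1-D instance are ours) samples a box, tests the containment, and evaluates max_{y∈B} X̄(y) − X̲(y); it is only a sanity check, not the NP-hard decision procedure studied in this paper.

```python
def is_approximation(X_lo, X_hi, f_lo, f_hi, B, steps=1000):
    """Check [f_lo(y), f_hi(y)] within [X_lo(y), X_hi(y)] on a 1-D box B
    by grid sampling; returns (contained?, max width of [X_lo, X_hi])."""
    a, b = B
    eps = 1e-9   # tolerance for floating-point ties
    pts = [a + (b - a) * i / steps for i in range(steps + 1)]
    ok = all(X_lo(y) <= f_lo(y) + eps and f_hi(y) <= X_hi(y) + eps for y in pts)
    width = max(X_hi(y) - X_lo(y) for y in pts)
    return ok, width

# f(y) = [y^2, y^2 + 1] on B = [0, 1]; candidate X = [y - 1/4, y + 1]:
# y - 1/4 <= y^2 (tangent at y = 1/2) and y^2 + 1 <= y + 1 on [0, 1]
ok, w = is_approximation(lambda y: y - 0.25, lambda y: y + 1.0,
                         lambda y: y * y, lambda y: y * y + 1.0, (0.0, 1.0))
print(ok, w)   # True 1.25, so this X witnesses any bound M >= 1.25
```

The hard part, of course, is the quantifier structure: a yes-instance requires some X whose containment holds for all y in B, which is exactly where the alternation discussed above comes from.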
Our main results can now be summarized as follows:
Main results: (i) In the real number model BLAQIF is not NP_R-complete under weak polynomial time reductions and likely (in a sense to be made precise) not NP_R-complete under (full) polynomial time reductions. (ii) In the Turing model BLAQIF can be located in Σ². For fixed input dimension n both the decision and the computational version can be solved in polynomial time.
Part (ii) complements the results in [7] by providing an upper complexity bound in the Turing setting. It also answers a question posed in [6].
2 Basic Notations; Structural Properties
We first recall the definition of some complexity classes important for our results. Then, we analyze the consequences for problems belonging to one of these classes with respect to completeness or hardness properties in the real number model.
2.1 Complexity Classes
Though there are different equivalent definitions for the classes we need, for our purposes those based on alternating quantifiers are most appropriate.
Definition 3. (a) A decision problem A over the alphabet {0, 1} is in class Σ^k, k ∈ N, iff there are a problem B ∈ P and polynomials p_1, . . . , p_k such that
x ∈ A ⇐⇒ Q_1 y_1 ∈ {0, 1}^{p_1(|x|)} . . . Q_k y_k ∈ {0, 1}^{p_k(|x|)} : (x, y_1, . . . , y_k) ∈ B,
where the variable blocks y_i range over {0, 1}^{p_i(|x|)} and the quantifiers Q_i ∈ {∃, ∀} alternate, starting with Q_1 = ∃ (and |x| denotes the bit size of x).
(b) A decision problem A over R^∞ := ∪_{i≥1} R^i is in class Σ_R^k, k ∈ N, in the real number model iff there are a problem B ∈ P_R and polynomials p_1, . . . , p_k such that
x ∈ A ⇐⇒ Q_1 y_1 ∈ R^{p_1(|x|_R)} . . . Q_k y_k ∈ R^{p_k(|x|_R)} : (x, y_1, . . . , y_k) ∈ B,
where the variable blocks y_i range over R^{p_i(|x|_R)} and the quantifiers Q_i ∈ {∃, ∀} alternate, starting with Q_1 = ∃ (and |x|_R denotes the algebraic size of x).
(c) If in b) we restrict the quantifiers to be Boolean ones, i.e., if the variable blocks range over {0, 1}* instead of R^∞, we obtain the digital classes DΣ_R^k.
Clearly, Σ¹ = NP, Σ_R¹ = NP_R and DΣ_R¹ = DNP_R, where the latter is the class digital NP_R of problems in NP_R that require a discrete search space for verification only.
2.2 The Real Number Complexity of Problems in DΣ_R²
This section is on the structural complexity of problems in DΣ_R². The main goal is to argue that problems in DΣ_R² likely do not bear the full complexity of Σ_R², and not even that of NP_R-hard problems. This shows that the complexity analysis of several NP-hard interval arithmetic problems can be considerably refined. We turn this into more precise statements as follows. We give an absolute statement with respect to so-called weak reductions (introduced by Koiran [5] for a weak version of the BSS model): no problem in DΣ_R² is NP_R-hard under weak reductions. Then, we give an analogous statement for (general) polynomial time reductions under a widely believed hypothesis concerning computations of resultant polynomials: no problem in DΣ_R² is NP_R-hard under polynomial time reductions unless there is a (non-uniform) polynomial time algorithm computing a multiple of the resultant polynomial on a Zariski-dense subset. Though some definitions are necessary to precisely state these results, the proofs are almost straightforward extensions of similar statements for DNP_R given in [2] and [8].
Definition 4 (Weak running time, [5]). (a) Let M be a real machine with (real) machine constants c := (c_1, . . . , c_s) and having a running time bounded by a function t of the (algebraic) input size. Any intermediate result computed by M on input x is a rational function of the form p(x, c_1, . . . , c_s)/q(x, c_1, . . . , c_s), where p and q are polynomials with integer coefficients over x and c. The weak cost of computing this intermediate result is given as the maximum among the degrees of p and q and the bit sizes of any of their coefficients. Other operations of M (like branches and copying) have weak cost 1. The weak running time of M on input x ∈ R^∞ is the sum of the weak costs of all intermediate results and branch-nodes along the computational path of M on x.
(b) We call a many-one reduction a weak polynomial time reduction if it can be computed in weak polynomial time by a BSS machine.
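Repeated squaring, mentioned below, is the standard example separating the two cost measures: after t BSS operations the intermediate result is the polynomial x^(2^t), whose degree, and hence whose weak cost, is 2^t. A tiny sketch of this bookkeeping (the accounting variables are ours):

```python
# Repeated squaring x -> x^2 -> x^4 -> ... takes t operations in the
# BSS model, but the intermediate result after step t is x^(2^t):
# its degree -- and thus the weak cost charged for it -- is 2^t.
degree = 1      # degree of the current intermediate result p(x, c)
weak_cost = 0   # accumulated weak running time
for t in range(1, 31):
    degree *= 2            # squaring doubles the degree
    weak_cost += degree    # weak cost charges the degree, not 1
print(degree)              # 2**30 = 1073741824 after only 30 operations
```

So a 30-step BSS computation can have weak running time above 2^31, which is exactly why operation sequences like repeated squaring become expensive in the weak model.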
The notion of NP_R-completeness under weak polynomial time reductions is then defined in a straightforward manner. Note that using the weak cost measure we still allow real number algorithms, but operation sequences like repeated squaring now get more expensive than in the BSS model, see [5]. The next definition introduces (a particular subcase of) the well-known resultant polynomials. Consider the problem of deciding whether a given system f := (f_1, . . . , f_n) ∈ R[x_1, . . . , x_n]^n of n homogeneous polynomials of degree 2 in n variables has a zero x ∈ Cⁿ \ {0}, i.e., f_i(x) = 0 ∀i. We denote by H the set of all such systems and by H_0 those being solvable in Cⁿ \ {0}. The claims implicitly stated in the following definition are well known, see, e.g., [11].
Definition 5. Let n ∈ N, N := (1/2) · n² · (n + 1). The resultant polynomial RES_n : R^N → R is a polynomial which takes as its indeterminates the coefficient vectors of homogeneous systems in H. It is the unique (up to sign) irreducible polynomial with integer coefficients that generates the variety of (coefficient vectors of) solvable instances H_0 of problems in H, i.e., RES_n(f) = 0 iff f ∈ H
has a zero x ∈ Cⁿ \ {0}. In this notation, RES_n(f) is interpreted as evaluating RES_n on the coefficient vector of f in R^N.
It is generally believed that no efficient algorithms for computing RES_n exist. This is, for example, substantiated by the close relation of this problem to other potentially hard computational problems like the computation of mixed volumes. For more details see [11] and the literature cited therein. Hardness results for certain resultant computations can be found in [9]; relations between the computation of resultants and the real P_R versus NP_R question are studied in [10].
Theorem 6. (a) No problem in DNP_R is NP_R-complete under weak polynomial time reductions. No problem in DΣ_R² is NP_R-hard under weak polynomial time reductions.
(b) Suppose there is no (non-uniform) polynomial time algorithm which for each n ∈ N computes a non-zero multiple of RES_n on a Zariski-dense subset of H_0. Then no problem in DNP_R is NP_R-complete and no problem in DΣ_R² is NP_R-hard under polynomial time reductions in the BSS model.
The proof is an extension of ideas developed in [2] and [8].
Remark 1. Note that in (b) we cannot expect an absolute statement of non-completeness like in (a) unless P_R ≠ NP_R is proven (for the weak model the relation weak-P_R ≠ weak-NP_R is shown in [2]). In the next sections the above theorem is used to substantiate the conjecture that interval problems which belong to either DNP_R or DΣ_R² do not share the full difficulty of complete problems in NP_R. Thus, in the real number model their complexity seems not to be the hardest possible among all (algebraic) interval problems belonging to the corresponding real complexity class.
3 Approximation of Interval Functions
The BLAQIF problem is closely related to a semi-infinite optimization problem. To see this, suppose for a while that we have found an optimal linear approximation
X̄(y) := x̄_0 + x̄_1 y_1 + . . . + x̄_n y_n, with X̄(y) − f̄(y) ≥ 0 ∀ y ∈ B, and
X̲(y) := x̲_0 + x̲_1 y_1 + . . . + x̲_n y_n, with f̲(y) − X̲(y) ≥ 0 ∀ y ∈ B.
As shown in [6] it is easy to calculate max_{y∈B} X̄(y) − X̲(y) once X̄, X̲ are known. The components y*_i of the optimal y* are determined by the signs of x̄_i − x̲_i according to y*_i := b̄_i if x̄_i ≥ x̲_i and y*_i := b̲_i if x̄_i < x̲_i. Knowing these signs we obtain a linear semi-infinite programming problem. For example, if we suppose x̄_i ≥ x̲_i ∀ 1 ≤ i ≤ n the problem turns into
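The sign rule can be checked directly: the width X̄(y) − X̲(y) is linear in y, so its maximum over the box sits at the vertex selected by the signs of x̄_i − x̲_i. A small Python sketch (example data and names are ours) compares the rule against brute force over all box vertices.

```python
from itertools import product

def max_width(x_hi, x_lo, B):
    """max over the box B of X_hi(y) - X_lo(y) for linear functions
    given as coefficient tuples (x_0, x_1, ..., x_n); uses the sign
    rule from the text: y*_i is the upper endpoint iff x_hi_i >= x_lo_i."""
    y_star = [bi_hi if xh >= xl else bi_lo
              for (bi_lo, bi_hi), xh, xl in zip(B, x_hi[1:], x_lo[1:])]
    val = lambda x, y: x[0] + sum(c * yi for c, yi in zip(x[1:], y))
    return val(x_hi, y_star) - val(x_lo, y_star)

B = [(-1.0, 2.0), (0.0, 3.0)]
x_hi = (1.0, 2.0, -1.0)     # X_hi(y) = 1 + 2 y1 - y2
x_lo = (0.0, -1.0, 0.5)     # X_lo(y) = -y1 + 0.5 y2
best = max_width(x_hi, x_lo, B)

# cross-check: a linear function attains its maximum over a box at a vertex
brute = max((x_hi[0] + x_hi[1]*y1 + x_hi[2]*y2) - (x_lo[0] + x_lo[1]*y1 + x_lo[2]*y2)
            for y1, y2 in product(*B))
print(best == brute, best)   # True 7.0
```

Here x̄_1 − x̲_1 = 3 ≥ 0 selects the upper endpoint y1 = 2, while x̄_2 − x̲_2 = −1.5 < 0 selects the lower endpoint y2 = 0, matching the rule quoted above.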
min  x̄_0 + Σ_{i=1}^{n} x̄_i · b̄_i − x̲_0 − Σ_{i=1}^{n} x̲_i · b̄_i
(LSI)  s.t.  x̄_0 + x̄^T · y − f̄(y) ≥ 0  ∀ y ∈ B
             f̲(y) − x̲_0 − x̲^T · y ≥ 0  ∀ y ∈ B
             x̄_i ≥ x̲_i  ∀ 1 ≤ i ≤ n,
where x̄ := (x̄_1, . . . , x̄_n), and similarly for x̲. This problem is linear on the upper variable level (i.e., the 2n + 2 many x-variables) and quadratic on the lower variable level (i.e., y). It is semi-infinite because there are infinitely many side-constraints for X̄, X̲, parametrized through y. Note that in general we do not know in advance which sign-conditions hold for the components of an optimal solution X̄, X̲. Later on, we shall guess the right conditions as the first part of our DΣ_R² algorithm and start from the resulting (LSI) problem.
General assumption: For the sake of simplicity, in the following we assume without loss of generality x̄_i ≥ x̲_i ∀ 1 ≤ i ≤ n and deal with the above (LSI).
It is easy to see that the decision version of BLAQIF belongs to Σ_R². This result, however, is not strong enough for what we want. It neither proves a similar statement in the Turing model (since we do not know how to bound the bit-sizes of the guessed reals) nor does it give any indication that BLAQIF in the real setting is likely to be an easier problem than complete ones for class Σ_R² (or even for class NP_R). In order to see how general quantifier elimination procedures over R can be avoided when solving the real BLAQIF problem, we have to study semi-infinite optimization problems a bit more deeply.
3.1 Optimality Conditions for (LSI)
A fundamental idea for studying (LSI) is to reduce the infinitely many constraints to finitely many in order to apply common optimization criteria. The following can be deduced from semi-infinite programming theory, see, e.g., [4].
Theorem 7. A feasible point (X̲, X̄) is optimal for (LSI) iff the following conditions are satisfied: there exist two sets {y^(i), i ∈ I} and {y^(j), j ∈ J}, each of at most n points in B, together with Lagrange parameters λ_i, i ∈ I, ν_j, j ∈ J and µ_k, 1 ≤ k ≤ n, such that
i) Σ_{i∈I} (1, y^(i), 0, 0)^T · λ_i + Σ_{j∈J} (0, 0, −1, −y^(j))^T · ν_j + (0, µ, 0, −µ)^T = (1, b̄, −1, −b̄)^T,
where µ := (µ_1, . . . , µ_n) and 0 ∈ Rⁿ;
ii) λ_i ≥ 0 ∀ i ∈ I, ν_j ≥ 0 ∀ j ∈ J, µ_k ≥ 0 ∀ 1 ≤ k ≤ n;
iii) either λ_i = 0 or the point y^(i) is optimal for the problem min_{y∈B} x̄_0 + x̄^T · y − f̄(y), and the optimal value is 0;
iv) either ν_j = 0 or the point y^(j) is optimal for the problem min_{y∈B} f̲(y) − x̲_0 − x̲^T · y, and the optimal value is 0;
v) µ_k · (x̄_k − x̲_k) = 0 ∀ 1 ≤ k ≤ n.
This theorem is most important for obtaining our results for the following reasons. First, it states that at least one point y^(i) and one point y^(j) satisfying conditions iii) and iv), respectively, exist; this follows from Σ_{i∈I} λ_i = 1 = Σ_{j∈J} ν_j.
Therefore, we can search for such points and are sure to find them if we guarantee the search to be exhaustive. Secondly, as global optima for the corresponding subproblems, y^(i) and y^(j) satisfy the following optimality conditions on the lower level of the semi-infinite problem.
Corollary 1. Using the setting of Theorem 7, let y^(i) be a point satisfying condition iii), where λ_i > 0. Let AC(y^(i)) := {k | y^(i)_k = b̲_k or y^(i)_k = b̄_k} be the set of active components of y^(i) in B. Then there exist Lagrange parameters η_j ≥ 0, j ∈ AC(y^(i)), such that x̄ − D_y f̄(y^(i)) = Σ_{j∈AC(y^(i))} η_j · (±e_j), where e_j is the j-th unit vector and the sign is +1 iff y^(i)_j = b̲_j and −1 iff y^(i)_j = b̄_j.
3.2 Linear Approximation of Quadratic Functions Is in DΣ_R²
The previous results on the relation between BLAQIF and (LSI) are used in this subsection to prove membership in DΣ_R² and in Σ², respectively, as follows. The overall goal is to find a solution (X̲, X̄) and check that it realizes the demanded bound M. It has to be shown how this can be done using binary (digital) quantifiers. To this end,
1) we guess the right set of signs for x̄_k − x̲_k, 1 ≤ k ≤ n, and produce the corresponding (LSI); without loss of generality we again assume all these signs to be 0 or 1.
2) Assuming (X̲, X̄) to be known, we guess certain discrete information which is then used to compute at least one point y^(i), i ∈ I, and one point y^(j), j ∈ J, satisfying Theorem 7. This is done using Corollary 1 and the ideas developed in [8].
3) From the corollary and the information obtained in 2) we deduce conditions that have to be fulfilled by an optimal solution (X̲, X̄). These conditions lead to a linear programming problem. By means of a DNP_R algorithm we obtain a candidate (X̲, X̄) for the optimum.
4) Finally, the candidate obtained in 3) is checked for optimality. This mainly requires checking the constraints, which now are quadratic programs in y. Using the results of [8] this problem belongs to class co-DNP_R = DΠ_R¹.
Together, we obtain a DΣ_R² algorithm.
Theorem 8. Let y^(i) ∈ S and y^(j) ∈ S be two points in the statement of Theorem 7 such that the corresponding Lagrange parameters λ_i and ν_j are positive.
Suppose that we know neither an optimal (X̲, X̄) nor y^(i), y^(j). Then, given the correct information about the signs of x̄_i − x̲_i for an optimal solution and about the active components of y^(i) and y^(j) in S (i.e., those components that correspond either to b̲_k or to b̄_k), we can compute an optimal solution (X̲, X̄) of (LSI) as (any) solution of a specific linear programming problem. Moreover, the latter linear programming problem can be constructed deterministically in polynomial time if the active components are known.
Corollary 2. There is a DΣ_R¹ algorithm which computes a set X of vectors in which an optimal solution of (LSI) can be found, i.e., a non-deterministic algorithm that guesses a vector in {0, 1}* of polynomial length in n and produces a candidate (X̲, X̄) for each guess such that at least one of the candidates produced is an optimal solution.
Proof. The active components of y^(i) and y^(j) can be coded by a bit-vector. Now use Theorem 8 together with the results in [8].
Proof of Theorem 8. As can be seen from the proof below, it is sufficient to argue for one of the points y^(i), y^(j) only, so let us consider y^(i). W.l.o.g. suppose the first s components to be active and to satisfy y^(i)_k = b̲_k for 1 ≤ k ≤ s. This actually is the most difficult case, because the values of the active components y^(i)_k = b̲_k do not correspond to the assumed inequalities x̄_k − x̲_k ≥ 0, which in the objective function result in the terms (x̄_k − x̲_k) · b̄_k (instead of (x̄_k − x̲_k) · b̲_k, which would correspond to y^(i)_k = b̲_k). However, the difference only results in an additional LP-problem which is of no concern in our analysis. We plug the active components into the constraint x̄_0 + x̄^T · y − f̄(y) ≥ 0 and obtain a quadratic minimization problem in the remaining components y_{s+1}, . . . , y_n:
min  x̄_0 + Σ_{k=1}^{s} x̄_k · b̲_k + Σ_{k=s+1}^{n} x̄_k · y_k − f̄(b̲_1, . . . , b̲_s, y_{s+1}, . . . , y_n)      (∗)
such that  b̲_k < y_k < b̄_k,  s + 1 ≤ k ≤ n.
If the guess was correct, we know that an interior solution for ỹ := (y_{s+1}, . . . , y_n) exists. Now define f̄(b̲_1, . . . , b̲_s, y_{s+1}, . . . , y_n) =: (1/2) ỹ^T · D · ỹ + h^T · ỹ + e, where D ∈ R^{(n−s)×(n−s)}, h ∈ R^{n−s}, e ∈ R. Then Corollary 1 together with a straightforward calculation gives:
i) an optimal (interior) solution ỹ lies in the kernel of D;
ii) an optimal (interior) solution ỹ satisfies (x̄_{s+1}, . . . , x̄_n)^T = D · ỹ + h. Thus, i) implies (x̄_{s+1}, . . . , x̄_n)^T = h and we can compute these components of the (LSI) solution directly;
iii) the optimal value of (∗) is 0; using ii) this results in x̄_0 + Σ_{k=1}^{s} x̄_k · b̲_k = e.
In a completely analogous fashion we obtain a similar condition for the part X̲ of a solution when studying an optimal y^(j) for min_{y∈B} f̲(y) − x̲^T · y − x̲_0. If without loss of generality the last n − ℓ components of a solution y^(j) are active, we get (x̲_1, . . . , x̲_ℓ)^T = h′ as well as x̲_0 + Σ_{k=ℓ+1}^{n} x̲_k · b̄_k = e′ for appropriate values h′, e′ that can easily be computed knowing the active components of y^(j).
Putting all the information together we have the following situation: knowing the active components of y^(i), y^(j) we can directly compute in polynomial time from the problem input those components of an optimal solution (X̲, X̄) that correspond to non-active y^(i)_k, y^(j)_k. The remaining ones can be obtained as an optimal solution of the linear program
min  x̄_0 + Σ_{k=1}^{s} x̄_k · b̄_k + Σ_{k=s+1}^{n} h_k · b̄_k − x̲_0 − Σ_{k=1}^{ℓ} h′_k · b̄_k − Σ_{k=ℓ+1}^{n} x̲_k · b̄_k
s.t.  x̄_0 + Σ_{k=1}^{s} x̄_k · b̲_k = e  and  x̲_0 + Σ_{k=ℓ+1}^{n} x̲_k · b̄_k = e′.
Following [8] such a solution can be computed non-deterministically in polynomial time using a binary vector as a guess. The theorem can be used to prove the main result of this section:
Theorem 9. The BLAQIF decision problem belongs to DΣ_R² in the real number model and to Σ² in the Turing model.
Proof. It is clear that there exists a best linear approximation for each instance. The first sequence of binary existential quantifiers is used to find the correct (LSI) version, i.e., to guess the correct signs for x̄_k − x̲_k in an optimal solution. We use the guess to construct the right objective function for the problem (as described before). According to Theorem 7 there exist two points y^(i), y^(j) as described in Theorem 8. Moreover, we can guess a binary vector of polynomial length in the algebraic input size, perform the algorithm described in the proof of Theorem 8 and compute a candidate (X̲, X̄) for an optimal solution of the (LSI) instance. The proof also guarantees that if a deterministic (though inefficient) algorithm worked through all possible guesses, at least one of them would yield an optimal solution, see Corollary 2. In the remaining part of the DΣ_R² algorithm we have to verify that the computed candidate (X̲, X̄) is indeed feasible and yields a bound ≤ M for the objective function. Whereas the latter is done by a simple evaluation, the former requires the computation of an optimal point for two quadratic programming problems with linear constraints. These problems have the lower level variables y as unknowns and are obtained by plugging X̄ and X̲ into the lower level equations. We have seen this problem to be in co-DNP_R. Note that if we want to get a globally minimal point we have to compare it with all other candidates. Thus, this part corresponds to checking the validity of a formula containing a sequence of O(n) many universal binary quantifiers. This implies BLAQIF ∈ DΣ_R².
In the Turing model the above structure of binary quantifiers still describes a Σ² procedure. The only point to check is that the intermediate computations can be performed in polynomial time with respect to the bit measure. This is
true for the arguments relying on [8] as well as for the proof of Theorem 8: no additional constants are introduced in these algorithms, and the construction of intermediate matrices and LP-subproblems is done by merely rearranging some of the input data.
Corollary 3. BLAQIF is not NP_R-hard under weak polynomial time reductions; it is not NP_R-hard under polynomial time reductions unless a non-zero multiple of RES_n can be computed non-uniformly in polynomial time on a Zariski-dense subset of H_0.
Theorem 9 also answers a question posed in [6], Chapter 19, concerning the complexity of the rational BLAQIF problem if the dimension n is fixed. Our result extends the one in [7].
Theorem 10. Let n ∈ N be fixed. The computational version of the BLAQIF problem for rational inputs and fixed dimension n is solvable in polynomial time in the Turing model.
We finally mention that results similar to Corollary 3 can be obtained for several versions of interval linear systems, see [6], which are known to be NP-complete in the Turing model. We postpone the discussion to the full version.
References
1. L. Blum, F. Cucker, M. Shub, S. Smale: Complexity and Real Computation. Springer, 1998.
2. F. Cucker, M. Shub, S. Smale: Complexity separations in Koiran's weak model. Theoretical Computer Science, 133, 3–14, 1994.
3. E. Grädel, K. Meer: Descriptive Complexity Theory over the Real Numbers. In: Lectures in Applied Mathematics, J. Renegar, M. Shub, S. Smale (eds.), 32, 381–403, 1996.
4. S.Å. Gustafson, K.O. Kortanek: Semi-infinite programming and applications. In: Mathematical Programming: The State of the Art, A. Bachem, M. Grötschel, B. Korte (eds.), Springer, 132–157, 1983.
5. P. Koiran: A weak version of the Blum-Shub-Smale model. In: 34th Annual IEEE Symposium on Foundations of Computer Science, 486–495, 1993.
6. V. Kreinovich, A.V. Lakeyev, J. Rohn, P. Kahl: Computational Complexity and Feasibility of Data Processing and Interval Computations. Kluwer, 1997.
7. M. Koshelev, L. Longpré, P. Taillibert: Optimal Enclosure of Quadratic Interval Functions. Reliable Computing 4, 351–360, 1998.
8. K. Meer: On the complexity of quadratic programming in real number models of computation. Theoretical Computer Science 133, 85–94, 1994.
9. D.A. Plaisted: New NP-hard and NP-complete polynomial and integer divisibility problems. Theoretical Computer Science 31, 125–138, 1984.
10. M. Shub: Some remarks on Bezout's theorem and complexity theory. In: M. Hirsch, J. Marsden, M. Shub (eds.), From Topology to Computation: Proceedings of the Smalefest, Springer, 443–455, 1993.
11. B. Sturmfels: Introduction to resultants. In: Applications of Computational Algebraic Geometry, D.A. Cox, B. Sturmfels (eds.), Proc. of Symposia in Applied Mathematics, Vol. 53, AMS, 25–39, 1998.
An Abduction-Based Method for Index Relaxation in Taxonomy-Based Sources Carlo Meghini1 , Yannis Tzitzikas1, , and Nicolas Spyratos2 1
Consiglio Nazionale delle Ricerche, Istituto della Scienza e delle Tecnologie della Informazione, Pisa, Italy
2 Laboratoire de Recherche en Informatique, Université de Paris-Sud, France
Abstract. The extraction of information from a source containing term-classified objects is plagued with uncertainty. In the present paper we deal with this uncertainty in a qualitative way. We view an information source as an agent, operating according to an open world philosophy. The agent knows some facts, but is aware that there could be other facts, compatible with the known ones, that might hold as well, although they are not captured for lack of knowledge. These facts are, indeed, possibilities. We view possibilities as explanations and resort to abduction in order to define precisely the possibilities that we want our system to be able to handle. We introduce an operation that extends a taxonomy-based source with possibilities, and then study the properties of this operation from a mathematical point of view.
1
Introduction
Taxonomies are probably the oldest conceptual modeling tool. Nevertheless, they remain a powerful tool, still used for indexing books in libraries by terms, as well as very large collections of heterogeneous objects (e.g. see [8]) and the Web (e.g. Yahoo!, Open Directory). The extraction of information from an information source (hereafter, IS) containing term-classified objects is plagued with uncertainty. On the one hand, the indexing of objects, that is the assignment of a set of terms to each object, presents many difficulties, whether it is performed manually by some expert or automatically by a computer programme. In the former case, subjectivity may play a negative role (e.g. see [10]); in the latter case, automatic classification methods may at best produce approximations. On the other hand, the query formulation process, being linguistic in nature, would require perfect attuning of the system and the user language, an assumption that simply does not hold in open settings such as the Web. A collection of textual documents accessed by users via natural language queries is clearly a kind of IS, where documents play the role of objects and words play the role of terms. In this context, the above mentioned uncertainty is
This work has been carried out while Dr. Tzitzikas was a visiting researcher at CNR-ISTI as an ERCIM fellow. Our thanks to ERCIM.
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 592–601, 2003.
© Springer-Verlag Berlin Heidelberg 2003
typically dealt with in a quantitative way, i.e. by means of numerical methods: in a document index, each term is assigned a weight, expressing the extent to which the document is deemed to be about the term. The same treatment is applied to each user query, producing an index of the query which is a formal representation of the user information need of the same kind as that of each document. Document and query term indexes are then matched against each other in order to estimate the relevance of the document to a query (e.g. see [1]). In the present study, we take a different approach, and deal with uncertainty in a qualitative way. We view an IS as an agent, operating according to an open world philosophy. The agent knows some facts, but it does not interpret these facts as the only ones that hold; the agent is somewhat aware that there could be other facts, compatible with the known ones, that might hold as well, although they are not captured for lack of knowledge. These facts are, indeed, possibilities. One way of defining the notion of possibility precisely in logical terms is to equate it with the notion of explanation. That is, the set of terms associated to an object is viewed as a manifestation of a phenomenon, the indexing process, for which we wish to find an explanation, justifying why the index itself has come to be the way it is. In logic, the reasoning required to infer explanations from given theory and observations is known as abduction. We will therefore resort to abduction in order to define precisely the possibilities that we want our system to be able to handle. In particular, we will define an operation that extends an IS by adding to it a set of (term, object) pairs capturing the sought possibilities, and then study the properties of this operation from a mathematical point of view. The introduced operation can also be used for ordering query answers using a possibility-based measure of relevance. The paper is structured as follows.
Sections 2 and 3 provide the basis of our framework, introducing ISs and querying. Section 4 introduces extended ISs and Section 5 discusses query answering in such sources. Subsequently, Section 6 generalizes extended ISs and introduces iterative extensions of ISs. Finally, Section 7 concludes the paper. For reasons of space, proofs are just sketched.
2
Information Sources
An IS consists of two elements. The first one is a taxonomy, introduced next.
Definition 1: A taxonomy is a pair O = (T, K) where T is a finite set of symbols, called the terms of the taxonomy, and K is a finite set of conditionals on T, i.e. formulae of the form p → q where p and q are terms; K is called the knowledge base of the taxonomy. The knowledge graph of O is the directed graph GO = (T, L), such that (t, t′) ∈ L iff t → t′ is in K. □
The second element of an IS is a structure, in the logical sense of the term.
Definition 2: Given a taxonomy O = (T, K), a structure on O is a pair U = (Obj, I) where: Obj is a countable set of objects, called the domain of the structure, and I is a finite relation from T to Obj, that is I ⊆ T × Obj, called the interpretation of the structure. □
As customary, we will treat the relation I as a function from terms to sets of objects and, where t is a term in T, write I(t) to denote the extension of t, i.e. I(t) = {o ∈ Obj | (t, o) ∈ I}.
Definition 3: An information source (IS) S is a pair S = (O, U) where O is a taxonomy and U is a structure on O. □
It is not difficult to see the strict correspondence between the notion of IS and that of a restricted monadic predicate calculus: the taxonomy plays the role of the theory, by providing the predicate symbols (the terms) and a set of axioms (the knowledge base); the structure plays the basic semantical role, by providing a domain of interpretation and an extension for each term. These kinds of systems have also been studied in the context of description logics [3], where terms are called concepts and axioms are called terminological axioms. For the present study, we will mostly focus on the information relative to single objects, which takes the form of a propositional theory, introduced by the next Definition.
Definition 4: Given an IS S and an object o ∈ Obj, the index of o in S, indS(o), is the set of terms in whose extension o belongs according to the structure S, formally: indS(o) = {t ∈ T | (t, o) ∈ I}. The context of o in S, CS(o), is defined as: CS(o) = indS(o) ∪ K. □
For any object o, CS(o) consists of terms and simple conditionals that collectively form all the knowledge about o that S has. Viewing the terms as propositional variables makes object contexts propositional theories. This is the view that will be adopted in this study.
Example 1: Throughout the paper, we will use as an example the IS graphically illustrated in Figure 1, given by (the abbreviations introduced in Figure 1 are used for reasons of space): T = {⊤, C, SC, MPC, UD, R, M, UMC}, K = {C → ⊤, SC → C, MPC → C, UD → ⊤, R → SC, M → SC, UMC → MPC, UMC → UD}, and U is the structure given by: Obj = {1, 2} and I = {(SC, 1), (M, 2), (MPC, 2)}.
The index of object 2 in S, indS(2), is {M, MPC}, while the context of 2 in S is CS(2) = indS(2) ∪ K. Notice that the taxonomy of the example has a maximal element, ⊤, whose existence is not required in every taxonomy. □
Given a set of propositional variables P, a truth assignment for P is a function mapping P to the set of standard truth values, denoted by T and F, respectively [5]. A truth assignment V satisfies a sentence σ, V |= σ, if σ is true in V, according to the truth valuation rules of predicate calculus (PC). A set of sentences Σ logically implies the sentence α, Σ |= α, iff every truth assignment which satisfies every sentence in Σ also satisfies α. In the following, we will be interested in deciding whether a certain conditional is logically implied by a knowledge base.
Proposition 1: Given a taxonomy O = (T, K) and any two terms p, q in T, K |= p → q iff there is a path from p to q in GO. □
From a complexity point of view, the last Proposition reduces logical implication of a conditional to the well-known problem on graphs REACHABILITY, which has been shown to have time complexity equal to O(n^2), where n is the
An Abduction-Based Method for Index Relaxation
595
[Figure: the knowledge graph of the example source, over the terms Cameras (C), StillCameras (SC), Reflex (R), Miniatures (M), MovingPictureCams (MPC), UnderwaterDevices (UD) and UnderwaterMovingCams (UMC); object 1 is indexed under SC, and object 2 under M and MPC.]
Fig. 1. A source
number of nodes of the graph [7]. Consequently, for any two terms p, q in T, K |= p → q can be decided in time O(|T|^2).
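Proposition 1 makes logical implication directly computable by graph search. As a minimal illustration (Python; the function name implies, the encoding of K as a set of (p, q) pairs, and the token "TOP" for the maximal element ⊤ are our own choices, not the paper's), K |= p → q can be decided as follows:

```python
from collections import deque

def implies(K, p, q):
    """Decide K |= p -> q by breadth-first search in the knowledge graph
    G_O (Proposition 1): the implication holds iff q is reachable from p."""
    seen, frontier = {p}, deque([p])
    while frontier:
        u = frontier.popleft()
        if u == q:
            return True
        for a, b in K:            # each conditional a -> b is an edge (a, b)
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

# The knowledge base of Example 1, with "TOP" standing for ⊤:
K = {("C", "TOP"), ("SC", "C"), ("MPC", "C"), ("UD", "TOP"),
     ("R", "SC"), ("M", "SC"), ("UMC", "MPC"), ("UMC", "UD")}
print(implies(K, "R", "C"))    # True: R -> SC -> C
print(implies(K, "C", "R"))    # False: no path from C to R
```

Since the graph has |T| nodes and at most |K| edges, the search indeed runs within the O(|T|^2) bound stated above.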
3
Querying Information Sources
We next introduce the query language for extracting information from an IS in the traditional question-answering way.
Definition 5: Given a taxonomy O = (T, K), the query language for O, LO, is defined by the following grammar, where t is a term in T:
q ::= t | q ∧ q | q ∨ q | ¬q | (q) □
The answer to queries is defined in logical terms by taking a model-theoretic approach, compliant with the fact that the semantical notion of structure is used to model the extensional data of an IS. To this end, we next select, amongst the models of object contexts, the one realizing a closed-world reading of an IS, whose existence and uniqueness trivially follow from the next Definition.
Definition 6: Given an IS S, for every object o ∈ Obj, the truth model of o in S, Vo,S, is the truth assignment for T defined as follows, for each term t ∈ T: Vo,S(t) = T if CS(o) |= t, and Vo,S(t) = F otherwise.
Given a query ϕ in LO, the answer of ϕ in S is the set of objects whose truth model satisfies the query: ans(ϕ, S) = {o ∈ Obj | Vo,S |= ϕ}. □
In the Boolean model of information retrieval, a document is returned in response to a query if the index of the document satisfies the query. Thus, the above definition extends Boolean retrieval by considering also the knowledge base in the retrieval process.
Example 2: The answer to the query C in the IS introduced in Example 1, ans(C, S), consists of both object 1 (since {SC, SC → C} ⊆ CS(1), hence V1,S(C) = T) and object 2 (since {MPC, MPC → C} ⊆ CS(2), hence V2,S(C) = T). □
The next definition introduces the function αS, which, along with Proposition 1, provides a mechanism for the computation of answers.
Definition 7: Given an IS S, the solver of S, αS, is the total function from queries to sets of objects, αS : LO → P(Obj), defined as follows:
αS(t) = ∪{I(u) | K |= u → t}
αS(q ∧ q′) = αS(q) ∩ αS(q′), αS(q ∨ q′) = αS(q) ∪ αS(q′), and αS(¬q) = Obj \ αS(q). □
As intuition suggests, solvers capture sound and complete query answerers.
Proposition 2: For all ISs S and queries ϕ ∈ LO, ans(ϕ, S) = αS(ϕ). □
We shall also use I⁻ to denote the restriction of αS to T, i.e. I⁻ = αS|T.
Example 3: In the IS previously introduced, the term C can be reached in the knowledge graph from each of the following terms: C, SC, MPC, R, M, and UMC. Hence: ans(C, S) = αS(C) = I(C) ∪ I(SC) ∪ I(MPC) ∪ I(R) ∪ I(M) ∪ I(UMC) = {1, 2}. Likewise, it can be verified that ans(M, S) = {2} and ans(UMC, S) = ∅. □
In the worst case, answering a query requires (a) visiting the whole knowledge graph for each term of the query and (b) combining the sets of objects so obtained via the union, intersection and difference set operators. Since the time complexity of each such operation is polynomial in the size of the input, the time complexity of query answering is polynomial.
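Definition 7 translates almost literally into a recursive evaluator. The sketch below (Python; the names solver and implies and the nested-tuple encoding of queries are ours, and "TOP" again stands for ⊤) evaluates queries against the source of Example 1:

```python
from collections import deque

def implies(K, p, q):
    """K |= p -> q iff q is reachable from p in the knowledge graph."""
    seen, frontier = {p}, deque([p])
    while frontier:
        u = frontier.popleft()
        if u == q:
            return True
        for a, b in K:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

def solver(q, T, K, I, Obj):
    """The solver α_S of Definition 7. A query is a term (a string) or a
    tuple ('and', q1, q2), ('or', q1, q2), ('not', q1)."""
    if isinstance(q, str):
        # α_S(t) = ∪ {I(u) | K |= u -> t}
        return set().union(*(I.get(u, set()) for u in T if implies(K, u, q)))
    if q[0] == "and":
        return solver(q[1], T, K, I, Obj) & solver(q[2], T, K, I, Obj)
    if q[0] == "or":
        return solver(q[1], T, K, I, Obj) | solver(q[2], T, K, I, Obj)
    return Obj - solver(q[1], T, K, I, Obj)   # 'not'

# The source of Example 1:
T = {"TOP", "C", "SC", "MPC", "UD", "R", "M", "UMC"}
K = {("C", "TOP"), ("SC", "C"), ("MPC", "C"), ("UD", "TOP"),
     ("R", "SC"), ("M", "SC"), ("UMC", "MPC"), ("UMC", "UD")}
I = {"SC": {1}, "M": {2}, "MPC": {2}}
Obj = {1, 2}
print(solver("C", T, K, I, Obj))     # {1, 2}, as in Example 3
print(solver("UMC", T, K, I, Obj))   # set(), as in Example 3
```

The four clauses of Definition 7 become the four branches of the function, so soundness and completeness (Proposition 2) carry over directly from the definition.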
4
Extended Information Sources
Let us suppose that a user has issued a query against an IS and that the answer does not contain objects that are relevant to the user information need. The user may not be willing to replace the current query with another one, for instance because of lack of knowledge of the available language or taxonomy. In this type of situation, both database and information retrieval (IR) systems offer practically no support. If the IS does indeed contain relevant objects, the reason for the user's disappointment is an indexing mismatch: the objects have been indexed in a way that is different from the way the user would expect. One way of handling the problem just described would be to consider the index of an IS not as the ultimate truth about how the world is and is not, but as a flexible repository of information, which may be interpreted in a more liberal or more conservative way, depending on the context. For instance, the above examples suggest that a more liberal view of the IS, in which the camera in question is indexed under the term M, could help the user in getting out of the impasse. One way of defining the discussed extension precisely in logical terms is to equate it with the notion of explanation. That is, we view the index of an object as a manifestation, or observation, of a phenomenon, the indexing process, for which we wish to find an explanation, justifying why the index itself has come to be as it is. In logic, the reasoning required to infer explanations from given theory and observations is known as abduction. The model of abduction that we adopt is the one presented in [4]. Let LV be the language of propositional logic over an alphabet V of propositional variables,
An Abduction-Based Method for Index Relaxation
597
with syntactic operators ∧, ∨, ¬, →, ⊤ (a constant for truth) and ⊥ (falsity). A propositional abduction problem is a tuple A = ⟨V, H, M, Th⟩, where V is a finite set of propositional variables, H ⊆ V is the set of hypotheses, M ⊆ V is the set of manifestations, and Th ⊆ LV is a consistent theory. S ⊆ H is a solution (or explanation) for A iff Th ∪ S is consistent and Th ∪ S |= M. Sol(A) denotes the set of the solutions to A. In the context of an IS S, the terms in the taxonomy of S play both the role of the propositional variables V and of the hypotheses H, as there is no reason to exclude a priori any term from an explanation; the knowledge base in the taxonomy of S plays the role of the theory Th; the role of manifestation, for a fixed object, is played by the index of the object. Consequently, we have the following
Definition 8: Given an IS S and object o ∈ Obj, the propositional abduction problem for o in S, AS(o), is the propositional abduction problem AS(o) = ⟨T, T, indS(o), K⟩. The solutions to AS(o) are given by: Sol(AS(o)) = {A ⊆ T | K ∪ A |= indS(o)}, where the consistency requirement on K ∪ A has been omitted since for no knowledge base K and set of terms A can K ∪ A be inconsistent. □
Usually, certain explanations are preferable to others, a fact that is formalized in [4] by defining a preference relation ⪯ over Sol(A). Letting a ≺ b stand for a ⪯ b and b ⋠ a, the set of preferred solutions is given by: Sol⪯(A) = {S ∈ Sol(A) | there is no S′ ∈ Sol(A) such that S′ ≺ S}.
In the present context, we require the preference relation to satisfy the following criteria, reflecting the application priorities in order of decreasing priority: (1) explanations including only terms in the manifestation are less preferable than explanations including also terms not in the manifestation; (2) explanations altering the behaviour of the IS to a minimal extent are to be preferred; (3) between two explanations that alter the behaviour of the IS equally, the simpler, that is the smaller, one is to be preferred. Without the first criterion, all minimal solutions would be found amongst the subsets of M, a clearly undesirable effect, at least as long as alternative explanations are possible. In order to formalize our intended preference relation, we start by defining perturbation.
Definition 9: Given an IS S, an object o ∈ Obj and a set of terms A ⊆ T, the perturbation of A on S with respect to o, p(S, o, A), is given by the number of additional terms in whose extension o belongs once the index of o is extended with the terms in A. Formally: p(S, o, A) = |{t ∈ T | (CS(o) ∪ A) |= t and CS(o) ⊭ t}|. □
As a consequence of the monotonicity of the PC, for all ISs S, objects o ∈ Obj and sets of terms A ⊆ T, p(S, o, A) ≥ 0. In particular, p(S, o, A) = 0 iff A ⊆ indS(o). We can now define the preference relation over solutions of the above stated abduction problem.
Definition 10: Given an IS S, an object o ∈ Obj and two solutions A and A′ to the problem AS(o), A ⪯ A′ if either of the following holds:
1. p(S, o, A′) = 0;
2. 0 < p(S, o, A) < p(S, o, A′);
3. 0 < p(S, o, A) = p(S, o, A′), and A ⊆ A′. □
In order to derive the set Sol⪯(AS(o)), we introduce the following notions.
Definition 11: Given an IS S and an object o ∈ Obj, the depth of Sol(AS(o)), do, is the maximum perturbation of the solutions to AS(o), that is: do = max{p(S, o, A) | A ∈ Sol(AS(o))}. Moreover, two solutions A and A′ are equivalent, A ≡ A′, iff they have the same perturbation, that is p(S, o, A) = p(S, o, A′). □
It can be readily verified that ≡ is an equivalence relation over Sol(AS(o)), determining the partition π≡ whose elements are the sets of solutions having the same perturbation. Letting Pi stand for the solutions having perturbation i, Pi = {A ∈ Sol(AS(o)) | p(S, o, A) = i}, it turns out that π≡ includes one element for each perturbation value between 0 and do, as the following Proposition states.
Proposition 3: For all ISs S and objects o ∈ Obj, π≡ = {Pi | 0 ≤ i ≤ do}.
In order to prove the Proposition, it must be shown that {Pi | 0 ≤ i ≤ do} is indeed a partition, that is: (1) Pi ≠ ∅ for each 0 ≤ i ≤ do; (2) Pi ∩ Pj = ∅ for 0 ≤ i, j ≤ do, i ≠ j; (3) ∪{Pi | 0 ≤ i ≤ do} = Sol(AS(o)). Items 2 and 3 above are easily established. Item 1 is trivial for do = 0. For do > 0, item 1 can be established by backward induction on i: the basis step, Pdo ≠ ∅, is true by definition. The inductive step, Pk ≠ ∅ for k > 0 implies Pk−1 ≠ ∅, can be proved by constructing a solution having perturbation k − 1 from a solution with perturbation k. Finally, it trivially follows that this partition is the one induced by the ≡ relation. □
We are now in the position of deriving Sol⪯(AS(o)).
Proposition 4: For all ISs S and objects o ∈ Obj, Sol⪯(AS(o)) = P0 if do = 0, and Sol⪯(AS(o)) = {A ∈ P1 | for no A′ ∈ P1, A′ ⊂ A} if do > 0.
This proposition is just a corollary of the previous one. Indeed, if do is 0, by Proposition 3, Sol(AS(o)) = P0 and by Definition 10, all elements in Sol(AS(o)) are minimal.
If, on the other hand, do is positive, then by criterion (1) of Definition 10, all solutions with non-zero perturbation are preferable to those in P0, and not vice versa; and by criterion (2) of Definition 10, all solutions with perturbation equal to 1 are preferable to the remaining ones, and not vice versa. Hence, for a positive do, minimal solutions are to be found in P1. Finally, by considering the containment criterion set by item (3) of Definition 10, the Proposition follows.
Example 4: Let us consider again the IS S introduced in Example 1, and the problem AS(1). The manifestation is given by {SC}. Letting B stand for the set {UMC, MPC, UD, ⊤, C}, it can be verified that: Sol(AS(1)) = P(T) \ P(B), as B includes all the terms in T not implying SC. Since do = 5, minimal solutions are to be found in the set P1. By considering all sets of terms in Sol(AS(1)), it
can be verified that: P1 = {{M} ∪ A | A ∈ P({SC, C, ⊤})} ∪ {{R} ∪ A | A ∈ P({SC, C, ⊤})} ∪ {{SC, UD} ∪ A | A ∈ P({⊤, C})} ∪ {{SC, MPC} ∪ A | A ∈ P({⊤, C})}. By applying the set containment criterion, we have: Sol⪯(AS(1)) = {{M}, {R}, {SC, UD}, {SC, MPC}}. Analogously, it can be verified that: Sol⪯(AS(2)) = {{M, MPC, UD}, {R, M, MPC}}. □
We now introduce the notion of extension of an IS. The idea is that an extended IS (EIS for short) adds to the original IS all and only the indexing information captured by the abduction process illustrated in the previous Section. In order to maximize the extension, all the minimal solutions are included in the EIS.
Definition 12: Given an IS S and an object o ∈ Obj, the abduced index of o, abindS(o), is given by: abindS(o) = ∪Sol⪯(AS(o)). The abduced interpretation of S, I+, is given by I+ = I ∪ {(t, o) ∈ T × Obj | t ∈ abindS(o)}. Finally, the extended IS, Se, is given by Se = (O, Ue) where Ue = (Obj, I+). □
Example 5: From the last Example, it follows that the extended S is given by Se = (O, Ue), Ue = (Obj, I+) where: abindS(1) = {SC, M, R, UD, MPC}, abindS(2) = {M, MPC, UD, R} and I+ = {(SC, 1), (M, 1), (R, 1), (UD, 1), (MPC, 1), (M, 2), (MPC, 2), (UD, 2), (R, 2)}. □
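For a taxonomy as small as that of Example 1, the preferred solutions can be checked by exhaustive search over all subsets of T. The Python sketch below (our names reach, closure, preferred_solutions; it also takes ⊆-minimal solutions in the degenerate case do = 0, which is the reading used later in Example 8, and "TOP" stands for ⊤) reproduces Sol⪯(AS(1)) of Example 4:

```python
from itertools import chain, combinations

def reach(K, p, q):
    """K |= p -> q via reachability in the knowledge graph (Proposition 1)."""
    seen, stack = {p}, [p]
    while stack:
        x = stack.pop()
        if x == q:
            return True
        for a, b in K:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return False

def closure(K, A, T):
    # {t in T | K ∪ A |= t}
    return {t for t in T if any(reach(K, u, t) for u in A)}

def preferred_solutions(T, K, ind):
    """Brute-force Sol(A_S(o)) and its preferred solutions (Proposition 4):
    the ⊆-minimal solutions of perturbation 1, or of perturbation 0
    when the depth d_o is 0."""
    subsets = [set(c) for c in chain.from_iterable(
        combinations(sorted(T), r) for r in range(len(T) + 1))]
    sols = [A for A in subsets if ind <= closure(K, A, T)]
    base = closure(K, ind, T)
    pert = lambda A: len(closure(K, ind | A, T) - base)
    pool = [A for A in sols if pert(A) == 1] or [A for A in sols if pert(A) == 0]
    return [A for A in pool if not any(B < A for B in pool)]  # ⊆-minimal

T = {"TOP", "C", "SC", "MPC", "UD", "R", "M", "UMC"}   # "TOP" stands for ⊤
K = {("C", "TOP"), ("SC", "C"), ("MPC", "C"), ("UD", "TOP"),
     ("R", "SC"), ("M", "SC"), ("UMC", "MPC"), ("UMC", "UD")}
print(sorted(map(sorted, preferred_solutions(T, K, {"SC"}))))
# the four preferred explanations of Example 4:
# [['M'], ['MPC', 'SC'], ['R'], ['SC', 'UD']]
```

Enumerating the powerset is exponential in |T|, so this is only meant to replay small examples, not as a practical algorithm.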
5
Querying Extended Information Sources
As anticipated in Section 4, EISs are meant to be used in order to obtain more results about an already stated query, without posing a new query to the underlying information system. The following Example illustrates the case in point.
Example 6: The answer to the query M in the extended IS derived in the last Example, ans(M, Se), consists of both object 1 (since M ∈ abindS(1), hence M ∈ CSe(1)) and object 2 (since (M, 2) ∈ I, hence (M, 2) ∈ I+). Notice that 1 is not returned when M is stated against S, i.e. ans(M, S) ⊂ ans(M, Se). Instead, ans(UMC, S) = ans(UMC, Se) = ∅. □
It turns out that queries stated against an EIS can be answered without actually computing the whole EIS. In order to derive an answering procedure for queries posed against an EIS, we introduce a recursive function on the IS query language LO, in the same style as the algorithm for querying ISs presented in Section 3.
Definition 13: Given an IS S, the extended solver of S, αSe, is the total function from queries to sets of objects, αSe : LO → P(Obj), defined as follows:
αSe(t) = ∩{αS(u) | t → u ∈ K and K ⊭ u → t}
αSe(q ∧ q′) = αSe(q) ∩ αSe(q′)
αSe(q ∨ q′) = αSe(q) ∪ αSe(q′)
αSe(¬q) = Obj \ αSe(q)
where αS is the solver of S.
□
Note that since ⊤ is the maximal element, the set {αS(u) | ⊤ → u ∈ K and K ⊭ u → ⊤} is empty. This means that αSe(⊤), i.e. ∩{αS(u) | ⊤ → u ∈ K and K ⊭ u → ⊤}, is actually the intersection of an empty family of subsets of Obj. However, according to the Zermelo axioms of set theory (see [2] for an overview), the intersection of an empty family of subsets of a universe equals the universe. In our case, the universe is the set of all objects known to the source, i.e. the set Obj, thus we conclude that αSe(⊤) = Obj. The same holds for each maximal element (if the taxonomy has more than one maximal element). □
Proposition 5: For all ISs S and queries ϕ ∈ LO, ans(ϕ, Se) = αSe(ϕ). □
Example 7: By applying the last Proposition, we have: ans(M, Se) = αSe(M) = αS(SC) = I(SC) ∪ I(R) ∪ I(M) = {1, 2}. □
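Definition 13, together with the empty-intersection convention just discussed, can be sketched as follows (Python; the names alpha and alpha_ext are ours, "TOP" stands for ⊤, and alpha is the plain solver of Definition 7 restricted to terms):

```python
def reach(K, p, q):
    """K |= p -> q iff q is reachable from p in the knowledge graph."""
    seen, stack = {p}, [p]
    while stack:
        x = stack.pop()
        if x == q:
            return True
        for a, b in K:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return False

def alpha(t, T, K, I):
    # plain solver on a term: α_S(t) = ∪ {I(u) | K |= u -> t}
    return set().union(*(I.get(u, set()) for u in T if reach(K, u, t)))

def alpha_ext(t, T, K, I, Obj):
    """Extended solver on a term (Definition 13): intersect α_S(u) over the
    broader terms u with t -> u in K and K not entailing u -> t; the
    intersection of an empty family is the whole domain Obj."""
    parents = [u for a, u in K if a == t and not reach(K, u, t)]
    out = set(Obj)
    for u in parents:
        out &= alpha(u, T, K, I)
    return out

T = {"TOP", "C", "SC", "MPC", "UD", "R", "M", "UMC"}
K = {("C", "TOP"), ("SC", "C"), ("MPC", "C"), ("UD", "TOP"),
     ("R", "SC"), ("M", "SC"), ("UMC", "MPC"), ("UMC", "UD")}
I = {"SC": {1}, "M": {2}, "MPC": {2}}
Obj = {1, 2}
print(alpha_ext("M", T, K, I, Obj))     # {1, 2} = α_S(SC), as in Example 7
print(alpha_ext("TOP", T, K, I, Obj))   # {1, 2}: no parents, so the whole domain
print(alpha_ext("UMC", T, K, I, Obj))   # set(), matching ans(UMC, S^e) = ∅
```

Starting out from Obj and intersecting makes the empty-family convention fall out for free: when a term has no strictly broader parent, the loop body never runs and the whole domain is returned.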
6
Iterative Extension of Information Sources
Intuitively, we would expect ·+ to be a function which, applied to an IS interpretation, produces a new interpretation that is equal to or larger than the original one, the former case corresponding to the situation in which the knowledge base of the IS does not make it possible to find any explanation for any object index. Technically, this amounts to saying that ·+ is a monotonic function, which is in fact the case. Then, by iterating the ·+ operator, we expect to move from an interpretation to a larger one, until an interpretation is reached which cannot be extended any more. Also this turns out to be true, and in order to show it, we will model the domain of the ·+ operator as a complete partial order, and use the notion of fixed point in order to capture interpretations that are no longer extensible.
Proposition 6: Given an IS S, the domain of S is the set D given by D = {I ∪ A | A ∈ P(T × Obj)}. Then, ·+ is a continuous function on the complete partial order (D, ⊆).
The proof that (D, ⊆) is a complete partial order is trivial. The continuity of ·+ follows from its monotonicity (also a simple fact to show) and the fact that in the considered complete partial order all chains are finite, hence the class of monotonic functions coincides with the class of continuous functions [6]. □
As a corollary of the previous Proposition and of the Knaster-Tarski fixed point theorem, we have:
Proposition 7: The function ·+ has a least fixed point that is the least upper bound of the chain {I, I+, (I+)+, . . .}. □
Example 8: Let R be the EIS derived in the last Example, i.e. R = Se, and let us consider the problem AR(1), for which the manifestation is given by the set abindS(1) above.
It can be verified that Sol(AR(1)) = P0 ∪ P1, where:
P0 = {{R, M, MPC, UD} ∪ A | A ∈ P({SC, C, ⊤})}
P1 = {{R, M, UMC} ∪ A | A ∈ P({SC, C, ⊤, MPC, UD})}
Therefore: Sol⪯(AR(1)) = {{R, M, UMC}}, from which we obtain: abindR(1) = {R, M, UMC}, which means that the index of object 1 in R has been extended with the term UMC. If we now set P = Re, and consider the problem AP(1), we find
Sol(AP(1)) = P0 = {{R, M, UMC} ∪ A | A ∈ P({SC, MPC, UD, C, ⊤})}. Consequently, Sol⪯(AP(1)) = {{R, M, UMC}} and abindP(1) ⊆ indP(1). Analogously, we have abindR(2) = indR(2) ∪ {UMC} and abindP(2) ⊆ indP(2). Thus, since ((I+)+)+ = (I+)+, (I+)+ is a fixed point, which means that P is no longer extensible. Notice that ∅ = ans(UMC, S) = ans(UMC, R) ⊂ ans(UMC, P) = {1, 2}. □
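The iteration of ·+ up to its least fixed point can be replayed on the running example by brute force. In the Python sketch below (our names; interpretations are represented as sets of (term, object) pairs, "TOP" stands for ⊤, and preferred solutions are taken ⊆-minimal also when the maximal perturbation is zero, matching the reading of Example 8), the loop stops exactly when I+ = I:

```python
from itertools import chain, combinations

def reach(K, p, q):
    seen, stack = {p}, [p]
    while stack:
        x = stack.pop()
        if x == q:
            return True
        for a, b in K:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return False

def closure(K, A, T):
    # {t in T | K ∪ A |= t}
    return {t for t in T if any(reach(K, u, t) for u in A)}

def abduced_index(T, K, ind):
    """∪ Sol≺(A_S(o)) by exhaustive search (Definitions 8-12)."""
    subsets = [set(c) for c in chain.from_iterable(
        combinations(sorted(T), r) for r in range(len(T) + 1))]
    sols = [A for A in subsets if ind <= closure(K, A, T)]
    base = closure(K, ind, T)
    pert = lambda A: len(closure(K, ind | A, T) - base)
    pool = [A for A in sols if pert(A) == 1] or [A for A in sols if pert(A) == 0]
    best = [A for A in pool if not any(B < A for B in pool)]  # ⊆-minimal
    return set().union(set(), *best)

def extend(T, K, I, Obj):
    # the operation I ↦ I⁺ of Definition 12 on (term, object) pairs
    return I | {(t, o) for o in Obj
                for t in abduced_index(T, K, {u for (u, x) in I if x == o})}

T = {"TOP", "C", "SC", "MPC", "UD", "R", "M", "UMC"}
K = {("C", "TOP"), ("SC", "C"), ("MPC", "C"), ("UD", "TOP"),
     ("R", "SC"), ("M", "SC"), ("UMC", "MPC"), ("UMC", "UD")}
Obj = {1, 2}
I = {("SC", 1), ("M", 2), ("MPC", 2)}
while True:                     # iterate ·⁺ until the least fixed point
    J = extend(T, K, I, Obj)
    if J == I:
        break
    I = J
print(("UMC", 1) in I, ("UMC", 2) in I)   # True True
```

On this input the loop stabilizes after two proper extensions, and both objects end up indexed under UMC, consistent with ans(UMC, P) = {1, 2} in Example 8.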
7
Conclusion and Future Work
To alleviate the problem of indexing uncertainty we have proposed a mechanism which allows relaxing the index of a source in a gradual manner. This mechanism is governed by the notion of explanation, logically captured by abduction. The proposed method can be implemented as an answer enlargement¹ process where the user is not required to give additional input, apart from expressing his/her desire for more objects. Another interesting remark is that the abduced extension operation can be applied not only to manually constructed taxonomies but also to taxonomies derived automatically on the basis of an inference service. For instance, it can be applied on sources indexed using taxonomies of compound terms which are defined algebraically [9]. The introduced framework can also be applied for ranking the objects of an answer according to an explanation-based measure of relevance. In particular, we can define the rank of an object o as follows: rank(o) = min{k | o ∈ αS(k)e(ϕ)}, where S(k) denotes the IS obtained from S by k applications of the extension operation.
References
1. R. Baeza-Yates and B. Ribeiro-Neto. "Modern Information Retrieval". ACM Press, Addison-Wesley, 1999.
2. George Boolos. "Logic, Logic and Logic". Harvard University Press, 1998.
3. F.M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. Reasoning in description logics. In G. Brewka, editor, Principles of Knowledge Representation, Studies in Logic, Language and Information, pages 193–238. CSLI Publications, 1996.
4. T. Eiter and G. Gottlob. The complexity of logic-based abduction. Journal of the ACM, 42(1):3–42, January 1995.
5. H.B. Enderton. A mathematical introduction to logic. Academic Press, N.Y., 1972.
6. P.A. Fejer and D.A. Simovici. Mathematical Foundations of Computer Science. Volume 1: Sets, Relations, and Induction. Springer-Verlag, 1991.
7. C.H. Papadimitriou. Computational complexity. Addison-Wesley, 1994.
8. Giovanni M. Sacco. "Dynamic Taxonomies: A Model for Large Information Bases". IEEE Transactions on Knowledge and Data Engineering, 12(3), May 2000.
9. Y. Tzitzikas, A. Analyti, N. Spyratos, and P. Constantopoulos. "An Algebra for Specifying Compound Terms for Faceted Taxonomies". In 13th European-Japanese Conf. on Information Modelling and Knowledge Bases, Kitakyushu, Japan, June 2003.
10. P. Zunde and M.E. Dexter. "Indexing Consistency and Quality". American Documentation, 20(3):259–267, July 1969.
¹ If the query contains negation then the answer can be reduced.
On Selection Functions that Do Not Preserve Normality
Wolfgang Merkle and Jan Reimann
Ruprecht-Karls-Universität Heidelberg, Mathematisches Institut, Im Neuenheimer Feld 294, D-69120 Heidelberg, Germany
{merkle,reimann}@math.uni-heidelberg.de
Abstract. The sequence selected from a sequence R(0)R(1) . . . by a language L is the subsequence of all bits R(n + 1) such that the prefix R(0) . . . R(n) is in L. By a result of Agafonoff [1], a sequence is normal if and only if any subsequence selected by a regular language is again normal. Kamae and Weiss [11] and others have raised the question of how complex a language must be such that selecting according to the language does not preserve normality. We show that there are such languages that are only slightly more complicated than regular ones, namely, normality is neither preserved by linear languages nor by deterministic one-counter languages. In fact, for both types of languages it is possible to select a constant sequence from a normal one.
1
Introduction
It is one of the fundamental beliefs about chance experiments that any infinite binary sequence obtained by independent tosses of a fair coin will, in the long run, produce any possible finite sequence with frequency 2^(-n), where n is the length of the finite sequence considered. Sequences of zeros and ones having this property are called normal. It is a basic result of probability theory that, with respect to the uniform Bernoulli measure, almost every sequence is normal. One may now pose the following problem: if we select from a normal sequence an infinite subsequence, under what selection mechanisms is the thereby obtained sequence again normal, i.e. which restrictions must and can one impose on the class of admissible selection rules to guarantee that normality is preserved? This problem originated in the work of von Mises (see for example [22]). His aim was to base a mathematical theory of probability on the primitive notion of a Kollektiv, an object having two distinguished properties. On the one hand, individual symbols possess an asymptotic frequency (as normal sequences do), which in turn allows one to assign probabilities. On the other hand, the limiting frequencies are preserved when a subsequence is selected from the original sequence. Of course, arbitrary selection rules, or place selection rules, as von Mises calls them, cannot be allowed in this context, since one might simply select all zeroes from a given sequence. Von Mises did not give a formal definition of an
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 602–611, 2003.
© Springer-Verlag Berlin Heidelberg 2003
admissible selection rule; however, he requires them to select a subsequence "independently of the result of the corresponding observation, i.e., before anything is known about this result." There have been various attempts to clarify and rigorously define what an admissible selection rule and hence a Kollektiv is. One approach allowed only rules that were in some sense effective, for instance, computable by a Turing machine. This effort was initiated by Church [9] and led to the study of effective stochastic sequences (see the survey by Uspensky, Semenov and Shen [20] for more on this). Knowing Champernowne's construction (see Champernowne's paper [8] and Section 2 below), normal numbers disqualify as stochastic sequences, as some of them are easy to describe by algorithms. On the other hand, from a purely measure-theoretic point of view, normal sequences seem to be good candidates for a Kollektiv, as they have the right limiting frequency of individual symbols. Furthermore, their dynamic behavior is as complex as possible: they are generic points in {0, 1}∞ with respect to a measure with highest possible entropy – the uniform (1/2, 1/2)-Bernoulli measure. (In Section 3 we will explain this further; also refer to Weiss [23].) So one might ask a question contrary to the problem set up by Church and others: which selection rules preserve normality, i.e. map normal sequences to normal ones? In particular, such rules will preserve the limiting frequency of zeroes and ones and hence satisfy von Mises' requirements for a Kollektiv. There are two kinds of selection rules that are commonly considered: oblivious ones, for which the decision of selecting a bit for the subsequence does not depend on the input sequence up to that bit (i.e., the places to be selected are fixed in advance), and those selection rules that depend on the input sequence. For oblivious selection rules, Kamae [10] found a necessary and sufficient condition for them to preserve normality.
For input-dependent rules, Agafonoff [1] obtained the result that if a sequence N is normal, then any infinite subsequence selected from N by a regular language L is again normal. Detailed proofs and further discussion can be found in Schnorr and Stimm [18] as well as in Kamae and Weiss [11]. (It is not hard to see that the reverse implication holds, too, as observed by Postnikova [17] and others [15,7]; hence the latter property of a sequence N is equivalent to N being normal.) It has been asked by Kamae and Weiss [11] whether Agafonoff's result can be extended to classes of languages that are more comprehensive than the class of regular languages, e.g., to the class of context-free languages (see also Li and Vitányi [13], p. 59, problem 1.9.7). In the sequel, we give a negative answer to this question for two classes of languages that are the least proper superclasses of the class of regular languages usually considered in the theory of formal languages. More precisely, Agafonoff's result can neither be extended to the class of linear languages nor to the class of languages that are recognized by a deterministic pushdown automaton with unary stack alphabet, known as deterministic one-counter languages. Recall that these two classes are incomparable and that the latter fact is witnessed, for example, by the languages used in the proofs of Propositions 10 and 11, i.e., the language of all words that contain as
Wolfgang Merkle and Jan Reimann
many 0's as 1's and the language of even-length palindromes. (For background on formal language theory we refer to the survey by Autebert, Berstel, and Boasson [5].) However, determining exactly the class of languages preserving normality remains an open problem.

The outline of the paper is as follows. In Section 2 we review the basic definitions related to normality and recap Champernowne's constructions of normal sequences. Section 3 discusses the two kinds of selection rules, oblivious and input-dependent ones. In Section 4, we show that normality is not preserved by selection rules defined by deterministic one-counter languages, while Section 5 is devoted to proving that normality is not preserved by linear languages.

Our notation is mostly standard; for unexplained terms and further details we refer to the textbooks and surveys cited in the bibliography [3,4,6,13,14,16]. Unless explicitly stated otherwise, sequences are always infinite and binary. A word is a finite sequence. For i = 0, 1, …, we write A(i) for bit i of a sequence A, hence A = A(0)A(1)…, and we proceed similarly for words. A word w is a prefix of a sequence A if A(i) = w(i) for i = 0, …, |w| − 1, where |w| is the length of w. The prefix of a sequence A of length m is denoted by A|m. The concatenation of two words v and w is denoted by vw. A word u is a subword of a word w if w = v_1 u v_2 for appropriate words v_1 and v_2.
2
Normal Sequences
For a start, we review the concept of a normal sequence and standard techniques for the construction of such sequences.

Definition 1. (i) For given words u and w, let occ_u(w) be the number of times that u appears as a subword of w, and let freq_u(w) = occ_u(w)/|w|.
(ii) A sequence N is normal if and only if for any word u

lim_{m→∞} freq_u(N|m) = 1/2^{|u|}.  (1)
Remark 2. A sequence N is normal if for any word u and any ε > 0, we have for all sufficiently large m,

freq_u(N|m) < 1/2^{|u|} + ε.  (2)

For a proof, it suffices to observe that for any given ε > 0 and for all sufficiently large m, inequality (2) holds with u replaced by any word v that has the same length as u, while the sum of the relative frequencies freq_v(N|m) over these 2^{|u|} words differs from 1 by less than ε; hence by (2), for all such m,

1 − ε ≤ Σ_{v : |v|=|u|} freq_v(N|m) ≤ freq_u(N|m) + (2^{|u|} − 1)·(1/2^{|u|} + ε),

and by rearranging terms we obtain

1/2^{|u|} − 2^{|u|}·ε < freq_u(N|m).
Together with (2) this implies that freq_u(N|m) converges to 2^{−|u|}, because ε > 0 has been chosen arbitrarily.

Definition 3. A set W of words is normal in the limit if and only if for any nonempty word u and any ε > 0, for all but finitely many words w in W,

1/2^{|u|} − ε < freq_u(w) < 1/2^{|u|} + ε.  (3)
Definition 4. For any n, let v_n = 0^n 0^{n−1}1 0^{n−2}10 … 1^n be the word that is obtained by concatenating all words of length n in lexicographic order.

Proposition 5. The set {v_1, v_2, …} is normal in the limit.

Proof. By an argument similar to the one given in Remark 2, it suffices to show that for any word u and any given ε > 0 we have for almost all words v_i,

freq_u(v_i) < 1/2^{|u|} + ε.  (4)

So fix u and ε > 0 and consider any index i such that |u|/i < ε. Recalling that v_i is the concatenation of all words of length i, call a subword of v_i undivided if it is actually a subword of one of these words of length i, and call all other subwords of v_i divided. It is easy to see that u can occur at most 2^i·|u| many times as a divided subword of v_i. Furthermore, a symmetry argument shows that among the at most |v_i| many undivided subwords of v_i of length |u|, each of the 2^{|u|} words of length |u| occurs exactly the same number of times. In summary, we have

occ_u(v_i) ≤ |v_i|/2^{|u|} + 2^i·|u| = |v_i|·(1/2^{|u|} + 2^i·|u|/|v_i|) < |v_i|·(1/2^{|u|} + ε),

where the last inequality follows by |v_i| = i·2^i and the choice of i. Equation (4) is then immediate by definition of freq_u(v_i).

Lemma 6. Let W be a set of words that is normal in the limit. Let w_1, w_2, … be a sequence of words in W such that

(i) for all w ∈ W, lim_{t→∞} |{i ≤ t : w_i = w}|/t = 0,
(ii) lim_{t→∞} |w_{t+1}|/|w_1 … w_t| = 0.

Then the sequence N = w_1 w_2 … is normal.

Remark 7. The sequence v_1 v_2 v_2 v_3 v_3 v_3 v_4 …, which consists of i copies of v_i concatenated in length-increasing order, is normal. This assertion is immediate by the definition of the sequence, Proposition 5, and Lemma 6.
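These definitions are easy to experiment with. The following Python sketch (our own illustration; the function names v, occ, and freq are ours, not from the paper) builds the words v_n of Definition 4 and measures subword frequencies as in Definition 1:

```python
from itertools import product

def v(n):
    # v_n: all 2^n words of length n, concatenated in lexicographic order
    return "".join("".join(bits) for bits in product("01", repeat=n))

def occ(u, w):
    # number of (possibly overlapping) occurrences of u as a subword of w
    return sum(1 for i in range(len(w) - len(u) + 1) if w[i:i + len(u)] == u)

def freq(u, w):
    return occ(u, w) / len(w)

# a long prefix of N1 = v1 v2 v2 v3 v3 v3 ... (i copies of v_i)
prefix = "".join(v(i) * i for i in range(1, 9))
```

On this prefix, freq("0", prefix) is exactly 1/2 and freq("01", prefix) is close to 1/4, as Proposition 5 and Lemma 6 predict.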
Due to lack of space, we omit the proof of Lemma 6. The arguments and techniques (also for the other results in this section) are essentially the same as the ones used by Champernowne [8], who considered normal sequences over the decimal alphabet {0, 1, …, 9} and proved that the decimal analogues of the sequences

N_1 = v_1 v_2 v_2 v_3 v_3 v_3 v_4 …  and  N_2 = v_1 v_2 v_3 v_4 …

are normal. In Remark 7, we have employed Lemma 6 and the fact that the set of all words v_i is normal in the limit in order to show that N_1 is normal. In order to demonstrate the normality of the decimal analogue of the sequence N_2, Champernowne [8, item (ii) on page 256] shows a fact about the decimal versions of the v_i that is stronger than just being normal in the limit, namely, for any word u and any constant k, we have for all sufficiently large i and all m ≤ |v_i|,

occ_u(v_i(0) … v_i(m − 1)) < m/2^{|u|} + |v_i|/k.  (5)

This result is then used in connection with a variant of Lemma 6 where, in place of assumption (ii), which asserts that the ratio of |w_{t+1}| to |w_1 … w_t| converges to 0, it is just required that this ratio is bounded.
3
Selecting Subsequences
The most simple selection rules are oblivious ones: they fix the places to be included in the subsequence in advance, independently of the input sequence.

Definition 8. An oblivious selection rule is a sequence S ∈ {0, 1}^∞. The subsequence B obtained by applying an oblivious rule S to a sequence A is just the subsequence formed by all the bits A(i) with S(i) = 1.

Kamae [10] gave a complete characterization of the class of oblivious selection rules that preserve normality. Let T be the shift map, transforming a sequence A = A(0)A(1)A(2)… into another sequence by cutting off the first bit, i.e., T(A) = A(1)A(2)A(3)…. Given a sequence A, let δ_A denote the Dirac measure induced by A, that is, for any class B of sequences, δ_A(B) = 1 if A ∈ B and δ_A(B) = 0 otherwise. Note that if a sequence A is normal, then any cluster point (in the weak-∗ topology) of the measures

(1/n) Σ_{i=0}^{n} δ_{T^i(A)}

is the uniform (1/2, 1/2)-Bernoulli measure, which has entropy 1 (see Weiss [23] for details on this). Kamae showed that an oblivious selection rule S preserves normality if and only if S is completely deterministic, that is, any cluster point of the measures

(1/n) Σ_{i=0}^{n} δ_{T^i(S)}
has entropy 0. (See the monograph by Weiss [23] for more on this topic.) The results in this paper are concerned with selection rules depending on the input sequence. Here, up to now, no exact classification of normality-preserving selection rules in the spirit of Kamae's result is known. First, we formally define how an input-dependent selection rule works.

Definition 9. Let A be a sequence and let L be a language. The sequence selected from A by L is the subsequence of A that contains exactly those bits A(i) of A such that the prefix A(0) … A(i − 1) is in L.

The basic result here is that regular languages preserve normality [1,17,18,11,15,7]. In the next two sections we are going to show that two well-known superclasses of the regular languages, minimal among the classes usually studied in formal language theory and known to be incomparable, do not preserve normality.
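Definition 9 translates directly into code. In the sketch below (our own illustration, not from the paper), the language L is given as a membership predicate on prefixes:

```python
def select(bits, in_L):
    """Return the subsequence of `bits` selected by the language whose
    membership predicate is `in_L`: bit A(i) is kept if and only if the
    prefix A(0)...A(i-1) belongs to L (Definition 9)."""
    out = []
    for i in range(len(bits)):
        if in_L(bits[:i]):
            out.append(bits[i])
    return "".join(out)
```

For example, with the regular language of words ending in 1, select("0110", lambda w: w.endswith("1")) returns "10": exactly the bits that immediately follow a 1.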
4
Normality Is Not Preserved by Deterministic One-Counter Languages
Proposition 10. There exist a normal sequence N and a deterministic one-counter language L such that the sequence selected from N by L is infinite and constant.

Proof. Recall that occ_r(w) is the number of occurrences of the symbol r in w; for any word w, let d(w) = occ_0(w) − occ_1(w). Let L be the language of all words that have as many 0's as 1's, i.e., L = {w ∈ {0, 1}* : d(w) = 0}. The language L is obviously a deterministic one-counter language, as it can be recognized by a deterministic push-down automaton with unary stack alphabet that, for the already scanned prefix v of the input, stores the sign and the absolute value of d(v) by its state and by the number of stack symbols, respectively. Recall from Section 2 that v_i is obtained by concatenating all words of length i and that, by Remark 7, the sequence

N = v_1 v_2 v_2 v_3 v_3 v_3 v_4 …
(6)
is normal. For the scope of this proof, call the subwords vi of N in (6) designated subwords. Furthermore, for all t, let zt be the prefix of N that consists of the first t designated subwords. Every prefix of the form zt of N is immediately followed by the (t + 1)th designated subword, where each designated subword starts with 0. Hence the proposition follows, if we can show that among all prefixes w of N , exactly the zt are in L, or equivalently, exactly the prefixes w that are equal to some zt satisfy d(w) = 0.
Fix any prefix w of N. Choose t maximal such that z_t is a prefix of w and pick v such that w = z_t v. By the choice of t, the word v is a proper prefix of the (t + 1)th designated subword, and v is equal to the empty string if and only if w is equal to some z_i. By additivity of d, we have

d(w) = d(z_t) + d(v) = d(v_{i_1}) + … + d(v_{i_t}) + d(v)  (7)

for appropriate values of the indices i_j. Then, in order to show that w is equal to some z_i if and only if d(w) = 0, it suffices to show that for all i,

(i) d(v_i) = 0, and (ii) d(u) > 0 for every nonempty proper prefix u of v_i.  (8)

We proceed by induction on i. For i = 0 there is nothing to prove, so assume i > 0. Let v_i^0 and v_i^1 be the first and the second half of v_i, respectively. For r = 0, 1, the string v_i^r is obtained from v_{i−1} by inserting 2^{i−1} times the symbol r, where d(v_{i−1}) = 0 by the induction hypothesis. Hence (i) follows because

d(v_i) = d(v_i^0) + d(v_i^1) = (d(v_{i−1}) + 2^{i−1}) + (d(v_{i−1}) − 2^{i−1}) = 0.

In order to show (ii), fix any nonempty proper prefix u of v_i. First assume that u is a proper prefix of v_i^0. Then u can be obtained from a nonempty proper prefix of v_{i−1} by inserting some 0's, hence we are done by the induction hypothesis. Next assume u = v_i^0 v for some proper prefix v of v_i^1. We have already argued that the induction hypothesis implies d(v_i^0) = 2^{i−1}. Furthermore, v can be obtained from a proper prefix v′ of v_{i−1} by inserting at most 2^{i−1} many 1's, where by the induction hypothesis we have d(v′) > 0. In summary, we have

d(u) = d(v_i^0) + d(v) ≥ 2^{i−1} + d(v′) − 2^{i−1} > 0,
which finishes the proof of the proposition.
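The argument can be checked empirically on a finite prefix of N. In the following sketch (our own code, under the construction above), the balanced-words rule selects exactly one bit at the start of each designated subword, and every selected bit is 0:

```python
from itertools import product

def v(n):
    # concatenation of all words of length n in lexicographic order
    return "".join("".join(b) for b in product("01", repeat=n))

def d(w):
    # surplus of 0's over 1's
    return w.count("0") - w.count("1")

# finite prefix of N = v1 v2 v2 v3 v3 v3 ... (i copies of v_i)
N = "".join(v(i) * i for i in range(1, 6))

# select the bits whose preceding prefix lies in L = {w : d(w) = 0}
selected = "".join(N[i] for i in range(len(N)) if d(N[:i]) == 0)
```

This prefix contains 1 + 2 + 3 + 4 + 5 = 15 designated subwords, so 15 bits are selected, one at the start of each, and all of them are 0, as Proposition 10 asserts.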
5
Normality Is Not Preserved by Linear Languages
Proposition 11. There is a normal sequence N and a linear language L such that the sequence selected from N by L is infinite and constant.

Proof. For any word w = w(0) … w(n − 1) of length n, let w^R = w(n − 1) … w(0) be the mirror word of w and let L = {ww^R : w is a word} be the language of palindromes of even length. The language L is linear because it can be generated by a grammar with start symbol S and rules S → 0S0 | 1S1 | λ.

The sequence N is defined in stages s = 0, 1, …, where during stage s we specify prefixes z̃_s and z_s of N. At stage 0, let z̃_0 and z_0 both be equal to the empty string. At any stage s > 0, obtain z̃_s by appending 2^{s−1} copies of v_s to z_{s−1}, and obtain z_s by appending to z̃_s its own mirror word z̃_s^R, i.e.,

z̃_s = z_{s−1} v_s … v_s  (2^{s−1} copies of v_s)  and  z_s = z̃_s z̃_s^R;  (9)

for example, we have

z̃_1 = v_1,
z_1 = v_1 v_1^R,
z̃_2 = v_1 v_1^R v_2 v_2,
z_2 = v_1 v_1^R v_2 v_2 v_2^R v_2^R v_1 v_1^R,
z̃_3 = v_1 v_1^R v_2 v_2 v_2^R v_2^R v_1 v_1^R v_3 v_3 v_3 v_3,
z_3 = v_1 v_1^R v_2 v_2 v_2^R v_2^R v_1 v_1^R v_3 v_3 v_3 v_3 v_3^R v_3^R v_3^R v_3^R v_1 v_1^R v_2 v_2 v_2^R v_2^R v_1 v_1^R.
We show next that the set of prefixes of N that are in L coincides with the set {z_s : s ≥ 0}. From the latter, it is then immediate that L selects from N an infinite subsequence that consists only of 0's, since any prefix z_s of N is followed by the word v_{s+1}, and all these words start with 0. By definition of the z_s, all words z_s are prefixes of N and are in L. In order to show that the z_s are the only prefixes of N contained in L, let u_s = 0 1^s 1^s 0. By induction on s, we show for all s > 2 that

(i) in z̃_s there occurs exactly one subword u_{s−1} and no subword u_s;
(ii) in z_s there occur exactly two subwords u_{s−1} and one subword u_s.

Inspection shows that both assertions are true in case s = 3. In the induction step, consider some s > 3. Assertion (i) follows by z̃_s = z_{s−1} v_s … v_s, the induction hypothesis on z_{s−1}, and because, by definition of v_s, the block of copies of v_s cannot overlap with a subword u_s. Assertion (ii) is then immediate by assertion (i), by z_s = z̃_s z̃_s^R, and because u_s is equal to u_s^R and 0 1^s is a suffix of z̃_s.

Now fix any prefix w of N and assume that w is in L, i.e., w is a palindrome of even length. Let s be maximal such that z_s is a prefix of w. We can assume s ≥ 3, because inspection reveals that w cannot be a prefix of z_3 unless w is equal to some z_i, in which case we are done. By (ii), the words z_s and z_{s+1} contain u_s as a subword exactly once and twice, respectively, hence w contains u_s as a subword at least once and at most twice. When mirroring the palindrome w onto itself, the first occurrence of the palindrome u_s in w must either be mapped to itself or, if present at all, to the second occurrence of u_s in w; in these cases w must be equal to z_s or z_{s+1}, respectively. Since w was chosen as an arbitrary prefix of N in L, this shows that the z_s are the only prefixes of N in L.

It remains to show that N is normal. Let W = {v_i : i ∈ N} ∪ {v_i^R : i ∈ N} and write the sequence N in the form

N = w_1 w_2 …  (10)
where the words wi correspond in the natural way to the words in the set W that occur in the inductive definition of N (e.g., w1 , w2 , and w3 are equal to v1 , v1 R , and v2 ). For the scope of this proof, we will call the subwords wi of N in (10) the designated subwords of N .
We conclude the proof by showing that the assumptions of Lemma 6 are satisfied. By Proposition 5, the set of all words of the form v_i is normal in the limit, and the same holds, by literally the same proof, for the set of all words v_i^R; the union of these two sets, i.e., the set W, is then also normal in the limit, because the class of sets that are normal in the limit is easily shown to be closed under union.

Next observe that in every prefix z_s of N each of the 2s words v_1, …, v_s and v_1^R, …, v_s^R occurs exactly 2^{s−1} many times; in particular, z_s contains at least s·2^s designated subwords and has length of at least 2^{s−1}·|v_s|. Now fix any t > 0 and let z = w_1 … w_t; let s be maximal such that z_s is a prefix of z. By the preceding discussion, we have for any w in W,

|{i ≤ t : w_i = w}|/t < 2^s/(s·2^s) = 1/s

and, furthermore,

|w_{t+1}|/|w_1 … w_t| < |v_{s+1}|/|z_s| ≤ |v_{s+1}|/(2^{s−1}·|v_s|) ≤ 1/2^{s−3}.

Since t was chosen arbitrarily and s goes to infinity when t does, this shows that assumptions (i) and (ii) in Lemma 6 are satisfied.
Acknowledgements. We are grateful to Klaus Ambos-Spies, Frank Stephan, and Paul Vitányi for helpful discussions.
References

1. V. N. Agafonoff. Normal sequences and finite automata. Soviet Mathematics Doklady, 9:324–325, 1968.
2. K. Ambos-Spies. Algorithmic randomness revisited. In B. McGuinness (ed.), Language, Logic and Formalization of Knowledge. Bibliotheca, 1998.
3. K. Ambos-Spies and A. Kučera. Randomness in computability theory. In P. Cholak et al. (eds.), Computability Theory: Current Trends and Open Problems, Contemporary Mathematics, 257:1–14. American Mathematical Society, 2000.
4. K. Ambos-Spies and E. Mayordomo. Resource-bounded balanced genericity, stochasticity and weak randomness. In Complexity, Logic, and Recursion Theory. Marcel Dekker, 1997.
5. J.-M. Autebert, J. Berstel, and L. Boasson. Context-free languages and pushdown automata. In G. Rozenberg and A. Salomaa (eds.), Handbook of Formal Languages. Springer, 1997.
6. J. L. Balcázar, J. Díaz, and J. Gabarró. Structural Complexity, Vol. I and II. Springer, 1995 and 1990.
7. A. Broglio and P. Liardet. Predictions with automata. In Symbolic Dynamics and Its Applications, Proc. AMS Conf. in honor of R. L. Adler, New Haven/CT (USA) 1991, Contemporary Mathematics, 135:111–124. American Mathematical Society, 1992.
8. D. G. Champernowne. The construction of decimals normal in the scale of ten. Journal of the London Mathematical Society, 8:254–260, 1933.
9. A. Church. On the concept of a random number. Bulletin of the AMS, 46:130–135, 1940.
10. T. Kamae. Subsequences of normal sequences. Israel Journal of Mathematics, 16:121–149, 1973.
11. T. Kamae and B. Weiss. Normal numbers and selection rules. Israel Journal of Mathematics, 21(2–3):101–110, 1975.
12. M. van Lambalgen. Random Sequences. Doctoral dissertation, University of Amsterdam, Amsterdam, 1987.
13. M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications, second edition. Springer, 1997.
14. J. H. Lutz. The quantitative structure of exponential time. In L. A. Hemaspaandra and A. L. Selman (eds.), Complexity Theory Retrospective II. Springer, 1997.
15. M. G. O'Connor. An unpredictability approach to finite-state randomness. Journal of Computer and System Sciences, 37(3):324–336, 1988.
16. P. Odifreddi. Classical Recursion Theory, Vol. I. North-Holland, 1989.
17. L. P. Postnikova. On the connection between the concepts of collectives of Mises-Church and normal Bernoulli sequences of symbols. Theory of Probability and Its Applications, 6:211–213, 1961.
18. C. P. Schnorr and H. Stimm. Endliche Automaten und Zufallsfolgen. Acta Informatica, 1:345–359, 1972.
19. A. Kh. Shen'. On relations between different algorithmic definitions of randomness. Soviet Mathematics Doklady, 38:316–319, 1988.
20. V. A. Uspensky, A. L. Semenov, and A. Kh. Shen'. Can an individual sequence of zeros and ones be random? Russian Math. Surveys, 45:121–189, 1990.
21. J. Ville. Étude Critique de la Notion de Collectif. Gauthier-Villars, 1939.
22. R. von Mises. Probability, Statistics and Truth. Macmillan, 1957.
23. B. Weiss. Single Orbit Dynamics. CBMS Regional Conference Series in Mathematics. American Mathematical Society, 2000.
On Converting CNF to DNF

Peter Bro Miltersen¹ *, Jaikumar Radhakrishnan² **, and Ingo Wegener³ ***

¹ Department of Computer Science, University of Aarhus, Denmark, [email protected]
² School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai 400005, India, [email protected]
³ FB Informatik LS2, University of Dortmund, 44221 Dortmund, Germany, [email protected]
Abstract. We study how big the blow-up in size can be when one switches between the CNF and DNF representations of boolean functions. For a function f : {0, 1}^n → {0, 1}, cnfsize(f) denotes the minimum number of clauses in a CNF for f; similarly, dnfsize(f) denotes the minimum number of terms in a DNF for f. For 0 ≤ m ≤ 2^{n−1}, let dnfsize(m, n) be the maximum dnfsize(f) for a function f : {0, 1}^n → {0, 1} with cnfsize(f) ≤ m. We show that there are constants c_1, c_2 ≥ 1 and ε > 0 such that for all large n and all m ∈ [ε^{−1}n, 2^{εn}], we have

2^{n − c_1·n/log(m/n)} ≤ dnfsize(m, n) ≤ 2^{n − c_2·n/log(m/n)}.

In particular, when m is the polynomial n^c, we get dnfsize(n^c, n) = 2^{n − θ(c^{−1}·n/log n)}.

1
Introduction
Boolean functions are often represented as disjunctions of terms (i.e. in DNF) or as conjunctions of clauses (i.e. in CNF). Which of these representations is preferable depends on the application. Some functions are represented more succinctly in DNF whereas others are represented more succinctly in CNF, and switching between these representations can involve an exponential increase in size. In this paper, we study how big this blow-up in size can be.

We recall some well-known concepts (for more details see Wegener [15]). The set of variables is denoted by X_n = {x_1, …, x_n}. Literals are variables and negated variables. Terms are conjunctions of literals. Clauses are disjunctions of literals. Every Boolean function f can be represented as a conjunction of clauses,

∧_{i=1}^{s} ∨_{ℓ∈C_i} ℓ,  (1)

as well as a disjunction of terms,
* Supported by BRICS, Basic Research in Computer Science, a centre of the Danish National Research Foundation.
** Work done while the author was visiting Aarhus.
*** Supported by DFG-grant We 1066/9.
B. Rovan and P. Vojt´ aˇ s (Eds.): MFCS 2003, LNCS 2747, pp. 612–621, 2003. c Springer-Verlag Berlin Heidelberg 2003
∨_{i=1}^{s} ∧_{ℓ∈T_i} ℓ,  (2)
where the T_i and C_i are sets of literals. The form (1) is usually referred to as conjunctive normal form (CNF) and the form (2) is usually referred to as disjunctive normal form (DNF), although it would be historically more correct to call them conjunctive and disjunctive forms and use normal only when the sets C_i and T_i have n literals on distinct variables. In particular, this would ensure that normal forms are unique. However, in the computer science literature such a distinction is not made, and we will use CNF and DNF while referring to expressions such as (1) or (2) even when no restriction is imposed on the sets C_i and T_i, and there is no guarantee of uniqueness.

The size of a CNF is the number of clauses (the parameter s in (1)), and cnfsize(f) is the minimum number of clauses in a CNF for f. Similarly, dnfsize(f) is the minimum number of terms in a DNF for f. We are interested in the maximal blow-up of size when switching from the CNF representation to the DNF representation (or vice versa). For 0 ≤ m ≤ 2^{n−1}, let dnfsize(m, n) be the maximum dnfsize(f) for a function f : {0, 1}^n → {0, 1} with cnfsize(f) ≤ m.

Since ∧ distributes over ∨, a CNF with m clauses each with k literals can be converted to a DNF with k^m terms each with at most m literals. If the clauses do not share any variable, this blow-up cannot be avoided. If the clauses don't share variables, we have km ≤ n, and the maximum dnfsize(f) that one can achieve by this method is 2^{n/2}. Can the blow-up be worse? In particular, we want to know the answer to the following question: For a function f : {0, 1}^n → {0, 1}, how large can dnfsize(f) be if cnfsize(f) is bounded by a fixed polynomial in n?

The problem is motivated by its fundamental nature: dnfsize(f) and cnfsize(f) are fundamental complexity measures. Practical circuit designs like programmable logic arrays (PLAs) are based on DNFs and CNFs.
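The distributivity argument just described can be sketched in code (our own illustration; literals are encoded as signed integers, so the clause [1, 2] stands for x_1 ∨ x_2):

```python
from itertools import product

def cnf_to_dnf(clauses):
    # Distribute AND over OR: choose one literal per clause.
    # Each choice yields a term; contradictory terms (containing both
    # x and not-x) and duplicate terms are discarded.
    terms = set()
    for choice in product(*clauses):
        term = frozenset(choice)
        if all(-lit not in term for lit in term):
            terms.add(term)
    return terms

# m = 2 clauses of k = 2 literals on disjoint variables: k^m = 4 terms
dnf = cnf_to_dnf([[1, 2], [3, 4]])
```

With n/2 disjoint two-literal clauses this produces 2^{n/2} terms, matching the bound discussed above; when clauses share variables, cancellation can make the result smaller.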
Lower bounds on unbounded fan-in circuits are based on the celebrated switching lemma of Håstad (1989), which is a statement about converting CNFs to DNFs where some variables are randomly replaced by constants. Hence, it seems that the exact relationship between CNFs and DNFs ought to be understood as completely as possible. Fortunately, CNFs and DNFs have simple combinatorial properties allowing the application of current combinatorial arguments to obtain such an understanding. In contrast, the results of Razborov and Rudich [12] show that this is not likely to be possible for complexity measures like circuit size and circuit depth.

Another motivation for considering the question is the study of SAT algorithms and heuristics with "mild" exponential behaviour, a study which has gained a lot of momentum in recent years (e.g., Monien and Speckenmeyer [9], Paturi et al. [10], Dantsin et al. [4], Schöning [13], Hofmeister et al. [7], and Dantsin et al. [5]). Despite many successes, the following fundamental question is still open: Is there an algorithm that decides SAT of a CNF with n variables and m clauses (without any restrictions on the length of clauses) in time m^{O(1)}·2^{cn} for some constant c < 1? The obvious brute force algorithm solves the
problem in time m^{O(1)}·2^n. One method for solving SAT is to convert the CNF to a DNF, perhaps using sophisticated heuristics to keep the final DNF and any intermediate results small (though presumably not optimally small, due to the hardness of such a task). Once converted to a DNF, satisfiability of the formula is trivial to decide. A CNF-DNF conversion method for solving SAT, phrased in a more general constraint satisfaction framework, was recently studied experimentally by Katajainen and Madsen [8]. Answering the question above limits the worst case complexity of any algorithm obtained within this framework.

The Monotone Case: Our final motivation for considering the question comes from the monotone version of the problem. Let dnfsize+(m, n) denote the maximum dnfsize(f) for a monotone function f : {0, 1}^n → {0, 1} with cnfsize(f) ≤ m. In this case (see, e.g., Wegener [15, Chapter 2, Theorem 4.2]), the number of prime clauses of f is equal to cnfsize(f) and the number of prime implicants of f is equal to dnfsize(f). Our problem can then be modelled on a hypergraph H_f whose edges are precisely the prime clauses of f. A vertex cover or hitting set for a hypergraph is a subset of vertices that intersects every edge of the hypergraph. The number of prime implicants of f is precisely the number of minimal vertex covers in H_f. The problem of determining dnfsize+(m, n) then immediately translates to the following problem on hypergraphs: What is the maximum number of distinct minimal vertex covers in a hypergraph on n vertices with m distinct edges? In particular, how many minimal vertex covers can a hypergraph with n^{O(1)} edges have?

Previous Work: Somewhat surprisingly, the exact question we consider does not seem to have been considered before, although some related research has been reported. As mentioned, Håstad's switching lemma can be considered as a result about approximating CNFs by DNFs.
The problem of converting polynomial-size CNFs and DNFs into representations by restricted branching programs for the purpose of hardware verification has been considered for a long time (see Wegener [16]). The best lower bounds for ordered binary decision diagrams (OBDDs) and read-once branching programs (BP1s) are due to Bollig and Wegener [3] and are of size 2^{Ω(n^{1/2})}, even for monotone functions representable as disjunctions of terms of length 2.

The Results in this Paper: In Section 2, we show functions where the blow-up when going from CNF to DNF is large:

for 2n ≤ m ≤ 2^{n−1}, dnfsize(m, n) ≥ 2^{n − 2n/log(m/n)};
for 4n ≤ m ≤ C(n, n/2), dnfsize+(m, n) ≥ 2^{n − n·(log log(m/n))/log(m/n) − log(m/n)},

where C(n, n/2) denotes the binomial coefficient. In particular, for m = n^{O(1)}, we have

dnfsize(m, n) = 2^{n − O(n/log n)} and dnfsize+(m, n) = 2^{n − O(n·log log n/log n)}.

In Section 3, we show that functions with small CNFs do not need very large DNFs: There is a constant c > 0 such that for all large n and all m ∈ [10^4·n, 2^{10^{−4}·n}],
dnfsize(m, n) ≤ 2^{n − c·n/log(m/n)}.

In particular, for m = n^{O(1)}, we have dnfsize(m, n) = 2^{n − Ω(n/log n)}. For the class of CNF-DNF conversion based SAT algorithms described above, our results imply that no algorithm within this framework has complexity m^{O(1)}·2^{cn} for some constant c < 1, though we cannot rule out an algorithm of this kind with complexity m^{O(1)}·2^{n − Ω(n/log n)}, which would still be a very interesting result.
2
Functions with a Large Blow-Up
In this section, we show functions with small cnfsize but large dnfsize. Our functions will be the conjunction of a small number of parity and majority functions. To estimate the cnfsize and the dnfsize of such functions, we will need the following lemma. Recall that a prime implicant t of a boolean function f is called an essential prime implicant if there is an input x such that t(x) = 1 but t′(x) = 0 for all other prime implicants t′ of f. We denote the number of essential prime implicants of f by ess(f).

Lemma 1. Let f(x) = ∧_{i=1}^{ℓ} g_i(x), where the g_i's depend on disjoint sets of variables and no g_i is identically 0. Then

cnfsize(f) = Σ_{i=1}^{ℓ} cnfsize(g_i) and dnfsize(f) ≥ ess(f) = ∏_{i=1}^{ℓ} ess(g_i).
Proof. First, consider cnfsize(f). This part is essentially Theorem 1 of Voigt and Wegener [14]. We recall their argument. Clearly, we can put together the CNFs of the g_i's and produce a CNF for f with size at most Σ_{i=1}^{ℓ} cnfsize(g_i). To show that cnfsize(f) ≥ Σ_{i=1}^{ℓ} cnfsize(g_i), let C be the set of clauses of the smallest CNF of f. We may assume that all clauses in C are prime clauses of f. Because the g_i's depend on disjoint variables, every prime clause of f is a prime clause of exactly one g_i. Thus we obtain a natural partition {C_1, C_2, …, C_ℓ} of C, where each clause in C_i is a prime clause of g_i. Consider a setting of the variables of the g_j (j ≠ i) that makes each such g_j take the value 1 (this is possible because no g_j is identically 0). Under this restriction, the function f reduces to g_i and all clauses outside C_i are set to 1. Thus g_i ≡ ∧_{c∈C_i} c, and |C_i| ≥ cnfsize(g_i). The first claim follows from this.

It is well known since Quine [11] (see also, e.g., Wegener [15, Chapter 2, Lemma 2.2]) that dnfsize(f) ≥ ess(f). Also, it is easy to see that any essential prime implicant of f is the conjunction of essential prime implicants of the g_i, and every conjunction of essential prime implicants of the g_i is an essential prime implicant of f. Our second claim follows from this.
We will apply the above lemma with the parity and majority functions as the g_i's. It is well known that the parity function on n variables, defined by

Par_n(x) = x_1 ⊕ ⋯ ⊕ x_n = Σ_{i=1}^{n} x_i (mod 2),
has cnfsize and dnfsize equal to 2^{n−1}. For monotone functions, it is known that the majority function on n variables, defined by

Maj_n(x) = 1 ⟺ Σ_{i=1}^{n} x_i ≥ n/2,

has cnfsize and dnfsize equal to C(n, n/2).
Definition 1. Let the set of n variables {x_1, x_2, …, x_n} be partitioned into ℓ = ⌈n/k⌉ sets S_1, …, S_ℓ, where |S_i| = k for i < ℓ. The functions f_{k,n}, h_{k,n} : {0, 1}^n → {0, 1} are defined as follows:

f_{k,n}(x) = ∧_{i=1}^{ℓ} ⊕_{j∈S_i} x_j  and  h_{k,n}(x) = ∧_{i=1}^{ℓ} Maj(x_j : j ∈ S_i).
Theorem 1. Suppose 1 ≤ k ≤ n. Then

cnfsize(f_{k,n}) ≤ (n/k)·2^{k−1} and dnfsize(f_{k,n}) = 2^{n − n/k};
cnfsize(h_{k,n}) ≤ (n/k)·C(k, k/2) and dnfsize(h_{k,n}) ≥ C(k, k/2)^{n/k}.

Proof. As noted above, cnfsize(Par_k) = 2^{k−1} and cnfsize(Maj_k) = C(k, k/2). Also, it is easy to verify that ess(Par_k) = 2^{k−1} and ess(Maj_k) = C(k, k/2). Our theorem follows easily from this using Lemma 1.
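The bound dnfsize(f_{k,n}) = 2^{n−n/k} can be made concrete for small parameters: flipping any single bit of a satisfying assignment of f_{k,n} falsifies the parity of its block, so every term in a DNF for f_{k,n} must fix all n variables and dnfsize(f_{k,n}) equals the number of satisfying assignments. A brute-force count (our own sketch) confirms that number:

```python
from itertools import product

def f(x, k):
    # f_{k,n}: conjunction over blocks of size k of the XOR of the block
    n = len(x)
    return all(sum(x[i:i + k]) % 2 == 1 for i in range(0, n, k))

def count_models(n, k):
    # number of satisfying assignments; should equal 2^(n - n/k)
    return sum(1 for x in product((0, 1), repeat=n) if f(x, k))
```

For instance, n = 4, k = 2 gives 2^{4−2} = 4 models, and n = 6, k = 3 gives 2^{6−2} = 16.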
Remark: One can determine the dnfsize of f_{k,n} and h_{k,n} directly using a general result of Voigt and Wegener [14], which states that dnfsize(g_1 ∧ g_2) = dnfsize(g_1) · dnfsize(g_2) whenever g_1 and g_2 are symmetric functions on disjoint sets of variables. This is not true for general functions g_1 and g_2 (see Voigt and Wegener [14]).

Corollary 1.
1. Let 2n ≤ m ≤ 2^{n−1}. There is a function f with cnfsize(f) ≤ m and dnfsize(f) ≥ 2^{n − 2n/log(m/n)}.
2. Let 4n ≤ m ≤ (n choose ⌈n/2⌉). Then, there is a monotone function h with cnfsize(h) ≤ m and dnfsize(h) ≥ 2^{n − n·(log log(m/n))/log(m/n) − log(m/n)}.
Proof. The first part follows from Theorem 1, by considering f_{k,n} for k = log_2(m/n). The second part follows from Theorem 1, by considering h_{k,n} with the same value of k. We use the inequality 2^k/k ≤ (k choose ⌈k/2⌉) ≤ 2^{k−1} (valid for k ≥ 2).
Let us understand what this result says for a range of parameters, assuming n is large.
On Converting CNF to DNF
Case m = cn: There is a function with linear cnfsize but exponential dnfsize. For ε > 0, by choosing c = Θ(2^{2/ε}), the dnfsize can be made at least 2^{(1−ε)n}.

Case m = n^c: We can make dnfsize(f) = 2^{n − O(c^{−1} n/log n)}. By choosing c large we obtain in the exponent an arbitrarily small constant for the (n/log n)-term.

Case m = 2^{o(n)}: We can make dnfsize(f) grow at least as fast as 2^{n−α(n)}, for each α(n) = ω(1).

Monotone functions: We obtain a monotone function whose cnfsize is at most a polynomial m = n^c, but whose dnfsize can be made as large as 2^{n − εn·(log log n)/log n}. Here, ε = O(c^{−1}).
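To make the first two cases concrete, the lower-bound exponent n − 2n/log_2(m/n) from Corollary 1 can be evaluated directly; the helper below is ours and purely illustrative.

```python
import math

def dnf_exponent(n, m):
    """Exponent in the Corollary 1 lower bound dnfsize(f) >= 2^(n - 2n/log2(m/n))."""
    return n - 2 * n / math.log2(m / n)

n = 10 ** 6
# m = c*n: as c grows, the exponent approaches n, i.e. dnfsize nearly 2^n
assert dnf_exponent(n, 2 ** 8 * n) == n - n / 4
# m = n^c: the loss term is Theta(n / log n) and shrinks as c grows
loss = n - dnf_exponent(n, n ** 3)   # equals 2n / log2(n^2) = n / log2(n)
assert abs(loss - n / math.log2(n)) < 1e-6
```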
3 Upper Bounds on the Blow-Up
In this section, we show the upper bound on dnfsize(m, n) claimed in the introduction. We will use restrictions to analyse CNFs. So, we first present the necessary background about restrictions, and then use it to derive our result.

3.1 Preliminaries
Definition 2 (Restriction). A restriction on a set of variables V is a function ρ : V → {0, 1, ⋆}. The variables in V assigned ⋆ by ρ are said to have been left free by ρ, and their set is denoted free(ρ); the remaining variables set(ρ) = V − free(ρ) are said to be set by ρ. Let S ⊆ V. We use R^V_S to denote the set of all restrictions ρ with set(ρ) = S. For a Boolean function f on variables V and a restriction ρ, we denote by f_ρ the function on the variables free(ρ) obtained from f by fixing each variable x ∈ set(ρ) at the value ρ(x).

The following easy observation lets us conclude that if the subfunctions obtained by applying restrictions have small dnfsize, then the original function also has small dnfsize.

Lemma 2. For all S ⊆ V and all Boolean functions f with variables V,

  dnfsize(f) ≤ Σ_{ρ∈R^V_S} dnfsize(f_ρ).
Proof. Let Φ_{f_ρ} denote the smallest DNF for f_ρ. For a restriction ρ ∈ R^V_S, let t(ρ) be the term consisting of literals from variables in S that is made 1 by ρ and 0 by all other restrictions in R^V_S. (No variable outside S appears in t(ρ). Every variable in S appears in t(ρ): the variable x appears unnegated if and only if ρ(x) = 1.) Then, Φ = ⋁_{ρ∈R^V_S} t(ρ) ∧ Φ_{f_ρ} gives us a DNF for f of the required size.
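A brute-force rendering of this construction can help fix ideas. The sketch below is our own: instead of a minimal Φ_{f_ρ} it uses the trivial DNF with one term per satisfying assignment of each subfunction, which still illustrates the t(ρ) ∧ Φ_{f_ρ} assembly.

```python
from itertools import product

def dnf_via_restrictions(f, variables, S):
    """Lemma 2 construction: for every restriction rho fixing S, emit the
    term t(rho) conjoined with a DNF for the subfunction f_rho.
    Terms are lists of (variable, value) literals."""
    free = [v for v in variables if v not in S]
    terms = []
    for svals in product([0, 1], repeat=len(S)):
        rho = dict(zip(S, svals))
        t_rho = list(rho.items())            # the term t(rho) on the set variables
        for fvals in product([0, 1], repeat=len(free)):
            assignment = {**rho, **dict(zip(free, fvals))}
            if f(assignment):                # one (non-minimal) term of Phi_{f_rho}
                terms.append(t_rho + list(zip(free, fvals)))
    return terms

# Example: parity of three variables, restricted on S = {"x1"}
par3 = lambda a: (a["x1"] + a["x2"] + a["x3"]) % 2 == 1
dnf = dnf_via_restrictions(par3, ["x1", "x2", "x3"], ["x1"])
assert len(dnf) == 4   # parity has 2^(n-1) = 4 satisfying assignments
```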
In light of this observation, to show that the dnfsize of some function f is small, it suffices to somehow obtain restrictions of f that have small dnfsize. Random restrictions are good for this. We will use random restrictions in two ways. If the clauses of a CNF have a small number of literals, then the switching
lemma of Håstad [6] and Beame [1], when combined with Lemma 2, immediately gives us a small DNF (see Lemma 4 below). We are, however, given a general CNF, not necessarily one with small clauses. Again, random restrictions come to our aid: with high probability large clauses are destroyed by random restrictions (see Lemma 5).

Definition 3 (Random Restriction). When we say that ρ is a random restriction on the variables in V leaving ℓ variables free, we mean that ρ is generated as follows: first, pick a set S of size |V| − ℓ at random with uniform distribution; next, pick ρ with uniform distribution from R^V_S.

We will need the following version of the switching lemma due to Beame [1].

Lemma 3 (Switching Lemma). Let f be a function on n variables with a CNF whose clauses have at most r literals. Let ρ be a random restriction leaving ℓ variables free. Then

  Pr[f_ρ does not have a decision tree of depth d] < (7rℓ/n)^d.

We can combine Lemma 2 and the switching lemma to obtain small DNFs for functions with CNFs with small clauses.

Lemma 4. Let 1 ≤ r ≤ n/100. Let f have a CNF on n variables where each clause has at most r literals. Then, dnfsize(f) ≤ 2^{n − (1/100)·(n/r)}.
Proof. Let V be the set of variables of f. Let ρ be a random restriction on V that leaves ℓ = (1/15)·(n/r) variables free. By the switching lemma, with probability more than 1 − 2^{−d}, f_ρ has a decision tree of depth at most d. We can fix S ⊆ V so that this event happens with this probability even when conditioned on set(ρ) = S, that is, when ρ is chosen at random with uniform distribution from R^V_S. If f_ρ has a decision tree of depth at most d, then it is easy to see that dnfsize(f_ρ) ≤ 2^d. In any case, dnfsize(f_ρ) ≤ 2^{ℓ−1}. Thus, by Lemma 2, we have

  dnfsize(f) ≤ Σ_{ρ∈R^V_S} dnfsize(f_ρ) ≤ 2^{n−ℓ} · 2^d + 2^{n−ℓ} · 2^{−d} · 2^{ℓ−1}.

Set d = ℓ/2. Then, dnfsize(f) ≤ 2^{n−ℓ/2+1} ≤ 2^{n − (1/100)·(n/r)}.
Lemma 5. Let V be a set of n variables, and K a set of literals on distinct variables. Let |K| = k. Let ρ be a random restriction that leaves n/2 variables free. Then,

  Pr_ρ[no literal in K is assigned 1] ≤ 2e^{−k/8}.
Proof. Let W be the set of variables that appear in K either in negated or non-negated form. Using estimates for the tail of the hypergeometric distribution [2], we first have

  Pr[|W ∩ set(ρ)| ≤ k/4] ≤ exp(−k/8).

Furthermore, Pr[no literal in K is assigned 1 | |W ∩ set(ρ)| ≥ k/4] ≤ 2^{−k/4}. Thus,

  Pr_ρ[no literal in K is assigned 1] ≤ e^{−k/8} + 2^{−k/4} < 2e^{−k/8}.
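The bound is easy to check empirically. The Monte Carlo sketch below is ours: it samples random restrictions leaving n/2 variables free and estimates the probability that none of k fixed positive literals is assigned 1.

```python
import math
import random

def estimate_no_literal_one(n, k, trials=20000, seed=0):
    """Estimate Pr[no literal of K is assigned 1] for a random restriction
    leaving n/2 variables free; K is taken as k positive literals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        set_vars = set(rng.sample(range(n), n - n // 2))
        # a literal survives if its variable stays free or is set to 0
        if all(v not in set_vars or rng.random() < 0.5 for v in range(k)):
            hits += 1
    return hits / trials

n, k = 100, 40
assert estimate_no_literal_one(n, k) <= 2 * math.exp(-k / 8)  # Lemma 5 bound
```

For n = 100 and k = 40 the bound 2e^{−k/8} is about 0.013, while the true probability is orders of magnitude smaller, so the estimate comfortably respects the lemma.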
3.2 Small DNFs from Small CNFs
We now show that the blow-up obtained in the previous section (see Corollary 1) is essentially optimal.

Theorem 2. There is a constant c > 0, such that for all large n, and m ∈ [10^4 n, 2^{10^{−4} n}],

  dnfsize(m, n) ≤ 2^{n − c·n/log(m/n)}.

Proof. Let f be a Boolean function on a set V of n variables, and let Φ be a CNF for f with at most m clauses. We wish to show that f has a DNF of small size. By comparing the present bound with Lemma 4, we see that our job would be done if we could somehow ensure that the clauses in Φ have at most O(log(m/n)) literals. All we know, however, is that Φ has at most m clauses. In order to prepare Φ for an application of Lemma 4, we will attempt to destroy the large clauses of Φ by applying a random restriction. Let ρ be a random restriction on V that leaves n/2 variables free. We cannot claim immediately that all large clauses are likely to be destroyed by this restriction. Instead, we will use the structure of the surviving large clauses to get around them. The following predicate will play a crucial role in our proof.

E(ρ): There is a set S_0 ⊆ free(ρ) of size at most n/10 so that every clause of Φ that is not killed by ρ has at most r := 100 log(m/n) free variables outside S_0.

Claim. Pr_ρ[E(ρ)] ≥ 1 − 2^{−n/100}.

Before we justify this claim, let us see how we can exploit it to prove our theorem. Fix a choice of S ⊆ V such that Pr[E(ρ) | set(ρ) = S] ≥ 1 − 2^{−n/100}. Let F = V − S. We will concentrate only on ρ's with set(ρ) = S, that is, ρ's from the set R^V_S. We will build a small DNF for f by putting together the DNFs for the different f_ρ's. The key point is that whenever E(ρ) is true, we will be able to show that f_ρ has a small DNF.

E(ρ) is true: Consider the set S_0 ⊆ free(ρ) whose existence is promised in the definition of E(ρ). The definition of S_0 implies that for each σ ∈ R^F_{S_0}, all clauses of Φ_{σ◦ρ} have at most r literals. By Lemma 4, dnfsize(f_{σ◦ρ}) ≤ 2^{|F|−|S_0| − (|F|−|S_0|)/(100r)}, and by Lemma 2, we have

  dnfsize(f_ρ) ≤ Σ_{σ∈R^F_{S_0}} dnfsize(f_{σ◦ρ}) ≤ 2^{|S_0|} · 2^{|F|−|S_0| − (|F|−|S_0|)/(100r)} ≤ 2^{|F| − (|F|−|S_0|)/(100r)}.

E(ρ) is false: We have dnfsize(f_ρ) ≤ 2^{|F|−1}.

Using these bounds for dnfsize(f_ρ) for ρ ∈ R^V_S in Lemma 2, we obtain

  dnfsize(f) ≤ 2^{|S|} · 2^{|F| − (|F|−|S_0|)/(100r)} + 2^{|S|} · 2^{−n/100} · 2^{|F|−1} = 2^n (2^{−(|F|−|S_0|)/(100r)} + 2^{−n/100−1}).
The theorem follows from this because |F | − |S0 | = Ω(n) and r = O(log(m/n)). We still have to prove the claim.
Proof of Claim. Suppose E(ρ) is false. We will first show that there is a set of at most n/(10(r+1)) surviving clauses in Φ_ρ that together involve at least n/10 variables. The following sequential procedure will produce this set of clauses. Since E does not hold, there is some (surviving) clause c_1 of Φ_ρ with at least r+1 variables. Let T be the set of variables that appear in this clause. If |T| ≥ n/10, then we stop: {c_1} is the set we seek. If |T| < n/10, there must be another clause c_2 of Φ_ρ with r+1 variables outside T, for otherwise, we could take S_0 = T and E(ρ) would be true. Add to T all the variables in c_2. If |T| ≥ n/10, we stop with the set of clauses {c_1, c_2}; otherwise, arguing as before, there must be another clause c_3 of Φ_ρ with r+1 variables outside T. We continue in this manner, picking a new clause and adding at least r+1 elements to T each time, as long as |T| < n/10. Within n/(10(r+1)) steps we will have |T| ≥ n/10, at which point we stop.

For a set C of clauses of Φ, let K(C) be a set of literals obtained by picking one literal for each variable that appears in some clause in C. By the discussion above, for E(ρ) to be false, there must be some set C of clauses of Φ such that |C| ≤ n/(10(r+1)) := a, |K(C)| ≥ n/10, and no literal in K(C) is assigned 1 by ρ. Thus, using Lemma 5, we have

  Pr[¬E(ρ)] ≤ Σ_{C : |C|≤a, |K(C)|≥n/10} Pr_ρ[no literal in K(C) is assigned 1 by ρ] ≤ Σ_{j=1}^{a} (m choose j) · 2e^{−n/80} ≤ 2^{−n/100}.
To justify the last inequality, we used the assumption that n is large and m ∈ [10^4 n, 2^{10^{−4} n}]. We omit the detailed calculation. This completes the proof of the claim.
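The omitted calculation is routine. As one illustration, the script below (ours) evaluates log_2 of the bound Σ_{j≤a} (m choose j) · 2e^{−n/80} at an admissible parameter pair, crudely bounding the sum by a·(m choose a).

```python
import math

def log2_binom(m, j):
    return (math.lgamma(m + 1) - math.lgamma(j + 1)
            - math.lgamma(m - j + 1)) / math.log(2)

n = 10 ** 6
m = 10 ** 4 * n                       # smallest m admitted by Theorem 2
r = 100 * math.log2(m / n)
a = int(n / (10 * (r + 1)))           # a = n/(10(r+1))
# log2 of  sum_{j<=a} C(m,j) * 2e^{-n/80}, bounding the sum by a*C(m,a)
lhs = math.log2(a) + log2_binom(m, a) + 1 - (n / 80) * math.log2(math.e)
assert lhs < -n / 100                 # the probability is at most 2^{-n/100}
```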
4 Conclusion and Open Problems

We have shown lower and upper bounds for dnfsize(m, n) of the form 2^{n − c·n/log(m/n)}. The constants c in the lower and upper bounds are far apart, and it would be interesting to bring them closer, especially when m = An for some constant A. Our bounds are not tight for monotone functions. In particular, what is the largest possible blow-up in size when converting a polynomial-size monotone CNF to an equivalent optimal-size monotone DNF? Equivalently, what is the largest possible number of distinct minimal vertex covers for a hypergraph with n vertices and n^{O(1)} edges? We have given an upper bound of 2^{n−Ω(n/log n)} and a lower bound of 2^{n−O(n log log n/log n)}. Getting tight bounds seems challenging.
Acknowledgements We thank the referees for their comments.
References

1. Beame, P.: A switching lemma primer. Technical Report UW-CSE-95-07-01, Department of Computer Science and Engineering, University of Washington (November 1994). Available online at www.cs.washington.edu/homes/beame/.
2. Chvátal, V.: The tail of the hypergeometric distribution. Discrete Mathematics 25 (1979) 285–287.
3. Bollig, B. and Wegener, I.: A very simple function that requires exponential size read-once branching programs. Information Processing Letters 66 (1998) 53–57.
4. Dantsin, E., Goerdt, A., Hirsch, E.A., and Schöning, U.: Deterministic algorithms for k-SAT based on covering codes and local search. Proceedings of the 27th International Colloquium on Automata, Languages and Programming. Springer. LNCS 1853 (2000) 236–247.
5. Dantsin, E., Goerdt, A., Hirsch, E.A., Kannan, R., Kleinberg, J., Papadimitriou, C., Raghavan, P., and Schöning, U.: A deterministic (2 − 2/(k + 1))^n algorithm for k-SAT based on local search. Theoretical Computer Science, to appear.
6. Håstad, J.: Almost optimal lower bounds for small depth circuits. In: Micali, S. (Ed.): Randomness and Computation. Advances in Computing Research, 5 (1989) 143–170. JAI Press.
7. Hofmeister, T., Schöning, U., Schuler, R., and Watanabe, O.: A probabilistic 3-SAT algorithm further improved. Proceedings of STACS, LNCS 2285 (2002) 192–202.
8. Katajainen, J. and Madsen, J.N.: Performance tuning an algorithm for compressing relational tables. Proceedings of SWAT, LNCS 2368 (2002) 398–407.
9. Monien, B. and Speckenmeyer, E.: Solving satisfiability in less than 2^n steps. Discrete Applied Mathematics 10 (1985) 287–295.
10. Paturi, R., Pudlák, P., Saks, M.E., and Zane, F.: An improved exponential-time algorithm for k-SAT. Proceedings of the 39th IEEE Symposium on the Foundations of Computer Science (1998) 628–637.
11. Quine, W.V.O.: On cores and prime implicants of truth functions. American Mathematics Monthly 66 (1959) 755–760.
12. Razborov, A. and Rudich, S.: Natural proofs. Journal of Computer and System Sciences 55 (1997) 24–35.
13. Schöning, U.: A probabilistic algorithm for k-SAT based on limited local search and restart. Algorithmica 32 (2002) 615–623.
14. Voigt, B. and Wegener, I.: Minimal polynomials for the conjunctions of functions on disjoint variables can be very simple. Information and Computation 83 (1989) 65–79.
15. Wegener, I.: The Complexity of Boolean Functions. Wiley 1987. Freely available via http://ls2-www.cs.uni-dortmund.de/~wegener.
16. Wegener, I.: Branching Programs and Binary Decision Diagrams – Theory and Applications. SIAM Monographs on Discrete Mathematics and Applications 2000.
A Basis of Tiling Motifs for Generating Repeated Patterns and Its Complexity for Higher Quorum∗

N. Pisanti^1, M. Crochemore^{2,3},∗∗ R. Grossi^1, and M.-F. Sagot^{4,3},∗∗∗

1 Dipartimento di Informatica, Università di Pisa, Italy ({pisanti,grossi}@di.unipi.it)
2 Institut Gaspard-Monge, University of Marne-la-Vallée, France ([email protected])
3 INRIA Rhône-Alpes, France ([email protected])
4 King's College London, UK
Abstract. We investigate the problem of determining the basis of motifs (a form of repeated patterns with don’t cares) in an input string. We give new upper and lower bounds on the problem, introducing a new notion of basis that is provably smaller than (and contained in) previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all these bases grows exponentially with the quorum, the minimal number of times a motif must appear. We show that a polynomial-time algorithm exists only for fixed quorum.
1 Introduction
Identifying repeated patterns in strings is a computationally demanding task on the large data sets available in computational biology, data mining, textual document processing, system security, and other areas; for instance, see [6]. We consider patterns with don't cares in a given string s of n symbols drawn over an alphabet Σ. The don't care is a special symbol '◦' matching any symbol of Σ; for example, pattern T◦E matches both TTE and TEE inside s = COMMITTEE (note that a pattern cannot have a don't care at the beginning or at the end, as this is not considered informative). Contrary to string matching with don't cares, the pattern T◦E is not given in advance for searching s. Instead, the patterns with don't cares appearing in s are unknown and, as such, have to be discovered and extracted by processing s efficiently. In our example, T◦E and M◦◦T◦E are among the patterns appearing repeated in COMMITTEE. In this paper we focus

∗ The full version of this paper is available in [11] as technical report TR-03-02.
∗∗ Supported by CNRS action AlBio, NATO Sc. Prog. PST.CLG.977017, and Wellcome Trust Foundation.
∗∗∗ Supported by CNRS-INRIA-INRA-INSERM action BioInformatique and Wellcome Trust Foundation.
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 622–631, 2003. © Springer-Verlag Berlin Heidelberg 2003
on finding the patterns called motifs, which appear at least q times in s for an input parameter q ≥ 2 called the quorum. Different formulations in the known literature address the problem of detecting motifs in several contexts, revealing its algorithmic relevance. Unfortunately, the complexity of the algorithms for motif discovery may easily become exponential due to the explosive growth of the motifs in strings, such as in the artificial string A···ATA···A (same number of As on both sides of T) generating many motifs with As intermixed with don't cares, and in other "real" strings over a small alphabet occurring in practice, e.g., DNA sequences. Some heuristics try to alleviate this drawback by reducing the number of interesting motifs to make feasible any further processing of them, but they cannot guarantee sub-exponential bounds in the worst case [7]. In this paper, we explore the algorithmic ideas behind motif discovery while getting some insight into their combinatorial complexity and their connections with string algorithmics. Given a motif x for a string s of length n, we denote the set of positions on s at which the occurrences of x start by Lx ⊆ [0..n−1], where |Lx| ≥ q holds for the given quorum q ≥ 2. We single out the maximal motifs x, informally characterized as satisfying |Lx| > |Ly| for any other motif y more specific than x, i.e., obtained from x by adding don't cares and alphabet letters or by replacing one or more don't cares with alphabet letters. In other words, x appears in y but x occurs in s more times than y does, which is considered informative for discovering the repetitions in s. For example, M◦◦T◦E is maximal in COMMITTEE for q = 2 while M◦◦◦◦E and T◦E are not maximal since M◦◦T◦E is more specific with the same number of occurrences. Maximality provides an intuitive notion of relevance, as each maximal motif x indirectly represents all non-maximal motifs z that are less specific than it.
Unfortunately, this property does not bound significantly the number of maximal motifs. For example, A···ATA···A contains an exponential number of them for q = 2 (see Section 2). A further requirement on the maximal motifs is the notion of irredundant motifs [7]. A maximal motif x is redundant if there exist maximal motifs y_1, ..., y_k ≠ x such that the set of occurrences of x satisfies Lx = Ly_1 ∪ ... ∪ Ly_k; it is irredundant otherwise. The set of occurrences of a redundant motif can be covered by other sets of occurrences, while that of an irredundant motif is not the union of the sets of occurrences of other maximal motifs. The basis of the irredundant motifs of string s with quorum q is the set of irredundant motifs in s. Informally speaking, a basis can generate all the motifs by simple rules and can be expressed mathematically in the algebraic sense of the term. According to Parida et al. [7], what makes the irredundant motifs interesting is that their number is always upper bounded by 3n independently of any chosen q ≥ 2; moreover, they can be found in O(n³ log n) time by this bound, notwithstanding the possibly exponential number of maximal motifs that are candidates for the basis.

Our results: We study the complexity of finding the basis of motifs with novel algorithms to represent all motifs succinctly. We show that, in the worst case, there is an infinite family of strings for which the basis contains Ω(n²) irredundant motifs for q = 2 (see Section 2). This contradicts the upper bound of 3n for any q ≥ 2 given in [7] as shown (in the Appendix of [11] we give a
counterexample to its charging scheme, which crucially relies on a lemma that is not valid). As a result, the bound of O(n³ log n) time in [7] for any q does not hold, since it relies on the upper bound of 3n, thus leaving open the problem of discovering a basis in polynomial time for any q. We also introduce a new definition, called the basis of the tiling motifs of string s with quorum q. The condition for tiling motifs is stronger than that of irredundancy. A maximal motif x is tiled if there exist maximal motifs y_1, ..., y_k ≠ x such that the set of occurrences of x satisfies Lx = (Ly_1 + d_1) ∪ ... ∪ (Ly_k + d_k) for some integers d_1, ..., d_k; it is tiling otherwise. Note that the motifs y_1, ..., y_k are not necessarily distinct and the union of their occurrences is taken after displacing them by d_1, ..., d_k, respectively. Since a redundant motif is also tiled with d_1 = ··· = d_k = 0, a tiling motif is surely irredundant. Hence the basis of the tiling motifs is included in the basis of irredundant motifs, while both of them are able to generate the same set of motifs with mechanical rules. Although the definition of tiling motifs is derived from that of irredundant ones, the difference is much more substantial than it may appear. The basis of tiling motifs is symmetric, namely, the tiling motifs of s̄ (the string s in reversed order) are the reversed tiling motifs of s, whereas the irredundant motifs for strings s and s̄ are apparently unrelated, unlike the entropy and other properties related to the repetitions in strings. Moreover, the number of tiling motifs can be provably upper bounded in the worst case by n − 1 for q = 2, and they occur in s for a total of 2n times at most, whereas we demonstrate that there can be Ω(n²) irredundant motifs. We give more details in Section 3, and we also discuss in the full paper [11] how to find the longest motifs with a limited number of don't cares.
Finally, in Section 4, we reveal an exponential dependency on the quorum q for the number of motifs, both for the basis of irredundant motifs and for the basis of tiling motifs, which was unnoticed in previous work. We prove that there is an infinite family of strings for which the basis contains at least (1/(2q)) · ((n−1)/2 − 1 choose q−1) tiling (hence, irredundant) motifs. Hence, no worst-case polynomial-time algorithm can exist for finding the basis with arbitrary values of q ≥ 2. Nonetheless, we can prove that the tiling motifs in our basis are less than (n−1 choose q−1) in number and occur in s a total of q·(n−1 choose q−1) times at most. For them there exists a pseudo-polynomial-time algorithm taking O(q² (n−1 choose q−1)²) time, which shows that the tiling motifs can be found in polynomial time if and only if the quorum q satisfies either q = O(1) or q = n − O(1) (the latter is hardly meaningful in practice). Experimenting with small strings exhibits a non-constant growth of the basis for increasing values of q up to O(log n), but larger values of q are possible in the worst case. More experimental analysis of the implementation can be found in [11]. Proofs of all results can also be found in [11].

Related work: As previously mentioned, the seminal idea of basis was introduced by Parida et al. [7]. The unpublished manuscript [1] adopted an identical definition of irredundant motifs in the first part. Very recently, Apostolico [4] observed that the O(n³)-time algorithm proposed in the second part of [1] contains an implicit definition different from that of the first part. Namely, in a redundant motif x, the list Lx can be "deduced" from the union of the others (see also [3]). Note that no formal specification of this alternative definition is, however, made explicit. Applications of the basis of repeated patterns (with just q = 2) to data compression are described in [2]. Tiling motifs can be employed in this context because of their linear number of occurrences in total. The idea of the basis was also explored by Pelfrêne et al. [8,9], who introduced the notion of primitive motifs. They gave two alternative definitions claimed to be equivalent, one definition reported in the two-page abstract accompanying the poster and the other in the poster itself. The basis defined in the poster is not symmetric and is a superset of the one presented in this paper. On the other hand, the definition of primitive motifs given in the two-page abstract is somehow equivalent to that given in this paper and introduced independently in our technical report [10]. Because of the lower bounds proved in this paper, the algorithm in [9] is exponential with respect to q. The problem of finding a polynomial-size basis for higher values of q remains unsolved.
2 Irredundant Motifs: The Basis and Its Size for q = 2
We consider strings that are finite sequences of letters drawn from an alphabet Σ, whose elements are also called solid characters. We introduce an additional letter (denoted by ◦ and called don't care) that does not belong to Σ and matches any letter. The length of a string t with don't cares, denoted by |t|, is the number of letters in t, and t[i] indicates the letter at position i in t for 0 ≤ i ≤ |t| − 1 (hence, t = t[0]t[1]···t[|t|−1], also noted t[0..|t|−1]). A pattern is a string in Σ ∪ Σ(Σ ∪ {◦})*Σ, that is, it starts and ends with a solid character. The pattern occurrences are related to the specificity relation ⪯. For individual characters σ_1, σ_2 ∈ Σ ∪ {◦}, we have σ_1 ⪯ σ_2 if σ_1 = ◦ or σ_1 = σ_2. Relation ⪯ extends to strings in (Σ ∪ {◦})* under the convention that each string t is implicitly surrounded by don't cares, namely, letter t[j] is ◦ when j < 0 or j ≥ |t|. In this way, v is more specific than u (shortly, u ⪯ v) if u[j] ⪯ v[j] for any integer j. We also say that u occurs at position ℓ in v if u[j] ⪯ v[ℓ + j] for 0 ≤ j ≤ |u| − 1. Equivalently, we say that u matches v[ℓ]···v[ℓ + |u| − 1]. For the input string s ∈ Σ* with n = |s|, we consider the occurrences of arbitrary patterns x in s. The location list Lx ⊆ [0..n−1] denotes the set of all the positions on s at which x occurs. For example, the location list of x = T◦E in s = COMMITTEE is Lx = {5, 6}.

Definition 1 (Motif). Given a parameter q ≥ 2 called quorum, we say that pattern x is a motif according to s and q if |Lx| ≥ q.

Given any location list Lx and any integer d, we adopt the notation Lx + d = {ℓ + d | ℓ ∈ Lx} for indicating the occurrences in Lx "displaced" by the offset d.

Definition 2 (Maximality). A motif x is maximal if no other motif y such that x occurs in y satisfies Ly = Lx + d for some integer d.

Making a maximal motif x more specific (thus obtaining y) reduces the number of its occurrences in s. Definition 2 is equivalent to that in [7] stating that x is
maximal if there exist no other motif y and no integer d ≥ 0 verifying Lx = Ly + d, such that x[j] ⪯ y[j + d] for 0 ≤ j ≤ |x| − 1.

Definition 3 (Irredundant Motif). A maximal motif x is irredundant if, for any maximal motifs y_1, y_2, ..., y_k such that Lx = ∪_{i=1}^{k} Ly_i, motif x must be one of the y_i's. Vice versa, if all the y_i's are different from x, pattern x is said to be covered by motifs y_1, y_2, ..., y_k.

The basis of irredundant motifs for string s is the set of all irredundant motifs in s, useful as a generator for all maximal motifs in s (see [7]). The size of the basis is the number of irredundant motifs contained in it. We now show the existence of an infinite family of strings s_k (k ≥ 5) for which there are Ω(n²) irredundant motifs in the basis already for quorum q = 2, where n = |s_k|. In this way, we disprove the upper bound of 3n, which is based on an incorrect lemma (see also [11]). Each string s_k is the suitable extension of t_k = A^k T A^k, where A^k denotes the letter A repeated k times (our argument works also for z^k w z^k, where |z| = |w| and z is a string not sharing any common character with w). String t_k has an exponential number of maximal motifs, including those having the form A{A, ◦}^{k−2}A with exactly two don't cares. To see why, each such motif x occurs four times in t_k: specifically, two occurrences of x match the first and the last k letters in t_k, while each distinct don't care in x matching the letter T in t_k contributes one of the two remaining occurrences. Extending x or replacing a don't care with a solid character reduces the number of these occurrences, so x is maximal. The idea of our proof is to obtain strings s_k by prefixing t_k with O(|t_k|) symbols to transform the above maximal motifs x into irredundant motifs for s_k. Since there are Θ(k²) of them, and n = |s_k| = O(|t_k|) = O(k), this leads to the result. In order to define s_k on the alphabet {A, T, u, v, w, x, y, z, a_1, a_2, ..., a_{k−2}}, we introduce a few notations. For a string u, let u^R denote its reversal, and let ev_k, od_k, u_k, v_k be

  if k is even:  ev_k = a_2 a_4 ··· a_{k−2},  od_k = a_1 a_3 ··· a_{k−3},
                 u_k = ev_k u (ev_k)^R vw ev_k,  v_k = od_k xy (od_k)^R z od_k;
  if k is odd:   ev_k = a_2 a_4 ··· a_{k−3},  od_k = a_1 a_3 ··· a_{k−2},
                 u_k = ev_k uv (ev_k)^R wx ev_k,  v_k = od_k y (od_k)^R z od_k.
The strings s_k are then defined by s_k = u_k v_k t_k for k ≥ 5.

Lemma 1. The length of u_k v_k is 3k, and that of s_k is n = 5k + 1.

Proposition 1. For 1 ≤ p ≤ k − 2, any motif of the form A^p ◦ A^{k−p−1} with one don't care cannot be maximal in s_k. Also, motif A^k cannot be maximal in s_k.

Proposition 2. Each motif of the form A{A, ◦}^{k−2}A with exactly two don't cares is irredundant in s_k.

Theorem 1. The basis for string s_k contains Ω(n²) irredundant motifs, where n = |s_k| and k ≥ 5.
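The length claims of Lemma 1 can be checked mechanically. The sketch below is ours and relies on our reading of the (garbled in this copy) definitions of ev_k and od_k above, so treat it as illustrative rather than authoritative; each list entry stands for one symbol of s_k.

```python
def build_sk(k):
    """s_k = u_k v_k t_k over the alphabet {A, T, u, v, w, x, y, z, a_1..a_{k-2}}."""
    a = {i: f"a{i}" for i in range(1, k - 1)}
    if k % 2 == 0:
        ev = [a[i] for i in range(2, k - 1, 2)]       # a2 a4 ... a_{k-2}
        od = [a[i] for i in range(1, k - 2, 2)]       # a1 a3 ... a_{k-3}
        u_k = ev + ["u"] + ev[::-1] + ["v", "w"] + ev
        v_k = od + ["x", "y"] + od[::-1] + ["z"] + od
    else:
        ev = [a[i] for i in range(2, k - 2, 2)]       # a2 a4 ... a_{k-3}
        od = [a[i] for i in range(1, k - 1, 2)]       # a1 a3 ... a_{k-2}
        u_k = ev + ["u", "v"] + ev[::-1] + ["w", "x"] + ev
        v_k = od + ["y"] + od[::-1] + ["z"] + od
    t_k = ["A"] * k + ["T"] + ["A"] * k
    assert len(u_k + v_k) == 3 * k                    # Lemma 1, first claim
    return u_k + v_k + t_k

for k in range(5, 12):
    assert len(build_sk(k)) == 5 * k + 1              # Lemma 1, n = 5k + 1
```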
3 Tiling Motifs: The Basis and Its Properties
In this section we introduce a natural notion of basis for generating all maximal motifs occurring in a string s of length n. Analogously to what was done for maximal motifs in Definition 2, we introduce displacements while defining tiling motifs for this purpose.

Definition 4 (Tiling Motif). A maximal motif x is tiling if, for any maximal motifs y_1, y_2, ..., y_k and for any integers d_1, d_2, ..., d_k such that Lx = ∪_{i=1}^{k}(Ly_i + d_i), motif x must be one of the y_i's. Vice versa, if all the y_i's are different from x, pattern x is said to be tiled by motifs y_1, y_2, ..., y_k.

The notion of tiling is more selective than that of irredundancy in general. For example, in the string s = FABCXFADCYZEADCEADC, motif x_1 = A◦C is irredundant but it is tiled by x_2 = FA◦C and x_3 = ADC according to Definition 4, since its location list, Lx_1 = {1, 6, 12, 16}, can be obtained from the union of Lx_2 = {0, 5} and Lx_3 = {6, 12, 16} with respective displacements d_2 = 1 and d_3 = 0. A fairly direct consequence of Definition 4 is that if x is tiled by y_1, y_2, ..., y_k with associated displacements d_1, d_2, ..., d_k, then x occurs at position d_i in each y_i for 1 ≤ i ≤ k (hence d_i ≥ 0). Note that the y_i's in Definition 4 are not necessarily distinct and that k > 1 for tiled motifs (this follows from the fact that Lx = Ly_1 + d_1 with x ≠ y_1 would contradict the maximality of both x and y_1). As a result, a maximal motif x occurring exactly q times in s is tiling, as it cannot be tiled by any other motifs (we need at least two of them, which is impossible). The basis of tiling motifs is the complete set of all tiling motifs for s, and the size of the basis is the number of these motifs. For example, the basis B for FABCXFADCYZEADCEADC contains FA◦C, EADC, and ADC as tiling motifs. Although Definition 4 is derived from that of irredundant motifs given in Definition 3, the difference is much more substantial than it may appear.
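The worked example can be replayed in a few lines; the helper below is our own illustration (the don't care is written ◦).

```python
DONT_CARE = "◦"

def location_list(x, s):
    """Lx: starting positions where every solid character of x matches s."""
    return {i for i in range(len(s) - len(x) + 1)
            if all(c == DONT_CARE or c == s[i + j] for j, c in enumerate(x))}

s = "FABCXFADCYZEADCEADC"
L1 = location_list("A◦C", s)    # the tiled motif x1
L2 = location_list("FA◦C", s)   # x2, used with displacement d2 = 1
L3 = location_list("ADC", s)    # x3, used with displacement d3 = 0
assert L1 == {1, 6, 12, 16} and L2 == {0, 5} and L3 == {6, 12, 16}
assert L1 == {p + 1 for p in L2} | L3   # Lx1 = (Lx2 + 1) ∪ Lx3
```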
The basis of tiling motifs relies on the fact that tiling motifs are considered as invariant by displacement, as for maximality. Consequently, our definition of basis is symmetric, that is, each tiling motif in the basis for the reverse string s̄ is the reverse of a tiling motif in the basis of s. This follows from the symmetry in Definition 4 and from the fact that maximality is also symmetric in Definition 2. It is a sine qua non condition for having a notion of basis invariant by the left-to-right or right-to-left order of the symbols in s (like the entropy of s), while this property does not hold for the irredundant motifs. The basis of tiling motifs has further interesting properties. Later in this section, we show that our basis is linear for quorum q = 2 (i.e., its size is at most n − 1) and that the total size of the location lists for the tiling motifs is less than 2n, describing how to find the basis in O(n² log n log |Σ|) time. In the full paper [11], we discuss some applications, such as generating all maximal motifs with the basis and finding motifs with a constraint on the number of don't cares. Given a string s of length n, let B denote its basis of tiling motifs for quorum q = 2. Although the number of maximal motifs may be exponential and the basis of irredundant motifs may be at least quadratic (see Section 2), we show that the size of B is always less than n. For this, we introduce an operator ⊕ between the symbols of Σ to define merges, which are at the heart of
628
N. Pisanti et al.
the properties of B. Given two letters σ1, σ2 ∈ Σ with σ1 ≠ σ2, the operator satisfies σ1 ⊕ σ2 = ◦ and σ1 ⊕ σ1 = σ1. The operator applies to any pair of strings x, y ∈ Σ∗, so that u = x ⊕ y satisfies u[j] = x[j] ⊕ y[j] for all integers j. A merge is the motif resulting from applying the operator ⊕ to s and to its suffix at position k. Definition 5 (Merge). For 1 ≤ k ≤ n − 1, let sk be the string whose character at position i is sk[i] = s[i] ⊕ s[i + k]. If sk contains at least one solid character, Merge k denotes the motif obtained by removing all the leading and trailing don't cares in sk (i.e., those appearing before the leftmost solid character and after the rightmost solid character). For example, the string FABCXFADCYZEADCEADC has Merge 4 = EADC, Merge 5 = FA◦C, Merge 6 = Merge 10 = ADC, and Merge 11 = Merge 15 = A◦C. The latter is the only merge that is not a tiling motif. Lemma 2. If Merge k exists, it must be a maximal motif. Lemma 3. For each tiling motif x in the basis B, there is at least one k for which Merge k = x. Theorem 2. Given a string s of length n and the quorum q = 2, let M be the set of Merge k, for 1 ≤ k ≤ n − 1 such that Merge k exists. The basis B of tiling motifs for s satisfies B ⊆ M, and therefore the size of B is at most n − 1. A simple consequence of Theorem 2 is a tight bound on the number of tiling motifs for periodic strings: if s = w^e for a string w repeated e > 1 times, then s has at most |w| tiling motifs. Corollary 1. The number of tiling motifs for s is at most p, the smallest period of s. The bound in Corollary 1 is not valid for irredundant motifs. For example, the string s = ATATATATA has period p = 2 and only one tiling motif, ATATATA, while its irredundant motifs are A, ATA, ATATA and ATATATA. We now describe how to compute the basis B for a string s when q = 2. A brute-force algorithm generating first all maximal motifs of s takes exponential time in the worst case.
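Definition 5 can be sketched directly (our own illustration; '.' plays the role of the don't care symbol ◦):

```python
def merge_k(s, k):
    """Merge_k per Definition 5: superpose s with its k-shifted copy
    using the ⊕ operator, then trim leading/trailing don't cares."""
    def op(a, b):
        return a if a == b else '.'     # sigma ⊕ sigma = sigma, else don't care
    sk = ''.join(op(s[i], s[i + k]) for i in range(len(s) - k))
    return sk.strip('.') or None        # None if no solid character survives

s = "FABCXFADCYZEADCEADC"
print(merge_k(s, 4))   # EADC
print(merge_k(s, 5))   # FA.C
print(merge_k(s, 6))   # ADC
print(merge_k(s, 11))  # A.C
```

These reproduce exactly the merges listed in the example above.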
Theorem 2 plays a crucial role in that we first compute the motifs in M and then discard those being tiled. Since B ⊆ M, what remains is exactly B. To appreciate this approach, it is worth noting that we are left with the problem of selecting B from the at most n − 1 maximal motifs in M, rather than selecting B among all the maximal motifs in s, which may be exponential in number. Our simple algorithm takes O(n² log n log |Σ|) time and is faster than previous (and more complicated) methods. Step 1. Compute the Multiset M of Merges. Letting sk[i] be the leftmost solid character of string sk in Definition 5, we define occ_x = {i, i + k} to be the positions of the two occurrences of x whose superposition generates x = Merge k. For k = 1, 2, ..., n − 1, we compute string sk in O(n − k) time. If sk contains some
A Basis of Tiling Motifs for Generating Repeated Patterns
629
solid characters, we compute x = Merge k and occ_x in the same time complexity. As a result, we compute the multiset M of merges in O(n²) time. Each merge x in M is identified by a triplet (i, i + k, |x|), from which we can recover the jth symbol of x in constant time by simple arithmetic operations and comparisons. Step 2. Transform the Multiset of Merges into the Set M of Distinct Merges. Since there can be two or more merges in the multiset that are identical and correspond to the same merge in M, we put together all identical merges by performing radix sorting on the triplets representing them. The total cost of this step is dominated by radix sorting, giving O(n² log |Σ|) time. As a byproduct, we produce the temporary location list Tx = ∪ occ_x′, the union taken over the merges x′ identical to x, for each distinct x ∈ M thus obtained. Lemma 4. Each motif x ∈ B satisfies Tx = Lx. Step 3. Select M∗ ⊆ M, where M∗ = {x ∈ M : Tx = Lx}. In order to build M∗, we employ the Fischer–Paterson algorithm based on convolution [5] for string matching with don't cares to compute the whole list of occurrences Lx for each merge x ∈ M. Its cost is O((|x| + n) log n log |Σ|) time for each merge x. Since |x| < n and there are at most n − 1 motifs x ∈ M, we obtain O(n² log n log |Σ|) time to construct all lists Lx. We can then compute M∗ by discarding the merges x ∈ M such that Tx ≠ Lx in additional O(n²) time. Lemma 5. The set M∗ satisfies the conditions B ⊆ M∗ and Σ_{x∈M∗} |Lx| < 2n. The property of M∗ in Lemma 5 is crucial in that Σ_{x∈M} |Lx| = Θ(n²) when many lists contain Θ(n) entries. For example, s = A^n has n − 1 distinct merges, each of the form x = A^i for 1 ≤ i ≤ n − 1, and so |Lx| = n − i + 1. This would be a sharp drawback in Step 4, when removing tiled motifs, as it may turn into a Θ(n³) algorithm. Using M∗ instead, we are guaranteed that Σ_{x∈M∗} |Lx| = O(n); we may still have some tiled motifs in M∗, but their total number of occurrences is O(n). Step 4. Discard the Tiled Motifs in M∗. We can now check for tiling motifs in O(n²) time.
Given two distinct motifs x, y ∈ M∗, we want to test whether Lx + d ⊆ Ly for some integer d and, in that case, to mark the entries in Ly that are also in Lx + d. At the end of this task, the lists having all entries marked are tiled (see Definition 4). By removing their corresponding motifs from M∗, we eventually obtain the basis B by Lemma 5. Since the meaningful values of d are the individual entries of Ly, we have only |Ly| possible values to check. For a given value of d, we avoid merging Lx and Ly in O(|Lx| + |Ly|) time to perform the test, as it would contribute a total of Θ(n³) time. Instead, we exploit the fact that each list has values ranging from 1 to n, and use a couple of bit-vectors of size n to perform the above check in O(|Lx| × |Ly|) time for all values of d. This gives O(Σ_y Σ_x |Lx| × |Ly|) = O(Σ_y |Ly| × Σ_x |Lx|) = O(n²) by Lemma 5. We therefore detail how to perform the above check with Lx and Ly in O(|Lx| × |Ly|) time. We use two bit-vectors V1 and V2, initially set to all zeros. Given y ∈ M∗, we set V1[i] = 1 if i ∈ Ly. For each x ∈ M∗ − {y} and
for each d ∈ Ly , we then perform the following test. If all j ∈ Lx + d satisfy V1 [j] = 1, we set V2 [j] = 1 for all such j. Otherwise, we take the next value of d, or the next motif if there are no more values of d, and we repeat the test. After examining all x ∈ M∗ − {y}, we check whether V1 [i] = V2 [i] for all i ∈ Ly . If so, y is tiled as its list is covered by possibly shifted location lists of other motifs. We then reset the ones in both vectors in O(|Ly |) time. Summing up Steps 1–4, the dominant cost is that of Step 3, leading to the following result. Theorem 3. Given an input string s of length n over the alphabet Σ, the basis of tiling motifs with quorum q = 2 can be computed in O(n2 log n log |Σ|) time. The total number of motifs in the basis is less than n, and the total number of their occurrences in s is less than 2n.
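Step 4 can be sketched as follows (our own Python rendering, not the paper's code; the shifts d are realized by aligning the first entry of Lx on each entry of Ly, a mild variation of the text's d ∈ Ly, and '.' stands for ◦ in motif names):

```python
def discard_tiled(lists):
    """Keep only motifs whose location list is not covered by shifted
    location lists of the other motifs (Definition 4).  `lists` maps a
    motif id to its location list; two bit-vectors of size n are used,
    mirroring V1 and V2 in the text."""
    n = max(max(L) for L in lists.values()) + 1
    basis = []
    for y, Ly in lists.items():
        V1 = [0] * n
        V2 = [0] * n
        for i in Ly:
            V1[i] = 1
        for x, Lx in lists.items():
            if x == y:
                continue
            for e in Ly:
                d = e - Lx[0]                       # align Lx[0] on entry e of Ly
                shifted = [j + d for j in Lx]
                if all(0 <= j < n and V1[j] for j in shifted):
                    for j in shifted:
                        V2[j] = 1                   # mark covered entries of Ly
        if not all(V2[i] for i in Ly):
            basis.append(y)                         # not fully covered: keep y
    return basis

# Merges of FABCXFADCYZEADCEADC with their location lists:
M = {"FA.C": [0, 5], "EADC": [11, 15], "ADC": [6, 12, 16], "A.C": [1, 6, 12, 16]}
print(discard_tiled(M))  # ['FA.C', 'EADC', 'ADC'] -- A.C is tiled away
```

On the running example this recovers exactly the basis B = {FA◦C, EADC, ADC} stated earlier.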
4 q > 2: Pseudo-Polynomial Bases for Higher Quorum
We now discuss the general case of quorum q ≥ 2 for finding the basis of a string of length n. Differently from previous work claiming a polynomial-time algorithm for any arbitrary value of q, we show in this section that no such polynomial-time algorithm can exist in the worst case, both for the basis of irredundant motifs and for the basis of tiling motifs. The size of these bases provably depends exponentially on suitable values of q ≥ 2; i.e., we give a lower bound of Ω(C((n−1)/2 − 1, q − 1)), where C(a, b) denotes the binomial coefficient. In practice, this size grows exponentially for increasing values of q up to O(log n), but larger values of q are theoretically possible in the worst case. Fixing q = (n − 1)/4 + 1 in our lower bound, we get a size of Ω(2^((n−1)/4)) motifs in the bases. On average, q = O(log_|Σ| n) by extending the argument after Theorem 3. Below, we show a further property of the basis of tiling motifs, giving an upper bound of C(n − 1, q − 1) on its size with a simple proof. Since we can find an algorithm taking time proportional to the square of that size, we can conclude that a polynomial-time algorithm for finding the basis of tiling motifs exists in the worst case if and only if the quorum q satisfies either q = O(1) or q = n − O(1) (the latter condition is hardly meaningful in practice). We now show the existence of a family of strings having at least C((n−1)/2 − 1, q − 1) tiling motifs for a quorum q. Since a tiling motif is also irredundant, this gives a lower bound for the irredundant motifs, to be combined with that of Section 2 (the latter lower bound still gives Ω(n²) for q ≥ 2). The strings used in the bound are this time tk = A^k T A^k (k ≥ 5) themselves, without the left extension of Section 2. The proof proceeds by exhibiting C(k − 1, q − 1) motifs that are maximal and have each exactly q occurrences, whence it follows immediately that they are tiling (indeed the remark made after Definition 4 holds for any q ≥ 2). Proposition 3.
For 2 ≤ q ≤ k and 1 ≤ p ≤ k − q + 1, any motif of the type A^p ◦ {A, ◦}^(k−p−1) ◦ A^p with exactly q don't cares is tiling (and so irredundant) in tk. Theorem 4. String tk has C((n−1)/2 − 1, q − 1) = Ω((1/2^q) · C(n − 1, q − 1)) tiling (and irredundant) motifs, where n = |tk| and k ≥ 2. We now prove that C(n − 1, q − 1) is, instead, an upper bound for the size of a basis of tiling motifs for a string s and quorum q ≥ 2. Let us denote as before such
a basis by B. To prove the upper bound, we use again the notion of a merge, except that it now involves q strings. The operator ⊕ between the elements of Σ is the same as before. Let k be an array of q − 1 positive values k1, ..., k(q−1) with 1 ≤ ki < kj ≤ n − 1 for all 1 ≤ i < j ≤ q − 1. A merge is the (non-empty) pattern that results from applying the operator ⊕ to the string s and to s itself q − 1 times, each time shifted by ki positions to the right for 1 ≤ i ≤ q − 1. Lemma 6. If Merge k exists for quorum q, it must be a maximal motif. Lemma 7. For each tiling motif x in the basis B with quorum q, there is at least one k for which Merge k = x. Theorem 5. Given a string s of length n and a quorum q, let M be the set of Merge k, over the C(n − 1, q − 1) possible choices of k for which Merge k exists. The basis B of tiling motifs satisfies B ⊆ M, and therefore |B| ≤ C(n − 1, q − 1). The tiling motifs in our basis appear in s a total of at most q · C(n − 1, q − 1) times. A generalization of the algorithm given in Section 3 gives a pseudo-polynomial time complexity of O(q² · C(n − 1, q − 1)²).
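The q-string merge can be sketched as follows (our own illustration; '.' stands for ◦, and the string is the lower-bound example t5 = A^5 T A^5 with quorum q = 3):

```python
from itertools import combinations

def merge(s, shifts):
    """Merge for quorum q: superpose s with its copies shifted by each
    k in `shifts` (an array of q - 1 distinct positive offsets), using
    the same ⊕ operator as before, then trim boundary don't cares."""
    m = len(s) - max(shifts)
    if m <= 0:
        return None
    def op(chars):
        return chars[0] if len(set(chars)) == 1 else '.'
    sk = ''.join(op([s[i]] + [s[i + k] for k in shifts]) for i in range(m))
    return sk.strip('.') or None

s = "AAAAATAAAAA"          # t_5 = A^5 T A^5, n = 11
merges = {merge(s, ks) for ks in combinations(range(1, len(s)), 2)}  # q = 3
merges.discard(None)
print(merge(s, (1, 2)))    # AAA...AAA
```

By Theorem 5 the set `merges` contains the whole basis, and its size is bounded by C(n−1, q−1) = C(10, 2) = 45 choices of shifts.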
References
1. A. Apostolico and L. Parida. Incremental paradigms of motif discovery. Unpublished manuscript, 2002.
2. A. Apostolico and L. Parida. Compression and the wheel of fortune. In IEEE Data Compression Conference (DCC 2003), pages 143–152, 2003.
3. A. Apostolico. Pattern discovery and the algorithmics of surprise. In NATO ASI on Artificial Intelligence and Heuristic Methods for Bioinformatics. IOS Press, 2003.
4. A. Apostolico. Personal communication, May 2003.
5. M. Fischer and M. Paterson. String matching and other products. In R. Karp, editor, SIAM–AMS Complexity of Computation, pages 113–125, 1974.
6. H. Mannila. Local and global methods in data mining: basic techniques and open problems. In P. et al., editor, International Colloquium on Automata, Languages, and Programming, volume 2380 of LNCS, pages 57–68. Springer-Verlag, 2002.
7. L. Parida, I. Rigoutsos, A. Floratos, D. Platt, and Y. Gao. Pattern discovery on character sets and real-valued data: linear bound on irredundant motifs and efficient polynomial time algorithm. In SIAM Symposium on Discrete Algorithms, 2000.
8. J. Pelfrêne, S. Abdeddaïm, and J. Alexandre. Un algorithme d'indexation de motifs approchés [An algorithm for indexing approximate motifs]. In Journées Ouvertes Biologie Informatique Mathématiques (JOBIM), pages 263–264, 2002.
9. J. Pelfrêne, S. Abdeddaïm, and J. Alexandre. Extracting approximate patterns. In Combinatorial Pattern Matching, 2003. To appear.
10. N. Pisanti, M. Crochemore, R. Grossi, and M.-F. Sagot. A basis for repeated motifs in pattern discovery and text mining. Technical Report IGM 2002-10, Institut Gaspard-Monge, University of Marne-la-Vallée, July 2002.
11. N. Pisanti, M. Crochemore, R. Grossi, and M.-F. Sagot. Bases of motifs for generating repeated patterns with don't cares. Technical Report TR-03-02, Dipartimento di Informatica, University of Pisa, January 2003.
On the Complexity of Some Equivalence Problems for Propositional Calculi

Steffen Reith
3Soft GmbH, Frauenweiherstr. 14, D-91058 Erlangen, Germany
[email protected]
Abstract. In the present paper¹ we study the complexity of Boolean equivalence problems (i.e., do two given propositional formulas have the same truth table?) and of Boolean isomorphism problems (i.e., does there exist a permutation of the variables of one propositional formula such that the truth table of the modified formula coincides with the truth table of the second formula?) for generalized propositional formulas and certain classes of Boolean circuits. Keywords: computational complexity, Boolean functions, Boolean isomorphism, Boolean equivalence, closed classes, dichotomy, Post, satisfiability problems.
1 Introduction
In 1921 E. L. Post gave a full characterization of all classes of Boolean functions which are closed under superposition (i.e., substitution of Boolean functions, permutation and identification of variables, and introduction of fictive variables). Based on his results (see [9]) we define, for a finite set B of Boolean functions, the so-called B-formulas and B-circuits, which are closely related to Post's closed classes of Boolean functions. To be more precise: every B-formula and B-circuit represents a Boolean function in the closure of B; hence B-formulas form generalized propositional calculi, since the classical formulas and circuits are mostly restricted to B = {∧, ∨, ¬}. The satisfiability problem of B-formulas was first studied by H. Lewis. In his paper [8] he was able to show that the satisfiability problem of B-formulas is NP-complete iff the Boolean function represented by x ∧ ¬y is in the closure of B, and is solvable in deterministic polynomial time otherwise. Theorems of this form are called dichotomy theorems, because they deal with problems which are either among the hardest in a given complexity class or easy to solve. One of the best-known, and the first, theorems of this kind was proven by Schaefer (see [12]), giving exhaustive results about the satisfiability of generalized propositional formulas in conjunctive normal form. The work [2] can be seen as a
¹ Work done in part while employed at Julius-Maximilians-Universität Würzburg. For a full version of this paper see: http://www.streit.cc/dl/
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 632–641, 2003. © Springer-Verlag Berlin Heidelberg 2003
counterpart of the present paper in Schaefer's framework, because there the same equivalence problems are studied for formulas in generalized conjunctive normal form. Besides asking for a satisfying assignment, other interesting problems in the theory of Boolean functions have been studied. Two of them are the properties of being equal or isomorphic (i.e., is there a permutation of the variables of one function such that the modified function is equal to the other function?). A list of early references can be found in [3], stressing the importance of this kind of problem. In the case of classical propositional formulas it is known that the Boolean equivalence problem for formulas and circuits is coNP-complete, whereas only very weak lower and upper bounds for the Boolean isomorphism problem are known. By a reduction from the tautology problem, which is coNP-complete, a lower bound for the isomorphism problem can easily be derived. An upper bound for the isomorphism problem is clearly Σ_2^p: the Σ_2^p-machine can existentially guess a permutation and universally check the resulting formula for equality by using its oracle. In [1] Agrawal and Thierauf show that the complement of the Boolean isomorphism problem for formulas and circuits has a one-round interactive proof in which the verifier has access to an NP oracle. Using this result they also show that if the Boolean isomorphism problem is Σ_2^p-complete, then the polynomial hierarchy collapses to Σ_3^p. Agrawal and Thierauf also give a better lower bound for the isomorphism problem of classical propositional formulas. More precisely, they have proven that UOCLIQUE ≤_m^p ISOF holds, where ISOF is the isomorphism problem of propositional formulas built out of ∧, ∨ and ¬, and UOCLIQUE is the problem of checking whether the biggest clique in a graph is unique.
It is known that UOCLIQUE is ≤_m^p-hard for 1-NP, a superclass of coNP, where 1-NP denotes the class of all problems whose solution can be found on exactly one path of a nondeterministic polynomial-time machine. In the present paper we focus on the complexity of checking whether two given B-formulas (B-circuits, resp.) represent the same Boolean function (the equivalence problem for B-formulas (B-circuits, resp.)) or whether they represent isomorphic Boolean functions (the isomorphism problem for B-formulas (B-circuits, resp.)). We give, where possible, tight upper and lower bounds for the isomorphism problem of B-formulas and B-circuits. In all other cases we show coNP-hardness for the isomorphism problem, which is as good as the trivial lower bound in the classical case. Note that the known upper bounds for the usual isomorphism problem hold for our B-formulas and B-circuits as well, since we work with special non-complete sets of Boolean functions as connectors. In the case of the equivalence problems we always give tight upper and lower bounds, showing that these problems are either in L, NL-complete, ⊕L-complete or coNP-complete, where the complexity class L (NL, resp.) is defined as the class of problems which can be solved by a deterministic (nondeterministic, resp.) logarithmically space-bounded Turing machine, and
by ⊕L we denote the class of decision problems solvable by an NL machine such that the input is accepted iff the number of accepting paths is odd. After presenting some notions and preliminary results on closed classes of Boolean functions in Section 2, we turn to the equivalence and isomorphism problems for B-circuits and B-formulas in Section 3. Finally, Section 4 concludes.
2 Preliminaries
Any function of the kind f : {0, 1}^k → {0, 1} will be called a (k-ary) Boolean function. The set of all Boolean functions will be denoted by BF. Now let B be a finite set of Boolean functions. In the following we give a description of B-circuits and B-formulas. A B-circuit is a directed acyclic graph where each node is labeled either with a variable xi or with a function out of B. The nodes of such a B-circuit are called gates, the edges are called wires. The number of wires pointing into a gate is called fan-in and the number of wires leaving a gate is named fan-out. Moreover, we order the wires pointing into a gate. If a wire leaves a gate u and points to gate v, we call u a predecessor-gate of v. Additionally, the gates labeled by a variable xi must have fan-in 0, and we call them input-gates. The gates labeled by a k-ary function f ∈ B must have fan-in k. Finally, we mark one particular gate o and call it the output-gate. Since a Boolean formula can be interpreted as a tree-like circuit, it is reasonable to define B-formulas as the subset of B-circuits C such that each gate of C has fan-out at most 1. Each B-circuit C(x1, ..., xn) computes a Boolean function f_C(x1, ..., xn). Given an n-bit input string a = a1 ... an, every gate in C computes a Boolean value as follows: the input-gate xi computes ai for 1 ≤ i ≤ n, and each non-input-gate v computes the value g(b1, ..., bm), where g ∈ B is m-ary and b1, ..., bm are the values computed by the predecessor-gates of v, ordered according to the order of the wires pointing into v. The value f_C(a1, ..., an) computed by C is defined as the value computed by the output-gate o. Let V = {x1, ..., xn} be a finite set of Boolean variables. An assignment w.r.t. V is a function I : {x1, ..., xn} → {0, 1}. If V is clear from the context we simply say assignment. If there is an obvious ordering of the variables, we also use (a1, ..., an) instead of {I(x1) := a1, ..., I(xn) := an}.
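The gate-by-gate evaluation just described can be sketched in Python; the dictionary encoding of a circuit (gates listed in topological order, `("var", name)` for input-gates) is our own assumption, not a format from the paper:

```python
def eval_circuit(gates, output, assignment):
    """Evaluate a B-circuit.  `gates` maps a gate id either to
    ("var", variable_name) for an input-gate, or to (f, predecessors)
    where f is a Boolean function and the predecessor list respects the
    wire order.  Gates are assumed to appear in topological order."""
    val = {}
    for g, spec in gates.items():
        if spec[0] == "var":
            val[g] = assignment[spec[1]]
        else:
            f, preds = spec
            val[g] = f(*(val[p] for p in preds))
    return val[output]

# An {and, or}-circuit for (x1 ∨ x2) ∧ x2; note g2 has fan-out 2:
gates = {
    "g1": ("var", "x1"),
    "g2": ("var", "x2"),
    "g3": (lambda a, b: a | b, ["g1", "g2"]),
    "g4": (lambda a, b: a & b, ["g3", "g2"]),
}
print(eval_circuit(gates, "g4", {"x1": 1, "x2": 0}))  # 0
```

A gate with fan-out greater than 1 (here g2) is exactly what distinguishes a general B-circuit from a B-formula.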
Let {xi1, ..., xim} ⊆ V. By I′ = I/{xi1, ..., xim} we denote the restricted assignment w.r.t. {xi1, ..., xim}, which is defined by I′(x) = I(x) for x ∈ {xi1, ..., xim}. In order for an assignment I w.r.t. V to be compatible with a B-circuit C (B-formula H, resp.) we must have V = Var(C) (V = Var(H), resp.). An assignment I satisfies a circuit C(x1, ..., xn) (formula H(x1, ..., xn), resp.) iff f_C(I(x1), ..., I(xn)) = 1 (f_H(I(x1), ..., I(xn)) = 1, resp.). For an assignment I which satisfies C (H, resp.) we write I |= C (I |= H, resp.). The number of satisfying assignments (non-satisfying assignments, resp.) of C is denoted by #_1(C) =def |{I | I |= C}| (#_0(C) =def |{I | I ⊭ C}|, resp.). A variable xi is called fictive iff f(a1, ..., ai−1, 0, ai+1, ..., an) = f(a1, ..., ai−1, 1, ai+1, ..., an) for all a1, ..., ai−1, ai+1, ..., an.
For simplicity we often use a formula instead of a Boolean function. For example, the functions id(x), and(x, y), or(x, y), not(x), xor(x, y) are represented by the formulas x, x ∧ y, x ∨ y, ¬x and x ⊕ y. Sometimes x̄ is used instead of ¬x. We will use 0 and 1 for the constant 0-ary Boolean functions. Finally, keep in mind that the term gate-type is replaced by function-symbol when we work with B-formulas. Now we identify the class of Boolean functions which can be computed by a B-circuit (B-formula, resp.). For this, let B be a set of Boolean functions. By [B] we denote the smallest set of Boolean functions which contains B ∪ {id} and is closed under superposition, i.e., under substitution (composition of functions), permutation and identification of variables, and introduction of fictive variables. We call a set F of Boolean functions a base for B if [F] = B, and F is called closed if [F] = F. A base B is called complete if [B] = BF. For an n-ary Boolean function f, its dual function dual(f) is defined by dual(f)(x1, ..., xn) =def ¬f(¬x1, ..., ¬xn). For a set B of Boolean functions we define dual(B) =def {dual(f) | f ∈ B}. Clearly dual(dual(B)) = B. Furthermore, we define dual(H) (dual(C), resp.) to be the dual(B)-formula (dual(B)-circuit, resp.) that emerges when we replace all function-symbols in H (gate-types in C, resp.) by the symbol of their dual function (by the gate-type of their dual function, resp.). Clearly f_dual(H) = dual(f_H) (f_dual(C) = dual(f_C), resp.). Emil Post gave in [9] a complete list of all classes of Boolean functions closed under superposition. Moreover, he showed that each closed class has a finite base. The following proposition gives some bases for the closed classes which play a role in this paper. Proposition 1 ([9,7,10]). Every closed class of Boolean functions has a finite base. In particular:

  BF:  {and, or, not}
  M:   {and, or, 0, 1}
  L:   {xor, 1}
  L2:  {x ⊕ y ⊕ z}
  S00: {x ∨ (y ∧ z)}
  S10: {x ∧ (y ∨ z)}
  D:   {(x ∧ ¬y) ∨ (x ∧ ¬z) ∨ (¬y ∧ ¬z)}
  D2:  {(x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z)}
  E:   {and, 0, 1}
  E2:  {and}
  V:   {or, 0, 1}
  V2:  {or}
  N:   {not, 1}
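The dual operator defined above is easy to check on truth tables (our own illustration; functions are encoded as dictionaries from argument tuples to values):

```python
from itertools import product

def dual(f, n):
    """dual(f)(x1,...,xn) = ¬f(¬x1,...,¬xn), as a truth-table dict."""
    return {xs: 1 - f[tuple(1 - x for x in xs)]
            for xs in product((0, 1), repeat=n)}

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR  = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
print(dual(AND, 2) == OR)          # True: and and or are dual
print(dual(dual(OR, 2), 2) == OR)  # True: dual is an involution
```

The involution property is what yields dual(dual(B)) = B.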
Now we need some definitions: Definition 2. Let C(x1 , . . . , xn ) be a B-circuit and π : {1, . . . , n} → {1, . . . , n} be a permutation. By π(C(x1 , . . . , xn )) we denote the B-circuit which emerges when we replace the variables xi for 1 ≤ i ≤ n by xπ(i) in C(x1 , . . . , xn ) simultaneously. Next we will define two equivalence relations for B-circuits:
636
Steffen Reith
Definition 3. Let C1 and C2 be B-circuits. The Boolean equivalence and isomorphism relations for B-circuits are defined as follows: C1 ≡ C2 ⇔def f_C1(a1, ..., an) = f_C2(a1, ..., an) for all a1, ..., an ∈ {0, 1}. C1 ≅ C2 ⇔def there exists a permutation π : {1, ..., n} → {1, ..., n} such that π(C1) ≡ C2. These definitions are used analogously for B-formulas. Note that in both cases the sets of input variables of C1 and C2 are equal; this can easily be achieved by adding fictive variables when needed. Moreover, note that C1 ≡ C2 iff dual(C1) ≡ dual(C2). Now we are ready to define the Boolean equivalence problem and the Boolean isomorphism problem for B-circuits: Problem: EQ^C(B). Instance: B-circuits C1, C2. Question: Is C1 ≡ C2?
Problem: ISO^C(B). Instance: B-circuits C1, C2. Question: Is C1 ≅ C2?
Analogously we define the Boolean equivalence problem EQ^F(B) and the Boolean isomorphism problem ISO^F(B) for B-formulas. The next proposition shows that if we permute the variables of a given B-formula H, then the number of satisfying assignments #_1(H) (non-satisfying assignments #_0(H), resp.) remains equal. This proposition is used to show the coNP-hard cases. Proposition 4. Let H1(x1, ..., xn) and H2(x1, ..., xn) be B-formulas such that H1 ≅ H2. Then #_1(H1) = #_1(H2) and #_0(H1) = #_0(H2) hold. It is obvious that Proposition 4 works for B-circuits too. Note that the converse of Proposition 4 does not hold, as x ⊕ y ≇ ¬(x ⊕ y) and #_1(x ⊕ y) = #_1(¬(x ⊕ y)) = 2 shows. Now let E be an arbitrary property of a pair of B-circuits: E(B) =def {(C1, C2) | C1 and C2 are B-circuits such that f_C1 and f_C2 have property E}. This gives the following obvious proposition: Proposition 5. Let E be a property of two B-circuits, and let B and B′ be finite sets of Boolean functions. 1. If B ⊆ [B′], then E(B) ≤_m^log E(B′). 2. If [B] = [B′], then E(B) ≡_m^log E(B′).
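Proposition 4 and its non-converse can be checked by brute force on small arities (our own sketch; `isomorphic` tries all variable permutations, which is only feasible for tiny n):

```python
from itertools import permutations, product

def truth_table(f, n):
    return tuple(f(*xs) for xs in product((0, 1), repeat=n))

def isomorphic(f, g, n):
    """f ≅ g: some permutation of the variables makes the tables equal."""
    for pi in permutations(range(n)):
        permuted = lambda *xs: f(*(xs[pi[i]] for i in range(n)))
        if truth_table(permuted, n) == truth_table(g, n):
            return True
    return False

xor_ = lambda x, y: x ^ y
nxor = lambda x, y: 1 - (x ^ y)
# Equal satisfying-assignment counts, yet not isomorphic:
print(sum(truth_table(xor_, 2)), sum(truth_table(nxor, 2)))  # 2 2
print(isomorphic(xor_, nxor, 2))                             # False
```

Swapping variables is enough for genuinely isomorphic functions, e.g. x ∧ ¬y and ¬x ∧ y.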
This proposition clarifies that the complexity of EQ^F(B) and ISO^C(B) can be determined by studying the classes of Post's lattice. Now we can give an upper bound for the equivalence problem of B-formulas and B-circuits. Later we will see that this upper bound is tight for B-circuits and B-formulas in some cases. Moreover, we show that the equivalence problem and the isomorphism problem for B-circuits and dual(B)-circuits (B-formulas and dual(B)-formulas, resp.) are of equal complexity; hence we have a vertical symmetry axis in Post's lattice for the complexity of EQ^F(B), EQ^C(B), ISO^F(B) and ISO^C(B).
Proposition 6. Let B be a finite set of Boolean functions. Then the following four statements hold:
1. EQ^F(B) ≤_m^log EQ^C(B) and ISO^F(B) ≤_m^log ISO^C(B),
2. EQ^F(B) ∈ coNP and EQ^C(B) ∈ coNP,
3. EQ^F(B) ≡_m^log EQ^F(dual(B)) and EQ^C(B) ≡_m^log EQ^C(dual(B)),
4. ISO^F(B) ≡_m^log ISO^F(dual(B)) and ISO^C(B) ≡_m^log ISO^C(dual(B)).
The circuit-value problem for B-circuits is defined as follows: Problem: VAL^C(B). Instance: A B-circuit C(x1, ..., xn) and an assignment (a1, ..., an). Question: Is f_C(a1, ..., an) = 1? Similarly we define the formula-value problem VAL^F(B) for B-formulas. The following proposition gives some information about the circuit-value problem of B-circuits. A complete classification of the circuit-value problem for all closed classes can be found in [11,10], which generalizes a result of [4], where only two-input gate-types were studied. Proposition 7 ([11]). 1. Let B be a finite set of Boolean functions such that V2 ⊆ [B] ⊆ V or E2 ⊆ [B] ⊆ E; then VAL^C(B) is ≤_m^log-complete for NL. 2. Let B be a finite set of Boolean functions such that L2 ⊆ [B] ⊆ L; then VAL^C(B) is ≤_m^log-complete for ⊕L. Proposition 8 ([5,6]). 1. L^⊕L = ⊕L. 2. L^NL = NL.
To show, in certain cases, lower bounds for the equivalence and isomorphism problems of B-formulas and B-circuits, we use the following well-known graph-theoretic problems: Problem: Graph Accessibility Problem (GAP). Instance: A directed acyclic graph G whose vertices have outdegree 0 or 2, a start vertex s, and a target vertex t. Question: Is there a path in G which leads from s to t? Problem: Graph Odd Accessibility Problem (GOAP). Instance: A directed acyclic graph G whose vertices have outdegree 0 or 2, a start vertex s, and a target vertex t. Question: Is the number of paths in G which lead from s to t odd?
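Both problems admit short sketches (our own, only to pin down the definitions; the NL/⊕L machines are of course far more space-restricted). `goap` counts s–t paths by dynamic programming over a topological order and returns the parity:

```python
def gap(adj, s, t):
    """Graph Accessibility: is there a path from s to t in the DAG?"""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def goap(adj, order, s, t):
    """Graph Odd Accessibility: parity of the number of s-t paths.
    `order` is a topological order of the DAG."""
    paths = {s: 1}
    for u in order:
        for v in adj.get(u, []):
            paths[v] = paths.get(v, 0) + paths.get(u, 0)
    return paths.get(t, 0) % 2 == 1

# Every vertex has outdegree 0 or 2, as the problem statements require:
adj = {"s": ["a", "b"], "a": ["t", "b"], "b": ["t", "c"], "c": [], "t": []}
print(gap(adj, "s", "t"))                               # True
print(goap(adj, ["s", "a", "b", "c", "t"], "s", "t"))   # True: three s-t paths
```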
3 Main Results
In this section we use Post's lattice (see [9,7]) to determine the complexity of EQ^C(B) and EQ^F(B) step by step. Similarly, we are able to give lower bounds for the isomorphism problems of B-circuits and B-formulas. The basic idea in the case [B] ⊆ N, where N is the class of k-ary negations with Boolean constants, is that there exists a unique path from the output-gate to some input-gate or to a gate labeled by a constant function. This is because every allowed Boolean function has at most one non-fictive variable. Lemma 9. Let B be a finite set of Boolean functions with [B] ⊆ N. Then EQ^C(B) ∈ L, EQ^F(B) ∈ L, ISO^C(B) ∈ L and ISO^F(B) ∈ L. If we restrict ourselves to or-functions or and-functions, the isomorphism and equivalence problems for such B-circuits are complete for NL. For the proof we use the NL-complete graph accessibility problem (GAP). In contrast, if we only use exclusive-or functions for our B-circuits, the equivalence and isomorphism problems are ⊕L-complete. Here the ⊕L-complete graph odd accessibility problem (GOAP) is used. The basic idea in all cases is to interpret a B-circuit as a directed acyclic graph, together with the observation that the equivalence and isomorphism problems can be solved using the suitable reachability problem. Theorem 10. Let B be a finite set of Boolean functions. If E2 ⊆ [B] ⊆ E or V2 ⊆ [B] ⊆ V, then EQ^C(B) and ISO^C(B) are ≤_m^log-complete for NL. If L2 ⊆ [B] ⊆ L, then EQ^C(B) and ISO^C(B) are ≤_m^log-complete for ⊕L. Proposition 7 shows that in the cases V2 ⊆ [B] ⊆ V and E2 ⊆ [B] ⊆ E the circuit-value problem is complete for NL, and in the case L2 ⊆ [B] ⊆ L that it is complete for ⊕L. The next lemma shows that this does not hold for B-formulas.
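The reachability intuition behind Theorem 10 for the case V2 ⊆ [B] ⊆ V can be sketched as follows: an {or}-circuit computes exactly the disjunction of the input variables reachable from its output gate, so two such circuits are equivalent iff these sets coincide (the dictionary circuit encoding here is our own assumption, not from the paper):

```python
def reachable_inputs(gates, output):
    """For an {or}-circuit the computed function is the OR of all input
    variables reachable from the output gate, so equivalence reduces to
    comparing these sets -- a reachability question, hence NL."""
    seen, stack, inputs = {output}, [output], set()
    while stack:
        g = stack.pop()
        spec = gates[g]
        if spec[0] == "var":
            inputs.add(spec[1])
        else:
            for p in spec[1]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
    return inputs

# Two {or}-circuits over x1, x2: both compute x1 ∨ x2.
C1 = {"g1": ("var", "x1"), "g2": ("var", "x2"),
      "g3": ("or", ["g1", "g2"]), "g4": ("or", ["g3", "g3"])}
C2 = {"h1": ("var", "x2"), "h2": ("var", "x1"), "h3": ("or", ["h1", "h2"])}
print(reachable_inputs(C1, "g4") == reachable_inputs(C2, "h3"))  # True
```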
In contrast to the circuit-value problem, the formula-value problem VAL^F(B) is in L in all three of these cases, which gives a hint that the corresponding equivalence and isomorphism problems for B-formulas are easier than for B-circuits, since they can be solved with the help of a VAL^F(B) oracle. Lemma 11. Let B be a finite set of Boolean functions such that V2 ⊆ [B] ⊆ V, E2 ⊆ [B] ⊆ E or L2 ⊆ [B] ⊆ L; then VAL^F(B) ∈ L. Even when EQ^C(B) and ISO^C(B) are NL-complete or ⊕L-complete, the formula case is still easy to solve, as the theorem below shows. Theorem 12. Let B be a finite set of Boolean functions. If E2 ⊆ [B] ⊆ E, V2 ⊆ [B] ⊆ V or L2 ⊆ [B] ⊆ L, then EQ^F(B) ∈ L and ISO^F(B) ∈ L. The next lemma shows that it is possible to construct, for every monotone 3-DNF formula, an equivalent (B ∪ {0, 1})-formula in logarithmic space, if B ∪ {0, 1} is a base for M.
Lemma 13. Let k > 0 be fixed and B be a finite set of Boolean functions such that E(x, y, v, u) and V(x, y, v, u) are B-formulas fulfilling E(x, y, 0, 1) ≡ x ∧ y and V(x, y, 0, 1) ≡ x ∨ y. Then, for any monotone k-DNF (k-CNF, resp.) formula H(x1, ..., xn), there exists a B-formula H′(x1, ..., xn, u, v) such that H′(x1, ..., xn, 0, 1) ≡ H(x1, ..., xn). Moreover, H′ can be computed from H in logarithmic space. Now we use Lemma 13 to build two monotone formulas out of a 3-DNF formula, such that these two monotone formulas are equivalent iff the 3-DNF formula is a tautology. For this, let 3-TAUT be the coNP-complete set of 3-DNF formulas which are tautologies. Lemma 14. Let B be a finite set of Boolean functions such that {or, and} ⊆ [B]. Then there exist logspace-computable B-formulas H1 and H2, which can be computed out of H, such that H ∈ 3-TAUT iff H1 ≡ H2 iff H1 ≅ H2 iff #_1(H1) = #_1(H2). Moreover, the formulas H1 and H2 do not represent constant Boolean functions (i.e., H1 ≢ 0, H1 ≢ 1, H2 ≢ 0 and H2 ≢ 1). The property that the formulas H1 and H2 cannot represent a constant Boolean function plays an important role in the proof of Theorem 17. To show coNP-completeness, the basic idea of the next theorems works as follows: since we have and ∈ [B] and or ∈ [B ∪ {1}], we can almost apply Lemma 14. The only problem is that we have to "simulate" the constant 1. The idea here is to introduce a new variable u which plays the role of the missing constant 1. By connecting the formulas from Lemma 14 and u with ∧, we ensure that every satisfying assignment assigns 1 to u. Theorem 15. Let B be a finite set of Boolean functions such that and ∈ [B] and or ∈ [B ∪ {1}]. Then EQ^F(B) is ≤_m^log-complete for coNP and ISO^F(B) is ≤_m^log-hard for coNP. Corollary 16. Let B be a set of Boolean functions such that S10 ⊆ [B] or S00 ⊆ [B]. Then EQ^F(B) and EQ^C(B) are ≤_m^log-complete for coNP, and ISO^F(B) and ISO^C(B) are ≤_m^log-hard for coNP.
Now only two closed classes of Boolean functions are left: D and D2. In this case the construction of Theorem 15 cannot work, because neither and ∈ D2 nor or ∈ D2. Hence we have no possibility to use the and-function to force an additional variable to 1 in all satisfying assignments. This is not needed, however. Instead we use two new variables as a replacement for 0 and 1 and show that in any case we get either two formulas representing the same constant function or formulas which match Lemma 14. Theorem 17. Let B be a finite set of Boolean functions such that D2 ⊆ [B] ⊆ D. Then EQ^F(B) and EQ^C(B) are ≤^log_m-complete for coNP and ISO^F(B) and ISO^C(B) are ≤^log_m-hard for coNP. This leads us to the following classification theorems for the complexity of the equivalence and isomorphism problems of B-circuits and B-formulas:
Steffen Reith
Theorem 18. Let B be a finite set of Boolean functions. The complexity of EQ^C(B) and ISO^C(B) can be determined as follows: If B ⊆ N then EQ^C(B) ∈ L and ISO^C(B) ∈ L. If B ⊆ E or B ⊆ V then EQ^C(B) and ISO^C(B) are ≤^log_m-complete for NL. In the case that B ⊆ L, EQ^C(B) and ISO^C(B) are ≤^log_m-complete for ⊕L. In all other cases EQ^C(B) is ≤^log_m-complete for coNP and ISO^C(B) is ≤^log_m-hard for coNP. Theorem 19. Let B be a finite set of Boolean functions. The complexity of EQ^F(B) and of ISO^F(B) can be determined as follows: If B ⊆ V, B ⊆ E or B ⊆ L then EQ^F(B) ∈ L and ISO^F(B) ∈ L. In all other cases EQ^F(B) is ≤^log_m-complete for coNP and ISO^F(B) is ≤^log_m-hard for coNP. Another interesting problem arises in the context of the isomorphism problem for Boolean formulas. It is well known that the satisfiability problem for unrestricted Boolean formulas is complete for NP, whereas the satisfiability problem for monotone Boolean formulas is solvable in P. We showed that in both cases the isomorphism problem is coNP-hard, but it might be possible that a better upper bound can be found for the isomorphism problem of monotone formulas (M2 ⊆ [B] ⊆ M) than for the isomorphism problem of unrestricted formulas.
4 Conclusion
In the present work we determined the complexity of the equivalence problem for B-formulas and B-circuits and were able to give a complete characterization of the complexity w.r.t. all possible finite sets of Boolean functions. We showed that the equivalence problem for B-circuits is, depending on the set of Boolean functions used, coNP-complete, NL-complete, ⊕L-complete or in L. Interestingly, because of the succinctness of circuits, the equivalence problem for B-formulas is sometimes easier to solve. To be more precise, if EQ^C(B) is NL-complete or ⊕L-complete then EQ^F(B) is still solvable in deterministic logarithmic space. In all other cases the representation as formula or circuit has no influence, and the complexities of EQ^C(B) and EQ^F(B) coincide. In the case of the isomorphism problems we were not always able to prove completeness results. In these cases we showed hardness for coNP as a lower bound. Note that in these cases the trivial upper bound Σ^p_2 remains valid, so our results are as strong as the well-known trivial upper and lower bounds for the isomorphism problem of unrestricted Boolean formulas and circuits. In the easier case [B] ⊆ N we proved that the isomorphism problem for B-circuits is decidable in deterministic logarithmic space. For V2 ⊆ [B] ⊆ V and E2 ⊆ [B] ⊆ E (L2 ⊆ [B] ⊆ L, resp.) we showed the NL-completeness (⊕L-completeness, resp.) of the isomorphism problem for B-circuits. Similarly to the equivalence problem, the isomorphism problem for B-formulas is still solvable in deterministic logarithmic space if ISO^C(B) is NL-complete or ⊕L-complete. We use the same reduction for showing the coNP-hardness of EQ^F(B) and ISO^F(B); therefore it cannot be expected that this reduction is powerful enough
to show a better lower bound for the isomorphism-problem. Note that this reduction does not use the ability of permuting variables. Hence it seems possible that any reduction showing a better lower bound for the isomorphism-problem has to take a non-trivial permutation into account.
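Boolean isomorphism asks whether some permutation of the variables makes the two formulas equivalent; the trivial upper bound mentioned above comes from guessing the permutation and then checking equivalence. The brute-force version of that check can be sketched as follows (the function names and encoding are ours, not from the paper):

```python
from itertools import permutations, product

def isomorphic(f, g, n):
    """f ≅ g iff some permutation pi of the variables makes f(v∘pi) ≡ g(v)."""
    inputs = list(product([False, True], repeat=n))
    g_table = [g(v) for v in inputs]
    for pi in permutations(range(n)):
        # Apply pi to the inputs of f and compare against g's truth table.
        if all(f(tuple(v[pi[i]] for i in range(n))) == gv
               for v, gv in zip(inputs, g_table)):
            return True
    return False
```

The n! · 2^n running time reflects the guess-and-check structure; the remark above suggests that any reduction improving the coNP lower bound would have to exploit such non-trivial permutations.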
Acknowledgments. I am grateful to Heribert Vollmer for many helpful discussions and in particular to Sven Kosub for a first idea of Lemma 14.
Quantified Mu-Calculus for Control Synthesis

Stéphane Riedweg and Sophie Pinchinat
IRISA-INRIA, F-35042, Rennes, France
{sriedweg,pinchina}@irisa.fr
Fax: +33299847171

Abstract. We consider an extension of the mu-calculus as a general framework to describe and synthesize controllers. This extension is obtained by quantifying atomic propositions; we call the resulting logic quantified mu-calculus. We study its main theoretical properties and show its adequacy to control applications. The proposed framework is expressive: it offers a uniform way to describe parameters as varied as the kind of system (closed or open), the control objective, the type of interaction between the controller and the system, the optimality criteria (fairness, maximal permissiveness), etc. To our knowledge, none of the former approaches can capture such a wide range of concepts.
1 Introduction
To generalize the control synthesis theory of Ramadge and Wonham [1], many works use temporal logics as specifications [2–4]. All those approaches suffer from substantial limitations: there is no way to impose properties on the interaction between the system and its controller, nor to require optimality of controllers. The motivation of our work is to fill these gaps. We put forward an extension of the mu-calculus well suited to describe general control objectives and to synthesize finite-state controllers. The proposed framework is expressive: it offers a uniform way to describe parameters as varied as the kind of system (closed or open), the control objective, the type of interaction between the controller and the system, the optimality criteria (fairness, maximal permissiveness), etc. To our knowledge, none of the former approaches can capture such a wide range of concepts. As in [5–7], we extend a temporal logic (the mu-calculus) by quantifying atomic propositions. We call the resulting logic quantified mu-calculus. We study its main theoretical properties and show its adequacy to control applications. We start from alternating tree automata for the mu-calculus [8, 9] and extend their theory using the Simulation Theorem [10, 11, 8] and a projection of automata. The Simulation Theorem states that alternating automata and nondeterministic automata are equivalent. The projection is an adaptation of the construction of [12]. The meaning of the existential quantifier is defined by projecting automata onto sets of propositions. Decision procedures for model-checking and satisfiability can therefore be obtained. Both problems are non-elementary when we consider the full logic. We can however exhibit interesting fragments with lower complexity, still covering a wide class of control problems. B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 642–651, 2003. © Springer-Verlag Berlin Heidelberg 2003
The following explains the applications to control. We view supervision of systems as pruning the systems' computation trees. Consequently, a controller can be represented by a labeling c of the (uncontrolled) system's computation tree into {0, 1}, such that the (downwards closed) 1-labeled subtree is the behavior of the controlled system. For any proposition c, we define a transformation α∗c of mu-calculus formulas α such that some controller-induced restriction of S satisfies α if and only if α∗c holds of some c-labeling on the computation tree. Labeling allows us to consider the forbidden part of the controlled system, and we derive controllers for large classes of specifications, using a constructive model-checking. Beyond the capability to specify controllers which only cut controllable transitions, we can more interestingly specify (and synthesize) a maximally permissive controller for α; i.e. a controller c such that the c-controlled system satisfies α and no c'-controlled system with c ≺ c' satisfies α, where c ≺ c' is the mu-calculus formula expressing that the 1-labeled subtree defined by c is a proper subtree of the 1-labeled subtree defined by c'. A maximally permissive controller enforcing α can therefore be specified by the quantified mu-calculus formula: ∃c.(α∗c ∧ ∀c'.((c ≺ c') ⇒ ¬(α∗c'))). Controllers and maximally permissive controllers for open systems [2] may also be specified and synthesized. Such controllers are moreover required to be robust against the environment's policy. Also, the implementation considerations of [13] and decentralized controllers may be formulated in quantified mu-calculus. Not surprisingly, the expressive power of the mu-calculus enables us to deal with fairness. The rest of the paper is organized as follows: Section 2 presents the logic. Section 3 studies applications to control theory. Algorithms are developed in Section 4, based on the automata-theoretic semantics. Finally, control synthesis is illustrated in Section 5.
2 Quantified Mu-Calculus
We assume given a finite set of events A, a finite set of propositions AP, and an infinite set of variables Var = {X, Y, ...}. Definition 1. (Syntax of qLµ) The set of formulas of the quantified mu-calculus on Γ ⊆ AP, written qLµ(Γ), is defined by the grammar: ∃Λ.α | ¬α1 | α1 ∨ α2 | β, where Λ ⊆ AP, α ∈ qLµ(Γ ∪ Λ), α1 and α2 are formulas in qLµ(Γ), and β is a formula of the pure mu-calculus on Γ. The set of formulas of the pure mu-calculus on Γ ⊆ AP, written Lµ(Γ), is defined by the grammar:
⊤ | p | X | ¬β | ⟨a⟩β | β ∨ β' | µX.β(X)
644
St´ephane Riedweg and Sophie Pinchinat
where a ∈ A, p ∈ Γ, X ∈ Var, and β and β' are in Lµ(Γ). To give meaning to fixed-point formulas, X must occur under an even number of negation symbols ¬ in α(X), in each formula µX.α(X). Extending the terminology of the mu-calculus, we call sentences all quantified mu-calculus formulas without free variables. We write ⊥, [a]α, α ∧ β, νX.α(X), and ∀Λ.α respectively for ¬⊤, ¬⟨a⟩¬α, ¬(¬α ∨ ¬β), ¬µX.¬α(¬X) and ¬∃Λ.¬α. We also write a→, a↛, α ⇒ β, and ∃x.α respectively for ⟨a⟩⊤, [a]⊥, ¬α ∨ β, and ∃{x}.α. Since, in general, fixed-point operators and quantifiers do not commute, we allow no quantification inside fixed-point terms. The quantified mu-calculus qLµ, as a generalization of the mu-calculus, is also given an interpretation over deterministic transition structures called processes in [3]. Definition 2. A process on Γ ⊆ AP is a tuple S = <Γ, S, s0, t, L>, where S is the set of states, s0 ∈ S is the initial state, t : S × A → S is a partial function called the transition function, and L : S → P(Γ) maps states to subsets of propositions. We say that S is finite if S is finite and that it is complete if for all (a, s) ∈ A × S, t(s, a) is defined. Compound processes can be built up by synchronous product. Definition 3. The (synchronous) product of two processes S1 = <Γ1, S1, s0_1, t1, L1> and S2 = <Γ2, S2, s0_2, t2, L2> on disjoint sets Γ1 and Γ2 is the process S1 × S2 = <Γ, S1 × S2, (s0_1, s0_2), t, L> on Γ = Γ1 ∪ Γ2 such that (1) t((s1, s2), a) = (s1', s2') whenever t1(s1, a) = s1' and t2(s2, a) = s2', and (2) L(s1, s2) = L1(s1) ∪ L2(s2). In the sequel, we shall in particular take the product of a process on Γ with another (complete) process on a disjoint set of propositions Λ in order to obtain a similar process on Γ ∪ Λ. This is the way in which qLµ will be applied to solve the control problem (see Theorem 1, Section 3). Definition 4. A labeling process on Λ ⊆ AP is simply a complete process E on Λ.
Now, for any process S = <Γ, S, s0, t, L> with Γ disjoint from Λ, S × E is called a labeling of S (by E) on Λ. We let Lab_Λ denote the set of labeling processes on Λ. Definition 5. (Semantics of qLµ) The interpretation of the qLµ(Γ)-formulas is relative to a process S = <Γ, S, s0, t, L> and a valuation val : Var → P(S). This interpretation [[α]]^val_S (⊆ S) is defined by:

[[⊤]]^val_S = S,  [[p]]^val_S = {s ∈ S | p ∈ L(s)},  [[X]]^val_S = val(X),
[[¬α]]^val_S = S \ [[α]]^val_S,  [[α ∨ β]]^val_S = [[α]]^val_S ∪ [[β]]^val_S,
[[⟨a⟩α]]^val_S = {s ∈ S | ∃s' : t(s, a) = s' and s' ∈ [[α]]^val_S},
[[µX.α(X)]]^val_S = ∩{V ⊆ S | [[α]]^{val(V/X)}_S ⊆ V},
[[∃Λ.α]]^val_S = {s ∈ S | ∃E = <Λ, E, ε0, t', L'> ∈ Lab_Λ, (s, ε0) ∈ [[α]]^{val×E}_{S×E}}

where (val × E)(X) = val(X) × E for any X ∈ Var.
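On a finite process, Definition 5 can be turned directly into an evaluator: each clause becomes a case, and the least fixed point is computed by Knaster–Tarski iteration from the empty set. The sketch below covers only the pure mu-calculus fragment (the ∃Λ clause quantifies over all labeling processes and is handled via automata in Section 4) and uses an ad-hoc tuple encoding of formulas of our own devising:

```python
def sem(phi, S, t, L, val):
    """Evaluate a tuple-encoded pure mu-calculus formula on a finite process.
    S: set of states; t: dict (state, event) -> state (partial, deterministic);
    L: dict state -> set of propositions; val: dict variable -> set of states."""
    op = phi[0]
    if op == "top":
        return set(S)
    if op == "prop":
        return {s for s in S if phi[1] in L[s]}
    if op == "var":
        return set(val[phi[1]])
    if op == "not":
        return set(S) - sem(phi[1], S, t, L, val)
    if op == "or":
        return sem(phi[1], S, t, L, val) | sem(phi[2], S, t, L, val)
    if op == "dia":                       # <a>f: the a-successor exists and satisfies f
        a, f = phi[1], phi[2]
        goal = sem(f, S, t, L, val)
        return {s for s in S if t.get((s, a)) in goal}
    if op == "mu":                        # least fixed point, iterated from the empty set
        X, f = phi[1], phi[2]
        V = set()
        while True:
            W = sem(f, S, t, L, {**val, X: V})
            if W == V:
                return V
            V = W
    raise ValueError(f"unknown constructor {op!r}")
```

For instance, µX.(p ∨ ⟨a⟩X) evaluates to the set of states from which a state labeled p is reachable along a-transitions; the iteration converges in at most |S| rounds because each step is monotone.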
Notice that the valuation val does not influence the semantics of a sentence α ∈ qLµ; we then write S |= α whenever the initial state of S belongs to [[α]]_S. Clearly, bisimilar processes satisfy the same qLµ formulas.
3 Control Specifications
This section presents various examples of specifications for control objectives in qLµ. First, a transformation of formulas is defined, which is used to link qLµ model-checking with control problems, as shown by Theorem 1. Variants of the theorem are then exploited to capture requirements such as maximally permissive controllers, controllers for open systems, etc. Definition 6. For any sentence α ∈ qLµ(Γ) and for any x ∈ AP, the x-lift of α is the formula α∗x ∈ qLµ(Γ ∪ {x}), inductively defined by (by convention, ∗ has highest priority):
⊤∗x = ⊤, p∗x = p, X∗x = X, (¬α)∗x = ¬α∗x, (α ∨ β)∗x = α∗x ∨ β∗x, (⟨a⟩α)∗x = ⟨a⟩(x ∧ α∗x), (µX.α)∗x = µX.α∗x, (∃Λ.α)∗x = ∃Λ.α∗x. Definition 7. Given a process S = <Γ, S, s0, t, L> and some x ∈ Γ, the x-pruning of S is the process S(x) = <Γ, S, s0, t', L> such that, for all s ∈ S and a ∈ A, t'(s, a) = t(s, a) if x ∈ L(t(s, a)), and t'(s, a) is undefined otherwise. Lemma 1. For any process S on Γ, for any x ∈ Γ, and for any sentence α ∈ qLµ(Γ), we have: [[α]]_{S(x)} = [[α∗x]]_S. Proof. Straightforward by induction on α.
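Definition 6 is a purely syntactic recursion and is easy to implement. Assuming formulas are encoded as nested tuples such as ("prop", p), ("or", f, g), ("dia", a, f) and ("mu", X, f) (an encoding of our own devising, not from the paper), only the diamond case does real work, guarding each transition step by x:

```python
def lift(phi, x):
    """Compute the x-lift α*x of Definition 6 over a tuple-encoded formula."""
    op = phi[0]
    if op in ("top", "prop", "var"):
        return phi                                  # ⊤*x = ⊤, p*x = p, X*x = X
    if op == "not":
        return ("not", lift(phi[1], x))
    if op == "or":
        return ("or", lift(phi[1], x), lift(phi[2], x))
    if op == "dia":                                 # (<a>α)*x = <a>(x ∧ α*x)
        a, f = phi[1], phi[2]
        guarded = ("and", ("prop", x), lift(f, x))  # x ∧ α*x (∧ is a derived connective)
        return ("dia", a, guarded)
    if op == "mu":
        return ("mu", phi[1], lift(phi[2], x))
    if op == "exists":
        return ("exists", phi[1], lift(phi[2], x))
    raise ValueError(f"unknown constructor {op!r}")
```

Dualizing through the negation case, the derived box clause is ([a]α)∗x = [a](x ⇒ α∗x), which is exactly the shape appearing in the example formula of Section 5.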
Control synthesis [1, 3, 14, 4] is a generic problem that consists in enforcing some property α on a plant S by composing it with another process R, the controller of S for α. The goal is to synthesize R given α. We focus here on the joint specification of α and of constraints on R: they reflect the way it exerts the control. This capability relies on the following theorem. Theorem 1. Given a sentence α ∈ qLµ(Λ ∪ Γ), where Λ and Γ are disjoint, and a process S on Γ, the following assertions are equivalent: – There exists a controller on Λ of S for α. – S |= ∃c∃Λ.α∗c, where c is a fresh proposition. Proof. First, suppose that there exists a process R = <Λ, R, r0, t, L> such that S × R |= α. Given c ∈ AP \ (Λ ∪ Γ), we can easily define E ∈ Lab_{Λ∪{c}} such that R is E(c) without the label c. Now, (S × E)(c), or equivalently S × E(c), satisfies α, since R is E(c) without c and c does not occur in α. Using Lemma 1, we conclude that (S × E) |= α∗c. Suppose now that S |= ∃c∃Λ.α∗c. By Definition 5, there is a process E ∈ Lab_{{c}∪Λ} such that S × E |= α∗c. By Lemma 1, (S × E)(c) satisfies α. Then take R as E(c).
We now illustrate the use of qLµ to specify various control requirements. The formula ∃c.α∗c of Theorem 1 is enriched to integrate control rules. In the sequel, we let ⟨A⟩ = ∨_{a∈A} ⟨a⟩, [A] = ∧_{a∈A} [a], Reach_c(γ) = (µY.⟨A⟩Y ∨ γ)∗c, and Inv_c(γ) = (νY.[A]Y ∧ γ)∗c. Also, the x-lift is canonically extended to conjunctions of propositions.

Maximally Permissive Admissible Controller for α. When a system S has uncontrollable transitions, denoted by the set of labels Auc, an admissible controller for α should not disable any of them. Its existence may be expressed by formula (1). An admissible controller c for α is maximally permissive if no other admissible controller c' for α cuts strictly fewer transitions than c. Writing c ≺ c' for the mu-calculus formula Inv_c(c') ∧ Reach_{c'}(¬c), this requirement is expressed by formula (2).

S |= ∃c. Inv_c([Auc]c) ∧ α∗c   (1)
S |= ∃c. Inv_c([Auc]c) ∧ α∗c ∧ ∀c'. (Inv_{c'}([Auc]c') ∧ (c ≺ c')) ⇒ ¬α∗c'.   (2)

Maximally Permissive Open Controller for α. As studied in [2], an open system S takes the environment's policy into account: the alphabet A of transitions is a disjoint union of the alphabet Aco of controllable transitions and the alphabet Auc of uncontrollable transitions, permitted or not by the environment. The open controller must ensure α for any possible choice of the environment. This requirement is expressed by formula (3), where the proposition e represents the environment's policy. The ad-hoc solution of [2] cannot easily be extended to maximally permissive open controllers; that requirement is expressed by formula (4).

S |= ∃c. Inv_c([Auc]c) ∧ ∀e. (Inv_e([Aco]e)) ⇒ α∗(e ∧ c).   (3)
S |= ∃c. Inv_c([Auc]c) ∧ ∀e. (Inv_e([Aco]e)) ⇒ α∗(e ∧ c)
     ∧ ∀c'. (Inv_{c'}([Auc]c') ∧ (cut(c) ≺ cut(c'))) ⇒ ∃e'. Inv_{e'}([Aco]e') ∧ ¬α∗(e' ∧ c').   (4)

Implementable Controller for "Non-blocking".
Such a controller [13] is an admissible controller which, moreover, selects exactly one controllable transition at a time, and such that, in the resulting supervised system, a final state (given by the proposition Pf) is always reachable. Let Nblock = νZ.(µX.Pf ∨ ⟨A⟩X) ∧ [A]Z and let Impl = (∨_{a∈Aco} a→) ⇒ ∨_{a∈Aco} (⟨a⟩c ∧ [Aco \ {a}]¬c); a non-blocking implementable controller of a system S may be expressed by the formula: S |= ∃c. c ∧ (Nblock)∗c ∧ Inv_c([Auc]c) ∧ Inv_c(Impl). Decentralized Controllers for α. The existence of decentralized controllers R1 and R2 such that S × R1 × R2 |= α may be expressed as: S |= ∃c1∃c2.α∗(c1 ∧ c2).
4 Quantified Mu-Calculus and Automata
Automata-theoretic approaches provide the model theory of mu-calculus, and they offer decision algorithms for the satisfiability and the model-checking problems [15, 10, 16–18, 8]. Depending on the approach followed, different automata
have been considered, differing mainly in two orthogonal parameters: the more or less restricted kind of transitions, ranging from alternating automata to the subclass of non-deterministic automata, and the acceptance conditions, e.g. Rabin, Streett, Mostowski/parity. The class of tree automata for the mu-calculus which we shall adapt to qLµ is the class of alternating parity automata, or simple automata for short, considered in [3]. This adaptation is stated by Theorem 2 below, which constitutes the main result of this section; the remainder brings in the material needed for its proof. Theorem 2. (Main result) For any sentence α ∈ qLµ(Γ), there exists a simple automaton Aα on Γ such that, for any process S on Γ: S |= α iff S is accepted by Aα. Definition 8. (Simple Automata on Processes) A simple automaton on Γ is a tuple A = <Γ, Q, Q∃, Q∀, q0, δ : Q × P(Γ) → P(Moves(Q)), r> where Q is a finite set of states, partitioned into two subsets Q∃ and Q∀ of respectively existential and universal states, q0 ∈ Q is the initial state, r : Q → IN is the parity condition, and the transition function δ assigns to each state q and each subset of Γ a set of possible moves included in Moves(Q) = ((A ∪ {ε}) × Q) ∪ (A × {→, ↛}). Definition 9. (Nondeterministic Automata on Processes) A simple automaton is nondeterministic if for any set of labels Λ ⊆ Γ, δ(q, Λ) ⊆ {ε} × Q for any q ∈ Q∃, and δ(q, Λ) ⊆ Moves(Q) \ ({ε} × Q) for any q ∈ Q∀. Moreover, in the case where q ∈ Q∀, it is required that (a1, q1), (a2, q2) ∈ δ(q, Λ) ∩ (A × Q) and a1 = a2 entail q1 = q2. Finally, the initial state should be an existential state. A nondeterministic automaton is bipartite if for any Λ ⊆ Γ, δ(q, Λ) ⊆ {ε} × Q∀ for any q ∈ Q∃ and δ(q, Λ) ∩ (A × Q) ⊆ A × Q∃ for any q ∈ Q∀. Parity games provide automata semantics. A parity game is a graph with an initial vertex v0, with a partition (VI, VII) of the vertices, and with a partial mapping r from the vertices to a given finite set of integers.
A play from some vertex v proceeds as follows: if v ∈ VI, then player I chooses a successor vertex v', else player II chooses a successor vertex v', and so on ad infinitum unless one player cannot make any move. The play is winning for player I if it is finite and ends in a vertex of VII, or if it is infinite and the upper bound of the set of ranks r(v) of vertices v that are encountered infinitely often is even. A strategy for player I is a function σ assigning a successor vertex to every sequence of vertices ~v ending in a vertex of VI. A strategy σ is memoryless if σ(~v) = σ(~w) whenever the sequences ~v and ~w end in the same vertex. A strategy for player I is winning if every play following the strategy from the initial vertex is winning for player I. Winning strategies for player II are defined similarly. The fundamental result on parity games is the memoryless determinacy theorem, established in [10, 8]. Theorem 3. (Memoryless determinacy) For any parity game, one of the players has a (memoryless) winning strategy.
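Memoryless determinacy is what makes the acceptance games below algorithmically useful: it suffices to search for a positional strategy. For the special case of reachability objectives, the winning set and a memoryless strategy are computed by a simple backward fixpoint, the classical attractor construction; full parity objectives need more machinery, but this conveys the flavor. The game encoding is our own, and we adopt the convention above that a finite play ending in a player-II vertex is won by player I:

```python
def reach_attractor(VI, VII, E, target):
    """Winning set and a memoryless strategy for player I whose objective is
    to reach `target` (or to see player II get stuck).
    VI, VII: vertex sets of players I and II; E: dict vertex -> successor list.
    Returns (win, strategy); the strategy moves strictly closer to the target."""
    level = {v: 0 for v in target}    # attractor rank = forced distance to target
    strategy = {}
    changed = True
    while changed:
        changed = False
        for v in (VI | VII) - set(level):
            succ = E.get(v, [])
            if v in VI:
                good = [w for w in succ if w in level]
                if good:                          # player I has a winning move
                    best = min(good, key=level.get)
                    level[v] = level[best] + 1
                    strategy[v] = best            # step one rank closer to target
                    changed = True
            elif all(w in level for w in succ):   # II: every move loses, or II is stuck
                level[v] = 1 + max((level[w] for w in succ), default=-1)
                changed = True
    return set(level), strategy
```

Choosing a successor of minimal rank is what makes the strategy sound: a fixed "any winning successor" choice could cycle forever without ever reaching the target.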
Definition 10. Given a simple automaton A = <Γ, Q, Q∃, Q∀, q0, δ, r> and a process S = <Γ, S, s0, t, L>, we define the parity game G(A, S), where the vertices of player I are in Q∃ × S ∪ {⊤} and the vertices of player II are in Q∀ × S ∪ {⊥}; the initial vertex v0 is (q0, s0), and the other vertices and transitions are defined inductively as follows. Vertices ⊤ and ⊥ have no successor. For any vertex (q, s) and for all a ∈ A: – there is an ε-edge to a successor vertex (q', s) if (ε, q') ∈ δ(q, L(s)), – there is an a-edge to a successor vertex (q', s') if (a, q') ∈ δ(q, L(s)) and t(s, a) = s', – there is an a-edge to the successor vertex ⊤ if (a, →) ∈ δ(q, L(s)) and t(s, a) is defined, or (a, ↛) ∈ δ(q, L(s)) and t(s, a) is not defined, – there is an a-edge to the successor vertex ⊥ if (a, →) ∈ δ(q, L(s)) and t(s, a) is not defined, or (a, ↛) ∈ δ(q, L(s)) and t(s, a) is defined. The automaton A accepts S (noted S |= A) if there is a winning strategy for player I in G(A, S). Like automata on infinite trees [10, 11], simple automata on processes are equivalent to bipartite non-deterministic automata. This fundamental result, due to [8, 3], is called the Simulation Theorem: Theorem 4. (Simulation Theorem for processes) Every simple automaton on processes is equivalent to a bipartite nondeterministic automaton. Since a constructive proof of Theorem 2 for α ∈ Lµ may be found in [8, 18, 9], in order to extend it to qLµ we consider projections of automata: the projection is the semantic counterpart of existential quantification in qLµ. The projections presented here are similar to the projections of nondeterministic tree automata presented in [12, 19]: projected automata are obtained by forgetting a subset of propositions in the condition of the transitions. Definition 11. (Projection) Let Γ ⊆ Γ' and let A = <Γ', Q, Q∃, Q∀, q0, δ, r> be a bipartite nondeterministic automaton.
The projection of A on Γ is the bipartite nondeterministic automaton A↓Γ = <Γ, Q∃ ∪ (Q∀ × P(Λ)), Q∃, Q∀ × P(Λ), q0, δ↓Γ, r↓Γ>, where Λ = Γ' \ Γ and, for all l ⊆ Λ and all l' ⊆ Γ: 1. ∀q ∈ Q∃: δ↓Γ(q, l') = {(ε, (q', l)) | (ε, q') ∈ δ(q, l' ∪ l)}, 2. ∀q ∈ Q∀: δ↓Γ((q, l), l') = δ(q, l' ∪ l), 3. ∀q ∈ Q∃: r↓Γ(q) = r(q) and ∀q ∈ Q∀: r↓Γ((q, l)) = r(q). Theorem 5. (Projection) Let A = <Γ', Q, Q∃, Q∀, q0, δ, r> be a bipartite nondeterministic automaton. For any process S = <Γ, S, s0, t, L> on Γ ⊆ Γ', S |= A↓Γ if and only if there exists a labeling process E on Λ = Γ' \ Γ such that S × E |= A. Proof. First, suppose S |= A↓Γ. Let σ be a winning memoryless strategy for player I in the game G = G(A↓Γ, S) (Theorem 3) and let VII ⊆ Q∀ × P(Λ) × S be the set of nodes of player II in G without ⊥. Let E ∈ Lab_Λ be an arbitrary
completion of the process <Λ, VII, σ(q0, s0), t', L'>, where for any (q, l, s) in VII, L'(q, l, s) = l, and for all a ∈ A, t'((q, l, s), a) = σ(s', q'), for the (unique) (s', q') such that there is an a-arc from ((q, l), s) to (q', s') in G. Then, it can be shown that we define a winning strategy σ' for player I in the game G(A, S × E) by σ'((s, σ(s, q)), q) = σ(s, q). Conversely, suppose S × E |= A for some E ∈ Lab_Λ. It suffices to show that any memoryless winning strategy for player I in G(A, S × E) defines a memoryless winning strategy for player I in the game G(A↓Γ, S). We now prove Theorem 2, by induction on the structure of α ∈ qLµ(Γ). We only treat the case where α is quantified, since α ∈ Lµ is dealt with following [8] and [9]. Without loss of generality, we can assume α of the form qΛα', where q ∈ {∃, ∀}. For q = ∃, let A be the bipartite nondeterministic automaton equivalent to Aα' (Theorem 4). Now take Aα = A↓Γ and conclude by Theorem 5. The case where q = ∀ is obtained by complementation: since parity games are determined (Theorem 3), we complement A∃Λ¬α' (as in [11]) to obtain Aα. Theorem 2 gives an effective construction of finite controllers on finite processes: given a finite process S and a sentence ∃c.α ∈ qLµ expressing a control problem, we construct the automaton A(∃c.α). If we find a memoryless winning strategy in the finite game G(A(∃c.α), S), Theorem 5 gives a finite controller; otherwise, there is no solution. We can show that the complexity of this problem is (k+1)-EXPTIME-complete, where k is the number of alternations of existential and universal quantifiers in α. The result of [2] is retrieved: synthesizing controllers for open systems is 2-EXPTIME-complete for mu-calculus control objectives.
5 Controller Synthesis
This section illustrates the constructions on a simple example. The plant S (to be controlled) is drawn next page. Both states s0 and s1 are labeled with the empty set of propositions, thus S is a process on Γ = ∅. The control objective is the formula α = νY.⟨b⟩Y ∧ ⟨a⟩(µX.[a]X). There is a controller of S for α iff S |= ∃c.α∗c, but also iff S |= ∃c.c ∧ α∗c. Let φ be the formula c ∧ α∗c ≡ c ∧ νY.⟨b⟩(c ∧ Y) ∧ ⟨a⟩(c ∧ µX.[a](c ⇒ X)). The bipartite nondeterministic automaton Aφ is shown in Figure 1, where the following graphical conventions are used: circled states are existential states, while states enclosed in squares are universal states; the transitions between states are represented by edges labeled in {a, b, ε} × P({c}); the other transitions are represented by labeled edges from states to the special box containing the symbol →. The rank function maps q0 to 2 and q2 to 1. The projected automaton Aφ↓∅ is shown in Figure 2, using similar conventions. Note that all transitions are labeled in {a, b, ε} × {∅}, since Aφ↓∅ is an automaton on Γ = ∅, but all universal states are now labeled in Q × P({c}), as a result of the projection. Now, S |= ∃c.φ iff Aφ↓∅ accepts S, and this condition is equivalent to the existence of a winning strategy for player I in the finite parity game G(Aφ↓∅, S) of Figure 3. Clearly, player I has a unique memoryless winning strategy σ, which maps the vertex (q2, s0) to (q2, ∅, s0). The labeling process E on {c} derived from σ is shown in Figure 4. Four states and
transitions between them are first computed, yielding an incomplete process on {c}. A last state c is then added so as to obtain a complete process. The dashed transitions (and all dead transitions) are finally suppressed to yield the synthesized controller.
6 Conclusion
The logical formalism we have developed allows us to synthesize controllers for a large class of control objectives. All the constraints, such as maximally permissive or admissible controllers for open systems, are formulated as objectives. As it is, the class of controllers is left free, and we cannot, for example, deal with partial observation. The recent work of [3] offers two constructions that we can use to interpret the quantified mu-calculus relative to some fixed classes of labeling processes. The first construction, the quotient of automata, forces the labeling processes to be in some mu-calculus (definable) class. It can be seen as a generalization of the automata projection, and used instead. The quantified mu-calculus could hence be extended by constraining each quantifier to range
over some mu-calculus class. Nevertheless, the class of controllers under partial observation being undefinable in the mu-calculus, we need to consider the second construction: the quotient of automata over a process exhibits (when it exists) a controller under partial observation inside some mu-calculus class. The outermost quantification of a sentence is then made relative to some class of partial observation. Therefore, we can seek a controller under partial observation for open systems, but we cannot synthesize a maximally permissive controller among the controllers under partial observation.
References

1. Ramadge, P.J., Wonham, W.M.: The control of discrete event systems. Proceedings of the IEEE; Special Issue on Dynamics of Discrete Event Systems 77 (1989)
2. Kupferman, O., Madhusudan, P., Thiagarajan, P., Vardi, M.: Open systems in reactive environments: Control and synthesis. CONCUR 2000, LNCS 1877.
3. Arnold, A., Vincent, A., Walukiewicz, I.: Games for synthesis of controllers with partial observation. To appear in TCS (2003)
4. Vincent, A.: Synthèse de contrôleurs et stratégies gagnantes dans les jeux de parité. MSR 2001
5. Sistla, A., Vardi, M., Wolper, P.: The complementation problem for Büchi automata with applications to temporal logic. TCS 49 (1987)
6. Kupferman, O.: Augmenting branching temporal logics with existential quantification over atomic propositions. Journal of Logic and Computation 9 (1999)
7. Patthak, A.C., Bhattacharya, I., Dasgupta, A., Dasgupta, P., Chakrabarti, P.P.: Quantified computation tree logic. IPL 82 (2002)
8. Arnold, A., Niwinski, D.: Rudiments of mu-calculus. North-Holland (2001)
9. Walukiewicz, I.: Automata and logic. In: Notes from EEF Summer School '01. (2001)
10. Emerson, E.A., Jutla, C.S.: Tree automata, mu-calculus and determinacy. FOCS 1991. IEEE Computer Society Press (1991)
11. Muller, D.E., Schupp, P.E.: Simulating alternating tree automata by nondeterministic automata: New results and new proofs of the theorems of Rabin, McNaughton and Safra. TCS 141 (1995)
12. Rabin, M.O.: Decidability of second-order theories and automata on infinite trees. Trans. Amer. Math. Soc. 141 (1969)
13. Dietrich, P., Malik, R., Wonham, W., Brandin, B.: Implementation considerations in supervisory control. In: Synthesis and Control of Discrete Event Systems. Kluwer Academic Publishers (2002)
14. Bergeron, A.: A unified approach to control problems in discrete event processes. Theoretical Informatics and Applications 27 (1993)
15. Emerson, E.A., Sistla, A.P.: Deciding full branching time logic. Information and Control 61 (1984)
16. Emerson, E.A., Jutla, C.S., Sistla, A.P.: On model-checking for fragments of mu-calculus. CAV 1993, LNCS 697.
17. Streett, R.S., Emerson, E.A.: The propositional mu-calculus is elementary. ICALP 1984, LNCS 172.
18. Kupferman, O., Vardi, M.Y., Wolper, P.: An automata-theoretic approach to branching-time model checking. Journal of the ACM 47 (2000)
19. Thomas, W.: Automata on infinite objects. In Leeuwen, J.v., ed.: Handbook of TCS, vol. B. Elsevier Science Publishers (1990)
On Probabilistic Quantified Satisfiability Games

Marcin Rychlik
Institute of Informatics, Warsaw University
Banacha 2, 02-097 Warsaw, Poland
[email protected]
Abstract. We study the complexity of a new probabilistic variant of the Quantified Satisfiability (QSAT) problem. Let a sentence ∃v1∀v2 . . . ∃vn−1∀vn φ be given. In the classical game associated with the QSAT problem, the players ∃ and ∀ alternately choose Boolean values of the variables v1, . . . , vn. In our game one (or both) of the players can instead determine the probability that vi is true. We call such a player a probabilistic player, as opposed to a classical player. The payoff (of ∃) is the probability that the formula φ is true. We study the complexity of deciding whether ∃ (probabilistic or classical) has a strategy to achieve a payoff of at least c playing against ∀ (probabilistic or classical). We completely answer the question for the threshold c = 1, showing that the case when ∀ is probabilistic is easier to decide (Σ₂ᴾ-complete) than the remaining cases (PSPACE-complete). For thresholds c < 1 we have a number of partial results. We establish PSPACE-hardness of the question whether ∃ can win when only one of the players is probabilistic, and Σ₂ᴾ-hardness when both players are probabilistic. We also show that the set of thresholds c for which a related problem is in PSPACE is dense in [0, 1]. Finally, we study the set of reals c ∈ [0, 1] that can be game values of our games; it turns out to include all binary rationals, but also some irrational numbers.
1
Introduction
In this paper we study a certain probabilistic variant of the Quantified Satisfiability (QSAT) problem. Games with coin tosses (see e.g. [9],[3],[2],[5]) and games where players use randomized strategies (see e.g. [6],[11],[4]) have been widely considered in complexity theory. Many papers allow the players to choose probability distributions (mixed strategies [11],[4],[7],[8] or behavior strategies [6],[7]), but there the choices are made by the players just once per game, either independently or with just one alternation. A crucial difference between those works and ours is that in our framework the probabilities are chosen by the players in turn, each depending on the probabilities chosen so far. To our knowledge, such a situation has not been considered before. Quantified Satisfiability was studied in [1]. It can be considered as a game between two players, call them ∃ and ∀. Fix some Boolean formula φ(x1, . . . , xn).
Supported by Polish KBN grant No. 7 T11C 027 20
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 652–661, 2003. © Springer-Verlag Berlin Heidelberg 2003
The two players move alternately, with ∃ moving first. If i is odd then ∃ fixes the value of xi, whereas if i is even ∀ fixes the value of xi. ∃ tries to make the expression φ true, while ∀ tries to make it false. Then ∃ has a winning strategy iff ∃x1∀x2∃x3 . . . φ(x1, . . . , xn) is true. If we assume that ∀ is uninterested in winning and plays at random, then the game becomes a Game Against Nature, studied in [9] (see also [10]). The decisions of Nature are probabilistic in manner: Nature chooses xi = 0 or xi = 1, each with probability 1/2. In this case a winning strategy for ∃ is a strategy that makes the probability of success greater than 1/2. Both in the case of the game Quantified Satisfiability and in the case of the Game Against Nature the following problem is PSPACE-complete [9]: given φ, decide whether there exists a winning strategy for ∃. There is a difference between Games Against Nature and our probabilistic variant of the Quantified Satisfiability game. In Games Against Nature the players use deterministic (pure) strategies: at a particular node of the game, the player ∃ (playing against Nature) is required to make a strategic move, say to choose the side of a coin. Nature, in turn, is required to toss a coin, but the probabilities associated with the coin tosses are fixed in advance and not chosen by Nature. Hence coin tosses correspond to "chance moves" in standard game-theoretic terminology. In our game, the biases of the coins are chosen strategically, in turn, by both players. Once the biases of all the coins are determined, the coins are tossed. Thus the values of x1, . . . , xn are determined, and ∃ wins iff φ(x1, . . . , xn) is true. More specifically, we consider two types of players. A probabilistic player, instead of determining the value of xi, chooses the probability pi of xi being 1, where we assume that the events {xi = 1} and {xj = 1} are independent for i ≠ j.
A player using a classical strategy, i.e. choosing the values 0 or 1, can be viewed as a probabilistic player as well, restricted to pi = 0 or pi = 1. The chosen probabilities p1, p2, . . . , pn determine the probability P(φ) of the event that φ is true. Now ∃ tries to make P(φ) as large as possible, whereas ∀ tries to minimize P(φ), so P(φ) can be regarded as the payoff of ∃. Notice that in the classical Quantified Satisfiability game the payoff P(φ) can only be 0 or 1. The following computational problem arises: given a formula φ, decide whether ∃ can make P(φ) greater than a fixed threshold c ∈ [0, 1). We study this problem and related ones in this paper. We prove that the problem whether ∃ can make P(φ) = 1 is Σ₂ᴾ-complete (see e.g. [10]) when ∀ is probabilistic, and that this question is PSPACE-complete when ∀ is classical. We show that it is PSPACE-hard to tell whether a probabilistic ∃ can enforce P(φ) ≥ c when the opponent ∀ is classical. Similarly, it is PSPACE-hard to tell whether a classical ∃ can make P(φ) > c when ∀ is probabilistic. In both cases we assume that the thresholds are fixed. We also present a Poly(|φ|, |log₂ ε|)-space algorithm which, given φ and ε > 0, returns a value that is ε-close to the maximal value of P(φ) attainable by ∃. We prove that for ⋈ ∈ {>, ≥} and for all types of players ∃ and ∀ (classical or probabilistic) the following set is
654
Marcin Rychlik
dense in [0, 1]: the set of constants c ∈ [0, 1] such that the language of Boolean formulas φ such that ∃ can make P(φ) ⋈ c is in PSPACE. For the proofs we refer the reader to [12].
2
Variants of the Problem of Quantified Satisfiability
Let V be a countable set of variables. Recall the definition of the set of Boolean formulas:

$$\Phi ::= 0 \;|\; 1 \;|\; V \;|\; {\sim}\Phi \;|\; (\Phi \vee \Phi) \;|\; (\Phi \wedge \Phi)\,.$$
Fix φ(v1, . . . , vn) ∈ Φ. Let xi ∈ {0, 1}, 1 ≤ i ≤ n. Then the meaning of φ(x1, . . . , xn) is the logical value of φ after replacing the variables v1, . . . , vn in φ by x1, . . . , xn respectively. Now let X1, . . . , Xn be pairwise independent random variables with range {0, 1}. Naturally, φ(X1, . . . , Xn) can be understood as the random variable with range {0, 1} such that P(φ(X1, . . . , Xn) = 1), also written P(φ(X1, . . . , Xn)) for short, equals the probability of the event that (X1, . . . , Xn) satisfies φ:

$$P(\varphi(X_1,\dots,X_n)=1)\;=\sum_{\substack{(x_1,\dots,x_n)\in\{0,1\}^n\\ \varphi(x_1,\dots,x_n)=1}}\;\prod_{i=1}^{n}P(X_i=x_i)\,.\qquad(1)$$
Note that P(φ(X1, . . . , Xn) = 1) is the expected value of φ(X1, . . . , Xn). In the sequel, P_{p1,...,pn}(φ) stands for P(φ(X1, . . . , Xn) = 1), where the Xi are arbitrary pairwise independent random variables satisfying P(Xi = 1) = pi, 1 ≤ i ≤ n. For all p1, . . . , pn ∈ [0, 1],

$$P_{p_1,\dots,p_n}(\varphi)\;=\sum_{\substack{(x_1,\dots,x_n)\in\{0,1\}^n\\ \varphi(x_1,\dots,x_n)=1}}\;\prod_{i=1}^{n}p_i^{\,x_i}\qquad(2)$$

where $p_i^{\,x_i} = p_i$ if $x_i = 1$ and $p_i^{\,x_i} = 1 - p_i$ if $x_i = 0$.

For the rest of this paper we shall assume that the range of the random variables we consider is {0, 1} and that differently named random variables are pairwise independent. For instance, X1 and X2 denote two pairwise independent random variables with range {0, 1}. We shall write φ(X1, . . . , Xn) as an abbreviation for P(φ(X1, . . . , Xn) = 1) = 1. Consider the following statement: "There is a random variable X such that for every random variable Y we have P(X ↔ Y) ≥ 1/2" (here we write φ1 ↔ φ2 as an abbreviation for ((φ1 ∨ ∼φ2) ∧ (∼φ1 ∨ φ2))). It is a true statement: consider the random variable X with P(X = 1) = 1/2. This statement can be rewritten as

$$\exists X\,\forall Y\;\; P(X \leftrightarrow Y) \ge \tfrac{1}{2}\,.$$
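Returning to equation (2): for small formulas it can be checked by direct enumeration of the satisfying assignments. Below is a minimal sketch in Python; the helper name `prob` and the encoding of φ as a Boolean-valued function are our own, not the paper's.

```python
from itertools import product

def prob(phi, ps):
    """P_{p1,...,pn}(phi) as in eq. (2): sum, over satisfying
    assignments, of the product of p_i (if x_i = 1) or 1 - p_i."""
    total = 0.0
    for xs in product((0, 1), repeat=len(ps)):
        if phi(*xs):
            w = 1.0
            for p, x in zip(ps, xs):
                w *= p if x else 1.0 - p
            total += w
    return total

# For phi = v1 OR v2 with p1 = p2 = 1/2, three of the four
# assignments satisfy phi, each of weight 1/4.
print(prob(lambda a, b: a or b, [0.5, 0.5]))  # 0.75
```

The enumeration takes time exponential in n, which is consistent with the remark below that an explicit expression for P(φ) can be exponentially large.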
We used uppercase letters X and Y to emphasize that they represent random variables. Sometimes we would like to state also: "There is a random variable X such that for every y ∈ {0, 1} we have P(X ↔ y) ≥ 1/2." This can be viewed as the previous statement with Y restricted to two random variables: one with P(Y = 1) = 1 and one with P(Y = 0) = 1. We will denote it by

$$\exists X\,\forall y\;\; P(X \leftrightarrow y) \ge \tfrac{1}{2}\,.$$
Here and subsequently, ∃X means that there is a random variable X, and ∃x means that there is a random variable x restricted to the two possibilities P(x = 1) = 1 or P(x = 0) = 1; similarly in the case of the quantifier ∀. We extend this notation to longer prefixes in the obvious way. Note that ∃x1∀y1∃x2 . . . φ has its usual meaning. Consider a formula of the form

$$Q_1 y_1\, Q_2 y_2\, Q_3 y_3 \dots Q_n y_n\;\; P(\varphi(y_1, y_2, y_3, \dots, y_n)) \bowtie c \qquad(3)$$

where ⋈ ∈ {≥, ≤, >, <}, c ∈ [0, 1), yi ∈ {xi, Xi}, Qi ∈ {∃, ∀}, 1 ≤ i ≤ n. We interpret formula (3) as a game between ∃ and ∀: the player Qi chooses yi in turn, for i = 1, . . . , n, and ∃ wins if P(φ(y1, y2, y3, . . . , yn)) ⋈ c holds after all the yi are chosen. ∃ has a winning strategy if he can make P(φ(y1, . . . , yn)) ⋈ c, and ∀ has a winning strategy if he can prevent P(φ(y1, . . . , yn)) ⋈ c. Obviously ∃ has a winning strategy iff formula (3) is true, and ∀ has a winning strategy iff the following formula is true:

$$\bar{Q}_1 y_1\, \bar{Q}_2 y_2 \dots \bar{Q}_n y_n\;\; \neg\big(P(\varphi(y_1, \dots, y_n)) \bowtie c\big)$$

where Q̄i is ∃ if Qi = ∀, and Q̄i is ∀ if Qi = ∃, 1 ≤ i ≤ n. In the case of ⋈ being '≥' or '>', ∃ tries to make P(φ(y1, . . . , yn)) as big as possible, and then it is natural to call P(φ(y1, . . . , yn)) the payoff of ∃. If yi = Xi for every yi chosen by ∃, then we call ∃ a probabilistic player and say that he uses a probabilistic strategy. If yi = xi for every yi chosen by ∃, then we call ∃ a classical player and say that he uses a classical strategy. We use similar terminology for the player ∀. For the rules of the game described in the introduction we can consider the following problem.

Problem 1. Fix c ∈ [0, 1). Given a Boolean formula φ, decide whether

$$\exists X_1\, \forall X_2\, \exists X_3 \dots Q_n X_n\;\; P(\varphi(X_1, X_2, X_3, \dots, X_n)) > c \qquad(4)$$

where the n-th quantifier Qn is ∃ if n is odd, and ∀ if n is even. In the case of a threshold c given by a finitely representable rational number, decidability of Problem 1 and of similar ones follows from Tarski's theorem on the decidability of the first-order theory of the field of real numbers. For example, we can rewrite the formula ∃X∀Y P(X ↔ Y) ≥ 1/2 as the following sentence of the theory of reals:

$$\exists p_X\, (0 \le p_X \le 1) \wedge \forall p_Y\, \big[(0 \le p_Y \le 1) \Rightarrow p_X p_Y + (1 - p_X)(1 - p_Y) \ge \tfrac{1}{2}\big]\,.$$
In general, an expression representing P(φ(X1, X2, X3, . . . , Xn)) can be of exponential size with respect to the size of φ. The following problem is PSPACE-complete [1].

Problem 2 (Quantified Satisfiability). Given a formula φ, decide whether ∃x1∀x2∃x3 . . . Qnxn φ.

One might conjecture that ∃X1∀X2∃X3 . . . QnXn φ(X1, X2, X3, . . . , Xn) is equivalent to ∃x1∀x2 . . . Qnxn φ(x1, x2, x3, . . . , xn), but this is not true, as the following example shows.

Example 1. Let φ = v1 ↔ v2. Then ∀x1∃x2 φ(x1, x2) is true, but it is not true that ∀X1∃X2 φ(X1, X2), because if P(X1 = 1) = 1/2 then

$$P(\varphi(X_1, X_2)) = P(X_1 = 0)P(X_2 = 0) + P(X_1 = 1)P(X_2 = 1) = \tfrac{1}{2}P(X_2 = 0) + \tfrac{1}{2}P(X_2 = 1) = \tfrac{1}{2} < 1$$

whatever X2 is chosen.
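Example 1 can be replayed numerically: once ∀ fixes P(X₁ = 1) = 1/2, the payoff is 1/2 whatever distribution ∃ picks for X₂. A small self-contained check of our own, with φ encoded as a Python function:

```python
from itertools import product

def prob(phi, ps):
    # eq. (2): sum over satisfying assignments of products of p_i / (1 - p_i)
    total = 0.0
    for xs in product((0, 1), repeat=len(ps)):
        if phi(*xs):
            w = 1.0
            for p, x in zip(ps, xs):
                w *= p if x else 1.0 - p
            total += w
    return total

iff = lambda a, b: a == b  # v1 <-> v2
for p2 in (0.0, 0.25, 0.5, 0.9, 1.0):
    assert abs(prob(iff, [0.5, p2]) - 0.5) < 1e-12
```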
The next example shows that for some Boolean formulas φ the quantified formula ∃x1∀x2 . . . Qnxn φ is true whereas ∃X1∀X2 . . . QnXn P(φ(X1, . . . , Xn)) ≥ c is true only when c is negligible.

Example 2. Let $\varphi = \bigwedge_{i=1}^{n}(v_{2i-1} \leftrightarrow v_{2i})$. Then ∀x1∃x2 . . . ∃x2n φ(x1, . . . , x2n) is true, but ∀X1∃X2 . . . ∃X2n P(φ(X1, . . . , X2n)) ≥ c is not true unless c ≤ 1/2ⁿ. If we set P(X_{2i−1} = 1) = 1/2 for all 1 ≤ i ≤ n, then P(X_{2i−1} ↔ X_{2i}) = 1/2 for all 1 ≤ i ≤ n (see the previous example) and in consequence $P(\varphi(X_1,\dots,X_{2n})) = \prod_{i=1}^{n} P(X_{2i-1} \leftrightarrow X_{2i}) = \tfrac{1}{2^n}$, no matter how ∃ chooses X2, . . . , X2n. We used the fact that for arbitrary Boolean formulas φ1(v1, . . . , vn) and φ2(w1, . . . , wm), P(φ1(X1, . . . , Xn) ∧ φ2(Y1, . . . , Ym)) = P(φ1(X1, . . . , Xn)) · P(φ2(Y1, . . . , Ym)) when the Xi and Yi are pairwise independent random variables.
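The product identity used above is easy to confirm on disjoint blocks of variables: with the odd-indexed probabilities pinned to 1/2, the conjunction of n biconditionals has probability exactly 2⁻ⁿ regardless of the even-indexed choices. A small numerical check of our own for n = 2:

```python
from itertools import product

def prob(phi, ps):
    # eq. (2): brute-force sum over satisfying assignments
    total = 0.0
    for xs in product((0, 1), repeat=len(ps)):
        if phi(*xs):
            w = 1.0
            for p, x in zip(ps, xs):
                w *= p if x else 1.0 - p
            total += w
    return total

# phi = (v1 <-> v2) AND (v3 <-> v4); odd positions get probability 1/2
phi = lambda a, b, c, d: (a == b) and (c == d)
for p2, p4 in [(0.0, 1.0), (0.3, 0.8), (0.5, 0.5)]:
    assert abs(prob(phi, [0.5, p2, 0.5, p4]) - 0.25) < 1e-12
```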
The example above may seem to suggest that if a player has no winning strategy, then the best he can do is to always choose probability 1/2. The following example illustrates that this need not be the case.

Example 3. Consider a formula φ(v1, v2, v3, v4) such that φ(x1, x2, x3, x4) is true if and only if

(x1, x2, x3, x4) ∈ {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 1, 1, 0), (1, 1, 1, 0), (1, 0, 0, 1), (0, 1, 0, 1), (1, 1, 0, 1)}.
One can check that ∃x1∀x2∃x3∀x4 φ(x1, x2, x3, x4) is not true. The value F(p) defined by

$$F(p) = \begin{cases} p & \text{if } 0 \le p \le \tfrac{1}{2} \\[4pt] \dfrac{-1+3p-p^2-p^3+p^2\sqrt{-1+2p+p^2}}{2p\sqrt{-1+2p+p^2}} & \text{if } \tfrac{1}{2} \le p \le 1 \end{cases}$$

is the maximal value of P(φ) available to ∃ when ∃ chooses P(X1 = 1) = p in the first move. The computation is explained in [12] (see also Example 4). The value of p maximizing F(p) is

$$p^* = \tfrac{1}{6}\sqrt[3]{53 - 6\sqrt{78}} + \tfrac{1}{6}\sqrt[3]{53 + 6\sqrt{78}} - \tfrac{1}{6} \approx 0.657298 \ne \tfrac{1}{2}\,.$$

If ∃ chose P(X1 = 1) = 1/2 instead, then he could attain
at most F(1/2) = 1/2, which is less than F(p*) ≈ 0.553906. It is worth noting that both p* and F(p*) are irrational¹. It may come as a slight surprise that the problem of deciding whether ∃X1∀X2∃X3 . . . QnXn φ is not as hard as QSAT, unless PSPACE collapses to the second level of the polynomial hierarchy (see also the Summary below):

Theorem 1.

$$\exists x_1 \exists x_3 \dots \exists x_\iota\, \forall x_2 \forall x_4 \dots \forall x_\kappa\; \varphi \;\Leftrightarrow\; \exists x_1 \forall X_2 \exists x_3 \dots Q_n \chi_n\; \varphi \;\Leftrightarrow\; \exists x_1 \forall X_2 \exists x_3 \dots Q_n \chi_n\; P(\varphi) > 1 - \tfrac{1}{2^{n/2}}$$
$$\Leftrightarrow\; \exists X_1 \forall X_2 \exists X_3 \dots Q_n X_n\; \varphi \;\Leftrightarrow\; \exists X_1 \forall X_2 \exists X_3 \dots Q_n X_n\; P(\varphi) > 1 - \tfrac{1}{2^{n/2}}$$

where χn is xn if n is odd and Xn if n is even, and ι = n, κ = n − 1 if n is odd, and ι = n − 1, κ = n if n is even.

The following theorem shows that if the player ∀ is classical, then a probabilistic strategy does not add power to ∃ when the threshold c is set to 1.

Theorem 2.

$$\exists x_1 \forall x_2 \exists x_3 \dots Q_n x_n\; \varphi \;\Leftrightarrow\; \exists X_1 \forall x_2 \exists X_3 \dots Q_n \kappa_n\; \varphi \;\Leftrightarrow\; \exists X_1 \forall x_2 \exists X_3 \dots Q_n \kappa_n\; P(\varphi) > 1 - \tfrac{1}{2^{n/2}}$$

where κn is Xn if n is odd and xn if n is even.
¹ We used the command FullSimplify[x ∈ Rationals] in Mathematica ver. 4.0.1.0 (Wolfram Research, Inc.) to get this result.
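The claim of Example 3 can be sanity-checked numerically by nesting a grid search over the continuous choices; since P(φ) is linear in p₄, the innermost minimum may be taken over the endpoints {0, 1}. The grid-search helper below is our own sketch, not the paper's polynomial-space algorithm:

```python
from itertools import product

# Satisfying assignments of phi from Example 3
SAT = {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 1, 1, 0),
       (1, 1, 1, 0), (1, 0, 0, 1), (0, 1, 0, 1), (1, 1, 0, 1)}

def prob(ps):
    # eq. (2) specialized to the 4-variable formula phi
    total = 0.0
    for xs in product((0, 1), repeat=4):
        if xs in SAT:
            w = 1.0
            for p, x in zip(ps, xs):
                w *= p if x else 1.0 - p
            total += w
    return total

GRID = [i / 100 for i in range(101)]

def F(p1):
    """Approximate min_{p2} max_{p3} min_{p4} P_{p1,p2,p3,p4}(phi)."""
    return min(max(min(prob((p1, p2, p3, p4)) for p4 in (0.0, 1.0))
                   for p3 in GRID)
               for p2 in GRID)
```

On this grid, F(0.3) comes out close to 0.3 and F(0.657298) close to 0.554, matching the values stated in Example 3 up to the grid resolution.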
3
Game Value
Definition. Let φ(v1, . . . , vn) ∈ Φ.

$$c_\varphi = \max_{p_1\in[0,1]}\ \min_{p_2\in[0,1]}\ \cdots\ \mathrm{Opt}_{p_n\in[0,1]}\ P_{p_1,\dots,p_n}(\varphi) \qquad(5)$$

$$\overline{c}_\varphi = \max_{p_1\in[0,1]}\ \min_{p_2\in\{0,1\}}\ \cdots\ \mathrm{Opt}_{p_n\in\Delta_n}\ P_{p_1,\dots,p_n}(\varphi) \qquad(6)$$

$$\underline{c}_\varphi = \max_{p_1\in\{0,1\}}\ \min_{p_2\in[0,1]}\ \cdots\ \mathrm{Opt}_{p_n\in\Lambda_n}\ P_{p_1,\dots,p_n}(\varphi) \qquad(7)$$

where Opt, Δn, Λn are max, [0, 1], {0, 1} respectively if n is odd, and min, {0, 1}, [0, 1] if n is even. Let
$$\Gamma = \{c_\varphi : \varphi \in \Phi\},\quad \overline{\Gamma} = \{\overline{c}_\varphi : \varphi \in \Phi\},\quad \underline{\Gamma} = \{\underline{c}_\varphi : \varphi \in \Phi\}.$$

The values on the right-hand sides of formulas (5), (6) and (7), call them the game values, are well defined because the sets [0, 1] and {0, 1} are compact and for 1 < i ≤ n the following maps are continuous with respect to p1, . . . , pi (here Opt_j denotes max if j is odd and min if j is even):

$$(p_1,\dots,p_i) \;\mapsto\; \mathrm{Opt}_{i+1,\,p_{i+1}\in[0,1]}\cdots\,\mathrm{Opt}_{n,\,p_n\in[0,1]}\; P_{p_1,\dots,p_n}(\varphi)$$
$$(p_1,\dots,p_i) \;\mapsto\; \mathrm{Opt}_{i+1,\,p_{i+1}\in\Delta_{i+1}}\cdots\,\mathrm{Opt}_{n,\,p_n\in\Delta_n}\; P_{p_1,\dots,p_n}(\varphi)$$
$$(p_1,\dots,p_i) \;\mapsto\; \mathrm{Opt}_{i+1,\,p_{i+1}\in\Lambda_{i+1}}\cdots\,\mathrm{Opt}_{n,\,p_n\in\Lambda_n}\; P_{p_1,\dots,p_n}(\varphi)\,.$$

P_{p1,...,pn}(φ) itself is continuous (the case i = n) because it is a multilinear map (recall (2)). The continuity of the maps in the case i < n can be proved inductively using the following lemma.

Lemma 1. Assume f : S × T → ℝ is a continuous map and S, T are compact spaces. Then F defined by F(s) = max_{t∈T} f(s, t) is also continuous.
The values $c_\varphi$, $\overline{c}_\varphi$, $\underline{c}_\varphi$ defined by (5), (6) and (7) are the maximal attainable payoffs of ∃ in the corresponding games. To see this, observe that if f(p) is the payoff of the player corresponding to a choice p ∈ P, where P is the compact set of all possible choices, then F = max_{p∈P} f(p) is the maximal attainable payoff of the player, provided f is a continuous map.

Example 4. Let φ be as in Example 3. Then

$$c_\varphi = \max_{p_1\in[0,1]}\,\min_{p_2\in[0,1]}\,\max_{p_3\in[0,1]}\,\min_{p_4\in[0,1]}\, P_{p_1,p_2,p_3,p_4}(\varphi) = F(p^*)$$

$$\overline{c}_\varphi = \max_{p_1\in[0,1]}\,\min_{p_2\in\{0,1\}}\,\max_{p_3\in[0,1]}\,\min_{p_4\in\{0,1\}}\, P_{p_1,p_2,p_3,p_4}(\varphi) = \tfrac{1}{2}\big(\sqrt{5} - 1\big) \approx 0.618034$$

where F and p* are defined in Example 3.
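Example 4's second value can likewise be approximated numerically: restricting ∀ to p₂, p₄ ∈ {0, 1} while ∃ searches p₁, p₃ over a grid yields a value close to (√5 − 1)/2. This grid search is our own sanity check, not the paper's computation:

```python
from itertools import product

# Satisfying assignments of phi from Example 3
SAT = {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 1, 1, 0),
       (1, 1, 1, 0), (1, 0, 0, 1), (0, 1, 0, 1), (1, 1, 0, 1)}

def prob(ps):
    total = 0.0
    for xs in product((0, 1), repeat=4):
        if xs in SAT:
            w = 1.0
            for p, x in zip(ps, xs):
                w *= p if x else 1.0 - p
            total += w
    return total

GRID = [i / 100 for i in range(101)]

# Classical opponent: p2 and p4 range over {0, 1} only
c_bar = max(min(max(min(prob((p1, p2, p3, p4)) for p4 in (0.0, 1.0))
                    for p3 in GRID)
                for p2 in (0.0, 1.0))
            for p1 in GRID)
```

On this grid, `c_bar` comes out near 0.618, the golden-ratio value stated above, and in particular above the fully probabilistic value F(p*) ≈ 0.554, consistent with the inequality $\underline{c}_\varphi \le c_\varphi \le \overline{c}_\varphi$.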
One can easily check that for every formula φ the following equations hold, relating the game values for φ and ∼φ:

$$1 = c_{\varphi(v_1,\dots,v_n)} + c_{\sim\varphi(v_0,v_1,\dots,v_n)} = \overline{c}_{\varphi(v_1,\dots,v_n)} + \underline{c}_{\sim\varphi(v_0,v_1,\dots,v_n)} = \overline{c}_{\sim\varphi(v_0,v_1,\dots,v_n)} + \underline{c}_{\varphi(v_1,\dots,v_n)} \qquad(8)$$

where we used a dummy variable v0, not occurring in φ, to force x1 or X1 (according to the type of the game) to be chosen by ∀. Observe that by (8) we have $\underline{\Gamma} = \{1 - \gamma : \gamma \in \overline{\Gamma}\}$. We also have the inequalities $\underline{c}_\varphi \le c_\varphi \le \overline{c}_\varphi$.

Theorem 3. For every $c \in \overline{\Gamma} \setminus \{0\}$ the following problem is PSPACE-hard: given φ, decide whether ∃X1∀x2∃X3 . . . Qnκn P(φ) ≥ c.

Theorem 4. For every $c \in \underline{\Gamma} \setminus \{1\}$ the following problem is PSPACE-hard: given φ, decide whether ∃x1∀X2∃x3 . . . Qnχn P(φ) > c.

Theorems 1, 2, 3, 4 are summarized below. We rephrase them in game-theoretic terms; that is, the problem concerning ∃X1∀x2∃X3 . . . Qnκn P(φ) > c is considered as the problem of ∃ using a probabilistic strategy against ∀ using a classical strategy, and similarly for the other cases.

Summary of the complexity results. Assume φ is given and c is an arbitrary fixed number c ∈ [0, 1), until otherwise stated. We put three questions: can ∃ make (i) P(φ) = 1, (ii) P(φ) > c, (iii) P(φ) ≥ c? Our complexity results (when one or both players are probabilistic) depend on the types of strategies that the players use. (Of course, if both players are classical, the results are obvious consequences of the PSPACE-completeness of QSAT.) P(φ) = 1
∃ \ ∀         | Probabilistic   | Classical
Classical     | Σ₂ᴾ-complete    | PSPACE-complete
Probabilistic | Σ₂ᴾ-complete    | PSPACE-complete

P(φ) > c

∃ \ ∀         | Probabilistic   | Classical
Classical     | PSPACE-hard**   | PSPACE-complete
Probabilistic | Σ₂ᴾ-hard*       | PSPACE-hard*

P(φ) ≥ c

∃ \ ∀         | Probabilistic   | Classical
Classical     | ?               | PSPACE-complete
Probabilistic | ?               | PSPACE-hard***

* when c is part of the input
** when $c \in \underline{\Gamma} \setminus \{1\}$
*** when $c \in \overline{\Gamma} \setminus \{0\}$
The next theorem yields partial information concerning the shape of the sets $\Gamma$, $\overline{\Gamma}$ and $\underline{\Gamma}$. A number b is a binary rational if $b = \sum_{i=1}^{n} b_i \frac{1}{2^i}$ for some n and some b1, . . . , bn ∈ {0, 1}. Let Υ be the set of all binary rationals in [0, 1].

Theorem 5. $\Upsilon \subsetneq \Gamma$, $\Upsilon \subsetneq \overline{\Gamma}$ and $\Upsilon \subsetneq \underline{\Gamma}$.

Corollary. The sets $\Gamma$, $\overline{\Gamma}$ and $\underline{\Gamma}$ are dense subsets of the interval [0, 1].

We say that λ′ is ε-close to λ if |λ − λ′| ≤ ε.

Theorem 6. Let Δi = [0, 1] or Δi = {0, 1} for every 1 ≤ i ≤ n. Given φ(x1, . . . , xn) and ε > 0, we can compute in O(log₂|φ| + n log₂ n + n|log₂ ε|) space a number λ′ that is ε-close to $\lambda = \max_{p_1\in\Delta_1}\,\min_{p_2\in\Delta_2}\,\max_{p_3\in\Delta_3}\cdots\,\mathrm{Opt}_{p_n\in\Delta_n}\, P_{p_1,\dots,p_n}(\varphi)$.
In particular, we can compute approximations of the game values $c_\varphi$, $\overline{c}_\varphi$, $\underline{c}_\varphi$ within the bound just mentioned. One may ask whether Theorem 6 can be used to solve Problem 1 in polynomial space, at least for some c. Lemma 2 enables us to give an affirmative answer to this question.

Lemma 2. Let D ⊆ Σ* be a language over a finite alphabet Σ, |Σ| ≥ 2, and let P be a map P : D → [0, 1]. Assume that for a given d ∈ D we can compute in space O(Poly(|d|, |log ε|)) a value P(d, ε) that is ε-close to P(d). Let ⋈ ∈ {≥, >}. Then the set {c ∈ [0, 1] : the language {d ∈ D | P(d) ⋈ c} is in PSPACE} is a dense subset of [0, 1].

As a corollary we get:

Theorem 7. Let ⋈ ∈ {≥, >}. The sets

{c ∈ [0, 1] : the language {φ ∈ Φ | $c_\varphi$ ⋈ c} is in PSPACE}
{c ∈ [0, 1] : the language {φ ∈ Φ | $\overline{c}_\varphi$ ⋈ c} is in PSPACE}
{c ∈ [0, 1] : the language {φ ∈ Φ | $\underline{c}_\varphi$ ⋈ c} is in PSPACE}

are dense subsets of [0, 1].
4
Conclusion
We have completely answered the question of the complexity of the problem whether ∃ has a strategy to achieve payoff 1, for all combinations of types of players. (For both players classical this is the classical QSAT problem.) We have shown PSPACE-hardness of the question whether a classical ∃ can make the payoff greater than a fixed c when ∀ uses a probabilistic strategy. In the case of a probabilistic ∃ and a classical ∀ we need c to be part of the input to
get PSPACE-hardness. We have a PSPACE-hardness result in the case of fixed c when we ask whether ∃ can make the payoff greater than or equal to c. We have given a Σ₂ᴾ lower bound for the question "P(φ) > c?" in the case of both players being probabilistic and c belonging to the input. We have also indicated that for every problem mentioned it is possible to find a dense subset of thresholds for which the problem is in PSPACE. Still, many problems remain open. It would be nice to have a PSPACE-completeness result for the question "P(φ) > c?" or "P(φ) ≥ c?" for some fixed c (c = 1/2, for instance) and for all combinations of types of players. Also, the complexity of the problem of computing approximations of game values (or exact values, if possible) remains to be studied. This is the subject of ongoing research.
Acknowledgement. The author wishes to express his thanks to Prof. Damian Niwiński for many stimulating conversations.
References
1. A. Chandra, D. Kozen, and L. Stockmeyer, Alternation, Journal of the ACM, 28 (1981), pp. 114-133
2. A. Condon, Computational Models of Games, ACM Distinguished Dissertation, MIT Press, Cambridge, 1989
3. A. Condon and R. Ladner, Probabilistic Game Automata, Proceedings of the 1st Structure in Complexity Theory Conference, Lecture Notes in Computer Science, vol. 223, Springer, Berlin, 1986, pp. 144-162
4. J. Feigenbaum, D. Koller, P. Shor, A Game-Theoretic Classification of Interactive Complexity Classes, Proceedings of the 10th Annual IEEE Conference on Structure in Complexity Theory (STRUCTURES), Minneapolis, Minnesota, June 1995, pp. 227-237
5. S. Goldwasser, M. Sipser, Private coins versus public coins in interactive proof systems, Randomness and Computation, S. Micali, editor, vol. 5 of Advances in Computing Research, JAI Press, Greenwich, 1989, pp. 73-90
6. D. Koller, N. Megiddo, The Complexity of Two-Person Zero-Sum Games in Extensive Form, Games and Economic Behavior, 4:528-552, 1992
7. D. Koller, N. Megiddo, B. von Stengel, Fast Algorithms for Finding Randomized Strategies in Game Trees, Proceedings of the 26th Symposium on Theory of Computing, ACM, New York, 1994, pp. 750-759
8. R. Lipton, N. Young, Simple strategies for large zero-sum games with applications to complexity theory, Contributions to the Theory of Games II, H. Kuhn, A. Tucker, editors, Princeton University Press, Princeton, 1953, pp. 193-216
9. C. Papadimitriou, Games Against Nature, Journal of Computer and System Sciences, 31 (1985), pp. 288-301
10. C. Papadimitriou, Computational Complexity, Addison-Wesley Pub. Co., 1994
11. C. Papadimitriou, M. Yannakakis, On Complexity as Bounded Rationality, Proceedings of the 26th Symposium on Theory of Computing, ACM, New York, 1994, pp. 726-733
12. M. Rychlik, On Probabilistic Quantified Satisfiability Games, available at http://www.mimuw.edu.pl/~mrychlik/papers
A Completeness Property of Wilke's Tree Algebras

Saeed Salehi
Turku Center for Computer Science
Lemminkäisenkatu 14 A
FIN-20520 Turku
[email protected]
Abstract. Wilke's tree algebra formalism for characterizing families of tree languages is based on six operations involving letters, binary trees and binary contexts. In this paper a completeness property of these operations is studied. It is claimed that all functions involving letters, binary trees and binary contexts which preserve all syntactic tree algebra congruence relations of tree languages are generated by Wilke's functions, provided the alphabet contains at least seven letters. The long proof is omitted due to the page limit. Instead, a corresponding theorem for term algebras, which yields a special case of the above-mentioned theorem, is proved: in every term algebra whose signature contains at least seven constant symbols, all congruence preserving functions are term functions.
1
Introduction
A new formalism for characterizing families of tree languages was introduced by Wilke [13], which can be regarded as a combination of the universal algebraic framework of Steinby [11] and Almeida [1], in the case of binary trees, based on syntactic algebras, and the syntactic monoid/semigroup framework of Thomas [12] and of Nivat and Podelski [8],[9]. It is based on three-sorted algebras, whose signature Σ consists of six operation symbols involving the sorts Alphabet, Tree and Context. Binary trees over an alphabet are represented by terms over Σ, namely as Σ-terms of sort Tree. A tree algebra is a Σ-algebra satisfying certain identities which identify (some) pairs of Σ-terms representing the same tree. The syntactic tree algebra congruence relation of a tree language is defined in a natural way (see Definition 1 below). The Tree-sort component of the syntactic tree algebra of a tree language is the syntactic algebra of the language in the sense of [11], while its Context-component is the semigroup part of the syntactic monoid of the tree language, as in [12]. A tree language is regular iff its syntactic tree algebra is finite ([13], Proposition 2). A special sub-class of regular tree languages, that of k-frontier testable tree languages, is characterized in [13] by a set of identities satisfied by the corresponding syntactic tree algebra. For characterizing this sub-class, the three-sorted tree algebra framework appears to be more suitable, since "frontier testable tree languages cannot be characterized by syntactic semigroups and there is no known finite characterization of frontier testability (for an arbitrary k) in the universal algebra framework" [7].

B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 662–670, 2003. © Springer-Verlag Berlin Heidelberg 2003

This paper concerns Wilke's functions (Definition 2), by which the tree algebra formalism for characterizing families of tree languages is established ([13]). We claim that Wilke's functions generate all congruence preserving operations on the term algebra of trees when the alphabet contains at least seven labels. For the sake of brevity, we do not treat tree languages and Wilke's functions in the many-sorted algebra framework as done in [13]; our approach is rather a continuation of the traditional framework, as in e.g. [11]. A more comprehensive general study of tree algebras and Wilke's formalism (independent from this work) has been initiated by Steinby and Salehi [10].
2
Tree Algebraic Functions
For an alphabet A, let Σᴬ be the signature which contains a constant symbol c_a and a binary function symbol f_a for every a ∈ A, that is, Σᴬ = {c_a | a ∈ A} ∪ {f_a | a ∈ A}. The set of binary trees over A, denoted by T_A, is defined inductively by:
– c_a ∈ T_A for every a ∈ A; and
– f_a(t1, t2) ∈ T_A whenever t1, t2 ∈ T_A and a ∈ A.
Fix a new symbol ξ which does not appear in A. Binary contexts over A are binary trees over A ∪ {ξ} in which ξ appears exactly once. The set of binary contexts over A, denoted by C_A, can be defined inductively by:
– ξ, f_a(t, ξ), f_a(ξ, t) ∈ C_A for every a ∈ A and every t ∈ T_A; and
– f_a(t, p), f_a(p, t) ∈ C_A for every a ∈ A, every t ∈ T_A, and every p ∈ C_A.
For p, q ∈ C_A and t ∈ T_A, p(q) ∈ C_A and p(t) ∈ T_A are obtained from p by replacing the single occurrence of ξ by q or by t, respectively.

Definition 1. For a tree language L ⊆ T_A we define the syntactic tree algebra congruence relation of L, denoted by ≈ᴸ = (≈ᴸ_A, ≈ᴸ_T, ≈ᴸ_C), as follows:
1. For any a, b ∈ A, a ≈ᴸ_A b ≡ ∀p ∈ C_A {p(c_a) ∈ L ↔ p(c_b) ∈ L} & ∀p ∈ C_A ∀t1, t2 ∈ T_A {p(f_a(t1, t2)) ∈ L ↔ p(f_b(t1, t2)) ∈ L}.
2. For any t, s ∈ T_A, t ≈ᴸ_T s ≡ ∀p ∈ C_A {p(t) ∈ L ↔ p(s) ∈ L}.
3. For any p, q ∈ C_A, p ≈ᴸ_C q ≡ ∀r ∈ C_A ∀t ∈ T_A {r(p(t)) ∈ L ↔ r(q(t)) ∈ L}.

Remark 1. Our definition of the syntactic tree algebra congruence relation of a tree language is that of [13], but we have corrected a mistake in Wilke's definition of ≈ᴸ_A; it is easy to see that the original definition (page 72 of [13]) does not yield a congruence relation. Another difference is that ξ is not a context in [13].
Definition 2. ([13], page 88) For an alphabet A, Wilke's functions over A are defined by:

ιᴬ : A → T_A,          ιᴬ(a) = c_a
κᴬ : A × T_A² → T_A,   κᴬ(a, t1, t2) = f_a(t1, t2)
λᴬ : A × T_A → C_A,    λᴬ(a, t) = f_a(ξ, t)
ρᴬ : A × T_A → C_A,    ρᴬ(a, t) = f_a(t, ξ)
σᴬ : C_A² → C_A,       σᴬ(p1, p2) = p1(p2)
ηᴬ : C_A × T_A → T_A,  ηᴬ(p, t) = p(t)
Recall that the projection functions πⁿⱼ : B1 × · · · × Bn → Bj (for sets B1, · · · , Bn) are defined by πⁿⱼ(b1, · · · , bn) = bj. For b ∈ Bj, the constant function from B1 × · · · × Bn to Bj determined by b is defined by (b1, · · · , bn) ↦ b.

Definition 3. For an alphabet A, a function F : Aⁿ × T_Aᵐ × C_Aᵏ → X where X ∈ {A, T_A, C_A} is called tree-algebraic over A if it is a composition of Wilke's functions over A, constant functions and projection functions.

Example 1. Let A = {a, b}. The function F : A × T_A × C_A → C_A defined by
F(x, t, p) = f_a( f_x(f_b(c_a, c_a), ξ), p(f_b(t, c_x)) ),
for x ∈ A, t ∈ T_A and p ∈ C_A, is tree-algebraic over A. Indeed,
F(x, t, p) = σᴬ( λᴬ(a, ηᴬ(p, κᴬ(b, t, ιᴬ(x)))), ρᴬ(x, f_b(c_a, c_a)) ).

Definition 4. A function F : Aⁿ × T_Aᵐ × C_Aᵏ → X where X ∈ {A, T_A, C_A} is called congruence preserving over A if for every tree language L ⊆ T_A and for all a1, b1, · · · , an, bn ∈ A, t1, s1, · · · , tm, sm ∈ T_A, p1, q1, · · · , pk, qk ∈ C_A,
if a1 ≈ᴸ_A b1, · · · , an ≈ᴸ_A bn, t1 ≈ᴸ_T s1, · · · , tm ≈ᴸ_T sm, and p1 ≈ᴸ_C q1, · · · , pk ≈ᴸ_C qk,
then F(a1, · · · , an, t1, · · · , tm, p1, · · · , pk) ≈ᴸ_x F(b1, · · · , bn, s1, · · · , sm, q1, · · · , qk),
where x is A, T, or C, if X = A, X = T_A, or X = C_A, respectively.

Remark 2. In universal algebra, the functions which preserve the congruence relations of an algebra are called congruence preserving functions. On the other hand, it is known that every congruence relation on an algebra is the intersection of some syntactic congruence relations (see Remark 2.12 of [1] or Lemma 6.2 of [11]). So a function preserves all congruence relations of an algebra iff it preserves the syntactic congruence relations of all subsets of the algebra. This justifies the notion of congruence preserving function in our Definition 4, even though we require only that the function preserves all the syntactic tree algebra congruence relations of tree languages.
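Wilke's six functions are easy to realize concretely on trees represented as nested tuples. The sketch below is our own encoding (the hole ξ is a sentinel value, and the Python names `iota`, `kappa`, `lam`, `rho`, `apply_ctx` are ours); it can be used to replay the decomposition given in Example 1 above:

```python
# Trees: ('c', a) is the leaf c_a; ('f', a, l, r) is f_a(l, r).
# Contexts reuse the same shapes, with XI marking the unique hole.
XI = ('xi',)

def iota(a):          return ('c', a)          # iota^A(a) = c_a
def kappa(a, t1, t2): return ('f', a, t1, t2)  # kappa^A(a, t1, t2) = f_a(t1, t2)
def lam(a, t):        return ('f', a, XI, t)   # lambda^A(a, t) = f_a(xi, t)
def rho(a, t):        return ('f', a, t, XI)   # rho^A(a, t) = f_a(t, xi)

def apply_ctx(p, x):
    """sigma^A / eta^A: replace the unique hole of context p by x."""
    if p == XI:
        return x
    if p[0] == 'c':
        return p
    _, a, l, r = p
    return ('f', a, apply_ctx(l, x), apply_ctx(r, x))

sigma = eta = apply_ctx

# Example 1 instantiated at x = 'b', t = c_b, p = f_a(xi, c_a):
x, t, p = 'b', iota('b'), lam('a', iota('a'))
composed = sigma(lam('a', eta(p, kappa('b', t, iota(x)))),
                 rho(x, kappa('b', iota('a'), iota('a'))))
# Direct reading: f_a( f_x(f_b(c_a, c_a), xi), p(f_b(t, c_x)) )
direct = ('f', 'a',
          ('f', x, ('f', 'b', ('c', 'a'), ('c', 'a')), XI),
          apply_ctx(p, ('f', 'b', t, ('c', x))))
assert composed == direct
```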
Example 2. For A = {a, b}, the root function root : T_A → A, which maps a tree to its root label, is not congruence preserving. Let L = {f_a(c_b, c_b)}. Then f_a(c_a, c_a) ≈ᴸ_T f_b(c_a, c_a), but since f_a(c_b, c_b) ∈ L and f_b(c_b, c_b) ∉ L, we have root(f_a(c_a, c_a)) = a ≉ᴸ_A b = root(f_b(c_a, c_a)).

Lemma 1. All tree-algebraic functions are congruence preserving.

The easy proof is omitted. We claim the converse for alphabets containing at least seven labels:

Theorem 1. For an alphabet A which contains at least seven labels, every congruence preserving function over A is tree-algebraic over A.

Remark 3. The condition |A| ≥ 7 in Theorem 1 may seem odd at first glance, but the theorem does not hold for |A| = 2: let A = {a, b} and define F : A → T_A by F(a) = f_a(c_b, c_b), F(b) = f_b(c_a, c_a). It can easily be seen that F is congruence preserving but not tree-algebraic over A. It is not clear at the moment whether Theorem 1 holds for 3 ≤ |A| ≤ 6.

The long detailed proof of Theorem 1 will not be given in this paper because of space limitations. Instead, in the next section, a corresponding theorem for term algebras, which immediately yields Theorem 1 for congruence preserving functions of the form F : T_Aᵐ → T_A, is proved.
3 Congruence Preserving Functions in Term Algebras
Our notation mainly follows [2], [3], [5], [6], and [11]. A ranked alphabet is a finite nonempty set of symbols, each of which has a unique non-negative arity (or rank). The set of m-ary symbols in a ranked alphabet Σ is denoted by Σm (for each m ≥ 0). TΣ(X) is the set of Σ-terms with variables in X; for empty X it is denoted simply by TΣ. Note that (TΣ(X), Σ) is a Σ-algebra, and (TΣ, Σ) is called the term algebra over Σ. For L ⊆ TΣ, let ≈L be the syntactic congruence relation of L ([11]), i.e., the greatest congruence on the term algebra TΣ saturating L. Let Σ denote a signature with the property that Σ ≠ Σ0. Throughout, X is always a set of variables. Definition 5. A function F : (TΣ)^n → TΣ is congruence preserving if for every congruence relation Θ over TΣ and all t1, ..., tn, s1, ..., sn ∈ TΣ, if t1 Θ s1, ..., tn Θ sn, then F(t1, ..., tn) Θ F(s1, ..., sn). Remark 4. A congruence preserving function F : (TΣ)^n → TΣ induces a well-defined function FΘ : (TΣ/Θ)^n → TΣ/Θ on any quotient algebra, for any congruence Θ on TΣ, defined by FΘ([t1]Θ, ..., [tn]Θ) = [F(t1, ..., tn)]Θ.
Saeed Salehi
For terms u1, ..., un ∈ TΣ(X) and t ∈ TΣ(X ∪ {x1, ..., xn}) with x1, ..., xn ∉ X, the term t[x1/u1, ..., xn/un]¹ ∈ TΣ(X) results from t by replacing all occurrences of xi by ui, for all i ≤ n. The function (TΣ)^n → TΣ(X) defined by (u1, ..., un) → t[x1/u1, ..., xn/un] for all u1, ..., un ∈ TΣ is called the term function² defined by t. The rest of the paper is devoted to the proof of the following theorem: Theorem 2. If |Σ0| ≥ 7, then every congruence preserving F : (TΣ)^n → TΣ, for every n ∈ IN, is a term function (i.e., there is a term t ∈ TΣ({x1, ..., xn}), where x1, ..., xn are variables, such that F(u1, ..., un) = t[x1/u1, ..., xn/un] for all u1, ..., un ∈ TΣ). Remark 5. Theorem 2 does not hold for |Σ0| = 1: Let Σ = Σ0 ∪ Σ1 be a signature with Σ1 = {α} and Σ0 = {ζ0}. The term algebra (TΣ, Σ) is isomorphic to (IN, 0, S), where 0 is the constant zero and S is the successor function. Let F : IN → IN be defined by F(n) = 2n. It is easy to see that F is congruence preserving: for every congruence relation Θ, if n Θ m then Sn Θ Sm, and by repeating the same argument n times we get S^n(n) Θ S^n(m), i.e., 2n Θ n + m. Similarly S^m(n) Θ S^m(m), so n + m Θ 2m; hence 2n Θ 2m, that is, F(n) Θ F(m). But F is not a term function, since all term functions are of the form n → S^k(n) = k + n for a fixed k ∈ IN. It is not clear at the moment whether Theorem 2 holds for 2 ≤ |Σ0| ≤ 6. Remark 6. Finite algebras having the property that all congruence preserving functions are term functions are called hemi-primal in universal algebra (see e.g. [3]). Our assumption Σ ≠ Σ0 in Theorem 2 implies that TΣ is infinite. Remark 7.
Theorem 2 yields Theorem 1 for congruence preserving functions of the form F : TA^n → TA, since (TA, Σ_A) is the term algebra over the signature Σ_A, and every term function of it can be represented by ιA and κA (recall that ca = ιA(a) and fa(t1, t2) = κA(a, t1, t2), for every a ∈ A and t1, t2 ∈ TA). Proof of Theorem 2. Definition 6. – An interpretation of X in TΣ is a function ε : X → TΣ. Its unique extension to a Σ-homomorphism TΣ(X) → TΣ is denoted by ε*. – Any congruence relation Θ on TΣ is extended to a congruence relation Θ* on TΣ(X), defined by the following relation for any p, q ∈ TΣ(X): p Θ* q if for every interpretation ε : X → TΣ, ε*(p) Θ ε*(q) holds. – A function G : TΣ → TΣ(X) is congruence preserving if for every congruence relation Θ on TΣ and t, s ∈ TΣ, if t Θ s, then G(t) Θ* G(s). The classical proof of the following lemma is not presented here.
¹ Denoted by t[u1, ..., un] in [4]. ² It is also called the tree substitution operation, see e.g. [4].
Lemma 2. The term function TΣ → TΣ(X), u → t[x/u], defined by any term t ∈ TΣ(X ∪ {x}) (x ∉ X), is congruence preserving. Definition 7. Let t be a term in TΣ(X) and C ⊆ TΣ(X); then t is called independent from C if it is not a subterm of any member of C and no member of C is a subterm of t. For a term rewriting system R and a term u, let ∆*_R(u) be the set of R-descendants of {u} (cf. [6]), and for a set of terms C, let ∆*_R(C) = ∪_{u∈C} ∆*_R(u). A useful property of the notion of independence is the following: Lemma 3. Let u ∈ TΣ(X) be independent from C ⊆ TΣ(X) and let R be the single-rule (ground-)term rewriting system {w → u}, where w is any term in TΣ(X). Then L = ∆*_R(C) is closed under the rewriting rule u → w, and also u ≈L w. Moreover, every member of L results from a member of C by replacing some w subterms of it by u. Proof. Straightforward, once we note that any application of the rule w → u to a member of C does not create a new subterm of the form w, and all u's appearing (as subterms) in the members of L are obtained by applying the (ground-term) rewriting rule w → u.
Proposition 1. For any C ⊂ TΣ(X) such that |C| < |Σ0|, there is a term in TΣ which is independent from C. Proof. For each c ∈ Σ0 choose a tc ∈ TΣ that is higher (has greater height) than all terms in C and contains no constant symbol other than c. Then no tc is a subterm of any member of C. On the other hand, no term in C can appear as a subterm in more than one of the terms tc (for c ∈ Σ0). Since there are more tc's (one for each c ∈ Σ0) than elements of C, by the pigeonhole principle there must exist a tc that is independent from C.
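The pigeonhole argument in this proof is constructive. Under an assumed signature of our own choosing (seven constants k0, ..., k6 and one binary symbol f, with terms encoded as tuples), one can build the seven candidate terms tc and search them directly:

```python
# Seven constants k0..k6 and one binary symbol f (an assumed signature with |Sigma_0| >= 7).
CONSTS = [f"k{i}" for i in range(7)]

def height(t):
    return 0 if isinstance(t, str) else 1 + max(height(t[1]), height(t[2]))

def subterms(t):
    yield t
    if not isinstance(t, str):
        yield from subterms(t[1])
        yield from subterms(t[2])

def independent(t, C):
    """t is independent from C: no subterm relation in either direction."""
    return all(t not in subterms(s) and s not in set(subterms(t)) for s in C)

def find_independent(C):
    """Proposition 1, constructively: |C| < 7 guarantees an independent term."""
    h = max((height(s) for s in C), default=0)
    for c in CONSTS:
        # t_c: a left comb of height h + 1 built from the single constant c
        t = c
        for _ in range(h + 1):
            t = ("f", t, c)
        if independent(t, C):
            return t
    raise AssertionError("pigeonhole guarantees success for |C| < 7")

C = [("f", "k0", "k1"), ("f", ("f", "k2", "k2"), "k0")]
t = find_independent(C)
assert independent(t, C)
```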
Lemma 4. Let G : TΣ → TΣ(X) be congruence preserving, ε : X → TΣ be an interpretation, and u, v ∈ TΣ. If v is independent from {u, ε*(G(u))}, then ε*(G(v)) ∈ ∆*_{u→v}(ε*(G(u))). Moreover, ε*(G(v)) results from ε*(G(u)) by replacing some u subterms by v. Proof. Let L = ∆*_{u→v}(ε*(G(u))). By Lemma 3, u ≈L v. The function G is congruence preserving, so ε*(G(u)) ≈L ε*(G(v)), and since ε*(G(u)) ∈ L, also ε*(G(v)) ∈ L. The second claim follows from the independence of v from
{u, ε∗ (G(u))}. Recall that for a position p of the term t, t|p is the subterm of t at the position p (cf. [2]).
Lemma 5. Suppose |Σ0| ≥ 7, and let G : TΣ → TΣ(X) be congruence preserving. If v is independent from {u, G(u)}, for u, v ∈ TΣ, then G(v) results from G(u) by replacing some of its u subterms by v. Proof. By Proposition 1, there are w, w1, w2 such that w is independent from {u, G(u), v, G(v)}, w1 is independent from {w, u, G(u), v, G(v)}, and w2 is independent from {w, w1, u, G(u), v, G(v)}. Define the interpretation ε : X → TΣ by setting ε(x) = w for all x ∈ X. By the choice of w, v is independent from {u, ε*(G(u))}. So we can apply Lemma 4 to infer that ε*(G(v)) results from ε*(G(u)) by replacing some u subterms by v. Note that G(v) is obtained by substituting all w's in ε*(G(v)) by members of X; the same is true of G(u) and ε*(G(u)). The positions of ε*(G(v)) in which w appears are exactly the positions of ε*(G(u)) in which w appears (by the choice of w). So the positions of G(v) in which a member of X appears are exactly the positions of G(u) in which a member of X appears. We claim that identical members of X appear in those identical positions of G(u) and G(v): if not, there are x1, x2 ∈ X such that G(v)|p = x1 and G(u)|p = x2 for some position p of G(u) (and of G(v)). Define the interpretation δ : X → TΣ by δ(x1) = w1, δ(x2) = w2, and δ(x) = w for all x ≠ x1, x2. Then δ*(G(v))|p = w1 and δ*(G(u))|p = w2. On the other hand, by Lemma 4, δ*(G(v)) results from δ*(G(u)) by replacing some u subterms by v. By the choice of w1 and w2, such a replacement cannot affect the appearance of w1 or w2, and hence the subterms of δ*(G(v)) and δ*(G(u)) at the position p must be identical, a contradiction. This proves the claim, which implies that G(v) results from G(u) by replacing some u subterms by v.
Lemma 6. Suppose |Σ0 | ≥ 7, and let G : TΣ → TΣ (X) be congruence preserving. Then for any u, v ∈ TΣ , G(v) results from G(u) by replacing some u subterms by v. Proof. By Proposition 1, there is a w ∈ TΣ independent from {u, G(u), v, G(v)}. By Lemma 5, G(w) is obtained from G(u) by replacing some u subterms by w, and also results from G(v) by replacing some v subterms by w. By the choice of w, all w’s appearing in G(w) have been obtained either by replacing u by w in G(u) or by replacing v by w in G(v). Since the only difference between G(v) and G(w) is in the positions of G(w) where w appears, and the same is true for the difference between G(u) and G(w), then G(v) can be obtained from G(u) by replacing some u subterms of it, the same u subterms which have been replaced by w to get G(w), by v.
Lemma 7. If |Σ0| ≥ 7, then every congruence preserving function G : TΣ → TΣ(X) is a term function (i.e., there is a term t ∈ TΣ(X ∪ {x}), where x ∉ X, such that G(u) = t[x/u] for all u ∈ TΣ).
Proof. Fix a u0 ∈ TΣ and choose a v ∈ TΣ such that v is independent from {u0, G(u0)}. (By Proposition 1 such a v exists.) Then by Lemma 6, G(v) results from G(u0) by replacing some u0 subterms by v. Let y be a new variable (y ∉ X) and let t ∈ TΣ(X ∪ {y}) result from G(u0) by putting y in exactly the positions in which u0's are replaced by v's to get G(v). So G(u0) = t[y/u0] and G(v) = t[y/v]; moreover, all v's in G(v) are obtained from t by substituting all y's by v. We show that G(u) = t[y/u] holds for every u ∈ TΣ: Take a u ∈ TΣ. By Proposition 1, there is a w independent from the set {u0, G(u0), v, G(v), u, G(u)}. By Lemma 6, G(w) results from G(v) by replacing some v subterms by w. We claim that all v's in G(v) are replaced by w's to get G(w). If not, then v must be a subterm of G(w). From the fact (Lemma 6) that G(u0) results from G(w) by replacing some w subterms by u0 (and the choice of w), we can infer that v is a subterm of G(u0), which contradicts the choice of v. So the claim is proved, and we can write G(w) = t[y/w]; moreover, all w's in G(w) are obtained from t by substituting y by w. Again by Lemma 6, G(u) results from G(w) by replacing some w subterms by u. We claim that all w's appearing in G(w) are replaced by u to get G(u), since otherwise w would be a subterm of G(u), in contradiction with the choice of w. This shows that G(u) = t[y/u].
Theorem 2. If |Σ0| ≥ 7, then every congruence preserving F : (TΣ)^n → TΣ, for every n ∈ IN, is a term function. Proof. We proceed by induction on n: For n = 1 it is Lemma 7 with X = ∅. For the induction step, let F : (TΣ)^{n+1} → TΣ be a congruence preserving function. For any u ∈ TΣ define Fu : (TΣ)^n → TΣ by Fu(u1, ..., un) = F(u1, ..., un, u). By the induction hypothesis every Fu is a term function, i.e., there is an s ∈ TΣ({x1, ..., xn}) such that Fu(u1, ..., un) = s[x1/u1, ..., xn/un] for all u1, ..., un ∈ TΣ. Denote the corresponding term for u by tu (it is straightforward to see that such a term s is unique for every u). The mapping TΣ → TΣ({x1, ..., xn}) defined by u → tu is also congruence preserving. Hence, by Lemma 7, it is a term function, so there is a t ∈ TΣ({x1, ..., xn, xn+1}) such that tu = t[xn+1/u]. Hence F(u1, ..., un, un+1) = F_{un+1}(u1, ..., un) = t_{un+1}[x1/u1, ..., xn/un] = t[xn+1/un+1][x1/u1, ..., xn/un]. So F(u1, ..., un, un+1) = t[x1/u1, ..., xn/un, xn+1/un+1], i.e., F is a term function.
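The final equality in this proof relies on the ui being ground terms, so that substituting xn+1 first and the remaining variables afterwards agrees with one simultaneous substitution. A quick mechanical check, in a tuple encoding of our own (strings are variables or constants, tuples are function applications):

```python
def subst(t, env):
    """Simultaneous substitution of variables in term t."""
    if isinstance(t, str):
        return env.get(t, t)               # variable or constant
    return (t[0],) + tuple(subst(a, env) for a in t[1:])

t = ("f", "x1", ("f", "x2", "x3"))         # a term over variables x1, x2, x3
u1, u2, u3 = "k", ("f", "k", "k"), "k"     # ground terms (no variables)

# t[x3/u3][x1/u1, x2/u2]  ==  t[x1/u1, x2/u2, x3/u3]  when the u_i are ground
sequential = subst(subst(t, {"x3": u3}), {"x1": u1, "x2": u2})
simultaneous = subst(t, {"x1": u1, "x2": u2, "x3": u3})
assert sequential == simultaneous
```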
Acknowledgement. I am very grateful to Professor Magnus Steinby for reading drafts of this paper and for his fruitful ideas, comments and support.
References 1. Almeida J., "On pseudovarieties, varieties of languages, filters of congruences, pseudoidentities and related topics", Algebra Universalis, Vol. 27 (1990) pp. 333-350. 2. Bachmair L., "Canonical equational proofs", Progress in Theoretical Computer Science, Birkhäuser, Boston MA, 1991.
3. Denecke K. & Wismath S. L., "Universal algebra and applications in theoretical computer science", Chapman & Hall/CRC, Boca Raton FL, 2002. 4. Fülöp Z. & Vágvölgyi S., "Minimal equational representations of recognizable tree languages", Acta Informatica, Vol. 34, No. 1 (1997) pp. 59-84. 5. Gécseg F. & Steinby M., "Tree languages", in: Rozenberg G. & Salomaa A. (eds.) Handbook of Formal Languages, Vol. 3, Springer, Berlin (1997) pp. 1-68. 6. Jantzen M., "Confluent string rewriting", EATCS Monographs on Theoretical Computer Science 14, Springer-Verlag, Berlin, 1988. 7. Salomaa K., Review of [13] in AMS MathSciNet, MR 97f:68134. 8. Nivat M. & Podelski A., "Tree monoids and recognizability of sets of finite trees", Resolution of Equations in Algebraic Structures, Vol. 1, Academic Press, Boston MA (1989) pp. 351-367. 9. Podelski A., "A monoid approach to tree languages", in: Nivat M. & Podelski A. (eds.) Tree Automata and Languages, Elsevier, Amsterdam (1992) pp. 41-56. 10. Salehi S. & Steinby M., "Tree algebras and regular tree languages", in preparation. 11. Steinby M., "A theory of tree language varieties", in: Nivat M. & Podelski A. (eds.) Tree Automata and Languages, Elsevier, Amsterdam (1992) pp. 57-81. 12. Thomas W., "Logical aspects in the study of tree languages", Ninth Colloquium on Trees in Algebra and Programming (Proc. CAAP'84), Cambridge University Press (1984) pp. 31-51. 13. Wilke T., "An algebraic characterization of frontier testable tree languages", Theoretical Computer Science, Vol. 154, No. 1 (1996) pp. 85-106.
Symbolic Topological Sorting with OBDDs (Extended Abstract) Philipp Woelfel FB Informatik, LS2, Univ. Dortmund, 44221 Dortmund, Germany [email protected]
Abstract. We present a symbolic OBDD algorithm for topological sorting which requires O(log^2 N) OBDD operations. Then we analyze its true runtime for the directed grid graph and show an upper bound of O(log^4 N). This is the first true runtime analysis of a symbolic OBDD algorithm for a fundamental graph problem, and it demonstrates that one can hope that the algorithm behaves well for sufficiently structured inputs.
1 Introduction
Algorithms on graphs form one of the best studied areas in computer science. Usually, a graph G = (V, E) is given by an adjacency list or by an adjacency matrix. Such an explicit representation of a graph requires space Θ(|V| + |E|) or Θ(|V|^2), and for many graph problems efficient algorithms are known. However, there are several application areas where typical problem instances have such a large size that a linear or even polynomial runtime is not feasible, or where even the explicit representation of the problem instance itself may not fit into memory anymore. In order to deal with very large graphs, symbolic (or implicit) graph algorithms have been devised, where the vertex and edge sets of the involved graphs are stored symbolically, i.e., in terms of their characteristic functions. The characteristic functions are usually represented by so-called Binary Decision Diagrams (BDDs) or, more specifically, by Ordered Binary Decision Diagrams (OBDDs) — see Section 2 for definitions. Such approaches have been successfully applied in the areas of model checking, circuit verification and finite state machine verification (see e.g. [2,3,4]). These applications can be viewed as particular cases of symbolic graph problems, which raises the question whether it is also possible to devise symbolic graph algorithms with a good behavior for fundamental graph theoretical problems. One approach in this direction was undertaken by Hachtel and Somenzi [5], who introduced a symbolic OBDD algorithm for the maximum flow problem in 0-1 networks. The promising experimental studies demonstrated that the algorithm is able to handle graphs with over 10^36 edges and that it is competitive with traditional algorithms on dense random graphs. The paper lacks, however, a theoretical analysis of its performance with
Supported in part by DFG grant We 1066/10-1
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 671–680, 2003. © Springer-Verlag Berlin Heidelberg 2003
respect to runtime. Recently, Sawitzki [11,9] has analyzed the number of OBDD operations (i.e., the number of required synthesis operations of characteristic functions) required by the flow algorithm of Hachtel and Somenzi and has proposed an improved algorithm. But note that there is only a very weak relation between the number of OBDD operations and the true runtime of a symbolic OBDD algorithm. The time required for one synthesis step is mainly influenced by the sizes of the involved OBDDs, which may range from linear to exponential (in the number of variables of the represented characteristic functions). In fact, we are not aware of any analysis of a symbolic OBDD algorithm with respect to its true runtime. (However, there is a recent true runtime analysis for a related type of decision diagrams, called Binary Moment Diagrams, showing that certain multiplier circuits can be verified in polynomial time [7].) The results and techniques presented here aim to be a first step in filling this gap. First, we present a new OBDD algorithm for topologically sorting the N vertices of a directed acyclic graph, which requires only O(log^2 N) OBDD operations on OBDDs for functions with at most 4 log N variables. Hence, if all OBDDs obtained during the execution of the algorithm have subexponential size, the total runtime is sublinear in the number of vertices of the graph. Then we analyze its true runtime for the directed grid graph and show an upper bound of O(log^4 N). This demonstrates that one can in fact hope that such a fundamental graph algorithm behaves well for sufficiently structured inputs. For the analysis, we generalize the notion of threshold functions to multivariate threshold functions. We investigate the OBDD size of multivariate threshold (and modulo) functions and obtain strong results about the effect of OBDD operations such as quantification on these functions.
Clearly, our analysis is a "good-case" analysis which is only valid for one particular input instance. We hope, though, that the techniques presented here are a good starting point for developing a framework which allows one to analyze symbolic algorithms for fundamental graph problems on larger classes of input instances. In fact, Sawitzki [10] has already successfully applied our framework to analyze the true runtime of his 0-1 network flow algorithm on the grid network.
2 OBDDs and Implicit Graph Representation
In the following, let Bn denote the class of boolean functions {0,1}^n → {0,1}, and let Xn = {x1, ..., xn} be a set of boolean variables. Let f ∈ Bn be a function defined over the variables in Xn. The subfunction of f where k variables xi1, ..., xik are fixed to constants c1, ..., ck ∈ {0,1} is denoted by f|xi1=c1,...,xik=ck. A variable ordering π on Xn is a permutation of the indices {1, ..., n}, leading to the ordered list xπ(1), ..., xπ(n) of the variables. A π-OBDD on Xn for a variable ordering π is a directed acyclic graph with one root, two sinks labeled with 0 and 1, respectively, and the following properties: each inner node is labeled by a variable from Xn and has two outgoing edges, one of them labeled by 0, the other by 1. If an edge leads from a node labeled by xi to a node labeled by xj,
then π^{-1}(i) < π^{-1}(j). This means that any directed path passes the nodes in an order respecting the variable ordering π. A π-OBDD is said to represent a boolean function f ∈ Bn if for any a = (a1, ..., an) ∈ {0,1}^n, the path starting at the root and leaving any xi-node over the edge labeled by the value of ai ends at a sink with label f(a). The size of a π-OBDD G is the number of its nodes and is denoted by |G|. The π-OBDD of minimal size for a given function f and a fixed variable ordering π is unique up to isomorphism. A π-OBDD is called reduced if it is the minimal π-OBDD. It is well known that the size of any reduced π-OBDD for a function f ∈ Bn is bounded by O(2^n/n) (see [1] for the upper bound with the best constants known). Let f and g be functions in Bn and let Gf and Gg be π-OBDDs representing f and g, respectively, for an arbitrary variable ordering π. In the following, we summarize the operations on OBDDs to which we will refer in this text. For a more detailed discussion of OBDDs and their operations we refer to the monograph [12].
– Evaluation: Given x ∈ {0,1}^n, compute f(x). This can trivially be done in time O(n).
– Minimization: Compute the reduced π-OBDD for f. This is possible in time O(|Gf|).
– Binary synthesis: Given a boolean operation ⊗ ∈ B2, compute a reduced π-OBDD Gh representing the function h = f ⊗ g. This can be done in time O(|G*h|), where G*h is the graph consisting of all nodes in the product graph of Gf and Gg reachable from the root. The size of Gh is at most O(|G*h|) = O(|Gf| · |Gg|).
– Replacement by constants: Given a sequence of variables xi1, ..., xik ∈ Xn and a sequence of constants c1, ..., ck, compute a reduced π-OBDD Gh for the subfunction h := f|xi1=c1,...,xik=ck ∈ Bn−k. This is possible in time O(|Gf|), and the reduced π-OBDD Gh is of smaller size than Gf.
– Quantification: Given a variable xi ∈ Xn and a quantifier Q ∈ {∃, ∀}, compute a reduced π-OBDD for the function h ∈ Bn−1 with h := (Qxi)f, where (∃xi)f := f|xi=0 ∨ f|xi=1 and (∀xi)f := f|xi=0 ∧ f|xi=1. The time for computing this π-OBDD is determined by the time for determining the π-OBDDs for f|xi=0 and f|xi=1 and the time required for the binary synthesis of the two. Hence, it is bounded by O(|Gf|^2).
– SAT enumeration: Enumerate all inputs x ∈ f^{-1}(1). Using simple DFS techniques, this can be done in optimal time O(|Gf| + n·|f^{-1}(1)|).
We can use OBDDs for an implicit graph representation by letting them represent the characteristic functions of the vertex and edge sets. For practical reasons, though, we assume throughout this text that the vertex set is V = {0,1}^n for some n ∈ N, so that a representation of V is not needed. It is easy to accommodate the algorithm for other vertex sets. In order to encode integers using binary notation, we define |x| = 2^{n−1}·x_{n−1} + ··· + 2^0·x_0 for x ∈ {0,1}^n.
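The operations above can be illustrated with a deliberately minimal OBDD implementation. The sketch below is our own (fixed identity variable ordering, hash-consed nodes, memoized synthesis); production packages add complement edges, computed tables and garbage collection:

```python
from functools import lru_cache

# Terminals are the ints 0 and 1; inner nodes are interned triples (var, low, high).
_unique = {}

def mk(v, lo, hi):
    """Reduced node constructor: skip redundant tests, share isomorphic subgraphs."""
    if lo == hi:
        return lo
    return _unique.setdefault((v, lo, hi), (v, lo, hi))

def var(i):
    return mk(i, 0, 1)

def top(u):
    return u[0] if isinstance(u, tuple) else float("inf")

@lru_cache(maxsize=None)
def apply_op(op, u, v):
    """Binary synthesis; runtime is bounded by the product of the node counts."""
    if not isinstance(u, tuple) and not isinstance(v, tuple):
        return op(u, v)
    i = min(top(u), top(v))
    u0, u1 = (u[1], u[2]) if top(u) == i else (u, u)
    v0, v1 = (v[1], v[2]) if top(v) == i else (v, v)
    return mk(i, apply_op(op, u0, v0), apply_op(op, u1, v1))

def restrict(u, i, c):
    """Replacement by a constant: fix variable i to c."""
    if not isinstance(u, tuple) or u[0] > i:
        return u
    if u[0] == i:
        return u[2] if c else u[1]
    return mk(u[0], restrict(u[1], i, c), restrict(u[2], i, c))

def exists(u, i):
    """Quantification: (exists x_i) u  =  u|x_i=0  OR  u|x_i=1."""
    return apply_op(OR, restrict(u, i, 0), restrict(u, i, 1))

AND = lambda a, b: a & b
OR = lambda a, b: a | b

f = apply_op(AND, var(0), var(1))      # x0 AND x1
assert exists(f, 0) == var(1)          # (exists x0) x0 AND x1  =  x1
```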
3 The Topological Sorting Algorithm
Let G = (V, E), V = {0,1}^n, be a directed acyclic graph represented by a π-OBDD as described in the previous section. The edge relation E defines in a natural way a partial order ⪯ on V, where v ⪯ w if and only if there exists a path from v to w. In the explicit case a topological sorting algorithm would enumerate all vertices in such a way that if u is enumerated before v, then v ⪯ u does not hold. In the implicit case we hope for runtimes in the order of o(|V|), in which the enumeration of all vertices is not possible. Hence, a goal might be to obtain a complete order ≺ which inherits the properties of ⪯ (i.e., v ⪯ u and v ≠ u implies v ≺ u). Unless ⪯ is a complete order, ≺ is not uniquely defined by ⪯, and thus we assume that an arbitrary complete order ⊑ on the vertex set V is given (this may be fixed in advance for the algorithm or may be given as an additional parameter), which determines the order of the elements that are incomparable with respect to ⪯ (i.e., those with neither u ⪯ v nor v ⪯ u). An alternative is to compute an OBDD which allows one to enumerate the elements in their topological order by simple SAT enumeration operations. For any two vertices u, v we denote by ∆(u, v) the length of the longest path leading from u to v. (The length of a path is the number of its edges.) If no such path exists, then ∆(u, v) := −∞. Note that ∆(v, v) = 0, since the graph is acyclic. Furthermore, let ∆(v) := max{∆(u, v) | u ∈ V}. We call ∆(v) the length of the longest path to the vertex v. Let now DIST ∈ B_{2n} be defined to be 1 for an input (d, v) ∈ {0,1}^n × {0,1}^n if and only if ∆(v) = |d|. Clearly, |du| < |dv| implies that v ⪯ u does not hold, where du, dv are the unique values with DIST(du, u) = 1 and DIST(dv, v) = 1. Hence, if we have a π-OBDD G_DIST for the function DIST, we can use it to enumerate the vertices in an order respecting ⪯ by computing the π-OBDDs for DIST|d=a for |a| = 0, 1, ... and enumerating their satisfying inputs using the SAT enumeration procedure.
We will see below how the OBDD GDIST can in addition be used to obtain a complete order respecting . In order to compute the function DIST, we use a method which is similar to that of computing the transitive closure by matrix squaring. For i ∈ {1, . . . , n} and u, v ∈ V let Ti (u, v) be the boolean function with function value 1 if and only if there exists a simple path from u to v which has length exactly 2i . We can compute OBDDs for all Ti as follows. T0 (u, v) = E(u, v)
and Ti+1 (u, v) = ∃w : Ti (u, w) ∧ Ti (w, v).
(S1)
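In explicit form, (S1) is iterated relational composition: squaring a relation of exact path length 2^i gives the relation of exact path length 2^{i+1} (in a DAG every path is simple, so no cycle can shortcut the count). A small sanity check with our own set-of-pairs encoding:

```python
def compose(R, S):
    """Relational composition: (u, v) such that (u, w) in R and (w, v) in S for some w."""
    by_src = {}
    for w, v in S:
        by_src.setdefault(w, []).append(v)
    return {(u, v) for u, w in R for v in by_src.get(w, [])}

# A small DAG: the path 0 -> 1 -> 2 -> 3 -> 4
E = {(i, i + 1) for i in range(4)}

T = [E]                          # T[i] holds pairs joined by a path of length exactly 2^i
for i in range(2):
    T.append(compose(T[i], T[i]))

assert T[1] == {(0, 2), (1, 3), (2, 4)}   # paths of length 2
assert T[2] == {(0, 4)}                   # paths of length 4
```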
Now we define the function DISTj ∈ B_{2n−j} for 0 ≤ j ≤ n. It takes as input an (n−j)-bit value d* = d_{n−1} ... d_j and a vertex v (for j = n, d* is the empty string). The function value DISTj(d*, v) is defined as DISTj(d*, v) = 1 ⇔ 2^j·|d*| ≤ ∆(v) < 2^j·(|d*| + 1). (∗)
I.e., DISTj (d∗ , v) is true if the bits dn−1 . . . dj are exactly the n − j most significant bits of the binary representation of the integer ∆(v). Clearly, DIST =
DIST0. The functions DISTj can be computed by DISTn(v) := 1 and, for j = n−1, ..., 0, DISTj(d_{n−1} ... d_j, v) = DISTj+1(d_{n−1} ... d_{j+1}, v) ∧ (d_j ⇔ ∃u (T_j(u, v) ∧ DISTj+1(d_{n−1} ... d_{j+1}, u))). (S2) It is easy to verify that the boolean functions DISTj do in fact fulfill property (∗) (the proof can be found in the full version of this paper). Once we have computed the function DIST, we can use it together with an arbitrarily given complete order ⊑ to compute a complete order ≺ by letting u ≺ v ⇔ ∃du, dv : DIST(du, u) ∧ DIST(dv, v) ∧
(|du| < |dv| ∨ (|du| = |dv| ∧ u ⊑ v)). (S3)
It can easily be checked that ≺ defines a complete order on V respecting ⪯. Thus, the following theorem follows from simply counting the number of OBDD operations.
Theorem 1. Let V = {0,1}^n and let G = (V, E) be an acyclic directed graph represented by OBDDs. Applying the OBDD operations as described in (S1)–(S3) yields an OBDD for a relation ≺ which defines a complete order on V such that v ≺ w for all v, w ∈ V with (v, w) ∈ E. The number of OBDD operations is O(n^2), where each OBDD represents a function on at most 4n variables. Note that the algorithm can easily be adapted to an arbitrary vertex set V ⊆ {0,1}^n given by an OBDD for the relation V. This is done by simply executing the algorithm from above for the edge relation E′(u, v) = E(u, v) ∧ V(u) ∧ V(v). While the complete order ≺ returned by such a modified algorithm is defined on {0,1}^n × {0,1}^n, its restriction to V × V is obviously a correct complete order. Since any (not necessarily reduced) OBDD in n variables has O(2^n) nodes, the theorem shows that the true worst-case runtime of our algorithm is O(|V|^4 log^2 |V|). Clearly, this is much worse than the O(|V| + |E|) upper bound obtained by a well-known explicit algorithm. On the other hand, if all OBDDs obtained during the execution of the algorithm have a subexponential size (with respect to n), then its runtime is sublinear with respect to the number of vertices. In the following sections we show that it is justifiable to hope that this is the case for very structured input graphs.
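Steps (S1)–(S3) can be simulated with explicit sets of pairs in place of OBDDs. The sketch below is a hypothetical, non-symbolic rendering on a small DAG (the helper names `delta` and `delta_bits` are ours, not the paper's): it checks the bit-by-bit computation of ∆(v) against a direct longest-path recursion and verifies that the resulting complete order is topological.

```python
n = 3                                      # vertices are 0 .. 2^n - 1
E = {(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)}
V = range(2 ** n)

def compose(R, S):
    return {(u, v) for (u, w) in R for (w2, v) in S if w == w2}

# (S1): T[i] = pairs joined by a path of length exactly 2^i
T = [E]
for i in range(n - 1):
    T.append(compose(T[i], T[i]))

def delta(v):
    """Reference: direct longest-path recursion (fine for a tiny DAG)."""
    preds = [u for (u, w) in E if w == v]
    return 0 if not preds else 1 + max(delta(u) for u in preds)

def delta_bits(v):
    """Compute Delta(v) bit by bit, MSB first, in the spirit of (S2)."""
    d = 0
    for j in range(n - 1, -1, -1):
        # bit j is 1 iff some u whose Delta matches the already-fixed
        # higher bits reaches v by a path of length exactly 2^j
        if any((u, v) in T[j] and delta(u) >> (j + 1) == d for u in V):
            d = (d << 1) | 1
        else:
            d <<= 1
    return d

assert all(delta_bits(v) == delta(v) for v in V)

# (S3): order by Delta, ties broken by a given complete order (here: <)
order = sorted(V, key=lambda v: (delta(v), v))
pos = {v: i for i, v in enumerate(order)}
assert all(pos[u] < pos[v] for (u, v) in E)
```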
4 Runtime Analysis for the Grid Graph
We analyze the behavior of the topological sorting algorithm for a 2^n × 2^n grid, where all edges are directed from left to right and from bottom to top. The directed grid graph consists of the vertex set V = {0,1}^n × {0,1}^n and edge set E, where ((x, y), (x′, y′)) ∈ E if and only if either |x| = |x′| and |y′| − |y| = 1, or |y| = |y′| and |x′| − |x| = 1.
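Since every edge of the directed grid increases |x| + |y| by exactly one, the longest path to (x, y) has length |x| + |y|. A brute-force check on a small grid (with helper names of our own, not from the paper):

```python
from functools import lru_cache

n = 3                        # a 2^n x 2^n directed grid
N = 2 ** n

def edges(x, y):
    """Out-neighbours of (x, y): one step right or one step up."""
    if x + 1 < N: yield (x + 1, y)
    if y + 1 < N: yield (x, y + 1)

@lru_cache(maxsize=None)
def delta(v):
    """Longest-path length to v = (x, y), by recursion over predecessors."""
    preds = [(px, py) for px in range(N) for py in range(N)
             if v in set(edges(px, py))]
    return 0 if not preds else 1 + max(delta(p) for p in preds)

# every edge increases x + y by exactly one, so Delta((x, y)) = x + y
assert all(delta((x, y)) == x + y for x in range(N) for y in range(N))
```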
In the analysis to follow, we assume an interleaved variable ordering, that is, a variable ordering where, e.g., for a function depending on two vertices u, v, the variable v_i precedes the corresponding variable u_i. Note that in practice, heuristics such as sifting algorithms [8] are used to optimize the variable orderings during the execution of an algorithm, and it can be expected that a good variable ordering is found this way. The idea for proving that the topological sorting algorithm is very efficient for the grid graph is that all functions represented by OBDDs after each step of the algorithm belong to a class of functions which have small OBDD representations. The functions we consider are compositions of certain threshold and modulo functions, which we define and investigate in the following. We denote by X_{k,n} the set of variables x^i_j with 1 ≤ i ≤ k and 0 ≤ j < n. By x^i we denote the vector of n variables (x^i_{n−1}, ..., x^i_0). Definition 1. 1. A boolean function f ∈ B_{kn} defined on the variable set X_{k,n} is called a k-variate threshold function if there exist a threshold T ∈ Z and weights w_1, ..., w_k ∈ Z such that f(x^1, ..., x^k) = 1 if and only if w_1·|x^1| + ··· + w_k·|x^k| ≥ T. The maximum absolute weight of f is defined as w(f) := max{|w_1|, ..., |w_k|}. The set of k-variate threshold functions with maximum absolute weight w defined on the set of variables X_{k,n} is denoted by T^w_{k,n}. 2. A boolean function g ∈ B_{kn} defined on the variable set X_{k,n} is called a k-variate modulo M function if there exist a constant C ∈ Z and w_1, ..., w_k ∈ Z such that g(x^1, ..., x^k) = 1 if and only if w_1·|x^1| + ··· + w_k·|x^k| ≡ C (mod M). The set of k-variate modulo M functions defined on the set of variables X_{k,n} is denoted by M^M_{k,n}. Definition 2. Let f ∈ B_n and let C be a class of functions defined on the variable set X_n. We say that f can be decomposed into m functions in C if there exist a formula F on m variables and f_1, ..., f_m ∈ C such that f = F(f_1, ..., f_m).
The set of functions decomposable into m functions in C is denoted by D[C, m]. For any k ∈ N we denote by D_k the set of function sequences (f_n)_{n∈N} such that ∃m ∈ N ∀n ∈ N : f_n ∈ D[T^1_{k,n}, m].
The main idea in our proof is based on two observations. Firstly, any function decomposable into a constant number of threshold and modulo functions has small OBDD size. Secondly, all intermediate OBDDs obtained during the execution of the topological sorting algorithm on the directed grid graph represent functions which are decomposable into threshold and modulo functions. Let π_{k,n} be the variable ordering in which the variables in X_{k,n} appear in the order x^1_0, x^2_0, ..., x^k_0, x^1_1, ..., x^k_1, ..., x^k_{n−1}. I.e., a π_{k,n}-OBDD tests all bits of the input integers in an interleaved order with increasing significance of the bits. The following result is a generalization of Proposition 4 in [6]. The proof will be given in the full version of this extended abstract.
Lemma 1. Let f_1, ..., f_m ∈ T^w_{k,n} ∪ M^M_{k,n} be given by reduced π_{k,n}-OBDDs for f_i, 1 ≤ i ≤ m. Further, let f = F(f_1, ..., f_m) for a formula F of size s, and let L = L(k, m, M) = max{4kw + 5, M}. The minimal π_{k,n}-OBDD for f has at most L^{s+1}·kn nodes and can be computed in time and space O((kn)^2·s·L^{s+1} + 1).
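Lemma 1's node bound can be probed empirically for a single threshold function f(x, y) = [|x| + |y| ≥ T] (so k = 2, w = 1, s = 1, and L = 4kw + 5 = 13). The helper below is our own construction, not the paper's proof: it over-approximates the reduced OBDD size by counting distinct subfunctions level by level along the interleaved LSB-first ordering x_0, y_0, x_1, y_1, ..., and checks the count against the bound L^{s+1}·kn.

```python
from itertools import product

def obdd_size_upper(n, T, k=2):
    """Over-count the reduced OBDD size of f(x, y) = [|x| + |y| >= T]
    along the interleaved LSB-first ordering x_0, y_0, x_1, y_1, ..."""
    nvars = k * n
    # weight of the i-th variable in the interleaved order
    weight = [2 ** (i // k) for i in range(nvars)]
    size = 0
    for level in range(nvars):
        subs = set()
        for prefix in product((0, 1), repeat=level):
            partial = sum(b * w for b, w in zip(prefix, weight))
            # the subfunction is determined by its truth table on the rest
            table = tuple(
                partial + sum(b * w for b, w in zip(rest, weight[level:])) >= T
                for rest in product((0, 1), repeat=nvars - level)
            )
            subs.add(table)
        size += len(subs)             # distinct subfunctions at this level
    return size

n, T = 4, 7
L = 4 * 2 * 1 + 5                     # Lemma 1 with k = 2, w = 1: L = 13
assert obdd_size_upper(n, T) <= L ** 2 * 2 * n
```

In practice the observed counts stay far below the bound, which is what the analysis in this section exploits.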
Now we show, for functions decomposable into threshold functions (and no modulo functions), that the quantification over one of the variable blocks x^i_0, ..., x^i_{n−1}, 1 ≤ i ≤ k, can be done efficiently. Theorem 2. Let (f_n)_{n∈N} be such that there exist w, m ∈ N with f_n ∈ D[T^w_{k,n}, m] for all n ∈ N, and let Q ∈ {∃, ∀}. If f_n is given as a π_{k,n}-OBDD, then for any 1 ≤ ℓ ≤ k a minimal π_{k,n}-OBDD for (Qx^ℓ)f_n can be computed in time n^3·k^{O(1)}. We need the following lemma, which states that the result of quantifying over one variable block of a function decomposable into threshold functions is a function which is decomposable into threshold and modulo functions. The proof has to be omitted due to space restrictions. Let lcm denote the least common multiple.
Lemma 2. Let f ∈ D[T^w_{k,n}, m], Q ∈ {∃, ∀} and ℓ ∈ {1, . . . , k}. Then (Qx^ℓ)f ∈ D[T^{2w·w*}_{k−1,n} ∪ M^{w*}_{k−1,n}, m*], where w* ≤ lcm{1, 2, . . . , w} and m* = O(2^m w* m^2). In particular, for any fixed k ∈ N and a sequence of functions (f_n)_{n∈N} ∈ D_k we have (Qx^i)f_n ∈ D[T^{w*}_{k−1,n} ∪ M^{w*}_{k−1,n}, m*], where m* = O(1).

Proof (of Theorem 2). Fix w, m ∈ N such that f_n ∈ D[T^w_{k,n}, m] for all n ∈ N. W.l.o.g. we assume ℓ = 1, and for the sake of readability we write x instead of x^1 and f instead of f_n. We only prove the theorem for the case Q = ∀; the proof for Q = ∃ works analogously. We can write (∀x)f as (∀x_{n−1} ∀x_{n−2} . . . ∀x_0)f(x, x^2, . . . , x^k). If we apply the OBDD quantification operations to the bits x_0, . . . , x_{n−1} in this order, then after the i-th quantification (0 ≤ i ≤ n) the resulting OBDD G_i represents the function g_i = (∀x_{i−1} . . . ∀x_0)f in B_{kn−i}. Since each of the n quantification operations can be done in time O(|G_i|^2), the total time required is bounded by Σ_{i=0}^{n−1} |G_i|^2. Hence, it suffices to show that G_i has a size of at most O(n k^{O(1)}) for all 0 ≤ i ≤ n − 1. Note that g_i does not depend on the variables x_0, . . . , x_{i−1}. What we do in the following is to introduce n dummy variables z_0, . . . , z_{n−1} and to show that g_i can be written as ((∀z_0, . . . , z_{n−1})g_i*)|_{x_0=0,...,x_{i−1}=0}, where g_i* is a function in
D[T^w_{k+1,n}, m + 1]. Hence, g_i is obtained from the function (∀z_0, . . . , z_{n−1})g_i* by restricting some variables to constants. By Lemma 2, this function is decomposable into a constant number of threshold and modulo functions, and therefore its OBDD size is bounded sufficiently. Note that the variables z_0, . . . , z_{n−1} are merely artificial helper variables, and that none of the functions we “really” deal with (i.e., which are represented by OBDDs) depend on these variables. Let f = F(f_1, . . . , f_m) for a formula F and f_1, . . . , f_m ∈ T^w_{k,n}. Since m = O(1), we may assume w.l.o.g. that the size s of F is a constant, too. We introduce
678
Philipp Woelfel
n new variables, which we denote by z_0, . . . , z_{n−1}. Then we replace the variables x_j with the variables z_j for 0 ≤ j ≤ i − 1. This way we obtain

g_i = (∀x_{i−1} . . . x_0) f(x_{n−1} . . . x_i x_{i−1} . . . x_0, x^2, . . . , x^k)
    = (∀z_{i−1} . . . z_0) f(x_{n−1} . . . x_i z_{i−1} . . . z_0, x^2, . . . , x^k)
    = (∀z_{n−1} . . . z_0) (|z| ≥ 2^i ∨ f(x_{n−1} . . . x_i z_{i−1} . . . z_0, x^2, . . . , x^k))    (1)
Now consider an arbitrary threshold function f_j for some 1 ≤ j ≤ m, i.e., f_j(x, x^2, . . . , x^k) = 1 if and only if w_1|x| + w_2|x^2| + · · · + w_k|x^k| ≥ T. Let f_j* ∈ B_{(k+1)n} be the function with f_j*(z, x, x^2, . . . , x^k) = 1 ⇔ w_1|z| + w_1|x| + w_2|x^2| + · · · + w_k|x^k| ≥ T, and let f* = F(f_1*, . . . , f_m*). Obviously, f* ∈ D[T^w_{k+1,n}, m]. If |z| < 2^i, then |x_{n−1} . . . x_i z_{i−1} . . . z_0| is the same as |x_{n−1} . . . x_i 0 . . . 0| + |z|. Hence, it is easy to conclude from (1) that

g_i = (∀z_{n−1} . . . z_0) (|z| ≥ 2^i ∨ f*(z, x_{n−1} . . . x_i 0 . . . 0, x^2, . . . , x^k))
    = (∀z_{n−1} . . . z_0) (|z| ≥ 2^i ∨ f*|_{x_{i−1}=···=x_0=0}(z, x^1, x^2, . . . , x^k)).
Now let

g_i*(z, x^1, . . . , x^k) = (|z| ≥ 2^i) ∨ f*(z, x^1, x^2, . . . , x^k).

Then g_i* ∈ D[T^w_{k+1,n}, m + 1] and g_i = ((∀z)g_i*)|_{x_0=0,...,x_{i−1}=0}. Since g_i* ∈ D[T^w_{k+1,n}, m + 1] and k, w, and m are constants, we can conclude from Lemma 2 that (∀z)g_i* ∈ D[T^{w′}_{k,n} ∪ M^{M′}_{k,n}, m′] for some constants w′, M′, and m′.
Thus, by Lemma 1 the π_{k,n}-OBDD size of (∀z)g_i* is bounded by O(n k^{O(1)}). But as we have shown above, the π_{k,n}-OBDD for g_i can be obtained from the π_{k,n}-OBDD for (∀z)g_i* by simply replacing some variables with the constant 0. Hence, the resulting minimal π_{k,n}-OBDD for g_i can only be smaller than that for (∀z)g_i*, and thus its size is also bounded by O(n k^{O(1)}).

Remark 1. All the upper bounds in Lemma 1 and Theorem 2 proven for functions decomposable into threshold and modulo functions hold equivalently for their subfunctions f|_{α_1...α_i}, where α_1 . . . α_i is a restriction of arbitrary variables except those being quantified in the case of Theorem 2.

The following corollary summarizes the results stated above in a more convenient way. It follows from the statements in Lemma 1, Theorem 2, Lemma 2, and Remark 1.

Corollary 1. Fix a constant k ∈ N and let i, j ∈ {1, . . . , k} and Q, Q′ ∈ {∃, ∀}. Further, let (g_n)_{n∈N} ∈ D_k and f_n = g_n|_α, where α is an assignment of constants to arbitrary variables except those in {x^i_0, . . . , x^i_{n−1}}. If g_n is either given by a reduced π_{k,n}-OBDD or by the reduced π_{k,n}-OBDDs for the threshold functions into which it is decomposable, then the reduced π_{k,n}-OBDDs for (Qx^i)g_n, (Qx^i)f_n, and (Qx^i Q′x^j)g_n can be computed in time O(n^3).
We can now apply these results in order to analyze the true run time of the topological sorting algorithm for the grid graph. Whenever we talk in the following about an OBDD for some function sequence in D_k, we assume that the variable ordering is π_{k,n}. We have to specify the complete order ⪯ for the operations in (S3). A very natural order ⪯ is the lexicographical order, i.e., (x^1, y^1) ⪯ (x^2, y^2) if and only if |x^1| < |x^2| ∨ (|x^1| = |x^2| ∧ |y^1| ≤ |y^2|). Recall the steps (S1)–(S3) of the topological sorting algorithm from Section 3. We start the analysis with the edge relation E. By the definition of the grid graph, ((x^1, y^1), (x^2, y^2)) ∈ E if and only if (|x^2| − |x^1| = 0 ∧ |y^2| − |y^1| = 1) ∨ (|y^2| − |y^1| = 0 ∧ |x^2| − |x^1| = 1). Clearly, this function is in D_4. Now we look at the functions T_i obtained by (S1). Recall that T_i(u, v) is defined to be 1 if and only if there exists a path from u to v which has length exactly 2^i. Note also that in the directed grid graph all paths from vertex u to vertex v have the same length. Hence, for the directed grid graph, T_i((x^1, y^1), (x^2, y^2)) = 1 if and only if

|y^2| ≥ |y^1|  ∧  |x^2| ≥ |x^1|  ∧  |x^2| − |x^1| + |y^2| − |y^1| = 2^i.
Clearly, this function is in D_4, and thus according to Corollary 1, T_{i+1} can be computed from T_i in time O(n^3). (Note also that the quantification over one vertex in the grid graph is a quantification over two integers.) Hence, computing T_1, . . . , T_n requires time O(n^4) in total. Next, we analyze the construction of the OBDDs for the functions DIST_j in (S2). Recall that for any vertex v and any d* = d_{n−1} . . . d_j, the function DIST_j(d*, v) is true if and only if d* describes the n − j most significant bits of the binary representation of ∆(v). Let f_j ∈ B_{3n}, 0 ≤ j ≤ n, be defined by f_j(d, x, y) = 1 if and only if |d| ≤ |x| + |y| < |d| + 2^j. Hence, f_j is the conjunction of two functions in T^1_{3,n}. Furthermore, it is easy to see that DIST_j(d_{n−1} . . . d_j, (x, y)) = f_j|_{d_{j−1}=···=d_0=0}(d, x, y). Therefore, DIST_j is obtained from a function in D_3 by replacing some variables with the constant 0. Note also that DIST = DIST_0 is in fact in D_3. Moreover, due to the analysis of T_j above, it becomes obvious that T_j(u, v) ∧ DIST_{j+1}(d_{n−1} . . . d_{j+1}, u) is a function in D_5, where some variables are replaced with the constant 0. Hence, according to Corollary 1, the OBDD for g_j := ∃u : T_j(u, v) ∧ DIST_{j+1}(d_{n−1} . . . d_{j+1}, u) can be computed in time O(n^3). The function g_j is obtained from a function in D[T^8_{3,n} ∪ M^2_{3,n}, O(1)] by replacing some variables with the constant 0 (see Lemma 2). Now it is easy to see that the final two synthesis operations of (S2) required to compute DIST_j run in time O(n^3). (Apply Lemma 1 and Remark 1, and note that the function d_j ∈ B_1 can be viewed as a subfunction of f ∈ D_1 with f(d_{n−1} . . . d_0) = 1 if and only if |d_{n−1} . . . d_0| = 2^j.) Altogether, the total time required for computing DIST_{n−1}, . . . , DIST_0 = DIST is O(n^4). Finally, we have to investigate the computation of the complete order ⪯ using the operations in (S3).
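Since all paths between two grid vertices are monotone and have the same length, the closed form for T_i above can be cross-checked against an explicit path search on a small grid. The Python sketch below is purely illustrative; the grid size and vertex encoding are our own assumptions, not from the paper.

```python
def closed_form_Ti(u, v, i):
    # T_i(u, v) = 1 iff |y2| >= |y1|, |x2| >= |x1|, and
    # (|x2| - |x1|) + (|y2| - |y1|) = 2^i
    (x1, y1), (x2, y2) = u, v
    return y2 >= y1 and x2 >= x1 and (x2 - x1) + (y2 - y1) == 2 ** i

def has_path(u, v, length, size):
    # explicit search on the directed size x size grid graph, in which
    # every edge increases exactly one coordinate by 1
    if length == 0:
        return u == v
    x, y = u
    return any(has_path(w, v, length - 1, size)
               for w in ((x + 1, y), (x, y + 1))
               if w[0] < size and w[1] < size)
```

On a 5 × 5 grid the two predicates agree for i = 0, 1, 2, which is exactly the identity used to place T_i in D_4.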
Recall that DIST ∈ D_3. Hence, if one takes the definition of ⪯ into account, the complete term in (S3) before the first quantification describes
a function h in D_4. According to Corollary 1, the function h′ = (∃d^v ∃d^u)h can be computed in time and space O(n^3). Summing up the time bounds for all OBDD operations, we have obtained the following result.

Theorem 3. The OBDD algorithm for topological sorting takes time O(n^4) on the directed 2^n × 2^n grid graph for an appropriate variable ordering π_{k,n} and the complete order ⪯ as defined above.
5
Conclusion
Since the results about the threshold and modulo functions are quite general, we hope that they may be applicable to the analysis of other symbolic OBDD algorithms as well. It would be nice to extend the techniques in such a way that not only single input instances but also small graph classes can be handled. An interesting example would be grids from which some arbitrary or randomly chosen edges have been removed.
Acknowledgments I thank Daniel Sawitzki and Ingo Wegener for helpful comments and discussions.
References
1. Y. Breitbart, H. B. Hunt III, and D. J. Rosenkrantz. On the size of binary decision diagrams representing Boolean functions. Theor. Comp. Sci., 145:45–69, 1995.
2. J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang. Symbolic model checking: 10^20 states and beyond. Inform. and Comp., 98:142–170, 1992.
3. H. Cho, G. Hachtel, S.-W. Jeong, B. Plessier, E. Schwarz, and F. Somenzi. ATPG aspects of FSM verification. In IEEE Int. Conf. on CAD, pp. 134–137, 1990.
4. H. Cho, S.-W. Jeong, F. Somenzi, and C. Pixley. Synchronizing sequences and symbolic traversal techniques in test generation. Journal of Electronic Testing: Theory and Applications, 4:19–31, 1993.
5. G. D. Hachtel and F. Somenzi. A symbolic algorithm for maximum flow in 0-1 networks. Formal Methods in System Design, pp. 207–219, 1997.
6. S. Jukna. The graph of integer multiplication is hard for read-k-times networks. Technical Report 95-10, Universität Trier, 1995.
7. M. Keim, R. Drechsler, B. Becker, M. Martin, and P. Molitor. Polynomial formal verification of multipliers. Formal Methods in System Design, 22:39–58, 2003.
8. R. Rudell. Dynamic variable ordering for ordered binary decision diagrams. In IEEE Int. Conf. on CAD, pp. 42–47, 1993.
9. D. Sawitzki. Implicit flow maximization by iterative squaring. Manuscript. http://ls2-www.cs.uni-dortmund.de/˜sawitzki.
10. D. Sawitzki. Implicit flow maximization on grid networks. Manuscript. http://ls2-www.cs.uni-dortmund.de/˜sawitzki.
11. D. Sawitzki. Implizite Algorithmen für Graphprobleme. Diploma thesis, Univ. Dortmund, 2002.
12. I. Wegener. Branching Programs and Binary Decision Diagrams – Theory and Applications. SIAM, 2000.
Ershov’s Hierarchy of Real Numbers

Xizhong Zheng¹, Robert Rettinger², and Romain Gengler¹

¹ BTU Cottbus, 03044 Cottbus, Germany
[email protected]
² FernUniversität Hagen, 58084 Hagen, Germany
Abstract. Analogous to Ershov’s hierarchy for ∆02 -subsets of natural numbers we discuss the similar hierarchy for recursively approximable real numbers. Namely, with respect to different representations of real numbers, we define k-computability and f -computability for natural numbers k and functions f . We will show that these notions are not equivalent for representations based on Cauchy sequences, Dedekind cuts and binary expansions.
1
Introduction
In classical mathematics, real numbers are represented typically by Dedekind cuts, Cauchy sequences of rational numbers, and binary or decimal expansions. The effectivization of these representations leads to equivalent definitions of computable real numbers. This notion was first explored by Alan Turing in his famous paper [14], where also the Turing machine is introduced. According to Turing, the computable numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means (page 230, [14]). In other words, a real number x ∈ [0; 1]¹ is called computable if there is a computable function f : N → {0, 1, · · · , 9} such that x = Σ_{i∈N} f(i) · 10^{−i}. Robinson [9] has observed that computable real numbers can be equivalently defined via Dedekind cuts and Cauchy sequences.

Theorem 1 (Robinson [9], Myhill [6] and Rice [8]). For any real number x ∈ [0; 1], the following are equivalent.
1. x is computable;
2. The Dedekind cut L_x := {r ∈ Q : r < x} of x is a recursive set;
3. There is a recursive set A ⊆ N such that x = x_A := Σ_{i∈A} 2^{−i};
4. There is a computable sequence (x_s) of rational numbers which converges to x effectively in the sense that

(∀s, t ∈ N)(t ≥ s ⟹ |x_s − x_t| ≤ 2^{−s}).    (1)
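For a concrete instance of items 3 and 4, take the recursive set A = {2, 4, 6, . . .}, so that x_A = Σ_{i∈A} 2^{−i} = 1/3; the partial sums of this series form a computable sequence which converges effectively. A small Python sketch (the choice of A is ours, for illustration only):

```python
from fractions import Fraction

def in_A(i):
    # the recursive set A = {2, 4, 6, ...}; membership is decidable
    return i > 0 and i % 2 == 0

def x(s):
    # partial sum up to index s; since the tail is at most
    # sum_{i > s} 2^{-i} = 2^{-s}, we get |x(s) - x(t)| <= 2^{-s}
    # for all t >= s, i.e. the effective convergence condition (1)
    return sum(Fraction(1, 2 ** i) for i in range(s + 1) if in_A(i))
```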
¹ In this paper we consider only the real numbers of the unit interval [0; 1]. For other real numbers y, there are an n ∈ N and an x ∈ [0; 1] such that y := x ± n; y and x are regarded as being of the same computability.
B. Rovan and P. Vojtáš (Eds.): MFCS 2003, LNCS 2747, pp. 681–690, 2003. © Springer-Verlag Berlin Heidelberg 2003
Because of Specker’s example of an increasing computable sequence of rational numbers with a non-computable limit in [13], the extra condition (1) of effective convergence is essential for the computability of x. As observed by Specker [13], Theorem 1 does not hold if the effectivization to the primitive recursive instead of the computable level is considered. Let R_1 be the class of all limits of primitive recursive sequences of rational numbers which converge primitive recursively, R_2 the class of all real numbers of primitive recursive binary expansions, and let R_3 include all real numbers of primitive recursive Dedekind cuts. It is shown in [13] that R_3 ⊊ R_2 ⊊ R_1. For polynomial time computability of real numbers, Ko [5] shows this dependence on representations of real numbers too. Let P_C be the class of limits of all polynomial time computable sequences of dyadic rational numbers which converge effectively, P_D contain all real numbers of polynomial time computable Dedekind cuts, and P_B be the class of real numbers whose binary expansions are polynomial time computable (with the input n written in unary notation). Ko [5] shows that P_D = P_B ⊊ P_C, and that P_C is a real closed field while P_D is not closed under addition and subtraction. In [5], the set D := ∪_{n∈N} D_n of dyadic rational numbers, for D_n := {m · 2^{−n} : m ∈ N}, is used as base set instead of Q. For the complexity discussion D seems more natural and easier to use, but for computability it makes no essential difference, and we use both D and Q in this paper.

In this paper, we investigate similar classes where we weaken the notion of computability in several quite natural ways instead of strengthening it. A typical approach to explore the non-computable objects is to classify them into equivalence classes or so-called degrees by various reductions (see e.g. [12]). This can be easily implemented for real numbers by mapping each set A ⊆ N to a real number x_A := Σ_{i∈A} 2^{−i} and then defining the Turing reduction x_A ≤_T x_B by A ≤_T B.
This definition is robust, as shown in [2]. The benefit of this approach is that the techniques and results from well developed recursion theory can be applied straightforwardly. For example, Ho [4] shows that a real number x is Turing reducible to 0′, the degree of the halting problem K, iff there is a computable sequence of rational numbers which converges to x. This is a counterpart of Shoenfield’s Limit Lemma ([10]) in recursion theory, which says that A ≤_T K iff A is the limit of a computable sequence of subsets of natural numbers. However, the classification of real numbers by Turing reductions seems not fine enough, and it does not relate very closely to the analytical properties of real numbers. In this paper we will give another classification of real numbers which is analogous to Ershov’s hierarchy ([3]) for subsets of natural numbers. Notice that, if A ⊆ N is recursive, then there is an algorithm which tells us whether a natural number n belongs to A or not. In this case, corrections are not allowed. However, if we allow the algorithm to change its mind about the membership of n in A from negative to positive, but at most once, then the corresponding set A is an r.e. set. In other words, the algorithm may claim n ∉ A at some stage and correct its claim to n ∈ A at a later stage. In general, given a function h : N → N, if the algorithm is allowed to change the answer to the question “n ∈ A?” at most h(n) times for any n ∈ N, then the corresponding
set A is called h-r.e. according to Ershov [3]. In particular, for a constant function h(n) ≡ k, the h-r.e. sets are called k-r.e.; for recursive functions h, the h-r.e. sets are called ω-r.e. This introduces a classification of the ∆^0_2 subsets of N (the so-called Ershov hierarchy). Obviously, we can transfer this hierarchy to real numbers via their binary expansions straightforwardly. More precisely, we call x_A h-binary computable if A is h-r.e. Similarly, after extending Ershov’s hierarchy to subsets of rational numbers, we can call x h-Dedekind computable if the Dedekind cut of x is an h-r.e. set. For the Cauchy representation of real numbers, a classification similar to Ershov’s can be introduced too. In this case, we count the number of the “big jumps” of the sequence instead of the number of the “mind-changes”. According to Theorem 1.4, x is computable if there is a computable sequence (x_s) of rational numbers which converges to x and makes no big jumps in the sense of (1). However, if up to h(n) (non-nested) “big jumps” are allowed, then x is called h-Cauchy computable. Thus, three kinds of h-computability of real numbers can be naturally introduced. In this paper, we will investigate these notions and compare them with other known notions of weak computability of real numbers discussed in [15]. We will find that Cauchy computability is the most natural notion, although several interesting results about binary and Dedekind computability are obtained in this paper as well.
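The mind-change model just described is easy to make concrete: a procedure emits successive yes/no guesses for “n ∈ A?”, starting from “no” (since A_0 = ∅), and h(n) bounds how often the guess may flip. A toy Python sketch (our own illustration, not from the paper):

```python
def mind_changes(guesses):
    # guesses: the successive answers of an algorithm to "n in A?",
    # starting implicitly from "no" because A_0 is the empty set
    changes, current = 0, False
    for g in guesses:
        if g != current:
            changes += 1
            current = g
    return changes
```

A recursive set needs no corrections; an r.e. set flips at most once (from “no” to “yes”), i.e. it is 1-r.e.; a k-r.e. set allows up to k flips per element.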
2
Basic Definitions
In this section, we first recall some notions of weak computability of real numbers and Ershov’s hierarchy. Then we give the precise definitions of binary, Dedekind and Cauchy computability. As mentioned in the previous section, a real number x is computable if there is a computable sequence (x_s) of rational numbers which converges to x effectively in the sense of (1). The limit of an increasing or decreasing computable sequence of rational numbers is called left computable or right computable, respectively. Left and right computable real numbers are called semi-computable. If x is a difference of two left computable real numbers, then x is called weakly computable. According to Ambos-Spies, Weihrauch and Zheng [1], x is weakly computable iff there is a computable sequence (x_s) of rational numbers which converges to x weakly effectively in the sense that Σ_{s∈N} |x_s − x_{s+1}| ≤ c for a constant c. More generally, if x is simply the limit of a computable sequence of rational numbers, then x is called recursively approximable. The classes of computable, left computable, right computable, semi-computable, weakly computable and recursively approximable real numbers are denoted by EC, LC, RC, SC, WC and RA, respectively. For any finite set A := {x_1 < x_2 < · · · < x_k} of natural numbers, the natural number i := 2^{x_1} + 2^{x_2} + · · · + 2^{x_k} is called the canonical index of A. The set with canonical index i is denoted by D_i. A sequence (A_s) of finite subsets of N is called computable if there is a computable function g : N → N such that A_s = D_{g(s)} for any s ∈ N. Similarly, we can introduce canonical indices for subsets of dyadic rational numbers. Let σ : N → D be a one-to-one coding of
the dyadic numbers. For any finite set A ⊆ D, its canonical index is defined as the canonical index of the set A_σ := σ^{−1}(A) := {n ∈ N : σ(n) ∈ A}. In this paper, the subset A ⊆ D of canonical index n is denoted by V_n. A sequence (A_s) of finite subsets of dyadic numbers is called computable if there is a recursive function h such that A_s = V_{h(s)} for all s ∈ N.

Definition 1 (Ershov [3]). For any function h : N → N, a set A ⊆ N is called h-recursively enumerable (h-r.e. for short) if there is a computable sequence (A_s) of finite subsets A_s ⊆ N such that
1. A_0 = ∅ and A = ∪_{i=0}^{∞} ∩_{j=i}^{∞} A_j;
2. (∀n ∈ N)(|{s : n ∈ A_s ∆ A_{s+1}}| ≤ h(n)), where A∆B := (A \ B) ∪ (B \ A) is the symmetric difference of A and B.
In this case, the sequence (A_s) is called an effective h-enumeration of A. For k ∈ N, a set A is called k-r.e. if it is h-r.e. for the constant function h(n) ≡ k, and A is ω-r.e. if it is h-r.e. for some recursive function h. For convenience, recursive sets are called 0-r.e.

Theorem 2 (Hierarchy Theorem, Ershov [3]). Let f, g : N → N be recursive functions. If (∃^∞ n ∈ N)(f(n) < g(n)), then there is a g-r.e. set which is not f-r.e.

Thus, there is an ω-r.e. set which is not k-r.e. for any k ∈ N; there is a (k + 1)-r.e. set which is not k-r.e. (for every k ∈ N); and there is also a ∆^0_2-set which is not ω-r.e. The definition of h-r.e., k-r.e. and ω-r.e. subsets of natural numbers can be transferred straightforwardly to subsets of dyadic rational numbers. Of course, h should be a function of type h : D → N in this case. This should be clear from context and is usually not indicated explicitly later on. Thus, we can easily introduce corresponding hierarchies for real numbers by means of binary or Dedekind representations of real numbers. However, if the real numbers are represented by sequences of rational numbers, we should count the number of their jumps of certain sizes. More precisely, we have the following definition.

Definition 2. Let n be a natural number and (x_s) a sequence of real numbers which converges to x.
1. An n-jump of (x_s) is a pair (i, j) with n < i < j and 2^{−n} ≤ |x_i − x_j| < 2^{−n+1}.
2. The n-divergence of (x_s) is the maximal number of non-nested n-jump pairs of (x_s), i.e., the maximal natural number m such that there is a chain n < i_1 < j_1 ≤ i_2 < j_2 ≤ · · · ≤ i_m < j_m with 2^{−n} ≤ |x_{i_t} − x_{j_t}| < 2^{−n+1} for t = 1, 2, · · · , m.
3. For h : N → N, if the n-divergence of (x_s) is bounded by h(n) for any n ∈ N, then we say that (x_s) converges to x h-effectively.

Definition 3. Let x ∈ [0; 1] be a real number and h : N → N a function.
1. x is h-binary computable (h-bEC for short) if there is an h-r.e. set A ⊆ N such that x = x_A;
2. x is h-Cauchy computable (h-cEC for short) if there is a computable sequence (x_s) of rational numbers which converges to x h-effectively;
3. x is h-Dedekind computable (h-dEC for short) if the left Dedekind cut L_x := {r ∈ Q : r < x} is an h-r.e. set;
4. For δ ∈ {b, c, d}, x is called k-δEC if x is h-δEC for the constant function h(n) ≡ k, and x is called ω-δEC if it is h-δEC for a recursive function h.
The classes of all k-δEC, ω-δEC and h-δEC real numbers are denoted by k-δEC, ω-δEC and h-δEC, respectively, for δ ∈ {b, c, d}. Besides, let ∗-δEC := ∪_{n∈N} n-δEC. The following proposition follows directly from the definition.

Proposition 1. For δ ∈ {b, c, d} and f, g : N → N, the following hold.
1. 0-δEC = EC.
2. k-δEC ⊆ (k + 1)-δEC ⊆ ∗-δEC ⊆ ω-δEC, for any k ∈ N.
3. If f(n) ≤ g(n) holds for almost all n ∈ N, then f-δEC ⊆ g-δEC.
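For a finite prefix of a sequence, the n-divergence of Definition 2 can be computed by greedy interval scheduling over the candidate n-jump pairs. The following Python sketch is our own illustration (it assumes the finite prefix already exhibits all relevant jumps):

```python
def n_divergence(xs, n):
    # candidate n-jumps: pairs (i, j) with n < i < j and
    # 2^-n <= |x_i - x_j| < 2^-(n-1)
    pairs = [(i, j)
             for i in range(n + 1, len(xs))
             for j in range(i + 1, len(xs))
             if 2 ** -n <= abs(xs[i] - xs[j]) < 2 ** -(n - 1)]
    # a longest non-nested chain i_1 < j_1 <= i_2 < j_2 <= ... is found
    # greedily by earliest right endpoint, as in interval scheduling
    pairs.sort(key=lambda p: p[1])
    count, last = 0, 0
    for i, j in pairs:
        if i >= last:
            count += 1
            last = j
    return count
```

The greedy choice of the earliest right endpoint is optimal here for the same reason as in classic activity selection: two pairs are compatible exactly when the second starts no earlier than the first ends.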
3
Binary Computability
In this section we discuss binary computability. From Theorem 2, it follows immediately that g-bEC \ f-bEC ≠ ∅ if (∃^∞ n ∈ N)(f(n) < g(n)). Thus, we have the following hierarchy theorem for binary computability.

Proposition 2. k-bEC ⊊ (k + 1)-bEC ⊊ ∗-bEC ⊊ ω-bEC, for any k ∈ N.

Now we compare binary computability with semi-computability. It turns out that SC is incomparable with ∗-bEC but included properly in ω-bEC.

Theorem 3. 1. SC ⊊ ω-bEC. 2. SC ⊈ ∗-bEC. 3. 2-bEC ⊈ SC.

Proof. 1. As pointed out by Soare ([11], page 217), if the real number x_A is left computable, then the set A is 2^{n+1}-r.e. Combining this with Theorem 2, SC ⊊ ω-bEC follows immediately.
2. We construct a set A ⊆ N in stages such that x_A is left computable and, for all i, j ∈ N, the following requirements are satisfied:

R_{i,j} : (D_{ϕ_i(s)})_s is an effective j-enumeration ⟹ A ≠ lim_{s→∞} D_{ϕ_i(s)},
where (ϕ_i) is an effective enumeration of all computable partial functions ϕ :⊆ N → N. This implies that A is not ∗-r.e. To satisfy R_e for e := ⟨i, j⟩, we choose an n_e > j. We put n_e into A as long as n_e is not in D_{ϕ_i(s)}. If n_e enters D_{ϕ_i(s)} for some s, then we take n_e out of A. n_e may be put into A again if n_e leaves D_{ϕ_i(t)} for some t > s, and so on. Obviously, we need to change the membership of n_e in A at most j times, and the strategy succeeds eventually. To make x_A left computable, we reserve an interval [m_e; n_e] of natural numbers with n_e − m_e > j
exclusively for R_e and put a new element from this interval into A whenever n_e is taken out of A.
3. Ambos-Spies, Weihrauch and Zheng (Theorem 4.8 of [1]) show that, for Turing incomparable r.e. sets A, B ⊆ N, x_{A⊕B̄} is not semi-computable, where B̄ is the complement of B and A ⊕ B̄ := {2n : n ∈ A} ∪ {2n + 1 : n ∈ B̄}. On the other hand, for any r.e. sets A, B, the join A ⊕ B̄ = (2A ∪ (2N + 1)) \ (2B + 1) is a 2-r.e. set and hence x_{A⊕B̄} is 2-bEC.

Theorem 4. WC ⊈ ω-bEC and ω-bEC ⊈ WC.

Proof. In [16] Zheng shows that there are r.e. sets A, B ⊆ N such that the set C ⊆ N defined by x_C := x_A − x_B is not of ω-r.e. Turing degree. This means that x_C is weakly computable but not ω-bEC. That is, WC ⊈ ω-bEC. The part ω-bEC ⊈ WC follows immediately from a result of [1] that, if x_{A⊕∅′} is weakly computable, then A is a 2^{3n}-r.e. set. By Ershov’s Hierarchy Theorem 2, we can choose an ω-r.e. set A which is not 2^{3n}-r.e. Then B := A ⊕ ∅′ is obviously also an ω-r.e. set and hence x_B is ω-bEC. But x_B is not weakly computable because A is not 2^{3n}-r.e.
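The 2-r.e. bound on the join used in the proof of Theorem 3.3 can be simulated stage by stage: even positions follow an enumeration of A (at most one change), while odd positions start inside the set and leave once the corresponding element enters B (at most two changes). A Python sketch with toy stage-wise enumerations of our own choosing:

```python
def join_approximations(A_stages, B_stages, size):
    # A_stages[s], B_stages[s]: finite stage-s approximations of r.e. sets
    # A and B.  Approximates the join of A and the complement of B on
    # [0, size) and counts the mind changes per element, starting from
    # the empty set as in Ershov's definition.
    changes, prev = {}, set()
    for As, Bs in zip(A_stages, B_stages):
        cur = ({2 * n for n in As}
               | {2 * n + 1 for n in range(size) if n not in Bs})
        for n in cur ^ prev:            # symmetric difference = mind changes
            changes[n] = changes.get(n, 0) + 1
        prev = cur
    return prev, changes
```

Every element changes its membership at most twice, witnessing that the join is 2-r.e.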
4
Dedekind Computability
We investigate Dedekind computability in this section. Again, the classes ω-dEC and WC are incomparable. But, different from the case of binary computability, the hierarchy theorem does not hold any more. Between ω-binary and ω-Dedekind computability we have the following result.

Theorem 5. ω-bEC ⊆ ω-dEC.

Proof. Let x_A ∈ ω-bEC and let (A_s) be an effective h-enumeration of A for a recursive function h. We define a computable sequence (E_s) of finite subsets of dyadic numbers by E_s := {r ∈ D_s : r ≤ x_{A_s}}, where D_s is the set of all dyadic rational numbers of precision s. It is easy to see that E := lim_s E_s exists and is in fact the left Dedekind cut of the real number x_A. On the other hand, (E_s) is an effective g-enumeration of E, where g(n) := Σ_{i≤n} h(i). Thus, x_A is a g-dEC and hence an ω-dEC real number.

The next result shows that the class ∗-dEC collapses to SC, and hence the hierarchy theorem does not hold.

Lemma 1. 1. 1-dEC = LC and SC ⊆ 2-dEC.
2. ∗-dEC = SC.

Proof. 1. This follows directly from the definition.
2. By item 1, it suffices to prove that ∗-dEC ⊆ SC. For any x ∈ ∗-dEC, let k := min{n : x ∈ n-dEC}. Then the Dedekind cut L_x of x is a k-r.e. but not (k − 1)-r.e. set. Let (A_s) be an effective k-enumeration of L_x. Then there are infinitely many r ∈ D such that |{s ∈ N : r ∈ A_{s+1} ∆ A_s}| = k, where
A∆B := (A \ B) ∪ (B \ A). Let O_k := {r ∈ D : |{s ∈ N : r ∈ A_{s+1} ∆ A_s}| = k}. Obviously, O_k is an r.e. set. If k > 0 and k is even, then x < r for any r ∈ O_k, and we can choose a decreasing computable sequence (r_s) from O_k such that lim r_s = x. Otherwise, there is a rational number y such that x < y < r for all r ∈ O_k. In this case, we can construct an effective (k − 1)-enumeration of L_x by allowing any r > y to enter L_x at most k/2 − 1 times. This contradicts the hypothesis. Thus x is a right computable real number. Similarly, if k is odd, then x is left computable.

Theorem 6. WC ⊈ ω-dEC.

Proof. We construct recursive enumerations (A_s) and (B_s) of r.e. sets A and B, respectively, and define C_s by x_{C_s} = x_{A_s} − x_{B_s}. Let C := lim_{s→∞} C_s = ∪_{s∈N} ∩_{t≥s} C_t. Then x_C is a weakly computable real number. To guarantee that x_C is not ω-dEC, it suffices to satisfy the following requirements for all i, j ∈ N:

R_{i,j} : (ϕ_i and ψ_j are total functions, (V_{ϕ_i(s)})_{s∈N} is an effective ψ_j-enumeration, and E_i := lim_{s→∞} V_{ϕ_i(s)} is a Dedekind cut) ⟹ sup(E_i) ≠ x_C,

where (ϕ_i) and (ψ_j) are recursive enumerations of partial computable functions ϕ_i :⊆ N → N and ψ_j :⊆ D → N, respectively. This can be achieved by a finite injury priority construction.

Corollary 1. The class ω-dEC is incomparable with the class WC, and hence the class ∗-dEC is a proper subset of ω-dEC.

Corollary 2. The class ω-dEC is not closed under addition and subtraction.

Proof. By Lemma 1.2, we have SC ⊆ ω-dEC. If ω-dEC were closed under addition and subtraction, then WC ⊆ ω-dEC would hold, because WC is the closure of SC under addition and subtraction. This contradicts Theorem 6.
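The stage-wise cuts E_s = {r ∈ D_s : r ≤ x_{A_s}} from the proof of Theorem 5 are easy to simulate for a toy enumeration; the Python sketch below (names and the example set are ours) builds one stage:

```python
from fractions import Fraction

def cut_stage(A_s, s):
    # E_s = { r in D_s : r <= x_{A_s} }, where D_s is the set of dyadic
    # rationals of precision s in [0; 1] and x_A = sum_{i in A} 2^{-i}
    x = sum(Fraction(1, 2 ** i) for i in A_s)
    return {Fraction(m, 2 ** s)
            for m in range(2 ** s + 1) if Fraction(m, 2 ** s) <= x}
```

Counting, per dyadic rational, how often its membership flips across the stages recovers the g-enumeration bound claimed in the proof.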
5
Cauchy Computability
We discuss Cauchy computability in this section. We will show that the classes k-cEC and ∗-cEC are incomparable with the classes LC and SC, and that the class ∗-cEC is not closed under addition. However, the hierarchy theorem holds. From the definition of ω-Cauchy computability, it is easy to see that x is ω-Cauchy computable iff there are a recursive function h and a computable sequence (x_s) of rational numbers converging to x such that, for any n ∈ N, there are at most h(n) non-nested pairs (i, j) of indices with |x_i − x_j| ≥ 2^{−n}. Thus, the class ω-cEC is in fact the class DBC (divergence bounded computable real numbers) discussed in [7], and hence it is in fact the image of the class of all left computable real numbers under total computable real functions. We summarize some known results about the class ω-cEC in the next theorem, where CTF denotes the class of all computable real functions f : [0; 1] → [0; 1].
Theorem 7 (Rettinger, Zheng, Gengler and von Braunmühl [7]).
1. The class ω-cEC is a field;
2. ω-cEC = CTF(LC) := {f(y) : f ∈ CTF & y ∈ LC}; and
3. WC ⊊ ω-cEC ⊊ RA.

Now let us look at the relationship among the classes 1-cEC, ∗-cEC and the classes SC and WC.

Theorem 8. 1-cEC ⊈ SC ⊈ ∗-cEC ⊊ WC.

Proof. For the first noninclusion, consider the number x_{A⊕B̄} for two Turing incomparable r.e. sets A, B ⊆ N. By Theorem 4.8 of [1], it is not semi-computable, but it is 1-cEC. For the second noninclusion, we can construct, by a priority construction, a left computable real number which is not k-cEC for any k ∈ N. To prove ∗-cEC ⊆ WC, let (x_s) be a computable sequence of rational numbers which converges k-effectively to a ∗-cEC real number x for some k ∈ N. For any n ∈ N, let S_n := {s ∈ N : 2^{−n} ≤ |x_s − x_{s+1}| < 2^{−n+1}}. Then

Σ_{s∈N} |x_s − x_{s+1}| = Σ_{n∈N} (Σ_{s∈S_n, s≤n} |x_s − x_{s+1}| + Σ_{s∈S_n, s>n} |x_s − x_{s+1}|) ≤ Σ_{n∈N} ((n + 1) · 2^{−n+1} + k · 2^{−n+1}) ≤ 8 + 4k.

That is, x is a weakly computable real number. Therefore, ∗-cEC ⊆ WC. By the assertion SC ⊈ ∗-cEC above, this inclusion is also proper.

Theorem 9. For any recursive functions f, g with ∃^∞ n (f(n) < g(n)), there is a g-cEC real number which is not f-cEC, i.e., g-cEC \ f-cEC ≠ ∅.

Proof. We construct a computable sequence (x_s) of rational numbers which satisfies, for any e ∈ N, the following requirements:

N :  (x_s) converges g-effectively to x, and
R_e : if (ϕ_e(s))_s converges f-effectively, then x ≠ lim_s ϕ_e(s),
where (ϕ_e) is an effective enumeration of all computable partial functions ϕ_e :⊆ N → Q. To satisfy a single requirement R_e, choose a rational interval I_e of length 2^{−n_e} for some n_e ∈ N such that f(n_e) < g(n_e). Divide it equally into four subintervals I^i, for i < 4, of length 2^{−(n_e+2)}. Define x_s as the middle point of the interval I^1 as long as the sequence (ϕ_e(s))_s does not enter the interval I^1. Otherwise, if ϕ_e(s) enters I^1 for some s, then let x_s be the middle point of I^3. Later, if ϕ_e(t) enters I^3 for some t > s, then let x_t be the middle point of I^1 again, and so on. If (ϕ_e(s))_s converges f-effectively, then (x_s) needs at most f(n_e) + 1 ≤ g(n_e) jumps to guarantee that lim x_s ≠ lim_s ϕ_e(s). Thus, the requirement N is satisfied too. To satisfy all the requirements simultaneously, we will construct an increasing sequence (n_e) of natural numbers such that f(n_e) < g(n_e) and n_e + 2 ≤ n_{e+1} for all e ∈ N, and two sequences (I_e) and (J_e) of rational intervals I_e := [a_e; b_e] and J_e := [c_e; d_e] which satisfy the following conditions:

a_e < b_e < c_e < d_e  &  b_e − a_e = d_e − c_e = 2^{−(n_e+1)}  &  c_e − b_e = 2^{−n_e},    (2)
Ershov’s Hierarchy of Real Numbers
689
and I_{e+1} ∪ J_{e+1} ⊂ I_e for all e ∈ N. The intervals I_e and J_e are reserved for the requirement R_e. That is, we construct a computable sequence (x_s) of rational numbers such that x_s is properly chosen from I_e or J_e in order to guarantee lim_s x_s ≠ lim_s ϕ_e(s).

In general, the sequences (n_e), (I_e) and (J_e) are not computable, but they can be effectively approximated. Namely, at stage s, we construct the finite approximation sequences (n_{e,s})_{e≤k(s)}, (I_{e,s})_{e≤k(s)} and (J_{e,s})_{e≤k(s)}, where k(s) ∈ N satisfies lim_s k(s) = ∞. At any stage s, we choose a rational number x_s such that x_s ∈ I_{e,s} for all e ≤ k(s). If ϕ_{e,s}(t) enters the interval I_{e,s} too, for some t, then we exchange I_{e,s} and J_{e,s}; in this case, we denote this t by t_{e,s}. For any i > e, the intervals I_i and J_i are then cancelled and redefined with a new n_{i,t} > n_{i,s} at some stage t > s. For the same n_e, the intervals I_e and J_e can be exchanged at most f(n_e) times if (ϕ_e(s))_s converges f-effectively. Therefore, a finite injury priority construction can be applied.

Corollary 3. For any k ∈ N, we have k-cEC ⊊ (k+1)-cEC.

Theorem 10. There are x, y ∈ 1-cEC such that x − y ∉ ∗-cEC. Therefore, k-cEC and ∗-cEC are not closed under addition and subtraction for any k > 0.

Proof. We construct two computable increasing sequences (x_s) and (y_s) of rational numbers which converge 1-effectively to x and y, respectively, such that z := x − y satisfies all the following requirements:

R_{i,j}:
(ϕ_i(s))_s converges j-effectively to u_i  =⇒  u_i ≠ z,
where (ϕ_i) is an effective enumeration of all partial computable functions ϕ_i :⊆ N → Q. To satisfy R_e (e := ⟨i, j⟩), we choose two natural numbers n_e and m_e such that m_e = 2j + n_e + 2, and a rational interval I := [a_e^0; a_e^8] of length 2^{-m_e+2}. The interval I is divided equally into eight subintervals I^k := [a_e^k; a_e^{k+1}] for k < 8. At the beginning, let x_0 := a_e^2 and y_0 := 0, and hence z_0 := x_0 − y_0 = a_e^2 ∈ J := I^2, where J serves as a witness interval of R_e such that any element z ∈ J satisfies R_e. If, at some stage s_0 > 0, ϕ_i(t_0) enters the interval J for some t_0, then we define x_{s_0} := x_0 + 2^{-(n_e+1)} + 3 · 2^{-(m_e+1)}, y_{s_0} := y_0 + 2^{-(n_e+1)} and J := I^5. Accordingly we have z_{s_0} := x_{s_0} − y_{s_0} = z_0 + 3 · 2^{-(m_e+1)} and hence z_{s_0} ∈ J. If, at a later stage s_1 > s_0, ϕ_i(t_1) enters the interval J := I^5 for some t_1 > t_0, then we define x_{s_1} := x_{s_0} + 2^{-(n_e+2)}, y_{s_1} := y_{s_0} + 2^{-(n_e+2)} + 3 · 2^{-(m_e+1)} and J := I^2. In this case, we have z_{s_1} := x_{s_1} − y_{s_1} = z_0 and hence z_{s_1} ∈ J. This can happen at most j times if (ϕ_i(s))_s converges j-effectively. Thus we have lim_s z_s ≠ lim_s ϕ_i(s) and R_e is satisfied. To satisfy all the requirements, we apply a finite injury priority construction.
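The bookkeeping in this construction is delicate, so the following Python sketch replays one round of the strategy for a single requirement with exact rational arithmetic. The variable names and the concrete parameters n_e = 3, j = 2 are ours, not from the paper; the sketch assumes the reading in which the extra term 3·2^{-(m_e+1)} is added to x on the first exchange and to y on the second, so that both sequences only increase while z = x − y oscillates between the subintervals I^2 and I^5.

```python
from fractions import Fraction

def pow2(k):
    # 2^k as an exact rational; k may be negative
    return Fraction(2) ** k

# Illustrative parameters (our choice, not from the paper)
n_e, j = 3, 2
m_e = 2 * j + n_e + 2                     # m_e = 2j + n_e + 2

# Endpoints a_e^0, ..., a_e^8 of the interval I, spaced 2^{-(m_e+1)},
# so I has length 8 * 2^{-(m_e+1)} = 2^{-m_e+2} as in the proof.
a = [k * pow2(-(m_e + 1)) for k in range(9)]

x, y = a[2], Fraction(0)                  # z_0 = a_e^2, in J := I^2
z = x - y
assert a[2] <= z <= a[3]                  # z in I^2

# Opponent enters I^2: x absorbs the extra 3*2^{-(m_e+1)}, z moves to I^5.
x += pow2(-(n_e + 1)) + 3 * pow2(-(m_e + 1))
y += pow2(-(n_e + 1))
z = x - y
assert a[5] <= z <= a[6]                  # z now in I^5

# Opponent enters I^5: y absorbs the extra term, z returns to I^2,
# although both x and y kept increasing.
x += pow2(-(n_e + 2))
y += pow2(-(n_e + 2)) + 3 * pow2(-(m_e + 1))
z = x - y
assert a[2] <= z <= a[3]                  # back in I^2
print("oscillation verified, z =", z)
```

Using Fraction keeps the endpoint comparisons exact; with floating point, the boundary tests at a_e^2 and a_e^5 could fail by rounding.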
References

1. K. Ambos-Spies, K. Weihrauch, and X. Zheng. Weakly computable real numbers. Journal of Complexity, 16(4):676–690, 2000.
2. A. J. Dunlop and M. B. Pour-El. The degree of unsolvability of a real number. In J. Blanck, V. Brattka, and P. Hertling, editors, Computability and Complexity in Analysis (CCA 2000, Swansea, UK, September 2000), volume 2064 of LNCS, pages 16–29. Springer, Berlin, 2001.
690
Xizhong Zheng, Robert Rettinger, and Romain Gengler
3. Y. L. Ershov. A certain hierarchy of sets I, II, III (Russian). Algebra i Logika, 7(1):47–73, 1968; 7(4):15–47, 1968; 9:34–51, 1970.
4. C.-K. Ho. Relatively recursive reals and real functions. Theoretical Computer Science, 210:99–120, 1999.
5. K.-I. Ko. Complexity Theory of Real Functions. Progress in Theoretical Computer Science. Birkhäuser, Boston, 1991.
6. J. Myhill. Criteria of constructibility for real numbers. The Journal of Symbolic Logic, 18(1):7–10, 1953.
7. R. Rettinger, X. Zheng, R. Gengler, and B. von Braunmühl. Weakly computable real numbers and total computable real functions. In Proceedings of COCOON 2001, Guilin, China, August 20–23, 2001, volume 2108 of LNCS, pages 586–595. Springer, 2001.
8. H. G. Rice. Recursive real numbers. Proc. Amer. Math. Soc., 5:784–791, 1954.
9. R. M. Robinson. Review of "Péter, R., Rekursive Funktionen". The Journal of Symbolic Logic, 16:280–282, 1951.
10. J. R. Shoenfield. On degrees of unsolvability. Ann. of Math. (2), 69:644–653, 1959.
11. R. Soare. Cohesive sets and recursively enumerable Dedekind cuts. Pacific J. Math., 31:215–231, 1969.
12. R. I. Soare. Recursively Enumerable Sets and Degrees: A Study of Computable Functions and Computably Generated Sets. Perspectives in Mathematical Logic. Springer-Verlag, Berlin, 1987.
13. E. Specker. Nicht konstruktiv beweisbare Sätze der Analysis. The Journal of Symbolic Logic, 14(3):145–158, 1949.
14. A. M. Turing. On computable numbers, with an application to the "Entscheidungsproblem". Proceedings of the London Mathematical Society, 42(2):230–265, 1936.
15. X. Zheng. Recursive approximability of real numbers. Mathematical Logic Quarterly, 48(Suppl. 1):131–156, 2002.
16. X. Zheng. On the Turing degrees of weakly computable real numbers. Journal of Logic and Computation, 13(2):159–172, 2003.
Author Index

Àlvarez, C. 142 Amano, Kazuyuki 152 Ambos-Spies, Klaus 162 Anantharaman, Siva 169 Ausiello, G. 179 Baba, Kensuke 189 Banderier, Cyril 198 Bannai, Hideo 208 Bazgan, C. 179 Beier, René 198 Benkoczi, Robert 218 Bhattacharya, Binay 218 Blanchard, F. 228 Blesa, M. 142 Bodlaender, Hans L. 239 Böhler, Elmar 249 Bonsma, Paul S. 259 Boreale, Michele 269, 279 Brosenne, Henrik 290 Brueggemann, Tobias 259 Bucciarelli, Antonio 300 Buhrman, Harry 1 Buscemi, Maria Grazia 269 Carton, Olivier 308 Černá, Ivana 318 Cervelle, J. 228 Chen, Hubie 328, 338 Chen, Zhi-Zhong 348 Chrobak, Marek 218 Crochemore, M. 622 Dalmau, Victor 358 Dang, Zhe 480 Delhommé, Christian 378 Demange, M. 179 Díaz, J. 142 Duval, Jean-Pierre 388 Egecioglu, Omer 480 Epstein, Leah 398, 408 Feldmann, R. 21 Fellows, Michael R. 239 Fernández, A. 142 Ford, Daniel K. 358
Formenti, E. 228 Friedl, Katalin 419 Gadducci, Fabio 279 Gairing, M. 21 Gastin, Paul 429, 439 Gengler, Romain 681 Geser, Alfons 449 Glaßer, Christian 249 Gorrieri, Roberto 46 Gramlich, Gregor 460 Grossi, R. 622 Hagiwara, Masayuki 490 Hannay, Jo 68 Hliněný, Petr 470 Hofbauer, Dieter 449 Homeister, Matthias 290 Ibarra, Oscar H. 480 Inenaga, Shunsuke 208 Ishii, Toshimasa 490 Katsumata, Shin-ya 68 Knapik, Teodor 378 Kolpakov, Roman 388 Kouno, Mitsuharu 348 Krysta, Piotr 500 Kucherov, Gregory 388 Kumar, K. Narayan 429 Kutyłowski, Mirosław 511 Larmore, Lawrence L. 218 Lasota, Sławomir 521 Lecroq, Thierry 388 Lefebvre, Arnaud 388 Leporati, Alberto 92 Letkiewicz, Daniel 511 Löding, Christof 531 Loyer, Yann 541 Lücking, Thomas 21, 551 Luttik, Bas 562 Magniez, Frédéric 419 Marco, Gianluca De 368 Martinelli, Fabio 46 Martínez, Conrado 572 Maruoka, Akira 152
Mauri, Giancarlo 92 Mavronicolas, Marios 551 Meer, K. 582 Meghini, Carlo 592 Mehlhorn, Kurt 198 Meister, Daniel 249 Merkle, Wolfgang 602 Miltersen, Peter Bro 612 Molinero, Xavier 572 Monien, Burkhard 21, 551 Mukund, Madhavan 429 Narendran, Paliath 169 Oddoux, Denis 439
Paschos, V. Th. 179 Pelánek, Radek 318 Pelc, Andrzej 368 Pinchinat, Sophie 642 Pisanti, N. 622 Radhakrishnan, Jaikumar 612 Reimann, Jan 602 Reith, Steffen 632 Rettinger, Robert 681 Riedweg, Stéphane 642 Rode, Manuel 21, 551 Rohde, Philipp 531 Röhrig, Hein 1 Rusinowitch, Michael 169 Rychlik, Marcin 652 Rytter, Wojciech 218 Sagot, M.-F. 622 Salehi, Saeed 662 Salibra, Antonino 300 Sanders, Peter 500 Sannella, Donald 68 Santha, Miklos 419 Saxena, Gaurav 480 Sen, Pranab 419 Serna, M. 142 Shinohara, Ayumi 189, 208 Spirakis, Paul 551 Spyratos, Nicolas 592 Straccia, Umberto 541 Takeda, Masayuki 189, 208 Tassa, Tamir 408 Thilikos, Dimitrios M. 239 Thomas, D. Gnanaraj 378 Thomas, Wolfgang 113 Tsuruta, Satoshi 189 Tzitzikas, Yannis 592 Vöcking, Berthold 500 Vrťo, Imrich 551
Waack, Stephan 290 Waldmann, Johannes 449 Wegener, Ingo 125, 612 Woeginger, Gerhard J. 259 Woelfel, Philipp 671 Zheng, Xizhong 681