$$\max\Bigl(s,\; d\,\max\Bigl(0,\frac{1}{p}-1\Bigr)-s\Bigr). \tag{71}$$
Then $B^s_q(L_p(\mathbb{R}^d))$ is the collection of all tempered distributions $f$ such that $f$ is representable as
$$f=\sum_{k\in\mathbb{Z}^d}a_k\,\varphi_k+\sum_{i=1}^{2^d-1}\sum_{j=0}^{\infty}\sum_{k\in\mathbb{Z}^d}a_{i,j,k}\,\psi_{i,j,k}\qquad(\text{convergence in }S')$$
with
$$\|f\,|B^s_q(L_p(\mathbb{R}^d))\|^{*}:=\Bigl(\sum_{k\in\mathbb{Z}^d}|a_k|^p\Bigr)^{1/p}+\Biggl(\sum_{i=1}^{2^d-1}\sum_{j=0}^{\infty}2^{j(s+d(1/2-1/p))q}\Bigl(\sum_{k\in\mathbb{Z}^d}|a_{i,j,k}|^p\Bigr)^{q/p}\Biggr)^{1/q}<\infty,$$
if $q<\infty$, and
$$\|f\,|B^s_\infty(L_p(\mathbb{R}^d))\|^{*}:=\Bigl(\sum_{k\in\mathbb{Z}^d}|a_k|^p\Bigr)^{1/p}+\sup_{i=1,\dots,2^d-1}\;\sup_{j=0,1,\dots}2^{j(s+d(1/2-1/p))}\Bigl(\sum_{k\in\mathbb{Z}^d}|a_{i,j,k}|^p\Bigr)^{1/p}<\infty.$$
The representation is unique and
$$a_k=\langle f,\varphi_k\rangle\qquad\text{and}\qquad a_{i,j,k}=\langle f,\psi_{i,j,k}\rangle$$
hold. Further, $I: f\mapsto\{\langle f,\varphi_k\rangle,\ \langle f,\psi_{i,j,k}\rangle\}$ is an isomorphic map of $B^s_q(L_p(\mathbb{R}^d))$ onto the sequence space (equipped with the quasi-norm $\|\cdot\,|B^s_q(L_p(\mathbb{R}^d))\|^{*}$), i.e. $\|\cdot\,|B^s_q(L_p(\mathbb{R}^d))\|^{*}$ may serve as an equivalent quasi-norm on $B^s_q(L_p(\mathbb{R}^d))$. A proof of Proposition 4 has been given in [47]; see also [25] for a homogeneous version. A different proof, but restricted to $s>d(\frac{1}{p}-1)_+$, is given in [3, Theorem 3.7.7]. However, there are many forerunners with some restrictions on $s$, $p$ and $q$.
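The quasi-norm above can be evaluated directly from wavelet coefficients. The following is a minimal numerical sketch, not part of the proposition: it computes the sequence-space expression in dimension $d=1$ using PyWavelets, where the wavelet family ('db4'), the decomposition depth, and the identification of the index $j$ with decomposition levels are illustrative assumptions rather than the specific basis of Proposition 4.

```python
# Minimal sketch (illustrative assumptions, see lead-in): evaluate the
# sequence-space quasi-norm  ||f | B^s_q(L_p)||^*  for d = 1 from a
# wavelet decomposition computed with PyWavelets.
import numpy as np
import pywt

def besov_seq_norm(f, s, p, q, wavelet="db4", level=6):
    d = 1
    coeffs = pywt.wavedec(f, wavelet, level=level)   # [a_J, d_J, ..., d_1]
    a, details = coeffs[0], coeffs[1:]
    norm = np.sum(np.abs(a) ** p) ** (1.0 / p)       # scaling-coefficient part
    terms = []
    for j, dj in enumerate(details):                 # j = 0 is the coarsest detail level
        inner = np.sum(np.abs(dj) ** p) ** (1.0 / p)
        terms.append((2.0 ** (j * (s + d * (0.5 - 1.0 / p))) * inner) ** q)
    return norm + np.sum(terms) ** (1.0 / q)

f = np.sin(np.linspace(0, 2 * np.pi, 1024))
print(besov_seq_norm(f, s=1.0, p=2.0, q=2.0))
```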
A.2. Besov spaces on domains

Let $\Omega\subset\mathbb{R}^d$ be a bounded open nonempty set. Then we define $B^s_q(L_p(\Omega))$ to be the collection of all distributions $f\in D'(\Omega)$ such that there exists a tempered distribution $g\in B^s_q(L_p(\mathbb{R}^d))$ satisfying
$$f(\varphi)=g(\varphi)\qquad\text{for all }\varphi\in D(\Omega),$$
i.e. $g|_\Omega=f$ in $D'(\Omega)$. We put
$$\|f\,|B^s_q(L_p(\Omega))\|:=\inf\|g\,|B^s_q(L_p(\mathbb{R}^d))\|,$$
where the infimum is taken with respect to all distributions $g$ as above.

A.3. Sobolev spaces on domains

Let $\Omega$ be a bounded Lipschitz domain and let $m\in\mathbb{N}$. As usual, $H^m(\Omega)$ denotes the collection of all functions $f$ such that the distributional derivatives $D^\alpha f$ of order $|\alpha|\le m$ belong to $L_2(\Omega)$. The norm is defined as
$$\|f\,|H^m(\Omega)\|:=\sum_{|\alpha|\le m}\|D^\alpha f\,|L_2(\Omega)\|.$$
It is well known that $H^m(\mathbb{R}^d)=B^m_2(L_2(\mathbb{R}^d))$ in the sense of equivalent norms, cf. e.g. [42]. As a consequence of the existence of a bounded linear extension operator for Sobolev spaces on bounded Lipschitz domains, cf. [35, p. 181], it follows that $H^m(\Omega)=B^m_2(L_2(\Omega))$ (equivalent norms) for such domains. For fractional $s>0$ we introduce the classes by complex interpolation. Let $0<s<m$, $s\notin\mathbb{N}$. Then, following [26, 9.1], we define
$$H^s(\Omega):=\bigl[H^m(\Omega),L_2(\Omega)\bigr]_{\Theta},\qquad \Theta=1-\frac{s}{m}.$$
This definition does not depend on $m$ in the sense of equivalent norms, cf. [45]. The outcome $H^s(\Omega)$ coincides with $B^s_2(L_2(\Omega))$, cf. [9] for further details.

A.4. Spaces on domains and boundary conditions

We concentrate on homogeneous boundary conditions. Here it makes sense to introduce two further scales of function spaces (distribution spaces).

Definition 4. Let $\Omega\subset\mathbb{R}^d$ be an open nontrivial set. Let $s\in\mathbb{R}$ and $0<p,q\le\infty$.
(i) Then $\mathring{B}^s_q(L_p(\Omega))$ denotes the closure of $D(\Omega)$ in $B^s_q(L_p(\Omega))$, equipped with the quasi-norm of $B^s_q(L_p(\Omega))$.
(ii) Let $s\ge0$. Then $H^s_0(\Omega)$ denotes the closure of $D(\Omega)$ in $H^s(\Omega)$, equipped with the norm of $H^s(\Omega)$.
(iii) By $\tilde{B}^s_q(L_p(\Omega))$ we denote the collection of all $f\in D'(\Omega)$ such that there is a $g\in B^s_q(L_p(\mathbb{R}^d))$ with
$$g|_\Omega=f\qquad\text{and}\qquad \operatorname{supp}g\subset\overline{\Omega},\tag{72}$$
equipped with the quasi-norm
$$\|f\,|\tilde{B}^s_q(L_p(\Omega))\|=\inf\|g\,|B^s_q(L_p(\mathbb{R}^d))\|,$$
where the infimum is taken over all distributions $g$ as in (72).

Remark 9. For a bounded Lipschitz domain $\Omega$ it holds that $\mathring{B}^s_q(L_p(\Omega))=\tilde{B}^s_q(L_p(\Omega))=B^s_q(L_p(\Omega))$ if
$$0<p,q<\infty,\qquad \max\Bigl(\frac{1}{p}-1,\ d\Bigl(\frac{1}{p}-1\Bigr)\Bigr)<s<\frac{1}{p},$$
cf. [19, Corollary 1.4.4.5] and [45]. Hence,
$$H^s_0(\Omega)=\mathring{B}^s_2(L_2(\Omega))=\tilde{B}^s_2(L_2(\Omega))=B^s_2(L_2(\Omega))=H^s(\Omega)$$
if $0\le s<\frac{1}{2}$.

A.5. Sobolev spaces with negative smoothness

In what follows duality has to be understood in the framework of the dual pairing $(D(\Omega),D'(\Omega))$.

Definition 5. Let $\Omega\subset\mathbb{R}^d$ be a bounded Lipschitz domain. For $s>0$ we define
$$H^{-s}(\Omega):=\begin{cases}\bigl(H^s_0(\Omega)\bigr)' & \text{if } s-\frac{1}{2}\neq\text{integer},\\[1mm] \bigl(\tilde{B}^s_2(L_2(\Omega))\bigr)' & \text{otherwise}.\end{cases}$$

Remark 10. If $\Omega\subset\mathbb{R}^d$ is a bounded Lipschitz domain then
$$H^s_0(\Omega)=\tilde{B}^s_2(L_2(\Omega)),\qquad s>0,\quad s-\tfrac{1}{2}\neq\text{integer},$$
holds. Furthermore,
$$H^{-s}(\Omega)=B^{-s}_2(L_2(\Omega)),\qquad s>0,\tag{73}$$
to be understood in the sense of equivalent norms. Again we refer to [9] for detailed references.

A.6. Besov spaces on the torus

Here our general reference is [34, Chapter 3]. Since we also use spaces with negative smoothness $s<0$ and/or $p,q<1$, we give a definition which relies on Fourier analysis.
Let $D(\mathbb{T})$ denote the collection of all complex-valued infinitely differentiable functions on $\mathbb{T}$ (i.e. $2\pi$-periodic). By $D'(\mathbb{T})$ we denote its dual. Any $f\in D'(\mathbb{T})$ can be identified with its Fourier series $\sum_{k=-\infty}^{\infty}c_k(f)\,e^{ikx}$, where $c_k(f)=(2\pi)^{-1}f(e^{-ikx})$. Next we need a smooth dyadic decomposition of unity. Let $\varphi\in C_0^\infty(\mathbb{R})$ be a function such that $\varphi(x)=1$ if $|x|\le1$ and $\varphi(x)=0$ if $|x|\ge2$. Then we put
$$\varphi_0(x):=\varphi(x),\qquad \varphi_j(x):=\varphi(2^{-j}x)-\varphi(2^{-j+1}x),\quad j\in\mathbb{N}.\tag{74}$$
It follows that
$$\sum_{j=0}^{\infty}\varphi_j(x)=1,\qquad x\in\mathbb{R},$$
and
$$\operatorname{supp}\varphi_j\subset\bigl\{x\in\mathbb{R}:\ 2^{j-2}\le|x|\le2^{j+1}\bigr\},\qquad j=1,2,\dots.$$
By means of these functions we define the Besov classes.

Definition 6. Let $s\in\mathbb{R}$ and $0<p,q\le\infty$. Then $B^s_q(L_p(\mathbb{T}))$ is the collection of all periodic tempered distributions $f$ such that
$$\|f\,|B^s_q(L_p(\mathbb{T}))\|=\Biggl(\sum_{j=0}^{\infty}2^{sjq}\,\Bigl\|\sum_{k=-\infty}^{\infty}\varphi_j(k)\,c_k(f)\,e^{ikx}\,\Big|\,L_p(\mathbb{T})\Bigr\|^{q}\Biggr)^{1/q}<\infty$$
if $q<\infty$, and
$$\|f\,|B^s_\infty(L_p(\mathbb{T}))\|=\sup_{j=0,1,\dots}2^{sj}\Bigl\|\sum_{k=-\infty}^{\infty}\varphi_j(k)\,c_k(f)\,e^{ikx}\,\Big|\,L_p(\mathbb{T})\Bigr\|<\infty$$
if $q=\infty$.
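For a concrete trigonometric polynomial, the quasi-norm of Definition 6 can be evaluated numerically. The sketch below is an illustration only: it replaces the smooth functions $\varphi_j$ by a sharp dyadic cut-off and approximates the $L_p(\mathbb{T})$ norm (with normalized measure) on an equispaced grid, both simplifying assumptions.

```python
# Minimal sketch (sharp dyadic cut-off instead of smooth phi_j; see lead-in):
# approximate the periodic Besov quasi-norm of Definition 6.
import numpy as np

def periodic_besov_norm(c, s, p, q, J=6, M=2048):
    """c : dict {k: c_k(f)} with finitely many nonzero Fourier coefficients."""
    x = 2 * np.pi * np.arange(M) / M
    total = 0.0
    for j in range(J):
        lo, hi = (2 ** (j - 1), 2 ** j) if j > 0 else (-1, 1)
        block = sum(ck * np.exp(1j * k * x)
                    for k, ck in c.items() if lo < abs(k) <= hi)
        if np.isscalar(block):          # empty dyadic block
            continue
        lp = np.mean(np.abs(block) ** p) ** (1.0 / p)   # normalized L_p(T) norm
        total += (2.0 ** (s * j) * lp) ** q
    return total ** (1.0 / q)

c = {1: 0.5, -1: 0.5, 5: 0.25, -5: 0.25}   # f(x) = cos(x) + 0.5 cos(5x)
print(periodic_besov_norm(c, s=0.5, p=2, q=2))
```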
Remark 11. (i) These classes are quasi-Banach spaces. They do not depend on the chosen function $\varphi$ (up to equivalent quasi-norms).
(ii) There is a number of different characterizations of periodic Besov spaces, cf. e.g. [34, Chapter 3]. In particular we wish to refer to the characterization by differences [34, 3.5.4].

References
[1] C. Canuto, A. Tabacco, K. Urban, The wavelet element method, Part I: construction and analysis, Appl. Comput. Harm. Anal. 6 (1999) 1–52.
[2] O. Christensen, An Introduction to Frames and Riesz Bases, Birkhäuser, Basel, 2003.
[3] A. Cohen, Numerical Analysis of Wavelet Methods, Elsevier Science, Amsterdam, 2003.
[4] A. Cohen, W. Dahmen, R. DeVore, Multiscale decompositions on bounded domains, TAMS 352 (2000) 3651–3685.
[5] M. Costabel, Boundary integral operators on Lipschitz domains. Elementary results, SIAM 19 (1988) 613–626.
[6] S. Dahlke, M. Fornasier, T. Raasch, Adaptive frame methods for elliptic operator equations, Bericht Nr. 2004-3, Philipps-Universität Marburg, 2004, Adv. Comput. Math., to appear.
[7] S. Dahlke, M. Fornasier, T. Raasch, R. Stevenson, M. Werner, Adaptive frame methods for elliptic operator equations: the steepest descent approach, IMA J. Numer. Anal. 19 (2007) doi: 10.1093/imanum/drl03.
[8] S. Dahlke, E. Novak, W. Sickel, Optimal approximation of elliptic problems by linear and nonlinear mappings I, J. Complexity 22 (2006) 29–49.
[9] S. Dahlke, E. Novak, W. Sickel, Optimal approximation of elliptic problems by linear and nonlinear mappings II, J. Complexity 22 (2006) 549–603.
[10] W. Dahmen, R. Schneider, Wavelets with complementary boundary conditions—function spaces on the cube, Results Math. 34 (1998) 255–293.
[11] W. Dahmen, R. Schneider, Composite wavelet bases for operator equations, Math. Comp. 68 (1999) 1533–1567.
[12] W. Dahmen, R. Schneider, Wavelets on manifolds I: construction and domain decomposition, SIAM J. Math. Anal. 31 (1999) 184–230.
[13] R.A. DeVore, R. Howard, C. Micchelli, Optimal nonlinear approximation, Manuscripta Math. 63 (1989) 469–478.
[14] R.A. DeVore, G. Kyriazis, D. Leviatan, V.M. Tikhomirov, Wavelet compression and nonlinear n-widths, Adv. Comput. Math. 1 (1993) 197–214.
[15] R.A. DeVore, G. Kyriazis, P. Wang, Multiscale characterization of Besov spaces on bounded domains, J. Approx. Theory 93 (1998) 273–293.
[16] D. Dung, Continuous algorithms in n-term approximation and non-linear widths, J. Approx. Theory 102 (2000) 217–242.
[17] D. Dung, V.Q. Thanh, On nonlinear n-widths, Proc. AMS 124 (1996) 2757–2765.
[18] M. Frazier, B. Jawerth, A discrete transform and decomposition of distribution spaces, J. Functional Anal. 93 (1990) 34–170.
[19] P. Grisvard, Elliptic Problems in Nonsmooth Domains, Pitman, Boston, 1985.
[20] K. Gröchenig, Describing functions: atomic decompositions versus frames, Monatsh. Math. 112 (1991) 1–42.
[21] K. Gröchenig, Foundations of Time–Frequency Analysis, Birkhäuser, Basel, 2000.
[22] K. Gröchenig, Localization of frames, Banach frames, and the invertibility of the frame operator, J. Fourier Anal. Appl. 10 (2004) 105–132.
[23] W. Hackbusch, Elliptic Differential Equations: Theory and Numerical Treatment, Springer, Berlin, 1992.
[24] B. Kashin, Approximation properties of complete orthonormal systems, Trudy Mat. Inst. Steklov 172 (1985) 187–191.
[25] G. Kyriazis, Decomposition systems for function spaces, Studia Math. 157 (2003) 133–169.
[26] J.L. Lions, E. Magenes, Non-Homogeneous Boundary Value Problems and Applications I, Springer, Berlin, 1972.
[27] P. Mathé, s-Numbers in information-based complexity, J. Complexity 6 (1990) 41–66.
[28] Y. Meyer, Wavelets and Operators, Cambridge University Press, Cambridge, 1992.
[29] S.M. Nikol'skij, Approximation of Functions of Several Variables and Imbedding Theorems, Springer, Berlin, 1975.
[30] J. Peetre, New Thoughts on Besov Spaces, Duke University Mathematics Series, Durham, 1976.
[31] A. Pietsch, s-numbers of operators in Banach spaces, Studia Math. 51 (1974) 201–223.
[32] T. Runst, W. Sickel, Sobolev Spaces of Fractional Order, Nemytskij Operators and Nonlinear Partial Differential Equations, de Gruyter, Berlin, 1996.
[33] V.S. Rychkov, On restrictions and extensions of the Besov and Triebel–Lizorkin spaces with respect to Lipschitz domains, J. London Math. Soc. 60 (1999) 237–257.
[34] H.-J. Schmeisser, H. Triebel, Topics in Fourier Analysis and Function Spaces, Geest & Portig, Wiley, Chichester, 1987.
[35] E.M. Stein, Singular Integrals and Differentiability Properties of Functions, Princeton University Press, Princeton, 1970.
[36] M.I. Stesin, Aleksandrov diameters of finite-dimensional sets and of classes of smooth functions, Dokl. Akad. Nauk SSSR 220 (1974) 1278–1281.
[37] R. Stevenson, Adaptive solution of operator equations using wavelet frames, SIAM J. Numer. Anal. 41 (2003) 1074–1100.
[38] V.N. Temlyakov, Greedy algorithms with regard to multivariate systems with special structure, Constr. Approx. 16 (2000) 399–425.
[39] V.N. Temlyakov, Universal bases and greedy algorithms for anisotropic function classes, Constr. Approx. 18 (2002) 529–550.
[40] V.N. Temlyakov, Nonlinear methods of approximation, Found. Comput. Math. 3 (2003) 33–107.
[41] H. Triebel, Periodic spaces of Besov–Hardy–Sobolev type and related maximal inequalities for trigonometric polynomials, in: Functions, Series, Operators, Budapest, 1980, Colloquia Mathematica Societatis János Bolyai, vol. 35 II, North-Holland, Amsterdam, New York, 1983, pp. 1201–1209.
[42] H. Triebel, Theory of Function Spaces, Birkhäuser, Basel, 1983.
[43] H. Triebel, Theory of Function Spaces II, Birkhäuser, Basel, 1992.
[44] H. Triebel, The Structure of Functions, Birkhäuser, Basel, 2001.
[45] H. Triebel, Function spaces in Lipschitz domains and on Lipschitz manifolds. Characteristic functions as pointwise multipliers, Rev. Mat. Complutense 15 (2002) 475–524.
[46] H. Triebel, Theory of Function Spaces III, Birkhäuser, Basel, 2006.
[47] H. Triebel, Wavelets on domains, the extension problem, Report, Jena, 2006.
Journal of Complexity 23 (2007) 649–652
Note
A note on the existence of sequences with small star discrepancy

Josef Dick

UNSW Asia, 1 Kay Siang Road, Singapore 248922, Singapore

Received 20 October 2006; accepted 2 January 2007
Available online 1 February 2007

Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

It was shown by Heinrich et al. [The inverse of the star-discrepancy depends linearly on the dimension, Acta Arith. 96 (2001) 279–302] that there exist point sets for which the inverse of the star discrepancy depends linearly on the dimension. In this paper we extend those results by showing that there exist point sets extensible in the modulus and the dimension for which the star discrepancy satisfies a tractability bound for all dimensions and moduli.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Star discrepancy; Tractability; Sequence
1. Introduction

The classical Koksma–Hlawka inequality [6] (see also [7]) states that the error, when approximating an integral $\int_{[0,1]^s}f(x)\,dx$ by a quasi-Monte Carlo rule $n^{-1}\sum_{k=1}^{n}f(x_k)$, where $x_1,\dots,x_n\in[0,1]^s$ are the quadrature points, is bounded by
$$\Bigl|\int_{[0,1]^s}f(x)\,dx-\frac{1}{n}\sum_{k=1}^{n}f(x_k)\Bigr|\le V(f)\,D^{*}_{n,s}(P_{n,s}),$$
where $V(f)$ denotes the variation of the function $f$ in the sense of Hardy and Krause and $D^{*}_{n,s}(P_{n,s})$ denotes the star discrepancy of the quadrature points $P_{n,s}=\{x_1,\dots,x_n\}\subset[0,1]^s$.
The star discrepancy is a measure of how uniformly the quadrature points are distributed. It is defined by
$$D^{*}_{n,s}(P_{n,s})=\sup_{t\in[0,1]^s}\Bigl|\frac{1}{n}\sum_{k=1}^{n}1_{[0,t)}(x_k)-t_1\cdots t_s\Bigr|,$$
where $t=(t_1,\dots,t_s)$, $[0,t)=\prod_{j=1}^{s}[0,t_j)$, and $1_{[0,t)}(x_k)$ is 1 if $x_k\in[0,t)$ and 0 otherwise.

The Koksma–Hlawka inequality implies that well-distributed point sets in the unit cube, i.e. point sets with small star discrepancy, can be used to approximate integrals over the unit cube. Many constructions of such point sets have been introduced and analysed [7,9] and are used successfully in applications. The convergence of the star discrepancy of such point sets is typically of order $n^{-1}(\log n)^s$. As the factor $(\log n)^s$ becomes significant for large $s$, and many applications require very high dimensions $s$, the question arose whether point sets with small discrepancy for large dimensions exist. Hence, in a new development, researchers started to investigate whether point sets exist for which the dependence on the dimension is much weaker, and how such point sets can be constructed [1,3,5]; see also [2] for some open problems. An affirmative answer was given in [3], where it was shown that for each $n,s\in\mathbb{N}$ ($\mathbb{N}$ the set of natural numbers) there exist point sets for which
$$D^{*}_{n,s}(P)\le C\sqrt{s/n}\tag{1}$$
for some constant $C>0$. Note that the result here states that the point set achieving such a bound might differ for different choices of $n$ and $s$.

On the other hand, it is desirable to have not just finite point sets with small star discrepancy for one fixed dimension, but sequences with small star discrepancy for all (or at least a range of) dimensions. Such sequences are especially useful in applications when one encounters the situation of wanting to increase the accuracy and/or the dimension of the approximation of the integral without discarding the computation already undertaken. Such point sets have previously been considered, for example, in [4,8]. In this paper we show that there exists a sequence of points in infinite dimension for which the projection of the first $n$ points to the first $s$ coordinates achieves an upper bound similar to (1) for all $n$ and $s$. The results are presented in the following section.

2. Results

The following notation will be used: for a given sequence $P=(x_k)_{k\ge1}$ with $x_k\in[0,1]^\infty$, let $P_{n,s}$ denote the point set consisting of the projection of the first $n$ points $x_1,\dots,x_n$ to the first $s$ coordinates, and let $D^{*}_{n,s}(P)=D^{*}_{n,s}(P_{n,s})$. Further, $P_{n,s}$ will always denote a point set of cardinality $n$ in $[0,1]^s$ and $P_n$ will denote a point set of cardinality $n$ in $[0,1]^\infty$. Further, let $n=(n_m)_{m\ge1}$ be a strictly increasing sequence of natural numbers, i.e. $1\le n_1<n_2<\cdots$. Throughout the paper constants are always finite positive real numbers.

The following theorem now establishes the existence of sequences $P$ for which $D^{*}_{n,s}(P)$ satisfies a bound which depends only polynomially on the dimension.

Theorem 1. There is a constant $C$ such that for every strictly increasing sequence of natural numbers $n=(n_m)_{m\ge1}$ there exists a sequence $P\subset[0,1]^\infty$ such that
$$D^{*}_{n_m,s}(P)\le C\sqrt{s\log(m+1)/n_m}$$
for all $m,s\in\mathbb{N}$.
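For small $n$ and $s$ the star discrepancy can be checked by brute force. The following sketch (not from the paper) approximates the supremum by restricting the box corner $t$ to the grid generated by the point coordinates and 1, which yields a lower bound on $D^{*}_{n,s}$.

```python
# Minimal sketch (lower-bound approximation, see lead-in): brute-force
# star discrepancy of a small point set in [0,1]^s.
import itertools
import numpy as np

def star_discrepancy_lb(points):
    points = np.asarray(points)          # shape (n, s)
    n, s = points.shape
    # candidate values for each coordinate of the box corner t
    grids = [np.unique(np.concatenate([points[:, j], [1.0]])) for j in range(s)]
    best = 0.0
    for t in itertools.product(*grids):
        t = np.array(t)
        volume = t.prod()
        count = np.sum(np.all(points < t, axis=1))   # points in [0, t)
        best = max(best, abs(count / n - volume))
    return best

rng = np.random.default_rng(0)
pts = rng.random((32, 2))                # 32 i.i.d. points in [0,1]^2
print(star_discrepancy_lb(pts))
```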
Choose for example $n_m=m$. Then, with $n=m$, the upper bound in the theorem states that the star discrepancy of $P_{n,s}$ is bounded by $C\sqrt{s\log(n+1)/n}$ for all $n,s\ge1$. Or, for example, by choosing $n_m=b^m$ (with $b\ge2$ an integer) we obtain that the star discrepancy is bounded by $C\sqrt{s\log(\log_b(n_m)+1)/n_m}$ for all $m,s\ge1$. The bound can obviously be improved further by considering even bigger steps in the moduli.

It can be seen from the proof below that the statement in the theorem above holds in a more general form. Indeed, not only the existence of an appropriate sequence is shown, but that a certain probability measure on the set of sequences satisfying the desired properties can be made arbitrarily close to 1. Hence we can also show the corollary below.

Corollary 1. Given any strictly increasing sequence of natural numbers $n=(n_m)_{m\ge1}$ (i.e. $1\le n_1<n_2<\cdots$), the following holds for a sequence $P\subset[0,1]^\infty$ with probability 1: there is a constant $C_{P,n}$ such that
$$D^{*}_{n_m,s}(P)\le C_{P,n}\sqrt{s\log(m+1)/n_m}$$
for all $m,s\in\mathbb{N}$.

Note that the constant in the corollary above may depend on the sequences $P$ and $n$, hence we used the notation $C_{P,n}$. From the particulars of the proof of the theorem we can also obtain the following result.

Corollary 2. There is a constant $C$ such that for each $n$ there exists a point set $P_n\subset[0,1]^\infty$, consisting of $n$ points, such that for all $s\in\mathbb{N}$ the projection onto the first $s$ coordinates $P_{n,s}$ satisfies
$$D^{*}_{n,s}(P_{n,s})\le C\sqrt{s/n}.$$

Again, the probability measure of point sets satisfying the bound in the above corollary can be made arbitrarily close to 1 by choosing the constant $C$ large enough. By applying the proof technique of the next section to [3, Theorem 1] instead of [3, Theorem 3] one obtains a slightly weaker bound, but with an explicit constant. We obtain the following corollary.

Corollary 3. For all strictly increasing sequences of natural numbers $n=(n_m)_{m\ge1}$ there exists an infinite dimensional sequence $P$ such that
$$D^{*}_{n_m,s}(P)\le\sqrt{\frac{8}{n_m}\Bigl((1+m+s)\log 2+s\log\Bigl(1+\sqrt{\frac{n_m}{(1+m+2s)\log 2}}\Bigr)\Bigr)}$$
for all $m,s\in\mathbb{N}$.

In the following section we provide the proofs of the above results.

3. Proofs

In the proof of [3, Theorem 3] it was shown that, for a given number of points $n$ and dimension $s$, the probability that an i.i.d. randomly chosen point set $P_{n,s}$ has star discrepancy at most $\lambda\sqrt{s/n}$ is at least $1-K\lambda^2e^{-2\lambda^2s}$, for some constant $K$ and for all $\lambda\ge\max(1,K,\lambda_0)$, where $\lambda_0$ is such that $K\lambda^2\le e^{2\lambda^2}$ for all $\lambda\ge\lambda_0$.
Let now $1\le n_1<n_2<\cdots$ be an arbitrary sequence of integers and let $P$ be a sequence $x_1,x_2,\ldots\in[0,1]^\infty$ where all $x_k$ are i.i.d. in $[0,1]^\infty$. Then the probability that an i.i.d. randomly chosen sequence $P$ is such that, for arbitrary but fixed $m$ and $s$ in $\mathbb{N}$, the star discrepancy satisfies $D^{*}_{n_m,s}(P)\le\lambda_m\sqrt{s/n_m}$, is at least $1-K\lambda_m^2e^{-2\lambda_m^2s}$ for any $\lambda_m\ge\max(1,K,\lambda_0)$. Let now $\lambda_0$ be such that $2K\lambda^2\le e^{\lambda^2}$ for all $\lambda\ge\lambda_0$. We now choose a sequence of real numbers $\lambda_m=c\sqrt{\log(m+1)}$ for $m\ge1$, where $c$ is chosen such that $c\ge\max(1,K,\lambda_0)/\sqrt{\log2}$, i.e. $\lambda_m\ge\max(1,K,\lambda_0)$. Then the probability that an i.i.d. randomly chosen sequence $P$ is such that there is an $m$ and an $s$ in $\mathbb{N}$ with $D^{*}_{n_m,s}(P)>\lambda_m\sqrt{s/n_m}$ is bounded above by
$$\sum_{m,s=1}^{\infty}K\lambda_m^2e^{-2\lambda_m^2s}\le2K\sum_{m=1}^{\infty}\lambda_m^2e^{-2\lambda_m^2}=2Kc^2\sum_{m=1}^{\infty}(m+1)^{-2c^2}\log(m+1).\tag{2}$$
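For the reader's convenience, here is the worked check behind the two steps in (2), using $\lambda_m=c\sqrt{\log(m+1)}$. First, summing the geometric series in $s$,
$$\sum_{s=1}^{\infty}e^{-2\lambda_m^2s}=\frac{e^{-2\lambda_m^2}}{1-e^{-2\lambda_m^2}}\le2e^{-2\lambda_m^2}\qquad\bigl(\text{valid once }e^{-2\lambda_m^2}\le\tfrac12,\text{ which holds for }\lambda_m\ge\lambda_0\bigr),$$
and second, substituting $\lambda_m$,
$$\lambda_m^2e^{-2\lambda_m^2}=c^2\log(m+1)\,e^{-2c^2\log(m+1)}=c^2(m+1)^{-2c^2}\log(m+1),$$
so the last series in (2) converges as soon as $2c^2>1$.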
By choosing the constant $c$ large enough, the last expression can be made arbitrarily small. Hence the probability that an i.i.d. randomly chosen sequence $P$ satisfies $D^{*}_{n_m,s}(P)\le\lambda_m\sqrt{s/n_m}$ for all $m,s\in\mathbb{N}$ can be made greater than 0, and hence there exists a sequence $P$ such that $D^{*}_{n_m,s}(P)\le\lambda_m\sqrt{s/n_m}$ for all $m,s\in\mathbb{N}$. Thus the theorem follows.

We now prove the first corollary. Above we already showed that the probability that an i.i.d. randomly chosen sequence $P$ is such that there is an $m$ and an $s$ in $\mathbb{N}$ with $D^{*}_{n_m,s}(P)>\lambda_m\sqrt{s/n_m}$ is bounded above by (2). Now, by increasing $c$, this probability can be made arbitrarily small, and thus the probability that a constant $C_{P,n}$ as in the corollary exists is 1.

Proofs of Corollaries 2 and 3 can be obtained using the same arguments as in the proof of Theorem 1.

Acknowledgments

The support of the Australian Research Council under its Centre of Excellence Program is gratefully acknowledged. The author would also like to thank the referee who suggested Corollary 1 and an improvement of Theorem 1.

References
[1] B. Doerr, M. Gnewuch, A. Srivastav, Bounds and constructions for the star-discrepancy via δ-covers, J. Complexity 21 (2005) 691–709.
[2] S. Heinrich, Some open problems concerning the star-discrepancy. Numerical integration and its complexity (Oberwolfach, 2001), J. Complexity 19 (2003) 416–419.
[3] S. Heinrich, E. Novak, G.W. Wasilkowski, H. Woźniakowski, The inverse of the star-discrepancy depends linearly on the dimension, Acta Arith. 96 (2001) 279–302.
[4] F.J. Hickernell, H. Niederreiter, The existence of good extensible rank-1 lattices. Numerical integration and its complexity (Oberwolfach, 2001), J. Complexity 19 (2003) 286–300.
[5] A. Hinrichs, Covering numbers, Vapnik–Červonenkis classes and bounds for the star-discrepancy, J. Complexity 20 (2004) 477–483.
[6] E. Hlawka, Funktionen von beschränkter Variation in der Theorie der Gleichverteilung, Ann. Mat. Pura Appl. 54 (1961) 325–333 (in German).
[7] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 63, SIAM, Philadelphia, PA, 1992.
[8] H. Niederreiter, The existence of good extensible polynomial lattice rules, Monatsh. Math. 139 (2003) 295–307.
[9] H. Niederreiter, Constructions of (t, m, s)-nets and (t, s)-sequences, Finite Fields Appl. 11 (2005) 578–600.
Journal of Complexity 23 (2007) 653–661
Optimal recovery of solutions of the generalized heat equation in the unit ball from inaccurate data

K.Yu. Osipenko, E.V. Wedenskaya

“MATI”—Russian State Technological University, Russia

Received 29 October 2006; accepted 8 March 2007
Available online 27 March 2007

Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

We consider the problem of optimal recovery of solutions of the generalized heat equation in the unit ball. Information is given at two time instances, but it is inaccurate. The solution is to be constructed at some intermediate time. We provide the optimal error and present an algorithm which achieves this error level.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Optimal recovery; Heat equation; Inaccurate information
The application of optimal recovery theory to problems of partial differential equations was started by Traub and Woźniakowski in [12]. In particular, this monograph considered optimal recovery of solutions of the heat equation from finitely many Fourier coefficients of the initial function. Several recovery problems for partial differential equations from noisy information were recently studied in [2,5,7,9,13,14]. The results in these papers were based on a general method for optimal recovery of linear operators developed in [3,4] (see also [8]). This method extended previous research from [6]. Various problems of optimal recovery from noisy information may be found in [10] (see also [15], where the complexity of differential and integral equations is discussed).

The research was carried out with the financial support of the Russian Foundation for Basic Research (Grant nos. 05-01-00275, 06-01-81004, 05-01-00261, and 06-01-00530) and the President Grant for State Support of Leading Scientific Schools in the Russian Federation (Grant no. NSH-5813.2006.1).
Here we consider the optimal recovery problem for solutions of the generalized heat equation in the unit $d$-ball at the time $\tau$ from inaccurate solutions at the times $t_1$ and $t_2$. Set
$$B^d=\Bigl\{\,x=(x_1,\dots,x_d):\ |x|^2=\sum_{j=1}^{d}x_j^2<1\,\Bigr\},\qquad S^{d-1}=\{\,x\in\mathbb{R}^d:\ |x|=1\,\}.$$
Consider the problem of finding the solution of the generalized heat equation in $L_2(B^d)$:
$$u_t+(-\Delta)^{\alpha/2}u=0,\quad \alpha>0,\qquad u|_{t=0}=f(x),\qquad u|_{x\in S^{d-1}}=0.\tag{1}$$
Let $0\le t_1<t_2$. Suppose we know approximate solutions $y_1$ and $y_2$ of (1) at times $t_1$ and $t_2$, given with errors $\delta_1$ and $\delta_2$ in the $L_2(B^d)$ norm. We want to recover in the best way the solution of (1) at the time $\tau$, $t_1<\tau<t_2$. We assume that $y_1,y_2\in L_2(B^d)$ satisfy
$$\|u(\cdot,t_j)-y_j(\cdot)\|_{L_2(B^d)}\le\delta_j,\qquad j=1,2.$$
Any map $\varphi\colon L_2(B^d)\times L_2(B^d)\to L_2(B^d)$ is admitted as a recovery method. The quantity
$$e(\tau,L_2(B^d),\delta_1,\delta_2,\varphi)=\sup_{\substack{f,y_1,y_2\in L_2(B^d)\\ \|u(\cdot,t_j)-y_j(\cdot)\|_{L_2(B^d)}\le\delta_j,\ j=1,2}}\|u(\cdot,\tau)-\varphi(y_1,y_2)(\cdot)\|_{L_2(B^d)},$$
where $u$ is the solution of (1), is called the error of the method $\varphi$. The quantity
$$E(\tau,L_2(B^d),\delta_1,\delta_2)=\inf_{\varphi\colon L_2(B^d)\times L_2(B^d)\to L_2(B^d)}e(\tau,L_2(B^d),\delta_1,\delta_2,\varphi)$$
is called the error of optimal recovery, and a method delivering the lower bound is called an optimal recovery method.

Note that the initial functions $f$ belong to the whole space $L_2(B^d)$. In other words, the a priori information about initial functions is not a compact set. Therefore we use information with infinite cardinality ([12] dealt with algorithms using information of finite cardinality). For example, it can be shown that knowing (even precisely) any finite number of Fourier coefficients of $u(\cdot,t_j)$, $j=1,2$, does not lead to a finite error of optimal recovery.

The analysis of the problem is different for $d=1$ and $d>1$, because of different types of orthogonal eigensystems. We begin with the case $d>1$. Let $\mathcal{H}_k$ denote the set of spherical harmonics of order $k$. It is known (see [11]) that $\dim\mathcal{H}_0=a_0=1$,
$$\dim\mathcal{H}_k=a_k=(d+2k-2)\,\frac{(d+k-3)!}{(d-2)!\,k!},\qquad k=1,2,\dots,$$
and
$$L_2(S^{d-1})=\bigoplus_{k=0}^{\infty}\mathcal{H}_k.$$
Let $\{Y_j^{(k)}\}_{j=1}^{a_k}$ denote an orthonormal basis in $\mathcal{H}_k$. Let $J_p$ be the Bessel function of the first kind of order $p$, and let $\gamma_s^{(p)}$, $s=1,2,\dots$, be the zeros of $J_p$. The functions
$$Z_{skj}(x)=\frac{J_p(\gamma_s^{(p)}r)}{r^{d/2-1}}\,Y_j^{(k)}(x'),$$
where $r=|x|$, $x'=x/r$, and $p=k+(d-2)/2$, form an orthogonal basis in $L_2(B^d)$. Moreover,
$$\Delta Z_{skj}=-(\gamma_s^{(p)})^2Z_{skj}.$$
We will use the orthonormal basis in $L_2(B^d)$,
$$Y_{skj}=\frac{Z_{skj}}{\|Z_{skj}\|_{L_2(B^d)}}.$$
We recall that the operator $(-\Delta)^{\alpha/2}$ is defined as follows:
$$(-\Delta)^{\alpha/2}f=\sum_{s=1}^{\infty}\sum_{k=0}^{\infty}\sum_{j=1}^{a_k}(\gamma_s^{(p)})^{\alpha}c_{skj}Y_{skj},$$
where
$$f=\sum_{s=1}^{\infty}\sum_{k=0}^{\infty}\sum_{j=1}^{a_k}c_{skj}Y_{skj}.\tag{2}$$
The solution of (1) can easily be found by the Fourier method of separation of variables. It has the form
$$u(x,t)=\sum_{s=1}^{\infty}\sum_{k=0}^{\infty}\sum_{j=1}^{a_k}e^{-(\gamma_s^{(p)})^{\alpha}t}c_{skj}Y_{skj}(x),$$
where $c_{skj}$ are the Fourier coefficients of the initial function. Set
$$a_{sk}=e^{-2(\gamma_s^{(p)})^{\alpha}}$$
(we recall that $p=k+(d-2)/2$ and $\alpha$ is from (1)). It is known (see [1]) that for all $s\in\mathbb{N}$,
$$\gamma_s^{(p)}<\gamma_s^{(p+1)}<\gamma_{s+1}^{(p)}$$
and $\gamma_s^{(p)}\to\infty$ as $s\to\infty$. So the set of zeros of the Bessel functions $\gamma_s^{(p)}$, $s=1,2,\dots$, $p=k+(d-2)/2$, $k=0,1,\dots$, can be arranged in ascending order:
$$\gamma_{s_1}^{(p_1)}<\gamma_{s_2}^{(p_2)}<\cdots<\gamma_{s_n}^{(p_n)}<\cdots.$$
Consequently,
$$a_{s_1k_1}>a_{s_2k_2}>\cdots>a_{s_nk_n}>\cdots.$$
For the case $d=1$ the functions
$$Y_s(x)=\sin\frac{\pi s}{2}(x+1),\qquad s=1,2,\dots,$$
form an orthonormal basis in $L_2(B^1)=L_2([-1,1])$ and
$$Y_s''=-\Bigl(\frac{\pi s}{2}\Bigr)^2Y_s.$$
We define the operator $(-\Delta)^{\alpha/2}$ as follows:
$$(-\Delta)^{\alpha/2}f=\sum_{s=1}^{\infty}\Bigl(\frac{\pi s}{2}\Bigr)^{\alpha}c_sY_s,$$
where $c_s$ are the Fourier coefficients of $f$. It is easily verified that for $d=1$ the solution of (1) is given by
$$u(x,t)=\sum_{s=1}^{\infty}e^{-(\pi s/2)^{\alpha}t}c_sY_s(x),$$
where $c_s$ are the Fourier coefficients of the initial function.

For an arbitrary decreasing sequence $\nu_1>\nu_2>\cdots>0$ we introduce the following notation:
$$\Delta_m=\bigl[\nu_{m+1}^{t_2-t_1},\,\nu_m^{t_2-t_1}\bigr),\quad m\ge1,\qquad \Delta_0=\bigl[\nu_1^{t_2-t_1},+\infty\bigr),$$
$$\hat{\lambda}_1=\begin{cases}\dfrac{\nu_{m+1}^{\tau-t_2}-\nu_m^{\tau-t_2}}{\nu_{m+1}^{t_1-t_2}-\nu_m^{t_1-t_2}}, & \dfrac{\delta_2^2}{\delta_1^2}\in\Delta_m,\ m\ge1,\\[3mm] \nu_1^{\tau-t_1}, & \dfrac{\delta_2^2}{\delta_1^2}\in\Delta_0,\end{cases}
\qquad
\hat{\lambda}_2=\begin{cases}\dfrac{\nu_m^{\tau-t_1}-\nu_{m+1}^{\tau-t_1}}{\nu_m^{t_2-t_1}-\nu_{m+1}^{t_2-t_1}}, & \dfrac{\delta_2^2}{\delta_1^2}\in\Delta_m,\ m\ge1,\\[3mm] 0, & \dfrac{\delta_2^2}{\delta_1^2}\in\Delta_0.\end{cases}$$

Theorem 1. Set
$$\nu_m=\begin{cases}a_{s_m,k_m}, & d>1,\\ e^{-2(\pi m/2)^{\alpha}}, & d=1.\end{cases}$$
Then for all $\delta_1,\delta_2>0$ the equality
$$E(\tau,L_2(B^d),\delta_1,\delta_2)=\sqrt{\hat{\lambda}_1\delta_1^2+\hat{\lambda}_2\delta_2^2}$$
holds. Moreover, the method
$$\hat{\varphi}(y_1,y_2)(x)=\begin{cases}\displaystyle\sum_{s=1}^{\infty}\sum_{k=0}^{\infty}a_{sk}^{\tau/2}\sum_{j=1}^{a_k}\frac{\hat{\lambda}_1a_{sk}^{t_1/2}y_{1skj}+\hat{\lambda}_2a_{sk}^{t_2/2}y_{2skj}}{\hat{\lambda}_1a_{sk}^{t_1}+\hat{\lambda}_2a_{sk}^{t_2}}\,Y_{skj}(x), & d>1,\\[5mm] \displaystyle\sum_{s=1}^{\infty}e^{-(\pi s/2)^{\alpha}\tau}\,\frac{\hat{\lambda}_1e^{-(\pi s/2)^{\alpha}t_1}y_{1s}+\hat{\lambda}_2e^{-(\pi s/2)^{\alpha}t_2}y_{2s}}{\hat{\lambda}_1e^{-2(\pi s/2)^{\alpha}t_1}+\hat{\lambda}_2e^{-2(\pi s/2)^{\alpha}t_2}}\,Y_s(x), & d=1,\end{cases}\tag{3}$$
where $y_{1skj},y_{2skj}$ and $y_{1s},y_{2s}$ are the Fourier coefficients of $y_1(\cdot)$ and $y_2(\cdot)$, is optimal.
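To make the case $d=1$ of method (3) concrete, here is a minimal numerical sketch (not from the paper); the helper names `recover_d1` and `weights` are ours, and the weight formulas implement $\hat{\lambda}_1,\hat{\lambda}_2$ for $\delta_2^2/\delta_1^2\in\Delta_m$ with a fixed $m$.

```python
# Minimal sketch (our naming, see lead-in) of method (3) for d = 1:
# recover u(., tau) from noisy Fourier data of u(., t1) and u(., t2).
import numpy as np

def recover_d1(y1, y2, t1, t2, tau, alpha, lam1, lam2):
    """Fourier coefficients (w.r.t. Y_s) of the recovered solution at time tau."""
    s = np.arange(1, len(y1) + 1)
    mu = (np.pi * s / 2.0) ** alpha           # eigenvalues (pi s / 2)^alpha
    num = lam1 * np.exp(-mu * t1) * y1 + lam2 * np.exp(-mu * t2) * y2
    den = lam1 * np.exp(-2 * mu * t1) + lam2 * np.exp(-2 * mu * t2)
    return np.exp(-mu * tau) * num / den

def weights(m, t1, t2, tau, alpha):
    """lambda-hat weights for delta2^2/delta1^2 in Delta_m, m >= 1,
    with nu_j = exp(-2 (pi j / 2)^alpha) as in Theorem 1 (d = 1)."""
    nu = lambda j: np.exp(-2.0 * (np.pi * j / 2.0) ** alpha)
    lam1 = (nu(m + 1) ** (tau - t2) - nu(m) ** (tau - t2)) / \
           (nu(m + 1) ** (t1 - t2) - nu(m) ** (t1 - t2))
    lam2 = (nu(m) ** (tau - t1) - nu(m + 1) ** (tau - t1)) / \
           (nu(m) ** (t2 - t1) - nu(m + 1) ** (t2 - t1))
    return lam1, lam2
```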
To prove Theorem 1 we use a general scheme of construction of optimal recovery methods for linear operators developed in [3,4] (see also [8]). Consider the following extremal problem:
$$\|u(\cdot,\tau)\|^2_{L_2(B^d)}\to\max,\qquad \|u(\cdot,t_j)\|^2_{L_2(B^d)}\le\delta_j^2,\quad j=1,2,\qquad f\in L_2(B^d),\tag{4}$$
where $u$ is the solution of problem (1). Set
$$L(f,\lambda_1,\lambda_2)=-\|u(\cdot,\tau)\|^2_{L_2(B^d)}+\lambda_1\|u(\cdot,t_1)\|^2_{L_2(B^d)}+\lambda_2\|u(\cdot,t_2)\|^2_{L_2(B^d)}.$$
From [4] (see also [8]) follows:

Theorem 2. Suppose that there exist $\hat{\lambda}_1\ge0$, $\hat{\lambda}_2\ge0$ and an admissible function $\hat{f}$ in (4) such that
(a) $\displaystyle\min_{f\in L_2(B^d)}L(f,\hat{\lambda}_1,\hat{\lambda}_2)=L(\hat{f},\hat{\lambda}_1,\hat{\lambda}_2)$,
(b) $\hat{\lambda}_1\bigl(\|\hat{u}(\cdot,t_1)\|^2_{L_2(B^d)}-\delta_1^2\bigr)+\hat{\lambda}_2\bigl(\|\hat{u}(\cdot,t_2)\|^2_{L_2(B^d)}-\delta_2^2\bigr)=0$,
where $\hat{u}$ is the solution of (1) with the initial function $\hat{f}$. If for all $y_1,y_2\in L_2(B^d)$ there exists a solution $f_0$ of the problem
$$\hat{\lambda}_1\|u(\cdot,t_1)-y_1(\cdot)\|^2_{L_2(B^d)}+\hat{\lambda}_2\|u(\cdot,t_2)-y_2(\cdot)\|^2_{L_2(B^d)}\to\min,\qquad f\in L_2(B^d),$$
where $u$ is the solution of (1), then the method $\hat{\varphi}(y_1,y_2)(x)=u_0(x,\tau)$, where $u_0$ is the solution of (1) with the initial function $f_0$, is optimal, and for the error of optimal recovery the equality
$$E(\tau,L_2(B^d),\delta_1,\delta_2)=\sqrt{\hat{\lambda}_1\delta_1^2+\hat{\lambda}_2\delta_2^2}$$
holds.
Proof of Theorem 1. Consider the case $d>1$. We have
$$L(f,\lambda_1,\lambda_2)=\sum_{s=1}^{\infty}\sum_{k=0}^{\infty}\bigl(-a_{sk}^{\tau}+\lambda_1a_{sk}^{t_1}+\lambda_2a_{sk}^{t_2}\bigr)\sum_{j=1}^{a_k}c_{skj}^2,$$
where $c_{skj}$ are the Fourier coefficients of $f$. Putting
$$b_{sk}=\sum_{j=1}^{a_k}c_{skj}^2,$$
we rewrite $L(f,\lambda_1,\lambda_2)$ in the form
$$L(f,\lambda_1,\lambda_2)=\sum_{s=1}^{\infty}\sum_{k=0}^{\infty}a_{sk}^{\tau}\bigl(-1+\lambda_1a_{sk}^{t_1-\tau}+\lambda_2a_{sk}^{t_2-\tau}\bigr)b_{sk}.$$
Assume that $\delta_2^2/\delta_1^2\in\Delta_m$, $m\ge1$. It is easily seen that in this case for $\hat{\lambda}_1$ and $\hat{\lambda}_2$ the equalities
$$\hat{\lambda}_1\nu_m^{t_1}+\hat{\lambda}_2\nu_m^{t_2}=\nu_m^{\tau},\qquad \hat{\lambda}_1\nu_{m+1}^{t_1}+\hat{\lambda}_2\nu_{m+1}^{t_2}=\nu_{m+1}^{\tau}\tag{5}$$
hold. Consider the function $g(z)=-1+\hat{\lambda}_1e^{-2z(t_1-\tau)}+\hat{\lambda}_2e^{-2z(t_2-\tau)}$. It is easy to verify that $g$ is a convex function. It follows from (5) that $g$ has two zeros $z_m=\bigl(\gamma_{s_m}^{(p_m)}\bigr)^{\alpha}$ and $z_{m+1}=\bigl(\gamma_{s_{m+1}}^{(p_{m+1})}\bigr)^{\alpha}$. In view of the convexity of $g$, for all $z\le z_m$ and all $z\ge z_{m+1}$ the inequality $g(z)\ge0$ holds. Thus for all $f\in L_2(B^d)$ we have $L(f,\hat{\lambda}_1,\hat{\lambda}_2)\ge0$. Define $\hat{b}_{s_m,k_m}$ and $\hat{b}_{s_{m+1},k_{m+1}}$ from the conditions
$$\hat{b}_{s_m,k_m}\nu_m^{t_j}+\hat{b}_{s_{m+1},k_{m+1}}\nu_{m+1}^{t_j}=\delta_j^2,\qquad j=1,2.$$
It is easy to verify that
$$\hat{b}_{s_m,k_m}=\frac{\delta_1^2}{\nu_m^{t_1}}\cdot\frac{\delta_2^2/\delta_1^2-\nu_{m+1}^{t_2-t_1}}{\nu_m^{t_2-t_1}-\nu_{m+1}^{t_2-t_1}},\qquad \hat{b}_{s_{m+1},k_{m+1}}=\frac{\delta_1^2}{\nu_{m+1}^{t_1}}\cdot\frac{\nu_m^{t_2-t_1}-\delta_2^2/\delta_1^2}{\nu_m^{t_2-t_1}-\nu_{m+1}^{t_2-t_1}}.$$
For $j\neq m,m+1$ we set $\hat{b}_{s_j,k_j}=0$. Then the function
$$\hat{f}(x)=\sum_{j=m}^{m+1}\sqrt{\hat{b}_{s_jk_j}}\,Y_{s_jk_j1}(x)$$
659
will be admissible and 2 ) = 0. L(f , 1 , Thus conditions (a) and (b) of Theorem 2 hold. Now we assume that 22 /21 ∈ 0 . It means that 22 21 t12 −t1 . Putting −t /2 f (x) = 1 1 1 Ys1 k1 1 (x),
for the solution u of (1) with the initial function f we have u(·, t1 )2L (Bd ) = 21 , 2 u(·, t2 )2L (Bd ) = 21 t12 −t1 22 . 2 Consequently, condition (b) of Theorem 2 holds. Condition (a) of the same theorem holds since for all functions f ∈ L2 (Bd ), 2 )0, L(f, 1 , and moreover 2 ) = 0. L(f , 1 , Now let us construct an optimal recovery method. According to Theorem 2 we have to solve the problem 1
ak ∞ ∞
2
t /2
(ask1 cskj − y1skj )
s=1 k=0 j =1
+ 2
ak ∞ ∞
t /2
ask2 cskj − y2skj
2
→ min,
f ∈ L2 (Bd ),
s=1 k=0 j =1
where cskj are the Fourier coefficients of f (see (2)). It can be easily verified that the solution of this problem has the form
cskj =
t /2 t /2 1 ask1 y1skj + 2 ask2 y2skj . t1 t2 1 ask + 2 ask
The optimality of method (3) now follows from Theorem 2.
The case $d=1$ may be considered in a similar way. $\square$

We give the table (see [1]) of the first 10 ordered numbers $\gamma_{s_j}^{(p_j)}$ for even $d$ (that is, for $p\in\mathbb{Z}_+$) and for odd $d$ (when $p=k+\frac{1}{2}$, $k=0,1,\dots$).

             even d (p ∈ Z_+)                  odd d (p = k + 1/2)
  j     s_j   p_j   γ_{s_j}^{(p_j)}       s_j    p_j    γ_{s_j}^{(p_j)}
  1      1     0       2.4048              1     1/2       3.1416
  2      1     1       3.8317              1     3/2       4.4934
  3      1     2       5.1356              1     5/2       5.7635
  4      2     0       5.5200              2     1/2       6.2832
  5      1     3       6.3802              1     7/2       6.9879
  6      2     1       7.0156              2     3/2       7.7253
  7      1     4       7.5883              1     9/2       8.1826
  8      2     2       8.4172              2     5/2       9.0950
  9      3     0       8.6537              1    11/2       9.3558
 10      1     5       8.7715              3     1/2       9.4248
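The "even $d$" column of the table can be reproduced with SciPy; the following sketch (not from the paper) merges and sorts the zeros $\gamma_s^{(p)}$ of the integer-order Bessel functions.

```python
# Minimal sketch (not from the paper): first 10 ordered Bessel zeros for
# integer orders p, matching the "even d" column of the table above.
from scipy.special import jn_zeros

zeros = []
for p in range(6):                       # orders p = 0,...,5 suffice here
    for s, g in enumerate(jn_zeros(p, 3), start=1):
        zeros.append((g, s, p))
for j, (g, s, p) in enumerate(sorted(zeros)[:10], start=1):
    print(f"{j:2d}  s={s}  p={p}  gamma={g:.4f}")
```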
The authors are grateful to the referees for their remarks and suggestions, which greatly helped us to improve the paper.
References
[1] M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions, Dover, New York, 1972.
[2] E.A. Balova, On optimal recovery of the Dirichlet problem solution in an annulus, Vladikavkaz Mat. Zh. 8 (2) (2006) 15–23.
[3] G.G. Magaril-Il'yaev, K.Yu. Osipenko, Optimal recovery of functions and their derivatives from Fourier coefficients prescribed with an error, Mat. Sb. 193 (2002) 79–100 (English translation in Sb. Math. 193 (2002)).
[4] G.G. Magaril-Il'yaev, K.Yu. Osipenko, Optimal recovery of functions and their derivatives from inaccurate information about a spectrum and inequalities for derivatives, Funkts. Anal. i ego Prilozh. 37 (2003) 51–64 (English translation in Functional Anal. Appl. 37 (2003)).
[5] G.G. Magaril-Il'yaev, K.Yu. Osipenko, V.M. Tikhomirov, On optimal recovery of heat equation solutions, in: D.K. Dimitrov, G. Nikolov, R. Uluchev (Eds.), Approximation Theory: A Volume Dedicated to B. Bojanov, Marin Drinov Academic Publishing House, Sofia, 2004, pp. 163–175.
[6] A.A. Melkman, C.A. Micchelli, Optimal estimation of linear operators in Hilbert spaces from inaccurate data, SIAM J. Numer. Anal. 16 (1979) 87–105.
[7] K.Yu. Osipenko, On recovery of the Dirichlet problem solution by inaccurate input data, Vladikavkaz Mat. Zh. 6 (4) (2004) 55–62.
[8] K.Yu. Osipenko, The Hardy–Littlewood–Pólya inequality for analytic functions from Hardy–Sobolev spaces, Mat. Sb. 197 (2006) 15–34 (English translation in Sb. Math. 197 (2006) 315–334).
[9] K.Yu. Osipenko, N.D. Vysk, Optimal recovery of wave equation solution by inaccurate input data, Mat. Zametki 81 (6) (2007) 803–815.
[10] L. Plaskota, Noisy Information and Computational Complexity, Cambridge University Press, Cambridge, 1996.
[11] E.M. Stein, G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces, Princeton University Press, Princeton, NJ, 1971.
[12] J.F. Traub, H. Woźniakowski, A General Theory of Optimal Algorithms, Academic Press, New York, 1980.
[13] N.D. Vysk, On a wave equation solution with inaccurate Fourier coefficients of the function defining the initial form of the string, Vladikavkaz Mat. Zh. 8 (4) (2006) 12–17.
[14] E.V. Wedenskaya, On optimal recovery of heat equation solution by inaccurate temperature given at several times, Vladikavkaz Mat. Zh. 8 (1) (2006) 16–21.
[15] A.G. Werschulz, The Computational Complexity of Differential and Integral Equations: An Information-Based Approach, Oxford University Press, New York, 1991.
Journal of Complexity 23 (2007) 662–672
Discrepancy with respect to convex polygons

W.W.L. Chen (a), G. Travaglini (b)

(a) Department of Mathematics, Macquarie University, Sydney, NSW 2109, Australia
(b) Dipartimento di Statistica, Università di Milano-Bicocca, Edificio U7, Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy

Received 31 October 2006; accepted 20 March 2007
Available online 6 April 2007

Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday
Abstract

We study the problem of discrepancy of finite point sets in the unit square with respect to convex polygons, when the directions of the edges are fixed, when the number of edges is bounded, as well as when no such restrictions are imposed. In all three cases, we obtain estimates for the supremum norm that are very close to best possible.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Discrepancy; Irregularities of distribution
1. Introduction

Suppose that $\mathcal{P}$ is a distribution of $N>1$ points, not necessarily distinct, in the unit square $[0,1]^2$. For every Lebesgue measurable set $A\subseteq[0,1]^2$, let $Z[\mathcal{P};A]$ denote the number of points of $\mathcal{P}$ that fall into $A$, and consider the discrepancy function
$$D[\mathcal{P};A]=Z[\mathcal{P};A]-N\mu(A),\tag{1}$$
where $\mu(A)$ denotes the measure (or area) of $A$. We shall study the discrepancy function (1) when the subsets $A$ are closed convex polygons in $[0,1]^2$. More precisely, we study the behaviour of the function
$$\sup_{A\in\mathcal{A}}|D[\mathcal{P};A]|$$
with respect to three classes $\mathcal{A}$ of convex polygons in $[0,1]^2$.
Notation. We adopt standard Vinogradov notation. For two functions $f$ and $g$, we write $f\ll g$ to denote the existence of a positive constant $c$ such that $|f|\le cg$. For any non-negative functions $f$ and $g$, we write $f\gg g$ to denote the existence of a positive constant $c$ such that $f\ge cg$. The inequality signs $\ll$ and $\gg$ may be used with subscripts involving parameters such as $k$ and $\Theta$, in which case the positive constant $c$ in question may depend on the parameters indicated.

Let $\Theta=(\theta_1,\dots,\theta_k)$, where $\theta_1,\dots,\theta_k\in[0,\pi)$ are fixed. We denote by $\mathcal{A}(\Theta)$ the collection of all convex polygons $A$ in $[0,1]^2$ such that every side of $A$ makes an angle $\theta_i$, for some $i=1,\dots,k$, with the positive horizontal axis. Note that if $\Theta=(0,\pi/2)$, then $\mathcal{A}(\Theta)$ is simply the collection of all aligned rectangles in $[0,1]^2$. Then the famous result of Schmidt [12] shows that for every set $\mathcal{P}$ of $N$ points in $[0,1]^2$, we have
$$\sup_{A\in\mathcal{A}(0,\pi/2)}|D[\mathcal{P};A]|\gg\log N.\tag{2}$$
This result is best possible, apart from the implicit constant in the inequality, as an old result of Lerch [10] implies that there exists a set $\mathcal{P}$ of $N$ points in $[0,1]^2$ such that
$$\sup_{A\in\mathcal{A}(0,\pi/2)}|D[\mathcal{P};A]|\ll\log N.$$
For the general case, the ideas in Beck and Chen [4] can be adapted easily to show that for every set $\mathcal{P}$ of $N$ points in $[0,1]^2$, we have
$$\sup_{A\in\mathcal{A}(\Theta)}|D[\mathcal{P};A]|\gg\log N.$$
Here we establish the following complementary result.

Theorem 1. Suppose that $\Theta=(\theta_1,\dots,\theta_k)$, where $\theta_1,\dots,\theta_k\in[0,\pi)$ are fixed. Then for every integer $N>1$, there exists a set $\mathcal{P}$ of $N$ points in $[0,1]^2$ such that
$$\sup_{A\in\mathcal{A}(\Theta)}|D[\mathcal{P};A]|\ll\log N.$$

Next, we relax the restriction on the direction of the sides of the convex polygons and replace this with a restriction on the number of sides instead. We denote by $\mathcal{A}_k$ the collection of all convex polygons in $[0,1]^2$ with at most $k$ sides. Then a result of Beck [1] implies that for every set $\mathcal{P}$ of $N$ points in $[0,1]^2$, we have
$$\sup_{A\in\mathcal{A}_k}|D[\mathcal{P};A]|\gg_k N^{1/4}.\tag{3}$$
Here we establish the following upper bound.

Theorem 2. For every integer $N>1$, there exists a set $\mathcal{P}$ of $N$ points in $[0,1]^2$ such that
$$\sup_{A\in\mathcal{A}_k}|D[\mathcal{P};A]|\ll_k N^{1/4}(\log N)^{1/2}.\tag{4}$$

Finally, we relax all the restrictions on the direction and number of sides of the convex polygons. Accordingly, we denote by $\mathcal{A}^*$ the collection of all convex polygons in $[0,1]^2$. Our study is motivated by the wonderfully elegant work of Schmidt [13] and Beck [2] on the collection $\mathcal{C}^*$ of all convex sets in $[0,1]^2$. Here, for every set $\mathcal{P}$ of $N$ points in $[0,1]^2$, we have
$$\sup_{A\in\mathcal{C}^*}|D[\mathcal{P};A]|\gg N^{1/3}.\tag{5}$$
This is essentially best possible. For every integer $N>1$, there exists a set $\mathcal{P}$ of $N$ points in $[0,1]^2$ such that
$$\sup_{A\in\mathcal{C}^*}|D[\mathcal{P};A]|\ll N^{1/3}(\log N)^4.$$
Here we establish the following lower bound.

Theorem 3. For every integer $N>1$ and for every set $\mathcal{P}$ of $N$ points in $[0,1]^2$, we have
$$\sup_{A\in\mathcal{A}^*}|D[\mathcal{P};A]|\gg N^{1/3}.\tag{6}$$

We remark that some of the arguments can be extended to polytopes in the $d$-dimensional unit cube $[0,1]^d$. In particular, inequalities (3) and (4) can be generalized to arbitrary dimensions $d$, with the exponent $\frac14$ replaced by the exponent $\frac12-\frac{1}{2d}$, while inequalities (5) and (6) can also be generalized to arbitrary dimensions $d$, with the exponent $\frac13$ replaced by the exponent $1-2/(d+1)$. On the other hand, the generalization of inequality (2) to arbitrary dimensions is one of the most frustrating unsolved problems in the subject. For example, we do not know whether for every set $\mathcal{P}$ of $N$ points in the cube $[0,1]^3$, there is an aligned rectangular box $A$ in $[0,1]^3$ such that $|D[\mathcal{P};A]|\gg(\log N)^2$.

2. Diophantine approximation

To establish Theorem 1, we shall follow the argument of Beck and Chen [5] and make use of a suitably scaled and rotated copy of the lattice $\mathbb{Z}^2$. The rotation is made possible by the following result on diophantine approximation due to Davenport [7].

Lemma 2.1. Suppose that $f_1,\dots,f_r$ are real valued functions of a real variable, with continuous first derivatives in some open interval $I$ containing some point $\theta_0\in\mathbb{R}$ such that $f_1'(\theta_0),\dots,f_r'(\theta_0)$ are all non-zero. Then there exists $\theta\in I$ such that $f_1(\theta),\dots,f_r(\theta)$ are all badly approximable.

Remark. A real number $\theta$, such as $\theta=\sqrt2$, is said to be badly approximable if there exists a constant $c>0$ such that $n\|n\theta\|>c$ for every natural number $n\in\mathbb{N}$. Here $\|\theta\|$ denotes the distance of $\theta$ from the nearest integer.

More precisely, we shall use the following simple consequence.

Lemma 2.2. Suppose that the angles $\theta_1,\dots,\theta_k\in[0,\pi)$ are fixed. Then there exists $\varphi\in[0,2\pi)$ such that $\tan\varphi$, $\tan(\varphi-\pi/2)$, $\tan(\varphi-\theta_1),\dots,\tan(\varphi-\theta_k)$ are all finite and badly approximable.
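As a quick numerical illustration of the "badly approximable" property (not from the paper): for $\theta=\sqrt2$ the quantity $n\|n\theta\|$ stays bounded away from 0.

```python
# Minimal numerical sketch (illustration only): n * ||n * sqrt(2)|| stays
# above a fixed positive constant.  Floating point limits how large n can
# be checked meaningfully.
import math

theta = math.sqrt(2)
worst = min(n * abs(n * theta - round(n * theta)) for n in range(1, 100000))
print(worst)   # about 0.34 for sqrt(2), attained at n = 2
```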
We shall be concerned with the collection $\mathcal{A}(\Theta)$ of convex polygons in $[0,1]^2$, where $\theta_1,\dots,\theta_k\in[0,\pi)$ are fixed. Recall that every side of such a polygon $A\in\mathcal{A}(\Theta)$ makes an angle $\theta_i$, for some $i=1,\dots,k$, with the positive horizontal axis. Corresponding to the given $\Theta$, we now choose a value of $\varphi$ from Lemma 2.2 and keep it fixed throughout. We would like to consider the lattice $\Lambda$ formed by rotating the lattice $(N^{-1/2}\mathbb{Z})^2$ anticlockwise by the angle $\varphi$ about the origin. In particular, we are interested in the lattice points of $\Lambda$ that fall into $[0,1]^2$. Notationally, however, it is far simpler to rescale and rotate the unit square $[0,1]^2$ and the convex polygons in $\mathcal{A}(\Theta)$. Accordingly, we consider the following rescaled and rotated variant of the original problem. Let $U$ denote the image of the square $[0,N^{1/2}]^2$ rotated clockwise by the angle $\varphi$ about the origin, and let $\mathcal{A}_N(\Theta;\varphi)$ denote the collection of all convex polygons $B$ in $U$ such that every side of $B$ either is parallel to a side of $U$ or makes an angle $\theta_i-\varphi$, for some $i=1,\dots,k$, with the positive horizontal axis. For every measurable subset $B\subseteq U$, let $Z(B)$ denote the number of lattice points of $\mathbb{Z}^2$ that fall into $B$, and write $E(B)=Z(B)-\mu(B)$. We need the following intermediate result.

Lemma 2.3. For every $B\in\mathcal{A}_N(\Theta;\varphi)$, we have $|E(B)|\ll\log N$.

Deduction of Theorem 1. Unfortunately, the set $\mathbb{Z}^2\cap U$ does not necessarily have precisely $N$ points. Let $\mathcal{Q}$ denote a set of precisely $N$ points in $U$ obtained by adding to or removing from $\mathbb{Z}^2\cap U$ precisely $\bigl||\mathbb{Z}^2\cap U|-N\bigr|$ points. Note that
$$\bigl||\mathbb{Z}^2\cap U|-N\bigr|=|E(U)|\ll\log N$$
in view of Lemma 2.3. For every $B\in\mathcal{A}_N(\Theta;\varphi)$, we now let $Z[\mathcal{Q};B]$ denote the number of points of $\mathcal{Q}$ in $B$. Then
$$|Z[\mathcal{Q};B]-\mu(B)|\le|E(B)|+|Z(B)-Z[\mathcal{Q};B]|\le|E(B)|+|Z(U)-Z[\mathcal{Q};U]|=|E(B)|+|E(U)|\ll\log N.$$
Now let $\mathcal{P}$ be obtained by rotating $N^{-1/2}\mathcal{Q}$ anticlockwise by the angle $\varphi$. Then $\mathcal{P}$ is a set of precisely $N$ points in $[0,1]^2$, and the inequality $|D[\mathcal{P};A]|\ll\log N$ holds for every convex polygon $A\in\mathcal{A}(\Theta)$. $\square$
Proof of Lemma 2.3. We adopt the convention that $\theta_1,\dots,\theta_k$ are distinct, but note that no convex polygon can have three parallel sides. For every $n=(n_1,n_2)\in\mathbb{Z}^2$, let $S(n)=(n_1-\frac12,n_1+\frac12]\times(n_2-\frac12,n_2+\frac12]$. For any convex polygon $B\in\mathcal{A}_N(\Theta;\varphi)$, let
$$N=\{n\in\mathbb{Z}^2:\ S(n)\cap B\neq\emptyset\},$$
so that
$$E(B)=\sum_{n\in N}E(B\cap S(n)).$$
Furthermore, for every $i=1,\dots,k$, let $T_i$ denote the edge(s) of $B$ that make the angle $\theta_i-\varphi$ with the positive horizontal axis, let $T_i^*$ denote the totality of all the other edges of $B$, and write
$$N_i=\{n\in N:\ S(n)\cap T_i\neq\emptyset\ \text{and}\ S(n)\cap T_i^*=\emptyset\}.$$
We also write
$$N^+=\{n\in N:\ \text{there exist } i\neq i' \text{ with } S(n)\cap T_i\neq\emptyset \text{ and } S(n)\cap T_{i'}\neq\emptyset\}$$
and
$$N^-=\{n\in N:\ S(n)\cap T_i=\emptyset \text{ for every } i\}.$$
Clearly, $N=N_1\cup\cdots\cup N_k\cup N^+\cup N^-$, and
$$E(B)=\sum_{i=1}^{k}\sum_{n\in N_i}E(B\cap S(n))+\sum_{n\in N^+}E(B\cap S(n))+\sum_{n\in N^-}E(B\cap S(n)).\tag{7}$$
It is easy to see that $|N^+|=O_k(1)$ and that $|E(B\cap S(n))|\le1$ for every $n\in N$, so that
$$\sum_{n\in N^+}E(B\cap S(n))=O_k(1).\tag{8}$$
It is also easy to see that
$$\sum_{n\in N^-}E(B\cap S(n))=0.\tag{9}$$
Combining (7)–(9), we conclude that
$$E(B)=\sum_{i=1}^{k}\sum_{n\in N_i}E(B\cap S(n))+O_k(1).$$
To prove Lemma 2.3, it remains to prove that for every $i=1,\dots,k$, we have
$$\sum_{n\in N_i}E(B\cap S(n))\ll\log N.\tag{10}$$
Write $\beta_i=\theta_i-\varphi$. In view of symmetry, we may assume that $0\le\beta_i\le\pi/4$. There are at most two edges of $B$ that make the angle $\beta_i$ with the positive horizontal axis. Let one of these lie on the line
$$\frac{x_2-a_2}{x_1-a_1}=\tan\beta_i,$$
where $(x_1,x_2)\in\mathbb{R}^2$ denotes any point on the line and $a_1$ and $a_2$ are real constants. Elementary calculation then shows that the contribution from this edge to the sum in (10) is given by
$$\pm\sum_{A_i\le m\le B_i}\psi\bigl(a_2+(m-a_1)\tan\beta_i\bigr),$$
where $A_i$ and $B_i$ are integers satisfying $0\le A_i\le B_i\le\sqrt2\,N^{1/2}$, and $\psi(z)=z-[z]-\frac12$ for every $z\in\mathbb{R}$. Since $\tan\beta_i$ is badly approximable, giving rise to good distribution of the sequence $m\tan\beta_i$ modulo 1, the well-known result of Lerch [10] (see also [8,9,6]) shows that
$$\Bigl|\sum_{A_i\le m\le B_i}\psi\bigl(a_2+(m-a_1)\tan\beta_i\bigr)\Bigr|\ll_{\beta_i}\log(B_i-A_i+2)\ll_{\beta_i}\log N.$$
This establishes inequality (10), and completes the proof of Lemma 2.3. $\square$
3. An argument of Beck

To study Theorem 2, we use an elaboration of the idea of Beck as discussed in Section 8.1 of [3]. It is convenient to restrict the natural number $N$ to be a perfect square, so that $N=M^2$ for some natural number $M$. This restriction can be lifted easily, in view of Lagrange's theorem that every positive integer is a sum of at most four integer squares, so that we can superimpose up to four point distributions where the number of points in each is a perfect square. We shall consider a rescaled version of the problem, and study sets of $N$ points in the square $[0,M]^2$.

Let $k\in\mathbb{N}$ be fixed, with $k\ge3$. We denote by $\mathcal{G}_k$ the collection of all convex polygons in $[0,M]^2$ which have at most $k$ sides. Suppose that $\mathcal{P}$ is a set of $N$ points in $[0,M]^2$. For every measurable subset $A\subseteq[0,M]^2$, let $Z[\mathcal{P};A]$ denote the number of points of $\mathcal{P}$ that fall into $A$, and let
$$E[\mathcal{P};A]=Z[\mathcal{P};A]-\mu(A)$$
denote the corresponding discrepancy. We would like to show that there exists a set $\mathcal{P}$ of $N$ points in $[0,M]^2$ such that for every convex polygon $A\in\mathcal{G}_k$, we have $|E[\mathcal{P};A]|\ll_k N^{1/4}(\log N)^{1/2}$.

Our first step is to approximate the convex polygons in $\mathcal{G}_k$ by a special finite collection of polygons. Let $\delta=(6kM)^{-1}$, and let $\mathcal{H}_k$ denote the collection of all convex polygons in $[0,M]^2$ with at most $4k$ sides and with vertices on $(\delta\mathbb{Z})^2\cap[0,M]^2$. It is easy to see that $|(\delta\mathbb{Z})^2\cap[0,M]^2|=(6kN+1)^2$, so that
$$|\mathcal{H}_k|\le\sum_{d=3}^{4k}\binom{(6kN+1)^2}{d}\le c_kN^{8k},$$
where the constant $c_k$ depends at most on $k$.

Lemma 3.1. For every convex polygon $A\in\mathcal{G}_k$, there exist two convex polygons $B^+,B^-\in\mathcal{H}_k$ such that $B^-\subseteq A\subseteq B^+$ and $\mu(B^+\setminus B^-)\le1$.

Lemma 3.2. There exists a set $\mathcal{P}$ of $N$ points in $[0,M]^2$ such that for every convex polygon $B\in\mathcal{H}_k$, we have $|E[\mathcal{P};B]|\le C_kN^{1/4}(\log N)^{1/2}$, where the constant $C_k$ depends at most on $k$.

Before we establish these two lemmas, we shall first complete the very short deduction of Theorem 2.
Deduction of Theorem 2. For every convex polygon $A\in\mathcal{G}_k$, it is not difficult to show that the convex polygons $B^+,B^-\in\mathcal{H}_k$ given by Lemma 3.1 satisfy the inequality
$$|E[\mathcal{P};A]|\le\max\{|E[\mathcal{P};B^-]|,|E[\mathcal{P};B^+]|\}+\mu(B^+\setminus B^-)\le C_kN^{1/4}(\log N)^{1/2}+1.$$
This gives Theorem 2 immediately. $\square$
We shall establish Lemma 3.2 in Section 4, and Lemma 3.1 in Section 5.

4. Large deviation

In this section, we establish Lemma 3.2 using a large deviation-type argument. For every $l=(\ell_1,\ell_2)\in\mathbb{Z}^2\cap[0,M)^2$, let $q_l\in S(l)=[\ell_1,\ell_1+1)\times[\ell_2,\ell_2+1)$ be a random point uniformly distributed in $S(l)$ and independent of the points in the other squares, and consider the random point set
$$\tilde{\mathcal{P}}=\{q_l:\ l\in\mathbb{Z}^2\cap[0,M)^2\}.$$
Consider a fixed convex polygon $B\in\mathcal{H}_k$, and let
$$L(B)=\{l\in\mathbb{Z}^2\cap[0,M)^2:\ S(l)\cap\partial B\neq\emptyset\}.$$
Then it is easy to show that $|L(B)|\le4N^{1/2}$. For any $l\in L(B)$, let
$$\xi_l=\begin{cases}1 & \text{if } q_l\in B,\\ 0 & \text{otherwise}.\end{cases}$$
Then
$$E[\tilde{\mathcal{P}};B]=\sum_{l\in L(B)}(\xi_l-\mathbb{E}\xi_l).$$
We now use the following large deviation-type inequality due to Hoeffding; see, for example, Appendix B of Pollard [11].

Lemma 4.1. Suppose that $\xi_1,\dots,\xi_m$ are independent random variables such that $0\le\xi_i\le1$ for every $i=1,\dots,m$. Then for every $\lambda>0$,
$$\mathrm{Prob}\Bigl(\Bigl|\sum_{i=1}^{m}(\xi_i-\mathbb{E}\xi_i)\Bigr|\ge\lambda\Bigr)\le2e^{-2\lambda^2/m}.$$
Note that $m=|L(B)|\le4N^{1/2}$, and choose $\lambda=C_kN^{1/4}(\log N)^{1/2}$ with a sufficiently large constant $C_k$. Then it is easy to check that
$$\frac{2\lambda^2}{m}\ge\frac{2C_k^2N^{1/2}\log N}{4N^{1/2}}=\frac{C_k^2}{2}\log N,$$
so that
$$4e^{-2\lambda^2/m}\le4N^{-C_k^2/2}\le c_k^{-1}N^{-8k},$$
where the last inequality is valid for all $N\ge2$ provided that $C_k$ is large enough in terms of $k$ and $c_k$. Since
$$\tfrac12|\mathcal{H}_k|^{-1}\ge\tfrac12c_k^{-1}N^{-8k}\ge2e^{-2\lambda^2/m},$$
we have
$$\mathrm{Prob}\bigl(|E[\tilde{\mathcal{P}};B]|\ge C_kN^{1/4}(\log N)^{1/2}\bigr)\le\tfrac12|\mathcal{H}_k|^{-1}.$$
If we now consider all convex polygons $B\in\mathcal{H}_k$, then the above implies
$$\mathrm{Prob}\bigl(|E[\tilde{\mathcal{P}};B]|\ge C_kN^{1/4}(\log N)^{1/2}\ \text{for some } B\in\mathcal{H}_k\bigr)\le\tfrac12,$$
and so
$$\mathrm{Prob}\bigl(|E[\tilde{\mathcal{P}};B]|\le C_kN^{1/4}(\log N)^{1/2}\ \text{for all } B\in\mathcal{H}_k\bigr)\ge\tfrac12.$$
This completes the proof of Lemma 3.2. $\square$
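The random point set $\tilde{\mathcal{P}}$ used in this section is the classical jittered sample: one uniform point per unit square. A minimal sketch (not from the paper) is given below; the triangular test set is an illustrative choice.

```python
# Minimal sketch (illustrative test set, see lead-in): jittered sampling
# in [0, M]^2 and the discrepancy E[P; B] = Z[P; B] - mu(B).
import numpy as np

rng = np.random.default_rng(1)
M = 32                                   # N = M^2 = 1024 points
base = np.stack(np.meshgrid(np.arange(M), np.arange(M)), -1).reshape(-1, 2)
P = base + rng.random((M * M, 2))        # q_l uniform in its unit square S(l)

def E(P, inside, area):                  # discrepancy E[P; B]
    return np.sum(inside(P)) - area

# test set B: the triangle below the main diagonal of [0, M]^2
tri = lambda X: X[:, 1] <= X[:, 0]
print(E(P, tri, M * M / 2.0))            # typically of much smaller order than N
```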
5. Convexity

In this section, we establish Lemma 3.1 using a convexity argument. Recall that $\mathcal{G}_k$ denotes the collection of all convex polygons in $[0,M]^2$ which have at most $k$ sides, and $\mathcal{H}_k$ denotes the collection of all convex polygons in $[0,M]^2$ with at most $4k$ sides and with vertices on $(\delta\mathbb{Z})^2\cap[0,M]^2$, where $\delta=(6kM)^{-1}$. For convenience, we make an ad hoc definition. By a $\delta$-square, we mean a closed square of side $\delta$ and with all vertices in $(\delta\mathbb{Z})^2\cap[0,M]^2$.

5.1. The outer convex polygon $B^+$

Suppose that a convex polygon $A\in\mathcal{G}_k$ is given. Corresponding to every vertex $v$ of $A$, we shall define the set $O_v$ of “outer grid points” corresponding to $v$. We distinguish two cases:
Case 1: Suppose that $v\in(\delta\mathbb{Z})^2\cap[0,M]^2$. Then we take $O_v=\{v\}$.
Case 2: Suppose that $v\notin(\delta\mathbb{Z})^2\cap[0,M]^2$. Then we take $O_v$ to be the collection of the vertices, outside $A$ or on the boundary of $A$, of all $\delta$-squares that contain $v$ and whose interior intersects the boundary of $A$.
To construct the convex polygon $B^+\in\mathcal{H}_k$ given in Lemma 3.1, we simply let
$$B^+=\operatorname{ch}\bigl\{O_v:\ v \text{ is a vertex of } A\bigr\}$$
denote the convex hull of all the outer grid points of $A$. Trivially, the convex polygon $B^+$ has at most $4k$ sides, since $A$ has at most $k$ sides. The inclusion $A\subseteq B^+$ is immediate from our definition. On the other hand, we have
$$\mu(B^+\setminus A)\le\tfrac12.\tag{11}$$
To see this, note that any point of $O_v$ has vertical or horizontal distance at most $2\delta$ from the (extended) edges of $A$ that intersect at $v$. It follows that the set $B^+\setminus A$ is contained in the union of $k$ sets, each of area at most $2\delta M$. Inequality (11) follows immediately.

5.2. The inner convex polygon $B^-$

Suppose that a convex polygon $A\in\mathcal{G}_k$ is given. Here we run into some technical complications caused by the possibility of $A$ having some vertices that are very close together. To overcome these complications, we introduce an iterative process whereby we can remove some of the vertices of $A$, one at a time, to obtain a smaller polygon $A^*$. Start with $A_0=A$. For each $i=0,1,2,\dots$, we remove, if possible, a vertex of the polygon $A_i$ by taking one of the steps below, and denote by $A_{i+1}$ the convex polygon formed with the remaining vertices:
• Option 1: Remove a vertex $v$ of $A_i$ if a $\delta$-square containing $v$ contains another vertex of $A_i$.
• Option 2: Remove a vertex $v$ of $A_i$ if all four vertices of every $\delta$-square containing $v$ lie outside $A_i$ and at least one of the following two conditions is satisfied:
◦ The horizontal distance from $v$ to an adjacent vertex of $A_i$ is less than the horizontal distance in the same direction from $v$ to any grid point of $(\delta\mathbb{Z})^2\cap[0,M]^2$ lying inside $A_i$ or on the boundary of $A_i$.
◦ The vertical distance from $v$ to an adjacent vertex of $A_i$ is less than the vertical distance in the same direction from $v$ to any grid point of $(\delta\mathbb{Z})^2\cap[0,M]^2$ lying inside $A_i$ or on the boundary of $A_i$.
Note that $A_{i+1}\subseteq A_i$, and $\mu(A_i\setminus A_{i+1})\le\delta M$. This iterative process stops when it is no longer possible to remove any vertex of a convex polygon under either option, and we denote by $A^*$ the last convex polygon obtained from $A$ by this process. Note that
$$\mu(A\setminus A^*)\le j\delta M,\tag{12}$$
where $j$ is the number of vertices of $A$ removed by this process. Note that the convex polygon $A^*$ may not be unique, and has at most $k-j$ sides. Corresponding to every vertex $v$ of $A^*$, we shall define the set $I_v$ of “inner grid points” corresponding to $v$. We distinguish two cases:
Case 1: Suppose that $v\in(\delta\mathbb{Z})^2\cap[0,M]^2$. Then we take $I_v=\{v\}$.
Case 2: Suppose that $v\notin(\delta\mathbb{Z})^2\cap[0,M]^2$. Let $F_v$ denote the collection of vertices, inside $A^*$ or on the boundary of $A^*$, of all $\delta$-squares that contain $v$ and whose interior intersects the boundary of $A^*$—there is only one such $\delta$-square, unless $v$ lies on the boundary of two adjacent ones, in which case there are precisely two. There are three possibilities:
• If $F_v\neq\emptyset$, then we take $I_v=F_v$.
• If $F_v=\emptyset$, and no point of the lattice $(\delta\mathbb{Z})^2\cap[0,M]^2$ lies inside $A^*$ or on the boundary of $A^*$, then we take $I_v=\emptyset$.
• If $F_v=\emptyset$, and there are points of the lattice $(\delta\mathbb{Z})^2\cap[0,M]^2$ that lie inside $A^*$ or on the boundary of $A^*$, then for every $\delta$-square that contains $v$ and whose interior intersects the boundary of $A^*$, one or more of its four edges must have the following property: the edge intersects $A^*$, and there is a grid line of $(\delta\mathbb{Z})^2\cap[0,M]^2$, parallel to this edge, closest to $v$ but on the other side of this edge from $v$, that contains points of $(\delta\mathbb{Z})^2\cap[0,M]^2$ that lie inside $A^*$ or on the boundary
of $A^*$. We take $I_v$ to include all such grid points of $(\delta\mathbb{Z})^2\cap[0,M]^2$ on these closest grid lines that lie inside $A^*$ or on the boundary of $A^*$.
The following is easy to prove: if the boundary of $A^*$ crosses precisely one edge or three edges of the $\delta$-square, then the elements of $I_v$ arising from this $\delta$-square lie on at most one grid line. If the boundary of $A^*$ crosses precisely two edges of the $\delta$-square, then the elements of $I_v$ arising from this $\delta$-square lie on at most two distinct grid lines, only one of which can contain more than one element of $I_v$. Note that the boundary of $A^*$ cannot cross all four edges of the $\delta$-square, as this would imply that no point of the lattice $(\delta\mathbb{Z})^2\cap[0,M]^2$ lies inside $A^*$ or on the boundary of $A^*$.
To construct the convex polygon $B^-\in\mathcal{H}_k$ given in Lemma 3.1, we simply let
$$B^-=\operatorname{ch}\bigl\{I_v:\ v \text{ is a vertex of } A^*\bigr\}$$
denote the convex hull of all the inner grid points of $A^*$, with the convention that $B^-=\emptyset$ if $I_v=\emptyset$ for every vertex $v$ of $A^*$. Trivially, the convex polygon $B^-$ has fewer than $4k$ sides, since $A^*$ has at most $k$ sides. The inclusions $B^-\subseteq A^*\subseteq A$ are immediate from our definitions. On the other hand, we have
$$\mu(A\setminus B^-)\le\tfrac12.\tag{13}$$
To see this, note that each vertex $v$ of $A^*$ contributes at most three vertices of $B^-$. Moreover, any point of $I_v$ has vertical or horizontal distance at most $\delta$ from the edges of $A^*$ that intersect at $v$. It follows that the set $A^*\setminus B^-$ is contained in the union of $k-j$ sets “along the edges”, each of area at most $\delta M$, and the union of at most $2(k-j)$ triangles “near the vertices”, each of area at most $\delta M$. Inequality (13) then follows at once on noting inequality (12). The case when $B^-=\emptyset$ is trivial.

6. An elementary geometric argument

In this section, we adapt the wonderfully elegant geometric argument described in Schmidt [13] to give a simple proof of Theorem 3. Consider the circle of radius $\frac12$ lying within the unit square $[0,1]^2$. Now let $k=[N^{1/3}]$, and let $A$ denote a regular convex polygon of $k$ sides inscribed in this circle. Elementary calculation shows that any triangle whose three vertices are one of the vertices of $A$ and the midpoints of the two adjacent edges has area
$$\frac14\sin^3\frac{\pi}{k}\,\cos\frac{\pi}{k}\ \ge\ \frac18\Bigl(\frac{2}{k}\Bigr)^3=\frac{1}{k^3}\ \ge\ \frac{1}{N}.\tag{14}$$
Corresponding to each vertex of $A$, we now consider an isosceles triangle of area $1/2N$ and with its two equal sides lying on the two edges of $A$ adjacent to this vertex. Let $B_1,\dots,B_s$ denote those isosceles triangles which contain points of $\mathcal{P}$, and let $C_1,\dots,C_t$ denote those isosceles triangles which do not contain points of $\mathcal{P}$. Clearly,
$$D[\mathcal{P};B_i]\ge\tfrac12\qquad\text{for every } i=1,\dots,s,$$
and
$$D[\mathcal{P};C_j]=-\tfrac12\qquad\text{for every } j=1,\dots,t.$$
672
W.W.L. Chen, G. Travaglini / Journal of Complexity 23 (2007) 662 – 672
Furthermore, the triangles B1 , . . . , Bs , C1 , . . . , Ct are pairwise disjoint, in view of (14) above, and s + t = k = [N 1/3 ]. It is also easy to see that both A+ = A \ (B1 ∪ · · · ∪ Bs ) and
A− = A \ (C1 ∪ · · · ∪ Ct )
are convex polygons. But now D[P; A− ] − D[P; A+ ] =
s
D[P; Bi ] −
i=1
t
D[P; Cj ]
j =1
s t k 1 + = = [N 1/3 ]. 2 2 2 2
It follows that |D[P; A− ]| 41 [N 1/3 ]
or |D[P; A+ ]| 41 [N 1/3 ],
and this completes the proof of Theorem 3. References [1] J. Beck, Irregularities of distribution I, Acta Math. 159 (1987) 1–49. [2] J. Beck, On the discrepancy of convex plane sets, Monatsh. Math. 105 (1988) 91–106. [3] J. Beck, W.W.L. Chen, Irregularities of Distribution, Cambridge Tracts in Mathematics, vol. 89, Cambridge University Press, Cambridge, 1987. [4] J. Beck, W.W.L. Chen, Irregularities of point distribution relative to convex polygons, in: G. Halász, V.T. Sós (Eds.), Irregularities of Partitions, Algorithms and Combinatorics, vol. 8, Springer, Berlin, 1989, pp. 1–22. [5] J. Beck, W.W.L. Chen, Irregularities of point distribution relative to convex polygons III, J. London Math. Soc. 56 (1997) 222–230. [6] H. Davenport, Note on irregularities of distribution, Mathematika 3 (1956) 131–135. [7] H. Davenport, A note on diophantine approximation II, Mathematika 11 (1964) 50–58. [8] G.H. Hardy, J.E. Littlewood, Some problems of diophantine approximation: the lattice points of a right-angled triangle I, Proc. London Math. Soc. 20 (1922) 15–36. [9] G.H. Hardy, J.E. Littlewood, Some problems of diophantine approximation: the lattice points of a right-angled triangle II, Abh. Math. Sem. Univ. Hamburg 1 (1922) 212–249. [10] M. Lerch, Question 1547, L’Intermediare Math. 11 (1904) 145–146. [11] D. Pollard, Convergence of Stochastic Processes, Springer, Berlin, 1984. [12] W.M. Schmidt, Irregularities of distribution VII, Acta Arith. 21 (1972) 45–50. [13] W.M. Schmidt, Irregularities of distribution IX, Acta Arith. 27 (1975) 385–396.
Journal of Complexity 23 (2007) 673 – 696 www.elsevier.com/locate/jco
Simple Monte Carlo and the Metropolis algorithm Peter Mathéa , Erich Novakb,∗ a Weierstrass Institute for Applied Analysis and Stochastics, Mohrenstrasse 39, D-10117 Berlin, Germany b Friedrich Schiller University Jena, Mathem. Institute, Ernst-Abbe-Platz 2, D-07743 Jena, Germany
Received 21 October 2006; accepted 14 May 2007 Dedicated to our dear colleague and friend Henryk Wo´zniakowski on the occasion of his 60th birthday Available online 15 June 2007
Abstract We study the integration of functions with respect to an unknown density. Information is available as oracle calls to the integrand and to the non-normalized density function. We are interested in analyzing the integration error of optimal algorithms (or the complexity of the problem) with emphasis on the variability of the weight function. For a corresponding large class of problem instances we show that the complexity grows linearly in the variability, and the simple Monte Carlo method provides an almost optimal algorithm. Under additional geometric restrictions (mainly log-concavity) for the density functions, we establish that a suitable adaptive local Metropolis algorithm is almost optimal and outperforms any non-adaptive algorithm. © 2007 Elsevier Inc. All rights reserved. MSC: 65C05; secondary: 65Y2068Q17; 82B80 Keywords: Monte Carlo methods; Metropolis algorithm; Log-concave density; Rapidly mixing Markov chains; Optimal algorithms; Adaptivity; Complexity
1. Introduction, problem description In many applications one wants to compute an integral of the form f (x) · c(x)(dx)
(1)
with a density c(x), x ∈ , where c > 0 is unknown and is a probability measure. Of course we have 1/c = (x)(dx), but the numerical computation of the latter integral is often as hard as the original problem (1). Therefore it is desirable to have algorithms which are able ∗ Corresponding author.
E-mail addresses: [email protected] (P. Mathé), [email protected] (E. Novak). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.05.002
674
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
to approximately compute (1) without knowing the normalizing constant, based solely on n function values of f and . In other terms, these functions are given by an oracle, i.e., we assume that we can compute function values of f and . Solution operator. Assume that we are given any class F() of input data (f, ) defined on a set . We can rewrite the integral in (1) as f (x) · (x)(dx) (f, ) ∈ F(). (2) S(f, ) = (x)(dx) This solution operator is linear in f but not in . We discuss algorithms for the (approximate) computation of S(f, ). Remark 1. This solution operator is closely related to systems in statistical mechanics, which obey a Boltzmann (or Maxwell or Gibbs) distribution, i.e., when there is a countable number j = 1, 2, . . . of microstates with energies, say Ej , and the overall system is distributed according to the Boltzmann distribution, with inverse temperature , as P (j ) :=
e−Ej , Z
j = 1, 2, . . . .
In this case the normalizing constant Z is the partition function, corresponding to 1/c from (1) and (j ) = e−Ej for j ∈ N. In this setup, if A is any global thermodynamic quantity, then its expected value A is given by A :=
1 Aj e−Ej , Z j
which can be written as S(A, ). Observe, however, that we use here slightly different assumptions since we use the counting measure on N, not a probability measure. Randomized methods. Monte Carlo methods (randomized methods) are important numerical tools for integration and simulation in science and engineering, we refer to the recent special issue [7]. The Metropolis method, or more accurately, the class of Metropolis–Hastings algorithms ranges among the most important methods in numerical analysis and scientific computation, see [6,23]. Here we consider randomized methods Sn that use n function evaluations of f and . Hence Sn is of the form as exhibited in Fig. 1. In all steps, random number generators may be used to determine the consecutive node. If the nodes xi from Step do not depend on previously computed values of f (x1 ), . . . , f (xi−1 ) and (x1 ), . . . , (xi−1 ), then the algorithm is called non-adaptive, otherwise it is called adaptive. simple and Snmh , introduced in (3) and (5) below. Specifically we analyze the procedures Sn Remark 2. The notion of adaption which is used here differs from the one recently used to introduce adaptive MCMC, see e.g. [1,3]. The Metropolis algorithm which is used in this paper is based on a homogeneous Markov chain, in our notation this is still an adaptive algorithm since the used nodes xi depend on . Hence we use the concept of adaptivity from numerical analysis and information-based complexity, see [22].
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
675
Fig. 1. Generic Monte Carlo algorithm based on n values of f and . The final Compute may use any mapping n : R2n → R.
For details on the model of computation we refer to [20,21,27]. Here we only mention the following: We use the real number model and assume that f and are given by an oracle for function values. Our lower bounds hold under very general assumptions concerning the available random number generator. 1 For the upper bounds we only study two algorithms in this paper, described in (3) and (5), below. Specifically we shall deal with the (non-adaptive) simple Monte Carlo method and a specific (adaptive) Metropolis–Hastings method. The former can only be applied if a random number generator for on is available. Thus there are natural situations when this method cannot be used. The latter will be based on a suitable ball walk. Hence we need a random number generator for the uniform distribution on a (Euclidean) ball. Thus the Metropolis–Hastings methods can also be applied when a random number generator for on is not available. Instead, we need a “membership oracle” for : On input x ∈ Rd this oracle can decide with cost 1 whether x ∈ or not. Error criterion. We are interested in error bounds uniformly for classes F() of input data. If Sn is any method that uses (at most) n values of f and then the (individual) error for the problem instance (f, ) ∈ F() is given by 1/2 e(Sn , (f, )) = E |S(f, ) − Sn (f, )|2 , where E means the expectation. The overall (or worst case) error on the class F() is e(Sn , F()) =
sup
(f,)∈F ()
e(Sn , (f, )).
The complexity of the problem is given by the error of the best algorithm, hence we let en (F()) := inf e(Sn , F()). Sn
The classes F() under consideration will always contain constant densities = c > 0 and all f with f ∞ 1, hence F1 () := {(f, ), |f (x)| 1, x ∈ , and = c} ⊂ F(). 1 Observe, however, that we cannot use a random number generator for the “target distribution’’ = · / , since 1 is part of the input.
676
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
On this class the problem (2) reduces to the classical integration problem for uniformly bounded functions, and it is well known that the error of any Monte Carlo method can decrease at a rate n−1/2 , at most. Precisely, it holds true that en (F1 ()) =
1 √ , 1+ n
if the probability is non-atomic, see [17]. On the other hand we will only consider (f, ) with S(f, ) ∈ [−1, 1], hence the trivial algorithm S0 = 0 always has error 1. For the classes FC () and F (), which will be introduced in Section 2, we easily obtain the optimal order en (F()) n−1/2 . We will analyze how en (F()) depends on the parameters C and , in case F() := FC () or F() := F (), respectively. We discuss some of our subsequent results and provide a short outline. In Section 2 we shall specify the methods and classes of input data to be analyzed. The classes FC (), analyzed first in Section 3, contain all densities with sup / inf C. In typical applications we may face C = 1020 . Then we cannot decrease the error of optimal methods from 1 to 0.7 even with sample size n = 1015 , see Theorem 1 for more details. Hence the classes FC () are so large that no algorithm, deterministic or Monte Carlo, adaptive or non-adaptive, can provide an acceptable error. We also prove that the simple (non-adaptive) Monte Carlo method is almost optimal, no sophisticated Markov chain Monte Carlo method can help. Thus we face the question whether adaptive algorithms, such as the Metropolis algorithm, help significantly on “suitable and interesting” subclasses of FC (). We give a positive answer for the classes F (), analyzed in Section 4. Here we assume that ⊂ Rd is a convex body, and that is the normalized Lebesgue measure on . The class F () contains log-concave densities, where is the Lipschitz constant of log . We shall establish in Section 4.1 that all non-adaptive methods (such as the simple Monte Carlo method) suffer from the curse of dimension, i.e., we get similar lower bounds as for the classes FC (). However, in Section 4.2 we shall design and analyze specific (adaptive) Metropolis algorithms that are based on some underlying ball walks, tuned to the class parameters. Using such algorithms we can break the curse of dimension by adaption. The main error estimate for this algorithm is given in Theorem 5, and we conclude this study with further discussion in the final Section 5. 2. Specific methods and classes of input We consider the approximate computation of S(f, ) for large classes of input data. Since with deterministic algorithms one cannot improve the trivial zero algorithm (with error 1), we study randomized or Monte Carlo algorithms. The methods. The Monte Carlo methods under consideration fit the schematic view from Fig. 1. Simple Monte Carlo. Here the random numbers 1 , . . . , n are identically and independently distributed according to , and the routine Step chooses Xi := i . The final routine Compute is the quotient of the sample means of the computed function values n simple (f, ) Sn
:=
j =1 f (Xj )(Xj ) n . j =1 (Xj )
(3)
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
677
Metropolis–Hastings method. This describes a class of (adaptive) Monte Carlo methods which are based on the ingenious idea to construct in Step a Markov chain having · := (4) (x)(dx) as invariant distribution without knowing the normalization. Thus, if (X1 , X2 , . . . , Xn ) is a trajectory of such a Markov chain, then we let Compute be given as 1 f (Xj ). n n
Snmh (f, ) :=
(5)
j =1
Hence we use n steps of the Markov chain, the number of needed (different) function values of and f might be smaller. We will further specify the Metropolis–Hastings algorithm for the problem at hand in Section 4.2, see Figs. 2 and 3 for a schematic presentation and Theorem 5 for the choice of . Both Monte Carlo methods construct Markov chains, i.e., the point xi depends on xi−1 and (xi−1 ), only. This trivially holds true for simple Monte Carlo, since xi does not at all depend on earlier computed function values. Remark 3. Comparisons of different Monte Carlo methods for problems similar to (2) are frequently met in the literature. We mention [5] with a comparison of Metropolis algorithms and importance sampling, where an error expansion at any instance (f, ) is given in terms of certain auto-correlations. The simple Monte Carlo method, as introduced below, is also studied there as ˜ I for = 1. simple
and Snmh , as n → ∞, is The (point-wise almost sure) convergence of both methods Sn ensured by corresponding ergodic theorems, see [14]. But, as outlined above, we are interested in the uniform error on relatively large problem classes. The classes. Here we formally describe the classes of input under consideration. The class FC (). Let be an arbitrary probability measure on a set and consider the set (x) FC () = (f, )|f ∞ 1, > 0, C, x, y ∈ . (y) Note that necessarily C 1. If C = 1 then is constant and we almost face the ordinary integration problem, since can be recovered with only one function value. In many applications the constant C is huge and we will establish that the complexity of the problem (the cost of an optimal algorithm) is linear in C. Therefore, for large C, the class is too large. We have to look for smaller classes that contain many interesting pairs (f, ) and have smaller complexity. The class F () with log-concave densities. In many applications, we have a weight with additional properties and we assume the following: • The set ⊂ Rd is a convex body, that is a compact and convex set with non-empty interior. The probability = is the normalized Lebesgue measure on the set . • The functions f and are defined on . • The weight > 0 is log-concave, i.e., ( x + (1 − )y) (x) · (y)1− , where x, y ∈ and 0 < < 1. • The logarithm of is Lipschitz, i.e., | log (x) − log (y)| x − y2 .
678
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
Thus we consider the class of log-concave weights on ⊂ Rd given by R () = {| > 0, log is concave, | log (x) − log (y)| x − y2 }. We study the following class F () of problem elements,
F () = (f, )| ∈ R (), f 2, 1 ,
(6)
(7)
where · 2, is the L2 -norm with respect to the probability measure , see (4). In some places we restrict our study to the (Euclidean) unit ball, i.e., := B d ⊂ Rd . Remark 4. Let RC () be the class of weight functions that belong to FC (). Then R () ⊂ RC () if C = eD , where D is the diameter of . Thus large correspond to “exponentially large” values of C. However, the densities from the class R () have some extra (local) properties: they are log-concave and Lipschitz continuous. These properties can be used for the construction of fast adaptive methods, via rapidly mixing Markov chains. 3. Analysis for FC () We assume that is an arbitrary set and is a probability measure on , and that the functions f and are defined on . In the applications, the constant C might be very large, something like C = 1020 is a realistic assumption. Therefore we want to know how the complexity (the cost of optimal algorithms) depends on C. Observe that the problem is correctly normalized or scaled such that S(FC ()) = [−1, 1], for any C 1. We will prove that the complexity of the problem is linear in C, and hence there is no way to solve the problem if C is really huge. We start with establishing a lower bound and then show that simple Monte Carlo achieves this error up to a constant. 3.1. Lower bounds Here we prove lower bounds for all (adaptive or non-adaptive) methods that use n evaluations of f and . We use the technique of Bahvalov, i.e., we study the average error of deterministic algorithms with respect to certain discrete measures on FC (). Theorem 1. Assume that we can partition into 2n disjoint sets with equal measure (equal to 1/2n). Then for any Monte Carlo method Sn that uses n values of f and we have the lower bound ⎧ ⎪ ⎨ C, 2n C − 1, √ 1 2n e(Sn , FC ()) 2 (8) 3C ⎪ 6 ⎩ , 2n < C − 1. C + 2n − 1 The lower bound will be obtained in two steps. (1) We first reduce the error analysis for Monte Carlo sampling to the average case error analysis with respect to a certain prior probability on the class FC (). This approach is due to Bahvalov, see [4]. (2) For the chosen prior the average case analysis can be carried out explicitly and will thus yield a lower bound.
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
679
To construct the prior let m := 2n and 1 , . . . , m the partition into sets of equal probability, and
j the corresponding characteristic functions. Furthermore, let ⎧ m ⎨ , l := C−1 ⎩ 1
m C − 1, else.
Denote Jlm the set of all subsets of {1, . . . , m} of cardinality equal to l, and m,l the equi-distribution on Jlm , while Em,l denotes the expectation with respect to the prior m,l . Let (1 , . . . , m ) be independent and identically distributed with P (j = −1) = P (j = 1) = 21 , j = 1, . . . , m. The overall prior is the product probability on Jlm × {±1}m . For any realization = (I, 1 , . . . , m ) we assign f := j j and := C
j +
j . j ∈I
j ∈I
j ∈I
The following observation is useful. Lemma 1. For any subset N ⊂ {1, . . . , m} of cardinality at most n it holds l Em,l #(I \ N) . 2 Proof. Clearly, for any fixed k ∈ {1, . . . , m} we have m,l (k ∈ I ) = l/m, thus Em,l #(I \ N) =
Em,l I (r) = #(N c )
r∈N c
where we denoted by N c the complement of N.
l l , m 2
Proof Theorem 1. Given the above prior let us denote 1/2 avg en (FC ()) := inf Em,l E |S(f, ) − q(f, )|2 , q
(9)
where the inf is taken with respect to any (possibly adaptive) deterministic algorithm which uses at most n values from f and . For any Monte Carlo method Sn we have, using Bahvalov’s argument [4], the relation avg
e(Sn , FC ())en (FC ()).
(10)
avg
We provide a lower bound for en (FC ())2 . To this end note that for each realization (f , ) the integral d is constant. In the first case m C − 1, and we can bound the integral by the choice of l as 1 cm,l := (x) (dx) = (lC + (m − l)1) 3. (11) m In the other case m < C − 1, we obtain cm,1 = (C − 1 + m)/m. Now, to analyze the average case error, let qn be any (deterministic) method, and let us assume that it uses the set N of nodes.
680
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
We have the decomposition
⎛
S(f , ) − qn (f , ) = ⎝
C mcm,l
⎞
⎛
j ⎠ − ⎝
j ∈I \N
C mcm,l
j ∈I ∩N
⎞ j − qn (f , )⎠ .
Given I, the random variables in the brackets are conditionally independent, thus uncorrelated. Hence we conclude that 2 C 2 Em,l E S(f , ) − qn (f , ) Em,l E j mcm,l j ∈I \N =
C2 C2l E #(J \ N ) , m,l 2 2 m2 cm,l 2m2 cm,l
by Lemma 1. In the case m C − 1 we obtain l m/C and have cm,l 3, such that Em,l |S(f, ) − qn (f, )|2
C , 36n
which in turn yields the first case bound in (8). In the other case m < C − 1 the value of l = 1 yields the second bound in (8). 3.2. The error of the simple Monte Carlo method simple
from (3). We will prove The direct approach to evaluate (1) would be to use the method Sn an upper bound for the error of this method, and we start with the following: Lemma 2. If the function obeys the requirements in FC (), then (1) 0 < inf x∈ (x) supx∈ (x) < ∞. √ (2) For every probability measure on we have 2, C1, . Proof. To prove the first assertion, fix any y0 ∈ . Then the assumption on yields (x) C(y0 ), and reversing the roles of x and y also the lower bound. Now both, the assumption on as well as the second assertion, are invariant with respect to multiplication of by a constant. In the lightof the first assertion we may and do assume that 1 (x) C, x ∈ , and we derive, using 1 (x) (dx), that 2 2 (x) (dx)C (x) (dx)C (x) (dx) ,
completing the proof of the second assertion and of the lemma.
We turn to the bound for the simple Monte Carlo method. Theorem 2. For all n ∈ N we have simple , FC ())2 min e(Sn
1,
2C n
.
(12)
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
681
Proof. The upper bound 2 is trivial, it even holds deterministically. Fix any pair (f, ) of inmean put. For any sample (X1 , . . . , Xn ) and function g we denote √ the sample mean by Sn (g) := n mean 1/n j =1 g(Xj ). It is well known that e(Sn , g)g2 / n. With this notation we can bound Snmean (f ) Snmean (f ) Snmean (f ) simple + − mean (f, ) S(f, ) − S(f, ) − Sn Sn () (x)(dx) (x)(dx) 1 f (x)(x)(dx) − S mean (f ) n 1 mean Sn (f ) mean + mean (x)(dx) − Sn () Sn () 1 f (x)(x)(dx) − S mean (f ) n 1 +f ∞ (x)(dx) − Snmean () , where we used Snmean (f )/Snmean () f ∞ , which holds true since the enumerator and denominator use the same sample. This yields the following error bound: √ 2 mean simple e(Sn e(Sn , f ) + f ∞ e(Snmean , ) , (f, )) 1 √ √ √ 2 2 2f ∞ 2 2 2C √ , √ (f 2 + f ∞ 2 ) √ 1 1 n n n where we use Lemma 2. Taking the supremum over (f, ) ∈ FC () allows to complete the proof. 4. Analysis for F () In this section we impose restrictions on the input data, in particular on the density, in order to improve the complexity. This class is still large enough to contain many important situations. Monte Carlo methods for problems when the target (invariant) distribution is log-concave proved to be important in many studies, we refer to [10]. One of the main intrinsic features of such classes of distributions are isoperimetric inequalities, see [2,13], which will also be used here in the form as used in [29]. Recall that here we always require that ⊂ Rd is a convex body, as introduced in Section 2. We start with a lower bound for all non-adaptive algorithms to exhibit that simple Monte Carlo cannot take into account the additional structure of the underlying class of input data and adaptive methods should be used. This bound, together with Theorem 5, will show that adaptive methods can outperform any non-adaptive method, if we consider S on F (B d ). Indeed, we also show that specific Metropolis algorithms, based on local underlying Markov chains are suited for this problem class. 4.1. A lower bound for non-adaptive methods Here we prove a lower bound for all non-adaptive methods (hence in particular for the simple Monte Carlo method) for the problem on the classes F (). Again, this lower bound will use Bahvalov’s technique.
682
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
We start with a result on sphere packings. The Minkowski–Hlawka theorem, see [25], says that the density of the densest sphere packing in Rd is at least (d) · 21−d 21−d . It is also known, see [11], that the density (by definition of the whole Rd ) can be replaced by the density within a convex body , as long as the radius r of the spheres tends to zero. Hence we obtain the following result. Lemma 3. There is n ∈ N such that for all m n there are points y1 , . . . , ym ∈ such that with vol() 1/d r := r(, m) := 2−1 m−1/d vol(B d ) the closed balls Bi := B(yi , r) ⊂ are disjoint. Our construction will use such points y1 , . . . , ym ∈ and the corresponding balls B1 , . . . , Bm as follows. For i ∈ {1, . . . , m} we assign i (y) := ci exp (−y − yi 2 ) , fi (y) := c˜i Bi (y), y ∈ ,
y∈
and
with constants ci and c˜i chosen such that 1= i (y) dy = ci exp(−y − yi )dy 1 = fi 2,i = c˜i2 ci exp(−y − yi ) dy.
and
Bi
The corresponding values of the mapping S are computed as S(fi , i ) = fi i dy = c˜i ci exp(−y − yi ) dy
Bi
= ci =
1/2
exp(−y − yi ) dy Bi
B(0,r) exp(−y) dy
exp(−y
− yi ) dy
= ci
1/2 exp(−y) dy B(0,r)
1/2 .
(13)
Again we turn to the average case setting, this time with probability measure 2n being the equidistribution on the set
F 2n := i fi , i , i = 1, . . . , 2n, i = ±1 ⊂ F (). Similar to (10) we have for any non-adaptive Monte Carlo method Sn (f, ) the relation ! e(Sn , F ()) min eavg (qn , 2n ), qn is deterministic and non-adaptive , where eavg (qn , 2n ) denotes the average case error of the deterministic non-adaptive method qn with respect to the probability 2n . Thus let qn be any non-adaptive (deterministic) algorithm for S on the class F () that uses at most n values.
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
683
The average case error can then be bounded from below as 2 1 E S(i fi , i ) − qn (i fi , i ) 2n i=1 2 1 1 min E S(i fi , i ) min S(fi , i )2 . 2 i=1,...,2n 2 i=1,...,2n 2n
E2n |S(f, ) − qn (f, )|2 =
Above, E denotes the expectation with respect to the independent random variables i = ±1. Together with (13) we obtain 1/2 1√ B(0,r) exp(−y) dy e(Sn , F ()) 2 min . i=1,...,2n 2 exp(−y − yi ) dy We bound the enumerator from below and the denominator from above. For r log 2 we can bound 1 1 exp(−y) dy vol(B(0, r)) = r d vol(B d ). 2 2 B(0,r) For the denominator we have exp(−y − yi ) dy
exp(−y − yi ) dy
Rd
= −d
Rd
exp(−y) dy = −d (d) vol *B d ,
such that we finally obtain, using the well known formula vol(*B d ) = d vol(B d ), that e(Sn , F ())
1/2 1/2 1 √ d r d 1 d r d 2 = . 2 2d! 2 d!
Using the value for r = r(, 2n) from Lemma 3 we end up with Theorem 3. Assume that Sn is any non-adaptive Monte Carlo method for the class F (). Then, with n from Lemma 3, we have for all vol d 2n max n , (/log 4) · vol B d that
e(Sn , F ())2
−d/2−3/2
·
vol vol B d
1/2
d/2 · √ n−1/2 . d!
(14)
−1/2 Remark 5. For fixed d this is a lower bound of the form e(Sn ) c d/2 √ n −1 . It is interesting only if is “large”, otherwise the already mentioned lower bound (1 + n) is better.
We stress that in the above reasoning we essentially used the non-adaptivity of the method Sn . Indeed, if Sn were adaptive, then by just one appropriate function value (x), we could identify the index i, since the functions i are global. Then, knowing i, we could ask for the value of i and would obtain the exact solution to S(f, ) for this small class F 2n for all n2.
684
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
4.2. Metropolis method with local underlying walk The Metropolis algorithm we consider here has a specific routine Step in Fig. 1, whereas the final step Compute is exactly as given in (5). It is based on a specific ball walk and this version is sometimes called ball walk with Metropolis filter, see [29]. Two concepts from the theory of Markov chains turn out to be important, reversibility and uniform ergodicity. We recall these notions briefly, see [24] for further details. A Markov chain (K, ) is reversible with respect to , if for all measurable subsets A, B ⊂ the balance K(x, B) (dx) = K(x, A) (dx) (15) A
B
holds true. Notice that in this case necessarily is an invariant distribution. A Markov chain is uniformly ergodic if there are n0 ∈ N, a constant c > 0 and a probability measure on such that K n0 (x, A)c(A)
for all A ⊂ and x ∈ .
(16)
Markov chains which are uniformly ergodic have a unique invariant probability distribution. Our analysis will be based on conductance arguments and we recall the basic notions, see [12,16]. If (K, ) is a Markov chain with transition kernel K and invariant distribution then we assign the (1) local conductance at x ∈ by lK (x) := K(x, \ {x}), (2) and the conductance as c A K(x, A ) (dx) (K, ) := inf , 0< (A)<1 min { (A), (Ac )}
(17)
where Ac = \ A. Below we call l > 0 a lower bound for the local conductance, if lK (x) l for all x ∈ . The ball walk and some of its properties. Here we gather some properties of the ball walk, see [16,29], which will serve as ingredients for the analysis of Metropolis chains using this as the underlying proposal. In particular we prove that on convex bodies in Rd the ball walk is uniformly ergodic and we bound its conductance from below, in terms of bounds l > 0 for the local conductance. We abbreviate B(0, ) = B d . Let Q be the transition kernel of a local random walk having transitions within -balls of its current position, i.e., we let Q (x, {x}) := 1 − and
vol(B(x, ) ∩ ) , vol(B d )
⎧ ⎨ vol(B(x, ) ∩ A) , Q (x, A) := vol(B d ) ⎩ Q (x, A \ {x}) + Q (x, {x}),
(18)
A⊂
and
x∈ / A,
A⊂
and
x ∈ A.
Schematically, the transition kernel may be viewed as in Fig. 2.
(19)
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
685
Fig. 2. Schematic view of ball walk step.
Clearly we may restrict to D, the diameter of . The following observation is important and explains why we restrict ourselves to convex bodies.. Lemma 4. If ⊂ Rd is a convex body, then the ball walk Q has a (non-trivial) lower bound l > 0 for the local conductance. Proof. It is well-known that convex bodies satisfy the cone condition (see [9, Section 3.2, Lemma 3]). Therefore we obtain that for each > 0 there is l > 0 such that for each x ∈ we have lQ (x)l. Remark 6. Observe however, that l might be very small. For = [0, 1]d , for example, we get d l = 2−d √, even if is very small. In contrast, we will see that a large l is possible for = B and 1/ d + 1, see Lemma 7. Notice that lQ (x) = vol(B(x, ) ∩ )/vol(B d ), hence in the following we use the inequality: vol(B(x, ) ∩ )l vol(B d ),
(20)
where l > 0 is a lower bound for the local conductance of the ball walk. The following result is folklore, but for a lack of reference we sketch a proof. Proposition 1. The ball walk Q is reversible with respect to the uniform distribution and uniformly ergodic. The crucial tool for proving this is provided by the notion of small and petite sets, where we refer to [19, Sections 5.2 and 5.5] for details and properties. To this end we introduce a sampled chain, say (Q )a , where a is some probability a = (a0 , a1 , . . .) on {0, 1, 2, . . .} and (Q )a is j defined by (Q )a (x, C) := ∞ j =0 aj Q (x, C). We recall that a (measurable) subset C ⊂ is petite (for Q ), if there are a probability a and a probability measure on such that (Q )a (y, A)(A),
A ⊂ ,
y ∈ C.
(21)
A set C ⊂ is small, if the same property holds true for some Dirac probability a := n , such that obviously small sets are petite. We first show that certain balls are small. Lemma 5. The sets B(x, /2) ∩ , x ∈ are small for Q .
686
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
Proof. First, we note that y ∈ B(x, /2) implies B(x, /2) ⊂ B(y, ). Let l > 0 be a lower bound for the local conductance of Q/2 . Using (20) for Q/2 , we obtain for any set A ⊂ that vol(B(x, /2) ∩ A) vol(B(y, ) ∩ A) 2−d Q (y, A) Q (y, A \ {y}) = vol(B(y, )) vol(/2B d ) vol(A ∩ B(x, /2) ∩ ) l · 2−d . vol(B(x, /2) ∩ ) Hence estimate (21) holds true with n0 := 1, := l · 2−d and (A) :=
vol(A ∩ B(x, /2) ∩ ) , vol(B(x, /2) ∩ )
This completes the proof.
A ⊂ .
Proof Proposition 1. We first prove reversibility with respect to . Notice that it is enough to verify (15) for disjoint sets A, B ⊂ . Furthermore we observe that for any pair A, B ⊂ of measurable subsets the characteristic function of the set {(x, y) ∈ × ,
x ∈ A, y ∈ B, x − y }
can equivalently be rewritten as
B (y) B(y,)∩A (x) or A (x) B(x,)∩B (y). Hence, letting temporarily c := vol() vol(B d ) we obtain 1 Q (x, B) (dx) = vol(B(x, ) ∩ B) dx c A A 1
(x) B(x,)∩B (y) dy dx = c A 1 =
B (y) B(y,)∩A (x) dx dy = c
B
Q (y, A) (dy),
proving reversibility. By Lemma 5 each set B(x, /2) ∩ is small, thus also petite. Petiteness is inherited by taking finite unions. Since , being compact, can be covered by finitely many sets B(x, /2) ∩ , this implies that is petite. By [19, Theorem 16.2.2] this yields uniform ergodicity of the ball walk (see [19, Theorem 16.0.2(v)]). We mention the following conductance bound of the ball walk, which is a slight improvement of [29, Theorem 5.2]. This will be a special case of Theorem 4, below, and we omit the proof. Proposition 2. Let (Q , ) be the ball walk from above, and let (Q , ) be its conductance. Let D be the diameter of and let l be a lower bound for the local conductance. Then l2
(Q , ) . (22) √ 2 8D d + 1 The local conductance may be arbitrarily small if the domain has sharp corners. For specific sets we can explicitly provide lower bounds for the local conductance, and this will be used in the later convergence analysis. In the following we mainly discuss the case = B d .
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
687
We start with a technical result, related to the Gamma function on R+ . We use the well-known formula vol(B d ) = d/2 /(d/2 + 1).
(23)
Lemma 6. For any z > 0 we have (z + 1/2) √ z. (z)
(24)
Consequently, vol(B d−1 ) vol(B d )
d +1 . 2
(25)
Proof. By [8, Chapter VII, Eq. (11)] we know that the function z → log (z) is convex for z > 0. Thus we conclude log (z + 1/2) =
1 2 1 2
(log (z + 1) + log (z)) (log z + 2 log (z)) = log
√
z + log (z),
from which the proof of assertion (24) can be completed. Using the representation for the volume from (23) and applying the above bound with z := (d + 1)/2 we obtain vol(B d−1 ) d +1 (d/2 + 1) √ , d vol(B ) 2
((d + 1)/2) and the proof is complete.
Using Lemma 6, we can prove the following lower bound for the local conductance of the ball walk on B d . √ Lemma 7. Let (Q , ) be the local ball walk on B d ⊂ Rd . If 1/ d + 1, then its local conductance obeys l 0.3. Proof. The proof is based on some geometric reasoning. It is clear that the local conductance l(x) ", of is minimal for points x at the boundary of B d , and in this case its value equals the portion, say V the volume of B(x, ) inside B d . If H is the hyperplane at x to B d , then this cuts off B(x, ) exactly one half of its volume. Thus we let Z(h) be the cylinder with base being the (d − 1)-ball around x in the hyperplane H of radius . Its height h is the distance of H to the hyperplane determined by the intersection of B d ∩B(x, ). This height h is exactly determined from the quotient h/ = /2, " 1 − vol(Z(h))/vol(B(x, )) and by similarity, hence h := 2 /2. By construction we have V 2 we can lower bound the local conductance l(x) by l(x)
vol(Z(h)) 1 − . 2 vol(B(x, ))
We can evaluate vol(Z(h)) as vol(Z(h)) = hd−1 vol(B d−1 ), and we obtain vol(B d−1 ) 1 d+1 vol(B d−1 ) 1 1− . l(x) − = 2 2 vol(B d ) 2d vol(B d )
688
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
The bound (25) from Lemma 6 implies √ 1 d +1 l(x) 1− √ . 2 2 √ √ For 1/( d + 1) we get l(x) 21 (1 − 1/ 2 ) 0.3, completing the proof.
We close this subsection with the following technical lemma, which can be extracted from the unpublished seminar note [28]. For the convenience of the reader we present its proof. In addition we will slightly improve the statement. Lemma 8. Let l > 0 be a lower bound for the local conductance of the ball walk (Q , ). For any 0 < t < l and any set A ⊂ with related sets l−t c A1 := x ∈ A, Q (x, A ) < ⊂ A, (26) 2 l−t A2 := y ∈ Ac , Q (y, A) < (27) ⊂ Ac , 2 √ we have d(A1 , A2 ) > t 2 / (d + 1). For its proof we need the following: √ Lemma 9. Let > 0. If x, y ∈ Rd are two points with distance t 2 / (d + 1) at most, then vol(B(x, ) ∩ B(y, ))(1 − t) vol(B d ).
(28)
Proof. Let u := x − y2 . If u < then the volume of the intersection of B(x, ) and B(y, ) is exactly the same as the volume of the ball B d minus the volume of the middle slice with distance u as thickness. The volume of this slice is bounded from above by the volume of the cylinder with base B d−1 and thickness u. Thus we obtain vol(B d−1 ) vol(B(x, ) ∩ B(y, ))vol(B d ) − u vol(B d−1 ) = vol(B d ) 1 − u . vol(B d ) Applying Lemma 6 we obtain vol(B d−1 ) vol(B d−1 ) 1 d + 1 = , vol(B d ) vol(B d ) 2 √ √ thus by the choice of u 2 t/ d + 1 we conclude that √ √ 2 t d + 1 vol(B d−1 ) √ √ u t, vol(B d ) 2 d + 1 and the proof is complete.
We turn to the Proof √ of Lemma 8. Let x ∈ A1 and y ∈ A2 be in , and suppose that their distance is at most t 2 / (d + 1). Simple set theoretic reasoning shows that vol(B(x, ) ∩ B(y, ) ∩ ) vol(B(x, ) ∩ ) − vol(B(x, ) \ B(y, )) vol(B(x, ) ∩ ) − vol(B(x, ) \ (B(x, ) ∩ B(y, ))) = vol(B(x, ) ∩ ) − vol(B d ) + vol(B(x, ) ∩ B(y, )).
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
689
Since l is a lower bound for the conductance l(x) we have that vol(B(x, ) ∩ ) l vol(B(x, )) = l vol(B d ). Taking this into account and using (28) we end up with vol(B(x, ) ∩ B(y, ) ∩ ) l vol(B d ) − vol(B d ) + (1 − t) vol(B d ) = (l − t) vol(B d ). In probabilistic terms this rewrites as Q (x, B(x, ) ∩ B(y, ) ∩ )l − t, and similarly Q (y, B(x, ) ∩ B(y, ) ∩ ) l − t. Now, if A ⊂ is any measurable subset with complement Ac then for x ∈ A and y ∈ Ac we obtain # B(x, ) ∩ B(y, ) ∩ ⊂ B(x, ) ∩ Ac ∩ (B(y, ) ∩ A ∩ ) , which in turn yields Q (x, Ac ) + Q (y, A) l − t, but this contradicts the definition of the sets A1√and A2 . Hence any two points from A1 and A2 , respectively, must have distance larger than t 2 / (d + 1), and the proof is complete. Properties of the related Metropolis method. We analyze Metropolis Markov chains which are based on the ball walk, introduced above, for some appropriately chosen . As it will turn out, the related Metropolis chains are perturbations of the underlying ball walk, and its properties, as established in Propositions 1 and 2 extend in a natural way. For ∈ R () we define the acceptance probabilities as (y) (x, y) := min 1, . (29) (x) The corresponding Metropolis kernel is given by K, (x, dy) := (x, y)Q (x, dy) + (1 − (x, y)Q (x, dy))x (dy). Note that for x ∈ / A we obtain K, (x, A) = (x, y)Q (x, dy) = A
1 vol(B d )
(30)
A∩B(x,)
(x, y) dy.
Below we sketch a single Metropolis Step from the present position x ∈ with kernel K, (x, ·) (Fig. 3). The procedure Ball-walk-step was described in Fig. 2. We start with the following observation. Lemma 10. Let be the Lipschitz constant in R () and := exp(−). Uniformly for ∈ R () the following bound for the related Metropolis chain holds true: K, (x, dy)Q (x, dy).
(31)
Proof. Let A ⊂ . If dist(x, A) > then there is nothing to prove. Otherwise, for y ∈ A∩B(x, ) we find from (6) and (29) that (x, y) exp(−x − y2 ) e− = .
690
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
Fig. 3. Schematic view of the Metropolis step. Note that the Acceptance step results in an acceptance probability of (x, y) = min {1, (y)/(x)}.
By definition of the transition kernel K, from (30) we can use to bound K, (x, A) min {(x, y), y ∈ A ∩ B(x, )} Q (x, A) Q (x, A). The proof is complete.
The assertion of Proposition 1 extends to the family of Metropolis chains as follows. Proposition 3 (cf. Mathé [18, Proposition 1]). Let Q be the ball walk from (19) on . For each ∈ R () and D the corresponding Metropolis chains from (30) are uniformly ergodic and reversible with respect to the related . Proof. Reversibility with respect to is clear by the choice of the function . To prove uniform ergodicity, let be from Lemma 10 and c from (16). As established in Lemma 10 we have K, (x, dy)Q (x, dy). It is easy to see, and was established in [18, Proof of Theorem 2], that this extends to all iterates as Kn, (x, dy)n Qn (x, dy). Recall that under the assumptions made, the ball walk is uniformly ergodic, and from Proposition 1 we obtain n0 such that for all x ∈ we have n
A ⊂ ,
K,0 (x, A)n0 c(A), proving uniform ergodicity.
(32)
Remark 7. Notice that (32) is obtained with right-hand side uniformly for all ∈ R (), a fact which will prove useful later. Finally we prove lower bounds for the conductance of the Metropolis chains. Theorem 4. Let (K, , ) be the Metropolis chain based on the local ball walk (Q , ) and let (K, , ) be its conductance, where ∈ R (). Let l be a lower bound for the local conductance of Q . For ∈ R () we have
le− l min (K, , ) ,1 , (33) √ 8 2D d +1 where D is the diameter of .
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
691
Remark 8. As mentioned above, Proposition 2 is a special case of Theorem 4 for = 0. The proof of Theorem 4 will be based on Lemma 8 for the underlying ball walk, specifying t := l/2. This extends to the Metropolis walk as follows. Lemma 11. Let from (6) and l be the local conductance of the ball walk. We let := exp(−). For A ⊂ we assign l T1 := x ∈ A, K, (x, Ac ) < ⊂ A, (34) 4 l T2 := y ∈ Ac , K, (y, A) < ⊂ Ac . (35) 4 √ Then d(T1 , T2 ) > l / (2d + 2). Proof. It is enough to prove T1 ⊂ A1 and T2 ⊂ A2 . If x ∈ T1 then Lemma 10 implies K, (x, Ac ) < l/4, hence 1 l Q (x, Ac ) K, (x, Ac ) . 4 The other inclusion is proved similarly.
We turn to the Proof of Theorem 4. Let A ⊂ be the set for which the conductance is attained. We assign sets T1 and T2 as in Lemma 11 and distinguish two cases. If (T1 ) < (A)/2 or (T2 ) < (Ac )/2, then the estimate (33) follows easily. For instance, if (T1 ) < (A)/2 then c K, (x, A ) (dx) K, (x, Ac ) (dx) A
A\T1
l l l (A \ T1 ) (A) min (A), (Ac ) , 4 8 8
thus (K, , )l/8 in this case, which proves (33). Otherwise we have (T1 ) (A)/2 and (T2 ) (Ac )/2. In this case we apply an isoperimetric inequality, see [29, Theorem 4.2] to the triple (T1 , T2 , T3 ) with T3 := \ (T1 ∪ T2 ) to conclude that (T3 )
2d(T1 , T2 ) min (T1 ), (T2 ) , D
(36)
hence under the size constraints in this case it holds true that (T3 )
d(T1 , T2 ) min (A), (Ac ) . D
Using the reversibility of the Metropolis chain (K, , ) we have K, (x, Ac ) (dx) = K, (y, A) (dy), A
Ac
(37)
692
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
which implies 1 K, (x, Ac ) (dx) = K, (x, Ac ) (dx) + K, (y, A) (dy) 2 A Ac A 1 c K, (x, A ) (dx) + K, (y, A) (dy) 2 Ac ∩T3 A∩T3 1 l l (A ∩ T3 ) + (Ac ∩ T3 ) 2 4 4 l l = (A ∩ T3 ) + (Ac ∩ T3 ) = (T3 ). 8 8 √ Since by Lemma 11 we can bound d(T1 , T2 ) l / (2d + 2) we use (37) to complete the proof. If we restrict ourselves to Metropolis chains on B d , then Lemma 7 provides a lower bound for the local conductance which is independent of the dimension d. As a simple consequence of Theorem 4 we then obtain the following: Corollary 1. Assume that ∈ R (B d ) and (d + 1)−1/2 . Then we obtain
9 (K, , ) e− . √ 2 1600 d + 1
√ To maximize we define ∗ = min 1/ d + 1, 1/ and obtain 1 1 1 ∗ (K, , )0.0025 √ . min √ , d +1 d +1 Error bounds. For the class F () the above lower conductance bound (33) will yield an error estimate for the problem (2). Let Sn be the estimator based on a sample of the local Metropolis Markov chain with transition K, , starting at zero. To estimate its error we combine the estimates of the conductance of K, with two results, partially known from the literature. To formulate the results we note the following. The Markov kernel K, is reversible with respect to and hence induces a self-adjoint operator K, : L2 (, ) → L2 (, ). The spectrum (K, ) is contained in [−1, 1] and 1 ∈ (K, ) and we are interested in the second largest eigenvalue , := sup{ ∈ (K, )| = 1} of K, . This is motivated by the extension of a result from [18, Corollary 1] about the worst case error of Sn , uniformly for (f, ) ∈ F (). Lemma 12. lim
sup
n→∞ (f,)∈F ()
e(Sn , (f, ))2 · n =
sup
∈R ()
1 + , 1 − ,
.
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
693
The proof is given in the Appendix. For Markov chains which start according to the invariant distribution the bound is similar, but more explicit and was given in [26] and [16, Theorem 1.9]. The relation of the second largest eigenvalue , to the conductance is given in Lemma 13 (Cheeger’s Inequality, see [12,15,16]). , := 1 − , 2 (K, , )/2. We are ready to state our main result for the Metropolis algorithm Sn , based on the Markov chain K, , for the class F (B d ), i.e., when ⊂ Rd is the Euclidean unit ball. Theorem 5. Let Sn = 1/n nj=1 f (Xj ) be the estimator based on a sample (X1 , . . . , Xn ) of the local Metropolis Markov chain with transition K, , where (d + 1)−1/2 . Then lim
n→∞
sup (f,)∈F (B d )
e(Sn , (f, ))2 · n
8 · 16002 e2 (d + 1) · 2 . 81
(38)
Again we may choose ∗ = min (d + 1)−1/2 , −1 and obtain lim
n→∞
sup (f,)∈F (B d )
! ∗ e(Sn , (f, ))2 · n 594700 · (d + 1) max d + 1, 2 .
Proof. This follows from Corollary 1, and Lemmas 12 and 13.
(39)
5. Summary Let us discuss our findings. The results from Section 3 clearly indicate that the superiority of Metropolis algorithms upon simpler (non-adaptive) Monte Carlo methods does not hold in general. Specifically, it does not hold for the large classes FC () of input without additional structure. On the other hand, for the class F (B d ), specific Metropolis algorithms that are based on local underlying walks are superior to all non-adaptive methods. Even more, on B d the cost of the ∗ algorithm Sn , roughly given by the number n of evaluations of and f, increases like a polynomial ∗ in d and . More precisely, according to the asymptotic constant limn→∞ e(Sn , F (B d ))2 · n
(39), is bounded by a constant times max d 2 , d2 , i.e., the complexity grows polynomially in d and and, for fixed d, increases (at most) as 2 . If we only allow non-adaptive methods then this asymptotic constant, again for fixed d, increases at least as d , see (14). We believe that this problem is tractable in the sense that the number of function values to achieve an error can be bounded by n(, F (B d ))C−2 d max(d, 2 ).
(40)
We did not prove (40), however, since Theorem 5 is only a statement for large n. Notice that according to Theorem 5 the size ∗ of the underlying balls walk needs to be adjusted both to the spatial dimension d and the Lipschitz constant .
694
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
The analysis of the Metropolis algorithm is based on properties of the underlying ball walk; in particular we establish uniform ergodicity of the ball walk for convex bodies ⊂ Rd . Also, based on conductance arguments, we provide lower bounds for the spectral gap of the ball walk. As a consequence, in the case = 0 the estimate (38) provides an error bound for the ball walk (Q , ), which is asymptotically of the form e(Sn , L2 (B d , )) C−1 (d/n)1/2 . The results extend in a similar way to any family d ⊂ Rd for which the underlying local ball walk Q has (for d ) a non-trivial lower bound for the local conductance that is independent of the dimension. Finally, from the results of Section 3 we can conclude that adaption does not help much for the classes FC (). Hence we have new results concerning the power of adaption, see [22] for a survey of earlier results, in particular that it may help to break the curse of dimensionality for the classes F (B d ). Acknowledgment We thank two anonymous referees and Daniel Rudolf for their comments. Appendix A. Proof of Lemma 12 Lemma 12 extends the bound from [18, Theorem 1], which deals with a single uniformly ergodic chain. It was obtained from on a contraction property, as stated in [18, Proposition 1]. The goal of the present analysis is to establish this asymptotic result uniformly for all Metropolis chains with density from R (), by showing that this contractivity holds true uniformly. Contractivity of the Markov operator. We assign to each transition kernel K on with corresponding invariant distribution the bounded linear mapping P, given by (Pf )(x) := f (y)K(x, dy). (41) Also we let E denote the mapping which assigns any integrable function its expectation as a constant function E(f ): = f (x)(dx). For each K the mapping P −E is bounded in L∞ (, ), with norm less than or equal to one and we shall strengthen this uniformly for kernels K, with ∈ R (). Within this operator context uniform ergodicity is equivalent to a specific form of quasi-compactness, namely there are 0 < < 1 and n0 ∈ N for which P n − E: L∞ () → L∞ ()
for nn0 .
(42)
We first show that reversibility allows to transfer this to the spaces L1 (, ). Lemma 14. Suppose that the transition kernel K with corresponding mapping P is reversible. Then for all n ∈ N we have P n − E: L1 (, ) → L1 (, ) P n − E: L∞ (, ) → L∞ (, ).
(43)
Proof. If K is reversible, then so are all iterates K n . Thus for arbitrary functions f ∈ L1 (, ) and h ∈ L∞ (, ) we have, using the scalar product on L2 (, ), that (P n − E)f, h = f, (P n − E)h.
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
Consequently, for any f ∈ L1 (, ) we have (P n − E)f 1 = sup (P n − E)f, h = h∞ 1
f 1
695
sup f, (P n − E)h
h∞ 1
sup (P − E)h∞ , n
h∞ 1
from which the proof can be completed.
Proposition 4. For any convex body ⊂ Rd there are an integer n0 and a constant 0 < < 1 such that uniformly for ∈ R () we have n
P,0 − E: L1 (, ) → L1 (, ) .
(44)
Proof. This is an immediate consequence of the bound (32). As mentioned in Remark 7 uniform ergodicity was established uniformly for ∈ R (). It is well known (see [19, Theorem 16.2.4]) that this implies that there is an < 1 such that uniformly for ∈ R () we have n
P,0 − E: L∞ () → L∞ () In the light of Lemma 14 this yields (44).
for n n0 .
(45)
Finally we sketch the Proof of Lemma 12. Using Proposition 4 we can extend the proof of [18, Theorem 1]. In particular, the bounds from Eqs. (13)–(15) in [18] tend to zero uniformly for ∈ R (). Moreover, starting at zero, after one step according to the underlying ball walk, the (new) initial distribution is uniformly bounded with respect to the uniform distribution on , hence also with respect to , such that we establish the asymptotics in Lemma 12. References [1] C. Andrieu, É. Moulines, On the ergodicity properties of some adaptive MCMC algorithms, Ann. Appl. Probab. 16 (3) (2006) 1462–1505. [2] D. Applegate, R. Kannan, Sampling and integration of near log-concave functions, in: STOC ’91: Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, ACM Press, New York, NY, USA, 1991, pp. 156–163. [3] Y.F. Atchadé, J.S. Rosenthal, On adaptive Markov chain Monte Carlo algorithms, Bernoulli 11 (5) (2005) 815–828. [4] N.S. Bahvalov, Approximate computation of multiple integrals, Vestnik Moskov. Univ. Ser. Mat. Meh. Astr. Fiz. Him. 1959 (4) (1959) 3–18. [5] F. Bassetti, P. Diaconis, Examples comparing importance sampling and the Metropolis algorithm, Illinois J. Math. 50 (2006) 67–91. [6] I. Beichl, F. Sullivan, The Metropolis algorithm, Comput. Sci. Eng. 2 (1) (2000) 65–69. [7] I. Beichl, F. Sullivan, Guest editors’ introduction: Monte Carlo methods, Comput. Sci. Eng. 8 (2) (2006) 7–8. [8] N. Bourbaki, Functions of a real variable, Elements of Mathematics (Berlin), Springer, Berlin, 2004. [9] V.I. Burenkov, Sobolev Spaces on Domains, Teubner-Texte zur Mathematik, vol. 137, Teubner Verlag Stuttgart, 1998. [10] A. Frieze, R. Kannan, N. Polson, Sampling from log-concave distributions, Ann. Appl. Probab. 4 (3) (1994) 812– 837. [11] E. Hlawka, Ausfüllung und Überdeckung konvexer Körper durch konvexe Körper, Monatsh. Math. Phys. 53 (1949) 81–131. [12] M. Jerrum, A. Sinclair, Approximating the permanent, SIAM J. Comput. 18 (6) (1989) 1149–1178. [13] R. Kannan, L. Lovász, M. Simonovits, Isoperimetric problems for convex bodies and a localization lemma, Discrete Comput. Geom. 13 (3–4) (1995) 541–559.
696
P. Mathé, E. Novak / Journal of Complexity 23 (2007) 673 – 696
[14] U. Krengel, Ergodic theorems, de Gruyter Studies in Mathematics, vol. 6, Walter de Gruyter & Co., Berlin, 1985. [15] G.F. Lawler, A.D. Sokal, Bounds on the L2 spectrum for Markov chains and Markov processes: a generalization of Cheeger’s inequality, Trans. Amer. Math. Soc. 309 (2) (1988) 557–580. [16] L. Lovász, M. Simonovits, Random walks in a convex body and an improved volume algorithm, Random Structures Algorithms 4 (4) (1993) 359–412. [17] P. Mathé, The optimal error of Monte Carlo integration, J. Complexity 11 (4) (1995) 394–415. [18] P. Mathé, Numerical integration using Markov chains, Monte Carlo Methods Appl. 5 (4) (1999) 325–343. [19] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability, Springer, London, 1993. [20] E. Novak, Deterministic and stochastic error bounds in numerical analysis, Lecture Notes in Mathematics, vol. 1349, Springer, Berlin, 1988. [21] E. Novak, The real number model in numerical analysis, J. Complexity 11 (1) (1995) 57–73. [22] E. Novak, On the power of adaption, J. Complexity 12 (3) (1996) 199–237. [23] D. Randall, Rapidly mixing Markov chains with applications in computer science and physics, Comput. Sci. Eng. 8 (2) (2006) 30–41. [24] G.O. Roberts, R.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 (1) (1996) 95–110. [25] C.A. Rogers, Packing and covering, Cambridge Tracts in Mathematics and Mathematical Physics, No. 54, Cambridge University Press, New York, 1964. [26] A. Sokal, Monte Carlo methods in statistical mechanics: foundations and new algorithms, in: Functional integration (Cargèse, 1996), Plenum, New York, 1997, pp. 131–192 [27] J.F. Traub, G.W. Wasilkowski, H. Wo´zniakowski, Information-based complexity, Academic Press Inc., Boston, MA, 1988 with contributions by A.G. Werschulz and T. Boult. [28] S. Vempala, Lecture 17, Random walks and polynomial time algorithms, http://www-math.mit.edu/ ˜vempala/random/course.html, 2002. [29] S. Vempala, Geometric random walks: a survey, Combinatorial and computational geometry, Math. Sci. Res. Inst. Publ., vol. 52, Cambridge University Press, Cambridge, 2005, pp. 577–616.
Journal of Complexity 23 (2007) 697 – 714 www.elsevier.com/locate/jco
Tensor-product approximation to operators and functions in high dimensions Wolfgang Hackbusch, Boris N. Khoromskij∗ Max-Planck-Institute for Mathematics in the Sciences, Inselstr. 22-26, D-04103 Leipzig, Germany Received 13 December 2006; accepted 14 March 2007 Available online 6 April 2007 Dedicated to Henryk Wozniakowski on the occasion of this 60th birthday
Abstract In recent papers tensor-product structured Nyström and Galerkin-type approximations of certain multidimensional integral operators have been introduced and analysed. In the present paper, we focus on the analysis of the collocation-type schemes with respect to the tensor-product basis in a high spatial dimension d. Approximations up to an accuracy O(N −/d ) are proven to have the storage complexity O(dN 1/d logq N ) with q independent of d, where N is the discrete problem size. In particular, we apply the theory to a collocation 1 , x, y ∈ Rd , d 3. Numerical illustrations are discretisation of the Newton potential with the kernel |x−y| given in the case of d = 3. © 2007 Published by Elsevier Inc. MSC: 65F50; 65F30; 46B28; 47A80
1. Introduction The construction of efficient representations to multi-variate functions and related operators plays a crucial role in the numerical analysis of higher dimensional problems arising in a wide range of modern applications. For example we mention multi-dimensional integral equations, elliptic and parabolic boundary value problems posed in Rd , d 2. In multi-dimensional applications, standard numerical methods usually fail due to the so-called “curse of dimensionality’’ (Bellman). This effect can be relaxed or completely avoided by a systematic application of Kronecker-type tensor-product representations of the arising high-order
∗ Corresponding author.
E-mail addresses: [email protected] (W. Hackbusch), [email protected] (B.N. Khoromskij). 0885-064X/$ - see front matter © 2007 Published by Elsevier Inc. doi:10.1016/j.jco.2007.03.007
698
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
tensors. Algebraic methods for tensor-product approximations to high-order tensors have been extensively discussed in the literature (see [25,4,5,16,21,27] and related references). In recent papers, modern methods of structured tensor-product approximations to some classes of multi-dimensional integral operators and operator-valued functions have been applied successfully (see [1,14,10,2,12,13,17,19,22] and references therein). Approximations via the Nyström and Galerkin methods have been considered in [14,13,19]. Applications to nonlocal operators associated with the density matrix ansatz for solving the Hartree–Fock equation [7,2], computation of molecular density functions by the Ornstein–Zernicke equation [6], as well as collision integrals of the deterministic Boltzmann equation [18] have demonstrated the efficiency of low-rank tensor-product decompositions. In the present paper, we discuss analytic methods for tensor-product approximations to multidimensional integral operators. For the case of collocation schemes we focus on the construction of tensor decompositions which are exponentially convergent in the separation rank. It is worthwhile to note that on the one hand, collocation schemes can be applied to much more general class of integral operators than the Nyström methods (including kernels with the diagonal singularity), on the other hand, they are much simpler than the Galerkin methods (requiring only a one-fold integration). Approximations up to the accuracy O(n− ) are proven to have the storage complexity O(dn logq n) with q independent of d, where N = nd is the discrete problem size (compare with the linear complexity O(nd )). For example, such methods can be applied to the classical Newton, −|x−y| |x−y|) 1 , e |x−y| and cos(|x−y| with x, y ∈ Rd . Yukawa and Helmholtz kernels |x−y| The rest of the paper is organised as follows. In Section 2 analytic methods for the separable approximation via collocation schemes of multi-variate functions and related tensors are presented and analysed. We describe constructive schemes via sinc-quadrature and sincinterpolation methods. In Section 3 we apply the results of Section 2 to integral operators in Rd in the collocation case. We complete the article with some numerical examples illustrating the efficiency of the low tensor-rank approximation of Newton’s potential via optimised sinc-quadratures. 2. Separable approximation of functions and tensors 2.1. Approximation of functions with low separation rank We start the discussion on the level of functions. In many applications we are interested in approximating a multi-variate function f = f (x1 , . . . , xd ) (from a certain class H) in the set of separable functions M1 = {u : u(x) = 1 (x1 ) · . . . · d (xd ),
k ∈ H },
(2.1)
where H is a real, separable Hilbert space of functions defined on R (say, H = L2 (R)). A better approximation can be obtained by allowing for a linear combination of separable products in the approximation set, Mr = u : u(x) =
k
(1) (d) bk k1 (x1 ) · . . . · kd (xd ),
bk ∈
() R, k
∈H ,
(2.2)
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
699
where the sum is taken over multi-indices k = (k1 , . . . , kd ) with 1 k r , r ∈ N, and r = (r1 , . . . , rd ). We call the coefficients B = {bk } ∈ Rr1 ×···×rd
(2.3) ()
the core tensor. Without loss of generality we can assume that the components k ( = 1, . . . , d) are orthonormal, i.e., ()
(k , () m ) = k ,m ,
k , m = 1, . . . , r ,
where k ,m is Kronecker’s delta. Approximations in the set r (1) (d) Mr = u : u(x) = bk k (x1 ) · . . . · k (xd ),
bk ∈ R,
() k
∈H
⊂ Mr ,
(2.4)
k=1 ()
with normalised components k = 1 can be considered. This is the special case of the approximation problem in Mr with r = (r, . . . , r), under the constraint that all off-diagonal elements of the coefficient tensor B = {bk } are zero. Since Mr is not a linear space, we obtain a difficult nonlinear approximation problem when we want to estimate (f, S) := inf f − s s∈S
(2.5)
for f ∈ H, where either S = Mr or S = Mr . 2.1.1. Approximation in S = Mr For S = Mr , the approximation problem (2.5) can be considered in the framework of best r-term approximation with regard to a redundant dictionary (cf. [24]). A system D of functions from H is called a dictionary, if each g ∈ D has norm one and its linear span is dense in H. We denote by r (D) the collection of all functions in H which can be written in the form: s= cg g, ⊂ D, # r, g∈
with cg ∈ R and r ∈ N. For f ∈ H, the best r-term approximation error is defined by r (f, D) :=
inf
s∈r (D )
f − s.
Let H be a real separable Hilbert space. A simple algorithm that inductively computes an estimate to the best r-term approximation is known as the so-called Pure Greedy Algorithm (see [24] and respective references). Let g = g(f ) ∈ D be an element from D maximising |(f, g)|. We define G(f ) := (f, g)g,
R(f ) := f − G(f ).
Now the Pure Greedy Algorithm reads as follows: define R0 (f ) := f and G0 (f ) := 0. Then, for all 1 mr, define Gm (f ) := Gm−1 (f ) + G(Rm−1 (f )), Rm (f ) := f − Gm (f ) = R(Rm−1 (f )) inductively. The output Gr (f, D) of this algorithm is proven to realise the best r-term approximation in the particular case when D is an orthogonal basis of H.
700
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
For the approximation problem on Mr we set D := {g ∈ H ∩ M1 : g = 1}
and hence r (D) = Mr .
The Pure Greedy Algorithm can be applied to functions characterised via the approximation property r (f, D)r −q ,
r = 1, 2, . . . ,
with some q ∈ (0, 21 ], and leads to the error bound (cf. [24]) f − Gr (f, D) C(q, D)r −q ,
r = 1, 2, . . . ,
which is “too pessimistic’’ in our applications. More precisely, we are interested in an efficient r-term approximation on a class of analytic functions with point singularities. In this case, under certain assumptions, we are able to prove exponential convergence r (f, D)C exp(−r q ),
r = 1, 2, . . . ,
with q = 1 or 21 . Since, in general, the Pure Greedy Algorithm fails to recover exponential convergence, we will discuss more special numerical methods to estimate r (f, D) for this special class of analytic functions. Specifically, we consider quadrature- and interpolation-based approaches. 2.1.2. Approximation in S = Mr () Notice that the coefficients bk and the “single-component’’ functions k in (2.2) are not uniquely defined (up to orthogonal transforms). However, this does not pose any problems from the computational point of view since the minimisation problem (2.5) is equivalent to the dual maximisation problem on V , = 1, . . . , d, which does not include bk . Assume that there exists a minimiser of the problem (2.5). Then, for given orthonormal com() ponents () = (1 , . . . , () r ) ( = 1, . . . , d), the coefficient tensor bk minimising (2.5) is represented by (1) (d) (2.6) bk = f, k1 (·) · . . . · kd (·) , k = (k1 , . . . , kd ). For given f ∈ H, the minimisation problem (2.5) with S = Mr is equivalent to the maximisation problem 2 (1) (d) (f ; Mr ) := sup f (x1 , . . . , xd )k1 (x1 ) · . . . · kd (xd ) , () k
()
where () , = 1, . . . , d, is taken from the set of r -tuples () = (1 , . . . , () r ) with orthonormal components. (1) (d) In fact, let f(r) = k bk k1 (x1 ) · . . . · kd (xd ) be the solution of problem (2.5). Then we obtain the identity f(r) = BF ,
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
701
since orthonormal components do not effect the L2 -norm. Now, with fixed components () ( = 1, . . . , d), relation (2.5) is actually a linear least-squares problem with respect to bk ,
(1) (d) (f, f ) − 2 f, bk k1 (x1 ) · . . . · kd (xd ) + (B, B) → min . k
Solving the corresponding Lagrange equation
(1) (d) − f, bk k1 (x1 ) · . . . · kd (xd ) + (B, B) = 0
for all B ∈ Rr1 ×···×rd ,
k
implies (2.6). Now we obtain f − f(r) 2 = f 2 − B2F , and substitution of (2.6) proves the assertion. 2.2. Tucker and canonical tensor decompositions Higher-order tensors (multi-dimensional arrays) appear in numerical computations as the discrete analogue of multi-variate functions. We consider dth order tensors A = [ai1 ,...,id ](i1 ,...,id )∈I ∈ RI defined on the product index set I = I1 × · · · × Id . It is a generalisation of vectors (tensors of √ order 1) and matrices (tensors of order 2). We use the Frobenius norm A := A, A induced by the inner product A, B := ai1 ,...,id bi1 ,...,id with A, B ∈ RI , (2.7) (i1 ,...,id )∈I
which corresponds to the Euclidean norm of a vector. Below we will discuss tensor-product approximations which can be viewed as an analogue to low-rank approximations of matrices, where a large system matrix is replaced by a low-rank matrix (compare the classical approximation of integral operators using degenerate kernels). The class of rank-1 tensors is a discrete analogue of the class of separable functions M1 . In the following, we use the notation ⊗ to represent the canonical (rank-1) tensor U ≡ {ui }i∈I = b · U (1) ⊗ · · · ⊗ U (d) ∈ RI , (1)
(d)
()
defined by ui1 ,...,id = b · ui1 · · · uid with U () ≡ {ui }i ∈I ∈ RI and with a multi-index i := (i1 , . . . , id ) ∈ I. The discrete analogue of the approximation in Mr given by (2.2) is called the Tucker representation which deals with the approximation A(r) =
r1 k1 =1
···
rd kd =1
(1)
(d)
bk1 ,...,kd · Vk1 ⊗ · · · ⊗ Vkd ≈ A, ()
(2.8)
where the Kronecker factors Vk ∈ RI (k = 1, . . . , r , = 1, . . . , d) are real vectors of the respective size n = |I |. Without loss of generality, we assume that for all the vectors () {Vk : k = 1, . . . , r } are orthonormal. In the following, we denote by Tr the set of tensors represented by (2.8). Conventionally, we use the short notations r = (r1 , . . . , rd ) (Tucker rank)
702
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
and B = {bk } ∈ Rr1 ×···×rd (core tensor). Notice that the representation of elements A ∈ Tr even with orthogonal V() is not unique due to the rotational uncertainty in the core tensor B. The canonical representation is defined by A(r) =
r
(1)
bk · V k
(d)
⊗ · · · ⊗ Vk ,
bk ∈ R,
(2.9)
k=1 ()
where the Kronecker factors Vk ∈ RI are normalised vectors (in chemometrics literature it is often called CANDECOMP/PARAFAC, or shortly CP model). The minimal number r in the representation (2.9) is called the Kronecker rank of A(r) . We denote by Cr the set of tensors represented by (2.9). If we let r = r , n = n ( = 1, . . . , d), then both the CP and Tucker representations require only drn numbers to represent the canonical components plus r (resp. r d ) memory units for the core tensor B. The main computational problem is the approximation of a given higher-order tensor A0 in a certain set of structured low-rank tensors S. In particular, S may be one of the classes Tr or Cr . There are algebraic, analytically-based and combined strategies for computing a Kronecker tensor-product decomposition of a higher-order tensor. In this paper we apply analytically-based representation methods, which are efficient for a special class of function-related operators/tensors (see definitions and examples in §3). In the context of integral operators, we consider the representation problem for a class of realvalued square matrices related to discrete multi-dimensional operators posed in Rd , such that A ∈ RN×N , N = nd . More precisely, let A ∈ RI ×I with #I = N be a real-valued matrix defined on the index set I := In × · · · × In (d factors) with In = {1, . . . , n}. A matrix A (resp. a vector X) can also be regarded as a dth order tensor A ∈ RI1 ×···×Id (resp. X ∈ RI1 ×···×Id ). Hence one needs numerically tractable data-sparse representations of the arising high-dimensional tensors. We recall that the Kronecker product of matrices A ⊗ B is defined as a block matrix [aij B], provided that A = [aij ]. The operation “⊗’’ can be applied to arbitrary rectangular matrices (in particular, to row or column vectors) and in the multi-factor version as in (2.11). The general rank-(r1 , . . . , rd ) Tucker-type matrix decomposition uses the tensor-product matrix format 2
A=
r1
···
k1 =1
rd kd =1
bk1 ,...,kd Vk1 ⊗ · · · ⊗ Vkd ∈ RI1 ×···×Id , (1)
(d)
2
2
bk1 ,...,kd ∈ R,
2
(2.10)
where the Kronecker factors Vk ∈ RI ×I , k = 1, . . . , r , = 1, . . . , d, may be matrices of a certain structure (say, hierarchical matrix, wavelet-based format, Toeplitz/circulant, low-rank, etc.). Here r = (r1 , . . . , rd ) is again called the Kronecker rank. The matrix representation by the format (2.10) is a generalisation of the low-rank approximation of matrices, corresponding to the case d = 2. Note that (2.10) is identical to (2.8) except that now () Vk are matrices and not vectors. The canonical Kronecker tensor-product format as proposed in [14,12] reads ()
A=
r k=1
(1)
bk Vk
(d)
⊗ · · · ⊗ Vk ,
bk ∈ R,
(2.11)
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
703
()
where the Kronecker factors Vk ∈ Rn×n may be matrices of a certain structure (say, hierar() chical matrices). Again, (2.11) is identical to (2.9), but with vectors Vk replaced by matrices. Approximations of function-related matrices by matrices of the form (2.11) were, e.g., studied in [14,26]. The main result of these papers are estimates of the form r = O(log2 ) and r = O(| log | log n), where is the prescribed approximation accuracy. If there is no structure in the Kronecker factors then the storage is O(drn2 ), while the matrix-times-matrix complexity is O(dr 2 n3 ). Introducing the hierarchical (H-matrix) approximation to the Kronecker factors (HKTapproximations) leads to estimates of the form O(dr 2 n logq n) (under certain assumptions on the origin of the matrices [14]). 2.3. Collocation-type approximation of function-related tensors Here we discuss the low Kronecker rank approximation of a special class of higher-order tensors related to certain “discretisations’’ of multi-variate functions, which will be called functiongenerated tensors (FGTs). They directly arise from: (a) a separable approximation of multi-variate functions; (b) Nyström/collocation/Galerkin discretisations of integral operators; (c) the tensor-product approximation of some analytic matrix-valued functions. In the following we define FGTs corresponding to collocation-type discretisation. 2.3.1. General error estimate p Let ( = 1, . . . , d) be a uniform tensor-product grid of intervals on a rectangle := p [a0 , b0 ] , a0 , b0 > 0, indexed by I = I,1 × · · · × I,p with I being the product index set such that for i = (i,1 , . . . , i,p ) ∈ I we have i,m ∈ In := {1, . . . , n} (m = 1, . . . , p). p p p Furthermore, let d := 1 × · · · × d be the corresponding tensor-product lattice in a hypercube d d := ⊂ R with d = dp. (1) (d) We denote by {xi1 , . . . , xid } with i ∈ I ( = 1, . . . , d) a set of collocation points living on the tensor-product lattice d := 1 × · · · × d . In our applications we have d 2 with some fixed p ∈ {1, 2, 3}. In particular, matrix decompositions correspond to the choice p = 2. In this case we introduce the reordered index set of pairs M := {m : m = (i , j ), i , j ∈ In } ( = 1, . . . , d), so that I = M1 × · · · × Md with M = In × In . The Nyström and Galerkin approximations to function-related tensors were discussed in [12,19]. In the following we focus on the collocation-type schemes, which are based on tensor-product ansatz functions i
(y1 , . . . , yd ) =
d
i (y ),
i = (i1 , . . . , id ) ∈ I1 × · · · × Id .
(2.12)
=1
In the following definition, g is a given function defined on × . Definition 2.1 (Collocation, FGT(C)). Given the tensor-product basis set (2.12), we introduce the () () () variable i := (xi , y ) with the collocation point xi and y ∈ , the pair m := (i , j ) ∈ M
704
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
and define the collocation-type dth order FGT by A ≡ A(g) := [am1 ,...,md ] ∈ RM1 ×···×Md with (1) (d) g( i1 , . . . , id ) j (y1 , . . . , yd ) dy, m ∈ M . (2.13) am1 ,...,md :=
In numerical calculations involving integral operators (e.g., arising in classical potential theory or from the Hartree–Fock, Ornstein–Zernicke and Boltzmann equations), n may vary from several hundreds to several thousands, therefore, for d 3, a naive “entry-wise’’ representation to the fully populated tensor A in (2.13) amounts to substantial computer resources, at least of the order O(ndp ). The key observation is that there is a natural duality between separable approximation of the multi-variate generating function and the tensor-product decomposition of the related multidimensional array. Hence, the CP-type decompositions like (2.9) (or (2.11) in the matrix case) can be derived by using a corresponding separable expansion of the generating function g (see [12,14] for more details). Lemma 2.2. Suppose that a multi-variate function g : ⊂ Rd → R can be approximated by a separable expansion gr ():=
r
(1)
(d)
k k ( (1) )· · ·k ( (d) )≈g(),
=( (1) , . . ., (d) )∈Rd ,
(2.14)
k=1
where k ∈ R and k : ⊂ R2 → R. Define the CP decomposition (2.9) via A(r) := A(gr ) (cf. Definition 2.1) with the choice,
() () () j k ( i ) (y ) dy ∈ RI ×J , = 1, . . . , d, k = 1, . . . , r, Vk = (i,j )∈M
(2.15) and with
()
i
=
() (xi , y ),
i ∈ I . Then the FGT(C) A(r) provides the error estimate
A(g) − A(r) (gr )∞ Cg − gr L∞ () . Proof. Using (2.13) we readily obtain (r) (g(x, y) − gr (x, y)) j (y) dy | max |am1 ,...,md − am 1 ,...,md x∈d j g − gr L∞ () (y) dy, and the result follows with C = maxj
supp j
j
| (y)| dy.
Though in general a decomposition (2.14) with small separation rank r is a complicated numerical task, in many interesting applications efficient approximation methods are available. In particular, for a class of multi-variate functions (say, for certain shift-invariant Green’s kernels in Rd ) it is possible to obtain a dimensionally independent Kronecker rank r = O(log n| log |), e.g., based on sinc-quadrature methods or an approximation by exponential sums (see case-study examples in [12,3,18]). The next lemma shows that the error of the Tucker decomposition in the collocation case is directly related to the error of the separable approximation of the generating function.
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
705
Lemma 2.3. Let g : → R be approximated by a separable expansion gr ():=
r1
···
k1 =1
rd kd =1
(1)
(d)
bk1 ,...,kd k1 ( (1) )· · ·kd ( (d) )≈g,
() ∈R2 , 1 d,
(2.16)
where bk1 ,...,kd ∈ R. Then the FGT(C), corresponding to the choice
() () () j Vk = k ( i ) (y ) dy ∈R I × J , =1, . . ., d, k =1, . . ., r , (i,j )∈M
(2.17) ()
with i
()
= (xi , y ) provides the error estimate
A(g) − A(r) (gr )∞ Cg − gr L∞ () . Proof. In the FGT(C) case, by the construction of A(r) , we have ⎛ ⎞ rd r1 (1) (d) A−A(r) ∞ max ⎝g(x, y)− ··· bk1 ,...,kd k1 ( (1) )· · ·kd ( (d) )⎠ j (y) dy x∈d k1 =1 kd =1 g − gr L∞ () max | j j (y) dy, j
supp j
which proves the assertion.
Next we discuss the constructive CP and Tucker decomposition of FGTs applied to a general class of analytic generating functions characterised in terms of their Laplace transform. The construction is based on sinc-approximation methods. 2.3.2. Error bounds for canonical decomposition of FGTs We use constructive approximation based on the sinc-quadrature and sinc-interpolation methods. For the readers convenience we recall the standard approximation results by the sinc-methods (cf. [23,9]). First, we introduce the Hardy space H 1 (D ) as the set of all complex-valued functions f, which are analytic in the strip D := {z ∈ C : |m z| < }, such that
(2.18)
N(f, D ) :=
*D
|f (z)| |dz| =
R
(|f (x + i)| + |f (x − i)|) dx < ∞.
Given f ∈ H 1 (D ), h > 0, and M ∈ N0 , the corresponding sinc-quadrature reads as TM (f, h) := h
M
f (kh) ≈
k=−M
R
f ( ) d .
(2.19)
Proposition 2.4. Let f ∈ H 1 (D ), h > 0, and M ∈ N0 be given. If |f ( )|C exp(−b| |)
for all ∈ R with b, C > 0,
(2.20)
706
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
then the quadrature error satisfies √ f ( ) d − TM (f, h) Ce− 2 bM R
with h =
2 /bM
and with a positive constant C depending only on f, , b (cf. [23]). If f possesses the hyperexponential decay |f ( )| C exp(−b ea| | )
for all ∈ R with a, b, C > 0,
(2.21)
then the choice h = log( 2 baM )/(aM) leads to (cf. [9]) f ( ) d − TM (f, h) CN(f, D )e−2 aM/ log(2 aM/b) . R
Note that 2M + 1 is the number of quadrature/interpolation points. If f is an even function, the number of quadrature/interpolation points reduces to M + 1. We consider a class of multi-variate functions g : Rd → R parametrised by g() = G(()) ≡ G() with ≡ () = 1 ( (1) ) + · · · + d ( (d) ) > 0, : R2 → R+ , where the univariate function G : R+ → R can be represented via the Laplace transform G()e− d. G() = R+
The FGT(C) approximation corresponds to p = 2, () = (x , y ) (cf. Definition 2.5). Without loss of generality, we introduce one and the same scaling function i (·) = (· + (i − 1)h),
i ∈ In ,
(2.22)
for all spatial dimensions = 1, . . . , d, where h > 0 is the mesh parameter. We simplify further and set ≡ () = d=1 0 ( () ), i.e., = 0 (x , y )
( = 1, . . . , d) with 0 : [a, b]2 → R+ .
(2.23)
For i ∈ In , let {x¯i } be the set of cell-centred collocation points on [a, b]. For each i, j ∈ In , we introduce the parameter dependent integral e−0 (x¯i ,y) (y + (j − 1)h) dy, 0. (2.24) i,j () := R2
Theorem 2.5 (FGT(C) approximation). Assume (a)–(c) below: (a) G() has an analytic extension G(w), w ∈ G , into a certain domain G ⊂ C which can be mapped conformally onto the strip D , such that w = (z), z ∈ D and −1 : G → D ; (b) for all (i, j) ∈ I × J the transformed integrand f (z) := (z)G((z))
d
i j ((z))
=1
belongs to the Hardy space H 1 (D ) with N (f, D ) < ∞ uniformly in (i, j);
(2.25)
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
707
(c) the function f (t), t ∈ R, in (2.25) has either exponential (c1) or hyper-exponential (c2) decay as t → ±∞. Under the assumptions (a)–(c), we have that, for each M ∈ N+ , the FGT(C), A(g), defined on [a, b]d allows an exponentially convergent super-symmetric 1 CP decomposition A(r) ∈ Cr with () Vk as in (2.15), where the expansion (2.14) is obtained by the substitution of f from (2.25) into the sinc-quadrature (2.19), such that we have
A(g) − A(r) ∞ Ce−M with r = 2M + 1, √ where = 21 , = 2 b in case (c1) and with = 1, = Proof. First, we notice that by definition d aij = G() i j () d = f (t) dt R+
R
=1
(2.26) 2 b log(2 aM/b)
in case (c2).
for (i, j) ∈ I × J .
(2.27)
We now apply the sinc-quadrature to the transformed integrand f to obtain M TM (f, h) := h f (kh) ≈ f (t) dt, (i, j) ∈ I × J , R
k=−M
with
f (t) dt − TM (f, h) Ce−M , R
and with the respective , (see Proposition 2.4). Combining this estimate with (2.27) and taking into account the separability property of the exponential prove the assertion for all (i, j) ∈ I × J . Noticing that our quadrature does not depend on the index (i, j) completes the proof. Theorem 2.5 proves the existence of a CP decomposition to the FGT A(g) with the Kronecker rank r = O(| log | log 1/ h) (in case (c2)) or r = O(log2 ) (in case (c1)), which provide an approximation of order O(). In our applications we usually have 1/ h = O(n), where n is the number of grid-points in one spacial direction. Theorem 2.5 typically applies to translationinvariant or spherically symmetric functions (see examples in §3). 2.3.3. Error bounds for Tucker decomposition of FGTs For the class of applications with more general than translation-invariant functions the analytic separation methods are based on tensor-product interpolation. This leads to the rank-(r1 , . . . , rd ) Tucker decomposition with small rank parameters r . Again we recall the related results on the sinc-interpolation method. Let x sin [ (x − kh)/h] S(k, h)(x) = ≡ sinc −k (k ∈ Z, h > 0, x ∈ R)
(x − kh)/h h be the kth sinc-function with step size h, evaluated at x with the sinc-function given by sinc(z) =
sin( z) ,
z
z ∈ C.
1 A dth order tensor is called super-symmetric if it is invariant under arbitrary permutations of indices in {1, . . . , d}.
708
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
The classical sinc-interpolant (cardinal series representation) is given by CM (f, h) =
M
S(, h)f (h) ≈ f.
(2.28)
=−M
If (2.20) holds then the interpolation error satisfies (cf. [23]) √
f − CM (f, h)∞ CM 1/2 e− bM
with h =
/bM,
(2.29)
where specifies the width of the strip D in (2.18). Assuming the hyper-exponential decay of f as in (2.21), we obtain (cf. [9]) N (f, D ) − aM/ log( aM/b)
aM with h = log e f − CM (f, h)∞ C (aM) . 2 b (2.30) The sinc-interpolation method can be extended to the multi-dimensional case. For each = 1, . . . , d, let g (·) : = [a0 , b0 ] → R be a univariate parameter-dependent function in variable
() , which is the restriction of a multi-variate function g( (1) , . . . , (d) ) onto with fixed remaining variables (1) , . . . , (−1) , (+1) , . . . , (d) . Suppose that g (·) satisfies all the regularity and decay conditions above, uniformly in = 1, . . . , d. It is shown in [12] that the tensor-product (1) (d) sinc-interpolation CM g := CM , . . . , CM g with respect to d variables, provides the exponential error estimate |g( ) − CM (g, h)( )|
− M CdM max N (g (·), D ) e log M , 2 =1,...,d
()
()
with the stability (Lebesgue) constant M = O(log M), and where CM g = CM (g, h) denotes the univariate sinc-interpolation from (2.28) applied to the variable ∈ I . For a class of analytic functions with point singularities the expansion (2.16) can be derived via tensor-product sinc-interpolation applied with respect to variables 1 , . . . , d . Theorem 2.6. Assume that all conditions in Theorem 2.5 are satisfied. Then the FGT(C), A(g), () allows an exponentially convergent rank-(r, . . . , r) Tucker decomposition A(r) ∈ Tr with Vk as ()
in (2.17), where k ( () ) = sinc(−ak 0 ( () )) with 0 from (2.23) ( = 1, . . . , d), and where bk are explicitly represented via the sinc-interpolation (2.28), such that: A(g) − A(r) ∞ C(1 + log M)d e−M with = 21 , = Theorem 2.5.
with r = 2M + 1,
√ 2 b in case (c1) and with = 1, =
2 b log(2 aM/b)
(2.31) in case (c2) as in
Proof. Modifying the proof of Theorem 2.5, we now apply the sinc-interpolation. In particular, the error bounds (2.29) and (2.30) show exponential convergence in M for the tensor-product sinc-interpolant CM g, which proves the assertion.
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
709
The error estimate (2.31) yields max r = O(| log |−1 ). In some cases we get the estimate = O(log 1/ h) (cf. [12]). −1
3. Tensor approximation of integral operators 3.1. Canonical and Tucker decompositions in Rd The principal ingredient in the structured tensor-product representation of integral operators in many spatial dimensions is a separable approximation of the multi-variate function representing the kernel of the operator. Given the integral operator G : L2 () → L2 () in := [0, 1]d ∈ Rd , d 2, g(x, y)u(y) dy, x, y ∈ , (Gu) (x) :=
with some shift-invariant kernel function g(x, y) = g(|x − y|), which can be represented in the form 2 2
1 + · · · + d , g(x, y) = g( 1 , . . . , d ) ≡ g where = |x − y | ∈ [0, 1], = 1, . . . , d. To approximate the operator G, we consider a collocation scheme with tensor-product test functions i (x1 , . . . , xd ) as in (2.12). If the kernel function g allows a global separable approximation, cf. Lemma 2.6, we approximate the collocation stiffness matrix A = {(Aj )|x¯i }i,j∈Ind ∈ RN×N ,
N = nd , x¯i ∈ d ,
by a matrix A(r) of the form (2.11), where the Vk are n × n matrices given by n Vk =
1
0
j
k (|x¯i − y |) (y ) dy
,
= 1, . . . , d,
(3.1)
i,j =1
providing the corresponding error estimate in l∞ matrix norm. For standard singular kernels (say, Green’s kernels) the direct separable approximation is usually not possible. In this case one can apply Theorem 2.9. In both cases we are able to prove the existence of a low Kronecker rank CP approximation for the class of multi-dimensional integral operators. Note that A − A(r) can be easily estimated in, say, the Frobenius matrix norm. When using the tensor-product sinc-interpolation, the function k (|u − v|) can be proved to be asymptotically smooth. For the class of kernel functions approximated by exponential sums, the factor k (|u − v|) even appears to be globally smooth (indeed, it is the entire function). Hence, the canonical components Vk can be further approximated in the H-matrix format (cf. [13]). In the case of uniform grids also the Toeplitz-type structure can be used to represent n × n matrices Vk . For the class of translation-invariant kernels (see [12] and examples below), we obtain a dimensionally independent bound r = O log h−1 log −1 log log −1 .
710
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
Following Definition 2.1, we introduce the dth order FGT(C) representing the integral operator G, A ≡ A(g) := [am1 ,...,md ] ∈ RM1 ×···×Md . Assume that the kernel function g(x, y) ≡ g( (1) , . . . , (d) ) allows a separable approximation (2.16) via the sinc-interpolation, so that the approximation converges exponentially in r = max r (see Theorem 2.6). Then the associated rank-(r1 , . . . , rd ) Tucker decomposition (2.10) in Tr () (cf. (2.10)) is specified by the Kronecker factors Vk ∈ RM , explicitly defined by (2.17). Let r = (r, . . . , r). Theorem 2.6 now yields the error estimate A(g) − A(r) ∞ Ce−M
with r = 2M + 1,
(3.2)
and with constants , from (2.31). As it was already mentioned, (3.2) yields max r = O(| log |−1 ) with from (2.18). In turn, for a class of shift-invariant kernels we get the estimate −1 = O(log n). In general, given a tolerance > 0, we have the bound d−1 −1 −1 . log log r = O log (n) log The numerical complexity of the Tucker decomposition is estimated by drn2 + r d . The storage cost for the corresponding Tucker approximation combined with hierarchical matrices has the complexity drn logq n + r d . Notice that the Tucker approximation can be applied to more general kernel functions compared with the canonical representation (as it was already mentioned, the latter is usually restricted to the class of translation-invariant kernels). 3.2. Application to the Newton potential Let x, y ∈ Rd , p = 2, and define = |x − y|2 = 21 + · · · + 2d with = x − y : R2 → R, ∈ R2d . The family of functions g(x, y) ≡ g() := 1/
with ∈ R>0 ,
arises in potential theory, in quantum chemistry and in computational gas dynamics (cf. [18]). The choice = 21 corresponds to the classical Newton potential, while = − 21 refers to the Euclidean distance function. Low separation rank decomposition to the multi-variate functions √ 1/, 1/ and to the related Galerkin approximations were discussed in [12–14,19], while the kernel function , ∈ R, was considered in [18]. Let us take a closer look to the collocation-type FGT corresponding to the Newton potential √ 1/ in the hypercube [−R, R]d ∈ Rd . As a basic example, we consider piecewise constant finite elements on the uniform grid with step-size h > 0, defined by scaling functions (x) = (x) associated with a tensor-product grid. Again, we let {x¯i } be the set of cell-centred collocation points. In our case, for the function in (2.24) we have 0 (x, y) = (x − y)2 (x, y ∈ R), hence making use of the Gaussian transform 2 1 2 e− d, √ =√
R+
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
we obtain
i,j () = |i−j | () :=
e−
2 (x¯ −y)2 i
R
2
j (y) dy,
711
0, i, j ∈ In
(see (2.24), (2.22) for the definition of i,j , j ). √ Lemma 3.1. The FGT(G) for the Newton potential 1/ allows a CP approximation in the d hypercube [−R, R]d ∈ R with exponential convergence rate (independent of d) as in (2.31), where = 21 . Proof. We apply Theorem 2.5. To check the condition (a), let us choose the analyticity domain as a sector G := {w ∈ C : |arg(w)| < } with apex angle 0 < 2 < /2 (here G = 1), and then apply the conformal map −1 : G → D
with w = (z) = ez , −1 (w) = log(w)
(cf. Theorem 2.5(a)). To check condition (b) of Theorem 2.5, first, we notice that the transformed integrand f (z) := exp(z)
d
i j ((z))
=1
belongs to the Hardy space H 1 (D ). In fact, introducing the error function erf by t 2 2 erf(t) := √ e− d,
0
(3.3)
we calculate the explicit representation d−1
i ,j () = i () =
2d erf( ih) − erf( (i − 1)h) , 2
(3.4)
with xi = xi = (i −1)h, n = n, h = b/n (uniform grid spacing) for i = i −j +1 = 1, . . . , n, = 1, . . . , d. Since erf(z)/z is an entire function it proves the required analyticity of f. Now we estimate the constant N (f, D ) applying arguments similar to those in [19] (cf. Lemma 4.7). Finally, we check condition (c1). Using properties of the erf-function as t → ±∞, we obtain the required asymptotical behaviour of f (t), t → ±∞, with d 2. This completes our proof. Lemma 3.1 proves the exponential convergence of the canonical decomposition with = 21 . However, it is also possible to apply the improved quadrature with hyper-exponential decay of the integrand which leads to the true exponential convergence with = 1. Using a variable transformation t = sinh(u) and taking advantage of the symmetry of the integrand we obtain the quadrature formula I=
R
f (t) dt =
R+
2 cosh(u)f (sinh(u)) du ≈
M k=0
(M)
wk
(M)
f (tk
) =: IM ,
(3.5)
712
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
100
Error Curves for h = 0.01, C0= 6 and R = 0
Error Curves for h = 0.01, C0 = 2 and R = 1.7147 100 10-2 Relative Error
Relative Error
10-2 10-4 10-6 10-8
10-4 10-6 10-8
10-10
10 20 30 40 50 60 70 80 90 100 Kronecker Rank
10-10
5
10 15 20 25 30 35 40 45 50 Kronecker Rank
Fig. 1. √ Comparison between the improved and not-improved sinc-quadratures for d = 3, h = 0.01, R = 0 (left) and R = 3 (right).
with (M)
tk
:= sinh(khM )
and
(M)
wk
:=
for k = 0, hM 2 hM cosh(khM ) for k > 0,
(3.6)
(3.7)
with the choice hM = C0 log(M) for some C0 (see Lemma 5.1 in [12]). M In the numerical illustrations we consider the case d = 3. Due to the Toeplitz structure of the n × n matrices Vk , in the numerical experiments below we control the accuracy of our quadrature-based decompositions only for a fixed index i = 1 and vary the index j = 1, . . . , n ( = 1, . . . , 3). Hence, in our notation we distinguish the distance R√from the observation point to the origin: for example, R = 0 corresponds to j = 1, while 3 corresponds to j = n ( = 1, . . . , 3). First we demonstrate the advantage of the improved quadrature (3.5), see Fig. 1. For a fixed number of quadrature terms M, in order to obtain uniform error control for all indices j = j = 1, . . . , n, we optimise the quadrature with respect to the factor C0 in hM = C0 ln(M) , M √ such that the quadrature errors are approximately equalised for two limiting cases R = 1 and 3. Then the error for all intermediate values of R lie in the “corridor’’ between the above-mentioned error bounds. Fig. 2 presents non-optimised (left) and optimised (right) errors considered for the limiting values of R (top) and other representative data (bottom), for h = 0.01 and = 10−5 . For our quadrature-based decompositions we observe the exponential convergence in the Kronecker rank. Further reduction of the Kronecker rank can be achieved by applying the so-called near-far field decomposition. It is based on the observation that the quadrature optimisation for the off-diagonal part of the target matrix (i.e., without the diagonal elements corresponding to j 2) leads to a much smaller Kronecker rank compared with an approximation of the whole matrix. In this case the low Kronecker rank representation of the complete matrix is obtained by adding a rank-1 term
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
100
10-2
10-2 Relative Error
Relative Error
Error Curves for h = 0.01 and C0 = 4.5 100
10-4 10-6 10-8 10-10
713
Error Curves for h = 0.01 and C0 = 3.1 R=0 R = 0.01 R = 0.02 R = 0.03 R = 0.99 R = 1.7147 r = 31,ε
10-4 10-6 10-8 10-10
10-12
10
20
30 40 50 60 Kronecker Rank
70
10-12
80
10
20
30 40 50 60 Kronecker Rank
70
80
Fig. 2. Non-optimised (left) and optimised (right) errors for h = 0.01, = 10−5 .
Error Curves for h = 0.01 and C0 = 2.1 100
10-2
10-2 Relative Error
Relative Error
Error Curves for h = 0.01 and C0 = 3.1 100
10-4 10-6 10-8 10-10 10-12
10-4 10-6 10-8 10-10
10
20
30 40 50 60 Kronecker Rank
70
80
10-12
10
20
30 40 50 60 Kronecker Rank
70
80
Fig. 3. Optimal quadratures without (left) and with near-far field decomposition (right) for h = 10−2 and = 10−5 .
representing the diagonal part (j = 1). The numerical results are depicted in Fig. 3 (indicate the rank reduction from 30 to 20). Acknowledgment The authors are thankful to C. Bertoglio for the assistance with numerical experiments. References [1] G. Beylkin, M.J. Mohlenkamp, Numerical operator calculus in higher dimensions, Proc. Nat. Acad. Sci. USA 99 (2002) 10246–10251. [2] G. Beylkin, M.J. Mohlenkamp, Algorithms for numerical analysis in higher dimensions, SIAM J. Sci. Comput. 26 (2005) 2133–2159. [3] G. Beylkin, L. Morzón, On approximation of functions by exponential sums, Appl. Comput. Harmonic Anal. 19 (2005) 17–48.
714
W. Hackbusch, B.N. Khoromskij / Journal of Complexity 23 (2007) 697 – 714
[4] L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and rank-(R1 , . . . , RN ) approximation of higherorder tensors, SIAM J. Matrix Anal. Appl. 21 (2000) 1324–1342. [5] L. De Lathauwer, B. De Moor, J. Vandewalle, Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition, SIAM J. Matrix Anal. Appl. 26 (2004) 295–327. [6] M.V. Fedorov, G. Chuev, H.-J. Flad, L. Grasedyck, B.N. Khoromskij, Low-rank wavelet solver for the Ornstein–Zernike integral equation, Computing 80 (2007), to appear. [7] H.-J. Flad, W. Hackbusch, B.N. Khoromskij, R. Schneider, Concept of data-sparse tensor-product approximation in many-particle models, in preparation. [9] I.P. Gavrilyuk, W. Hackbusch, B.N. Khoromskij, Data-sparse approximation to a class of operator-valued functions, Math. Comp. 74 (2005) 681–708. [10] I.P. Gavrilyuk, W. Hackbusch, B.N. Khoromskij, Tensor-product approximation to elliptic and parabolic solution operators in higher dimensions, Computing 74 (2005) 131–157. [12] W. Hackbusch, B.N. Khoromskij, Low-rank Kronecker product approximation to multi-dimensional nonlocal operators. Part I. Separable approximation of multi-variate functions, Computing 76 (2006) 177–202. [13] W. Hackbusch, B.N. Khoromskij, Low-rank Kronecker product approximation to multi-dimensional nonlocal operators. Part II. HKT representations of certain operators, Computing 76 (2006) 203–225. [14] W. Hackbusch, B.N. Khoromskij, E. Tyrtyshnikov, Hierarchical Kronecker tensor-product approximation, J. Numer. Math. 13 (2005) 119–156. [15] W. Hackbusch, B.N. Khoromskij, E.E. Tyrtyshnikov, Approximate iterations for structured matrices, Preprint 112, Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, 2005. [16] R. Harshman, Foundation of the PARAFAC procedure: model and conditions for an “explanatory” multi-mode factor analysis, UCLA Working Papers in Phonetics, vol. 16, 1970, pp. 1–84. [17] B.N. Khoromskij, An Introduction to Structured Tensor-product Representation of Discrete Nonlocal Operators. Lecture Notes, vol. 27, Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, 2005. [18] B.N. Khoromskij, Structured data-sparse approximation to high order tensors arising from the deterministic Boltzmann equation, Preprint 4, Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, 2005, Math. Comp., to appear. [19] B.N. Khoromskij, Structured rank-(r1 , . . . , rd ) decomposition of function-related tensors in Rd , Comput. Meth. Appl. Math. 6 (2) (2006) 194–220. [21] T. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23 (2001) 243–255. [22] Ch. Lubich, On variational approximations in quantum molecular dynamics, Math. Comp. 74 (2005) 765–779. [23] F. Stenger, Numerical Methods based on Sinc and Analytic Functions, Springer, Berlin, 1993. [24] V.N. Temlyakov, Greedy algorithms and M-term approximation with regard to redundant dictionaries, J. Approx. Theory 98 (1999) 117–145. [25] L.R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (1966) 279–311. [26] E.E. Tyrtyshnikov, Tensor approximations of matrices generated by asymptotically smooth functions, Mat. Sb. 194 (6) (2003) 147–160 (in Russian) (Translation in Sb. Math. 194 (2003) 941–954). [27] T. Zhang, G.H. Golub, Rank-one approximation to high order tensors, SIAM J. Matrix Anal. Appl. 23 (2001) 534–550.
Journal of Complexity 23 (2007) 715 – 739 www.elsevier.com/locate/jco
BDDC methods for discontinuous Galerkin discretization of elliptic problems Maksymilian Dryjaa,∗,1 , Juan Galvisb , Marcus Sarkisb, c,2 a Department of Mathematics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland b Instituto Nacional de Matemática Pura e Aplicada, Estrada Dona Castorina 110, CEP 22460-320, Rio de Janeiro,
Brazil Rio de Janeiro, Brazil c Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA 01609, USA
Received 27 October 2006; accepted 15 February 2007 Available online 24 March 2007 Dedicated to Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract A discontinuous Galerkin (DG) discretization of Dirichlet problem for second-order elliptic equations with discontinuous coefficients in 2-D is considered. For this discretization, balancing domain decomposition with constraints (BDDC) algorithms are designed and analyzed as an additive Schwarz method (ASM). The coarse and local problems are defined using special partitions of unity and edge constraints. Under certain assumptions on the coefficients and the mesh sizes across *i , where the i are disjoint subregions of the original region , a condition number estimate C(1 + maxi log(Hi / hi ))2 is established with C independent of hi , Hi and the jumps of the coefficients. The algorithms are well suited for parallel computations and can be straightforwardly extended to the 3-D problems. Results of numerical tests are included which confirm the theoretical results and the necessity of the imposed assumptions. © 2007 Elsevier Inc. All rights reserved. Keywords: Interior penalty discretization; Discontinuous Galerkin method; Elliptic problems with discontinuous coefficients; Finite element method; BDDC algorithms; Schwarz methods; Preconditioners
∗ Corresponding author.
E-mail address: [email protected] (M. Dryja). 1 This work was supported in part by Polish Sciences Foundation under grant 2P03A00524. 2 This work was supported in part by CNPQ (Brazil) under grant 305539/2003-8.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.02.003
716
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
1. Introduction In this paper, a discontinuous Galerkin approximation of elliptic problems with discontinuous coefficients is considered. The problem is considered in a polygonal region which is a union of disjoint polygonal subregions i . The discontinuities of the coefficients occur across *i . The problem is approximated by a conforming finite element method (FEM) on matching triangulation in each i and nonmatching one across *i . Composite discretizations are motivated first of all by the regularity of the solution of the problem being discussed. Discrete problems are formulated using DG methods, symmetric and with interior penalty terms on the *i ; see [4,5,8]. A goal of this paper is to design and analyze balancing domain decomposition with constraints (BDDC) preconditioners for the resulting discrete problem; see [7,17,16] for conforming finite elements. In the first step, the problem is reduced to the Schur complement problem with respect to unknowns on *i for i = 1, . . . , N. For that, discrete harmonic functions defined in a special way are used. The preconditioners are designed and analyzed using the general theory of ASMs; see [18]. The local spaces are defined on i and faces of *j which are common to i plus zero average values constraints on faces of i or/and faces of j . The coarse basis functions follow from local orthogonality with respect to the local spaces and from average constraints across those faces. A special partitioning of unity with respect to the substructures i is introduced and it is based on master and slave sides of substructures. A side Fij = *i ∩ *j is a master when i is larger than j , otherwise it is a slave, so if Fij ⊂ *i is a master side then Fj i ⊂ *j is a slave side. The hi - and hj -triangulations on Fij and Fj i , respectively, are built in a way that hi is coarser where i is larger. Here hi and hj denote the parameters of these triangulations. It is proved that the algorithms are almost optimal and its rate of convergence is independent of hi and hj , the number of subdomains i and the jumps of coefficients. The algorithms are well suited for parallel computations and they can be straightforwardly extended to the problems in the 3-D cases. DG methods are becoming more and more popular for the approximation of PDEs since they are well suited to dealing with regions with complex geometries or discontinuous coefficients, and local or patch refinements; see [5,4] and the literature therein. The class of DG methods we deal within this paper uses symmetrized interior penalty terms on the boundaries *i . A goal is to design and analyze BDDC algorithms for the resulting discrete problem; see [7] and also [17,16]. There are also several papers devoted to algorithms for solving discrete DG problems. In particular in connection with domain decomposition methods, we can mention [15,12,14,1–3] where related discretizations to those discussed here are considered. In these papers Neumann–Dirichlet methods and two-level overlapping and nonoverlapping Schwarz methods are proposed and analyzed for DG discretization of elliptic problems with continuous coefficients. In [8] for the discontinuous coefficient case, a nonoptimal multilevel ASM is designed and analyzed. In [6,13], two-level overlapping and nonoverlapping ASMs are proposed and analyzed for DG discretization of fourthorder problems. In those works, the coarse problems are based on polynomial coarse basis functions on a coarse triangulation. 
In addition, ideas of iterative substructuring methods and notions of discrete harmonic extensions are not explored. Condition number estimates of O( H ) and O( Hh ), 3
3
and O( H3 ) and O( Hh3 ) are obtained for second- and fourth-order problems, respectively,
where is the overlap parameter. In addition, for the cases where the distribution of the coefficients i is not quasimonotonic, see [10], these methods when extended straightforwardly to 3-D problems have condition number estimates which might deteriorate as the jumps of the coefficients get more severe. To the best of our knowledge, BDDC algorithms for DG discretizations of
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
717
elliptic problems with continuous and discontinuous coefficients have not been considered in the literature. We note that part of the analysis presented here has previously appeared as a technical report for analyzing several iterative substructuring DG preconditioners of Neumann–Neumann type; see [11]. In [9] we have also successfully extended these preconditioners to the balancing domain decomposition (BDD) method. The paper is organized as follows. In Section 2 the differential problem and its DG discretization are formulated. In Section 3 the Schur complement problem is derived using discrete harmonic functions in a special way. Some technical tools are presented in Section 4. Sections 5 and 6 are devoted to designing a BDDC algorithm while Sections 7 and 8 are devoted to the proof of the main result, Theorem 7.1. In Section 9 we introduce coarse spaces of dimension half smaller than those defined in Section 6. Finally in Section 10 some numerical experiments are presented which confirm the theoretical results. The enclosed numerical results show that the introduced assumption on the coefficients and the parameter steps are necessary and sufficient. 2. Differential and discrete problems 2.1. Differential problem Consider the following problem: find u∗ ∈ H01 () such that a(u∗ , v) = f (v)
∀v ∈ H01 (),
(1)
where a(u, v) :=
N i=1 i
i ∇u∗ · ∇v dx
and f (v) :=
f v dx.
¯ = N ¯ We assume that i=1 i and the substructures i are disjoint regular polygonal subregions of diameter O(Hi ) and form a geometrical conforming partition of , i.e., ∀i = j the intersection *i ∩*j is empty, or is a common vertex or an edge of *i and *j . We assume that f ∈ L2 () and, for simplicity of presentation, let i be a positive constant. 2.2. Discrete problem Let us introduce a shape-regular triangulation in each i with triangular elements and hi as mesh parameter. The resulting triangulation on is in general nonmatching across *i . Let Xi (i ) be the regular finite element (FE) space of piecewise linear continuous functions in i . Note that we do not assume that functions in Xi (i ) vanish on *i ∩ *. Define Xh () := X1 (1 ) × · · · × XN (N ). The discrete problem obtained by the DG method, see [5,8], is of the form: Find u∗h ∈ Xh () such that ah (u∗h , vh ) = f (vh ) ∀vh ∈ Xh (),
(2)
where ah (u, v) =
N i=1
aˆ i (u, v)
and f (v) =
N i=1
i
f vi dx,
(3)
718
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
aˆ i (u, v) := ai (u, v) + si (u, v) + pi (u, v), ai (u, v) := i ∇ui ∇vi dx, i
si (u, v) :=
Fij ⊂*i
pi (u, v) :=
ij Fij
Fij ⊂*i
Fij
lij
(4) (5)
*ui *vi (vj − vi ) + (uj − ui ) *n *n
ds,
ij (uj − ui )(vj − vi ) ds, lij hij
(6)
N and u = {ui }N i=1 ∈ Xh (), v = {vi }i=1 ∈ Xh (). We set lij = 2 when Fij = *i ∩ *j is a common face (edge) of *i and *j , and define ij := 2i j /(i + j ) as the harmonic average of i and j , and hij := 2hi hj /(hi + hj ). In order to simplify the notation we include the index j = * and put li * := 1 when Fi * := *i ∩ * has a positive measure. We also set u* = 0, v* = 0 and define i * := i and hi * := hi . The **n denotes the outward normal derivative on *i , and is a positive penalty parameter. We note that when ij is given by the harmonic average, it can be shown that min{i , j } ij 2 min{i , j }. We also define
di (u, v) := ai (u, v) + pi (u, v),
(7)
and dh (u, v) :=
N
(8)
di (u, v).
i=1
It is known that there exists a 0 = O(1) > 0 such that for 0 , we obtain |si (u, u)| < cdi (u, u) and i si (u, u) < cdh (u, u), where c < 1, and therefore, the problem (2) is elliptic and has a unique solution. A priori error estimates for the method are optimal for the continuous coefficients, see [4,5], and for the discontinuous coefficients if i *n u∗ − j *n u∗ = 0 in L2 (Fij ), see [8]. Note that this condition is satisfied if the solution u∗ of (2.1) restricted to the i and j is in H 3/2+ (i ) and H 3/2+ (j ) with > 0. We use the dh -norm, also called broken norm, in Xh () with weights given by i and lij hijij . For u = {ui } ∈ Xh () we note that ⎧ ⎫ ⎪ N ⎪ ⎨ ⎬ ij 2 2 dh (u, u) = i ∇ui L2 ( ) + (ui − uj ) ds . (9) i ⎪ ⎪ lij hij ⎭ i=1 ⎩ F ⊂* ij
i
Fij
Lemma 2.1. There exists 0 > 0 such that for 0 , for all u ∈ Xh () the following inequalities hold: 0 di (u, u) aˆ i (u, u) 1 di (u, u)
i = 1, . . . , N,
(10)
and 0 dh (u, u)ah (u, u) 1 dh (u, u), where 0 and 1 are positive constants independent of the i , hi and Hi .
(11)
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
719
The proof essentially follows from (37), see below, or refer to [8]. 3. Schur complement problem In this section we derive a Schur complement version for the problem (2). We first introduce some auxiliary notations. Let u = {ui } ∈ Xh () be given. We can represent ui as ui = Hi ui + Pi ui ,
(12)
where Hi ui is the discrete harmonic part of ui in the sense of ai (., .), see (5), i.e., ai (Hi ui , vi ) = 0 Hi ui = ui
o
∀vi ∈ X i (i ),
(13)
on *i ,
(14) o
while Pi ui is the projection of ui into X i (i ) in the sense of ai (., .), i.e. o
ai (Pi ui , vi ) = ai (ui , vi ) ∀vi ∈ X i (i ).
(15)
o
Here X i (i ) is a subspace of Xi (i ) of functions which vanish on *i , and Hi ui is the classical o
o
discrete harmonic part of ui . Let us denote by X h () the subspace of Xh () defined by Xh () := o
o
X1 (1 ) × . . . × XN (N ) and consider the global projections Hu := {Hi ui }N i=1 and Pu := o N N {Pi ui }i=1 : Xh () → X h () in the sense of i=1 ai (., .). Hence, a function u ∈ Xh () can therefore be decomposed as u = Hu + Pu.
(16)
The function u ∈ Xh () can also be represented as ˆ + Pu, ˆ u = Hu
(17) o
ˆ = {Pˆ i ui }N : Xh () → X h () is the projection in the sense of ah (., .), the original where Pu i=1 o o bilinear form of (2), see (3). Since Pˆ i ui ∈ X i (i ) and vi ∈ X i (i ), we have ai (Pˆ i u, vi ) = ah (u, vi ). ˆ ∗ + Pu ˆ ∗ . To find Pu ˆ ∗ we need to The discrete solution of (2) can be decomposed as u∗h = Hu h h h solve the following set of standard discrete Dirichlet problems: o Find Pˆ i u∗ ∈ X i () such that h
ai (Pˆ i u∗h , vi ) = f (vi )
o
∀vi ∈ X i (i )
(18)
for i = 1, . . . , N. Note that these problems are local and independent, so they can be solved in parallel. This is a precomputational step. ˆ ∗ . Let Hˆ i u be the discrete harmonic part of u in the We now formulate the problem for Hu h sense of aˆ i (., .), see (4), where Hˆ i u ∈ Xi (i ) is the solution of aˆ i (Hˆ i u, vi ) = 0 ui
on *i
and
o
∀vi ∈ X i (i ), uj
on Fj i ⊂ *j are given
(19) (20)
720
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739 o
where uj are given on Fj i = *i ∩ *j . We points out that for vi ∈ Xi (i ) we have ij *vi , uj − u i . aˆ i (ui , vi ) = (i ∇ui , ∇vi )L2 (i ) + lij *n L2 (Fij )
(21)
Fij ⊂*i
Note that (19)–(20) has a unique solution. To see this, let us rewrite (19) in the form ij *k i , uj − ui , i (∇ Hˆ i u, ∇ki )L2 (i ) = − lij *n 2 Fij ⊂*i
(22)
L (Fij )
o
where ki are nodal basis functions of X i (i ) associated with interior nodal points xk of the *k
hi -triangulation of i . Note that *ni does not vanish on *i when xk is a node of an element touching *i . We see that Hˆ i u is a special extension into i where u is given on *i and on all the Fj i , and therefore, it depends on the values of uj given on Fj i = *i ∩ *j and on F*i (we already have assumed u* = 0 for j = *). Note that Hˆ i u is discrete harmonic except at nodal points close to *i . We will sometimes call Hˆ i u discrete harmonic in a special sense, i.e., in the ˆ = {Hˆ i u}N ∈ Xh (). sense of aˆ i (., .) or Hˆ i . We let Hu i=1 Note that (19) is obtained from ˆ v) = 0 ah (Hu,
(23) o
ˆ ˆ N for u ∈ Xh () and when taking v = {vi }N i=1 ∈ X h (). It is easy to see that Hu = {Hi u}i=1 and ˆ = {Pˆ i ui }N are orthogonal in the sense of ah (., .), i.e. Pu i=1 ˆ Pv) ˆ =0 ah (Hu,
u, v ∈ Xh ().
(24)
In addition, ˆ = Hu, HHu
ˆ ˆ HHu = Hu
(25)
ˆ and Hu do not change the values of u on any of the nodes on the boundaries of the since Hu subdomains i also denoted by (26) *ihi , := i
where *ihi is the set of nodal points of *i . We note that the definition of includes the nodes on both sides of i *i . We are now in a position to derive a Schur complement problem for (2). Let us apply the decomposition (17) in (2). We get ˆ ∗ + Pu ˆ ∗ , Hv ˆ h + Pv ˆ h ) = f (Hv ˆ h + Pv ˆ h) ah (Hu h h or ˆ ∗ , Hv ˆ h ) + 2ah (Hu ˆ ∗ , Pv ˆ h ) + ah (Pu ˆ ∗ , Pv ˆ h ) = f (Hv ˆ h ) + f (Pv ˆ h ). ah (Hu h h h
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
721
Using (18) and (23) we have ˆ ∗ , Hv ˆ h ) = f (Hv ˆ h) ah (Hu h
∀vh ∈ Xh ().
(27)
This is the Schur complement problem for (2). We denote by Vh () or V, which we will use ˆ h = 0, i.e., the space of discrete harmonic later, the set of all functions vh in Xh () such that Pv functions in the sense of the Hˆ i . We rewrite the Schur complement problem as follows: Find u∗h ∈ Vh () such that S(u∗h , vh ) = g(vh ) ∀vh ∈ Vh (),
(28)
ˆ ∗ , and here and below u∗h ≡ Hu h ˆ h , Hv ˆ h ), S(uh , vh ) = ah (Hu
ˆ h ). g(vh ) = f (Hv
(29)
This problem has a unique solution. 4. Technical tools Our main goal is to design and analyze BDDC methods for solving (28). This will be done in the next section. We now introduce some notations and facts to be used later. Let u = {ui }N i=1 ∈ Xh () N and v = {vi }i=1 ∈ Xh (). Let di (., .) and dh (., .) be the bilinear forms defined in (7) and (8). o
Note that, for u, v ∈ Xh (), di (u, v) = ai (u, v) = i (∇ui , ∇vi )L2 (i )
(30)
and, for u ∈ Xh (), 0 dh (u, u)ah (u, u) 1 dh (u, u)
(31)
in view of Lemma 2.1, where 0 and 1 are positive constants independent of hi , Hi and i . The next lemma shows the equivalence between discrete harmonic functions in the sense of H and ˆ and therefore, we can take advantage of all the discrete Sobolev norm results in the sense of H, known for H discrete harmonic extensions. Lemma 4.1. For u ∈ Xh () we have ˆ Hu)Cd ˆ di (Hu, Hu)di (Hu, i (Hu, Hu),
i = 1, . . . , N,
(32)
and ˆ Hu)Cd ˆ dh (Hu, Hu)dh (Hu, h (Hu, Hu),
(33)
ˆ ˆ N where Hu = {Hi ui }N i=1 and Hu = {Hi u}i=1 are defined by (13)–(14) and (19)–(20), respectively, and C is a positive constant independent of hi , u, i and Hi . Proof. We note that P and H are projections in the sense of i ai (., .) while Pˆ and Hˆ are projections in the sense of ah (., .). Therefore, the left-hand inequality of (33) follows from properties of minimum energy of discrete harmonic extensions in the i ai (., .) sense. To prove the right-hand inequality of (33) note that ˆ Hu) ˆ = dh (Hu, ˆ HHu ˆ + P Hu) ˆ = dh (Hu, ˆ Hu) + dh (Hu, ˆ P Hu) ˆ dh (Hu,
(34)
722
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
in view of (25). The first term is estimated as ˆ Hu)εdh (Hu, ˆ Hu) ˆ + 1 dh (Hu, Hu), dh (Hu, 4ε
(35)
with arbitrary ε > 0. To estimate the second term on the right-hand side of (34) note that, for o ˆ ∈ X () and using (22), we get v := P Hu ˆ v) = dh (Hu,
N
i (∇ Hˆ i ui , ∇vi )L2 (i )
i=1
=−
N i=1 Fij ⊂*i
ij
lij
*vi , uj − u i *n
(36)
. L2 (Fij )
The terms on the right-hand side of (36) are estimated as follows: *vi *vi ui − uj L2 (Fij ) , uj − u i ij ij *n 2 *n L2 (Fij ) L (Fij ) C C
ij 1/2
hi
∇vi L2 (i ) ui − uj L2 (Fij )
ij
∇vi L2 (i ) ui − uj L2 (Fij ) 1/2 hij ij 2 2 C εij ∇vi L2 ( ) + ui − uj L2 (F ) i ij 4εhij ij C 2εi ∇vi 2L2 ( ) + ui − uj 2L2 (F ) , i ij 4εhij where we have used that hij 2hi and ij 2i . Substituting this into (36), we get ⎧ ⎫ N ⎨ ⎬ ij ˆ v)C 2εi ∇Pi Hˆ i ui 2L2 ( ) + ui − uj 2L2 (F ) , dh (Hu, i ij ⎭ ⎩ 4hij ε i=1
(37)
(38)
Fij ⊂*i
and using ∇Pi Hˆ i ui L2 (i ) ∇ Hˆ i ui L2 (i ) , we obtain
ˆ v)C εdh (Hu, ˆ Hu) ˆ + 1 dh (Hu, Hu) . dh (Hu, 4ε
(39)
Substituting (39) and (35) into (34) we get 1 ˆ ˆ ˆ ˆ dh (Hu, Hu)C εdh (Hu, Hu) + dh (Hu, Hu) . 4ε Choosing a sufficiently small ε, the right-hand side of (33) follows.
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
723
5. Balancing domain decomposition with constraints method We design and analyze BDDC methods for solving (28); see [7,17,16] for conforming elements. We use the general framework of ASMs as stated below in Lemma 5.1; see [18]. For i = 0, . . . , N, let Vi be auxiliary spaces and Ii prolongation operators from Vi to V, and define the operators T˜i : V → Vi as bi (T˜i u, v) = ah (u, Ii v)
∀v ∈ Vi ,
where bi (·, ·) is symmetric and positive definite on Vi × Vi , and set Ti = Ii T˜i . Then the ASMs, in particular the BDDC methods, are defined as T =
N
(40)
Ti .
i=0
The bilinear form ah is defined in (3). The bilinear forms bi , the operators Ii , and the spaces Vi , i = 0, . . . , N, are defined in the next subsections. Lemma 5.1. Suppose the following three assumptions hold: (i) There exists a constant C0 such that, for all u ∈ V , there is a decomposition u = with u(i) ∈ Vi , i = 0, . . . , N, and N
N
i=0 Ii u
(i)
bi (u(i) , u(i) ) C02 ah (u, u).
i=0
(ii) There exist constants ij , i, j = 1, . . . , N, such that for all u(i) ∈ Vi , u(j ) ∈ Vj , ah (Ii u(i) , Ij u(j ) ) ij ah (Ii u(i) , Ii u(i) )1/2 ah (Ij u(j ) , Ij u(j ) )1/2 . (iii) There exists a constant such that ah (Ii u, Ii u) bi (u, u)
∀u ∈ Vi , i = 0, . . . , N.
Then, T is invertible and C0−2 ah (u, u)ah (T u, u) (() + 1)ah (u, u) ∀u ∈ V . Here, () is the spectral radius of the matrix = {ij }N i,j =1 . 5.1. Notations and the interface condition Let us denote by i the set of all nodes on *i and on the neighboring faces Fj i ⊂ *j . We note that the nodes of *Fj i (which are vertices of j ) are included in i . Define Wi as the vector space associated to the nodal values on i and extended via Hˆ i inside i . We say that u(i) ∈ Wi (i) (i) if u(i) is represented as u(i) := {ul }l∈#(i) , where #(i) = {i and ∪ j : Fij ⊂ *i }. Here ui and (i) the uj stand for the nodal values of u(i) on *i and the F¯j i , respectively. We write u = {ui } ∈ V to refer to a function defined on all of with each ui defined (only) on *i . We point out that Fij and Fj i are geometrically the same even though the mesh on Fij is inherited from the i mesh while the mesh on Fj i corresponds to the j mesh.
724
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
Denote by i := {Fij : Fij ⊂ *i } ∪ {Fj i : Fj i = Fij , Fj i ⊂ *j } the set of all faces of i and all faces of j which has a common face with i . Given u(i) ∈ Wi and Fk ∈ i we use the notation 1 (i) uk = u(i) ds. |Fk | Fk Let us define the regular zero extension operator I˜i : Wi → V as follows: given u(i) ∈ Wi , let I˜i u(i) be equal to u(i) on nodes i and zero on \i . A face across i and j has two sides, the side contained in *i , denoted by Fij , and the side contained in *j , denoted by Fj i . In addition, we assign to each pair {Fij , Fj i } a master and a slave side. If Fij is a slave side then Fj i is a master side and vice versa. If Fij is a slave side we will use the notation ij (instead of Fij ) to emphasize this fact while if Fij is a master side we will use the notation ij . The choice of slave–master sides are such that the interface condition, stated next, can be satisfied. In this case Theorem 7.1 below holds with a constant C independent of the i , hi and Hi . Assumption 1 (The interface condition). We say that the coefficients {i } and the local mesh sizes {hi } satisfy the interface condition if there exist constants C0 and C1 , of order O(1), such that for any face Fij the following conditions hold: hi C0 hj and i C1 j if Fij is a slave side, or (41) hj C0 hi and j C1 i if Fij is a master side. (i)
We associate with each i , i = 1, . . . , N, the weighting diagonal matrices D (i) = {Dl }l∈#(i) on i defined as follows: • On *i (l = i):
⎧ 1 if x is a vertex of *i , ⎪ ⎪ ⎨ (i) Di (x) = 1 if x is an interior node of a master face Fij , ⎪ ⎪ ⎩ 0 if x is an interior node of a slave face Fij .
(42)
• On Fj i (l = j ):
⎧ 0 if x is an end point of the face Fj i , ⎪ ⎪ ⎨ (i) Dj (x) = 1 if x is an interior node and Fj i is a slave face, ⎪ ⎪ ⎩ 0 if x is an interior node and Fj i is a master face.
(43)
(i)
• For x ∈ Fi * we set Di (x) = 1. Remark 5.1. We note that two alternatives of weighting diagonal matrices D (i) can also be considered while ensuring that Theorem 7.1 below holds: (1) On faces Fij where hi and hj are of the same order, the values of (42) and (43) at interior nodes x of the faces Fij and Fj i can be √ i replaced by √ √ ; (2) Similarly, on faces Fij where i and j are of the same order, we can i +
j
replace (42) and (43) at interior nodes x of the faces Fij and Fj i by
hi hi +hj
.
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
725
The prolongation operators Ii : Wi → V , i = 1, . . . , N, are defined as Ii = I˜i D (i) ,
(44)
and they form a partition of unity on described as N
Ii I˜iT = I .
(45)
i=1
6. Local and coarse spaces The local spaces Vi = Vi (i ), i = 1, . . . , N, are defined as the subspaces of Wi of functions with zero face-average values on all faces Fij and Fj i associated to the subdomain i , i.e., for all Fk ∈ i . For u(i) , v (i) ∈ Vi (i ) we define the local bilinear form bi as bi (u(i) , v (i) ) := aˆ i (u(i) , v (i) ),
(46)
where the bilinear form aˆ i was defined in (4). Now we define a BDDC coarse space. As in BDDC methods, here we define the coarse space using local bases and imposing continuity conditions with respect to the primal variables; see [7,17,16]. Recall that i := {Fij : Fij ⊂ i } ∪ {Fj i : Fj i = Fij , Fj i ⊂ j } is the set of all faces of i and all faces of j which has a common face with i . For Fk ∈ i define the local coarse basis (i) function Fk ∈ Wi by (i)
bi (Fk , v) = 0 with 1 |Fk | and
(47)
(i)
Fk = 1 Fk
(i)
F k
∀v ∈ Vi (i )
Fk = 0 (i)
∀F k = Fk with F k ∈ i .
(i)
Note that Fk = Fk .
(i) Define V0i = V0i (i ) := Span{Fk : Fk ∈ i } ⊂ Wi . Then (47) implies that Vi is Hˆ i orthogonal to V0i , and Wi is a direct sum of V0i and Vi , i.e., V0i ⊕ Vi = Wi . (i) The global coarse space V0 is defined as the set of all u0 := {u0 } ∈ N i=1 V0i (i ) such that, for i, j = 1, . . . , N, we have, using the notation introduced in Subsection 5.1, (i)
(j )
u0k = u0k
∀Fk ∈ i ∩ j .
The coarse prolongation operator I0 : V0 → V is defined as I0 u0 = form b0 is of the form b0 (u0 , v0 ) :=
N i=1
(i)
(i)
bi (u0 , v0 ).
(48) N
(i) i=1 Ii u0
and the bilinear
(49)
726
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
7. Main result In this section we state and prove our main result. Theorem 7.1. Let the Assumption 1 be satisfied. Then, there exists a positive constant C, independent of hi , Hi and the jumps of i , such that H 2 ah (u, u) ∀u ∈ V , (50) ah (u, u)ah (T u, u) C 1 + log h i where T is defined in (40). Here log Hh = maxi log H hi .
Proof. By the general theorem of ASMs we need to check the three key assumptions of Lemma 5.1. (i) ∈ V such that Assumption (i). We prove that for u = {ui }N i i=1 ∈ V there exists u0 ∈ V0 and u
I0 u0 +
N
Ii u(i) = u
(51)
i=1
and b0 (u0 , u0 ) +
N
bi (u(i) , u(i) ) = a(u, u).
(52)
i=1 (i)
Let u = {ui }N i=1 ∈ V (). Define u0 ∈ V0i (i ) as 1 (i) (i) u0 = u ds Fk , |Fk | Fk
(53)
Fk ∈i
(i)
(i)
where functions Fik were defined in (47). Note that u0 and u have the same face-average values on all faces Fk ∈ i , i.e., ⎧ 1 1 (i) (i) ⎪ u ds = u ds = u0k , ⎨ |Fk | Fk |Fk | Fk 0 (54) 1 (j ) (j ) ⎪ ⎩ 1 u ds = u ds = u , 0k |Fk | Fk |Fk | Fk 0 and therefore, for all the faces Fk ∈ i ∩ j we have, see (48), (j )
(i)
u0k = u0k .
(55) (i)
Define u0 ∈ V0 by u0 = {u0 }N i=1 and set w = u − I0 u0 , where I0 u0 = can write (i)
N
i=1
i=1
where we have defined u(i)
=
(i) I˜iT u − u0
w=
(51) holds.
N
Ii (I˜iT u − u0 ) =
N
(i) i=1 Ii u0 .
Then we
Ii u(i) , ∈ Vi . Since the operators Ii I˜iT form a partition of unity,
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
727
To check (52) observe that u(i) has zero face-average values on all faces Fk ∈ i , hence it is (i) ˆ Hi -orthogonal to u0 ; see (47). Then, from the definition of b0 we have b0 (u0 , u0 ) +
N
bi (u , u ) = (i)
(i)
i=1
N
(i)
(i)
bi (u0 , u0 ) + bi (u(i) , u(i) )
i=1
=
N
(i)
(i)
bi (u0 + u(i) , u0 + u(i) )
i=1
=
N
bi (I˜iT u, I˜iT u) = ah (u, u).
i=1
This ends the proof of Assumption (i). Assumption (ii). We need to prove that 1/2
1/2
ah (Ii u(i) , Ij u(j ) ) Cεij ah (Ii u(i) , Ii u(i) ) ah (Ij u(j ) , Ij u(j ) ),
(56)
for u(i) ∈ Vi and u(j ) ∈ Vj , i, j = 1, . . . , N, and the spectral radius (ε) of ε = {εij }N i,j =1 is bounded. In our case (ε) C with constant independent of hi and Hi . This follows from coloring arguments and the fact that u(i) and u(j ) are different from zero only on i and j and their neighboring substructures. Assumption (iii). We need to prove that for i = 1, . . . , N, ah (Ii u(i) , Ii u(i) ) bi (u(i) u(i) ) ∀u(i) ∈ Vi ,
(57)
ah (I0 u0 , I0 u0 ) b0 (u0 , u0 ) ∀u0 ∈ V0
(58)
and
with C(1 + log Hh )2 where C is a positive constant independent of hi , Hi and the jumps of i . For the proof of (57) see Lemma 8.1, and for the proof of (58) see Lemma 8.2 in the next section. 8. Auxiliary lemmas In this section we complete the proof of Theorem 7.1 by proving two auxiliary lemmas associated with (57) and (58). Lemma 8.1. Assume that the Assumption 1 holds. Then for u(i) ∈ Vi , i = 1, . . . , N, we have H 2 (i) (i) ah (Ii u , Ii u ) C 1 + log bi (u(i) , u(i) ), (59) h where C is independent of hi , Hi and the jumps of i .
728
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
ˆ Hu) ˆ by dh (Hu, Hu) on the left-hand side Proof. In order to prove (59) we can replace ah (Hu, ˜ of (59) and on its right-hand side we can put di (HIi u(i) , HI˜i u(i) ) instead of bi (u(i) , u(i) ); see Lemmas 2.1 and 4.1. In order to simplify the notation, all the functions are considered as harmonic extensions in the (i) H sense. Hence, we denote HIi u by Ii u and let u = {ul }l∈#(i) ∈ Vi . Using (7), (8) and (44) we obtain dj (I˜i D (i) u(i) , I˜i D (i) u(i) ), (60) dh (Ii u(i) , Ii u(i) ) = di (I˜i D (i) u(i) , I˜i D (i) u(i) ) + j
where the sum is taken over j which has a common face with i . The first term on the right-hand side of (60) can be estimated as follows: di (I˜i D (i) u(i) , I˜i D (i) u(i) ) (i) (i) = i |∇Di ui |2 dx + i
Fij ⊂*i
ij lij hij
(i) (i)
Fij
(i) (i)
(Di ui − Dj uj )2 dx.
(61)
To bound the first term of (61) we use (i) (i)
(i) (i)
(i)
(i)
i ∇Di ui 2L2 ( ) 2i { ∇(Di ui − ui ) 2L2 ( ) + ∇ui 2L2 ( ) } i
i
i
and therefore, (i) (i)
(i)
i ∇(Di ui − ui ) 2L2 ( ) C i
(i)
(i)
i u˜ i 2
1/2
H00 (ij )
ij ⊂*i
(i)
.
(i)
Here u˜ i = ui at the interior nodal points of ij and u˜ i = 0 on *ij . Recall that ij denotes Fij when Fij is a slave side. It can be proved, see for example [18], that (i) C i u˜ i 2 1/2 H (ij ) 00
Hi 1 + log hi
2
(i)
i |ui |2H 1 ( ) .
(62)
i
(i)
Here we have used the fact that ui has zero face-average values. We now estimate the second term of (61) and (67), see below. Note that for Fi * , i.e. for faces on *, the estimates of the terms corresponding to Fi * follow straightforwardly. On a slave face Fij of *i , i.e. where hi C0 hj and i C1 j , we have (i) (i)
(i) (i)
(i)
Di ui − Dj uj 2L2 (F ) Chi max |ui |2 ij
and ij hij
(63)
Fij
(i) (i) Di ui
(i) (i) (i) − Dj uj 2L2 (F ) Ci max |ui |2 C ij Fij
Hi 1 + log hi
(i)
i |ui |2H 1 ( ) , i
where we have used ij 2i and hi Chij since hi < C0 hj . We have also used that u(i) has zero face-average value on any face of i , therefore, the Poincaré inequality can be used to bound the H 1 (i )-norm by the seminorm.
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
729
On a master side Fij of *i , i.e. where hj C0 hi and j C1 i , we have (i) (i) (i) (i) (i) (i) (i) j j uj (xv )v Di ui − Dj uj L2 (Fij ) ui − uj L2 (Fij ) + j xv ∈*Fij
,
(64)
L2 (Fij )
and using a triangle inequality we obtain (i)
(i)
(i)
(i)
uj (xvj )jv L2 (Fij ) ui (xvi )iv L2 (Fij ) + ui (xvi )iv − uj (xvj )jv L2 (Fij ) , j
(65)
j
where iv and v are the nodal basis functions corresponding to xvi and xv , respectively. The first term of (65) can be estimated as Hi (i) i (i) 2 (i) i 2 ui (xv )v L2 (F ) C max |ui | hi Chi 1 + log |ui |2H 1 ( ) , ij i Fij hi while the second term of (65) can be bounded as in (81), see below. Using these estimates in (61) and Lemma 2.1 we get Hi 2 (i) (i) di (Ii u , Ii u ) C 1 + log bi (u(i) , u(i) ). (66) hi We now estimate the second term of (60) by bounding dj (I˜i D (i) u(i) , I˜i D (i) u(i) ) by bi (u(i) , u(i) ). (i) For u = {ul } ∈ Vi we have dj (I˜i D (i) u(i) , I˜i D (i) u(i) ) =
(i) (i) j ∇Dj uj 2L2 ( ) j
ij + lij hij
(i) (i)
(i) (i)
(Di ui − Dj uj )2 dx,
Fij
(67)
(i) (i)
where here and below Dj uj is extended by zero on *j \Fj i . We need only to estimate the first term of (67) since the second term has been already estimated; see (63), (64) and (65). If Fij (i) (i) (i) is a slave side of *i then Dj vanishes, and so vanishes ∇Dj uj 2L2 ( ) . We now consider j
the case where Fij is a master side of *i and it is not equal to Fi * . On Fj i we decompose j (i) (i) (i) j (i) (i) (i) uj = wj + x j ∈*F uj (xv )v , where wj = Dj uj . We have ji
v
(i)
∇wj 2L2 (
j)
(i)
C wj 2
1/2
H00 (Fj i )
=C
(i) |wj |2H 1/2 (F ) ji
(i)
(wj )2
+ Fj i
dist(s, *Fj i )
ds .
(68)
We now estimate the first term of (68). Let Qj be the L2 -projection on the hj -triangulation of Fj i . Then, (i)
|wj |2H 1/2 (F
ji )
(i)
(i)
2{|wj − Qj ui |2H 1/2 (F
ji )
(i)
+ |Qj ui |2H 1/2 (F ) } ji
1 (i) (i) (i) w − ui 2L2 (F ) + ∇ui 2L2 ( ) C ji i hj j
(69)
730
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
and 2 (i) (i) 2 (i) (i) 2 (i) j j uj (xv )v wj − ui L2 (F ) 2 uj − ui L2 (F ) + 2 ji ji j 2 xv ∈*Fj i
,
(70)
L (Fj i )
where the second term of (70) can be bounded as before, see (64), (65) and (81), and using that j C1 i . It remains to estimate the second term of (68). In order to simplify the notation, we take Fij as the interval [0, H ]. Note that
(i)
(wj )2
Fj i
dist(s, *Fj i )
H /2
ds C
(i)
(wj )2 s
0
(i)
H
(wj )2
H /2
(H − s)
ds +
ds .
(71)
Let us estimate the first term on the right-hand side of (71). We have
H /2
(i)
(wj )2 s
0
hj
=
(i)
(wj )2 s
0
ds
ds +
H /2 hj
(i)
(uj )2 s
ds
(i) (ui )2 C + ds + ds s s hj hj 2 Hj 1 (i) (i) (i) (i) C uj (hj ) + ui − uj 2L2 (F ) + 1 + log max |ui |2 ji Fij hj hj Hj 1 (i) Hi (i) 2 (i) 2 C 1 + log ui H 1 ( ) , u − uj L2 (F ) + 1 + log ij i hj i hi hj (i) (uj (hj ))2
H /2
(i)
(i)
(ui − uj )2
H /2
(i)
where uj (hj )2 has been estimated as in (81). The second term of (71) is estimated similarly. (i)
Substituting these estimates into (71) and using that ui has zero face-average values we get (i) (uj )2 H 2 (i) ∇ui 2L2 ( ) ds C 1 + log i h Fj i dist(s, *Fj i ) 1 (i) (i) 2 (72) + ui − uj L2 (F ) . ij hj In turn, substituting (69) and (72) into (68), and the resulting estimate into (67), and using Lemma 2.1, we get H 2 (i) (i) ˜ (i) (i) ˜ dj (Ii D u , Ii D u ) C 1 + log bi (u(i) , u(i) ). h
(73)
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
Using (66) and (73) in (60), we get H 2 dh (Ii u(i) , Ii u(i) ) C 1 + log bi (u(i) , u(i) ). h
731
Lemma 8.2. Suppose that the Assumption 1 holds. Then, for u0 ∈ V0 , V0 defined by (48), we have the following inequality H 2 b0 (u0 , u0 ), (74) ah (I0 u0 , I0 u0 ) C 1 + log h where C is independent of hi , Hi and the jumps of i . Proof. By Lemmas 2.1 and 4.1 ˆ Hu)Cd ˆ ˆ Hu) ˆ Cdh (Hu, Hu), ah (Hu, h (Hu,
(75)
ˆ Hu) ˆ by where dh (., .) is defined by (8). Hence, to prove the result (74) we can replace ah (Hu, dh (Hu, Hu) on the left-hand side of (74). (i) In order to simplify the notation we write u instead of u0 and put I0 u0 = I0 u = N i=1 Ii u , see (48) and thereafter. We have ⎧ ⎫2 ⎨ ⎬ (i) (j ) ∇ (I u ) + (I u ) di (I0 u, I0 u) = i i j i ⎩ i ⎭ F ⊂* ij
+
Fij ⊂*i
Fij
L2 (i )
i
ij {(Ii u(i) )i + (Ij u(j ) )i } lij hij 2
−{(Ii u(i) )j + (Ij u(j ) )j }
(76)
ds.
To bound the second term on the right-hand side of (76) let us consider the case where Fij is a master side. The proof for the case where Fij is a slave side is similar; see also the arguments given in (63) and thereafter. Then using the definition of Ii and D (i) , we obtain ij 2 J= {(Ii u(i) )i + (Ij u(j ) )i } − {(Ii u(i) )j + (Ij u(j ) )j ds Fij lij hij ij (i) (i) (j ) (j ) (j ) (j ) 2 (i) (i) = {Di ui − Dj uj } − {Dj uj − Di ui } ds Fij lij hij 2 ij (i) (i) (j ) (j ) (i) (i) = {Di ui − Dj uj } − {Dj uj − 0} ds Fij lij hij ij (i) (i) (j ) (i) (j ) (i) (j ) 2 (i) = {Di ui − (Dj + Dj )uj } + Dj {uj − uj } ds Fij lij hij = Fij
⎛ ij ⎜ (i) (i) ⎝{ui − uj } − lij hij
j
xv ∈*Fj i
⎞2 ⎟ (j ) (i) {uj (xvj ) − uj (xvj )}jv ⎠ ds,
(77)
732
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739 j
j
where v is the nodal basis function corresponding to xv . Hence, ij (i) (i) {ui − uj }2 ds J C Fij lij hij +Chj
ij (j ) (i) max {u (x j ) − uj (xvj )}2 . lij hij xvj ∈*Fj i j v
(78) (j )
(i)
It remains to estimate the second term of (78). First note that uj i = uj i since there are primal variables associated to the faces Fj i ∈ i and Fj i ∈ j ; see (48). Therefore, (i)
(j )
(j )
(j )
(i)
(i)
|uj (xvj ) − uj (xvj )| |uj (xvj ) − uj i | + |uj (xvj ) − uj i |
Hj C 1 + log hj
1 2
(j )
(i)
(i)
∇uj L2 (j ) + |uj (xvj ) − uj i |.
(79)
To deduce the estimate on the first term on the right-hand side of (79) we have used a Poincaré inequality and an L∞ bound for FEM functions, see [18]. The second term of (79) is estimated as (i)
(i)
(i)
(i)
(i)
(i)
(i)
(i)
|uj (xvj ) − uj i | |uj (xvj ) − ui (xvi )| + |ui (xvi ) − uij | + |uij − uj i | C
Hi 1 + log hi
(i) (i) |uj (xvj ) − ui (xvi )| +
−1 (i) +hj 2 ui
1 2
(i)
∇ui L2 (i )
(i) − uj L2 (Fij )
(80)
,
where we have used a Poincaré inequality and an L∞ bound for FEM functions to obtain the second term on the right-hand side of (80) and a Cauchy–Schwarz inequality to obtain the third (i) (i) term of (80). To estimate the first term of (80), let Qj ui be the L2 -projection of ui on the hj triangulation of Fj i . We obtain (i)
(i)
(i)
(i)
(i)
(i)
|uj (xvj ) − ui (xvi )| |uj (xvj ) − Qj ui (xvi )| + |Qj ui (xvi ) − ui (xvi )|
−1
(i)
(i)
C hj 2 uj − ui L2 (Fij ) 1 Hj 2 (i) + 1 + log ∇ui L2 (i ) , hj
(81)
where the first estimate was obtained from an inverse inequality and the second from the approximation properties of the L2 projection and an L∞ bound for FEM functions. By Lemmas 2.1 and 4.1 we can bound the term di (HI˜i u(i) , HI˜i u(i) ) by bi (Hˆ i u(i) , Hˆ i u(i) ). Then we conclude that J of (77) can be estimated as H {bi (u(i) , u(i) ) + bj (u(j ) , u(j ) }, (82) J C 1 + log h since ij Ci and hj Chij .
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
733
It remains to estimate the first term in (76). We have ⎧ ⎫2 ⎨ ⎬ (j ) ∇ (Ii u(i) )i + (Ij u )i ⎩ ⎭ F ⊂* ij
L2 (i )
i
⎧⎛ ⎫2 ⎞ ⎨ ⎬ (j ) (j ) (j ) (i) (i) (i) ⎝ ⎠ = ∇ Di + Di Di (ui − ui ) ui + ⎩ ⎭ F ⊂* F ⊂* ij
C
⎧ ⎨
i
ij
(i)
∇ui 2L2 ( ) +
⎩
i
(j )
(j )
Di (ui
L2 (i )
i
⎫ ⎬
(i)
− ui ) 2
H00 (ij ) ⎭
ij ⊂*i
1/2
(83)
,
where the sum in (83) reduces to the slave sides Fij . From (48) we obtain (j )
(j )
Di (ui
(i)
− ui ) 2
(j ) (j ) Di (ui
2
1/2
H00 (Fij )
(j ) − uij ) 2 1/2 H (Fij ) 00
(j ) (i) + Di (ui
(i) − uij ) 2 1/2 H (Fij )
(84)
00
and therefore, the first term of (84) is estimated as (j )
(j )
i Di (ui
(j ))
− uij ) 2
1/2
H00 (Fij )
(j ) (j ) (j ) 2i Di (ui − Qi uj ) 2
(j )
1/2
H00 (Fij )
(j )
(j )
+ Di (Qi uj − uj i ) 2
(j )
(j )
(j )
+ Di (uj i − uij ) 2 Ci
1/2
H00 (Fij )
1/2
H00 (Fij )
Hj 2 1 (j ) (j ) 2 (j ) 2 u − uj L2 (F ) + 1 + log ∇uj L2 ( ) ji j hi i hj
Hj C 1 + log hj
2 bj (u(j ) , u(j ) ),
(85)
since i C1 j and hij 2hi when Fij is a slave side, and in view of Lemma 2.1. The second term on the right-hand side of (84) is bounded by
(j ) (i) i Di (ui
(i) − uij ) 2 1/2 H (Fij ) 00
Hi 2 (i) Ci 1 + log ∇ui L2 (i ) hi Hi 2 bi (u(i) , u(i) ). 1 + log hi
(86)
734
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
Using (85) and (86) in (84) and the resulting inequality in (83) we see that ⎧ ⎫2 ⎨ ⎬ (i) (j ) i ∇ (Ii u )i + (Ij u )i ⎭ ⎩ Fij ⊂*i
H C 1 + log h
L2 (i )
2 {bi (u(i) , u(i) ) + bj (u(j ) , u(j ) )}.
This estimate and (82), see (76), imply that
H di (I0 u0 , I0 u0 )C 1 + log h
2 {bi (u(i) , u(i) ) + bj (u(j ) , u(j ) )}.
Summing this over i and using Lemmas 2.1 and 4.1 we get (74).
9. Smaller coarse spaces In Section 6 we have defined the coarse space with a primal variable associated to each face Fk ∈ i . In this case the number of constraints per subdomain is twice the number of edges of *i for floating subdomains i . In this section we discuss choices of subsets of i which imply smaller coarse problems and still maintain the bound (50) of Theorem 7.1. Recall that a face across i and j has two sides, the side contained in *i , denoted by Fij , ˜ i , i = 1, . . . , N, be such that for all pairs and the side contained in *j , denoted by Fj i . Let ˜i ∩ ˜ j contains one and only one face from of neighboring subdomains i and j the subset each pair {Fij , Fj i }, i.e., Fij or Fj i . We denote the chosen face by ij = j i . For instance, we ˜ i as the set of master faces ij associated to i . can choose ˜ i , the local spaces Vi = Vi (i ), i = 1, . . . , N, are defined as the subspaces After choosing ˜ i while the spaces V0i are of Wi of functions with zero face-average values on all faces k ∈ (i) (i) ˜ defined as V0i = V0i (i ) = Span{ : k ∈ i } ⊂ Wi where the functions are defined k k ˜ i in each subdomain; see (47). as in Section 6 replacing i by From now on we will use the notation 1 (i) uk = u(i) ds, | k | k (i)
where u(i) ∈ Wi . The global coarse space V0 is now defined as the set of all u0 = {u0 } ∈ N i=1 V0i (i ) such that for i = 1, . . . , N, we have (i)
(j )
u0ij = u0ij (i)
˜ i. ∀ ij ∈
(87)
Recall that u0 is defined locally. Then we have the following possible cases of continuity with respect to the primal variables: (i) Case 1: ij = j i = Fij . This case imposes continuity of the face-average values of u0 and (j ) u0 on Fij ; see (87). Case 2: ij = j i = Fj i . This case imposes continuity of the face-average values on Fj i .
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
735
Example 9.1. Consider the domain = (0, 1)2 and divide it into N = M × M squares subdomains i which are unions of fine elements, with H = 1/M. We note that for floating subdomains ˜ i has only four coarse basis functions. i , i has eight coarse basis functions while The bilinear forms ah , bi and the operators Ii , i = 1, . . . , N, and the operator I0 are defined in Sections 5 and 6. We now show that with these new local and global spaces Theorem 7.1 still holds. The proof is basically the same as the one given in Sections 7 and 8 with some minor modifications depending on which of the above cases is considered and also on a modification of the Poincaré inequality. Theorem 9.1. If the Assumption 1 holds, then there exists a positive constant C independent of hi , Hi and the jumps of i such that H 2 ah (u, u) ∀u ∈ V , ah (u, u)ah (T u, u) C 1 + log h
(88)
where T is defined in (40), the local spaces Vi , i = 1, . . . , N, are defined above in this section i and the global space V0 is defined using (87). Here log Hh = maxi log H hi . Proof. We now mention the main modifications of the proof of the three key assumptions of Lemma 5.1. (i)
Assumption (i). Let u = {ui }N i=1 ∈ V (). Define u0 ∈ V0i (i ) by (i) u0
1 (i) = u ds k | k | k
(89)
˜i k ∈
and proceed as in the proof of Theorem 7.1. Assumption (ii). It is the same argument given to verify Assumption (ii) in the proof of Theorem 7.1. Assumption (iii). We modify the proof of Lemmas 8.2 and 8.1 as follows: For the proof of Lemma 8.2 we consider the following cases to obtain a bound for the left-hand side of (79), Case 1: ij = j i = Fj i . In this case we use the same argument as in the proof of Lemma 8.2 to estimate the left-hand side of (79). Case 2: ij = j i = Fij . In this case we estimate, see (79), (j )
(i)
(i)
(i)
(j )
(j )
(i)
(j )
|uj (xvj ) − uj (xvj )||uj (xvj ) − uj i | + |uj (xvj ) − uj i | + |uj i − uj i |.
(90)
The first and second term of (90) can be bounded as in Case 1. The third term of (90) is bounded (j ) (i) as follows: since ij = j i = Fij we have that uij = uij ; see (87). Then (i)
(j )
(i)
(i)
(j )
(j )
|uj i − uj i | |uj i − uij | + |uij − uj i |
(91)
736
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
and we obtain (i)
(i)
−1
(i)
−1
(i)
(i)
(i)
|uj i − uij | CHj 2 uj − ui L2 (Fij ) Chj 2 uj − ui L2 (Fij ) . An analogous bound holds also for the second term of (91); see (79). For the proof of Lemma 8.1 we can apply Poincaré inequality only in the case which ij = Fij ⊂ *i . If this is not the case, i.e., if ij = Fj i ⊂ j , we can still bound the H 1 (i ) norm by the seminorm using the following argument: if u(i) ∈ Vi and ij = Fj i then u(i) has zero face-average value on Fj i and therefore, (i)
1/2
ui L2 (i ) ui − uij L2 (i ) + Hi
(i)
(i)
(i)
uij − uj i L2 (Fij ) (i)
Hi ∇ui L2 (i ) + ui − uj L2 (Fij ) . Having modified the proof of Lemmas 8.2 and 8.1, then Assumption (iii) follows.
10. Numerical experiments In this section we present numerical results for the preconditioner introduced in (40) and show that the bounds of Theorems 7.1 and 9.1 are reflected in the numerical tests. In particular we show that the Assumption 1, see (41), is necessary and sufficient. We consider the domain = (0, 1)2 and divide into N = M × M square subdomains i which are unions of fine elements, with H = 1/M. Inside each subdomain i we generate a structured triangulation with ni subintervals in each coordinate direction, and apply the discretization presented in Section 2 with = 4. This value = 4 was chosen because numerically it was observed that the L2 approximation error seems to stabilize when becomes larger. The minimum value of that gives a positive definite system is min = 1.565. In the numerical experiments we use a red–black checkerboard type subdomain partition. On the black subdomains we let ni = 2∗2Lb and on the red subdomains we let ni = 3∗2Lr , where Lb and Lr are integers denoting −Lb the number of refinements inside each subdomain i . Hence, the mesh sizes are hb = 22M and −Lr
hr = 23M , respectively. We solve the second-order elliptic problem −div((x)∇u∗ (x)) = 1 in with homogeneous Dirichlet boundary conditions. In the numerical experiments, we run PCG until the l2 -norm initial residual is reduced by a factor of 106 . In the first test we consider the constant coefficient case = 1. We consider different values of M × M coarse partitions and different values of local refinements Lb = Lr , therefore, keeping constant the mesh ratio hb / hr = 23 . We place the masters on the black subdomains. We note that the interface condition (41) is satisfied. Table 1 lists the number of PCG iterations and in parenthesis the condition number estimate of the preconditioned system in the case we choose eight coarse functions per subdomain. As expected from the analysis, the condition numbers appear to be independent of the number of subdomains and seem to grow by a logarithmic factor when the size of the local problems increases. Note that in the case of continuous coefficients, Theorems 7.1 and 9.1 are valid without any assumptions on hb and hr if the master sides are chosen on the larger meshes. ˜ i as the set of master faces of i . Table 2 is the same as before, however, now we have chosen In this case we have four coarse basis functions in each subdomain. We note that even though the coarse problems are smaller, the results are very similar to the ones presented in Table 1 where the
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
737
Table 1 PCG/BDDC iteration counts and condition numbers for different sizes of coarse and local problems and constant coefficients i with eight coarse basis functions per subdomain M ↓ Lr →
0
1
2
3
4
5
2 4 8 16 32
12 (5.7) 14 (5.8) 15 (5.9) 15 (6.0) 15 (6.0)
14 (6.7) 18 (8.5) 20 (9.1) 20 (9.4) 20 (9.3)
15 (7.5) 21 (11.7) 24 (12.3) 25 (12.8) 25 (12.8)
18 (10.6) 24 (15.2) 27 (15.8) 28 (16.3) 28 (16.3)
19 (14.5) 27 (19.2) 31 (19.6) 31 (20.1) 32 (20.2)
19 (19.0) 29 (23.9) 34 (24.0) 35 (24.5) 35 (24.6)
Table 2 PCG/BDDC iteration counts and condition numbers for different sizes of coarse and local problems and constant coefficients i with four coarse basis functions per subdomain associated to its master faces M ↓ Lr
0
1
2
3
4
5
2 4 8 16 32
13 (5.7) 15 (5.8) 17 (6.1) 18 (6.1) 18 (6.1)
15 (6.7) 19 (8.5) 21 (9.1) 23 (9.4) 24 (9.4)
16 (7.5) 22 (11.7) 25 (12.3) 27 (12.8) 27 (12.8)
18 (10.7) 24 (15.1) 28 (15.7) 30 (16.3) 30 (16.3)
19 (14.5) 27 (19.2) 31 (19.6) 32 (20.1) 32 (20.2)
19 (18.9) 29 (23.8) 34 (24.0) 35 (24.5) 35 (24.6)
Table 3 PCG/BDDC iteration counts and condition numbers for different values of coefficients and the local mesh sizes on the red subdomains only
↓ Lr
→
1000 10 0.1 0.001
0
1
2
3
4
5
85 (2099) 28 (24.4) 16 (6.6) 16 (6.96)
165 (2822) 37 (32.9) 17 (6.8) 16 (7.12)
263 (3746) 43 (42.3) 16 (6.8) 16 (7.16)
282 (4758) 47 (52.8) 17 (6.8) 16 (7.25)
287 (5922) 51 (64.8) 17 (6.9) 17 (7.38)
310 (7168) 53 (77.7) 17 (6.9) 18 (7.50)
The coefficients and the local mesh sizes on the black subdomains are kept fixed. The subdomains are also kept fixed to 4 × 4 and eight coarse basis functions in each subdomain are used.
coarse problems are larger. As in the case of Table 2 the smallest eigenvalue of the preconditioned operator is 1. We now consider the discontinuous coefficient case where we set i = 1 on the black subdomains and i = on the red subdomains. The subdomains are kept fixed at 4 × 4, i.e., 16 subdomains. Table 3 lists the results of computations for different values of and for different levels of refinement on the red subdomains. On the black subdomains ni = 2 is kept fixed. The masters are placed on the black subdomains. It is easy to see that the interface condition (41) holds if, and only if, is not large, which seems to be in agreement with the results in Table 3. We repeat the same experiment as in Table 3 but this time with four coarse local basis functions associated to the master sides of the subdomain. The results are presented in Table 4.
738
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
Table 4 PCG/BDDC iteration counts and condition numbers for different values of coefficients and the local mesh sizes on the red subdomains only
↓ Lr
→
1000 10 0.1 0.001
0
1
2
3
4
5
84 (2127) 32 (24.7) 15 (6.9) 15 (7.4)
133 (2905) 40 (33.4) 16 (6.8) 15 (7.3)
188 (3827) 45 (43.0) 16 (6.8) 16 (7.2)
254 (4838) 49 (53.5) 17 (6.8) 17 (7.3)
326 (5980) 53 (65.3) 17 (6.9) 17 (7.42)
384 (7205) 54 (78.0) 17 (7.0) 18 (7.52)
The coefficients and the local mesh sizes on the black subdomains are kept fixed. The subdomains are also kept fixed to 4 × 4 and four coarse basis functions in each subdomain are used. Master faces are chosen.
11. Conclusions and extensions In this paper several BDDC methods with different coarse spaces, for DG discretization of second-order elliptic equations with discontinuous coefficients, have been designed and analyzed. It has been proved that the methods are almost optimal and very well suited for parallel computations. Their rates of convergence are independent of the parameters of the triangulations, the number of substructures and the jumps of the coefficients. The numerical tests confirm the theoretical results. (i) In 2-D, the methods are based on choosing Di to be equal to one at the vertices of i . The (i) methods can be extended to 3-D by considering Di to be equal to one at nodal points of edges and vertices of the i . In this case Theorems 7.1 and 9.1 hold. The methods also can be generalized to max (x) the case where i = mixxx i(x) is not large. In this case, define constants ¯ i as the integral average i of the i (x) over the i . The ¯ i are used to determine the mortar and slave sides, and can be used to define the weighting matrices D (i) as well. For the bilinear forms bi (·, ·) we use exact solvers where i (x) are considered rather than ¯ i . In this case, Theorems 7.1 and 9.1 are valid, with lower bound equal to one, and upper bound now involving a constant C depending linearly on i . The case where the i (x) have large variations inside the i will be discussed elsewhere. Finally, we remark that the condition number of the preconditioned systems deteriorates as we increase the penalty parameters to large values. Acknowledgment We would like to express our thanks to the anonymous referee for the suggestions to improve the presentation of the paper. References [1] P.F. Antonietti, Domain decomposition, spectral correctness and numerical testing of discontinuous Galerkin methods, Ph.D. Thesis, Dipartimento di Matematica, Università di Pavia, 2006. [2] P.F. Antonietti, B. Ayuso, Schwarz domain decomposition preconditioners for discontinuous Galerkin approximations of elliptic problems: non-overlapping case, Technical Report 20-VP, IMATI-CNR, June 2005. To appear in M2AN. [3] P.F. Antonietti, B. Ayuso, Multiplicative schwarz methods for discontinuous Galerkin approximations of elliptic problems, Technical Report 10-VP, IMATI-CNR, June 2006. Submitted to M2AN. [4] D.N. Arnold, An interior penalty finite element method with discontinuous elements, SIAM J. Numer. Anal. 19 (4) (1982) 742–760. [5] D.N. Arnold, F. Brezzi, B. Cockburn, D. Marini, Unified analysis of discontinuous Galerkin method for elliptic problems, SIAM J. Numer. Anal. 39 (5) (2002) 1749–1779.
M. Dryja et al. / Journal of Complexity 23 (2007) 715 – 739
739
[6] S. Brenner, K. Wang, Two-level additive Schwarz preconditioners for C 0 interior penalty methods, Numer. Math. 102 (2) (2005) 231–255. [7] C.R. Dohrmann, A preconditioner for substructuring based on constrained energy minimization, SIAM J. Sci. Comput. 25 (1) (2003) 246–258. [8] M. Dryja, On discontinuous Galerkin methods for elliptic problems with discontinuous coefficients, Comput. Methods Appl. Math. 3 (1) (2003) 76–85. [9] M. Dryja, J. Galvis, M. Sarkis, Balancing domain decomposition methods for discontinuous Galerkin discretization, in: Ulrich Langer et al. (Eds.), Domain Decomposition Methods in Science and Engineering XVII, Lecture Notes in Computational Science and Engineering, Springer, Berlin, 2007, to appear. [10] M. Dryja, M. Sarkis, O. Widlund, Multilevel Schwarz methods for elliptic problems with discontinuous coefficients in three dimensions, Numer. Math. 72 (1996) 313–348. [11] M. Dryja, M. Sarkis, A Neumann-Neumann method for DG discretization of elliptic problems, Tech. Rep. Serie A 456, Intituto de Mathemática Pura e Aplicada, http://www.preprint.impa.br/Shadows/SERIE_A/2006/456.html, June 2006. [12] X. Feng, O.A. Karakashian, Two-level additive Schwarz methods for a discontinuous Galerkin approximation of second-order elliptic problems, SIAM J. Numer. Anal. 39 (4) (2001) 1343–1365. [13] X. Feng, O.A. Karakashian, Two-level non-overlapping Schwarz preconditioners for a discontinuous Galerkin approximation of the biharmonic equation, J. Sci. Comput. 22 (1) (2005) 289–314. [14] C. Lasser, A. Toselli, An overlapping domain decomposition preconditioners for a class of discontinuous Galerkin approximations of advection–diffusion problems, Math. Comput. 72 (243) (2003) 1215–1238. [15] R.D. Lazarov, S.Z. Tomov, P.S. Vassilevski, Interior penalty discontinuous approximations of elliptic problems, Comput. Methods Appl. Math. 1 (4) (2001) 367–382. [16] J. Li, O.B. Widlund, FETI-DP, BDDC, and block Cholesky methods, Internat. J. Numer. Methods Eng. 66 (2) (2006) 250–271. [17] J. Mandel, C.R. Dohrmann, R. Tezaur, An algebraic theory for primal and dual substructuring methods by constraints, Appl. Numer. Math. 54 (2) (2005) 167–193. [18] A. Toselli, O.B. Widlund, Domain decomposition methods—algorithms and theory, Springer Series in Computational Mathematics, vol. 34, Springer, Berlin, 2005.
Journal of Complexity 23 (2007) 740 – 751 www.elsevier.com/locate/jco
An effective algorithm for generation of factorial designs with generalized minimum aberration Kai-Tai Fanga,∗ , Aijun Zhangb , Runze Lic a BNU - HKBU United International College, Zhuhai Campus of Beijing Normal University, Jinfeng Road, Zhuhai,
519085, China b Department of Statistics, University of Michigan, Ann Arbor, MI 48105, USA c Department Statistics, Penn State University, University Park, PA 16802, USA
Received 28 September 2006; accepted 16 March 2007 Available online 4 May 2007
Abstract Fractional factorial designs are popular and widely used for industrial experiments. Generalized minimum aberration is an important criterion recently proposed for both regular and non-regular designs. This paper provides a formal optimization treatment on optimal designs with generalized minimum aberration. New lower bounds and optimality results are developed for resolution-III designs. Based on these results, an effective computer search algorithm is provided for sub-design selection, and new optimal designs are reported. © 2007 Elsevier Inc. All rights reserved. Keywords: Fractional factorial design; Generalized minimum aberration; Lagrange analysis; Sub-design selection
1. Introduction Fractional factorial designs (FFDs) are popular choices of designs of experiments in industry. Extensive research has been done on factorial designs in recent decades, with main focus on optimality theory and design construction. The two most successful optimality criteria are maximum resolution by Box and Hunter [2] and minimum aberration by Fries and Hunter [7]. However, these criteria are defined for regular designs only; they cannot be used to assess a factorial design in general. Recently, generalized minimum aberration (GMA) was proposed for both regular and non-regular designs, with the two-level case by Tang and Deng [14], and multi-level case by Ma ∗ Corresponding author.
E-mail addresses: [email protected] (K.-T. Fang), [email protected] (A. Zhang), [email protected] (R. Li). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.010
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
741
and Fang [10] and Xu and Wu [17]. More background on FFD and GMA will be presented in the next section. This paper is to study the optimal conditions for GMA designs, which is a non-trivial problem due to the sequential optimization nature of the criterion. Unlike conventional combinatorial approaches, we provide a formal treatment for resolution-III designs from the optimization perspective. It will be shown that our new optimality results can be viewed as a natural extension of the weak-equidistance optimality for resolution-II designs by Zhang et al. [18]. Here we restrict ourselves to the symmetrical designs, i.e. those in which all the factors take the same number of levels, while the methodology is readily generalized to mixed-level designs in a straightforward manner. There exist several approaches to the construction of GMA designs. Among others, Lin [8] proposed to use half fractions of Hadamard matrices for constructing two-level supersaturated designs (SSDs), and Fang et al. [5] proposed the RBIBD method for constructing multi-level SSDs. However, these construction methods are restricted to GMA designs of resolution II. For designs of resolution III or higher, it is most natural to consider the subset design approach based on existing classes of orthogonal arrays. Butler [3] obtained some GMA designs by projecting specific saturated orthogonal arrays. In this paper, we propose a general sub-design selection algorithm, which utilizes the newly developed lower bounds and optimality conditions. The paper is organized as follows. Some background material is presented in Section 2. In Section 3, new lower bounds and optimality results are developed for orthogonal FFDs of resolution III, by Lagrange analysis for the nonlinear programming problem and a strengthening technique to take into account the integer-valued condition. These results are applied in Section 4 for sub-design selection, where we provide an effective computer search algorithm and report some new optimal designs with GMA. In the final section, we discuss some possible routes for future works. Throughout the paper, we use the symbol x for representing the largest integer not exceeding x, x for the smallest integer not less than x and x = x − x for the fractional part of x. The Kronecker delta function is defined as (x, y) = 1 if x = y and 0 otherwise. For non-negative integers j and k, S(j, k) denotes the Stirling number of the second kind, i.e. the number of ways of partitioning a set of j elements into k non-empty sets. Furthermore, we extendthe definition of binomial coefficient function xj to cover any non-negative argument x ∈ R+ : xj = 1 if j = 0, x x(x−1)···(x−j +1) x otherwise. j! j = 0 if x < j and j = 2. Background A factorial design of n runs and s factors for which each factor takes q levels is denoted by D(n, q s ). The full factorial design with n = q s runs comprises all possible level combinations. An FFD with n>q s runs takes only some fraction of the runs required for the full factorial; see Wu and Hamada [15] for details. A particular FFD is often chosen to satisfy some constraint or optimize some condition or set of conditions. Two common conditions of interest are balance and orthogonality. Balance means that for each factor each level appears in the same number of runs. Orthogonality means that for each pair of factors, all the q 2 possible level-combinations appear equally often. 
Two designs are said to be isomorphic if one can be obtained from the other by re-ordering runs, permuting factors or switching levels of one or more factors. For given parameters (n, s, q), we use D(n, q s ), U(n, q s ) and L(n, q s ) to denote the sets of non-isomorphic designs that have no constraint, balanced constraint and orthogonality constraint, respectively. A design in the set of U(n, q s ) is also called a U-type design in the uniform design literature; see
742
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
Fig. 1. Nested classes of fractional factorial designs.
the recent monograph by Fang et al. [6]. Note that these sets of designs are nested by D(n, q s ) ⊃ U(n, q s ) ⊃ L(n, q s ). The column-wise study of fractional factorials is tightly connected with the notion of orthogonal arrays. An FFD D(n, q s ) can be viewed as an orthogonal array of strength t, often denoted by OA(n, s, q, t), if for each t-tuple of factors each level combination appears equally often. Similarly, let us use OA(n, s, q, t) to denote the set of non-isomorphic orthogonal arrays, where OA(n, s, q, 1) ≡ U(n, q s ) and OA(n, s, q, 2) ≡ L(n, q s ). For given (n, s, q), an illustration of the nested structure is given in Fig. 1. Rao [13] presented the following well-known conditions for the existence of OA(n, s, q, t): u s i if t = 2u i (q − 1) n i=0 (1) u s i + s−1 (q − 1)u+1 if t = 2u + 1. (q − 1) i=0 i u These general lower bounds of n for given (s, q, t) are called Rao’s bounds. They have been improved for many specific parameter settings, see e.g. Bose and Bush [1] and Mukerjee and Wu [12]. The row-wise study of factorial designs is tightly connected with the notion of error-correcting codes in MacWilliams and Sloane [11]. It leads to the definition of the generalized minimum j aberration (GMA) criterion. Let ik be the coincidence indicator between the ith and kth runs at the jth factor. For any 1i, k n, the (i, k)-coincidence of the design is defined as ik = s j j =1 ik . For an OA(n, s, q, t), Bose and Bush [1] derived the following necessary conditions on the coincidences: n ik n s for 1 i n, 1j t. (2) = j j q j k=1
On the other hand, Zhang et al. [18] proved that for any D(n, q s ), n n ik n 2 n s = Nv(u) − j for 1 j s, − j q j q j u v i=1
k=1
(3)
where u denotes the summation over all j-element subset of {1, . . . , s}, v denotes the sum(u) mation over q j j-tuple level-combinations, and Nv is the frequency of level-combination v appearing in a u-factor sub-design. It is implied that D is an OA(n, s, q, t) if the right-hand side of (3) vanishes for all j t. Thus, it is clear that the Bose–Bush identities (2) are also sufficient
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
743
conditions for D(n, q s ) to have orthogonal strength t. Besides, for a design D(n, q s ) that is saturated in the sense that n = 1 + s(q − 1), [12] derived that ik = s − n/q
for any 1 i < k n,
(4)
a useful property in many ways, with a typical example in the study of complementary designs. Maximum resolution and minimum aberration are well-known criteria for regular designs. They are extended to non-regular designs via row-wise coincidences and MacWilliams identities in coding theory, as noted by Ma and Fang [10] and Xu and Wu [17]. For any FFD D(n, q s ), define j n n 1 ik w j −w s − ik Aj (D) = 2 (−1) (q − 1) (5) w j −w n i=1 k=1 w=0
for 1 j s. The vector w(D) = (A1 (D), . . . , As (D)) is called the generalized word-length pattern (GWP), and the index of the first non-zero element corresponds to the resolution. For two designs, D1 is said to have less generalized aberration than D2 if the first non-zero element of w(D1 ) − w(D2 ) is negative. A design D∗ is said to have GMA if no other design has less generalized aberration than it. 3. Lower bounds and optimality results For given parameters (n, s, q), the GMA criterion tends to sequentially minimize the GWP from low to high orders, such that the selected designs not only have the maximum resolution (say r), but also have the smallest Ar -value. Furthermore, if there are multiple resolution-r designs with the same smallest Ar , the GMA criterion will sequentially reduce the set of candidates by minimizing Ar+1 (D), Ar+2 (D), . . . , until the resulting candidates all have the same optimal GWP. The lower bounds of Aj (D) for j = 1, 2, 3, . . . are of crucial importance in the search of GMA designs. Delsarte [4] derived that Aj (D) 0 for 1 j s, where the equality holds for all j t if and only if D(n, q s ) is an orthogonal array of strength t. Consider the set of candidate designs D ∈ St ≡ OA(n, s, q, t) \ OA(n, s, q, t + 1), for which A1 (D) = · · · = At (D) = 0 while At+1 (D) > 0. Schematically in Fig. 1, St for t = 1, 2, . . . represent the resolution-(t + 1) rings from outer to inner areas. For D ∈ St , there is lack of a tight lower bound for At+1 (D). This section presents lower bounds and optimality results for St with t = 1 and 2. We begin with a brief review of the weak-equidistant optimality for S1 , as obtained by [18] through majorization inequality. Then, we develop a general treatment for S2 from a formal optimization perspective. It will be shown that the optimality results for S2 can be viewed as a natural extension of those for S1 . 3.1. Balanced designs of resolution II Zhang et al. [18] provides the optimality results for U-type designs D ∈ S1 (n, q s ), namely, the weak-equidistance lower bounds: A2 (D)
q2 ((n − 1)( + 2 − 1) + s(s − 1)(1 − n/q 2 )), 2n
(6)
744
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
where = 0 and = 0 , based on the average of pairwise coincidences
1 0 = n
2 1 i
s(n − q) , q(n − 1)
ik =
(7)
as a consequence of (2) when j = 1. Recall that a design is called equidistant if any two distinct runs of the design have the same coincidence 0 , where 0 must be a positive integer. The lower bounds (6) are achieved if there exist weak-equidistant designs, i.e. designs whose pairwise coincidences satisfy with proportion (1 − ), ik = (8) + 1 with proportion for 1i < k n. For = 0, this reduces to the equidistant case. Such weak-equidistant designs have been shown to have GMA in S1 (n, q s ). More details can be referred to [9,16,18] in the context of optimal SSDs. 3.2. Orthogonal designs of resolution III For orthogonal FFDs in S2 (n, q s ) that have resolution III, the necessary conditions (2) imply both the mean type of constraint (7) and the following variance type of constraint on the pairwise coincidences: 2 1 ns(q − 1)(n − 1 − s(q − 1)) n ≡ 20 . (9) ik − 0 = 2 (n − 1)2 q 2 1 i
Our goal is to minimize A3 (D) for resolution-III designs D ∈ S2 (n, q s ). Since the original formulas (5) for Aj (D) involve Krawtchouk polynomials which are difficult to analyze, we employ a reformulation through power moments. A similar consideration appears in [16], who, however, employs an overly complicated formulation and proof. Here we need only a basic property of Stirling numbers S(j, l) of the second kind defined by xj =
j
S(j, l)x(x − 1) · · · (x − l + 1)
(for any real x),
l=1
as well as a recent result by [18]
1 i,k n
ik l
2 l n2 s − w s n Aw (D) + = l −n l−w q ql l w=1
for l = 1, . . . , s. Then, it is straightforward to obtain the following:
j
1 i
ik = j +
j
j l
l=1
l s−w Aw (D) for j = 1, . . . , s, l−w
w=1
(10)
− n . Since for each j the leading coefficient for 2q l 2 j Aj (D) is evaluated as n2qjj! > 0, it is clear that sequentially minimizing i
n2 S(j,l)l!
and j =
j
S(j,l)s! l=1 2(s−l)!
n2 ql
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
745
Now consider the optimization problem min
D∈S2
(n,q s )
i
3ik (D)
(11)
over pairwise coincidences (D) = {ik (D), 1i < k n}. Besides the mean and variance types of constraints (7) and (9), we have a boundary constraint for each ik based on the existence of OA(n, s, q, 2): max{0, s − n/q}ik s
for 1i < k n,
(12)
where the second inequality is obvious. The first inequality can be verified from (4) if D(n, q s ) can be embedded in a saturated design; otherwise one can refer to [3] for an alternative proof. In what follows we concentrate on the derivation of the lower bounds for (11) subject to the conditions (7), (9) and (12), as divided into two steps. 3.2.1. Step 1: Lagrange analysis We use the classical Lagrange analysis by assuming that each argument ik is a continuous variable falling in the interval given by (12). The Lagrangian function takes the form L(; 1 , 2 ) =
i
3ik − 1
(ik − 0 ) − 2
i
i
(2ik − 20 − 20 ),
where 1 and 2 are undetermined Lagrange multipliers. It is necessary for ˆ to be an optimal solution that there exist ˆ 1 and ˆ 2 such that the gradient ˆ ˆ 1 , ˆ 2 ) = 0. ∇L(;
(13)
Any root ˆ with zero gradient is said to be stationary, which corresponds to a minimum, maximum or saddle point. The equation (13) leads to
ˆ = 1 − ˆ 2 ± ˆ 2 + 3 ˆ 1 ik 2 3
for 1 i < k n.
Denote by ˆ a and ˆ b the two undetermined roots and assume ˆ a ˆ b . Let p ∈ (0, 1) be the ˆ Based on (7) and (9), we get the alternative expressions proportion of ˆ b that appears in . ˆ a = 0 − 0
p , 1−p
ˆ b = 0 + 0
1−p . p
The objective function (11) is evaluated to be 3 n 1 − 2p 30 + 30 20 + √ ˆ ik = 30 , 2 p(1 − p) 1 i
p ∈ (0, 1).
(14)
746
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
Now using the boundary condition that max{0, s − n/q} ik s, we get the feasible range of p: ⎧ ⎪ 20 ⎪ ⎪ if n qs, ⎪ ⎪ ⎪ 20 + 20 ⎪ ⎪ ⎨ 20 n 2 ≡ p p p ≡ (15) − s + max min 0 ⎪ q (s − 0 )2 + 20 ⎪ ⎪ otherwise. ⎪ 2 ⎪ ⎪ ⎪ ⎪ 2 + 0 − s + n ⎩ 0 q Substituting pmax into (14) leads to the lower bound for i
ˆ b = 0 + 20 /(0 − ˆ a ).
(16)
It is straightforward that 3ik = (ik − ˆ a )(ik − ˆ b )2 + LB1 LB1 , i
i
where the equality holds if ik takes either ˆ a or ˆ b for 1 i < k n. When ˆ b is not integer-valued, we have that (ik − ˆ a )(ik − ˆ b )(ik − ˆ b )0 for any integer-valued ik ˆ a . Thus, we can strengthen the lower bound by taking the values of ik to be either ˆ a , ˆ b or ˆ b . By (7) and (9), we have the optimal distribution of the pairwise coincidences ⎧ with proportion 1 − p1 − p2 , ˆ ⎪ ⎪ ⎨ a ik = ˆ b with proportion p1 , (17) ⎪ ⎪ ⎩ ˆ b with proportion p2 for 1i < k n, where ˆ a , ˆ b are as in (16), and the proportions are given by p1 = (0 − ˆ a − )/(ˆ b − ˆ a ),
p2 = /(ˆ b − ˆ a )
and = 20 + 20 + ˆ b ˆ a − 0 (ˆ b + ˆ a ). For a non-empty L(n, q s ) of orthogonal FFDs, it can be verified that there uniquely exists such an optimal distribution of pairwise coincidences. Obviously, the optimality conditions (17) for S2 (n, q s ) can be viewed as a natural extension of (8) for S1 (n, q s ).
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
Based on (17), the refined lower bound for (11) is given by 3ik LB2
747
(18)
i
=
n (ˆ b + ˆ b )(20 + 20 − 0 ˆ a ) − ˆ b ˆ b (0 − ˆ a ) + ˆ a (20 + 20 ) . 2
LB2 reduces to LB1 if ˆ b is an integer. Back to the generalized word-length pattern, by (5), the lower bound of A3 (D) is given by n−1 A3 (D) (1 − p1 − p2 )K3 (s − ˆ a ; s, q) + p1 K3 (s − ˆ b ; s, q) n (q − 1)j s ˆ +p2 K3 (s − b ; s, q) + , (19) n j where K3 (x; s, q) denotes the Krawtchouk polynomial K3 (x; s, q) =
3
w
(−1) (q − 1)
w=0
3−w
x s−x . w 3−w
Finally, we claim that the optimal pairwise distance distribution (17) leads to not only the smallest 3 i
748
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
(2) The pairwise distances ik for i < k take two different integers, with the smaller one given by max{0, s − n/q}. This corresponds to the optimal condition (16) when ˆ b is integer-valued. (3) The pairwise distances ik for i < k take three different integers, with the smallest given by max{0, s − n/q} and the rest two are consecutive integers. This corresponds to the strengthened optimal condition (17). Based on these stopping rules, we present a computer search algorithm for orthogonal sub j design selection. In the algorithm, the power moments i
(2) Collect all s-factor combinations among {1, 2, . . . , s0 }: s0 (l) (l) . Gl = j1 , . . . , js , l = 1, . . . , s (3) For each sub-design l = 1, 2, . . . , ss0 , (a) compute the coincidence matrix by [ik ]n×n = j ∈Gl Cj . Note that one can also update the coincidence matrix from l − 1 to l by adding Cj − Cj j ∈Gl \Gl−1
j ∈Gl−1 \Gl
when |Gl \Gl−1 | = 1 or 2. (b) Check ik ’s to see if they satisfy any of the stopping rules. If so, terminate the loop and output the sub-design with Gl -factors. Otherwise, compute the power moments 3ik , . . . , sik ; Bl = i
i
update the best record Bl ∗ = Bl if the first non-zero element of Bl − Bl ∗ is negative. s If any stopping rule is satisfied,the algorithm outputs a GMA design it D(n,3q ); otherwise, s0 reports the best sub-design among s candidates that has the smallest ( i
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
749
Table 1 Sub-design selection results from superset design oa.27.13.3.2 D(27, 3s )
Factor combination
s=4 5 6 7 8 9 10 11 12
{9, 11, 12, 13} {7, 9, 11, 12, 13} {6, 7, 8, 9, 12, 13} {6, 7, 8, 9, 11, 12, 13} {5, 6, 7, 8, 9, 11, 12, 13} {3, 5, 6, 7, 8, 9, 11, 12, 13} {3, 5, 6, 7, 8, 9, 10, 11, 12, 13} {3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13} {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}
GMA
i
Yes NS Yes NS Yes Yes Yes Yes Yes
3ik
Lower bound
1404 2322 3402 4968 6696 8748 11 772 14958 18 468
1404 2160 3402 4914 6696 8748 11 772 14 958 18 468
“NS” under “GMA” means “not sure”. The selected design has resolution IV for s = 4 and resolution III for s > 4. The GMA design at s = 12 is weak-equidistant.
Table 2 CPU time in seconds resulting from the early stopping rule (A) and the exhaustive search (B) for the oa.27.13.3.2 example s
4
5
6
7
8
9
10
11
12
Total
A B
0.0501 3.5852
1.8426 1.9728
0.0601 3.5651
2.6638 2.5537
0.0601 3.7354
0.0601 1.3119
0.0501 1.2518
0.0501 0.4006
0.0401 0.2203
4.8770 18.5967
RAM. Compared to the exhaustive search in traditional sub-design selection procedures, the computational complexity of our algorithm is greatly reduced, since it terminates immediately upon hitting any stopping rule. To illustrate this gain in computing time, Table 2 lists CPU time cost for the early stopping rule and the exhaustive search, respectively. For large s0 , the algorithm becomes intractable since the total number of combinations, ss0 , grows exponentially. Because many of the sub-designs are actually isomorphic and yield the same coincidence distribution, we can employ a simple trick of randomization to avoid the useless duplicates and overcome the intractability problem. Let us call D(n, q s ) a random sub-design of D0 (n, q s0 ) if it selects s out of s0 factors at random. Let M be a pre-defined maximal number of trials. Then, the second step inthe algorithm above can be modified to M random sub-designs instead of using the complete ss0 candidates. Our experience suggests that M = 200 would succeed in finding the GMA sub-design, whenever there exists a candidate sub-design that satisfies a stopping rule. Otherwise, a more conservative choice of M is recommended. By inputting different supersets from Sloane’s orthogonal arrays website (in particular, those oa.n.s0 .q.2 of strength 2), scanning through the sub-designs for s = 3, . . . , s0 , a large number of GMA designs can be found by our orthogonal sub-design selection algorithm. These optimal designs have resolution III or higher, including both two-level and multi-level cases. It is also interesting to note that the construction results of [3] can be all reproduced by our algorithm, including the GMA designs D(27, 3s ) for s = 8, . . . , 12 (cf. Table 1), D(81, 4s ) for s = 15, . . . , 20 and D(50, 5s ) for s = 3, . . . , 11. Finally, we report some GMA designs that are new to the FFD literature, as summarized in Table 3. These GMA designs have not appeared elsewhere, to the best of our knowledge. Furthermore, there is no restriction to choose other superset designs in the search of new GMA designs.
750
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
Table 3 Some new GMA designs to the FFD literature Superset
GMA Design
Factor combination
oa.27.13.3.2
D(27, 34 ) D(27, 36 )
cf. Table 1
oa.81.40.3.2
D(81, 336 ) D(81, 337 )
{2, 8, 16, 37} deleted {16, 26, 36} deleted
oa.64.21.4.2
D(64, 45 ) D(64, 46 ) D(64, 49 ) D(64, 412 )
{9, 16, 17, 18, 19} {2, 8, 13, 16, 18, 19} {3, 4, 10, 12, 14, 15, 16, 17, 20} {1, 3, 4, 5, 6, 7, 8, 9, 11, 17, 18, 20}
5. Conclusion This paper demonstrates how to apply the classical optimization approach to study the GMA criterion for FFDs, by employing Lagrange analysis and a strengthening trick to sharpen the lower bounds. We have shown that the new optimality conditions (17) for orthogonal designs can be viewed as an extension of weak-equidistance optimality (8) for balanced designs. These conditions serve as the early stopping rules in our orthogonal sub-design selection algorithm. Before ending the paper, we discuss several open problems for possible routes of future research. First, it is interesting to extend the GMA optimality to mixed resolution-III designs D(n, q1s1 q2s2 ). One can refer to [9] for this kind of effort in extending the weak-equidistance optimality of resolution-II designs. Second, optimal conditions for GMA designs with higher resolution (r 4) can be also analyzed via pairwise coincidences. Similar to the optimization setting (11), one could focus on the minimization of i
K.-T. Fang et al. / Journal of Complexity 23 (2007) 740 – 751
751
[3] N.A. Butler, Generalised minimum aberration construction results for symmetrical orthogonal arrays, Biometrika 92 (2005) 485–491. [4] P. Delsarte, An algebraic approach to the association schemes of coding theory, Philips Res. Rep. Suppl. 10 (1973). [5] K.-T. Fang, G.N. Ge, M.-Q. Liu, H. Qin, Construction of minimum generalized aberration designs, Metrika 57 (2003) 37–50. [6] K.-T. Fang, R. Li, A. Sudjianto, Design and Modeling for Computer Experiments, Chapman & Hall, Boca Raton, 2006. [7] A. Fries, W.G. Hunter, Minimum aberration 2k−p designs, Technometrics 22 (1980) 601–608. [8] D.K.J. Lin, A new class of supersaturated designs, Technometrics 35 (1993) 28–31. [9] M.Q. Liu, K.T. Fang, F.J. Hickernell, Connections among different criteria for assymetrical fractional factorial designs, Statist. Sinica 16 (2006) 1285–1297. [10] C.X. Ma, K.T. Fang, A note on generalized aberration in factorial designs, Metrika 53 (2001) 85–93. [11] F.J. MacWilliams, N.J.A. Sloane, The Theory of Error-Correcting Codes, North-Holland, Amsterdam, 1977. [12] R. Mukerjee, C.F.J. Wu, On the existence of saturated and nearly saturated asymmetrical orthogonal arrays, Ann. Statist. 23 (1995) 2102–2115. [13] C.R. Rao, Factorial experiments derivable from combinatorial arrangements of arrays, J. Roy. Statist. Soc. B 9 (1947) 128–139. [14] B. Tang, L.Y. Deng, Minimum G2 -aberration for nonregular fractional factorial designs, Ann. Statist. 27 (1999) 1914–1926. [15] C.F.J. Wu, M. Hamada, Experiments: Planning, Analysis, and Parameter Design Optimization, Wiley, New York, 2000. [16] H. Xu, Minimum moment aberration for nonregular designs and supersaturated designs, Statist. Sinica 13 (2003) 691–708. [17] H. Xu, C.F.J. Wu, Generalized minimum aberration for asymmetrical fractional factorial designs, Ann. Statist. 29 (2001) 1066–1077. [18] A. Zhang, K.T. Fang, R. Li, A. Sudjianto, Majorization framework for balanced lattice designs, Ann. Statist. 33 (2005) 2837–2853.
Journal of Complexity 23 (2007) 752 – 772 www.elsevier.com/locate/jco
Lattice-Nyström method for Fredholm integral equations of the second kind with convolution type kernels Josef Dicka , Peter Kritzerb , Frances Y. Kuoc,∗ , Ian H. Sloanc a Division Engineering, Science & Technology, UNSW Asia Tanglin Campus, 1 Kay Siang Road, Singapore 248922,
Singapore b Fachbereich Mathematik, Universität Salzburg, HellbrunnerstraYe 34, A-5020 Salzburg, Austria c School of Mathematics and Statistics, University of New South Wales, Sydney NSW 2052, Australia
Received 30 November 2006; accepted 19 March 2007 Available online 7 April 2007 Dedicated to Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract
We consider Fredholm integral equations of the second kind of the form f (x)=g(x)+ k(x−y)f (y) dy, where g and k are given functions from weighted Korobov spaces. These spaces are characterized by a smoothness parameter > 1 and weights 1 2 · · ·. The weight j moderates the behavior of the functions with respect to the jth variable. We approximate f by the Nyström method using n rank-1 lattice points. The combination of convolution and lattice group structure means that the resulting linear system can be solved in O(n log n) operations. We analyze the worst case error measured in sup norm for functions g in the unit ball and a class of functions k in weighted Korobov spaces. We show that the generating vector of the lattice rule can be constructed component-by-component to achieve the optimal rate of convergence O(n−/2+ ), > 0, with the implied constant independent of the dimension d under an appropriate condition on the weights. This construction makes use of an error criterion similar to the worst case integration error in weighted Korobov spaces, and the computational cost is only O(n log nd) operations. We also study the notion of QMC-Nyström tractability: tractability means that the smallest n needed to reduce the worst case error (or normalized error) to ε is bounded polynomially in ε−1 and d; strong tractability means that the bound is independent of d. We prove that strong QMC-Nyström tractability in the
∗ Corresponding author. Fax: +61 2 93857123.
E-mail addresses: [email protected] (J. Dick), [email protected] (P. Kritzer), [email protected] (F.Y. Kuo), [email protected] (I.H. Sloan). 0885-064X/$ - see front matter © 2007 Published by Elsevier Inc. doi:10.1016/j.jco.2007.03.004
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
753
absolute sense holds iff ∞ j =1 j < ∞, and QMC-Nyström tractability holds in the absolute sense iff lim supd→∞ dj =1 j / log(d + 1) < ∞. © 2007 Published by Elsevier Inc. Keywords: Lattice rules; Quasi-Monte Carlo rules; Nyström method; Fredholm integral equations; Worst case error; Tractability
1. Introduction We study certain Fredholm integral equations of the second kind: f (x) = g(x) + (x, y)f (y) dy, [0,1]d
(1)
where the kernel is assumed to be of the form (x, y) = k(x − y),
(2)
with k(x) having period one in each component of x. Further, we assume that g and k belong to a weighted Korobov space H (and hence are continuous on [0, 1]d ) and that they are known explicitly (and hence we can evaluate g and k at any point in [0, 1]d ). The general Fredholm integral equation problem has been analyzed in many papers under many different settings, usually without the convolution assumption, see for example [5–9,15,17–19] and the references therein. The weighted Korobov spaces have also been considered in many papers, see for example [16]. These spaces are characterized by a smoothness parameter > 1 and weights 1 1 2 · · · > 0, where j moderates the behavior of the functions with respect to the jth variable; a small j means that the functions depend weakly on the jth variable. More general weights are considered in [4]. We approximate f using the Nyström method based on quasi-Monte Carlo (QMC) rules, that is, equal-weight integration rules. Let t1 , . . . , tn be points in [0, 1]d . Our approximation of f is given by 1 (x, ti )fn (ti ), n n
fn (x) := g(x) +
(3)
i=1
where the function values fn (t1 ), . . . , fn (tn ) are obtained by solving the linear system 1 (tj , ti )fn (ti ), n n
fn (tj ) = g(tj ) +
j = 1, . . . , n.
(4)
i=1
We shall refer to our method formally as the QMC-Nyström method. Further assumptions on the kernel (or equivalently, the function k), the value n, and the points t1 , . . . , tn are needed to ensure the stability and the existence of a unique solution for (4). The details are given in the next section. We analyze the worst case error of the QMC-Nyström method, which is essentially the worst possible error f − fn , measured in sup norm, across functions g in the unit ball and a class of functions k in a weighted Korobov space; the precise definition is given in the next section. In particular, we seek a good lattice point set t1 , . . . , tn which leads to as small a worst case error
754
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
as possible; hence the name lattice-Nyström method. A rank-1 lattice rule is a QMC rule with points given by ti = {iz/n}, i = 1, 2, . . . , n. Here z is known as the generating vector, which is an integer vector having no factor in common with n, and the braces around a vector indicate that each component of the vector is to be replaced by its fractional part. In analogy to known results on lattice rules for the integration problem in weighted Korobov spaces (see for example [3,10,16]), we prove in Theorem 5 that, for a sufficiently large n, a generating vector z can be constructed component-by-component for the integral equation problem such that the worst case error achieves the optimal rate of convergence O(n−/2+ ),
> 0,
in weighted Korobov spaces. Moreover, the implied constant in the big-O notation can be bounded polynomially in d or even independently of d provided that the weights j satisfy certain conditions. The group structure of lattice points, together with the convolution assumption (2), means that (tj , ti ) = k(tj − ti ) = k(t(j −i) mod n ),
with t0 := tn .
Thus forming the linear system (4) requires a total of 2n function evaluations, that is, n evaluations of the function g and n evaluations of the function k at the lattice points. It also means that the linear system (4) can be solved using the Fast Fourier Transform, with only O(n log n) operations. This has been studied in [20]. We also study tractability and strong tractability of the QMC-Nyström method in the absolute and/or normalized sense. Roughly speaking, tractability in the absolute sense means that the minimal value of n needed in the QMC-Nyström method to reduce the worst case error to ε ∈ (0, 1) is bounded polynomially in d and ε−1 ; strong tractability means that the bound is independent of d. We show in Theorem 6 that strong QMC-Nyström tractability in the absolute sense holds iff ∞
j < ∞,
(5)
j =1
and QMC-Nyström tractability in the absolute sense holds iff d lim sup d→∞
j =1 j
log(d + 1)
< ∞.
(6)
(Strong) tractability in the normalized sense is defined in terms of the normalized error with respect to the initial error. Conditions (5) and (6) are also sufficient conditions for (strong) QMCNyström tractability in the normalized sense, but we were unable to prove that they are also necessary. We stress at this point that our tractability study is restricted to the QMC-Nyström method. It is entirely possible that different necessary and/or sufficient conditions hold for the unrestricted class of algorithms. This paper is organized as follows. In Section 2 we formulate the problem and we define the worst case error criterion and the notion of QMC-Nyström tractability. Section 3 contains the main results of this paper. We obtain worst case error bounds and derive necessary and/or sufficient
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
755
conditions for QMC-Nyström tractability. We also prove that the generating vector for a lattice rule can be constructed component-by-component to achieve the optimal rate of convergence. Finally, in Section 4 we give some additional remarks. 2. Problem formulation 2.1. Preliminaries Let D = [0, 1]d , and let C = C(D) denote the class of continuous functions on D equipped with the sup norm f sup = supx∈D |f (x)|. For the space of bounded linear operators from C to C, we equip it with the usual induced operator norm T = T C→C = supf sup 1 Tf sup . In particular, for a given kernel ∈ C(D ×D) we are interested in the integral operator K : C → C, Kf = (·, y)f (y) dy, with K = max |(x, y)| dy, x∈D
D
D
and the corresponding discrete operator Kn : C → C, 1 (·, ti )f (ti ), n n
Kn f =
1 |(x, ti )|, n n
with Kn = max x∈D
i=1
i=1
where t1 , . . . , tn ∈ D. The operator K is compact and the sequence {Kn } is collectively compact, 1 see Anselone [1]. Throughout this paper we consider kernels of the form (x, y) = k(x − y) with k ∈ C periodic. Thus
1 |k(x − ti )| ksup , n n
|k(y)| dy ksup
K =
and
Kn = max x∈D
D
i=1
where the inequalities become equalities when k is a constant function. (d) Let H = H, (D) denote a weighted Korobov space, where = (j )j 1 is a sequence of positive weights and > 1 is a smoothness parameter. For any f (x) = fˆ(h) e2ih·x , with fˆ(h) = f (x) e−2ih·x dx, D
h∈Zd
the norm of f in H is given by ⎛ ⎞1/2 f H = ⎝ |fˆ(h)|2 r (, h)⎠ , h∈Zd
where r (, h) =
d
j =1
r (j , hj ),
with r (, h) =
1
if h = 0,
−1 |h|
otherwise.
1 A set of linear operators from one normed linear space to another is collectively compact iff the union of the images of the unit ball is precompact, i.e., its closure is compact.
756
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
Additionally, we assume that 1 1 2 · · · > 0 and thus r (, h)1 for all h ∈ Zd . Using the Cauchy–Schwarz inequality, we have for all f ∈ H ⎞1/2 ⎛ ⎞1/2 ⎛ 1 ⎠ f sup |fˆ(h)| ⎝ |fˆ(h)|2 r (, h)⎠ ⎝ r (, h) d d d h∈Z
h∈Z
= f H
h∈Z
d
1 + 2()j
1/2
,
(7)
j =1
∞ −x denotes the Riemann Zeta function. Thus H is embedded in C. where (x) := h=1 h Furthermore, the inequalities in (7) become equalities when f is a multiple of the function 2ih·x /r (, h). d e h∈Z 2.2. Fredholm integral equations and the Nyström method Given g, k ∈ H , we study the solution S(g, k) := f of the Fredholm integral equation (1), which we express as f = g + Kf, or as (I − K)f = g, where I : C → C denotes the identity operator If = f . Assuming that the operator (I − K)−1 exists, by the Fredholm alternative we have (I − K)−1 < ∞, and f = (I − K)−1 g. Since (x, y) = k(x − y), it is easily shown that ˆ fˆ(h) e2ih·x , (Kf )(x) = k(h) h∈Zd
ˆ fˆ(h) for all h ∈ Zd . Thus we have implying fˆ(h) = g(h) ˆ + k(h) fˆ(h) =
g(h) ˆ . ˆ 1 − k(h)
(8)
Hence one way to approximate f is to use approximations of the Fourier coefficients of g and k. This approach is especially useful if we do not have complete information about g and k, but have access to only finitely many point values. This will be studied in a separate paper. ˆ Since (I − K)−1 e2ih·x = e2ih·x /(1 − k(h)), we have (I − K)−1
1 ˆ |1 − k(h)|
∀ h ∈ Zd ,
(9)
ˆ ˆ which guarantees that k(h) = 1. Because k ∈ H , we have k(h) → 0 for large h, and thus (9) allows us to deduce that (I − K)−1 1.
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
757
Moreover we have from (8) and (9) that ⎛
f H
⎞1/2 2 g(h) r (, h)⎠ (I − K)−1 gH , ˆ =⎝ ˆ d 1 − k(h)
(10)
h∈Z
where the inequality becomes equality when g and k are both constant functions. We emphasize that the norm of (I − K)−1 in (10) is the operator norm in C, not in H. Using the QMC-Nyström method, we approximate f by the algorithm An (g, k) := fn , with fn given by (3), or alternatively expressed, f n = g + K n fn , where the function values fn (t1 ), . . . , fn (tn ) are to be obtained by solving the linear system (4). Suppose that n := (I − K)−1 (K − Kn )Kn < 1, then the operator (I − Kn )−1 exists and (I − Kn )−1
1 + (I − K)−1 Kn , 1 − n
see [1]. Then fn is well defined and we have fn = (I − Kn )−1 g. Note that n < 1 is essentially a condition on the value of n and the quality of the points t1 , . . . , tn . Provided that Kf − Kn f → 0 for all f ∈ C, the collective compactness of {Kn } yields (K − Kn )Kn → 0. More details can be found in [1]. 2.3. Error formulation We are ready to define the integral equation problem on H. Let > 0
> 1
and
be fixed. Recall that S(g, k) = (I − K)−1 g
and
An (g, k) = (I − Kn )−1 g.
We define the worst case error of a QMC-Nyström method by en,d (An ) :=
sup gH 1 kH , (I −K)−1
S(g, k) − An (g, k)sup ,
that is, we are interested in a class of problems where k ∈ H satisfies kH and
1(I − K)−1 .
(11)
758
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
Due to linearity in g, we have for all g ∈ H and all k satisfying (11) that S(g, k) − An (g, k)sup en,d (An ) gH . However, a similar result does not hold for k and this is why we need to specify the size of the norm kH . On the other hand, the integral equation problem depends crucially on the operator norm of (I − K)−1 and thus the condition (I − K)−1 controls the difficulty of the problem. Note that the parameters and in (11) are mutually independent, in that for appropriate choices of k either kH or (I − K)−1 can be arbitrarily large while the other is bounded. The initial error associated with the zero algorithm A0 ≡ 0 is defined as e0,d :=
sup gH 1 kH , (I −K)−1
S(g, k)sup .
For ε ∈ (0, 1), we are interested in the smallest value of n for which either en,d (An ) ε, which corresponds to tractability in the absolute sense, or en,d (An ) εe0,d , which corresponds to tractability in the normalized sense. First we define tractability in the absolute sense. For ε ∈ (0, 1) and d 1, let nabs (ε, d) := min{n : ∃QMC-Nystr o¨ m method An with en,d (An ) ε}. The integral equation problem is said to be QMC-Nyström tractable in the absolute sense iff there exist nonnegative constants C, p and q independent of ε and d such that nabs (ε, d) C ε−p d q
∀ ε ∈ (0, 1) ∀ d 1.
The problem is said to be strongly QMC-Nyström tractable in the absolute sense iff the above condition holds with q = 0. Tractability and strong tractability in the normalized sense can be defined in a similar way, with nabs (ε, d) replaced by nnor (ε, d) := min{n : ∃QMC-Nystr o¨ m method An with en,d (An ) ε e0,d }. Note that a total of 2n function evaluations (of g and k) are needed to form the linear system (4) for the lattice-Nyström method, while as many as n2 + n function evaluations may be required for a general QMC-Nyström method. 3. Error analysis 3.1. Initial error For all g satisfying gH 1 and all k satisfying (11), it follows from (7) and (10) that S(g, k)sup (I − K)−1 gH
d
1 + 2()j
j =1
1/2
d
1 + 2()j
1/2
.
j =1
This provides an upper bound on the initial error e0,d . Note that this upper bound does not depend on .
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
759
To obtain a lower bound on the initial error, we consider specific functions g and k. Let k ≡ c, with 1 0 < c := min , 1 − < 1. Then kH = c and (I − K)−1 = 1/(1 − c) . We define ⎛ ⎞ 1 ⎝ e2ih·x g(x) := − c⎠ , G r (, h) d h∈Z
with G :=
d
1 + 2()j
1/2
.
j =1
Then gH 1, and it is not hard to see that g(h) ˆ 1 = ˆ G r (, h) 1 − k(h)
∀ h ∈ Zd .
Thus for this choice of g and k, it follows from (8) that d
1 e2ih·x
1/2 = S(g, k)sup = . 1 + 2()j G h∈Zd r (, h) j =1 sup
Hence we have a lower bound on the initial error with the same dependence on d as the upper bound obtained before. In other words, we know exactly how the initial error increases with d. This lower bound does not give an indication of the dependence on and . A different lower bound can be obtained by choosing g ≡ k ≡ c with c defined as above. In this case, S(g, k)sup = c/(1 − c). Our analysis leads to the following result. Lemma 1. Let c := min(, 1 − 1/ ). The initial error satisfies ⎛
⎞ d d
1/2
1/2 c ⎠ e0,d max ⎝ . 1 + 2()j 1 + 2()j , 1−c j =1
j =1
Note that if 1 then c/(1 − c) = − 1. In this case, we see that the initial error increases linearly with . 3.2. Lower bound on the worst case error Again we consider a constant function k ≡ c, with c := min(, 1 − 1/ ). Then kH and (I − K)−1 . Moreover, for any g it is easy to show that c f = (I − K)−1 g = g + g(x) dx, 1−c D
760
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
and fn = (I − Kn )−1 g = g +
c 1 g(ti ). 1−c n n
i=1
Thus it follows by definition that: en,d (An )
sup S(g, k) − An (g, k)sup
gH 1
c = 1−c
n 1 g(ti ) sup g(x) dx − n gH 1 D i=1
c ewor-int (t1 , . . . , tn ), = 1 − c n,d
(12)
wor -int (t , . . . , t ) denotes the worst case integration error in H using quadrature points where en,d 1 n t1 , . . . , t n . It is known from [16] that in weighted Korobov spaces we have 1 2 n−1 2()1 1/2 wor -int wor -int 0, , , . . . , = . en,d (t1 , . . . , tn ) en,1 n n n n
This rate of convergence of O(n−/2 ) is optimal for the integration problem in weighted Korobov spaces (see also Sharygin’s lower bound [9]). In fact, it was proved in [3,10] that a generating vector z for a rank-1 lattice rule can be constructed component-by-component to achieve the rate of convergence O(n−/2+ ), > 0. Not surprisingly, this is also the optimal rate of convergence for the integral equation problem. Later we will show that a generating vector z for a rank-1 lattice rule can be constructed component-by-component, based on a different error criterion, to achieve this optimal rate of convergence. In terms of the dependence on d, it was shown in [16] that ⎞1/2 ⎛ d
1 wor -int 1 + 2() j − 1⎠ , en,d (t1 , . . . , tn ) ⎝ n j =1
1/(21 |min |))1, with −1 < min < −1 + 2− denoting the minimum of where := min(1, the function (x) = ∞ h=1 cos(2hx)/ h , see [2] or [4, Eq. (26)]. Moreover, it was proved in [16] that the integration problem in weighted Korobov spaces is strongly QMC tractable iff (5) holds, and QMC tractable iff (6) holds. Note that since the initial integration error is exactly 1, there is no need to distinguish between tractability in the normalized sense and tractability in the absolute sense. We summarize the lower bounds in the following lemma. Lemma 2. Let c := min(, 1−1/ ). The worst case error for the QMC-Nyström method satisfies ⎞1/2 ⎛ d
c 1 2() 1 en,d (An ) , 1 + 2() j − 1⎠ , max ⎝ 1−c n n j =1
where 1 is some constant independent of n and d.
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
761
For tractability in the absolute sense, we see from relationship (12) that (5) and (6) are necessary conditions for strong QMC-Nyström tractability and QMC-Nyström tractability, respectively. Later we will see that these conditions are also sufficient for tractability in the absolute sense. Unfortunately, we cannot obtain necessary conditions for tractability in the normalized sense because the d-dependence in the lower bound is too weak compared with the bounds on the initial error. Indeed, since 1, we cannot see how the normalized error en,d (An )/e0,d increases with d. 3.3. Upper bound on the worst case error By subtracting (I − Kn )fn = g from (I − Kn )f = (I − K)f + (K − Kn )f = g + (K − Kn )f , we obtain f − fn = (I − Kn )−1 (K − Kn )f. Thus S(g, k) − An (g, k)sup = f − fn sup (I − Kn )−1 (K − Kn )f sup . Recall that (I − Kn )−1
1 + (I − K)−1 Kn , 1 − n
when n := (I − K)−1 (K − Kn )Kn < 1. We can bound Kn as follows: Kn ksup kH
d
(1 + 2()j )1/2 .
j =1
Hence we can write f − fn sup
1 + (I − K)−1 kH
d
j =1 (1 + 2()j )
1 − (I − K)−1 (K − Kn )Kn
1/2
(K − Kn )f sup .
The term (K − Kn )Kn controls whether or not n < 1, while (K − Kn )f sup determines the rate of convergence. It remains to obtain bounds on these two terms. Let t1 , . . . , tn be rank-1 lattice points generated by z, that is, ti = {iz/n} where {x} = x − x . We have n 1 k(x − y)f (y) dy − k(x − ti )f (ti ) ((K − Kn )f )(x) = n D i=1
=−
h∈Z \{0} h·z≡0 (mod n) d
Fˆx (h),
762
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
where Fx (y) := k(x − y)f (y), and ˆ Fx (h) = k(x − y)f (y) e−2ih·y dy D 2il·x ˆ ˆ = k(l)f (p) e e2i(p−l−h)·y dy D
l∈Zd p∈Zd
=
ˆ fˆ(h + l) e2il·x . k(l)
l∈Zd
Thus it follows from the Cauchy–Schwarz inequality that: 2 i l · x ˆ fˆ(h + l) e (K − Kn )f sup = sup k(l) x∈D d d ∈Z \{0} l∈Z h·zh≡0 (mod n) ˆ |k(l)|| fˆ(h + l)| h∈Zd \{0} l∈Zd h·z≡0 (mod n)
⎡
⎛
⎞1/2
⎜ ⎢ ⎢ ˆ ⎜ ⎢|k(l)| ⎜ ⎣ ⎝ d
l∈Z
⎛ ⎜ ⎜ ×⎜ ⎝
h∈Zd \{0} h·z≡0 (mod n)
h∈Zd \{0} h·z≡0 (mod n)
⎛
f H ⎝
⎟ ⎟ |fˆ(h + l)|2 r (, h + l)⎟ ⎠ ⎞1/2 ⎤
⎟ 1 ⎟ ⎟ r (, h + l) ⎠ ⎞1/2
2 ˆ |k(l)| r (, l)⎠
l∈Zd
⎛
⎜ 1 ⎜ ×⎜ ⎝ d r (, l) l∈Z
⎞1/2 h∈Zd \{0} h·z≡0 (mod n)
⎟ 1 ⎟ ⎟ r (, h + l) ⎠
(I − K)−1 gH kH Sn,d (z), where in the last step we used (10) and the definition ⎛ ⎜ ⎜ Sn,d (z) := ⎜ ⎝
h∈Z \{0} l∈Z h·z≡0 (mod n) d
d
⎥ ⎥ ⎥ ⎦
(13)
⎞1/2
⎟ 1 ⎟ ⎟ r (, l) r (, h + l) ⎠
.
(14)
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
763
Using a similar argument to that above, we obtain (K − Kn )Kn n n 1 1 = sup k(x − ti )k(ti − tj ) k(x − y)k(y − tj ) dy − n x∈D n j =1 D i=1 n 1 ˆ k(h ˆ + l) e2il·x e−2i(h+l)·tj = sup k(l) x∈D n j =1 d ∈Zd \{0} l∈Z h·zh≡0 (mod n) ˆ ˆ + l)| |k(l)|| k(h h∈Zd \{0} l∈Zd h·z≡0 (mod n)
k2H Sn,d (z). Therefore, when k satisfies (11) we have n (I − K)−1 k2H Sn,d (z) 2 Sn,d (z). To ensure that n < 1, it is sufficient to demand that Sn,d (z) < 1/( 2 ). When this holds, we have 1 + (I − K)−1 kH dj =1 (1 + 2()j )1/2 −1 (I − Kn ) 1 − n d 1 + j =1 (1 + 2()j )1/2 1 − 2 Sn,d (z)
d
1 + 2
1 − Sn,d (z) j =1
1 + 2()j
1/2
.
Thus for g satisfying gH 1 and k satisfying (11), we have f − fn sup (I − Kn )−1 (I − K)−1 gH kH Sn,d (z)
d (1 + ) Sn,d (z)
1 − 2 Sn,d (z)
(1 + 2()j )1/2 .
j =1
We summarize this discussion in the following lemma. Lemma 3. Suppose there exists an integer vector z for which Sn,d (z) defined in (14) satisfies Sn,d (z) <
1 2
.
Then the worst case error for the lattice-Nyström method satisfies en,d (An )
d (1 + ) Sn,d (z)
2
1 − Sn,d (z)
(1 + 2()j )1/2 .
j =1
764
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
We need Sn,d (z) < 1/( 2 ) to control the denominator in the error bound (to ensure that n < 1). As long as Sn,d (z) converges with n, this condition can be trivially fulfilled with a large enough n. On the other hand, the Sn,d (z) in the numerator determines the rate of convergence of the worst case error. 3.4. Component-by-component construction of z Here we present an algorithm for constructing a generating vector z that leads to the optimal rate of convergence O(n−/2+ ), > 0. For simplicity, we restrict ourselves to n being a prime number. In this case, the components of the generating vector z can be restricted to the set {1, 2, . . . , n − 1}. Algorithm 1. Let n be a prime number. 1. Set z1 = 1. 2. For s = 2, 3, . . . , d, with z1 , z2 , . . . , zs−1 already chosen and fixed, find zs ∈ {1, 2, . . . , n−1} to minimize Sn,s (z1 , . . . , zs−1 , zs ). Lemma 4. Let n be prime and let z∗ ∈ {1, 2, . . . , n − 1}d be constructed by Algorithm 1. Then ∗
Sn,d (z )
d
1 n1/(2 )
1 + 2(1 + )1/2 ( ) j
1/
j =1
for all ∈ (1/, 1] and ∈ (0, 2−3 ]. Proof. The proof of this lemma is long and tedious, and is therefore deferred to Appendix. We now obtain a sufficient condition on n to ensure that Sn,d (z) < 1/( 2 ). It is enough to choose n such that the upper bound in Lemma 4 with = 1 and = 2−3 is no greater than, say, 1/(2 2 ). In other words, if n(2 2 )2 26
d
1 + 2(1 + 2−3 )1/2 ()j
2 ,
(15)
j =1
then Sn,d (z)1/(2 2 ), and we conclude from Lemmas 3 and 4 that en,d (An )
d 1/
1/2 2(1 + ) 1/2 ) ( ) 1 + 2(1 + 1 + 2()j j 1/(2 ) n j =1
for all ∈ (1/, 1] and ∈ (0, 2−3 ]. Taking = 1/( − 2) with min(2−3 , ( − 1)/2)), we see that en,d (An ) = O(n−/2+ ). Comparing this with the first lower bound in Lemma 2, we see that this is the optimal rate of convergence. Using the property ⎞ ⎛ ⎞ ⎛ d d d d
(1 + xj )= exp ⎝ log(1+xj )⎠ exp ⎝ xj ⎠ = (d+1) j =1 xj / log(d+1) (16) j =1
j =1
j =1
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
765
for all xj > 0, we see that the requirement (15) on n does not grow with d if (5) holds, and it grows only polynomially with d when (6) holds. Conditions (5) and/or (6) are also sufficient to ensure that en,d (An ) does not grow faster than polynomially with d. However, we will need to assume stronger conditions on the weights if we want to have the optimal rate of convergence at the same time. Theorem 5. Suppose n is a prime number satisfying (15). Then the generating vector z∗ constructed by Algorithm 1 achieves the optimal rate of convergence, with en,d (An )Cd, n−/2+
en,d (An ) Cd, n−/2+ e0,d
and
d, are independent of n, but depend on for all ∈ (0, min(2−3 , ( − 1)/2)], where Cd, and C and d. Additionally, if ∞
1/(−2)
j
< ∞,
j =1
d, , and the requirement (15) on n, can be bounded independently then the numbers Cd, and C of d. To implement Algorithm 1, we need a computable expression for Sn,d (z). We can write d 1
2 1 + 2(2)2j + 1 + 2()j n j =1 j =1 ⎛ ⎞2 d n−1 e2ikhzj /n 1 ⎝ ⎠ . 1 + j + n |h|
2 (z) = − Sn,d
d
(17)
h∈Z\{0}
k=1 j =1
This expression is very similar to the squared worst case integration error (see for example [16]). If is an even integer, then the inner sum over h can be computed via e2ikhzj /n kzj (2) B , = |h| n (−1)/2+1 ! h∈Z\{0}
where B is the Bernoulli polynomial of degree . Following [14] and using the Fast Fourier Transform, the component-by-component construction based on the quantity Sn,d (z) requires only O(n log n d) operations. In other words, the computational cost is no worse than that for the integration problem. 3.5. Tractability First we analyze tractability in the absolute sense. For ε ∈ (0, 1), we want to find the smallest n for which en,d (An ) ε. From Lemma 3 we see that it is sufficient to insist that Sn,d (z)
ε −1 (1 + )
d
1
j =1 (1 + 2()j )
1/2
+ 2
,
(18)
766
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
the right-hand side of which is less than 1/( 2 ). Using Lemma 4, we see that Algorithm 1 will generate a vector z satisfying (18) if we demand that ⎡ ⎛ d 2 ⎢ 1
⎜ 1 + 2(1 + )1/2 ( ) j n pr ⎝ min ⎣ 2 ∈(1/,1] j =1 ∈(0,2−3 ] ⎞2 ⎤⎞ ⎛ d
⎥⎟ (19) × ⎝ε−1 (1 + ) (1 + 2()j )1/2 + 2 ⎠ ⎦⎠ , j =1
where pr(x) denotes the smallest prime number greater than or equal to x. Hence we conclude that nabs (ε, d) is less than or equal to the right-hand side of (19). Note that pr(x)2x, since there is a prime number in the interval [k, 2k] for any positive integer k. (This is known as “Bertrand’s postulate”, proved by Chebyshev in 1850.) On the other hand, the second lower bound in Lemma 2 implies nabs (ε, d)
d
1 1 + 2() j . 2 2 1 + ε (1/c − 1) j =1
Similarly, for tractability in the normalized sense we obtain ⎛ ⎡ d 2 1 ⎜ 1/2 nor ⎣ ) ( ) 1 + 2(1 + n (ε, d) pr ⎝ min j ∈(1/,1] 2 j =1 ∈(0,2−3 ] ⎤⎞ 2 ⎦⎟ × ε−1 (1 + ) + 2 ⎠. However, we were unable to derive a lower bound on nnor (ε, d) because our lower bound on en,d (An ) was too weak compared to the initial error e0,d . Using again (16) and the additional property that log(1 + x) log(1 + x ∗ )x/x ∗ for all x x ∗ , we arrive at the following theorem. Theorem 6. Consider the Fredholm integral equation problem defined as in Section 2. (a) The problem is strongly QMC-Nyström tractable in the absolute sense iff ∞
j < ∞,
(5)
j =1
and it is QMC-Nyström tractable in the absolute sense iff d j =1 j < ∞. L := lim sup d→∞ log(d + 1)
(6)
These conditions are also sufficient for strong QMC-Nyström tractability and QMC-Nyström tractability in the normalized sense.
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
(b) If (5) holds and additionally
∞
j =1 j
nabs (ε, d) = O(ε−2 ) and
767
< ∞ for some ∈ (1/, 1], then
nnor (ε, d) = O(ε−2 ),
with the implied factors independent of ε and d. (c) If (5) does not hold but (6) holds, then nabs (ε, d) = O(ε−2 d q1 ) and
nnor (ε, d) = O(ε−2 d q2 ),
with the implied factors independent of ε and d, and with q1 and q2 arbitrarily close to 6 () L
and 4 () L,
respectively. Note that Part (b) is obtained by taking any , say, = 2−3 , and Part (c) is obtained with = 1 and with approaching 0. 4. Additional remarks 4.1. Generating vectors constructed for integration Since the optimal rate of convergence O(n−/2+ ), > 0, for the integral equation problem is the same as that for the integration problem, a natural question to ask is: can we use the generating vector already constructed for the integration problem? We came up with two approaches for estimating the resulting worst case error, but both with some undesirable effects. These are discussed below. Since (K − Kn )f (x) is essentially the integration error of the function Fx (y) := k(x − y)f (y), we can write wor -int K − Kn sup sup Fx H en,d (z),
x∈D
wor -int (z) denotes the worst case integration error for a lattice rule with generating vector where en,d z. We know that z can be constructed to achieve the optimal rate of convergence. However, from [13] (Appendix 2: Korobov spaces are algebras) we see that
Fx H 2 d max(1,/2)
d
1 + 2()j
1/2
kH f H .
j =1
This exponential dependence on d means that tractability is out of the question. Alternatively, we can estimate the expression (13) as follows: ˆ |k(l)|| fˆ(h + l)| h∈Zd \{0} l∈Zd h·z≡0 (mod n)
h∈Zd \{0} h·z≡0 (mod n)
⎛ ⎝
l∈Zd
⎞1/2 ⎛ 2 ˆ |k(l)| r (, l)⎠
⎞1/2 |fˆ(h + l)|2 ⎝ ⎠ r (, l) d l∈Z
768
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
⎡⎛
kH
⎞1/2
⎢⎝ ⎣
|fˆ(h + l)|2 r (, h + l)⎠
l∈Zd
h∈Zd \{0} h·z≡0 (mod n)
× d
f H kH
max √
l∈Zd
⎤
1 ⎥ ⎦ r (, l) r (, h + l)
max(1, 2 j )1/2
j =1
h∈Z \{0} h·z≡0 (mod n) d
1 , r/2 (1/2 , h)
(20)
where in the final step we made use of the estimate obtained in [13], r (, h)
r (, l) r (, h + l) d
j =1 max(1, 2
j)
∀ l ∈ Zd .
Observe that the sum in (20) is exactly the squared worst case integration error of a lattice rule 1/2 in the weighted Korobov space with replaced by /2 and j replaced by j . Thus we know
that a generating vector can be constructed such that this sum is of order O(n−/2+ ), > 0. In other words, the rate of convergence is right, and the dependence on d can be controlled by the weights, but we would require > 2 to begin with. 4.2. The algorithms for approximation In [11,12], functions from weighted Korobov spaces were approximated by truncated Fourier series, with vectors h from the set A(d, M) := {h ∈ Zd : r (, h) M}. Since M/r (, h)1 for all h ∈ A(d, M), the quantity En,d (z) studied in [11,12] can be bounded above by 1 En,d (z) := r (, h + l) h∈A(d,M)
h∈A(d,M)
M
h∈Zd
l∈Zd \{0} l·z≡0 (mod n)
M r (, h)
1 r (, h)
l∈Zd \{0} l·z≡0 (mod n)
l∈Zd \{0} l·z≡0 (mod n)
1 r (, h + l)
1 2 = M Sn,d (z). r (, h + l)
2 (z) is much easier to work with than E Note that Sn,d n,d (z), because it is given explicitly by (17), and there is no need to analyze the set A(d, M). The component-by-component construction is independent of M, and the computational cost is much cheaper. 2 (z) lead to the same n-dependence in the Furthermore, the vectors obtained by minimizing Sn,d approximation error bounds as those obtained by minimizing En,d (z). Hence this new quantity
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
769
should be used not only for the integral equation problem, but also for the approximation problem discussed in [11,12]. Acknowledgments The support of the Australian Research Council under its Centres of Excellence program is gratefully acknowledged. The first and third authors were supported by University of New South Wales Vice-Chancellor’s Postdoctoral Research Fellowships. The second author was supported by the Austrian Science Foundation (FWF) project S9609, which is part of the Austrian National Research Network “Analytic Combinatorics and Probabilistic Number Theory”. Furthermore, he would like to thank the other authors for their hospitality during his visit to the University of New South Wales. Appendix. Proof of Lemma 4 We prove the result by induction on d, following closely the argument used in the proof of Lemma 6 in [11]. It can easily be checked that the result holds for d = 1. Suppose the result has been shown for d. By separating the hd+1 = 0 and hd+1 = 0 terms in (14), we can write 2 2 Sn,d+1 (z, zd+1 ) = 1 + 2(2)2d+1 Sn,d (z) + (z, zd+1 ), ⎡
where (z, zd+1 ) =
∞
d+1 ∈Z hd+1 =−∞ hd+1 =0
⎢ 1 1 ⎢ ⎢ ⎣ r (d+1 , d+1 ) r (d+1 , d+1 + hd+1 ) ⎤ ×
l∈Zd
h∈Zd h·z≡−hd+1 zd+1 (mod n)
⎥ 1 1 ⎥ ⎥. r (, l) r (, l + h) ⎦
A similar expression already appeared in the proof of Lemma 6 in [11] (but with the role of l and h interchanged). For ∈ (1/, 1], we follow closely the argument used in [11], including the use of Jensen’s inequality, to arrive at n−1 ! ! 1 ∗ (z, zd+1 (z, zd+1 ) (z), ) n−1 zd+1 =1
with (z) =
G−G n−1 +
∞
1
1
r ( , ) r ( , d+1 ∈Z hd+1 =−∞ d+1 d+1 d+1 d+1 hd+1 =0 ∞
nG − G n−1
1
d+1 ∈Z
hd+1 =−∞ hd+1 ≡0 (mod n) hd+1 =0
+ hd+1 ) 1
r ( d+1 , d+1 ) r ( d+1 , d+1
+ hd+1 )
,
770
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
where G :=
l∈Zd
h∈Zd h·z≡0 (mod n)
1 1 r ( , l) r ( , l + h)
d
1 + 2( ) j
2
=: G.
j =1
We have W1 :=
=
1
1
r ( , ) r ( , d+1 ∈Z hd+1 ∈Z d+1 d+1 d+1 d+1 hd+1 =0 2 1 + 2( ) d+1 − 1 + 2(2 )2d+1
+ hd+1 )
22 ( ) d+1 + 22 [( )]2 2d+1 ,
W2 :=
=
1
d+1 ∈Z hd+1 ∈Z d+1 ≡0 (mod n) hd+1 ≡0 (mod n) hd+1 =0 " #2 " 2( ) d+1
1+
− 1+
n
22 ( ) d+1 n
+
1
r ( d+1 , d+1 ) r ( d+1 , d+1 2(2 )2d+1
22 [( )]2 2d+1
n
+ hd+1 )
#
n2 ,
and
W3 :=
1
d+1 ∈Z hd+1 ∈Z d+1 ≡0 (mod n) hd+1 ≡0 (mod n) hd+1 =0
= 2d+1
(n−1)/2 k=−(n−1)/2 k=0
⎡" ⎣
∈Z
1
r ( d+1 , d+1 ) r ( d+1 , d+1
1 |n + k|
⎡⎛
−
∈Z
⎤ 1 ⎦ |n + k|2 ⎞2
⎤
⎟ 1 1 ⎥ ⎟ ⎥ ⎟ − 2 ⎥ ⎠ ⎦ |k| k k=−(n−1)/2 ∈Z |n| 1 + = 0 k=0 n " # (n−1)/2 1 2 +2 ( ) 22 +2 [( )]2 2 d+1 + |k| n n2
2d+1
(n−1)/2
⎢⎜ 1 ⎢⎜ ⎢⎜ + ⎣⎝ |k|
#2
+ hd+1 )
k=−(n−1)/2 k=0 2 +4 [( )]2 2d+1
n
.
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
771
Thus, G−G nG − G W1 + (W2 + W3 ) n−1 n−1 22 ( ) d+1 + 22 [( )]2 2d+1 nG − G 2 +4 [( )]2 2d+1 1 + G 1− n−1 n n−1 n
(z) =
d 1
2 22 ( ) d+1 + 22 (1 + 2 +2 )[( )]2 2d+1 1 + 2( ) j . n j =1
Combining all the estimates together and making use of the induction hypothesis, the desired result then follows from: 1/ 1 + 2(2)2d+1 + 22 ( ) d+1 + 22 (1 + 2 +2 )[( )]2 2d+1 1/ 1 + 2( ) d+1 + 22 ( ) d+1 + 22 (1 + 2 +2 )[( )]2 2d+1 1/ 1 + 22 ( ) d+1 + 22 (1 + )[( )]2 2d+1 2/ 1 + 2(1 + )1/2 ( ) d+1 , where we have used Jensen’s inequality, 1/2 and 2 +2 1 (since 2−3 ). This completes the proof of Lemma 4. References [1] P.M. Anselone, Collectively Compact Operator Approximation Theory and Applications to Integral Equations, Prentice-Hall, New Jersey, 1971. [2] G. Brown, G.A. Chandler, I.H. Sloan, D. Wilson, Properties of certain trigonometric series arising in numerical analysis, J. Math. Anal. Appl. 162 (1991) 371–380. [3] J. Dick, On the convergence rate of the component-by-component construction of good lattice rules, J. Complexity 20 (2004) 493–522. [4] J. Dick, I.H. Sloan, X. Wang, H. Wo´zniakowski, Good lattice rules in weighted Korobov spaces with general weights, Numer. Math. 103 (2006) 63–97. [5] K.V. Emelyanov, A.M. Ilin, On the number of arithmetic operations necessary for the approximate solution of Fredholm integral equations of the second kind, Zh. Vychisl. Mat. Mat. Fiz. 7 (1967) 905–910 (in Russian). [6] K. Frank, Complexity of local solution of multivariate integral equations, J. Complexity 11 (1995) 416–434. [7] K. Frank, S. Heinrich, S.V. Pereverzev, Information complexity of multivariate Fredholm integral equations in Sobolev classes, J. Complexity 12 (1996) 17–34. [8] S. Heinrich, Complexity of integral equations and relations to s-numbers, J. Complexity 9 (1993) 141–153. [9] L.K. Hua, Y. Wang, Applications of Number Theory to Numerical Analysis, Springer, Berlin, New York, 1981. [10] F.Y. Kuo, Component-by-component constructions achieve the optimal rate of convergence for multivariate integration in weighted Korobov and Sobolev spaces, J. Complexity 19 (2003) 301–320. [11] F.Y. Kuo, I.H. Sloan, H. Wo´zniakowski, Lattice rules for multivariate approximation in the worst case setting, in: H. Niederreiter, D. Talay (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 2004, Springer, Berlin, 2006, pp. 289–330. [12] F.Y. Kuo, I.H. Sloan, H. Wo´zniakowski, Lattice rule algorithms for multivariate approximation in the average case setting, J. Complexity, in press, doi:10.1016/j.jco.2006.10.006. [13] E. Novak, I.H. Sloan, H. Wo´zniakowski, Tractability of approximation for weighted Korobov spaces on classical and quantum computers, Found. Comput. Math. 4 (2004) 121–156.
772
J. Dick et al. / Journal of Complexity 23 (2007) 752 – 772
[14] D. Nuyens, R. Cools, Fast algorithms for component-by-component construction of rank-1 lattice rules in shiftinvariant reproducing kernel Hilbert spaces, Math. Comput. 75 (2006) 903–920. [15] S.V. Pereverzev, On the complexity of the problem of finding solutions of Fredholm equations of the second kind with differentiable kernels, II, Ukrain. Mat. Zh. 41 (1989) 189–193 (in Russian). [16] I.H. Sloan, H. Wo´zniakowski, Tractability of multivariate integration for weighted Korobov classes, J. Complexity 17 (2001) 697–721. [17] A.G. Werschulz, Where does smoothness count the most for Fredholm equations of the second kind with noisy information?, J. Complexity 19 (2003) 758–798. [18] A.G. Werschulz, The complexity of Fredholm equations of the second kind: noisy information about everything, J. Integral Equations Appl., to appear. [19] Y. Xu, A. Zhou, Fast Boolean methods for solving integral equations in high dimensions, J. Integral Equations Appl. 16 (2004) 83–110. [20] P. Zinterhof, Über die schnelle Lösung von hochdimensionalen Fredholm-Gleichungen vom Faltungstyp mit zahlentheoretischen Methoden (On the fast solution of higher-dimensional Fredholm equations of convolution type by means of number-theoretic methods), Österreich. Akad. Wiss. Math.-Natur. Kl. Sitzungsber. II 196 (4–7) (1987) 159–169.
Journal of Complexity 23 (2007) 773 – 792 www.elsevier.com/locate/jco
Sampling numbers and function spaces Jan Vybíral Friedrich-Shiller Universitat, Mathematisches Institut, Ernst-Abbe-Platz 1-3, 07743 Jena, Germany Received 4 April 2006; accepted 1 March 2007 Available online 27 April 2007
Abstract We want to recover a continuous function f : (0, 1)d → C using only its function values. Let us assume, that f is from the unit ball of some function space (for example a fractional Sobolev space or a Besov space) and the precision of the reconstruction is measured in the norm of another function space of this type. We describe the rate of convergence of the optimal sampling method (linear as well as nonlinear) in this setting. © 2007 Elsevier Inc. All rights reserved. MSC: 41A25; 41A46; 46E35 Keywords: Linear and nonlinear approximation methods; Besov and Triebel-Lizorkin spaces; Sampling operators
1. Introduction s () We study the following question. Let ⊂ Rd be a bounded Lipschitz domain and let Bpq denote the scale of Besov spaces on , see Definitions A.1 and A.3 for details. We try to approximate f ∈ Bps11 q1 () in the norm of another Besov space, say Bps22 q2 (), by a linear sampling method
Sn f =
n
f (xj )hj ,
(1.1)
j =1
where hj ∈ Bps22 q2 () and xj ∈ . First of all, we have to give a meaning to the pointwise evaluation in (1.1). For this reason, we shall restrict ourselves to the case s1 >
d , p1
E-mail address: [email protected]. 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.011
774
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
¯ Second, we always assume that which guarantees the continuous embedding Bps11 q1 () → C(). s1 s2 the embedding Bp1 q1 () → Bp2 q2 () is compact, which holds if and only if 1 1 s1 − s2 > d − . p1 p2 + We measure the worst case error of Sn f by sup{f − Sn f |Bps22 q2 () : f |Bps11 q1 () 1}.
(1.2)
The same worst case error may also be considered for nonlinear sampling methods: Sn f = (f (x1 ), . . . , f (xn )),
(1.3)
where : Cn → Bps22 q2 () is an arbitrary mapping. In this paper, we discuss the decay of (1.2) for linear (1.1) and nonlinear (1.3) sampling methods. In some cases we restrict ourselves to the case = I d = (0, 1)d . This allows to describe the optimal sampling operator more explicitly. However, we conjecture, that many of these results can be generalised to general bounded Lipschitz domains. Let Lp () stand for the usual Lebesgue space and Wpk (), k ∈ N, denotes the classical Sobolev space over . Then it is well known that − dk +( p1 − p1 )+
inf sup{f − Sn f |Lp2 () : f |Wpk1 () 1} ≈ n Sn
1
2
,
(1.4)
where the infimum in (1.4) runs over all linear sampling operators Sn , see (1.1) (cf. [5] or [10]). The result remains true if we switch to the general situation where nonlinear methods Sn are allowed. In [12], this statement has been proved for arbitrary bounded Lipschitz domain, but with the Sobolev spaces replaced by the more general scales of Besov and Triebel-Lizorkin spaces. The target space was always given by Lp2 (). The proof given there uses the simple structure of the Lebesgue space. It is the main aim of this paper to generalise (1.4) and to investigate also other “target” spaces. Let us present our main results. If s2 > 0, then the quantity inf sup{f − Sn f |Bps22 q2 () : f |Bps11 q1 () 1} Sn
(1.5)
behaves like −
n
s1 −s2 1 1 d +( p1 − p2 )+
in both, the linear as well as the nonlinear setting. We prove this result only for the special case of = (0, 1)d . However, in this situation we are able to give an explicit description of in order kd optimal operator which we are going to introduce now. Namely, if n ≈ 2 , where k ∈ N dis fixed, we use a smooth decomposition of unity {k, } such that k, (x) = 1 for x ∈ (0, 1) where the support of k, is concentrated around 2−k . Then we approximate f locally on supp k, by a polynomial gk, and define gk, k, . Sn f =
To calculate each of the 2(k+2)d functions gk, we need to combine M+d−1 function values of d kd ≈ n function values of f to obtain ≈ 2 f in a linear way. Altogether, we need 2(k+2)d M+d−1 d
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
775
Sn f . Here, M > s1 is a fixed natural number. The generalisation of this construction to bounded Lipschitz domains remains a subject of further study. If s2 < 0, we give the following characterisation of (1.5). If p1 p2 or p1 < p2 and pd2 − pd1 > s2 , then (1.5) decays like s1
n− d
and if p1 < p2 and 0 > s2 > −
n
s1 s2 1 1 d + d + p1 − p2
d p2
−
d p1 ,
then (1.5) behaves like
.
All these results hold for linear as well as nonlinear methods Sn . These estimates can be applied in connection with elliptic differential operators, which was the actual motivation for this research, cf. [6,7]. Let us briefly introduce this setting. Let A:H →G be a bounded linear operator from a Hilbert space H to another Hilbert space G. We assume that A is boundedly invertible, hence A(u) = f has a unique solution for every f ∈ G. A typical application is an operator equation, where A is an elliptic differential operator, and we assume that A : H0s () → H −s (), where is a bounded Lipschitz domain, H0s () is a function space of Sobolev type with fractional order of smoothness s > 0 of functions vanishing on the boundary and H −s is a function space of Sobolev type with negative smoothness −s < 0. The classical example is the Poisson equation −u = f in
and u = 0 on *.
Here, s = 1 and A = − : H01 () → H −1 () is bounded and boundedly invertible. We want to approximate the solution operator u = S(f ) using only function values of f . We define the nth linear sampling number of the identity id : H −1+t () → H −1 () by gnlin (id : H −1+t () → H −1 ()) = inf id − Sn |L(H −1+t (), H −1 ()), Sn
where t is a positive real number with −1 + t > S : H −1+t () → H 1 () by
d 2,
and the nth linear sampling number of
gnlin (S : H −1+t () → H 1 ()) = inf S − Sn |L(H −1+t (), H 1 ()). Sn
(1.6)
(1.7)
The infimum in (1.6) and (1.7) runs over all linear operators Sn of the form (1.1) and L(X, Y ) stands for the space of bounded linear operators between two Banach spaces X and Y, equipped with the classical operator norm.
776
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
It turns out that these quantities are equivalent (up to multiplicative constants which do not depend neither on f nor on n) and are of the asymptotic order gnlin (S : H −1+t () → H 1 ()) ≈ gnlin (id : H −1+t () → H −1 ()) ≈ n−
−1+t d
.
We refer to [6,7] for a detailed discussion of this approach. The estimates of sampling numbers of embedding between two function spaces translates therefore into estimates of sampling numbers of the solution operator S. We observe that the more regular f, the faster is the decay of the linear sampling numbers of the solution operator S. Let us also point out that optimal linear methods (not restricted to use only the function values of f) achieve asymptotically a better rate of convergence, t namely n− d . Hence, the limitation to the sampling operators results in a serious restriction. One has to pay at least n1/d in comparison with optimal linear methods. Using our estimates of sampling numbers of identities between Besov and Triebel-Lizorkin spaces, this result may be generalised as follows. 1 If p 2, 1 q ∞ and −1 + t > pd then −1+t −1+t gnlin (S : Bpq () → H 1 ()) ≈ gnlin (id : Bpq () → H −1 ()) ≈ n−
If p < 2 with
1 p
>
1 d
+ 21 , 1 q ∞ and −1 + t >
d p
−1+t d
.
then − dt + p1 − 21
−1+t −1+t gnlin (S : Bpq () → H 1 ()) ≈ gnlin (id : Bpq () → H −1 ()) ≈ n
Finally, if p < 2 with
1 p
<
1 d
+ 21 , 1 q ∞ and −1 + t >
d p
.
then
−1+t −1+t gnlin (S : Bpq () → H 1 ()) ≈ gnlin (id : Bpq () → H −1 ()) ≈ n−
−1+t d
.
We prove the same results also for the nonlinear sampling numbers gn (S). Altogether, the regularity information of f may now be described by an essentially broader scale of function spaces. All the unimportant constants are denoted by the letter c, whose meaning may differ from one ∞ occurrence to another. If {an }∞ n=1 and {bn }n=1 are two sequences of positive real numbers, we write an bn if, and only if, there is a positive real number c > 0 such that an c bn , n ∈ N. Furthermore, an ≈ bn means that an bn and simultaneously bn an . 2. Sampling numbers The notation and basic facts about function spaces, which we shall need later on, are included in the Appendix. We now introduce the concept of sampling numbers. Definition 2.1. Let be a bounded Lipschitz domain. Let G1 () be a space of continuous functions on and G2 () ⊂ D () be a space of distributions on . Suppose, that the embedding id : G1 () → G2 () is compact. 1 Although the results are stated only for Besov spaces, they are proved also for Triebel-Lizorkin spaces, which include also fractional Sobolev spaces as a special case.
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
777
For {xj }nj=1 ⊂ we define the information map Nn : G1 () → Cn ,
Nn f = (f (x1 ), . . . , f (xn )), f ∈ G1 ().
For any (linear or nonlinear) mapping n : Cn → G2 () we consider Sn : G1 () → G2 (),
Sn = n ◦ Nn .
(i) Then, for all n ∈ N, the nth sampling number gn (id) is defined by gn (id) = inf sup{f − Sn f |G2 () : f |G1 () 1}, Sn
(2.1)
where the infimum is taken over all n-tuples {xj }nj=1 ⊂ and all (linear or nonlinear) n . (ii) For all n ∈ N the nth linear sampling number gnlin (id) is defined by (2.1), where now only linear mappings n are admitted. 2.1. The case s2 > 0 In this section, we discuss the case where = I d = (0, 1)d is the unit cube, G1 () = Asp11 q1 ()
and G2 () = Asp22 q2 () with s1 >
and s1 − d
1 p1
−
1 s p2 + > s2 > 0. Here, Apq () stands s () or a Triebel-Lizorkin space F s (), see Definition A.3 for details. either for a Besov space Bpq pq We start with the most simple and most important case, namely when p1 = p2 = q1 = q2 . d p1
s1 s2 Proposition 2.2. Let = I d = (0, 1)d . Let G1 () = Bpp () and G2 () = Bpp () with 1p ∞,
s1 >
d p
and s1 > s2 > 0.
Then gnlin (id) n−
s1 −s2 d
.
Proof. First, we introduce necessary notation. Let a > 0, z ∈ Rd and U ⊂ Rd . Then aU = {ax : x ∈ U }
and z + aU = {z + ax : x ∈ U }.
(2.2)
Furthermore, if k ∈ N0 and ∈ Zd , we set Qk, = {x ∈ Rd : 2−k i < xi < 2−k (i + 1)},
Qk, = x ∈ I d : 2−k i − 21 < xi < 2−k i + 23 . We point out, that (up to a set of measure zero) I d = ∪{Qk, : 0 i 2k − 1, i = 1, 2, . . . , d}. Next, we introduce smooth decomposition of unity, first on Rd and then its restriction to I d . Let ˜ ∈ S(Rd ) with d ˜ ⊂ −1, 3 ˜ − ) = 1, x ∈ Rd . supp (x and 2 2 ∈Zd
778
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
Then we define
k, (x) =
˜ k x − ) if x ∈ I d , (2 0 otherwise.
(2.3)
Let us denote Ak = {−1, 0, . . . , 2k }d . By (2.3), the following identities are true for every k ∈ N: 1 if x ∈ I d , k, (x) = k, (x) = Id (x) = 0 otherwise, ∈Ak
∈Zd
supp k, ⊂ Qk, ,
∈ Ak .
Now we define linear approximation operators S˜k . Take f ∈ G1 (I d ) and consider the decomposition f = f k, . ∈Ak
To each Qk, we associate gk, ∈ P M (Qk, ) such that gk, (2−k ·) approximates f (2−k ·) on 2k Qk, according to Corollary A.6, see the Appendix, s1 (f − gk, )(2−k ·)|Bpp (2k Qk, )
1
t 0
−s1 p
M,2k Qk, dt (f (2−k ·))(x)|Lp (2k Qk, )p
The operators S˜k : G1 (I d ) → G2 (I d ) are defined by S˜k f = gk, k, , k ∈ N.
dt t
1/p .
(2.4)
(2.5)
∈Ak
Trivially, the right-hand side of (2.5) belongs to G1 (I d ) and hence also to G2 (I d ). The operators M+d−1 k ˜ Sk use · (2 + 2)d ≈ 2kd points. So, it is enough to prove the estimate d s 2 (I d ) 2−k(s1 −s2 ) f |B s1 (I d ). B (f − g ) k, k, pp pp ∈Ak s1 We use the dilation property (cf. [9, Proposition 2.2.1]) as well as the embedding Bpp (Rd ) → s2 d Bpp (R ) and obtain s d 2 (f − gk, )k, Bpp (I ) ∈Ak k s2 − pd −k −k s2 k d 2 (f − gk, )(2 ·)k, (2 ·) Bpp (2 I ) ∈Ak d k s − −k −k s1 k d 2 2 p (f − g )(2 ·) (2 ·) B (2 I ) (2.6) k, k, pp . ∈Ak
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
We claim that −k −k s1 k d (f − gk, )(2 ·)k, (2 ·) Bpp (2 I ) ∈Ak ⎛ ⎞1/p p s1 k k, ⎠ ⎝ (2 Q ) . (f − gk, )(2−k ·) Bpp
779
(2.7)
∈Ak
To prove (2.7), we first decompose (independent of k ∈ N) so that
∈Ak
into
K =1
∈Ak
with the number K ∈ N
dist(supp k,1 (2−k ·), supp k,2 (2−k ·)) > 1
(2.8)
for every 1 , 2 ∈ Ak and every = 1, . . . , K. To every ∈ Ak we associate E ((f − gk, )(2−k ·)) defined on Rd such that E ((f − gk, )(2−k x)) = (f − gk, )(2−k x), E ((f − gk, )(2−k x)) = 0
x ∈ 2k Qk, ,
if x ∈ supp k, (2−k ·)
(2.9) (2.10)
if ∈ Ak , = and s1 s1 E ((f − gk, )(2−k x))|Bpp (Rd ) c (f − gk, )(2−k x)|Bpp (2k Qk, ).
(2.11)
The existence of E ((f − gk, )(2−k ·)) satisfying (2.9)–(2.11) follows directly from the Definition A.3, possibly combined with some smooth cut-off function and the pointwise multiplier assertion, cf. [15, Theorem 2.8.2]. Denoting ˜ (x) = (2 ˜ k x − ), k,
x ∈ Rd , k ∈ N, ∈ Zd ,
we get −k −k s1 k d (f − gk, )(2 ·)k, (2 ·) Bpp (2 I ) ∈Ak K −k −k s1 k d (f − gk, )(2 ·)k, (2 ·) Bpp (2 I ) =1 ∈Ak K −k −k s1 d E ((f − gk, )(2 ·))k, (2 ·) Bpp (R ) . =1 ∈A k
(2.12)
780
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
By (2.8) and the so-called localisation property, cf. [16, Chapter 2.4.7], we may estimate the last expression from above by K
=1
⎛
⎞1/p p s1 d ⎠ ⎝ (R ) E ((f − gk, )(2−k ·))k, (2−k ·) Bpp ∈Ak
⎞1/p K p −k −k s d 1 (R ) ⎝ E ((f − gk, )(2 ·))k, (2 ·) Bpp ⎠ ⎛
=1 ∈Ak
⎞1/p p s1 =⎝ (Rd ) ⎠ . E ((f − gk, )(2−k ·))k, (2−k ·) Bpp ⎛
∈Ak
Together with Lemma A.7 and (2.11) this finally leads to −k −k s1 k d (f − gk, )(2 ·)k, (2 ·) Bpp (2 I ) ∈Ak ⎛ ⎞1/p p p s1 d s1 d ⎠ ⎝ (R ) · k, (2−k ·) Bpp (R ) E ((f − gk, )(2−k ·)) Bpp ∈Ak
⎞1/p p s1 ⎝ (Rd ) ⎠ E ((f − gk, )(2−k ·)) Bpp ⎛
∈Ak
⎞1/p p −k s k k, 1 (2 Q ⎝ ) ⎠ , (f − gk, )(2 ·) Bpp ⎛
∈Ak
which finishes (2.7). We insert (2.7) into (2.6) and use (2.4) together with (A.4) s d 2 (f − gk, )k, Bpp (I ) ∈Ak ⎛ ⎞1/p 1 p dt d k k, k s − M,2 Q ⎠ 2 2 p ⎝ t −s1 p (dt f (2−k ·))(x) Lp (2k Qk, ) t 0 ∈Ak
2
k
s2 − pd
⎛ ⎝
∈Ak 0
1
⎞1/p p dt k, M,Q ⎠ . t −s1 p (d2−k t f )(2−k x) Lp (2k Qk, ) t
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
781
The rest is done by direct substitutions and Theorem A.4 s2 d (f − gk, )k, |Bpp (I ) ∈Ak ⎛ ⎞1/p 2−k p d d k, k s −s − M,Q ⎠ 2 2 1 p ⎝ −s1 p (d f )(2−k x) Lp (2k Qk, ) ∈Ak 0 ⎛ ⎞1/p p d 2−k k, M,Q ⎠ 2k(s2 −s1 ) ⎝ −s1 p (d f )(x) Lp (Qk, ) 0 ∈Ak
−k 1/p p 2 −s1 p M,I d −k(s1 −s2 ) d d 2 f )(x) Lp (I ) (d 0 s1 2−k(s1 −s2 ) f |Bpp (I d ).
Next we consider the case of general integrability and summability parameters. Proposition 2.3. Let = I d = (0, 1)d . Let G1 () = Asp11 q1 () and G2 () = Asp22 q2 () with 1p1 , p2 , q1 , q2 ∞ (p1 , p2 < ∞ in the F-case), d 1 1 s1 > and s1 − d − > s2 > 0. (2.13) p1 p1 p2 + Then − gnlin (id)n
s1 −s2 1 1 d + p1 − p2 +
(2.14)
.
Proof. First, we deal with the case p1 = p2 = p and p = q1 and/or p = q2 . We use the well-known real interpolation formula, cf. [13,1,15,17]: r r0 r1 Bpq (Rd ) = Bpp (Rd ), Bpp (Rd ) ,q
and its counterpart r r0 r1 Bpq (I d ) = Bpp (I d ), Bpp (I d )
,q
for 1 p, q ∞,
0 < < 1,
r0 < r1 ,
r = (1 − )r0 + r1 .
If, for example, p = q2 , we find two different real numbers s2 and s2 such that s1 > s2 , s2 > 0,
s2 = (1 − )s2 + s2
782
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
and apply Proposition 2.2 to embeddings id and id in the following diagram: s
2 Bpp (I d ) t9 t id tt t tt t t s1 s2 d Bpp (I d ) id / Bpq 2 (I ) JJ JJ JJ J id JJ% s2 d (I ) Bpp
s2 Using the same approximation operator S˜k , we may interpolate the estimates for f − S˜k f |Bpp s2 (I d ) and f − S˜k f |Bpp (I d ) and obtain (2.14). If also p = q1 , we proceed in the same way. If p1 < p2 we define s0 by 1 1 > s2 > 0 − s1 > s0 := s2 + d p1 p2
and use the chain of embeddings Bps11 q1 (I d ) → Bps01 q2 (I d ) → Bps22 q2 (I d ). The first embedding provides the estimate gnlin (id) n−
s1 −s0 d
−
=n
s1 −s2 1 1 d + p1 − p2
,
the second one is bounded. If p1 > p2 , we use the embedding Bps11 q1 (I d ) → Bps21 q2 (I d ) → Bps22 q2 (I d ). The second embedding is bounded, the first one together with Proposition 2.2 gives the result. This finishes the proof in the B-case. The F-case then follows through trivial embeddings, cf. [15, 2.3.2] Fps11 q1 (I d ) → Bps11 ,∞ (I d ) → Bps22 ,1 (I d ) → Fps22 q2 (I d ).
Theorem 2.4. Let = I d = (0, 1)d . Let G1 () = Asp11 q1 () and G2 () = Asp22 q2 () with 1p1 , p2 , q1 , q2 ∞ (p1 , p2 < ∞ in the F-case) and (2.13). Then gn (id) ≈
gnlin (id)
−
≈n
s1 −s2 1 1 d + p1 − p2 +
.
(2.15)
Proof. According to the Proposition 2.3, it is enough to prove that gn (id) n
−
s1 −s2 1 1 d + p1 − p2 +
.
(2.16)
We use the following simple observation, (cf. [12, Proposition 20]). For = {xj }nj=1 ⊂ we denote G 1 () = {f ∈ G1 () : f (xj ) = 0 for all j = 1, . . . , n}.
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
783
Then gn (id) ≈ inf sup{f |G2 () : f ∈ G 1 (), f |G1 () = 1}
= inf id : G1 () → G2 (),
(2.17) (2.18)
where both the infima extend over all sets = {xj }nj=1 ⊂ .
To prove (2.16), we construct for every = {xj }2j =1 , l ∈ N, a function l ∈ G 1 () with ld
l |G1 ()1
l |G2 () 2
and
l s2 −s1 +d p1 − p1 1
2 +
,
(2.19)
where the constants of equivalence do not depend on l ∈ N. We rely on the wavelet characterisation of the spaces Aspq (Rn ), as described in [18, Section 3.1]. Let F ∈ C K (R)
and M ∈ C K (R), K ∈ N,
be the Daubechies compactly supported K-wavelets on R with K large enough. Then we define (x) =
d
M (xi ),
x = (x1 , . . . , xd ) ∈ Rd
i=1
and j
m (x) = (2j x − m),
j ∈ N0 , m ∈ Z n .
Then the function j j (x) = j m m (x),
j ∈N
(2.20)
m
satisfies
j |Aspq () ≈ 2
j (s− pd )
1/p | j m |p
(2.21)
m
with constants independent on j ∈ N and on the sequence = { j m }. The summation in (2.20) j and (2.21) runs over those m ∈ Zn for which the support of m is included in . The proof of (2.21) is based on [18, Theorem 3.5]. First, this theorem tells us that the Aspq ()-norm of (2.20) may be estimated from above by the right-hand side of (2.21). On the other hand, considering another extension of j to Rd and its (unique) wavelet decomposition, we get the opposite inequality. ld There is a number k ∈ N with the following property. For any l ∈ N and any = {xj }2j =1 , there are mj ∈ Zd , j = 1, . . . , 2ld such that supp k+l mj ⊂
and
ld supp k+l mj ∩ = ∅ for j = 1, . . . , 2 .
Step 1: p1 p2 . In this case, we take in (2.20) k+l,m1 = 2 n = 2, . . . , 2ld and apply (2.21) twice to verify (2.19).
−j (s− pd )
and k+l,mn = 0,
784
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
Step 2: p1 > p2 . In this case, we take k+l,mn = 2−j s , n = 1, . . . , 2ld in (2.20) and apply again (2.21) twice to prove (2.19). 2.2. The case s2 = 0 In the case s2 = 0, new phenomena come into play. First we point out that Lemma A.8 for s = 0 gives an immediate counterpart of (2.6) and this leads to the following result. Theorem 2.5. Let = I d = (0, 1)d . Let id : G1 () → G2 (), with G1 () = Bps 1 q1 ,
G2 () = Bp02 q2
and 1p1 , q1 , p2 , q2 ∞,
s>
d . p1
Then − ds +( p1 − p1 )+
n
1
2
gn (id) gnlin (id) n
− ds +( p1 − p1 )+ 1
2
(1 + log n)1/q2 ,
n ∈ N.
(2.22)
If the target space is a Lebesgue space, this can be improved, cf. [12]. Theorem 2.6. Let be a bounded Lipschitz domain in Rd . Let id : G1 () = Aspq () → Lr () = G2 (), with 1 p, q ∞,
s>
d p
and
1r ∞
(p < ∞ in the F-case). Then − ds +( p1 − 1r )+
gn (id) ≈ gnlin (id) ≈ n
,
n ∈ N.
Remark 2.7. We show in one example, that the logarithmic factor cannot be removed in general. Let = I d = (0, 1)d and consider the embedding s 0 () → B1,1 (). id : B1,1
(0) = 0. For every k ∈ N and every = Finally, take ∈ S(Rd ) with supp ⊂ and {xj }nj=1 ⊂ , n = 2kd , we set fk (x) = (2k+1 (x − x )), where x is chosen such that supp fk ∩ = ∅ and supp fk ⊂ . We claim that s (I d )c 2k(s−d) fk |B1,1
(2.23)
0 (I d )c k 2−kd . fk |B1,1
(2.24)
and
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
785
Combining (2.23) with (2.24), it follows that gn (id) ≈ gnlin (id) ≈ n− d (1 + log n), s
n ∈ N.
The proof of (2.23) follows directly from Lemma A.8. To prove (2.24), let l ∈ N be the smallest natural number such that () = 0
for || 2−l
and write for k 2l 0 0 fk |B1,1 (I d ) c fk |B1,1 (Rd ) = c
∞
d ∨ (j f k ) |L1 (R )
j =0
c
k−l−1
(2−k−1 )e−i ·x )∨ |L1 (Rd ) (1 (2−j )2(−k−1)d
j =0
= c 2(−k−1)d
k−l−1
(1 (2−j ) (2−k−1 ))∨ |L1 (Rd )
j =0
=c
k−l−1
(1 (2−j +k+1 ) ())∨ (2k+1 x)|L1 (Rd )
j =0
=2
(−k−1)d
k−l−1
(1 (2−j +k+1 ) ())∨ (x)|L1 (Rd ).
(2.25)
j =0
To estimate each of the summands from below, we consider the function 1 (1 (2−j +k+1 ·))∨ = (1 (2−j +k+1 ·) · · · 0 (2l ·))∨ and use Young’s inequality to estimate its L1 -norm. d −j +k+1 ∨ ∨ ·)) |L1 (Rd ) 1 |L1 (R ) = (1 (2
(1 (2
−j +k+1
(2l ·) ∨ 0 d ·) · ) |L1 (R ) · L1 (R ) . ∨
d
Now, (2.24) is a combination of (2.25) and (2.26). 2.3. The case s2 < 0 As the last case, we consider the situation s2 < 0. Theorem 2.8. Let be a bounded Lipschitz domain in Rd . Let id : G1 () = Asp11 q1 () → G2 () = Asp22 q2 () with 1 p1 , p2 , q1 , q2 ∞ (with p1 , p2 < ∞ in the F-case) and s1 >
d , p1
s2 < 0.
(2.26)
786
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
If p1 p2 , then s1
gn (id) ≈ gnlin (id) ≈ n− d . If p1 < p2 and s2 >
d d − , then p2 p1 −
gn (id) ≈ gnlin (id) ≈ n If p1 < p2 and
(2.27)
s1 s2 1 1 d + d + p1 − p2
(2.28)
.
d d − > s2 , then p2 p1 s1
gn (id) ≈ gnlin (id) ≈ n− d .
(2.29)
Proof. Step 1: In this step, we prove two estimates from below. First, using the method from the proof of Theorem 2.4, we obtain − gnlin (id) gn (id)n
s1 −s2 1 1 d + p1 − p2
exactly as in the case s2 > 0. To prove the second estimate from below, namely s1
gnlin (id) gn (id)n− d ,
(2.30) Asp11 q1 (Rd )
spaces as described in we proceed as follows. We rely on atomic decomposition of [18, Chapter 1.5]. For every set ⊂ with || = 2j d we construct a function j (x) =
Mj
j m aj m (x),
x ∈ Rd ,
m=1 −j
d
where Mj ≈ 2j d , j m = 2 p1 for m = 1, . . . , Mj and aj m are positive atoms in the sense of [18, Definition 1.15]. As s1 > 0, no moment conditions are needed. We suppose that supp aj m ∩ = ∅ and supp aj m ⊂ . Altogether, we get j |Asp11 q1 ()j |Asp11 q1 (Rd ) 1 and
j |L1 () = ≈ 2
Id jd
j (x) dx ≈
Mj
j m aj m (x)|L1 (Rd )
m=1
·2
−j pd
1
·2
−j d
·2
−j (s− pd ) 1
= 2−j s1 .
Finally, we choose a non-negative function ∈ S(Rd ) such that the mapping
(x)f (x) dx f →
yields a linear bounded functional on Asp22 q2 (), supp ⊂ and (x)j (x) dx j (x) dx. This leads to 2−j s1 ≈ j |L1 () (x)j (x) dxj |Asp22 q2 ().
Hence, (2.30) is proved and it implies all estimates from below included in the theorem.
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
787
Step 2: If p1 p2 we use the following chain of embeddings: Asp11 q1 () → Lp1 () → Asp22 q2 ()
(2.31)
and obtain s1
gnlin (id)gnlin (id : Asp11 q1 () → Lp1 ()) · id : Lp1 () → Asp22 q2 () n− d . (2.32) If p1 < p2 and 0 > pd2 − pd1 > s2 , then (2.31) holds true as well and, consequently, also (2.32) remains true. If p1 < p2 and 0 > s2 > pd2 − pd1 , we define r > 0 by 1r := − sd2 + p12 . It follows that p1 < r < p2 . Using the embeddings Asp11 q1 () → Lr () → Asp22 p2 ()
(2.33)
we get gnlin (id)gnlin (id : Asp11 q1 () → Lr ()) · id : Lr () → Asp22 p2 () −
n
s1 1 1 d + p1 − r
−
=n
s1 −s2 1 1 d + p1 − p2
.
This proves the upper estimate in (2.28) if p2 = q2 . The general case follows then by interpolation, similar to the proof of Proposition 2.3. 2.4. Comparison with approximation numbers In this closing part we wish to compare the sampling numbers of id : Bps11 q1 () → Bps22 q2 ()
(2.34)
for = (0, 1)d with corresponding approximation numbers. Let us first recall their definition. Definition 2.9. Let A, B be Banach spaces and let T be a compact linear operator from A to B. Then for all n ∈ N the nth approximation number an (T ) of T is defined by an (T ) = inf{T − L : L ∈ L(A, B), rank Ln},
(2.35)
where rank L is the dimension of the range of L. Obviously, an (id) represents the approximation of id by linear operators with the dimension of the range smaller or equal to n, in general not restricted to involve only function values. Hence an (id)gnlin (id),
n ∈ N.
We again assume that d s1 > , p1
s1 − s2 > d
1 1 − p1 p2
,
(2.36)
+
which ensures that (2.34) is compact and its sampling numbers are well defined. The approximation numbers of (2.34) are well known, we refer to [2,14,4,18] for details. We wish to discuss, when the equivalence an (id) ≈ gnlin (id) holds true. The comparison of our results with the known results
788
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
for an (id) shows, that this is the case if either 1. s2 > 0 and 1p2 p1 ∞ or, 2. s2 > 0 and 1p 1 p 2 2 or 2 p1 p2 ∞ or, 3. 0 > s2 > d p12 − p11 and 1 p1 p2 2 or 2 p1 p2 ∞. Acknowledgment I would like to thank to Erich Novak, Winfried Sickel, Hans Triebel and to the anonymous referee for many valuable discussions and comments on the topic. Appendix A. Function spaces on domains A.1. Function spaces on Rd We use standard notation: N denotes the collection of all natural numbers, Rd is the Euclidean d-dimensional space, where d ∈ N, and C stands for the complex plane. Let S(Rd ) be the Schwartz space of all complex-valued rapidly decreasing, infinitely differentiable functions on Rd and let S (Rd ) be its dual—the space of all tempered distributions. Furthermore, Lp (Rd ) with 1 p ∞, are the Lebesgue spaces endowed with the norm ⎧ 1/p p ⎨ , 1 p < ∞, Rd |f (x)| dx d f |Lp (R ) = ess sup |f (x)|, p = ∞. ⎩ x∈Rd
For ∈ S(Rd ) we denote by () = (F )() = (2)−d/2
Rd
e−ix, (x) dx,
x ∈ Rd ,
its Fourier transform and by ∨ or F −1 its inverse Fourier transform. We give a Fourier-analytic definition of Besov and Triebel-Lizorkin spaces, which relies on the so-called dyadic resolution of unity. Let ∈ S(Rd ) with (x) = 1 if |x|1
and (x) = 0
if |x| 23 .
(A.1)
We put 0 = and j (x) = (2−j x) − (2−j +1 x) for j ∈ N and x ∈ Rd . This leads to identity ∞
j (x) = 1,
x ∈ Rd .
j =0 s (Rd ) is the collection of all f ∈ S (Rd ) Definition A.1. (i) Let s ∈ R, 1 p, q ∞. Then Bpq such that ⎞1/q ⎛ ∞ q s (Rd ) = ⎝ 2j sq (j f)∨ Lp (Rd ) ⎠ < ∞ (A.2) f |Bpq j =0
(with the usual modification for q = ∞).
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
789
s (Rd ) is the collection of all f ∈ S (Rd ) such (ii) Let s ∈ R, 1 p < ∞, 1 q ∞. Then Fpq that ⎛ ⎞1/q ∞ s (Rd ) = ⎝ 2j sq |(j f)∨ (·)|q ⎠ |Lp (Rd ) < ∞ (A.3) f |Fpq j =0
(with the usual modification for q = ∞). Remark A.2. These spaces have a long history. In this context we recommend [13,15,16,18] as s (Rd ) and F s (Rd ) are independent of the standard references. We point out that the spaces Bpq pq choice of in the sense of equivalent norms. Special cases of these two scales include Lebesgue spaces, Sobolev spaces, Hölder–Zygmund spaces and many other important function spaces. We omit any detailed discussion. A.2. Function spaces on domains Let be a bounded domain. Let D() = C0∞ () be the collection of all complex-valued infinitely differentiable functions with compact support in and let D () be its dual—the space of all complex-valued distributions on . Let g ∈ S (Rd ). Then we denote by g| its restriction to : (g|) ∈ D ()
(g|)() = g() for ∈ D().
Definition A.3. Let be a bounded domain in Rd . Let s ∈ R, 1 p, q ∞ with p < ∞ in the s or F s . Then F-case. Let Aspq stand either for Bpq pq Aspq () = {f ∈ D () : ∃g ∈ Aspq (Rd ) : g| = f } and f |Aspq () = inf g|Aspq (Rd ), where the infimum is taken over all g ∈ Aspq (Rd ) such that g| = f . We collect some important properties of spaces Aspq () which will be useful later on. For this reason, we have to restrict to bounded Lipschitz domains. We use a standard definition of the notion of Lipschitz domain, the reader may consult for example [18, Chapter 1.11.4]. Let x ∈ Rd , h ∈ Rd and M ∈ N. Then 1 f )(x) = (1h M (M+1 h f )(x) with (h f )(x) = f (x + h) − f (x), h
are the usual differences in Rd . For x ∈ we consider the differences with respect to : (M h, f )(x)
=
(M h f )(x) if x + lh ∈ for l = 0, . . . , M, 0 otherwise.
790
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
We also need to adapt the classical ball means of differences to bounded domains. Let M ∈ N, t > 0, x ∈ . Then we define V M (x, t) = {h ∈ Rd : |h| < t, x + h ∈ for 0 < M} and dtM, f (x) = t −d
V M (x,t)
|(M h f )(x)| dh.
We shall also use the simple relation (cf. [12, (4.10)]) f )( x), (dtM, f ( ·))(x) = (d M, t
x ∈ , 0 < , t < ∞.
(A.4)
The following theorem connects the classical definition of Besov and Triebel-Lizorkin spaces using differences with Definition A.3. We refer to and [8,18, 1.11.9] for details and references to this topic. Theorem A.4. Let be a bounded Lipschitz domain in Rd . Let 1 p, q ∞ and 0 < s < M ∈ N. s () is the collection of all f ∈ L () such that Then Bpq p
1
f |Lp () +
t 0
−sq
dtM, f |Lp ()q
dt t
1/q <∞
(A.5)
in the sense of equivalent norms (usual modification if q = ∞). We present a modification of the preceding theorem, which suits better for our needs. Let M ∈ N. Let P M (Rd ) be the space of all complex-valued polynomials of degree smaller than M and let P M () be its restriction to . We denote M +d −1 . DM = dim P M (Rd ) = dim P M () = d DM d DM M there exists (unique) We say, that {xj }D j =1 ⊂ R is a M-regular set if for every {yj }j =1 ∈ R d M p ∈ P (R ) such that p(xj ) = yj , j = 1, . . . , DM . In particular, if p(xj ) = 0 for p ∈ P M (Rd ) and all j = 1, 2, . . . , DM then p ≡ 0. One may observe directly (or consult [11]) that the set d d mi M m ∈ Z : 0 mi M for i = 1, 2, . . . , d and i=1
and all its translations, dilations and rotations are M-regular. M Theorem A.5. Let be a bounded Lipschitz domain in Rd , M ∈ N and let {xj }D j =1 be a M-regular set in . Let 1p, q ∞ and
d < s < M ∈ N. p
(A.6)
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
791
s () is the collection of all f ∈ L () such that Then Bpq p DM
1
|f (xj )| + 0
j =1
dt t −sq dtM, f |Lp ()q t
1/q <∞
(A.7)
in the sense of equivalent norms (usual modification if q = ∞). Proof. According to (A.6), the following embedding is true: s ¯ Bpq () → C()
and for every x ∈ s ¯ (). |f (x)| f |C()f |Bpq
This shows that the left-hand side of (A.7) is (up to some constant) smaller than the left-hand side of (A.5). s We prove the reverse inequality be contradiction. We denote the left side of (A.7) by f |Bpq () . We suppose, that there is no c > 0 such that s () f |Lp ()c f |Bpq
s for all f ∈ Bpq ().
s Then there is a sequence {fn }∞ n=1 ⊂ Bpq () such that
fn |Lp () = 1
and
s fn |Bpq () <
1 , n ∈ N. n
(A.8)
s ¯ This shows, that {fn }∞ n=1 is bounded in Bpq () and hence precompact in C(). We may therefore assume that
fn → f
¯ in C().
From (A.8) it follows that DM
|f (xj )| = 0
and (dtM, f )(x) = 0
for a.e. x ∈ .
(A.9)
j =1
The second part of (A.9) gives that f ∈ P M (). Furthermore, the definition of M-regular sets and the first part of (A.9) implies that f = 0. This contradicts (A.8). This characterisation has a direct corollary. Corollary A.6. Under the assumptions of Theorem A.5,
inf
g∈P M ()
s f − g|Bpq () ≈
1
0
t −sq dtM, f |Lp ()q
dt t
1/q
M Proof. Consider some M-regular set {xj }jD=1 and g ∈ P M () such that
g(xj ) = f (xj ),
j = 1, . . . , DM .
.
792
J. Vybíral / Journal of Complexity 23 (2007) 773 – 792
Let us mention, that the polynomial g is uniquely determined and its definition combines the function values f (x1 ), . . . , f (xDM ) in a linear way. The rest of the proof follows directly from Theorem A.5. s (Rd ) are multiplication algebras if s > We also recall the fact that the spaces Bpq cf. [15, 2.8.3].
Lemma A.7. Let 1 p, q ∞ and s >
d p.
d p,
Then
s s s h1 · h2 |Bpq (Rd ) ch1 |Bpq (Rd ) · h2 |Bpq (Rd ),
where the constant c does not depend on h1 and h2 . Finally, we consider the dilation operator Tk : f → f (2k ·), k ∈ N, and its behaviour on the scale of Besov spaces. For the proof, we refer to [3, 1.7; 9, 2.3.1]. s (Rd ) Lemma A.8. Let s 0, 1 p, q ∞ and k ∈ N. Then the operator Tk is bounded on Bp,q
and its norm is bounded by c2 does not depend on k ∈ N.
k(s− pd )
if s > 0 and by c2
−k pd
(1 + k)1/q if s = 0. The constant c
References [1] L. Bergh, J. Löfström, Interpolation Spaces, An Introduction, Berlin, Springer, 1976. [2] M.Sh. Birman, M.Z. Solomyak, Piecewise polynomial approximation of functions of the class Wp , Mat. Sb. (N. S.) 73 (1967) 331–355; English translation: Math. USSR Sb. 2 (1967) 295–317. [3] G. Bourdaud, Sur les opérateurs pseudo-différentiels à coefficinets peu réguliers, Habilitation thesis, Université de Paris-Sud, Paris, 1983. [4] A.M. Caetano, About approximation numbers in function spaces, J. Approx. Theory 94 (1998) 383–395. [5] P.G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, 1978. [6] S. Dahlke, E. Novak, W. Sickel, Optimal approximation of elliptic problems by linear and nonlinear mappings I, J. Complexity 22 (2006) 29–49. [7] S. Dahlke, E. Novak, W. Sickel, Optimal approximation of elliptic problems by linear and nonlinear mappings II, J. Complexity 22 (2006) 549–603. [8] S. Dispa, Intrinsic characterisation of Besov spaces on Lipschitz domains, Math. Nachr. 260 (2003) 21–33. [9] D.E. Edmunds, H. Triebel, Function spaces, entropy numbers, differential operators, Cambridge University Press, Cambridge, 1996. [10] S.N. Kudryavtsev, The best accuracy of reconstruction of finitely smooth functions from their values at a given number of points, Izv. Math. 62 (1) (1998) 19–53. [11] W. Light, W. Cheney, A Course in Approximation Theory, Brooks/Cole, Pacific Grove, 1999. [12] E. Novak, H. Triebel, Function spaces in Lipschitz domains and optimal rates of convergence for sampling, Constr. Approx. 23 (2006) 325–350. [13] J. Peetre, New Thoughts on Besov Spaces, Duke University Mathematics Series, Duke University Press, Durham, 1976. [14] V.M. Tikhomirov, Analysis II, Convex Analysis and Approximation Theory, Springer, Berlin, 1990. [15] H. Triebel, Theory of Function Spaces, Birkhäuser, Basel, 1983. [16] H. Triebel, Theory of Function Spaces II, Birkhäuser, Basel, 1992. [17] H. Triebel, Function spaces in Lipschitz domains and on Lipschitz manifolds. Characteristic functions as pointwise multipliers, Rev. Mat. Complut. 15 (2002) 475–524. [18] H. Triebel, Theory of Function Spaces III, Birkhäuser, Basel, 2006. [19] H. Triebel, Sampling numbers and embedding constants, Trudy Mat. Inst. Steklov 248 (2005) 275–284.
Journal of Complexity 23 (2007) 793 – 801 www.elsevier.com/locate/jco
Quantum lower bounds by entropy numbers Stefan Heinrich∗ Department of Computer Science, University of Kaiserslautern, D-67653 Kaiserslautern, Germany Received 30 November 2006; accepted 30 January 2007 Available online 13 March 2007
Abstract We use entropy numbers in combination with the polynomial method to derive a new general lower bound for the nth minimal error in the quantum setting of information-based complexity. As an application, we improve some lower bounds on quantum approximation of embeddings between finite dimensional Lp spaces and of Sobolev embeddings. © 2007 Elsevier Inc. All rights reserved. Keywords: Quantum information-based complexity; Minimal quantum error; Lower bound; Entropy number
1. Introduction There is one major technique for proving lower bounds in the quantum setting of informationbased complexity (IBC) as introduced in [5]. It uses the polynomial method [1] together with a result on approximation by polynomials from [14]. This method has been applied in [5,9,19]. Other papers on the quantum complexity of continuous problems use this implicitly by reducing mean computation to the problem under consideration and then using the lower bound for mean computation of [14] directly [15,18,11,16]. This approach, however, does not work for the case of approximation of embedding operators in spaces with norms different from the infinity norm. To settle such situations, a more sophisticated way of reduction to known bounds was developed in [6], based on a multiplicativity property of the nth minimal quantum error. In this paper we introduce an approach which is new for the IBC quantum setting. We again use the polynomial method of [1], but combine it with methods related to entropy [4]. We derive lower bounds for the nth minimal quantum error in terms of certain entropy numbers. Similar ∗ Fax: +49 631 205 3270.
E-mail address: [email protected] URL: http://www.uni-kl.de/AG-Heinrich. 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.01.007
794
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
ideas have been applied before in [17], the model and methods, however, being different, see also related work [13]. As an application, we improve the lower bounds [6,7] on approximation as well as those of [8] by removing the logarithmic factors. Let us also mention that a modification of the polynomial method based on trigonometric polynomials was used in [2,3] for proving lower bounds for a type of query different from that introduced in [5], the so-called power query [16]. Our method can also be applied in this setting and simplifies the analysis from [2,3]. We comment on this at the end of the paper. 2. Lower bounds by entropy We work in the quantum setting of IBC as introduced in [5]. We refer to this paper extensively. Let D and K be nonempty sets, let F(D, K) denote the set of all functions on D with values in K, let F ⊆ F(D, K) be nonempty, and let G be a normed linear space. Let S be a mapping from F to G, the solution operator, which we seek to approximate. Let A be a quantum algorithm from F to G. The error of A at input f ∈ F is the smallest ε 0 such that with probability at least 43 the algorithm output A(f ) is within distance ε of S(f ), formally, e(S, A, f ) = inf {ε 0 | P{S(f ) − A(f )ε}3/4} . The error over the class F is then defined as e(S, A, F ) = sup e(S, A, f ). f ∈F
For any subset C ⊆ G define the function pC : F → R by pC (f ) = P{A(f ) ∈ C}
(f ∈ F ),
the probability that the output of algorithm A at input f belongs to C. This quantity is well defined for all subsets C since the output of A takes only finitely many values, see [5]. Furthermore, define PA,F = span{pC : C ⊆ G} ⊆ F(F, R) to be the linear span of the functions pC . We need some notions related to entropy. We refer to [4] for the definitions. For a nonempty subset W of a normed space G and k ∈ N (we use the notation N = {1, 2, . . .} and N0 = {0, 1, 2, . . .}) define the kth inner entropy number as k (W, G) = sup{ε : there exist u1 , . . . , uk+1 ∈ W such that ui − uj 2ε for all 1 i = j k + 1}. It is worthwhile mentioning a related notion. The kth entropy number is defined to be εk (W, G) = inf ε : there exist g1 , . . . , gk ∈ G such that min g − gi G ε for all g ∈ W . 1i k
(1)
(2)
Then k (W, G)εk (W, G) 2k (W, G),
(3)
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
795
see [4, relations (1.1.3) and (1.1.4)]. Also observe that the first numbers of both types are related to the radius and diameter of W as follows: 1 (W, G) =
1 2
diam(W, G),
ε1 (W, G) = rad(W, G).
(4)
Entropy numbers of bounded linear operators (that means, the entropy numbers of the image of the unit ball under the action of the operator) as well as their relation to various s-numbers and to eigenvalues are well studied, see again [4] and references therein. Our basic lemma relates the error e(S, A, F ) of a quantum algorithm A from F to G to the dimension of PA,F and the entropy of S(F ) ⊆ G. Lemma 1. (i) Let k ∈ N be such that k + 1 > (log2 5) dim PA,F .
(5)
e(S, A, F )k (S(F ), G).
(6)
Then
(ii) If A is an algorithm without queries, then (7)
e(S, A, F )1 (S(F ), G).
Proof. The first part of the proof is the same for both cases. For case (i) we assume that k satisfies (5), while in case (ii) we set k = 1. Let f1 , . . . , fk+1 ∈ F be arbitrary elements and put (8) ε = min S(fi ) − S(fj ) : 1 i = j k + 1 . It suffices to show that (9)
e(S, A, F )ε/2. For ε = 0 this is trivial, so we suppose ε > 0. We assume the contrary of (9), that is,
(10)
e(S, A, F ) < ε/2. By (8), the subsets Vi ⊂ G defined by ε Vi = g ∈ G : S(fi ) − g < 2
(i = 1, . . . , k + 1)
(11)
are disjoint. It follows from (10) and (11), that for i = 1, . . . , k + 1 P{A(fi ) ∈ Vi } 43 .
(12)
Let us first complete the proof of (ii): if A has no queries, its output does not depend on f ∈ F , and in particular, the distribution of the random variables A(f1 ) and A(f2 ) is the same. But then (12) implies P{A(f1 ) ∈ V1 ∩ V2 } 1/2, thus V1 ∩ V2 = ∅, a contradiction, which proves (9) in case (ii).
796
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
Now we deal with case (i). Let C be the set of all C ⊂ G of the form Vi C= i∈I
with I being any subset of {1, . . . , k + 1}. Clearly, |C| = 2k+1 .
(13)
Let PA,F be endowed with the supremum norm p∞ = sup |p(f )|. f ∈F
We have pC ∞ 1 (C ∈ C).
(14)
Moreover, pC1 − pC2 ∞ 21
(C1 = C2 ∈ C).
(15)
Indeed, for C1 = C2 ∈ C there is an i with 1 i k + 1 such that Vi ⊆ C1 \ C2 or Vi ⊆ C2 \ C1 . Without loss of generality we assume the first. Then, because of (12), we have pC1 (fi ) = P{A(fi ) ∈ C1 } P{A(fi ) ∈ Vi } 43 , while pC2 (fi ) = P{A(fi ) ∈ C2 } P{A(fi ) ∈ G \ Vi } 41 , hence |pC1 (fi ) − pC2 (fi )| 21 implying (15). For p ∈ PA,F let B(p, r) be the closed ball of radius r around p in PA,F . By (15) the balls B(pC , 1/4) have disjoint interior for C ∈ C. Moreover, by (14), B(pC , 1/4) ⊆ B(0, 5/4). C∈C
A volume comparison gives 2k+1 = |C| 5dim PA,F , hence, taking logarithms, we get a contradiction to (5), which completes the proof.
q
Let en (S, F ) denote the nth minimal quantum error, that is, the infimum of e(S, A, F ) taken over all quantum algorithms A from F to G with at most n queries (see [5]). As an immediate consequence of Lemma 1, and also for later use, we note the following. Corollary 1. 1 2
q
diam(S(F ), G)e0 (S, F ) rad(S(F ), G).
(16)
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
797
Proof. The lower bound follows from Lemma 1(ii) and (4). The upper bound is obtained by taking for any > 0 a point g ∈ G with S(f ) − g rad(S(F ), G) +
for all f ∈ F
and then using the trivial algorithm which outputs g for all f ∈ F , with probability 1.
Next we recall some facts from [5, Section 4]. Let L ∈ N and for each u = (u1 , . . . , uL ) ∈ {0, 1}L let fu ∈ F(D, K) be assigned such that the following is satisfied: Condition I. For each t ∈ D there is an , 1 L, such that fu (t) depends only on u , in other words, for u, u ∈ {0, 1}L , u = u implies fu (t) = fu (t). The following result was shown in [5, Corollary 2], based on the idea of the quantum polynomial method [1]. Lemma 2. Let L ∈ N and assume that (fu )u∈{0,1}L ⊆ F(D, K) satisfies Condition I. Let n ∈ N0 and let A be a quantum algorithm from F(D, K) to G with n quantum queries. Then for each subset C ⊆ G,
pC (fu ) = pC f(u1 ,...,uL ) , considered as a function of the variables u1 , . . . , uL ∈ {0, 1}, is a real multilinear polynomial of degree at most 2n. Now we are ready to state the new lower bound on the nth minimal quantum error. Proposition 1. Let D, K be nonempty sets, let F ⊆ F(D, K) be a nonempty set of functions, G a normed space, S : F → G a mapping, and L ∈ N. Suppose L = (fu )u∈{0,1}L ⊆ F(D, K) is a system of functions satisfying Condition I. Then q
en (S, F )k (S(F ∩ L), G) whenever k, n ∈ N satisfy 2n L and 2n eL k + 1 > (log2 5) . 2n
(17)
Proof. Let n ∈ N with 2n L and let A be a quantum algorithm from F to G with no more than n queries. Note that, by definition, a quantum algorithm from F ⊆ F(D, K) to G is always also a quantum algorithm from F(D, K) to G (see [5, p. 7]). We show that e(S, A, F )k (S(F ∩ L), G)
(18)
for all k ∈ N satisfying (17). Let ML,2n be the linear space of real multilinear polynomials in L variables of degree not exceeding 2n. Since 2n L, its dimension is dim ML,2n =
2n
L i=0
i
eL 2n
2n (19)
798
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
(see, e.g., [12, (4.7), p. 122] for the inequality). Set U = {u ∈ {0, 1}L : fu ∈ F } and let ML,2n (U ) denote the space of all restrictions of functions from ML,2n to U. Clearly, dim ML,2n (U ) dim ML,2n .
(20)
Define : PA,F ∩L → F(U, R) by setting for p ∈ PA,F ∩L and u ∈ U (p)(u) = p(fu ). Obviously, is linear, moreover, for C ⊆ G (pC )(u) = pC (fu )
(u ∈ U ).
By Lemma 2, pC (fu ), as a function of u ∈ U , is the restriction of an element of ML,2n to U. Hence, pC ∈ ML,2n (U ), and by linearity and the definition of PA,F ∩L as the linear span of functions pC , we get (PA,F ∩L ) ⊆ ML,2n (U ). Furthermore, is one-to-one, since {fu : u ∈ U } = F ∩ L. Using (19) and (20) it follows that 2n eL . dim PA,F ∩L dim ML,2n (U ) 2n Consequently, for k satisfying (17), k + 1 > (log2 5) dim PA,F ∩L . Now (18) follows from Lemma 1.
3. Some applications For N ∈ N and 1p ∞, let LN p denote the space of all functions f : {1, . . . , N} → R, equipped with the norm 1/p N 1 p f LNp = |f (i)| , N i=1
if p < ∞, f LN∞ = max |f (i)|, 1i N
N N N N and let B(LN p ) be its unit ball. Define Jpq : Lp → Lq to be the identity operator Jpq f = f (f ∈ N Lp ).
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
799
N was obtained using a mulAs already mentioned, the lower bound for approximation of Jpq tiplicativity property of the nth minimal quantum error [6, Proposition 1]. The result involved some logarithmic factors of negative power [6, Proposition 6]. Based on Proposition 1 above we improve this bound by removing the logarithmic factors.
Proposition 2. Let 1 p, q ∞. There is a constant c > 0 such that for all n ∈ N0 , N ∈ N with n cN q
N 1 en (Jpq , B(LN p )) 8 .
Proof. It suffices to prove the case p = ∞, q = 1. We put L = N and fu = u for u ∈ {0, 1}N . Clearly, the system L = (fu )u∈{0,1}N satisfies Condition I and L ⊂ B(LN ∞ ).
(21)
Let {fui : 1i k + 1} be a maximal in L system with fui − fuj LN 41 1
(1i = j k + 1),
(22)
i.e., a system which has no proper superset in L satisfying (22). Maximality implies {0, 1}N =
k+1
u ∈ {0, 1}N : fu − fui LN < 1
i=1
1 . 4
On the other hand, k+1
1 N N 2 u ∈ {0, 1} : fu − fui LN1 < 4 i=1
N (k + 1)(4e)N/4 (k + 1) j 0 j
again by [12, (4.7), p. 122]. It follows that k + 1 2N (4e)−N/4 = 2c1 N with c1 = 41 log2 ( 4e ) > 0, hence k ∈ N. From (22) we obtain N 1 k J∞,1 (L), LN 1 8.
(23)
(24)
Consider the function g : (0, 1] → R, 1 g(x) = x log2 e + log2 . x It is elementary to check that g is monotonically increasing. Moreover g(x) → 0 as x → 0. Choose 0 < c2 1 in such a way that g(x) <
c1 2
(0 < x c2 ).
(25)
800
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
Now put
c2 1 c1 , , c = min 2 log2 log2 5 2 2
(26)
and assume (27)
ncN. If n = 0, Corollary 1 gives q
N N e0 (J∞,1 , B(LN ∞ )) = J∞,1 = 1.
(28)
Hence we can suppose that n1, which, by (27), implies N c−1 . Consequently, from (26), log2 log2 5 c1 . N 2
(29)
Since by (26) and (27), 2n/N 2c c2 , we get from (25) 2n c1 N log2 e + log2 < , N 2n 2
(30)
and therefore, with (29), log2 log2 5 2n N + log2 e + log2 < c1 . N N 2n
(31)
This implies, using also (23), eN 2n (log2 5) < 2c1 N k + 1. 2n
(32)
Since we have k, n ∈ N satisfying (32), and moreover, by (26) and (27), 2nN , we can use Proposition 1 together with (21) and (24) to conclude q N N N 1 en (J∞,1 , B(LN ∞ ))k J∞,1 (L), L1 8 . Using Proposition 2 we can also remove the logarithmic factors in another lower bound—for Sobolev embeddings Jpq : Wpr ([0, 1]d ) → Lq ([0, 1]d ), see [7] for the notation and Proposition 2 of that paper for the previous result. The following can be derived from Proposition 2 using the same argument as in [7, relations (87) and (88), p. 43]. Corollary 2. Let 1 p, q ∞, r, d ∈ N, and assume constant c > 0 such that for all n ∈ N
r d
> max
1 2 p, p
−
2 q
. Then there is a
en (Jpq , B(Wpr ([0, 1]d )))cn−r/d . q
Furthermore, the lower bounds from [6] were also used in [8, Proposition 3, Corollary 3]. Using Proposition 2, these results can be improved in a similar way. We omit the details. Let us finally comment on lower bounds for power queries introduced in [16]. An inspection of the proof of Lemma 1 shows that the type of query is not used at all in the argument, so the
S. Heinrich / Journal of Complexity 23 (2007) 793 – 801
801
statement also holds for power queries. One part of the argument in both [2,3] consists of proving that for a quantum algorithm with at most n power queries and for a suitable subset F0 ⊆ F , which can be identified with the interval [0, 1], the respective space PA,F0 is contained in the (complex) linear span of functions e2i t (t ∈ [0, 1]), with frequencies from a set of cardinality not greater than cn for some c > 0, hence, dim PA,F0 2cn . Moreover, S(F0 ) can also be identified with the unit interval. Now Lemma 1 above directly yields the logarithmic lower bounds of [2,3], since the kth inner entropy number of the unit interval is k −1 . References [1] R. Beals, H. Buhrman, R. Cleve, M. Mosca, R. de Wolf, Quantum lower bounds by polynomials, in: Proceedings of the 39th IEEE FOCS, 1998, pp. 352–361, see also http://arXiv.org/abs/quant-ph/9802049. [2] A. Bessen, A lower bound for quantum phase estimation, Phys. Rev. A 71 (2005) 042313 see also
http://arXiv.org/abs/quant-ph/0412008. [3] A. Bessen, A lower bound for the Sturm–Liouville eigenvalue problem on a quantum computer, J. Complexity 22 (2006) 660–675 see also http://arXiv.org/abs/quant-ph/0512109. [4] B. Carl, I. Stephani, Entropy, Compactness and the Approximation of Operators, Cambridge University Press, Cambridge, 1990. [5] S. Heinrich, Quantum summation with an application to integration, J. Complexity 18 (2002) 1–50 see also
http://arXiv.org/abs/quant-ph/0105116. [6] S. Heinrich, Quantum approximation I. Embeddings of finite dimensional Lp spaces, J. Complexity 20 (2004) 5–26 see also http://arXiv.org/abs/quant-ph/0305030. [7] S. Heinrich, Quantum approximation II. Sobolev embeddings, J. Complexity 20 (2004) 27–45 see also
http://arXiv.org/abs/quant-ph/0305031. [8] S. Heinrich, On the power of quantum algorithms for vector valued mean computation, Monte Carlo Methods Appl. 10 (2004) 297–310. [9] S. Heinrich, E. Novak, On a problem in quantum summation, J. Complexity 19 (2003) 1–18 see also
http://arXiv.org/abs/quant-ph/0109038. [11] B. Kacewicz, Almost optimal solution of initial-value problems by randomized and quantum algorithms, J. Complexity 22 (2006) 676–690 see also http://arXiv.org/abs/quant-ph/0510045. [12] J. Matousek, Geometric Discrepancy. An Illustrated Guide, Springer, Berlin, 1999. [13] A. Nayak, Optimal lower bounds for quantum automata and random access codes, FDCS, 1999 p. 369, see also
http://arXiv.org/abs/quant-ph/9904093. [14] A. Nayak, F. Wu, The quantum query complexity of approximating the median and related statistics, in: STOC, May 1999, pp. 384–393, see also http://arXiv.org/abs/quant-ph/9804066. [15] E. Novak, Quantum complexity of integration, J. Complexity 17 (2001) 2–16 see also http://arXiv.org/ abs/quant-ph/0008124. [16] A. Papageorgiou, H. Wo´zniakowski, Classical and quantum complexity of the Sturm–Liouville eigenvalue problem, Quantum Inform. Process. 4 (2005) 87–127 see also http://arXiv.org/abs/quant-ph/0502054. [17] Y. Shi, Entropy lower bounds for quantum decision tree complexity, Inform. Process. Lett. 81 (1) (2002) 23–27 see also http://arXiv.org/abs/quant-ph/0008095. [18] J.F. Traub, H. Wo´zniakowski, Path integration on a quantum computer, Quantum Inform. Process. 1 (5) (2002) 365–388 see also http://arXiv.org/abs/quant-ph/0109113. [19] C. Wiegand, Quantum complexity of parametric integration, J. Complexity 20 (2004) 75–96 see also
http://arXiv.org/abs/quant-ph/0305103.
Journal of Complexity 23 (2007) 802 – 827 www.elsevier.com/locate/jco
On the complexity of the multivariate Sturm–Liouville eigenvalue problem A. Papageorgiou∗ Department of Computer Science, Columbia University, New York, USA Received 30 November 2006; accepted 12 March 2007 Available online 28 March 2007 Dedicated to Henryk Wozniakowski on the occasion of his 60th birthday
Abstract We study the complexity of approximating the smallest eigenvalue of − + q with Dirichlet boundary conditions on the d-dimensional unit cube. Here is the Laplacian, and the function q is non-negative and has continuous first order partial derivatives. We consider deterministic and randomized classical algorithms, as well as quantum algorithms using quantum queries of two types: bit queries and power queries. We seek algorithms that solve the problem with accuracy ε. We exhibit lower and upper bounds for the problem complexity. The upper bounds follow from the cost of particular algorithms. The classical deterministic algorithm is optimal. Optimality is understood modulo constant factors that depend on d. The randomized algorithm uses an optimal number of function evaluations of q when d 2. The classical algorithms have cost exponential in d since they need to solve an eigenvalue problem involving a matrix with size exponential in d. We show that the cost of quantum algorithms is not exponential in d, regardless of the type of queries they use. Power queries enjoy a clear advantage over bit queries and lead to an optimal complexity algorithm. © 2007 Elsevier Inc. All rights reserved. Keywords: Eigenvalue problem; Eigenvalue approximation
1. Introduction In a recent paper with Wo´zniakowski [21] we studied the classical and quantum complexity of the Sturm–Liouville eigenvalue problem. This paper extends those results to the multidimensional case. By analogy with the Sturm–Liouville eigenvalue problem [9] in one dimension, we consider the eigenvalue problem −u + qu = u defined on the d-dimensional unit cube with Dirichlet boundary condition. Here is the d-dimensional Laplacian, and q is a non-negative function ∗ Fax: +1 212 666 0140.
E-mail address: [email protected]. 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.002
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
803
of d variables whose first order partial derivatives exist and are continuous. Then we study the complexity of approximating the smallest eigenvalue (q) with accuracy ε. We assume that q is not explicitly known but we can sample it at any point of the unit cube. Any algorithm solving this problem will need to compute a number of evaluations of q and to combine them to obtain an approximation of the eigenvalue of interest. Classical algorithms may be deterministic or randomized. The former evaluate q at deterministically chosen points, while the latter can sample q at randomly chosen points. Moreover, randomized algorithms may also combine the evaluations of q randomly. We obtain the worst case error of classical deterministic algorithms, and the worst expected error of randomized algorithms. We address the information cost of classical algorithms, i.e., the number of function evaluations the algorithms use, as well as their total cost by taking into account the additional cost of the operations that are used for combining the function evaluations. Accordingly, the minimal information cost of any algorithm solving the problem with accuracy ε is the information complexity of the problem, while the minimal total cost of any algorithm with error at most ε is the problem complexity. Clearly, the information complexity provides a lower bound for the problem complexity. Quantum algorithms use quantum queries to evaluate q at deterministically chosen points. (Recently, quantum algorithms with randomized queries have been considered [30] but we do not deal with them in this paper.) The query information is combined using a number of quantum operations. Quantum algorithms succeed in producing an ε-approximation with probability, say, 3 4 . The minimal number of queries of any algorithm solving the problem with accuracy ε is the query complexity. The total cost of a quantum algorithm takes into account the additional quantum operations, excluding the ones used for queries, required to solve the problem with accuracy ε. We will distinguish between quantum algorithms using two types of queries, bit queries and power queries. Bit queries are oracle calls similar to those in Grover’s search algorithm [13]. Power queries are obtained by considering the propagator of the system at different time steps, as in phase estimation [19]. In some cases, quantum algorithms may be used to solve parts of the problem while other parts may be solved classically. In such a case we need to consider the cost of the classical and the quantum parts. The definition of the error of algorithms and the details of the model of computation in the different settings can be found in [21] but we will include them in this paper for the convenience of the reader. Turning to the eigenvalue problem we show a perturbation formula relating the eigenvalues (q) and (q) ¯ for two functions q and q, ¯ as in [21]. In particular, we show that (q) = (q) ¯ + ¯ u2q¯ (x) dx + O q − q ¯ 2∞ , (q(x) − q(x)) Id
¯ Using this equation we reduce the eigenwhere uq¯ is the eigenfunction that corresponds to (q). value problem to the integration problem. For deterministic and randomized classical algorithms we use known lower bounds [25] for the information complexity of integration to obtain lower bounds for the information complexity of the eigenvalue problem. For upper bounds we study the cost of particular algorithms that approximate (q) with error ε. We show that by discretizing the continuous problem and solving the resulting matrix eigenvalue problem we obtain an optimal deterministic algorithm. Optimality is understood modulo multiplicative constants that depend on d. We derive a randomized algorithm using the perturbation formula above. Roughly speaking, the idea is to first approximate q by a
804
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
function q, ¯ and then to approximate the first two terms in the right-hand side of the perturbation formula. Using a matrix discretization we approximate (q) ¯ and using Monte Carlo (MC) we approximate the weighted integral. We derive the cost of the algorithm and show that it has optimal information complexity only when d 2. Proving the optimality of this algorithm for d > 2 is an open question at this time. In summary, denoting by n(ε) and comp(ε) the information complexity and the problem complexity, for deterministic algorithms we have n(ε) = (ε−d ), (c ε−d ) = comp(ε) = O(c ε−d + ε −d log ε−1 ), while for randomized algorithms we have (ε−2d/(d+2) ) = n(ε) = O(ε− max(2/3,d/2) ), (ε−2d/(d+2) ) = comp(ε) = O(c ε− max(2/3,d/2) + ε −d log ε−1 ), where the asymptotic constants depend on d, and c denotes the cost of one function evaluation. It is worth pointing out that even if one is able to obtain matching information complexity bounds for any d, the combinatorial cost (i.e., the number of operations excluding function evaluations) of the randomized algorithm is still exponential in d, because we have to solve a matrix eigenvalue problem and the size of the matrix is exponential in d. For quantum algorithms, we treat algorithms using bit queries and power queries separately. For quantum algorithms with bit queries we use the perturbation formula above to reduce the problem to integration. We obtain lower bounds for the query complexity of integration, which yield lower bounds for the query complexity of the eigenvalue problem. We see that we can modify the classical randomized algorithm we discussed above, to obtain a hybrid algorithm, i.e., an algorithm with classical and quantum parts. The only difference with the randomized algorithm is that, instead of using MC, we approximate the weighted integral in the perturbation formula by a quantum algorithm. The quantum algorithm that approximates the integral is due to Novak [20]. We show that the number of queries plus the number of classical function evaluations of the hybrid algorithm matches the query complexity lower bound only when d = 1. Then q ∈ C 1 ([0, 1]), while for q ∈ C 2 ([0, 1]) the same result has been shown in [21]. When d > 1 we only show that the algorithm has information cost and uses a number of classical function evaluations that is exponential in d. The cost of approximating q by q¯ with error ε is dominant in the worst case. As we already indicated, even if we are able to show matching upper and lower bounds for the query complexity that are also proportional to the classical information cost when d > 1, the number of classical operations required by the algorithm is still exponential in d, due to the cost of the matrix eigenvalue problem. However, there is a different quantum algorithm (without any classical parts) that uses bit queries whose cost is not exponential in d. Indeed, we can use phase estimation to solve the problem. Phase estimation typically uses power queries [1,19] but we can approximate the power queries using a number of bit queries that is polynomial in ε −1 , where the degree of the polynomial is independent of d. Denoting the query complexity by nquery (ε) we show that for bit queries (ε−d/(d+1) ) = nquery (ε) = O(ε−6 log2 ε −1 ),
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
805
where the asymptotic constants depend on d. Moreover the algorithm uses a number of quantum operations, excluding the queries, that is proportional to dε −6 log4 ε −1 , a number of qubits proportional to d log ε−1 , and the algorithm succeeds with probability at least 43 . We remark that due to the results of [30] the number of qubits is optimal modulo multiplicative constants. Phase estimation with power queries has a considerable advantage since nquery (ε) = (log ε −1 ), where the asymptotic constant is an absolute constant, and the lower bound follows using the results of [5,6]. The number of quantum operations, excluding queries, is proportional to log2 ε −1 , the number of qubits is proportional to d log ε−1 and thereby optimal, while the algorithm succeeds with probability at least 43 . 2. Problem definition Let Id = [0, 1]d and consider the class of functions *q Q = q: Id → [0, 1]q, Dj q := ∈ C(Id ), Dj q∞ 1, q∞ 1 , *xj where · ∞ denotes the supremum norm. For q ∈ Q, define Lq := − + q, where = d 2 2 j =1 * /*xj is the Laplacian, and consider the eigenvalue problem Lq u = u, x ∈ (0, 1)d , u(x) ≡ 0, x ∈ *Id . In the variational form, the smallest eigenvalue = (q) of (1), (2) is given by d 2 2 j =1 [Dj u(x)] + q(x)u (x) dx Id (q) = min . 2 0=u∈H01 Id u (x) dx
(1) (2)
(3)
We will study the complexity of classical and quantum algorithms approximating (q) with error ε. We will show asymptotic bounds for the error of the algorithms and the problem complexity, assuming that d is fixed. Henceforth, all asymptotic constants in the error estimates, the complexity estimates and the cost of algorithms are either absolute constants or depend on d. Often we will be addressing these constants. In some cases their nature will be evident from the properties of the algorithm under consideration, but in all cases, especially when the constants are omitted from the discussion, the reader may assume they depend only on d for simplicity.
806
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
2.1. Preliminary analysis The properties of the eigenvalues and eigenvectors of problems such as (1), (2) (defined on a rectangular domain) are discussed extensively in [23] where it is shown that the eigenfunctions are continuous and they have continuous partial derivatives of the first order, including the boundary of Id . The operator Lq is symmetric and its eigenvalues and eigenvectors are real. The eigenvalues are positive, they can be indexed in non-decreasing order 0 < 1 (q)2 (q) · · · k (q) · · · , and the sequence of eigenvalues tends to infinity. We denote the corresponding eigenvectors by uq,k , k = 1, 2, . . . . The smallest eigenvalue (q) ≡ 1 (q) is simple, the corresponding eigenspace has dimension one, and the eigenvector, uq ≡ uq,1 , is uniquely determined up to the sign. It is convenient to assume that the uq,k are normalized, i.e.,
1/2 2 uq,k L2 := uq,k (x) dx = 1, k = 1, 2 . . . . Id
Thus they form a complete orthonormal system in L2 (Id ). Then (3) becomes d (q) = min (Dj u)2 (x) + q(x)u2 (x) dx u∈H01 ,uL2 =1 Id j =1
=
d Id j =1
(Dj uq )2 (x) + q(x)u2q (x) dx.
(4)
For additional details concerning the properties of eigenvalues and eigenfunctions of elliptic operators as well as numerical methods approximating them, see [2,9,11,12] and the references therein. For a constant function q ≡ c we know that (c) = d + c 2
and
uc (x1 , . . . , xd ) = 2
d/2
d
sin(xj ).
j =1
It is also known that the eigenvalues of Lq are non-decreasing functions of q [9,23], i.e., q(x) ¯ for all k = 1, 2, . . . . Thus, using (4) for the q(x), ¯ for all x ∈ [0, 1]d , implies that k (q)k (q), class Q we get d2 = (0)(q)d2 + 1,
q ∈ Q.
For d > 1, the eigenvalues of Lq are, generally, not all simple. However, as in the case d = 1, the smallest eigenvalue (q) is simple and is well separated from the remaining eigenvalues. This is because of the non-decreasing property of the eigenvalues of Lq with respect to q, and the fact that the second smallest eigenvalue of L0 is equal to 2 (0) = (d + 3)2 . Therefore, using (q) d2 + 1, we obtain k (q) − (q)2 (q) − (q)32 − 1,
k 3, q ∈ Q.
(5)
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
807
We will use this fact to establish an estimate for the smallest eigenvalue by considering a perturbation of q. For any two functions q, q¯ ∈ Q we have |(q) − (q)| ¯ q − q ¯ ∞, uq − uq¯ L2 O (q − q ¯ ∞) , (q) = (q) ¯ + ¯ u2q¯ (x) dx + O q − q ¯ 2∞ . (q(x) − q(x))
(6) (7) (8)
Id
Eqs. (6) and (8) are derived as in [21]. They follow from elementary arguments and (7). For the convenience of the reader we point out that it is easy to show that 2 2 2 (q(x) − q(x))u ¯ (q(x) − q(x))(u ¯ (q)(q) ¯ + q (x) dx + q¯ (x) − uq (x)) dx, Id
and similarly
Id
(q)(q) ¯ + Id
(q(x) ¯ − q(x))u2q (x) dx.
These inequalities imply (6), and using them with (7) we obtain (8). Moreover, 2 2 (q(x) − q(x))(u ¯ q¯ (x) − uq (x)) dx 0. Id
We prove Eq. (7) using an approach similar to that in [29], which is based on the separation between 2 (q) and (q). It is a different proof from the one used in [21]. Indeed, let q and q¯ be two functions from the class Q and consider Lq and Lq¯ . Let k (q), uq,k and k (q), ¯ uq,k ¯ , be the eigenvalues and the normalized eigenvectors, k = 1, 2 . . ., of Lq and Lq¯ , respectively. Then Lq¯ uq − (q)uq = Lq uq + (q¯ − q)uq − (q)uq , which implies that Lq¯ uq − (q)uq L2 = (q¯ − q)uq L2 q¯ − q∞ . Since the eigenvectors of Lq¯ form a complete orthonormal system in L2 (Id ) we have uq =
∞
ak uq,k ¯
with uq 2L2 =
k=1
∞
ak2 = 1, ak ∈ R
k=1
and Lq¯ uq =
∞
ak k (q)u ¯ q,k ¯ .
k=1
Thus
2 ∞ ∞ q − q ¯ 2∞ a [ ( q) ¯ − (q)]u = ak2 |k (q) ¯ − (q)|2 k k q,k ¯
k=1 ∞
L2
k=1 ∞ ak2 |k (q) ¯ − (q)|2 (32 − 1)2 ak2 k=2 k=2
= (32 − 1)2 (1 − a12 ),
808
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
where the last inequality is due to the lower bound (5). Thus ¯ 2∞ . a12 1 − (32 − 1)−2 q − q
(9)
Observe that the inequality above implies that a12 0.99. Without loss of generality we assume that the sign of uq has been chosen such that a1 > 0 and then a1 1 − (32 − 1)−2 q − q ¯ 2∞ . Also uq − uq¯ 2L2 = (1 − a1 )2 +
∞
ak2
(10)
k=2
= (1 − a1 )2 + 1 − a12 = 2(1 − a1 ) ¯ 2∞ 2 1 − 1 − (32 − 1)−2 q − q ¯ 2∞ , (32 − 1)−2 q − q
(11)
¯ 2∞ ∈ (0, 1), and this proves where the last inequality is due to the fact that (32 − 1)−2 q − q (7). As a final remark, we observe that the same analysis that led to (9) can be used to establish that (q) is indeed a simple eigenvalue for any q ∈ Q. Clearly (0) is simple. Using (9) with q¯ = 0, we obtain that the square of the projection of u0 onto uq is bounded from below, i.e., 2 that Id u0 (x)uq (x) dx > 21 . If (q) were not simple and the eigenspace corresponding to it had dimension greater than one, there would be at least two orthogonal eigenfunctions uq,1 and uq,2 (both corresponding to (q)). Then each of the projections of uq,1 and uq,2 on u0 satisfies the preceding inequality (since (0) is simple). Thus, expanding u0 using the eigenfunctions of Lq would lead us to conclude that u0 L2 > 1, a contradiction since we have assumed u0 is a normalized eigenfunction. 3. Classical algorithms Let us now discuss the type of classical algorithms we consider and define how we measure their error and cost. These algorithms can be either deterministic or randomized. They use information about the functions q from Q by computing q(ti ) for some discretization points ti ∈ [0, 1]d . Here, i = 1, 2, . . . , nq , for some nq , and the points ti can be adaptively chosen, i.e., ti can be a function ti = ti (t1 , q(t1 ), . . . , ti−1 , q(ti−1 )) of the previously computed function values and points for i 2. The number nq can also be adaptively chosen, see, e.g., [25] for details. A classical deterministic algorithm produces an approximation (q) = (q(t1 ), . . . , q(tnq )) to the smallest eigenvalue (q) based on finitely many values of q computed at deterministic points. Let n = supq∈Q nq . We assume that n < ∞. The worst case error of such a deterministic
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
809
algorithm is given by ewor (, n) = sup |(q) − (q)|.
(12)
q∈Q
A classical randomized algorithm produces an approximation to (q) based on finitely many evaluations of q computed at random points, and is of the form (q) = (q(t1, ), . . . , q(tnq, , )), where , ti, and nq, are random variables. We assume that the mappings → ti, = ti (t1, , q(t1, ), . . . , ti−1, , q(ti−1, )), → , → nq, are measurable. Let nq = E(nq, ) be the expected number of values of the function q with respect to . As before, we assume that n = supq∈Q nq < ∞. The randomized error of such a randomized algorithm is given by 1/2 . (13) eran (, n) = sup E[(q) − (q)]2 q∈Q
We denote the minimal number of function values needed to compute an ε-approximation of the Sturm–Liouville eigenvalue problem in the worst case and randomized settings by nwor (ε) = min{ n: ∃ such that ewor (, n) ε } and nran (ε) = min{ n : ∃ such that eran (, n) ε }, respectively. We refer to nwor (ε) and nran (ε) as the worst case and the randomized case information complexity, respectively. We also consider the cost of combining the function evaluations. For a function q ∈ Q, let mq be the number of arithmetic operations used by an algorithm in order to combine nq function values and obtain the final result. Then the worst case cost of an algorithm is defined as costwor () = sup c nq + mq , q∈Q
where c denotes the cost of an evaluation of q. The worst case complexity compwor (ε) is defined as the minimal cost of an algorithm whose worst case error is at most ε, compwor (ε) = min cost wor () : such that ewor (, n) ε . Obviously, compwor (ε) c nwor (ε). The cost of a randomized algorithm using n = supq∈Q E(nq, ) < ∞ randomized function evaluations is defined as 2 1/2 , costran () = sup E c nq, + mq, q∈Q
810
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
where mq, is the number of arithmetic operations used by the algorithm for a function q from Q and a random variable . The randomized complexity compran (ε) = min cost ran () : such that eran (, n) ε is the minimal cost of an algorithm whose randomized error is at most ε. Obviously, compran (ε) c nran (ε). 3.1. Deterministic algorithms In this section we derive lower and upper bounds for the error and the complexity of deterministic algorithms in the worst case. We begin with the lower bounds. Our derivation is based on the proof in [21] which deals with the case d = 1. Let q¯ = 21 and consider q ∈ Q such that q − 21 ∞ c. Then u1/2 is known and (8) becomes (q) = d2 +
1 + 2d 2
q(x1 , . . . , xd ) − Id
1 2
d
sin2 (xj ) dx1 . . . dxd + ,
(14)
j =1
where || = O(c2 ). Recall that 0 because the first three terms in the right-hand side of the equation overestimate (q) due to (4). Functions that differ by a constant satisfy the above equation with the same value of . Assume that c > 0 is sufficiently small so that c + || < 21 . We will reduce the eigenvalue problem to the multivariate integration problem and use the well known [25] lower bounds for integration to establish a lower bound for the eigenvalue problem. Consider the class of functions Fc = f : Id → R f, Dj f ∈ C(Id ), Dj f ∞ 1, j = 1, . . . , d, f ∞ c (15) and the approximation of weighted integrals of the form S(f ) =
f (x1 , . . . , xd ) Id
d
sin2 (xj ) dx1 . . . dxd .
(16)
j =1
The worst case error of any deterministic algorithm approximating such integrals using n points in Id is (n−1/d ), where the asymptotic constant depends on d. Here we assume that n is large enough so that c?n−1/d . This lower bound is known [25] for integration without weights but the same proofs carry over to this case. Take an f ∈ Fc and set q = f + 21 . Then q belongs to Q. The functions q ± also belong to the class Q because c + || < 21 . Let q˜ = q − . Then (q) ˜ = d2 +
1 2
+ 2d S(f ).
ˆ q) Let ( ˜ be an algorithm approximating (q) ˜ using n function evaluations of q˜ at deterministic points. Then ˆ q) ˜ − d2 − 21 (17) (f ) = 2−d (
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
811
is an algorithm approximating the weighted integral S(f ) with error ˆ q) |S(f ) − (f )| = 2−d |( ˜ − (q)|. ˜ In the worst case with respect to f this quantity is (n−1/d ). Hence, the error of any deterministic algorithm ˆ that approximates (q), for q ∈ Q, using n evaluations of q is bounded from below as follows: ˆ n) = sup |(q) ˆ ewor (, − (q)| = (n−1/d ), q∈Q
where the asymptotic constant depends on d. Therefore, the worst case information complexity nwor (ε) is bounded from below by a quantity proportional to ε −d . Let us now consider upper bounds for the problem complexity. We discretize Lq at the points (i1 h, . . . , id h), ij = 1, . . . , m, j = 1, . . . , d, where h = (m + 1)−1 , and we obtain an md × md matrix Mh (q) = −h +Bh (q), where −h is the md ×md matrix resulting from the (2d +1)-point finite difference discretization of the Laplacian [11,12]. The matrix Bh (q) is diagonal containing evaluations of q at all the discretization points. The matrix Mh (q) is sparse, symmetric positive definite, and its smallest eigenvalue approximates the smallest eigenvalue of Lq with error O(h) [26,27], i.e., |(q) ¯ − (Mh (q))| ¯ = O(h). For example when d = 2 we have ⎛ ⎞ Th −I ⎜ −I Th −I ⎟ ⎜ ⎟ ⎟ −2 ⎜ . . . .. .. .. −h = h ⎜ ⎟ ⎜ ⎟ ⎝ −I Th −I ⎠ −I Th where I is the m × m matrix given by ⎛ 4 −1 ⎜ −1 4 ⎜ ⎜ .. Th = ⎜ . ⎜ ⎝
⎛
and
⎜ ⎜ ⎜ Bh (q) ¯ =⎜ ⎜ ⎜ ⎝
⎞
b11 ..
⎟ ⎟ ⎟ ⎟, ⎟ ⎟ ⎠
. bij ..
. bmm
identity matrix, bij = q(ih, j h), i, j = 1, . . . , m, and Th is the m × m ⎞ ⎟ −1 ⎟ ⎟ .. .. ⎟, . . ⎟ −1 4 −1 ⎠ −1 4
see [11, p. 270] for more details. The matrix −h has been extensively studied in the literature; see [11,12] and the references therein. Its eigenvalues and eigenvectors are known. The smallest eigenvalue of Mh (0) = −h is (Mh (0)) = 4dh−2 sin2 (h/2) = d2 (1 + O(h2 )). Moreover, the eigenvectors of Mh (0) are tensor products of the eigenvectors of the corresponding matrix in the one-dimensional case d = 1. This is also trivially true for the eigenvectors of Mh (c), where c is any constant. Using results concerning the eigenvalues of perturbed symmetric matrices [29] we have that the smallest eigenvalue (Mh (q)) of Mh (q) satisfies |(Mh (0)) − (Mh (q))|1.
812
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
Moreover, the eigenvalues k (Mh (q)), k = 1, . . . , md (indexed in non-decreasing order) satisfy an equation similar to (5), namely, k (Mh (q)) − (Mh (q))2 (Mh (q)) − (Mh (q)) 32 − 2,
k 3, q ∈ Q.
(18)
The inequalities follow from results concerning the eigenvalues of the sum of two symmetric matrices [29, p. 101], and the separation of the eigenvalues of the matrix Mh (0) = −h . We can approximate the smallest eigenvalue of Mh (q) with error h using the bisection method [11, p. 228] in O(log m) steps. Each step takes a number of arithmetic operations proportional to the number of non-zero elements in Mh (q), which is O(md ), with the asymptotic constant depending on d. Hence, the total cost of approximating (Mm (q)) is O(md log m), and the asymptotic constant depends on d. Setting m + 1 = ε−1 , we obtain an algorithm that approximates (q) by the smallest eigenvalue of the matrix (Mε (q)). This algorithm has error O(ε) and uses ε −d evaluations of q, and O(ε−d log ε−1 ) arithmetic operations. Combining the lower bound for nwor (ε) from the first part of this section with the cost of the algorithm above we obtain the following theorem. Theorem 3.1. nwor (ε) = (ε−d ),
(c ε−d ) = compwor (ε) = O(c ε−d + ε −d log ε−1 ),
where the asymptotic constants depend on d. We conclude this section by remarking that we can extend these results about nwor (ε) to the case where q has continuous and bounded partial derivatives up to order r. The same approach yields that nwor (ε) = (ε−d/r ). So we have a delayed curse of dimension. 3.2. Randomized algorithms We first prove lower bounds for nran (ε) just as we proved lower bounds for nwor (ε). We reduce the problem to multivariate integration and use the known randomized information complexity lower bounds for integration. Recall the perturbation formula (14), the definition (15) of the class Fc and the weighted integration problem (16). Assuming that n is sufficiently large so that c?n−(d+2)/(2d) , we know [25] that the error of any randomized algorithm that approximates the weighted integral S(f ) using n function evaluations at randomly chosen points is bounded from below by a quantity proportional to n−(d+2)/(2d) . (As we already mentioned, this is known for integrals without weights but the same proofs carry over to this case.) For f ∈ Fc , set q = f + 21 ∈ Q and q˜ = q − ∈ Q; see (14). Let ˆ be any randomized algorithm that uses n function evaluations to approximate (q). ˜ Then (f ), defined by replacing ˆ with ˆ in (17), is a randomized algorithm approximating S(f ), and its error is
E[S(f ) − (f )]2
1/2
1/2 = 2−d E[ˆ (q) ˜ − (q)] ˜ 2 .
Taking the worst case with respect to f we see that this quantity is (n−(d+2)/(2d) ).
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
813
Therefore for any randomized algorithm ˆ that approximates (q), for q ∈ Q, using n function evaluations of q at randomly chosen points, we have ˆ n) = (n−(d+2)/(2d) ), eran (, which implies that nran (ε) = (ε−2d/(d+2) ), where the asymptotic constant depends on d. We now derive upper bounds for compran by constructing an algorithm. First we take (n + 1)d samples of q on a grid of equally spaced points (i1 /n, . . . , id /n), ij = 0, . . . , n, j = 1, . . . , d. Using these points we construct a piecewise polynomial q˜ by interpolation. For instance, q˜ can be a natural spline. Then q˜ − q∞ = O(n−1 ). Setting q¯ = q˜ + O(n−1 ) we have that q¯ 0 and q¯ − q∞ = O(n−1 ). Clearly, given the evaluations of q, q¯ can be constructed with O(nd ) arithmetic operations. The perturbation formula (8) for q and q¯ becomes (q) = (q) ¯ + (q(x) − q(x)) ¯ u2q¯ (x) dx + O(n−2 ). (19) Id
We will approximate (q) by an algorithm that ˆ q) 1. computes ( ¯ that approximates (q) ¯ by discretizing Lq¯ and solving a matrix eigenvalue problem, 2. replaces uq¯ in the integral above by an approximate eigenfunction uˆ q¯ , and 3. approximates the resulting integral by MC. Therefore, (19) becomes 2 (q) = (q) ¯ + (q(x) − q(x)) ¯ uˆ q¯ (x) dx + (q(x) − q(x)) ¯ (u2q¯ (x) − uˆ 2q¯ (x)) dx Id
Id
+O(n−2 ),
(20)
and the algorithm approximates the first two terms in the right-hand side of this expression. In particular, the algorithm is given by 1 ˜ ˆ q) (q) := ( ¯ + (q(ti, ) − q(t ¯ i, )) uˆ 2q¯ (ti, ), k k
(21)
i=1
where t1, , . . . , tk, are independent random numbers that follow the uniform distribution in Id . Then the expected error of this algorithm satisfies 1/2 2 ˜ ˆ q)| E[(q) − (q)] |(q) ¯ − ( ¯ 2 !1/2 k 1 2 2 + E (q(x) − q(x)) ¯ uˆ q¯ (x) dx − (q(ti, ) − q(t ¯ i, )) uˆ q¯ (ti, ) k Id i=1
+ uq¯ − uˆ q¯ L2 O(n
−1
) + O(n
−2
).
(22)
Let us now discuss the individual steps of the algorithm and the resulting errors. We discretize the operator Lq¯ on a grid with mesh size h = (m + 1)−1 , exactly as we did in the previous section. The smallest eigenvalue (Mh (q)) ¯ of the resulting matrix approximates (q) with error |(q) ¯ − (Mh (q))| ¯ = O(h),
(23)
814
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
ˆ h (q)) see [27]. We approximate (Mh (q)) ¯ by (M ¯ with error ˆ h (q)) |(M ¯ − (Mh (q))|h, ¯
(24)
which we obtain using the bisection method with cost proportional to md log m times a constant ˆ q) ˆ h (q)). ¯ that depends on d. In (21), we set ( ¯ := (M We now show how to construct the approximate eigenfunction uˆ q¯ required for the second step of our algorithm. Let z = z(Mh (q)) ¯ be the eigenvector of Mh (q) ¯ that corresponds to (Mh (q)). ¯ We assume that z is normalized so that z2 :=
md
1/2 zk2
= 1.
k=1
ˆ h (q)), ¯ we compute an approximation of z using an inverse iteration with the matrix Given (M ˆ h (q)) Mh (q) ¯ − (M ¯ I. We can compute the determinant of this matrix with cost proportional to md . If the matrix is ˆ h (q)) singular, we can perturb (M ¯ by h to obtain a non-singular matrix. The initial vector in inverse iteration is z0 , the eigenvector of Mh (0) that corresponds to its smallest eigenvalue. Observe that the separation of eigenvalues of Mh (q) ¯ as expressed by (18) and arguments similar to those that ¯ 2∞ /(32 − 2)2 . led to (9) and which can be found in [29, p. 172], yield that (z0T z)2 1 − q Since the projection of the initial vector onto the eigenvector of interest is sufficiently large, with O(log m) inverse iteration steps we obtain an approximate eigenvector zˆ , with ˆz2 = 1, such that ˆz − z2 = O(h). The total cost to obtain zˆ is O(md log m). The Rayleigh quotient h =
¯ z zˆ T Mh (q)ˆ , 2 ˆz2
¯ with error O(h). Using zˆ we construct the approximate eigenfunction also approximates (Mh (q)) uˆ q¯ of Lq¯ by a method suggested by Courant [10] and used in [27]. In particular, we subdivide Id into simplices whose vertices are the grid points. Then we construct a piecewise linear function on each simplex that is zero on the boundary of Id and interpolates the values of zˆ at the grid points; see [27] for the details. We denote the interpolating function by u˜ q¯ . The cost for constructing u˜ q¯ is O(md ). Consider now the Rayleigh quotient d ˜ q¯ (x)]2 + q(x) ¯ u˜ 2q¯ (x) dx j =1 [Dj u Id (25) = u˜ q¯ 2L2 for the function u˜ q¯ . From [27] we know that (q) ¯ h + O(h).
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
815
Since |h − (Mh (q))| ¯ = O(h), the equation above and (23) imply that | − (q)| ¯ = O(h).
(26)
We set uˆ q¯ := u˜ q¯ /u˜ q¯ L2 with cost O(md ). Let us now estimate uq¯ − uˆ q¯ L2 . Consider the ∞ eigenvalues ( q) ¯ and eigenvectors u , k = 1, . . . , of L . Then we have u ˆ = k q,k ¯ q ¯ q ¯ ¯ , k=1 ak uq,k ∞ 2 where k=1 ak = 1. Thus from (25) we obtain =
∞
k (q)a ¯ k2 .
k=1
Equivalently, 0=
∞
ak2 [k (q) ¯ − ]
=
k=1
∞
ak2 [k (q) ¯ − ] − a12 [ − 1 (q)]. ¯
k=2
¯ we obtain Using (5) and (26) and the fact that (q) ¯ = 1 (q) a12 [ − (q)] ¯ (32 − 2)
∞
ak2 = (32 − 2)(1 − a12 ),
k=2
and using (26) again, we find that 1 − a12 = O(h). Hence, uq¯ − uˆ q¯ 2L2 = O(h).
(27)
The proof of the last equation is the same as the proof we used to derive (11) from Eq. (9). Recall that the algorithm (21) uses MC to approximate the first integral in (20). It is well known that MC with k function evaluations has error bounded from above by the L2 norm of the integrand times k −1/2 , i.e., the MC error does not exceed n−1 k −1/2 .
(28)
˜ Combining (22) with (23), (24), (27), (28) we obtain that the expected error of the algorithm (q), described in (21), is bounded from above by a quantity proportional to m−1 + n−1 k −1/2 + n−1 m−1/2 + n−2 .
(29)
The cost of this algorithm is equal to nd evaluations of q at deterministic points, plus k evaluations involving q (i.e., evaluations of (q − q) ¯ uˆ 2q¯ ), plus a number of arithmetic operations propord d tional to n + m log m + k times a constant that depends on d. Taking m−1 = ε and observing that we can take k = nd without changing the order of magnitude of the cost of the algorithm, expression (29) becomes ε + n−(d+2)/2 + n−1 ε 1/2 + n−2 .
(30)
The number of evaluations of q is proportional to nd and the number of arithmetic operations is proportional to nd + ε −d log ε−1 times a constant that depends on d.
816
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
The cost of approximating q by q¯ is proportional to nd . It is worth noting that this is the dominant part of the algorithm cost. Indeed, even though we can approximate the first integral of (21) with high accuracy using MC with O(nd ) function evaluations, the advantages of this approximation are lost when n−(d+2)/2 = O(n−2 ) since the eigenvalue error depends on O(n−2 ) as seen in (30). Therefore when d 2, we get error of order ε with ε −2d/(d+2) function evaluations, while for d > 2 we get error of order ε with ε−d/2 function evaluations. In both cases the number of arithmetic operations is proportional to ε −d log ε−1 times a constant that depends on d. We summarize the results of this section in the following theorem. Theorem 3.2. (ε−2d/(d+2) ) = nran (ε) = O(ε− max(2/3,d/2) ), (ε−2d/(d+2) ) = compran (ε) = O(c ε− max(2/3,d/2) + ε −d log ε−1 ), where the asymptotic constants depend on d. When d > 2 we do not have matching upper and lower bounds for nran (ε) and improving the upper bound is an open problem at this time. One possibility would be to use a perturbation formula of higher order of accuracy. On the other hand, we see that if we consider functions that have continuous and bounded mixed partial derivatives up to order r then our approach yields that (ε−2d/(2r+d) ) = nran (ε) = O(ε− max(2d/(2r+d),d/(2r)) ), which extends the range of values of d to 1 d 2r for which we do have matching upper and lower bounds. 4. Quantum algorithms A quantum algorithm applies a sequence of unitary transformations to an initial state, and the final state is measured. See [3,8,14,19] for the details of the quantum model of computation. We briefly summarize this model to the extent necessary for this paper. The initial state |0 is a unit vector of the -fold tensor product Hilbert space H = C2 ⊗ · · · ⊗ C2 , for some appropriately chosen integer , where C2 is the two-dimensional space of complex numbers. The dimension of H is 2 . The number denotes the number of qubits used in quantum computation. The final state | is also a unit vector of H and is obtained from the initial state |0 by applying a number of unitary 2 × 2 matrices, i.e., | := UT QY UT −1 QY · · · U1 QY U0 |0 .
(31)
Here, U0 , U1 , . . . , UT are unitary matrices that do not depend on the input function q. The unitary matrix QY with Y = [q(t1 ), . . . , q(tn )] is called a quantum query and depends on n (with n 2 ), function evaluations of q computed at some non-adaptive points ti ∈ Id . The quantum query QY is the only source of information about q. The integer T denotes the number of quantum queries we choose to use. At the end of the quantum algorithm, a measurement is applied to its final state |. The measurement produces one of M outcomes, where M 2 . Outcome j ∈ {0, 1, . . . , M − 1} occurs with probability pY (j ), which depends on j and the input Y. Knowing the outcome j, we compute an approximation ˆ Y (j ) of the smallest eigenvalue on a classical computer.
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
817
We now define the error in the quantum setting. In this setting, we want to approximate the smallest eigenvalue (q) with a probability p > 21 . For simplicity, we take p = 43 in the rest of this section. As is common for quantum algorithms, we can achieve an ε-approximation with probability arbitrarily close to 1 by repeating the original quantum algorithm, and by taking the median as the final approximation. The local error of the quantum algorithm with T queries that computes ˆ Y (j ) for the function q ∈ Q and the outcome j ∈ {0, 1, . . . , M − 1} is defined by e(ˆ Y , T ) = min : pY (j ) 43 . j : |(q)−ˆ Y (j )|
This can be equivalently rewritten as e(ˆ Y , T ) = min max (q) − ˆ Y (j ), A: (A) 3/4 j ∈A
where A ⊂ {0, 1, . . . , M − 1} and (A) =
j ∈A pY (j ).
The worst probabilistic error of a quantum algorithm ˆ with T queries for the Sturm–Liouville eigenvalue problem is defined by quant ˆ d ˆ e (, T ) = sup e(Y , T ): Y = [q(t1 ), . . . , q(tn )], ti ∈ [0, 1] , for q ∈ Q . (32) We define the query complexity nquery (ε) of a quantum algorithm by ˆ T ) ε }. nquery (ε) = min{ T : ∃ ˆ such that equant (,
(33)
Moreover, since we will be dealing with two types of queries, bit queries and power queries, we will be using the notation nbit-query (ε) and npower-query (ε), respectively, to label the query complexity by the type of queries used. In principle, quantum algorithms may have many measurements applied between sequences of unitary transformations of the form presented above. However, any algorithm with many measurements and a total of T quantum queries can be simulated by a quantum algorithm with only one measurement at the end, for details see e.g., [14]. Classical algorithms in floating or fixed point arithmetic can also be written in the form of (31). Indeed, all classical bit operations can be simulated by quantum computations, see e.g., [4]. Classically computed function values will correspond to bit queries, which we discuss in the next section. We formally use the real number model of computation [24]. Since our eigenvalue problem is well conditioned and properly normalized, we obtain practically the same results in floating or fixed point arithmetic. More precisely, it is enough to use O(log ε −1 ) mantissa bits, and the cost of bit operations in floating or fixed point arithmetic is of the same order as the cost in the real number model multiplied by a power of log ε−1 . Hybrid algorithms, which are combinations of classical and quantum algorithms, can be viewed as finite sequences of algorithms of the form (31) and can be expressed as one quantum algorithm of the form (31), see [14,15]. Consequently, when proving lower bounds it suffices to consider only algorithms of the form (31). For upper bounds it is sometimes convenient to distinguish between classical and quantum computations and charge their costs differently. The cost of classical computations is defined in the previous section. The cost of quantum computations is defined as
818
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
the sum of the number of quantum queries multiplied by the cost of a query, plus the number of quantum operations other than queries. It is also important to indicate how many qubits are used by the quantum algorithm. 4.1. Bit queries Quantum queries are important in the complexity analysis of quantum algorithms. A quantum query corresponds to a function evaluation in classical computation. By analogy with the complexity analysis of classical algorithms, we analyze the cost of quantum algorithms in terms of the number of quantum queries that are necessary to compute an ε-approximation with probability 43 . Clearly, this number is a lower bound on the quantum complexity, which is defined as the minimal total cost of a quantum algorithm that solves the problem. Different quantum queries have been studied in the literature. Probably the most commonly studied query is the bit query as used in Grover’s search algorithm [13]. For a Boolean function f : {0, 1, . . . , 2m − 1} → {0, 1}, the bit query is defined by Qf |j |k = |j |k ⊕ f (j ). Here = m + 1, |j ∈ Hm , and |k ∈ H1 with ⊕ denoting addition modulo 2. For real functions q the bit query is constructed by taking the most significant bits of the function evaluated at some points tj . More precisely, as in [14], the bit query for the function q has the form Qq |j |k = |j |k ⊕ (q( (j ))), where the number of qubits is now = m + m and |j ∈ Hm , |k ∈ Hm with some functions : [0, 1] → {0, 1, . . . , 2m − 1} and : {0, 1, . . . , 2m − 1} → Id , and ⊕ denotes addition modulo 2m . Hence, we compute q at tj = (j ) ∈ Id and then take the m most significant bits of q(tj ) by (q(tj )), for details and a possible use of ancilla qubits see again [14]. The quantum amplitude amplification algorithm of Brassard et al. [7] computes the mean of a Boolean function defined on the set of N elements with accuracy ε and probability 43 using of order min{N, ε−1 } bit queries. Modulo multiplicative factors, it is an optimal algorithm, in terms of the number of bit queries. This algorithm can be also used to approximate the mean of a real function f : Id → R with |f (x)|M, x ∈ Id , see [14,20]. More precisely, if we want to approximate N−1 1 SN (f ) := f (xj ) N j =0
for some xj ∈ Id and N, then the amplitude amplification algorithm QSN (f ) approximates SN (f ) such that |SN (f ) − QSN (f )| ε
with probability
3 4
(34)
using of order min(N, Mε−1 ) bit queries, min(N, Mε −1 ) log N quantum operations, and log N qubits. We begin by showing a lower bound for the query complexity, nbit-query (ε), of the eigenvalue problem. We do this by first estimating the bit query complexity, nbit-query (ε, INTFc ), of the weighted integration problem (16) in the class Fc , as defined in (15), and then reducing the eigenvalue problem to the integration problem.
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
819
From [20] we have nbit-query (ε, INTFc ) = O(ε−d/(d+1) ). Consider now any quantum algorithm that solves the integration problem with error ε and probability at least 43 , using k bit queries. " Let h(x1 , . . . , xd ) = dj =1 hj (xj ) for (x1 , . . . , xd ) ∈ Id , where hj (x) = x 2 (1 − x)2 , x ∈ [0, 1], and h(x1 , . . . , xd ) = 0 for (x1 , . . . , xd ) ∈ Rd \ Id . Here, is a constant such that h ∈ F1 , where F1 is defined by (15) with c = 1. For each j = 1, . . . , d and i = 0, . . . , n − 1, let hi,j (x) = hj (n(x − i/n)). Then the support of hi,j is [i/n, (i + 1)/n]. We obtain nd functions on Id . Each function is defined by hi1 ,...,id (x1 , . . . , xd ) =
d
hij ,j (xj ), n j =1
" and its support is the cube dj =1 [ij /n, (ij + 1)/n], ij = 0, . . . , n − 1. For notational convenience we re-index these functions, in any desirable way, and denote them by g , = 0, . . . , nd − 1 (i.e., g = hi1 ,...,id .) Thus g ∞ n−1 and assuming that c?n−1 we have g ∈ Fc . Then −d−1 g (x) dx = n h(x) dx, = 0, . . . , nd − 1. Id
Id
Consider now any Boolean function B: {0, 1, . . . , nd − 1} → {0, 1} and define the function fB (x) =
d −1 n
B()g (x),
x ∈ Id .
=0
Then fB ∈ Fc and
fB (x) dx = Id
Id
−1 h(x) dx 1 n B(). n nd d
=0
Thus, computing the Boolean mean is reduced to computing the integral of fB . From [18] we know that k < nd bit queries yield error (k −1 ) in the approximation of the Boolean mean. Therefore, by setting k = nd , ∈ (0, 1), we obtain that the error in approximating the integral of fB is (n−(d+1) ). Hence, for error ε we need k = (ε −d/(d+1) ) bit queries. Using the upper bound of [20] we obtain nbit-query (ε, INTFc ) = (ε−d/(d+1) ). This complexity bound remains valid if c depends on ε and c(ε) → 0, as ε → 0, but not very fast. Therefore, when c(ε)ε−1/(d+1) → ∞ as ε → 0, the bit query complexity for integration in the class Fc(ε) is (ε −d/(d+1) ). Now that we have the bit query complexity for integration, we reduce the eigenvalue problem to integration and obtain a lower bound for the bit query complexity of the eigenvalue problem. This is done in exactly the same way as for classical deterministic and randomized algorithms. In particular, using Eq. (14) we see that any algorithm approximating (q) ˜ can be used to derive an
820
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
algorithm that solves the integration problem S(f ) defined in (16), and f = q˜ + − 21 belongs to the class Fc (15). We omit the details since we have already presented this argument twice. Therefore, solving the eigenvalue problem with error ε and probability at least 43 implies that we can solve the integration problem with error O(ε) and probability at least 43 . Consequently the bit query complexity nbit-query (ε) of the eigenvalue problem is at least as large as the bit query complexity of the integration problem. We have proved the following theorem. Theorem 4.1. nbit-query (ε) = (ε−d/(d+1) ). To derive a quantum algorithm for the eigenvalue problem we can slightly modify the randomized algorithm we presented previously. The third and last step of the randomized algorithm approximates a weighted integral using MC. The quantum algorithm will approximate that integral using the amplitude amplification algorithm [7]. In particular the quantum algorithm approximates the first two terms on the right-hand side of Eq. (20) by ˜ ˆ q) (q) := ( ¯ + ((q − q) ¯ uˆ 2q¯ ),
(35)
ˆ q) ˆ h (q)), ¯ := (M ¯ h = (m + 1)−1 , where, just as before, q¯ approximates q with error O(n−1 ), ( 2 while ((q − q) ¯ uˆ q¯ ) is the result of the amplitude amplification algorithm with T bit queries as applied in [20] for the approximation of the integral of (q − q) ¯ uˆ 2q¯ in (20). Since (q − q) ¯ uˆ 2q¯ ∞ = O(n−1 ), with probability 43 the error of (35) is bounded from above by ˆ q)| |(q) ¯ − ( ¯ + O((nT )−1 ) + uq¯ − uˆ q¯ L2 O(n−1 ) + O(n−2 ), where the second term is the error of the quantum algorithm ; see also (34). We have seen that ˆ q)| |(q) ¯ − ( ¯ = O(m−1 ) and uq¯ − uˆ q¯ L2 = O(m−1/2 ). This yields an error proportional to m−1 + (nT )−1 + n−1 m−1/2 + n−2 . The algorithm uses nd evaluations of q at deterministic points, plus a number of classical operations proportional to nd + md log m times a constant that depends on d. The algorithm also uses T bit queries involving q, plus of order log2 T + d log m quantum operations, excluding the cost of queries, for the details see [7,19]. Note that log2 T operations are sufficient for the quantum implementation of the Fourier transform used in the amplitude amplification algorithm. The number of qubits is of order log T + d log m. Setting m−1 = ε 2 and T = O(nd ), we get that the error of our algorithm is bounded from above by a quantity proportional to ε2 + n−(d+1) + n−1 ε + n−2 . Note that when d 2 we do not necessarily have to take as many as O(nd ) queries, since reducing the integration error does not reduce the upper bound of the algorithm error which still depends on n−2 . However, taking T = O(nd ) does not change the order of magnitude of the cost of the algorithm. The dominant component of the cost of the algorithm is the nd classical function evaluations required for the approximation of q by q. ¯ Finally, setting n = ε −1/2 yields error O(ε).
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
821
3 4
and error O(ε) by the
Theorem 4.2. The eigenvalue problem can be solved with probability hybrid algorithm (35). This algorithm uses
ε−d/2 classical function evaluations, ε−2d log ε−1 (times a constant that depends on d) classical arithmetic operations, ε−d/2 bit queries, d 2 log2 ε −1 (times a constant independent of d) quantum operations, excluding queries, (mostly used for the quantum implementation of the Fourier transform), • and a number of qubits proportional to d log ε −1 . • • • •
We see that the number of bit queries used by this algorithm matches the bit query complexity only when d = 1. Perhaps, as in the case of the randomized algorithm, we can improve this situation and obtain matching upper and lower bounds for the number of bit queries using a perturbation formula with higher order terms. This is an open question at this time. Nevertheless, even if this question has a positive answer, the number of arithmetic operations will remain exponential in d, and this is also true for the deterministic and randomized algorithms we have seen, because all of them solve a matrix eigenvalue problem and the size of the matrix is exponential in d. We can solve the eigenvalue problem with cost (number of queries plus other operations) that is not exponential in d using a quantum algorithm without any classical components. The details of the algorithm will become apparent after we discuss, in the next section, a quantum algorithm that solves the eigenvalue problem using a different type of queries, called power queries. This algorithm is based on phase estimation [19], a quantum algorithm approximating an eigenvalue of a Hermitian matrix, which solves the problem with O(log ε −1 ) power queries. Each of the power queries can be approximated by bit queries using the Trotter formula [19] and phase kick-back [8]. The number of bit queries required for the approximation of each power query is a polynomial in ε−1 and its degree is independent of d. In particular, the degree of this polynomial only depends on the norm of the matrix whose eigenvalue is sought, which is independent of d, and on the accuracy demand ε. We have the following theorem whose proof we postpone to the next section. Theorem 4.3. Phase estimation applied for the approximation of the smallest eigenvalue of Mε (q) achieves error O(ε) with probability at least 43 using a number of bit queries proportional to ε−6 log2 ε −1 . The initial state for phase estimation is the eigenvector of Mε (0) = −ε that corresponds to its smallest eigenvalue. The algorithm uses a number of quantum operations, excluding bit queries, proportional to d ε−6 log4 ε −1 and a number of qubits proportional to d log ε−1 . Consequently, nbit-query (ε) = O(ε −6 log2 ε −1 ). 4.2. Power queries In this section, we consider power queries as they have been described in [21]. For some problems, a quantum algorithm can be written in the form #T UT −1 W #T −1 · · · U1 W #1 U0 |0 . | := UT W
(36)
Here U1 , . . . , UT denote unitary matrices independent of the function q just as before, whereas #j are of the form controlled-Wj , see [19, p. 178]. Then Wj = W pj for an the unitary matrices W n × n unitary matrix W that depends on the input of the computational problem, and for some
822
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
non-negative integers p1 , . . . , pT . Without loss of generality we assume that n is a power of two. Let {|yk } be orthonormalized eigenvectors of W, so that W |yk = k |yk with the corresponding eigenvalue k , where | k | = 1 and k = eik with k ∈ [0, 2) for k = 1, 2, . . . , n. For the unit #j is defined as vectors |x = |0 + |1 ∈ C2 , = 1, 2, . . . , r, the quantum query W
#j |x1 |x2 · · · |xr |yk = |x1 | · · · |xj −1 j |0 + j eipj k |1 |xj +1 · · · |xr |yk . (37) W #j is a 2 × 2 unitary matrix with = r + log n. We stress that the exponent pj only Hence, W affects the power of the complex number eik . #j is called a power query since it is derived from powers of W. Power queries have been W successfully used for a number of problems including the phase estimation problem, see [8,19]. The phase estimation algorithm approximates an eigenvalue of a unitary operator W using a good approximation [1] of the corresponding eigenvector as part of the initial state. The powers of W are defined by pi = 2i−1 . Therefore, phase estimation uses queries with W1 = W , W2 = W 2 , 2 m−1 W3 = W 2 , . . . , Wm = W 2 . It is typically assumed, see [8], that we do not explicitly know 2 W but we are given quantum devices that perform controlled-W, controlled-W 2 , controlled-W 2 , and so on. For our eigenvalue problem, we discretize the operator Lq on a grid with mesh size h, as we did when we were discussing deterministic algorithms. We obtain an md × md matrix Mh (q), with h = (m + 1)−1 , that is symmetric positive definite. Then we define the matrix √ W = exp (i Mh (q)) with i = −1 and a positive , (38) which is unitary since Mh (q) is symmetric. #j used in (36). Accordingly, we modify the Using the powers of W we obtain the matrices W #j is one query definition in Eq. (31) by assuming, as in [19, Chapter 5], that for each j the W quantum query. Hence for algorithms that can be expressed in the form (36), the number of power queries is T, independently of the powers pj . With the understanding that the number of queries T is defined differently in this section than ˆ T ) of the algorithm (36) is given by (32). Similarly, the power query before the error equant (, power query (ε) is defined by (33). complexity n We now exhibit a quantum algorithm with power queries that approximates (q) with error O(ε). Consider W defined by (38) with = 1/(2d), i.e.,
1 W = exp (39) i Mh (q) . 2d The eigenvalues of W are eij (Mh (q))/(2d) , with j (Mh (q)) being the eigenvalues of the md × md matrix Mh (q). Without loss of generality we assume that m is a power of two. These eigenvalues can be written as e2ij , where j = j (Mh (q)) =
1 j (Mh (q)) 4d
are called phases. We are interested in estimating the smallest phase 1 (Mq ), which belongs to (0, 1) since 1 (Mh (q)) ∈ [d2 , d2 + 1]. We denote the eigenvector of Mh (q) and W that corresponds to j (Mh (q)) by zj (Mh (q)), with zj (Mh (q))2 = 1, j = 1, . . . , md , indexed in non-decreasing order of eigenvalues.
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
823
Phase estimation, see [19, Section 5.2], is a quantum algorithm that approximates the phase 1 (Mq ). Clearly, to compute an ε-approximation of 1 (Mh (q)), it is enough to compute an ε/(4d)-approximation of 1 (Mh (q)). The initial state of phase estimation algorithm is |0⊗b |z1 , where b is related to the accuracy of the algorithm and will be determined later, while |z1 = |z1 (Mh (q)). It is helpful to think of two registers holding the initial state. The top register is b qubits long and holds |0⊗b , while the bottom register holds the eigenvector |z1 . Abrams and Lloyd [1] showed that phase estimation can still be used even if the eigenvector |z1 is replaced by a good approximation |. More precisely, expanding | in the basis of the eigenvectors |zj , the initial state takes the form |0
⊗b
⊗b
| = |0
d −1 m
dk |zk .
k=0
The success probability of the algorithm depends on |d1 |2 , the square of the projection of | onto |z1 . Omitting the details, which are not important in the analysis here and can be found in [1,21], a measurement of the top register of the final state of phase estimation, with probability at least 8 |d1 |2 , 2 will produce an index j ∈ [0, 2b − 1] such that j − 1 (Mh (q)) 1 . 2b 2b The cost of phase estimation is equal to b power queries, plus a number of operations proportional to b2 + d log m, plus the cost for preparing the initial state |. The number of qubits used is b + d log m. We remark that the O(b2 ) operations are for the quantum implementation of the (inverse) Fourier transform used in phase estimation [19]. Taking into account that the matrix eigenvalue approximates (q) with error O(h), where h = (m + 1)−1 , we obtain that (q) − 4dj 4d + O(h). 2b 2b Therefore, it suffices to set h = ε and b = log ε−1 to obtain error O(ε) in the approximation of (q). Under these conditions, the cost of the algorithm is equal to log ε −1 power queries, plus a number of operations proportional to log2 ε −1 , plus the cost of preparing the initial state. Recall that we want to implement a good approximation | of |z1 leading to success probability at least 43 . Consider the eigenvector corresponding to the smallest eigenvalue when q = 0. Denote this (1) eigenvector by |z1 (Mε (0)), Mε (0) = −ε . Then |z1 (Mε (0)) = |z1 ⊗d , i.e., |z1 (Mε (0)) is the tensor product of the eigenvectors of the ε−1 × ε−1 matrix of the corresponding one-dimensional (1) problem (i.e., when d = 1) [11]. Each |z1 can be implemented using the Fourier transform with a number of operations proportional to log2 ε −1 , see [19, p. 209, 17,28] for more details. Therefore, we can implement |z1 (Mε (0)) with cost proportional to d log2 ε −1 .
824
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
Now consider any q ∈ Q. Since we know that the eigenvalues of Mε (q) are well separated (18), using |z1 (Mε (0)) as an approximate eigenvector we find that the square of its projection onto z1 (Mε (q)) satisfies [29, p. 173] |z1 (Mε (q))|z1 (Mε (0))|2 1 −
1 . (32 − 2)2
Define the initial state of phase estimation using | := |z1 (Mε (0)) to obtain that |d1 |2 1 − (32 − 2)−2 which leads to a success probability
8 1 3 8 2 . |d1 | 2 1 − 4 2 (32 − 2)2 We have proved the following theorem. Theorem 4.4. The eigenvalue problem can be solved with error O(ε), and probability at least 3 4 , by discretizing Lq and then approximating the smallest eigenvalue of the resulting matrix Mε (q) by phase estimation that uses power queries. The initial state of phase estimation uses the eigenvector of Mε (0) = −ε that corresponds to its smallest eigenvalue. The cost of the algorithm is proportional to • log(ε−1 ) power queries, • log2 ε −1 + d log ε−1 quantum operations, • d log ε−1 qubits. Let us now turn to the query complexity npower-query (ε). The previous theorem implies that = O(log ε−1 ). Consider a function q ∈ Q such that q(x1 , . . . , xd ) = dj =1 g(xj ), where g ∈ C 1 ([0, 1]) is non-negative and g∞ 1 and g ∞ 1. Then [23, p. 113] the eigenvalue problem (1), (2) has a separable solution which is obtained by solving the Sturm–Liouville eigenvalue problem
npower-query (ε)
−y (x) + g(x)y(x) = y(x), y(0) = y(1) = 0.
x ∈ (0, 1),
Denoting the smallest eigenvalue of this problem by (g) we have (q) = d (g). Any algorithm that approximates (q) with error O(ε) also approximates (g) with error O(ε). Using the power query lower bound for the Sturm–Liouville eigenvalue problem [5,6], we conclude any quantum algorithm with power queries that approximates (q) with error O(ε) must use (log ε−1 ) queries. Combining the lower bound with the previous theorem leads to tight power query complexity bounds. Theorem 4.5. npower-query (ε) = (log ε−1 ). We are now ready to prove the upper bound for the bit-query complexity of Theorem 4.3.
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
825
Proof of Theorem 4.3. We use phase estimation as in the proof of Theorem 4.4 but instead of power queries we will use bit queries to approximate them. Recall Eq. (39), with h = ε. The matrix Mε (q) has size md × md with (m + 1)−1 = ε. Its largest eigenvalue does not exceed 4dε −2 + 1 [11, p. 268]. Therefore, we have (2d)−1 Mε (q)2 (4dε −2 + 1)/(2d). For = 4dε −2 + 1 we have (2d)−1 Mε (q)2 1. Recall that (2d)−1 Mε (q) = −(2d)−1 ε + (2d)−1 Bε (q). For notational convenience define A1 = −(2d)−1 ε and A2 = (2d)−1 Bε (q). Then A1 2 1 and A2 2 1. Using the Trotter formula [19, p. 208] we have i(A1 +A2 )/k − eiA1 /k eiA2 /k ck −2 , e 2
where c is a constant (see also [16,22] and the references therein). From (39) we have W L = ei(A1 +A2 )L
for any L ∈ N
and therefore L iA /k iA /k k L c L . W − e 1 e 2 k 2
(40)
In phase estimation we require the maximum power of W to be of order ε−1 . Setting L = O(ε −1 ) in the equation above, we have that L is of order ε−3 . Thus for k proportional to ε −3 log2 ε −1 , the error in the approximation of the matrix exponential (40) is O(log−2 ε −1 ). From [8] we know that using bit queries and phase kick-back we can obtain eiA2 /k . Hence, to approximate the O(log ε−1 ) power queries of phase estimation the algorithm we need a total number of bit queries proportional to ε−6 log2 ε −1 . Since the eigenvalues and eigenvectors of −ε are known, each of the eiA1 /k can be implemented using the quantum Fourier transform with a number of quantum operations proportional to d log2 ε −1 . Thus the total number of quantum operations, excluding bit queries, required to approximate all the power queries is proportional to d ε −6 log4 ε −1 . Using (40) to approximate the power queries only changes the success probability of phase estimation [19, p. 195]. Since phase estimation uses of order log ε −1 power queries and each is approximated with error O(log−2 ε −1 ) the success probability may be reduced by a quantity proportional to log−1 ε −1 . Therefore for ε sufficiently small, the probability remains greater than or equal to 43 . We conclude by addressing the qubit complexity of our problem. By qubit complexity we mean the minimum number of qubits required for a quantum algorithm to achieve error ε. We denote the qubit complexity by nqubit (ε). The qubit complexity is related to the classical information complexity nwor (ε) by nqubit (ε) = (log nwor (ε)). This is shown in [30] and it holds regardless of the type of queries used. Since, nwor (ε) = (ε−d ) we get nqubit (ε) = (log ε−1 ). On the other hand, phase estimation solves the problem with error O(ε) using a number of qubits proportional to d log ε−1 . We have proved the following theorem.
826
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
Theorem 4.6. nqubit (ε) = (log ε−1 ). Acknowledgments I am very grateful to A. Bessen for the extensive discussions we had and his insightful remarks that significantly improved this paper. I thank J.H. Lai, J.F. Traub and A.G. Werschulz for their comments and suggestions. References [1] D.S. Abrams, S. Lloyd, Quantum algorithm providing exponential speed increase for finding eigenvalues and eigenvectors, Phys. Rev. Lett. 83 (1999) 5162–5165. [2] I. Babuska, J. Osborn, Eigenvalue problems, in: P.G. Ciarlet, J.L. Lions (Eds.), Handbook of Numerical Analysis, vol. II, North-Holland, Amsterdam, 1991, pp. 641–787. [3] R. Beals, H. Buhrman, R. Cleve, R. Mosca, R. de Wolf, Quantum lower bounds by polynomials, in: Proceedings FOCS’98, 1998, pp. 352–361, also http://arXiv.org/quant-ph/9802049. [4] E. Bernstein, U. Vazirani, Quantum complexity theory, SIAM J. Comput. 26 (5) (1997) 1411–1473. [5] A.J. Bessen, A lower bound for phase estimation, Phys. Rev. A 71 (4) (2005) 042313, also http://arXiv.org/quant-ph/0412008. [6] A.J. Bessen, A lower bound for the Sturm–Liouville eigenvalue problem on a quantum computer, J. Complexity 22 (5) (2006) 660–675, also http://arXiv.org/quant-ph/04512109. [7] G. Brassard, P. Hoyer, M. Mosca, A. Tapp, Quantum amplitude amplification and estimation, in: Contemporary Mathematics, vol. 305, American Mathematical Society, Providence, NJ, 2002, pp. 53–74, also http://arXiv.org/quant-ph/0005055. [8] R. Cleve, A. Ekert, C. Macchiavello, M. Mosca, Quantum algorithms revisited, Proc. R. Soc. London A 454 (1998) 339–354. [9] C. Courant, D. Hilbert, Methods of Mathematical Physics, vol. I, Wiley Classics Library, Wiley-Interscience, New York, 1989. [10] R. Courant, Variational methods for the solution of problems of equilibrium and variations, Bull. Amer. Math. Soc. 49 (1943) 1–23. [11] J.W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997. [12] G.E. Forsythe, W.R. Wasow, Finite-Difference Methods for Partial Differential Equations, Dover, New York, 2004. [13] L. Grover, Quantum mechanics helps in searching for a needle in a haystack, Phys. Rev. Lett. 79 (2) (1997) 325–328, also http://arXiv.org/quant-ph/9706033. [14] S. Heinrich, Quantum summation with an application to integration, J. Complexity 18 (1) (2002) 1–50, also http://arXiv.org/quant-ph/0105116. [15] S. Heinrich, Quantum integration in Sobolev spaces, J. Complexity 19 (2003) 19–42. [16] T. Jahnke, C. Lubich, Error bounds for exponential operator splitting, BIT 40 (4) (2000) 735–744. [17] A. Klappenecker, M. Rötteler, Discrete Cosine Transforms on Quantum Computers, 2001, http://arXiv.org/ quant-ph/0111038. [18] A. Nayak, F. Wu. The quantum query complexity of approximating the median and related statistics, in: Proceedings of the 31st Annual ACM Symposium on the Theory of Computing (STOC), 1999, pp. 384–393. LANL preprint quant-ph/9804066. [19] M.A. Nielsen, I.L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, UK, 2000. [20] E. Novak, Quantum complexity of integration, J. Complexity 17 (2001) 2–16, also http://arXiv.org/ quant-ph/0008124. [21] A. Papageorgiou, H. Wo´zniakowski, Classical and quantum complexity of the Sturm–Liouville eigenvalue problem, Quantum Inform. Process. 4 (2005) 87–127, also http://arXiv.org/quant-ph/0502054. [22] M. Suzuki, General theory of higher-order decomposition of exponential operators and symplectic integrators, Phys. Lett. A 165 (1992) 387–395. [23] E.C. 
Titschmarsh, Eigenfunction Expansions Associated with Second-Order Differential Equations, Part B, Oxford University Press, Oxford, UK, 1958.
A. Papageorgiou / Journal of Complexity 23 (2007) 802 – 827
827
[24] J.F. Traub, A continuous model of computation, Phys. Today (May 1999) 39–43. [25] J.F. Traub, G.W. Wasilkowski, H. Wo´zniakowski, Information-Based Complexity, Academic Press, New York, 1988. [26] H.F. Weinberger, Upper and lower bounds for eigenvalues by finite difference methods, Comm. Pure Appl. Math. IX (1956) 613–623. [27] H.F. Weinberger, Lower bounds for higher eigenvalues by finite difference methods, Pacific J. Math. 8 (2) (1958) 339–368. [28] M.V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A.K. Peters, Wellesley, 1994. [29] J.H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, Oxford, UK, 1965. [30] H. Wo´zniakowski, The quantum setting with randomized queries for continuous problems, Quantum Inform. Process. 5 (2) (2006) 83–130, also http://arXiv.org/quant-ph/060196.
Journal of Complexity 23 (2007) 828 – 850 www.elsevier.com/locate/jco
Cubature formulas for function spaces with moderate smoothness Michael Gnewuch∗ , René Lindloh,1 , Reinhold Schneider, Anand Srivastav Institut für Informatik, Christian-Albrechts-Universität zu Kiel, Christian-Albrechts-Platz 4, 24098 Kiel, Germany Received 30 November 2006; accepted 25 July 2007 Available online 14 September 2007 To Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract We construct simple algorithms for high-dimensional numerical integration of function classes with moderate smoothness. These classes consist of square-integrable functions over the d-dimensional unit cube whose coefficients with respect to certain multiwavelet expansions decay rapidly. Such a class contains discontinuous functions on the one hand and, for the right choice of parameters, the quite natural d-fold tensor product of a Sobolev space H s [0, 1] on the other hand. The algorithms are based on one-dimensional quadrature rules appropriate for the integration of the particular wavelets under consideration and on Smolyak’s construction. We provide upper bounds for the worst-case error of our cubature rule in terms of the number of function calls. We additionally prove lower bounds showing that our method is optimal in dimension d =1 and almost optimal (up to logarithmic factors) in higher dimensions. We perform numerical tests which allow the comparison with other cubature methods. © 2007 Elsevier Inc. All rights reserved. Keywords: Numerical integration; Smolyak’s algorithm; Sparse grids; Multi wavelets
1. Introduction The computation of high-dimensional integrals is a difficult task arising, e.g., from applications in physics, quantum chemistry, and finance. The traditional methods used in lower dimensions, ∗ Corresponding author. Fax: +49 431 880 1725.
E-mail addresses: [email protected] (M. Gnewuch), [email protected] (R. Lindloh), [email protected] (R. Schneider), [email protected] (A. Srivastav). 1 Partially supported by the DFG-Graduiertenkolleg 357 “Effiziente Algorithmen und Mehrskalenmethoden”.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.07.002
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
829
such as product rules of one-dimensional quadratures, are usually too costly in high dimensions, since the number of function calls increases exponentially with the dimension. In this paper we present a cubature method which can be used to handle the following multivariate integration problem also in higher dimensions: Problem definition. We want to approximate the integral f (x) dx I (f ) = [0,1]d
for functions f : [0, 1)d → R belonging to function classes H of theoretical or practical interest. It is important from the view point of applicability of high-dimensional cubature that the function class is general and rich and contains important classes arising in numerical mathematics. A general cubature formula with N sample points {x1 , x2 , . . . , xN } ⊂ [0, 1]d is given by QN (f ) =
N
f (x ),
=1
where {1 , . . . , N } is some suitable set of weights. To measure the quality of a given cubature QN we use the worst case error over H defined by err(H, QN ) :=
sup
f ∈H,f =1
err(f, QN ),
where, err(f, QN ) := |I (f ) − QN (f )|. As I and QN are linear, err(H, QN ) is nothing but the operator norm I − QN op induced by the norm of H. Results. The function classes we consider in this paper are certain Hilbert spaces Hs = f ∈ L2 | f s < ∞ which are spanned by multiwavelets and are characterized by ||2s discrete norms f 2s = f, 2 . The functions in Hs are continuously embedded 2 2 d in L [0, 1] and under proper requirements Hs contains classical function spaces like Sobolev spaces. Our aim is to provide a cubature method that guarantees a (nearly) optimal worst case error and which is easy to implement. For arbitrary parameters s > 21 we show that its worst case error over Hs is of the form (d−1)(s+1/2) , where N denotes the number of sample points used. We also prove a lower O log(N) N s (d−1)/2 for all cubatures on Hs using N sample points. This shows that the bound log(N)N s presented integration method converges on Hs asymptotically almost optimal. Our cubatures are based on one-dimensional quadratures chosen with respect to the particular space Hs under consideration, and Smolyak’s construction. More precisely, we use composite quadrature rules of a fixed order n. These rules are exact for piecewise polynomials of order n. The presented Smolyak construction is related to tensor product multiwavelet expansions in the way that the cubature is exact on finite multiwavelet series up to a critical level.
830
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
Related work. To some extent our work is motivated by [13], where the considered function classes depend on Haar wavelet series and a randomized cubature given by a quasi-Monte Carlo rule using so-called scrambled nets (see, e.g., [15]) is studied. These classes of Haar wavelets are included in the classes of multiwavelets that we consider. Notice that cubature rules using scrambled nets are not exact for (piecewise) polynomials of higher degree, in contrast to our method. It is known that Smolyak’s construction leads in general to almost optimal approximations in any dimension d > 1 as long as the underlying one-dimensional quadrature rule is optimal. The application of Smolyak’s construction to numerical integration has been studied in a number of papers so far, see, e.g., [3,4,11,12,14,16,20,24] and the literature mentioned therein. The error bounds provided in these papers were usually proved on Korobov spaces or spaces of functions with bounded mixed derivatives, i.e., on spaces of functions with a certain degree of smoothness. For our method we provide good error bounds with respect to the Hilbert spaces Hs of not necessarily smooth functions. Note that the power of the logarithm of N in our upper bound is (d − 1)/2 less than the power in the corresponding upper bounds appearing in the papers mentioned above. This paper is organized as follows: In Section 2 we define multiwavelets and introduce the spaces on which our cubatures of prescribed level should be exact. In Section 3 we present one-dimensional quadratures suited to evaluate the integrals of the univariate wavelets introduced in Section 2. We define a scale of Hilbert spaces of square integrable functions over [0, 1) via wavelet coefficients and prove an optimal error bound for our quadrature with respect to these spaces. In Section 4 we use Smolyak’s construction to obtain from our one-dimensional quadratures cubature rules for multivariate integrands. After giving a precise definition of the class of Hilbert spaces Hs of multivariate functions we want to consider error bounds for our cubatures; first in terms of the level of our cubatures, then in terms of the number of function calls. We provide also lower bounds for the worst case error of any cubature QN using N sample points. These lower bounds show that our cubature method is asymptotically almost optimal (up to logarithmic factors). In Section 5 we report on several numerical tests which allow us to compare our method with known methods. In Section 6 we provide a conclusion and make some remarks concerning future work. 2. Discontinuous multiwavelet bases 2.1. The one-dimensional case We start by giving a short construction of a class of bases in L2 [0, 1] that are called discontinuous multiwavelet bases. This topic has already been studied in the mathematical literature, see, e.g., [2,18,23]. By n we denote the set of polynomials of order n, i.e., of degree strictly smaller than n, on [0, 1). Let h0 , h1 , . . . , hn−1 denote the set of the first n Legendre polynomials on the interval [0, 1); an explicit expression of these polynomials is given by
j j j +k hj (x) = (−1) (−x)k k k j
k=0
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
831
for all x ∈ [0, 1), see, e.g., [1]. These polynomials build an orthogonal basis of n and are orthogonal on lower order polynomials, 1 hj (x)x i dx = 0, i = 0, 1, . . . , j − 1. 0
For convenience we extend the polynomials hj by zero to the whole real line. With the help of these (piecewise) polynomials we define for i = 0, 1, . . . , n − 1 a set of scaling functions i (x) := hi (x)/ hi 2 , where · 2 is the usual norm on L2 [0, 1]. For arbitrary j ∈ N0 we use the shorthand ∇j := 0, 1, 2, . . . , 2j − 1 . We consider dilated and translated versions j
i,k := 2j/2 i (2j · −k),
i = 0, 1, . . . , n − 1, j ∈ N0 , k ∈ ∇j ,
of the scaling functions i . Observe that these functions have compact support supp i,k = [2−j k, 2−j (k + 1)] =: Ik j
j
and j
j
i,k , i ,k = i,i k,k . Furthermore, we define spaces of piecewise polynomial functions of order n, j j Vn := span i,k |i = 0, 1, . . . , n − 1, k ∈ ∇j . j
It is obvious that the spaces Vn have dimension 2j n and that they are nested in the following way: n = Vn0 ⊂ Vn1 ⊂ · · · ⊂ L2 [0, 1]. j
For j = 0, 1, 2, . . . we define the 2j n-dimensional space Wn to be the orthogonal complement j j +1 of Vn in Vn , i.e., j j +1 j Wn := ∈ Vn |, = 0 for all ∈ Vn . This leads to the orthogonal decomposition j −1
j
Vn = Vn0 ⊕ Wn0 ⊕ Wn1 ⊕ · · · ⊕ Wn j
of Vn . 0 Let (i )n−1 i=0 be an orthonormal basis of Wn . (An explicit construction of such a basis in more general situations is, e.g., given in [18, Subsection 5.4.1].) Then it is straightforward to verify that the 2j n functions j
i,k := 2j/2 i (2j · −k),
i = 0, . . . , n − 1, k ∈ ∇j , j
form an orthonormal basis of Wn . The functions (i )n−1 i=0 are called multiwavelets and are obviously also piecewise polynomials of degree strictly less than n. Multiwavelets are supported on canonical intervals j
j
supp i,k = Ik
832
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
and satisfy the orthogonality condition j
i,k , m l,n = i,l j,m k,n . j
Since the spaces Wn are orthogonal to Vn0 = n , we have vanishing moments 1 j i,k (x)x dx = 0, = 0, 1, . . . , n − 1. 0
Next we define the space V :=
∞
j
Vn = Vn0 ⊕
∞
j =0
j
(2.1)
Wn .
j =0
Notice that V contains all elements of the well-known Haar basis; therefore V is dense in L2 [0, 1]. We follow the convention from [18] and define −1 i := i (please do not confuse this notation with the notation of inverse functions), ∇−1 := {0} and I0−1 := [0, 1]. A so-called multiwavelet basis of order n for L2 [0, 1] is given by j i,k |i = 0, 1, . . . , n − 1, j − 1, k ∈ ∇j , and for every f ∈ L2 [0, 1] we get the following unique multiwavelet expansion f =
n−1
j
j
f, i,k i,k .
j −1 k∈∇j i=0
2.2. The multivariate case In this subsection we extend the concept of multiwavelet bases to higher dimensions. Here we follow an approach that is suitable for our later analysis. For a given multi-index j ∈ Zd we put |j| := j1 + j2 + · · · + jd , and for i ∈ Nd0 let |i|∞ := max {i1 , . . . , id }. A multivariate multiwavelet basis of L2 [0, 1]d is given by so-called tensor product wavelets. For n ∈ N, we define the approximation space on level L by Vnd,L
:=
d
j
Vn i .
(2.2)
|j|=L i=1
Similarly to the one-dimensional case we put V d :=
∞
V d,L .
L=0
Since V = V 1 is dense in L2 [0, 1], the space V d is dense in L2 [0, 1]d . Thus we obtain the following expansion for f ∈ L2 [0, 1]d f =
n−1 j −1 k∈∇j |i|∞ =0
j
j
f, i,k i,k ,
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
833
where j = (j1 , . . . , jd ) −1 is meant in the way that ju −1 for all u = 1, . . . , d. (In the following all inequalities between vectors and between a vector and a scalar are meant componentwise.) Furthermore, we used the shorthands ∇j = ∇j1 × · · · × ∇jd and j
i,k :=
d
j
iuu,ku .
u=1 j
If the d-dimensional canonical interval Ik is defined by j
j
j
j
Ik := Ik11 × Ik22 × · · · × Ikdd , j
j
then supp i,k = Ik holds. 3. One-dimensional integration 3.1. One-dimensional quadrature formulas Recall that a general one-dimensional quadrature is given by Qm (f ) =
m
f (x ),
(3.1)
=1
where x1 , . . . , xm ⊂ [0, 1] are the sample points, and 1 , . . . , m ∈ R are the weights. Since we are here interested in quadrature formulas with high polynomial exactness—like the Newton– Cotes, Clenshaw–Curtis or Gauss formulas—we confine ourselves to the case m =1 = 1. For a detailed discussion of one-dimensional quadrature formulas see, e.g., [7]. Our aim is to give a simple construction of quadrature formulas QN which satisfy for a given polynomial order n and a so-called critical level l err(h, QN ) = 0
for all h ∈ Vnl .
We get the requested quadrature by scaling and translating a simpler one-dimensional quadrature formula Qm that is exact for all polynomials of order n on [0, 1]. If Qm has the explicit form (3.1), then our resulting quadrature uses 2l m sample points and is given by Am (l, 1)(f ) :=
m
2−l f (2−l x + 2−l k).
(3.2)
k∈∇l =1 j
Am (l, 1) is exact for polynomials on canonical intervals Ik , j l, k ∈ ∇j , of degree strictly less than n and therefore also on the whole space Vnl . Let us call a sequence of quadratures or cubatures (QN )N nested if the corresponding sets of sample points (XN )N are nested, i.e., if XN ⊆ XN+1 for all N. Whether our quadratures (Am (l, 1))l are nested or not depends of course on the set of sample points X of the underlying quadrature Qm . If we, e.g., consider the case n = 1, then we may choose Qm to be the mid point rule Qm (f ) = f ( 21 ), which results in the non-nestedness of our quadratures (Am (l, 1))l . If we choose on the other hand the rule Qm (f ) = f (0), then our quadratures are indeed nested. (Notice that in the latter case Am (l, 1) is nothing but the iterated trapezoidal rule for periodic functions.)
834
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
3.2. Error analysis For the error analysis of our one-dimensional quadrature method let n ∈ N, and let j i,k |i = 0, 1, 2, . . . , n − 1, j − 1, k ∈ ∇j , be the multiwavelet basis of order n defined in Section 2.1. For s > 0 we define a discrete norm |f |2s,n :=
n−1
j
2j 2s f, i,k 2
(3.3)
Hs,n := f ∈ L2 [0, 1]| |f |s,n < ∞ ,
(3.4)
j −1 k∈∇j i=0
on the space
consisting of functions whose wavelet coefficients decrease rapidly. Point evaluations are obvij ously well defined on the linear span of the functions i,k , i = 0, 1, . . . , n − 1, j − 1, k ∈ ∇j . Moreover, it is easy to see that they can be extended to bounded linear functionals on Hs,n as long as s > 21 . On these spaces quadrature formulas are therefore well defined. Now we choose an m = m(n) and an underlying quadrature rule Qm as in (3.1) such that Qm is exact on n . Let Am (l, 1) be as in (3.2). Then the wavelet expansion of a function f ∈ Hs,n and the Cauchy–Schwarz inequality yield the following error bound for our algorithm Am (l, 1): Theorem 3.1. Let s > 21 and n ∈ N. Let Qm and Am (l, 1) be as above. Then there exists a constant C > 0 such that err(Hs,n , Am (l, 1))C2−ls .
(3.5)
Proof. Let f ∈ Hs,n . The quadrature error is given by err(f, Am (l, 1)) = |I (f ) − Am (l, 1)f | n−1 j j j = f, i,k I (i,k ) − Am (l, 1)i,k . j −1 k∈∇j i=0 The Cauchy–Schwarz inequality yields ⎛ err(f, Am (l, 1)) |f |s,n ⎝
n−1
2−j 2s I (i,k ) − Am (l, 1)i,k j
j
2
⎞1/2 ⎠
.
j −1 k∈∇j i=0
Recall that the Cauchy–Schwarz inequality leads to a tight worst case error bound. Because of the polynomial exactness and vanishing moments we get therefore err(Hs,n , Am (l, 1))2 =
n−1 j l k∈∇j i=0
2 j 2−j 2s Am (l, 1)i,k .
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850 j
835
j
j
By some easy calculations and with the identities supp i,k = Ik and i,k ∞ = 2j/2 i ∞ we get err(Hs,n , Am (l, 1))2
⎧ m ⎨
⎫2 ⎬ j 2−j 2s 2−l | | i,k 1I j (2−l x + 2−l k ) ⎩ ⎭ ∞ k j l k∈∇j i=0 k ∈∇l =1 ⎧ ⎫2 m n−1 ⎨ ⎬ −l −l i 2 | | = 2−2l 2j (1−2s) (2 x + 2 k ) . 1 j I ∞ ⎩ ⎭ k n−1
j l
k ∈∇l =1
k∈∇j
i=0
For j l and k ∈ ∇j let = (j, k, l) be the unique element ∈ ∇l such that 2−l 2−j k < 2−j (k + 1) 2−l ( + 1). Then err(Hs,n , Am (l, 1))2
2
2
∞
j l
2
i=0
k∈∇j
i=0
∈∇l
n−1
2 i |∇l | ∞
2
=
2
−2l j (1−2s)
2
m
=1
2 −l
−l
| | 1I j (2 x + 2 ) k
m 2 n−1 2 −l −l i | | 1Il (2 x + 2 ) ∞
−2l j (1−2s)
j l
n−1 2 i
−2l j (1−2s)
j l
=1
m
2 | |
.
=1
i=0
Note that |∇l | = 2l . We can upper bound the integration error by m 2 n−1 2 2 | | 2−l 2j (1−2s) i err(Hs,n , Am (l, 1)) ∞
i=0
=
n−1
2 i
∞
i=0
=
n−1 2 i
∞
i=0
=1 m
2−l2s
| |
=1 m
j l
2 2 | |
=1
2j (1−2s)
j 0
2−l2s . 1 − 2(1−2s)
Thus we proved that (3.5) holds with the constant C=√
1 1 − 21−2s
n−1 i=0
1/2 i 2∞
m
| |.
=1
Remark 3.2. The error estimate in Theorem 3.1 is asymptotically optimal as Theorem 4.9 will reveal.
836
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
4. Multivariate numerical integration 4.1. The d-dimensional cubature method Now we extend our one-dimensional algorithm Am (l, 1) to a d-dimensional cubature. This should be done via Smolyak’s construction: The so-called difference quadrature of level l 0 is defined by l := Am (l, 1) − Am (l − 1, 1), with Am (−1, 1) := 0. Smolyak’s construction of level L is then given by Am (L, d) := (l1 ⊗ l2 ⊗ · · · ⊗ ld ). l∈Nd0 ,|l| L
Examples of sets of sample points used by Smolyak’s algorithm are provided in Fig. 1. Notice that we have 0 = Qm . Let us recall that in the one-dimensional case Am (l, 1) is exact on Vnl . In the d-dimensional case, it is not too difficult to show the exactness of Am (L, d) on Vnd,L . Theorem 4.1. The cubature Am (L, d) is exact on the approximation space Vnd,L . The proof follows the lines of the proof of [14, Theorem 2] and proceeds via induction over the dimension. 4.2. Upper bounds for the cubature error For the error analysis we consider product spaces which are based on the spaces Hs,n used for our one-dimensional quadrature error bounds. These seem to be the natural spaces for our 1
1
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0
0 0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
1
0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
1
Fig. 1. A3 (5, 2) and A2 (3, 2) with underlying Gauss quadrature. In the right diagram “+” denotes sample points with positive, “o” sample points with negative weights.
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
837
variation of Smolyak’s construction. For a function f we define a norm |f |2d,s,n :=
n−1 j −1 k∈∇j |i|∞ =0
2|j|2s f, i,k 2 j
(4.1)
and the space d := {f ∈ L2 [0, 1]d | |f |d,s,n < ∞}. Hs,n
In [24, Lemma 2] Wasilkowski and Wo´zniakowski provided an error bound that is valid not only for d-dimensional cubatures, but also for more general d-dimensional approximation algorithms based on Smolyak’s construction. Adapting the corresponding proof, we see that our one dimensional error bound from Theorem 3.1 implies the following result. Theorem 4.2. Let d, n ∈ N, and let the one-dimensional quadrature Qm be exact on n . For s > 21 let C be the constant from (3.5). The worst case error of Am (L, d) satisfies
s !d−1 −Ls L + d d s . 2 err(Hs,n , Am (L, d))C max 2 , C (1 + 2 ) d −1 Instead of explaining the proof in detail we want to provide a better upper bound in which L+d essentially the term d−1 ∼ Ld−1 is replaced by L(d−1)/2 . Before establishing the corresponding theorem, we state a simple helpful lemma and a well-known identity. Lemma 4.3. Let i, j ∈ Zd with n − 1i 0, j − 1, and let k ∈ ∇j . Assume that jd = −1 and id = 0. If i , j and k denote the (d − 1)-dimensional vectors consisting of the first d − 1 components of i, j and k, respectively, then we have for all L ∈ N0 j
j
Am (L, d)i,k = Am (L, d − 1)i ,k . 0 d d Proof. We have idd,kd = −1 0 = 0 = 1[0,1) , implying id ,kd = 1 and id ,kd = 0 for all 1. Now the lemma follows immediately from the definition of Am (L, d). j
j
j
A well-known formula expressing Am (L, d) solely in terms of tensor quadratures is
d d −1 L−|l| (−1) Am (lu , 1). Am (L, d) = L − |l| L−d+1 |l| L
(4.2)
u=1
A proof of this identity can, e.g., be found in [24, Lemma 1]. Theorem 4.4. Let d, n ∈ N, and let the one-dimensional quadrature Qm be exact on n . For s > 21 there exists a constant C > 0 such that for all L > 0 the worst case error of Am (L, d) satisfies d , Am (L, d))C2−Ls L(d−1)/2 . err(Hs,n
Proof. For the sake of brevity we do not try to give a reasonably good bound for the constant C in the theorem; instead we use rather rough estimates and a generic constant C, which may depend on n, m, s and d, but not on the given level L.
838
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
We proceed via induction on d. The case d = 1 has already been treated in Theorem 3.1. Let now d 2, and let the induction hypothesis hold for d − 1. Similarly as in the one-dimensional case we can use the Cauchy–Schwarz inequality and the exactness of Am (L, d) on Vnd,L to get d err(Hs,n , Am (L, d))2 =
n−1
|j| L−d+1 k∈∇j |i|∞ =0
2−|j|2s {Am (L, d)i,k }2 ; j
j
j
hereby note that i,k ∈ ⊗d=1 Vnl if and only if l > j for all ∈ {1, . . . , d}, i.e., i,k ∈ Vnd,L if and only if |j| < L − d + 1. To avoid technical difficulties, we now show that the summation over the index sets U () := {(i, j)|n − 1|i|∞ 0, i = 0, |j|L − d + 1, j = −1} for all ∈ {1, . . . , d} contributes not essentially to the square of the worst case error. Indeed, if i , j , and k denote (d − 1)-dimensional vectors, then Lemma 4.3 yields j 2−|j|2s {Am (L, d)i,k }2 (i,j)∈U () k∈∇j
=
n−1
|i |∞ =0 |j | L−(d−2) k ∈∇j
j
2−(|j |−1)2s {Am (L, d − 1)i ,k }2
d−1 = 22s err(Hs,n , Am (L, d − 1))2 C2−2Ls Ld−2 ,
where in the last step we used the induction hypothesis. So let us now consider solely pairs (i, j) where for all ∈ {1, . . . , d} we have i 1 or j 0. For such pairs (i, j), for k ∈ ∇j and ∈ {L − d + 1, . . . , L} let us define d j, ju d Si,k := l ∈ N0 ||l| = ∧ Am (lu , 1)iu ,ku = 0 . u=1 j
If ju < lu , then, due to the exactness of Am (lu , 1) on Vnlu , we have Am (lu , 1)iuu,ku = 0 (since ju = −1 or iu = 0). Thus j, Si,k ⊆ S˜j, := l ∈ Nd0 ||l| = ∧ ∀u ∈ {1, . . . , d}: lu ju . A coarse estimate of the cardinality of S˜j, is
|j| − + d − 1 |S˜j, | . d −1 (One can verify this bound by starting with j and counting the ways to distribute the difference − |j| to the components of j to get an l ∈ Zd with |l| = and lu ju for 1 ud.) With these observations and with identity (4.2) we get d , Am (L, d))2 err(Hs,n
C2−2Ls Ld−2 +C
n−1
⎧ ⎪ ⎨
L
⎫ d ⎪2 ⎬ j Am (lu , 1)iuu,ku . ⎪ ⎭
2−|j|2s ⎪ ⎩=L−d+1 l∈S˜ |i|∞ =0 |j| L−d+1 k∈∇j
j,
u=1
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
Since for l ∈ S˜j, the tensor quadrature j supp (i,k ), we have
#d
u=1 Am (lu , 1)
839
uses at most md sample points from
d d $ ju j Am (lu , 1)iu ,ku 2−|l| md iuu,ku ∞ 2− md 2|j|/2 M, u=1
u=1
where,
d n−1 M := max{i ∞ , i ∞ } . i=0
#d d L Since each of the tensor quadratures u=1 Am (lu , 1) uses not more than m 2 points, we have 2L function evaluations to calculate the term inside the to make at most C |j|−(L−d+1)+d−1 d−1 j
parentheses. For fixed i and j all the i,k , k ∈ ∇j , have pairwise disjoint support and thus only the summation over some subset ∇˜ j of ∇j with
|j| − (L − d + 1) + d − 1 L |∇˜ j | C 2 d −1 yields a non-trivial contribution to our estimate. Altogether we get (suppressing the lower order term C2−2Ls Ld−2 ) d err(Hs,n , Am (L, d))2 C
∞
|∇˜ j |2−2s
=L−d+1 |j|=
−(L−d+1)+d − 1 −L /2 2 2 2 . d −1
Our estimate for |∇˜ j | and |{j|j − 1, |j| = }| = +2d−1 lead to d−1
∞ + 2d − 1 −(L−d+1)+d−1 3 d err(Hs,n , Am (L, d))2 C2−L 2(1−2s) d −1 d −1 =L−d+1
∞ +d −1 3 −L (L−d+1)(1−2s) (1−2s) L + + d C2 2 2 d −1 d −1 =0 ∞
4 +d −1 2−2Ls Ld−1 . C 2(1−2s) d −1 =0
The sum inside the parentheses converges as s > 21 .
From the abstract definition of our function space Hs,n it is not immediately clear if it contains a reasonable class of interesting functions away from the piecewise polynomials. At least in the case where the parameter n is strictly larger than s, the Sobolev space H s [0, 1] is continuously embedded in Hs,n . There are several ways to define Sobolev spaces with non-integer index s ∈ R, one can use for example the Fourier transform ˆ f ( ) := f (x)e−ix dx R
840
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
to define the norm 2 f s = (1 + |y|2 )s fˆ(y) dy R
and the space H s (R) = f ∈ L2 | f s < ∞ . For the interval [0, 1] we define H s [0, 1] = H s (R)|[0,1] by restriction, i.e., f ∈ H s [0, 1] if there exists a function g ∈ H s (R) such that in the sense of distributions g|[0,1] = f and f H s [0,1] =
inf
g:f =g|[0,1]
gs .
The continuous embedding of H s [0, 1] into Hs,n is established by some Jackson type inequality. s Theorem 4.5. Let (i )n−1 i=0 be multiwavelets of order n. For all s < n the inclusion H [0, 1] ⊂ Hs,n holds. More precisely, there exists a constant K > 0 such that for every f ∈ H s [0, 1] we have n−1 j −1 k∈∇j i=0
j
2j 2s f, i,k 2 K 2 f 2H s [0,1] .
For a proof of the theorem see, e.g., [5,18,23]. Notice that in general we cannot hope to prove equivalence of the norms on Hs,n and H s [0, 1]. This is obvious in the case where s > 21 : Hs,n contains discontinuous functions, while H s [0, 1] does not. s is defined by The mixed Sobolev space Hmix s Hmix = H s [0, 1] ⊗ H s [0, 1] ⊗ · · · ⊗ H s [0, 1], % &' ( d times s i.e., it is the complete d-fold tensor product of the Hilbert space H s [0, 1]. In terms of Hmix Theorem 4.4 reads as follows:
Corollary 4.6. Let s > 21 and n > s. Let the one-dimensional quadrature Qm be exact on n . Then there exists a constant C > 0 such that for every L > 0 s err(Hmix , Am (L, d)) C2−Ls L(d−1)/2 .
Now we analyze the cost of the cubature algorithm Am (L, d). Identity (4.2) shows clearly that the number of multiplications and additions performed by the algorithm Am (L, d) is more or less proportional to the number of function evaluations. Since the cost of one function evaluation is in general much greater than the cost of an arithmetic operation, we concentrate here on the number of sample points N = Nm (L, d) used by Am (L, d). Since for l ∈ Nd0 and a gen# eral d-variate function f the operator du=1 Am (lu , 1) uses 2|l| md function values, identity (4.2)
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
841
gives us N
2|l| md
L−d+1 |l| L
md 2 L
d−1
2j −d+1
j =0
L+j d −1
md 2L+1
L+d −1 . d −1
The bound on N can be improved if our cubatures (Am (L, d))L are nested, i.e., if the set of sample points used by Am (L, d) is a subset of the set of sample points of Am (L + 1, d) for all L. As pointed out in Section 3.1, the right choice of the underlying quadrature Qm implies that the quadratures (Am (l, 1))l are nested, which again implies—see (4.2)—that the cubatures (Am (L, d))L are nested. Although we get for our cubatures, regardless if they are nested or not, the asymptotic estimate N O(2L Ld−1 ), the hidden constants in the big-O-notation are reasonably smaller if we have nestedness. The upper bound on N, Theorem 4.4, and some elementary calculations lead to the following corollary. Corollary 4.7. Let d, n ∈ N and let Qm be exact on n . For s > Am (L, d) satisfies d , Am (L, d)) err(Hs,n
=O
log(Nm (L, d))(d−1)(s+1/2) (Nm (L, d))s
1 2
the worst case error of
.
s d if s < n. In this situation Remark 4.8. Recall that Hmix is continuously embedded in Hs,n s d Corollary 4.7 holds in particular for Hmix in place of Hs,n .
4.3. Lower bounds for the cubature error In the previous section we discussed error bounds for our d-dimensional cubature rule based d and H s . For the considered spaces on Smolyak’s construction with respect to the spaces Hs,n mix d Hs,n there is a general method to prove lower bounds for the worst case error of any cubature QN . In [13] Heinrich et al., presented a lower bound for Haar wavelet spaces that can be extended to d . (It is not hard to verify that their spaces H the spaces Hs,n wav,s coincide (for base b = 2) with d our spaces Hs,1 .) The idea is to construct a finite linear combination f of weighted (multi)wavelet series that is zero on all canonical intervals of a fixed chosen level which contain a sample point of QN . This should be done in such a way that the d-dimensional integral I (f ) is large while the norm |f |d,s,n should remain small. (Similar proof ideas had been appeared in the mathematical literature before; cf, e.g., the well-known proof of Roth of the lower bound for the L2 -discrepancy [17].) Theorem 4.9. Let s > 21 and n ∈ N. There exists a constant C > 0 such that for any d-dimensional cubature rule QN using N sample points we have d err(Hs,n , QN ) C
(log N )(d−1)/2 . Ns
842
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
Proof. Let P ⊂ [0, 1]d , |P | = N be the set of sample points used by the cubature rule QN . For all l ∈ Nd0 we define a function ) 1 for all x ∈ Ikl , k ∈ ∇l with Ikl ∩ P = ∅, fl (x) = 0 else. Now we choose the uniquely determined integer L that satisfies 2L−1 < 2N 2L and define a function f = fl . |l|=L
Hence we get for the norm of our candidate n−1
|f |2d,s,n =
j
j −1 k∈∇j |i|∞ =0
=
2|j|2s f, i,k 2
n−1
|l|=|l |=L j −1 k∈∇j |i|∞ =0
2|j|2s fl , i,k fl , i,k . j
j
j
Due to (2.1) the inner product fl , i,k vanishes if one of the indices j satisfies j l 0. d Furthermore, if we put M := maxn−1 { , } , we have i ∞ i ∞ i=0 j j j fl , i,k i,k fl ∞ vol(Ik ) M|∇j |−1/2 . ∞
Therefore we get |f |2d,s,n nd M 2
2|j|2s |∇j |−1
|l|=|l |=L −1 j
nd M 2
2|j|2s
|l|=|l |=L −1 j
nd M 2
d
|l|=|l |=L =0
n M (1 + 2−2s )d d
2
d 2|j|2s 0 j
2−2s
|l|=|l |=L 0 j
nd M 2 (1 + 2−2s )d
L−d
=0 |j|=,j 0
22s ⎝
2
−2s d
|l|=L,l>j
⎞2 1⎠
+ d − 1 2s L − − 1 2 = n M (1 + 2 ) 2 . d −1 d −1 =0 2s by L−1 22(L−d)s . Furthermore, we use the new index m := We upper-bound +d−1 2 d−1 d−1 L − d − and majorize the resulting sum by taking the infinite sum instead. Using the short hand d
L−d
⎛
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
843
C := nd M 2 (1 + 2−2s )d leads to ∞
2
L − 1 2(L−d)s L − 1 2(L−d)s 2 −m2s m + d − 1 |f |d,s,n C 2 2 2 C , d −1 d −1 d −1 m=0
C
with a constant not depending on L, but on d and s. Furthermore, we have
1 1 L+d −1 −L L f dx = fl dx 2 (2 − N ) = . 2 2 d −1 [0,1)d [0,1)d |l|=L
|l|=L
|l|=L
Let us now consider the function f ∗ = f/ |f |dd,s,n . Since QN (f ) = 0 the estimates above result in * L+d−1 [0,1)d f dx 2ds−1 d−1 err(f ∗ , QN ) = √ 2−Ls . + |f |dd,s,n C L−1 d−1
Using the asymptotic estimates
L+d −1 d −1
,
∼L
d−1
and
L−1 d −1
∼ L(d−1)/2 ,
we finally get err(f ∗ , QN ) C2−Ls L(d−1)/2 , with a constant C not depending on L, but depending on d and s. 5. Numerical examples We implemented our cubature method and computed the integrals of certain test functions in dimension 5 and 10. The families of test functions we considered were selected from the testing package of Genz [9,10], and they are named as follows:
d (1) OSCILLATORY f1 (x) = cos 2 w1 + ci xi , i=1
d -
(2) PRODUCT PEAK f2 (x) =
i=1
ci−2
d
+ (xi − wi )2
−1
,
−(d+1)
(3) CORNER PEAK f3 (x) = 1 + ci xi ,
d i=1 2 (4) GAUSSIAN f4 (x) = exp − ci (xi − wi )2 , i=1 d
(5) CONTINUOUS f5 (x) = exp − ci |xi − wi | , i=1 0 if x1 > w1 or x2 > w2 , d (6) DISCONTINUOUS f6 (x) = exp otherwise. i=1 ci xi This choice of test functions is obviously unfavorable with regard to our cubature rule and the corresponding function classes, but enables us to compare our results directly to the results of the algorithms studied in [14,19]. The algorithm in [14] is based on Smolyak’s construction and
844
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
the Clenshaw–Curtis rule in dimension d = 1. The algorithms in [19, Chapter 11] consist of an embedded sequence of lattice rules named COPY, an algorithm using rank-1 lattice rules, an adaptive Monte Carlo method, and an adaptive method by van Dooren and De Ridder [22], for which the short hand ADAPT is used. With respect to the six test families, COPY and ADAPT are the best performing algorithms of these four. We should mention here that there exist more recent algorithms which improve in some applications on the algorithms we chose as benchmark methods, see, e.g., [4,12,16] and the literature mentioned therein. So, e.g., the use of Kronrod–Patterson formulas as one-dimensional quadratures in Smolyak’s construction seems to be a very powerful tool for the treatment of smooth functions as reported in [11,16]. These cubatures have the advantage to lead to a higher degree of polynomial exactness than the Smolyak construction of the same level based on the Clenshaw–Curtis rules, while on the other hand the number of sample points used increases faster. Although the use of Kronrod–Patterson formulas leads for some examples to reasonably better performance than the use of Clenshaw–Curtis rules, one can see in [16] that for the testing package of Genz the first were not clearly better than the latter. As observed in [11,16] the numerical advantage of the algorithms based on the Kronrod–Patterson rules over the ones based on Clenshaw–Curtis rules decreases with growing dimension d. The improvements of Petras by using delayed basis sequences of Kronrod–Patterson formulas can be seen as a further “fine tuning” of Smolyak’s algorithm for smooth functions. The use of delayed basis sequences may also help to improve our approach for classes of smooth functions. We have not studied this so far. We followed the conventions from [14,19]: All the functions were normalized so that the true integrals over the unit cube equaled 1. By varying the parameters c = (c1 , . . . , cd ) and w = (w1 , . . . , wd ) we got different test integrals. For each family of functions we performed 20 tests in which we chose the vectors independently and uniformly distributed in [0, 1]d . The vectors c were renormalized such that d
ci = bj
i=1
holds for predetermined parameters bj , j = 1, . . . , 6. Since, in general, the difficulty of the integrals increases as the (Euclidean) norm c increases, the choice of the bj determines the level of difficulty. As in [19] and in [14], we chose in dimension d = 10 the following values of bj : In the notion of [19] this corresponds to the level of difficulty L = 1 for the families 2, 4, and 6, and to the level L = 2 for the families 1, 3, and 5. In dimension d = 5 we chose b2 = 29 and b5 = 43.4, which corresponds to the level L = 1 for family 2 and L = 2 for family 5. The diagrams in Figs. 2–7 show the median of the absolute error of our cubatures in 20 tests for each of the considered families. We treated all six families in dimension 10. In Figs. 2–7 we plotted the median error of the lattice rule COPY taken from the diagrams in [19] and the median error of the algorithm considered by Novak and Ritter taken from the diagrams in [14]. We tested our method by using Gauss rules as underlying one-dimensional quadrature Qm . Notice that in the case of the one-point Gauss rule the resulting algorithm is identical to the so-called Boolean midpoint rule which has, e.g., been studied in [3]. (There in a numerical j
1
2
3
4
5
6
bj
9.0
7.25
1.85
7.03
20.4
4.3
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
845
function OSCILLATORY (L=2) in dimension 10 100
one-point Gauss two-point Gauss three-point Gauss four-point Gauss Novak and Ritter COPY
10-2
error
10-4 10-6 10-8 10-10 10-12 100
101
102
103
104 105 function calls
106
107
108
Fig. 2. Median of absolute error of family (1), 20 integrands.
function PRODUCT PEAK (L=1) in dimension 10 100
one-point Gauss two-point Gauss three-point Gauss four-point Gauss Novak and Ritter COPY
10-1 10-2 10-3
error
10-4 10-5 10-6 10-7 10-8 10-9 10-10 100
101
102
103
104 105 function calls
106
107
108
Fig. 3. Median of absolute error of family (2), 20 integrands.
example this algorithm behaved well for a smooth test function lying in some Korobov space.) We decided to use Gauss rules since they achieve the maximal degree of polynomial exactness. This makes it easy to study the dependence of the numerical results on the degree of (piecewise) polynomial exactness by considering only few sample points in the underlying quadrature.
846
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
function CORNER PEAK (L=2) in dimension 10 100
one-point Gauss two-point Gauss three-point Gauss four-point Gauss Novak and Ritter COPY
10-1 10-2
error
10-3 10-4 10-5 10-6 10-7 10-8 100
101
102
103
104 105 function calls
106
107
108
Fig. 4. Median of absolute error of family (3), 20 integrands.
function GAUSSIAN (L=1) in dimension 10
100
one-point Gauss two-point Gauss three-point Gauss four-point Gauss Novak and Ritter COPY
10-1 10-2 10-3
error
10-4 10-5 10-6 10-7 10-8 10-9 10-10 100
101
102
103
104 105 function calls
106
107
108
Fig. 5. Median of absolute error of family (4), 20 integrands.
Unfortunately, the choice of Gauss rules leads to non-nested cubatures (Am (L, d))L . As mentioned in Sections 3.1 and 4.2 one may use nested one-dimensional quadratures to reduce the number of sample points in dimension d at a given level L (of course at the cost of a reduced parameter n for fixed m).
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
847
function CONTINUOUS (L=2) in dimension 10 100
error
10-1
10-2
10-3 one-point Gauss two-point Gauss
10-4
three-point Gauss four-point Gauss Novak and Ritter COPY
10
-5
100
101
102
103
104 105 function calls
106
107
108
107
108
Fig. 6. Median of absolute error of family (5), 20 integrands.
function DISCONTINUOUS (L=1) in dimension 10 100
error
10-1
10-2
one-point Gauss
10-3
two-point Gauss three-point Gauss four-point Gauss Novak and Ritter COPY
10-4 100
101
102
103
104 105 function calls
106
Fig. 7. Median of absolute error of family (6), 20 integrands.
For smooth integrands one would in general expect Gauss rules Qm with larger m superior to Gauss rules with smaller m, while for non-smooth integrands one would expect the contrary behavior. These prediction is supported by the numerical results for the families 1, 3, 5, and 6. The results for family 2 and 4 however do not display such a clear tendency.
848
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850 function SINGULARY in dimension 6 one-point Gauss
10-1
two-point Gauss three-point Gauss
error
10-2
10-3
10-4
10-5 100
101
102
103 104 function calls
105
106
Fig. 8. Median of the error defined in (5.1) of 20 integrands.
If we compare our results to the ones of the algorithm of Novak and Ritter, we see that for the families 1, 2, and 4 their results are clearly better than ours, while for the families 3, 5, and 6 the results are comparable. The results for the families 1, 2, and 4 reflect that the algorithm of Novak and Ritter was constructed to make the best use of smoothness properties, while our method was not. If we compare our cubature method with the algorithms considered in [19], it turns out that for the families 1, 3, and 4 our method is comparable to ADAPT and the two lattice rules. The adaptive Monte Carlo method is in non of these cases competitive. In case of family 2 our cubature is not as good as COPY, but comparable with the rank-1 lattice rule and ADAPT and better than the Monte Carlo method. For family 5 our method is comparable to ADAPT, but worse than Monte Carlo and both lattice rules. Our results for family 6 however are not as good as the results of any of the four algorithms in [19]. We performed additional numerical tests. Here we considered the six-dimensional function 1/2 f (x, y)=(x + ) − y= ((x1 + 1 ) − y1 )2 +((x2 + 2 ) − y2 )2 +((x3 +3 )−y3 )2 , where is a random vector. This function is used as a typical prototype electron-electron cusp of the solution of the electronic Schrödinger equation, see, e.g., [8]. Since we do not know the exact value of the integral of f over [0, 1]6 , we consider the following (normalized) error for a given level L: err(L, ) :=
Am (L, 6)f − Am (L − 1, 6)f . Am (L, 6)f
(5.1)
We performed 20 tests in which we chose the vectors independently and uniformly distributed in [0, 1]6 . Fig. 8 shows the median of the error defined by (5.1) for the cubature Am (L, 6) with underlying one-, two-, and three-point Gauss formulas. The cubature induced by the one-point
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
849
Gauss formula shows a good convergence up to an error of 10−4 . For a precision beyond 10−4 we seem to have numerical instabilities, as is also confirmed by the behavior of the cubatures induced by the two- and three-point Gauss formulas. Here adaptiveness may help to overcome these problems. Altogether, the numerical experiments show that our algorithms induced by Gauss formulas do behave reasonably well, although in every comparison (except for family (3)) the algorithm of Novak and Ritter or COPY perform clearly better. We may create algorithms following our approach by using, e.g., underlying quadratures resulting in nested cubatures or delayed basis sequences, cf. [16]. This may lead to an improved numerical performance. We have not studied this in detail so far, since the focus of our work was on the theoretical aspects of our multiwavelet approach. 6. Conclusion and outlook We provide explicit algorithms for multivariate integration based on Smolyak’s construction and iterated one-dimensional quadrature rules. These quadrature rules can be arbitrary as long as they satisfy a given degree of polynomial exactness. The resulting algorithms are simple and easy to implement. We consider certain multiwavelet function spaces and derive upper and lower bounds for the worst case error over the unit ball of these spaces. These bounds reveal that our algorithms are optimal up to logarithmic factors. We have chosen the presented multiwavelet approach, because translates of scaling functions do not overlap here. The treatment of overlapping scaling functions like Daubechies functions, B-splines or smoother multiwavelets requires special care at the end points of intervals [6]. These wavelet functions can be treated with an approach based on frame concepts; this will be discussed in detail in a forthcoming paper. The function spaces we consider are Hilbert spaces of functions whose coefficients with respect to multiwavelet expansions exhibit a prescribed decay. These spaces are related to Sobolev spaces. A natural question is whether one can combine our multiwavelet approach with other function spaces. One may consider spaces with discrete norms related to Besov spaces (e.g., by using different weights and an lp -metric rather than an l2 -metric in (3.3)). Now Besov spaces are usually useful for adaptive nonlinear approximation. It is known that for our error criterion adaptiveness and nonlinearity do essentially not help (see, e.g., [21]). In our opinion it is more promising to consider weighted (tensor product) Sobolev spaces. This would require some modifications with respect to the choice of cubature points. We have not studied this approach in detail so far, but think that our multiwavelet approach seems to be the appropriate choice here and would lead to good results. Another interesting question is if it is possible to achieve a better numerical performance than we have seen in Section 5 with algorithms following our approach. This should be the case if we consider functions which are more favorable with regard to our theoretical analysis. A way of improving our algorithms for sufficiently smooth integrands may be to use delayed basis sequences which result in nested cubatures. Acknowledgments We would like to thank two anonymous referees for their comments which helped to improve the representation reasonably.
850
M. Gnewuch et al. / Journal of Complexity 23 (2007) 828 – 850
Furthermore, the first author would like to thank Henryk Wo´zniakowski for the warm hospitality and for interesting discussions during his stay at the University of Warsaw in March 2005. References [1] M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover, New York, 1964. [2] B. Alpert, A class of bases in L2 for the sparse representation of integral operators, SIAM J. Math. Anal. 24 (1993) 246–262. [3] G. Baszenski, F.J. Delvos, Multivariate Boolean midpoint rules, in: Numerical Integration IV, Birkhäuser, Basel, 1993, pp. 1–11. [4] H.J. Bungartz, M. Griebel, Sparse grids, Acta Numerica 13 (2004) 147–269. [5] A. Cohen, Numerical Analysis of Wavelet Methods, North-Holland, Amsterdam, 2003. [6] W. Dahmen, B. Han, R.-Q. Jia, A. Kunoth, Biorthogonal multiwavelets on the interval: cubic Hermite splines, Constr. Approx. 16 (2000) 221–259. [7] P.J. Davis, P. Rabinowitz, Methods of Numerical Integration, second ed., Academic Press, New York, 1984. [8] H.J. Flad, W. Hackbusch, R. Schneider, Best N-term approximation in electronic structure calculations. II. Jastrow factors, Max Planck Institute for Mathematics in the Sciences, Leipzig, Preprint 80/2005, M2AN Math. Model. Numer. Anal., to appear. [9] A. Genz, Testing multidimensional integration routines, in: B. Ford, J.C. Rault, F. Thomasset (Eds.), Tools, Methods and Languages for Scientific and Engineering Computation, North-Holland, Amsterdam, 1984, pp. 81–94. [10] A. Genz, A package for testing multiple integration subroutines, in: P. Keast, G. Fairweather (Eds.), Numerical Integration, Kluwer, Dordrecht, 1987, pp. 337–340. [11] T. Gerstner, M. Griebel, Numerical integration using sparse grids, Numer. Algorithms 18 (1998) 209–232. [12] T. Gerstner, M. Griebel, Dimension-adaptive tensor-product quadrature, Computing 71 (2003) 65–87. [13] S. Heinrich, F.J. Hickernell, R.-X. Yue, Optimal quadrature for Haar wavelet spaces, Math. Comput. 73 (2004) 259–277. [14] E. Novak, K. Ritter, High-dimensional integration of smooth functions over cubes, Numer. Math. 75 (1996) 79–97. [15] A.B. Owen, Monte Carlo variance of scrambled net quadrature, SIAM J. Numer. Anal. 34 (1997) 1884–1910. [16] K. Petras, Smolyak cubature of given polynomial degree with few nodes for increasing dimension, Numer. Math. 93 (2003) 729–753. [17] K.F. Roth, On irregularities of distribution, Mathematika 1 (1954) 73–79. [18] R. Schneider, Multiskalen- und Wavelet-Matrixkompression. (German) (Multiscale and wavelet matrix compression) Analysisbasierte Methoden zur effizienten Loesung grosser vollbesetzter Gleichungssysteme. (Analysis-based methods for the efficient solution of large nonsparse systems of equations), in: Advances in Numerical Mathematics, B.G. Teubner, Stuttgart, 1998. [19] I.H. Sloan, S. Joe, Lattice Methods for Multiple Integration, Oxford University Press, New York, 1994. [20] S.A. Smolyak, Quadrature and interpolation formulas for tensor products of certain classes of functions, Soviet Math. Dokl. 4 (1963) 240–243. [21] J.F. Traub, G.W. Wasilkowski, H. Wo´zniakowski, Information-Based Complexity, Academic Press, New York, 1988. [22] P. van Dooren, L. de Ridder, An adaptive algorithm for numerical integration over an n-dimensional cube, J. Comp. Appl. Math. 2 (1976) 207–217. [23] T. von Petersdorff, C. Schwab, R. Schneider, Multiwavelets for second kind integral equations, SIAM J. Numer. Anal. 34 (1997) 2212–2227. [24] G.W. Wasilkowski, H. 
Wo´zniakowski, Explicit cost bounds of algorithms for multivariate tensor product problems, J. Complexity 11 (1995) 1–56.
Journal of Complexity 23 (2007) 851 – 866 www.elsevier.com/locate/jco
Disintegration of Gaussian measures and average-case optimal algorithms夡 Vaja Tarieladze, Nicholas Vakhania∗ Department of Stochastic Processes and Applied Statistics, Niko Muskhelishvili Institute of Computational Mathematics, Tbilisi 0193, Georgia Received 12 April 2006; accepted 19 April 2007 Available online 15 June 2007 Dedicated to Professor H. Wozniakowski on the occasion of his 60th birthday
Abstract It is shown that a Gaussian measure in a given infinite-dimensional Banach space always admits an essentially unique Gaussian disintegration with respect to a given continuous linear operator. This covers a similar statement made earlier in [Lee and Wasilkowski, Approximation of linear functionals on a Banach space with a Gaussian measure, J. Complexity 2(1) (1986) 12–43.] for the case of finite-rank operators. © 2007 Published by Elsevier Inc. MSC: 41A65; 47A50 Keywords: Regular conditional probability; Disintegration; Gaussian measure in Banach space; Average-case optimal algorithm
Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. Transition measures and disintegration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1. Measurable mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Transition probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3. Disintegration of general probability measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3. Disintegration of Gaussian measures in Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1. Vector-valued functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2. Characteristic functionals and mixtures of measures . . . . . . . . . . . . . . . . . . . . . . . . . . 夡
852 852 852 853 854 855 855 855
The authors were partially supported by Grant GNSF/ST06/3-009; the first author was partially supported also by MCYT BFM2003-0804-05878. ∗ Corresponding author.
E-mail addresses: [email protected] (V. Tarieladze), [email protected] (N. Vakhania). 0885-064X/$ - see front matter © 2007 Published by Elsevier Inc. doi:10.1016/j.jco.2007.04.005
852
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
3.3. Moment functionals and covariance operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4. Symmetric positive operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5. Gaussian measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6. Gaussian disintegration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. The existence of average-case optimal algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1. IBC-formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2. Average-case optimal algorithms via disintegration . . . . . . . . . . . . . . . . . . . . . . . . . . .
856 857 859 860 864 864 865
1. Introduction Professor Henryk Wozniakowski’s contribution to the theory of information-based complexity (IBC) is widely known. Our interest in this area was also stimulated by him. We both (especially the elder one of us) many times had the pleasure to meet Professor Henryk Wozniakowski in our country, in both of his countries, and in other places. We have been having the impression that a mathematical statement used by him to obtain a result in his research was not only a tool to achieve the result but also, at the same time, it had an independent interest for him as a mathematical statement itself: he appreciates and loves beauty in mathematics. Our main motivation was the fundamental monograph [24], in which significant applications of the theory of infinite-dimensional probability distributions are given for studying a wide range of problems of numerical analysis and approximation theory. We deal with the concept of a disintegration of a given probability measure relative to a measurable mapping, and its use for the study of average-case optimal algorithms in the sense of the theory of IBC. The general problem of disintegration for the needs of Ergodic theory was posed and solved in [15] (see historical notes to [3, Chapter 6]). Usually, a disintegration is defined and studied for not necessarily finite measures in the context of topological spaces (see [3]; see also [1] where a gap from [3] is repaired). We refer interested readers to the works [4,6–11,13,14,17,23] related to disintegration. The concept of disintegration (with the name fibering) was successfully used in [5] to study functionals of stochastic processes. By means of disintegrations in [24] the concepts of local average radius of information, local and global average errors, and central algorithms are introduced and studied. We study disintegration of measures in Banach spaces with respect to continuous linear operators. It is shown, in particular, that if is a Gaussian measure in a separable Banach space X, Y is an another separable Banach space and : X → Y is a continuous linear operator, then there always exists a disintegration q : B(X) × Y → [0, 1] of with respect to and, moreover, the measures q(·, y), y ∈ Y are Gaussian. A similar result for finite-dimensional Y was obtained earlier in [12] (see also [24, Appendix, Lemma 2.9.6] and [20, Theorem 3.4.1]); however, in the proof of [12] (as well as in [24]) a statement was used that later turned out to be not correct in general (see Remark 3.14). Our proof is based on a different auxiliary statement. Finally, in the last section the Gaussian disintegration is applied for the proof of the existence of average-case optimal algorithms in the sense of [24]. 2. Transition measures and disintegration 2.1. Measurable mappings Let X be a set, (Y, F) be a measurable space; for a mapping : X → Y we write A := {−1 (F ) : F ∈ F}. If A is a -algebra of subsets of X, then the mapping will be called (A, F)-measurable, if A ⊂ A.
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
853
For a topological space Z we denote B(Z) the Borel -algebra of Z. If (X, A) is a measurable space and Z is a topological space, then a mapping : X → Z will be called A-measurable, if it is (A, B(Z))-measurable. If X and Z are topological spaces, then a mapping : X → Z will be called Borel measurable if it is (B(X), B(Z))-measurable. A topological space Z will be called Polish if it is homeomorphic to a complete separable metric space. Let (X, A) be a measurable space, be a positive measure on A, (Y, F) be an measurable space and : X → Y be an (A, F)-measurable mapping. It is easy to see that the set function := ◦ −1 is a positive measure on F. The measure is called the image of with respect to . We shall use frequently the following well-known statement. Lemma 2.1. Let (X, A, ) be a measure space, (Y, F) be a measurable space, : X → Y be an (A, F)-measurable mapping, be the image of with respect to and g : Y → C be a F-measurable function. Then g ∈ L1 (Y, F, ) ⇐⇒ g ◦ ∈ L1 (X, A, ). If g ∈ L1 (Y, F, ) then the following change of variable formula holds: −1 (B)
g((x)) d(x) = B
g(y) d (y),
∀B ∈ F.
2.2. Transition probabilities Let (X, A) and (Y, F) be measurable spaces. A mapping q : A × Y → [0, 1] will be called a transition probability relative to (X, A) and (Y, F) if it has the following properties: (TP1) for a fixed y ∈ Y the set function q(·, y) is a probability measure on A, (TP2) for a fixed A ∈ A the function q(A, ·) is measurable with respect to F. Lemma 2.2. Let (X, A) and (Y, F) be measurable spaces, q : A × Y → [0, 1] be a transition probability relative to (X, A) and (Y, F) and f : X → C be a A-measurable function. Write qy (·) := q(·, y) for y ∈ Y and
Yf,q = y ∈ Y :
|f (x)| dqy (x) < ∞ . X
Then Yf,q ∈ F, the function y → X f (x) dqy (x) is F-measurable on Yf,q . In particular, if f : X → C is a bounded A-measurable function, then Yf,q = Y and y → X f (x) dqy (x) is a F-measurable function on Y. Proof follows in a standard way from (TP2). Let q : A × Y → [0, 1] be a transition probability relative to (X, A) and (Y, F) and be a probability measure on F. Then it is easy to see that the set function q : A → [0, 1] defined by the equality q (A) :=
q(A, y) d(y), Y
A∈A
854
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
is a probability measure on A. The measure q is called a mixture of the family q(·, y), y ∈ Y with respect to the mixing measure . Lemma 2.3. Let (X, A) and (Y, F) be measurable spaces, q : A × Y → [0, 1] be a transition probability relative to (X, A) and (Y, F), be a probability measure on F, := q and f ∈ L1 (X, A, ; C). Write qy (·) := q(·, y) for y ∈ Y and Yf,q = {y ∈ Y : X |f (x)| dqy (x) < ∞}. Then: (1) Yf,q ∈ F and (Yf,q ) = 1, (2) the function y → X f (x) dqy (x) is -integrable on Yf,q and the following equality holds:
f (x) d(x) = X
Yf,q
f (x) dqy (x) d(y).
(2.1)
X
Proof. The measurability statements follow from Lemma 2.2; the remaining part coincides with [16, Corollary 2 to Proposition III.2.1]. 2.3. Disintegration of general probability measures In this subsection we shall give a definition of a disintegration and formulate some related results. Let (X, A) be measurable space, be a probability measure on A, (Y, F) be a measurable space and : X → Y be an (A, F)-measurable mapping, = ◦ −1 be the -image of . A disintegration of on A with respect to is a mapping q : A × Y → [0, 1] with the following properties: (Dis1) q is a transition probability relative to (X, A) and (Y, F), (Dis2) there exists Y0 ∈ F with (Y0 ) = 1 such that for all y ∈ Y0 we have {y} ∈ F and for −1 each −1fixed y ∈ Y0 the probability measure q(·, y) is concentrated on the “fiber” ({y}) (i.e., q ({y}), y = 1), (Dis3) coincides with the mixture of the family q(·, y) y∈Y with respect to the mixing measure ; i.e., (A) = Y
q(A, y) d (y),
∀A ∈ A.
(2.2)
We note that for this concept instead of the name “a disintegration” the name a regular conditional probability distribution given [18, p. 146] or a conditional measure [24, p. 198] is also used. For the sake of completeness we formulate the following known result. Theorem 2.4. Let X, Y be Polish spaces, be a probability measure on B(X), : X → Y be a Borel measurable mapping. Then there exists a disintegration of on B(X) with respect to and it is unique in the following natural sense: if q1 , q1 are disintegration of on B(X) with respect to , then there exists a set Y1 ∈ F such that (Y1 ) = 1 and q1 (A, y) = q2 (A, y),
∀y ∈ Y1 , ∀A ∈ B(X).
A proof can be found in [18, Theorem 8.1] or [19, Proposition 46.3] (cf. also [21, Theorem 5.3.7]).
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
855
3. Disintegration of Gaussian measures in Banach spaces 3.1. Vector-valued functions Let (X, A, ) be a probability space, G be a normed space over R, 0 < p < ∞. As usual, Lp (X, A, ; G) will stand for the set of all equivalence classes of (A,pB(G))measurable mappings : X → G such that (X) is a separable subset of G and X (x) G d(x) < ∞. We write
1/p p
p =
(x) G d(x) , ∈ Lp (X, A, ; G). X
It is known that if p 1, then the functional · p is a norm on Lp (X, A, ; G) and if G is a Banach space then Lp (X, A, ; G), · p is a Banach space too. For a normed space G its conjugate (or dual) space will be denoted by G∗ . For an element g ∈ G and a functional g ∗ ∈ G∗ instead of g ∗ (g) we shall also write g, g ∗ . If : X → G is an (A, B(G))-measurable mapping, then for every g ∗ ∈ G∗ the mapping ∗ g ◦ : X → R is (A, B(R))-measurable. The following converse of this statement is true: if : X → G is a mapping such that for every g ∗ ∈ G∗ the mapping g ∗ ◦ : X → R is (A, B(R))measurable and (X) is a separable subset of G, then : X → G is an (A, B(G))-measurable mapping (Pettis measurability theorem). An (A, B(G))-measurable mapping : X → G is said to be of weak order p with respect to if g ∗ ◦ ∈ Lp (X, A, ; R), ∀g ∗ ∈ G∗ . If ∈ Lp (X, A, ; G), then is of weak order p with respect to . The converse statement is true if and only if G is finite-dimensional [26, Theorem 2.2.1]. We will say that an (A, B(G))-measurable mapping : X → G has the integral with respect to on a set A ∈ A if |g ∗ ◦ | d < ∞, ∀g ∗ ∈ G∗ , (3.1) A
and there exists an element mA ∈ G such that g ∗ ◦ d = g ∗ (mA ), ∀g ∗ ∈ G∗ .
(3.2)
A
If : X → G has the integral with respect to on a set A ∈ A, then the unique element mA ∈ G satisfying (3.2) is called the (Pettis) integral of with respect to on a set A ∈ A and it is denoted by A d or A (x) d(x). We will say that an (A, B(G))-measurable mapping : X → G is (Pettis) integrable with respect to if is of weak order 1 with respect to and it has the integral with respect to on every set A ∈ A. It is known that if G is a separable Banach space, then every ∈ L1 (X, A, ; G) is integrable with respect to and the following inequality holds: (x) d(x)
(x) G d(x). X
G
X
3.2. Characteristic functionals and mixtures of measures Let X be a separable Banach space over R with the conjugate space X ∗ .
856
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
For a finite measure given on the Borel -algebra B(X) its characteristic functional ˆ : X ∗ → C is defined by the equality: exp{ix ∗ (x)} d(x), x ∗ ∈ X∗ . (x ˆ ∗) = X
Lemma 3.1. Let X be a separable Banach space, (Y, F) be a measurable space, q : B(X)×Y → [0, 1] be a transition probability relative to (X, B(X)) and (Y, F). Write qy (·) := q(·, y), y ∈ Y . Then for each fixed x ∗ ∈ X ∗ the mapping y → qˆy (x ∗ ) is a bounded (F, B(C))-measurable function. ∗ ∗ Proof. Fix x ∗ ∈ X∗ and consider the function x → f (x) := exp{ix (x)}. Clearly, |qˆy (x )| = | X f dqy | 1, ∀y ∈ Y and the measurability of y → qˆy (x ∗ ) = X f (x) dqy (x) follows from Lemma 2.2.
Proposition 3.2. Let X be a separable Banach space, ⊂ X∗ be a separating vector subspace, be a probability measure on B(X), (Y, F) be a measurable space, q : B(X) × Y → [0, 1] be a transition probability relative to (X, B(X)) and (Y, F) and be a probability measure on F. Write qy (·) := q(·, y), y ∈ Y . TFAE: (i) = q ; i.e., is the mixture of probability measures qy , y ∈ Y with the mixing measure . (ii) (x ˆ ∗ ) = Y qˆy (x ∗ ) d(y), ∀x ∗ ∈ X∗ . (iii) (x ˆ ∗ ) = Y qˆy (x ∗ ) d(y), ∀x ∗ ∈ . Proof. (i) ⇒ (ii). Fix x ∗ ∈ X ∗ and apply the equality (2.1) of Lemma 2.3 to the function x → f (x) := exp{ix ∗ (x)}. (ii) ⇒ (iii) is evident. (iii) ⇒ (i). Write = q . Then from ∗ ∗ ˆ ∗) = the already proved implication (i) ⇒ (iii) we have (x Y qˆy (x ) d(y), ∀x ∈ . From ˆ ∗ ) = (x this equality and (iii) we get (x ˆ ∗ ), ∀x ∗ ∈ . From the last relation via uniqueness theorem for characteristic functionals [26, Corollary 2(b) to Theorem 4.2.2] we obtain = = q . 3.3. Moment functionals and covariance operators Let, as in Section 3.2, X be a separable Banach space, be a probability measure on B(X) and 0 < p < ∞. p is of strong order p if X x X d(x) < ∞ and is of weak order p if We say∗ that p ∗ ∗ X | x, x | d(x) < ∞, ∀x ∈ X . Clearly, if is of strong order p then is of weak order p as well; the reverse implication is not true if X is infinite-dimensional [26, Theorem 2.2.1]. We say that a measure has mean or baricenter if is of weak order 1 and there exists an element m ∈ X such that
x, x ∗ d(x) = m , x ∗ , ∀x ∗ ∈ X∗ . X
In other words, has mean or baricenter if the identity mapping : X → X, (x) = x, ∀x ∈ X is of weak order 1 with respect to and has integral over X with respect to .
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
857
If X = c0 , then a weak first-order probability measure given on B(X) may not have the mean [26, Example, p. 115]. If X is an arbitrary separable Banach space and a probability measure given on B(X) is either of strong order 1 or of weak order p with 1 < p < ∞, then has the mean [26, Proposition 2.3.2, Theorem 2.3.1]. If X is a separable Banach space and is a weak second-order probability measure given on B(X), then has mean m and covariance operator C : X∗ → X which is uniquely determined by the equality
∗ ∗ C x1 , x2 = x − m , x1∗ x − m , x2∗ d(x), x1∗ , x2∗ ∈ X∗ . X
More information about covariance operators can be found in [26, Chapter III, §2]. The following lemma can be proved easily. Lemma 3.3. Let X be a separable Banach space and be a weak second-order probability measure given on B(X) for which C is a finite-rank operator. Then (m + C (X ∗ )) = 1. In particular, if C = 0, then ({m }) = 1. 3.4. Symmetric positive operators We fix a real Banach space X. A mapping R : X∗ → X is called
symmetric if Rx1∗ , x2∗ = Rx2∗ , x1∗ , ∀x1∗ , x2∗ ∈ X∗ , positive if Rx ∗ , x ∗ 0, ∀x ∗ ∈ X∗ . It is known that a symmetric mapping R : X ∗ → X is linear and continuous. Moreover, if R : X∗ → X is a symmetric positive mapping, then
∗ ∗ 2 ∗ ∗ ∗ ∗ Rx1 , x2 Rx1 , x1 Rx2 , x2 , ∀x1∗ , x2∗ ∈ X∗ (3.3) and
Rx ∗ 2X R Rx ∗ , x ∗ ,
∀x ∗ ∈ X∗ .
(3.4)
If X is a separable Banach space and is a weak second-order probability measure given on B(X), then its covariance operator C : X∗ → X presents a basic example of a symmetric positive operator. For mappings R1 , R2 : X ∗ → X we write R1 R2 if R1 x ∗ , x ∗ R2 x ∗ , x ∗ , ∀x ∗ ∈ X ∗ . The relation is a partial order in the set of all symmetric operators from X∗ to X (see [26, Chapter III, §1]). Let R : X∗ → X be a non-zero symmetric positive operator; a non-empty family (xi∗ )i∈I of elements of X∗ will be called: R-orthonormal if Rxi∗ , xj∗ = i,j , ∀i, j ∈ I (where i,j stands for Kronecker delta), R-representing if (xi )i∈I is R-orthonormal and 2 Rxi∗ , x ∗ = Rx ∗ , x ∗ , ∀x ∗ ∈ X∗ . (3.5) i∈I
858
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
Lemma 3.4. Let X be a Banach space, R : X ∗ → X be a non-zero symmetric positive operator, (xi∗ )i∈I be a finite or countably infinite R-orthonormal family. Then:
∗ ∗ 2 (a) ∀x ∗ ∈ X∗ . Rx ∗ , x ∗ , i∈I Rxi , x ∗ ∗ (b) For every x ∈ X the series i∈I Rxi∗ , x ∗ Rxi∗ is convergent in X and the mapping RI : X∗ → X defined by the following equality RI x ∗ = Rxi∗ , x ∗ Rxi∗ , x ∗ ∈ X∗ i∈I
is a symmetric positive operator satisfying the relation RI R. ∗ family if and only if for every x ∗ ∈ X ∗ the series (c) (x i )i∈I is∗ a ∗ R-representing ∗ i∈I Rxi , x Rxi is convergent in X and the following equality holds: Rxi∗ , x ∗ Rxi∗ , x ∗ ∈ X∗ . Rx ∗ = i∈I
Proof. non-empty subset I0 of I, a functional x ∗ ∈ X ∗ and write y ∗ = x ∗ −
(a) ∗Fix∗ a finite ∗ ∗ i∈I0 Rxi , x xi . By using of R-orthonormality of (xi )i∈I we get
∗ ∗ ∗ ∗ 2 ∗ ∗ Rx , x − Rxi , x = Ry , y 0. i∈I0
This implies (a) in case of a finite I. The case of a countable I follows from finite case by taking the limit. ∗ ∗ suppose
can ∗ that I = N. Fix a functional x ∈ X . It is sufficient to show∗ that (b) We ∗ ∗ ( i n Rxi , x Rxi )n∈N is a Cauchy sequence in X. Let n, m ∈ N, n < m and write yn,m =
∗ ∗ ∗ n i m Rxi , x xi . By using (3.4) we can write: 2 ∗ ∗ ∗
2 = Ry ∗ 2 R Ry ∗ , y ∗ = R Rx , x Rx Rxi∗ , x ∗ . n,m n,m n,m i i n i m ni m
2
Since by (a) limn,m n i m Rxi∗ , x ∗ = 0. we have limn,m n i m Rxi∗ , x ∗ Rxi∗ = 0. (c) The “if” part is easy to verify. To show the
if” part, fix a finite non-empty subset I0 of “only I, a functional x ∗ ∈ X∗ and write y ∗ = x ∗ − i∈I0 Rxi∗ , x ∗ xi∗ . Using (3.4) again we can write 2 ∗ ∗ ∗ ∗
∗ ∗ ∗ 2 Rx − Rxi , x Rxi = Ry R Ry , y i∈I0 ⎛ ⎞
∗ ∗ ∗ ∗ = R ⎝ Rx , x − Rxi , xi ⎠ . i∈I0
From this equality and (3.5) we get (c).
Lemma 3.5. Let X be a Banach space, R : X∗ → X be a non-zero symmetric positive operator for which R(X∗ ) is a separable subset of X. Then there exists a finite or countably infinite R-representing family (xi∗ )i∈I . Moreover, if dimR(X ∗ ) < ∞, then card(I ) = dim R(X ∗ ) and if R(X∗ ) is an infinite-dimensional, then I is countably infinite.
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
859
Proof. It is easy to see that on R(X ∗ ) ⊂ X the equality
(Rx1∗ |Rx2∗ ) = Rx1∗ , x2∗ , x1∗ , x2∗ ∈ X∗ defines a scalar product. Denote by the H thus obtained inner product space (R(X ∗ ), (·|·)). If n := dim R(X∗ ) < ∞, then we can find functionals xi∗ ∈ X∗ , i = 1, . . . , n, such that Rxi∗ , i = 1, . . . , n, is an orthonormal basis for H. Clearly, xi∗ ∈ X ∗ , i = 1, . . . , n, is an R-representing sequence. If R(X ∗ ) is infinite-dimensional, then the separability of R(X ∗ ) implies that the inner product space H is separable as well (see [26, Corollary 1 to Lemma 3.1.1]). Since any separable inner product space has an orthonormal basis, we can find xi∗ ∈ X ∗ , i = 1, . . . , such that Rxi∗ , i = 1, 2, . . . , is an orthonormal basis for H. Clearly, xi∗ ∈ X ∗ , i = 1, 2, . . . , is an R-representing sequence. 3.5. Gaussian measures Let X be a separable Banach space. A probability measure on B(X) is called Gaussian if for every fixed x ∗ ∈ X∗ the image x ∗ is either a Dirac measure or a non-degenerate Gaussian measure on B(R). Clearly a Gaussian measure on B(X) is of weak order 2, therefore it has mean and covariance operator. The following statement can be proved easily. Lemma 3.6. Let X be a real separable Banach space and be a probability measure on B(X). The following statements are valid. (a) If is Gaussian, then
(x ˆ ∗ ) = exp{i m, x ∗ − 21 Rx ∗ , x ∗ }, ∀x ∗ ∈ X∗ , (3.6) where m = m is the mean and R = C : X∗ → X is the covariance operator of . (b) Conversely, if ˆ has form (3.6) with some m ∈ X and some symmetric positive R : X ∗ → X, then is a Gaussian measure with mean m = m and with the covariance operator C = R. Corollary 3.7. Let X, Y be real separable Banach spaces, be a Gaussian measure on B(X), : X → Y be a continuous linear operator and := ◦ −1 . Then is a Gaussian measure on B(Y ) with mean m = m ∈ Y and with covariance operator C = C ∗ : Y ∗ → Y . Proof. This follows easily from Lemma 3.6.
Let X be a real separable Banach space; a mapping R : X∗ → X is called Gaussian covariance if it coincides with the covariance operator of some Gaussian measure given on B(X). Lemma 3.8. Let X be a real separable Banach space and R : X∗ → X be a Gaussian covariance. Then for every m ∈ X there exists a Gaussian measure on B(X) with mean m = m and with the covariance operator C = R. Proof. Since R : X ∗ → X is a Gaussian covariance, there is a Gaussian measure on B(X) with some mean m ∈ X and with the covariance operator C = R. Fix m ∈ X arbitrarily. Let be the image of under mapping x → x − m + m. It follows easily from change of variable
860
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
formula and Lemma 3.6 that then is a Gaussian measure on B(X) with mean m = m and with the covariance operator C = R. We will not enter here into discussion of a rather delicate problem of description of the class of Gaussian covariances; however, we will need the next statement. Proposition 3.9. Let X be a separable Banach space, R : X ∗ → X be a Gaussian covariance and R1 : X ∗ → X be a symmetric positive mapping such that R1 R. Then R1 is a Gaussian covariance as well. Proof. In view of Lemma 3.6 the needed conclusion is a particular case of [26, Corollary 2 to Proposition 6.3.4]. Remark 3.10. A statement similar to Proposition 3.9 is derived in [25, p. 115] from the following result: (*) Let X be a separable Banach space, : X ∗ → C be characteristic functional of a probability measure given on B(X) and 1 : X ∗ → C be a positive definite functional such that |1 −
1 (x ∗ )||1 − (x ∗ )|, ∀x ∗ ∈ X∗ . Then 1 is also a characteristic functional of a probability measure given on B(X). This assertion is formulated without proof in Tortrat’s paper [22, §3, p. 4975] as a result of A. Badrikian. However, later it turned out that in general (∗) is not true; from [26, Theorem 6.2.4] and [26, Proposition 6.2.4], it follows that if for a Banach space X the assertion (∗) is true, then X is of cotype 2. Since, for example, the space X = lr , r > 2, is not of cotype 2, we get that for X = lr , r > 2, (∗) is not true. It is known that (∗) is true for Banach spaces which have Sazonov property (see [26, Theorem 6.2.4]). For example, the space X = lr , 1 r 2, has Sazonov property (see [26, Corollary to Theorem 6.2.1], and, therefore, (∗) is true for X = lr , 1 r 2. 3.6. Gaussian disintegration In this subsection we will establish the existence of a disintegration of a Gaussian measure with respect to a continuous linear mapping. Theorem 3.11. Let X, Y be real separable Banach spaces, be a Gaussian measure on B(X) with mean zero and covariance operator C : X ∗ → X. Let also : X → Y be a continuous linear operator and := be the image of under . Then there exist a Borel measurable mapping m : Y → X, a Gaussian covariance R : X ∗ → X with R C and a disintegration (qy )y∈Y of on B(X) with respect to such that for a fixed y ∈ Y , qy is Gaussian measure on B(X) with mean m(y) ∈ X and covariance operator R. Moreover: (a) If C = C ∗ : Y ∗ → Y is a finite-rank operator, then (C (Y ∗ )) = 1 and the mapping m : Y → X is a continuous linear operator with the property (m(y)) = y, ∀y ∈ C (Y ∗ ). (b) If C = C ∗ : Y ∗ → Y is not a finite-rank operator, then there exists a vector subspace Y0 ⊂ Y such that Y0 ∈ B(Y ), (Y0 ) = 1 and the restriction of the mapping m : Y → X to Y0 is a Borel measurable linear operator with the property (m(y)) = y, ∀y ∈ Y0 . Proof. Clearly, is a Gaussian measure on B(Y ) with mean m = m = 0 and covariance operator C . We consider separately three cases and show that in each of these cases conditions from Subsection 2.3 are satisfied.
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
861
Case 1: C = 0. In this case the conclusion of the theorem is satisfied with identically zero mapping m : Y → X and with the Gaussian covariance R = C . Case 2: 1 dim(C (Y ∗ ) < ∞. We have (C (Y ∗ )) = 1 by Lemma 3.3. By Lemma 3.5 we can select from Y ∗ a finite C -representing sequence yi∗ , i = 1, . . . , n and write xi∗ = ∗ yi∗ , i = 1, . . . , n. Define then mappings m : Y → X and R : X∗ → X by the equalities: n
y, yi∗ C xi∗ , ∀y ∈ Y
m(y) =
and
i=1
Rx ∗ = C x ∗ −
n
C xi∗ , x ∗ C xi∗ ,
∀x ∗ ∈ X∗ .
i=1
Clearly m : Y → X is a continuous linear and the equality (m(y)) = y, ∀y ∈ C (Y ∗ ) holds because yi∗ , i = 1, . . . , n, is a C -representing sequence. Let us see that R : X∗ → X is a Gaussian covariance with R C . In fact, define R1 : X∗ → X by the equality R1 x ∗ =
n
C xi∗ , x ∗ C xi∗ ,
∀x ∗ ∈ X∗ .
i=1
Clearly, the C -orthonormality of yi∗ , i=1, . . . , n, implies C -orthonormality of xi∗ , i=1, . . . , n. Hence by Lemma 3.4(b), R1 : X∗ → X is a symmetric positive operator and R1 C . This shows that the operator R = C − R1 is also symmetric positive and satisfies the condition R C . Therefore, by Proposition 3.9 R is a Gaussian covariance. Since R is a Gaussian covariance, according to Lemma 3.8, for every y ∈ Y we get the existence of the Gaussian measure qy on B(X) with the mean m(y) and with the covariance operator R. Now we show that the family (qy )y∈Y is a disintegration of with respect to . Fix A ∈ B(X). The function y → qy (A) is B(Y )-measurable as composition of the B(X)-measurable function x → q0 (A − x) with the continuous linear mapping m : Y → X. Consequently, condition (Dis1) is satisfied. (Dis2) is also satisfied with Y0 := C (Y ∗ ). In fact, fix y ∈ Y0 . As we have noted, (C (Y ∗ )) = 1 and the equality (m(y)) = y holds. The Gaussian measure qy ◦ −1 has the mean (m(y)) = y and the covariance operator R∗ = 0, hence (see Lemma 3.3) qy ◦ −1 ({y}) = 1 and therefore qy (−1 ({y})) = 1. Let us check now (Dis3); we must show that is equal to the mixture of (qy )y∈Y with respect to the mixing measure . Taking into account implication (ii) ⇒ (i) of Proposition 3.2 it is sufficient to prove the equality qˆy (x ∗ ) d(y), ∀x ∗ ∈ X∗ . (3.7) (x ˆ ∗) = Y
Fix x ∗ ∈ X∗ . Since
qˆy (x ∗ ) = exp{i m(y), x ∗ − we get Y
1 2
Rx ∗ , x ∗ },
1 qˆy (x ) d(y) = exp − Rx ∗ , x ∗ 2 ∗
Y
∀y ∈ Y,
exp{i m(y), x ∗ } d(y).
862
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
Clearly,
1 exp{i m(y), x ∗ } d(y) = ˆ (m∗ x ∗ ) = exp − C m∗ x ∗ , m∗ x ∗ . 2 Y
Since m∗ x ∗ = ni=1 C xi∗ , x ∗ yi∗ and yi∗ , i = 1, . . . , n, are C -orthonormal, we have C m∗ x ∗ , 2 m∗ x ∗ = ni=1 C xi∗ , x ∗ = R1 x ∗ , x ∗ . Therefore, we get 1 1 1 qˆy (x ∗ ) d(y) = exp − Rx ∗ , x ∗ exp − R1 x ∗ , x ∗ exp − C x ∗ , x ∗ 2 2 2 Y and consequently relation (3.7) is proved. Case 3: dim(C (Y ∗ ) = ∞. By Lemma 3.5 we can select from Y ∗ an infinite C -representing sequence yi∗ , i = 1, 2, . . . , and write xi∗ = ∗ yi∗ , i = 1, 2, . . . . For a fixed natural number n introduce a continuous linear mapping mn : Y → X by the equality mn (y) =
n
y, yi∗ C xi∗ ,
∀y ∈ Y.
i=1
Let Y2 := {y ∈ Y : the sequence (mn (y))n∈N converges in X} and Y3 := {y ∈ Y : limn y − n ∗ ∗ i=1 y, yi C yi Y = 0}. Introduce then a mapping m : Y → X as follows: m(y) = 0 for y ∈ Y \ Y2 and m(y) =
∞
y, yi∗
i=1
C xi∗
= lim n
n
y, yi∗ C xi∗ ,
∀y ∈ Y2 .
i=1
Define also mappings R1 , R : X ∗ → X by the equalities R1 x ∗ =
∞
C xi∗ , x ∗ C xi∗ ,
∀x ∗ ∈ X∗ , R = C − R1 .
(3.8)
i=1
Since C -orthonormality of yi∗ , i = 1, 2, . . . , implies C -orthonormality of xi∗ , i = 1, 2, . . . , by Lemma 3.4(b) the equality (3.8) defines a symmetric positive operator R1 : X ∗ → X with R1 C . This shows that the operator R = C − R1 is also symmetric positive and satisfies the condition R C . Now we will see that the conclusion of the theorem is satisfied with these m and R. First we will prove the following statement. Claim. We have (Y2 ) = 1 and (Y3 ) = 1. Proof. As above we can see that
exp{i mn (y), x ∗ } d(y) = ˆ (m∗n x ∗ ) Y 1 = exp − C m∗n x ∗ , m∗n x ∗ 2 n 1 ∗ ∗ 2 = exp − , C x i , x 2 i=1
∀n ∈ N, ∀x ∗ ∈X ∗ .
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
Hence,
lim n
Y
1 ∗ ∗ ∗ exp{i mn (y), x } d(y) exp − R1 x , x , 2
∀x ∗ ∈ X∗ .
863
(3.9)
Since R1 C , by Proposition 3.9 R1 is a Gaussian covariance. From this according to Lemma 3.8 we get the existence of mean-zero Gaussian measure 1 on B(X) with the covariance operator R1 . From this and (3.9) we get
lim exp{i mn (y), x ∗ } d(y) = ˆ1 (x ∗ ), ∀x ∗ ∈ X∗ . (3.10) n
Y
Observe now that, since is a mean-zero Gaussian measure, the C -orthonormality of yi∗ , i = 1, 2, . . . , implies that yi∗ , i = 1, 2, . . . , are independent standard Gaussian random variables on the probability space (Y, B(Y ), ). This observation and relation (3.10) according to Ito–Nisio’s theorem (see implication (c) ⇒ (a) of [26, Theorem 5.2.4]) imply (Y2 ) = 1. The equality (Y3 ) = 1 can be verified analogously and our claim is proved. Now we continue the proof of the theorem. Since R C , by Proposition 3.9 R is a Gaussian covariance. From this according to Lemma 3.8 for every y ∈ Y we get the existence of the Gaussian measure qy on B(X) with the mean m(y) and the covariance operator R. We now show that the family (qy )y∈Y is the disintegration of with respect to . (Dis1) Clearly m is a Borel measurable mapping. Hence, (Dis1) can be verified as in Case 2. (Dis2) is also satisfied with Y0 := Y2 ∩ Y3 . In fact, according to our claim we have (Y0 ) = 1. Fix y ∈ Y0 . The equality (m(y)) = y holds because on the one hand limn mn (y) = m(y) (as y ∈ Y2 ), hence limn mn (y) = m(y); on the other hand, limn mn (y) − y Y = 0 (as y ∈ Y3 ). Using this, we get that the Gaussian measure qy ◦ −1 has the mean (m(y)) = y and the covariance operator R∗ = 0, hence (see Lemma 3.3), qy ◦ −1 ({y}) = 1, therefore qy (−1 ({y})) = 1. (Dis3) Note first that according to relation limn mn (y) = m(y), ∀y ∈ Y2 , from (Y2 ) = 1 and (3.9) we get
1 ∗ ∗ ∗ exp{i m(y), x } d(y) = exp − R1 x , x (3.11) , ∀x ∗ ∈ X∗ . 2 Y Now (Dis3) can be verified using implication (ii) ⇒ (i) of Proposition 3.2 and relation (3.11) as in Case 2. Remark 3.12. (1) It follows from the uniqueness part of Theorem 2.4 that the disintegration described in Theorem 3.11 is unique. (2) (Suggested to pay attention to by one of the referees.) If in Theorem 3.11 the mapping is injective, then (X) ∈ B(Y ), there exists a vector subspace Y0 ⊂ (X) such that Y0 ∈ B(Y ), (Y0 ) = 1 and m(y) = −1 (y), ∀y ∈ Y0 ; moreover, qy = m(y) , ∀y ∈ Y0 . Corollary 3.13. Let X be a real separable Banach spaces, be a Gaussian measure on B(X) with mean zero and non-zero covariance operator C : X ∗ → X. Let also n be a natural number, xi∗ , i = 1, . . . , n be a C -orthonormal sequence and : X → Rn be the linear mapping induced by the sequence xi∗ , i = 1, . . . , n. Then = is the standard Gaussian measure on B(Rn ) and there exists a disintegration (qy )y∈Y of on B(X) with respect to such that for a fixed y = (y1 , . . . , yn ) ∈ Rn , qy is Gaussian measure on B(X) with mean m(y) =
864
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
n
C xi∗ ∈ X and covariance operator i=1 yi ∗ C x − ni=1 C xi∗ , x ∗ C xi∗ , ∀x ∗ ∈ X∗ .
R : X ∗ → X defined by the equality Rx ∗ =
Remark 3.14. (1) Corollary 3.13 was obtained earlier in [12]; this result is presented also in [24, Appendix, Lemma 2.9.6] and in [20, Theorem 3.4.1]. One of the key points of the proof in [12] is Proposition 3.9, which was derived there from the statement which later turned out to be not correct for the general case (see Remark 3.10). (2) Note finally that the conclusion of Corollary 3.13 remains valid also when is a Gaussian Radon measure in a Hausdorff locally convex space X and : X → Rn is a -measurable linear mapping induced by a finite sequence of -measurable and -orthonormal linear functionals [2, Proposition 6.11.4]. 4. The existence of average-case optimal algorithms 4.1. IBC-formulations Let us describe briefly the best approximation problem in terms of the theory of IBC as it is presented in [24]. Let X, Y be non-empty sets, G a (real or complex) normed space, S : X → G, : X → Y be mappings and be a non-empty set of mappings : Y → G. Let us agree to call S the solution operator, the information operator, the set of admissible algorithms. Moreover, fix a mapping e : GX × GX → [0, ∞] and call it the error criterion. Problem. Compute an approximation of S by means of the given information and the given algorithms ∈ in such a way that to make the error e(S, ◦ ) as small as possible. An algorithm 0 ∈ which achieves the smallest possible error (whenever it exists) will be called the optimal algorithm. Traditionally as an error criterion the functional e∞ is chosen defined by the equality e∞ (S, T ) = sup Sx − T x G ,
S, T ∈ GX .
x∈X
For a given ∈ the quantity e∞ (S, ◦ ) is called the worst-case error. An algorithm 0 ∈ which achieves the smallest possible worst-case error (whenever it exists) will be called the worstcase optimal algorithm. We refer to [24] for the justification of such a terminology and illustrating examples. To introduce a different error criterion, let us assume further that the set X is endowed by a -algebra A on which a probability measure is given, the set Y is endowed by a -algebra F, the solution operator S : X → G belongs to L2 (X, A, ; G), the information operator : X → Y is (A, F)-measurable and : F → [0, 1] is the distribution of associated with . The set of admissible algorithms is contained in L2 (Y, F, ; G). As an error criterion let us choose the functional e2, defined by the equality
1/2 e2, (S, T ) =
Sx − T x 2G d(x) , S, T ∈ L2 (X, A, ; G). X
For a given ∈ we have that ◦ ∈ L2 (X, A, ; G), hence the quantity e2, (S, ◦ ) is well defined and it is called the average-case error. An algorithm 0 ∈ which achieves the
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
865
smallest possible average-case error (whenever it exists) will be called the average-case optimal algorithm. 4.2. Average-case optimal algorithms via disintegration In this subsection we shall see that by using disintegration it is possible to prove the existence and, at the same time, to find an explicit form of the average-case optimal algorithm. Proposition 4.1. Let X, G, Y be separable Banach spaces and be a mean-zero Gaussian measure on B(X). Let, moreover, S : X → G be a continuous linear solution operator; : X → Y be a continuous linear information operator; (qy )y∈Y be the Gaussian disintegration of with respect to and finally m : Y → X be the mapping from Theorem 3.11. Then 0 = S ◦ m : Y → G is an average-case optimal algorithm for S and . Proof. As it is well known every Gaussian measure in a separable Banach space is of strong order 2. We have S ∈ L2 (X, B(X), ; G) because S is continuous linear and so ◦ S −1 is a Gaussian measure on B(G). Since qy , y ∈ Y are also Gaussian measures we also have
Sx 2G dqy (x) < ∞, ∀y ∈ Y. X
From the last relation and = q by Lemma 2.3 we can conclude that 0 ∈ L2 (Y, F, ; G). Fix arbitrarily y ∈ Y . Since the Gaussian measure qy has mean m(y) and S is a continuous linear operator, we have 0 (y) = Sm(y) = S(x) dqy (x). X
From the last equality, since the Gaussian measure qy ◦ S −1 is symmetric with respect to its mean 0 (y), we get 2
S(x) − (y) G dqy (x) S(x)−0 (y) 2G dqy (x), ∀∈L2 (Y, F, ; G). (4.1) X
X
−1 ({y})
for y ∈ Y . By property (Dis2) we can find Y0 ∈ F with (Y0 ) = 1 such Let Xy := that qy (Xy ) = 1, ∀y ∈ Y0 . Fix arbitrarily y ∈ Y0 and ∈ L2 (Y, F, ; G). Using inequality (4.1) and Lemma 2.3 we obtain 2 2
S(x) − ((x)) G d(x) =
S(x) − ((x)) G dqy (x) d (y) X
Y0
Xy
× Y0
Xy
Y0
=
X
and the proof is finished.
Xy
S(x) − (y) 2G dqy (x) d (y)
S(x) − 0 (y) 2G dqy (x)
S(x) − 0 ((x)) 2G d(x)
d (y)
866
V. Tarieladze, N. Vakhania / Journal of Complexity 23 (2007) 851 – 866
Remark 4.2. Proposition 4.1 for the case Y = Rn and : X → Rn is a linear mapping induced by some C -orthonormal sequence xi∗ , i = 1, . . . , n, was obtained earlier in [24]. Acknowledgments We are grateful to the referees for their valuable remarks and suggestions. References [1] S.K. Berberian, A note on the disintegration of measures, Proc. Amer. Math. Soc. 71 (1) (1978) 115–116. [2] V.I. Bogachev, Gaussian measures, Mathematical Surveys and Monographs, vol. 62, American Mathematical Society, Providence, Rhode Island, 1998, xi, 433pp. [3] N. Bourbaki, Integration Vectorielle, Hermann, Paris, 1959, 105pp. Chapter VI. [4] S.D. Chaterji, Disintegration of measures and lifting. Vector and operator valued measures and applications, Proceedings of a symposium on vector and operator valued measures and applications, held at Snowbird Resort, Alta, Utah, August 7–12, 1972, Academic Press, New York, 1973, pp. 69–83. [5] Iu.A. Davydov, M.A. Lifshits, N.V. Smorodina, Local properties of distributions of stochastic functionals, translated from the 1995 Russian original by V. E. Naza˘ıkinski˘ı and M.A. Shishkova, Translations of Mathematical Monographs, vol. 173, American Mathematical Society, Providence, RI, 1998, xiv+184pp. ISBN 0-8218-0584-3. [6] G.A. Edgar, Disintegration of measures and the vector-valued Radon–Nikodým theorem, Duke Math. J. 42 (3) (1975) 447–450. [7] A.M. Faden, The existence of regular conditional probabilities: necessary and sufficient conditions, Ann. Probab. 13 (1985) 288–298. [8] S. Graf, L.D. Mauldin, A classification of disintegrations of measures, Measure and Measurable Dynamics (Rochester, NY, 1987), Contemporary Mathematics, vol. 94, American Mathemtical Society, Providence, RI, 1989, pp. 147–158. [9] H. Helson, Disintegration of measures, Harmonic Analysis and Hypergroups (Delhi, 1995), Trends in Mathematics, Birkhäuser Boston, Boston, MA, 1998, pp. 47–50. [10] J. Hoffmann-Jorgensen, The Theory of Analytic Spaces, Aarhus Universitet, Matematisk Institut, Various Publication Series, vol. 10, June, 1970, 314pp. [11] J. Hoffmann-Jorgensen, Existence of conditional probabilities, Math. Scand. 28 (1971) 257–264. [12] D. Lee, G.W. Wasilkowski, Approximation of linear functionals on a Banach space with a Gaussian measure, J. Complexity 2 (1) (1986) 12–43. [13] D. Maharam, Strict disintegration of measures, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 32 (1975) 73–79. [14] K. Musial, Existence of proper regular conditional probabilities, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 22 (1972) 8–12. [15] J.V. Neumann, Zur Operatoremethode In Der Klassischen Mechanik, Ann. Math. 33 (3) (1933) 587–682. [16] J. Neveu, Bases matematiques du calcul des probabilites, Masson et Cie, Paris, 1964 (Russian transl. Translated and annotated by V.V. Sazonov. Mir, Moscow, 1969, 309pp). [17] J.K. Pachl, Disintegration and compact measures, Math. Scand. 43 (1) (1978/1979) 157–168. [18] K.R. Parthasarathy, Probability Measures on Metric Spaces, Academic Press, New York, London, 1967. [19] K.R. Parthasarathy, Introduction to Probability and Measure, Springer, New York, 1978 xii+312pp. ISBN 0-38791135-9 (Russian translation: Moscow, Mir, 1983, 344pp.). [20] L. Plaskota, Noisy Information and Computational Complexity, Cambridge University Press, Cambridge, 1996 xii+308pp. [21] M.M. Rao, Conditional measures and applications, Marcel Dekker, New York, 1993 xiv+417pp. [22] A. Tortrat, Lois indefinement divisibles ( ∈ I ) dans un group topologique abelian metrisable X. 
Cas des espaces vectoriels, C. R. Acad. Sci. Paris 261 (1965) 4973–4975. [23] A. Tortrat, Désintégration d’une probabilité, statistiques exhaustives, Séminaire de Probabilités, XI (University of Strasbourg, Strasbourg, 1975/1976), Lecture Notes in Mathematics, vol. 581, Springer, Berlin, 1977, pp. 539–565, (in French). [24] J.F. Traub, G.W. Wasilkowski, H. Wozniakowski, Information-based complexity, with contributions by A. G. Werschultz and T. Boult, Computer Science and Scientific Computing, Academic Press, Boston, MA, 1988, xiv+523pp. [25] N.N. Vakhania, Probability Distributions on Linear Spaces, North-Holland, Amsterdam, 1981. [26] N.N. Vakhania, V.I. Tarieladze, S.A. Chobanyan, Probability Distributions on Banach Spaces, Reidel, Dordrecht, 1987.
Journal of Complexity 23 (2007) 867 – 889 www.elsevier.com/locate/jco
Free-knot spline approximation of stochastic processes Jakob Creutziga , Thomas Müller-Gronbachb , Klaus Rittera,∗ a Fachbereich Mathematik, Technische Universität Darmstadt, SchloYgartenstraYe 7, 64289 Darmstadt, Germany b Fakultät für Mathematik und Informatik, FernUniversität Hagen, LützowstraYe 125, 58084 Hagen, Germany
Received 8 December 2006; accepted 26 May 2007 Available online 22 June 2007 Dedicated to Henryk Wo´zniakowski on the occasion of his 60th birthday
Abstract We study optimal approximation of stochastic processes by polynomial splines with free knots. The number of free knots is either a priori fixed or may depend on the particular trajectory. For the s-fold integrated Wiener process as well as for scalar diffusion processes we determine the asymptotic behavior of the average Lp -distance to the splines spaces, as the (expected) number of free knots tends to infinity. © 2007 Elsevier Inc. All rights reserved. Keywords: Integrated Wiener process; Diffusion process; Stochastic differential equation; Optimal spline approximation; Free knots
1. Introduction Consider a stochastic process X = (X(t))t 0 with continuous paths on a probability space (, A, P ). We study optimal approximation of X on the unit interval by polynomial splines with free knots, which has first been treated in [11]. For k ∈ N and r ∈ N0 we let r denote the set of polynomials of degree at most r, and we consider the space k,r of polynomial splines =
k j =1
1]tj −1 ,tj ] · j ,
where 0 = t0 < · · · < tk = 1 and 1 , . . . , k ∈ r . Furthermore, we let Nk,r denote the class of ∗ Corresponding author.
E-mail address: [email protected] (K. Ritter). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.05.003
868
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
mappings : → k,r , X and for 1p ∞ and 1 q < ∞ we define 1/q q ∈ Nk,r . ek,r (X, Lp , q) = inf E∗ X − X :X Lp [0,1] Here we use the outer expectation value E∗ in order to avoid cumbersome measurability considerations. The reader is referred to [21] for a detailed study of the outer integral and expectation. Note that ek,r (X, Lp , q) is the q-average Lp -distance of the process X to the spline space k,r . A natural extension of this methodology is not to work with an a priori chosen number of free knots, but only to control the average number of knots needed. This leads to the definition r = ∞ k=1 k,r and to the study of the class Nr of mappings : → r . X ∈ Nr we define For a spline approximation method X = E∗ (min{k ∈ N : X(·) ∈ k,r }), (X) − 1 is the expected number of free knots used by X. Subject to the bound (X)k, i.e., (X) the minimal achievable error for approximation of X in the class Nr is given by 1/q av q ∈ Nr , (X) k . (X, Lp , q) = inf E∗ X − X :X ek,r Lp [0,1] av as k tends to infinity. We shall study the asymptotics of the quantities ek,r and ek,r The spline spaces k,r form nonlinear manifolds that consist of k-term linear combinations of functions of the form 1]t,1] · with 0 t < 1 and ∈ r . We refer to [7, Section 6] for a detailed treatment in the context of nonlinear approximation. Hence we are addressing a so-called nonlinear approximation problem. While nonlinear approximation is extensively studied for deterministic functions, see [7] for a survey, much less is known for stochastic processes, i.e., for random functions. Here we refer to [2,3], where wavelet methods are analyzed, and to [11]. In the latter paper nonlinear approximation is related to approximation based on partial information, as studied in information-based complexity, and spline approximation with free knots is analyzed as a particular instance.
2. Main results For two sequences (ak )k∈N and (bk )k∈N of positive real numbers we write ak ≈ bk if limk→∞ ak /bk = 1, and ak bk if lim inf k→∞ ak /bk 1. Additionally, ak bk means c1 ak /bk c2 for all k ∈ N and some positive constants ci . Fix s ∈ N0 and let W (s) denote an s-fold integrated Wiener process. In [11], the following result was proved. Theorem 1. For r ∈ N0 with r s, av ek,r (W (s) , L∞ , 1) ek,r (W (s) , L∞ , 1) k −(s+1/2) .
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
869
Our first result refines and extends this theorem. Consider the stopping time r,s,p = inf t > 0 : inf W (s) − Lp [0,t] > 1 , ∈r
which yields the length of the maximal subinterval [0, r,s,p ] that permits best approximation of W (s) from r with error at most one. We have 0 < E r,s,p < ∞, see (14), and we put =s+
1 2
+ 1/p
as well as cr,s,p = (E r,s,p )− and bs,p = (s + 21 )s+1/2 · p −1/p · − , where, for p = ∞, we use the convention ∞0 = 1. Theorem 2. Let r ∈ N0 with r s and 1 q < ∞. Then, for p = ∞, av ek,r (W (s) , L∞ , q) ≈ ek,r (W (s) , L∞ , q) ≈ cr,s,∞ · k −(s+1/2) .
(1)
Furthermore, for 1 p < ∞, bs,p · cr,s,p · k −(s+1/2) ek,r (W (s) , Lp , q)cr,s,p · k −(s+1/2)
(2)
av ek,r (W (s) , Lp , q) k −(s+1/2) .
(3)
and
Note that the bounds provided by (1) and (2) do not depend on the averaging parameter q. Furthermore, lim bs,p = 1
p→∞
for every s ∈ N, but lim bs,p = 0
s→∞
for every 1 p < ∞. We conjecture that the upper bound in (2) is sharp. (p) ∈ Nk,r that achieve the upper bounds in (1) We have an explicit construction of methods X k and (2), i.e., 1/q ∗ (p) q ≈ cr,s,p · k −(s+1/2) , (4) E W (s) − X k Lp [0,1] see (10) and (21). Moreover, these methods a.s. satisfy Lp [0,1] ≈ cr,s,p · k −(s+1/2) W (s) − X k (p)
(5)
as well, while k Lp [0,1] bs,p · cr,s,p · k −(s+1/2) W (s) − X
(6)
870
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
k ∈ Nk,r . Note that the right-hand sides in (5) holds a.s. for every sequence of approximations X (s) and (6) do not depend on the specific path of W , i.e., on ∈ . Our second result deals with approximation of a scalar diffusion process given by the stochastic differential equation dX(t) = a(X(t)) dt + b(X(t)) dW (t), X(0) = x0 .
t 0, (7)
Here x0 ∈ R, and W denotes a one-dimensional Wiener process. Moreover, we assume that the functions a, b : R → R satisfy (A1) a is Lipschitz continuous. (A2) b is differentiable with a bounded derivative. (A3) b(x0 ) = 0. Theorem 3. Let r ∈ N0 , 1 q < ∞, and 1 p ∞. Then av ek,r (X, Lp , q) ek,r (X, Lp , q) k −1/2
holds for the strong solution X of Eq. (7). For a diffusion process X piecewise linear interpolation with free knots is frequently used in connection with adaptive step-size control. Theorem 3 provides a lower bound for the Lp -error of any such numerical algorithm, no matter whether just Wiener increments or, e.g., arbitrary multiple Itô-integrals are used. Under slightly stronger conditions on the diffusion coefficient b, error estimates in [9,17] lead to refined upper bounds in Theorem 3 for the case 1 p < ∞, as follows. Put 1/p2 p (p1 , p2 ) = E b ◦ XL2p [0,1] 1
for 1 p1 , p2 < ∞. Furthermore, let B denote a Brownian bridge on [0, 1] and define 1/p p (p) = E BLp [0,1] . Then ek,1 (X, Lp , p) (p) · (2p/(p + 2), p) · k −1/2 and av ek,1 (X, Lp , p) (p) · (2p/(p + 2), 2p/(p + 2)) · k −1/2 .
We add that these upper bounds are achieved by piecewise linear interpolation of modified Milstein schemes with adaptive step-size control for the Wiener increments. In the case p = ∞ it is interesting to compare the results on free-knot spline approximation with average k-widths of X. The latter quantities are defined by
1/q q dk (X, Lp , q) = inf E inf X − Lp [0,1] ,
∈
where the infimum is taken over all linear subspaces ⊆ Lp [0, 1] of dimension at most k. For X = W (s) as well as in the diffusion case we have dk (X, L∞ , q) k −(s+1/2) ,
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
871
see [4,14–16,6]. Almost optimal linear subspaces are not known explicitly, since the proof of the upper bound for dk (X, L∞ , q) is non-constructive. We add that in the case of an s-fold integrated Wiener process piecewise polynomial interpolation of W (s) at equidistant knots i/k only yields errors of order (ln k)1/2 · k −(s+1/2) , see [20] for results and references. Similarly, in the diffusion k ∈ Nr that are only based on pointwise evaluation of W and satisfy (X k ) k case, methods X 1/2 −1/2 , see [18]. can at most achieve errors of order (ln k) · k The rest of the paper is organized as follows. In the next section, some auxiliary results about approximation of a fixed function by piecewise polynomial splines are established. In Section 4, this is used to prove Theorem 2, as well as Eqs. (4)–(6). Section 5 is devoted to the proof of Theorem 3. In the Appendix, we prove an auxiliary result about convergence of negative moments of means and a small deviation result, which controls the probability that a path of W (s) stays close to the space r . 3. Approximation of deterministic functions Let r ∈ N0 and 1 p ∞ be fixed. We introduce error measures, which allow to determine suitable free knots for spline approximation. For f ∈ C [0, ∞[ and 0 u < v we put
[u,v] (f ) = inf f − Lp [u,v] . ∈r
Furthermore, for ε > 0, we put 0,ε (f ) = 0, and we define j,ε (f ) = inf{t > j −1,ε (f ) : [j −1,ε (f ),t] (f ) > ε} for j 1. Here inf ∅ = ∞, as usual. Put Ij (f ) = {ε > 0 : j,ε (f ) < ∞}. Lemma 4. Let j ∈ N. (i) If ε ∈ Ij (f ) then
[j −1,ε (f ),j,ε (f )] (f ) = ε. (ii) The set Ij (f ) is an interval, and the mapping ε → j,ε (f ) is strictly increasing and rightcontinuous on Ij (f ). Furthermore, j,ε (f ) > j −1,ε (f ) if ε ∈ Ij −1 (f ), and limε→∞ j,ε (f ) = ∞. (iii) If v → [u,v] (f ) is strictly increasing for every u 0, then ε → j,ε (f ) is continuous on Ij (f ). Proof. First we show that the mapping (u, v) → [u,v] (f ) is continuous. Put J1 = [u/2, u + (v − u)/3] as well as J2 = [v − (v − u)/3, 2v]. Moreover, let (t) = ri=0 i · t i for ∈ Rr+1 , and define a norm on Rr+1 by = Lp [u+(v−u)/3,v−(v−u)/3] . If (x, y) ∈ J1 × J2 and f − Lp [x,y] = [x,y] (f ) then Lp [x,y] [u/2,2v] (f ) + f Lp [u/2,2v] .
872
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
Hence there exists a compact set K ⊆ Rr+1 such that
[x,y] (f ) = inf f − Lp [x,y] ∈K
for every (x, y) ∈ J1 × J2 . Since (x, y, ) → f − Lp [x,y] defines a continuous mapping on J1 × J2 × K, we conclude that (x, y) → inf ∈K f − Lp [x,y] is continuous, too, on J1 × J2 . Continuity and monotonicity of v → [u,v] (f ) immediately imply (i). The monotonicity stated in (ii) will be verified inductively. Let 0 < ε1 < ε2 with ε2 ∈ Ij (f ), and suppose that j −1,ε1 (f )j −1,ε2 (f ). Note that the latter holds true by definition for j = 1. From (i) we get
[j −1,ε1 (f ),j,ε2 (f )] (f ) [j −1,ε2 (f ),j,ε2 (f )] (f ) = ε2 . This implies j,ε1 (f )j,ε2 (f ), and (i) excludes equality to hold here. Since [u,v] (f )f Lp [u,v] , the mappings ε →j,ε (f ) are unbounded and j,ε (f )>j −1,ε (f ) if ε ∈ Ij −1 (f ). For the proof of the continuity properties stated in (ii) and (iii) we also proceed inductively, and we use (i) and the monotonicity from (ii). Consider a sequence (εn )n∈N in Ij (f ), which converges monotonically to ε ∈ Ij (f ), and put t = limn→∞ j,εn (f ). Assume that limn→∞ j −1,εn (f ) = j −1,ε (f ), which obviously holds true for j = 1. Continuity of (u, v) → [u,v] (f ) and (i) imply [j −1,ε (f ),t] (f ) = ε, so that t j,ε (f ). For a decreasing sequence (εn )n∈N we also have j,ε (f ) t. For an increasing sequence (εn )n∈N we use the strict monotonicity of v → [u,v] (f ) to derive t = j,ε (f ). Let F denote the class of functions f ∈ C [0, ∞[ that satisfy j,ε (f ) < ∞
(8)
for every j ∈ N and ε > 0 as well as lim j,ε (f ) = 0
(9)
ε→0
for every j ∈ N. Let k ∈ N. We now present an almost optimal spline approximation method of degree r with k − 1 free knots for functions f ∈ F . Put k (f ) = inf{ε > 0 : k,ε (f )1} and note that (9) together with Lemma 4(ii) implies k (f ) ∈ ]0, ∞[. Let j = j,k (f ) (f ) for j = 0, . . . , k and define (p)
k (f ) =
k j =1
1]j −1 ,j ] · argmin f − Lp [j −1 ,j ] . ∈r
(10)
Note that Lemma 4 guarantees (p)
f − k (f )Lp [j −1 ,j ] = k (f )
(11)
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
873
for j = 1, . . . , k and k 1.
(12) (p)
The spline k (f )|[0,1] ∈ k,r enjoys the following optimality properties. Proposition 5. Let k ∈ N and f ∈ F . (i) For 1p ∞, (p)
f − k (f )Lp [0,1] k 1/p · k (f ). (ii) For p = ∞ and every ∈ k,r , f − L∞ [0,1] k (f ). (iii) For 1p < ∞, every ∈ k,r , and every m ∈ N with m > k, f − Lp [0,1] (m − k + 1)1/p · m (f ). Proof. For p < ∞, (p)
p
f − k (f )Lp [0,1]
k j =1
(p)
p
f − k (f )Lp [j −1 ,j ] = k · (k (f ))p
follows from (11) and (12). For p = ∞, (i) is verified analogously. Consider a polynomial spline ∈ k,r and let 0 = t0 < · · · < tk = 1 denote the corresponding knots. Furthermore, let ∈ ]0, 1[. For the proof of (ii) we put
j = j, ·k (f ) (f ) for j = 0, . . . , k. Then k < 1, which implies [ j −1 , j ] ⊆ [tj −1 , tj ] for some j ∈ {1, . . . , k}. Consequently, by Lemma 4, f − L∞ [0,1] f − L∞ [ j −1 , j ] inf f − L∞ [ j −1 , j ] = · k (f ). ∈r
For the proof of (iii) we define
= , ·m (f ) (f ) for = 0, . . . , m. Then m < 1, which implies [ i −1 , i ] ⊆ [tji −1 , tji ] for some indices 1j1 · · · jm−k+1 k and 1 1 < · · · < m−k+1 m. Hence, by Lemma 4, f
p − Lp [0,1]
m−k+1 i=1
p
inf f − Lp [ −1 , ] = (m − k + 1) · p · (m (f ))p .
∈r
i
i
for 1p < ∞. Letting tend to one completes the proof.
874
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
4. Approximation of integrated Wiener processes Let W denote a Wiener process and consider the s-fold integrated Wiener processes W (s) defined by W (0) = W and t W (s) (t) = W (s−1) (u) du 0
for t 0 and s ∈ N. We briefly discuss some properties of W (s) that will be important in the sequel. The scaling property of the Wiener process implies that for every > 0 the process ( −(s+1/2) · (s) W ( · t))t 0 is an s-fold integrated Wiener process, too. This fact will be called the scaling property of W (s) . While W (s) has no longer independent increments for s 1, the influence of the past is very explicit. For z > 0 we define z W (s) inductively by zW
(0)
zW
(s)
(t) = W (t + z) − W (z)
and (t) =
t zW
(s−1)
(u) du.
0
Then it is easy to check that W (s) (t + z) =
s t i (s−i) (z) + z W (s) (t). W i!
(13)
i=0
Consider the filtration generated by W, which coincides with the filtration generated by W (s) , and let denote a stopping time with P ( < ∞) = 1. Then the strong Markov property of W implies that the process W
(s)
= ( W (s) (t))t 0
is an s-fold integrated Wiener process, too. Moreover, the processes W (s) and (1[0,] (t)·W (t))t 0 are independent, and consequently, the processes W (s) and (1[0,] (t)·W (s) (t))t 0 are independent as well. These facts will be called the strong Markov property of W (s) . Fix s ∈ N0 . In the sequel we assume that r s. For any fixed ε > 0 we consider the sequence of stopping times j,ε (W (s) ), which turn out to be finite a.s., see (14), and therefore are strictly increasing, see Lemma 4. Moreover, for j ∈ N, we define j,ε = j,ε (W (s) ) − j −1,ε (W (s) ). These random variables yield the lengths of consecutive maximal subintervals that permit best approximation from the space r with error at most ε. Recall that F ⊆ C [0, ∞[ is defined via properties (8) and (9) and that = s + 21 + 1/p. In the case s = 0 and r = 1 the analogous construction with interpolation instead of best approximation has already been used for the study of rates of convergence in the functional law of the iterated logarithm, see [8].
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
875
Lemma 6. The s-fold integrated Wiener process W (s) satisfies P (W (s) ∈ F ) = 1. For every ε > 0 and m ∈ N the random variables j,ε form an i.i.d. sequence with 1,ε = ε1/ · 1,1 d
and E (m 1,1 ) < ∞.
Proof. We claim that E (j,ε (W (s) )) < ∞
(14)
for every j ∈ N. For the case j = 1 let Z = [0,1] (W (s) ) and note that
[0,t] (W (s) ) = t · Z d
follows for t > 0 from the scaling property of W (s) . Hence we have P (1,ε (W (s) ) < t) = P ( [0,t] (W (s) ) > ε) = P (Z > ε · t − ),
(15)
which, in particular, yields 1,ε (W (s) ) = ε1/ · 1,1 (W (s) ). d
(16)
According to Corollary 17, there exists a constant c > 0 such that P (Z ) exp(−c · −1/(s+1/2) ) holds for every ∈ ]0, 1]. We conclude that P (1,1 (W s) ) > t) exp(−c · t) (s) if t 1, which implies E (m 1,1 (W )) < ∞ for every m ∈ N. Next, let j 2, put = j −1,ε (W (s) ) and = j,ε (W (s) ), and assume that E (m ) < ∞. From representation (13) and the fact that r s we derive
[, ] (W (s) ) = [0, −] ( W (s) ), and hence it follows that = + 1,ε ( W (s) ).
(17)
We have E ((1,ε ( W (s) ))m ) < ∞, since W (s) is an s-fold integrated Wiener process again, and consequently E (( )m ) < ∞. We turn to the properties of the sequence j,ε . Due to (16) and (17) we have j,ε = 1,ε ( W (s) ) = 1,ε (W (s) ) = ε1/ · 1,1 . d
d
Furthermore, j,ε and (1[0,] (t) · W (s) (t))t 0 are independent because of the strong Markov property of W (s) , and therefore j,ε and (1,ε , . . . , j −1,ε ) are independent as well.
876
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
It remains to show that the trajectories of W (s) a.s. satisfy (9). By the properties of the sequence j,ε we have j,ε (W (s) ) = ε1/ · j,1 (W (s) ). d
(18)
Observing (14) we conclude that
(s) P lim j,ε (W ) t = lim P (j,ε (W (s) ) t) ε→0
ε→0
= lim P (j,1 (W (s) ) t/ε 1/ ) = 0 ε→0
for every t > 0, which completes the proof.
Because of Lemma 6, Proposition 5 yields upper and lower bounds for the error of spline approximation of W (s) in terms of the random variable Vk = k (W (s) ). Remark 7. Note that W (s) a.s. satisfies W (s) |[u,v] ∈ r for all 0 u < v. Assume that p < ∞. Then v → [u,v] (W (s) ) is a.s. strictly increasing for all u0. We use Lemma 4(iii) and Lemma 6 to conclude that, with probability one, Vk is the unique solution of k,Vk (W (s) ) = 1. Consequently, due to (11), we a.s. have equality in Proposition 5(i) for 1 p < ∞, too. Note that with positive probability solutions ε of the equation k,ε (W (s) ) = 1 fail to exist in the case p = ∞. To complete the analysis of spline approximation methods we study the asymptotic behavior of the sequence Vk . Lemma 8. For every 1 q < ∞, q 1/q E Vk ≈ (k · E (1,1 ))− . Furthermore, with probability one, Vk ≈ (k · E (1,1 ))− . Proof. Put Sk = 1/k ·
k
j,1
j =1
and use (18) to obtain −
P (Vk ε) = P (k,ε (W (s) ) 1) = P (k − · Sk ε). Therefore −q
E (Vk ) = k −q · E (Sk q
),
(19)
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
877
and for the first statement it remains to show that −q
E (Sk
) ≈ (E (1,1 ))−q .
The latter fact follows from Proposition 15, if we can verify that 1,1 has a proper lower tail behavior (29). To this end we use (15) and the large deviation estimate (33) to obtain P (1,1 < ) = P ( [0,1] (W (s) ) > − ) P (W (s) Lp [0,1] > − ) exp(−c · −2 ) with some constant c > 0 for all 1. In order to prove the second statement, put Sk∗ = (k · 2 )−1/2 ·
k
(j,1 − ),
j =1
where = E (1,1 ) and 2 denotes the variance of 1,1 . Let > 1. Then P (Vk > · (k · )− ) = P (Sk < −1/ · ) = P (Sk∗ < k 1/2 · ) with = ( −1/ − 1)/ · < 0, due to (19). We apply a local version of the central limit theorem, which holds for i.i.d. sequences with a finite third moment, see [19, Theorem V.14], to obtain P (Vk > · (k · )− ) c1 · k
−1/2
· (1 + k
1/2
· | |)
−3
+ (2)
c2 · k −2
−1/2
·
k 1/2 · −∞
exp(−u2 /2) du
with constants ci > 0. For every < 1 we get P (Vk < · (k · )− ) c2 · k −2
(20)
in the same way. It remains to apply the Borel–Cantelli Lemma.
4.1. Proof of (4), (5), and the upper bounds in (1), (2), (3) Consider the methods (p) = (p) (W (s) ) ∈ Nk,r . X k k
(21)
Observe Remark 7 and use Proposition 5(i) as well as Lemma 6 to obtain Lp [0,1] = k 1/p · Vk W (s) − X k (p)
a.s.
Now, apply Lemma 8 to obtain (4) and (5). Clearly, (4) implies the upper bounds in (1), (2), and (3).
878
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
4.2. Proof of (6) and the lower bound in (2) k ∈ Nk,r and put Consider an arbitrary sequence of approximations X mk = /(s + 21 ) · k. Use Lemma 6, and apply Proposition 5(ii) in the case p = ∞ and Proposition 5(iii) in the case p < ∞ to obtain k Lp [0,1] (mk − k + 1)1/p · Vm W (s) − X k
a.s.
Clearly, mk ≈ /(s + 1/2) · k. Hence, by Lemma 8, q 1/q (mk − k)1/p · Vmk ≈ (mk − k)1/p · E Vmk ≈ k −(s+1/2) · p −1/p · − · (s + 21 )s+1/2 · (E (1,1 ))− with probability one, which implies (6) and the lower bound in (2). 4.3. Proof of the lower bound in (1) k ∈ Nr such that (X k ) k, i.e., Let k ∈ N and consider X
∞ ∗ · 1B k E
(22)
=1
∈ ,r \ −1,r , where 0,r = ∅. By Proposition 5(ii) and Lemma 6, for B = X(·)
∞ q ∗ k q E 1 · V E∗ W (s) − X B . L [0,1] ∞
=1
For ∈ ]0, 1[, = E (1,1 ), and L ∈ N we define A = V > · ( · )− , and CL =
L
B .
=1
Since (f )+1 (f ) for f ∈ F , we obtain ∞ L ∞ q q q 1B · V 1B · VL + 1B · V =1
=1
=L+1 ∞ q 1B ∩AL · VL + =1 =L+1
L
q
1B ∩A · V
∞
q −q · L−q · 1CL ∩AL + −q · 1B ∩A
l=L+1
q −q · L−q · (1CL − 1AcL ) +
∞ l=L+1
−q · (1B − 1Ac )
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
879
with probability one, which implies
∞ ∞ q −q q ∗ ∗ −q −q 1B · V E L · 1CL + · 1B ·E =1
l=L+1 ∞
−E L−q · 1AcL +
−q · 1Ac .
l=L+1
From (20) we infer that P (Ac ) c1 · −2 with a constant c1 > 0. Hence there exists a constant c2 > 0 such that (L) = E
∗
L
−q
∞
· 1CL +
−q
· 1B − c2 · L−q−1
l=L+1
satisfies k q −q q · E∗ W (s) − X L
∞ [0,1]
(L)
(23)
for every L ∈ N. Put = (1 + 2q)/(2 + 2q), and take L(k) ∈ [k − 1, k ]. We claim that there exists a constant c3 > 0 such that
1+q k q · (L(k)) 1 − k −(1−)q − c3 · k −1/2 .
(24)
First, assume that the outer probability of CL satisfies P ∗ (CL ) k −(1−)q . Then
k q · (L(k)) k q · k −q · P ∗ (CL ) − c2 · (k − 1)−q−1 1 − c3 · k −1/2 with a constant c3 > 0. Next, assume P ∗ (CL ) < k −(1−)q and use (22) to derive
∞ −(1−)q ∗ c ∗ P (CL ) = E 1B 1−k = E∗
∞
l=L+1
( · 1B )q/(1+q) · (−q · 1B )1/(1+q)
l=L+1
∗
E
∞
q/(1+q) 1/(1+q) ∞ −q · 1B · · 1B
l=L+1
l=L+1
q/(1+q) 1/(1+q) ∞ ∞ E∗ · 1B · E∗ −q · 1B l=L+1
l=L+1
1/(1+q)
∞ q/(1+q) ∗ −q k · E · 1B . l=L+1
880
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
Consequently,
∞ −q · 1B − c2 · (k − 1)−q−1 k q · (L(k)) k q · E∗
1−k
=L+1
1+q −(1−)q
− c3 · k −1/2 ,
which completes the proof of (24). By (23) and (24), k q E∗ W (s) − X q −q · k −q L [0,1] ∞
for every ∈ ]0, 1[. 4.4. Proof of the lower bound in (3) av (W (s) , L , 1). For further use, Clearly it suffices to establish the lower bound claimed for ek,r 1 we shall prove a more general result.
Lemma 9. For every s ∈ N there exists a constant c > 0 with the following property. For every ∈ Nr , every A ∈ A with P (A) 4 , and every t ∈ ]0, 1] we have X 5
∗ (s) L [0,t] c · t s+3/2 · ((X)) −(s+1/2) . E 1A · W − X 1 Proof. Because of the scaling property of W (s) it suffices to study the particular case t = 1. < ∞ and put k = (X) as well as Assume that (X) ∈ 2k,r }. B = {X Then ∗ k (X)E ((2k + 1) · 1B c ) = (2k + 1) · P ∗ (B c ),
which implies P ∗ (B) 21 . Due to Lemma 6 and Proposition 5(iii), L [0,1] 1B · 2k · V4k 1B · W (s) − X 1
a.s.
Put = E (1,1 ), choose 0 < c < (2)− , and define Dk = {Vk > c · k − }. By (19) we obtain P (Dk ) = P (Sk c−1/ ) P (Sk 2). Hence lim P (Dk ) = 1
k→∞
due to the law of large numbers, and consequently P ∗ (B ∩ Dk ) 25 if k is sufficiently large, say k k0 . We conclude that L [0,1] 1A∩B∩D · c · 21−2 · k −(s+1/2) 1A∩B∩D4k · W (s) − X 1 4k and
P ∗ (A ∩ B
a.s.
∩ D4k ) 1/5 if 4k k0 . Take outer expectations to complete the proof.
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
881
Lemma 9 with A = and t = 1 yields the lower bound in (3) 5. Approximation of diffusion processes Let X denote the solution of the stochastic differential equation (7) with initial value x0 , and recall that the drift coefficient a and the diffusion coefficient b are supposed to satisfy conditions (A1)–(A3). In the following we use c to denote unspecified positive constants, which may only depend on x0 , a, b and the averaging parameter 1 q < ∞. Note that q
E XL∞ [0,1] < ∞ and E
sup
t∈[s1 ,s2 ]
(25)
|X(t) − X(s1 )|q c · (s2 − s1 )q/2
(26)
for all 1q < ∞ and 0 s1 s2 1, see [10, p. 138]. 5.1. Proof of the upper bound in Theorem 3 In order to establish the upper bound, it suffices to consider the case of p = ∞ and r = 0, i.e., nonlinear approximation in supremum norm with piecewise constant splines. We dissect X into its martingale part t M(t) = b(X(s)) dW (s) 0
and
t
Y (t) = x0 +
a(X(s)) ds. 0
∈ Nk,0 such that Lemma 10. For all 1q < ∞ and k ∈ N, there exists an approximation Y
1/q q c · k −1 . E∗ Y − Y L∞ [0,1] Proof. Put gLip = sup0 s
k
1](j −1)/k,j/k] · Y ((j − 1)/k).
j =1
By (A1) and (25), q q ∗ −q q E∗ Y − Y c · 1 + E XL∞ [0,1] · k −q c · k −q . L∞ [0,1] E Y Lip · k
∈ Nk,0 such that Lemma 11. For all 1 q < ∞ and k ∈ N, there exists an approximation M
1/q q c · k −1/2 . E∗ M − M L∞ [0,1]
882
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
Proof. Let = X
k
1](j −1)/k,j/k] · X((j − 1)/k).
j =1
Clearly, by (26), q E X − X
1/q
L2 [0,1]
Define
t
R(t) =
c · k −1/2 .
b(X(s)) dWs .
0
By the Burkholder–Davis–Gundy inequality and (A2),
1
q/2 1/q
1/q q 2 c· E (b(X(s)) − b(X(s))) ds E M − RL∞ [0,1]
0
c · E X − X L2 [0,1] q
1/q
c · k −1/2 . Note that + V, R=R where = R
k
1](j −1)/k,j/k] · R((j − 1)/k)
j =1
and V =
k
1](j −1)/k,j/k] · b(X((j − 1)/k)) · (W − W ((j − 1)/k)).
j =1
∈ Nk,0 such that According to Theorem 2, there exists an approximation W
2q E∗ W − W L∞ [0,1]
1/(2q)
c · k −1/2 .
we define V ∈ N2k,0 by Using W = V
k
− W ((j − 1)/k)). 1](j −1)/k,j/k] · b(X((j − 1)/k)) · (W
j =1
Clearly, L∞ [0,1] b(X)L∞ [0,1] · W − W L∞ [0,1] . V − V
(27)
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
883
Observing (25) and (A2), we conclude that
1/q
1/(2q)
1/(2q) 2q ∗ q 2q E∗ V − V E b(X) · E W − W L∞ [0,1] L∞ [0,1] L∞ [0,1] c · k −1/2 .
(28)
=R + V . Since ∈ N2k,0 by M We finally define M = (M − R) + (V − V ), M −M it remains to apply estimates (27) and (28) to complete the proof.
The preceding two lemma imply ek,0 (X, L∞ , q) c · k −1/2 as claimed. 5.2. Proof of the lower bound in Theorem 3 For establishing the lower bound it suffices to study the case p = q = 1. Moreover, we assume without loss of generality that b(x0 ) > 0. Choose > 0 as well as a function b0 : R → R such that: (a) b0 is differentiable with a bounded derivative, (b) inf x∈R b0 (x)b(x0 )/2, (c) b0 = b on the interval [x0 − , x0 + ]. We will use a Lamperti transform based on the space-transformation x 1 g(x) = du. b x0 0 (u) Note that g = 1/b0 and g = −b0 /b02 , and define H1 , H2 : C[0, ∞[→ C[0, ∞[ by t g a + g /2 · b2 (f (s)) ds H1 (f )(t) = 0
and H2 (f )(t) = g(f (t)). Put H = H2 − H1 . Then by the Itô formula, t b(X(s)) dW (s). H (X)(t) = b 0 (X(s)) 0 The idea of the proof is as follows. We show that any good spline approximation of X leads to a good spline approximation of H (X). However, since with a high probability, X stays within [x0 − , x0 + ] for some short (but nonrandom) period of time, approximation of H (X) is not easier than approximation of W, modulo constants. First, we consider approximation of H1 (X). 1 ∈ Nk,0 such that Lemma 12. For every k ∈ N there exists an approximation X 1 L [0,1] c · k −1 . E∗ H1 (X) − X 1
884
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
Proof. Observe that g a + g /2 · b2 (x)c · (1 + x 2 ), and proceed as in the Proof of Lemma 10. Next, we relate approximation of X to approximation of H2 (X). ∈ Nr with (X) < ∞ there exists an approximation Lemma 13. For every approximation X 2 ∈ Nr such that X 2 )2 · (X) (X and
2 L [0,1] c · E∗ X − X L [0,1] + 1/(X) . E∗ H2 (X) − X 1 1
Proof. For a fixed ∈ let X() be given by X() =
k
1]tj −1 ,tj ] · j .
j =1
tk = 1 that contains all the We refine the corresponding partition to a partition 0 = t0 < · · · < Furthermore, we define the polynomials points i/, where = (X). j ∈ r by X() =
k j =1
j . 1]tj −1 ,tj ] ·
Put f = X() and define 2 () = X
k j =1
1]tj −1 ,tj ] · qj
with polynomials tj −1 )) + g (f ( tj −1 )) · ( j − f ( tj −1 )) ∈ r . qj = g(f ( 2 (). If t ∈ tj −1 , tj ⊆ ](i − 1)/, i/], then Let f2 = X |H2 (f )(t) − f2 (t)| tj −1 )) · ( j (t) − f ( tj −1 )) = g(f (t)) − g(f ( tj −1 )) − g (f ( g(f (t)) − g(f ( tj −1 )) − g (f ( tj −1 )) · (f (t) − f ( tj −1 )) + g (f ( tj −1 )) · |f (t) − j (t)|
c · |f (t) − f ( tj −1 )|2 + |f (t) − j (t)|
c · sup |f (s) − f ((i − 1)/)|2 + |f (s) − j (s)| . s∈](i−1)/,i/]
Consequently, we may invoke (26) to derive 2 L [0,1] c · 1/(X) + E∗ X − X L E∗ H2 (X) − X 1
2 )2 · (X). Moreover, (X
1 [0,1]
.
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
885
We proceed with establishing a lower bound for approximation of H (X). ∈ Nr , Lemma 14. For every approximation X L [0,1] c · ((X)) −1/2 . E∗ H (X) − X 1 Proof. Choose t0 ∈ ]0, 1] such that A = sup |X(t) − x0 | t∈[0,t0 ]
satisfies P (A) 45 . Observe that L [0,1] 1A · W − X L [0,t ] , 1A · H (X) − X 1 1 0 and apply Lemma 9 for s = 0.
∈ Nr with k − 1 < (X) k, and choose X 1 and X 2 Now, consider any approximation X according to Lemmas 12 and 13, respectively. Then 2 − X 1 )L [0,1] E∗ H (X) − (X 1 ∗ 2 L [0,1] + E∗ H1 (X) − X 1 L [0,1] E H2 (X) − X 1 1 ∗ −1 −1 c · E X − XL1 [0,1] + ((X)) + k L [0,1] + k −1 . c · E∗ X − X 1 1 ) (X 2 ) + k 3 · k, so that 2 − X On the other hand, (X 2 − X 1 )L [0,1] c · k −1/2 E∗ H (X) − (X 1 follows from Lemma 14. We conclude that L [0,1] c · k −1/2 , E∗ X − X 1 as claimed. Acknowledgment The authors are grateful to Mikhail Lifshits for helpful discussions. In particular, he pointed out to us the approach in Appendix B. We thank Wenbo Li for discussions on the subject and for providing us with Ref. [8]. We are also grateful for numerous comments from the anonymous referees, which led to an improvement of the presentation. Appendix A. Convergence of negative moments of means Let (i )i∈N be an i.i.d. sequence of random variables such that 1 > 0 a.s. and E (1 ) < ∞. Put Sk = 1/k ·
k i=1
i .
886
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
Proposition 15. For every > 0, lim inf E (Sk− )(E (1 ))− . k→∞
If P (1 < v) c · v ,
v ∈ ]0, v0 ] ,
(29)
for some constants c, , v0 > 0, then lim E (Sk− ) = (E (1 ))− .
k→∞
Proof. Put = E (1 ) and define gk (v) = · v −(+1) · P (Sk < v). Thanks to the weak law of large numbers, P (Sk < v) tends to 1],∞[ (v) for every v = . Hence, by Lebesgue’s theorem, ∞ lim gk (v) dv = − . (30) k→∞
/2
Since E (Sk− ) =
∞
0
P (Sk− > u) du =
∞
gk (v) dv 0
the asymptotic lower bound for E (Sk− ) follows from (30). Given (29), we may assume without loss of generality that c · v0 < 1. We first consider the case 1 1 a.s., and we put /2 v0 /k gk (v) dv and Bk = gk (v) dv. Ak = 0
v0 /k
For v0 /k v /2 we use Hoeffding’s inequality to obtain gk (v) · v −(+1) · P (|Sk − | > /2) · (k/v0 )+1 · 2 exp(−k/2 · 2 ), which implies lim Ak = 0.
k→∞
On the other hand, if k > , then v0 k
−(+1) Bk = k · · v ·P i < v dv
0
i=1 v0
k ·· 0
k · · ck ·
−(+1)
v v0 0
· (P (1 < v))k dv
v k−(+1) dv k−
= k · · ( k − )−1 · ck · v0
,
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
887
and therefore lim Bk = 0.
k→∞
In view of (30) we have thus proved the proposition in the case of bounded variables i . In the general case put i,N = min{N, i } as well as Sk,N = 1/k · ki=1 i,N , and apply the result for bounded variables to obtain − lim sup E (Sk− ) inf lim sup E (Sk,N ) = inf (E 1,N )− = (E 1 )− N∈N
k→∞
N∈N
k→∞
by the monotone convergence theorem.
Appendix B. Small deviations of W (s) from r Let X denote a centered Gaussian random variable with values in a normed space (E, · ), and consider a finite-dimensional linear subspace ⊂ E. We are interested in the small deviation behavior of d(X, ) = inf X − . ∈
Obviously, P (X ε)P (d(X, ) ε)
(31)
for every ε > 0. We establish an upper bound for P (d(X, ) ε) that involves large deviations of X, too. Proposition 16. If dim() = r then P (d(X, )ε)(4/ε)r · P (X 2ε) + P (X − ε) for all ε > 0. Proof. Put B (x) = {y ∈ E : y − x } for x ∈ E and > 0, and consider the sets A = ∩ B (0) and B = Bε (0). Then {d(X, )ε} ⊂ {X ∈ A + B} ∪ {X − ε}, and therefore it suffices to prove P (X ∈ A + B) (4/ε)r · P (X 2ε).
(32)
Since 1/ · A ⊂ ∩ B1 (0), the ε-covering number of A is not larger than (4/ε)r , see [1, Eq. (1.1.10)]. Hence A⊂
n
Bε (xi )
i=1
for some x1 , . . . , xn ∈ E with n (4/ε)r , and consequently, A+B ⊂
n i=1
B2ε (xi ).
888
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
Due to Anderson’s inequality we have P (X ∈ B2ε (xi ))P (X ∈ B2ε (0)), which implies (32).
Now, we turn to the specific case of X = (W (s) (t))t∈[0,1] and E = Lp [0, 1], and we consider the subspace = r of polynomials of degree at most r. According to the large deviation principle for the s-fold integrated Wiener process, − log P (W (s) Lp [0,1] > t) t 2
(33)
as t tends to infinity, see, e.g., [5]. Furthermore, the small ball probabilities satisfy − log P (W (s) Lp [0,1] ε) ε−1/(s+1/2)
(34)
as ε tends to zero, see, e.g., [12,13]. Corollary 17. For all r, s ∈ N0 and 1 p ∞ we have − log P (d(W (s) , r ) ε) ε−1/(s+1/2) as ε tends to zero. Proof. From (31) and (34) we derive − log P (d(W (s) , r ) ε) − log P (W (s) Lp [0,1] ε) ε−1/(s+1/2) , yielding the upper bound in the corollary. For the lower bound we employ Proposition 16 with = ε− for = (2s + 1)−1 to obtain P (d(W (s) , r ) ε) 4r · ε −r(1+ ) · P (W (s) Lp [0,1] 2ε) + P (W (s) Lp [0,1] ε− − ε).
(35)
However, for ε1+ 21 we have ε − /2 ε − − ε ε − and thus, using (33), − log P (W (s) Lp [0,1] ε− − ε) ε −2 = ε−1/(s+1/2) as ε tends to zero. Furthermore, by (34),
− log 4r · ε −r(1+ ) · P (W (s) Lp [0,1] 2ε) ε −1/(s+1/2) . The latter two estimates, together with (35) and the elementary inequality log(x + y) log(2) + max(log(x), log(y)), yield the lower bound in the corollary. References [1] B. Carl, I. Stephani, Entropy, Compactness and the Approximation of Operators, Cambridge University Press, Cambridge, 1990. [2] A. Cohen, J.-P. d’Ales, Nonlinear approximation of random functions, SIAM J. Appl. Math. 57 (1997) 518–540. [3] A. Cohen, I. Daubechies, O.G. Guleryuz, M.T. Orchard, On the importance of combining wavelet-based nonlinear approximation with coding strategies, IEEE Trans. Inform. Theory 48 (2002) 1895–1921.
J. Creutzig et al. / Journal of Complexity 23 (2007) 867 – 889
889
[4] J. Creutzig, Relations between classical, average, and probabilistic Kolmogorov widths, J. Complexity 18 (2002) 287–303. [5] A. Dembo, O. Zeitouni, Large Deviation Techniques and Applications, Springer, New York, 1998. [6] S. Dereich, T. Müller-Gronbach, K. Ritter, Infinite-dimensional quadrature and quantization, Preprint, 2006, arXiv: math.PR/0601240v1. [7] R. DeVore, Nonlinear approximation, Acta Numer. 8 (1998) 51–150. [8] K. Grill, On the rate of convergence in Strassen’s law of the iterated logarithm, Probab. Theory Related Fields 74 (1987) 583–589. [9] N. Hofmann, T. Müller-Gronbach, K. Ritter, The optimal discretization of stochastic differential equations, J. Complexity 17 (2001) 117–153. [10] P.E. Kloeden, P. Platen, Numerical Solution of Stochastic Differential Equations, Springer, Berlin, 1995. [11] M. Kon, L. Plaskota, Information-based nonlinear approximation: an average case setting, J. Complexity 21 (2005) 211–229. [12] W. Li, Q.M. Shao, Gaussian processes: inequalities, small ball probabilities and applications, in: D.N. Shanbhag et al. (Ed.), Stochastic Processes: Theory and Methods, The Handbook of Statistics, vol. 19, North-Holland, Amsterdam, 2001, pp. 533–597. [13] M. Lifshits, Asymptotic behaviour of small ball probabilities, in: B. Grigelionis et al. (Eds.), Proceedings of the Seventh Vilnius Conference 1998, TEV-VSP, Vilnius, 1999, pp. 153–168. [14] V.E. Maiorov, Widths of spaces endowed with a Gaussian measure, Russian Acad. Sci. Dokl. Math. 45 (1992) 305–309. [15] V.E. Maiorov, Average n-widths of the Wiener space in the (L∞ )-norm, J. Complexity 9 (1993) 222–230. [16] V.E. Maiorov, Widths and distribution of values of the approximation functional on the Sobolev space with measure, Constr. Approx. 12 (1996) 443–462. [17] T. Müller-Gronbach, Strong approximation of systems of stochastic differential equations, Habilitationsschrift, TU Darmstadt, 2002. [18] T. Müller-Gronbach, The optimal uniform approximation of systems of stochastic differential equations, Ann. Appl. Probab. 12 (2002) 664–690. [19] V.V. Petrov, Sums of Independent Random Variables, Springer, Berlin, 1975. [20] K. Ritter, Average-Case Analysis of Numerical Problems, Lecture Notes in Mathematics, vol. 1733, Springer, Berlin, 2000. [21] A.W. van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes, Springer, New York, 1996.
Journal of Complexity 23 (2007) 890 – 917 www.elsevier.com/locate/jco
On the best interval quadrature formulae for classes of differentiable periodic functions V.F. Babenkoa, b,∗ , D.S. Skorokhodova a Dnepropetrovsk National University, Ukraine b Institute of Applied Mathematics and Mechanics of NAS, Ukraine
Received 8 January 2007; accepted 20 March 2007 Available online 7 April 2007 Dedicated to Henryk Wozniakowski on the occasion of his 60th birthday
Abstract In this paper we solve the problem about optimal interval quadrature formula for the class W r F of differentiable periodic functions with rearrangement invariant set F of their derivatives of order r. We prove that the formula with equal coefficients and n node intervals having equidistant midpoints is optimal for considering classes. To this end a sharp inequality for antiderivatives of rearrangements of averaged monosplines is proved. © 2007 Elsevier Inc. All rights reserved. Keywords: Quadrature formulae; Monosplines; Rearrangements
1. Introduction, notations, statement of the problem Let Lp , 1p ∞, be the space of 2-periodic functions f : R → R with the usual norm f p =
⎧ ⎨ 2 |f (t)|p dt 1/p ⎩
0
if p < ∞,
esssup{|f (t)| : t ∈ [0, 2)} if p = ∞.
Let also C2 be the space of continuous 2-periodic functions f : R → R endowed with the uniform norm f C .
∗ Corresponding author. Dnepropetrovsk National University, Ukraine.
E-mail address: [email protected] (V.F. Babenko). 0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.03.005
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
891
Denote by Kn , n = 1, 2, . . . , the set of all possible quadrature formulae of the form (f ) =
n
aj f (xj ),
j =1
where x1 < x2 < · · · < xn < x1 + 2, aj ∈ R. Let M be some (non-symmetric in general) class of continuous 2-periodic functions. For f ∈ M and ∈ Kn set 2 R(f, ) = f (t) dt − (f ). 0
The error of approximate integration with the help of the formula ∈ Kn on the class M we shall characterize by the pair of values R ± (M, ) = sup{R(±f, ) : f ∈ M} or, equivalently, with the help of the interval (M, ) := [−R − (M, ), R + (M, )]. Certainly, for the symmetric classes M we have R + (M, ) = R − (M, ). Set R± (M, Kn ) = inf{R ± (M, ) : ∈ Kn }.
(1.1)
The Kolmogorov problem about the best quadrature formula for the class M can be formulated in the following way. Find the values (1.1) and find the formulae ∈ Kn that realize the infimum in the right hand part of (1.1), if such formulae exist. The case when there exists a quadrature formula , which realizes infimum in both R+ (M, Kn ) and R− (M, Kn ), is especially interesting. For this and for an arbitrary formula we shall have (M, ) ⊂ (M, ). Quadrature formula satisfying the latter conditions will be called optimal for the class M. Let 0 < h < /n be given. Denote by Kni (h) the set of so-called interval quadrature formulae of the form yj +h n 1 i (f ) = bj f (t) dt, 2h yj −h j =1
where y1 < y2 < · · · < yn < y1 + 2, bj ∈ R. For f ∈ M and i ∈ Kni (h) set 2 R(f, i ) = f (t) dt − i (f ). 0
The error of approximate integration with the help of i ∈ Kni (h) on the class M we shall characterize by the pair of values R ± (M, i ) = sup{R(±f, i ) : f ∈ M}
892
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
or, that is equivalent, with the help of the interval (M, i ) := [−R − (M, i ), R + (M, i )]. As above, for symmetric classes M we have R + (M, i ) = R − (M, i ). Set R± (M, Kni (h)) = inf{R ± (M, i ) : i ∈ Kni (h)}.
(1.2)
The analog of the Kolmogorov problem about the best interval quadrature formula for the class M can be formulated in the following way. Find the values (1.2) and find the formulae i ∈ Kni (h) that realizes the infimum in the right hand part of (1.2). For the interval formulae as well as for usual quadrature formulae the case when there exists an interval quadrature formula i which realize infimum in both R+ (M, Kni (h)) and R− (M, Kni (h)) is especially interesting. For this i and for an arbitrary formula i we shall have (M, i ) ⊂ (M, i ). Interval quadrature formula satisfying the latter conditions will be called optimal for the class M. From the applications point of view, interval quadrature formulae are more natural than the usual quadrature formulae based on values at points, since quite often the result of measuring physical quantities, due to the structure of the measurement devices, is an average values of the function, describing the studied quantities, over some interval. Note that one can obtain the usual quadrature formula from the corresponding interval quadrature formula as a limit case, setting h → 0. Given h > 0, define the Steklov operator Sh : L1 → C2 in the following way: x+h 1 f (t) dt. Sh (f )(x) := 2h x−h We shall often write f h instead of Sh (f ). It can be easily seen that the problem of finding the optimal interval quadrature formula for the class M can be considered as a problem of finding the optimal usual quadrature formula for the class Sh (M) := {Sh (f ) : f ∈ M}. Let f ∈ L1 . The notation f ⊥ 1 means that 2 f (t) dt = 0. 0
Let F be a subset of L1 such that {f ∈ F : f ⊥ 1} = ∅. For r = 1, 2, . . . denote by W r F the class of functions f that have locally absolutely continuous derivative f (r−1) and such that f (r) ∈ F . In the case when F is the unit ball of the space Lp we obtain the standard Sobolev class Wpr of periodic functions. For a non-negative function f ∈ L1 let us denote by P (f, t) the decreasing rearrangement (see e.g. [9, p. 130, 10, pp. 92, 93]) of the restriction of f to [0, 2). If g is an arbitrary function from L1 , then set (see e.g. [10, p. 99]) (g, t) = P (g+ , t) − P (g− , 2 − t), where g± (t) = max{±g(t); 0}. The set F ⊂ L1 is called rearrangement invariant or, shortly, -invariant if conditions f ∈ F and (g) = (f ) imply g ∈ F .
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
893
In order to illustrate the variety of the classes W r F with -invariant sets F we mention some examples. 1. For F one can take the unit sphere of any symmetric space of 2-periodic functions embedded in L1 , in particular, the unit sphere in the space Lp , 1 p ∞, in Orlich [11], Lorentz and Marcinkiewicz [12,22] spaces. 2. Let be an arbitrary non-negative, non-decreasing function defined on [0, ∞). One can take
2
F = F () = f ∈ L1 :
(|f (t)|) dt 1 .
0
3. Let , > 0 be non-negative real numbers, 1p ∞. One can take F = Fp;, = {f+ + f− p 1}. r We shall denote the corresponding class W r Fp;, by Wp; , . r 4. Very interesting classes W Ff, correspond to the set Ff, = {g ∈ L1 : (g) = (f )}, where f is a fixed function from L1 , f ⊥ 1. 5. For F one can take the set
Ff,P = {g ∈ L1 : P (|g|, t) = P (|f |, t), t ∈ [0, 2)} or = {g ∈ L1 : P (|g|, t) P (|f |, t), t ∈ [0, 2)}. Ff,P
The list of examples could, of course, be continued. The following integral representation for functions f ∈ W r F plays an essential role in investigation of various extremal problems for classes W r F . Let ∞ 1 −r Dr (x) = j cos(j x − r/2), r ∈ N j =1
be the Bernoulli kernel. Then 2 a0 a0 f (x) = Dr (x − t)f (r) (t) dt = + + (Dr ∗ f (r) )(x), 2 2 0
(1.3)
where a0 =
1
2 f (t) dt. 0
Note that, considering the problem on optimization of quadrature formulae or interval quadrar ture formulae restrict our consideration n for the classes W F , we may by formulae from Kn such that j =1 aj = 2 or by formulae i from Kni (h) such that nj=1 bj = 2 only. For such formulae set m(t) = m,r (t) = −
n j =1
aj Dr (xj − t)
894
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
and mh (t) = mhi ,r (t) = −
n
bj Drh (yj − t).
j =1
Set Mnr := {m,r : ∈ Kn }, and Mnr,h = Sh (Mnr ) := {mhi ,r : i ∈ Kni (h)}. Functions from
Mnr and from Mnr,h will be called monosplines and averaged monosplines, respectively. With the help of representation (1.3) one can obtain the error of approximate integration by these formulae in the form 2 R(f, ) = f (r) (t)m(t) dt, m = m,r ∈ Mnr 0
if ∈ Kn , or in the form 2 i R(f, ) = f (r) (t)mh (t) dt, 0
mh = mhi ,r ∈ Mnr,h
(1.4)
if i ∈ Kni (h). r Denote by Snr (, ), n = 1, 2, . . . , r = 0, 1, . . . , , > 0, the set of functions f ∈ W∞; ,
with zero mean value on a period such that −1 (f (r) )+ + −1 (f (r) )− ≡ 1 and f (r) admits at most 2n changes of sign on a period. In this paper we shall discuss the Kolmogorov problems on optimal quadrature formulae and optimal interval quadrature formulae for classes W r F with -invariant sets F. We shall show that for any fixed h ∈ (0, /n) the interval quadrature formula having equidistant nodes yj , j = 1, n, and equal coefficients bj = 2/n is optimal for the class W r F among all interval quadrature formulae from Kni (h). To this end a sharp inequality for antiderivatives of rearrangements of averaged monosplines will be proved. The paper is organized in the following way. In Section 2 we shall present the known results, formulate main results of the paper, and describe the ideas of the proof. Some auxiliary results will be presented in Section 3. In Sections 4–7 we shall prove results, formulated in Section 2. 2. Background, main results, scheme of the proof Set n (f ) =
n 2 f (2j/n) n j =1
and in (f ) =
2j/n+h n 2 1 f (t) dt. n 2h 2j/n−h j =1
In addition, set
n 2j 2 mn,r (x) = − −x Dr n n j =1
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
895
and denote by n,r;, the rth periodic integral with zero mean value over a period of a 2n−1 periodic function n,0;, which equals on the interval [0, 2n−1 ( + )−1 ), and equals − on the interval [2n−1 ( + )−1 , 2n−1 ). It was proved in the papers of Motornyi [16], Ligun [14], and Zhensykbaev (see [23,24]) that if M = Wpr , r = 1, 2, . . . , 1 p ∞, then R± (M, Kn ) = R ± (M, n ). At the same time it does not hold for some natural analogues of the class Wpr [19]. Therefore, it was an interesting problem to determine the most general conditions on the class M of functions that ensure the optimality of formula n . This problem was solved by Babenko [3,5]. He proved the following: Theorem A. Let n, r = 1, 2, . . . , and let F ⊂ L1 be rearrangement invariant. Then R± (W r F , Kn ) = R ± (W r F , n )
2 = sup (±f, t)(mn,r , t) dt : f ∈ F, f ⊥ 1 . 0
Let us describe the scheme of the proof of Theorem A. For non-negative 2-periodic functions f and F we shall write f ≺ F if for any x ∈ [0, 2], x x P (f, t) dt P (F, t) dt. 0
0
The following extremal property of monosplines was proved in [5] in order to establish Theorem A. Theorem B. Let n, r = 1, 2, . . . . Then for any m ∈ Mnr and any ∈ R, (mn,r − )± ≺ (m − )± . To prove Theorem B it was enough (see Theorem 10 in Section 3) to prove the following: Theorem C. Let n, r = 1, 2, . . . . Then for any m ∈ Mnr and any , > 0, E0 (mn,r )1;, E0 (m)1;, . (For the definition of the values E0 (f )1;, , f ∈ L1 , see Section 3.) To prove this it was enough to prove: Theorem D. Let r = 1, 2, . . . and , > 0. Then for an arbitrary n ∈ N the quadrature formula r with equidistant nodes and equal coefficients is optimal for the class W∞; , . Moreover, r ± r R± (W∞; , , Kn ) = R (W∞;, , n ) = E0 (mn,r )1;−1 ,−1
= −2 min(±n,r;−1 ,−1 (u)). u
896
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
To prove the last theorem the following two theorems were established: Theorem E. For any n points x1 < x2 < · · · < xn < x1 + 2 there exists a spline g ∈ Snr (, ) with equal minima at these points. Theorem F. Let n, r = 1, 2, . . . and , , , > 0. Then for any g ∈ Snr (, ), E0 (n,r;, )1;, E0 (g)1;, . Interval quadrature formulae have been considered by many mathematicians (see for instance [18,20,13,21,4,17,15]). The results about optimal interval quadrature formula for the classes of r [17], and W 1 F [7,8]. differentiable periodic functions are known for the classes W1r [4], W∞ The main result of our paper is the following: Theorem 1. Let n, r = 1, 2, . . . and 0 < h < /n. Then for an arbitrary -invariant set F, R± (W r F, Kni (h)) = R ± (W r F, in ) = R± (Sh (W r F ), Kn ) = R ± (Sh (W r F ), n )
2 = sup (±f, t)(Sh (mn,r ), t) dt : f ∈ F, f ⊥ 1 . 0
To prove this theorem we shall use the above presented scheme of the proof of Theorem A. In particular, we shall prove the following theorem which is of independent interest. Theorem 2. Let n, r = 1, 2, . . . and 0 < h < /n. Then for any mh ∈ Mnr,h and any ∈ R, (Sh (mn,r ) − )± ≺ (mh − )± . To prove Theorem 2 it is enough to prove: Theorem 3. Let n, r = 1, 2, . . . . Then for any mh ∈ Mnr,h and any , > 0, E0 (Sh (mn,r ))1;, E0 (mh )1;, . To prove this it suffices to prove the following: Theorem 4. Let r = 1, 2, . . . and , > 0. Then for an arbitrary n ∈ N the interval quadrature formula with equal coefficients and node intervals having equidistant midpoints is optimal for the r class W∞; , . Furthermore, r i ± r i R± (W∞; , , Kn (h)) = R (W∞;, , n ) = E0 (Sh (mn,r ))1;−1 ,−1
= −2 min(±h u
n,r;−1 ,−1
(u)).
To prove Theorem 4 we shall prove the following two theorems. Theorem 5. For every system of points x1 < x2 < · · · < xn < 2 + x1 there exists a function fr ∈ Snr (, ) such that frh attains equal minimal values at these points.
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
897
Theorem 6. Let n, r = 1, 2, . . . , 0 < h < /n and , , , > 0. Then for every f ∈ Snr (, ), E0 (hn,r;, )1;, E0 (Sh (f ))1;, . The implementation of this outline meets serious difficulties connected with the fact that Steklov operator Sh does not have the following property: for any 2-periodic function f having zero mean value on a period
(Sh (f )) (f ), where (f ) is number of sign changes of the function f on a period. To overcome these difficulties we shall prove the following theorem which plays the crucial role in proofs of Theorems 5 and 6. Theorem 7. Let n, r = 1, 2, . . . and , > 0. Let splines s1 , s2 ∈ Sn0 (, ) be such that (s1h ) =
(s2h ) = 2n. If f (t) = s1 (t) − s2 (t) then (f h ) (f ). 3. Some auxiliary results Here we shall present some known definitions and results which will be frequently used in the rest of the paper. Let 1 p ∞. Let f ∈ Lp and let H be a subspace of L1 . We shall denote by E(f ; H )p the best approximation of the function f by the subspace H in the Lp -metric, i.e.: E(f ; H )p = inf{f − up : u ∈ H }. In addition, let E ± (f ; H )p = inf{f − up : ±u ± f, u ∈ H } denote the best one-sided approximation of the function f by the subspace H in the Lp -metric. Let , > 0. Then we shall denote by E(f ; H )p;, the best (, )-approximation [2] of the function f by the subspace H in the Lp -metric, i.e.: E(f ; H )p;, = inf{(f − u)+ + (f − u)− p : u ∈ H }. For = we obtain, up to a constant factor, the usual best approximation (instead of E(f ; H )p;1,1 we shall write E(f ; H )p ). By virtue of Theorem 2 in [2], as → ∞ ( → ∞), E(f ; H )p;1, (E(f ; H )p;,1 ) tends monotone non-decreasingly to the best approximation from below (from above) of the function f by the elements of H in the Lp -metric: E + (f ; H )p (E − (f ; H )p ), i.e.: lim E(f ; H )p;1, = E + (f ; H )p lim E(f ; H )p;,1 = E − (f ; H )p . →∞
→∞
This allows us to include the problem of the best approximation without constraint and the problem of the best one-sided approximation into the family of problems of the same type with “loose” constraints, and consider them from a general point of view (see for this reason also [3]). In what follows we shall allow +∞ for or identifying E(f ; H )p;, with the corresponding one-sided approximation. When H is the space of all constants, let E0 (f )p;, = E(f ; H )p;, .
898
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
Theorem 8 (Criterion for the best (, )-approximation,[2], Theorem 4). Let H be a finite dimensional subspace of Lp , 1p < ∞, and , > 0. For an element u0 ∈ H to be the best (, )-approximation for f ∈ Lp in the Lp -metric, it is sufficient and (for p = 1 in the case if f − u0 almost everywhere differs from 0) necessary, that for any u ∈ H , 2 u(t)|f (t) − u0 (t)|p−1 [p sign (f (t) − u0 (t))+ − p sign (f (t) − u0 (t))− ] dt = 0. 0
Theorem 9 (Duality theorem for the best (, )-approximation, [2], Theorem 5). Let 1 p<∞ and let H be any finite dimensional subspace of Lp . Then for any function f ∈ Lp ,
E(f ; H )p;, = sup
2
0
f (t)g(t) dt : −1 g+ + −1 g− q 1, g ⊥ H ,
where p−1 + q −1 = 1. Let f, g ∈ L1 . The convolution of functions f and g is defined as 2 (f ∗ g)(t) = f (t − )g() d, t ∈ [0, 2). 0
Let > 0 and x ∈ [0, 2). Define A (x) =
∞ 1 eiqx . 2 q=−∞ ch(q )
It is easy to verify that the convolution of function A (x) and arbitrary periodic function is analytic on a real line. Hence, a convolution of A (x) and an arbitrary function, not identically constant, differs from zero almost everywhere. It is known (see, for example, [6]) that for every function f ∈ C2 ,
(A ∗ f ) (f ).
(3.1)
In addition, for every f ∈ C2 , (A ∗ f )(·) − f (·)C2 → 0 as → 0. Let n, r = 1, 2, . . . and 0 < h < /n. Due to Lemma 5.1 from [5], it is easy to verify that the following lemma holds. Lemma 3.1. Let the spline g ∈ Snr (, ) with nodes at the points x1 , . . . , x2l be such that g (r) (x) = for x ∈ (x1 , x2 ). Then (A ∗ g h )(x) = ((A ∗ Drh ) ∗ g (r) )(x) = ( + )
2l j =1
(−1)j
2 0
D1 (xj − t)[Dr ∗ A ]h (x − t) dt.
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
899
Lemma 3.2 (Babenko [5], Lemma 5.2). Let the function g ∈ L1 be almost everywhere different from every fixed constant and g ⊥ 1. Then + 2 − E0 (g)1;, = inf |g(t) − | dt + 2 . 2 2 ∈R 0 Lemma 3.3. Let s ∈ Sn0 (, ), and x1 < x2 < · · · < x2l < x1 + 2 be the nodes of s. Let ∈ R, and 2 2 2l j h F (x1 , . . . , x2l ; )= (+) (−1) D1 (xj − t)[Dr ∗A ] (x − t) dt− dx. j =1 0
0
Then F is continuously differentiable in the sense that the partial derivatives **F and **xF , k = 1, 2l, k exist and are continuous. Moreover, ⎛ ⎞ 2 2 2l *F j h = − sign ⎝( + ) (−1) D1 (xj − t)[Dr ∗ A ] (x − t) dt − ⎠ dx, * 0
*F = (−1)k ( + ) *xk ⎛
j =1
2 0
× sign ⎝( + )
0
[A ∗ Dr ]h (xk − x)
2l j =1
2 (−1)j
⎞ D1 (xj − t)[Dr ∗ A ]h (x − t) dt − ⎠ dx.
0
This lemma can be proved analogously to the proof of Lemma 5.3 from [5]. Lemma 3.4 (See Babenko [6]). Let n, r = 1, 2, . . . , , > 0, and l ∈ N, l < n. Then for an arbitrary t ∈ [0, 2), min(A ∗ l,r;, )(u) < (A ∗ n,r;, (t)) < max(A ∗ l,r;, )(u). u
u
The statement of this lemma was noted in [6]. The following theorems represent the statements of Theorem 2.3 and Lemmas 2.2–2.3 from [5]. Theorem 10. Let f and F be continuous 2-periodic functions with zero mean value on a period and for all , > 0 let E0 (f )1;, E(F )1;, . Then f± ≺ F± .
(3.2)
Theorem 11. For any f ∈ L1 with zero mean value on a period and for any F ∈ L1 the following equality holds:
2 2 g(t)F (t) dt : (g) = (f ) = (f, t)(F, t) dt. sup 0
0
900
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
Theorem 12. Let the 2-periodic functions f and F be continuous with zero mean values on a period and such that for all ∈ R and x ∈ [0, 2), the inequality (3.2) holds. Then for any function g ∈ L1 with zero mean value on a period we have 2 2 (g, t)(f, t) dt (g, t)(F, t) dt. 0
0
4. Some properties of averaged (, )-splines In this section we shall prove Theorem 7 which plays very important role in the rest of the paper. Let n be a positive integer, , > 0, and 0 < h < /n. The following two results represent a generalization of Lemmas 2 and 3 from the paper of Motornyi [17] for the case of non-symmetric perfect splines. Lemma 4.1. Let s ∈ Sn0 (, ) be an arbitrary spline and let us denote by x1 < x2 < · · · < x2n its nodes on a period. Then the Steklov function s h is non-decreasing on the interval (xj −h, xj +1 −h), if s(t) ≡ on the interval (xj , xj +1 ), and is non-increasing on (xj − h, xj +1 − h), if s(t) ≡ − on (xj , xj +1 ). Proof. Let us consider the first derivative of the Steklov function
t+h d 1 1 h (s ) (t) = s(t) dt = [s(t + h) − s(t − h)]. dt 2h t−h 2h This provides (s h ) (t − h) = [s(t) − s(t − 2h)]/2h. It can be easily seen that (s h ) (t − h) 0 on the interval t ∈ (xj , xj +1 ), if s(t) ≡ on the same interval, and (s h ) (t − h)0 on the interval t ∈ (xj , xj +1 ), if s(t) ≡ − on the same interval. Thus, we obtain s h is non-decreasing on (xj − h, xj +1 − h), if s(t) ≡ on the interval (xj , xj +1 ). Similarly, s h is non-increasing, if s(t) ≡ − on the interval (xj , xj +1 ). This is the desired conclusion. Lemma 4.2. Let s ∈ Sn0 (, ). Assume that (s h ) = 2n. Then the length of the interval (xj , xj +1 ) is greater than 2h/( + ) in the case s(t) ≡ on this interval, and is greater than 2h/( + ) in the case s(t) ≡ − on (xj , xj +1 ). Proof. Let x1 < x2 < · · · < x2n < x1 + 2 denote the nodes of the spline s and let x2n+1 = x1 + 2. Since (s h ) = 2n, by the previous lemma, we have that s h (xj − h)s h (xj +1 − h) < 0, j = 1, 2n − 1. Without loss of generality, we may assume sign s h (xj − h) = (−1)j , j = 1, 2n. From this it follows that s(t) ≡ on the interval (x1 , x2 ). Note that the sum of lengths of all intervals (xj , xj +1 ), j = 1, 2n, on which s attains the value , is equal to 2/( + ). Then there exists an interval on which s(t) ≡ , with the length greater than 2h/( + ). Similarly, there exists an interval on which s(t) ≡ −, with the length greater than 2h/( + ). Suppose the assertion of the lemma is false. Then, due to the remark above, we obtain two possible cases: (1) There exists 1j 2n such that s(t) ≡ for t ∈ (xj −1 , xj ), s(t) ≡ − for t ∈ (xj , xj +1 ) and the length of the interval (xj −1 , xj ) is greater than 2h/( + ) and the length of the interval (xj , xj +1 ) is less than 2h/( + ).
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
901
(2) There exists 1j 2n such that s(t) ≡ − for t ∈ (xj −1 , xj ), s(t) ≡ for t ∈ (xj , xj +1 ) and the length of the interval (xj −1 , xj ) is greater than 2h/( + ) and the length of the interval (xj , xj +1 ) is less than 2h/( + ). We consider the first case in detail. The second one can be studied similarly. Without loss of generality, we may assume that j = 2. From this we have s h (x3 − h) < 0, since sign s h (x3 − h) = −1. Let us consider x3 − 2h x2 − 2h/( + ). Then x3 1 s(t) dt s h (x3 − h) = 2h x3 −2h x2 x3 x2 −2h/(+) 1 = s(t) dt + s(t) dt + s(t) dt 2h x2 −2h/(+) x2 x3 −2h
1 2h 2h (−) · x2 − − x3 + 2h + · − · (x3 − x2 ) = 0. 2h + + In the case x3 − 2hx2 − 2h/( + ) we obtain
x2 x3 1 1 h s (x3 − h) = · [2h − ( + )(x3 − x2 )] > 0. s(t) dt + s(t) dt = 2h 2h x3 −2h x2 Thus, s h (x3 − h)0, which contradicts the fact that s h (x3 − h) < 0.
The following statement is a trivial corollary of Lemma 4.2. Lemma 4.3. Let the spline s ∈ Sn0 (, ) be such that (s h ) = 2n. Then for an arbitrary point x ∈ [0, 2) spline s has at most two sign changes on the interval (x − h, x + h). Due to Lemma 4.3, considering different possibilities for location of points, where splines s1 and s2 change their signs, we obtain that the following lemma holds. Lemma 4.4. Let splines s1 , s2 ∈ Sn0 (, ) be such that (s1h ) = (s2h ) = 2n and let x be an arbitrary point from the interval [0, 2). Then the difference f (t) = s1 (t) − s2 (t) has at most two sign changes on the interval (x − h, x + h). Lemma 4.5. Let splines s1 , s2 ∈ Sn0 (, ) be such that (s1h ) = (s2h ) = 2n. Assume there exists a point x ∈ [0, 2) such that the function f (t) = s1 (t) − s2 (t) has exactly two sign changes on the interval (x − h, x + h). Then there exists x˜ > 0 such that the function f has exactly one sign change on the interval (x + x˜ − h, x + x˜ + h). Moreover, f h (y) = f h (x) for arbitrary y ∈ [x, x + x]. ˜ Proof. Let x ∈ [0, 2) satisfy conditions of the lemma. Analyzing different possibilities for location of nodes of splines s1 and s2 on the interval [x − h, x + h], we conclude that the function f has exactly two sign changes on this interval only when both splines s1 and s2 have exactly two sign changes on the interval [x − h, x + h] and there exists a neighborhood U (x − h) of the point x − h such that s1 (t) ≡ const and s2 (t) ≡ const when t ∈ U (x − h), and s1 (x − h) · s2 (x − h) < 0. Without loss of generality, we may assume that s1 (x − h) = and s2 (x − h) = −.
902
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
Let x1,1 , x1,2 , x1,3 and x1,4 be the neighboring nodes of the spline s1 such that x1,1 x − h < x1,2 < x1,3 < x + h x1,4 and let x2,1 , x2,2 , x2,3 and x2,4 be the neighboring nodes of the spline s2 such that x2,1 x − h < x2,2 < x2,3 < x + h x2,4 . Then s1 (t) ≡ in the case t ∈ (x1,1 , x1,2 ) or t ∈ (x1,3 , x1,4 ) and s1 (t) ≡ − in the case t ∈ (x1,2 , x1,3 ). At the same time s1 (t) ≡ − in the case t ∈ (x2,1 , x2,2 ) or t ∈ (x2,3 , x2,4 ) and s1 (t) ≡ in the case t ∈ (x2,2 , x2,3 ). Set x˜ = min{x1,2 −x +h; x2,2 −x +h}. Without loss of generality we assume x˜ = x1,2 −x +h. This implies the splines s1 and s2 are equal to and −, respectively, on the interval (x − h, x1,2 ). Thus we obtain f (t) = + for t ∈ (x − h, x1,2 ). At the same time splines s1 and s2 are equal to and −, respectively, on the interval (x + h, x + x˜ + h). Indeed, applying Lemma 4.2 we obtain x1,3 − x1,2 > 2h/( + ) and x1,4 − x1,3 > 2h/( + ). From the last inequalities we conclude that x1,4 − x1,2 > 2h = x + h + x˜ − x1,2 , hence that x + x˜ + h < x1,4 . Similarly, x + x˜ + h < x2,4 . By these arguments for an arbitrary y ∈ [0, x] ˜ we have f (x − h + y) = f (x − h) = + = f (x + h) = f (x + h + y). For every z ∈ [0, x] ˜ it can be easily seen that z < 2h. It follows that the following equalities hold: f h (x + z) − f h (x)
x+h x+z+h 1 = f (t) dt − f (t) dt 2h x−h x+z−h
x+z−h x+h x+h x+z+h 1 = f (t) dt + f (t) dt − f (t) dt − f (t) dt 2h x−h x+z−h x+z−h x+h
z z 1 f (x − h + ) d − f (x + h + ) d = 0. = 2h 0 0 Obviously, the function f does not have more than two sign changes on the interval [x+x−h, ˜ x+ x+h], ˜ which completes the proof. Proof of Theorem 7. Set (f h ) = 2b, where b is a positive integer. Due to Lemmas 4.4 and 4.5, there exist points x1 < x2 < · · · < x2b < x1 + 2 such that sign f h (xj ) = (−1)j , j = 1, 2b, and the function f has at most one sign change on each of intervals [xj − h, xj + h], j = 1, 2b. Clearly, for every j = 1, 2b there exists a non-empty interval j ⊂ [xj − h, xj + h] such that sign f (t) = (−1)j on it. Let us denote by yj , yj∗ and yj∗∗ the midpoint, the left and right endpoints of the interval j , respectively. This implies xj − h < yj < xj + h for every j = 1, 2b. j We shall show that the sequence {yj }2b j =1 increases and sign f (yj ) = (−1) , j = 1, 2b. The second proposition holds by choosing points yj . Suppose, there exists j0 such that yj0 > yj0 +1 . Without loss of generality we may take j0 = 1. It can be easily seen that y2 ∈ (x2 −h, x2 +h) and we conclude from the assumption and inequality x1 − h < x2 − h that y1 ∈ (x2 − h, x2 + h)
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
903
and y2 ∈ (x1 − h, x1 + h). It is easy to verify that x1 − h < x2 − h y2∗ < y2∗∗ y1∗ < y1∗∗ x1 + h < x2 + h. Thus, f (t)0 when t ∈ (x1 − h, x2 − h), otherwise there exist three points from the interval [x1 , −h, x1 + h] with alternate sign. Similarly, f (t) 0 when t ∈ (x1 + h, x2 + h). Therefore, x2 −h x2 +h h h f (t) dt + f (t) dt 0, 0 < f (x2 ) − f (x1 ) = − x1 −h
x1 +h
which is impossible. Thus, y1 < y2 < · · · < y2b < y1 + 2 and sign f (yj ) = (−1)j , j = 1, 2b. This gives (f ) 2b = (f h ). 5. On existence of the spline from Sh (Snr (, )) with prescribed minima This section is devoted to the proof of Theorem 5. This theorem can be proved in many ways. We shall use methods from the paper [16]. Let r, n = 1, 2, . . . , 0 < h < /n and , > 0. Let N˜ nr denote the set of functions f which can be represented in the form f = g h + a,
g ∈ Snr (, ), a ∈ R
and have exactly 2n extrema on a period. It can be easily seen that the Steklov function of every 2/n-periodic function f ∈ Snr (, ) belongs to the set N˜ nr . Hence, N˜ nr = ∅. Let f ∈ Sn0 (, ), and let
1 < 2 < · · · < 2n < 1 + 2 be the nodes of the spline f such that f (t) ≡ when t ∈ ( 1 , 2 ). Then, since f has a zero mean value, the following equality holds 2n
(−1)j j =
j =1
2 . +
(5.1)
Hence, every system of points 1 < 2 < · · · < 2n−1 < 1 + 2 such that 2n−1 2 (−1)j j < 1 + 2 − + j =1
uniquely determines some spline f ∈ Sn0 (, ). Such a system of points we shall denote by ( = { j }2n−1 j =1 ), and we shall call it as the determining system for the spline f. In addition, we shall denote by f the spline which corresponds to the system of points . Let be a given determining system for some spline. Then set
2n =
2n−1 2 (−1)j j − + j =1
and
2n+1 = 1 + 2.
904
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
Lemma 5.1. Let , be determining systems for splines f and f , respectively. If the difference f (t) − f (t) changes sign exactly 2n times on [0, 2) and
j < j +1 < j +2 ,
j = 1, 2n − 1,
then it is necessary that j = j for every j = 1, 2n. Proof. Let 0 = 2n − 2 and 0 = 2n − 2. It is easy to verify that the difference f (t) − f (t) changes sign at most once on each of intervals ( j −1 , j +1 ) and (j −1 , j +1 ), j = 1, 2n. Assume to the contrary, there exists 1 j 2n such that j = j . There are two possible cases: j −1 j −1 and j −1 j −1 . We will consider the first case. In this case f (t) − f (t) does not have sign changes on the interval ( j −1 , j +1 ), which contradicts the assumption (f − f ) = 2n. The second one can be studied similarly. We shall denote by U ( ) the closed ball with the center = ( 1 , . . . , 2n−1 ) and the radius > 0, in (2n − 1)-dimensional space R2n−1 with the norm := max | j |. j
Lemma 5.2. Let ∈ R2n−1 be the determining system for the spline f ∈ Sn0 (, ). Then there exists > 0 such that an arbitrary point ∈ U ( ) is a determining system for some spline f ∈ Sn0 (, ). This lemma can be proved similarly to Lemma 3.2 in [16]. Let be the determining system for the spline f ∈ Sn0 (, ) such that f h,r ∈ N˜ nr , where f ,r (t) = (Ir f )(t) = (Dr ∗ f )(t) :=
2
Dr (t − )f () d.
0
Since Ir is a bounded operator, we may assume to be such that fh,r ∈ N˜ nr for every ∈ U ( ). Let us consider an arbitrary interval (a, a + 2) containing n points x1 < x2 < · · · < xn at which f h,r attains its minima. We may choose > 0 such that for every ∈ U ( ) points y1 < y2 < · · · < yn at which fh,r attains its minima, belong to the interval (a, a + 2). For every point ∈ U ( ) let () = {y1 , . . . , yn , fh,r (y2 ) − fh,r (y1 ), . . . , fh,r (yn ) − fh,r (y1 )}. Clearly, the mapping from the ball U ( ) into R2n−1 is continuous. Lemma 5.3. There exists < such that the restriction of the mapping to the ball U ( ) is injective. Proof. Let a t1 t2 · · · t2n < a + 2 be the points at which f h,r attains its local extrema.
Set mj := f h,r (tj ), j = 1, 2n. Let us denote by w0 the smallest number satisfying the equality (f h,r ; w) =
1 2
min |mj +1 − mj |,
j =1,2n
where m2n+1 = m1 , and (g; t) is the modulus of continuity of the function g. Let := 41 min | j +1 − j |. For every 0 < ε < 18 min |mj +1 − mj | let us choose such j =1,2n
j =1,2n
that < min{; w0 /2; /2} and for an arbitrary ∈ U ( ) the distance between functions
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
905
fh,r and f h,r in the L∞ -metric does not exceed ε. Due to the definition of numbers ε and w0 , we have that the distance between the neighboring points of local extremum of the function fh,r , ∈ U ( ), is greater than or equal to w0 . Now suppose the assertion of the lemma is false. Then there exist two points , ∈ U ( ), = , such that () = (). Let {yj }nj=1 and {zj }nj=1 be the points from the interval (a, a+2) at which fh,r and fh,r attain their local minima, respectively. Hence, yj = zj and fh,r (yj ) − fh,r (y1 ) = fh,r (zj ) − fh,r (z1 ), j = 1, n. Let u = 1 − 1 , and let us consider the function f (t − u). Let us consider the case u > 0 in detail. The case u < 0 can be studied similarly. Clearly, {1 , 2 +u, . . . 2n−1 +u} is a determining system for the spline f (t − u). Let 2n be chosen such that 2n =
2n−1 2 (−1)j j . − + j =1
For every j = 1, 2n − 2 we have j + u < j + 2j +1 j +2 − 2 < j +2 + u, since |u|2 < 2. Let us apply Lemma 5.1 to determining systems for the splines f (t − u) and f (t). Since the first points of these systems are equal, Lemma 5.1 shows that the difference f (t − u) − f (t) has at most (2n − 1) sign changes. Define g1 (t) := fh,r (t − u) − fh,r (y1 ), g2 (t) := fh,r (t) − fh,r (y1 ). We shall show that the difference g1 (t) − g2 (t) has at least two sign changes on every interval [yj , yj +1 ], j = 1, 2n. Since |u| < 2 < w0 , g1 (yj ) − g2 (yj ) > 0,
j = 1, n.
Furthermore, g1 (yj + u) − g2 (yj + u) < 0,
j = 1, n.
Hence,
(g1 − g2 ) (g1 − g2 ) 2n. At the same time g1 (t) − g2 (t) = fh,r−1 (t − u) − fh,r−1 (t) and (fh,r−1 ) = (fh,r−1 ) = 2n. Therefore, applying Rolle’s theorem and Theorem 7 we obtain 2n (g1 − g2 ) (g1 − g2 ) = (fh,r−1 (· − u) − fh,r−1 (·)) (fh (· − u) − fh (·)) (f (· − u) − f (·)) 2n − 1, which is impossible. This proves the lemma.
906
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
Since the mapping is continuous, we derive from the last lemma that is a homeomorphism from U ( ) into R2n−1 . Let us denote by E the set of points x = (x1 , x2 , . . . , xn−1 ) ∈ Rn−1 such that 0 < x1 < · · · < xn−1 < 2. Obviously, E is a connected set. Let E0r ⊂ E be such that for every point x ∈ E0r there exists a function f h,r ∈ N˜ nr with equal local minima at the points 0, x1 , . . . , xn−1 . The set E0r is non-empty, since (2/n, 4/n, . . . , 2(n − 1)/n) ∈ E0r . In fact, for an arbitrary 2/n-periodic function f h,r ∈ N˜ nr we can choose a number b such that
the function f h,r (t + b) attains its minima at the points 2k/n, k = 0, n − 1. Lemma 5.4. The set E0r is open in E.
Proof. For an arbitrary x ∈ E0r there exists a function f h,r ∈ N˜ nr that attains the equal local minima at the points 0, x1 , . . . , xn−1 . Due to Lemma 5.3, there exists a ball U ( ) such that the mapping () : U ( ) → R2n−1 is a homeomorphism. By virtue of theorem about invariance (see [1, p. 196]) of the domain, this provides the existence of an interior point ∈ (U ), ( ) = (0, x1 , . . . , xn−1 , 0, . . . , 0) ∈ (U ( )). Moreover, there exists a neighborhood of the point x such that for every point y ∈ E from this neighborhood (0, y1 , . . . , yn−1 , 0, . . . , 0) ∈ (U ( )). Thus, there exists ∈ U ( ) such that () = (0, y1 , . . . , yn−1 , 0, . . . , 0). This completes the proof. Lemma 5.5. The set E0r is closed in E. r Proof. Let x ∈ E and let the sequence {x m }∞ m=1 ⊂ E0 converges to x as m → ∞. By definition m m of the sequence {x }, for every point x there exists a spline f m ∈ Sn0 (, ) with a determining 2n−1 system m = { m j }j =1 such that
f hm ,r (0) = f hm ,r (xj ),
j = 1, n − 1
and f hm ,r−1 (0) = f hm ,r−1 (xjm ) = 0,
j = 1, n − 1.
It can be easily seen that there exists the subsequence { mk } which tends to some point ∈ R2n−1 as k → ∞. Clearly, is the determining system for the spline f ∈ Sn0 (, ). This implies that f mk − f 1 → 0 as k → ∞. From this, the sequence {f hmk ,b } converges uniformly to f h,b for an arbitrary integer 0 b r. Thus, we have f hmk ,r (xjmk ) → frh (xj )
and
h f hmk ,r−1 (xjmk ) → fr−1 (xj ),
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
907
as k → ∞, for every j = 1, n − 1, and f hmk ,r (0) → frh (0) and
h f hmk ,r−1 (0) → fr−1 (0),
as k → ∞. Hence, frh (xj ) = frh (0),
j = 1, n − 1,
h h (xj ) = fr−1 (0) = 0, fr−1
j = 1, n − 1,
and frh attains its minima at the points 0, x1 , . . . , xn−1 . This proves the lemma.
To summarize, observe that E0r is non-empty, open and closed subset in the connected set E. This gives E0r = E. Thus, the last remark proves Theorem 5. 6. Proof of Theorem 6 In this section we shall prove the following: Theorem 13. Let n, r = 1, 2, . . . , 0 < h < /n and , , , , > 0. Then for every function f ∈ Snr (, ), E0 (A ∗ hn,r;, )1;, E0 (A ∗ f h )1;, . We shall establish Theorem 6 by letting → 0. Let n, r = 1, 2, . . . , , > 0 and 0 < h < /n. Note that the nodes x1 < x2 < · · · < x2l < x1 + 2, l n, of the spline g ∈ Snr (, ) for which g (r) attains the value on the interval (x1 , x2 ) satisfy 2l
(−1)j xj =
j =1
2 . +
Fix , , , , , n, r and consider the extremal problem E0 (A ∗ g h )1;, → inf,
g ∈ Snr (, ).
(6.1)
Since A ∗ Sh (Snr (, )) := {A ∗ s : s ∈ Sh (Snr (, ))} is compact in the topology of the uniform convergence and E0 (A ∗ g h )1;, continuously depends on g ∈ Snr (, ), the solution of the problem (6.1) exists. Assume that the spline solving the problem (6.1) has exactly 2l, l n, nodes. Due to Lemmas 3.1 and 3.2 the nodes x1 < · · · < x2l < x1 + 2 of this spline are also solutions of the following problem: 2 2l + 2 j h ( + ) (−1) D1 (xj − t)[A ∗ Dr ] (x − t) dt − dx 2 0 0 j =1 − → min, 2 under the constraint + 2
2l j =1
(−1)j xj =
2 , +
(6.2)
∈ R.
(6.3)
908
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
Due to Lemma 3.3, we can apply the Lagrange multiplier method to study problem (6.2). This implies the following necessary conditions to be satisfied by the solutions x1 , . . . , x2l , of this problem: ⎛ ⎞ 2 2l + 2 − sign ⎝( + ) (−1)j D1 (xj − t)[A ∗ Dr ]h (x − t) dt − ⎠ dx 2 0 0 j =1
+ 2
− = 0, 2
(6.4)
2 + · ( + ) [A ∗ Dr ]h (xk − x) 2 0 ⎛ ⎞ 2 2l ×sign ⎝( + ) (−1)j D1 (xj − t)[A ∗ Dr ]h (x − t) − ⎠ dx
(−1)k
j =1
= (−1)k+1 , 2l
(−1)j xj =
j =1
0
k = 1, 2l,
2 , +
(6.5) (6.6)
where is the Lagrange multiplier. Let x1 < x2 < · · · < x2l < x1 + 2 be such that the relation (6.6) holds. For a given number m = 0, 1, . . . set fm (x) = ( + )
2l
(−1)j +m Dm+1 (xj − x).
j =1
Using this notation we have ( + )
2l
(−1)j
j =1
2 0
D1 (xj − t)[A ∗ Dr ]h (x − t) dt = (A ∗ frh )(x).
Conditions (6.4)–(6.6) can be written as follows. If (x1 , . . . , x2l , ) is a solution of the problem (6.2), then (1) fr ∈ Slr (, ) and A ∗ frh ∈ A ∗ Sh (Slr (, ) so that A ∗ frh is a solution of the problem (6.1). (2) is the constant of the best (, )-approximation of A ∗ frh in the space L1 and if g0 (x) = sign((A ∗ frh )(x) − )+ − sign((A ∗ frh )(x) − )− =
+ − sign((A ∗ frh )(x) − ) − , 2 2
then sign g0 (x) = sign((A ∗ frh )(x) − )
(6.7)
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
909
and gr (x) = (Dr ∗ g0 )(x) ∈ Slr (, ) and consequently (A ∗ grh )(x) ∈ A ∗ Sh (Slr (, )). (3) A ∗ grh attains at the points xj (nodes of f0 ) the equal values and sign((A ∗ grh )(x) − (A ∗ grh )(x1 )) = ±sign f0 (x). Note that the condition (1) follows from the relation (6.6). As for condition (2), the statement that is the constant of the best (, )-approximation of A ∗ frh in the space L1 follows from condition (6.4) and Theorem 8. From the fact that (f0 ) = 2l, Lemma 4.1, Rolle’s theorem, property (3.1) and relation (6.7) we have gr ∈ Slr (, ). Finally, as for condition (3), the fact that (A ∗ grh )(x) attains at the points xj (nodes of f0 ) equal values follows from condition (6.5). In addition, we can apply Lemma 4.1, Rolle’s theorem and property (3.1) to verify that the difference (A ∗ grh )(x) − (A ∗ grh )(x1 ) does not have zeros different from xj and that this difference changes its sign at the points xj . We shall prove now the following: Theorem 14. Conditions (1)–(3) can be satisfied (up to a translation of the argument) only by the function (A ∗ frh )(x) = (A ∗ hl,r;, )(x). For a number y ∈ R set Fy,0 (x) := f0 (x) − f0 (x + y),
Fy,r = fr (x) − fr (x + y)
Hy,0 (x) = g0 (x) − g0 (x + y),
Hy,r (x) = gr (x) − gr (x + y).
and
h has only isolated zeros. By () let us denote the number of zeros of the Function A ∗ Hy,r function on a period counted according to the following rule: the simple isolated zeros of are counted once, while the multiple zeros are counted two times.
Lemma 6.1. For any y ∈ R, h
(A ∗ Hy,r ) (Fy,0 ).
Proof. In fact, if on a period there exist 2s points t1 < t2 < · · · < t2s at which Fy,0 has non-zero h also has non-zero values at this values with alternating sign, then, by condition (3), A ∗ Hy,r points with alternating sign. This completes the proof. Lemma 6.2. Let r 2. Then for any y ∈ R, h (A ∗ Hy,r ) (Fy,0 ). h )= Proof. Lemma 6.2 is an analogue of Lemma 5.5 from the paper of Babenko [5]. Let (A ∗Hy,r 2s. Then by virtue of Rolle’s theorem and our method of enumerating zeros on a period, there
910
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
h ) . However, between neighboring zeros of exist 2s different zeros for the function (A ∗ Hy,r h h (A ∗ Hy,r ) , the function (A ∗ Hy,r ) alternates its sign at least once. Applying Rolle’s theorem we obtain h h h
(A ∗ Hy,0 ) · · · (A ∗ Hy,r−2 ) = ((A ∗ Hy,r ) ) 2s.
From property (3.1) of the function A (x) we conclude that h )2s.
(Hy,0
Let us ensure that functions g0 (x) and g0 (x + y) satisfy conditions of Theorem 7. To this end it is suffices to verify that the function g0h has exactly 2l sign changes on a period. By condition (3), the function (A ∗ grh )(x) − (A ∗ grh )(x1 ) changes its sign at nodes of f0 . This implies that this function has exactly 2l sign changes. Hence, due to property (3.1) the difference grh (x) − grh (x1 ) has at least 2l sign changes on a period. However, by Rolle’s theorem,
(g0h )2l. Finally, by Lemma 4.1, the function g0 has at least 2l sign changes. At the same time, due to condition (2), g0 ∈ Slr (, ). This provides that g0 and consequently g0h have exactly 2l sign changes on a period. Thus, functions g0 (x) and g0 (x + y) satisfy conditions of Theorem 7. Applying Theorem 7 we conclude that
(Hy,0 )2s. As a consequence, there exist 2s points t1 , . . . , t2s on a period such that Hy,0 attains non-zero values at these points and alternates its sign when an argument passing from tj to tj +1 . Because of (6.7), we have h
(A ∗ Fy,r )2s.
Applying Rolle’s theorem and property (3.1) yields h h h h (A ∗ Hy,r ) = 2s (A ∗ Fy,r ) (A ∗ Fy,0 ) (Fy,0 ).
(6.8)
Functions f0 (x) and f0 (x + y) satisfy conditions of Theorem 7. In fact, we have already established that
(g0 ) = 2l. By relation (6.7)
((A ∗ frh )(·) − ) = 2l. Hence, applying property (3.1) and Rolle’s theorem we obtain
(f0h )2l.
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
911
However, (f0h ) (f0 ) (Lemma 4.1), and since (f0 ) = 2l from the definition of f0 we conclude that
(f0h ) = 2l. Finally, applying Theorem 7 yields h
(Fy,0 ) (Fy,0 ).
Comparing (6.8) with the latter inequality we obtain h (A ∗ Hy,r ) (Fy,0 ).
Proof of Theorem 14. Due to Lemmas 6.1 and 6.2 we conclude that h h (A ∗ Hy,r ) = (A ∗ Hy,r ) = (Fy,0 ) h ) (A ∗ H h ) for any y ∈ R for which A ∗ H h and F as (A ∗ Hy,r y,0 are not identically y,r y,r zero. Thus, every non-identically zero difference must have only isolated simple zeros. We shall show that it follows the function A ∗ grh is 2n−1 -periodic. Let T be the minimal period for A ∗ grh and let a1 be the point of the smallest local maximum of A ∗ grh . We prove that A ∗ grh has exactly two zeros on the interval [a1 , a1 + T ). Assume to the contrary that the function A ∗ grh has at least four zeros on the interval [a1 , a1 + T ). However, then there is at least one local maximum of A ∗ grh on this interval. Let a2 be the point of local maximum of A ∗ grh nearest to a1 from the right, and a3 the local maximum of A ∗ grh nearest to a1 + T from the left. Moreover, let b1 be the point of local minimum of A ∗ grh nearest to a1 from the right, and b2 the local minimum of A ∗ grh nearest to a1 + T from the left. We shall prove h has a multiple zero at some point on the period. that there exists y ∈ (0, T ) such that A ∗ Hy,r This will show that A ∗ grh has a period y < T , i.e., we obtain a contradiction to the minimality of the period T. h )(a ) = If (A ∗grh )(a1 ) = (A ∗grh )(a2 ), then we can choose y = a2 −a1 < T . Hence, (A ∗Hy,r 1 h h h (A ∗gr )(a1 )−(A ∗gr )(a1 +a2 −a1 ) = 0 and (A ∗Hy,r ) (a1 ) = 0. This provides a1 is a multiple h . Now assume (A ∗ g h )(a ) > (A ∗ g h )(a ) and (A ∗ g h )(a ) > (A ∗ g h )(a ). zero of A ∗ Hy,r 2 1 3 1 r r r r Let us consider the values (A ∗ grh )(b1 ) and (A ∗ grh )(b2 ). If they are equal, then we can choose y = b2 − b1 . Without loss of generality, we may assume (A ∗ grh )(b2 ) > (A ∗ grh )(b1 ). Hence, there exist c1 ∈ (a1 , b1 ) and c2 ∈ (a3 , b2 ) such that (A ∗ grh )(c1 ) = (A ∗ grh )(b2 ) and (A ∗ grh )(c2 ) = (A ∗ grh )(a1 ). Let us show that there exist ∈ [a1 , c1 ] and ∈ [c2 , b2 ] such that (A ∗ grh )( ) = (A ∗ grh )() and (A ∗ grh ) ( ) = (A ∗ grh ) (). It can be easily seen that A ∗ grh decreases on the intervals [a1 , c1 ] and [c2 , b2 ]. In addition, (A ∗grh )(t) attains every value from the interval [(A ∗grh )(b2 ), (A ∗grh )(a1 )], when t ∈ [a1 , c1 ]. Similarly, (A ∗ grh )(t) attains every value from the interval [(A ∗ grh )(b2 ), (A ∗ grh )(a1 )], when t ∈ [c2 , b2 ]. Therefore, there exist functions 1 = (A ∗grh |[a1 ,c1 ] )−1 and 2 = (A ∗grh |[c2 ,b2 ] )−1 , defined on the interval [(A ∗ grh )(b2 ), (A ∗ grh )(a1 )], which are continuously differentiable. Then limx→x0 1 (x) = ∞, when x0 = (A ∗grh )(a1 ), and is finite, when x0 = (A ∗grh )(b2 ). In addition, limx→x0 2 (x) = ∞, when x0 = (A ∗ grh )(b2 ), and is finite, when x0 = (A ∗ grh )(a1 ). Thus, there exists w ∈ [(A ∗ grh )(b2 ), (A ∗ grh )(a1 )] such that 1 (w) = 2 (w). Hence, there exist
∈ (a1 , c1 ) and ∈ (c2 , b2 ) such that (A ∗ grh )( ) = (A ∗ grh )() = w and
(A ∗ grh ) ( ) =
1 1 (w)
=
1 2 (w)
= (A ∗ grh ) ().
912
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
Then y = − < T is a period of A ∗ grh , which is impossible. This implies A ∗ grh has exactly two zeros on [a1 , a1 + T ). Since A ∗ grh has 2l zeros on [0, 2), from the last note we have that T = 2l −1 . As a consequence A ∗ grh has period 2l −1 . However, then both f0 and A ∗ frh are 2l −1 -periodic, so that A ∗ frh = A ∗ hl,r;, up to a translation of the argument. Theorem 14 is proved. To prove Theorem 13 it remains to show that E0 (A ∗ hn,r;, )1;, < E0 (A ∗ hl,r;, )1;,
(6.9)
as soon as l < n. The proof falls naturally into four parts. Lemma 6.3. Let l < n, , > 0 and r = 1, 2, . . . . Then for an arbitrary x ∈ [0, 2), min(A ∗ hl,r;, )(t) < (A ∗ hn,r;, )(x) < max(A ∗ hl,r;, )(t). t
t
(6.10)
Proof. We shall prove the second inequality of (6.10). The first one can be established similarly. From Lemma 3.4 we have that (A ∗ n,r;, )(x) < max(A ∗ l,r;, )(t) for an arbitrary t
x ∈ [0, 2). Let y, z ∈ R be such that max(A ∗ hn,r;, )(t) = (A ∗ hn,r;, )(y) t
and max(A ∗ hl,r;, )(t) = (A ∗ hl,r;, )(z). t
Let us consider the function f (t) = (A ∗ l,r;, )(z + t) − (A ∗ n,r;, )(y + t). It follows that f (−h) = f (h), and there exists a point ∈ [−h, h] such that f ( ) > 0. It can be easily seen that f does not have sign changes on [−h, h] when f (h) > 0. Then f (t) > 0 for every t ∈ [−h, h] and h 1 h h h f (0) = (A ∗ l,r;, )(z) − (A ∗ n,r;, )(y) = f (t) dt > 0. 2h −h Now we shall consider the case f (h) < 0. Let the point 0 ∈ (−h − 2/n, −h) be such that (A ∗ l,r;, )(0 ) = (A ∗ l,r;, )(0 + 2/n). Then f has exactly two sign changes on the interval [0 , 0 + 2/n]. Therefore, f h (0) = (A ∗ hl,r;, )(z) − (A ∗ hn,r;, )(y) h +2/n 1 1 f (t) dt f (t) dt > 0, = 2h −h 2h which can be easily verified. This completes the proof.
Lemma 6.4. Let , ∈ R be such that (A ∗ hn,r;, )( ) = (A ∗ hl,r;, )(). Then |(A ∗ hn,r−1;, )( )| |(A ∗ hl,r−1;, )()| as soon as (A ∗ hn,r−1;, )( ) · (A ∗ hl,r−1;, )() > 0.
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
913
Proof. Let x1 < x2 < · · · < x2l < x1 + 2 be the points of extrema of the function A ∗ hl,r;, . Assume to the contrary, that there exist points , ∈ R such that (A ∗ hn,r;, )( ) = (A ∗ hl,r;, )() and |(A ∗ hn,r−1;, )( )| > |(A ∗ hl,r−1;, )()|, although (A ∗ hn,r−1;, )( ) · (A ∗ hl,r−1;, )() > 0. Applying Theorem 7 we obtain that the function f (t) = (A ∗ hl,r;, )(t) − (A ∗ hn,r;, )(t + − ) has exactly one zero on every
interval [xj , xj +1 ), j = 1, 2l, x2l+1 = x1 + 2. Without loss of generality we may assume that (A ∗ hn,r−1;, )( ) > 0. This implies f () = 0 and f () < 0. Let ∈ [xj , xj +1 ). Thus, there exists at least one zero of f either on the interval (, xj +1 ) or on the interval [xj , ), which is impossible. Lemma 6.5. Let l < n. Then (A ∗ hn,r;, )± ≺ (A ∗ hl,r;, )± .
(6.11)
Proof. Let us consider the rearrangements of the functions (A ∗ hl,r;, )(t) − and (A ∗ hn,r;, )(t) − for an arbitrary ∈ R. Applying Lemma 6.3, yields (A ∗ hn,r;, − , 0) < (A ∗ hl,r;, − , 0)
and
(A ∗ hn,r;, − , 2) > (A ∗ hl,r;, − , 2). Obviously, 2 0
(A ∗ hn,r;,
− , t) dt =
2 0
(A ∗ hl,r;, − , t) = −2.
(6.12)
It follows that (A ∗ hn,r;, − , t) and (A ∗ hl,r;, − , t) intersect at least at one point on [0, 2). We shall prove that there exists exactly one point of intersection of these functions. Assume to the contrary that there exist two points of intersection of (A ∗ hn,r;, − , t) and (A ∗ hl,r;, − , t). Hence, there exist points xn and xl such that (A ∗ hn,r;, − , xn ) = (A ∗ hl,r;, − , xl ) = z and (A ∗ hn,r;, − , xn ) < (A ∗ hl,r;, − , xl ). Let points xn < xn and xl < xl from [0, 2) be such that (A ∗ hn,r;, )(xn ) = (A ∗ hn,r;, )(xn ) = (A ∗ hl,r;, )(xl ) = (A ∗ hl,r;, )(xl ) = z and (A ∗ hn,r;, )(x) > z for every x ∈ (xn , xn ) as well as (A ∗ hl,r;, )(x) > z for ev-
ery x ∈ (xl , xl ), since the equality (A ∗ hn,r;, )(x) = c, c ∈ (minu (A ∗ hn,r;, )(u),
914
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
maxu (A ∗ hn,r;, )(u)), always has exactly 2n solutions on the period. Thus, 1 · n
(A ∗ hn,r;, − , xn ) =
1 1
−
(A ∗ hn,r−1;, )(xn )
1 (A ∗ hn,r−1;, )(xn )
and (A ∗ hl,r;, − , xl ) =
1 · l
1 1 (A ∗ hl,r−1;, )(xl )
−
1
.
(A ∗ hl,r−1;, )(xl )
Applying Lemma 6.4 we obtain (A ∗ hn,r−1;, )(xn ) < (A ∗ hl,r−1;, )(xl )
and
(A ∗ hn,r−1;, )(xn ) > (A ∗ hl,r−1;, )(xl ). This provides (A ∗ hl,r;, − , xl ) =
1 · l
1 1 (A ∗ hl,r−1;, )(x l )
1 · l
−
1 (A ∗ hl,r−1;, )(xl )
1 1 (A ∗ hn,r−1;, )(xn )
−
1 (A ∗ hn,r−1;, )(xn )
(A ∗ hn,r;, − , xn ), which is impossible. Therefore, for every x ∈ [0, 2) x x h ˜ ˜ t) dt, (A ∗ l,r;, − , t) dt (A ∗ hn,r;, − , 0
0
where ˜ = for arbitrary ∈ R, which is the desired conclusion.
(A ∗ hl,r;, , 2). Due to (6.12), it follows immediately that inequality (6.11) holds
Relation (6.9) easily follows from Lemma 6.5. In fact, taking x = 2 and , to be the constant of the best (, )-approximation of the function A ∗ hl,r;, in the space L1 , we can assert that E0 (A ∗ hn,r;, )1;, 2 [((A ∗ hn,r;, )(t) − )+ + ((A ∗ hn,r;, )(t) − )− ] dt 0
=
2
0 2 0
P ((A ∗ hn,r;, )+ , t) dt
+
2
P ((A ∗ hl,r;, )+ , t) dt +
= E0 (A ∗ hl,r;, )1;, .
0 2
0
P ((A ∗ hn,r;, )− , t) dt
P ((A ∗ hl,r;, )− , t) dt
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
915
Thus, the inequality (6.8) holds, which proves Theorem 13. Letting → 0, we obtain that Theorem 6 holds. 7. Optimal interval quadrature formula on classes W r F (Proof of Theorems 1–4) Let n, r = 1, 2, . . . , 0 < h < /n and , > 0. Let x1 < x2 < · · · < xn < x1 + 2. Due h ∈ Sh (Snr (, )) such that it attains equal minimal to Theorem 5, there exists the spline f±, x; ¯ , n values at the points {xj }j =1 . Then, ⎡ ⎤ 2 n ⎣± inf sup f h (t) dt ∓ aj f h (xj )⎦ aj f ∈W r
0
∞;−1 ,−1
2 0
j =1
h h [±f±, x; ¯ , (t) − min(±f±,x; ¯ , (u))] dt.
(7.1)
u
For the formula in with equidistant nodes we have R ± (W r
∞;−1 ,−1
, in ) = R ± (Sh (W r =
2 0
∞;−1 ,−1
), n )
[±hn,r;, (t) − min(±hn,r;, (u))] dt. u
(7.2)
In fact, due to (7.1), it suffices to prove that the left-hand side does not exceed the right-hand side. Let be the constant of best (, )-approximation of Sh (mn,r ). Restricting our consideration to R + (W r −1 −1 , in ) and taking into account (1.4) and Theorem 9, we have ∞; , R + (W r −1 −1 , in ) ∞; ,
= R + (Sh (W r
∞;−1 ,−1
), n ) = E0 (Sh (mn,r ))1;,
2 n 2 h 2j =− Dr −x [ sign(Sh (mn,r ) − )+ − sign(Sh (mn,r )−)− ] dx n n 0 j =1
−
2 · n · min hn,r;, (t) = t n
2 0
[hn,r;, (x) − min hn,r;, (t)] dx. t
Finally, note that from Theorem 6 the equality inf
g∈Snr (,)
E0± (g h )1 = E0± (hn,r;, )1
(7.3)
easily follows. Comparing relations (7.1)–(7.3), we conclude that Theorem 4 holds. Now we are ready to prove Theorem 2. In view of Theorem 10, it suffices to prove Theorem 3, i.e., that for all , > 0 and for any monospline mhi we have E0 (Sh (mn,r ))1;, E0 (mhi )1;, .
(7.4)
916
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
However, by the duality Theorem 9 and the representation (1.4) for R(f h , ), we see that if the monospline mhi corresponds to the quadrature formula i ∈ Kni (h), then E0 (mhi )1;, = R + (W r
∞;−1 , −1
; i ).
From this and from Theorem 4 (since Sh (mn,r ) corresponds to the formula in ), inequality (7.4) follows, and Theorems 3, 4 are proved. Now we shall prove Theorem 1. We obtain from relation (1.4) and Theorems 2, 10 and 11 that
2
R ± (W r F, i ) = R ± (Sh (W r F ), ) = sup
= sup
(±f (t))Sh (m)(t) dt : f ∈ F, f ⊥ 1
0
2 sup
g:(g)=(f )
= sup
2
(±g(t))Sh (m)(t) dt : f ∈ F, f ⊥ 1
0
(±f, t)(Sh (m), t) dt : f ∈ F, f ⊥ 1
0
sup
2
(±f, t)(Sh (mn,r ), t) dt : f ∈ F, f ⊥ 1
0
= R ± (Sh (W r F ), n ) = R ± (W r F, in ). Thus, Theorem 1 is proved. References [1] P.S. Aleksandrov, Combinatorial Topology, OGIZ, Moscow, 1947 (in Russian); P.S. Aleksandrov, Combinatorial Topology, vol. 1, Graylock Press, Albany, NY, 1956 (in English). [2] V.F. Babenko, Nonsymmetric approximations in the spaces of summable functions, Ukrainian Math. J. 34 (1982) 409–416 (in Russian). [3] V.F. Babenko, Inequalities for rearrangements of differentiable periodic functions, problems of approximation and integrating, Dokl. USSR 272 (1983) 1038–1041 (in Russian). [4] V.F. Babenko, On a certain problem of optimization of the approximate integration, Studies on Modern Problems of Summation and Approximation of Functions and their Applications, Dnepropetrovsk University, Dnepropetrovsk, 1984, pp. 3–13 (in Russian). [5] V.F. Babenko, Approximations, widths and optimal quadrature formulae for classes of periodic functions with rearrangement invariant sets of derivatives, Anal. Math. 13 (1987) 15–28. [6] V.F. Babenko, Widths and optimal quadrature formulae for convolution classes, Ukrainian Math. J. 43 (1991) 1135–1148. [7] S.V. Borodachov, On optimization of interval quadrature formulae on some nonsymmetric classes of periodic functions, Bull. Dnepropetrovsk Univ. Math. 4 (1999) 19–24 (in Russian). [8] S.V. Borodachov, On optimization of interval quadrature formulae on some classes of absolutely continuous functions, Bull. Dnepropetrovsk Univ. Math. 5 (2000) 28–34 (in Russian). [9] N.P. Korneichuk, Extremal Problems of Approximation Theory, Nauka, Moscow, 1976, p. 320 (in Russian). [10] N.P. Korneichuk, A.A. Ligun, V.G. Doronin, Approximation with Constraints, Naukova dumka, Kiev, 1982 (in Russian). [11] M.A. Krasnosel’skii, Ya.B. Rutickii, Convex Functions and Orlich Spaces, Fizmatgiz, Moscow, 1958 (in Russian). [12] S.G. Krein, Yu.I. Petunin, E.M. Semenov, Interpolation of Linear Operators, Nauka, Moscow, 1978 (in Russian).
V.F. Babenko, D.S. Skorokhodov / Journal of Complexity 23 (2007) 890 – 917
917
[13] A.L. Kuz’mina, Interval quadrature formulae with multiple node intervals, Izv. Vuzov Math. 7 (1980) 39–44 (in Russian). [14] A.A. Ligun, Exact inequalities for spline-functions and best quadrature formulae for some classes of functions, Math. Zametki 19 (1976) 913–926 (in Russian). [15] G.V. Milovanovic, A.S. Cvetkovic, Gauss–Radau and Gauss–Lobatto interval quadrature rules for Jacobi weight function, Numer. Math. 3 (102) (2006) 523–542. [16] V.P. Motornyi, On the best quadrature formula of the form nk=1 pk f (xk ) for certain classes of periodic differentiable functions, Izv. Akad. Nauk SSSR. Ser. Mat. 38 (1974) 583–614 (in Russian). [17] V.P. Motornyi, On the best interval quadrature formula in the class of functions with bounded rth derivative, East J. Approx. 4 (1998) 459–478. [18] M. Omladich, S. Pahor, S. Suhadolc, On a new type of quadrature formulae, Numer. Math. 25 (1976) 421–426. [19] K.I. Oskolkov, On optimality of quadrature formula with equidistant nodes on the classes of periodic functions, Dokl. Akad Nauk USSR 249 (1979) 49–52 (in Russian). [20] Fr. Pittnauer, M. Reimer, Interpolation mit Intervallfunctionalen, Math. Z. 146 (1976) 7–15. [21] R.N. Sharipov, Best interval quadrature formulae for Lipschitz classes, Constructive Function Theory and Functional Analysis, vol. 4, Kazan University, Kazan, 1983, pp. 124–132 (in Russian). [22] H. Tribel, Theory of Interpolation, Function Spaces, Differential Operators, Mir, Moscow, 1980 (in Russian). [23] A.A. Zhensykbaev, The best quadrature formula for some classes of periodic functions, Izv. Akad Nauk USSR, Ser. Math. 41 (1977) 1110–1124 (in Russian). [24] A.A. Zhensykbaev, Monosplines of minimal norm and the best quadrature formulae, Uspehi Math. Nauk. 36 (1981) 107–159 (in Russian).
Journal of Complexity 23 (2007) 918 – 925 www.elsevier.com/locate/jco
Deterministic constructions of compressed sensing matrices夡 Ronald A. DeVore Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA Received 8 January 2007; accepted 16 April 2007 With high esteem to Professor Henryk Wozniakowski on the occasion of his 60th birthday Available online 4 May 2007
Abstract Compressed sensing is a new area of signal processing. Its goal is to minimize the number of samples that need to be taken from a signal for faithful reconstruction. The performance of compressed sensing on signal classes is directly related to Gelfand widths. Similar to the deeper constructions of optimal subspaces in Gelfand widths, most sampling algorithms are based on randomization. However, for possible circuit implementation, it is important to understand what can be done with purely deterministic sampling. In this note, we show how to construct sampling matrices using finite fields. One such construction gives cyclic matrices which are interesting for circuit implementation. While the guaranteed performance of these deterministic constructions is not comparable to the random constructions, these matrices have the best known performance for purely deterministic constructions. © 2007 Elsevier Inc. All rights reserved. Keywords: Compressed sensing; Sampling; Widths; Deterministic construction
1. Introduction Compressed sensing (CS) offers an alternative to the classical Shannon theory for sampling signals. The Shannon theory models signals as bandlimited and encodes them through their time samples. The Shannon approach is problematic for broadband signals since the high sampling rates cannot be implemented in circuitry. In CS one replaces the bandlimited model of signals by the assumption that the signal is sparse or compressible with respect to some basis or dictionary of wave forms and enlarges the concept of sample to include the application of any linear functional. 夡
This research was conducted while the author was the visiting Texas Instrument Professor at Rice University. E-mail address: [email protected].
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.04.002
R.A. DeVore / Journal of Complexity 23 (2007) 918 – 925
919
Much of the methodology of CS traces back to early work on Gelfand widths and information based complexity (IBC); see [6,5,4] for a discussion of these connections. This paper will be concerned with the discrete CS problem where we are given a discrete signal which is a vector x ∈ RN with N large and we wish to capture x by linear information. This means that we are allowed to sample x by inner products v · x of x with vectors v. We are interested in seeing how well we can do given a budget n < N in the number of samples we are allowed to take. This should be contrasted to the usual paradigm in compression, where one represents the signal with respect to some basis, computes all of its coefficients, but then retains only a small number (in our case n) of the largest of these coefficients to obtain compression. Here we want to see if we can avoid computing all of these coefficients and merely take a compressed number of samples to begin with. If we choose n sampling vectors then our sampling can be represented by an n × N matrix (called a CS matrix) whose rows are the vectors v that have been chosen for the sampling. Thus, the information we extract from x through is the vector y = x which lies in the lower dimensional space Rn . The question becomes: What are good sampling matrices ? To give this question a precise formulation, we need to specify several ingredients. First, what will we allow as decoders of y. That is how will we recover x or an approximation x¯ to x from y. Here we will be very general and consider any mapping from Rn → RN as a potential decoder. The mapping will generally be nonlinear—in contrast to which is assumed to be linear. The problem of having practical, numerically implementable decoders is an important one and to a large extent separates CS from the earlier work on widths and IBC. However, this will not be the concern of this paper. Given that the dimensions n, N of our problem are fixed, we let An,N denote the set of all encoding–decoding pairs (, ) where is an n × N matrix and maps Rn → RN . A second ingredient is how we shall measure distortion. The vector x¯ := (x) will in general not be the same as x. We can measure the distortion x − x¯ in any norm on RN . The typical choices are the N p norms: 1/p N p |x | , 0 < p < ∞, j j =1 (1.1) xNp := maxj =1,...,N |xj |, p = ∞. There are several ways in which we can measure performance of a CS matrix (see [4]). In this paper, we shall restrict our attention to only one method which relates to Gelfand widths. Given a vector x ∈ RN , the performance of the encoding–decoding pair (, ) in the metric of N p is given by E(x, , )Np := x − (x)Np .
(1.2)
Rather than measure the performance on each individual $x$, we shall measure performance on a class $K$. If $K$ is a bounded set contained in $\mathbb{R}^N$, the error of this encoding–decoding on $K$ is given by
$$E(K, \Phi, \Delta)_{\ell_p^N} := \sup_{x \in K} E(x, \Phi, \Delta)_{\ell_p^N}. \tag{1.3}$$
Thus, the error of the class $K$ is determined by the largest error on $K$. The best possible performance of an encoder–decoder is given by
$$E_{n,N}(K)_{\ell_p^N} := \inf_{(\Phi,\Delta) \in \mathcal{A}_{n,N}} E(K, \Phi, \Delta)_{\ell_p^N}. \tag{1.4}$$
We say that an encoder–decoder pair $(\Phi, \Delta) \in \mathcal{A}_{n,N}$ is near optimal on $K$ with constant $M$ if
$$E(K, \Phi, \Delta)_{\ell_p^N} \le M\, E_{n,N}(K)_{\ell_p^N}. \tag{1.5}$$
If $M = 1$ we say the pair is optimal. This is the so-called min–max way of measuring optimality prevalent in approximation theory, information based complexity, and statistics.

Given a set $K$, the optimal performance $E_{n,N}(K)_{\ell_p^N}$ of CS is directly connected with the Gelfand widths of the set $K$. If $K$ is a compact set in $\ell_p^N$, and $n$ is a positive integer, then the Gelfand width of $K$ is by definition
$$d^n(K)_{\ell_p^N} := \inf_Y \sup\{ \|x\|_{\ell_p^N} : x \in K \cap Y \}, \tag{1.6}$$
where the infimum is taken over all subspaces $Y$ of $\mathbb{R}^N$ with codimension $n$. If $K = -K$ and $K + K \subset C_0 K$ for some constant $C_0$, then
$$d^n(K)_{\ell_p^N} \le E_{n,N}(K)_{\ell_p^N} \le C_0\, d^n(K)_{\ell_p^N}, \quad 1 \le n \le N. \tag{1.7}$$
In other words, finding the best performance of encoding–decoding on $K$ is equivalent to finding its Gelfand width. The relation between these two problems is the following. If $(\Phi, \Delta)$ is an encoding–decoding pair for CS on $K$, then the null space $Y$ of $\Phi$ is a space of codimension $n$ which is a candidate for Gelfand widths. Conversely, given any space $Y$ for Gelfand widths, any basis for its orthogonal complement gives a CS matrix $\Phi$ for CS on $K$. Using these correspondences, one easily proves (1.7) (see [4]).

The Gelfand widths of the unit balls $K = U(\ell_q^N)$ in $\ell_p^N$ are known up to multiplicative constants. We highlight only one of these results, for the Gelfand width of $U(\ell_1^N)$ in $\ell_2^N$, which is the deepest result in this field. It states that there exist absolute constants $C_1, C_2$ such that
$$C_1 \sqrt{\frac{\log(N/n)}{n}} \le d^n(U(\ell_1^N))_{\ell_2^N} \le C_2 \sqrt{\frac{\log(N/n)}{n}}. \tag{1.8}$$
The upper estimate in (1.8) was proved by Kashin [8], save for the correct power of the logarithm. Later Garnaev and Gluskin proved the upper and lower bounds in (1.8) (see [7]). The upper bound is proved via random constructions and there remains to this date no deterministic proof of the upper bound in (1.8). In CS, their constructions correspond to random matrices whose entries are independent realizations of a Gaussian or Bernoulli random variable.

Our interest in this paper centers around deterministic constructions of matrices $\Phi$ for CS. We ask how close we can get to the Gelfand width of classes with such constructions. We shall give constructions of matrices using finite fields, which are related to the use of finite fields to prove results on Kolmogorov widths as given in [2]. A related construction using number theory was given by Maiorov [11] (see also [10] for another deterministic construction). Our constructions will not give optimal or near optimal performance, as will be explained later. However, their performance is the best known to the author for deterministic constructions. We shall also consider modifications of this construction so that the resulting matrices $\Phi$ are circulant (each row of $\Phi$ is a certain shift of the previous row with wrapping). The importance of circulant matrices is that they can be more readily implemented in circuits.

An outline of our paper is the following. In the next section, we discuss the restricted isometry property (RIP) introduced by Candès and Tao [3] and how this property guarantees upper bounds for the performance of CS matrices on classes. The following section gives our construction of CS matrices and the proof that they satisfy a RIP. The final section gives some concluding remarks.
2. Some simple results about CS matrices

How can we decide if a given matrix $\Phi$ is good for CS? Candès and Tao [3] have introduced a condition on matrices which they call the restricted isometry property, and show that whenever a matrix satisfies this property, we can obtain estimates for its performance on sets $K = U(\ell_q^N)$. For the remainder of this paper, $\|\cdot\|$ will always denote an $\ell_2$ norm. All other norms will be subscripted.

If $k \ge 1$ is an integer, we denote by $\Sigma_k$ the set of all vectors $x \in \mathbb{R}^N$ such that at most $k$ of the coordinates of $x$ are nonzero. In other words, $\Sigma_k$ is the union of all the $k$-dimensional spaces $X_T$, $\#(T) = k$, where $T \subset \{1, \dots, N\}$ and $X_T$ is the linear space of all $x \in \mathbb{R}^N$ which vanish outside of $T$. Given any vector $x \in \mathbb{R}^N$, we define
$$\sigma_k(x)_{\ell_p^N} := \inf_{z \in \Sigma_k} \|x - z\|_{\ell_p^N}, \tag{2.1}$$
which is the error of $k$-term approximation to $x$ in $\ell_p^N$. Following Candès and Tao, we say that $\Phi$ has the RIP of order $k$ and constant $\delta \in (0,1)$ if
$$(1 - \delta)\|x\|^2 \le \|\Phi x\|^2 \le (1 + \delta)\|x\|^2, \quad x \in \Sigma_k. \tag{2.2}$$
Notice that $\Phi x \in \mathbb{R}^n$, so that $\|\Phi x\|$ is the $\ell_2^n$ norm. To get a better understanding of this property, consider the $n \times \#(T)$ matrices $\Phi_T$ formed by the columns of $\Phi$ with indices from $T$. Then (2.2) is equivalent to showing that the Grammian matrices
$$A_T := \Phi_T^t \Phi_T, \quad \#(T) = k, \tag{2.3}$$
are bounded and boundedly invertible on $\ell_2$ with bounds as in (2.2), uniformly for all $T$ such that $\#(T) = k$. The matrix $A_T$ is symmetric and nonnegative definite, so this is equivalent to each of these matrices having their eigenvalues in $[1 - \delta, 1 + \delta]$.

The importance of the RIP is seen from the following theorem of Candès and Tao [3] (reinterpreted in [4]). If the $n \times N$ matrix $\Phi$ satisfies RIP of order $3k$ for some $\delta \in (0,1)$, then there is a decoder $\Delta$ such that for any vector $x \in \mathbb{R}^N$, we have
$$\|x - \Delta(\Phi x)\|_{\ell_2^N} \le C\, \frac{\sigma_k(x)_{\ell_1^N}}{\sqrt{k}}. \tag{2.4}$$
This means that the bigger the value of $k$ for which we can verify the RIP, the better guarantee we have on the performance of $\Phi$. As an example, let us return to the case of the set $K = U(\ell_1^N)$. If an $n \times N$ matrix $\Phi$ has the RIP of order $k$, then (2.4) shows that
$$d^n(U(\ell_1^N))_{\ell_2^N} \le E_{n,N}(U(\ell_1^N))_{\ell_2^N} \le C/\sqrt{k}. \tag{2.5}$$
To get the optimal result we want $\Phi$ to satisfy RIP of order $k = n/\log(N/n)$. Matrices of this type can be constructed using random variables such as Gaussian or Bernoulli as their entries (see [1] for example). However, there are no deterministic constructions for $k$ of this size. In the next section, we shall give a deterministic construction of matrices which satisfy RIP for a more modest range of $k$.
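The RIP condition (2.2)–(2.3) can be probed numerically: for a support $T$ one forms $A_T = \Phi_T^t \Phi_T$ and inspects its extreme eigenvalues. The following Python sketch is illustrative only and not part of the paper; the function name is hypothetical, and since an exhaustive search over all $\#(T) = k$ supports is exponential, random sampling of supports yields only a lower bound on the true RIP constant. Here it is applied to a random Bernoulli $\pm 1/\sqrt{n}$ matrix of the kind mentioned above:

    import numpy as np

    def rip_constant_lower_bound(Phi, k, n_trials=200, seed=0):
        # Sample supports T with #(T) = k and track how far the
        # eigenvalues of A_T = Phi_T^t Phi_T deviate from 1.
        rng = np.random.default_rng(seed)
        N = Phi.shape[1]
        delta = 0.0
        for _ in range(n_trials):
            T = rng.choice(N, size=k, replace=False)
            eigs = np.linalg.eigvalsh(Phi[:, T].T @ Phi[:, T])
            delta = max(delta, 1.0 - eigs[0], eigs[-1] - 1.0)
        return delta

    n, N, k = 64, 256, 5
    Phi = np.random.default_rng(1).choice([-1.0, 1.0], size=(n, N)) / np.sqrt(n)
    print(rip_constant_lower_bound(Phi, k))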
3. Deterministic constructions of CS matrices

We shall give a deterministic construction of matrices which satisfy the RIP. The vehicle for this construction is finite fields $F$. For simplicity of this exposition, we shall consider only the case that $F$ has prime order and hence is the field of integers modulo $p$. The results we prove can be established for other finite fields as well.

Given $F$, we consider the set $F \times F$ of ordered pairs. Note that this set has $n := p^2$ elements. Given any integer $0 < r < p$, we let $\mathcal{P}_r$ denote the set of polynomials of degree at most $r$ on $F$. There are $N := p^{r+1}$ such polynomials. Any polynomial $Q \in \mathcal{P}_r$ can be represented as $Q(x) = a_0 + a_1 x + \cdots + a_r x^r$ where the coefficients $a_0, \dots, a_r$ are in $F$. If we consider this polynomial as a mapping of $F$ to $F$, then its graph $G(Q)$ is the set of ordered pairs $(x, Q(x))$, $x \in F$. This graph is a subset of $F \times F$. We order the elements of $F \times F$ lexicographically as $(0,0), (0,1), \dots, (p-1, p-1)$. For any $Q \in \mathcal{P}_r$, we denote by $v_Q$ the vector indexed on $F \times F$ which takes the value one at any ordered pair from the graph of $Q$ and takes the value zero otherwise. Note that there are exactly $p$ ones in $v_Q$: one in the first $p$ entries, one in the next $p$ entries, and so on.

Theorem 3.1. Let $\Phi_0$ be the $n \times N$ matrix with columns $v_Q$, $Q \in \mathcal{P}_r$, with these columns ordered lexicographically with respect to the coefficients of the polynomials. Then, the matrix $\Phi := \frac{1}{\sqrt{p}}\Phi_0$ satisfies the RIP with $\delta = (k-1)r/p$ for any $k < p/r + 1$.

Proof. Let $T$ be any subset of column indices with $\#(T) = k$ and let $\Phi_T$ be the matrix created from $\Phi$ by selecting these columns. The Grammian matrix $A_T := \Phi_T^t \Phi_T$ has entries $\frac{1}{p} v_Q \cdot v_R$ with $Q, R \in \mathcal{P}_r$. The diagonal entries of $A_T$ are all one. For any $Q, R \in \mathcal{P}_r$ with $Q \ne R$, there are at most $r$ values of $x \in F$ such that $Q(x) = R(x)$. So any off-diagonal entry of $A_T$ is $\le r/p$. It follows that the off-diagonal entries in any row or column of $A_T$ have sum $\le (k-1)r/p = \delta < 1$ whenever $k < p/r + 1$. Hence we can write
$$A_T = I + B_T, \tag{3.1}$$
where $\|B_T\| \le \delta$, where the norm is taken on either of $\ell_1$ or $\ell_\infty$. By interpolation of operators, the norm of $B_T$ is $\le \delta$ as an operator from $\ell_2$ to $\ell_2$. It follows that the spectral norm of $A_T$ is $\le 1 + \delta$ and that of its inverse is $\le (1 - \delta)^{-1}$. This verifies (2.2) and proves the theorem.

Notice that since $n = p^2$ and $N = p^{r+1}$, $\log(N/n) = (r-1)\log p = (r-1)\log(n)/2$, we have constructed matrices that satisfy RIP for the range $k - 1 < p/r < \sqrt{n}\,\log n / (2\log(N/n))$.
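The construction of Theorem 3.1 is easy to implement directly. The Python sketch below (illustrative names, not code from the paper) builds $\Phi$ for small $p$ and $r$ and verifies the diagonal-dominance estimate of the proof, namely that the Gram diagonal is one and every off-diagonal Gram entry is at most $r/p$:

    import numpy as np
    from itertools import product

    def devore_matrix(p, r):
        # Columns are indicators of graphs of polynomials of degree <= r
        # over F_p, normalized by 1/sqrt(p); rows are indexed by the
        # lexicographically ordered pairs (x, y) in F x F.
        n, N = p * p, p ** (r + 1)
        Phi0 = np.zeros((n, N))
        for col, coeffs in enumerate(product(range(p), repeat=r + 1)):
            for x in range(p):
                Qx = sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p
                Phi0[x * p + Qx, col] = 1.0
        return Phi0 / np.sqrt(p)

    p, r = 5, 2
    Phi = devore_matrix(p, r)
    G = Phi.T @ Phi
    print(np.allclose(np.diag(G), 1.0))
    print(np.abs(G - np.diag(np.diag(G))).max() <= r / p + 1e-12)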
Our next goal is to modify the above construction to obtain circulant matrices $\Phi = (\phi_{i,j})$. A circulant matrix has the property that
$$\phi_{i+1,\, j+\mu} = \phi_{i,j}, \tag{3.2}$$
where $\mu := N/n$ and the arithmetic on indices is done modulo $N$. Hence a circulant matrix is determined by its first $\mu$ columns. Once these columns have been specified, all other entries are determined by imposing condition (3.2). Each other column will be a cyclic shift of one of the first $\mu$ columns. As in the previous theorem, our construction will use the vectors $v_Q$, $Q \in \mathcal{P}_r$, to generate the first $\mu$ columns. However, now we must be more selective in which polynomials we choose for these columns.

Let us observe how we fill out the matrix $\Phi$ from its first $\mu$ columns. The next block of $\mu$ columns is each gotten by a cyclic shift. For example, each column with index $m + \mu$
with $m \in \{1, \dots, \mu\}$ is obtained by taking the entries in column $m$ and shifting them down one, while the last entry in the $m$th column is moved to the top position. We continue in this fashion to the next block of $\mu$ columns and so forth. There will be $n = p^2$ such blocks.

Consider the $j$th block, $0 \le j \le n - 1$. We can write $j = a + bp$ with $a, b \in \{0, \dots, p-1\}$. Each column in this block will be a cyclic shift of the corresponding column $v_Q$ from the first block. Recall that we index the rows of $\Phi$ by $(x, y) \in F \times F$. The entry in the $(x, y)$ position of $v_Q$ will now occupy the position $(x', y')$ where $y' = y + j = y + a$ modulo $p$, and $x' = x + b$ modulo $p$ or $x' = x + b + 1$ modulo $p$. Since the ones in $v_Q$ occur precisely in the positions $(x, Q(x))$, the new ones in the corresponding column of block $j$ will occur in positions $(x', y')$ where $y' = Q(x) + a$ modulo $p$, and either $x' = x + b$ modulo $p$ or $x' = x + b + 1$ modulo $p$.

To describe the set of polynomials we shall use for the columns, we define the equivalence relation that two polynomials $P, Q$ of degree $\le r$ over $F$ are equivalent (written $P \equiv Q$) if there exist $a, b \in F$ such that
$$P(x) = Q(x + a) + b, \quad \forall x \in F. \tag{3.3}$$
Let us see what the structure of such an equivalence class is. For this, we use the following simple lemma.

Lemma 3.2. If $f$ is any function on $F$ for which there exist $a, b \in F$, not both zero, such that $f(x) = f(x + a) + b$ for all $x \in F$, then $f$ is a linear function.

Proof. It follows that $f(a) = f(0) - b$ and more generally $f(ka) = f(0) - kb$, for each $k \in F$. If $a \ne 0$, then $ka$, $k = 1, \dots, p$, exhaust $F$, and so $f(x) = f(0) - a^{-1} b x$ for all $x \in F$, so that $f$ is linear. If $a = 0$, then $f(x) = f(x) + b$ and hence $b = 0$ as well.

Let us now consider the equivalence classes. One equivalence class consists of all the constant functions; there are $p$ functions in this equivalence class. For each $P(x) = \alpha x$ with $\alpha \ne 0$, its equivalence class will consist of all linear functions of the form $\alpha x + b$, $b \in F$; there are again $p$ functions in each of these equivalence classes. Finally, if $P$ is a polynomial which is not linear, then its equivalence class will consist of the $p^2$ polynomials $P(x + a) + b$ corresponding to the $p^2$ choices of $a, b$ (see Lemma 3.2).

Let $\Lambda_r$ consist of a set of representatives from each of the equivalence classes which do not consist of linear polynomials. That is, we choose one representative from each of these equivalence classes, except that we never take polynomials of degree $\le 1$. Let us see what the cardinality of $\Lambda_r$ is. There are $p^{r+1}$ polynomials of degree $\le r$ and $p^2$ linear polynomials. So there are $p^{r+1} - p^2$ polynomials which are not linear. They are divided into sets of size $p^2$ (the equivalence classes). Hence, $\mu := \#(\Lambda_r) = p^{r-1} - 1$. Now, there are $n = p^2$ cyclic shifts, so $N = p^{r+1} - p^2$. In going further in this section, let $\Phi_0$ denote the circulant matrix whose first $\mu$ columns are the $v_Q$, $Q \in \Lambda_r$, written in lexicographic order. Our next lemma bounds the inner products of any two columns of $\Phi_0$.

Lemma 3.3. For any two columns $v \ne w$ from the matrix $\Phi_0$, we have
$$|v \cdot w| \le 4r. \tag{3.4}$$
Proof. Each of the columns $v, w$ of $\Phi_0$ can be described as a cyclic shift of vectors $v_Q, v_R$ with $Q, R \in \Lambda_r$. As we have observed above, there are integers $a_0, b_0$ (depending only on $v$) such that any one in column $v$ occurs at a position $(x', y')$ if and only if $x' = x + b_0 + \epsilon_0$ and $y' = Q(x) + a_0$
with $x \in F$ and $\epsilon_0 \in \{0, 1\}$. Similarly, a one occurs in column $w$ at position $(x'', y'')$ if and only if $x'' = \bar{x} + b_1 + \epsilon_1$ and $y'' = R(\bar{x}) + a_1$ with $\bar{x} \in F$ and $\epsilon_1 \in \{0, 1\}$. The inner product $v \cdot w$ counts the number of row positions for which there is a one in each of these two columns. That is, the number of solutions to $x + b_0 + \epsilon_0 = \bar{x} + b_1 + \epsilon_1$ and $Q(x) + a_0 = R(\bar{x}) + a_1$ with $x, \bar{x} \in F$ and $\epsilon_0, \epsilon_1 \in \{0, 1\}$.

Consider first the case when $Q \ne R$. We fix one of the four possibilities for $\epsilon_0, \epsilon_1$. These equations mean that $\bar{x} = x + b$ and $R(x + b) = Q(x) + a$ with $b = b_0 - b_1 + \epsilon_0 - \epsilon_1$ and $a = a_0 - a_1$. Since $R \ne Q$, we know that $R(\cdot + b)$ is not identical to $Q(\cdot) + a$ because these $R$ and $Q$ are not equivalent. In this case the only possible $x$ which can satisfy the above are the zeros of the nonzero polynomial $R(\cdot + b) - Q(\cdot) - a$. Thus there are at most $r$ such $x$ because this latter polynomial has degree $\le r$. Since there are four possibilities for $(\epsilon_0, \epsilon_1)$, we have $|v \cdot w| \le 4r$ as desired.

Now consider the case when $R = Q$ and any one of the four possible values for $(\epsilon_0, \epsilon_1)$. Similar to the case just handled, we have that $\bar{x} = x + b$ and $Q(x + b) - a = Q(x)$. We are interested in the number of $x$ for which this can happen. As long as these two polynomials are not identical, this can happen at most $r$ times. But we know that they can only be identical if $Q$ is linear (see Lemma 3.2), and we know linear polynomials are not in $\Lambda_r$. Thus, even in the case $Q = R$ we also have that $|v \cdot w|$ is at most $4r$.

Theorem 3.4. The cyclic matrix $\Phi := \frac{1}{\sqrt{p}}\Phi_0$ has the RIP (2.2) with $\delta = 4(k-1)r/p$ whenever $k - 1 < p/(4r)$.
Proof. The proof is the same as that of Theorem 3.1.
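Enumerating the equivalence classes (3.3) is straightforward for small parameters. The Python sketch below is illustrative and not from the paper; it represents each polynomial by its table of values on $F$ (which determines it uniquely since $r < p$), skips the linear ones, and selects one representative per class; the count matches $\mu = p^{r-1} - 1$:

    from itertools import product

    def poly_values(coeffs, p):
        return tuple(sum(a * pow(x, i, p) for i, a in enumerate(coeffs)) % p
                     for x in range(p))

    def is_linear(vals, p):
        alpha = (vals[1] - vals[0]) % p
        return all(vals[x] == (vals[0] + alpha * x) % p for x in range(p))

    def representatives(p, r):
        seen, reps = set(), []
        for coeffs in product(range(p), repeat=r + 1):
            vals = poly_values(coeffs, p)
            if is_linear(vals, p) or vals in seen:
                continue
            reps.append(vals)
            # mark the whole class {Q(x + a) + b : a, b in F} as seen
            for a, b in product(range(p), repeat=2):
                seen.add(tuple((vals[(x + a) % p] + b) % p for x in range(p)))
        return reps

    p, r = 5, 2
    print(len(representatives(p, r)) == p ** (r - 1) - 1)

The first $\mu$ columns of the circulant $\Phi_0$ are then the vectors $v_Q$ attached to these representatives, and the remaining columns are the cyclic shifts described above.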
Notice that since $n = p^2$ and $N = p^{r+1} - p^2$, $\log(N/n) < (r-1)\log p = (r-1)\log(n)/2$, we have constructed matrices that satisfy RIP for the range $k - 1 < p/(4r) < \sqrt{n}\,\log n / (8\log(N/n))$.

4. Concluding remarks

The matrices of our two theorems satisfy RIP of order $k$ for $k \le C\sqrt{n}\,\log n / \log(N/n)$, which is the largest range of $k$ that is known to the author for deterministic constructions. However, it falls far short of the range $k \le Cn/\log(N/n)$ known for probabilistic constructions. The fact is that we know from probabilistic constructions that there exist $n \times N$ matrices with entries $\pm 1/\sqrt{n}$ that satisfy RIP for the larger range $k \le Cn/\log(N/n)$. We just cannot explicitly describe one of these matrices when $N$ and $n$ are large. It is therefore very interesting to try to obtain a larger range of $k$ with deterministic methods and to understand if there are any essential limitations to deterministic methods.

Let us point out some of the deficiencies in our approach. First, we begin by asking what are good compressed sensing matrices. The restricted isometry property is just a sufficient condition to guarantee that a matrix has good performance on classes. Two matrices can have exactly the same performance on classes and yet one will satisfy RIP and the other not. So there may be a more direct avenue to constructing good CS matrices that does not go through RIP.

The RIP is a condition on the spectral norm of the matrices $A_T = \Phi_T^t \Phi_T$. We have bounded the spectral norm by bounding the $\ell_1$ and $\ell_\infty$ norms (which are much easier to handle than the spectral norm) and then using interpolation. The bounds we have gotten on $k$ appear to be the best we could expect to get by this approach. Indeed, with an eye toward results on the distribution of scalar products of unit vectors (see [9, Lemma 4.1, Chapter 14]), it seems that we could not
improve much on the bounds we gave for diagonal dominance. Of course, the spectral norm of a matrix can be much smaller than the $\ell_1, \ell_\infty$ norms. Thus it may be that estimating the spectral norm directly is the way to go to obtain stronger results than ours.

Acknowledgments

The author thanks the Electrical and Computer Engineering Department at Rice, in particular Professor Rich Baraniuk, for their great hospitality. This research was supported by the Office of Naval Research Contracts ONR-N00014-03-1-0051, ONR/DEPSCoR N00014-03-1-0675, and ONR/DEPSCoR N00014-05-1-0715; and the National Science Foundation Grant DMS-354707.

References

[1] R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, The Johnson–Lindenstrauss lemma meets compressed sensing, Constr. Approx., to appear.
[2] C. de Boor, R. DeVore, K. Hoellig, Mixed norm n-widths, Proc. Amer. Math. Soc. 80 (1980) 577–583.
[3] E. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51 (2005) 4203–4215.
[4] A. Cohen, W. Dahmen, R. DeVore, Compressed sensing and best k-term approximation, submitted for publication.
[5] R. DeVore, Optimal Computation, vol. I, Proceedings of ICM 2006, Madrid, European Mathematical Society Publishing House, 2007, to appear.
[6] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory 52 (2006) 1289–1306.
[7] E.D. Gluskin, Norms of random matrices and widths of finite-dimensional sets, Math. USSR Sb. 48 (1984) 173–182.
[8] B. Kashin, The widths of certain finite dimensional sets and classes of smooth functions, Izvestia 41 (1977) 334–351.
[9] G.G. Lorentz, M. von Golitschek, Yu. Makovoz, Constructive Approximation: Advanced Problems, Springer Grundlehren, vol. 304, Springer, Berlin, Heidelberg, 1996.
[10] V. Maiorov, Trigonometric widths of Sobolev classes in the space Lq, Math. Zametki 40 (1986) 161–173.
[11] V. Maiorov, Linear diameters of Sobolev classes, Soviet Dokl. 43 (1991) 1127–1130.
Journal of Complexity 23 (2007) 926 – 936 www.elsevier.com/locate/jco
On linear codes with large weights simultaneously for the Rosenbloom–Tsfasman and Hamming metrics

M.M. Skriganov
Steklov Mathematical Institute, Fontanka 27, St. Petersburg 191023, Russia

Received 26 January 2007; accepted 26 February 2007
Available online 24 March 2007

Dedicated to Henryk Woźniakowski on the occasion of his 60th birthday

Supported by RFFI (Project No. 05-01-00935). E-mail address: [email protected].
Abstract We show that maximum distance separable (MDS) codes, or more generally nearly MDS codes, for the Rosenbloom–Tsfasman metric can meet the Gilbert–Varshamov bound for their Hamming weights. The proof is based on a careful analysis of orbits of a linear group preserving the Rosenbloom–Tsfasman metric. © 2007 Elsevier Inc. All rights reserved. Keywords: Coding theory with non-Hamming metrics
1. Introduction

A new approach to the theory of uniformly distributed point sets was developed in the recent papers [1,8,9]. This approach crucially depends on a specific version of coding theory where, unlike the classical coding theory, two basic metrics are involved. One of them is the standard Hamming metric, while the other one is the Rosenbloom–Tsfasman metric introduced in [7]. In the present paper, we address an aspect of such a version of coding theory.

Suppose that a linear code $C \subset F_q^\ell$ over a finite field $F_q$ with a large Rosenbloom–Tsfasman weight $\rho(C)$ is given. What can one say about the Hamming weight $\delta(C)$ of this code? Simple examples show that in general $\delta(C)$ is not controlled by $\rho(C)$. However, it turns out (see our main Theorem 3.1 in Section 3) that if one considers the orbit of the code $C$ under the action of a linear group preserving the weight $\rho(C)$, then a portion of codes on this orbit have large Hamming weights. Furthermore, if $C$ is a maximum distance separable (briefly MDS) code, or more generally a nearly MDS code for the Rosenbloom–Tsfasman metric, then there exist codes on
the orbit of $C$ which meet the Gilbert–Varshamov bound for their Hamming weights (see Theorem 3.2 below).

We conjecture that point distributions constructed in terms of such specific codes have a series of remarkable properties. The author hopes to consider these intriguing questions in forthcoming papers.

The present paper is organized as follows. In Section 2, preliminary material on coding theory is discussed. Our main Theorem 3.1 is given in Section 3. This section also contains asymptotic consequences of Theorem 3.1, given in Theorem 3.2. In Section 4, we consider the structure of orbits of a group preserving the Rosenbloom–Tsfasman metric, and relying on this consideration, we complete the proof of Theorem 3.1 in Section 5.

2. Preliminaries

Let $\mathrm{Mat}_{n,s}(F_q)$ denote the linear space of all matrices with $n$ rows and $s$ columns with entries from a fixed finite field $F_q$ of $q$ elements. Clearly, the space $\mathrm{Mat}_{n,s}(F_q)$ is a direct product of $n$ copies of the space $\mathrm{Mat}_{1,s}(F_q)$, so that
$$\mathrm{Mat}_{n,s}(F_q) = \underbrace{\mathrm{Mat}_{1,s}(F_q) \times \cdots \times \mathrm{Mat}_{1,s}(F_q)}_{n} \simeq F_q^\ell, \quad \ell = ns. \tag{2.1}$$
By definition (cf. [4]), the Hamming weight $\delta(\Phi)$, $\Phi \in \mathrm{Mat}_{n,s}(F_q)$, is equal to the number of nonzero entries of the matrix $\Phi$. In this case, $\delta(\Phi_1 - \Phi_2)$ defines the Hamming metric on the space $\mathrm{Mat}_{n,s}(F_q)$.

The Rosenbloom–Tsfasman weight $\rho(\Phi)$, $\Phi \in \mathrm{Mat}_{n,s}(F_q)$, is defined as follows. At first, let $n = 1$ and $\phi = (\phi_1, \dots, \phi_s) \in \mathrm{Mat}_{1,s}(F_q)$. Then, we put $\rho(0) = 0$, and
$$\rho(\phi) = \max\{i : \phi_i \ne 0\} \quad \text{for } \phi \ne 0. \tag{2.2}$$
Now, let
$$\Phi = (\phi_1, \dots, \phi_n) = \begin{pmatrix} \phi_1 \\ \vdots \\ \phi_n \end{pmatrix} \in \mathrm{Mat}_{n,s}(F_q), \quad \phi_j \in \mathrm{Mat}_{1,s}(F_q),\ 1 \le j \le n.$$
Then, we put
$$\rho(\Phi) = \sum_{j=1}^n \rho(\phi_j). \tag{2.3}$$
It is easy to check that $\rho(\Phi) = 0$ if and only if $\Phi = 0$, and that the weights (2.2) and (2.3) satisfy the triangle inequality. Thus, $\rho(\Phi_1 - \Phi_2)$ defines the Rosenbloom–Tsfasman metric on the space $\mathrm{Mat}_{n,s}(F_q)$. Note that definition (2.2) implies an even stronger inequality
$$\rho(\phi_1 - \phi_2) \le \max\{\rho(\phi_1), \rho(\phi_2)\}, \quad \phi_1, \phi_2 \in \mathrm{Mat}_{1,s}(F_q). \tag{2.4}$$
Thus, the Rosenbloom–Tsfasman metric for $n = 1$ is an ultrametric. It is obvious that
$$\delta(\Phi) \le \rho(\Phi) \le s\,\delta(\Phi), \tag{2.5}$$
and these inequalities cannot be improved on the whole space $\mathrm{Mat}_{n,s}(F_q)$. Thus, for large $s$ the metric $\rho$ is stronger than $\delta$. For $s = 1$ both metrics coincide. It is remarkable that fundamental concepts related to the Hamming metric can be very naturally extended to the Rosenbloom–Tsfasman metric (see [2,7,8]).

Following [8], we introduce a group $T_s^n$ of linear transformations on $\mathrm{Mat}_{n,s}(F_q)$ preserving the weight $\rho$. At first, let $n = 1$, $\phi = (\phi_1, \dots, \phi_s) \in \mathrm{Mat}_{1,s}(F_q)$, and let $T_s$ denote the group of all lower triangular $s \times s$ matrices over $F_q$ with arbitrary nonzero diagonal elements. From definition (2.2), we immediately conclude that the linear mappings
$$t : \mathrm{Mat}_{1,s}(F_q) \ni \phi \mapsto \phi t \in \mathrm{Mat}_{1,s}(F_q), \quad t \in T_s, \tag{2.6}$$
preserve the weight $\rho$: we have $\rho(\phi t) = \rho(\phi)$. Now, let $\Phi = (\phi_1, \dots, \phi_n) \in \mathrm{Mat}_{n,s}(F_q)$, $\phi_j \in \mathrm{Mat}_{1,s}(F_q)$, $1 \le j \le n$, and let
$$T_s^n = \underbrace{T_s \times \cdots \times T_s}_{n} \tag{2.7}$$
denote a direct product of $n$ copies of $T_s$. Then, the linear mappings
$$\tau : \mathrm{Mat}_{n,s}(F_q) \ni \Phi = (\phi_1, \dots, \phi_n) \mapsto \Phi\tau = (\phi_1 t_1, \dots, \phi_n t_n) \in \mathrm{Mat}_{n,s}(F_q), \tag{2.8}$$
$\tau = (t_1, \dots, t_n) \in T_s^n$, preserve the weight $\rho$: we have $\rho(\Phi\tau) = \rho(\Phi)$. Obviously, the orders of the groups $T_s^n$ are given by
$$\#\{T_s^n\} = (q-1)^{ns}\, q^{\frac{ns(s-1)}{2}}. \tag{2.9}$$
We write $\#\{\cdot\}$ for the cardinality of a finite set.
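Both weights, together with the bound (2.5) and the group order (2.9), are easy to check numerically for small parameters. A Python sketch (illustrative only, not from the paper; the function names are hypothetical):

    import numpy as np

    def hamming_weight(Phi):
        return int(np.count_nonzero(Phi))

    def rt_weight(Phi):
        # rho(Phi): sum over rows of the 1-based position of the last nonzero entry
        rho = 0
        for row in np.atleast_2d(Phi):
            nz = np.flatnonzero(row)
            rho += int(nz[-1]) + 1 if nz.size else 0
        return rho

    q, n, s = 3, 2, 4
    Phi = np.random.default_rng(0).integers(0, q, size=(n, s))
    d, r = hamming_weight(Phi), rt_weight(Phi)
    print(d <= r <= s * d)
    print((q - 1) ** (n * s) * q ** (n * s * (s - 1) // 2))  # order (2.9)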
Note that the full group of linear transformations preserving the Rosenbloom–Tsfasman weight is a semidirect product of the group $T_s^n$ and the group of all permutations of rows in matrices $\Phi \in \mathrm{Mat}_{n,s}(F_q)$. This claim was conjectured in [8] and proved in [3]. However, in the present paper we do not use this fact.

Finally, we recall (see [4] for details) that the Hamming ball
$$B_{n,s}(r) = \{\Phi \in \mathrm{Mat}_{n,s}(F_q) : \delta(\Phi) \le r\}, \quad r \ge 0, \tag{2.10}$$
has the cardinality
$$V_q(\ell, r) = \#\{B_{n,s}(r)\} = \sum_{i=0}^{\lfloor r \rfloor} \binom{\ell}{i} (q-1)^i, \tag{2.11}$$
where $\lfloor \cdot \rfloor$ denotes the integer part of a real number, and $\ell = ns$ as given in (2.1). Furthermore, for each $\theta \in [0, \frac{q-1}{q}]$, we have asymptotically
$$\ell^{-1} \log_q V_q(\ell, \theta\ell) = H_q(\theta) + o(1), \quad \text{as } \ell \to \infty, \tag{2.12}$$
where $\log_q$ denotes the log in base $q$ and $H_q$ is the $q$-ary entropy function: $H_q(0) = 0$, and $H_q(\theta) = \theta \log_q(q-1) - \theta \log_q \theta - (1-\theta)\log_q(1-\theta)$ for $0 < \theta \le \frac{q-1}{q}$.
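For later use in the Gilbert–Varshamov bound (3.11), both $H_q$ and its inverse $H_q^{\leftarrow}$ can be evaluated numerically. A minimal Python sketch (illustrative, not from the paper; the inverse is computed by bisection on the increasing branch):

    import math

    def H_q(theta, q):
        # q-ary entropy function of (2.12)
        if theta == 0.0:
            return 0.0
        return (theta * math.log(q - 1, q)
                - theta * math.log(theta, q)
                - (1.0 - theta) * math.log(1.0 - theta, q))

    def H_q_inv(y, q, tol=1e-12):
        # inverse of H_q on the increasing branch [0, (q-1)/q]
        lo, hi = 0.0, (q - 1) / q
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if H_q(mid, q) < y:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    print(H_q(0.5, 2))       # = 1, the binary entropy at 1/2
    print(H_q_inv(1.0, 2))   # -> 0.5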
Note that $H_q(\theta)$ is a continuous monotonic function, increasing on the interval $[0, \frac{q-1}{q}]$ from $0$ to $1$. Therefore, the inverse function $H_q^{\leftarrow}(x)$ is continuous and monotonic on the interval $[0, 1]$, increasing from $0$ to $\frac{q-1}{q}$.

We have listed the main auxiliary facts. Some additional facts will be given in the next section.

3. The main results
A linear code $C$ is a subspace in $\mathrm{Mat}_{n,s}(F_q)$. The parameter $\ell = ns$ is called the length of a code. We will consider only linear codes $C \ne \{0\}$. Introduce the Hamming and Rosenbloom–Tsfasman (minimum) weights for a linear code $C \subset \mathrm{Mat}_{n,s}(F_q)$ by
$$\mathrm{wt}(C) = \min\{\mathrm{wt}(\Phi) : \Phi \in C \setminus \{0\}\}, \tag{3.1}$$
where $\mathrm{wt}$ denotes any one of the weights $\delta$ or $\rho$. Obviously, the group $T_s^n$ preserves the weight $\rho$: we have $\rho(C\tau) = \rho(C)$, $\tau \in T_s^n$, where $C\tau = \{\Phi\tau : \Phi \in C\}$. In view of (3.1) and (2.5), we have
$$\delta(C) \le \rho(C) \le s\,\delta(C). \tag{3.2}$$
Thus, if the weight $\delta(C)$ is large, the weight $\rho(C)$ is also large. However, as was mentioned in the Introduction, our concern here is with the opposite situation, when the weight $\rho(C)$ is known to be large and we are interested in whether the weight $\delta(C)$ can be large as well. Our main result is the following:

Theorem 3.1. Let $C \subset \mathrm{Mat}_{n,s}(F_q)$ be an arbitrary linear code. Suppose that the inequality
$$q^{\rho(C)} \ge q \left(\frac{q}{q-1}\right)^n V_q(\ell, d-1) \tag{3.3}$$
holds for some positive integer $d$. Then, there exists a nonempty subset $G(C) \subset T_s^n$ such that the bound
$$\delta(C\tau) \ge d \tag{3.4}$$
holds for all transformations $\tau \in G(C)$. Furthermore, the cardinality of the subset $G(C)$ satisfies the bound
$$\frac{\#\{G(C)\}}{\#\{T_s^n\}} > 1 - q\left(\frac{q}{q-1}\right)^n V_q(\ell, d-1)\, q^{-\rho(C)} \ge 0. \tag{3.5}$$
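The hypothesis (3.3) is fully computable. A Python sketch (illustrative, not from the paper; the helper names are hypothetical) that, given $\rho(C)$, returns the largest $d$ for which (3.3) holds, and hence the Hamming weight guaranteed by (3.4) on a nonempty set of transformations:

    from math import comb

    def V_q(l, r, q):
        # cardinality (2.11) of the Hamming ball of integer radius r >= 0
        return sum(comb(l, i) * (q - 1) ** i for i in range(r + 1))

    def largest_d(rho_C, l, q, n):
        d = 0
        while d + 1 <= l and \
              q ** rho_C >= q * (q / (q - 1)) ** n * V_q(l, d, q):
            d += 1
        return d

    q, n, s = 2, 2, 8
    print(largest_d(rho_C=12, l=n * s, q=q, n=n))   # -> 3 for these numbers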
The proof of Theorem 3.1 will be given in Section 5. Now we wish to derive some asymptotic consequences of Theorem 3.1. Both weights $\delta(C)$ and $\rho(C)$ (see (3.1)) satisfy the bound
$$\mathrm{wt}(C) \le \ell - k(C) + 1, \tag{3.6}$$
where k(C) denotes the dimension of the linear subspace C ⊂ Matn,s (Fq ). For the Hamming weight this is the well-known Singleton bound (see [4]), and for the Rosenbloom–Tsfasman weight this bound was proved in [7] (see also [1] and [8]).
If for one of the weights $\delta(C)$ or $\rho(C)$ we have equality in (3.6),
$$\mathrm{wt}(C) = \ell - k(C) + 1, \tag{3.7}$$
then the code $C$ is called an MDS code for the corresponding metric. Trivial MDS codes of dimensions $1$, $\ell - 1$, and $\ell$ can be easily constructed (say, in the last case $C = \mathrm{Mat}_{n,s}(F_q)$). Nontrivial MDS codes (of dimension $1 < k(C) < \ell - 1$) for the Rosenbloom–Tsfasman metric and $s \to \infty$ exist if and only if $q \ge n - 1$ (see [8]). The corresponding conditions in the case of the Hamming metric can be found in [4]. Let us write
$$\rho(C) = \ell - k(C) + 1 - \Theta(C), \tag{3.8}$$
where the nonnegative parameter $\Theta(C)$ is called the deficiency of the code $C$. Thus, MDS codes have zero deficiency.

Let an infinite sequence of linear codes $C_{n,s} \subset \mathrm{Mat}_{n,s}(F_q)$, $s \to \infty$, be given. The codes $C_{n,s}$ are called nearly MDS codes for the Rosenbloom–Tsfasman metric if $\Theta(C_{n,s}) = o(\ell)$ as $s \to \infty$. One can easily construct linear codes $C_{n,s} \subset \mathrm{Mat}_{n,s}(F_q)$ of deficiency $\Theta(C_{n,s}) = O(n \log n)$ (see [8]). Obviously, these codes are nearly MDS codes if $\log n = o(s)$ as $s \to \infty$. With the more complicated methods of [5], one can construct codes of deficiency $\Theta(C_{n,s}) = O(n)$. Moreover, this bound cannot be improved for large $n$. Obviously, such codes are always nearly MDS codes.

The role of both metrics $\rho$ and $\delta$ in the context of uniformly distributed point sets is discussed in detail in [9]. In particular, using the dual codes to linear codes $C_{n,s} \subset \mathrm{Mat}_{n,s}(F_q)$ of dimension $k(C_{n,s}) = (n-1)s$ and small deficiency $\Theta(C_{n,s})$, one obtains very good distributions of $q^s$ points in the $n$-dimensional unit cube. If additionally the Hamming weights of the codes $C_{n,s}$ are large, then the corresponding distributions of $q^s$ points have the minimal order of the $L_p$-discrepancies (see [1] and [9]).

In applications to the theory of uniformly distributed point sets, the parameter $n$ is usually assumed to be fixed while the parameter $s \to \infty$. The situation when $s$ is fixed and $n \to \infty$ is also of interest for applications, but in this case the behavior of the corresponding point distributions turns out to be very specific (see [6]). Note that in the last case the metrics $\rho$ and $\delta$ are equivalent (see (2.5) and (3.2)).

For convenience, we normalize various characteristics of a code by the quantity $\ell = ns$. More precisely, we write
$$\bar{\delta}(C) = \frac{\delta(C)}{\ell}, \quad \bar{\rho}(C) = \frac{\rho(C)}{\ell}, \quad \bar{k}(C) = \frac{k(C)}{\ell}, \quad \bar{\Theta}(C) = \frac{\Theta(C)}{\ell}.$$
In this notation, relation (3.8) for nearly MDS codes can be written in the form
$$\bar{\rho}(C_{n,s}) = 1 - \bar{k}(C_{n,s}) + \frac{1}{\ell} - \bar{\Theta}(C_{n,s}) = 1 - \bar{k}(C_{n,s}) + o(1), \quad \text{as } s \to \infty. \tag{3.9}$$
Recall that in coding theory the parameter $\bar{k}(\cdot)$ is known as the rate of a linear code. Obviously, the group $T_s^n$ preserves the rate: we have $\bar{k}(C_{n,s}\tau) = \bar{k}(C_{n,s})$, $\tau \in T_s^n$. With the above remarks, we have the following corollary of Theorem 3.1.
Theorem 3.2. Let $C_{n,s}$, $s \to \infty$, be an infinite sequence of linear nearly MDS codes for the Rosenbloom–Tsfasman metric. Suppose also that
$$\rho(C_{n,s}) \ge n\{1 - \log_q(q-1)\} + 1, \quad \text{as } s \to \infty. \tag{3.10}$$
Then, for all sufficiently large $s$, there exist nonempty subsets $G(C_{n,s}) \subset T_s^n$ such that the Gilbert–Varshamov bound (cf. [4])
$$\bar{\delta}(C_{n,s}\tau) \ge H_q^{\leftarrow}(1 - \bar{k}(C_{n,s})) + o(1), \quad s \to \infty, \tag{3.11}$$
holds for all transformations $\tau \in G(C_{n,s})$. The cardinality of the subsets $G(C_{n,s})$ is given by (3.5) with $C = C_{n,s}$.

Proof. With the assumption (3.10), we observe that for all sufficiently large $s$, inequality (3.3) holds for $d = 1$, at least. Let $D_{n,s} \ge 1$ be the largest positive integer such that inequality (3.3) holds for $C = C_{n,s}$ and $d = D_{n,s}$. Then
$$q\left(\frac{q}{q-1}\right)^n V_q(\ell, D_{n,s}) > q^{\rho(C_{n,s})} \ge q\left(\frac{q}{q-1}\right)^n V_q(\ell, D_{n,s} - 1). \tag{3.12}$$
Let us put $\bar{D}_{n,s} = D_{n,s}/\ell$. Taking the $\log_q$ of each term in the inequalities (3.12) and using the asymptotic formula (2.12), we find that
$$\frac{1}{s}\{1 - \log_q(q-1)\} + \frac{1}{\ell} + H_q(\bar{D}_{n,s}) + o(1) \ge \bar{\rho}(C_{n,s}) > \frac{1}{s}\{1 - \log_q(q-1)\} + \frac{1}{\ell} + H_q\!\left(\bar{D}_{n,s} - \frac{1}{\ell}\right) + o(1), \quad \text{as } s \to \infty.$$
Therefore,
$$\bar{\rho}(C_{n,s}) = H_q(\bar{D}_{n,s}) + o(1), \quad \text{as } s \to \infty,$$
and
$$\bar{D}_{n,s} = H_q^{\leftarrow}(\bar{\rho}(C_{n,s})) + o(1) = H_q^{\leftarrow}(1 - \bar{k}(C_{n,s})) + o(1), \quad \text{as } s \to \infty. \tag{3.13}$$
In these asymptotic calculations we used the fact that both functions $H_q(\cdot)$ and $H_q^{\leftarrow}(\cdot)$ are continuous. By Theorem 3.1, for all sufficiently large $s$, there exist nonempty subsets $G(C_{n,s}) \subset T_s^n$ such that the bound
$$\bar{\delta}(C_{n,s}\tau) \ge \bar{D}_{n,s} \tag{3.14}$$
holds for all transformations $\tau \in G(C_{n,s})$. Substituting the asymptotic formula (3.13) into the bound (3.14), we obtain the inequality (3.11). The proof of Theorem 3.2 is complete.
4. Orbits of the group $T_s^n$ on $\mathrm{Mat}_{n,s}(F_q)$

First of all, we wish to describe the structure of orbits of the group $T_s^n$ on the space $\mathrm{Mat}_{n,s}(F_q)$. Let $n = 1$; then we introduce the boxes
$$\Delta_a = \{\phi \in \mathrm{Mat}_{1,s}(F_q) : \rho(\phi) = a\}, \quad a \in Q_s, \tag{4.1}$$
in $\mathrm{Mat}_{1,s}(F_q)$, where $Q_s = \{0, 1, \dots, s\}$. For arbitrary $n$, we put
$$\Delta_A = \prod_{j=1}^n \Delta_{a_j} = \{\Phi = (\phi_1, \dots, \phi_n) \in \mathrm{Mat}_{n,s}(F_q) : \rho(\phi_j) = a_j,\ 1 \le j \le n\}, \tag{4.2}$$
where $A = (a_1, \dots, a_n) \in Q_s^n$. Obviously, $\Delta_{A_1} \cap \Delta_{A_2} = \emptyset$ if $A_1 \ne A_2$, and the space $\mathrm{Mat}_{n,s}(F_q)$ can be represented as a disjoint union of all $\Delta_A$, $A \in Q_s^n$, so that
$$\mathrm{Mat}_{n,s}(F_q) = \bigcup_{A \in Q_s^n} \Delta_A. \tag{4.3}$$
The following is an improvement of Proposition 2.2(i) of [2].

Lemma 4.1. (i) The orbits of the group $T_s^n$ on $\mathrm{Mat}_{n,s}(F_q)$ coincide with the boxes $\Delta_A$, $A \in Q_s^n$.
(ii) The cardinality of the boxes $\Delta_A$, $A \in Q_s^n$, is given by
$$\#\{\Delta_A\} = (q-1)^{\delta(A)}\, q^{a_1 + \cdots + a_n - \delta(A)}, \tag{4.4}$$
where $\delta(A)$ denotes the "Hamming weight" of the integer vector $A = (a_1, \dots, a_n)$, given by the number of nonzero entries of $A$.
(iii) The stabilizer $S(\Delta_A) = \{\tau \in T_s^n : \Phi_A \tau = \Phi_A\}$ of a point $\Phi_A \in \Delta_A$ is a subgroup in $T_s^n$ of order
$$\#\{S(\Delta_A)\} = \frac{\#\{T_s^n\}}{\#\{\Delta_A\}} = (q-1)^{ns - \delta(A)}\, q^{\frac{ns(s-1)}{2} - a_1 - \cdots - a_n + \delta(A)}. \tag{4.5}$$

Proof. (i) First, let $n = 1$. Then, $\Delta_0 = \{0\}$ and the statement is trivial. If $a \ge 1$, then the box $\Delta_a$ consists of all rows $\phi = (\phi_1, \dots, \phi_s)$ with $\phi_j = 0$ for $j > a$, arbitrary $\phi_j \in F_q$ for $j < a$, and arbitrary $\phi_a \in F_q^* = F_q \setminus \{0\}$. Write $e_a = (\epsilon_{1,a}, \dots, \epsilon_{s,a})$ with $\epsilon_{j,a} = \delta_{j,a}$, where $\delta_{j,a}$ is the Kronecker symbol. For a lower triangular matrix $t = (t_{j,i}) \in T_s$, $t_{j,i} = 0$ for $i > j$, we have $e_a t = (t_{a,1}, \dots, t_{a,s}) \in \Delta_a$. Thus, $\Delta_a = \{e_a t : t \in T_s\}$ is an orbit of the group $T_s$. This proves the statement (i) for $n = 1$. In view of formulas (2.1), (2.7), and (4.2), this also implies the statement (i) for arbitrary $n$.
(ii) The above description of the structure of the boxes $\Delta_a$ implies the formula
$$\#\{\Delta_a\} = \begin{cases} 1 & \text{if } a = 0, \\ (q-1)q^{a-1} & \text{if } 1 \le a \le s. \end{cases} \tag{4.6}$$
From (4.2), we conclude that
$$\#\{\Delta_A\} = \prod_{j=1}^n \#\{\Delta_{a_j}\}. \tag{4.7}$$
Substituting (4.6) into (4.7), we obtain (4.4).
(iii) Each orbit $\Delta_A$, $A \in Q_s^n$, can be identified with a homogeneous space: $\Delta_A \simeq T_s^n / S(\Delta_A)$. Therefore,
$$\#\{\Delta_A\} = \frac{\#\{T_s^n\}}{\#\{S(\Delta_A)\}}, \quad \text{so that} \quad \#\{S(\Delta_A)\} = \frac{\#\{T_s^n\}}{\#\{\Delta_A\}}, \tag{4.8}$$
and (4.5) follows from (4.8), (4.4), and (2.9). The proof of Lemma 4.1 is complete.

Let two points $\Phi_1$ and $\Phi_2$ in $\mathrm{Mat}_{n,s}(F_q)$ be given. What is the number of solutions $\tau \in T_s^n$ of the equation $\Phi_1 \tau = \Phi_2$? We put
$$N(\Phi_1, \Phi_2) = \{\tau \in T_s^n : \Phi_1 \tau = \Phi_2\} \subset T_s^n \tag{4.9}$$
and
$$\nu(\Phi_1, \Phi_2) = \#\{N(\Phi_1, \Phi_2)\}. \tag{4.10}$$

Lemma 4.2. (i) If $\Phi_1 \in \Delta_{A_1}$, $\Phi_2 \in \Delta_{A_2}$, and $A_1 \ne A_2$, then $\nu(\Phi_1, \Phi_2) = 0$.
(ii) If $\Phi_1 \in \Delta_A$ and $\Phi_2 \in \Delta_A$, then $\nu(\Phi_1, \Phi_2) = \#\{S(\Delta_A)\}$.

Proof. (i) The statement is a trivial consequence of Lemma 4.1(i).
(ii) Since both points $\Phi_1$ and $\Phi_2$ belong to the same orbit $\Delta_A$, we can write $\Phi_1 = \Phi_A \tau_1$, $\Phi_2 = \Phi_A \tau_2$ for a fixed point $\Phi_A \in \Delta_A$ and some $\tau_1, \tau_2 \in T_s^n$. Therefore, the equation $\Phi_1 \tau = \Phi_2$ takes the form $\Phi_A \tau_1 \tau = \Phi_A \tau_2$, or $\Phi_A \tau_1 \tau \tau_2^{-1} = \Phi_A$. This gives
$$\nu(\Phi_1, \Phi_2) = \#\{\tau \in T_s^n : \tau_1 \tau \tau_2^{-1} \in S(\Delta_A)\} = \#\{\tau_1^{-1} S(\Delta_A)\, \tau_2\} = \#\{S(\Delta_A)\}.$$
The proof of Lemma 4.2 is complete.
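For small parameters, Lemma 4.1 can be checked by brute force. The Python sketch below (illustrative, not from the paper) enumerates the group $T_s$ for $n = 1$ and verifies that the orbit of a point of the box $\Delta_a$ is the whole box, with the cardinality (4.6):

    from itertools import product
    import numpy as np

    q, s = 2, 3
    Fq = range(q)

    def rho(phi):
        nz = [i for i, c in enumerate(phi) if c]
        return nz[-1] + 1 if nz else 0

    def group_Ts():
        # all lower triangular s x s matrices over F_q with nonzero diagonal
        for low in product(Fq, repeat=s * (s - 1) // 2):
            for diag in product(range(1, q), repeat=s):
                t = np.zeros((s, s), dtype=int)
                t[np.diag_indices(s)] = diag
                t[np.tril_indices(s, -1)] = low
                yield t

    phi = np.array([0, 1, 0])                      # a point with rho(phi) = 2
    orbit = {tuple((phi @ t) % q) for t in group_Ts()}
    box = {v for v in product(Fq, repeat=s) if rho(v) == rho(phi)}
    print(orbit == box, len(box) == (q - 1) * q ** (rho(phi) - 1))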
Now our interest is with the distribution of the points of a code $C \subset \mathrm{Mat}_{n,s}(F_q)$ in the boxes $\Delta_A$, $A \in Q_s^n$.

Lemma 4.3. Let $C \subset \mathrm{Mat}_{n,s}(F_q)$ be an arbitrary linear code. Then $\#\{C \cap \Delta_0\} = 1$, and for nonzero $A = (a_1, \dots, a_n) \in Q_s^n$
$$\#\{C \cap \Delta_A\} = 0 \quad \text{if } 0 < a_1 + \cdots + a_n < \rho(C),$$
and
$$\#\{C \cap \Delta_A\} \le q^{a_1 + \cdots + a_n - \rho(C) + 1} \quad \text{if } a_1 + \cdots + a_n \ge \rho(C).$$

This is Lemma 2.2 of [9]. It is worth noting that the ultrametric inequality (2.4) is crucial for the proof of this result.

Relying on the above three lemmas, we can easily complete the proof of Theorem 3.1.

5. Proof of Theorem 3.1

Let a linear code $C \subset \mathrm{Mat}_{n,s}(F_q)$ be given. Fix a Hamming ball $B(d-1) \subset \mathrm{Mat}_{n,s}(F_q)$ of radius $d - 1$, where $d \ge 1$ is an integer (see (2.10)).
Let us split the group $T_s^n$ into a disjoint union of two subsets $T_s^n = G(C) \cup B(C)$, where the subset $G(C)$ of "good" transformations consists of all $\tau \in T_s^n$ such that $\Phi_1 \tau \ne \Phi_2$ for all $\Phi_1 \in C \setminus \{0\}$ and $\Phi_2 \in B(d-1) \setminus \{0\}$, and the subset $B(C)$ of "bad" transformations consists of all $\tau \in T_s^n$ such that $\Phi_1 \tau = \Phi_2$ for at least one pair $\Phi_1 \in C \setminus \{0\}$ and $\Phi_2 \in B(d-1) \setminus \{0\}$. From these definitions we immediately conclude that
$$\#\{G(C)\} + \#\{B(C)\} = \#\{T_s^n\} \tag{5.1}$$
and
$$\delta(C\tau) \ge d \tag{5.2}$$
for all transformations $\tau \in G(C)$.

Let us estimate the cardinality of the subset of bad transformations. With definitions (4.9) and (4.10), we have
$$B(C) \subset \bigcup_{\Phi_1, \Phi_2} \{N(\Phi_1, \Phi_2) : \Phi_1 \in C \setminus \{0\},\ \Phi_2 \in B(d-1) \setminus \{0\}\}$$
and
$$\#\{B(C)\} \le \sum_{\Phi_1, \Phi_2} \{\nu(\Phi_1, \Phi_2) : \Phi_1 \in C \setminus \{0\},\ \Phi_2 \in B(d-1) \setminus \{0\}\}. \tag{5.3}$$
Here, for simplicity, we write $\bigcup\{E_\omega : \omega \in O\}$ instead of $\bigcup_{\omega \in O} E_\omega$ and $\sum\{f(\omega) : \omega \in O\}$ instead of $\sum_{\omega \in O} f(\omega)$ if the corresponding region $O$ is rather cumbersome to be indicated under the symbol for union or summation.

For convenience, we denote by $\Sigma_d(C)$ the sum in (5.3). Using (4.3), we can write this sum in the form
$$\Sigma_d(C) = \sum_{A_1, A_2 \in Q_s^n} \sum_{\Phi_1, \Phi_2} \{\nu(\Phi_1, \Phi_2) : \Phi_1 \in (C \setminus \{0\}) \cap \Delta_{A_1},\ \Phi_2 \in (B(d-1) \setminus \{0\}) \cap \Delta_{A_2}\}. \tag{5.4}$$
By Lemma 4.2(i), all terms in (5.4) with $A_1 \ne A_2$ vanish. Therefore,
$$\Sigma_d(C) = \sum_{A \in Q_s^n} \sum_{\Phi_1, \Phi_2} \{\nu(\Phi_1, \Phi_2) : \Phi_1 \in (C \setminus \{0\}) \cap \Delta_A,\ \Phi_2 \in (B(d-1) \setminus \{0\}) \cap \Delta_A\}$$
$$= \sum_{A \in Q_s^n \setminus \{0\}} \sum_{\Phi_1, \Phi_2} \{\nu(\Phi_1, \Phi_2) : \Phi_1 \in C \cap \Delta_A,\ \Phi_2 \in B(d-1) \cap \Delta_A\}.$$
It then follows from Lemma 4.2(ii) and Lemma 4.1(iii) that
$$\Sigma_d(C) = \sum_{A \in Q_s^n \setminus \{0\}} \sum_{\Phi_1, \Phi_2} \{\#\{S(\Delta_A)\} : \Phi_1 \in C \cap \Delta_A,\ \Phi_2 \in B(d-1) \cap \Delta_A\}$$
$$= \sum_{A \in Q_s^n \setminus \{0\}} \#\{S(\Delta_A)\}\, \#\{C \cap \Delta_A\}\, \#\{B(d-1) \cap \Delta_A\}$$
$$= \#\{T_s^n\} \sum_{A \in Q_s^n \setminus \{0\}} \frac{\#\{C \cap \Delta_A\}}{\#\{\Delta_A\}}\, \#\{B(d-1) \cap \Delta_A\}. \tag{5.5}$$
With Lemma 4.3 we obtain an upper bound for the last sum in (5.5), giving the inequality
$$\Sigma_d(C) \le \#\{T_s^n\} \sum_A \left\{ \frac{q^{a_1 + \cdots + a_n - \rho(C) + 1}}{(q-1)^{\delta(A)}\, q^{a_1 + \cdots + a_n - \delta(A)}}\, \#\{B(d-1) \cap \Delta_A\} : A = (a_1, \dots, a_n) \in Q_s^n,\ a_1 + \cdots + a_n \ge \rho(C) \right\}$$
$$= \#\{T_s^n\}\, q^{-\rho(C)}\, q \sum_A \left\{ \left(\frac{q}{q-1}\right)^{\delta(A)} \#\{B(d-1) \cap \Delta_A\} : A \in Q_s^n,\ a_1 + \cdots + a_n \ge \rho(C) \right\}$$
$$\le \#\{T_s^n\}\, q^{-\rho(C)}\, q \left(\frac{q}{q-1}\right)^n \sum_A \left\{ \#\{B(d-1) \cap \Delta_A\} : A \in Q_s^n,\ a_1 + \cdots + a_n \ge \rho(C) \right\}$$
$$< \#\{T_s^n\}\, q^{-\rho(C)}\, q \left(\frac{q}{q-1}\right)^n \sum_{A \in Q_s^n} \#\{B(d-1) \cap \Delta_A\}$$
$$= \#\{T_s^n\}\, q^{-\rho(C)}\, q \left(\frac{q}{q-1}\right)^n V_q(\ell, d-1), \tag{5.6}$$
where $V_q(\ell, d-1)$ is the cardinality of the ball $B(d-1)$ (see (2.11)). Combining (5.3) and (5.6), we find an upper bound for the cardinality of the subset of bad transformations, in the form
$$\#\{B(C)\} < \#\{T_s^n\}\, q \left(\frac{q}{q-1}\right)^n V_q(\ell, d-1)\, q^{-\rho(C)}.$$
Substituting this inequality into (5.1), we find the following lower bound for the cardinality of the subset of good transformations:
$$\frac{\#\{G(C)\}}{\#\{T_s^n\}} > 1 - q \left(\frac{q}{q-1}\right)^n V_q(\ell, d-1)\, q^{-\rho(C)}. \tag{5.7}$$
Suppose that inequality (3.3) of Theorem 3.1 holds. Then, it follows from (5.7) that $\#\{G(C)\} > 0$, and the subset $G(C)$ is nonempty. Therefore, the bound (5.2) holds for all transformations $\tau \in G(C)$. The proof of Theorem 3.1 is complete.
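The counting argument above can be confronted with an exhaustive computation for toy parameters. The Python sketch below is illustrative and not from the paper; it takes a one-dimensional code with $n = 1$ (so $\ell = s$), enumerates all $\tau \in T_s$, counts the "good" transformations in the sense of (5.2), and compares the observed fraction with the lower bound (3.5):

    from itertools import product
    from math import comb
    import numpy as np

    q, s, d = 2, 6, 2
    Fq = range(q)

    def group_Ts():
        for low in product(Fq, repeat=s * (s - 1) // 2):
            for diag in product(range(1, q), repeat=s):
                t = np.zeros((s, s), dtype=int)
                t[np.diag_indices(s)] = diag
                t[np.tril_indices(s, -1)] = low
                yield t

    g = np.array([1, 1, 1, 0, 1, 1])                 # generator of a 1-dim code
    codewords = [(c * g) % q for c in range(1, q)]   # nonzero codewords
    rho_C = max(np.flatnonzero(v)[-1] + 1 for v in codewords)

    good = sum(all(np.count_nonzero((v @ t) % q) >= d for v in codewords)
               for t in group_Ts())
    total = (q - 1) ** s * q ** (s * (s - 1) // 2)

    V = sum(comb(s, i) * (q - 1) ** i for i in range(d))   # V_q(l, d-1)
    bound = 1 - q * (q / (q - 1)) ** 1 * V * q ** (-rho_C)
    print(good / total, ">", bound)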
Acknowledgments

The author is grateful to Michael Tsfasman, Serge Vlăduţ, and Henryk Woźniakowski for their many interesting and valuable discussions. The author is also grateful to the referees for their helpful remarks and suggestions, and to Grzegorz Wasilkowski for his diligent handling of this paper.

References

[1] W.W.L. Chen, M.M. Skriganov, Explicit constructions in the classical mean squares problem in irregularities of point distribution, J. Reine Angew. Math. 545 (2002) 67–95.
[2] S.T. Dougherty, M.M. Skriganov, MacWilliams duality and the Rosenbloom–Tsfasman metric, Moscow Math. J. 2 (1) (2002) 81–97.
[3] K. Lee, Automorphism group of the Rosenbloom–Tsfasman space, Eur. J. Combin. 24 (2003) 607–612.
[4] J.H. van Lint, Introduction to Coding Theory, third ed., Graduate Texts in Mathematics, vol. 86, Springer, Berlin, 1999.
[5] H. Niederreiter, C.P. Xing, Low-discrepancy sequences and global function fields with many rational points, Finite Fields Appl. 2 (1996) 241–273.
[6] E. Novak, H. Woźniakowski, When are integration and discrepancy tractable?, in: R.A. DeVore et al. (Eds.), FOCM Proceedings of Oxford, 1999, Cambridge University Press, Cambridge, 2001, pp. 211–266.
[7] M.Yu. Rosenbloom, M.A. Tsfasman, Codes for the m-metric, Problemy Peredachi Informatsii 33 (1) (1997) 55–63 (English translation in Probl. Inf. Transm. 33 (1) (1997) 45–52).
[8] M.M. Skriganov, Coding theory and uniform distributions, Algebra i Analiz 13 (2) (2001) 191–239 (English translation in St. Petersburg Math. J. 13 (2) (2002) 301–337).
[9] M.M. Skriganov, Harmonic analysis on totally disconnected groups and irregularities of point distributions, J. Reine Angew. Math. 600 (2006) 25–49.
Journal of Complexity 23 (2007) 937 – 951 www.elsevier.com/locate/jco
Computation of local radius of information in SM-IBC identification of nonlinear systems

Mario Milanese, Carlo Novara*
Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy

Received 26 January 2007; accepted 29 May 2007
Available online 27 July 2007
Abstract

System identification consists in finding a model of an unknown system starting from a finite set of noise-corrupted data. A fundamental problem in this context is to assess the accuracy of the identified model. In this paper, the problem is investigated for the case of nonlinear systems within the Set Membership–Information Based Complexity framework of [M. Milanese, C. Novara, Set membership identification of nonlinear systems, Automatica 40(6) (2004) 957–975]. In that paper, a (locally) optimal algorithm has been derived, giving (locally) optimal models in nonlinear regression form. The corresponding (local) radius of information, providing the worst-case identification error, can consequently be used to measure the quality of the identified model. In the present paper, two algorithms are proposed for the computation of the local radius of information: the first provides the exact value but requires a computational complexity exponential in the dimension of the regressor space; the second is approximate but involves a polynomial (quadratic) complexity.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Radius of information computation; Nonlinear systems identification; Set membership; Information based complexity
1. Introduction

Consider a nonlinear discrete-time dynamic system in regression form
$$y^{t+1} = f_0(w^t), \quad w^t = [y^t \dots y^{t-n_y+1}\ \ u^t \dots u^{t-n_u+1}], \tag{1}$$
where $y^t \in \mathbb{R}$, $u^t \in \mathbb{R}^m$, $n = n_y + m n_u$ and $f_0 : W \subset \mathbb{R}^n \to \mathbb{R}$.

This work has been partly supported by Ministero dell'Università e della Ricerca of Italy, under the National Projects "Advanced control and identification techniques for innovative applications" and "Control of advanced systems of transmission, suspension, steering and braking for the management of the vehicle dynamics".
∗ Corresponding author. Fax: +39 011 564 7099.
E-mail addresses: [email protected] (M. Milanese), [email protected] (C. Novara).
The problem of system identification is to find, from a set of noise-corrupted measurements of $y^t$ and $w^t$, an estimate $\hat{f}$ of $f_0$ giving a small, possibly minimal, identification error $\|f_0 - \hat{f}\|$, where $\|\cdot\|$ is a suitable norm. This error is not known and, since data are finite and noise-corrupted, a reliable estimate of the identification error can be obtained only if some information on $f_0$ and on the noise is available. In the literature [20,10,15], the information on $f_0$ is typically given by assuming that it belongs to some finitely parametrized subset $F(\Theta)$ of functions. In some cases, the knowledge of the laws governing the system (mechanical, economical, biological, etc.) generating the data may allow one to have information on its structure, where some basic parameters have to be calibrated from available data. In other situations, when the laws are too complex or not sufficiently known, the usual approach is to consider that $f_0$ belongs to a finitely parametrized set of functions $F(\Theta) = \{f(w, \theta) = \sum_{i=1}^r \alpha_i \sigma_i(w, \beta_i),\ \alpha_i \in \mathbb{R}\}$, where $\theta = [\alpha, \beta]$ and the $\sigma_i$'s are given functions. Then, measured data are used to derive an estimate $\hat{\theta}$ of $\theta$, and $f(w, \hat{\theta})$ is used as the estimate of $f_0$. Basic to this approach is the proper choice of the parametric family of functions $f(w, \theta)$, typically realized by some search on different functional forms of the $\sigma_i$'s (e.g. linear, polynomial, sigmoidal, wavelet, etc.) and on the number $r$ [20]. This search may be quite time consuming, and in any case leads to approximate model structures. The evaluation of the effects of this approximation on identification errors is at present a largely open problem. Another critical point is that the estimate of $\theta$ is usually obtained by minimization of a non-convex error function. Such a minimization may get trapped in local minima and thus provide a bad estimate.

In [12] an alternative approach is proposed, formulating the problem in a set membership (SM)–information based complexity (IBC) framework. The SM framework is used in systems identification to deal with approximate model structures and finite sample accuracy evaluation, see, e.g. [13,14,11,18,1]. The SM framework, being related to approximation and interpolation of multivariable functions with bounded derivatives from the knowledge of a finite number of their values, has strong connections with the IBC framework, see, e.g. [21,16,22].

In the nonlinear SM-IBC approach of [12], no assumptions on the functional form of $f_0$ are required. An assumption on the regularity of $f_0$ is used instead, given by a bound on its gradient. Moreover, the noise is assumed bounded, in contrast with statistical approaches, which rely on assumptions such as stationarity, ergodicity, uncorrelation, type of distribution, etc. The validity of these assumptions may be difficult to test in many applications and is certainly lost in presence of approximate modelling. In the nonlinear SM-IBC approach a locally optimal identification algorithm is derived, which gives an estimate of $f_0$ with minimal guaranteed $L_p$ identification error, without requiring iterative minimization and thus avoiding the problem of local minima. A quantity $r_I$, called (local) radius of information, giving the worst-case identification error, is also defined. The radius of information $r_I$ allows one to assess the accuracy achieved by the optimal estimate.
More generally, $r_I$ allows one to assess the quality of the overall identification procedure, involving specific problems such as input type selection, sampling time choice, input channels selection, regressors choice, and model order selection [17]. These problems are quite relevant in system identification [10,4].

In this paper, the problem of computing the radius of information $r_I$ is considered. Two algorithms are proposed: the first provides the exact value of $r_I$ but requires a computational complexity which increases exponentially with the dimension $n$ of the regressor space; the second provides an approximate value of $r_I$ and involves a polynomial (quadratic) complexity.

The paper is organized as follows. Section 2 summarizes the nonlinear SM-IBC method. In Section 3, we introduce the notion of hyperbolic Voronoi diagrams (HVDs), which are used to
compute the radius of information. Section 4 illustrates the two algorithms for the computation of the local radius of information. In Section 5, a numerical example is shown.

2. SM-IBC identification of nonlinear systems

In this section the main concepts and results of the nonlinear SM-IBC identification method [12] are summarized.

Consider that a set of noise-corrupted data $\widetilde{Y}^T = \{\tilde{y}^t,\ t = 1, \dots, T\}$ and $\widetilde{W}^T = \{\tilde{w}^t,\ t = 1, \dots, T\}$ generated by (1) is available. Then
$$\tilde{y}^{t+1} = f_0(\tilde{w}^t) + d^t, \quad t = 1, \dots, T, \tag{2}$$
where the term $d^t$ accounts for the fact that $y^{t+1}$ and $w^t$ are not exactly known. The aim is to derive an estimate $\hat{f}$ of $f_0$ from the available measurements $(\widetilde{Y}^T, \widetilde{W}^T)$.

An identification algorithm is an operator $\phi$ mapping available data $(\widetilde{Y}^T, \widetilde{W}^T)$ into an estimate $\hat{f}$ of $f_0$: $\phi(\widetilde{Y}^T, \widetilde{W}^T) = \hat{f} \approx f_0$. The algorithm should be chosen to give a small (possibly minimal) $L_p$ error $\|f_0 - \hat{f}\|_p$, where
$$\|f\|_p = \begin{cases} \left( \int_W |f(w)|^p\, dw \right)^{1/p}, & p \in [1, \infty), \\ \operatorname{ess\,sup}_{w \in W} |f(w)|, & p = \infty, \end{cases} \tag{3}$$
and $W$ is a bounded convex set in $\mathbb{R}^n$.

Whatever algorithm is chosen, no information on the identification error can be derived unless some assumptions are made on the function $f_0$ and the noise $d$. The typical approach in the literature is to assume a finitely parametrized functional form for $f_0$ (linear, bilinear, neural network, etc.) and statistical models for the noise [6,20,15,9]. In the SM-IBC approach, different and somewhat weaker assumptions are taken, not requiring the selection of a parametric form for $f_0$, but related to its derivatives. Moreover, the noise sequence $D^T = \{d^t,\ t = 1, \dots, T\}$ is only supposed bounded.

Prior assumptions on $f_0$: $f_0 \in K = \{f \in C^1(W) : \|f'(w)\| \le \gamma,\ \forall w \in W\}$.
Prior assumptions on noise: $D^T \in \mathcal{D} = \{\{d^t,\ t = 1, \dots, T\} : |d^t| \le \varepsilon,\ t = 1, \dots, T\}$.

Here, $f'(w)$ denotes the gradient of $f(w)$ and $\|x\| = \sqrt{\sum_{i=1}^n x_i^2}$ is the Euclidean norm. As typical in any estimation theory, the problem of checking the validity of prior assumptions arises. This problem is considered in [12], where a validation analysis is provided, which also allows one to properly choose the values of the bounds $\gamma$ and $\varepsilon$.

A key role in this SM framework is played by the feasible systems set, often called the "unfalsified systems set", i.e. the set of all systems consistent with prior information and measured data.

Definition 1. Feasible systems set:
$$FSS^T = \{f \in K : |\tilde{y}^{t+1} - f(\tilde{w}^t)| \le \varepsilon,\ t = 1, \dots, T\}. \tag{4}$$

The feasible systems set $FSS^T$ summarizes all the information on the mechanism generating the data that is available up to time $T$. If the prior assumptions are "true", then $f_0 \in FSS^T$, an important property for evaluating the accuracy of identification.

Using the notion of feasible systems set, we can define an identification algorithm as an operator $\phi$ mapping all available information about the function $f_0$, the noise $d$, and the data $(\widetilde{Y}^T, \widetilde{W}^T)$ until time
$T$, summarized by $FSS^T$, into an estimate $\hat{f}$ of $f_0$: $\phi(FSS^T) = \hat{f} \approx f_0$.

For a given estimate $\phi(FSS^T) = \hat{f}$, the related $L_p$ error $\|f_0 - \hat{f}\|_p$ cannot be exactly computed, but its tightest bound is given by $\|f_0 - \hat{f}\|_p \le \sup_{f \in FSS^T} \|f - \hat{f}\|_p$. This motivates the following definition of the identification error, often indicated as local worst-case or guaranteed error.

Definition 2. The local identification error of the estimate $\hat{f} = \phi(FSS^T)$ is
$$E(\phi(FSS^T)) = E(\hat{f}) = \sup_{f \in FSS^T} \|f - \hat{f}\|_p.$$
Looking for algorithms that minimize the identification error leads to the following optimality concepts.

Definition 3. An algorithm $\phi^*$ is called locally optimal if
$$E(\phi^*(FSS^T)) = \inf_\phi E(\phi(FSS^T)) = r_I.$$

The quantity $r_I$, called the local radius of information, gives the minimal identification error that can be guaranteed by any estimate based on the available information up to time $T$.

Define the functions:
$$\overline{f}(w) = \min_{t=1,\dots,T} \left( \overline{h}^t + \gamma \|w - \tilde{w}^t\| \right), \quad \overline{h}^t = \tilde{y}^{t+1} + \varepsilon,$$
$$\underline{f}(w) = \max_{t=1,\dots,T} \left( \underline{h}^t - \gamma \|w - \tilde{w}^t\| \right), \quad \underline{h}^t = \tilde{y}^{t+1} - \varepsilon, \tag{5}$$
where "min" and "max" are to be intended for fixed $w$ (the same holds for "inf" and "sup" in statement (iii) of Theorem 1 below).

The next result shows that the algorithm
$$\phi_c(FSS^T) = f_c = \tfrac{1}{2}\left( \overline{f} + \underline{f} \right)$$
is optimal for any $L_p$ norm, that the corresponding minimal identification error can be actually computed, and that the functions $\overline{f}$ and $\underline{f}$, called optimal bounds, are the tightest upper and lower bounds of $f_0$.

Theorem 1 (Milanese and Novara [12]). For any $L_p(W)$ norm, with $p \in [1, \infty]$:
(i) The identification algorithm $\phi_c(FSS^T) = f_c$ is locally optimal.
(ii) $E(f_c) = \tfrac{1}{2}\|\overline{f} - \underline{f}\|_p = r_I = \inf_\phi E(\phi(FSS^T))$.
(iii) $\overline{f}(w) = \sup_{f \in FSS^T} f(w)$, $\underline{f}(w) = \inf_{f \in FSS^T} f(w)$.
941
Note that the functions fc , f and f are not C 1 (W ), since they are defined by means of “min” and “max” over finite sets of functions. Nevertheless, in [12] it is shown that they are C 1 almost everywhere on W . Remark. The local identification error actually depends on f0 and D T , i.e. E f = E f, f0 , D T . In IBC literature [21,16], a global error of given algorithms is often considered, defined as . E g () = sup E F SS T , f0 , D T . f0 ∈F () D T ∈D
An algorithm g is called globally optimal if E g g = inf E g (). Note that a locally optimal algorithm ∗ is globally optimal, but g is not in general locally optimal. Thus, the local optimality concept considered in this paper is stronger than the global optimality concept. In the rest of the paper the local optimality concept will be considered and the term local will be omitted. 3. Hyperbolic Voronoi diagrams In this section, the notion of HVD introduced in [12] is recalled. The HVD are a generalization of standard Voronoi diagrams (see, e.g. [2]) and are used in the present paper to compute the radius of information. T = { Consider the set of points: W w t , t = 1, 2, . . . , N} and a T × T antisymmetric matrix . t Let be the element of at the tth row and th column. Then define: • The (n − 1)-dimensional hyperbola H t : . = t , t = }. t − w − w H t = {w ∈ Rn : w − w t : • The n-dimensional regions S t containing w . < t , t = }. t − w − w S t = {w ∈ Rn : w − w . • The hyperbolic cell C t : C t = =t S t . t = H t ∩ [C t ], where [C t ] is the closure of The cells C t are also called n-faces. The surfaces H t C , are called (n − 1)-faces. The intersections between the (n − 1)-faces generate other cells of dimension d, with 0 d < n − 1, called d-faces. The 0-faces are called vertices. T , ) is defined as the set of all d-faces, 0 d n. Definition 4. The HVD V (W If t = 0, ∀t, , all hyperbola H t degenerate into hyperplanes and the definitions become the ones of standard Voronoi diagrams [2]. The next theorem shows some properties of HVD useful for characterizing the optimal bounds f and f . Theorem 2 (Milanese and Novara [12]). t (i) C t = ∅ ⇐⇒ w −w > t , ∀ = t, (ii) C t ∩ C =
∅, t = , and (iii) Tt=1 C t = Rn , where C t is the closure of C t .
942
M. Milanese, C. Novara / Journal of Complexity 23 (2007) 937 – 951
This result shows that the non-empty cells of an HVD give a complete partition of Rn , so that any w ∈ Rn belongs to some (n − 1)-dimensional hyperbola H t or to one (and only one) cell C t . Now, for given functions f and f , consider the HVD V and V defined as . T , , V =V W
. T , , V =V W
t t where t = h − h /, t = ht − h /. Let C , t = 1, 2, . . . , T be the cells of V and C t , t = 1, 2, . . . , T be the cells of V . The following result and the comments below show the connection between the HVD V and V and the optimal bounds f (w) and f (w). Theorem 3 (Milanese and Novara [12]).
t t t (i) Let C be a non-empty cell of V . Then f (w) = h + w−w t , ∀w ∈ C . (ii) Let C t be a non-empty cell of V . Then f (w) = ht − w − w t , ∀w ∈ C t . t
This theorem shows that, for w belonging to a non-empty cell Ct , the function f (w) is given t n w − w × R defined by the equation y = h + , with vertex of coordinates by the cone in R w t , h
t
and axis along the y-dimension. Since from Theorem 2 the non-empty cells of V give
a complete partition of the regressor space Rn , f is a piece-wise conic function over a suitable n partition the intersection of two cones y = of R t that can be derived from the HVD V . Indeed, t t and y = h + w − w w − w , projected on Rn gives the hyperbola H = {w ∈ Rn : h + w − w = t , t = } that define the HVD V . Similar considerations hold for t − w − w the relation between f and V . 4. Radius of information computation Let us define the following error function: . fe (w) =
1 2
f (w) − f (w) ,
which allows to write the radius of information as rI = fe p , where the norm is defined in (3). The analytical computation of fe p does not appear feasible, since fe is a quite “complicated” function (see Section 2). Let us consider the numerical computation of fe p . The standard approach (see, e.g. [21,23] and the references therein) for the numerical computation of the Lp norm of a function f (w) ∈ C r (W ) is to evaluate f (w) on a set of m points:
f (w 1 ), f (w 2 ), . . . , f (w m ) : w1 , w2 , . . . , wm ∈ W .
Then, the norm is approximated as p = f p f
k p 1/p f w , p ∈ [1, ∞), maxk=1,...,m f wk , p = ∞, m k=1 ak
M. Milanese, C. Novara / Journal of Complexity 23 (2007) 937 – 951
943
where ak are suitably chosen. For ak = 1/m, we have the widely used quasi-Monte Carlo algorithms. This approach is simple and easy to implement but is affected by two relevant problems: p is only an approximation of f p . (1) For finite m, f (2) In general, the number of points m required to obtain a certain degree of approximation grows exponentially with the dimension n of the set W : mc −n/r , p is the approximation error, and f ∈ C r (W ) where c is a positive number, = f p − f (see [21,23]). This is the well-known curse of dimensionality, by which norm computation is intractable for large values of n. In this paper, we focus on the L∞ norm, which is a most relevant case in nonlinear SM identification. Two methods for the computation of rI = fe ∞ are introduced. The first method provides exact evaluation of rI using a finite set of points. Such a method is still affected by the exponential dependence on the dimension n. The second method is approximated but not affected by the exponential dependence on n. Consider the HVDs individuated by the functions f and f , introduced in Section 3: . . T , . T , , V = V W V =V W t
Let C , t = 1, 2, . . . , T be the cells of V and C t , t = 1, 2, . . . , T be the cells of V . Denote with [X] the closure of set X, with (X)* the boundary of set X and define: t . B tk = [C ] ∩ [C k ] ∩ W,
t, k = 1, 2, . . . , T .
(6)
We refer to the set B0tk using the term cell. The boundary (B tk )* of a cell is composed of surfaces called (n − 1)-faces of B tk . Each (n − 1)-face of B tk is either a portion of an (n − 1)-face of V , or a portion of an (n − 1)-face of V , or a portion of (W )* . The intersections between the (n − 1)-faces generate other surfaces of dimension d, with 0 d < n − 1, called d-faces of B tk . The 0-faces are called vertices of B tk and are indicated with B0tk . The set of all vertices is denoted as T . tk B0 = B0 . t,k=1
Assume that W ⊂ Rⁿ is a convex polytope. The following result shows that the exact value of rI can be calculated by evaluating the error function over the finite set of points B₀.

Theorem 4. The radius of information rI is given by

rI = max_{w∈B₀} fe(w).
Proof. From point (iii) of Theorem 2 it directly follows that the sets B^{tk} constitute a complete partition of W, i.e. that W = ∪_{t,k=1}^T B^{tk}. The radius of information can thus be expressed as

rI = ‖fe‖_∞ = ess sup_{w∈W} |fe(w)| = max_{t,k=1,...,T} max_{w∈B^{tk}} fe(w).
Hence, let us consider the computation of max_{w∈B^{tk}} fe(w). From Theorem 3 we have

fe(w) = (1/2)(h̄^t − h̲^k) + (γ/2)(‖w − w^t‖ + ‖w − w^k‖),  w ∈ B^{tk}.  (7)
This expression shows that fe(w) is a convex function on B^{tk}, since ‖w − w^t‖ and ‖w − w^k‖ are convex functions. A function that is convex on a compact set attains its maximum on the boundary of the set, see, e.g. [19]. Then, defining

w_M^{tk} := arg max_{w∈B^{tk}} fe(w),

we have that

w_M^{tk} ∈ ∂B^{tk}.  (8)
The boundary ∂B^{tk} is composed of the (n−1)-faces of B^{tk}, hence w_M^{tk} is on an (n−1)-face of B^{tk}. An (n−1)-face of B^{tk} is either a portion of an (n−1)-face of V̄, or a portion of an (n−1)-face of V̲, or a portion of ∂W.

Consider the case that w_M^{tk} lies on an (n−1)-face of V̄. Then w_M^{tk} ∈ H̄^{tν} for some ν, where H̄^{tν} is the (n−1)-dimensional hyperbola defined by

H̄^{tν} := {w ∈ Rⁿ : ‖w − w^t‖ − ‖w − w^ν‖ = (h̄^ν − h̄^t)/γ}.

Suppose that this hyperbola has curvature oriented towards w^ν. Since the level surfaces of fe(w) are ellipsoids with curvature oriented towards w^t, it follows that fe(w) is convex on the (n−1)-face individuated by H̄^{tν}. This implies that w_M^{tk} is on the boundary of the (n−1)-face. If the hyperbola H̄^{tν} has curvature oriented towards w^t, we can write fe(w) as

fe(w) = (1/2)(h̄^ν − h̲^k) + (γ/2)(‖w − w^ν‖ + ‖w − w^k‖),  w ∈ H̄^{tν}.
The level surfaces of this function are ellipsoids with curvature oriented towards w^ν, and thus w_M^{tk} is on the boundary of the (n−1)-face. Similarly, it can be seen that this property holds also if the maximum lies on an (n−1)-face of V̲. Therefore, if w_M^{tk} lies on an (n−1)-face of V̄ or V̲, it is on the boundary of the (n−1)-face, i.e. on an (n−2)-face of B^{tk}.

This property holds also in the case that w_M^{tk} is on an (n−1)-face B₁^{tk} belonging to ∂W. Indeed, B₁^{tk} is a portion of a plane and thus a convex set. This implies that the error function fe is convex on B₁^{tk} and that its maximum is on the boundary of B₁^{tk}, i.e. on an (n−2)-face of B^{tk}.

Iterating this argument for n−3, n−4, ..., 0, we have that w_M^{tk} is on a 0-face of B^{tk}, i.e. w_M^{tk} ∈ B₀^{tk}. The claim of the theorem follows, since w_M^{tk} ∈ B₀^{tk} ⊆ B₀ for all t, k = 1, 2, ..., T. □
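A minimal sketch of the computation prescribed by Theorem 4, assuming the vertex set B₀ has already been enumerated (the data layout and function names are hypothetical; the enumeration of the vertices is the subject of the algorithm below):

import numpy as np

def exact_radius(vertex_list, w, h_up, h_lo, gamma):
    """Theorem 4: r_I = max over the vertices of B0 of the error
    function f_e of Eq. (7).

    vertex_list: triples (v, t, k), where v is a vertex of the cell
    B^{tk}; w[t] is the point w^t; h_up[t], h_lo[t] are the values
    h-bar^t and h-underbar^t.
    """
    r_I = float("-inf")
    for v, t, k in vertex_list:
        fe = 0.5 * (h_up[t] - h_lo[k]) + 0.5 * gamma * (
            np.linalg.norm(v - w[t]) + np.linalg.norm(v - w[k]))
        r_I = max(r_I, fe)
    return r_I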
The computation of rI indicated in Theorem 4 requires calculating the vertices of the sets B^{tk}. An algorithm for this calculation has been developed in Matlab®. The main functions of the algorithm (main program and function vertices) are reported below in a code-like format. The other functions are only qualitatively described, since their code is quite complex and not essential to understanding how the algorithm works.
Algorithm

Main program:
  VERT = [];
  for t = 1 : T
    for k = 1 : T
      Vert = vertices(w^{tk});
      VERT = [VERT Vert];
    end
  end

Function vertices:
  v = vert_search(w^{tk});
  Vert = v; Vpv = v; a = 0;
  while a == 0
    V = Vpv; Vpv = []; b = 0;
    for i = 1 : size(V, 2)
      [Vfn, b(i)] = first_neighbours(V(:, i));
      Vpv = [Vpv Vfn];
    end
    a = all(b == 1);
    Vert = [Vert Vpv];
  end

Function vert_search: this function takes a starting point w^{tk} as input and gives a vertex v ∈ B₀^{tk} as output.

Function first_neighbours: this function takes a vertex V(:, i) ∈ B₀^{tk} as input and gives as output the set Vfn of all vertices of B₀^{tk} which are first neighbours of V(:, i). The function first_neighbours also allows one to check whether all the points of B₀^{tk} have been computed: if b(i) = 1 for all the steps of the for loop in the function vertices, then Vert = B₀^{tk}; in this case, the while loop stops and the points of B₀^{tk} are all contained in Vert. On the contrary, if b(i) = 0 for some step of the for loop, the while loop continues until Vert = B₀^{tk}.

The function vertices evaluates the vertices B₀^{tk} of a cell B^{tk} for given t, k. In order to evaluate all the vertices in ∪_{t,k} B₀^{tk}, the main program runs this function for all t, k = 1, 2, ..., T. Clearly, a vertex of a cell is in general also a vertex of other cells; in order to avoid unnecessary computations, the function first_neighbours also recognizes whether a vertex has already been evaluated and then skips its computation. A simplified version of the algorithm, requiring only one for loop in the main program, has been implemented for the computation of the vertices of an HVD V(W^T, Λ).

The computation of rI as indicated in Theorem 4 and in the above algorithm can be performed in principle for any dimension n of the regressor space. However, as happens for standard Voronoi diagrams [2], the computational complexity needed to evaluate the vertices is exponential in n. This issue can be overcome by means of the theorem below, which allows one to compute an approximate radius of information at low computational cost.
Define the following HVD: H := V(W^T, 0). Let C^t = [A^t] ∩ W, where A^t are the cells of H. The following lemma, describing some properties of the sets C^t and B^{tt}, is essential for the calculation of the approximate radius of information.

Lemma 1. The sets C^t and B^{tt} are convex, and B^{tt} ⊆ C^t.

Proof. The HVD H is a standard Voronoi diagram (see Section 3), hence the cells A^t are polyhedra, i.e. convex sets. It follows that C^t is a convex set, being the intersection of two convex sets (W is assumed convex).

From the definition of C̄^t and C̲^t in Section 3, we have that B^{tt} is given by

B^{tt} = [∩_{ν≠t} S̄^{tν}] ∩ [∩_{ν≠t} S̲^{tν}] ∩ W,  (9)

where

S̄^{tν} := {w ∈ Rⁿ : ‖w − w^t‖ − ‖w − w^ν‖ < (h̄^ν − h̄^t)/γ},
S̲^{tν} := {w ∈ Rⁿ : ‖w − w^t‖ − ‖w − w^ν‖ < (h̲^t − h̲^ν)/γ}.
Note that S̄^{tν} ⊆ S̲^{tν} if h̄^ν − h̄^t ≤ h̲^t − h̲^ν, and S̲^{tν} ⊆ S̄^{tν} otherwise. Eq. (9) can thus be written as

B^{tt} = [∩_{ν≠t} S^{tν}] ∩ W,

where

S^{tν} := {w ∈ Rⁿ : ‖w − w^t‖ − ‖w − w^ν‖ < λ^{tν}},

with λ^{tν} = (h̄^ν − h̄^t)/γ if h̄^ν − h̄^t ≤ h̲^t − h̲^ν, and λ^{tν} = (h̲^t − h̲^ν)/γ otherwise.

It is easy to see that the S^{tν} are convex regions. Indeed, the surface that defines S^{tν}, individuated by the equation ‖w − w^t‖ − ‖w − w^ν‖ = λ^{tν}, is an (n−1)-dimensional hyperbola with curvature oriented towards w^t, i.e. towards S^{tν}. It follows that B^{tt} is convex, being the intersection of convex sets.

The cells A^t are defined by

A^t := ∩_{ν≠t} S₀^{tν},  S₀^{tν} := {w ∈ Rⁿ : ‖w − w^t‖ − ‖w − w^ν‖ < 0}.

Being λ^{tν} ≤ 0, we have that S^{tν} ⊆ S₀^{tν}. This implies that B^{tt} ⊆ C^t. □
Consider the following optimization problems:

δ_i^t = max_{w∈C^t} |w_i − w_i^t|,  i = 1, 2, ..., n,  (10)

ω^{i,t} = arg max_{w∈B^{tt}} |w_i − w_i^t|,  i = 1, 2, ..., n,
θ^t = max_{i=1,...,n} γ ‖ω^{i,t} − w^t‖,  (11)

and set δ^t := γ (Σ_{i=1}^n (δ_i^t)²)^{1/2}. The following theorem provides upper and lower bounds on the radius of information.
Theorem 5. The radius of information rI is bounded as

r̲ ≤ rI ≤ r̄,  (12)

where r̄ = ε + max_t δ^t and r̲ = ε + max_t θ^t.
Proof. From Theorem 3 we have that the error function can be expressed as

fe(w) = (1/2)(h̄^j − h̲^k) + (γ/2)(‖w − w^j‖ + ‖w − w^k‖),

where

j = arg min_t (h̄^t + γ‖w − w^t‖),
k = arg max_t (h̲^t − γ‖w − w^t‖).  (13)

The HVD H is a standard Voronoi diagram (see Section 3), hence the sets C^t constitute a complete partition of W. Suppose that w ∈ C^t. From (13) it follows that

h̄^j + γ‖w − w^j‖ ≤ h̄^t + γ‖w − w^t‖,
−h̲^k + γ‖w − w^k‖ ≤ −h̲^t + γ‖w − w^t‖.

We have therefore

fe(w) ≤ (1/2)(h̄^t − h̲^t) + γ‖w − w^t‖ = ε + γ‖w − w^t‖,  w ∈ C^t.

Hence max_{w∈C^t} fe(w) ≤ max_{w∈C^t} (ε + γ‖w − w^t‖). From the definition of δ^t, it is easy to see that max_{w∈C^t} γ‖w − w^t‖ ≤ δ^t, which yields

max_{w∈C^t} fe(w) ≤ ε + δ^t.
Since the cells C^t, t = 1, 2, ..., T, define a complete partition of W, we have

rI = ‖fe‖_∞ = ess sup_{w∈W} |fe(w)| = max_{t=1,...,T} max_{w∈C^t} fe(w),

and then

rI = max_{t=1,...,T} max_{w∈C^t} fe(w) ≤ max_{t=1,...,T} (ε + δ^t),
which proves that r̄ is an upper bound of rI.

Let us now show that r̲ ≤ rI. Since B^{tt} ⊆ C^t (see Lemma 1), we have that

rI = max_{t=1,...,T} max_{w∈C^t} fe(w) ≥ max_{w∈B^{tt}} fe(w),  ∀t.

Theorem 3 shows that, for w ∈ B^{tt}, the error function can be expressed as

fe(w) = ε + γ‖w − w^t‖.
From the definition of θ^t it follows that max_{w∈B^{tt}} γ‖w − w^t‖ ≥ θ^t, ∀t. We have thus

rI ≥ ε + max_{t=1,...,T} θ^t,

which shows that r̲ is a lower bound of rI. □
Note that the optimization problems (10) and (11) can be easily solved. Indeed, (10) is equivalent to the following three problems: a = max_{w∈C^t} (w_i − w_i^t), b = min_{w∈C^t} (w_i − w_i^t), δ_i^t = max(|a|, |b|). The first two problems are convex, since C^t is a convex set (see Lemma 1) and w_i − w_i^t is a linear function; the third one is trivial. The same argument holds for the first of Eqs. (11), since B^{tt} is convex as well; the second of Eqs. (11) is trivial.

The following approximate radius of information

r̂I := (1/2)(r̲ + r̄)  (14)

is an estimate of rI and can be used when the dimension n of the regressor space is large.

Remark. The computational complexity of evaluating r̂I is O(n²). Indeed, a complexity O(n) is required for the evaluation of δ_i^t or ω^{i,t}, since it must be verified that constraints such as ‖w − w^t‖ − ‖w − w^ν‖ ≤ λ^{tν} are satisfied, and the computation of the norm ‖x‖ = (Σ_{i=1}^n x_i²)^{1/2} is O(n). Since δ_i^t and ω^{i,t} must be calculated for i = 1, 2, ..., n, it follows that the computation of δ^t and θ^t, and thus the computations of r̄, r̲ and r̂I, have complexity O(n²). Note that, while the calculation of rI becomes intractable in practice for n ≥ 5 or 6, the calculation of r̂I can be performed for large values of n without significant problems.
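The following sketch computes the upper bound r̄ of Theorem 5 under the additional assumption that W is a box, so that each problem (10) is a linear program over the polyhedral cell C^t (the Voronoi condition ‖w − w^t‖ ≤ ‖w − w^ν‖ is linear in w). The lower bound would additionally require the hyperbolic constraints describing B^{tt}, so it is omitted here; all names and the data layout are illustrative.

import numpy as np
from scipy.optimize import linprog

def upper_bound(points, eps, gamma, w_lo, w_hi):
    """r-bar = eps + gamma * max_t ||(delta_1^t, ..., delta_n^t)||,
    with each delta_i^t from (10) obtained by two LPs per coordinate."""
    T, n = points.shape
    box = list(zip(w_lo, w_hi))  # the box W
    best = 0.0
    for t in range(T):
        others = np.delete(points, t, axis=0)
        # Voronoi cell of w^t: 2 (w^nu - w^t)^T w <= ||w^nu||^2 - ||w^t||^2
        A = 2.0 * (others - points[t])
        b = (others ** 2).sum(axis=1) - (points[t] ** 2).sum()
        delta = np.zeros(n)
        for i in range(n):
            c = np.zeros(n)
            c[i] = 1.0
            lo = linprog(c, A_ub=A, b_ub=b, bounds=box)   # min of w_i
            hi = linprog(-c, A_ub=A, b_ub=b, bounds=box)  # max of w_i
            a_i = lo.fun - points[t, i]
            b_i = -hi.fun - points[t, i]
            delta[i] = max(abs(a_i), abs(b_i))
        best = max(best, gamma * float(np.linalg.norm(delta)))
    return eps + best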
5. Example

The radius of information allows one to assess the accuracy achieved by the optimal estimate provided by Theorem 1. More generally, the radius of information allows one to assess the quality of the overall identification procedure, involving specific problems such as input type selection, sampling time choice, input channel selection, regressor choice, and model order selection [17]. These problems are quite relevant in system identification [10,4]. In the literature, much effort has been spent to solve them for linear systems, see, e.g. [10,3,7]; on the contrary, very few studies on nonlinear systems are available [5,8].

In this example, we have considered an input type selection problem for the following nonlinear system:

y(t + 1) = 0.88 y(t) − 0.12 tanh[15 y(t)] + 0.06 u(t).  (15)

The initial condition y(1) = 0 has been assumed. Three input types have been used:

U(1) = {3 sin(0.2 t), t = 1, 2, ..., T},
U(2) = {3 sin(0.0009 t²), t = 1, 2, ..., T},
U(3) = {WN(0, 4, t), t = 1, 2, ..., T},  (16)

where WN(0, 4, t) is a white Gaussian noise of mean 0 and variance 4. For each input type, a simulation of system (15) of length T = 300 has been performed, and the corresponding exact radius of information rI, approximate radius of information r̂I, lower bound r̲I, and upper bound r̄I have been computed.
[Fig. 1. Input sequences: u(t) for the three input types U(1), U(2), U(3), t = 1, ..., 300.]
The sequences U(1), U(2), U(3) used in these simulations are shown in Fig. 1. The regressor has been defined as

w(t) = [y(t) u(t)].
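A sketch of the simulation setup of this example; the seed and the variable names are our own, and WN(0, 4, t) is drawn with standard deviation 2 = √4:

import numpy as np

def simulate(u, y1=0.0):
    """System (15): y(t+1) = 0.88 y(t) - 0.12 tanh(15 y(t)) + 0.06 u(t)."""
    y = np.empty(len(u) + 1)
    y[0] = y1
    for t in range(len(u)):
        y[t + 1] = 0.88 * y[t] - 0.12 * np.tanh(15.0 * y[t]) + 0.06 * u[t]
    return y

T = 300
t = np.arange(1, T + 1)
inputs = {
    "U1": 3.0 * np.sin(0.2 * t),            # Eq. (16)
    "U2": 3.0 * np.sin(0.0009 * t ** 2),
    "U3": np.random.default_rng(0).normal(0.0, 2.0, T),
}
# Regressors w(t) = [y(t) u(t)] for each input type
regressors = {name: np.column_stack([simulate(u)[:-1], u])
              for name, u in inputs.items()}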
The regressor domain of interest has been assumed to be the rectangular region indicated in Fig. 2 and defined by

W := {w : w₁ ≤ 0.35, w₁ ≥ −0.35, w₂ ≤ 3.5, w₂ ≥ −3.5}.

The values ε = 0 and γ = 1.5 have been taken on the basis of the procedure proposed in [12]. The values of the exact radius of information rI, approximate radius of information r̂I, lower bound r̲I, and upper bound r̄I obtained are shown in Table 1. The fact that U(3) provides a lower radius of information, and hence a higher identification accuracy, could be related to the more uniform exploration of the regressor domain W provided by U(3) with respect to U(1) and U(2). This can be observed in Fig. 2, where the "measured" regressors are shown for the three simulations. Considering the values of the exact and approximate radius of information in Table 1, we can conclude that U(3) is the best input type among {U(1), U(2), U(3)} for the identification of system (15).
[Fig. 2. "Measured" regressors: for each input U(1), U(2), U(3), the regressors w = (w₁, w₂) visited during the simulation, together with the domain W.]
Table 1
Values of rI, r̂I, r̲I and r̄I corresponding to input sequences U(1), U(2), U(3)

        rI      r̂I      r̲I      r̄I
U(1)    1.08    0.88    0.66    1.1
U(2)    0.97    1       0.96    1.04
U(3)    0.49    0.49    0.47    0.51
6. Conclusions

Within the SM-IBC approach to nonlinear system identification, a quantity called the radius of information, giving the worst-case identification error, is defined. The radius of information is important in order to assess the quality of a given model and, more generally, of a whole identification procedure. In this paper, two algorithms for the evaluation of the radius of information have been proposed: the first is exact but requires a complexity exponential in the dimension of the regressor space; the second is approximate and involves a quadratic complexity.
References

[1] J. Chen, G. Gu, Control-Oriented System Identification: An H∞ Approach, Wiley, New York, 2000.
[2] H. Edelsbrunner, Algorithms in Combinatorial Geometry, Springer, Berlin, 1987.
[3] K.R. Godfrey, Perturbation Signals for System Identification, Prentice-Hall International, New York, 1993.
[4] G. Goodwin, R. Payne, Dynamic System Identification: Experiment Design and Data Analysis, Academic Press, New York, 1977.
[5] D. Gorinevsky, On the persistency of excitation in radial basis function network identification of nonlinear systems, IEEE Trans. Neural Networks 6 (1995) 1237-1244.
[6] R. Haber, H. Unbehauen, Structure identification of nonlinear dynamic systems - a survey on input/output approaches, Automatica 26 (1990) 651-677.
[7] H. Hjalmarsson, From experiment design to closed loop control, Automatica 41 (2005) 393-438.
[8] K. Hsu, C. Novara, T. Vincent, M. Milanese, K. Poolla, Parametric and nonparametric curve fitting, Automatica 42 (11) (2006) 1869-1873.
[9] R. Isermann, S. Ernst, O. Nelles, Identification with dynamic neural networks - architectures, comparisons, applications, in: Sysid 97, vol. 3, 1997, pp. 997-1022.
[10] L. Ljung, System Identification: Theory for the User, Prentice-Hall, Upper Saddle River, NJ, 1999.
[11] M. Milanese, J. Norton, H.P. Lahanier, E. Walter, Bounding Approaches to System Identification, Plenum Press, New York, 1996.
[12] M. Milanese, C. Novara, Set membership identification of nonlinear systems, Automatica 40 (6) (2004) 957-975.
[13] M. Milanese, R. Tempo, Optimal algorithms theory for robust estimation and prediction, IEEE Trans. Automatic Control 30 (1985) 730-738.
[14] M. Milanese, A. Vicino, Optimal estimation theory for dynamic systems with set membership uncertainty: an overview, Automatica 27 (1991) 997-1009.
[15] K.S. Narendra, S. Mukhopadhyay, Neural networks for system identification, in: Sysid 97, vol. 2, 1997, pp. 763-770.
[16] E. Novak, Deterministic and Stochastic Error Bounds in Numerical Analysis, vol. 1349, Springer, Berlin, 1988.
[17] C. Novara, Experiment design in nonlinear set membership identification, in: American Control Conference, New York City, USA, 2007.
[18] J.R. Partington, Interpolation, Identification and Sampling, Clarendon Press, Oxford, 1997.
[19] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[20] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P. Glorennec, H. Hjalmarsson, A. Juditsky, Nonlinear black-box modeling in system identification: a unified overview, Automatica 31 (1995) 1691-1723.
[21] J.F. Traub, G.W. Wasilkowski, H. Woźniakowski, Information-Based Complexity, Academic Press, New York, 1988.
[22] G.W. Wasilkowski, H. Woźniakowski, Complexity of weighted approximation over R^d, J. Complexity 17 (2001) 722-740.
[23] H. Woźniakowski, Open problems for tractability of multivariate integration, J. Complexity 19 (2003) 434-444.
Journal of Complexity 23 (2007) 952 – 961 www.elsevier.com/locate/jco
A note on two fixed point problems
Ch. Boonyasiriwat^a, K. Sikorski^{a,∗,1}, Ch. Xiong^{b,1}
a School of Computing, University of Utah, Salt Lake City, UT 84112, USA
b Department of Chemistry, University of Utah, Salt Lake City, UT 84112, USA
Received 20 March 2006; accepted 19 April 2007. Available online 10 May 2007.
We dedicate this paper to Henryk Woźniakowski on the occasion of his 60th birthday.
Abstract

We extend the applicability of the Exterior Ellipsoid Algorithm for approximating n-dimensional fixed points of directionally nonexpanding functions. Such functions model many practical problems that cannot be formulated in the smaller class of globally nonexpanding functions. The upper bound 2n² ln(2/ε) on the number of function evaluations for finding ε-residual approximations to the fixed points remains the same for the larger class. We also present a modified version of a hybrid bisection-secant method for efficient approximation of univariate fixed point problems in combustion chemistry.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Fixed point problems; Optimal algorithms; Nonlinear equations; Ellipsoid algorithm; Computational complexity
1. Introduction

An upper bound on the number of function evaluations needed to compute an ε-residual approximation x_ε to some fixed point of a function f, ‖f(x_ε) − x_ε‖₂ ≤ ε, for a function f that is globally nonexpanding in the 2-nd norm, is 2n² ln(1/ε) in n dimensions (see [3, Section 3]). This bound is realized by the Exterior Ellipsoid Algorithm (EEA). It is much better than the best known bounds O((1/ε)²) for the Krasnoselski-Mann type iterations [11], and is within a factor of n from the
∗ Corresponding author.
E-mail addresses: [email protected] (Ch. Boonyasiriwat), [email protected] (K. Sikorski), [email protected] (Ch. Xiong). 1 Partially supported by DOE under the C-SAFE center.
0885-064X/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jco.2007.04.004
best possible bound O(n ln(1/ε)), ε → 0 [8,10], realized by the Centroid and Interior Ellipsoid algorithms (IEA). At the conference in Bedlewo (Poland), co-organized in 2004 by Professor Woźniakowski, Dr. Vassin asked us if these bounds and algorithms could be extended to larger, more practical classes of functions that are only nonexpanding in the direction of fixed points. We stress that these larger classes contain functions that may be globally expanding, may be noncontinuous, or may have unbounded derivatives. It turns out that the answer to Dr. Vassin's question is positive. We show, with a simple proof, that the ellipsoid algorithms are applicable for the larger class and that the complexity bounds stay the same as for the globally nonexpanding functions. Several numerical tests of a new, numerically stable implementation of the EEA, as well as comparisons to simple iteration and Newton-type methods, are presented in a separate paper [4].

We also introduce a univariate hyper-bisection/secant (HBS) method for approximating fixed points of certain combustion chemistry problems. That algorithm enjoys the average case number of iterations O(log log(1/ε)) for computing ε-absolute solutions. It is a modification of the bisection-secant method of Novak, Ritter and Woźniakowski that was proven by them to be optimal in the average case [12], with average number of function evaluations O(log log(1/ε)).

We stress that the ellipsoid algorithms are not applicable in the infinity-norm case, since the "cutting ball/plane" Lemma 3.1 that makes possible the construction of exterior/interior ellipsoids does not hold in that case. For the infinity norm case we developed a Bisection Envelope algorithm (BEFix) [14] and a Bisection Envelope Deep-Cut algorithm (BEDFix) [15] for approximating fixed points of two-dimensional nonexpanding functions. Those algorithms enjoy the minimal number of function evaluations 2⌈log₂(1/ε)⌉ + 1. We also developed a (non-optimal) recursive fixed point algorithm (PFix) for approximating fixed points of n-dimensional nonexpanding functions with respect to the infinity norm (see [16,17]). We note that the minimal number of function evaluations needed for finding ε-residual solutions for expanding functions with factor ϱ > 1 is exponential, of order (ϱ/ε)^{n−1}, as ε → 0 [7,6].
2. Classes of functions

Given the domain B = {x ∈ Rⁿ : ‖x‖ ≤ 1}, the n-dimensional unit ball, we consider the class of Lipschitz continuous functions

B_{λ≤1} ≡ {f : B → B : ‖f(x) − f(y)‖ ≤ λ‖x − y‖, ∀x, y ∈ B},  (1)

where n ≥ 2, ‖·‖ = ‖·‖₂, and 0 < λ ≤ 1. In the case when 0 < λ < 1, the class of functions is denoted by B_{λ<1}. The existence of fixed points α, f(α) = α, of functions f in B_{λ≤1} is assured by Brouwer's fixed point theorem [5]. The EEA algorithm computes an absolute ε-approximation x_ε to α, ‖x_ε − α‖ ≤ ε, for every f ∈ B_{λ<1}, and computes a residual ε-approximation x_ε, ‖f(x_ε) − x_ε‖ ≤ ε, for every f ∈ B_{λ≤1}.

We extend the applicability of the EEA algorithm to directionally nonexpanding classes of functions considered by Vassin and Eremin in [26]. We indicate that the complexity bounds for the EEA algorithm do not change in this case. Those larger classes were investigated for problems defined by differential and integral equations originating in geophysics, atmospheric research, material science, and image deblurring [24,25,1,27,13]. These problems were effectively solved by Feyer-type iterative methods and/or some general optimization techniques; however,
no formal complexity bounds were derived. These classes are defined by

B̃_{λ≤1} ≡ {f : B → B : the set of fixed points of f, S(f), is nonempty, and ∀x ∈ B and ∀α ∈ S(f) we have ‖f(x) − f(α)‖ ≤ λ‖x − α‖},  (2)

where n ≥ 2 and λ ≤ 1. We note that the functions in B̃_{λ≤1} may be expanding globally, and therefore the class B_{λ≤1} is a proper subclass of B̃_{λ≤1}.

We finally introduce a univariate combustion chemistry fixed point problem defined in the class

G = { g : [a, b] → [a, b] : g(t) = C4 + C5 / (√(A(t)) + √(C3 + A(t)))², A(t) = (C1 t²/(t − C2)) e^{−Ec/(Rt)} },  (3)

where the interval [a, b] and the constants C1, ..., C5, R and Ec are explained in Section 4. We solve this problem in the equivalent zero finding formulation f(x) = 0, where f(x) = x − g(x). We derive the HBS method, a modification of the bisection/regula-falsi/secant (BRS) method, which was proven almost optimal in the average case setting [12], with complexity O(log log(1/ε)). To get ε-absolute approximations with ε = 10⁻⁴ we need at most 6 function evaluations in the class G. Since the number of function evaluations in the HBS method is essentially bounded by the number of function evaluations in the BRS method, we conclude that the average case complexity of HBS is at most O(log log(1/ε)).

3. Constructive lemmas and cost bounds

The following "cutting ball/plane" lemma is the basis of the EEA algorithm for fixed points [19,21,23,20]. The proof of this lemma can be found in [21,20].

Lemma 3.1. Let f ∈ B_{λ<1}. Suppose that A ⊆ B contains the fixed point α. Then, for every x ∈ A,

α ∈ A ∩ B(c, r),

where B(c, r) = {x ∈ Rⁿ : ‖x − c‖ ≤ r}, c = x + (1/(1 − λ²))(f(x) − x) and r = (λ/(1 − λ²)) ‖f(x) − x‖.
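A direct transcription of the ball of Lemma 3.1 (the function name is ours):

import numpy as np

def cutting_ball(x, fx, lam):
    """Ball B(c, r) of Lemma 3.1 containing the fixed point alpha:
    c = x + (f(x) - x)/(1 - lam**2), r = lam*||f(x) - x||/(1 - lam**2),
    for 0 < lam < 1."""
    d = fx - x
    c = x + d / (1.0 - lam ** 2)
    r = lam * np.linalg.norm(d) / (1.0 - lam ** 2)
    return c, r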
The following lemma and corollary exhibit upper bounds on the number of iterations of the EEA algorithm.

Lemma 3.2. For any ε ∈ (0, 1) and f ∈ B_{λ≤1}, the EEA algorithm requires at most i = ⌈2n(n + 1) ln((2 + ε)/ε)⌉ iterations to compute x_i ∈ Rⁿ such that ‖f(x_i) − x_i‖ ≤ ε, as ε → 0.

Proof. We give a sketch of the proof, since it follows the proof of Lemma 2.2 of Huang et al. [8]. The upper bound on the number of function evaluations of the EEA algorithm is obtained by replacing the volume reduction constant 0.861 of the IEA of Huang et al. [8] by the volume-reduction constant of the EEA algorithm, given by exp(−1/(2(n + 1))) < 1 [9,20]. Following the
formula (2.3) of the proof in Huang et al. [8] we get

(ε/(2 + ε))ⁿ ≤ e^{−i/(2(n+1))}.  (4)

The number of iterations that guarantees to obtain an ε-residual approximation x_i is the smallest i for which this inequality is violated. Therefore, we get

n ln((2 + ε)/ε) ≤ i/(2(n + 1)),  (5)

that yields i = ⌈2n(n + 1) ln((2 + ε)/ε)⌉, and completes the proof. □

We remark that the bound obtained in the above lemma is better by a factor of n than the bound obtained in Tsay [23]. As a direct corollary from this lemma we get:

Corollary 3.3. If λ < 1, the EEA algorithm finds an ε-approximation x_i of the fixed point α in the absolute sense, ‖x_i − α‖₂ ≤ ε, within i ≤ 2n²(ln(2/ε) + ln(1/(1 − λ))) iterations.

Proof. We observe that ‖x_i − α‖ ≤ (1/(1 − λ)) ‖x_i − f(x_i)‖ and take ε := ε(1 − λ) in Lemma 3.2. □

Finally, we derive a constructive lemma for the larger class B̃_{λ≤1}. This lemma is similar in nature to Lemma 3.1, since after each function evaluation it enables us to locate the set of fixed points in the intersection of the previous set with a certain half space (in Lemma 3.1 it was the intersection of the previous set with a ball).

Lemma 3.4. Let f ∈ B̃_{λ≤1} and A ⊂ B be such that the set of fixed points S(f) ⊂ A. Then, ∀x ∈ A,
S(f) ⊂ A ∩ H_x,  (6)

where the halfspace H_x = {y ∈ Rⁿ : (y − c)ᵀ(f(x) − c) ≤ 0}, for c = (f(x) + x)/2.

Proof. Suppose on the contrary that there exists α ∈ S(f) such that (α − c)ᵀ(f(x) − c) < 0. Then, since f(x) − c = c − x, we get

‖f(x) − α‖² = (f(x) − c + c − α)ᵀ(f(x) − c + c − α)
 = ‖f(x) − c‖² + ‖c − α‖² + 2(c − α)ᵀ(f(x) − c)
 > ‖f(x) − c‖² + ‖c − α‖²
 = ‖x − c‖² + ‖c − α‖²
 > ‖x − c‖² + ‖c − α‖² − 2(c − α)ᵀ(c − x)
 = ‖x − α‖²,  (7)

which contradicts that f ∈ B̃_{λ≤1} and completes the proof. □
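The cut of Lemma 3.4 in code form (a sketch; representing the halfspace by its supporting point and outward normal is our choice):

def halfspace_cut(x, fx):
    """Halfspace H_x = {y : (y - c)^T (f(x) - c) <= 0} of Lemma 3.4,
    with c = (f(x) + x)/2, for numpy arrays x and fx = f(x).
    Returns (c, normal); a point y is kept iff normal @ (y - c) <= 0."""
    c = 0.5 * (fx + x)
    normal = fx - c  # equals (f(x) - x)/2
    return c, normal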
The above lemma implies that the EEA and IEA algorithms can also be applied to functions in B̃_{λ≤1}, yielding the same complexity bounds as in the smaller classes B_{λ≤1}, since all of the arguments in the proof of Lemma 3.2 and of Lemma 2.2 of Huang et al. [8] hold in the class B̃_{λ≤1}. We formulate this conclusion in:

Lemma 3.5. For any ε ∈ (0, 1) and f ∈ B̃_{λ≤1},
(i) the EEA algorithm requires at most i = ⌈2n(n + 1) ln((2 + ε)/ε)⌉ iterations to compute x_i ∈ Rⁿ such that ‖f(x_i) − x_i‖ ≤ ε, as ε → 0, and
(ii) the IEA algorithm requires at most k = ⌈6.7 n ln((2 + ε)/ε)⌉ + 1 iterations to compute x_k ∈ Rⁿ such that ‖f(x_k) − x_k‖ ≤ ε, as ε → 0.

4. Combustion chemistry fixed point problem

In this section we focus on the design of a nearly optimal algorithm that is applied to univariate fixed point problems originating in modeling combustion of energetic materials. In particular, our algorithm is able to efficiently approximate fixed points of nonlinear equations modeling the burning surface temperature of explosive materials. These fixed point calculations are utilized in large scale transient combustion simulations. They are repeated at every grid cell of a very large model and at every time step. This is why they have to be extremely fast and sufficiently accurate.

We derive an almost optimal (on the average) hyper-bisection/secant (HBS) modification of a hybrid bisection-regula falsi-secant (BRS) method [12] to solve a nonlinear fixed point problem that is derived from the Ward, Son, Brewster (WSB) combustion model [28-30]. This model implies [31] that the burning surface temperature Ts is a fixed point of the equation T = G(T), where the function G(·) = G_{P,T0}(·) is uniquely defined by the initial gas phase pressure P and the initial solid temperature T0, and is given by

G(T) = C4 + C5 / (√(A(T)) + √(C3 + A(T)))²,  A(T) = (C1 T²/(T − C2)) e^{−Ec/(RT)},

where all constants Ci, Ec and R are positive. Those constants characterize the properties of materials and depend uniquely on the initial gas phase pressure P and the initial solid temperature T0 [31].

We remark that the functions G(·) are in general expanding, and that in the univariate case we are able to find ε-absolute approximations to the fixed points of such functions with worst case complexity O(log(1/ε)) via the use of bisection-envelope algorithms [20]. In order to solve the problem even faster, we reformulate this fixed point problem as a zero finding problem: for the function f(T) = T − G(T) and a small positive number ε, we want to find an ε-approximation T_ε of the exact zero Ts of f, f(Ts) = 0, with respect to the root criterion |T_ε − Ts|/|Tmax − Tmin| ≤ ε, where [Tmin, Tmax] is the interval containing the solution Ts. It turns out [31] that we can set Tmin = C4, and Tmax = G(Tsmax) if Tmin < Tsmax, or Tmax = G(Tmin) otherwise, where Tsmax = C2 − Ec/(2R) + √(C2² + Ec²/(4R²)). Since the function f(·) is continuous and has different signs at the endpoints, it follows that the exact zero Ts is in [Tmin, Tmax]. Fig. 1 depicts a number of functions f with various T0 and P.
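A sketch of G and of the zero finding reformulation; the placement of the square roots in the reconstructed denominator follows the reading given above and should be checked against the original paper:

import numpy as np

def G(T, C1, C2, C3, C4, C5, R, Ec):
    """Burning surface map G(T), with A(T) = C1*T**2/(T - C2)*exp(-Ec/(R*T))."""
    A = C1 * T ** 2 / (T - C2) * np.exp(-Ec / (R * T))
    return C4 + C5 / (np.sqrt(A) + np.sqrt(C3 + A)) ** 2

def f(T, *params):
    """Equivalent zero finding formulation: f(T) = T - G(T)."""
    return T - G(T, *params)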
[Fig. 1. Functions f(Ts) (K) versus Ts (K), for (T0, P) = (420 K, 10 atm), (390 K, 50 atm), (360 K, 200 atm), (330 K, 800 atm), (300 K, 2000 atm).]
4.1. Description of the algorithm

For the univariate zero finding problem, the solution can be efficiently obtained by using the BRS method, which is almost optimal on the average [12].

4.1.1. The BRS method

We now outline the BRS method investigated in Novak et al. [12]. This method computes points T_{s,i} ∈ [Tmin, Tmax], at which the function f(·) is evaluated, and a subinterval [l_i, r_i] containing a zero of f(·). We always have f(l_i) ≤ 0 ≤ f(r_i), T_{s,i} ∈ [l_{i−1}, r_{i−1}], and [l_i, r_i] ⊂ [l_{i−1}, r_{i−1}]. Each function evaluation is preceded by the verification of the stopping rule. At the beginning, and later after each bisection step, the method takes two steps of the regula falsi method, starting from the endpoints of the interval [l_{i−1}, r_{i−1}]. Then, secant steps are performed as long as they are well defined, result in points in the current interval, and reduce the length of the current interval by at least one half in every three steps. If any of these conditions is violated, a bisection step takes place. We utilize the absolute termination criterion given by

Stop_i = 1 if (r_i − l_i)/2 ≤ ε · (Tmax − Tmin), and Stop_i = 0 otherwise.

After the termination (Stop_i = 1), we take T_{s,i} = (l_i + r_i)/2; then |T_{s,i} − Ts|/|Tmax − Tmin| ≤ ε, since the exact solution Ts ∈ [l_i, r_i]. We further define the points

Secant(u, w) = u − f(u)(u − w)/(f(u) − f(w)) if f(u) ≠ f(w), and undefined otherwise,
for the secant method, with u, w ∈ [Tmin, Tmax], and

Bisection_i = (l_{i−1} + r_{i−1})/2

for the bisection method. In each step, a new interval I_i containing a zero of f(Ts) is computed:

I_i = [l_{i−1}, x_i] if f(x_i) > 0,  I_i = [x_i, r_{i−1}] if f(x_i) < 0,  I_i = [x_i, x_i] if f(x_i) = 0.

The complete method is summarized in the flowchart in Fig. 2.

[Fig. 2. Flowchart of the BRS method.]
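A simplified, self-contained variant of the bisection/secant hybrid (it keeps the bracket and the fallback rule, but omits the exact regula falsi bookkeeping and the three-step length test of BRS):

def hybrid_zero(f, l, r, eps, max_iter=100):
    """Find an eps-approximation (root criterion) of the zero of f in
    [l, r], assuming f(l) <= 0 <= f(r): try a secant-type step and
    fall back to bisection whenever it is undefined or leaves the
    bracket."""
    tol = eps * (r - l)
    fl, fr = f(l), f(r)
    for _ in range(max_iter):
        if (r - l) / 2.0 <= tol:
            break
        x = l - fl * (r - l) / (fr - fl) if fr != fl else 0.5 * (l + r)
        if not (l < x < r):
            x = 0.5 * (l + r)  # bisection fallback
        fx = f(x)
        if fx > 0.0:
            r, fr = x, fx
        elif fx < 0.0:
            l, fl = x, fx
        else:
            return x
    return 0.5 * (l + r)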
The BRS method is almost optimal on the average. The average number m_aver of function evaluations for finding an ε-approximation to the solution is bounded by

m_aver ≤ (1/log((1 + √5)/2)) · log log(1/ε) + A,

where A is a constant [12].

For practical combustion simulations, the initial solid bulk temperature T0 usually varies within the interval TE = [280, 460] K and the gas phase pressure P within the interval PR = [0, 3000] atm. To carry out the tests, we selected 60 × 50 evenly spaced grid nodes in the set of parameters TE × PR. By choosing ε = 10⁻⁴, the average number of iterations is 10.5, where the average is defined as the total number of iterations divided by the number of tested functions. We observed that for low P and high T0 it took in the worst case 12-13 iterations to solve the problem. We derive the HBS, a modification of the BRS method, in order to lower the average and worst case numbers of iterations.

4.1.2. HBS method

To derive the HBS method, we first divide the parameter set TE × PR = [280, 460] × [0, 3000] into three subdomains D_i, i = 1, 2, 3, by the two lines P = 4 (T0 − 250) and P = 15 (T0 − 250):

D1 if P ≤ 4 (T0 − 250),
D2 if 4 (T0 − 250) < P ≤ 15 (T0 − 250),
D3 if P > 15 (T0 − 250).

For each subdomain, we run two steps of the hyper-bisection method, defined as

Hyperbis_i = l_i + θ_i (r_i − l_i),
where θ_i = (T_{s,i} − l_i)/(r_i − l_i) ∈ [0, 1]. Extensive numerical experiments indicate that in the subdomain D1 the solutions are distributed around the point Tmin + θ̄ (Tmax − Tmin), where θ̄ = 0.12. We therefore utilize θ₁ = θ̄ for the first step of hyper-bisection, and

θ₂ = δ if f(Hyperbis₁) < 0,  θ₂ = 1 − δ otherwise,

where δ = 0.2. Those choices guarantee that in most cases the solution is in the interval [Hyperbis₁, Hyperbis₂]. The same strategy applies to the subdomains D2 and D3, with θ̄ = 0.18 for D2 and θ̄ = 0.25 for D3; the parameter δ equals 0.2 for all subdomains. Ideally, the solution interval is reduced to 2-5% of its original length after two steps of hyper-bisection. Thereafter, the BRS method is used to find the solution. When choosing the same set of test functions and ε = 10⁻⁴, the average number of iterations of the HBS method is 5.7 (worst case 6), as compared to 10.5 (worst case 13) for the BRS method.

We remark that the secant method in the BRS algorithm could be replaced by Newton's method in order to get an asymptotically quadratic rate of convergence. This would, however, increase the cost of each iteration by a factor of at least two, since each step of Newton's method requires the computation of a function value and of the derivative, whereas a secant step only needs one function evaluation. As a result, the total computational cost would increase.
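The two warm-start steps of HBS in code form (whether the second cut is taken relative to the updated interval is our assumption):

def hbs_warm_start(f, l, r, theta_bar, delta=0.2):
    """Two hyper-bisection steps: first cut at l + theta_bar*(r - l),
    second at delta (or 1 - delta) of the updated interval, depending
    on the sign of f at the first cut. Assumes f(l) <= 0 <= f(r)."""
    x1 = l + theta_bar * (r - l)
    if f(x1) < 0.0:
        l, theta2 = x1, delta
    else:
        r, theta2 = x1, 1.0 - delta
    x2 = l + theta2 * (r - l)
    if f(x2) < 0.0:
        l = x2
    else:
        r = x2
    return l, r  # remaining bracket; continue with the BRS method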
4.1.3. Conclusion

A hybrid bisection-secant method was developed for solving nonlinear equations derived from a combustion model. For the specific univariate zero finding problem, two additional steps of a hyper-bisection method on top of the original algorithm reduce the average number of iterations from 10.5 to 5.7, and the worst case number of iterations from 13 to 6. This represents a significant improvement in the cost of carrying out large scale combustion simulations, since this zero finding problem has to be solved at every cell and every time step, over billions of cells and millions of time steps.

Acknowledgments

We would like to thank the referees for the comments that significantly improved our paper.

References

[1] A. Ageev, V. Vassin, E. Bessonova, V. Markusevich, Radiosounding ionosphere on two frequencies. Algorithmic analysis of Fredholm-Stiltjes integral equation, in: Theoretical Problems in Geophysics, 1997, pp. 100-118 (in Russian).
[2] R.G. Bland, D. Goldfarb, M. Todd, The ellipsoid method: a survey, Oper. Res. 6 (1981) 1039-1090.
[3] C. Boonyasiriwat, Circumscribed ellipsoid algorithm for fixed point problems, M.S. Thesis, University of Utah, Salt Lake City, UT, 2004.
[4] C. Boonyasiriwat, K. Sikorski, C.-W. Tsay, Algorithm XXX: circumscribed ellipsoid algorithm for fixed points, 2007, submitted to ACM ToMS.
[5] L.E. Brouwer, Über Abbildungen von Mannigfaltigkeiten, Math. Ann. 71 (1912) 97-115.
[6] X. Deng, X. Chen, On algorithms for discrete and approximate Brouwer fixed points, in: H.N. Gabov, R. Fagin (Eds.), Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22-24, 2005, ACM, New York, 2005.
[7] M.D. Hirsch, C. Papadimitriou, S. Vavasis, Exponential lower bounds for finding Brouwer fixed points, J. Complexity 5 (1989) 379-416.
[8] Z. Huang, L. Khachiyan, K. Sikorski, Approximating fixed points of weakly contracting mappings, J. Complexity 15 (1999) 200-213.
[9] L. Khachiyan, Polynomial algorithm in linear programming, Soviet Math. Dokl. 20 (1979) 191-194.
[10] L. Khachiyan, Private email communication to K. Sikorski, 2000.
[11] U. Kohlenbach, Effective uniform bounds from proofs in abstract functional analysis, in: B. Cooper, B. Loewe, A. Sorbi (Eds.), CiE 2005 New Computational Paradigms: Changing Conceptions of What is Computable, Springer, Berlin, 2005.
[12] E. Novak, K. Ritter, H. Woźniakowski, Average case optimality of a hybrid secant-bisection method, Math. Comput. 64 (1995) 1517-1539.
[13] G. Perestonina, I. Prutkin, L. Timerkhanova, V. Vassin, Solving three-dimensional inverse problems of gravimetry and magnetometry for three layer medium, Math. Modeling 15 (2) (2003) 69-76 (in Russian).
[14] S. Shellman, K. Sikorski, A two-dimensional bisection envelope algorithm for fixed points, J. Complexity 18 (2) (2002) 641-659.
[15] S. Shellman, K. Sikorski, Algorithm 825: a deep-cut bisection envelope algorithm for fixed points, ACM Trans. Math. Soft. 29 (3) (2003) 309-325.
[16] S. Shellman, K. Sikorski, A recursive algorithm for the infinity-norm fixed point problem, J. Complexity 19 (6) (2003) 799-834.
[17] S. Shellman, K. Sikorski, Algorithm 848: a recursive fixed point algorithm for the infinity-norm case, ACM Trans. Math. Soft. 31 (4) (2005) 580-587.
[18] K. Sikorski, Bisection is optimal, Numer. Math. 40 (1982) 111-117.
[19] K. Sikorski, Fast algorithms for the computation of fixed points, in: M. Milanese, R. Tempo, A. Vicino (Eds.), Robustness in Identification and Control, Plenum Press, New York, 1982, pp. 49-59.
[20] K. Sikorski, Optimal Solution of Nonlinear Equations, Oxford University Press, New York, 2001.
[21] K. Sikorski, C.W. Tsay, H. Woźniakowski, An ellipsoid algorithm for the computation of fixed points, J. Complexity 9 (1993) 181-200.
[22] K. Sikorski, H. Woźniakowski, Complexity of fixed points, J. Complexity 3 (1987) 388-405.
[23] C.W. Tsay, Fixed point computation and parallel algorithms for solving wave equations, Ph.D. Thesis, University of Utah, Salt Lake City, UT, 1994.
[24] V. Vassin, Ill-posed problems with a priori information: methods and applications, Institute of Mathematics and Mechanics, Russian Academy of Sciences, Ural Subdivision, 2005.
[25] V. Vassin, A. Ageev, Ill-posed Problems with A Priori Information, VSP, Utrecht, The Netherlands, 1995.
[26] V. Vassin, E. Eremin, Feyer Type Operators and Iterative Processes, Russian Academy of Sciences, Ural Subdivision, Ekaterinburg, 2005 (in Russian).
[27] V. Vassin, T. Sereznikova, Two stage method for approximation of nonsmooth solutions and reconstruction of noisy images, Automat. Telemechanica 2 (2004) 12 (in Russian).
[28] M. Ward, A new modeling paradigm for the steady deflagration of homogeneous energetic materials, M.S. Thesis, University of Illinois, Urbana-Champaign, 1997.
[29] M. Ward, S. Son, M. Brewster, Role of gas- and condensed-phase kinetics in burning rate control of energetic solids, Combust. Theory Modeling 2 (1998) 293-312.
[30] M. Ward, S. Son, M. Brewster, Steady deflagration of HMX with simple kinetics: a gas phase chain reaction model, Combust. Flame 114 (1998) 556-568.
[31] Ch. Xiong, Optimal nonlinear solvers for sub-grid scale combustion models, MS-CES Report, University of Utah, Computational Engineering and Science Program, 2005.