Mass Transportation Problems, Volume II: Applications
Svetlozar T. Rachev Ludger Rüschendorf
Springer
To my wife Zoja and To my parents Nadezda and Todor Rachevi.
To my wife Gabi.
Svetlozar (Zari) Rachev
Preface to Volume II
The second volume of Mass Transportation Problems is devoted to applications in a variety of fields of applied probability, queueing theory, mathematical economics, risk theory, tomography, and others. In Volume I we encompassed the general mathematical theory of mass transportation, concentrating our attention on:
• the general duality theory of the transportation and transshipment problems;
• explicit optimality results;
• applications to minimal probability metrics, stochastic ordering, approximation and extension problems;
• applications to functional analysis and mathematical economics (the Debreu theorem, utility theory, dynamical systems, choice theory, and convex and nonconvex analysis were discussed in this context).
In Volume II we expand the scope of applications of mass transportation problems. Some of them arise from modifications of the admissible transportation plans. In fact, for applications to mathematical economics it is of interest to consider relaxations of the marginal constraints, such as upper or lower bounds on the supply and demand distributions, or additional constraints like capacity bounds for the transportation plans. In mathematical tomography the basic problem is to reconstruct the multivariate
probability distribution based on some information about the marginal distributions in a certain finite number of directions. This information may be represented by additional constraints on the support functions or distributional moments, or it may be contained in only partial information on the marginals. Thus there is a close relationship between a class of problems in mathematical tomography and the classical theory of moment problems, which again can be viewed as a relaxation of the set of constraints in mass transportation problems. We discuss in detail applications to approximation problems for stochastic processes and to rounding problems based on moment-type characteristics. A particular example will be the approximation of queueing models. The minimal metrics allow us to compare various rounding rules and to determine optimal ones from an asymptotic point of view. An important field of applications of mass transportation problems that we shall consider in this second volume is probabilistic limit theorems. This approach was introduced in the seventies by the Russian school of probability theory, headed by V.M. Zolotarev. By inherent regularity properties of probability metrics defined via certain mass transportation problems, there are streamlined proofs for central limit theorems on Banach spaces yielding sharp quantitative estimates of Berry–Esseen type for the convergence rate. The probability metric approach will be applied to general stable and operator-stable limit theorems, martingale-type limit theorems, limit behavior of summability methods, and compound Poisson approximation. A particular application is to the classical problem in mathematical risk theory dealing with sharp approximation of the individual risk model by the collective risk model. The probability metric approach will also be applied to the quantitative asymptotics in rounding problems.
A new field of application of probability metrics arising as solutions of mass transportation problems is the analysis of deterministic and stochastic algorithms. This research area is of increasing importance in computer science and various fields of stochastic modeling. Based on regularity properties of probability metrics, a general “contraction” method for the asymptotic analysis of algorithms has been developed. The contraction method has been applied successfully to a variety of search, sorting, and other tree algorithms. Furthermore, the recursive structure in iterated function systems (image encoding), fractal measures, bootstrap statistics, and time series (ARCH) models has been analyzed by this method. It becomes clear that there are many interesting probabilistic applications of this method to be rigorously developed in the future. In the final chapter we consider applications to stochastic differential equations (SDEs) and to convergence of empirical measures. SDEs will be interpreted as continuous recursive structures. From this point of view we provide a detailed discussion on the approximative solution of nonlinear stochastic differential equations of McKean–Vlasov type by interactive particle systems with application to the Kac theory of chaos propagation. The probability metrics approach allows us to establish approximation results for various modifications of the diffusion system, some of them of “nontraditional” type. In a general context we establish approximation results for empirical measures and give applications to the approximation of stochastic processes. As final applications we discuss a weak approximation of SDEs of Itô type by a combination of the time discretization methods of Euler and Milshtein with a chance discretization based on the strong invariance (embedding) principle. This approximation is given in terms of minimal L_p-metrics and thereby based on regularity properties of the solutions of the corresponding mass transportation problem.
Preface to Volume I
The subject of this book, mass transportation problems (MTPs), concerns the optimal transfer of masses from one location to another, where the optimality depends upon the context of the problem. Mass transportation problems appear in various forms and in various areas of mathematics and have been formulated at different levels of generality. Whereas the continuous case of the transportation problem may be cast in measure-theoretic terms, the discrete case deals with optimization over generalized transportation polyhedra. Accordingly, work on these problems has developed in several separate and independent directions. The aim of this monograph is to investigate and to develop, in a systematic fashion, the Monge–Kantorovich mass transportation problem (MKP) and the Kantorovich–Rubinstein transshipment problem (KRP). We consider several modifications of these problems known as the MTP with partial knowledge of the marginals and the MTP with additional constraints (MTPA). We also discuss extensively a variety of stochastic applications. In the first volume of Mass Transportation Problems we concentrate on the general mathematical theory of mass transportation. In Volume II we expand the scope of applications of mass transportation problems. In 1781 Gaspard Monge proposed in simple prose a seemingly straightforward problem of optimization. It was destined to have wide ramifications. He began his paper on the theory of “clearings and fillings” as follows: When one must transport soil from one location to another, the custom is to give the name clearing to the volume of the soil that one
must transport and the name filling (“remblai”) to the space that it must occupy after transfer. Since the cost of transportation of one molecule is, all other things being equal, proportional to its weight and the interval that it must travel, and consequently the total cost of transportation being proportional to the sum of the products of the molecules each multiplied by the interval traversed; given the shape and position, the clearing and the filling, it is not the same for one molecule of the clearing to be moved to one or another spot of the filling. Rather, there is a certain distribution to be made of the molecules from the clearing to the filling, by which the sum of the products of molecules by intervals travelled will be the least possible, and the cost of the total transportation will be a minimum. (Monge (1781, p. 666)).
In mathematical language Monge proposed the following nonlinear variational problem. Given two sets A, B of equal volume, find an optimal volume-preserving map between them; the optimality is evaluated by a cost function c(x, y) representing the cost per unit mass for transporting material from x ∈ A to y ∈ B. The optimal map is the one that minimizes the total cost of transferring the mass from A to B. Monge considered this problem with cost function equal to the Euclidean distance in ℝ^d: c(x, y) = |x − y|. Monge’s problem turned out to be the prototype for a class of problems arising in various fields such as mathematical economics, functional analysis, probability and statistics, linear and stochastic programming, differential geometry, information theory, cybernetics, and matrix theory. The optimization function ∫_A c(x, t(x)) dx is nonlinear in the transportation function t, and moreover, the set of admissible transportations is a nonconvex set. This explains why it took a long time until even existence results for optimal solutions could be established. The first general existence result was given in 1979 by Sudakov. On the second page of his paper Monge himself had remarked that to obtain a minimum, the intervals traversed by two different molecules should not intersect. This simple observation applied to the discrete case, where there are only a finite number of molecules, leads to a “greedy” algorithm, the so-called northwest corner rule. The totality of mass transference plans in the discrete case is a polytope that arises in the transportation problem of mathematical programming, where it is treated in specialized form as an assignment problem and in generalized form as a network-flow problem. The northwest corner rule solves transportation problems having a particular structure on the costs and is, moreover, at the heart of many seemingly different problems having an “easy” solution (cf. Hoffman (1961), Barnes and Hoffman (1985), Derigs, Goecke, and Schrader (1986), Hoffman and Veinott (1990), Olkin and Rachev (1991), and Rachev and Rüschendorf (1994); see also Burkard, Klinz, and Rudolf (1994) and the references therein). The Academy of Paris offered a prize for the solution of Monge’s problem, which was claimed by the differential geometer P. Appell (1884–1928), who
established some geometric properties of optimal maps in the plane and in ℝ^3. But it took a long time until a real breakthrough in the transportation problem came, originating in the seminal 1942 paper of L.V. Kantorovich entitled “On the transfer of masses.” Kantorovich stated the problem in a new, abstract, and more easily accessible setting, without knowledge of Monge’s work. Kantorovich learned of Monge’s work only later (cf. his 1948 paper). In the Kantorovich formulation of the mass transportation problem (the so-called “continuous” MTP), the initial mass (the clearing) and the final mass (the filling) can be considered as probability measures on a metric space. The essential step in this formulation is the replacement of the class of transportation maps by the wider class of generalized transportation plans, which are identifiable with the convex set of all probability measures on the product space with fixed marginals. The difficult nonlinear Monge problem was thereby replaced by a linear optimization problem over an abstract convex set. This made it possible to put this problem in the framework of linear optimization theory and encouraged the development of general duality theory for the solution of the Kantorovich formulation of the transportation problem as the basic tool. Accordingly, these problems and their generalizations will be referred to as Monge–Kantorovich Mass Transportation Problems (MKPs). Kantorovich’s measure-theoretic formulation made the problem accessible to various areas of the mathematical sciences and other scientific fields. Kantorovich himself received a Nobel Prize in Economics for related work in mathematical economics.(1) Here is a list of some references in the mathematical sciences:
• Functional analysis: Kantorovich and Akilov (1984)
• Probability theory: Fréchet (1951), Cambanis et al. (1976), Dudley (1976, 1989), Kellerer (1984), Rachev (1991c), Rüschendorf (1991)
• Statistics: Gini (1914, 1965), Hoeffding (1940, 1955), Kemperman (1987), Huber (1981), Bickel and Freedman (1981), Rüschendorf (1991)
• Linear and stochastic programming: Hoffman (1961), Barnes and Hoffman (1985), Anderson and Nash (1987), Burkard, Klinz, and Rudolf (1994)
• Information theory and cybernetics: Wasserstein (1969), Gray et al. (1975), Gray and Ornstein (1979), Gray et al. (1980)
• Matrix theory: Lorentz (1953), Marcus (1960), Olkin and Pukelsheim (1982), Givens and Shortt (1984)
(1) L.V. Kantorovich together with T.C. Koopmans received the Nobel Memorial Prize in Economic Science in 1975 for “contributions to the theory of optimum allocation of resources”; see Dudley (1989, p. 342).
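For orientation, the two formulations described above can be written side by side. This is only a sketch in notation chosen here for illustration: the symbols S (the underlying space), P_1 and P_2 (the initial and final masses), and M(P_1, P_2) (the set of transportation plans) are assumptions and need not match the notation used in the chapters.

```latex
% Monge: optimize over volume-preserving maps t from A to B (nonlinear in t)
\[
  \inf_{t}\ \int_A c\bigl(x, t(x)\bigr)\,dx ,
  \qquad t \colon A \to B \ \text{volume preserving}.
\]
% Kantorovich: optimize over plans, i.e. probability measures on the product
% space with fixed marginals P_1 and P_2 (linear in the plan \mu)
\[
  \inf_{\mu \in M(P_1, P_2)}\ \int_{S \times S} c(x, y)\,\mu(dx, dy),
  \qquad
  M(P_1, P_2) = \bigl\{ \mu : \mu(\,\cdot \times S) = P_1,\ \mu(S \times \cdot\,) = P_2 \bigr\}.
\]
```

The second objective is linear in μ and the constraint set is convex, which is the feature of the Kantorovich formulation that the text emphasizes.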
Many practical problems arising in various scientific fields have led mathematicians to solve MKPs, e.g., in
• Statistical physics: Tanaka (1978), Dobrushin (1979)
• Reliability theory: Barlow and Proschan (1975), Kalashnikov and Rachev (1990), Beneš (1985)
• Quality control: Jirina and Nedoma (1957)
• Transportation: Dantzig and Ferguson (1956)
• Econometrics: Shapley and Shubik (1972), Pyatt and Round (1985), Gretsky, Ostroy, and Zame (1992)
• Expert systems: Perez and Jirousek (1985)
• Project planning: Haneveld (1985)
• Optimal models for facility location: Ermoljev, Gaivoronski, and Nedeva (1983)
• Allocation policy: Rachev and Taksar (1992)
• Quality usage: Rachev, Dimitrov, and Khalil (1992)
• Queueing theory: Rachev (1989), Anastassiou and Rachev (1992a, 1992b)
There are several surveys in the vast literature about MKP, among them Rachev (1984b), Rachev and Rüschendorf (1990), Burkard, Klinz, and Rudolf (1994), Cuesta-Albertos, Matrán, Rachev, and Rüschendorf (1996), and Gangbo and McCann (1996) related to dual solutions and applications of MKP; Shorack and Wellner (1985, Sect. 3.6) on optimal processes; Beneš and Štěpán (1987, 1991) on extremal mass transportation plans; Rüschendorf (1981, 1991, 1991a), Kellerer (1984), Rachev (1991c) on multivariate transportation problems; Dudley (1989) on distances in the space of measures; Talagrand (1992) and Yukich (1991) on matching problems. In recent years, characterizations of the solutions of the Monge–Kantorovich problem have been given in terms of c-subgradients of generalized convex functions defined in terms of the cost functions c(x, y) (cf. Knott and Smith (1984, 1992), Brenier (1987), Rüschendorf and Rachev (1990), Rüschendorf (1991, 1991a, 1995), Cuesta-Albertos, Matrán, Rachev, and Rüschendorf (1996), and Gangbo and McCann (1996)).
For the case of squared Euclidean costs c(x, y) = |x − y|^2, the generalized convexity property is equivalent to convexity, and c-subgradients are identical to the usual subgradients of convex analysis. From this characterization
a series of explicit solutions of the transportation problem could be established. It also implies that the solutions of the MKP are, under continuity assumptions, given by mappings. Therefore, the solutions of the “easier” MKP imply as well the existence and characterizations of solutions of the original Monge problem, and so the MKP turns out to be the fundamental formulation of the transportation problem. For this reason, we concentrate in this book on the Kantorovich-type mass transportation problems. For a discussion of interesting analytic aspects of the Monge problem, we refer to Gangbo and McCann (1996). Another type of MTP appears in probability theory, even if it leaves the framework of probability measures as transportation plans. Its solutions are bounded measures on a product of two spaces with the difference of marginals equal to the difference of two given probability measures. It will be called the Kantorovich–Rubinstein Problem (KRP), since the first results were obtained by Kantorovich and Rubinstein (1958). In its relation to the practical task of mass transportation it is sometimes referred to as the transshipment problem; see Kemperman (1983), and Rachev and Shortt (1990). The KRP has been developed to a great extent in the Russian school of probabilists and functional analysts, in particular by V.L. Levin, A.A. Milyutin, and A.M. Vershik and their students. For metric cost functions the KRP coincides with the corresponding MKP; for general cost functions it can be reduced to the MKP for a corresponding reduced cost function. For the duality theory of the KRP a specific detailed theory with many results that are of value in themselves has been developed with wide-ranging applications to mathematical economics. For a different approach to the KRP as introduced in Dudley (1976) and as further extended in Rachev and Shortt (1990) we refer to the book of Rachev (1991c).
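The verbal description of the KRP given above can be put in symbols as follows. This is a sketch only: the names S, P_1, P_2, and b are chosen here for illustration, and the nonnegativity of b is an assumption that the preface does not state explicitly; the precise formulation is the one developed in the chapters on the transshipment problem.

```latex
% Kantorovich--Rubinstein (transshipment) problem: the constraint fixes only
% the difference of the two marginals of b, not each marginal separately.
\[
  \inf \Bigl\{ \int_{S \times S} c(x, y)\, b(dx, dy) \;:\;
    b \ \text{a bounded nonnegative measure on } S \times S,\
    b(\,\cdot \times S) - b(S \times \cdot\,) = P_1 - P_2 \Bigr\}.
\]
```

Every transportation plan with marginals P_1 and P_2 satisfies this constraint, so the KRP relaxes the MKP; the text notes that the two coincide for metric cost functions.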
A problem related to both MKP and KRP is the Mass Transportation Problem with Partial Knowledge of the Marginals (MTPP), which is expressed by stating finitely many moment conditions. Problems of this type were formulated and extensively studied by Rogosinski (1958), Kemperman (1983), and Kuznezova-Sholpo and Rachev (1989). Barnes and Hoffman (1985) considered mass transportation problems with capacity constraints on the admissible transportation plans as an example of Mass Transportation Problems with Additional Constraints (MTPA) (see Rachev (1991b) and Rachev and Rüschendorf (1994)). In this book we give an extensive account of the duality theory of the MKP and the KRP, including the known results on explicit constructions and characterizations of optimal solutions. In Chapters 2 and 3 we present important duality theorems for the Monge–Kantorovich problem based on work of H. Kellerer, L. Rüschendorf, S.T. Rachev, and D. Ramachandran.
In Chapters 4 and 5 we present basically work of V.L. Levin; we analyze measure-theoretic methods for infinite-dimensional linear programs developed in context with the KRP as well as applications to general utility theorems (the Debreu theorem), extension theorems, choice theory, and set-valued dynamical systems.(2) In Chapters 6 and 8 we discuss new material on applications of the MKP and the KRP to the representation of ideal metrics and on various probabilistic approximation and limit theorems. This supplements the earlier results in this direction as described in the book of Rachev (1991) on probability metrics and stochastic models. In particular, we show that probability metrics allow us to find unified proofs for central limit theorems for martingales, for (operator) stable limit theorems, and for more specific problems like compound Poisson approximation or rounding problems. Chapter 7, the first chapter in the second volume, is concerned with modifications of the MKP by additional or relaxed constraints. We discuss various types of moment problems and applications to the tomography paradox and to the approximation of queueing systems. A wide range of applications of metrics based on the transportation problem has been established in recent years in connection with recursive stochastic equations. In Chapters 9 and 10 we discuss algorithms of informatics (sorting, searching, branching, search trees) as well as applications to the approximation of stochastic differential equations, to the propagation of the chaos property of particle systems with applications to the approximation of nonlinear PDEs, and to the rate of convergence of empirical measures, which is of interest for matching problems. From the technical point of view, MKPs can be subdivided into the discrete and continuous cases, according to the nature of their basic spaces and to the supports of the initial and the final masses.
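As a concrete illustration of the discrete case, the northwest corner rule mentioned earlier in this preface can be sketched in a few lines. This is a minimal sketch and not the book's own presentation: the function names `northwest_corner` and `total_cost` and the small example data are hypothetical, and exact arithmetic (integers) is assumed so that the balance check is reliable. The rule always yields a feasible plan; it is optimal only for costs with special structure, for instance the distance between points sorted along a line as in the example.

```python
def northwest_corner(supply, demand):
    """Build a feasible transportation plan by the northwest corner rule.

    supply[i] is the mass available at source i and demand[j] the mass
    required at sink j. Exact number types (int or fractions.Fraction)
    are assumed so that the equality tests below are reliable.
    """
    if sum(supply) != sum(demand):
        raise ValueError("total supply must equal total demand")
    s, d = list(supply), list(demand)
    plan = [[0] * len(d) for _ in s]
    i = j = 0
    while i < len(s) and j < len(d):
        # Ship as much as possible in the current (most "northwest") cell.
        moved = min(s[i], d[j])
        plan[i][j] = moved
        s[i] -= moved
        d[j] -= moved
        # Move down when the source is exhausted, otherwise move right.
        if s[i] == 0:
            i += 1
        else:
            j += 1
    return plan


def total_cost(plan, cost):
    """Sum of shipped mass times unit cost over all source/sink pairs."""
    return sum(
        plan[i][j] * cost[i][j]
        for i in range(len(plan))
        for j in range(len(plan[i]))
    )


# Hypothetical example: sources and sinks at sorted positions on a line,
# with unit cost equal to the distance |x - y|.
positions = [0, 1, 2]
cost = [[abs(x - y) for y in positions] for x in positions]
plan = northwest_corner([3, 2, 4], [2, 5, 2])
print(plan)                    # [[2, 1, 0], [0, 2, 0], [0, 2, 2]]
print(total_cost(plan, cost))  # 3
```

Each pass of the loop exhausts either a source or a sink, so the plan has at most (number of sources + number of sinks − 1) nonzero entries.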
In the discrete case, the totality of the mass transference plans is the polytope that arises in the transportation problem of mathematical programming. There is, of course, a vast literature on the transportation problem, its specialization to the assignment problem, and its generalization to network flow problems. It turns out, as will be elaborated further in the book, that the northwest corner rule in the discrete case corresponds to a closed form for the solution in the continuous case. Indeed, the discrete analogue of a result known in the continuous case provides a new result in the discrete case; and its simple proof in the discrete case provides a new proof for the continuous case, see Rachev and Rüschendorf (1994c) and the references therein. Another approach in the discrete linear case prefers to exploit the special structure of supplies and demands (or clearings and fillings) and permits a particularly simple combinatorial algorithm for finding an optimal solution as developed by Balinski (1983), Balinski and Russakoff (1984), Balinski (1985, 1986), Goldfarb (1985), Kleinschmidt, Lee, and Schannath (1987), and Burkard, Klinz, and Rudolf (1994). MTPs may be viewed as an analogue and a unifying framework of a problem considered by probabilists at the beginning of the twentieth century: How does one measure the difference between two random quantities? Many specific contributions to the analysis of this problem have been made, including Gini’s (1914) notion of concordance, Kendall’s τ, Spearman’s ρ, the analysis of greatest possible differences by Hoeffding (1940) and others, by Fréchet (1951, 1957), Robbins (1975), and Lai and Robbins (1976), and the generalizations of these results by Cambanis, Simons, and Stout (1976), Rüschendorf (1980), Tchen (1980), and Cambanis and Simons (1982). These (and others) offer piecemeal answers to basic questions that arise from different stochastic models; they give no guidance as to the question of what concept should be used where: There is no general theory underlying the diverse approaches. We refer to Kruskal (1958), Gini (1965), and Rachev (1984b, 1991c). In this book we investigate, develop, and exploit the connections between the discrete and continuous versions of the mass transportation problems (MTP) as well as study systematically the relationships between the methods and results from different versions of the MTP. The MTPs are the basis of many problems related to the question of stability of stochastic models, to the question of whether a proposed model yields a satisfactory approximation to the phenomenon under consideration, and to the problem of approximation of stochastic and deterministic algorithms. It is our belief that MTPs hold great promise in stochastic analysis as well as in mathematical analysis.
(2) These two chapters were written following closely the notes kindly provided to us by V.L. Levin.
The MTP is full of connections with geometry, (partial) differential equations, (generalized) convex analysis, moment problems, infinite-dimensional linear programming, measurable choice theory, and extension problems, and it has many open problems. It has a great potential for a series of applications in several scientific fields. This book grew out of joint work and lectures delivered by the authors at the Steklov Mathematical Institute, Universität Münster, Universität Freiburg, the École Polytechnique, SUNY at Stony Brook, and the University of California, Santa Barbara, over many years. Many colleagues provided helpful suggestions after reading parts of the manuscript. All chapters were rewritten several times, and preliminary versions were circulated among friends, who eliminated many inaccuracies and obscurities. We would like to thank H.G. Kellerer, V.L. Levin, M. Balinski, D. Ramachandran, G.A. Anastassiou, M. Maejima, M. Cramer, I. Olkin, M. Gelbrich, W. Römisch, V. Beneš, L. Uckelmann, and many other friends and colleagues who encouraged us to complete the work. We are indebted to Mrs. M. Hattenbach and Ms. A. Blessing for their superb typing; the appearance of this monograph owes much to them. We are grateful to the publisher
and especially to J. Kimmel for support and patience. We are particularly thankful to J. Gani for his invaluable suggestions concerning improvements of this work, his help with the organization of the material, and his encouragement to continue the project. Finally, we thank the Alexander von Humboldt Foundation for its generous financial support of S.T. Rachev in 1995 and 1996, which made this joint work possible. (3)
(3) The work of S.T. Rachev was also partially supported by NSF Grants. The joint work of the authors was supported by NATO-Grant CRG900798.
Contents to Volume II

Preface to Volume II
Preface to Volume I

7 Relaxed or Additional Constraints
  7.1 Mass Transportation Problem with Relaxed Marginal Constraints
  7.2 Fixed Sum of the Marginals
  7.3 Mass Transportation Problems with Capacity Constraints
  7.4 Local Bounds for the Transportation Plans
  7.5 Closeness of Measure on a Finite Number of Directions
  7.6 Moment Problems of Stochastic Processes and Rounding Problems
    7.6.1 Moment Problems and Kantorovich Radius
    7.6.2 Moment Problems Related to Rounding Proportions
    7.6.3 Closeness of Random Processes with Fixed Moment Characteristics
    7.6.4 Approximation of Queueing Systems with Prescribed Moments
    7.6.5 Rounding Random Numbers with Fixed Moments

8 Probabilistic-Type Limit Theorems
  8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
  8.2 Application to Stable Limit Theorems
  8.3 Summability Methods, Compound Poisson Approximation
  8.4 Operator-Stable Limit Theorems
  8.5 Proofs of the Rate of Convergence Results
  8.6 Ideal Metrics in the Problem of Rounding

9 Mass Transportation Problems and Recursive Stochastic Equations
  9.1 Recursive Algorithms and Contraction of Transformations
  9.2 Convergence of Recursive Algorithms
    9.2.1 Learning Algorithm
    9.2.2 Branching-Type Recursion
    9.2.3 Limiting Distribution of the Collision Resolution Interval
    9.2.4 Quicksort
    9.2.5 Limiting Behavior of Random Maxima
    9.2.6 Random Recursion Arising in Probabilistic Modeling: Limit Laws
    9.2.7 Random Recursion Arising in Probabilistic Modeling: Rate of Convergence
  9.3 Extensions of the Contraction Method
    9.3.1 The Number of Inversions of a Random Permutation
    9.3.2 The Number of Records
    9.3.3 Unsuccessful Searching in Binary Search Trees
    9.3.4 Successful Searching in Binary Search Trees
    9.3.5 A Random Search Algorithm
    9.3.6 Bucket Algorithm

10 Stochastic Differential Equations and Empirical Measures
  10.1 Propagation of Chaos and Contraction of Stochastic Mappings
    10.1.1 Introduction
    10.1.2 Equations with p-Norm Interacting Drifts
    10.1.3 A Random Number of Particles
    10.1.4 pth Mean Interactions in Time: A Non-Markovian Case
    10.1.5 Minimal Mean Interactions in Time
    10.1.6 Interactions with a Normalized Variation of the Neighbors: Relaxed Lipschitz Conditions
  10.2 Rates of Convergence in the Kantorovich Metric
  10.3 Stochastic Differential Equations

References
Abbreviations
Symbols
Index
Contents to Volume I

Preface to Volume I
Preface to Volume II

1 Introduction
  1.1 Mass Transportation Problems in Probability Theory
  1.2 Specially Structured Transportation Problems
  1.3 Two Examples of the Interplay Between Continuous and Discrete MTPs
  1.4 Stochastic Applications

2 The Monge–Kantorovich Problem
  2.1 The Multivariate Monge–Kantorovich Problem: An Introduction
  2.2 Primal and Dual Monge–Kantorovich Functionals
  2.3 Duality Theorems in a Topological Setting
  2.4 General Duality Theorem
  2.5 Duality Theorems with Metric Cost Functions
  2.6 Dual Representation for L_p-Minimal Metrics

3 Explicit Results for the Monge–Kantorovich Problem
  3.1 The One-Dimensional Case
  3.2 The Convex Case
  3.3 The General Case
  3.4 An Extension of the Kantorovich L_2-Minimal Problem
  3.5 Maximum Probability of Sets, Maximum of Sums, and Stochastic Order
  3.6 Hoeffding–Fréchet Bounds
  3.7 Bounds for the Total Transportation Cost

4 Duality Theory for Mass Transfer Problems
  4.1 Duality in the Compact Case
  4.2 Cost Functions with Triangle Inequality
  4.3 Reduction Theorems
  4.4 Proofs of the Main Duality Theorems and a Discussion
  4.5 Duality Theorems for Noncompact Spaces
  4.6 Infinite Linear Programs
    4.6.1 Duality Theory for an Abstract Scheme of Infinite-Dimensional Linear Programs and Its Application to the Mass Transfer Problem
    4.6.2 Duality Theorems for the Mass Transfer Problem with Given Marginals
    4.6.3 Duality Theorem for a Marginal Problem with Additional Constraints of Moment-Type
    4.6.4 Duality Theorem for a Further Extremal Marginal Problem
    4.6.5 Duality Theorem for a Nontopological Version of the Mass Transfer Problem

5 Applications of the Duality Theory
  5.1 Mass Transfer Problem with a Smooth Cost Function: Explicit Solution
  5.2 Extension and Approximate Extension Theorems
    5.2.1 The Simplest Extension Theorem: the Case X = E(S) and X_1 = E(S_1)
    5.2.2 Approximate Extension Theorems
    5.2.3 Extension Theorems
    5.2.4 A Continuous Selection Theorem
  5.3 Approximation Theorems
  5.4 An Application of the Duality Theory to the Strassen Theorem
  5.5 Closed Preorders and Continuous Utility Functions
    5.5.1 Statement of the Problem and the Idea of the Duality Approach
    5.5.2 Functionally Closed Preorders
    5.5.3 Two Generalizations of the Debreu Theorem
    5.5.4 The Case of a Locally Compact Space
    5.5.5 Varying Preorders and a Universal Utility Theorem
    5.5.6 Functionally Closed Preorders and Strong Stochastic Dominance
  5.6 Further Applications to Utility Theory
    5.6.1 Preferences That Admit Lipschitz or Continuous Utility Functions
    5.6.2 Application to Choice Theory in Mathematical Economics
  5.7 Applications to Set-Valued Dynamical Systems
    5.7.1 Compact-Valued Dynamical Systems: Quasiperiodic Points
    5.7.2 Compact-Valued Dynamical Systems: Asymptotic Behavior of Trajectories
    5.7.3 A Dynamic Optimization Problem
  5.8 Compensatory Transfers and Action Profiles

6 Mass Transshipment Problems and Ideal Metrics
  6.1 Kantorovich–Rubinstein Problems with Constraints
  6.2 Constraints on the κth Difference of Marginals
  6.3 The General Case
  6.4 Minimality of Ideal Metrics

References
Abbreviations
Symbols
Index
7 Modifications of the Monge–Kantorovich Problems: Transportation Problems with Relaxed or Additional Constraints
In this chapter we study modifications of the usual transportation problem by allowing additional constraints on the admissible supply (resp. demand) distributions. In particular, we consider the case that the marginal distribution function of the supply is bounded below by a d.f. $F_1$, while the marginal d.f. of the demand is bounded above by a d.f. $F_2$. We also examine transportation plans with constraints of a local type concerning the densities of the marginals, and finally, we study transportation problems with additional moment-type constraints. For the solution of these problems we make use of some methods arising in the theory of marginal and moment problems, duality theory, and stochastic ordering results. The next part is concerned with a solution of the tomography paradox. With respect to some weak metrics, two distributions become close if they coincide on an increasing number of directions. In the final sections we review results on the closeness of distributions under given moment-type characteristics and discuss applications to the rounding problem. Most of the results in these sections are contained in Rachev and Rüschendorf (1993, 1994c), Levin and Rachev (1989), Klebanov and Rachev (1995), and Anastassiou and Rachev (1992). A survey on related discrete transportation problems is given in Burkard, Klinz, and Rudolf (1994).
7.1 Mass Transportation Problem with Relaxed Marginal Constraints

For distribution functions $F_1, F_2$ let $\mathcal{F}(F_1, F_2)$ denote the set of all d.f.s $F$ on $\mathbb{R}^2$ with marginals $F_1, F_2$ (i.e., $F(x, \infty) = F_1(x)$, $F(\infty, y) = F_2(y)$). Then the transportation problem with nonnegative cost function $c(x, y)$, $x, y \in \mathbb{R}$, has the form
\[ \text{minimize } \int_{\mathbb{R}^2} c(x, y)\, dF(x, y) \text{ over all } F \in \mathcal{F}(F_1, F_2). \tag{7.1.1} \]
Usually, in the linear programming setting, $F_1$ is viewed as the supply distribution and $F_2$ as the demand distribution. Clearly, (7.1.1) is an infinite-dimensional analogue of the discrete transportation problem: Given $a_i \ge 0$, $b_j \ge 0$, $\sum_{i=1}^{m} a_i = \sum_{j=1}^{n} b_j$,
\[ \text{minimize } \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij}, \text{ subject to the constraints} \tag{7.1.2} \]
\[ \sum_{j=1}^{n} x_{ij} = a_i,\ 1 \le i \le m, \qquad \sum_{i=1}^{m} x_{ij} = b_j,\ j = 1, \dots, n, \qquad x_{ij} \ge 0,\ \forall i, j. \]
Suppose $c(x, y)$ (resp. $(c_{ij})$) satisfies the “Monge” conditions, i.e., $c$ is right continuous, and
\[ c(x', y') - c(x, y') - c(x', y) + c(x, y) \le 0 \quad \text{for all } x' \ge x,\ y' \ge y; \tag{7.1.3} \]
in the discrete case these conditions are of the form
\[ c_{ij} + c_{i+1,j+1} - c_{i,j+1} - c_{i+1,j} \le 0, \quad \forall\, 1 \le i < m,\ 1 \le j < n. \tag{7.1.4} \]
Then the solution of the discrete problem (7.1.2) under (7.1.4) is well known and based on the “northwest corner rule,” which leads to a greedy algorithm; see Hoffman (1961). For (7.1.1) the solution is given by the d.f. $F^*$,
\[ F^*(x, y) = \min\{F_1(x), F_2(y)\}. \tag{7.1.5} \]
$F^*$ is the upper Fréchet bound, see (3.6.2). Recall that the Fréchet bounds provide the following characterization of $\mathcal{F}(F_1, F_2)$:
\[ F \in \mathcal{F}(F_1, F_2) \text{ if and only if } F_*(x, y) := (F_1(x) + F_2(y) - 1)_+ \le F(x, y) \le F^*(x, y) \tag{7.1.6} \]
(here $(\cdot)_+ = \max(0, \cdot)$); the lower Fréchet bound yields the solution of the maximization problem corresponding to (7.1.1).
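The northwest corner rule mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch, not the formulation of Hoffman (1961): the function names `northwest_corner`, `is_monge`, and `total_cost` are hypothetical, and integer supplies and demands are assumed so that the exact comparisons with zero are safe.

```python
def northwest_corner(supply, demand):
    """Greedy northwest corner rule for the balanced problem (7.1.2).

    Assumes nonnegative integer supplies/demands with equal totals.
    """
    if sum(supply) != sum(demand):
        raise ValueError("total supply must equal total demand")
    a, b = list(supply), list(demand)      # remaining supply / demand
    m, n = len(a), len(b)
    x = [[0] * n for _ in range(m)]
    i = j = 0
    while i < m and j < n:
        shipped = min(a[i], b[j])          # ship as much as possible
        x[i][j] = shipped
        a[i] -= shipped
        b[j] -= shipped
        if a[i] == 0:                      # row exhausted: move down
            i += 1
        else:                              # column satisfied: move right
            j += 1
    return x


def is_monge(c):
    """Check the discrete Monge condition (7.1.4) for a cost matrix c."""
    return all(
        c[i][j] + c[i + 1][j + 1] - c[i][j + 1] - c[i + 1][j] <= 0
        for i in range(len(c) - 1)
        for j in range(len(c[0]) - 1)
    )


def total_cost(c, x):
    """Total transportation cost sum_ij c_ij x_ij."""
    return sum(c[i][j] * x[i][j] for i in range(len(x)) for j in range(len(x[0])))


cost = [[abs(i - j) for j in range(2)] for i in range(2)]   # c_ij = |i - j|
plan = northwest_corner([3, 2], [1, 4])
```

For the small instance above, `plan` ships 1 and 2 units from the first source and 2 units from the second source, and `is_monge(cost)` confirms that the assumed cost matrix satisfies (7.1.4).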
In terms of random variables an equivalent formulation of the transportation problem is the following:
\[ \text{minimize } Ec(X, Y), \text{ subject to } F_X = F_1,\ F_Y = F_2, \tag{7.1.7} \]
where $X, Y$ are random variables on a rich enough (e.g., atomless) probability space $(\Omega, \mathcal{A}, P)$. The solutions (7.1.5), resp. (7.1.6), then can be represented as distributions of r.v.s $X^*, Y^*$:
\[ X^* = F_1^{-1}(U), \quad Y^* = F_2^{-1}(U) \quad \text{(for (7.1.1), (7.1.5))}, \tag{7.1.8} \]
resp.
\[ X^* = F_1^{-1}(U), \quad Y^* = F_2^{-1}(1 - U) \quad \text{(for } F_*\text{)}, \tag{7.1.9} \]
where $U$ is uniformly distributed on $(0, 1)$, and $F_1^{-1}(u) = \inf\{y;\ F_1(y) \ge u\}$ is the generalized inverse of $F_1$.

We next consider the mass transportation problem (7.1.1), but with relaxed marginal constraints. For d.f.s $F_1, F_2$ define the set
\[ \mathcal{H}(F_1, F_2) = \{F;\ F \text{ is a d.f. on } \mathbb{R}^2 \text{ with marginal d.f.s } \tilde F_1 \le F_1,\ \tilde F_2 \ge F_2\} \tag{7.1.10} \]
of all d.f.s $F$ with $\tilde F_1(x) = F(x, \infty) \le F_1(x)$, $\forall x \in \mathbb{R}$, and $\tilde F_2(y) = F(\infty, y) \ge F_2(y)$, $\forall y \in \mathbb{R}$. We consider the transportation problem:
\[ \text{minimize } \int_{\mathbb{R}^2} c(x, y)\, dF(x, y), \text{ subject to } F \in \mathcal{H}(F_1, F_2), \tag{7.1.11} \]
or, equivalently,
\[ \text{minimize } Ec(X, Y), \text{ subject to } F_X \le F_1,\ F_Y \ge F_2. \tag{7.1.12} \]
In the discrete case the problem is to
\[ \text{minimize } \sum c_{ij} x_{ij}, \tag{7.1.13} \]
where for some “supplies” $s_1, \dots, s_n$, $a_1 \le s_1$, $a_1 + a_2 \le s_1 + s_2, \dots$, and for some demands $d_1, \dots, d_n$, $b_1 \ge d_1$, $b_1 + b_2 \ge d_1 + d_2, \dots$ ($a_i, b_i$ as in (7.1.2)). This describes a production process and a consumption process subject to some priorities (e.g., queueing priorities) with capacities $s_1, \dots, s_n$ having the following property: Every remaining free capacity at stage $i$ of the production (resp. consumption) process can be transferred to some of the next stages $i + 1, \dots, n$.
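As a numerical aside on the couplings (7.1.8) and (7.1.9): for two samples of equal size with equal weights, $F_1^{-1}(U)$ and $F_2^{-1}(U)$ pair the sorted values in the same order, while $F_2^{-1}(1 - U)$ reverses the second sample. The sketch below is an illustration under these assumptions only (the function names and the sample data are hypothetical); it compares both couplings with a brute-force search over all pairings for the Monge cost $c(x, y) = |x - y|$.

```python
from itertools import permutations


def comonotone_cost(xs, ys, c):
    """E c(X*, Y*) for the coupling (7.1.8) of two equal-size, equal-weight samples."""
    pairs = zip(sorted(xs), sorted(ys))
    return sum(c(x, y) for x, y in pairs) / len(xs)


def antimonotone_cost(xs, ys, c):
    """E c(X*, Y*) for the coupling (7.1.9): second sample in reversed order."""
    pairs = zip(sorted(xs), sorted(ys, reverse=True))
    return sum(c(x, y) for x, y in pairs) / len(xs)


def all_pairing_costs(xs, ys, c):
    """Costs of all one-to-one pairings (brute force, small samples only)."""
    return [
        sum(c(x, y) for x, y in zip(xs, perm)) / len(xs)
        for perm in permutations(ys)
    ]


def abs_cost(x, y):
    return abs(x - y)


xs, ys = [1, 2, 3], [0, 5, 1]
low = comonotone_cost(xs, ys, abs_cost)
high = antimonotone_cost(xs, ys, abs_cost)
costs = all_pairing_costs(xs, ys, abs_cost)
```

On this toy data the comonotone pairing attains the smallest and the antimonotone pairing the largest cost among all pairings, in line with the roles of the upper and lower Fréchet bounds.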
Theorem 7.1.1 Let the cost function $c(x, y)$ be symmetric in $x, y$, let $c(x, y)$ satisfy the Monge condition (7.1.3), and let $c(x, x) = 0$ for all $x \in \mathbb{R}$. Set
\[ H^*(x, y) = \min\{F_1(x), \max\{F_1(y), F_2(y)\}\}, \quad x, y \in \mathbb{R}. \tag{7.1.14} \]
Then
(a) $H^* \in \mathcal{H}(F_1, F_2)$,
(b) $H^*$ solves the relaxed transportation problem (7.1.11),
(c)
\[ \int_{\mathbb{R}^2} c(x, y)\, dH^*(x, y) = \int_0^1 c\big(F_1^{-1}(u), \min\{F_1^{-1}(u), F_2^{-1}(u)\}\big)\, du. \tag{7.1.15} \]
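The optimal value (7.1.15) is easy to evaluate for empirical distributions. The following sketch assumes two equal-size samples with equal weights, so that both quantile functions are step functions on the same grid and the integral becomes an average; the function names and the data are illustrative assumptions.

```python
def relaxed_optimal_cost(xs, ys, c):
    """Right-hand side of (7.1.15) for equal-size, equal-weight samples.

    xs plays the role of F_1 and ys the role of F_2; on each interval
    ((i-1)/n, i/n] the quantile functions equal the i-th order statistics.
    """
    pairs = zip(sorted(xs), sorted(ys))
    return sum(c(x, min(x, y)) for x, y in pairs) / len(xs)


def unrelaxed_optimal_cost(xs, ys, c):
    """Value of the classical problem (7.1.1) via the coupling (7.1.8)."""
    pairs = zip(sorted(xs), sorted(ys))
    return sum(c(x, y) for x, y in pairs) / len(xs)


def abs_cost(x, y):
    return abs(x - y)   # symmetric, Monge, and zero on the diagonal


relaxed = relaxed_optimal_cost([1, 2, 3], [0, 5, 1], abs_cost)
classical = unrelaxed_optimal_cost([1, 2, 3], [0, 5, 1], abs_cost)
```

As expected, enlarging the admissible set can only lower the optimal cost: here the relaxed value is 2/3 while the classical value is 4/3.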
Remark 7.1.2 Setting $G_1(y) = \max\{F_1(y), F_2(y)\}$, we see from Theorem 7.1.1 that the relaxed transportation problem (7.1.11) is equivalent to the transportation problem (7.1.1) with marginals $F_1, G_1$. In terms of random variables the solution can be expressed by the joint distribution of
\[ X^* = F_1^{-1}(U) \quad \text{and} \quad Y^* = G_1^{-1}(U) = \min\{F_1^{-1}(U), F_2^{-1}(U)\} \tag{7.1.16} \]
(cf. (7.1.8)).

Proof: The Monge condition implies that we can view the function $-c(x, y)$ as a “distribution function” corresponding to a nonnegative measure $\mu_c$ on $\mathbb{R}^2$. Let $X, Y$ be any real r.v.s, and for $x, y \in \mathbb{R}$ set $x \vee y = \max\{x, y\}$, $x \wedge y = \min\{x, y\}$. Theorem 7.1.1 is a consequence of the following two claims.

Claim 7.1.3 (Cambanis, Simons, and Stout (1976); see also Dall'Aglio (1956) for the special case $c(x, y) = |x - y|^p$)
\[ 2Ec(X, Y) = \int_{\mathbb{R}^2} \big(P(X < x \wedge y,\ Y \ge x \vee y) + P(X \ge x \vee y,\ Y < x \wedge y)\big)\, \mu_c(dx, dy). \tag{7.1.17} \]
For the proof of Claim 7.1.3 define the function $f(x, y, w) : \mathbb{R}^2 \times \Omega \to \mathbb{R}$ by
\[ f(x, y, w) = \begin{cases} 1 & \text{if } X(w) < x, y \le Y(w) \text{ or } Y(w) < x, y \le X(w), \\ 0 & \text{otherwise.} \end{cases} \]
Using Fubini's theorem,
\[ E_w \int_{\mathbb{R}^2} f(x, y, w)\, \mu_c(dx, dy) = \int_{\mathbb{R}^2} \big(E_w f(x, y, w)\big)\, \mu_c(dx, dy). \tag{7.1.18} \]
Next, the symmetry of $c(x, y)$ and $c(x, x) = 0$ yield
\[ \int_{\mathbb{R}^2} f(x, y, w)\, d\mu_c = -\big[c(Y(w), Y(w)) + c(X(w), X(w)) - c(X(w), Y(w)) - c(Y(w), X(w))\big] = 2c(X(w), Y(w)). \tag{7.1.19} \]
Clearly,
\[ E_w f(x, y, w) = P(X < x \wedge y,\ Y \ge x \vee y) + P(X \ge x \vee y,\ Y < x \wedge y). \tag{7.1.20} \]
Combining (7.1.18), (7.1.19), and (7.1.20) we obtain (7.1.17).

Claim 7.1.4 Define $X^* = F_1^{-1}(U)$, $Y^* = \min\{F_1^{-1}(U), F_2^{-1}(U)\}$. Then
\[ Ec(X^*, Y^*) = \min\big(Ec(X, Y);\ F_X \le F_1,\ F_Y \ge F_2\big), \tag{7.1.21} \]
and the value of the expectation in (7.1.21) is given by
\[ Ec(X^*, Y^*) = \frac{1}{2} \int_{\mathbb{R}^2} \max\big(0, F_2((x \wedge y)-) - F_1((x \vee y)-)\big)\, \mu_c(dx, dy) = \int_0^1 c\big(F_1^{-1}(t), \min\{F_1^{-1}(t), F_2^{-1}(t)\}\big)\, dt. \tag{7.1.22} \]
For the proof of Claim 7.1.4 let $X, Y$ be any r.v.s with d.f.s $F_X \le F_1$, $F_Y \ge F_2$. Using Claim 7.1.3 we obtain
\begin{align*}
2Ec(X, Y) &\ge \int_{\mathbb{R}^2} P(X \ge x \vee y,\ Y < x \wedge y)\, \mu_c(dx, dy) \tag{7.1.23} \\
&= \int_{\mathbb{R}^2} \big(P(Y < x \wedge y) - P(X < x \vee y,\ Y < x \wedge y)\big)\, \mu_c(dx, dy) \\
&\ge \int_{\mathbb{R}^2} \big(P(Y < x \wedge y) - \min\{P(X < x \vee y), P(Y < x \wedge y)\}\big)\, \mu_c(dx, dy) \\
&= \int_{\mathbb{R}^2} \big(P(Y < x \wedge y) - P(X < x \vee y)\big)_+\, \mu_c(dx, dy) \\
&\ge \int_{\mathbb{R}^2} \big(F_2((x \wedge y)-) - F_1((x \vee y)-)\big)_+\, \mu_c(dx, dy).
\end{align*}
Next, we check that the lower bound in (7.1.23) is attained for $X^* = F_1^{-1}(U)$, $Y^* = \min\{F_1^{-1}(U), F_2^{-1}(U)\}$. In fact, by Claim 7.1.3 using $X^* \ge Y^*$ and $\{U < F_2(z)\} = \{F_2^{-1}(U) < z\}$ a.s. we get
\begin{align*}
2Ec(X^*, Y^*) &= \int_{\mathbb{R}^2} \big(P(X^* \ge x \vee y,\ Y^* < x \wedge y) + P(X^* < x \wedge y,\ Y^* \ge x \vee y)\big)\, \mu_c(dx, dy) \tag{7.1.24} \\
&= \int_{\mathbb{R}^2} P(X^* \ge x \vee y,\ Y^* < x \wedge y)\, \mu_c(dx, dy) \\
&= \int_{\mathbb{R}^2} P\big(F_1^{-1}(U) \ge x \vee y,\ \min\{F_1^{-1}(U), F_2^{-1}(U)\} < x \wedge y\big)\, \mu_c(dx, dy) \\
&= \int_{\mathbb{R}^2} P\big(F_1^{-1}(U) \ge x \vee y,\ F_2^{-1}(U) < x \wedge y\big)\, \mu_c(dx, dy) \\
&= \int_{\mathbb{R}^2} P\big(U \ge F_1(x \vee y),\ U < F_2(x \wedge y)\big)_+\, \mu_c(dx, dy) \\
&= \int_{\mathbb{R}^2} \big(F_2((x \wedge y)-) - F_1((x \vee y)-)\big)_+\, \mu_c(dx, dy).
\end{align*}
Obviously, $F_{(X^*, Y^*)} = H^* \in \mathcal{H}(F_1, F_2)$, and the proof of Theorem 7.1.1 is complete. □
Remark 7.1.5 The optimal coupling (7.1.16) leads to the following “greedy” algorithm for solving the finite discrete transportation problem with relaxed side conditions:
\[ \text{minimize } \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \tag{7.1.25} \]
subject to
\[ x_{ij} \ge 0, \qquad \sum_{s=1}^{j} \sum_{r=1}^{n} x_{rs} \ge \sum_{s=1}^{j} b_s =: G_j,\ 1 \le j \le n, \qquad \sum_{r=1}^{i} \sum_{s=1}^{n} x_{rs} \le \sum_{r=1}^{i} a_r =: F_i,\ 1 \le i \le n, \]
where the sum of the “demands” $\sum_{s=1}^{n} b_s$ equals the sum of the “supplies” $\sum_{r=1}^{n} a_r$, assuming that $(c_{ij})$ are symmetric, $c_{ii} = 0$, and $c$ satisfies the Monge condition (7.1.4). Set
\[ H_i = \max(F_i, G_i),\ 1 \le i \le n, \qquad \delta_1 = H_1, \quad \delta_{i+1} = H_{i+1} - H_i,\ 1 \le i \le n - 1. \tag{7.1.26} \]
Then (7.1.25) is equivalent to the standard transportation problem (7.1.2) with side conditions $(a_i)$, $(\delta_i)$.

In the following example we compare the solution of problem (7.1.25) with inequality constraints with the “greedy” solution of the standard transportation problem with equality constraints (7.1.2). For the problem with inequality constraints we first calculate the new artificial demands $\delta_j$ as in (7.1.26) and then apply the northwest corner rule.

Example 7.1.6 Each cell shows $y_{ij}$ / $x_{ij}$; blank cells are zero.

i \ j                      |   1   |   2   |   3   |   4   |   5   |   6   | supply a_i | F_i = a_1 + ... + a_i
1                          | 20/10 |  0/10 |       |       |       |       |     20     |  20
2                          |       | 20/20 |       |       |       |       |     20     |  40
3                          |       |       | 20/10 |  0/10 |       |       |     20     |  60
4                          |       |       |       | 20/20 |       |       |     20     |  80
5                          |       |       |       | 10/10 |       |       |     10     |  90
6                          |       |       |       |       |       | 10/10 |     10     | 100
demand b_j                 |  10   |  30   |  10   |  40   |   0   |  10   |
G_j = b_1 + ... + b_j      |  10   |  40   |  50   |  90   |  90   |  100  |
H_j = F_j ∨ G_j            |  20   |  40   |  60   |  90   |  90   |  100  |
δ_1 = H_1, δ_j+1 = H_j+1 − H_j ("artificial" demands) |  20   |  20   |  20   |  30   |   0   |  10   |

$x_{ij}$ = solution of the standard transportation problem (7.1.2), using the classical northwest corner rule
$y_{ij}$ = solution of the transportation problem with relaxed side conditions
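The numbers of Example 7.1.6 can be reproduced with a short script. The sketch below recomputes $F_i$, $G_j$, $H_j$, and the artificial demands $\delta_j$ of (7.1.26), and then applies a northwest corner rule to both problems. The helper names and the illustrative cost $c_{ij} = |i - j|$ (symmetric, zero on the diagonal, Monge) are assumptions made here; the example itself does not specify a cost matrix.

```python
from itertools import accumulate


def northwest_corner(supply, demand):
    """Northwest corner rule for integer data with equal totals."""
    a, b = list(supply), list(demand)
    m, n = len(a), len(b)
    x = [[0] * n for _ in range(m)]
    i = j = 0
    while i < m and j < n:
        shipped = min(a[i], b[j])
        x[i][j] = shipped
        a[i] -= shipped
        b[j] -= shipped
        if a[i] == 0:      # row exhausted: move down
            i += 1
        else:              # column satisfied: move right
            j += 1
    return x


def nonzero_cells(x):
    """Dictionary {(i, j): value} of the nonzero entries (0-based indices)."""
    return {(i, j): v for i, row in enumerate(x) for j, v in enumerate(row) if v}


def total_cost(x):
    """Cost under the illustrative choice c_ij = |i - j|."""
    return sum(abs(i - j) * v for (i, j), v in nonzero_cells(x).items())


supply = [20, 20, 20, 20, 10, 10]
demand = [10, 30, 10, 40, 0, 10]

F = list(accumulate(supply))                     # cumulative supplies F_i
G = list(accumulate(demand))                     # cumulative demands G_j
H = [max(f, g) for f, g in zip(F, G)]            # H_j = max(F_j, G_j)
delta = [H[0]] + [H[k] - H[k - 1] for k in range(1, len(H))]   # (7.1.26)

x = northwest_corner(supply, demand)             # equality constraints (7.1.2)
y = northwest_corner(supply, delta)              # relaxed constraints (7.1.25)
```

With the assumed cost $|i - j|$ the relaxed plan `y` costs 10 units while the standard plan `x` costs 30, which illustrates how much the inequality constraints can save.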
We next extend the solution to the nonsymmetric case. We relax the symmetry condition, assuming that for any $x, y$ the functions $c(x, \cdot)$, $c(\cdot, y)$ are unimodal:
\[ \begin{aligned} c(x, y_1) &\le c(x, y_2) \quad \text{if } x \le y_1 \le y_2 \text{ or } y_2 \le y_1 \le x, \\ c(x_1, y) &\le c(x_2, y) \quad \text{if } x_2 \le x_1 \le y \text{ or } y \le x_1 \le x_2. \end{aligned} \tag{7.1.27} \]
Theorem 7.1.7 If $c(x, x) = 0$ for all $x \in \mathbb{R}$ and $c$ satisfies the Monge condition and the unimodality condition (7.1.27), then the relaxed transportation problem
\[ \text{minimize } Ec(X, Y) \text{ subject to } F_X \ge F_1,\ F_Y \le F_2 \tag{7.1.28} \]
has a solution, given by the coupling
\[ X^* = F_1^{-1}(U), \quad Y^* = \max\{F_1^{-1}(U), F_2^{-1}(U)\} \tag{7.1.29} \]
with joint distribution $F_{X^*, Y^*}(x, y) = \min\{F_1(x), \min(F_1(y), F_2(y))\}$, and the optimal value is given by
\[ Ec(X^*, Y^*) = \int_0^1 c\big(F_1^{-1}(u), \max\{F_1^{-1}(u), F_2^{-1}(u)\}\big)\, du. \]
Proof: Let $X, Y$ be r.v.s with $F_X \ge F_1$, $F_Y \le F_2$. Then by (7.1.8),
\[ Ec(X, Y) \ge Ec\big(F_X^{-1}(U), F_Y^{-1}(U)\big). \tag{7.1.30} \]
Let $G(y) = \min(F_X(y), F_Y(y))$. Then $F_X^{-1} \le F_1^{-1}$, $F_Y^{-1} \ge F_2^{-1}$, and $G^{-1} = \max\{F_X^{-1}, F_Y^{-1}\}$. We now need the following

Claim 7.1.8
\[ \int_0^1 c\big(F_X^{-1}(u), F_Y^{-1}(u)\big)\, du \ge \int_0^1 c\big(F_X^{-1}(u), G^{-1}(u)\big)\, du. \tag{7.1.31} \]
To show Claim 7.1.8 set (for a fixed $u \in (0, 1)$) $x = F_X^{-1}(u)$, $y_1 = F_X^{-1}(u) \vee F_Y^{-1}(u) = G^{-1}(u)$, and $y_2 = F_Y^{-1}(u)$.
Case 1: $x < y_2$. In this case, $x \le y_1 \le y_2$, and therefore, the unimodality condition (7.1.27) implies $c(x, y_2) \ge c(x, y_1)$.
Case 2: $y_2 \le x$. In this case, $y_1 = x$, and therefore, $y_2 \le y_1 = x$. Again by the unimodality condition, $c(x, y_2) \ge c(x, y_1)$. So Claim 7.1.8 holds.
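Claim 7.1.9 The following bound holds for every coupling $(X, Y)$:
\[ \int_0^1 c\big(F_X^{-1}(u), F_Y^{-1}(u) \vee F_X^{-1}(u)\big)\, du \ge \int_0^1 c\big(F_1^{-1}(u), F_2^{-1}(u) \vee F_1^{-1}(u)\big)\, du. \tag{7.1.32} \]
For the proof define $\tilde x_1 = F_X^{-1}(u)$, $\tilde x_2 = F_Y^{-1}(u)$, $x_1 = F_1^{-1}(u)$, $x_2 = F_2^{-1}(u)$ for a fixed $u$. Then $\tilde x_1 \le x_1$, $x_2 \le \tilde x_2$.
\[ \begin{aligned} &\text{If } \tilde x_1 < \tilde x_2, \text{ then } \tilde x_1 \le \tilde x_1 \vee x_2 \le \tilde x_2; \\ &\text{if } \tilde x_1 \ge \tilde x_2, \text{ then } \tilde x_1 = \tilde x_1 \vee x_2 \ge \tilde x_2. \end{aligned} \tag{7.1.33} \]
From (7.1.33) we obtain

Claim 7.1.10
\[ c(\tilde x_1, \tilde x_1 \vee x_2) \ge c(x_1, x_1 \vee x_2). \tag{7.1.34} \]
For the proof of Claim 7.1.10 we use the relation $x_1 \ge \tilde x_1$.
Case 1: $x_2 > x_1 > \tilde x_1$. Then $c(\tilde x_1, x_2) = c(\tilde x_1, \tilde x_1 \vee x_2) \ge c(x_1, x_2) = c(x_1, x_1 \vee x_2)$ by the unimodality condition.
Case 2: (a) $x_1 \ge x_2 \ge \tilde x_1$. Then, trivially, $c(\tilde x_1, x_2) = c(\tilde x_1, x_2 \vee \tilde x_1) \ge c(x_1, x_1 \vee x_2) = c(x_1, x_1) = 0$.
(b) $x_1 \ge \tilde x_1 \ge x_2$. Then again, $c(\tilde x_1, \tilde x_1) = c(\tilde x_1, \tilde x_1 \vee x_2) \ge c(x_1, x_1 \vee x_2) = c(x_1, x_1) = 0$.
Claims 7.1.8, 7.1.9, and 7.1.10 imply (7.1.28). □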
Remark 7.1.11 (a) The unimodality assumption (7.1.27) is quite natural from an application point of view. Note that the transportation problem in Theorem 7.1.7 is the same as in Theorem 7.1.1 (where only the indices 1, 2 have been changed). We used this to demonstrate that the solution $F^*$ is not unique. Without the symmetry, resp. the unimodality condition, the solution may differ substantially. Given a right continuous function $f = f(y) \ge 0$, consider the cost function $c(x, y) = f(y)$. Then $c$ satisfies the Monge condition, and so (7.1.28) is equivalent to the following problem:
\[ \text{minimize } \int f(y)\, dF_Y(y) \text{ subject to } F_Y \le F_2. \tag{7.1.35} \]
Equivalently, we are seeking a d.f. $\tilde F_2 \le F_2$ such that the distribution of $f$ with respect to $\tilde F_2$ has a minimal first moment. Obviously, the solution (7.1.29) of Theorem 7.1.7 is not a solution of (7.1.35).
(b) In the proof of Theorem 7.1.7, the assumption $c(x, x) = 0$ can be replaced with a weaker one,
\[ c(x, x) \le c(x, y) \wedge c(y, x) \quad \text{for all } x, y \in \mathbb{R}. \tag{7.1.36} \]
7.2 Mass Transportation Problem with Fixed Sum (Difference) of the Marginals and with Stochastically Ordered Marginals

Consider a flow in a network with $n$ nodes, $i = 1, \dots, n$, and let $x_{ij}$ be the flow from node $i$ to node $j$. Assume that for all nodes $k$ the value of $\sum_i x_{ik} + \sum_j x_{kj}$ is fixed and equal to $h_k$. As motivation, suppose $a_i = \sum_{k=1}^{n} x_{ik}$, $b_i = \sum_{k=1}^{n} x_{ki}$ to be the amount of workload corresponding to the outflow, resp. to the inflow, in node $i$. Assume that the total work capacity at node $i$ is given by $h_i$ (in a certain time period). Then every admissible flow $(x_{ij})$ should satisfy the condition
\[ h_i = a_i + b_i, \quad 1 \le i \le n. \tag{7.2.1} \]
Set $A(k) = \sum_{i=1}^{k} a_i$, $B(k) = \sum_{i=1}^{k} b_i$, and $H(k) = \sum_{i=1}^{k} h_i$. Then $h_k = A(k) + B(k) - (A(k - 1) + B(k - 1))$, and (7.2.1) is equivalent to
\[ H(k) = A(k) + B(k), \quad 1 \le k \le n. \tag{7.2.2} \]
Let $c_{ij}$ be the transportation cost of a unit from node $i$ to node $j$. Then the problem is to minimize the total cost $\sum c_{ij} x_{ij}$ subject to the admissibility condition (7.2.1) and $x_{ij} \ge 0$. The general formulation of this problem is the following. For two d.f.s $A, B$ define $G(x) = \frac{1}{2}(A(x) + B(x))$. For a given cost function $c(x, y)$,
\[ \text{minimize } \int_{\mathbb{R}^2} c(x, y)\, dF(x, y), \text{ subject to } F \in \mathcal{F}_{A+B}. \tag{7.2.3} \]
Here $\mathcal{F}_{A+B}$ is the set of all d.f.s $F(x, y)$ with marginal d.f.s $F_1, F_2$ satisfying $F_1(x) + F_2(x) = A(x) + B(x)$. Consider next the special case $c(x, y) = |x - y|$. Let $X, Y$ be real r.v.s with joint d.f. $F$. Then by the triangle inequality,
\[ E|X - Y| \le \inf_{a \in \mathbb{R}} \big(E|X - a| + E|Y - a|\big). \tag{7.2.4} \]
Since $E|X - a| + E|Y - a| = \int |x - a|\, d(F_X + F_Y)(x)$ depends only on the sum of the marginals, (7.2.3) is the best possible improvement of (7.2.4), provided that the sum of the marginals $F_X + F_Y$ is known. Rachev (1984d) showed that
\[ \sup\{E|X - Y|^p;\ F_X + F_Y = A + B\} = \int_0^1 \big|G^{-1}(t) - G^{-1}(1 - t)\big|^p\, dt, \quad p \ge 1. \tag{7.2.5} \]
The following result gives an explicit solution of the general problem in (7.2.3).

Proposition 7.2.1 Suppose $c \ge 0$ is symmetric and satisfies the Monge condition:
\[ c(x', y') - c(x, y') - c(x', y) + c(x, y) \le 0 \quad \forall x' \ge x,\ y' \ge y. \tag{7.2.6} \]
Then
\[ \inf\left\{\int c(x, y)\, dF(x, y);\ F \in \mathcal{F}_{A+B}\right\} = \int_0^1 c\big(G^{-1}(u), G^{-1}(u)\big)\, du, \tag{7.2.7} \]
and
\[ \sup\left\{\int c(x, y)\, dF(x, y);\ F \in \mathcal{F}_{A+B}\right\} = \int_0^1 c\big(G^{-1}(u), G^{-1}(1 - u)\big)\, du. \tag{7.2.8} \]
The corresponding optimal pairs of r.v.s (couplings) are given by $(G^{-1}(U), G^{-1}(U))$, resp. $(G^{-1}(U), G^{-1}(1 - U))$.

Proof: Since $c$ is symmetric, we obtain for any $F \in \mathcal{F}_{A+B}$,
\[ \int c(x, y)\, dF(x, y) = \frac{1}{2} \int \big(c(x, y) + c(y, x)\big)\, dF(x, y) = \int c(x, y)\, d\left(\frac{F(x, y) + F(y, x)}{2}\right). \]
On the other hand, $F_s(x, y) = \frac{F(x, y) + F(y, x)}{2} \in \mathcal{F}(G, G)$. Consequently, we obtain (7.2.7), (7.2.8) by making use of (7.1.8) and (7.1.9) with $F_1 = F_2 = G$. □
We have the following analogue of the above proposition for nonsymmetric cost functions.

Proposition 7.2.2 If $c(x, y)$ satisfies the Monge condition (7.2.6) and furthermore, $x_1 \le y \le x_2$ implies that $c(x_1, x_2) \ge c(y, y)$, then
\[ \inf\left\{\int c(x, y)\, dF(x, y);\ F \in \mathcal{F}_{A+B}\right\} = \int_0^1 c\big(G^{-1}(u), G^{-1}(u)\big)\, du. \tag{7.2.9} \]
Proof: Applying the Monge conditions for every $X, Y$ with $F_{X,Y} \in \mathcal{F}_{A+B}$, $Ec(X, Y) \ge Ec(F_X^{-1}(U), F_Y^{-1}(U))$. Since $F_X(x) + F_Y(x) = 2G(x)$, it follows that $F_X \wedge F_Y \le G \le F_X \vee F_Y$, and therefore, $F_X^{-1} \wedge F_Y^{-1} \le G^{-1} \le F_X^{-1} \vee F_Y^{-1}$. Consequently, we have $c(F_X^{-1}(U), F_Y^{-1}(U)) \ge c(G^{-1}(U), G^{-1}(U))$, which proves (7.2.9). □

Remark 7.2.3 The marginals of the class $\mathcal{F}_{A+B}$ have largest and smallest elements, defined by
\[ F_1^*(x) = \begin{cases} 2G(x), & x < x_0, \\ 1, & x \ge x_0, \end{cases} \quad \text{and} \quad F_2^*(x) = \begin{cases} 0, & x < x_0, \\ 2G(x) - 1, & x \ge x_0, \end{cases} \]
with $x_0 := \inf\{y;\ 2G(y) \ge 1\}$. Note that there is no smallest d.f. in $\mathcal{F}_{A+B}$. To show this let $F_1(x), F_2(x)$ be the marginal d.f.s of the smallest element $F \in \mathcal{F}_{A+B}$ and let $G_1, G_2$ be d.f.s such that $G_1(x) + G_2(x) = 2G(x)$. If the lower Fréchet bounds satisfy $(F_1(x) + F_2(y) - 1)_+ \le (G_1(x) + G_2(y) - 1)_+$, then $F_1 \le G_1$ and $F_2 \le G_2$, which implies that $F_1 = G_1$, $F_2 = G_2$. In particular, this implies that $(G^{-1}(U), G^{-1}(1 - U))$ is in the general nonsymmetric case no longer a solution to the problem of maximizing $\int c(x, y)\, dF(x, y)$ over the class $\mathcal{F}_{A+B}$. For example, let $G$ be the d.f. of $\frac{1}{4} \sum_{i=1}^{4} \varepsilon_{(i)}$. Then $P_1 = P^{(G^{-1}(U), G^{-1}(1-U))} = \frac{1}{4}(\varepsilon_{(1,4)} + \varepsilon_{(2,3)} + \varepsilon_{(3,2)} + \varepsilon_{(4,1)})$, while $P_2 = P^{((F_1^*)^{-1}(U), (F_2^*)^{-1}(1-U))} = \frac{1}{2}(\varepsilon_{(1,4)} + \varepsilon_{(2,3)})$. For $c_1(x, y) = 1_{(-\infty, (3,2)]}(x, y)$, we have $E_{P_1} c_1 = \frac{1}{4}$, $E_{P_2} c_1 = 0$, while for $c_2 = 1_{[(2,3), \infty)}$, we have $E_{P_1} c_2 = \frac{1}{4}$, $E_{P_2} c_2 = \frac{1}{2}$. Note that both functions, $-c_1$, $-c_2$, are Monge functions (but are not unimodal).

We next consider the case where in the network example we fix the total outflow minus the inflow of each node. This problem is known in the literature as the minimal network flow problem (cf. for example Barnes and Hoffman (1985, Section 9) or Anderson and Nash (1987)). Assume that the outflow minus the inflow of each node is fixed; i.e., the following Kirchhoff equations hold:
\[ \sum_k x_{ik} - \sum_k x_{ki} = a_i - b_i = h_i \quad \text{for all } i, \]
or equivalently,
\[ H(k) = A(k) - B(k), \quad 1 \le k \le n, \tag{7.2.10} \]
with $A(k) = \sum_{j=1}^{k} a_j$, $B(k) = \sum_{j=1}^{k} b_j$, and $H(k) = \sum_{j=1}^{k} h_j$. Consider now the general case: Let $A, B$ be distribution functions and let $\mathcal{F}_{A-B}$ be the set of all “generalized” d.f.s of finite measures on $\mathbb{R}^2$ with marginals $F_1, F_2$ satisfying $F_1 - F_2 = A - B$. We consider the following transportation problem:
\[ \text{minimize } \int c(x, y)\, dF(x, y) \text{ subject to } F \in \mathcal{F}_{A-B}, \tag{7.2.11} \]
with $c(x, y)$ satisfying the Monge condition (7.2.6). To solve (7.2.11) we make use of the following dual representation (cf. (6.1.23)):
\[ \inf\left\{\int c(x, y)\, dF(x, y);\ F \in \mathcal{F}_{A-B}\right\} = \sup\left\{\int f\, d(A - B)(x);\ f(x) - f(y) \le c(x, y),\ \forall x, y\right\}. \tag{7.2.12} \]
We first consider a particular type of cost function.

Proposition 7.2.4 Let $c(x, y) = |x - y| \max(1, h(|x - a|), h(|y - a|))$, where $h$ is a monotonically nondecreasing function on $\mathbb{R}_+$. Then
\[ \inf\left\{\int c(x, y)\, dF(x, y);\ F \in \mathcal{F}_{A-B}\right\} = \int \max(1, h(|x - a|))\, |A - B|(x)\, dx, \tag{7.2.13} \]
provided that $h(|x - a|)$ is locally integrable.
Proof: We first note that the duality constraint $f(x) - f(y) \le c(x, y)$, for all $x, y$, holds if and only if $f$ is absolutely continuous and moreover, $|f'(x)| \le \max(1, h(|x - a|))$ a.s. Consequently, by the dual representation (7.2.12), we obtain
\begin{align*}
\inf\left\{\int c(x, y)\, dF(x, y);\ F \in \mathcal{F}_{A-B}\right\} &= \sup\left\{\int f\, d(A - B)(x);\ |f'| \le \max(1, h(|x - a|)),\ \forall x\right\} \\
&= \sup\left\{\int f'(x)\, (A - B)(x)\, dx;\ |f'| \le \max(1, h(|x - a|)),\ \forall x\right\} \\
&= \int \max(1, h(|x - a|))\, |A - B|(x)\, dx. \qquad \Box
\end{align*}
To handle the general case set
\[ c(x, y) = |x - y|\, \zeta(x, y) \quad \left(\text{i.e., } \zeta(x, y) = \frac{c(x, y)}{|x - y|}\right). \tag{7.2.14} \]
Theorem 7.2.5 Assume that for any $x < t < y$, $\zeta(t, t) \le \zeta(x, y)$, $\zeta(x, y) = \zeta(y, x)$. Moreover, let $\zeta(x, y)$ be right continuous in $y$, and also assume that $t \to \zeta(t, t)$ is locally bounded. Then the optimal value in the minimization problem (7.2.11) is equal to
\[ \inf\left\{\int c(x, y)\, dF(x, y);\ F \in \mathcal{F}_{A-B}\right\} = \int \zeta(t, t)\, |A - B|(t)\, dt. \tag{7.2.15} \]
Proof: Let $\mathcal{F} = \{f;\ f(x) - f(y) \le c(x, y),\ \forall x, y\}$, and let $\mathcal{F}^* = \{f \text{ absolutely continuous and } |f'(t)| \le \zeta(t, t),\ \forall t\}$. Then $\mathcal{F} \subset \mathcal{F}^*$. Indeed, for $f \in \mathcal{F}$ we have $\frac{f(x) - f(y)}{|x - y|} \le \zeta(x, y)$, and therefore, $\limsup_{y \to x} \frac{f(x) - f(y)}{|x - y|} \le \zeta(x, x)$. Also,
\[ \liminf_{y \to x} \frac{f(x) - f(y)}{|x - y|} = -\limsup_{y \to x} \frac{f(y) - f(x)}{|x - y|} \ge -\lim_{y \to x} \zeta(y, x) = -\zeta(x, x). \]
Since $f$ is locally Lipschitz, it is absolutely continuous, so the inequalities above imply that $|f'(t)| \le \zeta(t, t)$ a.s. If, conversely, $f \in \mathcal{F}^*$, then $f(x) - f(y) = \int_y^x f'(t)\, dt$, and therefore, for $x < y$,
\[ |f(x) - f(y)| \le \int_x^y |f'(t)|\, dt \le \int_x^y \zeta(t, t)\, dt \le |x - y|\, \zeta(x, y) = c(x, y). \]
The dual representation (7.2.12) again implies (7.2.15) (by the same arguments as in the proof of Proposition 7.2.4). □
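For the simplest cost $c(x, y) = |x - y|$, i.e., $\zeta \equiv 1$ in (7.2.14), the right-hand side of (7.2.15) reduces to $\int |A - B|(t)\, dt$. The sketch below evaluates this integral exactly for two discrete distributions; representing each distribution as a dictionary from atom to mass, and the function name, are assumptions of this illustration rather than anything prescribed by the text.

```python
def integral_abs_df_difference(atoms_a, atoms_b):
    """Exact value of the integral of |A(t) - B(t)| dt for discrete d.f.s.

    atoms_a, atoms_b: dicts mapping an atom location to its mass, with
    equal total mass. Between consecutive atoms both d.f.s are constant,
    so the integral is a finite sum of rectangle areas.
    """
    points = sorted(set(atoms_a) | set(atoms_b))
    total = 0.0
    cum_a = cum_b = 0.0
    for left, right in zip(points, points[1:]):
        cum_a += atoms_a.get(left, 0.0)     # A on [left, right)
        cum_b += atoms_b.get(left, 0.0)     # B on [left, right)
        total += abs(cum_a - cum_b) * (right - left)
    return total


# Unit mass at 0 versus unit mass at 1: the mass must travel distance 1.
shift_value = integral_abs_df_difference({0: 1.0}, {1: 1.0})

# Half the mass at 0 and half at 2, versus unit mass at 1.
split_value = integral_abs_df_difference({0: 0.5, 2: 0.5}, {1: 1.0})
```

Both toy values equal 1, which matches the intuitive transport cost of moving each unit of mass to its target under $c(x, y) = |x - y|$.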
Next, we consider the following transportation problem with stochastically ordered marginals posed by Rogers (1992). Let $F, G$ be real distribution functions, $F \le_{st} G$; here as usual $\le_{st}$ stands for the stochastic order. Let $C := \{(x, y) \in \mathbb{R}^2;\ x \le y\}$, and let
\[ M_C(F, G) := M(F, G) \cap \{\mu \in M^1(\mathbb{R}^2, \mathbb{B}^2);\ \mu(C) = 1\} \tag{7.2.16} \]
be the set of all measures with marginals $F, G$ that are concentrated on the order cone $C$. The problem is to determine, for a given strictly convex function $\varphi$, the bound
\[ \sup\left\{\int \varphi(x - y)\, \mu(dx, dy);\ \mu \in M_C(F, G)\right\}. \tag{7.2.17} \]
The motivation for problem (7.2.17) is to get a good monotone coupling of random walks $(S_n)$, $(S_n')$ with $S_0 = x \ge S_0' = 0$, $S_n \ge S_n'$ for all $n$, and $S_n = S_n'$ for all large enough $n$. Without the order restriction, a solution of (7.2.17) is given by the random variables $X = F^{-1}(U)$, $Y = G^{-1}(1 - U)$ for a uniform $(0, 1)$ distributed r.v. $U$. It is intuitively clear that a solution of (7.2.17) should concentrate as much mass on the diagonal as possible. This is indeed true.

Theorem 7.2.6 (Rogers (1992)) Each solution $(X, Y)$ of (7.2.17) has the property that
\[ P(X = Y) = |F \wedge G| = \int f \wedge g\, dm, \tag{7.2.18} \]
when $F = f m$, $G = g m$. There exists a solution of (7.2.17).

We next characterize the optimal solutions by an order-type relation.

Theorem 7.2.7 Let $X, Y$ be r.v.s with d.f.s $F, G$ and $X \le Y$ a.s. Then $(X, Y)$ defines a solution of (7.2.17) iff
\[ X(\omega) < X(\omega') \le Y(\omega) \le Y(\omega') \text{ implies } Y(\omega') = Y(\omega) \tag{7.2.19} \]
a.s. (for $(\omega, \omega')$ and with respect to the product measure).

Proof: If $(X, Y)$ is an optimal admissible coupling and if on a set of pairs $(\omega, \omega')$ with positive measure $X(\omega) < X(\omega') \le Y(\omega) < Y(\omega')$ holds, then let us define $\tilde Y(\omega') := Y(\omega)$, $\tilde Y(\omega) := Y(\omega')$ and set $\tilde Y = Y$ otherwise. Then $\tilde Y$ has d.f. $G$ and $E\varphi(X - \tilde Y) > E\varphi(X - Y)$ because $\varphi$ is strictly convex. Since there is essentially up to simultaneous rearrangements only one pair of r.v.s $X, Y$ with d.f.s $F, G$ satisfying the order relation (7.2.19), the opposite direction follows from the first part of the proof. □
In terms of measures $\mu \in M_C(F, G)$, the characterization of optimality of $\mu$ in (7.2.19) can be formulated as
\[ \mu \otimes \mu\big(\{(x_1, y_1, x_2, y_2);\ x_1 < x_2 \le y_1 < y_2\}\big) = 0. \tag{7.2.20} \]
We remark that the characterization of optimal pairs in (7.2.19), resp. (7.2.20), implies the “maximal concentration on the diagonal” property in (7.2.18). For finite discrete distributions one can explicitly construct optimal pairs with the ordering property given in (7.2.19). We consider at first the case of equiprobable atoms in each distribution. So let $\mu_1 = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_{a_i}$, $\mu_2 = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_{b_i}$ be the measures corresponding to $F, G$, where $a_1 \le \cdots \le a_n$, $b_1 \le \cdots \le b_n$, and $a_i \le b_i$ for all $i$. Problem (7.2.17) is equivalent to the following problem: Find a permutation $\pi \in \Upsilon_n$ such that
\[ \sum_{i=1}^{n} \varphi(b_i - a_{\pi(i)}) \text{ is maximal.} \tag{7.2.21} \]
Here, the maximum is considered over all permutations $\pi \in \Upsilon_n$ such that $a_{\pi(i)} \le b_i$, $1 \le i \le n$. Permutations with this property are called admissible permutations. An optimal admissible permutation is essentially unique (up to indices with equal values of $a_i$), and it is given in the following theorem.

Theorem 7.2.8 Define $\pi^* \in \Upsilon_n$ inductively:
\[ \begin{aligned} \pi^*(1) &:= \max\{k \le n;\ a_k \le b_1\}, \\ \pi^*(k) &:= \max\{\ell \le n;\ \ell \notin \{\pi^*(1), \dots, \pi^*(k - 1)\},\ a_\ell \le b_k\}, \quad 2 \le k \le n. \end{aligned} \tag{7.2.22} \]
Then $\pi^* \in \Upsilon_n$ is the optimal admissible permutation.

Proof: Define on $\Omega = \{1, \dots, n\}$ (supplied with the uniform distribution $P$) random variables $X(i) := a_{\pi^*(i)}$ and $Y(i) := b_i$, $1 \le i \le n$. Then $X \le Y$, since $\pi^*$ is admissible, and $X, Y$ satisfy the order relation (7.2.19). Therefore, they are optimal couplings. Equivalently, $\pi^*$ is the optimal admissible permutation. □

It is clear from the construction in Theorem 7.2.8 that up to a simultaneous permutation of the probability space, an optimal pair of r.v.s is essentially unique.

Remark 7.2.9 Theorem 7.2.8 can be extended to the case that $\mu_1 = \sum_{i=1}^{n} p_i \varepsilon_{a_i}$, $\mu_2 = \sum_{i=1}^{n} q_i \varepsilon_{b_i}$ with rational $p_i, q_i$, by representing $p_i, q_i$ in the formal equiprobable case. By an approximation argument (as given in Rogers (1992)) one can approximate the optimal couplings for $F, G$ with couplings having compact support. The general case then can be approximated via the ordering criterion (7.2.19) using a truncation technique. Thus, applying Theorem 7.2.8, we are able to construct explicit approximate solutions in the general case.
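The inductive definition (7.2.22) translates directly into a greedy procedure. The sketch below uses 0-based indices, assumes sorted inputs with $a_i \le b_i$ and no ties among the $a_i$, and compares the greedy result with a brute-force search over all admissible permutations for the strictly convex choice $\varphi(t) = t^2$; the function names and the test data are illustrative assumptions.

```python
from itertools import permutations


def greedy_admissible_permutation(a, b):
    """0-based version of (7.2.22): for each b[k] in increasing order, pick
    the largest unused index l with a[l] <= b[k].

    Assumes a and b are sorted ascending with a[i] <= b[i] for all i, which
    guarantees that a candidate always exists.
    """
    used = set()
    pi = []
    for k in range(len(b)):
        candidates = [l for l in range(len(a)) if l not in used and a[l] <= b[k]]
        choice = max(candidates)
        used.add(choice)
        pi.append(choice)
    return pi


def objective(pi, a, b, phi):
    """Value of the sum in (7.2.21) for the permutation pi."""
    return sum(phi(b[i] - a[pi[i]]) for i in range(len(a)))


def brute_force_max(a, b, phi):
    """Maximum of (7.2.21) over all admissible permutations (small n only)."""
    values = [
        objective(pi, a, b, phi)
        for pi in permutations(range(len(a)))
        if all(a[pi[i]] <= b[i] for i in range(len(a)))
    ]
    return max(values)


def square(t):
    return t * t


a, b = [0, 1, 3], [2, 3, 5]
pi_star = greedy_admissible_permutation(a, b)
greedy_value = objective(pi_star, a, b, square)
best_value = brute_force_max(a, b, square)
```

On this small instance the greedy permutation pairs $b_1$ with $a_2$, $b_2$ with $a_3$, and $b_3$ with $a_1$, and its value coincides with the brute-force maximum.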
7.3 Mass Transportation Problems with Capacity Constraints

In this section we obtain explicit solutions of Monge–Kantorovich mass transportation problems with capacity constraints. The Hoeffding–Fréchet inequality is extended for bivariate distribution functions having fixed marginals and satisfying additional constraints. In the discrete case, our results lead to “greedy” algorithms similar to the classical northwest corner rule. Let us start with recalling the abstract version of the MKP: Given two Borel measures $\mu$ and $\nu$ on a separable metric space $S$ with equal total mass $\lambda = \mu(S) = \nu(S) < \infty$ and a measurable cost function $c$ on $S \times S$, find
\[ L_c(\mu, \nu) = \inf \int c(x, y)\, P(dx, dy), \tag{7.3.1} \]
\[ U_c(\mu, \nu) = \sup \int c(x, y)\, P(dx, dy), \tag{7.3.2} \]
where the infimum and supremum are taken over all Borel measures $P$ on $S \times S$ having projections (marginals)
\[ P(\cdot \times S) = \mu(\cdot), \quad P(S \times \cdot) = \nu(\cdot). \tag{7.3.3} \]
As shown in Section 3.1, the explicit solutions of MKP are based on the Hoeffding–Fréchet inequality (referred to as upper and lower Fréchet bounds):
\[ \max(0, F^\mu(x) + F^\nu(y) - \lambda) \le F^P(x, y) \le \min(F^\mu(x), F^\nu(y)), \tag{7.3.4} \]
for any $P$ on $\mathbb{R}^2$ that satisfies (7.3.3) with $S = \mathbb{R}$. (In (7.3.4) and in the sequel, $F^P$ stands for the distribution function of $P$.) If $c$ is lattice superadditive (equivalently, $-c$ is a Monge function):
\[ c(x', y') + c(x, y) \ge c(x', y) + c(x, y') \quad \text{for all } x' \ge x,\ y' \ge y, \tag{7.3.5} \]
then under mild moment conditions on $\mu$ and $\nu$ the explicit values of $L_c$ and $U_c$ were given in Section 3.1.

In this section we consider two marginal problems with additional constraints on the joint distribution functions. Suppose $\mu$ and $\nu$ are two nonnegative Borel measures on $\mathbb{R}$, $\mu(\mathbb{R}) = \nu(\mathbb{R}) = \lambda < \infty$. Suppose $c : \mathbb{R}^2 \to \mathbb{R}$ is a right-continuous Monge function generating a nonnegative measure on $\mathbb{R}^2$. Let $\sigma$ be a nonnegative bounded Borel measure on $\mathbb{R}^2$. (Note that the total mass of $\sigma$ may be different from $\lambda$.)

Problem I. Find
\[ \text{maximum } \int_{\mathbb{R}^2} c(x, y)\, P(dx, dy) \tag{7.3.6} \]
subject to the constraints
\[ P \text{ is a nonnegative Borel measure on } \mathbb{R}^2 \text{ with marginals } \mu \text{ and } \nu, \tag{7.3.7} \]
and
\[ P((-\infty, x] \times (-\infty, y]) \le \sigma((-\infty, x] \times (-\infty, y]) \quad \text{for all } x, y \in \mathbb{R}. \tag{7.3.8} \]
Problem II. Find
\[ \text{minimum } \int_{\mathbb{R}^2} c(x, y)\, P(dx, dy) \tag{7.3.9} \]
subject to (7.3.7) and
\[ P((-\infty, x] \times [y, \infty)) \le \sigma((-\infty, x] \times [y, \infty)) \quad \text{for all } x, y \in \mathbb{R}. \tag{7.3.10} \]
Problem I with discrete $\mu$ and $\nu$ was studied by Barnes and Hoffman (1985). Olkin and Rachev (1990) extended their results by completing the characterization of the “optimal feasible” $P$; i.e., $P$ satisfies (7.3.7), (7.3.8) and attains the maximum in (7.3.6). This method is extended to solve Problem II as well. We start with a refinement of the Fréchet bounds (7.3.4). We shall do this by determining the exact bounds for a d.f. $F^P(x, y)$ with marginals $F^\mu$ and $F^\nu$ assuming that $P$ satisfies the constraint (7.3.8) or (7.3.10). Then we shall apply the extended Fréchet bounds to solve Problems I and II. Whereas in the discrete case the solution of Problem I leads to the Barnes–Hoffman greedy algorithm, the solution of Problem II implies a new greedy algorithm for a transportation problem with capacity constraints (7.3.10).

We begin with some notation. For two nonnegative Borel measures $\mu$ and $\nu$ on $\mathbb{R}$ with equal total mass $\lambda$ denote by $M(\mu, \nu)$ the set of all nonnegative Borel measures on $\mathbb{R}^2$ with projections $\mu$ and $\nu$. Without loss of generality set $\lambda = 1$. Given a nonatomic probability space, the set $\mathcal{F}(A, B)$ of joint d.f.s $F(x, y) = F_{X,Y}(x, y) = P(X \le x, Y \le y)$ with fixed marginals $F_X = A$ and $F_Y = B$ is the set of d.f.s of the probability laws in $M(\mu, \nu)$. Thus, the Fréchet bounds (7.3.4) can be rewritten as
\[ \max_{F \in \mathcal{F}(A, B)} F(x, y) = F^*(x, y) := \min(A(x), B(y)), \tag{7.3.11} \]
\[ \max_{F \in \mathcal{F}(A, B)} G(x, y) = G^*(x, y) := \min(A(x), \bar B(y)), \tag{7.3.12} \]
where $\bar B(y) := \nu([y, \infty))$ and $G(x, y) := G_{X,Y}(x, y) := P(X \le x, Y \ge y)$. Clearly, the laws corresponding to $F^*$ and $G^*$ are in $M(\mu, \nu)$. Furthermore, given a nonnegative bounded Borel measure $\sigma$ on $\mathbb{R}^2$, set
\[ \begin{aligned} F^\sigma(x, y) &:= \sigma((-\infty, x] \times (-\infty, y]), \\ G^\sigma(x, y) &:= \sigma((-\infty, x] \times [y, \infty)), \\ \mathcal{F}(A, B, F^\sigma) &:= \{F \in \mathcal{F}(A, B);\ F \le F^\sigma\}, \\ \mathcal{G}(A, B, G^\sigma) &:= \{G_{X,Y};\ F_{X,Y} \in \mathcal{F}(A, B),\ G_{X,Y} \le G^\sigma\}. \end{aligned} \tag{7.3.13} \]
Our objective in the next two theorems is to extend the Fréchet bounds; we shall characterize the bounds
\[ \max_{F \in \mathcal{F}(A, B, F^\sigma)} F(x, y) =: \widehat F(x, y), \quad x, y \in \mathbb{R}, \tag{7.3.14} \]
\[ \max_{G \in \mathcal{G}(A, B, G^\sigma)} G(x, y) =: \widehat G(x, y), \quad x, y \in \mathbb{R}, \tag{7.3.15} \]
and shall examine the conditions implying
\[ \widehat F \in \mathcal{F}(A, B, F^\sigma) \quad \text{and} \quad \widehat G \in \mathcal{G}(A, B, G^\sigma). \tag{7.3.16} \]
Theorem 7.3.1 If
\[ F^\sigma(x, y) \ge \max(0, A(x) + B(y) - 1), \tag{7.3.17} \]
then the maximum in (7.3.14) is attained:
\[ \begin{aligned} \widehat F(x, y) &= \inf_{t \le x,\ s \le y} \{F^\sigma(t, s) + \mu((t, x]) + \nu((s, y])\} \\ &= \inf_{t \le x,\ s \le y} \{F^\sigma(t, s) + \mu((t, x]) + \nu((s, y])\} \wedge (A(x) \wedge B(y)), \end{aligned} \tag{7.3.18} \]
and $\widehat F \in \mathcal{F}(A, B, F^\sigma)$, where $\wedge := \min$.

Remark 7.3.2 Condition (7.3.17) is necessary and sufficient for $\mathcal{F}(A, B, F^\sigma) \ne \emptyset$, cf. Fréchet (1951), Kellerer (1964).

Remark 7.3.3 The second equality in (7.3.18) follows from the fact that $F^\sigma(t, s) = 0$ for $t = -\infty$ or $s = -\infty$.
Remark 7.3.4 From (7.3.18), $\widetilde{F}$ is not greater than the Hoeffding–Fréchet upper bound $F^*$ (7.3.11).

Remark 7.3.5 By (7.3.4) the maximum in (7.3.11) is attained for the pair $X^* = A^-(U)$, $Y^* = B^-(U)$, where $A^-$ is the generalized inverse of $A$, and $U$ is uniformly distributed on $[0, 1]$. In contrast, for $\widetilde{F}$ given in (7.3.18), the explicit form of the optimal pair $(\widetilde{X}, \widetilde{Y})$ with joint d.f. given by $\widetilde{F}$ is not known. However, in the discrete case one can use the Barnes–Hoffman greedy algorithm to compute $\widetilde{F}$. Suppose $\mu$, $\nu$, and $\sigma$ are discrete measures,

\[ a_i := \mu(\{x_i\}),\ i \in M = \{1, 2, \dots, m\}, \qquad b_j := \nu(\{y_j\}),\ j \in N = \{1, 2, \dots, n\}, \qquad \sum_{i \in M} a_i = \sum_{j \in N} b_j = 1; \tag{7.3.19} \]

\[ \sigma_{ij} := F^\sigma(x_i, y_j), \qquad i \in M,\ j \in N. \tag{7.3.20} \]

Then

\[ \widetilde{F}(x_i, y_j) = \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs}, \tag{7.3.21} \]

where the probabilities $p_{rs}$ are determined by the following variant of the northwest corner rule (see Hoffman (1961), Barnes and Hoffman (1985)); in fact, we set

\[ p_{11} := \min(a_1, b_1, \sigma_{11}); \qquad p_{ij} := \min\Big\{ a_i - \sum_{s=1}^{j-1} p_{is},\ b_j - \sum_{r=1}^{i-1} p_{rj},\ \sigma_{ij} - \sum_{\substack{r \le i,\ s \le j \\ (r,s) \ne (i,j)}} p_{rs} \Big\}, \tag{7.3.22} \]

if $p_{rs}$ is determined for $r \le i < m$ and $s \le j < n$, and we let

\[ p_{ij} := \min\Big\{ a_i - \sum_{s=1}^{j-1} p_{is},\ b_j - \sum_{r=1}^{i-1} p_{rj} \Big\}, \qquad \text{if } i = m \text{ or } j = n. \]
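The rule (7.3.22) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `greedy_capacity_plan`, the row-by-row fill order, the 0-based indexing, and the sample data are assumptions made here, not part of the text. Exact rational arithmetic is used so that the minima are not affected by rounding.

```python
from fractions import Fraction as Fr

def greedy_capacity_plan(a, b, sigma):
    """Capacity-constrained northwest corner rule, following (7.3.22).

    a, b  : lists of marginal masses a_1..a_m and b_1..b_n (equal totals).
    sigma : (m-1) x (n-1) nested list, sigma[i][j] = F^sigma(x_{i+1}, y_{j+1}).
    Returns the m x n plan p as a nested list, filled row by row.
    """
    m, n = len(a), len(b)
    p = [[Fr(0)] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Supply left in row i and demand left in column j.
            bounds = [a[i] - sum(p[i][:j]),
                      b[j] - sum(p[r][j] for r in range(i))]
            if i < m - 1 and j < n - 1:
                # Interior cell: the cumulative capacity sigma_ij also binds.
                # p[i][j] is still 0 here, so this sums every other cell
                # of the rectangle {1..i} x {1..j}.
                used = sum(p[r][s] for r in range(i + 1) for s in range(j + 1))
                bounds.append(sigma[i][j] - used)
            p[i][j] = min(bounds)
    return p

# Hypothetical data (not from the text), chosen so that (7.3.17) holds.
a = [Fr(2, 8), Fr(4, 8), Fr(2, 8)]
b = [Fr(4, 8), Fr(2, 8), Fr(2, 8)]
sigma = [[Fr(1, 8), Fr(2, 8)],
         [Fr(3, 8), Fr(5, 8)]]

plan = greedy_capacity_plan(a, b, sigma)
for row in plan:
    print([str(v) for v in row])
```

For this sample the plan, in units of 1/8, has rows (1, 1, 0), (2, 1, 1), (1, 0, 1); its row and column sums reproduce a and b, and every interior cumulative sum stays within the corresponding capacity.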
In other words, taking discrete versions of $\mu$, $\nu$, and $\sigma$ in (7.3.19) one can apply the greedy algorithm (7.3.22) to approximate $\widetilde{F}$ in (7.3.18) by means of (7.3.21).

Proof of Theorem 7.3.1: The proof is based on three assertions.

Claim 7.3.6 (Fréchet (1951)) The condition $F^\sigma(x, y) \ge H_-(x, y) = \max(0, A(x) + B(y) - 1)$ is necessary and sufficient for $\mathcal{F}(A, B, F^\sigma) \ne \emptyset$.
Suppose $\mathcal{F}(A, B, F^\sigma) \ne \emptyset$. Then, by (7.3.4),

\[ H_-(x, y) \le F(x, y) \le F^\sigma(x, y), \qquad F \in \mathcal{F}(A, B, F^\sigma). \tag{7.3.23} \]

On the other hand, if $H_- \le F^\sigma$, then $H_- \in \mathcal{F}(A, B, F^\sigma)$.

Claim 7.3.7 $\widetilde{F}$ defined by (7.3.18) has marginal d.f.s $A$ and $B$, and for all $x, y \in \mathbb{R}$,

\[ \sup_{F \in \mathcal{F}(A,B,F^\sigma)} F(x, y) \le \widetilde{F}(x, y). \tag{7.3.24} \]

For any $F \in \mathcal{F}(A, B, F^\sigma)$ and any $t \le x$, $s \le y$, we have $F(x, y) \le F^\sigma(t, s) + \mu((t, x]) + \nu((s, y])$, which clearly implies (7.3.24). Invoking Remark 7.3.3, $\widetilde{F}(x, y) \le H_+(x, y)$, where $H_+$ is the upper Hoeffding–Fréchet bound, $H_+(x, y) := \min(A(x), B(y))$. Since $\widetilde{F} \ge H_-$ (cf. (7.3.23), (7.3.24)), $\widetilde{F} \in [H_-, H_+]$ has marginals $A$ and $B$. Theorem 7.3.1 is now a consequence of the following assertion.

Claim 7.3.8 $\widetilde{F}$ is a d.f.

To this end, we choose $-\infty = x_0 < x_1 < \cdots < x_{m-1} < x_m = \infty$, $-\infty = y_0 < y_1 < \cdots < y_{n-1} < y_n = \infty$ such that $\mu((x_{i-1}, x_i)) < \varepsilon$, $\nu((y_{j-1}, y_j)) < \varepsilon$, and $\sigma((x_{i-1}, x_i) \times (y_{j-1}, y_j)) < \varepsilon$ for all $i \in M = \{1, \dots, m\}$ and $j \in N = \{1, \dots, n\}$. Set $a_i := \mu((x_{i-1}, x_i])$, $b_j := \nu((y_{j-1}, y_j])$, and $\sigma_{ij} := F^\sigma(x_i, y_j)$. Consider the convex polygon

\[ \Big\{ p = (p_{ij})_{i \in M,\, j \in N};\ p_{ij} \ge 0,\ p_{i\cdot} := \sum_{j \in N} p_{ij} = a_i,\ p_{\cdot j} := \sum_{i \in M} p_{ij} = b_j,\ \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs} \le \sigma_{ij} \text{ for all } i \in M,\ j \in N \Big\} \tag{7.3.25} \]

\[ = \Big\{ p;\ p_{ij} \ge 0,\ p_{i\cdot} = a_i,\ p_{\cdot j} = b_j,\ \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs} \le \sigma_{ij},\ i = 1, \dots, m-1,\ j = 1, \dots, n-1,\ \sum_{s=1}^{j} p_{\cdot s} \le \sigma_{mj},\ j \in N,\ \sum_{r=1}^{i} p_{r\cdot} \le \sigma_{in},\ i \in M \Big\}. \]
By the Fréchet condition (7.3.17) (cf. Claim 7.3.6) the marginals of $F^\sigma$ majorize $A$ and $B$, respectively, and thus $\sigma_{mj} \ge \sum_{s=1}^{j} p_{\cdot s}$ and $\sigma_{in} \ge \sum_{r=1}^{i} p_{r\cdot}$ for all $j \in N$, $i \in M$. The polygon (7.3.25) becomes

\[ \Big\{ p;\ p_{ij} \ge 0,\ p_{i\cdot} = a_i,\ p_{\cdot j} = b_j \text{ for all } i \in M,\ j \in N,\ \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs} \le \sigma_{ij} \text{ for all } i = 1, \dots, m-1,\ j = 1, \dots, n-1 \Big\}. \tag{7.3.26} \]
Consider now the discrete analogue of $\widetilde{F}$ in (7.3.18):

\[ d_{ij} := \min_{0 \le r \le i,\ 0 \le s \le j} \{\sigma_{rs} + a_{r+1} + \cdots + a_i + b_{s+1} + \cdots + b_j\}, \qquad d_{ij} := 0 \ \text{ if } i = 0 \text{ or } j = 0, \tag{7.3.27} \]

where $\sigma_{rs} = 0$ if $r = 0$ or $s = 0$. Our aim now is to show that $d = (d_{ij})_{i \in M,\, j \in N}$ determines a bivariate d.f. with support on $X \times Y$, $X = (x_i)_{i \in M}$, $Y = (y_j)_{j \in N}$.

Claim 7.3.9 The greedy algorithm (7.3.22) is determined uniquely by (7.3.27); i.e.,

\[ d_{ij} = \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs}, \qquad i \in M,\ j \in N. \tag{7.3.28} \]
Proof: Consider the discrete version of $\widetilde{F}$ (cf. (7.3.21), (7.3.25)). Let $\sigma_{r,s} := 0$ if $r = 0$ or $s = 0$, and define

\[ d_{ij} := \min_{0 \le r \le i,\ 0 \le s \le j} \{\sigma_{rs} + a_{r+1} + \cdots + a_i + b_{s+1} + \cdots + b_j\}, \qquad d_{ij} = 0 \ \text{ if } i = 0 \text{ or } j = 0. \tag{7.3.29} \]

We need to check the equality

\[ d_{ij} = \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs}, \qquad i \in M,\ j \in N, \tag{7.3.30} \]

where the $p_{ij}$'s are determined by the greedy algorithm (7.3.22). If $i = j = 1$, then $p_{11} = \min(a_1, b_1, \sigma_{11})$ (cf. (7.3.22)), and by (7.3.29)

\[ d_{11} = \min\{\sigma_{00} + a_1 + b_1,\ \sigma_{01} + a_1,\ \sigma_{10} + b_1,\ \sigma_{11}\} = p_{11}. \]

Suppose we have proved that

\[ d_{1,j-1} = p_{11} + \cdots + p_{1,j-1}. \tag{7.3.31} \]

Then

\[ p_{11} + \cdots + p_{1j} = \sum_{s=1}^{j-1} p_{1s} + \min\Big\{ a_1 - \sum_{s=1}^{j-1} p_{1s},\ b_j,\ \sigma_{1j} - \sum_{s=1}^{j-1} p_{1s} \Big\} \]

\[ = \min\{a_1,\ b_j + d_{1,j-1},\ \sigma_{1,j}\} \]

\[ = \min\{a_1,\ b_1 + \cdots + b_j,\ \sigma_{11} + b_2 + \cdots + b_j,\ \dots,\ \sigma_{1,j-1} + b_j,\ \sigma_{1,j}\} = d_{1,j}. \]

These equalities hold due to (7.3.22), (7.3.31), and (7.3.29), respectively. By symmetry, $d_{i,1} = p_{11} + \cdots + p_{i1}$. Suppose next that

\[ d_{rs} = \sum_{k=1}^{r} \sum_{l=1}^{s} p_{kl} \qquad \text{for all } r \le i,\ s \le j,\ (r, s) \ne (i, j). \tag{7.3.32} \]

Then for $1 \le i \le m-1$, $1 \le j \le n-1$,

\[ \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs} = \min\Big\{ a_i + \sum_{r=1}^{i-1} \sum_{s=1}^{j} p_{rs},\ b_j + \sum_{r=1}^{i} \sum_{s=1}^{j-1} p_{rs},\ \sigma_{ij} \Big\} = d_{ij}, \]

where the equalities follow from (7.3.22) and (7.3.32). Thus

\[ d_{ij} = \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs} \qquad \text{for all } 1 \le i \le m-1,\ 1 \le j \le n-1. \tag{7.3.33} \]

Consider now the case $i = m$. Then

\[ \sum_{r=1}^{m} p_{r1} = \sum_{r=1}^{m-1} p_{r1} + \min\Big\{ a_m,\ b_1 - \sum_{r=1}^{m-1} p_{r1} \Big\} = \min\{a_m + d_{m-1,1},\ b_1\} = d_{m,1}, \]

which follows from (7.3.22) and (7.3.33). Suppose that

\[ d_{m,j-1} = \sum_{r=1}^{m} \sum_{s=1}^{j-1} p_{rs}. \tag{7.3.34} \]

Then using (7.3.22), (7.3.33), and (7.3.34), for $1 \le j \le n$,

\[ \sum_{r=1}^{m} \sum_{s=1}^{j} p_{rs} = \min\Big\{ a_m + \sum_{r=1}^{m-1} \sum_{s=1}^{j} p_{rs},\ b_j + \sum_{r=1}^{m} \sum_{s=1}^{j-1} p_{rs} \Big\} = \min\{\sigma_{rs} + a_{r+1} + \cdots + a_m + b_{s+1} + \cdots + b_j\}, \]

the minimum being taken over $0 \le r \le m$, $0 \le s \le j$, $(r, s) \ne (m, j)$; hence

\[ \sum_{r=1}^{m} \sum_{s=1}^{j} p_{rs} = d_{m,j}, \]

since for $r = m$, $s = j$, $\sigma_{m,j} = F^\sigma(\infty, y_j) \ge b_j$.
Similarly,

\[ d_{i,n} = \sum_{r=1}^{i} \sum_{s=1}^{n} p_{rs} \qquad \text{for all } i \in M, \]

which proves Claim 7.3.9. □
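Claim 7.3.9 lends itself to a quick numerical check. The sketch below evaluates the closed form (7.3.27) directly and recovers point masses by differencing. It is illustrative only: the names `closed_form_cdf` and `increments`, the sample data, and the treatment of $\sigma_{rs}$ with $r = m$ or $s = n$ as non-binding (the $+\infty$ convention that appears in (7.3.55)) are assumptions made here.

```python
from fractions import Fraction as Fr
from itertools import accumulate

def closed_form_cdf(a, b, sigma):
    """d_ij of (7.3.27) for i = 0..m, j = 0..n, as an (m+1) x (n+1) table."""
    m, n = len(a), len(b)
    A = [Fr(0)] + list(accumulate(a))   # A[i] = a_1 + ... + a_i
    B = [Fr(0)] + list(accumulate(b))   # B[j] = b_1 + ... + b_j

    def cap(r, s):
        if r == 0 or s == 0:
            return Fr(0)                # sigma_rs = 0 on the lower boundary
        if r == m or s == n:
            return None                 # assumed non-binding, as in (7.3.55)
        return sigma[r - 1][s - 1]

    d = [[Fr(0)] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(cap(r, s) + (A[i] - A[r]) + (B[j] - B[s])
                          for r in range(i + 1) for s in range(j + 1)
                          if cap(r, s) is not None)
    return d

def increments(d):
    """Point masses p_ij = d_ij - d_{i-1,j} - d_{i,j-1} + d_{i-1,j-1}."""
    return [[d[i][j] - d[i - 1][j] - d[i][j - 1] + d[i - 1][j - 1]
             for j in range(1, len(d[0]))] for i in range(1, len(d))]

# Hypothetical data (not from the text), chosen so that (7.3.17) holds.
a = [Fr(2, 8), Fr(4, 8), Fr(2, 8)]
b = [Fr(4, 8), Fr(2, 8), Fr(2, 8)]
sigma = [[Fr(1, 8), Fr(2, 8)],
         [Fr(3, 8), Fr(5, 8)]]

d = closed_form_cdf(a, b, sigma)
masses = increments(d)
for row in masses:
    print([str(v) for v in row])
```

For this sample the table $(d_{ij})$, in units of 1/8, has rows (1, 2, 2), (3, 5, 6), (4, 6, 8) for $i = 1, 2, 3$. The increments are nonnegative with marginals a and b, and working through the rule (7.3.22) by hand on the same data gives the same masses, in line with (7.3.28).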
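The greedy algorithm (7.3.22) defines nonnegative $p_{ij}$'s (cf. Barnes and Hoffman (1985, Lemma 3.2)). Define the probability $P^{(\varepsilon)}$ on $X \times Y$ by

\[ P^{(\varepsilon)}((-\infty, x_i] \times (-\infty, y_j]) := d_{ij}, \qquad i \in M,\ j \in N. \tag{7.3.35} \]

Similarly, $(a_i)_{i \in M}$ and $(b_j)_{j \in N}$ determine probabilities $\mu^{(\varepsilon)}$ and $\nu^{(\varepsilon)}$ with supports $X$ and $Y$, respectively. If $\rho$ is the Kolmogorov (uniform) distance

\[ \rho(\mu, \nu) := \sup_{x \in \mathbb{R}} |F^\mu(x) - F^\nu(x)|, \tag{7.3.36} \]

then the sequences $(\mu^{(\varepsilon)})_{\varepsilon > 0}$ and $(\nu^{(\varepsilon)})_{\varepsilon > 0}$ are $\rho$-relatively compact, and thus there exists $\varepsilon_n \downarrow 0$ such that

\[ \rho(\mu^{(\varepsilon_n)}, \mu) \to 0 \quad \text{and} \quad \rho(\nu^{(\varepsilon_n)}, \nu) \to 0. \tag{7.3.37} \]

(For more facts on $\rho$-relative compactness cf. Rachev (1984a) and Kakosjan, Klebanov, and Rachev (1988, Sec. 2.5).) Similarly, by the definition $\sigma_{ij} := F^\sigma(x_i, y_j)$ and $\sigma((x_{i-1}, x_i) \times (y_{j-1}, y_j)) < \varepsilon$, we have that $(\sigma_{ij})_{i \in M,\, j \in N}$ determines a measure $\sigma^{(\varepsilon)}$ on $X \times Y$. Again, the family $(\sigma^{(\varepsilon)})_{\varepsilon > 0}$ is $\rho$-relatively compact. Thus, without loss of generality, we may assume that as $\varepsilon_n \to 0$,

\[ \rho(\sigma^{(\varepsilon_n)}, \sigma) = \sup_{x, y \in \mathbb{R}} \big| F^{\sigma^{(\varepsilon_n)}}(x, y) - F^\sigma(x, y) \big| \to 0. \tag{7.3.38} \]

As in Claim 7.3.7, we conclude that $P^{(\varepsilon)}$ has marginals $\mu^{(\varepsilon)}$ and $\nu^{(\varepsilon)}$, and thus $(P^{(\varepsilon)})_{\varepsilon > 0}$ is tight. By (7.3.37), (7.3.38), (7.3.26), and (7.3.18), there exists a subsequence $\{\varepsilon_n'\} \subset \{\varepsilon_n\}$ such that $P^{(\varepsilon_n')}$ weakly converges to a measure $\widetilde{P}$ with d.f. $\widetilde{F}$. The proof of Theorem 7.3.1 is now complete. □

The next theorem provides an explicit expression for the Fréchet-type bound (7.3.15). We recall the notations (7.3.11)–(7.3.13).

Theorem 7.3.10 Suppose $G^\sigma(x, y) := \sigma((-\infty, x] \times [y, \infty))$ (cf. (7.3.13)) satisfies the condition

\[ G^\sigma(x, y) \ge \max\big(0,\ A(x) - \mathring{B}(y)\big), \qquad \mathring{B}(y) := \nu((-\infty, y)). \tag{7.3.39} \]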
Then the maximum in (7.3.15) is attained, and

\[ \widetilde{G}(x, y) = \inf_{t \le x,\ s \ge y} \{G^\sigma(t, s) + \mu((t, x]) + \nu([y, s))\}. \tag{7.3.40} \]

Conclusions similar to those in Remarks 7.3.2–7.3.5 can be made. Here we shall only point out the greedy algorithm that can be used to approximate the optimal distribution $\widetilde{G}$. We use the notations (7.3.19) again and let

\[ \lambda_{ij} := G^\sigma(x_i, y_j), \qquad i \in M,\ j \in N. \tag{7.3.41} \]

Then, in this discrete case, $\widetilde{G}$ has the form

\[ \widetilde{G}(x_i, y_j) = \sum_{r=1}^{i} \sum_{s=j}^{n} p_{rs}, \tag{7.3.42} \]

where the probabilities $p_{ij}$ are determined by the following southwest corner rule:

\[ p_{1n} := \min\{a_1, b_n, \lambda_{1n}\}; \tag{7.3.43} \]

\[ p_{ij} := \min\Big\{ a_i - \sum_{s=j+1}^{n} p_{is},\ b_j - \sum_{r=1}^{i-1} p_{rj},\ \lambda_{ij} - \sum_{\substack{r \le i,\ s \ge j \\ (r,s) \ne (i,j)}} p_{rs} \Big\}, \tag{7.3.44} \]

if $p_{rs}$ is determined for $r \le i \le m-1$ and $s \ge j > 1$; moreover,

\[ p_{ij} := \min\Big\{ a_i - \sum_{s=j+1}^{n} p_{is},\ b_j - \sum_{r=1}^{i-1} p_{rj} \Big\}, \qquad \text{if } i = m \text{ or } j = 1. \tag{7.3.45} \]
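A minimal Python sketch of the southwest corner rule (7.3.43)–(7.3.45) follows. The function name `southwest_greedy`, the dictionary layout with 1-based index pairs, the fill order (rows in increasing $i$, columns in decreasing $j$), and the sample data are assumptions made here for illustration.

```python
from fractions import Fraction as Fr

def southwest_greedy(a, b, lam):
    """Southwest corner rule, following (7.3.43)-(7.3.45).

    a, b : lists of marginal masses a_1..a_m and b_1..b_n (equal totals).
    lam  : dict with lam[i, j] = G^sigma(x_i, y_j) for 1 <= i < m, 1 < j <= n.
    Returns a dict p[i, j] with 1-based indices.
    """
    m, n = len(a), len(b)
    p = {(i, j): Fr(0) for i in range(1, m + 1) for j in range(1, n + 1)}
    for i in range(1, m + 1):
        for j in range(n, 0, -1):          # start at the last column
            bounds = [a[i - 1] - sum(p[i, s] for s in range(j + 1, n + 1)),
                      b[j - 1] - sum(p[r, j] for r in range(1, i))]
            if i < m and j > 1:
                # Capacity on the cumulative mass over {1..i} x {j..n};
                # p[i, j] is still 0, so only the other cells contribute.
                used = sum(p[r, s] for r in range(1, i + 1)
                           for s in range(j, n + 1))
                bounds.append(lam[i, j] - used)
            p[i, j] = min(bounds)
    return p

# Hypothetical data (not from the text), chosen so that (7.3.39) holds.
a = [Fr(2, 8), Fr(4, 8), Fr(2, 8)]
b = [Fr(4, 8), Fr(2, 8), Fr(2, 8)]
lam = {(1, 3): Fr(1, 8), (1, 2): Fr(1, 8),
       (2, 3): Fr(1, 8), (2, 2): Fr(3, 8)}

p_sw = southwest_greedy(a, b, lam)
for i in range(1, 4):
    print([str(p_sw[i, j]) for j in range(1, 4)])
```

For this sample the plan, in units of 1/8, has rows (1, 0, 1), (2, 2, 0), (1, 0, 1). The rule is the mirror image of (7.3.22): reversing the column order turns it into the northwest corner rule with capacities $\lambda_{ij}$.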
We now give explicit solutions of the marginal problems I and II.

Theorem 7.3.11 Suppose

(i) $c : \mathbb{R}^2 \to \mathbb{R}$ is a right-continuous lattice superadditive function ($-c$ is a Monge function);

(ii) $\mu$ and $\nu$ are two Borel nonnegative measures on $\mathbb{R}$ with $\mu(\mathbb{R}) = \nu(\mathbb{R}) = \lambda < \infty$ and d.f.s $F^\mu$ and $F^\nu$, and such that

\[ \int_{\mathbb{R}} c(x, y_0)\, \mu(dx) + \int_{\mathbb{R}} c(x_0, y)\, \nu(dy) < \infty \qquad \text{for some } x_0, y_0 \in \mathbb{R}; \tag{7.3.46} \]

(iii) $\sigma$ is a nonnegative bounded Borel measure on $\mathbb{R}^2$ and

\[ F^\sigma(x, y) \ge \max(0,\ F^\mu(x) + F^\nu(y) - \lambda) \qquad \text{for all } x, y \in \mathbb{R}. \tag{7.3.47} \]

Then the maximum in (7.3.6) is attained at the "optimal" measure $\widetilde{P}$. $\widetilde{P}$ satisfies the feasibility conditions (7.3.7), (7.3.8) and is determined by

\[ F^{\widetilde{P}}(x, y) := \inf_{t \le x,\ s \le y} \{F^\sigma(t, s) + \mu((t, x]) + \nu((s, y])\}, \qquad x, y \in \mathbb{R}. \tag{7.3.48} \]
Proof: We need Theorem 3.1.2 (cf. Cambanis, Simons, and Stout (1976, Theorem 1); see also Rachev (1991c, Section 7.3)). If (7.3.5) holds, then for measures $P_1$ and $P_2$ on $\mathbb{R}^2$ with marginals $\mu$ and $\nu$,

\[ F^{P_1} \le F^{P_2} \ \Rightarrow\ \int_{\mathbb{R}^2} c\, dP_1 \le \int_{\mathbb{R}^2} c\, dP_2, \tag{7.3.49} \]

which with an appeal to Theorem 7.3.1 yields the result. □
Remark 7.3.12 The assumption (7.3.46) can be replaced by one of the following assumptions:

(a) $c(x, y)$ is symmetric, and $\int c(x, x)\, (\mu + \nu)(dx) < \infty$.

(b) $c(x, y)$ is uniformly integrable for all $P$ with marginals $\mu$ and $\nu$.
That (a) implies (7.3.49) follows from Cambanis, Simons, and Stout (1976); that (b) implies (7.3.49) follows from Tchen (1980, Corollary 2.1); see also Rachev (1991c, Theorem 7.3.2). Remark 7.3.13 Condition (7.3.47) guarantees that the set of feasible solutions P determined by (7.3.7), (7.3.8) is not empty. Remark 7.3.14 If F σ (x, y) ≥ min(F μ (x), F ν (y)) =: H+ (x, y),
(7.3.50)
then F P in (7.3.48) equals H+ (cf. Remark 7.3.3). Thus, Theorem 7.3.11 (see also the next Theorem 7.3.17) can be considered as a generalization of Theorem 2 of Cambanis, Simons, and Stout (1976) and Corollary 2.2 of Tchen (1980). In this case, Hoffman’s (1962) northwest corner rule gives a greedy algorithm to determine an “optimal” measure P, provided that μ and ν have finite discrete support.
Remark 7.3.15 Consider the discrete version of Problem I (see (7.3.6)). Suppose $c(i, j)$, $i \in M$, $j \in N$, is a lattice superadditive sequence:

\[ c(i, j) + c(i+1, j+1) \ge c(i, j+1) + c(i+1, j), \qquad i = 1, \dots, m-1,\ j = 1, \dots, n-1. \tag{7.3.51} \]

Hoffman (1961) and Barnes and Hoffman (1985) treat $c(i, j)$ as the (negative) cost of shipping a unit commodity from origin $i$ to destination $j$. Suppose the discrete measures $\mu$ and $\nu$ with supports $M$ and $N$ are given. Then $a_i = \mu\{i\}$ and $b_j = \nu\{j\}$ are interpreted as the amount of a product available at origin $i$ and the amount required at destination $j$. Suppose the $(m-1) \times (n-1)$ matrix $(\sigma_{ij})$ satisfies

\[ \sigma_{ij} \ge \max\Big\{0,\ \sum_{r=1}^{i} a_r - \sum_{s=j+1}^{n} b_s \Big\}, \tag{7.3.52} \]

\[ \sigma_{ij} \le \sigma_{is}, \qquad \sigma_{ij} \le \sigma_{rj}, \qquad \sigma_{ij} + \sigma_{rs} \ge \sigma_{is} + \sigma_{rj}, \qquad r \ge i,\ s \ge j. \]

(These conditions are related to what is called a uniformly tapered matrix; see Marshall and Olkin (1979).) Barnes and Hoffman (1985) consider the following transportation problem:

\[ \text{maximize } \sum_{i \in M} \sum_{j \in N} c(i, j)\, p_{ij} \tag{7.3.53} \]

subject to

\[ p_{ij} \ge 0,\ p_{i\cdot} = a_i,\ p_{\cdot j} = b_j \ \text{ for all } i \in M,\ j \in N, \qquad \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs} \le \sigma_{ij},\ \ i = 1, \dots, m-1,\ j = 1, \dots, n-1. \tag{7.3.54} \]

Clearly, (7.3.54) is a special case of Problem I. Following Barnes and Hoffman, (7.3.54) can be viewed as the capacity restrictions on the amount that can be shipped from the first $i$ origins to the first $j$ destinations. Theorem 7.3.11 is completed by showing that the greedy algorithm of Barnes and Hoffman (1985) for determining the solution $(p_{ij})_{i \in M,\, j \in N}$ of (7.3.53) is also characterized by

\[ F^{\widetilde{P}}(i, j) := \sum_{r=1}^{i} \sum_{s=1}^{j} p_{rs} = \min_{0 \le r \le i,\ 0 \le s \le j} \{\sigma_{rs} + a_{r+1} + \cdots + a_i + b_{s+1} + \cdots + b_j\}, \tag{7.3.55} \]

\[ \sigma_{rs} = \begin{cases} 0 & \text{if } r = 0 \text{ or } s = 0, \\ +\infty & \text{if } r = m \text{ or } s = n. \end{cases} \]
Remark 7.3.16 One can determine the extremal value in (7.3.6):

\[ \max_{F \in \mathcal{F}(F^\mu, F^\nu; F^\sigma)} \int_{\mathbb{R}^2} c\, dF = \int_{\mathbb{R}^2} c\, dF^{\widetilde{P}}, \tag{7.3.56} \]

where $F^{\widetilde{P}}$ is given by (7.3.48). By (7.3.46) and Cambanis, Simons, and Stout (1976, p. 288, (9)),

\[ \int_{\mathbb{R}^2} c\, dF = \int_{\mathbb{R}} c(x, y_0)\, F^\mu(dx) + \int_{\mathbb{R}} c(x_0, y)\, F^\nu(dy) - c(x_0, y_0) + \int_{\mathbb{R}^2} B(x, y)\, \mu_c(dx, dy) \tag{7.3.57} \]

for any bivariate d.f. $F$ with marginals $F^\mu$ and $F^\nu$. (Since $F^{\widetilde{P}} \in \mathcal{F}(F^\mu, F^\nu; F^\sigma)$ by Theorem 7.3.1, (7.3.57) can be used to compute the value of $\int c\, dF^{\widetilde{P}}$. In (7.3.57) the points $x_0$ and $y_0$ are the same as in condition (7.3.46), the measure $\mu_c$ is generated by $c$ (see condition (i) in Theorem 7.3.11), and we also assume that $c$ is a nondecreasing function in both arguments.) Finally,

\[ B := B_1 - B_2, \tag{7.3.58} \]

\[ B_1(x, y) := \begin{cases} 1 + F(x, y) - F^\mu(x) - F^\nu(y) & \text{if } x_0 < x,\ y_0 < y, \\ F(x, y) & \text{if } x \le x_0,\ y \le y_0, \\ 0 & \text{otherwise;} \end{cases} \]

\[ B_2(x, y) := \begin{cases} F^\mu(x) - F(x, y) & \text{if } x \le x_0,\ y_0 \le y, \\ F^\nu(y) - F(x, y) & \text{if } x_0 < x,\ y \le y_0, \\ 0 & \text{otherwise.} \end{cases} \]
Theorem 7.3.17 Suppose conditions (i) and (ii) of Theorem 7.3.11 hold, and in addition

(iii*) $\sigma$ is a nonnegative bounded Borel measure on $\mathbb{R}^2$ satisfying

\[ G^\sigma(x, y) := \sigma((-\infty, x] \times [y, \infty)) \ge \max(0,\ F^\mu(x) - \nu((-\infty, y))) \tag{7.3.59} \]

for all $x, y \in \mathbb{R}$. Then the minimum in (7.3.9) is attained at an optimal measure $\widetilde{Q}$ satisfying the feasibility conditions (7.3.7) and (7.3.10); $\widetilde{Q}$ is determined by

\[ G^{\widetilde{Q}}(x, y) = \widetilde{Q}((-\infty, x] \times [y, \infty)) = \inf_{t \le x,\ s \ge y} \{G^\sigma(t, s) + \mu((t, x]) + \nu([y, s))\}. \tag{7.3.60} \]

All the Remarks 7.3.12–7.3.16 can be easily reformulated regarding Theorem 7.3.17. In particular, consider the transportation problem

\[ \text{minimize } \sum_{i \in M} \sum_{j \in N} c(i, j)\, p_{ij} \tag{7.3.61} \]

subject to

\[ p_{ij} \ge 0,\ p_{i\cdot} = a_i,\ p_{\cdot j} = b_j \ \text{ for all } i \in M,\ j \in N, \qquad \sum_{r=1}^{i} \sum_{s=j}^{n} p_{rs} \le \lambda_{ij},\ \ i = 1, \dots, m-1,\ j = 1, \dots, n-1. \tag{7.3.62} \]

Suppose $\sum_{i \in M} a_i = \sum_{j \in N} b_j$ and $c(\cdot, \cdot)$ is a lattice superadditive sequence (cf. (7.3.51)). Suppose also that $\lambda_{ij} \ge \max\big\{0,\ \sum_{r=1}^{i} a_r - \sum_{s=1}^{j} b_s\big\}$ and for any $r < i$ and $s > j$ the inequalities

\[ \lambda_{rj} \le \lambda_{ij} \ge \lambda_{is}, \qquad \lambda_{ij} + \lambda_{rs} \ge \lambda_{is} + \lambda_{rj} \ge 0, \qquad r < i,\ s > j, \tag{7.3.63} \]

hold. Then the greedy algorithm (7.3.43)–(7.3.45) realizes the minimum in (7.3.61). Moreover, the optimal $p_{ij}$'s are determined by $p_{ij} = f_{ij} - f_{i,j+1} - f_{i-1,j} + f_{i-1,j+1}$, where

\[ f_{ij} := \min_{1 \le r \le i,\ j \le s \le n} \{\lambda_{rs} + (a_{r+1} + \cdots + a_i) + (b_j + \cdots + b_{s-1})\} \wedge \sum_{r=1}^{i} a_r \wedge \sum_{s=j}^{n} b_s. \tag{7.3.64} \]
The rest of this section is devoted to a generalization of the MKP with additional constraints stated in Problems I and II; see (7.3.6)–(7.3.10). The results are motivated by Hoffman and Veinott (1990), where the discrete version of the problem has been considered. We shall only state the results. The proofs are similar to those of Theorems 7.3.1 and 7.3.10 and will therefore be omitted. The abstract form of the problem is the following. Suppose that

(i) $\mu$ and $\nu$ are two nonnegative Borel measures on $\mathbb{R}$, $\mu(\mathbb{R}) = \nu(\mathbb{R}) = \lambda < \infty$;

(ii) $L$ is a union of disjoint sublattices $L_i \subset \mathbb{R}^2$, $i \in S$, and the projections of $L$ on each axis equal $\mathbb{R}$;

(iii) $(\sigma_i)_{i \in S}$ are nonnegative $\sigma$-finite Borel measures on $L_i$.

Then the problem is to find

\[ \min \int_L c\, dP, \tag{7.3.65} \]
where the minimum is subject to the following constraints:

(i) $P$'s are nonnegative Borel measures on $L$ with marginals $\mu$ and $\nu$; (7.3.66)

(ii) $P(A) \le \sigma_i(A)$ for any $A = L_i \cap ((-\infty, x] \times (-\infty, y])$, $(x, y) \in L_i$, $i \in S$. (7.3.67)

As before, see (7.3.1)–(7.3.3), the measures $\mu$ and $\nu$ are viewed as initial and final mass distributions, and $P$ in (7.3.66), (7.3.67) are the (feasible) transportation plans. Here the generalization of Problems I and II is that $L$ describes the path of the transportation flow and the $\sigma_i$'s are capacity constraints on the cumulative supply–demand flow. Finally, $c : L \to \mathbb{R}$ is a cost function, and therefore the integral in (7.3.65) represents the total cost of mass transportation applying the plan $P$. Suppose $c$ is subadditive on the lattice $L$; that is, for all $x, y \in L$, $c(x) + c(y) \ge c(x \wedge y) + c(x \vee y)$. Then we shall call a feasible plan of transportation achieving the minimum in (7.3.65) an optimal measure $P^*$.

As in Problems I and II we start with extensions of the classical Hoeffding–Fréchet bounds (7.3.4), assuming that $P$ meets the constraints (7.3.66) and (7.3.67), or their alternatives:

\[ P \text{ is a nonnegative Borel measure on } \bar{L} = \bigcup_{i \in S} \bar{L}_i, \quad \bar{L}_i := \{(x, y);\ (x, -y) \in L_i\}, \text{ with marginals } \mu \text{ and } \nu; \tag{7.3.68} \]

\[ P(B) \le \sigma_i(B) \quad \text{for any } B = \bar{L}_i \cap ((-\infty, x] \times [y, \infty)). \tag{7.3.69} \]

The restriction on the support of $P$ given in (7.3.66) has the form $L = \bigcup_{i \in S} L_i$, where $S = \{0, 1, \dots, s\}$ or $S = \mathbb{N}$, and each sublattice $L_i$ is a rectangle $(x_i^-, x_i^+] \times (y_i^-, y_i^+]$, where $x_0^- = y_0^- = -\infty$, $x_i^- < x_i^+$, $y_i^- < y_i^+$, $x_{i-1}^- \le x_i^- \le x_{i-1}^+ \le x_i^+$, $y_{i-1}^- \le y_i^- \le y_{i-1}^+ \le y_i^+$, $x_s^+ = y_s^+ = \infty$. Write $\mathcal{P}_L$ (resp. $\bar{\mathcal{P}}_L$) to denote the class of all $P$'s on $L$ with (7.3.66) and (7.3.67) (resp. (7.3.68), (7.3.69)). Recall that for any measure $P$ on $\mathbb{R}^2$, $F^P$ stands for the d.f. of $P$, and $G^P(x, y) = P((-\infty, x] \times [y, \infty))$. In the next two theorems we shall compute the bounds
\[ F^*(x, y) = \max_{P \in \mathcal{P}_L} F^P(x, y) \tag{7.3.70} \]

and

\[ G^*(x, y) = \max_{P \in \bar{\mathcal{P}}_L} G^P(x, y). \tag{7.3.71} \]

For $L = \mathbb{R}^2$ and $\sigma_i = +\infty$, $F^*$ is indeed the upper Hoeffding–Fréchet bound

\[ H_+(x, y) = \min\{F^\mu(x), F^\nu(y)\} \qquad (F^\mu(x) := \mu((-\infty, x])). \tag{7.3.72} \]

On the other hand, $G^*(x, y) = \min\{F^\mu(x), G^\nu(y)\}$ (with $G^\nu(y) := \nu([y, \infty))$) determines a measure with d.f.

\[ H_-(x, y) = \max(0,\ F^\mu(x) + F^\nu(y) - \lambda), \tag{7.3.73} \]
which is the lower Hoeffding–Fréchet bound.

Theorem 7.3.18 Suppose that $F^* : L \to \mathbb{R}$ is defined iteratively as follows:

\[ F^*(x, y) = \min_{-\infty \le u \le x,\ -\infty \le v \le y} \big[\mu((u, x]) + \nu((v, y]) + F^{\sigma_0}(u, v)\big] \tag{7.3.74} \]

for $(x, y) \in L_0$, and

\[ F^*(x, y) = \min_{x_i^- \le u \le x,\ y_i^- \le v \le y} \big[\mu((u, x]) + \nu((v, y]) + F^{\sigma_i}(u, v) + F^*(x_{i-1}^+, v \wedge y_{i-1}^+) \vee F^*(x_{i-1}^+ \wedge u, y_{i-1}^+)\big] \tag{7.3.75} \]

for $(x, y) \in L_i$, $i \ge 1$. Suppose also that $F^*$ satisfies the inequalities

\[ F^{\sigma_i}(x, y) + F^*(x_{i-1}^+, y) \vee F^*(x, y_{i-1}^+) \ge H_-(x, y) \tag{7.3.76} \]

for $(x, y) \in L_i$, $i \in S$. Then the equality (7.3.70) holds, and moreover, $F^*$ is a d.f. of some $P^* \in \mathcal{P}_L$.
The proof is similar to that of Theorem 7.3.1. A slightly different approach (see Olkin and Rachev (1990)) can be used based on the following result of Topkis and Veinott (1973): The minimum of subadditive functions over a sublattice with respect to some variables is subadditive in the remaining variables; see also Hoffman and Veinott (1990) for the discrete version of Theorem 7.3.18.

In the next theorem we evaluate the bound $G^*$ in (7.3.71). Recall that $\bar{L}_i := (x_i^-, x_i^+] \times [-y_i^+, -y_i^-)$, $\bar{L} := \bigcup_{i \in S} \bar{L}_i$.

Theorem 7.3.19 Suppose that $G^* : \bar{L} \to \mathbb{R}$ is defined iteratively by

\[ G^*(x, y) = \min_{u \le x,\ v \ge y} \big[\mu((u, x]) + \nu([y, v)) + G^{\sigma_0}(u, v)\big] \qquad \text{for } (x, y) \in \bar{L}_0, \tag{7.3.77} \]

and

\[ G^*(x, y) = \min_{x_i^- \le u \le x,\ v \ge y \ge -y_i^+} \big[\mu((u, x]) + \nu([y, v)) + G^{\sigma_i}(u, v) + G^*(x_{i-1}^-, v \vee (-y_{i-1}^+)) \vee G^*(u \wedge x_{i-1}^+, -y_{i-1}^+)\big] \tag{7.3.78} \]

for $(x, y) \in \bar{L}_i$. Suppose also that $G^*$ satisfies the inequalities

\[ G^\sigma(x, y) + G^*(x_{i-1}^+, y) \vee G^*(x, -y_{i-1}^+) \ge F^\mu(x) + G^\nu(y) - \lambda \tag{7.3.79} \]

for any $(x, y) \in \bar{L}_i$, $i \in S$.
Then the equality (7.3.71) holds, and $G^*$ defines $P^* \in \bar{\mathcal{P}}_L$ by $G^*(x, y) = P^*((-\infty, x] \times [y, \infty))$ for $(x, y) \in \bar{L}$. Condition (7.3.79) is necessary for $P^* \in \bar{\mathcal{P}}_L$.

Next, we shall formulate a multivariate analogue of Theorem 7.3.18. (In general, Theorem 7.3.19 does not admit a multivariate extension, for the well-known reason that the lower Hoeffding–Fréchet bound for d.f.s on $\mathbb{R}^r$ ($r > 2$) with given one-dimensional projections does not generate a measure.) Let $\mu = (\mu^{(1)}, \dots, \mu^{(r)})$ be a vector of $r$ Borel nonnegative measures on $\mathbb{R}$ with one and the same total mass $\lambda < \infty$. Suppose $L$ is a complete Borel sublattice of $\mathbb{R}^r$ whose projection on every axis $x^{(i)}$ ($i \in R := \{1, \dots, r\}$) is the entire real line $\mathbb{R}$. Suppose also that $L$ is a union of disjoint nonempty sublattices $L_i$, $i \in S$ ($S = \{0, \dots, s\}$ or $S = \mathbb{N}$), and each $L_i$ is a rectangle in $\mathbb{R}^r$,

\[ L_i = (x_i^-, x_i^+] = \bigotimes_{j=1}^{r} \big(x_i^{(j)-}, x_i^{(j)+}\big], \qquad i \in S, \]

with $x_0^- = -\infty$, $x_s^+ = +\infty$, $x_i^- < x_i^+$, $x_{i-1}^- \le x_i^- \le x_{i-1}^+ \le x_i^+$. (For representations of sublattices on a product of $r$ lattices we refer to Veinott (1989, Section 4).)
Given $\sigma_i$, a nonnegative Borel measure on $L_i$, and a measure $P$ on $L$ with vector of one-dimensional projections $\mu$, we write $P_{L_i} \prec \sigma_i$ to denote that the restriction of $P$ on $L_i$ is less concordant than $\sigma_i$; that is, for any $x \in L_i$,

\[ P((x_i^-, x]) \le \sigma_i((x_i^-, x]), \qquad i \in S. \]

Note that in contrast with the usual definition of concordance (Kruskal (1958), Tchen (1980), Stoyan (1983)) we allow $\sigma_i$ to have total mass different from that of $P_{L_i}$. For example, assuming that $\sigma_i$ vanishes on a subset of $L_i$, we in fact impose additional restrictions on the support of $P$. Write $\mathcal{P}_L(\mu, \sigma)$ ($\sigma := (\sigma_i)_{i \in S}$) to denote the class of all $P$'s on $L$ possessing the properties

(i) $P$ is a nonnegative Borel measure on $L$ with vector of one-dimensional marginals $\mu$; (7.3.80)

(ii) $P_{L_i} \prec \sigma_i$ for all $i \in S$. (7.3.81)
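Define the mapping $F^* : L \to \mathbb{R}$ iteratively as follows: For $x \in L_0$, let

\[ F^*(x) = \min_{-\infty \le u \le x} \Big\{ \sum_{j \in R} \mu^{(j)}\big((u^{(j)}, x^{(j)}]\big) + F^{\sigma_0}(u) \Big\}, \]

and for $x \in L_i$, let

\[ F^*(x) = \min_{x_i^- \le u \le x} \Big\{ \sum_{j \in R} \mu^{(j)}\big((u^{(j)}, x^{(j)}]\big) + F^{\sigma_i}(u) + f_{L_i}(u) \Big\}, \]

where

\[ f_{L_i}(u) = \max\Big\{ F^*\big(x_i^{(1)-}, v_i^{(2)}, \dots, v_i^{(r)}\big),\ F^*\big(v_i^{(1)}, x_i^{(2)-}, \dots, v_i^{(r)}\big),\ \dots,\ F^*\big(v_i^{(1)}, v_i^{(2)}, \dots, x_i^{(r)-}\big) \Big\}, \]

and $v_i^{(j)} := u^{(j)} \wedge x_{i-1}^{(j)+}$, $u \in L_i$. Denote by $H_-$ the lower Hoeffding–Fréchet bound for multivariate d.f.s with prescribed one-dimensional marginals:

\[ H_-(u) = \max\Big(0,\ \sum_{j \in R} \mu^{(j)}\big((-\infty, u^{(j)}]\big) - (r - 1)\lambda \Big). \]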
Theorem 7.3.20 Suppose $F^*$ defined above satisfies the inequality

\[ F^{\sigma_i}(u) + f_{L_i}(u) \ge H_-(u) \qquad \text{for every } u \in L_i,\ i \in S. \]

Then

\[ \max_{P \in \mathcal{P}_L} F^P = F^*, \]

and $F^*$ is a d.f. of some $P^* \in \mathcal{P}_L$.

The proof is similar to that of Theorem 7.3.1. It requires a multivariate analogue of the greedy algorithm similar to that in Barnes and Hoffman (1985) and Olkin and Rachev (1990). A multivariate version of Hoffman's (1963) northwest corner rule is given in Balinski and Rachev (1989), where the interplay between greedy algorithms and MKPs is emphasized.

We are now ready to state the solution of MKP (7.3.65) with constraints (7.3.66) and (7.3.67).

Theorem 7.3.21 Suppose that the assumptions of Theorem 7.3.18 hold, and $c : L \to \mathbb{R}$ is subadditive and left-continuous on $L$ with

\[ \int_{\mathbb{R}} c(x, y_0)\, \mu(dx) + \int_{\mathbb{R}} c(x_0, y)\, \nu(dy) > -\infty \tag{7.3.82} \]

for some $(x_0, y_0) \in L$. Then the minimum in (7.3.65) is attained at $P^* \in \mathcal{P}_L$ defined in Theorem 7.3.18.

The next theorem gives the solution of the following MKP:

\[ \text{minimize } \int_{\bar{L}} c\, dP \tag{7.3.83} \]

under the constraints (7.3.68), (7.3.69), and assuming that $c$ is superadditive on $\bar{L}$; that is, $(-c)$ is subadditive on $\bar{L}$.

Theorem 7.3.22 Suppose that the assumptions of Theorem 7.3.19 hold, and $c : \bar{L} \to \mathbb{R}$ is superadditive and right-continuous with

\[ \int_{\mathbb{R}} c(x, y_0)\, \mu(dx) + \int_{\mathbb{R}} c(x_0, y)\, \nu(dy) < \infty. \tag{7.3.84} \]

Then the minimum in (7.3.83) is attained at $P^* \in \bar{\mathcal{P}}_L$ defined in Theorem 7.3.19.
The proof of the above two theorems is the same as that of Theorem 7.3.11.

Example 7.3.23 (The discrete case) Suppose $\mu$ and $\nu$ are discrete measures with supports $I = \{1, \dots, m\}$ and $J = \{1, \dots, n\}$, and $L = L_0 + \cdots + L_s$ is a sublattice of $I \times J$ with projections $I$ and $J$, respectively. Then Theorem 7.3.18 corresponds to the main theorem in Hoffman and Veinott (1990).

Example 7.3.24 (MKP with capacity constraints) Suppose that in Theorems 7.3.21 and 7.3.22, $L = \bar{L} = L_0 = \bar{L}_0 = \mathbb{R}^2$. Then we obtain the solution of Problems I and II; see (7.3.6)–(7.3.10). In fact, Theorems 7.3.21 and 7.3.22 reduce to Theorems 7.3.11 and 7.3.17.

We shall complete this section with another possible extension of Problems I and II. Consider a finite measure $\mu$ on $(\mathbb{R}^2, \mathcal{B}^2)$ and define, for two probability measures $P_1$, $P_2$ on $(\mathbb{R}^1, \mathcal{B}^1)$ and $A_i \times B_i \in \mathcal{B}^1 \otimes \mathcal{B}^1$, $i \in I$,

\[ M^\mu(P_1, P_2) = \{P \in M^1(P_1, P_2);\ P(A_i \times B_i) \le \mu(A_i \times B_i),\ i \in I\}, \tag{7.3.85} \]

where $M^1(P_1, P_2)$ denotes the set of all probability measures $P$ on $\mathbb{R}^2$ with marginals $P_1$, $P_2$. As in Theorem 7.3.1 (see (7.3.17)), we assume that

\[ \mu(A_i \times B_i) \ge (P_1(A_i) + P_2(B_i) - 1)_+. \tag{7.3.86} \]

Theorem 7.3.25 Under the assumption (7.3.86) let us define

\[ P^*(A \times B) = \inf_{A_i \subset A,\ B_i \subset B} \{\mu(A_i \times B_i) + (P_1(A) - P_1(A_i)) + (P_2(B) - P_2(B_i))\} \wedge \min(P_1(A), P_2(B)), \qquad A, B \in \mathcal{B}^1. \tag{7.3.87} \]

Then

\[ h_\mu(A \times B) := \sup\{P(A \times B);\ P \in M^\mu(P_1, P_2)\} \le P^*(A \times B). \tag{7.3.88} \]

If $P^*$ determines a measure, then

\[ h_\mu(A \times B) = P^*(A \times B), \tag{7.3.89} \]

and $P^*$ is a solution of (7.3.87).
Remark 7.3.26 The proof of Theorem 7.3.25 is similar to that of Theorem 7.3.1. In contrast to Theorem 7.3.1 it allows us to consider "local" bounds in the transportation problem. Observe that in the finite discrete case, bounds of the type

\[ x_{ij} \le \mu_{ij} \qquad \text{for some } (i, j) \tag{7.3.90} \]

are of this "local" type. As far as we know, in the literature there is no result concerning the solution of (7.3.90) with local bounds. See the next section for a possible approach to the problem.
7.4 Local Bounds for the Transportation Plans

While in the preceding three sections the additional constraints were formulated mainly in terms of the d.f.s, we now consider "local" constraints formulated in terms of the probability densities. These local-type restrictions are stronger than those in the previous section, and generally they are much more difficult to handle; see Remark 7.3.26. Our first result deals with a transportation problem with "indicator" cost function

\[ c(x, y) = I(x \ne y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y; \end{cases} \tag{7.4.1} \]

i.e., the cost of transportation is one for any unit mass that has to be moved, and zero otherwise. The cost function $c$ does not satisfy a Monge-type condition. We formulate this transportation problem on a general measure space $(S, \mathcal{U})$ assuming only that

\[ \{(x, y);\ x = y\} \in \mathcal{U} \otimes \mathcal{U}. \tag{7.4.2} \]
Let $M_f(S)$, $M_f(S \times S)$ be the sets of all finite measures on $(S, \mathcal{U})$ and $(S \times S, \mathcal{U} \otimes \mathcal{U})$, respectively, and for $\mu \in M_f(S \times S)$, let $\pi_i \mu$, $i = 1, 2$, denote the marginals of $\mu$. (This transportation problem leads to an extension of Dobrushin's result on optimal couplings.)

Theorem 7.4.1 (Optimal couplings with local restrictions) Assume that (7.4.2) holds and let $\mu_1, \mu_2 \in M_f(S)$ with $\mu_1(S) \le \mu_2(S)$. Then

(a)

\[ \inf\{\mu(\{(x, y);\ x \ne y\});\ \mu \in M_f(S \times S),\ \pi_1 \mu \ge \mu_1,\ \pi_2 \mu \le \mu_2\} = \lambda^-(S) := \sup_{C \in \mathcal{U}} (\mu_1(C) - \mu_2(C)). \tag{7.4.3} \]

(b) Moreover, the infimum in (7.4.3) is attained at

\[ \mu^*(A \times B) = \gamma(A \cap B) + \frac{\lambda^-(A)\, \lambda^+(B)}{\lambda^+(S)}, \tag{7.4.4} \]

where $\lambda^+(A) = \sup_{C \subset A} (\mu_2 - \mu_1)(C)$, $\lambda^-(A) = \sup_{C \subset A} (\mu_1 - \mu_2)(C)$, and $\gamma(A) = \mu_2(A) - \lambda^+(A) = \mu_1(A) - \lambda^-(A)$.

Proof: For any $\mu \in M_f(S \times S)$,

\[ \mu(x \ne y) \ge \sup_C \mu(C \times (S \setminus C)) = \sup_C \{\mu(C \times S) - \mu(C \times C)\} \ge \sup_C \{\mu(C \times S) - \mu(S \times C)\} \ge \sup_C \{\mu_1(C) - \mu_2(C)\} \]

\[ = \sup_C \{\lambda^-(C) - \lambda^+(C)\} = \lambda^-(\operatorname{supp} \lambda^-) = \lambda^-(S). \]
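On the other hand, $\mu^*(A \times S) = \gamma(A) + \lambda^-(A)\lambda^+(S)/\lambda^+(S) = \mu_1(A)$ and $\mu^*(S \times B) = \gamma(B) + \lambda^-(S)\lambda^+(B)/\lambda^+(S) \le \gamma(B) + \lambda^+(B) = \mu_2(B)$. Finally, we have

\[ \mu^*(x \ne y) = \int I(x \ne y) \big(\gamma(dx, dy) + \lambda^-(dx)\lambda^+(dy)/\lambda^+(S)\big) = \int I(x \ne y)\, \lambda^-(dx)\lambda^+(dy)/\lambda^+(S) = \lambda^-(S)\lambda^+(S)/\lambda^+(S) = \lambda^-(S). \]

□

Consider next some finite measures $\mu_1$, $\mu_2$ on $\mathbb{R}$ with densities $h_1$, $h_2$ with respect to a dominating measure $\mu$ on $\mathbb{R}^1$. Define

\[ \mathcal{P}_{\mu_1}^{\mu_2} := \{P \in M^1(\mathbb{R}^2, \mathcal{B}^2);\ \pi_1 P \ge \mu_1,\ \pi_2 P \le \mu_2\}. \tag{7.4.5} \]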
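Returning briefly to Theorem 7.4.1: on a finite space $S$ the coupling (7.4.4) can be written down explicitly, since $\lambda^-$, $\lambda^+$, and $\gamma$ reduce to the pointwise positive parts and minimum of the two mass vectors. The following sketch checks the three properties used in the proof. The function name `optimal_coupling`, the finite-space setting, and the sample masses are assumptions made here for illustration.

```python
from fractions import Fraction as Fr

def optimal_coupling(mu1, mu2):
    """Coupling mu* of (7.4.4) on a finite space S = {0, ..., k-1}.

    mu1, mu2 : lists of point masses with sum(mu1) <= sum(mu2).
    Returns (plan, lam_minus_total), where plan[x][y] = mu*({x} x {y}).
    """
    k = len(mu1)
    lam_minus = [max(p - q, Fr(0)) for p, q in zip(mu1, mu2)]  # (mu1 - mu2)^+
    lam_plus = [max(q - p, Fr(0)) for p, q in zip(mu1, mu2)]   # (mu2 - mu1)^+
    gamma = [min(p, q) for p, q in zip(mu1, mu2)]              # common part
    total_plus = sum(lam_plus)
    plan = [[Fr(0)] * k for _ in range(k)]
    for x in range(k):
        plan[x][x] += gamma[x]               # mass that stays in place
        if total_plus > 0:
            for y in range(k):
                # Surplus of mu1 at x is spread proportionally to lam_plus.
                plan[x][y] += lam_minus[x] * lam_plus[y] / total_plus
    return plan, sum(lam_minus)

# Hypothetical masses (not from the text) with mu1(S) = 1 <= mu2(S) = 5/4.
mu1 = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
mu2 = [Fr(1, 4), Fr(1, 2), Fr(1, 2)]

plan, lam_minus_total = optimal_coupling(mu1, mu2)
moved = sum(plan[x][y] for x in range(3) for y in range(3) if x != y)
print(moved, lam_minus_total)
```

In this example the mass off the diagonal equals $\lambda^-(S) = 1/4$, the first marginal of the plan is $\mu_1$, and the second marginal is dominated by $\mu_2$.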
Any $P \in \mathcal{P}_{\mu_1}^{\mu_2}$ has marginals $P_1 = \pi_1 P$, $P_2 = \pi_2 P$ with densities $f_1 \ge h_1$ and $f_2 \le h_2$ with respect to $\mu$. We assume that $1 = \mu_1(\mathbb{R}^1) \le \mu_2(\mathbb{R}^1)$; i.e., $\mu_1$ is a probability measure, and so $f_1 = h_1$.

Theorem 7.4.2 Let $z_0 = \inf\big\{y;\ \int_{(y, \infty)} h_2\, d\mu \le 1\big\}$,

\[ f_2^*(y) = \begin{cases} h_2(y) & \text{if } y > z_0, \\[4pt] \dfrac{1 - \int_{(z_0, \infty)} h_2(u)\, d\mu(u)}{\mu(z_0)} & \text{if } y = z_0 \text{ and } \mu\{z_0\} > 0, \\[4pt] 0 & \text{otherwise,} \end{cases} \tag{7.4.6} \]

and let $P^*$ be the corresponding probability measure with $\mu$-density $f_2^*$. Then the following characterizations of the optimal coupling hold:

(a) $\sup\{\bar{F}_P(x, y);\ P \in \mathcal{P}_{\mu_1}^{\mu_2}\} = 1 - \max(F_{\mu_1}(x), F_{P^*}(y))$ for all $x, y$, where $\bar{F}_P(x, y) = P([x, \infty) \times [y, \infty))$ is the survival function.
(b) The sup in (a) is attained for the distribution $F^* = F_{X^*, Y^*}$, where $X^* = F_{\mu_1}^{-1}(U)$, $Y^* = F_{P^*}^{-1}(U)$.

(c) If $c$ is a cost function that is componentwise antitone and satisfies the Monge condition (cf. (7.1.3), (7.1.4)), then

\[ \inf\Big\{ \int c(x, y)\, dF_P(x, y);\ P \in \mathcal{P}_{\mu_1}^{\mu_2} \Big\} = \int c(x, y)\, dF^*(x, y). \tag{7.4.7} \]

Proof: (a), (b) For $P \in \mathcal{P}_{\mu_1}^{\mu_2}$ with marginals $F_{\mu_1}$, $G_2$, we know that

\[ \bar{F}_P(x, y) \le P\big(F_{\mu_1}^{-1}(U) \ge x,\ G_2^{-1}(U) \ge y\big) = P\big(U \ge \max(F_{\mu_1}(x), G_2(y))\big) = 1 - \max(F_{\mu_1}(x), G_2(y)). \]

By the definition of $P^*$, $F_{P^*}(y) \le G_2(y)$ for all $y$, and therefore, $\bar{F}_P(x, y) \le 1 - \max(F_{\mu_1}(x), F_{P^*}(y))$.

(c) Applying (a), (b), and Theorem 3.1.2 we obtain (7.4.7). The conditions on the cost function $c$ were studied by Rüschendorf (1980). In that terminology $(-c)$ is a $\Delta$-monotone function. Applying the results in Rüschendorf (1980), it is easy to check that (c) follows from (a), (b). □

The "antitone" assumption in (c) of Theorem 7.4.2 does not have a transparent interpretation in terms of cost functions. Moreover, under some additional assumptions on the bounding measures we can construct solutions for more "natural" cost functions. Again, let $\mu_i$ have densities $h_i$ with respect to $\mu$, $i = 1, 2$, and $1 = \mu_1(\mathbb{R}^1) \le \mu_2(\mathbb{R}^1)$.

Theorem 7.4.3 Assume that for some $y_0 \in \mathbb{R}^1$,

\[ h_1(u) \le h_2(u) \ \text{ for } u < y_0 \quad \text{and} \quad h_1(u) \ge h_2(u) \ \text{ for } u \ge y_0. \tag{7.4.8} \]

Define

\[ x_0 = \inf\Big\{ y;\ \int_{(y, \infty)} h_1(u)\, d\mu(u) \ge \int_{(y, \infty)} h_2(u)\, d\mu(u) \Big\}, \]

and let

\[ f_2(u) := \begin{cases} h_2(u) & \text{if } u > x_0, \\[4pt] \dfrac{\int_{[x_0, \infty)} h_1(u)\, d\mu(u) - \int_{(x_0, \infty)} h_2(u)\, d\mu(u)}{\mu(x_0)} & \text{if } u = x_0 \text{ and } \mu\{x_0\} > 0, \\[4pt] h_1(u) & \text{if } u < x_0. \end{cases} \tag{7.4.9} \]
Then for any cost function $c$ satisfying the Monge condition (7.1.3) and the unimodality condition (7.1.27) we have

\[ \inf\Big\{ \int c(x, y)\, dF_P(x, y);\ P \in \mathcal{P}_{\mu_1}^{\mu_2} \Big\} = \int_0^1 c\big(F_{\mu_1}^{-1}(u), F_2^{-1}(u)\big)\, du, \tag{7.4.10} \]

where $F_2$ is the d.f. of the measure with density $f_2$ with respect to $\mu$. The optimal distribution is determined by the r.v.s $X^* = F_{\mu_1}^{-1}(U)$, $Y^* = F_2^{-1}(U)$.

Proof: Invoking the Monge condition, for any $P \in \mathcal{P}_{\mu_1}^{\mu_2}$ with marginals $F_{\mu_1}$, $G_2$, we have

\[ \int c(x, y)\, dF_P(x, y) \ge \int_0^1 c\big(F_{\mu_1}^{-1}(u), G_2^{-1}(u)\big)\, du. \]

By the definition of $F_2$,

\[ G_2(y) \ge F_2(y) \ge F_{\mu_1}(y) \ \text{ for all } y \ge x_0, \quad \text{and} \quad F_2(y) = F_{\mu_1}(y) \ \text{ for all } y \le x_0; \tag{7.4.11} \]

in fact, (7.4.11) implies that $F_{\mu_1}^{-1}(u) \ge F_2^{-1}(u) \ge G_2^{-1}(u)$ for $u > F_2(x_0)$ and $F_2^{-1}(u) = F_{\mu_1}^{-1}(u)$ for $u \le F_2(x_0)$. Our assumptions on $c$ imply that $c\big(F_{\mu_1}^{-1}(u), G_2^{-1}(u)\big) \ge c\big(F_{\mu_1}^{-1}(u), F_2^{-1}(u)\big)$ for all $u$.

Remark 7.4.4 It is not difficult to extend the solution of Theorem 7.4.3 to the case $\mu_1(\mathbb{R}^1) < 1$ and to the case $f_1 \ge h_1$, $f_2 \le h_2$ for the densities (here, $f_1$ and $f_2$ are the marginal densities of an admissible plan $P$), if we still keep the assumption (7.4.8). To see this, choose $x_0$ as in (7.4.9), and define

\[ f_2(x) = \begin{cases} h_2(x) & \text{if } x > z_0, \\[4pt] \dfrac{1 - \int_{(z_0, \infty)} h_2(x)\, d\mu(x)}{\mu(z_0)} & \text{if } x = z_0 \text{ and } \mu(z_0) > 0, \\[4pt] 0 & \text{otherwise,} \end{cases} \]

where $z_0 = \inf\big\{x;\ \int_{(x, \infty)} h_2(x)\, d\mu(x) \le 1\big\}$. Define next

\[ y_0 = \inf\Big\{ y;\ \int_{(y, \infty)} h_2(x)\, d\mu(x) \le \int_{(y, \infty)} h_1(x)\, d\mu(x) \Big\} \tag{7.4.12} \]
and

\[ f_1(x) = \begin{cases} h_1(x) & \text{if } x > y_0, \\[4pt] f_2(x) & \text{if } x < y_0, \\[4pt] \dfrac{\int_{[y_0, \infty)} (h_2(x) - h_1(x))\, d\mu(x)}{\mu(y_0)} & \text{if } \mu(y_0) > 0. \end{cases} \tag{7.4.13} \]

Then for a cost function $c$ defined as in Theorem 7.4.3, we have

\[ \inf\Big\{ \int c(x, y)\, dF_P(x, y);\ \pi_1 P \ge \mu_1,\ \pi_2 P \le \mu_2 \Big\} = \int_0^1 c\big(F_1^{-1}(u), F_2^{-1}(u)\big)\, du, \tag{7.4.14} \]

where $F_i$ have densities $f_i$ with respect to $\mu$, $i = 1, 2$.

Let us return to the comment we made in Remark 7.3.26. Consider transportation problems with local upper bounds on the transportation plans: $x_{ij} \le \mu_{ij}$ in the discrete case, and $P \le \mu$ for some finite measure $\mu$ in the general case. The following framework allows us to handle quite general transportation problems. On a measurable space $(X, \mathcal{B})$, let $\mathcal{B}_i \subset \mathcal{B}$ be sub-$\sigma$-algebras, $1 \le i \le n$, and $P_i \in M^1(X, \mathcal{B}_i)$. Further, let $\mu$ be a finite measure on $(X, \mathcal{B})$, and define

\[ M_\mu := \{P \in M^1(X, \mathcal{B});\ P/\mathcal{B}_i = P_i,\ 1 \le i \le n,\ P \le \mu\}. \tag{7.4.15} \]

Assume that $M_\mu \ne \emptyset$ and define the set of generalized transportation plans with local upper bound $\mu$ as follows:

\[ U_\mu(\varphi) := \inf\Big\{ U(\varphi_0) + \int h\, d\mu;\ h \ge 0,\ \varphi_0 + h \ge \varphi \Big\}, \tag{7.4.16} \]

where

\[ U(\varphi_0) := \inf\Big\{ \sum_{i=1}^{n} \int f_i\, dP_i;\ f_i \in L^1(\mathcal{B}_i, P_i),\ \varphi_0 \le \sum_{i=1}^{n} f_i \Big\}. \]

We view $U$ as the dual operator for the "pure" transportation problem, and typically,

\[ \sup\Big\{ \int \varphi_0\, dP;\ P \in M(P_1, \dots, P_n) \Big\} = U(\varphi_0) \tag{7.4.17} \]
will hold (cf. Chapter 2). The duality principle allows us to infer the corresponding “minimization” problem. Similarly, $U_\mu$ is the dual operator for the locally majorized transportation problem. A linear operator $S$ is majorized by $U_\mu$, $S \le U_\mu$, if and only if
$$S \ge 0, \quad S/\mathcal{B}_i = P_i,\ 1 \le i \le n, \quad \text{and} \quad S \le \mu. \tag{7.4.18}$$
Therefore, the approach developed in Chapter 2 yields the duality theorem
$$U_\mu(\varphi) = \sup\Bigl\{\int \varphi\,dP;\ P \in M_\mu\Bigr\} =: M_\mu(\varphi) \tag{7.4.19}$$
for any upper semicontinuous or uniformly approximable integrable function $\varphi$ in the case of a compactly approximable measure space $(X, \mathcal{B}_i, P_i)$ with countable topological basis. In some sense (7.4.19) gives the duality result for the general case of order restrictions as considered, for example, in Sections 3.5 and 5.5. In particular, we obtain upper bounds $M_\mu(\varphi) \le \sum_{i=1}^n \int f_i\,dP_i$ for any admissible system of functions $(f_i)$ with $\varphi \le \sum_{i=1}^n f_i$.

We next consider the question of more explicit evaluations of the dual operator $U_\mu$ for the case $\varphi = 1_B$, $B \in \mathcal{B}$; or equivalently, we wish to establish sharp upper Fréchet bounds in the class $M_\mu$. Define $M_\mu(B) := M_\mu(1_B)$, and assume the duality (7.4.19) for $\varphi = 1_B$.

Theorem 7.4.5
$$M_\mu(B) = \sup_{P \in M(P_1, \ldots, P_n)} P \wedge \mu(B), \tag{7.4.20}$$
where $P \wedge \mu$ is the infimum in the lattice of measures.

Proof: From (7.4.19),
$$M_\mu(B) = \inf\{\mu(h) + U(\varphi);\ h \ge 0,\ \varphi + h \ge 1_B\} = \inf\{\mu(h) + U(1_B - h);\ 0 \le h \le 1_B\}. \tag{7.4.21}$$
To see the second equality in (7.4.21), take $\varphi = (1_B - h)_+$. Thus, $0 \le \varphi$, and it is possible to assume that $h \le 1_B$. Next, we make use of the “integration” approach in Strassen (1965),
$$\begin{aligned} M_\mu(B) &= \inf_{0 \le h \le 1_B}\ \sup_{P \in M(P_1, \ldots, P_n)} \{\mu(h) + P(1_B - h)\} \\ &= \inf_{0 \le h \le 1_B}\ \sup_{P} \Bigl\{\int_0^1 \mu(h > t)\,dt + \int_0^1 P(1_B - h \ge 1 - t)\,dt\Bigr\}. \end{aligned} \tag{7.4.22}$$
With $C_t := \{h > t\} \subset B$ we see that $\{x;\ h(x) \le 1_B(x) - 1 + t\} = \{x \in B;\ h(x) \le t\} = B \setminus C_t$. Therefore,
$$M_\mu(B) = \inf_{0 \le h \le 1_B}\ \sup_{P} \int_0^1 \bigl(\mu(C_t) + P(B \setminus C_t)\bigr)\,dt \ \ge\ \sup_{P}\ \inf_{C \subset B} \{\mu(C) + P(B \setminus C)\} = \sup_{P} \mu \wedge P(B).$$
On the other hand,
$$M_\mu(B) = \sup\{P(B);\ P \in M(P_1, \ldots, P_n),\ P \le \mu\} = \sup\{P \wedge \mu(B);\ P \in M(P_1, \ldots, P_n),\ P \le \mu\} \le \sup_{P \in M(P_1, \ldots, P_n)} P \wedge \mu(B). \qquad \Box$$

Theorem 7.4.5 allows us to reduce the problem of the majorized Fréchet bounds to a problem of “usual” Fréchet bounds, but for a more complicated functional. It remains an open problem to determine more explicit formulas for $M_\mu(B)$ in the general case.
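For measures on a finite set the lattice infimum takes the pointwise form $(P \wedge \mu)(\{x\}) = \min(P(\{x\}), \mu(\{x\}))$, so the identity $\inf_{C \subset B}\{\mu(C) + P(B \setminus C)\} = P \wedge \mu(B)$ used in the proof can be checked by brute force. The following is only a minimal sketch under that discrete assumption; the function names and the example masses are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import combinations


def meet_on_set(P, mu, B):
    """(P ∧ mu)(B) for discrete measures stored as dicts point -> mass."""
    return sum(min(P.get(x, 0), mu.get(x, 0)) for x in B)


def meet_by_infimum(P, mu, B):
    """inf over subsets C of B of mu(C) + P(B minus C), by enumeration."""
    points = list(B)
    values = []
    for size in range(len(points) + 1):
        for C in combinations(points, size):
            rest = [x for x in points if x not in C]
            values.append(sum(mu.get(x, 0) for x in C)
                          + sum(P.get(x, 0) for x in rest))
    return min(values)


# Illustrative data (assumed): a probability P and a finite measure mu.
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
mu = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 8)}
B = {0, 1, 2}
print(meet_on_set(P, mu, B), meet_by_infimum(P, mu, B))
```

The enumeration is exponential in the size of $B$ and is meant only as a check on small examples.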
7.5 Closeness of Measure with Joint Marginals on a Finite Number of Directions

In this section we follow the work of Kakosjan and Klebanov (1984), Khalfin and Klebanov (1990), and Klebanov and Rachev (1995a, 1995b, 1995c) on the application of marginal problems to computer and diffraction tomography. Here, estimates of the closeness between probability measures defined on $\mathbb{R}^n$ that have the same marginals on a finite number of arbitrary directions will be provided. The estimates show that the probability laws get closer in a certain metric when the number of coinciding marginals increases. The results offer a solution to the computer tomography paradox stated in Gutmann, Kemperman, Reeds, and Shepp (1991).

We start with some historical remarks and with the statement of the problem. Let $Q_1$ and $Q_2$ be a pair of probabilities on $\mathbb{R}$, i.e., probability measures defined on the Borel σ-field of $\mathbb{R}$. Lorentz (1949) studied conditions for the existence of a probability density function $g(\cdot)$ on $\mathbb{R}^2$ taking
only two values, 0 or 1, and having $Q_1$ and $Q_2$ as marginals. In his 1961 paper Kellerer generalized this result and gave necessary and sufficient conditions for the existence of a density $f(\cdot)$ on $\mathbb{R}^2$ that satisfies the inequalities $0 \le f(\cdot) \le 1$ and has $Q_1$ and $Q_2$ as marginals (see also Strassen (1965) and Jacobs (1987)). Fishburn et al. (1990) were able to show that Kellerer’s and Lorentz’s conditions are equivalent; i.e., for any density $0 \le f \le 1$ on $\mathbb{R}^2$ there exists a density taking the values 0 and 1 only that has the same marginals. In general, similar results hold for probability densities on $\mathbb{R}^m$, $m \ge 2$, when the $(m-1)$-dimensional marginals are prescribed. Gutmann et al. (1991) show that for any probability density $0 \le f \le 1$ on $\mathbb{R}^m$ and for any finite number of directions, there exists a probability density taking the values 0, 1 only that has the same marginals in the chosen directions. It follows that densities having the same marginals in a finite number of arbitrary directions may differ considerably in the uniform metric between densities, which is indeed a very strong metric; recall that convergence in the uniform metric implies convergence in total variation.

The goal in this section is to show that under moment-type conditions, measures having a “large” number of coinciding marginals are close to each other in the weak metrics.(1) The method is based on techniques used in the classical moment problem. On the other hand, most of our results will make use of relationships between different probability metrics, analyzed in the monograph by Kakosjan, Klebanov, and Rachev (1988), referred to below as KKR (1988).

(1) Here weak metric stands for a metric metrizing the weak convergence in the space of probability measures on a Euclidean space.

The key idea in showing that measures with a large number of common marginals are close to each other in the weak metrics is best understood by comparing three results. The first is the theorem of Gutmann et al. (1991) mentioned above. The second (see Karlin and Studden (1966, p. 265)) states that if a finite number of moments $\mu_1, \ldots, \mu_n$ of a function $f$, $0 \le f \le 1$, are given, then there exists a function $g$ that takes the values 0 or 1 only and possesses the moments $\mu_1, \ldots, \mu_n$. Finally, the third result (see KKR (1988, pp. 170–197)) gives estimates of the closeness in terms of a weak metric (the so-called λ-metric) on $\mathbb{R}$ for measures having a finite number of common moments. Of course, since the condition of common marginals seems to be more restrictive than the condition of equal moments, one should be able to construct a similar estimate expressed in terms of the common marginals only. Furthermore, the technique should be similar to that used here.

For simplicity, let us consider the 2-dimensional case. Let $\theta_1, \ldots, \theta_n$ be $n$ unit vectors in the plane and $P_1$, $P_2$ be two probabilities on $\mathbb{R}^2$ having the same marginals in the directions $\theta_1, \ldots, \theta_n$. To estimate the distance between $P_1$ and $P_2$, various weak metrics can be used; however, it seems that the λ-metric is the most convenient for this purpose. This metric is defined as follows (see, for example, Zolotarev (1986)): Let
$$\varphi_i(t) = \int_{\mathbb{R}^2} e^{i(t,x)}\,P_i(dx), \quad i = 1, 2,$$
be the characteristic function of $P_i$. Then define the λ-distance between $P_1$ and $P_2$ as
$$\lambda(P_1, P_2) = \min_{T > 0} \max\Bigl\{\max_{\|t\| \le T} |\varphi_1(t) - \varphi_2(t)|,\ \frac{1}{T}\Bigr\}; \tag{7.5.1}$$
here $(\cdot, \cdot)$ is the inner product and $\|\cdot\|$ is the Euclidean norm. Clearly, $\lambda$ metrizes the weak convergence. Our first result concerns the important case where one of the probability measures considered has compact support.

Lemma 7.5.1 Let $\theta_1, \ldots, \theta_n$ be $n \ge 2$ unit vectors in $\mathbb{R}^2$, no two of which are collinear. Let the support of the probability $P_1$ be a subset of the unit disk, and let the probability $P_2$ have the same marginals as $P_1$ in the directions $\theta_1, \ldots, \theta_n$. Set(2)
$$s = 2\Bigl[\frac{n-1}{2}\Bigr]. \tag{7.5.2}$$
Then
$$\lambda(P_1, P_2) \le \Bigl(\frac{2}{s!}\Bigr)^{\frac{1}{s+1}}. \tag{7.5.3}$$
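To get a feeling for how fast the bound (7.5.3) decreases with the number $n$ of directions, it can be evaluated numerically. The sketch below is illustrative only: the function names are assumptions, the factorial is handled through `math.lgamma` to avoid overflow, and the comparison value $e/s$ is the asymptotic order suggested by Stirling's formula.

```python
import math


def s_from_n(n: int) -> int:
    """s = 2 * [(n - 1) / 2], with [r] the integer part, as in (7.5.2)."""
    return 2 * ((n - 1) // 2)


def lambda_bound(s: int) -> float:
    """Right-hand side of (7.5.3), (2 / s!)**(1 / (s + 1)), via log-gamma."""
    return math.exp((math.log(2.0) - math.lgamma(s + 1)) / (s + 1))


if __name__ == "__main__":
    for n in (3, 5, 11, 51, 101):
        s = s_from_n(n)
        print(f"n={n:4d}  s={s:4d}  bound={lambda_bound(s):.5f}  e/s={math.e / s:.5f}")
```

For small $n$ the bound is not informative (for $n = 2$ it equals 2), so the estimate is of interest mainly when many directions are available.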
Remark 7.5.2 We can replace the right-hand side of (7.5.3) by $C/s$, where $C$ is a constant; note that as $s \to \infty$, $(2/s!)^{\frac{1}{s+1}} \sim e/s$. The difference $(2/s!)^{\frac{1}{s+1}} - e/s$ is plotted in Figures 7.1 and 7.2.

Proof of Lemma 7.5.1: The λ-metric is invariant under rotations of the coordinate system, so without loss of generality we assume that
(a) the directions $\theta_j$ ($j = 1, \ldots, n$) are not parallel to the axes;
(b) there exists at least one pair of directions, say $\theta_{j_1}$ and $\theta_{j_2}$, such that $\theta_{j_1} = (a, b)$, $\theta_{j_2} = (a, -b)$, where $a \ne 0$, $b \ne 0$; i.e., the vectors $\theta_{j_1}$ and $\theta_{j_2}$ are symmetric about the horizontal axis.

(2) Here and in what follows $[r]$ denotes the integer part of the number $r$.
FIGURE 7.1. Plot of the difference $(2/s!)^{1/(s+1)} - e/s$ for $s = 1, \ldots, 100$.

FIGURE 7.2. Plot of the difference $(2/s!)^{1/(s+1)} - e/s$ for $s = 10, \ldots, 100$.
The law $P_1$ has bounded support, and so, since the marginals of $P_1$ and $P_2$ in the directions $\theta_1, \ldots, \theta_n$ coincide, for all $j = 1, \ldots, n$ and $k = 0, 1, \ldots$,
$$\int_{\mathbb{R}^2} (x, \theta_j)^k\,P_1(dx) = \int_{\mathbb{R}^2} (x, \theta_j)^k\,P_2(dx). \tag{7.5.4}$$
To see that $P_2$ has moments of any order, consider (7.5.4) with $j = j_1$, $j = j_2$, and $x = (x_1, x_2)$. Then
$$\int_{\mathbb{R}^2} (x_1 a \pm x_2 b)^k\,(P_1 - P_2)(dx) = 0,$$
$$\int_{\mathbb{R}^2} \bigl[(x_1 a + x_2 b)^k + (x_1 a - x_2 b)^k\bigr]\,(P_1 - P_2)(dx) = 0, \tag{7.5.5}$$
and all integrals are finite. If $k$ is even, then $(a x_1 + b x_2)^k + (a x_1 - b x_2)^k \ge a^k x_1^k + b^k x_2^k$, and thus (7.5.5) implies the existence of all moments of $P_2$ of even order.
The next step is to show that all moments of $P_1$ and $P_2$ of order $\le n - 1$ agree. Set
$$\mu_{r,t}(P_\ell) = \int_{\mathbb{R}^2} x_1^r x_2^t\,P_\ell(dx), \quad \ell = 1, 2.$$
Then setting $\theta_j = (u_j, v_j)$ in (7.5.4) yields
$$\sum_{\ell=0}^{k} \binom{k}{\ell} u_j^{\ell} v_j^{k-\ell} \bigl[\mu_{\ell,k-\ell}(P_1) - \mu_{\ell,k-\ell}(P_2)\bigr] = 0,$$
$j = 1, \ldots, n$; $k \ge 0$. Now, setting $z_j = v_j/u_j$ in the last equation leads to
$$\sum_{\ell=0}^{k} \binom{k}{\ell} z_j^{k-\ell} \bigl[\mu_{\ell,k-\ell}(P_1) - \mu_{\ell,k-\ell}(P_2)\bigr] = 0, \tag{7.5.6}$$
$j = 1, \ldots, n$. Since no two of the directions $\theta_1, \ldots, \theta_n$ are collinear, the points $z_1, \ldots, z_n$ are distinct. Hence from (7.5.6) we find that the following polynomial of degree $k$ in the variable $z$,
$$\sum_{\ell=0}^{k} \binom{k}{\ell} z^{k-\ell} \bigl[\mu_{\ell,k-\ell}(P_1) - \mu_{\ell,k-\ell}(P_2)\bigr], \tag{7.5.7}$$
has $n$ distinct roots $z_1, \ldots, z_n$. If $n \ge k + 1$, then this is possible only if all coefficients of (7.5.7) are equal to zero, that is, $\mu_{\ell,k-\ell}(P_1) = \mu_{\ell,k-\ell}(P_2)$, $\ell = 0, \ldots, k$; $k = 0, \ldots, n - 1$. So, for any unit vector $t$, and $k = 0, 1, \ldots, n - 1$,
$$\int_{\mathbb{R}^2} (t, x)^k\,P_1(dx) = \int_{\mathbb{R}^2} (t, x)^k\,P_2(dx). \tag{7.5.8}$$
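The Vandermonde-type argument behind (7.5.6)–(7.5.8) can be made concrete: the $k$th moments of the marginals in $k + 1$ pairwise non-collinear directions determine all mixed moments $\mu_{\ell,k-\ell}$ of order $k$. The sketch below recovers them exactly with rational arithmetic. It is an illustration only: the helper names and the test measure are assumptions, and the directions are taken in the form $(1, z_j)$ rather than as unit vectors, which merely rescales each marginal moment by a known factor.

```python
from fractions import Fraction
from math import comb


def directional_moment(points, weights, direction, k):
    """k-th moment of the projection (x, theta) for a discrete measure."""
    u, v = direction
    return sum(w * (u * x1 + v * x2) ** k
               for (x1, x2), w in zip(points, weights))


def solve_exact(A, b):
    """Solve the square system A y = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [entry / pivot_value for entry in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]


def mixed_moments(directions, marginal_moments, k):
    """Recover mu_{l, k-l}, l = 0..k, from k-th moments in k + 1 directions."""
    A = [[comb(k, l) * u ** l * v ** (k - l) for l in range(k + 1)]
         for (u, v) in directions]
    return solve_exact(A, marginal_moments)


# Illustrative measure on the corners of the unit square (assumed data).
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
weights = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]
directions = [(1, 1), (1, 2), (1, -1)]
k = 2
observed = [directional_moment(points, weights, d, k) for d in directions]
recovered = mixed_moments(directions, observed, k)
print(recovered)
```

With fewer than $k + 1$ distinct directions the linear system is underdetermined, which is the algebraic reason for the restriction $k \le n - 1$ in the proof.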
Denote by $P_\ell^{(t)}$ the marginal of $P_\ell$ ($\ell = 1, 2$) in the direction $t$, and by $\varphi_\ell(\tau; t)$ ($\tau \in \mathbb{R}$) its characteristic function. By assumption, the support of $P_1^{(t)}$ is in the segment $[-1, 1]$. Then (7.5.8) is equivalent to
$$\varphi_1^{(k)}(\tau; t)\big|_{\tau=0} = \varphi_2^{(k)}(\tau; t)\big|_{\tau=0}, \quad k = 0, \ldots, n - 1, \tag{7.5.9}$$
where $\varphi_\ell^{(k)}(\tau; t)$ is the $k$th derivative of $\varphi_\ell(\tau; t)$ with respect to $\tau$ ($\ell = 1, 2$). A Taylor expansion now gives
$$\varphi_1(\tau; t) - \varphi_2(\tau; t) = \sum_{k=0}^{s-1} \frac{\varphi_1^{(k)}(0; t) - \varphi_2^{(k)}(0; t)}{k!}\,\tau^k + \frac{\varphi_1^{(s)}(\tilde\tau; t) - \varphi_2^{(s)}(\tilde\tau; t)}{s!}\,\tau^s \tag{7.5.10}$$
for some $\tilde\tau \in (0, \tau)$. From (7.5.9), the first sum on the right-hand side of (7.5.10) is equal to zero. Since $s$ is an even number,
$$|\varphi_\ell^{(s)}(\tilde\tau; t)| \le \int_{\mathbb{R}} z^s\,P_\ell^{(t)}(dz) = \int_{-1}^{1} z^s\,P_1^{(t)}(dz) \le 1, \quad \ell = 1, 2.$$
Thus for all $\tau \in \mathbb{R}$,
$$|\varphi_1(\tau; t) - \varphi_2(\tau; t)| \le 2\,\frac{\tau^s}{s!}.$$
Choose $T = (s!/2)^{\frac{1}{s+1}}$;(3) then
$$\sup_{|\tau| \le T} |\varphi_1(\tau; t) - \varphi_2(\tau; t)| \le \Bigl(\frac{2}{s!}\Bigr)^{\frac{1}{s+1}}. \qquad \Box$$

(3) This choice of $T$ is optimal, since $2T^s/s! = 1/T$; see the definition (7.5.1) of the λ-metric.

Corollary 7.5.3 Let $\theta_1, \ldots, \theta_n$, $n \ge 2$, be directions in $\mathbb{R}^2$ no two of which are collinear. Suppose that the marginals of the probabilities $P_1$ and $P_2$ with respect to the directions $\theta_1, \ldots, \theta_n$ have the same moments up to the even order $k \le n - 1$. Then the marginals of $P_1$ and $P_2$ with respect to any direction $t$ have the same moments up to order $k$.

Corollary 7.5.4 Lemma 7.5.1 still holds if we replace the assumption that $P_1$ and $P_2$ have coinciding marginals with respect to the directions $\theta_j$ ($j = 1, \ldots, n$) with the assumption that these marginals have the same moments up to order $n - 1$.

To prove our main result we must relax the condition that the support of $P_1$ is compact, assuming only the existence of all moments together with Carleman’s conditions for the definiteness of the moment problem. Set
$$\mu_k = \sup_{\theta \in S^1} \int_{\mathbb{R}^2} (x, \theta)^k\,P_1(dx), \quad k = 0, 1, \ldots,$$
where $S^1$ is the unit circle, and let
$$\beta_s = \sum_{j=1}^{(s-1)/2} \mu_{2j}^{-\frac{1}{2j}},$$
where the number $s$ is determined in Lemma 7.5.1; see (7.5.2).
Theorem 7.5.5 Let $\theta_1, \ldots, \theta_n$ be $n \ge 2$ directions in $\mathbb{R}^2$, no two of which are collinear. Suppose that the measure $P_1$ has moments of any order. Suppose also that the marginals of the measures $P_1$ and $P_2$ in the directions $\theta_1, \ldots, \theta_n$ have the same moments up to order $n - 1$. Then there exists an absolute constant $C$ such that(4)
$$\lambda(P_1, P_2) \le C \beta_s^{-\frac{1}{4}} \bigl(\mu_0 + \sqrt{\mu_2}\bigr)^{1/4}.$$

(4) That is, $C$ is independent of $s$, $P_1$, and $P_2$.

Proof: Let $t$ be an arbitrary vector of the unit circle. From Corollary 7.5.3 we have that the marginals $P_1^{(t)}$ and $P_2^{(t)}$ have the same moments up to order $s$. From KKR (1988, p. 180) and Klebanov and Mkrtchian (1980), it follows that
$$\lambda\bigl(P_1^{(t)}, P_2^{(t)}\bigr) \le C \Bigl(\sum_{j=1}^{(s-1)/2} \mu_{2j}(t)^{-1/(2j)}\Bigr)^{-1/4} \Bigl(\mu_0(t) + \sqrt{\mu_2(t)}\Bigr)^{1/4},$$
where $\mu_k(t) = \int_{-\infty}^{\infty} u^k\,P_i^{(t)}(du)$, $k = 0, \ldots, s$, $i = 1, 2$. The theorem now follows from the obvious inequality $\mu_{2j}(t) \le \mu_{2j}$ ($j = 0, 1, \ldots, s/2$). $\Box$

Let us now consider the situation where the marginals of $P_1$ and $P_2$ in the directions $\theta_1, \ldots, \theta_n$ are not the same but are close in the metric $\lambda$.

Theorem 7.5.6 Let $\theta_1, \ldots, \theta_n$, $n \ge 2$, be directions in $\mathbb{R}^2$, no two of which are collinear. Suppose that the supports of the measures $P_1$ and $P_2$ are in the unit disk, and that $P_1$ and $P_2$ have ε-coinciding marginals with respect to the directions $\theta_j$ ($j = 1, \ldots, n$); i.e.,
$$\lambda\bigl(P_1^{(\theta_j)}, P_2^{(\theta_j)}\bigr) := \min_{T > 0} \max\Bigl\{\max_{|\tau| \le T} |\varphi_1(\tau; \theta_j) - \varphi_2(\tau; \theta_j)|,\ 1/T\Bigr\} \le \varepsilon. \tag{7.5.11}$$
Then there exists a constant $C$ depending on the directions $\theta_j$ ($j = 1, \ldots, n$) such that for sufficiently small $\varepsilon > 0$, we have
$$\lambda(P_1, P_2) \le C\Bigl[\frac{1}{\ln(1/\varepsilon)} + \frac{1}{s}\Bigr], \tag{7.5.12}$$
where $s = 2\bigl[\frac{n-1}{2}\bigr]$.
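Proof: Set $\psi_j(\tau) := \varphi_1(\tau; \theta_j) - \varphi_2(\tau; \theta_j)$, $j = 1, \ldots, n$. For $0 < \varepsilon \le 1$ we have $\sup_{|\tau| \le 1} |\psi_j(\tau)| \le \varepsilon$, cf. (7.5.11). Since the supports of the measures $P_1^{(\theta_j)}$ and $P_2^{(\theta_j)}$ are subsets of $[-1, 1]$, for any even number $k \ge 2$ we have
$$\sup_{|\tau| \le 1} |\psi_j^{(k)}(\tau)| \le \frac{|\varphi_1^{(k)}(0; \theta_j)| + |\varphi_2^{(k)}(0; \theta_j)|}{k!} \le \frac{2}{k!}. \tag{7.5.13}$$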
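Now we apply Corollary 1.5.1 in KKR (1988), which states that there exist constants $C_k$ such that
$$\sup_{|\tau| \le 1} |\psi_j^{(\ell)}(\tau)| \le C_k \Bigl(\sup_{|\tau| \le 1} |\psi_j(\tau)|\Bigr)^{\frac{k-\ell}{k}} \Bigl(\sup_{|\tau| \le 1} |\psi_j^{(k)}(\tau)|\Bigr)^{\frac{\ell}{k}}, \quad \ell = 0, 1, \ldots, k. \tag{7.5.14}$$
Choosing $k \ge 2s$, $\ell \le s$, and applying (7.5.13), we obtain
$$\sup_{|\tau| \le 1} |\psi_j^{(\ell)}(\tau)| \le C_s \varepsilon^{1/2}, \quad \ell = 0, 1, \ldots, s;\ j = 1, \ldots, n,$$
where $C_s$ is a new constant depending on $s$ only. In particular,
$$|\varphi_1^{(\ell)}(0; \theta_j) - \varphi_2^{(\ell)}(0; \theta_j)| \le C_s \varepsilon^{1/2}, \quad \ell = 0, 1, \ldots, s;\ j = 1, \ldots, n,$$
or equivalently,
$$\Bigl|\int_{\mathbb{R}^2} (x, \theta_j)^k\,(P_1 - P_2)(dx)\Bigr| \le C_s \varepsilon^{1/2}, \tag{7.5.15}$$
$k = 0, 1, \ldots, s$; $j = 1, \ldots, n$. Following the notation in Lemma 7.5.1, we can rewrite (7.5.15) in the form: for $k = 0, \ldots, s$ and $j = 1, \ldots, n$,
$$\Bigl|\sum_{\ell=0}^{k} \binom{k}{\ell} u_j^{\ell} v_j^{k-\ell} \bigl[\mu_{\ell,k-\ell}(P_1) - \mu_{\ell,k-\ell}(P_2)\bigr]\Bigr| \le C_s \varepsilon^{1/2}.$$
Thus, setting
$$R_{kj} = \sum_{\ell=0}^{k} \binom{k}{\ell} z_j^{k-\ell} \bigl[\mu_{\ell,k-\ell}(P_1) - \mu_{\ell,k-\ell}(P_2)\bigr], \tag{7.5.16}$$
$k = 2, \ldots, s$; $j = 1, \ldots, n$; $z_j = v_j/u_j$, we obtain
$$|R_{kj}| \le \tilde{C} \varepsilon^{1/2}, \tag{7.5.17}$$
where $\tilde{C}$ depends on the directions $\theta_1, \ldots, \theta_n$ only. For any fixed $k$ ($k = 2, \ldots, s$) consider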
(i) the matrix $A_k$ with elements $a_{j\ell}^{(k)} = \binom{k}{\ell-1} z_j^{k-(\ell-1)}$, $\ell, j = 1, \ldots, k + 1$;

(ii) the vector $B_k$ with elements $b_\ell^{(k)} = \mu_{\ell-1,k-\ell+1}(P_1) - \mu_{\ell-1,k-\ell+1}(P_2)$, $\ell = 1, \ldots, k + 1$;

(iii) the vector $D_k$ with elements $d_j = R_{kj}$, $j = 1, \ldots, k + 1$.

Then (7.5.16) has the form $A_k B_k = D_k$ ($k = 1, \ldots, s - 1$), while (7.5.17) yields $\|D_k\| \le \tilde{C} \varepsilon^{1/2}$. The matrices $A_k$ are invertible, and so
$$\|B_k\| \le \|A_k^{-1}\|\,\|D_k\| \le C \varepsilon^{1/2}, \tag{7.5.18}$$
where the constant $C$ depends on the directions $\theta_1, \ldots, \theta_n$ only. Inequality (7.5.18) shows that the first $s - 1$ moments of the two-dimensional distributions are close when $\varepsilon > 0$ is sufficiently small. Such an evaluation of closeness holds for the first $s - 1$ moments of the marginals corresponding to an arbitrary direction $t$; i.e.,
$$\Bigl|\int_{\mathbb{R}^2} (x, t)^k\,(P_1 - P_2)(dx)\Bigr| \le C \varepsilon^{1/2},$$
$k = 0, \ldots, s - 1$. Now we have
$$|\varphi_1(\tau; t) - \varphi_2(\tau; t)| \le \sum_{j=0}^{s-1} \frac{|\varphi_1^{(j)}(0; t) - \varphi_2^{(j)}(0; t)|}{j!}\,|\tau|^j + \frac{2}{s!}\,|\tau|^s \le \sum_{j=0}^{s-1} \frac{C \varepsilon^{1/2}}{j!}\,|\tau|^j + \frac{2|\tau|^s}{s!} \le C \varepsilon^{1/2} e^{|\tau|} + \frac{2|\tau|^s}{s!}.$$
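Choose $T = \min\Bigl\{\ln\Bigl(1 + \dfrac{1}{(C \varepsilon^{1/2})^{1/2}}\Bigr),\ \Bigl(\dfrac{s!}{2}\Bigr)^{\frac{1}{s-1}}\Bigr\}$. Since $t$ is arbitrary on the unit circle, we obtain
$$\lambda(P_1, P_2) \le \max\Bigl\{C^{1/2} \varepsilon^{1/4} + C \varepsilon^{1/2} + \Bigl(\frac{2}{s!}\Bigr)^{\frac{1}{s-1}},\ 1/T\Bigr\} \le C\,[1/\ln(1/\varepsilon) + 1/s],$$
which proves the theorem. $\Box$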
Remark 7.5.7 The statement in Theorem 7.5.6 still holds if instead of the ε-coincidence of the marginals as in (7.5.11), we require the ε-coincidence of the moments up to order s of these marginals.
Theorems 7.5.5 and 7.5.6 can be generalized to probability measures defined on $\mathbb{R}^m$. However, we cannot choose the directions $\theta_1, \ldots, \theta_n$ in an arbitrary way. Furthermore, to obtain the same order of precision in $\mathbb{R}^m$, $m > 2$, corresponding to the $n$ directions in $\mathbb{R}^2$, we need $n^{m-1}$ directions. The results can be obtained by induction on the dimension $m$. We define next the set of directions we are going to use. Choose $n \ge 2$ distinct real numbers $u_1, \ldots, u_n$, all different from zero, and first construct the set of $n$ two-dimensional vectors $(1, u_1), (1, u_2), \ldots, (1, u_n)$. Then construct $n^2$ three-dimensional vectors $(1, u_{j_1}, u_{j_2})$, $j_1, j_2 = 1, \ldots, n$. Repeating this process, by the last step we have constructed a set of $m$-dimensional vectors
$$(1, u_{j_1}, u_{j_2}, \ldots, u_{j_{m-1}}), \quad j_\ell = 1, \ldots, n;\ \ell = 1, \ldots, m - 1. \tag{7.5.19}$$
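The construction (7.5.19) is a Cartesian product and is easy to enumerate. The following sketch is illustrative; the function name `build_directions` is an assumption, and the vectors are returned exactly as in (7.5.19), that is, without normalization to unit length.

```python
from itertools import product


def build_directions(u, m):
    """All m-dimensional vectors (1, u_{j_1}, ..., u_{j_{m-1}}) of (7.5.19)."""
    if len(set(u)) != len(u) or any(x == 0 for x in u):
        raise ValueError("u must consist of distinct nonzero numbers")
    if m < 2:
        raise ValueError("the dimension m must be at least 2")
    return [(1,) + tail for tail in product(u, repeat=m - 1)]


if __name__ == "__main__":
    directions = build_directions([1, 2, 3], m=3)
    print(len(directions))   # number of directions, n ** (m - 1)
    print(directions[:4])
```

The count grows as $n^{m-1}$, which makes the cost of the same order of precision in higher dimensions explicit.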
Denote these $m$-dimensional vectors by $\theta_1, \ldots, \theta_N$, where $N = n^{m-1}$. These inductive arguments lead to the following extensions of Theorems 7.5.5 and 7.5.6.

Theorem 7.5.8 The results in Theorems 7.5.5 and 7.5.6 still hold if we consider the measures $P_1$ and $P_2$ in $\mathbb{R}^m$, and we choose as directions the $N = n^{m-1}$ vectors in (7.5.19). Further, $s = 2[(n - 1)/2]$.

To prove this, it is sufficient to note that instead of the $m$-dimensional vectors, we can first consider a pair of one-dimensional probabilities; the first component is the distribution of the inner product of the projections of the vector $x$ and the vector $\theta_j$ upon the $(m - 1)$-dimensional subspace, while the second is the law of the last coordinate of the vector $x$. This allows us to decrease the dimensionality by one. To complete the proof it is sufficient to apply inductive arguments.

The bounds on the deviation between probability measures with coinciding marginals offer a solution to the computer tomography paradox as stated in Gutmann et al. (1991): “It implies that for any human object and corresponding projection data there exist many different reconstructions, in particular, a reconstruction consisting only of bone and air (density 1 or 0), but still having the same projection data as the original object. Related nonuniqueness results are familiar in tomography and are usually ignored because CT machines seem to produce useful images. It is likely that the ‘explanation’ of this apparent paradox is that point reconstruction in tomography is impossible.” Lemma 7.5.1 shows that although the densities of the probability measures $P_1$ and $P_2$ (given that such densities exist) can be quite distant from each other for any “large” number of coinciding marginals, the measures $P_1$ and $P_2$ themselves are close in the weak metric $\lambda$. Khalfin and Klebanov (1990) have analyzed this paradox and obtained some bounds for the closeness of probability measures with coinciding
marginals for specially chosen directions for the case of uniform distance between the smoothed densities of these measures. In tomography the observations are, in fact, integrals of body densities along some straight lines. Using quadrature formulas enables us to evaluate the moments of a set of marginals; these in turn make it possible to apply the results in this section (see Remark 7.5.7) to evaluate the precision of the reconstruction for densities. The classical theory of moments makes it possible to give numerical methods for reconstructing the probability measures using the moments (see, for example, Ahiezer (1961)).
7.6 Moment Problems with Applications to Characterization of Stochastic Processes, Queueing Theory, and Rounding Problems

The theory of moments has a long history, which originated in the pioneering works of Shohat and Tamarkin (1943), Hoeffding (1955), Rogosinsky (1958), Ahiezer and Krein (1962), Karlin and Studden (1966), and Kemperman (1968). It was also in the 1950s and ’60s that moment theory became a separate mathematical discipline. Currently, it is appropriate to talk about the moment problems as being a whole range of problems with applications to many mathematical theories. We refer to the monograph of Anastassiou (1993) for a recent survey on the developments in the theory of moments. In this section we present some applications of moment theory to probabilistic-statistical models. The results presented here are due to Anastassiou and Rachev (1992). For the proofs of the theorems that are only stated but not proved in this section, we refer to Anastassiou (1993). First we shall state results on the following five moment problems:

Moment problem 1: Find
$$\sup_{\mu} \int_S |x - y|^p\,\mu(dx, dy), \quad S \subset \mathbb{R}^2,\ p \ge 1, \tag{7.6.1}$$
and
$$\inf_{\mu} \int_S |x - y|^p\,\mu(dx, dy), \quad S \subset \mathbb{R}^2, \tag{7.6.2}$$
where the supremum (resp. infimum) is taken over the set of all probability measures $\mu$ with support in $S$ having fixed marginal moments
$$\int_S x^i\,\mu(dx, dy) = \alpha_i, \quad \int_S y^i\,\mu(dx, dy) = \beta_i, \quad i = 1, 2, \ldots, n. \tag{7.6.3}$$
Remark 7.6.1 Problem (7.6.2) with fixed marginal distributions
$$\mu(\cdot \times S) = \mu_1(\cdot), \quad \mu(S \times \cdot) = \mu_2(\cdot) \tag{7.6.4}$$
is indeed the $L_p$-Kantorovich problem on mass transportation (see Chapters 2 and 3).

Moment problem 2: For given $x_0 \in \mathbb{R}$ and positive $\alpha$ find the Kantorovich radius
$$\sup E|X - x_0|^{\alpha}, \tag{7.6.5}$$
where the supremum is over all random variables $X$ with fixed moments $EX = p$ and $EX^2 = q$.

Remark 7.6.2 Problem 2 will be used in approximation of complex queueing models by means of deterministic models.

Moment problem 3: Find
$$\sup \int_A [t]_c\,\mu(dt), \quad A = [0, a] \text{ or } [0, \infty), \tag{7.6.6}$$
and
$$\inf \int_A [t]_c\,\mu(dt), \quad A = [0, a] \text{ or } [0, \infty), \tag{7.6.7}$$
over the set of all probability measures with support $A$ having fixed $r$th moment
$$\int_A t^r\,\mu(dt) = d_r, \quad r > 0,\ d_r > 0, \tag{7.6.8}$$
where for a given nonnegative $x$ the $c$-rounding ($0 \le c \le 1$) of $x$ is defined by
$$[x]_c = \begin{cases} m & \text{if } m \le x \le m + c, \\ m + 1 & \text{if } m + c < x \le m + 1. \end{cases}$$

Remark 7.6.3 Moment problem 3 can be applied to the problems of rounding and apportionment; see Mosteller, Youtz, and Zahn (1967), Diaconis and Freedman (1979), Balinski and Young (1982), and Balinski and Rachev (1993). In the apportionment theory, $c = 0$ corresponds to the Adams method; $c = 1/2$ corresponds to the Webster method (or conventional rounding, or Mosteller–Youtz–Zahn “broken stick” rule of rounding); $c = 1$ corresponds to the Jefferson method; see Balinski and Young (1982).
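The $c$-rounding rule is simple to state in code. The sketch below follows the definition with $m \le x \le m + c$ rounded down to $m$ and takes $m$ to be the integer part of $x$, which is an assumption needed to settle the overlap of the two cases at integer points; the function name `c_round` is illustrative.

```python
import math


def c_round(x: float, c: float) -> int:
    """c-rounding [x]_c of x >= 0: m if m <= x <= m + c, otherwise m + 1.

    Here m = floor(x); c = 0, 1/2, 1 give the Adams, Webster, and
    Jefferson rules named in Remark 7.6.3.
    """
    if x < 0 or not 0 <= c <= 1:
        raise ValueError("need x >= 0 and 0 <= c <= 1")
    m = math.floor(x)
    return m if x - m <= c else m + 1


if __name__ == "__main__":
    for c, name in ((0.0, "Adams"), (0.5, "Webster"), (1.0, "Jefferson")):
        print(name, [c_round(x, c) for x in (0.25, 2.0, 2.25, 2.5, 2.75)])
```

With floating-point inputs the comparison at the threshold $m + c$ is subject to rounding error; exact rational inputs avoid this if the boundary case matters.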
Moment problem 4: Find (7.6.6) and (7.6.7) subject to (7.6.8) and
$$\int_A t\,\mu(dt) = d_1. \tag{7.6.9}$$
We next consider some infinite-dimensional analogues of the moment problems 1 and 2. Let $C[0, 1]$ be the space of continuous functions on $[0, 1]$ with the usual sup-norm $\|x\|$, and let $\mathcal{X}(C[0, 1])$ be the space of r.v.s on a nonatomic probability space $(\Omega, \mathcal{A}, P)$ with values in $C[0, 1]$. Let $\mathcal{M}$ be the class of all strictly increasing continuous functions $f : [0, \infty] \to [0, \infty]$, $f(0) = 0$, $f(\infty) = \infty$. Finally, let $T$ be a set of finitely many points in $[0, 1]$,
$$0 \le t_1 < t_2 < \cdots < t_N \le 1. \tag{7.6.10}$$

Moment problem 5: Given $h, g_i \in \mathcal{M}$ ($i = 1, \ldots, N$) find
$$\inf E h(\|X - Y\|), \tag{7.6.11}$$
where the infimum is over the set of all possible joint distributions of $X$ and $Y$ subject to the moment constraints
$$E g_i(|X(t_i)|) = a_i, \quad E g_i(|Y(t_i)|) = b_i. \tag{7.6.12}$$

Remark 7.6.4 This problem can be interpreted as follows. Having observations of two random processes (more precisely, we suppose the moments (7.6.12) are known), the goal is to evaluate the minimal possible distance $E h(\|X - Y\|)$ between the processes $X$ and $Y$. We shall determine the minimum in (7.6.11) and show that essentially this minimum can be achieved.
7.6.1 Moment Problems and Kantorovich Radius

In this section we state the solutions of moment problems 1 and 2; the proofs are given in Anastassiou and Rachev (1992) and the monograph Anastassiou (1993). Let $S = [a, b] \times [c, d] \subset \mathbb{R}^2$ and $\varphi(x_1, x_2) = |x_1 - x_2|^p$, $p \ge 1$. Suppose $(\alpha, \beta) \in S$, and denote by $U = U(\varphi, \alpha, \beta)$ the supremum in (7.6.1) subject to
$$\int_S x_1\,\mu(dx_1, dx_2) = \alpha, \quad \int_S x_2\,\mu(dx_1, dx_2) = \beta. \tag{7.6.13}$$
Theorem 7.6.5 The supremum $U$ in (7.6.1) is given by
$$U = D\delta + T, \tag{7.6.14}$$
where
$$\begin{aligned} D &:= |b - d|^p + |a - c|^p - |b - c|^p - |a - d|^p, \\ T &:= (1 - B)|b - c|^p + (B + C - 1)|a - c|^p + (1 - C)|a - d|^p, \\ B &:= \frac{b - \alpha}{b - a}, \quad C := \frac{d - \beta}{d - c}, \\ \delta &:= \max(0, 1 - B - C). \end{aligned}$$
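As a sanity check of (7.6.14), the formula can be compared with a direct search over probability measures carried by the four corners of the rectangle $S$ and satisfying (7.6.13). This is only a sketch under two labeled assumptions: the search is restricted to corner measures (a one-parameter family once both means are fixed), and $D$ is taken as $|b-d|^p + |a-c|^p - |b-c|^p - |a-d|^p$. The function names are illustrative.

```python
def kantorovich_sup(a, b, c, d, alpha, beta, p):
    """U = D * delta + T from (7.6.14) on S = [a, b] x [c, d]."""
    B = (b - alpha) / (b - a)
    C = (d - beta) / (d - c)
    delta = max(0.0, 1.0 - B - C)
    D = abs(b - d) ** p + abs(a - c) ** p - abs(b - c) ** p - abs(a - d) ** p
    T = ((1 - B) * abs(b - c) ** p + (B + C - 1) * abs(a - c) ** p
         + (1 - C) * abs(a - d) ** p)
    return D * delta + T


def corner_sup(a, b, c, d, alpha, beta, p, steps=1000):
    """Grid search over measures on the four corners with means (alpha, beta).

    B is the mass of {x = a} and C the mass of {y = c}; the free parameter
    t is the mass placed on the corner (a, c).
    """
    B = (b - alpha) / (b - a)
    C = (d - beta) / (d - c)
    low, high = max(0.0, B + C - 1.0), min(B, C)
    best = float("-inf")
    for i in range(steps + 1):
        t = low + (high - low) * i / steps
        value = (t * abs(a - c) ** p + (B - t) * abs(a - d) ** p
                 + (C - t) * abs(b - c) ** p
                 + (1.0 - B - C + t) * abs(b - d) ** p)
        best = max(best, value)
    return best


if __name__ == "__main__":
    for alpha, beta in ((0.5, 0.5), (0.25, 0.25), (0.75, 0.75)):
        args = (0.0, 1.0, 0.0, 1.0, alpha, beta, 2)
        print(alpha, beta, kantorovich_sup(*args), corner_sup(*args))
```

On the unit square with $p = 2$ the two values agree for the three test means, which is consistent with the sign convention for $D$ stated above.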
Remark 7.6.6 Since $\varphi$ is convex on any $S \subset \mathbb{R}^2$, then given (7.6.13),
$$\inf_{\mu} \int_S \varphi\,d\mu = |\alpha - \beta|^p.$$

Next, consider nonbounded regions $S$. Namely, define for $b \ge 0$ the following strips in $\mathbb{R}^2$:
$$\begin{aligned} S_1^b &:= \{(x, y);\ y = x + b', \text{ where } 0 \le b' \le b\}, \\ S_2^b &:= \{(x, y);\ y = x - b', \text{ where } 0 \le b' \le b\}, \\ S^b &:= S_1^b \cup S_2^b. \end{aligned}$$
We extend Theorem 7.6.5 to this type of unbounded region.

Theorem 7.6.7 Assume that $0 < p \le 1$.
(i) If $S = S_1^b$ or $S_2^b$, $(\alpha, \beta) \in S$, then the supremum $U$ in (7.6.1) is equal to $U := U(\varphi, \alpha, \beta) = |\alpha - \beta|^p$.
(ii) If $S = S^b$, then $U = b^p$.
(iii) Let $L$ be the lower bound (7.6.2) subject to (7.6.13). Then if $S = S_1^b$ or $S = S_2^b$ or $S = S^b$, $(\alpha, \beta) \in S$, we have $L := L(\varphi; \alpha, \beta) = b^{p-1}|\alpha - \beta|$.

Next, consider another type of strip in $\mathbb{R}^2$: For $b, \gamma > 0$,
$$\begin{aligned} S_1^b &:= \{(x, y);\ y = x + b', \text{ where } 0 \le b' \le b\}, \\ S_2^{\gamma} &:= \{(x, y);\ y = x - \gamma', \text{ where } 0 \le \gamma' \le \gamma\}, \\ S^{b,\gamma} &:= S_1^b \cup S_2^{\gamma}. \end{aligned}$$

Theorem 7.6.8 Let $p \ge 1$.
(i) If $S = S_1^b$, $(\alpha, \beta) \in S$, then $U := U(\varphi, \alpha, \beta) = b^{p-1}(\beta - \alpha)$.
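(ii) If $S = S_2^{\gamma}$, $(\alpha, \beta) \in S$, then $U = U(\varphi, \alpha, \beta) = \gamma^{p-1}(\alpha - \beta)$.
(iii) If $S = S^{b,\gamma}$, $(\alpha, \beta) \in S$, then
$$U = \frac{(b^p - \gamma^p)(\beta - \alpha - b) + b^p(b + \gamma)}{b + \gamma}.$$

Next, we shall state the explicit solutions of Moment problem 2. So we will be interested in the following problem. For given $x_0 \in \mathbb{R}$, $\alpha > 0$, $p \in \mathbb{R}$, $q > 0$ ($p^2 \le q$), $-\infty \le a < b \le +\infty$, find the Kantorovich radius
$$K := K(x_0; \alpha, p, q, a, b) := \sup\{E|X - x_0|^{\alpha};\ X \in [a, b] \text{ a.s.},\ EX = p,\ EX^2 = q\}. \tag{7.6.15}$$

Theorem 7.6.9 (Case (A): $\alpha \ge 2$, $-\infty < a < b < +\infty$) Let $x_0 = (a + b)/2$, $a \le p \le b$, and $0 \le q \le b^2 + (a + b)(p - b)$. Then the Kantorovich radius $K$ admits the following bound:
$$K \le \Bigl(\frac{b - a}{2}\Bigr)^{\alpha-2} \Bigl[q - p(a + b) + \frac{(a + b)^2}{4}\Bigr].$$
Moreover, if there exist $\lambda_1, \lambda_2 \ge 0$, $\lambda_1 + \lambda_2 \le 1$, such that
$$\begin{aligned} p &= \frac{a + b}{2} + \frac{b - a}{2}(\lambda_1 - \lambda_2), \\ q &= \frac{(a + b)^2}{4} + \frac{b^2 - a^2}{2}(\lambda_1 - \lambda_2) + \frac{(b - a)^2}{4}(\lambda_1 + \lambda_2), \end{aligned}$$
then
$$K = \Bigl(\frac{b - a}{2}\Bigr)^{\alpha-2} \Bigl[q - p(a + b) + \frac{(a + b)^2}{4}\Bigr].$$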
The next theorem gives an analogue of Theorem 7.6.9 when the $X$’s in (7.6.15) have unbounded support.

Theorem 7.6.10 (Case (B): $0 < \alpha \le 2$, $a = -\infty$, $b = +\infty$) For any $x_0 \in \mathbb{R}$, $p \in \mathbb{R}$, $q > 0$, $p^2 \le q$, the Kantorovich radius $K$ is given by
$$K = K(x_0; \alpha; p, q) = (q - 2x_0 p + x_0^2)^{\alpha/2}.$$

The rest of the results in this section treat various versions of Theorems 7.6.9 and 7.6.10.

Theorem 7.6.11 (Case (C): $0 < \alpha \le 2$, $-\infty < a < b < +\infty$) For any $x_0 \in (a, b)$, $p \in \mathbb{R}$, $p^2 \le q$, let $K = \sup\{E|X - x_0|^{\alpha};\ X \in [a, b] \text{ a.s.},\ EX = p,\ EX^2 = q\}$. Set $P = p - x_0$, $Q = q - 2x_0 p + x_0^2$, $A(x_0) = a - x_0$, $B(x_0) = b - x_0$, $C(x_0) = \min(-A(x_0), B(x_0))$.
(i) If $0 \le Q \le C^2(x_0)$, then $K = Q^{\alpha/2}$.
(ii) If $Q > C^2(x_0)$ and $(A(x_0) + B(x_0))P - Q - A(x_0)B(x_0) \ge 0$, then $K \le Q^{\alpha/2}$.

Theorem 7.6.12 (Case (D): $1 \le \alpha \le 2$, $-\infty < a < b \le +\infty$) For any $p \in \mathbb{R}$, $p^2 \le q$, $a \le p \le b$, set $P = p - a$, $Q = q - 2ap + a^2$, $B = b - a$. Suppose $Q \le BP$. Then
$$K := \sup\{E|X - a|^{\alpha};\ X \in [a, b],\ EX = p,\ EX^2 = q\} = P^{2-\alpha} Q^{\alpha-1}.$$

Theorem 7.6.13 (Case (E): $1 \le \alpha \le 2$, $-\infty \le a < b < +\infty$) For any $p \in \mathbb{R}$, $p^2 \le q$, $a \le p \le b$, set $P = p - b$, $Q = q - 2bp + b^2$, $\theta = a - b$. Suppose $Q \le \theta P$. Then
$$K := \sup\{E|X - b|^{\alpha};\ X \in [a, b],\ EX = p,\ EX^2 = q\} = |P|^{2-\alpha} Q^{\alpha-1}.$$
7.6.2 Moment Problems Related to Rounding Proportions

Here we state results on explicit solutions of Moment problems 3 and 4; see (7.6.6) and (7.6.9). To this end recall the definition of $c$-rounding ($0 \le c \le 1$): For any $x \ge 0$,
$$[x]_c = \begin{cases} m, & \text{if } m \le x < m + c, \\ m + 1, & \text{if } m + c \le x < m + 1, \end{cases} \quad m \in \mathbb{N} \cup \{0\},$$
$$\phantom{[x]_c} = \begin{cases} 0, & \text{if } 0 \le x < c, \\ m, & \text{if } m - 1 + c < x \le m + c, \end{cases} \quad m \in \mathbb{N}.$$
The next four theorems deal with problem 3.(5)

(5) In the sequel, the underlying probability space is assumed to be nonatomic, and thus the space of laws of nonnegative r.v.s coincides with the space of all Borel probability measures on $\mathbb{R}_+$.

Theorem 7.6.14 Let $c \in (0, 1)$, $r > 0$, $0 < a < \infty$, $d > 0$, and $U := U_{[\cdot]_c}(a, r, d) := \sup\{E[X]_c;\ 0 \le X \le a \text{ a.s.},\ (EX^r)^{1/r} = d\}$. Set $n = [a]$.
(I) If $n + c < a$, $n + c \le d \le a$, then $U = n + 1$.
(II) If $n + c \ge a$, $n - 1 + c \le d \le a$, then $U = n$.
(III) If $0 < a \le c$, then $U = 0$.
(IV) If $0 < r \le \dfrac{\ln 2}{\ln(1 + 1/c)}$ $(< 1)$, $n + c < a$, and $0 \le d \le n + c$, then $U = (n + 1) d^r (n + c)^{-r}$.
(V) If $0 < r \le \dfrac{\ln 2}{\ln(1 + 1/c)}$, $n + c \ge a$, and $0 \le d \le n - 1 + c$, then $U = n d^r (n - 1 + c)^{-r}$.
(VI) If $r \ge 1$ and $0 \le d \le c$, then $U = d^r c^{-r}$.
(VII) Suppose $r \ge 1$. If either (a) $n + c < a$ and $k \in \{1, \ldots, n\}$ is determined by $k - 1 + c \le d < k + c$, or (b) $n + c \ge a$ and $k \in \{1, \ldots, n\}$ is determined by $k - 1 + c \le d < k + c$, then
$$U = k + \frac{d^r - (k - 1 + c)^r}{(k + c)^r - (k - 1 + c)^r} \le 1 - c + d.$$

The next theorem extends Theorem 7.6.14 to the case $a = +\infty$.

Theorem 7.6.15 Let $0 < c < 1$, $r > 0$, $d > 0$, and $U := U_{[\cdot]_c}(r, d) := \sup\{E[X]_c;\ X \ge 0 \text{ a.s.},\ (EX^r)^{1/r} = d\}$.
(I) If $0 < r < 1$, then $U = +\infty$.
(II) If $r \ge 1$ and $0 \le d \le c$, then $U = d^r c^{-r}$.
(III) Suppose $r \ge 1$. Define $k \in \mathbb{N}$ by $k - 1 + c \le d < k + c$. Then
$$U = k + \frac{d^r - (k - 1 + c)^r}{(k + c)^r - (k - 1 + c)^r} \le 1 - c + d.$$
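The three cases of Theorem 7.6.15 translate directly into a small evaluation routine. The sketch below is illustrative: the function name is an assumption, the denominator in case (III) is taken as $(k + c)^r - (k - 1 + c)^r$, and the index $k$ is computed as $\lfloor d - c \rfloor + 1$, which is equivalent to $k - 1 + c \le d < k + c$ up to floating-point rounding at the interval endpoints.

```python
import math


def rounding_upper_bound(c: float, r: float, d: float) -> float:
    """sup E[X]_c over X >= 0 with (E X^r)^(1/r) = d, following Theorem 7.6.15."""
    if not (0 < c < 1 and r > 0 and d > 0):
        raise ValueError("need 0 < c < 1, r > 0, d > 0")
    if r < 1:                        # case (I)
        return math.inf
    if d <= c:                       # case (II)
        return (d / c) ** r
    k = math.floor(d - c) + 1        # case (III): k - 1 + c <= d < k + c
    low, high = (k - 1 + c) ** r, (k + c) ** r
    return k + (d ** r - low) / (high - low)


if __name__ == "__main__":
    for r in (1.0, 2.0, 3.0):
        U = rounding_upper_bound(0.5, r, 2.0)
        print(r, U, U <= 1 - 0.5 + 2.0)
```

For $r = 1$ the value coincides with the linear bound $1 - c + d$; for larger $r$ the additional moment information lowers it.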
The next two theorems are versions of the previous two; here we consider the lower bounds in $c$-rounding.

Theorem 7.6.16 Let $c \in (0, 1)$, $r > 0$, $0 < a < \infty$, $d > 0$, and $L := L_{[\cdot]_c}(a, r, d) := \inf\{E[X]_c;\ 0 \le X \le a \text{ a.s.},\ (EX^r)^{1/r} = d\}$. Set $n = [a]$.
(I) If $0 < d \le c$, then $L = 0$.
(II) If $c < a \le 1 + c$ and $c \le d \le a$, then $L = (d^r - c^r)/(a^r - c^r)$.
(III) Let $0 < r \le 1$, $n + c < a$, and determine $k \in \{0, 1, \ldots, n - 1\}$ by $k + c \le d < k + 1 + c$. Then
$$L = k + \frac{d^r - (k + c)^r}{(k + 1 + c)^r - (k + c)^r}.$$
(IV) If $0 < r \le 1$, $n + c < a$, and $n + c \le d \le a$, then
$$L = n + \frac{d^r - (n + c)^r}{a^r - (n + c)^r}.$$
The case $a = +\infty$ is extended as follows.

Theorem 7.6.17 Let $c \in (0, 1)$, $r > 0$, $d > 0$, and $L := L_{[\cdot]_c}(r, d) := \inf\{E[X]_c :\ X \ge 0 \text{ a.s.},\ (EX^r)^{1/r} = d\}$.
(I) If $r > 1$, then $L = 0$.
(II) If $0 < r \le 1$, $0 < d \le c$, then $L = 0$.
(III) If $r = 1$, $c \le d < \infty$, then $L = d - c$.
(IV) If $0 < r \le 1$, define $k \in \mathbb{N} \cup \{0\}$ by $k + c \le d < k + 1 + c$. Then
$$L := k + \frac{d^r - (k + c)^r}{(k + 1 + c)^r - (k + c)^r}.$$

Next we pass to Moment problem 4, see (7.6.6)–(7.6.9). For simplicity we shall consider only special cases of $c$-rounding. For the general case we refer to Anastassiou and Rachev (1992) and Anastassiou (1993). First consider the conventional (Webster) rounding, or MYZ-rounding, $[x] := [x]_1$.

Theorem 7.6.18 Let $a > 0$, $0 < r \ne 1$, $d_1 > 0$, $d_r > 0$, and $U = U_{[\cdot]}(a, r, d_1, d_r) := \sup\{E[X];\ 0 \le X \le a \text{ a.s. and } EX = d_1,\ EX^r = d_r\}$. Let $\theta = [a]$.
(I) Set
$$\Delta_{r,a} := a^r + \frac{a^r - \theta^r}{a - \theta}(d_1 - \theta).$$
Suppose that $a \ne \theta$, $\theta \le d_1 \le a$, and either $r > 1$, $d_1^r \le d_r \le \Delta_{r,a}$, or $0 < r < 1$ and $\Delta_{r,a} \le d_r \le d_1^r$. Then $U = \theta$.
(II) Suppose $0 < \theta \ne a$ and there are $\lambda_1, \lambda_2 \ge 0$ with $\lambda_1 + \lambda_2 \le 1$ and such that $d_1 = \lambda_1 \theta + \lambda_2 a$ and $d_r = \lambda_1 \theta^r + \lambda_2 a^r$. Then
$$U = \frac{(\theta^r - a^r) d_1 + (a - \theta) d_r}{a(\theta^{r-1} - a^{r-1})}.$$
(III) Let $\theta \ge 1$ and suppose there exists $k \in \{0, 1, \ldots, \theta - 1\}$ such that $k \le d_1 < k + 1$ and either $r > 1$ and $d_{r,k} := k^r + [(k + 1)^r - k^r](d_1 - k) \le d_r \le \theta^{r-1} d_1$ or $0 < r < 1$ and $\theta^{r-1} d_1 \le d_r \le d_{r,k}$. Then $U = d_1$.

For the case $a = +\infty$ we have the following version of the above theorem.

Theorem 7.6.19 Let $0 < r \ne 1$, $d_1 > 0$, $d_r > 0$, and $U := U_{[\cdot]}(r, d_1, d_r) := \sup\{E[X];\ X \ge 0 \text{ a.s. and } EX = d_1,\ EX^r = d_r\}$. Suppose there exists a nonnegative integer $k$ such that $k \le d_1 \le k + 1$ and either $r > 1$ and $d_r \ge d_{r,k} := k^r + [(k + 1)^r - k^r](d_1 - k)$ or $0 < r < 1$ and $0 < d_r \le d_{r,k}$. Then $U = d_1$.

If we change in Theorems 7.6.18 and 7.6.19 the upper bound $U$ to the corresponding lower bound, we obtain the following two theorems.

Theorem 7.6.20 Let $a > 0$, $0 < r \ne 1$, $d_1 > 0$, $d_r > 0$, and $L := L_{[\cdot]}(a, r, d_1, d_r) := \inf\{E[X];\ 0 \le X \le a \text{ a.s.},\ EX = d_1,\ EX^r = d_r\}$.
(I) Suppose there exist $t_1, t_2, \lambda$ such that $0 \le t_1 \le t_2 \le 1$, and $d_1 = (1 - \lambda)t_1 + \lambda t_2$ and $d_r = (1 - \lambda)t_1^r + \lambda t_2^r$. Then $L = 0$.
(II) If $0 < a \le 1$, then $L = 0$.
(III) If $1 < a < 2$ and there exist $\lambda_1, \lambda_2 > 0$ with $\lambda_1 + \lambda_2 \le 1$ and such that $d_1 = \lambda_1 + \lambda_2 a$, $d_r = \lambda_1 + \lambda_2 a^r$, then $L = \dfrac{d_r - d_1}{a^r - a}$.

From now on assume that $a \ge 2$, and let $\theta = [a]$.
(IV) Suppose
$$\Delta_\theta := \frac{(\theta - 1)(a^r - a)}{\theta^r - \theta} \le \theta.$$
(i) If $d_1 = \lambda_1 + \lambda_2 \theta$, $d_r = \lambda_1 + \lambda_2 \theta^r$ for some $\lambda_1, \lambda_2 \ge 0$, $\lambda_1 + \lambda_2 \le 1$, then $L = \Delta_\theta$.
(ii) If $d_1 = \lambda_1 \theta + \lambda_2 a$ and $d_r = \lambda_1 \theta^r + \lambda_2 a^r$ for some $\lambda_1, \lambda_2 \ge 0$, $\lambda_1 + \lambda_2 \le 1$, then
$$L = \frac{(\theta - a(\theta - 1)) d_r - (\theta^{r+1} - a^r(\theta - 1)) d_1}{\theta a^r - a \theta^r}.$$
(V) Suppose $\Delta_\theta > \theta$.
(i) If $d_1 = \lambda_1 + \lambda_2\theta + \lambda_3 a$ and $d_r = \lambda_1 + \lambda_2\theta^r + \lambda_3 a^r$ for some $\lambda_1, \lambda_2, \lambda_3 \ge 0$, $\lambda_1 + \lambda_2 + \lambda_3 = 1$, then
$$L = \frac{(\theta - 1)(\theta - a + 1)(d_r - 1) - ((\theta^r - 1)\theta - (a^r - 1)(\theta - 1))(d_1 - 1)}{(\theta - 1)(a^r - 1) - (a - 1)(\theta^r - 1)}.$$
(ii) If $d_1 = \lambda_1 + \lambda_2 a$, $d_r = \lambda_1 + \lambda_2 a^r$ for some $\lambda_1, \lambda_2 \ge 0$, $\lambda_1 + \lambda_2 \le 1$, then
$$L = \frac{\theta(d_r - d_1)}{a^r - a}.$$
(VI) Suppose $\theta > 1$ and one of the following holds.
(i) $r > 1$ and there exists $k \in \{1, \dots, \theta - 1\}$ such that $k \le d_1 \le k+1$ and
$$d_{r,k} := k^r + ((k+1)^r - k^r)(d_1 - k) \le d_r \le \Delta_{r,\theta} := 1 + \frac{(\theta^r - 1)(d_1 - 1)}{\theta - 1};$$
(ii) $0 < r < 1$ and there exists $k \in \{1, \dots, \theta - 1\}$ such that $k \le d_1 \le k+1$ and $\Delta_{r,k} \le d_r \le d_{r,k}$.
Then $L = d_1 - 1$.

The special case a = +∞ is treated as follows.

Theorem 7.6.21 Let $0 < r \ne 1$, $d_1 > 0$, $d_r > 0$, and
$$L := L_{[\cdot]}(r, d_1, d_r) := \inf\{E[X] : X \ge 0 \text{ a.s.},\ EX = d_1,\ EX^r = d_r\}.$$
(I) Suppose there exist $0 \le t_1 \le t_2 \le 1$, $0 \le \lambda \le 1$ such that $d_1 = (1-\lambda)t_1 + \lambda t_2$ and $d_r = (1-\lambda)t_1^r + \lambda t_2^r$. Then $L = 0$.
(II) Suppose $0 < r < 1$ and either $d_1 = \lambda_1 + \lambda_2$, $d_r = \lambda_1$, $\lambda_1 \ge 0$, $\lambda_1 + \lambda_2 \le 1$, or $d_1 \ge 1$, $0 < d_r \le 1$. Then $L = d_1 - d_r$.
(III) Suppose $0 < r < 1$ and for some integer $k$, $k \le d_1 < k+1$, and $1 \le d_r \le (k^r + ((k+1)^r - k^r))(d_1 - k)$. Then $L = d_1 - 1$.
(IV) Suppose $r > 1$ and either $d_1 = \lambda_1$, $d_r = \lambda_1 + \lambda_2$, $\lambda_i \ge 0$, $i = 1, 2$, $\lambda_1 + \lambda_2 \le 1$, or $0 < d_1 \le 1$ and $1 \le d_r$. Then $L = 0$.
(V) Suppose $r > 1$ and for some integer $k$, $k \le d_1 < k+1$, and $(k^r + ((k+1)^r - k^r))(d_1 - k) \le d_r$. Then $L = d_1 - 1$.

Similar results are valid for Adams (c = 0) and Jefferson (c = 1) rules of rounding; see Anastassiou and Rachev (1992) and Anastassiou (1993).
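The closed form in Theorem 7.6.17 can be checked numerically on a two-point law. The sketch below is illustrative only: it assumes the c-rounding convention $[x]_c = k$ for $k - 1 + c \le x < k + c$ (which gives Adams rounding for c = 0 and Jefferson rounding for c = 1, but is not restated in this section), and the names `c_round` and `lower_bound_7_6_17` are hypothetical helpers.

```python
import math


def c_round(x: float, c: float) -> int:
    """Assumed c-rounding: [x]_c = k whenever k - 1 + c <= x < k + c."""
    return math.floor(x + 1.0 - c)


def lower_bound_7_6_17(c: float, r: float, d: float) -> float:
    """Value of L in Theorem 7.6.17 for 0 < r <= 1 (cases II-IV as printed)."""
    if not (0.0 < r <= 1.0):
        raise ValueError("this sketch only covers 0 < r <= 1")
    if d <= c:
        return 0.0
    k = math.floor(d - c)  # k is defined by k + c <= d < k + 1 + c
    low, high = (k + c) ** r, (k + 1 + c) ** r
    return k + (d ** r - low) / (high - low)


# Two atoms placed just below consecutive rounding thresholds, with mean d (r = 1).
c, d, eps = 0.5, 2.3, 1e-9
k = math.floor(d - c)
x_low, x_high = k + c - eps, k + 1 + c - eps
weight_high = (d - x_low) / (x_high - x_low)
mean_rounded = (1 - weight_high) * c_round(x_low, c) + weight_high * c_round(x_high, c)
```

For r = 1 the two-point law above has mean d and its expected rounded value approaches d − c as eps tends to 0, in agreement with case (III).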
7.6.3 Closeness of Random Processes with Fixed Moment Characteristics
The moment problems we are going to consider in this section may be viewed as extensions of Moment problem 1 (on page 52) for measures μ generated by the joint distribution of random processes. Namely, let the class $\mathcal{M}$, the space $\mathcal{X}(C[0,1])$, and the set $T = \{t_1, \dots, t_N\}$ be defined by (7.6.10). The subject of this section is the following general version of problem (7.6.10)–(7.6.12).

Moment problem 6: Given $h, g_{i,j} \in \mathcal{M}$ ($i = 1, \dots, N$, $j = 1, \dots, n$) find the set of values $Eh(\|X - Y\|)$ subject to the moment constraints
$$Eg_{i,j}(|X(t_i)|) = a_{i,j}, \qquad Eg_{i,j}(|Y(t_i)|) = b_{i,j}, \qquad i = 1, \dots, N. \tag{7.6.16}$$
In particular, determine the bounds
$$\inf Eh(\|X - Y\|), \qquad \sup Eh(\|X - Y\|), \tag{7.6.17}$$
given the constraints (7.6.16).

One interpretation of the problem is as follows: Suppose we observe two continuous processes X and Y only through the “window” T. Suppose for each point of the “window” we know some moment characteristics of X and Y. The problem is to determine the possible deviations between the processes outside the window. In particular, “What is the minimal distance between X and Y with given moment information (7.6.16)?” is just a special case of Moment problem 6.

We start with the case n = 1 in (7.6.16); i.e., given $h, g_i \in \mathcal{M}$ ($i = 1, \dots, N$) and assuming that
$$Eg_i(|X(t_i)|) = a_i, \qquad Eg_i(|Y(t_i)|) = b_i, \qquad i = 1, \dots, N, \tag{7.6.18}$$
we are interested in the range of $Eh(\|X - Y\|)$. The solution of this problem will be given under some assumptions of the following type:

Assumption A(h, g): $h \circ g^{-1}(t)$ ($t \ge 0$) is a convex function (here and in the sequel, $f^{-1}$ stands for the inverse of $f \in \mathcal{M}$).
Assumption B(g): $g^{-1}(Eg(|\xi + \eta|)) \le g^{-1}(Eg(|\xi|)) + g^{-1}(Eg(|\eta|))$ for any $\xi, \eta \in \mathcal{X}(\mathbb{R})$ (here, $\mathcal{X}(\mathbb{R})$ is the set of all real-valued r.v.s).
Assumption C(g): $Eg(|\xi + \eta|) \le Eg(|\xi|) + Eg(|\eta|)$ for any $\xi, \eta \in \mathcal{X}(\mathbb{R})$.
Assumption D(h, g): $\lim_{t \to \infty} h(t)/g(t) = 0$.
Remark 7.6.22 Take the most interesting case: $h(t) = t^p$, $g(t) = t^q$ ($p > 0$, $q > 0$). Then
$$A(h,g) \Leftrightarrow p \ge q, \qquad B(g) \Leftrightarrow q \ge 1, \qquad C(g) \Leftrightarrow q \le 1, \qquad D(h,g) \Leftrightarrow q > p.$$

Now let $T = \{0 \le t_1 \le \cdots \le t_N \le 1\}$ and take $a = (a_1, \dots, a_N) \in \mathbb{R}^N_+$, $b = (b_1, \dots, b_N) \in \mathbb{R}^N_+$, and $g = (g_1, \dots, g_N) \in \mathcal{M}$ to be fixed vectors. Denote by $\mathcal{X}(T, g, a)$ the space of all $X \in \mathcal{X}(C[0,1])$ satisfying the marginal moment conditions $Eg_i(|X(t_i)|) = a_i$ ($i = 1, \dots, N$), and let
$$I\{h, g, T, a, b\} := \inf\{Eh(\|X - Y\|);\ X \in \mathcal{X}(T, g, a),\ Y \in \mathcal{X}(T, g, b)\}. \tag{7.6.19}$$
In the next four theorems we describe the exact range of values of $Eh(\|X - Y\|)$ under different conditions of type A–D.

Theorem 7.6.23 Let $A(h, g_i)$ and $B(g_i)$ hold for any $i = 1, 2, \dots, N$. Then
(i)
$$I\{h, g, T, a, b\} = \sup_{1 \le i \le N} h\left(|g_i^{-1}(a_i) - g_i^{-1}(b_i)|\right); \tag{7.6.20}$$
(ii) for any ν ≥ I{h, g, T, a, b} there exist random processes Xν ∈ X (T, g, a) and Yν ∈ X (T, g, b) such that Eh(||Xν − Yν ||) = ν.
(7.6.21)
Proof: We shall split the proof into three claims.

Claim 1: $I\{h, g, T, a, b\} \ge \sup_{1 \le i \le N} \varphi_i(a_i, b_i)$, where $\varphi_i(a_i, b_i) := h(|g_i^{-1}(a_i) - g_i^{-1}(b_i)|)$.

Proof of Claim 1: Let $X, Y \in \mathcal{X}(C[0,1])$ and $\xi := g_i(|X(t_i) - Y(t_i)|)$. Then, by Jensen's inequality and $A(h, g_i)$,
$$h^{-1}(Eh(\|X - Y\|)) \ge h^{-1}(Eh(|X(t_i) - Y(t_i)|)) = h^{-1}(Eh \circ g_i^{-1}(\xi)) \ge h^{-1} \circ h \circ g_i^{-1}(E\xi) = g_i^{-1}(E\xi).$$
Further, by $B(g_i)$,
$$h \circ g_i^{-1}(E\xi) \ge h\left(\left|g_i^{-1}(Eg_i(|X(t_i)|)) - g_i^{-1}(Eg_i(|Y(t_i)|))\right|\right) = h(|g_i^{-1}(a_i) - g_i^{-1}(b_i)|),$$
which proves the claim.

Claim 2: The infimum in the left-hand side of (7.6.19) is attained, and (7.6.20) holds.

Proof of Claim 2: Define $\tilde X, \tilde Y \in \mathcal{X}(C[0,1])$ to be random polygonal lines with vertices at the points $0, t_1, \dots, t_N, 1$ given by
$$\begin{cases} \tilde X(t_i, \omega) = g_i^{-1}(a_i), \quad \tilde Y(t_i, \omega) = g_i^{-1}(b_i), & i = 1, \dots, N, \\ \tilde X(0, \omega) = \tilde Y(0, \omega) = 0 & \text{if } t_1 > 0, \\ \tilde X(1, \omega) = \tilde Y(1, \omega) = 0 & \text{if } t_N < 1, \end{cases} \qquad \omega \in \Omega. \tag{7.6.22}$$
For any $\omega \in \Omega$,
$$\|\tilde X(\cdot, \omega) - \tilde Y(\cdot, \omega)\| = \sup_{1 \le i \le N} |g_i^{-1}(a_i) - g_i^{-1}(b_i)|,$$
and hence
$$Eh(\|\tilde X - \tilde Y\|) = \sup_{1 \le i \le N} \varphi_i(a_i, b_i). \tag{7.6.23}$$
Further, by (7.6.22),
$$\tilde X \in \mathcal{X}(T, g, a), \qquad \tilde Y \in \mathcal{X}(T, g, b). \tag{7.6.24}$$
Invoking (7.6.23), (7.6.24), and Claim 1, we complete the proof of the claim.

Claim 3: (ii) is satisfied.

Proof of Claim 3: Let $\tau \in (0, 1)$, $\tau \notin T$. Define the r.v.s $X_\nu$ and $Y_\nu$ in $\mathcal{X}(C[0,1])$ as follows:
$$X_\nu(t) = \tilde X(t), \qquad Y_\nu(t) = \tilde Y(t) \qquad \text{for } t = 0, t_1, \dots, t_N, 1, \tag{7.6.25}$$
where $\tilde X, \tilde Y$ are determined by (7.6.22), and
$$X_\nu(\tau) = h^{-1}(\nu), \qquad Y_\nu(\tau) = 0. \tag{7.6.26}$$
Next, let $X_\nu(t)$, $Y_\nu(t)$ be random polygonal lines with vertices at $0, t_1, \dots, t_N, 1$ and $\tau$. Making use of Claim 2, we have
$$\|X_\nu - Y_\nu\| = h^{-1}(\nu) \ge \sup_{1 \le i \le N} |g_i^{-1}(a_i) - g_i^{-1}(b_i)|,$$
and thus (7.6.21) holds. □
Theorem 7.6.24 Let $A(h, g_i)$ and $C(g_i)$ hold for any $i = 1, \dots, N$. Then
(i) $I\{h, g, T, a, b\} = \sup_{1 \le i \le N} h \circ g_i^{-1}(|a_i - b_i|)$;
(ii) for any $\nu \ge I\{h, g, T, a, b\}$ there exist $X_{n\nu} \in \mathcal{X}(T, g, a)$ and $Y_{n\nu} \in \mathcal{X}(T, g, b)$ such that $Eh(\|X_{n\nu} - Y_{n\nu}\|) \to \nu$ as $n \to \infty$.

Proof: Claim 1: $I\{h, g, T, a, b\} \ge \sup_{1 \le i \le N} \phi_i(a_i, b_i)$, where $\phi_i(a_i, b_i) := h \circ g_i^{-1}(|a_i - b_i|)$.

Proof of Claim 1: Let $X, Y \in \mathcal{X}(C[0,1])$. Then, as in Claim 1 of Theorem 7.6.23, by $A(h, g_i)$, $C(g_i)$, and Jensen's inequality, we have
$$h^{-1}(Eh(\|X - Y\|)) \ge g_i^{-1}E[g_i(|X(t_i) - Y(t_i)|)] \ge g_i^{-1}(|Eg_i(|X(t_i)|) - Eg_i(|Y(t_i)|)|) = h^{-1} \circ \phi_i(a_i, b_i),$$
which proves the claim.

Claim 2: For any $\varepsilon > 0$ there exists a pair $(X_\varepsilon, Y_\varepsilon) \in \mathcal{X}(T, g, a) \times \mathcal{X}(T, g, b)$ such that
$$Eh(\|X_\varepsilon - Y_\varepsilon\|) = \sup_{1 \le i \le N} h \circ g_i^{-1}(|a_i - b_i| + \varepsilon) \sup_{1 \le i \le N} \frac{|a_i - b_i|}{|a_i - b_i| + \varepsilon}. \tag{7.6.27}$$

Proof of Claim 2: Without loss of generality we can assume that $a_i > b_i$, $i = 1, \dots, N$. Let $p_i := (a_i - b_i)/(a_i - b_i + \varepsilon)$ and $q_i := 1 - p_i$, $i = 1, \dots, N$. We rearrange the indices $i$ so that $p_1 \le p_2 \le \cdots \le p_N$. Choose sets $A_i \in \mathcal{A}$ such that $A_1 \subset \cdots \subset A_N$ and $P(A_i) = p_i$, using the assumption that $(\Omega, \mathcal{A}, P)$ is a nonatomic space. More precisely, since $(\Omega, \mathcal{A}, P)$ is nonatomic, for any $B \in \mathcal{A}$ and any $\lambda \in [0, P(B)]$ there exists $C = C(B, \lambda) \in \mathcal{A}$, $C \subseteq B$, such that $P(C) = \lambda$ (see Loève (1977, p. 101)). Then the required sets $A_k$ ($k = 1, \dots, N$) are given by
$$A_k = C(A_{k+1}, p_k), \qquad k = 1, \dots, N \qquad (A_{N+1} := \Omega).$$
Further, for any $\omega \in \Omega$, define
$$X_\varepsilon(t_i, \omega) := \begin{cases} c_i := g_i^{-1}(a_i - b_i + \varepsilon), & \text{if } \omega \in A_i, \\ d_i := g_i^{-1}(b_i/q_i), & \text{if } \omega \notin A_i, \end{cases}$$
and
$$Y_\varepsilon(t_i, \omega) := \begin{cases} 0, & \text{if } \omega \in A_i, \\ d_i, & \text{if } \omega \notin A_i. \end{cases}$$
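We define $(t, X_\varepsilon(t))_{t \in [0,1]}$ to be a random polygonal line with vertices $(t_i, X_\varepsilon(t_i))$ and let $X_\varepsilon(0, \omega) = 0$ if $t_1 > 0$ and $X_\varepsilon(1, \omega) = 0$ if $t_N < 1$ for any $\omega \in \Omega$. Analogously define the process $Y_\varepsilon(t)$. Then $Eg_i(|X_\varepsilon(t_i)|) = g_i(c_i)p_i + g_i(d_i)q_i = a_i$, and $Eg_i(|Y_\varepsilon(t_i)|) = g_i(d_i)q_i = b_i$ for any $i = 1, \dots, N$; i.e., $X_\varepsilon \in \mathcal{X}(T, g, a)$ and $Y_\varepsilon \in \mathcal{X}(T, g, b)$. Further,
$$Eh(\|X_\varepsilon - Y_\varepsilon\|) = Eh\left[\sup_{t \in T} |X_\varepsilon(t) - Y_\varepsilon(t)|\right], \tag{7.6.28}$$
where
$$|X_\varepsilon(t_i, \omega) - Y_\varepsilon(t_i, \omega)| = \begin{cases} c_i, & \text{if } \omega \in A_i, \\ 0, & \text{if } \omega \notin A_i. \end{cases} \tag{7.6.29}$$
Since $g_i \in \mathcal{M}$, for any $i = 1, \dots, N-1$, $p_i \le p_{i+1} \Leftrightarrow a_i - b_i \le a_{i+1} - b_{i+1} \Leftrightarrow c_i \le c_{i+1}$; i.e., $c_1 \le c_2 \le \cdots \le c_N$. Hence, by (7.6.29) and $A_1 \subset A_2 \subset \cdots \subset A_N$,
$$\sup_{t \in T} |X_\varepsilon(t, \omega) - Y_\varepsilon(t, \omega)| = \begin{cases} c_N, & \text{if } \omega \in A_N, \\ 0, & \text{if } \omega \notin A_N. \end{cases} \tag{7.6.30}$$
Now, (7.6.28) and (7.6.30) imply that $Eh(\|X_\varepsilon - Y_\varepsilon\|) = h(c_N)p_N$, which is in fact the right-hand side of equality (7.6.27), and thus the claim is proved. Claims 1 and 2 prove the desired equality (i).

Claim 3: (ii) is satisfied.

Proof of Claim 3: Let $\tau \in (0, 1)$, $\tau \notin T$. Using the same notation as in Claims 1 and 2 we define
$$X_\nu(t_i, \omega) := X_\varepsilon(t_i, \omega), \qquad Y_\nu(t_i, \omega) := Y_\varepsilon(t_i, \omega),$$
$$X_\nu(\tau, \omega) = \begin{cases} h^{-1}(\nu), & \text{if } \omega \in A_N, \\ 0, & \text{if } \omega \notin A_N, \end{cases} \qquad Y_\nu(\tau, \omega) = 0,$$
for any $\omega \in \Omega$, and $\varepsilon > 0$ is chosen so small that
$$\nu > h(c_N) = \sup_{1 \le i \le N} h \circ g_i^{-1}(|a_i - b_i| + \varepsilon).$$
We define the random broken lines $X_\nu$ and $Y_\nu$ with vertices $X_\nu(t_i)$, $Y_\nu(t_i)$, $i = 1, \dots, N$, and $X_\nu(0) = Y_\nu(0) = 0$ if $t_1 > 0$; $X_\nu(1) = Y_\nu(1) = 0$ if $t_N < 1$. Hence, as in Claim 2 we conclude that
$$\|X_\nu(\cdot, \omega) - Y_\nu(\cdot, \omega)\| = \begin{cases} \max(c_N, h^{-1}(\nu)), & \text{if } \omega \in A_N, \\ 0, & \text{if } \omega \notin A_N. \end{cases}$$
Hence,
$$Eh(\|X_\nu - Y_\nu\|) = \nu p_N = \nu\,\frac{a_N - b_N}{a_N - b_N + \varepsilon}.$$
This proves the claim. □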
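The two-point construction used in Claim 2 of Theorem 7.6.24 can be verified numerically. The following is a minimal sketch for the special case N = 1 with power functions $g(t) = t^q$ and $h(t) = t^p$ (so that A(h, g) and C(g) hold when $p \ge q$ and $q \le 1$); the function name `two_point_construction` is a hypothetical helper, not notation from the text.

```python
def two_point_construction(a: float, b: float, eps: float, p: float, q: float):
    """Moments and cost of (X_eps, Y_eps) for N = 1, g(t) = t**q, h(t) = t**p, a > b > 0."""
    g = lambda t: t ** q
    g_inv = lambda y: y ** (1.0 / q)
    h = lambda t: t ** p

    prob_A = (a - b) / (a - b + eps)   # p_1 = P(A_1)
    prob_not_A = 1.0 - prob_A          # q_1
    c1 = g_inv(a - b + eps)            # X = c1, Y = 0 on A_1
    d1 = g_inv(b / prob_not_A)         # X = Y = d1 off A_1

    moment_x = g(c1) * prob_A + g(d1) * prob_not_A   # should equal a
    moment_y = g(d1) * prob_not_A                    # should equal b
    cost = h(c1) * prob_A                            # E h(||X - Y||) = h(c_1) p_1
    return moment_x, moment_y, cost


moment_x, moment_y, cost = two_point_construction(a=2.0, b=0.5, eps=1e-6, p=1.0, q=0.5)
```

With these inputs both moment constraints are met, and the cost is close to $|a - b|^{p/q} = 1.5^2$, the value of $h \circ g^{-1}(|a - b|)$ in part (i), which it approaches as eps tends to 0.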
Theorem 7.6.25 Let $D(h, g_i)$ hold for any $i = 1, \dots, N$. Then
$$I\{h, g, T, a, b\} = 0, \tag{7.6.31}$$
and for any $\nu > 0$ there exist random processes $X_{n\nu} \in \mathcal{X}(T, g, a)$ and $Y_{n\nu} \in \mathcal{X}(T, g, b)$ such that $Eh(\|X_{n\nu} - Y_{n\nu}\|) \to \nu$.

Proof: Claim 1: For any $n = 1, 2, \dots$ there exist $X_n \in \mathcal{X}(T, g, a)$ and $Y_n \in \mathcal{X}(T, g, b)$ such that
$$Eh(\|X_n - Y_n\|) \le \sum_{i=1}^{N} \left[\frac{b_i}{g_i(nb_i)}h(nb_i) + \frac{a_i}{g_i(na_i)}h(na_i)\right]. \tag{7.6.32}$$
Since $g_i \in \mathcal{M}$, for $n$ large enough, say $n \ge n_0$, we can define disjoint sets $A_{in}$, $B_{in}$, $C_{in}$ such that $A_{in} + B_{in} + C_{in} = \Omega$ and
$$P(A_{in}) = c_{in} := \frac{a_i}{g_i(na_i)}, \qquad P(B_{in}) = d_{in} := \frac{b_i}{g_i(nb_i)}.$$
Now, for any $i = 1, \dots, N$, $n \ge n_0$, define
$$\begin{cases} X_n(t_i, \omega) = na_i, \quad Y_n(t_i, \omega) = 0, & \text{if } \omega \in A_{in}, \\ X_n(t_i, \omega) = 0, \quad Y_n(t_i, \omega) = nb_i, & \text{if } \omega \in B_{in}, \\ X_n(t_i, \omega) = Y_n(t_i, \omega) = 0, & \text{if } \omega \in C_{in}. \end{cases} \tag{7.6.33}$$
Then $Eg_i[|X_n(t_i)|] = g_i(na_i)c_{in} = a_i$ and $Eg_i[|Y_n(t_i)|] = g_i(nb_i)d_{in} = b_i$; i.e., $X_n \in \mathcal{X}(T, g, a)$ and $Y_n \in \mathcal{X}(T, g, b)$. Further, we define the random broken lines $X_n(t)$, $Y_n(t)$ ($t \in [0, 1]$) in the way we already did in Theorems 7.6.23 and 7.6.24. Without loss of generality we can assume that $a_1 \le a_2 \le \cdots \le a_N \le b_1 \le b_2 \le \cdots \le b_N$. Then
$$\|X_n - Y_n\| = \sup_{t \in T} |X_n(t) - Y_n(t)| \tag{7.6.34}$$
$$= \begin{cases} nb_N & \text{if } \omega \in B_{N,n}, \\ nb_{N-1} & \text{if } \omega \in B_{N-1,n} \setminus B_{N,n}, \\ \ \vdots & \\ nb_1 & \text{if } \omega \in B_{1,n} \setminus \bigcup_{j=2}^{N} B_{j,n}, \\ na_N & \text{if } \omega \in A_{N,n} \setminus \bigcup_{j=1}^{N} B_{j,n}, \\ \ \vdots & \\ na_1 & \text{if } \omega \in A_{1,n} \setminus \left[\bigcup_{j=2}^{N} A_{j,n} \cup \bigcup_{j=1}^{N} B_{j,n}\right], \\ 0 & \text{if } \omega \notin \bigcup_{j=1}^{N} A_{j,n} \cup \bigcup_{j=1}^{N} B_{j,n}. \end{cases}$$
Hence,
$$Eh(\|X_n - Y_n\|) \le \sum_{j=1}^{N} h(nb_j)d_{j,n} + \sum_{j=1}^{N} h(na_j)c_{j,n},$$
which proves (7.6.32) and the claim. By $D(h, g_i)$ ($i = 1, \dots, N$) it follows that the right-hand side of (7.6.32) goes to 0 as $n \to \infty$. Hence, (7.6.31) holds true.

Claim 2: (ii) is valid.

Let $\tau \in (0, 1)$, $\tau \notin T$, and define
$$\nu_n = \frac{\nu}{1 - P\left(\bigcap_{j=1}^{N} C_{j,n}\right)}, \qquad X_{n,\nu}(\tau) = h^{-1}(\nu_n), \qquad Y_{n,\nu}(\tau) = 0.$$
We define the random broken line $X_{n,\nu}(t)$ with vertices $X_{n,\nu}(t_j) = X_n(t_j)$ (see (7.6.33)) and $X_{n,\nu}(\tau)$ (cf. Claim 2 of Theorem 7.6.23). Following the same notation as in Claim 1, we have
$$h(nb_i)P\left(B_{i,n} \setminus \bigcup_{j=i+1}^{N} B_{j,n}\right) \le h(nb_i)d_{in} \to 0 \quad \text{as } n \to \infty \tag{7.6.35}$$
and
$$h(na_i)P\left[A_{i,n} \setminus \left[\bigcup_{j=i+1}^{N} A_{j,n} \cup \bigcup_{j=1}^{N} B_{j,n}\right]\right] \le h(na_i)c_{in} \to 0 \tag{7.6.36}$$
as $n \to \infty$. Hence, by (7.6.34)–(7.6.36), for $n$ large enough,
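$$\begin{aligned} Eh(\|X_{n,\nu} - Y_{n,\nu}\|) &= \sum_{i=1}^{N} \max(\nu_n, h(nb_i))\,P\left(B_{i,n} \setminus \bigcup_{j=i+1}^{N} B_{j,n}\right) \\ &\quad + \sum_{i=1}^{N} \max(\nu_n, h(na_i))\,P\left(A_{i,n} \setminus \left[\bigcup_{j=i+1}^{N} A_{j,n} \cup \bigcup_{j=1}^{N} B_{j,n}\right]\right) \\ &= \sum_{i=1}^{N} \nu_n P\left(B_{i,n} \setminus \bigcup_{j=i+1}^{N} B_{j,n}\right) + \sum_{i=1}^{N} \nu_n P\left(A_{i,n} \setminus \left[\bigcup_{j=i+1}^{N} A_{j,n} \cup \bigcup_{j=1}^{N} B_{j,n}\right]\right) \\ &= \nu_n P\left(\bigcup_{j=1}^{N} (A_{j,n} \cup B_{j,n})\right). \end{aligned}$$
□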
Theorem 7.6.26 For any $g_i \in \mathcal{M}$ ($i = 1, \dots, N$),
(i)
$$I\{0, g, a, b\} := \inf\{P(X \ne Y);\ X \in \mathcal{X}(T, g, a),\ Y \in \mathcal{X}(T, g, b)\} = 0; \tag{7.6.37}$$
(ii) for any $\nu \in (0, 1)$ there exists a sequence $(X_{n\nu}, Y_{n\nu}) \in \mathcal{X}(T, g, a) \times \mathcal{X}(T, g, b)$ such that
$$P(X_{n\nu} \ne Y_{n\nu}) \to \nu \quad \text{as } n \to \infty.$$

Proof: (i) Let $C_n \in \mathcal{A}$ and $P(C_n) = \frac{1}{n}$. For any $i = 1, \dots, N$ define
$$\begin{cases} X_n(t_i, \omega) := g_i^{-1}(na_i), \quad Y_n(t_i, \omega) := g_i^{-1}(nb_i), & \text{if } \omega \in C_n, \\ X_n(t_i, \omega) := Y_n(t_i, \omega) = 0, & \text{if } \omega \notin C_n. \end{cases} \tag{7.6.38}$$
Then (7.6.38) determines the random polygonal lines $X_n \in \mathcal{X}(T, g, a)$ and $Y_n \in \mathcal{X}(T, g, b)$. Since $X_n(t_i, \omega) = Y_n(t_i, \omega) = 0$ whenever $\omega \notin C_n$ and $i = 1, \dots, N$, we have $X_n(t, \omega) = Y_n(t, \omega) = 0$ if $\omega \notin C_n$ and $t \in [0, 1]$. Hence,
$$P(X_n = Y_n) \ge P(\Omega \setminus C_n) = \frac{n-1}{n} \to 1,$$
as desired.
(ii) Let $0 < \nu < 1$ and $\tau \in (0, 1) \setminus T$. Choose $A \in \mathcal{A}$ with $P(A) = \nu$ and let
$$\begin{cases} X_{n\nu}(\tau, \omega) = 1, \quad Y_{n\nu}(\tau, \omega) = 0, & \text{for any } \omega \in A, \\ X_{n\nu}(\tau, \omega) = Y_{n\nu}(\tau, \omega) = 0, & \text{for any } \omega \notin A, \\ X_{n\nu}(t_i) = X_n(t_i), \quad Y_{n\nu}(t_i) = Y_n(t_i), & i = 1, \dots, N, \end{cases} \tag{7.6.39}$$
where $X_n(t_i)$ and $Y_n(t_i)$ are given by (7.6.38). We construct the random broken lines $X_{n\nu}$ and $Y_{n\nu}$ by using (7.6.39) (cf. Claim 3 of Theorem 7.6.23). From the implications
$$X_{n\nu}(\cdot, \omega) = Y_{n\nu}(\cdot, \omega) \Leftrightarrow \begin{cases} X_{n\nu}(t_i, \omega) = Y_{n\nu}(t_i, \omega), & i = 1, \dots, N, \\ X_{n\nu}(\tau, \omega) = Y_{n\nu}(\tau, \omega) \end{cases} \Leftrightarrow \omega \notin A \cup C_n,$$
it follows that $P(X_{n\nu} = Y_{n\nu}) = P(\Omega \setminus (A \cup C_n)) \to 1 - \nu$, which proves (ii), and the theorem as well. □

As a consequence of Theorems 7.6.23–7.6.26 we obtain the following solution of Moment problem 6 for $g_i(t) = t^{q_i}$ ($q_i > 0$) and $h(t) = t^p$ ($p \ge 0$, with the convention $t^0 := I\{t \ne 0\}$).

Corollary 7.6.27 Let $q = (q_1, \dots, q_N)$ ($q_i > 0$), $p \ge 0$, and
$$I\{p, q, a, b\} := \inf\{E\|X - Y\|^p;\ X, Y \in \mathcal{X}(C[0,1]),\ E|X(t_i)|^{q_i} = a_i,\ E|Y(t_i)|^{q_i} = b_i,\ i = 1, \dots, N\}.$$
Then
$$I\{p, q, a, b\} = \begin{cases} \sup_{1 \le i \le N} \left|a_i^{1/q_i} - b_i^{1/q_i}\right|^p, & \text{if } p \ge q_i \ge 1,\ i = 1, \dots, N, \\ \sup_{1 \le i \le N} |a_i - b_i|^{p/q_i}, & \text{if } p \ge q_i,\ 0 < q_i < 1,\ i = 1, \dots, N, \\ 0, & \text{if } 0 \le p \le q_i,\ i = 1, \dots, N. \end{cases} \tag{7.6.40}$$
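Moreover, for any $\nu > I\{p, q, a, b\}$ there exists a sequence $(X_{n\nu}, Y_{n\nu}) \in \mathcal{X}(C[0,1])$ such that
$$E\|X_{n\nu} - Y_{n\nu}\|^p \to \nu \quad \text{as } n \to \infty$$
and
$$E|X_{n\nu}(t_i)|^{q_i} = a_i, \qquad E|Y_{n\nu}(t_i)|^{q_i} = b_i, \qquad i = 1, \dots, N.$$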
Remark 7.6.28 Corollary 7.6.27 gives an explicit expression for I{p, q, a, b} if p and qi are subject to certain inequalities (cf. (7.6.40)) or if qi = q for all i = 1, . . . , N . The problem of an explicit description of I{p, q, a, b} for any p ≥ 0 and qi > 0 is still open.
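The case distinction in (7.6.40) translates directly into a small function. The sketch below is only an illustration: `moment_infimum` is a hypothetical name, the cases are tested in the order in which they are printed (so boundary overlaps such as $p = q_i \ge 1$ resolve to the earlier case, which is an assumption of this sketch), and mixed configurations raise an error because, per Remark 7.6.28, no explicit expression is available for them.

```python
from typing import Sequence


def moment_infimum(p: float, q: Sequence[float],
                   a: Sequence[float], b: Sequence[float]) -> float:
    """Evaluate I{p, q, a, b} from (7.6.40) when one of its three cases applies."""
    if not (len(q) == len(a) == len(b)) or len(q) == 0:
        raise ValueError("q, a, b must be nonempty and of equal length")
    triples = list(zip(q, a, b))
    if all(p >= qi >= 1 for qi, _, _ in triples):
        return max(abs(ai ** (1 / qi) - bi ** (1 / qi)) ** p for qi, ai, bi in triples)
    if all(p >= qi and 0 < qi < 1 for qi, _, _ in triples):
        return max(abs(ai - bi) ** (p / qi) for qi, ai, bi in triples)
    if all(0 <= p <= qi for qi, _, _ in triples):
        return 0.0
    raise ValueError("mixed case: not covered by (7.6.40)")
```

For example, `moment_infimum(2, [1, 2], [4, 9], [1, 4])` falls under the first case and compares $|4 - 1|^2$ with $|9^{1/2} - 4^{1/2}|^2$.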
7.6.4 Approximation of Queueing Systems with Prescribed Moments
In this section we discuss applications of Moment problem 1 (on page 52) to the problem of best approximation of a queueing system with known moment characteristics. As an example, suppose our “real” queueing system is of type G|G|1|∞ (for some acquaintance with the usual notations in queueing theory we refer to Borovkov (1984), Kalashnikov and Rachev (1990)). For this system, the sequences of nonnegative r.v.s (possibly dependent and nonidentically distributed) $e = \{e_n\}_{n \in \mathbb{N}}$, $s = \{s_n\}_{n \in \mathbb{N}}$ ($\mathbb{N} = \{1, 2, \dots\}$) are viewed as sequences of interarrival and service times. Looking at $e$ and $s$ as “input” flows, we define (as the “output” flow) the sequence of waiting times
$$w_1 = 0, \qquad w_{n+1} = (w_n + s_n - e_n)_+, \qquad n \in \mathbb{N}, \tag{7.6.41}$$
where (·)+ = max(0, ·). Since the distribution of w = {wn }n∈IN is not known, the aim is to approximate, model, or simulate the “real” system determined by the triplet (e, s, w) with a “simpler” queueing model (e∗ , s∗ , w∗ ). Assuming that the marginal distributions (the laws of ei , si ) are known, Borovkov (1984, Chapter 4) and Kalashnikov and Rachev (1990) examine different approximating models (e∗ , s∗ , w∗ ) and estimate the possible discrepancy between the “real” system (e, s, w) and the “ideal” model (e∗ , s∗ , w∗ ). Further, we shall relax the constraints “the laws of ei ’s and si ’s are known” by “certain moment characteristics of ei ’s and si ’s are fixed.” In this setup the solutions of Moment problem 1 are used in cases when the “ideal” model is not deterministic, say G|G|1|∞ but with simpler structure. We invoke Moment problem 2 (on page 53) when the approximation model has some deterministic components, like D|G|1|∞ (i.e., e∗j ’s are constants), or D|D|1|∞ (i.e., e∗j ’s and s∗j ’s are constants). Summarizing, we shall consider here the following two problems:
(a) Bounds for the deviation of output characteristics of two dependent queueing models.
(b) Approximation of queueing systems by deterministic-type queueing models.

Consider the following problem, which occurs in investigations of the stability of queueing models (see Kalashnikov and Rachev (1990, Chapter 5)). Suppose two queueing models of type G|G|1|∞, $(e, s, w)$ and $(e^*, s^*, w^*)$, with dependent characteristics are given. Here $e = \{e_n\}_{n \in \mathbb{N}}$, $s = \{s_n\}_{n \in \mathbb{N}}$, $w = \{w_n\}_{n \in \mathbb{N}}$ are, respectively, the sequences of interarrival, service, and waiting times. Assume that the components $e_j$, $s_j$, $j \in \mathbb{N}$, of the “input flows” $e$ and $s$ are dependent and nonidentically distributed. The “output” flow is given by the sequence of waiting times (7.6.41). Suppose that the distribution of $e_j$ (resp. $s_j$) is concentrated on a compact interval $[a_j, b_j]$ (resp. $[c_j, d_j]$). While this assumption is quite natural from the practical point of view, it is not used frequently in the literature, simply because it is easier to analyze queueing models with input distributions having unbounded support. We make similar assumptions for the model $(e^*, s^*, w^*)$; in particular, it is assumed that $a_j^* \le e_j^* \le b_j^*$, $c_j^* \le s_j^* \le d_j^*$ a.s. for all $j \in \mathbb{N}$. The input pairs $(e_j, e_j^*)$, $(s_j, s_j^*)$ of the two models are arbitrarily mutually dependent, and the distributions of the $e_j$'s, $e_j^*$'s, $s_j$'s, $s_j^*$'s are unknown. We assume that only the moments
$$Ee_j = \alpha_j, \qquad Ee_j^* = \alpha_j^*, \qquad Es_j = \beta_j, \qquad Es_j^* = \beta_j^* \tag{7.6.42}$$
are given. Our problem is to find a sharp bound for the deviation between the waiting times in both models. Let $\phi_k(e_{n-k,n-1}, s_{n-k,n-1})$ ($e_{n-k,n-1} := (e_{n-k}, \dots, e_{n-1})$, $s_{n-k,n-1} := (s_{n-k}, \dots, s_{n-1})$) be the waiting time for the $n$th arrival, assuming that the system is “free” at the moment $n - k$. In other words,
$$\phi_k(e_{n-k,n-1}, s_{n-k,n-1}) := \max[0,\ e_{n-1} - s_{n-1},\ (e_{n-1} - s_{n-1}) + (e_{n-2} - s_{n-2}),\ \dots,\ (e_{n-1} - s_{n-1}) + (e_{n-2} - s_{n-2}) + \cdots + (e_{n-k} - s_{n-k})]. \tag{7.6.43}$$
As a measure of deviation between the waiting times of the two systems we shall use
$$\delta_p(T) = \sup_{n \in \mathbb{N}} \max_{1 \le k \le T} L_p[\phi_k(e_{n,n+k-1}, s_{n,n+k-1}), \phi_k(e^*_{n,n+k-1}, s^*_{n,n+k-1})],$$
where $p \ge 1$ and $T \ge 2$ are fixed, and
$$L_p(X, Y) := \{E|X - Y|^p\}^{1/p}, \qquad p \ge 1, \qquad X, Y \in \mathcal{X}(\mathbb{R}). \tag{7.6.44}$$
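The relation between the recursion (7.6.41) and a windowed maximum of partial sums of the type (7.6.43) can be illustrated on a single sample path. This is a sketch under a stated assumption: the increments are taken as $s_n - e_n$, the sign convention of (7.6.41), whereas (7.6.43) as printed uses $e - s$; the function names are hypothetical.

```python
from itertools import accumulate


def waiting_times(e, s):
    """w_1 = 0 and w_{n+1} = max(0, w_n + s_n - e_n), as in (7.6.41)."""
    w = [0.0]
    for e_n, s_n in zip(e, s):
        w.append(max(0.0, w[-1] + s_n - e_n))
    return w


def phi_window(e_window, s_window):
    """Waiting time after the window for a system that is empty at its start.

    Computed as the maximum of 0 and the partial sums of the increments,
    summed backwards from the most recent customer.
    """
    increments = [s_n - e_n for e_n, s_n in zip(e_window, s_window)]
    backward_sums = accumulate(reversed(increments))
    return max([0.0, *backward_sums])


e = [1.0, 2.0, 0.5, 3.0]   # interarrival times of one sample path
s = [1.5, 1.0, 2.0, 0.5]   # service times of the same path
w = waiting_times(e, s)
```

Because the system starts empty ($w_1 = 0$), the windowed form applied to the first n customers reproduces the recursion, entry by entry.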
For random vectors we extend (7.6.44) as follows: Lp (X, Y ) = {E||X − Y ||p }1/p , X, Y ∈ X (RT ),
(7.6.45)
where $\|(x_1, \dots, x_T)\| = \sum_{i=1}^{T} |x_i|$. Since $\phi_k$ is a Lipschitz function with respect to the Minkowski norm $\|\cdot\|$, we have that for any $k = 1, \dots, T$,
$$\begin{aligned} &L_p[\phi_k(e_{n,n+k-1}, s_{n,n+k-1}), \phi_k(e^*_{n,n+k-1}, s^*_{n,n+k-1})] \\ &\quad \le L_p[(e_{n,n+T-1}, s_{n,n+T-1}), (e^*_{n,n+T-1}, s_{n,n+T-1})] + L_p[(e^*_{n,n+T-1}, s_{n,n+T-1}), (e^*_{n,n+T-1}, s^*_{n,n+T-1})] \\ &\quad \le \sum_{j=n}^{n+T-1} \left[L_p(e_j, e_j^*) + L_p(s_j, s_j^*)\right]. \end{aligned} \tag{7.6.46}$$
Now we invoke Theorem 7.6.5 to obtain sharp estimates of $L_p(e_j, e_j^*)$ and $L_p(s_j, s_j^*)$. Namely,
$$L_p(e_j, e_j^*) \le (D_j\delta_j + T_j)^{1/p}, \tag{7.6.47}$$
where
$$D_j := D_j(a_j, b_j, a_j^*, b_j^*) := |b_j - b_j^*|^p + |a_j - a_j^*|^p - |b_j - a_j^*|^p - |a_j - b_j^*|^p; \tag{7.6.48}$$
$$T_j := T_j(a_j, a_j^*, b_j, b_j^*, \alpha_j, \alpha_j^*) := (1 - B_j)|b_j - a_j^*|^p + (B_j + C_j - 1)|a_j - a_j^*|^p + (1 - C_j)|a_j - b_j^*|^p; \tag{7.6.49}$$
$$B_j := \frac{b_j - \alpha_j}{b_j - a_j}, \qquad C_j := \frac{b_j^* - \alpha_j^*}{b_j^* - a_j^*}; \tag{7.6.50}$$
$$\delta_j := \delta_j(a_j, a_j^*, b_j, b_j^*, \alpha_j, \alpha_j^*) := \max(0, 1 - B_j - C_j). \tag{7.6.51}$$
Remark 7.6.29 If the supports of $e_j$ and $e_j^*$ are unknown, i.e., $a_j = a_j^* = 0$, $b_j = b_j^* = +\infty$, then $\sup\{L_p(e_j, e_j^*);\ Ee_j = \alpha_j,\ Ee_j^* = \alpha_j^*\} = \infty$ (cf. Kuznezova-Sholpo and Rachev (1989)).

In a similar way,
$$L_p(s_j, s_j^*) \le (\tilde D_j\tilde\delta_j + \tilde T_j)^{1/p}, \tag{7.6.52}$$
where $\tilde D_j$, $\tilde\delta_j$, $\tilde T_j$ are defined by (7.6.48)–(7.6.51), exchanging $b_j$ with $d_j$, $b_j^*$ with $d_j^*$, $a_j$ with $c_j$, and $a_j^*$ with $c_j^*$. In this way we have proved the following theorem.

Theorem 7.6.30 For any $p \ge 1$ and $T = 2, 3, \dots$,
$$\delta_p(T) \le T \sup_{j \ge 1} \left[(D_j\delta_j + T_j)^{1/p} + (\tilde D_j\tilde\delta_j + \tilde T_j)^{1/p}\right]. \tag{7.6.53}$$
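The quantities (7.6.48)–(7.6.51) and the bound (7.6.53) are simple to evaluate once the supports and means are given. The sketch below is illustrative: the function names are hypothetical, and the supremum over $j \ge 1$ is replaced by a maximum over a finite list of indices supplied by the caller, which is an assumption of this example.

```python
def two_moment_bound(a, b, a_star, b_star, alpha, alpha_star, p):
    """(D*delta + T)**(1/p) with D, T, B, C, delta as in (7.6.48)-(7.6.51)."""
    B = (b - alpha) / (b - a)
    C = (b_star - alpha_star) / (b_star - a_star)
    delta = max(0.0, 1.0 - B - C)
    D = (abs(b - b_star) ** p + abs(a - a_star) ** p
         - abs(b - a_star) ** p - abs(a - b_star) ** p)
    T = ((1 - B) * abs(b - a_star) ** p
         + (B + C - 1) * abs(a - a_star) ** p
         + (1 - C) * abs(a - b_star) ** p)
    return (D * delta + T) ** (1.0 / p)


def delta_p_bound(horizon, interarrival, service, p):
    """Right-hand side of (7.6.53) over a finite list of indices j.

    interarrival[j] = (a_j, b_j, a_j*, b_j*, alpha_j, alpha_j*);
    service[j]      = (c_j, d_j, c_j*, d_j*, beta_j,  beta_j*).
    """
    worst = max(two_moment_bound(*e_j, p) + two_moment_bound(*s_j, p)
                for e_j, s_j in zip(interarrival, service))
    return horizon * worst
```

As a sanity check, for two variables supported on [0, 1] with mean 1/2 and p = 1 the bound equals 1, which is attained by a pair of antithetic two-point variables on {0, 1}.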
The estimate is sharp or nearly sharp, since the inequalities (7.6.47) and (7.6.52) are the best possible bounds under the moment assumptions (7.6.42) (cf. Theorem 7.6.5), and also, the inequality (7.6.46) cannot be improved in the set of all possible input flows $e$, $e^*$, $s$, $s^*$.

Next, we shall consider a much more general case than the single-channel models discussed above. Suppose the dynamics of a queueing system are determined by the transformation $F$ from the set $\mathcal{U}$ of input flows $U$ to the set $\mathcal{V}$ of output flows $V$. Let $V_0$ represent the output at moment zero; $V_0$ is assumed to be an $\ell$-dimensional vector; i.e., $V_0 \in \mathcal{X}(\mathbb{R}^\ell)$. It is quite general to assume that the input and the output flows have the form $U = (V_0, U_0, U_1, \dots)$ and $V = (V_0, V_1, \dots)$, where $U_j \in \mathcal{X}(\mathbb{R}^k)$. We endow $\mathcal{U}$ and $\mathcal{V}$ with the norms
$$\|U\|_{\mathcal{U}} := \sum_{j=0}^{\infty} 2^{-j}\|U_j\|_{k,p} + \|V_0\|_{\ell,p} \tag{7.6.54}$$
and
$$\|V\|_{\mathcal{V}} := \sum_{j=0}^{\infty} 2^{-j}\|V_j\|_{\ell,p}, \tag{7.6.55}$$
where $p \ge 1$,
$$\|U_j\|_{k,p} := (E\|U_j\|_k^p)^{1/p}, \qquad \|U_j\|_k = \|(U_j^{(1)}, \dots, U_j^{(k)})\| = |U_j^{(1)}| + \cdots + |U_j^{(k)}|,$$
and $\|V_j\|_{\ell,p}$ is defined in a similar way. Suppose the transformation $F : \mathcal{U} \to \mathcal{V}$ is determined by the set of mappings
$$F_j : \mathbb{R}^\ell \times \mathbb{R}^{kj} \to \mathbb{R}^\ell, \qquad j \in \mathbb{N}, \tag{7.6.56}$$
such that the output at “time” $j$ is defined recursively:
$$V_j = F_j(V_0, U_0, \dots, U_{j-1}). \tag{7.6.57}$$
A smoothness assumption on $F_j$ is given by the Lipschitz condition
$$\|F_j(\alpha_0, \beta_0, \dots, \beta_{j-1})\|_\ell \le c_j\left[\|\alpha_0\|_\ell + \sum_{i=0}^{j-1}\|\beta_i\|_k\right]. \tag{7.6.58}$$
A reasonably large number of queueing models meet conditions (7.6.56)–(7.6.58). Among them are the single-channel models G|G|1|∞, the multichannel models G|G|J|∞, and the multichannel–multiphased model (G|G|J1) → (G|J2) → · · · → (G|Jn) (cf. Kalashnikov and Rachev (1990,
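Chapter 5)). By (7.6.55), (7.6.57), and (7.6.58), $\|V_j\|_{\ell,p} \le c_j\left[\|V_0\|_{\ell,p} + \sum_{i=0}^{j-1}\|U_i\|_{k,p}\right]$, and thus
$$\|V\|_{\mathcal{V}} \le 2c_j\|U\|_{\mathcal{U}} \le 2c_j\left[\sum_{i=1}^{\ell}\|V_0^{(i)}\|_{\ell,p} + \sum_{i=1}^{k}\sum_{j=0}^{\infty} 2^{-j}\|V_j^{(i)}\|_{\ell,p}\right]. \tag{7.6.59}$$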
Combining (7.6.59) with Theorem 7.6.5 gives us a sharp bound on the deviation of two queueing models $V = FU$, $V^* = FU^*$, whose dynamics are determined by (7.6.54)–(7.6.58).

Theorem 7.6.31 Suppose
$$V = FU, \qquad U \in \mathcal{U}, \qquad V \in \mathcal{V} \tag{7.6.60}$$
is a queueing model satisfying (7.6.54)–(7.6.58) such that
$$a_0^{(i)} \le V_0^{(i)} \le b_0^{(i)} \text{ a.s.}, \qquad EV_0^{(i)} = \alpha_0^{(i)}, \qquad i = 1, \dots, \ell,$$
$$c_j^{(i)} \le V_j^{(i)} \le d_j^{(i)} \text{ a.s.}, \qquad EV_j^{(i)} = \beta_j^{(i)}, \qquad j = 0, 1, \dots, \quad i = 1, \dots, k.$$
In addition to model (7.6.60) consider the same type of model indexed by ∗ and satisfying the above two sets of inequalities with constants indexed by ∗. Then
$$\|V - V^*\|_{\mathcal{V}} \le 2c_j\left[\sum_{i=1}^{\ell}\left(D_0^{(i)}\delta_0^{(i)} + T_0^{(i)}\right)^{1/p} + \sum_{i=1}^{k}\sum_{j=0}^{\infty} 2^{-j}\left(\tilde D_j^{(i)}\tilde\delta_j^{(i)} + \tilde T_j^{(i)}\right)^{1/p}\right],$$
where the $D$'s, $\delta$'s, and $T$'s are determined by the same formula as in (7.6.58)–(7.6.60), and
$$D_0^{(i)} = D_i\left(a_0^{(i)}, b_0^{(i)}, a_0^{(i)*}, b_0^{(i)*}\right),$$
$$T_0^{(i)} = T_i\left(a_0^{(i)}, b_0^{(i)}, a_0^{(i)*}, b_0^{(i)*}, \alpha_0^{(i)}, \alpha_0^{(i)*}\right),$$
$$\delta_0^{(i)} = \delta_i\left(a_0^{(i)}, b_0^{(i)}, a_0^{(i)*}, b_0^{(i)*}, \alpha_0^{(i)}, \alpha_0^{(i)*}\right),$$
$$\tilde D_j^{(i)} = D_j\left(c_j^{(i)}, d_j^{(i)}, c_j^{(i)*}, d_j^{(i)*}\right),$$
$$\tilde T_j^{(i)} = T_j\left(c_j^{(i)}, d_j^{(i)}, c_j^{(i)*}, d_j^{(i)*}, \beta_j^{(i)}, \beta_j^{(i)*}\right),$$
$$\tilde\delta_j^{(i)} = \delta_j\left(c_j^{(i)}, d_j^{(i)}, c_j^{(i)*}, d_j^{(i)*}, \beta_j^{(i)}, \beta_j^{(i)*}\right).$$
The rest of this section deals with Problem (b) (on page 72). Suppose again that the “real” queueing system is determined by the triplet $(e, s, w)$, where $w$ is given by the recursive equation (7.6.41). Often in practice one models the random input characteristics by replacing their random values with constants, usually equal to the corresponding means. In doing so, it is natural to investigate the deviation between the “real” output $w$ and the modeled (“ideal”) output $w^*$. (In the sequel, all quantities related to the approximating model will have the same notations as in the “real” system but superscribed with ∗.) The deviation between $w$ and $w^*$ will be expressed by the Kantorovich metric $\ell_p$, defined here as follows: For $X, Y \in \mathcal{X}(\mathbb{R}^\infty)$,
$$\ell_p(X, Y) := \ell_p(P^X, P^Y) := \min\{L_p(\tilde X, \tilde Y);\ \tilde X, \tilde Y \in \mathcal{X}(\mathbb{R}^\infty),\ \tilde X \stackrel{d}{=} X,\ \tilde Y \stackrel{d}{=} Y\}, \qquad p > 0, \tag{7.6.61}$$
where $L_p(X, Y) := [E\,d^p(X, Y)]^q$, $q = \min(1, 1/p)$, is the $L_p$-metric. In the above definition, the space $\mathcal{X}(\mathbb{R}^\infty)$ consists of all random sequences taking values in the metric space $(\mathbb{R}^\infty, d)$, where $d(x, y) := \sum_{j=1}^{\infty} 2^{-j}\|x_j - y_j\|$. Since we have assumed that the underlying probability space is not atomic, the minimum in the right-hand side of (7.6.61) is equal to
$$\min\left\{\left[\int_{\mathbb{R}^\infty \times \mathbb{R}^\infty} d^p(x, y)\,P(dx, dy)\right]^q;\ P\text{'s are probabilities on } \mathbb{R}^\infty \times \mathbb{R}^\infty \text{ with fixed projections } P^X \text{ and } P^Y\right\}.$$
For $X^{(n)} := (X_1^{(n)}, X_2^{(n)}, \dots) \in \mathcal{X}_p(\mathbb{R}^\infty)$, $X = (X_1, X_2, \dots) \in \mathcal{X}_p(\mathbb{R}^\infty)$, we have $\ell_p(X^{(n)}, X) \ge 2^{-j}\ell_p(X_j^{(n)}, X_j)$, and thus $\ell_p(X^{(n)}, X) \to 0$ implies the weak convergence of any $j$-component, $X_j^{(n)} \stackrel{d}{\to} X_j$, and $E|X_j^{(n)}|^p \to E|X_j|^p$.

Further, we consider two types of approximating queues: D|G|1|∞ (i.e., the $e_j^*$ are constants) and D|D|1|∞ (i.e., the $e_j^*$ and $s_j^*$ are constants). Similar results can be obtained if one examines the model G|D|1|∞ (i.e., the $s_j^*$ are constants) as an approximation of the “real” queue G|G|1|∞.

In both queues D|G|1|∞ and G|G|1|∞, the sequences of service times $s^*$ and $s$ consist of dependent nonidentically distributed random variables. The next lemma shows that the outputs for the ideal and real models meet a lower bound of deviation if $s^*$ is chosen to have independent components.

Let $\varepsilon > 0$ and $X = (X_1, X_2, \dots) \in \mathcal{X}_p(\mathbb{R}^\infty)$. The components of $X$ are said to be $(\ell_p, \varepsilon)$-independent if
$$\mathrm{IND}(X) := \ell_p(X, \overline{X}) \le \varepsilon, \tag{7.6.62}$$
where the $\overline{X}_i$'s (the components of $\overline{X}$) are independent and $\overline{X}_i \stackrel{d}{=} X_i$ ($i \in \mathbb{N}$).

Lemma 7.6.32 Let the approximating model be of type D|G|1|∞. Assume that the sequences $e$ and $s$ of the queueing model G|G|1|∞ are independent. Then
$$\frac{1}{2}\ell_p(w, w^*) \le \mathrm{IND}(s) + \mathrm{IND}(s^*) + \sum_{j=1}^{\infty} 2^{-j}\left(\ell_p(e_j, e_j^*) + \ell_p(s_j, s_j^*)\right). \tag{7.6.63}$$
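Proof: Using the recursive equations (7.6.41) for $w$ and $w^*$, we obtain $\frac{1}{2}d(w, w^*) \le d(e, e^*) + d(s, s^*)$. Hence, $\frac{1}{2}L_p(w, w^*) \le L_p(e, e^*) + L_p(s, s^*)$. Since $e$ and $s$ (resp. $e^*$ and $s^*$) are independent, we have, passing to the minimal metrics, that
$$\frac{1}{2}\ell_p(w, w^*) \le \ell_p(e, e^*) + \ell_p(s, s^*). \tag{7.6.64}$$
By (7.6.61) and since the $e_j^*$ ($j \in \mathbb{N}$) are constants, we obtain the bound
$$\ell_p(e, e^*) = \sum_{j=1}^{\infty} 2^{-j}L_p(e_j, e_j^*) = \sum_{j=1}^{\infty} 2^{-j}\ell_p(e_j, e_j^*). \tag{7.6.65}$$
To estimate $\ell_p(s, s^*)$ in (7.6.64) we use the $(\ell_p, \varepsilon)$-independence characteristic defined in (7.6.62):
$$\ell_p(s, s^*) \le \mathrm{IND}(s) + \mathrm{IND}(s^*) + \ell_p(\overline{s}, \overline{s}^*), \tag{7.6.66}$$
where $\overline{s}$ (resp. $\overline{s}^*$) has independent components and $\overline{s}_j \stackrel{d}{=} s_j$ (resp. $\overline{s}_j^* \stackrel{d}{=} s_j^*$). We now invoke the “regularity” property of the Kantorovich metric:
$$\ell_p\left(\sum_{n=1}^{\infty} X^{(n)}, \sum_{n=1}^{\infty} Y^{(n)}\right) \le \sum_{n=1}^{\infty} \ell_p\left(X^{(n)}, Y^{(n)}\right) \tag{7.6.67}$$
for sequences $\{X^{(n)}\}_{n \ge 1} \subset \mathcal{X}_p(\mathbb{R}^\infty)$, $\{Y^{(n)}\}_{n \ge 1} \subset \mathcal{X}_p(\mathbb{R}^\infty)$ of independent components. Let $E_j$ be a sequence with components all equal to zero except for the $j$th component, which equals 1. Then by (7.6.67),
$$\ell_p(\overline{s}, \overline{s}^*) = \ell_p\left(\sum_{j=1}^{\infty} \overline{s}_jE_j, \sum_{j=1}^{\infty} \overline{s}_j^*E_j\right) \le \sum_{j=1}^{\infty} \ell_p\left(\overline{s}_jE_j, \overline{s}_j^*E_j\right) = \sum_{j=1}^{\infty} 2^{-j}\ell_p(s_j, s_j^*). \tag{7.6.68}$$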
Combining (7.6.64), (7.6.65), (7.6.66), and (7.6.68) proves the lemma. □
The estimate (7.6.63) suggests that the approximating model should be chosen with $s^*$ having independent components. If this is the case, then $\mathrm{IND}(s^*) = 0$, and the first problem is to estimate $\mathrm{IND}(s)$.

Lemma 7.6.33 (a) Suppose that the only information known about the “real” service times are the moments
$$Es_j^{q_1} = \beta_j, \qquad Es_j^{q_2} = \gamma_j, \qquad j \in \mathbb{N}, \tag{7.6.69}$$
and that the support of $F_{s_j}$ is $[0, \infty)$. Then
$$\mathrm{IND}(s) \le \sum_{j=1}^{\infty} 2^{-j}\Delta_j, \tag{7.6.70}$$
where
$$\Delta_j := \begin{cases} \left(2\beta_j^{1/q_1}\right)^{pq}, & \text{if } 0 < p \le q_1,\ 1 \le q_1 < q_2, \\ +\infty, & \text{if } 0 < q_1 < q_2 < p \text{ and } \beta_j^{1/q_1} \ne \gamma_j^{1/q_2}, \end{cases} \tag{7.6.71}$$
and $q = \min(1, 1/p)$.

(b) Suppose the support of $F_{s_j}$ is the compact interval $[c_j, d_j]$, and $\beta_j = Es_j$. Then (7.6.70) holds with
$$\Delta_j = \left(\tilde D_j\delta_j + T_j\right)^{1/p}, \qquad p \ge 1, \tag{7.6.72}$$
where
$$\tilde D_j = -2(d_j - c_j)^p, \qquad T_j = 2\left(1 - \frac{d_j - \beta_j}{d_j - c_j}\right), \qquad \text{and} \qquad \delta_j = \max\left(0,\ 1 - 2\,\frac{d_j - \beta_j}{d_j - c_j}\right).$$

Proof: Assertion (a) follows from Corollary 2 of Kuznezova-Sholpo and Rachev (1989) and (b) from Theorem 7.6.5. □
Lemma 7.6.34 Suppose that for every $j \in \mathbb{N}$ the first two moments of $e_j$ are known:
$$m_j := Ee_j, \qquad m_j^{(2)} := Ee_j^2, \qquad \sigma_j^2 := \operatorname{Var} e_j, \tag{7.6.73}$$
and let $a_j \le e_j \le b_j$ a.s.
(i) If $p \ge 2$ and $-\infty < a_j < b_j < \infty$ and if $e_j^*$ is chosen to be the midpoint of $[a_j, b_j]$, then
$$\ell_p(e_j, e_j^*) \le \left[\left(\frac{b_j - a_j}{2}\right)^{p-2}\left(m_j^{(2)} - m_j(a_j + b_j) + e_j^{*2}\right)\right]^{1/p}. \tag{7.6.74}$$
(ii) Suppose $0 < p \le 2$ and either $-\infty = a_j$, $+\infty = b_j$, or $-\infty < a_j < b_j < \infty$ and
$$\sigma_j \le \min[m_j - a_j,\ b_j - m_j]. \tag{7.6.75}$$
Then the “optimal” $e_j^*$ for the approximating model is given by $e_j^* = m_j$, and in this case
$$\ell_p(e_j, e_j^*) \le \sigma_j^{pq}, \qquad q = \min(1, 1/p). \tag{7.6.76}$$
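The bound (7.6.76) can be checked on small discrete laws. The sketch below is illustrative and assumes the unbounded-support case of part (ii); because $e_j^*$ is a constant, there is only one coupling and $\ell_p(e_j, e_j^*)$ reduces to $[E|e_j - e_j^*|^p]^q$. The helper names are hypothetical.

```python
def lp_to_constant(values, probs, const, p):
    """[E|X - const|^p]^q with q = min(1, 1/p), for a discrete X."""
    q = min(1.0, 1.0 / p)
    return sum(w * abs(x - const) ** p for x, w in zip(values, probs)) ** q


def mean_and_sigma_bound(values, probs, p):
    """Return (m, sigma**(p*q)): the mean and the right-hand side of (7.6.76)."""
    q = min(1.0, 1.0 / p)
    mean = sum(w * x for x, w in zip(values, probs))
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, probs))
    return mean, variance ** (0.5 * p * q)


values, probs = [0.0, 1.0, 4.0], [0.5, 0.3, 0.2]
checks = []
for p in (0.5, 1.0, 1.5, 2.0):
    m, bound = mean_and_sigma_bound(values, probs, p)
    checks.append((lp_to_constant(values, probs, m, p), bound))
```

For a symmetric two-point law $m \pm \sigma$ the two sides coincide for every $p$ in (0, 2], consistent with the sharpness of (7.6.76) stated in the proof of the lemma.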
Proof: This follows from Theorems 7.6.10, 7.6.11, and 7.6.12 after some obvious arguments. The estimates (7.6.74) and (7.6.76) are sharp. □

Lemma 7.6.35 (a) If $0 < p \le q_1$, $1 \le q_1 < q_2$, then
$$\sup\left\{\ell_p(s_j, s_j^*);\ n_j = Es_j^{q_1},\ n_j^{(2)} = Es_j^{q_2},\ n_j^* = Es_j^{*q_1},\ n_j^{*(2)} = Es_j^{*q_2}\right\} = \left(n_j^{1/q_1} + n_j^{*1/q_1}\right)^{pq}, \qquad q = \min(1, 1/p). \tag{7.6.77}$$
(b) Suppose $p \ge 1$, $c_j \le s_j \le d_j$, $c_j^* \le s_j^* \le d_j^*$ a.s., and $n_j = Es_j$, $n_j^* = Es_j^*$. Then
$$\ell_p(s_j, s_j^*) \le \left(\tilde D_j\delta_j + T_j\right)^{1/p}, \tag{7.6.78}$$
where $\tilde D_j = D_j(c_j, d_j, c_j^*, d_j^*)$, $\delta_j = \delta_j(c_j, d_j, c_j^*, d_j^*, n_j, n_j^*)$, $T_j = T_j(c_j, d_j, c_j^*, d_j^*, n_j, n_j^*)$ are given by (7.6.48)–(7.6.51).

Proof: Assertion (a) follows from Corollary 2 of Kuznezova-Sholpo and Rachev (1989) and (b) from Theorem 7.6.5. □

Lemmas 7.6.32–7.6.35 lead us to the main result.

Theorem 7.6.36 Let the approximating queueing model be of type D|G|1|∞. Assume that the sequences $e$ and $s$ of the “real” queueing model are independent. Then the Kantorovich metric between the sequences of waiting
times of the “approximating” and “real” models is bounded as follows: p (w, w∗ )
(7.6.79) ∗
≤ 2 IND(s) + 2 IND(s ) +
∞
2−j+1 (p (ej , e∗j ) + p (sj , s∗j )).
j=1
Each term in the right-hand side of (7.6.79) can be estimated as follows:
(a) An appropriate choice for the approximating sequence of service times will be $\mathrm{IND}(s^*) = 0$.
(b) If (7.6.69) holds, a bound for $\mathrm{IND}(s)$ is given by (7.6.70).
(c) If the means and variances of the $e_j$'s are known, then $\ell_p(e_j, e_j^*)$ can be estimated from above by (7.6.74).
(d) The last term in (7.6.79), $\ell_p(s_j, s_j^*)$, can be estimated by (7.6.77) (resp. (7.6.78)), provided that the corresponding moment conditions hold.

In the next theorem we shall omit the restriction that e and s are independent, but we shall assume that the approximating model is of completely deterministic type D|D|1|∞.

Theorem 7.6.37 If the approximating queueing model is of type D|D|1|∞, then
$$\ell_p(w, w^*) \le \sum_{j=1}^{\infty} 2^{-j+1}\left(\ell_p(e_j, e_j^*) + \ell_p(s_j, s_j^*)\right). \tag{7.6.80}$$
If the first moments of $e_j$ and $s_j$ are fixed, then $\ell_p(e_j, e_j^*)$ and $\ell_p(s_j, s_j^*)$ can be estimated as in Lemma 7.6.34. The proof is similar to that of Theorem 7.6.36.
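The geometric weights $2^{-j+1}$ in (7.6.80) mean that discrepancies at late indices contribute little to the bound. The following Python sketch evaluates a truncated version of the right-hand side. It assumes that per-index upper bounds for $\ell_p(e_j, e_j^*)$ and $\ell_p(s_j, s_j^*)$ are already available; the function name and its inputs are hypothetical and only serve as an illustration. Truncation drops nonnegative tail terms, so the value is a lower approximation of the full series.

```python
def waiting_time_bound(arrival_bounds, service_bounds):
    """Truncated right-hand side of (7.6.80): sum over j of 2^(-j+1) * (a_j + b_j).

    arrival_bounds[j-1] is an assumed upper bound a_j for l_p(e_j, e_j*),
    service_bounds[j-1] is an assumed upper bound b_j for l_p(s_j, s_j*).
    """
    total = 0.0
    for j, (a_j, b_j) in enumerate(zip(arrival_bounds, service_bounds), start=1):
        total += 2.0 ** (-j + 1) * (a_j + b_j)
    return total


# With constant bounds a_j = a and b_j = b the full series sums to 2 * (a + b).
approx = waiting_time_bound([0.1] * 60, [0.2] * 60)
```

With constant per-index bounds the series is geometric, which gives a quick sanity check on the truncation.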
7.6.5 Rounding Random Numbers with Fixed Moments
In this part we shall discuss the interplay between Moment problems 3 and 4 (on pages 53, 54) and the problem of rounding of random proportions. Given a vector $X = (X_1, \dots, X_n)$ of r.v.s consider the sum $X_1 + \cdots + X_n$. If the $X_i$'s are uniformly distributed on the simplex $\{(s_i) \ge 0;\ s_1 + \cdots + s_n = 1\}$, then they can be treated as proportions, and clearly $S_n := X_1 + \cdots + X_n = 1$. If $S_n^*$ is the sum of conventional roundings $[X_1]_{1/2} + \cdots + [X_n]_{1/2}$, then Mosteller, Youtz, and Zahn (1967), Diaconis and Freedman (1979), and Balinski and Rachev (1993) have estimated the probability that $S_n = S_n^*$. Here we shall examine the closeness between $S_n$ and $S_n^*$ in the case of i.i.d. observations $X_i$ where only one or two moments are known.

Suppose $\{X_i\}_{i \in \mathbb{N}}$ are nonnegative i.i.d. r.v.s with known moments
$$EX_1 = d_1, \qquad EX_1^r = d_r. \tag{7.6.81}$$
The c-rounding $[\cdot]_c$ ($c \in [0, 1]$) (see Section 7.6.2) gives us the sequence of i.i.d. roundings $\{[X_i]_c\}_{i \in \mathbb{N}}$. Let $V_i := [X_i]_c - X_i$ be the rounding error, and let $S_{n,c} = \sum_{i=1}^{n} V_i$ be the total rounding error. Then the normalized rounding error $n^{-1} S_{n,c}$ converges by the LLN to $E[X_1]_c - d_1$. Our objective here is to find sharp bounds for the distribution function of $n^{-1} S_{n,c}$ subject to (7.6.81). In other words, for a suitably chosen metric $\mu$ in the distribution functions space, the problem is to determine the “radius” of the set of probabilistic laws, i.e.,
$$D_n = D_n(\mu) := \sup\left\{\mu\left(n^{-1} S_{n,c},\ E[X]_c - d_1\right);\ EX = d_1,\ EX^r = d_r\right\}. \tag{7.6.82}$$
In (7.6.82), X has the same distribution as the $X_i$'s, and thus $E[X]_c - d_1 = EV$, where $V \overset{d}{=} V_1$. Clearly, there is a great variety of metrics $\mu$ from which one can choose in (7.6.82). We shall consider two metrics: the ideal metric $\theta_s$ ($s > 1$), which is especially designed for the problem, and the Lévy metric L. Note that Theorems 7.6.18–7.6.21 provide us with sharp bounds for $E[X]_c - d_1$ in the case of the conventional rounding $c = \frac{1}{2}$.(6) In fact, with $[X] = [X]_{1/2}$,
$$\sup\{E[X] - d_1;\ EX = d_1,\ EX^r = d_r\} = U - d_1, \tag{7.6.83}$$
$$\inf\{E[X] - d_1;\ EX = d_1,\ EX^r = d_r\} = L - d_1, \tag{7.6.84}$$
where the exact values of U and L are given in Theorems 7.6.18–7.6.21. Next we can choose $\mu$ in the definition of $D_n = D_n(\mu)$ to be the Lévy metric
$$L(X, Y) = \inf\{\varepsilon > 0;\ F_X(x - \varepsilon) - \varepsilon \le F_Y(x) \le F_X(x + \varepsilon) + \varepsilon \text{ for all } x \in \mathbb{R}\},$$
and thus for the distribution function $F_n$ of $n^{-1} S_{n,c}$ we obtain the following bounds:
$$0 \le F_n(x) \le D_n \quad \text{for } 0 \le x \le L - D_n, \tag{7.6.85}$$
$$0 \le F_n(x) \le 1 \quad \text{for } L - D_n \le x \le U - D_n, \tag{7.6.86}$$
$$1 - D_n \le F_n(x) \le 1 \quad \text{for } U + D_n \le x. \tag{7.6.87}$$
(6) The general case of c ≤ 1 was treated in Anastassiou and Rachev (1992).
From Theorems 7.6.18–7.6.21 it follows that the above bounds are sharp. Our next step is to find a good estimate for $D_n = D_n(L)$. To this end we first estimate $D_n(\theta_s)$ for
$$\theta_s(X, Y) = \sup |Ef(X) - Ef(Y)|. \tag{7.6.88}$$
Here, the supremum is taken over all bounded functions f on $\mathbb{R}$ with integrable second derivative, $\|f''\|_q \le 1$, $1 < s < 2$, $q = \frac{1}{2-s}$. The next lemma shows that the $\theta_s$-radius $D_n(\theta_s) = O(n^{1-s})$ for all $1 < s < 2$. We use the notation $\vee := \max$.

Lemma 7.6.38 For $1 < s < 2$,
$$D_n(\theta_s) \le \tilde{c}\, n^{1-s}, \qquad \tilde{c} := \frac{2}{s}\,(c \vee (1 - c))^s.$$
Proof: For any X and Y with equal means,
$$\theta_s(X, Y) \le \frac{1}{s}\,\kappa_s(X, Y), \qquad \kappa_s(X, Y) := \inf\left\{E\left|\tilde{X}|\tilde{X}|^{s-1} - \tilde{Y}|\tilde{Y}|^{s-1}\right|;\ \tilde{X} \overset{d}{=} X,\ \tilde{Y} \overset{d}{=} Y\right\}.$$
Therefore, from the ideality of $\theta_s$,(7)
$$D_n(\theta_s) \le n^{1-s}\,\theta_s(V, EV) \le n^{1-s}\,\frac{1}{s}\,\kappa_s(V, EV) = n^{1-s}\,\frac{1}{s}\,E\left|V|V|^{s-1} - (EV)|EV|^{s-1}\right| \le \tilde{c}\, n^{1-s}.$$
The latter follows since $|V| = |X - [X]_c| \in (0, c \vee (1 - c))$. □
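The quantities in Lemma 7.6.38 can be made concrete with a small simulation. The Python sketch below assumes one common form of c-rounding: round down when the fractional part is below c, and round up otherwise. The definition from Section 7.6.2 is not reproduced here, so this form is an assumption, as are the function names and the exponential test distribution. The sketch computes the rounding errors $V_i$, their average $n^{-1} S_{n,c}$, and the right-hand side of the lemma.

```python
import math
import random


def c_round(x, c):
    """Assumed c-rounding: round down if the fractional part is below c, else round up."""
    base = math.floor(x)
    return base if x - base < c else base + 1


def theta_radius_bound(n, s, c):
    """Right-hand side of Lemma 7.6.38: (2/s) * max(c, 1 - c)**s * n**(1 - s)."""
    return (2.0 / s) * max(c, 1.0 - c) ** s * n ** (1.0 - s)


random.seed(0)
c = 0.5
xs = [random.expovariate(1.0) for _ in range(20000)]  # nonnegative i.i.d. sample
errors = [c_round(x, c) - x for x in xs]               # V_i = [X_i]_c - X_i
mean_error = sum(errors) / len(errors)                 # normalized error n^{-1} S_{n,c}
max_abs_error = max(abs(v) for v in errors)            # never exceeds max(c, 1 - c)
```

Under this rounding rule every error lies in $[-c, 1 - c]$, which is the boundedness used in the last step of the proof.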
In the next theorem we bound $D_n = D_n(L)$ in (7.6.82) as $O\left(n^{-\frac{s-1}{1+s}}\right)$.

Theorem 7.6.39 For any $1 < s < 2$, $0 < c < 1$,
$$D_n(L) \le (4\tilde{c})^{\frac{1}{1+s}}\, n^{\frac{1-s}{1+s}},$$
where the constant $\tilde{c}$ is defined as in Lemma 7.6.38.

(7) $\theta_s$ is an ideal metric of order $s > 0$; that is, $\theta_s\left(\sum_i c_i X_i, \sum_i c_i Y_i\right) \le \sum_i |c_i|^s\, \theta_s(X_i, Y_i)$ for all independent $X_i$, $Y_i$ and constants $c_i \in \mathbb{R}$.
Proof: The following claim was proved by Grigorevski and Shiganov (1976) for the case s = 2; i.e., in (7.6.88) the functions f have a.e. $f''$ and $|f''| \le 1$ a.e.; see also Maejima and Rachev (1987) and Rachev and Rüschendorf (1992).

Claim: For any $1 < s < 2$,
$$\theta_s(X, Y) \ge \frac{1}{4}\, L^{1+s}(X, Y).$$
Proof of the Claim: Let $L(X, Y) > \varepsilon$. Then there exists $x_0$ such that either $F_X(x_0) > F_Y(x_0 + \varepsilon) + \varepsilon$ or $F_Y(x_0) > F_X(x_0 + \varepsilon) + \varepsilon$. Say the first inequality takes place. Define
$$f_0(x) := \begin{cases} 1 & \text{for } x \le x_0; \\ 1 - \left(\dfrac{2(x - x_0)}{\varepsilon}\right)^2 & \text{for } x_0 < x \le x_0 + \dfrac{\varepsilon}{2}; \\ -1 + \left(\dfrac{2(x_0 + \varepsilon - x)}{\varepsilon}\right)^2 & \text{for } x_0 + \dfrac{\varepsilon}{2} \le x < x_0 + \varepsilon; \\ -1 & \text{for } x \ge x_0 + \varepsilon. \end{cases}$$
Observe that $|f_0(x)| \le 1$, $f_0''(x)$ exists a.e., and
$$\|f_0''\|_q = \left[\int_{x_0}^{x_0 + \varepsilon} |f_0''(x)|^q\, dx\right]^{1/q} = 8\varepsilon^{-s} =: c(\varepsilon)$$
for $q = \frac{1}{2-s}$. Recalling the definition of $\theta_s$, we have
$$\begin{aligned}
\theta_s(X, Y) &\ge \left|E\frac{f_0(X)}{c(\varepsilon)} - E\frac{f_0(Y)}{c(\varepsilon)}\right| = \frac{1}{c(\varepsilon)}\left|\int (f_0(x) + 1)\, d[F_X(x) - F_Y(x)]\right| \\
&= \frac{1}{c(\varepsilon)}\left|\int_{-\infty}^{x_0} (f_0(x) + 1)\, dF_X(x) + \int_{x_0}^{\infty} (f_0(x) + 1)\, dF_X(x) - \int_{-\infty}^{x_0 + \varepsilon} (f_0(x) + 1)\, dF_Y(x) - \int_{x_0 + \varepsilon}^{\infty} (f_0(x) + 1)\, dF_Y(x)\right| \\
&\ge \frac{1}{c(\varepsilon)}\left[\int_{-\infty}^{x_0} (f_0(x) + 1)\, dF_X(x) - \int_{-\infty}^{x_0 + \varepsilon} (f_0(x) + 1)\, dF_Y(x)\right] \\
&\ge \frac{2}{c(\varepsilon)}\left[F_X(x_0) - F_Y(x_0 + \varepsilon)\right] \ge \frac{2\varepsilon}{c(\varepsilon)} = \frac{1}{4}\,\varepsilon^{1+s}.
\end{aligned}$$
Letting $\varepsilon \to L(X, Y)$ completes the proof of the claim. Now the desired estimate follows from Lemma 7.6.38 and the claim. □
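The way the claim converts the $\theta_s$-rate into a Lévy-metric rate can be checked numerically. The Python sketch below (function names are illustrative, not from the text) evaluates the bound of Theorem 7.6.39 and makes explicit that it is obtained by solving $\frac{1}{4} L^{1+s} \le \tilde{c}\, n^{1-s}$ for L.

```python
def c_tilde(s, c):
    """Constant of Lemma 7.6.38: (2/s) * max(c, 1 - c)**s."""
    return (2.0 / s) * max(c, 1.0 - c) ** s


def levy_radius_bound(n, s, c):
    """Bound of Theorem 7.6.39: (4 * c_tilde)**(1/(1+s)) * n**((1-s)/(1+s))."""
    if not (1.0 < s < 2.0 and 0.0 < c < 1.0):
        raise ValueError("the theorem assumes 1 < s < 2 and 0 < c < 1")
    return (4.0 * c_tilde(s, c)) ** (1.0 / (1.0 + s)) * n ** ((1.0 - s) / (1.0 + s))


# Raising the Levy bound to the power 1 + s recovers 4 * c_tilde * n**(1 - s).
s, c, n = 1.5, 0.5, 1000
bound = levy_radius_bound(n, s, c)
recovered = bound ** (1.0 + s)
```

The exponent $(1-s)/(1+s)$ is negative for $1 < s < 2$, so the bound decreases in n, although more slowly than the $\theta_s$-rate $n^{1-s}$.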
8 Application of Kantorovich-Type Metrics to Various Probabilistic-Type Limit Theorems
We have discussed already in detail the Kantorovich metric as the solution of mass transportation and mass transshipment problems with a metric cost function; cf. Section 2.5 and Chapter 4. In Chapter 7 we studied generalized transshipment problems, leading to extensions of the Kantorovich metric to encompass a variety of ideal probability metrics. This chapter is devoted to applications of these metrics to the rate of convergence problem in the central limit theorem (CLT) and different summability methods for random vectors. We also discuss applications to the asymptotics of various rounding rules.
8.1 Rate of Convergence in the CLT with Respect to the Kantorovich Metric

In this section, we investigate bounds for the rate of convergence in the CLT with respect to the Kantorovich metric for random variables with values in separable Banach spaces. In the first part, the rate in stable limit theorems for sums of i.i.d. random variables is considered. The method of proof is an extension of the Bergström convolution method. All assumptions regarding the domain of attraction are given in a metric form. In the second part an extension is given to the martingale case. The proof is based on smoothing properties of suitable conditional versions of the Kantorovich metric.

Smoothing inequalities for the Kantorovich metric will be established, and the Bergström convolution method (cf. Zolotarev (1977, 1979, 1983, 1986), Senatov (1980), Sazonov (1981), Rachev and Yukich (1989, 1991), Rachev (1991c)) will be extended to the case of stable limit theorems and at the same time to the Kantorovich metric. All assumptions concerning the domain of attraction and the order of convergence are described in terms of finiteness conditions for certain convolution-type metrics. As a consequence of the results for the Kantorovich metric, one obtains rate of convergence results in stable limit theorems for martingales with respect to the Prohorov metric.(1)

We start with the rate of convergence in the i.i.d. case. Consider a separable Banach space $(U, \|\cdot\|)$ and the space $\mathcal{X}(U)$ of U-valued r.v.s defined on a rich enough probability space. The r.v. $\vartheta \in \mathcal{X}(U)$ is said to be α-stable ($0 < \alpha \le 2$) if
$$n^{-1/\alpha} \sum_{i=1}^{n} \vartheta_i \overset{d}{=} \vartheta, \tag{8.1.1}$$
where the $\vartheta_i$'s are i.i.d. copies of $\vartheta$. We are interested in the rate of convergence of the normalized sum
$$Z_n = n^{-1/\alpha} \sum_{i=1}^{n} X_i \tag{8.1.2}$$
of i.i.d. r.v.s to $\vartheta$ with respect to the Kantorovich metric:
$$\ell_1(X, Y) := \sup\{|E(f(X) - f(Y))|;\ f: U \to \mathbb{R} \text{ bounded},\ |f(x) - f(y)| \le \|x - y\|\}. \tag{8.1.3}$$
Recall from Chapters 2 and 4 that $\ell_1$-convergence is equivalent to convergence in distribution and convergence of the moments $E\|\cdot\|$ (existence assumed); moreover, for $U = \mathbb{R}$, $\ell_1(X, Y) = \int |F_X(x) - F_Y(x)|\, dx$. The Prohorov metric
$$\pi(X, Y) := \inf\{\varepsilon > 0;\ P(X \in A) \le P(Y \in A^{\varepsilon}) + \varepsilon \text{ for all Borel sets } A \text{ in } U\} \tag{8.1.4}$$
($A^{\varepsilon} := \{x;\ \|x - A\| < \varepsilon\}$) and the Kantorovich metric satisfy the well-known inequality
$$\pi^2 \le \ell_1, \tag{8.1.5}$$
which is in fact an immediate consequence of the Strassen and Kantorovich theorems. In particular, $\ell_1$-convergence rates imply convergence rates for $\pi$.

(1) Some results in the literature are formulated more generally but use bounds involving moments of order ≥ 2 and, therefore, are restricted to the Gaussian case. For some recent literature we refer to Bolthausen (1982), Häussler (1988), Bentkus et al. (1990), and Rackauskas (1990). Our method involves various extensions of an idea in Gudynas (1985) on suitably conditioned versions of probability metrics. The results in this section are based on Rachev and Rüschendorf (1994a).

Theorem 8.1.1 For any $0 < \alpha < 1$,
$$\ell_1(Z_n, \vartheta) \le n^{1 - 1/\alpha}\, \ell_1(X_1, \vartheta). \tag{8.1.6}$$
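For $U = \mathbb{R}$ the integral formula $\ell_1(X, Y) = \int |F_X(x) - F_Y(x)|\, dx$ recalled before Theorem 8.1.1 is easy to evaluate for empirical laws. The Python sketch below uses the standard fact that, for two samples of equal size, this integral equals the average absolute difference of the order statistics. The function names are illustrative, and the equal-size restriction is an assumption made to keep the example short. The second helper evaluates the factor $n^{1-1/\alpha}$ from (8.1.6).

```python
def kantorovich_1d(xs, ys):
    """l_1 distance between two empirical laws on the real line (equal sample sizes).

    For equal sizes, the integral of |F_x - F_y| equals the mean absolute
    difference of the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def rate_factor(n, alpha):
    """Factor n**(1 - 1/alpha) from (8.1.6); it tends to 0 in n when 0 < alpha < 1."""
    return n ** (1.0 - 1.0 / alpha)


d = kantorovich_1d([0.0, 1.0], [1.0, 2.0])                 # a shift by 1 costs 1
d_scaled = kantorovich_1d([0.0, 3.0], [3.0, 6.0])          # scaling by 3 multiplies by 3
```

The scaling behavior in the last line reflects the homogeneity of order 1 of $\ell_1$.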
Proof: The result follows from (8.1.1)–(8.1.3) and the contraction properties of $\ell_1$; in fact, if $\vartheta_i$ are i.i.d. copies of $\vartheta$, then
$$E\left\|Z_n - n^{-1/\alpha} \sum_{i=1}^{n} \vartheta_i\right\| \le n^{1 - 1/\alpha}\, E\|X_1 - \vartheta_1\|. \tag{8.1.7}$$
Next, we take in both sides of (8.1.7) the infimum over all joint distributions $P^{X_1, \vartheta_1}$ with fixed marginals $P^{X_1}$ and $P^{\vartheta}$. The result is (8.1.6) as desired. □

Note that (8.1.6) is a general property for every ideal metric of order 1; see for example Zolotarev (1979). Recall that a probability metric $\mu$ is said to be ideal of order r if
$$\mu(X + Z, Y + Z) \le \mu(X, Y) \tag{8.1.8}$$
for all r.v.s X, Y, Z such that Z is independent of (X, Y) and
$$\mu(cX, cY) = |c|^r \mu(X, Y) \quad \text{for all } c \in \mathbb{R}, \tag{8.1.9}$$
see Sections 6.3 and 6.4. Consider next the rate of convergence in
$$\ell_1(Z_n, \vartheta) \to 0 \tag{8.1.10}$$
for $1 < \alpha \le 2$. Define the following ideal (smoothing) Kantorovich metric of order $r > 1$:
$$\boldsymbol{\ell}_r(X, Y) = \sup_{h > 0} h^{r-1}\, \ell_1(X + h\vartheta, Y + h\vartheta), \qquad r > 1, \tag{8.1.11}$$
and
$$\boldsymbol{\sigma}_r(X, Y) = \sup_{h > 0} h^{r}\, \sigma(X + h\vartheta, Y + h\vartheta), \qquad r > 0. \tag{8.1.12}$$
Here $\vartheta$ in (8.1.11) and (8.1.12) is assumed to be independent of X and Y, and $\sigma$ is the total variation metric:
$$\sigma(X, Y) = \sup\{|E(f(X) - f(Y))|;\ f: U \to [0, 1] \text{ continuous}\} = 2 \sup_{A \in \mathcal{B}(U)} |P(X \in A) - P(Y \in A)|. \tag{8.1.13}$$
Note that $\boldsymbol{\ell}_r$ and $\boldsymbol{\sigma}_r$ are ideal metrics of order r. Throughout this section $\boldsymbol{\ell}_r$ stands for the smoothed $\ell_1$-metric of order r. The notation $\ell_p$ has been used in previous sections for the minimal $L_p$-metric. So, we have increased the level of “ideality” for $\ell_1$ and $\sigma$ (recall that $\ell_1$ is an ideal metric of order 1, while $\sigma$ is ideal of order 0) by appropriate smoothing; see (8.1.11) and (8.1.12). The next theorem provides an estimate of the convergence rate in (8.1.10). In what follows C stands for an absolute constant that can be different in different places. Set $\ell_1 = \ell_1(X_1, \vartheta)$, $\boldsymbol{\ell}_r = \boldsymbol{\ell}_r(X_1, \vartheta)$, $\sigma = \sigma(X_1, \vartheta)$, $\boldsymbol{\sigma}_r = \boldsymbol{\sigma}_r(X_1, \vartheta)$. We always assume r > 0. The results in this section are due to Rachev and Rüschendorf (1994).

Theorem 8.1.2 Suppose that
(a) $E\|\vartheta\| < \infty$;
(b) $\ell_1 + \boldsymbol{\ell}_r + \boldsymbol{\sigma}_1 + \boldsymbol{\sigma}_r < \infty$.
Then
$$\ell_1(Z_n, \vartheta) \le C\left(n^{1 - r/\alpha}\, \boldsymbol{\ell}_r + \tau_r\, n^{-1/\alpha}\right), \tag{8.1.14}$$
where
$$\tau_r = \max\left(\ell_1,\ \boldsymbol{\sigma}_1,\ \boldsymbol{\sigma}_r^{1/(r - \alpha)}\right). \tag{8.1.15}$$
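Remark 8.1.3 Zolotarev (1986, §5.4) provides a similar bound for $\ell_1(Z_n, \vartheta)$ in the normal univariate case. Zolotarev's bound contains $\zeta_r$ metrics in the right-hand side of (8.1.14), which can be easily estimated from above in the normal case. In the stable case, however, we need more refined bounds. The problem of finiteness of $\boldsymbol{\sigma}_r$ was discussed in Rachev and Yukich (1989) (see also Section 8.3); for the finiteness of $\boldsymbol{\ell}_r$ see the next corollary.

Further in this section the sum of any random variables X + Y means $\tilde{X} + \tilde{Y}$, where $\tilde{X}$ and $\tilde{Y}$ are independent and $\tilde{X} \overset{d}{=} X$, $\tilde{Y} \overset{d}{=} Y$. The r.v.s $\vartheta$ and $\vartheta_i$ are defined as in (8.1.1) and satisfy (a).

Proof: The proof is similar to that of Theorem 8.1.16, further in this section, which we shall give in detail. Here we give only a short sketch of the proof. It uses the following two properties of the metrics $\ell_1$, $\boldsymbol{\ell}_r$, $\boldsymbol{\sigma}_r$; see Zolotarev (1986, §5.4).

Smoothing Property 1. For any $X, Y \in \mathcal{X}(U)$,
$$\ell_1(X, Y) \le \ell_1(X + \varepsilon\vartheta, Y + \varepsilon\vartheta) + 2\varepsilon E\|\vartheta\|. \tag{8.1.16}$$
Smoothing Property 2. For any X, Y, Z, W independent,
$$\ell_1(X + Z, Y + Z) \le \ell_1(Z, W)\,\sigma(X, Y) + \ell_1(X + W, Y + W). \tag{8.1.17}$$
Next, let $m = [\frac{n}{2}]$; then by (8.1.16), $\ell_1(Z_n, \vartheta_1) \le \ell_1(Z_n + \varepsilon\vartheta, \vartheta_1 + \varepsilon\vartheta) + C\varepsilon$, and
$$\begin{aligned}
\ell_1(Z_n + \varepsilon\vartheta, \vartheta_1 + \varepsilon\vartheta) &\le \ell_1\left(Z_n + \varepsilon\vartheta,\ \frac{\vartheta_1 + X_2 + \cdots + X_n}{n^{1/\alpha}} + \varepsilon\vartheta\right) \\
&\quad + \sum_{j=1}^{m} \ell_1\left(\frac{\vartheta_1 + \cdots + \vartheta_j + X_{j+1} + \cdots + X_n}{n^{1/\alpha}} + \varepsilon\vartheta,\ \frac{\vartheta_1 + \cdots + \vartheta_{j+1} + X_{j+2} + \cdots + X_n}{n^{1/\alpha}} + \varepsilon\vartheta\right) \\
&\quad + \ell_1\left(\frac{\vartheta_1 + \cdots + \vartheta_{m+1} + X_{m+2} + \cdots + X_n}{n^{1/\alpha}} + \varepsilon\vartheta,\ \vartheta_1 + \varepsilon\vartheta\right) \\
&= I_0 + \sum_{j=1}^{m} I_j + I_{m+1}.
\end{aligned}$$
By (8.1.17),
$$\begin{aligned}
I_0 &\le \ell_1\left(\frac{X_2 + \cdots + X_n}{n^{1/\alpha}},\ \frac{\vartheta_2 + \cdots + \vartheta_n}{n^{1/\alpha}}\right) \sigma\left(n^{-1/\alpha} X_1 + \varepsilon\vartheta,\ n^{-1/\alpha}\vartheta_1 + \varepsilon\vartheta\right) \\
&\quad + \ell_1\left(\frac{X_1 + \vartheta_2 + \cdots + \vartheta_n}{n^{1/\alpha}} + \varepsilon\vartheta,\ \frac{\vartheta_1 + \cdots + \vartheta_n}{n^{1/\alpha}} + \varepsilon\vartheta\right).
\end{aligned}$$
Similar upper bounds are obtained for $I_j$, $1 \le j \le m + 1$. Some of the terms obtained in this way can be estimated using the ideality properties of the metrics. For example, a term of the following form, $\Delta = (m + 1)\,\ell_1\left(\frac{X_1 + \vartheta_2 + \cdots + \vartheta_n}{n^{1/\alpha}}, \vartheta\right)$, can be estimated by
$$\begin{aligned}
\Delta &= (m + 1)\,\ell_1\left(n^{-1/\alpha} X_1 + \left(\frac{n-1}{n}\right)^{1/\alpha}\vartheta,\ n^{-1/\alpha}\vartheta_1 + \left(\frac{n-1}{n}\right)^{1/\alpha}\vartheta\right) \\
&\le (m + 1)\left(\frac{n}{n-1}\right)^{\frac{r-1}{\alpha}} \boldsymbol{\ell}_r\left(n^{-1/\alpha} X_1,\ n^{-1/\alpha}\vartheta\right) \\
&\le (m + 1)\left(\frac{n}{n-1}\right)^{\frac{r-1}{\alpha}} n^{-r/\alpha}\, \boldsymbol{\ell}_r \le C n^{1 - r/\alpha}\, \boldsymbol{\ell}_r,
\end{aligned}$$
where in the first inequality we use the obvious relation $\boldsymbol{\ell}_r(X, Y) \ge h^{r-1}\,\ell_1(X + h\vartheta, Y + h\vartheta)$.

For terms of the form $B_j := \ell_1\left(\frac{X_1 + \cdots + X_j}{j^{1/\alpha}}, \frac{\vartheta_1 + \cdots + \vartheta_j}{j^{1/\alpha}}\right)$ we use an induction argument to get the bound $B_j \le C\left(\boldsymbol{\ell}_r\, j^{1 - r/\alpha} + \tau_r\, j^{-1/\alpha}\right)$. For more details see the proof of Theorem 8.1.16. □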
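Corollary 8.1.4 Suppose that $U = \mathbb{R}^k$ and that $\vartheta$ has a Fréchet differentiable density $p_\vartheta$ and let
$$C(\vartheta) = \sup_{\|z\| \le 1} \int |p_\vartheta'(y)(z)|\, dy < \infty. \tag{8.1.18}$$
Suppose that $E\|\vartheta\| < \infty$ and $\ell_1 + \boldsymbol{\ell}_r < \infty$. Then
$$\ell_1(Z_n, \vartheta) \le C\left(n^{1 - r/\alpha}\, \boldsymbol{\ell}_r + \tau_r^*\, n^{-1/\alpha}\right), \tag{8.1.19}$$
where $\tau_r^* = \max\left(\ell_1,\ \boldsymbol{\ell}_r^{1/(r - \alpha)}\right)$.

For an integer r, $\boldsymbol{\ell}_r$ can be estimated from above by the $\zeta_r$-metric (see Zolotarev (1983, p. 294)): $\boldsymbol{\ell}_r \le C\zeta_r$ if $\sup_{\|z\| \le 1} \int |p_\vartheta^{(r)}(y)(z)|\, dy$ is finite. We shall discuss the finiteness of $\boldsymbol{\ell}_r$ in Section 8.5 in more detail.

Proof: Claim 1. For any X and $Y \in \mathcal{X}(\mathbb{R}^k)$ and $\delta > 0$,
$$\sigma(X + \delta\vartheta, Y + \delta\vartheta) \le C(r)\,\delta^{-r}\, \boldsymbol{\ell}_r(X, Y), \tag{8.1.20}$$
with $C(r) = 2^{r/\alpha}\, C(\vartheta)$. To prove the claim we first use the obvious bound
$$\sigma(X + \delta\vartheta, Y + \delta\vartheta) \le \delta^{-r}\, \boldsymbol{\sigma}_r(X, Y). \tag{8.1.21}$$
Next, we show that for any $\delta > 0$,
$$\sigma(X + \delta\vartheta, Y + \delta\vartheta) \le \delta^{-1}\, C(\vartheta)\, \ell_1(X, Y). \tag{8.1.22}$$
Indeed, by the ideality of $\sigma$ and $\ell_1$ it is enough to show (8.1.22) for $\delta = 1$. Then
$$\sigma(X + \vartheta, Y + \vartheta) \le \sup_{|f| \le 1} \left|\int \bar{f}(x)\,(P_X(dx) - P_Y(dx))\right|,$$
where $\bar{f}(x) = \int f(x + y)\, p_\vartheta(y)\, dy$. Since $|f| \le 1$,
$$\|\bar{f}'(x)\| = \sup_{\|z\| \le 1} |\bar{f}'(x)(z)| \le \sup_{\|z\| \le 1} \int |p_\vartheta'(y)(z)|\, dy =: C(\vartheta),$$
and thus $|\bar{f}(x) - \bar{f}(y)| \le C(\vartheta)\,\|x - y\|$, which obviously implies (8.1.22). To show (8.1.19) we use (8.1.21), (8.1.22), and the following bound:
$$\begin{aligned}
\boldsymbol{\sigma}_r(X, Y) &= \sup_{h > 0} h^r\, \sigma(X + h\vartheta, Y + h\vartheta) \\
&\le \sup_{h > 0} h^r\, \ell_1\left(X + 2^{-1/\alpha} h\vartheta,\ Y + 2^{-1/\alpha} h\vartheta\right) \frac{2^{1/\alpha}\, C(\vartheta)}{h} \\
&= C(\vartheta)\, 2^{r/\alpha}\, \boldsymbol{\ell}_r(X, Y).
\end{aligned}$$
This completes the proof of the claim as well as that of (8.1.19). □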
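Remark 8.1.5 (Rate of convergence in the CLT for random elements with LePage representation) Consider a symmetric α-stable U-valued random variable $\vartheta$ with LePage representation
$$\vartheta \overset{d}{=} \sum_{j=1}^{\infty} \Gamma_j^{-1/\alpha}\, \eta_j Y_j, \tag{8.1.23}$$
where
(i) $Y_j$ are i.i.d. with $E\|Y_1\|^r < \infty$;
(ii) $\eta_j$ are i.i.d. symmetric real-valued random variables with $\|\eta_1\|_\alpha = (E|\eta_1|^\alpha)^{1/\alpha} < \infty$;
(iii) $(\Gamma_j)$ is a sequence of successive times of jump of a standard Poisson process;
(iv) we assume that the three sequences are independent;
see Ledoux and Talagrand (1991, Sect. 5.1) and Samorodnitsky and Taqqu (1994). Suppose X has a similar representation
$$X \overset{d}{=} \sum_{j=1}^{\infty} \Gamma_j^{-1/\alpha}\, \eta_j^* Y_j^*, \tag{8.1.24}$$
where $(Y_j^*)$ and $(\eta_j^*)$ are chosen as in (i) and (ii) with the only difference that they are not identically distributed. Write $Z_n$ for the normalized sum of i.i.d. copies $X_i$ as in (8.1.2). Then Theorem 8.1.2 yields the following rate of convergence of $Z_n$ to $\vartheta$ in the $\ell_1$-metric.

Corollary 8.1.6 Let $1 \vee \alpha < r < 2$, and $E\|Y_1\|^r + \sup_{j \ge 1} E\|Y_j^*\|^r + E|\eta_1|^r + \sup_{j \ge 1} E|\eta_j^*|^r < \infty$. Then
$$\ell_1(Z_n, \vartheta) \le C\left(n^{1 - r/\alpha}\, \boldsymbol{\ell}_r^* + \tau_r^*\, n^{-1/\alpha}\right), \tag{8.1.25}$$
where $\boldsymbol{\ell}_r^* := \sup_{j \ge 1}\left(\boldsymbol{\ell}_r(Y_j^*, Y_1) + \boldsymbol{\ell}_r(\eta_j^* Y_1, \eta_1 Y_1)\right)$ and $\tau_r^* = \max\left(\boldsymbol{\ell}_1^*,\ \boldsymbol{\sigma}_1^*,\ \boldsymbol{\sigma}_r^{*\,1/(r - \alpha)}\right)$ with $\boldsymbol{\sigma}_r^* := \sup_{j \ge 1}\left(\boldsymbol{\sigma}_r(Y_j^*, Y_1) + \boldsymbol{\sigma}_r(\eta_j^* Y_1, \eta_1 Y_1)\right)$.

Proof: In view of (8.1.14), (8.1.15) we need only show the finiteness of $\boldsymbol{\sigma}_r$ and $\boldsymbol{\ell}_r$. For $\boldsymbol{\sigma}_r = \boldsymbol{\sigma}_r(X, \vartheta)$ we use the ideality of order r and the asymptotics $E\Gamma_j^{-r/\alpha} \sim j^{-r/\alpha}$ ($j \to \infty$) to obtain
$$\begin{aligned}
\boldsymbol{\sigma}_r(X, \vartheta) &\le \sum_{j \ge 1} E\Gamma_j^{-r/\alpha}\, \boldsymbol{\sigma}_r(\eta_j Y_j, \eta_j^* Y_j^*) \\
&\le \left(\sum_{j \ge 1} E\Gamma_j^{-r/\alpha}\right) \sup_{j \ge 1}\left\{E|\eta_j^*|^r\, \boldsymbol{\sigma}_r(Y_j^*, Y_j) + \boldsymbol{\sigma}_r(\eta_j^* Y_j, \eta_j Y_j)\right\} \\
&\le C \sup_{j \ge 1}\left(\boldsymbol{\sigma}_r(Y_j^*, Y_1) + \boldsymbol{\sigma}_r(\eta_j^* Y_1, \eta_1 Y_1)\right).
\end{aligned}$$
The same type of estimate is valid for $\boldsymbol{\ell}_r$. □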
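Since in the LePage representations $Y_j$, $Y_j^*$ can have any high enough moment, examples with finite $\boldsymbol{\ell}_r^*$ and $\tau_r^*$ can be readily constructed. Take, for example, U to be a Hilbert space with basis $(h_m)_{m \ge 1}$, and set $Y_j^* \overset{d}{=} \sum_{m \ge 1} \zeta_{j,m}^* h_m$, $Y_1 \overset{d}{=} \sum_{m \ge 1} \zeta_m h_m$, where $(\zeta_m)_{m \ge 1}$, $(\zeta_{j,m}^*)_{m \ge 1}$ are sequences of independent random variables. Then, by the ideality of $\boldsymbol{\sigma}_r$,
$$\boldsymbol{\sigma}_r(Y_j^*, Y_1) \le \sum_{m \ge 1} \boldsymbol{\sigma}_r(\zeta_{j,m}^*, \zeta_m) \le C \sum_{m \ge 1} \kappa_r(\zeta_{j,m}^*, \zeta_m), \tag{8.1.26}$$
where $\kappa_r$ is the rth pseudomoment, $\kappa_r(\zeta^*, \zeta) = r \int |x|^{r-1}\, |F_{\zeta^*}(x) - F_\zeta(x)|\, dx$; see Zolotarev (1983). Similarly,
$$\boldsymbol{\ell}_r(Y_j^*, Y_1) \le C \sum_{m \ge 1} \kappa_r(\zeta_{j,m}^*, \zeta_m). \tag{8.1.27}$$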
The same example is valid if we relax the independence assumption to “independence in finite blocks,” requiring only that $(\zeta_{1+\ell}, \dots, \zeta_{L+\ell})$, $\ell = 0, L, 2L, \dots$, are independent.

Remark 8.1.7 (Finite-dimensional approximation) An alternative use of the explicit upper bounds of the smoothing metrics in the finite-dimensional case is to combine Theorem 8.1.2 with an approximation step by the finite-dimensional case. To be concrete, let X, Y be C(S)-valued processes, (S, d) a totally bounded metric space. For $\varepsilon > 0$ let $V_\varepsilon$ denote a finite covering ε-net and let $P_\varepsilon X = (X_t)_{t \in V_\varepsilon}$ be the corresponding finite-dimensional approximation of $X = (X_t)_{t \in S}$. If
$$E \sup_{d(s,t) \le \varepsilon} |X_s - X_t| \le a(\varepsilon) \tag{8.1.28}$$
and
$$E \sup_{d(s,t) \le \varepsilon} |Y_s - Y_t| \le b(\varepsilon),$$
then
$$\ell_1(X, Y) \le \ell_1(P_\varepsilon X, P_\varepsilon Y) + a(\varepsilon) + b(\varepsilon). \tag{8.1.29}$$
So we can combine fluctuation inequalities (8.1.28) with the finite-dimensional bounds derived in (8.1.14) for the normalized sum Zn in order to choose an optimal rate of approximation ε = ε(n) → 0. A general and simple useful tool to derive fluctuation inequalities as in (8.1.28) is Pollard’s lemma, which applied to (8.1.28) yields % 4 Nε max E sup |Xs − Xt | , (8.1.30) E sup |Xs − Xt | ≤ d(s,t)≤ε
1≤i≤Nε
d(s,ti )<ε d(t,ti )<ε
where Nε = card (Vε ) and Vε = {ti , 1 ≤ i ≤ Nε }. The case α = 1 requires special consideration. We shall state a variant of Theorem 8.1.2 that will cover the case α = 1 but requires additional smoothing conditions on the law of the Xi ’s. The next theorem is based on the following lemma (see Rachev and Yukich (1989) or Rachev (1991c, Ch. 14)). Lemma 8.1.8 Let 0 < α ≤ 2, r > α, ar = Ar (a) = 2−r/α ar . Suppose
1 , 21+r/α (2r/α +3r/α )
δ0 := δ0 (X1 , ϑ) := max(σ, ϑr ) ≤ ar .
and Ar =
(8.1.31)
Then for any n ≥ 1 σ(Zn , ϑ) ≤ Ar δ0 n1−r/α ≤ 2−r/α n1−r/α .
(8.1.32)
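Theorem 8.1.9 Suppose condition (8.1.23) holds and
$$\bar{\tau}_r := \max(\ell_1, \boldsymbol{\ell}_r) < \infty. \tag{8.1.33}$$
Then for $\frac{1}{2} < \alpha \le 2$ and $r > \alpha$,
$$\ell_1(Z_n, \vartheta) \le B_{r,\alpha}\, \bar{\tau}_r\, n^{1 - r/\alpha}, \quad \text{where } B_{r,\alpha} \ge 8^{(r-1)/\alpha} + 2\left(2^{r/\alpha} + 3^{r/\alpha}\right). \tag{8.1.34}$$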
The proof uses the following analogue of (8.1.17): For any independent X, Y, Z, W,
$$\ell_1(X + Z, Y + Z) \le \ell_1(X, Y)\,\sigma(Z, W) + \ell_1(X + W, Y + W). \tag{8.1.35}$$
The proof is similar to that of the smoothing inequality in Zolotarev (1986, §5.4) (see also Rachev (1991c, Theorem 15.2.2)) and thus is omitted. The theorem is of interest for $1 \le \alpha \le 2$, as for $0 < \alpha < r < 1$ we get from (8.1.6), $\ell_1(Z_n, \vartheta) \le n^{1 - 1/\alpha}\, \bar{\tau}_r$.

Our next objective is the extension of Theorem 8.1.2 to the martingale case. Let $(\Omega, \mathcal{A}, P)$ be a rich enough probability space, $(\mathcal{F}_i)$ an increasing sequence of sub σ-algebras of $\mathcal{A}$, and let $(X_i, \mathcal{F}_i)$ be an adapted martingale difference sequence with values in a separable Banach space $(U, \|\cdot\|)$; that is, $E(X_i | \mathcal{F}_{i-1}) = 0$ a.s., $i \in \mathbb{N}$. For a given probability metric $\mu$ and a sub σ-algebra $\mathcal{G} \subset \mathcal{A}$ define the $\mathcal{G}$-dependence metric $\mu(\cdot \| \mathcal{G})$ by
$$\mu(X, Y \| \mathcal{G}) = \sup_{V \in \mathcal{G}} \mu(X + V, Y + V), \tag{8.1.36}$$
where $V \in \mathcal{G}$ denotes that V is a $\mathcal{G}$-measurable random variable. This notion generalizes an idea due to Gudynas (1985).

Lemma 8.1.10 If $\mu$ is homogeneous of order r, that is,
$$\mu(cX, cY) \le |c|^r \mu(X, Y), \tag{8.1.37}$$
then $\mu(\cdot \| \mathcal{G})$ also is homogeneous of order r.

We shall use the following metrics: $\boldsymbol{\ell}_r(\cdot \| \mathcal{G})$, $\boldsymbol{\sigma}_r(\cdot \| \mathcal{G})$, where $\boldsymbol{\ell}_r$, $\boldsymbol{\sigma}_r$ are respectively the smoothed Kantorovich metric and the total variation metric (cf. (8.1.11), (8.1.12)).

Lemma 8.1.11 Let the regular conditional distributions $P_{X|\mathcal{G}}$, $P_{Y|\mathcal{G}}$ exist. Then
$$\boldsymbol{\ell}_r(X, Y \| \mathcal{G}) \le E\,\boldsymbol{\ell}_r(P_{X|\mathcal{G}}, P_{Y|\mathcal{G}}) \tag{8.1.38}$$
and
$$\boldsymbol{\sigma}_r(X, Y \| \mathcal{G}) \le E\,\boldsymbol{\sigma}_r(P_{X|\mathcal{G}}, P_{Y|\mathcal{G}}). \tag{8.1.39}$$
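Proof: Let $\vartheta$ be independent of X, Y, and $\mathcal{G}$. Then
$$\begin{aligned}
\boldsymbol{\ell}_r(X, Y \| \mathcal{G}) &= \sup_{V \in \mathcal{G}} \boldsymbol{\ell}_r(X + V, Y + V) \\
&= \sup_{\|f\|_L \le 1}\ \sup_{h > 0}\ \sup_{V \in \mathcal{G}} h^{r-1} \left|E\left(f(X + V + h\vartheta) - f(Y + V + h\vartheta)\right)\right| \\
&\le E \sup_{\|f\|_L \le 1}\ \sup_{h > 0}\ \sup_{V \in \mathcal{G}} h^{r-1} \left|E(f(X + V + h\vartheta) | \mathcal{G}) - E(f(Y + V + h\vartheta) | \mathcal{G})\right| \\
&= E \sup_{\|f\|_L \le 1}\ \sup_{h > 0}\ \sup_{V \in \mathcal{G}} h^{r-1} \left|E(f_V(X + h\vartheta) | \mathcal{G}) - E(f_V(Y + h\vartheta) | \mathcal{G})\right|,
\end{aligned}$$
where $f_V(\cdot) = f(\cdot + V)$ is the translation by V, which is again a Lipschitz function, and where $\|f\|_L = \sup_{x \ne y} \frac{|f(x) - f(y)|}{\|x - y\|}$ is the Lipschitz norm. We arrive at
$$\boldsymbol{\ell}_r(X, Y \| \mathcal{G}) \le E \sup_{\|f\|_L \le 1}\ \sup_{h > 0} h^{r-1} \left|E(f(X + h\vartheta) | \mathcal{G}) - E(f(Y + h\vartheta) | \mathcal{G})\right| = E\,\boldsymbol{\ell}_r(P_{X|\mathcal{G}}, P_{Y|\mathcal{G}}).$$
The proof for the metric $\boldsymbol{\sigma}_r$ is similar. □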
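As a consequence we obtain the following regularity property of $\boldsymbol{\ell}_r$ and $\boldsymbol{\sigma}_r$.

Lemma 8.1.12 Let $(X_i, \mathcal{F}_i)$ be a stochastic sequence and $(\mathcal{G}_i)$ a decreasing sequence of sub σ-algebras such that $Y_j$ are $\mathcal{G}_i$-measurable for $j \ge i$. Suppose that the following condition holds:
(c) $X_i$ and $\mathcal{G}_{i+1}$ are conditionally independent given $\mathcal{F}_{i-1}$, and $Y_i$ and $\mathcal{G}_{i+1}$ are conditionally independent given $\mathcal{F}_{i-1}$.
Then, for $c_i \in \mathbb{R}$,
$$\boldsymbol{\ell}_r\left(\sum_{i=1}^{n} c_i X_i,\ \sum_{i=1}^{n} c_i Y_i\right) \le \sum_{i=1}^{n} |c_i|^r\, E\,\boldsymbol{\ell}_r(P_{X_i|\mathcal{F}_{i-1}}, P_{Y_i|\mathcal{F}_{i-1}}) \tag{8.1.40}$$
and
$$\boldsymbol{\sigma}_r\left(\sum_{i=1}^{n} c_i X_i,\ \sum_{i=1}^{n} c_i Y_i\right) \le \sum_{i=1}^{n} |c_i|^r\, E\,\boldsymbol{\sigma}_r(P_{X_i|\mathcal{F}_{i-1}}, P_{Y_i|\mathcal{F}_{i-1}}), \tag{8.1.41}$$
assuming that the conditional distributions exist.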
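Proof: By Lemma 8.1.10,
$$\begin{aligned}
\boldsymbol{\ell}_r\left(\sum_{i=1}^{n} c_i X_i,\ \sum_{i=1}^{n} c_i Y_i\right) &\le \sum_{i=1}^{n} \boldsymbol{\ell}_r\big(c_1 X_1 + \cdots + c_i X_i + c_{i+1} Y_{i+1} + \cdots + c_n Y_n, \\
&\qquad\qquad c_1 X_1 + \cdots + c_{i-1} X_{i-1} + c_i Y_i + \cdots + c_n Y_n\big) \\
&\le \sum_{i=1}^{n} \sup_{h > 0} h^{r-1} \sup_{V \in \mathcal{F}_{i-1} \vee \mathcal{G}_{i+1}} \ell_1(c_i X_i + V + h\vartheta,\ c_i Y_i + V + h\vartheta) \\
&= \sum_{i=1}^{n} \boldsymbol{\ell}_r(c_i X_i, c_i Y_i \| \mathcal{F}_{i-1} \vee \mathcal{G}_{i+1}) \\
&\le \sum_{i=1}^{n} |c_i|^r\, \boldsymbol{\ell}_r(X_i, Y_i \| \mathcal{F}_{i-1} \vee \mathcal{G}_{i+1}),
\end{aligned}$$
where $\mathcal{F}_{i-1} \vee \mathcal{G}_{i+1}$ is the σ-algebra generated by $\mathcal{F}_{i-1}$ and $\mathcal{G}_{i+1}$. From Lemma 8.1.11 and the conditional independence assumption,
$$\boldsymbol{\ell}_r(X_i, Y_i \| \mathcal{F}_{i-1} \vee \mathcal{G}_{i+1}) \le E\,\boldsymbol{\ell}_r(P_{X_i|\mathcal{F}_{i-1} \vee \mathcal{G}_{i+1}}, P_{Y_i|\mathcal{F}_{i-1} \vee \mathcal{G}_{i+1}}) = E\,\boldsymbol{\ell}_r(P_{X_i|\mathcal{F}_{i-1}}, P_{Y_i|\mathcal{F}_{i-1}}).$$
As for the metric $\boldsymbol{\sigma}_r$, the proof is similar. □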
Remark 8.1.13 If $Y_i$ are independent of $\mathcal{F}_{i-1}$, $EY_i = 0$, then
$$\boldsymbol{\ell}_r(P_{X_i|\mathcal{F}_{i-1}}, P_{Y_i}) \le C_r\, \zeta_r(P_{X_i|\mathcal{F}_{i-1}}, P_{Y_i}) \le C_r\, \kappa_r(P_{X_i|\mathcal{F}_{i-1}}, P_{Y_i}), \tag{8.1.42}$$
where $\zeta_r$ is the Zolotarev metric and $\kappa_r$ is the pseudo-difference moment (cf. (8.1.26) and Rachev (1991, p. 377)). In the α-stable case $1 < \alpha < 2$ and r = 2, the finiteness of $\kappa_r$ implies that
$$E(X_i | \mathcal{F}_{i-1}) = EY_i = 0, \tag{8.1.43}$$
which is fulfilled in the martingale case. In the normal case α = 2 and r = 3, the finiteness of $\zeta_r$ implies in the Euclidean case that the conditional covariance
$$\mathrm{Cov}(X_i | \mathcal{F}_{i-1}) = \mathrm{Cov}(Y_i) \tag{8.1.44}$$
is almost surely constant. This and related conditions have been assumed in several papers on the martingale convergence theorem (cf. Basu (1976), Dvoretzky (1970), Bolthausen (1982), Butzer et al. (1983), Häussler (1988), Rackauskas (1990)).
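Lemma 8.1.14
$$\boldsymbol{\ell}_r(X, Y) \ge (\ell_1(X, Y))^r \left(\frac{r-1}{r}\right)^{r-1} \frac{1}{r}\, (2E\|\vartheta\|)^{1-r}. \tag{8.1.45}$$
Proof: By the triangle inequality and from the definition of $\boldsymbol{\ell}_r$,
$$\boldsymbol{\ell}_r(X, Y) \ge \ell_1(X + \varepsilon\vartheta, Y + \varepsilon\vartheta)\,\varepsilon^{r-1} \ge \left(\ell_1(X, Y) - 2\varepsilon E\|\vartheta\|\right)\varepsilon^{r-1} =: \varphi(\varepsilon).$$
Maximizing $\varphi(\varepsilon)$ with respect to ε, we obtain (8.1.45). □

In the next step we extend the smoothing inequality (8.1.17).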
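The maximization step in the proof of Lemma 8.1.14 can be made explicit: setting $\varphi'(\varepsilon) = 0$ for $\varphi(\varepsilon) = (\ell_1 - 2\varepsilon E\|\vartheta\|)\varepsilon^{r-1}$ gives $\varepsilon^* = \frac{(r-1)\,\ell_1}{2r\,E\|\vartheta\|}$, and substituting back yields $\ell_1^r \left(\frac{r-1}{r}\right)^{r-1} \frac{1}{r} (2E\|\vartheta\|)^{1-r}$. The Python sketch below checks this closed form against a grid search. The numerical values of $\ell_1$, $E\|\vartheta\|$ (written `m`), and r are arbitrary illustrative choices.

```python
def phi(eps, l1, m, r):
    """phi(eps) = (l1 - 2*eps*m) * eps**(r - 1), where m stands for E||theta||."""
    return (l1 - 2.0 * eps * m) * eps ** (r - 1.0)


def maximize_phi(l1, m, r):
    """Stationary point of phi on eps > 0 and the value attained there (r > 1)."""
    eps_star = (r - 1.0) * l1 / (2.0 * r * m)
    return eps_star, phi(eps_star, l1, m, r)


l1, m, r = 0.8, 1.3, 1.7                       # arbitrary illustrative values
eps_star, best = maximize_phi(l1, m, r)
# Closed form obtained by substituting eps_star into phi.
closed_form = l1 ** r * ((r - 1.0) / r) ** (r - 1.0) / (r * (2.0 * m) ** (r - 1.0))
# Brute-force check on a grid of eps values in (0, 1).
grid_max = max(phi(k * 1e-4, l1, m, r) for k in range(1, 10000))
```

The grid maximum stays below the closed-form value and approaches it as the grid is refined, consistent with $\varepsilon^*$ being the unique maximizer on $(0, \infty)$.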
Lemma 8.1.15 Suppose that X, Z, Y, W are random variables with values in U such that (X, Z) is independent of (Y, W) and Y, W are independent. Then
$$\ell_1(X + Z, Y + Z) \le \ell_1(Z, W)\,\sigma(X, Y) + \ell_1(X + W, Y + W) + \ell_1(Z + X, \tilde{Z} + X), \tag{8.1.46}$$
where $\tilde{Z} \overset{d}{=} Z$ and $\tilde{Z}$ is independent of X, and
$$\sigma(X + Z, Y + Z) \le \sigma(Z, W)\,\sigma(X, Y) + \sigma(X + W, Y + W) + \sigma(Z + X, \tilde{Z} + X). \tag{8.1.47}$$
Proof: By the triangle inequality,
$$\ell_1(X + Z, Y + Z) \le \sup_{\|f\|_L \le 1} \left|E\left[(f(X + Z) - f(X + W)) - (f(Y + Z) - f(Y + W))\right]\right| + \sup_{\|f\|_L \le 1} \left|E(f(X + W) - f(Y + W))\right|.$$
Furthermore,
$$\begin{aligned}
&\left|E\left[f(X + Z) - f(X + W) - (f(Y + Z) - f(Y + W))\right]\right| \\
&\quad = \left|\int \left(E(f(Z + x) | X = x) - Ef(W + x)\right) dP_X(x) - \int \left(Ef(Z + x) - Ef(W + x)\right) dP_Y(x)\right| \\
&\quad \le \left|\int \left(E(f(Z + x) | X = x) - Ef(Z + x)\right) dP_X(x)\right| \\
&\qquad + \left|\int Ef(Z + x)\,(dP_X(x) - dP_Y(x)) - \int Ef(W + x)\,(dP_X(x) - dP_Y(x))\right| \\
&\quad \le \ell_1(Z + X, \tilde{Z} + X) + \ell_1(Z, W)\,\sigma(X, Y).
\end{aligned}$$
The proof of (8.1.47) is similar. □
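The last term in (8.1.46) is a measure of dependence of Z, X, which disappears if Z, X are independent. Making use of the smoothing properties, we next extend Theorem 8.1.2 to the martingale case. Let $(X_i, \mathcal{F}_i)$ be a martingale difference sequence, $Z_n = n^{-1/\alpha} \sum_{j=1}^{n} X_j$, and as in (8.1.1) let $\vartheta$, $\vartheta_i$ be independent, α-stable distributed r.v.s. For $r > \alpha$ we define
$$\boldsymbol{\ell}_r = \sup_j \boldsymbol{\ell}_r(X_j, \vartheta_j), \qquad \tau_r = \sup_j E\,\boldsymbol{\ell}_r(P_{X_j|\mathcal{F}_{j-1}}, P_{\vartheta_j}), \qquad \bar{\boldsymbol{\ell}}_r = \boldsymbol{\ell}_r \vee \tau_r, \tag{8.1.48}$$
$$\bar{\tau}_r = \sup_j E\,\boldsymbol{\ell}_r(P_{X_j|\mathcal{F}_{j-1}}, P_{X_j}), \qquad \tilde{\tau}_r = \sup_j E\,\boldsymbol{\ell}_r(P_{X_j|\tilde{\mathcal{G}}_{j+1}}, P_{X_j}),$$
where $\tilde{\mathcal{G}}_{j+1} = \sigma(X_{j+1}, X_{j+2}, \dots)$, and $\boldsymbol{\sigma}_r = \sup_j \boldsymbol{\sigma}_r(X_j, \vartheta_j)$; the conditional distributions are assumed to exist.

Theorem 8.1.16 Suppose that $E\|\vartheta\| < \infty$. Then
$$\ell_1(Z_n, \vartheta) \le C\left(n^{1 - r/\alpha}\, \bar{\boldsymbol{\ell}}_r + n^{-1/\alpha}\, t_r\right), \tag{8.1.49}$$
where $t_r = \max\left(\ell_1,\ \boldsymbol{\sigma}_1,\ \boldsymbol{\sigma}_r^{\frac{1}{r-\alpha}},\ \tilde{\tau}_r^{\frac{1}{r-\alpha}},\ \bar{\tau}_1\right)$.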
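Proof: Applying (8.1.16) we shall estimate $\ell_1(Z_n + \varepsilon\vartheta, \vartheta_1 + \varepsilon\vartheta)$. Set $m = [\frac{n}{2}]$. Then
$$\begin{aligned}
\ell_1\left(Z_n + \varepsilon\vartheta,\ n^{-1/\alpha} \sum_{i=1}^{n} \vartheta_i + \varepsilon\vartheta\right) &\le \ell_1\left(Z_n + \varepsilon\vartheta,\ n^{-1/\alpha}(\vartheta_1 + X_2 + \cdots + X_n) + \varepsilon\vartheta\right) \\
&\quad + \sum_{j=1}^{m} \ell_1\big(n^{-1/\alpha}(\vartheta_1 + \cdots + \vartheta_j + X_{j+1} + \cdots + X_n) + \varepsilon\vartheta, \\
&\qquad\qquad n^{-1/\alpha}(\vartheta_1 + \cdots + \vartheta_{j+1} + X_{j+2} + \cdots + X_n) + \varepsilon\vartheta\big) \\
&\quad + \ell_1\left(n^{-1/\alpha}(\vartheta_1 + \cdots + \vartheta_{m+1} + X_{m+2} + \cdots + X_n) + \varepsilon\vartheta,\ n^{-1/\alpha} \sum_{j=1}^{n} \vartheta_j + \varepsilon\vartheta\right) \\
&=: I_0 + \sum_{j=1}^{m} I_j + I_{m+1}.
\end{aligned}$$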
From the generalized smoothing inequality (8.1.46), X1 ϑ1 X2 + · · · + Xn X2 + · · · + X n I0 = 1 + + εϑ , + + εϑ n1/α n1/α n1/α n1/α X 2 + · · · + X n ϑ2 + · · · + ϑ n −1/α −1/α ≤ 1 , X + εϑ, n ϑ + εϑ σ n 1 1 n1/α n1/α X 1 + ϑ2 + · · · + ϑ n ϑ1 + · · · + ϑ n + 1 + εϑ, + εϑ n1/α n1/α 2 3 n X1 + · · · + X n X1 + · · · + Xn−1 + X + 1 + εϑ, + εϑ n1/α n1/α =: Δ1 + Δ2 + Δ3 , d n = n is independent of X1 , . . . , Xn−1 , ϑ. Similarly, where X Xn and X m j=1
Ij
≤
m j=1
1
Xj+2 + · · · + Xn ϑj+2 + · · · + ϑn , n1/α n1/α
ϑ1 + · · · + ϑj + Xj+1 ϑ1 + · · · + ϑj+1 ·σ + εϑ, + εϑ n1/α n1/α 2 m ϑ1 + · · · + ϑj + Xj+1 + ϑj+2 + · · · + ϑn + 1 + εϑ, 1/α n j=1 3 n ϑ j=1 j + εϑ n1/α m ϑ1 + · · · + ϑj + Xj+1 + · · · + Xn + 1 + εϑ, 1/α n j=1 j+1 + Xj+2 + · · · + Xn ϑ1 + · · · + ϑj + X + εϑ n1/α
3
=: Δ4 + Δ5 + Δ6 . We first estimate Δ5 . By the ideality of r , 2 1/α m Xj+1 n−1 Δ5 = 1 + ϑ1 + εϑ, 1/α n n j=1
(8.1.50)
3 1/α ϑj+1 n−1 + ϑ + εϑ n n1/α 2 1/α 1/α 3 m n−1 n−1 Xj+1 ϑj+1 ≤ 1 + ϑ, 1/α + ϑ 1/α n n n n j=1 ≤ Cn1−r/α r . Similarly, by Lemma 8.1.12, Δ7
=: Im+1 (8.1.51) 2 1/α m+1 Xm+1 + · · · + Xn ϑ+ , ≤ 1 n n1/α 3 1/α m+1 ϑm+1 + · · · + ϑn ϑ+ n n1/α (1−r)/α n m+1 −r/α ≤ n Er PXi |Fi−1 , Pϑi n j=m+1 ≤
Cn1−r/α τr ,
and in the same way as for Δ5 , we obtain Δ2 ≤ Cn1−r/α r .
(8.1.52)
The remaining terms are dealt with by induction. Assume next that for j < n, X1 + · · · + X j ϑ 1 + · · · + ϑ j r j 1−r/α + tr j −1/α , ≤ B (8.1.53) 1 j 1/α j 1/α and let
ε = A max
1/(r−α) 1/(r−α) σ1 , σ 1/(r−α) , r , τ5r r
n−1/α
(8.1.54)
with a constant A ≥ 0 that we shall fix later in the proof. Then Δ1
≤ BC(n1−r/α r + n−1/α tr )ε−1 n−1/α σ1 (X1 , ϑ) 1 ≤ BC(n1−r/α r + n−1/α tr ). A
In the same way, 1−r/α −1/α Δ4 ≤ CB r (n − m − 2) + tr (n − m − 2) −r/α ∞ j Xj+1 ϑj+1 α +ε σr , · 1/α n1/α n n j=1
(8.1.55)
(8.1.56)
≤ CB r n1−r/α + tr n−1/α ·
∞
σ r (Xj+1 , ϑj+1 )
j+
j=1
Aα σ
r
(Xj+1 , ϑj+1 )
α/(r−α)
r/α
≤ CB(r n1−r/α + tr n−1/α )/Ar−α , using that εα ≥ Aα σ r α/(r−α) n−1 . To estimate Δ3 we apply the G-dependence metric 1 (·Fn−1 ): 2 3 Xn Xn (8.1.57) + εϑ, 1/α + εϑFn−1 Δ3 ≤ 1 1/α n n 2 3 n Xn X ≤ 1 ≤ n−1/α E1 (PXn |Fn−1 , PXn ) , 1/α Fn−1 1/α n n ≤ n−1/α τ1 . Finally, we estimate Δ6 as follows: 2 1/α m j Xj+1 Xj+2 + · · · + Xn α Δ6 = +ε 1 + + ϑ, 1/α 1/α n n n j=1 1/α 3 j+1 j X Xj+2 + · · · + Xn + + ϑ + εα 1/α 1/α n n n (1−r)/α m j ≤ + εα n j=1 2 3 Xj+1 Xj+2 + · · · + Xn Xj+1 Xj+2 + . . . + Xn · r + , 1/α + 1/α 1/α n n n n1/α ≤
m (j + nεα )(1−r)/α
n(1−r)/α
j=1
≤ τr n
−1/α
m
j+1 G5j+2 ) n−r/α r (Xj+1 , X
(j + nAα τ5rα/(r−α) n−1 )(1−r)/α
j=1
≤ Cn−1/α
1
τ5r1/(r−α) . r−1−α A
Gathering all the inequalities, we obtain B 1−r/α −1/α 1 (Zn , ϑ) ≤ C1 n tr + C2 n1−r/α r + C3 n−1/α τ1 r + n A B + C4 r−α n1−r/α r + n−1/α tr + C5 n1−r/α r A
+ C6
1 τ5r1/(r−α) n−1/α r−1−α A
+ 2Eϑn
−1/α
max σ1 , σ r
+ C7 n1−r/α τr
1/(r−α)
, r
1/(r−α)
, τ5r1/(r−α)
.
4 ≤ 12 and then choose B Choose A large enough such that CA1 + ACr−α large enough such that C2 + C3 + C5 + C6 A1+α−r + C7 + 2Eϑ ≤ B2 . Thus we obtain (8.1.49). 2
8.2 Application to Stable Limit Theorems

Zolotarev (1976) introduced the $\zeta_r$-metric as an extension of the Kantorovich metric. For any pair of random vectors X, Y on $\mathbb{R}^k$ and $r = m + a$ it is defined by
$$\zeta_r(X, Y) = \sup\{|E(f(X) - f(Y))|;\ f \in \mathcal{F}_r\}, \tag{8.2.1}$$
where $\mathcal{F}_r$ is the class of all functions $f: \mathbb{R}^k \to \mathbb{R}$ such that $\|f^{(m)}(x) - f^{(m)}(y)\| \le \|x - y\|_1^a$, $0 < a \le 1$; $f^{(m)}$ denotes the mth Fréchet derivative of f supplied with the usual supremum norm of multilinear functionals (cf. Zolotarev (1986, Section 6.3) and Rachev (1991, p. 264)), and $\|x - y\|_1$ denotes the $L_1$-norm in $\mathbb{R}^k$. Indeed, $\zeta_1$ is merely the Kantorovich metric; see (8.1.3). $\zeta_r$ is ideal of order r and therefore is suitable for analyzing the rate of convergence in various central limit theorems. (The definition of an ideal metric was given in (8.1.8) and (8.1.9).) A disadvantage of $\zeta_r$ is that only for integers r can $\zeta_r$ be estimated by difference pseudomoments from above, while for $r \notin \mathbb{N}$ the known upper estimates involve absolute moments $b_r = \max(E\|X\|^r, E\|Y\|^r)$ or absolute pseudomoments of order r and therefore are not suitable for approximation by stable distributions of order α < 2. In $\mathbb{R}$ an alternative ideal metric of order r that does not have this drawback of $\zeta_r$ was found by Maejima and Rachev (1987) and applied to prove convergence to self-similar processes; see also Rachev (1991c, Section 17.1).

In this section we introduce a new ideal metric $\vartheta_{s,p}$ (with respect to summation of independent random vectors in $\mathbb{R}^k$), which generalizes the construction in Maejima and Rachev (1987). This ideal metric has the following properties. It is ideal of order $r = s - 1 + \frac{1}{p}$. It can be estimated from above by a Zolotarev-type metric and, what is more important, by a pseudo difference moment, which allows applications to stable distributions. Finally, it can be bounded from below by the Lévy metric, and thus $\vartheta_{s,p}$ describes weak convergence of distributions. The degree of ideality of this metric does not depend on the dimension. This is an important property, which is not satisfied by some obvious generalizations of one-dimensional ideal metrics of order greater than 1; see Sections 6.1, 6.3. We shall establish relations between $\vartheta_{1,p}$ and $\vartheta_{s,p}$ and prove various smoothing inequalities.

In the second part of this section we give an application to the rate of convergence in stable limit theorems. The upper bounds in the limit theorem are formulated in metric terms. We establish some new results ensuring the finiteness of these bounds and apply these results to show that random vectors in a neighborhood of the LePage decomposition of a stable law satisfy the central limit theorem with rate. Further applications are to the convergence of summability methods of i.i.d. random vectors and to the approximation by compound Poisson distributions. All these applications are based on the thorough analysis of the metric properties of ideal metrics having a structure close to that of the Kantorovich metric. The results in this section are due to Rachev and Rüschendorf (1992).

We start with the construction of the $\vartheta_{s,p}$-metric. Let $X, Y \in \mathcal{X}(\mathbb{R}^k)$, the class of k-dimensional random vectors, and define for $s \in \mathbb{N}$, $1 \le p \le \infty$,
$$\vartheta_{s,p}(X, Y) = \sup\{|E(f(X) - f(Y))|;\ f \in \mathcal{G}_{s,p}\}. \tag{8.2.2}$$
Here $\mathcal{G}_{s,p}$ is the class of functions $f: \mathbb{R}^k \to \mathbb{R}$ such that for $1 \le i_1 \le \cdots \le i_s \le k$ and for $1 \le j \le k$, $x = (x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_k) \in \mathbb{R}^{k-1}$, and q with $\frac{1}{p} + \frac{1}{q} = 1$,
$$\|D^s_{i_1, \dots, i_s} f\|_{q,j}(x) = \begin{cases} \left(\displaystyle\int_{\mathbb{R}} |D^s_{i_1, \dots, i_s} f(x_1, \dots, x_j, \dots, x_k)|^q\, dx_j\right)^{1/q} & \text{if } q < \infty, \\[2ex] \operatorname*{ess\,sup}_{x_j \in \mathbb{R}} |D^s_{i_1, \dots, i_s} f(x_1, \dots, x_j, \dots, x_k)| & \text{if } q = \infty \end{cases} \quad \le 1 \tag{8.2.3}$$
a.s. with respect to the Lebesgue measure.
Lemma 8.2.1 For any $1\le p\le\infty$ and $s\in\mathbb{N}$, the metric $\vartheta_{s,p}$ is an ideal metric of order $r=s-1+\frac1p$.

Proof: If $f\in\mathcal{G}_{s,p}$ and $z\in\mathbb{R}^k$, then $f(\cdot+z)\in\mathcal{G}_{s,p}$ and hence $\vartheta_{s,p}(X+Z,Y+Z)\le\vartheta_{s,p}(X,Y)$ for any $Z$ independent of $X$ and $Y$. Further, when $q<\infty$, for any $c\in\mathbb{R}$, $x\in\mathbb{R}^{k-1}$, $1\le j\le k$, and $f_c(x):=f(cx)$,
\[
\begin{aligned}
\|D^s_{i_1,\dots,i_s}f_c\|_{q,j}(x)
&=\Big[\int_{\mathbb{R}}|D^s_{i_1,\dots,i_s}f_c(x_1,\dots,x_k)|^q\,dx_j\Big]^{1/q}\\
&=|c|^{s-1/q}\Big[\int_{\mathbb{R}}|D^s_{i_1,\dots,i_s}f(y_1,\dots,y_k)|^q\,dy_j\Big]^{1/q}
=|c|^{r}\,\|D^s_{i_1,\dots,i_s}f\|_{q,j},
\end{aligned}
\tag{8.2.4}
\]
which yields the ideality of $\vartheta_{s,p}$ of order $r$. The case $q=\infty$ can be handled similarly. □
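The scaling step in the proof of Lemma 8.2.1 can be checked numerically in the simplest case $k=1$, $s=1$. The sketch below is only an illustration under assumptions that are not in the text: the test function $f(x)=e^{-x^2/2}$, the truncation of the integral to $[-40,40]$, and the helper names `lq_norm` and `scaling_ratio` are all hypothetical choices.

```python
import math

def lq_norm(g, q, lo=-40.0, hi=40.0, n=100_000):
    """Midpoint-rule approximation of (integral of |g(x)|^q dx)^(1/q) on [lo, hi]."""
    h = (hi - lo) / n
    total = sum(abs(g(lo + (i + 0.5) * h)) ** q for i in range(n))
    return (total * h) ** (1.0 / q)

def d_f(x):
    """First derivative of the (assumed) test function f(x) = exp(-x^2 / 2)."""
    return -x * math.exp(-0.5 * x * x)

def scaling_ratio(c, q):
    """Ratio ||D f_c||_q / ||D f||_q for f_c(x) = f(c x), with s = 1 and k = 1."""
    d_fc = lambda x: c * d_f(c * x)  # chain rule: (f_c)'(x) = c f'(c x)
    return lq_norm(d_fc, q) / lq_norm(d_f, q)

p = 3.0
q = p / (p - 1.0)   # conjugate exponent, 1/p + 1/q = 1
r = 1.0 / p         # r = s - 1 + 1/p with s = 1
for c in (0.5, 2.0, 3.0):
    # The two printed numbers should agree: the ratio and |c|^r.
    print(c, scaling_ratio(c, q), abs(c) ** r)
```

The ratio of the two $L_q$-norms reproduces the factor $|c|^{s-1/q}=|c|^{r}$, which is the homogeneity of order $r$ used in the lemma.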
Remark 8.2.2 Note that the direct generalization of the Maejima–Rachev (1987) construction leads to
\[
\widetilde{\vartheta}_{s,p}(X,Y)=\sup\Big\{|E(f(X)-f(Y))|;\ \Big(\int\|f^{(s)}(x)\|^q\,dx\Big)^{1/q}\le 1\Big\}, \tag{8.2.5}
\]
which is an ideal metric of order $s-\frac kq=s-k\big(1-\frac1p\big)$. This unpleasant dependence on the dimensionality is avoided in the definition of $\vartheta_{s,p}$ by the restriction to one-dimensional integration in (8.2.3).

We next show that $\vartheta_{s,p}$ is estimated from above by the following modification $\bar\zeta_r$ of the $\zeta_r$-metric:
\[
\bar\zeta_r(X,Y)=\sup\{|E(f(X)-f(Y))|;\ f\in\overline{\mathcal{F}}_r\}, \tag{8.2.6}
\]
where $\overline{\mathcal{F}}_r$ is the class of functions $f:\mathbb{R}^k\to\mathbb{R}$ with
\[
\|f^{(m)}(x)-f^{(m)}(y)\|\le d_a(x,y):=\sum_{i=1}^k|x_i-y_i|^a, \tag{8.2.7}
\]
$m=0,1,\dots$, $a\in(0,1]$, and $r=m+a$. In fact, (8.2.7) is equivalent to
\[
|f(x)-f(y)|\le d_a(x,y),\quad\forall x,y\in\mathbb{R}^k,\qquad\text{if } m=0, \tag{8.2.8}
\]
and to
\[
\sup_{1\le i_1\le\dots\le i_m\le k}|D^m_{i_1,\dots,i_m}f(x)-D^m_{i_1,\dots,i_m}f(y)|\le d_a(x,y),\qquad\text{if } m\ge 1. \tag{8.2.9}
\]
Since $\|x-y\|_1^a\le d_a(x,y)$, we have $\zeta_r\le\bar\zeta_r$.

Lemma 8.2.3 (a) For any integer $r$,
\[
\vartheta_{r,1}=\bar\zeta_r=\zeta_r. \tag{8.2.10}
\]
(b) For any $r>0$, if $\vartheta_{s,p}(X,Y)<\infty$, then
\[
\vartheta_{s,p}(X,Y)\le\bar\zeta_r\le\frac{\Gamma(1+a)}{\Gamma(1+r)}\,\bar v_r(X,Y), \tag{8.2.11}
\]
where $r=s-1+\frac1p=m+a$, and
\[
\bar v_r(X,Y):=\sum_{1\le i_1,\dots,i_m\le k}\ \int_{\mathbb{R}^k}d_a(x,0)\,|x_{i_1}\cdots x_{i_m}|\,|P_X-P_Y|(dx). \tag{8.2.12}
\]

Proof: (a)
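It is enough to check that $\mathcal{G}_{r,1}=\overline{\mathcal{F}}_r$. If $f\in\mathcal{G}_{r,1}$, then
\[
\begin{aligned}
\sup_{1\le i_1,\dots,i_{r-1}\le k}&|D^{r-1}_{i_1,\dots,i_{r-1}}f(x)-D^{r-1}_{i_1,\dots,i_{r-1}}f(y)|\\
&\le\sup_{1\le i_1,\dots,i_{r-1}\le k}\Big\{\Big|\int_{x_1}^{y_1}D^r_{i_1,\dots,i_{r-1},1}f(t,x_2,\dots,x_k)\,dt\Big|
+\Big|\int_{x_2}^{y_2}D^r_{i_1,\dots,i_{r-1},2}f(y_1,t,x_3,\dots,x_k)\,dt\Big|\\
&\qquad\qquad+\dots+\Big|\int_{x_k}^{y_k}D^r_{i_1,\dots,i_{r-1},k}f(y_1,\dots,y_{k-1},t)\,dt\Big|\Big\}\\
&\le\|x-y\|_1=:d_1(x,y);
\end{aligned}
\]
i.e., $f\in\overline{\mathcal{F}}_r$. Conversely, for $f\in\overline{\mathcal{F}}_r$ we have from (8.2.8),
\[
\sup_{1\le i_1,\dots,i_r\le k}|D^r_{i_1,\dots,i_r}f(x)|
=\sup_{1\le i_1,\dots,i_r\le k}\ \lim_{y_{i_r}\to x_{i_r}}\frac{|D^{r-1}_{i_1,\dots,i_{r-1}}(f(x_1,\dots,x_k)-f(x_1,\dots,y_{i_r},\dots,x_k))|}{|x_{i_r}-y_{i_r}|}\le 1\quad\text{a.s.};
\]
i.e., $f\in\mathcal{G}_{r,1}$.

(b)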
If $f\in\mathcal{G}_{s,p}$ and $1<p\le\infty$, then similarly to the proof in (a),
\[
\begin{aligned}
\sup_{1\le i_1,\dots,i_{s-1}\le k}&|D^{s-1}_{i_1,\dots,i_{s-1}}f(x)-D^{s-1}_{i_1,\dots,i_{s-1}}f(y)|\\
&\le\sup_{1\le i_1,\dots,i_{s-1}\le k}\ \sum_{i=1}^k\Big|\int_{x_i}^{y_i}|D^s_{i_1,\dots,i_{s-1},i}f(y_1,\dots,y_{i-1},t,x_{i+1},\dots,x_k)|\,dt\Big|\\
&\le\sup_{1\le i_1,\dots,i_{s-1}\le k}\ \sum_{i=1}^k\|D^s_{i_1,\dots,i_{s-1},i}f\|_{q,i}(y_1,\dots,y_{i-1},x_{i+1},\dots,x_k)\,|x_i-y_i|^{1/p}\\
&\le d_a(x,y).
\end{aligned}
\tag{8.2.13}
\]
For the second part of (b) first note that $\vartheta_{s,p}(X,Y)<\infty$ implies that for any $1\le i_1\le\dots\le i_j\le k$, $j\le s-1$,
\[
E(X_{i_1}\cdots X_{i_j}-Y_{i_1}\cdots Y_{i_j})=0. \tag{8.2.14}
\]
This follows by taking $f_c(x)=c\,x_{i_1}\cdots x_{i_j}$, $j\le s-1$, and the obvious inequality $\vartheta_{s,p}(X,Y)\ge\sup_{c>0}|Ef_c(X)-Ef_c(Y)|$. Following the argument in the first part of this proof, (8.2.14) is also a consequence of the condition $\bar\zeta_r(X,Y)<\infty$. We obtain from the Taylor expansion and applying (8.2.14) with $m=s-1$ that
\[
\begin{aligned}
|E(f(X)-f(Y))|
&=\Big|\sum_{1\le i_1,\dots,i_m\le k}\int_0^1\frac{(1-t)^{m-1}}{(m-1)!}\,
E\big[D^m_{i_1,\dots,i_m}f(tX)X_{i_1}\cdots X_{i_m}-D^m_{i_1,\dots,i_m}f(tY)Y_{i_1}\cdots Y_{i_m}\big]\,dt\Big|\\
&\le\sum_{1\le i_1,\dots,i_m\le k}\int_0^1\frac{(1-t)^{m-1}}{(m-1)!}
\Big(\int_{\mathbb{R}^k}|D^m_{i_1,\dots,i_m}f(tx)x_{i_1}\cdots x_{i_m}-D^m_{i_1,\dots,i_m}f(t0)x_{i_1}\cdots x_{i_m}|\,|P_X-P_Y|(dx)\Big)dt\\
&\le\sum_{1\le i_1,\dots,i_m\le k}\int_0^1\frac{(1-t)^{m-1}}{(m-1)!}
\Big(\int_{\mathbb{R}^k}d_a(tx,t0)\,|x_{i_1}\cdots x_{i_m}|\,|P_X-P_Y|(dx)\Big)dt\\
&\le\Big(\frac{1}{(m-1)!}\int_0^1(1-t)^{m-1}t^a\,dt\Big)\sum_{1\le i_1,\dots,i_m\le k}\int_{\mathbb{R}^k}d_a(x,0)\,|x_{i_1}\cdots x_{i_m}|\,|P_X-P_Y|(dx)\\
&=\frac{\Gamma(1+a)}{\Gamma(1+r)}\,\bar v_r(X,Y).
\end{aligned}
\]
□
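The constant $\Gamma(1+a)/\Gamma(1+r)$ in (8.2.11) comes from the Beta integral $\frac{1}{(m-1)!}\int_0^1(1-t)^{m-1}t^a\,dt$ with $r=m+a$. The following sketch compares a numerical quadrature with the closed form; the midpoint rule, the grid size and the function names are illustrative assumptions, not part of the original argument.

```python
import math

def taylor_weight(m, a, n=200_000):
    """Midpoint approximation of (1/(m-1)!) * integral_0^1 (1-t)^(m-1) t^a dt, m >= 1."""
    h = 1.0 / n
    total = sum((1.0 - t) ** (m - 1) * t ** a
                for t in ((i + 0.5) * h for i in range(n)))
    return total * h / math.factorial(m - 1)

def closed_form(m, a):
    """Gamma(1 + a) / Gamma(1 + r) with r = m + a."""
    return math.gamma(1.0 + a) / math.gamma(1.0 + m + a)

for m, a in ((1, 0.5), (2, 0.25), (3, 1.0)):
    # The two values should agree up to quadrature error.
    print(m, a, taylor_weight(m, a), closed_form(m, a))
```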
Remark 8.2.4 (a) Since
\[
\|x\|_1^r=\Big(\sum_{i=1}^k|x_i|\Big)^r=\Big(\sum_{i=1}^k|x_i|\Big)^m\Big(\sum_{i=1}^k|x_i|\Big)^a
\le\sum_{1\le i_1,\dots,i_m\le k}|x_{i_1}\cdots x_{i_m}|\ \sum_{i=1}^k|x_i|^a,
\]
and on the other hand
\[
k\,\|x\|_1^r\ge\sum_{1\le i_1,\dots,i_m\le k}|x_{i_1}\cdots x_{i_m}|\ \sum_{i=1}^k|x_i|^a,
\]
we have
\[
v_r(X,Y):=\int_{\mathbb{R}^k}\|x\|_1^r\,|P_X-P_Y|(dx)\le\bar v_r(X,Y)\le k\,v_r(X,Y). \tag{8.2.15}
\]
(b) By the same arguments as in (a) we have
\[
\zeta_r\le\bar\zeta_r\le k\,\zeta_r. \tag{8.2.16}
\]
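The pointwise comparison behind (8.2.15) is easy to test on concrete vectors. The sketch below evaluates $\sum_{i_1,\dots,i_m}|x_{i_1}\cdots x_{i_m}|\sum_i|x_i|^a$ by brute force and checks that it lies between $\|x\|_1^r$ and $k\|x\|_1^r$ for $r=m+a$; the function names, the floating-point slack and the random test vectors are assumptions made for illustration.

```python
import itertools
import random

def bar_weight(x, m, a):
    """Sum over all index tuples (i_1..i_m) of |x_{i_1}...x_{i_m}|, times d_a(x, 0)."""
    k = len(x)
    tuple_sum = 0.0
    for idx in itertools.product(range(k), repeat=m):
        prod = 1.0
        for i in idx:
            prod *= abs(x[i])
        tuple_sum += prod
    d_a = sum(abs(xi) ** a for xi in x)
    return tuple_sum * d_a

def sandwich_holds(x, m, a, slack=1e-12):
    """Check ||x||_1^r <= bar_weight <= k ||x||_1^r, allowing tiny rounding slack."""
    k = len(x)
    norm1_r = sum(abs(xi) for xi in x) ** (m + a)
    w = bar_weight(x, m, a)
    return norm1_r <= w * (1 + slack) and w <= k * norm1_r * (1 + slack)

random.seed(0)
samples = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(200)]
print(all(sandwich_holds(x, m=2, a=0.4) for x in samples))
```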
In particular, by (8.2.11), $\vartheta_{s,p}$ is also estimated from above by the Zolotarev metric $\zeta_r$ (up to a constant).

The following theorem gives an estimate of $\vartheta_{s,p}$ in terms of certain pseudomoments, which allows one to apply $\vartheta_{s,p}$ to stable distributions. For random vectors $X,Y$ with densities $u_X,u_Y$, define
\[
\alpha_{s,p}(X,Y)=\sum_{i=1}^k\ \sum_{1\le i_1,\dots,i_s\le k}\ \int_{\mathbb{R}^{k-1}}
\Big[\int_{\mathbb{R}}\Big|\,|y_{i_1}\cdots y_{i_s}|\int_0^1t^{-s-k}\,\frac{(1-t)^{s-1}}{(s-1)!}\,(u_X-u_Y)\Big(\frac{y_1}{t},\dots,\frac{y_k}{t}\Big)dt\Big|^p\,dy_i\Big]^{1/p}
dy_1\cdots dy_{i-1}\,dy_{i+1}\cdots dy_k.
\]
If $k=1$, then after some transformations we obtain
\[
\alpha_{s,p}(X,Y)=\|F_{s,X}-F_{s,Y}\|_p
=\Big(\int\Big|\int_{-\infty}^x\frac{(x-t)^{s-1}}{(s-1)!}\,d(F_X-F_Y)(t)\Big|^p\,dx\Big)^{1/p},
\qquad F_{s,X}(x):=\frac{1}{(s-1)!}\,E(x-X)_+^{s-1} \tag{8.2.17}
\]
(see Maejima and Rachev (1987), Rachev and Rüschendorf (1990)). Indeed, $\alpha_{s,p}$ is an ideal metric of order $r$. Representation (8.2.17) shows that $\alpha_{s,p}$ depends only on the difference pseudomoments. A similar representation holds also for $k\ge 1$.

Theorem 8.2.5 $\alpha_{s,p}$ is an upper bound for $\vartheta_{s,p}$; i.e.,
\[
\vartheta_{s,p}\le\alpha_{s,p}. \tag{8.2.18}
\]
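Proof: By (8.2.14) and the Taylor expansion,
\[
\begin{aligned}
E(f(X)-f(Y))
&=\int_{\mathbb{R}^k}\ \sum_{1\le i_1\le\dots\le i_s\le k}\int_0^1\frac{(1-t)^{s-1}}{(s-1)!}\,D^s_{i_1,\dots,i_s}f(tx_1,\dots,tx_k)\,x_{i_1}\cdots x_{i_s}\,dt\ d(F_X-F_Y)(x)\\
&=\sum_{1\le i_1,\dots,i_s\le k}\int_0^1\frac{(1-t)^{s-1}}{(s-1)!}\Big[\int_{\mathbb{R}^k}D^s_{i_1,\dots,i_s}f(y_1,\dots,y_k)\,y_{i_1}\cdots y_{i_s}\,(u_X-u_Y)\Big(\frac yt\Big)t^{-s-k}\,dy\Big]dt.
\end{aligned}
\]
This implies, by making use of Hölder's inequality, that
\[
\begin{aligned}
|E(f(X)-f(Y))|
&\le\sum_{i=1}^k\sum_{j=0}^{s-1}\ \sum_{\substack{1\le i_1,\dots,i_j\le k\\ i_j\ne i}}\ \int_{\mathbb{R}^{k-1}}|y_{i_1}\cdots y_{i_j}|
\Big[\int_{\mathbb{R}}\Big|D^s_{i_1,\dots,i_j,i,\dots,i}f(y)\int_0^1\frac{(1-t)^{s-1}}{(s-1)!}\,t^{-s-k}(u_X-u_Y)\Big(\frac yt\Big)dt\Big|\,|y_i|^{s-j}\,dy_i\Big]
dy_1\cdots dy_{i-1}\,dy_{i+1}\cdots dy_k\\
&\le\sum_{i=1}^k\sum_{j=0}^{s-1}\ \sum_{\substack{1\le i_1,\dots,i_j\le k\\ i_j\ne i}}\ \int_{\mathbb{R}^{k-1}}|y_{i_1}\cdots y_{i_j}|\,\|D^s_{i_1,\dots,i_j,i,\dots,i}f\|_{q,i}(y)\,
\Big\|\int_0^1\frac{(1-t)^{s-1}}{(s-1)!}\,t^{-s-k}(u_X-u_Y)\Big(\frac yt\Big)dt\ |y_i|^{s-j}\Big\|_p\,
dy_1\cdots dy_{i-1}\,dy_{i+1}\cdots dy_k,
\end{aligned}
\]
where $\|\cdot\|_p$ is taken with respect to $y_i$. This is equivalent to the representation in (8.2.17). □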
For random vectors in $\mathbb{R}^k$ we define the Lévy distance by
\[
L(X,Y)=\inf\{\varepsilon>0;\ P(X\in B_x)\le P(Y\in B_x(\varepsilon))+\varepsilon,\ P(Y\in B_x)\le P(X\in B_x(\varepsilon))+\varepsilon,\ \forall x\in\mathbb{R}^k\}, \tag{8.2.19}
\]
where $B_x:=\{y\in\mathbb{R}^k;\ y_i\le x_i,\ 1\le i\le k\}$ and $B_x(\varepsilon):=\{y\in\mathbb{R}^k;\ \|y-B_x\|\le\varepsilon\}$, $\|y\|=\big(\sum_{i=1}^ky_i^2\big)^{1/2}$; note that $P(X\in B_x)=F_X(x)$. $L$ metrizes the topology of weak convergence. If $X$ has a bounded density $u_X$, then
\[
\rho(X,Y):=\sup_x|F_X(x)-F_Y(x)|\le\Big(1+\sum_{i=1}^k\sup_xu_{X_i}(x)\Big)L(X,Y),\qquad X=(X_1,\dots,X_k). \tag{8.2.20}
\]
Next, we establish that $\vartheta_{s,p}$-convergence implies weak convergence by providing a lower bound of $\vartheta_{s,p}$ in terms of $L$.

Theorem 8.2.6 Let $s=1,2,\dots$, $p\in[1,\infty]$, $r=s-1+\frac1p$. Then
\[
\vartheta_{s,p}(X,Y)\ge a(s,k)\,L^{r+1}(X,Y), \tag{8.2.21}
\]
where
\[
a(s,k):=\frac{V_r}{2^{s+k}\,s!},\qquad V_r:=\int_{\{\|x\|\le 1\}}(1-\|x\|^2)^{r+1}\,dx. \tag{8.2.22}
\]
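Proof: Let $L(X,Y)>\varepsilon$. Then without loss of generality we can assume that for some $z=(z_1,\dots,z_k)$,
\[
P(X\in B_z)-P(Y\in B_z(\varepsilon))>\varepsilon. \tag{8.2.23}
\]
We define
\[
g_r(x)=(1-\|x\|^2)_+^{r+1},\qquad x\in\mathbb{R}^k\quad(a_+=\max(0,a)), \tag{8.2.24}
\]
and "normalize" $g_r$ by
\[
\bar g_r(x):=\frac{g_r(x)}{V_r}. \tag{8.2.25}
\]
Finally, we define the smoothed version of the indicator of $B_z$:
\[
u_\varepsilon(x)=\int_{\mathbb{R}^k}I\Big\{x-\frac\varepsilon2y\in B_z\Big(\frac\varepsilon2\Big)\Big\}\,\bar g_r(y)\,dy
=\Big(\frac2\varepsilon\Big)^k\int_{\mathbb{R}^k}I\Big\{y\in B_z\Big(\frac\varepsilon2\Big)\Big\}\,\bar g_r\Big(\frac2\varepsilon(x-y)\Big)\,dy. \tag{8.2.26}
\]
Since $0\le u_\varepsilon(x)\le\int_{\mathbb{R}^k}\bar g_r(y)\,dy\le 1$, we have
\[
|f_\varepsilon|\le 1,\qquad\text{where } f_\varepsilon(x):=2u_\varepsilon(x)-1; \tag{8.2.27}
\]
and furthermore,
\[
f_\varepsilon(x)=\begin{cases}1 & \text{if } x\in B_z,\\ -1 & \text{if } x\notin B_z(\varepsilon).\end{cases} \tag{8.2.28}
\]
In fact, we have for $x\in B_z$,
\[
u_\varepsilon(x)=\int_{\mathbb{R}^k}I\Big\{x-\frac\varepsilon2y\in B_z\Big(\frac\varepsilon2\Big)\Big\}\,\bar g_r(y)\,dy
=\int_{\mathbb{R}^k}I\Big\{x-\frac\varepsilon2y\in B_z\Big(\frac\varepsilon2\Big),\ \|y\|\le 1\Big\}\,\bar g_r(y)\,dy
=\int_{\mathbb{R}^k}\bar g_r(y)\,dy=1.
\]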
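Similarly, for $x\notin B_z(\varepsilon)$, $u_\varepsilon(x)=0$. In the next step we establish bounds on the derivatives of $f_\varepsilon$. To that purpose let
\[
L_s(f)=\sup_{1\le i\le k}\ \operatorname*{ess\,sup}_{x\in\mathbb{R}^{k-1}}\|D^s_{i_1,\dots,i_s}f\|_{q,i}(x). \tag{8.2.29}
\]
Then
\[
L_s(f_\varepsilon)\le\frac{2\varepsilon^{-r}}{a(s,k)}. \tag{8.2.30}
\]
To show (8.2.30), observe that
\[
\begin{aligned}
D^s_{i_1,\dots,i_s}f_\varepsilon(x)
&=2\Big(\frac2\varepsilon\Big)^{k+s}\int_{\mathbb{R}^k}I\Big\{y\in B_z\Big(\frac\varepsilon2\Big)\Big\}\,D^s_{i_1,\dots,i_s}\bar g_s\Big(\frac2\varepsilon(x-y)\Big)\,dy\\
&=2\Big(\frac2\varepsilon\Big)^{s}\int_{\mathbb{R}^k}I\Big\{x-\varepsilon y\in B_z\Big(\frac\varepsilon2\Big)\Big\}\,D^s_{i_1,\dots,i_s}\bar g_s(y)\,dy.
\end{aligned}
\tag{8.2.31}
\]
By Minkowski's inequality, we get the following bound for the norm of the above quantity:
\[
\begin{aligned}
\|D^s_{i_1,\dots,i_s}f_\varepsilon\|_{q,1}(x)
&=\Big[\int_{\mathbb{R}}|D^s_{i_1,\dots,i_s}f_\varepsilon(x_1,x_2,\dots,x_k)|^q\,dx_1\Big]^{1/q}\\
&=2\Big(\frac2\varepsilon\Big)^{s}\Big\{\int_{\mathbb{R}}\Big|\int_{\mathbb{R}^k}I\Big\{x-\varepsilon y\in B_z\Big(\frac\varepsilon2\Big),\ \|y\|\le 1\Big\}\cdot I\{B_z(\varepsilon)\setminus B_z\}\,D^s_{i_1,\dots,i_s}\bar g_s(y)\,dy\Big|^q\,dx_1\Big\}^{1/q}\\
&=2\Big(\frac2\varepsilon\Big)^{s}\int_{\{\|y\|\le 1\}\cap(B_z(\varepsilon)\setminus B_z)}|D^s_{i_1,\dots,i_s}\bar g_s(y)|\,\Big\{\int_{\mathbb{R}}I\Big\{x-\varepsilon y\in B_z\Big(\frac\varepsilon2\Big)\Big\}\,dx_1\Big\}^{1/q}dy\\
&\le2\Big(\frac2\varepsilon\Big)^{s}\int_{\{\|y\|\le 1\}\cap(B_z(\varepsilon)\setminus B_z)}|D^s_{i_1,\dots,i_s}\bar g_s(y)|\,\Big\{\int_{\mathbb{R}}I\{z_1\le x_1-\varepsilon y_1\le z_1+\varepsilon\}\,dx_1\Big\}^{1/q}dy.
\end{aligned}
\tag{8.2.32}
\]
In fact, the inequality is valid a.e. with respect to the Lebesgue measure $\lambda^{k-1}$. The last integrals are estimated as follows:
\[
\Big\{\int_{\mathbb{R}}I\{z_1\le x_1-\varepsilon y_1\le z_1+\varepsilon\}\,dx_1\Big\}^{1/q}=\varepsilon^{1/q}, \tag{8.2.33}
\]
and
\[
\begin{aligned}
\int_{\{\|y\|\le 1\}\cap(B_z(\varepsilon)\setminus B_z)}|D^s_{i_1,\dots,i_s}\bar g_s(y)|\,dy
&=\frac{1}{V_r}\int_{\{\|y\|\le 1\}}I\{B_z(\varepsilon)\setminus B_z\}\,\Big|D^{s-1}_{i_1,\dots,i_{s-1}}\,s\Big(1-\sum_{i=1}^ky_i^2\Big)_+^{s-1}2y_{i_s}\Big|\,dy\\
&=\frac{1}{V_r}\int_{\{\|y\|\le 1\}}I\{B_z(\varepsilon)\setminus B_z\}\,s!\,2^s\,|y_{i_1}\cdots y_{i_s}|\,dy\\
&\le\frac{1}{V_r}\,s!\,2^s\int_{-1}^1\cdots\int_{-1}^1|y_{i_1}\cdots y_{i_s}|\,dy
\le\frac{1}{V_r}\,s!\,2^s\cdot2^{k-s}=\frac{1}{V_r}\,s!\,2^k.
\end{aligned}
\tag{8.2.34}
\]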
Similarly, we can argue for any index $1\le i\le k$, and thus (8.2.30) follows from (8.2.33) and (8.2.34).

From the inequality in (8.2.30) we obtain that, given
\[
f^*(x):=\frac{f_\varepsilon(x)}{L_s(f_\varepsilon)},
\]
\[
\vartheta_{s,p}(X,Y)\ge E(f^*(X)-f^*(Y))\ge\frac{\varepsilon^r}{2}\,a(s,k)\,E(f_\varepsilon(X)-f_\varepsilon(Y)). \tag{8.2.35}
\]
Applying (8.2.27), (8.2.28) we arrive at the following decomposition:
\[
\begin{aligned}
E(f_\varepsilon(X)-f_\varepsilon(Y))
&=\int_{\mathbb{R}^k}(f_\varepsilon(x)+1)\,(P_X-P_Y)(dx)\\
&=\Big(\int_{B_z}+\int_{B_z(\varepsilon)\setminus B_z}+\int_{\mathbb{R}^k\setminus B_z(\varepsilon)}\Big)(f_\varepsilon(x)+1)\,(P_X-P_Y)(dx)
=:I_1+I_2+I_3,
\end{aligned}
\]
where
\[
I_1=\int_{B_z}(f_\varepsilon(x)+1)\,(P_X-P_Y)(dx)=2(P_X-P_Y)(B_z);\qquad
I_2\ge-2P_Y(B_z(\varepsilon)\setminus B_z);\qquad
I_3=0.
\]
Thus by (8.2.23),
\[
I_1+I_2+I_3\ge2(P_X(B_z)-P_Y(B_z))-2(P_Y(B_z(\varepsilon))-P_Y(B_z))\ge2\varepsilon.
\]
From (8.2.35) we finally obtain $\vartheta_{s,p}(X,Y)\ge\varepsilon^{r+1}a(s,k)$. With $\varepsilon\to L(X,Y)$, this implies (8.2.21). □
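Remark 8.2.7 (a) Let us use the polar transformation
\[
\begin{aligned}
x_1&=\rho\cos\vartheta_1\cdots\cos\vartheta_{k-2}\cos\vartheta_{k-1},\\
x_2&=\rho\cos\vartheta_1\cdots\cos\vartheta_{k-2}\sin\vartheta_{k-1},\\
x_3&=\rho\cos\vartheta_1\cdots\sin\vartheta_{k-2},\\
&\ \,\vdots\\
x_k&=\rho\sin\vartheta_1,
\end{aligned}
\tag{8.2.36}
\]
where $\rho>0$, $0\le\vartheta_1\le2\pi$, $0\le\vartheta_j\le\pi$, $2\le j\le k-1$, and
\[
\frac{\partial(x_1,\dots,x_k)}{\partial(\rho,\vartheta_1,\dots,\vartheta_{k-1})}=\rho^{k-1}D_k(\vartheta), \tag{8.2.37}
\]
where $D_k(\vartheta)$ denotes the determinant of the $k\times k$ matrix of sines and cosines of $\vartheta_1,\dots,\vartheta_{k-1}$ obtained from the Jacobian matrix of (8.2.36) by setting $\rho=1$. Then we have
\[
\begin{aligned}
V_r&=D_k\int_0^1(1-\rho^2)^{r+1}\rho^{k-1}\,d\rho\\
&=\frac12D_k\int_0^1(1-\rho^2)^{r+1}(\rho^2)^{\frac{k-2}{2}}\,d\rho^2\\
&=\frac12D_k\,\frac{\Gamma(r+2)\,\Gamma(\frac k2)}{\Gamma(r+2+\frac k2)},
\end{aligned}
\tag{8.2.38}
\]
where
\[
D_k:=\int D_k(\vartheta)\,d\vartheta.
\]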
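The radial factor of $V_r$ is a Beta integral: $\int_0^1(1-\rho^2)^{r+1}\rho^{k-1}\,d\rho=\frac12B\big(\frac k2,r+2\big)=\frac12\,\Gamma(r+2)\Gamma(\frac k2)/\Gamma(r+2+\frac k2)$. The sketch below checks this identity numerically for a few pairs $(r,k)$; the midpoint quadrature and the function names are illustrative assumptions, and the angular constant $D_k$ is left out entirely.

```python
import math

def radial_integral(r, k, n=200_000):
    """Midpoint approximation of integral_0^1 (1 - rho^2)^(r+1) rho^(k-1) d rho."""
    h = 1.0 / n
    return h * sum((1.0 - rho * rho) ** (r + 1) * rho ** (k - 1)
                   for rho in ((i + 0.5) * h for i in range(n)))

def beta_form(r, k):
    """Closed form (1/2) Gamma(r+2) Gamma(k/2) / Gamma(r + 2 + k/2)."""
    return 0.5 * math.gamma(r + 2) * math.gamma(k / 2) / math.gamma(r + 2 + k / 2)

for r, k in ((0.0, 1), (0.5, 3), (2.0, 4)):
    # Quadrature and closed form should agree up to discretization error.
    print(r, k, radial_integral(r, k), beta_form(r, k))
```

For $k=1$ and $r=0$ both sides equal $2/3$, which is half of $\int_{-1}^1(1-x^2)\,dx=4/3$.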
(b) Note that a lower bound of $\vartheta_{s,p}$ ($r=s-1+\frac1p\notin\mathbb{N}$) in terms of the Prohorov metric exists. This follows from an example in Maejima and Rachev (1987) in the case $k=1$.

We next investigate smoothing inequalities, which play an important role in the proof of Berry–Esséen-type theorems. They are also of interest for the study of intrinsic properties of probability metrics.

Lemma 8.2.8 (a) Let $Z$ be independent of $X,Y$, $\varepsilon>0$, $r=s-1+\frac1p$. Then
\[
\vartheta_{s,p}(X,Y)\le\vartheta_{s,p}(X+\varepsilon Z,Y+\varepsilon Z)+2\,\frac{\Gamma(1+\frac1p)}{\Gamma(1+r)}\,\varepsilon^r\,k\,E\|Z\|_1^r. \tag{8.2.39}
\]
(b) If $Z,W$ are independent of $X,Y$, then
\[
\vartheta_{s,p}(X+Z,Y+Z)\le\vartheta_{s,p}(X,Y)\,\sigma(W,Z)+\vartheta_{s,p}(X+W,Y+W); \tag{8.2.40}
\]
and moreover,
\[
\vartheta_{s,p}(X+Z,Y+Z)\le\vartheta_{s,p}(W,Z)\,\sigma(X,Y)+\vartheta_{s,p}(X+W,Y+W), \tag{8.2.41}
\]
where $\sigma$ is the total variation distance.

Proof: (a) By the regularity of $\vartheta_{s,p}$ (cf. Lemma 8.2.1) we have
\[
\vartheta_{s,p}(X,Y)\le\vartheta_{s,p}(X+\varepsilon Z,Y+\varepsilon Z)+2\vartheta_{s,p}(0,\varepsilon Z)\le\vartheta_{s,p}(X+\varepsilon Z,Y+\varepsilon Z)+2\varepsilon^r\vartheta_{s,p}(0,Z).
\]
By (8.2.11) and (8.2.15), with $a=\frac1p$,
\[
\vartheta_{s,p}(0,Z)\le\frac{\Gamma(1+a)}{\Gamma(1+r)}\sum_{1\le i_1,\dots,i_{s-1}\le k}E\sum_{i=1}^k|Z_i|^a\,|Z_{i_1}\cdots Z_{i_{s-1}}|.
\]
(b) For any $f\in\mathcal{G}_{s,p}$ we have
\[
|E(f(X+Z)-f(Y+Z))|\le|E(f(X+Z)-f(X+W))-E(f(Y+Z)-f(Y+W))|+|E(f(X+W)-f(Y+W))|. \tag{8.2.42}
\]
If $f\in\mathcal{G}_{s,p}$, then the translates $f_x(z):=f(x+z)$ are also in $\mathcal{G}_{s,p}$, and therefore the first term is estimated by conditioning on $X$ (respectively $Y$):
\[
\begin{aligned}
\Big|\int E(f(x+Z)-f(x+W))\,dP_X(x)&-\int E(f(x+Z)-f(x+W))\,dP_Y(x)\Big|\\
&=\Big|\int E(f(x+Z)-f(x+W))\,d(P_X-P_Y)(x)\Big|
\le\vartheta_{s,p}(Z,W)\,\sigma(X,Y).
\end{aligned}
\tag{8.2.43}
\]
Indeed, the inequalities (8.2.42), (8.2.43) imply
\[
\vartheta_{s,p}(X+Z,Y+Z)\le\vartheta_{s,p}(Z,W)\,\sigma(X,Y)+\vartheta_{s,p}(X+W,Y+W).
\]
The other case is derived similarly. □
Lemma 8.2.9 If $Z$ is independent of $X,Y$ and $P_Z$ has a density $p_Z$ having integrable $(s-1)$-fold derivatives,
\[
C_{s,Z}:=\sup_{1\le i_1,\dots,i_{s-1}\le k}\ \int_{\mathbb{R}^k}|D^{s-1}_{i_1,\dots,i_{s-1}}p_Z(x)|\,dx<\infty, \tag{8.2.44}
\]
then
\[
\vartheta_{1,p}(X+Z,Y+Z)\le C_{s,Z}\,\vartheta_{s,p}(X,Y). \tag{8.2.45}
\]
Proof: For any $f\in\mathcal{G}_{1,p}$,
\[
\begin{aligned}
E(f(X+Z)-f(Y+Z))&=\int_{\mathbb{R}^k}f(x)\,d(F_{X+Z}-F_{Y+Z})(x)\\
&=\int_{\mathbb{R}^k}\Big(\int_{\mathbb{R}^k}f(x)\,p_Z(x-z)\,d(F_X-F_Y)(z)\Big)dx
=\int_{\mathbb{R}^k}f^*(z)\,d(F_X-F_Y)(z),
\end{aligned}
\tag{8.2.46}
\]
where $f^*(z)=\int_{\mathbb{R}^k}f(x)\,p_Z(x-z)\,dx$. From the Taylor expansion,
\[
f^*(z)=f^*(0)+\sum_{j=1}^{s-1}\ \sum_{1\le i_1,\dots,i_j\le k}D^j_{i_1,\dots,i_j}f^*(0)\,z_{i_1}\cdots z_{i_j}
+\sum_{1\le i_1,\dots,i_s\le k}\int_0^1\frac{(1-t)^{s-1}}{(s-1)!}\,D^s_{i_1,\dots,i_s}f^*(tz)\,z_{i_1}\cdots z_{i_s}\,dt. \tag{8.2.47}
\]
Since $f\in\mathcal{G}_{1,p}$, i.e.,
\[
\Big(\int_{\mathbb{R}}|D^1_if(x_1,\dots,x_i,\dots,x_k)|^q\,dx_i\Big)^{1/q}\le 1\quad\text{a.s.}, \tag{8.2.48}
\]
we have the following bound for the $q$th norm of the $f^*$-derivatives:
\[
\begin{aligned}
\Big(\int_{\mathbb{R}}|D^s_{i_1,\dots,i_s}f^*(z_1,\dots,z_k)|^q\,dz_i\Big)^{1/q}
&=\Big(\int_{\mathbb{R}}\Big|D^s_{i_1,\dots,i_s}\int_{\mathbb{R}^k}f(x)\,p_Z(x-z)\,dx\Big|^q\,dz_i\Big)^{1/q}\\
&=\Big(\int_{\mathbb{R}}\Big|\int_{\mathbb{R}^k}D^1_{i_1}f(x)\,D^{s-1}_{i_2,\dots,i_s}p_Z(x-z)\,dx\Big|^q\,dz_i\Big)^{1/q}\\
&=\Big(\int_{\mathbb{R}}\Big|\int_{\mathbb{R}^k}D^1_{i_1}f(x+z)\,D^{s-1}_{i_2,\dots,i_s}p_Z(x)\,dx\Big|^q\,dz_i\Big)^{1/q}\\
&\le\int_{\mathbb{R}^k}\Big\{\int_{\mathbb{R}}\big|(D^1_{i_1}f(x+z))(D^{s-1}_{i_2,\dots,i_s}p_Z(x))\big|^q\,dz_i\Big\}^{1/q}dx\\
&=\int_{\mathbb{R}^k}|D^{s-1}_{i_2,\dots,i_s}p_Z(x)|\,\Big\{\int_{\mathbb{R}}|D^1_{i_1}f(x+z)|^q\,dz_i\Big\}^{1/q}dx\\
&\le\int_{\mathbb{R}^k}|D^{s-1}_{i_2,\dots,i_s}p_Z(x)|\,dx\le C_{s,Z}\qquad\text{(by (8.2.48))}.
\end{aligned}
\tag{8.2.49}
\]
Summarizing the results in (8.2.46), (8.2.47), and (8.2.48), we derive the desired inequality (8.2.45). □

As a consequence of Lemmas 8.2.8 and 8.2.9 we next obtain an estimate between $\vartheta_{1,p}$ and $\vartheta_{s,p}$.

Theorem 8.2.10 For every $s=1,2,\dots$, $p\in[1,\infty)$, $r:=s-1+\frac1p$, and random vectors $X,Y$ on $\mathbb{R}^k$ we have
\[
\vartheta_{1,p}(X,Y)\le A(s,p,k)\,\vartheta_{s,p}^{\frac{1}{pr}}(X,Y), \tag{8.2.50}
\]
where
\[
A(s,p,k):=a^{\frac{1}{pr}}\,b^{\frac{s-1}{r}}\,(p(s-1))^{\frac{1}{pr}}\,\frac{r}{s-1} \tag{8.2.51}
\]
and
\[
a:=\frac{1}{\sqrt s}\Big(\frac{2s}{\pi}\Big)^{s/2},\qquad b:=2k\Big(k\sqrt{\frac2\pi}\Big)^{1/p}. \tag{8.2.52}
\]
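Proof: Recall the inequality (8.2.39). Then for any $\varepsilon>0$ and any $Z\overset{d}{=}N(0,1)$ independent of $X,Y$, we have
\[
\vartheta_{1,p}(X,Y)\le\vartheta_{1,p}(X+\varepsilon Z,Y+\varepsilon Z)+2\varepsilon^{1/p}\,k\,E\|Z\|_1^{1/p}; \tag{8.2.53}
\]
and furthermore,
\[
E\|Z\|_1^{1/p}\le(E\|Z\|_1)^{1/p}=\Big(\sum_{i=1}^kE|Z_i|\Big)^{1/p}=\Big(k\sqrt{\frac2\pi}\Big)^{1/p}. \tag{8.2.54}
\]
Now we apply Lemma 8.2.9 to get
\[
\vartheta_{1,p}(X+\varepsilon Z,Y+\varepsilon Z)\le C_{s,\varepsilon Z}\,\vartheta_{s,p}(X,Y). \tag{8.2.55}
\]
Next, assuming that $Z$ is standard normal with independent components, we bound the constant in (8.2.55) as follows:
\[
C_{s,\varepsilon Z}=\sup_{i_1,\dots,i_{s-1}}\int_{\mathbb{R}^k}\Big|D^{s-1}_{i_1,\dots,i_{s-1}}p_Z\Big(\frac x\varepsilon\Big)\Big|\,\varepsilon^{-s-k+1}\,dx
=\varepsilon^{1-s}C_{s,Z}=\varepsilon^{1-s}C_{s,\,s^{-1/2}(Z_1+\dots+Z_s)}. \tag{8.2.56}
\]
Here, $Z_1,\dots,Z_s$ are i.i.d. copies of $Z$, and thus (see Zolotarev (1977, 1983, 1986))
\[
C_{s,\varepsilon Z}=\varepsilon^{1-s}s^{-\frac12(1-s)}C_{s,Z_1+\dots+Z_s}\le\varepsilon^{1-s}s^{\frac{s-1}{2}}(C_{1,Z})^s, \tag{8.2.57}
\]
where
\[
\begin{aligned}
C_{1,Z}&=\sup_{1\le i\le k}\int_{\mathbb{R}^k}|D^1_ip_Z(x)|\,dx
=\sup_i\int_{\mathbb{R}^k}\Big(\frac{1}{\sqrt{2\pi}}\Big)^k|x_i|\,e^{-\frac{x_i^2}{2}}\prod_{j\ne i}e^{-\frac{x_j^2}{2}}\,dx\\
&=\sup_i\int_{\mathbb{R}}\frac{1}{\sqrt{2\pi}}\,|x_i|\,e^{-\frac{x_i^2}{2}}\,dx_i=\sqrt{\frac2\pi}.
\end{aligned}
\]
Therefore, from (8.2.53), (8.2.55), (8.2.57) we obtain
\[
\vartheta_{1,p}(X,Y)\le\varepsilon^{1-s}\,\frac{1}{\sqrt s}\Big(\frac{2s}{\pi}\Big)^{s/2}\vartheta_{s,p}(X,Y)+2\varepsilon^{1/p}\,k\Big(k\sqrt{\frac2\pi}\Big)^{1/p}. \tag{8.2.58}
\]
Define
\[
\varphi(x):=a\,x^{1-s}+b\,x^{1/p},\qquad a:=\frac{1}{\sqrt s}\Big(\frac{2s}{\pi}\Big)^{s/2}\vartheta_{s,p}(X,Y),\qquad b:=2k\Big(k\sqrt{\frac2\pi}\Big)^{1/p}.
\]
Minimizing $\varphi$ with respect to $x$ yields (8.2.50). □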
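The last step of the proof minimizes $\varphi(x)=a\,x^{1-s}+b\,x^{1/p}$ over $x>0$. Setting $\varphi'(x)=0$ gives $x_*^r=p(s-1)a/b$ with $r=s-1+\frac1p$, and substituting back gives $\varphi(x_*)=a^{\frac{1}{pr}}b^{\frac{s-1}{r}}(p(s-1))^{\frac{1}{pr}}\frac{r}{s-1}$, the shape of the constant in (8.2.51). The sketch below checks this for $s\ge2$ on sample values; the numbers and function names are illustrative assumptions, and `a` denotes the full coefficient of $x^{1-s}$, including any factor $\vartheta_{s,p}(X,Y)$.

```python
def phi(x, a, b, s, p):
    """phi(x) = a x^(1-s) + b x^(1/p), the function minimized in the proof."""
    return a * x ** (1 - s) + b * x ** (1.0 / p)

def minimizer(a, b, s, p):
    """Stationary point x* solving phi'(x) = 0, valid for s >= 2."""
    r = s - 1 + 1.0 / p
    return (p * (s - 1) * a / b) ** (1.0 / r)

def min_value(a, b, s, p):
    """Closed-form minimum a^(1/(pr)) b^((s-1)/r) (p(s-1))^(1/(pr)) r/(s-1)."""
    r = s - 1 + 1.0 / p
    return (a ** (1 / (p * r)) * b ** ((s - 1) / r)
            * (p * (s - 1)) ** (1 / (p * r)) * r / (s - 1))

a, b, s, p = 2.0, 3.0, 3, 2.0
x_star = minimizer(a, b, s, p)
# Both printed values of the minimum should coincide.
print(x_star, phi(x_star, a, b, s, p), min_value(a, b, s, p))
```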
As a consequence of the smoothing properties we shall establish a Berry–Esséen-type result that provides the right order estimate in the stable central limit theorem in terms of the metric $\vartheta_{1,1}$. Let $(X_i)$ be an i.i.d. sequence of random vectors in $\mathbb{R}^k$; let $(\Theta,\Theta_i)$ be an i.i.d. sequence of symmetric $\alpha$-stable distributed random vectors, i.e., $n^{-1/\alpha}(\Theta_1+\dots+\Theta_n)\overset{d}{=}\Theta$; and define
\[
\vartheta_r:=\vartheta_r(X_1,\Theta):=\sup_{h>0}h^{r-1}\vartheta_{1,1}(X_1+h\Theta,\Theta_1+h\Theta), \tag{8.2.59}
\]
\[
\sigma_r:=\sigma_r(X_1,\Theta):=\sup_{h>0}h^{r}\sigma(X_1+h\Theta,\Theta_1+h\Theta), \tag{8.2.60}
\]
\[
\vartheta:=\vartheta_{1,1}(X_1,\Theta),\qquad\sigma:=\sigma(X_1,\Theta),\qquad\tau_r:=\max\big(\vartheta,\sigma,\sigma_r^{\frac{1}{r-\alpha}}\big). \tag{8.2.61}
\]

Theorem 8.2.11 Suppose that $1<\alpha\le2$, $\alpha<r$, and $\vartheta+\vartheta_r+\sigma+\sigma_r<\infty$. Let $Z_n=n^{-1/\alpha}\sum_{i=1}^nX_i$. Then for some constant $C=C(k)$ depending only on the dimension $k$,
\[
\vartheta_{1,1}(Z_n,\Theta)\le C\big(n^{1-\frac r\alpha}\vartheta_r+\tau_rn^{-1/\alpha}\big). \tag{8.2.62}
\]

Proof: We shall use the notation $\vartheta(X,Y)=\vartheta_{1,1}(X,Y)$ during this proof. Note that by (8.2.10), $\vartheta(X,Y)=\bar\zeta_1(X,Y)=\zeta_1(X,Y)$. From the smoothing inequality (8.2.39) we obtain the following bound: For any $\varepsilon>0$, with $\Theta_i,\Theta$ independent and identically distributed,
\[
\vartheta(Z_n,\Theta_1)\le\vartheta(Z_n+\varepsilon\Theta,\Theta_1+\varepsilon\Theta)+C\varepsilon, \tag{8.2.63}
\]
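where $C:=2k\,E\|\Theta\|_1$. Our proof will be based on the Bergström convolution method (cf. Rachev (1991, Chapter 18) and the references therein). We start by making use of the triangle inequality:
\[
\begin{aligned}
\vartheta(Z_n+\varepsilon\Theta,\Theta_1+\varepsilon\Theta)
&\le\vartheta\Big(Z_n+\varepsilon\Theta,\ \frac{\Theta_1+X_2+\dots+X_n}{n^{1/\alpha}}+\varepsilon\Theta\Big)\\
&\quad+\sum_{j=1}^m\vartheta\Big(\frac{\Theta_1+\dots+\Theta_j+X_{j+1}+\dots+X_n}{n^{1/\alpha}}+\varepsilon\Theta,\ \frac{\Theta_1+\dots+\Theta_{j+1}+X_{j+2}+\dots+X_n}{n^{1/\alpha}}+\varepsilon\Theta\Big)\\
&\quad+\vartheta\Big(\frac{\Theta_1+\dots+\Theta_{m+1}+X_{m+2}+\dots+X_n}{n^{1/\alpha}}+\varepsilon\Theta,\ \Theta_1+\varepsilon\Theta\Big)\\
&=:I_0+\sum_{j=1}^mI_j+I_{m+1},\qquad m=[n/2].
\end{aligned}
\tag{8.2.64}
\]
Applying the smoothing property (8.2.41), we obtain
\[
\begin{aligned}
I_0&\le\vartheta\Big(\frac{X_2+\dots+X_n}{n^{1/\alpha}},\ \frac{\Theta_2+\dots+\Theta_n}{n^{1/\alpha}}\Big)\cdot\sigma\big(n^{-1/\alpha}X_1+\varepsilon\Theta,\ n^{-1/\alpha}\Theta_1+\varepsilon\Theta\big)\\
&\quad+\vartheta\Big(\frac{X_1+\Theta_2+\dots+\Theta_n}{n^{1/\alpha}}+\varepsilon\Theta,\ \frac{\Theta_1+\dots+\Theta_n}{n^{1/\alpha}}+\varepsilon\Theta\Big).
\end{aligned}
\tag{8.2.65}
\]
Similarly, for $1\le j\le m$, we have
\[
\begin{aligned}
I_j&\le\vartheta\Big(\frac{X_{j+2}+\dots+X_n}{n^{1/\alpha}},\ \frac{\Theta_{j+2}+\dots+\Theta_n}{n^{1/\alpha}}\Big)\cdot\sigma\Big(\frac{\Theta_1+\dots+\Theta_j+X_{j+1}}{n^{1/\alpha}}+\varepsilon\Theta,\ \frac{\Theta_1+\dots+\Theta_{j+1}}{n^{1/\alpha}}+\varepsilon\Theta\Big)\\
&\quad+\vartheta\Big(\frac{\Theta_1+\dots+\Theta_j+X_{j+1}+\Theta_{j+2}+\dots+\Theta_n}{n^{1/\alpha}}+\varepsilon\Theta,\ \Theta_1+\varepsilon\Theta\Big).
\end{aligned}
\tag{8.2.66}
\]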
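Summarizing the above inequalities, we get the bound
\[
\vartheta(Z_n,\Theta_1)\le\sum_{j=1}^5\Delta_j, \tag{8.2.67}
\]
where
\[
\begin{aligned}
\Delta_1&:=\vartheta\Big(\frac{X_2+\dots+X_n}{n^{1/\alpha}},\ \frac{\Theta_2+\dots+\Theta_n}{n^{1/\alpha}}\Big)\,\sigma\Big(\frac{X_1}{n^{1/\alpha}}+\varepsilon\Theta,\ \frac{\Theta_1}{n^{1/\alpha}}+\varepsilon\Theta\Big),\\
\Delta_2&:=\sum_{j=1}^m\vartheta\Big(\frac{X_{j+2}+\dots+X_n}{n^{1/\alpha}},\ \frac{\Theta_{j+2}+\dots+\Theta_n}{n^{1/\alpha}}\Big)\cdot\sigma\Big(\frac{\Theta_1+\dots+\Theta_j+X_{j+1}}{n^{1/\alpha}}+\varepsilon\Theta,\ \frac{\Theta_1+\dots+\Theta_{j+1}}{n^{1/\alpha}}+\varepsilon\Theta\Big),\\
\Delta_3&:=(m+1)\,\vartheta\Big(\frac{X_1+\Theta_2+\dots+\Theta_n}{n^{1/\alpha}},\ \Theta\Big),\\
\Delta_4&:=\vartheta\Big(\frac{\Theta_1+\dots+\Theta_{m+1}+X_{m+2}+\dots+X_n}{n^{1/\alpha}}+\varepsilon\Theta,\ \Theta_1+\varepsilon\Theta\Big),\\
\Delta_5&:=C\varepsilon=2k\,E\|\Theta\|_1\,\varepsilon.
\end{aligned}
\]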
To estimate the terms $\Delta_i$, we introduce smoothed versions of the metrics $\vartheta,\sigma$ defined as follows: For $r>1$,
\[
\vartheta_r(X,Y):=\sup_{h>0}h^{r-1}\vartheta(X+h\Theta,Y+h\Theta), \tag{8.2.68}
\]
\[
\sigma_r(X,Y):=\sup_{h>0}h^{r}\sigma(X+h\Theta,Y+h\Theta) \tag{8.2.69}
\]
(cf. also (8.2.59), (8.2.60)). It is easy to see that both $\vartheta_r$ and $\sigma_r$ are ideal metrics of order $r$.
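We first estimate $\Delta_3$ and $\Delta_4$ using the ideality of $\vartheta_r$. In the rest of the proof, $c$ stands for a general constant, which may be different at different places:
\[
\begin{aligned}
\Delta_3&=(m+1)\,\vartheta\Big(n^{-1/\alpha}X_1+\Big(\frac{n-1}{n}\Big)^{1/\alpha}\Theta,\ n^{-1/\alpha}\Theta_1+\Big(\frac{n-1}{n}\Big)^{1/\alpha}\Theta\Big)\\
&\le(m+1)\Big(\frac{n}{n-1}\Big)^{\frac{r-1}{\alpha}}\vartheta_r\big(n^{-1/\alpha}X_1,\ n^{-1/\alpha}\Theta\big)
\le c\,n^{1-\frac r\alpha}\vartheta_r.
\end{aligned}
\tag{8.2.70}
\]
Similarly,
\[
\Delta_4\le\vartheta_r\Big(\frac{X_{m+2}+\dots+X_n}{n^{1/\alpha}},\ \frac{\Theta_{m+2}+\dots+\Theta_n}{n^{1/\alpha}}\Big)\Big(\frac{n}{m+1}\Big)^{\frac{r-1}{\alpha}}\le c\,n^{1-\frac r\alpha}\vartheta_r. \tag{8.2.71}
\]
Define
\[
\varepsilon:=A\max\big(\sigma_1,\ \sigma_r^{\frac{1}{r-\alpha}}\big)\,n^{-1/\alpha},\qquad A>0. \tag{8.2.72}
\]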
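The proof continues by induction on $n$. The induction hypothesis states that for all $j<n$,
\[
\vartheta\Big(\frac{X_1+\dots+X_j}{j^{1/\alpha}},\ \frac{\Theta_1+\dots+\Theta_j}{j^{1/\alpha}}\Big)\le B\big(\vartheta_rj^{1-\frac r\alpha}+\tau_rj^{-1/\alpha}\big). \tag{8.2.73}
\]
(For $n=1,\dots,n_0$, $n_0$ fixed, (8.2.62) follows from $\tau_r\ge\vartheta$ and the ideality of $\vartheta_{1,1}$.) Then, for $\Delta_1=\Delta_1(n)$, we obtain
\[
\begin{aligned}
\Delta_1&\le Bc\big(n^{1-\frac r\alpha}\vartheta_r+n^{-1/\alpha}\tau_r\big)\,\sigma\big(n^{-1/\alpha}X_1+\varepsilon\Theta,\ n^{-1/\alpha}\Theta_1+\varepsilon\Theta\big)\\
&\le Bc\big(n^{1-\frac r\alpha}\vartheta_r+n^{-1/\alpha}\tau_r\big)\,\varepsilon^{-1}n^{-1/\alpha}\sigma_1
\le\frac1A\,Bc\big(n^{1-\frac r\alpha}\vartheta_r+n^{-1/\alpha}\tau_r\big).
\end{aligned}
\tag{8.2.74}
\]
Similarly, we estimate $\Delta_2$:
\[
\begin{aligned}
\Delta_2&\le cB\big(\vartheta_r(n-m-2)^{1-\frac r\alpha}+\tau_r(n-m-2)^{-1/\alpha}\big)\cdot\sum_{j=1}^\infty\Big(\frac jn+\varepsilon^\alpha\Big)^{-r/\alpha}\sigma_r\big(n^{-1/\alpha}X_1,\ n^{-1/\alpha}\Theta\big)\\
&\le cB\big(\vartheta_rn^{1-\frac r\alpha}+\tau_rn^{-1/\alpha}\big)\sum_{j=1}^\infty\frac{\sigma_r}{(j+\varepsilon^\alpha n)^{r/\alpha}}
\le cB\big(\vartheta_rn^{1-\frac r\alpha}+\tau_rn^{-1/\alpha}\big)\frac{1}{A^{r-\alpha}}.
\end{aligned}
\tag{8.2.75}
\]
From (8.2.70)–(8.2.75) we infer
\[
\vartheta(Z_n,\Theta)\le C_1\Big(\frac1A+\frac{1}{A^{r-\alpha}}\Big)B\big(\vartheta_rn^{1-\frac r\alpha}+\tau_rn^{-1/\alpha}\big)+C_2\big(\vartheta_rn^{1-\frac r\alpha}+A\tau_rn^{-1/\alpha}\big). \tag{8.2.76}
\]
Choosing $A$ big enough so that $C_1\big(\frac1A+\frac{1}{A^{r-\alpha}}\big)<\frac12$ and then $B$ such that $B/2>C_2(1+A)$, we complete the proof. □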
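Remark 8.2.12 (a) We note that the conditions concerning the domain of attraction of $\Theta$ are given solely in terms of the metrics appearing in the upper bounds.

(b) Since $\Theta$ has a density $p_\Theta$ with integrable derivatives, we can get, similarly to the proof of Lemma 8.2.9, that
\[
\sigma(X+\delta\Theta,Y+\delta\Theta)\le C\delta^{-1}\vartheta_{1,1}(X,Y), \tag{8.2.77}
\]
where $C=C(\Theta)$. This implies that for any $0<\varepsilon<1$,
\[
\begin{aligned}
\sigma_r(X,Y)&=\sup_{h>0}h^r\sigma(X+h\Theta,Y+h\Theta)\\
&=\sup_{h>0}h^r\sigma\big(X+h(1-\varepsilon)^{1/\alpha}\Theta_1+h\varepsilon^{1/\alpha}\Theta_2,\ Y+h(1-\varepsilon)^{1/\alpha}\Theta_1+h\varepsilon^{1/\alpha}\Theta_2\big)\\
&\le\sup_{h>0}h^r\vartheta_{1,1}\big(X+h(1-\varepsilon)^{1/\alpha}\Theta_1,\ Y+h(1-\varepsilon)^{1/\alpha}\Theta_1\big)\cdot C(\Theta)/(h\varepsilon^{1/\alpha})\\
&=C(\Theta)(1-\varepsilon)^{(1-r)/\alpha}\varepsilon^{-1/\alpha}\vartheta_r(X,\Theta).
\end{aligned}
\]
The minimum in the right-hand side is attained for $\varepsilon=1/r$, implying that
\[
\sigma_r(X,Y)\le C(\Theta)\,r^{r/\alpha}(r-1)^{(1-r)/\alpha}\vartheta_r(X,\Theta)\le C(\Theta)\,2^{r/\alpha}\vartheta_r(X,\Theta). \tag{8.2.78}
\]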
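The optimization over $\varepsilon$ leading to (8.2.78) concerns the factor $g(\varepsilon)=(1-\varepsilon)^{(1-r)/\alpha}\varepsilon^{-1/\alpha}$ on $(0,1)$. The sketch below checks on a grid that $g$ is smallest near $\varepsilon=1/r$, that $g(1/r)=r^{r/\alpha}(r-1)^{(1-r)/\alpha}$, and that this value does not exceed $2^{r/\alpha}$; the parameter values, grid and function names are illustrative assumptions.

```python
def smoothing_factor(eps, r, alpha):
    """g(eps) = (1 - eps)^((1 - r)/alpha) * eps^(-1/alpha) for 0 < eps < 1."""
    return (1.0 - eps) ** ((1.0 - r) / alpha) * eps ** (-1.0 / alpha)

def optimal_factor(r, alpha):
    """Value of g at eps = 1/r: r^(r/alpha) (r - 1)^((1 - r)/alpha), for r > 1."""
    return r ** (r / alpha) * (r - 1.0) ** ((1.0 - r) / alpha)

r, alpha = 3.0, 1.5
grid = [i / 1000.0 for i in range(1, 1000)]
best = min(grid, key=lambda e: smoothing_factor(e, r, alpha))
# best should be close to 1/r, and the last two values should coincide.
print(best, 1.0 / r, smoothing_factor(1.0 / r, r, alpha), optimal_factor(r, alpha))
```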
Similarly to the proof of Theorem 8.2.10, relations (8.2.77), (8.2.78) allow us to replace (8.2.62) by a bound involving only $\vartheta$, $\vartheta_r$:
\[
\vartheta_{1,1}(Z_n,\Theta)\le C\big(n^{1-\frac r\alpha}\vartheta_r+\max\{\vartheta,\vartheta_r^{1/(r-\alpha)}\}\,n^{-1/\alpha}\big). \tag{8.2.79}
\]
For $r\in\mathbb{N}$, $\vartheta_r$, the $r$-smoothed $\vartheta$ metric, can be estimated from above by the $\zeta_r$ metric:
\[
\vartheta_r\le c_r\zeta_r, \tag{8.2.80}
\]
where $c_r$ depends on $\sup_{\|z\|\le1}\int|p_\Theta^{(r)}(y)(z)|\,dy$ (cf. Zolotarev (1983, p. 294, property 6)). Also, for $r\in\mathbb{N}$, $\zeta_r$ is estimated from above by the $r$th difference pseudomoment $\kappa_r$, defined by
\[
\kappa_r(X,Y)=\sup\Big\{|E(f(X)-f(Y))|;\ f \text{ bounded},\ f:\mathbb{R}^k\to\mathbb{R},\ |f(x)-f(y)|\le\big\|x\|x\|^{r-1}-y\|y\|^{r-1}\big\|\Big\}, \tag{8.2.81}
\]
and
\[
\kappa_r(X,Y)\le2^rv_r(X,Y)=2^r\int\|x\|^r\,|P_X-P_Y|(dx), \tag{8.2.82}
\]
where $v_r$ is the absolute pseudomoment of order $r$. From (8.2.79)–(8.2.82) we obtain easy-to-check criteria for finiteness of the upper bounds. In particular, in the normal case $\alpha=2$, we take $r=3$, and so the finiteness of the third moments of $X_i$ implies the Berry–Esséen result. In the case $1<\alpha<2$ we use the boundedness of $\kappa_r$ in (8.2.81).

(c) In the case $k=1$, $\alpha=2$ (normal case, dimension one) the result of Theorem 8.2.11 is due to Zolotarev (1987), based on the proof of Senatov (1980).

From part (b) of the above remark, it follows that we can replace the terms $\vartheta,\sigma,\vartheta_r,\sigma_r$ in the upper bound in (8.2.62) by $\kappa_1$ and $\kappa_r$. Since $\kappa_r$ is topologically weaker than $v_r$, it is of interest to obtain alternative bounds for $\kappa_r$. To this end let us recall the minimal $\ell_r$-metric: for $r>0$,
\[
\ell_r(X,Y):=\inf\big\{(E\|\widetilde X-\widetilde Y\|^r)^{(1/r)\wedge1};\ \widetilde X\overset{d}{=}X,\ \widetilde Y\overset{d}{=}Y\big\}. \tag{8.2.83}
\]
Then
\[
\begin{aligned}
\kappa_r(X,Y)&=\inf\big\{E\big\|\widetilde X\|\widetilde X\|^{r-1}-\widetilde Y\|\widetilde Y\|^{r-1}\big\|;\ \widetilde X\overset{d}{=}X,\ \widetilde Y\overset{d}{=}Y\big\}\\
&=\ell_1\big(X\|X\|^{r-1},\ Y\|Y\|^{r-1}\big)
=\inf\big\{E\|U-V\|;\ U\overset{d}{=}X\|X\|^{r-1},\ V\overset{d}{=}Y\|Y\|^{r-1}\big\}
\end{aligned}
\tag{8.2.84}
\]
(cf. Rachev and Rüschendorf (1990)). If $X$ and $Y$ have densities $f_X$ and $f_Y$, respectively, then (cf. Rachev (1991, pp. 249–252)) we use that $\kappa_1=\ell_1$ to get the bound
\[
\kappa_r(X,Y)\le\alpha_r(X,Y):=\int_{\mathbb{R}^k}\|x\|_1\Big|\int_0^1t^{-k-1}\Big(f_{X\|X\|^{r-1}}\Big(\frac xt\Big)-f_{Y\|Y\|^{r-1}}\Big(\frac xt\Big)\Big)dt\Big|\,dx. \tag{8.2.85}
\]
For some examples with an equality in (8.2.85), see Rachev (1991, p. 252). The densities of $X\|X\|^{r-1}$, $Y\|Y\|^{r-1}$ are obtainable from the transformation formula. In particular, this gives explicit bounds in the case $r=1$, where the expression in (8.2.85) simplifies. The following upper bound for $\kappa_r$, $r>1$, will turn out to be useful.

Lemma 8.2.13 If $E\|X\|^r<\infty$, $E\|Y\|^r<\infty$, $r>1$, then
\[
\kappa_r(X,Y)\le c\,\ell_r(X,Y), \tag{8.2.86}
\]
where the constant $c$ depends on the $r$th moments of $X$, $Y$.

Proof: Let $r':=\frac{r}{r-1}$ and $U\overset{d}{=}X$, $V\overset{d}{=}Y$. Then
\[
E\big\|U\|U\|^{r-1}-V\|V\|^{r-1}\big\|\le E\|U\|^{r-1}\|U-V\|+E\|V\|\,\big|\|U\|^{r-1}-\|V\|^{r-1}\big|=:I_1+I_2.
\]
For $I_1$, $I_2$ we readily get the bounds $I_1\le(E\|U\|^r)^{1/r'}(E\|U-V\|^r)^{1/r}$ and
\[
I_2\le\begin{cases}
(E\|V\|^r)^{1/r'}(E\|U-V\|^r)^{1/r}, & 1<r\le2,\\[1ex]
(r-1)(E\|U-V\|^r)^{1/r}\cdot\Big[(E\|V\|^r)^{1/r'}+(E\|V\|^r)^{\frac{1}{r-1}}(E\|U\|^r)^{\frac{r-2}{r-1}}\Big], & r>2.
\end{cases}
\]
So $I_1+I_2\le c\,(E\|U-V\|^r)^{1/r}$, and passing to the corresponding minimal metrics, we get $\kappa_r(X,Y)\le c\,\ell_r(X,Y)$, as required. □

In some examples one can determine $\kappa_r$ explicitly. Suppose that for some radial transformation
\[
\phi:\mathbb{R}^k\to\mathbb{R}^k,\qquad\phi(x):=\begin{cases}\alpha(\|x\|)\,\dfrac{x}{\|x\|} & \text{if } x\ne0,\\[1ex] 0 & \text{if } x=0,\end{cases}
\]
with $\alpha$ monotonically nondecreasing, we have $Y\overset{d}{=}\phi(X)$. Examples of this relation include spherically invariant distributions and spherically equivalent distributions, as for example, the uniform distribution on a $p$-ball in $\mathbb{R}^k$ and the product of Weibull distributions (cf. Section 3.2). By (8.2.84), it is easy to see that the pair $(X,\phi(X))$ is an optimal coupling with respect to $\kappa_r$, and so we obtain
\[
\kappa_r(X,Y)=E\big\|X\|X\|^{r-1}-\phi(X)\|\phi(X)\|^{r-1}\big\|=E\big|\|X\|^r-\alpha(\|X\|)^r\big|. \tag{8.2.87}
\]
(A related explicit formula was derived for the $\ell_r$ distance in Section 3.2.) Note that $\alpha$ is determined by the equation $F_{\|Y\|}(y)=P(\alpha(\|X\|)\le y)=F_{\|X\|}(\alpha^{-1}(y))$, which in the case of $F_{\|X\|}$ continuous leads to $\alpha(t)=F_{\|Y\|}^{-1}\circ F_{\|X\|}(t)$.

We illustrate the above results, invoking again the stable limit theorem 8.2.11. Let $\Theta$ be a $k$-dimensional $\alpha$-stable random vector with spectral measure $m$ such that $\int_{\mathbb{R}^k}\|x\|^\alpha\,dm(x)<\infty$; i.e.,
\[
E\exp\{i\langle t,\Theta\rangle\}=\exp\Big\{-\frac12\int_{\mathbb{R}^k}|\langle t,s\rangle|^\alpha\,dm(s)\Big\}. \tag{8.2.88}
\]
We apply the LePage representation for symmetric $\alpha$-stable laws. Let

1. $(\widetilde Y_j)$ be an i.i.d. sequence of random vectors with distribution $m/|m|$ and let $Y_j:=|m|^{1/\alpha}\widetilde Y_j$;
2. $(\widetilde\eta_j)$ be i.i.d. symmetric random variables with $\|\widetilde\eta_1\|_\alpha=(E|\widetilde\eta_1|^\alpha)^{1/\alpha}<\infty$ and let $\eta_j:=\widetilde\eta_j/\|\widetilde\eta_j\|_\alpha$;
3. $(\Gamma_j)$ be the sequence of successive times of jump of a standard Poisson process,

and assume that the three sequences are independent. Then
\[
\Theta\overset{d}{=}c_\alpha\sum_{j=1}^\infty\Gamma_j^{-1/\alpha}\eta_jY_j, \tag{8.2.89}
\]
where the constant $c_\alpha$ is determined by the tail behavior of the law of $\Theta$. Without loss of generality we set $c_\alpha=1$ (cf. Ledoux and Talagrand (1991, Section 5.1) and Samorodnitsky and Taqqu (1994)). Suppose that the distribution of $X$ has a similar representation in distribution,
\[
X\overset{d}{=}\sum_{j=1}^\infty\Gamma_j^{-1/\alpha}\eta_j^*Y_j^*, \tag{8.2.90}
\]
where $(Y_j^*)$, $(\eta_j^*)$ are independent but not necessarily identically distributed. Recall the bound (8.2.80) to see that what we need is an estimate for $\zeta_r(X,\Theta)$ from above.

Proposition 8.2.14 Let $r>\max\{1,\alpha\}$. Suppose that $Y_j,Y_j^*,\eta_j,\eta_j^*$ have finite $r$th moments with $\sup_jE|\eta_j^*|^r<\infty$. Then
\[
\zeta_r(X,\Theta)\le C\sup_{j\ge1}\Big(\ell_r(Y_j^*,Y_j)+\int_{-\infty}^\infty|x|^{r-1}\,|F_{\eta_j^*}(x)-F_{\eta_j}(x)|\,dx\Big). \tag{8.2.91}
\]
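Proof: By the ideality of $\zeta_r$,
\[
\zeta_r(X,\Theta)\le\sum_{j=1}^\infty E\big(\Gamma_j^{-r/\alpha}\big)\,\zeta_r(\eta_j^*Y_j^*,\eta_jY_j).
\]
Since $r>\alpha$, and for $j>r/\alpha$,
\[
E\,\Gamma_j^{-r/\alpha}=\frac{\Gamma(j-r/\alpha)}{\Gamma(j)}\sim j^{-r/\alpha},
\]
the series $S_r=\sum_{j=1}^\infty E\big(\Gamma_j^{-r/\alpha}\big)$ converges. Furthermore,
\[
\begin{aligned}
\zeta_r(\eta_j^*Y_j^*,\eta_jY_j)&\le\zeta_r(\eta_j^*Y_j^*,\eta_j^*Y_j)+\zeta_r(\eta_j^*Y_j,\eta_jY_j)\\
&\le(E|\eta_j^*|^r)\,\zeta_r(Y_j^*,Y_j)+\sum_{i=1}^k(E|Y_{j,i}|^r)\,\zeta_r(\eta_j^*,\eta_j)\\
&\le C\big(\zeta_r(Y_j^*,Y_j)+\zeta_r(\eta_j^*,\eta_j)\big),
\end{aligned}
\]
where $Y_{j,i}$ is the $i$th component of $Y_j$. Since
\[
\zeta_r(\eta_j^*,\eta_j)\le\frac{1}{r!}\,\kappa_r(\eta_j^*,\eta_j)=\frac{1}{(r-1)!}\int_{-\infty}^\infty|x|^{r-1}\,|F_{\eta_j^*}(x)-F_{\eta_j}(x)|\,dx,
\]
we obtain (8.2.91) after applying the inequality $\zeta_r(Y_j^*,Y_j)\le\frac{1}{r!}\kappa_r(Y_j^*,Y_j)$ and Lemma 8.2.13. □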
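The negative moments used in the proof can be evaluated directly, since the $j$th jump time $\Gamma_j$ of a standard Poisson process has a Gamma$(j,1)$ law and hence $E\,\Gamma_j^{-\beta}=\Gamma(j-\beta)/\Gamma(j)$ for $j>\beta$. The sketch below illustrates the asymptotics $j^{-\beta}$ and the summability of the tail for $\beta=r/\alpha>1$; the parameter values and function names are assumptions chosen for illustration, and `math.lgamma` is used only to avoid overflow.

```python
import math

def neg_moment(j, beta):
    """E[Gamma_j^(-beta)] = Gamma(j - beta) / Gamma(j), valid for j > beta."""
    return math.exp(math.lgamma(j - beta) - math.lgamma(j))

def tail_sum(beta, j_start, j_end):
    """Partial sum of E[Gamma_j^(-beta)] over j_start <= j <= j_end (j_start > beta)."""
    return sum(neg_moment(j, beta) for j in range(j_start, j_end + 1))

r, alpha = 3.0, 1.5
beta = r / alpha  # here beta = 2
for j in (10, 100, 10_000):
    # The ratio to j^(-beta) tends to 1 as j grows.
    print(j, neg_moment(j, beta) * j ** beta)
# For beta = 2 the terms are 1/((j-1)(j-2)), so the sum telescopes to 1 - 1/(N-1).
print(tail_sum(beta, 3, 10_001))
```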
Under the additional assumption
\[
\sup_jE\|Y_j^*\|^r<\infty \tag{8.2.92}
\]
we obtain (by the obvious bound $\ell_r(Y_j^*,Y_j)\le(E\|Y_j^*\|^r)^{1/r}+(E\|Y_j\|^r)^{1/r}$) the finiteness of the upper bound in the limit theorem in (8.2.62). In this way we establish a stable limit theorem (with an estimate of the rate of convergence) for random vectors in the $\ell_r$-neighborhood of a stable symmetric law in the sense of the LePage representation. For $r=1$ we use the estimate in (8.2.85). For $r>1$ (and in particular $r=2$) explicit expressions for $\ell_r$ are known in several cases (cf. Section 3.2), for example for the distance $\ell_r(X,Y)$ between normally distributed random vectors $X$ and $Y$, between uniform distributions on balls and multivariate normal, or Weibull, distributions, and between spherically equivalent distributions.
8.3 Application to Summability Methods and Compound Poisson Approximation

In this section we apply the $\vartheta_{s,p}$-metric (see (8.2.2)) to obtain rate of convergence results in stable limit theorems for multivariate summability methods, thus extending some results of Maejima (1985) in the real case. We also study the approximation of sums of independent random variables by compound Poisson distributions.

Let $(X_n)_{n\ge0}$ be an i.i.d. sequence of random vectors in $\mathbb{R}^k$ and consider the weighted sums
\[
T(\lambda):=\sum_{j=0}^\infty c_j(\lambda)X_j,\qquad c_j(\lambda)\ge0, \tag{8.3.1}
\]
where for $\lambda>0$ or $\lambda\in\mathbb{N}$, $(c_j(\lambda))$, $j\ge0$, is a summability method. Some classical summability methods are
\[
\text{"Cesàro method"}\qquad c_j(\lambda)=\begin{cases}\dfrac{1}{n+1}, & 0\le j\le n,\\[1ex] 0, & \text{otherwise};\end{cases} \tag{8.3.2}
\]
\[
\text{"Borel method"}\qquad c_j(\lambda)=\frac{\lambda^j}{j!}\,e^{-\lambda},\qquad\lambda>0,\ j\in\mathbb{N}_0; \tag{8.3.3}
\]
\[
\text{"Euler method"}\qquad c_j(\lambda)=\binom nj\lambda^j(1-\lambda)^{n-j},\qquad0\le j\le n,\ 0<\lambda<1; \tag{8.3.4}
\]
\[
\text{"Abel method"}\qquad c_j(\lambda)=(1-e^{-1/\lambda})\,e^{-j/\lambda},\qquad0\le j<\infty; \tag{8.3.5}
\]
\[
\text{"random walk method"}\qquad c_j(n)=P(S_n=j),\qquad0\le j<\infty, \tag{8.3.6}
\]
where $S_n$ is a random walk on the integers $\mathbb{N}_0$. For a review and discussion of these methods in the univariate case we refer to Maejima (1985).

Let $\Theta^{(\alpha)}$ denote a random vector with symmetric stable distribution of index $\alpha$, $0<\alpha\le2$. Recall (see Samorodnitsky and Taqqu (1994)) that for $0<\alpha<2$, $\Theta^{(\alpha)}$ is symmetric $\alpha$-stable in $\mathbb{R}^k$ if and only if there exists a (unique) symmetric finite measure $\Gamma$ on the unit sphere $S_k$ such that
\[
\varphi_{\Theta^{(\alpha)}}(t)=E\exp\{i\langle t,\Theta^{(\alpha)}\rangle\}=\exp\Big\{-\int_{S_k}|\langle t,s\rangle|^\alpha\,\Gamma(ds)\Big\},\qquad t\in\mathbb{R}^k. \tag{8.3.7}
\]
Define then
\[
d_\alpha(\lambda)=\Big(\sum_{j=0}^\infty c_j(\lambda)^\alpha\Big)^{1/\alpha}. \tag{8.3.8}
\]
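Theorem 8.3.1 Let $0<\alpha<r=s-1+\frac1p$. Then
\[
\vartheta_{s,p}\Big(\frac{1}{d_\alpha(\lambda)}T(\lambda),\ \Theta^{(\alpha)}\Big)\le R(\lambda)\,\vartheta_{s,p}, \tag{8.3.9}
\]
where
\[
R(\lambda)=\sum_{j=0}^\infty\big(c_j(\lambda)/d_\alpha(\lambda)\big)^r,\qquad\vartheta_{s,p}:=\vartheta_{s,p}(X_0,\Theta^{(\alpha)}). \tag{8.3.10}
\]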
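The rate factor $R(\lambda)=\sum_j(c_j(\lambda)/d_\alpha(\lambda))^r$ of Theorem 8.3.1 is straightforward to evaluate for concrete weights. The sketch below does this for the Cesàro weights (8.3.2), where a direct computation gives $R=(n+1)^{1-r/\alpha}$, and for truncated Abel weights (8.3.5); the truncation tolerance, the parameter values and all function names are assumptions made for illustration.

```python
import math

def d_alpha(weights, alpha):
    """Normalizing constant (sum_j c_j^alpha)^(1/alpha) from (8.3.8)."""
    return sum(c ** alpha for c in weights) ** (1.0 / alpha)

def rate_factor(weights, alpha, r):
    """R = sum_j (c_j / d_alpha)^r from (8.3.10)."""
    d = d_alpha(weights, alpha)
    return sum((c / d) ** r for c in weights)

def cesaro_weights(n):
    """Cesaro weights: c_j = 1/(n+1) for 0 <= j <= n."""
    return [1.0 / (n + 1)] * (n + 1)

def abel_weights(lam, tol=1e-16):
    """Abel weights (1 - e^{-1/lam}) e^{-j/lam}, truncated once terms fall below tol."""
    q = math.exp(-1.0 / lam)
    weights, c = [], 1.0 - q
    while c > tol:
        weights.append(c)
        c *= q
    return weights

alpha, r = 2.0, 3.0
for n in (9, 99, 999):
    # Cesaro: the computed factor should match (n + 1)^(1 - r/alpha).
    print(n, rate_factor(cesaro_weights(n), alpha, r), (n + 1) ** (1 - r / alpha))
for lam in (10.0, 100.0, 1000.0):
    # Abel: the factor decreases as lam grows.
    print(lam, rate_factor(abel_weights(lam), alpha, r))
```

With $\alpha=2$ and $r=3$ the Cesàro factor decays like $(n+1)^{-1/2}$.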
Proof: Let $(\Theta_j)$ be an i.i.d. sequence with the same distribution as $\Theta^{(\alpha)}$. Let us show first that
\[
\frac{1}{d_\alpha(\lambda)}\sum_{j=0}^\infty c_j(\lambda)\Theta_j\overset{d}{=}\Theta^{(\alpha)}. \tag{8.3.11}
\]
Consider the characteristic function of the left-hand side of (8.3.11):
\[
\begin{aligned}
E\exp\Big\{i\Big\langle\frac{1}{d_\alpha(\lambda)}\sum_{j=0}^\infty c_j(\lambda)\Theta_j,\ t\Big\rangle\Big\}
&=\prod_{j=0}^\infty E\exp\Big\{i\Big\langle\frac{c_j(\lambda)}{d_\alpha(\lambda)}\Theta_j,\ t\Big\rangle\Big\}\\
&=\prod_{j=0}^\infty\exp\Big\{-\int_{S_k}\Big|\Big\langle\frac{c_j(\lambda)}{d_\alpha(\lambda)}\,t,\ s\Big\rangle\Big|^\alpha\,\Gamma(ds)\Big\}\\
&=\exp\Big\{-\sum_{j=0}^\infty\Big(\frac{c_j(\lambda)}{d_\alpha(\lambda)}\Big)^\alpha\int_{S_k}|\langle t,s\rangle|^\alpha\,\Gamma(ds)\Big\}
=\varphi_{\Theta^{(\alpha)}}(t).
\end{aligned}
\tag{8.3.12}
\]
By Lemma 8.2.1, $\vartheta_{s,p}$ is ideal of order $r=s-1+\frac1p>\alpha$. Therefore,
\[
\begin{aligned}
\vartheta_{s,p}\Big(\frac{1}{d_\alpha(\lambda)}T(\lambda),\ \Theta^{(\alpha)}\Big)
&=\vartheta_{s,p}\Big(\frac{1}{d_\alpha(\lambda)}\sum_{j=0}^\infty c_j(\lambda)X_j,\ \frac{1}{d_\alpha(\lambda)}\sum_{j=0}^\infty c_j(\lambda)\Theta_j\Big)\\
&\le d_\alpha(\lambda)^{-r}\sum_{j=0}^\infty c_j(\lambda)^r\,\vartheta_{s,p}(X_0,\Theta^{(\alpha)})
=R(\lambda)\,\vartheta_{s,p}(X_0,\Theta^{(\alpha)}).
\end{aligned}
\tag{8.3.13}
\]
□

Note that various upper bounds for $\vartheta_{s,p}$ were established in Section 8.2. In particular, if $r\in\mathbb{N}$, or if $X_0$ has a density, we have obtained upper bounds in terms of difference pseudomoments. Maejima (1985) showed that
\[
R(\lambda)\le c\,\lambda^{-(r-\alpha)/\alpha} \tag{8.3.14}
\]
for the Cesàro and Abel methods, and
\[
R(\lambda)\le c\,\lambda^{-(r-\alpha)/2\alpha} \tag{8.3.15}
\]
for the random walk method (which includes the Euler method and the Borel method as particular cases). In the Gaussian case, for $r=3$ the metric $\vartheta_{s,p}$ in (8.3.9) is finite, provided that (i) $\operatorname{Cov}(X_0)=I_k$, the identity matrix, and (ii) the components have finite third moments. Furthermore, the corresponding rate of convergence is $\lambda^{-1/2}$ for the Cesàro and Abel methods and $\lambda^{-1/4}$ for the random walk method.

We complete this section with an application of the ideality properties of our metrics to the approximation of the distribution of sums of nonidentically distributed random vectors by a compound Poisson law. Let $X_1,\dots,X_n$ be independent random vectors in $\mathbb{R}^k$ with distributions $P_1,\dots,P_n$ of the form
\[
P_i=(1-p_i)\delta_0+p_iQ_i,\qquad0\le p_i\le1,\ 1\le i\le n. \tag{8.3.16}
\]
Here, $\delta_0$ stands for the one-point distribution at zero. We can write $X_i$ in the form
\[ X_i = C_i D_i, \qquad 1 \le i \le n, \tag{8.3.17} \]
where $C_i$ has distribution $Q_i$, $D_i$ is $B(1, p_i)$-distributed, and $C_i$, $D_i$ are independent. We shall consider the approximation of
\[ S^{ind} := \sum_{i=1}^{n} X_i \tag{8.3.18} \]
by a multivariate compound Poisson distribution $\mathcal{P}(\mu, Q)$. $\mathcal{P}(\mu, Q)$ is defined as the distribution of
\[ S^{coll} := \sum_{i=1}^{N} Z_i, \tag{8.3.19} \]
where $(Z_i)$ is an i.i.d. sequence with $P^{Z_i} = Q$; $N$ is Poisson distributed with parameter $\mu$, $P^{N} = \mathcal{P}(\mu)$; and $N$, $(Z_i)$ are independent. The notation $S^{ind}$, $S^{coll}$ is taken from risk theory. Recall that in the risk-theory framework, in the "individual model" $p_i$ is the probability of a claim $C_i$ with distribution $Q_i$, corresponding to $k$ different types of claims. $S^{coll}$ denotes the approximation of $S^{ind}$ by the "collective model"; we refer to the books of Gerber (1981) and Hipp and Michel (1990) for these and related notions. The usual choice of $Q$, $\mu$ in risk theory is
\[ \mu = \tilde\mu := \sum_{i=1}^{n} p_i, \qquad Q = \tilde Q := \sum_{i=1}^{n} \frac{p_i}{\tilde\mu}\, Q_i. \tag{8.3.20} \]
This leads to the following representation of $S^{coll}$:
\[ S^{coll} = \sum_{i=1}^{n} S_i^{coll}, \tag{8.3.21} \]
where $S_i^{coll} \sim \mathcal{P}(p_i, Q_i)$ ($X \sim Q$ denoting that $X$ has distribution $Q$) and the $\{S_i^{coll}\}$ are independent. Note that with this choice $\mu = \tilde\mu$, $Q = \tilde Q$, moreover,
\[ E\, S^{ind} = E\, S^{coll}, \tag{8.3.22} \]
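if the expectations exist. If $\Sigma_i = \mathrm{Cov}(C_i)$, $\alpha_i = (\alpha_{i,1}, \dots, \alpha_{i,k}) = E C_i$, and $q_i = 1 - p_i$, then
\[ \mathrm{Cov}\, S^{ind} = \sum_{i=1}^{n} p_i \Sigma_i + \sum_{i=1}^{n} p_i q_i\, \alpha_i^{T} \alpha_i, \tag{8.3.23} \]
while
\[ \mathrm{Cov}\, S^{coll} = \sum_{i=1}^{n} p_i \Sigma_i + \sum_{i=1}^{n} p_i\, \alpha_i^{T} \alpha_i. \tag{8.3.24} \]
As a consequence we obtain the following majorization result:
\[ \mathrm{Cov}\, S^{ind} \le \mathrm{Cov}\, S^{coll} \quad \text{in the sense of the Loewner ordering.} \tag{8.3.25} \]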
1 p
≥ 3.
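The moment identities (8.3.22) to (8.3.25) can be checked numerically. The following is a minimal sketch for the one-dimensional case $k = 1$ under the usual choice (8.3.20); the function name `moments` and the sample parameters are hypothetical, and the compound Poisson variance $p_i\,E[C_i^2]$ used for $S_i^{coll}$ is the standard formula rather than something derived in the text.

```python
def moments(p, alpha, sigma2):
    """Means and variances of S^ind and S^coll for k = 1 under (8.3.20).

    p[i]      : claim probability p_i
    alpha[i]  : mean of the claim C_i
    sigma2[i] : variance of the claim C_i
    """
    mean_ind = sum(pi * ai for pi, ai in zip(p, alpha))
    # Var(C_i D_i) = p_i sigma_i^2 + p_i (1 - p_i) alpha_i^2, cf. (8.3.23)
    var_ind = sum(pi * s2 + pi * (1.0 - pi) * ai ** 2
                  for pi, ai, s2 in zip(p, alpha, sigma2))
    # S_i^coll ~ P(p_i, Q_i): mean p_i alpha_i, variance p_i E[C_i^2], cf. (8.3.24)
    mean_coll = sum(pi * ai for pi, ai in zip(p, alpha))
    var_coll = sum(pi * (s2 + ai ** 2)
                   for pi, ai, s2 in zip(p, alpha, sigma2))
    return mean_ind, var_ind, mean_coll, var_coll


# Hypothetical portfolio with two risks.
m_ind, v_ind, m_coll, v_coll = moments([0.1, 0.2], [1.0, 2.0], [0.5, 1.0])
```

In this scalar setting the Loewner ordering reduces to `v_ind <= v_coll`, and the gap equals $\sum_i p_i^2 \alpha_i^2$.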
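In Rachev and Rüschendorf (1990) it is shown that a better choice of $(\mu, Q)$ is possible for $k = 1$ by appropriately chosen scale transformations. For an extension of this result to $k \ge 1$, define
\[
\begin{aligned}
\mu_i &:= (1 - p_i)\alpha_i, & \Gamma_i &:= (1 - p_i)\Sigma_i, & \beta_i &:= \frac{p_i}{1 - p_i}; \\
\mu &:= \sum_{i=1}^{n} \mu_i, & Q &:= \sum_{i=1}^{n} \frac{\beta_i}{\beta}\, Q_i, & \beta &:= \sum_{i=1}^{n} \beta_i,
\end{aligned}
\tag{8.3.26}
\]
where $Q_i$ is a probability measure with mean $\mu_i$ and covariance $\Gamma_i$. We approximate $X_i$ by a compound Poisson distributed r.v., $S_i^{coll} \sim \mathcal{P}(\beta_i, Q_i)$. This leads to an approximation of $S^{ind}$ by
\[ S^{coll} \sim \mathcal{P}(\beta, Q). \tag{8.3.27} \]
From our construction, it follows that
\[
\begin{aligned}
E X_i &= p_i \alpha_i = E\, S_i^{coll}; \\
\mathrm{Cov}(X_i) &= \Sigma_i = \mathrm{Cov}\, S_i^{coll} = \beta_i \Gamma_i + \beta_i\, \mu_i^{T} \mu_i.
\end{aligned}
\tag{8.3.28}
\]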
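The "ideal" properties of the metric $\vartheta_{s,p}$ derived in Section 8.2 yield closeness between the "individual" and the "collective" models using the following bounds:
\[ \vartheta_{s,p}\big( S^{ind}, S^{coll} \big) \le \sum_{i=1}^{n} \vartheta_{s,p}\big( X_i, S_i^{coll} \big) \tag{8.3.29} \]
and
\[ \vartheta_{1,p}\big( S^{ind}, S^{coll} \big) \le A(s, p, k) \Big( \sum_{i=1}^{n} \vartheta_{s,p}\big( X_i, S_i^{coll} \big) \Big)^{\frac{1}{pr}}. \tag{8.3.30} \]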
The constant A(s, p, k) can be determined from (8.2.45). ϑs,p Xi , Sicoll is estimated from above by the metric αs,p (Theorem 8.2.5), which depends only on pseudo-difference moments. In particular, for r = s − 1 + p1 = s and bi = EXi s1 , ϑs,p (Xi , Sicoll ) ≤ pi bi
1 (s − 1)!
(cf. (8.2.11)).
(8.3.31)
Define the normalizations
\[ \tilde X_i = X_i - E X_i, \qquad Y_i = S_i^{coll} - E\, S_i^{coll}. \tag{8.3.32} \]
Consider the i.i.d. case; we shall establish an estimate similar to that in (8.3.29) but without the dependence on $n$ in the upper bound.
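Theorem 8.3.2 Suppose $(X_n)$ is an i.i.d. sequence with $\alpha = E\, C_i$, $\Sigma = \mathrm{Cov}(C_i)$, $p = P(D_i = 1)$. Then
\[ \vartheta_{1,1}\big( S^{ind}, S^{coll} \big) \le C\,(\vartheta_r + \tilde\vartheta_r + \tau_r + \tilde\tau_r), \tag{8.3.33} \]
where $\vartheta_r = \vartheta_r(\tilde X_1, \Theta)$, $\tilde\vartheta_r = \vartheta_r(Y_1, \Theta)$, $\tau_r = \tau_r(\tilde X_1, \Theta)$, $\tilde\tau_r = \tau_r(Y_1, \Theta)$, $C$ is as defined in (8.2.62), and $\Theta$ is an $N(0, \Sigma)$-distributed r.v.

Proof: The ideality of $\vartheta_{1,1}$ and the triangle inequality yield
\[
\begin{aligned}
\vartheta_{1,1}\big( S^{ind}, S^{coll} \big)
&= \vartheta_{1,1}\Big( \sum_{i=1}^{n} X_i, \sum_{i=1}^{n} S_i^{coll} \Big)
 = \vartheta_{1,1}\Big( \sum_{i=1}^{n} \tilde X_i, \sum_{i=1}^{n} Y_i \Big) \\
&\le \sqrt{n}\, \vartheta_{1,1}\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \tilde X_i, \Theta \Big)
 + \sqrt{n}\, \vartheta_{1,1}\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i, \Theta \Big),
\end{aligned}
\]
where $\Theta$ is normally distributed with mean zero and covariance $\Sigma$. The proof of Theorem 8.2.11 (in the case $\alpha = 2$ with $\Sigma$ being the identity matrix) extends to the general case with the same constant, thus implying (8.3.33). $\Box$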
8.4 Application to Operator-Stable Limit Theorems: Statement of the Results and Auxiliary Lemmas

In this section we generalize some of the results from Section 8.2, studying the rate of convergence problem for a more general summation scheme for random vectors. Namely, suppose that the $\mathbb{R}^d$-valued random vector $\theta$ is strictly operator-stable in the sense that $\hat\mu$, the characteristic function of $\theta$, satisfies $\hat\mu(z)^t = \hat\mu(t^{B^*} z)$ for every $t > 0$, for some invertible linear operator $B$ on $\mathbb{R}^d$. Suppose also that for the i.i.d. random vectors $\{X_i\}$ in $\mathbb{R}^d$, $n^{-B} \sum_{i=1}^{n} X_i \xrightarrow{w} \theta$. In this and the next section we study the rate of convergence of this operator-stable limit theorem in terms of several probability metrics including the Kantorovich metric and $L_p$-minimal versions, see (2.6.2). The results in this and the next section are due to Maejima and Rachev (1996).

We start with some definitions and notation related to operator-stable limit theorems. A probability distribution $\mu$ on $\mathbb{R}^d$ is said to be full if $\mu$ is not concentrated on a proper hyperplane in $\mathbb{R}^d$. A full distribution $\mu$ on $\mathbb{R}^d$ is called operator-stable if there exists an invertible linear operator $B$ on $\mathbb{R}^d$ and a function $b : (0, \infty) \to \mathbb{R}^d$ such that for all $t > 0$,
\[ \hat\mu(z)^t = \hat\mu(t^{B^*} z)\, e^{i \langle b(t), z \rangle} \quad \text{for all } z \in \mathbb{R}^d. \tag{8.4.1} \]
Here $\hat\mu$ is the characteristic function of $\mu$, $B^*$ is the adjoint operator of $B$, and $t^A = \exp\{(\ln t) A\} = \sum_{k=0}^{\infty} (k!)^{-1} (\ln t)^k A^k$. The distribution $\mu$ is called strictly operator-stable if we can choose $b(t) \equiv 0$. In this section, we always assume that $\mu$ is a full strictly operator-stable distribution on $\mathbb{R}^d$. Sharpe (1969) showed that if 1 is not in the spectrum of $B$, then the operator-stable law can be centered so as to become strictly operator-stable. Thus, the assumption of strict operator-stability is not so restrictive. The invertible linear operator $B$ in (8.4.1) is called an exponent of $\mu$. When $\mu$ is operator-stable with an exponent $B$, $\mu$ may satisfy (8.4.1) for other $B$'s; i.e., the exponent of $\mu$ is not necessarily unique. Further, we fix the value of the exponent $B$ and denote by $\theta$ the random vector in $\mathbb{R}^d$ having the full strictly operator-stable distribution $\mu$ with this fixed $B$. It is known that every eigenvalue of $B$ has its real part not less than $\frac{1}{2}$ (see Sharpe (1969)). Recall that for a given sequence $X_1, X_2, \dots$ of i.i.d. random vectors in $\mathbb{R}^d$ for which the normalized sum converges to $\theta$, namely,
\[ n^{-B} \sum_{i=1}^{n} X_i \xrightarrow{w} \theta, \tag{8.4.2} \]
we say that $\{X_i\}$ belongs to the domain of normal attraction of $\mu$. As in the previous sections, we will be interested in the rate of convergence of (8.4.2).

Remark 8.4.1 Some of the results in this section can be extended to Banach space-valued random variables. Also, our arguments can be used for similar rate of convergence problems of the max-operator-stable limit theorem.

We start with some notation. Let $\|\cdot\|_0$ be the usual Euclidean norm of $\mathbb{R}^d$ and let $S(\mu)$ be the symmetry group associated with $\mu$, that is, the group of all invertible linear operators $A$ on $\mathbb{R}^d$ such that for some $a \in \mathbb{R}^d$, $\hat\mu(z) = \hat\mu(A^* z)\, e^{i \langle a, z \rangle}$. Since by assumption $\mu$ is full, $S(\mu)$ is compact, and thus there exists a Haar probability $H$ on $S(\mu)$. We introduce the following norm $\|\cdot\|$, which depends on the particular operator-stable law $\mu$ but not on the choice of exponent:
\[ \|x\| = \int_{S(\mu)} \int_0^1 \| g\, t^B x \|_0\, \frac{dt}{t}\, dH(g) \tag{8.4.3} \]
(see Hudson et al. (1986) and Hahn et al. (1989)). It has the following properties:

(i) $\|\cdot\|$ does not depend on the choice of the exponent $B$.

(ii) The map $t \mapsto \|t^B x\|$ is strictly increasing on $(0, \infty)$ for $x \ne 0$.

Define the norm of the linear operator $A$ on $\mathbb{R}^d$ in the usual way by $\|A\| = \sup_{\|x\| = 1} \|Ax\|$. Then property (ii) implies

(iii) The map $t \mapsto \|t^B\|$ is strictly increasing on $(0, \infty)$; i.e., $t \mapsto \|t^{-B}\| = \|(t^{-1})^B\|$ is strictly decreasing on $(0, \infty)$.

Further, we will need estimates of the growth rate of $R(t) = \|t^B x\|$. Meerschaert (1989) showed that for every $x$, the function $R_0(t) = \|t^B x\|_0$ varies regularly with index between $\lambda_B$ and $\Lambda_B$, where $\lambda_B$ and $\Lambda_B$ are the minimum and the maximum of the real parts of the eigenvalues of $B$, respectively. Clearly, for every norm $\|\cdot\|$ on $\mathbb{R}^d$, the function $R(t) = \|t^B x\|$ will be of the same order as the regularly varying function $R_0(t)$. In particular, for any $\eta > 0$, there exists $t_0 > 0$ such that for any $t > t_0$,
\[ t^{\lambda_B - \eta} \|x\| < \|t^B x\| < t^{\Lambda_B + \eta} \|x\|, \tag{8.4.4} \]
and
\[ t^{-\Lambda_B - \eta} \|x\| < \|t^{-B} x\| < t^{-\lambda_B + \eta} \|x\|. \tag{8.4.5} \]
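The operator power $t^B = \exp\{(\ln t) B\}$ can be evaluated directly from its series. The sketch below is an illustration only: the helper names `mat_mul`, `mat_exp`, `t_power` and the exponent `B_EXAMPLE` (a Jordan block with eigenvalue $0.75$, so $\lambda_B = \Lambda_B = 0.75$) are assumptions made for this example, and the truncated series is adequate only for moderate values of $|\ln t|$.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, terms=40):
    """Truncated power series sum_{k < terms} a^k / k!."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def t_power(t, b):
    """t^B = exp{(ln t) B} for t > 0."""
    log_t = math.log(t)
    return mat_exp([[log_t * x for x in row] for row in b])


# Hypothetical exponent: for this Jordan block, t^B = t^0.75 * [[1, ln t], [0, 1]].
B_EXAMPLE = [[0.75, 1.0], [0.0, 0.75]]
P3 = t_power(3.0, B_EXAMPLE)
```

The off-diagonal entry $t^{0.75} \ln t$ shows why (8.4.4) needs the slack $\eta$: the growth of $\|t^B x\|$ is a power of $t$ only up to a logarithmic factor.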
Let $\mathcal{X}(\mathbb{R}^d)$ be the class of all random vectors in $\mathbb{R}^d$, and $\rho$ the Kolmogorov metric in $\mathcal{X}(\mathbb{R}^d)$,
\[ \rho(X, Y) := \sup_{x \in \mathbb{R}^d} |P(X \le x) - P(Y \le x)|. \tag{8.4.6} \]
Here, and throughout this section, $x \le y$ or $x < y$, $x, y \in \mathbb{R}^d$, means component-wise inequality. Also, all the probability metrics $\mu$ that we shall use are in fact metrics in the space of probability laws: We write $\mu(X, Y)$ instead of $\mu(P_X, P_Y)$ only for the sake of simplicity, where $P_X$, $P_Y$ stand for the probability distributions of $X$, $Y$, respectively. Next, we define a uniform metric depending on the exponent $B$,
\[ \rho^*(X, Y) := \sup_{t > 0} \rho(t^B X, t^B Y). \tag{8.4.7} \]
This metric plays a crucial role in our approach to the rate of convergence problem (8.4.2). Let Var be the total variation distance in $\mathcal{X}(\mathbb{R}^d)$,
\[
\begin{aligned}
\mathrm{Var}(X, Y) &:= 2 \sup_{A \in \mathcal{B}(\mathbb{R}^d)} |P(X \in A) - P(Y \in A)| \\
&= \int_{\mathbb{R}^d} |P_X - P_Y|(dx) \\
&= \sup\{ |E f(X) - E f(Y)| ;\ f : \mathbb{R}^d \to \mathbb{R} \text{ continuous},\ |f(x)| \le 1 \text{ for all } x \in \mathbb{R}^d \}.
\end{aligned}
\tag{8.4.8}
\]
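For laws with finite support on the real line, both (8.4.6) and (8.4.8) reduce to finite computations. The following sketch assumes $d = 1$ and represents a law as a dictionary mapping points to masses; the function names and the two example laws are hypothetical.

```python
def kolmogorov(p, q):
    """sup_x |P(X <= x) - P(Y <= x)| for finitely supported laws on the line."""
    cdf_p = cdf_q = 0.0
    best = 0.0
    # For step distribution functions the supremum is attained at a support point.
    for x in sorted(set(p) | set(q)):
        cdf_p += p.get(x, 0.0)
        cdf_q += q.get(x, 0.0)
        best = max(best, abs(cdf_p - cdf_q))
    return best


def total_variation(p, q):
    """Var(X, Y) = 2 sup_A |P(X in A) - P(Y in A)| = sum_x |p(x) - q(x)|."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))


# Hypothetical example laws.
law_x = {0: 0.5, 1: 0.5}
law_y = {0: 0.25, 1: 0.25, 2: 0.5}
rho = kolmogorov(law_x, law_y)
var = total_variation(law_x, law_y)
```

With the normalization of (8.4.8), the Kolmogorov distance never exceeds half of Var; in this example the two sides coincide.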
Remark 8.4.2 It is not difficult to check that $\rho^*$ is topologically "between" $\rho$ and Var; that is,
\[ \rho \overset{top}{\prec} \rho^* \overset{top}{\prec} \mathrm{Var}. \]
Here we use the standard notation $\mu \overset{top}{\prec} \nu$, meaning that $\nu$-convergence implies $\mu$-convergence, but the converse is not generally valid.

Remark 8.4.3 Our aim is to present a general approach to the rate of convergence problems associated with (8.4.2) that is designed to work for different metrics in terms of which we want to obtain estimates of the rate of convergence. We start with uniform-type metrics ($\rho$, $\rho^*$, Var), and then we will proceed with Kantorovich-type minimal distances.

For $r > 0$, define a convolution-type metric associated to Var:
\[ \mu_r(X, Y) := \sup_{t > 0} \|t^B\|^{-r}\, \mathrm{Var}(t^B X + \theta, t^B Y + \theta). \tag{8.4.9} \]
Here and in what follows, the notation $X_1 + X_2$ means the sum of two independent random vectors $X_1$ and $X_2$. We shall first list our results and then prove them, extending the general method we have outlined in Section 8.1.

Theorem 8.4.4 Let $\theta$ be a full strictly operator-stable random vector in $\mathbb{R}^d$, and $B$ an exponent of $\theta$. Let $r > \frac{\Lambda_B}{\lambda_B^2}$ $(\ge \frac{1}{\lambda_B})$ and take $p$ such that $\frac{1}{\lambda_B} < p < \frac{\lambda_B}{\Lambda_B}\, r$. Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of i.i.d. random vectors in $\mathbb{R}^d$, with
\[ \rho^* = \rho^*(X_1, \theta), \qquad \mu_1 = \mu_1(X_1, \theta), \qquad \mu_r = \mu_r(X_1, \theta), \tag{8.4.10} \]
satisfying the moment-type condition
\[ \tau_r = \tau_r(X_1, \theta) := \max\Big\{ \rho^*, \mu_1, \mu_r^{\frac{1}{r - p}} \Big\} < \infty. \tag{8.4.11} \]
Then for some absolute constant $K = K(d, B, r, p) > 0$,
\[ \rho\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le \rho^*\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le K \big( n \|n^{-B}\|^{r} \mu_r + \|n^{-B}\| \tau_r \big) \quad \text{for all } n \ge 1. \]
Remark 8.4.5 In our theorem, we do not explicitly assume that $\{X_i\}$ belongs to the domain of normal attraction of $\theta$. However, since $\lambda_B > \frac{1}{2}$ and $r \lambda_B > 1$,
\[ n \|n^{-B}\|^{r} \mu_r + \|n^{-B}\| \tau_r \to 0 \quad \text{as } n \to \infty, \]
because of (8.4.5). Consequently, conditions (8.4.10) and (8.4.11) are sufficient for $\{X_i\}$ to be in the domain of normal attraction of $\theta$. As to the decreasing rate of $\|n^{-B}\|$, by (8.4.5), for every $\eta > 0$, there exists $n_0$ such that $\|n^{-B}\| \le n^{-\lambda_B + \eta}$ for every $n \ge n_0$. However, we also see that for any $\eta > 0$, $\|n^{-B}\| \le M n^{-\lambda_B + \eta}$ for all $n \ge 1$, where $M = \sup_{t \ge 1} \|t^{-B + (\lambda_B - \eta) I}\|$ $(< \infty)$. Note, however, that rate of convergence theorems typically describe only a relatively small subset of that domain of attraction.

Letting $B = \frac{1}{\alpha} I$, $0 < \alpha \le 2$, we have the following:
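Corollary 8.4.6 Let $\theta$ be a strictly $\alpha$-stable random vector with index $0 < \alpha \le 2$. Let $\alpha < p < r$ and $\{X_i\}$ be a sequence of i.i.d. random vectors in $\mathbb{R}^d$ satisfying $\tau_r < \infty$. Then for some absolute constant $K = K(d, \alpha, p) > 0$,
\[ \rho^*\Big( n^{-1/\alpha} \sum_{i=1}^{n} X_i, \theta \Big) \le K \big( n^{1 - \frac{r}{\alpha}} \mu_r + n^{-\frac{1}{\alpha}} \tau_r \big) \quad \text{for all } n \ge 1. \tag{8.4.12} \]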
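To see how the two terms in (8.4.12) trade off, one can evaluate the right-hand side for sample inputs. This is a minimal sketch: the function name `rate_bound`, the default `K = 1.0`, and the sample values of $\mu_r$ and $\tau_r$ are placeholders, since the text does not give the constant explicitly.

```python
def rate_bound(n, alpha, r, mu_r, tau_r, K=1.0):
    """Right-hand side of (8.4.12): K (n^(1 - r/alpha) mu_r + n^(-1/alpha) tau_r).

    K, mu_r and tau_r are inputs supplied by the caller (placeholders here).
    """
    if not (0.0 < alpha <= 2.0 and r > alpha):
        raise ValueError("need 0 < alpha <= 2 and r > alpha")
    return K * (n ** (1.0 - r / alpha) * mu_r + n ** (-1.0 / alpha) * tau_r)


# With alpha = 1.5 and r = 3 the two terms decay like n^(-1) and n^(-2/3).
values = [rate_bound(n, 1.5, 3.0, 1.0, 1.0) for n in (1, 8, 64, 512)]
```

For $r \ge \alpha + 1$ the second term dominates for large $n$, so the bound is of order $n^{-1/\alpha}$.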
Resnick and Greenwood (1979) studied the limit theorem for $(\alpha_1, \alpha_2)$-stable laws, which corresponds to the operator-stable limit theorem with exponent
\[ B = \begin{pmatrix} 1/\alpha_1 & 0 \\ 0 & 1/\alpha_2 \end{pmatrix}. \]
Theorem 8.4.4 provides a bound for the rate of convergence in this particular case.

Corollary 8.4.7 Let $\theta = (\theta^{(1)}, \theta^{(2)})$ be a strictly $(\alpha_1, \alpha_2)$-stable bivariate vector, $0 < \alpha_1 \le \alpha_2 \le 2$. Let $r > \frac{\alpha_2^2}{\alpha_1}$ $(\ge \alpha_2)$ and take $p$ such that $\alpha_2 < p < \frac{\alpha_1}{\alpha_2}\, r$. Let $\{X_i = (X_i^{(1)}, X_i^{(2)})\}_{i \ge 1}$ be a sequence of i.i.d. random vectors satisfying $\tau_r < \infty$. Then for all $n \ge 1$,
\[ \rho^*\Big( \Big( n^{-1/\alpha_1} \sum_{i=1}^{n} X_i^{(1)},\ n^{-1/\alpha_2} \sum_{i=1}^{n} X_i^{(2)} \Big), \theta \Big) \le K \big( n^{1 - r/\alpha_1} \mu_r + n^{-1/\alpha_1} \tau_r \big). \]
We next state our results on the rates of convergence in another type of uniform metric: the total variation distance Var and the uniform distance between characteristic functions. Let ( ( ( (r 1 5 ( −B (r ( 2 −B ( b = 4 2 r > λB , ( 5 ( , (8.4.13) ( (r ( (r ( ( 1 ( −B (r 2 a = bc , c = (2B ( + (3B ( , and
( 1 ( ( r I−B (r M = sup (x ( (< ∞).
(8.4.14)
x≥1
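Theorem 8.4.8 Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of i.i.d. random vectors in $\mathbb{R}^d$ satisfying
\[ \nu_r = \nu_r(X_1, \theta) := \max\{ \mathrm{Var}(X_1, \theta), \mu_r \} \le \frac{a}{M}. \tag{8.4.15} \]
Then
\[ \mathrm{Var}\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le c\, n \|n^{-B}\|^{r} \nu_r \le \frac{1}{bM} \|2^{-B}\|^{r}\, n \|n^{-B}\|^{r} \quad \text{for all } n \ge 1. \tag{8.4.16} \]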
It would be interesting to have a version of this theorem without condition (8.4.15).

Our next theorem concerns the rate of convergence of a third type of uniform metric $\chi$ that lies "between" $\rho$ and Var,
\[ \rho \overset{top}{\prec} \chi \overset{top}{\prec} \mathrm{Var}, \]
namely, the uniform distance between characteristic functions:
\[ \chi(X, Y) := \sup_{s \in \mathbb{R}^d} |\phi_X(s) - \phi_Y(s)|, \qquad \phi_X(s) := E\, e^{i \langle s, X \rangle}, \tag{8.4.17} \]
where $\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^d$. The corresponding "$t^B$-uniform" (recall the definition of $\rho^*$ (8.4.7)) and "smoothed" versions of $\chi$ are defined by
\[ \chi^*(X, Y) := \sup_{t > 0} \chi(t^B X, t^B Y) \tag{8.4.18} \]
and
\[ \chi_r(X, Y) := \sup_{t > 0} \|t^B\|^{-r}\, \chi^*\big( t^B X + \theta, t^B Y + \theta \big). \tag{8.4.19} \]
Theorem 8.4.9 Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of i.i.d. random vectors in $\mathbb{R}^d$ satisfying
\[ \nu_r^* = \nu_r^*(X_1, \theta) := \max\{ \chi^*(X_1, \theta), \chi_r(X_1, \theta) \} \le \frac{a}{M}. \]
Then for all $n \ge 1$,
\[ \chi^*\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le c\, n \|n^{-B}\|^{r} \nu_r^* \le \frac{1}{bM} \|2^{-B}\|^{r}\, n \|n^{-B}\|^{r}. \]
Let us now denote the density of the random vector $X$ by $p_X(x)$ (when it exists) and define the fourth type of uniform metric
\[ d(X, Y) := \operatorname*{ess\,sup}_{x \in \mathbb{R}^d} |p_X(x) - p_Y(x)|. \tag{8.4.20} \]
This is "topologically" the strongest:
\[ \rho \overset{top}{\prec} \chi \overset{top}{\prec} \mathrm{Var} \overset{top}{\prec} d. \]
Let
\[ K(d, B) := \max_{2 \le x \le 3}\ \max_{1 \le i, j \le d} \big| (x^B)_{ij} \big| \quad (< \infty), \tag{8.4.21} \]
where $A_{ij}$ is the $(i, j)$ component of the matrix $A$, and put
\[ C(d, B) = d!\, K(d, B)^d. \tag{8.4.22} \]
Let $d_r$ be the smoothed version of $d$,
\[ d_r(X, Y) := \sup_{t > 0} \|t^B\|^{-r}\, d\big( t^B X + \theta, t^B Y + \theta \big). \tag{8.4.23} \]
Applying Theorem 8.4.8, we obtain the following rate of convergence bound in the local central limit theorem for operator-stable random vectors.

Theorem 8.4.10 Suppose $X_1$ has a density. Let
\[ A = \max\big\{ C(d, B) \big( \|2^B\|^{r} + \|3^B\|^{r} \big),\ 1 \big\} \]
and
\[ D \ge C(d, B)\, \|3^B\|^{r}. \]
If $T_r = T_r(X_1, \theta) := \max\{ d(X_1, \theta), d_r(X_1, \theta) \} < \infty$ and
\[ \nu_r \le \min\Big( \frac{a}{M}, \frac{1}{M c D} \Big), \]
then
\[ d\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le A\, n \|n^{-B}\|^{r}\, T_r \quad \text{for all } n \ge 1. \]
Remark 8.4.11 Operator-stable random vectors have bounded densities (Hudson (1980)).

The rest of our rate of convergence results are concerned with the minimal $L_p$-metrics, $0 \le p \le \infty$, and in particular with $\ell_1$, the Kantorovich metric, $\ell_1 = \hat L_1$; see Section 8.1. Recall that the total variation distance Var is in fact the minimal $L_0$-metric. Recall the definition of the $L_p$-compound metric: For any $X, Y \in \mathcal{X}(\mathbb{R}^d)$,
\[ L_p(X, Y) := \{ E[\|X - Y\|^p] \}^{\min(1, 1/p)}, \quad 0 < p < \infty, \tag{8.4.24} \]
\[ L_0(X, Y) := E[I[X \ne Y]] = P(X \ne Y), \tag{8.4.25} \]
\[ L_\infty(X, Y) := \operatorname{ess\,sup} \|X - Y\| = \inf\{ \varepsilon > 0;\ P(\|X - Y\| > \varepsilon) = 0 \}, \tag{8.4.26} \]
where $I[A]$ is the indicator function of a set $A$. As always in this book, we assume that all random vectors $X \in \mathcal{X}(\mathbb{R}^d)$ are defined on a nonatomic probability space $(\Omega, \mathcal{A}, P)$; in this way the space of all joint laws $P_{X,Y}$ coincides with the space of all probability measures on $\mathbb{R}^{2d}$. The $L_p$-minimal metric for $0 \le p \le \infty$ was defined in (8.2.23):
\[ \ell_p(X, Y) = \hat L_p(X, Y) = \hat L_p(P_X, P_Y) := \inf L_p(\tilde X, \tilde Y), \tag{8.4.27} \]
where the infimum is taken over all $P_{\tilde X, \tilde Y}$ with fixed marginals $\tilde X \overset{d}{\sim} X$, $\tilde Y \overset{d}{\sim} Y$.

Remark 8.4.12 For every $p \in [0, \infty]$ fixed, we shall be interested in the rate of convergence of $\hat L_p\big( n^{-B} \sum_{i=1}^{n} X_i, \theta \big) \to 0$. As a consequence, we shall derive the rate of convergence results in terms of the Prohorov metric
\[ \pi(X, Y) = \inf\{ \varepsilon;\ P(X \in A) \le P(Y \in A^{\varepsilon}) + \varepsilon \text{ for all } A \in \mathcal{B}(\mathbb{R}^d) \}, \tag{8.4.28} \]
where $A^{\varepsilon} := \{ x;\ \|x - A\| \le \varepsilon \}$.
Case $p = 0$. For $p = 0$, $\hat L_0 = \frac{1}{2} \mathrm{Var}$, and so Theorem 8.4.8 gives the desired bound for the rate of convergence.

Case $0 < p \le 1$. Suppose first that $B$, the exponent of $\theta$, satisfies
\[ \frac{1}{\lambda_B} \le \frac{\Lambda_B}{\lambda_B^2} < p \le 1. \tag{8.4.29} \]
Then by (8.4.5), $n \|n^{-B}\|^{p} \to 0$ as $n \to \infty$.

Theorem 8.4.13 Suppose $0 < p \le 1$ and (8.4.29) holds. Let $X, X_1, X_2, \dots$ be a sequence of i.i.d. random vectors satisfying
\[ \hat L_p := \hat L_p(X, \theta) < \infty. \tag{8.4.30} \]
Then
\[ \hat L_p\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le n \|n^{-B}\|^{p}\, \hat L_p, \]
and furthermore,
\[ \pi\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le n^{\frac{1}{p+1}} \|n^{-B}\|^{\frac{p}{p+1}}\, \hat L_p^{\frac{1}{p+1}}. \]
In the case where (8.4.29) is not satisfied, we shall prove a result similar to that in Theorem 8.4.10. Define the convolution-type metric associated with $\hat L_p$: For $r > 0$,
\[ \hat L_{p,r}(X, Y) := \sup_{t > 0} \|t^B\|^{-r}\, \hat L_p(t^B X + \theta, t^B Y + \theta). \tag{8.4.31} \]
Theorem 8.4.14 Let $0 < p \le 1$ and $X, X_1, X_2, \dots$ be a sequence of i.i.d. random vectors in $\mathbb{R}^d$. Let $r > \frac{1}{\lambda_B}$,
\[ A = \max\big\{ \|2^{-B}\|^{p} \big( \|2^B\|^{r} + \|3^B\|^{r} \big),\ 1 \big\} \]
and
\[ D \ge \|2^{-B}\|^{p}\, \|3^B\|^{r}. \]
Suppose
\[ R_{p,r} = R_{p,r}(X, \theta) := \max\{ \hat L_p(X, \theta), \hat L_{p,r}(X, \theta) \} < \infty \]
and
\[ \nu_r \le \min\Big( \frac{a}{M}, \frac{1}{M c D} \Big), \]
where $a$, $c$, and $M$ are defined in (8.4.13) and (8.4.14). Then, for all $n \ge 1$,
\[ \pi\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big)^{p+1} \le \hat L_p\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le A\, n \|n^{-B}\|^{r}\, R_{p,r}. \]
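In the next theorem we shall relax the assumption $\nu_r \le \min\big( \frac{a}{M}, \frac{1}{McD} \big)$ at the cost of losing a little of the order of convergence $n \|n^{-B}\|^{r}$. The next result has a form resembling Theorem 8.4.4.

Theorem 8.4.15 Let $0 < p \le 1$. Let $r > \frac{\Lambda_B}{\lambda_B^2}$ $(\ge \frac{1}{\lambda_B})$, and take $q$ such that $\frac{1}{\lambda_B} < q < \frac{\lambda_B}{\Lambda_B}\, r$. Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of i.i.d. random vectors in $\mathbb{R}^d$ satisfying
\[ \hat L_{p,r} = \hat L_{p,r}(X_1, \theta) < \infty \]
and
\[ Q_{p,r} = Q_{p,r}(X_1, \theta) := \max\Big\{ \hat L_p, \mu_1, \mu_r^{\frac{1}{r - q}} \Big\} < \infty, \]
where $\hat L_p = \hat L_p(X_1, \theta)$, $\mu_1 = \mu_1(X_1, \theta)$, and $\mu_r = \mu_r(X_1, \theta)$. Then for some absolute constant $K = K(d, B, r, p, q) > 0$,
\[ \hat L_p\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le K \big( n \|n^{-B}\|^{r} \hat L_{p,r} + \|n^{-B}\|^{p} Q_{p,r} \big), \]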
for all $n \ge 1$.

Case $1 < p \le 2$. For $1 < p \le 2$, we use a completely different method in the rate of convergence problem, which relies on the minimality property of $\hat L_p$ and was introduced in Rachev and Rüschendorf (1992).

Theorem 8.4.16 Suppose $1 < p \le 2$ and $p > \frac{\Lambda_B}{\lambda_B^2}$ $(\ge \frac{1}{\lambda_B})$. Let $X, X_1, X_2, \dots$ be a sequence of i.i.d. random vectors satisfying
\[ \hat L_p = \hat L_p(X, \theta) < \infty \quad \text{and} \quad E[X - \theta] = 0. \]
Then there exists $C_p > 0$ such that for all $n \ge 1$,
\[ \hat L_p\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le C_p\, n^{\frac{1}{p}} \|n^{-B}\|\, \hat L_p, \]
and moreover, the right-hand side vanishes as $n \to \infty$. Furthermore,
\[ \pi\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le C_p^{\frac{p}{p+1}}\, n^{\frac{1}{p+1}} \|n^{-B}\|^{\frac{p}{p+1}}\, \hat L_p^{\frac{p}{p+1}}. \]
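Corollary 8.4.17 Let $\theta = (\theta^{(1)}, \theta^{(2)})$ be a strictly $(\alpha_1, \alpha_2)$-stable bivariate vector, $0 < \alpha_1 \le \alpha_2 \le 2$. Let $2 \ge p > \frac{\alpha_2^2}{\alpha_1}$ $(\ge \alpha_2)$. Let $\{X_i = (X_i^{(1)}, X_i^{(2)})\}_{i \ge 1}$ be a sequence of i.i.d. random vectors satisfying $\hat L_p = \hat L_p(X_1, \theta) < \infty$, and if $p > 1$, we additionally assume that $E[X_1 - \theta] = 0$. Then for all $n \ge 1$,
\[
\begin{aligned}
\pi\Big( \Big( n^{-1/\alpha_1} \sum_{i=1}^{n} X_i^{(1)},\ n^{-1/\alpha_2} \sum_{i=1}^{n} X_i^{(2)} \Big), \theta \Big)
&\le \hat L_p\Big( \Big( n^{-1/\alpha_1} \sum_{i=1}^{n} X_i^{(1)},\ n^{-1/\alpha_2} \sum_{i=1}^{n} X_i^{(2)} \Big), \theta \Big)^{\frac{\max(1,p)}{p+1}} \\
&\le
\begin{cases}
\big( n^{1 - \frac{p}{\alpha_1}}\, \hat L_p \big)^{\frac{1}{p+1}} & \text{for } 0 < p \le 1, \\[4pt]
\big( C_p\, n^{\frac{1}{p} - \frac{1}{\alpha_1}}\, \hat L_p \big)^{\frac{p}{p+1}} & \text{for } 1 < p \le 2.
\end{cases}
\end{aligned}
\]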
Remark 8.4.18 Our approach based on the use of "ideality" of $\hat L_p$ can be extended to bound the distance between the maxima $M_X(n) := n^{-B} \bigvee_{k=1}^{n} \sum_{i=1}^{k} X_i$ and $M_\theta(n) := n^{-B} \bigvee_{k=1}^{n} \sum_{i=1}^{k} \theta_i$. (Here $\bigvee_{k=1}^{n}$ stands for the componentwise maximum, and $\{\theta_i\}$ are i.i.d. copies of $\theta$.) Also, we can compare $m_X(n) := n^{-B} \bigwedge_{k=1}^{n} \sum_{i=1}^{k} X_i$ with $m_\theta(n) := n^{-B} \bigwedge_{k=1}^{n} \sum_{i=1}^{k} \theta_i$ (where $\bigwedge_{k=1}^{n}$ stands for the componentwise minimum) and $a_X(n) := \bigvee_{k=1}^{n} \big\| n^{-B} \sum_{i=1}^{k} X_i \big\|$ with $a_\theta(n) := \bigvee_{k=1}^{n} \big\| n^{-B} \sum_{i=1}^{k} \theta_i \big\|$.

Theorem 8.4.19 Let $1 < p \le 2$ and $\theta$ be a full strictly operator-stable random vector with exponent $B$ such that
\[ n^{2} \|n^{-B}\|^{p} \to 0 \quad \text{as } n \to \infty. \]
Let $X, X_1, X_2, \dots$ be i.i.d. random vectors in $\mathbb{R}^d$ with $E[X - \theta] = 0$ and such that $\hat L_p(X, \theta) < \infty$. Then there exists $C_{p,d} > 0$ such that for every $n \ge 1$,
\[ \max\big\{ \hat L_p(M_X(n), M_\theta(n)),\ \hat L_p(m_X(n), m_\theta(n)),\ \hat L_p(a_X(n), a_\theta(n)) \big\} \le C_{p,d}\, n^{2/p} \|n^{-B}\|\, \hat L_p(X, \theta). \]
Remark 8.4.20 Note that $a_X(n)$ and $a_\theta(n)$ are positive random variables, and therefore
\[ \hat L_p(a_X(n), a_\theta(n)) = \Big( \int_0^1 \big| F^{-1}_{a_X(n)}(t) - F^{-1}_{a_\theta(n)}(t) \big|^{p}\, dt \Big)^{1/p}, \]
where $F_X^{-1}$ is the generalized inverse of $F_X$; cf. Theorem 3.1.2. Also, from the above bound we can get the rate of convergence for $\pi$ by making use of the bound $\pi \le \hat L_p^{\frac{p}{p+1}}$.

Let us compare Theorem 8.4.16 with a similar result on the rate of convergence in terms of the Zolotarev metric $\zeta_r$, $r > 0$; see (8.2.1).

Theorem 8.4.21 Let $X, X_1, X_2, \dots$ be i.i.d. random vectors in $\mathbb{R}^d$, and $r$ a positive constant satisfying the conditions
\[ \zeta_r := \zeta_r(X, \theta) < \infty \quad \text{and} \quad n \|n^{-B}\|^{r} \to 0 \text{ as } n \to \infty. \]
Then for every $n \ge 1$,
\[ \zeta_r\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le n \|n^{-B}\|^{r}\, \zeta_r, \]
and for some $C_r > 0$,
\[ \pi\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le C_r^{\frac{1}{r+1}}\, n^{\frac{1}{r+1}} \|n^{-B}\|^{\frac{r}{r+1}}\, \zeta_r^{\frac{1}{r+1}}. \]
It is known that if $r$ is an integer, then $\zeta_r$ on the right-hand sides of the above bounds can be estimated from above by the $r$th difference pseudomoment $\kappa_r$ (see Zolotarev (1993)). Namely, if all mixed moments of order less than or equal to $r - 1$ for $X$ and $Y$ agree, then
\[ \zeta_r(X, Y) \le \frac{1}{r!}\, \kappa_r(X, Y), \quad r \in \mathbb{N}, \tag{8.4.32} \]
where $\kappa_r$ is the $r$th difference pseudomoment
\[ \kappa_r(X, Y) = \sup\Big\{ |E[f(X) - f(Y)]| ;\ f : \mathbb{R}^d \to \mathbb{R},\ |f(x) - f(y)| \le \big\| x \|x\|^{r-1} - y \|y\|^{r-1} \big\| \text{ for all } x, y \in \mathbb{R}^d \Big\}. \tag{8.4.33} \]
For arbitrary $r > 0$, $\zeta_r$ is bounded from above by the absolute pseudomoment, namely, if all mixed moments of $X$ and $Y$ of order less than or equal to $m$ ($r = m + \alpha$, $m \in \mathbb{N}$, $\alpha \in (1, 2]$) agree, then
\[ \zeta_r \le \frac{\Gamma(1 + \alpha)}{\Gamma(1 + r)}\, \xi_r, \tag{8.4.34} \]
where $\xi_r$ is the $r$th absolute pseudomoment
\[ \xi_r(X, Y) := \int_{\mathbb{R}^d} \|x\|^{r}\, |P_X - P_Y|(dx). \tag{8.4.35} \]
Let us now compare the rate of convergence in Theorem 8.4.21 with that in Theorem 8.4.16 for $r = p \in (1, 2]$. Recall that (8.4.32) is true only for $r \in \mathbb{N}$, and the known estimates for $\zeta_r$ from above by $\kappa_r$ ($r$ being noninteger) involve $E[\|X\|^{r}]$ and $E[\|Y\|^{r}]$. However, for any $p \ge 1$,
\[ \hat L_p^{\,p}(X, Y) \le 2^{p} \kappa_p(X, Y) \le 2^{p} \xi_p(X, Y). \tag{8.4.36} \]
Therefore, the restriction $\hat L_p(X, \theta) < \infty$ in Theorem 8.4.16 is preferable to $\zeta_r(X, \theta) < \infty$ in Theorem 8.4.21. On the other hand, the estimate for $\zeta_r\big( n^{-B} \sum_{i=1}^{n} X_i, \theta \big)$ holds for any $r > 0$ and provides us with the exact order of convergence (as $n \to \infty$) under the assumption $\zeta_r(X_1, \theta) < \infty$.

Case $2 < p \le \infty$.

Theorem 8.4.22 Let $\theta$ be a full strictly operator-stable random vector that does not have a Gaussian component, or equivalently, whose exponent $B$ satisfies
\[ n^{1/2} \|n^{-B}\| \to 0 \quad \text{as } n \to \infty. \]
Let $X, X_1, X_2, \dots$ be i.i.d. random vectors such that $\hat L_p(X, \theta) < \infty$. Then for some $C(d, p) > 0$,
\[ \hat L_p\Big( n^{-B} \sum_{i=1}^{n} X_i, \theta \Big) \le C(d, p)\, n^{1/2} \|n^{-B}\|\, \hat L_p(X, \theta). \]
Before starting with the proof of our theorems, we introduce a notion of ideality for a probability metric, designed for problem (8.4.2).

Definition 8.4.23 A metric $\zeta : \mathcal{X}(\mathbb{R}^d) \times \mathcal{X}(\mathbb{R}^d) \to [0, \infty)$ is called operator-ideal of order $r \ge 0$ if

(i) (homogeneity) $\zeta(a^B X, a^B Y) \le \|a^B\|^{r} \zeta(X, Y)$ for any $a > 0$, and

(ii) (regularity) $\zeta(X + Z, Y + Z) \le \zeta(X, Y)$ for any $Z$ independent of $X$ and $Y$.

We next show a few lemmas needed for the proof of our main results.
Lemma 8.4.24 , κr , and are regular (that is, (ii) holds); ∗ , Var, and χ∗ are operator-ideal of order r = 0; and ≤ ∗ ≤ 12 Var, χ ≤ χ∗ ≤ Var. This follows directly from the definitions of the metrics. 5 p,r , and ζr are operator-ideal of order r > 0. Lemma 8.4.25 μr , χr , dr , L 5 p is operator-ideal of order p ∧ 1. L Proof: We first show the operator-ideality of μr . For any a > 0, ( (−r μr (aB X, aB Y ) = sup (tB ( Var((ta)B X + θ, (ta)B Y + θ) t>0
( (−r 2 3 B B ( t B( t t ( ( = sup ( a X + θ, a Y +θ ( Var ( a a t>0 ( a ( (−r ≤ aB r sup (tB ( Var(tB X + θ, tB Y + θ), t>0
since ( B ( ( B (−1 (t ( (a (
( ( ( ( (−1 ( = (tB a−B aB ( (aB ( ≤ (tB a−B ( = aB r μr (X, Y ),
which shows the homogeneity of μr of order r > 0. We also have μr (X + Z, Y + Z) = sup tB −r Var(tB (X + Z) + θ, tB (Y + Z) + θ) t>0
≤ sup tB −r Var(tB X + θ, tB Y + θ) t>0
= μr (X, Y ), since tB Z is independent of tB X and θ, and Var is regular. This demon5 p,r strates the regularity of μr . One can check the ideality of χr , dr , and L in a similar fashion. We next show the operator-ideality of ζr . The regularity of ζr is known. As for the homogeneity, we have & ζr (aB X, aB Y ) = sup |E[f (aB X) − f (aB Y )]|; ' (r−1) (r−1) f (x) − f (y) ≤ x − y . (8.4.37) Let fa (x) := f (aB x). Then (r−1) fa(r−1) (x)(h)(r−1) = f (r−1) aB x aB h ,
implying that
\[ \big\| f_a^{(r-1)}(x) - f_a^{(r-1)}(y) \big\| \le \|a^B\|^{r-1} \big\| f^{(r-1)}(a^B x) - f^{(r-1)}(a^B y) \big\|. \]
Then the side condition in (8.4.37),
\[ \big\| f^{(r-1)}(x) - f^{(r-1)}(y) \big\| \le \|x - y\|, \]
results in
\[ \big\| f_a^{(r-1)}(x) - f_a^{(r-1)}(y) \big\| \le \|a^B\|^{r-1} \|a^B x - a^B y\| \le \|a^B\|^{r} \|x - y\|. \]
Consequently, by (8.4.37),
\[
\begin{aligned}
\zeta_r(a^B X, a^B Y)
&\le \sup\big\{ |E[f_a(X) - f_a(Y)]| ;\ \big\| f_a^{(r-1)}(x) - f_a^{(r-1)}(y) \big\| \le \|a^B\|^{r} \|x - y\| \big\} \\
&= \|a^B\|^{r} \zeta_r(X, Y),
\end{aligned}
\]
which shows the homogeneity of $\zeta_r$. $\Box$
Lemma 8.4.26 If $Z, W \in \mathcal{X}(\mathbb{R}^d)$ are independent of $X, Y \in \mathcal{X}(\mathbb{R}^d)$, then
\[ \rho^*(X + Z, Y + Z) \le \rho^*(Z, W)\, \mathrm{Var}(X, Y) + \rho^*(X + W, Y + W) \]
and
\[ \rho^*(X + Z, Y + Z) \le \rho^*(X, Y)\, \mathrm{Var}(Z, W) + \rho^*(X + W, Y + W). \]
Lemma 8.4.27 If $Z, W \in \mathcal{X}(\mathbb{R}^d)$ are independent of $X, Y \in \mathcal{X}(\mathbb{R}^d)$, then
\[ \mathrm{Var}(X + Z, Y + Z) \le \mathrm{Var}(Z, W)\, \mathrm{Var}(X, Y) + \mathrm{Var}(X + W, Y + W) \]
and
\[ \chi^*(X + Z, Y + Z) \le \chi^*(Z, W)\, \chi^*(X, Y) + \chi^*(X + W, Y + W). \]
Lemma 8.4.28 If $Z, W \in \mathcal{X}(\mathbb{R}^d)$ are independent of $X, Y \in \mathcal{X}(\mathbb{R}^d)$, then
\[ d(X + Z, Y + Z) \le d(Z, W)\, \mathrm{Var}(X, Y) + d(X + W, Y + W), \]
\[ d(X + Z, Y + Z) \le d(X, Y)\, \mathrm{Var}(Z, W) + d(X + W, Y + W), \]
and for $0 < p \le 1$ both inequalities hold with $d$ replaced by $\hat L_p$.
Proof: The proofs are very similar to those in Lemma 8.1.15; cf. also Lemma 2 in Senatov (1980) or Lemmas 14.3.3 and 14.3.6 in Rachev (1991). We shall demonstrate only the proof of the smoothing inequality for $\hat L_p$, $0 < p \le 1$. We use the dual representation for $\hat L_p$ (recall that $\mathrm{Lip}_b(p)$ consists of all bounded continuous functions on $\mathbb{R}^d$ satisfying $|f(x) - f(y)| \le \|x - y\|^{p}$ for all $x, y \in \mathbb{R}^d$):
\[
\begin{aligned}
\hat L_p(X + Z, Y + Z)
&= \sup_{f \in \mathrm{Lip}_b(p)} |E[f(X + Z) - f(Y + Z)]| \\
&= \sup_{f \in \mathrm{Lip}_b(p)} \Big| \int P_Z(dz)\, \big( E[f(X + z)] - E[f(Y + z)] \big) \Big| \\
&\le \sup_{f \in \mathrm{Lip}_b(p)} \Big| \int (P_Z - P_W)(dz)\, \big( E[f(X + z)] - E[f(Y + z)] \big) \Big| \\
&\quad + \sup_{f \in \mathrm{Lip}_b(p)} \Big| \int P_W(dz)\, \big( E[f(X + z)] - E[f(Y + z)] \big) \Big| \\
&\le \int |P_Z - P_W|(dz) \sup_{f \in \mathrm{Lip}_b(p)} \big| E[f(X + z)] - E[f(Y + z)] \big| + \hat L_p(X + W, Y + W) \\
&= \mathrm{Var}(Z, W)\, \hat L_p(X, Y) + \hat L_p(X + W, Y + W),
\end{aligned}
\]
as desired. $\Box$
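Lemma 8.4.29 (Smoothing inequalities for $\rho^*$ and $\hat L_p$, $0 < p \le 1$) Let $\theta$ and $\theta_1$ be independent random vectors in $\mathbb{R}^d$ having the same full strictly operator-stable distribution with exponent $B$. Then for any $X \in \mathcal{X}(\mathbb{R}^d)$ independent of $\theta_1$ and $\delta > 0$,
\[ \rho^*(X, \theta) \le C_1\, \rho^*(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + C_2 \|\delta^B\|, \]
and for $0 < p \le 1$, if $E[\|\theta\|^{p}] < \infty$,
\[ \hat L_p(X, \theta) \le \hat L_p(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + 2 \|\delta^B\|^{p} E[\|\theta\|^{p}]. \]
Here and in what follows, the $C_i$'s are absolute constants depending only on $d$ and $B$, unless stated otherwise explicitly.

Proof: Fix $\varepsilon \in (0, 1)$ and choose $\tilde X = t_\varepsilon^B X$, $\tilde\theta = t_\varepsilon^B \theta$ for some $t_\varepsilon > 0$ such that $\rho^*(X, \theta) \le \rho(\tilde X, \tilde\theta) + \varepsilon$. We first show the inequality
\[ \rho(\tilde X, \tilde\theta) \le C_1\, \rho(\tilde X + \delta^B \theta_1, \tilde\theta + \delta^B \theta_1) + C_2 \|\delta^B\| \tag{8.4.38} \]
for $\theta_1 \overset{d}{\sim} \theta$. Since $\rho(\tilde X, \tilde\theta) = \rho(c \tilde X, c \tilde\theta)$ for any $c > 0$, we can shrink $\tilde X$, $\tilde\theta$, and $\theta_1$ without any loss of generality. So we assume
\[ P(\|\tilde\theta\| < 1) > \frac{2}{3}. \tag{8.4.39} \]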
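For brevity we shall delete the "$\sim$" from now on. Let $\theta^{(i)} \in \mathbb{R}$, $i = 1, \dots, d$, be the $i$th component of the operator-stable random vector $\theta \in \mathbb{R}^d$. Then for each $i = 1, \dots, d$, $\theta^{(i)}$ has a bounded density; that is, for some $M < \infty$,
\[ \sup_{x \in \mathbb{R}} \frac{d}{dx} P(\theta^{(i)} \le x) =: \sup_{x \in \mathbb{R}} |p_{\theta^{(i)}}(x)| \le M \quad \text{for all } i. \]
(See Hudson (1976).) The idea of the following proof is taken from Lemma 12.1 in Bhattacharya and Rao (1976). First consider the case
\[
\begin{aligned}
\rho := \rho(X, \theta) &= \sup_{x \in \mathbb{R}^d} |P(X \le x) - P(\theta \le x)| \\
&= - \inf_{x \in \mathbb{R}^d} \big( P(X \le x) - P(\theta \le x) \big).
\end{aligned}
\tag{8.4.40}
\]
Given $\eta \in (0, \rho)$, there exists $x_0 \in \mathbb{R}^d$ such that
\[ P(X \le x_0) - P(\theta \le x_0) < -\rho + \eta. \tag{8.4.41} \]
We then have
\[
\begin{aligned}
I &:= P(X + \delta^B \theta_1 \le x_0 - \|\delta^B\| e) - P(\theta + \delta^B \theta_1 \le x_0 - \|\delta^B\| e) \\
&= \int_{\mathbb{R}^d} \big[ P(X + z \le x_0 - \|\delta^B\| e) - P(\theta + z \le x_0 - \|\delta^B\| e) \big]\, P(\delta^B \theta_1 \in dz) \\
&= \int_E + \int_{E^c},
\end{aligned}
\]
where
\[ E := \{ z \in \mathbb{R}^d ;\ -\|\delta^B\| e < z < \|\delta^B\| e \} \quad \text{and} \quad e = (1, 1, \dots, 1)^t \in \mathbb{R}^d. \]
Then estimating both integrals in the representation for $I$, we get
\[ I \le \int_E \big[ P(X \le x_0) - P(\theta \le x_0 - z - \|\delta^B\| e) \big]\, P(\delta^B \theta_1 \in dz) + \rho\, P(\delta^B \theta_1 \in E^c). \tag{8.4.42} \]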
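To estimate the last term observe that
\[ \beta := P(\delta^B \theta_1 \in E) \ge P(\|\delta^B \theta_1\| < \|\delta^B\|) > \frac{2}{3}, \tag{8.4.43} \]
by (8.4.39). On the other hand, denoting the distribution function of $\theta$ by $F(x)$, $x = (x^{(1)}, x^{(2)}, \dots, x^{(d)})^t \in \mathbb{R}^d$, and $\varepsilon = (\varepsilon^{(1)}, \varepsilon^{(2)}, \dots, \varepsilon^{(d)})^t \in \mathbb{R}^d$, we have
\[
\begin{aligned}
|P(\theta \le x + \varepsilon) - P(\theta \le x)|
&\le \sum_{i=1}^{d} \big| F(x^{(1)}, \dots, x^{(i-1)}, x^{(i)} + \varepsilon^{(i)}, \dots, x^{(d)} + \varepsilon^{(d)}) \\
&\qquad\quad - F(x^{(1)}, \dots, x^{(i)}, x^{(i+1)} + \varepsilon^{(i+1)}, \dots, x^{(d)} + \varepsilon^{(d)}) \big| \\
&\le \sum_{i=1}^{d} P(\theta^{(i)} \in I_i).
\end{aligned}
\]
Here $I_i := (x^{(i)}, x^{(i)} + \varepsilon^{(i)}]$ or $I_i := (x^{(i)} + \varepsilon^{(i)}, x^{(i)}]$, depending on the sign of $\varepsilon^{(i)}$. Therefore,
\[ \sum_{i=1}^{d} P(\theta^{(i)} \in I_i) \le \sum_{i=1}^{d} |\varepsilon^{(i)}| \sup_{x \in \mathbb{R}} |p_{\theta^{(i)}}(x)| \le M \|\varepsilon\|_1, \]
where $\|\cdot\|_1$ is the $L_1$-norm. Hence,
\[ -P(\theta \le x + \varepsilon) \le -P(\theta \le x) + M \|\varepsilon\|_1. \tag{8.4.44} \]
Thus we have, by (8.4.41), (8.4.42), and (8.4.44) with $\varepsilon = -z - \|\delta^B\| e$, that
\[
\begin{aligned}
I &\le \int_E \big[ P(X \le x_0) - P(\theta \le x_0) + M(\|z\|_1 + d \|\delta^B\|) \big]\, P(\delta^B \theta_1 \in dz) + \rho\, P(\delta^B \theta_1 \in E^c) \\
&\le \int_E \big[ -\rho + \eta + M(\|z\|_1 + d \|\delta^B\|) \big]\, P(\delta^B \theta_1 \in dz) + \rho\, P(\delta^B \theta_1 \in E^c).
\end{aligned}
\]
Since $\|z\|_1 \le d \|\delta^B\|$ on $E$, it follows that
\[
\begin{aligned}
I &\le (-\rho + \eta + 2Md \|\delta^B\|)\, P(\delta^B \theta_1 \in E) + \rho\, P(\delta^B \theta_1 \in E^c) \\
&\le \rho (1 - 2\beta) + \eta + 2Md \|\delta^B\|.
\end{aligned}
\]
Consequently,
\[ (2\beta - 1) \rho \le \rho(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + 2Md \|\delta^B\| + \eta. \]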
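Since $\eta$ can be taken arbitrarily small, we have
\[ \rho \le \frac{1}{2\beta - 1} \big[ \rho(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + 2Md \|\delta^B\| \big]. \]
Since $\beta > \frac{2}{3}$ by (8.4.43),
\[ \rho \le 3 \big[ \rho(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + 2Md \|\delta^B\| \big]. \]
This proves the inequality
\[ \rho(X, \theta) \le C_1\, \rho(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + C_2 \|\delta^B\| \]
with $C_1 = 3$ and $C_2 = 6Md$, provided that (8.4.40) holds.

If, on the other hand, $\rho = \sup_{x \in \mathbb{R}^d} \big( P(X \le x) - P(\theta \le x) \big)$, then given $\eta \in (0, \rho)$, there exists $x_0$ such that $P(X \le x_0) - P(\theta \le x_0) > \rho - \eta$. Then we similarly have
\[
\begin{aligned}
& P(X + \delta^B \theta_1 \le x_0 + \|\delta^B\| e) - P(\theta + \delta^B \theta_1 \le x_0 + \|\delta^B\| e) \\
&\quad = \int_{\mathbb{R}^d} \big[ P(X + z \le x_0 + \|\delta^B\| e) - P(\theta + z \le x_0 + \|\delta^B\| e) \big]\, P(\delta^B \theta_1 \in dz) \\
&\quad = \int_E + \int_{E^c} \\
&\quad \ge \int_E \big[ P(X \le x_0) - P(\theta \le x_0 - z + \|\delta^B\| e) \big]\, P(\delta^B \theta_1 \in dz) - \rho\, P(\delta^B \theta_1 \in E^c) \\
&\quad \ge \int_E \big[ P(X \le x_0) - P(\theta \le x_0) - M(\|z\|_1 + d \|\delta^B\|) \big]\, P(\delta^B \theta_1 \in dz) - \rho\, P(\delta^B \theta_1 \in E^c) \\
&\quad \ge (2\beta - 1) \rho - \eta - 2Md \|\delta^B\|.
\end{aligned}
\]
Hence $(2\beta - 1) \rho \le \rho(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + 2Md \|\delta^B\| + \eta$, so that $\rho \le 3 \big[ \rho(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + 2Md \|\delta^B\| \big]$. This completes the proof of the inequality (8.4.38), and reintroducing the symbol "$\sim$", we write
\[ \rho(\tilde X, \tilde\theta) \le C_1\, \rho(\tilde X + \delta^B \theta_1, \tilde\theta + \delta^B \theta_1) + C_2 \|\delta^B\|. \]
Therefore, by the definition of $\rho^*$, $\tilde X$, and $\tilde\theta$,
\[
\begin{aligned}
\rho^*(X, \theta) &\le \rho(\tilde X, \tilde\theta) + \varepsilon \\
&\le C_1\, \rho(\tilde X + \delta^B \theta_1, \tilde\theta + \delta^B \theta_1) + C_2 \|\delta^B\| + \varepsilon \\
&\le C_1\, \rho\big( t_\varepsilon^B (X + \delta^B \theta_1), t_\varepsilon^B (\theta + \delta^B \theta_1) \big) + C_2 \|\delta^B\| + \varepsilon \\
&\le C_1\, \rho^*(X + \delta^B \theta_1, \theta + \delta^B \theta_1) + C_2 \|\delta^B\| + \varepsilon,
\end{aligned}
\]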
which yields the first smoothing inequality of Lemma 8.4.29. 5 p (0 < p ≤ 1). By Now let us show an inequality of a similar type for L 5p , the triangle inequality and the regularity of L 5 p (X, θ) ≤ L 5 p (X, X + δ B θ1 ) + L 5 p (X + δ B θ1 , θ+ δ B θ1 ) + L 5 p (θ, θ+ δ B θ1 ) L 5 p (X + δ B θ1 , θ + δ B θ1 ) + 2L 5 p (0, δ B θ1 ). ≤ L 5 p as a minimal metric with respect to the Lp -metric, From the definition of L it follows that 5 p (0, δ B θ) = E[δ B θp ] ≤ δ B p E[θp ], L which completes the proof of Lemma 8.4.29.
2
The proof of the next two lemmas is obvious. Lemma 8.4.30 For any a > 0 and r > 0, Var(aB X + θ, aB Y + θ) ≤ aB r μr (X, Y ), χ∗ (aB X + θ, aB Y + θ) ≤ aB r χr (X, Y ), d(aB X + θ, aB Y + θ) ≤ aB r dr (X, Y ), and for 0 < p ≤ 1, 5 p (aB X + θ, aB Y + θ) ≤ aB r L 5 p,r (X, Y ), L where X, Y ∈ X (IRd ) are independent of θ. Lemma 8.4.31 Let Aut(IRd ) be the set of all invertible linear operators (automorphisms). Then, for any A ∈ Aut(IRd ), d(AX, AY ) = |JA−1 |d(X, Y ), where JA is the Jacobian of the matrix A. Lemma 8.4.32 For x ∈ [2, 3], |JxB | ≤ C(d, B), where C(d, B) is defined in (8.4.22). Proof: Note that |JxB | = | det xB |.
8.4 Operator-Stable Limit Theorems
If $A$ is a $d \times d$ matrix, then
\[
|\det A| \le d!\, \max_{1\le i,j\le d} |A_{ij}|^{d},
\]
which proves the lemma. $\Box$
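The determinant bound used in this proof follows from the Leibniz expansion: each of the $d!$ permutation terms is a product of $d$ entries. A minimal numerical sketch of that observation is given below; the function names `det_leibniz` and `leibniz_bound` and the sample matrix are illustrative assumptions, not part of the original text.

```python
from itertools import permutations
from math import factorial


def det_leibniz(a):
    """Determinant via the Leibniz permutation expansion (only sensible for small d)."""
    d = len(a)
    total = 0
    for perm in permutations(range(d)):
        # The sign of the permutation is determined by its number of inversions.
        inversions = sum(perm[i] > perm[j] for i in range(d) for j in range(i + 1, d))
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total


def leibniz_bound(a):
    """The bound d! * (max |a_ij|)^d from the proof of Lemma 8.4.32."""
    d = len(a)
    return factorial(d) * max(abs(x) for row in a for x in row) ** d


# Hypothetical 3 x 3 example matrix.
A = [[2, -1, 0], [1, 3, -2], [0, 1, 1]]
print(det_leibniz(A), leibniz_bound(A))
```

The bound is crude, but it is uniform over any set of matrices with bounded entries, which is all the lemma needs for $x^B$, $x \in [2,3]$.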
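Lemma 8.4.33 Let $\theta$ be a full strictly operator-stable random vector in $\mathbb{R}^d$ with exponent $B$. Then for any two independent copies $\theta_1$ and $\theta_2$ of $\theta$ and for any $t, s > 0$,
\[
t^B\theta_1 + s^B\theta_2 \stackrel{d}{\sim} (t+s)^B\theta.
\]

Proof: By (8.4.1) with $b(t) \equiv 0$,
\[
\begin{aligned}
E\bigl[e^{i\langle z,\, t^B\theta_1 + s^B\theta_2\rangle}\bigr]
&= E\bigl[e^{i\langle z,\, t^B\theta_1\rangle}\bigr]\, E\bigl[e^{i\langle z,\, s^B\theta_2\rangle}\bigr]\\
&= \widehat{\mu}\bigl(t^{B^*}z\bigr)\, \widehat{\mu}\bigl(s^{B^*}z\bigr) = \widehat{\mu}(z)^t\, \widehat{\mu}(z)^s\\
&= \widehat{\mu}(z)^{t+s} = \widehat{\mu}\bigl((t+s)^{B^*}z\bigr)\\
&= E\bigl[e^{i\langle z,\, (t+s)^B\theta\rangle}\bigr]. \qquad \Box
\end{aligned}
\]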
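As a sanity check of the characteristic-function identity in this proof, the following sketch specializes to the simplest case: dimension $d = 1$, exponent $B = 1/\alpha$, and a symmetric $\alpha$-stable law with characteristic function $\exp(-|z|^\alpha)$. This scalar specialization and the function names are assumptions made for illustration only; the lemma itself covers general operator-stable vectors.

```python
import math


def cf_symmetric_stable(z, alpha):
    """Characteristic function exp(-|z|^alpha) of a symmetric alpha-stable law on R."""
    return math.exp(-abs(z) ** alpha)


def cf_of_scaled_sum(z, t, s, alpha):
    """Characteristic function of t^B * theta_1 + s^B * theta_2 with B = 1/alpha."""
    b = 1.0 / alpha
    return cf_symmetric_stable(t ** b * z, alpha) * cf_symmetric_stable(s ** b * z, alpha)


def cf_of_single_scaling(z, t, s, alpha):
    """Characteristic function of (t + s)^B * theta with B = 1/alpha."""
    b = 1.0 / alpha
    return cf_symmetric_stable((t + s) ** b * z, alpha)


# Compare both sides on a small grid of hypothetical parameter values.
for alpha in (0.5, 1.0, 1.5, 2.0):
    for z in (-2.0, 0.3, 1.0):
        left = cf_of_scaled_sum(z, 0.7, 2.5, alpha)
        right = cf_of_single_scaling(z, 0.7, 2.5, alpha)
        print(alpha, z, math.isclose(left, right, rel_tol=1e-12))
```

In this scalar case both sides equal $\exp(-(t+s)|z|^\alpha)$, which mirrors the step $\widehat{\mu}(z)^t\widehat{\mu}(z)^s = \widehat{\mu}(z)^{t+s}$ above.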
The following lemmas are proved in Sections 2.5 and 2.6. Further, $C_b(\mathbb{R}^d)$ stands for the set of all bounded continuous functions on $\mathbb{R}^d$.

Lemma 8.4.34 (Duality theorems for $\widehat{L}_p$) $\widehat{L}_p$ admits the following representation:

(i) For $p = 0$,
\[
\begin{aligned}
\widehat{L}_0(X,Y) &= \sup\{|E[f(X) - f(Y)]|;\ f \in C_b(\mathbb{R}^d) \text{ such that } |f(x) - f(y)| \le 1 \text{ for all } x, y \in \mathbb{R}^d\}\\
&= \tfrac{1}{2}\,\mathrm{Var}(X,Y).
\end{aligned}
\]

(ii) For $0 < p \le 1$,
\[
\widehat{L}_p(X,Y) = \sup\{|E[f(X) - f(Y)]|;\ f \in C_b(\mathbb{R}^d) \text{ such that } |f(x) - f(y)| \le \|x-y\|^p \text{ for all } x, y \in \mathbb{R}^d\}.
\]

(iii) For $1 < p < \infty$,
\[
\widehat{L}_p(X,Y) = \sup\{E[f(X)] - E[g(Y)];\ f, g \in C_b(\mathbb{R}^d) \text{ such that } |f(x) - g(y)| \le \|x-y\|^p \text{ for all } x, y \in \mathbb{R}^d\}.
\]

(iv) For $p = \infty$,
\[
\widehat{L}_\infty(X,Y) = \inf\{\varepsilon > 0;\ P(X \in A) \le P(Y \in A^\varepsilon) \text{ for all } A \in \mathcal{B}(\mathbb{R}^d)\}.
\]
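Lemma 8.4.35 ($\widehat{L}_p$-convergence) (i) For any $0 \le p \le \infty$, $\widehat{L}_p$-convergence implies weak convergence. Moreover, if $\pi$ is the Prohorov metric, then
\[
\pi \le \widehat{L}_p^{\,1/(p+1)} \quad \text{for all } 0 \le p \le 1
\]
and
\[
\pi \le \widehat{L}_p^{\,p/(p+1)} \quad \text{for all } 1 \le p \le \infty.
\]

(ii) Let $0 < p < \infty$ and $E[\|X_n\|^p] + E[\|X\|^p] < \infty$. Then $\widehat{L}_p(X_n, X) \to 0$ if and only if
\[
X_n \stackrel{w}{\to} X \quad \text{and} \quad E[\|X_n\|^p] \to E[\|X\|^p].
\]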
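Lemma 8.4.36 (Explicit representations for $\widehat{L}_p$ in $\mathcal{X}(\mathbb{R})$) For $d = 1$, $1 \le p \le \infty$,
\[
\widehat{L}_p(X,Y) = \left(\int_0^1 |F_X^{-1}(t) - F_Y^{-1}(t)|^p\, dt\right)^{1/p}, \quad 1 \le p < \infty,
\]
\[
\widehat{L}_\infty(X,Y) = \sup_{0\le t\le 1} |F_X^{-1}(t) - F_Y^{-1}(t)|.
\]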
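For two empirical distributions with the same number of atoms, the quantile functions in Lemma 8.4.36 are step functions, so the integral reduces to an average over the sorted samples. The sketch below computes the minimal metric in that special case only; the function name `minimal_lp_1d` and the equal-sample-size restriction are assumptions made for illustration.

```python
import math


def minimal_lp_1d(xs, ys, p):
    """Minimal L_p distance between two empirical laws on R with equally many atoms.

    Uses the quantile representation: pair the order statistics of both samples.
    Intended for 1 <= p <= infinity, as in the lemma.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    gaps = [abs(x - y) for x, y in zip(sorted(xs), sorted(ys))]
    if math.isinf(p):
        # p = infinity: supremum of the quantile gaps.
        return max(gaps)
    # 1 <= p < infinity: the integral over (0, 1) becomes an average of |gap|^p.
    return (sum(g ** p for g in gaps) / len(gaps)) ** (1.0 / p)


# Hypothetical samples: the second is the first shifted by 1, in scrambled order.
print(minimal_lp_1d([3, 0, 1], [2, 4, 1], 1))
print(minimal_lp_1d([3, 0, 1], [2, 4, 1], math.inf))
```

A pure shift by a constant $c$ gives distance $|c|$ for every $p$, which is a convenient check of any implementation.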
Lemma 8.4.37 (Upper bounds for $\widehat{L}_p$) For $1 \le p < \infty$,
\[
2^{-p+1}\, \widehat{L}_p \le \kappa_p \le \xi_p,
\]
where $\kappa_p$ (resp. $\xi_p$) is the $p$th difference (resp. absolute) pseudomoment.
8.5 Proofs of the Rate of Convergence Results
8.5 Application to Operator-Stable Limit Theorems: Proofs of the Rate of Convergence Results

In this section we give the proofs of the rate of convergence results stated in Section 8.4.

Proof of Theorem 8.4.4: All probability metrics for the random vectors in this section are defined by their marginal distributions and are consequently independent of their joint distributions. So, without loss of generality, we assume that $\{X_i\}$ and $\theta$ are independent of each other. Let $\{\theta_i\}$ be independent copies of $\theta$ and assume that $\{\theta_i\}$ are independent of $\{X_i\}$ and $\theta$. Then by the definition of $\theta$ (or by (8.4.1) with $b(t) \equiv 0$),
\[
n^{-B}\sum_{i=1}^n \theta_i \stackrel{d}{\sim} \theta \quad \text{for any } n = 1, 2, \ldots. \tag{8.5.1}
\]
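Now, by Lemma 8.4.29 and (8.5.1), for any $\delta > 0$,
\[
\begin{aligned}
\varrho^*\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr)
&\le C_1\, \varrho^*\Bigl(n^{-B}\sum_{i=1}^n X_i + \delta^B\theta_1,\ \theta + \delta^B\theta_1\Bigr) + C_2\|\delta^B\|\\
&= C_1\, \varrho^*\Bigl(n^{-B}\sum_{i=1}^n X_i + \delta^B\theta,\ n^{-B}\sum_{i=1}^n \theta_i + \delta^B\theta\Bigr) + C_2\|\delta^B\|.
\end{aligned}
\tag{8.5.2}
\]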
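Furthermore, by the triangle inequality,
\[
\begin{aligned}
&\varrho^*\Bigl(n^{-B}\sum_{i=1}^n X_i + \delta^B\theta,\ n^{-B}\sum_{i=1}^n \theta_i + \delta^B\theta\Bigr)\\
&\quad\le \varrho^*\Bigl(n^{-B}\sum_{i=1}^n X_i + \delta^B\theta,\ n^{-B}\theta_1 + n^{-B}\sum_{i=2}^n X_i + \delta^B\theta\Bigr)\\
&\qquad + \sum_{j=1}^m \varrho^*\Bigl(n^{-B}\Bigl(\sum_{i=1}^j \theta_i + \sum_{i=j+1}^n X_i\Bigr) + \delta^B\theta,\ n^{-B}\Bigl(\sum_{i=1}^{j+1} \theta_i + \sum_{i=j+2}^n X_i\Bigr) + \delta^B\theta\Bigr)\\
&\qquad + \varrho^*\Bigl(n^{-B}\Bigl(\sum_{i=1}^{m+1} \theta_i + \sum_{i=m+2}^n X_i\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^n \theta_i + \delta^B\theta\Bigr),
\end{aligned}
\]
where $m = [n/2]$, $n \ge 5$. By Lemma 8.4.26, the above is
\[
\begin{aligned}
\le{}& \varrho^*\Bigl(n^{-B}\sum_{i=2}^n X_i,\ n^{-B}\sum_{i=2}^n \theta_i\Bigr)\, \mathrm{Var}\bigl(n^{-B}X_1 + \delta^B\theta,\ n^{-B}\theta_1 + \delta^B\theta\bigr)\\
&+ \varrho^*\Bigl(n^{-B}\Bigl(X_1 + \sum_{i=2}^n \theta_i\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^n \theta_i + \delta^B\theta\Bigr)\\
&+ \sum_{j=1}^m \Bigl[\varrho^*\Bigl(n^{-B}\sum_{i=j+2}^n X_i,\ n^{-B}\sum_{i=j+2}^n \theta_i\Bigr)\\
&\qquad\quad \times \mathrm{Var}\Bigl(n^{-B}\Bigl(\sum_{i=1}^j \theta_i + X_{j+1}\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^{j+1} \theta_i + \delta^B\theta\Bigr)\\
&\qquad + \varrho^*\Bigl(n^{-B}\Bigl(\sum_{i=1}^j \theta_i + X_{j+1} + \sum_{i=j+2}^n \theta_i\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^n \theta_i + \delta^B\theta\Bigr)\Bigr]\\
&+ \varrho^*\Bigl(n^{-B}\Bigl(\sum_{i=1}^{m+1} \theta_i + \sum_{i=m+2}^n X_i\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^n \theta_i + \delta^B\theta\Bigr)\\
=:{}& \sum_{k=1}^4 \Delta_k.
\end{aligned}
\]
Here, the summands $\Delta_k$ are defined as follows:
\[
\begin{aligned}
\Delta_1 &= \varrho^*\Bigl(n^{-B}\sum_{i=2}^n X_i,\ n^{-B}\sum_{i=2}^n \theta_i\Bigr)\, \mathrm{Var}\bigl(n^{-B}X_1 + \delta^B\theta,\ n^{-B}\theta_1 + \delta^B\theta\bigr),\\
\Delta_2 &= \sum_{j=1}^m \varrho^*\Bigl(n^{-B}\sum_{i=j+2}^n X_i,\ n^{-B}\sum_{i=j+2}^n \theta_i\Bigr)\\
&\qquad \times \mathrm{Var}\Bigl(n^{-B}\Bigl(\sum_{i=1}^j \theta_i + X_{j+1}\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^{j+1} \theta_i + \delta^B\theta\Bigr),\\
\Delta_3 &= (m+1)\, \varrho^*\Bigl(n^{-B}\Bigl(X_1 + \sum_{i=2}^n \theta_i\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^n \theta_i + \delta^B\theta\Bigr),\\
\Delta_4 &= \varrho^*\Bigl(n^{-B}\Bigl(\sum_{i=1}^{m+1} \theta_i + \sum_{i=m+2}^n X_i\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^n \theta_i + \delta^B\theta\Bigr).
\end{aligned}
\]
Next, by (8.5.2),
\[
\varrho^*\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr) \le C_1 \sum_{k=1}^4 \Delta_k + \Delta_5, \tag{8.5.3}
\]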
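with $\Delta_5 = C_2\|\delta^B\|$. We shall estimate each $\Delta_k$ separately.

(I) Estimate for $\Delta_3$. We have
\[
\begin{aligned}
\Delta_3 &= (m+1)\, \varrho^*\Bigl(n^{-B}X_1 + n^{-B}(n-1)^B(n-1)^{-B}\sum_{i=2}^n \theta_i + \delta^B\theta,\\
&\hphantom{= (m+1)\, \varrho^*\Bigl(} n^{-B}\theta_1 + n^{-B}(n-1)^B(n-1)^{-B}\sum_{i=2}^n \theta_i + \delta^B\theta\Bigr)\\
&= (m+1)\, \varrho^*\bigl(n^{-B}X_1 + n^{-B}(n-1)^B\theta_2 + \delta^B\theta,\ n^{-B}\theta_1 + n^{-B}(n-1)^B\theta_2 + \delta^B\theta\bigr) \quad \text{[by (8.5.1)]}\\
&\le (m+1)\, \varrho^*\bigl(n^{-B}X_1 + n^{-B}(n-1)^B\theta_2,\ n^{-B}\theta_1 + n^{-B}(n-1)^B\theta_2\bigr) \quad \text{[by the regularity of } \varrho^*\text{]}\\
&\le (m+1)\, \tfrac{1}{2}\mathrm{Var}\bigl(n^{-B}X_1 + n^{-B}(n-1)^B\theta,\ n^{-B}\theta_1 + n^{-B}(n-1)^B\theta\bigr) \quad \text{[since } \varrho^* \le \tfrac{1}{2}\mathrm{Var}\text{]}\\
&\le \tfrac{1}{2}(m+1)\, \mathrm{Var}\bigl((n-1)^{-B}X_1 + \theta,\ (n-1)^{-B}\theta_1 + \theta\bigr) \quad \text{[by the homogeneity of Var]}.
\end{aligned}
\]
Thus, we have
\[
\Delta_3 \le \tfrac{1}{2}\, n\, \|(n-1)^{-B}\|^r \mu_r(X_1, \theta_1), \tag{8.5.4}
\]
invoking Lemma 8.4.30. Now, using the fact that $t \mapsto \|t^B\|$ is strictly increasing on $(0,\infty)$, we have
\[
\|(n-1)^{-B}\| \le \Bigl\|\Bigl(\frac{n-1}{n}\Bigr)^{-B}\Bigr\|\, \|n^{-B}\| \le \Bigl\|\Bigl(\frac{1}{2}\Bigr)^{-B}\Bigr\|\, \|n^{-B}\| = \|2^B\|\, \|n^{-B}\|. \tag{8.5.5}
\]
Thus it follows from (8.5.4) and (8.5.5) that
\[
\Delta_3 \le C_3\, n\, \|n^{-B}\|^r \mu_r, \tag{8.5.6}
\]
with $C_3 = \tfrac{1}{2}\|2^B\|^r$.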
(II) Estimate for $\Delta_4$. Similarly, we have
\[
\begin{aligned}
\Delta_4 &= \varrho^*\Bigl(n^{-B}(m+1)^B(m+1)^{-B}\sum_{i=1}^{m+1} \theta_i + n^{-B}\sum_{i=m+2}^n X_i + \delta^B\theta,\\
&\hphantom{= \varrho^*\Bigl(} n^{-B}(m+1)^B(m+1)^{-B}\sum_{i=1}^{m+1} \theta_i + n^{-B}\sum_{i=m+2}^n \theta_i + \delta^B\theta\Bigr)\\
&\le \varrho^*\Bigl(n^{-B}(m+1)^B\theta_1 + n^{-B}\sum_{i=m+2}^n X_i,\ n^{-B}(m+1)^B\theta_1 + n^{-B}\sum_{i=m+2}^n \theta_i\Bigr)\\
&\qquad \text{[by (8.5.1) and the regularity of } \varrho^*\text{]}\\
&\le \tfrac{1}{2}\mathrm{Var}\Bigl(n^{-B}(m+1)^B\theta_1 + n^{-B}\sum_{i=m+2}^n X_i,\ n^{-B}(m+1)^B\theta_1 + n^{-B}\sum_{i=m+2}^n \theta_i\Bigr)\\
&\le \tfrac{1}{2}\mathrm{Var}\Bigl((m+1)^{-B}\sum_{i=m+2}^n X_i + \theta,\ (m+1)^{-B}\sum_{i=m+2}^n \theta_i + \theta\Bigr)\\
&\le \tfrac{1}{2}\|(m+1)^{-B}\|^r\, \mu_r\Bigl(\sum_{i=m+2}^n X_i,\ \sum_{i=m+2}^n \theta_i\Bigr) \quad \text{[by Lemma 8.4.30]}\\
&\le \tfrac{1}{2}\|(m+1)^{-B}\|^r (n-m-1)\, \mu_r(X_1, \theta_1)
\end{aligned}
\]
by the triangle inequality and the repeated use of the regularity of $\mu_r$. Finally, by
\[
\Bigl\|\Bigl(\frac{m+1}{n}\Bigr)^{-B}\Bigr\| \le \Bigl\|\Bigl(\frac{1}{2}\Bigr)^{-B}\Bigr\| = \|2^B\|,
\]
we have
\[
\Delta_4 \le C_3\, n\, \|n^{-B}\|^r \mu_r. \tag{8.5.7}
\]

We prove the theorem by induction. For $n = 1$, the theorem is valid because $\varrho^*(X_1, \theta) \le \tau_r(X_1, \theta)$. For $n = 2$, 3, and 4, the estimates are similar to those for $n \ge 5$, the case we are going to prove. However, the absolute constants in the bounds for $n = 2$, 3, and 4 have smaller values.
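In the following we assume $n \ge 5$. Assume that for all $j < n$,
\[
\varrho^*\Bigl(j^{-B}\sum_{i=1}^j X_i,\ \theta\Bigr) \le K\bigl(j\|j^{-B}\|^r \mu_r + \|j^{-B}\|\tau_r\bigr).
\]
Since we have already estimated $\Delta_3$ and $\Delta_4$ independently of the induction hypothesis, we shall estimate only $\Delta_1$, $\Delta_2$, and $\Delta_5$. To this end, take $\delta > 0$ such that $n\delta = \varepsilon^{-1}$ for some $\varepsilon > 0$, where $\varepsilon$ will be suitably chosen later.

(III) A bound for $\Delta_1$. By the definition of $\varrho^*$,
\[
\begin{aligned}
\varrho^*\Bigl(n^{-B}\sum_{i=2}^n X_i,\ n^{-B}\sum_{i=2}^n \theta_i\Bigr)
&= \sup_{t>0} \varrho\Bigl(t^B n^{-B}\sum_{i=2}^n X_i,\ t^B n^{-B}\sum_{i=2}^n \theta_i\Bigr)\\
&= \sup_{t>0} \varrho\Bigl(t^B\Bigl(\frac{n-1}{n}\Bigr)^{B}(n-1)^{-B}\sum_{i=2}^n X_i,\ t^B\Bigl(\frac{n-1}{n}\Bigr)^{B}(n-1)^{-B}\sum_{i=2}^n \theta_i\Bigr)\\
&\le \sup_{u>0} \varrho\Bigl(u^B(n-1)^{-B}\sum_{i=2}^n X_i,\ u^B(n-1)^{-B}\sum_{i=2}^n \theta_i\Bigr)\\
&= \varrho^*\Bigl((n-1)^{-B}\sum_{i=2}^n X_i,\ (n-1)^{-B}\sum_{i=2}^n \theta_i\Bigr)\\
&\le K\bigl((n-1)\|(n-1)^{-B}\|^r \mu_r + \|(n-1)^{-B}\|\tau_r\bigr)
\end{aligned}
\]
by the induction hypothesis. Furthermore,
\[
\mathrm{Var}\bigl(n^{-B}X_1 + \delta^B\theta,\ n^{-B}\theta_1 + \delta^B\theta\bigr) \le \mathrm{Var}\bigl((n\delta)^{-B}X_1 + \theta,\ (n\delta)^{-B}\theta_1 + \theta\bigr) \le \|(n\delta)^{-B}\|\, \mu_1(X_1, \theta_1)
\]
by the homogeneity of Var of order 0 and Lemma 8.4.30. Thus, we have
\[
\begin{aligned}
\Delta_1 &\le K\bigl((n-1)\|(n-1)^{-B}\|^r \mu_r + \|(n-1)^{-B}\|\tau_r\bigr)\, \|(n\delta)^{-B}\|\, \mu_1\\
&\le KC_4\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr)\, \|(n\delta)^{-B}\|\, \mu_1\\
&\le KC_4\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr)\, \|\varepsilon^{B}\|\, \tau_r,
\end{aligned}
\tag{8.5.8}
\]
with $C_4 = \max\{\|2^B\|^r, \|2^B\|\}$.

(IV) A bound for $\Delta_2$.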
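We assume $n \ge 5$. Then, as in the case for $\Delta_3$, we have, for $j \le m = [n/2]$,
\[
\begin{aligned}
\varrho^*\Bigl(n^{-B}\sum_{i=j+2}^n X_i,\ n^{-B}\sum_{i=j+2}^n \theta_i\Bigr)
&\le \varrho^*\Bigl((n-j-1)^{-B}\sum_{i=j+2}^n X_i,\ (n-j-1)^{-B}\sum_{i=j+2}^n \theta_i\Bigr)\\
&\le K\bigl((n-j-1)\|(n-j-1)^{-B}\|^r \mu_r + \|(n-j-1)^{-B}\|\tau_r\bigr).
\end{aligned}
\]
Also, we have
\[
\begin{aligned}
&\mathrm{Var}\Bigl(n^{-B}\Bigl(\sum_{i=1}^j \theta_i + X_{j+1}\Bigr) + \delta^B\theta,\ n^{-B}\sum_{i=1}^{j+1} \theta_i + \delta^B\theta\Bigr)\\
&\quad\le \mathrm{Var}\Bigl(X_{j+1} + \sum_{i=1}^j \theta_i + (n\delta)^B\theta,\ \theta_{j+1} + \sum_{i=1}^j \theta_i + (n\delta)^B\theta\Bigr)\\
&\quad= \mathrm{Var}\bigl(X_{j+1} + j^B\theta_1 + (n\delta)^B\theta,\ \theta_{j+1} + j^B\theta_1 + (n\delta)^B\theta\bigr)\\
&\quad= \mathrm{Var}\bigl(X_1 + (j+n\delta)^B\theta,\ \theta_1 + (j+n\delta)^B\theta\bigr) \quad \text{(by Lemma 8.4.33)}\\
&\quad\le \mathrm{Var}\bigl((j+n\delta)^{-B}X_1 + \theta,\ (j+n\delta)^{-B}\theta_1 + \theta\bigr) \quad \text{(by the homogeneity of Var of order 0)}\\
&\quad\le \|(j+n\delta)^{-B}\|^r \mu_r(X_1, \theta)
\end{aligned}
\]
by Lemma 8.4.30. Thus we have
\[
\Delta_2 \le \sum_{j=1}^m K\bigl((n-j-1)\|(n-j-1)^{-B}\|^r \mu_r + \|(n-j-1)^{-B}\|\tau_r\bigr)\, \|(j+n\delta)^{-B}\|^r \mu_r. \tag{8.5.9}
\]
Now, for $1 \le j \le m = [n/2]$, $n \ge 5$, we have
\[
\frac{n-j-1}{n} \ge \frac{n-m-1}{n} \ge \frac{1}{4}.
\]
Hence
\[
\|(n-j-1)^{-B}\| \le \|4^B\|\, \|n^{-B}\|, \tag{8.5.10}
\]
and so by (8.5.9) and (8.5.10),
\[
\Delta_2 \le KC_5\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr) \sum_{j=1}^\infty \|(j+n\delta)^{-B}\|^r \mu_r, \tag{8.5.11}
\]
where $C_5 = \max\{\|4^B\|^r, \|4^B\|\}$. Furthermore,
\[
\sum_{j=1}^\infty \|(j+n\delta)^{-B}\|^r \le \int_0^\infty \|(x+n\delta)^{-B}\|^r\, dx = \int_{n\delta}^\infty \|y^{-B}\|^r\, dy \le (n\delta)\|(n\delta)^{-B}\|^r \int_1^\infty \|z^{-B}\|^r\, dz. \tag{8.5.12}
\]
Recall that $r > \Lambda_B/\lambda_B^2 \ge 1/\lambda_B$. Take $q$ such that $1/\lambda_B < q < r$. Then, by (8.4.5), $\|x^{\frac{1}{q}I - B}\| \to 0$ as $x \to \infty$, and hence $M_1 := \sup_{x\ge 1}\|x^{\frac{1}{q}I - B}\| < \infty$. Thus for $z \ge 1$, $\|z^{-B}\|^r \le M_1^r z^{-r/q}$, and hence
\[
\int_1^\infty \|z^{-B}\|^r\, dz \le M_1^r \int_1^\infty z^{-r/q}\, dz =: C_6 < \infty. \tag{8.5.13}
\]
It also follows from the assumption for $p$ ($p > 1/\lambda_B$) that for $z \ge 1$, $z^{1/p} \le M_2\|z^{-B}\|^{-1} \le M_2\|z^B\|$ since $\|z^B\|\,\|z^{-B}\| \ge 1$, where $M_2 = \sup_{x\ge 1}\|x^{\frac{1}{p}I - B}\|$. Thus if $n\delta \ge 1$, then
\[
n\delta \le M_2^p\, \|(n\delta)^B\|^p. \tag{8.5.14}
\]
Finally, we have by (8.5.11)–(8.5.14) and (8.4.5) that if $n\delta = \varepsilon^{-1}$ (where $\varepsilon > 0$ will be taken small), then
\[
\begin{aligned}
\Delta_2 &\le KC_5\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr)\, C_6\, (n\delta)\|(n\delta)^{-B}\|^r \mu_r\\
&\le KC_5C_6\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr)\, M_2^p\, \|\varepsilon^{-B}\|^p \|\varepsilon^{B}\|^r \mu_r\\
&\le KC_5C_6M_2^p\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr)\, \|\varepsilon^{-B}\|^p \|\varepsilon^{B}\|^r \tau_r^{r-p}.
\end{aligned}
\tag{8.5.15}
\]

(V) A bound for $\Delta_5$. We have
\[
\Delta_5 \le C_2\|n^{-B}\|\, \|\varepsilon^{-B}\| = C_2\|n^{-B}\|\tau_r\, \|\varepsilon^{-B}\|\tau_r^{-1}. \tag{8.5.16}
\]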
Altogether we have from (8.5.3), (8.5.6), (8.5.7), (8.5.8), (8.5.15), and (8.5.16) that
\[
\begin{aligned}
\varrho^*\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr)
&\le 2C_1C_3\, n\|n^{-B}\|^r \mu_r + KC_1C_4\|\varepsilon^{B}\|\tau_r\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr)\\
&\quad + KC_1C_5C_6M_2^p\, \|\varepsilon^{-B}\|^p \|\varepsilon^{B}\|^r \tau_r^{r-p}\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr)\\
&\quad + C_2\|\varepsilon^{-B}\|\tau_r^{-1}\, \|n^{-B}\|\tau_r\\
&\le \Bigl\{K\Bigl[C_1C_4\|\tau_r\varepsilon^{B}\| + C_1C_5C_6M_2^p\, \|(\tau_r\varepsilon^{B})^{-1}\|^p\, \|\tau_r\varepsilon^{B}\|^r\Bigr]\\
&\qquad + 2C_1C_3 + C_2\|(\tau_r\varepsilon^{B})^{-1}\|\Bigr\}\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr).
\end{aligned}
\tag{8.5.17}
\]
We first show that
\[
\lim_{t\to 0} \|t^{-B}\|^p\, \|t^{B}\|^r = 0. \tag{8.5.18}
\]
It follows from (8.4.4) and (8.4.5) that for any $\eta > 0$ and for some small $t_0 > 0$, $\|t^{-B}\| \le t^{-\Lambda_B - \eta}$, $t < t_0$, and $\|t^{B}\| \le t^{\lambda_B - \eta}$, $t < t_0$. Thus for $t < t_0$,
\[
\|t^{-B}\|^p\, \|t^{B}\|^r \le t^{-(\Lambda_B+\eta)p + (\lambda_B-\eta)r} \le t^{-\Lambda_B p + \lambda_B r - \eta(p+r)},
\]
where by the restrictions on $r$ and $p$, $-\Lambda_B p + \lambda_B r > 0$. Thus, taking $\eta > 0$ sufficiently small, we get (8.5.18). Also, of course, $\lim_{t\to 0}\|t^B\| = 0$. Therefore, we can find a sufficiently small $\varepsilon > 0$ such that the matrix $\tau_r\varepsilon^B$ satisfies
\[
C_1C_4\|\tau_r\varepsilon^{B}\| + C_1C_5C_6M_2^p\, \|(\tau_r\varepsilon^{B})^{-1}\|^p\, \|\tau_r\varepsilon^{B}\|^r \le \frac{1}{2}. \tag{8.5.19}
\]
Then choose $K$ such that
\[
2C_1C_3 + C_2\|(\tau_r\varepsilon^{B})^{-1}\| \le \frac{K}{2}. \tag{8.5.20}
\]
Finally, we obtain from (8.5.17), (8.5.19), and (8.5.20) that
\[
\varrho^*\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr) \le K\bigl(n\|n^{-B}\|^r \mu_r + \|n^{-B}\|\tau_r\bigr). \qquad \Box
\]
Proof of Theorems 8.4.8 and 8.4.9: By the same reasoning as that mentioned at the beginning of this section, we can assume that $\{X_i\}$ and $\theta$ are independent of each other and that the $\{\theta_i\}$ are independent copies of $\theta$ and are independent of the $\{X_i\}$ and $\theta$. We prove the theorem by induction. For $n = 1$, the assertion is trivial. For $n = 2$, we have
\[
\begin{aligned}
\mathrm{Var}\bigl(2^{-B}(X_1+X_2),\ \theta\bigr) &= \mathrm{Var}\bigl(2^{-B}(X_1+X_2),\ 2^{-B}(\theta_1+\theta_2)\bigr) \quad \text{[by (8.5.1)]}\\
&\le \mathrm{Var}(X_1+X_2,\ \theta_1+\theta_2) \quad \text{[by the homogeneity of Var of order } r = 0\text{]}\\
&\le \mathrm{Var}(X_1+X_2,\ X_1+\theta_2) + \mathrm{Var}(X_1+\theta_2,\ \theta_1+\theta_2) \quad \text{[by the triangle inequality]}\\
&\le \mathrm{Var}(X_2, \theta_2) + \mathrm{Var}(X_1, \theta_1) \quad \text{[by the regularity of Var]}\\
&= 2\,\mathrm{Var}(X_1, \theta) \le 2\nu_r \le 2c\,\|2^{-B}\|^r \nu_r,
\end{aligned}
\]
since
\[
c\,\|2^{-B}\|^r = \bigl(\|2^B\|^r + \|3^B\|^r\bigr)\|2^{-B}\|^r \ge \|2^B\|^r\, \|2^{-B}\|^r \ge 1.
\]
For $n = 3$, we similarly have
\[
\mathrm{Var}\bigl(3^{-B}(X_1+X_2+X_3),\ \theta\bigr) \le 3c\,\|3^{-B}\|^r \nu_r.
\]
Now suppose that for all $j < n$,
\[
\mathrm{Var}\Bigl(j^{-B}\sum_{i=1}^j X_i,\ \theta\Bigr) \le c\, j\, \|j^{-B}\|^r \nu_r. \tag{8.5.21}
\]
Then for any $j < n$,
\[
\mathrm{Var}\Bigl(j^{-B}\sum_{i=1}^j X_i,\ \theta\Bigr) \le cM\nu_r \le ca = \frac{1}{b}\|2^{-B}\|^r \tag{8.5.22}
\]
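by our assumptions. For any integer $n \ge 4$ and $m = [n/2]$, we have
\[
\begin{aligned}
\mathrm{Var}\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr)
&= \mathrm{Var}\Bigl(n^{-B}\sum_{i=1}^n X_i,\ n^{-B}\sum_{i=1}^n \theta_i\Bigr)\\
&\le \mathrm{Var}\Bigl(n^{-B}\sum_{i=1}^n X_i,\ n^{-B}\Bigl(\sum_{i=1}^m \theta_i + \sum_{i=m+1}^n X_i\Bigr)\Bigr)\\
&\quad + \mathrm{Var}\Bigl(n^{-B}\Bigl(\sum_{i=1}^m \theta_i + \sum_{i=m+1}^n X_i\Bigr),\ n^{-B}\sum_{i=1}^n \theta_i\Bigr) \quad \text{[by the triangle inequality]}\\
&\le \mathrm{Var}\Bigl(n^{-B}\sum_{i=m+1}^n X_i,\ n^{-B}\sum_{i=m+1}^n \theta_i\Bigr)\, \mathrm{Var}\Bigl(n^{-B}\sum_{i=1}^m X_i,\ n^{-B}\sum_{i=1}^m \theta_i\Bigr)\\
&\quad + \mathrm{Var}\Bigl(n^{-B}\Bigl(\sum_{i=1}^m X_i + \sum_{i=m+1}^n \theta_i\Bigr),\ n^{-B}\sum_{i=1}^n \theta_i\Bigr)\\
&\quad + \mathrm{Var}\Bigl(n^{-B}\Bigl(\sum_{i=1}^m \theta_i + \sum_{i=m+1}^n X_i\Bigr),\ n^{-B}\sum_{i=1}^n \theta_i\Bigr) \quad \text{[by Lemma 8.4.26]}\\
&=: I_1 + I_2 + I_3.
\end{aligned}
\tag{8.5.23}
\]
By the induction hypotheses (8.5.21) and (8.5.22),
\[
\begin{aligned}
I_1 &= \mathrm{Var}\Bigl(\Bigl(\frac{n}{n-m}\Bigr)^{-B}(n-m)^{-B}\sum_{i=m+1}^n X_i,\ \Bigl(\frac{n}{n-m}\Bigr)^{-B}\theta_1\Bigr)\\
&\qquad \times \mathrm{Var}\Bigl(\Bigl(\frac{n}{m}\Bigr)^{-B}m^{-B}\sum_{i=1}^m X_i,\ \Bigl(\frac{n}{m}\Bigr)^{-B}\theta_1\Bigr)\\
&\le \mathrm{Var}\Bigl((n-m)^{-B}\sum_{i=m+1}^n X_i,\ \theta_1\Bigr)\, \mathrm{Var}\Bigl(m^{-B}\sum_{i=1}^m X_i,\ \theta_1\Bigr)\\
&\le \frac{1}{b}\|2^{-B}\|^r\, c\, m\, \|m^{-B}\|^r \nu_r.
\end{aligned}
\]
Note that $m = [n/2] \ge \tfrac{2}{5}n$ for $n \ge 4$. Hence
\[
m\,\|m^{-B}\|^r \le \frac{n}{2}\Bigl\|\Bigl(\frac{2}{5}n\Bigr)^{-B}\Bigr\|^r \le \frac{n}{2}\Bigl\|\Bigl(\frac{2}{5}\Bigr)^{-B}\Bigr\|^r \|n^{-B}\|^r.
\]
Thus
\[
I_1 \le c\, \frac{1}{b}\|2^{-B}\|^r\, \frac{1}{2}\Bigl\|\Bigl(\frac{2}{5}\Bigr)^{-B}\Bigr\|^r n\,\|n^{-B}\|^r \nu_r = \frac{2}{5}\, c\, n\, \|n^{-B}\|^r \nu_r \tag{8.5.24}
\]
by the definition of $b$. To estimate $I_2$, observe that
\[
\begin{aligned}
I_2 &= \mathrm{Var}\Bigl(n^{-B}\sum_{i=1}^m X_i + \Bigl(\frac{n}{n-m}\Bigr)^{-B}(n-m)^{-B}\sum_{i=m+1}^n \theta_i,\\
&\hphantom{= \mathrm{Var}\Bigl(} n^{-B}\sum_{i=1}^m \theta_i + \Bigl(\frac{n}{n-m}\Bigr)^{-B}(n-m)^{-B}\sum_{i=m+1}^n \theta_i\Bigr)\\
&\le \mathrm{Var}\Bigl((n-m)^{-B}\sum_{i=1}^m X_i + \theta,\ (n-m)^{-B}\sum_{i=1}^m \theta_i + \theta\Bigr).
\end{aligned}
\]
Then we have
\[
\begin{aligned}
I_2 &\le \|(n-m)^{-B}\|^r\, \mu_r\Bigl(\sum_{i=1}^m X_i,\ \sum_{i=1}^m \theta_i\Bigr) \quad \text{[by Lemma 8.4.30]}\\
&\le \|(n-m)^{-B}\|^r\, m\, \mu_r(X_1, \theta) \quad \text{[by the triangle inequality and the repeated use of the regularity of } \mu_r\text{]}\\
&\le \Bigl\|\Bigl(\frac{1}{2}\Bigr)^{-B}\Bigr\|^r \frac{1}{2}\, n\, \|n^{-B}\|^r \mu_r \quad \text{since } \frac{n-m}{n} \ge \frac{1}{2}.
\end{aligned}
\]
Hence
\[
I_2 \le \frac{1}{2}\|2^B\|^r\, n\, \|n^{-B}\|^r \nu_r. \tag{8.5.25}
\]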
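As to $I_3$, we have
\[
\begin{aligned}
I_3 &= \mathrm{Var}\Bigl(n^{-B}\sum_{i=m+1}^n X_i + n^{-B}m^B\theta,\ n^{-B}\sum_{i=m+1}^n \theta_i + n^{-B}m^B\theta\Bigr)\\
&\le \mathrm{Var}\Bigl(m^{-B}\sum_{i=m+1}^n X_i + \theta,\ m^{-B}\sum_{i=m+1}^n \theta_i + \theta\Bigr)\\
&\le \|m^{-B}\|^r (n-m)\, \mu_r(X_1, \theta) \le \|n^{-B}\|^r \Bigl\|\Bigl(\frac{n}{m}\Bigr)^{B}\Bigr\|^r (n-m)\, \mu_r\\
&\le \frac{3}{5}\|3^B\|^r\, n\, \|n^{-B}\|^r \nu_r,
\end{aligned}
\tag{8.5.26}
\]
since $\frac{n}{m} \le 3$ for $n \ge 4$ and $n - m \le \frac{3}{5}n$.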
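The elementary inequalities for $m = [n/2]$ used in the estimates of $I_1$, $I_2$, and $I_3$ can be checked mechanically. The sketch below verifies them in exact integer arithmetic over a finite range of $n$; the function name `halving_bounds_hold` and the chosen range are illustrative assumptions, and a finite scan is of course a sanity check rather than a proof.

```python
def halving_bounds_hold(n):
    """Check the bounds on m = [n/2] used for I_1, I_2, I_3, without floating point."""
    m = n // 2
    return (
        5 * m >= 2 * n            # m >= (2/5) n, used for I_1
        and 2 * (n - m) >= n      # (n - m)/n >= 1/2, used for I_2
        and n <= 3 * m            # n/m <= 3, used for I_3
        and 5 * (n - m) <= 3 * n  # n - m <= (3/5) n, used for I_3
    )


# The bounds hold on the scanned range n >= 4 and fail for n = 3.
print(all(halving_bounds_hold(n) for n in range(4, 10001)))
print(halving_bounds_hold(3))
```

The failure at $n = 3$ is consistent with the proof treating $n = 2$ and $n = 3$ separately before the induction step.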
Altogether, we have from (8.5.23)–(8.5.26),
\[
\mathrm{Var}\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr) \le \Bigl(\frac{2}{5}c + \frac{1}{2}\|2^B\|^r + \frac{3}{5}\|3^B\|^r\Bigr) n\, \|n^{-B}\|^r \nu_r \le c\, n\, \|n^{-B}\|^r \nu_r.
\]
This completes the proof of Theorem 8.4.8. The proof of Theorem 8.4.9 is similar and is therefore omitted. $\Box$

Proof of Theorem 8.4.10: Again, we assume that the $\{X_i\}$ and $\theta$ are independent of each other. Let $\{\theta_i\}$ be independent copies of $\theta$, and assume that the $\{\theta_i\}$ are independent of $\{X_i\}$ and $\theta$. We prove the theorem by induction. For $n = 1$, $d(X_1, \theta) \le A\,d(X_1, \theta) \le AT_r$. For $n = 2$, we have by Lemma 8.4.31, the regularity of $d$, and the triangle inequality,
\[
\begin{aligned}
d\bigl(2^{-B}(X_1+X_2),\ \theta\bigr) &= d\bigl(2^{-B}(X_1+X_2),\ 2^{-B}(\theta_1+\theta_2)\bigr) = |J_{2^B}|\, d(X_1+X_2,\ \theta_1+\theta_2)\\
&\le 2\,|J_{2^B}|\, d(X_1, \theta) \le 2C(d,B)T_r \quad \text{[by Lemma 8.4.32]}\\
&\le 2A\,\|2^{-B}\|^r T_r,
\end{aligned}
\]
since $A\,\|2^{-B}\|^r \ge C(d,B)\|2^B\|^r\|2^{-B}\|^r \ge C(d,B)$. Similarly, we have for $n = 3$,
\[
d\bigl(3^{-B}(X_1+X_2+X_3),\ \theta\bigr) \le 3C(d,B)T_r \le 3A\,\|3^{-B}\|^r T_r,
\]
since $A\,\|3^{-B}\|^r \ge C(d,B)\|3^B\|^r\|3^{-B}\|^r \ge C(d,B)$. To prove the theorem by induction, assume for all $j < n$ that
\[
d\Bigl(j^{-B}\sum_{i=1}^j X_i,\ \theta\Bigr) \le A\, j\, \|j^{-B}\|^r T_r.
\]
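For any $n \ge 4$ and $m = [n/2]$, we have by Lemma 8.4.28,
\[
\begin{aligned}
d\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr)
&\le d\Bigl(n^{-B}\sum_{i=1}^n X_i,\ n^{-B}\Bigl(\sum_{i=1}^m \theta_i + \sum_{i=m+1}^n X_i\Bigr)\Bigr)\\
&\quad + d\Bigl(n^{-B}\Bigl(\sum_{i=1}^m \theta_i + \sum_{i=m+1}^n X_i\Bigr),\ n^{-B}\sum_{i=1}^n \theta_i\Bigr)\\
&\le d\Bigl(n^{-B}\sum_{i=1}^m X_i,\ n^{-B}\sum_{i=1}^m \theta_i\Bigr)\, \mathrm{Var}\Bigl(n^{-B}\sum_{i=m+1}^n X_i,\ n^{-B}\sum_{i=m+1}^n \theta_i\Bigr)\\
&\quad + d\Bigl(n^{-B}\Bigl(\sum_{i=1}^m X_i + \sum_{i=m+1}^n \theta_i\Bigr),\ n^{-B}\sum_{i=1}^n \theta_i\Bigr)\\
&\quad + d\Bigl(n^{-B}\Bigl(\sum_{i=1}^m \theta_i + \sum_{i=m+1}^n X_i\Bigr),\ n^{-B}\sum_{i=1}^n \theta_i\Bigr)\\
&=: I_1 + I_2 + I_3.
\end{aligned}
\tag{8.5.27}
\]
By Lemma 8.4.31,
\[
\begin{aligned}
I_1 &\le d\Bigl(\Bigl(\frac{n}{m}\Bigr)^{-B}m^{-B}\sum_{i=1}^m X_i,\ \Bigl(\frac{n}{m}\Bigr)^{-B}\theta\Bigr)\, \mathrm{Var}\Bigl((n-m)^{-B}\sum_{i=m+1}^n X_i,\ \theta\Bigr)\\
&= \bigl|J_{(n/m)^B}\bigr|\, d\Bigl(m^{-B}\sum_{i=1}^m X_i,\ \theta\Bigr)\, \mathrm{Var}\Bigl((n-m)^{-B}\sum_{i=m+1}^n X_i,\ \theta\Bigr).
\end{aligned}
\tag{8.5.28}
\]
By the induction hypothesis and Lemma 8.4.32,
\[
\begin{aligned}
\bigl|J_{(n/m)^B}\bigr|\, d\Bigl(m^{-B}\sum_{i=1}^m X_i,\ \theta\Bigr) &\le C(d,B)\, A\, m\, \|m^{-B}\|^r T_r\\
&\le C(d,B)\, A\, \frac{n}{2}\, \|n^{-B}\|^r \Bigl\|\Bigl(\frac{m}{n}\Bigr)^{-B}\Bigr\|^r T_r\\
&\le \frac{1}{2}\, C(d,B)\, A\, \|3^B\|^r\, n\, \|n^{-B}\|^r T_r.
\end{aligned}
\tag{8.5.29}
\]
On the other hand, since $\nu_r \le \frac{a}{M}$, by Theorem 8.4.8,
\[
\mathrm{Var}\Bigl((n-m)^{-B}\sum_{i=m+1}^n X_i,\ \theta\Bigr) \le c\,(n-m)\,\|(n-m)^{-B}\|^r \nu_r = c\,\bigl\|(n-m)^{\frac{1}{r}I - B}\bigr\|^r \nu_r \le cM\nu_r \le \frac{1}{D}, \tag{8.5.30}
\]
where we have used $\nu_r \le \frac{1}{McD}$. Therefore, we have, by (8.5.28)–(8.5.30),
\[
I_1 \le \frac{1}{2}\, C(d,B)\, A\, \|3^B\|^r\, n\, \|n^{-B}\|^r T_r\, \frac{1}{D} \le \frac{1}{2}\, A\, n\, \|n^{-B}\|^r T_r, \tag{8.5.31}
\]
since $D \ge C(d,B)\|3^B\|^r$. Similarly, for the estimate of $I_2$,
\[
\begin{aligned}
I_2 &= d\Bigl(n^{-B}\sum_{i=1}^m X_i + \Bigl(\frac{n}{n-m}\Bigr)^{-B}\theta,\ n^{-B}\sum_{i=1}^m \theta_i + \Bigl(\frac{n}{n-m}\Bigr)^{-B}\theta\Bigr)\\
&= \bigl|J_{(n/(n-m))^B}\bigr|\, d\Bigl((n-m)^{-B}\sum_{i=1}^m X_i + \theta,\ (n-m)^{-B}\sum_{i=1}^m \theta_i + \theta\Bigr) \quad \text{[by Lemma 8.4.31]}\\
&\le C(d,B)\, \|(n-m)^{-B}\|^r\, d_r\Bigl(\sum_{i=1}^m X_i,\ \sum_{i=1}^m \theta_i\Bigr) \quad \text{[by Lemma 8.4.32]}\\
&\le C(d,B)\, \|(n-m)^{-B}\|^r\, m\, d_r(X_1, \theta)
\end{aligned}
\]
by the triangle inequality and the repeated use of the regularity of $d_r$. Hence
\[
I_2 \le \frac{1}{2}\, C(d,B)\, \|2^B\|^r\, n\, \|n^{-B}\|^r T_r. \tag{8.5.32}
\]
Finally, we have
\[
\begin{aligned}
I_3 &= d\Bigl(\Bigl(\frac{n}{m}\Bigr)^{-B}\theta + n^{-B}\sum_{i=m+1}^n X_i,\ \Bigl(\frac{n}{m}\Bigr)^{-B}\theta + n^{-B}\sum_{i=m+1}^n \theta_i\Bigr)\\
&\le C(d,B)\, d\Bigl(m^{-B}\sum_{i=m+1}^n X_i + \theta,\ m^{-B}\sum_{i=m+1}^n \theta_i + \theta\Bigr)\\
&\le C(d,B)\, \|m^{-B}\|^r (n-m)\, d_r(X_1, \theta)\\
&\le \frac{3}{5}\, C(d,B)\, \|3^B\|^r\, n\, \|n^{-B}\|^r T_r.
\end{aligned}
\tag{8.5.33}
\]
Combining the estimates for $I_1$, $I_2$, and $I_3$, we finally obtain from (8.5.27) and (8.5.31)–(8.5.33) that
\[
\begin{aligned}
d\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr) &\le I_1 + I_2 + I_3\\
&\le \Bigl(\frac{1}{2}A + \frac{1}{2}C(d,B)\|2^B\|^r + \frac{3}{5}C(d,B)\|3^B\|^r\Bigr) n\, \|n^{-B}\|^r T_r\\
&\le A\, n\, \|n^{-B}\|^r T_r
\end{aligned}
\]
by the definition of $A$. $\Box$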
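Proof of Theorem 8.4.13: We apply the "minimality" property of the $\widehat{L}_p$-metric:
\[
\widehat{L}_p\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr) \le \inf L_p\Bigl(n^{-B}\sum_{i=1}^n \widetilde{X}_i,\ n^{-B}\sum_{i=1}^n \widetilde{\theta}_i\Bigr), \tag{8.5.34}
\]
where the infimum is taken over all independent identically distributed pairs $(\widetilde{X}_i, \widetilde{\theta}_i) \stackrel{d}{\sim} (\widetilde{X}, \widetilde{\theta})$ with fixed marginals $\widetilde{X} \stackrel{d}{\sim} X$ and $\widetilde{\theta} \stackrel{d}{\sim} \theta$. The right-hand side in (8.5.34) is less than or equal to
\[
\inf\bigl\{\|n^{-B}\|^p\, n\, L_p(\widetilde{X}, \widetilde{\theta});\ \widetilde{X} \stackrel{d}{\sim} X,\ \widetilde{\theta} \stackrel{d}{\sim} \theta\bigr\} = n\, \|n^{-B}\|^p\, \widehat{L}_p(X, \theta).
\]
The bound for $\pi$ follows from Lemma 8.4.35. $\Box$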
Proof of Theorem 8.4.14: The proof is similar to that of Theorem 8.4.10 and is therefore omitted. $\Box$

Proof of Theorem 8.4.15: The proof resembles that of Theorem 8.4.4 with the replacement of the smoothing inequality for $\varrho^*$ by that for $\widehat{L}_p$ in Lemma 8.4.27 and hence is omitted. $\Box$

Proof of Theorem 8.4.16: Using the Marcinkiewicz–Zygmund inequality (see for example Kawata (1972, Theorem 13.6.1)), if $1 < p \le 2$ and $\{\xi_i\}_{i\ge 1}$ are independent random vectors with $E[\xi_i] = 0$, then
\[
E\Bigl[\Bigl\|\sum_{i=1}^n \xi_i\Bigr\|^p\Bigr] \le C_p \sum_{i=1}^n E[\|\xi_i\|^p], \tag{8.5.35}
\]
for some $C_p > 0$. Take $\xi_i = n^{-B}(\widetilde{X}_i - \widetilde{\theta}_i)$, where $\{(\widetilde{X}_i, \widetilde{\theta}_i)\}_{i\ge 1}$ are i.i.d. pairs, $\widetilde{X}_i \stackrel{d}{\sim} X$ and $\widetilde{\theta}_i \stackrel{d}{\sim} \theta$. Thus we get
\[
E\Bigl[\Bigl\|n^{-B}\sum_{i=1}^n (\widetilde{X}_i - \widetilde{\theta}_i)\Bigr\|^p\Bigr] \le C_p \sum_{i=1}^n E\bigl[\|n^{-B}(\widetilde{X}_i - \widetilde{\theta}_i)\|^p\bigr] \le C_p\, n\, \|n^{-B}\|^p\, E\bigl[\|\widetilde{X} - \widetilde{\theta}\|^p\bigr].
\]
Passing to the minimal metrics gives the necessary inequality. Finally, note that $p > \Lambda_B/\lambda_B^2$ implies $n\|n^{-B}\|^p \to 0$ as $n \to \infty$. The bound for the Prohorov metric $\pi$ comes from the inequality $\pi \le \widehat{L}_p^{\,p/(p+1)}$ for $p \ge 1$. $\Box$
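Proof of Theorem 8.4.19: We use the minimality property of $\widehat{L}_p$ to get
\[
\begin{aligned}
\bigl\{\widehat{L}_p(M_X(n), M_\theta(n))\bigr\}^p
&= \Bigl\{\widehat{L}_p\Bigl(n^{-B}\bigvee_{k=1}^n \sum_{i=1}^k X_i,\ n^{-B}\bigvee_{k=1}^n \sum_{i=1}^k \theta_i\Bigr)\Bigr\}^p\\
&\le E\Bigl[\Bigl\|n^{-B}\bigvee_{k=1}^n \sum_{i=1}^k \widetilde{X}_i - n^{-B}\bigvee_{k=1}^n \sum_{i=1}^k \widetilde{\theta}_i\Bigr\|^p\Bigr]\\
&\le \|n^{-B}\|^p\, E\Bigl[\Bigl\|\bigvee_{k=1}^n \sum_{i=1}^k \widetilde{X}_i - \bigvee_{k=1}^n \sum_{i=1}^k \widetilde{\theta}_i\Bigr\|^p\Bigr]\\
&\le C_{p,d}\, \|n^{-B}\|^p\, E\Bigl[\bigvee_{k=1}^n \Bigl\|\sum_{i=1}^k \widetilde{X}_i - \sum_{i=1}^k \widetilde{\theta}_i\Bigr\|_0^p\Bigr]
\end{aligned}
\]
(here we have used $\|\cdot\| \le \{C_{p,d}\}^{1/p}\|\cdot\|_0$ for some $C_{p,d} > 0$ and $\|\bigvee x_k - \bigvee y_k\|_0 \le \bigvee \|x_k - y_k\|_0$)
\[
\le C_{p,d}\, \|n^{-B}\|^p \sum_{k=1}^n E\Bigl[\Bigl\|\sum_{i=1}^k (\widetilde{X}_i - \widetilde{\theta}_i)\Bigr\|_0^p\Bigr].
\]
Since $E[\widetilde{X} - \widetilde{\theta}] = 0$ and $1 < p \le 2$, by (8.5.35) again the above is, for some other constant $C_{p,d} > 0$,
\[
\le C_{p,d}\, \|n^{-B}\|^p \sum_{k=1}^n \sum_{i=1}^k E\bigl[\|\widetilde{X} - \widetilde{\theta}\|^p\bigr] = C_{p,d}\, \frac{n(n+1)}{2}\, \|n^{-B}\|^p\, E\bigl[\|\widetilde{X} - \widetilde{\theta}\|^p\bigr].
\]
Passing to the minimal metric gives the necessary bound for $\widehat{L}_p(M_X(n), M_\theta(n))$.

The same argument leads to the bound for $\widehat{L}_p(m_X(n), m_\theta(n))$. Further,
\[
\begin{aligned}
\bigl\{\widehat{L}_p(a_X(n), a_\theta(n))\bigr\}^p
&\le \|n^{-B}\|^p\, E\Bigl[\Bigl|\bigvee_{k=1}^n \Bigl\|\sum_{i=1}^k \widetilde{X}_i\Bigr\| - \bigvee_{k=1}^n \Bigl\|\sum_{i=1}^k \widetilde{\theta}_i\Bigr\|\Bigr|^p\Bigr]\\
&\le \|n^{-B}\|^p\, E\Bigl[\bigvee_{k=1}^n \Bigl|\Bigl\|\sum_{i=1}^k \widetilde{X}_i\Bigr\| - \Bigl\|\sum_{i=1}^k \widetilde{\theta}_i\Bigr\|\Bigr|^p\Bigr]\\
&\le \|n^{-B}\|^p\, E\Bigl[\bigvee_{k=1}^n \Bigl\|\sum_{i=1}^k \widetilde{X}_i - \sum_{i=1}^k \widetilde{\theta}_i\Bigr\|^p\Bigr]\\
&\le C_p\, \frac{n(n+1)}{2}\, \|n^{-B}\|^p\, E\bigl[\|\widetilde{X} - \widetilde{\theta}\|^p\bigr]
\end{aligned}
\]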
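as before. Combining our estimates, we complete the proof of the theorem. $\Box$

Proof of Theorem 8.4.21: From the definition of the Zolotarev metric $\zeta_r$ and its ideality of order $r$, we get the first bound:
\[
\zeta_r\Bigl(n^{-B}\sum_{i=1}^n X_i,\ \theta\Bigr) \le \zeta_r\Bigl(n^{-B}\sum_{i=1}^n X_i,\ n^{-B}\sum_{i=1}^n \theta_i\Bigr) \le \|n^{-B}\|^r\, \zeta_r\Bigl(\sum_{i=1}^n X_i,\ \sum_{i=1}^n \theta_i\Bigr) \le n\, \|n^{-B}\|^r\, \zeta_r(X_1, \theta).
\]
Applying the universal bound for the Prohorov metric $\pi$ by $\zeta_r$, $\pi^{r+1} \le C_r^{r+1}\zeta_r$ on $\mathcal{X}(\mathbb{R}^d)$ for some $C_r > 0$ (cf. Zolotarev (1983)), we obtain the final estimate. $\Box$

Proof of Theorem 8.4.22: Let $\xi_i = n^{-B}(\widetilde{X}_i - \widetilde{\theta}_i)$, where $\widetilde{X}_i \stackrel{d}{\sim} X$ and $\widetilde{\theta}_i \stackrel{d}{\sim} \theta$, and $(\widetilde{X}_i, \widetilde{\theta}_i)$ are i.i.d. Then by the Rosenthal inequality (see, for example, Araujo and Giné (1980, p. 205)), for $2 < p < \infty$,
\[
\Bigl(E\Bigl[\Bigl\|\sum_{i=1}^n \xi_i\Bigr\|^p\Bigr]\Bigr)^{1/p} \le C(d,p)\, \max\Bigl\{\Bigl(\sum_{i=1}^n E[\|\xi_i\|^p]\Bigr)^{1/p},\ \Bigl(\sum_{i=1}^n E[\|\xi_i\|^2]\Bigr)^{1/2}\Bigr\},
\]
for some $C(d,p) > 0$. Then
\[
\begin{aligned}
L_p\Bigl(n^{-B}\sum_{i=1}^n \widetilde{X}_i,\ n^{-B}\sum_{i=1}^n \widetilde{\theta}_i\Bigr)
&\le C(d,p)\, \max\Bigl\{\Bigl(\sum_{i=1}^n E\bigl[\|n^{-B}(\widetilde{X}_i - \widetilde{\theta}_i)\|^p\bigr]\Bigr)^{1/p},\ \Bigl(\sum_{i=1}^n E\bigl[\|n^{-B}(\widetilde{X}_i - \widetilde{\theta}_i)\|^2\bigr]\Bigr)^{1/2}\Bigr\}\\
&\le C(d,p)\, \max\Bigl\{\bigl(n\|n^{-B}\|^p E[\|\widetilde{X} - \widetilde{\theta}\|^p]\bigr)^{1/p},\ \bigl(n\|n^{-B}\|^2 E[\|\widetilde{X} - \widetilde{\theta}\|^2]\bigr)^{1/2}\Bigr\}\\
&\le C(d,p)\, \max\bigl\{n^{1/p}\|n^{-B}\|\, L_p(\widetilde{X}, \widetilde{\theta}),\ n^{1/2}\|n^{-B}\|\, L_2(\widetilde{X}, \widetilde{\theta})\bigr\}\\
&\le C(d,p)\, n^{1/2}\, \|n^{-B}\|\, L_p(\widetilde{X}, \widetilde{\theta}),
\end{aligned}
\]
since $L_2 \le L_p$ and $n^{1/p} \le n^{1/2}$ for $p > 2$. The case $p = \infty$ is similar. $\Box$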
In Theorems 8.4.4, 8.4.8, 8.4.9, 8.4.10, and 8.4.14 we have assumed the finiteness of the metrics $\mu_r$, $\chi_r$, $d_r$, and $\widehat{L}_{p,r}$. Since $\chi_r \le \mu_r$, the natural question is how we can assure the finiteness of $\mu_r$, $d_r$, and $\widehat{L}_{p,r}$, which may not be easily checked just by a direct use of the definitions. The rest of this section is devoted to the construction of upper bounds for $\mu_r$, $d_r$, and $\widehat{L}_{p,r}$, where the metrics used in the upper bounds are more familiar distances in the literature.

We shall construct bounds from above for $\mu_r$, $d_r$, and $\widehat{L}_{p,r}$, using the Zolotarev $\zeta_r$-metric. Define the following probability metrics: For $X, Y \in \mathcal{X}(\mathbb{R}^d)$,
\[
\begin{aligned}
\boldsymbol{\mu}_r(X,Y) &= \sup_{T\in\mathrm{Aut}(\mathbb{R}^d)} \|T\|^{-r}\, \mathrm{Var}(TX+\theta,\ TY+\theta),\\
\mathbf{d}_r(X,Y) &= \sup_{T\in\mathrm{Aut}(\mathbb{R}^d)} \|T\|^{-r}\, d(TX+\theta,\ TY+\theta),
\end{aligned}
\]
and
\[
\widehat{\mathbf{L}}_{p,r}(X,Y) = \sup_{T\in\mathrm{Aut}(\mathbb{R}^d)} \|T\|^{-r}\, \widehat{L}_p(TX+\theta,\ TY+\theta).
\]
Clearly, $\mu_r \le \boldsymbol{\mu}_r$, $d_r \le \mathbf{d}_r$, $\widehat{L}_{p,r} \le \widehat{\mathbf{L}}_{p,r}$. In the next two theorems we are going to estimate $\boldsymbol{\mu}_r$, $\mathbf{d}_r$, and $\widehat{\mathbf{L}}_{p,r}$ from above by $\zeta_r$. Let $p_\theta(x)$ be the density function of the strictly operator-stable random vector $\theta \in \mathbb{R}^d$. For $m \in \mathbb{N}$ let
\[
C_m(\theta) := \sup_{\|h\|=1} \int_{\mathbb{R}^d} \bigl|p_\theta^{(m)}(x)(h)^m\bigr|\, dx
\]
and
\[
D_m(\theta) := \sup_{x\in\mathbb{R}^d}\ \sup_{\|h\|=1} \bigl|p_\theta^{(m)}(x)(h)^m\bigr|.
\]

Theorem 8.5.1 (i) For $m \in \mathbb{N}$,
\[
\boldsymbol{\mu}_m(X,Y) \le C_m(\theta)\, \zeta_m(X,Y).
\]
(ii) If $r = m + p$, $m \in \mathbb{N}$, $0 < p \le 1$, then
\[
\widehat{\mathbf{L}}_{p,r}(X,Y) \le C_m(\theta)\, \zeta_r(X,Y).
\]

Theorem 8.5.2
\[
\mathbf{d}_m(X,Y) \le D_m(\theta)\, \zeta_m(X,Y), \quad m = 1, 2, \ldots.
\]
Proof of Theorem 8.5.1: (i) For any $X$ and $Y$,
\[
\begin{aligned}
\mathrm{Var}(X+\theta,\ Y+\theta) &= \sup_{A\in\mathcal{B}(\mathbb{R}^d)} |P(X+\theta \in A) - P(Y+\theta \in A)|\\
&= \sup\{|E[f(X+\theta) - f(Y+\theta)]|;\ f \in \mathcal{F},\ \|f\|_\infty \le 1\}\\
&= \sup\{|E[g(X) - g(Y)]|;\ f \in \mathcal{F},\ \|f\|_\infty \le 1\},
\end{aligned}
\tag{8.5.36}
\]
where $g(x) := E[f(x+\theta)]$. Since $p_\theta(x)$ is differentiable infinitely many times (see Hudson (1980)),
\[
g(x) = \int_{\mathbb{R}^d} f(x+y)\, p_\theta(y)\, dy = \int_{\mathbb{R}^d} f(z)\, p_\theta(z-x)\, dz
\]
has derivatives of every order, and furthermore,
\[
\bigl|g^{(m)}(x)(h)^m\bigr| = \Bigl|\int_{\mathbb{R}^d} f(z)\, p_\theta^{(m)}(z-x)(h)^m\, dz\Bigr| = \Bigl|\int_{\mathbb{R}^d} f(x+y)\, p_\theta^{(m)}(y)(h)^m\, dy\Bigr|.
\]
Since for $f$ with $\|f\|_\infty \le 1$, $\sup_{x\in\mathbb{R}^d}\sup_{\|h\|=1} |g^{(m)}(x)(h)^m| \le C_m(\theta)$, we have
\[
\bigl\|g^{(m-1)}(x) - g^{(m-1)}(y)\bigr\| \le C_m(\theta)\, \|x-y\|. \tag{8.5.37}
\]
Hence by (8.5.36) and (8.5.37),
\[
\mathrm{Var}(X+\theta,\ Y+\theta) \le \sup\bigl\{|E[g(X) - g(Y)]|;\ \|g^{(m-1)}(x) - g^{(m-1)}(y)\| \le C_m(\theta)\|x-y\|\bigr\}
\]
and
\[
\begin{aligned}
\boldsymbol{\mu}_m(X,Y) &= \sup_{T\in\mathrm{Aut}(\mathbb{R}^d)} \|T\|^{-m}\, \mathrm{Var}(TX+\theta,\ TY+\theta)\\
&\le \sup_T \|T\|^{-m} \sup\bigl\{|E[g(TX) - g(TY)]|;\ \|g^{(m-1)}(x) - g^{(m-1)}(y)\| \le C_m(\theta)\|x-y\|\bigr\}.
\end{aligned}
\tag{8.5.38}
\]
Let $g_T(x) := g(Tx)$. Then
\[
g_T^{(m-1)}(x)(h)^{m-1} = g^{(m-1)}(Tx)(Th)^{m-1} \quad \text{for any } x, h \in \mathbb{R}^d,
\]
implying that
\[
\bigl\|g_T^{(m-1)}(x) - g_T^{(m-1)}(y)\bigr\| \le \|T\|^{m-1}\, \bigl\|g^{(m-1)}(Tx) - g^{(m-1)}(Ty)\bigr\|.
\]
Then the side condition in (8.5.38),
\[
\bigl\|g^{(m-1)}(x) - g^{(m-1)}(y)\bigr\| \le C_m(\theta)\, \|x-y\|,
\]
results in
\[
\bigl\|g_T^{(m-1)}(x) - g_T^{(m-1)}(y)\bigr\| \le C_m(\theta)\, \|T\|^{m-1}\, \|Tx - Ty\| \le C_m(\theta)\, \|T\|^{m}\, \|x-y\|.
\]
Consequently, by (8.5.38),
\[
\begin{aligned}
\boldsymbol{\mu}_m(X,Y) &\le \sup_T \|T\|^{-m} \sup\bigl\{|E[g_T(X) - g_T(Y)]|;\ \|g_T^{(m-1)}(x) - g_T^{(m-1)}(y)\| \le C_m(\theta)\|T\|^m\|x-y\|\bigr\}\\
&= C_m(\theta)\, \zeta_m(X,Y),
\end{aligned}
\]
as desired.
(ii) Let us now prove a similar bound for $\widehat{\mathbf{L}}_{p,r}$. Let $r = m + p$, $m \in \mathbb{N}$, $0 < p \le 1$. Then by Lemma 8.4.34 (ii),
\[
\begin{aligned}
\widehat{L}_p(X+\theta,\ Y+\theta) &= \sup\{|E[f(X+\theta) - f(Y+\theta)]|;\ f \in \mathrm{Lip}_b(p)\}\\
&= \sup\{|E[g(X) - g(Y)]|;\ f \in \mathrm{Lip}_b(p)\},
\end{aligned}
\]
where $g(x) := E[f(x+\theta)]$. Since $p_\theta(x)$ is differentiable infinitely many times, the function $g(x) = \int_{\mathbb{R}^d} f(x+y)\, p_\theta(y)\, dy = \int_{\mathbb{R}^d} f(z)\, p_\theta(z-x)\, dz$ has derivatives of all orders, and for $m \in \mathbb{N}$, $r = m + p$,
\[
\bigl|g^{(m)}(x)(h)^m\bigr| = \Bigl|\int_{\mathbb{R}^d} f(x+y)\, p_\theta^{(m)}(y)(h)^m\, dy\Bigr|.
\]
By the requirement for $f$,
\[
\begin{aligned}
\bigl\|g^{(m)}(x) - g^{(m)}(y)\bigr\| &= \sup_{\|h\|=1} \bigl|g^{(m)}(x)(h)^m - g^{(m)}(y)(h)^m\bigr|\\
&= \sup_{\|h\|=1} \Bigl|\int_{\mathbb{R}^d} [f(x+z) - f(y+z)]\, p_\theta^{(m)}(z)(h)^m\, dz\Bigr|\\
&\le \sup_{\|h\|=1} \|x-y\|^p \int_{\mathbb{R}^d} \bigl|p_\theta^{(m)}(z)(h)^m\bigr|\, dz\\
&\le \|x-y\|^p\, C_m(\theta).
\end{aligned}
\]
Therefore
\[
\widehat{L}_p(X+\theta,\ Y+\theta) \le \sup\bigl\{|E[g(X) - g(Y)]|;\ \|g^{(m)}(x) - g^{(m)}(y)\| \le C_m(\theta)\|x-y\|^p \text{ for any } x, y \in \mathbb{R}^d\bigr\}.
\]
Next consider $\widehat{\mathbf{L}}_{p,r}(X,Y)$:
\[
\begin{aligned}
\widehat{\mathbf{L}}_{p,r}(X,Y) &= \sup_{T\in\mathrm{Aut}(\mathbb{R}^d)} \|T\|^{-r}\, \widehat{L}_p(TX+\theta,\ TY+\theta)\\
&\le \sup_T \|T\|^{-r} \sup\bigl\{|E[g(TX) - g(TY)]|;\ \|g^{(m)}(x) - g^{(m)}(y)\| \le C_m(\theta)\|x-y\|^p\bigr\}.
\end{aligned}
\]
Let $g_T(x) := g(Tx)$. Then $g_T^{(m)}(x)(h)^m = g^{(m)}(Tx)(Th)^m$ for any $x, h \in \mathbb{R}^d$, implying that
\[
\bigl\|g_T^{(m)}(x) - g_T^{(m)}(y)\bigr\| \le \|T\|^m\, \bigl\|g^{(m)}(Tx) - g^{(m)}(Ty)\bigr\|.
\]
Applying $\|g^{(m)}(x) - g^{(m)}(y)\| \le C_m(\theta)\|x-y\|^p$, we get that
\[
\bigl\|g_T^{(m)}(x) - g_T^{(m)}(y)\bigr\| \le C_m(\theta)\, \|T\|^m\, \|Tx - Ty\|^p \le C_m(\theta)\, \|T\|^{m+p}\, \|x-y\|^p,
\]
and $m + p = r$ by assumption. Similarly, we get
\[
\begin{aligned}
\widehat{\mathbf{L}}_{p,r}(X,Y) &\le \sup_T \|T\|^{-r} \sup\bigl\{|E[g_T(X) - g_T(Y)]|;\ \|g_T^{(m)}(x) - g_T^{(m)}(y)\| \le C_m(\theta)\|T\|^r\|x-y\|^p\bigr\}\\
&\le C_m(\theta)\, \zeta_r(X,Y). \qquad \Box
\end{aligned}
\]
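Proof of Theorem 8.5.2: We have
\[
\begin{aligned}
\mathbf{d}_m(X,Y) &= \sup_{T\in\mathrm{Aut}(\mathbb{R}^d)} \|T\|^{-m}\, d(TX+\theta,\ TY+\theta)\\
&= \sup_T \|T\|^{-m} \sup_{x\in\mathbb{R}^d} |p_{TX+\theta}(x) - p_{TY+\theta}(x)|\\
&= \sup_T \|T\|^{-m} \sup_{x\in\mathbb{R}^d} \Bigl|\int_{\mathbb{R}^d} p_\theta(x-y)\, [P(TX \in dy) - P(TY \in dy)]\Bigr|.
\end{aligned}
\tag{8.5.39}
\]
Let
\[
g(y) = p_\theta(x-y). \tag{8.5.40}
\]
Then
\[
\sup_{y\in\mathbb{R}^d}\ \sup_{x\in\mathbb{R}^d}\ \sup_{\|h\|=1} \bigl|g^{(m)}(y)(h)^m\bigr| \le \sup_{y,x,h} \bigl|p_\theta^{(m)}(x-y)(h)^m\bigr| = D_m(\theta).
\]
Hence
\[
\bigl\|g^{(m-1)}(x) - g^{(m-1)}(y)\bigr\| \le D_m(\theta)\, \|x-y\|. \tag{8.5.41}
\]
From (8.5.39)–(8.5.41) we have
\[
\mathbf{d}_m(X,Y) \le \sup_T \|T\|^{-m} \sup\bigl\{|E[g(TX) - g(TY)]|;\ \|g^{(m-1)}(x) - g^{(m-1)}(y)\| \le D_m(\theta)\|x-y\|\bigr\}.
\]
This upper bound is the same as that of $\boldsymbol{\mu}_m(X,Y)$ in (8.5.38) if $C_m(\theta)$ is replaced by $D_m(\theta)$. Therefore the proof of Theorem 8.5.1 also implies that $\mathbf{d}_m(X,Y) \le D_m(\theta)\, \zeta_m(X,Y)$. $\Box$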
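The next question is the finiteness of $C_m(\theta)$ and $D_m(\theta)$.

Theorem 8.5.3 We have $D_m(\theta) < \infty$, $m = 1, 2, \ldots$, and if $\Lambda_B < 1$, then $C_m(\theta) < \infty$, $m = 1, 2, \ldots$.

Proof: We first show the finiteness of $D_m(\theta)$. Let $\widehat{\mu}(z)$, $z \in \mathbb{R}^d$, be the characteristic function of $\theta$. As was shown in Hudson (1980), for some $c > 0$,
\[
|\widehat{\mu}(z)| \le \exp\{-c\,\|z\|^{1/\|B\|}\} \tag{8.5.42}
\]
for every $z$ with $\|z\| > 1$. Hence, for every $m = 1, 2, \ldots$, $\int_{\mathbb{R}^d} \|z\|^m |\widehat{\mu}(z)|\, dz < \infty$, implying the existence of $p_\theta^{(m)}(x)$, and furthermore the finiteness of $D_m(\theta)$.

To prove the finiteness of $C_m(\theta)$, we assume $\Lambda_B < 1$, which implies $E[\|\theta\|] < \infty$. (See Hudson et al. (1988).) We start with Carlson's inequality for one-variable functions $f$. If $\widehat{f}$, the Fourier transform of $f$, is in $L^2(\mathbb{R})$, and $\widehat{f}'$ exists and is in $L^2(\mathbb{R})$, then
\[
\Bigl(\int_{-\infty}^{\infty} |f(x)|\, dx\Bigr)^4 \le K \Bigl(\int_{-\infty}^{\infty} |\widehat{f}(z)|^2\, dz\Bigr)\Bigl(\int_{-\infty}^{\infty} |\widehat{f}'(z)|^2\, dz\Bigr). \tag{8.5.43}
\]
A version of this inequality for several variable functions $f$ is, for each $h \in \mathbb{R}^d$,
\[
\Bigl(\int_{\mathbb{R}^d} |f(x)|\, dx\Bigr)^4 \le K \Bigl(\int_{\mathbb{R}^d} |\widehat{f}(z)|^2\, dz\Bigr)\Bigl(\int_{\mathbb{R}^d} |D\widehat{f}(z)h|^2\, dz\Bigr), \tag{8.5.44}
\]
where $D\widehat{f}(z)$ is the gradient (row) vector of $\widehat{f}(z)$. The proof of (8.5.44) can be carried out in the same manner as for (8.5.43), so we omit it. Since we are assuming $E[\|\theta\|] < \infty$, $\widehat{\mu}(z)$ is differentiable. Fix $h \in \mathbb{R}^d$ and apply (8.5.44) to
\[
f(x) := p_\theta^{(m)}(x)(h)^m, \quad x \in \mathbb{R}^d.
\]
Then
\[
\widehat{f}(z) = \int_{\mathbb{R}^d} e^{i\langle z, x\rangle}\, p_\theta^{(m)}(x)(h)^m\, dx, \quad\text{and}\quad D\widehat{f}(z)h = i\int_{\mathbb{R}^d} \langle x, h\rangle\, e^{i\langle z, x\rangle}\, p_\theta^{(m)}(x)(h)^m\, dx.
\]
Thus
\[
\int_{\mathbb{R}^d} |\widehat{f}(z)|^2\, dz \le \int_{\mathbb{R}^d} \|z\|^{2m}\, |\widehat{\mu}(z)|^2\, dz < \infty
\]
by (8.5.42), and
\[
\sup_{\|h\|=1} \int_{\mathbb{R}^d} |D\widehat{f}(z)h|^2\, dz \le \int_{\mathbb{R}^d} \|z\|^{2m}\, |D\widehat{\mu}(z)|^2\, dz =: I.
\]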
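So it remains to show the finiteness of $I$. We recall that the characteristic function $\widehat{\mu}(z)$ is given by
\[
\widehat{\mu}(z) = \exp\Bigl\{i\langle z, c\rangle + \langle z, Az\rangle + \int_S \gamma(dx) \int_0^\infty \Bigl(e^{i\langle z, s^Bx\rangle} - 1 - i\langle z, s^Bx\rangle I_Q(s^Bx)\Bigr) \frac{1}{s^2}\, ds\Bigr\}.
\]
Here $z \in \mathbb{R}^d$, $c \in \mathbb{R}^d$, $A$ is a nonnegative definite symmetric matrix, $S = \{x \in \mathbb{R}^d;\ \|x\| = 1 \text{ and } \|t^Bx\| > 1 \text{ for all } t > 1\}$, $Q = \{x \in \mathbb{R}^d;\ \|x\| \le 1\}$, and $\gamma$ is a probability measure on $S$. We write $\widehat{\mu}(z) = e^{\psi(z)}$. Since $\Lambda_B < 1$,
\[
M_1 := \int_S \gamma(dx) \int_{\{\|s^Bx\| > 1\}} \frac{\|s^Bx\|}{s^2}\, ds < \infty. \tag{8.5.45}
\]
Note that if $\gamma(S) > 0$, the non-Gaussian part exists, and the restriction of $B$ to the support of the measure $\gamma$ (we shall call it $B$ again) satisfies $\lambda_B > \frac{1}{2}$. (See Hudson and Mason (1981).) Hence we also have
\[
M_2 := \int_S \gamma(dx) \int_{\{\|s^Bx\| \le 1\}} \frac{\|s^Bx\|^2}{s^2}\, ds < \infty. \tag{8.5.46}
\]
We have, for $h \in \mathbb{R}^d$, $D\widehat{\mu}(z)h = \widehat{\mu}(z)\, D\psi(z)h$. Let $z = (z_1, \ldots, z_d)^t$, $c = (c_1, \ldots, c_d)^t$, $A = (a_{ij})$, $s^Bx = \bigl((s^Bx)_1, \ldots, (s^Bx)_d\bigr)^t$, and $h = (h_1, \ldots, h_d)^t$. Then
\[
\frac{\partial}{\partial z_j}\psi(z) = i c_j + 2(Az)_j + \int_S \gamma(dx) \int_0^\infty \Bigl(i(s^Bx)_j\, e^{i\langle z, s^Bx\rangle} - i(s^Bx)_j\, I_Q(s^Bx)\Bigr) \frac{1}{s^2}\, ds.
\]
Thus
\[
\begin{aligned}
\Bigl|\frac{\partial}{\partial z_j}\psi(z)\Bigr| &\le |c_j| + 2\|Az\| + \int_S \gamma(dx) \int_0^\infty \bigl|(s^Bx)_j\bigr|\, \bigl|e^{i\langle z, s^Bx\rangle} - I_Q(s^Bx)\bigr| \frac{1}{s^2}\, ds\\
&\le \|c\| + 2\|Az\| + \int_S \gamma(dx) \int_{\{\|s^Bx\| > 1\}} \|s^Bx\| \frac{1}{s^2}\, ds\\
&\quad + \int_S \gamma(dx) \int_{\{\|s^Bx\| \le 1\}} \|s^Bx\|\, \bigl|e^{i\langle z, s^Bx\rangle} - 1\bigr| \frac{1}{s^2}\, ds\\
&\le \|c\| + 2\|Az\| + M_1 + M_2\|z\|,
\end{aligned}
\]
where $M_1$ and $M_2$ are finite by (8.5.45) and (8.5.46). We thus finally have
\[
|D\psi(z)h| = \Bigl|\sum_{j=1}^d \frac{\partial}{\partial z_j}\psi(z)\, h_j\Bigr| \le d\,\|h\|\bigl(\|c\| + 2\|Az\| + M_1 + M_2\|z\|\bigr) \le C_1 + C_2\|z\|,
\]
and
\[
|D\widehat{\mu}(z)h| \le (C_1 + C_2\|z\|)\, |\widehat{\mu}(z)|.
\]
Hence by (8.5.42) we conclude that
\[
I = \int_{\mathbb{R}^d} \|z\|^{2m}\, |D\widehat{\mu}(z)h|^2\, dz < \infty.
\]
The proof of Theorem 8.5.3 is now complete. $\Box$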
The final question is the finiteness of $\zeta_m(X_1, \theta)$. As we have noted in (8.4.32),
\[
\zeta_m(X_1, \theta) \le \frac{1}{m!}\, \kappa_m(X_1, \theta), \quad m = 1, 2, \ldots,
\]
where $\kappa_m(X_1, \theta)$ is the difference pseudomoment, namely
\[
\kappa_m(X_1, \theta) = \sup\bigl\{|E[f(X_1) - f(\theta)]|;\ |f(x) - f(y)| \le d_m(x,y) \text{ for any } x, y \in \mathbb{R}^d\bigr\},
\]
where $d_m(x,y) = \bigl\|x\|x\|^{m-1} - y\|y\|^{m-1}\bigr\|$. It would be difficult to check conditions implying $\kappa_m(X_1, \theta) < \infty$. Instead we give an example for laws of $X_1$ with $\zeta_m(X_1, \theta) < \infty$. The idea is to use the series representation for $\theta$ (compare Remark 8.1.5). Let $\{W_j\}_{j=1}^\infty$ be a sequence of i.i.d. random variables taking their values in $S = \{x \in \mathbb{R}^d;\ \|x\| = 1\}$ with a common probability distribution $\lambda$ on $S$ and $\Gamma_j = \delta_1 + \cdots + \delta_j$. Here $\{\delta_j\}$ is a sequence of independent exponentially distributed random variables with $E[\delta_1] = 1$ that are independent of $\{W_j\}$. It is known that the series
\[
\sum_{j=1}^\infty \Bigl(\Gamma_j^{-B}W_j - E\bigl[\Gamma_j^{-B} I[\Gamma_j \ge 1]\bigr]\, E[W_j]\Bigr)
\]
converges almost surely and is distributed as an operator-stable random vector with exponent $B$. Suppose $E[W_j] = 0$ for all $j$, and set
\[
\theta \stackrel{d}{=} \sum_{j=1}^\infty \Gamma_j^{-B}W_j.
\]
Let $r > 1/\lambda_B$ and $j_0 = [r\lambda_B]$. Then it is easy to show that if $j > j_0$, then $E[\|\Gamma_j^{-B}\|^r] < \infty$. Consider another sequence of independent random variables $\{V_j\}$ on $S$, which are independent of $\{\Gamma_j\}$, such that
\[
V_j \stackrel{d}{=} W_j \quad \text{for } j \le j_0,
\]
and for $j > j_0$, $\{V_j\}$ are arbitrary but not identically distributed random variables on $S$. Define
\[
X \stackrel{d}{=} \sum_{j=1}^\infty \Gamma_j^{-B}V_j,
\]
assuming that the series converges almost surely.

Theorem 8.5.4 Suppose that all mixed moments of order less than or equal to $r - 1$ coincide; that is,
\[
E\bigl[V_1^{\alpha_1} \cdots V_p^{\alpha_p} - W_1^{\alpha_1} \cdots W_p^{\alpha_p}\bigr] = 0 \tag{8.5.47}
\]
for any $\alpha_i \ge 0$ with $\sum_{i=1}^p \alpha_i \le r - 1$. Then $\zeta_r(X, \theta) < \infty$.

Proof: By the operator-ideality of $\zeta_r$ of order $r$ and the triangle inequality,
\[
\begin{aligned}
\zeta_r(X, \theta) &= \zeta_r\Bigl(\sum_j \Gamma_j^{-B}V_j,\ \sum_j \Gamma_j^{-B}W_j\Bigr)\\
&\le \sum_{j=1}^\infty \zeta_r\bigl(\Gamma_j^{-B}V_j,\ \Gamma_j^{-B}W_j\bigr) = \sum_{j=j_0+1}^\infty \zeta_r\bigl(\Gamma_j^{-B}V_j,\ \Gamma_j^{-B}W_j\bigr)\\
&\qquad \text{(by Zolotarev (1983, Property 4 on p. 293))}\\
&\le \sum_{j=j_0+1}^\infty E\bigl[\|\Gamma_j^{-B}\|^r\bigr]\, \zeta_r(V_j, W_j)\\
&\le \sup_j \zeta_r(V_j, W_j) \sum_{j=j_0+1}^\infty E\bigl[\|\Gamma_j^{-B}\|^r\bigr].
\end{aligned}
\]
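Since $r\lambda_B > 1$, the final series converges. This is shown as follows. Note that for any $\varepsilon > 0$, there exists $C > 0$ such that
\[
\|x^{-B}\| < Cx^{-(\Lambda_B+\varepsilon)}, \quad 0 < x \le 1,
\]
and
\[
\|x^{-B}\| < Cx^{-(\lambda_B-\varepsilon)}, \quad x > 1.
\]
Thus we have for $j > j_0 = [r\lambda_B]$,
\[
\begin{aligned}
E\bigl[\|\Gamma_j^{-B}\|^r\bigr] &= E\bigl[\|\Gamma_j^{-B}\|^r I[\Gamma_j \le 1]\bigr] + E\bigl[\|\Gamma_j^{-B}\|^r I[\Gamma_j > 1]\bigr]\\
&\le C^r\Bigl(E\bigl[\Gamma_j^{-r(\Lambda_B+\varepsilon)}\bigr] + E\bigl[\Gamma_j^{-r(\lambda_B-\varepsilon)}\bigr]\Bigr).
\end{aligned}
\]
On the other hand, if $j > p$, then
\[
E\bigl[\Gamma_j^{-p}\bigr] = \frac{\Gamma(j-p)}{\Gamma(j)} \sim j^{-p}.
\]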
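The moment formula above is the standard negative-moment identity for a sum of $j$ independent unit-mean exponentials, which has the Gamma$(j, 1)$ law. The following sketch evaluates it through log-gamma functions and compares it with the asymptotic $j^{-p}$; the function name `inverse_moment` and the sample values of $j$ and $p$ are illustrative assumptions.

```python
import math


def inverse_moment(j, p):
    """E[Gamma_j^{-p}] = Gamma(j - p) / Gamma(j) for Gamma_j ~ Gamma(j, 1), valid for j > p."""
    if j <= p:
        raise ValueError("the moment is finite only for j > p")
    # Work with log-gamma to avoid overflow for large j.
    return math.exp(math.lgamma(j - p) - math.lgamma(j))


# Exact small case: Gamma(4) / Gamma(5) = 6 / 24.
print(inverse_moment(5, 1))

# The ratio to j^{-p} tends to 1 as j grows (hypothetical exponent p = 1.5).
for j in (10, 100, 10000):
    print(j, inverse_moment(j, 1.5) * j ** 1.5)
```

The second loop illustrates why the tail of the series behaves like $\sum_j j^{-p}$, so that convergence is decided by whether the exponent exceeds 1.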
Hence r −r(Λ+ε) E[Γ−B + j −r(λB +ε) ) ≤ Cj −r(λB +ε) ) j ] ≤ C(j
for large j, because ΛB ≥ λB . This shows the convergence of the series in question. Also, as we have noted in (8.4.34), under (8.5.47), ζr (Vj , Wj ) ≤
Γ(1 + α) ξr (Vj , Wj ), Γ(1 + r)
where r = m + α, m ∈ IN, α ∈ (1, 2]. In our case, Vj and Wj take their values on the unit sphere, and therefore, ξr (Vj , Wj ) = Var(Vj , Wj ) ≤ 1. This concludes the proof of the theorem.
2
8.6 Ideal Metrics in the Problem of Rounding

It is a widely accepted fact that sums of rounded proportions often fail to add to 1. In their pioneering works, Mosteller, Youtz, and Zahn (1967) and Diaconis and Freedman (1979) assessed the probability that a vector of conventionally rounded percentages adds to 100. The conventional rule picks the midpoint of each interval as the threshold for rounding. However, the goal of rounding to maximize the probability that the sum of roundings is the rounding of the sum may well not be a “significant” question: Instead, it seems that the goal of rounding to obtain a distribution as much like the original one as possible is more fundamental.

Suppose, for example, that $q_1, \ldots, q_s$ are $s$ independent identically $[0, 1]$-uniformly distributed random numbers and that each is to be rounded to either $0$, $\tfrac12$, or $1$. The usual method is to let $x = 0$ if $0 \le q < \tfrac14$, $x = \tfrac12$ if $\tfrac14 \le q < \tfrac34$, and $x = 1$ if $\tfrac34 \le q \le 1$. Then $Ex = Eq = \tfrac12$, and $\mathrm{Var}\, x = \tfrac18 \ne \mathrm{Var}\, q = \tfrac1{12}$. On the other hand, if instead of rounding “at” $\tfrac14$ and $\tfrac34$ one rounds at $\tfrac16$ and $\tfrac56$, that is, $x^* = 0$ if $0 \le q < \tfrac16$, $x^* = \tfrac12$ if $\tfrac16 \le q < \tfrac56$, and $x^* = 1$ if $\tfrac56 \le q \le 1$, then $Ex^* = Eq = \tfrac12$ and $\mathrm{Var}\, x^* = \mathrm{Var}\, q = \tfrac1{12}$. The importance of the deviations between the sums of the resulting roundings $x_S = x_1 + \cdots + x_s$ and $x_S^* = x_1^* + \cdots + x_s^*$ from $q_S = q_1 + \cdots + q_s$ may be seen by comparing the differences
\[ \Delta_s = \sup_{a<b} |P(a < q_S < b) - P(a < x_S < b)| \]
and
\[ \Delta_s^* = \sup_{a<b} |P(a < q_S < b) - P(a < x_S^* < b)|. \]
Then, by the central limit theorem,
\[ \lim_{s \to \infty} \Delta_s \ge 0.049, \]
whereas
\[ \lim_{s \to \infty} \Delta_s^* = 0, \]
showing that at least by this specific criterion, conventional rounding is not the best choice.

In this section the Diaconis and Freedman results are extended, and the motivation for changing the nature of the goal of rounding is given.(2) We define and study the properties of optimal roundings in terms of ideal metrics.

We start with some definitions from the theory of rounding (see Balinski and Young (1982)). A vector problem is a pair $(p, h)$ where $p = (p_j)$ is a vector of real numbers, $j \in N = \{1, \ldots, n\}$, and $h$ is a real number as well. Given any positive real $t > 0$, a rule of $(1/t)$-rounding is a mapping $\rho_t$ such that
\[ \rho_t(p, h) \in \{x = (x_j);\ x_j = k_j/t,\ k_j \text{ integer},\ j \in N\}. \tag{8.6.1} \]
In the sequel we write $x = \rho_t(p)$, since $h$ remains fixed and it is generally the case that $h = \sum_j p_j$. Our interest concerns problems with $p_N := \sum_N p_j = h$; for example, the $p_j$ are probabilities and $h = 1$.(3) This motivates two immediate questions: (a) Given a rule $\rho_t$, what is the chance that $x_N := \sum_N x_j = h$ for $x = \rho_t(p)$? (b) What specific rule $\rho_t$ maximizes this chance? So from now on, it is assumed that the problem $(p, h)$ satisfies $p_N = h$.

(2) The results in this section are due to Balinski and Rachev (1993).
(3) There is no loss of generality in considering $h = 1$.
The conventional rule of $(1/t)$-rounding, $x = \rho_t(p)$, with $x_j$ equal to $p_j$ rounded to the nearest $1/t$, was first discussed by Mosteller, Youtz, and Zahn (1967) for the vector problem $(p, 1)$. They computed the chance that $x_N = 1$ with this rule of rounding for several different probability models generating $p$. Diaconis and Freedman (1979) assessed the limiting probability that $x_N = 1$ as $t \to \infty$ under the assumption that $p$ is absolutely continuously distributed on the simplex
\[ S_n := \{p = (p_j) \ge 0,\ j \in N;\ p_N = 1\}. \tag{8.6.2} \]
The MYZ-rule of rounding (see Mosteller, Youtz, and Zahn (1967) and Diaconis and Freedman (1979)) is defined by
\[ x_j := [p_j]_t^{1/2} = k/t \quad \text{if } k - \tfrac12 < t p_j \le k + \tfrac12,\ k \text{ an integer}. \tag{8.6.3} \]
The MYZ-rule is only one example of an infinite class of rules of rounding first discussed by Balinski and Young (1978), called divisor rules of $(1/t)$-rounding based on $d$, $\rho_t^d$, defined by
\[ x_j := [p_j]_t^d := k/t \quad \text{if } d(k - 1) < t p_j \le d(k),\ k \text{ an integer}, \tag{8.6.4} \]
where $d(k) \in [k, k + 1]$, the divisor criterion, is any real-valued function on the integers with values in the closed interval from $k$ to $k + 1$. It is the “threshold” for rounding: Above the threshold round up, at or below it round down. The divisor rules arose as a characterization of rules satisfying certain desirable properties in the context of apportionment problems; see Balinski and Young (1982). The best known and most discussed among them are the following (for integer $k$):

Adams (or round up): $d(k) = k$; (8.6.5)
Dean (or harmonic mean): $d(k) = k(k + 1)/\bigl(k + \tfrac12\bigr)$; (8.6.6)
Hill (or geometric mean): $d(k) = \sqrt{k(k + 1)}$; (8.6.7)
Webster (or arithmetic mean, the MYZ-rule): $d(k) = k + \tfrac12$; (8.6.8)
Jefferson (or round down): $d(k) = k + 1$. (8.6.9)
Theorem 8.6.1 (Diaconis and Freedman (1979)) Suppose $p$ is absolutely continuously distributed on the simplex and $\rho_t^d$ is based on $d(k) = k + C$, $C \in [0, 1]$. Then, as $t \to \infty$,
\[ P(x_N = 1) \to P(C - 1 \le V_1 + \cdots + V_{n-1} \le C), \tag{8.6.10} \]
where the $V_i$'s are independent and uniformly distributed on $[-C, 1 - C]$.
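The limit in (8.6.10) can be evaluated exactly, which gives a quick numerical check of the role of $C$. The sketch below assumes that the $V_i$ are uniform on an interval of unit length, $[-C, 1 - C]$ (this reproduces the $[-\tfrac12, \tfrac12]$ case for $C = \tfrac12$), so that $V_1 + \cdots + V_{n-1} + (n-1)C$ has the Irwin–Hall distribution. The function names are hypothetical.

```python
from math import comb, factorial, floor


def irwin_hall_cdf(x, m):
    """CDF at x of the sum of m independent Uniform(0, 1) variables."""
    if x <= 0:
        return 0.0
    if x >= m:
        return 1.0
    total = sum((-1) ** k * comb(m, k) * (x - k) ** m
                for k in range(floor(x) + 1))
    return total / factorial(m)


def limit_probability(n, C):
    """Right-hand side of (8.6.10), assuming V_i ~ Uniform[-C, 1 - C].

    With S = V_1 + ... + V_{n-1} + (n - 1) * C (Irwin-Hall with n - 1 terms),
    the event C - 1 <= V_1 + ... + V_{n-1} <= C is n*C - 1 <= S <= n*C.
    """
    m = n - 1
    return irwin_hall_cdf(n * C, m) - irwin_hall_cdf(n * C - 1, m)


if __name__ == "__main__":
    for n in (2, 3, 4, 10):
        print(n, limit_probability(n, 0.5))
```

Scanning `limit_probability(n, C)` over a grid of $C$ values for fixed $n$ is a simple way to see that the symmetric choice $C = \tfrac12$ gives the largest limiting probability among stationary rules.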
A stationary divisor rule based on $C$ is a divisor rule based on a criterion $d(k) = k + C$, $0 \le C \le 1$. A $K$-stationary divisor rule $\rho_t^K$ based on $(C_0, \ldots, C_{K-1}, C)$, $0 \le C \le 1$, $0 \le C_i \le 1$ for all $i$, is a divisor rule based on the divisor criterion
\[ d(k) = \begin{cases} k + C_k & \text{if } 0 \le k \le K - 1, \\ k + C & \text{otherwise.} \end{cases} \tag{8.6.11} \]
From Theorem 8.6.1 it follows that among the stationary divisor rules (or 0-stationary ones) the MYZ-rule, or Webster's rule $\rho_t^w$, maximizes the probability that in the limit as $t \to \infty$ the sum of the roundings is 1. The following theorem strengthens Diaconis–Freedman's result.

Theorem 8.6.2 Suppose $p$ is uniformly distributed on the simplex $S_n$. Then the maximum of the limiting probability $\lim_{t \to \infty} P(x_N = 1)$ over the set of all $K$-stationary divisor rules is attained with $C = \tfrac12$ and $C_0, \ldots, C_{K-1}$ arbitrary.

The proof is similar to that in Theorem 8.6.1; for details we refer to Balinski and Rachev (1993). This theorem remains valid under the weaker assumption that $p$ is absolutely continuous on the simplex. However, the result suggests that maximizing the limiting probability in (8.6.10) is not in fact a reasonable objective. The rate of convergence in the best case,
\[ \lim_{t \to \infty} P(x_N = 1) = P\Bigl( -\tfrac12 < \sum_{i=1}^{n-1} V_i \le \tfrac12 \Bigr) \approx \sqrt{\frac{6}{\pi(n - 1)}} + o(n^{-1/2}), \tag{8.6.12} \]
with the $V_i$ i.i.d. uniform on $[-\tfrac12, \tfrac12]$, can be very slow. Indeed, for every $n$ and every $t$ it is possible to choose an absolutely continuous distribution $\mu_{n,t}$ on $S_n$ such that $P(x_N = 1) = 1$, whereas the right-hand side of (8.6.10) is less than 1 for $n \ge 3$ and converges to 0 as $n$ grows. To construct such a distribution choose $p_1, \ldots, p_{n-1}$ to be independent identically $(0, \delta)$-uniformly distributed random variables with $\delta \in (0, \tfrac{1}{2m})$ and $p_n = 1 - p_1 - \cdots - p_{n-1}$. This determines an absolutely continuous distribution on $S_n$ such that the Webster rule of rounding gives $x_i = 0$ for $i = 1, \ldots, n - 1$ and $x_n = 1$, with probability equal to 1. Perhaps a more serious limitation of this approach is the fact that a “best” $K$-stationary rule (“best” in the sense of maximizing the limit in (8.6.10), an example of which is the Webster rule $\rho_t^w$) may not satisfy a natural “continuity” property: If two problems $(p, 1)$ and $(p^*, 1)$ come
from distributions that are “close,” then the distributions of their roundings should be close as well. Letting $\rho\mu$ represent the distribution of the roundings under rule $\rho$ when the original data have the distribution $\mu$, “continuity” means that
\[ \mu \approx \mu^* \quad \text{implies} \quad \rho\mu \approx \rho\mu^*. \tag{8.6.13} \]
For example, the continuity property may fail if $\mu$ is an absolutely continuous distribution on $S_n$ and $\mu^*$ is a discrete distribution on $S_n$. Specifically, take $\mu^*$ to be the point distribution $p_1 = \cdots = p_{n-1} = 0$, $p_n = 1$, and $\mu^{(i)}$ to be absolutely continuous distributions satisfying $\nu_1(\mu^{(i)}, \mu^*) \to 0$ as $i \to \infty$, with $\nu_1$ a distance in the space of distributions on $S_n$ that metrizes the weak convergence. Given a similar distance $\nu_2(\rho_t^w \mu, \rho_t^w \mu^*)$ in the space of the distributions of the roundings on the lattice $\{1, 1 \pm 1/t, \ldots\}^n$, the continuity property says that as $i \to \infty$,
\[ \nu_1(\mu^{(i)}, \mu^*) \to 0 \quad \text{implies} \quad \lim_{t \to \infty} \nu_2(\rho_t^w \mu^{(i)}, \rho_t^w \mu^*) \to 0. \]
However, in our example the first sequence goes to 0, but the second is strictly positive. To see this, recall that if $\rho_t^w \mu^{(i)}$ converges weakly to $\rho_t^w \mu^*$, then the distribution of $f(x^{(i)})$ converges to that of $f(x^*)$ for any continuous function $f$ on $\mathbb{R}^n$, and in particular, the distribution of $t(x_N^{(i)} - 1)$ converges to that of $t(x_N^* - 1)$. But, by construction $t(x_N^* - 1) = 0$, whereas for any $i$ we have $t(x_N^{(i)} - 1) \to V_1 + \cdots + V_{n-1}$ as $t \to \infty$, and the last term is not 0 with probability 1. In contrast with Theorem 8.6.1, where the result is independent of the particular absolutely continuous distribution that obtains, the above remarks show that constructing a reasonable rule of rounding should depend on the information that is available concerning the distribution $\mu$.

We next consider the vector problem of rounding as a special case of the matrix problem: $(p_{ij})$, $i \in N = \{1, \ldots, n\}$, $j \in S = \{1, \ldots, s\}$, made up of $s$ observations $p^j = (p_{1j}, \ldots, p_{nj})^T$, $j \in S$ (the columns of the matrix) of the random vector $p = (p_1, \ldots, p_n)^T$, consisting of independent nonidentically distributed random variables. The $i$th row of data $p_i = (p_{i1}, \ldots, p_{is})$ consists of $s$ independent observations of $p_i$. For simplicity, set $q := (q_j)_{j \in S}$, where $q_j = (p_{ij})$, the vector problem with i.i.d. random variables. It is assumed throughout that $q_1$ has a continuous distribution. Suppose $x = (x_j)_{j \in S}$ is obtained by some $(1/t)$-rule of rounding; that is, $x_j = k_j/t$, where the $k_j$ are integer-valued i.i.d. random variables, and that $q \ge 0$, so the $x_j$'s take values on the lattice $\{0, 1/t, 2/t, \ldots\}$. The question of interest is: What is the deviation between the sum of the observations $q_S = q_1 + \cdots + q_s$ and the sum of the roundings $x_S = x_1 + \cdots + x_s$? The
answer will, of course, depend upon the distribution of the $q_j$'s, the way by which the deviation is measured, and the rule of rounding that is used. A rule of rounding $x^* = \rho_t^*(q)$ is optimal with respect to the metric $\mu$ over a class of rules $R$ if for any $q$,
\[ \mu(q_S, x_S^*) = \min_{\rho_t} \{\mu(q_S, x_S);\ x = \rho_t(q),\ \rho_t \in R\} \tag{8.6.14} \]
and
\[ \mu\Bigl( \frac1s q_S, \frac1s x_S^* \Bigr) \to 0 \quad \text{as } s \to \infty. \tag{8.6.15} \]
Roughly speaking, optimality asks that the deviation between the sum of the observations and the sum of the roundings should be as small as possible and that the deviation between their respective sample means should go to 0 as the number of observations grows. Suppose $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_n)$ are random vectors each with i.i.d. components. A variety of probability metrics $\mu$ have been proposed to measure the deviation between two such distributions (see Rachev (1991) and the previous Sections 8.1–8.5). Probably the best known is the Kolmogorov, or uniform, distance,
\[ \boldsymbol{\rho}(X, Y) := \sup_{-\infty < x < \infty} |F_X(x) - F_Y(x)|, \]
where $F_Z$ is the distribution function of $Z$. Others include the Kantorovich metric, Dall'Aglio's extension of the Kantorovich metric, and the Lévy metric. But for these metrics there exists no optimal rule of rounding because it is impossible to meet condition (8.6.15). The only type of metric that seems to be able to meet this condition is an ideal metric of order $r > 0$ (see Section 8.1) that satisfies
\[ \mu\Bigl( c \sum_{1}^{n} X_j,\ c \sum_{1}^{n} Y_j \Bigr) \le c^r \sum_{1}^{n} \mu(X_j, Y_j) \quad \text{for any } c > 0. \tag{8.6.16} \]
The class of all ideal metrics has not been characterized; indeed, only a few have been identified. An example of an ideal metric of order $r = 1 + (1/p) \in (1, 2)$, $p \ge 1$, is
\[ \theta_r(X, Y) = \sup\{|E(f(X) - f(Y))|;\ f \in F_r\}. \tag{8.6.17} \]
Here $F_r$ is the set of all functions $f$ whose second derivative $f''$ has a bounded $q$-norm: $\|f''\|_q = \bigl( \int |f''|^q \bigr)^{1/q} \le 1$, with $(1/p) + (1/q) = 1$ (see Maejima and Rachev (1987)). It is easy to check that
\[ \theta_r\Bigl( \frac1s q_S, \frac1s x_S \Bigr) = s^{-r} \theta_r(q_S, x_S) \le s^{-r} \sum_{1}^{s} \theta_r(q_j, x_j) = s^{1-r} \theta_r(q_1, x_1). \]
Thus
\[ \theta_r\Bigl( \frac1s q_S, \frac1s x_S \Bigr) = O(s^{-1/p}) \quad \text{as } s \to \infty \text{ whenever } \theta_r(q_1, x_1) < \infty. \tag{8.6.18} \]
Notice that by the definition of $\theta_r$,
\[ \theta_r(X, Y) < \infty \quad \text{implies} \quad E(X - Y) = 0. \]
In fact, $\theta_r(X, Y) \ge \sup_{a > 0} |E(aX - aY)| = +\infty$ if $E(X - Y) \ne 0$. Thus, a necessary condition for $x^* = \rho_t^*(q)$ to be an optimal stationary rule with respect to $\theta_r$ is the equality of the first moments of $q_1$ and of its $(1/t)$-rounding:
\[ Eq_1 = Ex_1^* \quad \text{or} \quad Eq_1 = \frac1t \sum_{k=0}^{\infty} P(q_1 > d(k)/t). \tag{8.6.19} \]
This condition is sufficient under the mild assumption that
\[ Eq_1^r < \infty. \tag{8.6.20} \]
In fact, (8.6.19) implies
\[ \theta_r(q_S, x_S^*) \le s\, \frac{1}{\Gamma(r - 1)}\, \kappa_r(q_1, x_1^*), \]
where $\kappa_r(X, Y) = r \int_0^{\infty} x^{r-1} |F_X(x) - F_Y(x)|\, dx$ is the Kantorovich $r$th pseudomoment, and by (8.6.20),
\[ \kappa_r(q_1, x_1^*) \le Eq_1^r + E\Bigl( q_1 + \frac1t \Bigr)^r < \infty. \]
Summarizing, this yields the following theorem.

Theorem 8.6.3 Suppose that the vector $q = (q_1, \ldots, q_s)$ consists of i.i.d. random variables, and $Eq_1^r$ is finite for some $r \in (1, 2)$. Then
\[ \theta_r(q_S, x_S) = \infty, \qquad \theta_r\Bigl( \frac1s q_S, \frac1s x_S \Bigr) = \infty \]
for any rule of $(1/t)$-rounding $x = \rho_t(q)$ with $Eq_1 \ne Ex_1$. However, if $Eq_1 = Ex_1^*$ for some stationary rule $x^* = \rho_t^*(q)$, then $\rho_t^*$ is an optimal rule with respect to $\theta_r$ over the class of all stationary rules. Moreover,
\[ \theta_r\Bigl( \frac1s q_S, \frac1s x_S^* \Bigr) \le s^{1-r} \frac{1}{\Gamma(r - 1)} \kappa_r(q_1, x_1^*) = O(s^{1-r}). \]
Corollary 8.6.4 If in Theorem 8.6.3 $\rho_t$ is stationary, then it is optimal with respect to $\theta_r$ if and only if
\[ tEq_1 = \sum_{k=0}^{\infty} P(tq_1 > k + C). \tag{8.6.21} \]
Equation (8.6.21) always has a solution $C$ that is unique for any $t > 0$ under the condition that $F_{q_1}(x)$ is strictly increasing. Thus, an optimal stationary rule with respect to $\theta_r$ exists and is unique over the set of stationary rules.

Example 8.6.5 Suppose $q_1$ is uniform on the interval $(0, 1)$. Then (8.6.21) becomes
\[ \frac12 = \sum_{k=0}^{[t - C]} \Bigl( 1 - \frac{k + C}{t} \Bigr) \frac1t. \]
If $t \in \mathbb{N} = \{1, 2, \ldots\}$, then this is equivalent to
\[ \frac12 = \sum_{k=0}^{t-1} \Bigl( 1 - \frac{k + C}{t} \Bigr) \frac1t, \]
whose solution is $C(t) = \tfrac12$, so the Webster rule is optimal. However, if $t \in \mathbb{N} + \tfrac12$, then the sum runs over $k = 0, \ldots, t - \tfrac12$ and the solution is $C(t) = (t + \tfrac14)/(2t + 1) < \tfrac12$, so the Webster rule is only asymptotically optimal.

The set of stationary rules is clearly a very restrictive class within the class of all divisor rules. Moreover, the rate of convergence in (8.6.15) for $\theta_r$ can be very slow, as can be seen from Theorem 8.6.3, where $1 - r \in (-1, 0)$. Indeed, simple examples show that the order of convergence $O(s^{1-r})$ is exact. Given two optimal rounding rules with respect to some $\mu$, $x^* = \rho_t^*(q)$ and $x = \rho_t(q)$, $\rho_t^*$ is preferred to $\rho_t$ if $\mu((1/s)q_S, (1/s)x_S^*) \to 0$ at a faster rate than $\mu((1/s)q_S, (1/s)x_S) \to 0$ as $s \to \infty$. And $\rho_t^*$ is optimal of order $\lambda > 0$ with respect to $\mu$ over a class $R$ of rules if for any $q$ it is optimal, and
\[ \mu\Bigl( \frac1s q_S, \frac1s x_S^* \Bigr) = O(s^{-\lambda}) \quad \text{as } s \to \infty. \tag{8.6.22} \]
Theorem 8.6.3 tells us that there exists an optimal stationary rule $\rho_t^*$ of order $\lambda = r - 1$ with respect to $\theta_r$ if the $r$th moment of $q_1$ exists and is finite. Is it possible that an ideal metric $\mu$ other than $\theta_r$ would determine an optimal stationary rule different from $\rho_t^*$? The answer is negative for all “nonpathological” metrics.
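For the uniform case of Example 8.6.5, the threshold $C$ solving (8.6.21) can also be found numerically. The sketch below is an illustration only: `excess` and `solve_threshold` are hypothetical names, and the bisection relies on the fact that for $q_1$ uniform on $(0, 1)$ the left-over $\sum_k P(tq_1 > k + C) - tEq_1$ is continuous and strictly decreasing in $C$ on $[0, 1]$.

```python
def excess(C, t):
    """sum_k P(t*q1 > k + C) - t*E[q1] for q1 ~ Uniform(0, 1)."""
    total, k = 0.0, 0
    while k + C < t:  # terms with k + C >= t contribute probability 0
        total += 1.0 - (k + C) / t
        k += 1
    return total - t / 2.0


def solve_threshold(t, tol=1e-12):
    """Bisection for the root C in [0, 1] of excess(C, t) = 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid, t) > 0:  # too much mass rounded up: raise threshold
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for t in (1, 2, 4, 1.5, 2.5, 10.5):
        print(t, solve_threshold(t))
```

For integer $t$ the root is $\tfrac12$ (the Webster threshold); for half-integer $t$ it falls slightly below $\tfrac12$ and approaches $\tfrac12$ as $t$ grows, in line with the "asymptotically optimal" remark of the example.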
Corollary 8.6.6 Suppose $\mu$ is an ideal metric of order $r > 1$ such that the law of large numbers holds with respect to $\mu$; that is,
\[ \mu\Bigl( \frac{X_1 + \cdots + X_n}{n},\ EX_1 \Bigr) \to 0 \quad \text{as } n \to \infty \tag{8.6.23} \]
for any nonnegative i.i.d. $X_i$'s with finite $EX_1$. Then there is a unique stationary rule of $(1/t)$-rounding that is optimal of order $r - 1$, and it is determined by the solution $C$ to (8.6.21).

Proof: Suppose two different stationary rules $x^* = \rho_t^*(q)$ and $x = \rho_t(q)$ are optimal of orders $\lambda^*$ and $\lambda$ (where $\lambda^*$ and $\lambda$ may be different). Then
\[
\mu(Ex_1^*, Ex_1) \le \mu\Bigl( Ex_1^*, \frac1s x_S^* \Bigr) + \mu\Bigl( \frac1s x_S^*, \frac1s q_S \Bigr) + \mu\Bigl( \frac1s q_S, \frac1s x_S \Bigr) + \mu\Bigl( \frac1s x_S, Ex_1 \Bigr).
\]
However, the right-hand side goes to 0 as $s \to \infty$ by (8.6.16) and (8.6.23), and therefore $Ex_1^* = Ex_1$. Since the rules are stationary, they are both determined by (8.6.4), and so they are the same. □

To obtain faster rates of convergence one must consider a wider class of divisor rules than the stationary ones. It is our objective to show that if one extends the analysis to $K$-stationary rules, then one can find an optimal rule of order $\lambda = K + 1$. To do this it is necessary to generalize the definition of $\theta_r$ (since by the previous definition $r \in (1, 2)$), which can be done as follows:
\[ \theta_r(X, Y) = \sup\{|E(f(X) - f(Y))|;\ f \in F_r\}, \tag{8.6.24} \]
where $r = r_0 + 1/p > 0$, $r_0 \ge 0$ an integer, and $p \ge 1$. Furthermore, in (8.6.24), $F_r$ is the set of all functions $f$ with $(r_0 + 1)$-derivative $f^{(r_0+1)}$ satisfying $\|f^{(r_0+1)}\|_q = \bigl[ \int |f^{(r_0+1)}|^q \bigr]^{1/q} \le 1$, with $(1/p) + (1/q) = 1$. As before, it may be checked that
\[ \theta_r\Bigl( \frac1s q_S, \frac1s x_S \Bigr) \le s^{1-r} \theta_r(q_1, x_1), \]
so that (8.6.18) holds, provided that $\theta_r(q_1, x_1) < \infty$. In addition,
\[ \theta_r(q_1, x_1) < \infty \quad \text{implies} \quad E(q_1^k - x_1^k) = 0 \quad \text{for } 0 < k \le r_0. \tag{8.6.25} \]
Conversely, if $Eq_1^r < \infty$, then
\[ E(q_1^k - x_1^k) = 0 \text{ for } 0 < k \le r_0 \quad \text{implies} \quad \theta_r(q_1, x_1) < \infty. \tag{8.6.26} \]
In fact, $\theta_r(q_1, x_1)$ can be bounded by
\[ \theta_r(q_1, x_1) \le C_r \kappa_r(q_1, x_1) \le C_r' Eq_1^r < \infty, \tag{8.6.27} \]
where $C_r$ and $C_r'$ are constants, and $\kappa_r$ is the pseudomoment of order $r$. It is possible to say more. $\theta_r(q_1, x_1) \to 0$ means that the distributions of $q_1$ and of $x_1$ “$\theta$-merge”; that is, $L(q_1, x_1) \to 0$ and $E(q_1^r - x_1^r) \to 0$, where $L$ is the Lévy metric between the distributions of $q_1$ and $x_1$ (for merging, the Lévy metric, and related concepts see D'Aristotile, Diaconis, and Freedman (1988) and Rachev (1991)). A $K$-stationary rule $x^* = \rho_t^*(q)$ of order $\lambda = r - 1$ is optimal with respect to the metric $\theta_r$ over the class of $K$-stationary rules if for any $q$,
\[ \theta_r(q_S, x_S^*) = \min_{\rho_t} \{\theta_r(q_S, x_S);\ x = \rho_t(q),\ \rho_t\ K\text{-stationary}\}, \tag{8.6.28} \]
and furthermore, the rate of $\theta_r$-merging is
\[ \theta_r\Bigl( \frac1s q_S, \frac1s x_S^* \Bigr) = O(s^{1-r}) \quad \text{as } s \to \infty \tag{8.6.29} \]
whenever $Eq_1^r < \infty$. In fact, (8.6.25) through (8.6.27) imply that if for a rule $x = \rho_t(q)$ the moment conditions $E(q_1^k - x_1^k) = 0$ do not hold for some $k = 0, \ldots, r_0$, then $\theta_r(q_1, x_1) = \infty$. So (8.6.28) fails for $s = 1$, and thus $\rho_t$ is not optimal. On the other hand, if these moment conditions do hold for all $k$ for some rule $x^* = \rho_t^*(q)$, then by (8.6.27) and the ideality of $\theta_r$, $\theta_r(q_S, x_S^*) \le s C_r' Eq_1^r < \infty$ for any fixed $s$, so (8.6.28) is fulfilled. This proves

Theorem 8.6.7 Suppose $Eq_1^r < \infty$ with $r = K + 1 + 1/p$. Then $x^* = \rho_t^*(q)$ is an optimal $K$-stationary rule of order $\lambda = r - 1$ with respect to the metric $\theta_r$ if and only if the thresholds $C_0, \ldots, C_{K-1}, C = C_K = C_{K+1} = \cdots$ are chosen such that
\[ E\bigl( q_1^j - (x_1^*)^j \bigr) = 0 \quad \text{for } j = 1, \ldots, K + 1. \tag{8.6.30} \]
This means that the $K + 1$ thresholds must be chosen such that the following system of equations is satisfied:
\[ Eq_1^j = \sum_{k=1}^{\infty} \Bigl( \frac{k}{t} \Bigr)^j P(k - 1 + C_{k-1} < tq_1 < k + C_k), \quad j = 1, \ldots, K + 1, \tag{8.6.31} \]
where $C_k = C_K$ for $k \ge K$.
It has been observed above that when $q_1$ is uniform over the interval $(0, 1)$, $t$ is an integer, and $K = 0$, then the optimal rule is Webster's. But if $K > 0$, there is no stationary rule that meets (8.6.31) even in the uniform case.

Example 8.6.8 Suppose $q_1$ is uniform over the interval $(0, 1)$, $t$ is an integer, and $K = 1$.(4) Then (8.6.31) yields the solution $C_0 = \tfrac13$, $C = (3t - 2)/(6t - 6)$, so the optimal 1-stationary rule is determined by $d(0) = \tfrac13$, $d(k) = k + (3t - 2)/(6t - 6)$ for $k \ne 0$. Suppose $t = 2$. Then $q_1$ is rounded to either $0$, $\tfrac12$, or $1$ as follows: $x_1^* = 0$ if $0 < q_1 \le \tfrac16$, $x_1^* = \tfrac12$ if $\tfrac16 < q_1 \le \tfrac56$, and $x_1^* = 1$ if $\tfrac56 < q_1 \le 1$. The first two moments of $q_1$ and $x_1^*$ agree: $Eq_1 = \tfrac12 = Ex_1^*$, $\mathrm{Var}\, q_1 = \tfrac1{12} = \mathrm{Var}\, x_1^*$. (Indeed, the third moments also agree: $Eq_1^3 = E(x_1^*)^3 = \tfrac14$.)

The Webster rule, on the other hand, rounds $q_1$ as follows: $x_1^w = 0$ if $0 < q_1 \le \tfrac14$, $x_1^w = \tfrac12$ if $\tfrac14 < q_1 \le \tfrac34$, and $x_1^w = 1$ if $\tfrac34 < q_1 \le 1$. The first moments of $q_1$ and $x_1^w$ agree but not the second: $Eq_1 = \tfrac12 = Ex_1^w$, $\mathrm{Var}\, q_1 = \tfrac1{12} < \tfrac18 = \mathrm{Var}\, x_1^w$.

Since the first two moments of $q_1$ and $x_1^*$ are equal for the optimal 1-stationary rule, the central limit theorem applies, and for the Kolmogorov metric $\boldsymbol{\rho}(q_S, x_S^*) = \sup\{|P(q_S \le x) - P(x_S^* \le x)|;\ x \in \mathbb{R}\}$ we have
\[ \boldsymbol{\rho}(q_S, x_S^*) = \boldsymbol{\rho}\Bigl( \frac{q_S - s/2}{\sqrt{s/12}},\ \frac{x_S^* - s/2}{\sqrt{s/12}} \Bigr) = O(s^{-1/2}) \quad \text{as } s \to \infty. \]
In contrast, the Webster rule gives
\[
\boldsymbol{\rho}(q_S, x_S^w) = \boldsymbol{\rho}\Bigl( \frac{q_S - s/2}{\sqrt{s/12}},\ \frac{x_S^w - s/2}{\sqrt{s/12}} \Bigr) = \boldsymbol{\rho}\Bigl( \frac{q_S - s/2}{\sqrt{s/12}},\ \sqrt{\frac{12}{8}}\, \frac{x_S^w - s/2}{\sqrt{s/8}} \Bigr) \to \boldsymbol{\rho}\bigl( N_{(0,1)}, N_{(0,\sqrt{3/2})} \bigr) > 0
\]
as $s \to \infty$, where $N_{(m,\sigma)}$ is the normal distribution with mean $m$ and standard deviation $\sigma$. In terms of the ideal metric $\theta_3$ there is even better evidence for the advantage of the optimal 1-stationary rule over the standard Webster rounding. For
\[ \theta_3\Bigl( \frac1s q_S, \frac1s x_S^* \Bigr) \le s^{-2} \theta_3(q_1, x_1^*) \le \text{constant} \cdot s^{-2}, \]
where the last inequality is due to the optimality of $x_1^*$, whereas
\[ \theta_3\Bigl( \frac1s q_S, \frac1s x_S^w \Bigr) = +\infty \quad \text{for any } s. \]

(4) For other examples including the case $K = 2$, we refer to Balinski and Rachev (1993).
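The moment claims of Example 8.6.8 can be checked with exact rational arithmetic. The sketch below is illustrative: the helper names are hypothetical, $q_1$ is assumed uniform on $(0, 1)$, and the last function evaluates the Kolmogorov distance between two centered normal laws whose variances are in the ratio $3/2$ (equivalently $2/3$, since the distance is scale invariant), which is the limit attached to Webster rounding in the example.

```python
from fractions import Fraction
from math import erf, log, sqrt


def rounding_pmf(t, d):
    """Exact distribution of the (1/t)-rounding of q ~ Uniform(0, 1).

    The value k/t receives the mass of {d(k-1) < t*q <= d(k)}, cf. (8.6.4).
    """
    pmf, lower, k = {}, Fraction(0), 0
    while lower < t:
        upper = min(Fraction(d(k)), Fraction(t))
        pmf[Fraction(k, t)] = (upper - lower) / t
        lower = upper
        k += 1
    return pmf


def moment(pmf, j):
    """j-th moment of a discrete distribution given as {value: mass}."""
    return sum(x ** j * w for x, w in pmf.items())


def optimal_1_stationary(t):
    """Criterion of Example 8.6.8: d(0) = 1/3, d(k) = k + (3t-2)/(6t-6)."""
    C = Fraction(3 * t - 2, 6 * t - 6)
    return lambda k: Fraction(1, 3) if k == 0 else k + C


def webster(k):
    return k + Fraction(1, 2)


def normal_kolmogorov(sigma):
    """Kolmogorov distance between N(0, 1) and N(0, sigma), sigma != 1."""
    x = sqrt(2 * log(sigma) / (1 - sigma ** -2))  # where the densities cross
    Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
    return abs(Phi(x) - Phi(x / sigma))


if __name__ == "__main__":
    best = rounding_pmf(2, optimal_1_stationary(2))
    print([moment(best, j) for j in (1, 2, 3)])
    print(moment(rounding_pmf(2, webster), 2))
    print(normal_kolmogorov(sqrt(1.5)))
```

The last printed value is the numerical size of the positive limit in the Webster case and agrees, to three decimals, with the bound 0.049 quoted at the beginning of the section.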
For more results concerning the vector problem with more i.i.d. observations and the problem of rounding tables we refer to Balinski and Rachev (1993).
9 Mass Transportation Problems and Recursive Stochastic Equations
In this chapter we use the regularity properties of metrics and distances defined via mass transportation problems in order to investigate the asymptotic behavior of stochastic algorithms and recursive equations. The recursive structure allows us to apply fixed-point and approximation techniques to the space of probability measures supplied with adapted probability metrics in order to describe the limiting behavior of various algorithms.
9.1 Recursive Algorithms and Contraction of Transformations

Several different approaches to the asymptotic analysis of algorithms have been given in the literature. Interesting results have been obtained by the transformation method, the method of branching processes, the method based on stochastic approximations, the martingale method, and others. The analysis of algorithms is an important application of stochastics in computer science and poses challenging questions and problems. It has led to some new developments also in stochastics. Based on the properties of minimal metrics as introduced in Chapter 8, a promising new method for asymptotic analysis has recently been introduced. Rösler (1991) gave an asymptotic analysis of the quicksort algorithm based on the minimal $\ell_p$-metric. His proof has been extended in several papers by Rachev and Rüschendorf to a general “contraction method” with
a wide range of possible applications. A series of examples and further developments of the method have been found in some recent work. The contraction method (in its basic form) uses the following sequence of steps:

1. Find the correct normalization of the algorithms. (Typically by studying the first moments or tails.)
2. Determine the recursion for the normalized algorithm.
3. Determine the limiting form of the normalized algorithms. The limiting equation typically is defined via a transformation $T$ on the set of probability measures.
4. Choose an ideal metric $\mu$ such that $T$ has good contraction properties with respect to $\mu$. This ideal metric has to reflect the structure of the algorithm. It also has to have good bounds in terms of interpretable other metrics and has to allow one to estimate bounds (in terms of moments usually). As a consequence one obtains
5. The conjectured limiting distribution is the unique fixed point of $T$. Finally, one should ensure that the recursion is stable enough so that the contraction in the limit can be pulled over to establish contraction properties of the recursion itself for $n \to \infty$. This is the technically most involved step in the analysis.
6. Establish convergence of the algorithms to the fixed point.

Applications of this method have been given to several sorting algorithms, to the communication resolution interval (CRI) algorithm, to generalized branching-type algorithms, to bootstrap estimators, iterated function systems, learning algorithms, and others. For several examples modifications of this method have been considered. There are examples where the contraction factors converge to one. In several cases there is a trivial limiting recursion that gives no clue to a possible limit distribution. Also, logarithmic normalizations and convergence rates have to be handled by special considerations. We begin with a discussion of contraction properties of transformations $T$ on the set of probability distributions on a basic space $U$.
Stochastic algorithms are typically directly described by the iterates of a transformation T on the set of all probability distributions on the basic space U , or else they are asymptotically closely related to the iterations (and thereby to the fixed points) of a transformation T that describes the limiting equation. They can, for example, differ from iterations by a stochastic sequence converging to zero. Examples for recursive algorithms
that are asymptotically related to iterations of transformations $T$ are studied in Section 9.2. Consider a contraction transformation $T : M^1(U) \to M^1(U)$, where $M^1(U)$ is the set of probability measures on $U$ supplied with probability metrics as in Chapter 8. Applying the fixed-point theorems for complete, separable metric spaces, we can infer the convergence of the iterates $(T^n F)$, $F \in M^1(U)$, to a fixed point of $T$. Some of the following examples serve to describe the influence of the choice of the metrics, while others indicate the range of applicability to different fields.

(a) Consider at first a transformation of the form
\[ TF \stackrel{d}{=} \sum_{i=1}^{N} a_i(\tau) Y_i + C(\tau). \tag{9.1.1} \]
Here, for $F \in M^1(U)$, $(Y_i)_{1 \le i \le N}$ is an i.i.d. sequence with distribution $F$. Furthermore, $C(\tau)$, $a_i(\tau)$, $\tau$ are real random variables independent of $(Y_i)$, and finally, $TF$ is the law of $\sum_{i=1}^{N} a_i(\tau) Y_i + C(\tau)$. Consider Zolotarev's ideal metric $\zeta_r$ of order $r > 0$: For $r = m + \alpha$, $m \in \mathbb{N}$, and $0 < \alpha \le 1$,
\[ \zeta_r(X, Y) := \sup\bigl\{ |E(f(X) - f(Y))|;\ \|f^{(m)}(x) - f^{(m)}(y)\| \le \|x - y\|^{\alpha} \bigr\}. \tag{9.1.2} \]
Here $f^{(m)}(x)$ denotes the Fréchet derivative of order $m$, and $\|\cdot\|$ is a norm on $U$. Next, suppose that $F, G \in M^1(U)$, where $(U, \|\cdot\|)$ is a Banach space.

Proposition 9.1.1
\[ \zeta_r(TF, TG) \le \Bigl( \sum_{i=1}^{N} E|a_i(\tau)|^r \Bigr) \zeta_r(F, G). \tag{9.1.3} \]
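Proof: The proof of (9.1.3) uses the ideality properties of $\zeta_r$; that is,
\[ \text{(i)} \quad \zeta_r(X + Z, Y + Z) \le \zeta_r(X, Y) \quad \text{for } Z \text{ independent of } X, Y, \tag{9.1.4} \]
and
\[ \text{(ii)} \quad \zeta_r(cX, cY) = |c|^r \zeta_r(X, Y) \quad \text{for all } c \in \mathbb{R}. \tag{9.1.5} \]
Let $Y_i \stackrel{d}{=} F$, $Z_i \stackrel{d}{=} G$ be independent r.v.s. Then, with $r = m + \alpha$,
\[
\begin{aligned}
\zeta_r(TF, TG) &= \sup\Bigl\{ \Bigl| Ef\Bigl( \sum a_i(\tau) Y_i + C(\tau) \Bigr) - Ef\Bigl( \sum a_i(\tau) Z_i + C(\tau) \Bigr) \Bigr|;\ |f^{(m)}(x) - f^{(m)}(y)| \le |x - y|^{\alpha} \Bigr\} \\
&\le \int \zeta_r\Bigl( \sum a_i(t) Y_i + C(t),\ \sum a_i(t) Z_i + C(t) \Bigr)\, dP^{\tau}(t) \\
&\le \int \zeta_r\Bigl( \sum a_i(t) Y_i,\ \sum a_i(t) Z_i \Bigr)\, dP^{\tau}(t) \\
&\le \sum \int |a_i(t)|^r\, dP^{\tau}(t)\, \zeta_r(Y_i, Z_i) = \sum E|a_i(\tau)|^r\, \zeta_r(F, G),
\end{aligned}
\]
which proves (9.1.3). □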
For the contraction property to hold it suffices to require that
\[ \sum_{i=1}^{N} E|a_i(\tau)|^r < 1 \quad \text{and} \quad \zeta_r(F, G) < \infty. \tag{9.1.6} \]
In some cases, the last condition can be established by making use of the inequality
\[ \zeta_r \le \frac{\Gamma(1 + \alpha)}{\Gamma(1 + r)}\, v_r, \tag{9.1.7} \]
where
\[ v_r(X, Y) := \int \|x\|^r\, d|P_X - P_Y|(x) \]
is the absolute pseudomoment of order $r$. For random vectors $X$ and $Y$, and $m \le r < m + 1$, (9.1.7) requires that all moments of $X$ and $Y$ of order $\le m$ coincide. Recall that the minimal $L_p$-metrics $\ell_p$ are ideal of order $r = \min(1, p)$.

Proposition 9.1.2 For $F, G \in M^1(U)$,
\[ \ell_p(TF, TG) \le \Bigl( \sum_{i=1}^{N} E|a_i(\tau)|^r \Bigr) \ell_p(F, G). \tag{9.1.8} \]
Proof: Let $Y_i \stackrel{d}{=} F$, $Z_i \stackrel{d}{=} G$, $1 \le i \le N$, be independent pairs of random variables with laws $F$, $G$, and $L_p(Y_i, Z_i) = \ell_p(F, G)$. Then
\[
\begin{aligned}
\ell_p(TF, TG) &= \ell_p\Bigl( \sum a_i(\tau) Y_i + C(\tau),\ \sum a_i(\tau) Z_i + C(\tau) \Bigr) \\
&\le \Bigl\| \sum a_i(\tau) Y_i - \sum a_i(\tau) Z_i \Bigr\|_p \\
&\le \sum_{i=1}^{N} E|a_i(\tau)|^r\, \|Y_i - Z_i\|_p = \Bigl( \sum_{i=1}^{N} E|a_i(\tau)|^r \Bigr) \ell_p(F, G).
\end{aligned}
\]
□

So, the contraction property will hold if
\[ \sum_{i=1}^{N} E|a_i(\tau)|^r < 1 \quad \text{and} \quad \ell_p(F, G) < \infty. \tag{9.1.9} \]
Under additional assumptions we may improve (9.1.9). Let $U$ be a Hilbert space and let $F$, $G$ have identical first moments. Then the following is a refinement of Proposition 9.1.2.

Proposition 9.1.3
\[ \ell_2(TF, TG) \le \Bigl( \sum_{i=1}^{N} E|a_i(\tau)|^2 \Bigr)^{1/2} \ell_2(F, G). \tag{9.1.10} \]
Proof: With $Y_i$, $Z_i$ as in the proof of (9.1.8), we have
\[
\ell_2^2(TF, TG) \le \Bigl\| \sum a_i(\tau)(Y_i - Z_i) \Bigr\|_2^2 = \sum_{i=1}^{N} E|a_i(\tau)|^2\, \|Y_i - Z_i\|_2^2 = \Bigl( \sum E|a_i(\tau)|^2 \Bigr) \ell_2^2(F, G).
\]
□

If the Banach space $U$ is of type $p$, $1 \le p \le 2$, and $F$, $G$ have identical first moments (more precisely, $E(Y - Z) = 0$ for $Y \stackrel{d}{=} F$, $Z \stackrel{d}{=} G$), then for $1 \le p \le 2$,
\[ \ell_p(TF, TG) \le B_p^{1/p} \Bigl( \sum_{i=1}^{N} E|a_i(\tau)|^p \Bigr)^{1/p} \ell_p(F, G). \tag{9.1.11} \]
Here $B_p$ is the constant arising in the Woyczinski inequality (cf. Rachev and Rüschendorf (1992a)). For $U = L^p(\mu)$ ($L^p(\mu)$ is the space of all r.v.s $X$ with finite $\int |X|^p\, d\mu$), one can choose the constants $B_1 = 1$, $B_p = 18p^{3/2}/(p - 1)^{1/2}$ for $1 < p \le 2$. The proof of (9.1.11) is similar to that of (9.1.10), but in (9.1.11) we use the Woyczinski inequality instead of the Hilbert space structure. If the underlying space is Euclidean, we can derive similar contraction properties with respect to other metrics defined in Chapter 8.

Example 9.1.4 Let $N = 2$, let $\tau$ be uniformly distributed on $(0, 1)$, and $a_1(\tau) = \tau$, $a_2(\tau) = 1 - \tau$. Then the contraction factor $\alpha$ with respect to several metrics is given in the following list:
\[
\begin{aligned}
\ell_1\text{-metric}: \quad \alpha &= E\tau + E(1 - \tau) = 1, \quad \text{i.e., “no contraction”;} \\
\ell_2\text{-metric}: \quad \alpha &= (E\tau^2 + E(1 - \tau)^2)^{1/2} = \sqrt{2/3}; \\
\zeta_2\text{-metric}: \quad \alpha &= E\tau^2 + E(1 - \tau)^2 = 2/3; \\
\zeta_3\text{-metric}: \quad \alpha &= E\tau^3 + E(1 - \tau)^3 = 1/2.
\end{aligned}
\tag{9.1.12}
\]
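The contraction factors of Example 9.1.4 can be reproduced numerically from Propositions 9.1.1 through 9.1.3. The sketch below is illustrative only: `expect` and `contraction_factors` are hypothetical names, and expectations over $\tau \sim$ Uniform$(0, 1)$ are approximated by a midpoint rule rather than computed in closed form.

```python
def expect(g, n=100_000):
    """Midpoint-rule approximation of E[g(tau)] for tau ~ Uniform(0, 1)."""
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) for i in range(n))


def contraction_factors(a1=lambda u: u, a2=lambda u: 1.0 - u):
    """Contraction factors for T F = a1(tau) Y1 + a2(tau) Y2 + C(tau).

    l1:     sum_i E|a_i|      (Proposition 9.1.2 with r = 1)
    l2:     (sum_i E|a_i|^2)^(1/2)  (Proposition 9.1.3)
    zeta_r: sum_i E|a_i|^r    (Proposition 9.1.1)
    """
    def m(r):
        return expect(lambda u: abs(a1(u)) ** r + abs(a2(u)) ** r)

    return {"l1": m(1), "l2": m(2) ** 0.5, "zeta2": m(2), "zeta3": m(3)}


if __name__ == "__main__":
    print(contraction_factors())
    # Square-root weights: the l2 and zeta_2 factors equal 1 (no contraction),
    # while zeta_3 still contracts.
    print(contraction_factors(lambda u: u ** 0.5, lambda u: (1.0 - u) ** 0.5))
```

Passing other weight functions `a1`, `a2` gives a quick way to see which metric, if any, yields a factor below 1 for a given recursion.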
Clearly, if for a probability metric $\mu$ on $M^1(U)$ the contraction factor is $\alpha < 1$, $\mu(TF, TG) \le \alpha \mu(F, G)$, then
\[ \mu(T^{n+1} F, T^n F) \le \alpha^n \mu(TF, F); \tag{9.1.13} \]
i.e., one obtains an exponential convergence rate to a fixed point. In the example above we consider the recursion $X_{n+1} \stackrel{d}{=} \tau_n X_n + (1 - \tau_n) \overline{X}_n + C(\tau_n)$, $\tau_n \stackrel{d}{=} \tau$, where $\overline{X}_n \stackrel{d}{=} X_n$, and $\tau_n$, $X_n$, $\overline{X}_n$ are independent. The corresponding fixed point equation is $X \stackrel{d}{=} \tau X + (1 - \tau) \overline{X} + C(\tau)$. So under the condition of equal first moments, the convergence rate is $(2/3)^n$ for the “ideal” metric $\zeta_2$, in comparison to $(2/3)^{n/2}$ for the $\ell_2$-metric.

If $a_1(\tau) = \sqrt{\tau}$, $a_2(\tau) = \sqrt{1 - \tau}$, then with respect to the $\ell_2$-metric the contraction factor is $\alpha = E\tau + E(1 - \tau) = 1$; i.e., there is no contraction. The same “no contraction” property is valid for $\zeta_2$. For $\zeta_3$ the contraction factor is $\alpha = E\tau^{3/2} + E(1 - \tau)^{3/2} = \tfrac45 < 1$; so the contraction property holds if $\zeta_3(F, G) < \infty$.

(b) We next consider the transformation
\[ TF \stackrel{d}{=} \max_{1 \le i \le N} \{a_i(\tau) Y_i\}. \tag{9.1.14} \]
For $U = \mathbb{R}^k$, $k = 1, 2, \ldots, \infty$, $F \in M^1(U)$, let $(Y_i)_{1 \le i \le N}$, $\tau$ be as in (a). Let $a_i(\tau) \ge 0$ and consider
\[ TF \stackrel{d}{=} \max_{1 \le i \le N} \{a_i(\tau) Y_i\}. \tag{9.1.15} \]
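We shall study the contraction property of $T$ by making use of the weighted uniform metric $\boldsymbol{\rho}_r$:
\[ \boldsymbol{\rho}_r(X, Y) = \sup_{x \in \mathbb{R}^k} M(x)^r |F_X(x) - F_Y(x)|, \tag{9.1.16} \]
where $M(x) := \min_{i \le k} |x_i|$. In the next proposition we use the fact that $\boldsymbol{\rho}_r$ is an ideal metric of order $r$ with respect to the maxima of i.i.d. r.v.s.

Proposition 9.1.5
\[ \boldsymbol{\rho}_r(TF, TG) \le \Bigl( \sum_{i} E(a_i(\tau))^r \Bigr) \boldsymbol{\rho}_r(F, G). \tag{9.1.17} \]
Proof: Let $(Z_i)$ be i.i.d. and $Z_1 \stackrel{d}{=} G$. Then using the max-ideality of $\boldsymbol{\rho}_r$, we have
\[
\begin{aligned}
\boldsymbol{\rho}_r(TF, TG) &= \boldsymbol{\rho}_r\Bigl( \max_{i \le N} \{a_i(\tau) Y_i\},\ \max_{i \le N} \{a_i(\tau) Z_i\} \Bigr) \\
&= \sup_{x} M(x)^r \Bigl| \int \bigl( F_{\max\{a_i(t) Y_i\}}(x) - F_{\max\{a_i(t) Z_i\}}(x) \bigr)\, dP^{\tau}(t) \Bigr| \\
&\le \int \boldsymbol{\rho}_r\bigl( \max\{a_i(t) Y_i\},\ \max\{a_i(t) Z_i\} \bigr)\, dP^{\tau}(t) \\
&\le \sum_{i} \int a_i(t)^r\, dP^{\tau}(t)\, \boldsymbol{\rho}_r(Y_i, Z_i) = \Bigl( \sum_{i} Ea_i(\tau)^r \Bigr) \boldsymbol{\rho}_r(Y_1, Z_1).
\end{aligned}
\]
□

For more general maxima we can again use the $\ell_p$-metrics. Let $U = L^{\lambda}(\mu)$, $1 \le \lambda < \infty$. For $F \in M^1(U)$ and $a_i(\tau) \ge 0$ consider
\[ TF \stackrel{d}{=} \max_{1 \le i \le N} a_i(\tau) Y_i + C(\tau). \tag{9.1.18} \]
Here $(Y_i)$ are i.i.d., $Y_1 \stackrel{d}{=} F$, $\tau$ is independent of $(Y_i)$, and $C(\tau)$ has values in $U$. For any $p$, $\lambda$ we have
\[ \ell_p(TF, TG) \le \Bigl( \sum_{i=1}^{N} Ea_i(\tau)^r \Bigr) \ell_p(F, G), \quad r = \min(1, p). \tag{9.1.19} \]
For $1 \le p \le \lambda < \infty$ we have the following improvement:

Proposition 9.1.6 If $1 \le p \le \lambda < \infty$, then
\[ \ell_p(TF, TG) \le \sum_{i=1}^{N} (Ea_i(\tau)^p)^{1/p}\, \ell_p(F, G). \tag{9.1.20} \]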
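Proof: Let $Y_i \stackrel{d}{=} F$, $Z_i \stackrel{d}{=} G$ satisfy $E\|Y_i - Z_i\|^p_{\lambda,\mu} = \ell_p(F, G)^p$, where $\|X\|_{\lambda,\mu} = \big(\int |X(t)|^\lambda\,d\mu(t)\big)^{1/\lambda}$. Then

$$\begin{aligned}
\ell_p(TF, TG) &\le \big(E\|\max_i a_i(\tau)Y_i - \max_i a_i(\tau)Z_i\|^p_{\lambda,\mu}\big)^{1/p} \\
&= \Big(\int E\Big(\int |\max_i a_i(t)Y_i(s) - \max_i a_i(t)Z_i(s)|^\lambda\,d\mu(s)\Big)^{p/\lambda} dP^\tau(t)\Big)^{1/p} \\
&\le \Big(\int E\Big(\sum_i \int a_i(t)^\lambda |Y_i(s) - Z_i(s)|^\lambda\,d\mu(s)\Big)^{p/\lambda} dP^\tau(t)\Big)^{1/p} \\
&\le \Big(\int E\sum_i \Big[\int a_i(t)^\lambda |Y_i(s) - Z_i(s)|^\lambda\,d\mu(s)\Big]^{p/\lambda} dP^\tau(t)\Big)^{1/p} \quad (\text{since } p/\lambda \le 1) \\
&= \Big(\sum_i E a_i(\tau)^p\,E\|Y_i - Z_i\|^p_{\lambda,\mu}\Big)^{1/p} \\
&= \Big(\sum_i E a_i(\tau)^p\Big)^{1/p}\ell_p(F, G).
\end{aligned}$$

□

(c) Bootstrap Estimators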
For a separable Banach space $U$ and $F \in M^1(U)$, let $\mu(F) = \int x\,dF(x)$, $\mu_n(F) = \frac{1}{n}\sum_{i=1}^n X_i$, where $(X_i)$ are i.i.d., $X_i \stackrel{d}{=} F$, and $F_n$ is the empirical measure of $X_1, \dots, X_n$. For $p > 0$ denote by $\Gamma_p$ the class of distributions with finite $p$th moment. From the strong law of large numbers, for any $F \in \Gamma_p$ we then obtain (cf. Chapter 8)

$$\ell_p(F_n, F) \to 0 \ \text{a.s.} \quad\text{and}\quad E\ell_p(F_n, F) \to 0. \tag{9.1.21}$$

Let now $X_1^*, \dots, X_m^*$ be a bootstrap sample; i.e., the $(X_i^*)$ are i.i.d., $X_1^* \stackrel{d}{=} F_n$ (conditionally on $X_1, \dots, X_n$), and $F^*_{n,m}$ is the empirical distribution of $X_1^*, \dots, X_m^*$, $m = m(n)$. The condition $\ell_p(F^*_{n,m}, F) \to 0$ a.s. (conditionally on $X$) is equivalent to the joint convergence

$$\frac{1}{m}\sum_{i=1}^m f(X_i^*) \to \int f\,dF \ \text{a.s.}, \quad f \in C_b(U),$$

$$\frac{1}{m}\sum_{i=1}^m d^p(X_i^*, a) \to \int d^p(x, a)\,dF(x) \ \text{a.s.}$$
(cf. Chapter 8), representing a special form of the SLLN for real-valued r.v.s. In the case $(U, d) = (\mathbb{R}^r, \|\cdot\|)$, $p > 1$, we are able to obtain a rate of convergence for the bootstrap approximation. Let $\gamma = kr/[(k - r)(k - 2)]$, $k > r$, $k > 2$, and $\int \|x\|^\gamma F(dx) < \infty$. Then

$$E\ell_p^p(F_n, F) \le C(r, k, p)\,n^{-(1-\frac{1}{p})/k} \tag{9.1.22}$$

(cf. Rachev (1984d, pp. 667–668)), and thus

$$\begin{aligned}
E\ell_p^p(F^*_{n,m}, F) &\le 2^p\,E\,E_{X_1,\dots,X_n}\ell_p^p(F^*_{n,m}, F_n) + 2^p C(r, k, p)\,n^{-(1-\frac{1}{p})/k} \\
&\le C^*(r, k, p)\big(m^{-(1-\frac{1}{p})/k} + n^{-(1-\frac{1}{p})/k}\big).
\end{aligned} \tag{9.1.23}$$
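To make the bootstrap distance concrete, here is a minimal sketch for the real line only. It assumes two empirical measures with the same number of atoms and $p \ge 1$, in which case the minimal $L_p$ distance is attained by pairing the sorted samples. The names `lp_distance` and `bootstrap_sample` are hypothetical helpers introduced for illustration, not notation from the text.

```python
import random


def lp_distance(xs, ys, p=2.0):
    """Minimal L_p distance between two empirical measures on the real line.

    Assumption of this sketch: both samples have the same size and p >= 1,
    so the optimal coupling pairs the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if p < 1:
        raise ValueError("sorted coupling is only optimal for p >= 1")
    xs, ys = sorted(xs), sorted(ys)
    mean_cost = sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)
    return mean_cost ** (1.0 / p)


def bootstrap_sample(sample, m, rng):
    """Draw m i.i.d. points from the empirical measure of `sample`."""
    return [rng.choice(sample) for _ in range(m)]


rng = random.Random(0)
n = 2000
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # X_1, ..., X_n
boot = bootstrap_sample(sample, n, rng)            # X*_1, ..., X*_m with m = n
dist = lp_distance(boot, sample, p=2.0)            # distance between F*_{n,m} and F_n
print(dist)
```

The printed value is the distance between the bootstrap empirical measure and the original empirical measure, i.e., the first term on the right-hand side of (9.1.23) for one realization.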
If, however, the $(X_i)$ are in the domain of an $\alpha$-stable distribution, then it is more natural to choose the bootstrap estimator from a distribution $\tilde F_n$ which has a tail behavior similar to $F$. Let then $\tilde X_1, \dots, \tilde X_n$ be a bootstrap sample with adapted tail behavior such that $\ell_p(\tilde F_n, F) \to 0$. Consider

$$\tilde V_n = \frac{1}{n^{1/\alpha}}\sum_{i=1}^n (\tilde X_i - E_{\tilde F_n}\tilde X_i) \stackrel{d}{=} T_n(\tilde F_n) \qquad (1 \le \alpha \le 2) \tag{9.1.24}$$

as a bootstrap estimator of $V_n = \frac{1}{n^{1/\alpha}}\sum_{i=1}^n (X_i - E_F X_i) \stackrel{d}{=} T_n(F)$. Then for a Banach space of type $p$, $1 \le p \le 2$ and $1 \le \alpha \le p$, it follows from (9.1.11) that

$$\begin{aligned}
\ell_p(\tilde V_n, V_n) &\le B_p\,n^{\frac{1}{p}-\frac{1}{\alpha}}\,\ell_p(\tilde X_1 - E\tilde X_1, X_1 - E_F X_1) \\
&\le B_p\,n^{\frac{1}{p}-\frac{1}{\alpha}}\big(\ell_p(\tilde F_n, F) + |E\tilde X_1 - E_F X_1|\big) \to 0.
\end{aligned} \tag{9.1.25}$$

Since $\ell_p(V_n, Y_{(\alpha)}) \to 0$, where $Y_{(\alpha)}$ is an $\alpha$-stable r.v., in the case $\ell_p(X_1, Y_{(\alpha)}) < \infty$ it follows from the bound in (9.1.25) that

$$\ell_p(\tilde V_n, Y_{(\alpha)}) \to 0. \tag{9.1.26}$$

Moreover, the rate of convergence is of order $o(n^{1/p-1/\alpha})$. In the case $p = 2$ and $F \in \Gamma_2$, the condition (9.1.24) is satisfied for $\tilde F_n = F_n^*$; for Euclidean spaces this case has been considered by Bickel and Freedman (1981). Their investigation of more general functionals on the set of empirical measures can also be extended to the setting we described.

(d) Transformation by Markov Kernels, Image Encoding

Let $(U, d)$ be a separable metric space and let $w_i : U \to U$, $1 \le i \le N$, be mappings satisfying

$$d(w_i x, w_i y) \le s_i\,d(x, y). \tag{9.1.27}$$
Given a probability distribution $(p_i)_{1\le i\le N}$ define the Markov kernel

$$K(x, \cdot) = \sum_{i=1}^N p_i\,\varepsilon_{w_i(x)}, \qquad N \le \infty. \tag{9.1.28}$$

The implied transformation on $M^1(U)$ is denoted by

$$TF = KF, \quad\text{where } KF(A) = \int K(x, A)\,F(dx). \tag{9.1.29}$$

Let now $\mathrm{Lip}(U)$ be the set of Lipschitz functions, $|f(x) - f(y)| \le d(x, y)$ for all $x, y \in U$. Then for $Kf(x) = \sum p_i\,f \circ w_i(x)$, we have

$$\begin{aligned}
|Kf(x) - Kf(y)| &\le \sum p_i\,|f \circ w_i(x) - f \circ w_i(y)| \\
&\le \sum p_i\,d(w_i(x), w_i(y)) \le \Big(\sum p_i s_i\Big)d(x, y).
\end{aligned} \tag{9.1.30}$$

Let us look at the contraction properties for the mapping $T$ with respect to the Kantorovich metric

$$\mu_L(F, G) = \sup\{|E(f(X) - f(Y))|;\ X \stackrel{d}{=} F,\ Y \stackrel{d}{=} G,\ f \in \mathrm{Lip}(U)\}. \tag{9.1.31}$$

We have then

$$\begin{aligned}
\mu_L(TF, TG) &= \sup\{|E(f(K(X, \cdot)) - f(K(Y, \cdot)))|;\ f \in \mathrm{Lip}(U)\} \\
&= \sup\{|E(Kf(X) - Kf(Y))|;\ f \in \mathrm{Lip}(U)\} \\
&\le \sum p_i s_i\,\sup\{|E(g(X) - g(Y))|;\ g \in \mathrm{Lip}(U)\} \\
&= \sum p_i s_i\,\mu_L(F, G).
\end{aligned} \tag{9.1.32}$$

If $\sum p_i s_i < 1$, then $T$ is a contractive mapping. By the Kantorovich–Rubinstein theorem $\mu_L$ coincides with the minimal $L_1$-metric $\ell_1$, and therefore,

$$\ell_1(TF, TG) \le \sum p_i s_i\,\ell_1(F, G). \tag{9.1.33}$$

Moreover, for any $p > 0$, we can extend this result as follows.

Proposition 9.1.7

$$\ell_p(TF, TG) \le \Big(\sum p_i s_i^p\Big)^{1/p\,\wedge\,1}\ell_p(F, G).$$

Proof: Suppose $X \stackrel{d}{=} F$, $Y \stackrel{d}{=} G$ satisfy

$$\ell_p(F, G) = \begin{cases} (E\,d(X, Y)^p)^{1/p}, & \text{if } p \ge 1, \\ E\,d(X, Y)^p, & \text{if } p < 1. \end{cases} \tag{9.1.34}$$
Take $I$ to be a random variable with values in $\{1, 2, \dots, N\}$ and distribution $(p_i)$ that is independent of $X$, $Y$. Then for $1 \le p$,

$$\begin{aligned}
\ell_p^p(TF, TG) &\le E\,d(w_I(X), w_I(Y))^p \\
&= \sum_{i=1}^N E\big(d(w_i(X), w_i(Y))^p\,\mathbf{1}(I = i)\big) \\
&= \sum_{i=1}^N p_i\,E\,d(w_i(X), w_i(Y))^p \\
&\le \Big(\sum_{i=1}^N p_i s_i^p\Big)E\,d(X, Y)^p = \Big(\sum_{i=1}^N p_i s_i^p\Big)\ell_p^p(F, G).
\end{aligned} \tag{9.1.35}$$

□

The proof for the case $0 < p < 1$ is similar.
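The contraction bound (9.1.33) can be checked numerically on the real line. The following sketch makes several assumptions that are not in the text: the maps are affine, $w_i(x) = s_i x + t_i$, the measures are finitely supported, and the Kantorovich distance is computed as the integral of the absolute difference of the distribution functions. The helper names `push_forward` and `w1` are hypothetical.

```python
def push_forward(atoms, weights, maps, probs):
    """Apply T F = sum_i p_i (F o w_i^{-1}) to a discrete measure on the line."""
    new_atoms, new_weights = [], []
    for (s, t), p in zip(maps, probs):
        for x, q in zip(atoms, weights):
            new_atoms.append(s * x + t)   # w_i(x) = s_i * x + t_i (assumed affine)
            new_weights.append(p * q)
    return new_atoms, new_weights


def w1(atoms_a, weights_a, atoms_b, weights_b):
    """Kantorovich (minimal L_1) distance on the line: integral of |F_a - F_b|."""
    events = sorted(
        [(x, q) for x, q in zip(atoms_a, weights_a)]
        + [(x, -q) for x, q in zip(atoms_b, weights_b)]
    )
    total, cdf_gap, prev = 0.0, 0.0, events[0][0]
    for x, signed_mass in events:
        total += abs(cdf_gap) * (x - prev)   # gap is constant between atoms
        cdf_gap += signed_mass
        prev = x
    return total


maps = [(0.5, 0.0), (0.3, 0.7)]   # (s_i, t_i), illustrative values
probs = [0.4, 0.6]                # (p_i)
factor = sum(p * abs(s) for (s, _), p in zip(maps, probs))   # sum_i p_i s_i

F = ([0.0, 1.0], [0.5, 0.5])
G = ([2.0, 3.0, 5.0], [0.2, 0.3, 0.5])
distances = [w1(*F, *G)]
for _ in range(6):
    F = push_forward(*F, maps, probs)
    G = push_forward(*G, maps, probs)
    distances.append(w1(*F, *G))
print(factor, distances)
```

Each successive entry of `distances` should be at most `factor` times the previous one, which is the geometric decay that makes the iterated function system converge to its invariant measure.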
Remark 9.1.8 Another proof of (9.1.32) can be given via the dual representation of $\ell_p$. Indeed, for $0 < p < \infty$, $p' = 1 \vee p$,

$$\ell_p^{p'}(F, G) = \sup\Big\{\int f\,dF + \int g\,dG;\ f, g \text{ bounded continuous},\ f(x) + g(y) \le d^p(x, y),\ \forall x, y \in U\Big\}. \tag{9.1.36}$$

Therefore,

$$\ell_p^{p'}(TF, TG) = \sup\{E f(K(X, \cdot)) + E g(K(Y, \cdot));\ f(x) + g(y) \le d^p(x, y)\}, \tag{9.1.37}$$

where $X \stackrel{d}{=} F$, $Y \stackrel{d}{=} G$. Since

$$Kf(x) + Kg(y) \le \sum p_i\,d^p(w_i(x), w_i(y)) \le \sum p_i s_i^p\,d^p(x, y),$$

we obtain

$$\begin{aligned}
\ell_p^{p'}(TF, TG) &= \sup\{E\,Kf(X) + E\,Kg(Y);\ f(x) + g(y) \le d^p(x, y)\} \\
&\le \sum p_i s_i^p\,\ell_p^{p'}(F, G).
\end{aligned} \tag{9.1.38}$$

Remark 9.1.9 Hutchinson (1981) was the first to prove convergence with respect to the metric $\mu_L$ in the case $s_i \le 1$. Barnsley and Elton (1988) used the above Markov chain to “construct images” by so-called iterated function systems (IFS). They established the existence of a unique attractive invariant measure $\mu$ under the assumption that

$$\prod_{i=1}^N d(w_i(x), w_i(y))^{p_i} \le r\,d(x, y), \qquad r < 1. \tag{9.1.39}$$

The above inequality is indeed implied by the condition (9.1.27) with

$$\prod_{i=1}^N s_i^{p_i} < 1. \tag{9.1.40}$$
In the case of affine maps on $\mathbb{R}^k$ we can improve the arguments in the following way (see Proposition 9.1.10 below, cf. also Burton and Rösler (1995)). Define

$$TF \stackrel{d}{=} AX + b, \tag{9.1.41}$$

where $A$ is a random matrix, $b$ a random vector, $(A, b)$ independent of $X$, and $X \stackrel{d}{=} F$. Consider the operator norm of the expected product $EA^TA$,

$$\|EA^TA\| := \sup_{x\in\mathbb{R}^k,\ x\ne 0} \frac{\|(EA^TA)x\|}{\|x\|}. \tag{9.1.42}$$

Then

$$\|EA^TA\| = \sup_{x\ne 0} \frac{E\langle Ax, Ax\rangle}{\|x\|^2} = \|E\langle A, A\rangle\|, \tag{9.1.43}$$

where the right-hand side is the $L_2$-norm of $E\langle A, A\rangle$.

Proposition 9.1.10 Assume that $\|b\|_2 < \infty$. Then

$$\ell_2(T\mu, T\nu) \le \sqrt{\|EA^TA\|}\;\ell_2(\mu, \nu) \tag{9.1.44}$$

for any $\mu, \nu \in M^1(\mathbb{R}^k)$ with finite second moments.
Proof: Let $Y$, $Z$ be random vectors with distributions $\mu$, $\nu$ and $\ell_2(\mu, \nu) = (E\|Y - Z\|^2)^{1/2}$, where $Y$, $Z$ are independent of $(A, b)$. Then

$$\begin{aligned}
\ell_2(T\mu, T\nu) &\le \|AY - AZ\|_2 \\
&\le \sqrt{E\langle A(Y - Z), A(Y - Z)\rangle} \\
&= \sqrt{E\langle Y - Z, E(A^TA)(Y - Z)\rangle} \\
&\le \sqrt{\|EA^TA\|}\,\sqrt{E\langle Y - Z, Y - Z\rangle} \\
&= \sqrt{\|EA^TA\|}\;\ell_2(\mu, \nu).
\end{aligned} \tag{9.1.45}$$

□

Notice that the estimate from above defined in (9.1.45) is an improvement (in the case $p = 2$) over the general estimate

$$\|AX\|_p \le \big\|\,\|A\| \cdot \|X\|\,\big\|_p \le \|A\|_p \cdot \|X\|_p. \tag{9.1.46}$$

In fact, the above general bound requires the stronger condition $\|A\|_p < 1$ to yield the contraction property.

(e) Environmental Processes

Let $(Y_i, Z_i)$ be a sequence of i.i.d. pairs of r.v.s with values in $U \times \mathbb{R}$, where $U$ is a separable Banach space. Define a sequence of r.v.s $(S_n)$ by

$$S_{n+1} = (Y_n + S_n)Z_n, \qquad S_0 \ge 0. \tag{9.1.47}$$
This kind of process has found several applications in environmental modeling and has been studied intensively. If we write $\tau_n = (Y_n, Z_n)$, and $a(\tau_n) = Z_n$, $C(\tau_n) = Y_n Z_n$, then

$$S_{n+1} = a(\tau_n)S_n + C(\tau_n), \tag{9.1.48}$$

so we have a special case of (9.1.1). Under the condition that $E|a(\tau)|^r < 1$, the operator $TS \stackrel{d}{=} a(\tau)S + C(\tau)$ is contractive. Therefore, $(S_n)$ converges (with respect to some ideal metric of order $r$ such as $\zeta_r$, for example) to a fixed point, i.e., a solution of

$$S \stackrel{d}{=} (Y + S)Z. \tag{9.1.49}$$

Numerous properties of the solutions of the above equation have been studied in the literature; see, for example, Rachev and Samorodnitsky (1995) and Rachev and Rüschendorf (1995).
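The forgetting of the initial value in (9.1.47) can be illustrated by running the recursion from two starting points with the same random inputs. This is a sketch under assumptions chosen only for illustration: real-valued $Y_n$ with a unit exponential law and $Z_n$ uniform on $[0, 0.9]$, so that $E|Z|^r < 1$ for every $r > 0$. The function name `run` is hypothetical.

```python
import random


def run(s0, shocks):
    """Iterate S_{n+1} = (Y_n + S_n) * Z_n from the initial value s0."""
    s = s0
    for y, z in shocks:
        s = (y + s) * z
    return s


rng = random.Random(1)
# Assumed laws for illustration: Y ~ Exp(1), Z ~ U[0, 0.9].
shocks = [(rng.expovariate(1.0), rng.uniform(0.0, 0.9)) for _ in range(50)]

s_a = run(0.0, shocks)
s_b = run(100.0, shocks)

# With common inputs the difference is exactly (S_0' - S_0) * prod_k Z_k.
prod_z = 1.0
for _, z in shocks:
    prod_z *= z
print(s_a, s_b, 100.0 * prod_z)
```

The two trajectories differ only by the initial gap times the product of the multipliers, which is the pathwise version of the contraction of $T$.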
9.2 Convergence of Recursive Algorithms

In this section we apply the contraction properties established in Section 9.1 to study limits for recursive algorithms. We shall use the “method of probability metrics.” The main idea of this method is to transform the recursive equations in such a way that with respect to a suitable metric we can derive contraction properties in the limit; i.e., we consider decompositions $X_n = Y_n + \widehat W_n$ such that $(Y_n)$ has contraction properties and $\widehat W_n$ converges to zero. This idea will be demonstrated in various examples.

The approach is natural from the following point of view. If $S_n = \sum_{i=1}^n Y_i$ is a sum of independent (centered) random variables and $X_n = n^{-1/\alpha}S_n$ is the normalized sum, then $X_n$ satisfies the following simple recursion:

$$X_{n+1} = \Big(\frac{n}{n+1}\Big)^{1/\alpha}X_n + (n+1)^{-1/\alpha}\,Y_{n+1}. \tag{9.2.1}$$

Thus the central limit theorem can be considered as the limit theorem of this simple (stochastic) recursion. The form of the recursion corresponding to the strong law of large numbers is even simpler.
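As a quick sanity check, the recursion for the normalized sums can be compared with the direct definition $X_n = n^{-1/\alpha}S_n$. The sketch below uses the form $X_{n+1} = (n/(n+1))^{1/\alpha}X_n + (n+1)^{-1/\alpha}Y_{n+1}$, which follows from $S_{n+1} = S_n + Y_{n+1}$; the Gaussian inputs and $\alpha = 2$ are assumptions made only for illustration.

```python
import random

alpha = 2.0
rng = random.Random(2)
ys = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # Y_1, ..., Y_1000

x = ys[0]          # X_1 = 1^{-1/alpha} * Y_1
s = ys[0]          # S_1
for n in range(1, len(ys)):
    y_next = ys[n]                                 # Y_{n+1}
    x = (n / (n + 1)) ** (1 / alpha) * x + (n + 1) ** (-1 / alpha) * y_next
    s += y_next

direct = s / len(ys) ** (1 / alpha)                # n^{-1/alpha} * S_n
print(x, direct)
```

Both numbers agree up to floating-point rounding, confirming that the recursion is just a rewriting of the normalized partial sum.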
9.2.1 Learning Algorithm
Let $Y_1, Y_2, \dots$ be an i.i.d. sequence of r.v.s with values in a separable Banach space with first moment $\mu$. Define the following recursive sequence: Let $X_1$ be arbitrary with finite first moment, and let

$$X_{n+1} = \frac{n}{n+1}X_n + \frac{1}{n+1}Y_{n+1}. \tag{9.2.2}$$

$X_n$ can be viewed as an easy recursive algorithm designed to “learn” about the unknown theoretical mean $\mu$ given the sample $(Y_1, \dots, Y_n)$.

Proposition 9.2.1 $\zeta_r(X_n, \mu) \to 0$ if $\zeta_r(X_1, \mu) < \infty$.

Proof: Let $\ell_n = EX_n$.

Claim 1: $\ell_n \to \mu$. For the proof of Claim 1 note that from (9.2.2), we obtain

$$\begin{aligned}
\ell_{n+1} &= \frac{n}{n+1}\ell_n + \frac{1}{n+1}\mu \\
&= \frac{n-1}{n+1}\ell_{n-1} + \frac{2}{n+1}\mu \\
&= \frac{1}{n+1}\ell_1 + \frac{n}{n+1}\mu,
\end{aligned} \tag{9.2.3}$$
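where the last step follows from the inductive argument. This implies Claim 1. Define next

$$Z_n = X_n - \ell_n, \qquad W_n = Y_n - \mu. \tag{9.2.4}$$

Then,

$$\begin{aligned}
Z_{n+1} + \ell_{n+1} &= \frac{n}{n+1}(Z_n + \ell_n) + \frac{1}{n+1}(W_{n+1} + \mu) \quad (\text{by (9.2.2)}), \\
Z_{n+1} &= Z_n\frac{n}{n+1} + \frac{1}{n+1}W_{n+1} - \Big(\ell_{n+1} - \frac{n}{n+1}\ell_n - \frac{1}{n+1}\mu\Big) \\
&= Z_n\frac{n}{n+1} + \frac{1}{n+1}W_{n+1} \quad (\text{by (9.2.3)}).
\end{aligned} \tag{9.2.5}$$

Now let $\mu_r$ be an ideal metric of order $r$, $1 < r < 2$, and $b_n = \mu_r(Z_n, 0)$ (for example we can choose $\mu_r = \zeta_r$).

Claim 2: $\mu_r(Z_n, 0) \to 0$ if $a = \mu_r(W_1, 0) < \infty$. For the proof of this claim note that

$$\begin{aligned}
b_{n+1} = \mu_r(Z_{n+1}, 0) &= \mu_r\Big(Z_n\frac{n}{n+1} + \frac{1}{n+1}W_{n+1}, 0\Big) \\
&\le \mu_r\Big(\frac{n}{n+1}Z_n, 0\Big) + \mu_r\Big(\frac{1}{n+1}W_{n+1}, 0\Big) \quad (\text{since } Z_n \text{ is independent of } W_{n+1}) \\
&= \Big(\frac{n}{n+1}\Big)^r\mu_r(Z_n, 0) + \Big(\frac{1}{n+1}\Big)^r\mu_r(W_{n+1}, 0) \\
&= \Big(\frac{n}{n+1}\Big)^r b_n + \Big(\frac{1}{n+1}\Big)^r a.
\end{aligned}$$

Therefore,

$$\begin{aligned}
b_{n+1} &\le \Big(\frac{n}{n+1}\Big)^r\Big[\Big(\frac{n-1}{n}\Big)^r b_{n-1} + \Big(\frac{1}{n}\Big)^r a\Big] + \Big(\frac{1}{n+1}\Big)^r a \\
&= \Big(\frac{n-1}{n+1}\Big)^r b_{n-1} + 2\Big(\frac{1}{n+1}\Big)^r a \\
&\le \Big(\frac{1}{n+1}\Big)^r b_1 + n\Big(\frac{1}{n+1}\Big)^r a.
\end{aligned} \tag{9.2.6}$$

Since $1 < r$, it follows that $b_n \to 0$. In particular, for $\mu_r = \zeta_r$, we obtain from Claim 1 that

$$\zeta_r(X_n, \mu) \to 0 \quad\text{if } \zeta_r(X_1, \mu) < \infty. \tag{9.2.7}$$

□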
For the case of Euclidean spaces the condition $\zeta_r(Y_1, \mu) < \infty$ is satisfied if $Y_1$ has a finite absolute $r$th moment, $r > 1$. Therefore, under the assumption of a finite $r$th moment we obtain convergence of $X_n$ to $\mu$. The sequence $(X_n)$ provides a simple example of a “learning algorithm” (for $\mu$). Its convergence to $\mu$ in the real case can also be obtained as an application of the Robbins–Siegmund lemma (cf. Robbins and Siegmund (1971)) under the stronger assumption of a finite second moment. In this simple example we can, of course, directly prove the convergence of $X_n$ to $\mu$ under the assumption of a finite first moment.

The arguments above illustrate the general idea behind the method of probability metrics and show that in this simple case the method of probability metrics works with weaker assumptions than the method of stochastic approximation based on the Robbins–Siegmund lemma. Some further simple examples of the Robbins–Monro-type recursion $X_{n+1} = f_n(X_n, Y_{n+1})$ can be treated similarly. Note that our method only needs an ideal metric of order $r > 1$ such that $\mu_r(X_n - \ell_n, 0) \to 0$ implies that $X_n - \ell_n \to 0$ in distribution. The $\ell_p$-metric will not work in this example, since its degree of ideality is only $r = \min(1, p)$.
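The two ingredients of the argument, the running-mean recursion (9.2.2) and the bound (9.2.6), are easy to reproduce numerically. The sketch below is illustrative only: `learn` and `bound_sequence` are hypothetical names, and the bound recursion is iterated with equality, in which case it has the closed form $b_{n+1} = (b_1 + na)/(n+1)^r$.

```python
import random


def learn(x1, ys):
    """Iterate X_{n+1} = n/(n+1) * X_n + Y_{n+1}/(n+1), starting from X_1 = x1."""
    x = x1
    for n, y in enumerate(ys, start=1):    # y plays the role of Y_{n+1}
        x = n / (n + 1) * x + y / (n + 1)
    return x


def bound_sequence(b1, a, r, steps):
    """Iterate b_{n+1} = (n/(n+1))^r b_n + (1/(n+1))^r a with equality."""
    b, out = b1, [b1]
    for n in range(1, steps + 1):
        b = (n / (n + 1)) ** r * b + (1 / (n + 1)) ** r * a
        out.append(b)
    return out


rng = random.Random(3)
ys = [rng.gauss(5.0, 2.0) for _ in range(5000)]     # assumed law, mean mu = 5
estimate = learn(0.0, ys)                           # X_1 = 0 is arbitrary

bounds = bound_sequence(b1=10.0, a=3.0, r=1.5, steps=10_000)
print(estimate, bounds[-1])
```

With $r > 1$ the bound behaves like $a\,n^{1-r}$ for large $n$, which is why an ideal metric of order strictly greater than one is needed for this argument.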
9.2.2 Branching-Type Recursion

Consider the following recursive sequence $(L_n)$:

$$L_0 \equiv 1, \qquad L_n \stackrel{d}{=} \sum_{i=1}^K X_i L^{(i)}_{n-1} + Y. \tag{9.2.8}$$

Here $L^{(i)}_{n-1}$ are i.i.d. copies of $L_{n-1}$, $(X_i)$ is a real random sequence, $K$ is a random number in $\mathbb{N}_0$, and $Y$ is a random “immigration” such that $K$, $\{(X_i), Y\}$, $(L^{(i)}_{n-1})$ are independent. As usual, $\stackrel{d}{=}$ denotes equality in distribution. (9.2.8) induces a transformation $T$ on $M^1$, the set of probability distributions on $(\mathbb{R}^1, \mathcal{B}^1)$. This is achieved by letting $T(\mu)$ be the distribution of $\sum_{i=1}^K X_i Z_i + Y$, where the $(Z_i)$ are i.i.d. $\mu$-distributed r.v.s, and moreover, $(Z_i)$, $\{(X_i), Y\}$, $K$ are independent.

Some special cases of these transformations and recursions have been studied intensively in the literature. If $X_i \equiv 1$, then (9.2.8) describes a Galton–Watson process with immigration $Y$ with the number of descendants of a parent described by $K$. The recursion (9.2.8) can be viewed as a branching process with random multiplicative weights. The special case where $K$ is constant, $Y = 0$, and $(X_i)$ are i.i.d. and nonnegative was introduced by Mandelbrot (1974) in his analysis of the Yaglom–Kolmogorov
turbulence model. This case has been also studied by Kahane and Peyrière (1976) and Guivarch (1990), who considered the question of nontrivial fixed points of $T$, the existence of moments of the fixed points, and the convergence of $(L_n)$. For $X_i \equiv K^{-1/\alpha}$, the solutions of the fixed-point equation $Z \stackrel{d}{=} \sum_{i=1}^K K^{-1/\alpha}Z_i$ are Paretian stable distributions (if $Z_i \ge 0$). For that reason the solutions are called semistable in Guivarch (1990). In this section we will be mainly interested in the case of multipliers $X_i$ and solutions $Z_i$ with moments of order $\ge 2$. While the analysis of Kahane and Peyrière (1976) is based on an associated martingale, Guivarch (1990) uses a more elementary martingale property together with a conjugation relation and moment-type estimates for the $L_p$-distance, $0 < p < 1$.

Motivated by some problems in infinite particle systems, Holley and Liggett (1981) and Durrett and Liggett (1983) considered a smoothing transformation with $(X_i)$ that are not necessarily independent and assume that $X_i \ge 0$, $K$ constant, and $Y = 0$. In Durrett and Liggett (1983) a complete analysis of the case is given. In particular, a necessary and sufficient condition for the existence and characterization of (all) fixed points as well as a general sufficient condition for convergence was derived, as well as a generalization of the result of Kahane and Peyrière on the existence of moments. The method of Durrett and Liggett is based on an associated branching random walk.

The use of contraction properties of minimal $L_p$-metrics in this section allows us to obtain quantitative approximation results for the recursion (9.2.8). Under moment assumptions used in this section, the recursion converges to the limiting distribution exponentially fast. This is demonstrated by simulations for several examples. Also, it is possible to remove the assumption of nonnegativity, to deal with a random number $K$, and to add immigration $Y$.
This allows us to include applications to branching processes as well as to study the development of the total mass in the construction of multifractal measures (cf., for example, Arbeiter (1991)). For details we refer to Cramer and Rüschendorf (1996b).

(a) Branching-Type Recursion with Multiplicative Weights

In this section we shall study the recursion (9.2.8) allowing for dependent multipliers $X_i$ but setting the immigration $Y \equiv 0$. In other words, we consider the recursion

$$L_0 \equiv 1, \qquad L_n \stackrel{d}{=} \sum_{i=1}^K X_i L^{(i)}_{n-1}, \tag{9.2.9}$$

where $L^{(i)}_{n-1}$ are i.i.d. copies of $L_{n-1}$, $(X_i)$ is a square integrable real random sequence, $K$ is a random number in $\mathbb{N}_0$, and $K$, $(X_i)$, $L^{(i)}_{n-1}$ are independent r.v.s.
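To determine the correct normalization of $(L_n)$ we first consider the first moments of $(L_n)$. Set $\ell_n := EL_n$, $c := E\sum_{i=1}^K X_i$, $v_n := \mathrm{Var}(L_n)$, $a := E\sum_{i=1}^K X_i^2$, and $b := \mathrm{Var}\big(\sum_{i=1}^K X_i\big)$.

Proposition 9.2.2 Let $\ell_0 = 1$; then $\ell_n = c^n$. Suppose that $b > 0$, $c \ne 0$, $a \ne c^2$. Then

$$v_n = b\,c^{2n-2}\,\frac{1 - (a/c^2)^n}{1 - a/c^2}, \qquad n \ge 1,\ v_0 = 0. \tag{9.2.10}$$

If $a = c^2 \ne 0$, then $v_n = n\,b\,a^{n-1}$.

Proof: Using the independence assumption in (9.2.9) and the conditional expectations, we obtain

$$\ell_n = E\Big(E\Big(\sum_{i=1}^K X_i L^{(i)}_{n-1}\,\Big|\,K\Big)\Big) = E\Big(\sum_{i=1}^K EX_i\,EL^{(i)}_{n-1}\Big) = E\Big(\sum_{i=1}^K EX_i\Big)\ell_{n-1} = c\,\ell_{n-1};$$

i.e., $\ell_n = c^n$. Similarly,

$$\begin{aligned}
v_n &= EL_n^2 - (EL_n)^2 \\
&= E\Big[E\Big(\Big(\sum_{i=1}^K X_i L^{(i)}_{n-1}\Big)^2\,\Big|\,K\Big)\Big] - c^2\ell_{n-1}^2 \\
&= E\Big[\sum_{i=1}^K E\big(X_i L^{(i)}_{n-1}\big)^2 + \sum_{i\ne j} E\big(X_i X_j L^{(i)}_{n-1}L^{(j)}_{n-1}\big)\Big] - c^2\ell_{n-1}^2 \\
&= E\Big[EL_{n-1}^2\sum_{i=1}^K EX_i^2 + \ell_{n-1}^2\sum_{i\ne j} E(X_i X_j)\Big] - c^2\ell_{n-1}^2 \\
&= E\Big[\sum_{i=1}^K EX_i^2\big(\mathrm{Var}\,L_{n-1} + \ell_{n-1}^2\big) + \ell_{n-1}^2\sum_{i\ne j} E(X_i X_j)\Big] - c^2\ell_{n-1}^2 \\
&= E\Big(\sum_{i=1}^K X_i^2\Big)v_{n-1} + \mathrm{Var}\Big(\sum_{i=1}^K X_i\Big)\ell_{n-1}^2 \\
&= a\,v_{n-1} + b\,c^{2(n-1)} = b\sum_{k=0}^{n-1} a^k c^{2(n-1-k)} \\
&= \begin{cases} b\,c^{2n-2}\,\dfrac{1 - (a/c^2)^n}{1 - a/c^2} = b\,c^{2n}\,\dfrac{1 - (a/c^2)^n}{c^2 - a}, & \text{if } a \ne c^2 \ne 0, \\[2ex] n\,b\,a^{n-1}, & \text{if } a = c^2. \end{cases}
\end{aligned}$$

□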
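The closed form (9.2.10) can be checked against the one-step recursion $v_n = a\,v_{n-1} + b\,c^{2(n-1)}$ derived in the proof. The sketch below uses, as an assumed example, $K = 2$ and independent multipliers uniform on $[0, 1]$, for which $c = 1$, $a = 2/3$, and $b = 1/6$; the function names are hypothetical.

```python
def variance_by_recursion(a, b, c, n):
    """Iterate v_k = a * v_{k-1} + b * c^(2(k-1)) from v_0 = 0."""
    v = 0.0
    for k in range(1, n + 1):
        v = a * v + b * c ** (2 * (k - 1))
    return v


def variance_closed_form(a, b, c, n):
    """Closed form of Proposition 9.2.2 (both cases)."""
    if n == 0:
        return 0.0
    if a == c * c:
        return n * b * a ** (n - 1)
    ratio = a / (c * c)
    return b * c ** (2 * n - 2) * (1 - ratio ** n) / (1 - ratio)


# Assumed example: K = 2, X_1, X_2 independent U[0, 1].
a, b, c = 2.0 / 3.0, 1.0 / 6.0, 1.0
for n in range(0, 13):
    print(n, variance_by_recursion(a, b, c, n), variance_closed_form(a, b, c, n))

limit = b / (c * c - a)   # limiting variance of L_n / c^n when a < c^2
print(limit)
```

For this choice of multipliers the normalized variance approaches $b/(c^2 - a) = 1/2$ at the geometric rate $(a/c^2)^n = (2/3)^n$.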
In the case $b = 0$, we have $v_n = 0$ for all $n$. Therefore, we consider only the case $b > 0$. From (9.2.10) we obtain that for $a < c^2$, $\sqrt{v_n}$ is of the same order as $\ell_n$. This makes it possible to use a simple normalization by $\ell_n$. Define for $c \ne 0$,

$$\tilde L_n := L_n / c^n. \tag{9.2.11}$$

Then $E\tilde L_n = 1$, and $\mathrm{Var}(\tilde L_n) \to \frac{b}{c^2 - a}$. Moreover, $\tilde L_n$ satisfies the modified recursion

$$\tilde L_n \stackrel{d}{=} \frac{1}{c}\sum_{i=1}^K X_i\tilde L^{(i)}_{n-1}, \tag{9.2.12}$$

where $\tilde L^{(i)}_{n-1} := L^{(i)}_{n-1}/c^{n-1}$. Define $D_2$ to be the set of distributions on $(\mathbb{R}^1, \mathcal{B}^1)$ with finite second moments and first moment equal to one. Next, define the mapping $T : D_2 \to D_2$ by

$$T(G) = \mathcal{L}\Big(\frac{1}{c}\sum_{i=1}^K X_i Z_i\Big), \tag{9.2.13}$$

where the $(Z_i)$ are i.i.d. random variables with distribution $G$, and such that $(X_i)$, $(Z_i)$, $K$ are independent r.v.s. Let $\ell_2$ denote the minimal $L_2$-metric on $D_2$:

$$\ell_2(\mu, \nu) = \inf\big\{(E(V - W)^2)^{1/2};\ V \stackrel{d}{=} \mu,\ W \stackrel{d}{=} \nu\big\} = \Big(\int_0^1 \big(F^{-1}(u) - G^{-1}(u)\big)^2\,du\Big)^{1/2}. \tag{9.2.14}$$

Here $F$, $G$ are the distribution functions of $\mu$, $\nu$ respectively. If $a < c^2$, then $T$ is a contraction with respect to $\ell_2$.

Proposition 9.2.3 Assume that $a < c^2$. Then for $F, G \in D_2$,

$$\ell_2(T(F), T(G)) \le \sqrt{\frac{a}{c^2}}\;\ell_2(F, G). \tag{9.2.15}$$
Proof: Let the r.v.s $U^{(i)} \stackrel{d}{=} F$, $V^{(i)} \stackrel{d}{=} G$, $i \in \mathbb{N}$, be chosen on $(\Omega, \mathcal{A}, P)$ in such a way that $\|U^{(i)} - V^{(i)}\|_2 = \ell_2(F, G)$ for all $i$, and $K$, $(X_i)$, $(U^{(1)}, V^{(1)})$, $(U^{(2)}, V^{(2)}), \dots$ are all assumed to be independent. Then

$$\begin{aligned}
\ell_2^2(T(F), T(G)) &\le \Big\|\frac{1}{c}\sum_{i=1}^K X_i U^{(i)} - \frac{1}{c}\sum_{i=1}^K X_i V^{(i)}\Big\|_2^2 \\
&= \frac{1}{c^2}E\Big(E\Big[\Big(\sum_{i=1}^K X_i U^{(i)} - \sum_{i=1}^K X_i V^{(i)}\Big)^2\,\Big|\,K\Big]\Big) \\
&= \frac{1}{c^2}E\Big[E\Big(\sum_{i=1}^K X_i^2\big(U^{(i)} - V^{(i)}\big)^2\,\Big|\,K\Big) + \sum_{i\ne j}E\Big(X_i\big(U^{(i)} - V^{(i)}\big)X_j\big(U^{(j)} - V^{(j)}\big)\,\Big|\,K\Big)\Big] \\
&= \frac{1}{c^2}E\Big(\sum_{i=1}^K EX_i^2\,E\big(U^{(i)} - V^{(i)}\big)^2\Big) \\
&= \frac{a}{c^2}\,\ell_2^2(F, G).
\end{aligned}$$

□

As a consequence of Proposition 9.2.3 it follows that $T$ has exactly one fixed point in $D_2$ with variance equal to $b/(c^2 - a)$. The fixed-point equation is given in terms of the independent random variables $Z, Z_i \in D_2$, $Z_i \stackrel{d}{=} Z$, $(Z_i)$ as follows:

$$Z \stackrel{d}{=} \frac{1}{c}\sum_{i=1}^K X_i Z_i. \tag{9.2.16}$$

As a corollary we obtain

Theorem 9.2.4 If $a = E\sum_{i=1}^K X_i^2 < c^2$, then

$$\ell_2(\tilde L_n, Z) \le \Big(\frac{a}{c^2}\Big)^{n/2}\frac{\sqrt{b}}{\sqrt{c^2 - a}}. \tag{9.2.17}$$

In particular, $\tilde L_n$ converges in distribution to $Z$.

Proposition 9.2.5 If $K$ is constant and $E\sum_{j=1}^K |X_j|^k < c^k$ for all $2 \le k \le h$, then $E|Z|^h < \infty$.
Proof: $\tilde L_n$ can be equivalently represented by $Y_n$ in the following form:

$$Y_0 = 1, \qquad Y_n = \frac{1}{c^n}\sum_{(j_1,\dots,j_n)\in\{1,\dots,K\}^n}\ \prod_{k=1}^n X_{j_1,\dots,j_k},$$

where $(X_{j_1,\dots,j_{k-1},1}, \dots, X_{j_1,\dots,j_{k-1},K}) \stackrel{d}{=} (X_1, \dots, X_K)$ (cf. Guivarch (1990)). Moreover, $(Y_n)$ is a martingale, and therefore $|Y_n|^k$ is a submartingale. Representing the $Y_n$ in the recursive way $Y_n = \frac{1}{c}\sum_{j=1}^K X_j Y^{(j)}_{n-1}$, where $Y^{(j)}_{n-1}$ are independent copies of $Y_{n-1}$, we have

$$c^k E|Y_n|^k \le \Big(E\sum_{j=1}^K |X_j|^k\Big)E|Y_{n-1}|^k + \sum_{\substack{k_1+\cdots+k_K = k \\ k_i \le k-1}}\binom{k}{k_1, \dots, k_K}E\Big(\prod_{j=1}^K |X_j|^{k_j}\Big)\prod_{j=1}^K E|Y_{n-1}|^{k_j}.$$

We can infer from Theorem 9.2.4 that $E|Y_n|^k$ is uniformly bounded for $k \le 2$. By induction over $k \le h$, we see that the lower-order terms in the above equation are uniformly bounded, say by a constant $C$. Since $E|Y_n|^k \ge E|Y_{n-1}|^k$, we obtain

$$E|Y_n|^k\Big[c^k - E\Big(\sum_{j=1}^K |X_j|^k\Big)\Big] \le C.$$

Therefore, the assumptions of this proposition ensure that $E|Y_n|^k$ is uniformly bounded for all $k \le h$. The submartingale convergence theorem now yields the existence of an integrable almost sure limit of $|Y_n|^h$. Since $\tilde L_n \stackrel{d}{=} Y_n$, the weak limit $Z$ of $\tilde L_n$ is absolutely $h$-integrable. □

We can also obtain a “stability” result for the stationary equation (9.2.16). This will be achieved in terms of the $\ell_p$ metrics defined as in (9.2.14) with 2 replaced by $p$. Suppose we want to approximate the solution $S$ of the equation

$$S \stackrel{d}{=} \sum_{i=1}^K X_i S_i$$

by the solution of the “approximate” equation

$$S^* \stackrel{d}{=} \sum_{i=1}^K X_i^* S_i^*. \tag{9.2.18}$$
Here we assume without loss of generality that $c = 1$ and consider the case of independent sequences $(X_i)$, $(X_i^*)$ so that the pairs $(X_i), (S_i)$ and $(X_i^*), (S_i^*)$ are independent, and $K$ is constant.

Proposition 9.2.6 If $K$ is constant, $\sum_{i=1}^K \ell_p(X_i, X_i^*) < \varepsilon$, and $\sum_{i=1}^K \|X_i\|_p < 1$, then

$$\ell_p(S, S^*) \le \frac{\varepsilon\,\|S^*\|_p}{1 - \sum_{i=1}^K \|X_i\|_p}. \tag{9.2.19}$$

Proof: From the definition of $S$, $S^*$,

$$\begin{aligned}
\ell_p(S, S^*) &= \ell_p\Big(\sum_{i=1}^K X_i S_i, \sum_{i=1}^K X_i^* S_i^*\Big) \\
&\le \sum_{i=1}^K \ell_p(X_i S_i, X_i^* S_i^*) \\
&\le \sum_{i=1}^K \big(\ell_p(X_i S_i, X_i S_i^*) + \ell_p(X_i S_i^*, X_i^* S_i^*)\big) \\
&\le \Big(\sum_{i=1}^K \|X_i\|_p\Big)\ell_p(S, S^*) + \|S^*\|_p \cdot \varepsilon.
\end{aligned}$$

This implies that

$$\ell_p(S, S^*) \le \frac{\varepsilon\,\|S^*\|_p}{1 - \sum_{i=1}^K \|X_i\|_p}.$$

□

A similar idea for establishing robustness of equations can be found in Rachev (1991, Chapter 19.3). For the case of a random $K$ we replace Proposition 9.2.6 by the following one.
Proposition 9.2.7 If $E\sum_{i=1}^K (X_i - X_i^*)^2 \le \varepsilon^2$, $EX_i = EX_i^*$, and $a = E\sum_{i=1}^K X_i^2 < 1$, then

$$\ell_2(S, S^*) \le \frac{\varepsilon}{1 - \sqrt{a}}\,\|S^*\|_2. \tag{9.2.20}$$

Proof: By the triangle inequality and the independence assumption and the assumption $EX_i = EX_i^*$,

$$\begin{aligned}
\ell_2(S, S^*) &= \ell_2\Big(\sum_{i=1}^K X_i S_i, \sum_{i=1}^K X_i^* S_i^*\Big) \\
&\le \ell_2\Big(\sum_{i=1}^K X_i^* S_i^*, \sum_{i=1}^K X_i S_i^*\Big) + \ell_2\Big(\sum_{i=1}^K X_i S_i^*, \sum_{i=1}^K X_i S_i\Big) \\
&\le \Big(E\sum_{i=1}^K X_i^2\Big)^{1/2}\ell_2(S, S^*) + \|S^*\|_2\Big(E\sum_{i=1}^K (X_i - X_i^*)^2\Big)^{1/2} \\
&\le \sqrt{a}\;\ell_2(S, S^*) + \varepsilon\,\|S^*\|_2.
\end{aligned}$$

Therefore,

$$\ell_2(S, S^*) \le \frac{\varepsilon\,\|S^*\|_2}{1 - \sqrt{a}}.$$
□

Remark 9.2.8 In the case of constant $K$ and nonnegative $X_i$, Durrett and Liggett (1983) proved that the stationary solution $Z$ of (9.2.16) has moments of order $\beta$ if and only if

$$v(\beta) = \log\Big(\frac{1}{c^\beta}\sum_{i=1}^K EX_i^\beta\Big) < 0. \tag{9.2.21}$$

For $\beta = 2$, (9.2.21) is equivalent to the condition $a < c^2$ used in Proposition 9.2.3. In this sense this condition is sharp when using $\ell_2$-distances. Guivarch (1990) has shown how to relax the second-moment assumption.

Remark 9.2.9 For the normalized recursion (9.2.12) with $(X_i)$ i.i.d. r.v.s, $K$ being a constant (we assume without loss of generality that $c = 1$), we can use the form

$$\tilde L_0 = 1, \qquad \tilde L_n = \sum_{(j_1,\dots,j_n)\in\{1,\dots,K\}^n}\ \prod_{k=1}^n X_{j_1,\dots,j_k}, \tag{9.2.22}$$

where $(X_{j_1,\dots,j_k})$ are independent and distributed as $X_1$; i.e., $\tilde L_n$ is a sum over product weights in the complete $K$-ary tree; cf. the proof of Proposition 9.2.5. For nonnegative multipliers $X_i$ we also can consider functionals of the type

$$M_n = \max_{P_n}\prod_{k=1}^n X_{j_1,\dots,j_k}, \tag{9.2.23}$$

where the maximum is taken over all paths of length $n$. Taking logarithms,

$$-\ln M_n = -\max_{P_n}\sum_{k=1}^n \ln(X_{j_1,\dots,j_k}) = \min_{P_n}\sum_{k=1}^n \big(-\ln(X_{j_1,\dots,j_k})\big),$$
and applying Kingman’s subadditive ergodic theorem yields that for some constant $\beta$,

$$\frac{1}{n}\log M_n \to \beta \ \text{a.s.} \tag{9.2.24}$$

This shows that in some sense the max product weight is not larger in order of magnitude than the average product weight. In some cases, the constant $\beta$ is explicitly known, for example, for $X_i \stackrel{d}{=} U[0, 1]$, $\beta \approx -0.23196$ (cf. Mahmoud (1992, p. 165)).

Remark 9.2.10 In some cases explicit solutions of (9.2.16) are known.

(1) If $K$ is constant and $\frac{1}{c}X_i \stackrel{d}{=} \beta\big(\frac{a}{K}, a - \frac{a}{K}\big)$ is beta distributed, then $Z \stackrel{d}{=} \Gamma(a, \beta)$ is gamma distributed (cf. Guivarch (1990)).

(2) Suppose that $Z_1 \stackrel{d}{=} \frac{1}{K}\sum_{i=1}^K X_i Z_i$, $(Y_i)$ are i.i.d. r.v.s, $X \stackrel{d}{=} X_1$, and $Y_1 \stackrel{d}{=} X_1 Z_1$ holds. Then $Y_1 \stackrel{d}{=} \big(\frac{1}{K}\sum_{i=1}^K Y_i\big)X$. Conversely, if $Y_1 \stackrel{d}{=} \big(\frac{1}{K}\sum_{i=1}^K Y_i\big)X_1$ and $(X_i)$ are i.i.d. r.v.s, then $Z_i \stackrel{d}{=} \frac{1}{K}\sum_{j=1}^K Y_j$. The sequence $(Z_i)$ solves the equation $Z_1 \stackrel{d}{=} \frac{1}{K}\sum_{i=1}^K X_i Z_i$ (cf. Durrett and Liggett (1983)).

(3) Suppose $(Z_i)$ solves $Z_1 \stackrel{d}{=} \sum_{i=1}^K X_i Z_i$, $X_i \ge 0$. Then $Y_i = Z_i^{1/\vartheta}W_i$, where $0 < \vartheta \le 2$ and $W_i$ are stable r.v.s with index $\vartheta$, satisfy

$$\sum_{i=1}^K X_i^{1/\vartheta}\,Y_i \stackrel{d}{=} Y_1. \tag{9.2.25}$$

To prove (9.2.25), observe that

$$\sum_{i=1}^K X_i^{1/\vartheta}Z_i^{1/\vartheta}W_i \stackrel{d}{=} \Big(\sum_{i=1}^K X_i Z_i\Big)^{1/\vartheta}W_1 \stackrel{d}{=} Z_1^{1/\vartheta}W_1 = Y_1.$$

This interesting transformation property is used in Guivarch (1990) to reduce the case with moments of $X_i$ of higher order to the case of moments of lower order.

(4) If $\sum_{i=1}^K X_i^2 \equiv c^2 \ne 0$, then the normally distributed r.v.s $Z \stackrel{d}{=} Z_i \stackrel{d}{=} N(0, \sigma^2)$ satisfy (9.2.16).

(5) If $Z$ solves (9.2.16) and $Z'$ is an independent copy of $Z$, then $Z^* := Z - Z'$ solves

$$Z^* \stackrel{d}{=} \frac{1}{c}\sum_{i=1}^K X_i^* Z_i^*.$$
Here $X_i^* = \tau_i X_i$, and the $\tau_i$ are arbitrary random signs. In particular, if $K = 2$, and the r.v.s $X_i^* \stackrel{d}{=} U[-1, 1]$ are independent, then (9.2.16) is solved by $Z^* := Z - Z'$, where $Z \stackrel{d}{=} \Gamma(2, \frac{1}{2}, 0)$.

Remark 9.2.11 The following simulations (Figures 9.1 and 9.2) of $\tilde L_n$, with $K = 2$, $X_1$, $X_2$ independent r.v.s, $X_1 \stackrel{d}{=} X_2 \stackrel{d}{=} U[0, 1]$, respectively $X_i \stackrel{d}{=} \beta(2, 2)$, show good approximation of the empirical d.f. by the theoretical gamma distribution.

FIGURE 9.1. Empirical d.f., $X_1 \stackrel{d}{=} U[0, 1]$, $n = 10$, theoretical Gamma $\Gamma(2, \frac{1}{2}, 0)$

FIGURE 9.2. Empirical d.f., $X_1 \stackrel{d}{=} \beta(2, 2)$, $n = 8$, theoretical Gamma $\Gamma(4, \frac{1}{4}, 0)$
Remark 9.2.12 In the case $K = 2$, $X_1$, $X_2$ independent r.v.s, $X_1 \stackrel{d}{=} X_2 \stackrel{d}{=} U[-\frac{1}{8}, \frac{9}{8}]$, no explicit solution of (9.2.16) is known. Nevertheless, the following simulation (Figure 9.3) shows that $\tilde L_n$ converges very fast to the fixed point of (9.2.16). The empirical distribution functions of $\tilde L_{10}$ and $\tilde L_{12}$ can hardly be distinguished. Therefore, they may be regarded as the limiting distribution function. The empirical distribution function of $\tilde L_6$ is already very close to that limit (cf. Figure 9.3).

FIGURE 9.3. Empirical d.f. of $\tilde L_n$ for $n = 6$, 10, and 12, $X_1 \stackrel{d}{=} U[-\frac{1}{8}, \frac{9}{8}]$
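A simulation in the spirit of the figures above can be sketched as follows. This is not the procedure used for the original figures, which is not described in the text; it is a pool-resampling approximation in which each generation is built by combining two randomly chosen members of the previous generation. The setting is the one with $K = 2$ and independent multipliers uniform on $[0, 1]$, so that $c = 1$ and the limiting law should have mean 1 and variance $b/(c^2 - a) = 1/2$. The function names are hypothetical.

```python
import numpy as np


def draw_uniform_weights(size, rng):
    # X_1, X_2 independent U[0, 1]; then c = E(X_1 + X_2) = 1, so no rescaling is needed.
    return rng.uniform(0.0, 1.0, size), rng.uniform(0.0, 1.0, size)


def simulate_pool(generations, pool_size, draw_weights, rng):
    """Approximate the law of the normalized L_n by a resampled pool (an assumption)."""
    pool = np.ones(pool_size)                       # normalized L_0 = 1
    for _ in range(generations):
        x1, x2 = draw_weights(pool_size, rng)
        left = pool[rng.integers(0, pool_size, size=pool_size)]
        right = pool[rng.integers(0, pool_size, size=pool_size)]
        pool = x1 * left + x2 * right               # one step of the recursion with K = 2
    return pool


rng = np.random.default_rng(0)
pool = simulate_pool(10, 50_000, draw_uniform_weights, rng)
sample_mean = float(pool.mean())
sample_var = float(pool.var())
print(sample_mean, sample_var)
```

The sample mean and variance can then be compared with the values 1 and 1/2 of the gamma law $\Gamma(2, \frac{1}{2}, 0)$; an empirical distribution function of `pool` could be plotted against the gamma distribution function in the same way.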
Remark 9.2.13 (Branching processes) Equation (9.2.9) includes the Galton–Watson process as special case. A Galton–Watson process is defined by the recursion

$$Z_0 = 1, \qquad Z_{n+1} = \sum_{k=1}^{Z_n} X_k^n, \tag{9.2.26}$$

where $X_k^n \stackrel{d}{=} X$ are i.i.d. r.v.s, $n \in \mathbb{N}_0$. Define $K \stackrel{d}{=} X$ and $X_i \equiv 1$. Then

$$L_n \stackrel{d}{=} Z_n \tag{9.2.27}$$

for all $n$. This equality can be checked by induction on $n$. In fact, take first $Z_0 = L_0 = 1$. If $Z_k \stackrel{d}{=} L_k$ for $k \le n$, then

$$\begin{aligned}
Z_{n+1} &\stackrel{d}{=} \sum_{k=1}^{L_n} X_k^n \stackrel{d}{=} \sum_{i=1}^K\ \sum_{k=1+\sum_{j=1}^{i-1}L^{(j)}_{n-1}}^{\sum_{j=1}^{i}L^{(j)}_{n-1}} X_k^n \stackrel{d}{=} \sum_{i=1}^K\Big(\sum_{k=1}^{L_{n-1}} X_k^n\Big)^{(i)} \\
&\stackrel{d}{=} \sum_{i=1}^K\Big(\sum_{k=1}^{Z_{n-1}} X_k^n\Big)^{(i)} = \sum_{i=1}^K Z_n^{(i)} \stackrel{d}{=} \sum_{i=1}^K L_n^{(i)} \stackrel{d}{=} L_{n+1}.
\end{aligned}$$

The assumption $a < c^2$ is equivalent to the condition $EX > 1$. From (9.2.27) we can derive explicit stationary distributions and even the extinction probabilities in some cases. If, for example, $X$ is geometrically distributed, $P(X = k) = p(1 - p)^k$, $k \in \mathbb{N}_0$, then $c = EX = \frac{1-p}{p} > 1$ if $p < \frac{1}{2}$ and $\mathrm{Var}(X) = \frac{1-p}{p^2}$. The normalized Galton–Watson process $Z_n/\sqrt{\mathrm{Var}(Z_n)}$ converges to a (unique) solution of the fixed-point equation

$$Z \stackrel{d}{=} \frac{1}{EX}\sum_{i=1}^X Z_i, \qquad EZ = \sqrt{\frac{EX(EX - 1)}{\mathrm{Var}(X)}}. \tag{9.2.28}$$

The extinction probability is easily seen to be $\frac{p}{1-p}$. For the normalized continuous part an equation identical to (9.2.28) (but with different variances) is also valid. It is well known that this equation is solved by the geometric stable distribution of order 1, i.e., the exponential distribution. This finally implies

$$Z \stackrel{d}{=} \frac{p}{1-p}\,\delta_0 + \frac{1-2p}{1-p}\,\exp\Big(\frac{\sqrt{1-2p}}{1-p}\Big), \tag{9.2.29}$$

since $EZ = \sqrt{1 - 2p}$, $EZ^2 = 2(1 - p)$.
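The extinction probability and the moments of the limit law for geometric offspring can be checked with a few lines of arithmetic. The sketch below relies on the standard fact that the extinction probability is the smallest fixed point of the offspring generating function, here $f(s) = p/(1 - (1-p)s)$, reached by iterating $f$ from 0. It also reads the limit law as a mixture of a point mass at 0 with weight $p/(1-p)$ and an exponential law with rate $\sqrt{1-2p}/(1-p)$; that reading and the function names are assumptions of this sketch.

```python
import math


def offspring_pgf(s, p):
    """Generating function of P(X = k) = p (1 - p)^k, k = 0, 1, 2, ..."""
    return p / (1.0 - (1.0 - p) * s)


def extinction_probability(p, iterations=500):
    """Smallest fixed point of the generating function, by iteration from 0."""
    q = 0.0
    for _ in range(iterations):
        q = offspring_pgf(q, p)
    return q


def limit_moments(p):
    """First two moments of q*delta_0 + (1 - q)*Exp(rate), as read above."""
    q = p / (1.0 - p)
    rate = math.sqrt(1.0 - 2.0 * p) / (1.0 - p)
    first = (1.0 - q) / rate
    second = (1.0 - q) * 2.0 / rate ** 2
    return first, second


p = 0.3
q = extinction_probability(p)
first, second = limit_moments(p)
print(q, p / (1 - p))
print(first, math.sqrt(1 - 2 * p), second, 2 * (1 - p))
```

The second moment minus the squared first moment equals one, in line with the normalization by the standard deviation.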
(b) A Random Immigration Term

In this section we admit an additional immigration term; i.e., we consider the recursion

$$L_n \stackrel{d}{=} \sum_{i=1}^K X_i L^{(i)}_{n-1} + Y, \tag{9.2.30}$$

where $\{(X_i), Y\}$, $K$, $L^{(1)}_{n-1}, L^{(2)}_{n-1}, \dots$ are independent r.v.s. The analysis of (9.2.30) is essentially simplified if we assume, with $\ell_0 := EL_0$, $v_0 := \mathrm{Var}(L_0)$,

$$\ell_0 = \frac{EY}{1 - c}\ (\text{if } c \ne 1), \qquad v_0 = \frac{\mathrm{Var}\big(\ell_0\sum_{i=1}^K X_i + Y\big)}{1 - a}, \quad\text{where } a < 1. \tag{9.2.31}$$

If $c = 1$, then $EY = 0$ and $\ell_0$ is arbitrary.

Lemma 9.2.14 Under the assumption (9.2.31),

$$\ell_n = EL_n = \ell_0, \qquad v_n = \mathrm{Var}(L_n) = v_0, \quad\text{for all } n \in \mathbb{N}. \tag{9.2.32}$$

Proof: From (9.2.31), $\ell_n = c\,\ell_{n-1} + EY = \frac{EY}{1-c} = \ell_{n-1}$, and

$$\begin{aligned}
v_n = \mathrm{Var}(L_n) &= EL_n^2 - \ell_n^2 \\
&= E\Big[\sum_{i=1}^K E\big((X_i L^{(i)}_{n-1})^2\,\big|\,K\big) + \sum_{i\ne j}E\big(X_i X_j L^{(i)}_{n-1}L^{(j)}_{n-1}\,\big|\,K\big) + E(Y^2\,|\,K) + 2E\Big(\sum_{i=1}^K Y X_i L^{(i)}_{n-1}\,\Big|\,K\Big)\Big] - \ell_0^2 \\
&= a(v_{n-1} + \ell_0^2) + \Big(E\sum_{i\ne j}X_i X_j\Big)\ell_0^2 + EY^2 + \ell_0\,2E\Big(Y\sum_{i=1}^K X_i\Big) - \ell_0^2 \\
&= a\,v_{n-1} + \mathrm{Var}\Big(\ell_0\sum_{i=1}^K X_i + Y\Big) = v_{n-1}.
\end{aligned}$$

□

Condition (9.2.31) is fulfilled for a two-point distribution of $L_0$. Indeed, it allows us to use the method in the proof of section 9.2.2(a). A change of the initial condition leads to the necessity to change the method of proof and leads to a great variety of different cases to be considered. We therefore restrict ourselves to (9.2.31) in this section.
As in section (a), we introduce the operator

$$T : M(\ell_0, v_0) \to M(\ell_0, v_0), \qquad T(G) = \mathcal{L}\Big(\sum_{i=1}^K X_i V_i + Y\Big). \tag{9.2.33}$$

Here $M(\ell_0, v_0)$ is the set of distributions with mean $\ell_0$ and variance $v_0$, $(V_i)$ are i.i.d. r.v.s, $V_1 \stackrel{d}{=} G$, and the random quantities $(V_i)$, $\{(X_i), Y\}$, $K$ are independent. Similarly as in Proposition 9.2.3, we obtain the contractive inequality

$$\ell_2(T(F), T(G)) \le \sqrt{a}\;\ell_2(F, G). \tag{9.2.34}$$

This implies the convergence of $L_n$ to a unique fixed point for the mapping $T$ in $M(\ell_0, v_0)$ with respect to the $\ell_2$-metric. The contraction factor is $\sqrt{a}$. A sharper result (i.e., a smaller contraction factor) is obtained by the use of the Zolotarev metric $\zeta_r$ instead of $\ell_2$. Recall the definition for $\zeta_r$ (cf. (9.1.2)):

$$\zeta_r(F, G) = \sup\{|E(f(X) - f(Y))|;\ |f^{(m)}(x) - f^{(m)}(y)| \le |x - y|^\alpha\} \tag{9.2.35}$$

for $r = m + \alpha$, $m \in \mathbb{N}_0$, $0 < \alpha \le 1$.

Proposition 9.2.15

$$\zeta_r(T(F), T(G)) \le E\Big(\sum_{i=1}^K |X_i|^r\Big)\zeta_r(F, G). \tag{9.2.36}$$

Proof: Recall that $\zeta_r$ is an ideal metric of order $r$ with respect to summation; i.e., $\zeta_r(X + Z, Y + Z) \le \zeta_r(X, Y)$ for $Z$ independent of $X$, $Y$, and moreover, $\zeta_r(cX, cY) = |c|^r\zeta_r(X, Y)$. Then, for $(Z_i)$, $(W_i)$ being i.i.d. r.v.s distributed according to $F$, $G$, we have

$$\begin{aligned}
\zeta_r(TF, TG) &= \sup\Big\{\Big|Ef\Big(\sum_{i=1}^K X_i Z_i + Y\Big) - Ef\Big(\sum_{i=1}^K X_i W_i + Y\Big)\Big|;\ |f^{(m)}(x) - f^{(m)}(y)| \le |x - y|^\alpha\Big\} \\
&\le \int \zeta_r\Big(\sum_{i=1}^k x_i Z_i + y, \sum_{i=1}^k x_i W_i + y\Big)\,dP^{(X,Y,K)}(x, y, k) \\
&\le \int \sum_{i=1}^k |x_i|^r\,\zeta_r(Z_i, W_i)\,dP^{(X,Y,K)}(x, y, k) \\
&= E\Big(\sum_{i=1}^K |X_i|^r\Big)\zeta_r(F, G).
\end{aligned}$$

□
Note that for the recursion defined by $T$ the first two moments are matched. Therefore, we can apply $\zeta_r$ with $r \le 3$ and obtain as a corollary the following theorem.

Theorem 9.2.16 Suppose either $c \ne 1$ and $\ell_0 = \frac{EY}{1-c}$, or $c = 1$ and $EY = 0$. Suppose also that $v_0 = \frac{\operatorname{Var}(\ell_0 \sum_{i=1}^{K} X_i + Y)}{1-a}$ for $a < 1$. Then for $0 < r \le 3$, the inequality
$$a_r := E\sum_{i=1}^{K} |X_i|^r < 1$$
implies
$$\zeta_r(L_n, Z) \le \frac{a_r^n}{1 - a_r}\,\zeta_r(L_0, L_1) < \infty,$$
where $Z$ is a fixed point of $T$ in $M(\ell_0, v_0)$. In particular, $L_n$ converges in distribution to $Z$.

Therefore, in the case with immigration we also obtain an exponential rate of convergence. As a consequence, after a few iterations, the limiting distribution is already well approximated.

Consider the following example: $L_0 \stackrel{d}{=} \frac{1}{10}\delta_{-5} + \frac{2}{5}\delta_0 + \frac{1}{2}\delta_2$, $K = 2$, $X_1$, $X_2$ independent, $X_1 \stackrel{d}{=} X_2 \stackrel{d}{=} U[-\frac{1}{2}, 1]$, and $Y \stackrel{d}{=} \frac{17}{32}\delta_{-1} + \frac{5}{64}\delta_0 + \frac{25}{64}\delta_2$.
In this situation the assumptions of Theorem 9.2.16 are fulfilled. The fast convergence is confirmed by the closeness of the empirical distribution functions of L6 and L8 in the simulation described in Figure 9.4.
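The claim that the example fits Theorem 9.2.16 can be checked by direct computation. The sketch below does this in exact rational arithmetic. It assumes, since the definitions are not restated in this passage, that $c = E\sum_i X_i$ and $a = E\sum_i X_i^2$; all function names are illustrative.

```python
from fractions import Fraction as Fr

def moment_uniform(k, lo=Fr(-1, 2), hi=Fr(1)):
    """E U^k for U uniform on [lo, hi], k a nonnegative integer."""
    return (hi ** (k + 1) - lo ** (k + 1)) / ((k + 1) * (hi - lo))

def abs_moment_uniform(r, lo=-0.5, hi=1.0):
    """E|U|^r for U uniform on [lo, hi] with lo < 0 < hi and real r > 0."""
    return (abs(lo) ** (r + 1) + hi ** (r + 1)) / ((r + 1) * (hi - lo))

def mean_var(atoms):
    """Mean and variance of a discrete law given as {point: probability}."""
    m1 = sum(p * x for x, p in atoms.items())
    m2 = sum(p * x * x for x, p in atoms.items())
    return m1, m2 - m1 * m1

K = 2
L0 = {-5: Fr(1, 10), 0: Fr(2, 5), 2: Fr(1, 2)}
Y = {-1: Fr(17, 32), 0: Fr(5, 64), 2: Fr(25, 64)}

EX, EX2 = moment_uniform(1), moment_uniform(2)
c = K * EX    # assumed meaning of c: E(X_1 + ... + X_K)
a = K * EX2   # assumed meaning of a: E(X_1^2 + ... + X_K^2)
EY, VarY = mean_var(Y)
l0 = EY / (1 - c)
# X_1, X_2, Y independent, so the variance of l0*(X_1 + X_2) + Y is additive.
v0 = (l0 ** 2 * K * (EX2 - EX ** 2) + VarY) / (1 - a)

def a_r(r):
    """Contraction factor a_r = E sum_i |X_i|^r for the example."""
    return K * abs_moment_uniform(r)

def zeta_bound_factor(r, n):
    """Factor a_r^n / (1 - a_r) multiplying zeta_r(L_0, L_1) in the theorem."""
    ar = a_r(r)
    return ar ** n / (1 - ar)

print("l0 =", l0, " v0 =", v0, " (mean, var) of L0 =", mean_var(L0))
print("a_2 =", a_r(2), " a_3 =", a_r(3))
```

Under these assumptions the mean and variance of $L_0$ coincide with $\ell_0$ and $v_0$, and $a_3$ is smaller than $\sqrt{a}$, which illustrates why $\zeta_3$ gives the sharper rate.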
FIGURE 9.4. Empirical distribution functions for $L_6$ and $L_8$; the difference between the two curves is hardly visible
9.2.3 Limiting Distribution of the Collision Resolution Interval
In this section we apply the method of probability metrics to investigate the contraction properties of stochastic algorithms arising in random-access communication protocols. The results are due to Feldman, Rachev, and Rüschendorf (1994); see also Rachev and Rüschendorf (1995).

The Capetanakis–Tsybakov–Mikhailov (CTM) protocol is one of the most elegant solutions to the classical multiple-access problem, in which a large population of users share a single communication channel. Throughput of this protocol is close to the throughput of the slotted Aloha protocol. The CTM protocol, unlike the classical “slotted Aloha,” is inherently stable. The “tree splitting protocols,” of which the CTM protocol is an example, pose some interesting mathematical problems and have been the subject of intensive study.

We briefly review the definition of the CTM protocol; see Bertsekas and Gallager (1987). Time is divided into slots of equal duration. During each slot, one of the following events occurs:
1. The slot is wasted because no one transmits.
2. Exactly one user transmits a message, in which case the message is successfully received.
3. The slot is wasted because two or more users transmit, interfering with each other. This is called a collision.
At the end of each slot, every user knows which of these three events occurred (this is sometimes called “trinary feedback”).

When a collision occurs, all users involved (those that transmitted during the slot) divide themselves into two groups on a random basis. Each user performs the equivalent of an independent coin toss in order to make its decision; $p$ is the probability that a user selects the first group. Users in the first group retransmit their messages during the slot following the one in which the collision occurred; users in the second group defer their retransmissions until all users in the first group have successfully transmitted their messages.
If one of these groups contains more than one user, another collision will occur, in which case this group divides in the same way. Collisions are resolved on a last-come first-served (LCFS) basis; i.e., the most recent collision is resolved before any prior collisions. We assume that new messages are generated according to a Poisson process with aggregate rate λ. Users who have transmitted a message that collided do not generate any new messages until their messages have been transmitted; however, since only a finite number of users are involved in
any collision, the rate $\lambda$ remains constant when the total user population is infinite. Denote by $L_n$ the number of slots required for resolution of a collision between $n$ users. $L_n$ includes the slot in which the initial collision occurred, plus the times for the two groups of users to transmit their messages. It is easily seen that the following stochastic recursion holds:
$$L_n \stackrel{d}{=} 1 + L_{I_n + X} + \bar L_{n - I_n + Y}, \qquad n \ge 2, \tag{9.2.37}$$
with initial conditions $L_0 = L_1 = 1$. Here $I_n \stackrel{d}{=} B(n, p)$ is the number of users who retransmit immediately, $X$ is the number of new arrivals in the collision slot, and $Y$ is the number of new arrivals during the slot in which the deferred retransmissions occur. Moreover, $L_n \stackrel{d}{=} \bar L_n$, and the random quantities $X$, $Y$, $(L_n)_{n \ge 0}$, $(\bar L_n)_{n \ge 0}$ are assumed to be mutually independent. For real systems, the total number of users sharing a multiple-access channel might be as large as $10^3$ or $10^4$, but the number $n$ of users involved in any collision would be a small fraction of this.

Fayolle et al. (1985) showed that $\lim_{n \to \infty} EL_n/n$ exists if
$$\log p / \log(1 - p) \ \text{is irrational}; \tag{9.2.38}$$
otherwise, $EL_n/n$ oscillates around a certain value. In a subsequent paper, Fayolle et al. (1986) proved the linearity of the variance of $L_n$ under (9.2.38) and the finiteness of all moments of $L_n$. Confirming a conjecture of Massey (1981), Regnier and Jacquet (1989) proved that the variance of $L_n$ is not linear for $I_n \stackrel{d}{=} B(n, p)$, $p = 1/2$, and $X = Y = 0$. In Jacquet and Regnier (1988) and Regnier and Jacquet (1989) the asymptotic normality of the standardized sequence $\{L_n\}$ (for $X = Y = 0$ or both Poisson) was established.

In this section we examine the asymptotic normality of the law of $L_n$ without the specific assumptions on the distribution type of $I_n$, $X$, and $Y$, provided that the variance of $L_n$ is asymptotically linear. In the second part of the section we numerically investigate the influence of nonlinearity in the case $I_n \stackrel{d}{=} B(n, p)$, $X = Y = 0$, and $p = \frac{1}{2}$. It turns out that $EL_n/n$ and $(\operatorname{Var} L_n)/n$ increase monotonically with $n$ until $n$ reaches a large value ($n = 39{,}488$). After that, the linearity breaks down, in agreement with the theoretical results. We consider the simple normalization
$$Y_n = (L_n - \ell_n)/\sqrt{n}, \qquad \text{where } \ell_n = EL_n. \tag{9.2.39}$$
The main theoretical result indicates that normality holds if the variance behaves linearly and the numbers of retransmissions are not concentrated too much in the extremes. In this sense the result can be considered as a stability result for the asymptotic distribution.
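The recursion (9.2.37) can be made concrete by simulating the tree of slots directly. The following Python sketch covers only the special case $X = Y = 0$ (no new arrivals) with $L_0 = L_1 = 1$; the function name and the use of an explicit stack are choices made for this illustration, not part of the protocol description.

```python
import random

def resolve_collision(n, p=0.5, rng=random):
    """Simulate L_n: slots needed to resolve a collision among n users (X = Y = 0)."""
    slots = 0
    stack = [n]  # sizes of the groups that still have to use a slot
    while stack:
        k = stack.pop()
        slots += 1  # every group occupies one slot: idle, success, or collision
        if k >= 2:
            # Collision: each user joins the first group with probability p.
            first = sum(rng.random() < p for _ in range(k))
            stack.append(k - first)  # deferred group, served later
            stack.append(first)      # first group, served in the next slot
    return slots

if __name__ == "__main__":
    rng = random.Random(0)
    n = 100
    sample = [resolve_collision(n, 0.5, rng) for _ in range(2000)]
    mean = sum(sample) / len(sample)
    print("estimated E L_n / n for n = 100:", mean / n)
```

With $X = Y = 0$ every resolution corresponds to a full binary tree of slots, so each simulated value is odd; for $n = 2$ and $p = 1/2$ the mean is 5 and the variance is 8.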
This idea of stability is confirmed by simulations for some cases of immigration in part (b) of this section. In the numerical study we detect the theoretically predicted instability, but only for extremely large $n$ and with a practically negligible order of magnitude. Our simulation study confirms the stability in the standard model concerning dependence on $p$. Moreover, a simulation study of the empirical d.f. of $Y_n$ confirms the normality for $10^2 \le n \le 10^4$. The “instability” of $EL_n/n$ and $\operatorname{Var} L_n/n$, and hence the fluctuation of the limit distribution of $Y_n$, arises for very large values of $n$, $n \gg 10^4$. The order of magnitude of the instability is seen from our numerical results and simulation study to be extremely small (but existent, in accordance with the theoretical results) and can be neglected from the practical point of view. This has the valuable consequence that in practical applications one can use just simple linear normalizations as in (9.2.39) and the normal approximation also for $n$ “moderately” large, $10^2 \le n \le 10^4$.

(a) Asymptotic Normality of the Law of $L_n$

In this section the asymptotic normality of $Y_n$ is shown under the following assumptions: For some $r \in (2, 3]$,
(a) $EX^{r/2} + EY^{r/2} < \infty$ and $\frac{I_n}{n} \xrightarrow{L_r} p \in (0, 1)$;
(b) $\sigma_n^2 = (\operatorname{Var} L_n)/n \to \sigma^2$;
(c) $\sup_n E|Y_n|^r < \infty$ for some $r \in (2, 3]$.
Conditions (b), (c) amount to the correctness of the normalization in (9.2.39). Condition (a) implies that the subgroups are not allowed to be extremely large or small. Note that the number of retransmitting users $I_n$ is not necessarily binomial in our assumptions. This allows us, for example, to consider departures from independence in the protocol. Regnier and Jacquet (1989) showed that (a), (b), and (c) hold for $I_n \stackrel{d}{=} B(n, p)$, (9.2.38), and $X = Y = 0$. More generally, one can allow $X \stackrel{d}{=} Y \stackrel{d}{=} \mathrm{Pois}(\lambda)$.

Theorem 9.2.17 Under (a), (b), and (c), the distribution of $Y_n$ is asymptotically $N(0, \sigma^2)$.

Proof: From the definitions of $L_n$, $Y_n$, and (9.2.37), (9.2.39),
$$Y_n \stackrel{d}{=} \Big(\frac{I_n + X}{n}\Big)^{1/2} Y_{I_n + X} + \Big(\frac{n - I_n + Y}{n}\Big)^{1/2} \bar Y_{n - I_n + Y} + C_n(I_n, X, Y). \tag{9.2.40}$$
Here $\bar Y_n$ is an independent copy of $Y_n$, and $C_n(k, m, \bar m) := n^{-1/2}(1 + \ell_{k+m} + \ell_{n-k+\bar m} - \ell_n)$. Define a sequence of normal $N(0, \sigma_n^2)$-distributed independent r.v.s $Z_n$ that are independent of $(I_n)$, $X$, and $Y$, and let
$$Z_n^* = \Big(\frac{I_n + X}{n}\Big)^{1/2} Z_{I_n + X} + \Big(\frac{n - I_n + Y}{n}\Big)^{1/2} \bar Z_{n - I_n + Y} + C_n(I_n, X, Y),$$
where $\bar Z_n$ is an independent version of $Z_n$. $Z_n^*$ is an accompanying sequence to $Y_n$. Let $\mu_r$ be one of the following ideal metrics of order $r > 0$:
$$\mu_r^{(1)}(X, Y) = \sup\{|E(f(X) - f(Y))|;\ \|f^{(s)}\|_q \le 1\}, \qquad r = s + 1/p,\ s \in \mathbb{N},\ p \in [1, \infty], \ \text{with } \frac{1}{p} + \frac{1}{q} = 1,$$
$$\mu_r^{(2)}(X, Y) = \sup_{t \in \mathbb{R}} |t|^{-r}\, |E e^{itX} - E e^{itY}|;$$
$$\mu_r^{(3)}(X, Y) = \sup_{h \in \mathbb{R}} |h|^r \sup_{A \in \mathcal{B}(\mathbb{R})} |P(X + hN \in A) - P(Y + hN \in A)|,$$
where $N$ is a standard normal r.v. independent of $X$ and $Y$.

Claim 1. ($\mu_r$-closeness of $Z_n^*$ and $Y_n$) Set $a_n = \mu_r(Z_n, Y_n)$ and suppose $a := \sup_n a_n < \infty$. Then
$$\sup_n \mu_r^{(i)}(Z_n^*, Y_n) \le a[p^{r/2} + (1 - p)^{r/2}]. \tag{9.2.41}$$
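For $\mu_r = \mu_r^{(i)}$ ($i = 1, 2, 3$),
$$\begin{aligned}
\mu_r(Z_n^*, Y_n) &\le \sum_{k, m, \bar m} P(I_n = k, X = m, Y = \bar m)\; \mu_r\Big(\sqrt{\tfrac{k+m}{n}}\, Z_{k+m} + \sqrt{\tfrac{n-k+\bar m}{n}}\, \bar Z_{n-k+\bar m} + C_n(k, m, \bar m), \\
&\qquad\qquad \sqrt{\tfrac{k+m}{n}}\, Y_{k+m} + \sqrt{\tfrac{n-k+\bar m}{n}}\, \bar Y_{n-k+\bar m} + C_n(k, m, \bar m)\Big) \\
&\le \sum_{k, m, \bar m} P(I_n = k, X = m, Y = \bar m)\, a \Big[\Big(\frac{k+m}{n}\Big)^{r/2} + \Big(\frac{n-k+\bar m}{n}\Big)^{r/2}\Big] \\
&= a\, E\Big[\Big(\frac{I_n + X}{n}\Big)^{r/2} + \Big(\frac{n - I_n + Y}{n}\Big)^{r/2}\Big].
\end{aligned}$$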
Using assumption (a), the right-hand side of the above inequality converges to $a[p^{r/2} + (1 - p)^{r/2}]$.

Claim 2. (Condition (9.2.41) holds)
$$a \le C \sup_n (E|Y_n|^r + E|Z_n|^r) < \infty. \tag{9.2.42}$$
(Throughout this section, $C$ stands for an absolute constant that can have different values in different places.) For the proof, note that for $i = 1, 2$, or $3$, $\mu_r^{(i)}(X, Y) \le C(E|X|^r + E|Y|^r) < \infty$, provided that $E(X^j - Y^j) = 0$ for $j = 1, 2$ (see, for example, Rachev (1991, Chapters 14, 15)). Thus (9.2.42) holds.

Claim 3. (Asymptotic normality of $Z_n^*$) For $n \to \infty$, $b_n = \mu_r(Z_n, Z_n^*) \to 0$.

We consider the case $\mu_r = \mu_r^{(1)}$ only. Let $\kappa_r$ be the $r$th pseudomoment,
$$\kappa_r(X, Y) = r \int_{\mathbb{R}} |x|^{r-1} |F_X(x) - F_Y(x)|\, dx.$$
Then, since the mean and variance of $Z_n$ match those of $Z_n^*$ ($\mu_r(Z_n^*, Y_n) < \infty$ implies $E((Z_n^*)^j - Y_n^j) = 0$, $j = 1, 2$), it follows that $b_n \le C \kappa_r(Z_n, Z_n^*)$. Recall that $(Z_n)_{n \ge 1}$ is independent of $(I_n)_{n \ge 1}$ and $X$, $Y$. Let $N_0$ denote a standard normal r.v. independent of $(I_n)$ and $X$, $Y$. Consequently,
$$\begin{aligned}
Z_n^* &= \sqrt{\frac{I_n + X}{n}}\, Z_{I_n + X} + \sqrt{\frac{n - I_n + Y}{n}}\, \bar Z_{n - I_n + Y} + C_n(I_n, X, Y) \\
&\stackrel{d}{=} \Big(\frac{I_n + X}{n}\, \sigma^2_{I_n + X} + \frac{n - I_n + Y}{n}\, \sigma^2_{n - I_n + Y}\Big)^{1/2} N_0 + C_n(I_n, X, Y) \\
&=: \eta_n N_0 + C_n(I_n, X, Y).
\end{aligned}$$
From assumptions (a), (b) we get the convergence of $\eta_n$ in probability:
$$\eta_n \xrightarrow{P} (p\sigma^2 + (1 - p)\sigma^2)^{1/2} = \sigma.$$
Since $Z_n^* \stackrel{d}{=} \eta_n N_0 + C_n(I_n, X, Y)$ has the same mean and variance as $Z_n \stackrel{d}{=} \sigma_n N_0$, then
$$\sigma_n^2 = E(\eta_n N_0 + C_n(I_n, X, Y))^2 = E\eta_n^2 + E(C_n(I_n, X, Y))^2.$$
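As $\eta_n \xrightarrow{L_2} \sigma$, we conclude that $C_n(I_n, X, Y) \xrightarrow{P} 0$. This implies that $b_n = \mu_r(Z_n, Z_n^*) \to 0$, as desired in Claim 3.

With $a_n = \mu_r(Z_n, Y_n) \le \mu_r(Z_n^*, Y_n) + b_n$ and $\bar a = \limsup_n a_n$ we finally obtain from Claims 1–3 the following result:

Claim 4. $\bar a = 0$.

To prove the claim, choose $n_0 = n_0(\varepsilon)$ ($\varepsilon > 0$) such that $a_k \le \bar a + \varepsilon$ for $k > n_0$. Then for $n \ge n_0$, as in the proof of Claim 1, we have
$$\begin{aligned}
a_n &\le \mu_r(Z_n^*, Y_n) + b_n \\
&\le \Big(\sum_{k=0}^{n_0 - 1} + \sum_{k=n-n_0}^{n}\Big) P(I_n = k) \times \sup_{0 \le k \le n_0 - 1,\ n - n_0 \le k \le n} E\Big[\Big(\frac{k + X}{n}\Big)^{r/2} + \Big(\frac{n - k + Y}{n}\Big)^{r/2}\Big](a_{k+X} + a_{n-k+Y}) \\
&\quad + \sum_{k=n_0}^{n-n_0-1} P(I_n = k) \times E\Big[\Big(\frac{k + X}{n}\Big)^{r/2}(\bar a + \varepsilon) + \Big(\frac{n - k + Y}{n}\Big)^{r/2}(\bar a + \varepsilon)\Big] + b_n.
\end{aligned}$$
Recall Claim 2, $a = \sup_n a_n < \infty$, and thus as $n \to \infty$,
$$\begin{aligned}
\bar a &\le \limsup_n \Big(\sum_{k=0}^{n_0 - 1} + \sum_{k=n-n_0}^{n}\Big) P(I_n = k)\, 2a\, E(X^{r/2} + Y^{r/2}) + (\bar a + \varepsilon)(p^{r/2} + (1 - p)^{r/2}) + \limsup_n b_n \\
&= 0 + (\bar a + \varepsilon)(p^{r/2} + (1 - p)^{r/2}) + 0.
\end{aligned}$$
Since $r > 2$, we have $p^{r/2} + (1 - p)^{r/2} < 1$, which implies that $\bar a = 0$, and thus the proof of the theorem is complete, since $\mu_r$-convergence implies weak convergence. $\Box$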
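The final step of the proof uses only that $\kappa = p^{r/2} + (1 - p)^{r/2} < 1$ for $r > 2$, so that an inequality of the form $x \le (x + \varepsilon)\kappa$ forces $x \le \varepsilon\kappa/(1 - \kappa)$. A small Python sketch of this arithmetic follows; the function names are illustrative.

```python
def contraction_constant(p, r):
    """kappa = p^(r/2) + (1-p)^(r/2); equals 1 at r = 2 and is < 1 for r > 2."""
    return p ** (r / 2) + (1 - p) ** (r / 2)

def limsup_bound(eps, kappa):
    """Largest x >= 0 compatible with x <= (x + eps) * kappa when kappa < 1."""
    return eps * kappa / (1 - kappa)

if __name__ == "__main__":
    for r in (2.0, 2.5, 3.0):
        kappa = contraction_constant(0.5, r)
        print(r, kappa)
    # Letting eps -> 0 drives the bound on the limsup to 0.
    for eps in (1e-1, 1e-3, 1e-6):
        print(eps, limsup_bound(eps, contraction_constant(0.5, 3.0)))
```

At $r = 2$ the constant equals 1 and the argument gives no information, which is why the assumptions require $r \in (2, 3]$.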
Remark 9.2.18 Theorem 9.2.17 shows a remarkable stability of the central limit theorem for Ln . It says that the central limit theorem can be expected if the variance behaves approximately linearly and that it is even true under protocols that are not based on a binomial number of retransmitting users. In concrete examples it is not easy to obtain the asymptotic behavior of the first moments. Our method of proof separates this problem and establishes a general structural stability property concerning the asymptotic distribution. This should be of some interest for the application of the algorithm, too.
This stability is not clear or expected from the methods that established the central limit theorem up to now in some very special cases.

(b) Numerical Results

In the first part of this section we study numerically the extent of nonlinearity of $EL_n$, $\operatorname{Var} L_n$ in the special case of (9.2.37) where $X = Y = 0$, $I_n \stackrel{d}{=} B(n, p)$, $\log p/\log(1 - p)$ rational. Initial investigation of the behavior of the mean $\ell_n$ of $L_n$ at $p = 0.5$ failed to show the predicted instability of $\ell_n/n$. The normalized value $\ell_n/n$ seemed to converge rapidly, reaching a value of about 2.885 for $n = 2400$, and showing no variation out to 7 decimal places with further increase in $n$. The increments $\ell_n/n - \ell_{n-1}/(n - 1)$ were observed always to be positive, another indication of convergence. At $n = 38{,}488$, a negative increment appears, and subsequently, values of the increment oscillate in a sinusoidal fashion, with a peak magnitude of about $1 \times 10^{-10}$. The behavior of the increments is shown graphically in Figure 9.5 on a logarithmic scale.

FIGURE 9.5. Increments of $\ell_n/n$, $p = 0.5$, $n$ = number of users who initially collide
Based on recursions for the first moments, the numerical results for evaluation of $\ell_n/n$, $\Delta(\ell_n/n) := \frac{\ell_n}{n} - \frac{\ell_{n-1}}{n-1}$, $\mathrm{Var}_n := \operatorname{Var}(L_n/\sqrt{n})$, and $\Delta(\mathrm{Var}_n) := \mathrm{Var}_n - \mathrm{Var}_{n-1}$ are shown in Table 9.1. On the other hand, a change in the initial conditions disturbs the value of $\ell_n/n$ and $\mathrm{Var}_n$ (see Table 9.2). Table 9.1 confirms the stability of $\frac{\ell_n}{n} \approx 2.88$, $\mathrm{Var}_n \approx 3.38$ for moderate $n \in (10^2, 10^4)$ and $p = 0.5$. Slight perturbation of $p$ around 0.5 does not change the overall stability of $\frac{\ell_n}{n}$ for practically relevant $n$; see Figure 9.6.

TABLE 9.1. Numerical results, $p = 0.5$, $L_0 = L_1 = 1$

n        ℓ_n/n         Δ(ℓ_n/n)      Var_n         Δ(Var_n)
2        2.5000D+00    1.5000D+00    4.0000D+00    4.000D+00
3        2.5556D+00    5.5556D−02    3.2593D+00    −7.4074D−01
4        2.6310D+00    7.5397D−02    3.3832D+00    1.2396D−01
5        2.6838D+00    5.2857D−02    3.3875D+00    4.2812D−03
10       2.7853D+00    1.0985D−02    3.3832D+00    1.1672D−04
100      2.8754D+00    1.0113D−04    3.3834D+00    9.1046D−07
500      2.8834D+00    4.0528D−06    3.3834D+00    −8.5624D−08
1000     2.8844D+00    1.0224D−06    3.3834D+00    −4.1963D−08
5000     2.8852D+00    3.4639D−08    3.3834D+00    2.1844D−08
10000    2.8853D+00    7.3428D−09    3.3835D+00    −4.1539D−07

TABLE 9.2. $p = 0.5$, $L_0 = L_1 = 0$

n        ℓ_n/n         Δ(ℓ_n/n)      Var_n         Δ(Var_n)
2        1.0000E+00    1.000E+00     1.000E+00     1.000E+00
3        1.1111E+00    1.1111E−01    8.1481E−01    −1.8519E−01
4        1.1905E+00    7.9365E−02    8.4580E−01    3.0990E−02
5        1.2419E+00    5.1429E−02    8.4688E−01    1.0703E−03
10       1.3427E+00    1.1048E−02    8.4579E−01    2.9179E−05
100      1.4327E+00    1.0107E−04    8.4586E−01    2.2762E−07
500      1.4407E+00    4.0304E−06    8.4586E−01    −2.1503E−08
1000     1.4417E+00    1.0117E−06    8.4586E−01    −1.2471E−08

FIGURE 9.6. $|\Delta(\ell_n/n)|$, $p = 0.499$

Summarizing the numerical findings, it appears that for reasonably large $n \ge 100$ and $p = 0.5$ the nonlinearity of $\ell_n/n$ and $\mathrm{Var}_n$ is not observed in a practically relevant magnitude. Also, in this range of values of $n$ the behavior of $\ell_n$ and $\mathrm{Var}_n$ is stable with respect to $p$. The following simulations (Figures 9.7, 9.8, and 9.9) show a good agreement with the normal approximation for $n \ge 100$. For $n = 20$ or $n = 30$ the normal fit is no longer good. Further simulation results indicate stability with respect to the value of $p$.
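The “recursions for the first moments” behind Tables 9.1 and 9.2 can be sketched as follows for $X = Y = 0$. Conditioning on $I_n = k$ in (9.2.37) gives linear equations for $\ell_n = EL_n$ and $EL_n^2$ in which $L_n$ itself appears for $k = 0$ and $k = n$; solving for the $n$th term yields the loop below. This is a plain double-precision sketch with illustrative names, not the computation used for the tables, and it is only meant for moderate $n$.

```python
from math import comb

def ctm_moments(n_max, p=0.5, l_init=1.0):
    """Return (ell, var): ell[n] = E L_n, var[n] = Var L_n for X = Y = 0, L_0 = L_1 = l_init."""
    ell = [l_init, l_init]
    m2 = [l_init ** 2, l_init ** 2]  # second moments E L_n^2
    for n in range(2, n_max + 1):
        q = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
        edge = q[0] + q[n]  # probability that one group is empty
        # First moment: ell_n = 1 + sum_k q_k (ell_k + ell_{n-k}), solved for ell_n.
        inner1 = sum(q[k] * (ell[k] + ell[n - k]) for k in range(1, n))
        l_n = (1 + edge * ell[0] + inner1) / (1 - edge)
        # Second moment of 1 + A + B with A, B independent copies of L_k, L_{n-k}.
        inner2 = sum(
            q[k] * (1 + m2[k] + m2[n - k] + 2 * ell[k] + 2 * ell[n - k]
                    + 2 * ell[k] * ell[n - k])
            for k in range(1, n)
        )
        edge2 = edge * (1 + m2[0] + 2 * ell[0] + 2 * l_n + 2 * ell[0] * l_n)
        m_n = (inner2 + edge2) / (1 - edge)
        ell.append(l_n)
        m2.append(m_n)
    var = [m - l * l for m, l in zip(m2, ell)]
    return ell, var

if __name__ == "__main__":
    ell, var = ctm_moments(100)
    for n in (2, 3, 4, 5, 10, 100):
        print(n, ell[n] / n, var[n] / n)  # compare with the columns of Table 9.1
```

Calling `ctm_moments(n_max, l_init=0.0)` gives the setting of Table 9.2. Detecting the tiny oscillations reported for $n$ near 38,488 would need far larger $n$ and more careful arithmetic than this sketch provides.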
FIGURE 9.7. Simulation curve for $Y_n = (L_n - \ell_n)/\sigma_n$ for $n = 1000$, $p = 0.5$, $L_0 = L_1 = 1$, based on 936,725 trials, and the fitted normal curve with mean zero and variance 3.3834 as given in Table 9.1

FIGURE 9.8. Normal fit to empirical d.f. with $n = 50$, $p = 0.5$

FIGURE 9.9. Normal fit with $\sigma^2 = 3.3874$ to the simulated $Y_n$'s; $n = 1000$, $p = 0.49$, $L_0 = L_1 = 1$, based on 697,675 trials
In the final simulations (Figures 9.10, 9.11) we consider the case with nonzero immigrations $X$, $Y$ in a symmetric and a nonsymmetric case with masses at 0, 1, 2. These examples confirm the general robustness idea that asymptotic normality is approximately valid if the variances behave approximately linearly (which is observed in these examples empirically).

FIGURE 9.10. Normal fit for $n = 40/100$ and $X \sim \frac{3}{4}\delta_0 + \frac{1}{8}\delta_1 + \frac{1}{8}\delta_2$, $Y \sim \delta_0$ (nonsymmetric case)

FIGURE 9.11. Normal fit for $n = 50/10000$ and $X \sim Y \sim \frac{3}{4}\delta_0 + \frac{3}{16}\delta_1 + \frac{1}{16}\delta_2$ (symmetric case)
9.2.4 Quicksort
The quicksort algorithm, which was introduced by C.A.R. Hoare in 1961–1962, represents a standard sorting procedure in computer systems. From a list of $n$ arbitrary (but different) real numbers it selects an element $x$ randomly. Then the remaining numbers are divided into two groups, the group of numbers smaller than $x$ and that of numbers larger than $x$. The same procedure is applied to each of these groups if they contain more than one element. The algorithm ends with a sorted list of the original numbers.
If $L_n$ denotes the number of comparisons in the quicksort algorithm on its way to ordering $n$ elements $x_1, \ldots, x_n$, then $L_n$ satisfies the following recursive equation:
$$L_n \stackrel{d}{=} n - 1 + L_{I_n} + \bar L_{n - I_n}, \qquad L_0 = L_1 = 0, \quad L_2 = 1. \tag{9.2.43}$$
Here $I_n$, $n - I_n$ are the sizes of the subgroups, and they are assumed to be uniformly distributed on $\{0, \ldots, n - 1\}$. The expectation $\ell_n = EL_n$ then satisfies the recursion
$$\ell_n = n - 1 + \sum_{i=0}^{n-1} P(I_n = i)(\ell_i + \ell_{n-i}),$$
and therefore,
$$\frac{\ell_n}{n+1} = \frac{\ell_{n-1}}{n} + \frac{2(n-1)}{n(n+1)} = 2\sum_{i=1}^{n+1} \frac{1}{i} + \frac{2}{n+1} - 4.$$
This yields
$$\ell_n = 2n \ln n + n(2\gamma - 4) + 2\ln n + 2\gamma + 1 + O(n^{-1} \ln n), \tag{9.2.44}$$
where $\gamma \approx 0.5772$ is the Euler constant. Similarly, for $v_n = \operatorname{Var}(L_n)$, we have
$$v_n = \Big(7 - \frac{2}{3}\pi^2\Big) n^2 + o(n^2). \tag{9.2.45}$$
The normalized random sequence
$$Y_n = \frac{L_n - \ell_n}{n} \tag{9.2.46}$$
satisfies the recursion
$$Y_n \stackrel{d}{=} \frac{I_n}{n}\, Y_{I_n} + \frac{n - I_n}{n}\, \bar Y_{n - I_n} + C_n(I_n), \tag{9.2.47}$$
where
$$C_n(i) = \frac{n-1}{n} + \frac{1}{n} E(L_i + L_{n-i} - L_n).$$
As $n \to \infty$, $\frac{I_n}{n}$ converges to some random variable $\tau$ that is uniformly distributed on $[0, 1]$. Moreover, $C_n(I_n) = C_n(n \frac{I_n}{n})$ can be uniformly approximated as follows:
$$\sup_{x \in (0,1)} |C_n(\lceil nx \rceil) - C(x)| \le \frac{6}{n} \ln n + O(n^{-1}). \tag{9.2.48}$$
Here $C(x) = 2x \log x + 2(1 - x)\log(1 - x) + 1$, and $\lceil x \rceil$ is the smallest integer larger than or equal to $x$ (cf. Rösler (1991)). As a result we obtain the limiting equation
$$Y \stackrel{d}{=} \tau Y + (1 - \tau)\bar Y + C(\tau). \tag{9.2.49}$$
In particular, it yields recursive formulas for the moments of $Y$. Using as an accompanying sequence $Y_n^* := \tau Y + (1 - \tau)\bar Y + C_n(I_n)$, Rösler (1991) established the convergence of $Y_n$ to $Y$ for the $\ell_p$-metrics. From Proposition 9.1.3 there exists a unique solution $Y$ (in distribution) of the fixed-point equation (9.2.49).

Theorem 9.2.19 Let $Y$ denote the solution of (9.2.49). Then
$$\ell_p(Y_n, Y) \to 0. \tag{9.2.50}$$
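The closed form and the expansion (9.2.44) for the mean can be checked numerically. The Python sketch below uses the textbook form of the mean recursion, in which the pivot is removed and the subgroups have sizes $i$ and $n - 1 - i$ with $i$ uniform on $\{0, \ldots, n - 1\}$; this indexing is an assumption of the sketch, chosen because it reproduces the harmonic-sum identity displayed above. Function names are illustrative.

```python
from math import log

EULER_GAMMA = 0.5772156649015329

def quicksort_means(n_max):
    """ell[n] = expected number of comparisons for n distinct keys."""
    ell = [0.0, 0.0]
    prefix = ell[0]  # running sum ell[0] + ... + ell[n-1]
    for n in range(2, n_max + 1):
        prefix += ell[n - 1]
        ell.append(n - 1 + 2.0 * prefix / n)
    return ell

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

def mean_closed_form(n):
    """(n+1) * (2 H_{n+1} + 2/(n+1) - 4), the identity stated in the text."""
    return (n + 1) * (2.0 * harmonic(n + 1) + 2.0 / (n + 1) - 4.0)

def mean_asymptotic(n):
    """Right-hand side of (9.2.44) without the O(n^{-1} ln n) term."""
    g = EULER_GAMMA
    return 2 * n * log(n) + n * (2 * g - 4) + 2 * log(n) + 2 * g + 1

if __name__ == "__main__":
    ell = quicksort_means(1000)
    for n in (10, 100, 1000):
        print(n, ell[n], mean_closed_form(n), mean_asymptotic(n))
```

The recursion and the closed form agree up to rounding, and the gap to the asymptotic expression shrinks as $n$ grows.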
The simulation result shown in Figure 9.12 indicates that the density of $Y$ is very well approximated by a lognormal distribution (cf. Cramer (1996)). The maximal deviation of the fitted lognormal density and the smoothed empirical density is about 0.004.
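One simple way to obtain approximate samples of the fixed point $Y$ of (9.2.49) is to iterate the map on a large pool of values, drawing the two independent copies at random from the current pool. The Python sketch below does this; it is not the simulation method behind Figure 9.12, and the pool size, iteration count, and names are arbitrary choices for illustration.

```python
import math
import random

def toll(x):
    """C(x) = 2x log x + 2(1-x) log(1-x) + 1, with the convention 0 log 0 = 0."""
    def xlogx(u):
        return u * math.log(u) if u > 0.0 else 0.0
    return 2.0 * xlogx(x) + 2.0 * xlogx(1.0 - x) + 1.0

def sample_quicksort_limit(pool_size=20000, iterations=20, seed=0):
    """Approximate sample from the solution of Y = tau*Y + (1-tau)*Ybar + C(tau)."""
    rng = random.Random(seed)
    pool = [0.0] * pool_size  # start from the point mass at 0 (mean zero)
    for _ in range(iterations):
        new_pool = []
        for _ in range(pool_size):
            tau = rng.random()  # tau uniform on [0, 1)
            y1 = pool[rng.randrange(pool_size)]
            y2 = pool[rng.randrange(pool_size)]
            new_pool.append(tau * y1 + (1.0 - tau) * y2 + toll(tau))
        pool = new_pool
    return pool

if __name__ == "__main__":
    ys = sample_quicksort_limit()
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    print("sample mean:", mean)
    print("sample variance:", var, " target 7 - 2*pi^2/3 =", 7 - 2 * math.pi ** 2 / 3)
```

Because $E(\tau^2 + (1 - \tau)^2) = 2/3$, the variance of the pool contracts geometrically toward the constant in (9.2.45). The pool mean is not contracted and drifts slightly through sampling noise, so it is only approximately zero.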
FIGURE 9.12. Smoothed empirical density of quicksort. Simultaneously, a lognormal approximation is given

9.2.5 Limiting Behavior of Random Maxima
A sample of size $n$ is divided into two parts of random size $I_n$ and $n - I_n$, where $I_n$ is a random variable. We consider the recursion of “maxima type”
$$L_n \stackrel{d}{=} c_n + L_{I_n} \vee \bar L_{n - I_n}, \tag{9.2.51}$$
where $L_n$, $\bar L_n$ are independent and identically distributed r.v.s, $(I_n)$ are independent, and $(c_n)$ is a sequence of real numbers. Given $\alpha > 0$, let us introduce the normalizations
$$Y_n = n^{-1/\alpha} L_n, \qquad \bar Y_n = n^{-1/\alpha} \bar L_n. \tag{9.2.52}$$
By (9.2.51),
$$Y_n \stackrel{d}{=} c_n n^{-1/\alpha} + \Big(\frac{I_n}{n}\Big)^{1/\alpha} Y_{I_n} \vee \Big(\frac{n - I_n}{n}\Big)^{1/\alpha} \bar Y_{n - I_n}. \tag{9.2.53}$$
Suppose that $n^{-1/\alpha}$ is the right normalization to obtain the weak convergence results $Y_n \xrightarrow{D} Z$, $\bar Y_n \xrightarrow{D} \bar Z$, and moreover, let $\frac{I_n}{n} \to \tau$, where $\tau$ is a random variable independent of $Z$, $\bar Z$. Then, in the limit, we obtain the fixed-point equation
$$Z \stackrel{d}{=} \tau^{1/\alpha} Z \vee (1 - \tau)^{1/\alpha} \bar Z. \tag{9.2.54}$$
It is easy to check that, for example, the extreme value distribution
$$F_Z(x) = \begin{cases} e^{-x^{-\alpha}}, & x > 0, \\ 0, & x \le 0 \end{cases} \tag{9.2.55}$$
satisfies (9.2.54). As a motivation for the study of equation (9.2.51), consider $L_n = \max\{X_1, \ldots, X_n\}$, $c_n = 0$, and assume that $(X_i)$ are i.i.d. r.v.s of Paretian type ($F(x) \sim 1 - x^{-\alpha}$ for $x \to \infty$). Then by Gnedenko's extreme-value theorem, $Y_n = n^{-1/\alpha} L_n \xrightarrow{D} Z$, with $F_Z$ as in (9.2.55). Note also that formula (9.2.51) concerns some modifications of this recursion, where the maxima are produced by a (random) scheme determined by $I_n$ (for example $I_n \stackrel{d}{=} B(n, p)$) and $c_n$ corresponding to some weighting of the number of steps in this reduction (for example $c_n = 1$). Furthermore, note that (9.2.51) with $c_n = 1$ also describes the maximum search length of a search algorithm dividing a slot of size $n$ successively into two parts of size $I_n$ and $n - I_n$, respectively.

Define next $a_k := \ell_r(Y_k, Z)$ for $0 < \alpha < r \le 1$ or $a_k := \rho_r(Y_k, Z)$, $\rho_r$ the weighted Kolmogorov metric (cf. (9.1.16)) for $1 \le \alpha < r < \alpha + 1$, and consider the following assumptions:
$$\limsup_k a_k < \infty, \qquad c_n n^{-1/\alpha} \to 0, \qquad \frac{I_n}{n} \to \tau \ \text{a.s. with } E(\tau^{r/\alpha} + (1 - \tau)^{r/\alpha}) < 1. \tag{9.2.56}$$
The first assumption corresponds to the condition that $n^{-1/\alpha}$ is the right normalization for $L_n$ (as for example in the case $L_n = \max(X_1, \ldots, X_n)$). If $I_n \stackrel{d}{=} B(n, p)$, then $\frac{I_n}{n} \to p$, and for $\alpha < r$ we have $p^{r/\alpha} + (1 - p)^{r/\alpha} < 1$.
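Both the fixed-point property of the extreme value distribution (9.2.55) and the contraction condition in (9.2.56) for a constant limit $\tau = p$ reduce to elementary identities, which the following Python sketch evaluates on a few points. For fixed $\tau \in (0, 1)$ the d.f. of $\tau^{1/\alpha} Z \vee (1 - \tau)^{1/\alpha} \bar Z$ is the product of the two rescaled d.f.s; the function names are illustrative.

```python
import math

def frechet_cdf(x, alpha):
    """F_Z(x) = exp(-x^(-alpha)) for x > 0 and 0 otherwise, as in (9.2.55)."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0

def max_mixture_cdf(x, tau, alpha):
    """D.f. at x of tau^(1/alpha) Z v (1-tau)^(1/alpha) Zbar for fixed tau in (0, 1)."""
    return (frechet_cdf(x / tau ** (1 / alpha), alpha)
            * frechet_cdf(x / (1 - tau) ** (1 / alpha), alpha))

def contraction_factor(p, r, alpha):
    """p^(r/alpha) + (1-p)^(r/alpha); below 1 exactly when r > alpha (0 < p < 1)."""
    return p ** (r / alpha) + (1 - p) ** (r / alpha)

if __name__ == "__main__":
    for tau in (0.1, 0.5, 0.9):
        for x in (0.2, 1.0, 5.0):
            print(tau, x, max_mixture_cdf(x, tau, 2.0), frechet_cdf(x, 2.0))
    print(contraction_factor(0.3, 1.5, 1.0))
```

Since the identity holds for every fixed $\tau$, it also holds after averaging over a random $\tau$ independent of $Z$, $\bar Z$, which is the statement that $F_Z$ solves (9.2.54).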
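Theorem 9.2.20 Let $(L_n)$ satisfy the recursion (9.2.51). Define $a_k := \ell_r(Y_k, Z)$ if $0 < \alpha < r \le 1$, or $a_k := \rho_r(Y_k, Z)$ if $0 \le \alpha < r < \alpha + 1$, and let $F_Z$ be as in (9.2.55). Then assumption (9.2.56) implies $\lim_k a_k = 0$.

Proof: We consider first the case $0 < \alpha < r \le 1$ and $a_k = \ell_r(Y_k, Z)$. Let $(Z_i)$ be an i.i.d. sequence with common extreme-value distribution (9.2.55), and define
$$Z_n^* = n^{-1/\alpha} c_n + \Big(\frac{I_n}{n}\Big)^{1/\alpha} Z_{I_n} \vee \Big(\frac{n - I_n}{n}\Big)^{1/\alpha} \bar Z_{n - I_n}. \tag{9.2.57}$$
Then
$$\begin{aligned}
\ell_r(Y_n, Z_n^*) &= \ell_r\Big(\Big(\frac{I_n}{n}\Big)^{1/\alpha} Y_{I_n} \vee \Big(\frac{n - I_n}{n}\Big)^{1/\alpha} \bar Y_{n - I_n},\ \Big(\frac{I_n}{n}\Big)^{1/\alpha} Z_{I_n} \vee \Big(\frac{n - I_n}{n}\Big)^{1/\alpha} \bar Z_{n - I_n}\Big) \\
&\le \sum_{k=1}^{n} P(I_n = k)\, \ell_r\Big(\Big(\frac{k}{n}\Big)^{1/\alpha} Y_k \vee \Big(\frac{n - k}{n}\Big)^{1/\alpha} \bar Y_{n-k},\ \Big(\frac{k}{n}\Big)^{1/\alpha} Z_k \vee \Big(\frac{n - k}{n}\Big)^{1/\alpha} \bar Z_{n-k}\Big) \\
&\le \sum_{k=1}^{n} P(I_n = k) \Big[\Big(\frac{k}{n}\Big)^{r/\alpha} a_k + \Big(\frac{n - k}{n}\Big)^{r/\alpha} a_{n-k}\Big] \\
&= E\Big(\frac{I_n}{n}\Big)^{r/\alpha} a_{I_n} + E\Big(\frac{n - I_n}{n}\Big)^{r/\alpha} a_{n - I_n}.
\end{aligned} \tag{9.2.58}$$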
The arguments in deriving the above bounds rely on the “ideality” of $\ell_r$ with respect to the maxima scheme.(1) Define $b_n := \ell_r(Z_n^*, Z_n)$ and let us use the bound
$$\ell_r(X, Y) \le (\ell_1(X, Y))^r \tag{9.2.59}$$
for any $r \le 1$, which is a simple consequence of the Monge–Kantorovich theorem; recall that $\ell_1(X, Y) = \int |F_X(x) - F_Y(x)|\, dx$.

(1) Recall that a metric $\mu(X, Y) = \mu(F_X, F_Y)$ in the space of distribution functions is called ideal with respect to the maxima scheme (or max-ideal) of order $r > 0$ if for any $c > 0$ and independent $X$, $Y$, and $Z$, $\mu(cX \vee Z, cY \vee Z) \le c^r \mu(X, Y)$; see Rachev (1991) and Rachev and Rüschendorf (1991).

Claim 1. $b_n \to 0$. To show the claim, we apply (9.2.57) to obtain
$$\begin{aligned}
b_n^{1/r} &\le \ell_1\Big(n^{-1/\alpha} c_n + \Big(\frac{I_n}{n}\Big)^{1/\alpha} Z_{I_n} \vee \Big(\frac{n - I_n}{n}\Big)^{1/\alpha} \bar Z_{n - I_n},\ Z_n\Big) \\
&\le \sum_{k=1}^{n} P(I_n = k)\, \ell_1\Big(n^{-1/\alpha} c_n + \Big(\frac{k}{n}\Big)^{1/\alpha} Z_k \vee \Big(\frac{n - k}{n}\Big)^{1/\alpha} \bar Z_{n-k},\ Z_n\Big) \\
&= \sum_{k=1}^{n} P(I_n = k)\, \ell_1\Big(c_n n^{-1/\alpha} + \Big(\frac{k}{n} + \frac{n - k}{n}\Big)^{1/\alpha} Z,\ Z\Big).
\end{aligned}$$
In the above bound we have used that the extreme value distributions $Z_n$ satisfy the max-stability property:
$$\Big(\frac{k}{n}\Big)^{1/\alpha} Z_k \vee \Big(\frac{n - k}{n}\Big)^{1/\alpha} \bar Z_{n-k} \stackrel{d}{=} \Big(\frac{k}{n} + \frac{n - k}{n}\Big)^{1/\alpha} Z = Z.$$
Therefore, $b_n^{1/r} \le c_n n^{-1/\alpha} \to 0$, proving the claim.
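Applying the triangle inequality and (9.2.58), (9.2.59), we have
$$a_n \le \ell_r(Y_n, Z_n^*) + \ell_r(Z_n^*, Z_n), \tag{9.2.60}$$
and therefore,
$$a \le a\, E(\tau^{r/\alpha} + (1 - \tau)^{r/\alpha}) + 0, \qquad \text{with } a := \limsup_n a_n, \tag{9.2.61}$$
implying that $a = 0$.

Next, we shall make use of the weighted Kolmogorov metric $\rho_r$ (cf. (9.1.16)). It is easy to check that for $\varepsilon \ge 1$ and $X, Y \ge 0$,
$$\rho_r(X + a, Y + \varepsilon) \le (\varepsilon a)^r + \varepsilon^r \rho_r(X, Y). \tag{9.2.62}$$
Define then $a_k = \rho_r(Y_k, Z)$, $b_k = \rho_r(Z_k^*, Z_k)$. By (9.2.62) (with $\varepsilon = 1$),
$$\begin{aligned}
\rho_r(Y_n, Z_n^*) &\le c_n^r n^{-r/\alpha} + \rho_r\Big(\Big(\frac{I_n}{n}\Big)^{1/\alpha} Z_{I_n} \vee \Big(\frac{n - I_n}{n}\Big)^{1/\alpha} \bar Z_{n - I_n},\ \Big(\frac{I_n}{n}\Big)^{1/\alpha} Y_{I_n} \vee \Big(\frac{n - I_n}{n}\Big)^{1/\alpha} \bar Y_{n - I_n}\Big) \\
&\le c_n^r n^{-r/\alpha} + E\Big(\frac{I_n}{n}\Big)^{r/\alpha} a_{I_n} + E\Big(\frac{n - I_n}{n}\Big)^{r/\alpha} a_{n - I_n}.
\end{aligned}$$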
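This implies that
$$\limsup_n \rho_r(Y_n, Z_n^*) \le E(\tau^{r/\alpha} + (1 - \tau)^{r/\alpha})\, a \qquad \text{for } r > \alpha. \tag{9.2.63}$$
If $\alpha < r \le \alpha + 1$, then
$$\rho_r(Z_n, Z_n^*) \le \sum_{k=1}^{n} P(I_n = k)\, \rho_r(c_n n^{-1/\alpha} + Z_n, Z_n) = \rho_r(c_n n^{-1/\alpha} + Z, Z).$$
We next prove that $\rho_r(a + Z, Z) \to 0$ for $a \to 0$. Let
$$\chi(x, a) := x^r |F_Z(x) - F_{Z+a}(x)| = x^r |e^{-x^{-\alpha}} - e^{-(x - a)^{-\alpha}}|.$$
Then $\sup_{0 \le x \le a} \chi(x, a) \le a^r$, $\sup_{a \le x \le 2a} \chi(x, a) \le (2a)^r e^{-a^{-\alpha}}$. Furthermore,
$$\begin{aligned}
\sup_{2a \le x \le 1} x^r |e^{-x^{-\alpha}} - e^{-(x - a)^{-\alpha}}| &= \sup_{2a \le x \le 1} x^r \alpha \Big|\int_{x-a}^{x} y^{-\alpha - 1} e^{-y^{-\alpha}}\, dy\Big| \\
&\le \sup_{2a \le x \le 1} x^r \alpha a\, \frac{1}{(x - a)^{\alpha + 1}}\, e^{-x^{-\alpha}} \\
&\le \sup_{2a \le x \le 1} \Big(\frac{x}{x - a}\Big)^r \frac{\alpha a}{(x - a)^{\alpha + 1 - r}}\, e^{-1} \\
&\le 2^r \alpha a^{r - \alpha} e^{-1}
\end{aligned}$$
and
$$\sup_{1 \le x < \infty} x^r \alpha \int_{x-a}^{x} y^{-\alpha - 1} e^{-y^{-\alpha}}\, dy \le \sup_{1 \le x < \infty} \alpha a\, \frac{x^{\alpha + 1}}{(x - a)^{\alpha + 1}}\, e^{-x^{-\alpha}} \le 2^{\alpha + 1} \alpha a.$$
Combining all this we obtain $\rho_r(a + Z, Z) \le C a^{r - \alpha}$ for $\alpha < r \le \alpha + 1$. (Note that $\sup_x \chi(x, u) = \infty$ for $r > \alpha + 1$.) Applying again the triangle inequality, we finally obtain
$$a = \limsup_n \rho_r(Y_n, Z) \le \limsup_n \rho_r(Y_n, Z_n^*) + \limsup_n b_n \le E(\tau^{r/\alpha} + (1 - \tau)^{r/\alpha})\, a,$$
which indeed implies $a = 0$. $\Box$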
A similar study of logarithmic normalizations for max-search algorithms is provided by Cramer (1995a).
9.2.6 Random Recursion Arising in Probabilistic Modeling: Limit Laws
In this and the next section we study various random recursions arising in probabilistic modeling. In 9.2.6 we shall discuss the limiting behavior of these recursions and describe the limit distributions, and in the next section we estimate the rate of convergence to the corresponding limit.(2) Let $\{(Y_n, Z_n)\}_{n \ge 1}$ be a sequence of i.i.d. random vectors in $\mathbb{R}^2$. Define the random recursion $(S_n^*)$ by
$$S_n^* = S_{n-1}^* Z_n + Y_n Z_n, \qquad n = 1, 2, \ldots; \quad S_0^* = 0. \tag{9.2.64}$$
The processes $\{S_n\}_{n \ge 1}$ and $\{S_n^*\}_{n \ge 1}$ have appeared in a variety of situations. The random recursion (9.2.64) is often written in an equivalent form,
$$S_n^* = S_{n-1}^* A_n + B_n, \qquad n = 1, 2, \ldots, \tag{9.2.65}$$
for a sequence of i.i.d. random vectors $\{(A_n, B_n)\}_{n \ge 1}$.(3) Alternatively to (9.2.64) we can introduce the process(4)
$$S_n = \sum_{i=1}^{n} Y_i \prod_{j=1}^{i} Z_j, \qquad n = 1, 2, \ldots. \tag{9.2.66}$$

(2) The results of Sections 9.2.6 and 9.2.7 are due to Rachev and Samorodnitsky (1995); see also the references therein.

(3) Typically, the Markov chain (9.2.65) is supplied with an initial state $S_0^* = B_0$ for a random variable $B_0$ independent of the sequence $\{(A_n, B_n)\}_{n \ge 1}$. In the ergodic case the limit distribution of $S_n^*$ is, of course, independent of the distribution of $B_0$. The recursion (9.2.65) arises in many applications, and as pointed out by Vervaat (1979), $S_n^*$ can be regarded as the “wealth” at time $n$; $A_n$ is the relative change of the “wealth” between times $n$ and $n - 1$ due to a “quality” change in the “environment”: inflation, change of an exchange rate, erosion, spoilage, decay, etc. $B_n$ represents the added (or removed) “wealth” just prior to time $n$. Applications of this model are abundant in the literature. Uppuluri et al. (1967) and Paulson and Uppuluri (1972) used this model to represent the evolution of a stock of a radioactive material. Chandrasekhar and Munch (1950) studied the fluctuations in brightness of the Milky Way. Cavalli-Sforza and Feldman (1973) and Cavalli-Sforza (1975) modeled evolution and cultural inheritance. Applications to investment models can be found in Lassner (1974a) and Perrakis and Henin (1974). A particular subclass of random recursions (9.2.65), the so-called ARCH (autoregressive conditional heteroskedastic) processes, has been used in mathematical finance to model data exhibiting clusters; see Domowitz and Hakkio (1985) and Hsieh (1988) for modeling exchange rate yields, and Engle et al. (1987) and Bollerslev (1987) for modeling stock returns.

(4) The stochastic process (9.2.66) has been used by Todorovic and Gani (1987) and Todorovic (1987) to model the effect of environmental changes on crop production; see also Puri (1987).
It is obvious that the two models (9.2.66) and (9.2.64) are closely related. The stochastic processes $\{S_n\}_{n \ge 1}$ and $\{S_n^*\}_{n \ge 1}$, although not equal, in general, in finite-dimensional distributions, have equal two-dimensional distributions: More precisely, $(S_n, S_{n+1}) \stackrel{d}{=} (S_n^*, S_{n+1}^*)$ for each $n = 1, 2, \ldots$. However, we may have $(S_n, S_{n+1}, S_{n+2}) \stackrel{d}{\ne} (S_n^*, S_{n+1}^*, S_{n+2}^*)$.(5)
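The link between (9.2.64) and (9.2.66) can be seen pathwise: unrolling (9.2.64) gives $S_n^* = \sum_{i=1}^{n} Y_i \prod_{j=i}^{n} Z_j$, which is exactly (9.2.66) evaluated on the time-reversed sequence. The Python sketch below checks this identity, which illustrates only why $S_n$ and $S_n^*$ have the same law for each fixed $n$ when the pairs $(Y_i, Z_i)$ are i.i.d.; the function names are illustrative.

```python
import random

def forward_sum(ys, zs):
    """S_n = sum_i y_i * prod_{j <= i} z_j, as in (9.2.66)."""
    total, prod = 0.0, 1.0
    for y, z in zip(ys, zs):
        prod *= z
        total += y * prod
    return total

def backward_recursion(ys, zs):
    """S*_n from S*_k = (S*_{k-1} + y_k) * z_k with S*_0 = 0, as in (9.2.64)."""
    s = 0.0
    for y, z in zip(ys, zs):
        s = (s + y) * z
    return s

if __name__ == "__main__":
    rng = random.Random(1)
    ys = [rng.expovariate(1.0) for _ in range(12)]
    zs = [rng.random() for _ in range(12)]
    print(forward_sum(ys, zs), backward_recursion(ys[::-1], zs[::-1]))
    print(forward_sum([1, 2], [3, 4]), backward_recursion([1, 2], [3, 4]))
```

On the same (unreversed) inputs the two functions generally differ, as the second printed line shows, so the two processes are different as processes even though their one-dimensional laws agree.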
A related pair of processes can be defined by replacing the sum with the maximum:
$$M_n^* = \max(M_{n-1}^* Z_n, Y_n Z_n), \quad M_0^* = 0; \qquad M_n = \bigvee_{i=1}^{n} Y_i \prod_{j=1}^{i} Z_j, \quad n = 1, 2, \ldots. \tag{9.2.67}$$
These models can be regarded as describing the evolution of the highest up-to-date adjusted change in the “wealth” associated with the summation models (9.2.64)–(9.2.66).(6) Further, we prove limit theorems for the processes $\{S_n\}_{n \ge 1}$ and $\{M_n\}_{n \ge 1}$ stopped at random times. Thinking in terms of “wealth” and “environment” changes described above, suppose that in each time period a disastrous event may occur with probability $p \in (0, 1)$. As a result of the disastrous event (bankruptcy, drought, etc., depending on the application) the whole wealth could be lost. The time of the disastrous event $\tau = \tau(p)$ is assumed to be geometrically distributed,
$$P(\tau(p) = k) = p(1 - p)^{k-1}, \qquad k = 1, 2, \ldots, \tag{9.2.68}$$
and independent of the sequence $\{(Y_n, Z_n)\}_{n \ge 1}$. We will discuss the limiting behavior of the total “wealth” (until time $\tau$), $S_\tau$, as $p \to 0$.

(5) Typically, of interest have been conditions for ergodicity of the Markov chain (9.2.65) and characterization of the limiting distribution. The key reference here is Vervaat (1979). An earlier work of Kesten (1973) studies the multidimensional version of (9.2.65), i.e., the $S_n^*$'s and $B_n$'s are $d$-dimensional (random) vectors and the $A_n$'s are $d \times d$ (random) matrices. This level of generality allows one, for example, to treat one-dimensional recursions of a higher order. It turns out, in particular, that under certain moment conditions on $(A_n, B_n)$ in (9.2.65), the stationary distribution of $S_n^*$ has Pareto-like tails. This phenomenon has been studied further by Goldie (1991) in the case of more general recursions.

(6) For applications of these and related models see Helland and Nilsen (1976), Hooghiemstra and Keane (1985), and Hooghiemstra and Scheffer (1986). An interesting time-reversibility relation between special cases of the models (9.2.64) and (9.3.68) has been noted by Chernick et al. (1988). Extrema of the processes arising in the random recursion (9.2.65) and, in particular, of ARCH processes are studied in de Haan et al. (1989). As general references on random recursions see Letac (1986) and the introduction of Kifer (1986). Brandt et al. (1990, Chapter 9) establishes a continuous dependence of the stationary distribution of the Markov chain (9.2.65) on certain parameters of the recursion.
The disastrous event could be caused by the cumulative effect of a large number of “bad” events with high success probability, $p \approx 1$. This leads to a negative binomial model for the time of the disastrous event $\tau = \tau_1 + \cdots + \tau_r$, where the $\tau_i$'s are i.i.d. geometric with mean $1/p$. For $p \to 1$, $r \to \infty$, $r(1 - p) \to \lambda > 0$ we have the Poisson approximation $P(\tau = n + r) \approx e^{-\lambda} \lambda^n / n!$. This, in turn, leads to a Poisson model for the time of the disastrous event. Assuming that the $N_i$ are i.i.d. Poiss($\lambda$) r.v.s, $T_i = N_1 + \cdots + N_i$ is viewed as the time of the $i$th disastrous event. We shall study in the framework of the model (9.2.66) the distributions of the sums
$$S_{T_1} = \sum_{i=0}^{T_1} Y_i \prod_{j=0}^{i} Z_j, \tag{9.2.69}$$
and
$$S_{T_{k+1}} - S_{T_k} = \sum_{i=T_k+1}^{T_{k+1}} Y_i \prod_{j=0}^{i} Z_j, \qquad k = 1, 2, \ldots.$$
Here for the sake of convenience we start the sequence $\{(Y_n, Z_n)\}$ at $n = 0$. Similarly to (9.2.69) we shall be interested in the laws of
$$M_\tau = \bigvee_{i=1}^{\tau} Y_i \prod_{j=1}^{i} Z_j \tag{9.2.70}$$
and
$$M_{T_1} = \bigvee_{i=0}^{T_1} Y_i \prod_{j=0}^{i} Z_j$$
and
$$\bigvee_{i=T_k+1}^{T_{k+1}} Y_i \prod_{j=0}^{i} Z_j, \qquad k = 1, 2, \ldots.$$
Note that if $S_n$ (or $S_n^*$) converges in distribution, then the limiting (in distribution) random variable $S$ satisfies the equation
$$ S \stackrel{d}{=} (S + Y)Z. \tag{9.2.71} $$
Here $(Y, Z) \stackrel{d}{=} (Y_n, Z_n)$, and the random quantities $S$ and $(Y, Z)$ in the right-hand side of (9.2.71) are independent. In many cases the solution of the equation (9.2.71) turns out to be an infinitely divisible random variable. Similarly, the distributional limit $M$ of $M_n$ satisfies
$$ M \stackrel{d}{=} (M \vee Y)Z. \tag{9.2.72} $$
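The fixed-point equation (9.2.71) can be checked by simulation. The following is a minimal sketch under illustrative assumptions that are not part of the text: $Z_n$ is uniform on $(0,1)$, $Y_n$ is standard exponential and independent of $Z_n$, and the infinite sum is truncated after 50 terms. For this particular choice the limit $S$ should be standard exponential (compare Theorem 9.2.31 with $H(x) = e^{-x}$), so both sides of (9.2.71) should have mean close to 1 and mass close to $1 - e^{-1}$ on $[0,1]$.

```python
import math
import random

def simulate_S(n_terms, rng):
    """One draw of S_n = sum_{i=1}^n Y_i * prod_{j<=i} Z_j.

    Assumed laws (illustrative only): Z_i ~ Uniform(0,1), Y_i ~ Exp(1), independent.
    """
    total, prod = 0.0, 1.0
    for _ in range(n_terms):
        prod *= rng.random()                 # running product Z_1 * ... * Z_i
        total += rng.expovariate(1.0) * prod
    return total

rng = random.Random(12345)
n_samples = 20000
lhs = [simulate_S(50, rng) for _ in range(n_samples)]             # approximate draws of S
rhs = [(s + rng.expovariate(1.0)) * rng.random() for s in lhs]    # draws of (S + Y) Z

def summary(xs):
    """Sample mean and empirical P(X <= 1)."""
    return sum(xs) / len(xs), sum(1 for x in xs if x <= 1.0) / len(xs)

mean_lhs, frac_lhs = summary(lhs)
mean_rhs, frac_rhs = summary(rhs)
target_frac = 1.0 - math.exp(-1.0)
print(mean_lhs, mean_rhs, frac_lhs, frac_rhs, target_frac)
```

Agreement of the two summaries is only a necessary condition for equality in distribution; a fuller check would compare the whole empirical distribution functions.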
9.2 Convergence of Recursive Algorithms
It is interesting to note that the total “wealth” until the disastrous geometrical event $S_\tau$ also satisfies a distributional equation:
$$ S_\tau \stackrel{d}{=} (Y + \delta S_\tau)Z. \tag{9.2.73} $$
Here, as before, $(Y, Z) \stackrel{d}{=} (Y_n, Z_n)$, $\delta$ is Bernoulli with $P(\delta = 1) = 1 - p$; $S_\tau$, $\delta$, and $(Y, Z)$ in the right-hand side of (9.2.73) are independent.(7) Similarly to (9.2.73),
$$ M_\tau \stackrel{d}{=} (Y \vee \delta M_\tau)Z; \tag{9.2.74} $$
if $Z \equiv 1$, $M_\tau$ is said to be max-geometric infinitely divisible.(8)

We start with results on the limiting behavior of the recursions defined above. In the next five theorems $\{(Y_n, Z_n)\}_{n\ge 1}$ is, unless explicitly stated otherwise, a sequence of nonnegative i.i.d. random vectors such that $P(Y_n > 0) > 0$ and $P(Z_n > 0) = 1$. Set $S_n$ and $M_n$ as in (9.2.66) and (9.2.67), and let
$$ X_n = Y_n \prod_{j=1}^{n} Z_j. $$
Set $\xi_n = \log Z_n$, $\nu = E\xi_n$ (when they exist).

Lemma 9.2.21 Let $\{(Y_n, Z_n)\}_{n\ge 1}$ be a sequence of random vectors living on a common probability space such that $\{Y_n\}_{n\ge 1}$ is a sequence of nonnegative i.i.d. random variables with $P(Y_n > 0) > 0$, and $\{Z_n\}_{n\ge 1}$ is a sequence of positive i.i.d. random variables. Suppose that $E\log(1 + Z_n) < \infty$.
(a) If $\nu > 0$, then with probability 1, $X_n$ does not converge to 0, and thus $S_n \to \infty$. The same is true in the case $\nu = 0$, provided that $\{(Y_n, Z_n)\}_{n\ge 1}$ is a sequence of i.i.d. random vectors. Moreover, in both cases $M_n \to \infty$ (unless $P(Z_n = 1) = 1$).
(b) If $-\infty < \nu < 0$, the following are equivalent as $n \to \infty$:
(b-i) $X_n \to 0$ a.s.
(b-ii) $S_n$ converges to a finite limit $S$ a.s.
(7) See Rachev and Todorovich (1990) for some examples of distributions of $S_\tau$; if $Z \equiv 1$, $S_\tau$ is said to be geometrically infinitely divisible; see Klebanov, Maniya, and Melamed (1984).
(8) See Rachev and Resnick (1991).
(b-iii) $M_n$ converges to a finite limit a.s.
(b-iv) $0 < E\log(1 + Y_n) < \infty$.
Moreover, (b-iv) implies (b-i)–(b-iii) even if $\nu = -\infty$.

The proofs of this and the further assertions in this section can be found in Rachev and Samorodnitsky (1995).

Remark 9.2.22 Given a sequence of nonnegative i.i.d. random vectors
$$ (Y_n, Z_n) = \bigl(Y_n^{(1)}, \ldots, Y_n^{(d)}, Z_n^{(1)}, \ldots, Z_n^{(d)}\bigr) \in \mathbb{R}^{2d}, $$
we consider the vector of “wealths” $S_n = (S_n^{(1)}, \ldots, S_n^{(d)})$ given by
$$ S_n^{(k)} = \sum_{i=1}^{n} Y_i^{(k)} \prod_{j=1}^{i} Z_j^{(k)}, \qquad n = 1, 2, \ldots. \tag{9.2.75} $$
Then Lemma 9.2.21 applied componentwise yields convergence.

Our next theorem is the CLT for the “total wealth” $S_n$ in (9.2.66). We assume that $\xi_n = \log Z_n$ belongs to the domain of attraction of an $\alpha$-stable r.v. $\eta_\alpha$ ($1 < \alpha \le 2$); i.e., there exist $a_n > 0$ and $b_n \in \mathbb{R}$ such that
$$ a_n \sum_{i=1}^{n} \xi_i + b_n \stackrel{D}{\Longrightarrow} \eta_\alpha; \qquad a_n = n^{-1/\alpha} L(n), \tag{9.2.76} $$
where $L(n)$ is a slowly varying function.

Theorem 9.2.23 Suppose that $E\log(1 + Z_n) < \infty$.
(a) If $\nu > 0$ and $E\log(1 + Y_n) < \infty$, then $a_n \log S_n + b_n \stackrel{D}{\Longrightarrow} \eta_\alpha$.
(b) If $\nu < 0$ and $E\log(1 + Y_n) < \infty$, then $a_n \log(S - S_n) + b_n \stackrel{D}{\Longrightarrow} \eta_\alpha$.
(c) Let $\nu = 0$, and assume (without loss of generality) that $b_n \equiv 0$. Suppose also that the sequences $\{Y_n\}_{n\ge 1}$ and $\{Z_n\}_{n\ge 1}$ are independent and that
$$ P(\log Y_1 > 1/a_n) = o(n^{-1}), \qquad n \to \infty. \tag{9.2.77} $$
Then, as $n \to \infty$,
$$ a_n \log S_n \stackrel{D}{\Longrightarrow} \sup_{0\le t\le 1} L(t), \tag{9.2.78} $$
where $L$ is a Lévy stable motion on $[0, 1]$ with $L(1) \stackrel{d}{=} \eta_\alpha$.
Remark 9.2.24 The results of Theorem 9.2.23 can be extended both to the multivariate setting and to the form of a functional CLT. We give just one example of such an extension, which is obtainable using Theorem 1 of Resnick and Greenwood (1979). In the notation of Remark 9.2.22 let $d = 2$ and set $\xi_n = (\xi_n^{(1)}, \xi_n^{(2)}) = (\log Z_n^{(1)}, \log Z_n^{(2)})$. Assume that there exist $a_n \in \mathbb{R}_+^2$ and $b_n \in \mathbb{R}^2$ such that
$$ \Bigl(a_n^{(1)} \sum_{i=1}^{n} \xi_i^{(1)},\; a_n^{(2)} \sum_{i=1}^{n} \xi_i^{(2)}\Bigr) + b_n \stackrel{D}{\Longrightarrow} \eta_\alpha. $$
Here $\alpha = (\alpha_1, \alpha_2)$, $1 < \alpha_i \le 2$, $i = 1, 2$, and $a_n^{(i)} = n^{-1/\alpha_i} L_i(n)$, $i = 1, 2$, where the $L_i$'s are slowly varying functions. If $E\log(1 + Z_n^{(1)}) < \infty$ and $\nu^{(i)} := E\log Z_n^{(i)} > 0$ for $i = 1, 2$, then
$$ \Bigl\{\bigl(a_n^{(1)} \log S_{[nt]}^{(1)},\; a_n^{(2)} \log S_{[nt]}^{(2)}\bigr) + b_n\Bigr\}_{t\ge 0} \stackrel{D}{\Longrightarrow} \{L(t)\}_{t\ge 0}; $$
the weak convergence is in the space $D([0, \infty), \mathbb{R}^2)$. $\{L(t) = (L^{(1)}(t), L^{(2)}(t)),\ t \ge 0\}$ is a Lévy process with $L(1) \stackrel{d}{=} \eta_\alpha$ and such that
$$ \{L(t),\ t \ge 0\} \stackrel{d}{=} \bigl\{\bigl(t^{1/\alpha_1} L^{(1)}(1),\; t^{1/\alpha_2} L^{(2)}(1)\bigr) + \beta(t),\ t \ge 0\bigr\} $$
for some $\beta(t) \in \mathbb{R}^2$ prescribed by the marginal convergence. Moreover,
(i) if $(\alpha_1, \alpha_2) = (2, 2)$, then $L$ is an $\mathbb{R}^2$-valued Wiener process;
(ii) if $(\alpha_1, \alpha_2) = (\alpha, 2)$, $1 < \alpha < 2$, then $\{L(t) = (L^{(1)}(t), L^{(2)}(t)),\ t \ge 0\}$, where $L^{(1)}$ is an $\alpha$-stable process and $L^{(2)}$ is a Wiener process independent of $L^{(1)}$;
(iii) if $1 < \alpha_i < 2$, $i = 1, 2$, then $L$ has Lévy measure defined by ◦T = , where $T x = \bigl((\operatorname{sign} x_1)|x_1|^{1/\alpha_1},\ (\operatorname{sign} x_2)|x_2|^{1/\alpha_2}\bigr)$. The measure is determined by $\{x \in \mathbb{R}^2;\ |x| > r,\ \theta(x) \in H\} = r^{-1} S(H)$ for $r > 0$; $H$ is a Borel subset of $[0, 2\pi]$, where $|x|$ and $\theta(x)$ are the polar coordinates of $x \in \mathbb{R}^2$, and $S$ is a finite Borel measure on $[0, 2\pi]$.(9)
(9) A more detailed analysis of $L$ can be obtained using further the results of Resnick and Greenwood (1979) and de Haan et al. (1984). An even more general case where $a_n$
Propositions (b) and (c) of Theorem 9.2.23 can be extended in a similar fashion. As far as the maximal “wealth change” (9.2.65) for $n$ years is concerned, we have the following analogue of Theorem 9.2.23.(10)

Theorem 9.2.25 Under the assumptions of Theorem 9.2.23 the following hold:
(a) If $\nu > 0$, then $a_n \log M_n + b_n \stackrel{D}{\Longrightarrow} \eta_\alpha$.
(b) If $\nu < 0$, then $a_n \log\bigl(\bigvee_{j>n} X_j\bigr) + b_n \stackrel{D}{\Longrightarrow} \eta_\alpha$.
(c) If $\nu = 0$ and (9.2.78) hold, then $a_n \log M_n \stackrel{D}{\Longrightarrow} \sup_{0\le t\le 1} L(t)$.

Next, we examine the geometric random sum $S_\tau$ as defined above. We say that $\xi_n = \log Z_n$ belongs to the domain of attraction of a geometric $\alpha$-stable r.v. $G_\alpha$ if there exist functions $a = a(p) > 0$ and $b = b(p)$ on $[0, 1]$ such that
$$ a \sum_{i=1}^{\tau} (\xi_i + b) \stackrel{D}{\Longrightarrow} G_\alpha \qquad \text{as } p \to 0. \tag{9.2.79} $$
Here $a(p) = p^{1/\alpha} L(1/p)$, where $L$ is a slowly varying function.(11)

Theorem 9.2.26 Suppose that $E\log(1 + Z_n) < \infty$ and (9.2.79) holds.
(a) If $\nu > 0$ and $E\log(1 + Y_n) < \infty$, then $a(\log S_\tau + \tau b) \stackrel{D}{\Longrightarrow} G_\alpha$ as $p \to 0$.
(b) If $\nu < 0$ and $E\log(1 + Y_n) < \infty$, then $a\bigl(\log \sum_{j\ge \tau+1} X_j + \tau b\bigr) \stackrel{D}{\Longrightarrow} G_\alpha$ as $p \to 0$.
(c) Let $\nu = 0$ and $b \equiv 0$. Assume also that the sequences $\{Y_n\}_{n\ge 1}$ and $\{Z_n\}_{n\ge 1}$ are independent and
$$ P\bigl(\log Y_1 > n^{1/\alpha} L(n)^{-1}\bigr) = o(n^{-1}), \qquad n \to \infty. $$
is a $(2 \times 2)$ matrix can be treated using the theory of operator stable random vectors; see Meerschaert (1991).
(10) Extensions similar to the ones discussed in Remark 9.2.24 are possible here as well; we may use the multivariate extreme value theory as in de Haan and Resnick (1977).
(11) The ch.f. of $G_\alpha$ admits the representation $f_{G_\alpha}(t) = 1/(1 - \log \varphi_\alpha(t))$, where $\varphi_\alpha$ is the ch.f. of an $\alpha$-stable r.v. (Klebanov, Maniya, and Melamed (1984)). Similarly, $f_{\xi_n} = 1/(1 - \log \psi)$, where $\psi$ is the ch.f. of a distribution in the domain of attraction of an $\alpha$-stable r.v. with ch.f. $\varphi_\alpha$ (Mittnik and Rachev (1991)). Examples of geometric $\alpha$-stable distributions are the exponential law ($\alpha = 1$) and the Laplace law ($\alpha = 2$).
Then
$$ a \log S_\tau \stackrel{D}{\Longrightarrow} \sup_{0\le t\le 1} G(t) \qquad \text{as } p \to 0. \tag{9.2.80} $$
Here $G$ is a “geometric Lévy stable motion”; i.e., the weak limit in $D[0, 1]$ of $G_p(t) = a \sum_{j=1}^{[\tau t]} \xi_j$, $0 \le t \le 1$.

Remark 9.2.27 Regarding the existence of the process $G$ as the weak limit of $G_p$, one can check the following:
(a) The finite-dimensional distributions $G_p(t_1), \ldots, G_p(t_d)$ ($0 \le t_1 < \cdots < t_d \le 1$) converge to “geometric strictly stable distributions” $G(t_1), \ldots, G(t_d)$ with ch.f. $g(\theta)$ of the form $1/(1 - \log \psi(\theta))$, where $\psi(\theta)$ is the ch.f. of a strictly $\alpha$-stable random vector on $\mathbb{R}^d$.
(b) The set of laws of $G_p(\cdot)$ ($0 < p < 1$) is tight.

Remark 9.2.28 Under the assumptions listed in Remark 9.2.24 we also have
$$ \Bigl\{\bigl(a_n^{(1)} S_{[\tau t]}^{(1)},\; a_n^{(2)} S_{[\tau t]}^{(2)}\bigr) + \tau_n t b_n\Bigr\}_{t\ge 0} \stackrel{w}{\Longrightarrow} \{L(\nu t)\}_{t\ge 0}, $$
where $\tau_n = \tau(1/n)$ and $\nu$ is an exponential random variable with mean 1 independent of the bivariate Lévy process $L$. An important observation is that the above limit relation remains true if we choose any sequence of positive integer-valued random variables $\tau_n$ such that $\tau_n/n \stackrel{D}{\Longrightarrow} \tau$, where $\tau$ is a positive random variable. Choosing therefore different laws for $\tau$, we arrive at different models for the total “asset value process.” We list below some of these models, assuming that $L$ is a zero mean bivariate Wiener process; that is, we are in case (i) discussed in Remark 9.2.24.
(a) If $\tau_n$ is a mixture (by the values of a fixed random variable $U$) of Poisson random variables with mean $nU$, then we may take $\tau = U$, and $L(\tau\,\cdot)$ is a mixture of Wiener processes; see Boness et al. (1979).
(b) If $\tau = 1/\sqrt{X_m}$, where $X_m$ is a chi-square random variable with $m$ degrees of freedom, then the one-dimensional marginals of $L(\tau\,\cdot)$ are Student's $t$ distributed; this model was used in Blattberg and Gonedes (1974) to model stock prices.
(c) If $\tau$ is positive strictly stable with index $\alpha/2$, $0 < \alpha < 2$, then $L(\tau\,\cdot)$ is an $\alpha$-stable motion. This subordinated process was used in Mandelbrot and Taylor (1967) to explain the nonnormality of stock price changes.
(d) If $\tau$ is a lognormal random variable, then $L(\tau\,\cdot)$ is the Clark (1973) alternative to the Mandelbrot and Taylor (1967) subordinated process. Note that in contrast to (c), here $L(\tau\,\cdot)$ has finite variances.

Similarly to Theorems 9.2.25 and 9.2.26, we obtain the following limit theorem for the distribution of the maximal “wealth change.”

Theorem 9.2.29 Under the assumptions of Theorem 9.2.26 the following hold:
(a) If $\nu > 0$, then $a(\log M_\tau + \tau b) \stackrel{D}{\Longrightarrow} G_\alpha$ as $p \to 0$.
(b) If $\nu < 0$, then $a\bigl(\log \bigvee_{j\ge \tau+1} X_j + \tau b\bigr) \stackrel{D}{\Longrightarrow} G_\alpha$ as $p \to 0$.
(c) Suppose that the conditions of Theorem 9.2.26(c) hold. Then as $p \to 0$,
$$ a \log M_\tau \stackrel{D}{\Longrightarrow} \sup_{0\le t\le 1} G(t). $$
Finally, let us consider the total “wealth” until a Poisson($\lambda$) random moment $T = T(\lambda)$. Let the sequence $\{(Y_n, Z_n)\}_{n\ge 0}$ be as before and independent of $T$. Suppose also that the ch.f. $f_{\xi_n}$ of $\xi_n = \log Z_n$ satisfies
$$ \lim_{u\to 0} |u|^{-\alpha}\bigl(1 - f_{\xi_n - a}(u)\bigr) = \mu \tag{9.2.81} $$
for some $\mu > 0$, real $a$, and $1 < \alpha \le 2$. Note that $a = E\xi_n$, and at least when $\alpha = 2$, (9.2.81) is equivalent to assuming that the $\xi_n$'s are in the domain of normal attraction of an $\alpha$-stable distribution (Feller (1971, p. 596)).

Theorem 9.2.30 Suppose that $E\log(1 + Z_n) < \infty$ and (9.2.81) holds. Let
$$ S_T = \sum_{i=0}^{T} X_i = \sum_{i=0}^{T} Y_i \prod_{j=0}^{i} Z_j. $$
(a) If $\nu = E\xi_n > 0$ and $E\log(1 + Y_n) < \infty$, then as $\lambda \to \infty$,
$$ \lambda^{-1/\alpha}(\log S_T - aT) \stackrel{D}{\Longrightarrow} Y_{(\alpha)}, $$
where $Y_{(\alpha)}$ is a symmetric stable r.v. with ch.f. $\exp(-\mu|\theta|^\alpha)$.
(b) If $\nu < 0$ and $E\log(1 + Y_n) < \infty$, then as $\lambda \to \infty$,
$$ \lambda^{-1/\alpha}\Bigl(\log \sum_{j=T+1}^{\infty} X_j - aT\Bigr) \stackrel{D}{\Longrightarrow} Y_{(\alpha)}, $$
$$ \lambda^{-1/\alpha}\Bigl(\log \sum_{j=T_1+1}^{T_k} X_j - aT_1\Bigr) \stackrel{D}{\Longrightarrow} Y_{(\alpha)}, $$
where $T_1$, $T_k$ are as in (9.2.69).
(c) Let $\nu = 0$ and suppose that the sequences $\{Y_n\}_{n\ge 0}$ and $\{Z_n\}_{n\ge 0}$ are independent and $P(\log Y_n > u) = o(u^{-\alpha})$ as $u \to \infty$. Then as $\lambda \to \infty$,
$$ \lambda^{-1/\alpha} \log S_T \stackrel{D}{\Longrightarrow} \sup_{0\le t\le 1} L(t), $$
where $L(\cdot)$ is a Lévy stable motion on $[0, 1]$ with $L(1) \stackrel{d}{=} Y_{(\alpha)}$.

Analogous theorems can be established for the limit distributions of $M_{T_1}$ and $\bigvee_{i=T_k+1}^{T_{k+1}} X_i$.

The remaining results in this section deal with characterizations of the limit laws of $S_n$ (cf. (9.2.66) and (9.2.64)) and $M_n$ (cf. (9.2.67)), which can arise for any given distribution of $Z_n$'s in a given parametric family of distributions. We will assume that the sequences $\{Y_n\}_{n\ge 1}$ and $\{Z_n\}_{n\ge 1}$ are independent. Also, we will concentrate our attention on the distributions of $Z_n$ supported by $(0, 1)$.(12) Invoking Lemma 9.2.21(b), we conclude that $S_n$ (resp. $M_n$) converges to a finite limit $S$ (resp. $M$) if (and only if, in the case $E\log Z_n > -\infty$)
$$ 0 \le E\log(1 + Y_n) < \infty. \tag{9.2.82} $$
Given (9.2.82), the limits $S$ and $M$ satisfy the equations (9.2.71) and (9.2.72).(13) We start with a characterization of the class $\mathcal{S}_1$ (resp. $\mathcal{M}_1$) of laws $\mathcal{L}(S)$ of $S$ (resp. $\mathcal{L}(M)$) such that for any $\mathcal{L}(Z) \in \mathcal{Z}_1$ there exists $Y = Y(Z)$ such that (9.2.71) (resp. (9.2.72)) holds. The class(14) $\mathcal{Z}_1$ of $Z$-laws $\mathcal{L}(Z)$ consists of distributions on $(0, 1)$ with densities
$$ f_\alpha(z) = (1 + \alpha)z^\alpha, \qquad 0 < z < 1,\ \alpha \ge 0. \tag{9.2.83} $$
In the sequel, for any $0 < \beta < 1$ and an r.v. $Y$, define
$$ Y_\beta := \begin{cases} 0 & \text{with probability } 1 - \beta, \\ Y & \text{with probability } \beta. \end{cases} \tag{9.2.84} $$
A complete description of the class $\mathcal{S}_1$ is given in the following theorem.

Theorem 9.2.31 The class $\mathcal{S}_1$ of the laws $\mathcal{L}(S)$ solving $S \stackrel{d}{=} (S + Y)Z$ consists of all nonnegative infinitely divisible r.v.s $S$ with Laplace transform
$$ \varphi_S(\theta) = \exp\Bigl\{-\int_0^\infty (1 - e^{-\theta x})\,\frac{1}{x}\,M_S(dx)\Bigr\}. $$
Moreover, the Lévy measure $M_S$ is of the following form: $M_S \ll \mathrm{Leb}$ and
$$ M_S(dx) = H(x)\,dx, $$
where $H(0) \in [0, 1]$, $H$ is nonincreasing on $[0, \infty)$ and vanishing at $\infty$. The corresponding $Y$ has $1 - H$ as its distribution function.

Remark 9.2.32 Suppose that $S$ is a solution of (9.2.71) for a given $Y$ with $0 \le E\log(1 + Y) < \infty$ and $Z$ uniform. Then $S$ is also a solution of (9.2.71) with $Z$ having density (9.2.83) and $Y$ replaced by $Y_{1/(1+\alpha)}$. Note that $\mathcal{Z}_1$ is a subclass of the class of self-decomposable random variables; see Vervaat (1979). Also, allowing $\alpha$ in (9.2.83) to take values in the whole range $(-1, \infty)$ would have made the class $\mathcal{Z}_1$ degenerate (consisting of $Z = 0$ a.s.). Remark 9.2.32, in particular, has no counterpart for $\alpha$'s in the range $(-1, 0)$.

Our next task is the characterization of the class $\mathcal{M}_1$ of laws $\mathcal{L}(M)$ such that for every $\mathcal{L}(Z) \in \mathcal{Z}_1$ there exists $Y = Y(Z)$ such that (9.2.72) holds.

(12) The case $Z_n \in (0, 1)$ a.s. corresponds to “deteriorating environment,” the case being close to the soil erosion model of Todorovic and Gani (1987).
(13) Moreover, the converse is also true. Namely, if $S$, $(Y, Z)$ ($M$, $(Y, Z)$ respectively) is a solution of (9.2.71) (or (9.2.72) respectively), then the distribution of $S$ ($M$) is equal to the limiting distribution in the model (9.2.66) ((9.2.67) respectively) with $(Y_n, Z_n) \stackrel{d}{=} (Y, Z)$. This is a simple consequence of the uniqueness principle (the so-called Letac principle); see Letac (1986) or Goldie (1991).
(14) The class $\mathcal{Z}_1$ was considered by Vervaat (1979) (who discussed a wider family, allowing $\alpha > -1$ in (9.2.83)) and Todorovich and Gani (1987). Some particular examples of laws $\mathcal{L}(S) \in \mathcal{S}_1$, $\mathcal{L}(M) \in \mathcal{M}_1$, were studied by Todorovich and Gani (1987), Todorovich (1987), and Rachev and Todorovich (1990).

Theorem 9.2.33 The class $\mathcal{M}_1$ consists of all absolutely continuous laws $\mathcal{L}(M)$ with density $f_M$ and d.f. $F_M$ satisfying the following conditions:
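(i) $f_M(x)$ is nonincreasing on $(0, \infty)$.
(ii) $x f_M(x)/F_M(x)$ is nonincreasing on $(0, \infty)$.
Suppose that $\mathcal{L}(M) \in \mathcal{M}_1$ and let $Z^{(\alpha)}$ have density $f_{Z^{(\alpha)}}(z) = (1 + \alpha)z^\alpha$, $0 < z < 1$. Then $M \stackrel{d}{=} (M \vee Y)Z^{(\alpha)}$ is equivalent to
$$ \overline{F}_Y(x) = \frac{1}{1 + \alpha}\,\frac{x f_M(x)}{F_M(x)}, \qquad x > 0. \tag{9.2.85} $$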
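Relation (9.2.85) can be tested numerically. The sketch below makes illustrative choices that are not in the text: $M$ is standard exponential (a $\Gamma(1,1)$ law) and $\alpha = 0$, so that $Z^{(0)}$ is uniform and (9.2.85) gives $\overline{F}_Y(x) = x e^{-x}/(1 - e^{-x}) = x/(e^x - 1)$. The helper names and the bisection-based sampler are hypothetical. If (9.2.85) is right, $(M \vee Y)Z$ should again be standard exponential.

```python
import math
import random

def survival_Y(x):
    """bar F_Y(x) = x f_M(x) / F_M(x) for M ~ Exp(1) and alpha = 0, i.e. x / (e^x - 1)."""
    return 1.0 if x == 0.0 else x / math.expm1(x)

def sample_Y(rng, upper=100.0, iters=60):
    """Inverse-transform sampling: solve survival_Y(x) = u by bisection (survival_Y is decreasing)."""
    u = rng.random()
    lo_x, hi_x = 0.0, upper
    for _ in range(iters):
        mid = 0.5 * (lo_x + hi_x)
        if survival_Y(mid) > u:
            lo_x = mid
        else:
            hi_x = mid
    return 0.5 * (lo_x + hi_x)

rng = random.Random(2024)
n_samples = 20000
# Draws of (M v Y) Z with M ~ Exp(1), Y from (9.2.85), Z ~ Uniform(0,1), all independent.
draws = [max(rng.expovariate(1.0), sample_Y(rng)) * rng.random() for _ in range(n_samples)]

mean_rhs = sum(draws) / n_samples
frac_rhs = sum(1 for x in draws if x <= 1.0) / n_samples
target_frac = 1.0 - math.exp(-1.0)      # P(M <= 1) for M ~ Exp(1)
print(mean_rhs, frac_rhs, target_frac)
```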
By (9.2.85), for any $\mathcal{L}(M) \in \mathcal{M}_1$ and $0 < \alpha < 1$,
$$ M \stackrel{d}{=} (M \vee Y_\alpha)Z^{(\alpha)} \iff M \stackrel{d}{=} (M \vee Y_0)Z^{(0)}, $$
where $Y_\alpha$ is determined by (9.2.84). The last relation is parallel to the corresponding relation in the scheme of summation (cf. Remark 9.2.32). Note also that gamma $\Gamma(p, \lambda)$-distributions with $0 < p \le 1$ belong to $\mathcal{M}_1$, while those with $p > 1$ do not.

Next, we consider the class $\mathcal{S}_2$ (resp. $\mathcal{M}_2$)(15) of laws $\mathcal{L}(S)$ (resp. $\mathcal{L}(M)$) such that for every $\mathcal{L}(Z) \in \mathcal{Z}_2 \equiv \{\delta_z,\ 0 < z < 1\}$ there is a $Y = Y(Z)$ such that (9.2.71) (resp. (9.2.72)) holds.

Theorem 9.2.34 The class $\mathcal{S}_2$ coincides with the family of all nonnegative infinitely divisible r.v.s with Laplace transform of the form
$$ \varphi_S(t) = \exp\Bigl\{-at - \int_0^\infty (1 - e^{-tx})\,\frac{1}{x}\,M_S(dx)\Bigr\}, $$
where $a \ge 0$ and the Lévy measure $M_S \ll \mathrm{Leb}$ is absolutely continuous, with a Radon–Nikodym derivative that is nonincreasing a.s. For any $S \in \mathcal{S}_2$ and $z \in (0, 1)$, the corresponding $Y$ in the equation $S \stackrel{d}{=} (S + Y)z$ is a nonnegative infinitely divisible r.v. with Laplace transform
$$ \varphi_Y(t) = \exp\Bigl\{-\frac{at(1 - z)}{z} - \int_0^\infty (1 - e^{-tx})\,\frac{1}{x}\,M_Y(dx)\Bigr\}. $$
(15) It turns out that the class $\mathcal{S}_2$ coincides with the class $L$ of Khinchine (cf. Feller (1971, Sect. 8, Chapter XVII)) of nonnegative r.v.s. We shall state here a more explicit description of $\mathcal{S}_2$ than that in Feller (1971, Theorem XVII.8). Moreover, $\mathcal{S}_1 \subset \mathcal{S}_2$; see also Vervaat (1979, Remark 4.9). The class $\mathcal{M}_2$ coincides with the class of the laws of max self-decomposable r.v.s (see Balkema et al. (1990) and the references there). The next theorem, similar to the Mejzler (1956) result, is based on a characterization of the weak limits of the normalized maxima $a_n\{\max(X_1, X_2, \ldots, X_n) - b_n\}$ when the $X_i$'s are independent and nonidentically distributed.
Moreover, $M_Y \ll \mathrm{Leb}$, and
$$ \frac{dM_Y}{d\lambda}(x) = \frac{dM_S}{d\lambda}(zx) - \frac{dM_S}{d\lambda}(x), $$
where $\lambda = \mathrm{Leb}$.

Theorem 9.2.35 The class $\mathcal{M}_2$ consists of the laws of positive absolutely continuous r.v.s $M$ such that $x f_M(x)/F_M(x)$ is a nonincreasing function on $(0, \infty)$. Also, $\mathcal{M}_1 \subset \mathcal{M}_2$.

9.2.7 Random Recursion Arising in Probabilistic Modeling: Rate of Convergence
Throughout this section, $(B, \|\cdot\|)$ is the separable Banach space $C(T)$ of continuous mappings $x : T \to \mathbb{R}$, where $T$ is compact and $\|\cdot\|$ is the usual supremum norm in $C(T)$. For any $x, y \in B$ we set $(x \cdot y)(t) = x(t) \cdot y(t)$, $(x \vee y)(t) = x(t) \vee y(t)$, $t \in T$. Given a nonatomic probability space, let $\mathcal{X}(B)$ be the space of all random fields (r.f.s) $X$ of $B$-valued random variables, and let $\mathcal{L}(B)$ be the space of all laws $P_X$. Suppose $\{(Y_n, Z_n)\}_{n\ge 1}$ is a sequence of i.i.d. pairs of r.f.s, and define
$$ S_n = \sum_{i=1}^{n} X_i, \qquad X_i = Y_i \prod_{j=1}^{i} Z_j. \tag{9.2.86} $$
The r.f. $S_n$ can be interpreted as the “wealth” accumulated in different commodities $\{A_t,\ t \in T\}$ for a period of $n$ years. We take $T = T(U)$, where $U$ is a compact metric space, U = (U, ), and $T(U)$ is the set of all closed subsets $t$ of $U$ (think, for example, of crop-producing areas) endowed with the Hausdorff metric $h(t_1, t_2) = \inf\{\varepsilon > 0;\ t_1 \subset t_2^\varepsilon,\ t_2 \subset t_1^\varepsilon\}$. Here $t^\varepsilon$ stands for the $\varepsilon$-neighborhood of $t$ (cf. Hausdorff (1957, Sect. 29)).(16) Similarly, we define the maximal “wealth changes”
$$ M_n = \bigvee_{i=1}^{n} X_i. \tag{9.2.87} $$
Next, we are interested in conditions providing an exponential rate of convergence of $S_n$ and $M_n$ to finite limits $S$ and $M$, respectively. The rate of convergence of the laws $P_{X_n}$ to $P_X$ will be expressed, as usual in the Banach space setting, in terms of the Prohorov metric
$$ \pi(X, Y) := \pi(P_X, P_Y) := \inf\{\varepsilon > 0;\ P(X \in A) \le P(Y \in A^\varepsilon) + \varepsilon \text{ for all Borel subsets } A \text{ in } B\}, \tag{9.2.88} $$
where $A^\varepsilon$ is the open $\varepsilon$-neighborhood of $A$. Further, we shall use also the following metrics and functions in $\mathcal{X}(B)$ and $\mathcal{L}(B)$:(17)
(i) $\chi_p$-metric in $\mathcal{X}(B)$:
$$ \chi_p(X, Y) := \Bigl[\sup_{t>0} t^p P(\|X - Y\| > t)\Bigr]^{\frac{1}{1+p}}, \qquad p > 0; $$
(ii) $\widetilde{\chi}_p$-minimal metric in $\mathcal{L}(B)$:
$$ \widetilde{\chi}_p(X, Y) = \widetilde{\chi}_p(P_X, P_Y) := \inf\{\chi_p(\widetilde{X}, \widetilde{Y});\ \widetilde{X}, \widetilde{Y} \in \mathcal{X}(B),\ \widetilde{X} \stackrel{d}{=} X,\ \widetilde{Y} \stackrel{d}{=} Y\}, \qquad p > 0; $$
(iii)
$$ \omega_{p,N}(X)^{p+1} := \sup_{t>N} t^p P(\|X\| > t), \qquad \omega_p(X) := \omega_{p,0}(X), \qquad N_p(X) := \{E\|X\|^p\}^{1/(1+p)}, \qquad p > 0. $$
Note that $\widetilde{\chi}_p$ is a metric in $\mathcal{L}(B)$, $\widetilde{\chi}_p \ge \pi$, and the following convergence criterion holds: if
$$ \omega_{p,N}(X_n) + \omega_{p,N}(X) \to 0 \quad \text{as } N \to \infty \text{ for any } n = 1, 2, \ldots, \tag{9.2.89} $$
then $\widetilde{\chi}_p(X_n, X) \to 0$ as $n \to \infty$ if and only if $X_n \stackrel{D}{\to} X$ as $n \to \infty$ and
$$ \lim_{N\to\infty} \limsup_{n\to\infty} \omega_{p,N}(X_n) = 0. $$
(16) Then $(T, h)$ is a compact metric space (cf. Hausdorff (1957, Sect. 29); see also Kuratowski (1966, §21), Kuratowski (1969, §31), and Matheron (1975)).
(17) Cf. Pisier and Zinn (1977), de Haan and Rachev (1989), and Rachev (1991c).
Similarly, if (9.2.21) holds, then $\chi_p(X_n, X) \to 0$ as $n \to \infty$ if and only if $X_n \stackrel{P}{\to} X$ as $n \to \infty$ and
$$ \lim_{N\to\infty} \limsup_{n\to\infty} \omega_{p,N}(X_n) = 0. $$

Theorem 9.2.36 (a) If for some $p > 0$, $N_p(Z_1) < 1$ and $\omega_p(Y_1 Z_1) < \infty$, then $S_n$ converges in probability to a $P$-a.e. finite limit $S = \sum_{i=1}^{\infty} X_i$. Moreover,
$$ \chi_p(S_n, S) \le \varphi_n(Z_1)\,\omega_p(Y_1 Z_1), \tag{9.2.90} $$
where $\varphi_n(Z_1) := N_p(Z_1)^n/(1 - N_p(Z_1))$.
(b) Under the above conditions, suppose additionally that $\{(Y_i^*, Z_i)\}_{i\ge 1}$ is a sequence of i.i.d. pairs of r.f.s satisfying the “tail” condition $\omega_p(Y_1^* Z_1) < \infty$. Let $S^*$ be the limit of $S_n^*$, i.e.,
$$ S_n^* := \sum_{i=1}^{n} Y_i^* \prod_{j=1}^{i} Z_j \stackrel{P}{\to} S^*. $$
Suppose now that the $\widetilde{Z}_i$'s are i.i.d. copies of $Z_1$ independent of the $(Y_i, Z_i)$'s and let $\widetilde{\chi}_p(Y_1 Z_1, Y_1^* Z_1) < \infty$. Then as $n \to \infty$,(18)
$$ \pi(S^* - S_n^*,\ \widetilde{Z}_1 \cdots \widetilde{Z}_n \cdot S) \le \varphi_n(Z_1)\,\widetilde{\chi}_p(Y_1 Z_1, Y_1^* Z_1) \to 0. \tag{9.2.91} $$
Proof: (a) First note that $\chi_p \ge \kappa$, where $\kappa$ is the distance in probability (the Ky Fan metric): $\kappa(X, Y) = \inf\{u > 0;\ P(\|X - Y\| > u) < u\}$. Indeed, to prove $S_n \stackrel{P}{\to} S$ it is enough to show that $S_n$ is $\chi_p$-fundamental. Actually, for $k = 1, 2, \ldots$,
$$ \chi_p(S_{n+k}, S_n) = \omega_p\Bigl(\sum_{i=n+1}^{n+k} X_i\Bigr) \le \sum_{i=n+1}^{n+k} \omega_p(X_i) \tag{9.2.92} $$
$$ \le \sum_{i=n+1}^{n+k} \Bigl\{E_{Z_1=z_1,\ldots,Z_{i-1}=z_{i-1}}\ \omega_p\Bigl(Y_i \prod_{j=1}^{i-1} z_j\, Z_i\Bigr)^{p+1}\Bigr\}^{\frac{1}{1+p}} $$
$$ = \sum_{i=n+1}^{n+k} \Bigl[E_{Z_1=z_1,\ldots,Z_{i-1}=z_{i-1}} \Bigl(\prod_{j=1}^{i-1} \|z_j\|\Bigr)^{p}\Bigr]^{\frac{1}{1+p}} \omega_p(Y_i Z_i) $$
$$ \le \sum_{i=n+1}^{n+k} N_p(Z_1)^{i-1}\,\omega_p(Y_1 Z_1) = \varphi_n(Z_1)\,\omega_p(Y_1 Z_1) \to 0 \quad \text{as } n \to \infty, $$
which indeed implies that $S_n$ is $\chi_p$-fundamental. The bound (9.2.90) follows by the same arguments we used to show (9.2.92).
(b) By definition, $\widetilde{\chi}_p$ is the minimal metric with respect to $\chi_p$, and thus for any joint distribution of $Y_1$ and $Y_1^*$,
$$ \widetilde{\chi}_p(S^* - S_n^*,\ \widetilde{Z}_1 \cdots \widetilde{Z}_n \cdot S) \le \chi_p\Bigl(\sum_{i>n} Y_i^* \prod_{j=1}^{i} Z_j,\ \sum_{i>n} Y_i \prod_{j=1}^{i} Z_j\Bigr). $$
Now proceed as in (9.2.92) to obtain that the right-hand side is not greater than $\varphi_n(Z_1)\,\widetilde{\chi}_p(Y_1 Z_1, Y_1^* Z_1)$. We next take the infimum in the last inequality over all joint distributions of $(Y_1, Y_1^*)$ with fixed marginals, and use the inequality $\widetilde{\chi}_p \ge \pi$ to complete the proof of (9.2.91). □

(18) The factor $\widetilde{Z}_1 \cdots \widetilde{Z}_n$ plays the same role as the normalizing scaling in the usual CLT.

Theorem 9.2.37 Suppose the $Y_i$'s and $Z_i$'s are nonnegative r.v.s. Then under the assumptions of Theorem 9.2.36(a), $M_n \stackrel{P}{\to} M$, and moreover,
$$ \chi_p(M_n, M) \le \varphi_n(Z_1)\,\omega_p(Y_1 Z_1) \to 0 \quad \text{as } n \to \infty. $$
If also $\widetilde{\chi}_p(Y_1, Y_1^*) < \infty$, then under the assumptions of Theorem 9.2.36(b),
$$ \pi\Bigl(\bigvee_{i=n+1}^{\infty} Y_i^* \prod_{j=1}^{i} Z_j,\ \widetilde{Z}_1 \cdots \widetilde{Z}_n \cdot M\Bigr) \le \varphi_n(Z_1)\,\widetilde{\chi}_p(Y_1 Z_1, Y_1^* Z_1) \to 0. $$
Proof: (a) For any $k = 1, 2, \ldots$,
$$ \chi_p(M_{n+k}, M_n) \le \omega_p\Bigl(\bigvee_{i=n+1}^{n+k} X_i\Bigr) \le \sum_{i=n+1}^{n+k} \omega_p(X_i), $$
and therefore, $M_n \to M$ follows by the same arguments as in the proof of Theorem 9.2.36, and the required bound for $\chi_p(M_n, M)$ is obtained in the same way as we did in (9.2.92).
(b) With $X_i^* = Y_i^* \prod_{j=1}^{i} Z_j$ we have
$$ \widetilde{\chi}_p\Bigl(\bigvee_{i>n} X_i^*,\ \widetilde{Z}_1 \cdots \widetilde{Z}_n \cdot M\Bigr) \le \chi_p\Bigl(\bigvee_{i>n} X_i^*,\ \bigvee_{i>n} X_i\Bigr) \le \Bigl[\sup_{t>0} t^p P\Bigl(\bigvee_{i>n} \|X_i^* - X_i\| > t\Bigr)\Bigr]^{\frac{1}{1+p}} \le \sum_{i>n} \omega_p(X_i^* - X_i). $$
The last inequality follows from the triangle inequality for $\chi_p$ in the space of real-valued random variables. Conditioning as in the proof of Theorem 9.2.36(a), we obtain
$$ \sum_{i>n} \omega_p(X_i^* - X_i) \le \varphi_n(Z_1)\,\chi_p(X_1^*, X_1). $$
Passing to the minimal metrics and using again $\pi \le \widetilde{\chi}_p$, we obtain the necessary bound. □

Suppose $N$ is an integer-valued r.v. independent of the $Y_i$'s and $Z_i$'s. Then under the conditions of Theorem 9.2.36, $\pi(S_N, S) \le \psi_N(Z_1)\,\omega_p(Z_1)$, where $\psi_N(Z_1) = (E N_p(Z_1)^N)/(1 - N_p(Z_1))$, and moreover,
$$ \pi(S^* - S_N^*,\ \widetilde{Z}_1 \cdots \widetilde{Z}_N \cdot S) \le \psi_N(Z_1)\,\widetilde{\chi}_p(Y_1, Y_1^*). $$
Similar results on the limiting behavior of the maximum $\bigvee_{i=1}^{N} X_i$ can be obtained as a consequence of Theorem 9.2.36.
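The bound (9.2.90) can be illustrated numerically in the simplest real-valued case ($T$ a single point). The sketch below assumes, purely for illustration, $Y_i \equiv 1$, $Z_i$ uniform on $(0,1)$, and $p = 1$; the infinite tail $S - S_n$ is truncated after 60 further terms, and the function names are hypothetical. For this choice $N_p(Z_1) = (p+1)^{-1/(1+p)}$ and $\omega_p(Y_1 Z_1)^{p+1} = \sup_{0<t<1} t^p(1-t) = p^p/(p+1)^{p+1}$.

```python
import random

def chi_p_empirical(diffs, p):
    """Empirical chi_p: (sup_t t^p P(|D| > t))^(1/(1+p)) computed from samples of D = X - Y."""
    xs = sorted(abs(d) for d in diffs)
    m = len(xs)
    best = 0.0
    for k, x in enumerate(xs):
        # For t just below xs[k], exactly m - k sample points exceed t.
        best = max(best, (x ** p) * (m - k) / m)
    return best ** (1.0 / (1.0 + p))

def tail_sample(n, rng, extra=60):
    """One draw of S - S_n = sum_{i>n} prod_{j<=i} Z_j (with Y_i = 1), truncated after `extra` terms."""
    prod = 1.0
    for _ in range(n):
        prod *= rng.random()
    total = 0.0
    for _ in range(extra):
        prod *= rng.random()
        total += prod
    return total

def bound_9_2_90(n, p):
    """Right-hand side of (9.2.90) for Z_1 ~ Uniform(0,1) and Y_1 = 1 (illustrative assumptions)."""
    N_p = (1.0 / (p + 1.0)) ** (1.0 / (1.0 + p))
    omega_p = (p ** p / (p + 1.0) ** (p + 1.0)) ** (1.0 / (1.0 + p))
    return N_p ** n / (1.0 - N_p) * omega_p

rng = random.Random(7)
p = 1.0
results = {}
for n in (2, 5, 8):
    samples = [tail_sample(n, rng) for _ in range(10000)]
    results[n] = (chi_p_empirical(samples, p), bound_9_2_90(n, p))
    print(n, results[n])
```

In this example the empirical $\chi_p$ distance sits well below the bound, which is consistent with (9.2.90) being an upper estimate rather than an exact rate.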
Remark 9.2.38 Vervaat (1979) showed the following limiting result for $S_n^*$ (see (9.2.65)): let $u_n \uparrow \infty$ as $n \to \infty$ be a sequence of reals, and assume, in addition, that
(i) $E\log^+|B_1| < \infty$, $E|\log|A_1||^{2+\eta} < \infty$ for some $\eta > 0$, $\mu := E\log|A_1| < 0$;
(ii) the solution $S^*$ of the equation $S^* \stackrel{d}{=} A_1 S^* + B_1$, $S^*$ and $(A_1, B_1)$ independent, has a density $f$ that is ultimately nonincreasing and such that $f(t) = O(t^{-1})$ as $t \to \infty$;
(iii) there are positive reals $b$ and $\varepsilon < |\mu|$ and a positive nonincreasing integrable function $\varphi$ on $[1, \infty)$ such that the function $T^{\leftarrow}(T_+(x)\varphi(y))\,x^{-1} e^{(\mu+\varepsilon)y}$ (where $T_+(x) = P(|S^*| > x)$, $T(x) = P(S^* > x)$, and $T^{\leftarrow}$ is its generalized inverse) is bounded on the set $\{(x, y);\ x \ge b,\ y \ge 1\}$.
Then
$$ \sum_{n=1}^{\infty} P(S^* > u_n) \begin{cases} < \infty, \\ = \infty, \end{cases} \quad \text{implies} \quad P(S_n^* > u_n \text{ i.o.}) = \begin{cases} 0, \\ 1. \end{cases} $$
Following our Theorem 9.2.36, let us compare the tail behavior of the distribution of the $S_n^*$'s and their limit $S^*$. We consider again the Banach space setting for $A_n$, $B_n$, $S_n^*$, and $S^*$. Let $\ell_p = \widetilde{L}_p$ be the minimal metric with respect to $L_p(X, Y)$, $0 \le p \le \infty$,
$$ L_0(X, Y) = P(X \ne Y), \qquad L_p(X, Y) = \{E\|X - Y\|^p\}^{\min(1, 1/p)}, \quad 0 < p < \infty, $$
and $L_\infty(X, Y) = \operatorname{ess\,sup}\|X - Y\|$. Then, as in Theorem 9.2.36(a), if for some $p \in [0, \infty]$, $N_p^*(A_1) := L_p(A_1, 0) < 1$ and $N_p^*(B_1) < \infty$, then as $n \to \infty$,
$$ \ell_p(S_n^*, S^*) \le \frac{N_p^*(A_1)^n\, N_p^*(B_1)}{1 - N_p^*(A_1)} \to 0. $$
In the case of real-valued $S_n^*$ and $S^*$, the last bound gives us conditions for exponential rate of convergence in the total variation metric and in the $\ell_p$-Kantorovich metrics:
$$ \ell_0(S_n^*, S^*) = \sup_{A \text{ Borel}} |P(S_n^* \in A) - P(S^* \in A)| \to 0; $$
$$ \ell_1(S_n^*, S^*) = \int_{-\infty}^{\infty} |F_{S_n^*}(x) - F_{S^*}(x)|\,dx \to 0; $$
$$ \ell_p(S_n^*, S^*) = \Bigl(\int_0^1 |F_{S_n^*}^{\leftarrow}(x) - F_{S^*}^{\leftarrow}(x)|^p\,dx\Bigr)^{1/p} \to 0, \qquad 1 \le p < \infty; $$
and
$$ \ell_\infty(S_n^*, S^*) = \sup_{0\le x\le 1} |F_{S_n^*}^{\leftarrow}(x) - F_{S^*}^{\leftarrow}(x)| \to 0. $$
Here as usual, $F_{S_n^*}$ and $F_{S^*}$ are the corresponding distribution functions, and $F^{\leftarrow}$ stands for the generalized inverse of $F$.
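For empirical distributions of two samples of equal size, the quantile representation of $\ell_p$ reduces to a sum over sorted samples, and for $p = 1$ it agrees with the integral of the absolute difference of the distribution functions. The following sketch (function names are hypothetical) computes both expressions on a small hand-checkable example.

```python
def ell_p_empirical(xs, ys, p):
    """ell_p between the empirical laws of two equal-size samples via their quantile functions."""
    assert len(xs) == len(ys)
    a, b = sorted(xs), sorted(ys)
    return (sum(abs(u - v) ** p for u, v in zip(a, b)) / len(a)) ** (1.0 / p)

def ell_1_via_cdf(xs, ys):
    """Integral of |F_x - F_y| for the two empirical distribution functions (step functions)."""
    pts = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        F_x = sum(1 for x in xs if x <= left) / len(xs)
        F_y = sum(1 for y in ys if y <= left) / len(ys)
        total += abs(F_x - F_y) * (right - left)   # both CDFs are constant on [left, right)
    return total

xs = [0.0, 1.0, 3.0]
ys = [1.0, 2.0, 2.0]
print(ell_p_empirical(xs, ys, 1), ell_1_via_cdf(xs, ys), ell_p_empirical(xs, ys, 2))
```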
9.3 Extensions of the Contraction Method

A well-known problem in the theory of probability metrics is the extension of the method of ideal metrics to limit theorems for sums or maxima with “nonregular” normalizations of logarithmic type. Moreover, this problem is quite typical in a wide range of stochastic algorithms, since the logarithmic-type normalization is not reflected in the regularity structure of probability metrics, while power normalizations $n^a$ can be captured easily by ideal metrics of order $a$. The second difficulty arises when the contraction factors converge to one. In this section we study several examples that show solutions to this problem by the use of a modified version of the contraction method. In Sections 9.3.1 and 9.3.2 we consider the number of inversions for random permutations and the “MAX”-algorithm. In Sections 9.3.3 and 9.3.4 we study successful and unsuccessful searching in binary random trees. Each of these examples needs some special arguments in order to achieve approximation by a limit distribution; so in general, the contraction method cannot be considered an “automatic” method. The advantage of the contraction method is its generality, which allows us, for example, to consider recursions in very general spaces, as well as the fact that it often allows us to obtain quantitative approximations. The examples in this section are due to Cramer and Rüschendorf (1996a).
9.3.1 The Number of Inversions of a Random Permutation

Given a permutation $\sigma = (a_1, \ldots, a_n)$, the pair $(a_i, a_j)$, $i < j$, is called an inversion if $a_i > a_j$. Denote by $I_n$ the number of inversions in a random permutation of size $n$. Then the following recursion holds:
$$ I_n \stackrel{d}{=} I_{n-1} + X_n, \qquad I_1 = 0, \tag{9.3.1} $$
where $X_n \sim U(\{0, \ldots, n - 1\})$ is uniformly distributed on $\{0, \ldots, n - 1\}$ and the r.v.s $I_{n-1}$, $X_n$ are independent. This leads to explicit expressions for the moment generating function, the mean, and the variance:
$$ G_n(z) = E z^{I_n} = \frac{1}{n!} \cdot \frac{(1 - z^2) \cdots (1 - z^n)}{(1 - z)^{n-1}}, \tag{9.3.2} $$
$$ E I_n = \frac{n(n - 1)}{4}, \qquad \operatorname{Var} I_n = \frac{(n - 1)\,n\,(2n + 5)}{72} \tag{9.3.3} $$
(cf. Hofri (1987, pp. 122–124)). For the normalized version
$$ \widetilde{I}_n := \frac{I_n - E I_n}{\sqrt{\operatorname{Var} I_n}} \tag{9.3.4} $$
we obtain the following Berry–Esséen-type result. (Note that we assume that all the occurring random variables are defined on one and the same probability space.)

Theorem 9.3.1 For $n \ge 7$,
$$ \varrho\bigl(\widetilde{I}_n, N(0, 1)\bigr) \le C \cdot n^{-\frac{1}{2}}, \tag{9.3.5} $$
with $C = 2.75 \cdot \frac{8^4}{6 \cdot 128}\sqrt{\frac{7}{6}}$.

Proof: Without loss of generality, we assume that $I_n = \sum_{i=1}^{n} X_i$, where the $X_i$ are independent, $X_i \sim U(\{0, \ldots, i - 1\})$. By the Berry–Esséen theorem (cf. Bhattacharya and Ranga Rao (1976, Th. 12.4)),
$$ \varrho\bigl(\widetilde{I}_n, N(0, 1)\bigr) \le 2.75\,\frac{S_{n,3}}{(S_{n,2})^{3/2}}, \tag{9.3.6} $$
where
$$ S_{n,m} := \sum_{k=1}^{n} E|X_k - E X_k|^m. \tag{9.3.7} $$
We have, for $k \ge 2$,
$$ E|X_k - E X_k|^3 \le \frac{k^3}{32}, \qquad \operatorname{Var} X_k = \frac{k^2 - 1}{12}, $$
by some tedious calculations. This implies that $\sum_{k=1}^{n} \operatorname{Var} X_k \ge \frac{(n-1)^3}{36}$ and $\sum_{k=1}^{n} E|X_k - E X_k|^3 \le \frac{(n+1)^4}{128}$. Thus, from (9.3.6), we obtain, for $n \ge 7$,
$$ \varrho\bigl(\widetilde{I}_n, N(0, 1)\bigr) \le 2.75\,\frac{6^3}{128}\Bigl(\frac{n+1}{n-1}\Bigr)^{4}\sqrt{\frac{n}{n-1}}\; n^{-\frac{1}{2}} \le C\, n^{-\frac{1}{2}}. $$
□

Recursion (9.3.1) leads to a sum of independent variables and therefore allows the application of the classical tools for the central limit theorem. On the other hand, it is an interesting “test” example for the rate of convergence of the contraction method, since the contraction factors of the normalized recursion converge to one. Furthermore, the approximation result (in terms of the $\zeta_3$-metric) is of independent interest. It gives the same convergence rate as in Theorem 2.1 uniformly on the set of functions $f(\widehat{I}_n)$ with $\|f^{(3)}\|_\infty \le 1$, when we study the limiting behavior of
$$ \widehat{I}_n := \frac{I_n - E I_n}{n^{3/2}}. \tag{9.3.8} $$
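The recursion (9.3.1) and the moment formulas (9.3.3) can be verified exactly for small $n$. The sketch below (function names are hypothetical) builds the law of $I_n$ by repeated convolution with the uniform law of $X_m$, using exact rational arithmetic, and compares it with a brute-force count over all permutations.

```python
from fractions import Fraction
from itertools import permutations

def inversion_distribution(n):
    """Exact law of I_n from I_m = I_{m-1} + X_m, X_m uniform on {0,...,m-1}, I_1 = 0."""
    dist = [Fraction(1)]                       # law of I_1: point mass at 0
    for m in range(2, n + 1):
        new = [Fraction(0)] * (len(dist) + m - 1)
        for value, prob in enumerate(dist):
            for x in range(m):                 # X_m = x with probability 1/m
                new[value + x] += prob / m
        dist = new
    return dist

def mean_var(dist):
    """Mean and variance of a law given as a list of probabilities indexed by value."""
    mean = sum(k * p for k, p in enumerate(dist))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(dist))
    return mean, var

def count_inversions(perm):
    """Number of pairs i < j with perm[i] > perm[j]."""
    size = len(perm)
    return sum(1 for i in range(size) for j in range(i + 1, size) if perm[i] > perm[j])

def brute_force_distribution(n):
    """Law of the inversion count under the uniform distribution on all n! permutations."""
    counts = [0] * (n * (n - 1) // 2 + 1)
    total = 0
    for perm in permutations(range(n)):
        counts[count_inversions(perm)] += 1
        total += 1
    return [Fraction(c, total) for c in counts]

for n in range(1, 9):
    mean, var = mean_var(inversion_distribution(n))
    print(n, mean, var)
```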
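Theorem 9.3.2 Let $\sigma_n^2 := \operatorname{Var}(\widehat{I}_n)$ and $Z_n \sim N(0, \sigma_n^2)$. Then for some $C > 0$, and for all $n \in \mathbb{N}$,
$$ \zeta_3(\widehat{I}_n, Z_n) \le C\, n^{-\frac{1}{2}}. \tag{9.3.9} $$

Proof: First note that $\widehat{I}_n$ satisfies the modified recursion
$$ \widehat{I}_n \stackrel{d}{=} \Bigl(\frac{n-1}{n}\Bigr)^{3/2} \widehat{I}_{n-1} + \widehat{X}_n, \tag{9.3.10} $$
where $\widehat{X}_n := \frac{X_n - E X_n}{n^{3/2}}$. Let the sequence $(Z_n)$ be independent, $Z_n \sim N(0, \sigma_n^2)$, and define the accompanying sequence
$$ Z_n^* := \Bigl(\frac{n-1}{n}\Bigr)^{3/2} Z_{n-1} + \widehat{X}_n. \tag{9.3.11} $$
Let $Y_i \sim N(0, \tau_i^2)$ be independent of $(\widehat{X}_i, Z_{i-1})$, where $\tau_i^2 := \sigma_i^2 - \bigl(\frac{i-1}{i}\bigr)^3 \sigma_{i-1}^2 = \frac{\operatorname{Var} X_i}{i^3} \ge 0$. Then
$$ Z_i \stackrel{d}{=} \Bigl(\frac{i-1}{i}\Bigr)^{3/2} Z_{i-1} + Y_i. \tag{9.3.12} $$
Using the homogeneity of order three of the ideal metric $\zeta_3$, we obtain
$$ \zeta_3(\widehat{I}_n, Z_n) \le \zeta_3\Bigl(\Bigl(\frac{n-1}{n}\Bigr)^{3/2} \widehat{I}_{n-1} + \widehat{X}_n,\ \Bigl(\frac{n-1}{n}\Bigr)^{3/2} Z_{n-1} + \widehat{X}_n\Bigr) + \zeta_3(Z_n^*, Z_n) $$
$$ \le \Bigl(\frac{n-1}{n}\Bigr)^{9/2} \zeta_3(\widehat{I}_{n-1}, Z_{n-1}) + \zeta_3(Z_n^*, Z_n). $$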
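By iteration, using $Z_1 = \widehat{I}_1 = 0$, we obtain the “ground estimate”
$$ \zeta_3(\widehat{I}_n, Z_n) \le \sum_{i=2}^{n} \Bigl(\frac{i}{n}\Bigr)^{9/2} \zeta_3(Z_i^*, Z_i). \tag{9.3.13} $$
Note that $E Z_i = E Z_i^* = 0$ and $E Z_i^2 = E(Z_i^*)^2$. Therefore, by making use of the estimate $\zeta_r \le \frac{\Gamma(1+\alpha)}{\Gamma(1+r)}\,\kappa_r$ for $r = m + \alpha$, by (9.3.12) and some calculations (cf. Cramer (1995)) we have
$$ \zeta_3(Z_i^*, Z_i) = \zeta_3\Bigl(\widehat{X}_i + \Bigl(\frac{i-1}{i}\Bigr)^{3/2} Z_{i-1},\ Y_i + \Bigl(\frac{i-1}{i}\Bigr)^{3/2} Z_{i-1}\Bigr) \le \zeta_3(\widehat{X}_i, Y_i) \le \frac{\Gamma(2)}{\Gamma(4)}\,\kappa_3(\widehat{X}_i, Y_i) $$
$$ = \int_{-\infty}^{0} x^2\,|F_{\widehat{X}_i}(x) - F_{Y_i}(x)|\,dx \le \frac{1}{2^5}\, i^{-5/2} + \frac{7}{2^6 \cdot 3^2}\, i^{-3/2}. $$
Therefore, by some additional calculations,
$$ \zeta_3(\widehat{I}_n, Z_n) \le \sum_{i=2}^{n} \Bigl(\frac{i}{n}\Bigr)^{9/2} \zeta_3(Z_i^*, Z_i) \le \sum_{i=2}^{n} \Bigl(\frac{i}{n}\Bigr)^{9/2} \Bigl(\frac{1}{2^5}\, i^{-5/2} + \frac{7}{2^6 \cdot 3^2}\, i^{-3/2}\Bigr) $$
$$ \le \frac{1}{n^{9/2}} \Bigl[\frac{1}{2^5}\Bigl(\frac{1}{2} n^2 + \frac{1}{3} n^3\Bigr) + \frac{7}{2^6 \cdot 3^2}\Bigl(n^3 + \frac{1}{4} n^4\Bigr)\Bigr] = \frac{7}{2^8 \cdot 3^2} \cdot \frac{1}{\sqrt{n}} + O\bigl(n^{-\frac{3}{2}}\bigr). $$
□

Note that the contraction factor in this example is of order $\bigl(\frac{n-1}{n}\bigr)^{3/2}$ only, and consequently we cannot obtain a uniform bound, implying that we need to estimate more precisely the individual terms. The exponential convergence rate is reduced to the rate $\sqrt{n}$.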
9.3.2 The Number of Records
The “MAX”-algorithm determines the maximum element of a random sequence (cf. Hofri (1987, pp. 112–113)). Its complexity is essentially given
by the number of records in a random permutation. Let $M_n$ denote the number of maxima of a random permutation read from left to right. Then $M_n$ satisfies the recursion
$$ M_n \stackrel{d}{=} M_{n-1} + X_n, \tag{9.3.14} $$
where $X_n$ has a Bernoulli distribution with success probability $\frac{1}{n}$, $X_n \sim B\bigl(1, \frac{1}{n}\bigr)$, and $X_n$, $M_{n-1}$ are assumed independent. Define $M_1 = 0$. Then
$$ M_n \stackrel{d}{=} \sum_{i=2}^{n} X_i, \tag{9.3.15} $$
when the $(X_i)$ are independent. Furthermore,
$$ E M_n = H_n - 1, \qquad \operatorname{Var} M_n = H_n - H_n^{(2)}, \tag{9.3.16} $$
where $H_n^{(k)} = \sum_{j=1}^{n} \frac{1}{j^k}$, $H_n = H_n^{(1)} = \ln n + \gamma + O(n^{-1})$, and $H_n^{(2)} \to \zeta(2) = \frac{\pi^2}{6}$ as $n \to \infty$ (cf. Hofri (1987)).

Define next the normalized sum
$$ \widetilde{M}_n := \frac{M_n - E M_n}{\sqrt{\operatorname{Var} M_n}}. \tag{9.3.17} $$
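The moment formulas (9.3.16) can be confirmed by exhaustive enumeration for small $n$. In the sketch below (function names are hypothetical) the first element of the permutation is not counted, in line with the convention $M_1 = 0$, so that $M_n$ is the number of times the running maximum is updated.

```python
from fractions import Fraction
from itertools import permutations

def harmonic(n, k=1):
    """Generalized harmonic number H_n^(k) = sum_{j=1}^n 1/j^k as an exact fraction."""
    return sum(Fraction(1, j ** k) for j in range(1, n + 1))

def record_updates(perm):
    """Number of left-to-right maxima after the first element (so that M_1 = 0)."""
    count, current = 0, perm[0]
    for value in perm[1:]:
        if value > current:
            count, current = count + 1, value
    return count

def moments_by_enumeration(n):
    """Exact mean and variance of M_n under the uniform law on all n! permutations."""
    values = [record_updates(p) for p in permutations(range(n))]
    total = len(values)
    mean = Fraction(sum(values), total)
    var = Fraction(sum(v * v for v in values), total) - mean ** 2
    return mean, var

for n in range(1, 8):
    mean, var = moments_by_enumeration(n)
    print(n, mean, harmonic(n) - 1, var, harmonic(n) - harmonic(n, 2))
```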
Then as in Section 9.3.1, we obtain the normal approximation, but with a “very slow” logarithmic rate of convergence.

Theorem 9.3.3 For all $n \in \mathbb{N}$ and some absolute constant $C > 0$, the following uniform rate of convergence holds:
$$ \varrho\bigl(\widetilde{M}_n, N(0, 1)\bigr) \le \frac{C}{\sqrt{\ln n}}. \tag{9.3.18} $$

Proof: We invoke the Berry–Esséen bound (9.3.6), where $E X_k = \frac{1}{k}$, $\operatorname{Var} X_k = \frac{k-1}{k^2}$, and $E|X_k - E X_k|^3 = \frac{k^3 - 3k^2 + 4k - 2}{k^4}$. Therefore, $\sum_{k=2}^{n} E|X_k - E X_k|^3 \sim \ln n$, and $\sum_{k=2}^{n} \operatorname{Var} X_k \sim \ln n$, leading to (9.3.18). The constant $C$ can be easily explicitly calculated. □

The normalization of $M_n$ is logarithmic in $n$. To get a rate of convergence result similar to that in (9.3.18), we shall make use of the $\zeta_3$-metric. It turns out that in this example we obtain contraction factors of order $\sqrt{\frac{\ln(n-1)}{\ln n}}$ that converge to one. Nevertheless, the method described in the proof of Theorem 9.3.2 can also be applied in this case. To this end, define
$$ \widehat{M}_n := \frac{M_n - E M_n}{\sqrt{\ln n}}. \tag{9.3.19} $$
9.3 Extensions of the Contraction Method
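Theorem 9.3.4 For $\sigma_n^2 := \operatorname{Var} \hat M_n$ and $Z_n \sim N(0, \sigma_n^2)$, we have
$$\zeta_3\left(\hat M_n, Z_n\right) = O\left(\frac{1}{\sqrt{\ln n}}\right). \tag{9.3.20}$$
Proof: Indeed, $\hat M_n$ satisfies the recursion
$$\hat M_n \stackrel{d}{=} \sqrt{\frac{\ln(n-1)}{\ln n}}\, \hat M_{n-1} + \hat X_n, \tag{9.3.21}$$
where $\hat X_n := \frac{X_n - E X_n}{\sqrt{\ln n}}$. Let $(Z_n)$ be independent normally distributed r.v.s, $Z_n \sim N(0, \sigma_n^2)$, and let
$$Z_n^* := \sqrt{\frac{\ln(n-1)}{\ln n}}\, Z_{n-1} + \hat X_n \tag{9.3.22}$$
be the accompanying sequence. Further, let $Y_n \sim N(0, \tau_n^2)$, and
$$\tau_n^2 := \sigma_n^2 - \frac{\ln(n-1)}{\ln n}\, \sigma_{n-1}^2 = \operatorname{Var} \hat X_n. \tag{9.3.23}$$
Then
$$Z_n \stackrel{d}{=} \sqrt{\frac{\ln(n-1)}{\ln n}}\, Z_{n-1} + Y_n, \tag{9.3.24}$$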
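and using the same arguments as in Section 9.2, we get
$$\zeta_3(\hat M_n, Z_n) \le \zeta_3(\hat M_n, Z_n^*) + \zeta_3(Z_n^*, Z_n) \le \left(\frac{\ln(n-1)}{\ln n}\right)^{3/2} \zeta_3(\hat M_{n-1}, Z_{n-1}) + \zeta_3(Y_n, \hat X_n).$$
By iteration, this yields the bound
$$\zeta_3(\hat M_n, Z_n) \le \left(\frac{\ln 2}{\ln n}\right)^{3/2} \zeta_3(\hat M_2, Z_2) + \sum_{i=3}^{n} \left(\frac{\ln i}{\ln n}\right)^{3/2} \zeta_3(Y_i, \hat X_i). \tag{9.3.25}$$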
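By the moment estimate $\zeta_3(\hat M_2, Z_2) < \infty$, and since $\operatorname{Var} \hat X_i = \operatorname{Var} Y_i = \tau_i^2 = \frac{1}{\ln i} \cdot \frac{i-1}{i^2}$, we have
$$\zeta_3(Y_i, \hat X_i) \le \frac{1}{6}\left(E|Y_i|^3 + E|\hat X_i|^3\right) \le \frac{1}{6} \cdot \frac{1}{(\ln i)^{3/2}} \left(\frac{1}{i} + \frac{\sqrt{8}}{\sqrt{\pi}} \cdot \frac{1}{i\sqrt{i}}\right); \tag{9.3.26}$$
here we also used the estimate $E\left|X_i - \frac{1}{i}\right|^3 \le \frac{1}{i}$.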
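From (9.3.25) we finally obtain
$$\begin{aligned}
\zeta_3(\hat M_n, Z_n) &\le \frac{1}{6}\left(\frac{\ln 2}{\ln n}\right)^{3/2} + \sum_{i=3}^{n} \left(\frac{\ln i}{\ln n}\right)^{3/2} \cdot \frac{1}{6} \cdot \frac{1}{(\ln i)^{3/2}} \left(\frac{1}{i} + \frac{\sqrt{8}}{\sqrt{\pi}} \cdot \frac{1}{i\sqrt{i}}\right) \\
&= \frac{1}{6 (\ln n)^{3/2}} \left[(\ln 2)^{3/2} + \sum_{i=3}^{n} \frac{1}{i} + \frac{\sqrt{8}}{\sqrt{\pi}} \sum_{i=3}^{n} \frac{1}{i\sqrt{i}}\right] \\
&\le \frac{1}{6 (\ln n)^{3/2}} \left[(\ln 2)^{3/2} + 2 \ln n\right] \\
&= \frac{1}{3} \cdot \frac{1}{\sqrt{\ln n}} + O\left((\ln n)^{-3/2}\right).
\end{aligned}$$
□

9.3.3 Unsuccessful Searching in Binary Search Trees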
In this and the following section we deal with the analysis of inserting and retrieving randomly ordered data in binary search trees by the contraction method; we refer to Mahmoud (1992) for an introduction to random search tree algorithms. Let $U_n$ denote the number of comparisons that are necessary in order to insert a new random element in a random search tree. A search tree is called random if it arises from a random permutation. An element (to be inserted in a tree) is called random if each of the $n + 1$ free leaves of the tree has probability $\frac{1}{n+1}$ of being chosen. $U_n$ satisfies the recursion
$$U_n \stackrel{d}{=} U_{n-1} + Y_n, \qquad U_0 = 0, \tag{9.3.27}$$
where $U_{n-1}$, $Y_n$ are independent, $Y_n \sim B\left(1, \frac{2}{n+1}\right)$. For $n = 1$, one comparison with the root is necessary. For $n \ge 2$, insertion of the $(n+1)$th element needs as many comparisons in the $n$-tree as in the $(n-1)$-tree except in the case that one comparison with the $n$th element is necessary. The probability that no comparison with this element is necessary equals $\frac{n-1}{n+1}$.
From (9.3.27) we have
$$E U_n = 2\,(H_{n+1} - 1), \qquad \operatorname{Var} U_n = 2 H_{n+1} - 4 H_{n+1}^{(2)} + 2. \tag{9.3.28}$$
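Iterating the recursion (9.3.27) from $U_0 = 0$ writes $U_n$ as a sum of independent $B\left(1, \frac{2}{i+1}\right)$ variables, $i = 1, \dots, n$. For small $n$ this can be compared with a brute-force enumeration of all permutations and all free leaves. The sketch below is illustrative only; the function names are hypothetical, and the enumeration is feasible only for very small $n$.

```python
from fractions import Fraction
from itertools import permutations


def comparisons_to_insert(keys, x):
    """Build an unbalanced BST from keys (in order), then count comparisons to place x."""
    left, right = {}, {}  # child pointers, keyed by parent
    root = keys[0]
    for key in keys[1:]:
        node = root
        while True:
            child = left if key < node else right
            if node in child:
                node = child[node]
            else:
                child[node] = key
                break
    count, node = 0, root
    while node is not None:
        count += 1
        node = (left if x < node else right).get(node)
    return count


def exact_distribution(n):
    """Law of U_n by enumerating all n! permutations and all n + 1 free leaves."""
    perms = list(permutations(range(1, n + 1)))
    weight = Fraction(1, len(perms) * (n + 1))
    dist = {}
    for perm in perms:
        for gap in range(n + 1):
            c = comparisons_to_insert(perm, gap + Fraction(1, 2))
            dist[c] = dist.get(c, 0) + weight
    return dist


def bernoulli_sum_distribution(n):
    """Law of Y_1 + ... + Y_n with independent Y_i ~ B(1, 2/(i+1))."""
    dist = {0: Fraction(1)}
    for i in range(1, n + 1):
        p = Fraction(2, i + 1)
        new = {}
        for value, prob in dist.items():
            new[value] = new.get(value, 0) + prob * (1 - p)
            new[value + 1] = new.get(value + 1, 0) + prob * p
        dist = {v: q for v, q in new.items() if q != 0}
    return dist


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, exact_distribution(n) == bernoulli_sum_distribution(n))
```

The enumeration uses half-integer keys to address the $n + 1$ gaps between the stored keys $1, \dots, n$; this is a convenience of the sketch, not a convention of the text.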
Brown and Shubert (1984) (cf. Mahmoud (1992, p. 76)) proved a central limit theorem for $U_n$ making use of the Lyapunov theorem and the method of generating functions. Since by (9.3.27),
$$U_n \stackrel{d}{=} \sum_{i=1}^{n} Y_i, \qquad Y_i \sim B\left(1, \frac{2}{i+1}\right), \quad (Y_i) \text{ independent}, \tag{9.3.29}$$
this argument can be simplified to yield the following theorem.

Theorem 9.3.5 Define
$$\tilde U_n := \frac{U_n - E U_n}{\sqrt{\operatorname{Var} U_n}}.$$
Then for some constant $C > 0$ and all $n$,
$$\rho\left(\tilde U_n, N(0,1)\right) \le \frac{C}{\sqrt{\ln n}}. \tag{9.3.30}$$
Proof: Observe that $\frac{S_{n,3}}{S_{n,2}^{3/2}} \sim \frac{1}{\sqrt{2 \ln n}}$ (cf. Mahmoud (1992, p. 77)). Therefore, (9.3.30) is a consequence of (9.3.6). □
Applying the results of Deheuvels and Pfeifer (1988) we obtain that $\frac{1}{\ln n}$ is the exact order of approximation of $\tilde U_n$ by a Poisson distribution. This indicates that the logarithmic rate in the Berry–Esseen bound (9.3.30) should give essentially the right order of approximation. The following rate of convergence result, obtained by the contraction method, supports the fact that the logarithmic order is sharp. The contraction method can be applied in the theorem below in much the same way as in Section 9.3.2. We therefore only give a sketch of the proof. For more details we refer to Cramer (1995a).

Theorem 9.3.6 Define $\hat U_n := \frac{U_n - E U_n}{\sqrt{\ln n}}$, $\sigma_n^2 := \operatorname{Var} \hat U_n$, and $Z_n \sim N(0, \sigma_n^2)$. Then, for some $C > 0$ and all $n \in \mathbb{N}$, we have
$$\zeta_3\left(\hat U_n, Z_n\right) \le \frac{C}{\sqrt{\ln n}}. \tag{9.3.31}$$
Proof: $\hat U_n$ satisfies the recursion
$$\hat U_n \stackrel{d}{=} \sqrt{\frac{\ln(n-1)}{\ln n}}\, \hat U_{n-1} + \hat Y_n, \qquad \hat Y_n := \frac{Y_n - E Y_n}{\sqrt{\ln n}}. \tag{9.3.32}$$
Define then
$$Z_n^* := \sqrt{\frac{\ln(n-1)}{\ln n}}\, Z_{n-1} + \hat Y_n \tag{9.3.33}$$
and
$$\tau_n^2 := \sigma_n^2 - \frac{\ln(n-1)}{\ln n}\, \sigma_{n-1}^2 = \operatorname{Var} \hat Y_n. \tag{9.3.34}$$
Let the normal random variables $W_n \sim N(0, \tau_n^2)$ be independent of the sequences $(Z_n)$, $(\hat Y_n)$. Then the sequence $(Z_n)$ satisfies
$$Z_n \stackrel{d}{=} \sqrt{\frac{\ln(n-1)}{\ln n}}\, Z_{n-1} + W_n. \tag{9.3.35}$$
Consequently, as in Section 9.3.2, we have the bound
$$\zeta_3\left(\hat U_n, Z_n\right) \le \left(\frac{\ln 2}{\ln n}\right)^{3/2} \zeta_3\left(\hat U_2, Z_2\right) + \sum_{i=3}^{n} \left(\frac{\ln i}{\ln n}\right)^{3/2} \zeta_3\left(W_i, \hat Y_i\right). \tag{9.3.36}$$
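Next, since $E \hat U_2 = 0 = E Z_2$, $\operatorname{Var} \hat U_2 = \sigma_2^2 = \operatorname{Var} Z_2$, it follows that
$$\zeta_3\left(\hat U_2, Z_2\right) \le \frac{1}{6}\left(E|\hat U_2|^3 + E|Z_2|^3\right) < \infty.$$
Furthermore,
$$\begin{aligned}
\zeta_3\left(W_i, \hat Y_i\right) &\le \frac{1}{6}\left(E|W_i|^3 + E|\hat Y_i|^3\right) \\
&= \frac{1}{6}\left[\frac{2\sqrt{2}}{\sqrt{\pi}}\, \tau_i^3 + \frac{1}{(\ln i)^{3/2}} \left(\frac{2}{i+1}\left(\frac{i-1}{i+1}\right)^3 + \frac{i-1}{i+1}\left(\frac{2}{i+1}\right)^3\right)\right] \\
&\le \frac{2}{6 (\ln i)^{3/2} (i+1)} \left[1 + \frac{4}{\sqrt{\pi (i+1)}}\right].
\end{aligned}$$
Therefore,
$$\zeta_3\left(\hat U_n, Z_n\right) \le \frac{1}{(\ln n)^{3/2}} \left[\frac{1}{6}\left(\frac{10}{81} + \frac{8}{27\sqrt{\pi}}\right) + \frac{1}{3} \sum_{i=3}^{n} \frac{1}{i+1}\left(1 + \frac{4}{\sqrt{\pi(i+1)}}\right)\right] \le \frac{1}{\sqrt{\ln n}} \quad \text{for } n \ge n_0, \tag{9.3.37}$$
as required. □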
Remark 9.3.7 Studying the recursion (9.3.32) we can also obtain rates of convergence under alternative distributional assumptions on $Y_n$ (resp. $\hat Y_n$). For example, if $\mu_r$ is any $(r, +)$-ideal, simple metric, then (as in (9.3.36))
$$\mu_r\left(\hat U_n, Z_n\right) \le \left(\frac{\ln 2}{\ln n}\right)^{r/2} \mu_r\left(\hat U_2, Z_2\right) + \sum_{i=3}^{n} \left(\frac{\ln i}{\ln n}\right)^{r/2} \mu_r\left(W_i, \hat Y_i\right). \tag{9.3.38}$$
This indeed implies that
$$\mu_r\left(\hat U_n, Z_n\right) \xrightarrow[n \to \infty]{} 0, \tag{9.3.39}$$
provided that the following conditions hold:

(a) $\mu_r\left(\hat U_2, Z_2\right) < \infty$, $\mu_r\left(W_i, \hat Y_i\right) < \infty$, $i \ge 3$.

(b) $\mu_r\left(W_i, \hat Y_i\right) = o\left(\frac{1}{i \ln i}\right)$. (9.3.40)
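To show (9.3.39) for $\varepsilon > 0$, choose $k_0 \in \mathbb{N}$ such that $\mu_r(W_k, \hat Y_k) \le \frac{\varepsilon}{k \ln k}$ for $k \ge k_0$. Then
$$\begin{aligned}
\limsup_{n \to \infty} \mu_r\left(\hat U_n, Z_n\right) &\le \limsup_{n \to \infty} \left(\frac{\ln 2}{\ln n}\right)^{r/2} \mu_r\left(\hat U_2, Z_2\right) + \limsup_{n \to \infty} \frac{1}{(\ln n)^{r/2}} \sum_{i=3}^{k_0 - 1} (\ln i)^{r/2} \mu_r\left(W_i, \hat Y_i\right) \\
&\quad + \limsup_{n \to \infty} \frac{1}{(\ln n)^{r/2}} \sum_{i=k_0}^{n} (\ln i)^{r/2 - 1}\, \frac{1}{i}\, \varepsilon \\
&\le 0 + 0 + \limsup_{n \to \infty} \varepsilon\, \frac{1}{\ln n} \sum_{i=k_0}^{n} \frac{1}{i} \le \varepsilon.
\end{aligned}$$
In the preceding example of unsuccessful searching, the estimate of the rate of “merging” of the sequences $(W_i)$ and $(\hat Y_i)$ in terms of $\mu_r(W_i, \hat Y_i)$ is of order $1/\left(i (\ln i)^{3/2}\right)$, allowing us to reach the convergence rate $1/\sqrt{\ln n}$.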
9.3.4 Successful Searching in Binary Search Trees
Given a random binary search tree as in Section 9.3.3, let $S_n$ denote the number of comparisons to retrieve a randomly chosen element in the tree. Brown and Shubert (1984) derived a formula for $P(S_n = k)$, and Louchard (1987) proved a central limit theorem for $S_n$ using the generating function method (cf. Mahmoud (1992, pp. 78–82)). We shall next derive a quantitative version of the central limit theorem. Our main tool will be the contraction method and moment formulas based on the following recursion for $S_n$:
$$S_n \stackrel{d}{=} 1 + S_{I_n}, \qquad S_0 = 0, \quad S_1 = 1. \tag{9.3.41}$$
Here $I_n$ is independent of $(S_i)$, and $P(I_n = 0) = \frac{1}{n}$, $P(I_n = j) = \frac{2j}{n^2}$, $1 \le j \le n - 1$.
It can be shown that this recursion does not transform itself to a sum of independent random variables as was done in the random search algorithm in Rachev and Rüschendorf (1991) (cf. (9.3.59)). Therefore, (9.3.41) does not allow the application of the Berry–Esseen-type or Poisson-type approximation result. In fact, it arises from the recursion
$$\begin{aligned}
P(S_n = k) &= \sum_{j=1}^{n} P(S_n = k,\ j \text{ chosen}) \\
&= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{i=1}^{n} \left[1_{\{i=j\}}\, \delta_{1k} + 1_{\{i<j\}}\, P(S_{n-i} = k-1) + 1_{\{i>j\}}\, P(S_{i-1} = k-1)\right] \\
&= \frac{\delta_{1k}}{n} + \sum_{i=1}^{n} \frac{n-i}{n^2}\, P(S_{n-i} = k-1) + \sum_{i=1}^{n} \frac{i-1}{n^2}\, P(S_{i-1} = k-1) \\
&= \frac{\delta_{1k}}{n} + \sum_{j=1}^{n-1} P(S_j = k-1) \cdot \frac{2j}{n^2}.
\end{aligned} \tag{9.3.42}$$
An explicit formula for $P(S_n = k)$ is due to Brown and Shubert (1984) (cf. Mahmoud (1992, p. 79)). Making use of the Brown–Shubert result, Mahmoud (1992, p. 80) derived formulas for the first two moments of $S_n$. The recursion (9.3.41) leads to a direct calculation of those moments, as we shall see in the next proposition.

Proposition 9.3.8
$$\text{(a)} \quad E S_n = 2\left(1 + \frac{1}{n}\right) H_n - 3. \tag{9.3.43}$$
$$\text{(b)} \quad \operatorname{Var} S_n = \left(2 + \frac{10}{n}\right) H_n - 4\left(1 + \frac{1}{n}\right)\left(\frac{H_n^2}{n} + H_n^{(2)}\right) + 4. \tag{9.3.44}$$
Proof: (a)
$$E S_n = 1 + E\left(E(S_{I_n} \mid I_n)\right) = 1 + \sum_{k=0}^{n-1} P(I_n = k)\, E S_k = 1 + \sum_{k=0}^{n-1} \frac{2k}{n^2}\, E S_k. \tag{9.3.45}$$
With $Q_n := n \cdot E S_n$, the recursion (9.3.45) leads to $Q_n = n + \frac{2}{n} \sum_{k=1}^{n-1} Q_k$, $Q_1 = 1$, which implies $Q_{n+1} = \frac{2n+1}{n+1} + \frac{n+2}{n+1}\, Q_n$. Iteratively,
$$\begin{aligned}
Q_n &= \frac{2n-1}{n} + \sum_{k=1}^{n-1} \frac{2k-1}{k} \cdot \frac{n+1}{k+1} \\
&= (n+1)\left[\sum_{k=1}^{n} \frac{2}{k+1} - \sum_{k=1}^{n} \left(\frac{1}{k} - \frac{1}{k+1}\right)\right] \\
&= (n+1)\left[2\,(H_{n+1} - 1) - 1 + \frac{1}{n+1}\right] \\
&= 2(n+1) H_n - 3n.
\end{aligned}$$
(b)
$$E S_n^2 = 1 + 2\, E S_{I_n} + E S_{I_n}^2 = 1 + 2\,(E S_n - 1) + \sum_{j=1}^{n-1} \frac{2j}{n^2}\, E S_j^2.$$
With $P_n := n \cdot E S_n^2$, we obtain $P_n = -n + 2 Q_n + \frac{2}{n} \sum_{j=1}^{n-1} P_j$. This yields
$$\frac{n+1}{2}\, P_{n+1} - \frac{n}{2}\, P_n = \frac{2n+1}{2} + 2 Q_n + P_n.$$
By (a), we now have
$$P_{n+1} = 8 H_n - \frac{10n - 1}{n+1} + \frac{n+2}{n+1}\, P_n,$$
and iterating the above expression, we get
$$P_n = \sum_{j=1}^{n} \left(8 H_j - \frac{10 j - 3}{j}\right) \frac{n+1}{j+1}.$$
The relation
$$\sum_{j=1}^{n} \frac{H_j}{j} = \frac{H_n^{(2)} + H_n^2}{2}$$
leads to an explicit calculation of $P_n$, which yields (9.3.44). □
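The closed forms (9.3.43) and (9.3.44) can be compared directly with moments computed from the recursion (9.3.41). The following sketch uses exact fractions; the function names are hypothetical and the code is only a numerical cross-check, not the argument of the proof.

```python
from fractions import Fraction


def harmonic(n, k=1):
    """Generalized harmonic number H_n^(k) as an exact fraction."""
    return sum(Fraction(1, j**k) for j in range(1, n + 1))


def moments_by_recursion(n_max):
    """E S_n and E S_n^2 from S_n = 1 + S_{I_n}, with S_0 = 0 and S_1 = 1."""
    m1 = [Fraction(0), Fraction(1)]
    m2 = [Fraction(0), Fraction(1)]
    for n in range(2, n_max + 1):
        # P(I_n = 0) = 1/n contributes nothing because S_0 = 0;
        # P(I_n = j) = 2j/n^2 for 1 <= j <= n - 1.
        e1 = sum(Fraction(2 * j, n * n) * m1[j] for j in range(1, n))
        e2 = sum(Fraction(2 * j, n * n) * m2[j] for j in range(1, n))
        m1.append(1 + e1)
        m2.append(1 + 2 * e1 + e2)
    return m1, m2


def closed_form_mean(n):
    """Right-hand side of (9.3.43)."""
    return 2 * (1 + Fraction(1, n)) * harmonic(n) - 3


def closed_form_variance(n):
    """Right-hand side of (9.3.44)."""
    h, h2 = harmonic(n), harmonic(n, 2)
    return (2 + Fraction(10, n)) * h - 4 * (1 + Fraction(1, n)) * (h * h / n + h2) + 4


if __name__ == "__main__":
    m1, m2 = moments_by_recursion(30)
    ok = all(
        m1[n] == closed_form_mean(n) and m2[n] - m1[n] ** 2 == closed_form_variance(n)
        for n in range(1, 31)
    )
    print("closed forms match the recursion for n = 1..30:", ok)
```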
Our next step is to show that $(S_n)$ after a logarithmic normalization merges to a sequence of normal r.v.s. Define the following normalized version of $(S_n)$:
$$\hat S_n := \frac{S_n - E S_n}{\sqrt{2 \ln n}}, \qquad \hat S_0 = \hat S_1 = 0. \tag{9.3.46}$$
Let
$$a(k, n) := 1 - E S_n + E S_k, \qquad b(k) := \operatorname{Var} S_k, \qquad \sigma_n^2 := \operatorname{Var} \hat S_n. \tag{9.3.47}$$
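For our derivation we need the following (so far unchecked):
$$\text{(C)} \quad \limsup_{n \to \infty} \int y^2 \left|\sum_{k=2}^{n-1} \frac{2k}{n^2} \left[\Phi\left(\frac{y - a(k,n)}{\sqrt{b(k)}}\right) - \Phi\left(\frac{y}{\sqrt{b(n)}}\right)\right]\right| dy < \infty. \tag{9.3.48}$$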
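Here, $\Phi$ is the standard normal d.f. Let $(Z_n)$ be independent of $(S_n)$, and $Z_n \sim N(0, \sigma_n^2)$.

Theorem 9.3.9 Suppose that (C) holds. Then there exists a constant $K < \infty$ such that
$$\zeta_3\left(\hat S_n, Z_n\right) \le \frac{K}{\sqrt{\ln n}}. \tag{9.3.49}$$
Proof: Note first that $(\hat S_n)$ satisfies the recursion
$$\hat S_n \stackrel{d}{=} \sqrt{\frac{\ln I_n}{\ln n}}\, \hat S_{I_n} + c_n(I_n), \tag{9.3.50}$$
where $c_n(k) := (1 - E S_n + E S_k)/\sqrt{2 \ln n}$. Define then the accompanying sequence
$$Z_n^* \stackrel{d}{=} \sqrt{\frac{\ln I_n}{\ln n}}\, Z_{I_n} + c_n(I_n). \tag{9.3.51}$$
Applying the “ideality” properties of the metric $\zeta_3$, we obtain the following recursive bound for $\zeta_3(\hat S_n, Z_n)$:
$$\begin{aligned}
\zeta_3\left(\hat S_n, Z_n\right) &\le \zeta_3\left(\hat S_n, Z_n^*\right) + \zeta_3(Z_n^*, Z_n) \\
&\le \sum_{k=0}^{n-1} P(I_n = k)\, \zeta_3\left(\sqrt{\frac{\ln k}{\ln n}}\, \hat S_k + c_n(k),\ \sqrt{\frac{\ln k}{\ln n}}\, Z_k + c_n(k)\right) + \zeta_3(Z_n^*, Z_n) \\
&\le \sum_{k=2}^{n-1} \frac{2k}{n^2} \left(\frac{\ln k}{\ln n}\right)^{3/2} \zeta_3\left(\hat S_k, Z_k\right) + \zeta_3(Z_n^*, Z_n).
\end{aligned} \tag{9.3.52}$$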
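To estimate the $\zeta_3$-distance between $Z_n^*$ and $Z_n$ we compute the first two moments of $Z_n^*$:
$$\begin{aligned}
E Z_n^* &= \sum_{k=0}^{n-1} P(I_n = k)\, E\left(\sqrt{\frac{\ln k}{\ln n}}\, Z_k + c_n(k)\right) \\
&= \sum_{k=0}^{n-1} P(I_n = k)\, \frac{1 - E S_n + E S_k}{\sqrt{2 \ln n}} \\
&= (2 \ln n)^{-1/2}\, [1 - E S_n + E S_{I_n}] = 0 = E Z_n,
\end{aligned}$$
and similarly, $E (Z_n^*)^2 = \frac{1}{2 \ln n} \operatorname{Var} S_n = \operatorname{Var} \hat S_n$. Now we obtain
$$\zeta_3(Z_n^*, Z_n) \le \frac{1}{6}\, \kappa_3(Z_n^*, Z_n) = \frac{1}{2} \int x^2 \left|F_{Z_n^*}(x) - F_{Z_n}(x)\right| dx.$$
Furthermore, $F_{Z_n}(x) = \Phi(x/\sigma_n) = \Phi\left(x \sqrt{2 \ln n}/\sqrt{b(n)}\right)$, and
$$\begin{aligned}
F_{Z_n^*}(x) &= \sum_{k=0}^{n-1} P(Z_n^* \le x \mid I_n = k) \cdot P(I_n = k) \\
&= \sum_{k=2}^{n-1} \frac{2k}{n^2}\, \Phi\left(\frac{x \sqrt{2 \ln n} - a(k,n)}{\sqrt{b(k)}}\right) + \frac{1}{n}\, 1_{[1 - E S_n, \infty)}\left(x \sqrt{2 \ln n}\right) + \frac{2}{n^2}\, 1_{[2 - E S_n, \infty)}\left(x \sqrt{2 \ln n}\right).
\end{aligned}$$
Applying the substitution $y = x \cdot \sqrt{2 \ln n}$, the above implies
$$\zeta_3(Z_n^*, Z_n) \le \frac{1}{2 \cdot (2 \ln n)^{3/2}}\, [A_n + B_n + C_n], \tag{9.3.53}$$
where
$$A_n := \frac{1}{n} \int y^2 \left|1_{[1 - E S_n, \infty)}(y) - \Phi\left(\frac{y}{\sqrt{b(n)}}\right)\right| dy, \qquad B_n := \frac{2}{n^2} \int y^2 \left|1_{[2 - E S_n, \infty)}(y) - \Phi\left(\frac{y}{\sqrt{b(n)}}\right)\right| dy,$$
and
$$C_n := \int y^2 \left|\sum_{k=2}^{n-1} \frac{2k}{n^2} \left[\Phi\left(\frac{y - a(k,n)}{\sqrt{b(k)}}\right) - \Phi\left(\frac{y}{\sqrt{b(n)}}\right)\right]\right| dy.$$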
Invoking the assumption (C), we obtain $C_n \le M_C$ for all $n \in \mathbb{N}$ and a fixed constant $M_C$. For $n \ge n_0$ we have $E S_n \ge 1$ and
$$\begin{aligned}
A_n &\le \frac{1}{n} \int y^2 \left|\Phi\left(\frac{y}{\sqrt{b(n)}}\right) - 1_{[0, \infty)}(y)\right| dy + \frac{1}{n} \int y^2\, 1_{[1 - E S_n, 0)}(y)\, dy \\
&\le \frac{1}{n} \cdot \frac{2\sqrt{2}}{3\sqrt{\pi}}\, \sqrt{b(n)}^{\,3} + \frac{1}{n} \cdot \frac{1}{3}\, (E S_n - 1)^3 \xrightarrow[n \to \infty]{} 0.
\end{aligned}$$
The last bound follows from the following asymptotics: $b(n) = \operatorname{Var} S_n \sim 2 \ln n$, and $E S_n \sim 2 \ln n$. Therefore, $A_n \le M_A$, and similarly $B_n \le M_B$, for all $n$, and we obtain
$$\zeta_3(Z_n, Z_n^*) \le \frac{M}{(\ln n)^{3/2}}, \tag{9.3.54}$$
where $M$ is a fixed constant. Next, we need to apply the Euler summation formula (cf. Hofri (1987, p. 19)) to the function $f(x) = x \ln x$, $x \ge 1$:
$$\sum_{j=1}^{n-1} f(j) = \int_1^n f(x)\, dx + \sum_{k=1}^{m} \frac{B_k}{k!} \left(f^{(k-1)}(n) - f^{(k-1)}(1)\right) + R_m, \tag{9.3.55}$$
where $(B_k)$ are the Bernoulli numbers. In (9.3.55) the term $R_m$ has the form
$$R_m = \frac{(-1)^{m+1}}{m!} \int_1^n B_m(\{x\})\, f^{(m)}(x)\, dx, \qquad \{x\} = x - \lfloor x \rfloor,$$
where $B_m(x) = \sum_{k \ge 0} \binom{m}{k} B_k\, x^{m-k}$ is the $m$th Bernoulli polynomial. After some calculations, (9.3.55) with $m = 2$ yields
$$\sum_{j=2}^{n-1} j \ln j = \frac{1}{2}\, n^2 \ln n - \frac{1}{4}\, n^2 - \frac{1}{2}\, n \ln n + O(\ln n). \tag{9.3.56}$$
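The expansion (9.3.56) is easy to probe numerically. The sketch below (hypothetical helper names, floating-point arithmetic) prints the remainder after subtracting the three leading terms, divided by $\ln n$; by (9.3.56) this ratio should stay bounded.

```python
import math


def partial_sum(n):
    """Left-hand side of (9.3.56): sum of j * ln j for j = 2, ..., n - 1."""
    return sum(j * math.log(j) for j in range(2, n))


def leading_terms(n):
    """The three explicit terms on the right-hand side of (9.3.56)."""
    return 0.5 * n * n * math.log(n) - 0.25 * n * n - 0.5 * n * math.log(n)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        remainder = partial_sum(n) - leading_terms(n)
        print(n, remainder / math.log(n))
```

Since the term $-\frac{1}{2} n \ln n$ dominates the $O(\ln n)$ remainder for large $n$, the partial sum eventually lies below $\frac{1}{2} n^2 \ln n - \frac{1}{4} n^2$, which is the form in which the expansion is used in the induction step.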
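Consider a sufficiently large $n_0$ such that for $n \ge n_0$, $\sum_{j=2}^{n-1} j \ln j \le \frac{1}{2}\, n^2 \ln n - \frac{1}{4}\, n^2$. Choose $\tilde M$ large enough that (9.3.49) (with $\tilde M$ instead of $K$) holds for $n < n_0$ and define $K := \max\left(\tilde M, 2M\right)$. So from (9.3.52), (9.3.54), using inductive arguments and assuming (9.3.49) for all $k < n$, we obtain the final bound:
$$\begin{aligned}
\zeta_3\left(\hat S_n, Z_n\right) &\le \sum_{k=2}^{n-1} \frac{2k}{n^2} \left(\frac{\ln k}{\ln n}\right)^{3/2} \frac{K}{\sqrt{\ln k}} + \frac{M}{(\ln n)^{3/2}} \\
&= \frac{1}{(\ln n)^{3/2}} \cdot \frac{2}{n^2} \cdot K \sum_{k=2}^{n-1} k \ln k + \frac{M}{(\ln n)^{3/2}} \\
&\le \frac{1}{(\ln n)^{3/2}} \left[\frac{2K}{n^2} \left(\frac{1}{2}\, n^2 \ln n - \frac{1}{4}\, n^2\right) + M\right] \\
&\le \frac{1}{(\ln n)^{3/2}} \left[K \ln n - \frac{K}{2} + \frac{K}{2}\right] = \frac{K}{\sqrt{\ln n}}.
\end{aligned}$$
□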
Remark 9.3.10 In the preceding example, a direct proof of the convergence of $\hat S_n$ based on direct application of the method of probability metrics seems impossible. We were able to obtain the rate of convergence by induction arguments that use the Euler summation formula in a crucial way. This extension of the contraction technique seems to be potentially useful also for other examples in the theory of probability metrics.
Remark 9.3.11 Numerical simulations (for $n \le 10{,}000$) indicate that (C) is correct. Let us denote the integral in (9.3.48) for $n \in \mathbb{N}$ by $f(n)$. Numerical calculation in the range $-25$ to $25$ (with a Newton–Cotes algorithm with precision $10^{-5}$) leads to the graphs in Figures 9.13 and 9.14 of $f(n)$ against $n$, respectively against $\ln(\ln(\ln n))$. These graphs indicate the boundedness of $f$.

FIGURE 9.13. $f(n)$ against $n$

FIGURE 9.14. $f(n)$ against $\ln(\ln(\ln(n)))$

9.3.5 A Random Search Algorithm
In this section we consider a random search in a set of $n$ ordered states $\{1, 2, \dots, n\}$, starting in the largest state $n$. Let $(T_n)$ be an independent sequence of random natural numbers, $T_n \le n - 1$. After one step of the search we reach state $T_n \le n - 1$. The search is continued in the smaller set $\{1, \dots, T_n\}$ in the same way, reaching in the next step the state $T_{T_n} \le T_n - 1$. The search ends if state 1 is reached. Let $S_n$ denote the number of steps needed for this random search to reach this final state 1. Then $S_n$ satisfies the recursion
$$S_n \stackrel{d}{=} 1 + S_{T_n}, \qquad S_1 = 1. \tag{9.3.57}$$
With r.v.s $T_n$ being uniformly distributed on $\{1, \dots, n - 1\}$, this model has been used by Ross (1982, p. 118) and Bickel and Freeman (1981) in a search for an estimate of the mean number of steps in the simplex method (with $n$ extreme points). For applications to max-search problems we refer to Nevzorov (1988) and Pfeifer (1991). In their setting there are given independent r.v.s $X_1, \dots, X_n$, and $T_n$ is the largest index $k \le n - 1$ such that $X_k > X_n$. We add the index 0 to the state space, $T_n = 0$ meaning that no value larger than $X_n$ occurs. Consider now the r.v.s $I_1, \dots, I_n$, where $I_k$ is defined as 1 or 0 as state $k$ is visited by the search process or not. Then
$$S_n \stackrel{d}{=} \sum_{k=1}^{n} I_k. \tag{9.3.58}$$
Let $a_i \in [0, 1]$, $i \ge 1$, $a_1 = 1$, and consider the special search strategy
$$P(T_n = k) = \left(\prod_{m=k+1}^{n-1} b_m\right) a_k, \qquad 1 \le k \le n - 1, \tag{9.3.59}$$
where $b_m = 1 - a_m$ and $\prod_{m=n}^{n-1} b_m = 1$.
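Special cases:

(a) If $a_k = 1/k$, $b_k = (k-1)/k$, then
$$\alpha_{n,k} := P(T_n = k) = \left(\prod_{m=k+1}^{n-1} b_m\right) a_k = \frac{1}{k} \cdot \frac{k}{k+1} \cdots \frac{n-2}{n-1} = \frac{1}{n-1},$$
that is, this special case corresponds to the uniform search on $\{1, \dots, n-1\}$;

(b) If $a_k = 1 - e^{-\alpha_k}$, $b_k = e^{-\alpha_k}$ ($\alpha_1 = \infty$), then
$$\alpha_{n,k} = e^{-\sum_{m=k+1}^{n-1} \alpha_m} \left(1 - e^{-\alpha_k}\right).$$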
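The product formula (9.3.59) is simple to evaluate exactly. The sketch below (the function name `step_distribution` is hypothetical) computes $P(T_n = k)$ for a given sequence $(a_k)$ and confirms the telescoping in special case (a); it also illustrates that the probabilities sum to one whenever $a_1 = 1$.

```python
from fractions import Fraction


def step_distribution(a, n):
    """P(T_n = k), k = 1..n-1, under (9.3.59); `a` maps k to a_k and must have a[1] == 1."""
    probs = {}
    for k in range(1, n):
        prod = Fraction(1)
        for m in range(k + 1, n):  # empty product for k = n - 1
            prod *= 1 - a[m]
        probs[k] = prod * a[k]
    return probs


if __name__ == "__main__":
    n = 8
    uniform = {k: Fraction(1, k) for k in range(1, n)}
    print(step_distribution(uniform, n))

    # An arbitrary illustrative choice of (a_k) with a_1 = 1.
    general = {1: Fraction(1), 2: Fraction(3, 10), 3: Fraction(7, 10), 4: Fraction(1, 5)}
    print(sum(step_distribution(general, 5).values()))
```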
With our choice of the search probabilities in (9.3.59) we can easily see that the random variables
$$I_1, \dots, I_n \text{ are independent, and } I_i \stackrel{d}{=} B(1, a_i). \tag{9.3.60}$$
The above implies that
$$S_n \stackrel{d}{=} \sum_{i=1}^{n} I_i \tag{9.3.61}$$
is a sum of independent binomial random variables; in particular,
$$E S_n = \sum_{i=1}^{n} a_i, \qquad \operatorname{Var}(S_n) = \sum_{i=1}^{n} a_i b_i. \tag{9.3.62}$$
In the uniform search case this leads to
$$E S_n = \log n + \gamma + O\left(\frac{1}{n}\right) \tag{9.3.63}$$
and
$$\operatorname{Var}(S_n) = \log n + \gamma - \frac{\pi^2}{6} + O\left(\frac{1}{n}\right),$$
where $\gamma = 0.5772$ is the Euler constant. Suppose that $\lambda_n = \sum_{i=2}^{n} a_i$, and $\sum_{i=2}^{n} a_i^2 / \sum_{i=2}^{n} a_i = r_n$ is small for $n \to \infty$. Consider then the Kolmogorov distance
$$\rho(X, Y) = \sup_x |F_X(x) - F_Y(x)| \tag{9.3.64}$$
between Sn − 1 and a Poisson distributed random variable Zn with mean λn . From the results of Deheuvels and Pfeifer (1988) we have the following asymptotic approximation: 2 n 3 3/2 1 p2i / (Sn − 1, Zn ) = √ rn + O max pi , rn2 ; (9.3.65) 2 2πe 2 that is, P (Sn = k + 1) = e−λn λkn /k! + O(rn ).
(9.3.66)
Some alternative approximations of $S_n$ in terms of various probability metrics were studied in Rachev and Rüschendorf (1990).
9.3.6 Bucket Algorithms
Consider now $n$ i.i.d. r.v.s $X_1, \dots, X_n$ with density $f$ on $[0, 1]$, and let us divide $[0, 1]$ into $m$ intervals $A_i = \left[\frac{i-1}{m}, \frac{i}{m}\right]$, $1 \le i \le m$. Let $N = (N_1, \dots, N_m)$ be the vector of the number of r.v.s in the “$m$-buckets” $A_1, \dots, A_m$; in other words, $N_i = \sum_{j=1}^{n} 1_{A_i}(X_j)$. The total number of comparisons needed to sort $n$ random numbers by the bucket algorithm is given by
$$C_n = \sum_{i=1}^{m} \binom{N_i}{2} = \frac{1}{2}\,(T_n - n), \qquad T_n = \sum_{i=1}^{m} N_i^2 \tag{9.3.67}$$
(cf. Devroye (1986)). Since $N$ is multinomial $M(m; p_1, \dots, p_m)$-distributed with $p_i = \int_{A_i} f(x)\, dx$, we obtain
$$E C_n = \frac{n(n-1)}{2} \sum_{j=1}^{m} p_j^2. \tag{9.3.68}$$
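For a handful of points the expectation (9.3.68) can be verified by enumerating every assignment of points to buckets. This is a sketch under the assumption that each of the $n$ points falls independently into bucket $j$ with probability $p_j$; the function name and the example probabilities are illustrative only.

```python
from fractions import Fraction
from itertools import product


def expected_comparisons(n, p):
    """E C_n by brute force over all m^n bucket assignments (small n and m only)."""
    m = len(p)
    total = Fraction(0)
    for assignment in product(range(m), repeat=n):
        weight = Fraction(1)
        counts = [0] * m
        for bucket in assignment:
            weight *= p[bucket]
            counts[bucket] += 1
        # C_n = sum of binom(N_i, 2) over the buckets, as in (9.3.67).
        total += weight * sum(c * (c - 1) // 2 for c in counts)
    return total


if __name__ == "__main__":
    p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    n = 5
    formula = Fraction(n * (n - 1), 2) * sum(q * q for q in p)
    print(expected_comparisons(n, p) == formula)
```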
Therefore, in the case of a uniform distribution $p_j = 1/m$ and for $m/n \to \alpha \in (0, 1)$, we have $E C_n \approx n/(2\alpha)$. In the general case we have the following asymptotics for the first two moments of $C_n$:
$$E C_n \approx \frac{n}{2\alpha} \int_0^1 f^2(x)\, dx \tag{9.3.69}$$
and 2 2 3 1 2 4 3 2 f 2 (x) dx. (9.3.70) f (x) dx − + f (x) dx Var Cn → 2 n α α 2 ≥ f 2 (x) dx α
We shall demonstrate the method of probability metrics in order to obtain the asymptotic distribution of $C_n$ in the special case $m = 2$ and $n \to \infty$. We have
$$C_n = \frac{1}{2}\,(T_n - n), \qquad T_n = \left(\sum_{i=1}^{n} \zeta_i\right)^2 + \left(n - \sum_{i=1}^{n} \zeta_i\right)^2, \tag{9.3.71}$$
where the $\zeta_i$'s are i.i.d. Bernoulli random variables with success probability $p$. Define the approximating U-statistic based on a normal sample:
$$D_n = \frac{1}{2}\,(S_n - n), \qquad S_n = \left(\sum_{i=1}^{n} \eta_i\right)^2 + \left(n - \sum_{i=1}^{n} \eta_i\right)^2, \tag{9.3.72}$$
where $\eta_i \sim N(p, pq)$, $q := 1 - p$. (A detailed analysis of the distribution of $S_n$ can be found in Seidel (1988).) Next, by use of $\circ$ we shall denote the normalized quantities $\mathring\zeta_i = (\zeta_i - p)/\sqrt{pq}$, $\mathring\eta_i = (\eta_i - p)/\sqrt{pq}$, $\mathring T_n = \left(\sum \mathring\zeta_i\right)^2 + \left(n - \sum \mathring\zeta_i\right)^2$, etc.
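The next theorem provides estimates of closeness of $\mathring C_n$ and $\mathring D_n$ in terms of the Kantorovich $\ell_p$-metrics for $p = 1$ and $p = 2$.

Theorem 9.3.12 For $m = 2$ and $n \to \infty$,
$$\ell_2\left(\frac{\mathring C_n}{n^{3/2}}, \frac{\mathring D_n}{n^{3/2}}\right) = O(n^{-1/4}), \tag{9.3.73}$$
and
$$\ell_1\left(\frac{\mathring C_n}{n^{3/2}}, \frac{\mathring D_n}{n^{3/2}}\right) = O(n^{-1/2}). \tag{9.3.74}$$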
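Proof: To show (9.3.73) note that the normalization $n^{3/2}$ is of the right order. In fact,
$$\begin{aligned}
\operatorname{Var}\left(\frac{\mathring D_n}{n^{3/2}}\right) &= \frac{1}{4} \operatorname{Var}\left(n^{-3/2}\, \mathring S_n\right) \\
&= \frac{1}{4} \operatorname{Var}\left(\frac{2 \sum_{i=1}^{n} \sum_{j=1}^{n} \mathring\eta_i \mathring\eta_j - 2n \sum \mathring\eta_i}{n^{3/2}}\right) \\
&\approx \text{constant} \cdot \operatorname{Var}\left(\frac{2n \sum \mathring\eta_i}{n^{3/2}}\right) \approx \text{constant} > 0.
\end{aligned}$$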
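Since $\ell_2$ is the minimal $L_2$-metric, it follows that
$$\ell_2\left(\mathring C_n, \mathring D_n\right) \le L_2\left(\frac{1}{2}\left(\mathring T_n - n\right), \frac{1}{2}\left(\mathring S_n - n\right)\right) = \frac{1}{2}\, L_2\left(\mathring T_n, \mathring S_n\right).$$
Thus
$$\begin{aligned}
\ell_2\left(\frac{\mathring C_n}{n^{3/2}}, \frac{\mathring D_n}{n^{3/2}}\right) &\le \frac{1}{2}\, L_2\left(\frac{\mathring T_n}{n^{3/2}}, \frac{\mathring S_n}{n^{3/2}}\right) \\
&= \frac{1}{2}\, L_2\left(\left(2 \sum_i \sum_j \mathring\zeta_i \mathring\zeta_j + n^2 - 2n \sum_i \mathring\zeta_i\right) n^{-3/2},\ \left(2 \sum_i \sum_j \mathring\eta_i \mathring\eta_j + n^2 - 2n \sum_i \mathring\eta_i\right) n^{-3/2}\right) \\
&\le L_2\left(n^{-3/2} \sum_i \sum_j \mathring\zeta_i \mathring\zeta_j,\ n^{-3/2} \sum_i \sum_j \mathring\eta_i \mathring\eta_j\right) + L_2\left(n^{-1/2} \sum_i \mathring\zeta_i,\ n^{-1/2} \sum_i \mathring\eta_i\right) =: I_1 + I_2.
\end{aligned}$$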
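Assuming that the pairs $(\mathring\zeta_i, \mathring\eta_i)$ are independent, we obtain
$$\begin{aligned}
I_1 &\le n^{-3/2}\, L_2\left(\sum_i \mathring\zeta_i^2, \sum_i \mathring\eta_i^2\right) + n^{-3/2}\, L_2\left(\sum_{i \ne j} \mathring\zeta_i \mathring\zeta_j, \sum_{i \ne j} \mathring\eta_i \mathring\eta_j\right) \\
&\le n^{-1/2}\, L_2\left(\mathring\zeta_1^2, \mathring\eta_1^2\right) + n^{-3/2} \left(E\left(\sum_{i \ne j} \left(\mathring\zeta_i \mathring\zeta_j - \mathring\eta_i \mathring\eta_j\right)\right)^2\right)^{1/2} \\
&= n^{-1/2}\, L_2\left(\mathring\zeta_1^2, \mathring\eta_1^2\right) + n^{-3/2} \left(\sum_{i \ne j} E\left(\mathring\zeta_i \mathring\zeta_j - \mathring\eta_i \mathring\eta_j\right)^2\right)^{1/2} \\
&= n^{-1/2}\, L_2\left(\mathring\zeta_1^2, \mathring\eta_1^2\right) + n^{-3/2}\, (n(n-1))^{1/2}\, L_2\left(\mathring\zeta_1 \mathring\zeta_2, \mathring\eta_1 \mathring\eta_2\right) \\
&\le n^{-1/2} \left(\operatorname{Var} \mathring\zeta_1^2 + \operatorname{Var}\left(\mathring\eta_1^2\right)\right) + n^{-1/2} \left(\operatorname{Var}\left(\mathring\zeta_1 \mathring\zeta_2\right) + \operatorname{Var}\left(\mathring\eta_1 \mathring\eta_2\right)\right) \\
&\le c\, n^{-1/2}.
\end{aligned} \tag{9.3.75}$$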
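Here and in the sequel $c$ stands for an absolute constant, which may be different at different places. Similarly,
$$\begin{aligned}
I_2^2 &= L_2^2\left(n^{-1/2} \sum \mathring\zeta_i,\ n^{-1/2} \sum \mathring\eta_i\right) \\
&\le \left(E\left|n^{-1/2} \sum \mathring\zeta_i\right| + E\left|n^{-1/2} \sum \mathring\eta_i\right|\right) L_1\left(n^{-1/2} \sum \mathring\zeta_i,\ n^{-1/2} \sum \mathring\eta_i\right) \\
&\le \left(\operatorname{Var} \mathring\zeta_1 + \operatorname{Var} \mathring\eta_1\right) L_1\left(n^{-1/2} \sum \mathring\zeta_i,\ n^{-1/2} \sum \mathring\eta_i\right) \\
&= 2\, L_1\left(n^{-1/2} \sum \mathring\zeta_i,\ n^{-1/2} \sum \mathring\eta_i\right).
\end{aligned} \tag{9.3.76}$$
Passing to the minimal metric, this yields
$$\ell_2\left(\mathring C_n, \mathring D_n\right) \le c\, n^{-1/2} + \sqrt{2}\, \sqrt{\ell_1\left(n^{-1/2} \sum \mathring\zeta_i,\ \mathring\eta_1\right)}. \tag{9.3.77}$$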
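The rate of convergence in the CLT for the $\ell_1 = \zeta_1$-metric has been discussed by Zolotarev (1986, Theorem 5.4.7) (see also Rachev and Rüschendorf (1990, Lemma 3.3)). It is given by
$$\ell_1\left(n^{-1/2} \sum \mathring\zeta_i,\ \mathring\eta_1\right) \le 11.5\, \max\left(\ell_1\left(\mathring\zeta_1, \mathring\eta_1\right),\ \zeta_2\left(\mathring\zeta_1, \mathring\eta_1\right)\right) n^{-1/2}, \tag{9.3.78}$$
where $\zeta_r$ is the Zolotarev ideal metric of order $r > 0$. This implies $\ell_2\left(\mathring C_n, \mathring D_n\right) \le C\, n^{-1/4}$. We can argue similarly to show (9.3.74). The bound (9.3.75) can be replaced by
$$I_1 \le n^{-1/2} \left(E\left|\mathring\zeta_1\right|^2 + E\left|\mathring\eta_1\right|^2\right) + n^{-1/2} \left(E\left|\mathring\zeta_1 \mathring\zeta_2\right| + E\left|\mathring\eta_1 \mathring\eta_2\right|\right) \le c\, n^{-1/2}. \tag{9.3.79}$$
Invoking (9.3.78), we see that the term $I_2$ is of order $n^{-1/2}$. □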
We can extend our results to the cases m = 3, 4. However, the proofs are computationally quite involved. For some general results on the asymptotic distributions of quadratic forms we refer to de Jong (1989).
10 Stochastic Differential Equations and Empirical Measures

10.1 Propagation of Chaos and Contraction of Stochastic Mappings

In this section we use contraction properties of stochastic mappings with respect to suitably chosen metrics in order to study some new examples of propagation of chaos. In particular, systems of stochastic differential equations (SDEs) with mean field type interactions and the corresponding nonlinear SDEs of McKean–Vlasov type for the limiting cases will be considered. We shall also study the rate of convergence to the corresponding limit. Assumptions on the smoothness and growth properties of the coefficients of the SDEs are to be reflected in the choice of the probability metric in order to obtain the required contraction properties. This allows us to investigate new types of interactions as well as to consider systems with relaxed Lipschitz assumptions.

10.1.1 Introduction
The notion “propagation of chaos” was introduced by Kac in his investigation of the relationship between simple Markov models of interacting particles and nonlinear Boltzmann-type equations; for an introduction to the theory of propagation of chaos we refer to Sznitman (1989). A formal definition follows. Let $(u_N)$ be a sequence of symmetric probability measures on $E^N$, $E$ a separable metric space, and let $u$ be a probability on $E$. Then $(u_N)$ is called $u$-chaotic if $\pi_k u_N \xrightarrow{w} u^{(k)}$. Here $\pi_k$ stands for the $k$-marginal distribution, $u^{(k)}$ is the $k$-fold product, and $\xrightarrow{w}$ denotes weak convergence. A basic example for chaotic sequences is McKean's interacting diffusion (cf. the “laboratory” example in Sznitman (1989, p. 172)). Consider a system of interacting diffusions
$$dX_t^{i,N} = dW_t^i + \frac{1}{N} \sum_{j=1}^{N} b\left(X_t^{i,N}, X_t^{j,N}\right) dt, \quad i = 1, \dots, N, \qquad X_0^{i,N} = x_0^i, \tag{10.1.1}$$
where the $W^i$ are independent Brownian motions and $b$ satisfies a certain Lipschitz condition. Let $u_N$ denote the distribution of $(X^{1,N}, \dots, X^{N,N})$. The nonlinear limiting equation is given by the McKean–Vlasov equation
$$dX_t = dB_t + \left(\int b(X_t, y)\, u_t(dy)\right) dt, \tag{10.1.2}$$
where $B_t$ is a Brownian motion and $u_t$ is the distribution of $X_t$. Then $u_N$ is $u$-chaotic, where $u$ is the distribution of $X$ on $C(\mathbb{R}_+, \mathbb{R}^d)$. An alternative example of chaotic behavior of particles, not described by SDEs, is given by the uniform distributions on “$p$-spheres.” Let $u_N$ denote the uniform distribution on the $p$-sphere of radius $N$ in $\mathbb{R}_+^N$, that is, on $S_{p,N} := \{x \in \mathbb{R}_+^N;\ \sum x_i^p = N\}$. Let $u$ denote the probability measure on $\mathbb{R}_+$ with density
$$f_p(x) = \frac{p^{1 - 1/p}}{\Gamma(1/p)}\, e^{-x^p/p}, \qquad x \ge 0.$$
Then for $N > k + p$, and $k$ and $N$ big enough,
$$\left\|\pi_k u_N - u^{(k)}\right\| \le \frac{2(k+p) + 1}{N - k - p}, \tag{10.1.3}$$
where $\|\cdot\|$ denotes the total variation distance (cf. Kuelbs (1977), Rachev and Rüschendorf (1991)). In particular, we obtain that $u_N$ is $u$-chaotic. This example has its origin in Poincaré's theorem on the asymptotic behavior of particle systems. More general examples of this kind have been developed in the statistical physics literature in connection with the “equivalence of ensembles” but typically without a quantitative estimate as in (10.1.3).
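As a quick sanity check on the density $f_p$, one can confirm numerically that it integrates to one for several values of $p$. The following is a rough sketch using a midpoint rule; the truncation point and step count are arbitrary choices of this illustration, not quantities from the text.

```python
import math


def density(x, p):
    """f_p(x) = p^(1 - 1/p) / Gamma(1/p) * exp(-x^p / p) for x >= 0."""
    return p ** (1.0 - 1.0 / p) / math.gamma(1.0 / p) * math.exp(-(x**p) / p)


def total_mass(p, upper=40.0, steps=200_000):
    """Midpoint-rule approximation of the integral of f_p over [0, upper]."""
    h = upper / steps
    return h * sum(density((i + 0.5) * h, p) for i in range(steps))


if __name__ == "__main__":
    for p in (1.0, 2.0, 3.5):
        print(p, total_mass(p))
```

For $p = 1$ the density is the standard exponential density, and for $p = 2$ it is the half-normal density $\sqrt{2/\pi}\, e^{-x^2/2}$.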
The main goal of this section is the study of propagation of chaos in several modifications of McKean's example. We shall be concerned with the form of the interaction and the regularity assumptions on the coefficients. To this end we introduce suitable probability metrics, allowing us to derive contraction properties of the stochastic equations defined by the corresponding linear equations. Dobrushin (1979) introduced the use of the Kantorovich metric for the interacting diffusions in the model (10.1.1), (10.1.2). The success of this metric is based on a coupling argument inherent in its definition. This metric has been applied since then in several other papers. For some modifications of the model (10.1.1), (10.1.2) we shall need alternatives to the Kantorovich metric that provide suitable regularity and ideality properties for the equations considered. In particular, we need metrics that are “ideal” of higher order when we relax the Lipschitz conditions in equations (10.1.1), (10.1.2). Our modifications allow us to treat much more complicated forms of interactions than those in the McKean example. In particular, we consider nonlinear interactions via some general energy function, as for example the $p$-norm of the vector of all pair interactions. We also consider interactions with “outside” particles over the whole past (history) of the process, describing some non-Markovian systems. We demonstrate the flexibility of the approach based on suitable probability metrics to analyze nonstandard forms of interactions and develop the tools to study complex physical systems.
10.1.2 Equations with p-Norm Interacting Drifts

Consider a system of $N$ interacting diffusions with $p$-norm interacting drifts; that is, the drift is given by the $p$th norm of the vector of all pair interactions (which can be viewed as the driving force in the system):
$$dX_t^{i,N} = dW_t^i + \left\{\frac{1}{N} \sum_{j=1}^{N} b^p\left(X_t^{i,N}, X_t^{j,N}\right)\right\}^{1/p} dt, \qquad X_0^{i,N} = X_0^i, \quad 1 \le i \le N, \tag{10.1.4}$$
$b \ge 0$, $p \ge 1$. (The $((W_t^i), X_0^i)$ are independent, identically distributed for all $i$.) We shall establish that each $X^{i,N}$ has a natural limit $\bar X^i$, where the $(\bar X^i)$ are independent copies of the solutions of a nonlinear equation
$$dX_t = dB_t + \left(\int b(X_t, y)^p\, u_t(dy)\right)^{1/p} dt, \qquad X_{t=0} = X_0, \tag{10.1.5}$$
with $B \stackrel{d}{=} W^1$ a process on $C_T$, and $u_t = P^{X_t}$. In order to obtain the necessary contraction properties of these equations, we consider the $L_p^*$-metric and the corresponding minimal $L_p^*$-metric ($\ell_p^*$), defined for processes $X$, $Y$ (or the corresponding probability measures $m_1, m_2 \in M^1(C_T)$). Here and in what follows $M^1(C_T)$ denotes the class of all probability distributions on $C_T$,
$$L_{p,t}^*(X, Y) := \left(E \sup_{s \le t} |X_s - Y_s|^p\right)^{1/p}, \tag{10.1.6}$$
and
$$\ell_{p,t}^*(m_1, m_2) := \inf\left\{L_{p,t}^*(X, Y);\ X \stackrel{d}{=} m_1,\ Y \stackrel{d}{=} m_2\right\}. \tag{10.1.7}$$
In (10.1.7) we tacitly assumed that the underlying probability space is rich enough to support all possible couplings of $m_1$, $m_2$, which is true, for example, in the case of atomless probability spaces. Define, for $m_0 \in M^1(C_T)$,
$$M_p(C_T, m_0) := \left\{m_1 \in M^1(C_T);\ \ell_{p,T}^*(m_0, m_1) < \infty\right\}, \tag{10.1.8}$$
and let $X_p(C_T, m_0)$ be the class of processes on $C_T$ with distribution $m \in M_p(C_T, m_0)$. For $m_0 = \delta_a$ (the one-point measure in $a \in C_T$), this is the class of all distributions on $C_T$ with finite $p$th moment of the process norm. For $m \in M_p(C_T, m_0)$ consider the linear equation corresponding to (10.1.5)
$$X_t = B_t + \int_0^t \left(\int_{C_T} b(X_s, y_s)^p\, dm(y)\right)^{1/p} ds, \tag{10.1.9}$$
where $y_s$ is the value of $y$ at time $s$. Let $(B_t)$ be a real-valued process on $C_T = C[0, T]$ with finite $p$th absolute moment ($E \sup_{s \le T} |B_s|^p < \infty$), and let $b \ge 0$ be a Lipschitz function in $x$; that is,
$$|b(x_1, y) - b(x_2, y)| \le c\, |x_1 - x_2| \quad \text{for all } x_1, x_2 \text{ and } y \text{ in } C_T. \tag{10.1.10}$$
As usual, a strong solution of the SDE (10.1.9) means a solution measurable with respect to the augmented filtration of the process $(B_t)$. In contrast, a weak solution of (10.1.9) is defined on a suitable filtered space of distributions.

Lemma 10.1.1 Assume that (10.1.10) holds, and let
$$\int_0^T \left(\int b(0, y_s)^p\, dm(y)\right)^{1/p} ds < \infty.$$
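(a) Then equation (10.1.9) has a unique strong solution $X$.

(b) If $\Phi(m)$ is the law of $X$, then $\Phi(m) \in M_p(C_T, m_0)$; that is, $\Phi : M_p(C_T, m_0) \to M_p(C_T, m_0)$.

Proof: Let $X \in X_p(C_T, m_0)$, and define
$$(SX)_t := B_t + \int_0^t \left(\int b(X_s, y_s)^p\, dm(y)\right)^{1/p} ds.$$
Then, for $Y \in X_p(C_T, m_0)$,
$$\begin{aligned}
|(SX)_t - (SY)_t| &\le \int_0^t \left|\left(\int b(X_s, y_s)^p\, dm(y)\right)^{1/p} - \left(\int b(Y_s, y_s)^p\, dm(y)\right)^{1/p}\right| ds \\
&\le \int_0^t \left(\int |b(X_s, y_s) - b(Y_s, y_s)|^p\, dm(y)\right)^{1/p} ds \\
&\le c \int_0^t |X_s - Y_s|\, ds.
\end{aligned}$$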
This implies $\sup_{s \le t} |(SX)_s - (SY)_s| \le c \int_0^t \sup_{u \le s} |X_u - Y_u|\, ds$, and furthermore,
$$\begin{aligned}
L_{p,t}^*(SX, SY) &= \left(E \sup_{s \le t} |(SX)_s - (SY)_s|^p\right)^{1/p} \\
&\le c \left(E \left(\int_0^t \sup_{u \le s} |X_u - Y_u|\, ds\right)^p\right)^{1/p} \\
&\le c \int_0^t L_{p,s}^*(X, Y)\, ds.
\end{aligned}$$
Define inductively $X^0 := B$ and $X^n := S X^{n-1}$. Then the above bound yields
$$L_{p,T}^*(X^n, X^{n-1}) \le c^n\, \frac{T^n}{n!}\, L_{p,T}^*(X^1, X^0).$$
For the $L_{p,T}^*$-distance in the right-hand side we have the following estimate:
$$\begin{aligned}
L_{p,T}^*(X^1, X^0) &\le c \int_0^T \left[E|B_s|^p + \int b(0, y_s)^p\, dm(y)\right]^{1/p} ds \\
&\le c \int_0^T \left(E \sup_{u \le s} |B_u|^p\right)^{1/p} ds + c \int_0^T \left(\int b(0, y_s)^p\, dm(y)\right)^{1/p} ds \\
&\le c\, T \left(E \sup_{s \le T} |B_s|^p\right)^{1/p} + c \int_0^T \left(\int b(0, y_s)^p\, dm(y)\right)^{1/p} ds.
\end{aligned}$$
From the assumptions on $B$ and $b$, the above bound implies that $L_{p,T}^*(X^1, X^0) < \infty$. Consequently,
$$\sum_{n=1}^{\infty} L_{p,T}^*(X^n, X^{n-1}) \le e^{cT}\, L_{p,T}^*(X^1, X^0) < \infty,$$
by the Gronwall lemma. This results in $\sum_{n=1}^{\infty} \sup_{s \le T} |X_s^n - X_s^{n-1}| < \infty$ a.s., and therefore, $X^n$ converges to some process $X$ a.s., uniformly on bounded intervals. The limiting process $X$ is a.s. continuous, has finite $p$th moments (i.e., $\|X\|_{p,t}^* := E \sup_{s \le t} |X_s|^p < \infty$), and is a fixed point of the mapping $S$. So, $\Phi(m) = P^X \in M_p(C_T, m_0)$; this holds because $\|B\|_{p,T}^* < \infty$ and $L_{p,T}^*(X, B) < \infty$. □

In addition, suppose that $b$ is Lipschitz in both arguments; that is, for all $x_1$, $x_2$, $y_1$, and $y_2$ in $C_T$,
$$|b(x_1, y_1) - b(x_2, y_2)| \le c\, [|x_1 - x_2| + |y_1 - y_2|], \tag{10.1.11}$$
and consider the mapping $\Phi : M_p(C_T, m_0) \to M_p(C_T, m_0)$.

Lemma 10.1.2 (Contraction of $\Phi$ with respect to the $\ell_{p,t}^*$-minimal metric) Under the Lipschitz condition (10.1.11) and the assumptions of Lemma 10.1.1, for $t \le T$ and $m_1, m_2 \in M_p(C_T, m_0)$, the following holds:
$$\ell_{p,t}^*(\Phi(m_1), \Phi(m_2)) \le c\, e^{ct} \int_0^t \ell_{p,u}^*(m_1, m_2)\, du. \tag{10.1.12}$$
Proof: Let for $i = 1, 2$ and $t \le T$,
$$X_t^{(i)} := B_t + \int_0^t \left(\int_{C_T} b\left(X_s^{(i)}, y_s\right)^p dm_i(y)\right)^{1/p} ds,$$
and let $m \in M^1(m_1, m_2)$, the class of probability measures on $C_T \times C_T$ with marginals $m_1$, $m_2$. Then
$$\begin{aligned}
\sup_{s \le t} \left|X_s^{(1)} - X_s^{(2)}\right| &\le \int_0^t ds \left|\left[\int_{C_T} b\left(X_s^{(1)}, y_s^{(1)}\right)^p dm_1(y^{(1)})\right]^{1/p} - \left[\int_{C_T} b\left(X_s^{(2)}, y_s^{(2)}\right)^p dm_2(y^{(2)})\right]^{1/p}\right| \\
&\le \int_0^t ds \left[\int_{C_T \times C_T} \left|b\left(X_s^{(1)}, y_s^{(1)}\right) - b\left(X_s^{(2)}, y_s^{(2)}\right)\right|^p dm(y^{(1)}, y^{(2)})\right]^{1/p} \\
&\le c \int_0^t ds \left\{\left|X_s^{(1)} - X_s^{(2)}\right| + \left[\int \left|y_s^{(1)} - y_s^{(2)}\right|^p dm(y^{(1)}, y^{(2)})\right]^{1/p}\right\}.
\end{aligned}$$
Minimizing the right-hand side with respect to all couplings $m$, we obtain
$$\sup_{s \le t} \left|X_s^{(1)} - X_s^{(2)}\right| \le c \int_0^t ds\, \sup_{u \le s} \left|X_u^{(1)} - X_u^{(2)}\right| + c \int_0^t ds\, \ell_{p,s}^*(m_1, m_2). \tag{10.1.13}$$
Consequently, by Gronwall's lemma,
$$\sup_{s \le t} \left|X_s^{(1)} - X_s^{(2)}\right| \le c\, e^{ct} \int_0^t \ell_{p,s}^*(m_1, m_2)\, ds. \tag{10.1.14}$$
Finally, passing to the $p$th norm in the left-hand side of (10.1.14) and then to the corresponding minimal metric $\ell_{p,t}^*$ proves the lemma. □
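To get a feeling for the particle system (10.1.4), it can be simulated with a simple Euler–Maruyama discretization. The sketch below is not taken from the text: the time discretization, the standard normal initial law, and the bounded interaction `bounded_interaction` (nonnegative and Lipschitz in both arguments) are all assumptions made for illustration.

```python
import math
import random


def bounded_interaction(x, y):
    """Illustrative choice of b: nonnegative, bounded by 1, Lipschitz in x and y."""
    return min(abs(x - y), 1.0)


def simulate_pnorm_system(n_particles, p, horizon, n_steps, b, seed=0):
    """Euler-Maruyama sketch for dX^i = dW^i + ((1/N) sum_j b(X^i, X^j)^p)^(1/p) dt.

    Returns the initial positions, the Brownian increments accumulated up to
    time `horizon`, and the final positions.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    x = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # assumed initial law
    x0 = list(x)
    w = [0.0] * n_particles
    for _ in range(n_steps):
        # p-norm drift of each particle, evaluated at the current configuration.
        drift = [
            (sum(b(xi, xj) ** p for xj in x) / n_particles) ** (1.0 / p)
            for xi in x
        ]
        for i in range(n_particles):
            dw = rng.gauss(0.0, math.sqrt(dt))
            w[i] += dw
            x[i] += dw + drift[i] * dt
    return x0, w, x


if __name__ == "__main__":
    x0, w, x = simulate_pnorm_system(50, 2.0, 1.0, 200, bounded_interaction, seed=1)
    accumulated_drift = [x[i] - x0[i] - w[i] for i in range(len(x))]
    print(min(accumulated_drift), max(accumulated_drift))
```

Since $b \ge 0$, the accumulated drift of every particle is nonnegative, and with this particular bounded $b$ it cannot exceed the time horizon; the printed minimum and maximum make that visible.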
Theorem 10.1.3 Under the Lipschitz condition (10.1.11) and assuming that $\int_0^T \left(\int |b(0, y_s)|^p\,dm_0(y)\right)^{1/p} ds < \infty$, equation (10.1.1) has a unique weak and strong solution in $X_p(C_T, m_0)$.

Proof: From Lemma 10.1.2 we obtain that for $m \in M_p(C_T, m_0)$,
$$\begin{aligned}
\ell^*_{p,T}(\Phi^{k+1}(m), \Phi^k(m)) &\le c_T^k\,\frac{T^k}{k!}\,\ell^*_{p,T}(\Phi(m), m) \qquad (c_T = c\,e^{cT}) \\
&\le c_T^k\,\frac{T^k}{k!}\,\big(\ell^*_{p,T}(\Phi(m), m_0) + \ell^*_{p,T}(m, m_0)\big) < \infty.
\end{aligned}$$
Consequently, $(\Phi^k(m))$ is a Cauchy sequence in $(M_p(C_T, m_0), \ell^*_{p,T})$ and thus converges to a fixed point of $\Phi$. Let $X^{k+1}, X^k$ denote the couplings of $\Phi^{k+1}(m), \Phi^k(m)$. Then, by (10.1.12), we have that
$$L^*_{p,T}(X^{k+1}, X^k) \le c_T^k\,\frac{T^k}{k!}\,\ell^*_{p,T}(\Phi(m), m),$$
and therefore, we determine a unique strong solution with finite $p$th moment. □
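The Cauchy property in the proof rests on the summability of the factors $c_T^k T^k / k!$. As a small numerical sketch (the function name and the sample values $c = T = 1$ are illustrative assumptions, not taken from the text), the partial sums can be compared with the closed form $\exp(c_T T)$:

```python
import math

def picard_factor_sum(c, T, terms=80):
    """Partial sum of sum_k (c_T * T)^k / k! with c_T = c * exp(c * T)."""
    c_T = c * math.exp(c * T)
    total, term = 0.0, 1.0  # term holds (c_T * T)^k / k!
    for k in range(terms):
        total += term
        term *= c_T * T / (k + 1)
    return total

# The series is the exponential series, so it converges to exp(c_T * T).
print(picard_factor_sum(1.0, 1.0), math.exp(math.e))
```

The factorial in the denominator is what makes the iteration converge for every finite horizon $T$, with no smallness condition on $c$.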
Remark 10.1.4 While the linear equation in Lemma 10.1.1 can be handled with the $L_1$-metric, in Lemma 10.1.2 we obtain only a contraction with respect to the minimal $L_p$-metric $\ell^*_{p,T}$ (cf. equation (10.1.12)).
Remark 10.1.5 The result of Theorem 10.1.3 can be extended to the case $p = \infty$ by applying the metric
$$L^*_{\infty,T}(X, Y) = \operatorname{ess\,sup}\,\sup_{s\le T}|X_s - Y_s| \tag{10.1.15}$$
and the corresponding minimal metric
$$\ell^*_{\infty,T}(m_1, m_2) = \inf\{L^*_{\infty,T}(X, Y);\ X \overset{d}{=} m_1,\ Y \overset{d}{=} m_2\}. \tag{10.1.16}$$
Then the equation
$$X_t = B_t + \int_0^t \operatorname*{ess\,sup}_{u_s(dy)} |b(X_s, y)|\,ds \tag{10.1.17}$$
has a unique solution in $M_\infty(C_T, m_0)$ if $B$ is a.s. bounded, that is, if $\operatorname{ess\,sup}\sup_{s\le T}|B_s| < \infty$.

Remark 10.1.6 Several extensions of equation (10.1.5) can be handled in a similar way, as for example
$$X_t = B_t + \int_0^t \left(\int_{E^k} |b(X_s, y)|^p\,u_s^{(k)}(dy)\right)^{1/p} ds, \tag{10.1.18}$$
where $u_s^{(k)} = \bigotimes_{i=1}^k P^{X_s}$ stands for the $k$-fold product of $u_s$ and $y = (y_1, \dots, y_k) \in \mathbb{R}^k$. More generally, $b = b(s, x, y)$ can be dependent upon $s$ and the past of the process $y = (y_u)_{u\le s}$. In this case, $u_s$ has to be replaced by $u(s) := P^{(X_u)_{u\le s}}$ (the distribution of the past), and we need to assume a functional Lipschitz condition on $b$. In a similar way one can also investigate the $d$-dimensional case.

Taking as a starting point Theorem 10.1.3, we next investigate the system of interacting equations in (10.1.4). The following theorem asserts that as $N$ goes to infinity, each $X^{i,N}$ has a natural limit $\overline{X}^i$. Here, the $(\overline{X}^i)$ are independent copies of the solutions of the nonlinear equation (10.1.5).

Theorem 10.1.7 Let $b$ satisfy the Lipschitz condition (10.1.11) and suppose that $\int |b(\overline{X}_s^1, y_s)|^{2p}\,u_s(dy_s) < \infty$ a.s. Then
$$\sup_N \sqrt{N}\,E^{1/p}\sup_{t\le T}|X_t^{i,N} - \overline{X}_t^i|^p < \infty \quad\text{for } p \ge 2, \tag{10.1.19}$$
and
$$N^{p-1}\,E\sup_{t\le T}|X_t^{i,N} - \overline{X}_t^i|^p = o(1) \quad\text{for } 1 \le p \le 2.$$
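Proof: For notational convenience we drop the superscript $N$; then
$$\begin{aligned}
X_t^i - \overline{X}_t^i
&= \int_0^t \left(\Big\{\frac{1}{N}\sum_{j=1}^N |b(X_s^i, X_s^j)|^p\Big\}^{1/p} - \Big\{\int |b(\overline{X}_s^i, y)|^p\,u_s(dy)\Big\}^{1/p}\right) ds \\
&= \int_0^t ds\,\Bigg\{\Bigg[\Big(\frac{1}{N}\sum_j |b(X_s^i, X_s^j)|^p\Big)^{1/p} - \Big(\frac{1}{N}\sum_j |b(\overline{X}_s^i, X_s^j)|^p\Big)^{1/p}\Bigg] \\
&\qquad\quad + \Bigg[\Big(\frac{1}{N}\sum_j |b(\overline{X}_s^i, X_s^j)|^p\Big)^{1/p} - \Big(\frac{1}{N}\sum_j |b(\overline{X}_s^i, \overline{X}_s^j)|^p\Big)^{1/p}\Bigg] \\
&\qquad\quad + \Bigg[\Big(\frac{1}{N}\sum_j |b(\overline{X}_s^i, \overline{X}_s^j)|^p\Big)^{1/p} - \Big(\int |b(\overline{X}_s^i, y)|^p\,u_s(dy)\Big)^{1/p}\Bigg]\Bigg\}.
\end{aligned}$$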
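Set $\|X\|_T := \sup_{s\le T}|X_s|$. Then by the Minkowski inequality and the Lipschitz condition on $b$, the above equality implies
$$\begin{aligned}
\|X^i - \overline{X}^i\|_{T,p} &:= \left(E\|X^i - \overline{X}^i\|_T^p\right)^{1/p} \\
&\le \int_0^T ds\,\Bigg\{ c\,\|X_s^i - \overline{X}_s^i\|_p + c\,\frac{1}{N}\sum_{j=1}^N \|X_s^j - \overline{X}_s^j\|_p \\
&\qquad + \Bigg(E\Bigg|\Big(\frac{1}{N}\sum_j |b(\overline{X}_s^i, \overline{X}_s^j)|^p\Big)^{1/p} - \Big(\int |b(\overline{X}_s^i, y)|^p\,u_s(dy)\Big)^{1/p}\Bigg|^p\Bigg)^{1/p}\Bigg\}.
\end{aligned}$$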
Summing up over $i$ and using the symmetry, we find
$$\begin{aligned}
N\,\|X^1 - \overline{X}^1\|_{T,p} &= \sum_{i=1}^N \|X^i - \overline{X}^i\|_{T,p} \\
&\le 2c\int_0^T ds\,\Bigg\{\sum_{i=1}^N \|X_s^i - \overline{X}_s^i\|_p \\
&\qquad + \sum_{i=1}^N \Bigg(E\Bigg|\Big(\frac{1}{N}\sum_{j=1}^N |b(\overline{X}_s^i, \overline{X}_s^j)|^p\Big)^{1/p} - \Big(\int |b(\overline{X}_s^i, y)|^p\,u_s(dy)\Big)^{1/p}\Bigg|^p\Bigg)^{1/p}\Bigg\}.
\end{aligned}$$
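This amounts to
$$\|X^i - \overline{X}^i\|_{T,p} \le 2c\int_0^T ds\,\Bigg\{\|X^i - \overline{X}^i\|_{s,p} + \frac{1}{N}\sum_{i=1}^N\Bigg[E\Bigg|\Big(\frac{1}{N}\sum_j |b(\overline{X}_s^i, \overline{X}_s^j)|^p\Big)^{1/p} - \Big(\int |b(\overline{X}_s^i, y)|^p\,u_s(dy)\Big)^{1/p}\Bigg|^p\Bigg]^{1/p}\Bigg\},$$
and consequently, by the Gronwall lemma,
$$\begin{aligned}
\|X^i - \overline{X}^i\|_{T,p}
&\le 2c\,e^{2cT}\int_0^T ds\,\frac{1}{N}\sum_{i=1}^N\Bigg[E\Bigg|\Big(\frac{1}{N}\sum_{j=1}^N |b(\overline{X}_s^i, \overline{X}_s^j)|^p\Big)^{1/p} - \Big(\int |b(\overline{X}_s^i, y)|^p\,u_s(dy)\Big)^{1/p}\Bigg|^p\Bigg]^{1/p} \\
&= 2c\,e^{2cT}\int_0^T ds\,\Bigg(E\Bigg|\Big(\frac{1}{N}\sum_j |b(\overline{X}_s^1, \overline{X}_s^j)|^p\Big)^{1/p} - \Big(\int |b(\overline{X}_s^1, y)|^p\,u_s(dy)\Big)^{1/p}\Bigg|^p\Bigg)^{1/p}.
\end{aligned}$$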
By the Taylor expansion and with $Y_j := |b(\overline{X}_s^i, \overline{X}_s^j)|^p$ (conditionally on $\overline{X}_s^i$) we obtain
$$E\left|\Big(\frac{S_N}{N} + a\Big)^{1/p} - a^{1/p}\right|^p \le \frac{1}{p^p}\,a^{1-p}\,E\left|\frac{S_N}{N}\right|^p, \tag{10.1.20}$$
where $S_N = \sum_j (Y_j - a)$, $a = EY_j > 0$. Therefore, from the Marcinkiewicz–Zygmund inequality (cf. Chow and Teicher (1978, p. 357)), we conclude that
$$\sqrt{N}\,E^{1/p}\left|\Big(\frac{S_N}{N} + a\Big)^{1/p} - a^{1/p}\right|^p \le \text{const.}\;E^{1/p}\left|\frac{S_N}{\sqrt{N}}\right|^p = O(1).$$
This yields (10.1.19) for $p \ge 2$. For $1 \le p < 2$, the claim follows from the moment bounds of Pyke and Root (1968), giving $E\left|S_N / N^{1/p}\right|^p = o(1)$. Therefore,
$$N^{p-1}\,E\left|\Big(\frac{S_N}{N} + a\Big)^{1/p} - a^{1/p}\right|^p = o(1). \tag{10.1.21}$$
□

We next interpret Theorem 10.1.7 as a chaotic property of the diffusions governed by (10.1.4). Recall that by Proposition 2.2 in Sznitman (1989), a sequence $(u_N)$ of symmetric probability measures on $E^{(N)}$ is $u$-chaotic, $u \in M^1(E)$, if for $(X_1, \dots, X_N)$ distributed as $u_N$,
$$\frac{1}{N}\sum_{i=1}^N \delta_{X_i} \xrightarrow{\;w\;} u. \tag{10.1.22}$$
For $X_N := \frac{1}{N}\sum_{i=1}^N \delta_{X^{i,N}}$ we obtain from Theorem 10.1.7 that
$$X_N \xrightarrow{\;w\;} X, \tag{10.1.23}$$
where $X$ is the solution of equation (10.1.5). Therefore, with $m$ denoting the law of $X$ and $m_N$ denoting the law of $(X^{1,N}, \dots, X^{N,N})$ we obtain from (10.1.22) the following corollary.
Corollary 10.1.8 Under the assumptions of Theorems 10.1.3 and 10.1.7, the sequence (mN ) is m-chaotic.
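To make the interacting system behind Theorem 10.1.7 and Corollary 10.1.8 concrete, the following is a minimal Euler–Maruyama sketch of $N$ particles whose drift is the $p$th norm of the interaction vector, as in (10.1.25). It rests on assumptions that are not part of the text: the driving processes are taken to be standard Brownian motions, all particles start at 0, and the interaction `b` passed in the usage lines is a hypothetical Lipschitz choice. All function names are illustrative.

```python
import math
import random

def p_norm_drift(x_i, xs, b, p):
    """((1/N) * sum_j |b(x_i, x_j)|^p)^(1/p): drift seen by one particle."""
    return (sum(abs(b(x_i, x_j)) ** p for x_j in xs) / len(xs)) ** (1.0 / p)

def simulate(n_particles, p, b, T=1.0, steps=100, seed=0):
    """Euler-Maruyama scheme for dX^i = dW^i + drift_i dt (Brownian W^i assumed)."""
    rng = random.Random(seed)
    h = T / steps
    xs = [0.0] * n_particles  # assumed common starting point
    for _ in range(steps):
        drifts = [p_norm_drift(x, xs, b, p) for x in xs]
        xs = [x + d * h + math.sqrt(h) * rng.gauss(0.0, 1.0)
              for x, d in zip(xs, drifts)]
    return xs

# Hypothetical bounded Lipschitz interaction, used only for illustration.
b = lambda x, y: math.sin(y - x)
final_positions = simulate(n_particles=50, p=2, b=b)
print(len(final_positions))
```

Comparing such runs for growing $N$ against independent copies of the limiting nonlinear process is one way to observe the rates in (10.1.19) empirically; the sketch itself makes no claim about them.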
Remark 10.1.9 For $p = \infty$ (see (10.1.17)) the propagation of chaos property does not hold. Also, the case $0 < p < 1$ does not lead to propagation of chaos, and there does not exist a unique strong solution of
$$X_t = B_t + \int_0^t\!\!\int |b(X_s, y_s)|^p\,dm(y)\,ds. \tag{10.1.24}$$
Remark 10.1.10 (An example leading to a Burgers-type equation.) Consider the stochastic system
$$dX_t^i = dW_t^i + \left(\frac{1}{N}\sum_{j=1}^N |b(X_t^i, X_t^j)|^p\right)^{1/p} dt, \qquad i = 1, \dots, N, \tag{10.1.25}$$
with Lipschitz (in both arguments) interactive term $b(\cdot, \cdot)$. Then the instantaneous drift term seen by particle $i$ is
$$\Delta_i = \left(\frac{1}{N}\sum_{j=1}^N |b(X_t^i, X_t^j)|^p\right)^{1/p}.$$
Under the assumptions of Theorem 10.1.7, we have
$$\lim_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^N \left\{\Delta_i - \Big(\int b^p(X_t^i, y)\,u_t(dy)\Big)^{1/p}\right\}^2\right] = 0,$$
as well as
$$\lim_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^N \Big(\Delta_i^p - \int b^p(X_t^i, y)\,u_t(dy)\Big)\right]^2 = 0.$$
Similarly to the above limit relations we shall examine the average behavior of the "pseudo drift" $\frac{1}{N}\sum_{i=1}^N Z_i^p$. Here, $Z_i^p := \frac{1}{N-1}\sum_{j\ne i} \varphi^p_{N,a}(X_t^i - X_t^j)$, and $\varphi_{N,a}(x - y) = N^{ad/p}\,\varphi(N^a(x - y))$, where $\varphi(\cdot) \ge 0$ is smooth, compactly supported on $\mathbb{R}^d$, and $\int \varphi(x)\,dx = 1$. We consider the vector-valued case here. Note that
$$Z_1^p := \frac{1}{N-1}\sum_{j=2}^N \varphi^p_{N,a}(X_t^1 - X_t^j) = \frac{1}{N-1}\sum_{j=2}^N N^{ad}\,\varphi^p(N^a(X_t^1 - X_t^j)),$$
and consequently,
$$EZ_1^p = N^{ad}\,E\varphi^p(N^a(X_t^1 - X_t^2)) = \int\varphi^p\;\frac{N^{ad}\,E\varphi^p(N^a(X_t^1 - X_t^2))}{\int\varphi^p} \;\xrightarrow[N\to\infty]{}\; \|u_t\|^2_{L_2}\int\varphi^p =: u_{t,p}(X_t).$$
Consider next
$$\begin{aligned}
a_N &:= E\left[\frac{1}{N}\sum_{i=1}^N \big(Z_i^p - u_{t,p}(X_t^1)\big)\right]^2 \\
&= E\left[\frac{1}{N}\sum_{i=1}^N \Big(\frac{1}{N-1}\sum_{j\ne i} \varphi^p_{N,a}(X_t^i - X_t^j) - u_{t,p}(X_t^1)\Big)\right]^2 \\
&= E\left[\frac{1}{N-1}\sum_{j=2}^N \varphi^p_{N,a}(X_t^1 - X_t^j) - u_{t,p}(X_t^1)\right]^2.
\end{aligned}$$
Arguing as in Sznitman (1989, p. 196), we find that
aN →
⎧ ⎪ ⎪ 0 ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ∞
φ
2p
p 2 φ dx ut 2L2 2p φ
if 0 < a < d1 , if a = d1 , if a > d1 .
Therefore, only in the case of moderate interaction do we obtain Burgers' equation in the limit.
10.1.3 A Random Number of Particles
Let $(W^i)_{i\in\mathbb{N}}$ be a system of i.i.d. real-valued processes (as in (10.1.4)) with finite $p$th moments and let $(N_n)_{n\ge 1}$ be an i.i.d. integer-valued sequence of r.v.s independent of $(W^i)$. Consider the following system of SDEs with a random number of particles and interactions:
$$dX_t^{i,n} = dW_t^i + \left(\frac{1}{n}\sum_{j=1}^{N_n} |b(X_t^{i,n}, X_t^{j,n})|^p\right)^{1/p} dt, \qquad i = 1, \dots, N_n. \tag{10.1.26}$$
We assume that the following asymptotic stability condition holds:
$$\frac{N_n}{n} \to Y \quad\text{a.s. as } n \to \infty. \tag{10.1.27}$$
As in Section 10.1.2, it turns out that $X^{i,n}$ has a natural limit $\overline{X}^i$ that is a solution of the nonlinear SDE
$$dX_t = dB_t + Y^{1/p}\left(\int |b(X_t, y)|^p\,u_t(dy)\right)^{1/p} dt. \tag{10.1.28}$$
Here $B \overset{d}{=} W^1$, and $Y$ is assumed to be independent of $B$. For $m_0 \in M^1(C_T)$ let $M_p(C_T, m_0)$, $L^*_{p,T}$, $\ell^*_{p,T}$ be defined as in Section 10.1.2.
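The factor $Y^{1/p}$ in (10.1.28) comes from rewriting the normalization by $n$ in (10.1.26) as a normalization by $N_n$ times the ratio $N_n/n$. The sketch below checks this algebraic identity on a small deterministic example; the function names, the interaction `b`, and the sample values are hypothetical and serve only as an illustration.

```python
import math

def drift_normalized_by_n(x_i, xs, b, p, n):
    """((1/n) * sum_{j <= N_n} |b(x_i, x_j)|^p)^(1/p), with N_n = len(xs)."""
    return (sum(abs(b(x_i, x_j)) ** p for x_j in xs) / n) ** (1.0 / p)

def drift_factored(x_i, xs, b, p, n):
    """(N_n / n)^(1/p) times the empirical p-norm over the N_n particles."""
    ratio = len(xs) / n
    empirical = (sum(abs(b(x_i, x_j)) ** p for x_j in xs) / len(xs)) ** (1.0 / p)
    return ratio ** (1.0 / p) * empirical

b = lambda x, y: y - x          # hypothetical Lipschitz interaction
xs = [1.0, 2.0, 3.0]            # N_n = 3 particles, compared with n = 6
print(drift_normalized_by_n(0.0, xs, b, 2, 6), drift_factored(0.0, xs, b, 2, 6))
```

Under (10.1.27) the ratio $N_n/n$ converges to $Y$, while the empirical $p$-norm is the quantity already treated in Section 10.1.2, which is why the limiting drift carries the extra factor $Y^{1/p}$.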
Lemma 10.1.11 Suppose that $\int_0^t |B_s|\,ds < \infty$ a.s. Then for any $m \in M_p(C_T, m_0)$, there exists a unique strong solution of the equation
$$X_t = B_t + Y^{1/p}\int_0^t \left(\int_{C_T} |b(X_s, y_s)|^p\,dm(y)\right)^{1/p} ds. \tag{10.1.29}$$

Proof: Set $(SX)_t := B_t + Y^{1/p}\int_0^t \left(\int_{C_T} |b(X_s, y_s)|^p\,dm(y)\right)^{1/p} ds$. Then arguing in a similar fashion as in the proof of Lemma 10.1.1, we obtain the bound
$$\sup_{s\le t}|(SX)_s - (SY)_s| \le c\,Y^{1/p}\int_0^t \sup_{0\le u\le s}|X_u - Y_u|\,ds.$$
Defining inductively $X^0 := B$, $X^n := SX^{n-1}$, we have
$$\begin{aligned}
\sup_{s\le t}|X_s^n - X_s^{n-1}| &\le c^n\,Y^{n/p}\,\frac{t^n}{n!}\,\sup_{s\le t}|X_s^1 - X_s^0| \\
&\le c^n\,Y^{(n+1)/p}\,\frac{t^n}{n!}\left[\int_0^t |B_s|\,ds + \int_0^t \Big(\int |y_s|^p\,dm(y)\Big)^{1/p} ds\right] < \infty.
\end{aligned}$$
This indeed implies the existence of a unique strong solution. □
Given m ∈ Mp (CT , m0 ), let Φ(m) denote the distribution of the solution of (10.1.29). Then we have the following contraction-type property for the mapping Φ.
Lemma 10.1.12 Suppose that $A_p := \left\|c\,Y^{1/p}\,e^{cY^{1/p}}\right\|_p < \infty$. Then for $t \le T$, $m_1, m_2 \in M_p(C_T, m_0)$,
$$\ell^*_{p,t}(\Phi(m_1), \Phi(m_2)) \le A_p\int_0^t \ell^*_{p,s}(m_1, m_2)\,ds. \tag{10.1.30}$$
Proof: Let $X^{(i)}$ be the solution of the SDE
$$X_t^{(i)} = B_t + Y^{1/p}\int_0^t \left(\int_{C_T} |b(X_s^{(i)}, y_s)|^p\,dm_i(y)\right)^{1/p} ds.$$
Then, as in the proof of Lemma 10.1.2,
$$\sup_{s\le t}|X_s^{(1)} - X_s^{(2)}| \le c\,Y^{1/p}\int_0^t \sup_{u\le s}|X_u^{(1)} - X_u^{(2)}|\,ds + c\int_0^t ds\,\ell^*_{p,s}(m_1, m_2).$$
By the Gronwall lemma,
$$\sup_{s\le t}|X_s^{(1)} - X_s^{(2)}| \le c\,Y^{1/p}\,e^{cY^{1/p}}\int_0^t \ell^*_{p,s}(m_1, m_2)\,ds,$$
implying that
$$\ell^*_{p,t}(\Phi(m_1), \Phi(m_2)) \le \left\|c\,Y^{1/p}\,e^{cY^{1/p}}\right\|_p \int_0^t \ell^*_{p,s}(m_1, m_2)\,ds.$$
□

From Lemmas 10.1.11 and 10.1.12 we conclude that (10.1.28) has a unique solution. The proof is similar to that of Theorem 10.1.3.

Theorem 10.1.13 Under the assumptions of Lemmas 10.1.11 and 10.1.12, equation (10.1.28) has a unique solution, provided that
$$\|B\|^*_{p,T} < \infty \quad\text{and}\quad \int \left(\int |b(0, y_s)|^p\,dm_0(y)\right)^{1/p} ds < \infty.$$
10.1.4 pth Mean Interactions in Time: A Non-Markovian Case

Suppose $(X_t^{i,N})_{i=1,\dots,N}$ determines a system of $N$ particles and let $b(X_s^{i,N}, \cdot) := (b(X_s^{i,N}, X_s^{j,N}))_{1\le j\le N}$ describe the interaction vector. Recall that in Section 10.1.2 we considered equation (10.1.4) with a drift of the form $\|b(X_s^{i,N}, \cdot)\|_p$ corresponding to the $p$th norm of the interaction vector. In this section we shall study SDEs with mean interactions in time. In fact, let
$$F_i(s) := \frac{1}{N}\sum_{j=1}^N b\big(X_s^{i,N}, X_s^{j,N}\big) \tag{10.1.31}$$
be the average of the interaction vector and consider the equations
$$X_t^{i,N} = W_t^i + \left(\int_0^t |F_i(s)|^p\,ds\right)^{1/p}; \qquad X_0^{i,N} = X_0^i,\quad 1 \le i \le N, \quad\text{for } 1 \le p < \infty; \tag{10.1.32}$$
$$X_t^{i,N} = W_t^i + \operatorname*{ess\,sup}_{s\le t,\ \lambda} |F_i(s)|; \qquad X_0^{i,N} = X_0^i,\quad 1 \le i \le N, \quad\text{for } p = \infty; \tag{10.1.33}$$
$$X_t^{i,N} = W_t^i + \int_0^t |F_i(s)|^p\,ds; \qquad X_0^{i,N} = X_0^i,\quad 1 \le i \le N, \quad\text{for } 0 < p < 1. \tag{10.1.34}$$
Here the essential supremum in (10.1.33) is taken with respect to Lebesgue measure $\lambda$.
In other words, we consider SDEs with a drift resulting from the $p$th mean in time of the average of the interaction vector. From the definition it is clear that this model describes a system that no longer behaves as a Markovian one, since the instantaneous drift $|F_i(t)|^p$ is weighted by the mean interaction $\frac{1}{p}\left(\int_0^t |F_i(s)|^p\,ds\right)^{1/p-1}$ over the whole past of the process. From this point of view the propagation of chaos property is less obvious in this model.

First we consider the case $1 \le p < \infty$. The nonlinear limiting equation is given by
$$X_t = B_t + \left(\int_0^t \Big|\int b(X_s, y)\,u_s(dy)\Big|^p ds\right)^{1/p}, \qquad u_s = P^{X_s}. \tag{10.1.35}$$
Here $X_t$, $B_t$, $b$ are real-valued, $B_t$ is a process in $C_T = C[0, T]$, and
$$|b(x_1, y) - b(x_2, y)| \le c\,|x_1 - x_2| \quad\text{for some } c > 0. \tag{10.1.36}$$
Define, for $m_0 \in M^1(C_T)$,
$$M_p(C_T, m_0) := \{m_1 \in M^1(C_T);\ \ell^*_{p,T}(m_1, m_0) < \infty\}. \tag{10.1.37}$$
Then, for $m \in M_p(C_T, m_0)$, consider the linear equation
$$X_t = B_t + \left(\int_0^t \Big|\int_{C_T} b(X_s, y_s)\,dm(y)\Big|^p ds\right)^{1/p}, \tag{10.1.38}$$
where $y_s$ is the value of $y$ at time $s$.
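The drift in (10.1.32) and (10.1.33) depends on the whole path of $F_i$ up to time $t$. A Riemann-sum sketch of the two functionals is given below; it assumes a uniform time grid and a path of $F_i$ supplied as a list of sampled values, and the function names are illustrative rather than taken from the text.

```python
import math

def time_mean_drift(F_values, h, p):
    """Riemann-sum approximation of (int_0^t |F(s)|^p ds)^(1/p) on a grid of step h."""
    return (sum(abs(f) ** p for f in F_values) * h) ** (1.0 / p)

def sup_drift(F_values):
    """Grid approximation of ess sup_{s <= t} |F(s)| (the case p = infinity)."""
    return max(abs(f) for f in F_values)

# For a constant path F = 2 on [0, t] the p-th mean equals 2 * t^(1/p).
F_path = [2.0] * 400          # 400 grid points with step 0.01, so t = 4
print(time_mean_drift(F_path, 0.01, 2), sup_drift(F_path))
```

Because both functionals need the full history of $F_i$, a simulation has to store the path rather than only the current state, which is the computational counterpart of the non-Markovian behavior noted above.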
Lemma 10.1.14 Assume that the Lipschitz condition (10.1.36) holds, and furthermore,
$$\int_0^T \Big|\int_{C_T} b(0, y_s)\,m_s(dy)\Big|^p ds < \infty,$$
where $m_s$ is the distribution of $X_s$ at time $s$ under $m$. Then
(a) Equation (10.1.38) has a unique strong solution $X$.
(b) If $\Phi(m)$ is the law of $X$, then $\Phi(m) \in M_p(C_T, m_0)$; that is, $\Phi : M_p(C_T, m_0) \to M_p(C_T, m_0)$.

Proof: Let $X \in X_p(C_T, m_0)$ and define
$$(SX)_t := B_t + \left(\int_0^t \Big|\int b(X_s, y)\,m_s(dy)\Big|^p ds\right)^{1/p}. \tag{10.1.39}$$
Then
$$\begin{aligned}
|(SX)_t - (SY)_t|^p
&= \left|\left(\int_0^t \Big|\int b(X_s, y)\,m_s(dy)\Big|^p ds\right)^{1/p} - \left(\int_0^t \Big|\int b(Y_s, y)\,m_s(dy)\Big|^p ds\right)^{1/p}\right|^p \\
&\le \left[\left(\int_0^t \Big|\int (b(X_s, y) - b(Y_s, y))\,m_s(dy)\Big|^p ds\right)^{1/p}\right]^p \quad\text{(Minkowski inequality)} \\
&\le \int_0^t c^p\,|X_s - Y_s|^p\,ds \quad\text{(by the Lipschitz condition (10.1.36))}.
\end{aligned}$$
This implies
$$\sup_{s\le t}|(SX)_s - (SY)_s|^p \le c^p\int_0^t \sup_{u\le s}|X_u - Y_u|^p\,ds, \tag{10.1.40}$$
and furthermore, $L^{*p}_{p,t}(SX, SY) \le c^p\int_0^t L^{*p}_{p,s}(X, Y)\,ds$. Define, inductively, $X^0 := B$, $X^n := SX^{n-1}$. Then $L^{*p}_{p,t}(X^n, X^{n-1}) \le c^{pn}\,\frac{T^n}{n!}\,L^{*p}_{p,T}(X^1, X^0)$. By (10.1.36), the integral $\int_{C_T} b(X_s, y_s)\,m(dy)$ is a Lipschitz function of $X_s$. Thus
$$\begin{aligned}
L^{*p}_{p,T}(X^1, X^0) &= E\sup_{t\le T}\int_0^t \Big|\int b(B_s, y)\,m_s(dy)\Big|^p ds \\
&\le E\int_0^T \Big(\int (|b(0, y)| + c\,|B_s|)\,m_s(dy)\Big)^p ds \\
&\le c_p\int_0^T \Big(\int |b(0, y)|\,m_s(dy)\Big)^p ds + c_p\,E\int_0^T |B_s|^p\,ds < \infty,
\end{aligned}$$
as by the assumptions the integrals in the right-hand side are finite. Therefore,
$$\sum_{n\ge 1} L^*_{p,T}(X^n, X^{n-1}) \le \sum_{n\ge 1} c^n\Big(\frac{T^n}{n!}\Big)^{1/p} L^*_{p,T}(X^1, X^0) < \infty.$$
This implies $\sum_{n\ge 1} L^*_{p,T}(X^n, X^{n-1}) < \infty$, and then also
$$\sum_{n\ge 1} L^*_{1,T}(X^n, X^{n-1}) < \infty.$$
In consequence, $X^n$ converges to some process $X$ a.s. uniformly on bounded intervals. $X$ is a.s. continuous, and $E\sup_{s\le t}|X_s|^p < \infty$, since $E\sup_{s\le t}|B_s|^p < \infty$. This yields $\Phi(m) \in M_p(C_T, m_0)$. □

In addition, suppose that $b$ is Lipschitz in both arguments; that is,
$$|b(x_1, y_1) - b(x_2, y_2)| \le c\,[\,|x_1 - x_2| + |y_1 - y_2|\,] \tag{10.1.41}$$
for all $x_1, x_2, y_1$, and $y_2$ in $C_T$, and consider the map $\Phi : M_p(C_T, m_0) \to M_p(C_T, m_0)$.

Lemma 10.1.15 (Contraction of $\Phi$ with respect to $\ell^*_{p,t}$) Suppose that (10.1.41) and the assumption of Lemma 10.1.14 hold. Then for $t < T$ and $m_1, m_2 \in M_p(C_T, m_0)$,
$$\ell^{*p}_{p,t}(\Phi(m_1), \Phi(m_2)) \le c_p\,e^{c_p t}\int_0^t \ell^{*p}_{p,s}(m_1, m_2)\,ds, \tag{10.1.42}$$
where $c_p := c\,2^{p-1}$.

Proof: For $i = 1, 2$ and $t \le T$, set
$$X_t^{(i)} = B_t + \left(\int_0^t \Big|\int_{C_T} b\big(X_s^{(i)}, y_s\big)\,dm_i(y)\Big|^p ds\right)^{1/p},$$
and let $m \in M^1(m_1, m_2)$, the class of probabilities on $C_T \times C_T$ with marginals $m_1$ and $m_2$. Then
$$\begin{aligned}
\sup_{s\le t}|X_s^{(1)} - X_s^{(2)}|^p
&= \left|\left(\int_0^t \Big|\int_{C_T} b\big(X_s^{(1)}, y_s^{(1)}\big)\,dm_1(y^{(1)})\Big|^p ds\right)^{1/p} - \left(\int_0^t \Big|\int_{C_T} b\big(X_s^{(2)}, y_s^{(2)}\big)\,dm_2(y^{(2)})\Big|^p ds\right)^{1/p}\right|^p \\
&\le \int_0^t ds\,\left|\int_{C_T\times C_T} \Big[b\big(X_s^{(1)}, y_s^{(1)}\big) - b\big(X_s^{(2)}, y_s^{(2)}\big)\Big]\,dm\big(y^{(1)}, y^{(2)}\big)\right|^p \\
&\le \int_0^t ds\,\left[c\,\Big(\big|X_s^{(1)} - X_s^{(2)}\big| + \int \big|y_s^{(1)} - y_s^{(2)}\big|\,dm\big(y^{(1)}, y^{(2)}\big)\Big)\right]^p.
\end{aligned}$$
Minimizing the right-hand side over all couplings, we get
$$\sup_{s\le t}\big|X_s^{(1)} - X_s^{(2)}\big|^p \le c_p\int_0^t ds\,\sup_{u\le s}\big|X_u^{(1)} - X_u^{(2)}\big|^p + c_p\int_0^t ds\,\ell^{*p}_{1,s}(m_1, m_2), \qquad c_p = c\,2^{p-1}.$$
Consequently, for $p \ge 1$, by the Gronwall lemma and $\ell^*_{1,s} \le \ell^*_{p,s}$,
$$\sup_{s\le t}\big|X_s^{(1)} - X_s^{(2)}\big|^p \le c_p\,e^{c_p t}\int_0^t ds\,\ell^{*p}_{p,s}(m_1, m_2).$$
This yields the desired contractive inequality
$$\ell^{*p}_{p,t}(\Phi(m_1), \Phi(m_2)) \le c_p\,e^{c_p t}\int_0^t \ell^{*p}_{p,s}(m_1, m_2)\,ds.$$
□
Theorem 10.1.16 Under (10.1.41) and $\int_0^T \Big|\int_{C_T} b(0, y_s)\,dm_0(y)\Big|^p ds < \infty$, equation (10.1.38) has a unique weak and strong solution in $X_p(C_T, m_0)$.

Proof: From Lemma 10.1.15 we obtain that for $m \in M_p(C_T, m_0)$,
$$\begin{aligned}
\ell^{*p}_{p,T}(\Phi^{k+1}(m), \Phi^k(m)) &\le C_T^k\,\frac{T^k}{k!}\,\ell^{*p}_{p,T}(\Phi(m), m) \\
&\le 2^{p-1}\,C_T^k\,\frac{T^k}{k!}\,\Big(\ell^{*p}_{p,T}(\Phi(m), m_0) + \ell^{*p}_{p,T}(m, m_0)\Big) < \infty,
\end{aligned}$$
with $C_T := c_p\,e^{c_p T}$. The remaining part of the proof is similar to that of Theorem 10.1.3. □
In the next step we turn our attention to the system of interacting particles defined in (10.1.32), where $((W_t^i), X_0^i)$ are independent processes and identically distributed for all $i$. The following theorem asserts that as $N \to \infty$, every $X^{i,N}$ has a natural limit $\overline{X}^i$. In fact, the $(\overline{X}^i)$ are independent copies of the solution of the nonlinear equation of McKean–Vlasov type,
$$X_t = B_t + \left(\int_0^t \Big|\int_{C_T} b(X_s, y)\,u_s(dy)\Big|^p ds\right)^{1/p}, \qquad X_{t=0} = X_0, \tag{10.1.43}$$
considered in Theorem 10.1.16 with $B \overset{d}{=} W^{(1)}$. Let $b$ satisfy the Lipschitz condition (10.1.36).

Theorem 10.1.17 Suppose that
$$\int |b(\overline{X}_s^1, y)|^p\,u_s(dy) < \infty \quad\text{a.s.} \tag{10.1.44}$$
Then for any $i \ge 1$, $T > 0$,
$$\sup_N \sqrt{N}\left(E\sup_{t\le T}|X_t^{i,N} - \overline{X}_t^i|^p\right)^{1/p} < \infty \quad\text{for } p \ge 2 \tag{10.1.45}$$
and
$$N^{(1/p)-1}\left(E\sup_{t\le T}|X_t^{i,N} - \overline{X}_t^i|^p\right)^{1/p} = o(1) \quad\text{for } 1 \le p < 2.$$
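Proof: We again drop the superscript $N$. Then
$$\begin{aligned}
X_t^i - \overline{X}_t^i
&= \left(\int_0^t \Big|\frac{1}{N}\sum_{j=1}^N b\big(X_s^i, X_s^j\big)\Big|^p ds\right)^{1/p} - \left(\int_0^t \Big|\int b\big(\overline{X}_s^i, y\big)\,u_s(dy)\Big|^p ds\right)^{1/p} \\
&= \left[\left(\int_0^t \Big|\frac{1}{N}\sum_{j=1}^N b\big(X_s^i, X_s^j\big)\Big|^p ds\right)^{1/p} - \left(\int_0^t \Big|\frac{1}{N}\sum_{j=1}^N b\big(\overline{X}_s^i, X_s^j\big)\Big|^p ds\right)^{1/p}\right] \\
&\quad + \left[\left(\int_0^t \Big|\frac{1}{N}\sum_{j=1}^N b\big(\overline{X}_s^i, X_s^j\big)\Big|^p ds\right)^{1/p} - \left(\int_0^t \Big|\frac{1}{N}\sum_{j=1}^N b\big(\overline{X}_s^i, \overline{X}_s^j\big)\Big|^p ds\right)^{1/p}\right] \\
&\quad + \left[\left(\int_0^t \Big|\frac{1}{N}\sum_{j=1}^N b\big(\overline{X}_s^i, \overline{X}_s^j\big)\Big|^p ds\right)^{1/p} - \left(\int_0^t \Big|\int b\big(\overline{X}_s^i, y\big)\,u_s(dy)\Big|^p ds\right)^{1/p}\right].
\end{aligned}$$
Applying the Minkowski inequality and setting $\|X\|_T := \sup_{s\le T}|X_s|$, we obtain
$$\begin{aligned}
\|X^i - \overline{X}^i\|^p_{T,p} &= E\|X^i - \overline{X}^i\|^p_T \\
&\le 4^{p-1}\Bigg[E\int_0^T ds\,\Big|\frac{1}{N}\sum_{j=1}^N \big(b(X_s^i, X_s^j) - b(\overline{X}_s^i, X_s^j)\big)\Big|^p
 + E\int_0^T ds\,\Big|\frac{1}{N}\sum_{j=1}^N \big(b(\overline{X}_s^i, X_s^j) - b(\overline{X}_s^i, \overline{X}_s^j)\big)\Big|^p \\
&\qquad\quad + E\int_0^T ds\,\Big|\frac{1}{N}\sum_{j=1}^N b(\overline{X}_s^i, \overline{X}_s^j) - \int b(\overline{X}_s^i, y)\,u_s(dy)\Big|^p\Bigg] \\
&\le 4^{p-1}\int_0^T ds\,\Bigg\{c^p\,E|X_s^i - \overline{X}_s^i|^p + c^p\,E\Big[\frac{1}{N}\sum_{j=1}^N |X_s^j - \overline{X}_s^j|\Big]^p \\
&\qquad\quad + E\Big|\frac{1}{N}\sum_{j=1}^N b(\overline{X}_s^i, \overline{X}_s^j) - \int b(\overline{X}_s^i, y)\,u_s(dy)\Big|^p\Bigg\}.
\end{aligned}$$
Summing up over $i$ and using the symmetry, we find that
$$\begin{aligned}
N\,\|X^i - \overline{X}^i\|^p_{T,p} &= \sum_{i=1}^N \|X^i - \overline{X}^i\|^p_{T,p} \\
&\le 4^{p-1}\int_0^T ds\,\Bigg\{c^p\sum_{i=1}^N \|X_s^i - \overline{X}_s^i\|_p^p + c^p\,N\,E\Bigg[\Big(\frac{1}{N}\sum_{j=1}^N |X_s^j - \overline{X}_s^j|^p\Big)^{1/p}\Bigg]^p \\
&\qquad\quad + c^p\sum_{i=1}^N E\Big|\frac{1}{N}\sum_{j=1}^N b(\overline{X}_s^i, \overline{X}_s^j) - \int b(\overline{X}_s^i, y)\,u_s(dy)\Big|^p\Bigg\}.
\end{aligned}$$
Therefore,
$$\|X^i - \overline{X}^i\|^p_{T,p} \le 4^{p-1}\int_0^T ds\,\Bigg\{c^p\,\|X^i - \overline{X}^i\|^p_{s,p} + c^p\,\|X^i - \overline{X}^i\|^p_{s,p} + \frac{1}{N}\sum_{i=1}^N E\Big|\frac{1}{N}\sum_{j=1}^N b(\overline{X}_s^i, \overline{X}_s^j) - \int b(\overline{X}_s^i, y)\,u_s(dy)\Big|^p\Bigg\}.$$
Consequently, by the Gronwall lemma (with $C_p = 2\cdot 4^{p-1}c^p$),
$$\begin{aligned}
\|X^i - \overline{X}^i\|^p_{T,p}
&\le C_p\,e^{C_p T}\int_0^T ds\,\Bigg[\frac{1}{N}\sum_{i=1}^N E\Big|\frac{1}{N}\sum_{j=1}^N b(\overline{X}_s^i, \overline{X}_s^j) - \int b(\overline{X}_s^i, y)\,u_s(dy)\Big|^p\Bigg] \\
&\le C_p\,e^{C_p T}\int_0^T ds\,E\Big|\frac{1}{N}\sum_{j=1}^N b(\overline{X}_s^i, \overline{X}_s^j) - \int b(\overline{X}_s^i, y)\,u_s(dy)\Big|^p \\
&= C_p\,e^{C_p T}\,T\cdot E\Big[O\Big(\frac{1}{\sqrt{N}}\Big)\Big]^p;
\end{aligned}$$
here we also used the Marcinkiewicz–Zygmund inequality (cf. Chow and Teicher (1978, p. 357)) for $p \ge 2$, and the Pyke and Root (1968) inequality for $1 \le p < 2$. □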
Corollary 10.1.18 Let $m$ denote the law of $X$ satisfying (10.1.43), and let $m_N$ be the law of $(X^{1,N}, \dots, X^{N,N})$. Then, under the assumptions of Theorems 10.1.16 and 10.1.17, $m_N$ is $m$-chaotic.

We next study the limiting case $p = \infty$ (cf. (10.1.33)). In contrast to the limiting case of $p$th norm interaction in Section 10.1.2, we obtain the propagation of chaos property for $p$th mean interaction in time under a stronger Lipschitz condition. Consider for $m \in M^1(C_T)$,
$$X_t = B_t + \operatorname*{ess\,sup}_{s\le t}\Big|\int_{C_T} b(X_s, y)\,m(dy)\Big|. \tag{10.1.46}$$
Here $X_t$, $B_t$, and $b$ are real-valued, $B_t$ is a process on $C_T$ having finite $p$th moment, $E\operatorname{ess\,sup}_{s\le T}|B_s|^p < \infty$, and $b$ satisfies the Lipschitz condition for all $x_1, x_2$, and $y \in C_T$:
$$|b(x_1, y) - b(x_2, y)| \le c\,|x_1 - x_2|, \quad\text{with } 0 < c < 1. \tag{10.1.47}$$
We shall use the following $L_p$-type metric for $p \ge 1$:
$$\widetilde{L}^*_{p,t}(X, Y) := \Big(E\operatorname*{ess\,sup}_{s\le t}|X_s - Y_s|^p\Big)^{1/p} \quad\text{in } X(C_T). \tag{10.1.48}$$
Let
$$\widetilde{\ell}^*_{p,t}(m_1, m_2) = \widehat{\widetilde{L}^*_{p,t}}(m_1, m_2) \tag{10.1.49}$$
be the corresponding minimal metric. Consider the set of measures in $M^1(C_T)$:
$$\widetilde{M}_p(C_T, m_0) = \{m_1 \in M^1(C_T);\ \widetilde{\ell}^*_{p,T}(m_1, m_0) < \infty\}, \tag{10.1.50}$$
and let $X_p(C_T, m_0)$ denote the corresponding class of processes. For $m_0 \in M^1(C_T)$ and $m \in \widetilde{M}_p(C_T, m_0)$, consider the linear equation
$$X_t = B_t + \operatorname*{ess\,sup}_{s\le t}\Big|\int_{C_T} b(X_s, y_s)\,dm(y)\Big|. \tag{10.1.51}$$

Lemma 10.1.19 Assume that the Lipschitz condition (10.1.47) holds, and let
$$\operatorname*{ess\,sup}_{s\le T}\Big|\int_{C_T} b(0, y_s)\,m(dy)\Big| < \infty.$$
Then
(a) Equation (10.1.51) has a unique solution $X$.
(b) If $\Phi(m)$ is the law of $X$, then $\Phi(m) \in \widetilde{M}_p(C_T, m_0)$; that is, $\Phi : \widetilde{M}_p(C_T, m_0) \to \widetilde{M}_p(C_T, m_0)$.
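Proof: Let $X \in X_p(C_T, m_0)$ and define
$$(SX)_t := B_t + \operatorname*{ess\,sup}_{s\le t}\Big|\int_{C_T} b(X_s, y_s)\,m(dy)\Big|.$$
Then
$$\begin{aligned}
|(SX)_t - (SY)_t|^p &= \left|\operatorname*{ess\,sup}_{0\le s\le t}\Big|\int_{C_T} b(X_s, y_s)\,m(dy)\Big| - \operatorname*{ess\,sup}_{0\le s\le t}\Big|\int_{C_T} b(Y_s, y_s)\,m(dy)\Big|\right|^p \\
&\le \operatorname*{ess\,sup}_{0\le s\le t} c^p\,|X_s - Y_s|^p
\end{aligned}$$
by the Lipschitz condition (10.1.47). This amounts to
$$\operatorname*{ess\,sup}_{0\le s\le t}|(SX)_s - (SY)_s|^p \le c^p\operatorname*{ess\,sup}_{s\le t}|X_s - Y_s|^p,$$
and $\widetilde{L}^*_{p,t}(SX, SY) \le c\,\widetilde{L}^*_{p,t}(X, Y)$. Define, inductively, $X^0 = B$, $X^n = SX^{n-1}$. Then $\widetilde{L}^*_{p,t}(X^n, X^{n-1}) \le c^n\,\widetilde{L}^*_{p,t}(X^1, X^0)$. Furthermore,
$$\begin{aligned}
\widetilde{L}^*_{p,T}(X^1, X^0) &= \left(E\operatorname*{ess\,sup}_{s\le T}\Big|\int b(B_s, y_s)\,m(dy)\Big|^p\right)^{1/p} \\
&\le \left(E\operatorname*{ess\,sup}_{s\le T}\Big[\Big|\int b(0, y_s)\,m(dy)\Big| + c\int |B_s|\,m(dy)\Big]^p\right)^{1/p} \\
&\le \operatorname*{ess\,sup}_{s\le T}\Big|\int b(0, y_s)\,m(dy)\Big| + c\,\Big(E\operatorname*{ess\,sup}_{s\le T}|B_s|^p\Big)^{1/p} < \infty.
\end{aligned}$$
This yields
$$\sum_{n\ge 1}\widetilde{L}^*_{p,T}(X^n, X^{n-1}) \le \sum_{n\ge 1} c^n\,\widetilde{L}^*_{p,T}(X^1, X^0) < \infty.$$
Therefore, $X^n \xrightarrow{\text{a.s.}} X$, uniformly on bounded intervals, and $E\operatorname{ess\,sup}_{s\le t}|X_s|^p < \infty$. □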
In addition, suppose that $b$ is a Lipschitz function in both arguments; for all $x_1, x_2, y_1$, and $y_2$ in $C_T$,
$$|b(x_1, y_1) - b(x_2, y_2)| \le c\,[\,|x_1 - x_2| + |y_1 - y_2|\,], \tag{10.1.52}$$
where we assume that $0 < c < \frac{1}{2}$. Consider the mapping $\Phi : \widetilde{M}_p(C_T, m_0) \to \widetilde{M}_p(C_T, m_0)$.

Lemma 10.1.20 (Contraction of $\Phi$ with respect to the minimal metric $\widetilde{\ell}^*_{p,t}$) Under (10.1.52) and the assumptions of Lemma 10.1.19, for $t < T$ and $m_1, m_2 \in \widetilde{M}_p(C_T, m_0)$, the following contraction property holds:
$$\widetilde{\ell}^*_{p,t}(\Phi(m_1), \Phi(m_2)) \le \frac{c}{1-c}\,\widetilde{\ell}^*_{p,t}(m_1, m_2). \tag{10.1.53}$$
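Proof: For $i = 1, 2$, and $t \le T$, define
$$X_t^{(i)} = B_t + \operatorname*{ess\,sup}_{0<s<t}\Big|\int b(X_s^{(i)}, y_s)\,dm_i(y)\Big|,$$
and let $m \in M^1(m_1, m_2)$. Then
$$\begin{aligned}
E\operatorname*{ess\,sup}_{s\le t}|X_s^{(1)} - X_s^{(2)}|^p
&= E\left|\operatorname*{ess\,sup}_{s\le t}\Big|\int_{C_T} b\big(X_s^{(1)}, y_s^{(1)}\big)\,dm_1\big(y^{(1)}\big)\Big| - \operatorname*{ess\,sup}_{s\le t}\Big|\int_{C_T} b\big(X_s^{(2)}, y_s^{(2)}\big)\,dm_2\big(y^{(2)}\big)\Big|\right|^p \\
&\le E\operatorname*{ess\,sup}_{s\le t}\left(c\,\Big[\big|X_s^{(1)} - X_s^{(2)}\big| + \int_{C_T\times C_T}\big|y_s^{(1)} - y_s^{(2)}\big|\,dm\big(y^{(1)}, y^{(2)}\big)\Big]\right)^p.
\end{aligned}$$
Therefore, passing to minimal metrics on the right-hand side,
$$\begin{aligned}
\Big(E\operatorname*{ess\,sup}_{s\le t}\big|X_s^{(1)} - X_s^{(2)}\big|^p\Big)^{1/p}
&\le c\,\Big(E\operatorname*{ess\,sup}_{s\le t}\big|X_s^{(1)} - X_s^{(2)}\big|^p\Big)^{1/p} \\
&\quad + c\left[\Big(\inf_{m\in M^1(m_1, m_2)}\operatorname*{ess\,sup}_{s\le t}\int_{C_T\times C_T}\big|y_s^{(1)} - y_s^{(2)}\big|\,dm(y_1, y_2)\Big)^p\right]^{1/p};
\end{aligned}$$
that is,
$$(1 - c)\,\Big(E\operatorname*{ess\,sup}_{s\le t}\big|X_s^{(1)} - X_s^{(2)}\big|^p\Big)^{1/p} \le c\,\widetilde{\ell}^*_{1,t}(m_1, m_2) \le c\,\widetilde{\ell}^*_{p,t}(m_1, m_2).$$
Passing to the minimal metrics in the left-hand side, we obtain
$$\widetilde{\ell}^*_{p,T}(\Phi(m_1), \Phi(m_2)) \le \frac{c}{1-c}\,\widetilde{\ell}^*_{p,T}(m_1, m_2),$$
as desired. □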
Next, we conclude the existence of a unique solution of the McKean–Vlasov-type equation
$$X_t = B_t + \operatorname*{ess\,sup}_{s\le t}\Big|\int b(X_s, y_s)\,u_s(dy_s)\Big|, \qquad X_{t=0} = X_0. \tag{10.1.54}$$

Theorem 10.1.21 Under (10.1.52), and assuming
$$\operatorname*{ess\,sup}_{s\le T}\Big|\int_{C_T} b(0, y_s)\,dm_0(y)\Big| < \infty,$$
equation (10.1.54) has (for $m \in \widetilde{M}_p(C_T, m_0)$) a unique weak and strong solution in $X_p(C_T, m_0)$.

Proof: From Lemma 10.1.20 with $C := \frac{c}{1-c}$, and $m \in \widetilde{M}_p(C_T, m_0)$, we conclude that
$$\widetilde{\ell}^*_{p,T}(\Phi^{k+1}(m), \Phi^k(m)) \le C^k\,\widetilde{\ell}^*_{p,T}(\Phi(m), m) < \infty,$$
which implies the theorem. □
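The role of the restriction $0 < c < \frac12$ in (10.1.52) is that it makes the constant $C = c/(1-c)$ in (10.1.53) strictly smaller than 1, so the bounds $C^k$ on successive iterates are summable. A small sketch of this bookkeeping follows; the function names and the sample value $c = 0.25$ are illustrative assumptions.

```python
def contraction_constant(c):
    """C = c / (1 - c); it is below 1 exactly when 0 < c < 1/2."""
    if not 0.0 < c < 0.5:
        raise ValueError("the contraction argument needs 0 < c < 1/2")
    return c / (1.0 - c)

def iterate_bound(c, d0, k):
    """Bound C^k * d0 on the distance between the k-th and (k+1)-th iterates."""
    return contraction_constant(c) ** k * d0

def total_bound(c, d0):
    """Geometric-series bound d0 / (1 - C) on the sum of all iterate distances."""
    return d0 / (1.0 - contraction_constant(c))

print(contraction_constant(0.25), total_bound(0.25, 1.0))
```

Unlike the factorial bounds of Theorem 10.1.3, there is no time integral here to produce a $T^k/k!$ factor, so the smallness of the Lipschitz constant has to supply the convergence.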
Consider next a system of $N$ interacting particles driven by the equation (10.1.33), namely
$$X_t^{i,N} = W_t^i + \operatorname*{ess\,sup}_{s\le t}\Big|\frac{1}{N}\sum_{j=1}^N b\big(X_s^{i,N}, X_s^{j,N}\big)\Big| \tag{10.1.55}$$
and $X_0^{i,N} = X_0^i$, $1 \le i \le N$. We shall show that $X^{i,N}$ has a natural limit $\overline{X}^i$, where the $\overline{X}^i$ are i.i.d. copies of the solution of (10.1.54).

Theorem 10.1.22 Suppose that (10.1.52) holds and that the r.v. $Y_{s,j} := b(\overline{X}_s^1, \overline{X}_s^j)$ on $C[0, T]$ are either (i) in the domain of normal attraction (dna) of a Gaussian law, or (ii) satisfy the bounded law of the iterated logarithm (BLIL). Suppose also that $E\|b(\overline{X}_s^1, \overline{X}_s^j)\|^2_\infty < \infty$. Then for any $i \ge 1$,
$$\sup_N a_N\,E\big\|X_t^{i,N} - \overline{X}_t^i\big\|_\infty < \infty, \tag{10.1.56}$$
where in case (i) $a_N = \sqrt{N}$, while $a_N = \sqrt{N/\log\log N}$ in case (ii).
Proof: Similarly to the proof of Theorem 10.1.17, we obtain from the condition $2c < 1$ that for $\alpha \ge 1$,
$$\widetilde{L}^*_{\alpha,T}(X^i, \overline{X}^i) \le \frac{1}{1-2c}\,\frac{1}{N}\sum_{i=1}^N E\operatorname*{ess\,sup}_{s\le T}\Big|\frac{1}{N}\sum_{j=1}^N b(\overline{X}_s^i, \overline{X}_s^j) - \int b(\overline{X}_s^i, y)\,u_s(dy)\Big|^\alpha.$$
If $(Y_{s,j})$ are in the dna (cf. Hoffmann-Jørgensen (1977)), then (10.1.56) follows with $a_N = \sqrt{N}$. If $(Y_{s,j})$ satisfy the BLIL, then for the corresponding centered sum $S_N$ we have $\overline{\lim}\,E\|S_N/a_N\|_\infty \le \overline{\lim}\,\|S_N/a_N\|_\infty < \infty$ a.s. (cf. Kuelbs (1977)), and thus (10.1.56) follows. □

We remark that invoking Corollary 5.7 of Hoffmann-Jørgensen (1977), a sufficient condition for the dna of $S_N = \sum_{j=1}^N X_j$ is given by
$$E\|X_1\|^2_{bL} < \infty, \tag{10.1.57}$$
where $\|\cdot\|_{bL}$ is the bounded Lipschitz norm with respect to a uniform distance to a Gaussian law.

Corollary 10.1.23 Suppose the assumptions of Theorems 10.1.21 and 10.1.22 hold. Let $m$ denote the law of $X$, and let $m_N$ stand for the law of $(X^{1,N}, \dots, X^{N,N})$. Then $m_N$ is $m$-chaotic.

Remark 10.1.24 Applying a similar technique in the case $0 < p < 1$, we see that there exists no unique solution of the linear equation, and furthermore, there is no propagation of chaos.
10.1.5 Minimal Mean Interactions in Time
Next, we study the analogue of equation (10.1.33) with minimal mean interaction in time:
$$X_t^{i,N} = W_t^i + \operatorname*{ess\,inf}_{s\le t}\Big|\frac{1}{N}\sum_{j=1}^N b(X_s^{i,N}, X_s^{j,N})\Big|, \qquad X_0^{i,N} = X_0^i,\quad 1 \le i \le N. \tag{10.1.58}$$
The corresponding Boltzmann-type equation is
$$X_t = B_t + \operatorname*{ess\,inf}_{s\le t}\Big|\int b(X_s, y)\,u_s(dy)\Big|, \qquad X_{t=0} = X_0. \tag{10.1.59}$$
We obtain the following results. (The proofs are similar to those in Section 10.1.4 and are therefore omitted.)
Theorem 10.1.25 Suppose that $m_0 \in M^1(C_T)$ and for all $x_1, x_2, y_1$, and $y_2$ in $C_T$,
$$|b(x_1, y_1) - b(x_2, y_2)| \le c\,[\,|x_1 - x_2| + |y_1 - y_2|\,], \tag{10.1.60}$$
where $0 < c < \frac{1}{2}$. Suppose also that
$$\operatorname*{ess\,sup}_{s\le T}\Big|\int_{C_T} b(0, y_s)\,dm_0(y)\Big| < \infty. \tag{10.1.61}$$
Then (10.1.59) has a unique strong solution in $X_p(C_T, m_0)$.

The system $(X^{i,N})$ in (10.1.58) has a natural limiting process $(\overline{X}^i)$, where $\overline{X}^i$, $i \ge 1$, are i.i.d. copies of the solution $X$ of (10.1.59).

Theorem 10.1.26 Suppose the assumptions of Theorem 10.1.25 and Theorem 10.1.22 hold. Then for any $i \ge 1$,
$$\sup_N a_N\,E\sup_{t\le T}\big|X_t^{i,N} - \overline{X}_t^i\big| < \infty. \tag{10.1.62}$$

Corollary 10.1.27 Under the conditions of Theorems 10.1.25 and 10.1.26, the system (10.1.58) admits the propagation of chaos.
10.1.6 Interactions with a Normalized Variation of the Neighbors: Relaxed Lipschitz Conditions
Consider the following stochastic system:
$$X_t^{i,N} = W_t^i + \int_0^t \left(\frac{1}{N}\sum_{j=1}^N b\big(X_s^{i,N}, \mathring{X}_s^{j,N}\big)\right) ds, \qquad X_0^{i,N} = X_0^i,\quad 1 \le i \le N. \tag{10.1.63}$$
Here
$$\mathring{X}_s^i := \frac{X_s^i - EX_s^i}{E|X_s^i - EX_s^i|} \tag{10.1.64}$$
is the normalized variation of particle $i$, and $((W_t^i), X_0^i)$ are independent identically distributed processes on $C_T \times \mathbb{R}$. The drift is given by the mean of the interactions with the normalized variation of all particles. We assume that
$$b(x, 0) = 0 \quad\text{for all } x; \tag{10.1.65}$$
that is, the interaction is zero if the relative variation is zero. The McKean–Vlasov-type equation corresponding to (10.1.63) is given by
$$X_t = B_t + \int_0^t \left(\int b(X_s, y)\,dP^{\mathring{X}_s}(y)\right) ds, \qquad X_{t=0} = X_0, \tag{10.1.66}$$
where $B \overset{d}{=} W^i$. Note that $B$ in this section is not necessarily a Brownian motion. We study these equations under the following relaxed Lipschitz condition on $b$. Assume that $b$ has a partial derivative
$$b_2 := \frac{\partial b}{\partial y} \tag{10.1.67}$$
with respect to the second argument, and consider the following Lipschitz-type assumptions: For all $x_1, x_2, y \in C_T$,
(L1) $|b_2(x_1, y) - b_2(x_2, y)| \le c\,|x_1 - x_2|$;
or, for all $x_1, x_2, y_1$, and $y_2$,
(L2) $|b_2(x_1, y_1) - b_2(x_2, y_2)| \le c\,[\,|x_1 - x_2| + |y_1 - y_2|\,]$.
(L2) allows a quadratic growth of $b$ with respect to the second component. To obtain contraction properties in this case, we have to switch to a suitable probability metric with regularity conditions of higher order. This makes necessary an essential change in the method of the proofs given so far.
Let $m \in M^1(C_T)$ be the distribution of a process $(\xi_s)$, and denote by $\mathring{m}$ the distribution of the normalized process $(\mathring{\xi}_s)$, assuming an absolute first moment of the marginal measure $m_s$. Define
$$N_s := \mathring{m}_s - \delta_0 = N_s^m, \tag{10.1.68}$$
and
$$F^{(-1)}_{N_s}(y) := \int_{-\infty}^y F_{N_s}(u)\,du. \tag{10.1.69}$$
Following the common derivative notation $f^{(s)}$, $s \ge 1$, for a function $f$, we denote the $s$-fold integrated function by $f^{(-s)}$, and thus $(f^{(-s)})^{(s)} = f$. Note that due to (10.1.65), we can replace the integration with respect to $\mathring{m}_s$ in (10.1.66) by integration with respect to $N_s$. Consider then the linear equation
$$X_t = B_t + \int_0^t \left(\int b(X_s, y_s)\,dN_s(y_s)\right) ds. \tag{10.1.70}$$
Integration by parts in (10.1.70) leads to the equivalent equation
$$X_t = B_t + \int_0^t \left(\int b_2(X_s, y)\,dF^{(-1)}_{N_s}(y)\right) ds. \tag{10.1.71}$$
Theorem 10.1.28 Suppose that $m \in M^1(C_T)$ has a finite first moment and $E\sup_{s\le T}|B_s| < \infty$. Furthermore, let (L1) be satisfied and suppose that
$$\int_0^T ds\int |b_2(0, y)|\,|F_{N_s}(y)|\,dy < \infty.$$
Then
$$\int_0^T ds\int |b_2(0, y)|\,|F_{N_s}(y)|\,dy = E\int_0^T\!\!\int_0^{\mathring{\xi}_s} |b_2(0, t)|\,dt\,ds < \infty, \tag{10.1.72}$$
(10.1.70) has a unique strong solution $X$, and moreover, $E\sup_{s\le T}|X_s| < \infty$.

Proof: Let
$$(SX)_t := B_t + \int_0^t ds\left(\int_{\mathbb{R}} b(X_s, y_s)\,dN_s(y_s)\right) = B_t + \int_0^t ds\left(\int_{\mathbb{R}} b_2(X_s, y_s)\,dF^{(-1)}_{N_s}(y_s)\right).$$
Then by the Lipschitz condition (L1),
$$\begin{aligned}
|(SX)_t - (SY)_t| &\le \int_0^t ds\,\Big|\int_{\mathbb{R}} (b_2(X_s, y_s) - b_2(Y_s, y_s))\,dF^{(-1)}_{N_s}(y_s)\Big| \\
&\le \int_0^t ds\,c\,|X_s - Y_s|\int_{\mathbb{R}} |F_{N_s}(y_s)|\,dy_s.
\end{aligned}$$
Observe that the total variation norm of the measure $F^{(-1)}_{N_s}(dy)$ is 1:
$$\operatorname{Var}\big(F^{(-1)}_{N_s}\big) = \int_{\mathbb{R}} |F_{N_s}(y)|\,dy = \int_{-\infty}^0 F_{\mathring{\xi}_s}(y)\,dy + \int_0^\infty \big(1 - F_{\mathring{\xi}_s}(y)\big)\,dy = E|\mathring{\xi}_s| = 1.$$
Therefore,
$$|(SX)_t - (SY)_t| \le c\int_0^t |X_s - Y_s|\,ds, \tag{10.1.73}$$
implying $L^*_{1,t}(SX, SY) \le c\int_0^t L^*_{1,s}(X, Y)\,ds$. Define inductively $X^0 = B$, $X^n = SX^{n-1}$. Then
$$L^*_{1,T}(X^n, X^{n-1}) \le c^n\,\frac{T^n}{n!}\,L^*_{1,T}(X^1, X^0). \tag{10.1.74}$$
Let us estimate the term on the right-hand side of (10.1.74):
$$\begin{aligned}
L^*_{1,T}(X^1, X^0) &= E\sup_{s\le T}\Big|\int_0^s ds\left(\int_{\mathbb{R}} b_2(B_s, y_s)\,dF^{(-1)}_{N_s}(y_s)\right)\Big| \\
&\le E\int_0^T ds\int_{\mathbb{R}} |b_2(B_s, y_s)|\,|F_{N_s}(y_s)|\,dy_s
\end{aligned} \tag{10.1.75}$$
$$\begin{aligned}
&\le E\int_0^T ds\int_{\mathbb{R}} (c\,|B_s| + |b_2(0, y_s)|)\,|F_{N_s}(y_s)|\,dy_s \\
&\le E\int_0^T ds\,c\,|B_s| + \int_0^T ds\int_{\mathbb{R}} |b_2(0, y_s)|\,|F_{N_s}(y_s)|\,dy_s < \infty.
\end{aligned}$$
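Now the equality in (10.1.72) results from the following integration by parts arguments:
$$\begin{aligned}
\int_{\mathbb{R}} |b_2(0, y)|\,|F_{N_s}(y)|\,dy &= \int_{\mathbb{R}} |b_2(0, y)|\,|F_{\mathring{\xi}_s}(y) - F_0(y)|\,dy \\
&= \int_{-\infty}^0 |b_2(0, y)|\,F_{\mathring{\xi}_s}(y)\,dy + \int_0^\infty |b_2(0, y)|\,\big(1 - F_{\mathring{\xi}_s}(y)\big)\,dy \\
&= \int_{-\infty}^{+\infty}\left(\int_0^y |b_2(0, t)|\,dt\right) dF_{\mathring{\xi}_s}(y) = E\int_0^{\mathring{\xi}_s} |b_2(0, t)|\,dt < \infty.
\end{aligned}$$
Consequently, $L^*_{1,T}(X^1, X^0) < \infty$. Combining (10.1.74), (10.1.75) implies the existence and the uniqueness of a strong solution $X$. Moreover,
$$L^*_{1,T}(X, B) \le \sum_{n\ge 1} L^*_{1,T}(X^n, X^{n-1}) \le e^{cT}\,L^*_{1,T}(X^1, B) < \infty;$$
that is, $E\sup_{s\le T}|B_s| < \infty$ provides that $E\sup_{s\le T}|X_s| < \infty$. □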
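The proof uses that $\int |F_{\mathring{\xi}_s}(y) - F_0(y)|\,dy = E|\mathring{\xi}_s| = 1$ for the normalized variable of (10.1.64). For an empirical distribution this identity can be verified exactly, since the distribution function is a step function. The sketch below does this for a small hypothetical sample; the function names and the sample values are illustrative assumptions, and expectations are replaced by sample averages.

```python
import math

def normalize(xs):
    """Empirical analogue of (10.1.64): center, then divide by the mean absolute deviation."""
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(x - mean) for x in xs) / n
    return [(x - mean) / mad for x in xs]

def integral_abs_cdf_diff(xs):
    """Integral of |F_emp(y) - F_0(y)| dy, where F_0 is the cdf of the point mass at 0."""
    n = len(xs)
    pts = sorted(set(xs) | {0.0})  # breakpoints of the step function, including 0
    total = 0.0
    for a, b in zip(pts, pts[1:]):
        F = sum(1 for x in xs if x <= a) / n  # value of F_emp on [a, b)
        # F_0 is 0 to the left of 0 and 1 to the right of 0
        total += (b - a) * (F if b <= 0.0 else 1.0 - F)
    return total

sample = normalize([1.0, 2.0, 3.0, 6.0])
print(integral_abs_cdf_diff(sample), sum(abs(x) for x in sample) / len(sample))
```

Both printed quantities agree, which is the discrete version of the total variation computation in the proof.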
We next extend the result of Theorem 10.1.28 to the case where pth moments exist, p ≥ 1. Define X∗T,p = (E supt≤T |X(t)|p )1/p , 1 ≤ p < ∞, and X∗T,∞ = E ess sup0
$$\int_0^T ds\left(\int_{\mathbb{R}} |b_2(0, y_s)\,F_{N_s}(y_s)|^p\,dy_s\right)^{1/p} < \infty, \tag{10.1.76}$$
and (ii) if $p = \infty$, then
$$\int_0^T ds\,\Big(\operatorname*{ess\,sup}_{y_s} |b_2(0, y_s)|\,|F_{N_s}(y_s)|\Big) < \infty \qquad (p = \infty). \tag{10.1.77}$$
Under these assumptions, the SDE (10.1.70) has a unique solution $X$, and furthermore, $\|X\|^*_{T,p} < \infty$. In particular, if $\Phi(m)$ is the distribution of the solution of (10.1.70), then $\Phi$ maps $M_p(C_T, \delta_0)$ into $M_p(C_T, \delta_0)$.

Proof: As in Theorem 10.1.28, we have $|(SX)_t - (SY)_t| \le c\int_0^t |X_s - Y_s|\,ds$. Thus for any $1 < p \le \infty$, $L^*_{p,t}(SX, SY) \le c\int_0^t L^*_{p,s}(X, Y)\,ds$.
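Further, for $1 \le p < \infty$ (the case $p = \infty$ is similar),
$$\begin{aligned}
L^*_{p,T}(X, B) &= \left(E\sup_{s\le T}\Big|\int_0^s ds\left(\int_{\mathbb{R}} b_2(B_s, y_s)\,dF^{(-1)}_{N_s}(y_s)\right)\Big|^p\right)^{1/p} \\
&\le \left(E\left(\int_0^T ds\int_{\mathbb{R}} |b_2(B_s, y_s)|\,|F_{N_s}(y_s)|\,dy_s\right)^p\right)^{1/p} \\
&\le \int_0^T ds\left[E\left(\int_{\mathbb{R}} |b_2(B_s, y_s)|\,|F_{N_s}(y_s)|\,dy_s\right)^p\right]^{1/p} \\
&\le \int_0^T ds\left[E\left(\int_{\mathbb{R}} (c\,|B_s| + |b_2(0, y_s)|)\,|F_{N_s}(y_s)|\,dy_s\right)^p\right]^{1/p} \\
&\le c\int_0^T ds\,(E|B_s|^p)^{1/p} + \int_0^T ds\left(\int_{\mathbb{R}} |b_2(0, y_s)\,F_{N_s}(y_s)|^p\,dy_s\right)^{1/p} < \infty.
\end{aligned}$$
Now we continue as in Theorem 10.1.28 to complete the proof. □

Denote by $M_2^*(C_T, \delta_0)$ the space of all $m \in M_2(C_T, \delta_0)$ such that
$$\inf_{0<s\le T} E|\xi_s - E\xi_s| =: A_T^* > 0, \qquad \xi \overset{d}{=} m. \tag{10.1.78}$$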
Condition (10.1.78) postulates that the $L_1$-variation does not converge to 0 for $0 < s < T$. In the case of $B$ being a Brownian motion, this means that we do not start (at time $s = 0$) deterministically at a fixed point. Let $\Phi(m)$ be the solution of (10.1.70),
$$X_t = B_t + \int_0^t ds\left(\int_{\mathbb{R}} b(X_s, y_s)\,d\mathring{m}_s(y_s)\right)$$
under the assumptions of Theorem 10.1.29 with $p = 2$. Then by Theorem 10.1.29, $\Phi$ maps $M_2(C_T, \delta_0)$ into $M_2(C_T, \delta_0)$.

Theorem 10.1.30 (Contraction of $\Phi$) Suppose that the Lipschitz condition (L2) holds, and $m_1, m_2 \in M_2^*(C_T, \delta_0)$. Then the following contraction inequality for $\Phi$ in terms of $\ell^*_{2,t}$ is valid:
$$\ell^*_{2,t}(\Phi(m_1), \Phi(m_2)) \le c_t\int_0^t \ell^*_{2,u}(m_1, m_2)\,du. \tag{10.1.79}$$
Proof: For $m_1, m_2 \in M_2^*(C_T,\delta_0)$ let
\[
X_t^{(i)} = B_t + \int_0^t\left(\int_{\mathbb{R}} b\big(X_s^{(i)},y_s^{(i)}\big)\,dF_{N_s^{(i)}}\big(y_s^{(i)}\big)\right)ds
= B_t + \int_0^t\left(\int_{\mathbb{R}} b_2\big(X_s^{(i)},y_s^{(i)}\big)\,dF^{(-1)}_{N_s^{(i)}}\big(y_s^{(i)}\big)\right)ds.
\]
Then
\[
X_t^{(1)} - X_t^{(2)} = \int_0^t\left[\int_{\mathbb{R}} b_2\big(X_s^{(1)},y_s^{(1)}\big)\,dF^{(-1)}_{N_s^{(1)}}\big(y_s^{(1)}\big) - \int_{\mathbb{R}} b_2\big(X_s^{(2)},y_s^{(2)}\big)\,dF^{(-1)}_{N_s^{(2)}}\big(y_s^{(2)}\big)\right]ds.
\]
The total variation norm of $F^{(-1)}_{N_s}$ is 1 and the total mass is 0. Then the Jordan decomposition has the form $F^{(-1)}_{N_s}(dx) = \mu_s^+(dx) - \mu_s^-(dx)$, where $\mu_s^+(\mathbb{R}) + \mu_s^-(\mathbb{R}) = 1$ and $\mu_s^+(\mathbb{R}) - \mu_s^-(\mathbb{R}) = 0$. In other words, $\mu_s^+(\mathbb{R}) = \mu_s^-(\mathbb{R}) = \tfrac12$.
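We write $F^{(-1)}_{N_s^{(i)}}(dx) = \mu_s^{(i)+}(dx) - \mu_s^{(i)-}(dx)$, and consequently,
\[
X_t^{(1)} - X_t^{(2)} = \int_0^t ds\left[\int_{\mathbb{R}} b_2\big(X_s^{(1)},y_s^{(1)}\big)\big(\mu_s^{(1)+} - \mu_s^{(1)-}\big)\big(dy_s^{(1)}\big) - \int_{\mathbb{R}} b_2\big(X_s^{(2)},y_s^{(2)}\big)\big(\mu_s^{(2)+} - \mu_s^{(2)-}\big)\big(dy_s^{(2)}\big)\right].
\]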
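Let $dm_s^+\big(y_s^{(1)},y_s^{(2)}\big)$ be a coupling for $\mu_s^{(1)+}$ and $\mu_s^{(2)+}$; that is, $m_s^+$ is a positive measure with total mass $\tfrac12$ and such that $\pi_i m_s^+ = \mu_s^{(i)+}$, i = 1, 2, $\pi_i$ the ith component. Similarly, let $dm_s^-\big(y_s^{(1)},y_s^{(2)}\big)$ be a coupling for $\mu_s^{(1)-}$ and $\mu_s^{(2)-}$. Then
\[
X_t^{(1)} - X_t^{(2)} = \int_0^t ds\left[\int_{\mathbb{R}^2}\Big(b_2\big(X_s^{(1)},y_s^{(1)}\big) - b_2\big(X_s^{(2)},y_s^{(2)}\big)\Big)\,dm_s^+\big(y_s^{(1)},y_s^{(2)}\big) - \int_{\mathbb{R}^2}\Big(b_2\big(X_s^{(1)},y_s^{(1)}\big) - b_2\big(X_s^{(2)},y_s^{(2)}\big)\Big)\,dm_s^-\big(y_s^{(1)},y_s^{(2)}\big)\right].
\]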
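Consequently, by the Lipschitz condition,
\[
\begin{aligned}
\big|X_t^{(1)} - X_t^{(2)}\big|
&\le \int_0^t ds\left(\int_{\mathbb{R}^2}\Big|b_2\big(X_s^{(1)},y_s^{(1)}\big) - b_2\big(X_s^{(2)},y_s^{(2)}\big)\Big|\,d\big(m_s^+ + m_s^-\big)\big(y_s^{(1)},y_s^{(2)}\big)\right)\\
&\le \int_0^t ds\left(\int_{\mathbb{R}^2}\Big(c\big|X_s^{(1)} - X_s^{(2)}\big| + c\big|y_s^{(1)} - y_s^{(2)}\big|\Big)\,d\big(m_s^+ + m_s^-\big)\big(y_s^{(1)},y_s^{(2)}\big)\right).
\end{aligned} \tag{10.1.80}
\]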
Observe that the total mass of $m_s^+ + m_s^-$ is 1, and for i = 1, 2, $\pi_i m_s^+ + \pi_i m_s^- = \mu_s^{(i)+} + \mu_s^{(i)-}$ is the variation of $F^{(-1)}_{N_s^{(i)}}$. Minimizing with respect to all couplings $m_s^+ + m_s^-$ with marginals $\mu_s^{(i)+} + \mu_s^{(i)-}$, i = 1, 2, we obtain
\[
\big|X_t^{(1)} - X_t^{(2)}\big| \le c\int_0^t ds\left(\big|X_s^{(1)} - X_s^{(2)}\big| + \int_{\mathbb{R}}\Big|F_{\mu_s^{(1)+}+\mu_s^{(1)-}}(x) - F_{\mu_s^{(2)+}+\mu_s^{(2)-}}(x)\Big|\,dx\right).
\]
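As $F_{\mu_s^{(1)+}+\mu_s^{(1)-}}(x) = F_{\operatorname{Var}(F^{(-1)}_{N_s^{(1)}})}(x)$, we have that the integral on the right-hand side can be bounded from above by $\kappa_2$, using $\int|F_{\mu_1}(x) - F_{\mu_2}(x)|\,dx \le \int|x|\operatorname{Var}(\mu_1-\mu_2)(dx)$:
\[
\begin{aligned}
\int_{\mathbb{R}}\Big|F_{\operatorname{Var}F^{(-1)}_{N_s^{(1)}}}(x) - F_{\operatorname{Var}F^{(-1)}_{N_s^{(2)}}}(x)\Big|\,dx
&\le \int_{\mathbb{R}}|x|\operatorname{Var}\Big(\operatorname{Var}F^{(-1)}_{N_s^{(1)}} - \operatorname{Var}F^{(-1)}_{N_s^{(2)}}\Big)(dx)\\
&\le \int_{\mathbb{R}}|x|\operatorname{Var}\Big(F^{(-1)}_{N_s^{(1)}} - F^{(-1)}_{N_s^{(2)}}\Big)(dx)
= \int_{\mathbb{R}}|x|\,\Big|\frac{d}{dx}\Big(F^{(-1)}_{N_s^{(1)}}(x) - F^{(-1)}_{N_s^{(2)}}(x)\Big)\Big|\,dx\\
&= \int_{\mathbb{R}}|x|\,\big|F_{N_s^{(1)}}(x) - F_{N_s^{(2)}}(x)\big|\,dx \qquad (\text{as } N_s := P^{\mathring{\xi}_s} - \delta_0)\\
&= \int_{\mathbb{R}}|x|\,\big|F_{\mathring{\xi}_s^{(1)}}(x) - F_{\mathring{\xi}_s^{(2)}}(x)\big|\,dx =: \kappa_2\big(\mathring{\xi}_s^{(1)},\mathring{\xi}_s^{(2)}\big),
\end{aligned} \tag{10.1.81}
\]
where $\mathring{\xi}_s^{(i)}$ are r.v.s with laws $P^{\mathring{\xi}_s^{(i)}} = P^{(\xi_s^{(i)} - E\xi_s^{(i)})/E|\xi_s^{(i)} - E\xi_s^{(i)}|}$, and $P^{\xi_s^{(i)}} = m_s^{(i)}$.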
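The distance $\kappa_2$ has the following representation as a minimal metric:
\[
\kappa_2\big(\xi_s^{(1)},\xi_s^{(2)}\big) = \inf\Big\{E\Big|\,|\eta_s^{(1)}|\eta_s^{(1)} - |\eta_s^{(2)}|\eta_s^{(2)}\Big|;\ \eta_s^{(i)} \stackrel{d}{=} \xi_s^{(i)}\Big\}. \tag{10.1.82}
\]
This representation allows us to estimate $\kappa_2\big(\mathring{\xi}_s^{(1)},\mathring{\xi}_s^{(2)}\big)$ by $\kappa_2\big(\xi_s^{(1)},\xi_s^{(2)}\big)$, making use of the assumption that
\[
\sup_{s\le T} E\big(\xi_s^{(i)}\big)^2 \le A_T \qquad\text{and}\qquad \inf_{s\le T} E\big|\xi_s^{(i)} - E\xi_s^{(i)}\big| =: A^*_T > 0.
\]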
Then, writing $\zeta^{(i)} := \xi_s^{(i)} - E\xi_s^{(i)}$ for brevity,
\[
\begin{aligned}
\kappa_2\big(\mathring{\xi}_s^{(1)},\mathring{\xi}_s^{(2)}\big)
&\le \ell_2\big(\mathring{\xi}_s^{(1)},\mathring{\xi}_s^{(2)}\big)\Big(E\big|\mathring{\xi}_s^{(1)}\big|^2 + E\big|\mathring{\xi}_s^{(2)}\big|^2\Big)\\
&\le 2A_T\,\ell_2\!\left(\frac{\zeta^{(1)}}{E|\zeta^{(1)}|},\frac{\zeta^{(2)}}{E|\zeta^{(2)}|}\right)\\
&\le 2A_T\,\ell_2\!\left(\frac{\zeta^{(1)}}{E|\zeta^{(1)}|},\frac{\zeta^{(2)}}{E|\zeta^{(1)}|}\right) + 2A_T\,\ell_2\!\left(\frac{\zeta^{(2)}}{E|\zeta^{(1)}|},\frac{\zeta^{(2)}}{E|\zeta^{(2)}|}\right)\\
&\le \frac{2A_T}{E|\zeta^{(1)}|}\,\ell_2\big(\zeta^{(1)},\zeta^{(2)}\big) + 2A_T\big(E|\zeta^{(2)}|^2\big)^{1/2}\frac{\big|E|\zeta^{(1)}| - E|\zeta^{(2)}|\big|}{(E|\zeta^{(1)}|)(E|\zeta^{(2)}|)}\\
&\le c_T\,\ell_2\big(\xi_s^{(1)},\xi_s^{(2)}\big).
\end{aligned} \tag{10.1.83}
\]
In the above derivation we used the fact that $\big|E\xi_s^{(1)} - E\xi_s^{(2)}\big| \le \ell_1\big(\xi_s^{(1)},\xi_s^{(2)}\big) \le \ell_2\big(\xi_s^{(1)},\xi_s^{(2)}\big)$, and $1/E\big|\xi_s^{(i)} - E\xi_s^{(i)}\big| \le 1/A^*_T$. Combining these estimates, we write
\[
\big|X_t^{(1)} - X_t^{(2)}\big| \le \int_0^t ds\,\Big(c\big|X_s^{(1)} - X_s^{(2)}\big| + c_T\,\ell_2\big(m_s^{(1)},m_s^{(2)}\big)\Big), \tag{10.1.84}
\]
invoking the assumptions $E\big(\xi_s^{(i)}\big)^2 < \infty$, i = 1, 2, and noticing that $E\big|\xi_s^{(i)} - E\xi_s^{(i)}\big| \ge A^*_T > 0$ uniformly on s ∈ (0, T]. Then, by the Gronwall inequality, with $c^*_T = c \vee c_T$, we have the uniform bound
\[
\sup_{s\le t}\big|X_s^{(1)} - X_s^{(2)}\big| \le c^*_T e^{c^*_T T}\int_0^t \ell^*_{2,s}\big(m^{(1)},m^{(2)}\big)\,ds.
\]
By passing to minimal metrics, the above inequality implies that
\[
\ell^*_{2,t}(\Phi(m_1),\Phi(m_2)) \le c^*_T e^{c^*_T T}\int_0^t \ell^*_{2,s}(m_1,m_2)\,ds.
\]
□
Theorem 10.1.31 Suppose that $\|B\|^*_{T,2} < \infty$. Suppose also that the assumption (L2) holds. Finally, assume that for some $m_0 \in M_2(C_T,\delta_0)$, the following boundedness assumptions on the interaction term b hold:
\[
\int_0^T ds\left(\int |b_2(0,y)F_{N_s}(y)|^2\,dy\right)^{1/2} < \infty, \qquad\text{with } N_s = N_s^{m_0}, \tag{10.1.85}
\]
and
\[
\Phi^n(m_0) \in M_2^*(C_T,\delta_0), \qquad \forall\, n \in \mathbb{N}. \tag{10.1.86}
\]
Then the Boltzmann-type equation (10.1.66) has a unique weak and strong solution in $M_2(C_T,\delta_0)$.

Proof: From Theorem 10.1.30,
\[
\ell^*_{2,T}\big(\Phi^{(k+1)}(m),\Phi^{(k)}(m)\big) \le C_T^k\,\frac{T^k}{k!}\,\big(\ell^*_{2,T}(\Phi(m),\delta_0) + \ell^*_{2,T}(m,\delta_0)\big) < \infty,
\]
for $m \in M_2(C_T,\delta_0)$. Therefore, $(\Phi^k(m))$ is a Cauchy sequence in $(C_T,\ell^*_{2,T})$ and converges to a fixed point. If $X^{(k+1)}, X^{(k)}$ are the optimal couplings of $\Phi^{(k+1)}(m)$ and $\Phi^{(k)}(m)$ respectively, we obtain that $(X^{(k)})$ is an $L^*_{2,T}$ Cauchy sequence, leading to a (unique) $L^*_{2,T}$ fixed point X. □
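The factor $C_T^k T^k/k!$ in the proof above comes from applying an integral inequality of the type (10.1.79) k times. The following Python sketch illustrates this numerically under simplifying assumptions: it iterates the map $a \mapsto c\int_0^t a(u)\,du$ on a constant starting function and compares the result with $a_0 (cT)^k/k!$. The function names and the trapezoidal discretization are choices made for this illustration only and do not appear in the text.

```python
import math


def iterate_integral_bound(a0, c, T, k, steps=2000):
    """Apply a -> c * int_0^t a(u) du  k times to the constant function a0.

    Returns the value of the k-th iterate at t = T, computed with the
    trapezoidal rule on a uniform grid (an assumption of this sketch).
    """
    h = T / steps
    a = [a0] * (steps + 1)  # a_0(t) = a0 at every grid point
    for _ in range(k):
        cumulative = [0.0]
        for i in range(1, steps + 1):
            cumulative.append(cumulative[-1] + 0.5 * h * (a[i - 1] + a[i]))
        a = [c * v for v in cumulative]
    return a[-1]


def factorial_bound(a0, c, T, k):
    """Closed form a0 * (c*T)^k / k! of the k-fold iterated integral."""
    return a0 * (c * T) ** k / math.factorial(k)


if __name__ == "__main__":
    for k in range(8):
        print(k, iterate_integral_bound(1.0, 2.0, 1.5, k), factorial_bound(1.0, 2.0, 1.5, k))
```

Since the closed-form bounds are summable in k (their sum is $a_0 e^{cT}$), successive iterates form a Cauchy sequence, which is the mechanism used in the proof.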
Remark 10.1.32 Condition (10.1.86) postulates that the solutions of the linear equations corresponding to $\Phi^n(m_0)$ have strictly positive variation. A simple, sufficient condition for that to hold is $\inf_{s\le T}|B_s - EB_s| \ge TM + \varepsilon$, provided that b is bounded and |b| ≤ M. This condition is useful only for fixed T but not for T → ∞. However, it might be possible (at least in some examples, as in the construction of solutions of special SDEs) to construct a solution piecewise on small time intervals and to join the pieces to a solution on the whole real line. For special choices of b it is possible to obtain weaker sufficient conditions for (10.1.86). Condition (10.1.78) is needed in order to reconstruct the process. Without this condition we can only reconstruct the normalized process (cf. (10.1.83)).

We now turn our attention to equation (10.1.63). The next theorem asserts that as N → ∞ each $X^{i,N}$ has a limit $\bar{X}^i$. The $(\bar{X}^i)$ are independent copies of the solution of (10.1.67) considered in Theorem 10.1.31.

Theorem 10.1.33 Suppose that (L2) holds, and moreover, $\|b\|_\infty = \sup_{x,y}|b(x,y)| < \infty$. Suppose also that uniformly on i,
\[
|W^i|_{T,\infty} := \operatorname{ess\,sup}\sup_{0<s\le T}|W_s^i| \le K < \infty.
\]
Then for any i ≥ 1, T > 0,
\[
\sup_N \sqrt{N}\,E\sup_{0\le t\le T}\big|X_t^{i,N} - \bar{X}_t^i\big| < \infty.
\]
Corollary 10.1.34 (Propagation of Chaos) Let m denote the law of $\bar{X}^i$ satisfying (10.1.25) and let $W_N$ denote the law of $(X^{1,N},\dots,X^{N,N})$. Then, under the assumptions of Theorems 10.1.31 and 10.1.33, $W_N$ is m-chaotic.

Proof of Theorem 10.1.33: Omitting the index N, we get
\[
X_t^i - \bar{X}_t^i = \int_0^t ds\,\frac1N\sum_{j=1}^N b\big(X_s^i,\mathring{X}_s^j\big) - \int_0^t ds\int_{C_T} b\big(\bar{X}_s^i,y_s\big)\,P^{\mathring{\bar{X}}_s}(dy) =: I_1(t) + I_2(t) + I_3(t),
\]
where
\[
\begin{aligned}
I_1(t) &:= \int_0^t ds\,\frac1N\sum_{j=1}^N b\big(X_s^i,\mathring{X}_s^j\big) - \int_0^t ds\,\frac1N\sum_{j=1}^N b\big(\bar{X}_s^i,\mathring{X}_s^j\big),\\
I_2(t) &:= \int_0^t ds\,\frac1N\sum_{j=1}^N b\big(\bar{X}_s^i,\mathring{X}_s^j\big) - \int_0^t ds\,\frac1N\sum_{j=1}^N b\big(\bar{X}_s^i,\mathring{\bar{X}}_s^j\big),\\
I_3(t) &:= \int_0^t ds\,\frac1N\sum_{j=1}^N b\big(\bar{X}_s^i,\mathring{\bar{X}}_s^j\big) - \int_0^t ds\int_{C_T} b\big(\bar{X}_s^i,y_s\big)\,P^{\mathring{\bar{X}}_s}(dy),
\end{aligned}
\]
and
\[
E|I_1|_T := E\sup_{0\le t\le T}|I_1(t)| \le E\int_0^T ds\,\Big|\frac1N\sum_{j=1}^N\big[b\big(X_s^i,\mathring{X}_s^j\big) - b\big(\bar{X}_s^i,\mathring{X}_s^j\big)\big]\Big|.
\]
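From (L2),
\[
\begin{aligned}
|b(x,y) - b(\bar{x},y)| &= |b(x,y) - b(x,0) - (b(\bar{x},y) - b(\bar{x},0))|\\
&= \Big|\int_0^y b_2(x,t)\,dt - \int_0^y b_2(\bar{x},t)\,dt\Big|\\
&\le \int_0^{|y|}|b_2(x,t) - b_2(\bar{x},t)|\,dt\\
&\le c\,|x - \bar{x}|\,|y|.
\end{aligned}
\]
Therefore,
\[
E|I_1|_T \le c\,E\int_0^T \frac1N\sum_{j=1}^N \big|X_s^i - \bar{X}_s^i\big|\,\big|\mathring{X}_s^j\big|\,ds.
\]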
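Assuming that $\|b\|_\infty = \sup_{x,y}|b(x,y)| < \infty$ and
\[
|W^{i,N}|_{T,\infty} := \operatorname{ess\,sup}\sup_{0<s\le T}|W_s^{i,N}| \le K,
\]
then $\sup_{i,N}|X_t^{i,N}| \le K + T\cdot\|b\|_\infty$. Therefore,
\[
\begin{aligned}
E|I_1|_{T,1} &\le C\int_0^T ds\,E\big|X_s^i - \bar{X}_s^i\big|,\\
E|I_2|_{T,1} &\le \int_0^T ds\,E\Big|\frac1N\sum_{j=1}^N\big[b\big(\bar{X}_s^i,\mathring{X}_s^j\big) - b\big(\bar{X}_s^i,\mathring{\bar{X}}_s^j\big)\big]\Big|.
\end{aligned}
\]
For $0 < \bar{y} < y$,
\[
\begin{aligned}
|b(x,y) - b(x,\bar{y})| &= \Big|\int_{\bar{y}}^{y} b_2(x,t)\,dt\Big|\\
&\le c\int_{\bar{y}}^{y}|b_2(x,t) - b_2(0,t)|\,dt + c\int_{\bar{y}}^{y}|b_2(0,t) - b_2(0,0)|\,dt\\
&\le c\,|x|\,|y - \bar{y}| + \tfrac12\,|y^2 - \bar{y}^2| \le c\,|y - \bar{y}|\Big(|x| + \frac{y + \bar{y}}{2}\Big).
\end{aligned}
\]
In general,
\[
|b(x,y) - b(x,\bar{y})| \le c\,|y - \bar{y}|\Big(|x| + \frac{|y| + |\bar{y}|}{2}\Big).
\]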
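Assuming that $X_s^{j,N}$ are bounded a.s., $|X^{j,N}|_{T,\infty} := \operatorname{ess\,sup}\sup_{0<s\le T}|X_s^{j,N}| < \infty$, we obtain
\[
\begin{aligned}
\|I_2\|_{T,1} := E|I_2|_T
&\le \int_0^T ds\,E\Big|\frac1N\sum_{j=1}^N\big[b\big(\bar{X}_s^i,\mathring{X}_s^j\big) - b\big(\bar{X}_s^i,\mathring{\bar{X}}_s^j\big)\big]\Big|\\
&\le c\int_0^T ds\,\frac1N\sum_{j=1}^N E\Big[\big|\mathring{X}_s^j - \mathring{\bar{X}}_s^j\big| \times \Big(\big|\bar{X}_s^i\big| + \tfrac12\big(\big|\mathring{X}_s^j\big| + \big|\mathring{\bar{X}}_s^j\big|\big)\Big)\Big]\\
&\le c\int_0^T ds\,\frac1N\sum_{j=1}^N E\big|\mathring{X}_s^j - \mathring{\bar{X}}_s^j\big|,
\end{aligned}
\]
changing the values of the absolute constants c wherever it is necessary. Using the estimates for $|I_1|_T$ and $|I_2|_T$, we have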
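\[
\begin{aligned}
N\,\|X^i - \bar{X}^i\|_{T,1} = \sum_{i=1}^N \|X^i - \bar{X}^i\|_{T,1}
&\le c\int_0^T ds\,\Big\{\sum_{i=1}^N E|X^i - \bar{X}^i|_{s,1} + \sum_{j=1}^N E|X^j - \bar{X}^j|_{s,1}\Big\}\\
&\quad + \sum_{i=1}^N\int_0^T ds\,E\Big|\frac1N\sum_{j=1}^N b\big(\bar{X}_s^i,\mathring{\bar{X}}_s^j\big) - \int_{C_T} b\big(\bar{X}_s^i,y_s\big)\,P^{\mathring{\bar{X}}_s}(dy)\Big|.
\end{aligned}
\]
By the Gronwall lemma and the Pyke and Root (1968) inequality,
\[
\|X^i - \bar{X}^i\|_{T,1} \le c\,\frac1N\sum_{i=1}^N\int_0^T ds\,E\Big|\frac1N\sum_{j=1}^N b\big(\bar{X}_s^i,\mathring{\bar{X}}_s^j\big) - \int_{C_T} b\big(\bar{X}_s^i,y_s\big)\,P^{\mathring{\bar{X}}_s}(dy_s)\Big| \le O\Big(\frac{1}{\sqrt{N}}\Big).
\]
□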
10.2 Rates of Convergence of Empirical Measures in the Kantorovich Metric

Let μ be a probability measure on $\mathbb{R}^d$ (typically unknown) and let $X_1, X_2, \dots, X_n$ be i.i.d. r.v.s with common probability law μ. Let
\[
\mu_n = \frac1n\sum_{i=1}^n \delta_{X_i}, \qquad\text{with } \delta_x(A) := \begin{cases} 1 & \text{if } x \in A,\\ 0 & \text{if } x \notin A,\end{cases}
\]
be the empirical measure of $X_1, X_2, \dots, X_n$. Then it is well known that
\[
\mu_n \to \mu \quad\text{a.s.} \tag{10.2.1}
\]
in the topology of weak convergence.(1) If
\[
\sigma^2 = \int |x|^2\,\mu(dx) < \infty, \tag{10.2.2}
\]
then by the SLLN,
\[
\frac1n\sum_{i=1}^n |X_i|^2 = \int |x|^2\,\mu_n(dx) \to \sigma^2 \tag{10.2.3}
\]
a.s. and in $L^1(P)$. We denote by $P_2 = P_2(\mathbb{R}^d)$ the space of probability measures on (the Borel sets of) $\mathbb{R}^d$ having finite second moments, i.e., such that $\int |u|^2\,\mu(du) < \infty$. Recall that the $L_2$-Kantorovich metric (the Wasserstein metric of order 2) on $P_2$ is
\[
\ell_2^2(\mu,\nu) = \inf\Big\{\int |u - v|^2\,P(du,dv);\ P \in M(\mu,\nu)\Big\},
\]

(1) See Dudley (1989) and Rachev (1991) and the references therein.

where M(μ, ν) denotes the set of probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals μ and ν. (Here and below | · | denotes the usual Euclidean norm on the appropriate space.) Equivalently, $\ell_2^2(\mu,\nu) = \inf E|X - Y|^2$, where the "inf" is taken over all pairs of r.v.s X, Y having laws μ, ν, respectively, in other words, over all couplings of μ and ν. From (10.2.1)–(10.2.3) it follows that $\ell_2(\mu_n,\mu) \to 0$ a.s. In this section we investigate the rate of convergence to zero of $E\ell_2^2(\mu_n,\mu)$.(2) A similar result is obtained for infinite exchangeable sequences except that the common probability law must be replaced by the directing measure. Finally a mean square uniform rate of convergence is obtained for an i.i.d. sequence of stochastic processes on a finite time interval.

(2) The results of this section are due to Horowitz and Karandikar (1994); see also Horowitz and Karandikar (1990). Their study presented in this section was motivated by the observation (see also Tanaka (1978)) that the Wasserstein metric is convenient for formulating weak convergence results for the empirical measures of finite interacting particle systems related to the Boltzmann equation.

Theorem 10.2.1 Suppose that the unknown μ has high enough finite absolute moments, $c := \int |u|^{d+5}\,\mu(du) < \infty$. Then there is a constant C, depending only on c and the dimension d, such that
\[
E\ell_2^2(\mu_n,\mu) \le C n^{-2/(d+4)}.
\]
The proof is built up on lemmas that are of some independent interest.

Lemma 10.2.2 (Carlson's lemma) Let g be a nonnegative, measurable function on $\mathbb{R}^d$. Then for p > d,
\[
\int g(x)\,dx \le C_{p,d}\sqrt{\Big(\int g^2(x)\,dx\Big)^{1-d/p}\Big(\int |x|^p g^2(x)\,dx\Big)^{d/p}}, \tag{10.2.4}
\]
where
\[
C_{p,d} = \sqrt{\frac{\omega_d\,\pi}{\sin(\pi d/p)\,d^{d/p}(p-d)^{1-d/p}}}
\]
and $\omega_d$ is the surface area of the unit sphere in $\mathbb{R}^d$. In particular, for p = d + 1 we have
\[
\int g(x)\,dx \le C_d\sqrt{\Big(\int g^2(x)\,dx\Big)^{1/(d+1)}\Big(\int |x|^{d+1} g^2(x)\,dx\Big)^{d/(d+1)}},
\]
from which follows
\[
\int g(x)\,dx \le C_d\sqrt{\int (|x|^{d+1} + 1)\,g^2(x)\,dx}, \tag{10.2.5}
\]
where $C_d$ is a constant depending only on d.

Lemma 10.2.3 (Density coupling lemma) Let f, g be probability densities on $\mathbb{R}^d$ such that
\[
\int |x|^2 (f(x) + g(x))\,dx < \infty,
\]
and define μ(dx) = f(x) dx, ν(dx) = g(x) dx. Then(3)
\[
\ell_2^2(\mu,\nu) \le 3\int |x|^2\,|f(x) - g(x)|\,dx.
\]
Proof: Let M be a coupling of μ and ν defined by
\[
\int \varphi(x,y)\,M(dx,dy) = \frac{1}{1-A}\iint \varphi(x,y)\,(f(x) - f\wedge g(x))(g(y) - f\wedge g(y))\,dx\,dy + \int \varphi(x,x)\,f\wedge g(x)\,dx,
\]
where $A = \int f\wedge g(x)\,dx$ and φ(x, y) is any nonnegative Borel function. Then
\[
\begin{aligned}
\int |x - y|^2\,M(dx\,dy)
&= \int |x|^2 (f(x) - f\wedge g(x))\,dx + \int |y|^2 (g(y) - f\wedge g(y))\,dy\\
&\quad - \frac{2}{1-A}\int x\,(f(x) - f\wedge g(x))\,dx \cdot \int y\,(g(y) - f\wedge g(y))\,dy\\
&= \int |x|^2\,|f(x) - g(x)|\,dx - \frac{2}{1-A}\int x\,(f(x) - f\wedge g(x))\,dx \cdot \int y\,(g(y) - f\wedge g(y))\,dy
\end{aligned}
\]
(the dot · indicates the usual inner product in $\mathbb{R}^d$). Furthermore,
\[
\Big|\int x\,(f(x) - f\wedge g(x))\,dx\Big| \le \Big(\int |x|^2\,|f - f\wedge g|\,dx\Big)^{1/2}\Big(\int |f - f\wedge g|\,dx\Big)^{1/2} \le \Big(\int |x|^2\,|f - g|\,dx\Big)^{1/2}(1-A)^{1/2},
\]
and the same bound holds for the integral involving g. Thus $\int |x - y|^2\,dM \le 3\int |x|^2\,|f - g|\,dx$, and the result follows. □

(3) Zolotarev (1978) proves this lemma with the constant 4 instead of the constant 3 here.
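The coupling used in the proof can be checked on a small discrete example. The sketch below is only an illustration: it replaces the densities f, g by probability mass functions on a finite set of points of the real line (an assumption made here so that everything is computable exactly up to rounding), builds the analogue of M, and compares the transport cost with the bound of Lemma 10.2.3. All function names are hypothetical.

```python
def lemma_coupling(f, g):
    """Discrete analogue of the coupling M from the proof of Lemma 10.2.3.

    f, g: lists of probabilities on the same support points.
    Returns M as a matrix with M[i][j] the mass sent from point i to point j.
    """
    m = [min(a, b) for a, b in zip(f, g)]  # common part f ^ g
    A = sum(m)
    n = len(f)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] += m[i]  # the common part stays in place
        if A < 1.0:
            for j in range(n):
                # product coupling of the two normalized excess parts
                M[i][j] += (f[i] - m[i]) * (g[j] - m[j]) / (1.0 - A)
    return M


def transport_cost(xs, M):
    """Sum of |x_i - x_j|^2 M[i][j]."""
    n = len(xs)
    return sum((xs[i] - xs[j]) ** 2 * M[i][j] for i in range(n) for j in range(n))


def moment_bound(xs, f, g):
    """Right-hand side 3 * sum |x|^2 |f - g| of the lemma."""
    return 3.0 * sum(x * x * abs(a - b) for x, a, b in zip(xs, f, g))


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    f = [0.1, 0.2, 0.4, 0.2, 0.1]
    g = [0.3, 0.1, 0.2, 0.1, 0.3]
    M = lemma_coupling(f, g)
    print(transport_cost(xs, M), "<=", moment_bound(xs, f, g))
```

Since M is only one admissible coupling, its cost is an upper bound for $\ell_2^2(\mu,\nu)$; the lemma does not claim that M is optimal.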
Lemma 10.2.4 (Pollard (1986)) For any r.v.s $Z_1, \dots, Z_N$,
\[
E\max_{1\le k\le N}|Z_k| \le \sqrt{N}\,\sqrt{\max_k EZ_k^2}.
\]
The proof is obvious:
\[
E\max_k|Z_k| \le \sqrt{E\max_k Z_k^2} \le \sqrt{\sum_k EZ_k^2} \le \sqrt{N\max_k EZ_k^2}.
\]
We next write $\Phi_\sigma \sim N(0,\sigma^2 I)$ to indicate that $\Phi_\sigma$ is the multivariate normal distribution on $\mathbb{R}^d$ with mean vector 0 and dispersion matrix $\sigma^2 I$; here $\sigma^2 > 0$, and I is the d × d identity matrix. For any probability measure μ on $\mathbb{R}^d$, let $\mu^\sigma := \Phi_\sigma * \mu$ be the convolution of $\Phi_\sigma$ and μ. Thus $\mu^\sigma$ will have density $g^\sigma := \varphi_\sigma * \mu$, where $\varphi_\sigma$ is the density of $\Phi_\sigma$.

Lemma 10.2.5 If $\mu \in P_2$, then $\ell_2^2(\mu^\sigma,\mu) \le d\sigma^2$.

Proof: Let X and Y be independent random vectors with laws μ and $\Phi_\sigma$, respectively. Then (X, X + Y) is a coupling of μ and $\mu^\sigma$, and $\ell_2^2(\mu^\sigma,\mu) \le E|Y|^2 = d\sigma^2$. □
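Proof of Theorem 10.2.1: Let $X_1, X_2, \dots$ be i.i.d. r.v.s with law $\mu \in P_2$, and let $\mu_n$ be the corresponding empirical measure. The triangle inequality gives
\[
\ell_2^2(\mu_n,\mu) \le 3\big(\ell_2^2(\mu_n,\mu_n^\sigma) + \ell_2^2(\mu_n^\sigma,\mu^\sigma) + \ell_2^2(\mu^\sigma,\mu)\big).
\]
Thus, by Lemma 10.2.5 applied to both $\mu_n$ and μ,
\[
\ell_2^2(\mu_n,\mu) \le C\big(\sigma^2 + \ell_2^2(\mu_n^\sigma,\mu^\sigma)\big). \tag{10.2.6}
\]
The constant $\sigma^2 > 0$ will be chosen later. Let $g^\sigma := \varphi_\sigma * \mu$ and $g_n^\sigma := \varphi_\sigma * \mu_n$ be the densities of $\mu^\sigma$ and $\mu_n^\sigma$, respectively; here $g_n^\sigma$ is given by
\[
g_n^\sigma(x) = \frac1n\sum_{i=1}^n \varphi_\sigma(x - X_i).
\]
By Lemma 10.2.3 and inequality (10.2.5), we have
\[
\ell_2^2(\mu_n^\sigma,\mu^\sigma) \le 3\int |x|^2\,|g^\sigma(x) - g_n^\sigma(x)|\,dx \le C\sqrt{\int (|x|^{d+5} + 1)\,|g^\sigma(x) - g_n^\sigma(x)|^2\,dx}. \tag{10.2.7}
\]
The above bound yields
\[
E\ell_2^2(\mu_n^\sigma,\mu^\sigma) \le C\sqrt{\int (|x|^{d+5} + 1)\,E|g^\sigma(x) - g_n^\sigma(x)|^2\,dx}. \tag{10.2.8}
\]
Since $g_n^\sigma(x)$ is the mean of n i.i.d. r.v.s, the expectation in (10.2.8) is $(1/n)V(\varphi_\sigma(x - X))$, since $Eg_n^\sigma(x) = g^\sigma(x)$, where V stands for variance and X has law μ, X ∼ μ. The indicated variance is dominated by $E\varphi_\sigma^2(x - X)$, and we obtain
\[
E\ell_2^2(\mu_n^\sigma,\mu^\sigma) \le \frac{C}{\sqrt{n}}\sqrt{\int (|x|^{d+5} + 1)\int \varphi_\sigma^2(x - y)\,\mu(dy)\,dx}. \tag{10.2.9}
\]
Now observe that
\[
\varphi_\sigma^2(x) = 2^{-d/2}(2\pi)^{-d/2}\sigma^{-d}\,\varphi_{\sigma/\sqrt{2}}(x). \tag{10.2.10}
\]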
Using this, the integral in (10.2.9) is easily seen to be dominated by
\[
(4\pi)^{-d/2}\sigma^{-d}\Big(1 + 2^{d+4}\Big(\sigma^{d+5}E|Z|^{d+5} + \int |y|^{d+5}\,\mu(dy)\Big)\Big) \le C\sigma^{-d},
\]
where Z ∼ N(0, I) and we assume σ ≤ 1. Thus
\[
E\ell_2^2(\mu_n^\sigma,\mu^\sigma) \le C n^{-1/2}\sigma^{-d/2}. \tag{10.2.11}
\]
Taking expectations in (10.2.6), we get
\[
E\ell_2^2(\mu_n,\mu) \le C\big(\sigma^2 + n^{-1/2}\sigma^{-d/2}\big). \tag{10.2.12}
\]
Choose $\sigma = n^{-1/(d+4)}$, and Theorem 10.2.1 is proved. □
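The choice $\sigma = n^{-1/(d+4)}$ is the one that balances the two terms of (10.2.12). As a small arithmetic check, the following Python sketch computes the exponents of n in both terms for $\sigma = n^{-a}$ with exact rational arithmetic; the function names are introduced here only for illustration.

```python
from fractions import Fraction


def rate_exponents(d, a):
    """Exponents of n in the two terms of (10.2.12) when sigma = n**(-a).

    sigma^2                  -> n**(-2a)
    n^(-1/2) sigma^(-d/2)    -> n**(-1/2 + a*d/2)
    """
    a = Fraction(a)
    return (-2 * a, Fraction(-1, 2) + a * Fraction(d, 2))


def balanced_choice(d):
    """Solve -2a = -1/2 + a*d/2 for a, which gives a = 1/(d+4)."""
    a = Fraction(1, d + 4)
    return a, rate_exponents(d, a)


if __name__ == "__main__":
    for d in range(1, 7):
        a, (e1, e2) = balanced_choice(d)
        print(f"d={d}: sigma = n^(-{a}), exponents {e1} and {e2}")
```

Both exponents equal $-2/(d+4)$, which is the rate stated in Theorem 10.2.1.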
Theorem 10.2.1(4) is also valid, with a slight modification, for infinite exchangeable sequences. Let $X_1, X_2, \dots$ be an infinite exchangeable sequence with directing measure μ (Aldous (1985)). Thus μ is now a random measure on $\mathbb{R}^d$, and conditional on μ, the r.v.s $X_n$ are i.i.d. r.v.s with law μ. Let β be the marginal distribution of $X_n$, so β(B) = Eμ(B). We then have the following rate of convergence result.

(4) Two results related to Theorem 10.2.1 are (i) If the law μ is the Lebesgue measure on $[0,1]^d$, then for d ≥ 3, $E\ell_2^2(\mu_n,\mu) = O(n^{-2/d})$; see Yukich (1991). (ii) Rachev (1991c, Theorem 11.1.6) (see also Dudley (1969)) showed that under a metric entropy condition, the rate $E\ell_2^2(\mu_n,\mu) = O(n^{-1/d})$ is optimal.

Theorem 10.2.6 Suppose $c := \int |u|^{d+5}\,\beta(du) < \infty$. Then there is a constant C, depending only on c and d, such that
\[
E\ell_2^2(\mu_n,\mu) \le C n^{-2/(d+4)}.
\]
Proof: The proof is virtually the same as that of Theorem 10.2.1, except that the notation μ now refers to the directing measure. In (10.2.8) we take conditional expectation given μ instead of the ordinary (unconditional) expectation, and, arguing as in (10.2.9), but conditional on μ, we get
\[
E\big(\ell_2^2(\mu_n^\sigma,\mu^\sigma)\,\big|\,\mu\big) \le C\sigma^{-d/2}n^{-1/2}\sqrt{C_1 + C_2\int |y|^{d+5}\,\mu(dy)},
\]
for some constants $C, C_1, C_2$. Taking expectation yields (10.2.11), and the proof is completed as before. □

Consider an i.i.d. sequence of processes $X_n(t)$ with sample functions in $D := D([0,1],\mathbb{R}^d)$, i.e., the space of cadlag functions (i.e., right continuous and having left limits at each point) on the unit interval, with values in $\mathbb{R}^d$. Let X(t) denote a process having the same law. Set $\mu_t$ to be the marginal law of the process X at time t. The empirical measure at time t, based on observations $X_1(t), \dots, X_n(t)$, is defined by
\[
\mu_t^n = \frac1n\sum_{i=1}^n \delta_{X_i(t)}.
\]
In this case we give a bound on the mean square uniform rate of convergence, that is, of convergence to zero of $E\sup_{0\le t\le 1}\ell_2^2(\mu_t^n,\mu_t)$, under mild assumptions.

Theorem 10.2.7 Suppose that for some constants p > 2 and c < ∞,
(i) $E|X(t)|^{d+5} \le c$ for 0 ≤ t ≤ 1,
(ii) $E|X(s) - X(r)|^p\,|X(s) - X(t)|^p \le c|t - r|^2$, for 0 ≤ r < s < t ≤ 1,
(iii) $E|X(t) - X(s)|^p \le c|t - s|$, for 0 ≤ s ≤ t ≤ 1,
(iv) $E|X(t) - X(s)|^2 \le c|t - s|$, for 0 ≤ s ≤ t ≤ 1.
Then there is a constant C, depending only on p, c, and the dimension d, such that
\[
E\sup_{0\le t\le 1}\ell_2^2(\mu_t^n,\mu_t) \le C n^{-2/(d+8)}.
\]
Proof: Let N be a positive integer, to be chosen later, and let $t_k = k/N$, 0 ≤ k ≤ N, so the $t_k$ partition [0, 1], and let
\[
Z_k := \sup_{t_k\le t\le t_{k+1}} \ell_2^2(\mu_t^n,\mu_{t_k}^n) \wedge \ell_2^2(\mu_t^n,\mu_{t_{k+1}}^n).
\]
Then
\[
\sup_{0\le t\le 1}\ell_2^2(\mu_t^n,\mu_t) \le 3\Big[\max_k Z_k + \max_k \ell_2^2(\mu_{t_k}^n,\mu_{t_k}) + \max_k\sup_{t_k\le t\le t_{k+1}}\ell_2^2(\mu_{t_k},\mu_t)\Big]. \tag{10.2.13}
\]
The last term on the right is easy to estimate: $\mu_{t_k}$ and $\mu_t$ are coupled by $X(t_k)$ and X(t), so by (iv), $\ell_2^2(\mu_{t_k},\mu_t) \le E|X(t_k) - X(t)|^2 \le C|t_k - t|$. Thus the contribution of the last term on the right-hand side of (10.2.13) is at most C/N. Next we consider the middle term. Let σ > 0 (to be chosen later). With the notation of Theorem 10.2.1, we have
\[
\begin{aligned}
E\max_k \ell_2^2(\mu_{t_k}^n,\mu_{t_k})
&\le C\Big(E\max_k \ell_2^2(\mu_{t_k}^n,\mu_{t_k}^{n,\sigma}) + E\max_k \ell_2^2(\mu_{t_k}^{n,\sigma},\mu_{t_k}^\sigma) + E\max_k \ell_2^2(\mu_{t_k}^\sigma,\mu_{t_k})\Big)\\
&\le C\Big(\sigma^2 + E\max_k \ell_2^2(\mu_{t_k}^{n,\sigma},\mu_{t_k}^\sigma)\Big).
\end{aligned} \tag{10.2.14}
\]
The last inequality follows by Lemma 10.2.5. By Lemma 10.2.4,
\[
E\max_k \ell_2^2(\mu_{t_k}^{n,\sigma},\mu_{t_k}^\sigma) \le \sqrt{N}\,\sqrt{\max_k E\ell_2^4(\mu_{t_k}^{n,\sigma},\mu_{t_k}^\sigma)}. \tag{10.2.15}
\]
Now, as in (10.2.7),
\[
\ell_2^4(\mu_{t_k}^{n,\sigma},\mu_{t_k}^\sigma) \le C\int (|x|^{d+5} + 1)\,|g_{t_k}^{n,\sigma}(x) - g_{t_k}^\sigma(x)|^2\,dx,
\]
where $g_{t_k}^{n,\sigma}(x)$, $g_{t_k}^\sigma(x)$ are the p.d.f.s of $\mu_{t_k}^{n,\sigma}$, $\mu_{t_k}^\sigma$. Arguing as in Theorem 10.2.1 and noting (i), we find $E\ell_2^4(\mu_{t_k}^{n,\sigma},\mu_{t_k}^\sigma) \le C\sigma^{-d}n^{-1}$, and (10.2.15) yields
\[
E\max_k \ell_2^2(\mu_{t_k}^{n,\sigma},\mu_{t_k}^\sigma) \le C\sigma^{-d/2}\sqrt{N/n}. \tag{10.2.16}
\]
Putting everything together in (10.2.13) we have
\[
\begin{aligned}
E\sup_t \ell_2^2(\mu_t^n,\mu_t)
&\le C\Big(1/N + \sigma^2 + \sigma^{-d/2}\sqrt{N/n} + E\max_k Z_k\Big)\\
&\le C\Big(1/N + \sigma^2 + \sigma^{-d/2}\sqrt{N/n} + \sqrt{N}\,\sqrt{\max_k EZ_k^2}\Big).
\end{aligned} \tag{10.2.17}
\]
Finally, we analyze the term involving $Z_k$. Define a random vector in $(\mathbb{R}^d)^n$ by
\[
Y(t) := \frac{1}{\sqrt{n}}\,(X_1(t),\dots,X_n(t)).
\]
Then for $t_1 \le t \le t_2$,
\[
\begin{aligned}
E|Y(t) - Y(t_1)|^p\,|Y(t_2) - Y(t)|^p
&= E\Big[\Big(\frac1n\sum_{i=1}^n |X_i(t) - X_i(t_1)|^2\Big)^{p/2}\Big(\frac1n\sum_{j=1}^n |X_j(t_2) - X_j(t)|^2\Big)^{p/2}\Big]\\
&\le E\Big[\Big(\frac1n\sum_{i=1}^n |X_i(t) - X_i(t_1)|^p\Big)\Big(\frac1n\sum_{j=1}^n |X_j(t_2) - X_j(t)|^p\Big)\Big]\\
&= \frac{1}{n^2}\sum_{i=1}^n E\big(|X_i(t) - X_i(t_1)|^p\,|X_i(t_2) - X_i(t)|^p\big)\\
&\quad + \frac{1}{n^2}\sum_{i\ne j} E\big(|X_i(t) - X_i(t_1)|^p\big)\,E\big(|X_j(t_2) - X_j(t)|^p\big)\\
&\le C|t_2 - t_1|^2.
\end{aligned}
\]
Here we have used the independence of $X_i$ and $X_j$ for i ≠ j, and conditions (ii) and (iii) of the theorem. By Billingsley (1968; (15.26)), there is a constant $K = K_p$, depending only on p, such that
\[
P\Big(\sup_{t_1\le t\le t_2}\min(|Y(t) - Y(t_1)|,\,|Y(t_2) - Y(t)|) > \lambda\Big) \le CK\lambda^{-2p}|t_2 - t_1|^2.
\]
Hence
\[
E\Big[\sup_{t_1\le t\le t_2}|Y(t) - Y(t_1)|^4 \wedge |Y(t_2) - Y(t)|^4\Big] \le CK|t_2 - t_1|^2;
\]
that is,
\[
E\Big[\sup_{t_1\le t\le t_2}\Big(\frac1n\sum_{i=1}^n |X_i(t) - X_i(t_1)|^2 \wedge \frac1n\sum_{i=1}^n |X_i(t_2) - X_i(t)|^2\Big)^2\Big] \le CK|t_2 - t_1|^2. \tag{10.2.18}
\]
It is easy to check that
\[
\ell_2^2(\mu_t^n,\mu_{t_1}^n) \le \frac1n\sum_{i=1}^n |X_i(t) - X_i(t_1)|^2,
\]
so that (10.2.18) implies (replace $t_1, t_2$ by $t_k, t_{k+1}$)
\[
EZ_k^2 \le CK/N^2. \tag{10.2.19}
\]
Putting this into (10.2.17), we have
\[
\begin{aligned}
E\sup_t \ell_2^2(\mu_t^n,\mu_t)
&\le C\Big(1/N + \sigma^2 + \sigma^{-d/2}\sqrt{N/n} + 1/\sqrt{N}\Big)\\
&\le C\Big(\sigma^2 + \sigma^{-d/2}\sqrt{N/n} + 1/\sqrt{N}\Big).
\end{aligned} \tag{10.2.20}
\]
The choice $\sigma = n^{-1/(d+8)}$ and $N = n^{4/(d+8)}$ now gives the result. □
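The final choice of σ and N can be recovered by a small exact search. The sketch below is an illustration only: it writes $\sigma = n^{-a}$ and $N = n^{b}$, computes the exponent of n in each of the three terms of (10.2.20), and looks for the pair (a, b) on a rational grid that minimizes the largest exponent. The grid, its resolution, and the function names are assumptions made for this example.

```python
from fractions import Fraction


def worst_exponent(d, a, b):
    """Largest exponent of n among the three terms of (10.2.20).

    sigma^2                    -> n**(-2a)
    sigma^(-d/2) sqrt(N/n)     -> n**(a*d/2 + (b-1)/2)
    1/sqrt(N)                  -> n**(-b/2)
    """
    return max(-2 * a, a * Fraction(d, 2) + (b - 1) / 2, -b / 2)


def search_best(d, grid=4):
    """Exhaustive search over a, b in {0, 1/K, ..., 1} with K = grid*(d+8)."""
    K = grid * (d + 8)
    best = None
    for i in range(K + 1):
        for j in range(K + 1):
            a, b = Fraction(i, K), Fraction(j, K)
            e = worst_exponent(d, a, b)
            if best is None or e < best[2]:
                best = (a, b, e)
    return best


if __name__ == "__main__":
    for d in (1, 2, 3):
        a, b, e = search_best(d)
        print(f"d={d}: a={a}, b={b}, rate exponent={e}")
```

On this grid the minimizer is a = 1/(d+8), b = 4/(d+8), where all three exponents coincide at −2/(d+8), in agreement with the statement of Theorem 10.2.7.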
Remark 10.2.8 Horowitz and Karandikar also showed that if X(t) is an $\mathbb{R}^d$-valued diffusion with jumps, then under some conditions on the drift and diffusion coefficient, it satisfies moment estimates as in Theorem 10.2.7. More precisely, let X(t) be a solution of the SDE
\[
dX(t) = \sigma(t,X(t))\,dW(t) + b(t,X(t))\,dt + \int h(t,X(t-),z)\,\tilde{N}(dt\,dz).
\]
Here, W(t) is an $\mathbb{R}^d$-valued standard Wiener process; N is a Poisson random measure on $[0,T]\times\mathbb{R}^d$ with intensity measure Λ given by $\Lambda(dt,dz) = \lambda_t(dz)\,dt$, $\tilde{N} = N - \Lambda$; σ(t, x), b(t, x) are measurable functions on $[0,T]\times\mathbb{R}^d$ taking values in d × d matrices and $\mathbb{R}^d$, respectively; and h(t, x, z) is an $\mathbb{R}^d$-valued measurable function on $[0,T]\times\mathbb{R}^d\times\mathbb{R}^d$. It is assumed that X(0) is independent of W, N, and that X(t) is $F_t$-adapted, where
\[
F_t = \sigma\big(X(0),\,W(s),\,N([0,s]\times A),\ s\le t,\ A \text{ Borel in } \mathbb{R}^d\big)
\]
(see Jacod (1979)). Suppose that
\[
|\sigma(t,x)|^2 \le C(1 + |x|^2), \qquad |b(t,x)|^2 \le C(1 + |x|^2), \qquad E|X(0)|^{d+5} < \infty,
\]
and for 2 ≤ q ≤ d + 5, 0 ≤ r ≤ T,
\[
\int |h(r,x,z)|^q\,\lambda_r(dz) \le C(1 + |x|^q).
\]
Under these conditions, the process X(t) satisfies the moment conditions imposed in Theorem 10.2.7. For details we refer to Horowitz and Karandikar (1994, Section 5).
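As a numerical companion to the empirical-measure bounds of this section, the following Python sketch computes $\ell_2^2$ between two empirical measures on the real line. It relies on a standard fact that is not proved in the text: in dimension one, for two samples of equal size, the optimal coupling for the quadratic cost matches the order statistics. Comparing two independent samples (rather than a sample with its true law) is a simplification chosen here, and all names are illustrative.

```python
import random


def l2_squared_empirical(xs, ys):
    """Squared L2-Kantorovich distance between two empirical measures on IR.

    Both samples must have the same number of atoms; the sorted (monotone)
    matching is optimal in dimension one.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mean_distance(n, reps=200, seed=0):
    """Monte Carlo average of l2^2 between two independent N(0,1) samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += l2_squared_empirical(xs, ys)
    return total / reps


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, mean_distance(n))
```

The averages decrease as n grows, which is the qualitative behaviour quantified by Theorem 10.2.1; the sketch makes no claim about the exact exponent.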
10.3 Wasserstein Metric and Approximation of Stochastic Differential Equations

In this section we shall study weak approximations of stochastic differential equations (SDEs) of Itô type. Moreover, we shall estimate the convergence rates of the approximate solutions using the $L^p$-distance $E\|\cdot\|^p_{C[t_0,T]}$, p ∈ [2, ∞).(5) These results can also be interpreted as convergence rates for the minimal $L_p$-(Wasserstein) metric, p ∈ [1, ∞), between the distributions of exact and approximate solutions. Two approximation schemes will be considered. They represent a combination of the time discretization methods of Euler and Milshtein with a chance discretization based on the invariance principle, and they work on a grid constructed to tune both discretizations.

(5) The results in this section are due to Gelbrich (1995).

The methods investigated here are based on the evaluation of the drift and diffusion coefficients in grid points, and they combine the time discretization of the SDE (as done, for instance, by the stochastic analogue of Euler's method) with the discretization of the stochastic input, the Wiener process. This combination of time and chance discretization is necessary for a computer simulation of the solution of Itô SDEs. Another idea for discretizing such SDEs without using the Wiener process can be found in Pardoux and Talay (1985) and Talay (1988) and is based on the approach of Doss (1977) and Sussmann (1978). In fact, Doss (1977) and Sussmann (1978) use a partial and an ordinary differential equation for constructing a pathwise solution of the SDE; that is, a pathwise convergence in the supremum norm is considered; see also Milshtein (1978), Newton (1986), Wagner (1988).

A broad survey of various approximations of solutions for SDEs is given in the monograph Kloeden and Platen (1992); see also Platen (1981), Maruyama (1955), Milshtein (1978), Janssen (1984), Dudley (1968), Doss (1977), Sussmann (1978), and Römisch and Wakolbinger (1985).

Kanagawa (1986) used a method derived from the stochastic Euler method by replacing the increments of the Wiener process by other "simpler" i.i.d. r.v.s. He uses $L_p$-Wasserstein metrics (p ≥ 2) between the distributions of exact and approximate solutions, thus achieving convergence rates; see also Rachev (1991), Givens and Shortt (1984), and Gelbrich (1990). Gelbrich (1995) uses the same metrics but generalizes the method of Kanagawa. For that he uses as a basis the stochastic Euler method (see further the method (E1)) (proposed by Maruyama (1955)) and Milshtein's method (M1∗) (proposed by Milshtein (1978)) having orders 1 and 2, respectively, with respect to the mean square of the supremum norm of the difference between exact and approximate solutions. Since these methods use values of the drift and diffusion coefficients and of the Wiener process only in grid points $t_k$, the order 2 is optimal, as shown by Clark and Cameron (1980). The orders of these methods (see further (E1) and (M1∗)) are proved in Platen (1981), as well as higher orders for methods using also iterated integrals of the Wiener process.
Consider a stochastic differential equation of Itô type written in integral form:
\[
\text{(SE)}\qquad
\begin{aligned}
x(t) - x_0 &= \int_{t_0}^t b(x(s))\,ds + \int_{t_0}^t \sigma(x(s))\,dw(s)\\
&= \int_{t_0}^t b(x(s))\,ds + \sum_{j=1}^q \int_{t_0}^t \sigma_j(x(s))\,dw_j(s), \qquad t \in [t_0,T],\ x_0 \in \mathbb{R}^d,
\end{aligned}
\]
where $w = (w_1,\dots,w_q)^T$ is a q-dimensional standard Wiener process, $b \in C(\mathbb{R}^d;\mathbb{R}^d)$, and $\sigma \in C(\mathbb{R}^d;L(\mathbb{R}^q;\mathbb{R}^d))$, and where $\sigma_j \in C(\mathbb{R}^d;\mathbb{R}^d)$, j = 1, . . . , q, denote the columns of the matrix function $\sigma = (\sigma_1,\dots,\sigma_q)$. In the sequel we denote by C and $C^i$ spaces of continuous and i times differentiable functions, respectively, and by L spaces of linear mappings. By $\|\cdot\|$ we shall denote the Euclidean norm on $\mathbb{R}^n$ ($n \in \mathbb{N}$) and the corresponding induced norm on a space L. For any random variable (r.v.) ζ mapping a probability space (Ω, A, P) into a separable metric space (X, d) with the Borel σ-algebra B(X), L(ζ) denotes the distribution $P\circ\zeta^{-1}$ induced on X by ζ. P(X) is the set of all Borel probability measures on X.

The case that b and σ explicitly depend on the time t can be written in the form (SE) by taking t as another component of x. A direct treatment of this case follows the same lines as in this section and is carried out, for equidistant grids and bounded b and σ, in Gelbrich (1989). It allows us to relax the eventually required second-order t-differentiability to first-order t-differentiability.

For p ∈ [1, ∞), recall the minimal $L_p$-metric $\ell_p$ on the set
\[
M_p(X) := \Big\{\mu \in P(X);\ \int_X (d(x,\theta))^p\,d\mu(x) < \infty,\ \theta \in X\Big\}:
\]
\[
\ell_p(\mu,\nu) := \Big[\inf\int_{X\times X}(d(x,y))^p\,d\eta(x,y)\Big]^{1/p}, \qquad \mu,\nu \in M_p(X);
\]
the infimum is taken over all measures η ∈ P(X × X) with marginal distributions μ and ν.

Theorem 10.3.1 (Kanagawa (1986)) Let $\{\zeta^k,\ k = 1,\dots,N\} \subset \mathbb{R}^q$ be a set of bounded i.i.d. q-dimensional r.v.s with mean value 0 and covariance matrix $I_q$ (unit matrix), with finite (2+δ)th absolute moments for some δ ∈ (0, 1], and with a quadratically integrable density. If b and σ are Lipschitz continuous, then the method
\[
\text{(K)}\qquad
\begin{cases}
\tilde{y}_N(0) = x_0,\\[4pt]
\tilde{y}_N\big(\tfrac{k}{N}\big) = x_0 + \displaystyle\sum_{j=1}^k \sigma\Big(\tfrac{j-1}{N},\tilde{y}_N\big(\tfrac{j-1}{N}\big)\Big)\frac{\zeta^j}{\sqrt{N}} + \sum_{j=1}^k b\Big(\tfrac{j-1}{N},\tilde{y}_N\big(\tfrac{j-1}{N}\big)\Big)\frac1N, \quad k = 1,\dots,N,\\[8pt]
\tilde{y}_N(t) = \tilde{y}_N\big(\tfrac{k-1}{N}\big) + (N\cdot t - k + 1)\Big(\tilde{y}_N\big(\tfrac{k}{N}\big) - \tilde{y}_N\big(\tfrac{k-1}{N}\big)\Big) \quad\text{for } t \in \big[\tfrac{k-1}{N},\tfrac{k}{N}\big],\ k = 1,\dots,N,
\end{cases}
\]
converges for any $\varepsilon > \tfrac12$ and every p ∈ [2, 2 + δ) at the rate
\[
\ell_p(D(\tilde{y}_N),D(x)) = O\big(N^{-\delta/(2(2+\delta))}(\ln N)^\varepsilon\big) \qquad\text{for } N \to \infty.
\]
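A minimal sketch of method (K) may help to fix ideas. The Python code below assumes the scalar case d = q = 1 on the time interval [0, 1] and draws the r.v.s $\zeta^j$ from the uniform distribution on $[-\sqrt{3},\sqrt{3}]$, which is bounded, has mean 0 and variance 1, and has a square-integrable density; this particular distribution and the function names are choices made for this illustration and are not prescribed by the theorem.

```python
import math
import random


def kanagawa_path(b, sigma, x0, N, rng):
    """Grid values y(k/N), k = 0..N, of scheme (K) for d = q = 1.

    b(t, y) and sigma(t, y) are the (possibly time-dependent) coefficients.
    """
    y = [x0]
    root3 = math.sqrt(3.0)
    for j in range(1, N + 1):
        t_prev = (j - 1) / N
        zeta = rng.uniform(-root3, root3)  # mean 0, variance 1, bounded
        y.append(y[-1]
                 + sigma(t_prev, y[-1]) * zeta / math.sqrt(N)
                 + b(t_prev, y[-1]) / N)
    return y


def interpolate(y, t):
    """Piecewise linear interpolation of the grid values at time t in [0, 1]."""
    N = len(y) - 1
    k = min(int(t * N), N - 1)  # zero-based index of the interval containing t
    return y[k] + (N * t - k) * (y[k + 1] - y[k])


if __name__ == "__main__":
    rng = random.Random(0)
    path = kanagawa_path(lambda t, x: -x, lambda t, x: 0.5, 1.0, 100, rng)
    print(interpolate(path, 0.5), path[-1])
```

With σ ≡ 0 the scheme reduces to the explicit Euler method for the ordinary differential equation x' = b(t, x), which gives a simple way to sanity-check an implementation.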
In the sequel we shall use the following assumptions concerning (SE):

(AS1) There exists a constant M > 0 such that for all j = 1, . . . , q and $x \in \mathbb{R}^d$,
$\|b(x)\| \le M(1 + \|x\|)$ and $\|\sigma_j(x)\| \le M$.

(AS2) There exists a constant L > 0 such that, for all j = 1, . . . , q and $x, y \in \mathbb{R}^d$,
$\|b(x) - b(y)\| \le L\|x - y\|$ and $\|\sigma_j(x) - \sigma_j(y)\| \le L\|x - y\|$.

(AS3) $b, \sigma_j \in C^2(\mathbb{R}^d;\mathbb{R}^d)$, j = 1, . . . , q, and there exists a constant B > 0 such that for all j = 1, . . . , q and $x, y \in \mathbb{R}^d$,
$\|b'(x) - b'(y)\| \le B\|x - y\|$ and $\|\sigma_j'(x) - \sigma_j'(y)\| \le B\|x - y\|$.

(AS2∗) $b, \sigma_j \in C^2(\mathbb{R}^d;\mathbb{R}^d)$, j = 1, . . . , q, and there exists a constant L > 0 such that for all j = 1, . . . , q and $x \in \mathbb{R}^d$,
$\|b'(x)\| \le L$ and $\|\sigma_j'(x)\| \le L$.

(AS3∗) There exists a constant B > 0 such that for all j = 1, . . . , q and $x \in \mathbb{R}^d$,
$\sup\{\|b''(x)[h,k]\|;\ h,k \in \mathbb{R}^d,\ \|h\| \le 1,\ \|k\| \le 1\} \le B$ and $\sup\{\|\sigma_j''(x)[h,k]\|;\ h,k \in \mathbb{R}^d,\ \|h\| \le 1,\ \|k\| \le 1\} \le B$.

(AS4) $\sigma_i'\sigma_j = \sigma_j'\sigma_i$ for all i, j = 1, . . . , q.
The construction of the approximate solutions in Theorem 10.3.1 will be generalized by considering, instead of one equidistant grid for both time and chance discretization, a not necessarily equidistant coarse grid for the time discretization and a fine grid. The fine grid will be a refinement of the coarse grid and will be needed for the chance discretization via the invariance principle, which yields a lower convergence speed than the time discretization. To this end, we consider a grid class G(m, Λ, α, β). Here let $m : (0, T - t_0] \to [1,\infty)$ be a monotonically decreasing function and let Λ, α, β > 0 be constants. Then each element G of G(m, Λ, α, β) is constructed in the following way and has the following properties: G consists of two kinds of grid points:
• the time discretization points $t_k$, k = 0, . . . , n, with $t_0 < t_1 < \cdots < t_n = T$, and
• the chance discretization points $u_i^k$, i = 0, . . . , $m_k$; k = 0, . . . , n − 1, with $t_k = u_0^k < u_1^k < \cdots < u_{m_k}^k = t_{k+1}$; k = 0, . . . , n − 1.
Hence, G is a combination of a coarse subgrid consisting of all points $t_k$ relevant for the pure time discretization and of a fine grid consisting of all points $u_i^k$ needed for the discretization of the Wiener process. Denote by
\[
h_k := t_{k+1} - t_k,\quad k = 0,\dots,n-1, \qquad\text{and}\qquad h := \max_{0\le k\le n-1} h_k
\]
the step sizes and the maximal step size of the coarse subgrid. Now, G is required to satisfy the following assumptions:

(G1) $h\cdot n \le \Lambda$ and $n \in \mathbb{N}$, h ≤ 1.

(G2) $1 \le m_k \le m(h)^\alpha$ and $m_k \in \mathbb{N}$ for all k = 0, . . . , n − 1.

(G3) $u_i^k - u_{i-1}^k = \dfrac{h_k}{m_k} \le \beta\,\dfrac{h}{m(h)}$ for all k = 0, . . . , n − 1; i = 1, . . . , $m_k$.

Here (G1) restricts the number of intervals of the coarse subgrid with given h, which is bounded by 1 only for convenience (in order to write simpler upper bounds later). (G2) and (G3) say that each interval of the coarse subgrid is subdivided in an equidistant way by the points $u_i^k$, both the number of the subdivisions and the step size of the full grid being bounded by functions of h. As an example, it is easy to see that all equidistant grids that also have an equidistant coarse subgrid and satisfy $m_k = [m(h)]$, k = 0, . . . , n − 1, belong to $G(m, T - t_0, 1, 2)$. For a grid G of G(m, Λ, α, β) we define
\[
[t]_G := t_k \ \text{ and } \ i_G(t) := k, \qquad\text{if } t \in [t_k,t_{k+1});\ k = 0,\dots,n-1;
\]
\[
[t]^*_G := u_i^k \qquad\text{if } t \in [u_i^k,u_{i+1}^k);\ i = 1,\dots,m_k;\ k = 0,\dots,n-1.
\]
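The example of the equidistant grid with $m_k = [m(h)]$ can be verified mechanically. The Python sketch below builds such a grid, checks (G1)–(G3), and evaluates $[t]_G$ and $i_G(t)$; the function names, the tolerance used for floating-point comparisons, and the sample choice $m(h) = \max(1, h^{-1/2})$ are assumptions of this illustration.

```python
import bisect
import math


def equidistant_grid(t0, T, n, m):
    """Coarse points t_k and fine points u^k_i of an equidistant grid with m_k = [m(h)]."""
    h = (T - t0) / n
    mk = math.floor(m(h))
    t = [t0 + k * h for k in range(n + 1)]
    u = [[t[k] + i * h / mk for i in range(mk + 1)] for k in range(n)]
    return t, u, h, mk


def satisfies_G(t0, T, n, m, Lam, alpha, beta, tol=1e-12):
    """Check (G1), (G2), (G3) for the equidistant grid built above."""
    t, u, h, mk = equidistant_grid(t0, T, n, m)
    g1 = h * n <= Lam + tol and h <= 1
    g2 = 1 <= mk <= m(h) ** alpha + tol
    g3 = h / mk <= beta * h / m(h) + tol
    return g1 and g2 and g3


def coarse_floor(t, s):
    """Return ([s]_G, i_G(s)): the last coarse point not exceeding s and its index."""
    k = min(max(bisect.bisect_right(t, s) - 1, 0), len(t) - 2)
    return t[k], k


if __name__ == "__main__":
    m = lambda h: max(1.0, h ** -0.5)
    print(satisfies_G(0.0, 1.0, 10, m, Lam=1.0, alpha=1.0, beta=2.0))
    t, u, h, mk = equidistant_grid(0.0, 1.0, 10, m)
    print(coarse_floor(t, 0.25), mk)
```

The constant β = 2 is what absorbs the rounding in $[m(h)]$: since m(h) ≥ 1, one has $[m(h)] \ge m(h)/2$, so that $h/[m(h)] \le 2h/m(h)$.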
We construct the approximate solutions in (E3) and (M3) in three steps. The first step is a pure time discretization using the stochastic Euler method (E1) and the method (M1) corresponding to Milshtein's method (M1∗). Here only the coarse subgrid is involved. We define these two methods as follows:
\[
\text{(E1)}\qquad y^E(t) = x_0 + \int_{t_0}^t b(y^E([s]_G))\,ds + \sum_{j=1}^q \int_{t_0}^t \sigma_j(y^E([s]_G))\,dw_j(s) \qquad\text{for all } t \in [t_0,T],
\]
and
\[
\text{(M1)}\qquad
\begin{aligned}
y^M(t) = x_0 &+ \int_{t_0}^t b(y^M([s]_G))\,ds + \sum_{k=1}^q \int_{t_0}^t \sigma_k(y^M([s]_G))\,dw_k(s)\\
&+ \sum_{k=1}^q\sum_{j=1}^q \int_{t_0}^t (\sigma_k'\sigma_j)(y^M([s]_G)) \int_{[s]_G}^s dw_j(u)\,dw_k(s) \qquad\text{for all } t \in [t_0,T].
\end{aligned}
\]
If (AS4) holds and $\tilde{b} := b - \tfrac12\sum_{j=1}^q \sigma_j'\sigma_j$, then (M1) is equivalent to the following method (M1∗) proposed by Milshtein (1978). This equivalence is an immediate consequence of Itô's formula.
\[
\text{(M1}^*\text{)}\qquad
\begin{aligned}
y^M(t) = x_0 &+ \sum_{r=0}^{i_G(t)-1} h_r\,\tilde{b}(y^M(t_r)) + \tilde{b}(y^M([t]_G))(t - [t]_G)\\
&+ \sum_{j=1}^q \Big[\sum_{r=0}^{i_G(t)-1} \sigma_j(y^M(t_r))\,(w_j(t_{r+1}) - w_j(t_r)) + \sigma_j(y^M([t]_G))\,(w_j(t) - w_j([t]_G))\Big]\\
&+ \frac12\sum_{j,g=1}^q \Big[\sum_{r=0}^{i_G(t)-1} (\sigma_j'\sigma_g)(y^M(t_r))\,(w_j(t_{r+1}) - w_j(t_r))\,(w_g(t_{r+1}) - w_g(t_r))\\
&\qquad\qquad + (\sigma_j'\sigma_g)(y^M([t]_G))\,(w_j(t) - w_j([t]_G))\,(w_g(t) - w_g([t]_G))\Big] \qquad\text{for all } t \in [t_0,T].
\end{aligned}
\]
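For orientation, one step of (E1) and of (M1∗) at the coarse grid points can be written out in the scalar case d = q = 1, where (AS4) holds trivially. The Python sketch below is an illustration under that assumption; the derivative σ' is passed explicitly as `dsigma`, and all function names are hypothetical.

```python
import math
import random


def euler_step(y, h, dw, b, sigma):
    """One step of (E1) between consecutive coarse grid points (d = q = 1)."""
    return y + h * b(y) + sigma(y) * dw


def milstein_step(y, h, dw, b, sigma, dsigma):
    """One step of (M1*) between consecutive coarse grid points (d = q = 1).

    Uses the modified drift b~ = b - (1/2) sigma' sigma and the correction
    (1/2) sigma' sigma (dw)^2.
    """
    b_tilde = b(y) - 0.5 * dsigma(y) * sigma(y)
    return y + h * b_tilde + sigma(y) * dw + 0.5 * dsigma(y) * sigma(y) * dw * dw


def milstein_path(x0, t_grid, b, sigma, dsigma, rng):
    """Values of y^M at the coarse grid points, driven by Gaussian increments."""
    y = [x0]
    for r in range(len(t_grid) - 1):
        h = t_grid[r + 1] - t_grid[r]
        dw = rng.gauss(0.0, math.sqrt(h))  # w(t_{r+1}) - w(t_r)
        y.append(milstein_step(y[-1], h, dw, b, sigma, dsigma))
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [k / 50 for k in range(51)]
    path = milstein_path(1.0, grid, lambda y: 0.5 * y, lambda y: 0.2 * y, lambda y: 0.2, rng)
    print(path[-1])
```

When σ is constant the correction terms vanish and the two steps coincide, which reflects the fact that (M1) differs from (E1) only through the term involving $\sigma_k'\sigma_j$.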
In the second step, a continuous and piecewise linear interpolation of the trajectories in (E1) and (M1) between the points of the whole fine grid yields the methods (E2) and (M2), respectively:
(E2) $\tilde y^E$ is continuous, and linear in the intervals $[u_{i-1}^k, u_i^k]$, $i = 1, \dots, m_k$; $k = 0, \dots, n-1$, with $\tilde y^E(u_i^k) = y^E(u_i^k)$, $i = 0, \dots, m_k$; $k = 0, \dots, n-1$.
(M2) $\tilde y^M$ is continuous, and linear in the intervals $[u_{i-1}^k, u_i^k]$, $i = 1, \dots, m_k$; $k = 0, \dots, n-1$, with $\tilde y^M(u_i^k) = y^M(u_i^k)$, $i = 0, \dots, m_k$; $k = 0, \dots, n-1$.
In the third step, the Wiener process increments over the fine grid are replaced by other i.i.d. r.v.s: Let $\mu \in \mathcal{P}(\mathbb{R})$ be a measure with mean value 0 and variance 1, and let
\[
\{\xi_{js}^k : j = 1, \dots, q; \ s = 1, \dots, m_k; \ k = 0, \dots, n-1\}
\]
be a family of i.i.d. r.v.s with distribution $D(\xi_{11}^0) = \mu$.
Then we shall define the methods (E3) and (M3) yielding continuous trajectories linear between neighboring grid points:
(E3)
\[
\begin{aligned}
z^E(u_0^0) &= x_0; \\
z^E(u_i^k) &= x_0 + \sum_{r=0}^{k-1} h_r\, b(z^E(t_r)) + h_k \cdot \frac{i}{m_k}\, b(z^E(t_k)) \\
&\quad + \sum_{j=1}^{q} \Bigg[ \sum_{r=0}^{k-1} \sqrt{\frac{h_r}{m_r}}\; \sigma_j(z^E(t_r)) \sum_{s=1}^{m_r} \xi_{js}^r + \sqrt{\frac{h_k}{m_k}}\; \sigma_j(z^E(t_k)) \sum_{s=1}^{i} \xi_{js}^k \Bigg]
\end{aligned}
\]
for all $i = 1, \dots, m_k$; $k = 0, \dots, n-1$;
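and
(M3)
\[
\begin{aligned}
z^M(u_0^0) &= x_0, \ \text{ and for } \tilde b := b - \frac{1}{2} \sum_{j=1}^{q} \sigma_j' \sigma_j: \\
z^M(u_i^k) &= x_0 + \sum_{r=0}^{k-1} h_r\, \tilde b(z^M(t_r)) + h_k \cdot \frac{i}{m_k}\, \tilde b(z^M(t_k)) \\
&\quad + \sum_{j=1}^{q} \Bigg[ \sum_{r=0}^{k-1} \sqrt{\frac{h_r}{m_r}}\; \sigma_j(z^M(t_r)) \sum_{s=1}^{m_r} \xi_{js}^r + \sqrt{\frac{h_k}{m_k}}\; \sigma_j(z^M(t_k)) \sum_{s=1}^{i} \xi_{js}^k \Bigg] \\
&\quad + \frac{1}{2} \sum_{j,g=1}^{q} \Bigg[ \sum_{r=0}^{k-1} \frac{h_r}{m_r}\, (\sigma_j' \sigma_g)(z^M(t_r)) \Bigg( \sum_{s=1}^{m_r} \xi_{js}^r \Bigg) \Bigg( \sum_{s=1}^{m_r} \xi_{gs}^r \Bigg) \\
&\qquad\qquad + \frac{h_k}{m_k}\, (\sigma_j' \sigma_g)(z^M(t_k)) \Bigg( \sum_{s=1}^{i} \xi_{js}^k \Bigg) \Bigg( \sum_{s=1}^{i} \xi_{gs}^k \Bigg) \Bigg]
\end{aligned}
\]
for all $i = 1, \dots, m_k$; $k = 0, \dots, n-1$.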
For this last step, the Wiener process $w$ and the r.v.s $\xi_{ji}^k$ will have to be defined anew on a common probability space. In the following we investigate the convergence rates in terms of the norm $\big(E \sup_{t_0 \le t \le T} \|\cdot\|^p\big)^{1/p}$ for $C([t_0, T]; \mathbb{R}^d)$-valued r.v.s in each of the three steps.
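The recursion (E3) can be sketched for the scalar case $d = q = 1$ as follows. The function name `e3_path` and the choice of $\mu$ as the symmetric two-point law on $\{-1, +1\}$ (mean 0, variance 1) are assumptions made for illustration; any $\mu$ with the required properties could be substituted.

```python
import math
import random


def e3_path(b, sigma, x0, coarse, m, xi):
    """Values z^E(u_i^k) of method (E3) at all fine grid points (scalar case).

    coarse: coarse grid t_0 < ... < t_n
    m:      m[k] = number of fine subintervals of [t_k, t_{k+1}]
    xi:     xi[k] = list of m[k] i.i.d. draws with mean 0 and variance 1
    """
    z = [x0]
    for k in range(len(coarse) - 1):
        h_k = coarse[k + 1] - coarse[k]
        z_k = z[-1]                      # z^E(t_k); coefficients are frozen here
        drift, diff = b(z_k), sigma(z_k)
        partial = 0.0                    # running sum xi_1^k + ... + xi_i^k
        for i in range(1, m[k] + 1):
            partial += xi[k][i - 1]
            z.append(z_k + h_k * (i / m[k]) * drift
                     + math.sqrt(h_k / m[k]) * diff * partial)
    return z


# Hypothetical usage: 4 coarse intervals, 16 fine steps each, two-point noise.
rng = random.Random(1)
coarse = [0.25 * k for k in range(5)]
m = [16] * 4
xi = [[rng.choice((-1.0, 1.0)) for _ in range(mk)] for mk in m]
path = e3_path(lambda x: 0.1 * x, lambda x: 0.3 * x, 1.0, coarse, m, xi)
print(len(path), path[-1])
```

Freezing `drift` and `diff` at $z^E(t_k)$ reflects that in (E3) the coefficients are evaluated only at coarse grid points, while the noise is accumulated over the fine grid.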
For convenience we shall denote by $K$ any constant depending only on $p$, the considered grid class, and on the data of the original SDE (SE). This means that $K$ does not depend on the particular grid. Moreover, $K$ may have different values at different occurrences. The theorems(6) in the sequel will be formulated for an arbitrary fixed grid $G$ of the grid class $\mathcal{G}(m, \Lambda, \alpha, \beta)$. Therefore, $G$ fulfils (G1)–(G3) with the construction above. We start with some preliminary results. The first one provides the multidimensional Hölder inequality.
Lemma 10.3.2 (Hölder's inequality) (a) Let $p \in [1, \infty)$, $s < t$, and let $g : [s, t] \to \mathbb{R}^d$, $g(u) = (g_1(u), \dots, g_d(u))^T$ ($u \in [s, t]$), be a Borel measurable function such that $|g_i|^p$ is Lebesgue integrable over $[s, t]$ for $i = 1, \dots, d$. Then
\[
\Big\| \int_s^t g(u)\,du \Big\|^p \le (t - s)^{p-1} \int_s^t \|g(u)\|^p\,du.
\]
(b) Let $p \in [1, \infty)$ and $a_i \in \mathbb{R}^d$ for all $i = 1, \dots, r$. Then
\[
\Big\| \sum_{i=1}^{r} a_i \Big\|^p \le r^{p-1} \sum_{i=1}^{r} \|a_i\|^p.
\]
(6) For the proof of the results in this section we refer to Gelbrich (1995).
Lemma 10.3.3 (The multidimensional martingale inequalities) Let $p \in [2, \infty)$. Then there exist constants $C_p, A_p > 0$ such that the following assertions hold:
(a) Let $(w(t), \mathcal{F}(t))_{t \in [\alpha, \beta]}$ be a one-dimensional standard Wiener process over the probability space $(\Omega, \mathcal{A}, P)$. Then for every function $g = (g_1, \dots, g_d) : [\alpha, \beta] \times \Omega \to \mathbb{R}^d$ with the properties
(i) $g(\cdot, \omega)$ is square-integrable over $[\alpha, \beta]$ for almost all $\omega \in \Omega$,
(ii) $g(u) = g(u, \cdot)$ is $\mathcal{F}(u)$-measurable for all $u \in [\alpha, \beta]$,
we have
\[
E \sup_{\alpha \le s \le t} \Big\| \int_\alpha^s g(u)\,dw(u) \Big\|^p \le d^{p/2-1}\, C_p\, E \Big( \int_\alpha^t \|g(u)\|^2\,du \Big)^{p/2}
\]
for all $t \in [\alpha, \beta]$.
(b) Let $(M_s, \mathcal{F}_s)_{s = 0, \dots, r}$ be an $\mathbb{R}^d$-valued martingale (i.e., each component is a martingale), and let $p \in [2, \infty)$. Then with $\Delta M_s := M_s - M_{s-1}$ we have
\[
E \sup_{0 \le s \le r} \|M_s\|^p \le d^{p/2-1}\, A_p\, E \Big( \sum_{s=1}^{r} \|\Delta M_s\|^2 \Big)^{p/2}.
\]
Corollary 10.3.4 Let $p \in [2, \infty)$. Then there exist constants $C_p, A_p > 0$ such that
(a) under the assumptions of Lemma 10.3.3(a), for all $t \in [\alpha, \beta]$,
\[
E \sup_{\alpha \le s \le t} \Big\| \int_\alpha^s g(u)\,dw(u) \Big\|^p \le [d(\beta - \alpha)]^{p/2-1}\, C_p \int_\alpha^t E \|g(u)\|^p\,du,
\]
(b) under the assumptions of Lemma 10.3.3(b),
\[
E \max_{0 \le s \le r} \|M_s\|^p \le A_p\, (dr)^{p/2-1}\, E \sum_{s=1}^{r} \|\Delta M_s\|^p.
\]
Lemma 10.3.5 (Gronwall's lemma) (a) Let $f : [t_0, T] \to [0, \infty)$ be a continuous function and $c_1, c_2$ positive constants. If for all $t \in [t_0, T]$, $f(t) \le c_1 + c_2 \int_{t_0}^{t} f(s)\,ds$, then
\[
\sup_{t_0 \le t \le T} f(t) \le c_1 e^{c_2 (T - t_0)}.
\]
(b) Let $a_0, \dots, a_n$ and $c_1, c_2$ be nonnegative real numbers. If for all $k = 0, \dots, n$, $a_k \le c_1 + c_2 \frac{1}{n} \sum_{i=0}^{k-1} a_i$, then
\[
\max_{0 \le i \le n} a_i \le c_1 e^{c_2}.
\]
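For orientation, part (b) of Gronwall's lemma can be checked by a short induction. The following is only a sketch of one standard argument and is not taken from the source, which refers to Gelbrich (1995) for proofs.

```latex
% Claim: a_k \le c_1 (1 + c_2/n)^k for k = 0, ..., n.
% k = 0: the empty sum gives a_0 \le c_1.
% Induction step, using the geometric sum (the case c_2 = 0 is trivial):
\begin{align*}
a_k &\le c_1 + \frac{c_2}{n} \sum_{i=0}^{k-1} c_1 \Big(1 + \frac{c_2}{n}\Big)^{i}
     = c_1 + c_1 \Big[ \Big(1 + \frac{c_2}{n}\Big)^{k} - 1 \Big]
     = c_1 \Big(1 + \frac{c_2}{n}\Big)^{k}, \\
\max_{0 \le k \le n} a_k &\le c_1 \Big(1 + \frac{c_2}{n}\Big)^{n} \le c_1 e^{c_2},
\end{align*}
% where the last step uses 1 + x \le e^x.
```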
Based on Lemmas 10.3.2, 10.3.3, and 10.3.5 one gets the following convergence results for the time discretization step.
Theorem 10.3.6 Let $p \in [2, \infty)$. Then,
(a) (AS1) and (AS2) imply
\[
E \sup_{t_0 \le t \le T} \|x(t) - y^E(t)\|^p \le K \cdot h^{p/2},
\]
(b) (AS1), (AS2), and (AS3) imply
\[
E \sup_{t_0 \le t \le T} \|x(t) - y^M(t)\|^p \le K \cdot h^{p}.
\]
Next, the solutions in (E1) and (M1*), which behave like the Wiener process between two neighboring points $t_{k-1}$ and $t_k$ of the coarse subgrid of $G$, will be smoothed by linear interpolation with vertices in all grid points of $G$, that is, in all $u_i^k$. This is the content of Theorem 10.3.10. For its proof we need the following three lemmas.
Lemma 10.3.7 Let $v_i$, $i = 1, \dots, r$, be i.i.d. standard-normally distributed real-valued r.v.s. Then for all $p \in [0, \infty)$,
\[
E \max_{1 \le i \le r} |v_i|^p \le K (1 + \ln r)^{p/2}.
\]
Lemma 10.3.8 Let $(w(t))_{t \in [\tau_0, \infty)}$ be a one-dimensional standard Wiener process and $x$ a standard-normally distributed random variable. Then for $\tau_0 \le a < \bar a < \infty$, the random variables
\[
\frac{1}{\sqrt{\bar a - a}} \sup_{a \le t \le \bar a} (w(t) - w(a)) \quad \text{and} \quad |x|
\]
have the same distribution.
Lemma 10.3.9 Let $a_0 < a_1 < \cdots < a_r$ be a partition of $[a_0, a_r]$ with maximal step size $\Delta := \max_{0 \le i \le r-1} (a_{i+1} - a_i)$ and $(w(t))_{t \in [a_0, a_r]}$ a one-dimensional standard Wiener process. Then
\[
E \max_{0 \le i \le r-1} \ \sup_{a_i \le t \le a_{i+1}} |w(t) - w(a_i)|^p \le K \cdot \Delta^{p/2} (1 + \ln r)^{p/2}.
\]
Now, upper bounds for the $L^p$-norm of the differences between the approximate solutions in (E1) and (E2), and in (M1*) and (M2), respectively, can be obtained:
Theorem 10.3.10 Let $p \in [2, \infty)$. Then
(a) (AS1) and (AS2) imply
\[
E \sup_{t_0 \le t \le T} \|y^E(t) - \tilde y^E(t)\|^p \le K \Big( \frac{h}{m(h)} \Big)^{p/2} \Big( 1 + \ln \frac{m(h)}{h} \Big)^{p/2};
\]
(b) (AS1)–(AS4) imply
\[
E \sup_{t_0 \le t \le T} \|y^M(t) - \tilde y^M(t)\|^p \le K \Big( \frac{h}{m(h)} \Big)^{p/2} \Big( 1 + \ln \frac{m(h)}{h} \Big)^{p/2}.
\]
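Proof:(7) (a) First, consider the process $\bar y^E$ with $\bar y^E(t_0) = x_0$, $\bar y^E(u_i^k) = \tilde y^E(u_i^k)$, $\bar y^E(t) = \bar y^E(u_{i-1}^k)$ for $t \in [u_{i-1}^k, u_i^k)$ ($k = 0, \dots, n-1$; $i = 1, \dots, m_k$). Then, with Lemma 10.3.2(b), (AS1), Lemma 10.3.9, (G2), and (G3), we have
\[
\begin{aligned}
E \sup_{t_0 \le t \le T} \|y^E(t) - \bar y^E(t)\|^p
&\le K \Bigg\{ E \sup_{t_0 \le t \le T} \Big\| \int_{[t]^*_G}^{t} b(y^E([t]_G))\,ds \Big\|^p
 + \sum_{j=1}^{q} E \sup_{t_0 \le t \le T} \Big\| \int_{[t]^*_G}^{t} \sigma_j(y^E([t]_G))\,dw_j(s) \Big\|^p \Bigg\} \\
&\le K \Big( 1 + E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \Big) \Bigg\{ \Big( \max_{0 \le k \le n-1} \frac{h_k}{m_k} \Big)^{p}
 + \sum_{j=1}^{q} \Big( \max_{0 \le k \le n-1} \frac{h_k}{m_k} \Big)^{p/2} \big( 1 + \ln(n \cdot m(h)^{\alpha}) \big)^{p/2} \Bigg\} \\
&\le K \Big( 1 + E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \Big) \Big( \frac{h}{m(h)} \Big)^{p/2} (1 + \ln n + \ln m(h))^{p/2}.
\end{aligned}
\tag{10.3.1}
\]
Since we have by Minkowski's inequality that
\[
\Big( E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \Big)^{1/p} \le \Big( E \sup_{t_0 \le t \le T} \|x(t) - y^E(t)\|^p \Big)^{1/p} + \Big( E \sup_{t_0 \le t \le T} \|x(t)\|^p \Big)^{1/p},
\]
where the right-hand side is bounded because of Theorem 10.3.6, it holds that
\[
E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \le K. \tag{10.3.2}
\]
Hence, by (10.3.1) and (G1),
\[
E \sup_{t_0 \le t \le T} \|y^E(t) - \bar y^E(t)\|^p
\le K \Big( \frac{h}{m(h)} \Big)^{p/2} (1 + \ln n + \ln m(h))^{p/2}
\le K \Big( \frac{h}{m(h)} \Big)^{p/2} \Big( 1 + \ln \frac{m(h)}{h} \Big)^{p/2}. \tag{10.3.3}
\]
On the other hand,
\[
\begin{aligned}
E \sup_{t_0 \le t \le T} \|\bar y^E(t) - \tilde y^E(t)\|^p
&= E \max_{\substack{0 \le k \le n-1 \\ 0 \le i \le m_k - 1}} \ \sup_{u_i^k \le t \le u_{i+1}^k} \|\bar y^E(t) - \tilde y^E(t)\|^p \\
&= E \max_{\substack{0 \le k \le n-1 \\ 0 \le i \le m_k - 1}} \|y^E(u_{i+1}^k) - y^E(u_i^k)\|^p \\
&\le E \max_{\substack{0 \le k \le n-1 \\ 0 \le i \le m_k - 1}} \ \sup_{u_i^k \le t \le u_{i+1}^k} \|y^E(t) - y^E(u_i^k)\|^p \\
&= E \sup_{t_0 \le t \le T} \|y^E(t) - \bar y^E(t)\|^p.
\end{aligned}
\tag{10.3.4}
\]
Now, with (10.3.3) and (10.3.4) we have
\[
\begin{aligned}
E \sup_{t_0 \le t \le T} \|y^E(t) - \tilde y^E(t)\|^p
&\le K \Big( E \sup_{t_0 \le t \le T} \|y^E(t) - \bar y^E(t)\|^p + E \sup_{t_0 \le t \le T} \|\bar y^E(t) - \tilde y^E(t)\|^p \Big) \\
&\le K \Big( \frac{h}{m(h)} \Big)^{p/2} \Big( 1 + \ln \frac{m(h)}{h} \Big)^{p/2}.
\end{aligned}
\]
(7) We shall include only this proof, which is typical for the methods used in Gelbrich (1995).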
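(b) As in (a), we first consider the process $\bar y^M$ defined by $\bar y^M(t_0) = x_0$; $\bar y^M(u_i^k) = \tilde y^M(u_i^k)$; $\bar y^M(t) = \bar y^M(u_{i-1}^k)$ for $t \in [u_{i-1}^k, u_i^k)$ ($k = 0, \dots, n-1$; $i = 1, \dots, m_k$); and with $\tilde b = b - \frac{1}{2} \sum_{j=1}^{q} \sigma_j' \sigma_j$ and $\Delta_j w(u, v) := w_j(v) - w_j(u)$ ($j = 1, \dots, q$; $u, v \in [t_0, T]$) we have, using method (M1*),
\[
\begin{aligned}
E \sup_{t_0 \le t \le T} \|y^M(t) - \bar y^M(t)\|^p
&\le K \Bigg\{ E \sup_{t_0 \le t \le T} \Big\| \int_{[t]^*_G}^{t} \tilde b(y^M([t]_G))\,ds \Big\|^p
 + \sum_{j=1}^{q} E \sup_{t_0 \le t \le T} \Big\| \int_{[t]^*_G}^{t} \sigma_j(y^M([t]_G))\,dw_j(s) \Big\|^p \\
&\qquad + \sum_{i,j=1}^{q} E \sup_{t_0 \le t \le T} \Big\| (\sigma_i' \sigma_j)(y^M([t]_G)) \big[ \Delta_i w([t]_G, t)\, \Delta_j w([t]_G, t) - \Delta_i w([t]_G, [t]^*_G)\, \Delta_j w([t]_G, [t]^*_G) \big] \Big\|^p \Bigg\} \\
&\le K \Big( \frac{h}{m(h)} \Big)^{p/2} \Big( 1 + \ln \frac{m(h)}{h} \Big)^{p/2} \\
&\qquad + K \sum_{i,j=1}^{q} E \sup_{t_0 \le t \le T} \big| \Delta_i w([t]_G, t)\, \Delta_j w([t]_G, t) - \Delta_i w([t]_G, [t]^*_G)\, \Delta_j w([t]_G, [t]^*_G) \big|^p,
\end{aligned}
\tag{10.3.5}
\]
analogously to (10.3.1)–(10.3.3), but having used the inequalities $\|\tilde b(x)\| \le K(1 + \|x\|)$ ($x \in \mathbb{R}^d$) and $E \sup_{t_0 \le t \le T} \|y^M(t)\|^p \le K$.
We use the Cauchy–Schwarz inequality together with the relations
\[
\sup_{t_0 \le t \le T} |\Delta_j w([t]^*_G, t)|^{2p} = \max_{\substack{0 \le k \le n-1 \\ 0 \le i \le m_k - 1}} \ \sup_{u_i^k \le t \le u_{i+1}^k} |\Delta_j w(u_i^k, t)|^{2p},
\]
\[
\sup_{t_0 \le t \le T} |\Delta_j w([t]_G, [t]^*_G)|^{2p} \le \sup_{t_0 \le t \le T} |\Delta_j w([t]_G, t)|^{2p} = \max_{0 \le k \le n-1} \ \sup_{t_k \le t \le t_{k+1}} |\Delta_j w(t_k, t)|^{2p}.
\]
By Lemma 10.3.9 and (G3) we then obtain
\[
\begin{aligned}
&E \sup_{t_0 \le t \le T} \big| \Delta_i w([t]_G, t)\, \Delta_j w([t]_G, t) - \Delta_i w([t]_G, [t]^*_G)\, \Delta_j w([t]_G, [t]^*_G) \big|^p \\
&\quad \le K \Big\{ E \sup_{t_0 \le t \le T} \big| \Delta_i w([t]_G, t) \big[ \Delta_j w([t]_G, t) - \Delta_j w([t]_G, [t]^*_G) \big] \big|^p \\
&\qquad\quad + E \sup_{t_0 \le t \le T} \big| \big[ \Delta_i w([t]_G, t) - \Delta_i w([t]_G, [t]^*_G) \big] \Delta_j w([t]_G, [t]^*_G) \big|^p \Big\} \\
&\quad \le K \Big\{ E \Big[ \sup_{t_0 \le t \le T} |\Delta_i w([t]_G, t)|^p \sup_{t_0 \le t \le T} |\Delta_j w([t]^*_G, t)|^p \Big]
 + E \Big[ \sup_{t_0 \le t \le T} |\Delta_i w([t]^*_G, t)|^p \sup_{t_0 \le t \le T} |\Delta_j w([t]_G, [t]^*_G)|^p \Big] \Big\} \\
&\quad \le K \Big\{ \Big( E \sup_{t_0 \le t \le T} |\Delta_i w([t]_G, t)|^{2p} \Big)^{1/2} \Big( E \sup_{t_0 \le t \le T} |\Delta_j w([t]^*_G, t)|^{2p} \Big)^{1/2} \\
&\qquad\quad + \Big( E \sup_{t_0 \le t \le T} |\Delta_i w([t]^*_G, t)|^{2p} \Big)^{1/2} \Big( E \sup_{t_0 \le t \le T} |\Delta_j w([t]_G, [t]^*_G)|^{2p} \Big)^{1/2} \Big\} \\
&\quad \le K \Big\{ \Big( \max_{0 \le k \le n-1} h_k \Big)^{p/2} \Big( \max_{0 \le k \le n-1} \frac{h_k}{m_k} \Big)^{p/2}
 + \Big( \max_{0 \le k \le n-1} \frac{h_k}{m_k} \Big)^{p/2} \Big( \max_{0 \le k \le n-1} h_k \Big)^{p/2} \Big\} \\
&\qquad\quad \times (1 + \ln n + \ln m(h))^{p/2} (1 + \ln n)^{p/2} \\
&\quad \le K \cdot h^{p/2} \Big( \frac{h}{m(h)} \Big)^{p/2} (1 + \ln n + \ln m(h))^{p/2} (1 + \ln n)^{p/2} \\
&\quad \le K \cdot \Big( \frac{h}{m(h)} \Big)^{p/2} \Big( 1 + \ln \frac{m(h)}{h} \Big)^{p/2}.
\end{aligned}
\tag{10.3.6}
\]
Here the last step is based on (G1) and the boundedness of $h(1 + \ln n) \le h(1 + \ln(\Lambda / h))$. Now, (10.3.5) and (10.3.6) yield
\[
E \sup_{t_0 \le t \le T} \|y^M(t) - \bar y^M(t)\|^p \le K \cdot \Big( \frac{h}{m(h)} \Big)^{p/2} \Big( 1 + \ln \frac{m(h)}{h} \Big)^{p/2}. \tag{10.3.7}
\]
Analogously to (10.3.4), it follows that
\[
E \sup_{t_0 \le t \le T} \|\bar y^M(t) - \tilde y^M(t)\|^p \le E \sup_{t_0 \le t \le T} \|y^M(t) - \bar y^M(t)\|^p,
\]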
which, together with (10.3.7), gives us the estimate (b). □
In the last discretization step the Wiener process increments shall be replaced by i.i.d. r.v.s with a given distribution $\mu$ on $\mathbb{R}$. But the corresponding results hold only in the weak sense; i.e., the Wiener process (and its increments between the points of $G$) and i.i.d. r.v.s $\xi_{ji}^k$ can be defined on a common probability space such that the estimates hold.
Theorem 10.3.11 (Komlós, Major, Tusnády (1975, 1976)) Let $\mu \in \mathcal{P}(\mathbb{R})$ have the following properties:
\[
\int_{-\infty}^{\infty} x\,d\mu(x) = 0, \qquad \int_{-\infty}^{\infty} x^2\,d\mu(x) = 1, \qquad \int_{-\infty}^{\infty} e^{tx}\,d\mu(x) < \infty \ \text{ for all } t \text{ with } |t| \le \tau, \ \tau > 0. \tag{10.3.8}
\]
Then there exist positive constants $C, A, \lambda$, depending only on $\mu$, and for each natural number $s > 0$ two $s$-tuples $(x_1, \dots, x_s)$ and $(y_1, \dots, y_s)$, each consisting of i.i.d. real-valued r.v.s, with $L(x_1) = \mu$ and $y_1$ being standard-normally distributed, such that for each $a > 0$,
\[
P \Big( \max_{1 \le k \le s} \Big| \sum_{i=1}^{k} (x_i - y_i) \Big| > C \ln s + a \Big) < A e^{-\lambda a}.
\]
To translate this estimate into an estimate in terms of the distance used in the previous chapters, we need the following lemma.
Lemma 10.3.12 Assume that there exist constants $C, A, \lambda > 0$ with $\lambda C \ge 1$ and for any two natural numbers $r, s \ge 1$ an $r$-tuple $(\delta_{1,s}, \dots, \delta_{r,s})$ of i.i.d. positive real-valued r.v.s satisfying
\[
P(\delta_{1,s} > C \ln s + a) < A e^{-\lambda a} \quad \text{for all } a > 0. \tag{10.3.9}
\]
Then for each $p \in [0, \infty)$, there exists a constant $M_p > 0$ such that for all natural $r, s \ge 1$,
\[
E \max_{1 \le i \le r} \delta_{i,s}^{p} \le M_p (1 + \ln r + \ln s)^{p}.
\]
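To see where the logarithmic factor and the condition $\lambda C \ge 1$ enter, one possible argument for $p \ge 1$ runs as follows. It is a sketch added for orientation only and is not the proof from Gelbrich (1995); the abbreviations $X$ and $c$ are introduced here.

```latex
% Notation (introduced for this sketch): X := \max_{1 \le i \le r} \delta_{i,s},
% c := C \ln s + \lambda^{-1} \ln r.
% Union bound with a = \lambda^{-1} \ln r + b in (10.3.9), for b > 0:
\begin{align*}
P(X > c + b) &\le r \cdot A\, e^{-\lambda(\lambda^{-1} \ln r + b)} = A\, e^{-\lambda b}, \\
E\,(X - c)_+^{p} &= \int_0^\infty p\, b^{p-1}\, P(X - c > b)\, db
   \le A\, p \int_0^\infty b^{p-1} e^{-\lambda b}\, db = A\, \Gamma(p+1)\, \lambda^{-p}, \\
E\,X^{p} &\le 2^{p-1} \big( c^{p} + E\,(X - c)_+^{p} \big)
   \le 2^{p-1} \big( C^{p} (\ln r + \ln s)^{p} + A\, \Gamma(p+1)\, \lambda^{-p} \big)
   \le M_p\, (1 + \ln r + \ln s)^{p}.
\end{align*}
% The bound c \le C(\ln r + \ln s) uses \lambda^{-1} \le C, i.e. \lambda C \ge 1.
```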
The following well-known result (cf., for example, Shortt (1983), Rachev (1991)) is used in the proof of the next theorem.
Lemma 10.3.13 Let $S_1$, $S_2$, and $S_3$ be Polish spaces (i.e., topological spaces that are metrizable with a complete separable metric), and let $\pi_{12} : S_1 \times S_2 \times S_3 \to S_1 \times S_2$; $\pi_{23} : S_1 \times S_2 \times S_3 \to S_2 \times S_3$; $\pi_2^{12} : S_1 \times S_2 \to S_2$, and $\pi_2^{23} : S_2 \times S_3 \to S_2$ denote the projections defined by dropping one component. Then for any two measures $\nu_{12} \in \mathcal{P}(S_1 \times S_2)$ and $\nu_{23} \in \mathcal{P}(S_2 \times S_3)$ with $\nu_{12} \circ (\pi_2^{12})^{-1} = \nu_{23} \circ (\pi_2^{23})^{-1}$, i.e., with identical marginal distributions on $S_2$, there exists a measure $\nu_{123} \in \mathcal{P}(S_1 \times S_2 \times S_3)$ with $\nu_{123} \circ (\pi_{12})^{-1} = \nu_{12}$ and $\nu_{123} \circ (\pi_{23})^{-1} = \nu_{23}$.
Now we can prove the estimates for the chance discretization step:
Theorem 10.3.14 Let $p \in [2, \infty)$ and $\mu \in \mathcal{P}(\mathbb{R})$ satisfy (10.3.8). Then we can define a $q$-dimensional standard Wiener process $(w(t))_{t \in [t_0, T]}$ and a set of i.i.d. r.v.s $\{\xi_{ji}^k : j = 1, \dots, q; \ i = 1, \dots, m_k; \ k = 0, \dots, n-1\}$ with distribution $L(\xi_{11}^0) = \mu$ on a common probability space such that for the methods (E2), (E3), (M2), and (M3) constructed with them we have
(a) If (AS1) and (AS2) hold, then
\[
E \sup_{t_0 \le t \le T} \|\tilde y^E(t) - z^E(t)\|^p \le K \Big( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Big)^{p}.
\]
(b) If (AS1), (AS2), (AS3), and (AS4) hold, then
\[
E \sup_{t_0 \le t \le T} \|\tilde y^M(t) - z^M(t)\|^p \le K \Big( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Big)^{p}.
\]
The preceding results yield the following theorem, which gives bounds for the $L^p$-norm of the differences between the exact solution $x$ of (SE) and the approximate solutions $z^E$ and $z^M$ defined in (E3) and (M3). Again, as in Theorem 10.3.14, this is a result in the weak sense.
Theorem 10.3.15 Let $p \in [2, \infty)$ and $\mu \in \mathcal{P}(\mathbb{R})$ satisfy (10.3.8). Then we can define a $q$-dimensional standard Wiener process $(w(t))_{t \in [t_0, T]}$ and a set of i.i.d. r.v.s $\{\xi_{ji}^k : j = 1, \dots, q; \ i = 1, \dots, m_k; \ k = 0, \dots, n-1\}$ with distribution $L(\xi_{11}^0) = \mu$ on a common probability space such that for (SE) and the methods (E3) and (M3) constructed with them we have
(a) If (AS1) and (AS2) hold, then
\[
E \sup_{t_0 \le t \le T} \|x(t) - z^E(t)\|^p \le K \Bigg( h^{p/2} + \Big( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Big)^{p} \Bigg).
\]
(b) If (AS1), (AS2), (AS3), and (AS4) hold, then
\[
E \sup_{t_0 \le t \le T} \|x(t) - z^M(t)\|^p \le K \Bigg( h^{p} + \Big( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Big)^{p} \Bigg).
\]
To show that both assertions (a) and (b) follow from Theorems 10.3.6, 10.3.10, and 10.3.14, it suffices to verify that
\[
\frac{h}{m(h)} \Big( 1 + \ln \frac{m(h)}{h} \Big) \le K \Big( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Big)^{2},
\]
which follows easily from (G1). Since Theorem 10.3.15 provides results in the weak sense, it is appropriate to formulate it as an estimate for the $L^p$-Wasserstein metric $\ell_p$ between the distributions of the exact solution and the approximate solutions:
Corollary 10.3.16 Let $p \in [1, \infty)$ and $\mu \in \mathcal{P}(\mathbb{R})$ have the properties (10.3.8). Moreover, let $(w(t))_{t \in [t_0, T]}$ be a $q$-dimensional standard Wiener process and $\{\xi_{ji}^k : j = 1, \dots, q; \ i = 1, \dots, m_k; \ k = 0, \dots, n-1\}$ a set of i.i.d. r.v.s with distribution $L(\xi_{11}^0) = \mu$. Then for (SE) and the methods (E3) and (M3) constructed with them we have
(a) If (AS1) and (AS2) hold, then
\[
\ell_p(x, z^E) \le K \Big( h^{1/2} + \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Big).
\]
(b) If (AS1), (AS2), (AS3), and (AS4) hold, then
\[
\ell_p(x, z^M) \le K \Big( h + \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Big).
\]
For $p \in [2, \infty)$ the assertions follow directly from Theorem 10.3.15 and applying Lemma 10.3.3 to the right-hand sides. Then the assertions are also true for $p \in [1, 2)$, since $\ell_{p_1} \le \ell_{p_2}$ for $1 \le p_1 \le p_2 < \infty$. The estimates in Theorem 10.3.15 and Corollary 10.3.16 give convergence rates with respect to $h$ for the methods (E3) and (M3) and for any grid sequence in $\mathcal{G}(m, \Lambda, \alpha, \beta)$. These rates consist of two summands, one depending on $h$ and the other depending on $m(h)$, representing the rates of time and chance discretization, respectively. It is desirable to tune the rates of both summands, i.e., to equalize the powers of $h$ in both summands. This means choosing $m(h)$ to grow like $1/h$ for method (E3) and like $1/h^2$ for method (M3).
Corollary 10.3.17 Let $p \in [2, \infty)$ and $\mu \in \mathcal{P}(\mathbb{R})$ satisfy (10.3.8). Then we can construct solutions in (SE), (E3), and (M3) on a common probability space (as in Theorem 10.3.15) with the following properties.
(a) If (AS1) and (AS2) hold and $\max \big\{ \sup_{0 < s \le 1} s\,m(s), \ \sup_{0 < s \le 1} \frac{1}{s\,m(s)} \big\} \le K$, then
\[
E \sup_{t_0 \le t \le T} \|x(t) - z^E(t)\|^p \le K \cdot h^{p/2} (1 - \ln h)^{p}.
\]
(b) If (AS1), (AS2), (AS3), and (AS4) hold and $\max \big\{ \sup_{0 < s \le 1} s^2 m(s), \ \sup_{0 < s \le 1} \frac{1}{s^2 m(s)} \big\} \le K$, then
\[
E \sup_{t_0 \le t \le T} \|x(t) - z^M(t)\|^p \le K \cdot h^{p} (1 - \ln h)^{p}.
\]
Corollary 10.3.18 Under the general assumptions in Corollary 10.3.16 we have
(a) If (AS1) and (AS2) hold and $\max \big\{ \sup_{0 < s \le 1} s\,m(s), \ \sup_{0 < s \le 1} \frac{1}{s\,m(s)} \big\} \le K$, then
\[
\ell_p(x, z^E) \le K \cdot h^{1/2} (1 - \ln h).
\]
(b) If (AS1), (AS2), (AS3), and (AS4) hold and $\max \big\{ \sup_{0 < s \le 1} s^2 m(s), \ \sup_{0 < s \le 1} \frac{1}{s^2 m(s)} \big\} \le K$, then
\[
\ell_p(x, z^M) \le K \cdot h (1 - \ln h).
\]
In conclusion, suppose that a grid sequence in $\mathcal{G}(m, \Lambda, \alpha, \beta)$ with $h \to 0$ is given. Then using the metric $\ell_p$, we have under the assumptions of Corollary 10.3.18(a), for the method (E3), the convergence rate $O(h^{1/2}(1 - \ln h))$. This convergence is in terms of the maximal step sizes $h$ of the coarse subgrids. Similarly, we have the convergence rate $O\big( (\frac{h}{m(h)})^{1/4} (1 - \ln \frac{h}{m(h)}) \big)$ with respect to the maximal step sizes $\frac{h}{m(h)}$ of the whole fine grids. Finally, we have the convergence rate $O(N^{-1/4}(1 + \ln N))$ with respect to the number $N$ of all grid points of the whole fine grids. Analogously, under the assumptions of Corollary 10.3.18(b) we have, for the method (M3), the convergence rates $O(h(1 - \ln h))$, $O\big( (\frac{h}{m(h)})^{1/3} (1 - \ln \frac{h}{m(h)}) \big)$, and $O(N^{-1/3}(1 + \ln N))$.
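The effect of tuning $m(h)$ can be made concrete with a small Python sketch that evaluates the two summands of Corollary 10.3.16 with the constant $K$ set to 1. The function names and the specific choices $m(h) = \lceil 1/h \rceil$ and $m(h) = \lceil 1/h^2 \rceil$ are assumptions made for illustration; they are one way of satisfying the growth conditions of Corollary 10.3.18.

```python
import math


def chance_term(m):
    """Chance discretization summand (1 + ln m) / sqrt(m)."""
    return (1.0 + math.log(m)) / math.sqrt(m)


def e3_bound(h, m):
    """Right-hand side of Corollary 10.3.16(a) with K = 1."""
    return math.sqrt(h) + chance_term(m)


def m3_bound(h, m):
    """Right-hand side of Corollary 10.3.16(b) with K = 1."""
    return h + chance_term(m)


for h in (0.1, 0.01, 0.001):
    m_e = math.ceil(1.0 / h)          # m(h) growing like 1/h for (E3)
    m_m = math.ceil(1.0 / (h * h))    # m(h) growing like 1/h^2 for (M3)
    tuned_e = math.sqrt(h) * (1.0 - math.log(h))
    tuned_m = h * (1.0 - math.log(h))
    print(h, e3_bound(h, m_e), tuned_e, m3_bound(h, m_m), tuned_m)
```

With these choices both summands carry the same power of $h$, so the total bound stays within a constant factor of $h^{1/2}(1 - \ln h)$ for (E3) and of $h(1 - \ln h)$ for (M3).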
References
[1] T. Abdellaoui. Distances de deux lois dans les espaces de Banach. PhD thesis, Universit´e de Rouen, 1993. [2] T. Abdellaoui. D´etermination d’un couple optimal du probl`eme de Monge–Kantorovich. C.R. Acad. Sci. Paris I, 319:981–984, 1994. [3] T. Abdellaoui and H. Heinich. Sur la distance de deux lois dans le cas vectoriel. C.R. Acad. Sci. Paris I, 319:397–400, 1994. [4] M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions. Dover Publications, New York, 9th edition, 1970. [5] A. Acosta and G. Gine. Convergence of moments and related functionals in the general central limit theorem in Banach spaces. Z. Wahrscheinlichkeitstheorie Verw. Geb., 48(2):213–241, 1979. [6] N.I. Ahiezer. Classical Moment Problem and Related Questions of Analysis. GIFML, Moscow, 1961. [7] N.I. Ahiezer and M. Krein. Some Questions in the Theory of Moments. American Mathematical Society, Providence, 1962. [8] H. Akaike. Modern development of statistical methods. In P. Eykhoff, editor, Trends and Progress in System Identification, pages 169–184. Pergamon Press, 1981.
[9] D.J. Aldous. Exchangeability and related topics. Lecture Notes in Mathematics, 1117, 1985. [10] D.J. Aldous. Ultimate instability of exponential backoff protocol for acknowledgement-based transmission control of random access communication channels. IEEE Transactions on Information Theory, IT 33:219–223, 1987. [11] D.J. Aldous. Asymptotic fringe distribution for general families of random trees. Annals of Applied Probability, 1:228–266, 1991. [12] D.J. Aldous. The continuum random tree II: An overview. In M.T. Barlow and N.H. Bingham, editors, Stochastic Analysis, volume 167 of London Math. Soc. Lecture Notes Series, pages 23–70. Cambridge University Press, 1991. [13] D.J. Aldous and J.M. Steele. Introduction to the interface of probability and algorithms. Statistical Science, 8:3–9, 1993. [14] G.A. Anastassiou. Moments in Probability and Approximation Theory. Pitman, England, 1993. [15] G.A. Anastassiou and S.T. Rachev. Approximation of a random queue by means of deterministic queueing models. In C.K. Chui, L.L. Shumaker, and J.D. Ward, editors, Approximation Theory VI, volume 1, pages 9–11. Academic Press, 1989. [16] G.A. Anastassiou and S.T. Rachev. Moment problems and their applications to characterization of stochastic processes, queueing theory, and rounding problems. In Approximation Theory, volume 138, pages 1–77, New York, 1992. Proceedings of 6th S.E.A. Meeting, Marcel Dekker Inc. [17] G.A. Anastassiou and S.T. Rachev. Moment problems and their applications to the stability of queueing models. Computers and Mathematics with Applications, 24(8/9):229–246, 1992. [18] E.J. Anderson and P. Nash. Linear Programming in Infinite Dimensional Spaces. Theory and Applications. Wiley, New York, 1987. [19] E.J. Anderson and A.B. Philpott. An algorithm for a continuous version of the assignment problem. Lecture Notes in Economics and Mathematical Systems, 215:108–117, 1983. Semi-Infinite Programming and Applications (Austin, Texas, 1981). [20] E.J. 
Anderson and A.B. Philpott. Duality and an algorithm for a class of continuous transportation problems. Mathematics of Operations Research, 9:222–231, 1984.
[21] T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, New York, 1984. [22] W. Apitzsch, B. Fritzsche, and B. Kirstein. A Schur analysis approach to minimum distance problems. Linear Algebra and its Applications, 1990. [23] A. Araujo and E. Gin´e. The Central Limit Theorem for Real and Banach Valued Random Variables. Wiley, New York, 1980. [24] M.A. Arbeiter. Random recursive constructions of self-similar fractal measures. The non-compact case. Probability Theory and Related Fields, 88:497–520, 1991. [25] R.J. Aumann. Measurable utility and the measurable choice theorem. In Centre Nat. Recherche Sci., Paris, editor, La Decision, volume 171, pages 15–26. Actes Coll. Internat., Aix-en-Provence, 1967. [26] F. Aurenhammer, F. Hoffmann, and B. Arnov. Minkowski-type theorems and least-square partitioning. Reports of the Institute for Computer Science, 1992. Dept. of Mathematics, Freie Universit¨at Berlin. [27] J. Auslander. Generalized recurrences in dynamical systems. Contributions to differential equations, 3(1):65–74, 1964. [28] M.L. Balinski. Signature des points extrˆemes du polyhedre dual du probl`eme de transport. Comptes Rendus de l’Acad´emie des Sciences, Paris, 1983. [29] M.L. Balinski. The Hirsch conjecture for dual transportation polyhedra. Mathematics of Operations Research, 9:629–633, 1984. [30] M.L. Balinski. Signature methods for the assignment problem. Operations Research, 34:125–141, 1985. [31] M.L. Balinski. A complex (dual) simplex method for the assingment problem. Mathematical Programming Study, 34:125–141, 1986. [32] M.L. Balinski, B. Athanasopoulos, and S.T. Rachev. Some developments on the theory of rounding proportions. In Bulletin of thi ISI, 49th Session, volume 1, pages 71–72, Firenze, 1993. [33] M.L. Balinski and D. Gale. On the core of the assignment game. In Functional Analysis, Optimization, and Mathematical Economics: A Collection of Papers dedicated to the Memory of L.V. 
Kantorovich, pages 274–289, Oxford, 1990. Oxford University Press. [34] M.L. Balinski and S.T. Rachev. On Monge–Kantorovich problems. Preprint, 1989. SUNY at Stony Brook, Dept. of Applied Mathematics and Statistics.
[35] M.L. Balinski and S.T. Rachev. Rounding proportions: rules of rounding. Numer. Funct. Anal. Optimization, 14:475–501, 1993. [36] M.L. Balinski and S.T. Rachev. Rounding proportions: methods of rounding. Mathematical Scientist, 1997. [37] M.L. Balinski and A. Russakoff. Faces of dual transportation polyhedra. Mathematical Programming Study, 22:1–8, 1984. [38] M.L. Balinski and H.P. Young. Stability, coalitions and schisms in proportional representation systems. Americal Political Science Review, 72:848–858, 1978. [39] M.L. Balinski and H.P. Young. Fair Representation: Meeting the Ideal of One Man, One Vote. Yale University Press, New Haven, 1982. [40] A.A. Balkema, L. de Haan, and R. Karandikar. The maximum of n independent stochastic processes. Preprint, 1990. Erasmus University, Rotterdam. [41] D.P. Barbu and Th. Precupanu. Convexity and optimization in Banach spaces. Sijthoff/Nordhoff, 1978. [42] R.E. Barlow and F. Proschan. Statistical Theory of Reliability and Life Testing: Probability Models. Hold, Rinehart, and Winston, New York, 1975. [43] E.R. Barnes and A.J. Hoffman. Partitioning spectra and linear programming. In Proc. Silver Jubilee Conference on Combinations, Ontario, Canada, June 1982. Univ. Waterloo. [44] E.R. Barnes and A.J. Hoffman. On transportation problems with upper bounds on leading rectangles. SIAM Journal of Algebraic and Discrete Methods, 6:487–496, 1985. [45] M.F. Barnsley and J.H. Elton. A new class of of Markov processes for image encoding. Advances in Applied Probability, 20:14–32, 1988. [46] D.P. Baron and R.B. Myerson. Regulating a monopolist with unknown cost. Econometrica, 50:911–930, 1982. [47] S.K. Basu. On the rate of convergence to normality of sums of dependent random variables. Acta Math. Acad. Sci. Hungarica, 28:261– 265, 1976. [48] S.K. Basu and G. Simons. Moment spaces of IFR distributions, applications and related material. In P.K. Sen, editor, Contributions to Statistics: Essay in Honor of Norman L. 
Johnson, pages 27–46. North-Holland Publishing Company, 1983.
[49] J. Beirlant and S.T. Rachev. The problems in stability in insurance mathematics. Insurance: Mathematics and Economics, 6:179–188, 1987. [50] V. Bene˘s. The moment problem and its technical application. In Proc. 30th Int. Wissen. Kolloq., pages 11–14. TH Ilmenau, 1985. [51] V. Bene˘s. Moment Problem and Its Application. PhD thesis, Charles University, 1986. [52] V. Bene˘s. Extremal and optimal solutions in the transshipment problem. Comment. Math. Univ. Carolinae, 33:97–112, 1992. [53] V. Bene˘s. Extremal and Optimal Solutions of the Marginal and Transshipment Problem. PhD thesis, Dept. of Mathematics, FSI, Czech Technical University, Praha, Czech Republic, 1995. ˘ ep´ [54] V. Bene˘s and J. St˘ an. The support of extremal probability measure with given marginals. In M.L. Puri, P. Revesz, and W. Werzt, editors, Mathematical Statistics and Probability Theory, volume A of Proc. 6th Pannon Symp., pages 33–41. D. Reidel Publ. Comp., 1987. ˘ ep´ [55] V. Bene˘s and J. St˘ an. Extremal solutions in the marginal problem. In G. Dall’Aglio et al., editor, Advances in Probability Measures with Given Marginals, pages 189–206. Kluwer, Dordrecht, 1991. [56] V.Y. Bentkus, F. G¨otze, V. Paulauskas, and A. Rackauskas. The accuracy of Gaussian approximation in Banach spaces. University of Bielefeld, Preprint 90-100, 1990. [57] C. Berge. Th´eorie g´en´erale des jeux ` a n personnes, volume 138. Gauthier-Villars, Paris, 1957. M´emorial des science math´ematiques. [58] C. Berge and A. Ghouila-Houri. Programming, Games and Transportation Networks. Methnen, John Wiley and Sons, Inc., New York, 1965. [59] S. Bertino. Su di una sottoclasse della classe di Fr´echet. Statistica, 28:511–542, 1968. [60] S. Bertino. Sulla distanza tra distribuzioni. Pubbl. Ist. Calc. Prob. Univ. Roma, 1968. [61] D. Bertsekas and R. Gallager. Data Networks. Prentice-Hall, New Jersey, 1987. [62] N.P. Bhatia and G.P. Szeg¨o. Stability theory of dynamical systems. 
Number 161 in Dre Grundlehren der mathematischen Wissenschaften. Springer, 1970.
[63] R.M. Bhattacharya and R. Rango Rao. Normal Approximation and Asymptotoic Expansions. Wiley, 1976. [64] P.J. Bickel and D.A. Freedman. Some asymptotic theory for the bootstrap. Annals of Statistics, 9:1196–1217, 1981. [65] P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968. [66] P. Billingsley. Probability and Measure. Wiley, New York, 2nd edition, 1986. [67] D. Blackwell and L.E. Dubins. An extension of Skorohod’s almost sure representation theorem. Proc. Amer. Math. Soc., 89:691–692, 1983. [68] R.C. Blattberg and N.J. Genodes. A comparison of the stable and student distributions as statistical models for stock prices. J. Business, 47:244–280, 1974. [69] T. Bollerslev. A conditionally heteroscedastic time series model for speculative prices and rates of return. Review of Economic Studies, 69:542–547, 1987. [70] E. Bolthausen. Exact convergence rate in some martingale central limit theorems. Annals of Probability, 10:672–688, 1982. [71] A. Boness, A. Chen, and S. Jatusipitak. Investigations of nonstationary prices. J. Business, 48:518–537, 1979. [72] A.A. Borovkov. Asymptotic Methods in Queueing Theory. Wiley, New York, 1984. [73] A.A. Borovkov. On the ergodicity and stability of the sequence wn+1 = f (wn , zn ): applications to communication networks. Theory of Probability and its Applications, 33:595–611, 1988. [74] A. Brandt, P. Franken, and B. Lisek. Stationary Stochastic Models. Wiley, New York, 1990. [75] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure. Appl. Maths., XLIV:375–417, 1987. [76] G. Brown and B. Shubert. On random binary trees. Mathematics of Operations Research, 9:43–65, 1984. [77] R.A. Brualdi and J. Csima. Extremal plane stochastic matrices of dimension three. Journal of Linear Algebra and its Applications, 11:105–133, 1975.
[78] R.A. Brualdi and J. Csima. Stochastic patterns. J. Comb. Theory, 19:1–12, 1975. [79] Y.A. Brudnii. A multidimensional analog of a theorem of Whitney. USSR Math. Sbornik, 11:157–170, 1970. [80] R.E. Burkard, B. Klinz, and R. Rudolf. Perspectives of Monge properties in optimization. Bericht 2, 1994. Spezialforschungsbereich F 003, Karl-Franzens-Universit¨ at Graz & Technische Universit¨at Graz. [81] R.M. Burton and U. R¨osler. An L2 -convergence theorem for random affine mappings. Journal of Applied Probability, 32:183–192, 1995. [82] P.L. Butzer, L. Hahn, and M.Th. Roeckerath. Central limit theorem and weak law of large numbers with rates for martingales in Banach spaces. Journal of Multivariate Analysis, 13:287–301, 1983. [83] S. Cambanis and G. Simons. Probability and expectation inequalities. Z. Wahrscheinlichkeitstheorie Verw. Geb., 59:285–294, 1982. [84] S. Cambanis, G. Simons, and W. Stout. Inequalities for Ek(X, Y ) when the marginals are fixed. Z. Wahrscheinlichkeitstheorie Verw. Geb., 36:285–294, 1976. [85] L. Cavalli-Sforza. Cultural and biological evolution: a theoretical inquirey. In S.G. Ghurye, editor, Proceedings of the Conference on Directions for Mathematical Statistics, volume 7 of Suppl. Adv. Appl. Prob., pages 90–99, 1975. [86] L. Cavalli-Sforza and M.W. Feldman. Models for cultural inheritance I. Group mean and within group variation. Theoret. Popn. Biol., 4:42–55, 1973. [87] S. Chandrasekhar and G. Munch. The theory of the fluctuations in brightness of the milky way. I and II. Astrophys. J., 112:380–398, 1950. [88] M.R. Chernick, D.J. Daley, and R.P. Littlejohn. A time-revisibility relationship between two markov chains with exponential stationary distributions. Journal of Applied Probability, 25:418–422, 1988. [89] G. Choquet. Forme abstraite du th´eor`eme de capacitabilit´e. Ann. Inst. Fourier, 9:83–89, 1959. [90] Y.S. Chow and H. Teicher. Probability Theory: Independeance, interchangeability, martingales. Springer, New York, 1978. 
[91] F.H. Clark. Optimization and nonsmooth analysis. Classics in Appl. Math. SIAM, 1990.
[92] J.M.C. Clark and R.J. Cameron. The maximum rate of convergence of discrete approximations for stochastic differential equations. Lecture Notes in Control and Information Science, 25:162–171, 1980.
[93] P.K. Clark. A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41:135–155, 1973.
[94] M. Cramer. Stochastische Analyse rekursiver Algorithmen mit idealen Metriken. PhD thesis, Universität Freiburg, 1995a.
[95] M. Cramer. Convergence of a branching type recursion with nonstationary immigration. Metrika, 1995b. To appear.
[96] M. Cramer. A note concerning the limit distribution of the Quicksort algorithm. Informatique Théorique et Appl., 30:195–207, 1996.
[97] M. Cramer and L. Rüschendorf. Analysis of recursive algorithms by the contraction method. Lecture Notes in Statistics, 114:18–33, 1996a.
[98] M. Cramer and L. Rüschendorf. Convergence of a branching type recursion. Annales de l’Institut Henri Poincaré, 32:725–741, 1996b.
[99] J. Csima. Multidimensional stochastic matrices and patterns. J. Algebra, 14:194–202, 1970.
[100] J.A. Cuesta-Albertos and C. Matrán. Strong convergence of weighted sums of random elements through the equivalence of sequences of distributions. Journal of Multivariate Analysis, 25:311–322, 1988.
[101] J.A. Cuesta-Albertos and C. Matrán. Notes on the Wasserstein metric in Hilbert spaces. Annals of Probability, 17:1264–1276, 1989.
[102] J.A. Cuesta-Albertos and C. Matrán. Skorohod representation theorem and Wasserstein metrics. Preprint, 1991.
[103] J.A. Cuesta-Albertos and C. Matrán. A review on strong convergence of weighted sums of random elements based on Wasserstein metrics. Journal of Stat. Planning Infer., 30:359–370, 1992.
[104] J.A. Cuesta-Albertos and C. Matrán. Stochastic convergence through Skorohod representation theorems and Wasserstein metrics. Suppl. Rendic. Circolo Matem. Palermo II, 35:89–113, 1994.
[105] J.A. Cuesta-Albertos, C. Matrán, S.T. Rachev, and L. Rüschendorf. Mass transportation problems in probability theory. Mathematical Scientist, 21:37–72, 1996.
[106] J.A. Cuesta-Albertos, L. Rüschendorf, and A. Tuero-Diaz. Optimal coupling of multivariate distributions and stochastic processes. Journal of Multivariate Analysis, 46:335–361, 1993.
[107] J.A. Cuesta-Albertos and A. Tuero-Diaz. A characterization for the solution of the Monge–Kantorovich mass transference problem. Statist. Probab. Letters, 16:147–152, 1993.
[108] G. Dall’Aglio. Sugli estremi dei momenti delle funzioni di ripartizione doppie. Ann. Scuola Normale Superiore Di Pisa, Cl. Sci., 3(1):33–74, 1956.
[109] G. Dall’Aglio. Sulla compatibilità delle funzioni di ripartizione doppia. Rendiconti di Math., 18:385–413, 1959.
[110] G. Dall’Aglio. Les fonctions extrêmes de la classe de Fréchet à 3 dimensions. Publ. Inst. Stat. Univ. Paris, IX:175–188, 1960.
[111] G. Dall’Aglio. Sulle distribuzioni doppie con margini assegnati soggette a delle limitazioni. It. Giorn. Ist. Ital. Attuari, 94, 1961.
[112] G. Dall’Aglio. Fréchet classes and compatibility of distribution functions. Symposia Mathematica, 9:131–150, 1972.
[113] G. Dall’Aglio, S. Kotz, and G. Salinetti. Advances in Probability Distributions with Given Marginals. Kluwer, Dordrecht, 1991.
[114] G.B. Dantzig and A.R. Ferguson. The allocation of aircraft to routes—an example of linear programming under uncertain demands. Mang. Science, 3:45–73, 1956.
[115] A. D’Aristotile, P. Diaconis, and D. Freedman. On a merging of probabilities. No. 301, 1988. Dept. of Statistics, Stanford University.
[116] M.M. Day. Normed Linear Spaces. Springer, Berlin–Göttingen–Heidelberg, 1958.
[117] A. de Acosta. Invariance principles in probability for triangle arrays of B-valued random vectors and some applications. Annals of Probability, 10:346–373, 1982.
[118] L. de Haan, E. Omey, and S.I. Resnick. Domains of attraction and regular variation in R^d. Journal of Multivariate Analysis, 14:17–33, 1984.
[119] L. de Haan and S.T. Rachev. Estimates of the rate of convergence for max-stable processes. Annals of Probability, 17:651–677, 1989.
[120] L. de Haan and S.I. Resnick. Limit theory for multivariate sample extremes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 40:317–337, 1977.
[121] L. de Haan, S.I. Resnick, H. Rootzen, and C.G. Vries. Extremal behavior of solutions to a stochastic difference equation with applications to ARCH process. Stoch. Processes and Applications, 32:213–224, 1989.
[122] P. de Jong. Central limit theorems for generalized multilinear forms. CWI Tract, Amsterdam, 61, 1989.
[123] G. Debreu. Representation of a preference ordering by a numerical function. In Decision Processes, pages 159–165. Wiley, New York, 1954.
[124] G. Debreu. Continuity properties of paretian utility. Intern. Econ. Revue, 5:285–293, 1964.
[125] P. Deheuvels and D. Pfeifer. On a relationship between Uspensky’s theorem and Poisson approximation. Ann. Inst. Statist. Math., 40:671–681, 1988.
[126] C. Dellacherie and P.A. Meyer. Probabilités et potentiel, volume 29 of North-Holland Mathematics Studies. Hermann, Paris, 1983. Chapitres IX à XI.
[127] U. Derigs, O. Goecke, and R. Schrader. Monge sequences and a simple assignment algorithm. Discrete Applied Mathematics, 15:241–248, 1986.
[128] L. Devroye. Lecture Notes on Bucket Algorithms, volume 6. Birkhäuser, Boston, 1986. Progress in computer science.
[129] L. Devroye. A Course in Density Estimation, volume 14 of Progress in probability and statistics. Birkhäuser, Boston, 1987.
[130] P. Diaconis and D. Freedman. On rounding percentages. Journal of the American Statistical Association, 74:359–364, 1979.
[131] P. Diaconis and D. Freedman. A dozen de Finetti-style results in search of a theory. Annales de l’Institut Henri Poincaré, 23:397–423, 1987.
[132] H. Dietrich. Zur c-Konvexität und c-Subdifferenzierbarkeit von Funktionalen. Optimization, 19:355–371, 1988.
[133] N. Dinculeanu. Vector Measures, volume 95 of International series of monographs on pure and applied mathematics. Pergamon Press, Oxford, 1967.
[134] R.L. Dobrushin. Prescribing a system of random variables by conditional distributions. Theory of Probability and its Applications, 15:458–486, 1970.
[135] R.L. Dobrushin. Vlasov equations. Func. Anal. Appl., 13:115–123, 1979.
[136] I. Domowitz and C.S. Hakkio. Conditional variance and the risk premium in the foreign exchange market. Journal of Internat. Economics, 19:47–66, 1985.
[137] H. Doss. Liens entre équations différentielles stochastiques et ordinaires. Annales de l’Institut Henri Poincaré, XIII:99–125, 1977.
[138] R.G. Douglas. On extremal measures and subspace density. Michigan Math. J., 11:243–246, 1964.
[139] D.C. Dowson and B.V. Landau. The Fréchet distance between multivariate normal distributions. Journal of Multivariate Analysis, 12:450–455, 1982.
[140] A.Y. Dubovitskii and A.A. Milyutin. Necessary Conditions for a Weak Extremum in the General Problems of Optimal Management. Nauka, Moscow, 1971. In Russian.
[141] R.M. Dudley. Convergence of Baire measures. Studia Mathematica, 27:251–268, 1966.
[142] R.M. Dudley. Distances of probability measures and random variables. Annals of Mathematical Statistics, 39:1563–1572, 1968.
[143] R.M. Dudley. The speed of mean Glivenko–Cantelli convergence. Annals of Mathematical Statistics, 40:40–50, 1969.
[144] R.M. Dudley. Speeds of metric probability convergence. Z. Wahrscheinlichkeitstheorie Verw. Geb., 22:323–332, 1972.
[145] R.M. Dudley. Probability and metrics. Convergence of laws on metric spaces, with a view to statistical testing. Aarhus Univ. Lect. Notes, 45, 1976.
[146] R.M. Dudley. Real Analysis and Probability. Wadsworth & Brooks/Cole, Pacific Grove, California, 1989.
[147] D. Duffie. Dynamic Asset Pricing Theory. Princeton University Press, Princeton, 1992.
[148] N. Dunford and J. Schwartz. Linear Operators. General Theory, volume Part I. Wiley-Interscience Publication, New York, 1958.
[149] R. Durrett and M. Liggett. Fixed points of the smoothing transformation. Z. Wahrscheinlichkeitstheorie Verw. Geb., 64:275–301, 1983.
[150] A. Dvoretzky. Asymptotic normality for sums of dependent random variables. Proc. Berkeley Symp. II, pages 513–535, 1970.
[151] D.A. Edwards. On the existence of probability measures with given marginals. Ann. Inst. Fourier, 28:53–78, 1978.
[152] I. Ekeland and R. Temam. Convex analysis and variational problems. North Holland, 1976.
[153] K.H. Elster and R. Nehse. Zur Theorie der Polarfunktionale. Optimization, 5:3–21, 1974.
[154] R.F. Engle, D.M. Lilien, and R.P. Robins. Estimating time varying risk premia in the term structure: the ARCH model. Econometrica, 55:391–407, 1987.
[155] Y. Ermoljev, A. Gaivoronski, and C. Nedeva. Stochastic optimization problem with incomplete information on distribution functions. Report WP-83-113, 1983.
[156] I.V. Evstigneev. Measurable choice theorems and probabilistic control models. Dokl. Akad. Nauk USSR, 283(5):1065–1068, 1985.
[157] G. Fayolle, P. Flajolet, and M. Hofri. On a functional equation arising in the analysis of a protocol for a multi-access broadcast channel. Advances in Applied Probability, 18:441–472, 1986.
[158] G. Fayolle, P. Flajolet, M. Hofri, and P. Jacquet. Analysis of a stack algorithm for random multiple-access communication. IEEE Transactions on Information Theory, 31:244–254, 1985.
[159] M.W. Feldman, S.T. Rachev, and L. Rüschendorf. Limit theorems for recursive algorithms. Journal of Computational and Applied Mathematics, 56:69–182, 1994.
[160] W. Feller. An Introduction to Probability Theory and Its Applications, volume II. Wiley, New York, 2nd edition, 1971.
[161] R. Ferland and G. Giroux. Cutoff-type Boltzmann equations: Convergence of the solution. Adv. Appl. Math., 8:98–107, 1987.
[162] R. Ferland and G. Giroux. Le modèle Bose–Einstein de l’équation non linéaire de Boltzmann: Convergence vers l’équilibre. Ann. Sc. Math. Québec, 15:23–33, 1991.
[163] X. Fernique. Sur le théorème de Kantorovich–Rubinstein dans les espaces polonais. Lecture Notes in Mathematics, 850:6–10, 1981.
[164] P.C. Fishburn, J.C. Lagarias, J.A. Reeds, and L.A. Shepp. Sets uniquely determined by projections on axes. I. Continuous case. SIAM Journal on Applied Mathematics, 50:288–306, 1990.
[165] A.T. Fomenko and S.T. Rachev. Volume functions on historical (narrative) texts and the amplitude correlation principle. Computers and Humanities, 24(3):187–206, 1990.
[166] P.R. Fortet and B. Mourier. Convergence de la répartition empirique vers la répartition théorique. Ann. Sci. Ecole Norm. Sup., 70(3):267–285, 1953.
[167] M.J. Frank. Operations arising from copulas. In Symp. Probab. Measures with Given Marginals, volume 67 of Math. Appl., pages 75–93, Rome, 1991.
[168] M.J. Frank, R.B. Nelsen, and B. Schweizer. Best possible bounds for the distribution of a sum — a problem of Kolmogorov. Probability Theory and Related Fields, 74:199–211, 1987.
[169] M. Fréchet. Sur les tableaux de corrélation dont les marges sont données. Ann. Univ. de Lyon, Sciences, 14:53–77, 1951.
[170] M. Fréchet. Les tableaux de corrélation dont les marges sont données. Ann. Univ. de Lyon, Sciences, 20:13–31, 1957.
[171] M. Fréchet. Sur la distance de deux lois de probabilité. C.R. Acad. Sci. Paris, 244:689–692, 1957.
[172] M. Fréchet. Sur les tableaux de corrélation dont les marges et des bornes sont données. Revue Inst. Int. de Statistique, 28:10–32, 1960.
[173] N. Gaffke and L. Rüschendorf. On a class of extremal problems in statistics. Math. Operationsforschung Statist., 12:123–135, 1981.
[174] N. Gaffke and L. Rüschendorf. On the existence of probability measures with given marginals. Statistics & Decisions, 2:163–174, 1984.
[175] D. Gale. Theory of Linear Economic Models. McGraw-Hill, New York, 1960.
[176] D. Gale and A. Mas-Colell. An equilibrium existence theorem for a general model without ordered preferences. Journal of Mathematical Economics, 2:9–15, 1975.
[177] W. Gangbo and R.J. McCann. Optimal maps in Monge’s mass transport problem. CRAS, Ser. I, 321:1653–1658, 1995.
[178] W. Gangbo and R.J. McCann. The geometry of optimal transformations. Preprint, 1996.
[179] M. Gelbrich. On a formula for the L^p Wasserstein metric between measures on Euclidean and Hilbert spaces. Preprint 179, 1988. Sektion Mathematik der Humboldt-Universität zu Berlin.
[180] M. Gelbrich. L^p-Wasserstein-Metriken und Approximationen stochastischer Differentialgleichungen. Dissertation A, Humboldt-Universität zu Berlin, Sektion Mathematik, 1989.
[181] M. Gelbrich. On a formula for the L^2-Wasserstein metric between measures on Euclidean and Hilbert spaces. Math. Nachr., 147:185–203, 1990.
[182] M. Gelbrich. Simultaneous time and chance discretization for stochastic differential equations. Journal of Computational and Applied Mathematics, 58:255–289, 1995.
[183] M. Gelbrich and S.T. Rachev. Discretization for stochastic differential equations, L^2-Wasserstein metrics, and econometric models. In Distributions with Given Marginals. IMS Proc., 1996. To appear.
[184] I. Gelfand, D. Raikov, and G. Shilov. Kommutative normierte Algebren. VEB Deutscher Verlag der Wissenschaften, 1964.
[185] C. Genest. A survey of the statistical properties and applications of Archimedean copulas, 1990. Technical Report.
[186] H. Gerber. An Introduction to Mathematical Risk Theory. Huebner Foundation Monograph, 1981.
[187] I.I. Gikhman and A.W. Skorokhod. Introduction to the theory of stochastic processes. Nauka, Moscow, 1977. In Russian.
[188] C. Gini. Di una misura delle relazioni tra le graduatorie di due caratteri. Appendix to: A. Hancini. L’Elezioni Generali Politiche del 1913 nel comune di Roma, Ludovic, Cecehini, 1914.
[189] C. Gini. La dissomiglianza. Metron, 24:309–331, 1965.
[190] C.R. Givens and R.M. Shortt. A class of Wasserstein metrics for probability distributions. Michigan Math. J., 31:231–240, 1984.
[191] D. Goldfarb. Efficient dual simplex algorithms for the assignment problem. Preprint, 1985.
[192] C.M. Goldie. Implicit renewal theory and tails of solutions of random equations. Annals of Applied Probability, 1:126–166, 1991.
[193] C. Graham. McKean–Vlasov Itô–Skorohod equations and nonlinear diffusions with discrete jump sets. Stoch. Proc. Appl., 40:69–82, 1992.
[194] C. Graham. Nonlinear diffusions with jumps. Preprint, 1992.
[195] R.M. Gray, D.L. Neuhoff, and R.L. Dobrushin. Block synchronization, sliding-block coding, invulnerable sources and zero error codes for discrete noisy channels. Annals of Probability, 8:315–328, 1980.
[196] R.M. Gray, D.L. Neuhoff, and P.C. Shields. A generalization to Ornstein’s d-distance with applications to information theory. Annals of Probability, 3:315–328, 1975.
[197] R.M. Gray and D.S. Ornstein. Block coding for discrete stationary d-continuous channels. IEEE Transactions on Information Theory, 25:292–306, 1979.
[198] N.E. Gretsky, J.M. Ostroy, and W.R. Zame. The nonatomic assignment model. Journal of Economic Theory, 2:103–128, 1992.
[199] N.V. Grigorevski and I.S. Shiganov. On some modifications of Dudley’s metric. Zap. Nauchnich Sem. LOMI, 61:17–24, 1976.
[200] F.A. Grünbaum. Propagation of chaos for the Boltzmann equation. Arch. Rational Mech. Anal., 42:323–345, 1971.
[201] P. Gudynas. Approximation by distributions of sums of conditionally independent random variables. Litovski Mat. Sbornik, 24:68–80, 1985.
[202] Y. Guivarch. Sur une extension de la notion de loi semi-stable. Annales de l’Institut Henri Poincaré, 26:261–286, 1990.
[203] W. Gutjahr and G.Ch. Pflug. The asymptotic contour process of a binary tree is a Brownian excursion. Stoch. Processes and Applications, 41:69–89, 1992.
[204] S. Gutmann, J.H.B. Kemperman, and J.A. Reeds. Existence of probability measures with given marginals. Annals of Probability, 19:1781–1791, 1991.
[205] S. Gutmann, J.H.B. Kemperman, J.A. Reeds, and L.A. Shepp. Existence of probability measures with given marginals. Annals of Probability, 19:1781–1791, 1991.
[206] D.L. Guy. Common extension of finitely additive probability measures. Portugalia Math., 20:1–5, 1961.
[207] M.G. Hahn, W.N. Hudson, and J.A. Veeh. Operator stable laws: series representations and domains of normal attraction. Journal of Multivariate Analysis, 10:26–37, 1989.
[208] P. Hall. Personal communication, 1985.
[209] J.P. Hammond. Straightforward individual incentive compatibility in large economies. Review of Economic Studies, 46:263–282, 1979.
[210] W.K.K. Haneveld. Duality in Stochastic Linear and Dynamic Programming. Centrum voor Wiskunde en Informatica, Amsterdam, 1985.
[211] L.G. Hanin. Kantorovich–Rubinstein duality for Lipschitz spaces defined by differences of arbitrary order. Soviet Math. Doklady, 42(1):220–224, 1991.
[212] L.G. Hanin and S.T. Rachev. An extension of the Kantorovich–Rubinstein mass transportation problem, 1991. Dept. of Statistics and Applied Probability, University of California, Santa Barbara.
[213] L.G. Hanin and S.T. Rachev. Mass transshipment problems and ideal metrics. Journal of Computational and Applied Mathematics, 56:183–196, 1994.
[214] L.G. Hanin and S.T. Rachev. An extension of the Kantorovich–Rubinstein mass transshipment problem. Numer. Funct. Anal. Optimization, 16:701–735, 1995.
[215] G. Hansel and J.P. Troallic. Mesures marginales et théorème de Ford–Fulkerson. Z. Wahrscheinlichkeitstheorie Verw. Geb., 43:245–251, 1978.
[216] G. Hansel and J.P. Troallic. Sur le problème des marges. Probability Theory and Related Fields, 71:357–366, 1986.
[217] F. Hausdorff. Set Theory. Chelsea Publishing Company, New York, 1957.
[218] E. Häussler. On the rate of convergence in the central limit theorem for martingales with discrete and continuous time. Annals of Probability, 16:275–299, 1988.
[219] H. Heinich and J.C. Lootgieter. Convergence des fonctions monotones. Preprint, 1993.
[220] I.S. Helland and T.S. Nilsen. On a general random exchange model. Journal of Applied Probability, 13:781–790, 1976.
[221] P.L. Hennequin and A. Tortrat. Probability Theory and Some of Its Applications. Nauka, Moscow, 1974. Russian translation.
[222] W. Hildenbrand. On economies with many agents. Journal of Economic Theory, 2:161–168, 1970.
[223] C. Hipp and R. Michel. Risikotheorie: Stochastische Modelle und Statistische Methoden. DGVM, 24, 1990.
[224] W. Hoeffding. Maßstabinvariante Korrelationstheorie. Schriften des Mathematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin, 5:181–233, 1940.
[225] W. Hoeffding. The extrema of the expected value of a function of independent random variables. Annals of Mathematical Statistics, 26:268–275, 1955.
[226] W. Hoeffding and S.S. Shrikhande. Bounds for the distribution function of a sum of independent, identically distributed random variables. Annals of Mathematical Statistics, 27:439–449, 1956.
[227] A.J. Hoffman. On simple linear programming problems. Convexity. In Proceedings of Symposia in Pure Mathematics, volume 7, pages 317–327, Providence, R.I, 1961.
[228] A.J. Hoffman. On simple linear programming problems. In V. Klee, editor, Convexity, volume 7, pages 317–327, Providence, R.I, 1963. Proc. Symp. Pure Math.
[229] A.J. Hoffman and A.F. Veinott jr. Staircase transportation problems with hyperadditive rewards and cumulative capacities. Preprint, 1990. IBM T.J. Watson Research Center, Yorktown Heights, New York, 10598.
[230] J. Hoffmann-Jörgensen. Probability in Banach space. Lecture Notes in Mathematics, 598:2–186, 1977.
[231] M. Hofri. Probabilistic Analysis of Algorithms. Springer, New York, 1987.
[232] R. Holley and M. Liggett. Generalized potlatch and smoothing processes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 55:165–195, 1981.
[233] G. Hooghiemstra and M. Keane. Calculation of the equilibrium distribution for a solar energy storage model. Journal of Applied Probability, 22:852–864, 1985.
[234] G. Hooghiemstra and C.L. Scheffer. Some limit theorems for an energy storage model. Stoch. Processes and Applications, 22:121–127, 1986.
[235] J. Horowitz and R.L. Karandikar. Martingale problems associated with the Boltzmann equation. In E. Çinlar et al., editor, Seminar on Stochastic Processes 1989, Boston, 1990. Birkhäuser.
[236] J. Horowitz and R.L. Karandikar. Mean rates of convergence of empirical measures in the Wasserstein metric. Journal of Computational and Applied Mathematics, 55:261–273, 1994.
[237] D.A. Hsieh. The statistical properties of daily foreign exchange rates: 1974–1983. Journal of Internat. Economics, 24:129–145, 1988.
[238] P.J. Huber. Robust Statistics. Wiley, New York, 1981.
[239] W.N. Hudson. Operator-stable distributions and stable marginals. Journal of Multivariate Analysis, 10:26–37, 1980.
[240] W.N. Hudson, Z.J. Jurek, and J.A. Veeh. The symmetry group and exponents of operator stable probability measures. Annals of Probability, 14:1014–1023, 1986.
[241] W.N. Hudson and J.D. Mason. Operator-stable laws. Journal of Multivariate Analysis, 11:434–447, 1981.
[242] W.N. Hudson, J.A. Veeh, and D.C. Weiner. Moments of distributions attracted to operator-stable laws. Journal of Multivariate Analysis, 24:1–10, 1988.
[243] J.E. Hutchinson. Fractals and selfsimilarity. Indiana Univ. Math. Journal, 30:713–747, 1981.
[244] Z. Ignatov and S.T. Rachev. Minimality of ideal probabilistic metrics. J. Soviet Math., 32(6):595–608, 1986.
[245] N. Ikeda and S. Watanabe. Stochastic Differential Equations and Diffusion Processes. North-Holland, Amsterdam, 1981.
[246] A.D. Ioffe and V.M. Tihomirov. Theorie der Extremalaufgaben. VEB Deutscher Verlag der Wissenschaften, Berlin, 1979.
[247] K. Isii. Inequalities of the type of Chebychev and Cramér–Rao and mathematical programming. Ann. Inst. Statist. Math., 16:247–270, 1964.
[248] E.H. Ivanov and R. Nehse. Relations between generalized concepts of convexity and conjugacy. Math. Operationsforschung Statist., 13:9–18, 1982.
[249] K. Jacobs. Measure and Integral. Academic Press, New York, 1987.
[250] J. Jacod. Calcul stochastique et problème de martingales. Lecture Notes in Mathematics, 714, 1979.
[251] P. Jacquet and M. Regnier. Normal limiting distribution of the size of tries. In P.J. Courtois and G. Latouche, editors, Proc. Performance 87, pages 209–223, Amsterdam, 1988. Elsevier Science Publications B.V. (North Holland).
[252] R. Janssen. Discretization of the Wiener-Process in Difference-Methods for stochastic differential equations. Stoch. Processes and Applications, 18:361–369, 1984.
[253] M. Jirina and J. Nedoma. Minimax solution of a sampling inventory process. Aplikace matematiky, 1:296–314, 1957. In Czech.
[254] R. Jirousek. A survey of methods used in probabilistic expert systems for knowledge integration. Knowledge Based Systems, 3:7–12, 1990.
[255] R. Jirousek. Solution of the marginal problem and decomposable distributions. Kybernetika, 27(5):403–412, 1991.
[256] H. Johnen and K. Scherer. On the equivalence of K-functional and moduli of continuity and some applications. Lecture Notes in Mathematics, 571:119–130, 1977.
[257] J.P. Kahane and J. Peyrière. Sur certaines martingales de Benoit Mandelbrot. Adv. Math., 22:131–145, 1976.
[258] A.V. Kakosjan, K. Klebanov, and S.T. Rachev. Quantitative Criteria for Convergence of Probability Measures. Ayastan Press, Erevan, 1988. (In Russian, Engl. transl.: Springer-Verlag, To appear).
[259] A.V. Kakosjan and L.B. Klebanov. On estimates of the closeness of distributions in terms of characteristic functions. Theory of Probability and its Applications, 29:852–853, 1984.
[260] V.V. Kalashnikov and S.T. Rachev. Characterization problems in queueing theory and their stability. Advances in Applied Probability, 17:320–348, 1985.
[261] V.V. Kalashnikov and S.T. Rachev. Characterization of inverse problems in queueing and their stability, 1986.
[262] V.V. Kalashnikov and S.T. Rachev. Mathematical Methods for Construction of Stochastic Queueing Models. Wadsworth & Brooks/Cole, California, 1990.
[263] T. Kamae, U. Krengel, and G.I. O’Brien. Stochastic inequalities on partially ordered spaces. Annals of Probability, 5:899–912, 1977.
[264] S. Kanagawa. The rate of convergence for approximate solutions of stochastic differential equations. Tokyo J. Math., 12:33–48, 1986.
[265] Y. Kannai. Continuity properties of the core of a market. Econometrica, 38(6):791–815, 1970.
[266] L.V. Kantorovich. On the transfer of masses. Dokl. Akad. Nauk USSR, 37:7–8, 1942.
[267] L.V. Kantorovich. On a problem of Monge. Uspekhi Mat. Nauk, 3:225–226, 1948. In Russian.
[268] L.V. Kantorovich and G.P. Akilov. Functional Analysis. Nauka, Moscow, 3rd edition, 1984. In Russian.
[269] L.V. Kantorovich and G.Sh. Rubinstein. On a function space in certain extremal problems. Dokl. Akad. Nauk USSR, 115(6):1058–1061, 1957.
[270] L.V. Kantorovich and G.Sh. Rubinstein. On the space of completely additive functions. Vestnic Leningrad Univ., Ser. Mat. Mekh. i Astron., 13(7):52–59, 1958. In Russian.
[271] S. Karlin and W.J. Studden. Tchebycheff Systems. Interscience, New York, 1966.
[272] T. Kawata. Fourier Analysis in Probability Theory. Academic Press, New York, 1972.
[273] H.G. Kellerer. Funktionen auf Produkträumen mit vorgegebenen Marginal-Funktionen. Math. Ann., 144:323–344, 1961.
[274] H.G. Kellerer. Maßtheoretische Marginal Probleme. Math. Annalen, 153:168–198, 1964.
[275] H.G. Kellerer. Duality theorems and probability metrics. In Proc. 7th Brasov Conf., pages 211–220, Bucuresti, 1984.
[276] H.G. Kellerer. Duality theorems for marginal problems. In M. Iosifescu, editor, Proceedings of the 7th Conference on Probability Theory, Braşov, Romania, 1984.
[277] H.G. Kellerer. Duality theorems for marginal problems. Z. Wahrscheinlichkeitstheorie Verw. Geb., 67:399–432, 1984.
[278] H.G. Kellerer. Ambiguity in bounded moment problems. In AMS-IMS-SIAM Joint Research Conference: Distributions with fixed marginals, double-stochastic measures and Markov operators, 1993. To appear.
[279] R. Kemp. Fundamentals of the Average Case Analysis of Particular Algorithms. Wiley, New York, 1984.
[280] J.H.B. Kemperman. The general moment problem, a geometric approach. Annals of Mathematical Statistics, 19:93–122, 1968.
[281] J.H.B. Kemperman. On a class of moment problems. In Proceedings 6th Berkeley Symposium on Mathematical Statistics and Probability, volume 2, pages 101–126, 1972.
[282] J.H.B. Kemperman. On the FKG-inequality for measures on a partially ordered space. Proc. Nederl. Akad. Wet., 80:313–331, 1977.
[283] J.H.B. Kemperman. On the role of duality in the theory of moments. In Semi-Infinite Programming and Applications 1981, volume 215, pages 63–92. Springer, 1983.
[284] J.H.B. Kemperman. Geometry of the moment problem. In Proceedings of Symposia in Applied Mathematics, volume 27, pages 16–53. American Mathematical Society, 1987.
[285] J.H.B. Kemperman. Moment problems for measures on R^n with given k-dimensional marginals. In AMS-IMS-SIAM; Joint Research Conference. Distributions with fixed marginals, double-stochastic measures and Markov operators, 1993. To appear.
[286] H. Kesten. Random difference equations and renewal theory for products of random matrices. Acta Math., 131:207–248, 1973.
[287] L.A. Khalfin and L.B. Klebanov. A solution of the computer tomography paradox and estimation of the distances between the densities of measures with the same marginals. Annals of Probability, 22:2235–2241, 1994.
[288] V. Kifer. Ergodic Theory of Random Transformations. Birkhäuser, Boston, 1986.
[289] T. Kim and M.K. Richter. Nontransitive-nontotal consumer theory. Journal of Economic Theory, 38, 1986.
[290] A.Y. Kiruta, A.M. Rubinov, and E.B. Yanovskaya. Optimal choice of distributions in complex socio-economic problems. Nauka, Leningrad, 1980. In Russian.
[291] L.B. Klebanov, G.M. Maniya, and I.A. Melamed. A problem of Zolotarev and analogs of infinitely divisible and stable distributions in a scheme for summing a random number of random variables. Theory of Probability and its Applications, 29:791–794, 1984.
[292] L.B. Klebanov and S.T. Mkrtchian. Estimator of the closeness of distributions in terms of coinciding moments. In Problems of Stability of Stochastic Models, Proceedings, pages 64–72, Moscow, 1980.
[293] L.B. Klebanov and S.T. Rachev. The method of moments in computer tomography. Math. Scientist, 20:1–14, 1995.
[294] L.B. Klebanov and S.T. Rachev. On a special case of the basic problem in diffraction tomography. In Stochastic Models, 1995.
[295] L.B. Klebanov and S.T. Rachev. Closeness of probability measures with common marginals on finite number of direction. In Proceedings of Distributions with fixed Marginals and Related Topics, volume 28, pages 162–174. IMS Lecture Notes Monograph Series, 1996.
[296] L.B. Klebanov and S.T. Rachev. Proximity of probability with common marginals in a finite number of directions. In Distributions with Given Marginals, 1996.
[297] P. Kleinschmidt, C.W. Lee, and H. Schannath. Transportation problems which can be solved by the use of Hirsch paths for the dual problems. Mathematical Programming Study, 37:153–168, 1987.
[298] P.E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Berlin, 1992.
[299] M. Knott and C.S. Smith. On the optimal mapping of distributions. Journal of Optimization Theory and Applications, 43:39–49, 1984.
[300] M. Knott and C.S. Smith. Note on the optimal transportation of distributions. Journal of Optimization Theory and Applications, 52:323–329, 1987.
[301] M. Knott and C.S. Smith. On Hoeffding–Fréchet bounds and cyclic monotone relations. Journal of Multivariate Analysis, 40:328–334, 1992.
[302] M. Knott and C.S. Smith. On a generalization of cyclic monotonicity and distances among random vectors. Linear Algebra and its Applications, 199:363–371, 1994.
[303] D.E. Knuth. The Art of Computer Programming, volume II. Addison-Wesley, 1969.
[304] J. Komlós, P. Major, and G. Tusnády. An approximation of partial sums of independent r.v.s and the sample d.f., I. Z. Wahrscheinlichkeitstheorie Verw. Geb., 32:111–131, 1975.
[305] J. Komlós, P. Major, and G. Tusnády. An approximation of partial sums of independent r.v.s and the sample d.f., II. Z. Wahrscheinlichkeitstheorie Verw. Geb., 34:33–58, 1976.
[306] M.G. Krein and A.A. Nudelman. The Markov moment problem and extremal problems, 1977.
[307] W.M. Kruskal. Ordinal measures of association. Journal of the American Statistical Association, 53:814–861, 1958.
[308] J. Kuelbs. Kolmogorov’s law of the iterated logarithm for Banach space valued random variables. Illinois J. Math., 21:784–800, 1977.
[309] K. Kuratowski. Topology, volume I. Academic Press, New York, 1966.
[310] K. Kuratowski. Topology, volume II. Academic Press, New York, 1969.
[311] I. Kuznezova-Sholpo and S.T. Rachev. Explicit solutions of moment problems. Probability and Mathematical Statistics, 10:297–312, 1989.
[312] J.J. Laffont and E. Maskin. A differential approach to dominant strategy mechanisms. Econometrica, 48:1507–1520, 1980.
[313] T.L. Lai and M. Robbins. Maximally dependent random variables. Proc. Nat. Acad. Sci. USA, 73:286–288, 1976.
[314] P. Lancaster. Theory of Matrices. Wiley, New York, London, 1969.
[315] D. Landers and L. Rogge. Best approximations in Lφ-spaces. Z. Wahrscheinlichkeitstheorie Verw. Geb., 51:215–237, 1980.
[316] F. Lassner. Sommes de produit de variables aléatoires indépendantes. Thesis, Université de Paris VI, 1974.
[317] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer, Berlin, 1991.
[318] S.J. Leese. Multifunctions of Suslin type. Bull. Austral. Math. Soc., 11:395–411, 1975, and 13:159–160.
[319] G. Letac. Représentation des mesures de probabilité sur le produit de deux espaces dénombrables, de marges données. Ann. Inst. Fourier, 16:497–507, 1966.
[320] G. Letac. A contraction principle for certain Markov chains and its applications. Random matrices and their applications. In H. Kesten, J.E. Cohen, and C.M. Newman, editors, Proc. AMS-IMS-SIAM Joint Summer Research Conf. 1984, volume 50 of Contemp. Math., pages 263–273, Providence, R.I., 1986. Amer. Math. Soc.
[321] V.L. Levin. Application of E. Helly’s theorem to convex programming, problems of best approximation and related questions. USSR Math. Sbornik, 8:235–248, 1969.
[322] V.L. Levin. Duality and approximation in the problem of mass transfer. In B.S. Mityagin, editor, Mathematical Economics and Functional Analysis, pages 94–108. Nauka, Moscow, 1974. In Russian.
[323] V.L. Levin. On the problem of mass transfer. Soviet Math. Doklady, 16:1349–1353, 1975.
[324] V.L. Levin. On the theorems in the Monge–Kantorovich problem. Uspekhi Mat. Nauk, 32:171–172, 1977. In Russian.
[325] V.L. Levin. The mass transfer problem, strong stochastic domination and probability measures on the product of two compact spaces with given projections. Preprint, TsEMI, Moscow, 1978a. In Russian.
[326] V.L. Levin. The Monge–Kantorovich problem on mass transfer. In Methods of Functional Analysis in Mathematical Economics, pages 23–55. Nauka, Moscow, 1978b. In Russian.
[327] V.L. Levin. Measurable selections of multivalued mappings into topological spaces and upper envelopes of Carathéodory integrands. Soviet Math. Doklady, 21:771–775, 1980.
[328] V.L. Levin. Some applications of duality for the problem of translocation of masses with a lower semicontinuous cost function. Closed preferences and Choquet theory. Soviet Math. Doklady, 2:262–267, 1981.
[329] V.L. Levin. A continuous utility theorem for closed preorders on a compact metrizable space. Soviet Math. Doklady, 28:715–718, 1983a.
[330] V.L. Levin. Measurable utility theorems for closed and lexicographic preference relations. Soviet Math. Doklady, 27:639–643, 1983b.
[331] V.L. Levin. Lipschitz preorders and Lipschitz utility functions. Russian Mathematical Surveys, 39:199–200, 1984a.
[332] V.L. Levin. The mass transfer problem in topological space and probability measures on the product of two spaces with given marginal measures. Soviet Math. Doklady, 29:638–643, 1984b.
[333] V.L. Levin. Convex Analysis in Spaces of Measurable Functions and Its Applications in Mathematics and Economics. Nauka, Moscow, 1985a. In Russian.
[334] V.L. Levin. Functionally closed preorders and strong stochastic dominance. Soviet Math. Doklady, 32:22–26, 1985b.
[335] V.L. Levin. Extremal problems with probability measures, functionally closed preorders and strong stochastic dominance. In Stochastic Optimization, volume 81 of Lecture Notes in Control and Information Science, pages 435–447, Berlin, New York, 1986. Proc. Int. Conf. Kiev 1984, Springer-Verlag.
[336] V.L. Levin. Measurable selectors of multivalued mappings and the mass transfer problem. Dokl. Akad. Nauk USSR, 292:1048–1053, 1987.
[337] V.L. Levin. General Monge–Kantorovich problem and its applications in measure theory and mathematical economics. In L.J. Leifman, editor, Functional Analysis, Optimization and Mathematical Economics. Oxford University Press, 1990. A collection of papers dedicated to the Memory of L.V. Kantorovich.
[338] V.L. Levin. Some applications of set-valued mappings in mathematical economics. Journal of Mathematical Economics, 20:69–87, 1991.
[339] V.L. Levin. A formula for the optimal value in the Monge–Kantorovich problem with a smooth cost function and a characterization of cyclically monotone mappings. USSR Math. Sbornik, 71:533–548, 1992.
[340] V.L. Levin. Private communication, 1994.
[341] V.L. Levin. Quasi-convex functions and quasi-monotone operators. Journal of Convex Analysis, 2, 1995a.
[342] V.L. Levin. Reduced cost functions and their applications. Journal of Mathematical Economics, 1995b. To appear.
[343] V.L. Levin and A.A. Milyutin. The mass transfer problem with discontinuous cost function and a mass setting for the problem of duality of convex extremum problems. Trans Russian Math. Surveys, 34:1–78, 1979.
[344] V.L. Levin and S.T. Rachev. New duality theorems for marginal problems with some applications in stochastics. Lecture Notes in Mathematics, 1412:137–170, 1989.
[345] M. Loeve. Probability Theory. Van Nostrand, 1977.
[346] G.G. Lorentz. A problem of plane measure. Amer. J. Math., 71:417–426, 1949.
[347] G.G. Lorentz. An inequality for rearrangements. American Mathematics Monthly, 60:176–179, 1953.
[348] G. Louchard. Exact and asymptotic distributions in digital and binary search trees. Theor. Inf. Appl., 21:479–495, 1987.
[349] R. Lucchetti and F. Patrone. Closure and upper semicontinuity results in mathematical programming, Nash and economic equilibria, Optimization. Mathematische Operationsforschung und Statistik, Series Optimization, 17:619–628, 1986.
[350] N. Lusin. Leçons sur les Ensembles Analytiques. Gauthier-Villars, 1930.
[351] M. Maejima. Some limit theorems for summability methods of iid random variables. In V.V. Kalashnikov et al., editor, Stability problems of stochastic models, volume 1233 of Lecture Notes in Mathematics, pages 57–68, 1985. Varna 1985.
[352] M. Maejima. Some limit theorems for stability methods of i.i.d. random variables. Lecture Notes in Mathematics, 1233:57–68, 1988.
[353] M. Maejima and S.T. Rachev. An ideal metric and the rate of convergence to a self-similar process. Annals of Probability, 15:708–727, 1987.
[354] M. Maejima and S.T. Rachev. Rates of convergence in the operator-stable limit theorems. J. Theor. Probability, 9:37–86, 1996.
[355] H.M. Mahmoud. Evolution of Random Search Trees. Wiley, New York, London, 1992.
[356] G.D. Makarov. Estimates for the distribution function of a sum of two random variables when the marginal distributions are fixed. Theory of Probability and its Applications, 26:803–806, 1981.
[357] C.L. Mallows. A note on asymptotic joint normality. Annals of Mathematical Statistics, 43:508–515, 1972.
[358] B.B. Mandelbrot. Multiplications aléatoires itérées et distributions invariantes par moyenne pondérée aléatoire. C.R. Acad. Sci. Paris, 278, 1974.
[359] B.B. Mandelbrot and M. Taylor. On the distribution of stock price differences. Oper. Res., 15:1057–1062, 1967.
[360] M. Marcus. Some properties and applications of doubly stochastic matrices. American Mathematics Monthly, 67:215–222, 1960.
[361] A.W. Marshall and I. Olkin. Theory of majorization and its applications. Academic Press, New York, 1979.
[362] G. Maruyama. Continuous Markov processes and stochastic equations. Rend. Circolo Math. Palermo, 4:48–90, 1955.
[363] A. Mas-Colell. On the continuous representation of preorders. Intern. Econ. Revue, 18:509–513, 1977.
[364] E. Maskin and J. Riley. Monopoly with incomplete information. Rand Journal of Economics, 15:171–196, 1984.
[365] J.L. Massey. Collision-resolution algorithms and random-access communications, multi-user communication systems. CISM Courses and Lectures, 1981.
[366] R. Mathar and D. Pfeifer. Stochastik für Informatiker. Teubner, Stuttgart, 1990.
[367] G. Matheron. Random Sets and Integral Geometry. Wiley, 1975.
[368] M. Meerschaert. Moments of random vectors which belong to some domain of normal attraction. Annals of Probability, 18:870–876, 1989.
[369] M. Meerschaert. Spectral decomposition for generalized domains of attraction. Annals of Probability, 19:875–892, 1991.
[370] K. Mehlhorn. Datenstrukturen und effiziente Algorithmen, volume I. Teubner, Stuttgart, 1986.
[371] I. Meilijson and A. Nadas. Convex majorization with application to the length of critical paths. Journal of Applied Probability, 16:671–677, 1979.
[372] D. Mejzler. On the problem of the limit distributions for the maximal term of a variational series. Lvov Politechn. Inst. Naucn. Zap. Ser. Fiz.-Mat., 38:90–109, 1956. In Russian.
[373] E. Michael. Continuous selections. Ann. of Math., 63:361–382, 1956.
[374] P. Mikusinski, H. Sherwood, and M.D. Taylor. Probabilistic interpretations of copulas and their convex sums. In Symp. Probab. Measures with Given Marginals, volume 67 of Math. Appl., pages 95–112, Rome, 1991.
[375] G.N. Milshtein. A method of second-order accuracy integration of stochastic differential equations. Theory of Probability and its Applications, 23, 1978.
[376] G.N. Milshtein. Numerical integration of stochastic differential equations. Izd. Ural. Univ. Sverdlovsk, 1988. In Russian.
[377] J.A. Mirrlees. Optimal tax theory: a synthesis. Journal of Public Economics, 6:327–358, 1976.
[378] S. Mittnik and S.T. Rachev. Alternative multivariate stable distributions and their applications to financial modeling. In S. Cambanis, G. Samorodnitsky, and M.S. Taqqu, editors, Stable Processes and Related Topics, pages 107–120, Boston, 1991. Birkhäuser.
[379] S. Mittnik and S.T. Rachev. Modeling assets returns with alternative stable laws. Econometric Reviews, 12(3):261–330, 1993.
[380] S. Mittnik and S.T. Rachev. Reply on comments on “modeling assets returns with alternative stable laws” and some extensions. Econometric Reviews, 12(3):347–389, 1993.
[381] S. Mittnik and S.T. Rachev. Modelling Financial Assets with Alternative Stable Models. Series in Financial Economics and Quantitative Analysis. Wiley, New York, 1997.
[382] G. Monge. Mémoire sur la théorie des déblais et des remblais, 1781.
[383] F. Mosteller, C. Youtz, and D. Zahn. The distribution of sums of rounded percentages. Demography, 4:850–858, 1967.
[384] K.R. Mount and S. Reiter. Construction of a continuous utility function for a class of preferences. Journal of Mathematical Economics, 3:227–245, 1976.
[385] L. Nachbin. Topology and Order. Van Nostrand, New York, 1965.
[386] R.B. Nelsen. Copulas and association. In Symp. Probab. Measures with Given Marginals, pages 51–74, Rome, 1991. Kluwer.
[387] W. Neuefeind. On continuous utility. Journal of Economic Theory, 5:174–176, 1972.
[388] J. Neveu. Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco, 1965.
[389] J. Neveu and R.M. Dudley. On Kantorovich–Rubinstein theorems. (Transcript), 1980.
[390] V.B. Nevzorov. Records. Theory of Probability and its Applications, 32:201–228, 1988.
[391] N.J. Newton. An asymptotically efficient difference formula for solving stochastic differential equations. Stochastics, 19:175–206, 1986.
[392] I. Olkin and F. Pukelsheim. The distance between two random vectors with given dispersion matrices. Journal of Linear Algebra and its Applications, 48:257–263, 1982.
[393] I. Olkin and F. Pukelsheim. Marginal problems with additional constraints. Tech. report, 270, 1990. Department of Statistics, Stanford University, Stanford, CA.
[394] I. Olkin and S.T. Rachev. Distances among random vectors with given dispersion matrices. Preprint, 1991. Department of Statistics, Stanford University, Stanford, CA.
[395] I. Olkin and S.T. Rachev. Maximum submatrix traces for positive definite matrices. SIAM Journal of Matrix Analysis Applications, 14:390–397, 1993.
[396] J.M. Ortega and W.C. Rheinboldt. Iterative solution of nonlinear equations in several variables. Academic Press, New York, 1970.
[397] J. Pachl. Two classes of measures. Colloq. Math., 42:331–340, 1979.
[398] E. Pardoux and D. Talay. Discretization and simulation of stochastic differential equations. Acta Appl. Math., 3:23–47, 1985.
[399] V. Paulauskas and A. Rackauskas. Approximation Theory in the Central Limit Theorem. Kluwer Academic Publisher, 1989.
[400] A.S. Paulson and V.R.R. Uppuluri. Limit laws of a sequence determined by a random difference equation governing a one-compartment system. Math. Biosci., 13:325–333, 1972.
[401] A. Perez and R. Jirousek. Constructing an intentional expert system INES. In J.H. van Remmel, F. Gremy, and J. Zvarova, editors, Medical decision making: Diagnostic Strategies and Expert Systems, pages 307–315. North-Holland, 1985.
[402] S. Perrakis and C. Henin. Evaluation of risky investments with random timing of cash returns. Management Sci., 21:79–86, 1974.
[403] D. Pfeifer. Some remarks on Nevzorov’s record model. Advances in Applied Probability, 23:823–834, 1991.
[404] G. Pflug. Stochastische Modelle in der Informatik. Teubner, Stuttgart, 1986.
[405] G. Pisier and J. Zinn. On limit theorems for random variables with values in the spaces Lp. Z. Wahrscheinlichkeitstheorie Verw. Geb., 41:286–305, 1977.
[406] B. Pittel. Paths in a random digital tree: Limiting distributions. Advances in Applied Probability, 18:139–155, 1986.
[407] E. Platen. An approximation method for a class of Itô processes. Lietuvos Math. Rink. XXI, 1:121–133, 1981.
[408] D. Pollard. Convergence of Stochastic Processes. Springer, 1984.
[409] C.J. Preston. A generalization of the FKG inequalities. Comm. Math. Phys., 36:233–241, 1974.
[410] P.S. Puri. On almost sure convergence of an erosion process due to Todorovic and Gani. Journal of Applied Probability, 24:1001–1005, 1987.
[411] G. Pyatt and J.J. Round, editors. Social Accounting Matrices: A Basis for Planning. World Bank, Washington, D.C., 1985.
[412] R. Pyke and D. Root. On convergence in r-mean of normalized partial sums. Annals of Mathematical Statistics, 39:379–381, 1968.
[413] S.T. Rachev. On a metric construction of Hausdorff in a space of probability measures. Zapiski Nauchn. Sem. LOMI, 87:87–104, 1978.
[414] S.T. Rachev. Minimal metrics in a space of real random variables. Dokl. Akad. Nauk SSSR, 257(5):1067–1070, 1981.
[415] S.T. Rachev. On minimal metrics in the space of real-valued random variables. Soviet Dokl. Math., 23(2):425–438, 1981a.
[416] S.T. Rachev. Minimal metrics in the random variables spaces. Pub. Inst. Stat. Univ. Paris, 27(1):27–47, 1982a.
[417] S.T. Rachev. Minimal metrics in the random variables spaces. In W. Grossmann et al., editor, Probability and Statistical Inference Proceedings of the 2nd Pannonian Symp., pages 319–327, Dordrecht, 1982b. D. Reidel Company.
[418] S.T. Rachev. Compactness in the probability measures space. In M. Galyare et al., editor, Proceedings of the 3rd European Young Statisticians Meeting, pages 136–150, Katholieke Univ., Leuven, 1983a.
[419] S.T. Rachev. Minimal metrics in the real valued random variable spaces. Lecture Notes in Mathematics, 982:172–190, 1983b.
[420] S.T. Rachev. Hausdorff metric construction in the probability measures space. Studia Mathematica, 7:152–162, 1984a. Pliska.
[421] S.T. Rachev. The Monge–Kantorovich mass transference problem and its stochastic applications. Theory of Probability and its Applications, 29:647–676, 1984b.
[422] S.T. Rachev. On a class of minimal functionals on a space of probability measure. Theory of Probability and its Applications, 29(1):41–49, 1984c.
[423] S.T. Rachev. On a problem of Dudley. Soviet Math. Doklady, 29(2):162–164, 1984d.
[424] S.T. Rachev. Extreme functionals in the space of probability measures. Lecture Notes in Mathematics, 1155:320–348, 1985a. Proc. “Stability Problems for Stochastic Models”.
[425] S.T. Rachev. Probability metrics and their applications to the stability problems for stochastic models, 1985b. Author’s review of doctor of sciences theses, Steklov Mathematical Institute, USSR Academy of Sciences, Moscow. In Russian.
[426] S.T. Rachev. Extreme functional in the space of probability theory and mathematical statistics. VNU Science Press, 2:474–476, 1986.
[427] S.T. Rachev. Minimal metrics in a space of random vectors with fixed one-dimensional marginal distributions. J. Soviet Math., 34(2):1542–1555, 1986. Stability Problems for Stochastic Models. Proceedings, Moscow, VNIISI.
[428] S.T. Rachev. The stability of stochastic models. Applied Probability Newsletter, 12(2):3–4, 1988.
[429] S.T. Rachev. The problem of stability in queueing theory. Queueing Systems Theory and Applications, 4:287–318, 1989.
[430] S.T. Rachev. Mass transshipment problems and ideal metrics. Numer. Func. Anal. & Optimiz., 12(5&6):563–573, 1991a.
[431] S.T. Rachev. Optimal mass transportation problems. In Proceedings of XI Congres de Metodologias en Ingenieria de Sistemas, pages 115–120, Azocar, Santiago de Chile, 1991b.
[432] S.T. Rachev. Probability Metrics and the Stability of Stochastic Models. Wiley, Chichester-New York, 1991c.
[433] S.T. Rachev. Theory of probability metrics and recursive algorithms. In S. Joly and G. le Calve, editors, Distancia 1992, Proceedings of Congres International sur Analyse en Distance, pages 339–403, Université de Haute Bretagne, Rennes, 1992.
[434] S.T. Rachev and G.S. Chobanov. Minimality of ideal probabilistic metrics. Pliska, 2:1154–1158, 1986. In Russian.
[435] S.T. Rachev, B. Dimitrov, and Z. Khalil. A probabilistic approach to optimal quality usage. Computers and Mathematics with Applications, 24(8/9):219–227, 1992.
[436] S.T. Rachev and Z. Ignatov. Minimality of ideal probabilistic metrics. J. Soviet Math., 32(6):595–608, 1986.
[437] S.T. Rachev and S.I. Resnick. Max-geometric infinite divisibility and stability. Stoch. Models, 2:191–218, 1991.
[438] S.T. Rachev and L. Rüschendorf. Approximation of sums by compound Poisson distributions with respect to stop-loss distances. Advances in Applied Probability, 22:350–374, 1990.
[439] S.T. Rachev and L. Rüschendorf. A counterexample to a.s. constructions. Stat. Prob. Letters, 9:307–309, 1990a.
[440] S.T. Rachev and L. Rüschendorf. A transformation property of minimal metrics. Theory of Probability and its Applications, 35:131–137, 1990b.
[441] S.T. Rachev and L. Rüschendorf. Approximate independence of distributions on spheres and their stability properties. Annals of Probability, 19:1311–1337, 1991.
[442] S.T. Rachev and L. Rüschendorf. Recent results in the theory of probability metrics. Statistics & Decisions, 9:327–373, 1991a.
[443] S.T. Rachev and L. Rüschendorf. A new ideal metric with applications to multivariate stable limit theorems, summability methods and compound Poisson approximation. Probability Theory and Related Fields, 94:163–187, 1992.
[444] S.T. Rachev and L. Rüschendorf. Rate of convergence for sums and maxima and doubly ideal metrics. Theory of Probability and its Applications, 37:276–289, 1992a.
[445] S.T. Rachev and L. Rüschendorf. On constrained transportation problems. In Proceedings of the 32nd Conference on Decision and Control, volume 3, pages 2896–2900. IEEE Control System Society, 1993.
[446] S.T. Rachev and L. Rüschendorf. On the Cox, Ross and Rubinstein model for option pricing. Theory of Probability and its Applications, 39:150–190, 1994.
[447] S.T. Rachev and L. Rüschendorf. On the rate of convergence in the CLT with respect to the Kantorovich metric. In J. Kuelbs, M. Marcus, and J. Hoffman-Jorgensen, editors, 9th Conf. on Probability on Banach Spaces, pages 193–207, Boston–Basel–Berlin, 1994a. Birkhäuser.
[448] S.T. Rachev and L. Rüschendorf. Propagation of chaos and contraction of stochastic mappings. Siberian Advances in Mathematics, 4:114–150, 1994b.
[449] S.T. Rachev and L. Rüschendorf. Solution of some transportation problems with relaxed or additional constraints. SIAM Journal of Control and Optimization, 32(3):673–689, 1994c.
[450] S.T. Rachev and L. Rüschendorf. Probability metrics and recursive algorithms. Journal of Applied Probability, 27:770–799, 1995. Technical Report (1991).
[451] S.T. Rachev and L. Rüschendorf. Propagation of chaos and contraction of stochastic mappings. Siberian Adv. Math., 4:114–150, 1995a.
[452] S.T. Rachev, L. Rüschendorf, and A. Schief. Uniformities for the convergence in law and probability. Journal of Theoretical Probability, 5:33–44, 1992.
[453] S.T. Rachev and G. Samorodnitsky. Geometric stable distributions in Banach spaces. Journal of Theoretical Probability, 7(29):351–373, 1994.
[454] S.T. Rachev and G. Samorodnitsky. Limit laws for a stochastic process and random recursion arising in probabilistic modelling. Advances in Applied Probability, 27:185–203, 1995.
[455] S.T. Rachev and A. Schief. On Lp-minimal metric. Probability and Mathematical Statistics, 13(2):311–320, 1992.
[456] S.T. Rachev and A. SenGupta. Geometric stable distributions and Laplace–Weibull mixtures. Statistics & Decisions, 10:251–271, 1992.
[457] S.T. Rachev and A. SenGupta. Laplace-Weibull mixtures for modeling price changes. Management Science, pages 1029–1038, 1993.
[458] S.T. Rachev and R.M. Shortt. Classification problem for probability metrics, volume 94 of Contemporary Mathematics, pages 221–262. AMS, 1989.
[459] S.T. Rachev and R.M. Shortt. Duality theorems for Kantorovich–Rubinstein and Wasserstein functionals. Dissertationes Mathematicae, 299:647–676, 1990.
[460] S.T. Rachev and M. Taksar. Kantorovich’s functionals in space of measures. In I. Karatzas and D. Ocone, editors, Applied Stochastic Analysis, volume 77 of Lecture Notes in Control and Information Science, pages 248–261, Berlin–New York, 1992. Proceedings of the US–French Workshop, Springer-Verlag.
[461] S.T. Rachev and P. Todorovic. On the rate of convergence of some functionals of a stochastic process. Journal of Applied Probability, 28:805–814, 1990.
[462] S.T. Rachev and J.E. Yukich. Rates for the CLT via new ideal metrics. Annals of Probability, 17:775–788, 1989.
[463] S.T. Rachev and J.E. Yukich. Smoothing metrics for measures on groups with applications to random motions. Annales de l’Institut Henri Poincaré, 25:429–941, 1990.
[464] S.T. Rachev and J.E. Yukich. Rates of convergence of α-stable random motions. J. Theor. Prob., 4:333–352, 1991.
[465] A. Rackauskas. On the convergence rate in martingale CLT in Hilbert spaces. Preprint 90-031, 1990. University of Bielefeld.
[466] D. Ramachandran. Perfect measures. Part I: Basic theory, volume 5. Macmillan, New Delhi, 1979.
[467] D. Ramachandran. Perfect measures. Part II: Special topics, volume 7. Macmillan, New Delhi, 1979.
[468] D. Ramachandran. Marginal problem in arbitrary product spaces. In Proceedings of the conference on “Distribution with Fixed Marginals, Double Stochastic Measures and Markov Operators”, volume 28, pages 260–272, Seattle, August 1993. IMS Lecture Notes Monograph Series 1997.
[469] D. Ramachandran and L. Rüschendorf. A general duality theorem for marginal problems. Probability Theory and Related Fields, 101:311–319, 1995.
[470] D. Ramachandran and L. Rüschendorf. Duality and perfect probability spaces. Proc. Amer. Math. Soc., 124:2223–2228, 1996a.
[471] D. Ramachandran and L. Rüschendorf. Duality theorems for assignments with upper bounds. In ‘Distributions with Fixed Marginals and Moment Problems’, pages 283–290. Kluwer, 1997.
[472] D. Ramachandran and L. Rüschendorf. On the validity of the Monge–Kantorovich duality theorem. Preprint, 1997.
[473] F. Ramsey. A mathematical theory of savings. Economic Journal, 38:543–559, 1928.
[474] M. Regnier and P. Jacquet. New results on the size of tries. IEEE Transactions on Information Theory, 35:203–205, 1989.
[475] S.I. Resnick and P. Greenwood. A bivariate stable characterization and domains of attraction. Journal of Multivariate Analysis, 9:206–221, 1979.
[476] M.K. Richter. Duality and rationality. Journal of Economic Theory, 20:131–181, 1979.
[477] H. Robbins. The maximum of identically distributed random variables. I.M.S. Bull., March 1975. Abstract.
[478] H. Robbins and D. Siegmund. A convergence theorem for nonnegative almost supermartingales. In Rustagi, editor, Optimiz. Meth. in Statistics, pages 233–258. Academic Press, 1971.
[479] J.C. Rochet. The taxation principle and multi-time Hamilton–Jacobi equation. Journal of Mathematical Economics, 14:113–128, 1985.
[480] J.C. Rochet. A necessary and sufficient condition for rationalizability in a quasi-linear context. Journal of Mathematical Economics, 16:191–200, 1987.
[481] R.T. Rockafellar. Characterization of the subdifferentials of convex functions. Pacific J. Math., 17:497–510, 1966.
[482] R.T. Rockafellar. Convex Analysis. Princeton Univ. Press, Princeton, NJ, 1970.
[483] C. Rogers. Coupling of random walks, 1992. Private communication.
[484] W.W. Rogosinski. Moments of non-negative mass. In Proceedings of Royal Society London, Ser. A, volume 245, pages 1–27, 1958.
[485] W. Römisch. An approximation method in stochastic optimization and control. In Optimization techniques, volume 22, pages 169–178. Proc. 9th IFIP Conf., Warsaw 1979, Part 1, Lecture Notes in Control and Information Science, 1980.
[486] W. Römisch. On discrete approximations in stochastic programming, 1981. Seminarbericht.
[487] W. Römisch and R. Schultz. Stability analysis of stochastic programs. Ann. Operat. Res., 30:241–266, 1991.
[488] W. Römisch and R. Schultz. Stability of solutions for stochastic programs with complete recourse. Mathematics of Operations Research, 18:590–609, 1993.
[489] W. Römisch and A. Wakolbinger. On Lipschitz dependence in systems with differentiated inputs. Math. Ann, 272:237–248, 1985.
[490] U. Rösler. A limit theorem for quicksort. Informatique Théorique et Applications, 25:85–100, 1991.
[491] U. Rösler. A fixed point theorem for distributions. Stoch. Processes and Applications, 37:195–214, 1992.
[492] S.M. Ross. A simple heuristic approach to simplex efficiency. European J. Oper. Res., 9:344–346, 1982.
[493] S.M. Ross. Stochastic Processes. Wiley, New York, 1983.
[494] B. Rüger. Scharfe untere und obere Schranken für die Wahrscheinlichkeit der Realisation von k unter n Ereignissen. Metrika, 26:71–77, 1979.
[495] L. Rüschendorf. Vergleich von Zufallsvariablen bzgl. integralinduzierter Halbordnungen, 1979. Habilitationsschrift.
[496] L. Rüschendorf. Inequalities for the expectation of Δ-monotone functions. Z. Wahrscheinlichkeitstheorie Verw. Geb., 54:341–349, 1980.
[497] L. Rüschendorf. Ordering of distributions and rearrangement of functions. Annals of Probability, 9:276–283, 1980.
[498] L. Rüschendorf. Sharpness of Fréchet-Bounds. Z. Wahrscheinlichkeitstheorie Verw. Geb., 57:293–302, 1981.
[499] L. Rüschendorf. Random variables with maximum sums. Advances in Applied Probability, 14:623–632, 1982.
[500] L. Rüschendorf. On the multidimensional assignment problem. Methods of OR, 47:107–113, 1983.
[501] L. Rüschendorf. Solution of a statistical optimization problem by rearrangement methods. Metrika, 30:55–62, 1983.
[502] L. Rüschendorf. On the minimum discrimination information theorem. Statistics & Decisions, 1:263–283, 1984. Suppl. Issue.
[503] L. Rüschendorf. Construction of multivariate distributions with given marginals. Ann. Inst. Stat. Math., 37:225–233, 1985.
[504] L. Rüschendorf. The Wasserstein distance and approximation theorems. Z. Wahrscheinlichkeitstheorie Verw. Geb., 70:117–129, 1985.
[505] L. Rüschendorf. Monotonicity and unbiasedness of tests via a.s. constructions. Statistics, 17:221–230, 1986.
[506] L. Rüschendorf. Fréchet-bounds and their applications. In G. Dall’Aglio, S. Kotz, and G. Salinetti, editors, Advances in Probability Measure with Given Marginals, pages 151–188. Kluwer, Amsterdam, 1991.
[507] L. Rüschendorf. Bounds for distributions with multivariate marginals. In K. Mosler and M. Scarsini, editors, Stochastic Order and Decision under Risk, volume 19, pages 285–310. IMS Lecture Notes, 1991a.
[508] L. Rüschendorf. Conditional stochastic ordering of distributions. Advances in Applied Probability, 23:46–63, 1991b.
[509] L. Rüschendorf. Stochastic ordering of likelihood ratios and partial sufficiency. Statistics, 22:551–558, 1991c.
[510] L. Rüschendorf. Optimal solutions of multivariate coupling problems. Appl. Mathematicae, 22:325–338, 1995.
[511] L. Rüschendorf. Developments on Fréchet bounds. In Proceedings of Distributions with Fixed Marginals and Related Topics, volume 28, pages 273–296. IMS Lecture Notes Monograph Series, 1996.
[512] L. Rüschendorf. On c-optimal random variables. Statistics Prob. Letters, 27:267–270, 1996.
[513] L. Rüschendorf and S.T. Rachev. A characterization of random variables with minimum L2-distance. Journal of Multivariate Analysis, 32:48–54, 1990.
[514] L. Rüschendorf, B. Schweizer, and M.D. Taylor. Distributions with Fixed Marginals and Related Topics. In Proceedings of Distributions with Fixed Marginals and Related Topics, volume 28. IMS Lecture Notes Monograph Series, 1996.
[515] L. Rüschendorf and L. Uckelmann. On optimal multivariate couplings. In Distribution with given marginals and moment problems, pages 261–274. Kluwer, 1997.
[516] T. Rychlik. Stochastically extremal distributions of order statistics for dependent samples. Statistics & Probability Letters, 13:337–341, 1992.
[517] C. Ryll-Nardzewski. On quasi-compact measures. Fund. Math., 40:125–130, 1953.
[518] G. Samorodnitsky and M. Taqqu. Stable Non-Gaussian Random Processes. Stochastic Models with Infinite Variance. Chapman & Hall, New York, 1994.
[519] E. Samuel and R. Bachi. Measures of distance of distribution functions and some applications. Metron, 23:83–122, 1964.
[520] V.V. Sazonov. Normal approximation - some recent advances. Lecture Notes in Mathematics, 879, 1981.
[521] H.H. Schaefer. Topological Vector Spaces. Springer, New York, 1966.
[522] M. Schaefer. Note on the k-dimensional Jensen inequality. Annals of Probability, 2:502–504, 1976.
[523] G. Schay. Optimal joint distributions of several random variables with given marginals. Stud. Appl. Math., LXI:179–183, 1979.
[524] L. Schwartz. Radon Measures On Arbitrary Topological Spaces and Cylindrical Measures. Oxford University Press, London, 1973.
[525] B. Schweizer. Thirty years of copulas. In G. Dall’Aglio, S. Kotz, and G. Salinetti, editors, Symp. Probab. Measures with Given Marginals, pages 13–50, Rome, 1990. Kluwer.
[526] B. Schweizer and A. Sklar. Probabilistic Metric Spaces. Elsevier, North-Holland, 1983.
[527] L. Seidel. On limit distributions of random symmetric polynomials. Theory of Probability and its Applications, 23:266–278, 1988.
[528] V.V. Senatov. Uniform estimates of the rate of convergence in the multi-dimensional central limit theorem. Theory of Probability and its Applications, 25:745–759, 1980.
[529] V.V. Senatov. Some lower estimates for the rate of convergence in the multi-dimensional central limit theorem. Soviet Math. Doklady, 23:188–192, 1981.
[530] W.J. Shafer and H.F. Sonnenschein. Equilibrium in abstract economics without ordered preferences. Journal of Mathematical Economics, 2:345–348, 1975.
[531] L.S. Shapley and M. Shubik. The assignment game, 1: the core. Int. J. Game Theory, 1:110–130, 1972.
[532] M. Sharpe. Operator-stable probability distributions on vector groups. Trans. Amer. Math. Soc., 136:51–65, 1969.
[533] A.N. Shiryaev. Probability Theory. Springer, 1984.
[534] J.A. Shohat and J.D. Tamarkin. The Problem of Moments. American Mathematical Society, Providence, 1943.
[535] I.A. Sholpo. ε-minimal metrics. Theory of Probability and its Applications, 28:854–855, 1983.
[536] G.R. Shorack and J.A. Wellner. Empirical Processes With Applications to Statistics. Wiley, New York, 1986.
[537] R.M. Shortt. Private communication.
[538] R.M. Shortt. Combinatorial methods in the study of marginal problems over separable spaces. Journal of Mathematical Analysis and its Applications, 97:462–479, 1983.
[539] R.M. Shortt. Strassen’s marginal problems in two or more dimensions. Z. Wahrscheinlichkeitstheorie Verw. Geb., 64:313–325, 1983.
[540] R.M. Shortt. Universally measurable spaces: An invariance theorem and diverse characterizations. Fund. Math. Th., 121:35–42, 1983.
[541] H.J. Skala. The existence of probability measures with given marginals. Annals of Probability, 21:136–142, 1993.
[542] M. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris, 8:229–231, 1959.
[543] C.S. Smith and M. Knott. Note on the optimal transportation of distributions. Journal of Optimization Theory and Applications, 52:323–329, 1987.
[544] C.S. Smith and M. Knott. On Hoeffding–Fréchet bounds and cyclic monotone relations. Journal of Multivariate Analysis, 40:328–334, 1992.
[545] T.A.B. Snijders. Antithetic variates for Monte Carlo estimation of probabilities. Statistica Neerlandica, 38:1–19, 1984.
[546] D. Stoyan. Comparison Methods for Queues and Other Stochastic Models. Wiley, 1983.
[547] V. Strassen. The existence of probability measures with given marginals. Annals of Mathematical Statistics, 36(2):423–439, 1965.
[548] J. Štěpán. Simplicial measures. In Memor. Vol. of J. Hájek, pages 239–251. Academia Prague, 1977.
[549] J. Štěpán. Probability measures with given expectations. In Proc. of the 2nd Prague Symp. on Asympt. Statistics, pages 315–320. North Holland, 1979.
[550] V.N. Sudakov. Geometric problems in the theory of infinite dimensional probability distributions. Proc. Steklov Inst. Math., 141(2), 1979.
[551] H. Sussmann. On the gap between deterministic and stochastic differential equations. Annals of Probability, 6:19–41, 1978.
[552] A.S. Sznitman. Equations de type de Boltzmann, Spatialement homogènes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 660:559–592, 1984.
[553] A.S. Sznitman. Propagation of chaos. In Ecole d’Eté Saint-Flour, volume 1464 of Lecture Notes in Mathematics, pages 165–251, 1989.
[554] A. Szulga. On the Wasserstein metric. In Transactions of the 8th Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, volume B, pages 267–273, Prague, 1978. Akademia Praha.
[555] A. Szulga. On minimal metrics in the space of random variables. Theory of Probability and its Applications, 27:424–430, 1982.
[556] W. Szwarc and M. Posner. The tridiagonal transportation problem. Operations Research Letters, 3:25–30, 1984.
[557] M. Talagrand. Matching random samples in many dimensions. Annals of Applied Probability, 2:846–856, 1992.
[558] D. Talay. Résolution trajectorielle et analyse numérique des équations différentielles stochastiques. Stochastics, 9:275–306, 1988.
[559] H. Tanaka. An inequality for a functional of probability distributions and its applications to Kac’s one-dimensional model of a Maxwellian gas. Z. Wahrscheinlichkeitstheorie Verw. Geb., 27:47–52, 1973.
[560] H. Tanaka. Probabilistic treatment of the Boltzmann equation for Maxwellian molecules. Z. Wahrscheinlichkeitstheorie Verw. Geb., 46:67–105, 1978.
[561] A.H. Tchen. Inequalities for distributions with given marginals. Annals of Probability, 8:814–827, 1980.
[562] P. Todorovic. An extremal problem arising in soil erosion modeling, pages 65–73. Reidel, Dordrecht, 1987. Editors: I.B. MacNeil and G.J. Umphrey.
[563] P. Todorovic and J. Gani. Modeling of the effect of erosion on crop production. Journal of Applied Probability, 24:787–797, 1987.
[564] Y.L. Tong. Probability Inequalities in Multivariate Distributions. Academic Press, 1980.
[565] D.M. Topkis and A.F. Veinott jr. Monotone solution of extremal problems on lattices (abstract). In Abstract of 8th International Symposium on Mathematical Programming, volume 131, Stanford, CA, 1973. Stanford University.
[566] A. Tuero-Diaz. Aplicaciones crecientes: Relaciones con las métricas de Wasserstein. PhD thesis, Universidad de Cantabria, 1991.
[567] A. Tuero-Diaz. On the stochastic convergence of representations based on Wasserstein metrics. Annals of Probability, 21:72–85, 1993.
[568] L. Uckelmann. Konstruktion von optimalen Couplings. Universität Münster, 1993. Diplom-Arbeit.
[569] L. Uckelmann. Optimal couplings between one dimensional distributions. In Distribution with given marginals and moment problems, pages 275–282. Kluwer, 1997.
[570] V.R.R. Uppuluri, P.I. Feder, and L.R. Shenton. Random difference equations occurring in one-compartment models. Math. Biosci., 2:143–171, 1967.
[571] S.S. Vallander. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability and its Applications, 18:784–786, 1973.
[572] A.F. Veinott Jr. Representation of general and polyhedral sublattices and sublattices of product spaces. Journal of Linear Algebra and its Applications, 114/115:681–704, 1989.
[573] A.M. Vershik. Some remarks on infinite-dimensional linear programming problems. Russian Math. Surveys, 25:117–124, 1970.
[574] A.M. Vershik and V. Temelt. Some questions of approximation of the optimal value of infinite-dimensional linear programming problems. Siberian Math. J, 9:591–601, 1968.
[575] W. Vervaat. On a stochastic difference equation and a representation of non-negative infinitely divisible random variables. Advances in Applied Probability, 11:750–783, 1979.
[576] N.N. Vorobev. Consistent families of measures and their extensions. Theory of Probability and its Applications, 7:147–163, 1962.
[577] W. Wagner. Monte Carlo evaluation of functionals of solutions of stochastic differential equations. Variance reduction and numerical examples. Stoch. Analysis Appl., 6:447–468, 1988.
[578] W. Warmuth. Marginal Fréchet-bounds for multidimensional distribution functions. Statistics, 19:283–294, 1976.
[579] L.N. Wasserstein. Markov processes over denumerable products of spaces describing large systems of automata. Problems of Information Transmission, 1969.
[580] H. von Weizsäcker and G. Winkler. Integral representation in the set of solutions of a generalized moment problem, 1980.
[581] E. Wesley. Borel preference orders in markets with a continuum of traders. Journal of Mathematical Economics, 3:155–165, 1976.
[582] A. Wieczorek. On the measurable utility theorem. Journal of Mathematical Economics, 7:165–173, 1980.
[583] E. Wild. On Boltzmann’s equation in the kinetic theory of gases. Proc. Camb. Phil. Soc., 4:602–609, 1951.
[584] G. Winkler. Choquet order and simplices with applications in probabilistic models. Lecture Notes in Mathematics, 1145, 1988.
[585] J. Yukich. Exact order rates of convergence of empirical measures. Preprint, 1991.
[586] J. Yukich. The exponential integrability of transportation cost. Preprint, 1991.
[587] J. Yukich. Some generalizations of the Euclidean two-sample matching problem. Prob. Banach Spaces, 8:55–66, 1992.
[588] V.M. Zolotarev. On the continuity of stochastic sequences generated by recursive procedures. Theory of Probability and its Applications, 20:819–832, 1975.
[589] V.M. Zolotarev. Approximation of distributions of sums of independent random variables with values in infinite dimensional spaces. Theory of Probability and its Applications, 21:721–737, 1976.
[590] V.M. Zolotarev. Metric distances in spaces of random variables and their distributions. Math. Sb., 30(3):393–401, 1976.
[591] V.M. Zolotarev. General problems of the stability of mathematical models. Bull. Int. Stat. Inst., 47(2):382–401, 1977.
[592] V.M. Zolotarev. On pseudomoments. Theory of Probability and its Applications, 23:269–278, 1978.
[593] V.M. Zolotarev. On the properties and relationships of certain types of metrics. Zapiski Nauchn. Sem. LOMI, 87:18–35, 1978.
[594] V.M. Zolotarev. Ideal metrics in the problems of probability theory and mathematical statistics. Austral. J. Statist., 21(3):193–208, 1979.
[595] V.M. Zolotarev. Probability metrics. Theory of Probability and its Applications, 28:278–302, 1983.
[596] V.M. Zolotarev. Contemporary Theory of Summation of Independent Random Variables. Nauka, Moscow, 1986. In Russian.
[597] V.M. Zolotarev. Modern theory of summation of independent random variables. Nauka, Moscow, 1987. In Russian.
[598] V.M. Zolotarev and S.T. Rachev. Rate of convergence in limit theorems for the max scheme. In Stability Problems for stochastic models, volume 1155, pages 415–442. Springer, 1984.
Abbreviations
Bold page numbers refer to this volume, non-bold page numbers to the other volume.
a.e.        almost everywhere    158, 385
ARCH        autoregressive conditional heteroscedasticity    39
a.s.        almost sure    8
BLIL        bounded law of the iterated logarithm    306
CLT         central limit theorem    85
ch.f.       characteristic function    400
CRI         communication resolution interval    38
CTM         Capetanakis–Tsybakov–Mikhailov    220
d.f.(s)     distribution function(s)    8, 107
dna         domain of normal attraction    306
DP          dual polyhedron    23
DTP         dual transportation problem    23
GARCH       generalized ARCH    39
htl         explained on page    433
IFS         iterated function systems    202
i.i.d.      independent identically distributed    35
KKR         Kakosjan, Klebanov, and Rachev    43
KRP         Kantorovich–Rubinstein transshipment problem    vii, 2
LCFS        last come first served    220
LHS         left-hand side    405
LLN         law of large numbers    81
lsc         lower semicontinuous    113
MKP         Monge–Kantorovich mass transportation problem    vii, 1, 19, 58
MKTP        classical Monge–Kantorovich transportation problem    374
MTPA        MTP with additional constraints    vii
MTP         mass transportation    vii, 1
MTPP        MTP with partial knowledge of the marginals    4
OTP         optimal transportation plan    3
PDE         partial differential equation    xii, xvi
PERT        network model    148
PP          primal polyhedron    23
r.f.(s)     random field(s)    248
r.v.(s)     random variable(s)    3
SDE         stochastic differential equation    39
SLLN        strong law of large numbers    30
supp P      support of P    20
TP          transportation problem    21
usc         upper semicontinuous    127
Symbols
Bold page numbers refer to this volume, non-bold page numbers to the other volume.
◦
A d A=A Ab Ab Adk
Am Am Aε A(h, g)
An (t) A+ n (t) An (α) An (α) Ap (H)
A(α) ◦ Ar f Q ASp (P1 , P2 ) A∗ (α)
interior of A 59 closure of A with respect to d 68 69 69 set of all linear subspaces Vk of IRd 137 69 69 139 assumption for a moment problem 62 148 148 190, 235 190, 235 optimal multivariate transshipment costs 158 263 393 97 191, 235
Aut(IRd )
a(s, k) au1 (x) a(Z)
B B∗ Bx B(1, n1 ) B(g)
BK (Si ) B n (α) Bkn (Zk ) Bn (m)
all invertible linear operators (automorphisms) 151 109 296 superlinear mapping 241 Banach limit 366 adjoint operator of B 132 109 Bernoulli distribution 257 assumption for the solution of a moment problem 62 61 191 191 set of nonnegative Borel measures 377
Bp (H) B(p; x, y) Br B(Si ) B(X, B) B1 (x, y) B2 (x, y) B(α) Bx (ε) (B, · ) b br bu1 (x) ba (P1 , . . . , Pn ) ba (S, B)
C, C i
Cp,d Cs,Z
Cs,εZ C(g)
Cb (S), C b (S)
C(T )
upper bound for Ap (H) 158 quadratic form 280 ball of radius r 149 72 58 28 28 263 109 separable Banach space C(T ) 248 transshipment 371 absolute moments 102 296 measures with fixed marginals 62 finitely additive measures 58 spaces of continuous and i times differentiable functions 255, 333 321 integrable (s − 1)-fold derivative 115 117 assumption for the solution of a moment problem 62 Banach space of bounded continuous real-valued functions on S 63, 164 Banach space 248
C γ (c; σ1 , σ2 ) C(Q) ◦
C(Q) ◦
C(Q)∗ Cb (IRd )
C(S)+ C(S)∗+ Cm (θ) Cov Cov (Xi |Fi−1 ) c c(i, j) c(x, y) c1 (x, y) c∗ (x, y) con(supp (P )) (D) D(h, g)
DP D(P, Q) Dp f (x) DΦ Dk (ϑ) Dm (θ) dr dr,k d(h) dn,m (x, t) d(x, y) d(X, Y )
307 set of continuous functions 384 quotient of the space C(Q) 384 conjugate space 393 set of all bounded continuous functions on IRd 152 166 166 170 covariance 108 conditional covariance 96 closure 304 discrete cost 27, 29 cost function viii, 10 ∂ = ∂x c(x, y) 128 reduced cost function 170 129 duality 76 assumption for a moment problem 62 dual polyhedron 23 50 optimal pairs 103 i ) 118 = ( ∂Φ ∂xj 112 170 smoothed version of d 137 61 divisor criterion 180 determinant of An,m (x, t) 397 76 uniform metric 137
dr (X, Y ) dKR (σ1 , σ2 )
dn (μ) dom fk dom Γ E Ek−1 (f ; Q) E(Si )
Es (X, Y ) ess sup ex H
F∗ FP Fi Fmn F mn (n)
Fn (Fs,X ) Fu F Mp (P1 , P2 ) Fi (s) F1 ∧ F2 (t) F1 ∨ F2 (t) F+ (x) F− (x)
F1∗ (x) F2∗ (x) F (x, y)
probability metric 170 Kantorovich– Rubinstein distance 162 267 253 235
F P (x, y) F σ (x, y) (−1) FNs (y) f fc
separable metric space 278 factor-norm 384 finite elementary functions on Si 76 set of points 425 essential supremum 386 extremal points of H 19
f cc f (m) f (n)c f∗ f ∗∗
Fréchet bound 19, 31, 33 distribution function of P 18 real distribution function 107 nth integral of m 375 survival function 375 385 423, 107 355 Fortet–Mourier metric 17, 51 293 infimal convolution 148 supremal convolution 148 := min(Fi (xi )) 107 := k Fi (xi )−(n−1) i=1
107
f (n)∗ f∗ f2 (u) fa (x) f (Z1 , Z2 ) fV (·)
Gk GQ
Gs,p Gα G|G|1|∞ G(m, α, β) Gs,X (t) G(u, v) +
Gσ (x, y)
12 12 extended Fréchet bound 19 26 19 310 Young–Fenchel transform 104 c-conjugate of f 124 doubly c-conjugate of f 124 mth Fréchet derivative of f 102 nth c-conjugate of f 124 p-conjugate 114, 124 second p-conjugate 102, 112 n-conjugate function of f 112 lower conjugate 103 38 145 extension f 317 translation by V 95 359 determination of an optimal measure Q 29 class of functions 103 geometric α-stable r.v. 242 71 grid class 41 424 graph of (u, v) ∈ DP 23 19
Gn (Z) G(μ) g= gr Γn g(χ) H H
(k)
Hn hβ
hμ (A × B)
h(t1 , t2 )
I Iq Is I[A] I(|f − g|) I(h) I{0, g, a, b} IND i(x1 , x2 )
255 μ-negligible open set 221 (g1 , ..., gN ) ∈ M 63 graph of Γn 194 363 Haar probability 133 distribution function of max(V1 , . . . , V ) 156 258 indicator or characteristic function 251 generalized upper Fréchet bound 54, 35 Hausdorff metric 248 175 unit matrix 334 operator 415 indicator function of a set A 139 semimetric on P(S) 67 65 69 = p (X, X) 76 indicator metric 111
JA
151
K(d, B) Kr (P, Q)
137 Kantorovich-type metric 48, 412 Kantorovich metric 412 Markov kernel 200
K1 (P, Q) K(x, ·)
kr
rth difference pseudomoment 122
Lévy metric 81, 109 ∞ L L∞ -space of functions 388 ◦∞ L 389 Lc 17 Lf continuous linear functional 401 Li 30 Ln class of nth integrals 47 5p L 139 (L1), (L2) 309 L[·]c (a, r, d) 58 L[·] (a, r, d1 , dr ) 60 59 L[·]c (r, d) 61 L[·] (r, d1 , dr ) 1 Lf (Pi ) 69 LSC(βS1 × βS2 ) 252 Lp (X, Y ) 132, 72 Lp -metric 76 Lp (X, Y ) 5 Lp,r (X, Y ) 140 5 L (X, Y ) probability L
p,r
∗p,t (X, Y ) L L∗p,t (X, Y ) Lr (μ) Lp (μ) L(ω, μ) Lipb Liph (r) Lip(r, S) 1 ∞ ∗1
metric 170 302 280 r-fold integrable functions 32 196 Lagrange function 311 bounded Lipschitz functions 88 Lipschitz norm 49 r-Lipschitz functions 163 Kantorovich metric 35, 86 bounded real sequences (ξT )∞ T =1 366 92
∗p,t (m1 , m2 ) 2 (P X , P Y ) p (P1 , P2 ) p (X, Y ) p (μ, ν) r (P1 , P2 )
280 = 2 (X, Y ) 132 p -metric 6, 87 76 334 smoothed version of 1 (of order r) 35, 87
Mc
set of measures 403 pseudometrics 423 linear space 384 set of measures 403 subset of Mr 403 Lévy measure 246 set of all signed Borel measures μ on IRn 47 40 41 280 280 59 60 15 81 142 142 35 measures with marginals Pi 58 finite signed Borel measures 375 319
Mi M ◦ , Mk◦ Mr (r > 0) Mr0 Ms Ms◦
Mμ Mμ (B) M 1 (CT ) Mp (CT , m0 ) M1 (c) M2 (c) MC (F, G) M (h, δ) MX (n) Mθ (n) M 1 (P1 , P2 ) M (P1 , . . . , Pn ) k
M (IR ), M
M (S) Mf (S), Mf (S×S)
M1 (U ) Mμ m(c) m0 (c) mX (n)
finite measures 36 probability measures 191 40 58 59 142
mθ (n) mn
142 375
N(m,σ)
normal distribution 188 309 normalized rounding error 81
Ns n−1 Sn,c
OTP(c)
(t)
P
P (Xu )u≤s P ∗ (A × B) P1,2 (B|A) P ∗ (h) PP Pε X P (μ)
P ∧μ
pN p mn (p, h) pX (t)
Qd
OTP with respect to c 3 marginal of P in the direction t 46 285 35 transportation plan 2 outer integral of h 65 primal polyhedron 23 approximation 93 stochastic optimization problem 49 infimum in the lattice of measures 41 180 density of mn 376 vector problem 180 density of the r.v. X 419
Qγ Qp,r Q(a)
set of d-quasi periodic points 355 309 140 256
Rp,r R = R(k, n) R(Y )
140 405 145
R(x) rba (S, R(E))
S1 S1 S2 S coll S ind γ S+ γ S− Sn Sn Sn∗
Sn,c Sn,m (S, ≤) L S1 S2 (Si , Bi ) S(c) S0 (c) (S, d) (SE) S(h) SLr (P1 , P2 )
Sp (P ) Spp (P1 , P2 ) (S, U) Sm (x, h) S(Y ) S(μ)
supp σ
x = xxM 137 x regular bounded additive measures 63
unit circle 47 421 421 129 129 313 313 80 simplex 181 sum of conventional roundings 80 total rounding error 81 255 topological space with closed preorder ≤ 44 316 measurable spaces 58 58 59 (separable) metric space 92 333 shift operator 65, 390 Skorohod– Lebesgue metric 34 97 dual form of Sp (P1 , P2 ) 97 measure space 36 392 F(Y )-Suslin functions 79 a symmetry group associated with μ 133 265
T Tr T C(u, v) Tp (t) T (λ) T← tA U UC U0 U[·]c (a, r, d) U[·]c (r, d) U[·] (r, d1 , dr ) U U Uμ (ϕ)
(U, · ) u X , uY (k) us u1n (x), u2n (x) Vi Vε V (S) V+ (S) V0 (S) Val(c; σ1 , σ2 , b) Var
val(c; σ1 , σ2 , b)
vr (X, Y )
v r (X, Y )
transformation 192 138 total costs 25 quantile function 32 a weighted sum 126 253 132 dual operator 40 17 415 57 58 60 norm 74 transportation problem with local upper bound μ 40 separable Banach space 86 densities 107 285 263 rounding error 81 finite covering ε-net 93 219 219 220 optimal value 252 total variation distance in X (IRd ) 134 optimal value of the dual problem 253 absolute pseudomoment 194 105
Wi W p = p
Wu w# wn+1 w|M wp,N (X)p+1 X∗ X ◦i Xs [x] := [x1 ] Xm:n , Fm:n
(X, T (X)) xα x, x∗
(Y, ≤)
Brownian motions 278 Lp -Wasserstein metric / Lp -Kantorovich metric 40 354 transposed function 172 “output” flow 71 restriction of w to M 180 249 topological dual space of X 112 76 normalized variation 308 conventional rounding 59 order statistic resp. its distribution function 156 Monge solution 3 384 bilinear form 112 ordered topological space 145
Z(·) Zk,n (X, Y ) Zn (X, Y ) 5 Z(X, Y ; s, p, α)
action profile 367 ideal metric 47 ideal metric 383 426
IBb (S × S), IBb (S)
bounded Borel functions on S × S, resp. on S 221 the d-dimensional Euclidean space bounded universally
IRd UIb (S)
ZZn + λ\1 Aϕ
Aγ A(c, )
AM Ac (P )
A (S, ) B, B(U ) B(c, )
B(En )μ Bm (S)
B(S)ν B(S)σ B0 (S) = σ(C(S)) C(c) C(c; σ1 , σ2 )
measurable functions 169 384 150, 155, 111 σ-algebra generated by a measurable function ϕ 420 optimal value 314 optimal value of the general Kantorovich– Rubinstein problem 163 class of M-analytic sets 167 generalized Monge– Kantorovich functional 87 199 Borel σ-algebra 30, 418 optimal value of the dual Kantorovich– Rubinstein problem 164 μ-completion of B(En ) 194 set of lower majorized Borel functions on S 145 ν-completion of B(S) 220 σ-completion of B(S) 167 Baire-sets in S80 set of stable imputations optimal value of the general Monge–Kantorovich mass
D Dn = Dn (μ) Dγ D(c; σ1 , σ2 )
D(x) D()
Eθ Eθ,u F2 FA+B FA−B Fr Fr FZ F(A, B) F(A, B, F σ ) F(F1 , F2 ) 1
F (R) F b (S) F(S) Fo (S) Gp G(A, B, Gσ ) G(m, Λ, α, β) G(S) Gb (S)
transfer problem 164 the diagonal in S × S 210 81 311 optimal value of the dual Monge– Kantorovich mass transfer problem 164 86 Borel measures with given marginal difference 14 177 177 421 set of d.f.s 11 set of d.f.s 13 class of functions 412, 102 104 distribution function of Z 184 18 19 joint d.f.s F with marginals F1 ,F2 51, 1 distribution functions 421 bounded upper semicontinuous functions 70, 74 upper semicontinuous functions 70 219 pairs of bounded continuous functions 97 19 grid class 332 lower semicontinuous functions 70 70, 74
H(F1 , F2 ) Id (P1 , P2 ) K K(P ) L L, Lo Lm
L(h; δ) L1 (R, P ) Lp (P ) L1 (Pi ) L(X, Y ) L(Y ) M1 M(P1 , P2 ) Mp (X) N Oε (P0 ) PH Pi
P2 PL Pγ Pμμ12 P(S)
relaxed marginal class 52, 3 94 Kantorovich metric 417 dual Monge–Kantorovich functional 87 L´evy stable motion 240 class of topological spaces 219 measurable functions bounded below 62 71 P -integrable functions 63 97 62 joint distributions 414 F(Y )-Suslin sets 79 class of laws 245 probability measures with given marginals 3 334 67 neighborhood of P0 92 space of probabilities 87 Borel probability measures on a product of i copies of (S, d) 27 322 class of all P ’s on L 31 set of measures 309 37 space of tight probabilities on S 64
Pb (S) P m (S), P P (U ) P(μ, Q)
PL (μ, σ) R R R(×i = 1n Bi ) S1 U U(S) V X Xc X50∗ (X )2 , L(X, Y ) Xs∗ X (B) X (C[0, 1])
Xp (CT , m0 ) X (R) X (IRk )
X (T, g, a) X (U ) Z
70 69, 96 multivariate compound Poisson distribution 129 33 ring 63 class of rules 184 63 class of laws 245 set of input flows U 74 universally measurable sets 167 set of output flows V 74 space of real random variables 414 class of r.v. belonging to X ∗ 427 426 space of joint distributions 414 417 space of random fields 248 space of r.v.s on a nonatomic probability space 54 class of processes on CT 280 set of all real-valued r.v.s 62 class of k-dimensional random vectors 103 space of X ∈ X (C[0, 1]) 63 space of U -valued r.v.s 86 291
Z1
class of Z-laws 246
α α1G1 ×···×Gn [α1 , . . . , αn ] αs,p (X, Y ) βS
384 73 403 107 Stone–Čech compactification of S 225 176 set of transshipment plans 384 set of signed Borel measures Ψ on IR2n 47 set-valued mapping 236, 302 finite collection of functions 307 dual representation of λpp (P1 , P2 ) 105 quasi-antitone 109 78 kth difference of f with step h 384 function class 47 180 180 59 61 60 389 discrete measure 400 absolutely continuous marginal difference 378 391 392 rate of completing the final mass 372 55
Γj Γμ
Γμ
Γn γ γpp (P1 , P2 )
Δ-antitone Δj Δkn Δkr Δs Δ∗s Δr,a Δr,θ Δθ Δkx;h1 ,...,hk Δα x;d Δb (·)
Δα t f (x) Δkh Pm (x) Δ(x)
δ
δx δp (T ) ζ ζF ζr
ζr ζn (P1 , P2 ) ζs,p (X, Y ) θs ϑs,p κ κ2 κn κr
κm (X1 , θ)
Λ λ = λ + − λ− Λkϕ λpp (P1 , P2 ) λ(X, Y ) μ 5 μn μr μγ μ(ε)
Dirac measure at x 207 measure of deviation 72 Zolotarev metric 416 Zolotarev ζ-metric 110 extension of the Kantorovich metric 102 modification of ζr 104 Zolotarev metric of order n 46 ideal metric 417 ideal metric 81 ideal metric 102 Kantorovich metric 88, 417 315 382 rth difference pseudomoment 143 difference pseudomoment 177 homogeneously convex functional 415 Hahn decomposition 93, 36 generalized Lipschitz space 394 105 λ-metric 423 characteristic function of μ 132 a measure 267, 322 convolution type metric 134 optimal solution for C γ 309 probability 24
◦
μc (·|·) μ 5c (·|·) μ∗ (A × B) μ 5(P1 , P2 ) μ(P1 , P2 ) μ(·G) μ(·, S), μ1 (·) μ(S, ·), μ2 (·) μF (X, Y ) μr (X, Y ) top
μ ≺ ν
νr∗ ν∗ (g) ν ∗ (g) νr ν(ϕ) ξr
πi
π1 μ(B) π2 μ(B) ΠK (x) π∗
π(X, Y )
(X, Y )
Kantorovich– Rubinstein functional 14 Kantorovich functional 3 36 μ-minimal metric 110, 417 105 G-dependence metric 37, 94 fixed marginal distribution 53 fixed marginal distribution 53 functional in X × X 419 probability metric 170 ν-convergence implies μ-convergence 134 137 147 147 136 220 rth absolute pseudomoment 143 projection on the ith coordinate 155 := μ(B × S) 163 := μ(S × B) 163 projection of x on K 122 optimal admissible permutation 16 Prohorov metric 417, 86 Kolmogorov (uniform) distance 24, 133 109, 184
p t K t w t σ σi σM σr∗ σ ∗ (X, Y ) σ(P1 , P2 ) σr σ r (P1 , P2 ) τK
τr τr∗ τr τ (X, Y )
ϕ(ε) ϕ(μ) ϕ (τ ; t) Φ ΦS (θ) Φσ χ
χ∗ χr
Kolmogorov metric 111 mapping 180 K-stationary divisor 182 Webster’s rule 182 permutation 254 discrete measures 407 supremum of the set Φ(σ, M ) 180 92 134 total variation metric 30 87 smoothed version of σ 35 topology generated by K 90 moment-type condition 88, 135 92 13 compound metric, τ -metric 373 97 optimal value of P (μ) 49 characteristic function 46 standard normal d.f. 266 Laplace transform 246 d.f. of N (0, σ 2 I) 325 uniform distance between characteristic functions 137 “tB -uniform” version of χ 137 “smoothed” version of χ 137
χn,c (m) χn,c (P1 − P2 ) χp (X, Y ) χ 5p (X, Y ) ψ(μ)
(Ω, A, P ) ωk (f, t)
ωk (f ; Q; t) ω(γ) f c · · ∞ mn μ||k,r ||h||H i X i − X T,p XT · bL X∗T,p X∗T,∞ b∞ Dis1 ,...,is ·f q,j (x) x − yp uC b (S) (ξT )∞ T =1 mb,c μr ◦
fk
absolute pseudomoment 382 382 metric 249 minimal metric 249 solution set corresponding to P (μ),ϕ(μ) 49 probability space 8, 414 kth modulus of continuity of f 384 393 405 Lipschitz norm 16 norm on 40 supremum norm 91 Kantorovich– Rubinstein norm 46, 378 minimal function on Mr◦ 48 seminorm of h 49 286 300 bounded Lipschitz norm 306 312 312 318 103 p-norm 158 uniform norm on C b (S) 164 norm of ∞ 366 Fortet–Mourier metric 382 383 ◦∞
seminorm on L 389
μk,ϕ
V V n L B(Bi , Pi ) i=1 n E
(Si , Bi )
i=1
∨-stable ∧-stable x∨y x∧y ∧ (−∞)1x1 ≥x2 ∂A0 (c, ·)(0) ∂f (x) ∂c f ∂p(0) ∂V (c, ·)(0)
generalized Kantorovich– Rubinstein norm 394, 404 norm 74 direct sum of Bi -measurables 61 product 58 69 69 = max{x, y} 4 = min{x, y} 4 min 19 77 subdifferential 268 subdifferential of f 104, 287 c-subdifferential of f 125 p-subdifferential 178 subdifferential 243
∇f (x) (·, ·) (·)+ ≤st |S1
α! α β
[r] [x]c x |W i |T,∞ [t]G , [t]∗G
= grad f (x) = ∂f (x) 115 inner product in IRd 142 = max(0, ·) 71 the stochastic ordering 147 restriction to S1 290 lexicographic order 397 403 convolution of measures 411 403 integer part of the number r 44 c-rounding of x 53 smallest integer larger than or equal to x 231 318 336
Index
Bold page numbers refer to this volume, non-bold page numbers to the other volume.
α-stable, 86 Abel method, 126, 128 absolute pseudomoment(s), 142, 382 absolutely monotonic, 424 abstract duality theorem, 178, 242, 244, 255, 260 version of the dual problem, 175 of the Kantorovich–Rubinstein functional, 179, 188 of the mass transfer problem, 172, 175 action profile, 367, 368 admissibility of (fi ), 153 admissible, 2 permutation, 16 affine maps on IRk , 202 analytic sets, 177, 190 antithetic variates, 154 apportionment theory, 53 approximable compactly, 63 approximate
extension property, 292 theorem(s), 292 approximating algorithms, 12 model, 79 approximation finite-dimensional, 92 model, 77 of mass transfer problems, 306 of queueing systems, 71, 72 of the distribution of sums, 128 optimal rate, 93 queues, 76 theorems, 306, 307 arbitrary directions, 43 mutually dependent, 72 arbitrary compact space, 198 Arzela theorem, 202, 218 asset returns, 39 assignment games discrete and continuous, 25 asymptotic distribution, 273 normality, 224
attracted trajectory, 358 automorphism, 150 autoregressive conditional heteroscedasticity (ARCH), 40 modeling of asset returns, 39 auxiliary theorem on convex sets, 179 Baire function, 80, 167, 220 measurable functions, 177, 302 σ-algebra, 197, 217 sets, 80, 219, 302 subset, 219 balancing condition, 372 Banach lattice, 166, 173, 177, 209, 301, 366 conjugate, 180, 187 of bounded real-valued functions, 292 limit, 366 space(s), viii, xvi, 166, 389 conjugate, 227 dual, 166, 251, 261 isometric isomorphism, 405 real, 112 separable, 4, 33, 329, 354 Barnes–Hoffmann greedy algorithm, 20 Berry–Esseen bound, 258 type result, 117 theorems, 113 Berry–Esseen theorem, 255 Berry–Esseen type result, 255 Beta-distributed, 214 biconjugate function, 112 bilinear form, 112 binary random trees, 254 relation, 322 search trees, 260, 263
BLIL, see bounded law of the iterated logarithm Boltzmann-type equation, 277, 307, 318 Bonferroni bounds, 151 bootstrap approximation, 199 estimator, 198, 199 sample, 198, 199 Borel extension problem, 301 function, 167, 177, 199, 220 measurable function, 339 measure, 167 on a compact space, 166 method, 126, 128 probability measure, 89 σ-algebra, 63 set(s), 219, 301, 302 subset, 295 bounded Kantorovich metric, 79 law of the iterated logarithm (BLIL), 306 boundedness from below, 208 of the cost function, 169 bounds for mn in the multivariate case, 382 for the deviation of two dependent queueing systems, 72 for the total transportation cost, 158 of deviation between probability measures, 51 to the total cost, 158 branching processes, 216 type recursion, 206 with multiplicative weights, 207 Brownian motion, 309, 314 motions, 278 bucket algorithm, 272 Burgers’ type equation, 289
c-conjugation, 124 c-convex minorant, 125 c-convexity, 124 c-coupling optimal, 123, 130, 131 c-cyclic monotone, 131 monotonicity, 131 c-cyclical monotonicity, 126 c-optimal couplings, 127 c-optimality, 130 c-rounding, 53, 57 lower bounds, 58 c-subdifferential, 125 c-subgradient, 125 C 1 -operator, single-valued, 288 cadlag functions, 328 Cantor’s diagonal method, 360 capacity, 79 Capetanakis–Tsybakov–Mikhailov (CTM) protocol, 220 Carlson’s inequality, 174 lemma, 323 case of equiprobable atoms, 16 Cauchy-Schwarz inequality, 345 central limit theorem (CLT), 103, 179, 204, 226, 263 for the total wealth, 240 functional, 241 local, 137 quantitative version, 264 rate of convergence, 34, 374 Cesàro method, 126, 128 chance discretization points, 42 chaotic, 278, 288, 301, 307, 319 characteristic function, 231 characterization classical Hoeffding–Fréchet, 52 of c-optimal couplings, 127 of optimal ℓ2-couplings, 116 characterize the duality theorem, 259 charge, 83 choice function, 352 problem, 352 theory, 352
Choquet’s Theorem, 79 classes AB0 (S × S), 197 classical Hoeffding–Fréchet characterization, 52 Kantorovich–Rubinstein functional, 394 classical multiple-access problem, 220 closed, 361 formula for mn in the univariable case, 380 preorder, 322–324, 327, 336, 340, 341, 344 set-valued mapping, 358 subspace, 172 closeness, 273 between Sn and Sn∗ , 81 in terms of a weak metric, 43 CLT, see central limit theorem coarse grid, 41 common probability space, 339 communication resolution interval (CRI) algorithm, 192 compact, 64, 70, 359 case metrizable, 190 nonmetrizable, 196 measures, 64 space, 161, 219 arbitrary, 198 metrizable, 170, 190, 208 compactly approximable, 63 compensatory transfers, 367 competitive equilibria models, 340 complex queueing models, 53 compound metric, 373 computer tomography paradox, 51 conditional covariance, 96 measure, 327 conditionally independent, 95 conditions for a nontrivial explicit solution, 281, 285
for duality in the Monge–Kantorovich problem, 248 on the cost function, 176 conjugate Banach lattice, 187 function, 112 functional, 178 connectivity hypothesis, 363 continuity, 70, 72 continuous, 358 and discrete mass transportation problems, 23 function, 300, 330 increasing, 329 isotone, 325 functionals, 68 linear functional, 404 transformation, 404 linear interpolation, 338 partial derivatives, 280 selection theorem, 302 utility, 337 function, 329, 330, 334 -utility-rational, 352 function, 352 continuously differentiable, 280 contraction method, 37, 192, 254, 264 of Φ, 314 of Φ with respect to ∗p,t , 297 of Φ with respect to the ∗p,t -minimal metric, 283 of Φ with respect to the minimal metric ∗p,t , 304 of stochastic mappings, 277 of transformation, 191 contractive mapping, 200 conventional rounding, 59 convergence of a net to a point, 261 of algorithms, 37 of recursive algorithms, 204 converse to the duality space, 86 convex, 64, 103, 112 biconjugate
function(s), 112 cone, 179, 184 thick, 179, 184 conjugate function(s), 112 functional, 178 sets, 184 auxiliary theorem, 179 subset, 179 convex function, 289 convexity, 285 convolution argument, 380 of a measure, 380 of measures, 412 property, 380 copula, 7 corner rule generalized northwest, 24 northwest, 2, 7, 17, 22 Hoffman’s, 26 multivariate version, 34 southwest, 25 cost of shipping a unit commodity from origin i to destination j, 27 cost function(s), viii, xii, 2, 170, 172, 198 Δ-antitone, 109 bounded below, 170, 208 boundedness, 169 condition, 176 duality theorem for symmetric, nonnegative, 16 lower semicontinuous, 365 nonsymmetric, 12 quasi-antitone, 109 reduced, 170, 190 regular, 176, 279 semimetric, 14 strictly positive, 365 symmetric, 4, 11 coupling(s) optimal, 112 couplings, 323 Courant–Fischer lemma, 137
CRI, see communication resolution interval CTM, see Capetanakis–Tsybakov–Mikhailov cyclic -monotone, 115 maximal, 116 operator(s), 287 operator(s) and mass transfer problem, 288 -monotonicity, 115, 289 condition, 131 cyclical monotone function, 10 -monotone function, 38 Δ-antitone cost functions, 109 d-closure of the set of upper semicontinuous functions, 81 d-Lipschitz, 349–351, 357 -utility-rational, 352 choice function, 352 utility function, 349 d -Lipschitz, 351 d1 -Lipschitz, 351 d-quasiperiodic, 355 d-valuation, 346, 348 Debreu theorem, 323, 329, 335 demand distribution, 2 demand theory, 370 density of Lip (c, S; X), 293 density, 376 density coupling lemma, 324 deviation between probability measures, 51 Diaconis and Freedman results, 179 diagonal method of Cantor, 361 difference between λp and γp , 106 pseudomoment, 142, 176 differentiability of functions, 115 differential equations stochastic, 277 diffusion with jumps, 331 Dini’s theorem, 74 Dirac measure, 207, 258, 311
disastrous event, 237 discrete and continuous assignment games, 25 mass transportation problems, 23 case, 35 marginal measure, 313 metric, 93 Monge condition, 53 transportation problem, 2 discretization of the SDE, 332 point(s), 41, 42 Wiener process, 336 distance between X and Y , 62 in probability, 417 distance from point x to set A, 333 distribution asymptotic, 273 demand, 2 function, 4 multinomial, 272 of the exact solution, 348 of the past, 285 supply, 2 uniform, 272 divisor criterion, 180 rule(s), 180 of (1/t)-rounding, 180 stationary, 181 Dobrushin’s result on optimal couplings, 36 Dobrushin’s theorem, 93 domain of normal attraction, 132 of normal attraction (dna), 306 DTP, see dual transportation problem dual Banach lattice, 173 Banach space, 261 extremal problem, 259 linear extremal problem, 163 Monge–Kantorovich
functional(s), 64, 87 problem, 247 polyhedron (DP), 23 problem, 58, 217, 219 of the nontopological version of the mass transfer problem, 265 optimal value, 253, 268 representation, 5, 14 for Lp -minimal metrics, 96 transportation problem (DTP), 23 dual representation of p , 201 duality for Suslin functions, 79 problem in a mass setting, 242 relation, 212, 213 representation for mn , 379 results of KRP, 13 theory for mass transfer problems, 161 duality theorem(s), 171, 207, 214, 219, 225, 277, 375 abstract, 178, 242, 244, 255, 260 characterization, 259 for L̂p , 151 for a marginal problem with moment-type constraints, 251, 253 for a nontopological version of the mass transfer problem, 265, 272 for compact space, 161 for infinite linear programs, 241 for mass transshipments on a compact space with constraints on the marginal kth difference, 402 for semicontinuous functions, 76 for symmetric, nonnegative cost functions, 16 for the KRP, 15 formulation, 175 general, 7, 82, 84
in mass settings, 168, 241 in topological setting, 76 more general, 211 of Isii, 59 of Kantorovich–Rubinstein, on noncompact spaces, 222 on a metrizable compact space, 208 on arbitrary compact space, 169, 171, 207 on metrizable compact space, 170 on noncompact spaces, 211, 222, 232 and general cost function, 225, 234, 238 with, 238 with continuous and cost function, 234 with continuous cost function bounded below, 223 with cost function satisfying the triangle inequality, 222 with metric cost function, 86 Dubovitskii–Milyutin theorem on convex sets, 180, 184 Dudley’s problem, 6 dynamic optimization problem, 363 dynamical system, 354, 358 dynamics of a queueing system, 74 ε-coincidence of marginals, 50 of moments, 50 efficient infinite trajectory, 364, 365 empirical measure, 322, 326 environmental processes, 203 equicontinuous, 201, 202 Euclidean case, 96 norm, 323 Euler constant, 230, 271 method, 126, 128, 337 summation formula, 268 existence
Index of optimal measures, 270 of optimal solutions, 217 explicit 5 p in representations for L X (IR), 152 explicit solution of the mass transfer problem with a smooth cost function, 276 exponent, 132 exponential convergence rate, 196 rate of convergence, 219, 253 exponential topology, 340 extension of a function, 299 of the Kantorovich metric, 102, 183 of the Kantorovich–Rubinstein theorem, 406 problem, 290, 295 solution, 296 theorem(s), 295, 325 extremal marginal problem, 251, 258, 307 points, 19 problem(s) linear, 241 solution of, 139–141 value, 28 Fenchel–Moreau theroem, 178 final mass, 372 fine grid, 41 finite -dimensional linear programs, 307 Borel measure on a compact space, 166 dimensional approximation, 92 bounds, 93 case, 92 measure, 35, 265 trajectories, 364 finiteness, 66, 254 of ζm (X1 , θ), 176 of Cm (θ) and Dm (θ), 173 of I, 174
of the metrics μr , χr , dr , and 5 p,r , 169 L of the upper bounds, 122 fixed marginal distributions, 53 moments, 52 moments, 53 fluctuation inequalities, 93 formal equiprobable case, 16 Fortet–Mourier metric, 17, 50, 382 Fréchet bound(s) lower, 2, 17 majorized, 42 upper, 2, 17 usual, 42 bounds, 2 bounds generalized upper, 54 sharpness of, 152 condition, 19, 21 differentiable density, 90 problem, 262 topological version, 262 -problem, 152 space, 339 type bound, 24 Fubini theorem, 5 full distribution, 132 probability distribution, 131 strictly operator-stable distribution, 146 random vector(s), 143, 151 function (convex) biconjugate, 112 (convex) conjugate, 112 bounded below, 171 differentiability, 115 isotone, 324 monotone, 11 optimal, 10 functional sublinear, 244 functional central limit theorem (CLT), 241 functionally
closed preorder, 324, 327, 336, 341 preorder, 324, 327
G-dependence metric, 94 G-dependence metrics, 37 G-measurable random variable, 94 Γ(p, λ)-distributions, 247 Galton–Watson process, 206, 216 normalized, 216 Gamma -distributed, 214 gamma distribution, 215 GARCH, see generalized autoregressive conditional heteroscedasticity Gaussian processes, 120 Gel’fand compactum, 225 general case, 123, 402 cost functions, 123 duality result, 245 theorem, 7, 82, 84 Kantorovich–Rubinstein mass transshipment problem, 244 problem (KRP), 163 Monge condition, 24 Monge–Kantorovich mass transfer problem with given marginals, 164 mass transportation problem (MKP), 247 problem on continuous selections, 303 generalization of the Monge–Kantorovich mass transportation problem, 29 generalizations of Debreu theorem, 329 generalized autoregressive conditional heteroscedasticity (GARCH), 40 modeling of asset returns, 39
Kantorovich–Rubinstein norm, 394 Monge–Kantorovich functional, 87 subsequence, 187 upper Fréchet bound, 54 geometric α-stable r.v., 242 Lévy stable motion, 243 strictly stable distributions, 243 geometrically distributed, 237 global minimum, 129 Gnedenko’s extreme-value theorem, 232 greedy algorithm(s), 6, 17, 20, 22 solution(s), 7 greedy recursion, 22 grid class, 41, 339 coarse, 41 fine, 41 points, 41, 338 Gronwall inequality, 317 lemma, 282, 301, 321, 341 Hölder condition, 418 Haar probability, 132 Hahn decomposition, 93 Hahn–Banach theorem, 61, 393, 402 Hausdorff locally convex linear topological space, 265 space, 178 metric, 333 Hausdorff metric, 248 Hoeffding–Fréchet bounds, 107, 151 lower, 31–33 upper, 20, 21, 31 characterization, 52 inequality, 17 upper bound, 20, 21 Hölder’s inequality, 108, 339 multidimensional, 339 homogeneity, 143, 413
homogeneous, 94, 174 functional, 414, 427 metric, 416, 422, 424, 428 metric(s), 422 ideal Kantorovich metric, 87 metric, 81, 82, 87, 102, 107, 183, 193, 233 of Zolotarev, 275 metric(s), 30, 223, 371, 374, 383, 411, 414, 415, 421, 424 of Zolotarev, 381 properties of the metric Kr , 412 ideality for a probability metric, 143 identical mapping, 171 IFS, see iterated function system image encoding, 199 implementable, 368 imputation, 25 feasible, 25 individually rational, 25 stable, 25, 26 increasing continuous function, 329 convex function, 26 function, 47, 380 sequence, 72 indicator function, 177, 231 metric, 111 indicator cost function, 36 inequality of Marcinkiewicz–Zygmund, 288 infinite -dimensional linear program, 241 dimensional network flow problem, 378 exchangeable sequence, 327 trajectory, 364, 365 initial mass, 372 input of laws, 71 interacting diffusion, 278
diffusions, 279 drifts, 279 intrinsic properties of probability metrics, 113 inversion, 254 in a random permutation, 254 isometric isomorphism, 404 isometry, 396, 405 isotone, 146 completion, 146 function, 324, 325 functionals, 65 real-valued function, 324 with respect to ω , 337 iterated function system (IFS), 201 Itô type SDEs, 332 Jordan decomposition, 166, 167, 179, 270, 314 k-minimal metric, 110 Kantorovich equality, 246 formulation, 2 of the MTP, 2 functional, 3, 14, 29 L2 -minimal problem, 132 on IRd , 132 metric, 35, 85, 86, 88, 90, 102, 138, 183, 200, 322, 412 p , 76 bounded, 79 extension, 183 generalized, 424 optimality criterion, 163 radius, 53, 54, 56 rth pseudomoment, 184 theorem, 88 Kantorovich–Rubinstein distance, 163 duality theorem on noncompact spaces, 222 functional, 14, 17, 179, 183 abstract version, 179, 188 classical, 394
mass transshipment problem, 50, 162, 244, 275, 281, 371 duality results of, 13 topological properties, 13 metric, 306 minimal functionals, 246 norm, 46, 382, 404 generalized, 394 problem (KRP), 2, 161, 372 duality theorem, 15 general, 163 optimal transportation plan (OTP), 2 original, 163 seminorm, 378 theorem, 412 extension, 406 transshipment problem (KRP), vii, xi Kemperman equality, 410 Kingman’s subadditive ergodic theorem, 214 Kirchhoff equation, 13 Kirszbraun–McShane extension, 91 Kolmogorov distance, 24, 183, 271 metric, 188 weighted, 232 uniform distance, 24 Kolmogorov metric, 111 Krein–Milman and Choquet theorem, 19 Krein–Smulyan theorem, 251, 256 KRP, see Kantorovich–Rubinstein problem KRP, see Kantorovich–Rubinstein transshipment problem kth modulus of continuity of f , 384 Ky–Fan metric, 417 λ-metric, 423 L1 -variation, 314 Lp -Kantorovich metric, 40 Lp -Wasserstein metric, 332, 348
1 -convergence, 86 L2 -minimal problem, 132 L2 -Kantorovich metric, 322 Lp -Wasserstein metric, 40 (p , ε)-independence, 77 (p , ε)-independent, 76 5 p -convergence, 152 L Lp -distance, 332 Lp -Kantorovich problem on mass transportation, 53 p -Kantorovich metric, 253 Lp -metric, 138 minimal, 194 Lagrange function, 312 λ-metric, 43 Laplace transform, 246, 247 largest c-convex minorant, 125 elements of the marginal, 12 lattice superadditive, 17 lattice measure, 396 learning algorithm, 204 Lebesgue integrable, 339 Lebesgue–Fatou lemma, 206 lemma of Carlson, 323 of Courant–Fischer, 137 of Gronwall, 282, 301, 321 of Lebesgue–Fatou, 206 of Pollard, 93 of Robbins–Siegmund, 206 of Urysohn, 176, 328 of Zorn, 291 LePage decomposition, 103 representation, 91, 124, 125 less concordant, 33 Lévy distance, 108 measure, 241, 246, 247 metric, 102, 183 process, 241 Lévy metric, 423, 424 generalized, 423
lexicographic order, 397 limit laws, 236 linear combination of measures, 400 extremal problem, 163, 241, 242 function(s), 119 preorder, 322, 323 programming duality, 5 transportation problem, 315 programs, 307 transformation, 401 linear interpolation of the trajectories, 338 Lipschitz assumption, 309 condition, 5, 332–334 relaxed, 308 stronger, 301 constant, 334 function, 200 norm, 98, 379 preorder, 333 on a metric space, 332 space, 394 utility function, 344 local bounds for the transportation plans, 36 in the transportation problem, 35 upper bounds on the transportation plans, 40 locally convex Hausdorff space, 286 space, 178 logarithmic normalization, 266 lognormal distribution, 231 lower bounded semicontinuous cost function, 62 bounds c-rounding, 58 Fréchet bound, 2 Hoeffding–Fréchet bound, 31
semicontinuity of c, 214 semicontinuity of c∗ , 214 semicontinuous, 62, 77, 113, 169, 178, 188, 201, 243, 244, 259, 358, 361 convex function, 103 cost function, 365 function, 70, 171, 176, 227 Lusin C-property, 343 separation theorem, 178, 230 extension, 225 theorem, 74 Lyapunov theorem, 261 μr -closeness, 223 M-analytic, 167 function, 167 m-buckets, 272 m-chaotic, 301, 307, 319 μ-completion, 194 μ-measurable selection, 216 sets, 195 μ-minimal metric, 111 μ-negligible open set, 221 μ 5c -convergence, 29 Maejima–Rachev construction, 104 majorized Fréchet bounds, 42 Marcinkiewicz–Zygmund inequality, 166, 288 Marcinkiewicz–Zygmund inequality, 301 marginal distributions, 53 elements, 12 moments, 52 marginal(s), 83 and perfectness, 83 constraints, 54 extensions and perfectness, 83 measures, 145 Markov chain, 236 kernel, 199, 200
models of interacting particles, 277 Markov inequality, 392 martingale, 211 case, 94 inequalities, 340 mass transfer problem, 161, 162, 175, 198, 219, 275, 365 abstract version, 175 and cyclic-monotone operators, 288 approximated, 307 approximation, 306 noncompact version, 220 nontopological version, 265 dual problem, 265 duality theorem, 265, 272 on completely regular topological spaces, 221, 232 optimal value, 275 with continuous cost function, 306 with given marginal difference, 162, 163, 221, 244 on compact space, 313 with given marginals, 245 mass transportation problem, 414 with fixed sum, 10 with stochastically ordered marginals, 10 mass transportation problem (MTP), vii, xi, xiii, xvii, 1, 27 and probability distances, 27 approximation of, 4 continuous and discrete, 23 general, Monge–Kantorovich (MKP), 247 of Monge–Kantorovich (MKP), vii, xi on IRn , 158 specialized, 51 with additional constraints (MTPA), vii, xi with partial knowledge of the marginals (MTPP), 4 mass transshipment problem, 13, 371, 378
condition for nontrivial solution, 285 Kantorovich–Rubinstein, 50, 162, 275, 281 necessary condition, 280 for a nontrivial solution, 281 optimal value, 381 with constraints on derivatives of marginals, 378 mathematical economics applications, 322 matrix problem, 182 MAX-algorithm, 254, 257 max-geometric infinitely divisible, 239 max-operator-stable limit theorem, 132 maximal compactification, 226 concentration on the diagonal, 16 cyclic-monotone, 116 dependence, 151 element of a set, 291 measure, 146 maximally dependent random variables, 155 maximum of sums, 144, 148 probability of sets, 144 McKean example, 279 interacting diffusion, 278 McKean–Vlasov equation, 299, 305, 309 McKean–Vlasov equation, 278 MD-operator, 419, 425, 426 measurable function, 420 mapping, 235 selection theorem, 194, 217, 235, 237 measures with a large number of common marginals, 43 method generating function, 261 of antithetic variates, 154
of probability metrics, 204, 273 metric(s) compound, 373 ideal, 30, 81, 82, 102, 183, 193, 223, 374, 411 indicator, 111 k-minimal, 110 Kolmogorov, 111 Lp -Kantorovich, 40 2 -minimal, , 112 μ-minimal, 111 minimal, 30, 374 nonpathological, 185 protominimal, 111 simple, 373 space preorder, 332 separable, 332 metrizable, 190 compact case, 190 space, 170, 190, 208 topological spaces, 337 Michael’s selection theorem, 306, 339 middle inequality, 291 Milshtein’s method, 337 minimal p -metric, 191 distance between X and Y , 62 functionals, 246 L0 -metric, 138 1 -metric, 45 2 -metric, 112 Lp -metric, 138 p -coupling, 124, 131 Lp -metric in the space of probabilities, 87 p -metric, 124 Lp -metrics, 194 mean interaction, 307 metric, 417 metric(s), 30, 45, 110, 111, 374 network flow problem, 13 representation of metrics, 45 variance of the sum, 155 minimality of ideal metrics, 414
5 p , 140 property of L Minkowski inequality, 343 minorant, 125 MKP, see Monge–Kantorovich problem MKP, see Monge–Kantorovich mass transportation problem MKTP, see Monge–Kantorovich transportation problem moment formulas, 264 generating function, 255 problems, 52 moment-type marginal constraints, 54 Monge condition, 2, 11, 22, 39, 53 generalized, 24 formulation of the MTP, 2 function, 25 problem, 162 solutions, 118, 129 Monge–Ampère PDE, 123 Monge–Kantorovich functional(s), 5, 65, 179, 183 dual, 64, 87 generalized, 87 primal, 64 mass transfer problem with given marginals, general, 164 mass transportation problem (MKP), vii, ix, xi, xiii, 1, 34 abstract version, 17 generalization, 29 multidimensional, 23 optimal transportation plan (OTP), 2 with capacity constraints, 35 minimal functionals, 246 problem (MKP), 246, 418 conditions for duality, 248 dual, 247 multivariate, 58 with given marginals, 162
transportation problem (MKTP), 374 classical, 374 monotone, 203 convergence, 308 function, 11, 147 cyclical, 10 operator, 287 seminorm, 86 Zarantonello-, 11 Monte Carlo simulation, 151 Moreau’s theorem, 122 MTP, see mass transportation problem MTPA, see mass transportation problem with additional constraints MTPP, see mass transportation problem with partial knowledge of the marginals multi-dimensional martingale inequalities, 340 multichannel models, 74 –multiphased model, 74 multidimensional MKP, 23 multifunction, 363 multinomial distributed, 272 multivariate compound Poisson distribution, 128 normal distribution, 325 setting, 241 summability methods, 126 version of Hoffman’s northwest corner rule, 34 MYZ-rounding, 59 MYZ-rule of rounding, 180 ν-completion, 220 ν-measurable, 220 necessary condition for a nontrivial solution, 281
a nontrivial solution of the mass transshipment problem, 280 the duality relation, 212 negative cost of shipping a unit commodity from origin i to destination j, 27 network, 363 flow problem, 6, 15 minimal, 13 node j, 10 non-Markovian case, 293 nonatomic market games, 370 noncompact reduction theorem, 234 version of mass transfer problem, 220, 244 nondecreasing, 203 nonincreasing function, 394, 421 nonincreasing function, 248 nonmetrizable case, 197 compact case, 196 nonnegative, 66, 203 lower semicontinuous cost function, 213 Radon measure, 166 nonpathological metrics, 185 nonsymmetric cost functions, 12 nonsymmetric case, 8, 12 nontopological version of the mass transfer problem, 265 dual problem, 265 duality theorem, 265, 272 nontotal and nontransitive preference, 344 nontraditional measurable selection theorem, 235 nonuniqueness of optimal solution, 312 nonvoid, 64, 70 normalization condition, 180, 181, 185 normalized rounding error, 81 normed space, 179
northwest corner rule, 2, 7, 17, 22 generalized, 24 Hoffman’s, 26 multivariate version, 34 variant, 20 nth integral, 375 numerical approximation of stochastic differential equations, 39 one dimensional case, 120 one-dimensional standard Wiener process, 341, 342 one-dimensional case, 132 operator -ideal metric, 143 -stable, 132 full distribution, 132 limit theorem, 131 random vector(s), 131, 138 strictly, 132 optimal admissible permutation, 16 c-coupling, 123, 130, 131 coupling(s), 6, 16, 37, 112, 117, 123 Dobrushin’s result, 36 of Gaussian processes, 120 with local restrictions, 36 couplings, 318 distribution, 39 feasible, 18 finite trajectories, 364 function, 10 joint distribution, 22 2 -couplings, 116 measure, 26, 28, 30, 217, 221, 248, 270 multivariate transshipment costs, 158 over the class of K-stationary rules, 187 pair, 103 rate of approximation, 93 rounding rule, 185 roundings in terms of ideal metrics, 179 rule of rounding, 183
solution for C γ , 308 solution(s), 187, 217 taxation, theory of, 367 trajectory, 364, 365 transportation plan (OTP), 2, 3, 10 for KRP, 2 for MKP, 2 uniqueness of, 13 transshipment, 372 value, 218, 246, 265, 308 in the dual problem, 268 of the dual problem, 253 of the mass transfer problem, 275 of the mass transshipment problem, 381 optimality criterion, 221 of Kantorovich, 163 of a map, 10 of projections, 122 of radial transformations, 121 optimization function, viii, xii problem, 363, 365 order, 323 of convergence, 140 rounding rule, 185 type relation, 15 ordered topological space, 145 ordering criterion, 17 original Kantorovich–Rubinstein problem, 163 Orlicz condition, 26, 29 OTP, see optimal transportation plan output flow, 71 p-conjugate, 102 second, 102 p-th mean interaction, 293 paracompact space, 303 partial derivative, 309 partial derivatives, 280 perfect
compound, 415 measures, 64 metric, 414, 422 metric(s), 422 probability, 63 space, 7 perfectness and marginal extensions, 83 and marginals, 83 piecewise linear interpolation, 338 piecewise smooth oriented curve, 285 Poisson distributed random variable, 271 process, 220 Polish space, 167, 177, 219, 295, 301, 302 Polish spaces, 347 Pollard’s lemma, 93 polyhedron dual, 23 primal, 23 positive cost function, 365 definite matrix, 281 semidefinite, 280, 289 precompact, 318 trajectory, 359 preference, 344 nontotal and nontransitive, 344 strict, 345 preferred, 344 rounding rules, 185 preorder, 322, 332 closed, 322, 327, 344 functionally closed, 324, 327, 336, 341 linear, 322, 323 on a completely regular topological space, 341 on a metric space, 332 varying, 337 primal Monge–Kantorovich functional(s), 64 polyhedron (PP), 23
probability density function on IR2 , 42 distance, 28 distribution full, 131 measure with μ-density f2∗ , 37 measure, μ 5c -convergence of, 29 metric, 28, 414, 419, 426 theory of, 373 perfect, 63 semidistance, 27 space, perfect, 7 problem of mass transfer, 315 of Monge, 162 of variance reduction, 154 on mass transportation, 53 with fixed marginals, 315 product measurable functions, 80 Prohorov metric, 86, 138, 152, 249 Prohorov metric, 92, 417 projection(s), 122, 191 operators, 199 optimality of, 122 propagation of chaos, 277, 289, 319 property, 289 proper mapping, 286 sublinear functional, 244 properties of the metric Kr , 412 protominimal metric, 111 pseudo -difference moment, 96 drift, 289 pseudometric(s), 421, 422 pth mean interaction, 301 norm interaction, 301 pure time discretization, 337 Pyke and Root inequality, 301, 321 quantitative approximation, 254 version of the central limit theorem, 264
quasi-antitone cost functions, 109 quasiconvexity, 284 queueing models, 75 system(s) approximation, 71, 72 dynamics, 74 real, 71 simpler, 71 quicksort, 229 algorithm, 191, 229, 230 R-isotone, 349, 351, 354, 357, 361 function, 345 R-nondecreasing chain, 346 R-regular, 347 -relative compact, 24 -relatively compact, 24 r-th pseudomoment, 92 Rademacher’s theorem, 381 radial transformation, 123 radial transformation(s), 121, 132 radius of the set of probabilistic laws, 81 Radon measure, 13, 163, 166, 167, 200, 204, 237 nonnegative, 166 signed, 166 Radon–Nikodym derivative, 247 random broken line(s), 66, 67 field(s), 248 immigration term, 217 measure, 327 polygon line(s), 64 recursion, 236 search algorithm, 269 search tree(s), 260 variables, maximally dependent, 155 vector, 131 walk method, 126, 128 random recursion, 248 range of values of Eh(X − Y ), 63 rate explosions, 373 of convergence
in the central limit theorem, 34 in the stable limit theorem, 35 of transshipment, 372 rate of convergence, 138, 143, 181, 199, 248, 322, 327 bound in the local central limit theorem, 137 exponential, 219 faster, 186 in the CLT, 85, 275 for random elements with LePage representation, 91 in the i.i.d. case, 86 problem, 131 result(s), 126, 138 square uniform, 323 to zero, 323 under alternative distributional assumptions, 263 rational choice theory, 352 rationalizable, 368 real queueing system, 71 real-valued function, isotone, 324 recursion of branching type with multiplicative weights, 207 of branching-type, 206 recursive algorithm, 191 reduced cost function, 170, 190, 348 associated with the original cost function, 332 reduction theorem(s), 190, 192, 198, 209, 211, 277, 279 noncompact, 234 reflections, 120 reflexive relation, 322 regular cost function, 176, 279 function, 176, 299 functional, 414
with respect to R, 347 regularity, 143, 413 related theorem, 178 relation binary, 322 order-type, 15 reflexive, 322 transitive, 322 relaxed Lipschitz condition, 308 side conditions, 7 transportation problem, 4, 8 relaxed transportation problem, 52 representation of metrics, minimal, 45 utility, 45 Robbins–Monro-type recursion, 206 Robbins–Siegmund lemma, 206 Rosenthal inequality, 168 rounding error, 81 normalized, 81 total, 81 of random proportions, 80 problem, 52 rule(s), 180, 185 optimal, 185 order, 185 rth absolute pseudomoment, 143 rth difference pseudomoment, 122, 142 rule of rounding, 183 optimal, 183 Ryll-Nardzewski, result of, 63 σ-additive, 63 σ-completion, 167 σ-continuity upwards, 72 σ-continuous upwards, 70 σ-measurable, 167 Schur complement, 134 SDE, see stochastic differential equations SDEs with a drift, 294
with mean interaction in time, 293 search tree binary, 260, 263 random, 260 second p-conjugate, 102 selection theorem, 194, 217, 237 of Michael, 306, 339 self-decomposable, 246 selling strategy, 367, 368 semi-infinite linear programs, 307 semicontinuous function lower, 70, 171 upper, 70 semidistance, 27 semilinear space, 241, 255, 260 semimetric, 67 cost function, 14 separable Fréchet space, 337 metric space, 332 separation theorem, 325, 328 of Lusin, 178 set of m-dimensional vectors, 51 set-valued mapping, 358, 363 sharpness of Hoeffding–Fréchet bounds, 152 signature algorithms, 23 of a graph, 23 signed finite measure, 265 Radon measure, 166 simple measure, 396 metric(s), 373 signed measure, 405 simple metric, 415 simplex, 80 simplex method, 270 simultaneous representations, 32 single-channel models, 74 single-valued C 1 -operator, 288 Skorohod–Lebesgue spaces, 32, 33 smallest elements of the marginal, 12 smooth transportation plans, 373
smooth convex function, 289 smoothing Kantorovich metric, 87 smoothness of the cost function, 279 solution of mass transportation, 85 of mass transshipment problems, 85 of the maximization problem, 2 of the SDE, 281, 331 solution of extremal problem(s), 139–141 the extension problem, 296 the maximization problem, 138 southwest corner rule, 25 square uniform rate of convergence, 323 stability of stochastic optimization problem, 49 programs, 49 stable central limit theorem, 117 limit theorem(s), 102, 124, 126 symmetric law, 125 stable limit theorem rate of convergence, 35 starlike, 285 stationary divisor rules, 181 rule(s), 185 of (1/t)-rounding, 186 stochastic applications, 27 of the MKP, 27 differential equations, 277 numerical approximation, 39 dominance, 341 Euler method, 337 inequality, 110 mappings, 277 optimization problem, 49 order, 15, 144
ordering, 147, 148 Strassen representation theorem, 146 theorem, 154, 417 application of the duality theory, 319 Strassen–Dudley theorem, 105 strict preference, 345 strictly α-stable random vector, 243 operator-stable distribution, 132 operator-stable random vector, 131 strong axiom of revealed preference, 352 law of large numbers, 198 metric, 43 solution of the SDE, 281 stochastic dominance, 341 subadditive, 30 subadditivity, 66 subdifferential, 113, 178 subgradient, 113 sublinear functional, 243, 244 submartingale, 211 subnet, 187 subspace, closed, 172 sufficient condition for a nontrivial solution, 281 summability method, 126 superadditive, 34 superadditive function, 25 superlinear mapping, 241 superlinearity, 241 supply distribution, 2 support of a measure, 403 of marginal measures, 145 supporting hyperplane, 114 survival function, 375 Suslin function(s), 78 set(s), 78 symmetric α-stable, 126
U -valued random variable ϑ, 91 cost function, 4, 11 symmetric matrix, 289 system of interacting particles, 298 τ -continuity downwards, 72 upwards, 72 τ -continuous downwards, 70 tail condition, 250 theorem by Weizsäcker and Winkler, 19 ergodic, 214 of Arzela, 202, 218 of Berry–Esseen, 255 of Choquet, 79 of Debreu, 323, 329, 335 of Dini, 74 of Dobrushin, 93 of Douglas, 20 of Dubovitskii–Milyutin on convex sets, 180, 184 of Fenchel–Moreau, 178 of Fubini, 5 of Gutman, 43 of Hahn–Banach, 61, 393, 402 of Isii, duality, 59 of Kantorovich, 88 of Kantorovich–Rubinstein, 412 extension, 406 of Krein–Milman and Choquet, 19 of Krein–Smulyan, 251, 256 of Lusin, 74, 178 of Lyapunov, 261 of Michael, 306, 339 of Moreau, 122 of Rademacher, 381 of Strassen, 146, 154, 417 application, 319 of Strassen–Dudley, 105 theory of moments, 52 of monopoles with incomplete information, 367
of optimal taxation, 367 of probability metrics, 373 of rounding, 179 thick convex cone, 179, 184 threshold for rounding, 180 time discretization methods, 332 discretization of the SDE, 332 time discretization points, 41 topological properties, 21 of Kantorovich–Rubinstein MTP, 13 spaces, 63, 219, 337 completely regular, 221 ordered, 145 version of Fréchet problem, 262 topology of weak convergence, 322 total cost, bounds to, 158 mass, 375 rounding error, 81 variation distance, 111 metric, 30, 93, 253 variation distance, 133, 136 variation norm, 375 TP, see transportation problem trajectory, 358, 363 efficient infinite, 364 infinite, 365 of dynamical system, 354, 358 optimal, 364 finite, 364 transfer function, 367 problem, 162 transformation by Markov kernel, 199 transitive relation, 322 transportation cost of a unit from node i to node j, 10 cost, upper bound for, 18 plan, 2, 40 problem (TP), 15, 21
discrete, 2 relaxed, 4, 8, 52 with local upper bounds, 40 with nonnegative cost function, 2 transshipment, 271 cost, optimal multivariate, 158 network flow problem, 372 plans, 47 problem of Kantorovich–Rubinstein (KRP), vii, xi rate, 372 tree splitting protocols, 220 triangle inequality, 174, 179, 183, 217, 271, 290 trinary feedback, 220 triple of points, 271 two-dimensional case, 43 u-chaotic, 278, 288 uniform bound, 317 distance, 183 between characteristic functions, 136 distribution, 272 k-modulus of continuity, 393 metric, 133, 136, 137 depending on the exponent B, 133 norm, 219 uniformly convergent, 201 tapered matrix, 27 unimodality condition, 8, 39 uniqueness of OTP, 13 univariable case, 380 universal utility theorem, 340 universally measurable, 167, 192, 197, 220, 226, 235, 245 set, 167, 220 upper bound for the transportation cost, 18 bounds finiteness, 122 5 p , 152 bounds for L
envelope, 358 Fréchet bounds, 2 Hoeffding–Fréchet bound, 21, 31 semicontinuous, 358–362 function, 70, 81 Urysohn lemma, 176, 328 usual Fréchet bounds, 42 usual stochastic dominance, 341 utility continuous, 337 function(s), 44, 329–332, 345 d-Lipschitz, 349 of a preorder, 323 -rational choice function, 352 representation, 45 theorem, 337, 340, 344 variance of the sum, 155 reduction, 154 variation distance, total, 111 metric, total, 30 norm, 375 vector problem, 179 Wasserstein metric, Lp , 40 norm, 404 Wasserstein metric, 322, 332 weak approximation of SDEs, 332 convergence, 102, 152, 182, 232, 278, 322 metric, 43 weak* compact, 308 compactness, 256 lower semicontinuity, 257 semicontinuous, 178, 188 precompact, 318 weakly perfect metric, 415 regular functional, 427 weakly*
closed, 256, 262 compact, 257, 262 convergent subnet, 308 subsequence, 308 lower semicontinuous, 256, 257 wealth changes, 248 Webster rounding, 59, 188 rule, 185, 188 Weibull distribution, 123 weighted total variation metric, 427 Wiener process, 43, 241, 333, 338, 339, 341 q-dimensional, 334 discretization, 336 increments, 346 one-dimensional, 341, 342 standard, 347, 348
Woyczynski inequality, 196 χp -metric, 249 χp -minimal metric, 249 Young inequality, 113, 114, 124 ζF -representation for p , 99 ζn -metric, 47 Zarantonello-monotone, 11 Zolotarev ideal metric, 193, 275, 381 metric, 97, 107, 412, 416 metric ζr , 218 type metric, 102 ζn -metric, 374 ζr -metric, 413 Zorn’s lemma, 291