To my wife Zoja and To my parents Nadezda and Todor Rachevi.
To my wife Gabi.
Svetlozar (Zari) Rachev
Preface to Volume II
The second volume of the Mass Transportation Problems is devoted to applications in a variety of fields of applied probability, queueing theory, mathematical economics, risk theory, tomography, and others. In Volume I we encompassed the general mathematical theory of mass transportation, concentrating our attention on: • the general duality theory of the transportation and transshipment problem; • explicit optimality results; • applications to minimal probability metrics, stochastic ordering, approximation and extension problems; • applications to functional analysis and mathematical economics (the Debreu theorem, utility theory, dynamical systems, choice theory, and convex and nonconvex analysis were dicsussed in this context). In Volume II we expand the scope of applications of mass transportation problems. Some of them arise from modifications of the admissible transportation plans. In fact, for applications to mathematical economics it is of interest to consider relaxations of the marginal constraints, such as upper or lower bounds on the supply and demand distributions, or additional constraints like capacity bounds for the transportation plans. In mathematical tomography the basic problem is to reconstruct the multivariate
viii
Preface to Volume II
probability distribution based on some information about the marginal distributions in a certain finite number of directions. This information may be represented by additional constraints on the support functions or distributional moments, or it may be contained in only partial information on the marginals. Thus there is a close relationship between a class of problems in mathematical tomography and the classical theory on moment problems, which again can be viewed as a relaxation on the set of constraints in mass transportation problems. We discuss in detail applications to approximation problems for stochastic processes and to rounding problems based on moment-type characteristics. A particular example will be the approximation of queueing models. The minimal metrics allow us to compare various rounding rules and to determine optimal ones from an asymptotic point of view. An important field of applications of mass transportation problems we shall consider in this second volume is to probabilistic limit theorems. This approach was introduced in the seventies by the Russian school of probability theory, headed by V.M. Zolotarev. By inherent regularity properties of probability metrics defined via certain mass transportation problems, there are streamlined proofs for central limit theorems on Banach spaces yielding sharp quantitative estimates of Berry–Esseen type for the convergence rate. The probability metric approach will be applied to general stable and operator stable limits theorems, martingale-type limit theorems, limit behavior of summability methods, and compound Poisson approximation. A particular application is to the classical problem in mathematical risk theory dealing with sharp approximation of the individual risk model by the collective risk model. The probability metric approach will also be applied to the quantitative asymptotics in rounding problems. A new field of application of probability metrics arising as solutions of mass transportation problems is the analysis of deterministic and stochastic algorithms. This research area is of increasing importance in computer science and various fields of stochastic modeling. Based on regularity properties of probability metrics, a general “contraction” method for the asymptotic analysis of algorithms has been developed. The contraction method has been applied successfully to a variety of search, sorting, and other tree algorithms. Furthermore, the recursive structure in iterated functions systems (image encoding), fractal measures, bootstrap statistics, and time series (ARCH) models has been analyzed by this method. It becomes clear that there are many interesting probabilistic applications of this method to be rigorously developed in the future. In the final chapter we consider applications to stochastic differential equations (SDEs) and to convergence of empirical measures. SDEs will be interpreted as continuous recursive structures. From this point of view we provide a detailed discussion on the approximative solution of nonlinear stochastic differential equations of McKean–Vlasov type by interactive par-
Preface to Volume II
ix
ticle systems with application to the Kac theory of chaos propagation. The probability metrics approach allows us to establish approximation results for various modifications of the diffusion system, some of them of “nontraditional” type. In a general context we establish approximation results for empirical measures and give applications to the approximation of stochastic processes. As final applications we discuss a weak approximation of SDEs of Itˆ o type by a combination of the time discretization methods of Euler and Milshtein with a chance discretization based on the strong invariance (embedding) principle. This approximation is given in terms of minimal Lp -metrics and thereby based on regularity properties of the solutions of the corresponding mass transportation problem.
Preface to Volume I
The subject of this book, mass transportation problems (MTPs), concerns the optimal transfer of masses from one location to another, where the optimality depends upon the context of the problem. Mass transportation problems appear in various forms and in various areas of mathematics and have been formulated at different levels of generality. Whereas the continuous case of the transportation problem may be cast in measure-theoretic terms, the discrete case deals with optimization over generalized transportation polyhedra. Accordingly, work on these problems has developed in several separate and independent directions. The aim of this monograph is to investigate and to develop, in a systematic fashion, the Monge–Kantorovich mass transportation problem (MKP) and the Kantorovich–Rubinstein transshipment problem (KRP). We consider several modifications of these problems known as the MTP with partial knowledge of the marginals and the MTP with additional constraints (MTPA). We also discuss extensively a variety of stochastic applications. In the first volume of Mass Transportation Problems we concentrate on the general mathematical theory of mass transportation. In Volume II we expand the scope of applications of mass transportation problems. In 1781 Gaspard Monge proposed in simple prose a seemingly straightforward problem of optimization. It was destined to have wide ramifications. He began his paper on the theory of “clearings and fillings” as follows: When one must transport soil from one location to another, the custom is to give the name clearing to the volume of the soil that one
xii
Preface to Volume 1 must transport and the name filling (“remblai”) to the space that it must occupy after transfer. Since the cost of transportation of one molecule is, all other things being equal, proportional to its weight and the interval that it must travel, and consequently the total cost of transportation being proportional to the sum of the products of the molecules each multiplied by the interval traversed; given the shape and position, the clearing and the filling, it is not the same for one molecule of the clearing to be moved to one or another spot of the filling. Rather, there is a certain distribution to be made of the molecules from the clearing to the filling, by which the sum of the products of molecules by intervals travelled will be the least possible, and the cost of the total transportation will be a minimum. (Monge, (1781, p. 666)).
In mathematical language Monge proposed the following nonlinear varational problem. Given two sets A, B of equal volume, find an optimal volume-preserving map between them; the optimality is evaluated by a cost function c(x, y) representing the cost per unit mass for transporting material from x ∈ A to y ∈ B. The optimal map is the one that minimizes the total cost of transferring the mass from A to B. Monge considered this problem with cost function equal to the Euclidean distance in IRd : c(x, y) = |x − y|. Monge’s problem turned out to be the prototype for a class of problems arising in various fields such as mathematical economics, functional analysis, probability and statistics, linear and stochastic programming, differential geometry, information theory, cybernetics, and ma trix theory. The optimization function A c(x, t(x)) dx is nonlinear in the transportation function t, and moreover, the set of admissible transportations is a nonconvex set. This explains why it took a long time until even existence results for optimal solutions could be established. The first general existence result was given in 1979 by Sudakov. On the second page of his paper Monge himself had remarked that to obtain a minimum, the intervals traversed by two different molecules should not intersect. This simple observation applied to the discrete case—where there are only a finite number of molecules—leads to a “greedy” algorithm, the so-called northwest corner rule. The totality of mass transferences plans in the discrete case is a polytope that arises in the transportation problem of mathematical programming, where it is treated in specialized form as an assignment problem and in generalized form as a network-flow problem. The northwest corner rule solves transportation problems having a particular structure on the costs and is, moreover, at the heart of many seemingly different problems having an “easy” solution (cf. Hoffman (1961), Barnes and Hoffman (1985), Derigs, Goecke, and Schrader (1986), Hoffman and Veinott (1990), Olkin and Rachev (1991), and Rachev and R¨ uschendorf (1994); see also Burkard, Klinz, and Rudolf (1994) and the references therein). The Academy of Paris offered a prize for the solution of Monge’s problem, which was claimed by the differential geometer P. Appell (1884–1928), who
Preface to Volume I
xiii
established some geometric properties of optimal maps in the plane and in IR3 . But it took a long time until a real breakthrough in the transportation problem came, originating in the seminal 1942 paper of L.V. Kantorovich entitled “On the transfer of masses.” Kantorovich stated the problem in a new, abstract, and in more easily accessible setting and without knowledge of Monge’s work. Kantorovich learned of Monge’s work only later (cf. his 1948 paper). In the Kantorovich formulation of the mass transportation problem (the so-called “continuous” MTP), the initial mass (the clearing) and the final mass (the filling) can be considered as probability measures on a metric space. The essential step in this formulation is the replacement of the class of transportation map by the wider class of generalized transportation plans, that are identifiable with the convex set of all probability measures on the product space with fixed marginals. The difficult nonlinear Monge problem was thereby replaced by a linear optimization problem over an abstract convex set. This made it possible to put this problem in the framework of linear optimization theory and encouraged the development of general duality theory for the solution of the Kantorovich formulation of the transportation problem as the basic tool. Accordingly, these problems and their generalizations will be referred to as Monge–Kantorovich Mass Transportation Problems (MKPs). Kantorovich’s measure theoretic formulation made the problem accessible to various areas of the mathematical sciences and other scientific fields. Kantorovich himself received a Nobel Prize in Economics for related work in mathematical economics.(1) Here is a list of some references in the mathematical sciences: • Functional analysis: Kantorovich and Akilov (1984) • Probability theory: Fr´echet (1951), Cambanis et al. (1976), Dudley (1976, 1989), Kellerer (1984), Rachev (1991c), R¨ uschendorf (1991) • Statistics: Gini (1914, 1965), Hoeffding (1940, 1955), Kemperman (1987), Huber (1981), Bickel and Freedman (1981), R¨ uschendorf (1991) • Linear and stochastic programming: Hoffman (1961), Barnes and Hoffman (1985), Anderson and Nash (1987), Burkard, Klinz and Rudolf (1994) • Information theory and cybernetics: Wasserstein (1969), Gray et al. (1975), Gray and Ornstein (1979), Gray et al. (1980) • Matrix theory: Lorentz (1953), Marcus (1960), Olkin and Pukelsheim (1982), Givens and Shortt (1984) (1) L.V.
Kantorovich together with T.C. Koopmans received the Nobel Memorial Prize in Economic Science in 1975 for “contributions to the theory of optimum allocation of resources”; see Dudley (1989, p. 342).
xiv
Preface to Volume 1
Many practical problems arising in various scientific fields have led mathematicians to solve MKPs: e.g., in • Statistical physics: Tanaka (1978), Dobrushin (1979) • Reliability theory: Barlow and Proschan (1975), Kalashnikov and Rachev (1990), Bene˘s (1985) • Quality control: Jirina and Nedoma (1957) • Transportation: Dantzig and Ferguson (1956) • Econometrics: Shapley and Shubik (1972), Pyatt and Round (1985), Gretsky, Ostroy, and Zame (1992) • Expert systems: Perez and Jirousek (1985) • Project planning: Haneveld (1985) • Optimal models for facility location: Ermoljev, Gaivoronski, and Nedeva (1983) • Allocation policy: Rachev and Taksar (1992) • Quality usage: Rachev, Dimitrov and Khalil (1992) • Queueing theory: Rachev (1989), Anastassiou and Rachev (1992a, 1992b) There are several surveys in the vast literature about MKP, among them Rachev (1984b), Rachev and R¨ uschendorf (1990), Burkard, Klinz, and Rudolf (1994), Cuesta-Albertos, Matr´ an, Rachev, and R¨ uschendorf (1996), and Gangbo and McCann (1996) related to dual solutions and applications of MKP; Shorack and Wellner (1985, Sect. 3.6) on optimal processes; Benes and Stepan (1987, 1991) on extremal mass transportation plans; R¨ uschendorf (1981, 1991, 1991a), Kellerer (1984), Rachev (1991c) on multivariate transportation problems; Dudley (1989) on distances in the space of measures; Talagrand (1992) and Yukich (1991) on matching problems. In recent years, characterizations of the solutions of the Monge–Kantorovich problem have been given in terms of c-subgradients of generalized convex functions defined in terms of the cost functions c(x, y) (cf. Knott and Smith (1984, 1992), Brenier (1987), R¨ uschendorf and Rachev (1990), R¨ uschendorf (1991, 1991a, 1995), Cuesta-Albertos, Matr´ an, Rachev, and R¨ uschendorf (1996), and Gangbo and McCann (1996)). For the case of squared Euclidean costs c(x, y) = |x − y|2 , the generalized convexity property is equivalent to convexity, and c-subgradients are identical to the usual subgradients of convex analysis. From this characterization
Preface to Volume I
xv
a series of explicit solutions of the transportation problem could be established. It also implies that the solutions of the MKP are under continuity assumptions given by mappings. Therefore, the solutions of the “easier” MKP imply as well the existence and characterizations of solutions of the original Monge problem, and so the MKP turns out to be the fundamental formulation of the transportation problem. For this reason, we concentrate in this book on the Kantorovich-type mass tranportation problems. For a discussion of interesting analytic aspects of the Monge problem, we refer to Gangbo and McCann (1996). Another type of MTP appears in probability theory, even if it leaves the framework of probability measures as transportation plans. Its solutions are bounded measures on a product of two spaces with the difference of marginals equal to the difference of two given probability measures. It will be called the Kantorovich–Rubinstein Problem (KRP), since the first results were obtained by Kantorovich and Rubinstein (1958). In its relation to the practical task of mass transportation it is sometimes referred to as the transshipment problem; see Kemperman (1983), and Rachev and Shortt (1990). The KRP has been developed to a great extent in the Russian school of probabilists and functional analysts, in particular by V.L. Levin, A.A. Milyutin, and A.M. Vershik and their students. For metric cost functions the KRP coincides with the corresponding MKP; for general cost functions it can be reduced to the MKP for a corresponding reduced cost function. For the duality theory of the KRP a specific detailed theory with many results that are of value in themselves has been developed with wide-ranging applications to mathematical economics. For a different approach to the KRP as introduced in Dudley (1976) and as further extended in Rachev and Shortt (1990) we refer to the book of Rachev (1991c). A problem related to both MKP and KRP is the Mass Transportation Problem with Partial Knowledge of the Marginals (MTPP), which is expressed by stating finitely many moment conditions. Problems of this type were formulated and extensively studied by Rogosinski (1958), Kemperman (1983), and Kuznezova-Sholpo and Rachev (1989). Barnes and Hoffman (1985) considered mass tranportaion problems with capacity constraints on the admissible transportation plans as an example of Mass Transportation Problems with Additional Constraints (MTPA) (see Rachev (1991b) and Rachev and R¨ uschendorf (1994)). In this book we give an extensive account of the duality theory of the MKP and the KRP, including the known results on explicit constructions and characterizations of optimal solutions. In Chapters 2 and 3 we present important duality theorems for the Monge–Kantorovich problem based on work of H. Kellerer, L. R¨ uschendorf, S.T. Rachev, and D. Ramachandran.
xvi
Preface to Volume 1
In Chapters 4 and 5 we present basically work of V.L. Levin; we analyze measure-theoretic methods for infinite-dimensional linear programs developed in context with the KRP as well as applications to general utility theorems (the Debreu theorem), extension theorems, choice theory, and set-valued dynamical systems.(2) In Chapters 6 and 8 we discuss new material on applications of the MKP and the KRP to the representation of ideal metrics and on various probabilistic approximation and limit theorems. This supplements the earlier results in this direction as described in the book of Rachev (1991) on probability metrics and stochastic models. In particular, we show that probability metrics allow us to find unified proofs for central limit theorems for martingales, (operator) stable limit theorems, and to more specific problems like compound Poisson approximation or rounding problems. Chapter 7, the first chapter in the second volume, is concerned with modifications of the MKP by additional or relaxed constraints. We discuss various types of moment problems and applications to the tomography paradoxon and to the approximation of queueing systems. A wide range of applications of metrics based on the transportation problem has been established in recent years in connection with recursive stochastic equations. We discuss algorithms of informatics (sorting, searching, branching, search trees) as well as applications to the approximation of stochastic differential equations, to the propagation of the chaos property of particle systems with applications to the approximation of nonlinear PDEs, as well as to the rate of convergence of empirical measures, which is of interest for matching problems in Chapters 9 and 10. From the technical point of view, MKPs can be subdivided into the discrete and continuous cases, according to the nature of their basic spaces and to the supports of the initial and the final masses. In the discrete case, the totality of the mass transference plans is the polytope that arises in the transportation problem of mathematical programming. There is, of course, a vast literature on the transportation problem, its specialization to the assignment problem, and its generalization to network flow problems. It turns out, as will be elaborated further in the book, that the northwest corner rule in the discrete case corresponds to a closed form for the solution in the continuous case. Indeed, the discrete analogue of a result known in the continuous case provides a new result in the discrete case; and its simple proof in the discrete case provides a new proof for the continuous case, see Rachev and R¨ uschendorf (1994c) and the references therein. Another approach in the discrete linear case prefers to exploit the special structure of supplies and demands (or clearings and fillings) and permits a particularly simple combinatorial algorithm for finding an optimal solution as developed (2) These two chapters were written following closely the notes kindly provided to us by V.L. Levin.
Preface to Volume I
xvii
by Balinski (1983), Balinski and Russakoff (1984), Balinski (1985, 1986), Goldfarb (1985), Kleinschmidt, Lee, and Schannath (1987), and Burkard, Klinz, and Rudolf (1994). MTPs may be viewed as an analogue and a unifying framework of a problem considered by probabilists at the beginning of the twentieth century: How does one measure the difference between two random quantities? Many specific contributions to the analysis of this problem have been made, including Gini’s (1914) notion of concordance, Kendall’s τ , Spearman’s , the analysis of greatest possible differences by Hoeffding (1940) and others, by Fr´echet (1951, 1957), Robbins (1975), and Lai and Robbins (1976), and the generalizations of these results by Cambanis, Simons, and Stout (1976), R¨ uschendorf (1980), Tchen (1980), and Cambanis and Simons (1982). These (and others) offer piecemeal answers to basic questions that arise from different stochastic models; they give no guidance as to the question of what concept should be used where: There is no general theory underlying the diverse approaches. We refer to Kruskal (1958), Gini (1965), and Rachev (1984b, 1991c). In this book we investigate, develop, and exploit the connections between the discrete and continuous versions of the mass transportation problems (MTP) as well as study systematically the relationships between the methods and results from different versions of the MTP. The MTPs are the basis of many problems related to the question of stability of stochastic models, to the question of whether a proposed model yields a satisfactory approximation to the phenomenon under consideration, and to the problem of approximation of stochastic and deterministic algorithms. It is our belief that MTPs hold great promise in stochastic analysis as well as in mathematical analysis. The MTP is full of connections with geometry, (partial) differential equations, (generalized) convex analysis, moment problems, infinite-dimensional linear programming, measurable choice theory, and extension problems, and it has many open problems. It has a great potential for a series of applications in several scientific fields. This book grew out of joint work and lectures delivered by the authors at the Steklov Mathematical Institute, Universit¨at M¨ unster, Universit¨at Freiburg, the Ecole Polytechnique, SUNY at Stony Brook, and the University of California, Santa Barbara, over many years. Many colleagues provided helpful suggestions after reading parts of the manuscript. All chapters were rewritten several times, and preliminary versions were circulated among friends, who eliminated many inaccuracies and obscurities. We would like to thank H.G. Kellerer, V.L. Levin, M. Balinski, D. Ramachandran, G.A. Anastassiou, M. Maejima, M. Cramer, I. Olkin, M. Gelbrich, W. R¨ omisch, V. Bene˘s, L. Uckelmann, and many other friends and colleagues who encouraged us to complete the work. We are indebted to Mrs. M. Hattenbach and Ms. A. Blessing for their superb typing; the appearance of this monograph owes much to them. We are grateful to the publisher
xviii
Preface to Volume 1
and especially to J. Kimmel for support and patience. We are particularly thankful to J. Gani for his invaluable suggestions concerning improvements of this work, his help with the organization of the material, and his encouragement to continue the project. Finally, we thank the Alexander von Humboldt Foundation for its generous financial support of S.T. Rachev in 1995 and 1996, which made this joint work possible. (3)
(3) The
work of S.T. Rachev was also partially supported by NSF Grants. The joint work of the authors was supported by NATO-Grant CRG900798.
Contents to Volume II
Preface to Volume II
vii
Preface to Volume I
xi
7 Relaxed or Additional Constraints 7.1 Mass Transportation Problem with Relaxed Marginal Constraints . . . . . . . . . . 7.2 Fixed Sum of the Marginals . . . . . . . . . . . . . . 7.3 Mass Transportation Problems with Capacity Constraints . . . . . . . . . . . . . . . 7.4 Local Bounds for the Transportation Plans . . . . . 7.5 Closeness of Measure on a Finite Number of Directions . . . . . . . . . . . 7.6 Moment Problems of Stochastic Processes and Rounding Problems . . . . . . . . . . . . . . . . 7.6.1 Moment Problems and Kantorovich Radius . . . . . 7.6.2 Moment Problems Related to Rounding Proportions 7.6.3 Closeness of Random Processes with Fixed Moment Characteristics . . . . . . . . . . 7.6.4 Approximation of Queueing Systems with Prescribed Moments . . . . . . . . . . . . . . . 7.6.5 Rounding Random Numbers with Fixed Moments . .
1 . . . . . .
2 10
. . . . . .
17 36
. . .
42
. . . . . . . . .
52 54 57
. . .
62
. . . . . .
71 80
xx
Contents to Volume II
8 Probabilistic-Type Limit Theorems
85
8.1
Rate of Convergence in the CLT with Respect to Kantorovich Metric . . . . . . . . . . . .
8.2
Application to Stable Limit Theorems . . . . . . . . . . . 102
8.3
Summability Methods, Compound Poisson Approximation 126
8.4
Operator-Stable Limit Theorems . . . . . . . . . . . . . . 131
8.5
Proofs of the Rate of Convergence Results . . . . . . . . . 153
8.6
Ideal Metrics in the Problem of Rounding . . . . . . . . . 178
9 Mass Transportation Problems and Recursive Stochastic Equations
85
191
9.1
Recursive Algorithms and Contraction of Transformations . . . . . . . . . . . . . . . . . . . . . . 191
9.2
Convergence of Recursive Algorithms . . . . . . . . . . . . 204
9.2.1 Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . 204 9.2.2 Branching-Type Recursion . . . . . . . . . . . . . . . . . . 206 9.2.3 Limiting Distribution of the Collision Resolution Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 9.2.4 Quicksort . . . . . . . . . . . . . . . . . . . . . . . . . . . 229 9.2.5 Limiting Behavior of Random Maxima . . . . . . . . . . . 231 9.2.6 Random Recursion Arising in Probabilistic Modeling: Limit Laws . . . . . . . . . . . . . . . . . . . . . . . . . . 236 9.2.7 Random Recursion Arising in Probabilistic Modeling: Rate of Convergence . . . . . . . . . . . . . . . . . . . . . 248 9.3
Extensions of the Contraction Method . . . . . . . . . . . 254
9.3.1 The Number of Inversions of a Random Permutation . . . 254 9.3.2 The Number of Records . . . . . . . . . . . . . . . . . . . 257 9.3.3 Unsuccessful Searching in Binary Search Trees
. . . . . . 260
9.3.4 Successful Searching in Binary Search Trees . . . . . . . . 263 9.3.5 A Random Search Algorithm . . . . . . . . . . . . . . . . 269 9.3.6 Bucket Algorithm
. . . . . . . . . . . . . . . . . . . . . . 272
10 Stochastic Differential Equations and Empirical Measures 10.1
277
Propagation of Chaos and Contraction of Stochastic Mappings . . . . . . . . . . . . . . . . . . . . 277
10.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 277
Contents to Volume II
xxi
10.1.2 Equations with p-Norm Interacting Drifts . . . . . . . . . 279 10.1.3 A Random Number of Particles
. . . . . . . . . . . . . . 290
10.1.4 pth Mean Interactions in Time: A Non-Markovian Case . 293 10.1.5 Minimal Mean Interactions in Time
. . . . . . . . . . . . 307
10.1.6 Interactions with a Normalized Variation of the Neighbors: Relaxed Lipschitz Conditions . . . . . . . . . . . . . . . . 308 10.2
Rates of Convergence in the Kantorovich Metric . . . . . . . . . . . . . . . . . . 322
10.3
Stochastic Differential Equations . . . . . . . . . . . . . . 332
References
351
Abbreviations
395
Symbols
397
Index
409
Contents to Volume I
Preface to Volume I
vii
Preface to Volume II
xv
1 Introduction
1
1.1
Mass Transportation Problems in Probability Theory . . .
1
1.2
Specially Structured Transportation Problems . . . . . . .
21
1.3
Two Examples of the Interplay Between Continuous and Discrete MTPs . . . . . . . . . . . . . . . . . . . . . .
23
Stochastic Applications . . . . . . . . . . . . . . . . . . . .
27
1.4
2 The Monge–Kantorovich Problem 2.1
57
The Multivariate Monge–Kantorovich Problem: An Introduction . . . . . . . . . . . . . . . . . . . . . . . .
58
2.2
Primal and Dual Monge–Kantorovich Functionals . . . . .
64
2.3
Duality Theorems in a Topological Setting . . . . . . . . .
76
2.4
General Duality Theorem . . . . . . . . . . . . . . . . . .
82
2.5
Duality Theorems with Metric Cost Functions . . . . . . .
86
2.6
Dual Representation for Lp -Minimal Metrics . . . . . . . .
96
xxiv
Contents to Volume I
3 Explicit Results for the Monge–Kantorovich Problem
107
3.1
The One-Dimensional Case . . . . . . . . . . . . . . . . . 107
3.2
The Convex Case . . . . . . . . . . . . . . . . . . . . . . . 112
3.3
The General Case . . . . . . . . . . . . . . . . . . . . . . . 123
3.4
An Extension of the Kantorovich L2 -Minimal Problem . . 132
3.5
Maximum Probability of Sets, Maximum of Sums, and Stochastic Order . . . . . . . . . . . . . . . . . . . . . 144
3.6
Hoeffding–Fr´echet Bounds . . . . . . . . . . . . . . . . . . 151
3.7
Bounds for the Total Transportation Cost . . . . . . . . . 158
4 Duality Theory for Mass Transfer Problems
161
4.1
Duality in the Compact Case . . . . . . . . . . . . . . . . 161
4.2
Cost Functions with Triangle Inequality . . . . . . . . . . 172
4.3
Reduction Theorems . . . . . . . . . . . . . . . . . . . . . 190
4.4
Proofs of the Main Duality Theorems and a Discussion . . 207
4.5
Duality Theorems for Noncompact Spaces . . . . . . . . . 219
4.6
Infinite Linear Programs . . . . . . . . . . . . . . . . . . . 241
4.6.1 Duality Theory for an Abstract Scheme of Infinite-Dimensional Linear Programs and Its Application to the Mass Transfer Problem
. . . . 241
4.6.2 Duality Theorems for the Mass Transfer Problem with Given Marginals . . . . . . . . . . . . . . . . . . . . . 245 4.6.3 Duality Theorem for a Marginal Problem with Additional Constraints of Moment-Type . . . . . . . 251 4.6.4 Duality theorem for a Further Extremal Marginal Problem . . . . . . . . . . . . . . . . . . . . . . . 258 4.6.5 Duality Theorem for a Nontopological Version of the Mass Transfer Problem . . . . . . . . . . . . . . . . 265 5 Applications of the Duality Theory
275
5.1
Mass Transfer Problem with a Smooth Cost Function—Explicit Solution . . . . . . . . . . . . . . 275
5.2
Extension and Approximate Extension Theorems . . . . . 290
5.2.1 The Simplest Extension Theorem the Case X = E(S) and X1 = E(S1 ) . . . . . . . . . . . 290 5.2.2 Approximate Extension Theorems . . . . . . . . . . . . . 292 5.2.3 Extension Theorems . . . . . . . . . . . . . . . . . . . . . 295
Contents to Volume I
xxv
5.2.4 A continuous selection theorem . . . . . . . . . . . . . . . 302 5.3
Approximation Theorems . . . . . . . . . . . . . . . . . . 306
5.4
An Application of the Duality Theory to the Strassen Theorem . . . . . . . . . . . . . . . . . . . 319
5.5
Closed Preorders and Continuous Utility Functions . . . . 322
5.5.1 Statement of the Problem and the Idea of the Duality Approach . . . . . . . . . . . . . . . . . . . 322 5.5.2 Functionally Closed Preorders . . . . . . . . . . . . . . . . 324 5.5.3 Two Generalizations of the Debreu Theorem . . . . . . . . 329 5.5.4 The Case of a Locally Compact Space . . . . . . . . . . . 335 5.5.5 Varying preorders and a universal utility theorem
. . . . 337
5.5.6 Functionally Closed Preorders and Strong Stochastic Dominance . . . . . . . . . . . . . . 341 5.6
Further Applications to Utility Theory . . . . . . . . . . . 344
5.6.1 Preferences That Admit Lipschitz or Continuous Utility Functions . . . . . . . . . . . . . . . 344 5.6.2 Application to Choice Theory in Mathematical Economics . . . . . . . . . . . . . . . . . 352 5.7
Applications to Set-Valued Dynamical Systems . . . . . . 354
5.7.1 Compact-Valued Dynamical Systems: Quasiperiodic Points . . . . . . . . . . . . . . . . . . . . . 354 5.7.2 Compact-Valued Dynamical Systems: Asymptotic Behavior of Trajectories . . . . . . . . . . . . 358 5.7.3 A Dynamic Optimization Problem . . . . . . . . . . . . . 363 5.8
Compensatory Transfers and Action Profiles . . . . . . . . 367
6 Mass Transshipment Problems and Ideal Metrics
371
6.1
Kantorovich–Rubinstein Problems with Constraints . . . . 372
6.2
Constraints on the κth Difference of Marginals . . . . . . 383
6.3
The General Case . . . . . . . . . . . . . . . . . . . . . . . 402
6.4
Minimality of Ideal Metrics . . . . . . . . . . . . . . . . . 414
References
429
Abbreviations
473
Symbols
475
Index
487
7 Modifications of the Monge–Kantorovich Problems: Transportation Problems with Relaxed or Additional Constraints
In this chapter we study modifications of the usual transportation problem by allowing additional constraints on the admissible supply—resp. demand—distributions. In particular, we consider the case that the marginal distribution function of the supply is bounded below by a d.f. F1 , while the marginal d.f. of the demand is bounded above by a d.f. F2 . We also examine transportation plans with constraints of a local type concerning the densities of the marginals, and finally, we study transportation problems with additional moment-type constraints. For the solution of these problems we make use of some methods arising in the theory of marginal and moment problems, duality theory, and stochastic ordering results. The next part is concerned with a solution of the tomography paradox. With respect to some weak metrics, two distributions are getting close if they coincide on an increasing number of directions. In the final sections we review results on the closeness of distributions under given momenttype characteristics and discuss applications to the rounding problem. Most of the results in these sections are contained in Rachev and R¨ uschendorf (1993, 1994c), Levin and Rachev (1989), Klebanov and Rachev (1995), and Anastassiou and Rachev (1992). A survey on related discrete transportation problems is given in Burkard, Klinz, and Rudolf (1994).
2
7. Relaxed or Additional Constraints
7.1 Mass Transportation Problem with Relaxed Marginal Constraints For distribution functions F1 , F2 let F(F1 , F2 ) denote the set of all d.f.s F on IR2 with marginals F1 , F2 (i.e., F (x, ∞) = F1 (x), F (∞, y) = F2 (y)). Then the transportation problem with nonnegative cost function c(x, y), x, y ∈ IR, has the form minimize c(x, y) dF (x, y) over all F ∈ F(F1 , F2 ). (7.1.1) IR2
Usually, in the linear programming setting, F1 is viewed as the supply distribution and F2 as the demand distribution. Clearly, (7.1.1) is an infinitedimensional of the discrete transportation problem: Given ai ≥ analogue m n a = 0, bj ≥ 0, i i=1 j=1 bj , minimize
m n
cij xij , subject to the constraints
(7.1.2)
i=1 j=1 n j=1
xij = ai , 1 ≤ i ≤ m,
m
xij = bj , j = 1, . . . , n, xij ≥ 0, ∀i, j.
i=1
Suppose c(x, y) (resp. (cij )) satisfies the “Monge” conditions, i.e., c is right continuous, and c(x , y ) − c(x, y ) − c(x , y) + c(x, y) ≤ 0
for all x ≥ x, y ≥ y;
(7.1.3)
in the discrete case these conditions are of the form cij + ci+1,j+1 − ci,j+1 − ci+1,j ≤ 0, ∀1 ≤ i < m, 1 ≤ j < n.
(7.1.4)
Then the solution of (7.1.1), (7.1.4) is well known and based on the “northwest corner rule,” which leads to a greedy algorithm; see Hoffman (1961). For (7.1.1) the solution is given by the d.f. F ∗ , F ∗ (x, y) = min{F1 (x), F2 (y)}.
(7.1.5)
F ∗ is the upper Fr´echet bound, see (3.6.2). Recall that the Fr´echet bounds provide the following characterization of F(F1 , F2 ): F ∈ F(F1 , F2 )
if and only if
(7.1.6) ∗
F∗ (x, y) := (F1 (x) + F2 (y) − 1)+ ≤ F (x, y) ≤ F (x, y) (here (·)+ = max(0, ·)); the lower Fr´echet bound yields the solution of the maximization problem corresponding to (7.1.1).
7.1 Mass Transportation Problem with Relaxed Marginal Constraints
3
In terms of random variables an equivalent formulation of the transportation problem is the following: minimize Ec(X, Y ),
subject to FX = F1 , FY = F2 ,
(7.1.7)
where X, Y are random variables on a rich enough (e.g., atomless) probability space (Ω, A, P ). The solutions (7.1.5), resp. (7.1.6), then can be represented as distributions of r.v.s X ∗ , Y ∗ : X ∗ = F1−1 (U ),
Y ∗ = F2−1 (U )
X ∗ = F1−1 (U ),
Y ∗ = F2−1 (1 − U )
(for (7.1.1), (7.1.5)),
(7.1.8)
resp. (for F∗ ),
(7.1.9)
where U is uniformly distributed on (0, 1), and F1−1 (u) = inf{y; F1 (y) ≥ u} is the generalized inverse of F1 . We next consider the mass transportation problem (7.1.1), but with relaxed marginal constraints. For d.f.s F1 , F2 the set H(F1 , F2 ) = {F ; F is a d.f. on IR2 with marginal d.f.s F1 ≤ F1 , F2 ≥ F2 }
(7.1.10)
of all d.f.s F with F1 (x) = F (x, ∞) ≤ F1 (x), ∀x ∈ IR1 , and F2 (y) = F (∞, y) ≥ F2 (y), ∀y ∈ IR1 . We consider the transportation problem: minimize
c(x, y) dF (x, y),
subject to F ∈ H(F1 , F2 ),
(7.1.11)
R2
or, equivalently, minimize Ec(X, Y ),
subject to FX ≤ F1 , FY ≥ F2 .
(7.1.12)
In the discrete case the problem is to minimize
cij xij ,
(7.1.13)
where for some “supplies” s1 , . . . , sn , a1 ≤ s1 , a1 +a2 ≤ s1 +s2 , . . . , and for some demands d1 , . . . , dn , b1 ≥ d1 , b1 +b2 ≥ d1 +d2 , . . . (ai , bi as in (7.1.2)). This describes a production process and a consumption process subject to some priorities (e.g., queueing priorities) with capacities s1 , . . . , sn having the following property: Every remaining free capacity at stage i of the production (resp. consumption) process can be transferred to some of the next stages i + 1, . . . , n.
4
7. Relaxed or Additional Constraints
Theorem 7.1.1 Let the cost function c(x, y) be symmetric in x, y, let c(x, y) satisfy the Monge condition (7.1.3), and let c(x, x) = 0 for all x ∈ IR. Set H ∗ (x, y) = min{F1 (x), max{F1 (y), F2 (y)}},
x, y ∈ IR.
(7.1.14)
Then (a)
H ∗ ∈ H(F1 , F2 ),
(b)
H ∗ solves the relaxed transportation problem (7.1.11),
(c)
(7.1.15)
1 c(x, y) dH (x, y) = c F1−1 (u), min F1−1 (u), F2−1 (u) du. ∗
0
IR2
Remark 7.1.2 Setting G1 (y) = max{F1 (y), F2 (y)}, we see from Theorem 7.1.1 that the relaxed transportation problem (7.1.11) is equivalent to the transportation problem (7.1.1) with marginals F1 , G1 . In terms of random variables the solution can be expressed by the joint distribution of X ∗ = F1−1 (U ) and −1 −1 Y ∗ = G−1 1 (U ) = min F1 (U ), F2 (U )
(7.1.16)
(cf. (7.1.8)). Proof: The Monge condition implies that we can view the function −c(x, y) as a “distribution function” corresponding to a nonnegative measure µc on IR2 . Let X, Y be any real r.v.s, and for x, y ∈ IR1 set x ∨ y = max{x, y}, x ∧ y = min{x, y}. Theorem 7.1.1 is a consequence of the following two claims. Claim 7.1.3 (Cambanis, Simons, and Stout (1976); see also Dall’Aglio (1956) for the special case c(x, y) = |x − y|p ) (P (X < x ∧ y, Y ≥ x ∨ y) 2Ec(X, Y ) = IR2
+P (X ≥ x ∨ y, Y < x ∧ y))µc ( dx, dy).
(7.1.17)
For the proof of Claim 7.1.3 define the function f (x, y, w) : IR2 × Ω → IR by ⎧ ⎨ 1 if X(w) < x, y ≤ Y (w) or Y (w) < x, y ≤ X(w), f (x, y, w) = ⎩ 0 otherwise.
7.1 Mass Transportation Problem with Relaxed Marginal Constraints
5
Using Fubini’s theorem, Ew f (x, y, w)µc ( dx, dy) = (Ew f (x, y, w))µc ( dx, dy). (7.1.18) IR2
IR2
Next, the symmetry of c(x, y) and c(x, x) = 0 yield f (x, y, w) dµc
(7.1.19)
IR2
= − [c (Y (w), Y (w)) + c (X(w), X(w)) − c (X(w), Y (w)) −c (Y (w), X(w))] = 2c (X(w), Y (w)) . Clearly, Ew f (x, y, w)
(7.1.20)
= P (X < x ∧ y, Y ≥ x ∨ y) + P (X ≥ x ∨ y, Y < x ∧ y). Combining (7.1.18), (7.1.19), and (7.1.20) we obtain (7.1.17). Claim 7.1.4 Define X ∗ = F1−1 (U ), Y ∗ = min F1−1 (U ), F2−1 (U ) . Then Ec(X ∗ , Y ∗ ) = min (Ec(X, Y ); FX ≤ F1 , FY ≥ F2 ) ,
(7.1.21)
and the value of the expectation in (7.1.21) is given by 1 Ec(X ∗ , Y ∗ ) = max (0, F2 ((x ∧ y)−) − F1 ((x ∨ y)−)) µc ( dx, dy) 2 IR2
1 c F1−1 (t), min F1−1 (t), F2−1 (t) dt. =
(7.1.22)
0
For the proof of Claim 7.1.4 let X, Y be any r.v.s with d.f.s FX ≤ F1 , FY ≥ F2 . Using Claim 7.1.3 we obtain P (X ≥ x ∨ y, Y < x ∧ y)µc ( dx, dy) (7.1.23) 2Ec(X, Y ) ≥ IR2
(P (Y < x ∧ y) − P (X < x ∨ y, Y < x ∧ y)) µc ( dx, dy)
= IR2
≥
(P (Y < x ∧ y) − min {P (X < x ∨ y), IR2
6
7. Relaxed or Additional Constraints
P (Y < x ∧ y)}) µc ( dx, dy) (P (Y < x ∧ y) − P (X < x ∨ y))+ µc ( dx, dy)
= IR2
≥
(F2 ((x ∧ y)−) − F1 ((x ∨ y)−))+ µc ( dx, dy). IR2
∗ Next, we check that the lower bound −1 in (7.1.23) is attained for X ∗ = −1 −1 ∗ F1 (U ), Y = min F1 (U ), F2 (U ) . In fact, by Claim 7.1.3 using X ≥ Y ∗ and {U < F2 (z)} = F2−1 (U ) < z a.s. we get
2Ec(X ∗ , Y ∗ ) = (P (X ∗ ≥ x ∨ y, Y ∗ < x ∧ y) IR2
(7.1.24)
+P (X ∗ < x ∧ y, Y ∗ ≥ x ∨ y)) µc ( dx, dy) P (X ∗ ≥ x ∨ y, Y ∗ < x ∧ y)µc ( dx, dy)
= IR2
= IR2
=
P F1−1 (U ) ≥ x ∨ y, min F1−1 (U ), F2−1 (U ) < x ∧ y µc ( dx, dy) P F1−1 (U ) ≥ x ∨ y, F2−1 (U ) < x ∧ y µc ( dx, dy)
IR2
P (U ≥ F1 (x ∨ y), U < F2 (x ∧ y))+ µc ( dx, dy)
= IR2
=
(F2 ((x ∧ y)−) − F1 ((x ∨ y)−))+ µc ( dx, dy). IR2
Obviously, F(X ∗ ,Y ∗ ) = H ∗ ∈ H(F1 , F2 ), and the proof of Theorem 7.1.1 is complete. 2
Remark 7.1.5 The optimal coupling (7.1.16) leads to the following “greedy” algorithm for solving the finite discrete transportation problem with relaxed side conditions: minimize
n n
cij xij
i=1 j=1
subject to:
xij ≥ 0,
(7.1.25)
7.1 Mass Transportation Problem with Relaxed Marginal Constraints j n
xrs ≥
s=1 r=1 i n
j
bs =: Gj ,
1 ≤ j ≤ n,
ar =: Fi ,
1 ≤ i ≤ n,
7
s=1
xrs ≤
r=1 s=1
i r=1
n where the sum of the “demands” s=1 bs equals the sum of the “supplies” n r=1 ar , assuming that (cij ) are symmetric, cii = 0, and c satisfies the Monge condition (7.1.4). Set =
Hi δ1
max(Fi , Gi ), 1 ≤ i ≤ n,
(7.1.26)
= H1 , δi+1 = Hi+1 − Hi , 1 ≤ i ≤ n − 1.
Then (7.1.25) is equivalent to the standard transportation problem (7.1.2) with side conditions (ai ), (δi ). 7In the following example we compare the solution of problem (7.1.25) with inequality constraints with the “greedy” solution of the standard transportation problem with equality constraints (7.1.2). For the problem with inequality constraints we first calculate the new artificial demands δj as in (7.1.26) and then apply the northwest corner rule. supply a1
Example 7.1.6 yij xij
20 10
10
20 20
20 10
10 20 20 10 10 10 10
demand b1
10
30
10
40
0
10
10
40
50
90
90
100
Hj = Fj ∨ Gj
20
40
60
90
90
100
δ 1 = H1 ,
20
20
20
30
0
10
Gj =
j s=1
bs
δj+1 = Hj+1 − Hj
Fi =
i r=1
20
20
0
20
40
60
20
80
10
90
10
100
ar
“artificial” demands
xij = solution of the standard transportation problem (7.1.2), using the classical northwest corner yij = solution of the transportation problem with relaxed side conditions
8
7. Relaxed or Additional Constraints
We next extend the solution to the nonsymmetric case. We relax the symmetry condition, assuming that for any x, y the functions c(x, ·), c(·, y) are unimodal: c(x, y1 ) ≤ c(x, y2 ) if x ≤ y1 ≤ y2 or y2 ≤ y1 ≤ x,
(7.1.27)
c(x1 , y) ≤ c(x2 , y) if x2 ≤ x1 ≤ y or y ≤ x1 ≤ x2 . Theorem 7.1.7 If c(x, x) = 0 for all x ∈ IR and c satisfies the Monge condition and the unimodality condtion (7.1.27), then the relaxed transportation problem minimize Ec(X, Y )
subject to FX ≥ F1 , FY ≤ F2
has a solution, given by the coupling X ∗ = F1−1 (U ), Y ∗ = max F1−1 (U ), F2−1 (U )
(7.1.28)
(7.1.29)
with joint distribution
FX ∗ ,Y ∗ (x, y) = min F1 (x), min (F1 (y), F2 (y)) , and the optimal value is given by 1 Ec(X , Y ) = c F1−1 (u), max F1−1 (u), F2−1 (u) du. ∗
∗
0
Proof: Let X, Y be r.v.s with FX ≥ F1 , FY ≤ F2 . Then by (7.1.8), −1 Ec(X, Y ) ≥ Ec FX (U ), FY−1 (U ) . (7.1.30) −1 −1 −1 −1 −1 Let G(y) X (y), FY (y)). Then FX ≤ F1 , FY ≥ F2 , and G = −1= min(F −1 max FX , FY . We now need the following
Claim 7.1.8 1 1 −1 −1 c FX (u), FY−1 (u) du ≥ c FX (u), G−1 (u) du. 0
(7.1.31)
0
−1 (u), y1 = To show Claim 7.1.8 set (for a fixed u ∈ (0, 1)), x = FX −1 −1 −1 ∨ FY (u) = G (u), and y2 = FY (u).
−1 (u) FX
Case 1: x < y2 . In this case, x ≤ y1 ≤ y2 , and therefore, the unimodality condition (7.1.27) implies c(x, y2 ) ≥ c(x, y1 ). Case 2: y2 ≤ x. In this case, y1 = x, and therefore, y2 ≤ y1 = x. Again by the unimodality condition, c(x, y2 ) ≥ c(x, y1 ). So Claim 7.1.8 holds.
7.1 Mass Transportation Problem with Relaxed Marginal Constraints
9
Claim 7.1.9 The following bound holds for every coupling (X, Y ) : 1 −1 −1 c FX (u), FY−1 ∨ FX (u) du
(7.1.32)
0
1 ≥ c F1−1 (u), F2−1 (u) ∨ F1−1 (u) du. 0 −1 (u), x 2 = FY−1 (u), x1 = F1−1 (u), x2 = For the proof define x 1 = FX for a fixed u. Then x 1 ≤ x1 , x2 ≤ x 2 .
F2−1 (u)
2 , then x 1 ≤ x 2 ∨ x2 ≤ x 2 . If x 1 < x if
x 1 ≥ x 2 , then x 1 = x 1 ∨ x2 ≥ x 2 .
(7.1.33)
From (7.1.33) we obtain Claim 7.1.10 c( x1 , x 1 ∨ x2 ) ≥ c(x1 , x1 ∨ x2 ).
(7.1.34)
For the proof of Claim 7.1.10 we use the relation x1 ≥ x 1 . 1 . Then c( x1 , x2 ) = c( x1 , x 1 ∨ x2 ) ≥ c(x1 , x2 ) = Case 1: x2 > x1 > x c(x1 , x1 ∨ x2 ) by the unimodality condition. Case 2: (a) x1 ≥ x2 ≥ x 1 . Then, trivially, c( x1 , x2 ) = c( x1 , x2 ∨ x 1 ) ≥ c(x1 , x1 ∨ x2 ) = c(x1 , x1 ) = 0. 1 ≥ x2 . Then again, c( x1 , x 1 ) = c( x1 , x 1 ∨x2 ) ≥ c(x1 , x1 ∨x2 ) = (b) x1 ≥ x c(x1 , x1 ) = 0. Claims 7.1.8, 7.1.9, and 7.1.10 imply (7.1.28).
2
Remark 7.1.11 (a) The unimodality assumption (7.1.27) is quite natural from an application point of view. Note that the transportation problem in Theorem 7.1.7 is the same as in Theorem 7.1.1 (where only the indices 1,2 have been changed). We used this to demonstrate that the solution F ∗ is not unique. Without the symmetry, resp. the unimodality condition, the solution may differ substantially. Given a right continuous function f = f (y) ≥ 0, consider the cost function c(x, y) = f (y). Then
10
7. Relaxed or Additional Constraints
c satisfies the Monge condition, and so (7.1.28) is equivalent to the following problem: minimize
f (y) dFY (y)
subject to FY ≤ F2 .
(7.1.35)
Equivalently, we are seeking a d.f. F2 ≤ F2 such that the distribution of f with respect to F2 has a minimal first moment. Obviously, the solution (7.1.31) of Theorem 7.1.7 is not a solution of (7.1.35). (b) In the proof of Theorem 7.1.7, the assumption c(x, x) = 0 can be replaced with a weaker one, c(x, x) ≤ c(x, y) ∧ c(y, x),
for all x, y ∈ IR.
(7.1.36)
7.2 Mass Transportation Problem with Fixed Sum (Difference) of the Marginals and with Stochastically Ordered Marginals Consider a flow in a network with n nodes, i = 1, . . . , n, and let xij be the flow from node i to node j. Assume that for all nodes k the value of x + is fixed and equal to hk . As motivation, suppose ai = ik j xkj in n x , b = i k=1 xki to be the amount of workload corresponding to k=1 ik the outflow, resp. to the inflow, in node i. Assume that the total work capacity at node i is given by hi (in a certain time period). Then every admissible flow (xij ) should satisfy the condition hi = ai + bi ,
1 ≤ i ≤ n.
(7.2.1)
k k k Set A(k) = i=1 ai , B(k) = i=1 bi , and H(k) = i=1 hi . Then hk = A(k) + B(k) − (A(k − 1) + B(k − 1)), and (7.2.1) is equivalent to H(k) = A(k) + B(k),
1 ≤ k ≤ n.
(7.2.2)
Let cij be the transportation cost of a unit from node i to node j. Then the problem is to minimize the total cost cij xij subject to the admissibility condition (7.2.1) and xij ≥ 0. The general formulation of this problem is the following. For two d.f.s A, B define G(x) = 12 (A(x) + B(x)). For a given cost function c(x, y), c(x, y) dF (x, y), subject to F ∈ FA+B . (7.2.3) minimize IR2
7.2 Fixed Sum of the Marginals
11
Here FA+B is the set of all d.f.s F (x, y) with marginal d.f.s F1 , F2 satisfying F1 (x) + F2 (x) = A(x) + B(x). Consider next the special case c(x, y) = |x − y|. Let X, Y be real r.v.s with joint d.f. F . Then by the triangle inequality, E|X − Y | ≤
inf (E|X − a| + E|Y − a|).
a∈IR1
(7.2.4)
Since E|X − a| + E|Y − a| = |x − a| d(FX + FY )(x) depends only on the sum of the marginals, (7.2.3) is the best possible improvement of (7.2.4), provided that the sum of the marginal FX + FY is known. Rachev (1984d) showed that sup {E|X − Y |p ; FX + FY = A + B} = 1 −1 G (t) − G−1 (1 − t)p dt, p ≥ 1.
(7.2.5)
0
The following result gives an explicit solution of the general problem in (7.2.3). Proposition 7.2.1 Suppose c ≥ 0 is symmetric and satisfies the Monge condition: c(x , y ) − c(x, y ) − c(x , y) + c(x, y) ≤ 0
∀x ≥ x, y ≥ y.
(7.2.6)
1 = c(G−1 (u), G−1 (u)) du,
(7.2.7)
Then inf
c(x, y) dF (x, y); F ∈ FA+B
0
and sup
c(x, y) dF (x, y); F ∈ FA+B
=
1 c(G−1 (u), G−1 (1−u)) du.(7.2.8) 0
The corresponding optimal pairs of r.v.s (couplings) are given by (G−1 (U ), G−1 (U )), resp. (G−1 (U ), G−1 (1 − U )). Proof: Since c is symmetric, we obtain for any F ∈ FA+B , 1 c(x, y) dF (x, y) = (c(x, y) + c(y, x)) dF (x, y) 2 F (x, y) + F (y, x) . = c(x, y) d 2
12
7. Relaxed or Additional Constraints
(y,x) On the other hand, Fs (x, y) = F (x,y)+F ∈ F(G, G). Consequently, we 2 obtain (7.2.7), (7.2.8) by making use of (7.1.8) and (7.1.9) with F1 = F2 = G. 2
We have the following analogue of the above proposition for nonsymmetric cost functions. Proposition 7.2.2 If c(x, y) satisfies the Monge condition (7.2.6) and furthermore, x1 ≤ y ≤ x2 implies that c(x1 , x2 ) ≥ c(y, y), then
inf
c(x, y) dF (x, y); F ∈ FA+B
1 c(G−1 (u), G−1 (u)) du. =
(7.2.9)
0
Proof: Applying the Monge conditions for every X, Y with FX,Y ∈ FA+B , −1 (U ), FY−1 (U )). Since FX (x)+FY (x) = 2G(x), it follows Ec(X, Y ) ≥ Ec(FX −1 −1 ∧ FY−1 ≤ G−1 ≤ FX ∨ that FX ∧ FY ≤ G ≤ FX ∨ FY , and therefore, FX −1 −1 −1 −1 −1 FY . Consequently, we have c(FX (U ), FY (U )) ≥ c(G (U ), G (U )), which proves (7.2.9). 2 Remark 7.2.3 The marginals of the class FA+B have largest and smallest elements, defined by ⎧ ⎨ 2 G(x), x < x0 , F1∗ (x) = ⎩ 1, x≥x , 0
and
⎧ ⎨ 2 G(x) − 1, F2∗ (x) = ⎩ 1,
x < x0 , x ≥ x0 ,
with x0 := inf{y; 2G(y) ≥ 1}. Note that there is no smallest d.f. in FA+B . To show this let F1 (x), F2 (x) be the marginal d.f.s of the smallest elements F ∈ FA+B and let G1 , G2 be d.f.s such that G1 (x) + G2 (x) = 2G(x). If the lower Fr´echet bounds satisfy (F1 (x) + F2 (y) − 1)+ ≤ (G1 (x) + G2 (y) − 1)+ , then F1 ≤ G1 and F2 ≤ G2 , which implies that F1 = G1 , F2 = G2 . In particular, this implies that (G−1 (U ), G−1 (1 − U )) is in the general nonsymmetric case no longer a solution to the problem of maximizing c(x, y) dF (x, y) over the class FA+B . For example, let G be the d.f. of 1 4
4 i=1
ε(i) . Then P1 = P (G ∗ −1
while P2 = P ((F1 )
−1
(U ),G−1 (1−U ))
(U ),(F2∗ )−1 (1−U ))
=
= 14 (ε(1,4) + ε(2,3) + ε(3,2) + ε(4,1) ), 1 2 (ε(1,4)
+ ε(2,3) ). For c1 (x, y) =
7.2 Fixed Sum of the Marginals
13
1(−∞,(3,2)] (x, y), we have EP1 c1 = 14 , EP2 c1 = 0, while for c2 = 1[(2,3),∞) , we have EP1 c1 = 14 , EP2 c2 = 12 . Note that both functions, −c1 , −c2 , are Monge functions (but are not unimodal). We next consider the case where in the network example we fix the total outflow minus the inflow of each node. This problem is known in the literature as the minimal network flow problem (cf. for example Barnes and Hoffman (1985, Section 9) or Anderson and Nash (1987)). Assume that the outflow minus the inflow of each node is fixed; i.e., the following Kirchhoff equations hold xik − xki = ai − bi = hi for all i, k
k
or equivalently, H(k) = A(k) − B(k), with A(k) =
k j=1
aj , B(k) =
1 ≤ k ≤ n, k
bj , and H(k) =
j=1
(7.2.10) k
hj . Consider now the
j=1
general case: Let A, B be distribution functions and let FA−B be the set of all “generalized” d.f.s of finite measures on IR2 with marginals F1 , F2 satisfying F1 − F2 = A − B. We consider the following transportation problem: (7.2.11) minimize c(x, y) dF (x, y) subject to F ∈ FA−B , with c(x, y) satisfying the Monge condition (7.2.6). To solve (7.2.11) we make use of the following dual representation (cf. (6.1.23)): inf c(x, y) dF (x, y); F ∈ FA−B (7.2.12) = sup f d(A − B)(x); f (x) − f (y) ≤ c(x, y), ∀x, y . We first consider a particular type of cost function. Proposition 7.2.4 Let c(x, y) = |x−y| max(1, h(|x−a|), h(|y−a|)), where h is a monotonically nondecreasing function on IR+ . Then (7.2.13) inf c(x, y) dF (x, y); F ∈ FA−B = max(1, h(|x − a|))|A − B|(x) dx, provided that h(|x − a|) is locally integrable.
14
7. Relaxed or Additional Constraints
Proof: We first note that the duality constraints condition f (x) − f (y) ≤ c(x, y), for all x, y, holds if and only if f is absolutely continuous and moreover, |f (x)| ≤ max(1, h(|x − y|)) a.s. Consequently, by the dual representation (7.2.12), we obtain inf c(x, y) dF (x, y); F ∈ FA−B = sup f d(A − B)(x); |f | ≤ max(1, h(|x − a|)), ∀x = sup f (x) d(A − B)(x) dx; |f | ≤ max(1, h(|x − a|)), ∀x = max(1, h(|x − a|))|A − B|(x) dx. 2 To handle the general case set c(x, y) = |x − y|ζ(x, y)
i.e., ζ(x, y) =
c(x, y) |x − y|
.
(7.2.14)
Theorem 7.2.5 Assume that for any x < t < y, ζ(t, t) ≤ ζ(x, y), ζ(x, y) = ζ(y, x). Moreover, let ζ(x, y) be right continuous in y, and also assume that t → ζ(t, t) is locally bounded. Then the optimal value in the minimization problem (7.2.11) is equal to inf c(x, y) dF (x, y); F ∈ FA−B = ζ(t, t)|A − B|(t) dt. (7.2.15) Proof: Let F = {f ; f (x)−f (y) ≤ c(x, y), ∀x, y}, and let F ∗ = {f absolutely continuous and |f (t)| ≤ ζ(t, t), ∀t}. Then F ⊂ F ∗ , and for f ∈ F (y) (y) ≤ ζ(x, y), and therefore, lim f (x)−f ≤ ζ(x, x). Also, we have f (x)−f |x−y| |x−y| y→x
f (x) − f (y) f (y) − f (x) = − lim ≥ − lim ζ(y, x) = −ζ(x, x). |x − y| |x − y| y→x lim
Since f is locally Lipschitz, it is absolutely continuous, so the inequalities above imply that |f (t)| ≤ ζ(t, t) a.s. If, conversely, f ∈ F ∗ , then y y f (x) − f (y) = f (t) dt, and therefore, |f (x) − f (y)| ≤ |f (t)| dt ≤ y
x
x
ζ(t, t) dt ≤ |x − y|ζ(x, y) = c(x, y). The dual representation (7.2.12) again
x
implies (7.2.13) (by the same arguments as in the proof of Proposition 7.2.4). 2
7.2 Fixed Sum of the Marginals
15
Next, we consider the following transportation problem with stochastically ordered marginals posed by Rogers (1992). Let F, G be real distribution functions, F ≤st G; here as usual ≤st stands for the stochastic order. Let C := {(x, y) ∈ IR2 ; x ≤ y}, and let MC (F, G) := M (F, G) ∩ {µ ∈ M 1 (IR2 , IB2 ); µ(C) = 1}
(7.2.16)
be the set of all measures with marginals F, G that are concentrated on the order cone C. The problem is to determine, for a given strictly convex function ϕ, the bound sup ϕ(x − y)µ( dx, dy); µ ∈ MC (F, G) . (7.2.17) The motivation for problem (7.2.17) is to get a good monotone coupling of random walks (Sn ), (Sn ) with S0 = x ≥ X0 = 0, Sn ≥ Sn for all n, and Sn = Sn for all large enough n. Without the order restriction, a solution of (7.2.17) is given by the random variables X = F −1 (U ), Y = G−1 (1 − U ) for a uniform (0, 1) distributed r.v. U. It is intuitively clear that a solution of (7.2.17) should concentrate as much mass on the diagonal as possible. This is indeed true. Theorem 7.2.6 (Rogers (1992)) Each solution (X, Y ) of (7.2.17) has the property that P (X = Y ) = |F ∧ G| = f ∧ g dm, (7.2.18) when F = f m, G = gm. There exists a solution of (7.2.17). We next characterize the optimal solutions by an order-type relation. Theorem 7.2.7 Let X, Y be r.v.s with d.f.s F, G and X ≤ Y a.s. Then (X, Y ) defines a solution of (7.2.17) iff X(ω) < X(ω ) ≤ Y (ω) ≤ Y (ω )
implies Y (ω ) = Y (ω)
(7.2.19)
a.s. (for (ω, ω ) and with respect to the product measure). Proof: If (X, Y ) is an optimal admissible coupling and if on a set of pairs (ω, ω ) with positive measure X(ω) < X(ω ) ≤ Y (ω) < Y (ω ) holds, then let us define Y (ω ) := Y (ω), Y (ω) := Y (ω ) and set Y = Y otherwise. Then Y has d.f. G and Eϕ(X − Y ) > Eϕ(X − Y ) because ϕ is strictly convex. Since there is essentially up to simultaneous rearrangements only one pair of r.v.s X, Y with d.f.s F, G satisfying the order relation (7.2.18), the opposite direction follows from the first part of the proof. 2
16
7. Relaxed or Additional Constraints
In terms of measures µ ∈ MC (F, G), the characterization of optimality of µ in (7.2.19) can be formulated as µ ⊗ µ ({(x1 , y1 , x2 , y2 ); x1 < x2 ≤ y1 < y2 }) = 0.
(7.2.20)
We remark that the characterization of optimal pairs in (7.2.19), resp. (7.2.20), implies the “maximal concentration on the diagonal” property in (7.2.18). For finite discrete distributions one can explicitly construct optimal pairs with the ordering property given in (7.2.19). We consider at nfirst the case of equiprobable atoms in each distribution. So let µ1 = n1 i=1 εai , µ2 = n 1 i=1 εbi be the measures corresponding to F, G, where a1 ≤ · · · ≤ an , n b1 ≤ · · · ≤ bn , and ai ≤ bi for all i. Problem (7.2.17) is equivalent to the following problem: Find a permutation π ∈ Υn such that n
ϕ(bi − aπ(i) ) is maximal.
(7.2.21)
i=1
Here, the maximum is considered over all permuations π ∈ Υn such that aπ(i) ≤ bi , 1 ≤ i ≤ n. Permutations with this property are called admissible permutations. An optimal admissible permutation is essentially unique (up to indices with equal values of ai ), and it is given in the following theorem. Theorem 7.2.8 Define π ∗ ∈ Υn inductively: π ∗ (1)
:= max{k ≤ n; ak ≤ b1 } (7.2.22) ∗ ∗ π (k) := max{ ≤ n; ∈ {π (1), . . . , π (k − 1)}, a ≤ bk }, 2 ≤ k ≤ n. ∗
Then π ∗ ∈ Υ is the optimal admissible permutation. Proof: Define on Ω = {1, . . . , n} (supplied with the uniform distribution P ) random variables X(i) := ai and Y (i) := bπ∗ (i) , 1 ≤ i ≤ n. Then X ≤ Y , since π ∗ is admissible and X, Y satisfy the order relation (7.2.19). Therefore, they are optimal couplings. Equivalently, π ∗ is the optimal admissible permutation. 2 It is clear from the construction in Theorem (7.2.8) that up to a simultaneous permutation of the probability space, an optimal pair of r.v.s is essentially unique. Remark 7.2.9 Theorem 7.2.8 can be extended to the case that µ1 = n n p ε , µ = q 2 i=1 i ai i=1 i εbi with rational pi , qi , by representing pi , qi in the formal equiprobable case. By an approximation argument—as given in Rogers (1992)—one can approximate the optimal couplings for F, G with
7.3 Mass Transportation Problems with Capacity Constraints
17
couplings having compact support. The general case then can be approximated via the ordering criterion (7.2.19) using a truncation technique. Thus, applying Theorem 7.2.8, we are able to construct explicit approximate solutions in the general case.
7.3 Mass Transportation Problems with Capacity Constraints In this section we obtain explicit solutions of Monge–Kantorovich mass transportation problems with capacity constraints. The Hoeffding–Fr´echet inequality is extended for bivariate distribution functions having fixed marginals and satisfying additional constraints. In the discrete case, our results lead to “greedy” algorithms similar to the classical northwest corner rule. Let us start with recalling the abstract version of the MKP: Given two Borel measures µ and ν on a separable metric space S with equal total mass λ = µ(S) = ν(S) < ∞ and a measurable cost function c on S × S, find Lc (µ, ν) = inf c(x, y)P ( dx, dy), (7.3.1) Uc (µ, ν) = sup c(x, y)P ( dx, dy), (7.3.2) where the infimum and supremum are taken over all Borel measures P on S × S having projections (marginals) P (· × S) = µ(·),
P (S × ·) = ν(·).
(7.3.3)
As shown in Section 3.1, the explicit solutions of MKP are based on the Hoeffding–Fr´echet inequality (referred to as upper and lower Fr´echet bounds): max(0, F µ (x) + F ν (y) − λ) ≤ F P (x, y) ≤ min(F µ (x), F ν (y)),
(7.3.4)
for any P on IR2 that satisfies (7.3.3) with S = IR. (In (7.3.4) and in the sequel, F P stands for the distribution function of P .) If c is a lattice superadditive (equivalently, −c is a Monge function): c(x , y ) + c(x, y) ≥ c(x , y) + c(x, y )
for all x ≥ x, y ≥ y,
(7.3.5)
then under mild moment conditions on µ and ν the explicit values of Lc and Uc were given in Section 3.1. In this section we consider two marginal problems with additional constraints on the joint distribution functions. Suppose µ and ν are two nonnegative Borel measures on IR, µ(IR) = ν(IR) = λ < ∞. Suppose c : IR2 →
18
7. Relaxed or Additional Constraints
IR is a right-continuous Monge function generating a nonnegative measure on IR2 . Let σ be a nonnegative bounded Borel measure on IR2 . (Note that the total mass of σ may be different from λ.) Problem I.
Find maximum c(x, y)P ( dx, dy)
(7.3.6)
IR2
subject to the constraints P is a nonnegative Borel measures on IR2
(7.3.7)
with marginals µ and ν, and P ((−∞, x] × (−∞, y]) ≤ σ((−∞, x] × (−∞, y]) for all x, y ∈ IR.
(7.3.8)
Problem II.
Find minimum c(x, y)P ( dx, dy)
(7.3.9)
IR2
subject to (7.3.7) and P ((−∞, x] × [y, ∞)) ≤ σ((−∞, x] × [y, ∞))
for all x, y ∈ IR. (7.3.10)
Problem I with discrete µ and ν was studied by Barnes and Hoffman (1985). Olkin and Rachev (1990) extended their results by completing the characterization of the “optimal feasible” P; i.e., P satisfies (7.3.7), (7.3.8) and attains the maximum in (7.3.6). This method is extended to solve Problem II as well. We start with a refinement of the Fr´echet bounds (7.3.4). We shall do this by determining the exact bounds for a d.f. F P (x, y) with marginals F µ and F ν assuming that P satisfies the constraint (7.3.8) or (7.3.10). Then we shall apply the extended Fr´echet bounds to solve Problems I and II. Whereas in the discrete case the solution of Problem I leads to the Barnes– Hoffman greedy algorithm, the solution of Problem II implies a new greedy algorithm for a transportation problem with capacity constraints (7.3.10). We begin with some notation. For two nonnegative Borel measures µ and ν on IR with equal total mass λ denote by M (µ, ν) the set of all nonnegative Borel measures on IR2 with projections µ and ν. Without loss of generality set λ = 1. Given a nonatomic probability space, the set F(A, B) of joint d.f.s F (x, y) = FX,Y (x, y) = P (X ≤ x, Y ≤ y) with fixed
7.3 Mass Transportation Problems with Capacity Constraints
19
marginals FX = A and FY = B is the set of d.f.s of the probability laws in M (µ, ν). Thus, the Fr´echet bounds (7.3.4) can be rewritten as max
F (x, y) = F ∗ (x, y) :=
min(A(x), B(y)),
(7.3.11)
max
G(x, y) = G∗ (x, y) :=
min(A(x), B(y)),
(7.3.12)
F ∈F (A,B) F ∈F (A,B)
where B(y) := ν([y, ∞)) and G(x, y) := GX,Y (x, y) := P (X ≤ x, Y ≥ y). Clearly, the laws corresponding to F ∗ and G∗ are in M (µ, ν). Furthermore, given a nonnegative bounded Borel measure σ on IR2 , set F σ (x, y) := σ((−∞, x] × (−∞, y]), (7.3.13) σ G (x, y) := σ((−∞, x] × [y, ∞)) F(A, B, F σ ) := {F ∈ F(A, B); F ≤ F σ }, G(A, B, Gσ ) := {GX,Y ; FX,Y ∈ F(A, B), GX,Y ≤ Gσ } . Our objective in the next two theorems is to extend the Fr´echet bounds; we shall characterize the bounds max
F (x, y) =: F(x, y),
x, y ∈ IR,
(7.3.14)
max
y), G(x, y) =: G(x,
x, y ∈ IR,
(7.3.15)
F ∈F (A,B,F σ ) G∈G(A,B,Gσ )
and shall examine the conditions implying F ∈ F(A, B, F σ )
∈ G(A, B, Gσ ). and G
(7.3.16)
Theorem 7.3.1 If F σ (x, y) ≥ max(0, A(x) + B(y) − 1),
(7.3.17)
then the maximum in (7.3.14) is attained: F(x, y) = =
inf {F σ (t, s) + µ((t, x]) + ν((s, y])}
t≤x s≤y
(7.3.18)
inf {F σ (t, s) + µ((t, x]) + ν((s, y])} ∧ (A(x) ∧ B(y)),
t≤x s≤y
and F ∈ F(A, B, F σ ), where ∧ := min. Remark 7.3.2 Condition (7.3.17) is necessary and sufficient for F(A, B, F σ ) = Ø, cf. Fr´echet (1951), Kellerer (1964). Remark 7.3.3 The second equality in (7.3.18) follows from the fact that F σ (t, s) = 0 for t = −∞ or s = −∞.
20
7. Relaxed or Additional Constraints
Remark 7.3.4 From (7.3.18) F is not greater than the Hoeffding–Fr´echet upper bound F ∗ (7.3.11). Remark 7.3.5 By (7.3.4) the maximum in (7.3.11) is attained for the pair X ∗ = A− (U ), Y ∗ = B − (U ), where A− is the generalized inverse of A, and U is uniformly distributed on [0, 1]. In contrast, for F given in (7.3.18), Y ) with joint d.f. given by F is the explicit form of the optimal pair (X, not known. However, in the discrete case one can use the Barnes–Hoffman greedy algorithm to compute F. Suppose µ, ν, and σ are discrete measures,
ai
:= µ({xi }),
bj
:= ν({yj }), j ∈ N = {1, 2, . . . , }, = bj = 1;
ai
i∈M
i ∈ M = {1, 2, . . . , m},
(7.3.19)
j∈N
σij
:= F σ (xi , yj ),
i ∈ M, j ∈ N.
(7.3.20)
Then F(xi , yj ) =
j i
prs ,
(7.3.21)
r=1 s=1
where the probabilities prs are determined by the following variant of the northwest corner rule (see Hoffman (1961), Barnes and Hoffman (1985)); in fact, we set p11 pij
:=
min(a1 , b1 , σ11 ); (7.3.22) ⎧ ⎫ ⎪ ⎪ j−1 i−1 ⎨ ⎬ := min ai − pis , bj − prj , σij − prs , ⎪ ⎪ ⎭ ⎩ r≤i s≤j s=1 r=1 (r,s)=(i,j)
if prs is determined for r ≤ i < m and s ≤ j < n, and we let j−1 i−1 pis , bj − prj , if i = m or j = n. pij := min ai − s=1
r=1
In other words, taking discrete versions of µ, ν, and σ in (7.3.19) one can apply the greedy algorithm (7.3.22) to approximate F in (7.3.18) by means of (7.3.21). Proof of Theorem 7.3.1: The proof is based on three assertions. Claim 7.3.6 (Fr´echet (1951)) The condition F σ (x, y) ≥ H− (x, y) = max(0, A(x) + B(y) − 1) is necessary and sufficient for F(A, B, F σ ) = Ø.
7.3 Mass Transportation Problems with Capacity Constraints
21
Suppose F(A, B, F σ ) = Ø. Then, by (7.3.4) H− (x, y) ≤ F (x, y) < F σ (x, y),
F ∈ F(A, B, F σ ).
(7.3.23)
On the other hand, if H− ≤ F σ , then H− ∈ F(A, B, F σ ). Claim 7.3.7 F defined by (7.3.18) has marginal d.f.s A and B and for all x, y ∈ IR, sup F ∈F (A,B,F σ )
F (x, y) ≤ F(x, y).
(7.3.24)
For any F ∈ F(A, B, F σ ) and any t ≤ x, s ≤ y, we have F (x, y) ≤ F (t, s) + µ((t, x]) + ν((s, y]), which clearly implies (7.3.24). Invoking Remark (7.3.3), F(x, y) ≤ H+ (x, y) where H+ is the upper Hoeffding–Fr´echet bound, H+ (x, y) := min(A(x), B(y)). Since F ≥ H− (cf. (7.3.23), (7.3.24)), F ∈ [H− , H+ ] has marginals A and B. σ
Theorem 7.3.1 is now a consequence of the following assertion. Claim 7.3.8 F is a d.f. To this end, we choose −∞ = x0 < x1 < · · · < xm−1 < xm = ∞, −∞ = y0 < y1 < · · · < yn−1 < yn = ∞ such that µ((xi−1 , xi )) < ε, ν((yn−1 , y1 )) < ε, and σ((xi−1 , xi ) × (yj−1 , yj )) < ε for all i ∈ M = {1, . . . , m} and j ∈ N = {1, . . . , n}. Set ai := µ((xi−1 , xi ]), bj := ν((yj−1 , yj ]), and σij := F σ (xi , yj ). Consider the convex polygon ⎧ ⎨ pij = ai , (7.3.25) p = (pij ) i∈M ; pij ≥ 0, pi· := j∈N ⎩ j∈N j i pij = bj , prs ≤ σij , for all i ∈ M, j ∈ N p·j := =
i∈M
r=1 s=1
p; pij ≥ 0, pi· = ai , p·j = bj ,
j i
prs ≤ σij , i = 1, . . . , m − 1,
r=1 s=1
j = 1, . . . , n − 1,
j s=1
p·s ≤ σmj , j ∈ N,
i
pr· ≤ σin , i ∈ M
.
r=1
By the Fr´echet condition (7.3.17) (cf. Claim 7.3.6) the marginals of F σ j majorize A and B, respectively, and thus σmj ≥ s=1 p·s and σin ≥ i p for all j ∈ N, i ∈ M . The polygon (7.3.25) becomes r=1 r· p; pij ≥ 0, pi· = ai , p·j = bj
for all i ∈ M, j ∈ N,
(7.3.26)
22
7. Relaxed or Additional Constraints j i
prs ≤ σij
for all i = 1, . . . , m − 1, j = 1, . . . , n − 1 .
r=1 s=1
Consider now the discrete analogue of F in (7.3.18): dij
:=
dij
:=
min {σrs + ar+1 + · · · + ai + bs+1 + · · · + bj },
0≤r≤i 0≤s≤j
0
(7.3.27)
if i = 0 or j = 0,
where σrs = 0 if r = 0 or s = 0. Our aim now is to show that d = (dij ) i∈M j∈N
determines a bivariate d.f. with support on X × Y, X = (xi )i∈M , Y = (yj )j∈N . Claim 7.3.9 The greedy algorithm (7.3.22) is determined uniquely by (7.3.27); i.e., j i
dij :=
prs ,
i ∈ N, j ∈ M.
(7.3.28)
r=1 s=1
Proof: Consider the discrete version of F (cf. (7.3.21), (7.3.25)). Let σr,s := 0 if r = 0 or s = 0, and define dij
:=
dij
=
min {σrs + ar+1 + · · · + ai + bs+1 + · · · + bj },
0≤r≤i 0≤s≤j
(7.3.29)
0 if i = 0 or j = 0.
We need to check the equality dij =
j i
prs ,
i ∈ M, j ∈ N,
(7.3.30)
r=1 s=1
where the pij ’s are determined by the greedy algorithm (7.3.22). If i = j = 1, then p11 = min(a1 , b1 , σ11 ) (cf. (7.3.22)), and by (7.3.29) d11 = min{σ11 + a1 + b1 , σ11 + a1 , σ10 + b1 , σ11 } = p11 . Suppose we have proved that d1,j−1 = p11 + · · · + p1,j−1 . Then p11 + · · · + p1j
(7.3.31)
7.3 Mass Transportation Problems with Capacity Constraints
=
j−1
p1s + min a1 −
s=1
= =
j−1
p1s , bj , σ1j −
s=1
j−1
23
p1s
s=1
min{a1 , bj + d1,j−1 , σ1,j } min{a1 , b1 + · · · + bj , σ11 + b2 + · · · + bj , . . . , σ1,j−1 + bj , σi,j }
= d1,j . These equalities hold due to (7.3.22), (7.3.31), and (7.3.29), respectively. By symmetry, di,1 = p11 + · · · + pi1 . Suppose next that drs =
s r
for all r ≤ i, s ≤ j, (r, s) = (i, j).
pkl
(7.3.32)
k=1 l=1
Then for 1 ≤ i ≤ m − 1, 1 ≤ j ≤ n − 1, j j j−1 i i−1 i prs = min ai + prs , bj + prs , σij = dij , r=1 s=1
r=1 s=1
r=1 s=1
where the equalities follow from (7.3.22) and (7.3.32). Thus dij =
j i
prs
for all 1 ≤ i ≤ m − 1, 1 ≤ j ≤ n − 1.
(7.3.33)
r=1 s=1
Consider now the case i = m. Then, m m−1 m−1 pr1 = pr1 + min am , b1 − pr1 r=1
r=1
=
r=1
min{am + dm−1,1 , b1 } = dm,1 ,
which follows from (7.3.22) and (7.3.33). Suppose that dm,j−1 =
j−1 m
prs .
(7.3.34)
r=1 s=1
Then using (7.3.22), (7.3.33), and (7.3.34), for 1 ≤ j ≤ n, j j−1 m m−1 m n prs = min am + prs , bj + prs r=1 s=1
r=1 s=1
=
r=1 s=1
min{σrs + ar+1 + · · · + am bs+1 + · · · + bj },
for 0 ≤ r ≤ m, 0 ≤ s ≤ j, (r, s) = (m, j); m m r=1 s=1
prs = dm,j ,
for r = m, s = j, σm,j = F σ (∞, yj ) ≥ bj .
24
7. Relaxed or Additional Constraints
Similarly, di,n =
i n
prs , for all i ∈ M , which proves Claim 7.3.9.
2
r=1 s=1
The greedy algorithm (7.3.22) defines nonnegative pij ’s (cf. Barnes and Hoffman (1985, Lemma 3.2)). Define the probability P (ε) on X × X by P (ε) ((−∞, xi ], (−∞, yj ]) := dij ,
i ∈ M, j ∈ N.
(7.3.35)
Similarly, (ai )i∈M and (bi )i∈N determine probabilites µ(ε) and ν (ε) with supports X and Y , respectively. If is the Kolmogorov (uniform) distance (µ, ν) := sup |F µ (x) − F ν (x)| ,
(7.3.36)
x∈IR
then the sequences µ(ε) ε>0 and ν (ε) ε>0 are -relatively compact, and thus there exists εn ↓ 0 such that
µ(εn ) , µ → 0 and ν (εn ) , ν → 0. (7.3.37) (For more facts on -relative compactness cf. Rachev (1984a) and Kakosjan, Klebanov, and Rachev (1988, Sec. 2.5).) Similarly, by definition of σij := F σ (xi , yj ), σ((xi−1 , xi )×(yj−1 , yj )) < ε we have that (σij ) i∈M determines a measure σ (ε) on X × Y . Again, the i∈N family σ (ε) ε>0 is -relatively compact. Thus, without loss of generality, we may assume that as εn → 0, (ε )
n (7.3.38) σ (εn ) , σ = sup F σ (x, y) − F σ (x, y) → 0. x,y∈R
(ε) As inClaim has marginals µ(ε) and ν (ε) , and 7.3.7, we conclude that P (ε) is tight. By (7.3.37), (7.3.38), (7.3.26), and (7.3.18), there thus P ε>0 exists a subsequence {εn } ⊂ {εn } such that P (εn ) weakly converges to a measure P with d.f. F. The proof of Theorem 7.3.1 is now complete. 2
The next theorem provides an explicit expression for the Fr´echet type bound (7.3.15). We recall the notations (7.3.11)–(7.3.13). Theorem 7.3.10 Suppose Gσ (x, y) := σ((−∞, x] × [y, ∞)) (cf. (7.3.13)) satisfies the condition
◦ ◦ B(y) := ν((−∞, y)) . (7.3.39) Gσ (x, y) ≥ max 0, A(x) − B(y)
7.3 Mass Transportation Problems with Capacity Constraints
25
Then the maximum in (7.3.15) is attained, and y) = inf {Gσ (t, s) + µ((t, x] + v([y, s))} . G(x, t≤x s≥y
(7.3.40)
Conclusions similar to those in Remarks 7.3.2–7.3.5 can be made. Here we shall only point out the greedy algorithm that can be used to approximate We use the notations (7.3.19) again and let the optimal distribution G. λij := Gσ (xi , yj ),
i ∈ M, j ∈ N.
(7.3.41)
has the form Then, in this discrete case, G i , yj ) = G(x
i n
prs ,
(7.3.42)
r=1 s=j
where the probabilites pij are determined by the following southwest corner rule: pin
:=
pij
:=
min{a1 , bn , λ1n }; (7.3.43) ⎧ ⎫ ⎪ ⎪ n i−1 ⎨ ⎬ min ai − pis , bj − prj , λij − prs , (7.3.44) ⎪ ⎪ ⎩ ⎭ r≤i s≥j r=1 s=j+1 (r,s)=(i,j)
if i = m or j = 1. if prs is determined for r ≤ i ≤ m − 1 and s ≥ j > 1; moreover, ⎧ ⎫ n i−1 ⎨ ⎬ pij := min ai − pis , bj − prj , if i = m or j = 1. ⎩ ⎭ s=j+1
(7.3.45)
r=1
We now give explicit solutions of the marginal problems I and II. Theorem 7.3.11 Suppose (i)
c : IR2 → IR is a right-continuous lattice superadditive function (−c is a Monge function);
(ii)
µ and ν are two Borel nonnegative measures on IR with µ(IR) = ν(IR) = λ < ∞ and d.f.s F µ and F ν , and such that
c(x0 , y)ν( dy) < ∞
c(x, y0 )µ( dx) + IR
IR
for some x0 , y0 ∈ IR;
(7.3.46)
26
7. Relaxed or Additional Constraints
(iii)
σ is a nonnegative bounded Borel measure on IR2 and F σ (x, y) ≥ max(0, F µ (x) + F ν (y) − λ)
for all x, y ∈ IR. (7.3.47)
Then the maximum in (7.3.6) is attained at the “optimal” measure P. P satisfies the feasibility conditions (7.3.7), (7.3.8) and is determined by
F P (x, y) := inf {F σ (t, s) + µ((t, x]) + ν((s, y])} , t≤x s≤y
x, y ∈ IR.
(7.3.48)
Proof: We need Theorem 3.1.2 (cf. Cambanis, Simons, and Stout (1976, Theorem 1); see also Rachev (1991c, Section 7.3)). If (7.3.5) holds, then for measures P1 and P2 on IR2 with marginals µ and ν, c dP1 ≤ c dP2 , (7.3.49) F P1 ≤ F P2 ⇒ IR2
IR2
2
which with an appeal to Theorem 7.3.1 yields the result.
Remark 7.3.12 The assumption (7.3.46) can be replaced by one of the following assumptions:
c(x, x)(µ + ν)( dx) < ∞.
(a)
c(x, y) is symmetric, and
(b)
c(x, y) is uniformly integrable for all P with marginals µ and ν.
That (a) implies (7.3.49) follows from Cambanis, Simons, and Stout (1976); that (b) implies (7.3.49) follows from Tchen (1980, Corollary 2.1); see also Rachev (1991c, Theorem 7.3.2). Remark 7.3.13 Condition (7.3.47) guarantees that the set of feasible solutions P determined by (7.3.7), (7.3.8) is not empty. Remark 7.3.14 If F σ (x, y) ≥ min(F µ (x), F ν (y)) =: H+ (x, y),
(7.3.50)
then F P in (7.3.48) equals H+ (cf. Remark 7.3.3). Thus, Theorem 7.3.11 (see also the next Theorem 7.3.17) can be considered as a generalization of Theorem 2 of Cambanis, Simons, and Stout (1976) and Corollary 2.2 of Tchen (1980). In this case, Hoffman’s (1962) northwest corner rule gives a greedy algorithm to determine an “optimal” measure P, provided that µ and ν have finite discrete support.
7.3 Mass Transportation Problems with Capacity Constraints
27
Remark 7.3.15 Consider the discrete version of Problem I (see (7.3.6)). Suppose c(i, j), i ∈ M, j ∈ N , is a lattice superadditive sequence c(i, j) + c(i + 1, j + 1) ≥ c(i, j + 1) + c(i + 1, j), (7.3.51) i = 1, . . . , m − 1, j = 1, . . . , n − 1. Hoffman (1961) and Barnes and Hoffman (1985) treat c(i, j) as the (negative) cost of shipping a unit commodity from origin i to destination j. Suppose the discrete measures µ and ν with supports M and N are given. Then ai = µ{i} and bj = ν{j} are interpreted as the amount of a product available at i and the amount required at destination j. Suppose the (m − 1) × (n − 1) matrix (σij ) satisfies ⎧ ⎫ i n ⎨ ⎬ σij ≥ max 0, ar − bs , (7.3.52) ⎩ ⎭ r=1
s=j+1
σij ≤ σis , σij ≤ σrj , σij + σrs ≥ σis + σrj , r ≥ i, s ≥ j. (These conditions are related to what is called a uniformly tapered matrix; see Marshall and Olkin (1979).) Barnes and Hoffman (1985) consider the following transportation problem: maximize c(i, j)pij (7.3.53) i∈M j∈N
subject to
j i
pij
≥ 0, pi· = ai , p·j = bj
pij
≤ σij ,
for all i ∈ M, j ∈ N, (7.3.54)
i = 1, . . . , m − 1, j = 1, . . . , n − 1.
r=1 s=1
Clearly, (7.3.54) is a special case of Problem I. Following Barnes and Hoffman, (7.3.54) can be viewed as the capacity restrictions on the amount that can be shipped from the first i origins to the first j destinations. Theorem 7.3.11 is completed by showing that the greedy algorithm of Barnes and Hoffman (1985) for determining the solution (pij ) i∈M of (7.3.53) is also j∈N characterized by F P (i, j) :=
j i
prs
(7.3.55)
r=1 s=1
=
σrs
=
min {σrs + ar+1 + · · · + ai + bs+1 + · · · + bj } ,
0≤r≤i 0≤s≤j
⎧ ⎨ 0 if r = 0 or s = 0, ⎩ +∞ if r = m or s = n.
28
7. Relaxed or Additional Constraints
Remark 7.3.16 One can determine the extremal value in (7.3.6):
max µ ν
F ∈F (F ,F
;F σ )
R2
c dF P ,
c dF =
(7.3.56)
IR2
where F P is given by (7.3.48). By (7.3.46) and Cambanis, Simons, and Stout (1976, p. 288, (9)), c dF = IR2
(7.3.57)
IR
c(x0 , y)F ν ( dy) − c(x0 , y0 ) +
c(x, y0 )F µ ( dx) + IR
B(x, y)µc ( dx, dy) IR2
for any bivariate d.f. F with marginals F µ and F ν . (Since F P ∈ µ ν· σ F(F , F , F ) by Theorem 7.3.1, (7.3.57) can be used to compute the value
c dF P . In (7.3.57) the points x0 and y0 are the same as in condition
of
(7.3.23), the measure µc is generated by c (see condition (i) in Theorem 7.3.11), and we also assume that c is a nondecreasing function in both arguments.) Finally, := B1 − B2 (7.3.58) ⎧ ⎪ ⎪ 1 + F (x, y) − F µ (x) − F ν (y) if x0 < x, y0 < y, ⎪ ⎨ B1 (x, y) := F (x, y) if x ≤ x0 , y ≤ y0 , ⎪ ⎪ ⎪ ⎩ 0 otherwise; ⎧ µ ⎪ ⎪ ⎪ F (x) − F (x, y) if x ≤ x0 , y0 ≤ y, ⎨ B2 (x, y) := F ν (y) − F (x, y) if x0 < x, y ≤ y0 , ⎪ ⎪ ⎪ ⎩ 0 otherwise. B
Theorem 7.3.17 Suppose conditions (i) and (ii) of Theorem 7.3.11 hold, and in addition ∗
(iii )
σ is a nonnegative bounded Borel measure on IR2 satisfying
Gσ := σ ((−∞, x], [y, ∞)) ≥ max (0, F µ (x) − ν((−∞, y)))
(7.3.59)
for all x, y ∈ IR. Then, the minimum in (7.3.9) is attained at an optimal measure Q satisfying the feasibility conditions (7.3.7) and (7.3.10); Q is
7.3 Mass Transportation Problems with Capacity Constraints
29
determined by GQ (x, y) = Q((−∞, x] × [y, ∞)) = inf {Gσ (x, y) + µ((t, x]) + ν([y, s))} .
(7.3.60)
t≤x s≥y
All the Remarks 7.3.12–7.3.16 can be easily reformulated regarding Theorem 7.3.17. In particular, consider the transportation problem c(i, j)pij (7.3.61) minimize i∈M j∈N
subject to
n i
pij
≥ 0, pi· = ai , p·j = bj
for all i ∈ M, j ∈ N, (7.3.62)
prs
≤ λij , i = 1, . . . , m − 1, j = 1, . . . , n − 1.
r=1 s=j
bj and c(·, ·) is a lattice superadditive sequence
j i (cf. (7.3.51)). Suppose also that λij ≥ max 0, r=1 ar − s=1 bs and for any r < i and s > j the inequalities
Suppose
i∈M
ai =
j∈N
λrj ≤ λij ≥ λis , λij + λrs ≥ λis + λrj ≥ 0, r < i, s > j,
(7.3.63)
hold. Then the greedy algorithm (7.3.43)–(7.3.45) realizes the minimum in (7.3.61). Moreover, the optimal pij ’s are determined by pij = fij − fi,j+1 − fi−1,j + fi−1,j+1 , where fij
:=
min {λrs + (ar+1 + · · · + ai ) + (bj + · · · + bs−1 )}
1≤r≤i j≤s≤n
∧
i r=1
ar ∧
n
bs .
(7.3.64)
s=j
The rest of this section is devoted to a generalization of the MKP with additional constraints stated in Problems I and II; see (7.3.6)–(7.3.10). The results are motivated by Hoffman and Veinott (1990), where the discrete version of the problem has been considered. We shall only state the results. The proofs are similar to those of Theorems 7.3.1 and 7.3.10 and will therefore be omitted. The abstract form of the problem is the following. Suppose that (i)
µ and ν are two nonnegative Borel measures on IR, µ(IR) = ν(IR) = λ < ∞;
30
7. Relaxed or Additional Constraints
(ii) L is a union of disjoint sublattices Li ⊂ IR2 , i ∈ S, and the projections of L on each axis equal IR; (iii) (σi )i∈S are nonnegative σ-finite Borel measures on Li . Then the problem is to find min c dP,
(7.3.65)
L
where the minimum is subjet to the following constraints: (i)
P ’s are nonnegative Borel measures on L with marginals µ and ν;
(7.3.66)
(ii)
P (A) ≤ σi (A)
(7.3.67)
for any A = Li ∩ (−∞, x] × (−∞, y],
(x, y) ∈ Li , i ∈ S.
As before, see (7.3.1)–(7.3.3), the measures µ and ν are viewed as initial and final mass distributions, and P in (7.3.66), (7.3.67) are the (feasible) transportation plans. Here the generalization of problems I and II is that L describes the path of the transportation flow and σi ’s are capacity constraints on the cumulative supply–demand flow. Finally, c : L → R is a cost function, and therefore, the integral in (7.3.65) represents the total cost of mass transportation applying the plan P . Suppose c is subadditive on the lattice L; that is; for all x, y ∈ L, f (x) + f (y) ≥ f (x ∧ y) + f (x ∨ y). Then we shall call a feasible plan of transportation achieving the minimum in (7.3.65) an optimal measure P ∗ . As in problems I and II we start with extensions of the classical Hoeffding– Fr´echet bounds (7.3.4), assuming that P meets the constraints (7.3.66) and (7.3.67), or their alternatives: P is a nonnegative Borel measure on L = Li (7.3.68)
i∈S
Li := {(x, y); (x, −y) ∈ L} with marginals µ and ν ;
P (B) ≤ σi (B) for any B = Li ∩ ((−∞, x] × [y, ∞)). The restriction on the support of P given in (7.3.66) has the form L = Li , where S = {0, 1, . . . , s}, i∈S
(7.3.69)
7.3 Mass Transportation Problems with Capacity Constraints
31
+ − + or S = IN, and each sublattice Li is a rectangle (x− i , xi ] × (yi , yi ], where − − − + − + − − + − x0 = y0 = −∞, xi < xi , yi < yi , xi−1 ≤ xi ≤ xi−1 ≤ x+ i , yi−1 ≤ + + yi− ≤ yi−1 ≤ yi+ , x+ s = ys = ∞. Write PL (resp. PL ) to denote the class of all P ’s on L with (7.3.66) and (7.3.67) (resp. (7.3.68), (7.3.69)). Recall that for any measure P on IR2 , F P stands for the d.f. of P , and GP (x, y) = P ((−∞, x]×[y, ∞)). In the next two theorems we shall compute the bounds
F ∗ (x, y) = max F P (x, y)
(7.3.70)
G∗ (x, y) = max GP (x, y).
(7.3.71)
P ∈PL
and P ∈PL
For L = IR2 and σi = +∞, F ∗ is indeed the upper Hoeffding–Fr´echet bound H+ (x, y) = min {F µ (x), F ν (y)}
(F µ (x) := µ((−∞, x]) . (7.3.72)
On the other hand, G∗ (x, y) = min {F µ (x), Gν (y)} determines a measure with d.f.
(Gν (x) := ν([y, ∞)))
H− (x, y) = max (0, F µ (x) + F ν (y) − λ) ,
(7.3.73)
which is the lower Hoeffding–Fr´echet bound. Theorem 7.3.18 Suppose that F ∗ : L → IR is defined iteratively as follows: F ∗ (x, y) =
min
[µ((u, x]) + ν((v, y]) + F σ0 (u, v)]
(7.3.74)
min
[µ((u, x]) + ν((v, y]) + F σi (u, v)
(7.3.75)
+ −∞
for (x, y) ∈ L0 , and F ∗ (x, y) =
− +
x
+ + ∗ + + F ∗ (x+ i−1 , v ∧ yi−1 ) ∨ F (xi−1 ∧ u, yi−1 )
for (x, y) ∈ Li , i ≥ 1. Suppose also that F ∗ satisfies the inequalities + ∗ F σi (x.y) + F ∗ (x+ i−1 , y) ∨ F (x, yi−1 ) ≥ H− (x, y)
(7.3.76)
for (x, y) ∈ Li , i ∈ S. Then the equality (7.3.70) holds, and moreover, F ∗ is a d.f. of some P ∗ ∈ PL .
32
7. Relaxed or Additional Constraints
The proof is similar to that of Theorem 7.3.1. A slightly different approach (see Olkin and Rachev (1990)) can be used based on the following result of Topkis and Veinott (1973): The minimum of subadditive functions over a sublattice with respect to some variables is subadditive in the remaining variables; see also Hoffman and Veinott (1990) for the discrete version of Theorem 7.3.18. In the next theorem we evaluate the bound G∗ in (7.3.71). Recall that − + + − Li := (xi , xi ] × [−yi , −yi ), L := i∈S Li . Theorem 7.3.19 Suppose that G∗ : L → IR is defined iteratively by G∗ (x, y) = min [µ((u, x]) + ν([y, v)) + Gσ0 (u, v)] , for (x, y) ∈ L0 (7.3.77) u≤v v≥y
and G∗ (x, y)
=
min
− + x
v≥y≥−y i i
µ((u, x]) + ν([y, v)) + Gσi (u, v)
(7.3.78)
+ + + ∗ + G∗ (x− i−1 , v ∨ (−yi−1 )) ∨ G (u ∧ xi−1 , −yi−1 )
for (x, y) ∈ Li . Suppose also that G∗ satisfies the inequalities + ∗ Gσ (x, y) + G∗ (x+ i−1 , y) ∨ G (x, −yi−1 ) ≥ F µ (x) + Gν (y) − λ for any (x, y) ∈ Li , i ∈ S.
(7.3.79)
Then the inequality (7.3.71) holds, and G∗ defines P ∗ ∈ PL by G∗ (x, y) = P ∗ ((−∞, x] × [y, ∞)) for (x, y) ∈ L. Condition (7.3.79) is necessary for P ∗ ∈ PL . Next, we shall formulate a multivariate analogue of Theorem 7.3.18. (In general, Theorem 7.3.19 does not admit a multivariate extension by the well-known reason that the lower Hoeffding–Fr´echet bound for d.f.s on IRr (r > 2) with given one-dimensional projections does not generate a measure.) Let µ = µ(1) , . . . , µ(r) be a vector of r Borel nonnegative measures on IR with one and the same total mass λ < ∞. Suppose L is a complete Borel sublattice on IRr whose projection on every axis x(i) (i ∈ R := {1, . . . , r}) is the entire real line IR. Suppose also that L is a union of disjoint nonempty sublattices Li , i ∈ S (S = {0, . . . , s} or S = IN) and each Li is a rectangle in IRr ,
(j)− (j)+ + = ⊗rj=1 xi , xi Li = x− , i ∈ S, i , xi
7.3 Mass Transportation Problems with Capacity Constraints
33
− + − − + + + with x− 0 = −∞, xs = +∞, xi < xi , xi−1 ≤ xi ≤ xi−1 ≤ xi . (For representations of sublattices on a product of r lattices we refer to Veinott (1989, Section 4).)
Given σi , a nonnegative Borel measure on Li , and a measure P on L with vector of one-dimensional projections µ we write PLi ≺ σi to denote that the restriction of P on Li is less concordant that σi ; that is, for any x ∈ Li , − P ((x− i , x]) ≤ σi ((xi , x]),
i ∈ S.
Note that in contrast with the usual definition of concordance (Kruskal (1958), Tchen (1980), Stoyan (1983)) we allow σi to have total mass different from that of PLi . For example, assuming that σi vanishes on a subset of Li , we in fact impose additional restrictions on the support of P . Write PL (µ, σ) (σ := (σi )i∈S ) to denote the class of all P ’s on L possessing the properties (i)
P is a nonnegative Borel measure on L with vector of one-dimensional marginals µ.
(7.3.80)
(ii)
PLi ≺ σi
(7.3.81)
for all i ∈ S.
Define the mapping F ∗ : L → IR iteratively as follows: For x ∈ L0 , let F ∗ (x) =
⎧ ⎨
min
−∞
⎩
µ(j)
u(j) , x(j)
j∈R
+ F σ0 (u)
⎫ ⎬ ⎭
,
and for x ∈ Li , let F ∗ (x) =
min
⎧ ⎨
+ x− i
⎩
µ(j)
u(j) , x(j)
j∈R
⎫ ⎬ + F σi (u) + fLi (u) , ⎭
(1)− (2) (r) (1) (2) (r) where fLi (u) = max F ∗ xi , vi , . . . , vi , F ∗ vi , xi , . . . , vi ,
(1) (2) (r)− (j) (j)+ and vi := u(j) ∧ xi−1 , u ∈ Li . Denote by H− F ∗ vi , vi , . . . , xi the lower Hoeffding–Fr´echet bounds for multivariate d.f. with prescribed one-dimensional marginals: ⎛ H− (u) = max ⎝0,
j∈IR
µ(j)
−∞, u(j)
⎞ − (r − 1)λ⎠ .
34
7. Relaxed or Additional Constraints
Theorem 7.3.20 Suppose F ∗ defined above satisfies the inequality F σi (u) + fLi (u) ≥ H− (u)
for every u ∈ Li , i ∈ S.
Then max F P = F ∗ ,
P ∈PL
and F ∗ is a d.f. of some P ∗ ∈ PL . The proof is similar to that of Theorem 7.3.1. It requires a multivariate analogue of the greedy algorithm similar to that in Barnes and Hoffman (1985) and Olkin and Rachev (1990). A multivariate version of Hoffman’s (1963) northwest corner rule is given in Balinski and Rachev (1989), where the interplay between greedy algorithms and MKPs is emphasized. We are now ready to state the solution of MKP (7.3.65) with constraints (7.3.66) and (7.3.67). Theorem 7.3.21 Suppose that the assumptions of Theorem 7.3.18 hold, and c : L → IR is subadditive and left-continuous on L with c(x, y0 )µ( dx) + c(x0 , y)ν( dy) > −∞ (7.3.82) IR
IR
for some (x0 , y0 ) ∈ L. Then the minimum in (7.3.65) is attained at P ∗ ∈ PL defined in Theorem 7.3.18. The next theorem gives the solution of the following MKP: minimize c dP
(7.3.83)
L
under the constraints (7.3.68), (7.3.69), and assuming that c is superadditive on L; that is, (−c) is subadditive on L. Theorem 7.3.22 Suppose that the assumptions of Theorem 7.3.19 hold, and c : L → IR is superadditive and right-continuous with c(x, y0 )µ( dx) + c(x0 , y)ν( dy) < ∞. (7.3.84) IR
IR
Then the minimum in (7.3.84) is attained at P ∗ ∈ PL defined in Theorem 7.3.19.
7.3 Mass Transportation Problems with Capacity Constraints
35
The proof of the above two theorems is the same as that of Theorem 7.3.11. Example 7.3.23 (The discrete case) Suppose µ and ν are discrete measures with supports I = {1, . . . , m} and J = {1, . . . , n} and L = L0 +· · ·+Ls is a sublattice of I ×J with projections I and J, respectively. Then Theorem 7.3.18 corresponds to the main theorem in Hoffman and Veinott (1990). Example 7.3.24 (MKP with capacity constraints) Suppose that in Theorems 7.3.21 and 7.3.22, L = L = L0 = L0 = IR2 . Then we obtain the solution of problems I and II; see (7.3.6)–(7.3.10). In fact, Theorems 7.3.21 and 7.3.22 reduce to Theorems 7.3.11 and 7.3.17. We shall complete this section with another possible extension of Problems I and II. Consider a finite measure µ on (IR2 , B 2 ) and define for two probability measures P1 , P2 on (IR1 , B 1 ) and Ai × Bi ∈ B 1 ⊗ B 1 , i ∈ I, M µ (P1 , P2 ) (7.3.85) 1 = P ∈ M (P1 , P2 ); P (Ai × Bi ) ≤ µ(Ai × Bi ), i ∈ I , where M 1 (P1 , P2 ) denotes the set of all probability measures P on IR2 with marginals P1 , P2 . As in Theorem 7.3.1 (see (7.3.17)), we assume that µ(Ai × Bi ) ≥ (P1 (Ai ) + P2 (Bi ) − 1)+ .
(7.3.86)
Theorem 7.3.25 Under the assumption (7.3.86) let us define P ∗ (A × B) = inf {µ (A1 × Bi ) + (P1 (A) − P1 (Ai )) + (P2 (B) − P2 (Bi ))} Ai ⊂A Bi ⊂B
∧ min(P1 (A), P2 (B)),
A, B ∈ B 1 .
(7.3.87)
Then hµ (A × B) := sup{P (A × B); P ∈ M µ (P1 , P2 )} ≤ P ∗ (A × B). (7.3.88) If P ∗ determines a measure, then hµ (A × B) = P ∗ (A × B),
and P ∗ is a solution of (7.3.87).
(7.3.89)
Remark 7.3.26 The proof of Theorem 7.3.25 is similar to that of Theorem 7.3.1. In contrast to Theorem 7.3.1 it allows us to consider “local” bounds in the transportation problem. Observe that in the finite discrete case, bounds of the type xij ≤ µij
for some (i, j)
(7.3.90)
are of this “local” type. As far as we know, in the literature there is no result concerning the solution of (7.3.90) with local bounds. See the next section for a possible approach to the problem.
36
7. Relaxed or Additional Constraints
7.4 Local Bounds for the Transportation Plans While in the preceding three sections the additional constraints were formulated mainly in terms of the d.f.s, we now consider “local” constraints formulated in terms of the probability densities. These local-type restrictions are stronger than those in the previous section, and generally they are much more difficult to handle; see Remark 7.3.26. Our first result deals with a transportation problem with “indicator” cost function ⎧ ⎨ 1 if x = y, (7.4.1) c(x, y) = I(x = y) = ⎩ 0 if x = y; i.e., the cost of transportation is one for any unit mass that has to be moved, and zero otherwise. The cost function c does not satisfy a Mongetype condition. We formulate this transportation problem on a general measure space (S, U) assuming only that {(x, y); x = y} ∈ U ⊗ U.
(7.4.2)
Let Mf (S), Mf (S×S) be the set of all finite measures on (S, U), respectively (S ×S, U ⊗U), and for µ ∈ Mf (S ×S), let πi µ, i = 1, 2, denote the marginals of µ. (This transportation problem leads to an extension of Dobrushin’s result on optimal couplings.) Theorem 7.4.1 (Optimal couplings with local restrictions) Assume that (7.4.2) holds and let µ1 , µ2 ∈ Mf (S) with µ1 (S) ≤ µ2 (S). Then (a) inf{µ ({(x, y); x = y}) ; µ ∈ Mf (S × S), π1 µ ≥ µ1 , π2 µ ≤ µ2 } (7.4.3) = λ− (S) := sup (µ1 (C) − µ2 (C)). C∈U
(b) Moreover, the infimum in (7.4.3) is attained at µ∗ (A × B) = γ(A ∩ B) +
λ− (A)λ+ (B) , λ+ (S)
(7.4.4)
where λ+ (A) = supC⊂A (µ2 − µ1 )(C), λ− (A) = supC⊂A (µ1 − µ2 )(C) and γ(A) = µ2 (A) − λ+ (A) = µ1 (A) − λ− (A). Proof: For any µ ∈ Mf (S × S), µ(x = y) ≥ sup µ(C × (S \ C)) = sup{µ(C × S) − µ(C × C)} C
C
≥ sup{µ(C × S) − µ(S × C)} ≥ sup{µ1 (C) − µ2 (C)} C
=
C
sup{λ− (C) − λ+ (C)} = λ− (supp λ− ) = λ− (S). C
7.4 Local Bounds for the Transportation Plans
37
On the other hand, µ∗ (A × S) = γ(A) + λ− (A)λ+ (S)/λ+ (S) = µ1 (A) and µ∗ (S × B) = γ(B) + λ− (S)λ+ (B)/λ+ (S) ≤ γ(B) + λ+ (B) = µ2 (B). Finally, we have
∗
µ (x = y) = =
I(x = y)(γ( dx, dy) + λ− ( dx)λ+ ( dy)/λ+ (S)) I(x = y) λ− ( dx)λ+ ( dy)/λ+ (S)
= λ− (S)λ+ (S)/λ+ (S) = λ− (S). 2 Consider next some finite measures µ1 , µ2 on IR with densities h1 , h2 with respect to a dominating measure µ on IR1 . Define Pµµ12 := {P ∈ M 1 (IR2 , B 2 ); π1 P ≥ µ1 , π2 P ≤ µ2 }.
(7.4.5)
Any P ∈ Pµµ12 has marginals P1 = π1 P, P2 = π2 P with densities f1 ≥ h1 and f2 ≤ h2 with respect to µ. We assume that 1 = µ1 (IR1 ) ≤ µ2 (IR1 ); i.e., µ1 is a probability measure, and so f1 = h1 . ⎧ ⎪ ⎨ Theorem 7.4.2 Let z0 = inf y; ⎪ ⎩
f2∗ (y) =
⎧ ⎪ ⎪ h2 (y) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 1 − h2 (u) du ⎪ ⎪ ⎪ ⎨ (z0 ,∞)
⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 0 ⎪ ⎪ ⎩
(y,∞)
⎫ ⎪ ⎬ h2 dµ ≤ 1 , ⎪ ⎭
if y > z0 ,
if y = z0 and µ{z0 } > 0,
(7.4.6)
µ(z0 ) otherwise,
and let P ∗ be the corresponding probability measure with µ-density f2∗ . Then the following characterizations of the optimal coupling hold: (a) sup{F P (x, y); P ∈ Pµµ12 } = 1−max(Fµ1 (x), FP ∗ (y)), for all x, y where F P (x, y) = P ([x, ∞) × [y, ∞)) is the survival function.
38
7. Relaxed or Additional Constraints
(b) The sup in (a) is attained for the distribution F ∗ = FX ∗ ,Y ∗ , where (U ), Y ∗ = FP−1 X ∗ = Fµ−1 ∗ (U ). 1 (c) If c is a cost function that is componentwise antitone and satisfies the Monge condition (cf. (7.1.3), (7.1.4)), then inf
c(x, y) dFP (x, y); P ∈ Pµµ12
=
c(x, y) dF ∗ (x, y).
(7.4.7)
Proof: (a), (b) For P ∈ Pµµ12 with marginals Fµ1 , G2 , we know that F P (x, y) ≤ P Fµ−1 (U ) ≥ x, G−1 2 (U ) ≥ y = P (U ≥ max(Fµ1 (x), G2 (y))) = 1 1 − max(Fµ1 (x), G2 (y)). By the definition of P ∗ , FP ∗ (y) ≤ G2 (y) for all y, and therefore, F P (x, y) ≤ 1 − max(Fµ1 (x), FP ∗ (y)). (c) Applying (a), (b), and Theorem 3.1.2 we obtain (7.4.7). The conditions on the cost function c were studied by R¨ uschendorf (1980). In that terminology (−c) is a -monotone function. Applying the results in R¨ uschendorf (1980), it is easy to check that (c) follows from (a), (b). 2 The “antitone” assumption in (c) of Theorem 7.4.2 does not have a transparent interpretation in terms of cost functions. Moreover, under some additional assumptions on the bounding measures we can construct solutions for more “natural” cost functions. Again, let µ1 have densities hi with respect to µ, 1 = µ1 (IR1 ) ≤ µ2 (IR1 ). Theorem 7.4.3 Assume that for some y0 ∈ IR1 , h1 (u) ≤ h2 (u) for u < y0
and
h1 (u) ≥ h2 (u) for u ≥ y0 .
(7.4.8)
Define
x0
⎧ ⎪ ⎨ = inf y; ⎪ ⎩
⎫ ⎪ ⎬
h1 (u) dµ(u) ≥
(y,∞)
h2 (u) dµ(u) (y,∞)
⎪ ⎭
,
and let ⎧ h2 (u) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ h1 (u) dµ(u) − h2 (u) dµ(u) ⎪ ⎪ ⎪ ⎨ f2 (u) :=
[x0 ,∞)
⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ h1 (u)
(x0 ,∞)
µ(x0 )
if u > x0 ,
if u = x0 and µ{x0 } > 0, if u < x0 .
(7.4.9)
7.4 Local Bounds for the Transportation Plans
39
Then for any cost function c satisfying the Monge condition (7.1.3) and the unimodality condition (7.1.27) we have inf
c(x, y) dfP (x, y); P ∈
Pµµ12
1 (u), F2−1 (u) du, = c Fµ−1 1
(7.4.10)
0
where F2 is the d.f. of the measure with density f2 with respect to µ. The op(U ), Y ∗ = F2−1 (U ). timal distribution is determined by the r.v.s X ∗ = Fµ−1 1 Proof: Invoking the Monge condition, for any P ∈ Pµµ12 with marginals 1 Fµ1 , G2 , we have c(x, y) dFP (x, y) ≥ c Fµ−1 (u), G−1 2 (u) du. 1 0
By the definition of F2 , G2 (y) F2 (y)
≥ F2 (y) ≥ Fµ1 (y) =
for all y ≥ x0 ,
and
(7.4.11)
for all y ≤ x0 ;
Fµ1 (y)
in fact, (7.4.11) implies that Fµ−1 (u) ≥ F2−1 (u) ≥ G−1 2 (u) for u > F2 (x0 ) 1 −1 −1 and F2 (u) = Fµ1 (u) for u ≤ F2 (x0 ). Our assumptions on c imply that −1 −1 , G−1 2 c Fµ−1 2 (u) ≥ c Fµ1 (u), F2 (u) for all u. 1 Remark 7.4.4 It is not difficult to extend the solution of Theorem 7.4.3 to the case µ1 (IR1 ) < 1 and to the case f1 ≥ h1 , f2 ≤ h2 for the densities (here, f1 and f2 are the marginal densities of an admissible plan P ), if we still keep the assumption (7.4.8). To see this, choose x0 as in (7.4.9), and define ⎧ h2 (x) ⎪ ⎪ ⎪ ⎪ ⎪ 1− h2 (x) dµ(x) ⎪ ⎪ ⎪ ⎨ f2 (x) =
where z0
y0
(z0 ,∞)
⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
0 ⎧ ⎪ ⎨ = inf x; ⎪ ⎩ ⎧ ⎪ ⎨ = inf y; ⎪ ⎩
if x > z0 ,
if x = z0 and µ(z0 ) > 0,
µ(z0 ) otherwise, ⎫ ⎪ ⎬ h2 (x) dµ(x) ≤ 1 . Define next ⎪ ⎭
(x,∞)
h2 (x) dµ(x) ≤ (y,∞)
⎫ ⎪ ⎬
h1 (x) dµ(x) (y,∞)
⎪ ⎭
(7.4.12)
40
7. Relaxed or Additional Constraints
and ⎧ h1 (x) ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ f2 (x) ⎪ ⎪ ⎪ ⎨ f1 (x) = (h2 (x) − h1 (x)) dµ(x) ⎪ ⎪ ⎪ ⎪ ⎪ [y0 ,∞) ⎪ ⎪ ⎪ ⎪ µ(y0 ) ⎩
if x > y0 , if x < y0 , (7.4.13) if µ(y0 ) > 0.
Then for a cost function c defined as in Theorem 7.4.3, we have inf c(x, y) dFP (x, y); π1 P ≥ µ1 , π2 P ≤ µ2
(7.4.14)
1 = c F1−1 (u), F2−1 (u) du, 0
where Fi have densities fi with respect to µ, i = 1, 2. Let us return to the comment we made in Remark 7.3.26. Consider transportation problems with local upper bounds on the transportation plans xij ≤ µij in the discrete case, while P ≤ µ for some finite measure µ in the general case. The following framework allows us to handle quite general transportation problems. On a measurable space (X, B), let Bi ⊂ B, be sub-σ-algebras, 2 ≤ i ≤ n, Pi ∈ M 1 (X, Bi ). Further, let µ be a finite measure on (X, B), and define Mµ := P ∈ M 1 (X, B); P/Bi = Pi , 1 ≤ i ≤ n, P ≤ µ . (7.4.15) Assume that Mµ = Ø and define the set of generalized transportation plans with local upper bound µ as follows: Uµ (ϕ) := inf U (ϕ0 ) + h dµ; h ≥ 0, ϕ0 + h ≥ ϕ , (7.4.16) where U (ϕ0 ) := inf
n i=1
1
fi dPi ; fi ∈ L (Bi , Pi ), ϕ0 ≤
n
fi
.
i=1
We view U as the dual operator for the “pure” transportation problem, and typically, sup ϕ0 dP ; P ∈ M (P1 , . . . , Pn ) = U (ϕ0 ) (7.4.17)
7.4 Local Bounds for the Transportation Plans
41
will hold (cf. Chapter 2). The duality principle allows us to infer the corresponding “minimization” problem. Similarly, Uµ is the dual operator for the local majorized transportation problem. A linear operator S is majorized by Uµ , S ≤ Uµ if and only if S ≥ 0, S/Bi = Pi , 1 ≤ i ≤ n, and S ≤ µ. (7.4.18) Therefore, the approach developed in Chapter 2 yields the duality theorem ϕ dP ; P ∈ Mµ (7.4.19) =: Mµ (ϕ) Uµ (ϕ) = sup for any upper semicontinuous or uniformly approximable integrable functions ϕ in the case of a compactly approximable measure space (X, Bi , Pi ) with countable topological basis. In some sense (7.4.19) gives the duality result for the general case of order restrictions as considered, for example, in Sections 3.5 and 5.5. In particular, we obtain upper bounds n Mµ (ϕ) ≤ f dP i i for any admissible system of functions (fi ) with i=1 n ϕ ≤ i=1 fi . We next consider the question of more explicit evaluations of the dual operator Uµ for the case ϕ = 1B , B ∈ B; or equivalently, we wish to establish sharp upper Fr´echet bounds in the class Mµ . Define Mµ (B) := Mµ (1B ), and assume the duality (7.4.19) for φ = 1B . Theorem 7.4.5 Mµ (B) =
sup P ∈M (P1 ,...,Pn )
P ∧ µ(B),
(7.4.20)
where P ∧ µ is the infimum in the lattice of measures. Proof: From (7.4.19), Mµ (B) = inf{µ(h) + U (ϕ); h ≥ 0, ϕ + h ≥ 1B } = inf{µ(h) + U (1B − h); 0 ≤ h ≤ 1B }.
(7.4.21)
To see the second equality in (7.4.4), take ϕ = (1B − h)+ . Thus, 0 ≤ ϕ, and it is possible to assume that h ≤ 1B . Next, we make use of the “integration” approach in Strassen (1965), Mµ (B) =
=
inf
sup
{µ(h) + P (1B − h)}
⎧ 1 ⎨
1
0≤h≤1B P ∈M (P1 ,...,Pn )
inf
sup
0≤h≤1B P
⎩
µ(h > t) dt + 0
0
(7.4.22)
⎫ ⎬ P (1B − h ≥ 1 − t) dt . ⎭
42
7. Relaxed or Additional Constraints
With Ct := {h > t} ⊂ B we see that {x; h(x) ≤ 1B (x) − 1 + t} = {x ∈ B; h(x) ≤ t} = B \ Ct . Therefore, 1 Mµ (B) =
inf
(µ(Ct ) + P (B \ Ct )) dt
sup
0≤h≤1B P
0
≥ sup inf {µ(C) + P (B \ C)} = sup µ ∧ P (B). P
C⊂B
P
On the other hand, Mµ (B) = sup{P (B); P ∈ M (P1 , . . . , Pn ), P ≤ µ} = sup{P ∧ µ(B); P ∈ M (P1 , . . . , Pn ), P ≤ µ} ≤
sup P ∈M (P1 ,...,Pn )
P ∧ µ(B). 2
Theorem 7.4.5 allows us to reduce the problem of the majorized Fr´echet boundsto a problem of “usual” Fr´echet bounds, but for a more complicated functional. It remains an open problem to determine more explicit formulas for Mµ (B) in the general case.
7.5 Closeness of Measure with Joint Marginals on a Finite Number of Directions In this section we follow the work of Kakosjan and Klebanov (1984), Khalfin and Klebanov (1990), Klebanov and Rachev (1995a, 1995b, 1995c), on the application of marginal problems to computer and diffraction tomography. Here, estimates of the closeness between probability measures defined on IRn that have the same marginals on a finite number of arbitrary directions will be provided. The estimates show that the probability laws get closer in a certain metric when the number of coinciding marginals increases. The results offer a solution to the computer tomography paradox stated in Gutman, Kemperman, Reeds, and Shepp (1991). We start with some historical remarks and with the statement of the problem. Let Q1 and Q2 be a pair of probabilities on IR, i.e., probability measures defined on the Borel σ-field of IR. Lorentz (1949) studied conditions for the existence of a probability density function g(·) on IR2 taking
7.5 Closeness of Measure on a Finite Number of Directions
43
only two values, 0 or 1, and having Q1 and Q2 as marginals. In his 1961 paper Kellerer generalized this result and gave necessary and sufficient conditions for the existence of a density f (·) on IR2 that satisfies the inequalities 0 ≤ f (·) ≤ 1 and has Q1 and Q2 as marginals (see also Strassen (1965) and Jacobs (1987)). Fishburn et al. (1990) were able to show that Kellerer’s and Lorentz’s conditions are equivalent; i.e., for any density 0 ≤ f ≤ 1, on IR2 there exists a density taking the values 0 and 1 only that has the same marginals. In general, similar results hold for probability densities on IRm , m ≥ 2, when the (m − 1)-dimensional marginals are prescribed. Gutmann et al. (1991) show that for any probability density 0 ≤ f ≤ 1 on IRm and for any finite number of directions, there exists a probability density taking the values 0, 1 only that has the same marginals in the chosen directions. It follows that densities having the same marginals in a finite number of arbitrary directions may differ considerably in the uniform metric between densities, which is indeed a very strong metric; recall that convergence in the uniform metric implies convergence in total variation. The goal in this section is to show that under moment-type conditions, measures having a “large” number of coinciding marginals are close to each other in the weakmetrics.(1) The method is based on techniques used in the classical moment problem. On the other hand, most of our results will make use of relationships between different probability metrics, analyzed in the monograph by Kakosjan, Klebanov, and Rachev (1988), referred to below as KKR (1988). The key idea in showing that measures with a large number of common marginals are close to each other in the weak metrics is best understood by comparing three results. The first is the theorem of Gutman et al. (1991) mentioned above. The second (see Karlin and Studden (1966, p. 265)) states that if a finite number of moments µ1 , . . . , µn of a function f , 0 ≤ f ≤ 1, are given, then there exists a function g that takes the values 0 or 1 only and possesses the moments µ1 , . . . , µn . Finally, the third result (see KKR (1988, pp. 170–197)) gives estimates of the closeness in terms of a weak metric (the so-called λ-metric) on IR for measures having a finite number of common moments. Of course, since the condition of common marginals seems to be more restrictive than the condition of equal moments, one should be able to construct a similar estimate expressed in terms of the common marginals only. Furthermore, the technique should be similar to that used here. For simplicity, let us consider the 2-dimensional case. Let θ1 , . . . , θn be n unit vectors in the plane and P1 , P2 be two probabilities on IR2 having the same marginals in the directions θ1 , . . . , θn . To estimate the distance (1) Here weak metric stands for a metric metrizing the weak convergence in the space of probability measures on a Euclidean space.
44
7. Relaxed or Additional Constraints
between P1 and P2 , various weak metrics can be used; however, it seems that the λ-metric is the most convenient for this purpose. This metric is defined as follows (see, for example, Zolotarev (1986)): Let ϕi (t) = ei(t,x) Pi ( dx), i = 1, 2, IR2
be the characteristic function of Pi . Then define the λ-distance between P1 and P2 as 1 λ(P1 , P2 ) = min max max |ϕ1 (t) − ϕ2 (t)|, ; (7.5.1) T >0 T t≤T here (·, ·) is the inner product and · is the Euclidean norm. Clearly, λ metrizes the weak convergence. Our first result concerns the important case where one of the probability measures considered has compact support. Lemma 7.5.1 Let θ1 , . . . , θn be n ≥ 2 unit vectors in IR2 , no two of which are collinear. Let the support of the probability P1 be a subset of the unit disk, and let the probability P2 have the same marginals as P1 in the directions θ1 , . . . , θn . Set(2) " # n−1 s = 2 . (7.5.2) 2 Then 1 2 s+1 λ(P1 , P2 ) ≤ . s!
(7.5.3)
Remark 7.5.2 We can replace the right-hand side of (7.5.3) by C/s, where 1 2 s+1 C is a constant; note that as s → ∞, 2! ∼ e/s. The difference 1 2 s+1 e − s is plotted in figures 7.1 and 7.2. s! Proof of Lemma 7.5.1: The λ-metric is invariant under rotations of the coordinate system, so without loss of generality we assume that (a) the directions θj (j = 1, . . . , n) are not parallel to the axis; (b) there exists at least one pair of directions, say θj1 and θj2 , such that θj1 = (a, b), θj2 = (a, −b), where a = 0, b = 0; i.e., the vectors θj1 and θj2 are symmetric about the horizontal axis. (2) Here
and in what follows [r] denotes the integer part of the number r.
45
-0.6
-0.4
-0.2
0.0
7.5 Closeness of Measure on a Finite Number of Directions
-1.2
-1.0
-0.8
FIGURE 7.1. Plot of the difference (2/s!)1/(s+1) − e/s for s = 1, . . . , 100
20
40
60
80
100
0.0
0.001
0
-0.002
-0.001
FIGURE 7.2. Plot of the difference (2/s!)1/(s+1) − e/s for s = 10, . . . , 100
20
40
60
80
100
The law P1 has bounded support, and so, since the marginals on the directions θ1 , . . . , θn of P1 and P2 coincide, then for all j = 1, . . . , n,
(x, θj )k P1 ( dx) =
IR2
(x, θj )k P2 ( dx).
(7.5.4)
IR2
To see that P2 has moments of any order, consider (7.5.4) with j = j1 , j = j2 , and x = (x1 , x2 ). Then (x1 a ± x2 b)k (P1 − P2 )( dx) = 0, IR2
(x1 a + x2 b)k + (x1 a − x2 b)k (P1 − P2 )( dx) = 0,
(7.5.5)
IR2
and all integrals are finite. If k is even, then (ax1 + bx2 )k + (ax1 − bx2 )k ≥ ak xk1 + bk xk2 , and thus (7.5.5) implies the existence of all moments of P2 of even order.
46
7. Relaxed or Additional Constraints
The next step is to show that all moments of P1 and P2 of order ≤ n − 1 agree. Set µr,t (P ) = xr1 xt2 P ( dx), = 1, 2. IR2
Then setting θj = (uj , vj ) in (7.5.4) yields k k =0
uj vjk− [µ,k− (P1 ) − µ,k− (P2 )] = 0,
j = 1, . . . , n; k ≥ 0. Now, setting zj = vj /uj in the last equation leads to k k =0
zjk− [µ,k− (P1 ) − µ,k− (P2 )] = 0,
(7.5.6)
j = 1, . . . , n. Since no two of the directions θ1 , . . . , θn are collinear, the points z1 , . . . , z2 are distinct. Hence from (7.5.6) we find that the following polynomial of degree k of the variable z, k k l=0
z k− [µ,k− (P1 ) − µ,k− (P2 )] ,
(7.5.7)
has n distinct roots z1 , . . . , zn . If n ≥ k + 1, then this is possible only if all coefficients of (7.5.7) are equal to zero, that is, µ,k− (P1 ) = µ,k− (P2 ), = 0, . . . , k; k = 0, . . . , n − 1. So, for any unit vector t, and k = 0, 1, . . . , n − 1, k (t, x) P1 ( dx) = (t, x)k P2 ( dx). (7.5.8) IR2
IR2 (t)
Denote by P the marginal of P ( = 1, 2) in the direction t, and by ϕ (τ ; t)(τ ∈ IR) its characteristic function. By assumption, the support of (t)
P1
is in the segment [−1, 1]. Then (7.5.8) is equivalent to (k)
(k)
ϕ1 (τ ; t)|τ =0 = ϕ2 (τ ; t)|τ =0 ,
k = 0, . . . , n − 1,
(7.5.9)
(k)
where ϕ (τ ; t) is the kth derivative of ϕ (τ ; t) with respect to τ ( = 1, 2). A Taylor expansion now gives ϕ1 (τ ; t) − ϕ2 (τ ; t) =
(7.5.10)
s−1 (k) (s) ϕ (0; t) − ϕ (0; t) 1
k=0
2
k!
(k)
τk +
(s)
τ ; t) − ϕ2 ( τ ; t) s ϕ1 ( τ s!
7.5 Closeness of Measure on a Finite Number of Directions
47
for some τ ∈ (0, τ ). From (7.5.9), the first sum on the right-hand side of (7.5.10) is equal to zero. Since s is an even number, (s) |ϕ ( τ ; t)|
≤
z
s
(t) P ( dx)
1 (t) = z s P1 ( dz) ≤ 1,
= 1, 2.
−1
IR
Thus for all τ ∈ IR, |ϕ1 (τ ; t) − ϕ2 (τ ; t)| ≤ 2
τs . s!
1
s+1 ;(3) then Choose T = ( s! 2)
1 2 s+1 sup |ϕ1 (τ ; t) − ϕ2 (τ ; t)| ≤ . s! |τ |≤T 2 Corollary 7.5.3 Let θ1 , . . . , θn , n ≥ 2, be directions in IR2 no two of which are collinear. Suppose that the marginals of the probabilities P1 and P2 with respect to the directions θ1 , . . . , θn have moments up to the even order k ≤ n − 1. Then the marginals of P1 and P2 with respect to any direction t have the same moments up to order k. Corollary 7.5.4 Lemma 7.5.1 still holds if we replace the assumption that P1 and P2 have coinciding marginals with respect to the directions θj (j = 1, . . . , n) with the assumption that these marginals have the same moments up to order n − 1. To prove our main result we must relax the condition that the support of P1 is compact, assuming only the existence of all moments together with Carleman’s conditions for the definiteness of the moments problem. Set µk = sup (x, θ)k P1 ( dx), k = 0, 1, . . . , θ∈S 1 IR2
where S 1 is the unit circle, and let (s−1)/2
βs =
−
1
µ2j2j ,
j=1
where the number s is determined in Lemma 7.5.1; see (7.5.2). (3) This
s
choice of T is optimal, since 2 Ts! =
1 T
; see the definition (7.5.1) of λ-metric.
48
7. Relaxed or Additional Constraints
Theorem 7.5.5 Let θ1 , . . . , θn be n ≥ 2 directions in IR2 , no two of which are collinear. Suppose that the measure P1 has moments of any order. Suppose also that the marginals of the measures P1 and P2 in the directions θ1 , . . . , θn have the same moments up to order n − 1. Then there exists an absolute constant C such that(4) − 14
λ(P1 , P2 ) ≤ Cβs
(µ0 +
√
1/4
µ2 )
.
Proof: Let t be an arbitrary vector of the unit circle. From Corollary 7.5.3 (t) (t) we have that the marginals P1 and P2 have the same moments up to order s. From KKR (1988, p. 180) and Klebanov and Mkrtchian (1980), it follows that ⎞ ⎛ −1/4 (s−1)/2
1/4 $ (t) (t) −1/(2j) ⎠ µ0 (t) + µ2 (t) , ≤ C⎝ µ2j (t) λ P1 , P2 j=1
where µk (t) =
∞ −∞
(t)
uk Pi ( du), k = 0, . . . , s, i = 1, 2. The theorem now
follows from the obvious inequality µ2j (t) ≤ µ2j
(j = 0, 1, . . . , s/2).
2
Let us now consider the situation where the marginals of P1 and P2 in the directions θ1 , . . . , θn are not the same but are close in the metric λ. Theorem 7.5.6 Let θ1 , . . . , θn , n ≥ 2, be directions in IR2 , no two of which are collinear. Suppose that the supports of the measures P1 and P2 are in the unit disk, and that P1 and P2 have ε-coinciding marginals with respect to the directions θj (j = 1, . . . , n); i.e.,
(θ ) (θ ) (7.5.11) λ P1 j , P2 j := min max max |ϕ1 (τ ; θj ) − ϕ2 (τ ; θj )|, 1/T ≤ ε. T >0
|τ |≤T
Then there exists a constant C depending on the directions θj (j = 1, . . . , n) such that for sufficiently small ε > 0, we have 1 + 1/s , (7.5.12) λ(P1 , P2 ) ≤ C 1/ ln ε . where s = 2 n−1 2 (4) That
is, C is independent of s, P1 , and P2 .
7.5 Closeness of Measure on a Finite Number of Directions
49
Proof: Set ψj (τ ) := ϕ1 (τ ; θj ) − ϕ2 (τ ; θj ), j = 1, . . . , n. For 0 < ε ≤ 1 we have sup|τ |≤1 |ψj (τ )| ≤ ε, cf. (7.5.11). Since the supports of the measures (θj )
(θj )
are subsets of [−1, 1], for any even number k ≥ 2 we have
(k) (k) |ϕ1 (0; θj )| + |ϕ2 (0; θj )| 2 (k) sup |ψj (τ )| ≤ ≤ . (7.5.13) k k! |τ |≤1 P1
and P2
Now we apply Corollary 1.5.1 in KKR (1988), which states that there exist constants Ck such that ()
sup |ϕj (τ )|
|τ |≤1
(7.5.14)
≤ Ck
k− k sup |ϕj (τ )|
|τ |≤1
sup |τ |≤1
(k) |ϕj (τ )|
k1 ,
= 0, 1, . . . , k.
Choosing k ≥ 2s, ≤ s, and applying (7.5.13), we obtain ()
sup |ϕj (τ )| ≤ Cs ε1/2 ,
= 0, 1, . . . , s; j = 1, . . . , n,
|τ |≤1
where Cs is a new constant depending on s only. In particular, ()
()
|ϕ1 (0; θj ) − ϕ2 (0; θj )| ≤ Cs ε1/2 ,
= 0, 1, . . . , s; j = 1, . . . , n,
or equivalently, (x, θj )k (P1 − P2 )( dx) ≤ Cs ε1/2 , 2
(7.5.15)
IR
k = 0, 1, . . . , s; j = 1, . . . , n. Following the notation in Lemma 7.5.1, we can rewrite (7.5.15) in the form for k = 0, . . . , s and j = 1, . . . , n, k k k− uj vj [µ,k− (P1 ) − µ,k− (P2 )] ≤ Cs ε1/2 .
=0
Thus, setting Rkj =
k k =0
zjk− [µ,k− (P1 ) − µ,k− (P2 )] ,
(7.5.16)
k = 2, . . . , s; j = 1, . . . , n; zj = vj /uj , we obtain 1/2 , |Rkj | ≤ Cε
(7.5.17)
depends on the directions θ1 , . . . , θn only. For any fixed k (k = where C 2, . . . , s) consider
50
7. Relaxed or Additional Constraints (k)
(i) the matrix Ak with elements aj =
k −1
k−(−1)
zj
, , j = 1, . . . , k+1;
(ii) the vector Bk with elements (k)
b
= µ−1,k−+1 (P1 ) − µ−1,k−+1 (P1 ),
= 1, . . . , k + 1;
(iii) the vector Dk with elements dj = Rkj , j = 1, . . . , k + 1. Then (7.5.16) has the form Ak Bk = Dk (k = 1, . . . , s − 1), while (7.5.17) 1/2 . The matrices Ak are invertible, and so yields Dk ≤ Cε 1/2 Bk ≤ A−1 , k Dk ≤ Cε
(7.5.18)
where the constant C depends on the directions θ1 , . . . , θn only. Inequality (7.5.18) shows that the first s − 1 moments of the two-dimensional distributions are close when ε > 0 is sufficiently small. Such an evaluation of closeness holds for the first s − 1 moments of the marginals corresponding to an arbitrary direction t; i.e., (x, t)k (P1 − P2 )( dx) ≤ Cε1/2 , 2 IR
k = 0, . . . , s − 1. Now we have
s−1 (j) ϕ1 (0; t) − ϕ2(j) (0; t) j 2 s |ϕ1 (τ ; t) − ϕ2 (τ ; t) ≤ τ + |τ | j! j=0 s! ≤
s−1 Cε1/2 j=0
j!
|τ |j +
2|τ |s 2|τ |s ≤ Cε1/2 e|τ | + . s! s!
% 1 & 1/2 s! s−1 Choose T = min ln 1 + Cε1/2 , 2 . Since t is arbitrary on the unit circle, we obtain 1 s−1 2 1/2 1/4 1/2 λ(P1 , P2 ) ≤ max C ε + Cε + , 1/T s! ≤ C [1/ ln(1/ε) + 1/s] , which proves the theorem.
2
Remark 7.5.7 The statement in Theorem 7.5.6 still holds if instead of the ε-coincidence of the marginals as in (7.5.11), we require the ε-coincidence of the moments up to order s of these marginals.
7.5 Closeness of Measure on a Finite Number of Directions
51
Theorems 7.5.5 and 7.5.6 can be generalized for probability measures defined on IRm . However, we cannot choose the directions θ1 , . . . , θn in an arbitrary way. Furthermore, to obtain the same order of precision in IRm , m > 2, corresponding to the n directions in IR2 , we need nm−1 directions. The results can be obtained by induction on the dimension m. We define next the set of directions we are going to use. Choose n ≥ 2 distinct real numbers u1 , . . . , un , all different from zero, and first construct the set of n two-dimensional vectors (1, u1 ), (1, u2 ), . . . , (1, un ). Then construct n2 three-dimensional vectors (1, uj1 , uj2 ), j1 , j2 = 1, . . . , n. Repeating this process, by the last step we have constructed a set of m-dimensional vectors (1, uj1 , uj2 , . . . , ujm−1 ),
j = 1, . . . , n; = 1, . . . , m − 1.
(7.5.19)
Denote these m-dimensional vectors by θ1 , . . . θN , where N = nm−1 . These inductive arguments lead to the following extensions of Theorems 7.5.5 and 7.5.6. Theorem 7.5.8 The results in Theorems 7.5.5 and 7.5.6 still hold if we consider the measures P1 and P2 in IRm , and we choose as directions the N = nm−1 vectors in (7.5.19). Further, s = 2[(n − 1)/2]. To prove this, it is sufficient to note that instead of the m-dimensional vectors, we can first consider a pair of one-dimensional probabilities; the first component is the distribution of the inner product of the projections of the vector x and the vector θj upon the (m − 1)-dimensional subspace, while the second is the law of the last coordinate of the vector x. This allows us to decrease the dimensionality by one. To complete the proof it is sufficient to apply inductive arguments. The bounds of the deviation between probability measures with coinciding marginals offers a solution to the computer tomography paradox as stated in Gutman et al. (1991): “It implies that for any human object and corresponding projection data there exist many different reconstructions, in particular, a reconstruction consisting only of bone and air (density 1 or 0), but still having the same projection data as the original object. Related nonuniqueness results are familar in tomography and are usually ignored because CT machines seem to produce useful images. It is likely that the ‘explanation’ of this apparent paradox is that point reconstruction in tomography is impossible.” Lemma 7.5.1 shows that although the densities of the probability measures P1 and P2 (given that such densities exist) can be quite distant from each other for any “large” number of coinciding marginals, yet the measures P1 and P2 themselves are close in the weak metric λ. Khalfin and Klebanov (1990) have analyzed this paradox and obtained some bounds for the closeness of probability measures with coinciding
52
7. Relaxed or Additional Constraints
marginals for specially chosen directions for the case of uniform distance between the smoothed densities of these measures. In tomography the observations are, in fact, integrals of body densities along some straight lines. Using quadratic formulas enables us to evaluate the moments of a set of marginals; these in turn make it possible to apply the results in this section (see Remark 7.5.7) to evaluate the precision of the reconstruction for densities. The classical theory of moments makes it possible to give numerical methods for reconstructing the probability measures using the moments (see, for example, Ahiezer (1961)).
7.6 Moment Problems with Applications to Characterization of Stochastic Processes, Queueing Theory, and Rounding Problems The theory of moment has a long history, which originated in the pioneering works of Shohat and Tamarkin (1943), Hoeffding (1955), Rogosinsky (1958), Ahiezer and Krein (1962), Karlin and Studden (1966), Kemperman (1968). It was also in the 1950s and ’60s that moment theory became a separate mathematical discipline. Currently, it is appropriate to talk about the moment problems as beeing a whole range of problems with applications to many mathematical theories. We refer to the monograph of Annastassiou (1993) for a recent survey on the developments in the theory of moments. In this section we present some applications of moment theory to probabilistic-statistical models. The results presented here are due to Anastassiou and Rachev (1992). For the proofs of the theorems, which are only stated but not proved in this section, we refer to Anastassiou (1993). First we shall state results on the following five moment problems: Moment problem 1: Find (7.6.1) sup |x − y|p µ( dx, dy), S ⊂ IR2 , p ≥ 1, µ
S
and
|x − y|p µ( dx, dy),
inf µ
S ⊂ IR2 ,
(7.6.2)
S
where the supremum (resp. infimum) is taken over the set of all probability measures µ with support in S having fixed marginal moments xi µ( dx, dy) = αi , y i µ( dx, dy) = βi , i = 1, 2, . . . , n. (7.6.3) S
S
7.6 Moment Problems of Stochastic Processes and Rounding Problems
53
Remark 7.6.1 Problem (7.6.2) with fixed marginal distributions µ(· × S) = µ1 (·),
µ(S × ·) = µ2 (·)
(7.6.4)
is indeed the Lp -Kantorovich problem on mass transportation (see Chapters 2 and 3). Moment problem 2: For given x0 ∈ IR and positive α find the Kantorovich radius sup E|X − x0 |α ,
(7.6.5)
where the supremum is over all random variables X with fixed moments EX = p and EX 2 = q. Remark 7.6.2 Problem 2 will be used in approximation of complex queueing models by means of deterministic models. Moment problem 3: Find sup [t]c µ( dt), A = [0, a] or [0, ∞),
(7.6.6)
A
and
inf
[t]c µ( dt),
A = [0, a] or [0, ∞),
(7.6.7)
A
over the set of all probability measures with support A having fixed rth moment tr µ( dt) = dr , r > 0, dr > 0, (7.6.8) A
where for a given nonnegative x the c-rounding (0 ≤ c ≤ 1) of x is defined by ⎧ ⎨ m if m ≤ x ≤ m + c, [x]c = ⎩ m + 1 if m + c < x ≤ m + 1. Remark 7.6.3 Moment problem 3 can be applied to the problems of rounding and apportionment; see Mosteller, Youtz, and Zahn (1967), Diaconis and Freedman (1979), Balinski and Young (1982), and Balinski and Rachev (1993). In the apportionment theory, c = 0 corresponds to the Adams method; c = 1/2 corresponds to the Webster method (or conventional rounding, or Mosteller–Youtz–Zahn “broken stick” rule of rounding); c = 1 corresponds to the Jefferson method; see Balinski and Young (1982).
54
7. Relaxed or Additional Constraints
Moment problem 4: Find (7.6.6) and (7.6.7) subject to (7.6.8) and tµ( dt) = d1 . (7.6.9) A
We next consider some infinite-dimensional analogues of the moment problems 1 and 2. Let C[0, 1] be the space of continuous functions on [0, 1] with the usual sup-norm x, and let X (C[0, 1]) be the space of r.v.s on a nonatomic probability space (Ω, A, P ) with values in C[0, 1]. Let M be the class of all strictly increasing continuous functions f : [0, ∞] → [0, ∞], f (0) = 0, f (∞) = ∞. Finally, let T be a set of finitely many points in [0, 1], 0 ≤ t1 < t2 < · · · ≤ tN ≤ 1.
(7.6.10)
Moment problem 5: Given h, gi ∈ M(i = 1, . . . , N ) find inf Eh(X − Y ),
(7.6.11)
where the infimum is over the set of all possible joint distributions of X and Y subject to the moment constraints Egi (|X(ti )|) = ai ,
Egi (|Y (ti )|) = bi .
(7.6.12)
Remark 7.6.4 This problem can be interpreted as follows. Having observations of two random processes (more precisely, we suppose the moments (7.6.12) are known), the goal is to evaluate the minimal possible distance Eh(||X −Y ||) between the processes X and Y . We shall determine the minimum in (7.6.11) and show that essentially this minimum can be achieved.
7.6.1
Moment Problems and Kantorovich Radius
In this section we state the solutions of moment problems 1 and 2; the proofs are given in Anastassiou and Rachev (1992) and the monograph Anastassiou (1993). Let S = [a, b] × [c, d] ⊂ IR2 and ϕ(x1 , x2 ) = |x1 − x2 |p , p ≥ 1. Suppose (α, β) ∈ S, and denote by U = U (ϕ, α, β) the supremum in (7.6.1) subject to x1 µ( dx1 , dx2 ) = α, x2 µ( dx1 , dx2 ) = β. (7.6.13) S
S
Theorem 7.6.5 The supremum U in (7.6.1) is given by U = Dδ + T,
(7.6.14)
7.6 Moment Problems of Stochastic Processes and Rounding Problems
where
|b − d|p + |a − c|p − |b − c|p + |a − d|p ,
D
:=
T
(1 − B)|b − c|p + (B + C − 1)|a − c|p + (1 − C)|a − d|p , d−β b−α , C := , := b−a d−c := max(0, 1 − B − C).
B δ
55
:=
Remark 7.6.6 Since ϕ is convex on any S ⊂ IR2 , then given (7.6.13), inf ϕ dµ = |α − β|p . µ
S
Next, consider nonbounded regions S. Namely, define for b ≥ 0 the following stripes in IR2 : S1b S2b Sb
:= {(x, y); y = x + b , where 0 ≤ b ≤ b}, := {(x, y); y = x − b , where 0 ≤ b ≤ b}, := S1b ∪ S2b .
We extend Theorem 7.6.5 to this type of unbounded region. Theorem 7.6.7 Assume that 0 < p ≤ 1. (i) If S = S1b or S2b , (α, β) ∈ S, then the supremum U in (7.6.1) is equal to U := U (ϕ, α, β) = |α − β|p . (ii) If S = S b , then U = bp . (iii) Let L be the lower bound (7.6.2) subject to (7.6.13). Then if S = S1b or S = S2b or S = S b , (α, β) ∈ S, we have L := L(ϕ; α, β) = bp−1 |α − β|. Next, consider another type of stripe in IR2 : For b, γ > 0, S1b S2γ S b,γ
:= {(x, y); y = x + b , where 0 ≤ b ≤ b}, := {(x, y); y = x − γ , where 0 ≤ γ ≤ γ}, := S1b ∪ S2γ .
Theorem 7.6.8 Let p ≥ 1. (i) If S = S1b , (α, β) ∈ S, then U := U (ϕ, α, β) = bp−1 (β − α).
56
7. Relaxed or Additional Constraints
(ii) If S = S2γ , (α, β) ∈ S, then U = U (ϕ, α, β) = γ p−1 (α − β). (iii) If S = S b,γ , (α, β) ∈ S, then U =
(bp − γ p )(β − α − b) + bp (b + γ) . b+γ
Next, we shall state the explicit solutions of Moment problem 2. So we will be interested in the following problem. For given x0 ∈ IR, α > 0, p ∈ IR, q > 0 (p2 ≤ q), −∞ ≤ a < b ≤ +∞, find the Kantorovich radius K
(7.6.15) := K(x0 ; α, p, q, a, b) := sup{E|X − x0 |α ; X ∈ [a, b] a.s., EX = p, EX 2 = q}.
Theorem 7.6.9 (Case (A): α ≥ 2, −∞ < a < b < +∞) Let x0 = (a + b)/2, a ≤ p ≤ b, and 0 ≤ q ≤ b2 + (a + b)(p − b). Then the Kantorovich radius K admits the following bound: α−2 " # b−a (a + b)2 q − p(a + b) + K ≤ . 2 4 Moreover, if there exist λ1 , λ2 ≥ 0, λ1 + λ2 ≤ 1, such that p
=
q
=
a+b b−a + (λ1 − λ2 ), 2 2 2 2 b − a2 (a + b) (b − a)2 + (λ1 − λ2 ) + (λ1 + λ2 ), 4 2 4
then K =
b−a 2
α−2 " # (a + b)2 q − p(a + b) + . 4
The next theorem gives an analogue of Theorem 7.6.9 when the X’s in (7.6.15) have unbounded support. Theorem 7.6.10 (Case (B): 0 < α ≤ 2, a = −∞, b = +∞) For any x0 ∈ IR, p ∈ IR, q > 0, p2 ≤ q, the Kantorovich radius K is given by K = K(x0 ; α; p, q) = (q − 2x0 p + x20 )α/2 . The rest of the results in this section treat various versions of Theorems 7.6.9 and 7.6.10. Theorem 7.6.11 (Case (C): 0 < α ≤ 2, −∞ < a < b < +∞) For any x0 ∈ (a, b); p ∈ IR, p2 ≤ q, K = sup{E|X − x0 |α ; X ∈ [a, b] a.s., EX = p, EX 2 = q}. Set P = p − x0 , Q = q − 2x0 p + x20 , A(x0 ) = a − x0 , B(x0 ) = b − x0 , C(x0 ) = min(−A(x0 ), B(x0 )).
7.6 Moment Problems of Stochastic Processes and Rounding Problems
57
(i) If 0 ≤ Q ≤ C 2 (x0 ), then K = Qα/2 . (ii) If Q > C 2 (x0 ) and (A(x0 ) + B(x0 ))P − Q − A(x0 )B(x0 ) ≥ 0, then K ≤ Qα/2 . Theorem 7.6.12 (Case (D): 1 ≤ α ≤ 2, −∞ < a < b ≤ +∞) For any p ∈ IR, p2 ≤ q, a ≤ p ≤ b, set P = p − a, Q = q − 2ap + a2 , B = b − a. Suppose Q ≤ BP . Then K := sup{E|X − a|α ; X ∈ [a, b], EX = p, EX 2 = q} = p2−α Qα−1 . Theorem 7.6.13 (Case (E): 1 ≤ α ≤ 2, −∞ ≤ a < b < +∞) For any p ∈ IR, p2 ≤ q, a ≤ p ≤ b, set P = p − b, Q = q − 2bp + b2 , θ = a − b. Suppose Q ≤ θP . Then K := sup{E|X − b|α ; X ∈ [a, b], EX = p, EX 2 = q} = p2−α Qα−1 .
7.6.2
Moment Problems Related to Rounding Proportions
Here we state results on explicit solutions of Moment see (7.6.6) and (7.6.9). To this end recall the definition c ≤ 1): For any x ≥ 0, ⎧ ⎨ m, if m ≤ x < m + c, [x]c = ⎩ m + 1, if m + c ≤ x < m + 1, ⎧ ⎨ 0, if 0 ≤ x < c, = ⎩ m, if m − 1 + c < x ≤ m + c,
problems 3 and 4; of c-rounding (0 ≤
m ∈ IN ∪ {0},
m ∈ IN.
The next four theorems deal with problem 3.(5) Theorem 7.6.14 Let c ∈ (0, 1), r > 0, 0 < a < ∞, d > 0, and U := U[·]c (a, r, d) := sup{E[X]c ; 0 ≤ X ≤ a a.s., (EX r )1/r = d}. Set n = [a]. (I) If n + c < a, n + c ≤ d ≤ a, then U = n + 1. (II) If n + c ≥ a, n − 1 + c ≤ d ≤ a, then U = n. (5) In
the sequel, the underlying probability space is assumed to be nonatomic, and thus the space of laws of nonnegative r.v.s coincides with the space of all Borel probability measures on IR+ .
58
7. Relaxed or Additional Constraints
(III) If 0 < a ≤ c, then U = 0. ln 2 (< 1), n + c < a, and 0 ≤ d ≤ n + c, then ln(1 + 1/c) U = (n + 1)dr (n + c)−r .
(IV) If 0 < r ≤
ln 2 , n + c ≥ a, and 0 ≤ d ≤ n − 1 + c, then ln(1 + 1/c) U = ndr (n − 1 + c)−r .
(V) If 0 < r ≤
(VI) If r ≥ 1 and 0 ≤ d ≤ c, then U = dr c−r . (VII) Suppose r ≥ 1. If either (a) n + c < a and k ∈ {1, . . . , n} is determined by k − 1 + c ≤ d < k + c, or (b) n + c ≥ a and k ∈ {1, . . . , n} is determined by k − 1 + c ≤ d < k + c, then U = k +
dr − (k − 1 + c)r ≤ 1 − c + d. (k − c)r − (k − 1 + c)r
The next theorem extends Theorem 7.6.14 to the case a = +∞. Theorem 7.6.15 Let 0 < c < 1, r > 0, d > 0, and U := U[·]c (r, d) := sup{E[X]c ; X ≥ 0 a.s., (EX r )1/r = d}. (I) If 0 < r < 1, then U = +∞. (II) If r ≥ 1 and 0 ≤ d ≤ c, then U = dr c−r . (III) Suppose r ≥ 1. Define k ∈ N by k − 1 + c ≤ d < k + c. Then U = k+
dr − (k − 1 + c)r ≤ 1 − c + d. (k + c)r − (k −+ c)r
The next two theorems are versions of the previous two; here we consider the lower bounds in c-rounding. Theorem 7.6.16 Let c ∈ (0, 1), r > 0, 0 < a < ∞, d > 0, and L := 1/r L[·]c (a, r, d) := inf{E[X]c ; 0 ≤ X ≤ a a.s., (EX r ) = d}. Set n = [a]. (I) If 0 < d ≤ c, then L = 0. (II) If c < a ≤ 1 + c and c ≤ d ≤ a, then L = (dr − cr )/(ar − cr ).
7.6 Moment Problems of Stochastic Processes and Rounding Problems
59
(III) Let 0 < r ≤ 1, n + c < a, and determine k ∈ {0, 1, . . . , n − 1} by k + c ≤ d < k + 1 + c. Then L = k+
dr − (k + c)r . (k + 1 + c)r − (k + c)r
(IV) If 0 < r ≤ 1, n + c < a, and n + c ≤ d ≤ a, then L = n+
dr − (n + c)r . ar − (n + c)r
The case a = +∞ is extended as follows. Theorem 7.6.17 Let c ∈ (0, 1), r > 0, d > 0, and L := L[·]c (r, d) := inf{E[X]c : X ≥ 0 a.s., (EX r )1/r = d}. (I) If r > 1, then L = 0. (II) If 0 ≤ r ≤ 1, 0 < d ≤ c, then L = 0. (III) If r = 1, c ≤ d < ∞, then L = d − c. (IV) If 0 < r ≤ 1, define k ∈ IN ∪ {0} by k + c ≤ d < k + 1 + c. Then L := k +
dr − (k + c)r . (k + 1 + c)r − (k + c)r
Next we pass to Moment problem 4, see (7.6.6)–(7.6.9). For simplicity we shall consider only special cases of c-rounding. For the general case we refer to Anastassiou and Rachev (1992) and Anastassiou (1993). First consider the conventional (Webster) rounding, or MYZ-rounding, [x] := [x]1 . Theorem 7.6.18 Let a > 0, 0 < r = 1, d1 > 0, dr > 0, and U = U[·] (a, r, d1 , d2 ) := sup{E[X]; 0 ≤ X ≤ a a.s. and EX = d1 , EX r = dr }. Let θ = [a]. (I) Set ∆r,a := ar +
ar − θr (d1 − θ). a−θ
Suppose that a = θ, θ ≤ d1 ≤ a, and either r > 1, dr1 ≤ dr ≤ ∆r,a , or 0 < r < 1 and ∆r,a ≤ dr ≤ dr1 . Then U = θ.
60
7. Relaxed or Additional Constraints
(II) Suppose 0 < θ = a and there are λ1 , λ2 ≥ 0 with λ1 + λ2 ≤ 1 and such that d1 = λ1 θ + λ2 a and dr = λ1 θr + λ2 ar . Then U =
(θr − ar )d1 + (a − θ)dr . a(θr−1 − ar−1 )
(III) Let θ ≥ 1 and suppose there exists k ∈ {0, 1, . . . , θ − 1} such that k ≤ d1 < k+1 and either r > 1 and dr,k := k r +[(k+1)r −k r ](d1 −k) ≤ dr ≤ θr−1 d1 or 0 < r < 1 and θr−1 d1 ≤ dr ≤ dr,k . Then U = d1 . For the case a = +∞ we have the following version of the above theorem. Theorem 7.6.19 Let 0 < r = 1, d1 > 0, dr > 0, and U := U[·] (r, d1 , dr ) := sup{E[X]; X ≥ 0 a.s. and EX = d1 , EX r = dr }. Suppose there exists a nonnegative integer k such that k ≤ d1 ≤ k + 1 and either r > 1 and dr ≥ dr,k := k r + [(k + 1)r − k r ](d1 − k) or 0 < r < 1 and 0 < dr ≤ dr,k . Then U = d1 . If we change in Theorems 7.6.18 and 7.6.19 the upper bound U to the corresponding lower bound, we obtain the following two theorems. Theorem 7.6.20 Let a > 0, 0 < r = 1, d1 > 0, dr > 0, and L := L[·] (a, r, d1 , dr ) := inf{E[X]; 0 ≤ X ≤ a a.s., EX = d1 , EX r = dr }. (I) Suppose there exist t1 , t2 , λ such that 0 ≤ t1 ≤ t2 ≤ 1, and d1 = (1 − λ)t1 + λt2 and dr = (1 − λ)tr1 + λtr2 . Then L = 0. (II) If 0 < a ≤ 1, then L = 0. (III) If 1 < a < 2 and there exist λ1 , λ2 > 0 with λ1 + λ2 ≤ 1 and such (dr − d1 ) that d1 = λ1 + λ2 a, dr = λ1 + λ2 ar , then L = . (ar − a) From now on assume that a ≥ 2, and let θ = [a]. (IV) Suppose ∆θ :=
(θ − 1)(ar − a) ≤ θ. (θr − θ)
(i) If d1 = λ1 + λ2 θ, dr = λ1 + λ2 θr for some λ1 , λ2 ≥ 0, λ1 + λ2 ≤ 1, then L = ∆θ . (ii) If d1 = λ1 θ+λ2 a and dr = λ1 θr +λ2 θr for some λ1 , λ2 ≥ 0, λ1 +λ2 ≤ 1, then L =
(θ − a(θ − 1))dr − (θr+1 − ar (θ − 1))d1 . θar − aθr
7.6 Moment Problems of Stochastic Processes and Rounding Problems
61
(V) Suppose ∆θ > θ. (i) If d1 = λ1 +λ2 θ+λ3 a and dr = λ1 +λ2 θr +λ3 ar for some λ1 , λ2 , λ3 ≥ 0, λ1 + λ2 + λ3 = 1, then L=
(θ − 1)(θ − a + 1)(dr −1) − ((θr −1)θ − (ar −1)(θ − 1))(d1 − 1) . (θ − 1)(ar − 1) − (a − 1)(θr − 1)
(ii) If d1 = λ1 + λ2 a, dr = λ1 + λ2 ar for some λ1 , λ2 ≥ 0, λ1 + λ2 ≤ 1, θ(dr − d1 ) . then L = ar − a (VI) Suppose θ > 1 and one of the following holds. (i) r > 1 and there exists k ∈ {1, . . . , θ − 1} such that k ≤ d1 k + 1 and dr,k
:= k r + ((k + 1)r − k r ) (d1 − k) ≤ dr (θr − 1)(d1 − 1) ; ≤ ∆r,θ := 1 + θ−1
(ii) 0 < r < 1 and there exists k ∈ {1, . . . , θ − 1} such that k ≤ d1 ≤ k + 1 and ∆r,k ≤ dr ≤ dr,k . Then L = d1 − 1. The special case a = +∞ is treated as follows. Theorem 7.6.21 Let 0 < r = 1, d1 > 0, dr > 0, and L := L[·] (r, d1 , dr ) := inf{E[X] : X ≥ 0 a.s., EX = d1 , EX r = dr }. (I) Suppose there exist 0 ≤ t1 ≤ t2 ≤ 1, 0 ≤ λ ≤ 1 such that d1 = (1 − λ)t1 + λt2 and dr = (1 − λ)tr1 + λtr2 . Then L = 0. (II) Suppose 0 < r < 1 and either d1 = λ1 + λ2 , dr = λ1 , λ1 ≥ 0, λ1 + λ2 ≤ 1 or d1 ≥ 1, 0 < d1 ≤ 1. Then L = d1 − dr . (III) Suppose 0 < r < 1 and for some integer k, k ≤ d1 < k + 1, and 1 ≤ dr ≤ (k r + ((k + 1)r − k r ))(d1 − k). Then L = d1 − 1. (IV) Suppose r > 1 and either d1 = λ1 , dr = λ1 + λ2 , λi ≥ 0, i = 1, 2, λ1 + λ2 ≤ 1 or 0 < d1 ≤ 1 and 1 ≤ dr . Then L = 0. (V) Suppose r > 1 and for some integer k, k ≤ d1 < k + 1, and (k r + ((k + 1)r − k r ))(d1 − k) ≤ dr . Then L = d1 − 1. Similar results are valid for Adams (c = 0) and Jefferson (c = 1) rules of roundings; see Anastassiou and Rachev (1992) and Anastassiou (1993).
62
7. Relaxed or Additional Constraints
7.6.3
Closeness of Random Processes with Fixed Moment Characteristics
The moment problems we are going to consider in this section may be viewed as extensions of Moment problem 1 (on page 52) for measures µ generated by the joint distribution of random processes. Namely, let the class M, the space X (C[0, 1]), and the set T = {t1 , . . . , tN } be defined by (7.6.10). The subject of this section is the following general version of problem (7.6.10)–(7.6.12). Moment problem 6: Given h, gi,j ∈ M (i = 1, . . . , N, j = 1, . . . , n) find the set of valued Eh(||X − Y ||) subject to the moment constraints Egi,j (|X(ti )|) = ai,j ,
Egi,j (|Y (ti )|) = bi,j ,
i = 1, . . . , N.
(7.6.16)
In particular, determine the bounds inf Eh(||X − Y ||), sup Eh(||X − Y ||),
(7.6.17)
given the constraints (7.6.16). One interpretation of the problem is as follows: Suppose we observe two continuous processes X and Y , only through the “window” T . Suppose for each point of the “window” we know some moment characteristics of X and Y . The problem is to determine the possible deviations between the processes outside the window. In particular, “What is the minimal distance between X and Y with given moment information (7.6.16)?” is just a special case of Moment problem 7.6.3. We start with the case n = 1 in (7.6.16); i.e., given h, gi ∈ M(i = 1, . . . , N ) and assuming that Egi (|X(ti )|) = ai ,
Egi (|Y (ti )|) = bi ,
i = 1, . . . , N,
(7.6.18)
we are interested in the range of Eh(||X − Y ||). The solution of this problem will be given under some assumptions of the following type: Assumption A(h, g): h ◦ g −1 (t) (t ≥ 0) is a convex function (here and in the sequel, f −1 stands for the inverse of f ∈ M). Assumption B(g): g −1 (Eg(|ξ + η|)) ≤ g −1 (Eg(|ξ|)) + g −1 (E(|η|)) for any ξ, η ∈ X (R) (here, X (R) is the set of all real-valued r.v.s.). Assumption C(g): Eg(|ξ + η|) ≤ Eg(|ξ|) + Eg(|η|) for any ξ, η ∈ X (R). Assumption D(h, g): limt→∞ h(t)/g(t) = 0.
7.6 Moment Problems of Stochastic Processes and Rounding Problems
63
Remark 7.6.22 Take the most interesting case: h(t) = tp , g(t) = tq (p > 0, q > 0). Then A(h, g) ⇔ p ≥ q, B(g) ⇔ q ≥ 1, C(g) ⇔ q ≤ 1, D(h, g) ⇔ q > p. Now let T = {0 ≤ t1 ≤ · · · ≤ tN ≤ 1} and take a = (a1 , . . . , aN ) ∈ N IRN + , b = (b1 , . . . , bN ) ∈ IR+ and g = (g1 , . . . , gN ) ∈ M to be fixed vectors. Denote by X (T, g, a), the space of all X ∈ X (C[0, 1]) satisfying the marginal moment conditions Egi (|X(ti )|) = ai (i = 1, . . . , N ), and let I{h, g, T, a, b} :=
inf{Eh(||X − Y ||); X ∈ X (T, g, a), Y ∈ X (T, g, b)}.
(7.6.19)
In the next four theorems we describe the exact range of values of Eh(||X − Y ||) under different conditions of type A–D. Theorem 7.6.23 Let A(h, gi ) and B(gi ) hold for any i = 1, 2, . . . , N . Then (7.6.20) (i) I{h, g, T, a, b} = sup h |gi−1 (ai ) − gi−1 (bi )| ; 1≤i≤N
(ii) for any ν ≥ I{h, g, T, a, b} there exist random processes Xν ∈ X (T, g, a) and Yν ∈ X (T, g, b) such that Eh(||Xν − Yν ||) = ν.
(7.6.21)
Proof: We shall split the proof into three claims. Claim 1: I{h, g, T, a, b} ≥ sup φi (ai , bi ), where φi (ai , bi ) := h(|gi−1 (ai )− gi−1 (bi )|).
1≤i≤N
Proof of Claim 1: Let X, Y ∈ X (C[0, 1]) and ξ := g(|X(ti ) − Y (ti )|). Then, by the Jensen’s inequality and A(h, gi ), h−1 (Eh(||X − Y ||)) ≥ h−1 (Eh(|X(ti ) − Y (ti )|)) = h−1 (Eh ◦ gi−1 (ξ)) ≥ h−1 ◦ h ◦ gi−1 E(ξ) = gi−1 E(ξ). Further, by B(gi ), h ◦ gi−1 (E(ξ)) ≥ h
gi−1 (Egi (|X(ti )|) − gi−1 (Egi (|Y (ti )|)
= h(|gi−1 (ai ) − gi−1 (bi )|),
64
7. Relaxed or Additional Constraints
which proves the claim. Claim 2: The infimum in the left-hand side of (7.6.19) is attained, and (7.6.20) holds. Y ∈ X (C[0, 1]) to be random polygonal Proof of Claim 2: Define X, lines with vertices at points 0, t1 , . . . , tn , 1 given by ⎧ ⎪ i , ω) = g −1 (ai ), Y (ti , ω) = g −1 (bi ), i = 1, . . . , N, ⎪ X(t ⎪ ⎨ ω) = Y (0, ω) = 0, (7.6.22) X(0, if t1 > 0, ⎪ ⎪ ⎪ ⎩ X(1, ω) = Y (1, ω) = 0 if tN < 1, ω ∈ Ω. For any ω ∈ Ω, ' ' ' ' 'X(·, ω) − Y (·, ω)' = sup |gi−1 (ai ) − gi−1 (bi )|, 1≤i≤N
and hence − Y ||) = Eh(||X
sup φi (ai , bi ).
(7.6.23)
1≤i≤N
Further, by (7.6.22), ∈ X (T, g, a), X
Y ∈ X (T, g, b).
(7.6.24)
Invoking (7.6.23), (7.6.24), and Claim 1, we complete the proof of the claim. Claim 3: (ii) is satisfied. Proof of Claim 3: Let τ ∈ (0, 1), t ∈ T . Define the r.v.s Xν and Yν in X (C[0, 1]) as follows: Xν (t) = X(t),
Yν (t) = Y (t)
for t = 0, t1 , . . . , tN , 1,
(7.6.25)
Y are determined by (7.6.22), and where X, Xν (τ ) = h−1 (ν),
Yν (τ ) = 0.
(7.6.26)
Next, let Xν (t), Yν (t) be a random polygonal lines with vertices at 0, t1 , . . . , tN , 1 and τ . Making use of Claim 2, we have ||Xν − Yν || = h−1 (ν) ≥ and thus (7.6.21) holds.
sup |gi−1 (ai ) − gi−1 (bi )|,
1≤i≤N
2
7.6 Moment Problems of Stochastic Processes and Rounding Problems
65
Theorem 7.6.24 Let A(h, gi ) and C(gi ) hold for any i = 1, . . . , N . Then (i)
I{h, g, T, a, b} = sup h ◦ gi−1 (|ai − bi |),
(ii)
for any ν ≥ I{h, g, T, a, b} there exist Xnν ∈ X (T, g, a) and Ynν ∈ X (T, g, b) such that Eh(||Xnν − Ynν ||) → Y as n → ∞.
1≤i≤N
Proof: Claim 1: I{h, g, T, a, b} ≥ sup ϕi (ai , bi ), where ϕi (ai , bi ) := h◦gi−1 (|ai − 1≤i≤N
bi |).
Proof of Claim 1: Let X, Y ∈ X (C[0, 1]). Then, as in Claim 1 of Theorem 7.6.23, by A(h, gi ), C(gi ), and Jensen’s inequality, we have h−1 (Eh(||X − Y ||)) ≥ gi−1 E[gi (|X(ti ) − Y (ti )|)] ≥ gi−1 (|Egi (|X(ti )|) − Egi (|Y (ti )|)|) = h−1 ◦ ϕi (ai , bi ), which proves the claim. Claim 2: For any ε > 0 there exists a pair (Xε , Yε ) ∈ X (T, g, a)×X (T, g, b) such that Eh(||Xε − Yε ||) = sup h ◦ gi−1 (|ai − bi | + ε) sup 1≤i≤N
1≤i≤N
|ai − bi | . (7.6.27) |ai − bi | + ε
Proof of Claim 2: Without loss of generality we can assume that ai > bi , i = 1, . . . , N . Let pi := (ai −bi )/(ai −bi +ε) and qi := 1−pi , i = 1, . . . , N . We rearrange the indices i so that p1 ≤ p2 ≤ · · · ≤ pN . Choose sets Ai ∈ A such that A1 ⊂ · · · ⊂ AN and P (Ai ) = pi by using the assumption of (Ω, A, P ) being a nonatomic space. More precisely, since (Ω, A, P ) is nonatomic, then for any B ∈ A and any λ ∈ [0, P (B)] there exists C = C(B, λ) ∈ A, C ⊆ B such that P (C) = λ (see Loeve (1977, p. 101)). Then the required sets Ak (k = 1, . . . , N ) are given by Ak = C(Ak+1 , Pk ),
k = 1, . . . , N
(AN +1 := Ω).
Further, for any ω ∈ Ω, define ⎧ ⎨ c := g −1 (a − b + ε), i i i i Xε (ti , ω) := ⎩ d := g −1 (b /q ), i i i i and
⎧ ⎨ 0, Yε (ti , ω) := ⎩ d, i
if ω ∈ Ai , if ω ∈ Ai .
if ω ∈ Ai , if ω ∈ Ai ,
66
7. Relaxed or Additional Constraints
We define (t, Xε (t))t∈[0,1] to be a random polygonal line with vertices (ti , Xε (ti )) and let Xε (0, ω) = 0 if ti > 0 and Xε (1, ω) = 0 if tN < 1 for any ω ∈ Ω. Analogously define the process Yε (t). Then Egi (|Xε (ti )|) = gi (ci )pi + gi (di )qi = ai , and Egi (|Yε (ti )|) = gi (di )qi = bi for any i = 1, . . . , N ; i.e., Xε ∈ X (T, g, a) and Yε ∈ X (T, g, b). Further, Eh(||Xε − Yε ||) = Eh[sup |Xε (t) − Yε (t)|],
(7.6.28)
t∈T
where
⎧ ⎨ c, i |Xε (ti , ω) − Yε (ti , ω)| = ⎩ 0,
if ω ∈ Ai ,
(7.6.29)
if ω ∈ Ai .
Since g ∈ M, then for any i = 1, . . . , N − 1, pi ≤ pi+1 ⇔ ai − bi ≤ ai+1 − bi+1 ⇔ ci ≤ ci+1 ; i.e., c1 ≤ c2 ≤ · · · ≤ cN . Hence, by (7.6.29) and A1 ⊂ A2 ⊂ · · · ⊂ AN , ⎧ ⎨ c , if ω ∈ A , N N (7.6.30) sup |Xε (t, ω) − Yε (t, ω)| = ⎩ 0, t∈T if ω ∈ A . N
Now, (7.6.28) and (7.6.30) imply that Eh(||Xε − Yε ||) = h(cN )pN , which is in fact the right-hand side of equality (7.6.27), and thus the claim is proved. Claims 1 and 2 prove the desired equality (i). Claim 3: (ii) is satisfied. Proof of Claim 3: Let τ ∈ (0, 1), τ ∈ T . Using the same notations as in Claims 1 and 2 we define Xν (ti , ω) Xν (τ, ω)
:= Xε (ti , ω), Yν (ti , ω) := Yε (ti , ω), ⎧ ⎨ h−1 (ν), if ω ∈ A , N = ⎩ 0, if ω ∈ AN ,
Yν (τ, ω)
=
ν > h(cN )
=
0
ω ∈ Ω,
for any ω ∈ Ω, and ε > 0 is chosen so small that sup h ◦ gi−1 (|ai − bi | + ε).
1≤i≤N
We define the random broken lines Xν and Yν with vertices Xν (ti ), Yν (ti ), i = 1, . . . , N and Xν (0) = Yν (0) if ti > 0; Xν (1) = Yν (1) = 0 if tN < 1. Hence, as in Claim 2 we conclude that ⎧ ⎨ max(c , h−1 (ν)), if ω ∈ A , N N ||Xν (·, ω) − Yν (·, ω)|| = ⎩ 0, if ω ∈ A . N
Hence, Eh(||Xν − Yν ||) = νpN = ν aNaN−b−bNN+ε . This proves the claim.
2
7.6 Moment Problems of Stochastic Processes and Rounding Problems
67
Theorem 7.6.25 Let D(h, gi ) hold for any i = 1, . . . , N . Then I{h, g, T, a, b} = 0,
(7.6.31)
and for any ν > 0 there exist random processes Xnν ∈ X (T, g, a) and Ynν ∈ X (T, g, b) such that Eh(||Xnν − Ynν ||) → ν. Proof: Claim 1: For any n = 1, 2, . . . there exist Xn ∈ X (T, g, a) and Yn ∈ X (T, g, b) such that
Eh(||Xn − Yn ||) ≤
n
h(nbi )
i=1
bi ai + h(nai ) gi (nbi ) gi (nai
. (7.6.32)
Since gi ∈ M for n large enough, say n ≥ n0 , we can define disjoint sets Ain , Bin , Cin , and such that Ain + Bin + Cin = Ω and
P (Ain ) = cin :=
ai , gi (nai )
P (Bin ) = din :=
bi . gi (nbi )
Now, for any i = 1, . . . , N , n ≥ n0 , define ⎧ ⎪ ⎪ X (t , ω) = nai , ⎪ ⎨ n i
Yn (ti , ω) = 0,
if ω ∈ Ain ,
Xn (ti , ω) = 0, Yn (ti , ω) = nbi , ⎪ ⎪ ⎪ ⎩ X (t , ω) = Y (t , ω) = 0, n i n i
if ω ∈ Bin ,
(7.6.33)
if ω ∈ Cin .
Then Egi [|Xn (ti )] = gi (nai )cin = ai and Egi [|Yn (ti )|] = gi (nbi )din = bi ; i.e., X ∈ X (T, g, a) and Y ∈ X (T, g, b). Further, we define the random broken lines Xn (t), Yn (t) (t ∈ [0, 1]) in the way we already did in Theorems 7.6.23 and 7.6.24. Without loss of generality we can assume that a1 ≤ a2 ≤ · · · ≤ aN ≤ b1 ≤ b2 ≤ · · · ≤ bN . Then ||Xn − Yn || =
sup |Xn (t) − Yn (t)|, t∈T
(7.6.34)
68
7. Relaxed or Additional Constraints
⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ =
⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
nbN
if ω ∈ BN,n ,
nbN −1 .. .
if ω ∈ BN −1,n \ BN,n ,
nb1
if ω ∈ B1,n \
naN
if ω ∈ AN,n \
N (
Bj,n ,
j=2 N (
Bj,n ,
j=1
.. .
)
na1
if ω ∈ A1,n \
0
if ω ∈
N (
N (
Aj,n ∪
j=2
Aj,n ∪
j=1
N (
N (
* Bj,n ,
j=1
Bj,n .
j=1
Hence, Eh(||Xn − Yn ||) ≤
N
h(nbj,n )dj,n +
j=1
N
h(naj,n )cj,n ,
j=1
which proves (7.6.32) and the claim. By D(h, gi ) (i = 1, . . . , N ) it follows that the right-hand side of (7.6.32) goes to 0 as n → ∞. Hence, (7.6.31) holds true. Claim 2: (ii) is valid. ∈ (0, 1), t ∈ T , and define νn
Let τ
=
1−P
ν
+
N j=1
, Cj,n
Xn,ν (τ ) = h−1 (νn ), Yn,ν (τ ) = 0. We define the random broken line Xn,ν (t) with vertices Xn,ν (tj ) = Xn (tj ) (see (7.6.33)) and Xn,ν (τ ) (cf. Claim 2 of Theorem 7.6.23). Following the same notations as in Claim 1, we have ⎛
N ,
h(nbi )P ⎝Bi,n \
⎞ Bj,n ⎠ ≤ h(nbi )din → 0
as n → ∞
(7.6.35)
Bj,n ⎦⎦ ≤ h(nai )cin → 0
(7.6.36)
j=i+1
and ⎡
⎡
h(nai )P ⎣Ai,n \ ⎣
N ,
j=i+1
Aj,n ∪
N ,
⎤⎤
j=1
as n → ∞. Hence, by (7.6.34)–(7.6.36), for n large enough,
7.6 Moment Problems of Stochastic Processes and Rounding Problems
Eh(||XN,ν − YN,ν ||) =
N
⎛
i=1
+
N
⎛
=
⎛
νn P ⎝Bi,n \
j=1
+
N
Bj,n ⎠
j=i+1
⎡
max(νn , h(nai ))P ⎝Ai,n \ ⎣
i=1 N
⎞
N ,
max(νn , h(nbi ))P ⎝Bi,n \
⎛
⎡
νn P ⎝Ai,n \ ⎣
= νn P ⎝
N ,
Aj,n ∪
j=i+1 N ,
Aj,n ∪
N ,
⎤⎞ Bj,n ⎦⎠
j=1
Bj,n ⎠
j=i+1
i=1
⎛
N ,
j=i+1
⎞
N ,
69
⎞
N ,
⎤⎞ Bj,n ⎦⎠
j=1
(Aj,n ∪ Bj,n )⎠ .
2
j=1
Theorem 7.6.26 For any gi ∈ M (i = 1, . . . , N ), (i)
I{0, g, a, b}
(7.6.37) := inf{P (X = Y ); X ∈ X (T, g, a), y ∈ X (T, g, b)} = 0;
(ii) for any ν ∈ (0, 1) there exists a sequence (Xnν , Ynν ) ∈ X (T, g, a)× X (T, g, b) such that P (Xnν = Ynν ) → ν Proof: (i)
as n → ∞.
Let cn ∈ A and P (cn ) =
1 n.
For any i = 1, . . . , N define
⎧ ⎨ X (t , ω) := g −1 (na ), Y (t , ω) := g −1 (nb ), if ω ∈ C , n i i n i i n i i (7.6.38) ⎩ X (t , ω) := Y (t , ω) = 0, if ω ∈ Cn . n i n i Then (7.6.38) determines the random polygonal lines Xn ∈ X (T, g, a) and Yn ∈ X (T, g, b). Since Xn (ti , ω) = Yn (ti , ω) = 0 whenever ω ∈ Cn and i = 1, . . . , N , then Xn (t, ω) = Yn (t, ω) = 0 if ω ∈ Cn and t ∈ [0, 1]. Hence, P (Xn = Yn ) ≥ P (Ω \ Cn ) = as desired.
n−1 → 1 n
70
7. Relaxed or Additional Constraints
(ii) Let 0 < ν < 1 and τ ∈ (0, 1) \ T . Choose A ∈ A with P (A) = ν and let ⎧ ⎪ ⎪ for any ω ∈ A, X (τ, ω) = 1, Ynν (τ, ω) = 0, ⎪ ⎨ nν (7.6.39) Xnν (τ, ω) = Ynν (τ, ω) = 0, for any ω ∈ A, ⎪ ⎪ ⎪ ⎩ X (t ) = X (t ), Y (t ) = Y (t ), i = 1, . . . , N, nν
i
n
i
nν
i
n
i
where Xn (ti ) and Yn (ti ) are given by (7.6.38). We construct the random broken lines Xnν and Xnν by using (7.6.39) (cf. Claim 3 of Theorem 7.6.23). From the implications
Xnν (·, ω) = Ynν (·, ω) ⇔
⇔
⎧ ⎪ ⎪ X (t , ω) = Ynν (ti , ω), ⎪ ⎨ nν i
i = 1, . . . , N,
⎪ ⎪ ⎪ ⎩ X (τ, ω) = Y (τ, ω) nν nν ω ∈ A ∩ Cn ,
it follows that P (Xnν = Ynν ) = P (Ω \ (A ∪ Cn )) → 1 − ν, which proves (ii), and the theorem as well.
2
As a consequence of Theorems 7.6.23–7.6.26 we obtain the following solution of Moment problem 7.6 for qi (t) = tqi (yi > 0) and h(t) = tp (P ≥ 0, with the convention t0 := I{t = 0}). Corollary 7.6.27 Let q = (q1 , . . . , qN ) (qi > 0), p ≥ 0, and
I{p, q, a, b} :=
inf {E||X − Y ||p ; X, Y ∈ X (C[0, 1]), E|X(ti )|qi = ai , E|Y (ti )|qi = bi , i = 1, . . . , N } .
Then ⎧ p 1/q 1/q ⎪ sup ai i − bi i , if p ≥ qi ≥ 1, i = 1, . . . , N, ⎪ ⎪ ⎪ ⎪ ⎨ 1≤i≤N p/q sup |ai − bi | i , if p ≥ qi , 0 < qi < 1, (7.6.40) I{p, q, a, b} = ⎪ 1≤i≤N ⎪ i = 1, . . . , N. ⎪ ⎪ ⎪ ⎩ 0, if 0 ≤ p ≤ qi , i = 1, . . . , N.
7.6 Moment Problems of Stochastic Processes and Rounding Problems
71
Moreover, for any ν > I{p, q, a, b} there exists a sequence (Xnν , Ynν ) ∈ X ([0, 1]) such that E||Xnν − Ynν ||p → ν
as n → ∞
and E|X(ti )|qi = ai ,
E|Y (ti )|qi = bi ,
i = 1, . . . , N.
Remark 7.6.28 Corollary 7.6.27 gives an explicit expression for I{p, q, a, b} if p and qi are subject to certain inequalities (cf. (7.6.40)) or if qi = q for all i = 1, . . . , N . The problem of an explicit description of I{p, q, a, b} for any p ≥ 0 and qi > 0 is still open.
7.6.4
Approximation of Queueing Systems with Prescribed Moments
In this section we discuss applications of Moment problem 1 (on page 52) to the problem of best approximation of a queueing system with known moment characteristics. As an example, suppose our “real” queueing system is of type G|G|1|∞ (for some acquaintance with the usual notations in queueing theory we refer to Borovkov (1984), Kalashnikov and Rachev (1990)). For this system, the sequences of nonnegative r.v.s (possibly dependent and nonidentically distributed) e = {en }n∈IN , s = {sn }n∈IN (IN = (1, 2, . . .)) are viewed as sequences of interarrival and service times. Looking at e and s as “input” of laws, we define (as the “output” flow) the sequence of waiting times w1 = 0,
wn+1 = (wn + sn − en )+ ,
n ∈ IN,
(7.6.41)
where (·)+ = max(0, ·). Since the distribution of w = {wn }n∈IN is not known, the aim is to approximate, model, or simulate the “real” system determined by the triplet (e, s, w) with a “simpler” queueing model (e∗ , s∗ , w∗ ). Assuming that the marginal distributions (the laws of ei , si ) are known, Borovkov (1984, Chapter 4) and Kalashnikov and Rachev (1990) examine different approximating models (e∗ , s∗ , w∗ ) and estimate the possible discrepancy between the “real” system (e, s, w) and the “ideal” model (e∗ , s∗ , w∗ ). Further, we shall relax the constraints “the laws of ei ’s and si ’s are known” by “certain moment characteristics of ei ’s and si ’s are fixed.” In this setup the solutions of Moment problem 1 are used in cases when the “ideal” model is not deterministic, say G|G|1|∞ but with simpler structure. We invoke Moment problem 2 (on page 53) when the approximation model has some deterministic components, like D|G|1|∞ (i.e., e∗j ’s are constants), or D|D|1|∞ (i.e., e∗j ’s and s∗j ’s are constants). Summarizing, we shall consider here the following two problems:
72
7. Relaxed or Additional Constraints
(a) Bounds for the deviation of output characteristics of two dependent queueing models. (b) Approximation of queueing systems by deterministic-type queueing models. Consider the following problem, which occurs in investigations stability of queueing models (see Kalashnikov and Rachev (1990, Chapter 5)). Suppose two queueing models of type G|G|1|∞, (e, s, w) and (e∗ , s∗ , s∗ ), with dependent characteristics are given. Here e = {en }n∈IN , s = {sn }n∈IN , w = {wn }n∈IN are, respectively, the sequences of interarrival, service, and waiting times. Assume that the components dj , sj , j ∈ IN of the “input flows” e and s are dependent and nonidentically distributed. The “output” flow is given by the sequence of waiting times (7.6.41). Suppose that the distribution of ej (resp. sj ) is concentrated on a compact interval [aj , bj ] (resp. [cj , dj ]). While this assumption is quite natural from the practical point of view, it is not used frequently in the literature, simply because it is easier to analyze queueing models with input distributions having unbounded support. We make similar assumptions for the model (e∗ , s∗ , w∗ ); in particular, it is assumed that a∗j ≤ e∗j ≤ b∗j , c∗j ≤ s∗j ≤ d∗j a.s. for all j ∈ IN. The input pairs (ej , e∗j ), (sj , s∗j ) of the two models are arbitrarily mutually dependent, the distributions of ej ’s, e∗j ’s, sj ’s, s∗j ’s are unknown. We assume that only the moments Eej = αj ,
Ee∗j = αj∗ ,
Esj = βj ,
Es∗j = βj∗
(7.6.42)
are given. Our problem to find a sharp bound for the deviation between the waiting times in both models. Let ϕk (en−k,n−1 , sn−k,n−1 ) (en−k,n−1 := (en−k , . . . , en−1 ), sn−k,n−1 := (sn−k , . . . , sn−1 )) be the waiting time for the nth arrival, assuming that the system is “free” at the moment n − k. In other words, ϕ(en−k,n−1 , sn−k,n−1 ) :=
(7.6.43)
max [0, en−1 − sn−1 , (en−1 − sn−1 ) + (en−2 − sn−2 ), . . . , (en−1 − sn−1 ) + (en−2 − sn−2 ) + · · · + (en−k − sn−k )] .
As a measure of deviation between the waiting times of the both systems we shall use δp (T ) = sup max Lp [ϕk (en,n+k−1 , sn,n+k−1 ), ϕk (e∗n,n+k−1 , s∗n,n+k−1 )], n∈IN 1≤k≤T
where p ≥ 1 and T ≥ 2 are fixed, and Lp (X, Y ) := {E|X − Y |p }1/p , p ≥ 1, X, Y ∈ X (R).
(7.6.44)
For random vectors we extend (7.6.44) as follows: Lp (X, Y ) = {E||X − Y ||p }1/p , X, Y ∈ X (RT ),
(7.6.45)
7.6 Moment Problems of Stochastic Processes and Rounding Problems
73
T where ||(x1 , . . . , xT )|| = i=1 |xj |. Since ϕk is a Lipschitz function with respect to the Minkowski norm || · ||, we have that for any k = 1, . . . , T , Lp [ϕk (en,n+k−1 , sn,n+k−1 ), ϕk (e∗n,n+k−1 , s∗n,n+k−1 )]
(7.6.46)
≤ Lp [(en,n+T −1 , sn,n+T −1 ), (e∗n,n+T −1 , sn,n+T −1 )] + Lp [(e∗n,n+T −1 , sn,n+T −1 ), (e∗n,n+T −1 , s∗n,n+T −1 )] ≤
n+T −1
Lp (ej , e∗j ) + Lp (sj , s∗j ) .
j=n
Now we invoke Theorem 7.6.5 to obtain sharp estimates of Lp (ej , e∗j ) and Lp (sj , s∗j ). Namely, Lp (ej , e∗j ) ≤ (Dj δj + sj )1/p ,
(7.6.47)
where Dj
:= Dj (aj , bj , a∗j , bj ∗) := |bj − b∗j |p + |aj − a∗j |p − |bj − a∗j |p − |aj − b∗j |p ;
Tj
:= Tj (aj , a∗j , bj , b∗j , αj , αj∗ ) (7.6.49) ∗ p ∗ p := (1 − Bj )|bj − aj | + (Bj + Cj − 1)|aj − aj | + (1 − Cj )|aj − b∗j |p ;
Bj
:=
δj
bj − αj , bj − aj
Cj :=
b∗j − αj∗ ; b∗j − a∗j
:= δj (aj , a∗j , bj , b∗j , αj , αj∗ ) := max(0, 1 − Bj − Cj ).
(7.6.48)
(7.6.50) (7.6.51)
Remark 7.6.29 If ej and e∗j are unknown, i.e., aj = a∗j = 0, bj = b∗j = +∞, then sup{Lp (dj , e∗j ); Eej = αj , Ee∗j = αj∗ } = ∞ (cf. Kuznezova-Sholpo and Rachev (1989)). In a similar way, j δj + Tj )1/p , Lp (sj , s∗j ) ≤ (D
(7.6.52)
j , δj , Tj are defined by (7.6.48)–(7.6.51), exchanging bj with dj , where D b∗j with d∗j , aj with cj , and a∗j with c∗j . In this way we have proved the following theorem. Theorem 7.6.30 For any p ≥ 1 and T = 2, 3, . . . , j δj + Tj )1/p . δp (T ) ≤ T sup (Dj δj + Tj )1/p + (D j≥1
(7.6.53)
74
7. Relaxed or Additional Constraints
The estimate is sharp or nearly sharp, since the inequalities (7.6.53) and (7.6.52) are the best possible bounds under the moment assumptions (7.6.42) (cf. Theorem 7.6.5), and also, the inequality (7.6.46) cannot be improved in the set of all possible input flows e, e∗ , s, s∗ . Next, we shall consider a much more general case than the single-channel models discussed above. Suppose the dynamics of a queueing system are determined by the transformation F from the set U of input flows U to the set V of output flows V . Let V0 represent the output at moment zero; V0 is assumed to be an -dimensional vector; i.e., V0 ∈ X (R ). It is quite general to assume that the input and the output flows have the form U = (V0 , U0 , U1 , . . .) and V = (V0 , V1 , . . .), where Uj ∈ X (Rk ). We endow U and V with the norms ||U ||U :=
∞
2−j ||Uj ||k,p + ||V0 ||,p
(7.6.54)
2−j ||Vj ||,p ,
(7.6.55)
j=0
and ||V ||V :=
∞ j=0
where p ≥ 1, ||Uj ||k,p
:=
(E||Uj ||pk )1/p ,
||Uj ||k
=
||(Uj , . . . , Uj || = |Uj | + · · · + |Uj |,
(1)
(k)
(1)
(k)
and ||Vj ||,p is defined in a similar way. Suppose the transformation F : U → V is determined by the set of mappings Fj : R × Rkj → R ,
j ∈ IN,
(7.6.56)
such that the output at “time” j is defined recursively: Vj = Fj (V0 , U0 , . . . , Uj−1 ).
(7.6.57)
A smoothness assumption on Fj is given by the Lipschitz condition ⎡ ⎤ j−1 (7.6.58) ||βj ||k ⎦ . ||Fj (α0 , β0 , . . . , βj−1 || ≤ cj ⎣||α0 || + j=0
A reasonably large number of queueing models meet conditions (7.6.56)– (7.6.58). Among them are the single-channel models G|G|1|∞, the multichannel models G|G|J|∞, and the multichannel–multiphased model (G|G|J1 ) → (G|J2 ) → · · · → (G|Jn ) (cf. Kalashnikov and Rachev (1990,
7.6 Moment Problems of Stochastic Processes and Rounding Problems
75
Chapter 5)). By (7.6.55), (7.6.57), and (7.6.58), ||Vj ||,p ≤ cj ||V0 ||,p + j−1 i=0 ||Ui ||k,p , and thus ||V ||V
≤ 2cj ||U ||U ⎡ ⎤ ∞ k (i) (i) ≤ 2cj ⎣ ||V0 ||,p + 2−1 ||Vj ||,p ⎦ . i=1
(7.6.59)
i=1 j=0
Combining (7.6.59) with Theorem 7.6.5 gives us a sharp bound on the deviation of two queueing models V = FU, V ∗ = FU ∗ , whose dynamics are determined by (7.6.54)–(7.6.58). Theorem 7.6.31 Suppose V = FU,
U ∈ U,
V ∈U
(7.6.60)
is a queueing model satisfying (7.6.54)–(7.6.58) such that (i)
a0
(i) cj
(i)
≤ V0
(i) Vj
≤
(i)
≤ b0 ≤
(i) dj
(i)
a.s.,
EV0
a.s.,
(i) EVj
(i)
= L0 , =
(i) βj ,
i = 1, . . . , , j = 0, 1, . . . , i = 1, . . . , k.
In addition to model (7.6.60) consider the same type model indexed by ∗ and satisfying the above two sets of inequalities with constants indexed by ∗. Then ||V − V ∗ ||V ⎡ ⎤ k ∞ 1/p
1/p (i) (i) (i) (i) (i) (i) δ + T ⎦, ≤ 2cj ⎣ + 2−j D D0 δ0 + T0 j j j i=1
i=1 j=0
where the D’s, δ’s, and T ’s are determined by the same formula as in (7.6.58)–(7.6.60), and (i)
D0
(i)
T0
(i)
δ0
(i) D j (i) Tj (i) δj
(i) (i) (i)∗ (i)∗ , = Di a0 , b0 , a0 , b0
(i) (i) (i)∗ (i)∗ (i) (i)∗ = Ti a0 , b0 , a0 , b0 , α0 , α0 ,
(i) (i) (i)∗ (i)∗ (i) (i)∗ = δi a0 , b0 , a0 , b0 , α0 , α0 ,
(i) (i) (i)∗ (i)∗ = Dj cj , dj , cj , dj ,
(i) (i) (i)∗ (i)∗ (i) (i)∗ = Tj cj , dj , cj , dj , βj , βj ,
(i) (i) (i)∗ (i)∗ (i) (i)∗ = δj cj , dj , cj , dj , βj , βj .
76
7. Relaxed or Additional Constraints
The rest of this section deals with Problem (b) (on page 72). Suppose again that the “real” queueing system is determined by the triplet (e, s, w), where w is given by the recursive equation (7.6.41). Often in practice one models the random input characteristics by replacing their random values with constants, usually equal to the corresponding means. In doing so, it is natural to investigate the deviation between the “real” output w and the modeled (“ideal”) output w∗ . (In the sequel, all quantities related to the approximating model will have the same notations as in the “real” system but superscribed with ∗.) The deviation between w and w∗ will be expressed by the Kantorovich metric p , defined here as follows: For X, Y ∈ X (IR∞ ),
p (X, Y ) := p (P X , P Y )
(7.6.61)
Y ); X, Y ∈ X (IR ), X = X, Y = Y }, p > 0, min{Lp (X, q Y ) := E dp (X, Y ) , q = min(1, 1/p) is the Lp -metric. In the where Lp (X, above definition, the space X (IR∞ ) consists of all random ∞sequences taking values in the metric space (IR∞ , d), where d(x, y) := j=1 2−j ||xj − yj ||. Since we have assumed that the underlying probability space is not atomic, the minimum in the right-hand side of (7.6.61) is equal to ⎧ ⎨ d(x, y)P ( dx, dy); P s are probabilities on IR∞ × IR∞ min ⎩ ⎫ IR∞ ×IR∞ ⎬ with fixed projections P X and P Y . ⎭ :=
∞
d
d
(n) (n) = X1 , X2 , . . . ∈ Xp (IR∞ ), X = (X1 , X2 , . . .) ∈ Xp (IR∞ ),
(n)
(n) (n) we have p X , X ≥ 2−j p Xj , Xj , and thus p (X , X) → 0 imFor X
(n)
d
(n)
plies the weak convergence of any j-component Xj = X and E|Xj |p → E|Xj |p . Further, we consider two types of approximating queues D|G|1|∞ (i.e., e∗j are constants) and D|D|1|∞ (i.e., e∗j and s∗j are constants). Similar results can be obtained if one examines the model G|D|1|∞ (i.e., s∗j are constants) as an approximation of the “real” queue G|G|1|∞. In both queues D|G|1|∞ and G|G|1|∞, the sequences of service times s∗ and s consist of dependent nonidentically distributed random variables. The next lemma shows that the outputs for the ideal and real models meet a lower bound of deviation if s∗ is chosen to have independent components. Let ε > 0 and X = (X1 , X2 , . . .) ∈ Xp (IR∞ ). The components of X are said to be ( p , ε)-independent if IND(X) := p (X, X) ≤ ε,
(7.6.62)
7.6 Moment Problems of Stochastic Processes and Rounding Problems
77
d
where the X i ’s (the components if X) are independent and X i = Xi (i ∈ IN). Lemma 7.6.32 Let the approximating model be of type D|G|1|∞. Assume that the sequences e and s of the queueing model G|G|1|∞ are independent. Then 1
p (w, w∗ ) 2
(7.6.63)
≤ IND(s) + IND(s∗ ) +
∞
2−j ( p (ej , e∗j ) + p (sj , s∗j )).
j=1
Proof: Using the recursive equations (7.6.41) for w and w∗ , we obtain 1 1 ∗ ∗ ∗ ∗ ∗ ∗ 2 d(w, w ) ≤ d(e, e ) + d(s, s ). Hence, 2 Lp (w, w ) ≤ Lp (e, e ) + Lp (s, s ). ∗ ∗ Since e and s (resp. e and s ) are independent, we have, passing to the minimal metrics, that 1
p (w, w∗ ) ≤ p (e, e∗ ) + p (s, s∗ ). 2
(7.6.64)
By (7.6.61) and since ej (j ∈ IN) are constants, we obtain the bound
p (e, e∗ ) =
∞
2−j Lp (ej , e∗j ) =
j=1
∞
2−j p (ej , e∗j ).
(7.6.65)
j=1
To estimate p (s, s∗ ) in (7.6.64) we use the ( p , ε)-independence characteristic defined in (7.6.62):
p (s, s∗ ) ≤ IND(s) + IND(s∗ ) + p (s, s∗ ),
(7.6.66) d
d
where s (resp. s∗ ) has independent components and sj = sj (resp. s∗j = s∗j ). We now invoke the “regularity” property of the Kantorovich metric: 1∞ 2 ∞ ∞
p X (n) , Y (n)
p X (n) , Y (n) ≤ (7.6.67) n=1
n=1
n=1
for sequences {X (n) }n≥1 ⊂ Xp (IR∞ ), {Y (n) }n≥1 ⊂ Xp (IR∞ ) of independent components. Let E j be a sequence with components all equal to zero except for the jth component, which equals 1. Then by (7.6.67), ⎛ ⎞ ∞ ∞
p (s, s∗ ) = p ⎝ (7.6.68) sE j , s∗ E j ⎠ j=1
≤
∞ j=1
j=1 j
p sE , s∗ E
j
=
∞ j=1
2−j p (sj , s∗j ).
78
7. Relaxed or Additional Constraints
Combining (7.6.64), (7.6.65), (7.6.66), and (7.6.68) proves the lemma.
2
The estimate (7.6.63) suggests that the approximating model should be chosen with s∗ having independent components. If this is the case, then IND(s∗ ) = 0, and the first problem is to estimate IND(s). Lemma 7.6.33 (a) Suppose that the only information known about the “real” service times are the moments ESjq1 = βj ,
Esqj 2 = γj ,
j ∈ IN,
(7.6.69)
and that the support of Fsj is [0, ∞). Then IND(s) ≤
∞
2−j ∆j ,
(7.6.70)
j=1
where
∆j :=
⎧ pq ⎪ ⎨ 2β 1/q1 , if 0 < p ≤ q1 , 1 ≤ q1 < q2 , j ⎪ ⎩ +∞,
1/q1
if 0 < q1 < q2 < p and βj
1/q2
= γj
(7.6.71) ,
and q = min(1, 1/p). (b) Suppose the support Fsj is the compact interval [cj , dj ], and βj = Esj . Then (7.6.70) holds with
∆j Tj
1/p j δj + Tj j = −2(dj − cj )p , D , p ≥ 1, where D d j − βj dj − β j = 2 1− , and δj = max 0, 1 − 2 . dj − cj dj − c j =
(7.6.72)
Proof: Assertion (a) follows from Corollary 2 of Kuznezova-Sholpo and Rachev (1989) and (b) from Theorem 7.6.5. 2
Lemma 7.6.34 Suppose that for every j ∈ IN the first two moments of ej are known: mj := Eej ,
(2)
mj
and let aj ≤ ej ≤ bj a.s.
:= Ee2j ,
σj2 := Var ej ,
(7.6.73)
7.6 Moment Problems of Stochastic Processes and Rounding Problems
79
(i) If p ≥ 2 and −∞ < aj < bj < ∞ and if e∗j is chosen to be the midpoint of [aj , bj ], then
p (ej , e∗j )
)" ≤
bj − aj 2
#p−2
(2) mj
− mj (aj + bj ) +
e∗2 j
* 1/p . (7.6.74)
(ii) Suppose 0 < p ≤ 2 and either −∞ = aj , +∞ = bj , or −∞ < aj < bj < ∞ and σj ≤ min[mj − aj , bj − mj ].
(7.6.75)
Then the “optimal” d∗j for the approximating model is given by e∗j = mj , and in this case
p (ej , e∗j ) ≤ σjpq ,
q = min(1, 1/p).
(7.6.76)
Proof: This follows from Theorems 7.6.10, 7.6.11, and 7.6.12 after some obvious arguments. The estimates (7.6.74) and (7.6.76) are sharp. 2 Lemma 7.6.35 (a) If 0 < p ≤ q1 , 1 ≤ q1 < q2 , then % & (2) ∗(2) 1 2 = Es∗q sup p (sj , s∗j ); nj = Esqj 1 , nj = Esqj 2 , n∗j = Es∗q j , nj j pq
1/q ∗1/q , q = min(1, 1/p). (7.6.77) = nj 1 + nj 1 (b) Suppose p ≥ 1, cj ≤ sj ≤ dj , c∗j ≤ s∗j ≤ d∗j a.s., and nj = Esj , n∗j = Es∗j . Then
p (sj , s∗j ) ≤
j δj + Tj D
1/p
,
(7.6.78)
j = Dj (cj , dj , c∗ , d∗ ), δj = δj (cj , dj , c∗ , d∗ , nj , n∗ ), where D j j j j j Tj = Tj (cj , dj , c∗j , d∗j , nj , n∗j ) are given by (7.6.48)–(7.6.51). Proof: Assertion (a) follows from Corollary 2 of Kuznezova-Sholpo and Rachev (1989) and (b) from Theorem 7.6.5. 2 Lemmas 7.6.32–7.6.35 lead us to the main result. Theorem 7.6.36 Let the approximating queueing model be of type D|G|1|∞. Assume that the sequences e and s of the “real” queueing model are independent. Then the Kantorovich metric between the sequences of waiting
80
7. Relaxed or Additional Constraints
times of the “approximating” and “real” models is bounded as follows:
p (w, w∗ )
(7.6.79)
≤ 2 IND(s) + 2 IND(s∗ ) +
∞
2−j+1 ( p (ej , e∗j ) + p (sj , s∗j )).
j=1
Each term in the right-hand side of (7.6.79) can be estimated as follows: (a) An appropriate choice for the approximating sequence of service times will be IND(s∗ ) = 0. (b) If (7.6.69) holds, a bound for IND(s) is given by (7.6.70). (c) If the means and variances of the ej ’s are known, then p (ej , e∗j ) can be estimated from above by (7.6.74). (d) The last term in (7.6.78), p (sj , s∗j ), can be estimated by (7.6.77) (resp. (7.6.78)), provided that the corresponding moment conditions hold. In the next theorem we shall omit the restriction that e and s are independent, but we shall assume that the approximating model is of completely deterministic type D|D|1|∞. Theorem 7.6.37 If the approximation queueing model is of type D|D|1|∞, then
p (w, w∗ ) ≤
∞
2−j+1 ( p (ej , e∗j ) + p (sj , s∗j )).
(7.6.80)
j=1
If the first moments of ej and sj are fixed, then p (ej , e∗j ) and p (sj , s∗j ) can be estimated as in Lemma 7.6.34. The proof is similar to that of Theorem 7.6.36.
7.6.5
Rounding Random Numbers with Fixed Moments
In this part we shall discuss the interplay between Moment problems 3 and 4 (on pages 53, 54) and the problem of rounding of random proportions. Given a vector X = (X1 , . . . , Xn ) of r.v.s consider the sum X1 +· · ·+Xn . If the Xi ’s are uniformly distributed on the simplex {(si ) ≥ 0; s1 +· · ·+sn = 1}, then they can be treated as proportions, and clearly Sn := X1 + · · · + Xn = 1. If Sn∗ is the sum of conventional roundings [X1 ]1/2 + · · · + [Xn ]1/2 , then Mosteller, Youtz, and Zahn (1967), Diaconis and Freedman (1979), and Balinski and Rachev (1993) have estimated the probability that Sn =
7.6 Moment Problems of Stochastic Processes and Rounding Problems
81
Sn∗ . Here we shall examine the closeness between Sn and Sn∗ in the case of i.i.d. observations Xi where only one or two moments are known. Suppose {Xi }i∈IN are nonnegative i.i.d. r.v.s with known moments EX1 = d1 ,
EX1r = dr .
(7.6.81)
The c-rounding [·]c (c ∈ [0, 1]) (see Section 7.6.2) gives us the sequence of i.i.d. roundings {[Xi ]c }i∈IN . Let Vi := [Xi ]c − Xi be the rounding error, and n Sn,c = i=1 Vi is the total rounding error. Then the normalized rounding error n−1 Sn,c converges by the LLN to E[X1 ]c − d1 . Our objective here is to find sharp bounds for the distribution function of n−1 Sn,c subject to (7.6.81). In other words, for a suitably chosen metric µ in the distribution functions space, the problem is to determine the “radius” of the set of probabilistic laws, i.e., Dn = Dn (µ) := sup µ n−1 Sn,c , E[X]c − d1 ; (7.6.82) r EX = d1 , EX = dr . In (7.6.82), X has the same distribution as the Xi ’s, and thus E[X]c − d1 = d
EV , where V = V1 . Clearly, there is a great variety of metrics µ from which one can choose in (7.6.82). We shall consider two metrics, one especially designed for the problem, one the ideal metric θs (s > 1) and the other the L´evy metric L. Note that Theorems 7.6.18–7.6.21 provide us with sharp bounds for E[X]c − d1 , in the case of the conventional rounding c = 12 .(6) In fact, with [X] = [X]1/2 , sup{E[X] − d1 ; EX = d1 , EX r = dr } inf{E[X] − d1 ; EX = d1 , EX r = dr }
= U − d1 ,
(7.6.83)
= L − d1 ,
(7.6.84)
where the exact values of U and L are given in Theorems 7.6.18–7.6.21. Next we can chose µ in the definition of Dn = Dn (µ) to be the L´evy metric L(X, Y ) =
inf{ε > 0; FX (x − ε) − ε ≤ FY (x) ≤ FX (x + ε) + ε for all x ∈ IR},
and thus for the distribution function Fn of n−1 Sn,c we obtain the following bounds: 0 ≤ Fn (x) ≤ Dn 0 ≤ Fn (x) ≤ 1 1 − Dn ≤ Fn (x) ≤ 1 (6) The
for 0 ≤ x ≤ L − Dn ,
(7.6.85)
for L − Dn ≤ x ≤ U − Dn , for U + Dn ≤ x.
(7.6.86) (7.6.87)
general case of c ≤ 1 was treated in Anastassiou and Rachev (1992).
82
7. Relaxed or Additional Constraints
From Theorems 7.6.18–7.6.21 it follows that the above bounds are sharp. Our next step is to find a good estimate for Dn = Dn (L). To this end we first estimate Dn (θs ) for θs (X, Y ) = sup |Ef (X) − Ef (Y )|.
(7.6.88)
Here, the supremum is taken over all bounded functions f on IR with q1 integrable second derivative |f |q ≤ 1, 1 < s < 2, q = 2−s . The next 1−s lemma shows that the θs -radius Dn (θs ) = O(n ) for all 1 < s < 2. We use the notation ∨ := max. Lemma 7.6.38 For 1 < s < 2, cn1−s , Dn (θs ) ≤
c :=
2 (c ∨ (1 − c))s . s
Proof: For any X and Y with equal means, θs (X, Y )
1 κs (X, Y ) s d d = X| s−1 − Y |Y |s−1 ; X := inf{E|X| X, Y = Y }. ≤
Therefore, from the ideality of θs (7) Dn (θs )
1 ≤ n1−s θs (V, EV ) ≤ n1−s κs (V, EV ) s 1 = n1−s E V |V |s−1 − (EV )|EV |s−1 s 1 ≤ n1−s . s
The latter follows since |V | = |X − [x]c | ∈ (0, c ∨ (1 − c)).
2
s−1 In the next theorem we bound Dn = Dn (L) in (7.6.82) as O n− 1+s . Theorem 7.6.39 For any 1 < s < 2, 0 < c < 1, 1
1−s
Dn (L) ≤ (4 c) 1+s n 1+s , where the constant c is defined as in Lemma 7.6.38.
(7) θ
s is an ideal s i |ci | θs (Xi , Yi ), for
metric of order s > 0; that is, θs ( i ci Xi , i ci Yi ) all independent Xi , Yi and constants ci ∈ IR.
≤
7.6 Moment Problems of Stochastic Processes and Rounding Problems
83
Proof: The following claim was proved by Grigorevski and Shiganov (1976) for the case s = 2; i.e., in (7.6.88) the functions f have a.e. f and |f | ≤ 1 a.e.; see also Maejima and Rachev (1987) and Rachev and R¨ uschendorf (1992). Claim: For any 1 < s < 2, θ(X, Y ) ≥
1 1+s L (X, Y ). 4
Proof of the Claim: Let L(X, Y ) > ε. Then there exists x0 such that either FX (x0 ) > FY (x0 +ε)+ε or FY (x0 ) > FY (x0 +ε)+ε. Say the first inequality takes place. Define ⎧ 1 for x ≤ x0 ; ⎪ ⎪ 2 ⎪ ⎪ 2(x − x0 ) ε ⎪ ⎪ for x0 < x ≤ x0 + ; ⎨ 1− ε 2 2 f0 (x) := ⎪ 2(x0 + ε + x) ε ⎪ ⎪ −1 + for x0 + ≤ x < x0 + ε; ⎪ ⎪ ε 2 ⎪ ⎩ 1 for x ≥ x0 + ε. Observe that |f0 (x)| ≤ 1, f (x) exists a.e., and ⎡ x +ε ⎤1/q 0 |f0 (x)|q dx⎦ = 8ε−s =: c(ε) ||f0 ||q = ⎣ x0
1 . Recalling the definition of θs , we have 2−s " # " # f0 (Y ) f0 (X) θs (X, Y ) ≥ E −E c(ε) c(ε) 1 (f0 (x) + 1) d [FX (x) − FY (x)] = c(ε) x 0 ∞ 1 = (f0 (x) + 1) dFX (x) + (f0 (x) + 1) dFX (x) c(ε)
for q =
−∞
x0
∞ (f0 (x) + 1) dFY (x) − (f0 (x) + 1) dFY (x) − −∞ x0 +ε ⎡ x ⎤ 0 ∞ 1 ⎣ (f0 (x) + 1) dFX (x) − (f0 (x) + 1) dFY (x)⎦ c(ε) x0 +ε
≥
−∞
≥
2 [FX (x0 ) − FY (x0 + ε)] c(ε)
x0 +ε
84
7. Relaxed or Additional Constraints
≥ =
2ε c(ε) 1 1+s ε . 4
Letting ε → L(X, Y ) completes the proof of the claim. Now the desired estimate follows from Lemma 7.6.38 and the claim.
2
8 Application of Kantorovich-Type Metrics to Various Probabilistic-Type Limit Theorems
We have discussed already in detail the Kantorovich metric as the solution of mass transportation and mass transshipment problems with a metric cost function; cf. Section 2.5 and Chapter 4. In Chapter 7 we studied generalized transshipment problems, leading to extensions of the Kantorovich metric to encompass a variety of ideal probability metrics. This chapter is devoted to applications of these metrics to the rate of convergence problem in the central limit theorem (CLT) and different summability methods for random vectors. We also discuss applications to the asymptotics of various rounding rules.
8.1 Rate of Convergence in the CLT with Respect to the Kantorovich Metric In this section, we investigate bounds for the rate of convergence in the CLT with respect to the Kantorovich metric for random variables with values in separable Banach spaces. In the first part, the rate in stable limit theorems for sums of i.i.d. random variables is considered. The method of proof is an extension of the Bergstr¨ om convolution method. All assumptions regarding the domain of attraction are given in a metric form. In the second part an extension is given to the martingale case. The proof is based on smoothing properties of suitable conditonal versions of the Kantorovich metric. Smoothing inequalities for the Kantorovich metric will be established, and
86
8. Probabilistic-Type Limit Theorems
the Bergstr¨om convolution method (cf. Zolotarev (1977, 1979, 1983, 1986), Senatov (1980), Sazonov (1981), Rachev and Yukich (1989, 1991), Rachev (1991c)) will be extended to the case of stable limit theorems and at the same time to the Kantorovich metric. All assumptions concerning the domain of attraction and the order of convergence are described in terms of finiteness conditions for certain convolution-type metrics. As a consequence of the results for the Kantorovich metric, one obtains rate of convergence results in stable limit theorems for martingales with respect to the Prohorov metric.(1) We start with the rate of convergence in the i.i.d. case. Consider a separable Banach space (U, · ) and the space X (U ) of U -valued r.v.s defined on a rich enough probability space. The r.v. ϑ ∈ X (U ) is said to be α-stable (0 < α ≤ 2) if −1/α
n
n
d
ϑi = ϑ,
(8.1.1)
i=1
where the ϑi ’s are i.i.d. copies of ϑ. We are interested in the rate of convergence of the normalized sum Zn = n−1/α
n
Xi
(8.1.2)
i=1
of i.i.d. r.v.s to ϑ with respect to the Kantorovich metric:
1 (X, Y ) :=
sup {|E(f (X) − f (Y ))|; f : U → IR bounded, (8.1.3) |f (x) − f (y)| ≤ x − y} .
Recall from Chapters 2 and 4 that 1 -convergence is equivalent to convergence in distribution and convergence of the moments E · (existence assumed); moreover, for U = IR, 1 (X, Y ) = |FX (x) − FY (x)| dx. The Prohorov metric π(X, Y ) :=
inf{ε > 0; P (X ∈ A) ≤ P (Y ∈ Aε ) + ε,
(8.1.4)
for all Borel sets A in U }(A := {x; |x − A|} < ε) ε
and the Kantorovich metric satisfy the well-known inequality π 2 ≤ 1 ,
(8.1.5)
(1) Some results in the literature are formulated more generally but use bounds involving moments of order ≥ 2 and, therefore, are restricted to the Gaussian case. For some recent literature we refer to Bolthausen (1982), H¨ aussler (1988), Bentkus et al. (1990), and Rackauskas (1990). Our method involves various extensions of an idea in Gudynas (1985) on suitably conditioned versions of probability metrics. The results in this section are based on Rachev and R¨ uschendorf (1994a).
8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
87
which is in fact an immediate consequence of the Strassen and Kantorovich theorems. In particular, 1 -convergence rates imply convergence rates for π. Theorem 8.1.1 For any 0 < α < 1,
1 (Zn , ϑ) ≤ n1−1/α 1 (X1 , ϑ).
(8.1.6)
Proof: The result follows from (8.1.1)–(8.1.3) and the contraction properties of 1 ; in fact, if ϑi are i.i.d. copies of ϑ, then n −1/α E Zn − n ϑi ≤ n1−1/α E|X1 − ϑ1 |. (8.1.7) i=1
Next, we take in both sides of (8.1.7) the infimum over all joint distributions P X1 ,ϑ1 with fixed marginals P X1 and P ϑ . The result is (8.1.6) as desired. 2 Note that (8.1.6) is a general property for every ideal metric of order 1; see for example Zolotarev (1979). Recall that a probability metric µ is said to be ideal of order r if µ(X + Z, Y + Z) ≤ µ(X, Y )
(8.1.8)
for all r.v.s X, Y, Z such that Z is independent of (X, Y ) and µ(cX, cY ) = |c|r µ(X, Y )
for all c ∈ IR,
(8.1.9)
see Sections 6.3 and 6.4. Consider next the rate of convergence in
1 (Zn , ϑ) → 0
(8.1.10)
for 1 < α ≤ 2. Define the following ideal (smoothing) Kantorovich metric of order r > 1:
r (X, Y ) = sup hr−1 1 (X + hϑ, Y + hϑ),
r > 1,
(8.1.11)
h>0
and σ r (X, Y ) = sup hr σ(X + hϑ, Y + hϑ),
r > 0.
(8.1.12)
h>0
Here ϑ in (8.1.11) and (8.1.12) is assumed to be independent of X and Y , and σ is the total variation metric: σ(X, Y ) = sup{|E(f (X) − f (Y ))|; f : U → [0, 1] continuous} = 2 sup |P (X ∈ A) − P (Y ∈ A)|. A∈B(U )
(8.1.13)
88
8. Probabilistic-Type Limit Theorems
Note that r and σ r are ideal metrics of order r. Throughout this section
r stands for the smoothed 1 -metric of order r. The notion p has been used in previous sections for the minimal Lp -metric. So, we have increased the level of “ideality” for 1 and σ (recall that 1 is an ideal metric of order 1, while σ is ideal of order 0) by appropriate smoothing; see (8.1.11) and (8.1.12). The next theorem provides an estimate of the convergence rate in (8.1.10). In what follows C stands for an absolute constant that can be different in different places. Set 1 = 1 (X1 , ϑ), r = r (X1 , ϑ), σ = σ(X1 , ϑ), σ r = σ r (X1 , ϑ). We always assume r > 0. The results in this section are due to Rachev and R¨ uschendorf (1994). Theorem 8.1.2 Suppose that (a)
Eϑ < ∞;
(b)
1 + r + σ1 + σ r < ∞.
Then
1 (Zn , ϑ) ≤ C n1−r/α r + τr n−1/α ,
(8.1.14)
τr = max 1 , σ1 , σ 1/(r−α) . r
(8.1.15)
where
Remark 8.1.3 Zolotarev (1986, §5.4) provides a similar bound for
1 (Zn , ϑ) in the normal univariate case. Zolotarev’s bound contains ζr metrics in the right-hand side of (8.1.14), which can be easily estimated from above in the normal case. In the stable case, however, we need more refined bounds. The problem of finiteness of σ r was discussed in Rachev and Yukich (1989) (see also Section 8.3); for the finiteness of r see the next corollary. Further in this section the sum of any random variables X + Y d d + Y , where X and Y are independent and X = means X X, Y = Y . ϑ, and ϑi are defined as in (8.1.1) and satisfy (a). Proof: The proof is similar to that of Theorem 8.1.16, further in this section, which we shall give in detail. Here we give only a short sketch of the proof. It uses the following two properties of the metrics 1 , r , σ r ; see Zolotarev (1986, §5.4). Smoothing Property 1. For any X, Y ∈ X (U ),
1 (X, Y ) ≤ 1 (X + εϑ, Y + εϑ) + 2εEϑ.
(8.1.16)
8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
89
Smoothing Property 2. For any X, Y, Z, W independent,
1 (X + Z, Y + Z) ≤ 1 (Z, W )σ(X, Y ) + 1 (X + W, Y + W ). (8.1.17) Next, let m = [ n2 ]; then by (8.1.16),
1 (Zn , ϑ1 ) ≤ 1 (Zn + εϑ, ϑ1 + εϑ) + Cε ϑ1 + X 1 + · · · + X n + εϑ ≤ 1 Zn + εϑ, n1/α m ϑ1 + · · · + ϑj + Xj+1 + · · · + Xn ·
1 + εϑ, n1/α j=1
ϑ1 + · · · + ϑj+1 + Xj+2 + · · · + Xn + εϑ n1/α ϑ1 + · · · + εm+1 + Xm+2 + · · · + Xn + 1 + εϑ, ϑ + εϑ 1 n1/α m = I0 + Ij + Im+1 . j=1
By (8.1.17),
X2 + · · · + Xn ϑ2 + · · · + ϑn −1/α −1/α , X + εϑ, n ϑ + εϑ σ n I0 ≤ 1 1 1 n1/α n1/α X1 + ϑ 2 + · · · + ϑ n ϑ 1 + · · · + ϑn + 1 + εϑ, + εϑ . n1/α n1/α Similar upper bounds are obtained for Ij , 1 ≤ j ≤ m + 1. Some of the terms obtained in this way can be estimated using the ideality properties of the For example, a term of the following form, ∆ = (m + metrics. +···+ϑn , ϑ , can be estimated by 1) 1 X1 +ϑn21/α 1 ∆ =
(m + 1) 1
≤ (m + 1) ≤ (m + 1)
n
−1/α
n n−1 n n−1
X1 +
r−1 α r−1 α
n−1 n
1/α
−1/α
ϑ, n
ϑ1 +
r (n−1/α X1 , n−1/α ϑ) n−r/α r ≤ Cn1−r/α r ,
where in the first inequality we use the obvious relation
r (X, Y ) ≥ hr−1 1 (X + hϑ, Y + hϑ).
n−1 n
1/α 2 ϑ
90
8. Probabilistic-Type Limit Theorems
X1 +···+Xj ϑ1 +···+ϑj we , j 1/α j 1/α 1−r/α −1/α Bj ≤ C( r j + τr j ).
For terms of the form Bj := 1
use an induction
argument to get the bound see the proof of Theorem 8.1.16.
For more details 2
Corollary 8.1.4 Suppose that U = IRK and that ϑ has a Fr´echet differentiable density pϑ and let C(ϑ) = sup |pϑ (y)(z)| dz < ∞. (8.1.18) z≤1
Suppose that Eϑ < ∞ and 1 + r < ∞. Then
1 (Zn , ϑ) ≤ C(n1−r/α r + τr∗ n−1/α ), where τ1∗ = max( 1 , r
1/(r−α)
(8.1.19)
).
For an integer r, r can be estimated from above by the ζr -metric (see (r) Zolotarev (1983, p. 294)): r ≤ Cζr if supz≤1 |pϑ (y)(z)| dz is finite. We shall discuss the finiteness of r in Section 8.5 in more detail. Proof: Claim 1. For any X and Y ∈ X (IRk ) and δ > 0, σ(X + δϑ, Y + δϑ) ≤ C(r)δ −r r (X, Y ),
(8.1.20)
with C(r) = 2(2−3)/α C(ϑ). To prove the claim we first use the obvious bound σ(X + δϑ, Y + δϑ) ≤ δ −r σ r (X, Y ).
(8.1.21)
Next, we show that for any δ > 0, σ(X + δϑ, Y + δϑ) ≤ δ −1 C(ϑ) 1 (X, Y ).
(8.1.22)
Indeed, by the ideality of σ and 1 it is enough to show (8.1.22) for δ = 1. Then σ(X + ϑ, Y + ϑ) ≤ sup f (x)(PX ( dx) − PY ( dx)) , |f |≤1
f (x + y)pϑ (y) dy. Since |f | ≤ 1, f (x) = sup |f (x)(z)| ≤ sup |pϑ (y)(z)| dy =: C(ϑ),
where f (x) =
z≤1
z≤1
8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
91
and thus |f (x) − f (y)| ≤ C(ϑ)x − y, which obviously implies (8.1.22). To show (8.1.19) we use (8.1.21), (8.1.22), and the following bound: σ r (X, Y )
=
sup hr σ(X + hϑ, Y + hϑ) h>0
≤ sup hr 1 (X + 2−1/α hϑ, Y + 2−1/α hϑ) h>0
21/α C(ϑ) h
= C(ϑ)2r/α r (X, Y ). This completes the proof of the claim as well as that of (8.1.19).
2
Remark 8.1.5 (Rate of convergence in the CLT for random elements with LePage representation) Consider a symmetric α-stable U -valued random variable ϑ with LePage representation d
ϑ =
∞
−1/α
Γj
ηj Yj ,
(8.1.23)
j=1
where (i) Yj are i.i.d. with EY1 r < ∞; (ii) ηj are i.i.d. symmetric real-valued random variables with η1 α = (E|η1 |α )1/α < ∞; (iii) (Γj ) is a sequence of successive times of jump of a standard Poisson process; (iv) we assume that the three sequences are independent; see Ledoux and Talagrand (1991, Sect. 5.1) and Samorodnitsky and Taqqu (1994). Suppose X has a similar representation d
X =
∞
−1/α ∗ ∗ ηj Yj ,
Γj
(8.1.24)
j=1
where (Yj∗ ) and (ηj∗ ) are chosen as in (i) and (ii) with the only difference that they are not identically distributed. Write Zn , the normalized sum of i.i.d. copies Xi as in (8.1.2). Then Theorem 8.1.2 yields the following rate of convergence of Zn to ϑ in the 1 -metric. Corollary 8.1.6 Let 1 ∨ α < r < 2, and E||Y1 ||r + supj≥1 E||Yj∗ ||r + E||η1 ||r + supj≥1 E||ηj∗ ||r < ∞. Then
1 (Zn , ϑ) ≤ C(n1−r/α ∗r + τr∗ n−1/α ),
(8.1.25)
92
8. Probabilistic-Type Limit Theorems
% where ∗r := supj≥1 ( r (Yj∗ , Y1 ) + r (ηj∗ Y1 , η1 Y1 )) and τr∗ = max ∗1 , σ1∗ , & ∗1/(r−α) σr with σr∗ := supj≥1 (σ r (Yj∗ , Y1 ) + σ r (ηj∗ Y1 , η1 Y1 )). Proof: In view of (8.1.14), (8.1.15) we need only show the finiteness of σ r and r . For σ r = σ r (X, ϑ) we use the ideality of order r and the asymptotics −r/α EΓj ∼ j −r/α (j → ∞) to obtain σ r (X, ϑ)
=
−r/α
EΓj
r (ηj Yj , ηj∗ Yj∗ )
j≥1
⎛ ≤ ⎝
j≥1
⎞ −r/α ⎠ sup{E|ηj∗ |r EΓj j≥1
σ r (Yj∗ , Yj ) + σ r (ηj∗ Yj , ηj Yj )}
≤ C sup(σ r (Yj∗ , Y1 ) + σ r (ηj∗ Y1 , η1 Y1 )). j≥1
2
The same type estimate is valid for r .
Since in the LePage representations Yj , Yj∗ can have any high enough moment, examples with finite ∗r and τr∗ can be readily constructed. Take, d
for example, U to be a Hilbert space with basis (hm )m≥1 , and set Yj∗ = ∗ d ∗ ζj,m hm , Y1 = ζm hm , where (ζm )m≥1 , (ζj,m )m≥1 are sequences of m≥1
m≥1
independent random variables. Then, by the ideality of σ r , ∗ ∗ σ r (Yj∗ , Y1 ) ≤ σ r (ζj,m , ζm ) ≤ C κr (ζj,m , ζm ), m≥1
(8.1.26)
m≥1
where κr is the rth pseudomoment, κr (ζ ∗ , ζ) = r |x|r−1 |Fζ ∗ (x) − Fζ (x)|dx, see Zolotarev (1983). Similarly, ∗
r (Yj∗ , Y1 ) ≤ C κr (ζj,m , ζm ).
(8.1.27)
m≥1
The same example is valid if we relax the independence assumption to “independence in finite blocks,” requiring only that (ζ1+ , . . . , ζL+ ), = 0, L, 2L, . . ., are independent. Remark 8.1.7 (Finite-dimensional approximation) An alternative use of the explicit upper bounds of the smoothing metrics in the finite-dimensional case is to combine Theorem 8.1.2 with an approximation step by the finitedimensional case. To be concrete, let X, Y be C(S) valued processes, (S, d)
8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
93
a totally bounded metric space. For ε > 0 let Vε denote a finite covering ε-net and let Pε X = (Xt )t∈Vε be the corresponding finite-dimensional approximation of X = (Xt )t∈S . If E sup |Xs − Xt | ≤ a(ε)
(8.1.28)
d(s,t)≤ε
and |Ys − Yt | ≤ b(ε),
E sup d(s,t)≤ε
then
1 (X, Y ) ≤ 1 (Pε X, Pε Y ) + a(ε) + b(ε).
(8.1.29)
So we can combine fluctuation inequalities (8.1.28) with the finite-dimensional bounds derived in (8.1.14) for the normalized sum Zn in order to choose an optimal rate of approximation ε = ε(n) → 0. A general and simple useful tool to derive fluctuation inequalities as in (8.1.28) is Pollard’s lemma, which applied to (8.1.28) yields $ 3 Nε max E sup |Xs − Xt | , (8.1.30) E sup |Xs − Xt | ≤ 1≤i≤Nε
d(s,t)≤ε
d(s,ti )<ε d(t,ti )<ε
where Nε = card (Vε ) and Vε = {ti , 1 ≤ i ≤ Nε }. The case α = 1 requires special consideration. We shall state a variant of Theorem 8.1.2 that will cover the case α = 1 but requires additional smoothing conditions on the law of the Xi ’s. The next theorem is based on the following lemma (see Rachev and Yukich (1989) or Rachev (1991c, Ch. 14)). Lemma 8.1.8 Let 0 < α ≤ 2, r > α, ar = −r/α
Ar (a) = 2
1 , 21+r/α (2r/α +3r/α )
and Ar =
ar . Suppose
δ0 := δ0 (X1 , ϑ) := max(σ, ϑr ) ≤ ar .
(8.1.31)
Then for any n ≥ 1 σ(Zn , ϑ) ≤ Ar δ0 n1−r/α ≤ 2−r/α n1−r/α .
(8.1.32)
Theorem 8.1.9 Suppose condition (8.1.23) holds and τ r := max( 1 , r ) < ∞. Then for
1 2
(8.1.33)
< α ≤ 2 and r > α,
1 (Zn , ϑ) ≤ Br,α τ r n1−r/α , where Br,α ≥ 8(r−1)/α + 2(2r/α + 3r/α ).
(8.1.34)
94
8. Probabilistic-Type Limit Theorems
The proof uses the following analogue of (8.1.17): For any independent X, Y, Z, W ,
1 (X + Z, Y + Z) ≤ 1 (X, Y )σ(Z, W ) + 1 (X + W, Y + W ). (8.1.35) The proof is similar to that of the smoothing inequality in Zolotarev (1986, §5.4) (see also Rachev (1991c, Theorem 15.2.2)) and thus is omitted. The theorem is of interest for 1 ≤ α ≤ 2, as for 0 < α < r < 1 we get from (8.1.6), 1 (Zn , ϑ) ≤ n1−1/α τ¯r . Our next objective is the extension of Theorem 8.1.2 to the martingale case. Let (Ω, A, P ) be a rich enough probability space, (Fi ) an increasing sequence of sub σ-algebras of A, and let (Xi , Fi ) be an adapted martingale difference sequence with values in a separable Banach space (U, · ); that is, E(Xi |Fi−1 ) = 0 a.s., i ∈ IN. For a given probability metric µ and a sub σ-algebra G ⊂ A define the G-dependence metric µ(·G) by µ(X, Y G) = sup µ(X + V, Y + V ), V ∈G
(8.1.36)
where V ∈ G denotes that V is a G-measurable random variable. This notion generalizes an idea due to Gudynas (1985). Lemma 8.1.10 If µ is homogeneous of order r, that is, µ(cX, cY ) ≤ |c|r µ(X, Y ),
(8.1.37)
then µ(·G) also is homogeneous of order r. We shall use the following metrics: r (·G), σ r (·G), where r , σ r are respectively the smoothed Kantorovich metric and the total variation metric (cf. (8.1.11), (8.1.12)). Lemma 8.1.11 Let the regular conditional distributions PX|G , PY |G exist. Then
r (X, Y G) ≤ E r (PX|G , PY |G )
(8.1.38)
σ r (X, Y G) ≤ Eσ r (PX|G , PY |G ).
(8.1.39)
and
Proof: Let ϑ be independent of X, Y , and G. Then
r (X, Y G) = =
sup r (X + V, Y + V ) sup sup sup hr−1 E(f (X + V + hϑ)
V ∈G
f L ≤1 h>0 V ∈G
8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
95
−f (Y + V + hϑ))
≤ E sup sup sup hr−1 E(f (X + V + hϑ)|G)
−E(f (Y + V + hϑ)|G)
f L ≤1 h>0 V ∈G
= E sup sup sup hr−1 E(fV (X + hϑ)|G)
−E(fV (Y + hϑ)|G),
f L ≤1 h>0 V ∈G
where fV (·) = f (· + V ) is the translation by V , which is again a Lipschitz (y)| is the Lipschitz norm. We function, and where f L = supx =y |f (x)−f x−y arrive at
r (X, Y G) ≤ E sup sup hr−1 E(f (X + hϑ)|G) − E(f (Y + hϑ)|G) f L ≤1 h>0
= E r (PX|G PY |G ). The proof for the metric σ r is similar.
2
As a consequence we obtain the following regularity property of r and σr . Lemma 8.1.12 Let (Xi , Fi ) be a stochastic sequence and (Gi ) a decreasing sequence of sub σ-algebras such that Yj are Gi -measurable for j ≥ i. Suppose that the following condition holds: (c)
Xi and Gi+1 are conditionally independent given Fi−1 , and Yi and Gi+1 are conditionally independent given Fi−1 .
Then, for ci ∈ IR,
r
1 n
ci Xi ,
n
2 ≤
ci Yi
i=1
i=1
1 n
n
n
|ci |r E r (PXi |Fi−1 , PYi |Fi−1 )
(8.1.40)
i=1
and σr
i=1
c i Xi ,
i=1
2 ci Yi
≤
n
|ci |r Eσ r (PXi |Fi−1 , PYi |Fi−1 ),
i=1
assuming that the conditional distributions exist.
(8.1.41)
96
8. Probabilistic-Type Limit Theorems
Proof: By Lemma 8.1.10, 1 n 2 n
r ci Xi , ci Yi i=1
≤
n
i=1
r (c1 X1 + · · · + ci Xi + ci+1 Yi+1 + · · · + cn Yn ,
i=1
≤ = ≤
n
c1 X1 + · · · + ci−1 Xi−1 + ci Yi + · · · + cn Yn ) sup hr−1
i=1 h>0 n
sup
V ∈Fi−1 ∨Gi+1
1 (ci Xi + V + hϑ, ci Yi + V + hϑ)
r (ci Xi , ci Yi Fi−1 ∨ Gi+1 )
i=1 n
|ci |r r (X1 , Y1 Fi−1 ∨ Gi+1 ),
i=1
where Fi−1 ∨ Gi+1 is the σ-algebra generated by Fi−1 and Gi+1 . From Lemma 8.1.11 and the conditional independence assumption,
r (Xi , Yi Fi−1 ∨ Gi+1 ) ≤ E r (PXi |Fi−1 ∨Gi+1 , PYi |Fi−1 ∨Gi+1 ) = E r (PXi |Fi−1 , PYi |Fi−1 ). As for the metric σ r , the proof is similar.
2
Remark 8.1.13 If Yi are independent of Fi−1 , EYi = 0, then
r (PXi |Fi−1 , PYi ) ≤ Cr ζr (PXi |Fi−1 , PYi ) ≤ Cr κr (PXi |Fi−1 , PYi ),
(8.1.42)
where ζr is the Zolotarev metric and κr is the pseudo-difference moment (cf. (8.1.26) and Rachev (1991, p. 377)). In the α-stable case 1 < α < 2 and r = 2, the finiteness of κr implies that E(Xi |Fi−1 ) = EYi = 0,
(8.1.43)
which is fulfilled in the martingale case. In the normal case α = 2 and r = 3, the finiteness of ζr implies in the Euclidean case that the conditional covariance Cov (Xi |Fi−1 ) = Cov (Yi )
(8.1.44)
is almost surely constant. This and related conditions have been assumed in several papers on the martingale convergence theorem (cf. Basu (1976), Dvoretzky (1970), Bolthausen (1982), Butzer et al. (1983), H¨ aussler (1988), Rackauskas (1990)).
8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
97
Lemma 8.1.14
r (X, Y ) ≥ ( 1 (X, Y ))r
r−1 r
r−1
1 2Eϑ. r
(8.1.45)
Proof: By the triangle inequality and from the definition of r ,
r (X, Y ) ≥ 1 (X + εϑ, Y + εϑ)εr−1 ≥ ( 1 (X, Y ) − 2εEϑ)εr−1 := ϕ(ε). 2
Maximizing ϕ(ε) with respect to ε, we obtain (8.1.45). In the next step we extend the smoothing inequality (8.1.17).
Lemma 8.1.15 Suppose that X, Z, Y, W are random variables with values in U such that (X, Z) is independent of (Y, W ) and Y, W are independent. Then
1 (X + Z, Y + Z) ≤ 1 (Z, W )σ(X, Y ) + 1 (X + W, Y + W ) + X), + 1 (Z + X, Z
(8.1.46)
d = is independent of X, and Z and Z where Z
σ(X + Z, Y + Z) ≤ σ(Z, W )σ(X, Y ) + σ(X + W, Y + W ) + X). + σ(Z + X, Z
(8.1.47)
Proof: By the triangle inequality,
1 (X + Z, Y + Z) =
sup |E [(f (X + Z) − f (X + W ))
f L ≤1
− (f (Y + Z) − f (Y + W ))]|
+ sup |E(f (X + W ) − f (Y + W ))|. f L ≤1
Furthermore, |E [f (X + Z) − f (X + W ) − (f (Y + Z) − f (Y + W ))]| = (E(f (Z + x)|X = x) − Ef (W + x)) dPX (x) − (Ef (Z + x) − Ef (W + x)) dPY (x) ≤ (E(f (Z + x)|X = x) − Ef (Z + x)) dPX (x) + Ef (Z + x)( dPX (x) − dPY (x))
98
8. Probabilistic-Type Limit Theorems
−
Ef (W + x)( dPX (x) − dPY (x))
+ X) + 1 (Z, W )σ(X, Y ). ≤ 1 (Z + X, Z 2
The proof of (8.1.47) is similar.
The last term in (8.1.46) is a measure of dependence of Z, X, which disappears if Z, X are independent. Making use of the smoothing properties, we next extend Theorem 8.1.2 to the martingale case. Let (Xi , Fi ) be a martingale difference sequence, n Zn = n1/α j=1 Xj , and as in (8.1.4) let ϑ, ϑi be independent, α-stable distributed r.v.s. For r > α we define
r = sup r (Xj , ϑj ), τr = sup E r (PXj |Fj−1 , Pϑj ), r = r ∨ τr , (8.1.48) j
j
τr = sup E r (PXj |Fj−1 , PXj ), τ4r = sup E r (PXj |G4j+1 , PXj ), j
j
where G4j+1 = σ(Xj+1 , Xj+2 , . . .), and σ r = supj σ r (Xj , ϑj ), the conditional distributions, are assumed to exist. Theorem 8.1.16 Suppose that Eϑ < ∞. Then
1 (Zn , ϑ) ≤ C(n1−r/α r + n−1/α tr ), 1 1 where tr = max 1 , σ1 , σ rr−α , τ4rr−α , τ1 .
(8.1.49)
Proof: Applying (8.1.16) we shall estimate 1 (Zn + εϑ, ϑ1 + εϑ). Set m = [ n2 ]. Then 2 1 n
1 Zn + εϑ, n−1/α ϑi + εϑ ≤
i=1
1 Zn + εϑ, n−1/α (ϑ1 + X2 + · · · + Xn ) + εϑ +
m
1 n−1/α (ϑ1 + · · · + ϑj + Xj+1 + · · · + Xn ) + εϑ,
j=1
⎛
n−1/α (ϑ1 + · · · + ϑj+1 + Xj+2 + · · · + Xn ) + εϑ
+ 1 ⎝n−1/α (ϑ1 + · · · + ϑm+1 + Xm+2 + · · · + Xn ) + εϑ,
8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
n−1/α
n
99
⎞ ϑj + εϑ⎠
j=1
=: I0 +
m
Ij + Im+1 .
j=1
From the generalized smoothing inequality (8.1.46), X1 ϑ1 X2 + · · · + X n X2 + · · · + Xn I0 = 1 + + εϑ , + + εϑ n1/α n1/α n1/α n1/α X 2 + · · · + X n ϑ2 + · · · + ϑ n −1/α −1/α ≤ 1 , X + εϑ, n ϑ + εϑ σ n 1 1 n1/α n1/α X1 + ϑ 2 + · · · + ϑ n ϑ1 + · · · + ϑ n + εϑ, + εϑ + 1 1/α n n1/α 2 1 n X1 + · · · + Xn−1 + X X1 + · · · + X n + 1 + εϑ, + εϑ n1/α n1/α =: ∆1 + ∆2 + ∆3 , d n = n is independent of X1 , . . . , Xn−1 , ϑ. Similarly, where X Xn and X m j=1
Ij
≤
m j=1
1
Xj+2 + · · · + Xn ϑj+2 + · · · + ϑn , n1/α n1/α
ϑ1 + · · · + ϑj + Xj+1 ϑ1 + · · · + ϑj+1 + εϑ, + εϑ n1/α n1/α 1 m ϑ1 + · · · + ϑj + Xj+1 + ϑj+2 + · · · + ϑn
1 + εϑ, + n1/α j=1 2 n j=1 ϑj + εϑ n1/α m ϑ1 + · · · + ϑj + Xj+1 + · · · + Xn +
1 + εϑ, n1/α j=1 ·σ
j+1 + Xj+2 + · · · + Xn ϑ1 + · · · + ϑ j + X + εϑ n1/α
2
=: ∆4 + ∆5 + ∆6 . We first estimate ∆5 . By the ideality of r , 1 1/α m Xj+1 n−1 ∆5 =
1 + ϑ1 + εϑ, n n1/α j=1
(8.1.50)
100
8. Probabilistic-Type Limit Theorems
2 1/α ϑj+1 n−1 + ϑ + εϑ n n1/α 1 1/α 1/α 2 m n−1 n−1 ϑj+1 Xj+1 ≤
1 + ϑ, 1/α + ϑ n n n1/α n j=1 ≤ Cn1−r/α r . Similarly, by Lemma 8.1.12, ∆7
=: Im+1 (8.1.51) 1 1/α m+1 Xm+1 + · · · + Xn ϑ+ , ≤ 1 n n1/α 2 1/α m+1 ϑm+1 + · · · + ϑn ϑ+ n n1/α (1−r)/α n m+1 ≤ n−r/α E r PXi |Fi−1 , Pϑi n j=m+1 ≤
Cn1−r/α τr ,
and in the same way as for ∆5 , we obtain ∆2 ≤ Cn1−r/α r .
(8.1.52)
The remaining terms are dealt with by induction. Assume next that for j < n,
X1 + · · · + X j ϑ 1 + · · · + ϑ j r j 1−r/α + tr j −1/α , ≤ B
1 (8.1.53) j 1/α j 1/α and let
1/(r−α) 1/(r−α) ε = A max σ1 , σ 1/(r−α) , r , τ4r n−1/α r
(8.1.54)
with a constant A ≥ 0 that we shall fix later in the proof. Then ∆1
≤ BC(n1−r/α r + n−1/α tr )ε−1 n−1/α σ1 (X1 , ϑ) 1 ≤ BC(n1−r/α r + n−1/α tr ). A
In the same way,
∆4 ≤ CB r (n − m − 2)1−r/α + tr (n − m − 2)−1/α −r/α ∞ j Xj+1 ϑj+1 α +ε σr , · n n1/α n1/α j=1
(8.1.55)
(8.1.56)
8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
101
≤ CB r n1−r/α + tr n−1/α ·
∞
j=1
σ r (Xj+1 , ϑj+1 ) j + Aα σ r (Xj+1 , ϑj+1 )
α/(r−α)
r/α
≤ CB( r n1−r/α + tr n−1/α )/Ar−α , using that εα ≥ Aα σ r α/(r−α) n−1 . To estimate ∆3 we apply the G-dependence metric 1 (·Fn−1 ): 1 2 n X Xn + εϑ, 1/α + εϑFn−1 (8.1.57) ∆3 ≤ 1 n1/α n 1 2 n Xn X ≤ 1 ≤ n−1/α E 1 (PXn |Fn−1 , PXn ) , Fn−1 n1/α n1/α ≤ n−1/α τ1 . Finally, we estimate ∆6 as follows: 1 1/α m j Xj+1 Xj+2 + · · · + Xn α ∆6 = +ε
1 + + ϑ, n n1/α n1/α j=1 1/α 2 j+1 j X Xj+2 + · · · + Xn + + ϑ + εα n n1/α n1/α (1−r)/α m j α ≤ +ε n j=1 1 2 j+1 Xj+1 Xj+2 + · · · + Xn X Xj+2 + . . . + Xn · r + , 1/α + n1/α n1/α n n1/α ≤
m (j + nεα )(1−r)/α
n(1−r)/α
j=1
≤ τr n−1/α
m
j+1 G4j+2 ) n−r/α r (Xj+1 , X
(j + nAα τ4rα/(r−α) n−1 )(1−r)/α
j=1
≤ Cn−1/α
1 τ41/(r−α) . Ar−1−α r
Gathering all the inequalities, we obtain B 1−r/α n
r + n−1/α tr + C2 n1−r/α r + C3 n−1/α τ1 A B + C4 r−α n1−r/α r + n−1/α tr + C5 n1−r/α r A
1 (Zn , ϑ) ≤ C1
102
8. Probabilistic-Type Limit Theorems
+ C6
1 τ41/(r−α) n−1/α + C7 n1−r/α τr Ar−1−α r
+ 2Eϑn−1/α max σ1 , σ r 1/(r−α) , r
1/(r−α)
, τ4r1/(r−α) .
4 ≤ 12 and then choose B Choose A large enough such that CA1 + ACr−α 1+α−r large enough such that C2 + C3 + C5 + C6 A + C7 + 2Eϑ ≤ B2 . Thus we obtain (8.1.49). 2
8.2 Application to Stable Limit Theorems Zolotarev (1976) introduced the ζr -metric as an extension of the Kantorovich metric. For any pair of random vectors X, Y on IRk and r = m + a it is defined by ζr (X, Y ) = sup{|E(f (X) − f (Y ))|; f ∈ Fr },
(8.2.1)
where Fr is the class of all functions f : IRk → IR such that f (m) (x) − f (m) (y) ≤ x − ya1 , 0 < a ≤ 1; f (m) denotes the mth Fr´echet derivative of f supplied with the usual supremum norm of multilinear functionals (cf. Zolotarev (1986, Section 6.3) and Rachev (1991, p. 264)), and x − y1 denotes the L1 -norm in IRk . Indeed, ζ1 is merely the Kantorovich metric; see (8.1.3). ζr is ideal of order r and therefore is suitable for analyzing the rate of convergence in various central limit theorems. (The definition of an ideal metric was given in (8.1.8) and (8.1.9).) A disadvantage of ζr is that only for integers r can ζr be estimated by difference pseudomoments from above, while for r ∈ / IN the known upper estimates involve absolute moments br = max(EXr , EY r ) or absolute pseudomoments of order r and therefore are not suitable for approximation by stable distributions of order α < 2. In IR an alternative ideal metric of order r that does not have this drawback of ζr was found by Maejima and Rachev (1987) and applied to prove convergence to self-similar processes; see also Rachev (1991c, Section 17.1). In this section we introduce a new ideal metric ϑs,p (with respect to summation of independent random vectors in IRk ), which generalizes the construction in Maejima and Rachev (1987). This ideal metric has the following properties. It is ideal of order r = s − 1 + p1 . It can be estimated from above by a Zolotarev-type metric and, what is more important, by a pseudo difference moment, which allows applications to stable distributions. Finally, it can be bound from below by the L´evy metric, and thus ϑs,p describes weak convergence of distributions. The degree of ideality of this
8.2 Application to Stable Limit Theorems
103
metric does not depend on the dimension. This is an important property, which is not satisfied by some obvious generalizations of one-dimensional ideal metrics of order greater than 1, see Sections 6.1, 6.3. We shall establish relations between ϑ1,p and ϑs,p and prove various smoothing inequalities. In the second part of this section we give an application to the rate of convergence in stable limit theorems. The upper bounds in the limit theorem are formulated in metric terms. We establish some new results ensuring the finiteness of these bounds and apply these results to show that random vectors in a neighborhood of the LePage decomposition of a stable law satisfy the central limit theorem with rate. Further applications are to the convergence of summability methods of i.i.d. random vectors and to the approximation by compound Poisson distributions. All these applications are based on the thorough analysis of the metric properties of ideal metrics having a structure close to that of the Kantorovich metric. The results in this section are due to Rachev and R¨ uschendorf (1992). We start with the construction of the ϑs,p -metric. Let X, Y ∈ X (IRk ), the class of k-dimensional random vectors, and define for s ∈ IN, 1 ≤ p ≤ ∞, ϑs,p (X, Y ) = sup{|E(f (X) − f (Y ))|; f ∈ Gs,p }.
(8.2.2)
Here Gs,p is the class of functions f : IRk → IR, such that for 1 ≤ i1 ≤ · · · ≤ is ≤ k and for 1 ≤ j ≤ k, x = (x1 , . . . , xj−1 , xj+1 , . . . , xk ) ∈ IRk−1, and q with p1 + 1q = 1,
Dis1 ,...,is f q,j (x) ⎧ ⎛ ⎞1/q ⎪ ⎪ ⎪ ⎪ q ⎨ ⎝ |Ds ⎠ i1 ,...,is f (x1 , . . . , xj , . . . , xk )| dxj = IR ⎪ ⎪ ⎪ ⎪ ess sup |Dis1 ,...,is f (x1 , . . . , xj , . . . , xk )| ⎩ xj ∈IR
≤ 1
(8.2.3) if q < ∞, if q = ∞
a.s. with respect to the Lebesgue measure.
Lemma 8.2.1 For any 1 ≤ p ≤ ∞ and s ∈ IN, the metric ϑs,p is an ideal metric of order r = s − 1 + p1 . Proof: If f ∈ Gs,p and z ∈ IRk , then f (· + z) ∈ Gs,p and hence ϑs,p (X + Z, Y + Z) ≤ ϑs,p (X, Y ) for any Z independent of X and Y . Further, when q < ∞, for any c ∈ IR, x ∈ IRk−1 , 1 ≤ j ≤ k, fc (x) := f (cx), ⎤1/q ⎡ Dis1 ,...,is fc q,j (x) = ⎣ |Dis1 ,...,is fc (x1 , . . . , xk )|q dxj ⎦ IR
(8.2.4)
104
8. Probabilistic-Type Limit Theorems
⎡ ⎤1/q = |c|s−1/q ⎣ |Dis1 ,...,is f (y1 , . . . , yk )|q dyj ⎦ IR
= |c|r Dis1 ,...,is f q,j , which yields the ideality of ϑs,p of order r. The case q = ∞ can be handled similarly. 2
Remark 8.2.2 Note that the direct generalization of the Maejima–Rachev (1987) construction leads to 1/q (s) q ϑs,p (X, Y ) = sup |E(f (X) − f (Y ))|; f (x) dx ≤ 1 , (8.2.5) which is an ideal metric of order s − kq = s − k(1 − p1 ). This unpleasant dependence on the dimensionality is avoided in the definition of ϑs,p by the restriction to one-dimensional integration in (8.2.3). We next show that ϑs,p is estimated from above by the following modification ζ r of the ζr -metric: ζ r (X, Y ) = sup{|E(f (X) − f (Y ))|; f ∈ F r },
(8.2.6)
where F r is the class of functions f : IRk → IR with f (m) (x) − f (m) (y) ≤ dα (x, y) :=
k
|xi − yi |a ,
(8.2.7)
i=1
m = 0, 1, . . . , a ∈ (0, 1], and r = m + a. In fact, (8.2.7) is equivalent to |f (x) − f (y)| ≤ dα (x, y), ∀x, y ∈ IRk ,
if m = 0,
(8.2.8)
and to sup 1≤i1 ≤···≤im ≤k
|Dim1 ,...,im f (x) − Dim1 ,...,im f (y)| ≤ da (x, y), if m ≥ 1.
Since x − ya1 ≤ da (x, y), we have ζr ≤ ζ r . Lemma 8.2.3 (a)
(8.2.9) For any integer r,
ϑr,1 = ζ r = ζr .
(8.2.10)
8.2 Application to Stable Limit Theorems
105
For any r > 0, if ϑs,p (X, Y ) < ∞, then
(b)
ϑs,p (X, Y ) ≤ ζ r ≤ where r = s − 1 +
1 p
da (x, 0)
|xi1 · · · xim | |PX − PY |( dx).
(8.2.12)
1≤i1 ,···,im ≤k
IRk
Proof: (a)
(8.2.11)
= m + a, and
v r (X, Y ) :=
Γ(1 + a) v r (X, Y ), Γ(1 + r)
It is enough to check that Gr,1 = F r . If f ∈ Gr,1 , then
sup 1≤i1 ,...,ir−1 ≤k
|Dir−1 f (x) − Dir−1 f (y)| 1 ,...,ir−1 1 ,...,ir−1
⎧ y ⎨ 1 r Di1 ,...,ir−1 ,1 f (t, x2 , . . . , xk ) dt ≤ sup 1≤i1 ,...,ir−1 ≤k ⎩ x1 y 2 r + Di1 ,...,ir−1 ,2 f (y1 , t, x3 , . . . , xk ) dt x2 y ⎫ k ⎬ r + · · · + Di1 ,...,ir−1 ,k f (y1 , . . . , yk−1 , t) dt ⎭ xk
≤ x − y1 =: d1 (x, y); i.e., f ∈ F r . Conversely, for f ∈ F r we have from (8.2.8),
sup 1≤i1 ,...,ir ≤k
=
|Dir1 ,...,ir f (x)|
sup
lim
|Dir−1 (f (x1 , . . . , xk ) − f (x1 , . . . , yr , . . . , xk ))| 1 ,...,ir−1 |xr − yr |
1≤i1 ,...,ir ≤k yr →xr
≤ 1 a.s.; i.e., f ∈ Gr,1 . (b)
If f ∈ Gs,p and 1 < p ≤ ∞, then similarly to the proof in (a),
sup 1≤i1 ,...,is−1 ≤k
|Dis−1 f (x) − Dis−1 f (y)| 1 ,...,is−1 1 ,...,is−1 k
(8.2.13)
yi
≤
sup
1≤i1 ,...,is−1 ≤k i=1 xi
|Dis1 ,...,is−1 ,i f (y1 , . . . , t, xi+1 , . . . , xk )| dt
106
8. Probabilistic-Type Limit Theorems
≤
k
Dis1 ,...,is f q,i (y1 , . . . , yi−1 , xi+1 , . . . , xk )|xi − yi |1/p
i=1
≤ da (x, y). For the second part of (b) first note that ϑs,p (X, Y ) < ∞ implies that for any 1 ≤ i1 ≤ · · · ≤ ij ≤ k, j ≤ s − 1, E(Xi1 · · · Xij − Yi1 · · · Yij ) = 0.
(8.2.14)
This follows, by taking fc (x) = c xi1 · · · xij , j ≤ s − 1, and the obvious inequality ϑs,p (X, Y ) ≥ supc>0 |E fc (X) − E fc (Y )|. Following the argument in the first part of this proof, (8.2.14) is also a consequence of the condition ζ r (X, Y ) < ∞. We obtain from the Taylor expansion and applying (8.2.14) with m = s − 1 that |E(f (X) − f (Y ))| 1 (1 − t)m−1 = (m − 1)! 1≤i1 ,...,im ≤k 0
· E Dim1 ,...,im f (tX)Xi1 · · · Xim − Dim1 ...im f (tY )Yi1 · · · Yim ≤
⎛ (1 − t)m−1 ⎝ |Dim1 ,...,im f (tx)xi1 · · · xim (m − 1)!
1
1≤i1 ,...,im ≤k 0
IRk
dt
⎞
− Dim1 ,...,im f (t0)xi1 · · · xim | |PX − PY |( dx)⎠ dt ≤
1
1≤i1 ,...,im ≤k 0
⎛
·⎝ ⎛ ≤
1 ⎝ (m − 1)! ·
(1 − t)m−1 (m − 1)!
⎞ da (tx, t0)|xi1 · · · xim | |PX − PY |(dx)⎠ dt
IRk
1
(1 − t)m−1 ta dt⎠
0
1≤i1 ,...,im ≤k
=
⎞
Γ(1 + α) v r (X, Y ). Γ(1 + r)
da (x, 0)|xi1 · · · xim | |PX − PY |( dx) IRk
2
8.2 Application to Stable Limit Theorems
107
r m a k k k Remark 8.2.4 (a) Since xr1 = |x | = |x | |x | i i i i=1 i=1 i=1
k a ≤ 1≤i1 ,...,im ≤k |xi1 . . . xim | , and on the other hand kxr1 ≥ i=1 |xi | k a 1≤i1 ,...,im ≤k |xi1 . . . xim | i=1 |xi | , we have vr (X, Y ) := xr1 |PX − PY |( dx) ≤ v r (X, Y ) ≤ k vr (X, Y ). (8.2.15) IRk
(b) By the same arguments as in (a) we have ζr ≤ ζ r ≤ k ζr .
(8.2.16)
In particular, by (8.2.11), ϑs,p is also estimated from above by the Zolotarev metric ζr (up to a constant). The following theorem gives an estimate of ϑs,p in terms of certain pseudomoments, which allows one to apply ϑs,p to stable distributions. For random vectors X, Y with densities uX , uY , define αs,p (X, Y ) =
k
i=1 1≤i1 ,...,is ≤k
·
y
1
t
,...,
IRk−1
⎡ 1 s−1 ⎣ |yi1 . . . yis | t−s−k (1 − t) (uX − uY ) (s − 1)! IR
⎤1/p
yk dt |yi |p dyi ⎦ t
0
dyi . . . dyi−1 dyi+1 . . . dyk .
If k = 1, then after some transformations we obtain αs,p (X, Y )
= =
Fs,X − Fs,Y p (8.2.17) ⎛ x ⎞1/p (x − t)s−1 ⎝ d(FX − FY )(t)|p dx ⎠ , (s − 1)! −∞
Fs,X (x) :=
1 E(x − X)s−1 + (s − 1)!
(see Maejima and Rachev (1987), Rachev and R¨ uschendorf (1990)). Indeed, αs,p is an ideal metric of order r. Representation (8.2.17) shows that αs,p depends only on the difference pseudomoments. A similar representation holds also for k ≥ 1. Theorem 8.2.5 αs,p is the upper bound for ϑs,p ; i.e., ϑs,p ≤ αs,p .
(8.2.18)
108
8. Probabilistic-Type Limit Theorems
Proof: By (8.2.14) and the Taylor expansion, E(f (X) − f (Y )) = IRk
1
1≤i1 ≤···≤is ≤k 0
(1 − t)s−1 (s − 1)!
· Dis1 ,...,is f (tx1 , . . . , txk )xi1 · · · xis dt d(FX − FY )(x) ⎡ 1 (1 − t)s−1 ⎣ = Dis1 ,...,is f (y1 , . . . , yk )yi1 . . . yis (s − 1)! 1≤i1 ,...,is ≤k 0 IRk ⎤
y −s−k · (uX − uY ) dy ⎦ dt. t t This implies, by making use of H¨older’s inequality, that |E(f (X) − f (Y ))| ≤
k s−1
i=1 j=0 1≤i1 ,...,ij ≤k,ij =i
|yi1 . . . yij | IRk−1
⎡ ⎣Di1 ,...,ij ,i,...,i f (y)
IR
1
s−1 (1 − t) y (uX − uY ) dt · t−s−k (s − 1)! t 0 ⎤ · |yi |s−j dyi ⎦ dy1 . . . dyi−1 dyi+1 . . . dyk
≤
k s−1
i=1 j=0 1≤i1 ,...,ij ≤k,ij =i
|yi1 . . . yij |Di1 ,...,ij ,i,...,i f q,i (y) IRk−1
' ' 1 ' '
y s−1 ' ' (1 − t) −s−k s−j ' (uX − uY ) |yi | dt' ·' t ' (s − 1)! t ' ' 0
p
dy1 . . . dyi−1 dyi+1 . . . dyk , 2
which is equivalent to the representation in (8.2.17). For random vectors in IRk we define the L´evy distance by L(X, Y ) =
inf{ε > 0; P (X ∈ Bx ) ≤ P (Y ∈ Bx (ε)) + ε,
(8.2.19)
P (Y ∈ Bx ) ≤ P (X ∈ Bx (ε)) + ε, ∀x ∈ IR }, k
8.2 Application to Stable Limit Theorems
109
where Bx := {y ∈ IRk ; yi ≤ xi , 1 ≤ i ≤ k} and Bx (ε) := {y ∈ IR ; y − Bx ≤ ε}, k
y =
1 k
21/2 yi2
;
i=1
note that P (X ∈ Bx ) = FX (x). L metrizes the topology of weak convergence. If X has a bounded density uX , then (X, Y ) := sup |FX (x) − FY (x)| 1 2 k ≤ 1+ sup uXi (x) L(X, Y ), i=1
x
(8.2.20) X = (X1 , . . . , Xk ).
Next, we establish that ϑs,p convergence implies weak convergence by providing a lower bound of ϑs,p in terms of L. Theorem 8.2.6 Let s = 1, 2, . . . , p ∈ [1, ∞], r = s − 1 + p1 . Then ϑs,p (X, Y ) ≥ a(s, k)Lr+1 (X, Y ), where a(s, k) :=
Vr 2s+k s!
,
Vr :=
(8.2.21)
(1 − x2 )r+1 dx.
(8.2.22)
{x≤1}
Proof: Let L(X, Y ) > ε. Then without loss of generality we can assume that for some z = (z1 , ..., zk ), P (X ∈ Bz ) − P (Y ∈ Bz (ε)) > ε.
(8.2.23)
We define gr (x) = (1 − x2 )r+1 + ,
x ∈ IRk (a+ = max(0, a)),
(8.2.24)
and “normalize” gr by g r (x) :=
gr (x) . Vr
(8.2.25)
Finally, we define the smoothed version of the indicator of Bz : %
ε & ε I x − y ∈ Bz g r (y) dy uε (x) = 2 2 IRk
=
k %
ε & 2 2 I y ∈ Bz gr (x − y) d y. ε 2 ε IRk
(8.2.26)
110
8. Probabilistic-Type Limit Theorems
Since 0 ≤ uε (x) ≤
g r (y) dy ≤ 1, we have
IRk
|fε | ≤ 1, where fε (x) := 2 uε (x) − 1; and furthermore, ⎧ ⎨ 1 if x ∈ B , z fε (x) = ⎩ 0 if x ∈ / Bz (ε).
(8.2.27)
(8.2.28)
In fact, we have for x ∈ Bz , %
ε & ε uε (x) = I x − y ∈ Bz g r (y) dy 2 2 IRk %
ε & ε = I x − y ∈ Bz , y ≤ 1 g r (y) dy 2 2 IRk = g r (y) dy = 1. IRk
Similarly, for x ∈ / Bz (ε), uε (x) = 0. In the next step we establish bounds on the derivatives of fε . To that purpose let Ls (f ) =
sup ess sup Dis1 ,...,is f q,i (x).
(8.2.29)
2ε−r . a(s, k)
(8.2.30)
1≤i≤k x∈IRk−1
Then Ls (fε ) ≤
To show (8.2.30), observe that Dis1 ,...,is fε (x) (8.2.31) k+s %
& 2 2 ε = 2 (x − y) d y Dis1 ,...,is g s I y ∈ Bz ε 2 ε IRk
s %
ε & 2 = 2 Dis1 ,...,is g s (y) dy. I x − εy ∈ Bz ε 2 IRk
By Minkowski’s inequality, we get the following bound for the norm of the above quantity: Dis1 ,...,is fε q,1 (x)
(8.2.32)
8.2 Application to Stable Limit Theorems
⎡ = ⎣
111
⎤1/q
|Dis1 ,...,is fε (x1 , x2 , . . . , xn )|q dx1 ⎦
IR
⎧ s ⎨ 2 = 2 ε ⎩
%
& I x − εy ∈ Bz ε , y ≤ 1 2 IR IRk ⎫1/q ⎬ ·I{Bz (ε)\Bz }Dis1 ,...,is g s (y) dy|q dx1 ⎭
s 2 = 2 ε ≤ 2
|Dis1 ,...,is g s (y)|
{y≤1}∩(Bz (ε)\Bz ) s
2 ε
⎧ ⎨ ⎩
I{x − εy ∈ Bz
IR
ε 2
⎫1/q ⎬ dx1 dy ⎭
|Dis1 ,...,is g s (y)| {y≤1}∩(Bz (ε)\Bz )
·
⎧ ⎨ ⎩
I{z1 ≤ x1 − εy1 ≤ z1 + ε} dx1
IR
⎫1/q ⎬ ⎭
dy.
In fact, the inequality is valid a.e. with respect to Lebesgue measure λ\k−1 . The last integrals are estimated as follows: ⎧ ⎫1/q ⎬ ⎨ I{z1 ≤ x1 − εy1 ≤ z1 + ε} dx1 = ε1/q , (8.2.33) ⎩ ⎭ IR
and
|Dis1 ,...,is g s (y)| dy
(8.2.34)
{y≤1}∩(Bz (ε)\Bz )
=
1 Vr
=
1 Vr
≤
{y≤1}
1 2s−1 n s−1 I{Bz (ε)\Bz } Di1 ,...,is−1 s 1 − yi2 2yis dy i=1 + I{Bz (ε)\Bz }s!2s |yi1 · · · yis | dy
{y≤1}
1 s!2s Vr
1 1 · · · |yi1 · · · yis | dy ≤
−1
−1
1 1 s!2s · 2k−s = s!2k . Vr Vr
Similarly, we can argue for any index 1 ≤ i ≤ k, and thus (8.2.30) follows from (8.2.33) and (8.2.34).
112
8. Probabilistic-Type Limit Theorems
From the inequality in (8.2.30) we obtain that given fε (x) , Ls (fε )
f ∗ (x) :=
≥ E(f ∗ (X) − f ∗ (Y )) εr ≥ a(s, k)E(fε (X) − fε (Y )). 2
ϑs,p (X, Y )
(8.2.35)
Applying (8.2.27), (8.2.28) we arrive at the following decomposition: E(fε (X) − fε (Y )) =
(fε (x) + 1)(PX − PY )( dx) IRk
⎛
⎜ = ⎝
+
+
⎞ ⎟ ⎠(fε (x) + 1)(PX − PY )( dx)
Bz Bz (ε)\Bz IRk \Bz (ε)
=: I1 + I2 + I3 , where (fε (x) + 1)(PX − PY )( dx) = 2(PX − PY )(Bz );
I1
=
I2
≥ −2 PY (Bz (ε)\Bz );
I3
= 0.
Bz
Thus by (8.2.23), I1 + I2 + I3 ≥ 2(PX (Bz ) − PY (Bz )) − 2(PY (Bz (ε)) − PY (Bz )) ≥ 2ε. From (8.2.35) we finally obtain ϑs,p (X, Y ) ≥ εr+1 a(s, k). With ε → L(X, Y ), this implies (8.2.21). 2
Remark 8.2.7 (a)
Let us use the polar transformation
x1 x2 x3 .. .
= cos ϑ1 = cos ϑ1 = cos ϑ1
xk
= sin ϑ1 ,
· · · cos ϑk−2 cos ϑk−1 , · · · cos ϑk−2 sin ϑk−1 , · · · sin ϑk−2 ,
(8.2.36)
8.2 Application to Stable Limit Theorems
113
where > 0, 0 ≤ ϑ1 ≤ 2π, 0 ≤ ϑj ≤ π, 2 ≤ j ≤ k − 1, and ∂(x1 , . . . , xk ) ∂(, ϑ1 , . . . , ϑk−1 )
k−1 Dk (ϑ), ⎛ sin ϑ1 ⎜ sin ϑ1 ⎜ ⎜ Dk (ϑ) := det ⎜ sin ϑ1 ⎜ .. ⎝ . =
cos ϑ1
· · · sin ϑk−1 · · · sin ϑk−2 · · · cos ϑk−2 0
···
(8.2.37) ⎞
sin ϑk−1 cos ϑk−1 0
⎟ ⎟ ⎟ ⎟. ⎟ ⎠
0
Then we have 1 Vr
= Dk
(1 − 2 )r+1 k−1 d
(8.2.38)
0
1 = Dk
(1 − 2 )r+1 (2 )
k−2 2
d2
0
1 Γ(r + 2)Γ( k2 ) , = Dk 2 Γ(r + 1 + k2 ) where
Dk :=
Dk (ϑ) dϑ.
/ IN) in terms of (b) Note that lower bound of ϑs,p (r = s − 1 + p1 ∈ the Prohorov metric exists. This follows from an example in Maejima and Rachev (1987) in the case k = 1. We next investigate smoothing inequalities, which play an important role in the proof of Berry–Ess´een-type theorems. They are also of interest for the study of intrinsic properties of probability metrics. Lemma 8.2.8 (a) Let Z be independent of X, Y, ε > 0, r = s − 1 + p1 . Then ϑs,p (X, Y ) ≤ ϑs,p (X + εZ, Y + εZ) + 2
Γ(1 + p1 ) Γ(1 + r)
εr kEZr1 . (8.2.39)
(b) If Z, W are independent of X, Y , then ϑs,p (X + Z, Y + Z) ≤ ϑs,p (X, Y )σ(W, Z) + ϑs,p (X + W, Y + W ); (8.2.40) and moreover, ϑs,p (X + Z, Y + Z) ≤ ϑs,p (W, Z)σ(X, Y ) + ϑs,p (X + W, Y + W ), (8.2.41)
114
8. Probabilistic-Type Limit Theorems
where σ is the total variation distance. Proof: (a) By the regularity of ϑs,p (cf. 8.2.1) we have ϑs,p (X, Y )
≤ ϑs,p (X + εZ, Y + εZ) + 2ϑs,p (0, εZ) ≤ ϑs,p (X + εZ, Y + εZ) + 2εr ϑs,p (0, Z).
By (8.2.11) and (8.2.15), ϑs,p (0, Z) ≤
Γ(1 + α) Γ(1 + r)
1≤i1 ,...,is−1 ≤k
E
k
|Zi |α |Zi1 · · · Zis−1 |.
i=1
(b) For any f ∈ Gs,p we have |E(f (X + Z) − f (Y + Z))| (8.2.42) ≤ |E(f (X + Z) − f (X + W )) − E(f (Y + Z) − f (Y + W ))| + |E(f (X + W ) − f (Y + W ))|. If f ∈ Gs,p , then the translates fx (z) := f (x + z) are also in Gs,p , and therefore, the first term is estimated by conditioning on X (respectively Y ): E(f (x + Z) − f (x + W )) dPX (x) − E(f (x + Z) − f (x + W )) dPY (x) = E(f (x + Z) − f (x + W )) d(PX − PY )(x) (8.2.43) ≤ ϑs,p (Z, W )σ(X, Y ). Indeed, the inequalities (8.2.42), (8.2.43) imply ϑs,p (X + Z, Y + Z) ≤ ϑs,p (Z, W )σ(X, Y ) + ϑs,p (X + W, Y + W ). The other case is derived similarly.
2
Lemma 8.2.9 If Z is independent of X, Y and PZ has a density pZ having integrable (s − 1)-fold derivatives Cs,Z := sup |Dis−1 p (x)| dx < ∞, (8.2.44) 1 ,...,is−1 Z 1≤i1 ,...,is−1 ≤k IRk
then ϑ1,p (X + Z, Y + Z) ≤ Cs,Z ϑs,p (X, Y ).
(8.2.45)
8.2 Application to Stable Limit Theorems
115
Proof: For any f ∈ G1,p , E(f (X + Z) − f (Y + Z)) =
f (x) d(FX+Z − FY +Z )(x) IRk
⎛
IRk
⎞
⎝
=
(8.2.46)
f (x)pZ (x − z) d(FX − FY )(z)⎠ dx
IRk
f ∗ (z) d(FX − FY )(z),
= IRk
where f ∗ (z) =
f (x)pZ (x − z) dx. From the Taylor expansion,
IRk
f ∗ (z) = f ∗ (0) +
s−1
j=1 1≤i1 ,...,ij ≤k
+
1
1≤i1 ,...,is ≤k 0
Dij1 ,...,ij f ∗ (0)zi1 · · · zij
(8.2.47)
(1 − t)s−1 s D f ∗ (tz)zi1 · · · zis dt. (s − 1)! i1 ,...,is
Since f ∈ G1,p , i.e., ⎞1/q
⎛
⎝
|Di1 f (x1 , . . . , xi , . . . , xn )|q dxi ⎠
≤ 1
a.s.,
(8.2.48)
IR
we have the following bound for the qth norm of f ∗ -derivatives: ⎛
⎝
IR
⎞1/q |Dis1 ,...,is f ∗ (z1 , . . . , zn )|q dzi ⎠
q ⎞1/q s = ⎝ Di1 ,...,is f (x)pZ (x − z) dx dzi ⎠ IR IRk q ⎞1/q ⎛ 1 = ⎝ Di1 f (x)Dis−1 pZ (x − z) dx dzi ⎠ 2 ,...,is IR IRk q ⎞1/q ⎛ 1 = ⎝ Di1 f (x + z)Dis−1 pZ (x) dx dzi ⎠ 2 ,...,is k ⎛
IR
IR
(8.2.49)
116
8. Probabilistic-Type Limit Theorems
q ⎞1/q 1 s−1 Di1 f (x + z) (Di2 ,...,is pZ (x)) dx dzi ⎠ = ⎝ k IR IR ⎧ ⎫1/q ⎨ ⎬ 1 q (Di f (x + z))(Ds−1 pZ (x)) dzi ≤ dx i2 ,...,is 1 ⎩ ⎭ ⎛
IR
IRk
=
⎧ ⎫1/q ⎬ ⎨ 1 q Di f (x + z) dzi p (x) dx Z i2 ,...,is 1 ⎩ ⎭
s−1 D
IR
IRk
|Dis−1 pZ (x)| dx = Cs,Z 2 ,...,is
≤
by (8.2.48)).
IRk
Summarizing the results in (8.2.46), (8.2.47), and (8.2.48), we derive the desired inequality (8.2.45). 2 As a consequence of Lemmas 8.2.8, 8.2.9 we next obtain an estimate between ϑ1,p and ϑs,p . Theorem 8.2.10 For every s = 1, 2, . . . , p ∈ [1, ∞), r := s − 1 + p1 , and random vectors X, Y on IRk we have 1
pr ϑ1,p (X, Y ) ≤ A(s, p, k)ϑs,p (X, Y ),
(8.2.50)
where 1
A(s, p, k) := a pr b
s−1 r
1
(p(s − 1)) pr
r s−1
(8.2.51)
and 1 a := √ s
2s π
1 7
s/2 ,
b := 2k k
2 π
21/p .
(8.2.52) d
Proof: Recall the inequality (8.2.39). Then for any ε > 0 and any Z = N (0, 1) independent of X, Y , we have 1/p
ϑ1,p (X, Y ) ≤ ϑ1,p (X + εZ, Y + εZ) + 2ε1/p k EZ1 ;
(8.2.53)
and furthermore, 1/p EZ1
≤ (EZ1 )
1/p
=
1 k i=1
21/p E|Zi |
1 7 21/p 2 = k . π
(8.2.54)
8.2 Application to Stable Limit Theorems
117
Now we apply Lemma 8.2.9 to get ϑ1,p (X + εZ, Y + εZ) ≤ Cs,εZ ϑs,p (X, Y ).
(8.2.55)
Next, assuming that Z is standard normal with independent components, we bound the constant in (8.2.55) as follows: x Cs,εZ = sup |Dis−1 pZ ( )|ε−s−n+1 dx (8.2.56) 1 ,...,is−1 ε i1 ,...,is−1 IRk
1−s
= ε
Cs,Z = ε1−s Cs,s−1/2 (Z1 +···+Zs ) .
Here, Z1 , . . . , Zs are i.i.d. copies of Z, and thus (see Zolotarev (1977, 1983, 1986)) Cs,εZ
1
= ε1−s s− 2 (1−s) Cs,Z1 +···+Zs s−1 2
≤ ε1−s s where C1,Z
=
sup 1≤i≤k IRk
=
sup i
IRk
=
sup i
IR
(8.2.57)
(C1,Z )s ,
|Di1 pZ (x)| dx 1 √ 2π
n
|xi |e−
x2 i 2
8 j =i
x2 1 i √ |xi |e− 2 dxi = 2π
e− 7
x2 j 2
dx
2 . π
Therefore, from (3.15), (3.17), (3.19) we obtain ϑ1,p (X, Y ) ≤ ε
1−s
1 √ s
2s π
1 7 21/p s/2 2 1/p ϑs,p (X, Y ) + 2ε k k . (8.2.58) π
Define ϕ(x) := ax1−s + bx1/p , 1 a := √ s
2s π
s/2 ϑs,p (X, Y ),
1 7 b := 2k k
Minimizing ϕ with respect to x yields (8.2.50).
2 π
21/p . 2
As a consequence of the smoothing properties we shall establish a Berry– Ess´een-type result, that will provide the right order estimate in the stable central limit theorem in terms of the metric ϑ1,1 . Let (Xi ) be an i.i.d.
118
8. Probabilistic-Type Limit Theorems
sequence of random vectors in IRk ; let (Θ, Θi ) be an i.i.d. sequence of d
symmetric α-stable distributed random vectors, i.e. n−1/α (Θ1 +· · ·+Θn ) = Θ; and define ϑr := ϑr (X1 , Θ) := sup hr−1 ϑ1,1 (X1 + hΘ, Θ1 + hΘ),
(8.2.59)
h>0
σr := σr (X1 , Θ) := sup hr σ(X1 + hΘ, Θ1 + hΘ), (8.2.60) h>0 1 ϑ := ϑ1,1 (X1 , Θ), σ := σ(X1 , Θ), τr := max ϑ, σ, σr(r−α) . (8.2.61) Theorem 8.2.11Suppose that 1 < α ≤ 2, α < r, and ϑ+ϑr +σ+σr < ∞. n Let Zn = n−1/α i=1 Xi . Then for some absolute constant C = C(k) depending only on the dimension k,
r ϑ1,1 (Zn , Θ) ≤ C n1− α ϑr + τr n−1/α . (8.2.62) Proof: We shall use the notation ϑ(X, Y ) = ϑ1,1 (X, Y ) during this proof. Note that by (8.2.10), ϑ(X, Y ) = ζ 1 (X, Y ) = ζ1 (X, Y ). From the smoothing inequality (8.2.39) we obtain the following bound: For any ε > 0 with Θi , Θ independent and identically distributed, ϑ(Zn , Θ1 ) ≤ ϑ(Zn + εΘ, Θ1 + εΘ) + Cε,
(8.2.63)
where C := 2 k EΘ1 . Our proof will be based on the Bergstr¨om convolution method (cf. Rachev (1991, Chapter 18) and the references therein). We start by making use of the triangle inequality: ϑ(Zn + εΘ, Θ1 + εΘ) Θ 1 + X2 + · · · + X n + εΘ ≤ ϑ Zn + εΘ, n1/α m Θ1 + · · · + Θj + Xj+1 + · · · + Xn + ϑ + εΘ, n1/α j=1
(8.2.64)
Θ1 + · · · + Θj+1 + Xj+2 + · · · + Xn + εΘ n1/α Θ1 + · · · + Θm+1 + Xm+2 + · · · + Xn +ϑ + εΘ, Θ1 + εΘ n1/α m =: Io + Ij + Im+1 , m = [n/2]. j=1
Applying the smoothing property (8.2.41), we obtain X2 + · · · + Xn Θ2 + · · · + Θn Io ≤ ϑ , n1/α n1/α
(8.2.65)
8.2 Application to Stable Limit Theorems
119
· σ n−1/α X1 + εΘ, n−1/α Θ1 + εΘ X 1 + Θ2 + · · · + Θ n Θ 1 + · · · + Θn + εΘ, + εΘ . +ϑ n1/α n1/α Similarly, for 1 ≤ j ≤ m, we have
Ij
Xj+2 + · · · + Xn Θj+2 + · · · + Θn ≤ ϑ , (8.2.66) n1/α n1/α Θ1 + · · · + Θj + Xj+1 Θ1 + · · · + Θj+1 ·σ + εΘ, + εΘ 1/α n n1/α Θ1 + · · · + Θj + Xj+1 + Θj+2 + · · · + Θn +ϑ + εΘ, Θ1 + εΘ . n1/α
Summarizing the above inequalities, we get the bound ϑ(Zn , Θ1 ) ≤
5
∆j ,
(8.2.67)
j=1
where
∆1 := ∆2 :=
∆3 := ∆4 := ∆5 :=
X2 + · · · + X n Θ 2 + · · · + Θ n X1 Θ1 ϑ , + εΘ, 1/α + εΘ , σ n1/α n1/α n1/α n m Xj+2 + · · · + Xn Θj+2 + · · · + Θn ϑ , n1/α n1/α j=1 Θ1 + · · · + Θj + Xj+1 Θ1 + · · · + Θj+1 + εΘ, + εΘ , ·σ n1/α n1/α X1 + Θ2 + · · · + Θ n (m + 1)ϑ , Θ , n1/α Θ1 + · · · + Θm+1 + Xm+2 + · · · + Xn ϑ + εΘ, Θ + εΘ , 1 n1/α Cε = 2k EΘ1 ε.
To estimate the terms ∆i , we introduce smoothed versions of the metrics ϑ, σ defined as follows: For r > 1, ϑr (X, Y )
:=
sup hr−1 ϑ(X + hΘ, Y + hΘ),
(8.2.68)
h>0
σr (X, Y )
:=
sup hr σ(X + hΘ, Y + hΘ)
(8.2.69)
h>0
(cf. also (8.2.59), (8.2.60)). It is easy to see that both ϑr , σr are ideal metrics of order r.
120
8. Probabilistic-Type Limit Theorems
We first estimate ∆3 and ∆4 using the ideality of ϑr . In the rest of the proof, c stands for a general constant, which may be different at different places: 1 1/α 1/α 2 n−1 n−1 −1/α −1/α X1 + Θ, n Θ1 + Θ ∆3 = (m + 1)ϑ n n n r−1
α n ϑr n−1/α X1 , n−1/α Θ ≤ (m + 1) (8.2.70) n−1 ≤ c n1− α ϑr . r
Similarly, ∆4
≤ ϑr
Xm+2 + · · · + Xn Θm+2 + · · · + Θn , n1/α n1/α
n m+1
r−1 α (8.2.71)
≤ c n1− α ϑr . r
Define
1 ε := A max σ1 , σrr−α n−1/α ,
A > 0.
(8.2.72)
The proof continues by induction on n. The induction hypothesis states that for all j < n, r X 1 + · · · + X j Θ1 + · · · + Θ j , (8.2.73) ϑ ≤ B(ϑr j 1− α + τr j −1/α ). j 1/α j 1/α (For n = 1, . . . , n0 , n0 fixed, (8.2.62) follows from τr ≥ ϑ and the ideality of ϑ1,1 .) Then, for ∆1 = ∆1 (n), we obtain ∆1 ≤ Bc(n1− α ϑr + n−1/α τr )σ(n−1/α X1 + εΘ, n−1/α Θ1 + εΘ) (8.2.74) r ≤ Bc(n1− α ϑr + n−1/α τr )ε−1 n−1/α σ1
r 1 Bc n1− α ϑr + n−1/α τr . ≤ A r
Similarly, we estimate ∆2 : ∆2
≤ cB(ϑr (n − m − 2)1− α + τr (n − m − 2)−1/α ) −r/α ∞ j
−1/α −1/α α σr n X1 , n Θ · +ε n i=1 r
∞
r ≤ cB ϑr n1− α + τr n−1/α
≤ cB ϑr n
r 1− α
+ τr n−1/α
j=1
σr (j + εα n)r/α
1 . Ar−α
(8.2.75)
8.2 Application to Stable Limit Theorems
From (8.2.70)–(8.2.75) we infer r 1 1 + r−α B ϑr n1− α + τr n−1/α ϑ(Zn , Θ) ≤ C1 A A
r + C2 ϑr n1− α + Aτr n−1/α . 1 Choosing A big enough so that C1 ( A1 + Ar−α )< B/2 > C2 (1 + A), we complete the proof.
1 2
121
(8.2.76)
and then B such that 2
Remark 8.2.12 (a) We note that the conditions concerning the domain of attraction of Θ are given solely in terms of the metrics appearing in the upper bounds. (b) Since Θ has a density pΘ with integrable derivatives, we can get, similarly to the proof of Lemma 8.2.9, that σ(X + δΘ, Y + δΘ) ≤ Cδ −1 ϑ1,1 (X, Y ),
(8.2.77)
where C = C(Θ). This implies that for any 0 < ε < 1, σr (X, Y )
sup hr σ(X + hΘ, Y + hΘ) h>0
= sup hr σ X + h(1 − ε)1/α Θ1 + hε1/α Θ2 , h>0 Y + h(1 − ε)1/α Θ1 + hε1/α Θ2
≤ sup hr ϑ1,1 X + h(1 − ε)−1/α Θ1 , Y + h(1 − ε)1/α Θ2 =
h>0
· C(Θ)/(h(1 − ε)1/α ) = C(Θ)(1 − ε)(1−r)/α ε−1/α ϑr (X, Θ). The minimum in the right-hand side is attained for ε = 1/r, implying that σr (X, Y )
≤ C(Θ)rr/α (r − 1)(1−r)/α ϑr (X, Θ) ≤ C(Θ)2
r/α
(8.2.78)
ϑr (X, Θ).
Similarly to the proof of Theorem 8.2.10, relations (8.2.77), (8.2.78) allow us to replace (8.2.62) by a bound involving only ϑ, ϑr :
r −1/α }n ϑ1,1 (Zn , Θ) ≤ C n1− α ϑr + max{ϑ, ϑ1/(r−α) . (8.2.79) r For r ∈ IN, ϑr , the r-smoothed ϑ metric, can be estimated from above by the ζr metric: ϑr ≤ cr ζr ,
(8.2.80)
122
8. Probabilistic-Type Limit Theorems
(r) where cr depends on supz≤1 |pΘ (y)(z)| dy (cf. Zolotarev (1983, p. 294, property 6)). Also, for r ∈ IN, ζr is estimated from above by the rth difference pseudomoment kr , defined by % kr (X, Y ) = sup |E(f (X) − f (Y ))|; f bounded, f : IRk → IR, (8.2.81) ' '& |f (x) − f (y)| ≤ 'xxr−1 − y yr−1 ' and kr (X, Y ) ≤ 2r vr (X, Y ) = 2r xr |PX − PY |( dx),
(8.2.82)
where vr is the absolute pseudomoment of order r. From (8.2.79)– (8.2.82) we obtain easy-to-check criteria for finiteness of the upper bounds. In particular, in the normal case α = 2, we take r = 3, and so the finiteness of the third moments of Xi implies the Berry–Ess´een result. In the case 1 < α < 2 we use the boundness of kr in (8.2.81). (c) In the case k = 1, α = 2 (normal case, dimension one) the result of Theorem 8.2.11 is due to Zolotarev (1987), based on the proof of Senatov (1980). From part (b) of the above remark, it follows that we can replace the terms ϑ, σ, ϑr , σr in the upper bound in (8.2.62) by k1 and kr . Since kr is topologically weaker than vr , it is of interest to obtain alternative bounds for kr . To this end let us recall the minimal r -metric: for r > 0, % & d d − Y r )(1/r)∧1 ; X = X, Y = Y . (8.2.83)
r (X, Y ) := inf (EX Then kr (X, Y ) =
% & d d X r−1 − Y Y r−1 |; X = inf |EX X, Y = Y (8.2.84)
= 1 (XXr−1 , Y Y |r−1 ) % & d d = inf EU − V ; U = XXr−1 , V = Y Y r−1 (cf. Rachev and R¨ uschendorf (1990)). If X and Y have densities fX and fY , respectively, then (cf. Rachev (1991, pp. 249–252)) we use that k1 = 1 to get the bound kr (X, Y )
≤
αr (X, Y ) (8.2.85) 1
x
x −k−1 − fY Y r−1 dt dx. := x1 t fXXr−1 t t k IR
0
8.2 Application to Stable Limit Theorems
123
For some examples with an equality in (8.2.85), see Rachev (1991, p. 252). The densities of XXr−1 , Y Y r−1 are obtainable from the transformation formula. In particular, this gives explicit bounds in the case r = 1, where the expression in (8.2.85) simplifies. The following upper bound for kr , r > 1, will turn out to be useful. Lemma 8.2.13 If EXr < ∞, EY r < ∞, r > 1, then kr (X, Y ) ≤ c r (X, Y ),
(8.2.86)
where the constant c depends on the rth moments of X, Y . Proof: Let r :=
r r−1
d
d
and U = X, V = Y . Then
' ' E 'U U r−1− V V r−1 '
≤ EU r−1 U −V + EV U r−1− V r−1 =: I1 + I2 .
For I1 , I2 we readily get the bounds I1 ≤ (EU r )1/r (EU − V r )1/r and ⎧ r 1/r r 1/r ⎪ ⎪ 1 < r ≤ 2, ⎪ (EV ) (EU − V ) , ⎨ r 1/r I2 ≤ (r − 1)(EU − V ) ⎪
⎪ r−2 1 ⎪ ⎩ · (EV r )1/r + (EV r ) r−1 (EU r ) r−1 , r > 2. So I1 + I2 ≤ c(EU − V r )1/r , and passing to the corresponding minimal 2 metrics, we get kr (X, Y ) ≤ c r (X, Y ), as required. In some examples one can determine kr explicitly. Suppose that for some radial transformation ⎧ ⎨ α(x) x if x = 0, x φ : IRk → IRk , φ(x) := ⎩ 0 if x = 0, d
with α monotonically nondecreasing, we have Y = φ(X). Examples of this relation include spherically invariant distributions and spherically equivalent distributions, as for example, the uniform distribution on a p-ball in IRk and the product of Weibull distributions (cf. Section 3.2). By (8.2.84), it is easy to see that the pair (X, φ(X)) is an optimal coupling with respect to kr , and so we obtain ' ' (8.2.87) kr (X, Y ) = E 'XXr−1 − φ(X)φ(X)r−1 ' = E |Xr − α(X)r | .
124
8. Probabilistic-Type Limit Theorems
(A related explicit formula was derived for the r distance in Section 3.2.) Note that α is determined by the equation FY (y) = P (α(X) ≤ y) = −1 FX (α−1 (y)), which in the case of FX continuous leads to α(t) = FY ◦ FX (t). We illustrate the above resutls, invoking again the stable limit theorem 8.2.11. Let Θ be a k-dimensional α-stable random vector with spectral measure m such that xα dm(x) < ∞; i.e., IRk
⎧ ⎫ ⎨ 1 ⎬ E exp{it, Θ} = exp − |t, s|α dm(s) . ⎩ 2 ⎭
(8.2.88)
IRk
We apply the LePage representation for symmetric α-stable laws. Let 1. (Yj ) be an i.i.d. sequence of random vectors with distribution m/|m| and let Yj := |m|1/α Yj ; 2. ( ηj ) be i.i.d. symmetric random variables with η1 α = (E| η1 |α )1/α < ∞ and let ηj := ηj/ηj α ; 3. (Γj ) be the sequence of successive times of jump of a standard Poisson process and assume that the three sequences are independent. Then d
Θ = cα
∞
−1/α
Γj
ηj Y j ,
(8.2.89)
j=1
where the constant cα is determined by the tail behavior of the law of Θ. Without loss of generality we set cα = 1 (cf. Ledoux and Talagrand (1991, Section 5.1) and Samorodnitsky and Taqqu (1994)). Suppose that the distribution of X has a similar representation in distribution d
X =
∞
−1/α ∗ ∗ ηj Yj ,
Γj
(8.2.90)
j=1
where (Yj∗ ), (ηj∗ ) are independent but not necessarily identically distributed. Recall the bound (8.2.80) to see that what we need is an estimate for ζr (X, Θ) from above. Proposition 8.2.14 Let r > max{1, α}. Suppose that Yj , Yj∗ , ηj , ηj∗ have finite rth moments with supj E|ηj∗ |r < ∞. Then ⎛ ⎞ ∞ ζr (X, Θ) ≤ C sup ⎝ r (Yj∗ , Yj ) + |x|r−1 |Fηj∗ (x) − Fηj (x)| dx⎠. (8.2.91) j≥1
−∞
8.2 Application to Stable Limit Theorems
125
Proof: By the ideality of ζr , ζr (X, Θ) ≤
∞
−r/α
E(Γj
)ζr (ηj∗ Yj∗ , ηj Yj ).
j=1
Since r > α, and for j > r/α, −r/α
EΓj
the series Sr =
=
Γ(j − r/α) ∼ j −r/α , Γ(j)
∞ j=1
−r/α
E(Γj
) converges. Furthermore,
ζr (ηj∗ Yj∗ , ηj Yj ) ≤ ζr (ηj∗ Yj∗ , ηj∗ Yj ) + ζr (ηj∗ Yj , ηj Yj ) ≤
(E|ηj∗ |r )ζr (Yj∗ , Yj )
+
k
(E|Yj,i |r )ζr (ηj∗ , ηj )
i=1
≤ C ζr (Yj∗ , Yj ) + ζr (ηj∗ , ηj ) , where Yj,i is the ith compoment of Yj . Since ζr (ηj∗ , ηj )
1 1 ≤ kr (ηj∗ , ηj ) = r! (r − 1)!
∞ |x|r−1 |Fηj∗ (x) − Fηj (x)| dx, −∞
we obtain (8.2.91) after applying the inequality ζr (Yj∗ , Yj ) ≤ and Lemma 8.2.13.
1 ∗ r! κr (Yj , Yj )
2
Under the additional assumption sup EYj∗ r < ∞
(8.2.92)
j
we obtain (by the obvious bound r (Yj∗ , Yj ) ≤ (EYj∗ r )1/r + (EYj r )1/r ) the finiteness of the upper bound in the limit theorem in (8.2.62). In this way we establish a stable limit theorem (with an estimate of the rate of convergence) for random vectors in the r -neighborhood of a stable symmetric law in the sense of the LePage representation. For r = 1 we use the estimate in (8.2.85). For r > 1 (and in particular r = 2) explicit expressions for r are known in several cases (cf. Section 3.2), for example for the distance r (X, Y ) between normal distributed random vectors X and Y , between uniform distributions on balls and multivariate normal, or Weibull, distributions, and between spherically equivalent distributions.
126
8. Probabilistic-Type Limit Theorems
8.3 Application to Summability Methods and Compound Poisson Approximation In this section we apply the ϑs,p -metric (see (8.2.2)) to obtain rate of convergence results in stable limit theorems for multivariate summability methods, thus extending some results of Maejima (1985) in the real case. We also study the approximation of sums of independent random variables by compound Poisson distributions. Let (Xn )n≥0 be an i.i.d. sequence of random vectors in IRk and consider the weighted sums T (λ) :=
∞
cj (λ)Xj ,
cj (λ) ≥ 0,
(8.3.1)
j=0
where for λ > 0 or λ ∈ IN, (cj (λ)), j ≥ 0, is a summability method Some classical summability methods are ⎧ ⎨ 1 , 0 ≤ j ≤ n, n+1 (8.3.2) “C´esaro method” cj (λ) = ⎩ 0, otherwise; “Borel method” “Euler method”
λj −λ e , λ > 0, j ∈ IN0 ; j! n j cj (λ) = λ (1 − λ)n−j , j 0 ≤ j ≤ n, 0 < λ < 1;
cj (λ) =
cj (λ) = (1 − e−1/λ )e−j/λ , 0 ≤ j < ∞;
“Abel method”
(8.3.3) (8.3.4)
(8.3.5)
“random walk method” cj (n) = P (Sn = j), 0 ≤ j < ∞, (8.3.6) where Sn is a random walk on the integers IN0 . For a review and discussion of these methods in the univariate case we refer to Maejima (1985). Let Θ(α) denote a random vector with symmetric stable distribution of index α, 0 < α ≤ 2. Recall (see Samorodnitsky and Taqqu (1994)) that for 0 < α < 2, Θ(α) is symmetric α-stable in IRk if and only if there exists a (unique) symmetric finite measure Γ on the unit sphere Sk such that ϕΘ(α) (t)
E exp it, Θ(α) ⎧ ⎫ ⎨ ⎬ = exp − |(t, s)|α Γ( ds) , ⎩ ⎭
=
Sk
(8.3.7) t ∈ IRk .
8.3 Summability Methods, Compound Poisson Approximation
127
Define then ⎞1/α ⎛ ∞ cj (λ)α ⎠ . dα (λ) = ⎝
(8.3.8)
j=0
Theorem 8.3.1 Let 0 < α < r = s − 1 + p1 . Then ϑs,p
1 T (λ), Θ(α) dα (λ)
≤ R(λ)ϑs,p ,
(8.3.9)
where ⎛ ⎞r ∞ R(λ) = ⎝ cj (λ)/dα (λ)⎠,
ϑs,p := ϑs,p (X0 , Θ(α) ).
(8.3.10)
j=0
Proof: Let (Θj ) be an i.i.d. sequence with the same distribution as Θ(α) . Let us show first that ∞
1 d cj (λ)Θj = Θ(α) . dα (λ) i=0
(8.3.11)
Consider the characteristic function of the right-hand side quantity in (8.3.11): E ei dα (λ) 1
∞
j=0
cj (λ)Θj ,t
=
∞ 8
9 i
Ee
cj (λ) Θ ,t dα (λ) j
:
(8.3.12)
j=0
⎧ ⎫ <α ⎨ ; c (λ) ⎬ j exp − = t, s Γ( ds) ⎩ ⎭ dα (λ) j=0 Sk ⎧ ⎫ α ∞ ⎨ ⎬ cj (λ) = exp − |t, s|α Γ( ds) ⎩ ⎭ dα (λ) j=0 ∞ 8
Sk
= ϕΘ(α) (t). By Lemma 8.2.3, ϑs,p is ideal of order r = s − 1 +
1 p
> α. Therefore,
1 T (λ), Θ(α) dα (λ) ⎛ ⎞ ∞ ∞ 1 1 = ϑs,p ⎝ cj (λ)Xj , cj (λ)Θj ⎠ dα (λ) j=0 dα (λ) j=0
ϑs,p
(8.3.13)
128
8. Probabilistic-Type Limit Theorems
≤ dα (λ)−r
∞
cj (λ)r ϑs,p (X0 , Θ(α) )
j=0
= R(λ)ϑs,p (X0 , Θ(α) ). 2 Note that various upper bounds for ϑs,p were established in Section 8.2. In particular, if r ∈ IN, or if X0 has a density, we have obtained upper bounds in terms of difference pseudomoments. Maejima (1985) showed that R(λ) ≤ cλ−(r−α)/α
(8.3.14)
for the C´esaro and Abel methods, and R(λ) ≤ cλ−(r−α)/2α
(8.3.15)
for the random walk method (which includes the Euler method and the Borel method as particular cases). In the Gaussian case, for r = 3 the metric ϑs,p in (8.3.9) is finite, provided that (i) if Cov (X0 ) = Ik , the unity matrix, and (ii) the components have finite third moments. Furthermore, the corresponding rate of convergence is λ−1/2 for the C´esaro and Abel methods and λ−1/4 for the random walk method. We complete this section with an application of the ideality properties of our metrics to the approximation of the distribution of sums of nonidentically distributed random vectors by a compound Poisson law. Let X1 , . . . , Xn be independent random vectors in IRk with distributions P1 , . . . , Pn of the form Pi = (1 − pi )δ0 + pi Qi ,
0 ≤ pi ≤ 1, 1 ≤ i ≤ n.
(8.3.16)
Here, δ0 stands for the one point distribution at zero. We can write Xi in the form Xi = Ci Di ,
1 ≤ i ≤ n,
(8.3.17)
where Ci has distribution Qi , Di is B(1, pi )-distributed, and Ci , Di are independent. We shall consider the approximation of S ind :=
n
Xi
(8.3.18)
i=1
by a multivariate compound Poisson distribution P(µ, Q). P(µ, Q) is defined as the distribution of S coll :=
N i=1
Zi ,
(8.3.19)
8.3 Summability Methods, Compound Poisson Approximation
129
where (Zi ) is an i.i.d. sequence; P Zi = Q, N is Poisson distributed with parameter µ, P N = P(µ); and N , (Zi ) are independent. The notation S ind , S coll is taken from risk theory. Recall that in the risk-theory framework in the “individual model” pi is the probability of a claim Ci with distribution Qi , corresponding to k different types of claims. S coll denotes the approximation of S ind by the “collective model”; we refer to the books of Gerber (1981) and Hipp and Michel (1990) for these and related notions. The usual choice of Q, µ in risk theory is µ=µ :=
n
:= Q=Q
pi ,
i=1
n pi i=1
µ
Qi .
(8.3.20)
This leads to the following representation of S coll : S coll =
n
Sicoll ,
(8.3.21)
i=1
where Sicoll ∼ P(pi , Qi ) (X ∼ Q denoting that X has distribution Q) and and {Sicoll } independent. Note that with this choice µ = µ , Q = Q, moreover, E S ind = E S coll ,
(8.3.22)
if the expectations exist. If Σi = Cov (Ci ), αi = (αi,1 , . . . , αi,k ) = ECi , then n n
pi Σi + pi qi αiT αi , Cov S ind = i=1
(8.3.23)
i=1
while n n
pi Σi + pi αiT αi . Cov S coll = i=1
(8.3.24)
i=1
As a consequence we obtain the following majorization result:
Cov S ind
In particular, ϑs,p S ind , S coll = ∞ if r = s − 1 +
(8.3.25)
1 p
≥ 3.
130
8. Probabilistic-Type Limit Theorems
In Rachev and R¨ uschendorf (1990) it is shown that a better choice of (µ, Q) is possible for k = 1 by appropriately chosen scale transformations. For an extension of this result to k ≥ 1, define µi
:=
µ
:=
(1 − pi )αi , Γi n
µi ,
Q
:=
(1 − pi )Σi ,
:=
n βi Qi , β β i=1
i=1
βi
pi ; 1 − pi n := βi,
:=
(8.3.26)
i=1
where Qi is a probability measure with mean µi and covariance Γi . We approximate Xi by a compound Poisson distributed r.v., Sicoll ∼ P(β i , Qi ). This leads to an approximation of S ind by S coll ∼ P(β, Q).
(8.3.27)
From our construction, it follows that EXi Cov (Xi )
= pi αi = E S coll ;
= Σi = Cov Sicoll = β i Γi + βi µTi µi .
(8.3.28)
The “ideal” properties of the metric ϑs,p derived in Section 8.2 yield closeness between the “individual” and the “collective” models using the following bounds: n
ϑs,p S ind , S coll ≤ ϑs,p Xi , Sicoll
(8.3.29)
i=1
and
ϑ1,p S ind , S coll
≤ A(s, p, k)
1 n
ϑs,p Xi , Sicoll
1 2 pr
.
(8.3.30)
i=1
The constant A(s, p, k) can be determined from (8.2.45). ϑs,p Xi , Sicoll is estimated from above by the metric αs,p (Theorem 8.2.5), which depends only on pseudo-difference moments. In particular, for r = s − 1 + p1 = s and bi = EXi s1 , ϑs,p (Xi , Sicoll ) ≤ pi bi
1 (s − 1)!
(cf. (8.2.11)).
(8.3.31)
Define the normalizations i = Xi − EXi , X
Yi = Sicoll − E Sicoll .
(8.3.32)
Consider the i.i.d case; we shall establish an estimate similar to that in (8.3.29) but without the dependence on n in the upper bound.
8.4 Operator-Stable Limit Theorems
131
Theorem 8.3.2 Suppose (Xn ) is an i.i.d. sequence with α = E Ci , Σ = Cov (Ci ), p = P (Di = 1). Then
(8.3.33) ϑ1,1 S ind , S coll ≤ C(ϑr + ϑr + τr + τr ), 1 , Θ), ϑr = ϑr (Y1 , Θ), τr = τr (X 1 , Θ), τr = τr (Y1 , Θ), C where ϑr = ϑr (X is as defined in (8.2.62), and Θ is an N (0, Σ)-distributed r.v. Proof: The ideality of ϑ1,1 and the triangle inequality yield 1 n 2 n
ind coll coll ϑ X, S S ,S = ϑ 1,1
1,1
= ϑ1,1
i
i=1 1 n
i , X
i=1
≤
√
1
nϑ1,1
1 √ n
i
i=1 n i=1 n i=1
2 Yi 2 i , Θ X
+
√
1 nϑ1,1
2 n 1 √ Yi , Θ , n i=1
where Θ is normally distributed with mean zero and covariance Σ. The proof of Theorem 8.2.11 (in the case α = 2 with Σ being the identity matrix) extends to the general case with the same constant, thus implying (8.3.33). 2
8.4 Application to Operator-Stable Limit Theorems: Statement of the Results and Auxiliary Lemmas In this section we generalize some of the results from Section 8.2, studying the rate of convergence problem for a more general summation scheme for random vectors. Namely, suppose that the IRd -valued random vector θ is strictly operator-stable in the sense that µ 4, the characteristic function of ∗ θ, satisfies µ 4(z)t = µ 4(tB z) for every t > 0, for some invertible linear operator B on IRd . Suppose also that for the i.i.d. random vectors {Xi } in n w IRd , n−B i=1 Xi → θ. In this and the next section we study the rate of convergence of this operator-stable limit theorem in terms of several probability metrics including the Kantorovich metric and Lp -minimal versions, see (2.6.2). The results in this and the next section are due to Maejima and Rachev (1996). We start with some definitions and notation related to operator-stable limit theorems. A probability distribution µ on IRd is said to be full if µ is
132
8. Probabilistic-Type Limit Theorems
not concentrated on a proper hyperplane in IRd . A full distribution µ on IRd is called operator-stable if there exists an invertible linear operator B on IRd and a function b : (0, ∞) → IRd such that for all t > 0, ∗
4(tB z)eib(t) , µ 4(z)t = µ
for all z ∈ IRd .
(8.4.1)
Here µ 4 is the characteristic function of µ, B ∗ is the adjoint operator of B, ∞ A and t = exp{(ln t)A} = k=0 (k!)−1 (ln t)k Ak . The distribution µ is called strictly operator-stable if we can choose b(t) ≡ 0. In this section, we always assume that µ is a full strictly operator-stable distribution on IRd . Sharpe (1969) showed that if 1 is not in the spectrum of B, then the operatorstable law can be centered so as to become strictly operator-stable. Thus, the assumption of strict operator-stability is not so restrictive. The invertible linear operator B in (8.4.1) is called an exponent of µ. When µ is operator-stable with an exponent B, µ may satisfy (8.4.1) for other B’s; i.e., the exponent of µ is not necessarily unique. Further, we fix the value of the exponent B and denote by θ the random vector in IRd having the full strictly operator-stable distribution µ with this fixed B. It is known that every eigenvalue of B has its real part not less than 12 (see Sharpe (1969)). Recall that for a given sequence X1 , X2 , . . . of i.i.d. random vectors in IRd for which the normalized sum converges to θ, namely, n−B
n
w
Xi → θ,
(8.4.2)
i=1
we say that {Xi } belongs to the domain of normal attraction of µ. As in the previous sections, we will be interested in the rate of convergence of (8.4.2). Remark 8.4.1 Some of the results in this section can be extended to Banach space–valued random variables. Also, our arguments can be used for similar rate of convergence problems of the max-operator-stable limit theorem. We start with some notation. Let · 0 be the usual Euclidean norm of IRd and let S(µ) be the symmetry group associated with µ, that is, the group of all invertible linear operators A on IRd such that for some a ∈ IRd , µ 4(z) = µ 4(A∗ z)eia . Since by assumption µ is full, S(µ) is compact, and thus there exists a Haar probability H on S(µ). We introduce the following norm · , which depends on the particular operator-stable law µ but not on the choice of exponent: 1 x =
gtB x0 S(µ) 0
dt dH(g) t
(8.4.3)
8.4 Operator-Stable Limit Theorems
133
(see Hudson et al. (1986) and Hahn et al. (1989)). It has the following properties: (i) · does not depend on the choice of the exponent B. (ii) The map t → tB x is strictly increasing on (0, ∞) for x = 0. Define the norm of the linear operator A on IRd in the usual way by A = supx=1 Ax. Then property (ii) implies (iii) The map t → tB is strictly increasing on (0, ∞); i.e., t → t−B = (t−1 )B is strictly decreasing on (0, ∞). Further, we will need estimates of the growth rate of R(t) = tB x. Meerschaert (1989) showed that for every x, the function R0 (t) = tB x0 varies regularly with index between λB and ΛB , where λB and ΛB are the minimum and the maximum of the real parts of the eigenvalues of B, respectively. Clearly, for every norm · on IRd , the function R(t) = tB x will be of the same order as the regularly varying function R0 (t). In particular, for any η > 0, there exists t0 > 0 such that for any t > t0 , tλB −η x < tB x < tΛB +η x,
(8.4.4)
t−ΛB −η x < t−B x < t−λB +η x.
(8.4.5)
and
Let X (IRd ) be the class of all random vectors in IRd , and the Kolmogorov metric in X (IRd ), (X, Y ) :=
sup |P (X ≤ x) − P (Y ≤ x)|.
(8.4.6)
x∈IRd
Here, and throughout this section, x ≤ y or x < y, x, y ∈ IRd , means component-wise inequality. Also, all the probability metrics µ that we shall use are in fact metrics in the space of probability laws: We write µ(X, Y ) instead of µ(PX , PY ) only for the sake of simplicity, where PX , PY stand for the probability distributions of X, Y , respectively. Next, we define a uniform metric depending on the exponent B, ∗ (X, Y ) := sup (tB X, tB Y ).
(8.4.7)
t>0
This metric plays a crucial role in our approach to the rate of convergence problem (8.4.2). Let Var be the total variation distance in X (IRd ),
134
8. Probabilistic-Type Limit Theorems
Var(X, Y ) := 2
sup A∈B(IRd )
|P (X ∈ A) − P (Y ∈ A)|
(8.4.8)
|PX − PY |( dx)
= IRd
=
sup{|Ef (X) − Ef (Y )|; f : IRd → IR, continuous, |f (x)| ≤ 1 for all x ∈ IRd }.
Remark 8.4.2 It is not difficult to check that ∗ is topologically “between” and Var; that is, top
top
≺ ∗ ≺ Var . top
Here we use the standard notation µ ≺ ν, meaning that ν-convergence implies µ-convergence, but the inverse is not generally valid. Remark 8.4.3 Our aim is to present a general approach to the rate of convergence problems associated with (8.4.2) that is designed to work for different metrics in terms of which we want to obtain estimates of the rate of convergence. We start with uniform-type metrics (, ∗ , Var), and then we will proceed with Kantorovich-type minimal distances. For r > 0, define a convolution-type metric associated to Var: µr (X, Y ) := sup tB −r Var (tB X + θ, tB Y + θ).
(8.4.9)
t>0
Here and in what follows, the notation X1 + X2 means the sum of two independent random vectors X1 and X2 . We shall first list our results and then prove them, extending the general method we have outlined in Section 8.1. Theorem 8.4.4 Let θ be a full strictly operator-stable random vector in B (≥ λ1B ) and take p such that IRd , and B an exponent of θ. Let r > Λ λ2 1 λB
λB ΛB r.
B
d Let {Xi }∞ i=1 be a sequence of i.i.d. random vectors in IR
∗ = ∗ (X1 , θ),
µ1 = µ1 (X1 , θ),
µr = µr (X1 , θ),
satisfying the moment-type condition 1 r−p ∗ < ∞. τr = τr (X1 , θ) := max , µ1 , µr
(8.4.10)
(8.4.11)
8.4 Operator-Stable Limit Theorems
135
Then for some absolute constant K = K(d, B, r, p) > 0, 1 n−B
n
1
2 Xi , θ
i=1
≤ ∗
n−B
≤ K nn
n
2 Xi , θ
i=1 −B r
µr + n−B τr
for all n ≥ 1.
Remark 8.4.5 In our theorem, we do not explicitly assume that {Xi } belongs to the domain of normal attraction of θ. However, since λB > 12 and rλB > 1, nn−B r µr + n−B τr → 0
as
n → ∞,
because of (8.4.5). Consequently, conditions (8.4.10) and (8.4.11) are sufficient for {Xi } to be in the domain of normal attraction of θ. As to the decreasing rate of n−B , by (8.4.5), for every η > 0, there exists n0 such that n−B ≤ n−λB +η for every n ≥ n0 . However, we also see that for any η > 0, n−B ≤ M n−λB +η for all n ≥ 1, where M = supt≥1 t−B+(λB −η)I (< ∞). Note, however, that rate of convergence theorems typically describe only a relatively small subset of that domain of attraction. Letting B =
1 αI
, 0 < α ≤ 2, we have the following:
Corollary 8.4.6 Let θ be a strictly α-stable random vector with index 0 < α ≤ 2 . Let α < p < r and {Xi } be a sequence of i.i.d. random vectors in IRd satisfying τr < ∞. Then for some absolute constant K = K(d, α, p) > 0, 1 ∗
−1/α
n
n
2 Xi , θ
r 1 ≤ K n1− α µr + n− α τr
for all n ≥ 1. (8.4.12)
i=1
Resnick and Greenwood (1979) studied the limit theorem for (α1 , α2 )stable laws, which corresponds to the operator-stable limit theorem with exponent ⎞ ⎛ 1/α1 0 ⎠. B=⎝ 0 1/α2 Theorem 8.4.4 provides a bound for the rate of convergence in this particular case. Corollary 8.4.7 Let θ = (θ(1) , θ(2) ) be a strictly (α1 , α2 )-stable bivariate α2 vector, 0 < α1 ≤ α2 ≤ 2. Let r > α21 (≥ α2 ) and take p such that α2 <
136
8. Probabilistic-Type Limit Theorems (1)
(2)
1 p< α α2 r. Let {Xi = (Xi , Xi )}i≥1 be a sequence of i.i.d. random vectors satisfying τr < ∞. Then for all n ≥ 1, 11 2 2 n n
(1) (2) ∗ −1/α1 −1/α2 Xi , n Xi n , θ ≤ K n1−r/α1 µr + n−1/α1 τr .
i=1
i=1
We next state our results on the rates of convergence in another type of uniform metric: the total variation distance Var and the uniform distance between characteristic functions. Let ' ' 'r ' ' −B 'r b = 54 '2−B ' ' 25 r > λ1B , ' , (8.4.13) ' B 'r ' B 'r ' ' r 1 ' −B ' 2 c = '2 ' + '3 ' , a = bc , and
' 1 'r ' ' M = sup 'x r I−B ' (< ∞).
(8.4.14)
x≥1
d Theorem 8.4.8 Let {Xi }∞ i=1 be a sequence of i.i.d. random vectors in IR satisfying
νr = νr (X1 , θ) := max{Var(X1 , θ), µr } ≤ Then
1 −B
Var n
n
2 Xi , θ
a . M
' 'r ≤ cn 'n−B ' νr
i=1
≤
' ' ' 1 ' '2−B 'r n 'n−B 'r bM
(8.4.15)
(8.4.16) for all n ≥ 1.
It would be interesting to have a version of this theorem without condition (8.4.15). Our next theorem concerns the rate of convergence of a third type of uniform metric χ that lies “between” and Var, top
top
≺ χ ≺ Var, namely, the uniform distance between characteristic functions: χ(X, Y ) := sup |φX (s) − φY (s)|, φX (s) := E eis,X , (8.4.17) s∈IRd
where ·, · is the inner product in IRd . The corresponding “tB -uniform” (recall the definition of ∗ (8.4.7)) and “smoothed” versions of χ are defined by χ∗ (X, Y ) := sup χ(tB X, tB Y ) t>0
(8.4.18)
8.4 Operator-Stable Limit Theorems
137
and ' '−r χr (X, Y ) := sup 'tB ' χ∗ tB X + θ, tB Y + θ .
(8.4.19)
t>0 d Theorem 8.4.9 Let {Xi }∞ i=1 be a sequence of i.i.d. random vectors in IR satisfying
νr∗ = νr∗ (X1 , θ) := max{χ∗ (X1 , θ), χr (X1 , θ)} ≤ Then for all n ≥ 1, 2 1 n ' 'r ∗ −B χ n Xi , θ ≤ cn 'n−B ' νr∗ ≤ i=1
a . M
' ' ' 1 ' '2−B 'r n 'n−B 'r . bM
Let us now denote the density of the random vector X by pX (x) (when it exists) and define the fourth type of uniform metric d(X, Y ) := ess sup |pX (x) − pY (x)|.
(8.4.20)
x∈IRd
This is “topologically” the strongest: top
top
top
≺ χ ≺ Var ≺ d. Let K(d, B) :=
max
max xB ij
2≤x≤3 1≤i,j≤d
(8.4.21)
(< ∞),
where Aij is the (i, j) component of the matrix A, and put C(d, B) = d! K(d, B)d . Let dr be the smoothed version of d, ' '−r dr (X, Y ) := sup 'tB ' d tB X + θ, tB Y + θ .
(8.4.22)
(8.4.23)
t>0
Applying Theorem 8.4.8, we obtain the following rate of convergence bound in the local central limit theorem for operator-stable random vectors. Theorem 8.4.10 Suppose X1 has a density. Let ' 'r 6 ' 'r A = max C(d, B) '2B ' + '3B ' , 1 5 and ' 'r D ≥ C(d, B) '3B ' .
138
8. Probabilistic-Type Limit Theorems
If Tr = Tr (X1 , θ) := max{d(X1 , θ), dr (X1 , θ)} < ∞ and νr ≤ min
a 1 , M M cD
,
then 1 −B
d n
n
2 Xi , θ
≤ Ann−B r Tr
for all n ≥ 1.
i=1
Remark 8.4.11 Operator-stable random vectors have bounded densities (Hudson (1980)). The rest of our rate of convergence results are concerned with the minimal Lp -metrics, 0 ≤ p ≤ ∞, and in particular with 1 , the Kantorovich metric, 4 1 ; see Section 8.1. Recall that the total variation distance Var is
1 = L in fact the minimal L0 -metric. Recall the definition of the Lp -compound metric: For any X, Y ∈ X (IRd ), (8.4.24) Lp (X, Y ) := {E[X − Y p ]}min(1,1/p) , 0 < p < ∞, L0 (X, Y ) := E[I[X = Y ]] = P (X = Y ), (8.4.25) L∞ (X, Y ) := ess sup X −Y = inf{ε > 0; P (X −Y > ε) = 0}, (8.4.26) where I[A] is the indicator function of a set A. As always in this book, we assume that all random vectors X ∈ X (IRd ) are defined on a nonatomic probability space (Ω, A, P ); in this way the space of all joint laws PX,Y coincides with the space of all probability measures on IR2d . The Lp -minimal metric for 0 ≤ p ≤ ∞ was defined in (8.2.23): 4 p (X, Y ) = L 4 p (PX , PY ) := inf Lp (X, Y ), (8.4.27)
p (X, Y ) = L d where the infimum is taken over all PX, Y with fixed marginals X ∼ X, d Y ∼ Y . Remark 8.4.12 For every p ∈ [0, ∞] fixed, we shall be interested in the 4 p n−B n Xi , θ → 0. As a consequence, we rate of convergence of L i=1 shall derive the rate of convergence results in terms of the Prohorov metric π(X, Y ) = inf{ε; P (X ∈ A) ≤ P (Y ∈ Aε ) + ε for all A ∈ B(IRd )}, (8.4.28) where Aε := {x; x − A ≤ ε}.
8.4 Operator-Stable Limit Theorems
139
4 0 = 1 Var, and so Theorem 8.4.8 gives the Case p = 0. For p = 0, L 2 desired bound for the rate of convergence. Case 0< p ≤ 1. Suppose first that B, the exponent of θ, satisfies ΛB 1 ≤ < p ≤ 1. (8.4.29) λB λ2B Then by (8.4.5), nn−B p → 0 as n → ∞. Theorem 8.4.13 Suppose 0 < p ≤ 1 and (8.4.29) holds. Let X, X1 , X2 , . . . be a sequence of i.i.d. random vectors satisfying 4 p := L 4 p (X, θ) < ∞. L
(8.4.30)
Then 1 4p L
n
−B
n
2 Xi , θ
' 'p 4p , ≤ n 'n−B ' L
i=1
and furthermore, 2 1 n 1 ' ' p 1 −B 4 pp+1 . Xi , θ ≤ n p+1 'n−B ' p+1 L π n i=1
In the case where (8.4.29) is not satisfied, we shall prove a result similar to that in Theorem 8.4.10. Define the convolution-type metric associated 4 p : For r > 0, with L 4 p (tB X + θ, tB Y + θ). 4 p,r (X, Y ) := sup tB −r L L
(8.4.31)
t>0
Theorem 8.4.14 Let 0 < p ≤ 1 and X, X1 , X2 , . . . be a sequence of i.i.d. random vectors in IRd . Let r > λ1b , ' 'p ' 'r 6 ' 'r A = max '2−B ' '2B ' + '3B ' , 1 5 and ' 'p ' 'r D ≥ '2−B ' '3B ' . Suppose 4 p (X, θ), L 4 p,r (X, θ)} < ∞ Rp,r = Rp,r (X, θ) := max{L
140
8. Probabilistic-Type Limit Theorems
and
νr ≤ min
a 1 , M M cD
,
where a, c, and M are defined in (8.4.13) and (8.4.14). Then, for all n ≥ 1, 1 π n−B
n
2p+1 Xi , θ
1 4p ≤ L
n−B
i=1
n
2 Xi , θ
≤ Ann−B r Rp,r .
i=1
a In the next theorem we shall relax the assumption νr ≤ min M , M1cD at the cost of losing a little of the order of convergence nn−B r . The next result has a form resembling Theorem 8.4.4. Theorem 8.4.15 Let 0 < p ≤ 1. Let r > 1 λB
λB ΛB r.
Let
{Xi }∞ i=1
ΛB (≥ λ1B ), λ2B
and take q such that
be a sequence of i.i.d. random vectors in IRd
4 p,r = L 4 p,r (X1 , θ) < ∞ L and
Qp,r
1
4 p , µ1 , µrr−q = Qp,r (X1 , θ) := max L
< ∞,
4 p (X1 , θ), µ1 = µ1 (X1 , θ), and µr = µr (X1 , θ). Then for some 4p = L where L absolute constant K = K(d, B, r, p, q) > 0, 1 2 n
' ' 'r ' 4 p,r + 'n−B 'p Qp,r , 4 p n−B L Xi , θ ≤ K n 'n−B ' L i=1
for all n ≥ 1. Case 1< p ≤2. For 1 < p ≤ 2, we use a completely different method in the rate of convergence problem, which relies on the minimality property 4 p and was introduced in Rachev and R¨ uschendorf (1992). of L B (≥ Theorem 8.4.16 Suppose 1 < p ≤ 2 and p > Λ λ2B X2 , . . . be a sequence of i.i.d. random vectors satisfying
4p = L 4 p (X, θ) < ∞ L
and
E[X − θ] = 0.
Then there exists Cr > 0 such that for all n ≥ 1, 2 1 n ' 1 ' −B 4p n 4p , L Xi , θ ≤ Cp n p 'n−B ' L i=1
1 λB ).
Let X, X1 ,
8.4 Operator-Stable Limit Theorems
141
and moreover, the right-hand side vanishes as n → ∞. Furthermore, 2 1 n p p ' ' p 1 −B 4 pp+1 . Xi , θ ≤ Cpp+1 n p+1 'n−B ' p+1 L π n i=1
Corollary 8.4.17 Let θ = (θ(1) , θ(2) ) be a strictly (α1 , α2 )-stable bivariα2 ate vector, 0 < α1 ≤ α2 ≤ 2. Let 2 ≥ p > α21 (≥ α2 ). Let {Xi = (1) (2) 4p = (Xi , Xi )}i≥1 be a sequence of i.i.d. random vectors satisfying L 4 p (X1 , θ) < ∞, and if p > 1, we additionally assume that E[X1 − θ] = 0. L Then for all n ≥ 1, 2 1 n n (1) (2) −1/α1 −1/α2 Xi , n Xi ), θ π (n i=1
i=1
11 4p ≤ L
n−1/α1
n
(1)
Xi , n−1/α2
i=1
≤
⎧ 1 p ⎪ ⎨ n1− α1 L 4 p p+1 p
⎪ ⎩ Cp n p1 − α11 L 4 p p+1
n
2 (2)
Xi
2 max(1,p) p+1 , θ
i=1
for
0 < p ≤ 1,
for
1 < p ≤ 2.
4 p can be Remark 8.4.18 Our approach based on the use of “ideality” of L =n extended to bound the distance between the maxima MX (n) := n−B k=1 k = = n k n −B i=1 Xi and Mθ (n) := n k=1 i=1 θi . (Here k=1 stands for the componentwise maximum, and {θi } are i.i.d. copies of θ.) Also, we can >n k >n k −B compare>mX (n) := n−B k=1 i=1 Xi with mθ (n) := n k=1 i=1 θi n (where k=1 stands for the componentwise minimum) and aX (n) := n−B ' ' k =n ' =n ' −B ' ' ' k θi '. k=1 i=1 Xi with aθ (n) := n k=1 i=1 Theorem 8.4.19 Let 1 < p ≤ 2 and θ be a full strictly operator-stable random vector with exponent B such that n2 n−B p → 0
as
n → ∞.
Let X, X1 , X2 , . . . be i.i.d. random vectors in IRd with E[X − θ] = 0 and such that Lp (X, θ) < ∞. Then there exists Cp,d > 0 such that for every n ≥ 1, % & 4 p (M (n), M (n)) , L 4 p (m (n), m (n)) , L 4 p (a (n), a (n)) max L X X X θ θ θ ' ' 4 p (X, θ). ≤ Cp,d n2/p 'n−B ' L
142
8. Probabilistic-Type Limit Theorems
Remark 8.4.20 Note that aX (n) and aθ (n) are positive random variables, and therefore ⎛ 1 ⎞1/p p 4 p (a (n), a (n)) = ⎝ F −1 (t) − F −1 (t) dt⎠ , L X θ a (n) a (n) X
θ
0 −1 is the generalized inverse of FX ; cf. Theorem 3.1.2. Also, from where FX the above bound we can get the rate of convergence for π by making use of p
the bound π ≤ Lpp+1 . Let us compare Theorem 8.4.16 with a similar result on the rate of convergence in terms of the Zolotarev metric ζr , r > 0; see (8.2.1). Theorem 8.4.21 Let X, X1 , X2 , . . . be i.i.d. random vectors in IRd , and r a positive constant satisfying the conditions ζr := ζr (X, θ) < ∞
and
nn−B r → 0 as n → ∞.
Then for every n ≥ 1, 2 1 n −B Xi , θ ≤ nn−B r ζr , ζr n i=1
and for some Cr > 0, 2 1 n 1 1 ' ' r 1 Xi , θ ≤ Crr+1 n r+1 'n−B ' r+1 ζrr+1 . π n−B i=1
It is known that if r is an integer, then ζr on the right-hand sides of the above bounds can be estimated by the rth difference pseudomoment κr from above (see Zolotarev (1993)). Namely, if all mixed moments of order less than or equal to r − 1 for X and Y agree, then ζr (X, Y ) ≤
1 κr (X, Y ), r!
r ∈ IN,
(8.4.32)
where κr is rth difference pseudomoment % κr (X, Y ) = sup |E[f (X) − f (Y )]| ; f : IRd → IR, (8.4.33) ' ' & ' r−1 r−1 ' |f (x) − f (y)| ≤ 'x x − y y ' for all x, y ∈ IRd . For arbitrary r > 0, ζr is bounded from above by the absolute pseudomoment, namely, if all mixed moments of X and Y of order less than or equal to m (r = m + α, m ∈ IN, α ∈ (1, 2]) agree, then ζr ≤
Γ(1 + α) ξr , Γ(1 + r)
(8.4.34)
8.4 Operator-Stable Limit Theorems
where ξr is the rth absolute pseudomoment xr |PX − PY |( dx). ξr (X, Y ) :=
143
(8.4.35)
IRd
Let us now compare the rate of convergence in Theorem 8.4.21 with that in Theorem 8.4.16 for r = p ∈ (1, 2]. Recall that (8.3.32) is true only for r ∈ IN, and the known estimates for ζr from above by κr (r being noninteger) involve E[Xr ] and E[Y r ]. However, for any p ≥ 1, 4 pp (X, Y ) ≤ 2p κp (X, Y ) ≤ 2p ξp (X, Y ). L
(8.4.36)
4 p (X, θ) < ∞ in Theorem 8.4.16 is preferable Therefore, the restriction L θ) < ∞ in Theorem 8.4.21. On the other hand, the estimate for to ζr (X, n ζr (n−B i=1 Xi , θ) holds for any r > 0 and provides us with the exact order of convergence (as n → ∞) under the assumption ζr (X1 , θ) < ∞. Case 2< p ≤ ∞. Theorem 8.4.22 T8.4.21 Let θ be a full strictly operator-stable random vector that does not have a Gaussian component, or equivalently, whose exponent B satisfies n1/2 n−B → 0
as
n → ∞.
4 p (X, θ) < ∞. Then Let X, X1 , X2 , . . . be i.i.d. random vectors such that L for some C(d, p) > 0, 1 2 n −B 4 4 p (X, θ). Lp n Xi , θ ≤ C(d, p)n1/2 n−B L i=1
Before starting with the proof of our theorems, we introduce a notion of ideality for a probability metric, designed for problem (8.4.2). Definition 8.4.23 A metric ζ : X (IRd ) × X (IRd ) → [0, ∞) is called operator-ideal of order r ≥ 0 if (i) (homogeneity) ζ(aB X, aB Y ) ≤ aB r ζ(X, Y ) for any a > 0, and (ii) (regularity) ζ(X + Z, Y + Z) ≤ ζ(X, Y ) for any Z independent of X and Y . We next show a few lemmas needed for the proof of our main results.
144
8. Probabilistic-Type Limit Theorems
Lemma 8.4.24 , κr , and are regular (that is, (ii) holds); ∗ , Var, and χ∗ are operator-ideal of order r = 0; and ≤ ∗ ≤ 12 Var, χ ≤ χ∗ ≤ Var. This follows directly from the definitions of the metrics. 4 p,r , and ζr are operator-ideal of order r > 0. Lemma 8.4.25 µr , χr , dr , L 4 Lp is operator-ideal of order p ∧ 1. Proof: We first show the operator-ideality of µr . For any a > 0, ' '−r µr (aB X, aB Y ) = sup 'tB ' Var((ta)B X + θ, (ta)B Y + θ) t>0
' '−r 1 2 B B ' t B' t t ' ' a X + θ, a Y +θ = sup ' ' Var ' a a t>0 ' a ' '−r ≤ aB r sup 'tB ' Var(tB X + θ, tB Y + θ), t>0
since ' B ' ' B '−1 't ' 'a '
' ' ' ' '−1 ' = 'tB a−B aB ' 'aB ' ≤ 'tB a−B ' = aB r µr (X, Y ),
which shows the homogeneity of µr of order r > 0. We also have µr (X + Z, Y + Z) = sup tB −r Var(tB (X + Z) + θ, tB (Y + Z) + θ) t>0
≤ sup tB −r Var(tB X + θ, tB Y + θ) t>0
= µr (X, Y ), since tB Z is independent of tB X and θ, and Var is regular. This demon4 p,r strates the regularity of µr . One can check the ideality of χr , dr , and L in a similar fashion. We next show the operator-ideality of ζr . The regularity of ζr is known. As for the homogeneity, we have % ζr (aB X, aB Y ) = sup |E[f (aB X) − f (aB Y )]|; & f (r−1) (x) − f (r−1) (y) ≤ x − y . (8.4.37) Let fa (x) := f (aB x). Then (r−1) fa(r−1) (x)(h)(r−1) = f (r−1) aB x aB h ,
8.4 Operator-Stable Limit Theorems
145
implying that ' ' ' 'r−1 ' ' ' (r−1) ' ' (r−1) B ' a x − f (r−1) aB y ' . (x) − fa(r−1) (y)' ≤ 'aB ' 'fa 'f Then the side condition in (8.4.37), ' ' ' (r−1) ' (x) − f (r−1) (y)' ≤ x − y, 'f results in ' ' ' ' ' 'r−1 ' B ' ' (r−1) ' 'a x − aB y ' ≤ 'aB 'r x − y. (x) − fa(r−1) (y)' ≤ 'aB ' 'fa Consequently, by (8.4.37), ζr aB X, aB Y % & ≤ sup |E[fa (X) − fa (Y )]| ; fa(r−1) (x) − fa(r−1) (y) ≤ aB r x − y = aB r ζr (X, Y ), which shows the regularity of ζr .
2
Lemma 8.4.26 If Z, W ∈ X (IRd ) are independent of X, Y ∈ X (IRd ), then ∗ (X + Z, Y + Z) ≤ ∗ (Z, W ) Var(X, Y ) + ∗ (X + W, Y + W ) and ∗ (X + Z, Y + Z) ≤ ∗ (X, Y ) Var(Z, W ) + ∗ (X + W, Y + W ). Lemma 8.4.27 If Z, W ∈ X (IRd ) are independent of X, Y ∈ X (IRd ), then Var(X + Z, Y + Z) ≤ Var(Z, W ) Var(X, Y ) + Var(X + W, Y + W ) and χ∗ (X + Z, Y + Z) ≤ χ∗ (Z, W )χ∗ (X, Y ) + χ∗ (X + W, Y + W ). Lemma 8.4.28 If Z, W ∈ X (IRd ) are independent of X, Y ∈ X (IRd ), then d(X + Z, Y + Z) ≤ d(Z, W ) Var(X, Y ) + d(X + W, Y + W ), d(X + Z, Y + Z) ≤ d(X, Y ) Var(Z, W ) + d(X + W, Y + W ), 4p . and for 0 < p ≤ 1 both inequalities hold with d replaced by L
146
8. Probabilistic-Type Limit Theorems
Proof: The proofs are very similar to those in Lemma 8.1.15; cf. also Lemma 2 in Senatov (1980) or Lemmas 14.3.3 and 14.3.6 in Rachev (1991). 4p , 0 < We shall demonstrate only the proof of the smoothing inequality for L 4p : p ≤ 1. We use the dual representation for L 4 p (X + Z, Y + Z) L = sup |E[f (X + Z) − f (Y + Z)]| f ∈ Lipb (p)
(recall that Lipb (p) consists of all bounded continuous functions on IRd satisfying |f (x) − f (y)| ≤ x − yp for all x, y ∈ IRd ) = sup PZ ( dz)(E[f (X + z)] − E[f (Y + z)]) f ∈ Lipb (p)
≤
sup (PZ − PW )( dz) (E[f (X + z)] − E[f (Y + z)]) f ∈ Lipb (p) + sup PW ( dz) (E[f (X + z)] − E[f (Y + z)])
≤
f ∈ Lipb (p)
|PZ − PW |( dz)
sup f ∈ Lipb (p)
|(E[f (X + z)] − E[f (Y + z)])|
+ Lp (X + W, Y + W ) =
Var(Z, W )Lp (X, Y ) + Lp (X + W, Y + W ),
as desired.
2
4 p , 0 < p ≤ 1) Let θ Lemma 8.4.29 (Smoothing inequalities for ∗ and L d and θ1 be independent random vectors in IR having the same full strictly operator-stable distribution with exponent B. Then for any X ∈ X (IRd ) independent of θ1 and δ > 0, ∗ (X, θ) ≤ C1 ∗ (X + δ B θ1 , θ + δ B θ1 ) + C2 δ B , and for 0 < p ≤ 1, if E[θp ] < ∞, 4 p (X, θ) ≤ Lp (X + δ B θ1 , θ + δ B θ1 ) + 2δ B p E[θp ]. L Here and in what follows, the Ci ’s are absolute constants depending only on d and B, unless stated otherwise explicitly. B = tB Proof: Fix ε ∈ (0, 1) and choose X ε X, θ = tε θ for some tε > 0 such ∗ that (X, θ) ≤ (X, θ) + ε. We first show the inequality
≤ C1 (X + δ B θ1 , θ + δ B θ1 ) + C2 δ B θ) (X,
(8.4.38)
8.4 Operator-Stable Limit Theorems
147
d = (cX, for any c > 0, we can shrink X, θ) cθ) θ, for θ1 ∼ θ. Since (X, and θ1 without any loss of generality. So we assume
< 1) > P (θ
2 . 3
(8.4.39)
For brevity we shall delete the “ ” from now on. Let θ(i) ∈ IR, i = 1, . . . , d, be the ith component of the operator-stable random vector θ ∈ IRd . Then for each i = 1, . . . , d, θ(i) has a bounded density; that is, for some M < ∞, d (i) P (θ ≤ x) =: sup |pθ(i) (x)| ≤ M for all i. sup x∈IR dx x∈IR (See Hudson (1976).) The idea of the following proof is taken from Lemma 12.1 in Bhattacharya and Rao (1976). First consider the case := (X, θ)
=
sup |P (X ≤ x) − P (θ ≤ x)|
(8.4.40)
x∈IRd
= − inf (P (X ≤ x) − P (θ ≤ x)) . x∈IRd
Given η ∈ (0, ), there exists x0 ∈ IRd such that P (X ≤ x0 ) − P (θ ≤ x0 ) < − + η.
(8.4.41)
We then have I := P (X + δ B θ1 ≤ x0 − δ B e) − P (θ + δ B θ1 ≤ x0 − δ B e) P (X + z ≤ x0 − δ B e)−P (θ + z ≤ x0 − δ B e) P (δ B θ1 ∈ dz) = IRd
=
+
E
,
Ec
where E := {z ∈ IRd − δ B e < z < δ B e} and e = (1, 1, . . . , 1)t ∈ IRd . Then estimating both integrals in the representation for I, we get P (X ≤ x0 ) − P (θ ≤ x0 − z − δ B e) P (δ B θ1 ∈ dz) I ≤ E
+ P (δ B θ1 ∈ E c ).
(8.4.42)
148
8. Probabilistic-Type Limit Theorems
To estimate the last term observe that β := P (δ B θ1 ∈ E) ≥ P (δ B θ1 < δ B ) >
2 , 3
(8.4.43)
by (8.4.39). On the other hand, denoting the distribution function of θ by F (x), x = (x(1) , x(2) , . . . , x(d) )t ∈ IRd , and ε = (ε(1) , ε(2) , . . . , ε(d) )t ∈ IRd , we have |P (θ ≤ x + ε) − P (θ ≤ x)| d ≤ F (x(1) , . . . , x(i−1) , x(i) + ε(i) , . . . , x(d) + ε(d) ) i=1
≤
d
−F (x(1) , . . . , x(i) , x(i+1) + ε(i+1) , . . . , x(d) + ε(d) ) P (θ(i) ∈ Ii ).
i=1
Here Ii := (x(i) , x(i) + ε(i) ] or := (x(i) + ε(i) x(i) ] depending on the sign of ε(i) . Therefore, d
P (θ(i) ∈ Ii ) ≤
i=1
d
|ε(i) | sup |pθ(i) (x)| ≤ M ε1 ,
i=1
x∈IR
where · 1 is the L1 -norm. Hence, −P (θ ≤ x + ε) ≤ −P (θ ≤ x) + M ε1 .
(8.4.44)
Thus we have, by (8.4.41), (8.4.42), and (8.4.44) with ε = −z − δ B e, that I ≤ P (X ≤ x0 ) − P (θ ≤ x0 ) + M (z1 + dδ B ) P (δ B θ1 ∈ dz) E
≤
+ P (δ B θ1 ∈ E c ) − + η + M (z1 + dδ B ) P (δ B θ1 ∈ dz) + P (δ B θ1 ∈ E c ).
E
Since z1 ≤ dδ B on E, it follows that I
≤ (− + η + 2M dδ B )P (δ B θ1 ∈ E) + P (δ B θ1 ∈ E c ) ≤ (1 − 2β) + η + 2M dδ B .
Consequently, (2β − 1) ≤ (X + δ B θ1 , θ + δ B θ1 ) + 2M dδ B + η.
8.4 Operator-Stable Limit Theorems
149
Since η can be taken arbitrarily small, we have 1 (X + δ B θ1 , θ + δ B θ1 ) + 2M dδ B . 2β − 1 Since β > 23 by (8.4.43), ≤ 3 (X + δ B θ1 , θ + δ B θ1 ) + 2M dδ B . This proves the inequality (X, θ) ≤ C1 (X + δ B θ1 , θ + δ B θ1 ) + C2 δ B with C1 = 3 and C2 = 6M d, provided that (8.4.40) holds. ≤
If, on the other hand, = supx∈IRd (P (X ≤ x) − P (θ ≤ x)), then given η ∈ (0, ), there exists x0 such that P (X ≤ x0 ) − P (θ ≤ x0 ) > − η. Then we similarly have P (X + δ B θ1 ≤ x0 + δ B e) − P (θ + δ B θ1 ≤ x0 + δ B e) = P (X + z ≤ x0 + δ B e) − P (θ + z ≤ x0 + δ B e) P (δ B θ1 ∈ dz) IRd
+
= E
≥
Ec
P (X ≤ x0 ) − P (θ ≤ x0 − z + δ B e) P (δ B θ1 ∈ dz)
E
− P (δ B θ1 ∈ E c ) ≥
P (X ≤ x0 ) − P (θ ≤ x0 ) − M (z1 + dδ B ) P (δ B θ1 ∈ dz)
E
− P (θB θ1 ∈ E c ) ≥ (2β − 1) − η − 2M dδ B . B B B Hence (2β − 1)B ≤ (XB+ δ θ1 , θ + Bδ θ1 ) + 2M dδ + η, so that ≤ 3 (X + δ θ1 , θ + δ θ1 ) + 2M dδ . This completes the proof of the inequality (8.4.28), and reintroducing the symbol “ ”, we write
≤ C1 (X + δ B θ1 , θ + δ B θ1 ) + C2 δ B . θ) (X, and θ, Therefore, by the definition of ∗ , X, ∗ (X, θ)
+ε θ) ≤ (X, + δ B θ1 , θ + δ B θ1 ) + C2 δ B + ε ≤ C1 (X B B B B ≤ C1 (tB ε (X + δ θ1 ), tε (θ + δ θ1 )) + C2 δ + ε
≤ C1 ∗ (X + δ B θ1 , θ + δ B θ1 ) + C2 δ B + ε,
150
8. Probabilistic-Type Limit Theorems
which yields the first smoothing inequality of Lemma 8.4.29. 4 p (0 < p ≤ 1). By Now let us show an inequality of a similar type for L 4 the triangle inequality and the regularity of Lp , 4 p (X, θ) ≤ L 4 p (X, X + δ B θ1 ) + L 4 p (X + δ B θ1 , θ+ δ B θ1 ) + L 4 p (θ, θ+ δ B θ1 ) L 4 p (X + δ B θ1 , θ + δ B θ1 ) + 2L 4 p (0, δ B θ1 ). ≤ L 4 p as a minimal metric with respect to the Lp -metric, From the definition of L it follows that 4 p (0, δ B θ) = E[δ B θp ] ≤ δ B p E[θp ], L which completes the proof of Lemma 8.4.29.
2
The proof of the next two lemmas is obvious. Lemma 8.4.30 For any a > 0 and r > 0, Var(aB X + θ, aB Y + θ) ≤ aB r µr (X, Y ), χ∗ (aB X + θ, aB Y + θ) ≤ aB r χr (X, Y ), d(aB X + θ, aB Y + θ) ≤ aB r dr (X, Y ), and for 0 < p ≤ 1, 4 p (aB X + θ, aB Y + θ) ≤ aB r L 4 p,r (X, Y ), L where X, Y ∈ X (IRd ) are independent of θ. Lemma 8.4.31 Let Aut(IRd ) be the set of all invertible linear operators (automorphisms). Then, for any A ∈ Aut(IRd ), d(AX, AY ) = |JA−1 |d(X, Y ), where JA is the Jacobian of the matrix A. Lemma 8.4.32 For x ∈ [2, 3], |JxB | ≤ C(d, B), where C(d, B) is defined in (8.4.22). Proof: Note that |JxB | = | det xB |.
8.4 Operator-Stable Limit Theorems
151
If A is d × d matrix, then | det A| ≤ d! | max Aij |d , 1≤i,j≤d
which proves the lemma.
2
Lemma 8.4.33 Let θ be a full strictly operator-stable random vector in IRd with exponent B. Then for any two independent copies θ1 and θ2 of θ and for any t, s > 0, d
tB θ1 + sB θ2 ∼ (t + s)B θ. Proof: By (8.4.1) with b(t) ≡ 0, B B B B E eiz,t θ1 +s θ2 = E eiz,t θ1 E eiz,s θ2
∗ ∗ = µ 4 tB z µ 4 sB z = µ 4(z)t µ 4(z)s
∗ = µ 4(z)t+s = µ 4 (t + s)B z B = E eiz,(t+s) θ . 2 The following lemmas are proved in Sections 2.5 and 2.6. Further, Cb (IRd ) stands for the set of all bounded continuous functions on IRd . 4p ) L 4 p admits the following repreLemma 8.4.34 (Duality theorems for L sentation: (i) For p = 0, 4 0 (X, Y ) L
=
=
sup{|E[f (X) − f (Y )]|; f ∈ Cb (IRd ) such that |f (x) − f (y)| ≤ 1 for all x, y ∈ IRd } 1 Var(X, Y ). 2
(ii) For 0 < p ≤ 1, 4 p (X, Y ) L
=
sup{|E[f (X) − f (Y )]|; f ∈ Cb (IRd ) such that |f (x) − f (y)| ≤ x − yp for all x, y ∈ IRd }.
152
8. Probabilistic-Type Limit Theorems
(iii) For 1 < p < ∞, 4 p (X, Y ) L
=
sup{E[f (X)] − E[g(Y )]; f, g ∈ Cb (IRd ) such that |f (x) − g(y)| ≤ x − yp for all x, y ∈ IRd }.
(iv) For p = ∞, 4 ∞ (X, Y ) L
inf{ε > 0; P (X ∈ A) ≤ P (Y ∈ Aε ) for all A ∈ B(IRd )}.
=
4 p -convergence 4 p -convergence) (i) For any 0 ≤ p ≤ ∞, L Lemma 8.4.35 (L implies weak convergence. Moreover, if π is the Prohorov metric, then 1
4 pp+1 π ≤ L
for all 0 ≤ p ≤ 1
and p
4 pp+1 π ≤ L
for all 1 ≤ p ≤ ∞.
(ii) Let 0 < p < ∞ and E[Xn p ] + E[Xp ] < ∞. Then 4 p (Xn , X) → 0 L if and only if w
Xn → X
E[Xn p ] → E[Xp ].
and
4 p in X (IR)) For d = 1, 1 ≤ Lemma 8.4.36 (Explicit representations for L p ≤ ∞, ⎛ 4 p (X, Y ) = ⎝ L
1
⎞1/p −1 |FX (t) − FY−1 (t)|p dt⎠
,
1 ≤ p < ∞,
0
4 ∞ (X, Y ) L
=
−1 sup |FX (t) − FY−1 (t)|.
0≤t≤1
4 p ) For 1 ≤ p < ∞, Lemma 8.4.37 (Upper bounds for L 4 p ≤ κp ≤ ξp , 2−p+1 L where κp (resp. ξp ) is the pth difference (resp. absolute) pseudomoment.
8.5 Proofs of the Rate of Convergence Results
153
8.5 Application to Operator-Stable Limit Theorems: Proofs of the Rate of Convergence Results In this section we give the proofs of the rate of convergence results stated in Section 8.4. Proof of Theorem 8.4.4: All probability metrics for the random vectors in this section are defined by their marginal distributions and are consequently independent of their joint distributions. So, without loss of generality, we assume that {Xi } and θ are independent of each other. Let {θi } be independent copies of θ and assume that {θi } are independent of {Xi } and θ. Then by the definition of θ (or by (8.4.1) with b(t) ≡ 0), n
n−B
d
θi ∼ θ
for any n = 1, 2, . . . .
(8.5.1)
i=1
Now, by Lemma 8.4.29 and (8.5.1), for any δ > 0, 1 2 n ∗ n−B Xi , θ i=1
1
≤ C1
∗
−B
n
= C1
2 B
+ C2 δ B
B
Xi + δ θ1 , θ + δ θ1
i=1
1 ∗
n
(8.5.2)
−B
n
n
B
Xi + δ θ, n
−B
i=1
n
2 + C2 δ B .
B
θi + δ θ
i=1
Furthermore, by the triangle inequality, 1 2 n n ∗ −B B −B B Xi + δ θ, n θi + δ θ n i=1
i=1
1 ∗
≤
n
−B
n i=1
+
m
⎛
+
⎛
j i=1
1 ∗
Xi + δ θ, n
∗⎝n−B⎝
j=1
n
−B
−B
B
1m+1 i=1
θi +
θ1 + n
n
−B
⎞
n
2 B
Xi + δ θ
i=2
Xi ⎠ + δ B θ,
i=j+1
⎛ ⎞ ⎞ j+1 n n−B ⎝ θi + Xi ⎠ + δ B θ⎠ i=1
θi +
n i=m+2
Xi
2
i=j+2 B
−B
+ δ θ, n
n i=1
2 B
θi + δ θ ,
154
8. Probabilistic-Type Limit Theorems
where m = [ n2 ], n ≥ 5. By Lemma 8.4.26, the above is 1 ≤
∗
n−B
−B
n
+
+
m
n
Xi , n−B
i=2
1 ∗
n
⎡
2
Var n−B X1 + δ B θ, n−B θ1 + δ B θ
θi
i=2
(X1 +
n
n
−B
B
θi ) + δ θ, n
i=2
⎛
1
−B
1 j
⎞
n
Xi , n−B
i=j+2
× Var n
θi + δ θ
i=1
n
⎣∗ ⎝n−B
j=1
2 B
θi ⎠
i=j+2
2
θi + Xj+1
−B
B
+ δ θ, n
i=i
⎛
j+1
2 B
θi + δ θ
i=1
⎞⎤ j n n θi + Xj+1 + θi ) + δ B θ, n−B θi + δ B θ⎠⎦ + ∗ ⎝n−B ( i=1
1 + ∗
n−B
1m+1
i=j+2
θi +
i=1 4
=:
i=1
2
n
+ δ B θ, n−B
Xi
i=m+2
n
2
θi + δ B θ
i=1
∆k .
k=1
Here, the summands ∆k are defined as follows: 1 2 n n ∆1 = ∗ n−B Xi , n−B θi Var n−B X1 + δ B θ, n−B θ1 + δ B θ ,
∆2
=
m
⎛
i=2
i=2 n
∗ ⎝n−B
j=1
−B
× Var n = (m + 1) 1 ∆4
= ∗
n−B
∗
2
B
θi + Xj+1
1
n
1m+1 i=1
X1 +
n
θi
θi +
Xi
B
2 + δ B θ, n−B
i=m+2
k=1
2 B
θi + δ θ , 2
B
+ δ θ, θ1 + δ θ ,
Next, by (8.5.2), 1 2 n 4 ∗ −B n Xi , θ ≤ C1 ∆k + ∆5 , i=1
j+1 i=1
2
i=2 n
−B
+ δ θ, n
i=1 −B
θi ⎠
i=j+2
1 j
1 ∆3
Xi , n−B
i=j+2
1
⎞
n
n
2 θi + δ B θ .
i=1
(8.5.3)
8.5 Proofs of the Rate of Convergence Results
155
with ∆5 = C2 δ B . We shall estimate each ∆k separately. (I) Estimate for ∆3 . We have 1 ∆3 = (m + 1)∗
n−B X1 + n−B (n − 1)B (n − 1)−B
n
θi + δ B θ,
i=2
n−B θ1 + n−B (n − 1)B (n − 1)−B
= (m + 1)∗ n−B X1 + n−B (n − 1)B θ2 + δ B θ, n−B θ1 + n−B (n − 1)B θ2 + δ B θ
n
2 θi + δ B θ
i=2
[by (8.5.1)] ≤ (m + 1)∗ n−B X1 + n−B (n − 1)B θ2 , n−B θ1 + n−B (n − 1)B θ2 [by the regularity of ∗ ] 1 (m + 1) Var n−B X1 + n−B (n − 1)B θ, n−B θ1 + n−B (n − 1)B θ ≤ 2 1 [since ∗ ≤ Var] 2 1 ≤ (m + 1) Var (n − 1)−B X1 + θ, (n − 1)−B θ1 + θ 2 [by the homogeneity of Var]. Thus, we have ∆3 ≤
'r 1 ' n '(n − 1)−B ' µr (X1 , θ1 ), 2
(8.5.4)
invoking Lemma 8.4.30. Now, using the fact that t → tB is strictly increasing on (0, ∞), we have ' ' ' n − 1 −B ' ' ' ' ' ' −B ' '(n − 1)−B ' ≤ ' (8.5.5) ' ' n ' ' ' n ' ' ' 1 −B ' ' ' ' ' ' ' ' ' −B ' ≤ ' ' n ' = '2B ' 'n−B ' . ' 2 ' Thus it follows from (8.5.4) and (8.5.5) that ' 'r ∆3 ≤ C3 n 'n−B ' µr , with C3 = 12 2B r .
(8.5.6)
156
8. Probabilistic-Type Limit Theorems
(II) Estimate for ∆4 . Similarly, we have 1 ∆4
= ∗
m+1
n−B (m + 1)B (m + 1)−B
i=1 −B
n
B
(m + 1) (m + 1)
≤
n−B (m + 1)B θ1 + n−B
n
θi + n
−B
n
2 B
θi + δ θ
i=m+2
Xi ,
i=m+2
n−B (m + 1)B θ1 + n−B
Xi + δ B θ,
i=m+2
m+1 i=1
1 ∗
−B
n
θi + n−B
n
2 θi
i=m+2
[by (8.5.1) and the regularity of ∗ ] 1 n 1 Xi , ≤ Var n−B (m + 1)B θ1 + n−B 2 i=m+2 n−B (m + 1)B θ1 + n−B
n
2
θi
i=m+2
≤ ≤
1 2 n n 1 −B −B Var (m + 1) Xi + θ, (m + 1) θi + θ 2 i=m+2 i=m+2 1 n 2 n 1 −B r (m + 1) µr Xi , θi 2 i=m+2 i=m+2
[by Lemma 8.4.30] ' 1' '(m + 1)−B 'r (n − m − 1)µr (X1 , θ1 ) ≤ 2 by the triangle the repeated use of the regularity of µr . ' and ' inequality ' ' ' ' ' n −B ' ' 1 −B ' ' ' Finally, by ' m+1 ' = '2B ', we have ' ≤ ' 2 ' 'r ∆4 ≤ C3 n 'n−B ' µr .
(8.5.7)
We prove the theorem by induction. For n = 1, the theorem is valid because ∗ (X1 , θ) ≤ τr (X1 , θ). For n = 2, 3, and 4, the estimates are similar to those for n ≥ 5, the case we are going to prove. However, the absolute constants in the bounds for n = 2, 3, and 4 have smaller values.
8.5 Proofs of the Rate of Convergence Results
157
In the following we assume n ≥ 5. Assume that for all j < n, 1 2 j ∗ −B Xi , θ ≤ K jj −B r µr + j −B τr . j i=1
Since we have already estimated ∆3 and ∆4 independently of the induction hypothesis, we shall estimate only ∆1 , ∆2 , and ∆5 . To this end, take δ > 0 such that nδ = ε−1 for some ε > 0, where ε will be suitably chosen later. (III) A bound for ∆1 . By the definition of ∗ , 1 2 n n Xi , n−B θi ∗ n−B i=2
i=2
1
=
B −B
sup t n t>0
1
n
B −B
Xi , t n
i=2
n
2 θi
i=2 n
n−1 B n−1 B ) (n − 1)−B ) (n − 1)−B = sup tB ( Xi , tB ( θi n n t>0 i=2 i=2 2 1 n n B −B B −B ≤ sup u (n − 1) Xi , u (n − 1) θi u>0
1
= ∗
(n − 1)−B
n
i=2 n
Xi , (n − 1)−B
i=2
n
2
2
i=2
θi
i=2
≤ K (n − 1)(n − 1)−B r µr + (n − 1)−B τr by the induction hypothesis. Furthermore, Var n−B X1 + δ B θ, n−B θ1 + δ B θ ≤ Var (nδ)−B X1 + θ, (nδ)−B θ1 + θ ≤ (nδ)−B µ1 (X1 , θ1 ) by the homogeneity of Var of order 0 and Lemma 8.4.30. Thus, we have
' ' 'r ' ' ' ∆1 ≤ K (n − 1) '(n − 1)−B ' µr + '(n − 1)−B ' τr '(nδ)−B ' µ1 (8.5.8)
' ' 'r ' ' ' ≤ KC4 n 'n−B ' µr + 'n−B ' τr '(nδ)−B ' µ1
' ' 'r ' ' ' ≤ KC4 n 'n−B ' µr + 'n−B ' τr 'εB ' τr , with C4 = max{2B r , 2B }. (IV) A bound for ∆2 .
158
8. Probabilistic-Type Limit Theorems
We assume n ≥ 5. Then, as in the case for ∆3 , we have, for j ≤ m = [ n2 ], ⎛ ⎞ n n ∗ ⎝n−B Xi , n−B θi ⎠ i=j+2
i=j+2
⎛
≤ ∗ ⎝(n − j − 1)−B
n
n
Xi , (n − j − 1)−B
i=j+2
⎞ θi ⎠
i=j+2
≤ K (n − j − 1)(n − j − 1)−B r µr + (n − j − 1)−B τr . Also, we have 1 −B
Var n
j j+1 B −B ( θi + Xj+1 ) + δ θ, n θi + δ B θ i=1
2
i=1
1
≤ Var Xj+1 +
j
B
θi + (nδ) θ, θj+1 +
i=1 j B θ1
j
2 B
θi + (nδ) θ
i=1 B j θ1
= Var Xj+1 + + (nδ)B θ, θj+1 + + (nδ)B θ (by Lemma 8.4.33) = Var X1 + (j + nδ)B θ, θ1 + (j + nδ)B θ −B −B ≤ Var (j + nδ) X1 + θ, (j + nδ) θ1 + θ (by the homogeneity of Var of order 0,) ≤ (j + nδ)−B r µr (X1 , θ) by Lemma 8.4.30. Thus we have
∆2
≤
m
K (n − j − 1)(n − j − 1)−B r µr
j=1
(8.5.9)
+ (n − j − 1)−B τr (j + nδ)−B r µr .
Now, for 1 ≤ j ≤ m = [ n2 ], n ≥ 5, we have
n−j−1 n
≥
n−m−1 n
≥
(n − j − 1)−B ≤ 4B n−B ,
1 4.
Hence
(8.5.10)
and so by (8.5.9) and (8.5.10), ∞ ∆2 ≤ KC5 nn−B r µr + n−B τr (j + nδ)−B r µr ,
(8.5.11)
j=1
where C5 = max{4B r , 4B }. Furthermore, ∞ j=1
(j + nδ)−B r
∞ ≤ 0
(x + nδ)−B r dx
(8.5.12)
8.5 Proofs of the Rate of Convergence Results
∞ y
=
−B r
−B r
dy ≤ (nδ)(nδ)
∞
159
z −B r dz.
1
nδ
Recall that r > ΛB /λ2B ≥ 1/λB . Take q such that 1/λB < q < r. Then, by 1 1 (8.4.5), x q I−B → 0 as x → ∞, and hence M1 := supx≥1 x q I−B < ∞. Thus for z ≥ 1, z −B r ≤ M1r z −r/q , and hence ∞ z
∞ dz ≤ M1 z −r/q dz =: C6 < ∞.
−B r
1
(8.5.13)
1
It also follows from the assumption for p (p > 1/λB ) that for z ≥ 1, 1 −1 ≤ M2 z B since z B z −B ≥ 1, where M2 = z p ≤ 'M2 z −B ' ' p1 I−B ' supx≥1 'x '. Thus if nδ ≥ 1, then nδ ≤ M2p (nδ)B p .
(8.5.14)
Finally, we have by (8.5.11)–(8.5.14) and (8.4.5) that if nδ = ε−1 (where ε > 0 will be taken small), then ' 'r ∆2 ≤ KC5 nn−B r µr + n−B τr C6 (nδ) '(nδ)−B ' µr (8.5.15) p ' −B 'p ' B 'r −B r −B ' ' ' ' ε ≤ KC5 C6 nn µr + n τr M2 ε µr
' ' ' ' ' ' ' ' r p r p ≤ KC5 C6 M2 n 'n−B ' µr + 'n−B ' τr 'ε−B ' 'εB ' τrr−p . (V) A bound for ∆5 . We have ∆5 ≤ C2 n−B ε−B = C2 n−B τr ε−B τr−1 .
(8.5.16)
Altogether we have from (8.5.3), (8.5.6), (8.5.7), (8.5.8), (8.5.15), and (8.5.16) that 2 1 n −B Xi , θ (8.5.17) n i=1
' ' ' ' 'r ' ≤ 2C1 C3 nn−B r µr + KC1 C4 'εB ' τr n 'n−B ' µr + 'n−B ' τr
' ' ' 'p ' 'r 'r ' + KC1 C5 C6 M2p 'ε−B ' 'εB ' τrr−p n 'n−B ' µr + 'n−B ' τr ' ' ' ' + C2 'ε−B ' τr−1 'n−B ' τr % ' ' ' 'p ' 'r ≤ K C1 C4 'τr εB ' + C1 C5 C6 M2p '(τr εB )−1 ' 'τr εB ' ' ' ' ' −B 'r ' + 2C1 C3 + C2 '(τr εB )−1 ' n 'n ' µr + 'n−B ' τr .
160
8. Probabilistic-Type Limit Theorems
We first show that ' 'p ' 'r lim 't−B ' 'tB ' = 0. t→0
(8.5.18)
It follows ' from ' (8.4.4) and (8.4.5) that for any η > 0 and for some small t0 > 0, 't−B ' ≤ t−ΛB −η , t < t0 , and tB ≤ tλB −η , t < t0 . Thus for t < t0 , ' −B 'p ' B 'r 't ' 't ' ≤ t−(ΛB +η)p+(λB −η)r ≤ t−ΛB p+λB r−η(p+r) , where by the restrictions on r and p, −ΛB p + λB r > 0. Thus, taking η > 0 sufficiently small, we get (8.5.18). Also, of course, limt→0 tB = 0. Therefore, we can find a sufficiently small ε > 0 such that the matrix τr εB satisfies ' ' ' 'r −1 ' 1 ' 'p ' . (8.5.19) C1 C4 'τr εB ' + C1 C5 C6 M2p ' τr εB ' 'τr εB ' ≤ 2 Then choose K such that ' ' K 2C1 C3 + C2 '(τr εB )−1 ' ≤ . 2
(8.5.20)
Finally, we obtain from (8.5.17), (8.5.19), and (8.5.20) that 1 2 n
' ' 'r ' −B n Xi , θ ≤ K n 'n−B ' µr + 'n−B ' τr .
2
i=1
Proof of Theorems 8.4.8 and 8.4.9: By the same reasoning as that mentioned at the beginning of this section, we can assume that {Xi } and θ are independent of each other and that the {θi } are independent copies of θ and are independent of the {Xi } and θ. We prove the theorem by induction. For n = 1, the assertion is trivial. For n = 2, we have Var 2−B (X1 + X2 ), θ = Var 2−B (X1 + X2 ), 2−B (θ1 + θ2 ) [by (8.5.1)] ≤ Var(X1 + X2 , θ1 + θ2 ) [by the homogeneity of Var of order r = 0] ≤ Var(X1 + X2 , X1 + θ2 ) + Var(X1 + θ2 , θ1 + θ2 ) [by the triangle inequality] ≤ Var(X2 , θ2 ) + Var(X1 , θ1 ) [by the regularity of Var] ' 'r = 2 Var (X1 , θ) ≤ 2νr ≤ 2c '2−B ' νr ,
8.5 Proofs of the Rate of Convergence Results
since
161
' 'r ' 'r ' ' ' 'r ' 'r 'r 'r c '2−B ' = '2B ' + '3B ' '2−B ' ≥ '2B ' '2−B ' ≥ 1.
For n = 3, we similarly have ' 'r Var 3−B (X1 + X2 + X3 ), θ ≤ 3c '3−B ' νr . Now suppose that for all j < n, 2 1 j ' 'r −B Var j Xi , θ ≤ cj 'j −B ' νr .
(8.5.21)
i=1
Then for any j < n, 1 2 j ' 1' −B '2−B 'r Var j Xi , θ ≤ cM νr ≤ ca = b i=1
(8.5.22)
by our assumptions. For any integer n ≥ 4 and m = [ n2 ], we have 1 2 n −B Var n Xi , θ i=1
1
−B
= Var n 1
−B
≤ Var n
n i=1 n
Xi , n Xi , n
−B
−B
1
−B
+ Var n
2
n
θi i=1 1m
i=1
(8.5.23)
θi +
i=1
22
n
Xi
i=m+1
n m n ( θi + Xi ), n−B θi i=1
i=m+1
2
i=1
[by the triangle inequality] 1 −B
≤ Var n
n
Xi , n
−B
i=m+1
1
−B
+ Var n
1m i=1
1 −B
+ Var n
2
n
θi
i=m+1
Xi +
n i=m+1
θi
1 −B
Var n
i=1
2 ,n
−B
n i=1
n m n ( θi + Xi ), n−B θi i=1
[by Lemma 8.4.26]
i=m+1
m
i=1
2 θi
2
Xi , n
−B
m i=1
2 θi
162
8. Probabilistic-Type Limit Theorems
=: I1 + I2 + I3 . By the induction hypotheses (8.5.21) and (8.5.22), 1 −B −B 2 n n n −B I1 = Var (n − m) Xi , θ1 n−m n−m i=m+1 1 2 n
n −B
n −B × Var m−B Xi , θ1 m m i=m+1 1 2 1 2 n m −B −B ≤ Var (n − m) Xi , θ1 Var m Xi , θ1 i=m+1
≤
i=1
' ' ' 1' '2−B 'r cm 'm−B 'r νr . b
Note that m = [ n2 ] ≥ 25 n for n ≥ 4. Hence ' 'r ' 'r ' 2 −B ' ' 2 −B ' ' ' −B 'r n n r ' ' ' −B ' ' ' 'm ' ≤ ' ≤ ' n ' . ' n ' ' ' 2' 5 2' 5 Thus I1
' 'r −B ' 'r 1 ' 'r 'r 2 1' 2 ' ' ' −B '2 ' c ' cn 'n−B ' νr (8.5.24) ≤ ' n 'n−B ' νr = ' ' b 2' 5 5
by the definition of b. To estimate I2 , observe that 1 m n n )−B (n − m)−B I2 = Var n−B Xi + ( θi , n−m i=1 i=m+1 2 −B m n n −B −B n θi + (n − m) θi n−m i=1 i=m+1 1 2 m m ≤ Var (n − m)−B Xi + θ, (n − m)−B θi + θ . i=1
Then we have I2
' 'r ≤ '(n − m)−B ' µr
1m i=1
i=1
Xi ,
m i=1
2 θi
[by Lemma 8.4.30]
' 'r ≤ '(n − m)−B ' mµr (X1 , θ) [by the triangle inequality and the repeated use of the regularity of µr ] ' 'r −B ' 'r 1' 1 n−m ' 1 ' ' ≤ ≥ . Hence ' ' n 'n−B ' µr since ' 2' 2 n 2
8.5 Proofs of the Rate of Convergence Results
I2 ≤
' ' ' 1' '2B 'r n 'n−B 'r νr . 2
163
(8.5.25)
As to I3 , we have 1 2 n n −B −B B −B −B B Xi + n m θ, n θi + n m θ I3 = Var n i=m+1
1
−B
≤ Var m
n
i=m+1
Xi + θ, m
n
−B
i=m+1
θi + θ
2 (8.5.26)
i=m+1
' ' ' ' 'r 'r ' n B 'r ' ≤ 'm−B ' (n − m)µr (X1 , θ) ≤ 'n−B ' ' ' m ' (n − m)µr ' ' ' 3' '3B 'r n 'n−B 'r νr , ≤ 5 since
n m
≤ 3 for n ≥ 4 and n − m ≤ 35 n.
Altogether, we have from (8.5.23)–(8.5.26), 1 2 n 'r ' 2 1 B r 3 B r −B c + 2 + 3 n 'n−B ' νr Var n Xi , θ ≤ 5 2 5 i=1 ≤ cnn−B r νr . This completes the proof of Theorem 8.4.8. The proof of Theorem 8.4.9 is similar and is therefore omitted. 2 Proof of Theorem 8.4.10: Again, we assume that the {Xi } and θ are independent of each other. Let {θi } be independent copies of θ, and assume that the {θi } are independent of {Xi } and θ. We prove the theorem by induction. For n = 1, d(X1 , θ) ≤ Ad(X1 , θ) ≤ ATr . For n = 2, we have by Lemma 8.4.31, the regularity of d, and the triangle inequality, d 2−B (X1 + X2 ), θ = d 2−B (X1 + X2 ), 2−B (θ1 + θ2 ) = |J2B | d(X1 + X2 , θ1 + θ2 ) ≤ 2 |J2B | d(X1 , θ) ≤ 2C(d, B)Tr [by Lemma 8.4.32] ≤ 2A2−B r Tr , ' 'r ' 'r ≥ C(d, B) '2B ' '2−B ' ≥ C(d, B). Similarly, we have
' 'r since A '2−B ' for n = 3, 'r ' d 3−B (X1 + X2 + X3 ), θ ≤ 3C(d, B)Tr ≤ 3A '3−B ' Tr ,
164
8. Probabilistic-Type Limit Theorems
' ' 'r ' 'r 'r since A '3−B ' ≥ C(d, B) '3B ' '3−B ' ≥ C(d, B). To prove the theorem by induction, assume for all j < n that 1 2 j ' 'r d j −B Xj , θ ≤ Aj 'j −B ' Tr . i=1
For any n ≥ 4 and m = [ n2 ], we have by Lemma 8.4.28, 1
d n−B
n
2
Xi , θ
(8.5.27)
i=1
1
≤
d n
n
−B
i=1
1
−B
+d n
1m
1m
d n−B 1
m i=1 −B
θi +
+d n
m
1 −B
+d n
1m
2 Xi
2
Xi +
θi
θi +
n i=m+1
Xi
2 θi
n
Xi , n−B
i=m+1
2
n
n i=1
θi Var n−B
i=m+1
i=1
,n
−B
1
i=1
i=1
Xi
i=m+1
i=m+1
Xi , n−B 1m
n
22
n
θi +
i=1
i=1
1 ≤
Xi , n
−B
,n
−B
n
2
,n
−B
n
2 θi
i=m+1
θi
i=1
2
n
2 θi
i=1
=: I1 + I2 + I3 . By Lemma 8.4.31, 1 2 1 2 m n
n −B
n −B m−B Xi , θ Var (n − m)−B Xi , θ I1 ≤ d m m i=1 i=m+1 1 2 1 2 m n n −B −B = J( m )B d m Xi , θ Var (n − m) Xi , θ . (8.5.28) i=1
i=m+1
By the induction hypothesis and Lemma 8.4.32, 1 2 m ' 'r n −B Xi , θ ≤ C(d, B)Am 'm−B ' Tr J( m )B d m i=1
(8.5.29)
' ' ' ' −B 'r n' −B 'r ' m ' ' Tr n ≤ C(d, B)A ' n ' 2 'r 1 ' 'r ' ≤ C(d, B)A '3B ' n 'n−B ' Tr . 2
8.5 Proofs of the Rate of Convergence Results a On the other hand, since νr ≤ M , by Theorem 8.4.8, 2 1 n ' 'r Xi , θ ≤ c(n − m) '(n − m)−B ' νr Var (n − m)−B i=m+1
where we have used νr ≤
1 , D
1 M cD .
Therefore, we have, by (8.5.28)–(8.5.30),
'r 'r ' 1 ' 'r 1 ' 1 I1 ≤ C(d, B)A '3B ' n 'n−B ' Tr ≤ An 'n−B ' Tr , 2 D 2 ' B 'r since D ≥ C(d, B) '3 ' . Similarly, for the estimate of I2 ,
I2
(8.5.30)
'r ' 1 ' ' = c '(n − m) r I−B ' νr ≤ cM νr ≤
1
165
(8.5.31)
−B −B 2 m n n −B = d n Xi + θ, n θi + θ n−m n−m i=1 i=1 1 2 m m −B −B n = J( n−m Xi + θ, (n − m) θi + θ )B d (n − m) −B
m
i=1
i=1
[by Lemma 8.4.31] ≤ C(d, B)(n − m)−B r dr
1m i=1
Xi ,
m
2 θi
[by Lemma 8.4.32]
i=1
' 'r ≤ C(d, B) '(n − m)−B ' mdr (X1 , θ) by the triangle inequality and the repeated use of the regularity of dr . Hence 'r 1 ' 'r ' I2 ≤ C(d, B) '2B ' n 'n−B ' Tr . 2
(8.5.32)
Finally, we have 1 2 n n
n −B
n −B −B −B θ+n Xi , θ+n θi I3 = d m m i=m+1 i=m+1 1 2 n n ≤ C(d, B)d m−B Xi + θ, m−B θi + θ i=m+1
' 'r ≤ C(d, B) 'm−B ' (n − m)dr (X1 , θ) 'r 3 ' 'r ' ≤ C(d, B) '3B ' n 'n−B ' Tr . 5
i=m+1
(8.5.33)
166
8. Probabilistic-Type Limit Theorems
Combining the estimates for I1 , I2 , and I3 , we finally obtain from (8.5.27) and (8.5.31)–(8.5.33) that 2 1 n Xi , θ ≤ I1 + I2 + I3 d n−B ≤
i=1
' 'r 3 ' 'r 'r ' 1 1 A + C(d, B) '2B ' + C(d, B) '3B ' n 'n−B ' Tr 2 2 5
≤ Ann−B r Tr 2
by the definition of A.
Proof of Theorem 8.4.13: We apply the “minimality” property of the 4 p -metric : L 1 1 2 2 n n n −B −B −B i , n 4p n X L Xi , θ ≤ inf Lp n θi , (8.5.34) i=1
i=1
i=1
where the infimum is taken over all independent identically distributed d d d with fixed marginals X θ) ∼ i , θi ) ∼ (X, X and θ ∼ θ. The rightpairs (X hand side in (8.5.34) is less than or equal to %' & ' 'p 'p d d X 4 p (X, θ). θ); ∼ inf 'n−B ' nLp (X, X, θ ∼ θ = n 'n−B ' L The bound for π follows from Lemma 8.4.33.
2
Proof of Theorem 8.4.14: The proof is similar to that of Theorem 8.4.10 and is therefore omitted. 2 Proof of Theorem 8.4.15: The proof resembles that of Theorem 8.4.4 4 p in with the replacement of the smoothing inequality for ∗ by that for L Lemma 8.4.27 and hence is omitted. 2 Proof of Theorem 8.4.16: Using the Marcinkiewicz–Zygmund inequality (see for example Kawata (1972, Theorem 13.6.1)), if 1 < p ≤ 2 and {ξi }i≥1 are independent random vectors with E[ξ] = 0, then )' n 'p * n ' ' ' ' E ' ξi ' E[ξi p ], (8.5.35) ≤ Cp ' ' i=1
i=1
i − θi ), where {(X i − θi )}i≥1 are i.i.d. for some Cp > 0. Take ξi = n−B (X d d i ∼ X and θi ∼ θ. Thus we get pairs, X 'p * )' n n ' ' 'p ' ' −B ' ' i − θi )' (Xi − θi )' E 'n−B (X ≤ Cp E 'n ' ' ' i=1
i=1
8.5 Proofs of the Rate of Convergence Results
167
' ' 'p ' ' 'p = Cp n 'n−B ' E 'X − θ' . Passing to the minimal metrics gives the necessary inequality. Finally, note Λ that p > λ2B implies nn−B p → 0 as n → ∞. The bound for the Prohorov B
p
4 pp+1 for p ≥ 1. metric π comes from the inequality π ≤ L
2
4 p to get Proof of Theorem 8.4.19: We use the minimality property of L &p % 4 p (M (n), M (n)) L X θ 1 k 2 1 k 22p 1 n n ? ? −B −B 4 = Lp n Xi , n θi k=1
i=1
k=1
i=1
)' 1 k 2 1 k 2'p * n n ' ' ? ' −B ? ' ≤ E 'n Xi − n−B θi ' ' ' k=1 i=1 k=1 i=1 )' n 1 k 2 1 2' * n k '? 'p ? ' ' ≤ n−B p E ' Xi − θi ' ' ' k=1 i=1 k=1 i=1 'p * ) n ' k k ' ? ' ' ' ≤ Cp,d n−B p E Xi − θi ' ' ' ' k=1
i=1
0
i=1
}1/p · 0 for some Cp,d > 0 and (here we have used · ≤ {Cp,d ' = = =' xk − yk 0 ≤ 'xk −yk '0 )
≤
Cp,d n−B p
n k=1
'p * )' k ' ' ' ' E ' (Xi − θi )' . ' ' i=1
0
Since E[X − θ] = 0 and 1 < p ≤ 2, by (8.5.35) again the above is, for some other constant Cp,d > 0, k n ' −B 'p p ' ' n ≤ Cp,d E [X − θ ] k=1 i=1
' n(n + 1) ' 'n−B 'p E [X − θp ] . = Cp,d 2 Passing to the minimal metric gives the necessary bound for 4 p (M (n), M (n)). L X θ
168
8. Probabilistic-Type Limit Theorems
4 p (m (n), m (n)). Further, The same argument leads to the bound for L X θ % &p 4 p (a (n), a (n) L X θ
' ' k ' p * ) n ' k n ' ' ? ' ' ? ' −B 'p ' ' ' ' ≤ 'n ' E Xi ' − θi ' ' ' ' ' ' ' k=1 i=1 k=1 i=1 ' ' 'p * ) n ' k k ' ' ' ? ' ' ' ' ' ≤ n−B p E Xi ' − ' θi ' ' ' ' ' ' i=1 i=1 k=1 ' * ) n ' k k 'p ?' ' ' −B p ≤ n E Xi − θi ' ' ' ' k=1
i=1
i=1
' n(n + 1) ' 'n−B 'p E [X − θp ] ≤ Cp 2 as before. Combining our estimates, we complete the proof of the theorem. 2 Proof of Theorem 8.4.21: From the definition of the Zolotarev metric ζr and its ideality of order r, we get the first bound: 1 1 2 2 n n n −B −B −B ζr n Xi , θ ≤ ζr n Xi , n θi i=1
i=1
' 'r ≤ 'n−B ' ζr
1n
i=1
2 n ' 'r Xi , θi ≤ n 'n−B ' ζr (X1 , θ) .
i=1
i=1
Applying the universal bound for the Prohorov metric π by ζr , π r+1 ≤ Crr+1 ζr on X (IRd ) for some Cr > 0 (cf. Zolotarev (1983)), we obtain the final estimate. 2 d i − θi ), where X i ∼ X and Proof of Theorem 8.4.22: Let ξi = n−B (X d i , θi ) are i.i.d. Then by the Rosenthal inequality (see, for θi ∼ θ, and (X example, Araujo and Gin´e (1980, p. 205)), for 2 < p < ∞, ⎧1 21/p1 n 21/2 ⎫ )' n 'p *1/p n ⎨ ' ' ⎬ ' ' ξi ' ≤ C(d, p) max E[ξi p ] , E[ξi 2 ] E ' , ' ⎩ ' ⎭ i=1
i=1
for some C(d, p) > 0. Then 1 2 n n −B −B i , n X θi Lp n i=1
i=1
i=1
8.5 Proofs of the Rate of Convergence Results
≤ C(d, p) max
⎧1 n ⎨ ⎩
169
21/p E[n
−B
i=1
1 n i=1
i − θi ) ] (X p
,
21/2 ⎫ ⎬ i − θi )2 ] E[n−B (X ⎭
1/p ' 1/2 ' −B 'p ' p −B '2 2 ' ' ' E[X − θ ] , n n E[X − θ ] ≤ C(d, p) max n n % & ' ' ' ' n1/2 'n−B ' L2 (X, θ), θ) ≤ C(d, p) max n1/p 'n−B ' Lp (X, ' ' θ), ≤ C(d, p)n1/2 'n−B ' Lp (X, since L2 ≤ Lp and n1/p ≤ n1/2 for p > 2. The case p = ∞ is similar.
2
In Theorems 8.4.4, 8.4.8, 8.4.9, 8.4.10, and 8.4.14 we have assumed the 4 p,r . Since χr ≤ µr , the natural finiteness of the metrics µr , χr , dr , and L 4 p,r , which may question is how we can assure the finiteness of µr , dr and L not be easily checked just by a direct use of the definitions. The rest of this 4 p,r , section is devoted to the construction of upper bounds for µr , dr , and L where the metrics used in the upper bounds are more familiar distances in the literature. We shall construct bounds from above for µr , dp , and Lp,r , using the Zolotarev ζr -metric. Define the following probability metrics: For X, Y ∈ X (IRd ), µr (X, Y )
=
dr (X, Y )
=
sup T ∈Aut(IRd )
sup T ∈Aut(IRd )
T −r Var(T X + θ, T Y + θ), T −r d(T X + θ, T Y + θ),
and 4 (X, Y ) = L p,r
sup T ∈Aut(IRd )
T −r Lp (T X + θ, T Y + θ).
4 . In the next two theorems we 4 p,r ≤ L Clearly, µr ≤ µr , dr ≤ dr , L p,r 4 are going to estimate µr , dr , and Lp,r from above by ζr . Let pθ (x) be the density function of the strictly operator-stable random vector θ ∈ IRd . For m ∈ IN let (m) |pθ (x)(h)m | dx Cm (θ) := sup h=1 IRd
170
8. Probabilistic-Type Limit Theorems
and Dm (θ) :=
(m)
sup sup |pθ
x∈IRd
h=1
(x)(h)m |.
Theorem 8.5.1 (i) For m ∈ IN, µm (X, Y ) ≤ Cm (θ)ζm (X, Y ). (ii) If r = m + p, m ∈ IN, 0 < p ≤ 1, then 4 (X, Y ) ≤ C (θ)ζ (X, Y ). L p,r m r Theorem 8.5.2 dm (X, Y ) ≤ Dm (θ)ζm (X, Y ),
m = 1, 2, . . . .
Proof of Theorem 8.5.1: (i) For any X and Y , Var(X + θ, Y + θ) =
sup A∈B(IRd )
= =
|P (X + θ ∈ A) − P (Y + θ ∈ A)|
(8.5.36)
sup{|E[f (X + θ) − f (Y + θ)]|; f ∈ F, f ∞ ≤ 1} sup{|E[g(X) − g(Y )]|; f ∈ F, f ∞ ≤ 1},
where g(x) := E[f (x + θ)]. Since pθ (x) is differentiable infinitely many times (see Hudson (1980)), g(x) =
f (z)pθ (z − x) dz
f (x + y)pθ (y) dy = IRd
IRd
has derivatives of every order, and furthermore, (m) |g (m) (x)(h)m | = f (z)pθ (z − x)(h)m dz d IR (m) m = f (x + y)pθ (y)(h) dy . d IR
Since for f with f ∞ ≤ 1, sup sup |g (m) (x)(h)m | ≤ Cm (θ), we have x∈IRd h=1
g (m−1) (x) − g (m−1) (y) ≤ Cm (θ)x − y.
(8.5.37)
8.5 Proofs of the Rate of Convergence Results
171
Hence by (8.5.36) and (8.5.37), Var(X + θ, Y + θ) ' ' % ' ' ≤ sup |E [g(X) − g(Y )]| ; 'g (m−1) (x) − g (m−1) (y)'
& ≤ Cm (θ) x − y
and µm (X, Y ) =
sup T ∈Aut(IRd )
T −m Var(T X + θ, T Y + θ)
(8.5.38)
% ≤ sup T −m sup |E [g(T X) − g(T Y )]| ; T ' ' & ' (m−1) ' (x) − g (m−1) (y)' ≤ Cm (θ) x − y . 'g Let gT (x) := g(T x). Then gT(m−1) (x)(h)m−1 = g (m−1) (T x)(T h)m−1
for any x, h ∈ IRd ,
implying that ' ' ' ' ' (m−1) ' ' m−1 ' (m−1) (x) − gT(m−1) (y)' ≤ T (T x) − g (m−1) (T y)' . 'gT 'g Then the side condition in (8.5.38), ' ' ' (m−1) ' (x) − g (m−1) (y)' ≤ Cm (θ)x − y, 'g results in ' ' ' (m−1) ' (x) − gT(m−1) (y)' ≤ Cm (θ)T m−1 T x − T y 'gT ≤ Cm (θ)T m x − y. Consequently, by (8.5.38), % µm (X, Y ) ≤ sup T −m sup |E[gT (X) − gT (Y )]|; T
(m−1)
gT
(m−1)
(x) − gT
&
(y) ≤ Cm (θ)T m x − y
= Cm (θ)ζm (X, Y ), as desired.
4 . Let r = m + p, m ∈ IN, (ii) Let us now prove a similar bound for L p,r 0 < p ≤ 1. Then by Lemma 8.4.34 (ii), 4 p (X + θ, Y + θ) = L =
sup {|E [f (X + θ) − f (Y + θ)]| ; f ∈ Lipb (p)} sup {|E [g(X) − g(Y )]| ; f ∈ Lipb (p)} ,
172
8. Probabilistic-Type Limit Theorems
where g(x) := E[f (x + θ)]. Since pθ (x) is differentiable infinitely many f (z)pθ (z − x) dz has times, the function g(x) = f (x + y)pθ (y) dy = IRd
IRd
derivatives of all orders, and for m ∈ IN, r = m + p,
(m) |g (m) (x)(h)m | = f (x + y)pθ (y)(h)m dy . d IR
By the requirement for f , ' ' ' (m) ' 'g (x) − g (m) (y)' =
sup g (m) (x)(h)m − g (m) (y)(h)m
h=1
(m) = sup [f (x + z) − f (y + z)] pθ (z)(h)m dz h=1 d IR (m) ≤ sup x − yp pθ (z)(h)m dz h=1 IRd p
≤ x − y Cm (θ). % ' ' 4 p (X + θ, Y + θ) ≤ sup |E [g(X) − g(Y )]|; 'g (m) (x) − g (m) (y)' Therefore L & ≤ Cm (θ)x − yp for any x, y ∈ IRd . 4 (X, Y ): Next consider L p 4 (X, Y ) L p,r
=
sup T ∈Aut(IRd )
T −r Lp (T X + θ, T Y + θ)
% ≤ sup T −r sup |E[g(T X) − g(T Y )]|; T ' ' & ' (m) ' 'g (x) − g (m) (y)' ≤ Cr (θ)x − yp . Let gT (x) := g(T x). Then for all x, h ∈ IRd , gT(m) (x)(h)m = g (m) (T x)(T h)m for any x, h ∈ IRd , implying that ' ' ' ' ' ' ' ' (m) 'gT (x) − gT(m) (y)' = T m 'g (m) (T x) − g (m) (T y)' . Applying g (m) (x) − g (m) (y) ≤ Cm (θ)x − yp , we get that ' ' ' (m) ' 'gT (x) − gT(m) (y)' ≤ Cm (θ)T m T x − T yp ≤ Cm (θ)T m+p x − yp ,
8.5 Proofs of the Rate of Convergence Results
173
and m + p = r by assumption. Similarly, we get 4 (X, Y ) ≤ sup T −r sup {|E[g (X) − g (Y )]|; L p,r T T T ' ' & ' (m) ' (m) 'gT (x) − gT (y)' ≤ Cm (θ)T r x − yp ≤ Cm (θ)ζr (X, Y ).
2
Proof of Theorem 8.5.2: We have dm (X, Y ) =
(8.5.39)
sup T ∈Aut(IRd )
T −m d(T X + θ, T Y + θ)
sup T −m sup |pT X+θ (x) − pT Y +θ (x)| T x∈IRd = sup T −m sup pθ (x − y)[P (T X ∈ dy) − P (T Y ∈ dy)] . T x∈IR d
=
IR
Let
g(y) = pθ (x − y).
(8.5.40)
Then
m (x − y)(h) sup sup sup g (m) (y)(h)m ≤ sup p(m) = Dm (θ). θ
y∈IRd x∈IRd h=1
Hence
y,x,h
' ' ' (m−1) ' (x) − g (m−1) (y)' ≤ Dm (θ)x − y. 'g
(8.5.41)
From by (8.5.39)–(8.5.41) we have dm (X, Y )
% ≤ sup T −m sup |E[g(T X) − g(T Y )]|; T ' ' & ' (m−1) ' (x) − g (m−1) (y)' ≤ Dm (θ)x − y . 'g
This upper bound is the same as that of µm (X, Y ) in (8.5.38) if Cm (θ) is replaced by Dm (θ). Therefore the proof of Theorem 8.5.1 also implies that dm (X, Y ) ≤ Dm (θ)ζm (X, Y ).
2
The next question is the finiteness of Cm (θ) and Dm (θ). Theorem 8.5.3 We have Dm (θ) < ∞, m = 1, 2, . . . , and if ΛB < 1, then Cm (θ) < ∞, m = 1, 2, . . . .
174
8. Probabilistic-Type Limit Theorems
Proof: We first show the finiteness of Dm (θ). Let µ 4(z), z ∈ IRd , be the characteristic function of θ. As was shown in Hudson (1980), for some c > 0, (8.5.42) |4 µ(z)| ≤ exp{−cz1/B } for every z with z > 1. zm |4 µ(z)| dz < ∞, implying the exisHence, for every m = 1, 2, . . . , IRd
tence of p(m) (x), and furthermore the finiteness of Dm (θ). θ To prove the finiteness of Cm (θ), we assume ΛB < 1, which implies E[θ] < ∞. (See Hudson et al. (1988).) We start with Carlson’s inequality for one-variable functions f . If f4, the Fourier transform of f , is in L2 (IR), and f4 exists and is in L2 (IR), then ⎛ ∞ ⎞4 ⎛ ∞ ⎞⎛ ∞ ⎞ ⎝ |f (x)| dx⎠ ≤ K ⎝ f4(z)2 dz ⎠ ⎝ f4 (z)2 dz ⎠. (8.5.43) −∞
−∞
−∞
A version of this inequality for several variable functions f is, for each h ∈ IRd , ⎛ ⎞4 ⎛ ⎞⎛ ⎞ ⎝ |f (x)| dx⎠ ≤ K ⎝ f4(z)2 dz ⎠ ⎝ (Df4(z)h)2 dz ⎠, (8.5.44) IRd
IRd
IRd
where Df4(z) is the gradient (row) vector of f4(z). The proof of (8.5.44) can be carried out in the same manner as for (8.5.43), so we omit it. Since we are assuming E[θ] < ∞, µ 4(z) is differentiable. Fix h ∈ IRd (m) and apply (8.5.44) to f (x) := pθ (x)(h)m , x ∈ IRd . Then f4(z) = eiz,x p(m) (x)(h)m dx, and θ
IRd
Df4(z)h = i x, heiz,x pθ(m) (x)(h)m dx. Thus
IRd
f4(z)2 dz ≤
IRd
sup h=1 IRd
z2m |4 µ(z)|2 dz < ∞ by (8.5.42), and
IRd
2
Df4(z)h
dz ≤
z2m |D4 µ(z)|2 dz =: I.
IRd
So it remains to show the finiteness of I. We recall that the characteristic function µ 4(z) is given by µ 4(z)
=
exp iz, c + z, Az
8.5 Proofs of the Rate of Convergence Results
+
∞ γ( dx)
e
iz,sB x
0
S
175
1 − 1 − iz, s xIQ (s x) 2 ds . s B
B
Here z ∈ IRd , c ∈ IRd , A is a nonnegative definite symmetric matrix, S = {x ∈ IRd ; x = 1 and tB x > 1 for all t > 1}, Q = {x ∈ IRd ; x ≤ 1}, γ is a probability measure on S. We write µ 4(z) = eψ(z) . Since ΛB < 1, sB x M1 := γ( dx) ds < ∞. (8.5.45) s2 {sB x>1}
S
Note that if γ(S) > 0, the non-Gaussian part exists, and the restriction of B to the support of the measure γ (we shall call it B again) satisfies λB > 12 . (See Hudson and Mason (1981).) Hence we also have sB x2 γ( dx) ds < ∞. (8.5.46) M2 := s2 {sB x≤1}
S
We have, for h ∈ IRd , D4 µ(z)h = µ 4(z)Dψ(z)h. Let z = (z1 , . . . , zd )t , c = (c1 , . . . , cd )t, A = (aij ), sBx = B t s x d , and h = (h1 , . . . , hd )t . Then ∂ ∂zj ψ(z)
sB x
1
, ...,
= icj + 2(Az)j ∞ 1 B + γ( dx) i sB x j eiz,s x − i sB x j IQ sB x ds. s2 S
0
Thus ∞ 1 ∂ B i
≤ |c ψ(z) | + 2Az + γ( dx) − IQ (sB x) 2 ds ∂zj s x j e j s 0 S ' B ' 1 's x' ≤ c + 2Az + γ( dx) ds s2 S
γ( dx)
+ S
{sB x>1}
1 ' B ' i 's x' e − 1 2 ds s
{sB x≤1}
≤ c + 2Az + M1 + M2 z, where M1 and M2 are finite by (8.5.45) and (8.5.46). We thus finally have d ∂ ψ(z)h |Dψ(z)h| = j ∂zj j=1
176
8. Probabilistic-Type Limit Theorems
≤ dh (c + 2Az + M1 + M2 z) ≤ C1 + C2 z, and |D4 µ(z)h| ≤ (C1 + C2 z) |4 µ(z)| . Hence by (8.5.42) we conclude that z2m |D4 µ(z)h|2 dz < ∞. I = IRd
The proof of Theorem 8.5.3 is now complete.
2
The final question is the finiteness of ζm (X1 , θ). As we have noted in (8.4.32), ζm (X1 , θ) ≤
1 κm (X1 , θ), m!
m = 1, 2, . . . ,
where κm (X1 , θ) is the difference pseudomoment, namely κm (X1 , θ)
=
sup {|E[f (X1 ) − f (θ)]|; |f (x) − f (y)| ≤ dm (x, y) & for any x, y ∈ IRd ,
' ' ' m−1 m−1 ' − y y where dm (x, y) = 'x x '. It would be difficult to check conditions implying κm (X1 , θ) < ∞. Instead we give an example for laws of X1 with ζm (X1 , θ) < ∞. The idea is to use the series representation for θ (compare Remark 8.1.5). Let {Wj }∞ j=1 be a sequence of i.i.d. random variables taking their values in S = {x ∈ IRd ; x = 1} with a common probability distribution λ on S and Γj = δ1 +· · ·+δj . Here {δj } is a sequence of independent exponentially distributed random variables with E[δ1 ] = 1 that are independent of {Wj }. It is known that the series ∞ −B Γj Wj − E Γ−B j I[Γj ≥ 1] E[Wj ] j=1
converges almost surely and is distributed as an operator-stable random vector with exponent B. Suppose E[Wj ] = 0 for all j, and set d
θ =
∞ j=1
Γ−B j Wj .
8.5 Proofs of the Rate of Convergence Results
177
Let r > 1/λB and j0 = [rλB ]. Then it is easy to show that if j > j0 , then r E[Γ−B j ] < ∞. Consider another sequence of independent random variables {Vj } on S, which are independent of {Γj }, such that d
for j ≤ j0 ,
Vj = Wj
and for j > j0 , {Vj } are arbitrary but not identically distributed random variables on S. Define d
X =
∞
Γ−B j Vj ,
j=1
assuming that the series converges almost surely. Theorem 8.5.4 Suppose that all mixed moments of order less than or equal to r − 1 coincide; that is, (8.5.47) E V1α1 · · · Vpαp − W1α1 · · · Wpαp = 0 p for any αi ≥ 0 with i=1 αi ≤ r − 1. Then ζr (X, θ) < ∞. Proof: By the operator-ideality of ζr of order r and the triangle inequality, ⎛ ⎞ ⎠ ζr (X, θ) = ζr ⎝ Γ−B Γ−B j Vj , j Wj j
≤
∞
j
−B ζr (Γ−B j Vj , Γj Wj )
=
∞
−B ζr (Γ−B j Vj , Γj Wj )
j=j0 +1
j=1
(by Zolotarev (1983, Property 4 on p. 293)) ≤
∞
r E[Γ−B j ]ζr (Vj , Wj )
j=j0 +1 ∞
≤ sup ζr (Vj , Wj ) j
r E Γ−B j .
j=j0 +1
Since rλB > 1, the final series converges. This is shown as follows. Note that for any ε > 0, there exists C > 0 such that x−B < Cx−(ΛB +ε) ,
0 < x ≤ 1,
178
8. Probabilistic-Type Limit Theorems
and x−B < Cx−(λB −ε) ,
x > 1.
Thus we have for j > j0 = [rλB ], −B r −B r r E[Γ−B j ] = E[Γj I[Γj ≤ 1]] + E[Γj I[Γj > 1]] −r(ΛB +ε)
≤ E[Γj
−r(λB −ε)
] + E[Γj
].
On the other hand, if j > p, then E[Γ−p j ] =
Γ(j − p) ∼ j −p . Γ(j)
Hence r −r(Λ+ε) E[Γ−B + j −r(λB +ε) ) ≤ Cj −r(λB +ε) ) j ] ≤ C(j
for large j, because ΛB ≥ λB . This shows the convergence of the series in question. Also, as we have noted in (8.4.34), under (8.5.47), ζr (Vj , Wj ) ≤
Γ(1 + α) ξr (Vj , Wj ), Γ(1 + r)
where r = m + α, m ∈ IN, α ∈ (1, 2]. In our case, Vj and Wj take their values on the unit sphere, and therefore, ξr (Vj , Wj ) = Var(Vj , Wj ) ≤ 1. This concludes the proof of the theorem.
2
8.6 Ideal Metrics in the Problem of Rounding It is a widely accepted fact that sums of rounded proportions often fail to add to 1. In their pioneering works, Mosteller, Youtz, and Zahn (1967) and Diaconis and Freedman (1979) assessed the probability that a vector of conventionally rounded percentages adds to 100. The conventional rule picks the midpoint of each interval as the threshold for rounding. However, the goal of rounding to maximize the probability that the sum of roundings is the rounding of the sum may well not be a “significant” question: Instead, it seems that the goal of rounding to obtain a distribution as much like the original one as possible is more fundamental.
8.6 Ideal Metrics in the Problem of Rounding
179
Suppose, for example, that q1 , . . . , qs are s independent identically [0, 1]uniformly distributed random numbers and that each is to be rounded to either 0, 12 , or 1. The usual method is to let x = 0 if 0 ≤ q < 14 , x = 12 if 14 ≤ q < 34 , and x = 1 if 34 ≤ q ≤ 1. Then Ex = Eq = 12 , and 1 . On the other hand, if instead of rounding “at” Var x = 18 = Var q = 12 1 3 1 5 1 1 ∗ ∗ 4 and 4 one rounds at 6 and 6 , that is, x = 0 if 0 ≤ q < 6 , x = 2 1 5 5 1 ∗ ∗ if 6 ≤ q < 6 , and x = 1 if 6 ≤ q ≤ 1, then Ex = Eq = 2 and 1 . The importance of the deviations between the sums Var x∗ = Var q = 12 of the resulting roundings xS = x1 + · · · + xs and x∗S = x∗1 + · · · + x∗s from qS = q1 + · · · + qs may be seen by comparing the differences s = sup |P (a < qS < b) − P (a < xS < b)| a
and ∗s = sup |P (a < qS < b) − P (a < x∗S < b)|, a
Then, by the central limit theorem, lim s ≥ 0.049,
s→∞
whereas lim ∗s = 0, s→∞
showing that at least by this specific criterion, conventional rounding is not the best choice. In this section the Diaconis and Freedman results are extended, and the motivation for changing the nature of the goal of rounding is given.(2) We define and study the properties of optimal roundings in terms of ideal metrics. We start with some definitions from the theory of rounding (see Balinski and Young (1982)). A vector problem is a pair (p, h) where p = (pj ) is a vector of real numbers, j ∈ N = {1, . . . , n}, and h is a real number as well. Given any positive real t > 0, a rule of (1/t)-rounding is a mapping t such that t (p, h) ∈ {x = (xj ); xj = kj /t, kj integer, j ∈ N }.
(8.6.1)
In the sequel we write x = t (p), since h remains fixed and it is generally the case that h = j pj . Our interest concerns problems with pN := N pj = h; for example, pj s are probabilities and h = 1.(3) This motivates two immediate questions: (a) Given a rule t what is the chance that xN := N xj = h for x = t (p)? (b) What specific rule t maximizes this chance? So from now on, it is assumed that problem (p, h) satisfies pN = h. (2) The
results in this section are due to Balinski and Rachev (1993). is no loss of generality in considering h = 1.
(3) There
180
8. Probabilistic-Type Limit Theorems
The conventional rule of (1/t)-rounding, x = t (p), with xi equal to pj rounded to the nearest 1/t, was first discussed by Mosteller, Youtz, and Zahn (1967) for the vector problem (p, 1). They computed the chance that xN = 1 with this rule of rounding for several different probability models generating p. Diaconis and Freedman (1979) assessed the limiting probability that xN = 1 as t → ∞ under the assumption that p is absolutely continuously distributed on the simplex Sn := {p = (pj ) ≥ 0, j ∈ N ; pN = 1}.
(8.6.2)
The MYZ-rule of rounding (see Mosteller, Youtz, and Zahn (1967) and Diaconis and Freedman (1979)) is defined by 1/2
xj := [pj ]t
= k/t
if k −
1 1 < tpj ≤ k + , k an integer. (8.6.3) 2 2
The MYZ-rule is only one example of an infinite class of rules of rounding first discussed by Balinski and Young (1978) called divisor rules of (1/t)rounding based on d, dt , defined by xj := [pj ]dt := k/t
if d(k − 1) < tpj ≤ d(k), k an integer, (8.6.4)
where d(h) ∈ [k, k + 1], the divisor criterion, is any real-valued function from the integers into the closed interval k to k + 1. It is the “threshold” for rounding: Above the threshold round up, at or below it round down. The divisor rules arose as a characterization of rules satisfying certain desirable properties in the context of apportionment problems; see Balinski and Young (1982). The best known and most discussed among them are the following (for integer k): Adams (or round up): Dean (or harmonic mean): Hill (or geometric mean):
d(k) = k;
1 d(k) = k(k + 1)/ k + 2 $ d(k) = k(k + 1);
Webster (or arithmetic mean, 1 the MYZ-rule): d(k) = k + ; 2 Jefferson (or round down): d(k) = k + 1.
(8.6.5)
;
(8.6.6) (8.6.7) (8.6.8) (8.6.9)
Theorem 8.6.1 (Diaconis and Freedman (1979)) Suppose p is absolutely continuously distributed on the simplex and dt is based on d(k) = k+C, C ∈ [0, 1]. Then, as t → ∞, P (xN = 1) → P (C − 1 ≤ V1 + · · · + Vn−1 ≤ C),
(8.6.10)
where the Vi ’s are independent and uniformly distributed on [−C, 1 + C].
8.6 Ideal Metrics in the Problem of Rounding
181
A stationary divisor rule based on C is a divisor rule based on a criterion d(k) = k + C, 0 ≤ C ≤ 1. A K-stationary divisor K t based on (C0 , . . . , CK−1 , C), 0 ≤ C ≤ 1, 0 ≤ Ci ≤ 1 for all i, is a divisor rule based on the divisor criterion ⎧ ⎨ k+C if 0 ≤ k ≤ K − 1, k (8.6.11) d(k) = ⎩ k+C otherwise. From Theorem 8.6.1 it follows that among the stationary divisor rules (or 0-stationary ones) the MYZ-rule, or Webster’s rule w t , maximizes the probability that in the limit as t → ∞ the sum of the roundings is 1. The following theorem strengthens Diaconis–Freedman’s result. Theorem 8.6.2 Suppose p is uniformly distributed on the simplex Sn . Then the maximum of the limiting probability limt→x P (xN = 1) over the set of all K-stationary divisor rules is attained with C = 12 and C0 , . . . , CK−1 arbitrary. The proof is similar to that in Theorem 8.6.1; for details we refer to Balinski and Rachev (1993). This theorem remains valid under the weaker assumption that p is absolutely continuous on the simplex. However, the result suggests that maximizing the limiting probability in (8.6.10) is not in fact a reasonable objective. The rate of convergence in the best case, (8.6.12) lim P (xN = 1) 1 2 n−1 1 1 1 1 Vi ≤ ; Vi i.i.d. uniform on − , = P − < 2 2 2 2 1 3 6 + o(n−1/2 ), ≈ π(n − 1)
t→x
can be very slow. Indeed, for every n and every t it is possible to choose an absolutely continuous distribution µn,t on Sn such that P (xN = 1), whereas the right-hand side of (8.6.10) is less than 1 for n ≥ 3 and converges to 0 as n grows. To construct such a distribution choose p1 , . . . , pn−1 to be independent identically (0, δ)-uniformly distributed random variables with 1 δ ∈ (0, 2m ) and pn = 1 − p1 − · · · − pn−1 . This determines an absolutely continuous distribution on Sn such that the Webster rule of rounding gives xi = 0 for i = 1, . . . , n − 1 and xn = 1, with probability equal to 1. Perhaps a more serious limitation of this approach is the fact that a “best” K-stationary rule—“best” in the sense of maximizing the limit in (8.6.10), an example of which is the Webster rule w t —may not satisfy a natural “continuity” property: If two problems (p, 1) and (p∗ , 1) come
182
8. Probabilistic-Type Limit Theorems
from distributions that are “close,” then the distributions of their roundings should be close as well. Letting µ represent the distribution of the roundings under rule when the original data have the distribution µ, “continuity” means that µ ≈ µ∗
implies
µ ≈ µ∗ .
(8.6.13)
For example, the continuity property may fail if µ is an absolutely continuous distribution on Sn and µ∗ is a discrete distribution on Sn . Specifically, take µ∗ to be the point distribution p1 = · · · = pn−1 = 0, pn = 1, and µ(i) to be absolutely continuous distributions satisfying ν1 (µ(i) , µ∗ ) → 0 as i → ∞, with ν1 a distance in the space of distributions on Sn that metrizes w ∗ the weak convergence. Given a similar distance ν2 (w t µ, t µ ) in the space of the distributions of the roundings on the lattice {1, 1 ± 1/t, . . .}n , the continuity property says that as i → ∞, ν1 (µ(i) , µ∗ ) → 0
implies
(i) w ∗ lim ν2 (w t , µ , t µ ) → 0.
t→x
However, in our example the first sequence goes to 0, but the second is (i) ∗ converges weakly to w strictly positive. To see this, recall that if w t µ t µ , then the distribution of f (x(i) ) converges to that of f (x∗ ) for any continuous (i) function f on Rn , and in particular, the distribution of t(xN −1) converges ∗ ∗ to that of t(xN − 1). But, by construction t(xN − 1) = 0, whereas for any (i) i, t, we have (xN − 1) → V1 + · · · + Vn−t as t → ∞, and the last term is not 0 with probability 1. In contrast with Theorem 8.6.1, where the result is independent of the particular absolutely continuous distribution that obtains, the above remarks show that constructing a reasonable rule of rounding should depend on the information that is available concerning the distribution µ. We next consider the vector problem of rounding as a special case of the matrix problem: (pij ), i ∈ N = {1, . . . , n}, j ∈ S = {1, . . . , s}, made up of s observations pj = (p1j , . . . , pnj )T , j ∈ S (the columns of the matrix) of the random vector p = (p1 , . . . , pn )T , consisting of independent nonidentically distributed random variables. The ith row of data pi = (pi1 , . . . , pis ) consists of s independent observations of pi . For simplicity, set q := (qj )j∈S , where qj = (pij ), the vector problem with i.i.d. random variables. It is assumed throughout that q1 has a continuous distribution. Suppose x = (xj )j∈S is obtained by some 1/t-rule of rounding; that is, xj = kj /t, where the kj are integer-valued i.i.d. random variables, and that q ≥ 0, so the xj ’s take values on the lattice {0, 1/t, 2/t, . . .}. The question of interest is, What is the deviation between the sum of the observations qs = q1 + · · · + qs and the sum of the roundings xs = x1 + · · · + xs . The
8.6 Ideal Metrics in the Problem of Rounding
183
answer will, of course, depend upon the distribution of the qj ’s, the way by which the deviation is measured, and the rule of rounding that is used. A rule of rounding x∗ = ∗t (q) is optimal with respect to the metric µ over a class of rules R if for any q, µ(qs , x∗s ) = min{µ(qs , xs ); x = t (q), t ∈ R} t
and
µ
1 1 qs , x∗s s s
(8.6.14)
→ 0
as s → ∞.
(8.6.15)
Roughly speaking, optimality asks that the deviation between the sum of the observations and the sum of the roundings should be as small as possible and that the deviation between their respective sample means should go to 0 as the number of observations grows. Suppose X = (X1 , . . . , Xn ) and Y = (Y1 , . . . , Yn ) are random vectors each with i.i.d. components. A variety of probability metrics µ have been proposed to measure the deviation between two such distributions (see Rachev (1991) and the previous Sections 8.1–8.5). Probably the best known is the Kolmogorov, or uniform, distance , (X, Y ) :=
sup
−∞<x<∞
|FX (x) − FY (x)|,
where FZ is the distribution function of Z. Others include the Kantorovich metric, Dall’Aglio’s extension of the Kantorovich metric, and the L´evy metric. But for these metrics there exists no optimal rule of rounding because it is impossible to meet condition (8.6.15). The only type of metric that seems to be able to meet this condition is an ideal metric of order r > 0 (see Section 8.1) that satisfies 2 1 n n n Xj , c Yj µ(Xj , Yj ) for any c > 0. (8.6.16) ≤ cr µ c 1
1
1
The class of all ideal metrics has not been characterized; indeed, only a few have been identified. An example of an ideal metric of order r = 1 + (1/p) ∈ (1, 2), p ≥ 1, is θr (X, Y ) = sup{|E(f (X) − f (Y ))|; f ∈ Fr }.
(8.6.17)
Here Fr is the set of all functions f whose second derivative f has a 1/q ≤ 1, with (1/p) + (1/q) = 1 (see bounded q-norm: f |q = (f |f |q ) Maejima and Rachev (1987)). It is easy to check that s 1 1 qs , xs = s−r θr (qs , xs ) ≤ s−r θr (qj , xj ) = s1−r θr (q1 , x1 ). θr s s 1
184
8. Probabilistic-Type Limit Theorems
Thus 1 1 qs , xs = O(s−1/p ) as s → ∞ whenever θr (q1 , x1 ) < ∞. (8.6.18) θr s s Notice that by the definition of θr , θr (X, Y ) < ∞ implies
E(X − Y ) = 0.
In fact, θr (X, Y ) ≥ supa>0 |E(aX − aY )| = +∞ if E(X − Y ) = 0. Thus, a necessary condition for x∗ = xt (q) to be an optimal stationary rule with respect to θr is the equality of the first moments of q1 and of its (1/t)rounding: ∞
Eq1 = Ex∗1
or Eq1 =
1 P (q1 > d(k)/t). t 0
(8.6.19)
This condition is sufficient under the mild assumption that Eq1r < ∞.
(8.6.20)
In fact, (8.6.19) implies θr (qs , x∗s ) ≤ s where κr (X, Y ) = r
1 κr (q1 , r1 ), Γ(r − 1)
∞ 0
xr−1 |FX (x)−FY (x)| dx is the Kantorovich rth pseu-
domoment, and by (8.6.20), κr (q1 , x∗1 )
≤
Eq1r
1 + E q1 + t
r < ∞.
Summarizing, this yields the following theorem. Theorem 8.6.3 Suppose that the vector q = (q1 , . . . , qs ) consists of i.i.d. random variables, and Eq1r is finite for some r ∈ (1, 2). Then 1 1 θr (qs , xs ) = ∞, θr = ∞ qs , xs s s for any rule of (1/t)-rounding x = t (q) with Eq1 = Ex1 . However, if Eq1 = Ex∗1 for some stationary rule x∗ = ∗t (q), then ∗t is an optimal rule with respect to µ over the class of all stationary rules. Moreover, 1 1 1 ∗ θr ≤ s1−r qs , xs κr (q1 , x∗1 ) = O(s1−r ). s s Γ(r − 1)
8.6 Ideal Metrics in the Problem of Rounding
185
Corollary 8.6.4 If in Theorem 8.6.3 t is stationary, then it is optimal with respect to θr if and only if tEq1 =
∞
P (tq1 > k + C).
(8.6.21)
0
Equation (8.6.21) always has a solution C that is unique for any t > 0 under the condition that Fq1 (x) is strictly increasing. Thus, an optimal stationary rule with respect to θr exists and is unique over the set of stationary rules. Example 8.6.5 Suppose q1 is uniform on the interval (0, 1). Then (8.6.21) becomes [t−C] 1 k+C 1 = . 1− 2 t 0 t If t ∈ IN = {1, 2, . . .}, then this is equivalent to t−1 1 1 k+C 1− = , 2 t 0 t whose solution is C(t) = 12 , so the Webster rule is optimal. However, if t ∈ N + 12 , then the solution is C(t) = (t − 14 )/(2t + 1) < 12 , so the Webster rule is only asymptotically optimal. The set of stationary rules are clearly a very restrictive class within the class of all divisor rules. Moreover, the rate of convergence of (8.6.15) for θr can be very slow, as can be seen from Theorem 8.6.3, where 1 − r ∈ (−1, 0). Indeed, simple examples show that the order of convergence O(s1−r ) is exact. Given two optimal rounding rules with respect to some µ, x∗ = ∗t (q) and x = t (q), ∗t is preferred to t if µ((1/s)qs , (1/s)x∗s ) → 0 at a faster rate than µ((1/s)qs , (1/s)xs ) → 0 as s → ∞. And ∗t is optimal of order λ > 0 with respect to µ over a class R of rules if for any q it is optimal, and 1 1 qs , x∗s (8.6.22) µ → O(s−λ ) as s → ∞. s s Theorem 8.6.3 tells us that there exists an optimal stationary rule ∗t of order λ = r − 1 with respect to θr if the rth moment of q1 exists and is finite. Is it possible that an ideal metric µ other than θr would determine an optimal stationary rule different from ∗t ? The answer is negative for all “nonpathological” metrics.
186
8. Probabilistic-Type Limit Theorems
Corollary 8.6.6 Suppose µ is an ideal metric of order r > 1 such that the law of large numbers holds with respect to µ; that is, X 1 + · · · + Xn , EX1 → 0 as n → ∞ (8.6.23) µ n for any nonnegative i.i.d. Xi ’s with finite EX1 . Then there is a unique stationary rule of (1/t)-rounding that is optimal of order r − 1, and it is determined by the solution C to (8.6.21). Proof: Suppose two different stationary rules x∗ = ∗t (q) and x = t (q) are optimal of orders λ∗ and λ (where λ∗ and λ may be different). Then 1 ∗ 1 1 µ(Ex∗1 , Ex1 ) ≤ µ Ex∗1 , x∗s + µ xs , qs s s s 1 1 1 +µ qs , xs + µ xs , Ex1 . s s s However, the right-hand side goes to 0 as s → ∞ by (8.6.16) and (8.6.23), and therefore Ex∗1 = Ex1 . Since the rules are stationary, they are both determined by (8.6.4), and so they are the same. 2 To obtain faster rates of convergence one must consider a wider class of divisor rules than the stationary ones. It is our objective to show that if one extends the analysis to K-stationary rules, then one can find an optimal rule of order λ = K + 1. To do this it is necessary to generalize the definition of θr (since by the previous definition r ∈ (1, 2)), which can be done as follows: θr (X, Y ) = sup{|E(f (X) − f (Y ))|; f ∈ Fr },
(8.6.24)
where r = r0 + 1/p > 0, r0 ≥ 0 an integer, and p ≥ 1. Furthermore, in (8.6.24), Fr is the set of all functions f with (r0 + 1)-derivative f (r0 +1) satisfying f (r0 +1) q = [ |f (r0 +1) |q ]1/q ≤ 1, with (1/p) + (1/q) = 1. As before, it may be checked that 1 1 ≤ s1−r θr (q1 , x1 ), θr qs , xs s s so that (8.6.18) holds, provided that θr (q1 , x1 ) < ∞. In addition, θr (q1 , x1 ) < ∞ implies
E(q1k − xk1 ) = 0
for 0 < k ≤ r0 . (8.6.25)
Conversely, if Eq1r < ∞, then E(q1k − xk1 ) = 0 for 0 < k ≤ r0 implies θr (q1 , x1 ) < ∞. (8.6.26)
8.6 Ideal Metrics in the Problem of Rounding
187
In fact, θr (q1 , x1 ) can be bounded by θr (q1 , x1 ) ≤ Cr K κr (q1 , x1 ) ≤ Cr Eq1r < ∞,
(8.6.27)
where Cr and Cr are constants, and κr is the pseudomoment of order r. It is possible to say more. θr (q1 , x1 ) → 0 means that the distributions of q1 and of x1 “θ-merge”; that is, L(q1 , x1 ) → 0 and E(q1r − xr1 ) → 0, where L is the L´evy metric between the distributions of q1 and x1 (for merging, the L´evy metric, and related concepts see D’Aristotile, Diaconis, and Freedman (1988) and Rachev (1991)). A K-stationary rule x∗ = ∗t (q) of order λ = r − 1 is optimal with respect to the metric θr over the class of K-stationary rules if for any q, θr (qs , x∗s ) = min{θr (qs , xs ); x = t (q), t K-stationary}, (8.6.28) t
and furthermore, the rate of θr -merging is θr
1 1 qs , x∗s s s
= O(s1−r ) as s → ∞
(8.6.29)
whenever Eq1r < ∞. In fact, (8.6.25) through (8.6.27) imply that if for a rule x = t (q) the moment conditions E(q1k − xk1 ) = 0 do not hold for some k = 0, . . . , r0 , then θr (q1 , x1 ) = ∞. So (8.6.28) fails for s = 1, and thus t is not optimal. On the other hand, if these moment conditions do hold for all k for some rule x∗ = ∗t (q), then by (8.6.27) and the ideality of θr , θr (qs , x∗s ) ≤ sCr Eq < ∞ for any fixed s, so (8.6.28) is fulfilled. This proves Theorem 8.6.7 Suppose Eq1r < ∞ with r = K +1+1/p. Then x∗ = ∗t (q) is an optimal K-stationary rule of order λ = r − 1 with respect to the metric θr if and only if the thresholds C0 , . . . , CK−1 , C = CK = CK+1 . . . are chosen such that E(q1j − x∗j 1 )=0
for j = 1, . . . , K + 1.
(8.6.30)
This means that the K +1 thresholds must be chosen such that the following system of equations is satisfied Eq1j
=
∞ j k k=1
t
P (k − 1 + Ck−1 < tq1 < k + Ck ),
j = 1, . . . , K + 1, where Ck = CK for k ≥ K.
(8.6.31)
188
8. Probabilistic-Type Limit Theorems
It has been observed above that when q1 is uniform over the interval (0, 1), t is an integer, and K = 0, then the optimal rule is Webster’s. But if K > 0, there is no stationary rule that meets (8.6.31) even in the uniform case. Example 8.6.8 Suppose q1 is uniform over the interval (0, 1), t is an integer, and K = 1.(4) Then (8.6.31) yields the solution C0 = 13 , C = (3t − 2)/(6t − 6), so the optimal 1-stationary rule is determined by d(0) = 1 3 , d(k) = k + (3t − 2)/(6t − 6) for k = 0. Suppose t = 2. Then q1 is rounded to either 0, 12 , or 1 as follows: x∗1 = 0 if 0 < q1 ≤ 16 , x∗1 = 12 if 16 < q1 ≤ 56 , and x∗1 = 1 if 56 < q1 ≤ 1. The first 1 = Var x∗1 . two moments of q1 and x∗1 agree: Eq1 = 12 = Ex∗1 , Var q1 = 12 1 3 ∗ 3 (Indeed, the third moments also agree: Eq1 = E(x1 ) = 4 .) The Webster rule, on the other hand, rounds q1 as follows: xw 1 = 0 if 1 1 3 3 w 0 < q1 ≤ 14 , xw 1 = 2 if 4 < q ≤ 4 , and x1 = 1 if 4 < q1 ≤ 1. The first 1 w moments of q1 and xw 1 agree but not the second: Eq1 = 2 = Ex1 , Var q1 = 1 1 w 12 < 8 = Var x1 . Since the first two moments of q1 and x∗1 are equal for the optimal 1stationary rule, the central limit theorem applies, and for the Kolmogorov metric (qs , x∗s ) = sup{|P (qs ≤ x) − P (x∗s ≤ x)|; x ∈ IR} we have # " qs − s/2 x∗s − s/2 , √ ≈ O(s−1/2 ) as s → ∞. (qs , x∗s ) = √ 12s 12s In contrast, the Webster rule gives ) 7 * # " w w − s/2 − s/2 − s/2 − s/2 8 q x q x s s √ , s√ = , s√ √ (qs , xw s) = 12 12s 12s 12s 8s
→ N(0,1) , N(0,√2/3) > 0 as s → ∞, where N(m,σ) is the normal distribution with mean m and standard deviation σ. In terms of the ideal metric θ3 there is even better evidence for the advantage of the optimal 1-stationary rule over the standard Webster rounding. For 1 1 ≤ s−2 θ3 (q1 , x∗1 ) ≤ constant s−2 , θ3 qs , x∗s s s the last inequality is due to the optimality of x∗1 , whereas 1 1 ∗ qs , xs θ3 = +∞ for any s. s s (4) For other examples including the case K = 2, we refer to Balinski and Rachev (1993).
8.6 Ideal Metrics in the Problem of Rounding
189
For more results concerning the vector problem with more i.i.d. observations and the problem of rounding tables we refer to Balinski and Rachev (1993).
9 Mass Transportation Problems and Recursive Stochastic Equations
In this chapter we use the regularity properties of metrics and distances defined via mass transportation problems in order to investigate the asymptotic behavior of stochastic algorithms and recursive equations. The recursive structure allows us to apply fixed-point and approximation techniques to the space of probability measures supplied with adapted probability metrics in order to describe the limiting behavior of various algorithms.
9.1 Recursive Algorithms and Contraction of Transformations Several different approaches to the asymptotic analysis of algorithms have been given in the literature. Interesting results have been obtained by the transformation method, the method of branching processes, the method based on stochastic approximations, the martingale method, and others. The analysis of algorithms is an important application of stochastics in computer science and poses challenging questions and problems. It has led to some new developments also in stochastics. Based on the properties of minimal metrics as introduced in Chapter 8, a promising new method for asymptotic analysis has recently been introduced. R¨ osler (1991) gave an asymptotic analysis of the quicksort algorithm based on the minimal p -metric. His proof has been extended in several papers by Rachev and R¨ uschendorf to a general “contraction method” with
192
9. Mass Transportation Problems and Recursive Stochastic Equations
a wide range of possible applications. A series of examples and further developments of the method have been found in some recent work. The contraction method (in its basic form) uses the following sequence of steps: 1. Find the correct normalization of the algorithms. (Typically by studying the first moments or tails.) 2. Determine the recursion for the normalized algorithm. 3. Determine the limiting form of the normalized algorithms. The limiting equation typically is defined via a transformation T on the set of probability measures. 4. Choose an ideal metric µ such that T has good contraction properties with respect to µ. This ideal metric has to reflect the structure of the algorithm. It also has to have good bounds in terms of interpretable other metrics and has to allow one to estimate bounds (in terms of moments usually). As a consequence one obtains 5. The conjectured limiting distribution is the unique fixed point of T . Finally, one should ensure that the recursion is stable enough so that the contraction in the limit can be pulled over to establish contraction properties of the recursion itself for n → ∞. This is the technically most involved step in the analysis. 6. Establish convergence of the algorithms to the fixed point. Applications of this method have been given to several sorting algorithms, to the communication resolution interval (CRI) algorithm, to generalized branching-type algorithms, to bootstrap estimators, iterated function systems, learning algorithms, and others. For several examples modifications of this method have been considered. There are examples where the contraction factors converge to one. In several cases there is a trivial limiting recursion that gives no clue to a possible limit distribution. Also, logarithmic normalizations and convergence rates have to be handled by special considerations. We begin with a discussion of contraction properties of transformations T on the set of probability distributions on a basis space U. Stochastic algorithms are typically directly described by the iterates of a transformation T on the set of all probability distributions on the basic space U , or else they are asymptotically closely related to the iterations (and thereby to the fixed points) of a transformation T that describes the limiting equation. They can, for example, differ from iterations by a stochastic sequence converging to zero. Examples for recursive algorithms
9.1 Recursive Algorithms and Contraction of Transformations
193
that are asymptotically related to iterations of transformations T are studied in Section 9.2. Consider a contraction transformation T : M 1 (U ) → M 1 (U ), where M 1 (U ) is the set of probability measures on U supplied with probability metrics as in Chapter 8. Applying the fixed-point theorems for complete, separable metric spaces, we can infer the convergence of the iterates (T n F ), F ∈ M 1 (U ), to a fixed point of T . Some of the following examples serve to describe the influence of the choice of the metrics, while others indicate the range of applicability to different fields. (a) Consider at first a transformation of the form d
TF =
N
ai (τ )Yi + C(τ ).
(9.1.1)
i=1
Here, for F ∈ M 1 (U ), (Yi )1≤i≤N is an i.i.d. sequence with distribution F . Furthermore, C(τ ), ai (τ ), τ are real random variables independent of (Yi ), N and finally, T F is the law of i=1 ai (τ )Yi + C(τ ). Consider Zolotarev’s ideal metric ζr of order r > 0: For r = m + α, m ∈ IN, and 0 < α ≤ 1, ζr (X, Y )
(9.1.2) & % := sup |E(f (X) − f (Y ))|; |f (m) (x) − f (m) (y)| ≤ x − yα .
Here f (m) (x) denotes the Fr´echet derivative of order m, and · is a norm on U . Next, suppose that F, G ∈ M 1 (U ), where (U, · ) is a Banach space. Proposition 9.1.1 ζr (T F, T G) ≤
1N
2 r
E|ai (τ )|
ζr (F, G).
(9.1.3)
i=1
Proof: The proof of (9.1.3) uses the ideality properties of ζr ; that is, (i) ζr (X + Z, Y + Z) ≤ ζr (X, Y )
for Z independent of X, Y,
(9.1.4)
and (ii) ζr (cX, cY ) = |c|r ζr (X, Y ), for all c ∈ IR. d
d
(9.1.5)
Let Yi = F , Zi = G be independent r.v.s. Then, with r = m + α,
%
ai (τ )Yi + C(τ ) −Ef ai (τ )Zi + C(τ ) ; ζr (T F, T G) = sup Ef & |f (m) (x) − f (m) (y)| ≤ |x − y|α
194
9. Mass Transportation Problems and Recursive Stochastic Equations
≤ ≤ ≤
ζr ζr
ai (t)Yi + C(t), ai (t)Yi ,
ai (t)Zi + C(t) dP τ (t)
ai (t)Zi ) dP τ (t)
|ai (t)|r dP τ (t)ζr (Yi , Zi ) =
E|ai (τ )|r ζr (F, G),
2
which proves (9.1.3).
For the property to hold it suffices to require that
E|ai (τ )|r < 1
ζr (F, G) < ∞.
and
(9.1.6)
In some cases, the last condition can be established by making use of the inequality
ζr ≤
Γ(1 + α) vr , Γ(1 + r)
(9.1.7)
where vr (X, Y ) :=
xr d|PX − PY |(x)
is the absolute pseudomoment of order r. For random vectors X and Y , and m ≤ r < m + 1, (9.1.7) requires that all moments X and Y of order ≤ m coincide. Recall that the minimal Lp -metrics p are ideal of order r = min(1, p).
Proposition 9.1.2 For F, G ∈ M 1 (U ),
p (T F, T G) ≤
1N i=1
2 r
E|ai (τ )|
p (F, G).
(9.1.8)
9.1 Recursive Algorithms and Contraction of Transformations d
195
d
Proof: Let Yi = F, Zi = G, 1 ≤ i ≤ N , be independent pairs of random variables with laws F, G, and Lp (Yi , Zi ) = p (F, G). Then
p (T F, T G) = p ai (τ )Zi + C(τ ) ai (τ )Yi + C(τ ), ' ' ' ' ai (τ )Yi − ≤ ' ai (τ )Zi ' p
N
≤
E|ai (τ )|r Yi − Zi p
i=1
1N
=
2 r
E|ai (τ )|
p (F, G).
i=1
2 So, the contraction property will hold if N
E|ai (τ )|r < 1
and p (F, G) < ∞.
(9.1.9)
i=1
Under additional assumptions we may improve (9.1.9). Let U be a Hilbert space and let F, G have identical first moments. Then the following is a refinement of Proposition 9.1.2. Proposition 9.1.3 1 21/2 N 2 E|ai (τ )|
2 (F, G).
2 (T F, T G) ≤
(9.1.10)
i=1
Proof: With Yi , Zi as in the proof of (9.1.8), we have ' '2 ' '
22 (T F, T G) ≤ ' ai (τ )(Yi − Zi )' 2
=
N i=1
=
E|ai (τ )|2 Yi − Zi 22
E|ai (τ )|2 22 (F, G). 2
If the Banach space U is of type p, 1 ≤ p ≤ 2, and F, G have identical d d first moments (more precisely, E(Y − Z) = 0 for Y = F, Z = G), then for 1 ≤ p ≤ 2,
p (T F, T G) ≤
Bp1/p
1N i=1
21/p p
E|ai (τ )|
p (F, G).
(9.1.11)
196
9. Mass Transportation Problems and Recursive Stochastic Equations
Here Bp is the constant arising in the Woyczinski inequality (cf. Rachev and R¨ uschendorf (1992a)). For U = Lp (µ) (Lp (µ) is the space of all r.v.s X with p finite |X| dµ), one can choose the constants B1 = 1, Bp = 18p3/2 /(p − 1)1/2 for 1 < p ≤ 2. The proof of (9.1.11) is similar to that of (9.1.10), but in (9.1.11) we use the Woyczinski inequality instead of the Hilbert space structure. If the underlying space is Euclidean, we can derive similar contraction properties with respect to other metrics defined in Chapter 8. Example 9.1.4 Let N = 2, let τ be uniformly distributed on (0,1), and a1 (τ ) = τ, a2 (τ ) = 1 − τ . Then the contraction factor α with respect to the
p -metric is given in the following list:
1 -metric : α
2 -metric : α ζ2 -metric : α ζ3 -metric : α
= Eτ + E(1 − τ ) = 1,
i.e., “no contraction”; (9.1.12) $ = (Eτ + E(1 − τ ) ) = 2/3; = Eτ 2 + E(1 − τ )2 = 2/3; 2
2 1/2
= 1/2.
Clearly, if for a probability metric µ on M 1 (U ) the contraction factor is α < 1, µ(T F, T G) ≤ αµ(F, G), then µ(T n+1 F, T n F ) ≤ αn µ(T F, F );
(9.1.13)
i.e., one obtains an exponential convergence rate to a fixed point. In the d example above we consider the recursion Xn+1 = τn Xn + (1 − τn )X n + d
d
C(τn ), τn = τ, where X n = Xn , and τn , Xn , X n are independent. The d
corresponding fixed point equation is X = τ X + (1 − τ )X + C(τ ). So under the condition of equal first moments, the convergence rate is (2/3)n for the “ideal” metric ζ2 , in comparison to (2/3)n/2 for the 2 -metric. √ √ If a1 (τ ) = τ , a2 (τ ) = 1 − τ , then with respect to the 2 -metric the contraction factor is α = Eτ + E(1 − τ ) = 1; i.e., there is no contraction. The same “no contraction” property is valid for ζ2 . For ζ3 the contraction factor is α = Eτ 3/2 + E(1 − τ )3/2 = 45 < 1; so the contraction property holds if ζ3 (F, G) < ∞. (b) We next consider the transformation d
TF =
max {ai (τ )Yi }.
1≤i≤N
(9.1.14)
For U = IRk , k = 1, 2, . . . , ∞, F ∈ M 1 (U ), let (Yi )1≤i≤N , τ be as in (a). Let ai (τ ) ≥ 0 and consider d
TF =
max {ai (τ )Yi }.
1≤i≤N
(9.1.15)
9.1 Recursive Algorithms and Contraction of Transformations
197
We shall study the contraction property of T by making use of the weighted uniform metric r : r (X, Y )
sup M (x)r |FX (x) − FY (x)|,
=
(9.1.16)
x∈IRk
where M (x) := mini≤k |xi |. In the next proposition we use the fact that r is an ideal metric of order r with respect to the maxima of i.i.d. r.v.s. Proposition 9.1.5 r (T F, T G) ≤
E(ai (τ ))r r (F, G).
(9.1.17)
d
Proof: Let (Zi ) be i.i.d. and Z1 = G. Then using the max-ideality of r , we have r (T F, T G) = r max{ai (τ )Yi }, max{ai (τ )Zi } i≤N i≤N = sup |x|r (Fmax{ai (t)Yi } (x) − Fmax{ai (t)Zi } (x)) dP τ (t) x ≤ r (max{ai (t)Yi }, max{ai (t)Zi }) dP τ (t) 1 2 r τ r ai (t) dP (t)r (Yi , Zi ) = ≤ Eai (τ ) r (Y1 , Z1 ). i
i
2 For more general maxima we can again use the p -metrics. Let U = Lλ (µ), 1 ≤ λ < ∞. For F ∈ M 1 (U ) and ai (τ ) ≥ 0 consider d
TF =
max ai (τ )Yi + C(τ ).
i≤i≤N
(9.1.18)
d
Here (Yi ) are i.i.d., Y1 = F, τ is independent of (Yi ), and C(τ ) has values in U . For any p, λ we have 2 1N r Eai (τ ) p (F, G), r = min(1, p). (9.1.19)
p (T F, T G) ≤ i=1
For 1 ≤ p ≤ λ < ∞ we have the following improvement: Proposition 9.1.6 If 1 ≤ p ≤ λ < ∞, then
p (T F, T G) ≤
N i=1
(Eai (τ )p )1/p p (F, G).
(9.1.20)
198
9. Mass Transportation Problems and Recursive Stochastic Equations d
d
Proof: Let Yi = F , Zi = G satisfy Yi − Zi pλ,µ = p (F, G)p , where Xλ,µ = ( |X(t)|λ dµ(t))1/λ . Then
p (T F, T G) ≤ (E max ai (τ )Yi − max ai (τ )Zi pλ,µ )1/p 1 21/p p/λ λ τ = E | max ai (t)Yi (s)−max ai (t)Zi (s)| dµ(s) dP (t) ⎛
≤ ⎝
E
1
⎞1/p
2p/λ
dP τ (t)⎠
ai (t)λ |Yi (s) − Zi (s)|λ dµ(s)
i
≤
1
" E
21/p #p/λ τ ai (t) |Yi (s) − Zi (s)| dµ(s) dP (t) λ
λ
i
(since p/λ ≤ 1)
=
1
21/p Eai (τ )EYi −
Zi pλ,µ
=
i
1
21/p p
Eai (τ )
p (F, G).
i
2 (c) Bootstrap Estimators
For a separable Banach space U and F ∈ M 1 (U ), let µ(F ) = x dF (x), n d µn (F ) = n1 i=1 Xi , where (Xi ) are i.i.d., Xi = F , and Fn is the empirical measure of X1 , . . . , Xn . For p > 0 denote by Γp the class of distributions with finite pth moment. From the strong law of large numbers, for any F ∈ Γp we then obtain (cf. Chapter 8)
p (Fn , F ) → 0 a.s.
and E p (Fn , F ) → 0.
(9.1.21) d
∗ Let now X1∗ , . . . , Xm be a bootstrap sample; i.e., the (Xi∗ ) are i.i.d., X1∗ = ∗ is the empirical distribution of Fn (conditionally on X1 , . . . , Xn ), and Fn,m ∗ ∗ ∗ , F ) → 0 a.s. (conditionally X1 , . . . , Xm , m = m(n). The condition p (Fn,m on X) is equivalent to the joint convergence
1 f (Xi∗ ) → m i=1 m
1 p ∗ d (Xi , a) m i=1 m
→
f dF a.s.,
f ∈ Cb (U ),
dp (x, y) dF (x) a.s.
9.1 Recursive Algorithms and Contraction of Transformations
199
(cf. Chapter 8), representing a special form of the SLLN for real-valued r.v.s. In the case (U, d) = (IRr , · ), p > 1, we are able to obtain a rate of convergence for the bootstrap approximation. Let γ = kr/[(k − r)(k − 2)], k > r, k > 2, and xγ F ( dx) < ∞. Then 1
E pp (Fn , F ) ≤ C(r, k, p)n−(1− p )/k
(9.1.22)
(cf. Rachev (1984d, pp. 667–668)), and thus 1
∗ ∗ , F ) ≤ 2p EEX1 ,...,Xn pp (Fn,m , Fn ) + 2p C(r, k, p)n−(1− p )/k E pp (Fn,m
1 1 (9.1.23) ≤ C ∗ (r, k, p) m−(1− p )/k + n−(1− p )/k .
If, however, the (Xi ) are in the domain of an α-stable distribution, then it is more natural to choose the bootstrap estimator from a distribution Fn 1 , . . . , X n be a bootstrap which has a tail behavior similar to F . Let then X sample with adapted tail behavior such that
p (Fn , F ) → 0. Consider Vn =
1 n1/α
n
(9.1.24)
d i ) = X Tn (Fn ) (1 ≤ α ≤ 2) as a i=1 (Xi − EF n d n 1 Vn = n1/α i=1 (Xi − EF Xi ) = Tn (F ). Then
boot-
strap estimator of for a Banach space of type p, 1 ≤ p ≤ 2 and 1 ≤ α ≤ p, it follows from (9.1.11) that 1 1 1 − E X 1 , X1 − EF X1 )
p (Vn , Vn ) ≤ Bp n p − α p (X 1 1 1 − EF X1 |) → 0. ≤ Bp n p − α ( p (Fn , F ) + |E X
(9.1.25)
Since p (Vn , Y(α) ) → 0, where Y(α) an α-stable r.v., then in the case
p (X1 , Y(α) ) < ∞, it follows from the bound in (9.1.25) that
p (Vn , Y(α) ) → 0.
(9.1.26)
Moreover, the rate of convergence is of order o(n1/p−1/α ). In the case p = 2 and F ∈ Γ2 , the condition (9.1.24) is satisfied for Fn = Fn∗ ; for Euclidean spaces this case has been considered by Bickel and Freedman (1981). Their investigation of more general functionals on the set of empirical measures can also be extended to the setting we described. (d) Transformation by Markov Kernels, Image Encoding Let (U, d) be a separable metric space and let wi : U → U, 1 ≤ i ≤ N , be mappings satisfying d(wi x, wi y) ≤ si d(x, y).
(9.1.27)
200
9. Mass Transportation Problems and Recursive Stochastic Equations
Given a probability distribution (pi )1≤i≤N define the Markov kernel K(x, ·) =
N
pi εwi (x) ,
N ≤ ∞.
(9.1.28)
i=1
The implied transformation on M 1 (U ) is denoted by T F = KF, where KF (A) = K(x, A)F ( dx).
(9.1.29)
Let now Lip(U ) be the set of Lipschitz functions, |f (x) − f (y)| ≤ d(x, y) for all x, y ∈ U . Then for Kf (x) = pi f ◦ wi (x), we have |Kf (x) − Kf (y)| ≤ pi |f ◦ wi (x) − f ◦ wi (y)| (9.1.30)
pi si d(x, y). ≤ pi d(wi (x), wi (y)) ≤ Let us look at the contraction properties for the mapping T with respect to the Kantorovich metric d
d
µL (F, G) = sup{|E(f (X) − f (Y ))|; X = F, Y = G, f ∈ Lip(U )}. (9.1.31) We have then µL (T F, T G) = sup{|E(f (K(X, ·) − f (K(Y, ·)))|; f ∈ Lip(U )} (9.1.32) = sup{|E(Kf (X) − Kf (Y ))|; f ∈ Lip(U )}
≤ pi si sup{|E(g(X) − g(Y ))|; g ∈ Lip(U )}
= pi si µL (F, G). If pi si < 1, then T is a contractive mapping. By the Kantorovich–Rubinstein theorem µL coincides with the minimal L1 -metric, and therefore,
1 (T F, T G) ≤ pi si 1 (F, G). (9.1.33) Moreover, for any p > 0, we can extend this result as follows. Proposition 9.1.7
p (T F, T G) ≤ d
pi spi d
1/p∧1
p (F, G).
Proof: Suppose X = F, Y = G satisfy ⎧ ⎨ (E d(X, Y )p )1/p , if p ≥ 1,
p (F, G) = ⎩ E d(X, Y )p , if p < 1.
(9.1.34)
9.1 Recursive Algorithms and Contraction of Transformations
201
Take I to be a random variable with values in {1, 2, . . . , N } and distribution (pi ) that is independent of X, Y . Then for 1 ≤ p,
pp (T F, T G)
≤ E d(wI(X), wI(Y ))p =
N
(9.1.35)
E(d(wi (X), wi (Y ))p I(I = i))
i=1
=
N
pi E d(wi (X), wi (Y ))p
i=1
≤
1N
2 pi spi
E d(X, Y ) =
pi spi pp (F, G).
i=1
2
The proof for the case 0 < p < 1 is similar.
Remark 9.1.8 Another proof of (9.1.32) can be given via the dual representation of p . Indeed, for 0 < p < ∞, p = 1 ∨ p,
pp (F, G) =
sup
f dF +
g dG; f, g bounded continuous, f (x) + g(y) ≤ dp (x, y), ∀x, y ∈ U .
(9.1.36)
Therefore,
pp (T F, T G) (9.1.37) p = sup{E f (K(X, ·)) + E g(K(Y, ·)); f (x) + g(y) ≤ d (x, y)}, d
d
where X = F, Y = G. Since Kf (x) + Kg(y) ≤
pi dp (wi (x), wi (y)) ≤
pi spi dp (x, y),
we obtain
pp (T F, T G) = sup{E Kf (X) + E Kg(Y )), f (x) + g(y) ≤ dp (x, y)}
≤ pi spi pp (F, G). (9.1.38) Remark 9.1.9 Hutchinson (1981) was the first to prove convergence with respect to the metric µL in the case si ≤ 1. Barnsley and Elton (1988) used the above Markov chain to “construct images” by so-called iterated
202
9. Mass Transportation Problems and Recursive Stochastic Equations
function systems (IFS). They established the existence of a unique attractive invariant measure µ under the assumption that N 8
d(wi (x), wi (y))pi ≤ r d(x, y),
r < 1.
(9.1.39)
i=1
The above inequality is indeed implied by the condition (9.1.27) with N 8
spi i < 1.
(9.1.40)
i=1
In the case of affine maps on IRk we can improve the arguments in the following way (see Proposition 9.1.10 below, cf. also Burton and R¨ osler (1995)). Define d
T F = AX + b,
(9.1.41)
where A is a random matrix, b a random vector, (A, b) independent of X, d
and X = F . Consider the operator norm of the expected product EAT A, EAT A := sup x∈IRk x=0
(EAT A)x . x
(9.1.42)
Then EAT A = sup x =0
EAx, Ax = EA, A, x2
(9.1.43)
where the right-hand side is the L2 -norm of EA, A. Proposition 9.1.10 Assume that b 2 < ∞. Then
2 (T µ, T ν) ≤
@ EAT A 2 (µ, ν)
for any µ, ν ∈ M 1 (IRk ) with finite second moments.
(9.1.44)
9.1 Recursive Algorithms and Contraction of Transformations
203
Proof: Let Y, Z be random vectors with distributions µ, ν and 2 (µ, ν) = (EY − Z2 )1/2 , where Y, Z are independent of (A, b). Then
2 (T µ, T ν) ≤ AY − AZ 2 $ ≤ EA(Y − Z), A(Y − Z) @ = EY − Z, E(AT A)(Y − Z) @ $ EAT A EY − Z, Y − Z ≤ @ EAT A 2 (µ, ν). =
(9.1.45)
2 Notice that the estimate from above defined in (9.1.45) is an improvement (in the case p = 2) over the general estimate AX p ≤ A · X p ≤ A p · X p .
(9.1.46)
In fact, the above general bound requires the stronger condition A p < 1 to yield the contraction property. (e) Environmental Processes Let (Yi , Zi ) be a sequence of i.i.d. pairs of r.v.s with values in U × IR, where U is a separable Banach space. Define a sequence of r.v.s (Sn ) by Sn+1 = (Yn + Sn )Zn ,
S0 ≥ 0.
(9.1.47)
This kind of process has found several applications in environmental modeling and has been studied intensively. If we write τn = (Yn , Zn ), and a(τn ) = Zn , C(τn ) = Yn Zn , then Sn+1 = a(τn )Sn + C(τn ),
(9.1.48)
so we have a special case of (9.1.1). Under the condition that E|a(τ )|r < 1, d
the operator T S = a(τ )S + C(τ ) is contractive. Therefore, (Sn ) converges (with respect to some ideal metric of order r such as ζr , for example) to a fixed point, i.e., a solution of d
S = (Y + S)Z.
(9.1.49)
Numerous properties of the solutions of the above equation have been studied in the literature; see, for example, Rachev and Samorodnitsky (1995) and Rachev and R¨ uschendorf (1995).
204
9. Mass Transportation Problems and Recursive Stochastic Equations
9.2 Convergence of Recursive Algorithms In this section we apply the contraction properties established in Section 9.1 to study limits for recursive algorithms. We shall use the “method of probability metrics.” The main idea of this method is to transform the recursive equations in such a way that with respect to a suitable metric we can derive contraction properties in the limit; i.e., we consider decompoAn such that (Yn ) has contraction properties and W An sitions Xn = Yn + W converges to zero. This idea will be demonstrated in various examples. n The approach is natural from the following point of view. If Sn = i=1 Yi is a sum of independent (centered) random variables and Xn = n−1/α Sn is the normalized sum, then Xn satisfies the following simple recursion: Xn+1 =
n n+1
−1/α
Xn + (n + 1)−1/α Yn+1 .
(9.2.1)
Thus the central limit theorem can be considered as the limit theorem of this simple (stochastic) recursion. The form of the recursion corresponding to the strong law of large numbers is even simpler.
9.2.1
Learning Algorithm
Let Y1 , Y2 , . . . be an i.i.d. sequence of r.v.s with values in a separable Banach space with first moment µ. Define the following recursive sequence: Let X1 be arbitrary with finite first moment, and let Xn+1 =
n 1 Xn + Yn+1 . n+1 n+1
(9.2.2)
Xn can be viewed as an easy recursive algorithm designed to “learn” about the unknown theoretical mean µ given the sample (Y1 , . . . , Yn ). Proposition 9.2.1
ζr (Xn , µ) → 0
if
ζr (X1 , µ) < ∞.
Proof: Let n = EXn . Claim 1: n → µ. For the proof of Claim 1 note that from (9.2.2), we obtain
n+1
= = =
n 1
n + µ n+1 n+1 n−1 2
n−1 + µ n+1 n+1 1 n
1 + µ, n+1 n+1
(9.2.3)
9.2 Convergence of Recursive Algorithms
205
where the last step follows from the inductive argument. This implies Claim 1. Define next Z n = Xn − n ,
Wn = Yn − µ.
(9.2.4)
Then, n 1 (Zn + n ) + (Wn+1 + µ), (by (9.2.2)) n+1 n+1 1 1 n n + Wn+1 − n+1 − n −µ = Zn n+1 n+1 n+1 n+1 n 1 + Wn+1 (by (9.2.3)). (9.2.5) = Zn n+1 n+1
Zn+1 + n+1
=
Zn+1
Now let µr be an ideal metric of order r, 1 < r < 2, and bn = µr (Zn , 0) (for example we can choose µr = ζr ). Claim 2. µr (Zn , 0) → 0 if a = µr (W1 , 0) < ∞. For the proof of this claim note that bn+1
n 1 + Wn+1 ,0 = µr (Zn+1 , 0) = µr Zn n+1 n+1 n 1 ≤ µr Zn , 0 + µr Wn+1 , 0 n+1 n+1 (since Zn is independent of Wn+1 ) r r 1 n µr (Zn , 0) + µr (Wn+1 , 0) = n+1 n+1 r r 1 n bn + a. = n+1 n+1
Therefore, r r # 1 1 bn−1 + a + n n+1 a r r n−1 1 = bn−1 + 2 a n+1 n+1 r r 1 1 b1 + n a. (9.2.6) ≤ n+1 n+1
bn+1
≤
n n+1
r "
n−1 n
r
Since 1 < r, it follows that bn → 0. In particular, for µr = ζr , we obtain from Claim 1 that
206
9. Mass Transportation Problems and Recursive Stochastic Equations
ζr (Xn , µ) → 0
if ζr (X1 , µ) < ∞.
(9.2.7) 2
For the case of Euclidean spaces the condition ζr (Y1 , µ) < ∞ is satisfied if Y1 has a finite absolute rth moment, r > 1. Therefore, under the assumption of a finite rth moment we obtain convergence of Xn to µ. The sequence (Xn ) provides a simple example of a “learning algorithm” (for µ). Its convergence to µ in the real case can also be obtained as an application of the Robbins–Siegmund lemma (cf. Robbins and Siegmund (1971)) under the stronger assumption of a finite second moment. In this simple example we can, of course, directly prove the convergence of Xn to µ under the assumption of a finite first moment. The arguments above illustrate the general idea behind the method of probability metrics and show that in this simple case the method of probability metrics works with weaker assumptions than the method of stochastic approximation based on the Robbins–Siegmund lemma. Some further simple examples of the Robbins–Monroe-type recursion Xn+1 = fn (Xn , Yn+1 ) can be treated similarly. Note that our method only needs a metric ideal of order r > 1 such that µr (Xn − n , 0) → 0 implies that Xn − n → 0 in distribution. The p metric will not work in this example, since its degree of ideality is only r = min(1, p).
9.2.2
Branching-Type Recursion
Consider the following recursive sequence (Ln ): L0 ≡ 1,
d
Ln =
K
(i)
Xi Ln−1 + Y.
(9.2.8)
i=1 (i)
Here Ln−1 are i.i.d. copies of Ln−1 , (Xi ) is a real random sequence, K is a random number in IN0 , and Y is a random “immigration” such that d (i) K, {(Xi ), Y }, (Ln−1 ) are independent. As usual, = denotes equality in distribution. (9.2.8) induces a transformation T on M 1 , the set of probability distributions on (IR1 , B 1 ). This is achieved by letting T (µ) be the distriK bution of i=1 Xi Zi + Y , where the (Zi ) are i.i.d. µ-distributed r.v.s, and moreover, (Zi ), {(Xi ), Y }, K are independent. Some special cases of those transformation and recursion have been studied intensively in the literature. If Xi ≡ 1, then (9.2.8) describes a Galton– Watson process with immigration Y with the number of descendants of a parent described by K. The recursion (9.2.8) can be viewed as a branching process with random multiplicative weights. The special case where K is constant, Y = 0, and (Xi ) are i.i.d. and nonnegative was introduced by Mandelbrot (1974) in his analysis of the Yaglom–Kolmogorov
9.2 Convergence of Recursive Algorithms
207
turbulence model. This case has been also studied by Kahane and Peyri`ere (1976) and Guivarch (1990), who considered the question of nontrivial fixed points of T , the existence of moments of the fixed points, and the convergence of (Ln ). For Xi ≡ K −1/α , the solutions of the fixed-point equation d K Z = i=1 K −1/α Zi are Paretian stable distributions (if Zi ≥ 0). For that reason the solutions are called semistable in Guivarch (1990). In this section we will be mainly interested in the case of multipliers Xi and solutions Zi with moments of order ≥ 2. While the analysis of Kahane and Peyri`ere (1976) is based on an associated martingale, Guivarch (1990) uses a more elementary martingale property together with a conjugation relation and moment-type estimates for the Lp -distance, 0 < p < 1. Motivated by some problems in infinite particle systems, Holley and Liggett (1981) and Durrett and Liggett (1983) considered a smoothing transformation with (Xi ) that are not not necessarily independent and assume that Xi ≥ 0, K constant, and Y = 0. In Durrett and Liggett (1983) a complete analysis of the case is given. In particular, a necessary and sufficient condition for the existence and characterization of (all) fixed points as well as a general sufficient condition for convergence was derived, as well as a generalization of the result of Kahane and Peyri`ere on the existence of moments. The method of Durrett and Liggett is based on an associated branching random walk. The use of contraction properties of minimal Lp -metrics in this section allows us to obtain quantitative approximation results for the recursion (9.2.8). Under moment assumptions used in this section, the recursion converges to the limiting distribution exponentially fast. This is demonstrated by simulations for several examples. Also, it is possible to remove the assumption of nonnegativity, to deal with a random number K, and to add immigration Y . This allows us to include applications to branching processes as well as to study the development of the total mass in the construction of multifractal measures (cf., for example, Arbeiter (1991)). For details we refer to Cramer and R¨ uschendorf (1996b). (a) Branching-Type Recursion with Multiplicative Weights In this section we shall study the recursion (9.2.8) allowing for dependent multipliers Xi but setting the immigration Y ≡ 0. In other words, we consider the recursion L0 ≡ 1,
(i)
d
Ln =
K
(i)
Xi Ln−1 ,
(9.2.9)
i=1
are i.i.d. copies of Ln−1 , (Xi ) is a square integrable real
(i) random sequence, K is a random number in IN0 , and K, (Xi ), Ln−1 are independent r.v.s. where
Ln−1
208
9. Mass Transportation Problems and Recursive Stochastic Equations
To determine the correct normalization of (L n ) we first consider the first
K moments of (Ln ). Set n := ELn , c := E X i=1 i , vn := Var(Ln ),
K K 2 a := E i=1 Xi , and b := Var i=1 Xi . Proposition 9.2.2 Let 0 = 1, n = cn . Suppose that b > 0, c = 0, a = c2 . Then 1 − ( ca2 )n , 1 − ca2
vn = bc2n−2
n ≥ 1, v0 = 0.
(9.2.10)
If a = c2 = 0, then vn = nban−1 . Proof: Using the independence assumption in (9.2.9) and the conditional expectations, we obtain
n
1 1K 22 1K 2 (i) (i) = E E Xi Ln−1 |K = E EXi Ln−1 = E
1K
i=1
EXi
i=1
2
n−1 = c n−1 ;
i=1
i.e., n = cn . Similarly, vn
2
= EL2n − (ELn ) ⎡ ⎛1 22 ⎞⎤ K (i) = E⎣ E⎝ Xi Ln−1 K ⎠⎦ − c2 2n−1 i=1 ⎤ ⎡ K 2
(i) (i) (j) = E⎣ E Xi Ln−1 + E Xi Xj Ln−1 Ln−1 ⎦ − c2 2n−1 ⎡
i =j
i=1
= E ⎣EL2n−1
K
EXi2 + 2n−1
⎤ E(Xi Xj )⎦ − c2 2n−1
i =j
i=1
⎡ ⎤ K = E⎣ EXi2 Var Ln−1 + 2n−1 + 2n−1 E(Xi Xj )⎦ − c2 2n−1 i=1
= E
1K
2 Xi2
vn−1 + Var
1K
i=1
= a vn−1 + b c2(n−1) = b
2 Xi
i =j
2n−1
i=1 n−1 k=0
ak c2(n−1−k)
9.2 Convergence of Recursive Algorithms
=
⎧ ⎨ b c2n−2 ⎩
1−( ca2 )n 1− ca2
= bc2n
1−( ca2 )n c2 −a
, if
nban−1 , if
209
a = c2 = 0, a = c2 .
2
In the case b = 0, we have vn = 0 for all n. Therefore, we consider only the case b > 0. √ From (9.2.10) we obtain that for a < c2 , vn is of the same order as
n . This makes it possible to use a simple normalization by n . Define for c = 0, n := Ln /cn . L
(9.2.11)
n = 1, and Var(L n ) → Then E L recursion
b c2 −a .
n satisfies the modified Moreover, L
1 (i) d n = L Xi Ln−1 , c i=1 K
(9.2.12)
(i)
1 n−1 1 where L n−1 := cn−1 . Define D2 to be the set of distributions on (IR , B ) with finite second moments and first moment equal to one. Next, define the mapping T : D2 → D2 by (i)
L
1 T (G) = L
1 Xi Z i c i=1 K
2 ,
(9.2.13)
where the (Zi ) are i.i.d. random variables with distribution G, and such that (Xi ), (Zi ), K are independent r.v.s. Let 2 denote the minimal L2 metric on D2 : % & 1/2 d d ; V = µ, W = ν (9.2.14)
2 (µ, ν) = inf E(V − W )2 ⎞1/2 ⎛ 1 −1 2 F (u) − G−1 (u) du⎠ . = ⎝ 0
Here F, G are the distribution functions of µ, ν respectively. If a < c2 , then T is a contraction with respect to 2 . Proposition 9.2.3 Assume that a < c2 . Then for F, G ∈ D2 , 7
2 (T (F ), T (G)) ≤
a
2 (F, G). c2
(9.2.15)
210
9. Mass Transportation Problems and Recursive Stochastic Equations d
d
Proof: Let the r.v.s U (i) = F, V (i) = G, i ∈ IN, be choosen on (Ω, A, P ) in such a way that ||U (i) −V (i) ||2 = 2 (F, G); for all i and K, (Xi ), (U (1) , V (1) ), (U (2) , V (2) ), . . . are all assumed to be independent. Then
22 (T (F ), T (G))
'2 ' K K ' '1 1 ' (i) (i) ' ≤ ' Xi U − Xi V ' ' 'c c i=1 i=1 2 ⎛ ⎡1 2 2 ⎤⎞ K K 1 ⎝ ⎣ ⎦⎠ (i) (i) = E E X U − X V K i i c2 i=1 i=1 )K
2 1 2 (i) (i) K = E E X − V U i c2 i=1 +
E Xi U (i) − V (i) Xj U (j) − V (j) |K ⎦
i =j
= =
1K 2
2 1 2 (i) (i) E EXi E U − V c2 i=1 a 2
(F, G). c2 2 2
As a consequence of Proposition 9.2.3 it follows that T has exactly one fixed point in D2 with variance equal to b/(c2 − a). The fixed-point equad
tion is given in terms of the independent random variables Z, Zi ∈ D2 , Zi = Z, (Zi ) as follows: 1 Z = Xi Zi . c i=1 K
d
(9.2.16)
As a corollary we obtain Theorem 9.2.4 If a = E n , Z) ≤
2 (L
a n/2 c2
K i=1
Xi2 < c2 , then
√ √
c2
b . −a
(9.2.17)
n converges in distribution to Z. In particular, L Proposition 9.2.5 If K is constant and E 2 ≤ k ≤ h, then E|Z|h < ∞.
⎤
K i=1
|Xj |k
< ck for all
9.2 Convergence of Recursive Algorithms
211
n can be equivalently represented by Yn in the following form: Proof: L Y0 = 1,
Yn =
1 cn
n 8
Xj1 ,...,jk ,
(j1 ,...,jn )∈{1,...K}n k=1 d
where (Xj1 ,...,jk−1 ,1 , ..., Xj1 ,...,jk−1 ,K ) = (X1 , ..., XK ) (cf. Guivarch (1990)). Moreover, (Yn ) is a martingale, and therefore |Yn |k is a submartingale. K (j) Representing the Yn in the recursive way Yn = 1c j=1 Xj Yn−1 , where (j)
Yn−1 are independent copies of Yn−1 , we have ⎞ ⎛ K ck E|Yn |k ≤ ⎝E |Xj |k ⎠ E|Yn−1 |k j=1
+
k1 + · · · + kK = k ki ≤ k − 1
8 K K 8 k |Xj |kj E|Yn−1 |kj . E k1 , . . . , kK j=1 j=1
We can infer from Theorem 9.2.4 that E|Yn |k is uniformly bounded for k ≤ 2. By induction over k ≤ h, we see that the lower-order terms in the above equation are uniformly bounded, say by a constant C. Since E|Yn |k ≥ E|Yn−1 |k , we obtain 1K ) 2* k k k |Xj | ≤ C. E|Yn | c − E i=1
Therefore, the assumptions of this proposition ensure that E|Yn |k is uniformly bounded for all k ≤ h. The submartingale convergence theorem now yields the existence of an integrable almost sure limit of |Yn |h . Since d n = n is absolutely h-integrable. L Yn , the weak limit Z of L 2 We can also obtain a “stability” result for the stationary equation (9.2.16). This will be achieved in terms of the p metrics defined as in (9.2.14) with 2 replaced by p. Suppose we want to approximate the solution S of the equation d
S =
K
Xi Si
i=1
by the solution of the “approximate” equation S∗ = d
K i=1
Xi∗ Si∗ .
(9.2.18)
212
9. Mass Transportation Problems and Recursive Stochastic Equations
Here we assume without loss of generality that c = 1 and consider the case of independent sequences (Xi ), (Xi∗ ) so that the pairs (Xi ), (Si ) and (Xi∗ ), (Si∗ ) are independent, and K is constant. Proposition 9.2.6 If K is constant, < 1, then
p (S, S ∗ ) ≤
K
∗ i=1 p (Xi , Xi )
< ε, and
ε||S ∗ ||p . K 1 − i=1 ||Xi ||p
K i=1
||Xi ||p
(9.2.19)
Proof: From the definition of S, S ∗ , 1K 2 K ∗ ∗ ∗
p (S, S ) = p Xi Si , Xi S i i=1
≤
K
i=1
p (Xi Si , Xi∗ Si∗ )
i=1
≤
K
( p (Xi Si , Xi Si∗ ) + p (Xi Si∗ , Xi∗ Si∗ )
i=1
≤
1K
2 ||Xi ||p
p (S, S ∗ ) + ||S ∗ ||p · ε.
i=1
This implies that
p (S, S ∗ ) ≤
ε||S ∗ ||p . K 1 − i=1 ||Xi ||p
2
A similar idea for establishing robustness of equations can be found in Rachev (1991, Chapter 19.3). For the case of a random K we replace Proposition 9.2.6 by the following one.
K ∗ 2 Proposition 9.2.7 If E (X − X ) ≤ ε2 , EXi = EXi∗ , and i i i=1
K 2 a=E < 1, then i=1 Xi
2 (S, S ∗ ) ≤
ε √ ||S ∗ ||2 . 1− a
(9.2.20)
Proof: By the triangle inequality and the independence assumption and the assumption EXi = EXi∗ , 1K 2 K ∗ ∗ ∗ Xi Si , Xi S i
2 (S, S ) = 2 i=1
i=1
9.2 Convergence of Recursive Algorithms
≤ 2
1K
Xi∗ Si∗ ,
i=1
1 1K Xi2 ≤ E =
√
K
2 Xi Si∗
+ 2
1K
i=1 221/2
i=1 ∗
∗
2 (S, S ) + ||S ||2
Xi Si∗ , 1 E
i=1
K
2 Xi Si
i=1 K
213
(Xi −
21/2 Xi∗ )2
i=1
a 2 (S, S ∗ ) + ε||S ∗ ||2 .
Therefore, ε||S ∗ ||2 √ . 1− a
2 (S, S ∗ ) ≤
2 Remark 9.2.8 In the case of constant K and nonnegative Xi , Durrett and Liggett (1983) proved that the stationary solution Z of (9.2.16) has moments of order β if and only if 1 2 K 1 β v(β) = log EXi < 0. (9.2.21) cβ i=1 For β = 2, (9.2.4) is equivalent to the condition a < c2 used in Proposition 9.2.3. In this sense this condition is sharp when using 2 -distances. Guivarch (1990) has shown how to relax the second-moment assumption. Remark 9.2.9 For the normalized recursion (9.2.12) with (Xi ) i.i.d. r.v.s, K being a constant (we assume without loss of generality that c = 1), we can use the form 0 = 1, L
n = L
n 8
Xj1 ,...,jk ,
(9.2.22)
(j1 ,...,jn )∈{1,...,K}n k=1
n is a sum where (Xj1 ,...,jk ) are independent and distributed as X1 ; i.e., L over product weights in the complete K-ary tree; cf. the proof of Proposition 9.2.5. For nonnegative multipliers Xi we also can consider functionals of the type Mn = max Pn
n 8
Xj1 ,...,jk ,
(9.2.23)
k=1
where the maximum is taken over all paths of length n. Taking logarithms, − ln Mn = − max Pn
n k=1
ln (Xj1 ,...,jk ) = min Pn
n k=1
(− ln (Xj1 ,...,jk )) ,
214
9. Mass Transportation Problems and Recursive Stochastic Equations
and applying Kingman’s subadditive ergodic theorem yields that for some constant β, 1 log Mn → β a.s. n
(9.2.24)
This shows that in some sense the max product weight is not larger in order of magnitude than the average product weight. In some cases, the constant d β is explicitly known, for example, for Xi = U [0, 1], β ≈ −0.23196 (cf. Mahmoud (1992, p. 165)). Remark 9.2.10 In some cases explicit solutions of (9.2.16) are known. (1) If K is constant and
d 1 c Xi =
a , a − a ) is beta distributed, then β( K K
d
Z = Γ(a, β) is gamma distributed (cf. Guivarch (1990)). d d 1 K (2) Suppose that Z1 = K i=1 Xi Zi , (Yi ) are i.i.d. r.v.s, X = X1 , and d d 1 K d Y1 = X1 Z1 holds. Then Y1 = K i=1 Yi X. Conversely, if Y1 = K d 1 K 1 i=1 Yi X1 and (Xi ) are i.i.d. r.v.s, then Zi = K j=1 Yj . The K d 1 K sequence (Zi ) solves the equation Z1 = K i=1 Xi Zi (cf. Durrett and Liggett (1983)). d K 1/ϑ (3) Suppose (Zi ) solves Z1 = i=1 Xi Zi , Xi ≥ 0. Then Yi = Zi Wi , where 0 < ϑ ≤ 2 and Wi are stable r.v.s with index ϑ, satisfy K
1/ϑ
Xi
d
Yi = Y1 .
(9.2.25)
i=1
To prove (9.2.25), observe that K
1/ϑ 1/ϑ Xi Zi Wi
d
=
i=1
1K
21/ϑ Xi Zi
d
1/ϑ
W1 = Z1
W1 = Y1 .
i=1
This interesting transformation property is used in Guivarch (1990) to reduce the case with moments of Xi of higher order to the case of moments of lower order. K d d (4) If i=1 Xi2 ≡ c2 = 0, then the normally distributed r.v.s Z = Zi = 2 N (0, σ ) satisfy (9.2.16). (5) If Z solves (9.2.16) and Z is an independent copy of Z, then Z ∗ := Z − Z solves 1 ∗ ∗ X Z . c i=1 i i K
Z∗ = d
9.2 Convergence of Recursive Algorithms
215
Here Xi∗ = τi Xi , and the τi are arbitrary random signs. In particular, d
if K = 2, and the r.v.s Xi∗ = U [−1, 1] are independent, then (9.2.16) d
is solved by Z ∗ := Z − Z, where Z = Γ(2, 12 , 0). n , Remark 9.2.11 The following simulations (Figures 9.1 and 9.2) of L d
d
d
0.4 0.2 0.0
0.0
0.2
0.4
0.6
0.6
0.8
0.8
1.0
1.0
with K = 2, X1 , X2 independent r.v.s, X1 = X2 = U [0, 1], Xi = β(2, 2), show good approximation of the empirical d.f. by the theoretical gamma distribution.
0
1
2
3
4
0
d
FIGURE 9.1. Empirical d.f., X1 = U [0, 1], n = 10, theoretical Gamma Γ(2, 12 , 0)
1
2
3
4
d
FIGURE 9.2. Empirical d.f., X1 = β(2, 2), n = 8, theoretical Gamma Γ(4, 14 , 0)
d
d
0.8
1.0
Remark 9.2.12 In the case K = 2, X1 , X2 independent r.v.s, X1 = X2 = U − 81 , 98 , no explicit solution of (9.2.16) is known. Nevertheless, the fol n converges very fast to the lowing simulation (Figure 9.3) shows that L 12 10 and L fixed point of (9.2.16). The empirical distribution functions of L can hardly be distinguished. Therefore, they may be regarded as the limiting 6 is already distribution function. The empirical distribution function of L very close to that limit (cf. Figure 9.3).
0.0
0.2
0.4
0.6
FIGURE 9.3. Empirical d.f. " of Ln# for d 1 9 n = 6, 10, and 12, X1 = U − 8 , 8
-4
-2
0
2
4
6
216
9. Mass Transportation Problems and Recursive Stochastic Equations
Remark 9.2.13 (Branching processes) Equation (9.2.9) includes the Galton–Watson process as special case. A Galton–Watson process is defined by the recursion Z0 = 1,
Zn
Zn+1 =
Xkn ,
(9.2.26)
k=1 d
d
where Xkn = X are i.i.d. r.v.s, n ∈ IN0 . Define K = X and Xi ≡ 1. Then d
Ln = Zn
(9.2.27)
for all n. This equality can be checked by induction on n. In fact, take first d Z0 = L0 = 1. If Zk = Lk for k ≤ n, then Zn+1
d
=
Ln
Xkn
d
=
d
=
i=1 d
=
K
i
⎛ ⎝
Ln−1
⎞(i) Xkn ⎠
L
(j)
n−1
j=1
i=1 k=1+
k=l K
K
i−1 j=1
(j)
Ln−1
⎛ (i) ⎞(i) Zn−1 K K d ⎜ n⎟ = X = Zn(i) ⎝ k⎠ i=1
k=1
Xkn
k=1
i=1
Ln(i) = Ln+1 . d
i=1
The assumption a < c2 is equivalent to the condition EX > 1. From (9.2.27) we can derive explicit stationary distributions and even the extinction probabilities in some cases. If, for example, X is geometrically distributed, P (X = k) = p(1 − p)k , k ∈ IN0 , then c = EX = 1−p p > 1 if p < 1−p 1 √ Zn and Var(X) = . The normalized Galton–Watson process 2 p2 Var(Zn ) converges to a (unique) solution of the fixed-point equation 3 X EX(EX − 1) 1 d Z = Zi , EZ = . (9.2.28) EX i=1 Var(X) p The extinction probability is easily seen to be 1−p . For the normalized continuous part an equation identical to (9.2.28) (but with different variances) is also valid. It is well known that this equation is solved by the geometric stable distribution of order 1, i.e., the exponential distribution. This finally implies √ 1 − 2p p 1 − 2p d Z = δ0 + exp , (9.2.29) 1−p 1−p 1−p √ since EZ = 1 − 2p, EZ 2 = 2(1 − p).
9.2 Convergence of Recursive Algorithms
217
(b) A Random Immigration Term In this section we admit an additional immigration term; i.e., we consider the recursion d
Ln =
K
(i)
Xi Ln−1 + Y,
(9.2.30)
i=1 (1)
(2)
where {(Xi ), Y }, K, Ln−1 , Ln−2 , . . . are independent r.v.s. The analysis of (9.2.30) is essentially simplified if we assume 0 := EL0 , v0 := Var(L0 ), K Var( 0 i=1 Xi + Y ) EY (if c = 1), v0 = , where a < 1.(9.2.31)
0 = 1−c 1−a If c = 1, then EY = 0 and 0 is arbitrary. Lemma 9.2.14 Under the assumption (9.2.31),
n = ELn = 0 , vn = Var(Ln ) = v0 , Proof: From (9.2.31), n = c n−1 + EY = vn
EY 1−c
for all n ∈ IN. (9.2.32)
= n−1 ,
Var(Ln ) = EL2n − 2n ⎡ K
2 (i) (i) (j) = E⎣ E Xi2 Ln−1 |K + E Xi Xj Ln−1 Ln−1 |K =
i=1 2
+ E(Y |K) + 2E
1K
i =j
Y
i=1
⎛
= a(vn−1 + 20 ) + ⎝
+ EY + 0 2E
1K
1 = avn−1 + Var 0
i=1
− 20
⎞
2 Y Xi
i=1 K
2*
EXi Xj ⎠ 20
i =j
2
(i) Xi Ln−1 |K
Xi + Y
− 20 2 = vn−1 . 2
Condition (9.2.31) is fulfilled for a two-point distribution of L0 . Indeed, it allows us to use the method in the proof of section 9.2.2(a). A change of the initial condition leads to the necessity to change the method of proof and leads to a great variety of different cases to be considered. We therefore restrict ourselves to (9.2.31) in this section.
218
9. Mass Transportation Problems and Recursive Stochastic Equations
As in section (a), we introduce the operator T : M ( 0 , v0 ) → M ( 0 , v0 ),
T (G) = L
1K
2 Xi Vi + Y
.
(9.2.33)
i=1
Here M ( 0 , v0 ) is the set of distributions with mean 0 and variance v0 , d
(Vi ) are i.i.d. r.v.s, and the random quantities V1 = G, (Vi ), {(Xi ), Y }, K are independent. Similarly as in Proposition 9.2.3, we obtain the contractive inequality
2 (T (F ), T (G)) ≤
√
a 2 (F, G).
(9.2.34)
This implies the convergence of Ln to a unique fixed point for the mapping √ T in M ( 0 , v0 ) with respect to the 2 -metric. The contraction factor is a. A sharper result (i.e., a smaller contraction factor) is obtained by the use of the Zolotarev metric ζr instead of 2 . Recall the definition for ζr (cf. (9.1.2)): ζr (F, G) = sup{|E(f (X) − f (Y ))|; |f (m) (x) − f (m) (y)| ≤ |x − y|α }(9.2.35) for r = m + α, m ∈ IN0 , 0 < α ≤ 1. Proposition 9.2.15 ζr (T (F ), T (G)) ≤ E
1K
2 |Xi |
r
ζr (F, G).
(9.2.36)
i=1
Proof: Recall that ζr is an ideal metric of order r with respect to summation; i.e., ζr (X + Z, Y + Z) ≤ ζr (X, Y ) for Z independent of X, Y , and moreover, ζr (cX, cY ) = |c|r ζr (X, Y ). Then, for (Zi ), (Wi ) being i.i.d. r.v.s distributed according to F, G, we have ζr (T F, T G)
=
1K 2 1K 2 sup Ef Xi Zi + Y − Ef Xi Wi + Y ; i=1 i=1 |f (m) (X) − f (m) (Y )| ≤ |x − y|2
≤
ζr
1K i=1
xi Zi + y,
K i=1
2 xi Wi + y
dP (X,Y,K) (x, y, k)
9.2 Convergence of Recursive Algorithms
≤
k
|xi |r ζr (Zi , Wi ) dP (X,Y,K) (x, y, k)
i=1
= E
219
1K
2 |Xi |
r
ζr (F, G).
i=1
2
Note that for the recursion defined by T the first two moments are matched. Therefore, we can apply (ζr ) with r ≤ 3 and obtain as a corollary the following theorem. EY Theorem 9.2.16 Suppose either c = 1 and 0 = 1−c or c = 1. SupK Var(0 i=1 Xi +Y ) pose also that EY = 0 and v0 = for a < 1. Then for 1−a 0 < r ≤ 3, the inequality
ar := E
K
|Xi |r < 1
i=1
implies anr ζr (L0 , L1 ) < ∞, 1 − ar
ζr (Ln , Z) ≤
where Z is a fixed point of T in M ( 0 , v0 ). In particular, Ln converges in distribution to Z. Therefore, in the case with immigration we also obtain an exponential rate of convergence. As a consequence, after a few iterations, the limiting distribution is already well approximated. d
1 Consider the following example: L0 = 10 δ−5 + 25 δ0 + 12 δ2 , K = 2, X1 , X2 d d d 5 25 independent, X1 = X2 = U − 12 , 1 , and Y = 17 32 δ−1 + 64 δ0 + 64 δ2 .
0.8
1.0
In this situation the assumptions of Theorem 9.2.16 are fulfilled. The fast convergence is confirmed by the closeness of the empirical distribution functions of L6 and L8 in the simulation described in Figure 9.4.
0.0
0.2
0.4
0.6
FIGURE 9.4. Empirical distribution functions for L6 and L8 ; the difference between the two curves is hardly visible
-4
-2
0
2
4
6
220
9. Mass Transportation Problems and Recursive Stochastic Equations
9.2.3
Limiting Distribution of the Collision Resolution Interval
In this section we apply the method of probability metrics to investigate the contraction properties of stochastic algorithms arising in random-access communication protocols. The results are due to Feldman, Rachev, and R¨ uschendorf (1994); see also Rachev and R¨ uschendorf (1995). The Capetanakis–Tsybakov–Mikhailov (CTM) protocol is one of the most elegant solutions to the classical multiple-access problem, in which a large population of users share a single communication channel. Throughput of this protocol is close to the throughput of the slotted Aloha protocol. The CTM protocol, unlike the classical “slotted Aloha,” is inherently stable. The “tree splitting protocols,” of which the CTM protocol is an example, pose some interesting mathematical problems and have been the subject of intensive study. We briefly review the definition of the CTM protocol; see Bertsekas and Gallager (1987). Time is divided into slots of equal duration. During each slot, one of the following events occurs: 1. The slot is wasted because no one transmits. 2. Exactly one user transmits a message, in which case the message is successfully received. 3. The slot is wasted because two or more users transmit, interfering with each other. This is called a collision. At the end of each slot, every user knows which of these three events occurred (this is sometimes called “trinary feedback”). When a collision occurs, all users involved (those that transmitted during the slot) divide themselves into two groups on a random basis. Each user performs the equivalent of an independent coin toss in order to make its decision; p is the probability that a user selects the first group. Users in the first group retransmit their messages during the slot following the one in which the collision occurred; users in the second group defer their retransmissions until all users in the first group have successfully transmitted their messages. If one of these groups contains more than one user, another collision will occur, in which case this group divides in the same way. Collisions are resolved on a last-come first-served (LCFS) basis; i.e., the most recent collision is resolved before any prior collisions. We assume that new messages are generated according to a Poisson process with aggregate rate λ. Users who have transmitted a message that collided do not generate any new messages until their messages have been transmitted; however, since only a finite number of users are involved in
9.2 Convergence of Recursive Algorithms
221
any collision, the rate λ remains constant when the total user population is infinite. Denote by Ln the number of slots required for resolution of a collision between n users. Ln includes the slot in which the initial collision occurred, plus the times for the two groups of users to transmit their messages. It is easily seen that the following stochastic recursion holds: d n−I +Y , Ln = 1 + LIn +X + L n
n ≥ 2,
(9.2.37) d
with initial conditions L0 = L1 = 1. Here In = B(n, p) is the number of users who retransmit immediately, X is the number of new arrivals in the collision slot, and Y is the number of new arrivals during the slot in d which the deferred retransmissions occur. Moreover, Ln = L n , and the n )n≥0 are assumed to be mutually random quantities X, Y, (Ln )n≥0 , (L independent. For real systems, the total number of users sharing a multipleaccess channel might be as large as 103 or 104 , but the number n of users involved in any collision would be a small fraction of this. Fayolle et al. (1985) showed that limn→∞ ELn /n exists if log p/ log(1 − p) is irrational;
(9.2.38)
otherwise, ELn /n oscillates around a certain value. In a subsequent paper, Fayolle et al. (1986) proved the linearity of the variance of Ln under (9.2.38) and the finiteness of all moments of Ln . Confirming a conjecture of Massey (1981), Regnier and Jacquet (1989) proved that the variance of d
Ln is not linear for In = B(n, p), p = 1/2, and X = Y = 0. In Jacquet and Regnier (1988) and Regnier and Jacquet (1989) the asymptotic normality of the standardized sequence {Ln } (for X = Y = 0 or both Poisson) was established. In this section we examine the asymptotic normality of the law of Ln without the specific assumptions on the distribution type of In , X, and Y , provided that the variance of Ln is asymptotically linear. In the second part of the section we numerically investigate the influence of nonlinearity d in the case In = B(n, p), X = Y = 0, and p = 12 . It turns out that E Ln /n and (Var Ln )/n increase monotonically with n until n reaches a large value (n = 39, 488). After that, the linearity breaks down, in agreement with the theoretical results. We consider the simple normalization √ when n = E Ln . (9.2.39) Yn = (Ln − n )/ n, The main theoretical result indicates that normality holds if the variance behaves linearly and the numbers of retransmissions are not concentrated too much in the extremes. In this sense the result can be considered as a stability result for the asymptotic distribution.
222
9. Mass Transportation Problems and Recursive Stochastic Equations
This idea of stability is confirmed by simulations for some cases of immigration in part (b) of this section. In the numerical study we detect the theoretically predicted instability but only for extremely large n and with a practically negligible order of magnitude. Our simulation study confirms the stability in the standard model concerning dependence on p. Moreover, a simulation study of the empirical d.f. of Yn confirms the normality for 102 ≤ n ≤ 104 . The “instability” of E Ln /n and Var Ln /N and hence the fluctuation of the limit distribution of Yn arises for very large values of n, n 104 . The order of magnitude of the instability is seen from our numerical results and simulation study to be extremely small (but existent; in accordance with the theoretical results) and can be neglected from the practical point of view. This has the valuable consequence that in practical applications one can use just simple linear normalizations as in (9.2.39) and the normal approximation also for n “moderately” large, 102 ≤ n ≤ 104 . (a) Asymptotic Normality of the Law of Ln In this section the asymptotic normality of Yn is shown under the following assumptions: For some r ∈ (2, 3], (a) E X r/2 + E Y r/2 < ∞ and
I n Lr −→ p ∈ (0, 1); n
(b) σn2 = (Var Ln )/n → σ 2 ; (c) supn E|Yn |r < ∞ for some r ∈ (2, 3]. Conditions (b), (c) amount to the correctness of the normalization in (9.2.39). Condition (a) implies that the subgroups are not allowed to be extremely large or small. Note that the number of retransmitting users In is not necessarily binomial in our assumptions. This allows us, for example, to consider departures from independence in the protocol. Regnier and d Jacquet (1989) showed that (a), (b), and (c) hold for In = B(n, p), (9.2.38), d
d
and X = Y = 0. More generally, one can allow X = Y = Pois(λ). Theorem 9.2.17 Under (a), (b), and (c), the distribution of Yn is asymptotically N (0, σ 2 ). Proof: From the definitions of Ln , Yn , and (9.2.37), (9.2.39), Yn
d
=
1/2 In + X YIn +X n 1/2 n − In + Y + Yn−In +Y + Cn (In , X, Y ). n
(9.2.40)
9.2 Convergence of Recursive Algorithms
223
Here Yn is an independent copy of Yn , and Cn (k, m, m) := n−1/2 (1 + k+m + n−k+m − n ). Define a sequence of normal N (0, σn2 )-distributed independent r.v.s Zn that are independent of (In ), X, and Y , and let Zn∗
=
In + X n
1/2 1/2 n − In + Y n−I +Y + Cn (In , X, Y ), Z ZIn +X + n n
n is an independent version of Zn . Z ∗ is an accompanying sequence where Z n to Yn . Let µr be one of the following ideal metrics of order r > 0: % & (s) µ(1) q ≤ 1 r (X, Y ) = sup |E(f (X) − f (Y ))|; f r = s + 1/ p, s ∈ IN, p ∈ [1, ∞],
with µ(2) r (X, Y ) = µ(3) r (X, Y ) =
1 1 + = 1, p q
sup |t|−r |E eit X − E eit Y |;
t∈IR
sup |h|r
h∈IR
sup |P (X + hN ∈ A) − P (Y + hN ∈ A)|, A∈B(IR)
where N is a standard normal r.v. independent of X and Y . Claim 1. (µr -closeness of Zn∗ and Yn ) Set an = µr (Zn , Yn ) and suppose a := supn an < ∞. Then sup µr(i) (Zn∗ , Yn ) ≤ a[pr/2 + (1 − p)r/2 ].
(9.2.41)
n
(i)
For µr = µr (i = 1, 2, 3), µr (Zn∗ , Yn ) ≤ P (In = k, X = m, Y = m) k,m,m
17
7
n−k+m Zn−k+m + cn (k, m, m), n 2 7 7 k+m n−k+m Yk+m + Yn−k+m + cn (k, m, m) n n ) r/2 r/2 * k+m n−k+m ≤ P (In = k, X = m, Y = m)a + n n k,m,m * ) r/2 r/2 n − In + Y In + X + . = aE n n µr
k+m Zk+m + n
224
9. Mass Transportation Problems and Recursive Stochastic Equations
Using assumption (a), the right-hand side of the above inequality converges to a[pr/2 + (1 − p)r/2 ]. Claim 2. (Condition (9.2.41) holds) a ≤ C sup(E|Yn |r + E|Zn |r ) < ∞.
(9.2.42)
n
(Throughout this section, C stands for an absolute constant that can have different values in different places.) For the proof, note that for i = 1, 2, or 3, µr(i) (X, Y ) ≤ C(E|X|r + E|Y |r ) < ∞, provided that E(X j − Y j ) = 0 for j = 1, 2 (see, for example, Rachev (1991, Chapters 14, 15)). Thus (9.2.42) holds. Claim 3. (Asymptotic normality of Zn∗ ) For n → ∞, bn = µr (Zn , Zn∗ ) → 0. (1)
We consider the case µr = µr only. Let κr be the rth pseudomoment, κr (X, Y ) = r |x|r−1 |FX (x) − FY (x)| dx. IR
Then, since the mean and variance of Zn matched those of Zn∗ (µr (Zn∗ , Yn ) < ∞ implies E((Zn∗ )j − Ynj ) = 0, j = 1, 2), it follows that bn ≤ C κr (Zn , Zn∗ ). Recall that (Zn )n≥1 is independent of (In )n≥1 and X, Y . Let N0 denote a standard normal r.v. independent of (In ) and X, Y . Consequently, 7 7 In + X n − In + Y ∗ ZIn +X + Zn = Zn−In +Y + Cn (In , X, Y ) n n 1/2 In + X 2 n − In + Y 2 d σIn +X + σn−In +Y = N0 + Cn (In , X, Y ) n n =: ηn N0 + Cn (In , X, Y ). From assumptions (a), (b) we get the convergence of ηn in probability: ηn −→ (p σ 2 + (1 − p)σ 2 )1/2 = σ. P
d
Since Zn∗ = ηn N0 + Cn (In , X, Y ) has the same mean and variance as Zn = σn N0 , then σn2
= E(ηn N0 + Cn (In , X, Y ))2 = E ηn2 + E(Cn (In , X, Y ))2 .
9.2 Convergence of Recursive Algorithms L2
225
P
As ηn −→ σ, we conclude that Cn (In , X, Y ) −→ 0. This implies that bn = µr (Zn , Zn∗ ) → 0, as desired in Claim 3. With an = µr (Zn , Yn ) ≤ µr (Zn∗ , Yn )+bn and a = lim an we finally obtain from claims 1–3 the following result: Claim 4. a = 0. To prove the claim, choose n0 = n0 (ε) (ε > 0) such that ak ≤ a + ε for k > n0 . Then for n ≥ n0 , as in the proof of Claim 1, we have an
≤ µr (Zn∗ , Yn ) + bn 2 1n −1 n 0 P (In = k) ≤ + k=0
k=n−n0
)
×E +
n−n 0 −1
k+X n
+
n−k+Y n
(ak+X + an−k+Y )
r/2 *
P (In = k)
k=n0
)
×E
r/2
sup 0≤k≤n0 −1, n−n0 ≤k ≤n
k+X n
r/2
(a + ε) +
n−k+Y n
r/2
* (a + ε) + bn .
Recall Claim 2, a = supn an < ∞, and thus as n → ∞, 1n −1 2 n 0 P (In = k)2a E(X r/2 + Y r/2 ) a ≤ lim sup + n
k=0
k=n−n0
+ (a + ε)(p + (1 − p)r/2 ) + lim sup bn 0 + (a + ε)(pr/2 + (1 − p)r/2 ) + 0. r/2
=
Since r > 2, we have pr/2 + (1 − p)r/2 < 1, which implies that a = 0, and thus the proof of the theorem is complete, since µr -convergence implies weak convergence. 2
Remark 9.2.18 Theorem 9.2.17 shows a remarkable stability of the central limit theorem for Ln . It says that the central limit theorem can be expected if the variance behaves approximately linearly and that it is even true under protocols that are not based on a binomial number of retransmitting users. In concrete examples it is not easy to obtain the asymptotic behavior of the first moments. Our method of proof separates this problem and establishes a general structural stability property concerning the asymptotic distribution. This should be of some interest for the application of the algorithm, too.
226
9. Mass Transportation Problems and Recursive Stochastic Equations
This stability is not clear or expected from the methods that established the central limit theorem up to now in some very special cases. (b) Numerical Results In the first part of this section we study numerically the extent of nonlind earity of ELn , Var Ln in the special case of (9.2.37) where X = Y = 0, In = B(n, p), log p/ log(1 − p) rational. Initial investigation of the behavior of the mean n of Ln at p = 0.5 failed to show the predicted instability of n /n. The normalized value n /n seemed to converge rapidly, reaching a value of about 2885 for n = 2400, and showing no variation out to 7 decimal places with further increase in n. The increments n /n − n−1 /(n − 1) were observed always to be positive, another indication of convergence. At n = 38, 488, a negative increment appears, and subsequently, values of the increment oscillate in a sinusoidal fashion, with a peak magnitude of about 1 × 10−10 . The behavior of the increments is shown graphically in Figure 9.5 on a logarithmic scale.
FIGURE 9.5. Increments of n /n, p = 0.5, n = number of users who initially collide
Based on recursions for the first moments, the numerical √ results for evalun−1 ation of n /n, ∆( n /n) := nn − n−1 , Varn := Var(Ln / n), and ∆(Varn ) := Varn − Varn−1 are shown in Table 9.1. On the other hand, a change in the initial conditions disturbs the value of n /n and Varn (see Table 9.2). Table 9.1 confirms the stability of nn ≈ 2.88, Varn ≈ 3.38 for moderate n ∈ (102 , 104 ) and p = 0.5. Slight perturbation of p around 0.5 does not change the overall stability of nn for practically relevant n; see Figure 9.6. Summarizing the numerical findings, it appears that for reasonably large n ≥ 100 and p = 0.5 the nonlinearity of n /n and Varn is not observed
9.2 Convergence of Recursive Algorithms
n 2 3 4 5 10 100 500 1000 5000 10000
n 2 3 4 5 10 100 500 1000
227
TABLE 9.1. numerical results p = 0.5, L0 = L1 = 1 n /n ∆(n /n) Varn ∆(Varn ) 2.5000D+00 1.5000D+00 4.0000D+00 4.000D+00 2.5556D+00 5.5556D−02 3.2593D+00 −7.4074D−01 2.6310D+00 7.5397D−02 3.3832D+00 1.2396D−01 2.6838D+00 5.2857D−02 3.3875D+00 4.2812D−03 2.7853D+00 1.0985D−02 3.3832D+00 1.1672D−04 2.8754D+00 1.0113D−04 3.3834D+00 9.1046D−07 2.8834D+00 4.0528D−06 3.3834D+00 −8.5624D−08 2.8844D+00 1.0224D−06 3.3834D+00 −4.1963D−08 2.8852D+00 3.4639D−08 3.3834D+00 2.1844D−08 2.8853D+00 7.3428D−09 3.3835D+00 −4.1539D−07
n /n 1.0000E+00 1.1111E+00 1.1905E+00 1.2419E+00 1.3427E+00 1.4327E+00 1.4407E+00 1.4417E+00
TABLE 9.2. p = 0.5, L0 = L1 ∆(n /n) Varn 1.000E+00 1.000E+00 1.1111E−01 8.1481E−01 7.9365E−02 8.4580E−01 5.1429E−02 8.4688E−01 1.1048E−02 8.4579E−01 1.0107E−04 8.4586E−01 4.0304E−06 8.4586E−01 1.0117E−06 8.4586E−01
=0 ∆(Varn ) 1.000E+00 −1.8519E−01 3.0990E−02 1.0703E−03 2.9179E−05 2.2762E−07 −2.1503E−08 −1.2471E−08
FIGURE 9.6. |∆(n /n)|, p = 0.499
in a practically relevant magnitude. Also, in this range of values of n the behavior of n and Varn is stable with respect to p. The following simulations (Figures 9.7, 9.8, and 9.9) show a good agreement with the normal approximation for n ≥ 100. For n = 20 or n = 30 the normal fit is no longer good. Further simulation results indicate stability with respect to the value of p.
228
9. Mass Transportation Problems and Recursive Stochastic Equations
FIGURE 9.7. Simulation curve for Yn = (Ln − n )/σn for n = 1000, p = 0.5, L0 = L1 = 1, based on 936,725 trials, and the fitted normal curve with mean zero and variance 3.3834 as given in Table 9.1
FIGURE 9.8. Normal fit to empirical d.f. with n = 50, p = 0.5
FIGURE 9.9. Normal fit with σ 2 = 3.3874 to the simulated Yn ’s; n = 1000, p = 0.49, L0 = L1 = 1 based on 697,675 trials
In the final simulations (Figures 9.10, 9.11) we consider the case with nonzero immigrations X, Y in a symmetric and a nonsymmetric case with masses in 0,1,2. These examples confirm the general robustness idea that asymptotic normality is approximatively valid if the variances behave approximately lin-
9.2 Convergence of Recursive Algorithms
FIGURE 9.10. Normal fit for n = 40/100 and X ∼ (nonsymmetric case)
3 δ 4 0
229
+ 18 δ1 + 18 δ2 , Y ∼ δ0 ,
3 δ1 + FIGURE 9.11. Normal fit for n = 50/10000 and X ∼ Y ∼ δ0 16 metric case)
1 δ 16 2
(sym-
early (which is observed in these examples empirically).
9.2.4
Quicksort
The quicksort algorithm, which was introduced by C.A.A. Hoare in 1961– 1962, represents a standard sorting procedure in computer systems. From a list of n arbitrary (but different) real numbers it selects an element x randomly. Then the remaining numbers are divided into two groups, the group of numbers smaller and that of numbers larger than x. The same procedure is applied to each of these groups if they contain more than one element. The algorithm ends with a sorted list of the original numbers.
230
9. Mass Transportation Problems and Recursive Stochastic Equations
If Ln denotes the number of comparisons in the quicksort algorithm on its way to ordering n elements x1 , . . . , xn , then Ln satisfies the following recursive equation: d
Ln = n − 1 + LIn + Ln−In ,
L0 = L1 = 0,
L2 = 1. (9.2.43)
Here In , n − In are the sizes of the subgroups, and they are assumed to be uniformly distributed on {0, . . . , n−1}. The expectation n = ELn satisfies then the resursion
n = n − 1 +
n−1
P (In = i)( i + n−i ),
i=0
and therefore, n+1 1 2
n
n−1 2(n − 1) + − 4. = + = n+1 n n(n + 1) i n + 1 i=1
This yields
n = 2n ln n + n(2γ − 4) + 2 ln n + 2γ + 1 + O(n−1 ln n), (9.2.44) where γ ≈ 0.5772 is the Euler constant. Similarly, for vn = Var(Ln ), we have 2 2 vn = 7 − π n2 + o(n2 ). (9.2.45) 3 The normalized random sequence Yn =
Ln − n n
(9.2.46)
satisfies the recursion d
Yn =
In n − In Y In + Y n−In + Cn (In ), n n
(9.2.47)
where Cn (i) =
n−1 E(Li + Ln−i − Ln ). n
As n → ∞, Inn converges to some random variable τ that is uniformly distributed on [0, 1]. Moreover, Cn (In ) = Cn (n Inn ) can be uniformly approximated as follows: sup |Cn (nx ) − C(x)| ≤ x∈(0,1)
6 n ln n + O(n−1 ). n
(9.2.48)
9.2 Convergence of Recursive Algorithms
231
Here C(x) = 2x log x+2(1−x) log(1−x)+1, and x is the smallest integer larger than or equal to x (cf. R¨ osler (1991)). As a result we obtain the limiting equation Y
d
= τ Y + (1 − τ )Y + C(τ ).
(9.2.49)
In particular, it yields recursive formulas for the moments of Y . Using as an accompanying sequence Yn := τ Y + (1 − τ )Y + Cn (In ), R¨ osler (1991) established the convergence of Yn to Y for the p -metrics. From Proposition 9.1.3 there exists a unique solution Y (in distribution) of the fixed-point equation (9.2.49). Theorem 9.2.19 Let Y denote the solution of (9.2.49). Then
p (Yn , Y ) → 0.
(9.2.50)
0.6
The simulation result described on Figure 9.12 shows that the density of Y is very well approximated by a lognormal distribution (cf. Cramer (1996)). The maximal deviation of the fitted lognormal density and the smoothed empirical density is about 0.004.
0.0
0.2
0.4
FIGURE 9.12. Smoothed empirical density of quicksort. Simultaneously, a lognormal approximation is given
-2
-1
9.2.5
0
1
2
3
Limiting Behavior of Random Maxima
A sample of size n is divided into two parts of random size In and n − In , where In is a random variable. We consider the recursion of “maxima type” d
Ln = cn + LIn ∨ Ln−In ,
(9.2.51)
where Ln , Ln are independent and identically distributed r.v.s, (In ) are independent, and (cn ) is a sequence of real numbers. Given α > 0, let us introduce the normalizations Yn = n−1/α Ln ,
Y n = n−1/α Ln .
(9.2.52)
232
9. Mass Transportation Problems and Recursive Stochastic Equations
By (9.2.51), −1/α
d
Yn = c n n
+
In n
1/α
Y In ∨
n − In n
1/α Y n−In .
(9.2.53)
Suppose that n−1/α is the right normalization to obtain the weak converD D gence results, Yn −→ Z, Y n −→ Z, and moreover, let Inn → τ , where τ a random variable independent of Z, Z. Then, in the limit, we obtain the fixed-point equation Z = τ 1/α Z ∨ (1 − τ )1/α Z. d
(9.2.54)
It is easy to check that, for example, the extreme value distribution FZ (x) =
e−x 0,
−α
, x > 0, x≤0
(9.2.55)
satisfies (9.2.53). As a motivation for the study of equation (9.3.46), consider Ln = max{X1 , . . . , Xn }, cn = 0, and assume that (Xi ) are i.i.d. r.v.s of Paretian type (F (x) ∼ 1 − x−α for x → ∞). Then by Gnedenko’s extreme-value D theorem, Yn = n−1/α Ln −→ Z, with FZ as in (9.3.50). Note also that formula (9.3.46) concerns some modifications of this recursion, where the maxima are produced by a (random) scheme determined by In (for examd
ple In = B(n, p) ) and cn corresponding to some weighting of the number of steps in this reduction (for example cn = 1). Furthermore, note that (9.2.51) with cn = 1 also describes the maximum search length of a search algorithm dividing a slot of size n succesively into two parts of size In and n − In , respectively. Define next ak := r (Yk , Z) for 0 < α < r ≤ 1 or ak := r (Yk , Z), r the weighted Kolmogorov metric (cf. (9.1.16)) for 1 ≤ α < r < α + 1, and consider the following assumptions: lim ak
<
∞,
−1/α
→
0,
In n
→
τ
cn n
(9.2.56) a.s. with Eτ r/α + (1 − τ )r/α < 1.
The first assumption corresponds to the condition that n−1/α is the right normalization for Ln (as for example in the case Ln = max(X1 , . . . , Xn )). d
If In = B(n, p), then
In n
→ p, and for α < r we have pr/α + (1 − p)r/α < 1.
9.2 Convergence of Recursive Algorithms
233
Theorem 9.2.20 Let (Ln ) satisfy the recursion (9.2.51). Define ak :=
r (Yk , Z) if 0 < α < r ≤ 1, or ak := r (Yk , Z) if 0 ≤ α < r < α + 1, and let FZ be as in (9.2.55). Then assumption (9.2.56) implies lim ak = 0. k
Proof: We consider first the case 0 < α < r ≤ 1 and ak = r (Yk , Z). Let (Zi ) be an i.i.d. sequence with common extreme-value distribution (9.2.55), and define Zn∗
= n
−1/α
cn +
In n
1/α
ZIn ∨
n − In n
1/α Z n−In .
(9.2.57)
Then
r (Yn , Zn∗ )
1
1/α 1/α n − In In Y In ∨ Y n−In , (9.2.58) n n 2 1/α 1/α In n − In Z In ∨ Z n−In n n ) 1 1/α n 1/α n−k k ≤ P (In = k) r Yk ∨ Y n−k , n n k=1 2* 1/α 1/α k n−k Zk ∨ Z n−k n n ) * r/α n r/α k n−k ≤ P (In = k) ak + an−k n n k=1 r/α r/α In n − In = E aIn + E an−In . n n
= r
The arguments in deriving the above bounds rely on the “ideality” of r with respect to the maxima scheme.(1) Define bn := r (Zn∗ , Zn ) and let us use the bound
r (X, Y ) ≤ ( 1 (X, Y ))r
(9.2.59)
(1) Recall that a metric µ(X, Y ) = µ(F , F ) in the space of distribution functions is X Y called ideal with respect to the maxima scheme (or max-ideal) of order r > 0 if for any c > 0 and independent X, Y , and Z,
µ(cX ∨ Z, cY ∨ Z) ≤ cr µ(X, Z); see Rachev (1991) and Rachev and R¨ uschendorf (1991).
234
9. Mass Transportation Problems and Recursive Stochastic Equations
for any r ≤ 1 that is a singleconsequence of the Monge–Kantorovich theorem; recall that 1 (X, Y ) = |FX (x) − FY (x)| dx. Claim 1. bn → 0. To show the claim, we apply (9.2.57) to obtain 1 2 1/α 1/α I n − I n n b1/r ≤ 1 n−1/α cn + ZIn ∨ Z n−In , Zn n n n 1 2 1/α 1/α n k n−k −1/α ≤ P (In = k) 1 n cn + Zk ∨ Z n−k , Zn n n k=1 2 1 1/α n n − k k + = P (In = k) 1 cn n−1/α + Z, Z . n n k=1
In the above bound we have used that the extreme value distributions Zn satisfy the max-stability property: 1/α 1/α 1/α n−k n−k k k d + Zk ∨ Z n−k = Z = Z. n n n n 1/r
Therefore, bn
≤ cn n−1/α → 0, proving the claim.
Applying the triangle inequality and (9.2.58), (9.2.59), we have an ≤ r (Yn , Zn∗ ) + r (Zn∗ , Zn ),
(9.2.60)
and therefore, a ≤ aE(τ r/α + (1 − τ )r/α ) + 0,
with a := lim an ,
(9.2.61)
implying that a = 0. Next, we shall make use of the weighted Kolmogorov metric r (cf. (9.1.16)). It is easy to check that for ε ≥ 1 and X, Y ≥ 0, r (X + a, Y + ε) ≤ (εa)r + εr r (X, Y ).
(9.2.62)
Define then ak = r (Yk , Z), bk = r (Zk∗ , Zk ). By (9.2.62) (with ε = 1), 1 1/α 1/α In n − In ∗ r −r/α + r Z In ∨ Z n−In , r (Yn , Zn ) ≤ cn n n n 2 1/α 1/α In n − In YIn ∨ Y n−In n n r/α r/α In n − In r −r/α ≤ cn n +E aIn + E an−In . n n
9.2 Convergence of Recursive Algorithms
235
This implies that lim r (Yn , Zn∗ ) ≤ E(τ r/α + (1 − τ )r/α )a for r > α.
(9.2.63)
If α < r ≤ α + 1, then r (Zn , Zn∗ ) ≤
n
P (In = k)r (cn n−1/α + Zn , Zn )
k=1
= r (cn n−1/α + Z, Z). We next prove that r (a + Z, Z) → 0 for a → 0. Let χ(x, a) := xr |FZ (x) − FZ+a (x)| = xr |e−x
−α
− e−(x−a)
−α
−α
Then sup0≤x≤a χ(x, a) ≤ ar , supa≤x≤2a χ(x, a) ≤ (2a)r e−a more, sup x |e r
−x−α
2a≤x≤1
−e
−(x−a)α
x | =
r
sup x α
2a≤x≤1
y −α−1 e−y
−α
|.
. Further-
dy
x−a
−α 1 e−x (x − a)α+1 r x αa ≤ sup e−1 α+1−r x − a (x − a) 2a≤x≤1
≤
r
sup x αa
2a≤x≤1
≤ 2r αar−α e−1 and x r
sup x α
1≤x<∞
y −α−1 e−y
x−a
−α
dy
≤
−α xα+1 αa e−x α+1 (x − y) 1≤x<∞
sup
≤ 2α+1 αa. Combining all this we obtain r (a + Z, Z) ≤ C ar−α for α < r ≤ α + 1. (Note that supx χ(x, u) = ∞ for r > α + 1.) Applying again the triangle inequality, we finally obtain a = lim r (Yn , Z) ≤ lim r (Yn , Zn∗ ) + lim bn ≤ E(τ r/α + (1 − τ )r/α )a, which indeed implies a = 0.
2
A similar study of logarithmic normalizations for max-search algorithms is provided by Cramer (1995a).
236
9. Mass Transportation Problems and Recursive Stochastic Equations
9.2.6
Random Recursion Arising in Probabilistic Modeling: Limit Laws
In this and the next section we study various random recursions arising in probabilistic modeling. In 9.2.6 we shall discuss the limiting behavior of these recursions and describe the limit distributions, and in the next section we estimate the rate of convergence to the corresponding limit.(2) Let {(Yn , Zn )}n≥1 be a sequence of i.i.d. random vectors in IR2 . Define the random recursion (Sn∗ ) by ∗ Zn + Yn Zn , Sn∗ = Sn−1
n = 1, 2, . . . ; S0∗ = 0.
(9.2.64)
The processes {Sn }n≥1 and {Sn∗ }n≥1 have appeared in a variety of situations. The random recursion (9.2.64) is often written in an equivalent form, ∗ An + Bn , Sn∗ = Sn−1
n = 1, 2, . . . ,
(9.2.65)
for a sequence of i.i.d. random vectors {(An , Bn )}n≥1 .(3) Alternatively to (9.2.64) we can introduce the process(4) Sn =
n i=1
(2) The
Yi
i 8
Zj ,
n = 1, 2, . . . .
(9.2.66)
j=1
results of Sections 9.2.6 and 9.2.7 are due to Rachev and Samorodnitsky (1995); see also the references therein. (3) Typically, the Markov chain (9.2.65) is supplied with an initial state S ∗ = B for a 0 0 random variable B0 independent of the sequence {(An , Bn )}n≥1 . In the ergodic case the ∗ is, of course, independent of the distribution of B . The recursion limit distribution of Sn 0 ∗ can be (9.2.65) arises in many applications, and as pointed out by Vervaat (1979), Sn regarded as the “wealth” at time n; An is the relative change of the “wealth” between times n and n − 1 due to a “quality” change in the “environment”: inflation, change of an exchange rate, erosion, spoilage, decay, etc. Bn represents the added (or removed) “wealth” just prior to time n. Applications of this model are abundant in the literature. Uppuluri et al. (1967) and Paulson and Uppuluri (1972) used this model to represent the evolution of a stock of a radioactive material. Chandrasekhar and Munch (1950) studied the fluctuations in brightness of the Milky Way. Cavalli-Sforza and Feldman (1973) and Cavalli-Sforza (1975) modeled evolution and cultural inheritance. Applications to investment models can be found in Lassner (1974a) and Perrakis and Henin (1974). A particular subclass of random recursions (9.2.65), the so-called ARCH (autoregressive conditional heteroskedastic) processes, has been used in mathematical finance to model data exhibiting clusters; see Domowitz and Hakkio (1985) and Hsieh (1988) for modeling exchange rate yields, and Engle et al. (1987) and Bollerslev (1987) for modeling stock returns. (4) The stochastic process (9.2.66) has been used by Todorovic and Gani (1987) and Todorovic (1987) to model the effect of environmental changes on crop production; see also Puri (1987).
9.2 Convergence of Recursive Algorithms
237
It is obvious that the two models (9.2.66) and (9.2.64) are closely related. The stochastic processes {Sn }n≥1 and {Sn∗ }n≥1 , although not equal, in general, in finite-dimensional distributions, have equal two-dimensional disd ∗ tributions: More precisely, (Sn , Sn+1 )=(Sn∗ , Sn+1 ) for each n = 1, 2, . . . . d
∗ ∗ , Sn+2 ).(5) However, we may have (Sn , Sn+1 , Sn+2 ) =(Sn∗ , Sn+1
A related pair of processes can be defined by replacing the sum with the maximum: Mn∗ Mn
= =
∗ max(Mn−1 Zn , Yn Zn ); n ? i=1
Yi
i 8
Zj ,
M0∗ = 0;
(9.2.67)
n = 1, 2, . . . .
j=1
These models can be regarded as describing the evolution of the highest up-to-date adjusted change in the “wealth” associated with the summation models (9.2.64)–(9.2.66).(6) Further, we prove limit theorems for the processes {Sn }n≥1 and {Mn }n≥1 stopped at random times. Thinking in terms of “wealth” and “environment” changes described above, suppose that in each time period a disastrous event may occur with probability p ∈ (0, 1). As a result of the disastrous event (bankruptcy, drought, etc., depending on the application) the whole wealth could be lost. The time of the disastrous event τ = τ (p) is assumed to be geometrically distributed, P (τ (p) = k) = (1 − p)pk−1 ,
k = 1, 2, . . . ,
(9.2.68)
and independent of the sequence {(Yn , Zn )}n≥1 . We will discuss the limiting behavior of the total “wealth” (until time τ ), Sτ , as p → 0. (5) Typically, of interest have been conditions for ergodicity of the Markov chain (9.2.65) and characterization of the limiting distribution. The key reference here is Vervaat (1979). An earlier work of Kesten (1973) studies the multidimensional version of ∗ ’s and B ’s are d-dimensional (random) vectors and A ’s are d × d (ran(9.2.65), i.e., Sn n n dom) matrices. This level of generality allows one, for example, to treat one-dimensional recursions of a higher order. It turns out, in particular, that under certain moment con∗ has Pareto-like tails. ditions on (An , Bn ) in (9.2.65), the stationary distribution of Sn This phenomenon has been studied further by Goldie (1991) in the case of more general recursions. (6) For applications of these and related models see Helland and Nilsen (1976), Hooghiemstra and Keane (1985), and Hooghiemstra and Scheffer (1986). An interesting time-reversibility relation between special cases of the models (9.2.64) and (9.3.68) has been noted by Chernick et al. (1988). Extrema of the processes arising in the random recursion (9.2.65) and, in particular, of ARCH processes are studied in de Haan et al. (1989). As general references on random recursions see Letac (1986) and the introduction of Kifer (1986). Brandt et al. (1990, Chapter 9) establishes a continuous dependence of the stationary distribution of the Markov chain (9.2.65) on certain parameters of the recursion.
238
9. Mass Transportation Problems and Recursive Stochastic Equations
The disastrous event could be caused by the cumulative effect of a large number of “bad” events with high success probability, p ≈ 1. This leads to a negative binomial model for the time of the disastrous event τ = τ1 + · · · + τr , where the τi ’s are i.i.d. geometric with mean 1/p. For p → 1, r → ∞, r(1 − p) → λ > 0 we have the Poisson approximation P (τ = n + r) ≈ e−λ λn /n!. This, in turn, leads to a Poisson model for the time of the disastrous event. Assuming that the Ni are i.i.d. Poiss(λ) r.v.s, Ti = N1 +· · ·+Ni is viewed as the time of the ith disastrous event. We shall study in the framework of the model (9.2.66) the distributions of the sums T1
ST1 =
i=0
Yi
i 8
Zj ,
(9.2.69)
j=0
and
Tk+1
STk+1 − STk =
Yi
i=Tk +1
i 8
Zj ,
k = 1, 2, . . . .
j=0
Here for the sake of convenience we start the sequence {(Yn , Zn )} at n = 0. Similarly to (9.2.69) we shall be interested in the laws of Mτ
=
τ ?
Yi
i=1
i 8
Zj
(9.2.70)
j=1
and =
MT1
T1 ?
Yi
i=0
i 8
Zj
j=0
and ?
Tk+1
Yi
i=Tk +1
i 8
Zj ,
k = 1, 2, . . . .
j=0
Note that if Sn (or Sn∗ ) converges in distribution, then the limiting (in distribution) random variable S satisfies the equation d
S = (S + Y )Z.
(9.2.71)
d
Here (Y, Z) = (Yn , Zn ), and the random quantities S and (Y, Z) in the right-hand side of (9.2.71) are independent. In many cases the solution of the equation (9.2.71) turns out to be an infinitely divisible random variable. Similarily, the distributional limit M of Mn satisfies d
M = (M ∨ Y )Z.
(9.2.72)
9.2 Convergence of Recursive Algorithms
239
It is interesting to note that the total “wealth” until the disastrous geometrical event Sτ also satisfies a distributional equation: d
Sτ = (Y + δSτ )Z.
(9.2.73)
d
Here, as before, (Y, Z) = (Yn , Zn ), δ is Bernoulli with P (δ = 1) = 1 − p); S, δ, and (Y, Z) in the right-hand side of (9.2.73) are independent.(7) Similarily to (9.2.73), d
Mτ = (Y ∨ δMτ )Z;
(9.2.74)
if Z ≡ 1, Mτ is said to be max-geometric infinitely divisible.(8) We start with results on the limiting behavior of the recursions defined above. In the next five theorems {(Yn , Zn )}n≥1 is, unless explicitely stated otherwise, a sequence of nonnegative i.i.d. random vectors such that P (Yn > 0) > 0 and P (Zn > 0) = 1. Set Sn and Mn as in (9.2.66) and (9.2.67), and let Xn = Yn
n 8
Zj .
j=1
Set ξn = log Zn , ν = Eξn (when they exist). Lemma 9.2.21 Let {(Yn , Zn )}n≥1 be a sequence of random vectors living on a common probability space such that {Yn }n≥1 is a sequence of nonnegative i.i.d. random variables with P (Yn > 0) > 0, and {Zn }n≥1 is a sequence of positive i.i.d. random variables. Suppose that E log(1 + Zn ) < ∞. (a) If ν > 0, then with probability 1, Xn does not converge to 0, and thus Sn → ∞. The same is true in the case ν = 0, provided that the sequence {Yn , Zn }n≥1 is a sequence of i.i.d. random vectors. Moreover, in both cases Mn → ∞ (unless P (Zn = 1) = 1). (b) If −∞ < ν < 0, the following are equivalent as n → ∞: (b-i) Xn → 0 a.s.. (b-ii) Sn converges to a finite limit S a.s..
(7) See Rachev and Todorovich (1990) for some examples of distributions of S ; if τ Z ≡ 1, Sτ is said to be geometrically infinitely divisible; see Klebanov, Maniya, and Melamed (1984). (8) See Rachev and Resnick (1991).
240
9. Mass Transportation Problems and Recursive Stochastic Equations
(b-iii) Mn converges to a finite limit a.s. (b-iv) 0 < E log(1 + Yn ) < ∞. Moreover, (b-iv) implies (b-i)–(b-iii) even if ν = −∞. The proofs of this and the further assertions in this section can be found in Rachev and Samorodnitsky (1995). Remark 9.2.22 Given a sequence of nonnegative i.i.d. random vectors
(Yn , Zn ) = Yn(1) , . . . , Yn(d) , Zn(1) , . . . , Zn(d) ∈ IR2d , (1)
(d)
we consider the vector of “wealths” Sn = (Sn , . . . , Sn ) given by Sn(k) =
n
(k)
Yi
i=1
i 8
(k)
Zj ,
n = 1, 2, . . . .
(9.2.75)
j=1
Then Lemma 9.2.21 applied componentwise yields convergence. Our next theorem is the CLT for the “total wealth” Sn in (9.2.66). We assume that ξn = log Zn belongs to the domain of attraction of an α-stable r.v. ηα (1 < α ≤ 2); i.e., there exist an > 0 and bn ∈ IR such that an
n
an = n−1/α L(n),
D
ξi + bn =⇒ ηα ;
(9.2.76)
i=1
where L(n) is a slowly varying function. Theorem 9.2.23 Suppose that E log(1 + Zn ) < ∞. D
(a) If ν > 0 and E log(1 + Yn ) < ∞, then an log Sn + bn =⇒ ηα . D
(b) If ν < 0 and E log(1 + Yn ) < ∞, then an log(S − Sn ) + bn =⇒ ηα . (c) Let ν = 0, and assume (without loss of generality) that bn ≡ 0. Suppose also that the sequences {Yn }n≥1 and {Zn }n≥1 are independent and that P (log Y1 > 1/an ) = o(n−1 ),
n → ∞.
(9.2.77)
Then, as n → ∞, D
an log Sn =⇒
sup L(t),
(9.2.78)
0≤t≤1
d
where L is a Levy stable motion on [0, 1] with L(1) = ηα .
9.2 Convergence of Recursive Algorithms
241
Remark 9.2.24 The results of Theorem 9.2.23 can be extended both to the multivariate setting and to the form of a functional CLT. We give just one example of such an extension, which is obtainable using Theorem 1 of Resnick and Greenwood (1979).
(1) (2) = In the notation of Remark 9.2.22 let d = 2 and set ξn = ξn , ξn
(1) (2) 2 2 log Zn , log Zn . Assume that there exists an ∈ IR+ and bn ∈ IR such that
D (1) (2) (2) ξ , a ξ + bn =⇒ ηα . a(1) n n n n (i)
Here α = (α1 , α2 ), 1 < αi ≤ 2, i = 1, 2 and an = n−1/αi Li (n), i = 1, 2, (1) where the Li ’s are slowly varying functions. If E log(1 + Zn ) < ∞ and (i) ν (i) := E log Zn > 0 for i = 1, 2, then % & D (1) (2) (2) a(1) log S , a log S =⇒ {L(t)}t≥0 ; + b n n [nt] n [nt] t≥0
the weak convergence is in the space D [0, ∞), IR2 . {L(t) = (1) d L (t), L(2) (t) , t ≥ 0} is a L´evy process with L(1) = ηα and such that % & d t1/α1 L(1) (1), t1/α2 L(2) (1) + β(t), t ≥ 0 {L(t), t ≥ 0} = for some β(t) ∈ IR2 prescribed by the marginal convergence. Moreover, (i) if (α1 , α2 ) = (2, 2), then L is an IR2 -valued Wiener process; (ii) if (α1 , α2 ) = (α, 2), 1 < α < 2, then {L(t) = (L(1) (t), L(2) (t)), t ≥ 0}, where L(1) is an α-stable process and L(2) is a Wiener process independent of L(1) ; (iii) if 1 < αi < 2, i = 1, 2, then L has L´evy measure defined by ◦T = , where
T X = (sign x1 )|x1 |1/α1 , (sign x2 )|x2 |1/α2 . The measure is determined by ∈ IR2 ; |x| > r, θ(x) ∈ H} = r−1 S(H)
{x for r > 0; H is a Borel subset of [0, 2π], where |x| and θ(x) are the polar coordinates of x ∈ IR2 , and S is a finite Borel measure on [0, 2π].(9) (9) A more detailed analysis of L can be obtained using further the results of Resnick and Greenwood (1979) and de Haan et al. (1984). An even more general case where an
242
9. Mass Transportation Problems and Recursive Stochastic Equations
Propositions (b) and (c) of Theorem 9.2.23 can be extended in a similar fashion. As far as the maximal “wealth change” (9.2.65) for n years is concerned, we have the following analogue of Theorem 9.2.23.(10) Theorem 9.2.25 Under the assumptions of Theorem 9.2.23 the following hold: D
(a) If ν > 0, then an log Mn + bn =⇒ ηα . D
(b) If ν < 0, then an log(∨j>n Xj ) + bn =⇒ ηα . D
(c) If ν = 0 and (9.2.78) hold, then an log Mn =⇒ sup0≤t≤1 L(t). Next, we examine the geometric random sum Sτ as defined above. We say that ξn = log Zn belongs to the domain of attraction of a geometric α-stable r.v. Gα if there exist functions a = a(p) > 0 and b = b(p) on [0, 1] such that a
τ
D
(ξi + b) =⇒ Gα
as p → 0.
(9.2.79)
i=1
Here a(p) = p1/α L(1/p), where L is slowly varying function.(11) Theorem 9.2.26 Suppose that E log(1 + Zn ) < ∞ and (9.2.79) holds. D
(a) If ν > 0 and E log(1+Yn ) < ∞, then a(log Sτ +τ b) =⇒ Gα as p → 0. (b) If ν < 0 and E log(1 + Yn ) < ∞, then a(log as p → 0.
j≥τ +1
D
Xj + τ b) =⇒ Gα
(c) Let ν = 0 and b ≡ 0. Assume also that the sequences {Yn }n≥1 and {Zn }n≥1 are independent and P (log Y1 > n1/α L(n)−1 ) = o(n−1 ),
n → ∞.
is a (2 × 2) matrix can be treated using the theory of operator stable random vectors; see Meerschaert (1991). (10) Extensions similar to the ones discussed in Remark 9.2.24 are possible here as well; we may use the multivariate extreme value theory as in de Haan and Resnick (1977). (11) The ch.f. of G admits the representation f α Gα (t) = 1/(1 − log φα (t)), where φα is the ch.f. of an α-stable r.v. (Klebanov, Maniya, and Melamed (1984)). Similarly, fξn = 1/(1 − log ψ), where ψ is the ch.f. of a distribution in the domain of attraction of an α-stable r.v. with ch.f. φα (Mittnik and Rachev (1991)). Examples of geometric α-stable distributions are the exponential law (α = 1) and the Laplace law (α = 2).
9.2 Convergence of Recursive Algorithms
243
Then D
a log Sτ =⇒
sup G(t)
0≤t≤1
as p → 0.
(9.2.80)
Here G is a “geometric L´evy stable motion”; i.e., the weak limit in D[0, 1] [τ t] of Gp (t) = a j=1 ξj , 0 ≤ t ≤ 1. Remark 9.2.27 Regarding the existence of the process G as the weak limit of Gp , one can check the following: (a) The finite-dimensional distributions Gp (t1 ), . . . , Gp (td ) (0 ≤ t1 < · · · < td ≤ 1) converge to “geometric strictly stable distributions” G(t1 ), . . . , G(td ) with ch.f. g(θ) of the form 1/(1 − log ψ(θ)), where ψ(θ) is the ch.f. of a strictly α-stable random vector on IRd . (b) The set of laws of Gp (·) (0 < p < 1) is tight. Remark 9.2.28 Under the assumptions listed in Remark 9.2.24 we also have & % w (1) (2) (2) =⇒ {L(νt)}t≥0 , a(1) n S[τ t] , an S[τ t] + τn tbn t≥0
where τn = τ (1/n) and ν is an exponential random variable with mean 1 independent of the bivariate L´evy process L. An important observation is that the above limit relation remains true if we choose any sequence of D positive integer-valued random variables τn such that τn /n ⇒ τ , where τ is a positive random variable. Choosing therefore different laws for τ , we arrive at different models for the total “asset value process.” We list below some of these models, assuming that L is a zero mean bivariate Wiener process; that is, we are in case (i) discussed in Remark 9.2.24. (a) If τn is a mixture (by the values of a fixed random variable U ) of Poisson random variables with mean nU , then we may take τ = U , and L(τ ·) is a mixture of Wiener processes; see Boness et al. (1979). √ (b) If τ = 1/ Xm , where Xm is a chi-square random variable with m degrees of freedom, then the one-dimensional marginals of L(τ ·) are Student’s t distributed; this model was used in Blattberg and Genodes (1974) to model stock prices. (c) If τ is positive strictly stable with index α/2, 0 < α < 2, then L(τ ·) is an α-stable motion. This subordinated process was used in Mandelbrot and Taylor (1967) to explain the nonnormality of stock price changes.
244
9. Mass Transportation Problems and Recursive Stochastic Equations
(d) If τ is a lognormal random variable, then L(τ ·) is the Clark (1973) alternative to the Mandelbrot and Taylor (1967) subordinated process. Note that in contrast to (c), here L(τ ·) has finite variances. Similarly to Theorems 9.2.25 and 9.2.26, we obtain the following limit theorem for the distribution of the maximal “wealth change.” Theorem 9.2.29 Under the assumptions of Theorem 9.2.26 the following holds: D
(a) If ν > 0, then a(log Mτ + τ b) =⇒ Gα as p → 0. D
(b) If ν < 0, then a(log ∨j≥τ +1 Xj + τ b) =⇒ Gα as p → 0. (c) Suppose that the conditions of Theorem 9.2.26(c) hold. Then as p → 0, D
a log Mτ =⇒
sup G(t).
0≤t≤1
Finally, let us consider the total “wealth” until a Poisson (λ) random moment T = T (λ). Let the sequence {Yn , Zn }n≥0 be as before and independent of T . Suppose also that the ch.f. fξn of ξn = log Zn satisfies lim |u|−α (1 − fξn −a (u)) = µ
u→0
(9.2.81)
for some µ > 0, real a, and 1 < α ≤ 2. Note that a = Eξn , and at least when α = 2, (9.2.81) is equivalent to assuming that the ξn ’s are in the domain of normal attraction of an α-stable distribution (Feller (1971, p. 596)). Theorem 9.2.30 Suppose that E log(1 + Zn ) < ∞ and (9.2.81) holds. Let ST =
T i=0
Xi =
T i=0
Yi
i 8
Zj .
j=0
(a) If ν = Eξn > 0 and E log(1 + Yn ) < ∞, then as λ → ∞, λ−1/α (log ST − aT ) =⇒ Y(α) , D
where Y(α) is a symmetric stable r.v. with ch.f. exp(−µ|θ|α ).
9.2 Convergence of Recursive Algorithms
245
(b) If ν < 0 and E log(1 + Yn ) < ∞, then as λ → ∞, ⎛ λ−1/α ⎝log ⎛ λ−1/α ⎝log
⎞
∞
Xj − aT ⎠
j=T +1 Tk
D
=⇒ Y(α) ,
⎞
Xj − aT1 ⎠
D
=⇒ Y(α) ,
j=T1 +1
where T1 , Tk are as in (9.2.69). (c) Let ν = 0 and suppose that the sequences {Yn }n≥0 and {Zn }n≥0 are independent and P (log Yn > u) = o(u−α ) as u → ∞. Then as λ → ∞, λ−1/α log ST =⇒ D
sup L(t),
0≤t≤1
d
where L(·) is a L´evy stable motion on [0, 1] with L(1) = Y(α) . Analogous theorems can be established for the limit distributions of MT1 Tk+1 and ∨i=T Xi . k +1 The remaining results in this section deal with characterizations of the limit laws of Sn (cf. (9.2.66) and (9.2.64)) and Mn (cf. (9.2.67)), which can arise for any given distribution of Zn ’s in a given parametric family of distributions. We will assume that the sequences {Yn }n≥1 and {Zn }n≥1 are independent. Also, we will concentrate our attention on the distributions of Zn supported by (0.1).(12) Invoking Lemma 9.2.21(b), we conclude that Sn (resp. Mn ) converges to a finite limit S (resp. M ) if (and only if, in the case E log Zn > −∞) 0 ≤ E log(1 + Yn ) < ∞.
(9.2.82)
Given (9.2.82), the limits S and M satisfy the equations (9.2.71) and (9.2.72).(13) We start with a characterization of the class S1 (resp. M1 ) of laws L(S) of S (resp. L(M )) such that for any L(Z) ∈ Z1 there exist Y = Y (Z) (12) The case Z ∈ (0, 1) a.s. corresponds to “deteriorating environment,” the case n being close to the soil erosion model of Todorovic and Gani (1987). (13) Moreover, the converse is also true. Namely, if S, (Y, Z) (M, (Y, Z) respectively) is a solution of (9.2.71) (or (9.2.72) respectively), then the distribution of S(M ) is equal d
to the limiting distribution in the model (9.2.66) ((9.2.67) respectively) with (Yn , Zn ) = (Y, Z). This is a simple consequence of the uniquenes principle (the so-called Letac principle); see Letac (1986) or Goldie (1991).
246
9. Mass Transportation Problems and Recursive Stochastic Equations
such that (9.2.71) (resp. (9.2.72)) holds. The class(14) Z1 of Z-laws L(Z) consists of distributions on (0, 1) with densities fα (u) = (1 + α)z α ,
0 < z < 1, α ≥ 0.
(9.2.83)
In the sequel, for any 0 < β < 1 and an r.v. Y , define ⎧ ⎨ 0 with probability 1 − β, Yβ := ⎩ Y with probability β.
(9.2.84)
A complete description of the class Z1 is given in the following theorem. d
Theorem 9.2.31 The class S1 of the laws L(S) solving S = (S + Y )Z consists of all nonnegative infinitely divisible r.v.s S with Laplace transform ⎧ ∞ ⎫ ⎨ 1 ⎬ (1 − e−θx )MS ( dx) . φS (θ) = exp − ⎩ ⎭ x 0
Moreover, the L´evy measure MS is of the following form: MS ! Leb
and
MS ( dx) = H(x) dx,
where H(0) ∈ [0, 1], H is nonincreasing on [0, ∞) and vanishing at ∞. The corresponding Y has 1 − H as its distribution function. Remark 9.2.32 Suppose that S is a solution of (9.2.71) for a given Y with 0 ≤ E log(1 + Y ) < ∞ and Z uniform. Then then S is also a solution of (9.2.71) with Z having density (9.2.83) and Y replaced by Y1/(1+α) . Note that Z1 is a subclass of the class of self-decomposable random variables; see Vervaat (1979). Also, allowing α in (9.2.83) to take values in the whole range (−1, ∞) would have made the class Z1 degenerate (consisting of Z = 0 a.s.). Remark 9.2.32, in particular, has no counterpart for α’s in the range (−1, 0). Our next task is the characterization of the class M1 of laws L(M ) such that for every L(Z) ∈ Z1 there exists Y = Y (Z) such that (9.2.72) holds. Theorem 9.2.33 The class M1 consists of all absolutely continuous laws L(M ) with density fM and d.f. FM satisfying the following conditions: (14) The class Z was considered by Vervaat (1979) (who discussed a wider family, 1 allowing α > −1 in (9.2.83)) and Todorovich and Gani (1987). Some particular examples of laws L(S) ∈ S1 , L(M ) ∈ M1 , were studied by Todorovich and Gani (1987), Todorovich (1987), and Rachev and Todorovich (1990).
9.2 Convergence of Recursive Algorithms
247
(i) fM (x) is nonincreasing on (0, ∞). (ii) x fM (x)/FM (x) is nonincreasing on (0, ∞). Suppose that L(M ) ∈ M1 and let Z (α) have density fZ α (z) = (1 + d
α)z α , 0 < z < 1. Then M = (M ∨ Y )Z (α) is equivalent to F Y (x) =
1 x fM (x) , 1 + α FM (x)
x > 0.
(9.2.85)
By (9.2.85), for any L(M ) ∈ M1 and 0 < α < 1, M = (M ∨ Yα )Z (α) ⇐⇒ M = (M ∨ Y0 )Z (0) , d
d
where Yα is determined by (9.2.84). The last relation is parallel to the corresponding relation in the scheme of summation (cf. Remark 9.2.32). Note also that gamma Γ(p, λ)-distributions with 0 < p ≤ 1 belong to M1 , while those with p > 1 do not. Next, we consider the class S2 (resp. M2 )(15) of laws L(S) (resp. L(M )) such that for every L(Z) ∈ Z2 ≡ {δz , 0 < z < 1} there is a Y = Y (Z) such that (9.2.71) (resp. (9.2.72)) holds. Theorem 9.2.34 The class S2 coincides with the family of all nonnegative infinitely divisible r.v.s with Laplace tranform of the form ⎧ ⎫ ∞ ⎨ ⎬ 1 (1 − e−tx )MS ( dx) , φS (t) = exp −at − ⎩ ⎭ x 0
where a ≥ 0 and the L´evy measure MS ! Leb is absolutely continuous, whose Radon-Nikodym derivative is nonincreasing a.s. For any S ∈ S2 and z ∈ (0, 1), the corresponding Y in the equation d
S = (S + Y )z is a nonnegative infinitely divisible r.v. with Laplace transform ⎧ ⎫ ⎨ at(1 − z) ∞ 1 ⎬ − (1 − e−tx )MY ( dx) . φY (t) = exp − ⎩ ⎭ z x 0
(15) It turns out that the class S coincides with the class L of Khinchine (cf. Feller 2 (1971, Sect. 8, Chapter XVII) of nonnegative r.v.s. We shall state here a more explicit description of S2 than that in Feller (1971, Theorem XVII.8). Moreover, S1 ⊂ S2 ; see also Vervaat (1979, Remark 4.9). The class of M2 coincides with the class of the laws of max self-decomposable r.v.s (see Balkema et al. (1990) and the references there). The next theorem, similar to the Mejzler (1956) result, is based on a characterization of the weak limits of the normalized maxima an {max(X1 , X2 , . . . , Xn ) − bn } when the Xi ’s are independent and nonidentically distributed.
248
9. Mass Transportation Problems and Recursive Stochastic Equations
Moreover, MY ! Leb, and dMS dMS dMY (x) = (zx) − (x), dλ dλ dλ where λ = Leb. Theorem 9.2.35 The class M2 consists of the laws of positive absolutely continuous r.v.s M such that xfM (x)/FM (x) is a nonincreasing function on (0, ∞). Also, M1 ⊂ M2 .
9.2.7
Random Recursion Arising in Probabilistic Modeling: Rate of Convergence
Throughout this section, (B, || · ||) is the separable Banach space C(T ) of continuous mappings x : T → IR, where T is compact and || · || is the usual supremum norm in C(T ). For any x, y ∈ B we set (x · y)(t) = x(t) · y(t), (x ∨ y)(t) = x(t) ∨ y(t), t ∈ T . Given a nonatomic probability space, let X (B) be the space of all random fields (r.f.s) X of B-valued random variables, and let L(B) be the space of all laws PX . Suppose {(Yn , Zn )}n≥1 is a sequence of i.i.d. pairs of r.f.s, and define Sn =
n
Xi ,
i=1
Xi = Yi
i 8
Zj .
(9.2.86)
j=1
The r.f. Sn can be interpreted as the “wealth” accumulated in different commodities {At , t ∈ T } for a period of n years. We take T = T (U ), where U is a compact metric space, U = (U, ), and T (U ) is the set of all closed subsets (think, for example, of crop-producing areas) t of U endowed with the Hausdorff metric h(t1 , t2 ) = inf{ε > 0; t1 ⊂ tε2 , t2 ⊂ tε1 }. Here tε stands for the ε-neighborhood of t (cf. Hausdorff (1957, Sect. 29)).(16) Similarly, we define the maximal “wealth changes” Mn =
n ?
Xi .
(9.2.87)
i=1
Next, we are interested in conditions providing an exponential rate of convergence of Sn and Mn to finite limits S and M , respectively. The rate (16) Then (T, h) is a compact metric space (cf. Hausdorff (1957, Sect. 29); see also Kuratowski (1966, §21), Kuratowski (1969, §31), and Matheron (1975)).
9.2 Convergence of Recursive Algorithms
249
of convergence of the laws PXn to PX will be expressed, as usual in the Banach space setting, in terms of the Prohorov metric π(X, Y ) := π(PX , PY ) :=
(9.2.88)
inf{ε > 0; P (X ∈ A) ≤ P (Y ∈ A ) + ε ε
for all Borel subsets A in B}, where Aε is the open ε-neighborhood of A. Further, we shall use also the following metrics and functions in X (B) and L(B):(17) (i) χp -metric in X (B): 1 " # 1+p p χp (X, Y ) := sup t P (||X − Y || > t) ,
p > 0;
t>0
(ii) χp -minimal metric in L(B): χ 4p (X, Y ) = χ 4p (PX , PY ) d d Y ); X, Y ∈ X (B), X = X, Y = Y }, p > 0; := inf{χp (X,
ωp,N (X)p+1
(iii)
:=
sup tp P (||X|| > t), t>N
ωp (X)
:= ωp,0 (X),
Np (X)
:= {E||X||p }1/(1+p) ,
p > 0.
Note that χ 4p is a metric in L(B), χ 4p ≥ π, and the following convergence criterion holds: if ωp,N (Xn ) + ωp,N (X) → 0
as N → ∞ for any n = 1, 2, . . . ,
(9.2.89)
then D
χ 4p (Xn , X) → 0 as n → ∞ if and only if Xn →X as n → ∞ and lim lim sup ωp,N (Xn ) = 0.
N →∞ n→∞ (17) (cf.
Pisier and Zinn (1977), de Haan and Rachev (1989), and Rachev (1991c).
250
9. Mass Transportation Problems and Recursive Stochastic Equations
Similarly, if (9.2.21) holds, then P
χp (Xn , X) → 0 as n → ∞ if and only if Xn →X as n → ∞ and lim lim sup ωp,N (Xn ) = 0.
N →∞ n→∞
Theorem 9.2.36 (a) If for some p > 0, Np (Z1 ) < 1 and ωp (Y1 Z 1 ) < ∞, ∞ then Sn converges in probability to a P -a.e. finite limit S = i=1 Xi . Moreover, χp (Sn , S) ≤ φn (Z1 )ωp (Y1 Z1 ),
(9.2.90)
where φn (Z1 ) := Np (Z1 )n /(1 − Np (Z1 )). (b) Under the above conditions, suppose additionally that {(Yi∗ , Zi )}i≥1 is a sequence of i.i.d. pairs of r.f.s satisfying the “tail” condition ωp (Y1∗ Z1 ) < ∞. Let S ∗ be the limit of Sn∗ , i.e., Sn∗ :=
n i=1
Yi∗
i 8
Zj → S ∗ . P
j=1
i ’s are i.i.d. copies of the Z1 independent of the Suppose now that the Z 4p (Y1 Z1 , Y1∗ Z1 ) < ∞. Then as n → ∞,(18) (Yi , Zi )’s and let χ 1 · · · Z n · S) ≤ φn (Z1 )4 π(S ∗ − Sn∗ , Z χp (Y1 Z1 , Y1∗ Z1 ) → 0.
(9.2.91)
Proof: a) First note that χp ≥ κ, where κ is the distance in probability (the Ky Fan metric): κ(X, Y ) = inf{u > 0; P (||X − Y || > u) < u}. P
Indeed, to prove Sn → S it is enough to show that Sn is χp -fundamental. Actually, for k = 1, 2, . . . , χp (Sn+k , Sn ) 1 n+k 2 = ωp ≤ Xi
n+k
(9.2.92)
i=n+1
i=n+1
ωp (Xi )
1 ⎧ ⎛ ⎞p+1 ⎫ 1+p ⎪ ⎪ i−1 ⎨ ⎬ 8 E ω ⎝Yi zj Zi ⎠ ≤ ⎪ Z1 =z1 ,...,Zi−1 =zi−1 p ⎪ ⎭ i=n+1 ⎩ j=1
n+k
(18) The
CLT.
1 · · · Z n plays the same role as the normalizing scaling in the usual factor Z
9.2 Convergence of Recursive Algorithms
=
n+k
⎡
⎛
⎣EZ1 =z1 ,...,Zi−1 =zi−1 ⎝
i=n+1
≤
n+k
i−1 8
251
1 ⎞p ⎤ 1+p
||zj ||⎠ ⎦
ωp (Yi Zi )
j=1
Np (Z1 )i−1 ωp (Y1 Z1 )
i=n+1
= φn (Z1 )ωp (Y1 Z1 ) → 0 as n → ∞, which indeed implies that Sn is χp -fundamental. The bound (9.2.90) follows by the same arguments we used to show (9.2.92). (b) By definition, χ 4p is the minimal metric with respect to χp , and thus for any joint distribution of Y1 and Y1∗ , ⎛ ⎞ i i 8 8 1 · · · Z n · S) ≤ χp ⎝ Yi∗ Zj , Yi Zj ⎠ . χ 4p (S ∗ − Sn∗ , Z i>n
j=1
i>n
j=1
Now proceed as in (9.2.92) to obtain that the right-hand side is not greater than φn (Z1 )4 χp (Y1 Z1 , Y1∗ Z1 ). We next take the infimum in the last inequality over all joint distributions of (Y1 , Y1∗ ) with fixed marginals, and use the inequality χ 4p ≥ π to complete the proof of (9.2.92). 2 Theorem 9.2.37 Suppose the Yi ’s and Zi ’s are nonnegative r.v.s. Then P under the assumptions of Theorem 9.2.36a), Mn → M , and moreover, χp (Mn , M ) ≤ φn (Z1 )ωp (Y1 Z1 ) → 0 as n → ∞. If also χ 4p (Y1 , Y1∗ ) < ∞, then under the assumptions of Theorem 9.2.36(b), ⎞ ⎛ i ∞ ? 8 1 · · · Z n · M ⎠ ≤ φn (Z1 )4 π⎝ Yi∗ Zj , Z χp (Y1 Z1 , Y1∗ Z1 ) → 0. i=n+1
j=1
Proof: (a) For any k = 1, 2, . . ., 1 n+k 2 χp (Mn+k , Mn ) ≤ ωp ≤ Xi i=n+1 p
n+k
ωp (Xi ),
i=n+1
and therefore, Mn → M follows by the same arguments as in the proof of Theorem 9.2.36, and the required bound for χp (Mn , M ) is obtained in the same way as we did in (9.2.92).
252
9. Mass Transportation Problems and Recursive Stochastic Equations
b) With Xi∗ = Yi∗ 1 χ 4p
?
Bi j=1
Zj we have 2
Xi∗ ,
1 · · · Z n · M Z
1 ≤ χp
i>n
≤
Xi∗ ,
i>n
1 ≤
?
p
sup t P
2 Xi
i>n
1
t>0
?
||Xi∗
221/1+p − Xi || > t
i>n
ωp (Xi∗ − Xi ).
i>n
The last inequality follows from the triangle inequality for χp in the space of real-valued random variables. Conditioning as in the proof of Theorem 9.2.36(a), we obtain
ωp (Xi∗ − Xi ) ≤ φn (Z1 )χp (X1∗ , X1 ).
i>n
Passing to the minimal metrics and using again π ≤ χ 4p , we obtain the necessary bound. 2 Suppose N is an integer valued r.v. independent of the Yi ’s and Zi ’s. Then under the conditions of Theorem 9.2.36, π(SN , S) ≤ ψN (Z1 )ωp (Z1 ), where ψN (Z1 ) = (ENp (Z1 )N )/(1 − Np (Z1 )), and moreover, ∗ 1 · · · Z N · S) ≤ ψN (Z1 )4 π(S ∗ − SN , Z χp (Y1 , Y1∗ ).
Similar results on the limiting behavior of of the maximum be obtained as a consequence of Theorem 9.2.36.
=N i=1
Xi can
Remark 9.2.38 Vervaat (1979) showed the following limiting result for Sn∗ (see (9.2.65)): let un ↑ ∞ as n → ∞ be a sequence of reals, and assume, in addition, that (i) E log+ |B1 | < ∞, E| log |A1 ||2+η < ∞ for some η > 0, µ := E log |A1 | < 0; d
(ii) the solution S ∗ of the equation S ∗ = A1 S ∗ + B1 , S ∗ and (A1 , B1 ) independent, has a density f that is ultimately nonincreasing and such that f (t) = O(t−1 ) as t → ∞;
9.2 Convergence of Recursive Algorithms
253
(iii) there are positive reals b and ε < |µ| and a positive nonincreasing integrable function φ on [1, ∞) such that the function T ← (T+ (x)φ(y))x−1 e(µ+ε)y (where T+ (x) = P (|S ∗ | > x), T (x) = P (S ∗ > x), and T ← is its generalized inverse) is bounded on the set {(x, y); x ≥ b, y ≥ 1}. Then ∞
P (S ∗ > un )
n=1
< ∞, = ∞,
implies P (Sn∗
> un i.o.) =
0, 1.
Following our Theorem 9.2.36, let us compare the tail behavior of the distribution of the Sn∗ ’s and their limit S ∗ . We consider again the Banach 4 p be the minimal metric space setting for An , Bn , Sn∗ , and S ∗ . Let p = L with respect to Lp (X, Y ), 0 ≤ p ≤ ∞, L0 (X, Y ) = P (X = Y ), Lp (X, Y ) = {EX − Y p }min(1,1/p) ,
0 < p < ∞,
and L∞ (X, Y ) = ess supX − Y . Then, as in Theorem 9.2.36(a), if for some p ∈ [0, ∞], Np∗ (A1 ) := Lp (Z, 0) < 1 and Np∗ (B1 ) < ∞, then as n → ∞,
p (Sn∗ , S ∗ ) ≤
Np (A1 )n Np (B1 ) → 0. 1 − Np (A1 )
In the case of real-valued Sn∗ and S ∗ , the last bound gives us conditions for exponential rate of convergence in the total variation metric and in the
p -Kantorovich metrics:
0 (Sn∗ , S ∗ )
=
1 (Sn∗ , S ∗ ) =
sup |P (Sn∗ ∈ A) − P (S ∗ ∈ A)| → 0;
A Borel ∞
|FSn∗ (x) − FS ∗ (x)| dx → 0; −∞
254
9. Mass Transportation Problems and Recursive Stochastic Equations
⎛
⎞ 1/p ∞
p (Sn∗ , S ∗ ) = ⎝ |FS←n∗ (x) − FS←∗ (x)|p dx⎠ → 0,
1 ≤ p < ∞;
−∞
and
∞ (Sn∗ , S ∗ ) =
sup |FS←n∗ (x) − FS←∗ (x)| → 0.
0≤x≤1
Here as usual, FSn∗ and FS ∗ are the corresponding distribution functions, and F ← stands for the generalized inverse of F .
9.3 Extensions of the Contraction Method A well-known problem in the theory of probability metrics is the extension of the method of ideal metrics to limit theorems for sums or maxima with “nonregular” normalizations of logarithmic type. Moreover, this problem is quite typical in a wide range of stochastic algorithms, since the logarithmictype normalization is not reflected in the regularity structure of probability metrics, while power normalizations na can be captured easily by ideal metrics of order a. The second difficulty arises when the contraction factors converge to one. In this section we study several examples that show solutions to this problem by the use of a modified version of the contraction method. In Sections 9.3.1 and 9.3.2 we consider the number of inversions for random permutations and the “MAX”-algorithm. In Sections 9.3.3 and 9.3.4 we study successful and unsuccessful searching in binary random trees. Each of these examples needs some special arguments in order to achieve approximation by a limit distribution; so in general, the contraction method cannot be considered an “automatic” method. The advantage of the contraction method is its generality, which allows us, for example, to consider recursions in very general spaces, as well as the fact that it often allows us to obtain quantitative approximations. The examples in this section are due to Cramer and R¨ uschendorf (1996a).
9.3.1
The Number of Inversions of a Random Permutation
Given a permutation σ = (a1 , . . . , an ), the pair (ai , aj ), i < j, is called an inversion if ai > aj . Denote by In the number of inversions in a random permutation of size n. Then the following recursion holds: d
In = In−1 + Xn ,
I1 = 0,
(9.3.1)
9.3 Extensions of the Contraction Method
255
where Xn ∼ U({0, . . . , n − 1}) is uniformly distributed on 0, . . . , n − 1 and the r.v.s In−1 , Xn are independent. This leads to explicit expressions for the moment generating function, the mean, and the variance: Gn (z) = Ez In =
E In =
n (n − 1) , 4
1 (1 − z 2 ) · · · (1 − z n ) · , n! (1 − z)n−1 Var In =
(n − 1) n (2 n + 5) 72
(9.3.2)
(9.3.3)
(cf. Hofri (1987, pp. 122–124)). For the normalized version In − E In I4n := √ Var In
(9.3.4)
we obtain the following Berry–Ess´een-type result. (Note that we assume that all the occurring random variables are defined on one and the same probability space.) Theorem 9.3.1 For n ≥ 7,
1 I4n , N (0, 1) ≤ C · n− 2 , with C = 2.75 ·
84 6·128
@
(9.3.5)
7 6.
n Proof: Without loss of generality, we assume that In = i=1 Xi , where the Xi are independent, Xi ∼ U({0, . . . , i − 1}). By the Berry–Ess´een theorem (cf. Bhattacharya and Ranga Rao (1976, Th. 12.4)),
I4n , N (0, 1) ≤ 2.75
Sn,3 , (Sn,2 )3/2
(9.3.6)
where Sn,m :=
n
E|Xk − E Xk |m .
(9.3.7)
k=1
We have, for k ≥ 2, E|Xk − E Xk |3 ≤
k3 , 32
Var Xk =
k2 − 1 . 12
3 n by some tedious calculations. This implies that k=1 Var Xk ≥ (n−1) and 36 n (n+1)4 3 k=1 E|Xk − E Xk | ≤ 128 . Thus, from (9.3.6), we obtain, for n ≥ 7,
256
9. Mass Transportation Problems and Recursive Stochastic Equations
4 7
1 1 n+1 n 63 n− 2 ≤ C n− 2 . I4n , N (0, 1) ≤ 2.75 128 n − 1 n−1 2 Recursion (9.3.1) leads to a sum of independent variables and therefore allows the application of the classical tools for the central limit theorem. On the other hand, it is an interesting “test” rate of convergence example for the contraction method, since the contraction factors of the normalized recursion converge to one. Furthermore, the approximation result (in terms of the ζ3 -metric) is of independent interest. It gives the same convergence rate as in Theorem 2.1 uniformly on the set of functions f (In ) with f (3) ∞ ≤ 1, when we study the limiting behavior of In − E I n In := . n3/2
(9.3.8)
Theorem 9.3.2 Let σn2 := Var(In ) and Zn ∼ N (0, σn2 ). Then for some C > 0, and for all n ∈ IN, 1 ζ3 (In , Zn ) ≤ C n− 2 .
(9.3.9)
Proof: First note that In satisfies the modified recursion 3/2 n−1 d n , In−1 + X In = n
(9.3.10)
Xn n := Xn −E where X . Let the sequence (Zn ) be independent, Zn ∼ n3/2 2 N (0, σn ), and define the accompanying sequence
Zn∗ :=
n−1 n
3/2
An . Zn−1 + X
Let Yi ∼ N (0, τi2 ) be independent 3 Xi 2 τi2 := σi2 − i−1 σi−1 = Var ≥ 0. Then i i3 d
Zi =
i−1 i
(9.3.11) of
i , Zi−1 ), (X
where
3/2 Zi−1 + Yi .
(9.3.12)
Using the homogeneity of order three of the ideal metric ζ3 , we obtain 1 2 3/2 3/2 n−1 n−1 In−1 + Xn , Zn−1 + Xn ζ3 (In , Zn ) ≤ ζ3 n n + ζ3 (Zn∗ , Zn ) 9/2 n−1 ζ3 (In−1 , Zn−1 ) + ζ3 (Zn∗ , Zn ). ≤ n
9.3 Extensions of the Contraction Method
257
By iteration, using Z1 = I1 = 0, we obtain the “ground estimate” ζ3 (In , Zn ) ≤
n 9/2 i ζ3 (Zi∗ , Zi ). n i=2
(9.3.13)
Note that E Zi = E Zi∗ = 0 and E Zi2 = E(Zi∗ )2 . Therefore, by making T (1+ 1 )
α use of the estimate ζr ≤ Γ(1+r) κr for r = m + 2, by (9.3.12) and some calculations (cf. Cramer (1995)) we have 1 2 3/2 3/2 i−1 i−1 ∗ A ζ3 (Zi , Zi ) = ζ3 Xi + Zi−1 , Yi + Zi−1 i i
Ai , Yi ≤ Γ(2) κ3 X i , Yi ≤ ζ3 X Γ(4) 0 = x2 |FX i (x) − FYi (x)| dx
−∞
≤
7 1 i−3/2 + 5 i−5/2 . 26 · 32 2
Therefore, by some additional calculations, n 9/2 i ζ3 (Zi∗ , Zi ) n i=2 n 9/2 i 1 −5/2 7 −3/2 ≤ i + 6 2i n 25 2 ·3 i=2 " # 1 1 3 1 4 7 1 2 3 ≤ n + n + 6 2 n + n 3 2 ·3 4 n9/2 25
3 1 7 · √ + O n− 2 . = 28 · 32 n
ζ3 (In , Zn ) ≤
2 32 only, Note that the contraction factor in this example is of order n−1 n and consequently we cannot obtain a uniform bound, implying that we need to estimate more precisely the individual terms. The exponential con√ vergence rate is reduced to the rate n.
9.3.2
The Number of Records
The “MAX”-algorithm determines the maximum element of a random sequence (cf. Hofri (1987, pp. 112–113)). Its complexity is essentially given
258
9. Mass Transportation Problems and Recursive Stochastic Equations
by the number of records in a random permutation. Let Mn denote the number of maxima of a random permutation read from left to right. Then Mn satisfies the recursion d
Mn = Mn−1 + Xn ,
(9.3.14)
1 where n has a Bernoulli distribution with success probability n , Xn ∼ 1X B 1, n , and Xn , Mn−1 are assumed independent. Define M1 = 0. Then d
Mn =
n
Xi ,
(9.3.15)
i=2
when the (Xi ) are independent. Furthermore, E Mn = Hn − 1, (k)
where Hn = (2)
Hn
n
1 j=1 j k ,
−→ ζ(2) =
n→∞
π2 6
Var Mn = Hn − Hn(2) ,
(9.3.16)
(1) Hn = Hn = ln n + γ + O n−1 , and (cf. Hofri (1987)).
Define next the normalized sum Cn := M√n − E Mn . M Var Mn
(9.3.17)
Then as in Section 9.3.1, we obtain the normal approximation, but with a “very slow” logarithmic rate of convergence. Theorem 9.3.3 For all n ∈ IN and some absolute constant C > 0, the following uniform rate of convergence holds:
Cn , N (0, 1) ≤ √C . M (9.3.18) ln n Proof: We invoke the Berry–Ess´een bound (9.3.6), where E Xk = k1 , k3 −3 k2 +4 k−2 3 . Var Xk = k−1 k2 , and E|Xk − E Xk | = k4 n n Therefore, k=2 E|Xk − EXk |3 ∼ ln n, and k=2 Var Xk ∼ ln n, leading to (9.3.18). The constant C can be easily explicitly calculated. 2 The normalization of Mn is logarithmic in n. To get a rate of convergence result similar to that in (9.3.18), we shall make use of the ζ3 -metric. @It turns out that in this example we obtain contraction factors of order ln(n−1) ln n that converge to one. Nevertheless, the method described in the proof of Theorem 9.3.2 can also be applied in this case. To this end, define An := Mn√− E Mn . M ln n
(9.3.19)
9.3 Extensions of the Contraction Method
259
An and Zn ∼ N (0, σn2 ), we have Theorem 9.3.4 For σn2 := Var M An , Zn ) = O √ 1 ζ3 (M . (9.3.20) ln n An satisfies the recursion Proof: Indeed, M 7 ln(n − 1) A d n , A Mn = Mn−1 + X ln n
(9.3.21)
n := Xn√−E Xn . Let (Zn ) be independent normally distributed r.v.s, where X ln n Zn ∼ N (0, σn2 ), and let 7 ln(n − 1) ∗ n Zn−1 + X Zn := (9.3.22) ln n be the accompanying sequence. Further, let Yn ∼ N (0, τn2 ), and τn2 := σn2 −
ln(n − 1) 2 n . σn−1 = Var X ln n
(9.3.23)
Then 7 d
Zn =
ln(n − 1) Zn−1 + Yn , ln n
(9.3.24)
and using the same arguments as in Section 9.2, we get An , Zn ) ≤ ζ3 (M An , Zn∗ ) + ζ3 (Zn∗ , Zn ) ζ3 (M 3/2 ln(n − 1) An−1 , Zn−1 ) + ζ3 (Yn , X n ). ζ3 (M ≤ ln n By iteration, this yields the bound An , Zn ) ≤ ζ3 (M
ln 2 ln n
3/2 3/2 n ln i A2 , Z2 ) + i ). ζ3 (M ζ3 (Yi , X ln n i=3
(9.3.25)
A2 , Z2 ) < ∞, and since By the moment estimate ζ3 (M 1 2 Var Xi = Var Yi = τi = ln i · i−1 i2 , we have i ) ≤ ζ3 (Yi , X ≤
1 i |3 E|Yi |3 + E|X 6 1 2 √ 1 8 1 1 1 · +√ · √ ; π i i (ln i)3/2 6 i
3 here we also used the estimate E Xi − 1i ≤ 1i .
(9.3.26)
260
9. Mass Transportation Problems and Recursive Stochastic Equations
From (9.3.25) we finally obtain An , Zn ) ≤ ζ3 (M
= ≤ =
9.3.3
3/2 ln 2 1 · 6 ln n 1 2 √ n ln i 3/2 1 8 1 1 1 +√ · √ + · · ln n π i i (ln i)3/2 6 i i=3 ) √ * n n 1 1 1 8 3/2 √ ·√ (ln 2) + + i π 6 (ln n)3/2 i i i=3 i=3 1 (ln 2)3/2 + 2 ln n 6 (ln n)3/2
1 1 ·√ + O (ln n)−3/2 . 3 ln n
2
Unsuccessful Searching in Binary Search Trees
In this and the following section we deal with the analysis of inserting and retrieving randomly ordered data in binary search trees by the contraction method; we refer to Mahmoud (1992) for an introduction to random search tree algorithms. Let Un denote the number of comparisons that are necessary in order to insert a new random element in a random search tree. A search tree is called random if it arises from a random permutation. An element (to be inserted in a tree) is called random if each of the n + 1 free leaves of the 1 tree has probability n+1 of being chosen. Un satisfies the recursion d
Un = Un−1 + Yn ,
U0 = 0,
(9.3.27)
2 ). For n = 1, one comwhere Un−1 , Yn are independent, Yn ∼ B(1, n+1 parison with the root is necessary. For n ≥ 2, insertion of the (n + 1)th element needs as many comparisons in the n-tree as in the (n − 1)-tree except in the case that one comparison with the nth element is necessary. The probability that no comparison with this element is necessary equals n−1 n+1 .
From (9.3.27) we have E Un = 2 (Hn+1 − 1),
(2)
Var Un = 2 Hn+1 − 4 Hn+1 + 2.
(9.3.28)
9.3 Extensions of the Contraction Method
261
Brown and Shubert (1984) (cf. Mahmoud (1992, p. 76)) proved a central limit theorem for Un making use of the Lyapunov theorem and the method generating functions. Since by (9.3.27), d
Un =
n
Yi ,
Yi ∼ B 1,
i=1
2 i+1
,
(Yi ) independent, (9.3.29)
this argument can be simplified to yield the following theorem. Theorem 9.3.5 Define n − E Un 4n := U√ . U Var Un
Then for some constant C > 0 and all n,
4n , N (0, 1) ≤ √C . U ln n Sn,3
(9.3.30)
1 (cf. Mahmoud (1992, p. 77)). There2 ln n fore, (9.3.30) is a consequence of (9.3.6). 2
Proof: Observe that
3/2 Sn,2
∼√
Applying the results of Deheuvels and Pfeifer (1988) we obtain that ln1n 4n by a Poisson distribution. This is the exact order of approximation of U indicates that the logarithmic rate in the Berry–Esseen bound (9.3.30) should give essentially the right order of approximation. The following rate of convergence result, obtained by the contraction method, supports the fact that the logarithmic order is sharp. The contraction method can be applied in the theorem below in much the same way as in Section 9.3.2. We therefore only give a sketch of the proof. For more details we refer to Cramer (1995a). n , and Zn ∼ N (0, σn2 ). n := Un√−E Un , σn2 := Var U Theorem 9.3.6 Define U ln n Then, for some C > 0 and all n ∈ IN, we have
n , Zn ≤ √C . ζ3 U ln n n satifies the recursion Proof: U 7 ln (n − 1) d Un−1 + Yn , Un = ln n
(9.3.31)
Yn − E Y n √ . Yn := ln n
(9.3.32)
262
9. Mass Transportation Problems and Recursive Stochastic Equations
Define then Zn∗
7
:=
ln (n − 1) Zn−1 + Yn ln n
(9.3.33)
and τn2 := σn2 −
ln (n − 1) 2 σn−1 = Var Yn . ln n
(9.3.34)
Let the normal random variables Wn ∼ N (0, τn2 ) be independent of the sequences (Zn ), (Yn ). Then the sequences 7 ln(n − 1) d Zn−1 + Wn . (9.3.35) Zn = ln n Consequently, as in Section 9.3.2, we have the bound 3/2 3/2 n
ln i ln 2 n , Zn ≤ 2 , Z2 + ζ3 U ζ3 Wi , Yi . ζ3 U ln n ln n i=3
(9.3.36)
2 = 0 = E Z2 , Var U 2 = σ 2 = Var Z2 , it follows that Next, since E U 2
ζ3
2 , Z2 U
1 ≤ 6
3 3 E U2 + E |Z2 | < ∞.
Furthermore, 3
1 3 ζ3 Wi , Yi ≤ E|Wi | + E Yi 6 ) √ 1 3 3 2* 1 2 2 3 2 i−1 1 2 i−1 √ τi + = + 6 i+1 i+1 π (ln i)3/2 i + 1 i + 1 * ) 2 4 . ≤ 1+ $ 3/2 6 (ln i) (i + 1) π (i + 1) Therefore,
n , Zn ζ3 U ≤
1 1 (ln n)3/2 6 +
≤ as required.
√
1 (ln n)3/2
1 ln n
10 8 √ + 81 27 π 1 2 n 4 1 1 1+ $ · i+1 3 π(i + 1) i=3
for n ≥ n0 ,
(9.3.37) 2
9.3 Extensions of the Contraction Method
263
Remark 9.3.7 Studying the recursion (9.3.32) we can also obtain rate of convergence under alternative distributional assumptions on Yn (resp. Yn ). For example, if µr is any (r, +)-ideal, simple metric, then (as in (9.3.36))
µr
r/2 n ln 2 r/2 ln i Un , Zn ≤ µr U2 , Z2 + µr Wi , Yi . (9.3.38) ln n ln n i=3
This indeed implies that
n , Zn −→ 0, µr U
(9.3.39)
n→∞
provided that the following conditions hold:
2 , Z2 < ∞, µr Wi , Yi < ∞, (a) µr U (b)
i ≥ 3.
1 . µr Wi , Yi = o i ln i
(9.3.40)
ε , To show (9.3.39) for ε > 0, choose k0 ∈ IN such that µr (Wk , Yk ) ≤ k ln k for k ≥ k0 . Then
lim sup µr n→∞
n , Zn U
≤ lim sup n→∞
ln 2 ln n
+ lim sup n→∞
+ lim sup n→∞
r/2
2 , Z2 µr U
k 0 −1
1 r/2 i W (ln i) µ , Y r i (ln n)r/2 i=3 n 1 1 (ln i)r/2−1 ε r/2 i (ln n) i=k 0
n 1 1 ≤ 0 + 0 + lim sup ε ≤ ε. ln n i n→∞ i=k0
In the preceding example of unsuccessful searching, the estimate of the rate of “merging” of the sequences (Wi ) and (Yi ) in terms of µr (W √i , Yi ) is 3/2 of order 1/i(ln i) , allowing us to reach the convergence rate 1/ ln n.
9.3.4
Successful Searching in Binary Search Trees
Given a random binary search tree as in Section 9.3.3, let Sn denote the number of comparisons to retrieve a randomly chosen element in the tree. Brown and Shubert (1984) derived a formula for P (Sn = k), and Louchard (1987) proved a central limit theorem for Sn using the generating function
264
9. Mass Transportation Problems and Recursive Stochastic Equations
method in Mahmoud (1992, pp. 78–82). We shall next derive a quantitative version of the central limit theorem. Our main tool will be the contraction method and moment formulas based on the following recursion for Sn : d
Sn = 1 + SIn ,
S0 = 0,
S1 = 1.
Here In is independent of (Si ), and P (In = 0) = 1 ≤ j ≤ n − 1.
(9.3.41) 1 n,
P (In = j) =
2j n2 ,
It can be shown that this recursion does not transform itself to a sum of independent random variables as was done in the random search algorithm in Rachev and R¨ uschendorf (1991) (cf. (9.3.59)). Therefore, (9.3.41) does not allow the application of the Berry–Esseen-type or Poisson-type approximation result. In fact, it arises from the recursion n
P (Sn = k) =
P (Sn = k, j chosen)
(9.3.42)
j=1 n
n 1 1{i=j} δ1k + 1{i<j} P (Sn−i = k − 1) n2 j=1 i=1 + 1{i>j} P (Si−1 = k − 1) n δ1k n − i P (Sn−i = k − 1) + n n2 i=1
=
=
+
n i−1 i=1
n2
P (Si−1 = k − 1)
δ1k 2j + P (Sj = k − 1) · 2 . n n j=1 n−1
=
An explicit formula for P (Sn = k) is due to Brown and Shubert (1984) (cf. Mahmoud (1992, p. 79)). Making use of the Brown–Shubert result, Mahmoud (1992, p. 80) desired formulas for the first two moments of Sn . The recursion (9.3.41) leads to a direct calculation of those moments, as we shall see in the next proposition. Proposition 9.3.8 1 = 2 1+ (9.3.43) Hn − 3. n " 2 # Hn 10 1 = 2+ Hn − 4 1 + + Hn(2) + 4. (9.3.44) n n n
(a)
E Sn
(b)
Var Sn
9.3 Extensions of the Contraction Method
265
Proof: = 1 + E (E(SIn |In )) = 1 +
(a) E Sn
n−1
P (In = k) E Sk
(9.3.45)
k=0
= 1+
n−1 k=0
2k E Sk . n2
With Qn := n · E Sn , the recursion (9.3.45) leads to Qn = n + n2 n+1 + n+2 Q1 = 1, which implies Qn+1 = 2n+1 n+1 Qn . Iteratively, Qn
n−1 k=1
Qk ,
n−1 2k − 1 n + 1 2n − 1 + · n k k+1 k=1 ) n * n 2 1 1 − − = (n + 1) k+1 k k+1 k=1 k=1 # " 1 = (n + 1) 2 (Hn+1 − 1) − 1 + n+1 = 2(n + 1) Hn − 3 n.
=
(b) E Sn2
=
1 + 2 E SIn + E SI2n
=
1 + 2 (E Sn − 1) +
n−1 j=1
2j E Sj2 . n2
With Pn := n · E Sn2 , we obtain Pn = −n + 2 Qn + This yields
n+1 2
Pn+1 −
Pn+1 = 8 Hn −
n 2
Pn =
2 n+1 2
2 n
n−1 j=1
Pj .
+ 2 Qn + Pn . By (a), we now have
10 n − 1 n + 2 Pn , + n+1 n+1
and iterating the above expression, we get Pn =
n j=1
10 j − 3 8 Hj − j
n+1 . j+1
The relation n Hj j=1
j
(2)
=
Hn + Hn2 2
leads to an explicit calculation of Pn , which yields (9.3.44).
2
266
9. Mass Transportation Problems and Recursive Stochastic Equations
Our next step is to show that (Sn ) after a logarithmic normalization merges to a sequence of normal r.v.s. Define the following normalized version of (Sn ): Sn − E S n √ , Sn := 2 ln n
S0 = S1 = 0.
(9.3.46)
Let a(k, n) := 1 − E Sn + E Sk , b(k) := Var Sk , σn2 := Var Sn .
(9.3.47)
For our derivation we need the following (so far unchecked): 2 1 2* ) 1 n−1 2k y y − a(k, n) 2 $ −Φ Φ $ (C) lim sup y dy < ∞.(9.3.48) n2 n→∞ b(n) b(k) k=2
Here, Φ is the standard normal d.f. Let (Zn ) be independent of (Sn ), and Zn ∼ N (0, σn2 ). Theorem 9.3.9 Suppose that (C) holds. Then there exists a constant K < ∞ such that
K . (9.3.49) ζ3 Sn , Zn ≤ √ ln n Proof: Note first that (Sn ) satisfies the recursion 7 ln In d (9.3.50) Sn = SI + cn (In ), ln n n √ where cn (k) := 1 − E Sn + E Sk / 2 ln n. Define then the accompanying sequence 7 ln In ∗ d ZIn + cn (In ). (9.3.51) Zn = ln n Applying the “ideality” properties of the metric ζ3 , we obtain the following recursive bound for ζ3 (Sn , Zn );
ζ3 Sn , Zn ≤ ζ3 Sn , Zn∗ + ζ3 (Zn∗ , Zn ) (9.3.52) 1 2 7 7 n−1 ln k ln k Zk +cn (k) P (In = k) ζ3 ≤ Sk +cn (k), ln n ln n k=0
+ ζ3 (Zn∗ , Zn ) n−1 2 k ln k 3/2 k , Zk + ζ3 (Z ∗ , Zn ) . S ζ ≤ 3 n n2 ln n k=2
9.3 Extensions of the Contraction Method
267
To estimate the (ζ3 )-distance between Zn∗ , and Zn we compute the first two moments of Zn∗ : E
Zn∗
n−1
=
17 P (In = k) E
k=0 n−1
=
P (In = k)
k=0
2
ln k Zk + cn (k) ln n
1 − E S n + E Sk √ 2 ln n
= (2 ln n)−1/2 [1 − E Sn + E SIn ] = 0 = E Zn , 2
and similarly, E (Zn∗ ) = ζ3 (Zn∗ , Zn ) ≤
1 2 ln n
Var Sn = Var Sn . Now we obtain
1 1 κ3 (Zn∗ , Zn ) = 6 2
x2 FZn∗ (x) − FZn (x) dx.
√ $ Furthermore, FZn (x) = Φ(x/σn ) = Φ x 2 ln n/ b(n) , and
FZn∗ (x) =
n−1
P (Zn∗ ≤ x | In = k) · P (In = k)
k=0
2 1 √ √ 1 x 2 ln n − a(k, n) $ = + 1[1−E Sn ,∞) (x 2 ln n) n b(k) k=2 √ 2 + 2 1[2−E Sn ,∞) (x 2 ln n). n √ Applying the substitution y = x · 2 ln n, the above implies n−1
2k Φ n2
1 [An + Bn + Cn ] , 2 · (2 ln n)3/2
ζ3 (Zn∗ , Zn ) ≤
(9.3.53)
where An
:=
Bn
:=
2 1 y y 1[1−E Sn ,∞) (y) − Φ $ dy, b(n) 2 1 2 y 2 $ 1 y (y) − Φ dy, [2−E S ,∞) n n2 b(n) 1 n
2
and Cn :=
n−1 2 1 2* ) 1 2 k y y − a(k, n) $ $ − Φ Φ y dy. n2 b(k) b(n) 2
k=2
268
9. Mass Transportation Problems and Recursive Stochastic Equations
Invoking the assumption (C), we obtain Cn ≤ MC for all n ∈ IN and a fixed constant MC . For n ≥ n0 we have E Sn ≥ 1 and 1 2 1 1 y 2 y Φ $ y 2 1[1−E Sn ,0) (y) dy An ≤ − 1[0,∞) (y) dy + n n b(n) √ 3 1 1 2$ 1 1 ≤ · 2√ b(n) + · (E Sn − 1)3 −→ 0. n→∞ n 3 n 3 π The last bound follows from the follwoing asymptotics: b(n) = Var Sn ∼ 2 ln n, and E Sn ∼ 2 ln n. Therefore, An ≤ MA , and similarly Bn ≤ MB , for all n, and we obtain M , (ln n)3/2
ζ3 (Zn , Zn∗ ) ≤
(9.3.54)
where M is a fixed constant. Next, we need to apply the Euler summation formula (cf. Hofri (1987, p. 19)) to the function f (x) = x ln x, x ≥ 1: n−1 j=1
n m Bk (k−1) f f (j) = f (x) dx + (n) − f (k−1) (1) + Rm , k!
(9.3.55)
k=1
1
where (Bk ) are the Bernoulli numbers. In (9.3.55) the term Rm has the form Rm
(−1)m+1 = m!
n Bm ({x}) f (m) (x) dx,
{x} = x − $x%,
1
m−k is the mth Bernoulli polynomial. After where Bm (x) = k≥0 m k Bk x some calculations, (9.3.55) with m = 2 yields n−1 j=2
j ln j =
1 2 1 1 n ln n − n2 − n ln n + O(ln n). 2 4 2
(9.3.56)
n−1 ≤ Consider a sufficiently large n0 such that for n ≥ n0 , j=2 j ln j 1 2 1 2 A A 2 n ln n − 4 n . Choose M large enough that
(9.3.49)(with M instead of A, 2 M . So from (9.3.52), K) holds for n < n0 and define K := max M (9.3.54), using inductive arguments and assuming (9.3.49) for all k < n, we obtain the final bound:
ζ3 Sn , Zn ≤
n−1 k=2
2k n2
ln k ln n
3/2
K M √ + ln k (ln n)3/2
9.3 Extensions of the Contraction Method
= ≤ ≤
269
n−1 1 2 M · · K k ln k + (ln n)3/2 n2 (ln n)3/2 k=2 " # 1 1 2 2K 1 2 ln n − + M n n 4 (ln n)3/2 n2 2 # " K K K 1 + = √ K ln n − . 2 2 (ln n)3/2 ln n
2
Remark 9.3.10 In the preceding example, a direct proof of the convergence of Sn based on direct application of the method of probability metric seems impossible. We were able to obtain the rate of convergence by induction arguments that use the Euler summation formula in a crucial way. This extension of the contraction technique seems to be potentially useful also for other examples in the theorey of probability metrics.
0.358
0.358
0.360
0.360
0.362
0.362
0.364
0.364
0.366
0.366
Remark 9.3.11 Numerical simulations (for n ≤ 10, 000) indicate that (C) is correct. Let us denote the integral in (9.3.48) for n ∈ IN by f (n). Numerical calculation in the range −25 to 25 (with a Newton–Cote algorithm with precision 10−5 ) leads to the graphs in Figures 9.13 and 9.14 of f (n) against n, respectively against ln(ln(ln n)). These graphs indicate the boundedness of f .
0
2000
4000
6000
8000
FIGURE 9.13. f (n) against n
9.3.5
0.5
10000
0.6
FIGURE 9.14. ln(ln(ln(n)))
0.7
f (n)
0.8
against
A Random Search Algorithm
In this section we consider a random search in a set of n ordered states {1, 2, . . . , n}, starting in the largest state n. Let (Tn ) be an independent sequence of random natural numbers, Tn ≤ n − 1. After one step of the
270
9. Mass Transportation Problems and Recursive Stochastic Equations
search we reach state Tn ≤ n − 1. The search is continued in the smaller set {1, . . . , Tn } in the same way, reaching in the next step the state TTn ≤ Tn − 1. The search ends if state 1 is reached. Let Sn denote the number of steps needed for this random search to reach this final state 1. Then Sn satisfies the recursion d
Sn = 1 + STn ,
S1 = 1.
(9.3.57)
With r.v.s Tn being uniformly distributed on {1, . . . , n − 1}, this model has been used by Ross (1982, p. 118) and Bickel and Freeman (1981) in a search for an estimate of mean number of steps in the simplex method (with n extreme points). For applications to max-search problems we refer to Nevzorov (1988), and Pfeifer (1991). In their setting there are given independent r.v.s X1 , . . . , Xn , and Tn is the largest index k ≤ n − 1 such that Xk > Xn . We add the index 0 to the state space, Tn = 0 meaning that no value larger than Xn occurs. Consider now the r.v.s I1 , . . . , In , where Ik is defined as 1 or 0 as state k is visited by the search process or not. Then d
Sn =
n
Ik .
(9.3.58)
k=1
Let ai ∈ [0, 1], i ≥ 1, a1 = 1, and consider the special search strategy 1 P (Tn = k) =
2
n−1 8
bm
ak ,
1 ≤ k ≤ n − 1,
(9.3.59)
m=k+1
where bm = 1 − am and
Bn−1
m=n bm
= 1.
Special cases: (a)
If ak = 1/k, bk = (k − 1)/k, then 1 αn,k =
n−1 8
2 bm
m=k+1
ak =
n−2 1 1 k ··· = , kk+1 n−1 n−1
that is, this special case corresponds to the uniform search on {1, . . . , n−1}; (b)
If ak = 1 − e−αk , bk = e−αk , (α1 = −∞), then αn,k = e−
n−1 m=k+1
αk
(1 − e−αn ).
9.3 Extensions of the Contraction Method
271
With our choice of the search probabilities in (9.3.59) we can easily see that the random variables d
I1 , . . . , In
are independent, and Ii = B(1, ai ).
(9.3.60)
The above implies that d
Sn =
n
Ii
(9.3.61)
i=1
is a sum of independent binomial random variables; in particular, ESn =
n
ai ,
Var(Sn ) =
i=1
n
ai bi .
(9.3.62)
i=1
In the uniform search case this leads to ESn = log n + γ + O
1 n
(9.3.63)
and π2 +O Var(Sn ) = log n + γ − 6
1 , n
where γ = 0.5772 is the Euler constant. n n n 2 Suppose that λn = i=2 ai , and i=2 ai / i=2 ai = rn is small for n → ∞. Consider then for the Kolmogorov distance (X, Y ) = sup |FX (x) − FY (x)|
(9.3.64)
x
between Sn − 1 and a Poisson distributed random variable Zn with mean λn . From the results of Deheuvels and Pfeifer (1988) we have the following asymptotic approximation: 1 n 2
3/2 1 2 2 pi / (Sn − 1, Zn ) = √ rn + O max pi , rn ; (9.3.65) 2 2πe 2 that is, P (Sn = k + 1) = e−λn λkn /k! + O(rn ).
(9.3.66)
272
9. Mass Transportation Problems and Recursive Stochastic Equations
Some alternative approximations of Sn in terms of various probability metrics were studied in Rachev and R¨ uschendorf (1990).
9.3.6
Bucket Algorithms
Consider now n i.i.d. r.v.s X1 , . . . , Xn with density f on [0, 1], and let i us divide [0, 1] into m intervals Ai = [ i−1 m , m ], 1 ≤ i ≤ m. Let N = number of r.v.s in the “m-buckets” (N1 , . . . , Nm ) be the vector of the n A1 , . . . , Am ; in other words, Ni = j=1 1Ai (Xj ). The total number of comparisons needed to sort n random numbers by the bucket algorithm is given by
Cn =
m Ni i=1
2
=
1 (Tn − n), 2
Tn =
m
Ni2
(9.3.67)
i=1
(cf. Devroye (1986)). Since N is multinomial M (m; p1 , . . . , pm )-distributed with pi = f (x) dx, we obtain Ai
n(n − 1) 2 pj . 2 i=1 m
ECn =
(9.3.68)
Therefore, in the case of a uniform distribution pj = 1/m and for m/n → α ∈ (0, 1), we have ECn ≈ n/2α. In the general case we have the following asymptotics for the first two moments of Cn :
ECn
n ≈ 2α
1
f 2 (x) dx
(9.3.69)
0
and 1 2 2 1 4 2 3 2 f 2 (x) dx. (9.3.70) f (x) dx f (x) dx − + Var Cn → 2 n α α 2 ≥ f 2 (x) dx α
9.3 Extensions of the Contraction Method
273
We shall demonstrate the method of probability metrics in order to obtain the asymptotic distribution of Cn in the special case m = 2 and n → ∞. We have 1 n 22 1 22 n Tn = ζi + n− ζi ,
1 Cn = (Tn − n), n
i=1
(9.3.71)
i=1
where the ζi ’s are i.i.d. Bernoulli random variables with success probability p. Define the approximating U-statistic based on a normal sample: 1 (Sn − n), = 2
Dn
Sn
22 1 n 22 1 n = ηi + n− ηi , (9.3.72) i=1
i=1
where ηi ∼ N (p, pq), q := 1 − p. (A detailed analysis of the distribution of Sn can be found in Seidel (1988).) Next, by use of ◦ we shall ◦ √ √ ◦ denote the normalized quantities ζi = (ζi − p)/ pq, ηi = (ηi − p)/ pq, ◦ ◦ ◦ Tn = ( ζi )2 + (n − ζi )2 , etc. ◦
◦
The next theorem provides estimates of closeness of Cn and Dn in terms of the Kantorovich p -metrics for p = 1 and p = 2. Theorem 9.3.12 For m = 2 and n → ∞, 1
2
◦
◦
Cn Dn , n3/2 n3/2
2 = O(n−1/4 ),
(9.3.73)
= O(n−1/2 ).
(9.3.74)
and 1
1
◦
◦
Cn Dn , n3/2 n3/2
2
Proof: To show (9.3.73) note that the normalization n3/2 is of the right order. In fact, 1 Var
◦
Dn n3/2
2 =
=
◦ 1 Var(n−3/2 Sn ) 4 1 n n ◦ ◦ ◦2 2 i=1 j=1 ηi ηj − 2n ηi 1 Var 4 n3/2
274
9. Mass Transportation Problems and Recursive Stochastic Equations
≈ constant Var
◦ 2n ηi ≈ constant > 0. n3/2
Since 2 is the minimal L2 -metric, it follows that ◦
◦
2 (Cn , Dn ) ≤ L2
1 ◦ 1 ◦ (Tn − n), (Sn − n) 2 2
◦ ◦ 1 L2 (Tn , Sn ). 2
=
Thus 1
2
◦
◦
Cn Dn , n3/2 n3/2
2
2 ◦ ◦ Tn Sn , n3/2 n3/2 ⎛⎛ ⎞ ◦ 1 ⎝⎝ ◦ ◦ L2 2 ζi ζj + n2 − 2n ζi ⎠ n−3/2 , 2 i j ⎛ ⎞ ⎞ ◦ ◦ ◦ 2 −3/2 ⎝2 ⎠ ηi ηj + n − 2n ηi ⎠ n 1 L2 2
≤
=
1
i
⎛ ≤ L2 ⎝n−3/2
i
+ L2 n−1/2 ◦
j
j
◦ ◦
ζi ζj , n−3/2
i
⎞ ηi ηj ⎠ ◦ ◦
j
◦ ζi , n−1/2 ηi =: I1 + I2 . ◦
◦
Assuming that the pairs ( ζi , ηi ) are independent, we obtain I1 = n−3/2 L2
1 i
◦2
ζi ,
⎛
2 η2i ◦
+ n−3/2 L2 ⎝
i =j
i
◦ ◦
ζi ζj ,
⎞ ηi ηj ⎠ ◦ ◦
(9.3.75)
i =j
⎛ ⎛ ⎞2 ⎞1/2
◦2
◦ ◦ ⎜ ⎟ ◦ ◦ ◦ ≤ n−1/2 L2 ζ1 , η21 + n−3/2 ⎝E ⎝ ζi ζj − ηi ηj ⎠ ⎠ i =j
⎛ ⎞1/2
◦2
◦ ◦ 2 ◦ ◦ ◦ ⎠ = n−1/2 L2 ζ1 , η21 + n−3/2 ⎝ E ζi ζj − ηi ηj i =j
◦2
◦ ◦ 1/2 ◦ ◦ ◦ = n−1/2 L2 ζ1 , η21 + n−3/2 n(n − 1)L1 ζ1 ζ2 , η1 η2
◦ ◦
◦ 2 ◦ 2 ◦ ◦ ≤ n−1/2 Var ζ1 + Var ( η1 ) + n−1/2 Var ζ1 ζ2 + Var ( η1 η2 ) ≤ c n−1/2 .
9.3 Extensions of the Contraction Method
275
Here and in the sequel c stands for an absolute constant, which may be different at different places. Similarly,
◦ ◦ ζi , n−1/2 ηi I22 = L22 n−1/2 (9.3.76)
◦ ◦ ◦ ◦ ζi , n−1/2 ηi E n−1/2 ζi +E n−1/2 ηi ≤ L1 n−1/2
◦ ◦ ◦ ◦ ζi , n−1/2 ηi Var ζ1 + Var η1 ≤ L1 n−1/2
◦ ◦ ζi , n−1/2 ηi . = 2 L1 n−1/2 Passing to the minimal metric, this yields ◦
◦
2 (Cn , Dn ) ≤ c n−1/2 +
◦ ◦ √ @ 2 1 (n−1/2 ζi , η2 ) .
(9.3.77)
The rate of convergence in the CLT for the 1 = ζ1 -metric has been discussed by Zolotarev (1986, Theorem 5.4.7) (see also Rachev and R¨ uschendorf (1990, Lemma 3.3)). It is given by
1 (n−1/2
◦ ◦ ◦ ◦ ◦ ◦ ζi , η1 ) ≤ 11.5 max 1 ( ζ1 , η1 ), ζ2 ( ζ1 , η1 ) n−1/2 ,
(9.3.78) ◦
◦
where ζr is the Zolotarev ideal metric of order r > 0. This implies 2 (Cn , Dn ) ≤ C n−1/4 . We can argue similarly to show (9.3.76). The bound (9.3.75) can be replaced by −1/2
I1 ≤ n
◦ ◦ ◦ 2 ◦ 2 ◦ ◦ E ζ1 + E | η1 | + n−1/2 E ζ1 ζ2 + E | η1 η2 | (9.3.79)
≤ c n−1/2 . Invoking (9.3.78), we see that the term I2 is of order n−1/2 .
2
We can extend our results to the cases m = 3, 4. However, the proofs are computationally quite involved. For some general results on the asymptotic distributions of quadratic forms we refer to de Jong (1989).
10 Stochastic Differential Equations and Empirical Measures
10.1 Propagation of Chaos and Contraction of Stochastic Mappings In this section we use contraction properties of stochastic mappings with respect to suitably chosen metrics in order to study some new examples of propagation of chaos. In particular, systems of stochastic differential equations (SDEs) with mean field type interactions and the corresponding nonlinear SDEs of McKean–Vlasov type for the limiting cases will be considered. We shall also study the rate of convergence to the corresponding limit. Assumptions on the smoothness and growth properties of the coefficients of the SDEs are to be reflected in the choice of the probability metric in order to obtain the required contraction properties. This allows us to investigate new types of interactions as well as to consider systems with relaxed Lipschitz assumptions.
10.1.1
Introduction
The notion “propagation of chaos” was introduced by Kac in his investigation of the relationship between simple Markov models of interacting particles and nonlinear Boltzmann-type equations; for an introduction to the propagation theory of chaos we refer to Sznitman (1989). A formal def-
278
10. Stochastic Differential Equations and Empirical Measures
inition follows. Let (uN ) be a sequence of symmetric probability measures on E N , E a separable metric space, and let u be a probability on E. Then w (uN ) is called u-chaotic if πk un −→ u(k) . Here πk stands for the k-marginal w distribution, u(k) is the k-fold product, and −→ denotes weak convergence. A basic example for chaotic sequences is McKean’s interacting diffusion (cf. the “laboratory” example in Sznitman (1989, p. 172)). Consider a system of interacting diffusions
dXti,N X0i,N
= dWti +
N 1 b(Xti,N , Xtj,N )dt, N j=1
i = 1, . . . , N,
(10.1.1)
= xi0 ,
where the W i are independent Brownian motions and b satisfies a certain Lipschitz condition. Let uN denote the distribution of (X 1,N , . . . , X N,N ). The nonlinear limiting equation is given by the McKean–Vlasov equation dXt = dBt +
b(Xt , y)ut (dy) dt,
(10.1.2)
when Bt is a Brownian motion and ut is the distribution of Xt . Then uN is u-chaotic, where u is the distribution of X on C(IR+ , IRd ). An alternative example of chaotic behavior of particles, not described by SDEs, are uniform distributions on “p-spheres.” Let uN denote the uniform distribution on the p-sphere of radius N in IRN + , that is, on Sp,N := {x ∈ p N IR+ ; Σxi = N }. Let u denote the probability measure on IR+ with density fp (x) =
p1−1/p −xp /p e , Γ(1/p)
x ≥ 0.
Then for N > k + p, and k and N big enough, πk uN − u(k) ≤
2(k + p) + 1 , N −k−p
(10.1.3)
where · denotes the total variation distance (cf. Kuelbs (1977), Rachev and R¨ uschendorf (1991)). In particular, we obtain that uN is u-chaotic. This example has its origin in Poincar´e’s theorem on the asymptotic behavior of particle systems. More general examples of this kind have been developed in the statistical physics literature in connection with the “equivalence of ensembles” but typically without a quantitative estimate as in (10.1.3).
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
279
The main goal of this section is the study of propagation of chaos in several modifications of McKean’s example. We shall be concerned with the form of the interaction and the regularity assumptions on the coefficients. To this end we introduce suitable probability metrics, allowing us to derive contraction properties of the stochastic equations defined by the corresponding linear equations. Dobrushin (1979) introduced the use of the Kantorovich metric for the interacting diffusions in the model (10.1.1), (10.1.2). The success of this metric is based on a coupling argument inherent in its definition. This metric has been applied since then in several other papers. For some modifications of the model (10.1.1), (10.1.2) we shall need alternatives to the Kantorovich metric that provide suitable regularity and ideality properties for the equations considered. In particular, we need metrics that are “ideal” of higher order when we relax the Lipschitz conditions in equations (10.1.1), (10.1.2). Our modifications allow us to treat much more complicated forms of interactions than those in the McKean example. In particular, we consider nonlinear interactions via some general energy function, as for example the p-norm of the vector of all pair interactions. We also consider interactions with “outside” particles over the whole past (history) of the process, describing some non-Markovian systems. We demonstrate the flexibility of the approach based on suitable probability metrics to analyze with nonstandard forms of interactions and develop the tools to study complex physical systems.
10.1.2
Equations with p-Norm Interacting Drifts
Consider a system of N interacting diffusions with p-norm interacting drifts; that is, the drift is given by the pth norm of the vector of all pair interactions (which can be viewed as the driving force in the system):
dXti,N X0i,N
⎧ ⎫1/p N ⎨1
⎬ = dWti + bp Xti,N , Xtj,N dt ⎩N ⎭ j=1 = X0i ,
1 ≤ i ≤ N,
(10.1.4)
280
10. Stochastic Differential Equations and Empirical Measures
b ≥ 0, p ≥ 1. ((Wti ), X0i ) are independent, identically distributed for all i.) We shall establish that each X i,N has a natural limit X i , where the (X i ) are independent copies of the solutions of a nonlinear equation ⎧ ⎨ ⎩
dXt
= dBt +
Xt=0
= X,
1/p p
b(Xt , y) ut ( dy)
dt,
(10.1.5)
d
with B = W 1 a process on CT , and ut = P Xt . In order to obtain the necessary contraction properties of these equations, we consider the L∗p -metric and the corresponding minimal L∗p -metric ( ∗p ), defined for processes X, Y (or the corresponding probability measures m1 , m2 ∈ M 1 (CT )). Here and in what follows M 1 (CT ) denotes the class of all probability distributions on CT , L∗p,t (X, Y ) := (E sup |Xs − Ys |p )1/p ,
(10.1.6)
s≤t
and
∗p,t (m1 , m2 ) := inf{L∗p,t (X, Y ); X = m1 , Y = m2 }. d
d
(10.1.7)
In (10.1.7) we tacitly assumed that the underlying probability space is rich enough to support all possible couplings of m1 , m2 , which is true, for example, in the case of atomless probability spaces. Define, for m0 ∈ M 1 (CT ), Mp (CT , m0 ) := {m1 ∈ M 1 (CT ); ∗p,T (m0 , m1 ) < ∞},
(10.1.8)
and let Xp (CT , m0 ) be the class of processes on CT with distribution m ∈ Mp (CT , m0 ). For m0 = δa (the one-point measure in a ∈ CT ), this is the class of all distributions on CT with finite pth moment of the process norm. For m ∈ Mp (CT , m0 ) consider the linear equation corresponding to (10.1.5) t Xt = Bt + 0
⎛ ⎝
CT
⎞1/p b(Xs , ys )p dm(y)⎠
ds,
(10.1.9)
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
281
where ys is the value of y at time s. Let (Bt ) be a real-valued process on CT = C[0, T ] with finite pth absolute moment (E sups≤T |Bs |p < ∞), and let b ≥ 0 be a Lipschitz function in x; that is, |b(x1 , y) − b(x2 , y)| ≤ c|x1 − x2 |,
for all x1 , x2 and Y in CT . (10.1.10)
As usual, a strong solution of the SDE (10.1.9) means a solution measurable with respect to the augmented filtration of the process (Bt ). In constrast, a weak solution of (10.1.9) is defined on a suitable filtered space of distributions.
Lemma 10.1.1 Assume that (10.1.10) holds, and let 1/p b(0, ys )p dm(y) ds < ∞.
(a) Then equation (10.1.9) has a unique strong solution X. (b) If Φ(m) is the law of X, then Φ(m) ∈ Mp (CT , m0 ); that is, Φ : Mp (CT , m0 ) → Mp (CT , m0 ). Proof: Let X ∈ Xp (CT , m0 ), and define 1/p
t b(Xs , ys )p dm(y)
(SX)t := Bt +
ds.
0
Then, for Y ∈ Xp (CT , m0 ), t |(SX)t − (SY )t | ≤ 0
1/p 1/p p p b(Xs , ys ) dm(y) − b(Ys , y) dm(y) ds 1/p
t ≤
|b(Xs , ys ) − b(Ys , ys )|p dm(y) 0
t ≤ c |Xs − Ys | ds. 0
ds
282
10. Stochastic Differential Equations and Empirical Measures
This implies sups≤t |(SX)s − (SY )s | ≤ c thermore, L∗p,t (SX, SY ) =
t 0
supu≤s |Xu − Yu | ds, and fur-
p
1/p
E sup |(SX)s − (SY )s | s≤t
⎛ ⎛ t ⎞p ⎞ 1/p ≤ c ⎝E ⎝ sup |Xu − Yu | ds⎠ ⎠ 0
u≤s
t ≤ c L∗p,s (X, Y ) ds. 0
Define inductively X 0 := B and X n := SX n−1 . Then the above bound yields L∗p,T (X n , X n−1 ) ≤ cn
Tn ∗ L (X 1 , X 0 ). n! p,T
For the L∗p,T -distance in the right-hand side we have the following estimate:
L∗p,T (X 1 , X 0 )
#1/p T" ≤ c ds E|Bs |p + b(0, ys )p dm(y)
0
T
≤ c
1/p E sup |Bu |
0
p
ds + c
u≤s
p
b(0, ys ) dm(y)
ds
0 p 1/p
≤ c T (E sup |Bs | ) s≤T
1/p
T
+c
1/p
T p
b(0, ys ) dm(y)
ds.
0
From the assumptions on B and b, the above bound implies that L∗p,T (X 1 , X 0 ) < ∞. Consequently, ∞
L∗p,T (X n , X n−1 ) ≤ ecT L∗p,T (X 1 , X 0 ) < ∞,
n=1
∞ by the Gronwall lemma. This results in n=1 sups≤T |Xsn −Xsn−1 | < ∞ a.s., and therefore, X n converges to some process X a.s., uniformly on bounded intervals. The limiting process X is a.s. continuous, has finite pth moments
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
283
(i.e., X∗p,t := E sups≤t |Xs |p < ∞), and is a fixed point of the mapping S. So, Φ(m) = P X ∈ Mp (CT , m0 ); this holds because B∗p,T < ∞ and 2 L∗p,T (X, B) < ∞. In addition, suppose that b is Lipschitz in both arguments; that is, for all x1 , x2 , y1 , and y2 in CT , |b(x1 , y1 ) − b(x2 , y2 )| ≤ c[|x1 − y1 | + |x2 − y2 |],
(10.1.11)
and consider the mapping Φ : Mp (CT , m0 ) → Mp (CT , m0 ). Lemma 10.1.2 (Contraction of Φ with respect to the ∗p,t -minimal metric) Under the Lipschitz condition (10.1.11) and the assumptions of Lemma 10.1.1, for t ≤ T and m1 , m2 ∈ Mp (CT , m0 ), the following holds:
∗p,t (Φ(m1 ), Φ(m2 ))
t ≤ ce
ct
∗p,u (m1 , m2 ) du.
(10.1.12)
0
Proof: Let for i = 1, 2 and t ≤ T ,
(i)
Xt
t := Bt +
⎛ ⎝
0
⎞1/p b(Xs(i) , ys )p dmi (y)⎠
ds,
CT
and let m ∈ M 1 (m1 , m2 ), the class of probability measures on CT × CT with marginals m1 , m2 . Then sup |Xs(1) − Xs(2) | s≤t ⎡ ⎤1/p ⎡ ⎤1/p t ds ⎣ b(Xs(1) , ys(1) )p dm1 (y (1) )⎦ − ⎣ b(Xs(2) , ys(2) )p dm2 (y (2) )⎦ ≤ CT 0 CT ⎡ ⎤ 1/p t p ≤ ds ⎣ b(Xs(1) , ys(1) ) − b(Xs(2) , ys(2) ) dm(y (1) , y (2) )⎦ 0
CT ×CT
t ≤ c
ds 0
|Xs(1)
−
Xs(2) |
" +
|ys(1)
−
ys(2) |p
dm(y
(1)
,y
(2)
#1/p )
.
284
10. Stochastic Differential Equations and Empirical Measures
Minimizing the right-hand side with respect to all couplings m, we obtain
sup |Xs(1) s≤t
−
Xs(2) |
t t (1) (2) ≤ c ds sup |Xu − Xu | + c ds ∗p,s (m1 , m2 ). (10.1.13) 0
u≤s
0
Consequently, by Gronwall’s lemma,
sup |Xs(1) s≤t
−
Xs(2) |
t ≤ ce
∗p,s (m1 , m2 ) ds. ct
(10.1.14)
0
Finally, passing to the pth norm in the left-hand side of (10.1.14) and then to the corresponding minimal metric ∗p,t proves the lemma. 2
Theorem 10.1.3 Under the Lipschitz condition (10.1.11) and assuming T that ( b(0, ys )p dm0 (y))1/p ds < ∞, equation (10.1.1) has a unique weak 0
and strong solution in Xp (CT , m0 ). Proof: From Lemma 10.1.2 we obtain that for m ∈ Mp (CT , m0 ), Tk ∗
(Φ(m), m) (cT = c ecT ) k! p,T Tk ∗ ( (Φ(m), m0 ) + ∗p,T (m, m0 )) < ∞. ≤ ckT k! p,T
∗p,T (Φk+1 (m), Φk (m)) ≤ ckT
Consequently, (Φk (m)) is a Cauchy sequence in (CT , ∗p,T ) and thus converges to a fixed point of Φ. Let X k+1 , X k denote the couplings of Φk+1 (m), Φk (m). Then, by (10.1.12), we have that L∗p,T (X k+1, X k ) ≤ ckT
Tk ∗
(Φ(m), m), k! p,T
and therefore, we determine a unique strong solution with finite pth moment. 2
Remark 10.1.4 While the linear equation in Lemma 10.1.1 can be handled with the L1 -metric, in Lemma 10.1.2 we obtain only a contraction with respect to the minimal p -metric ∗p,T (cf. equation (10.1.12)).
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
285
Remark 10.1.5 The result of Theorem 10.1.3 can be extended to the case p = ∞ by applying the metric L∗∞,T (X, Y ) = ess sup sup |Xs − Ys |
(10.1.15)
s≤T
and the corresponding minimal metric
∗∞,T (m1 , m2 ) = inf{L∗∞,T (X, Y ); X = m1 , Y = m2 }. d
d
(10.1.16)
Then the equation
Xt
t = Bt + ess sup b(Xs , y) ds 0
(10.1.17)
us (dy)
has a unique solution in M∞ (CT , m0 ) if B is a.s. bounded, that is, if ess sups≤T |Bs | < ∞. Remark 10.1.6 Several extensions of equation (10.1.5) can be handled in a similar way, as for example t p
Xt = Bt +
b(Xs , y)
u(k) s ( dy)
1/p ds,
(10.1.18)
0
Dk (k) Xs stands for the k-fold product of us and y = where us = i=1 P k (y1 , . . . , yk ) ∈ IR . More generally, b = b(s, x, y) can be dependent upon s and the past of the process y = (yu )u≤s . In this case, us has to be replaced by u(s) := P (Xu )u≤s (the distribution of the past), and we need to assume a functional Lipschitz condition on b. In a similar way one can also investigate the d-dimensional case. Taking as a starting point Theorem 10.1.3, we next investigate the system of interacting equations in (10.1.4). The following theorem asserts that as N goes to infinity, each X i,N has a natural limit X i . Here, the (X i ) are independent copies of the solutions of the nonlinear equation (10.1.5). Theorem 10.1.7 Let b satisfy the Lipschitz condition (10.1.11) and suppose that |b(X 1s , ys )|2p us ( dys ) < ∞, a.s. Then √ sup N E 1/p sup |Xti,N − X it |p < ∞ N
t≤T
for
p ≥ 2,
(10.1.19)
286
10. Stochastic Differential Equations and Empirical Measures
and N p−1 E sup |Xti,N − X it |p = o(1)
for
t≤T
1 ≤ p ≤ 2.
Proof: For notational convenience we drop the superscript N ; then t Xti − X it = 0
⎛⎧ ⎫1/p ⎧ ⎫1/p ⎞ N ⎨ ⎬ ⎬ ⎨1 ⎜ ⎟ b(Xsi , Xsj )p − b(X is , y)p us ( dy) ⎝ ⎠ ds ⎭ ⎩N ⎩ ⎭ j=1
⎧⎡⎛ ⎤ ⎞ ⎞ 1/p ⎛ 1/p t ⎪ ⎨ 1 ⎢ 1 ⎥ ds ⎣⎝ b(Xsi , Xsj )p ⎠ − ⎝ b(X is , Xsj ⎠ ⎦ = ⎪ N N ⎩ j j 0 ⎡⎛ ⎤ ⎞ ⎞ 1/p ⎛ 1/p 1 p i j ⎠ ⎥ ⎢ 1 + ⎣⎝ b(X is , Xsj )p ⎠ − ⎝ b (X s , X s ) ⎦ N j N j
⎡⎛ ⎤⎫ ⎞ ⎞ 1/p ⎛ 1/p ⎪ ⎬ ⎢ 1 ⎥ + ⎣⎝ b(X is , X js )p ⎠ − ⎝ b(X is , y)p us ( dy)⎠ ⎦ . ⎪ N j ⎭
Set |X|T := sups≤T |Xs |. Then by the Minkowski inequality and the Lipschitz condition on b, the above equality implies
p 1/p X i − X i T,p := E X i − X i T ⎧ ⎪ T ⎪ N ⎨ 1 ≤ ds cXsi − X is p + c Xsj − X js p ⎪ N ⎪ j=1 ⎩ 0
⎫ ⎞ ⎛ ⎛ 1/p ⎞1/p ⎛ ⎞ 1/p p ⎪ ⎪ ⎬ 1 ⎟ ⎜ ⎝ i j p⎠ i p . + ⎝E b(X s , X s ) − ⎝ b(X s , y) us ( dy)⎠ ⎠ ⎪ N ⎪ j ⎭
Summing up over i and using the symmetry, we find
N X 1 − X 1 T,p =
N i=1
X i − X i T,p
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
T ≤ 2c
287
⎧ ⎪ N ⎨
Xsi − X is p ⎪ ⎩ i=1 0 ⎫ ⎛ ⎞ ⎞ 1/p ⎛ 1/p p N N 1 ⎪ ⎬ ⎝ i j p⎠ i p ⎝ ⎠ . + E b(X s , X s ) − b(X s , y) us ( dy) N ⎪ i=1 j=1 ⎭ ds
This amounts to
T X i − X i T,p ≤ 2c 0
⎧ ⎪ ⎨ ds X i − X i s,p ⎪ ⎩
⎤⎫ ⎡ ⎛ ⎞ ⎞ 1/p ⎛ 1/p p N ⎪ ⎬ 1 ⎢ ⎝ 1 ⎥ i j p⎠ i p + b(X s , X s ) − ⎝ b(X s , y) us ( dy)⎠ ⎦ , ⎣E ⎪ N N i=1 j ⎭
and consequently, by the Gronwall lemma, ⎡ ⎛ ⎞ 1/p N 1 ⎢ ⎝ 1 i i 2cT i j p⎠ X − X T,p ≤ 2c e ds b(X s , X s ) ⎣E N N i=1 j=1 0 ⎛ ⎞1/p p ⎤ ⎥ − ⎝ b(X is , y)p us ( dy)⎠ ⎦ ⎛ p ⎞ ⎛ ⎞ 1/p 1/p T 1 ⎝ 2cT 1 j p⎠ 1 p ds E b(X s , X s ) − ⎝ b(X s , y) us ( dy)⎠ . = 2c e N j 0 T
N
By the Taylor expansion and with Yj := b(X is , X js )p (conditionally on X is ) we obtain p 1/p S SN p 1 N 1/p , +a − a ≤ p a1−p E E N p N
(10.1.20)
288
10. Stochastic Differential Equations and Empirical Measures
where SN = (Yj − a), a = EYj > 0. Therefore, from the Marcinkiewicz– Zygmund inequality (cf. Chow and Teicher, (1978, p. 357)), we conclude that √
NE
1/p
p p 1/p S N 1/p 1/p SN √ + a − a ≤ const. E = O(1). N N
This yields (10.1.19) for p ≥ 2. For 1 ≤ p < 2, the claim follows from N the moment bounds of Pyke and Root (1968), giving E| NS1/p |p = o(1). Therefore,
N
p−1
p 1/p S N 1/p +a E − a = o(1). N
(10.1.21)
2 We next interpret Theorem 10.1.7 as a chaotic property of the diffusions governed by (10.1.4). Recall that by Proposition 2.2 in Sznitman (1989), a sequence (uN ) of symmetric probability measures on E (N ) is u-chaotic, u ∈ M 1 (E), if for (X1 , . . . , XN ) distributed as uN , N 1 w δX −→ u. N i=1 i
For X N :=
1 N
w
N
i=1 δX i,N
X N −→ X,
(10.1.22)
we obtain from Theorem 10.1.7 that
(10.1.23)
where X is the solution of equation (10.1.5). Therefore, with m denoting the law of X and mN denoting the law of (X 1,N , . . . , X N,N ) we obtain from (10.1.22) the following corollary.
Corollary 10.1.8 Under the assumptions of Theorems 10.1.3 and 10.1.7, the sequence (mN ) is m-chaotic.
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
289
Remark 10.1.9 For p = ∞ (see (10.1.17)) the propagation of chaos property does not hold. Also, the case 0 < p < 1 does not lead to propagation of chaos, and there does not exist a unique strong solution of t b(Xs , ys )p dm(y) ds.
X t = Bt +
(10.1.24)
0
Remark 10.1.10 (An example leading to a Burger type equation.) Consider the stochastic system ⎞1/p N 1 j i p +⎝ b(Xt , Xt ) ⎠ dt, N j=1 ⎛
dXti
=
dWti
i = 1, . . . , N,
(10.1.25)
with Lipschitz (in both arguments) interactive term b(·, ·). Then the instantaneous drift term seen by particle i is ⎞1/p N 1 = ⎝ b(Xti , Xtj )p ⎠ . N j=1 ⎛
∆i
Under the assumptions of Theorem 10.1.7, we have ⎤ 1 1/p 22 N 1 ⎦ = 0, lim E ⎣ ∆pi − bp (Xti , y)ut ( dy) N →∞ N i=1 ⎡
as well as )
*2 N 1 p p i lim E = 0. ∆i − b (Xt , y)ut ( dy) N →∞ N i=1 Similarly to the above limit nrelations we shall examinethe average behavior of the “pseudo drift” N1 i=1 Zip . Here, Zip := N 1−1 j =i φpN,a (Xti − Xtj ), and φN,a (x − y) = N ad/p φ(N a (x − y)), where φ(·) ≥ 0 is smooth, compactly supported on IRd , and φ(x) dx = 1. We consider the vector-valued case here. Note that 1 p φ (X 1 − Xtj ) N − 1 j=2 N,a t N
Z1p
:=
290
10. Stochastic Differential Equations and Empirical Measures
1 ad p a i N φ (N (Xt − Xtj )), N − 1 j=1 N
=
and consequently, EZ1p = N ad (Eφp (N a (Xt1 − Xt2 ))) ad p a 1 N Eφ (N (Xt − Xt2 )) p = φp φ
−→
N →∞
ut 2L2
φp =: ut,p (Xt ).
Consider next )
an
*2 N 1 p 1 := E (Z − ut,p (Xt )) N i=1 i ⎛ ⎡ ⎞⎤2 N p 1 1 ⎝ = E⎣ φN,a (Xti − Xtj ) − ut,p (Xt1 )⎠⎦ N i=1 N − 1 j =i
⎡
=
E⎣
1 N −1
N
⎤2
φpN,a (Xt1 − Xtj ) − ut,p (Xt1 )⎦ .
j=1
Arguing as in Sznitman (1989, p. 196), we find that
aN →
⎧ ⎪ ⎪ 0 ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ ∞
2 φp 2p 2 φ dx ut L2 φ2p
if 0 < a < d1 , if a = d1 , if a > d1 .
Therefore, only in the case of moderate interaction do we obtain Burger’s equation in the limit.
10.1.3
A Random Number of Particles
Let (W i )i∈IN be a sysetm of i.i.d. real-valued processes (as in (10.1.4)) with finite pth moments and let (Nn )n≥1 be an i.i.d. integer-valued sequence of
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
291
r.v.s independent of (W i ). Consider the following system of SDE with a random number of particles and interactions: ⎞1/p Nn 1 = dWti + ⎝ b(Xti,n , Xtj,n )p ⎠ dt, i = 1, . . . , Nn . (10.1.26) n j=1 ⎛
dXti,n
We assume that the following asymptotic stability condition holds: Nn → Y n
a.s. as n → ∞.
(10.1.27)
As in Section 10.1.2, it turns out that X i,N has a natural limit X i that is a solution of the nonlinear SDE
dXt = dBt + Y
1/p
1/p
p
b(Xt , y) ut ( dy)
.
(10.1.28)
d
Here B = W 1 , and Y is assumed to be independent of B. For m0 ∈ M 1 (CT ) let Mp (CT , m0 ), L∗p,T , ∗p,T be defined as in Section 10.1.2.
Lemma 10.1.11 Suppose that
t 0
|Bs | ds < ∞ a.s. Then for any m ∈
Mp (CT , m0 ), there exists a unique strong solution of the equation
Xt = Bt + Y 1/p
t
⎛ ⎝
0
Proof: Set (SX)t := Y 1/p
⎞1/p
b(Xs , ys )p dm(y)⎠
ds.
(10.1.29)
CT
t 0
⎛ ⎝
⎞1/p b(Xs , ys )p dm(y)⎠
ds. Then arguing
CT
in a similar fashion as in the proof of Lemma 10.1.1, we obtain the bound
sup |(SX)s − (SY )s | ≤ cY 1/p s≤t
t sup |Xu − Yu | du.
0
0≤u≤s
292
10. Stochastic Differential Equations and Empirical Measures
Defining inductively X 0 := B, X n := SX n−1 , we have tn sup |X 1 − Xs0 | n! s≤t s ⎤ ⎡ t 1/p t n t ds⎦ ≤ cn Y (n+1)/p ⎣ Bs ds + |ys |p dm(y) n!
sup |Xsn − Xsn−1 | ≤ cn Y n/p s≤t
0
< ∞.
0
2
This indeed implies the existence of a unique strong solution.
Given m ∈ Mp (CT , m0 ), let Φ(m) denote the distribution of the solution of (10.1.29). Then we have the following contraction-type property for the mapping Φ.
Lemma 10.1.12 Suppose that Ap := cY 1/p ecY T, m1 , m2 ∈ Mp (CT , m0 ),
1/p
p < ∞. Then for t ≤
t ≤ Ap ∗p,s (m1 , m2 ) ds.
∗p,t (Φ(m1 ), Φ(m2 ))
(10.1.30)
0
Proof: Let X (i) be the solution of the SDE
(i)
Xt
= Bt + Y 1/p
t
⎛ ⎝
0
⎞1/p b(Xs(i) , ys )p dmi (y)⎠
ds.
CT
Then, as in the proof of Lemma 10.1.2,
sup |Xs(1) s≤t
−
Xs(2) |
≤ cY
1/p
t 0
sup |Xu(1) u≤s
−
Xu(2) |
t + c ds p,s (m1 , m2 ). 0
By the Gronwall lemma, sups≤t |Xs1 − Xs2 | ≤ cY 1/p ecY implying that
1/p
t 0
p,s (m1 , m2 ) ds,
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
293
' ' t ' 1/p cY 1/p ' ≤ 'cY e
∗p,s (m1 , m2 ) ds. '
∗p,t (Φ(m1 ), Φ(m2 ))
p
0
2 From Lemmas 10.1.11 and 10.1.12 we conclude that (10.1.28) has a unique solution. The proof is similar to that of Theorem 10.1.3. Theorem 10.1.13 Under the assumptions of Lemmas 10.1.11 and 10.1.12, equation (10.1.28) has a unique solution, provided that B∗p,T
10.1.4
< ∞
1/p b(0, ys ) dm0 (y) ds < ∞. p
and
pth Mean Interactions in Time: A Non-Markovian Case
Suppose (Xti,N )i=1,...,N determines a system of N particles and let b(Xsi,N , ·) := (b(Xsi,N , Xsj,N ))1≤i≤N describe the interaction vector. Recall that in Section 10.1.2 we considered equation (10.1.4) with a drift of the form b(Xsi,N , ·)p corresponding to the pth norm of the interaction vector. In this section we shall study SDEs with mean interactions in time. In fact, let N 1 i,N j,N (10.1.31) b Xs , Xs Fi (s) := N j=1 be the average of the interaction vector and consider the equations ⎛
t
⎞1/p
Xti,N
= Wti + ⎝
|Fi (s)|p ds⎠
X0i,N
=
X0i ,
Xti,N
= Wti + ess sup |Fi (s)|;
X0i,N
= X0i , 1 ≤ i ≤ N,
;
(10.1.32)
0
1 ≤ i ≤ N,
for 1 ≤ p < ∞;
s≤t,λ\
for p = ∞;
(10.1.33)
294
10. Stochastic Differential Equations and Empirical Measures
t Xti,N
=
Wti
=
X0i
|Fi (s)|p ds;
+
(10.1.34)
0
X0i,N
; 1 ≤ i ≤ N,
for 0 < p < 1.
In other words, we consider SDEs with a drift resulting from the pth mean in time of the average of the interaction vector. From the definition it is clear that this model describes a system that no longer behaves as a Markovian one, since the instantaneous drift |Fi (t)|p is weighted by the mean interac⎛ t ⎞1/p−1 1⎝ |Fi (s)|p ds⎠ over the whole past of the process. From this tion p 0
point of view the propagation of chaos property seems to be not so obvious in this model. First we consider the case 1 ≤ p < ∞. The nonlinear limiting equation is given by ⎛
Xt
⎞1/p p t = Bt + ⎝ b(Xs , y)us ( dy) ds⎠ , us = P Xs . (10.1.35) 0
Here Xt , Bt , b are real-valued, Bt is a process in CT = C[0, T ], and |b(x1 , y) − b(x2 , y)| ≤ c|x1 − x2 |
for some c > 0.
(10.1.36)
Define, for m0 ∈ M 1 ( T ), Mp (CT , m0 ) := {m1 ∈ M 1 (CT ); ∗p,t (m1 , m0 ) < ∞}.
(10.1.37)
Then, for m ∈ Mp (CT , m0 ), consider the linear equation p ⎞1/p t ⎠ ⎝ = Bt + b(Xs , ys ) dm(y) ds , ⎛
Xt
0
CT
where ys is the value of y at time s.
(10.1.38)
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
295
Lemma 10.1.14 Assume that the Lipschitz condition (10.1.36) holds, and furthermore, p T b(0, ys )ms ( dy) ds < ∞, 0
CT
where ms is the distribution of Xs at time s under m. Then (a) Equation (10.1.38) has a unique strong solution X. (b) If Φ(m) is the law of X, then Φ(m) ∈ Mp (CT , m0 ); that is, Φ : Mp (CT , m0 ) → Mp (CT , m0 ). Proof: Let X ∈ Xp (CT , m0 ) and define ⎛
⎞1/p p t := Bt + ⎝ b(Xs , y)ms ( dy) ds⎠ .
(SX)t
(10.1.39)
0
Then |(SX)t − (SY )t |p ⎛⎛ ⎞1/p p t ⎜ = ⎝⎝ b(Xs , y)ms ( dy) ds⎠ 0
⎞1/p ⎞p p t ⎟ − ⎝ b(Ys , y)ms ( dy) ds⎠ ⎠ ⎛
0
⎛ t ⎞1/p p ≤ ⎝ (b(Xs , y) − b(Ys , y))ms ( dy) ds⎠ 0
(Minkowski inequality) t cp |Xs − Ys |p ds (by the Lipschitz condition (10.1.36)).
≤ 0
296
10. Stochastic Differential Equations and Empirical Measures
This implies
sup |(SX)s − (SY )s |
p
s≤t
t ≤ c sup |Xu − Yu |p ds, p
0
(10.1.40)
u≤s
t ∗p p and furthermore, L∗p p,t (SX, SY ) ≤ c Lp,s (X, Y ) ds. Define, inductively, 0
n n−1 1 0 X 0 := B, X n := SX n−1 . Then L∗p ) ≤ cpn Tn! L∗p p,t (X , X p,T (X , X ). b(Xs , ys )m( dy) is a Lipschitz function of Xs . By (10.1.36), the integral n
CT
Thus p t = E sup b(Bs , y)ms ( dy) ds t≤T
1 0 L∗p p,T (X , X )
0
p
T ≤ E
(|b(0, y)| + c|Bs |)ms ( dy)
ds
0
≤ c
p
T |b(0, y)|ms ( dy)
T ds + c E |Bs |p ds < ∞,
0
0
as by the assumptions the integrals in the right-hand side are finite. Therefore,
L∗p,T (X n , X n−1 ) ≤
n≥1
n≥1
This implies
n≥1
cn
Tn n!
1/p
L∗p,T (X 1 , X 0 ) < ∞.
L∗p,T (X n , X n−1 ) < ∞. Then
L∗1,T (X n , X n−1 ) < ∞.
n≥1
In consequence, X n converges to some process X a.s. uniformly on bounded intervals. X is a.s. continuous, and E sups≤t |Xs |p < ∞, since E sups≤t |Bs |p < ∞. This yields Φ(m) ∈ Mp (CT , m0 ). 2 In addition, suppose that b is Lipschitz in both arguments; that is, |b(x1 , y1 ) − b(x2 , y2 )| ≤ c[|x1 − x2 | + |y1 − y2 |]
(10.1.41)
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
297
for all x1 , x2 , y1 , and y2 in CT , and consider the map Φ : Mp (XT , m0 ) → Mp (CT , m0 ). Lemma 10.1.15 (Contraction of Φ with respect to ∗p,t ) Suppose that (10.1.41) and the assumption of Lemma 10.1.14 hold. Then for t < T and m1 , m2 ∈ Mp (CT , m0 ),
∗p,t (Φ(m1 ), Φ(m2 ))
≤ cp e
cp t
t
∗p p,s (m1 , m2 ) ds,
(10.1.42)
0
where cp := c 2p−1 . Proof: For i = 1, 2 and t ≤ T , set
(i)
Xt
p ⎞1/p ⎛ t (i) ⎝ ⎠ , = Bt b Xs , ys dmi (y) ds 0
CT
and let m ∈ M 1 (m1 , m2 ), the class of probabilities on CT × CT with marginals m1 and m2 . Then ⎛ p ⎞ 1/p t
sup |Xs(1) − Xs(2) |p = ⎝ b Xs(1) , ys(1) dm1 (y (1) ) ds⎠ s≤t 0 CT p ⎞ ⎛ t 1/p p − ⎝ b Xs(2) , ys(2) dm2 (y (2) ) ds⎠ 0 CT ⎤ ⎡ p t
⎣ ≤ ds b Xs(1) , ys(1) − b Xs(2) , ys(2) dm y (1) , y (2) ⎦ 0
t ≤
CT ×CT
"
#p ds c Xs(1) − Xs(2) + ys(1) − ys(2) dm y (1) , y (2) .
0
Minimizing the right-hand side over all couplings, we get p sup Xs(1) − Xs(2) s≤t
298
10. Stochastic Differential Equations and Empirical Measures
t ≤ Gc · HI 2 J p−1
=:cp
0
t p (1) (2) p−1 ds sup Xu − Xu + cG · HI 2 J ds ∗p 1,s (m1 , m2 ). u≤s
=:cp
0
Consequently, for p ≥ 1, by the Gronwall lemma and ∗1,s ≤ ∗p,s , t p (1) (2) cp t ds ∗p sup Xs − Xs ≤ cp e p,s (m1 , m2 ). s
0
This yields the desired contractive inequality
∗p p,t (Φ(m1 ), Φ(m2 ))
≤ cp e
cp t
t
∗p p,s (m1 , m2 ) ds. 0
Theorem 10.1.16 Under (10.1.41) and
2
T ( b(0, ys ) dm0 (y))p ds < ∞, 0 CT
equation (10.1.38) has a unique weak and strong solution in Xp (CT , m0 ).
Proof: From Lemma 10.1.15 we obtain that for m ∈ Mp (CT , m0 ), T k ∗p
(Φ(m), m) k! p,T T k ∗p
p,T (Φ(m), m0 ) + ∗p ≤ 2p−1 CT p,T (m, m0 ) k! < ∞.
k+1
∗p (m), Φk (m)) ≤ CT p,T (Φ
The remaining part of the proof is similar to that of Theorem 10.1.3.
2
In the next step we turn our attention to the system of interacting particles defined in (10.1.32), where ((Wti ), X0i ) are independent processes and identically distributed for all i. The following theorem asserts that as N → ∞, every X i,N has a natural limit X i . In fact, the (X i ) are indepen-
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
299
dent copies of the solution of the nonlinear equation of McKean–Vlasov type, p ⎞1/p t = Bt + ⎝ b(Xs , y)us ( dy) ds⎠ , ⎛
Xt
0
Xt=0
(10.1.43)
CT
= X0 , d
considered in Theorem 10.1.16 with B = W (1) . Let b satisfy the Lipschitz condition (10.1.36). Theorem 10.1.17 Suppose that
b(X 1s , y)p us ( dy) < ∞ a.s.
(10.1.44)
Then for any i ≥ 1, T > 0,
√
sup N N
E
sup |Xti,N t≤T
1/p −
X it |p
< ∞
for p ≥ 2
(10.1.45)
and 1/p N (1/p)−1 E sup |Xti,N − X it |p = o(1) t≤T
for 1 ≤ p < 2.
Proof: We drop further the superscript N . Then Xti − X it p ⎞ ⎛ t ⎞ 1/p ⎛ t p 1/p 1 N i j i = ⎝ b Xs , Xs ds⎠ − ⎝ b X s , y us ( dy) ds⎠ N j=1 0 0 ⎤ ⎡⎛ p ⎞ p ⎞ 1/p ⎛ t 1/p t N N 1 1 ⎥ ⎢ = ⎣⎝ b Xsi , Xsj ds⎠ − ⎝ b X is , Xsj ds⎠ ⎦ N N j=1 j=1 0 0 ⎤ ⎡⎛ p ⎞ p ⎞ 1/p ⎛ t 1/p t N N 1 1 ⎥ ⎢ b X is , Xsj ds⎠ − ⎝ b X is , X js ds⎠ ⎦ + ⎣⎝ N N j=1 j=1 0
0
300
10. Stochastic Differential Equations and Empirical Measures
⎡⎛ p ⎞ ⎞ ⎤ 1/p ⎛ t p 1/p t N 1 ⎥ ⎢ + ⎣⎝ b X is , X js ds⎠ − ⎝ b X is , y us ( dy) ds⎠ ⎦ . N j=1 0
0
Applying the Minkowski inequality and setting XT := sups≤T |Xs |, we obtain
X i − X i pT,p = EX i − X i pT p ⎡ t N 1 p−1 ⎣ i j i j b Xs , Xs − b X s , Xs E ds ≤ 4 N j=1 0 p T 1 N i j i j b X s , Xs − b X s , X s + E ds N j=1 0 p ⎤ T 1 N i j i + E ds b X s , X s − b X s , y us ( dy) ⎦ N j=1 0 ⎧ ⎡ ⎤ p T ⎨ N 1 j p−1 ≤ 4 ds cp E|Xsi − X is |p + cp E ⎣ |X − X js |⎦ ⎩ N j=1 s 0 p ⎫ ⎬ T 1 N i j i + E ds b(X s , X s ) − b(X s , y)us ( dy) . N j=1 ⎭ 0
Summing up over i and using the symmetry, we find that
N X i − X i pT,p =
N i=1
X i − X i pT,p
⎧ ⎤ ⎡ ⎛ ⎞ 1/p p T ⎪ N N ⎨ ⎥ ⎢1 ≤ 4p−1 ds cp EXsi − X is pp + cp N E ⎣ ⎝ |Xsj − X js |p ⎠ ⎦ ⎪ N ⎩ i=1 j=1 0 p ⎫ ⎬ N N 1 + cp E b(X is , X js ) − b(X is , y)us ( dy) . N j=1 ⎭ i=1
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
301
Therefore, ⎧ ⎨ i ≤ 4p−1 cp ds X i − X i ps,p + cp X i − X ps,p ⎩ 0 p ⎫ ⎬ N N 1 1 + E b(X is , X js ) − b(X is , y)us ( dy) . N i=1 N j=1 ⎭ T
X i − X i pT,p
Consequently, by the Gronwall lemma, X i − X i pT,p T ≤ Cp eCp T 0
p ⎤ N N 1 1 ds ⎣ E b(X is , X js ) − b(X is , y)us ( dy) ⎦ N i=1 N j=1 ⎡
(with Cp = 2 · 4p−1 cp ) p T N 1 b(X is , X js ) − b(X is , y)us ( dy) ≤ Cp eCp T ds E N j=1 0 " #p 1 = Cp eCp T T · E 0 √ ; N here we also used the Marcinkiewicz–Zygmund inequality (cf. Chow and Teicher (1978, p. 357)) for p ≥ 2, and the Pyke and Root (1968) inequality for 1 ≤ p < 2. 2
Corollary 10.1.18 Let m denote the law of X satisfying (10.1.43), and let mN be the law of (X 1,N , . . . , X N,N ). Then, under the assumptions of Theorems 10.1.16 and 10.1.17, mN is m-chaotic. We next study the limiting case p = ∞ (cf.(10.1.33)). In contrast to the limiting case in Section 10.1.4 of pth norm interaction, we obtain the propagation of chaos property for pth mean interaction in time under a stronger Lipschitz condition. Consider for m ∈ M 1 (CT ),
Xt
= Bt + ess sup b(Xs , y)m( dy) . s≤t CT
(10.1.46)
302
10. Stochastic Differential Equations and Empirical Measures
Here Xt , Bt , and b are real-valued, Bt is a process on CT having finite pth moment, E ess sups≤T |Bs |p < ∞, and b satisfies the Lipschitz condition for all x1 , x2 , and y ∈ CT : |b(x1 , y) − b(x2 , y)| ≤ c|x1 − x2 |,
with 0 < c < 1.
(10.1.47)
We shall use the following Lp -type metric for p ≥ 1: ∗ (X, Y ) := (E ess sup |Xs − Ys |p )1/p L p,t s≤t
in X (CT ).
(10.1.48)
Let 4 ∗p,t (m1 , m2 )
∗p,t (m1 , m2 ) = L
(10.1.49)
be the corresponding minimal metric. Consider the set of measures on M 2 (CT ): Ap (CT , m0 ) = {m1 ∈ M 1 (CT ); ∗p,T (m1 , m0 ) < ∞}, M
(10.1.50)
and let Xp (CT , m0 ) denote the corresponding class of processes. For m0 ∈ Ap (CT , m0 ), consider the linear equation M 1 (CT ) and m ∈ M
Xt
= Bt + ess sup b(Xs , ys ) dm(y) . s≤t
(10.1.51)
CT
Lemma 10.1.19 Assume that the Lipschitz condition (10.1.47) holds, and let ess sup b(0, ys )m( dy) < ∞. s≤T CT
Then (a) Equation (10.1.51) has a unique solution X. (b) If Φ(m) is the law of X, then Φ(m) ∈ Mp (CT , m0 ); that is, Ap (CT , m0 ). Ap (CT , m0 ) → M Φ:M
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
Proof: Let X ∈ Xp (CT , m0 ) and define := Bt + ess sup b(Xs , ys )m( dy) . s≤t
(SX)t
CT
Then |(SX)t − (SY )t |p p = ess sup b(Xs , ys )m( dy) − ess sup b(Ys , ys )m( dy) 0≤s≤t 0≤s
CT
≤ ess sup c |Xs − Ys |
p
0≤s≤t
by the Lipschitz condition (10.1.47). This amounts to ess sup |(SX)s − (SY )s |p ≤ cp ess sup |Xs − Ys |p , 0≤s≤t
s≤t
∗ (X, Y ). ∗p,t (SX, SY ) ≤ c L and L p,t Define, inductively, X 0 = B, X n = SX n−1 . Then ∗ (X 1 , X 0 ). ∗p,t (X n , X n−1 ) ≤ cn L L p,t Furthermore, ∗ (X 1 , X 0 ) = L p,T
p 1/p E ess sup b(Bs , ys )m( dy) s≤T
#p 1/p " ≤ E ess sup b(0, ys )m( dy) + c |Bs |m( dy) s≤T 1/p ≤ c ess sup b(0, ys )m( dy) + E ess sup |Bs |p < ∞.
s≤T
s≤t
This yields n≥1
∗p,T (X n , X n−1 ) ≤ L
n≥1
∗p (X 1 , X 0 ) < ∞. cn L p,T
303
304
10. Stochastic Differential Equations and Empirical Measures a.s.
Therefore, X n −→ X, uniformly on bounded intervals, and E ess sups≤t |Xs |p < ∞.
2
In addition, suppose that b is a Lipschitz function in both arguments; for all x1 , x2 , y1 , and y2 in CT , |b(x1 , y1 ) − b(x2 , y2 )| ≤ c[|x1 − x2 | + |y1 − y2 |],
(10.1.52)
Ap (CT , m0 ) → where we assume that 0 < c < 12 . Consider the mapping Φ : M Ap (CT , m0 ). M
Lemma 10.1.20 (Contraction of Φ with respect to the minimal metric
∗p,t ) Under (10.1.52) and the assumptions of Lemma 10.1.14, for t < T Ap (CT , m0 ), the following contraction property holds: and m1 , m2 ∈ M
∗p,t (Φ(m1 ), Φ(m2 )) ≤
c ∗
(m1 , m2 ). 1 − c p,t
(10.1.53)
Proof: For i = 1, 2, and t ≤ T , define
(i)
Xt
(i) = Bt + ess sup b(Xs , ys ) dmi (y) , 0<s
and let m ∈ M 1 (m1 , m2 ). Then E ess sup |Xs(1) − Xs(2) |p s≤t
= E ess sup b Xs(1) , ys(1) dm1 y (1) s≤t CT p
− ess sup b Xs(1) , ys(2) dm2 y (2) s≤t CT ⎤p ⎡
(1) ≤ E ess sup c ⎣Xs(1) − Xs(2) + ys − ys(2) dm y (1) , y (2) ⎦ . s≤t CT ×CT
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
305
Therefore, passing to minimal metrics on the right-hand side,
p 1/p E ess sup Xs(1) − Xs(2) s≤t
p 1/p ≤ c E ess sup Xs(1) − Xs(2) s≤t
⎡⎛ + c ⎣⎝
inf
⎞p ⎤1/p ess sup ys(1) − ys(2) dm(y1 , y2 )⎠ ⎦ ;
m∈M 1 (m1 ,m2 ) s≤t CT ×CT
that is, p 1/p (1) (2) (1 . − c) E ess sup Xs − Xs ≤ c ∗1,s (m1 , m2 ) ≤ c ∗p,s (m1 , m2 ) s≤t
Passing to the minimal metrics in the left-hand side, we obtain
∗p p,T (Φ(m1 ), Φ(m2 )) ≤
c ∗
(m1 , m2 ), 1 − c p,T 2
as desired.
Next, we conclude the existence of a unique solution of the McKean– Vlasov-type equation Xt = Bt + ess sup b(Xs , ys )us ( dys ) , s≤t
Xt=0 = X0 . (10.1.54)
Theorem 10.1.21 Under (10.1.52), and assuming ess sup b(0, ys ) dm0 (y) < ∞, s≤T CT
Ap (CT , m0 )) a unique weak and strong equation (10.1.54) has (for m ∈ M solution in Xp (CT , m0 ).
306
10. Stochastic Differential Equations and Empirical Measures
Proof: From Lemma 10.1.20 with C := conclude that
c Ap (CT , m0 ), we , and m ∈ M 1−c
k+1 (m), Φk (m)) ≤ C k ∗p
∗p p,T (Φ p,T (Φ(m), m) < ∞,
2
which implies the theorem.
Consider next a system of N interacting particles driven by the equation (10.1.33), namely
Xti,N
N 1 = Wti + ess sup b Xsi,N , Xsj,N s≤t N j=1
(10.1.55)
and X0i,N = X0i ,
1 ≤ i ≤ N.
We shall show that X i,N has a natural limit X i , where the X i are i.i.d. copies of the solution of (10.1.43). Theorem 10.1.22 Suppose that (10.1.52) holds and that the r.v. Ys,j := b(X 1s , X js ) on C[0, T ] are either (i) in the domain of normal attraction (dna) a Gaussian law, or (ii) satisfy the bounded law of the iterated logarithm (BLIL). Suppose also that Eb(X 1s , X js )2∞ < ∞. Then for any i ≥ 1, ' ' ' ' sup aN E 'Xti,N − X it '
∞
N
where in case (i) aN =
√
< ∞,
N , while aN =
(10.1.56) $ N log log N in case (ii).
Proof: Similarly to the proof of Theorem 10.1.17, we obtain from the condition 2c < 1 that for α ≥ 1, ∗α,T (X i , X i ) L ≤
α N N 1 1 1 E ess sup b(X is , X js ) − b(X is , y)us ( dy) . 1 − 2c N i=1 s≤T N j=1
If (Ys,j ) are in the orgensen (1977)), then (10.1.56) fol√ dna (cf. Hoffmann–J¨ lows with aN = N . If (Ys,j ) satisfy the BLIL, then for the corresponding
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
307
N centered sum we have SN lim E SaN ∞ ≤ lim SaNN∞ < ∞ a.s. (cf. Kuelbs (1977)), and thus (10.1.56) follows. 2
We remark that invoking Corollary 5.7 of Hoffmann–J¨ orgensen (1977), N a sufficient condition for the dna of SN = j=1 Xj is given by EX1 2bL < ∞,
(10.1.57)
where · bL is the bounded Lipschitz norm with respect to a uniform distance to a Gaussian law. Corollary 10.1.23 Suppose the assumptions of Theorems 10.1.21 and 10.1.22 hold. Let m denote the law of X, and mN stands for the law of (X 1,N , . . . , X N,N ). Then mN is m-chaotic. Remark 10.1.24 Applying a similar technique in the case 0 < p < 1, we see that there exists no unique solution of the linear equation, and furthermore, there is no propagation of chaos.
10.1.5
Minimal Mean Interactions in Time
Next, we study the analogue of equation (10.1.33) with minimal mean interaction in time:
Xti,N X0i,N
N 1 i i,N j,N = Wt + ess inf b(Xs , Xs ) , s≤t N j=1 = X0i ,
(10.1.58)
1 ≤ i ≤ N.
The corresponding Boltzmann type equation is
Xt Xt=0
= Bt + ess inf b(Xs , y)us ( dy) , s≤t
(10.1.59)
= X0 .
We obtain the following results. (The proofs are similar to those in section 10.1.4 and are therefore omitted.)
308
10. Stochastic Differential Equations and Empirical Measures
Theorem 10.1.25 Suppose that m0 ∈ M 1 (CT ) and for all x1 , x2 , y1 , and y2 in CT , |b(x1 , y1 ) − b(x2 , y2 )| ≤ c[|x1 − x2 | + |y1 − y2 |],
(10.1.60)
where 0 < c < 12 . Suppose also that ess sup b(0, ys ) dm0 (y) < ∞. s≤T
(10.1.61)
CT
Then (10.1.59) has a unique strong solution in Xp (CT , m0 ). The system (X i,N ) in (10.1.58) has a natural limiting process (X i ), where X , i ≥ 1, are i.i.d. copies of the solution X of (10.1.59). i
Theorem 10.1.26 Suppose the assumptions of Theorem 10.1.25 and Theorem 10.1.22 hold. Then for any i ≥ 1, sup aN E sup Xti,N − X i,N t < ∞. N
(10.1.62)
t≤T
Corollary 10.1.27 Under the conditions of Theorems 10.1.25 and 10.1.26, the system (10.1.58) admits the propagation of chaos.
10.1.6
Interactions with a Normalized Variation of the Neighbors: Relaxed Lipschitz Conditions
Consider the following stochastic system: t Xti,N
= Wti +
X0i,N
= X0i ,
0
⎛
⎞ N
◦ 1 ⎠ ds, ⎝ b Xsi,N, Xj,N s N j=1
(10.1.63)
1 ≤ i ≤ N.
Here ◦
Xis :=
Xsi − EXsi E|Xsi − EXsi |
(10.1.64)
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
309
is the normalized variation of particle i, and ((Wti ), X0i ) are independent identically distributed processes on CT × IR. The drift is given by the mean of the interactions with the normalized variation of all particles. We assume that b(x, 0) = 0,
for all x;
(10.1.65)
that is, the interaction is zero if the relative variation is zero. The McKean–Vlasov-type equation corresponding to (10.1.62) is given by t Xt
◦
b(Xs , y) dP Xs (y)
= Bt +
ds,
(10.1.66)
0
= X0 ,
Xt=0 d
where B = W i . Note that B in this section is not necessarily a Brownian motion. We study these equations under the following relaxed Lipschitz condition on b. Assume that b has a partial derivative b2 :=
∂b ∂y
(10.1.67)
with respect to the second argument, and consider the following Lipschitztype assumptions: For all x1 , x2 , y ∈ CT , (L1) |b2 (x1 , y) − b2 (x2 , y)| ≤ c|x1 − x2 |; or, for all x1 , x2 , y1 , and y2 , (L2) |b2 (x1 , y1 ) − b2 (x2 , y2 )| ≤ c [|x1 − x2 | + |y1 − y2 |] . (L2) allows a quadratic growth of b with respect to the second component. To obtain contraction properties in this case, we have to switch to a suitable probability metric with regularity conditions of higher order. This makes necessary an essential change in the method of the proofs given so far. ◦
Let m ∈ M1 (CT ) be the distribution of a process (ξs ), and denote by m ◦ the distribution of the normalized process ( ξs ) assuming an absolute first moment of the marginal measure ms . Define Ns
◦
:= ms − δ0 = Nsm ,
and
(10.1.68)
310
10. Stochastic Differential Equations and Empirical Measures (−1) FNs (y)
y :=
FNs (u) du.
(10.1.69)
−∞
Following the common derivates notation of a function f , f (s) , s ≥ 1, we define the s-fold integrated function by f (−s) , and thus (f (−s) )(s) = f . ◦
Note that due to (10.1.65), we can replace the integration of ms in (10.1.66) by integration of Ns . Consider then the linear equation
t Xt = Bt +
b(Xs , y) dNs (ys )
ds.
(10.1.70)
0
Integration by parts in (10.1.70) leads to the equivalent equation t Xt = Bt +
(−1) b2 (Xs , y) dFNs (y)
ds.
(10.1.71)
0
Theorem 10.1.28 Suppose that m ∈ M 1 (CT ) has a finite first moment and E sups≤T |Bs | < ∞. Furthermore, let (L1) be satisfied and suppose that T
ds
|b2 (0, y)| |FNs (y)| dy < ∞.
0
Then
T ds
◦
T ξs |b2 (0, y)| |FNs (y)| dy = E |b2 (0, t)| dt < ∞, (10.1.72)
0
0
0
(10.1.70) has a unique strong solution X, and moreover, E sups≤T |Xs | < ∞. Proof: Let t (SX)t
:= Bt + 0
⎛
⎞
ds ⎝ b(Xs , ys ) dNs (ys )⎠ IR
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
t =
Bt +
⎛
311
⎞
(−1)
ds ⎝ b2 (Xs , ys ) dFNs (ys )⎠ .
0
IR
Then by the Lipschitz condition (L1), (−1) ds (b2 (Xs , ys ) − b2 (Ys , ys )) dFNs (ys )
t |(SX)t − (SY )t | ≤ 0
IR
t ≤
ds c|Xs − Ys | |FNs (ys )| dys . 0
IR (−1)
Observe that the total variation norm of the measure FNs (dy) is 1:
(−1) Var(FNs )
=
0 ∞ ◦ ◦ |FNs (y)| dy = F ξ (y) dy + (1 − F ξ◦ (y)) dy = E| ξs | = 1. −∞
IR
s
s
0
Therefore, t |(SX)t − (SY )t | ≤ c
|Xs − Ys | ds,
(10.1.73)
0
implying L∗1,t (SX, SY ) ≤ c X n = SX n−1 . Then
t 0
L∗1,T (X n , X n−1 ) ≤ cn
L∗1,s (X, Y ) ds. Define inductively X 0 = B,
Tn ∗ L (X 1 , X 0 ). n! 1,T
(10.1.74)
Let us estimate the term on the right-hand side of (10.1.74): s ⎛ ⎞ (−1) ∗ 1 0 L1,T (X , X ) = E sup ds ⎝ b2 (Bs , ys ) dFNs (ys )⎠ s≤T 0
T ≤ E
ds
0
IR
IR
|b2 (Bs , ys )| |FNs (ys )| dys
(10.1.75)
312
10. Stochastic Differential Equations and Empirical Measures
T ≤ E
(c|Bs | + |b2 (0, ys )|)|FNs (ys )| dys
ds 0
IR
T ≤ E
T ds c|Bs | +
0
ds 0
|b2 (0, ys )| |FNs (ys )| dys < ∞.
IR
Now the equality in (10.1.72) results from the following integration by parts arguments:
|b2 (0, y)||FNs (y)| dy
IR
=
|b2 (0, y)||F ξ◦ (y) − F0 (y)| dy s
IR
0 =
|b2 (0, y)|F ξ◦
∞ (y) dy +
s
−∞
|b2 (0, y)|(1 − F ξ◦ (y)) dy s
0
◦ ⎛ ⎞ +∞ y ξs ⎝ |b2 (0, t)| dt⎠ dF ◦ (y) = E |b2 (0, t)| dt < ∞. ξ
=
−∞
s
0
0
Consequently, L∗1,T (X 1 , X 0 ) < ∞. Combining (10.1.74), (10.1.75) implies the existence and the uniqueness of a strong solution X. Moreover, L∗1,T (X, B) ≤
∞
L∗1,T (X n , X n−1 ) ≤ ecT L∗1,T (X1 , B) < ∞;
n≥1
that is, E sups≤T |Bs | < ∞ provides that E sups≤T |Xs | < ∞.
2
We next extend the result of Theorem 10.1.28 to the case where pth moments exist, p ≥ 1. Define X∗T,p = (E supt≤T |X(t)|p )1/p , 1 ≤ p < ∞, and X∗T,∞ = E ess sup0
⎛ ds ⎝
IR
⎞1/p |b2 (0, ys )FNs (ys )|p dys ⎠
< ∞,
(10.1.76)
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
313
and (ii) if p = ∞, then T
ds(ess sup |b2 (0, ys )||FNs (ys )|) < ∞ ys
0
(p = ∞).
(10.1.77)
Under these assumptions, the SDE (10.1.70) has a unique solution X, and furthermore, X∗T,p < ∞. In particular, if Φ(m) is the distribution of the solution of (10.1.70), then Φ(m) maps Mp (CT , δ0 ) into Mp (CT , δ0 ). t Proof: As in Theorem 10.1.28, we have |(SX)t − (SX)t | ≤ c |Xs − Ys | ds. Thus for any 1 < p ≤ ∞,
L∗p,T (SX, SY
t ) ≤ L∗p,T (X, Y ) ds.
0
0
Further, for 1 ≤ p < ∞ (the case p = ∞ is similar), s ⎛ p ⎞1/p (−1) L∗p,T (X, B) = (E sup ds ⎝ b2 (Bs , ys ) dFNs (ys ) ⎠ s≤T ⎛ ⎛ ≤ ⎝E ⎝
0
T
ds
0
T ≤
≤
⎞p ⎞1/p
|b2 (Bs , ys )| |FNs (ys )| dys ⎠ ⎠
IR
⎡ ⎛
⎞p ⎤1/p
ds ⎣E ⎝ |b2 (Bs , ys )| |FNs (ys )| dys ⎠ ⎦
0
T
IR
IR
⎡ ⎛
ds ⎣E ⎝
0
⎞p ⎤1/p (c|Bs | + |b2 (0, ys )|) |FNs (ys )| dys ⎠ ⎦
IR
T ≤ c
ds(E|Bs |p )1/p +
0
T 0
⎞1/p
⎛
ds ⎝ |b2 (0, ys )FNs (ys )|p dys ⎠
< ∞.
IR
2
Now we continue as in Theorem 10.1.28 to complete the proof. Denote by M2∗ (CT , δ0 ) the space of all m ∈ M2 (CT , δ0 ) such that inf E|ξs − Eξs | =: A∗T > 0,
0<s≤T
d
ξ = m.
(10.1.78)
314
10. Stochastic Differential Equations and Empirical Measures
Condition (10.1.78) postulates that the L1 -variation does not converge to 0 for 0 < s < T . In the case of B being a Brownian motion, this means that we do not start (at time s = 0) deterministically at a fixed point. Let Φ(m) be the solution of (10.1.70), t Xt = Bt +
⎛
⎞
ds ⎝ b(Xs , ys ) dms (ys )⎠
0
◦
IR
under the assumptions of Theorem 10.1.29 with p = 2. Then by Theorem 10.1.29, Φ maps M2 (CT , δ0 ) into M0 (CT , δ0 ). Theorem 10.1.30 (Contraction of Φ) Suppose that the Lipschitz condition (L2) holds, and m1 , m2 ∈ M2∗ (CT , δ0 ). Then the following contraction inequality for Φ in terms of ∗2,t is valid: t ≤ ct ∗2,u (m1 , m2 ) du.
∗2,t (Φ(m1 ), Φ(m2 ))
(10.1.79)
0
Proof: For m1 , m2 ∈ M2∗ (CT , δ0 ) let (i) Xt
t = Bt +
⎛
⎝
⎞
b Xs(i) , ys(i) dFN (i) ys(i) ⎠ ds s
0
t = Bt +
⎛
IR
⎝
0
b2 Xs(i) , ys(i) dF
(−1)
(i)
Ns
IR
⎞
ys(i) ⎠ ds.
Then (1)
Xt
(2)
− Xt
⎡ t
⎣ b2 Xs(1) , ys(1) dF (−1) = ys(1) (1) 0
Ns
IR
b2 Xs(2) , ys(2) dF
− IR
(−1)
The total variation norm of FNs
(−1) (2)
Ns
⎤
ys(2) ⎦ ds.
is 1 and the total mass is 0. Then the (−1)
− Jordan decomposition has the form FNs (dx) = µ+ s (dx) − µs (dx), where
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
315
− + − + µ+ s (IR) + µs (IR) = 1, µs (IR) − µs (IR) = 0. In other words, µs (IR) = 1 − µs (IR) = 2 .
We write F
(−1) (i)
Ns
(1) Xt
−
(i)+
(ds) = µs t
(2) Xt
=
(i)−
(dx) − µs
(dx), and consequently,
⎡
⎣ dys(1) ds b2 Xs(1) , ys(1) µ(1)+ − µ(1)− s s
0
IR
−
b2 Xs(2) , ys(2)
µ(2)+ − µ(2)− s s
⎤
dys(2) ⎦ .
IR
(1) (2) (1)+ (2)+ Let dm+ , y and µs ; that is, m+ y be a coupling for µs s s s s is a (i)+
1 2
and such that πi m+ , i = s = µs
(1) (2) − 1, 2, πi the ith component. Similarly, let dms ys , ys be a coupling positive measure with total mass (1)−
(2)−
for µs
(1)
and µs
(2)
Xt − Xt
t =
. Then
⎡
(1) (2) ds ⎣ , y b2 Xs(1) , ys(1) − b2 Xs(2) , ys(2) dm+ y s s s
0
IR
−
b2 Xs(1) , ys(1) − b2 Xs(2) , ys(2)
dm− s
⎤ . ys(1) , ys(2) ⎦
IR
Consequently, by the Lipschitz condition, (1) (2) (10.1.80) Xt − Xt ⎞ ⎛ t
− ≤ ds ⎝ b2 Xs(1) , ys(1) − b2 Xs(2) , ys(2) d m+ ys(1) , ys(2) ⎠ s + ms 0
t ≤ 0
IR2
⎞
− ds ⎝ c Xs(1) − Xs(2) + c ys(1) − ys(2) d m+ ys(1) , ys(2) ⎠ . s + ms ⎛
IR2
− + Observe that the total mass of m+ s + ms is 1, and for i = 1, 2, πi ms + (i)+ (i)− (−1) πi m− + µs is the variation of F (i) . Minimizing with respect to s = µs Ns
316
10. Stochastic Differential Equations and Empirical Measures (i)+
− all couplings m+ s + ms with marginals µs
(i)−
+ µs
, i = 1, 2, we obtain
(1) (2) Xt − Xt t c ds Xs(1) − Xs(2) + Fµ(1)+ +µ(1)− (x) − Fµ(2)+ +µ(2)− (x) dx. ≤ s s s s 0
IR
As Fµ(1)+ +µ(1)− (x) = FVar(F (−1) ) (x), we have that the integral on the rights
s
(1) Ns
hand side can be bound from above by κ2 : (x) − F (x) F dx (−1) Var F (−1) Var F (1) (2) Ns Ns IR
(−1) (−1) ≤ |x| Var Var F (1) − Var F (2) ( dx) Ns
IR
(10.1.81)
Ns
|Fµ1 (x) − Fµ2 (x)| dx ≤ |x| Var(µ1 − µ2 )( dx)
d (−1) (−1) (−1) (−1) ≤ |x| Var F (1) − F (2) ( dx) = |x| F (1) (x) − F (2) (x) dx Ns Ns Ns dx Ns IR IR ◦ = |x| FN (1) (x) − FN (2) (x) dx (as Ns := P ξs − δ0 )
using
s
IR
=
s
◦(1) ◦(2) |x| F ◦(1) (x) − F ◦(2) (x) dx =: κ2 ξs , ξs ,
IR
ξs
ξs
◦(i)
◦(i)
(i)
where ξs are r.v.s with laws P ξs = P (ξs (i) ms .
−Eξs(i) )/E |ξs(i) −Eξs(i) |
(i)
, and P ξs =
The distance κ2 has the following representation as a minimal metric:
% & d κ2 ξs(1) , ξs(2) = inf E |ηs(1) |ηs(1) − |ηs(2) |ηs2 ; ηs(i) = ξs(i) . This representation allows us to estimate κ2
◦(1)
◦(2)
ξs , ξs
(10.1.82)
(1) (2) by κ2 ξs , ξs
making use of the assumption that 2 sup E ξs(i) ≤ AT
s≤T
and
inf E ξs(i) − Eξs(i) =: A∗T > 0.
s≤T
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
317
Then κ2
◦(1)
◦(2)
ξs , ξs (10.1.83) ◦(1) ◦(2) ◦(1) ◦(2) ≤ 2 ξs , ξs E| ξs |2 + E| ξs |2 2 1 (1) (1) (2) (2) ξs − Eξs ξs − Eξs ≤ 2AT 2 , (1) (1) (2) (2) E|ξs − Eξs | E|ξs − Eξs | 2 1 (1) (1) (2) (2) ξs − Eξs ξs − Eξs ≤ 2AT 2 , (1) (1) (1) (1) E|ξs − Eξs | E|ξs − Eξs | 1 2 (2) (2) (2) (2) ξs − Eξs ξs − Eξs + 2AT 2 , (1) (1) (2) (2) E|ξs − Eξs | E|ξs − Eξs
2AT (1) (1) (2) (2) ·
− Eξ , ξ − Eξ ≤ ξ 2 s s s s (1) (1) E|ξs − Eξs |
1/2 |E|ξ (1) − Eξ (1) | − E|ξ (2) − Eξ (2) || s s s s + 2AT E|ξs(2) − Eξs(2) |2 (1) (1) (2) (2) (E|ξs − Eξs |)(E|ξs − Eξs |)
(1) (2) . ≤ cT 2 ξ2 , ξs
(1) (2) (1) (2) In the above derivation we used the fact that Eξs − Eξs ≤ 1 ξs , ξs
(1) (2) 1 ≤ 2 ξs , ξs , and ≤ A1∗ . Combining these estimates, we (i) (i) |Eξs −Eξs | T write t
(1) (2) (2) , − X ≤ c ds Xs(1) − Xs(2) + cT 2 m(1) Xt t s , ms
(10.1.84)
0
(i)
invoking the assumptions E(ξs )2 < ∞, i = 1, 2, and noticing that (i) (i) ∗ E ξs − Eξs ≥ AT > 0 uniformly on s ∈ (0, T ]. Then, by the Gronwall inequality, with c∗T = c ∨ ct , we have the uniform bound t
∗ (2) sup Xs(1) − Xs(2) ≤ c∗T eCT T ∗2,s m(1) ds. s , ms s≤t
0
318
10. Stochastic Differential Equations and Empirical Measures
By passing to minimal metrics, the above inequality implies that
∗2,t (Φ(m1 ), Φ(m2 ))
≤
c∗T
e
c∗ TT
t
∗2,s (m1 , m2 ) ds. 2
0
Theorem 10.1.31 Suppose that B∗T,2 < ∞. Suppose also that the assumption (L2) holds. Finally, assume that for some m0 ∈ M2 (CT , δ0 ), the following boundedness assumptions on the interaction term b hold:
T ds
|b2 (0, y)FNs (y)|2 dy
1/2 < ∞,
with Ns = Nsm0
(10.1.85)
0
and Φn (m0 ) ∈ M2∗ (CT , δ0 ),
∀ n ∈ IN.
(10.1.86)
Then the Boltzmann-type equation (10.1.66) has a unique weak and strong solution in M2 (CT , δ0 ). Proof: From Theorem 10.1.30,
∗2,T (Φ(k+1) (m), Φ(k) (m)) ≤ CTk
Tk ∗ ( (Φ(m), δ0 ) + ∗2,T (m, δ0 )) < ∞, k! 2,T
for m ∈ M2 (CT , δ0 ). Therefore, (Φk (m)) is a Cauchy sequence in (CT , ∗2,T ) and converges to a fixed point. If X (k+1) , X (k) are the optimal couplings of Φ(k+1) (m) and Φ(k) (m) respectively, we obtain that (X (k) ) is an L2∗,T 2 Cauchy sequence, leading to a (unique) L∗2,T fixed point X.
Remark 10.1.32 Condition (10.1.86) postulates that the solutions of the linear equations corresponding to Φm (m0 ) have strictly positive variation. A simple, sufficient condition for that to hold is inf s≤T |Bs − EBs | ≥ T M + ε, provided that b is bounded and |b| ≤ M . This condition is useful only for fixed T but not for T → ∞. However, it might be possible (at least in some examples, as in the construction of solutions of special SDEs) to construct a solution piecewise on small time intervals and to join the pieces to a solution on the whole real line. For special choices of b it is possible to obtain weaker sufficient conditions for (10.1.86). Condition (10.1.78) is needed in order
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
319
to reconstruct the process. Without this condition we only can reconstruct the normalized process (cf. (10.1.83)). We now turn our attention to equation (10.1.63). The next theorem asserts that as N → ∞ each X i,N has a limit X i . The (X i ) are independent copies of the solution of (10.1.67) considered in Theorem 10.1.31. Theorem 10.1.33 Suppose that (L2) holds, and moreover, b∞ = supx,y |b(x, y)| < ∞. Suppose also that uniformly on i, |W i |T,∞ := ess sup sup |Wsi | ≤ X < ∞. 0<s
Then for any i ≥ 1, T > 0, √ sup N E sup |Xti,N − X it | < ∞. 0
N
Corollary 10.1.34 (Propagation of Chaos) Let m denote the law of X i satisfying (10.1.25) and let WN denote the law of (X 1,N , . . . , X N,N ). Then, under the assumptions of Theorems 10.1.31 and 10.1.33, WN is m-chaotic. Proof of Theorem 10.1.33: Omitting the index N , we get t Xti
−
X it
= 0
=: where I1 (t)
:=
I2 (t)
:=
I3 (t)
:=
N ◦ ◦ 1 i j ds b(Xs , Xs ) − ds b(X s , ys )P Xs ( dy) N j=1 t
0
⎡ t ⎤ t N N ◦ ◦ 1 1 ⎣ ds b(Xsi , Xjs ) − ds b(X is , Xjs )⎦ , N j=1 N j=1 0 0 ⎡ t ⎤ t N N ◦ ◦ 1 1 ⎣ ds b(X is , Xjs ) − ds b(X is , Xjs )⎦ , N j=1 N j=1 0 0 ⎡ t ⎤ t N ◦ ◦ 1 ⎣ ds b(X is , Xjs ) − ds b(X s , ys )P Xs ( dy)⎦ , N j=1 0
and E|I1 |T
:=
CT
I1 (t) + I2 (t) + I3 (t),
E sup E|I1 (t)| 0
0
CT
320
10. Stochastic Differential Equations and Empirical Measures
T =
E 0
N 1 ◦ ◦ ds [b(Xsi , Xjs ) − b(X is , Xjs )] . N j=1
From (L2), |b(x, y) − b(x, y)| = |b(x, y) − b(x, 0) − (b(x, y) − b(x, 0))| y t = b2 (x, t) dt − b2 (x, t) dt 0
0
|y| ≤ |b2 (x, t) − b2 (x, t)| dt 0
≤ c|x − x| |y|. Therefore, E|I1 |T ≤ c E
T 0
1 N
N
◦
j=1
|Xsi − X is | |Xjs |.
Assuming that b∞ = supx,y |b(x, y)| < ∞ and |W i,N |T,∞ := ess sup sup |Wsi,N | ≤ K, 0<s≤T
then supi,N |Xti,N | ≤ K + T · b∞ . Therefore, T E|I1 |T,1
≤ C
ds E|Xsi − X is |, 0
T E|I2 |T,1
≤ 0
N ◦ 1 i ◦j i i b X s , Xs − b X s , Xs . ds N j=1
For 0 < y < y, y |b(x, y) − b(x, y)| =
|b2 (x, t)|dt
y
y ≤ c y
|b2 (x, t)
−
b2 (0, t)| dt
y +c y
|b2 (0, t) − b2 (0, 0)| dt
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
321
y+y 1 2 2 ≤ c|x| |y − y| + |y − y | ≤ c|y − y| |x| + . 2 2
In general, |b(x, y) − b(x, y)| ≤ c|y − y| |x| +
|y|+|y| 2
.
Assuming that Xsj,N are bounded a.s., |X j,N |T,∞ := ess sup sup |Xsj,N | 0<s
< ∞, we obtain I2 T,1
:= E|I2 |T T N ◦ 1 i ◦j i j b − b ≤ ds X , X X , X s s s s N j=1 0
T ≤ c
ds 0
T ≤ c 0
N ◦j ◦ ◦ ◦ 1 1 E|Xjs − Xs | × |X is | + |Xjs | + |Xjs | N j=1 2
N ◦ ◦ 1 ds E|Xjs − Xjs |, N j=1
changing the values of the absolute constants c wherever it is necessary. Using the estimates for |I1 |T and |I2 |T , we have
N X i − X i T,1 = T ≤ c
ds 0
⎧ N ⎨ ⎩
X i − X i T,1
i=1
E|X i − X i |s,1 +
i=1
N
T
+
N
i=1 0
N
E|X j − X j |s,1
j=1
⎫ ⎬ ⎭
N ◦ ◦ 1 i j Xs ds b(X s , Xs ) − b(X s , ys )P ( dy) . N j=1 CT
By the Gronwall lemma and the Pyke and Root (1968) inequality,
i
X i − X T,1
T ≤ c 0
N T N ◦ 1 1 ds ⎣ ds b(X is , Xjs ) N i=1 N j=1 ⎡
0
322
10. Stochastic Differential Equations and Empirical Measures
− CT
⎤ 1 b(X s , ys )P Xs ( dys )⎦ ≤ O( √ ). N ◦
2
10.2 Rates of Convergence of Empirical Measures in the Kantorovich Metric Let µ be a probability measure on IRd (typically unknown) and let X1 , X2 , . . . , Xn be i.i.d. r.v.s with common probability law µ. Let
µn
n 1 = δX n i=1 i
⎧ ⎨ 1 with δx (A) := ⎩ 0
if x ∈ A, if x ∈ A
be the empirical measure of X1 , X2 , . . . , Xn . Then it is well known that µn → µ a.s.
(10.2.1)
in the topology of weak convergence.(1) If σ2 =
|x|2 µ( dx) < ∞,
(10.2.2)
then by the SLLN, 1 2 X = n i=1 i n
|x|2 µn ( dx) → σ 2
(10.2.3)
a.s. and in L1 (P ). We denote by P2 = P2 (IRd ) the space of probability measures on Borel sets of) IRd having finite second moments, i.e., (the 2 such that |u| µ( du) < ∞. Recall that the L2 -Kantorovich metric (the Wasserstein metric of order 2) on P2 is
22 (µ, ν) = inf (1) See
|u − v|2 P ( du, dv); P ∈ M (µ, ν) ,
Dudley (1989) and Rachev (1991) and the references therein.
10.2 Rates of Convergence in the Kantorovich Metric
323
where M (µ, ν) denotes the set of probability measures on IRd × IRd with marginals µ and ν. (Here and below | · | denotes the usual Euclidean norm on the appropriate space.) Equivalently,
22 (µ, ν) = inf E|X − Y |2 , where the “inf” is taken over all pairs of r.v.s X, Y having laws µ, ν, respectively, in other words, over all couplings of µ and ν. From (10.2.1)–(10.2.3) it follows that 2 (µn , µ) → 0 a.s. In this section we investigate the rate of convergence to zero of E 22 (µn , µ). (2) A similar result is obtained for infinite exchangeable sequences except that the common probability law must be replaced by the directing measure. Finally a mean square uniform rate of convergence is obtained for an i.i.d. sequence of stochastic processes on a finite time interval. Theorem 10.2.1 Suppose that the unknown µ has high enough finite ab solute moments c := |u|d+5 µ( du) < ∞. Then there is a constant C, depending only on c and the dimension d, such that E 22 (µn , µ) ≤ Cn−2/(d+4) . The proof is built up on lemmas that are of some independent interest. Lemma 10.2.2 (Carlson’s lemma) Let g be a nonnegative, measurable function on IRd . Then for p > d, 3
g(x) dx ≤ Cp,d
1−d/p g 2 (x) dx
d/p . |x|p g 2 (x) dx
(10.2.4)
where 3 Cp,d =
(2) The
ωd π sin(πd/p)dd/p (p − d)1−d/p
results of this section are due to Horowitz and Karandikar (1994); see also Horowitz and Karandikar (1990). Their study presented in this section was motivated by the observation (see also Tanaka (1978)) that the Wasserstein metric is convenient for formulating weak convergence results for the empirical measures of finite interacting particle systems related to the Boltzmann equation.
324
10. Stochastic Differential Equations and Empirical Measures
and ωd is the surface area of the unit sphere in IRd . In particular, for p = d + 1 we have 3
1/(d+1)
g 2 (x) dx
g(x) dx ≤ Cd
d/(d+1) |x|d+1 g 2 (x) dx
,
from which follows 3
g(x) dx ≤ Cd
(|x|d+1 + 1)g 2 (x) dx,
(10.2.5)
where Cd is a constant depending only on d. Lemma 10.2.3 (Density coupling lemma) Let f, g be probability densities on IRd such that
|x|2 (f (x) + g(x)) dx < ∞,
and define µ( dx) = f (x) dx, ν( dx) = g(x) dx. Then(3)
22 (µ, ν) ≤ 3
|x|2 |f (x) − g(x)| dx.
Proof: Let M be a coupling of µ and ν defined by ϕ(x, y)M ( dx, dy) 1 ϕ(x, y)(f (x) − f ∧ g(x))(g(y) − f ∧ g(y)) dx dy = 1−A + ϕ(x, x)f ∧ g(x) dx, where A = Then
f ∧ g(x) dx and ϕ(x, y) is any nonnegative Borel function.
|x − y|2 M ( dx dy)
(3) Zolotarev
here.
(1978) proves this lemma with the constant 4 instead of the constant 3
10.2 Rates of Convergence in the Kantorovich Metric
325
|x|2 (f (x) − f ∧ g(x)) dx + |y|2 (g(y) − f ∧ g(y)) dy 2 x(f (x) − f ∧ g(x)) dx · y(g(y) − f ∧ g(y)) dy − 1−A = |x|2 |f (x) − g(x)| dx 2 x(f (x) − f ∧ g(x)) dx · y(g(y) − f ∧ g(y)) dy − 1−A =
(the dot · indicates the usual inner product in IRd ). Furthermore, 1/2 1/2 2 x(f (x) − f ∧ g(x)) dx ≤ |x| |f − f ∧ g| dx |f − f ∧ g| dx 1/2 = |x|2 |f − g| dx (1 − A)1/2 . Thus
|x − y|2 dM ≤ 3 |x|2 |f − g| dx, and the result follows.
2
Lemma 10.2.4 (Pollard (1986)) For any r.v.s Z1 , . . . , ZN , √ @ N max EZk2 .
E max |Zk | ≤ 1≤k≤n
The proof is obvious: E max |Zk | ≤
@ @ @ EZk2 ≤ E max Zk2 ≤ N max EZk2 .
We next write Φσ ∼ N (0, σ 2 I) to indicate that Φσ is the multivariate normal distribution on IRd with mean vector 0 and dispersion matrix σ 2 I; here σ 2 > 0, and I is the d × d identity matrix. For any probability measure µ on IRd , let µσ := Φσ ∗ µ be the convolution of Φσ and µ. Thus µσ will have density qσ := φσ ∗ µ, where φσ is the density of Φσ . Lemma 10.2.5 If µ ∈ P2 , then 22 (µσ , µ) ≤ dσ 2 . Proof: Let X and Y be independent random vectors with laws µ and Φσ , respectively. Then (X, X + Y ) is a coupling of µ and µσ , and 22 (µσ , µ) ≤ 2 E|Y |2 = dσ 2 .
326
10. Stochastic Differential Equations and Empirical Measures
Proof of Theorem 10.2.1: Let X1 , X2 , . . . be i.i.d. r.v.s with law µ ∈ P2 , and let µn be the corresponding empirical measure. The triangle inequality gives
22 (µn , µ) ≤ 2 22 (µn , µσn ) + 22 (µσn , µσ ) + 22 (µσ , µ) . Thus
22 (µn , µ) ≤ C(σ 2 + 22 (µσn , µσ )).
(10.2.6)
The constant σ 2 > 0 will be chosen later. Let g σ := φσ ∗ µ and gnσ := φσ ∗ µn be the densities of µσ and µσn , respectively; here gnσ is given by 1 φσ (x − Xi ). n i=1 n
gnσ (x) =
By Lemma 10.2.3 and inequality (10.2.5), we have
22 (µσn , µσ )
≤ 3 |x|2 |g σ (x) − gnσ (x)| dx 3
(10.2.7)
(|x|d+5 + 1)|g σ (x) − gnσ (x)|2 dx.
≤ C The above bound yields E 22 (µσn , µσ ) ≤ C
3 (|x|d+5 + 1)E|g σ (x) − gnσ (x)|2 dx .
(10.2.8)
Since gnσ (x) is the mean of n i.i.d. r.v.s, the expectation in (10.2.8) is (1/n)V (φσ (x − X)), since Eg n,σ (x) = g σ (x), where V stands for variance and X has law µ, X ∼ µ. The indicated variance is dominated by Eφ2σ (x − X), and we obtain E 22 (µσn , µσ )
C ≤ √ n
3
(|x|d+5 + 1) φ2σ (x − y)µ( dy) dx.
(10.2.9)
Now observe that φ2σ (x) = 2−d/2 (2π)−d/2 σ −d φσ/√2 (x).
(10.2.10)
10.2 Rates of Convergence in the Kantorovich Metric
327
Using this, the integral in (10.2.9) is easily seen to be dominated by (4π)−d/2 σ −d 1 + 2d+4 σ d+5 E|Z|d+5 + |y|d+5 µ( dy) = Cσ −d , where Z ∼ N (0, I) and we assume σ ≤ 1. Thus E 22 (µσn , µσ ) ≤ Cn−1/2 σ −d/2 .
(10.2.11)
Taking expectations in (10.2.6), we get E 22 (µn , µ) ≤ C(σ 2 + n1/2 σ −d/2 ).
(10.2.12)
Choose σ = n−1/(d+4) , and Theorem 10.2.1 is proved.
2
Theorem 10.2.1(4) is also valid, with a slight modification, for infinite exchangeable sequences. Let X1 , X2 , . . . be an infinite exchangeable sequence with directing measure µ (Aldous (1985)). Thus µ is now a random measure on IRd , and conditional on µ, the r.v.s Xn are i.i.d. r.v.s with law µ. Let β be the marginal distribution of Xn , so β(B) = Eµ(B). We then have the following rate of convergence result. Theorem 10.2.6 Suppose c := |u|d+5 β( du) < ∞. Then there is a constant C, depending only on c and d, such that −2
E 22 (µn , µ) ≤ cn d+4 . Proof: The proof is virtually the same as that of Theorem 10.2.1, except that the notation µ now refers to the directing measure. In (10.2.8) we take conditional expectation given µ instead of the ordinary (unconditional) expectation, and, arguing as in (10.2.9), but conditional on µ, we get 3 E( 22 (µσn , µσ )|µ)
≤ Cσ
−d/2 −1/2
n
C1 + C2
|y|d+5 µ( dy),
(4) Two results related to Theorem 10.2.1 are (i) If the law µ is the Lebesgue measure on [0, 1]d , then for d ≥ 3, E22 (µn , µ) = O(n−2/d ); see Yukich (1991). (ii) Rachev (1991c, Theorem 11.1.6) (see also Dudley (1969)) showed that under a metric entropy condition, the rate E22 (µn , µ) = O(n−1/d ) is optimal.
328
10. Stochastic Differential Equations and Empirical Measures
for some constants C, C1 , C2 . Taking expectation yields (10.2.11), and the proof is completed as before. 2 Consider an i.i.d. sequence of processes Xn (t) with sample functions in D := D([0, 1], IRd ), i.e., the space of cadlag functions (i.e., right continuous and having left limits at each point) on the unit interval, with values in IRd . Let X(t) denote a process having the same law. Set µt to be the marginal law of the process X at time t. The empirical measure at time t, based on observations X1 (t), . . . , Xn (t), is defined by 1 δX (t) . n i=1 i n
µnt =
In this case we give a bound on the mean square uniform rate of convergence, that is of convergence to zero of E sup0≤t≤1 22 (µnt , µt ) under mild assumptions. Theorem 10.2.7 Suppose that for some constants p > 2 and c < ∞, (i) E|X(t)|d+5 ≤ c for 0 ≤ t ≤ 1, (ii) E|X(s) −X(r)|p |X(s) −X(t)|p ≤ c|t − r|2 , for 0 ≤ r < s < t ≤ 1, (iii) E|X(t) −X(s)|p ≤ c|t − s|, for 0 ≤ s ≤ t ≤ 1, (iv) E|X(t) −X(s)|2 ≤ c|t − s|, for 0 ≤ s ≤ t ≤ 1. Then there is a constant C, depending only on p, c, and the dimension d, such that E sup 22 (µnt , µt ) ≤ Cn−2/(d+8) . 0≤t≤1
Proof: Let N be a positive integer, to be chosen later, and let tk = k/N, 0 ≤ k ≤ N , so the tk partition [0, 1], and let Zk :=
sup
tk ≤t≤tk+t
22 (µnt , µntk ) ∧ 22 (µnt , µntk+1 ).
Then sup 22 (µnt , µt )
0≤t≤1
(10.2.13)
10.2 Rates of Convergence in the Kantorovich Metric
)
329
*
≤ 3 max Zk + k
max 22 (µntk , µtk ) k
+ max k
sup
tk ≤t≤tk+1
22 (µtk , µt )
.
The last term on the right is easy to estimate: µtk and µt are coupled by X(tk ) and X(t), so by (iv),
22 (µtk , µt ) ≤ E|X(tk ) − X(t)|2 ≤ C|tk − t|. Thus the contribution of the last term on the right-hand side of (10.2.13) is at most C/N . Next we consider the middle term. Let σ > 0 (to be chosen later). With the notation of Theorem 10.2.1, we have (10.2.14) E max 22 (µntk , µtk ) k n,σ 2 n 2 n,σ σ 2 σ ≤ C E max 2 (µtk , µtk ) + E max 2 (µtk , µtk ) + E max 2 (utk , µtk ) k k k σ ≤ C σ 2 + E max 22 (µn,σ t k , µt k ) . k
The last inequality follows by Lemma 10.2.5. By Lemma 10.2.4, σ E max 22 (µn,σ t k , µt k ) ≤ k
√ @ σ N max E 42 (µn,σ tk , µtk ).
(10.2.15)
k
Now, as in (10.2.7), σ , µ ) ≤ C (|x|d+5 + 1)|gtn,σ (x) − gtσk (x)|2 dx,
42 (µn,σ tk tk k σ (x), gtnk (x) are the p.d.f.s of µn,σ where gtn,σ tk , µtk . Arguing as in Theorem k 4 n,σ σ 10.2.1 and noting (i), we find E 2 (µtk , µtk ) ≤ Cσ −d n−1 , and (10.2.15) yields −d/2 E max 22 (µn,σ tk ) ≤ Cσ k
$ N/n.
(10.2.16)
Putting everything together in (10.2.13) we have E
sup 22 (µnt , µt ) k
2
≤ C 1/N + σ + σ
−d/2
$ N/n + E max Zk k
(10.2.17)
330
10. Stochastic Differential Equations and Empirical Measures
2
≤ C 1/N + σ + σ
−d/2
$ √ @ 2 N/n + N max EZk . k
Finally, we analyze the term involving Zk . Define a random vector in (IRd )n by 1 Y (t) := √ (X1 (t), . . . , Xn (t)). n Then for t1 ≤ t ≤ t2 , E|Y (t) − Y (t1 )|p |Y (t2 ) − Y (t)|p 1 n 2p/2 1 n 2p/2 1 1 2 2 = E |Xi (t) − Xi (t1 )| |Xj (t2 ) − Xj (t)| n 1 n 1 1 n 21 n 2 1 1 ≤ E |Xi (t) − Xi (t1 )|p |Xj (t2 ) − Xj (t)|p n 1 n 1 =
n 1 E(|Xi (t) − Xi (t1 )|p |Xi (t2 ) − Xi (t)|p ) n2 1 1 + 2 E(|Xi (t) − Xi (t1 )|p )E(|Xj (t2 ) − Xj (t)|p ) n i =j
≤ C|t2 − t1 |2 . Here we have used the independence of Xi and Xj for i = j, and conditions (ii) and (iii) of the theorem. By Billingsley (1968; (15.26)), there is a constant K = Kp , depending only on p, such that P
sup min(|Y (t) − Y (t1 )|, |Y (t2 ) − Y (t)|) > λ
t1 ≤t≤t2
≤ CKλ−2p |t2 − t1 |2 .
Hence " E
4
sup |Y (t) − Y (t1 )| ∧ |Y (t2 ) − Y (t)|
t1 ≤t≤t2
4
#
≤ CK|t2 − t1 |2 ;
10.2 Rates of Convergence in the Kantorovich Metric
331
that is, ⎡
1
E ⎣ sup
t1 ≤t≤t2
1 1 |Xi (t) − Xi (t1 )|2 ∧ |Xi (t2 ) − Xi (t)|2 n 1 n 1 n
n
≤ CK|t2 − t1 |2 .
22 ⎤ ⎦
(10.2.18)
It is easy to check that 1 |Xi (t) − Xi (t1 )|2 , n 1 n
22 (µnt , µnt1 ) ≤
so that (10.2.18) implies (replace t1 , t2 by tk , tk+1 ) EZk2 ≤ CK/N 2 .
(10.2.19)
Putting this into (10.2.17), we have
$ √ E sup 22 (µnt , µt ) ≤ C 1/N + σ 2 + σ −d/2 N/n + 1/ N t
$ √ ≤ C σ 2 + σ −d/2 N/n + 1/ N .
(10.2.20)
The choice σ = n−1/(d+8) , and N = n4/(d+8) now gives the result.
2
Remark 10.2.8 Horowitz and Karandikar also showed that if X(t) is an IRd -valued diffusion with jumps, then under some conditions on the drift and diffusion coefficient, it satisfies moment estimates as in Theorem 10.2.7. More precisely, let X(t) be a solution of the SDE ( dt dz). dX(t) = σ(t, X(t)) dW (t) + b(t, X(t)) dt + h(t, X(t−), z)N Here, W (t) is an IRd -valued standard Wiener process; N is a Poisson random measure on [0, T ] × IRd with intensity measure Λ given by Λ( dt, dz) = λt ( dz) dt, = N − Λ, σ(t, x), b(t, x) are measurable functions on [0, T ] × IRd taking N values in d × d matrices and IRd , respectively; and h(t, x, z) is an IRd -
332
10. Stochastic Differential Equations and Empirical Measures
valued measurable function on [0, T ] × IRd × IRd . It is assumed that X(0) is independent of W , N , and that X(t) is Ft -adapted, where
Ft = σ X(0), W (s), N ([0, s] × A), s ≤ t, A Borel in IRd (see Jacod (1979)). Suppose that |σ(t, x)|2 ≤ C(1 + |x|2 ),
|b(t, x)|2 ≤ C(1 + |x|2 ),
E|X(0)|d+5 < ∞,
and for 2 ≤ q ≤ d + 5, 0 ≤ r ≤ T , hq (r, x, z)λr ( dz) ≤ C(1 + |x|q ). Under these conditions, the process X(t) satisfies the moment conditions imposed in Theorem 10.2.7. For details we refer to Horowitz and Karandikar (1994, Section 5).
10.3 Wasserstein Metric and Approximation of Stochastic Differential Equations In this section we shall study weak approximations of stochastic differential equations (SDEs) of Itˆ o type. Moreover, we shall estimate the convergence rates of the approximate solutions using the Lp -distance E · pC[t0 ,T ] ,
p ∈ [2, ∞).(5) These results can also be interpreted as convergence rates for the minimal Lp -(Wasserstein) metric, p ∈ [1, ∞), between the distributions of exact and approximate solutions. Two approximation schemes will be considered. They represent a combination of the time discretization methods of Euler and Milshtein with a chance discretization based on the invariance principle, and they work on a grid constructed to tune both discretizations. The methods investigated here are based on the evaluation of the drift and diffusion coefficients in grid points, and they combine the time discretization of the SDE—as done, for instance, by the stochastic analogue of Euler’s method—with the discretization of the stochastic input, the Wiener (5) The
results in this section are due to Gelbrich (1995).
10.3 Stochastic Differential Equations
333
process. This combination of time and chance discretization is necessary for a computer simulation of the solution of Itˆo SDEs. Another idea for discretizing such SDEs without using the Wiener process can be found in Pardoux and Talay (1985) and Talay (1988) and is based on the approach of Doss (1977) and Sussmann (1978). In fact, Doss (1977) and Sussmann (1978) use a partial and an ordinary differential equation for constructing a pathwise solution of the SDE; that is, a pathwise convergence in the supremum norm is considered; see also Milshtein (1978), Newton (1986), Wagner (1988). A broad survey over various approximations of solutions for SDEs is given in the monograph Kloeden and Platen (1992), see also Platen (1981), Maruyama (1955), Milshtein (1978), Janssen (1984), Dudley (1968), Doss (1977), Sussmann (1978), and R¨omisch and Wakolbinger (1985). Kanagawa (1986) used a method derived from the stochastic Euler method by replacing the increments of the Wiener process by other “simpler” i.i.d. r.v.s. He uses Lp -Wasserstein metrics (p ≥ 2) between the distributions of exact and approximate solutions, thus achieving convergence rates, see also Rachev (1991), Givens and Shortt (1984), and Gelbrich (1990). Gelbrich (1995) uses the same metrics but generalizes the method of Kanagawa. For that he uses as a basis the stochastic Euler method (see further the method (E1)) (proposed by Maruyama (1955)) and Milshtein’s method (M1∗ ) (proposed by Milshtein (1978)) having orders 1 and 2, respectively, with respect to the mean square of the supremum norm of the difference between exact and approximate solutions. Since these methods use values of the drift and diffusion coefficients and of the Wiener process only in grid points tk , the order 2 is optimal, as shown by Clark and Cameron (1980). The orders of these methods (see further (E1) and (M1∗ )) are proved in Platen (1981), as well as higher orders for methods using also iterated integrals of the Wiener process. Consider a stochastic differential equation of Itˆ o type written in integral form:
x(t) − x0
t t = b(x(s)) ds + σ(x(s)) dw(s) t0 t
(SE) =
b(x(s)) ds + t0
t0 q t j=1 t
0
σj (x(s)) dwj (s),
t ∈ [t0 , T ], x0 ∈ IRd ,
334
10. Stochastic Differential Equations and Empirical Measures
where w = (w1 , . . . , wq )T is a q-dimensional standard Wiener process, b ∈ C(IRd ; IRd ), and σ ∈ C(IRd ; L(IRq ; IRd )), and where σj ∈ C(IRd ; IRd ), j = 1, . . . , q, denote the columns of the matrix function σ = (σ1 , . . . , σq ). In the sequel we denote by C and C i spaces of continuous and i times differentiable functions, respectively, and by L spaces of linear mappings. By · we shall denote the Euclidean norm on IRn (n ∈ IN) and the corresponding induced norm on a space L. For any random variable (r.v.) ζ mapping a probability space (Ω, A, P ) into a separable metric space (X, d) with the Borel σ-algebra B(X), L(ζ) denotes the distribution P ◦ ζ −1 induced on X by ζ. P(X) is the set of all Borel probability measures on X. The case that b and σ explicitly depend on the time t can be written in the form (SE) by taking t as another component of x. A direct treatment of this case follows the same lines as in this section and is carried out—for equidistant grids and bounded b and σ—in Gelbrich (1989). It allows us to relax the eventually required second-order t-differentiability to first-order t-differentiability.
p on the For p ∈ [1, ∞), recall the minimal Lp -metric set Mp (X) := µ ∈ P(X); (d(x, θ))p dµ(x) < ∞, θ ∈ X : X
⎡
p (µ, ν) := ⎣inf
⎤1/p (d(x, y))p dη(x, y)⎦
,
µ, ν ∈ Mp (X);
X×X
the infimum is taken over all measures η ∈ P(X × X) with marginal distributions µ and ν. Theorem 10.3.1 (Kanagawa (1986)) Let {ζ k , k = 1, . . . , N } ∈ IRq be a set of bounded i.i.d. q-dimensional r.v.s with mean value 0 and covariance matrix Iq (unit matrix), with finite (2+δ)th absolute moments for some δ ∈ (0, 1], and with a quadratically integrable density. If b and σ are Lipschitz continuous, then the method ⎧ ⎪ y4N (0) = x0 , ⎪ ⎪ ⎪ ⎪ k k ⎪ ζ j 1 ⎪ ⎪y4N ( k ) = x0 + √ σ j−1 , y4N j−1 + b j−1 , y4N j−1 ⎪ N N N N N N, ⎪ N ⎨ j=1 j=1 (K)
⎪ ⎪ ⎪ ⎪ ⎪ ⎪ y4N (t) ⎪ ⎪ ⎪ ⎪ ⎩
k = 1, . . . , N, k−1 k − y4N k−1 = y4N N + (N · t − k + 1) y4N N N k for t ∈ k−1 N , N , k = 1, . . . , N,
10.3 Stochastic Differential Equations
converges for any ε >
1 2
335
and every p ∈ [2, 2 + δ) at the rate
p (D(4 yN ), D(x)) = O(N −δ/2(2+δ) (ln N )ε )
for
N → ∞.
In the sequel we shall use the following assumptions concerning (SE): (AS1)
There exists a constant M > 0 such that for all j = 1, . . . , q and x ∈ IRd , b(x) ≤ M (1 + x)
(AS2)
and
σj (x) ≤ M .
There exists a constant L > 0 such that, for all j = 1, . . . , q and x, y ∈ IRd , b(x) − b(y) ≤ Lx − y
(AS3)
and
σj (x) − σj (y) ≤ Lx − y.
b, σj ∈ C 2 (IRd ; IRd ), j = 1, . . . , q, and there exists a constant B > 0 such that for all j = 1, . . . , q and x, y ∈ IRd , b (x) − b (y) ≤ Bx − y and σj (x) − σj (y) ≤ Bx − y.
(AS2∗ )
b, σj ∈ C 2 (IRd ; IRd ), j = 1, . . . , q, and there exists a constant L > 0 such that for all j = 1, . . . , q and x ∈ IRd , b (x) ≤ L
(AS3∗ )
and
σj (x) ≤ L.
There exists a constant B > 0 such that for all j = 1, . . . , q and x ∈ IRd , sup{ b (x)[h, k]; h, k ∈ IRd , h ≤ 1, k ≤ 1} ≤ B and sup{σj (x)[h, k]; h, k ∈ IRd , h ≤ 1, k ≤ 1} ≤ B.
(AS4)
σi σj = σj σi for all i, j = 1, . . . , q.
The construction of the approximate solutions in Theorem 10.3.1 will be generalized by considering—instead of one equidistant grid for both time and chance discretization—a not necessarily equidistant coarse grid for the time discretization and a fine grid. The fine grid will be a refinement of the coarse grid and will be needed for the chance discretization via
336
10. Stochastic Differential Equations and Empirical Measures
the invariance principle, which yields a lower convergence speed than the time discretization. To this end, we consider a grid class G(m, Λ, α, β). Here let m : (0, T − t0 ] → [1, ∞) be a monotonically decreasing function and let Λ, α, β > 0 be constants. Then each element G of G(m, Λ, α, β) is constructed in the following way and has the following properties: G consists of two kinds of grid points: • the time discretization points tk , k = 0 . . . , n, with t0 < t1 < · · · < tn = T and • the chance discretization points uki , i = 0, . . . , mk ; k = 0, . . . , n − 1, with tk = uk0 < uk1 < · · · < ukmk = tk+1 ; k = 0, . . . , n − 1. Hence, G is a combination of a coarse subgrid consisting of all points tk relevant for the pure time discretization and of a fine grid consisting of all points uki needed for the discretization of the Wiener process. Denote by hk := tk+1 − tk ; k = 0, . . . , n − 1,
and h :=
max
0≤k≤n−1
hk
the step sizes and the maximal step size of the coarse subgrid. Now, G is required to satisfy the following assumptions: (G1)
h · n ≤ Λ and n ∈ IN, h ≤ 1.
(G2)
1 ≤ mk ≤ m(h)α and mk ∈ IN for all k = 0, . . . , n − 1.
(G3)
uki − uki−1 =
hk mk
h ≤ β m(h) for all k = 0, . . . , n − 1; i = 1, . . . , mk .
Here (G1) restricts the number of intervals of the coarse subgrid with given h, which is bounded by 1 only for convenience (in order to write simpler upper bounds later). (G2) and (G3) say that each interval of the coarse subgrid is subdivided in an equidistant way by the points uki , both the number of the subdivisions and the step size of the full grid being bounded by functions of h. As an example, it is easy to see that all equidistant grids that also have an equidistant coarse subgrid and satisfy mk = [m(h)], k = 0, . . . , n − 1, belong to G(m, T − t0 , 1, 2). For a grid G of G(m, Λ, α, β) we define [t]G := tk and iG (t) := k,
if t ∈ [tk , tk+1 ); k = 0, . . . , n − 1;
[t]∗G
i = 1, . . . , mk ; k = 0, . . . , n − 1.
:=
uki
if t ∈
[uki , uki+1 );
10.3 Stochastic Differential Equations
337
We construct the approximate solutions in (E3) and (M3) in three steps. The first step is a pure time discretization using the stochastic Euler method (E1) and the method (M1) corresponding to Milshtein's method (M1*). Here only the coarse subgrid is involved. We define these two methods as follows:

(E1)
$$y^E(t) = x_0 + \int_{t_0}^t b(y^E([s]_G))\, ds + \sum_{j=1}^q \int_{t_0}^t \sigma_j(y^E([s]_G))\, dw_j(s) \qquad \text{for all } t \in [t_0, T],$$

and

(M1)
$$\begin{aligned}
y^M(t) ={}& x_0 + \int_{t_0}^t b(y^M([s]_G))\, ds + \sum_{k=1}^q \int_{t_0}^t \sigma_k(y^M([s]_G))\, dw_k(s) \\
&+ \sum_{j,k=1}^q \int_{t_0}^t \int_{[s]_G}^s (\sigma_k' \sigma_j)(y^M([s]_G))\, dw_j(u)\, dw_k(s) \qquad \text{for all } t \in [t_0, T].
\end{aligned}$$

If (AS4) holds and $\bar{b} := b - \frac{1}{2} \sum_{j=1}^q \sigma_j' \sigma_j$, then (M1) is equivalent to the following method (M1*) proposed by Milshtein (1978). This equivalence is an immediate consequence of Itô's formula.

(M1*)
$$\begin{aligned}
y^M(t) ={}& x_0 + \Bigl[ \sum_{r=0}^{i_G(t)-1} h_r \bar{b}(y^M(t_r)) + \bar{b}(y^M([t]_G))(t - [t]_G) \Bigr] \\
&+ \sum_{j=1}^q \Bigl[ \sum_{r=0}^{i_G(t)-1} \sigma_j(y^M(t_r)) (w_j(t_{r+1}) - w_j(t_r)) + \sigma_j(y^M([t]_G)) (w_j(t) - w_j([t]_G)) \Bigr] \\
&+ \frac{1}{2} \sum_{j,g=1}^q \Bigl[ \sum_{r=0}^{i_G(t)-1} (\sigma_j' \sigma_g)(y^M(t_r)) (w_j(t_{r+1}) - w_j(t_r)) (w_g(t_{r+1}) - w_g(t_r)) \\
&\qquad\qquad\quad + (\sigma_j' \sigma_g)(y^M([t]_G)) (w_j(t) - w_j([t]_G)) (w_g(t) - w_g([t]_G)) \Bigr] \qquad \text{for all } t \in [t_0, T].
\end{aligned}$$
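For orientation, here is a minimal sketch (ours, for the scalar case $d = q = 1$) of the iterations behind (E1) and (M1*) at the coarse grid points; the function `dsigma` plays the role of $\sigma'$, so `dsigma(x) * sigma(x)` is the one-dimensional version of $\sigma_j' \sigma_g$:

```python
import numpy as np

def euler_milstein_coarse(x0, b, sigma, dsigma, t, rng):
    """Values of the Euler method (E1) and of Milshtein's method (M1*) at the
    coarse grid points t_0 < ... < t_n, for d = q = 1 (a sketch; names ours)."""
    bbar = lambda x: b(x) - 0.5 * dsigma(x) * sigma(x)   # bbar = b - sigma'sigma/2
    yE = np.empty(len(t)); yM = np.empty(len(t))
    yE[0] = yM[0] = x0
    for k in range(len(t) - 1):
        hk = t[k + 1] - t[k]
        dw = rng.normal(0.0, np.sqrt(hk))                # w(t_{k+1}) - w(t_k)
        yE[k + 1] = yE[k] + b(yE[k]) * hk + sigma(yE[k]) * dw
        yM[k + 1] = (yM[k] + bbar(yM[k]) * hk + sigma(yM[k]) * dw
                     + 0.5 * dsigma(yM[k]) * sigma(yM[k]) * dw ** 2)
    return yE, yM

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 11)
yE, yM = euler_milstein_coarse(1.0, lambda x: -x, lambda x: 0.5 * x,
                               lambda x: 0.5, t, rng)
```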
In the second step, a continuous and piecewise linear interpolation of the trajectories in (E1) and (M1) between the points of the whole fine grid yields the methods (E2) and (M2), respectively:

(E2) $\tilde{y}^E$ is continuous and linear on the intervals $[u_{i-1}^k, u_i^k]$, $i = 1, \ldots, m_k$; $k = 0, \ldots, n-1$, with $\tilde{y}^E(u_i^k) = y^E(u_i^k)$, $i = 0, \ldots, m_k$; $k = 0, \ldots, n-1$.

(M2) $\tilde{y}^M$ is continuous and linear on the intervals $[u_{i-1}^k, u_i^k]$, $i = 1, \ldots, m_k$; $k = 0, \ldots, n-1$, with $\tilde{y}^M(u_i^k) = y^M(u_i^k)$, $i = 0, \ldots, m_k$; $k = 0, \ldots, n-1$.
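The interpolation itself is elementary: given the values at all fine grid points, (E2) and (M2) evaluate the polygonal interpolant, e.g. (a sketch; names ours):

```python
import numpy as np
# u: all fine grid points u^k_i in increasing order; y: the corresponding
# values y^E(u^k_i) (or y^M(u^k_i)); evaluate the piecewise linear path at tt.
y_tilde = lambda tt, u, y: np.interp(tt, u, y)
```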
In the third step, the Wiener process increments over the fine grid are replaced by other i.i.d. r.v.s. Let $\mu \in \mathcal{P}(\mathbb{R})$ be a measure with mean value $0$ and variance $1$, and let
$$\{\xi_{js}^k : j = 1, \ldots, q; \; s = 1, \ldots, m_k; \; k = 0, \ldots, n-1\}$$
be a family of i.i.d. r.v.s with distribution $\mathcal{D}(\xi_{11}^0) = \mu$. Then we define the methods (E3) and (M3), yielding continuous trajectories that are linear between neighboring grid points:

(E3)
$$\begin{aligned}
z^E(u_0^0) ={}& x_0; \\
z^E(u_i^k) ={}& x_0 + \sum_{r=0}^{k-1} h_r b(z^E(t_r)) + h_k \frac{i}{m_k}\, b(z^E(t_k)) \\
&+ \sum_{j=1}^q \Bigl[ \sum_{r=0}^{k-1} \sqrt{\frac{h_r}{m_r}}\, \sigma_j(z^E(t_r)) \sum_{s=1}^{m_r} \xi_{js}^r + \sqrt{\frac{h_k}{m_k}}\, \sigma_j(z^E(t_k)) \sum_{s=1}^{i} \xi_{js}^k \Bigr]
\end{aligned}$$
for all $i = 1, \ldots, m_k$; $k = 0, \ldots, n-1$;

and

(M3)
$$\begin{aligned}
z^M(u_0^0) ={}& x_0, \text{ and for } \bar{b} := b - \frac{1}{2} \sum_{j=1}^q \sigma_j' \sigma_j: \\
z^M(u_i^k) ={}& x_0 + \sum_{r=0}^{k-1} h_r \bar{b}(z^M(t_r)) + h_k \frac{i}{m_k}\, \bar{b}(z^M(t_k)) \\
&+ \sum_{j=1}^q \Bigl[ \sum_{r=0}^{k-1} \sqrt{\frac{h_r}{m_r}}\, \sigma_j(z^M(t_r)) \sum_{s=1}^{m_r} \xi_{js}^r + \sqrt{\frac{h_k}{m_k}}\, \sigma_j(z^M(t_k)) \sum_{s=1}^{i} \xi_{js}^k \Bigr] \\
&+ \frac{1}{2} \sum_{j,g=1}^q \Bigl[ \sum_{r=0}^{k-1} \frac{h_r}{m_r} (\sigma_j' \sigma_g)(z^M(t_r)) \Bigl( \sum_{s=1}^{m_r} \xi_{js}^r \Bigr) \Bigl( \sum_{s=1}^{m_r} \xi_{gs}^r \Bigr) \\
&\qquad\qquad\quad + \frac{h_k}{m_k} (\sigma_j' \sigma_g)(z^M(t_k)) \Bigl( \sum_{s=1}^{i} \xi_{js}^k \Bigr) \Bigl( \sum_{s=1}^{i} \xi_{gs}^k \Bigr) \Bigr]
\end{aligned}$$
for all $i = 1, \ldots, m_k$; $k = 0, \ldots, n-1$.

For this last step, the Wiener process $w$ and the r.v.s $\xi_{ji}^k$ will have to be defined anew on a common probability space. In the following we investigate the convergence rates in terms of the norm $E \sup_{t_0 \le t \le T} \|\cdot\|^p$ for $C([t_0, T]; \mathbb{R}^d)$-valued r.v.s in each of the three steps.
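A sketch of the (E3) recursion for $d = q = 1$ (names ours): within the $k$th coarse interval the coefficients are frozen at $z^E(t_k)$, and each fine step adds $\frac{h_k}{m_k} b(z^E(t_k)) + \sqrt{h_k/m_k}\, \sigma(z^E(t_k))\, \xi_i^k$. Here we take for $\mu$ the symmetric Bernoulli law, which has mean $0$, variance $1$, and finite exponential moments (as required later in (10.3.8)):

```python
import numpy as np

def euler_chance_discrete(x0, b, sigma, t, mks, rng):
    """(E3)-type values at all fine grid points, d = q = 1: Euler steps on the
    coarse grid with the Wiener increments replaced by normalized sums of
    i.i.d. variables xi, E xi = 0, E xi^2 = 1 (a sketch; names ours)."""
    z = [x0]
    for k in range(len(t) - 1):
        hk, mk = t[k + 1] - t[k], mks[k]
        z_tk = z[-1]                              # coefficients frozen at z(t_k)
        xi = rng.choice([-1.0, 1.0], size=mk)     # xi^k_1, ..., xi^k_{mk} ~ mu
        for i in range(mk):
            z.append(z[-1] + (hk / mk) * b(z_tk)
                     + np.sqrt(hk / mk) * sigma(z_tk) * xi[i])
    return np.array(z)    # z^E at u^0_0, u^0_1, ..., u^{n-1}_{m_{n-1}} = T

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 11)                     # coarse grid with h = 0.1
z = euler_chance_discrete(1.0, lambda x: -x, lambda x: 0.5 * x,
                          t, [10] * 10, rng)      # m_k = 10, i.e. m(h) ~ 1/h
```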
For convenience we shall denote by $K$ any constant depending only on $p$, the considered grid class, and the data of the original SDE (SE). This means that $K$ does not depend on the particular grid. Moreover, $K$ may have different values at different occurrences. The theorems in the sequel will be formulated for an arbitrary fixed grid $G$ of the grid class $\mathcal{G}(m, \Lambda, \alpha, \beta)$; therefore, $G$ fulfills (G1)–(G3) with the construction above. (For the proofs of the results in this section we refer to Gelbrich (1995).)

We start with some preliminary results. The first one provides the multidimensional Hölder inequality.

Lemma 10.3.2 (Hölder's inequality)
(a) Let $p \in [1, \infty)$, $s < t$, and let $g : [s, t] \to \mathbb{R}^d$, $g(u) = (g_1(u), \ldots, g_d(u))^T$ ($u \in [s, t]$), be a Borel measurable function such that $|g_i|^p$ is Lebesgue integrable over $[s, t]$ for $i = 1, \ldots, d$. Then
$$\Bigl\| \int_s^t g(u)\, du \Bigr\|^p \le (t - s)^{p-1} \int_s^t \|g(u)\|^p\, du.$$
(b) Let $p \in [1, \infty)$ and $a_i \in \mathbb{R}^d$ for all $i = 1, \ldots, r$. Then
$$\Bigl\| \sum_{i=1}^r a_i \Bigr\|^p \le r^{p-1} \sum_{i=1}^r \|a_i\|^p.$$
Lemma 10.3.3 (The multidimensional martingale inequalities) Let $p \in [2, \infty)$. Then there exist constants $C_p, A_p > 0$ such that the following assertions hold:
(a) Let $(w(t), \mathcal{F}(t))_{t \in [\alpha, \beta]}$ be a one-dimensional standard Wiener process over the probability space $(\Omega, \mathcal{A}, P)$. Then for every function $g = (g_1, \ldots, g_d) : [\alpha, \beta] \times \Omega \to \mathbb{R}^d$ with the properties

(i) $g(\cdot, \omega)$ is square-integrable over $[\alpha, \beta]$ for almost all $\omega \in \Omega$,

(ii) $g(u) = g(u, \cdot)$ is $\mathcal{F}(u)$-measurable for all $u \in [\alpha, \beta]$,

we have
$$E \sup_{\alpha \le s \le t} \Bigl\| \int_\alpha^s g(u)\, dw(u) \Bigr\|^p \le d^{p/2 - 1} C_p\, E \Bigl( \int_\alpha^t \|g(u)\|^2\, du \Bigr)^{p/2}$$
for all $t \in [\alpha, \beta]$.
(b) Let $(M_s, \mathcal{F}_s)_{s = 0, \ldots, r}$ be an $\mathbb{R}^d$-valued martingale (i.e., each component is a martingale), and let $p \in [2, \infty)$. Then with $\Delta M_s := M_s - M_{s-1}$ we have
$$E \sup_{0 \le s \le r} \|M_s\|^p \le d^{p/2 - 1} A_p\, E \Bigl( \sum_{s=1}^r \|\Delta M_s\|^2 \Bigr)^{p/2}.$$

Corollary 10.3.4 Let $p \in [2, \infty)$. Then there exist constants $C_p, A_p > 0$ such that
(a) under the assumptions of Lemma 10.3.3(a), for all $t \in [\alpha, \beta]$,
$$E \sup_{\alpha \le s \le t} \Bigl\| \int_\alpha^s g(u)\, dw(u) \Bigr\|^p \le [d(\beta - \alpha)]^{p/2 - 1} C_p \int_\alpha^t E \|g(u)\|^p\, du;$$
(b) under the assumptions of Lemma 10.3.3(b),
$$E \max_{0 \le s \le r} \|M_s\|^p \le A_p (dr)^{p/2 - 1} E \sum_{s=1}^r \|\Delta M_s\|^p.$$
Lemma 10.3.5 (Gronwall's lemma)
(a) Let $f : [t_0, T] \to [0, \infty)$ be a continuous function and $c_1, c_2$ positive constants. If $f(t) \le c_1 + c_2 \int_{t_0}^t f(s)\, ds$ for all $t \in [t_0, T]$, then
$$\sup_{t_0 \le t \le T} f(t) \le c_1 e^{c_2 (T - t_0)}.$$
(b) Let $a_0, \ldots, a_n$ and $c_1, c_2$ be nonnegative real numbers. If $a_k \le c_1 + c_2 \frac{1}{n} \sum_{i=0}^{k-1} a_i$ for all $k = 0, \ldots, n$, then
$$\max_{0 \le i \le n} a_i \le c_1 e^{c_2}.$$

Based on Lemmas 10.3.2, 10.3.3, and 10.3.5 one gets the following convergence results for the time discretization step.

Theorem 10.3.6 Let $p \in [2, \infty)$. Then,
(a) (AS1) and (AS2) imply
$$E \sup_{t_0 \le t \le T} \|x(t) - y^E(t)\|^p \le K \cdot h^{p/2};$$
(b) (AS1), (AS2), and (AS3) imply
$$E \sup_{t_0 \le t \le T} \|x(t) - y^M(t)\|^p \le K \cdot h^p.$$
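The rates in Theorem 10.3.6 are easy to observe empirically. The following crude Monte Carlo sketch (ours; the linear test equation and all constants are assumptions for illustration only) compares the Euler iteration with the exactly simulable solution of $dx = a x\, dt + c x\, dw$ and prints an estimate of $\bigl( E \max_k |x(t_k) - y^E(t_k)|^p \bigr)^{1/p}$, which should decrease roughly like $h^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
a, c, x0, T, p, reps = 1.0, 0.8, 1.0, 1.0, 2, 2000

for n in (16, 32, 64, 128):
    h, err = T / n, 0.0
    for _ in range(reps):
        dw = rng.normal(0.0, np.sqrt(h), size=n)
        w = np.concatenate(([0.0], np.cumsum(dw)))
        t = np.linspace(0.0, T, n + 1)
        x = x0 * np.exp((a - 0.5 * c ** 2) * t + c * w)   # exact solution
        y = np.empty(n + 1); y[0] = x0                    # Euler (E1) at t_k
        for k in range(n):
            y[k + 1] = y[k] + a * y[k] * h + c * y[k] * dw[k]
        err += np.max(np.abs(x - y)) ** p
    print(n, (err / reps) ** (1 / p))   # ratio of consecutive rows ~ 1/sqrt(2)
```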
Next, the solutions in (E1) and (M1*), which behave like the Wiener process between two neighboring points $t_{k-1}$ and $t_k$ of the coarse subgrid of $G$, will be smoothed by linear interpolation with vertices at all grid points of $G$, that is, at all $u_i^k$. This is the content of Theorem 10.3.10. For its proof we need the following three lemmas.

Lemma 10.3.7 Let $v_i$, $i = 1, \ldots, r$, be i.i.d. standard normally distributed real-valued r.v.s. Then for all $p \in [0, \infty)$,
$$E \max_{1 \le i \le r} |v_i|^p \le K (1 + \ln r)^{p/2}.$$

Lemma 10.3.8 Let $(w(t))_{t \in [\tau_0, \infty)}$ be a one-dimensional standard Wiener process and $x$ a standard normally distributed random variable. Then for $\tau_0 \le a < a' < \infty$, the random variables $\sqrt{\frac{1}{a' - a}} \sup_{a \le t \le a'} (w(t) - w(a))$ and $|x|$ have the same distribution.

Lemma 10.3.9 Let $a_0 < a_1 < \cdots < a_r$ be a partition of $[a_0, a_r]$ with maximal step size $\Delta := \max_{0 \le i \le r-1} (a_{i+1} - a_i)$, and let $(w(t))_{t \in [a_0, a_r]}$ be a one-dimensional standard Wiener process. Then
$$E \max_{0 \le i \le r-1} \sup_{a_i \le t \le a_{i+1}} |w(t) - w(a_i)|^p \le K \cdot \Delta^{p/2} (1 + \ln r)^{p/2}.$$
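Lemma 10.3.9 can also be checked by simulation. The sketch below (ours; the inner substeps only approximate each supremum) compares the simulated moment with the shape $\Delta^{p/2} (1 + \ln r)^{p/2}$ of the bound for an equidistant partition of $[0, 1]$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, fine, reps = 2, 32, 100               # 'fine' substeps approximate each sup
for r in (10, 100, 1000):
    delta, vals = 1.0 / r, []            # equidistant partition of [0, 1]
    for _ in range(reps):
        dw = rng.normal(0.0, np.sqrt(delta / fine), size=r * fine)
        w = np.concatenate(([0.0], np.cumsum(dw)))
        blocks = w[:-1].reshape(r, fine)             # path on each [a_i, a_{i+1})
        osc = np.abs(blocks - blocks[:, :1]).max()   # ~ max_i sup |w(t) - w(a_i)|
        vals.append(osc ** p)
    print(r, np.mean(vals), (delta * (1 + np.log(r))) ** (p / 2))
```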
Now, upper bounds for the $L_p$-norm of the differences between the approximate solutions in (E1) and (E2), and in (M1*) and (M2), respectively, can be obtained:

Theorem 10.3.10 Let $p \in [2, \infty)$. Then
(a) (AS1) and (AS2) imply
$$E \sup_{t_0 \le t \le T} \|y^E(t) - \tilde{y}^E(t)\|^p \le K \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} \Bigl( 1 + \ln \frac{m(h)}{h} \Bigr)^{p/2};$$
(b) (AS1)–(AS4) imply
$$E \sup_{t_0 \le t \le T} \|y^M(t) - \tilde{y}^M(t)\|^p \le K \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} \Bigl( 1 + \ln \frac{m(h)}{h} \Bigr)^{p/2}.$$

Proof: We include only this proof, which is typical for the methods used in Gelbrich (1995).

(a) First, consider the process $\bar{y}^E$ with $\bar{y}^E(t_0) = x_0$, $\bar{y}^E(u_i^k) = y^E(u_i^k)$, and $\bar{y}^E(t) = y^E(u_{i-1}^k)$ for $t \in [u_{i-1}^k, u_i^k)$ ($k = 0, \ldots, n-1$; $i = 1, \ldots, m_k$). Then, with Lemma 10.3.2(b), (AS1), Lemma 10.3.9, (G2), and (G3), we have
$$\begin{aligned}
E \sup_{t_0 \le t \le T} &\|y^E(t) - \bar{y}^E(t)\|^p \\
&\le K \Biggl\{ E \sup_{t_0 \le t \le T} \Bigl\| \int_{[t]_G^*}^t b(y^E([s]_G))\, ds \Bigr\|^p + \sum_{j=1}^q E \sup_{t_0 \le t \le T} \Bigl\| \int_{[t]_G^*}^t \sigma_j(y^E([s]_G))\, dw_j(s) \Bigr\|^p \Biggr\} \\
&\le K \Biggl\{ \Bigl( \max_{0 \le k \le n-1} \frac{h_k}{m_k} \Bigr)^p \Bigl( 1 + E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \Bigr) \\
&\qquad\quad + \Bigl( 1 + E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \Bigr) \Bigl( \max_{0 \le k \le n-1} \frac{h_k}{m_k} \Bigr)^{p/2} \bigl( 1 + \ln (n \cdot m(h)^{\alpha}) \bigr)^{p/2} \Biggr\} \\
&\le K \Bigl( 1 + E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \Bigr) \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} (1 + \ln n + \ln m(h))^{p/2}.
\end{aligned} \tag{10.3.1}$$

Since we have by Minkowski's inequality that
$$\Bigl( E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \Bigr)^{1/p} \le \Bigl( E \sup_{t_0 \le t \le T} \|x(t) - y^E(t)\|^p \Bigr)^{1/p} + \Bigl( E \sup_{t_0 \le t \le T} \|x(t)\|^p \Bigr)^{1/p},$$
where the right-hand side is bounded because of Theorem 10.3.6, it holds that
$$E \sup_{t_0 \le t \le T} \|y^E(t)\|^p \le K. \tag{10.3.2}$$

Hence, by (10.3.1) and (G1),
$$E \sup_{t_0 \le t \le T} \|y^E(t) - \bar{y}^E(t)\|^p \le K \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} (1 + \ln n + \ln m(h))^{p/2} \le K \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} \Bigl( 1 + \ln \frac{m(h)}{h} \Bigr)^{p/2}. \tag{10.3.3}$$

On the other hand,
$$\begin{aligned}
E \sup_{t_0 \le t \le T} \|\bar{y}^E(t) - \tilde{y}^E(t)\|^p
&= E \max_{\substack{0 \le k \le n-1 \\ 0 \le i \le m_k - 1}} \sup_{u_i^k \le t \le u_{i+1}^k} \|\bar{y}^E(t) - \tilde{y}^E(t)\|^p \\
&= E \max_{\substack{0 \le k \le n-1 \\ 0 \le i \le m_k - 1}} \|y^E(u_{i+1}^k) - y^E(u_i^k)\|^p \\
&\le E \max_{\substack{0 \le k \le n-1 \\ 0 \le i \le m_k - 1}} \sup_{u_i^k \le t \le u_{i+1}^k} \|y^E(t) - y^E(u_i^k)\|^p \\
&= E \sup_{t_0 \le t \le T} \|y^E(t) - \bar{y}^E(t)\|^p.
\end{aligned} \tag{10.3.4}$$

Now, with (10.3.3) and (10.3.4) we have
$$E \sup_{t_0 \le t \le T} \|y^E(t) - \tilde{y}^E(t)\|^p \le K \Bigl\{ E \sup_{t_0 \le t \le T} \|y^E(t) - \bar{y}^E(t)\|^p + E \sup_{t_0 \le t \le T} \|\bar{y}^E(t) - \tilde{y}^E(t)\|^p \Bigr\} \le K \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} \Bigl( 1 + \ln \frac{m(h)}{h} \Bigr)^{p/2}.$$
(b) As in (a), we first consider the process $\bar{y}^M$ defined by $\bar{y}^M(t_0) = x_0$; $\bar{y}^M(u_i^k) = y^M(u_i^k)$; $\bar{y}^M(t) = y^M(u_{i-1}^k)$ for $t \in [u_{i-1}^k, u_i^k)$ ($k = 0, \ldots, n-1$; $i = 1, \ldots, m_k$); and with $\bar{b} = b - \frac{1}{2} \sum_{j=1}^q \sigma_j' \sigma_j$ and $\Delta_j w(u, v) := w_j(v) - w_j(u)$ ($j = 1, \ldots, q$; $u, v \in [t_0, T]$) we have, using method (M1*),
$$\begin{aligned}
E \sup_{t_0 \le t \le T} &\|y^M(t) - \bar{y}^M(t)\|^p \\
&\le K \Biggl\{ E \sup_{t_0 \le t \le T} \Bigl\| \int_{[t]_G^*}^t \bar{b}(y^M([s]_G))\, ds \Bigr\|^p + \sum_{j=1}^q E \sup_{t_0 \le t \le T} \Bigl\| \int_{[t]_G^*}^t \sigma_j(y^M([s]_G))\, dw_j(s) \Bigr\|^p \\
&\qquad + \sum_{i,j=1}^q E \sup_{t_0 \le t \le T} \bigl\| (\sigma_i' \sigma_j)(y^M([t]_G)) \bigl[ \Delta_i w([t]_G, t) \Delta_j w([t]_G, t) - \Delta_i w([t]_G, [t]_G^*) \Delta_j w([t]_G, [t]_G^*) \bigr] \bigr\|^p \Biggr\} \\
&\le K \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} \Bigl( 1 + \ln \frac{m(h)}{h} \Bigr)^{p/2} \\
&\quad + K \sum_{i,j=1}^q E \sup_{t_0 \le t \le T} \bigl| \Delta_i w([t]_G, t) \Delta_j w([t]_G, t) - \Delta_i w([t]_G, [t]_G^*) \Delta_j w([t]_G, [t]_G^*) \bigr|^p,
\end{aligned} \tag{10.3.5}$$
analogously to (10.3.1)–(10.3.3), but having used the inequalities $\|\bar{b}(x)\| \le K (1 + \|x\|)$ ($x \in \mathbb{R}^d$) and $E \sup_{t_0 \le t \le T} \|y^M(t)\|^p \le K$.

By the Cauchy–Schwarz inequality and by the relations
$$\sup_{t_0 \le t \le T} |\Delta_j w([t]_G^*, t)|^{2p} = \max_{\substack{0 \le k \le n-1 \\ 0 \le i \le m_k - 1}} \sup_{u_i^k \le t \le u_{i+1}^k} |\Delta_j w(u_i^k, t)|^{2p},$$
$$\sup_{t_0 \le t \le T} |\Delta_j w([t]_G, [t]_G^*)|^{2p} \le \sup_{t_0 \le t \le T} |\Delta_j w([t]_G, t)|^{2p} = \max_{0 \le k \le n-1} \sup_{t_k \le t \le t_{k+1}} |\Delta_j w(t_k, t)|^{2p},$$
and by Lemma 10.3.9 and (G3), we obtain
$$\begin{aligned}
E \sup_{t_0 \le t \le T} &\bigl| \Delta_i w([t]_G, t) \Delta_j w([t]_G, t) - \Delta_i w([t]_G, [t]_G^*) \Delta_j w([t]_G, [t]_G^*) \bigr|^p \\
&\le K \Bigl\{ E \sup_{t_0 \le t \le T} \bigl| \Delta_i w([t]_G, t) \bigl[ \Delta_j w([t]_G, t) - \Delta_j w([t]_G, [t]_G^*) \bigr] \bigr|^p \\
&\qquad + E \sup_{t_0 \le t \le T} \bigl| \bigl[ \Delta_i w([t]_G, t) - \Delta_i w([t]_G, [t]_G^*) \bigr] \Delta_j w([t]_G, [t]_G^*) \bigr|^p \Bigr\} \\
&\le K \Bigl\{ E \Bigl[ \sup_{t_0 \le t \le T} |\Delta_i w([t]_G, t)|^p \sup_{t_0 \le t \le T} |\Delta_j w([t]_G^*, t)|^p \Bigr] + E \Bigl[ \sup_{t_0 \le t \le T} |\Delta_i w([t]_G^*, t)|^p \sup_{t_0 \le t \le T} |\Delta_j w([t]_G, [t]_G^*)|^p \Bigr] \Bigr\} \\
&\le K \Bigl\{ \Bigl( E \sup_{t_0 \le t \le T} |\Delta_i w([t]_G, t)|^{2p} \Bigr)^{1/2} \Bigl( E \sup_{t_0 \le t \le T} |\Delta_j w([t]_G^*, t)|^{2p} \Bigr)^{1/2} \\
&\qquad + \Bigl( E \sup_{t_0 \le t \le T} |\Delta_i w([t]_G^*, t)|^{2p} \Bigr)^{1/2} \Bigl( E \sup_{t_0 \le t \le T} |\Delta_j w([t]_G, [t]_G^*)|^{2p} \Bigr)^{1/2} \Bigr\} \\
&\le K \Bigl\{ \Bigl( \max_{0 \le k \le n-1} h_k \Bigr)^{p/2} \Bigl( \max_{0 \le k \le n-1} \frac{h_k}{m_k} \Bigr)^{p/2} + \Bigl( \max_{0 \le k \le n-1} \frac{h_k}{m_k} \Bigr)^{p/2} \Bigl( \max_{0 \le k \le n-1} h_k \Bigr)^{p/2} \Bigr\} \\
&\qquad \times (1 + \ln n + \ln m(h))^{p/2} (1 + \ln n)^{p/2} \\
&\le K \cdot h^{p/2} \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} (1 + \ln n + \ln m(h))^{p/2} (1 + \ln n)^{p/2} \\
&\le K \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} \Bigl( 1 + \ln \frac{m(h)}{h} \Bigr)^{p/2}.
\end{aligned} \tag{10.3.6}$$
Here the last step is based on (G1) and the boundedness of $h (1 + \ln n) \le h (1 + \ln (\Lambda / h))$.

Now, (10.3.5) and (10.3.6) yield
$$E \sup_{t_0 \le t \le T} \|y^M(t) - \bar{y}^M(t)\|^p \le K \Bigl( \frac{h}{m(h)} \Bigr)^{p/2} \Bigl( 1 + \ln \frac{m(h)}{h} \Bigr)^{p/2}. \tag{10.3.7}$$

Analogously to (10.3.4), it follows that
$$E \sup_{t_0 \le t \le T} \|\bar{y}^M(t) - \tilde{y}^M(t)\|^p \le E \sup_{t_0 \le t \le T} \|y^M(t) - \bar{y}^M(t)\|^p,$$
which, together with (10.3.7), gives us the estimate (b). □

In the last discretization step the Wiener process increments shall be replaced by i.i.d. r.v.s with a given distribution $\mu$ on $\mathbb{R}$. But the corresponding results hold only in the weak sense; i.e., the Wiener process (and its increments between the points of $G$) and the i.i.d. r.v.s $\xi_{ji}^k$ can be defined on a common probability space such that the estimates hold.

Theorem 10.3.11 (Komlós, Major, Tusnády (1975, 1976)) Let $\mu \in \mathcal{P}(\mathbb{R})$ have the following properties:
$$\int_{-\infty}^{\infty} x\, d\mu(x) = 0, \qquad \int_{-\infty}^{\infty} x^2\, d\mu(x) = 1, \qquad \int_{-\infty}^{\infty} e^{tx}\, d\mu(x) < \infty \ \text{ for all } t \text{ with } |t| \le \tau, \text{ some } \tau > 0. \tag{10.3.8}$$
Then there exist positive constants $C, A, \lambda$, depending only on $\mu$, and for each natural number $s > 0$ two $s$-tuples $(x_1, \ldots, x_s)$ and $(y_1, \ldots, y_s)$, each consisting of i.i.d. real-valued r.v.s, with $\mathcal{L}(x_1) = \mu$ and $y_1$ being standard normally distributed, such that for each $a > 0$,
$$P \Bigl( \max_{1 \le k \le s} \Bigl| \sum_{i=1}^k (x_i - y_i) \Bigr| > C \ln s + a \Bigr) < A e^{-\lambda a}.$$

For translating this estimate into an estimate with the distance used in the previous chapters, we need the following lemma.

Lemma 10.3.12 Assume that there exist constants $C, A, \lambda > 0$ with $\lambda C \ge 1$ and for any two natural numbers $r, s \ge 1$ an $r$-tuple $(\delta_{1,s}, \ldots, \delta_{r,s})$ of i.i.d. positive real-valued r.v.s satisfying
$$P (\delta_{1,s} > C \ln s + a) < A e^{-\lambda a} \quad \text{for all } a > 0. \tag{10.3.9}$$
Then for each $p \in [0, \infty)$, there exists a constant $M_p > 0$ such that for all natural $r, s \ge 1$,
$$E \max_{1 \le i \le r} \delta_{i,s}^p \le M_p (1 + \ln r + \ln s)^p.$$
The following well-known result (cf., for example, Shortt (1983), Rachev (1991)) is used in the proof of the following theorem.

Lemma 10.3.13 Let $S_1$, $S_2$, and $S_3$ be Polish spaces (i.e., topological spaces that are metrizable with a complete separable metric), and let $\pi_{12} : S_1 \times S_2 \times S_3 \to S_1 \times S_2$, $\pi_{23} : S_1 \times S_2 \times S_3 \to S_2 \times S_3$, $\pi_2^{12} : S_1 \times S_2 \to S_2$, and $\pi_2^{23} : S_2 \times S_3 \to S_2$ denote the projections defined by dropping one component. Then for any two measures $\nu_{12} \in \mathcal{P}(S_1 \times S_2)$ and $\nu_{23} \in \mathcal{P}(S_2 \times S_3)$ with $\nu_{12} \circ (\pi_2^{12})^{-1} = \nu_{23} \circ (\pi_2^{23})^{-1}$, i.e., with identical marginal distributions on $S_2$, there exists a measure $\nu_{123} \in \mathcal{P}(S_1 \times S_2 \times S_3)$ with $\nu_{123} \circ (\pi_{12})^{-1} = \nu_{12}$ and $\nu_{123} \circ (\pi_{23})^{-1} = \nu_{23}$.

Now we can prove the estimates for the chance discretization step:

Theorem 10.3.14 Let $p \in [2, \infty)$ and let $\mu \in \mathcal{P}(\mathbb{R})$ satisfy (10.3.8). Then we can define a $q$-dimensional standard Wiener process $(w(t))_{t \in [t_0, T]}$ and a set of i.i.d. r.v.s $\{\xi_{ji}^k : j = 1, \ldots, q; \; i = 1, \ldots, m_k; \; k = 0, \ldots, n-1\}$ with distribution $\mathcal{L}(\xi_{11}^0) = \mu$ on a common probability space such that for the methods (E2), (E3), (M2), and (M3) constructed with them we have:
(a) If (AS1) and (AS2) hold, then
$$E \sup_{t_0 \le t \le T} \|\tilde{y}^E(t) - z^E(t)\|^p \le K \Bigl( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Bigr)^p.$$
(b) If (AS1), (AS2), (AS3), and (AS4) hold, then
$$E \sup_{t_0 \le t \le T} \|\tilde{y}^M(t) - z^M(t)\|^p \le K \Bigl( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Bigr)^p.$$

The preceding results yield the following theorem, which gives bounds for the $L_p$-norm of the differences between the exact solution $x$ of (SE) and the approximate solutions $z^E$ and $z^M$ defined in (E3) and (M3). Again, as in Theorem 10.3.14, this is a result in the weak sense.

Theorem 10.3.15 Let $p \in [2, \infty)$ and let $\mu \in \mathcal{P}(\mathbb{R})$ satisfy (10.3.8). Then we can define a $q$-dimensional standard Wiener process $(w(t))_{t \in [t_0, T]}$ and a set of i.i.d. r.v.s $\{\xi_{ji}^k : j = 1, \ldots, q; \; i = 1, \ldots, m_k; \; k = 0, \ldots, n-1\}$ with distribution $\mathcal{L}(\xi_{11}^0) = \mu$ on a common probability space such that for (SE) and the methods (E3) and (M3) constructed with them we have:
(a) If (AS1) and (AS2) hold, then
$$E \sup_{t_0 \le t \le T} \|x(t) - z^E(t)\|^p \le K \Bigl( h^{p/2} + \Bigl( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Bigr)^p \Bigr).$$
(b) If (AS1), (AS2), (AS3), and (AS4) hold, then
$$E \sup_{t_0 \le t \le T} \|x(t) - z^M(t)\|^p \le K \Bigl( h^p + \Bigl( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Bigr)^p \Bigr).$$

To show that both assertions (a) and (b) follow from Theorems 10.3.6, 10.3.10, and 10.3.14, it suffices to verify that
$$\frac{h}{m(h)} \Bigl( 1 + \ln \frac{m(h)}{h} \Bigr) \le K \Bigl( \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Bigr)^2,$$
which follows easily from (G1).

Since Theorem 10.3.15 provides results in the weak sense, it is appropriate to formulate it as an estimate for the $L_p$-Wasserstein metric between the distributions of the exact solution and the approximate solutions:

Corollary 10.3.16 Let $p \in [1, \infty)$ and let $\mu \in \mathcal{P}(\mathbb{R})$ have the properties (10.3.8). Moreover, let $(w(t))_{t \in [t_0, T]}$ be a $q$-dimensional standard Wiener process and $\{\xi_{ji}^k : j = 1, \ldots, q; \; i = 1, \ldots, m_k; \; k = 0, \ldots, n-1\}$ a set of i.i.d. r.v.s with distribution $\mathcal{L}(\xi_{11}^0) = \mu$. Then for (SE) and the methods (E3) and (M3) constructed with them we have:
(a) If (AS1) and (AS2) hold, then
$$\ell_p(x, z^E) \le K \Bigl( h^{1/2} + \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Bigr).$$
(b) If (AS1), (AS2), (AS3), and (AS4) hold, then
$$\ell_p(x, z^M) \le K \Bigl( h + \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Bigr).$$

For $p \in [2, \infty)$ the assertions follow directly from Theorem 10.3.15 by applying Lemma 10.3.3 to the right-hand sides. Then the assertions are also true for $p \in [1, 2)$, since $\ell_{p_1} \le \ell_{p_2}$ for $1 \le p_1 \le p_2 < \infty$.
The estimates in Theorem 10.3.15 and Corollary 10.3.16 give convergence rates with respect to $h$ for the methods (E3) and (M3) and for any grid sequence in $\mathcal{G}(m, \Lambda, \alpha, \beta)$. These rates consist of two summands, one depending on $h$ and the other depending on $m(h)$, representing the rates of time and chance discretization, respectively. It is desirable to tune the rates of both summands, i.e., to equalize the powers of $h$ in both summands. This means choosing $m(h)$ to increase like $1/h$ for method (E3) and like $1/h^2$ for method (M3).

Corollary 10.3.17 Let $p \in [2, \infty)$ and let $\mu \in \mathcal{P}(\mathbb{R})$ satisfy (10.3.8). Then we can construct solutions in (SE), (E3), and (M3) on a common probability space (as in Theorem 10.3.15) with the following properties.
(a) If (AS1) and (AS2) hold and $\max \bigl\{ \sup_{0 < s \le 1} s\, m(s),\ \sup_{0 < s \le 1} \frac{1}{s\, m(s)} \bigr\} \le K$, then
$$E \sup_{t_0 \le t \le T} \|x(t) - z^E(t)\|^p \le K \cdot h^{p/2} (1 - \ln h)^p.$$
(b) If (AS1), (AS2), (AS3), and (AS4) hold and $\max \bigl\{ \sup_{0 < s \le 1} s^2 m(s),\ \sup_{0 < s \le 1} \frac{1}{s^2 m(s)} \bigr\} \le K$, then
$$E \sup_{t_0 \le t \le T} \|x(t) - z^M(t)\|^p \le K \cdot h^p (1 - \ln h)^p.$$

Corollary 10.3.18 Under the general assumptions of Corollary 10.3.16 we have:
(a) If (AS1) and (AS2) hold and $\max \bigl\{ \sup_{0 < s \le 1} s\, m(s),\ \sup_{0 < s \le 1} \frac{1}{s\, m(s)} \bigr\} \le K$, then
$$\ell_p(x, z^E) \le K \cdot h^{1/2} (1 - \ln h).$$
(b) If (AS1), (AS2), (AS3), and (AS4) hold and $\max \bigl\{ \sup_{0 < s \le 1} s^2 m(s),\ \sup_{0 < s \le 1} \frac{1}{s^2 m(s)} \bigr\} \le K$, then
$$\ell_p(x, z^M) \le K \cdot h (1 - \ln h).$$

In conclusion, suppose that a grid sequence in $\mathcal{G}(m, \Lambda, \alpha, \beta)$ with $h \to 0$ is given. Then, using the metric $\ell_p$, we have under the assumptions of Corollary 10.3.18(a), for the method (E3), the convergence rate $O(h^{1/2} (1 - \ln h))$. This convergence is in terms of the maximal step sizes $h$ of the coarse subgrids. Similarly, we have the convergence rate $O\bigl( (\frac{h}{m(h)})^{1/4} (1 - \ln \frac{h}{m(h)}) \bigr)$ with respect to the maximal step sizes $\frac{h}{m(h)}$ of the whole fine grids. Finally, we have the convergence rate $O(N^{-1/4} (1 + \ln N))$ with respect to the number $N$ of all grid points of the whole fine grids. Analogously, under the assumptions of Corollary 10.3.18(b) we have, for the method (M3), the convergence rates $O(h (1 - \ln h))$, $O\bigl( (\frac{h}{m(h)})^{1/3} (1 - \ln \frac{h}{m(h)}) \bigr)$, and $O(N^{-1/3} (1 + \ln N))$.
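In practice the tuning amounts to choosing $m(h) \asymp 1/h$ for (E3) and $m(h) \asymp 1/h^2$ for (M3). A small sketch (ours; the chosen $h$ values are for illustration only) of the resulting fine-grid sizes and the predicted rates from Corollary 10.3.18:

```python
import numpy as np

for h in (0.1, 0.05, 0.01):
    m_E = int(np.ceil(1.0 / h))               # m(h) ~ 1/h    balances (E3)
    m_M = int(np.ceil(1.0 / h ** 2))          # m(h) ~ 1/h^2  balances (M3)
    rate_E = np.sqrt(h) * (1.0 - np.log(h))   # O(h^{1/2}(1 - ln h))
    rate_M = h * (1.0 - np.log(h))            # O(h(1 - ln h))
    print(f"h={h}: m_E={m_E}, m_M={m_M}, "
          f"rate_E={rate_E:.3f}, rate_M={rate_M:.3f}")
```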
References
[1] T. Abdellaoui. Distances de deux lois dans les espaces de Banach. PhD thesis, Université de Rouen, 1993.
[2] T. Abdellaoui. Détermination d'un couple optimal du problème de Monge–Kantorovich. C.R. Acad. Sci. Paris I, 319:981–984, 1994.
[3] T. Abdellaoui and H. Heinich. Sur la distance de deux lois dans le cas vectoriel. C.R. Acad. Sci. Paris I, 319:397–400, 1994.
[4] M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions. Dover Publications, New York, 9th edition, 1970.
[5] A. Acosta and G. Giné. Convergence of moments and related functionals in the general central limit theorem in Banach spaces. Z. Wahrscheinlichkeitstheorie Verw. Geb., 48(2):213–241, 1979.
[6] N.I. Ahiezer. Classical Moment Problem and Related Questions of Analysis. GIFML, Moscow, 1961.
[7] N.I. Ahiezer and M. Krein. Some Questions in the Theory of Moments. American Mathematical Society, Providence, 1962.
[8] H. Akaike. Modern development of statistical methods. In P. Eykhoff, editor, Trends and Progress in System Identification, pages 169–184. Pergamon Press, 1981.
[9] D.J. Aldous. Exchangeability and related topics. Lecture Notes in Mathematics, 1117, 1985.
[10] D.J. Aldous. Ultimate instability of exponential backoff protocol for acknowledgement-based transmission control of random access communication channels. IEEE Transactions on Information Theory, IT-33:219–223, 1987.
[11] D.J. Aldous. Asymptotic fringe distribution for general families of random trees. Annals of Applied Probability, 1:228–266, 1991.
[12] D.J. Aldous. The continuum random tree II: An overview. In M.T. Barlow and N.H. Bingham, editors, Stochastic Analysis, volume 167 of London Math. Soc. Lecture Notes Series, pages 23–70. Cambridge University Press, 1991.
[13] D.J. Aldous and J.M. Steele. Introduction to the interface of probability and algorithms. Statistical Science, 8:3–9, 1993.
[14] G.A. Anastassiou. Moments in Probability and Approximation Theory. Pitman, England, 1993.
[15] G.A. Anastassiou and S.T. Rachev. Approximation of a random queue by means of deterministic queueing models. In C.K. Chui, L.L. Shumaker, and J.D. Ward, editors, Approximation Theory VI, volume 1, pages 9–11. Academic Press, 1989.
[16] G.A. Anastassiou and S.T. Rachev. Moment problems and their applications to characterization of stochastic processes, queueing theory, and rounding problems. In Approximation Theory, volume 138, pages 1–77, New York, 1992. Proceedings of 6th S.E.A. Meeting, Marcel Dekker Inc.
[17] G.A. Anastassiou and S.T. Rachev. Moment problems and their applications to the stability of queueing models. Computers and Mathematics with Applications, 24(8/9):229–246, 1992.
[18] E.J. Anderson and P. Nash. Linear Programming in Infinite Dimensional Spaces. Theory and Applications. Wiley, New York, 1987.
[19] E.J. Anderson and A.B. Philpott. An algorithm for a continuous version of the assignment problem. Lecture Notes in Economics and Mathematical Systems, 215:108–117, 1983. Semi-Infinite Programming and Applications (Austin, Texas, 1981).
[20] E.J. Anderson and A.B. Philpott. Duality and an algorithm for a class of continuous transportation problems. Mathematics of Operations Research, 9:222–231, 1984.
[21] T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, New York, 1984.
[22] W. Apitzsch, B. Fritzsche, and B. Kirstein. A Schur analysis approach to minimum distance problems. Linear Algebra and its Applications, 1990.
[23] A. Araujo and E. Giné. The Central Limit Theorem for Real and Banach Valued Random Variables. Wiley, New York, 1980.
[24] M.A. Arbeiter. Random recursive constructions of self-similar fractal measures. The non-compact case. Probability Theory and Related Fields, 88:497–520, 1991.
[25] R.J. Aumann. Measurable utility and the measurable choice theorem. In Centre Nat. Recherche Sci., Paris, editor, La Décision, volume 171, pages 15–26. Actes Coll. Internat., Aix-en-Provence, 1967.
[26] F. Aurenhammer, F. Hoffmann, and B. Aronov. Minkowski-type theorems and least-square partitioning. Reports of the Institute for Computer Science, 1992. Dept. of Mathematics, Freie Universität Berlin.
[27] J. Auslander. Generalized recurrences in dynamical systems. Contributions to Differential Equations, 3(1):65–74, 1964.
[28] M.L. Balinski. Signature des points extrêmes du polyèdre dual du problème de transport. Comptes Rendus de l'Académie des Sciences, Paris, 1983.
[29] M.L. Balinski. The Hirsch conjecture for dual transportation polyhedra. Mathematics of Operations Research, 9:629–633, 1984.
[30] M.L. Balinski. Signature methods for the assignment problem. Operations Research, 34:125–141, 1985.
[31] M.L. Balinski. A competitive (dual) simplex method for the assignment problem. Mathematical Programming Study, 34:125–141, 1986.
[32] M.L. Balinski, B. Athanasopoulos, and S.T. Rachev. Some developments on the theory of rounding proportions. In Bulletin of the ISI, 49th Session, volume 1, pages 71–72, Firenze, 1993.
[33] M.L. Balinski and D. Gale. On the core of the assignment game. In Functional Analysis, Optimization, and Mathematical Economics: A Collection of Papers Dedicated to the Memory of L.V. Kantorovich, pages 274–289, Oxford, 1990. Oxford University Press.
[34] M.L. Balinski and S.T. Rachev. On Monge–Kantorovich problems. Preprint, 1989. SUNY at Stony Brook, Dept. of Applied Mathematics and Statistics.
[35] M.L. Balinski and S.T. Rachev. Rounding proportions: rules of rounding. Numer. Funct. Anal. Optimization, 14:475–501, 1993.
[36] M.L. Balinski and S.T. Rachev. Rounding proportions: methods of rounding. Mathematical Scientist, 1997.
[37] M.L. Balinski and A. Russakoff. Faces of dual transportation polyhedra. Mathematical Programming Study, 22:1–8, 1984.
[38] M.L. Balinski and H.P. Young. Stability, coalitions and schisms in proportional representation systems. American Political Science Review, 72:848–858, 1978.
[39] M.L. Balinski and H.P. Young. Fair Representation: Meeting the Ideal of One Man, One Vote. Yale University Press, New Haven, 1982.
[40] A.A. Balkema, L. de Haan, and R. Karandikar. The maximum of n independent stochastic processes. Preprint, 1990. Erasmus University, Rotterdam.
[41] D.P. Barbu and Th. Precupanu. Convexity and Optimization in Banach Spaces. Sijthoff/Noordhoff, 1978.
[42] R.E. Barlow and F. Proschan. Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart, and Winston, New York, 1975.
[43] E.R. Barnes and A.J. Hoffman. Partitioning spectra and linear programming. In Proc. Silver Jubilee Conference on Combinatorics, Ontario, Canada, June 1982. Univ. Waterloo.
[44] E.R. Barnes and A.J. Hoffman. On transportation problems with upper bounds on leading rectangles. SIAM Journal of Algebraic and Discrete Methods, 6:487–496, 1985.
[45] M.F. Barnsley and J.H. Elton. A new class of Markov processes for image encoding. Advances in Applied Probability, 20:14–32, 1988.
[46] D.P. Baron and R.B. Myerson. Regulating a monopolist with unknown cost. Econometrica, 50:911–930, 1982.
[47] S.K. Basu. On the rate of convergence to normality of sums of dependent random variables. Acta Math. Acad. Sci. Hungarica, 28:261–265, 1976.
[48] S.K. Basu and G. Simons. Moment spaces of IFR distributions, applications and related material. In P.K. Sen, editor, Contributions to Statistics: Essays in Honor of Norman L. Johnson, pages 27–46. North-Holland Publishing Company, 1983.
[49] J. Beirlant and S.T. Rachev. The problem of stability in insurance mathematics. Insurance: Mathematics and Economics, 6:179–188, 1987.
[50] V. Beneš. The moment problem and its technical application. In Proc. 30th Int. Wissen. Kolloq., pages 11–14. TH Ilmenau, 1985.
[51] V. Beneš. Moment Problem and Its Application. PhD thesis, Charles University, 1986.
[52] V. Beneš. Extremal and optimal solutions in the transshipment problem. Comment. Math. Univ. Carolinae, 33:97–112, 1992.
[53] V. Beneš. Extremal and Optimal Solutions of the Marginal and Transshipment Problem. PhD thesis, Dept. of Mathematics, FSI, Czech Technical University, Praha, Czech Republic, 1995.
[54] V. Beneš and J. Štěpán. The support of extremal probability measure with given marginals. In M.L. Puri, P. Revesz, and W. Wertz, editors, Mathematical Statistics and Probability Theory, volume A of Proc. 6th Pannon Symp., pages 33–41. D. Reidel Publ. Comp., 1987.
[55] V. Beneš and J. Štěpán. Extremal solutions in the marginal problem. In G. Dall'Aglio et al., editor, Advances in Probability Measures with Given Marginals, pages 189–206. Kluwer, Dordrecht, 1991.
[56] V.Y. Bentkus, F. Götze, V. Paulauskas, and A. Rackauskas. The accuracy of Gaussian approximation in Banach spaces. University of Bielefeld, Preprint 90-100, 1990.
[57] C. Berge. Théorie générale des jeux à n personnes, volume 138. Gauthier-Villars, Paris, 1957. Mémorial des sciences mathématiques.
[58] C. Berge and A. Ghouila-Houri. Programming, Games and Transportation Networks. Methuen, John Wiley and Sons, Inc., New York, 1965.
[59] S. Bertino. Su di una sottoclasse della classe di Fréchet. Statistica, 28:511–542, 1968.
[60] S. Bertino. Sulla distanza tra distribuzioni. Pubbl. Ist. Calc. Prob. Univ. Roma, 1968.
[61] D. Bertsekas and R. Gallager. Data Networks. Prentice-Hall, New Jersey, 1987.
[62] N.P. Bhatia and G.P. Szegö. Stability Theory of Dynamical Systems. Number 161 in Die Grundlehren der mathematischen Wissenschaften. Springer, 1970.
[63] R.M. Bhattacharya and R. Ranga Rao. Normal Approximation and Asymptotic Expansions. Wiley, 1976.
[64] P.J. Bickel and D.A. Freedman. Some asymptotic theory for the bootstrap. Annals of Statistics, 9:1196–1217, 1981.
[65] P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.
[66] P. Billingsley. Probability and Measure. Wiley, New York, 2nd edition, 1986.
[67] D. Blackwell and L.E. Dubins. An extension of Skorohod's almost sure representation theorem. Proc. Amer. Math. Soc., 89:691–692, 1983.
[68] R.C. Blattberg and N.J. Gonedes. A comparison of the stable and student distributions as statistical models for stock prices. J. Business, 47:244–280, 1974.
[69] T. Bollerslev. A conditionally heteroscedastic time series model for speculative prices and rates of return. Review of Economics and Statistics, 69:542–547, 1987.
[70] E. Bolthausen. Exact convergence rate in some martingale central limit theorems. Annals of Probability, 10:672–688, 1982.
[71] A. Boness, A. Chen, and S. Jatusipitak. Investigations of nonstationary prices. J. Business, 48:518–537, 1979.
[72] A.A. Borovkov. Asymptotic Methods in Queueing Theory. Wiley, New York, 1984.
[73] A.A. Borovkov. On the ergodicity and stability of the sequence $w_{n+1} = f(w_n, z_n)$: applications to communication networks. Theory of Probability and its Applications, 33:595–611, 1988.
[74] A. Brandt, P. Franken, and B. Lisek. Stationary Stochastic Models. Wiley, New York, 1990.
[75] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math., XLIV:375–417, 1991.
[76] G. Brown and B. Shubert. On random binary trees. Mathematics of Operations Research, 9:43–65, 1984.
[77] R.A. Brualdi and J. Csima. Extremal plane stochastic matrices of dimension three. Journal of Linear Algebra and its Applications, 11:105–133, 1975.
[78] R.A. Brualdi and J. Csima. Stochastic patterns. J. Comb. Theory, 19:1–12, 1975.
[79] Y.A. Brudnii. A multidimensional analog of a theorem of Whitney. USSR Math. Sbornik, 11:157–170, 1970.
[80] R.E. Burkard, B. Klinz, and R. Rudolf. Perspectives of Monge properties in optimization. Bericht 2, 1994. Spezialforschungsbereich F 003, Karl-Franzens-Universität Graz & Technische Universität Graz.
[81] R.M. Burton and U. Rösler. An $L_2$-convergence theorem for random affine mappings. Journal of Applied Probability, 32:183–192, 1995.
[82] P.L. Butzer, L. Hahn, and M.Th. Roeckerath. Central limit theorem and weak law of large numbers with rates for martingales in Banach spaces. Journal of Multivariate Analysis, 13:287–301, 1983.
[83] S. Cambanis and G. Simons. Probability and expectation inequalities. Z. Wahrscheinlichkeitstheorie Verw. Geb., 59:285–294, 1982.
[84] S. Cambanis, G. Simons, and W. Stout. Inequalities for $Ek(X, Y)$ when the marginals are fixed. Z. Wahrscheinlichkeitstheorie Verw. Geb., 36:285–294, 1976.
[85] L. Cavalli-Sforza. Cultural and biological evolution: a theoretical inquiry. In S.G. Ghurye, editor, Proceedings of the Conference on Directions for Mathematical Statistics, volume 7 of Suppl. Adv. Appl. Prob., pages 90–99, 1975.
[86] L. Cavalli-Sforza and M.W. Feldman. Models for cultural inheritance I. Group mean and within group variation. Theoret. Popn. Biol., 4:42–55, 1973.
[87] S. Chandrasekhar and G. Münch. The theory of the fluctuations in brightness of the Milky Way. I and II. Astrophys. J., 112:380–398, 1950.
[88] M.R. Chernick, D.J. Daley, and R.P. Littlejohn. A time-reversibility relationship between two Markov chains with exponential stationary distributions. Journal of Applied Probability, 25:418–422, 1988.
[89] G. Choquet. Forme abstraite du théorème de capacitabilité. Ann. Inst. Fourier, 9:83–89, 1959.
[90] Y.S. Chow and H. Teicher. Probability Theory: Independence, Interchangeability, Martingales. Springer, New York, 1978.
[91] F.H. Clarke. Optimization and Nonsmooth Analysis. Classics in Appl. Math. SIAM, 1990.
[92] J.M.C. Clark and R.J. Cameron. The maximum rate of convergence of discrete approximations for stochastic differential equations. Lecture Notes in Control and Information Science, 25:162–171, 1980.
[93] P.K. Clark. A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41:135–155, 1973.
[94] M. Cramer. Stochastische Analyse rekursiver Algorithmen mit idealen Metriken. PhD thesis, Universität Freiburg, 1995a.
[95] M. Cramer. Convergence of a branching type recursion with non-stationary immigration. Metrika, 1995b. To appear.
[96] M. Cramer. A note concerning the limit distribution of the Quicksort algorithm. Informatique Théorique et Applications, 30:195–207, 1996.
[97] M. Cramer and L. Rüschendorf. Analysis of recursive algorithms by the contraction method. Lecture Notes in Statistics, 114:18–33, 1996a.
[98] M. Cramer and L. Rüschendorf. Convergence of a branching type recursion. Annales de l'Institut Henri Poincaré, 32:725–741, 1996b.
[99] J. Csima. Multidimensional stochastic matrices and patterns. J. Algebra, 14:194–202, 1970.
[100] J.A. Cuesta-Albertos and C. Matrán. Strong convergence of weighted sums of random elements through the equivalence of sequences of distributions. Journal of Multivariate Analysis, 25:311–322, 1988.
[101] J.A. Cuesta-Albertos and C. Matrán. Notes on the Wasserstein metric in Hilbert spaces. Annals of Probability, 17:1264–1276, 1989.
[102] J.A. Cuesta-Albertos and C. Matrán. Skorohod representation theorem and Wasserstein metrics. Preprint, 1991.
[103] J.A. Cuesta-Albertos and C. Matrán. A review on strong convergence of weighted sums of random elements based on Wasserstein metrics. Journal of Stat. Planning Infer., 30:359–370, 1992.
[104] J.A. Cuesta-Albertos and C. Matrán. Stochastic convergence through Skorohod representation theorems and Wasserstein metrics. Suppl. Rendic. Circolo Matem. Palermo II, 35:89–113, 1994.
[105] J.A. Cuesta-Albertos, C. Matrán, S.T. Rachev, and L. Rüschendorf. Mass transportation problems in probability theory. Mathematical Scientist, 21:37–72, 1996.
[106] J.A. Cuesta-Albertos, L. Rüschendorf, and A. Tuero-Diaz. Optimal coupling of multivariate distributions and stochastic processes. Journal of Multivariate Analysis, 46:335–361, 1993.
[107] J.A. Cuesta-Albertos and A. Tuero-Diaz. A characterization for the solution of the Monge–Kantorovich mass transference problem. Statist. Probab. Letters, 16:147–152, 1993.
[108] G. Dall'Aglio. Sugli estremi dei momenti delle funzioni di ripartizione doppie. Ann. Scuola Normale Superiore Di Pisa, Cl. Sci., 3(1):33–74, 1956.
[109] G. Dall'Aglio. Sulla compatibilità delle funzioni di ripartizione doppia. Rendiconti di Math., 18:385–413, 1959.
[110] G. Dall'Aglio. Les fonctions extrêmes de la classe de Fréchet à 3 dimensions. Publ. Inst. Stat. Univ. Paris, IX:175–188, 1960.
[111] G. Dall'Aglio. Sulle distribuzioni doppie con margini assegnati soggette a delle limitazioni. It. Giorn. Ist. Ital. Attuari, 94, 1961.
[112] G. Dall'Aglio. Fréchet classes and compatibility of distribution functions. Symposia Mathematica, 9:131–150, 1972.
[113] G. Dall'Aglio, S. Kotz, and G. Salinetti. Advances in Probability Distributions with Given Marginals. Kluwer, Dordrecht, 1991.
[114] G.B. Dantzig and A.R. Ferguson. The allocation of aircraft to routes – an example of linear programming under uncertain demands. Mang. Science, 3:45–73, 1956.
[115] A. D'Aristotile, P. Diaconis, and D. Freedman. On a merging of probabilities. No. 301, 1988. Dept. of Statistics, Stanford University.
[116] M.M. Day. Normed Linear Spaces. Springer, Berlin–Göttingen–Heidelberg, 1958.
[117] A. de Acosta. Invariance principles in probability for triangular arrays of B-valued random vectors and some applications. Annals of Probability, 10:346–373, 1982.
[118] L. de Haan, E. Omey, and S.I. Resnick. Domains of attraction and regular variation in $\mathbb{R}^d$. Journal of Multivariate Analysis, 14:17–33, 1984.
[119] L. de Haan and S.T. Rachev. Estimates of the rate of convergence for max-stable processes. Annals of Probability, 17:651–677, 1989.
[120] L. de Haan and S.I. Resnick. Limit theory for multivariate sample extremes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 40:317–337, 1977.
[121] L. de Haan, S.I. Resnick, H. Rootzén, and C.G. Vries. Extremal behavior of solutions to a stochastic difference equation with applications to ARCH processes. Stoch. Processes and Applications, 32:213–224, 1989.
[122] P. de Jong. Central limit theorems for generalized multilinear forms. CWI Tract, Amsterdam, 61, 1989.
[123] G. Debreu. Representation of a preference ordering by a numerical function. In Decision Processes, pages 159–165. Wiley, New York, 1954.
[124] G. Debreu. Continuity properties of Paretian utility. Intern. Econ. Review, 5:285–293, 1964.
[125] P. Deheuvels and D. Pfeifer. On a relationship between Uspensky's theorem and Poisson approximation. Ann. Inst. Statist. Math., 40:671–681, 1988.
[126] C. Dellacherie and P.A. Meyer. Probabilités et potentiel, volume 29 of North-Holland Mathematics Studies. Hermann, Paris, 1983. Chapitres IX à XI.
[127] U. Derigs, O. Goecke, and R. Schrader. Monge sequences and a simple assignment algorithm. Discrete Applied Mathematics, 15:241–248, 1986.
[128] L. Devroye. Lecture Notes on Bucket Algorithms, volume 6 of Progress in Computer Science. Birkhäuser, Boston, 1986.
[129] L. Devroye. A Course in Density Estimation, volume 14 of Progress in Probability and Statistics. Birkhäuser, Boston, 1987.
[130] P. Diaconis and D. Freedman. On rounding percentages. Journal of the American Statistical Association, 74:359–364, 1979.
[131] P. Diaconis and D. Freedman. A dozen of de Finetti-style results in search of a theory. Annales de l'Institut Henri Poincaré, 23:397–423, 1987.
[132] H. Dietrich. Zur c-Konvexität und c-Subdifferenzierbarkeit von Funktionalen. Optimization, 19:355–371, 1988.
[133] N. Dinculeanu. Vector Measures, volume 95 of International Series of Monographs on Pure and Applied Mathematics. Pergamon Press, Oxford, 1967.
[134] R.L. Dobrushin. Prescribing a system of random variables by conditional distributions. Theory of Probability and its Applications, 15:458–486, 1970.
[135] R.L. Dobrushin. Vlasov equations. Func. Anal. Appl., 13:115–123, 1979.
[136] I. Domowitz and C.S. Hakkio. Conditional variance and the risk premium in the foreign exchange market. Journal of Internat. Economics, 19:47–66, 1985.
[137] H. Doss. Liens entre équations différentielles stochastiques et ordinaires. Annales de l'Institut Henri Poincaré, XIII:99–125, 1977.
[138] R.G. Douglas. On extremal measures and subspace density. Michigan Math. J., 11:243–246, 1964.
[139] D.C. Dowson and B.V. Landau. The Fréchet distance between multivariate normal distributions. Journal of Multivariate Analysis, 12:450–455, 1982.
[140] A.Y. Dubovitskii and A.A. Milyutin. Necessary Conditions for a Weak Extremum in the General Problems of Optimal Management. Nauka, Moscow, 1971. In Russian.
[141] R.M. Dudley. Convergence of Baire measures. Studia Mathematica, 27:251–268, 1966.
[142] R.M. Dudley. Distances of probability measures and random variables. Annals of Mathematical Statistics, 39:1563–1572, 1968.
[143] R.M. Dudley. The speed of mean Glivenko–Cantelli convergence. Annals of Mathematical Statistics, 40:40–50, 1969.
[144] R.M. Dudley. Speeds of metric probability convergence. Z. Wahrscheinlichkeitstheorie Verw. Geb., 22:323–332, 1972.
[145] R.M. Dudley. Probability and metrics. Convergence of laws on metric spaces, with a view to statistical testing. Aarhus Univ. Lect. Notes, 45, 1976.
[146] R.M. Dudley. Real Analysis and Probability. Wadsworth & Brooks-Cole, Pacific Grove, California, 1989.
[147] D. Duffie. Dynamic Asset Pricing Theory. Princeton University Press, Princeton, 1992.
[148] N. Dunford and J. Schwartz. Linear Operators. General Theory, volume Part I. Wiley-Interscience Publication, New York, 1958.
[149] R. Durrett and M. Liggett. Fixed points of the smoothing transformation. Z. Wahrscheinlichkeitstheorie Verw. Geb., 64:275–301, 1983.
[150] A. Dvoretzky. Asymptotic normality for sums of dependent random variables. Proc. Berkeley Symp. II, pages 513–535, 1970.
[151] D.A. Edwards. On the existence of probability measures with given marginals. Ann. Inst. Fourier, 28:53–78, 1978.
[152] I. Ekeland and R. Temam. Convex Analysis and Variational Problems. North Holland, 1976.
[153] K.H. Elster and R. Nehse. Zur Theorie der Polarfunktionale. Optimization, 5:3–21, 1974.
[154] R.F. Engle, D.M. Lilien, and R.P. Robins. Estimating time varying risk premia in the term structure: the ARCH-M model. Econometrica, 55:391–407, 1987.
[155] Y. Ermoljev, A. Gaivoronski, and C. Nedeva. Stochastic optimization problem with incomplete information on distribution functions. Report WP-83-113, 1983.
[156] I.V. Evstigneev. Measurable choice theorems and probabilistic control models. Dokl. Akad. Nauk USSR, 283(5):1065–1068, 1985.
[157] G. Fayolle, P. Flajolet, and M. Hofri. On a functional equation arising in the analysis of a protocol for a multi-access broadcast channel. Advances in Applied Probability, 18:441–472, 1986.
[158] G. Fayolle, P. Flajolet, M. Hofri, and P. Jacquet. Analysis of a stack algorithm for random multiple-access communication. IEEE Transactions on Information Theory, 31:244–254, 1985.
[159] M.W. Feldman, S.T. Rachev, and L. Rüschendorf. Limit theorems for recursive algorithms. Journal of Computational and Applied Mathematics, 56:169–182, 1994.
[160] W. Feller. An Introduction to Probability Theory and Its Applications, volume II. Wiley, New York, 2nd edition, 1971.
[161] R. Ferland and G. Giroux. Cutoff-type Boltzmann equations: Convergence of the solution. Adv. Appl. Math., 8:98–107, 1987.
[162] R. Ferland and G. Giroux. Le modèle Bose–Einstein de l'équation non linéaire de Boltzmann: Convergence vers l'équilibre. Ann. Sc. Math. Québec, 15:23–33, 1991.
[163] X. Fernique. Sur le théorème de Kantorovich–Rubinstein dans les espaces polonais. Lecture Notes in Mathematics, 850:6–10, 1981.
[164] P.C. Fishburn, J.C. Lagarias, J.A. Reeds, and L.A. Shepp. Sets uniquely determined by projections on axes. I. Continuous case. SIAM Journal on Applied Mathematics, 50:288–306, 1990.
[165] A.T. Fomenko and S.T. Rachev. Volume functions on historical (narrative) texts and the amplitude correlation principle. Computers and Humanities, 24(3):187–206, 1990.
[166] R. Fortet and E. Mourier. Convergence de la répartition empirique vers la répartition théorique. Ann. Sci. École Norm. Sup., 70(3):267–285, 1953.
[167] M.J. Frank. Operations arising from copulas. In Symp. Probab. Measures with Given Marginals, volume 67 of Math. Appl., pages 75–93, Rome, 1991.
[168] M.J. Frank, R.B. Nelsen, and B. Schweizer. Best possible bounds for the distribution of a sum – a problem of Kolmogorov. Probability Theory and Related Fields, 74:199–211, 1987.
[169] M. Fréchet. Sur les tableaux de corrélation dont les marges sont données. Ann. Univ. de Lyon, Sciences, 14:53–77, 1951.
[170] M. Fréchet. Les tableaux de corrélation dont les marges sont données. Ann. Univ. de Lyon, Sciences, 20:13–31, 1957.
[171] M. Fréchet. Sur la distance de deux lois de probabilité. C.R. Acad. Sci. Paris, 244:689–692, 1957.
[172] M. Fréchet. Sur les tableaux de corrélation dont les marges et des bornes sont données. Revue Inst. Int. de Statistique, 28:10–32, 1960.
[173] N. Gaffke and L. Rüschendorf. On a class of extremal problems in statistics. Math. Operationsforschung Statist., 12:123–135, 1981.
[174] N. Gaffke and L. Rüschendorf. On the existence of probability measures with given marginals. Statistics & Decisions, 2:163–174, 1984.
[175] D. Gale. Theory of Linear Economic Models. McGraw-Hill, New York, 1960.
[176] D. Gale and A. Mas-Colell. An equilibrium existence theorem for a general model without ordered preferences. Journal of Mathematical Economics, 2:9–15, 1975.
[177] W. Gangbo and R.J. McCann. Optimal maps in Monge's mass transport problem. CRAS, Ser. I, 321:1653–1658, 1995.
[178] W. Gangbo and R.J. McCann. The geometry of optimal transformations. Preprint, 1996.
[179] M. Gelbrich. On a formula for the $L_p$ Wasserstein metric between measures on Euclidean and Hilbert spaces. Preprint 179, 1988. Sektion Mathematik der Humboldt-Universität zu Berlin.
[180] M. Gelbrich. $L_p$-Wasserstein-Metriken und Approximationen stochastischer Differentialgleichungen. Dissertation A, Humboldt-Universität zu Berlin, Sektion Mathematik, 1989.
[181] M. Gelbrich. On a formula for the $L_2$-Wasserstein metric between measures on Euclidean and Hilbert spaces. Math. Nachr., 147:185–203, 1990.
[182] M. Gelbrich. Simultaneous time and chance discretization for stochastic differential equations. Journal of Computational and Applied Mathematics, 58:255–289, 1995.
[183] M. Gelbrich and S.T. Rachev. Discretization for stochastic differential equations, $L_2$-Wasserstein metrics, and econometric models. In Distributions with Given Marginals. IMS Proc., 1996. To appear.
[184] I. Gelfand, D. Raikov, and G. Shilov. Kommutative normierte Algebren. VEB Deutscher Verlag der Wissenschaften, 1964.
[185] C. Genest. A survey of the statistical properties and applications of Archimedean copulas, 1990. Technical Report.
[186] H. Gerber. An Introduction to Mathematical Risk Theory. Huebner Foundation Monograph, 1981.
[187] I.I. Gikhman and A.W. Skorokhod. Introduction to the Theory of Stochastic Processes. Nauka, Moscow, 1977. In Russian.
[188] C. Gini. Di una misura delle relazioni tra le graduatorie di due caratteri. Appendix to: A. Mancini, Le Elezioni Generali Politiche del 1913 nel comune di Roma. Ludovico Cecchini, 1914.
[189] C. Gini. La dissomiglianza. Metron, 24:309–331, 1965.
[190] C.R. Givens and R.M. Shortt. A class of Wasserstein metrics for probability distributions. Michigan Math. J., 31:231–240, 1984.
[191] D. Goldfarb. Efficient dual simplex algorithms for the assignment problem. Preprint, 1985.
[192] C.M. Goldie. Implicit renewal theory and tails of solutions of random equations. Annals of Applied Probability, 1:126–166, 1991.
[193] C. Graham. McKean–Vlasov Itô–Skorohod equations and nonlinear diffusions with discrete jump sets. Stoch. Proc. Appl., 40:69–82, 1992.
[194] C. Graham. Nonlinear diffusions with jumps. Preprint, 1992.
[195] R.M. Gray, D.L. Neuhoff, and R.L. Dobrushin. Block synchronization, sliding-block coding, invulnerable sources and zero error codes for discrete noisy channels. Annals of Probability, 8:315–328, 1980.
[196] R.M. Gray, D.L. Neuhoff, and P.C. Shields. A generalization of Ornstein's $\bar{d}$-distance with applications to information theory. Annals of Probability, 3:315–328, 1975.
[197] R.M. Gray and D.S. Ornstein. Block coding for discrete stationary $\bar{d}$-continuous channels. IEEE Transactions on Information Theory, 25:292–306, 1979.
[198] N.E. Gretsky, J.M. Ostroy, and W.R. Zame. The nonatomic assignment model. Journal of Economic Theory, 2:103–128, 1992.
[199] N.V. Grigorevski and I.S. Shiganov. On some modifications of Dudley's metric. Zap. Nauchnich Sem. LOMI, 61:17–24, 1976.
[200] F.A. Grünbaum. Propagation of chaos for the Boltzmann equation. Arch. Rational Mech. Anal., 42:323–345, 1971.
[201] P. Gudynas. Approximation by distributions of sums of conditionally independent random variables. Litovski Mat. Sbornik, 24:68–80, 1985.
[202] Y. Guivarc'h. Sur une extension de la notion de loi semi-stable. Annales de l'Institut Henri Poincaré, 26:261–286, 1990.
[203] W. Gutjahr and G.Ch. Pflug. The asymptotic contour process of a binary tree is a Brownian excursion. Stoch. Processes and Applications, 41:69–89, 1992.
[204] S. Gutmann, J.H.B. Kemperman, and J.A. Reeds. Existence of probability measures with given marginals. Annals of Probability, 19:1781–1791, 1991.
[205] S. Gutmann, J.H.B. Kemperman, J.A. Reeds, and L.A. Shepp. Existence of probability measures with given marginals. Annals of Probability, 19:1781–1791, 1991.
[206] D.L. Guy. Common extension of finitely additive probability measures. Portugalia Math., 20:1–5, 1961.
[207] M.G. Hahn, W.N. Hudson, and J.A. Veeh. Operator stable laws: series representations and domains of normal attraction. Journal of Multivariate Analysis, 10:26–37, 1989.
[208] P. Hall. Personal communication, 1985.
[209] J.P. Hammond. Straightforward individual incentive compatibility in large economies. Review of Economic Studies, 46:263–282, 1979.
[210] W.K.K. Haneveld. Duality in Stochastic Linear and Dynamic Programming. Centrum voor Wiskunde en Informatica, Amsterdam, 1985.
[211] L.G. Hanin. Kantorovich–Rubinstein duality for Lipschitz spaces defined by differences of arbitrary order. Soviet Math. Doklady, 42(1):220–224, 1991.
[212] L.G. Hanin and S.T. Rachev. An extension of the Kantorovich–Rubinstein mass transportation problem, 1991. Dept. of Statistics and Applied Probability, University of California, Santa Barbara.
[213] L.G. Hanin and S.T. Rachev. Mass transshipment problems and ideal metrics. Journal of Computational and Applied Mathematics, 56:183–196, 1994.
[214] L.G. Hanin and S.T. Rachev. An extension of the Kantorovich–Rubinstein mass transshipment problem. Numer. Funct. Anal. Optimization, 16:701–735, 1995.
[215] G. Hansel and J.P. Troallic. Mesures marginales et théorème de Ford–Fulkerson. Z. Wahrscheinlichkeitstheorie Verw. Geb., 43:245–251, 1978.
[216] G. Hansel and J.P. Troallic. Sur le problème des marges. Probability Theory and Related Fields, 71:357–366, 1986.
[217] F. Hausdorff. Set Theory. Chelsea Publishing Company, New York, 1957.
[218] E. Häussler. On the rate of convergence in the central limit theorem for martingales with discrete and continuous time. Annals of Probability, 16:275–299, 1988.
[219] H. Heinich and J.C. Lootgieter. Convergence des fonctions monotones. Preprint, 1993.
[220] I.S. Helland and T.S. Nilsen. On a general random exchange model. Journal of Applied Probability, 13:781–790, 1976.
[221] P.L. Hennequin and A. Tortrat. Probability Theory and Some of Its Applications. Nauka, Moscow, 1974. Russian translation.
[222] W. Hildenbrand. On economies with many agents. Journal of Economic Theory, 2:161–168, 1970.
[223] C. Hipp and R. Michel. Risikotheorie: Stochastische Modelle und Statistische Methoden. DGVM, 24, 1990.
[224] W. Hoeffding. Maßstabinvariante Korrelationstheorie. Schriften des Mathematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin, 5:181–233, 1940.
[225] W. Hoeffding. The extrema of the expected value of a function of independent random variables. Annals of Mathematical Statistics, 26:268–275, 1955.
[226] W. Hoeffding and S.S. Shrikhande. Bounds for the distribution function of a sum of independent, identically distributed random variables. Annals of Mathematical Statistics, 27:439–449, 1956.
[227] A.J. Hoffman. On simple linear programming problems. Convexity. In Proceedings of Symposia in Pure Mathematics, volume 7, pages 317–327, Providence, R.I., 1961.
[228] A.J. Hoffman. On simple linear programming problems. In V. Klee, editor, Convexity, volume 7, pages 317–327, Providence, R.I., 1963. Proc. Symp. Pure Math.
[229] A.J. Hoffman and A.F. Veinott jr. Staircase transportation problems with hyperadditive rewards and cumulative capacities. Preprint, 1990. IBM T.J. Watson Research Center, Yorktown Heights, New York, 10598.
[230] J. Hoffmann-Jørgensen. Probability in Banach space. Lecture Notes in Mathematics, 598:2–186, 1977.
[231] M. Hofri. Probabilistic Analysis of Algorithms. Springer, New York, 1987.
[232] R. Holley and M. Liggett. Generalized potlatch and smoothing processes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 55:165–195, 1981.
[233] G. Hooghiemstra and M. Keane. Calculation of the equilibrium distribution for a solar energy storage model. Journal of Applied Probability, 22:852–864, 1985.
[234] G. Hooghiemstra and C.L. Scheffer. Some limit theorems for an energy storage model. Stoch. Processes and Applications, 22:121–127, 1986.
[235] J. Horowitz and R.L. Karandikar. Martingale problems associated with the Boltzmann equation. In E. Çinlar et al., editor, Seminar on Stochastic Processes 1989, Boston, 1990. Birkhäuser.
[236] J. Horowitz and R.L. Karandikar. Mean rates of convergence of empirical measures in the Wasserstein metric. Journal of Computational and Applied Mathematics, 55:261–273, 1994.
[237] D.A. Hsieh. The statistical properties of daily foreign exchange rates: 1974–1983. Journal of Internat. Economics, 24:129–145, 1988.
[238] P.J. Huber. Robust Statistics. Wiley, New York, 1981.
[239] W.N. Hudson. Operator-stable distributions and stable marginals. Journal of Multivariate Analysis, 10:26–37, 1980.
[240] W.N. Hudson, Z.J. Jurek, and J.A. Veeh. The symmetry group and exponents of operator stable probability measures. Annals of Probability, 14:1014–1023, 1986.
[241] W.N. Hudson and J.D. Mason. Operator-stable laws. Journal of Multivariate Analysis, 11:434–447, 1981.
[242] W.N. Hudson, J.A. Veeh, and D.C. Weiner. Moments of distributions attracted to operator-stable laws. Journal of Multivariate Analysis, 24:1–10, 1988.
[243] J.E. Hutchinson. Fractals and selfsimilarity. Indiana Univ. Math. Journal, 30:713–747, 1981.
[244] Z. Ignatov and S.T. Rachev. Minimality of ideal probabilistic metrics. J. Soviet Math., 32(6):595–608, 1986.
[245] N. Ikeda and S. Watanabe. Stochastic Differential Equations and Diffusion Processes. North-Holland, Amsterdam, 1981.
[246] A.D. Ioffe and V.M. Tihomirov. Theorie der Extremalaufgaben. VEB Deutscher Verlag der Wissenschaften, Berlin, 1979.
[247] K. Isii. Inequalities of the type of Chebychev and Cramér–Rao and mathematical programming. Ann. Inst. Statist. Math., 16:247–270, 1964.
[248] E.H. Ivanov and R. Nehse. Relations between generalized concepts of convexity and conjugacy. Math. Operationsforschung Statist., 13:9–18, 1982.
[249] K. Jacobs. Measure and Integral. Academic Press, New York, 1987.
[250] J. Jacod. Calcul stochastique et problèmes de martingales. Lecture Notes in Mathematics, 714, 1979.
[251] P. Jacquet and M. Régnier. Normal limiting distribution of the size of tries. In P.J. Courtois and G. Latouche, editors, Proc. Performance 87, pages 209–223, Amsterdam, 1988. Elsevier Science Publications B.V. (North Holland).
[252] R. Janssen. Discretization of the Wiener process in difference methods for stochastic differential equations. Stoch. Processes and Applications, 18:361–369, 1984.
[253] M. Jiřina and J. Nedoma. Minimax solution of a sampling inventory process. Aplikace matematiky, 1:296–314, 1957. In Czech.
[254] R. Jiroušek. A survey of methods used in probabilistic expert systems for knowledge integration. Knowledge Based Systems, 3:7–12, 1990.
[255] R. Jiroušek. Solution of the marginal problem and decomposable distributions. Kybernetika, 27(5):403–412, 1991.
[256] H. Johnen and K. Scherer. On the equivalence of K-functional and moduli of continuity and some applications. Lecture Notes in Mathematics, 571:119–130, 1977.
[257] J.P. Kahane and J. Peyrière. Sur certaines martingales de Benoit Mandelbrot. Adv. Math., 22:131–145, 1976.
[258] A.V. Kakosjan, K. Klebanov, and S.T. Rachev. Quantitative Criteria for Convergence of Probability Measures. Ayastan Press, Erevan, 1988. (In Russian; English transl.: Springer-Verlag, to appear.)
[259] A.V. Kakosjan and L.B. Klebanov. On estimates of the closeness of distributions in terms of characteristic functions. Theory of Probability and its Applications, 29:852–853, 1984.
[260] V.V. Kalashnikov and S.T. Rachev. Characterization problems in queueing theory and their stability. Advances in Applied Probability, 17:320–348, 1985.
[261] V.V. Kalashnikov and S.T. Rachev. Characterization of inverse problems in queueing and their stability, 1986.
[262] V.V. Kalashnikov and S.T. Rachev. Mathematical Methods for Construction of Stochastic Queueing Models. Wadsworth & Brooks/Cole, California, 1990.
[263] T. Kamae, U. Krengel, and G.I. O'Brien. Stochastic inequalities on partially ordered spaces. Annals of Probability, 5:899–912, 1977.
[264] S. Kanagawa. The rate of convergence for approximate solutions of stochastic differential equations. Tokyo J. Math., 12:33–48, 1986.
[265] Y. Kannai. Continuity properties of the core of a market. Econometrica, 38(6):791–815, 1970.
[266] L.V. Kantorovich. On the transfer of masses. Dokl. Akad. Nauk USSR, 37:7–8, 1942.
[267] L.V. Kantorovich. On a problem of Monge. Uspekhi Mat. Nauk, 3:225–226, 1948. In Russian.
[268] L.V. Kantorovich and G.P. Akilov. Functional Analysis. Nauka, Moscow, 3rd edition, 1984. In Russian.
[269] L.V. Kantorovich and G.Sh. Rubinstein. On a function space in certain extremal problems. Dokl. Akad. Nauk USSR, 115(6):1058–1061, 1957.
[270] L.V. Kantorovich and G.Sh. Rubinstein. On the space of completely additive functions. Vestnik Leningrad Univ., Ser. Mat. Mekh. i Astron., 13(7):52–59, 1958. In Russian.
[271] S. Karlin and W.J. Studden. Tchebycheff Systems. Interscience, New York, 1966.
[272] T. Kawata. Fourier Analysis in Probability Theory. Academic Press, New York, 1972.
[273] H.G. Kellerer. Funktionen auf Produkträumen mit vorgegebenen Marginal-Funktionen. Math. Ann., 144:323–344, 1961.
[274] H.G. Kellerer. Maßtheoretische Marginalprobleme. Math. Annalen, 153:168–198, 1964.
[275] H.G. Kellerer. Duality theorems and probability metrics. In Proc. 7th Braşov Conf., pages 211–220, Bucureşti, 1984.
[276] H.G. Kellerer. Duality theorems for marginal problems. In M. Iosifescu, editor, Proceedings of the 7th Conference on Probability Theory, Braşov, Romania, 1984.
[277] H.G. Kellerer. Duality theorems for marginal problems. Z. Wahrscheinlichkeitstheorie Verw. Geb., 67:399–432, 1984.
[278] H.G. Kellerer. Ambiguity in bounded moment problems. In AMS-IMS-SIAM Joint Research Conference: Distributions with Fixed Marginals, Double-Stochastic Measures and Markov Operators, 1993. To appear.
[279] R. Kemp. Fundamentals of the Average Case Analysis of Particular Algorithms. Wiley, New York, 1984.
[280] J.H.B. Kemperman. The general moment problem, a geometric approach. Annals of Mathematical Statistics, 39:93–122, 1968.
[281] J.H.B. Kemperman. On a class of moment problems. In Proceedings 6th Berkeley Symposium on Mathematical Statistics and Probability, volume 2, pages 101–126, 1972.
[282] J.H.B. Kemperman. On the FKG-inequality for measures on a partially ordered space. Proc. Nederl. Akad. Wet., 80:313–331, 1977.
[283] J.H.B. Kemperman. On the role of duality in the theory of moments. In Semi-Infinite Programming and Applications 1981, volume 215, pages 63–92. Springer, 1983.
[284] J.H.B. Kemperman. Geometry of the moment problem. In Proceedings of Symposia in Applied Mathematics, volume 27, pages 16–53. American Mathematical Society, 1987.
[285] J.H.B. Kemperman. Moment problems for measures on IRn with given k-dimensional marginals. In AMS-IMS-SIAM Joint Research Conference: Distributions with fixed marginals, double-stochastic measures and Markov operators, 1993. To appear.
[286] H. Kesten. Random difference equations and renewal theory for products of random matrices. Acta Math., 131:207–248, 1973.
[287] L.A. Khalfin and L.B. Klebanov. A solution of the computer tomography paradox and estimation of the distances between the densities of measures with the same marginals. Annals of Probability, 22:2235–2241, 1994.
[288] Y. Kifer. Ergodic Theory of Random Transformations. Birkhäuser, Boston, 1986.
[289] T. Kim and M.K. Richter. Nontransitive-nontotal consumer theory. Journal of Economic Theory, 38, 1986.
[290] A.Y. Kiruta, A.M. Rubinov, and E.B. Yanovskaya. Optimal Choice of Distributions in Complex Socio-Economic Problems. Nauka, Leningrad, 1980. In Russian.
[291] L.B. Klebanov, G.M. Maniya, and I.A. Melamed. A problem of Zolotarev and analogs of infinitely divisible and stable distributions in a scheme for summing a random number of random variables. Theory of Probability and its Applications, 29:791–794, 1984.
[292] L.B. Klebanov and S.T. Mkrtchian. Estimates of the closeness of distributions in terms of coinciding moments. In Problems of Stability of Stochastic Models, Proceedings, pages 64–72, Moscow, 1980.
[293] L.B. Klebanov and S.T. Rachev. The method of moments in computer tomography. Math. Scientist, 20:1–14, 1995.
[294] L.B. Klebanov and S.T. Rachev. On a special case of the basic problem in diffraction tomography. In Stochastic Models, 1995.
[295] L.B. Klebanov and S.T. Rachev. Closeness of probability measures with common marginals on a finite number of directions. In Proceedings of Distributions with Fixed Marginals and Related Topics, volume 28, pages 162–174. IMS Lecture Notes Monograph Series, 1996.
[296] L.B. Klebanov and S.T. Rachev. Proximity of probability with common marginals in a finite number of directions. In Distributions with Given Marginals, 1996.
[297] P. Kleinschmidt, C.W. Lee, and H. Schannath. Transportation problems which can be solved by the use of Hirsch paths for the dual problems. Mathematical Programming Study, 37:153–168, 1987.
[298] P.E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Berlin, 1992.
[299] M. Knott and C.S. Smith. On the optimal mapping of distributions. Journal of Optimization Theory and Applications, 43:39–49, 1984.
[300] M. Knott and C.S. Smith. Note on the optimal transportation of distributions. Journal of Optimization Theory and Applications, 52:323–329, 1987.
[301] M. Knott and C.S. Smith. On Hoeffding–Fréchet bounds and cyclic monotone relations. Journal of Multivariate Analysis, 40:328–334, 1992.
[302] M. Knott and C.S. Smith. On a generalization of cyclic monotonicity and distances among random vectors. Linear Algebra and its Applications, 199:363–371, 1994.
[303] D.E. Knuth. The Art of Computer Programming, volume II. Addison-Wesley, 1969.
[304] J. Komlós, P. Major, and G. Tusnády. An approximation of partial sums of independent r.v.s and the sample d.f., I. Z. Wahrscheinlichkeitstheorie Verw. Geb., 32:111–131, 1975.
[305] J. Komlós, P. Major, and G. Tusnády. An approximation of partial sums of independent r.v.s and the sample d.f., II. Z. Wahrscheinlichkeitstheorie Verw. Geb., 34:33–58, 1976.
[306] M.G. Krein and A.A. Nudelman. The Markov Moment Problem and Extremal Problems, 1977.
[307] W.M. Kruskal. Ordinal measures of association. Journal of the American Statistical Association, 53:814–861, 1958.
[308] J. Kuelbs. Kolmogorov's law of the iterated logarithm for Banach space valued random variables. Illinois J. Math., 21:784–800, 1977.
[309] K. Kuratowski. Topology, volume I. Academic Press, New York, 1966.
[310] K. Kuratowski. Topology, volume II. Academic Press, New York, 1969.
[311] I. Kuznezova-Sholpo and S.T. Rachev. Explicit solutions of moment problems. Probability and Mathematical Statistics, 10:297–312, 1989.
[312] J.J. Laffont and E. Maskin. A differential approach to dominant strategy mechanisms. Econometrica, 48:1507–1520, 1980.
[313] T.L. Lai and H. Robbins. Maximally dependent random variables. Proc. Nat. Acad. Sci. USA, 73:286–288, 1976.
[314] P. Lancaster. Theory of Matrices. Wiley, New York, London, 1969.
[315] D. Landers and L. Rogge. Best approximations in Lφ-spaces. Z. Wahrscheinlichkeitstheorie Verw. Geb., 51:215–237, 1980.
[316] F. Lassner. Sommes de produit de variables aléatoires indépendantes. Thesis, Université de Paris VI, 1974.
[317] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer, Berlin, 1991.
[318] S.J. Leese. Multifunctions of Suslin type. Bull. Austral. Math. Soc., 11:395–411, 1975; addendum 13:159–160.
[319] G. Letac. Représentation des mesures de probabilité sur le produit de deux espaces dénombrables, de marges données. Ann. Inst. Fourier, 16:497–507, 1966.
[320] G. Letac. A contraction principle for certain Markov chains and its applications. In J.E. Cohen, H. Kesten, and C.M. Newman, editors, Random Matrices and Their Applications, Proc. AMS-IMS-SIAM Joint Summer Research Conf. 1984, volume 50 of Contemp. Math., pages 263–273, Providence, R.I., 1986. Amer. Math. Soc.
[321] V.L. Levin. Application of E. Helly's theorem to convex programming, problems of best approximation and related questions. USSR Math. Sbornik, 8:235–248, 1969.
[322] V.L. Levin. Duality and approximation in the problem of mass transfer. In B.S. Mityagin, editor, Mathematical Economics and Functional Analysis, pages 94–108. Nauka, Moscow, 1974. In Russian.
[323] V.L. Levin. On the problem of mass transfer. Soviet Math. Doklady, 16:1349–1353, 1975.
[324] V.L. Levin. On the theorems in the Monge–Kantorovich problem. Uspekhi Mat. Nauk, 32:171–172, 1977. In Russian.
[325] V.L. Levin. The mass transfer problem, strong stochastic domination and probability measures on the product of two compact spaces with given projections. Preprint, TsEMI, Moscow, 1978a. In Russian.
[326] V.L. Levin. The Monge–Kantorovich problem on mass transfer. In Methods of Functional Analysis in Mathematical Economics, pages 23–55. Nauka, Moscow, 1978b. In Russian.
[327] V.L. Levin. Measurable selections of multivalued mappings into topological spaces and upper envelopes of Carathéodory integrands. Soviet Math. Doklady, 21:771–775, 1980.
[328] V.L. Levin. Some applications of duality for the problem of translocation of masses with a lower semicontinuous cost function. Closed preferences and Choquet theory. Soviet Math. Doklady, 2:262–267, 1981.
[329] V.L. Levin. A continuous utility theorem for closed preorders on a compact metrizable space. Soviet Math. Doklady, 28:715–718, 1983a.
[330] V.L. Levin. Measurable utility theorems for closed and lexicographic preference relations. Soviet Math. Doklady, 27:639–643, 1983b.
[331] V.L. Levin. Lipschitz preorders and Lipschitz utility functions. Russian Mathematical Surveys, 39:199–200, 1984a.
[332] V.L. Levin. The mass transfer problem in topological space and probability measures on the product of two spaces with given marginal measures. Soviet Math. Doklady, 29:638–643, 1984b.
[333] V.L. Levin. Convex Analysis in Spaces of Measurable Functions and Its Applications in Mathematics and Economics. Nauka, Moscow, 1985a. In Russian.
[334] V.L. Levin. Functionally closed preorders and strong stochastic dominance. Soviet Math. Doklady, 32:22–26, 1985b.
[335] V.L. Levin. Extremal problems with probability measures, functionally closed preorders and strong stochastic dominance. In Stochastic Optimization, volume 81 of Lecture Notes in Control and Information Science, pages 435–447, Berlin, New York, 1986. Proc. Int. Conf. Kiev 1984, Springer-Verlag.
[336] V.L. Levin. Measurable selectors of multivalued mappings and the mass transfer problem. Dokl. Akad. Nauk USSR, 292:1048–1053, 1987.
[337] V.L. Levin. General Monge–Kantorovich problem and its applications in measure theory and mathematical economics. In L.J. Leifman, editor, Functional Analysis, Optimization and Mathematical Economics. Oxford University Press, 1990. A collection of papers dedicated to the memory of L.V. Kantorovich.
[338] V.L. Levin. Some applications of set-valued mappings in mathematical economics. Journal of Mathematical Economics, 20:69–87, 1991.
[339] V.L. Levin. A formula for the optimal value in the Monge–Kantorovich problem with a smooth cost function and a characterization of cyclically monotone mappings. USSR Math. Sbornik, 71:533–548, 1992.
[340] V.L. Levin. Private communication, 1994.
[341] V.L. Levin. Quasi-convex functions and quasi-monotone operators. Journal of Convex Analysis, 2, 1995a.
[342] V.L. Levin. Reduced cost functions and their applications. Journal of Mathematical Economics, 1995b. To appear.
[343] V.L. Levin and A.A. Milyutin. The mass transfer problem with discontinuous cost function and a mass setting for the problem of duality of convex extremum problems. Russian Math. Surveys, 34:1–78, 1979.
[344] V.L. Levin and S.T. Rachev. New duality theorems for marginal problems with some applications in stochastics. Lecture Notes in Mathematics, 1412:137–170, 1989.
[345] M. Loeve. Probability Theory. Van Nostrand, 1977.
[346] G.G. Lorentz. A problem of plane measure. Amer. J. Math., 71:417–426, 1949.
[347] G.G. Lorentz. An inequality for rearrangements. American Mathematical Monthly, 60:176–179, 1953.
[348] G. Louchard. Exact and asymptotic distributions in digital and binary search trees. Theor. Inf. Appl., 21:479–495, 1987.
[349] R. Lucchetti and F. Patrone. Closure and upper semicontinuity results in mathematical programming, Nash and economic equilibria. Mathematische Operationsforschung und Statistik, Series Optimization, 17:619–628, 1986.
[350] N. Lusin. Leçons sur les Ensembles Analytiques. Gauthier-Villars, 1930.
[351] M. Maejima. Some limit theorems for summability methods of i.i.d. random variables. In V.V. Kalashnikov et al., editors, Stability Problems for Stochastic Models (Varna 1985), volume 1233 of Lecture Notes in Mathematics, pages 57–68, 1985.
[352] M. Maejima. Some limit theorems for summability methods of i.i.d. random variables. Lecture Notes in Mathematics, 1233:57–68, 1988.
[353] M. Maejima and S.T. Rachev. An ideal metric and the rate of convergence to a self-similar process. Annals of Probability, 15:708–727, 1987.
[354] M. Maejima and S.T. Rachev. Rates of convergence in the operator-stable limit theorems. J. Theor. Probability, 9:37–86, 1996.
[355] H.M. Mahmoud. Evolution of Random Search Trees. Wiley, New York, London, 1992.
[356] G.D. Makarov. Estimates for the distribution function of a sum of two random variables when the marginal distributions are fixed. Theory of Probability and its Applications, 26:803–806, 1981.
[357] C.L. Mallows. A note on asymptotic joint normality. Annals of Mathematical Statistics, 43:508–515, 1972.
[358] B.B. Mandelbrot. Multiplications aléatoires itérées et distributions invariantes par moyenne pondérée aléatoire. C.R. Acad. Sci. Paris, 278, 1974.
[359] B.B. Mandelbrot and M. Taylor. On the distribution of stock price differences. Oper. Res., 15:1057–1062, 1967.
[360] M. Marcus. Some properties and applications of doubly stochastic matrices. American Mathematical Monthly, 67:215–222, 1960.
[361] A.W. Marshall and I. Olkin. Theory of Majorization and Its Applications. Academic Press, New York, 1979.
[362] G. Maruyama. Continuous Markov processes and stochastic equations. Rend. Circolo Math. Palermo, 4:48–90, 1955.
[363] A. Mas-Colell. On the continuous representation of preorders. Intern. Econ. Review, 18:509–513, 1977.
[364] E. Maskin and J. Riley. Monopoly with incomplete information. Rand Journal of Economics, 15:171–196, 1984.
[365] J.L. Massey. Collision-resolution algorithms and random-access communications. In Multi-User Communication Systems, CISM Courses and Lectures, 1981.
[366] R. Mathar and D. Pfeifer. Stochastik für Informatiker. Teubner, Stuttgart, 1990.
[367] G. Matheron. Random Sets and Integral Geometry. Wiley, 1975.
[368] M. Meerschaert. Moments of random vectors which belong to some domain of normal attraction. Annals of Probability, 18:870–876, 1989.
[369] M. Meerschaert. Spectral decomposition for generalized domains of attraction. Annals of Probability, 19:875–892, 1991.
[370] K. Mehlhorn. Datenstrukturen und effiziente Algorithmen, volume I. Teubner, Stuttgart, 1986.
[371] I. Meilijson and A. Nadas. Convex majorization with application to the length of critical paths. Journal of Applied Probability, 16:671–677, 1979.
[372] D. Mejzler. On the problem of the limit distributions for the maximal term of a variational series. Lvov Politechn. Inst. Naucn. Zap. Ser. Fiz.-Mat., 38:90–109, 1956. In Russian.
[373] E. Michael. Continuous selections. Ann. of Math., 63:361–382, 1956.
[374] P. Mikusinski, H. Sherwood, and M.D. Taylor. Probabilistic interpretations of copulas and their convex sums. In Symp. Probab. Measures with Given Marginals, volume 67 of Math. Appl., pages 95–112, Rome, 1991.
[375] G.N. Milshtein. A method of second-order accuracy integration of stochastic differential equations. Theory of Probability and its Applications, 23, 1978.
[376] G.N. Milshtein. Numerical integration of stochastic differential equations. Izd. Ural. Univ., Sverdlovsk, 1988. In Russian.
[377] J.A. Mirrlees. Optimal tax theory: a synthesis. Journal of Public Economics, 6:327–358, 1976.
[378] S. Mittnik and S.T. Rachev. Alternative multivariate stable distributions and their applications to financial modeling. In S. Cambanis, G. Samorodnitsky, and M.S. Taqqu, editors, Stable Processes and Related Topics, pages 107–120, Boston, 1991. Birkhäuser.
[379] S. Mittnik and S.T. Rachev. Modeling asset returns with alternative stable laws. Econometric Reviews, 12(3):261–330, 1993.
[380] S. Mittnik and S.T. Rachev. Reply to comments on "Modeling asset returns with alternative stable laws" and some extensions. Econometric Reviews, 12(3):347–389, 1993.
[381] S. Mittnik and S.T. Rachev. Modelling Financial Assets with Alternative Stable Models. Series in Financial Economics and Quantitative Analysis. Wiley, New York, 1997.
[382] G. Monge. Mémoire sur la théorie des déblais et des remblais, 1781.
[383] F. Mosteller, C. Youtz, and D. Zahn. The distribution of sums of rounded percentages. Demography, 4:850–858, 1967.
[384] K.R. Mount and S. Reiter. Construction of a continuous utility function for a class of preferences. Journal of Mathematical Economics, 3:227–245, 1976.
[385] L. Nachbin. Topology and Order. Van Nostrand, New York, 1965.
[386] R.B. Nelsen. Copulas and association. In Symp. Probab. Measures with Given Marginals, pages 51–74, Rome, 1991. Kluwer.
[387] W. Neuefeind. On continuous utility. Journal of Economic Theory, 5:174–176, 1972.
[388] J. Neveu. Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco, 1965.
[389] J. Neveu and R.M. Dudley. On Kantorovich–Rubinstein theorems. (Transcript), 1980.
[390] V.B. Nevzorov. Records. Theory of Probability and its Applications, 32:201–228, 1988.
[391] N.J. Newton. An asymptotically efficient difference formula for solving stochastic differential equations. Stochastics, 19:175–206, 1986.
[392] I. Olkin and F. Pukelsheim. The distance between two random vectors with given dispersion matrices. Linear Algebra and its Applications, 48:257–263, 1982.
[393] I. Olkin and F. Pukelsheim. Marginal problems with additional constraints. Technical Report 270, Department of Statistics, Stanford University, Stanford, CA, 1990.
[394] I. Olkin and S.T. Rachev. Distances among random vectors with given dispersion matrices. Preprint, Department of Statistics, Stanford University, Stanford, CA, 1991.
[395] I. Olkin and S.T. Rachev. Maximum submatrix traces for positive definite matrices. SIAM Journal on Matrix Analysis and Applications, 14:390–397, 1993.
[396] J.M. Ortega and W.C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, 1970.
[397] J. Pachl. Two classes of measures. Colloq. Math., 42:331–340, 1979.
[398] E. Pardoux and D. Talay. Discretization and simulation of stochastic differential equations. Acta Appl. Math., 3:23–47, 1985.
[399] V. Paulauskas and A. Rackauskas. Approximation Theory in the Central Limit Theorem. Kluwer Academic Publishers, 1989.
[400] A.S. Paulson and V.R.R. Uppuluri. Limit laws of a sequence determined by a random difference equation governing a one-compartment system. Math. Biosci., 13:325–333, 1972.
[401] A. Perez and R. Jirousek. Constructing an intensional expert system INES. In J.H. van Bemmel, F. Gremy, and J. Zvarova, editors, Medical Decision Making: Diagnostic Strategies and Expert Systems, pages 307–315. North-Holland, 1985.
[402] S. Perrakis and C. Henin. Evaluation of risky investments with random timing of cash returns. Management Sci., 21:79–86, 1974.
[403] D. Pfeifer. Some remarks on Nevzorov's record model. Advances in Applied Probability, 23:823–834, 1991.
[404] G. Pflug. Stochastische Modelle in der Informatik. Teubner, Stuttgart, 1986.
[405] G. Pisier and J. Zinn. On limit theorems for random variables with values in the spaces Lp. Z. Wahrscheinlichkeitstheorie Verw. Geb., 41:286–305, 1977.
[406] B. Pittel. Paths in a random digital tree: Limiting distributions. Advances in Applied Probability, 18:139–155, 1986.
[407] E. Platen. An approximation method for a class of Itô processes. Lietuvos Math. Rink. XXI, 1:121–133, 1981.
[408] D. Pollard. Convergence of Stochastic Processes. Springer, 1984.
[409] C.J. Preston. A generalization of the FKG inequalities. Comm. Math. Phys., 36:233–241, 1974.
[410] P.S. Puri. On almost sure convergence of an erosion process due to Todorovic and Gani. Journal of Applied Probability, 24:1001–1005, 1987.
[411] G. Pyatt and J.J. Round, editors. Social Accounting Matrices: A Basis for Planning. World Bank, Washington, D.C., 1985.
[412] R. Pyke and D. Root. On convergence in r-mean of normalized partial sums. Annals of Mathematical Statistics, 39:379–381, 1968.
[413] S.T. Rachev. On a metric construction of Hausdorff in a space of probability measures. Zapiski Nauchn. Sem. LOMI, 87:87–104, 1978.
[414] S.T. Rachev. Minimal metrics in a space of real random variables. Dokl. Akad. Nauk SSSR, 257(5):1067–1070, 1981.
[415] S.T. Rachev. On minimal metrics in the space of real-valued random variables. Soviet Dokl. Math., 23(2):425–438, 1981a.
[416] S.T. Rachev. Minimal metrics in the random variables spaces. Publ. Inst. Stat. Univ. Paris, 27(1):27–47, 1982a.
[417] S.T. Rachev. Minimal metrics in the random variables spaces. In W. Grossmann et al., editors, Probability and Statistical Inference, Proceedings of the 2nd Pannonian Symp., pages 319–327, Dordrecht, 1982b. D. Reidel Company.
[418] S.T. Rachev. Compactness in the probability measures space. In M. Galyare et al., editors, Proceedings of the 3rd European Young Statisticians Meeting, pages 136–150, Katholieke Univ., Leuven, 1983a.
[419] S.T. Rachev. Minimal metrics in the real valued random variable spaces. Lecture Notes in Mathematics, 982:172–190, 1983b.
[420] S.T. Rachev. Hausdorff metric construction in the probability measures space. Pliska Studia Mathematica, 7:152–162, 1984a.
[421] S.T. Rachev. The Monge–Kantorovich mass transference problem and its stochastic applications. Theory of Probability and its Applications, 29:647–676, 1984b.
[422] S.T. Rachev. On a class of minimal functionals on a space of probability measures. Theory of Probability and its Applications, 29(1):41–49, 1984c.
References
[423] S.T. Rachev. On a problem of Dudley. 29(2):162–164, 1984d.
381
Soviet Math. Doklady,
[424] S.T. Rachev. Extreme functionals in the space of probability measures. Lecture Notes in Mathematics, 1155:320–348, 1985a. Proc. "Stability Problems for Stochastic Models".
[425] S.T. Rachev. Probability metrics and their applications to the stability problems for stochastic models, 1985b. Author's review of doctor of sciences thesis, Steklov Mathematical Institute, USSR Academy of Sciences, Moscow. In Russian.
[426] S.T. Rachev. Extreme functionals in the space of probability measures. In Probability Theory and Mathematical Statistics, 2:474–476. VNU Science Press, 1986.
[427] S.T. Rachev. Minimal metrics in a space of random vectors with fixed one-dimensional marginal distributions. J. Soviet Math., 34(2):1542–1555, 1986. Stability Problems for Stochastic Models, Proceedings, Moscow, VNIISI.
[428] S.T. Rachev. The stability of stochastic models. Applied Probability Newsletter, 12(2):3–4, 1988.
[429] S.T. Rachev. The problem of stability in queueing theory. Queueing Systems Theory and Applications, 4:287–318, 1989.
[430] S.T. Rachev. Mass transshipment problems and ideal metrics. Numer. Func. Anal. & Optimiz., 12(5&6):563–573, 1991a.
[431] S.T. Rachev. Optimal mass transportation problems. In Proceedings of XI Congres de Metodologias en Ingenieria de Sistemas, pages 115–120, Azocar, Santiago de Chile, 1991b.
[432] S.T. Rachev. Probability Metrics and the Stability of Stochastic Models. Wiley, Chichester–New York, 1991c.
[433] S.T. Rachev. Theory of probability metrics and recursive algorithms. In S. Joly and G. le Calve, editors, Distancia 1992, Proceedings of Congres International sur Analyse en Distance, pages 339–403, Université de Haute Bretagne, Rennes, 1992.
[434] S.T. Rachev and G.S. Chobanov. Minimality of ideal probabilistic metrics. Pliska, 2:1154–1158, 1986. In Russian.
[435] S.T. Rachev, B. Dimitrov, and Z. Khalil. A probabilistic approach to optimal quality usage. Computers and Mathematics with Applications, 24(8/9):219–227, 1992.
[436] S.T. Rachev and Z. Ignatov. Minimality of ideal probabilistic metrics. J. Soviet Math., 32(6):595–608, 1986.
[437] S.T. Rachev and S.I. Resnick. Max-geometric infinite divisibility and stability. Stoch. Models, 2:191–218, 1991.
[438] S.T. Rachev and L. Rüschendorf. Approximation of sums by compound Poisson distributions with respect to stop-loss distances. Advances in Applied Probability, 22:350–374, 1990.
[439] S.T. Rachev and L. Rüschendorf. A counterexample to a.s. constructions. Stat. Prob. Letters, 9:307–309, 1990a.
[440] S.T. Rachev and L. Rüschendorf. A transformation property of minimal metrics. Theory of Probability and its Applications, 35:131–137, 1990b.
[441] S.T. Rachev and L. Rüschendorf. Approximate independence of distributions on spheres and their stability properties. Annals of Probability, 19:1311–1337, 1991.
[442] S.T. Rachev and L. Rüschendorf. Recent results in the theory of probability metrics. Statistics & Decisions, 9:327–373, 1991a.
[443] S.T. Rachev and L. Rüschendorf. A new ideal metric with applications to multivariate stable limit theorems, summability methods and compound Poisson approximation. Probability Theory and Related Fields, 94:163–187, 1992.
[444] S.T. Rachev and L. Rüschendorf. Rate of convergence for sums and maxima and doubly ideal metrics. Theory of Probability and its Applications, 37:276–289, 1992a.
[445] S.T. Rachev and L. Rüschendorf. On constrained transportation problems. In Proceedings of the 32nd Conference on Decision and Control, volume 3, pages 2896–2900. IEEE Control System Society, 1993.
[446] S.T. Rachev and L. Rüschendorf. On the Cox, Ross and Rubinstein model for option pricing. Theory of Probability and its Applications, 39:150–190, 1994.
[447] S.T. Rachev and L. Rüschendorf. On the rate of convergence in the CLT with respect to the Kantorovich metric. In J. Kuelbs, M. Marcus, and J. Hoffmann-Jørgensen, editors, 9th Conf. on Probability on Banach Spaces, pages 193–207, Boston–Basel–Berlin, 1994a. Birkhäuser.
[448] S.T. Rachev and L. Rüschendorf. Propagation of chaos and contraction of stochastic mappings. Siberian Advances in Mathematics, 4:114–150, 1994b.
[449] S.T. Rachev and L. Rüschendorf. Solution of some transportation problems with relaxed or additional constraints. SIAM Journal on Control and Optimization, 32(3):673–689, 1994c.
[450] S.T. Rachev and L. Rüschendorf. Probability metrics and recursive algorithms. Journal of Applied Probability, 27:770–799, 1995. Technical Report (1991).
[451] S.T. Rachev and L. Rüschendorf. Propagation of chaos and contraction of stochastic mappings. Siberian Adv. Math., 4:114–150, 1995a.
[452] S.T. Rachev, L. Rüschendorf, and A. Schief. Uniformities for the convergence in law and probability. Journal of Theoretical Probability, 5:33–44, 1992.
[453] S.T. Rachev and G. Samorodnitsky. Geometric stable distributions in Banach spaces. Journal of Theoretical Probability, 7(29):351–373, 1994.
[454] S.T. Rachev and G. Samorodnitsky. Limit laws for a stochastic process and random recursion arising in probabilistic modelling. Advances in Applied Probability, 27:185–203, 1995.
[455] S.T. Rachev and A. Schief. On Lp-minimal metrics. Probability and Mathematical Statistics, 13(2):311–320, 1992.
[456] S.T. Rachev and A. SenGupta. Geometric stable distributions and Laplace–Weibull mixtures. Statistics & Decisions, 10:251–271, 1992.
[457] S.T. Rachev and A. SenGupta. Laplace–Weibull mixtures for modeling price changes. Management Science, pages 1029–1038, 1993.
[458] S.T. Rachev and R.M. Shortt. Classification problem for probability metrics, volume 94 of Contemporary Mathematics, pages 221–262. AMS, 1989.
[459] S.T. Rachev and R.M. Shortt. Duality theorems for Kantorovich–Rubinstein and Wasserstein functionals. Dissertationes Mathematicae, 299:647–676, 1990.
[460] S.T. Rachev and M. Taksar. Kantorovich's functionals in space of measures. In I. Karatzas and D. Ocone, editors, Applied Stochastic Analysis, volume 77 of Lecture Notes in Control and Information Science, pages 248–261, Berlin–New York, 1992. Proceedings of the US–French Workshop, Springer-Verlag.
[461] S.T. Rachev and P. Todorovic. On the rate of convergence of some functionals of a stochastic process. Journal of Applied Probability, 28:805–814, 1990.
[462] S.T. Rachev and J.E. Yukich. Rates for the CLT via new ideal metrics. Annals of Probability, 17:775–788, 1989.
[463] S.T. Rachev and J.E. Yukich. Smoothing metrics for measures on groups with applications to random motions. Annales de l'Institut Henri Poincaré, 25:429–941, 1990.
[464] S.T. Rachev and J.E. Yukich. Rates of convergence of α-stable random motions. J. Theor. Prob., 4:333–352, 1991.
[465] A. Rackauskas. On the convergence rate in martingale CLT in Hilbert spaces. Preprint 90-031, University of Bielefeld, 1990.
[466] D. Ramachandran. Perfect Measures. Part I: Basic Theory, volume 5. Macmillan, New Delhi, 1979.
[467] D. Ramachandran. Perfect Measures. Part II: Special Topics, volume 7. Macmillan, New Delhi, 1979.
[468] D. Ramachandran. Marginal problem in arbitrary product spaces. In Proceedings of the conference on "Distributions with Fixed Marginals, Double Stochastic Measures and Markov Operators", volume 28, pages 260–272, Seattle, August 1993. IMS Lecture Notes Monograph Series 1997.
[469] D. Ramachandran and L. Rüschendorf. A general duality theorem for marginal problems. Probability Theory and Related Fields, 101:311–319, 1995.
[470] D. Ramachandran and L. Rüschendorf. Duality and perfect probability spaces. Proc. Amer. Math. Soc., 124:2223–2228, 1996a.
[471] D. Ramachandran and L. Rüschendorf. Duality theorems for assignments with upper bounds. In Distributions with Fixed Marginals and Moment Problems, pages 283–290. Kluwer, 1997.
[472] D. Ramachandran and L. Rüschendorf. On the validity of the Monge–Kantorovich duality theorem. Preprint, 1997.
[473] F. Ramsey. A mathematical theory of saving. Economic Journal, 38:543–559, 1928.
[474] M. Regnier and P. Jacquet. New results on the size of tries. IEEE Transactions on Information Theory, 35:203–205, 1989.
[475] S.I. Resnick and P. Greenwood. A bivariate stable characterization and domains of attraction. Journal of Multivariate Analysis, 9:206–221, 1979.
[476] M.K. Richter. Duality and rationality. Journal of Economic Theory, 20:131–181, 1979.
[477] H. Robbins. The maximum of identically distributed random variables. I.M.S. Bull., March 1975. Abstract.
[478] H. Robbins and D. Siegmund. A convergence theorem for nonnegative almost supermartingales. In Rustagi, editor, Optimizing Methods in Statistics, pages 233–258. Academic Press, 1971.
[479] J.C. Rochet. The taxation principle and multi-time Hamilton–Jacobi equation. Journal of Mathematical Economics, 14:113–128, 1985.
[480] J.C. Rochet. A necessary and sufficient condition for rationalizability in a quasi-linear context. Journal of Mathematical Economics, 16:191–200, 1987.
[481] R.T. Rockafellar. Characterization of the subdifferentials of convex functions. Pacific J. Math., 17:497–510, 1966.
[482] R.T. Rockafellar. Convex Analysis. Princeton Univ. Press, Princeton, NJ, 1970.
[483] C. Rogers. Coupling of random walks, 1992. Private communication.
[484] W.W. Rogosinski. Moments of non-negative mass. In Proceedings of Royal Society London, Ser. A, volume 245, pages 1–27, 1958.
[485] W. Römisch. An approximation method in stochastic optimization and control. In Optimization Techniques, volume 22, pages 169–178. Proc. 9th IFIP Conf., Warsaw 1979, Part 1, Lecture Notes in Control and Information Science, 1980.
[486] W. Römisch. On discrete approximations in stochastic programming, 1981. Seminarbericht.
[487] W. Römisch and R. Schultz. Stability analysis of stochastic programs. Ann. Operat. Res., 30:241–266, 1991.
[488] W. Römisch and R. Schultz. Stability of solutions for stochastic programs with complete recourse. Mathematics of Operations Research, 18:590–609, 1993.
[489] W. Römisch and A. Wakolbinger. On Lipschitz dependence in systems with differentiated inputs. Math. Ann., 272:237–248, 1985.
[490] U. Rösler. A limit theorem for Quicksort. Informatique Théorique et Applications, 25:85–100, 1991.
[491] U. Rösler. A fixed point theorem for distributions. Stoch. Processes and Applications, 37:195–214, 1992.
[492] S.M. Ross. A simple heuristic approach to simplex efficiency. European J. Oper. Res., 9:344–346, 1982.
[493] S.M. Ross. Stochastic Processes. Wiley, New York, 1983.
[494] B. Rüger. Scharfe untere und obere Schranken für die Wahrscheinlichkeit der Realisation von k unter n Ereignissen. Metrika, 26:71–77, 1979.
[495] L. Rüschendorf. Vergleich von Zufallsvariablen bzgl. integralinduzierter Halbordnungen, 1979. Habilitationsschrift.
[496] L. Rüschendorf. Inequalities for the expectation of ∆-monotone functions. Z. Wahrscheinlichkeitstheorie Verw. Geb., 54:341–349, 1980.
[497] L. Rüschendorf. Ordering of distributions and rearrangement of functions. Annals of Probability, 9:276–283, 1980.
[498] L. Rüschendorf. Sharpness of Fréchet bounds. Z. Wahrscheinlichkeitstheorie Verw. Geb., 57:293–302, 1981.
[499] L. Rüschendorf. Random variables with maximum sums. Advances in Applied Probability, 14:623–632, 1982.
[500] L. Rüschendorf. On the multidimensional assignment problem. Methods of OR, 47:107–113, 1983.
[501] L. Rüschendorf. Solution of a statistical optimization problem by rearrangement methods. Metrika, 30:55–62, 1983.
[502] L. Rüschendorf. On the minimum discrimination information theorem. Statistics & Decisions, 1:263–283, 1984. Suppl. Issue.
[503] L. Rüschendorf. Construction of multivariate distributions with given marginals. Ann. Inst. Stat. Math., 37:225–233, 1985.
[504] L. Rüschendorf. The Wasserstein distance and approximation theorems. Z. Wahrscheinlichkeitstheorie Verw. Geb., 70:117–129, 1985.
[505] L. Rüschendorf. Monotonicity and unbiasedness of tests via a.s. constructions. Statistics, 17:221–230, 1986.
[506] L. Rüschendorf. Fréchet bounds and their applications. In G. Dall'Aglio, S. Kotz, and G. Salinetti, editors, Advances in Probability Distributions with Given Marginals, pages 151–188. Kluwer, Amsterdam, 1991.
[507] L. Rüschendorf. Bounds for distributions with multivariate marginals. In K. Mosler and M. Scarsini, editors, Stochastic Orders and Decision under Risk, volume 19, pages 285–310. IMS Lecture Notes, 1991a.
[508] L. Rüschendorf. Conditional stochastic ordering of distributions. Advances in Applied Probability, 23:46–63, 1991b.
[509] L. Rüschendorf. Stochastic ordering of likelihood ratios and partial sufficiency. Statistics, 22:551–558, 1991c.
[510] L. Rüschendorf. Optimal solutions of multivariate coupling problems. Applicationes Mathematicae, 22:325–338, 1995.
[511] L. Rüschendorf. Developments on Fréchet bounds. In Proceedings of Distributions with Fixed Marginals and Related Topics, volume 28, pages 273–296. IMS Lecture Notes Monograph Series, 1996.
[512] L. Rüschendorf. On c-optimal random variables. Statistics Prob. Letters, 27:267–270, 1996.
[513] L. Rüschendorf and S.T. Rachev. A characterization of random variables with minimum L2-distance. Journal of Multivariate Analysis, 32:48–54, 1990.
[514] L. Rüschendorf, B. Schweizer, and M.D. Taylor, editors. Distributions with Fixed Marginals and Related Topics, volume 28. IMS Lecture Notes Monograph Series, 1996.
[515] L. Rüschendorf and L. Uckelmann. On optimal multivariate couplings. In Distributions with Given Marginals and Moment Problems, pages 261–274. Kluwer, 1997.
[516] T. Rychlik. Stochastically extremal distributions of order statistics for dependent samples. Statistics & Probability Letters, 13:337–341, 1992.
[517] C. Ryll-Nardzewski. On quasi-compact measures. Fund. Math., 40:125–130, 1953.
[518] G. Samorodnitsky and M. Taqqu. Stable Non-Gaussian Random Processes. Stochastic Models with Infinite Variance. Chapman & Hall, New York, 1994.
[519] E. Samuel and R. Bachi. Measures of distance of distribution functions and some applications. Metron, 23:83–122, 1964.
[520] V.V. Sazonov. Normal approximation - some recent advances. Lecture Notes in Mathematics, 879, 1981.
[521] H.H. Schaefer. Topological Vector Spaces. Springer, New York, 1966.
[522] M. Schaefer. Note on the k-dimensional Jensen inequality. Annals of Probability, 2:502–504, 1976.
[523] G. Schay. Optimal joint distributions of several random variables with given marginals. Stud. Appl. Math., LXI:179–183, 1979.
[524] L. Schwartz. Radon Measures on Arbitrary Topological Spaces and Cylindrical Measures. Oxford University Press, London, 1973.
[525] B. Schweizer. Thirty years of copulas. In G. Dall'Aglio, S. Kotz, and G. Salinetti, editors, Symp. Probab. Measures with Given Marginals, pages 13–50, Rome, 1990. Kluwer.
[526] B. Schweizer and A. Sklar. Probabilistic Metric Spaces. Elsevier, North-Holland, 1983.
[527] L. Seidel. On limit distributions of random symmetric polynomials. Theory of Probability and its Applications, 23:266–278, 1988.
[528] V.V. Senatov. Uniform estimates of the rate of convergence in the multi-dimensional central limit theorem. Theory of Probability and its Applications, 25:745–759, 1980.
[529] V.V. Senatov. Some lower estimates for the rate of convergence in the multi-dimensional central limit theorem. Soviet Math. Doklady, 23:188–192, 1981.
[530] W.J. Shafer and H.F. Sonnenschein. Equilibrium in abstract economies without ordered preferences. Journal of Mathematical Economics, 2:345–348, 1975.
[531] L.S. Shapley and M. Shubik. The assignment game, I: The core. Int. J. Game Theory, 1:110–130, 1972.
[532] M. Sharpe. Operator-stable probability distributions on vector groups. Trans. Amer. Math. Soc., 136:51–65, 1969.
[533] A.N. Shiryaev. Probability Theory. Springer, 1984.
[534] J.A. Shohat and J.D. Tamarkin. The Problem of Moments. American Mathematical Society, Providence, 1943.
[535] I.A. Sholpo. ε-minimal metrics. Theory of Probability and its Applications, 28:854–855, 1983.
[536] G.R. Shorack and J.A. Wellner. Empirical Processes with Applications to Statistics. Wiley, New York, 1986.
[537] R.M. Shortt. Private communication.
[538] R.M. Shortt. Combinatorial methods in the study of marginal problems over separable spaces. Journal of Mathematical Analysis and Applications, 97:462–479, 1983.
[539] R.M. Shortt. Strassen's marginal problems in two or more dimensions. Z. Wahrscheinlichkeitstheorie Verw. Geb., 64:313–325, 1983.
[540] R.M. Shortt. Universally measurable spaces: An invariance theorem and diverse characterizations. Fund. Math. Th., 121:35–42, 1983.
[541] H.J. Skala. The existence of probability measures with given marginals. Annals of Probability, 21:136–142, 1993.
[542] M. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris, 8:229–231, 1959.
[543] C.S. Smith and M. Knott. Note on the optimal transportation of distributions. Journal of Optimization Theory and Applications, 52:323–329, 1987.
[544] C.S. Smith and M. Knott. On Hoeffding–Fréchet bounds and cyclic monotone relations. Journal of Multivariate Analysis, 40:328–334, 1992.
[545] T.A.B. Snijders. Antithetic variates for Monte Carlo estimation of probabilities. Statistica Neerlandica, 38:1–19, 1984.
[546] D. Stoyan. Comparison Methods for Queues and Other Stochastic Models. Wiley, 1983.
[547] V. Strassen. The existence of probability measures with given marginals. Annals of Mathematical Statistics, 36(2):423–439, 1965.
[548] J. Štěpán. Simplicial measures. In Memor. Vol. of J. Hájek, pages 239–251. Academia Prague, 1977.
[549] J. Štěpán. Probability measures with given expectations. In Proc. of the 2nd Prague Symp. on Asympt. Statistics, pages 315–320. North Holland, 1979.
[550] V.N. Sudakov. Geometric problems in the theory of infinite dimensional probability distributions. Proc. Steklov Inst. Math., 141(2), 1979.
[551] H. Sussmann. On the gap between deterministic and stochastic differential equations. Annals of Probability, 6:19–41, 1978.
[552] A.S. Sznitman. Équations de type de Boltzmann, spatialement homogènes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 66:559–592, 1984.
[553] A.S. Sznitman. Propagation of chaos. In École d'Été Saint-Flour, volume 1464 of Lecture Notes in Mathematics, pages 165–251, 1989.
[554] A. Szulga. On the Wasserstein metric. In Transactions of the 8th Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, volume B, pages 267–273, Prague, 1978. Akademia Praha.
[555] A. Szulga. On minimal metrics in the space of random variables. Theory of Probability and its Applications, 27:424–430, 1982.
[556] W. Szwarc and M. Posner. The tridiagonal transportation problem. Operations Research Letters, 3:25–30, 1984.
[557] M. Talagrand. Matching random samples in many dimensions. Annals of Applied Probability, 2:846–856, 1992.
[558] D. Talay. Résolution trajectorielle et analyse numérique des équations différentielles stochastiques. Stochastics, 9:275–306, 1988.
[559] H. Tanaka. An inequality for a functional of probability distributions and its applications to Kac's one-dimensional model of a Maxwellian gas. Z. Wahrscheinlichkeitstheorie Verw. Geb., 27:47–52, 1973.
[560] H. Tanaka. Probabilistic treatment of the Boltzmann equation for Maxwellian molecules. Z. Wahrscheinlichkeitstheorie Verw. Geb., 46:67–105, 1978.
[561] A.H. Tchen. Inequalities for distributions with given marginals. Annals of Probability, 8:814–827, 1980.
[562] P. Todorovic. An extremal problem arising in soil erosion modeling. In I.B. MacNeill and G.J. Umphrey, editors, pages 65–73. Reidel, Dordrecht, 1987.
[563] P. Todorovic and J. Gani. Modeling of the effect of erosion on crop production. Journal of Applied Probability, 24:787–797, 1987.
[564] Y.L. Tong. Probability Inequalities in Multivariate Distributions. Academic Press, 1980.
[565] D.M. Topkis and A.F. Veinott, Jr. Monotone solution of extremal problems on lattices (abstract). In Abstracts of the 8th International Symposium on Mathematical Programming, volume 131, Stanford, CA, 1973. Stanford University.
[566] A. Tuero-Diaz. Aplicaciones crecientes: Relaciones con las métricas de Wasserstein. PhD thesis, Universidad de Cantabria, 1991.
[567] A. Tuero-Diaz. On the stochastic convergence of representations based on Wasserstein metrics. Annals of Probability, 21:72–85, 1993.
[568] L. Uckelmann. Konstruktion von optimalen Couplings. Diplomarbeit, Universität Münster, 1993.
[569] L. Uckelmann. Optimal couplings between one dimensional distributions. In Distributions with Given Marginals and Moment Problems, pages 275–282. Kluwer, 1997.
[570] V.R.R. Uppuluri, P.I. Feder, and L.R. Shenton. Random difference equations occurring in one-compartment models. Math. Biosci., 2:143–171, 1967.
[571] S.S. Vallander. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability and its Applications, 18:784–786, 1973.
[572] A.F. Veinott, Jr. Representation of general and polyhedral sublattices and sublattices of product spaces. Linear Algebra and its Applications, 114/115:681–704, 1989.
[573] A.M. Vershik. Some remarks on infinite-dimensional linear programming problems. Russian Math. Surveys, 25:117–124, 1970.
[574] A.M. Vershik and V. Temelt. Some questions of approximation of the optimal value of infinite-dimensional linear programming problems. Siberian Math. J., 9:591–601, 1968.
[575] W. Vervaat. On a stochastic difference equation and a representation of non-negative infinitely divisible random variables. Advances in Applied Probability, 11:750–783, 1979.
[576] N.N. Vorobev. Consistent families of measures and their extensions. Theory of Probability and its Applications, 7:147–163, 1962.
[577] W. Wagner. Monte Carlo evaluation of functionals of solutions of stochastic differential equations. Variance reduction and numerical examples. Stoch. Analysis Appl., 6:447–468, 1988.
[578] W. Warmuth. Marginal Fréchet-bounds for multidimensional distribution functions. Statistics, 19:283–294, 1976.
[579] L.N. Wasserstein. Markov processes over denumerable products of spaces describing large systems of automata. Problems of Information Transmission, 1969.
[580] H. von Weizsäcker and G. Winkler. Integral representation in the set of solutions of a generalized moment problem, 1980.
[581] E. Wesley. Borel preference orders in markets with a continuum of traders. Journal of Mathematical Economics, 3:155–165, 1976.
[582] A. Wieczorek. On the measurable utility theorem. Journal of Mathematical Economics, 7:165–173, 1980.
[583] E. Wild. On Boltzmann's equation in the kinetic theory of gases. Proc. Camb. Phil. Soc., 47:602–609, 1951.
[584] G. Winkler. Choquet order and simplices with applications in probabilistic models. Lecture Notes in Mathematics, 1145, 1988.
[585] J. Yukich. Exact order rates of convergence of empirical measures. Preprint, 1991.
[586] J. Yukich. The exponential integrability of transportation cost. Preprint, 1991.
[587] J. Yukich. Some generalizations of the Euclidean two-sample matching problem. Prob. Banach Spaces, 8:55–66, 1992.
[588] V.M. Zolotarev. On the continuity of stochastic sequences generated by recursive procedures. Theory of Probability and its Applications, 20:819–832, 1975.
[589] V.M. Zolotarev. Approximation of distributions of sums of independent random variables with values in infinite dimensional spaces. Theory of Probability and its Applications, 21:721–737, 1976.
[590] V.M. Zolotarev. Metric distances in spaces of random variables and their distributions. Math. Sb., 30(3):393–401, 1976.
[591] V.M. Zolotarev. General problems of the stability of mathematical models. Bull. Int. Stat. Inst., 47(2):382–401, 1977.
[592] V.M. Zolotarev. On pseudomoments. Theory of Probability and its Applications, 23:269–278, 1978.
[593] V.M. Zolotarev. On the properties and relationships of certain types of metrics. Zapiski Nauchn. Sem. LOMI, 87:18–35, 1978.
[594] V.M. Zolotarev. Ideal metrics in the problems of probability theory and mathematical statistics. Austral. J. Statist., 21(3):193–208, 1979.
[595] V.M. Zolotarev. Probability metrics. Theory of Probability and its Applications, 28:278–302, 1983.
[596] V.M. Zolotarev. Contemporary Theory of Summation of Independent Random Variables. Nauka, Moscow, 1986. In Russian.
[597] V.M. Zolotarev. Modern theory of summation of independent random variables. Nauka, Moscow, 1987. In Russian.
[598] V.M. Zolotarev and S.T. Rachev. Rate of convergence in limit theorems for the max scheme. In Stability Problems for Stochastic Models, volume 1155, pages 415–442. Springer, 1984.
Abbreviations
Bold page numbers refer to this volume, non-bold page numbers to the other volume.
a.e.      almost everywhere      158, 385
ARCH      autoregressive conditional heteroscedasticity      39
a.s.      almost sure      8
BLIL      bounded law of the iterated logarithm      306
CLT      central limit theorem      85
ch.f.      characteristic function      400
CRI      communication resolution interval      38
CTM      Capetanakis–Tsybakov–Mikhailov      220
d.f.(s)      distribution function(s)      8, 107
dna      domain of normal attraction      306
DP      dual polyhedron      23
DTP      dual transportation problem      23
GARCH      general ARCH      39
htl      explained on page 433
IFS      iterated function systems      202
i.i.d.      independent identically distributed      35
KKR      Kakosjan, Klebanov, and Rachev      43
KRP      Kantorovich–Rubinstein transshipment problem      vii, 2
LCFS      last come first served      220
LHS      left-hand side      405
LLN      law of large numbers      81
lsc      lower semicontinuous      113
MKP      Monge–Kantorovich mass transportation problem      vii, 1, 19, 58
MKTP      classical Monge–Kantorovich transportation problem      374
MTPA      MTP with additional constraints      vii
MTP      mass transportation problem      vii, 1
MTPP      MTP with partial knowledge of the marginals      4
OTP      optimal transportation plan      3
PDE      partial differential equation      xii, xvi
PERT      network model      148
PP      primal polyhedron      23
r.f.(s)      random field(s)      248
r.v.(s)      random variable(s)      3
SDE      stochastic differential equation      39
SLLN      strong law of large numbers      30
supp P      support of P      20
TP      transportation problem      21
usc      upper semicontinuous      127
Symbols
Bold page numbers refer to this volume, non-bold page numbers to the other volume.
◦
A d A=A Ab Ab Adk
Am Am Aε A(h, g)
An (t) A+ n (t) An (α) An (α) Ap (H)
A(α) ◦ Ar f Q ASp (P1 , P2 ) A∗ (α)
interior of A 59 closure of A with respect to d 68 69 69 set of all linear subspaces Vk of IRd 137 69 69 139 assumption for a moment problem 62 148 148 190, 235 190, 235 optimal multivariate transshipment costs 158 263 393 97 191, 235
Aut(IRd )
a(s, k) au1 (x) a(Z)
B B∗ Bx B(1, n1 ) B(g)
BK (Si ) B n (α) Bkn (Zk ) Bn (m)
all invertible linear operators (automorphisms) 151 109 296 superlinear mapping 241 Banach limit 366 adjoint operator of B 132 109 Bernoulli distribution 257 assumption for the solution of a moment problem 62 61 191 191 set of nonnegative Borel measures 377
398
Symbols
Bp (H) B(p; x, y) Br B(Si ) B(X, B) B1 (x, y) B2 (x, y) B(α) Bx (ε) (B, · ) b br bu1 (x) ba (P1 , . . . , Pn ) ba (S, B)
C, C i
Cp,d Cs,Z
Cs,εZ C(g)
Cb (S), C b (S)
C(T )
upper bound for Ap (H) 158 quadratic form 280 ball of radius r 149 72 58 28 28 263 109 separable Banach space C(T ) 248 transshipment 371 absolute moments 102 296 measures with fixed marginals 62 finitely additive measures 58 spaces of continuous and i times differentiable functions 255, 333 321 integrable (s − 1)-fold derivative 115 117 assumption for the solution of a moment problem 62 Banach space of bounded continuous real-valued functions on S 63, 164 Banach space 248
C γ (c; σ1 , σ2 ) C(Q) ◦
C(Q) ◦
C(Q)∗ Cb (IRd )
C(S)+ C(S)∗+ Cm (θ) Cov Cov (Xi |Fi−1 ) c c(i, j) c(x, y) c1 (x, y) c∗ (x, y) con(supp (P )) (D) D(h, g)
DP D(P, Q) Dp f (x) DΦ Dk (ϑ) Dm (θ) dr dr,k d(h) dn,m (x, t) d(x, y) d(X, Y )
307 set of continuous functions 384 quotient of the space C(Q) 384 conjugate space 393 set of all bounded continuous functions on IRd 152 166 166 170 covariance 108 conditional covariance 96 closure 304 discrete cost 27, 29 cost function viii, 10 ∂ = ∂x c(x, y) 128 reduced cost function 170 129 duality 76 assumption for a moment problem 62 dual polyhedron 23 50 optimal pairs 103 i ) 118 = ( ∂Φ ∂xj 112 170 smoothed version of d 137 61 divisor criterion 180 determinant of An,m (x, t) 397 76 uniform metric 137
Symbols dr (X, Y ) dKR (σ1 , σ2 )
dn (µ) dom fk dom Γ E Ek−1 (f ; Q) E(Si )
Es (X, Y ) ess sup ex H
F
∗
FP Fi Fmn F mn (n)
Fn (Fs,X ) Fu F Mp (P1 , P2 ) Fi (s) F1 ∧ F2 (t) F1 ∨ F2 (t) F+ (x) F− (x)
F1∗ (x) F2∗ (x) F (x, y)
probability metric 170 Kantorovich– Rubinstein distance 162 267 253 235
F P (x, y) F σ (x, y) (−1) FNs (y) f fc
separable metric space 278 factor-norm 384 finite elementary functions on Si 76 set of points 425 essential supremum 386 extremal points of H 19
f cc f (m) f (n)c f∗ f ∗∗
Fr´echet bound 19, 31, 33 distribution function of P 18 real distribution function 107 nth integral of m 375 survival function 375 385 423, 107 355 Fort´et–Mourier metric 17, 51 293 infimal convolution 148 supremal convolution 148 := min(Fi (xi )) 107 := k Fi (xi )−(n−1) i=1
107
f (n)∗ f∗ f2 (u) fa (x) f (Z1 , Z2 ) fV (·)
Gk GQ
Gs,p Gα G|G|1|∞ G(m, α, β) Gs,X (t) G(u, v) +
Gσ (x, y)
399
12 12 extended Fr´echet bound 19 26 19 310 Young–Fenchel transform 104 c-conjugate of f 124 doubly c-conjugate of f 124 mth Fr´echet derivative of f 102 nth c-conjugate of f 124 p-conjugate 114, 124 second p-conjugate 102, 112 n-conjugate function of f 112 lower conjugate 103 38 145 extension f 317 translation by V 95 359 determination of an optimal measure Q 29 class of functions 103 geometric α-stable r.v. 242 71 grid class 41 424 graph of (u, v) ∈ DP 23 19
400
Symbols
Gn (Z) G(µ) g= gr Γn g(χ) H H
(k)
Hn hβ
hµ (A × B)
h(t1 , t2 )
I Iq Is I[A] I(|f − g|) I(h) I{0, g, a, b} IND i(x1 , x2 )
255 µ-neglegible open set 221 (g1 , ..., gN ) ∈ M 63 graph of Γn 194 363 Haar probability 133 distribution function of max(V1 , . . . , V ) 156 258 indicator or characteristic function 251 generalized upper Fr´echet bound 54, 35 Hausdorff metric 248 175 unit matrix 334 operator 415 indicator function of a set A 139 semimetric on P(S) 67 65 69 = p (X, X) 76 indicator metric 111
JA
151
K(d, B) Kr (P, Q)
137 Kantorovich-type metric 48, 412 Kantorovich metric 412 Markov kernel 200
K1 (P, Q) K(x, ·)
kr
rth difference pseudomoment 122
L´evy metric 81, 109 L∞ L∞ -space of functions 388 ◦∞ L 389 Lc 17 Lf continuous linear functional 401 Li 30 Ln class of nth integrals 47 4p L 139 (L1), (L2) 309 58 L[·]c (a, r, d) L[·] (a, r, d1 , dr ) 60 59 L[·]c (r, d) L[·] (r, d1 , dr ) 61 L1f (Pi ) 69 LSC(βS1 × βS2 ) 252 Lp (X, Y ) 132, 72 Y ) Lp -metric 76 Lp (X, 4 Lp,r (X, Y ) 140 4 (X, Y ) L probability L
p,r
∗p,t (X, Y ) L L∗p,t (X, Y ) Lr (µ) Lp (µ) L(ω, µ) Lipb Liph (r) Lip(r, S) 1 ∞ ∗1
metric 170 302 280 r-fold integrable functions 32 196 Lagrange function 311 bounded Lipschitz functions 88 Lipschitz norm 49 r-Lipschitz functions 163 Kantorovich metric 35, 86 bounded real sequences (ξT )∞ T =1 366 92
Symbols ∗p,t (m1 , m2 ) 2 (P X , P Y ) p (P1 , P2 ) p (X, Y ) p (µ, ν) r (P1 , P2 )
280 = 2 (X, Y ) 132 p -metric 6, 87 76 334 smoothed version of 1 (of order r) 35, 87
Mc
set of measures 403 pseudometrics 423 linear space 384 set of measures 403 subset of Mr 403 L´evy measure 246 set of all signed Borel measures µ on IRn 47 40 41 280 280 59 60 15 81 142 142 35 measures with marginals Pi 58 finite signed Borel measures 375 319
M
i
M ◦ , Mk◦ Mr (r > 0) Mr0 Ms Ms◦
Mµ Mµ (B) M 1 (CT ) Mp (CT , m0 ) M1 (c) M2 (c) MC (F, G) M (h, δ) MX (n) Mθ (n) M 1 (P1 , P2 ) M (P1 , . . . , Pn ) M (IRk ), M
M (S) Mf (S), Mf (S×S)
M1 (U ) Mµ m(c) m0 (c) mX (n)
finite measures 36 probability measures 191 40 58 59 142
401
mθ (n) mn
142 375
N(m,σ)
normal distribution 188 309 normalized rounding error 81
Ns n−1 Sn,c
OTP(c)
(t)
P
P (Xu )u≤s P ∗ (A × B) P1,2 (B|A) P ∗ (h) PP Pε X P (µ)
P ∧µ
pN p mn (p, h) pX (t)
Qd
OTP with respect to c 3 marginal of P in the direction t 46 285 35 transportation plan 2 outer integral of h 65 primal polyhedron 23 approximation 93 stochastic optimization problem 49 infimum in the lattice of measures 41 180 density of mn 376 vector problem 180 density of the r.v. X 419
Qγ Qp,r Q(a)
set of d-quasi periodic points 355 309 140 256
Rp,r R = R(k, n) R(Y )
140 405 145
402
Symbols
R(x) rba (S, R(E))
S1 S1 S2 S coll S ind γ S+ γ S− Sn Sn Sn∗
Sn,c Sn,m (S, ≤) K S1 S2 (Si , Bi ) S(c) S0 (c) (S, d) (SE) S(h) SLr (P1 , P2 )
Sp (P ) Spp (P1 , P2 ) (S, U) Sm (x, h) S(Y ) S(µ)
supp σ
x = xxM 137 x regular bounded additive measures 63
unit circle 47 421 421 129 129 313 313 80 simplex 181 sum of conventional roundings 80 total rounding error 81 255 topological space with closed preorder ≤ 44 316 measurable spaces 58 58 59 (separable) metric space 92 333 shift operator 65, 390 Skorohod– Lebesgue metric 34 97 dual form of Sp (P1 , P2 ) 97 measure space 36 392 F(Y )-Suslin functions 79 a symmetry group associated with µ 133 265
T Tr T C(u, v) Tp (t) T (λ) T← tA U UC U0 U[·]c (a, r, d) U[·]c (r, d) U[·] (r, d1 , dr ) U U Uµ (ϕ)
(U, · ) uX , uY (k) us u1n (x), u2n (x) Vi Vε V (S) V+ (S) V0 (S) Val(c; σ1 , σ2 , b) Var
val(c; σ1 , σ2 , b)
vr (X, Y )
v r (X, Y )
transformation 192 138 total costs 25 quantile function 32 a weighted sum 126 253 132 dual operator 40 17 415 57 58 60 norm 74 transportation problem with local upper bound µ 40 separable Banach space 86 densities 107 285 263 rounding error 81 finite covering ε-net 93 219 219 220 optimal value 252 total variation distance in X (IRd ) 134 optimal value of the dual problem 253 absolute pseudomoment 194 105
Symbols Wi Wp = p
Wu w# wn+1 w|M wp,N (X)p+1 X∗ X ◦i Xs [x] := [x1 ] Xm:n , Fm:n
(X, T (X)) xα x, x∗ (Y, ≤)
Brownian motions 278 Lp -Wasserstein metric / Lp -Kantorovich metric 40 354 transposed function 172 “output” flow 71 restriction of w to M 180 249 topological dual space of X 112 76 normalized variation 308 conventional rounding 59 order statistic resp. its distribution function 156 Monge solution 3 384 bilinear form 112 ordered topological space 145
Z(·) Zk,n (X, Y ) Zn (X, Y ) 4 Z(X, Y ; s, p, α)
action profile 367 ideal metric 47 ideal metric 383 426
IBb (S × S), IBb (S)
bounded Borel functions on S × S, resp. on S 221 the d-dimensional Euclidean space bounded universally
IRd UIb (S)
ZZn + λ\1 Aϕ
Aγ A(c, )
AM Ac (P )
A (S, ) B, B(U ) B(c, )
B(En )µ Bm (S)
B(S)ν B(S)σ B0 (S) = σ(C(S)) C(c) C(c; σ1 , σ2 )
403
measurable functions 169 384 150, 155, 111 σ-algebra generated by a measurable function ϕ 420 optimal value 314 optimal value of the general Kantorovich– Rubinstein problem 163 class of M-analytic sets 167 generalized Monge– Kantorovich functional 87 199 Borel σ-algebra 30, 418 optimal value of the dual Kantorovich– Rubinstein problem 164 µ-completion of B(En ) 194 set of lower majorized Borel functions on S 145 ν-completion of B(S) 220 σ-completion of B(S) 167 Baire-sets in S80 set of stable imputations optimal value of the general Monge–Kantorovich mass
404
Symbols
D Dn = Dn (µ) Dγ D(c; σ1 , σ2 )
D(x) D()
Eθ Eθ,u F2 FA+B FA−B Fr Fr FZ F(A, B) F(A, B, F σ ) F(F1 , F2 ) F 1 (R) F b (S) F(S) Fo (S) Gp G(A, B, Gσ ) G(m, Λ, α, β) G(S) Gb (S)
transfer problem 164 the diagonal in S × S 210 81 311 optimal value of the dual Monge– Kantorovich mass transfer problem 164 86 Borel measures with given marginal difference 14 177 177 421 set of d.f.s 11 set of d.f.s 13 class of functions 412, 102 104 distribution function of Z 184 18 19 joint d.f.s F with marginals F1 ,F2 51, 1 distribution functions 421 bounded upper semicontinuous functions 70, 74 upper semicontinuous functions 70 219 pairs of bounded continuous functions 97 19 grid class 332 lower semicontinuous functions 70 70, 74
H(F1 , F2 ) Id (P1 , P2 ) K K(P ) L L, Lo Lm
L(h; δ) L1 (R, P ) Lp (P ) L1 (Pi ) L(X, Y ) L(Y ) M1 M(P1 , P2 ) Mp (X) N Oε (P0 ) PH Pi
P2 PL Pγ Pµµ12 P(S)
relaxed marginal class 52, 3 94 Kantorovich metric 417 dual Monge–Kantorovich functional 87 L´evy stable motion 240 class of topological spaces 219 measurable functions bounded below 62 71 P -integrable functions 63 97 62 joint distributions 414 F(Y )-Suslin sets 79 class of laws 245 probability measures with given marginals 3 334 67 neighborhood of P0 92 space of probabilities 87 Borel probability measures on a product of i copies of (S, d) 27 322 class of all P ’s on L 31 set of measures 309 37 space of tight probabilities on S 64
Pb(S) – 70
Pm(S), P – 69
P(U) – 96
P(µ, Q) – multivariate compound Poisson distribution, 129
PL(µ, σ) – 33
R – ring, 63
R – class of rules, 184
R(×ni=1 Bi) – 63
S1 – class of laws, 245
U – set of input flows U, 74
U(S) – universally measurable sets, 167
V – set of output flows V, 74
X – space of real random variables, 414
Xc – class of r.v. belonging to X∗, 427
X̂0∗ – 426
(X)2, L(X, Y) – space of joint distributions, 414
Xs∗ – 417
X(B) – space of random fields, 248
X(C[0, 1]) – space of r.v.s on a nonatomic probability space, 54
Xp(CT, m0) – class of processes on CT, 280
X(R) – set of all real-valued r.v.s, 62
X(IRk) – class of k-dimensional random vectors, 103
X(T, g, a) – space of X ∈ X(C[0, 1]), 63
X(U) – space of U-valued r.v.s, 86
Z – 291
Z1 – class of Z-laws, 246
α – 384
α1G1×···×Gn – 73
[α1, . . . , αn] – 403
αs,p(X, Y) – 107
βS – Stone–Čech compactification of S, 225
Γj – 176
Γµ – set of transshipment plans, 384
Γµ – set of signed Borel measures Ψ on IR2n, 47
Γn – set-valued mapping, 236, 302
γ – finite collection of functions, 307
γpp(P1, P2) – dual representation of λpp(P1, P2), 105
∆-antitone – quasi-antitone, 109
∆j – 78
∆kn – kth difference of f with step h, 384
∆kr – function class, 47
∆s – 180
∆∗s – 180
∆r,a – 59
∆r,θ – 61
∆θ – 60
∆kx;h1,...,hk – 389
∆αx;d – discrete measure, 400
∆b(·) – absolutely continuous marginal difference, 378
∆αt f(x) – 391
∆kh Pm(x) – 392
∆(x) – rate of completing the final mass, 372
δ – 55
δx – Dirac measure at x, 207
δp(T) – measure of deviation, 72
ζ – Zolotarev metric, 416
ζF – Zolotarev ζ-metric, 110
ζr – extension of the Kantorovich metric, 102
ζ̄r – modification of ζr, 104
ζn(P1, P2) – Zolotarev metric of order n, 46
ζs,p(X, Y) – ideal metric, 417
θs – ideal metric, 81
ϑs,p – ideal metric, 102
κ – Kantorovich metric, 88, 417
κ2 – 315
κn – 382
κr – rth difference pseudomoment, 143
κm(X1, θ) – difference pseudomoment, 177
Λ – homogeneously convex functional, 415
λ = λ+ − λ− – Hahn decomposition, 93, 36
Λkϕ – generalized Lipschitz space, 394
λpp(P1, P2) – 105
λ(X, Y) – λ-metric, 423
µ̂ – characteristic function of µ, 132
µn – a measure, 267, 322
µr – convolution type metric, 134
µγ – optimal solution for Cγ, 309
µ(ε) – probability, 24
µc(·|·) – Kantorovich–Rubinstein functional, 14
µ̂c(·|·) – Kantorovich functional, 3
µ∗(A × B) – 36
µ̂(P1, P2) – µ-minimal metric, 110, 417
µ(P1, P2) – 105
µ(·|G) – G-dependence metric, 37, 94
µ(·, S), µ1(·) – fixed marginal distribution, 53
µ(S, ·), µ2(·) – fixed marginal distribution, 53
µF(X, Y) – functional in X × X, 419
µr(X, Y) – probability metric, 170
µ ≺top ν – ν-convergence implies µ-convergence, 134
νr∗ – 137
ν∗(g) – 147
ν∗(g) – 147
νr – 136
ν(ϕ) – 220
ξr – rth absolute pseudomoment, 143
πi – projection on the ith coordinate, 155
π1µ(B) – := µ(B × S), 163
π2µ(B) – := µ(S × B), 163
ΠK(x) – projection of x on K, 122
π∗ – optimal admissible permutation, 16
π(X, Y) – Prohorov metric, 417, 86
(X, Y) – Kolmogorov (uniform) distance, 24, 133, 109, 184
ρ – Kolmogorov metric, 111
t – mapping, 180
tK – K-stationary divisor, 182
tW – Webster's rule, 182
σ – permutation, 254
σi – discrete measures, 407
σM – supremum of the set Φ(σ, M), 180
σr∗ – 92
σ∗(X, Y) – 134
σ(P1, P2) – total variation metric, 30
σr – 87
σ̄r(P1, P2) – smoothed version of σ, 35
τK – topology generated by K, 90
τr – moment-type condition, 88, 135
τr∗ – 92
τ̄r – 13
τ(X, Y) – compound metric, τ-metric, 373
ϕ(ε) – 97
ϕ(µ) – optimal value of P(µ), 49
ϕ̄(τ; t) – characteristic function, 46
Φ – standard normal d.f., 266
ΦS(θ) – Laplace transform, 246
Φσ – d.f. of N(0, σ2I), 325
χ – uniform distance between characteristic functions, 137
χ∗ – "tB-uniform" version of χ, 137
χr – "smoothed" version of χ, 137
χn,c(m) – absolute pseudomoment, 382
χn,c(P1 − P2) – 382
χp(X, Y) – metric, 249
χ̂p(X, Y) – minimal metric, 249
ψ(µ) – solution set corresponding to P(µ), ϕ(µ), 49
(Ω, A, P) – probability space, 8, 414
ωk(f, t) – kth modulus of continuity of f, 384
ωk(f; Q; t) – 393
ω(γ) – 405
‖f‖c – Lipschitz norm, 16
‖·‖ – norm on …, 40
‖·‖∞ – supremum norm, 91
‖µ‖k,r – Kantorovich–Rubinstein norm, 46, 378
m̊n – minimal function on M̊r, 48
‖h‖H – seminorm of h, 49
‖Xi − X‖T,p – 286
‖X‖T – 300
‖·‖bL – bounded Lipschitz norm, 306
‖X‖∗T,p – 312
‖X‖∗T,∞ – 312
‖b‖∞ – 318
Dis1,...,is · fq,j(x) – 103
‖x − y‖p – p-norm, 158
‖u‖Cb(S) – uniform norm on Cb(S), 164
(ξT)∞T=1 – norm of ℓ∞, 366
mb,c – Fortet–Mourier metric, 382
µ̊r – 383
f̊k – seminorm on L̊, 389
‖µ‖k,ϕ – generalized Kantorovich–Rubinstein norm, 394, 404
‖V‖K – norm, 74
⊕ni=1 B(Bi, Pi) – direct sum of Bi-measurables, 61
⊗ni=1 (Si, Bi) – product, 58
∨-stable – 69
∧-stable – 69
x ∨ y – = max{x, y}, 4
x ∧ y – = min{x, y}, 4
∧ – min, 19
(−∞)1x1≥x2 – 77
∂A0(c, ·)(0) – subdifferential, 268
∂f(x) – subdifferential of f, 104, 287
∂cf – c-subdifferential of f, 125
∂p(0) – p-subdifferential, 178
∂V(c, ·)(0) – subdifferential, 243
∇f(x) – = grad f(x), 115
(·, ·) – inner product in IRd, 142
(·)+ – = max(0, ·), 71
≤st – the stochastic ordering, 147
·|S1 – restriction to S1, 290
lexicographic order, 397
α! – 403
convolution of measures, 411
(α β) – 403
[r] – integer part of the number r, 44
[x]c – c-rounding of x, 53
⌈x⌉ – smallest integer larger than or equal to x, 231
‖Wi‖T,∞ – 318
[t]G, [t]∗G – 336
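For orientation, here are standard forms of three metrics that recur throughout this list and in the Index below. This is an editorial sketch in common probability-metrics notation; the text's own definitions on the cited pages are authoritative and may differ in detail:

\[
\ell_p(P_1,P_2) \;=\; \inf\Bigl\{ \bigl(\mathsf{E}\, d(X,Y)^p\bigr)^{1/p} \;:\; X \sim P_1,\ Y \sim P_2 \Bigr\}
\qquad (L_p\text{-Kantorovich / }L_p\text{-Wasserstein metric}),
\]
\[
\pi(P_1,P_2) \;=\; \inf\bigl\{ \varepsilon > 0 \;:\; P_1(A) \le P_2(A^{\varepsilon}) + \varepsilon \ \text{for all Borel } A \bigr\}
\qquad (\text{Prohorov metric}),
\]
\[
\zeta_r(P_1,P_2) \;=\; \sup\Bigl\{ \Bigl|\textstyle\int f\, d(P_1 - P_2)\Bigr| \;:\; \bigl|f^{(m)}(x) - f^{(m)}(y)\bigr| \le |x-y|^{\alpha} \Bigr\},
\quad r = m + \alpha,\ 0 < \alpha \le 1 \qquad (\text{Zolotarev metric}).
\]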
Index
Bold page numbers refer to this volume; non-bold page numbers refer to the other volume.
α-stable, 86 Abel method, 126, 128 absolute pseudomoment(s), 142, 382 absolutely monotonic, 424 abstract duality theorem, 178, 242, 244, 255, 260 version of the dual problem, 175 of the Kantorovich–Rubinstein functional, 179, 188 of the mass transfer problem, 172, 175 action profile, 367, 368 admissibility of (fi ), 153 admissible, 2 permutation, 16 affine maps on IRk , 202 analytic sets, 177, 190 antithetic variates, 154 apportionment theory, 53 approximable compactly, 63 approximate
extension property, 292 theorem(s), 292 approximating algorithms, 12 model, 79 approximation finite-dimensional, 92 model, 77 of mass transfer problems, 306 of queueing systems, 71, 72 of the distribution of sums, 128 optimal rate, 93 queues, 76 theorems, 306, 307 arbitrary directions, 43 mutually dependent, 72 arbitrary compact space, 198 Arzela theorem, 202, 218 asset returns, 39 assignment games discrete and continuous, 25 asymptotic distribution, 273 normality, 224
attracted trajectory, 358 automorphism, 150 autoregressive conditional heteroscedasticity (ARCH), 40 modeling of asset returns, 39 auxiliary theorem on convex sets, 179 Baire function, 80, 167, 220 measurable functions, 177, 302 σ-algebra, 197, 217 sets, 80, 219, 302 subset, 219 balancing condition, 372 Banach lattice, 166, 173, 177, 209, 301, 366 conjugate, 180, 187 of bounded real-valued functions, 292 limit, 366 space(s), viii, xvi, 166, 389 conjugate, 227 dual, 166, 251, 261 isometric isomorphism, 405 real, 112 separable, 4, 33, 329, 354 Barnes–Hoffmann greedy algorithm, 20 Berry–Esséen bound, 258 type result, 117 theorems, 113 Berry–Esséen theorem, 255 Berry–Esséen type result, 255 Beta-distributed, 214 biconjugate function, 112 bilinear form, 112 binary random trees, 254 relation, 322 search trees, 260, 263
BLIL, see bounded law of the iterated logarithm Boltzmann-type equation, 277, 307, 318 Bonferroni bounds, 151 bootstrap approximation, 199 estimator, 198, 199 sample, 198, 199 Borel extension problem, 301 function, 167, 177, 199, 220 measurable function, 339 measure, 167 on a compact space, 166 method, 126, 128 probability measure, 89 σ-algebra, 63 set(s), 219, 301, 302 subset, 295 bounded Kantorovich metric, 79 law of the iterated logarithm (BLIL), 306 boundedness from below, 208 of the cost function, 169 bounds for mn in the multivariate case, 382 for the deviation of two dependent queueing systems, 72 for the total transportation cost, 158 of deviation between probability measures, 51 to the total cost, 158 branching processes, 216 type recursion, 206 with multiplicative weights, 207 Brownian motion, 309, 314 motions, 278 bucket algorithm, 272 Burgers-type equation, 289
c-conjugation, 124 c-convex minorant, 125 c-convexity, 124 c-coupling optimal, 123, 130, 131 c-cyclic monotone, 131 monotonicity, 131 c-cyclical monotonicity, 126 c-optimal couplings, 127 c-optimality, 130 c-rounding, 53, 57 lower bounds, 58 c-subdifferential, 125 c-subgradient, 125 C1-operator, single-valued, 288 cadlag functions, 328 Cantor's diagonal method, 360 capacity, 79 Capetanakis–Tsybakov–Mikhailov (CTM) protocol, 220 Carlson's inequality, 174 lemma, 323 case of equiprobable atoms, 16 Cauchy–Schwarz inequality, 345 central limit theorem (CLT), 103, 179, 204, 226, 263 for the total wealth, 240 functional, 241 local, 137 quantitative version, 264 rate of convergence, 34, 374 Cesàro method, 126, 128 chance discretization points, 42 chaotic, 278, 288, 301, 307, 319 characteristic function, 231 characterization classical Hoeffding–Fréchet, 52 of c-optimal couplings, 127 of optimal ℓ2-couplings, 116 characterize the duality theorem, 259 charge, 83 choice function, 352 problem, 352 theory, 352
Choquet’s Theorem, 79 classes AB0 (S × S), 197 classical Hoeffding–Fr´echet characterization, 52 Kantorovich–Rubinstein functional, 394 classical multiple-access problem, 220 closed, 361 formula for mn in the univariable case, 380 preorder, 322–324, 327, 336, 340, 341, 344 set-valued mapping, 358 subspace, 172 closeness, 273 between Sn and Sn∗ , 81 in terms of a weak metric, 43 CLT, see central limit theorem, see central limit theorem coarse grid, 41 common probability space, 339 communication resolution interval (CRI) algorithm, 192 compact, 64, 70, 359 case metrizable, 190 nonmetrizable, 196 measures, 64 space, 161, 219 arbitrary, 198 metrizable, 170, 190, 208 compactly approximable, 63 compensatory transfers, 367 competitive equilibria models, 340 complex queueing models, 53 compound metric, 373 computer tomography paradox, 51 conditional covariance, 96 measure, 327 conditionally independent, 95 conditions for a nontrivial explicit solution, 281, 285
for duality in the Monge–Kantorovich problem, 248 on the cost function, 176 conjugate Banach lattice, 187 function, 112 functional, 178 connectivity hypothesis, 363 continuity, 70, 72 continuous, 358 and discrete mass transportation problems, 23 function, 300, 330 increasing, 329 isotone, 325 functionals, 68 linear functional, 404 transformation, 404 linear interpolation, 338 partial derivatives, 280 selection theorem, 302 utility, 337 function, 329, 330, 334 -utility-rational, 352 function, 352 continuously differentiable, 280 contraction method, 37, 192, 254, 264 of Φ, 314 of Φ with respect to ℓ∗p,t, 297 of Φ with respect to the ℓ∗p,t-minimal metric, 283 of Φ with respect to the minimal metric ℓ∗p,t, 304 of stochastic mappings, 277 of transformation, 191 contractive mapping, 200 conventional rounding, 59 convergence of a net to a point, 261 of algorithms, 37 of recursive algorithms, 204 converse to the duality space, 86 convex, 64, 103, 112 biconjugate
function(s), 112 cone, 179, 184 thick, 179, 184 conjugate function(s), 112 functional, 178 sets, 184 auxiliary theorem, 179 subset, 179 convex function, 289 convexity, 285 convolution argument, 380 of a measure, 380 of measures, 412 property, 380 copula, 7 corner rule generalized northwest, 24 northwest, 2, 7, 17, 22 Hoffman's, 26 multivariate version, 34 southwest, 25 cost of shipping a unit commodity from origin i to destination j, 27 cost function(s), viii, xii, 2, 170, 172, 198 ∆-antitone, 109 bounded below, 170, 208 boundedness, 169 condition, 176 duality theorem for symmetric, nonnegative, 16 lower semicontinuous, 365 nonsymmetric, 12 quasi-antitone, 109 reduced, 170, 190 regular, 176, 279 semimetric, 14 strictly positive, 365 symmetric, 4, 11 coupling(s) optimal, 112 couplings, 323 Courant–Fischer lemma, 137
CRI, see communication resolution interval CTM, see Capetanakis–Tsybakov–Mikhailov cyclic -monotone, 115 maximal, 116 operator(s), 287 operator(s) and mass transfer problem, 288 -monotonicity, 115, 289 condition, 131 cyclical monotone function, 10 -monotone function, 38 ∆-antitone cost functions, 109 d-closure of the set of upper semicontinuous functions, 81 d-Lipschitz, 349–351, 357 -utility-rational, 352 choice function, 352 utility function, 349 d -Lipschitz, 351 d1-Lipschitz, 351 d-quasiperiodic, 355 d-valuation, 346, 348 Debreu theorem, 323, 329, 335 demand distribution, 2 demand theory, 370 density of Lip(c, S; X), 293 density, 376 density coupling lemma, 324 deviation between probability measures, 51 Diaconis and Freedman results, 179 diagonal method of Cantor, 361 difference between λp and γp, 106 pseudomoment, 142, 176 differentiability of functions, 115 differential equations stochastic, 277 diffusion with jumps, 331 Dini's theorem, 74 Dirac measure, 207, 258, 311
disastrous event, 237 discrete and continuous assignment games, 25 mass transportation problems, 23 case, 35 marginal measure, 313 metric, 93 Monge condition, 53 transportation problem, 2 discretization of the SDE, 332 point(s), 41, 42 Wiener process, 336 distance between X and Y , 62 in probability, 417 distance from point x to set A, 333 distribution asymptotic, 273 demand, 2 function, 4 multinomial, 272 of the exact solution, 348 of the past, 285 supply, 2 uniform, 272 divisor criterion, 180 rule(s), 180 of (1/t)-rounding, 180 stationary, 181 Dobrushin’s result on optimal couplings, 36 Dobrushin’s theorem, 93 domain of normal attraction, 132 of normal attraction (dna), 306 DTP, see dual transportation problem dual Banach lattice, 173 Banach space, 261 extremal problem, 259 linear extremal problem, 163 Monge–Kantorovich
functional(s), 64, 87 problem, 247 polyhedron (DP), 23 problem, 58, 217, 219 of the nontopological version of the mass transfer problem, 265 optimal value, 253, 268 representation, 5, 14 for Lp-minimal metrics, 96 transportation problem (DTP), 23 dual representation of ℓp, 201 duality for Suslin functions, 79 problem in a mass setting, 242 relation, 212, 213 representation for mn, 379 results of KRP, 13 theory for mass transfer problems, 161 duality theorem(s), 171, 207, 214, 219, 225, 277, 375 abstract, 178, 242, 244, 255, 260 characterization, 259 for L̂p, 151 for a marginal problem with moment-type constraints, 251, 253 for a nontopological version of the mass transfer problem, 265, 272 for compact space, 161 for infinite linear programs, 241 for mass transshipments on a compact space with constraints on the marginal kth difference, 402 for semicontinuous functions, 76 for symmetric, nonnegative cost functions, 16 for the KRP, 15 formulation, 175 general, 7, 82, 84
in mass settings, 168, 241 in topological setting, 76 more general, 211 of Isii, 59 of Kantorovich–Rubinstein, on noncompact spaces, 222 on a metrizable compact space, 208 on arbitrary compact space, 169, 171, 207 on metrizable compact space, 170 on noncompact spaces, 211, 222, 232 and general cost function, 225, 234, 238 with, 238 with continuous cost function, 234 with continuous cost function bounded below, 223 with cost function satisfying the triangle inequality, 222 with metric cost function, 86 Dubovitskii–Milyutin theorem on convex sets, 180, 184 Dudley's problem, 6 dynamic optimization problem, 363 dynamical system, 354, 358 dynamics of a queueing system, 74 ε-coincidence of marginals, 50 of moments, 50 efficient infinite trajectory, 364, 365 empirical measure, 322, 326 environmental processes, 203 equicontinuous, 201, 202 Euclidean case, 96 norm, 323 Euler constant, 230, 271 method, 126, 128, 337 summation formula, 268 existence
of optimal measures, 270 of optimal solutions, 217 explicit representations for L̂p in X(IR), 152 explicit solution of the mass transfer problem with a smooth cost function, 276 exponent, 132 exponential convergence rate, 196 rate of convergence, 219, 253 exponential topology, 340 extension of a function, 299 of the Kantorovich metric, 102, 183 of the Kantorovich–Rubinstein theorem, 406 problem, 290, 295 solution, 296 theorem(s), 295, 325 extremal marginal problem, 251, 258, 307 points, 19 problem(s) linear, 241 solution of, 139–141 value, 28 Fenchel–Moreau theorem, 178 final mass, 372 fine grid, 41 finite -dimensional linear programs, 307 Borel measure on a compact space, 166 dimensional approximation, 92 bounds, 93 case, 92 measure, 35, 265 trajectories, 364 finiteness, 66, 254 of ζm(X1, θ), 176 of Cm(θ) and Dm(θ), 173 of I, 174
of the metrics µr, χr, dr, and L̂p,r, 169
of the upper bounds, 122 fixed
marginal distributions, 53 moments, 52 moments, 53 fluctuation inequalities, 93 formal equiprobable case, 16 Fortet–Mourier metric, 17, 50, 382 Fréchet bound(s) lower, 2, 17 majorized, 42 upper, 2, 17 usual, 42 bounds, 2 bounds generalized upper, 54 sharpness of, 152 condition, 19, 21 differentiable density, 90 problem, 262 topological version, 262 -problem, 152 space, 339 type bound, 24 Fubini theorem, 5 full distribution, 132 probability distribution, 131 strictly operator-stable distribution, 146 random vector(s), 143, 151 function (convex) biconjugate, 112 (convex) conjugate, 112 bounded below, 171 differentiability, 115 isotone, 324 monotone, 11 optimal, 10 functional sublinear, 244 functional central limit theorem (CLT), 241 functionally
closed preorder, 324, 327, 336, 341 preorder, 324, 327
G-dependence metric, 94 G-dependence metrics, 37 G-measurable random variable, 94 Γ(p, λ)-distributions, 247 Galton–Watson process, 206, 216 normalized, 216 Gamma -distributed, 214 gamma distribution, 215 GARCH, see generalized autoregressive conditional heteroscedasticity Gaussian processes, 120 Gel’fand compactum, 225 general case, 123, 402 cost functions, 123 duality result, 245 theorem, 7, 82, 84 Kantorovich–Rubinstein mass transshipment problem, 244 problem (KRP), 163 Monge condition, 24 Monge–Kantorovich mass transfer problem with given marginals, 164 mass transportation problem (MKP), 247 problem on continuous selections, 303 generalization of the Monge–Kantorovich mass transportation problem, 29 generalizations of Debreu theorem, 329 generalized autoregressive conditional heteroscedasticity (GARCH), 40 modeling of asset returns, 39
Kantorovich–Rubinstein norm, 394 Monge–Kantorovich functional, 87 subsequence, 187 upper Fréchet bound, 54 geometric α-stable r.v., 242 Lévy stable motion, 243 strictly stable distributions, 243 geometrically distributed, 237 global minimum, 129 Gnedenko's extreme-value theorem, 232 greedy algorithm(s), 6, 17, 20, 22 solution(s), 7 greedy recursion, 22 grid class, 41, 339 coarse, 41 fine, 41 points, 41, 338 Gronwall inequality, 317 lemma, 282, 301, 321, 341 Hölder condition, 418 Haar probability, 132 Hahn decomposition, 93 Hahn–Banach theorem, 61, 393, 402 Hausdorff locally convex linear topological space, 265 space, 178 metric, 333 Hausdorff metric, 248 Hoeffding–Fréchet bounds, 107, 151 lower, 31–33 upper, 20, 21, 31 characterization, 52 inequality, 17 upper bound, 20, 21 Hölder's inequality, 108, 339 multidimensional, 339 homogeneity, 143, 413
homogeneous, 94, 174 functional, 414, 427 metric, 416, 422, 424, 428 metric(s), 422 ideal Kantorovich metric, 87 metric, 81, 82, 87, 102, 107, 183, 193, 233 of Zolotarev, 275 metric(s), 30, 223, 371, 374, 383, 411, 414, 415, 421, 424 of Zolotarev, 381 properties of the metric Kr, 412 ideality for a probability metric, 143 identical mapping, 171 IFS, see iterated function system image encoding, 199 implementable, 368 imputation, 25 feasible, 25 individually rational, 25 stable, 25, 26 increasing continuous function, 329 convex function, 26 function, 47, 380 sequence, 72 indicator function, 177, 231 metric, 111 indicator cost function, 36 inequality of Marcinkiewicz–Zygmund, 288 infinite -dimensional linear program, 241 dimensional network flow problem, 378 exchangeable sequence, 327 trajectory, 364, 365 initial mass, 372 input of laws, 71 interacting diffusion, 278
diffusions, 279 drifts, 279 intrinsic properties of probability metrics, 113 inversion, 254 in a random permutation, 254 isometric isomorphism, 404 isometry, 396, 405 isotone, 146 completion, 146 function, 324, 325 functionals, 65 real-valued function, 324 with respect to ω, 337 iterated function system (IFS), 201 Itô-type SDEs, 332 Jordan decomposition, 166, 167, 179, 270, 314 k-minimal metric, 110 Kantorovich equality, 246 formulation, 2 of the MTP, 2 functional, 3, 14, 29 L2-minimal problem, 132 on IRd, 132 metric, 35, 85, 86, 88, 90, 102, 138, 183, 200, 322, 412 ℓp, 76 bounded, 79 extension, 183 generalized, 424 optimality criterion, 163 radius, 53, 54, 56 rth pseudomoment, 184 theorem, 88 Kantorovich–Rubinstein distance, 163 duality theorem on noncompact spaces, 222 functional, 14, 17, 179, 183 abstract version, 179, 188 classical, 394
mass transshipment problem, 50, 162, 244, 275, 281, 371 duality results of, 13 topological properties, 13 metric, 306 minimal functionals, 246 norm, 46, 382, 404 generalized, 394 problem (KRP), 2, 161, 372 duality theorem, 15 general, 163 optimal transportation plan (OTP), 2 original, 163 seminorm, 378 theorem, 412 extension, 406 transshipment problem (KRP), vii, xi Kemperman equality, 410 Kingman's subadditive ergodic theorem, 214 Kirchhoff equation, 13 Kirszbraun–McShane extension, 91 Kolmogorov distance, 24, 183, 271 metric, 188 weighted, 232 uniform distance, 24 Kolmogorov metric, 111 Krein–Milman and Choquet theorem, 19 Krein–Smulyan theorem, 251, 256 KRP, see Kantorovich–Rubinstein problem KRP, see Kantorovich–Rubinstein transshipment problem kth modulus of continuity of f, 384 Ky–Fan metric, 417 λ-metric, 423 L1-variation, 314 Lp-Kantorovich metric, 40 Lp-Wasserstein metric, 332, 348
ℓ1-convergence, 86 L2-minimal problem, 132 L2-Kantorovich metric, 322 Lp-Wasserstein metric, 40 (ℓp, ε)-independence, 77 (ℓp, ε)-independent, 76 L̂p-convergence, 152 Lp-distance, 332 Lp-Kantorovich problem on mass transportation, 53 ℓp-Kantorovich metric, 253 Lp-metric, 138 minimal, 194 Lagrange function, 312 λ-metric, 43 Laplace transform, 246, 247 largest c-convex minorant, 125 elements of the marginal, 12 lattice superadditive, 17 lattice measure, 396 learning algorithm, 204 Lebesgue integrable, 339 Lebesgue–Fatou lemma, 206 lemma of Carlson, 323 of Courant–Fischer, 137 of Gronwall, 282, 301, 321 of Lebesgue–Fatou, 206 of Pollard, 93 of Robbins–Siegmund, 206 of Urysohn, 176, 328 of Zorn, 291 LePage decomposition, 103 representation, 91, 124, 125 less concordant, 33 Lévy distance, 108 measure, 241, 246, 247 metric, 102, 183 process, 241 Lévy metric, 423, 424 generalized, 423
lexicographic order, 397 limit laws, 236 linear combination of measures, 400 extremal problem, 163, 241, 242 function(s), 119 preorder, 322, 323 programming duality, 5 transportation problem, 315 programs, 307 transformation, 401 linear interpolation of the trajectories, 338 Lipschitz assumption, 309 condition, 5, 332–334 relaxed, 308 stronger, 301 constant, 334 function, 200 norm, 98, 379 preorder, 333 on a metric space, 332 space, 394 utility function, 344 local bounds for the transportation plans, 36 in the transportation problem, 35 upper bounds on the transportation plans, 40 locally convex Hausdorff space, 286 space, 178 logarithmic normalization, 266 lognormal distribution, 231 lower bounded semicontinuous cost function, 62 bounds c-rounding, 58 Fréchet bound, 2 Hoeffding–Fréchet bound, 31
semicontinuity of c, 214 semicontinuity of c∗, 214 semicontinuous, 62, 77, 113, 169, 178, 188, 201, 243, 244, 259, 358, 361 convex function, 103 cost function, 365 function, 70, 171, 176, 227 Lusin C-property, 343 separation theorem, 178, 230 extension, 225 theorem, 74 Lyapunov theorem, 261 µr-closeness, 223 M-analytic, 167 function, 167 m-buckets, 272 m-chaotic, 301, 307, 319 µ-completion, 194 µ-measurable selection, 216 sets, 195 µ-minimal metric, 111 µ-negligible open set, 221 µ̂c-convergence, 29 Maejima–Rachev construction, 104 majorized Fréchet bounds, 42 Marcinkiewicz–Zygmund inequality, 166, 288 Marcinkiewicz–Zygmund inequality, 301 marginal distributions, 53 elements, 12 moments, 52 marginal(s), 83 and perfectness, 83 constraints, 54 extensions and perfectness, 83 measures, 145 Markov chain, 236 kernel, 199, 200
models of interacting particles, 277 Markov inequality, 392 martingale, 211 case, 94 inequalities, 340 mass transfer problem, 161, 162, 175, 198, 219, 275, 365 abstract version, 175 and cyclic-monotone operators, 288 approximated, 307 approximation, 306 noncompact version, 220 nontopological version, 265 dual problem, 265 duality theorem, 265, 272 on completely regular topological spaces, 221, 232 optimal value, 275 with continuous cost function, 306 with given marginal difference, 162, 163, 221, 244 on compact space, 313 with given marginals, 245 mass transportation problem, 414 with fixed sum, 10 with stochastically ordered marginals, 10 mass transportation problem (MTP), vii, xi, xiii, xvii, 1, 27 and probability distances, 27 approximation of, 4 continuous and discrete, 23 general, Monge–Kantorovich (MKP), 247 of Monge–Kantorovich (MKP), vii, xi on IRn , 158 spezialized, 51 with additional constraints (MTPA), vii, xi with partial knowledge of the marginals (MTPP), 4 mass transshipment problem, 13, 371, 378
condition for nontrivial solution, 285 Kantorovich–Rubinstein, 50, 162, 275, 281 necessary condition, 280 for a nontrivial solution, 281 optimal value, 381 with constraints on derivatives of marginals, 378 mathematical economics applications, 322 matrix problem, 182 MAX-algorithm, 254, 257 max-geometric infinitely divisible, 239 max-operator-stable limit theorem, 132 maximal compactification, 226 concentration on the diagonal, 16 cyclic-monotone, 116 dependence, 151 element of a set, 291 measure, 146 maximally dependent random variables, 155 maximum of sums, 144, 148 probability of sets, 144 McKean example, 279 interacting diffusion, 278 McKean–Vlasov equation, 299, 305, 309 McKean-Vlasov equation, 278 MD-operator, 419, 425, 426 measurable function, 420 mapping, 235 selection theorem, 194, 217, 235, 237 measures with a large number of common marginals, 43 method generating function, 261 of antithetic variates, 154
of probability metrics, 204, 273 metric(s) compound, 373 ideal, 30, 81, 82, 102, 183, 193, 223, 374, 411 indicator, 111 k-minimal, 110 Kolmogorov, 111 Lp-Kantorovich, 40 ℓ2-minimal, 112 µ-minimal, 111 minimal, 30, 374 nonpathological, 185 protominimal, 111 simple, 373 space preorder, 332 separable, 332 metrizable, 190 compact case, 190 space, 170, 190, 208 topological spaces, 337 Michael's selection theorem, 306, 339 middle inequality, 291 Milshtein's method, 337 minimal ℓp-metric, 191 distance between X and Y, 62 functionals, 246 L0-metric, 138 ℓ1-metric, 45 ℓ2-metric, 112 Lp-metric, 138 ℓp-coupling, 124, 131 Lp-metric in the space of probabilities, 87 ℓp-metric, 124 Lp-metrics, 194 mean interaction, 307 metric, 417 metric(s), 30, 45, 110, 111, 374 network flow problem, 13 representation of metrics, 45 variance of the sum, 155 minimality of ideal metrics, 414
property of L̂p, 140 Minkowski inequality, 343 minorant, 125 MKP, see Monge–Kantorovich problem MKP, see Monge–Kantorovich mass transportation problem MKTP, see Monge–Kantorovich transportation problem moment formulas, 264 generating function, 255 problems, 52 moment-type marginal constraints, 54 Monge condition, 2, 11, 22, 39, 53 generalized, 24 formulation of the MTP, 2 function, 25 problem, 162 solutions, 118, 129 Monge–Ampère PDE, 123 Monge–Kantorovich functional(s), 5, 65, 179, 183 dual, 64, 87 generalized, 87 primal, 64 mass transfer problem with given marginals, general, 164 mass transportation problem (MKP), vii, ix, xi, xiii, 1, 34 abstract version, 17 generalization, 29 multidimensional, 23 optimal transportation plan (OTP), 2 with capacity constraints, 35 minimal functionals, 246 problem (MKP), 246, 418 conditions for duality, 248 dual, 247 multivariate, 58 with given marginals, 162
transportation problem (MKTP), 374 classical, 374 monotone, 203 convergence, 308 function, 11, 147 cyclical, 10 operator, 287 seminorm, 86 Zarantonello-, 11 Monte Carlo simulation, 151 Moreau’s theorem, 122 MTP, see mass transportation problem MTPA, see mass transportation problem with additional constraints, see mass transportation problem with additional constraints MTPP, see mass transportation problem with partial knowledge of the marginals multi-dimensional martingale inequalities, 340 multichannel models, 74 –multiphased model, 74 multidimensional MKP, 23 multifunction, 363 multinomial distributed, 272 multivariate compound Poisson distribution, 128 normal distribution, 325 setting, 241 summability methods, 126 version of Hoffman’s northwest corner rule, 34 MYZ-rounding, 59 MYZ-rule of rounding, 180 ν-completion, 220 ν-measurable, 220 necessary condition for a nontrivial solution, 281
a nontrivial solution of the mass transshipment problem, 280 the duality relation, 212 negative cost of shipping a unit commodity from origin i to destination j, 27 network, 363 flow problem, 6, 15 minimal, 13 node j, 10 non-Markovian case, 293 nonatomic market games, 370 noncompact reduction theorem, 234 version of mass transfer problem, 220, 244 nondecreasing, 203 nonincreasing function, 394, 421 nonincreasing function, 248 nonmetrizable case, 197 compact case, 196 nonnegative, 66, 203 lower semicontinuous cost function, 213 Radon measure, 166 nonpathological metrics, 185 nonsymmetric cost functions, 12 nonsymmetric case, 8, 12 nontopological version of the mass transfer problem, 265 dual problem, 265 duality theorem, 265, 272 nontotal and nontransitive preference, 344 nontraditional measurable selection theorem, 235 nonuniqueness of optimal solution, 312 nonvoid, 64, 70 normalization condition, 180, 181, 185 normalized rounding error, 81 normed space, 179
northwest corner rule, 2, 7, 17, 22 generalized, 24 Hoffman's, 26 multivariate version, 34 variant, 20 nth integral, 375 numerical approximation of stochastic differential equations, 39 one dimensional case, 120 one-dimensional standard Wiener process, 341, 342 one-dimensional case, 132 operator -ideal metric, 143 -stable, 132 full distribution, 132 limit theorem, 131 random vector(s), 131, 138 strictly, 132 optimal admissible permutation, 16 c-coupling, 123, 130, 131 coupling(s), 6, 16, 37, 112, 117, 123 Dobrushin's result, 36 of Gaussian processes, 120 with local restrictions, 36 couplings, 318 distribution, 39 feasible, 18 finite trajectories, 364 function, 10 joint distribution, 22 ℓ2-couplings, 116 measure, 26, 28, 30, 217, 221, 248, 270 multivariate transshipment costs, 158 over the class of K-stationary rules, 187 pair, 103 rate of approximation, 93 rounding rule, 185 roundings in terms of ideal metrics, 179 rule of rounding, 183
solution for C γ , 308 solution(s), 187, 217 taxation, theory of, 367 trajectory, 364, 365 transportation plan (OTP), 2, 3, 10 for KRP, 2 for MKP, 2 uniqueness of, 13 transshipment, 372 value, 218, 246, 265, 308 in the dual problem, 268 of the dual problem, 253 of the mass transfer problem, 275 of the mass transshipment problem, 381 optimality criterion, 221 of Kantorovich, 163 of a map, 10 of projections, 122 of radial transformations, 121 optimization function, viii, xii problem, 363, 365 order, 323 of convergence, 140 rounding rule, 185 type relation, 15 ordered topological space, 145 ordering criterion, 17 original Kantorovich–Rubinstein problem, 163 Orlicz condition, 26, 29 OTP, see optimal transportation plan output flow, 71 p-conjugate, 102 second, 102 p-th mean interaction, 293 paracompact space, 303 partial derivative, 309 partial derivatives, 280 perfect
compound, 415 measures, 64 metric, 414, 422 metric(s), 422 probability, 63 space, 7 perfectness and marginal extensions, 83 and marginals, 83 piecewise linear interpolation, 338 piecewise smooth oriented curve, 285 Poisson distributed random variable, 271 process, 220 Polish space, 167, 177, 219, 295, 301, 302 Polish spaces, 347 Pollard's lemma, 93 polyhedron dual, 23 primal, 23 positive cost function, 365 definite matrix, 281 semidefinite, 280, 289 precompact, 318 trajectory, 359 preference, 344 nontotal and nontransitive, 344 strict, 345 preferred, 344 rounding rules, 185 preorder, 322, 332 closed, 322, 327, 344 functionally closed, 324, 327, 336, 341 linear, 322, 323 on a completely regular topological space, 341 on a metric space, 332 varying, 337 primal Monge–Kantorovich functional(s), 64 polyhedron (PP), 23
probability density function on IR2, 42 distance, 28 distribution full, 131 measure with µ-density f2∗, 37 measure, µ̂c-convergence of, 29 metric, 28, 414, 419, 426 theory of, 373 perfect, 63 semidistance, 27 space, perfect, 7 problem of mass transfer, 315 of Monge, 162 of variance reduction, 154 on mass transportation, 53 with fixed marginals, 315 product measurable functions, 80 Prohorov metric, 86, 138, 152, 249 Prohorov metric, 92, 417 projection(s), 122, 191 operators, 199 optimality of, 122 propagation of chaos, 277, 289, 319 property, 289 proper mapping, 286 sublinear functional, 244 properties of the metric Kr, 412 protominimal metric, 111 pseudo -difference moment, 96 drift, 289 pseudometric(s), 421, 422 pth mean interaction, 301 norm interaction, 301 pure time discretization, 337 Pyke and Root inequality, 301, 321 quantitative approximation, 254 version of the central limit theorem, 264
quasi-antitone cost functions, 109 quasiconvexity, 284 queueing models, 75 system(s) approximation, 71, 72 dynamics, 74 real, 71 simpler, 71 quicksort, 229 algorithm, 191, 229, 230 R-isotone, 349, 351, 354, 357, 361 function, 345 R-nondecreasing chain, 346 R-regular, 347 -relative compact, 24 -relatively compact, 24 r-th pseudomoment, 92 Rademacher's theorem, 381 radial transformation, 123 radial transformation(s), 121, 132 radius of the set of probabilistic laws, 81 Radon measure, 13, 163, 166, 167, 200, 204, 237 nonnegative, 166 signed, 166 Radon–Nikodym derivative, 247 random broken line(s), 66, 67 field(s), 248 immigration term, 217 measure, 327 polygon line(s), 64 recursion, 236 search algorithm, 269 search tree(s), 260 variables, maximally dependent, 155 vector, 131 walk method, 126, 128 random recursion, 248 range of values of Eh(X − Y), 63 rate explosions, 373 of convergence
in the central limit theorem, 34 in the stable limit theorem, 35 of transshipment, 372 rate of convergence, 138, 143, 181, 199, 248, 322, 327 bound in the local central limit theorem, 137 exponential, 219 faster, 186 in the CLT, 85, 275 for random elements with LePage representation, 91 in the i.i.d. case, 86 problem, 131 result(s), 126, 138 square uniform, 323 to zero, 323 under alternative distributional assumptions, 263 rational choice theory, 352 rationalizable, 368 real queueing system, 71 real-valued function, isotone, 324 recursion of branching type with multiplicative weights, 207 of branching-type, 206 recursive algorithm, 191 reduced cost function, 170, 190, 348 associated with the original cost function, 332 reduction theorem(s), 190, 192, 198, 209, 211, 277, 279 noncompact, 234 reflections, 120 reflexive relation, 322 regular cost function, 176, 279 function, 176, 299 functional, 414
with respect to R, 347 regularity, 143, 413 related theorem, 178 relation binary, 322 order-type, 15 reflexive, 322 transitive, 322 relaxed Lipschitz condition, 308 side conditions, 7 transportation problem, 4, 8 relaxed transportation problem, 52 representation of metrics, minimal, 45 utility, 45 Robbins–Monroe-type recursion, 206 Robbins–Siegmund lemma, 206 Rosenthal inequality, 168 rounding error, 81 normalized, 81 total, 81 of random proportions, 80 problem, 52 rule(s), 180, 185 optimal, 185 order, 185 rth absolute pseudomoment, 143 rth difference pseudomoment, 122, 142 rule of rounding, 183 optimal, 183 Ryll-Nardzewski, result of, 63 σ-additive, 63 σ-completion, 167 σ-continuity upwards, 72 σ-continuous upwards, 70 σ-measurable, 167 Schur complement, 134 SDE, see stochastic differential equations SDEs with a drift, 294
with mean interaction in time, 293 search tree binary, 260, 263 random, 260 second p-conjugate, 102 selection theorem, 194, 217, 237 of Michael, 306, 339 self-decomposable, 246 selling strategy, 367, 368 semi-infinite linear programs, 307 semicontinuous function lower, 70, 171 upper, 70 semidistance, 27 semilinear space, 241, 255, 260 semimetric, 67 cost function, 14 separable Fréchet space, 337 metric space, 332 separation theorem, 325, 328 of Lusin, 178 set of m-dimensional vectors, 51 set-valued mapping, 358, 363 sharpness of Hoeffding–Fréchet bounds, 152 signature algorithms, 23 of a graph, 23 signed finite measure, 265 Radon measure, 166 simple measure, 396 metric(s), 373 signed measure, 405 simple metric, 415 simplex, 80 simplex method, 270 simultaneous representations, 32 single-channel models, 74 single-valued C1-operator, 288 Skorohod–Lebesgue spaces, 32, 33 smallest elements of the marginal, 12 smooth transportation plans, 373
smooth convex function, 289 smoothing Kantorovich metric, 87 smoothness of the cost function, 279 solution of mass transportation, 85 of mass transshipment problems, 85 of the maximization problem, 2 of the SDE, 281, 331 solution of extremal problem(s), 139–141 the extension problem, 296 the maximization problem, 138 southwest corner rule, 25 square uniform rate of convergence, 323 stability of stochastic optimization problem, 49 programs, 49 stable central limit theorem, 117 limit theorem(s), 102, 124, 126 symmetric law, 125 stable limit theorem rate of convergence, 35 starlike, 285 stationary divisor rules, 181 rule(s), 185 of (1/t)-rounding, 186 stochastic applications, 27 of the MKP, 27 differential equations, 277 numerical approximation, 39 dominance, 341 Euler method, 337 inequality, 110 mappings, 277 optimization problem, 49 order, 15, 144
ordering, 147, 148 Strassen representation theorem, 146 theorem, 154, 417 application of the duality theory, 319 Strassen–Dudley theorem, 105 strict preference, 345 strictly α-stable random vector, 243 operator-stable distribution, 132 operator-stable random vector, 131 strong axiom of revealed preference, 352 law of large numbers, 198 metric, 43 solution of the SDE, 281 stochastic dominance, 341 subadditive, 30 subadditivity, 66 subdifferential, 113, 178 subgradient, 113 sublinear functional, 243, 244 submartingale, 211 subnet, 187 subspace, closed, 172 sufficient condition for a nontrivial solution, 281 summability method, 126 superadditive, 34 superadditive function, 25 superlinear mapping, 241 superlinearity, 241 supply distribution, 2 support of a measure, 403 of marginal measures, 145 supporting hyperplane, 114 survival function, 375 Suslin function(s), 78 set(s), 78 symmetric α-stable, 126
U-valued random variable ϑ, 91 cost function, 4, 11 symmetric matrix, 289 system of interacting particles, 298 τ-continuity downwards, 72 upwards, 72 τ-continuous downwards, 70 tail condition, 250 theorem by Weizsäcker and Winkler, 19 ergodic, 214 of Arzela, 202, 218 of Berry–Esséen, 255 of Choquet, 79 of Debreu, 323, 329, 335 of Dini, 74 of Dobrushin, 93 of Douglas, 20 of Dubovitskii–Milyutin on convex sets, 180, 184 of Fenchel–Moreau, 178 of Fubini, 5 of Gutman, 43 of Hahn–Banach, 61, 393, 402 of Isii, duality, 59 of Kantorovich, 88 of Kantorovich–Rubinstein, 412 extension, 406 of Krein–Milman and Choquet, 19 of Krein–Smulyan, 251, 256 of Lusin, 74, 178 of Lyapunov, 261 of Michael, 306, 339 of Moreau, 122 of Rademacher, 381 of Strassen, 146, 154, 417 application, 319 of Strassen–Dudley, 105 theory of moments, 52 of monopoles with incomplete information, 367
of optimal taxation, 367 of probability metrics, 373 of rounding, 179 thick convex cone, 179, 184 threshold for rounding, 180 time discretization methods, 332 discretization of the SDE, 332 time discretization points, 41 topological properties, 21 of Kantorovich–Rubinstein MTP, 13 spaces, 63, 219, 337 completely regular, 221 ordered, 145 version of Fréchet problem, 262 topology of weak convergence, 322 total cost, bounds to, 158 mass, 375 rounding error, 81 variation distance, 111 metric, 30, 93, 253 variation distance, 133, 136 variation norm, 375 TP, see transportation problem trajectory, 358, 363 efficient infinite, 364 infinite, 365 of dynamical system, 354, 358 optimal, 364 finite, 364 transfer function, 367 problem, 162 transformation by Markov kernel, 199 transitive relation, 322 transportation cost of a unit from node i to node j, 10 cost, upper bound for, 18 plan, 2, 40 problem (TP), 15, 21
discrete, 2 relaxed, 4, 8, 52 with local upper bounds, 40 with nonnegative cost function, 2 transshipment, 271 cost, optimal multivariate, 158 network flow problem, 372 plans, 47 problem of Kantorovich–Rubinstein (KRP), vii, xi rate, 372 tree splitting protocols, 220 triangle inequality, 174, 179, 183, 217, 271, 290 trinary feedback, 220 triple of points, 271 two-dimensional case, 43 u-chaotic, 278, 288 uniform bound, 317 distance, 183 between characteristic functions, 136 distribution, 272 k-modulus of continuity, 393 metric, 133, 136, 137 depending on the exponent B, 133 norm, 219 uniformly convergent, 201 tapered matrix, 27 unimodality condition, 8, 39 uniqueness of OTP, 13 univariable case, 380 universal utility theorem, 340 universally measurable, 167, 192, 197, 220, 226, 235, 245 set, 167, 220 upper bound for the transportation cost, 18 bounds finiteness, 122 bounds for L̂p, 152
envelope, 358 Fréchet bounds, 2 Hoeffding–Fréchet bound, 21, 31 semicontinuous, 358–362 function, 70, 81 Urysohn lemma, 176, 328 usual Fréchet bounds, 42 usual stochastic dominance, 341 utility continuous, 337 function(s), 44, 329–332, 345 d-Lipschitz, 349 of a preorder, 323 -rational choice function, 352 representation, 45 theorem, 337, 340, 344 variance of the sum, 155 reduction, 154 variation distance, total, 111 metric, total, 30 norm, 375 vector problem, 179 Wasserstein metric, Lp, 40 norm, 404 Wasserstein metric, 322, 332 weak approximation of SDEs, 332 convergence, 102, 152, 182, 232, 278, 322 metric, 43 weak* compact, 308 compactness, 256 lower semicontinuity, 257 semicontinuous, 178, 188 precompact, 318 weakly perfect metric, 415 regular functional, 427 weakly*
closed, 256, 262 compact, 257, 262 convergent subnet, 308 subsequence, 308 lower semicontinuous, 256, 257 wealth changes, 248 Webster rounding, 59, 188 rule, 185, 188 Weibull distribution, 123 weighted total variation metric, 427 Wiener process, 43, 241, 333, 338, 339, 341 q-dimensional, 334 discretization, 336 increments, 346 one-dimensional, 341, 342 standard, 347, 348
Woyczynski inequality, 196 χp-metric, 249 χp-minimal metric, 249 Young inequality, 113, 114, 124 ζF-representation for ℓp, 99 ζn-metric, 47 Zarantonello-monotone, 11 Zolotarev ideal metric, 193, 275, 381 metric, 97, 107, 412, 416 metric ζr, 218 type metric, 102 ζn-metric, 374 ζr-metric, 413 Zorn's lemma, 291