COUNTEREXAMPLES IN PROBABILITY (TOC)
Preface to the Second Edition
Preface to the First Edition
Basic Notation and Abbreviations
Part 1. Classes of Random Events and Probabilities
Section 1. Classes of Random Events
1.1. A Class of Events Which Is a Field but Not a σ-Field
1.2. A Class of Events Can Be Closed under Finite Unions and Finite Intersections but Not under Complements
1.3. A Class of Events Which Is a Semi-Field but Not a Field
1.4. A σ-Field of Subsets of Ω Need Not Contain All Subsets of Ω
1.5. Every σ-Field of Events Is a D-System, but the Converse Does Not Always Hold
1.6. Sets Which Are Not Events in the Product σ-Field
1.7. The Union of a Sequence of σ-Fields Need Not Be a σ-Field
Section 2. Probabilities
2.1. A Probability Measure Which Is Additive but Not σ-Additive
2.2. The Coincidence of Two Probability Measures On a Given Class Does Not Always Imply Their Coincidence On the σ-Field Generated by This Class
2.3. On the Validity of the Kolmogorov Extension Theorem in $(\mathbb{R}^\infty, \mathcal{B}^\infty)$
2.4. There May Not Exist a Regular Conditional Probability with Respect to a Given σ-Field
Section 3. Independence of Random Events
3.1. Random Events with a Different Kind of Dependence
3.2. The Pairwise Independence of Random Events Does Not Imply Their Mutual Independence
3.3. The Relation P(ABC) = P(A)P(B)P(C) Does Not Always Imply the Mutual Independence of the Events A, B, C
3.4. A Collection of n + 1 Dependent Events Such That Any n of Them Are Mutually Independent
3.5. Collections of Random Events with 'Unusual' Independence/Dependence Properties
3.6. Is There a Relationship between Conditional and Mutual Independence of Random Events?
3.7. Independence Type Conditions Which Do Not Imply the Mutual Independence of a Set of Events
3.8. Mutually Independent Events Can Form Families Which Are Strongly Dependent
3.9. Independent Classes of Random Events Can Generate σ-Fields Which Are Not Independent
Section 4. Diverse Properties of Random Events and Their Probabilities
4.1. Probability Spaces Without Non-Trivial Independent Events: Totally Dependent Spaces
4.2. On the Borel-Cantelli Lemma and Its Corollaries
4.3. When Can a Set of Events Be Both Exhaustive and Independent?
4.4. How Are Independence and Exchangeability Related?
4.5. A Sequence of Random Events Which Is Stable but Not Mixing
Part 2. Random Variables and Basic Characteristics
Section 5. Distribution Functions of Random Variables
5.1. Equivalent Random Variables Are Identically Distributed but the Converse Is Not True
5.2. If X, Y, Z Are Random Variables on the Same Probability Space, Then $X \stackrel{d}{=} Y$ Does Not Always Imply That $XZ \stackrel{d}{=} YZ$
5.3. Different Distributions Can Be Transformed by Different Functions to the Same Distribution
5.4. A Function Which Is a Metric on the Space of Distributions but Not on the Space of Random Variables
5.5. On the n-Dimensional Distribution Functions
5.6. The Continuity Property of One-Dimensional Distributions May Fail in the Multi-Dimensional Case
5.7. On the Absolute Continuity of the Distribution of a Random Vector and of Its Components
5.8. There Are Infinitely Many Multi-Dimensional Probability Distributions with Given Marginals
5.9. The Continuity of a Two-Dimensional Probability Density Does Not Imply That the Marginal Densities Are Continuous
5.10. The Convolution of a Unimodal Probability Density Function with Itself Is Not Always Unimodal
5.11. The Convolution of Unimodal Discrete Distributions Is Not Always Unimodal
5.12. Strong Unimodality Is a Stronger Property Than the Usual Unimodality
5.13. Every Unimodal Distribution Has a Unimodal Concentration Function, but the Converse Does Not Hold
Section 6. Expectations and Conditional Expectation
6.1. On the Linearity Property of Expectations
6.2. An Integrable Sequence of Non-Negative Random Variables Need Not Have a Bounded Supremum
6.3. A Necessary Condition Which Is Not Sufficient for the Existence of the First Moment
6.4. A Condition Which Is Sufficient but Not Necessary for the Existence of Moment of Order (-1) of a Random Variable
6.5. An Absolutely Continuous Distribution Need Not Be Symmetric Even Though All Its Central Odd-Order Moments Vanish
6.6. A Property of the Moments of Random Variables Which Does Not Have an Analogue for Random Vectors
6.7. On the Validity of the Fubini Theorem
6.8. A Non-Uniformly Integrable Family of Random Variables
6.9. On the Relation E[E(X|Y)] = EX
6.10. Is It Possible to Extend One of the Properties of the Conditional Expectation?
6.11. The Mean-Median-Mode Inequality May Fail to Hold
6.12. Not All Properties of Conditional Expectations Have Analogues for Conditional Medians
Section 7. Independence of Random Variables
7.1. Discrete Random Variables Which Are Pairwise but Not Mutually Independent
7.2. Absolutely Continuous Random Variables Which Are Pairwise but Not Mutually Independent
7.3. A Set of Dependent Random Variables Such That Any of Its Subsets Consists of Mutually Independent Variables
7.4. Collection of n Dependent Random Variables Which Are m-Wise Independent
7.5. An Independence-Type Property for Random Variables
7.6. Dependent Random Variables X and Y Such That $X^2$ and $Y^2$ Are Independent
7.7. The Independence of Random Variables in Terms of Characteristic Functions
7.8. The Independence of Random Variables in Terms of Generating Functions
7.9. The Distribution of a Sum Can Be Expressed by the Convolution Even If the Variables Are Dependent
7.10. Discrete Random Variables Which Are Uncorrelated but Not Independent
7.11. Absolutely Continuous Random Variables Which Are Uncorrelated but Not Independent
7.12. Independent Random Variables Have Zero Correlation Ratio, But the Converse Is Not True
7.13. The Relation E[Y|X] = EY Almost Surely Does Not Imply That the Random Variables X and Y Are Independent
7.14. There Is No Relationship between the Notions of Independence and Conditional Independence
7.15. Mutual Independence Implies the Exchangeability of Any Set of Random Variables, but Not Conversely
7.16. Different Kinds of Monotone Dependence between Random Variables
Section 8. Characteristic and Generating Functions
8.1. Different Characteristic Functions Which Coincide on a Finite Interval but Not On the Whole Real Line
8.2. Discrete and Absolutely Continuous Distributions Can Have Characteristic Functions Coinciding On the Interval [-1, 1]
8.3. The Absolute Value of a Characteristic Function Is Not Necessarily a Characteristic Function
8.4. The Ratio of Two Characteristic Functions Need Not Be a Characteristic Function
8.5. The Factorization of a Characteristic Function into Indecomposable Factors May Not Be Unique
8.6. An Absolutely Continuous Distribution Can Have a Characteristic Function Which Is Not Absolutely Integrable
8.7. A Discrete Distribution Without a First-Order Moment but with a Differentiable Characteristic Function
8.8. An Absolutely Continuous Distribution Without Expectation but with a Differentiable Characteristic Function
8.9. The Convolution of Two Indecomposable Distributions Can Even Have a Normal Component
8.10. Does the Existence of All Moments of a Distribution Guarantee the Analyticity of Its Characteristic and Moment Generating Functions?
Section 9. Infinitely Divisible and Stable Distributions
9.1. A Non-Vanishing Characteristic Function Which Is Not Infinitely Divisible
9.2. If $|\varphi|$ Is an Infinitely Divisible Characteristic Function, This Does Not Always Imply That $\varphi$ Is Also Infinitely Divisible
9.3. The Product of Two Independent Non-Negative and Infinitely Divisible Random Variables Is Not Always Infinitely Divisible
9.4. Infinitely Divisible Products of Non-Infinitely Divisible Random Variables
9.5. Every Distribution Without Indecomposable Components Is Infinitely Divisible, but the Converse Is Not True
9.6. A Non-Infinitely Divisible Random Vector with Infinitely Divisible Subsets of Its Coordinates
9.7. A Non-Infinitely Divisible Random Vector with Infinitely Divisible Linear Combinations of Its Components
9.8. Distributions Which Are Infinitely Divisible but Not Stable
9.9. A Stable Distribution Which Can Be Decomposed into Two Infinitely Divisible but Not Stable Distributions
Section 10. Normal Distribution
10.1. Non-Normal Bivariate Distributions with Normal Marginals
10.2. If $(X_1, X_2)$ Has a Bivariate Normal Distribution Then $X_1$, $X_2$ and $X_1 + X_2$ Are Normally Distributed, but Not Conversely
10.3. A Non-Normally Distributed Random Vector Such That Any Proper Subset of Its Components Consists of Jointly Normally Distributed and Mutually Independent Random Variables
10.4. The Relationship between Two Notions: Normality and Uncorrelatedness
10.5. It Is Possible That X, Y, X + Y, X - Y Are Each Normally Distributed, X and Y Are Uncorrelated, but (X, Y) Is Not Bivariate Normal
10.6. If the Distribution of $(X_1, \ldots, X_n)$ Is Normal, Then Any Linear Combination and Any Subset of $X_1, \ldots, X_n$ Is Normally Distributed, but There Is a Converse Statement Which Is Not True
10.7. The Condition Characterizing the Normal Distribution by Normality of Linear Combinations Cannot Be Weakened
10.8. Non-Normal Distributions Such That All or Some of the Conditional Distributions Are Normal
10.9. Two Random Vectors with the Same Normal Distribution Can Be Obtained in Different Ways from Independent Standard Normal Random Variables
10.10. A Property of a Gaussian System May Hold Even for Discrete Random Variables
Section 11. The Moment Problem
11.1. The Moment Problem for Powers of the Normal Distribution
11.2. The Lognormal Distribution and the Moment Problem
11.3. The Moment Problem for Powers of an Exponential Distribution
11.4. A Class of Hyper-Exponential Distributions with an Indeterminate Moment Problem
11.5. Different Distributions with Equal Absolute Values of the Characteristic Functions and the Same Moments of All Orders
11.6. Another Class of Absolutely Continuous Distributions Which Are Not Determined Uniquely by Their Moments
11.7. Two Different Discrete Distributions on a Subset of Natural Numbers Both Having the Same Moments of All Orders
11.8. Another Family of Discrete Distributions with the Same Moments of All Orders
11.9. On the Relationship between Two Sufficient Conditions for the Determination of the Moment Problem
11.10. The Carleman Condition Is Sufficient but Not Necessary for the Determination of the Moment Problem
11.11. The Krein Condition Is Sufficient but Not Necessary for the Moment Problem to Be Indeterminate
11.12. An Indeterminate Moment Problem and Non-Symmetric Distributions Whose Odd-Order Moments All Vanish
11.13. A Non-Symmetric Distribution with Vanishing Odd-Order Moments Can Coincide with the Normal Distribution Only Partially
Section 12. Characterization Properties of Some Probability Distributions
12.1. A Binomial Sum of Non-Binomial Random Variables
12.2. A Property of the Geometric Distribution Which Is Not Its Characterization Property
12.3. If the Random Variables X, Y and Their Sum X + Y Each Have a Poisson Distribution, This Does Not Imply That X and Y Are Independent
12.4. The Raikov Theorem Does Not Hold Without the Independence Condition
12.5. The Raikov Theorem Does Not Hold for a Generalized Poisson Distribution of Order k, k ≥ 2
12.6. A Case When the Cramér Theorem Is Not Applicable
12.7. A Pair of Unfair Dice May Behave Like a Pair of Fair Dice
12.8. On Two Properties of the Normal Distribution Which Are Not Characterizing Properties
12.9. Another Interesting Property Which Does Not Characterize the Normal Distribution
12.10. Can We Weaken Some Conditions under Which Two Distribution Functions Coincide?
12.11. Does the Renewal Equation Determine Uniquely the Probability Density?
12.12. A Property Not Characterizing the Cauchy Distribution
12.13. A Property Not Characterizing the Gamma Distribution
12.14. An Interesting Property Which Does Not Characterize Uniquely the Inverse Gaussian Distribution
Section 13. Diverse Properties of Random Variables
13.1. On the Symmetry Property of the Sum or the Difference of Two Symmetric Random Variables
13.2. When Is a Mixture of Normal Distributions Infinitely Divisible?
13.3. A Distribution Function Can Belong to the Class IFRA but Not to IFR
13.4. A Continuous Distribution Function of the Class NBU Which Is Not of the Class IFR
13.5. Exchangeable and Tail Events Related to Sequences of Random Variables
13.6. The de Finetti Theorem for an Infinite Sequence of Exchangeable Random Variables Does Not Always Hold for a Finite Number of Such Variables
13.7. Can We Always Extend a Finite Set of Exchangeable Random Variables?
13.8. Collections of Random Variables Which Are or Are Not Independent and Are or Are Not Exchangeable
13.9. Integrable Randomly Stopped Sums with Non-Integrable Stopping Times
Part 3. Limit Theorems
Section 14. Various Kinds of Convergence of Sequences of Random Variables
14.1. Convergence and Divergence of Sequences of Distribution Functions
14.2. Convergence in Distribution Does Not Imply Convergence in Probability
14.3. Sequences of Random Variables Converging in Probability but Not Almost Surely
14.4. On the Borel-Cantelli Lemma and Almost Sure Convergence
14.5. On the Convergence of Sequences of Random Variables in $L^r$-Sense for Different Values of r
14.6. Sequences of Random Variables Converging in Probability but Not in $L^r$-Sense
14.7. Convergence in $L^r$-Sense Does Not Imply Almost Sure Convergence
14.8. Almost Sure Convergence Does Not Necessarily Imply Convergence in $L^r$-Sense
14.9. Weak Convergence of the Distribution Functions Does Not Imply Convergence of the Densities
14.10. The Convergence $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ Does Not Always Imply That $X_n + Y_n \xrightarrow{d} X + Y$
14.11. The Convergence in Probability $X_n \xrightarrow{P} X$ Does Not Always Imply That $g(X_n) \xrightarrow{P} g(X)$ for Any Function g
14.12. Convergence in Variation Implies Convergence in Distribution but the Converse Is Not Always True
14.13. There Is No Metric Corresponding to Almost Sure Convergence
14.14. Complete Convergence of Sequences of Random Variables Is Stronger Than Almost Sure Convergence
14.15. The Almost Sure Uniform Convergence of a Random Sequence Implies Its Complete Convergence, but the Converse Is Not True
14.16. Converging Sequences of Random Variables Such That the Sequences of the Expectations Do Not Converge
14.17. Weak $L^1$-Convergence of Random Variables Is Weaker Than Both Weak Convergence and Convergence in $L^1$-Sense
14.18. A Converging Sequence of Random Variables Whose Cesàro Means Do Not Converge
Section 15. Laws of Large Numbers
15.1. The Markov Condition Is Sufficient but Not Necessary for the Weak Law of Large Numbers
15.2. The Kolmogorov Condition for Arbitrary Random Variables Is Sufficient but Not Necessary for the Strong Law of Large Numbers
15.3. A Sequence of Independent Discrete Random Variables Satisfying the Weak but Not the Strong Law of Large Numbers
15.4. A Sequence of Independent Absolutely Continuous Random Variables Satisfying the Weak but Not the Strong Law of Large Numbers
15.5. The Kolmogorov Condition $\sum_{n=1}^{\infty} \sigma_n^2/n^2 < \infty$ Is the Best Possible Condition for the Strong Law of Large Numbers
15.6. More on the Strong Law of Large Numbers Without the Kolmogorov Condition
15.7. Two 'Near' Sequences of Random Variables Such That the Strong Law of Large Numbers Holds for One of Them and Does Not Hold for the Other
15.8. The Law of Large Numbers Does Not Hold If Almost Sure Convergence Is Replaced by Complete Convergence
15.9. The Uniform Boundedness of the First Moments of a Tight Sequence of Random Variables Is Not Sufficient for the Strong Law of Large Numbers
15.10. The Arithmetic Means of a Random Sequence Can Converge in Probability Even If the Strong Law of Large Numbers Fails to Hold
15.11. The Weighted Averages of a Sequence of Random Variables Can Converge Even If the Law of Large Numbers Does Not Hold
15.12. The Law of Large Numbers with a Special Choice of Norming Constants
Section 16. Weak Convergence of Probability Measures and Distributions
16.1. Defining Classes and Classes Defining Convergence
16.2. In the Case of Convergence in Distribution, Do the Corresponding Probability Measures Converge for All Borel Sets?
16.3. Weak Convergence of Probability Measures Need Not Be Uniform
16.4. Two Cases When the Continuity Theorem Is Not Valid
16.5. Weak Convergence and Lévy Metric
16.6. A Sequence of Probability Density Functions Can Converge in the Mean of Order 1 Without Being Converging Everywhere
16.7. A Version of the Continuity Theorem for Distribution Functions Which Does Not Hold for Some Densities
16.8. Weak Convergence of Distribution Functions Does Not Imply Convergence of the Moments
16.9. Weak Convergence of a Sequence of Distributions Does Not Always Imply the Convergence of the Moment Generating Functions
16.10. Weak Convergence of a Sequence of Distribution Functions Does Not Always Imply Their Convergence in the Mean
Section 17. Central Limit Theorem
17.1. Sequences of Random Variables Which Do Not Satisfy the Central Limit Theorem
17.2. How Is the Central Limit Theorem Connected with the Feller Condition and the Uniform Negligibility Condition?
17.3. Two 'Equivalent' Sequences of Random Variables Such That One of Them Obeys the Central Limit Theorem While the Other Does Not
17.4. If the Sequence of Random Variables $\{X_n\}$ Satisfies the Central Limit Theorem, What Can We Say about the Variance of $S_n/\sqrt{\mathsf{V}S_n}$?
17.5. Not Every Interval Can Be a Domain of Normal Convergence
17.6. The Central Limit Theorem Does Not Always Hold for Random Sums of Random Variables
17.7. Sequences of Random Variables Which Satisfy the Integral but Not the Local Central Limit Theorem
Section 18. Diverse Limit Theorems
18.1. On the Conditions in the Kolmogorov Three-Series Theorem
18.2. The Independency Condition Is Essential in the Kolmogorov Three-Series Theorem
18.3. The Interchange of Expectations and Infinite Summation Is Not Always Possible
18.4. A Relationship between a Convergence of Random Sequences and Convergence of Conditional Expectations
18.5. The Convergence of a Sequence of Random Variables Does Not Imply That the Corresponding Conditional Medians Converge
18.6. A Sequence of Conditional Expectations Can Converge Only on a Set of Measure Zero
18.7. When Is a Sequence of Conditional Expectations Convergent Almost Surely?
18.8. The Weierstrass Theorem for the Unconditional Convergence of a Numerical Series Does Not Hold for a Series of Random Variables
18.9. A Condition Which Is Sufficient but Not Necessary for the Convergence of a Random Power Series
18.10. A Random Power Series Without a Radius of Convergence in Probability
18.11. Two Sequences of Random Variables Can Obey the Same Strong Law of Large Numbers but One of Them May Not Be in the Domain of Attraction of the Other
18.12. Does a Sequence of Random Variables Always Imitate Normal Behaviour?
18.13. On the Chover Law of Iterated Logarithm
18.14. On Record Values and Maxima of a Sequence of Random Variables
Part 4. Stochastic Processes
Section 19. Basic Notions on Stochastic Processes
19.1. Is It Possible to Find a Probability Space on Which Any Stochastic Process Can Be Defined?
19.2. What Is the Role of the Family of Finite-Dimensional Distributions in Constructing a Stochastic Process with Specific Properties?
19.3. Stochastic Processes Whose Modifications Possess Quite Different Properties
19.4. On the Separability Property of Stochastic Processes
19.5. Measurable and Progressively Measurable Stochastic Processes
19.6. On the Stochastic Continuity and the Weak $L^1$-Continuity of Stochastic Processes
19.7. Processes Which Are Stochastically Continuous but Not Continuous Almost Surely
19.8. Almost Sure Continuity of Stochastic Processes and the Kolmogorov Condition
19.9. Does the Riemann or Lebesgue Integrability of the Covariance Function Ensure the Existence of the Integral of a Stochastic Process?
19.10. The Continuity of a Stochastic Process Does Not Imply the Continuity of Its Own Generated Filtration, and Vice Versa
Section 20. Markov Processes
20.1. Non-Markov Random Sequences Whose Transition Functions Satisfy the Chapman-Kolmogorov Equation
20.2. Non-Markov Processes Which Are Functions of Markov Processes
20.3. Comparison of Three Kinds of Ergodicity of Markov Chains
20.4. Convergence of Functions of an Ergodic Markov Chain
20.5. A Useful Property of Independent Random Variables Which Cannot Be Extended to Stationary Markov Chains
20.6. The Partial Coincidence of Two Continuous-Time Markov Chains Does Not Imply That the Chains Are Equivalent
20.7. Markov Processes, Feller Processes, Strong Feller Processes and Relationships between Them
20.8. Markov but Not Strong Markov Processes
20.9. Can a Differential Operator of Order k > 2 Be an Infinitesimal Operator of a Markov Process?
Section 21. Stationary Processes and Some Related Topics
21.1. On the Weak and the Strict Stationary Properties of Stochastic Processes
21.2. On the Strict Stationarity of a Given Order
21.3. The Strong Mixing Property Can Fail If We Consider a Functional of a Strictly Stationary Strong Mixing Process
21.4. A Strictly Stationary Process Can Be Regular but Not Absolutely Regular
21.5. Weak and Strong Ergodicity of Stationary Processes
21.6. A Measure-Preserving Transformation Which Is Ergodic but Not Mixing
21.7. On the Convergence of Sums of φ-Mixing Random Variables
21.8. The Central Limit Theorem for Stationary Random Sequences
Section 22. Discrete-Time Martingales
22.1. Martingales Which Are $L^1$-Bounded but Not $L^1$-Dominated
22.2. A Property of a Martingale Which Is Not Preserved Under Random Stopping
22.3. Martingales for Which the Doob Optional Theorem Fails to Hold
22.4. Every Quasimartingale Is an Amart, but Not Conversely
22.5. Amarts, Martingales in the Limit, Eventual Martingales and Relationships between Them
22.6. Relationships between Amarts, Progressive Martingales and Quasimartingales
22.7. An Eventual Martingale Need Not Be a Game Fairer with Time
22.8. Not Every Martingale-Like Sequence Admits a Riesz Decomposition
22.9. On the Validity of Two Inequalities for Martingales
22.10. On the Convergence of Submartingales Almost Surely and in $L^1$-Sense
22.11. A Martingale May Converge in Probability but Not Almost Surely
22.12. Zero-Mean Martingales Which Are Divergent with a Given Probability
22.13. More on the Convergence of Martingales
22.14. A Uniformly Integrable Martingale with a Nonintegrable Quadratic Variation
Section 23. Continuous-Time Martingales
23.1. Martingales Which Are Not Locally Square Integrable
23.2. Every Martingale Is a Weak Martingale but the Converse Is Not Always True
23.3. The Local Martingale Property Is Not Always Preserved under Change of Time
23.4. A Uniformly Integrable Supermartingale Which Does Not Belong to Class (D)
23.5. $L^p$-Bounded Local Martingale Which Is Not a True Martingale
23.6. A Sufficient but Not Necessary Condition for a Process to Be a Local Martingale
23.7. A Square Integrable Martingale with a Non-Random Characteristic Need Not Be a Process with Independent Increments
23.8. The Time-Reversal of a Semimartingale Can Fail to Be a Semimartingale
23.9. Functions of Semimartingales Which Are Not Semimartingales
23.10. Gaussian Processes Which Are Not Semimartingales
23.11. On the Possibility of Representing a Martingale As a Stochastic Integral with Respect to Another Martingale
Section 24. Poisson Process and Wiener Process
24.1. On Some Elementary Properties of the Poisson Process and the Wiener Process
24.2. Can the Poisson Process Be Characterized by Only One of Its Properties?
24.3. The Conditions under Which a Process Is a Poisson Process Cannot Be Weakened
24.4. Two Dependent Poisson Processes Whose Sum Is Still a Poisson Process
24.5. Multidimensional Gaussian Processes Which Are Close to the Wiener Process
24.6. On the Wald Identities for the Wiener Process
24.7. Wald Identity and a Non-Uniformly Integrable Martingale Based on the Wiener Process
24.8. On Some Properties of the Variation of the Wiener Process
24.9. A Wiener Process with Respect to Different Filtrations
24.10. How to Enlarge the Filtration and Preserve the Markov Property of the Brownian Bridge
Section 25. Diverse Properties of Stochastic Processes
25.1. How Can We Find the Probabilistic Characteristics of a Function of a Stationary Gaussian Process?
25.2. Cramér Representation, Multiplicity and Spectral Type of Stochastic Processes
25.3. Weak and Strong Solutions of Stochastic Differential Equations
25.4. A Stochastic Differential Equation Which Does Not Have a Strong Solution but For Which a Weak Solution Exists and Is Unique
Supplementary Remarks
References
Index
PREFACE TO THE SECOND EDITION
A large amount of newly collected and created material and the lively interest in the first edition of this book (CEP-I) motivated me towards the second edition (CEP-2). Actually, I have never stopped looking for new counterexamples or thinking about how to achieve completeness and clarity as far as possible in this work. My strategy was to keep the best from CEP-I, replace some examples by new and more attractive ones and add entirely new examples taken from recent publications or invented especially for CEP-2. Thus the reader will find several original topics well supplementing the material in CEP-I. Among the topics essentially extended are independence/dependence/exchangeability properties of sets of random events and random variables, characterization of probability distributions, the moment problem, martingales and limit theorems. Clearer interpretations of many statements and improvements in presentation have been made in all sections. The text of CEP-2 is more compact. However, much material has remained unused in order to keep the book a reasonable size. The Index, Supplementary Remarks and the References have been updated and extended accordingly. My work on CEP-2 took a long time and, as always, my enthusiasm was based on my strong belief in the importance of the role of counterexamples to everyone teaching or learning probability theory. Additional stimuli came from the positive reactions of so many colleagues in so many countries. Like many others I experienced difficulties during this time and first had to solve the problem of how to survive in this changing and unpredictable world. I now use this opportunity to express sincere thanks to many colleagues and friends for their attention and support during my visits to several universities in The Netherlands, Great Britain, Russia, Italy, Canada, USA, France and Spain.
In particular, large portions of CEP-2 were prepared when I was visiting Queen's University (Kingston, Ontario) and Miami University (Oxford, Ohio). The last stages of this work were undertaken during a recent visit to Universite Joseph Fourier (Grenoble) and in Sofia just before my trip to Kentucky. I am very grateful for my collaboration with John Wiley & Sons (Chichester). The attention, the patience and the help of Helen Ramsey and Jenny Smith were much appreciated. My thanks go to them and to all the staff at Wiley.
Finally, I hope that you, the reader, will benefit from this edition and my belief that new counterexamples will be created as an essential part of the further development of probability theory. As before, any new suggestions are welcome!
July/August 1996
Europe/America
Jordan Stoyanov
PREFACE TO THE FIRST EDITION
General comments. We have used the term counterexample in the same sense as generally accepted in mathematics. Three previous books related to counterexamples: on analysis (Gelbaum and Olmsted 1964), on topology (Steen and Seebach 1978) and on graph theory (Capobianco and Molluzzo 1978), have been and still are popular among mathematicians. The present book is a collection of counterexamples covering important topics in the field of probability theory and stochastic processes. It is not only traditional theorems, proofs and illustrative examples, but also counterexamples, which reflect the power, the width, the depth, the degree of nontriviality and the beauty of the theory. If we have found necessary and sufficient conditions for some statement or result, then any change in the conditions implies that the result is false and accordingly the statement has to be modified. Our attention is focused on interesting questions concerning: (a) the necessity of some sufficient conditions; (b) the sufficiency of certain necessary conditions; (c) the validity of a statement which is the converse to another statement. However, we have included some useful and instructive examples which can be interpreted as counterexamples in a generalized sense.
Purpose of the book. The present book is intended to serve as a supplementary source for many courses in the field of probability theory and stochastic processes. The topics dealt with in the book, and the level of counterexamples, are chosen in such a way that it becomes a multi-purpose book. Firstly, it can be used for any standard course in probability theory for undergraduates. Secondly, some of the material is suitable for advanced courses in probability theory and stochastic processes, assuming that the students have had a course in measure theory and function theory. Thirdly, young researchers and even professionals will find the book useful and may discover new and strange results. The wide variety of content and detail in the discussions of the counterexamples may also help lecturers and tutors in their teaching. It should be noted that some of the examples considered in the book give the reader an opportunity to become more familiar with standard results in probability and stochastic processes and to develop a better understanding of the subject. However, there exist some examples which are more difficult and their mastering requires a considerable amount of additional work.
Content and structure of the book. The present book includes a relatively large number of counterexamples. Their choice was not easy. We have tried to include a variety of counterexamples concerning different topics in probability theory and stochastic processes. Though we have avoided trivial examples, we have nonetheless included some which cover elementary matters. Pathological examples have been completely avoided. The examples which are most useful and interesting fall in between these two categories. The material of the book is divided into 4 chapters and 25 sections. Each section begins with short introductory notes giving basic definitions and main results. Then we present the counterexamples related to the main results, the motivation for questions and the counter-statements. Some notions and results are given and analysed in the counterexamples themselves. All counterexamples are named and numbered for the convenience of readers. The counterexamples range over various degrees of difficulty. Some are elementary and well-known counterexamples and can be classified as part of probabilistic folklore. Also the style of presentation needs to vary. Some of the counterexamples are only briefly described to economize on space and to provide the reader with a chance for independent work. Readers of the book are assumed to be familiar with the basic notions and results in probability theory and stochastic processes. Some references are given to textbooks and lecture notes which provide the necessary background to the subject. At the end of the book, Supplementary Remarks are included providing references and some additional explanations for the majority of the counterexamples. For most of the examples we have given at least one relevant early reference. Many of the counterexamples originate from individual probabilists and statisticians and we have cited them fully.
Other sources are also indicated where the reader can find new counterexamples, ideas for such examples or some questions whose answers would lead to interesting and useful counterexamples. The Supplementary Remarks give readers the opportunity for further work.
Note about references. References Dudley (1972) and (Dudley 1976) indicate a paper or book published by Dudley in 1972 or 1976 respectively. For convenience we have devised abbreviated names for the principal journals in the field of probability theory, stochastic processes and mathematical statistics. In all other cases standard international abbreviations are used.
History of the book. The book is a result of 16 years of my study in the field of probability theory and stochastic processes. I started to collect counterexamples in 1970 when I was a student at Moscow University and later it became an intriguing preoccupation. As a result I increased the number of counterexamples to 500 or so. Many of the counterexamples or different versions of them belong to other authors. Some new and fresh counterexamples were created by colleagues and friends especially for this book. During the preparation of the book I have been guided by my own experience in lecturing on these topics in several European and Canadian universities and in giving special seminars in recent years for students of Sofia University. The international character of the book is obvious. It is not only my opinion that the present book is an example, not a counterexample, of a successful collaboration and friendship among mathematicians from different countries.
Acknowledgements. The selection and presentation of the material in the book, aimed at covering the wide field of probability theory and stochastic processes, has not been an easy task. I was grateful for the opportunity to discuss the project with my many colleagues and friends whose advice and valuable suggestions were extremely helpful. I wish to express my thanks to all of them. My special thanks are addressed to my teachers Prof. B. V. Gnedenko, Prof. Yu. V. Prohorov and Prof. A. N. Shiryaev for their attention, general and specific suggestions and encouragement. Among colleagues and friends I have to mention N. V. Krylov, R. Sh. Liptser, A. A. Novikov, Yu. M. Kabanov, S. E. Kuznetsov, A. M. Zubkov, O. B. Enchev and S. D. Gaidov with whom I had very useful discussions on several concrete topics. My thanks are directed to all colleagues who were so kind as to send me their specific suggestions. The names of these colleagues are included in the list of references. I use the opportunity to express my special gratitude to Prof. A. T. Fomenko for providing five of his extraordinary drawings especially for this book. I wish to thank Prof. D. G. Kendall for his interest in my work and for his constructive suggestions and encouragement. The comments of the anonymous referees and the editor helped me to improve both the content and the style of the presentation. I express my appreciation to them. Finally I should like to thank the collaborators of John Wiley & Sons (Chichester) for their patience and for their precise and excellent work. It is my pleasure to mention the names of Charlotte Farmer and Ian McIntosh.
Suggestions and comments from readers are most welcome and will, if appropriate, be reflected in any subsequent editions of the book.
JUNE 1986, SOFIA
JORDAN STOYANOV
Part 1
Classes of Random Events and Probabilities
Courtesy of Professor A. T. Fomenko of Moscow
SECTION 1.
CLASSES OF RANDOM EVENTS
Let $\Omega$ be an arbitrary non-empty set. Its elements, denoted by $\omega$, will be interpreted as outcomes (results) of some experiment. As usual, we use $A \cup B$ and $A \cap B$ (as well as $AB$) to represent the union and the intersection of any two subsets $A$ and $B$ of $\Omega$ respectively. Also, $A^c$ is the complement of $A \subset \Omega$. In particular, $\Omega^c = \emptyset$ where $\emptyset$ is the empty set. The class $\mathcal{A}$ of subsets of $\Omega$ is called a field if it contains $\Omega$ and is closed under the formation of complements and finite unions, that is if:
(a) $\Omega \in \mathcal{A}$;
(b) $A \in \mathcal{A} \Rightarrow A^c \in \mathcal{A}$;
(c) $A_1, A_2 \in \mathcal{A} \Rightarrow A_1 \cup A_2 \in \mathcal{A}$.
Taking into account the so-called de Morgan laws, $(A_1 A_2)^c = A_1^c \cup A_2^c$ and $(A_1 \cup A_2)^c = A_1^c A_2^c$, we easily see that (c) can be replaced by the condition
(c') $A_1, A_2 \in \mathcal{A} \Rightarrow A_1 A_2 \in \mathcal{A}$.
Thus $\mathcal{A}$ is closed under finite intersections. The class $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-field if it is a field and if it is closed under the formation of countable unions, that is if:
(d) $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.
Again, as above, condition (d) can be replaced by
(d') $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$
and clearly the $\sigma$-field $\mathcal{F}$ is closed under countable intersections. Recall that the elements of any field or $\sigma$-field are called random events (or simply, events). Other classes of events, such as the semi-field, D-system, and product of $\sigma$-fields, will be defined and compared with one another in the examples below. Any textbook on probability theory contains a detailed presentation of all these basic ideas (see Kolmogorov 1956; Breiman 1968; Gihman and Skorohod 1974/1979; Chung 1974; Neveu 1965; Chow and Teicher 1978; Billingsley 1995; Shiryaev 1995). The examples given in this section concern some of the properties of different classes of random events and examine the relationship between notions which seem to be close to one another.
1.1. A class of events which is a field but not a $\sigma$-field
Let $\Omega = [0, \infty)$ and $\mathcal{F}_1$ be the class of all intervals of the type $[a, b)$ or $[a, \infty)$ where $0 \le a < b < \infty$. Denote by $\mathcal{F}_2$ the class of all finite sums of intervals of $\mathcal{F}_1$. Then $\mathcal{F}_1$ is not a field, and $\mathcal{F}_2$ is a field but not a $\sigma$-field. Take arbitrary numbers $a$ and $b$, $0 < a < b < \infty$. Then $A = [a, b) \in \mathcal{F}_1$. However, $A^c = [0, a) \cup [b, \infty) \notin \mathcal{F}_1$ and thus $\mathcal{F}_1$ is not a field.
It is easy to see that: (i) the finite union of finite sums of intervals (of $\mathcal{F}_1$) is again a sum of intervals; (ii) the complement of a finite sum of intervals is also a sum of intervals. This means that $\mathcal{F}_2$ is a field. However, $\mathcal{F}_2$ is not a $\sigma$-field because, for example, the set $A_n = [0, 1/n) \in \mathcal{F}_1$ for each $n = 1, 2, \ldots$, and the intersection $\bigcap_{n=1}^{\infty} A_n = \{0\}$ does not belong to $\mathcal{F}_2$. Let us look at two additional cases.
(a1) Let $\Omega = \mathbb{R}^1$ and $\mathcal{F}$ be the class of all finite sums of intervals of the type $(-\infty, a]$, $(b, c]$ and $(d, \infty)$. Then $\mathcal{F}$ is a field. But the intersection $\bigcap_{n=1}^{\infty} (b - 1/n, c]$ is equal to $[b, c]$ which does not belong to $\mathcal{F}$. Hence the field $\mathcal{F}$ is not a $\sigma$-field.
(a2) Let $\Omega$ be any infinite set and $\mathcal{A}$ the collection of all subsets $A \subset \Omega$ such that either $A$ or its complement $A^c$ is finite. Then it is easy to see that $\mathcal{A}$ is a field but not a $\sigma$-field.
1.2. A class of events can be closed under finite unions and finite intersections but not under complements
Let $\Omega = \mathbb{R}^1$ and the class $\mathcal{A}$ consist of intervals of the type $(x, \infty)$, $x \in \Omega$. Then, using the notations $u = x \wedge y := \min\{x, y\}$ and $v = x \vee y := \max\{x, y\}$, we have:
$(x, \infty) \cup (y, \infty) = (u, \infty) \in \mathcal{A}$,
$(x, \infty) \cap (y, \infty) = (v, \infty) \in \mathcal{A}$.
However, $(x, \infty)^c = (-\infty, x] \notin \mathcal{A}$.
1.3. A class of events which is a semi-field but not a field
Let $\Omega$ be an arbitrary set. A non-empty class $\mathcal{F}$ of subsets of $\Omega$ is called a semi-field if $\Omega \in \mathcal{F}$, $\emptyset \in \mathcal{F}$, $\mathcal{F}$ is closed under the formation of finite intersections, and the complement of any set in $\mathcal{F}$ is a finite sum of disjoint sets of $\mathcal{F}$. It is easy to see that any field of subsets of $\Omega$ is also a semi-field. However, the following simple examples show that the converse is not true.
(a) Let $\Omega = [-\infty, \infty)$ and $\mathcal{F}_1$ contain $\Omega$, $\emptyset$ and all intervals of the type $[a, b)$ where $-\infty < a \le b \le \infty$. Then $\emptyset \in \mathcal{F}_1$, $\Omega \in \mathcal{F}_1$, $[a_1, b_1) \cap [a_2, b_2) = [a_1 \vee a_2, b_1 \wedge b_2) \in \mathcal{F}_1$ and $[a, b)^c = [-\infty, a) \cup [b, \infty)$. So $\mathcal{F}_1$ is a semi-field. Obviously $\mathcal{F}_1$ is not a field.
(a2) Take $\Omega = \mathbb{R}^1$ and denote by $\mathcal{F}_2$ the class of all subsets of the form $AB$ ($= A \cap B$) where $A$ is a closed and $B$ is an open set in $\Omega$. Then again, $\mathcal{F}_2$ is a semi-field but not a field.
1.4. A $\sigma$-field of subsets of $\Omega$ need not contain all subsets of $\Omega$
Recall that the set $A \subset \Omega$ is called a co-finite set if its complement $A^c$ is finite. Let $\mathcal{F}_1$ consist of the finite and co-finite subsets of $\Omega$. Then $\mathcal{F}_1$ is a field. It is a $\sigma$-field iff $\Omega$ is finite.
Further, the set $A \subset \Omega$ is called a co-countable set if $A^c$ is countable. Let $\mathcal{F}_2$ consist of the countable and the co-countable subsets of $\Omega$. Then it is easy to check that $\mathcal{F}_2$ is a $\sigma$-field. Suppose now that $\Omega$ is uncountable. Then $\Omega$ contains a subset $A$ such that $A$ and $A^c$ are both uncountable. This shows that in general a $\sigma$-field of subsets of $\Omega$ need not contain all subsets of $\Omega$ and need not be closed under the formation of arbitrary uncountable unions.
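1.5. Every $\sigma$-field of events is a D-system, but the converse does not always hold
A system $\mathcal{D}$ of subsets of a given set $\Omega$ is called a D-system (Dynkin system) in $\Omega$ if the following three conditions hold: (i) $\Omega \in \mathcal{D}$; (ii) $A, B \in \mathcal{D}$ and $A \subset B \Rightarrow B \setminus A \in \mathcal{D}$; (iii) $A_n \in \mathcal{D}$, $n = 1, 2, \ldots$ and $A_1 \subset A_2 \subset \cdots \Rightarrow \bigcup_{n=1}^{\infty} A_n \in \mathcal{D}$. It is obvious that every $\sigma$-field is a D-system, but the converse may not be true, as can be seen in the following example. Take $\Omega = \{\omega_1, \omega_2, \ldots, \omega_{2n}\}$, $n \in \mathbb{N}$. Denote by $\mathcal{D}_e$ the collection of all subsets $D \subset \Omega$ consisting of an even number of elements. Conditions (i), (ii) and (iii) above are satisfied, and hence $\mathcal{D}_e$ is a D-system. However, if $n > 1$ and we take $A = \{\omega_1, \omega_2\}$ and $B = \{\omega_2, \omega_3\}$, we see that $A \in \mathcal{D}_e$, $B \in \mathcal{D}_e$ and $AB = \{\omega_2\} \notin \mathcal{D}_e$. Hence $\mathcal{D}_e$ is not closed under intersections, so it is not a field and therefore not a $\sigma$-field.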
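The even-cardinality example of 1.5 is finite, so it can be checked by brute force. The following is only a sketch under stated assumptions: the choice $n = 2$, the encoding of $\omega_i$ as the integer `i`, and all names (`subsets`, `d_e`, `cond_i`, ...) are illustrative and not part of the text.

```python
from itertools import combinations

# Hypothetical small instance: Omega = {1, ..., 2n} with n = 2,
# where the integer i stands for the outcome omega_i.
n = 2
omega = frozenset(range(1, 2 * n + 1))

def subsets(s):
    """Return all subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# D_e: all subsets with an even number of elements.
d_e = {s for s in subsets(omega) if len(s) % 2 == 0}

# (i) Omega belongs to the class.
cond_i = omega in d_e
# (ii) A, B in D_e and A a subset of B imply B \ A in D_e.
cond_ii = all((b - a) in d_e for a in d_e for b in d_e if a <= b)
# (iii) On a finite set every increasing sequence is eventually constant,
# so its union is its largest member; checking nested pairs is enough.
cond_iii = all((a | b) in d_e for a in d_e for b in d_e if a <= b)

# Closure under intersection fails, so D_e is not a field.
closed_under_intersection = all((x & y) in d_e for x in d_e for y in d_e)
a, b = frozenset({1, 2}), frozenset({2, 3})

print(cond_i, cond_ii, cond_iii, closed_under_intersection, sorted(a & b))
```

The pair `a`, `b` mirrors the sets $A$ and $B$ used in the example: both have two elements, while their intersection has one.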
1.6. Sets which are not events in the product $\sigma$-field
Given two arbitrary sets $\Omega_1$ and $\Omega_2$, we denote their product by $\Omega_1 \times \Omega_2$: $\Omega_1 \times \Omega_2 := \{(\omega_1, \omega_2) : \omega_1 \in \Omega_1, \omega_2 \in \Omega_2\}$. For any set $A \subset \Omega_1 \times \Omega_2$ we denote by $A_{\omega_1}$ the section of $A$ at $\omega_1$: $A_{\omega_1} = \{\omega_2 \in \Omega_2 : (\omega_1, \omega_2) \in A\}$. Analogously, $A_{\omega_2} = \{\omega_1 \in \Omega_1 : (\omega_1, \omega_2) \in A\}$. A rectangle in $\Omega_1 \times \Omega_2$ is a subset of the form
$A_1 \times A_2 = \{(\omega_1, \omega_2) : \omega_1 \in A_1, \omega_2 \in A_2\}$, $A_1 \subset \Omega_1$, $A_2 \subset \Omega_2$.
$A_1 \times A_2$ is called a measurable rectangle (with respect to $\mathcal{F}_1$ and $\mathcal{F}_2$) if $A_1 \in \mathcal{F}_1$ and $A_2 \in \mathcal{F}_2$ where $\mathcal{F}_1$ and $\mathcal{F}_2$ are $\sigma$-fields of subsets of $\Omega_1$ and $\Omega_2$ respectively. The measurable rectangles form a semi-field of subsets in $\Omega_1 \times \Omega_2$. Thus the field generated by the measurable rectangles consists of all finite sums of disjoint measurable rectangles. The $\sigma$-field generated by this field is denoted by $\mathcal{F}_1 \times \mathcal{F}_2$ and is called the product $\sigma$-field of $\mathcal{F}_1$ and $\mathcal{F}_2$. Let us note the following result (see Neveu 1965; Kingman and Taylor 1966). For every measurable set $A$ in $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$ and every fixed $\omega_1 \in \Omega_1$ and $\omega_2 \in \Omega_2$, the sections $A_{\omega_1}$ and $A_{\omega_2}$ are measurable sets in $(\Omega_2, \mathcal{F}_2)$ and $(\Omega_1, \mathcal{F}_1)$ respectively. However, the converse is not true. To see this, let $\Omega$ be any uncountable set and $\mathcal{F}$ the smallest $\sigma$-field of subsets of $\Omega$ containing all one-point sets. Then the diagonal $D = \{(\omega, \omega) : \omega \in \Omega\}$ of $\Omega \times \Omega$ does not belong to the product $\sigma$-field $\mathcal{F} \times \mathcal{F}$, although all its sections belong to $\mathcal{F}$. In other words, for each $\omega \in \Omega$, the section $D_\omega \in \mathcal{F}$ and is an event, but $D \notin \mathcal{F} \times \mathcal{F}$ and is not an event.
1.7. The union of a sequence of $\sigma$-fields need not be a $\sigma$-field
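Let $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be a sequence of $\sigma$-fields of subsets of the set $\Omega$. Then their intersection $\bigcap_{n=1}^{\infty} \mathcal{F}_n$ is always a $\sigma$-field and it is natural to ask whether the union $\bigcup_{n=1}^{\infty} \mathcal{F}_n$ is a $\sigma$-field. We shall now show that the answer to this question is negative. Consider the set $\Omega = \{\omega_1, \omega_2, \omega_3\}$ and the following two classes of its subsets: $\mathcal{F}_1 = \{\emptyset, \{\omega_1\}, \{\omega_2, \omega_3\}, \Omega\}$, $\mathcal{F}_2 = \{\emptyset, \{\omega_2\}, \{\omega_1, \omega_3\}, \Omega\}$. Then $\mathcal{F}_1$ and $\mathcal{F}_2$ are fields and hence $\sigma$-fields. Obviously the intersection $\mathcal{F}_1 \cap \mathcal{F}_2 = \{\emptyset, \Omega\}$, the trivial $\sigma$-field. However, the union
$\mathcal{F} = \mathcal{F}_1 \cup \mathcal{F}_2 = \{\emptyset, \{\omega_1\}, \{\omega_2\}, \{\omega_2, \omega_3\}, \{\omega_1, \omega_3\}, \Omega\}$
is not a field, and hence not a $\sigma$-field, because the element $\{\omega_1\} \cup \{\omega_2\} = \{\omega_1, \omega_2\} \notin \mathcal{F}$.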
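The three-point example of 1.7 can be verified mechanically. The sketch below is illustrative only: the integers 1, 2, 3 stand for $\omega_1, \omega_2, \omega_3$, and the helper names `fs` and `is_field` are assumptions introduced here. On a finite space a field is automatically a $\sigma$-field, so only the field axioms are tested.

```python
omega = frozenset({1, 2, 3})  # 1, 2, 3 stand for omega_1, omega_2, omega_3

def fs(*xs):
    """Shorthand for building a frozenset from its elements."""
    return frozenset(xs)

f1 = {fs(), fs(1), fs(2, 3), omega}
f2 = {fs(), fs(2), fs(1, 3), omega}

def is_field(cls, space):
    """Check the field axioms: contains the space, closed under
    complements and under finite (pairwise) unions."""
    return (space in cls
            and all((space - a) in cls for a in cls)
            and all((a | b) in cls for a in cls for b in cls))

union = f1 | f2          # union of the two classes of sets
intersection = f1 & f2   # intersection of the two classes of sets

print(is_field(f1, omega), is_field(f2, omega))
print(is_field(intersection, omega), is_field(union, omega))
print(fs(1, 2) in union)
```

The last line looks for the set $\{\omega_1, \omega_2\}$, which is the union of two members of the combined class but is not itself a member.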
SECTION 2.
PROBABILITIES
Let $\Omega$ be any set and $\mathcal{A}$ be a field of its subsets. We say that $P$ is a probability on the measurable space $(\Omega, \mathcal{A})$ if $P$ is defined for all events $A \in \mathcal{A}$ and satisfies the following axioms.
(a) $P(A) \ge 0$ for each $A \in \mathcal{A}$; $P(\Omega) = 1$.
(b) $P$ is finitely additive. That is, for any finite number of pairwise disjoint events $A_1, \ldots, A_n \in \mathcal{A}$ we have
$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$.
(c) $P$ is continuous at $\emptyset$. That is, for any events $A_1, A_2, \ldots \in \mathcal{A}$ such that $A_{n+1} \subset A_n$ and $\bigcap_{n=1}^{\infty} A_n = \emptyset$, it is true that
$\lim_{n \to \infty} P(A_n) = 0$.
Note that conditions (b) and (c) are equivalent to the next one (d).
(d) $P$ is $\sigma$-additive (countably additive), that is
$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$
for any events $A_1, A_2, \ldots \in \mathcal{A}$ which are pairwise disjoint.
According to the Caratheodory theorem (see Kolmogorov 1956; Loeve 1977; Shiryaev 1995), if $P_0$ is a $\sigma$-additive probability on $(\Omega, \mathcal{A})$ and $\mathcal{F} = \sigma(\mathcal{A})$ denotes the smallest $\sigma$-field generated by the field $\mathcal{A}$, then there is a unique probability measure $P$ on $(\Omega, \mathcal{F})$ which is an extension of $P_0$ in the sense that $P(A) = P_0(A)$ for $A \in \mathcal{A}$. In this case we also say that $P_0$ is a restriction of $P$ over $\mathcal{A}$ and write $P|\mathcal{A} = P_0$. The ordered triplet $(\Omega, \mathcal{F}, P)$ is called a probability space if:
$\Omega$ is any set of points called elementary events (outcomes);
$\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$; the elements of $\mathcal{F}$ are events;
$P$ is a probability on $\mathcal{F}$, that is $P$ satisfies conditions (a), (b) and (c) above, or, equivalently, (a) and (d).
Thus we have described the axiomatic system which is generally accepted in probability theory. This system was suggested by A. N. Kolmogorov in 1933 (see Kolmogorov 1956). In this section we present a few examples characterizing some of the properties of probability measures. The important notion of conditional probability is introduced and treated in Example 2.4.
2.1. A probability measure which is additive but not $\sigma$-additive
Let $\Omega$ be the set of all rational numbers $r$ of the unit interval $[0, 1]$ and $\mathcal{F}_1$ the class of the subsets of $\Omega$ of the form $[a, b]$, $(a, b]$, $(a, b)$ or $[a, b)$ where $a$ and $b$ are rational numbers. Denote by $\mathcal{F}_2$ the class of all finite sums of disjoint sets of $\mathcal{F}_1$. Then $\mathcal{F}_2$ is a field. Let us define the probability measure $P$ as follows:
$P(A) = b - a$, if $A \in \mathcal{F}_1$,
$P(B) = \sum_{i=1}^{n} P(A_i)$, if $B \in \mathcal{F}_2$, that is $B = \sum_{i=1}^{n} A_i$, $A_i \in \mathcal{F}_1$.
Consider two disjoint sets of $\mathcal{F}_2$, say
$B = \sum_{i=1}^{n} A_i$ and $B' = \sum_{j=1}^{m} A'_j$
where $A_i, A'_j \in \mathcal{F}_1$ and all $A_i, A'_j$ are disjoint. Then $B + B' = \sum_{k=1}^{n+m} C_k$ where either $C_k = A_i$ for some $i = 1, \ldots, n$, or $C_k = A'_j$ for some $j = 1, \ldots, m$. Moreover,
$P(B + B') = P\left(\sum_k C_k\right) = \sum_k P(C_k) = \sum_i P(A_i) + \sum_j P(A'_j) = P(B) + P(B')$
and hence $P$ is an additive measure. Obviously every one-point set $\{r\} \in \mathcal{F}_2$ and $P(\{r\}) = 0$. Since $\Omega$ is a countable set and $\Omega = \sum_{i=1}^{\infty} \{r_i\}$, we get
$P(\Omega) = 1 \ne 0 = \sum_{i=1}^{\infty} P(\{r_i\})$.
This contradiction shows that $P$ is not $\sigma$-additive.
2.2. The coincidence of two probability measures on a given class does not always imply their coincidence on the $\sigma$-field generated by this class
Let $\Omega$ be a set and $\mathcal{C}$ a class of events such that $A, B \in \mathcal{C} \Rightarrow AB \in \mathcal{C}$ (that is, $\mathcal{C}$ is closed under intersection). Denote by $\mathcal{F} = \sigma(\mathcal{C})$ the $\sigma$-field generated by $\mathcal{C}$. Let $Q_1$ and $Q_2$ be two probabilities on the measurable space $(\Omega, \mathcal{F})$. The following result is well known (see Breiman 1968): if $Q_1(A) = Q_2(A)$ for all $A \in \mathcal{C}$, then $Q_1(A) = Q_2(A)$ for all $A \in \mathcal{F}$.
It is not surprising that results of this kind depend essentially on the structure of the class $\mathcal{C}$. We show by an example the importance of the hypothesis that $\mathcal{C}$ is closed under intersection. Take $\Omega = \{a, b, c, d\}$ and two measures $Q_1$ and $Q_2$ defined as follows:
$Q_1(a) = Q_1(d) = Q_2(b) = Q_2(c) = \frac{1}{6}$, $Q_1(b) = Q_1(c) = Q_2(a) = Q_2(d) = \frac{1}{3}$.
Let $\mathcal{F}$ be the class of all subsets of $\Omega$ and $\mathcal{C} = \{a \cup b, d \cup c, a \cup c, b \cup d\}$. Here and below $x \cup y$ denotes the two-element set $\{x, y\}$. Then it is easy to check that $Q_1 = Q_2$ on $\mathcal{C}$. For example,
$Q_1(d \cup c) = Q_1(d) + Q_1(c) = \frac{1}{6} + \frac{1}{3} = \frac{1}{2}$, $Q_2(d \cup c) = Q_2(d) + Q_2(c) = \frac{1}{3} + \frac{1}{6} = \frac{1}{2}$
and thus $Q_1(d \cup c) = Q_2(d \cup c)$. Analogously, $Q_1(\cdot) = Q_2(\cdot)$ for all remaining elements of $\mathcal{C}$. However, it is evident from the definition of $Q_1$ and $Q_2$ that the equality $Q_1(\cdot) = Q_2(\cdot)$ does not hold for all elements of $\mathcal{F}$; for example, it is false for each of $a$, $b$, $c$ and $d$. The reason for this is that $\mathcal{C}$, as taken, is not closed under intersection.
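The four-point example of 2.2 lends itself to an exact check with rational arithmetic. This is a sketch only: it assumes the point masses $\frac{1}{6}$ and $\frac{1}{3}$ (the values consistent with the sum $\frac{1}{3} + \frac{1}{6} = \frac{1}{2}$ in the worked computation), and the names `q1`, `q2`, `measure` and `cls` are illustrative.

```python
from fractions import Fraction as F

# Assumed point masses of the two measures on Omega = {a, b, c, d}.
q1 = {"a": F(1, 6), "b": F(1, 3), "c": F(1, 3), "d": F(1, 6)}
q2 = {"a": F(1, 3), "b": F(1, 6), "c": F(1, 6), "d": F(1, 3)}

def measure(q, event):
    """Measure of an event (a set of points) under the point masses q."""
    return sum((q[x] for x in event), F(0))

# The class C of two-point sets on which the measures are compared.
cls = [frozenset("ab"), frozenset("dc"), frozenset("ac"), frozenset("bd")]

agree_on_class = all(measure(q1, e) == measure(q2, e) for e in cls)
agree_on_points = [measure(q1, {x}) == measure(q2, {x}) for x in "abcd"]
closed_under_intersection = all((e & f) in cls for e in cls for f in cls)

print(agree_on_class, agree_on_points, closed_under_intersection)
```

The final flag records whether the class is closed under intersection, which is the hypothesis the example is designed to violate: for instance $\{a, b\} \cap \{a, c\} = \{a\}$ is not in the class.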
2.3. On the validity of the Kolmogorov extension theorem in $(\mathbb{R}^\infty, \mathcal{B}^\infty)$
Recall that the probability measures in the space $\mathbb{R}^n$, $n \ge 1$ are constructed in the following way: first for elementary sets (rectangles of the type $(a, b]$), then for sets $A = \sum (a_i, b_i]$, and finally, by using the Caratheodory theorem (see Loeve 1977; Shiryaev 1995), for sets in $\mathcal{B}^n$. A similar construction can be used for the space $(\mathbb{R}^\infty, \mathcal{B}^\infty)$. Let $C_n(B) = \{x \in \mathbb{R}^\infty : (x_1, \ldots, x_n) \in B\}$, $B \in \mathcal{B}^n$ denote a cylinder set in $\mathbb{R}^\infty$ with base $B \in \mathcal{B}^n$. It is natural to take the cylinder sets as elementary sets in $\mathbb{R}^\infty$ with their probabilities defined by the probability measure on the sets of $\mathcal{B}^\infty$. Suppose $P$ is a probability measure on $(\mathbb{R}^\infty, \mathcal{B}^\infty)$. For $n = 1, 2, \ldots$ we put
$P_n(B) = P(C_n(B))$, $B \in \mathcal{B}^n$.
Thus we obtain a sequence of probability measures $P_1, P_2, \ldots$ defined respectively on $(\mathbb{R}^1, \mathcal{B}^1), (\mathbb{R}^2, \mathcal{B}^2), \ldots$. For $n = 1, 2, \ldots$ and $B \in \mathcal{B}^n$ the following consistency (or compatibility) property holds:
(1) $P_{n+1}(B \times \mathbb{R}^1) = P_n(B)$.
We now formulate a fundamental result.
Kolmogorov theorem. Let $P_1, P_2, \ldots$ be a sequence of probability measures respectively on $(\mathbb{R}^1, \mathcal{B}^1), (\mathbb{R}^2, \mathcal{B}^2), \ldots$ satisfying the consistency property (1). Then there is a unique probability measure $P$ on $(\mathbb{R}^\infty, \mathcal{B}^\infty)$ such that its restriction on $\mathcal{B}^n$ coincides with $P_n$, that is, $P(C_n(B)) = P_n(B)$, $B \in \mathcal{B}^n$, $n = 1, 2, \ldots$.
The proof of this theorem can be found in many textbooks (see Kolmogorov 1956; Doob 1953; Loeve 1977; Neveu 1965; Feller 1971; Billingsley 1995; Shiryaev 1995). Let us note that it uses several specific properties of Euclidean spaces. However, this theorem may fail in general (without any hypotheses on the topological nature of measurable spaces and on the structure of the family of measures $\{P_n\}$). This is seen from the following example. Consider the space $\Omega = (0, 1]$. (Clearly $\Omega$ is not complete.) We shall construct a sequence of $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$ and a sequence of probability measures $\{P_n\}$ where $P_n$ is defined on $(\Omega, \mathcal{F}_n)$. Let $\mathcal{F} = \sigma(\bigcup \mathcal{F}_n)$ be the smallest $\sigma$-field containing all $\mathcal{F}_n$. Then we shall show that there is no probability measure $P$ on $(\Omega, \mathcal{F})$ such that its restriction $P|\mathcal{F}_n$ on $\mathcal{F}_n$ coincides with $P_n$, $n = 1, 2, \ldots$. For $n = 1, 2, \ldots$ define the function $h_n(\omega) = 1$ if $0 < \omega < 1/n$ and $h_n(\omega) = 0$ if $1/n \le \omega \le 1$. Let $\mathcal{C}_n = \{A \subset \Omega : A = \{\omega : h_n(\omega) \in B\}, B \in \mathcal{B}^1\}$ and $\mathcal{F}_n = \sigma\{\mathcal{C}_1, \ldots, \mathcal{C}_n\}$ be the smallest $\sigma$-field containing the sets $\mathcal{C}_1, \ldots, \mathcal{C}_n$. Clearly $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$. On the measurable space $(\Omega, \mathcal{F}_n)$ define a probability measure $P_n$ as follows:
$P_n\{\omega : (h_1(\omega), \ldots, h_n(\omega)) \in B_n\} = 1$ if $(1, \ldots, 1) \in B_n$, and $= 0$ otherwise,
where $B_n \in \mathcal{B}^n$. It is easy to see that the family $\{P_n\}$ is consistent: if $A \in \mathcal{F}_n$ then $P_{n+1}(A \times \mathbb{R}^1) = P_n(A)$. Suppose now that there exists a probability $P$ on the measurable space $(\Omega, \mathcal{F})$ such that $P|\mathcal{F}_n = P_n$. If so, then for $n = 1, 2, \ldots$
(2) $P\{\omega : h_1(\omega) = \cdots = h_n(\omega) = 1\} = P_n\{\omega : h_1(\omega) = \cdots = h_n(\omega) = 1\} = 1$.
However, $\{\omega : h_1(\omega) = \cdots = h_n(\omega) = 1\} = (0, 1/n) \downarrow \emptyset$, which contradicts (2) and the requirement for the set function $P$ to be $\sigma$-additive (or, equivalently, to be continuous at the 'zero' set $\emptyset$).
2.4. There may not exist a regular conditional probability with respect to a given $\sigma$-field
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{F}_1$ a $\sigma$-field such that $\mathcal{F}_1 \subset \mathcal{F}$. Recall that the conditional probability $P(A|\mathcal{F}_1)$ is defined $P$-a.s. as an $\mathcal{F}_1$-measurable function of $\omega$ such that
$P(AB) = \int_B P(A|\mathcal{F}_1)\, dP(\omega)$ for each $B \in \mathcal{F}_1$.
The conditional probability $P(A|\mathcal{F}_1)$, $A \in \mathcal{F}$ is said to be regular if there exists a function $P(A, \omega)$, $A \in \mathcal{F}$, $\omega \in \Omega$, which satisfies the following two properties: (i) $P(A, \omega) = P(A|\mathcal{F}_1)$ $P$-a.s. for an arbitrary $A \in \mathcal{F}$; (ii) for fixed $\omega$, $P(\cdot, \omega)$ is a probability measure on $\mathcal{F}$.
If condition (ii) is satisfied and condition (i) holds for all $\omega$ (not only for $P$-almost all $\omega$), then $P(A|\mathcal{F}_1)$ is called a proper regular conditional probability. (In terms of distributions we speak about regular and proper regular conditional distributions.) Regular conditional probabilities exist in many cases, but proper regular conditional probabilities do not always exist, as can be seen below. Let $(\Omega, \mathcal{F}, \lambda)$ be a probability space with $\Omega = [0, 1]$, $\mathcal{F}$ the $\sigma$-field of the Lebesgue measurable sets in $[0, 1]$ and $\lambda$ the Lebesgue measure. It is well known that in the interval $[0, 1]$ there is a non-measurable (in Lebesgue sense) set, say $N$, such that its outer measure is $\lambda^*(N) = 1$ and its inner measure is $\lambda_*(N) = 0$ (for details see Halmos 1974). Define a new $\sigma$-field $\tilde{\mathcal{F}}$ which is generated by $\mathcal{F}$ and the set $N$. Thus $\tilde{\mathcal{F}}$ consists of sets of the form $N B_1 \cup N^c B_2$ where $B_1, B_2 \in \mathcal{F}$. Define also the measure $P$ on the measurable space $([0, 1], \tilde{\mathcal{F}})$ by
$P(N B_1 \cup N^c B_2) = \frac{1}{2}[\lambda(B_1) + \lambda(B_2)]$.
It is easy to check that $P$ is well defined and defines a probability on $\tilde{\mathcal{F}}$, so the triplet $([0, 1], \tilde{\mathcal{F}}, P)$ is a probability space. For every $B \in \mathcal{F}$ we have
$P(N B \cup N^c B) = P(B) = \lambda(B)$
and hence $P$ coincides with $\lambda$ on $\mathcal{F}$, that is $P|\mathcal{F} = \lambda$. Moreover, $P(N) = \frac{1}{2}$.
Now we shall prove the following statement: on the probability space $([0, 1], \tilde{\mathcal{F}}, P)$ there is no regular conditional probability $P(A|\mathcal{F})$, $A \in \tilde{\mathcal{F}}$ with respect to the $\sigma$-field $\mathcal{F}$.
Suppose such a probability exists: that is, there is a function, say $P(A, \omega)$, which satisfies the above conditions (i) and (ii). If so, then for any Borel (and Lebesgue) set $A$, $P(A, \omega) = 1_A(\omega)$. Therefore if $A$ is a one-point set, $A = \{\omega\}$, then $P(\{\omega\}, \omega) = 1$. Now take the set $N$. From the definition of a conditional probability and the equality $P(N) = \frac{1}{2}$ we get
$\frac{1}{2} = P(N) = \int_\Omega P(N, \omega)\, \lambda(d\omega)$.
On the other hand, if $P(\cdot, \omega)$ is a measure for each $\omega$, then $P(N, \omega) \ge P(\{\omega\}, \omega) = 1$ for all $\omega \in N$ $\Rightarrow$ $P(N, \omega) = 1$ for all $\omega \in N$.
Consider the set $C = \{\omega : P(\{\omega\}, \omega) = 1\}$. Since $P(\cdot, \omega)$ is a Borel function in $\omega$, then the set $C$ is Borel measurable with $P(C) = 1$. Let $D = \{\omega : P(N, \omega) = 1\}$. It is clear that $D$ is Borel-measurable and $D \supset CN$, which implies that $D \cup C^c \supset N$. However, the set $D \cup C^c$ is Borel and covers the (non-measurable!) set $N$ which has $\lambda^*(N) = 1$. Therefore $P(D \cup C^c) = 1$ and $P(D) = 1$. In other words, for almost all $\omega$ we get $P(N, \omega) = 1$, which implies the following equality
$\int_\Omega P(N, \omega)\, \lambda(d\omega) = 1$.
However, this contradicts the relation $\int_\Omega P(N, \omega)\, \lambda(d\omega) = \frac{1}{2}$ obtained above. Therefore a regular conditional probability need not always exist. Let us note that in this counterexample the role of the non-measurable set $N$ is essential. Recall that the construction of $N$ relies on the axiom of choice. Using a weakened form of the axiom of choice, Solovay (1970) derived several interesting results concerning the measurability of sets, measures and their properties. General results on the existence of regular conditional probabilities can be found in the works of Pfanzagl (1969), Blackwell and Dubins (1975) and Faden (1985).
SECTION 3.
INDEPENDENCE OF RANDOM EVENTS
Let $(\Omega, \mathcal{F}, P)$ be a probability space. The events $A$ and $B$ of $\mathcal{F}$ are said to be independent (with respect to $P$) if
$P(AB) = P(A)P(B)$.
More generally, two classes of events (for example fields, $\sigma$-fields), say $\mathcal{A}_1$ and $\mathcal{A}_2$, $\mathcal{A}_1, \mathcal{A}_2 \subset \mathcal{F}$ are called independent if any two events $A_1$ and $A_2$ where $A_1 \in \mathcal{A}_1$, $A_2 \in \mathcal{A}_2$ are independent. The concept of independence of two events or two classes of events can be extended to any finite number of events or classes. We say that the events $A_1, \ldots, A_n \in \mathcal{F}$ are mutually independent if the following relation (product rule)
(1) $P(A_{i_1} A_{i_2} \cdots A_{i_k}) = P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_k})$
is satisfied for all $k$ and $i_1, i_2, \ldots, i_k$ where $k = 2, \ldots, n$ and $1 \le i_1 < i_2 < \cdots < i_k \le n$. Thus for the mutual independence of $n$ events all $2^n - n - 1$ relations (1) must be satisfied. If at least one relation does not hold, the events are dependent. If all the relations (1) fail to hold, we say that the events $A_1, \ldots, A_n$ are totally dependent. If the product rule (1) holds only for $k = 2$, the events are pairwise independent. Finally, if (1) holds for all $k$, $2 \le k \le m$ for some $m \le n$, we have a set of $n$ events which are $m$-wise independent (pairwise independent if $m = 2$ and mutually independent if $m = n$). When considering the independence/dependence properties of collections of random events it is natural to speak about the product rule (1) at level $k$, that is, that (1) holds or does not hold for any of the $\binom{n}{k}$ possible combinations ($k$-tuples) of the events. Thus we can characterize each level $k$, $k = 2, \ldots, n$, as being independent or dependent. Some interesting (and even unusual) possibilities will be illustrated in the examples below. It is obvious how to define the independence of a finite number of classes of events. If $A, B \in \mathcal{F}$ and $P(B) > 0$ we denote by $P(A|B)$ the conditional probability of $A$ given $B$ and put
$P(A|B) = P(AB)/P(B)$.
The independence of two events can easily be expressed through conditional probabilities. Another notion, that of conditional independence, is considered in one of the examples. The examples included in this section aim to help the reader understand the meaning of the fundamental notion of independence more clearly.
3.1. Random events with a different kind of dependence
In a Bernoulli scheme with a parameter $p$ we shall consider two events which, according to the value of $p$, are either independent or dependent. Let $H = \{\text{heads}\}$ and $T = \{\text{tails}\}$ be the outcomes at tossing a coin with $P(H) = p$, $P(T) = 1 - p$, $0 \le p \le 1$. Toss the coin three times independently and consider the events $A = \{\text{at most one tails}\}$ and $B = \{\text{all tosses are the same}\}$. Obviously $A = \{HHH, HHT, HTH, THH\}$, $B = \{HHH, TTT\}$. Hence
$P(A) = p^3 + 3p^2(1 - p)$, $P(B) = p^3 + (1 - p)^3$, $P(AB) = p^3$.
It is easy to see that the product rule
$P(AB) = P(A)P(B)$
holds in the trivial cases $p = 0$ and $p = 1$ and in the symmetric case $p = \frac{1}{2}$. Hence the events $A$ and $B$ are independent if $p = 0$, or $p = 1$, or $p = \frac{1}{2}$. For all other values of $p$ in the interval $[0, 1]$, $A$ and $B$ are dependent events.
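The claim of 3.1 can be explored numerically by enumerating the eight outcomes of three tosses with exact rational arithmetic. The following sketch is illustrative: the function name `probabilities` and the grid of test values $p = k/20$ are assumptions made here, and a grid check is of course not a proof for all $p$.

```python
from fractions import Fraction
from itertools import product

def probabilities(p):
    """Return (P(A), P(B), P(AB)) by enumerating three independent tosses
    of a coin with P(heads) = p."""
    pa = pb = pab = Fraction(0)
    for outcome in product("HT", repeat=3):
        heads = outcome.count("H")
        weight = p ** heads * (1 - p) ** (3 - heads)
        in_a = outcome.count("T") <= 1   # A: at most one tails
        in_b = len(set(outcome)) == 1    # B: all tosses are the same
        pa += weight * in_a
        pb += weight * in_b
        pab += weight * (in_a and in_b)
    return pa, pb, pab

# Values of p on a rational grid at which the product rule holds exactly.
grid = [Fraction(k, 20) for k in range(21)]
independent_at = []
for p in grid:
    pa, pb, pab = probabilities(p)
    if pab == pa * pb:
        independent_at.append(p)

print(independent_at)
```

For a full argument, dividing $P(AB) = P(A)P(B)$ by $p^2$ (for $p \ne 0$) leads to $2p^3 - 5p^2 + 4p - 1 = (p - 1)^2 (2p - 1) = 0$, which is consistent with the three special values named in the text.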
3.2. The pairwise independence of random events does not imply their mutual independence
It is natural to start with the first ever known examples showing the difference between the mutual and pairwise independence of random events. The two examples (i) and (ii) below, first presented by Bohlmann (1908) and Bernstein (1928), were created in a period of active studies in probability theory and its establishment as a rigorous branch of mathematics.
(i) (Bohlmann 1908). Suppose we have at our disposal 16 capsules with no difference between them. In each capsule we insert three small balls labelled a, b, c and each ball is either white or black. The capsules are put in an urn, mixed well, and we choose randomly one capsule. We open this capsule to see what is inside, that is what is the outcome of our experiment. We are interested in the property denoted by $(\sigma_1, \sigma_2, \sigma_3)$ where $\sigma_j = 1$ if a white ball is at position $j$ and $\sigma_j = 0$ if that ball is black, $j = 1, 2, 3$. The question is: what kind of dependence exists between $\sigma_1$, $\sigma_2$ and $\sigma_3$? Clearly, this original and illuminating description is equivalent to considering an urn with 16 capsules marked (inside) as follows: three capsules by 111; three capsules by 100; three capsules by 010; three capsules by 001, and each of the marks 110, 101, 011 and 000 is used just once among the remaining four capsules. We choose one capsule at random and consider the following events:
$A_j = \{\text{'1' at the } j\text{th position}\}$, $j = 1, 2, 3$
(equivalently $A_j = \{\sigma_j = 1\}$, $j = 1, 2, 3$). We easily find that $P(A_1) = \frac{1}{2}$, $P(A_2) = \frac{1}{2}$, $P(A_3) = \frac{1}{2}$ and then
$P(A_1 A_2) = \frac{1}{4}$, $P(A_1 A_3) = \frac{1}{4}$, $P(A_2 A_3) = \frac{1}{4}$
implying that the events $A_1, A_2, A_3$ are (at least) pairwise independent. However
$P(A_1 A_2 A_3) = \frac{3}{16} \ne \frac{1}{8} = P(A_1)P(A_2)P(A_3)$
and hence these events are not mutually independent.
(ii) (Bernstein 1928). Suppose a box contains four tickets labelled 112, 121, 211, 222. Choose one ticket at random and consider the events $A_1 = \{\text{1 occurs in the first place}\}$, $A_2 = \{\text{1 occurs in the second place}\}$ and $A_3 = \{\text{1 occurs in the third place}\}$. Obviously $P(A_1) = P(A_2) = P(A_3) = \frac{1}{2}$ and
$P(A_1 A_2) = P(A_1 A_3) = P(A_2 A_3) = \frac{1}{4}$.
This means that the three events $A_1, A_2, A_3$ are pairwise independent. However,
$P(A_1 A_2 A_3) = 0 \ne \frac{1}{8} = P(A_1)P(A_2)P(A_3)$
and hence these events are not mutually independent.
(iii) Consider the six permutations of the letters a, b, c as well as the triplets (a, a, a), (b, b, b) and (c, c, c). Let $\Omega$ consist of these nine triplets as points, and let each have probability $\frac{1}{9}$. Define the events $A_k = \{\text{the } k\text{th place is occupied by the letter a}\}$, $k = 1, 2, 3$. Then obviously
$P(A_1) = P(A_2) = P(A_3) = \frac{1}{3}$, $P(A_1 A_2) = P(A_1 A_3) = P(A_2 A_3) = \frac{1}{9}$
and hence the events $A_1, A_2, A_3$ are pairwise independent. However, they are not mutually independent, since $A_1 A_2 \subset A_3$, which implies that
$P(A_1 A_2 A_3) = \frac{1}{9} \ne \frac{1}{27}$.
The same idea can be generalized as follows. Let $\Omega$ contain $n! + n$ points, namely the $n!$ permutations of the symbols $a_1, \ldots, a_n$ and the $n$ repetitions of the same symbol $a_k$, $k = 1, \ldots, n$. Suppose that each of the permutations has probability $1/[n^2 (n - 2)!]$ while each of the repetitions has probability $1/n^2$. Then it is not difficult to check that the events $A_k = \{a_1 \text{ occurs at the } k\text{th place}\}$, $k = 1, \ldots, n$, are pairwise independent, but no three of them are mutually independent.
(iv) Let $A_1, A_2, A_3$ be independent events each of probability $\frac{1}{2}$ and put $A_{ij} = (A_i \,\triangle\, A_j)^c$ where $\triangle$ denotes the symmetric difference of two sets: $A_i \,\triangle\, A_j = A_i A_j^c + A_i^c A_j$ or, equivalently, $A_i \,\triangle\, A_j = (A_i \setminus A_j) \cup (A_j \setminus A_i)$. (In particular, we could consider the following simple experiment: three symmetric coins numbered 1, 2, 3 are tossed; then $A_i = \{\text{coin } i \text{ falls heads}\}$, $A_{ij} = \{\text{coins } i \text{ and } j \text{ agree}\}$.) Then the events $A_{12}, A_{13}, A_{23}$ are not mutually independent, though they are independent in pairs.
(v) Let $L$ be the set of all $n^3$ three-letter words $s$ of a language and all words are equally likely. Define the events $A$, $B$ and $C$ as follows:
$A = \{s \in L : s \text{ begins with a specified letter, say } x\}$, $B = \{s \in L : s \text{ has the letter } x \text{ in the middle}\}$, $C = \{s \in L : s \text{ has two of its letters the same}\}$.
Then $A$, $B$ and $C$ are pairwise but not mutually independent.
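Examples (i) to (iv) of 3.2 are finite and can be checked by enumeration. The sketch below is only illustrative: it assumes the capsule counts and ticket labels as read from the text, models (iv) with three fair coins, and uses hypothetical helper names (`prob`, `product_rule_at_level`, `place_events`). Example (v) is omitted because its outcome depends on how "two letters the same" is interpreted.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

def prob(space, event):
    """Probability of the outcomes for which the predicate `event` holds."""
    return sum((w for o, w in space.items() if event(o)), Fraction(0))

def product_rule_at_level(space, events, k):
    """True if the product rule holds for every k-tuple of the events."""
    for combo in combinations(events, k):
        joint = prob(space, lambda o: all(e(o) for e in combo))
        expected = Fraction(1)
        for e in combo:
            expected *= prob(space, e)
        if joint != expected:
            return False
    return True

def place_events(symbol):
    """Events 'symbol occurs at place j' for j = 1, 2, 3."""
    return [lambda o, j=j: o[j] == symbol for j in range(3)]

# (i) Bohlmann: 16 capsules with the marks and multiplicities from the text.
counts = {"111": 3, "100": 3, "010": 3, "001": 3,
          "110": 1, "101": 1, "011": 1, "000": 1}
bohlmann = {mark: Fraction(c, 16) for mark, c in counts.items()}

# (ii) Bernstein: four equally likely tickets.
bernstein = {t: Fraction(1, 4) for t in ("112", "121", "211", "222")}

# (iii) Six permutations of a, b, c plus aaa, bbb, ccc, each of probability 1/9.
triplets = ["".join(p) for p in permutations("abc")] + ["aaa", "bbb", "ccc"]
nine_points = {t: Fraction(1, 9) for t in triplets}

# (iv) Three fair coins; A_ij = {coins i and j agree}.
coins = {"".join(o): Fraction(1, 8) for o in product("HT", repeat=3)}
agree = [lambda o, i=i, j=j: o[i] == o[j] for i, j in combinations(range(3), 2)]

examples = {
    "bohlmann": (bohlmann, place_events("1")),
    "bernstein": (bernstein, place_events("1")),
    "nine_points": (nine_points, place_events("a")),
    "coins": (coins, agree),
}
# For each example: (product rule at level 2, product rule at level 3).
results = {name: (product_rule_at_level(s, ev, 2), product_rule_at_level(s, ev, 3))
           for name, (s, ev) in examples.items()}
print(results)
```

Each entry of `results` pairs the level-2 check with the level-3 check, so pairwise independence without mutual independence shows up as a true value followed by a false one.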
3.3. The relation P(ABC) = P(A)P(B)P(C) does not always imply the mutual independence of the events A, B, C
(i) Let two dice be tossed, $\Omega$ = all ordered pairs $ij$, $i, j = 1, \ldots, 6$ and each point of $\Omega$ has probability $1/36$. Consider the events:
$A = \{\text{first die} = 1, 2 \text{ or } 3\}$,
$B = \{\text{first die} = 3, 4 \text{ or } 5\}$,
$C = \{\text{the sum of the two faces is } 9\}$.
Obviously we have $AB = \{31, 32, 33, 34, 35, 36\}$, $AC = \{36\}$, $BC = \{36, 45, 54\}$, $ABC = \{36\}$. Then $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{2}$, $P(C) = \frac{1}{9}$ and
$P(ABC) = \frac{1}{36} = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{9} = P(A)P(B)P(C)$.
Nevertheless the events $A$, $B$, $C$ are not mutually independent, since
$P(AB) = \frac{1}{6} \ne \frac{1}{4} = P(A)P(B)$,
$P(AC) = \frac{1}{36} \ne \frac{1}{18} = P(A)P(C)$,
$P(BC) = \frac{1}{12} \ne \frac{1}{18} = P(B)P(C)$.
In other words, independence at level 3 does not imply independence at level 2.
(ii) Let $\Omega = \{1, 2, 3, 4, 5, 6, 7, 8\}$ where each outcome has probability $1/8$. Consider the events $B_1 = \{1, 2, 3, 4\}$, $B_2 = B_3 = \{1, 5, 6, 7\}$. Then $P(B_1) = P(B_2) = P(B_3) = \frac{1}{2}$, $B_1 B_2 B_3 = \{1\}$ and thus $P(B_1 B_2 B_3) = \frac{1}{8} = P(B_1)P(B_2)P(B_3)$. However, the events $B_2$ and $B_3$ are not independent and hence the three events are not mutually independent.
(iii) Let the space $\Omega$ be partitioned into five events, say $A_1, A_2, A_3, A_4, A_5$, such that $P(A_1) = P(A_2) = P(A_3) = 15/64$, $P(A_4) = 1/64$, $P(A_5) = 18/64$. Define three new events, namely $B = A_1 \cup A_4$, $C = A_2 \cup A_4$, $D = A_3 \cup A_4$. Then $P(B) = P(C) = P(D) = 1/4$, $P(BCD) = 1/64$: that is, $P(BCD) = P(B)P(C)P(D)$. However, $P(BC) = P(A_4) = 1/64 \ne 1/16 = P(B)P(C)$ and hence the events $B$, $C$, $D$ are not mutually independent.
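The three examples of 3.3 can be confirmed by direct enumeration. In this sketch events are represented as sets of outcomes; the helper names and the integer labels 1 to 5 for the partition cells $A_1, \ldots, A_5$ are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def prob(space, event):
    """Probability of an event given as a set of outcomes."""
    return sum((space[o] for o in event), Fraction(0))

def product_rule_at_level(space, events, k):
    """True if the product rule holds for every k-tuple of the events."""
    for combo in combinations(events, k):
        joint = frozenset.intersection(*combo)
        if prob(space, joint) != prod(prob(space, e) for e in combo):
            return False
    return True

# (i) Two dice, 36 equally likely ordered pairs.
dice = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
A = frozenset(o for o in dice if o[0] in (1, 2, 3))
B = frozenset(o for o in dice if o[0] in (3, 4, 5))
C = frozenset(o for o in dice if sum(o) == 9)

# (ii) Eight equally likely outcomes, with B2 = B3.
eight = {k: Fraction(1, 8) for k in range(1, 9)}
B1 = frozenset({1, 2, 3, 4})
B2 = B3 = frozenset({1, 5, 6, 7})

# (iii) Partition into five cells; cell k stands for A_k.
five = {1: Fraction(15, 64), 2: Fraction(15, 64), 3: Fraction(15, 64),
        4: Fraction(1, 64), 5: Fraction(18, 64)}
Bp, Cp, Dp = frozenset({1, 4}), frozenset({2, 4}), frozenset({3, 4})

cases = {"dice": (dice, [A, B, C]),
         "eight": (eight, [B1, B2, B3]),
         "five": (five, [Bp, Cp, Dp])}
# For each case: (product rule at level 3, product rule at level 2).
results = {name: (product_rule_at_level(s, ev, 3), product_rule_at_level(s, ev, 2))
           for name, (s, ev) in cases.items()}
print(results)
```

Here the order of the flags is reversed relative to the usual reading: the first records the triple product rule and the second the pairwise one, which is the combination this section is about.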
3.4. A collection of n + 1 dependent events such that any n of them are mutually independent
(i) A symmetric coin is tossed independently $n$ times. Consider the events $A_k = \{\text{heads at the } k\text{th tossing}\}$, for $k = 1, \ldots, n$, and $A_{n+1} = \{\text{the number of heads in these } n \text{ tossings is even}\}$. Then obviously
$$P(A_1) = \cdots = P(A_n) = \frac12, \qquad P(A_{n+1}) = \frac{1}{2^n}\left[\binom{n}{0} + \binom{n}{2} + \cdots\right] = \frac{1}{2^n}\, 2^{n-1} = \frac12.$$
It is easy to see that the conditional probability $P(A_{n+1} \mid A_1 \cdots A_n) = 1$ if $n$ is even, and $0$ if $n$ is odd. This implies that the equality
$$P(A_1 \cdots A_n A_{n+1}) = P(A_1) \cdots P(A_n) P(A_{n+1})$$
is impossible because the right-hand side is $2^{-(n+1)}$ and the left-hand side is $P(A_1 \cdots A_n A_{n+1}) = P(A_1 \cdots A_n) P(A_{n+1} \mid A_1 \cdots A_n) = 2^{-n}$ if $n$ is even, and $0$ if $n$ is odd. Therefore $A_1, \ldots, A_n, A_{n+1}$ cannot be a collection of mutually independent events.
Now take any $n$ of these events. If we have chosen $A_1, \ldots, A_n$, they are independent, since for any $A_{i_1}, \ldots, A_{i_k}$, $2 \le k \le n$, we have $P(A_{i_1} \cdots A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$. It remains to consider the choice of $n$ events including $A_{n+1}$ and $n - 1$ events taken from $A_1, \ldots, A_n$, for example $A_2, A_3, \ldots, A_n, A_{n+1}$. For their mutual independence it suffices to check that
$$P(A_{i_1} \cdots A_{i_m} A_{n+1}) = P(A_{i_1}) \cdots P(A_{i_m}) P(A_{n+1}) \tag{1}$$
where $1 \le m \le n - 1$ and $i_1, \ldots, i_m$ are among $2, \ldots, n$. We have $P(A_{i_1}) = \cdots = P(A_{i_m}) = P(A_{n+1}) = \frac12$ and thus the right-hand side of (1) is $2^{-(m+1)}$. Further,
$$P(A_{i_1} \cdots A_{i_m} A_{n+1}) = P(A_{i_1} \cdots A_{i_m}) P(A_{n+1} \mid A_{i_1} \cdots A_{i_m}) = 2^{-m} 2^{-1} = 2^{-(m+1)}.$$
Thus (1) is satisfied and therefore any $n$ events among the given $n + 1$ events are mutually independent. In other words, the dependent $n + 1$ events $A_1, \ldots, A_{n+1}$ are $n$-wise independent. We can conclude that if we have $n + 1$ events and any $n$ of them are mutually independent, this does not always imply that the given events are mutually independent. Clearly this is a generalization of the Bernstein example (see Example 3.2(ii)).
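For a small number of tosses the claim can be verified by brute force. The sketch below is an illustration added here (the helper names `parity_events`, `product_rule_holds` and `independent_at_level` are not from the text); it checks the product rule at every level for $n = 4$.

```python
from fractions import Fraction
from itertools import combinations, product

def parity_events(n):
    """Return (omega, events) for n fair tosses: events[k] for k < n is
    'heads at toss k+1'; events[n] is 'the number of heads is even'."""
    omega = list(product((0, 1), repeat=n))
    events = [frozenset(w for w in omega if w[k] == 1) for k in range(n)]
    events.append(frozenset(w for w in omega if sum(w) % 2 == 0))
    return omega, events

def product_rule_holds(omega, chosen):
    """True if P(intersection) equals the product of the probabilities."""
    inter = set(omega)
    rhs = Fraction(1)
    for e in chosen:
        inter &= e
        rhs *= Fraction(len(e), len(omega))
    return Fraction(len(inter), len(omega)) == rhs

def independent_at_level(omega, events, k):
    """True if the product rule holds for every choice of k of the events."""
    return all(product_rule_holds(omega, c) for c in combinations(events, k))

n = 4
omega, events = parity_events(n)
levels = {k: independent_at_level(omega, events, k) for k in range(2, n + 2)}
print(levels)
```

The product rule holding at levels $2, \ldots, n$ means that any $n$ of the events are mutually independent, while its failure at level $n + 1$ shows that the whole collection is dependent.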
(ii) We are given $n + 1$ points in the plane, say $M_1, \ldots, M_{n+1}$, which are in a general position (no three of them lie in a straight line). Join up the points in pairs and obtain $\binom{n+1}{2}$ segments. Then we put a pointer to each of the segments by tossing a symmetric coin $\binom{n+1}{2}$ times (that is, if we consider the segment $M_iM_j$ and the result of the tossing is heads, we put a pointer from $M_i$ to $M_j$; if tails, the pointer goes from $M_j$ to $M_i$). Consider $n + 1$ events $A_1, \ldots, A_{n+1}$, where
$$A_k = \{\text{the number of pointers going to } M_k \text{ is even}\}, \quad k = 1, \ldots, n + 1.$$
Then for each $k$, $2 \le k \le n$, and any $1 \le i_1 < i_2 < \cdots < i_k \le n + 1$, the events $A_{i_1}, A_{i_2}, \ldots, A_{i_k}$ are mutually independent. However $A_1, \ldots, A_{n+1}$ are dependent and so we have another collection of $n + 1$ dependent events which are $n$-wise independent.
3.5.
Collections of random events with 'unusual' independence/dependence properties
Let us describe a few probability models and collections of random events with specific properties.
(i) Suppose that the sample space of an experiment is $\Omega = \{1, 2, 3, 4, 5, 6, 7, 8\}$ with probabilities $p_k = P(\{k\})$ defined as follows:
$$p_1 = \alpha - \tfrac18, \quad p_2 = p_3 = p_4 = (7 - 16\alpha)/24, \quad p_5 = p_6 = p_7 = (1 + 8\alpha)/24, \quad p_8 = 1/8,$$
where $\alpha$ is an arbitrary number in the interval $(1/8, 7/16)$, so that all $p_k$ are positive and sum to 1. Consider the events
$$A_1 = \{2, 5, 6, 8\}, \quad A_2 = \{3, 5, 7, 8\}, \quad A_3 = \{4, 6, 7, 8\}.$$
We easily find that $P(A_1) = P(A_2) = P(A_3) = \frac12$ and then
$$P(A_1A_2) = P(A_1A_3) = P(A_2A_3) = \frac{1 + 2\alpha}{6}, \qquad P(A_1A_2A_3) = \frac18.$$
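Hence the events $A_1, A_2, A_3$ are independent at level 3 for any admissible value of $\alpha$. If $\alpha = 1/4$ they are also independent at level 2, and this is the only case when these three events are mutually independent.
(ii) Let $\Omega = \{1, 2, 3, 4, 5, 6\}$ with $p_1 = \frac{1}{16}$, $p_2 = p_3 = p_4 = p_5 = \frac{7}{48}$, $p_6 = \frac{17}{48}$. Consider the following events:
$$A_1 = \{1, 2, 3, 4\}, \quad A_2 = \{1, 2, 3, 5\}, \quad A_3 = \{1, 2, 4, 5\}, \quad A_4 = \{1, 3, 4, 5\}.$$
Then $P(A_1) = P(A_2) = P(A_3) = P(A_4) = \frac12$ and we find further that
$$P(A_1A_2A_3A_4) = \tfrac{1}{16} = P(A_1)P(A_2)P(A_3)P(A_4),$$
whereas $P(A_iA_j) = \frac{17}{48} \ne \frac14$ for all $i < j$ and $P(A_iA_jA_k) = \frac{5}{24} \ne \frac18$ for all $i < j < k$.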
Therefore these four events are independent at level 4 but they are dependent at level 2 and dependent at level 3.
(iii) Take a sample space $\Omega$ containing $|\Omega| = 16$ outcomes denoted by $1, 2, \ldots, 16$, each having the same probability $\frac{1}{16}$. Consider the events:
$$A = \{2, 3, 4, 5, 6, 9, 13, 16\}, \quad B = \{4, 7, 8, 10, 11, 13, 14, 16\},$$
$$C = \{4, 6, 7, 8, 10, 11, 13, 14\}, \quad D = \{3, 4, 5, 6, 9, 10, 15, 16\}.$$
Then $P(A) = P(B) = P(C) = P(D) = \frac12$ and since $ABCD = \{4\}$ we have
$$\tfrac{1}{16} = P(ABCD) = P(A)P(B)P(C)P(D)$$
and hence the product rule is satisfied at level 4. Further, $ABC = \{4, 13\}$, implying that $\frac18 = P(ABC) = P(A)P(B)P(C)$, and similarly the product rule holds for each of the remaining three triplets of events. It turns out, however, that the product rule does not hold for any of the $\binom{4}{2} = 6$ possible pairs of events. In particular, $CD = \{4, 6, 10\}$ and
$$P(CD) = \tfrac{3}{16} \ne \tfrac14 = P(C)P(D).$$
Thus the events $A$, $B$, $C$, $D$ are independent at level 4, independent at level 3 and (completely) dependent at level 2.
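The level-by-level behaviour in example (iii) is easy to confirm by enumeration. The sketch below is added for illustration and assumes the four events are the 8-point sets listed in the example, labelled here `A`, `B`, `C`, `D`; the helper `product_rule` is a name chosen here.

```python
from fractions import Fraction
from itertools import combinations

OMEGA_SIZE = 16  # outcomes 1, ..., 16, each with probability 1/16
events = {
    "A": {2, 3, 4, 5, 6, 9, 13, 16},
    "B": {4, 7, 8, 10, 11, 13, 14, 16},
    "C": {4, 6, 7, 8, 10, 11, 13, 14},
    "D": {3, 4, 5, 6, 9, 10, 15, 16},
}

def product_rule(names):
    """True if P(intersection) equals the product of probabilities for the named events."""
    inter = set.intersection(*(events[x] for x in names))
    rhs = Fraction(1)
    for x in names:
        rhs *= Fraction(len(events[x]), OMEGA_SIZE)
    return Fraction(len(inter), OMEGA_SIZE) == rhs

# For each level k, one boolean per choice of k events.
result = {k: [product_rule(c) for c in combinations("ABCD", k)] for k in (2, 3, 4)}
print(result)
```

Each list in `result` holds one entry per subset of the given size, so a list of all `False` at level 2 corresponds to complete dependence of the pairs.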
(iv) Suppose the space $\Omega$ consists of $|\Omega| = 12$ outcomes, say $1, 2, \ldots, 12$, with the following probabilities:
$$p_1 = \tfrac{1}{16}, \quad p_2 = p_3 = p_4 = p_5 = \tfrac{1}{24}, \quad p_6 = p_7 = p_8 = p_9 = p_{10} = p_{11} = \tfrac{5}{48}, \quad p_{12} = \tfrac{7}{48}.$$
Define the events $B_1, B_2, B_3, B_4$ as follows:
$$B_1 = \{1, 2, 3, 4, 6, 7, 8\}, \quad B_2 = \{1, 2, 3, 5, 6, 9, 10\},$$
$$B_3 = \{1, 2, 4, 5, 7, 9, 11\}, \quad B_4 = \{1, 3, 4, 5, 8, 10, 11\}.$$
Standard reasoning leads to the following conclusion: the events $B_1, B_2, B_3, B_4$ are independent at level 4, dependent at level 3 and independent at level 2. (The details are left to the reader.)
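The details left to the reader in example (iv) can be delegated to a short computation. The following sketch is an illustration added here; it assumes the probabilities $p_1 = 1/16$, $p_2 = \cdots = p_5 = 1/24$, $p_6 = \cdots = p_{11} = 5/48$, $p_{12} = 7/48$ and the four 7-point events as read from the example, and the names `p`, `B`, `prob`, `level_holds` are chosen here.

```python
from fractions import Fraction as F
from itertools import combinations

# Point probabilities on {1, ..., 12} (assumed reading of the example).
p = {1: F(1, 16)}
p.update({k: F(1, 24) for k in (2, 3, 4, 5)})
p.update({k: F(5, 48) for k in range(6, 12)})
p[12] = F(7, 48)

B = [
    {1, 2, 3, 4, 6, 7, 8},
    {1, 2, 3, 5, 6, 9, 10},
    {1, 2, 4, 5, 7, 9, 11},
    {1, 3, 4, 5, 8, 10, 11},
]

def prob(event):
    """Probability of a subset of {1, ..., 12}."""
    return sum((p[w] for w in event), F(0))

def level_holds(k):
    """True if the product rule holds for every choice of k of the four events."""
    for chosen in combinations(B, k):
        rhs = F(1)
        for e in chosen:
            rhs *= prob(e)
        if prob(set.intersection(*chosen)) != rhs:
            return False
    return True

print({k: level_holds(k) for k in (2, 3, 4)})
```

Under these assumptions every pair has a 4-point intersection of probability $12/48 = 1/4$, every triple has probability $5/48 \ne 1/8$, and the common point 1 has probability $1/16$.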
3.6.
Is there a relationship between conditional and mutual independence of random events?
The random events $A_1, A_2, \ldots, A_n$ are called conditionally independent given an event $B$ with $P(B) > 0$ if
$$P(A_1A_2 \cdots A_n \mid B) = P(A_1 \mid B)P(A_2 \mid B) \cdots P(A_n \mid B). \tag{1}$$
We want to examine the relationship between the two concepts, mutual independence and conditional independence.
(i) Suppose we have at our disposal two coins, say $a$ and $b$. Let $p_a$ and $p_b$, $p_a \ne p_b$, be the probabilities of heads for $a$ and $b$ respectively. Select a coin at random and toss it twice. Consider the events $A_1 = \{\text{heads at the first tossing}\}$, $A_2 = \{\text{heads at the second tossing}\}$ and $B = \{\text{coin } a \text{ is selected}\}$. Then $P(A_1A_2 \mid B) = p_a^2$, $P(A_1 \mid B) = p_a$, $P(A_2 \mid B) = p_a$. Hence $P(A_1A_2 \mid B) = P(A_1 \mid B)P(A_2 \mid B)$, and the events $A_1$ and $A_2$ are conditionally independent given $B$. However,
$$P(A_1A_2) = \tfrac12 p_a^2 + \tfrac12 p_b^2, \qquad P(A_1) = \tfrac12(p_a + p_b), \qquad P(A_2) = \tfrac12(p_a + p_b)$$
and since $p_a \ne p_b$ the equality $P(A_1A_2) = P(A_1)P(A_2)$ is not satisfied: the difference $P(A_1A_2) - P(A_1)P(A_2)$ equals $\frac14(p_a - p_b)^2 > 0$. Therefore the events $A_1$ and $A_2$ are not independent, despite their conditional independence.
(ii) A symmetric coin is tossed twice. Consider the events $A_k = \{\text{heads at the } k\text{th tossing}\}$, $k = 1, 2$, and $B = \{\text{at least one tails}\}$. Then $P(A_1) = P(A_2) = \frac12$, $P(A_1A_2) = \frac14$ and hence the events $A_1$ and $A_2$ are independent. Further, it is easy to see that $P(A_1 \mid B) = P(A_2 \mid B) = \frac13$. However $P(A_1A_2 \mid B) = 0$ and (1) fails to hold. Therefore the independent events $A_1$ and $A_2$ are not conditionally independent given $B$. The final conclusion is that there is no relationship between conditional independence and mutual independence, that is, neither one of these properties implies the other. (See also Example 7.14.)
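Both directions can be checked with exact arithmetic. The sketch below is an added illustration: the first part enumerates example (ii), and the second part evaluates example (i) for hypothetical head probabilities $p_a = 1/3$, $p_b = 3/4$, which are values chosen here and not taken from the text.

```python
from fractions import Fraction as F
from itertools import product

# Example (ii): two tosses of a fair coin, four equally likely outcomes.
omega = set(product("HT", repeat=2))

def prob(event):
    return F(len(event), len(omega))

def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(event & given) / prob(given)

A1 = {w for w in omega if w[0] == "H"}   # heads at the first toss
A2 = {w for w in omega if w[1] == "H"}   # heads at the second toss
B = {w for w in omega if "T" in w}       # at least one tails

unconditional = prob(A1 & A2) == prob(A1) * prob(A2)
conditional = cond(A1 & A2, B) == cond(A1, B) * cond(A2, B)

# Example (i): a coin is picked at random, then tossed twice (pa, pb are assumed values).
pa, pb = F(1, 3), F(3, 4)
p_both = F(1, 2) * pa**2 + F(1, 2) * pb**2   # P(A1 A2)
p_one = F(1, 2) * (pa + pb)                  # P(A1) = P(A2)
gap = p_both - p_one**2                      # P(A1 A2) - P(A1) P(A2)
print(unconditional, conditional, gap)
```

The quantity `gap` equals $(p_a - p_b)^2 / 4$, which vanishes only when the two coins have the same bias.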
3.7.
Independence type conditions which do not imply the mutual independence of a set of events
Suppose the random events $A_1, A_2, \ldots, A_n$ satisfy the conditions
$$P(A_k) = p_k, \qquad P(A_1A_2 \cdots A_k) = p_1p_2 \cdots p_k, \qquad k = 1, 2, \ldots, n \tag{1}$$
which could be called independence-type conditions. In (1) $p_1, \ldots, p_n$ are arbitrary numbers in the interval $(0, 1)$. Obviously, if $n = 2$ and (1) is satisfied, this is merely the definition of the independence of two random events $A_1$ and $A_2$. We ask the following question: does (1) imply, in the general case when $n \ge 3$, that the given events are mutually independent? Of course, it is clear that (1) is much weaker than the standard condition for mutual independence. Thus we can expect that the answer to this question is negative. Let us illustrate the truth of this with the following example, considering for simplicity the case $n = 3$. Suppose $A_1, A_2, A_3$ are random events such that
$$P(A_1A_2A_3) = P(A_1A_2A_3^c) = \tfrac18,$$
$$P(A_1A_2^cA_3) = P(A_1^cA_2A_3) = \tfrac18 - c, \qquad P(A_1A_2^cA_3^c) = P(A_1^cA_2A_3^c) = \tfrac18 + c,$$
$$P(A_1^cA_2^cA_3) = \tfrac18 + 2c, \qquad P(A_1^cA_2^cA_3^c) = \tfrac18 - 2c,$$
where $0 < c \le \frac{1}{16}$. We can easily check that
$$P(A_1) = P(A_2) = P(A_3) = \tfrac12, \qquad P(A_1A_2) = \tfrac14, \qquad P(A_1A_2A_3) = \tfrac18$$
and thus the conditions in (1) hold. For the mutual independence of $A_1, A_2, A_3$ the equalities $P(A_1A_3) = P(A_1)P(A_3)$ and $P(A_2A_3) = P(A_2)P(A_3)$ must also be satisfied. However,
$$P(A_1A_3) = P(A_2A_3) = \tfrac14 - c \ne \tfrac14.$$
Hence the independence-type conditions (1) are satisfied for the events $A_1, A_2, A_3$, but these events are not mutually independent.
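A quick numerical check of this example is sketched below. It is an added illustration that assumes one particular assignment of the eight atom probabilities $\frac18$, $\frac18 \pm c$, $\frac18 \pm 2c$ to the atoms of $A_1, A_2, A_3$ (the assignment written in the code, with 1 marking that an event occurs); the names `atom_probs` and `prob` are chosen here.

```python
from fractions import Fraction as F

def atom_probs(c):
    """Probabilities of the 8 atoms, keyed by indicator triples (1 = event occurs).
    The assignment below is an assumed reading of the example."""
    e = F(1, 8)
    return {
        (1, 1, 1): e, (1, 1, 0): e,
        (1, 0, 1): e - c, (0, 1, 1): e - c,
        (1, 0, 0): e + c, (0, 1, 0): e + c,
        (0, 0, 1): e + 2 * c, (0, 0, 0): e - 2 * c,
    }

def prob(atoms, *indices):
    """P(intersection of A_i over the given 0-based indices)."""
    return sum((q for w, q in atoms.items() if all(w[i] == 1 for i in indices)), F(0))

c = F(1, 16)
atoms = atom_probs(c)
marginals = [prob(atoms, i) for i in range(3)]
print(marginals, prob(atoms, 0, 1), prob(atoms, 0, 2), prob(atoms, 1, 2), prob(atoms, 0, 1, 2))
```

With this assignment conditions (1) hold for every admissible $c$, while $P(A_1A_3)$ and $P(A_2A_3)$ fall short of $\frac14$ by exactly $c$.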
3.8.
Mutually independent events can form families which are strongly dependent
Choose a number $x$ at random in the interval $[0, 1]$ and consider the expansions of $x$ in bases $2, 3, \ldots$. Denote by $\mathcal{A}_k$, $k = 2, 3, \ldots$, the family of sets $A_m^{(k)}$, $m = 1, 2, \ldots$, where $A_m^{(k)}$ contains all points $x$ whose $m$th digit in the expansion in base $k$ is equal to zero. Then for every fixed $k$ the events $A_m^{(k)}$, $m = 1, 2, \ldots$, are mutually independent. This is easily checked, but for details see Neuts (1973) or Billingsley (1995). We want to know whether the families $\mathcal{A}_k$, $k = 2, 3, \ldots$, are independent. To see this, take the events
$$A_1^{(2)}, \; A_1^{(3)}, \; A_1^{(4)}, \; \ldots$$
which are representatives of the families $\mathcal{A}_2, \mathcal{A}_3, \mathcal{A}_4, \ldots$ respectively. On the one hand, for any $n > 2$,
$$\prod_{k=2}^{n} P\bigl(A_1^{(k)}\bigr) = \prod_{k=2}^{n} \frac{1}{k} = \frac{1}{n!}.$$
However, on the other hand, because the first digit of $x$ in base $k$ is $0$ iff $x < 1/k$, for $k = 2, 3, \ldots, n$, we have
$$\frac{1}{n!} \ne \frac{1}{n} = P\Bigl(\bigcap_{k=2}^{n} A_1^{(k)}\Bigr).$$
Therefore the families $\mathcal{A}_k$, $k = 2, 3, \ldots$, are not independent, although they are generated by mutually independent events.
3.9.
Independent classes of random events can generate σ-fields which are not independent
Let $(\Omega, \mathcal{F}, P)$ be the standard probability space: $\Omega = [0, 1]$, $\mathcal{F}$ is the Borel $\sigma$-field $\mathcal{B}_{[0,1]}$ generated by the open subsets of $\Omega$ and $P$ is the Lebesgue measure. Consider the following two classes of random events: $\mathcal{A}_1 = \{A_{11}, A_{12}\}$ and $\mathcal{A}_2 = \{A_2\}$ where
I
~.! = P(A ll )P(A 2 ),
I
~.!
P(A II Az)
4
P(AI2 A 2)
3'
= P(A I2 )P(A2).
Therefore the classes A 1 and A2 are independent. It is easy to see that the a-fields a(AI) and a(A2) generated by AI and A2 are not A lI A'2. then P(A t } = ~. A,A2 = [0, and independent. E.g. if Al
U
A similar example can be given in the discrete case. It is enough to take e.g. the sample space $\Omega = \{1, 2, 3, 4\}$ with equally likely outcomes and two classes $\mathcal{A}_1$ and $\mathcal{A}_2$ where $\mathcal{A}_1$ contains one of the outcomes of $\Omega$ and $\mathcal{A}_2$ contains two of them. A simple calculation leads to a conclusion like that presented above. Let us note finally that $\sigma(\mathcal{A}_1)$ and $\sigma(\mathcal{A}_2)$ would be independent if each of $\mathcal{A}_1$ and $\mathcal{A}_2$ were a $\pi$-system, i.e. $\Omega \in \mathcal{A}_i$ and $\mathcal{A}_i$, $i = 1, 2$, is closed under intersection.
SECTION 4.
DIVERSE PROPERTIES OF RANDOM EVENTS AND THEIR PROBABILITIES
Here we introduce and analyse some other properties of random events and probabilities. The corresponding definitions are given in the examples themselves. This section is a natural continuation and an extension of the ideas treated in the previous sections.
4.1.
Probability spaces without non-trivial independent events: totally dependent spaces
Let $(\Omega, \mathcal{F}, P)$ be a probability space. Recall that the events $A, B \in \mathcal{F}$ are non-trivial and independent if $0 < P(A) < 1$, $0 < P(B) < 1$ and $P(AB) = P(A)P(B)$. One might think that every probability space contains non-trivial independent events. However, this conjecture is false.
(i) Let $\Omega$ be a finite set, $\Omega = \{\omega_1, \ldots, \omega_n\}$, and
$$P(\{\omega_1\}) = 1 - (n - 1)\varepsilon, \qquad P(\{\omega_2\}) = P(\{\omega_3\}) = \cdots = P(\{\omega_n\}) = \varepsilon$$
where $\varepsilon$ is an irrational number, $0 < \varepsilon < (n - 1)^{-1}$. Suppose there exists a pair $A$, $B$ of non-trivial independent events. We have the following three possibilities: (1) $\omega_1 \notin A$, $\omega_1 \notin B$; (2) $\omega_1 \notin A$, $\omega_1 \in B$, or conversely; (3) $\omega_1 \in A$, $\omega_1 \in B$. We can easily verify that the independence condition is not satisfied in any of the cases (1), (2) or (3). For example, consider case (2). Here $A$ contains some $k$ outcomes taken from $\omega_2, \ldots, \omega_n$ and $B$ consists of $\omega_1$ and some $l$ outcomes taken from $\omega_2, \ldots, \omega_n$ (with $l < n - 1$, since $P(B) < 1$). Then the intersection $AB$ contains elements taken only from $\omega_2, \ldots, \omega_n$. Let their number be $m$, $m \le k$. The independence condition $P(AB) = P(A)P(B)$ gives the following equality:
$$m\varepsilon = [1 - (n - 1)\varepsilon + l\varepsilon]\,k\varepsilon.$$
It follows that $\varepsilon = (k - m)/[k(n - 1 - l)]$, which contradicts the assumption that $\varepsilon$ is irrational. Similar reasoning can be used in cases (1) and (3). Therefore, in this example non-trivial independent events do not exist. Moreover, it can be shown that collections of more than two non-trivial independent events also do not exist. Notice that here $\Omega$ is a finite set.
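(ii) In case (i) $\Omega$ was a finite set. Let now $\Omega$ be a countably infinite set, $\Omega = \{\omega_1, \omega_2, \ldots\}$, and let
$$P(\{\omega_k\}) = 2^{-k!}, \quad k = 2, 3, \ldots, \qquad P(\{\omega_1\}) = \varepsilon \quad \text{with} \quad \varepsilon = 1 - \sum_{k=2}^{\infty} P(\{\omega_k\}).$$
Note that the latter infinite series is convergent, that $\varepsilon$ is a number in $(0, 1)$, and that it is crucial for further reasoning that $\varepsilon$ is an irrational number (in fact, $\varepsilon$ is also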
transcendental; $\varepsilon$ is a Liouville number). It can be shown that any finite or infinite collection of arbitrarily composed random events is totally dependent.
(iii) In cases (i) and (ii) above we have described probability spaces with total dependence of their events, no matter how they are defined. In such a case we use the term totally dependent probability space. Notice, however, that in (i) and (ii) $\Omega$ is a discrete set and the probability measure $P$ is purely discrete. Hence there are purely discrete probability spaces which are totally dependent. This immediately leads to the question: is it possible for a non-purely discrete probability space to be totally dependent? Recall that 'non-purely discrete' means that $P$ is not just a sum of 'atoms', as in cases (i) and (ii) above. Now we assume that there is a subset $\Omega_c \subset \Omega$ with $P(\Omega_c) > 0$ and such that the restriction $P|_{\Omega_c}$ of $P$ on $\Omega_c$ is non-atomic: $P(\{\omega\}) = 0$ for each $\omega \in \Omega_c$. Let $P(\Omega_c) = c$ where obviously $0 < c \le 1$. Let us clarify if such a space can be totally dependent. For this we need the following result known as the Lyapunov theorem (Rudin 1973): For any $b$, $0 \le b \le c$, there is a subset (event) $D \subset \Omega_c$ such that $P(D) = b$. Let now $p$ be a fixed number, $0 < p < c$, small enough that $2p - p^2 < c$ (for instance $p < c/2$). As a consequence of the above cited result we can find three events, say $D_1$, $D_2$, $D_3$, which are pairwise disjoint and such that
$$P(D_1) = p^2, \qquad P(D_2) = p(1 - p), \qquad P(D_3) = p(1 - p)$$
(the measure of $D_1 \cup D_2 \cup D_3$ is $2p - p^2 < c$). Define the events $A = D_1 \cup D_2$ and $B = D_1 \cup D_3$. Obviously $P(A) = p$, $P(B) = p$ and since $AB = D_1$ where $P(D_1) = p^2$, we get
$$P(AB) = P(A)P(B)$$
and $A$ and $B$ are non-trivial events. Therefore a non-purely discrete probability space (the measure $P$ has a 'continuous' part) cannot be totally dependent. Notice that the examples of Bernstein, Bohlmann and their inverses (Example 3.2) are purely discrete. They all can be realized on probability spaces which are non-purely discrete, that is, on spaces with at least a partially 'continuous' part.
4.2.
On the Borel-Cantelli lemma and its corollaries
Let $\{A_n, n \ge 1\}$ be a sequence of events in the probability space $(\Omega, \mathcal{F}, P)$. Define the event $A^* = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$. Then $A^* = \{A_n \text{ i.o.}\}$: that is, infinitely many $A_n$ occur (i.o. means infinitely often). The following result (the Borel-Cantelli lemma) can be found in almost all textbooks on probability theory.
(a) If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P[A_n \text{ i.o.}] = 0$.
(b) If $\sum_{n=1}^{\infty} P(A_n) = \infty$ and $A_1, A_2, \ldots$ are independent, then $P[A_n \text{ i.o.}] = 1$.
We show by an example that in general the converse of (a) is not true, and that the independence condition in (b) is essential. Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and let $P$ be the Lebesgue measure. Consider the following sequence of events: $A_n = [0, 1/n]$, $n = 1, 2, \ldots$. Then obviously $A_n$ is decreasing in $n$ and $[A_n \text{ i.o.}] = \bigcap_{n=1}^{\infty} A_n = \{0\}$, so that $P[A_n \text{ i.o.}] = 0$. However, $\sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} (1/n) = \infty$. It follows that the converse of (a) is not true. Looking at (b), we see that the condition $\sum_{n=1}^{\infty} P(A_n) = \infty$ alone does not imply that $P[A_n \text{ i.o.}] = 1$ and thus the independence of $A_1, A_2, \ldots$ is essential.
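The contrast between a divergent series of probabilities and events that occur only finitely often can be seen numerically. The sketch below is an added illustration for $A_n = [0, 1/n]$; the point `x = 0.015` and the helper names are chosen here.

```python
def occurrences(x, n_max):
    """Number of indices n <= n_max for which x lies in A_n = [0, 1/n]."""
    return sum(1 for n in range(1, n_max + 1) if x <= 1.0 / n)

def harmonic(n_max):
    """Partial sum of P(A_n) = 1/n for n = 1, ..., n_max."""
    return sum(1.0 / n for n in range(1, n_max + 1))

x = 0.015                      # an arbitrary fixed point of (0, 1]
horizons = (10, 1000, 100000)
counts = [occurrences(x, N) for N in horizons]
sums = [harmonic(N) for N in horizons]
print(counts, sums)
```

For any fixed $x > 0$ the count stabilizes once $n > 1/x$, whereas the partial sums of $P(A_n)$ keep growing without bound.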
4.3.
When can a set of events be both exhaustive and independent?
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{A_i, i \in \Lambda\}$ a non-empty set of events. ($\Lambda$ denotes some non-empty index set.) This set is called independent if for any $k \ge 2$ and any subset $A_{i_1}, \ldots, A_{i_k}$, with distinct $i_1, \ldots, i_k \in \Lambda$,
$$P(A_{i_1} \cdots A_{i_k}) = \prod_{j=1}^{k} P(A_{i_j}).$$
The set is called exhaustive if
$$P\Bigl(\bigcup_{i \in \Lambda} A_i\Bigr) = 1.$$
The following question arises naturally: is it possible for the set $\{A_i\}$ to be both exhaustive and independent? The answer will be given for two cases: when the set $\{A_i\}$ consists of a finite or of an infinite number of events.
(i) Let the index set $\Lambda$ be finite. Suppose $\{A_i, i \in \Lambda\}$ is an independent set. Then so is the set $\{A_i^c, i \in \Lambda\}$, and
$$P\Bigl(\bigcup_{i \in \Lambda} A_i\Bigr) = 1 - P\Bigl(\bigcap_{i \in \Lambda} A_i^c\Bigr) = 1 - \prod_{i \in \Lambda} P(A_i^c). \tag{1}$$
Obviously, if for all $i \in \Lambda$, $P(A_i^c) > 0$, then the right-hand side of (1) becomes less than 1 and $\{A_i, i \in \Lambda\}$ cannot be exhaustive. However, if for some $i$ we have $P(A_i^c) = 0$, this means that $P(A_i) = 1$ and $A_i = \Omega$ almost surely. Therefore in this trivial case only (compare Example 4.1) the finite set of independent events can be exhaustive. Of course, a finite set $\{A_i, i \in \Lambda\}$ can be exhaustive without being independent.
(ii) Here the index set $\Lambda = \mathbb{N}$. We shall construct two different sets of independent events such that one of them is exhaustive and the other is not. Choose at random a number $x \in [0, 1]$. Let $A_i$ be the event that the $i$th bit in the binary expansion of $x$ is zero. It is easy to check that $A_1, A_2, \ldots$ are independent and
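moreover $P(A_i) = \frac12$ for each $i$. Thus $\sum_{i=1}^{\infty} P(A_i) = \infty$ and, according to the Borel-Cantelli lemma (see Example 4.2), $P\bigl(\bigcup_{i=1}^{\infty} A_i\bigr) = 1$. Hence the set $\{A_i, i \ge 1\}$ is both independent and exhaustive. Consider now another set $\{B_i, i \ge 1\}$ defined by
$$B_i = \bigcap_{j=r}^{s} A_j, \quad \text{where } r = \tfrac12 i(i - 1) + 1, \; s = \tfrac12 i(i + 1).$$
Thus $B_1$ is the event that the first bit in the binary expansion of $x$ is zero, $B_2$ that the second and the third bits are zero, $B_3$ that the next three bits are zero, and so on. The events $B_i$ are independent because they involve disjoint blocks of bits. Since $P(B_i) = 2^{-i}$, we have $\sum_{i=1}^{\infty} P(B_i) < \infty$ and $P\bigl(\bigcup_{i=1}^{\infty} B_i\bigr) = 1 - \prod_{i=1}^{\infty} (1 - 2^{-i}) < 1$. Hence $\{B_i, i \ge 1\}$ is a set of independent events which, however, is not exhaustive.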
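The size of the gap for the second family can be estimated numerically. The sketch below is an added illustration that uses the independence of the $B_i$ to write $P(\text{none of } B_1, \ldots, B_m \text{ occurs}) = \prod_{i=1}^{m} (1 - 2^{-i})$; the function names `block` and `prob_no_B` are chosen here.

```python
from fractions import Fraction

def block(i):
    """First and last bit positions (r, s) that define B_i."""
    r = i * (i - 1) // 2 + 1
    s = i * (i + 1) // 2
    return r, s

def prob_no_B(m):
    """P(none of B_1, ..., B_m occurs) = prod_{i <= m} (1 - 2^-i), by independence."""
    out = Fraction(1)
    for i in range(1, m + 1):
        out *= 1 - Fraction(1, 2**i)
    return out

limit_estimate = float(prob_no_B(60))     # the product has essentially converged by m = 60
print(block(1), block(2), block(3), limit_estimate, 1 - limit_estimate)
```

The product converges to roughly 0.2888, so $P\bigl(\bigcup_i B_i\bigr)$ is about 0.7112, strictly less than 1, while for the first family $P\bigl(\bigcup_{i \le m} A_i\bigr) = 1 - 2^{-m}$ tends to 1.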
4.4.
How are independence and exchangeability related?
Let us consider a finite collection of random events $\mathcal{A}_n = \{A_1, \ldots, A_n\}$, $n \ge 2$, in a probability space. $\mathcal{A}_n$ is said to be exchangeable (also symmetrically dependent) if, for each $k \ge 1$, the probability $P(A_{i_1} \cdots A_{i_k})$ is the same for all possible choices of $k$ events, $1 \le i_1 < i_2 < \cdots < i_k \le n$. In other words, there are numbers $p_1, p_2, \ldots, p_{n-1}$, all in $(0, 1)$, such that
$$P(A_j) = p_1 \text{ for all } j; \quad P(A_iA_j) = p_2 \text{ for all } i < j; \quad P(A_iA_jA_l) = p_3 \text{ for all } i < j < l, \text{ etc.}$$
Like the independence property we can introduce the term exchangeability at level $k$ for a fixed $k$, meaning that $P(A_{i_1} \cdots A_{i_k})$ is the same for all choices of just $k$ events from $\mathcal{A}_n$ regardless of what happens at levels higher than $k$, and lower than $k$. It turns out the collection $\mathcal{A}_n$ can be such that the exchangeability property holds at some levels but does not hold for others. Thus $\mathcal{A}_n$ is totally exchangeable (or simply exchangeable) if it obeys this property at all levels $k$, $k = 1, 2, \ldots, n - 1$ (for $k = n$ we have only one event, $A_1A_2 \cdots A_n$).
It is easy to see that if $\mathcal{A}_n$ is exchangeable at level 1 ($P(A_1) = \cdots = P(A_n) = p_1$) and $\mathcal{A}_n$ is mutually independent, then obviously $\mathcal{A}_n$ is totally exchangeable (now $P(A_iA_j) = p_1^2$, all $i < j$; $P(A_iA_jA_l) = p_1^3$, all $i < j < l$ etc.). If, however, $\mathcal{A}_n$ is mutually independent but there are different numbers among $P(A_1), \ldots, P(A_n)$, then $\mathcal{A}_n$ is not exchangeable at all. We can return back to Example 3.5 and derive additional conclusions about the validity of the exchangeability property (total or partial, only at some levels). Let us turn to another example. Suppose we have at our disposal 192 cards on which in a special way numbers are written such that: 110 cards are marked by a 'triplet' (each of 123, 124, 125, 134, 135, 145, 234, 235, 245, 345 is written 11 times); 30 cards are marked by a 'quartet' (each of 1234, 1235, 1245, 1345, 2345 is written six times); six cards are marked by
the 'quintet' 12345; the remaining 46 cards are blank. All 192 cards are put into a box and well mixed. We are interested in the following five events:
$$A_i = \{\text{randomly chosen card contains the number } i\}, \quad i = 1, 2, 3, 4, 5.$$
It is easy to check that for all possible distinct indices $i, j, l, s$ we have:
$$P(A_i) = \tfrac{96}{192} = \tfrac12, \quad P(A_iA_j) = \tfrac{57}{192} \ne \tfrac14, \quad P(A_iA_jA_l) = \tfrac{29}{192} \ne \tfrac18, \quad P(A_iA_jA_lA_s) = \tfrac{12}{192} = \tfrac{1}{16}, \quad P(A_1 \cdots A_5) = \tfrac{6}{192} = \tfrac{1}{32}.$$
Thus we arrive at the following two conclusions for these five events, namely: (a) they are dependent at level 2, dependent at level 3, independent at level 4 and independent at level 5; (b) they are totally exchangeable. The final conclusion is that these two properties of random events, independence and exchangeability, are not related.
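The card-counting behind these conclusions can be reproduced by building the deck explicitly. The sketch below is an added illustration; the names `deck` and `level_probs` are chosen here.

```python
from fractions import Fraction
from itertools import combinations

digits = "12345"
deck = []
# 10 triplets written 11 times each, 5 quartets 6 times each, the quintet 6 times.
for size, copies in ((3, 11), (4, 6), (5, 6)):
    for label in combinations(digits, size):
        deck.extend([frozenset(label)] * copies)
deck.extend([frozenset()] * 46)           # blank cards

def level_probs(k):
    """Set of values P(A_i1 ... A_ik) over all choices of k distinct indices."""
    return {
        Fraction(sum(1 for card in deck if set(idx) <= card), len(deck))
        for idx in combinations(digits, k)
    }

for k in range(1, 6):
    print(k, level_probs(k), Fraction(1, 2) ** k)
```

A one-element set at level $k$ means exchangeability at that level; comparing its value with $2^{-k}$ shows whether the product rule holds there.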
4.5.
A sequence of random events which is stable but not mixing
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{A_n, n \ge 1\}$, $A_n \in \mathcal{F}$, a sequence of events such that for every $B \in \mathcal{F}$,
$$\lim_{n \to \infty} P(A_nB) = \lambda P(B)$$
where $\lambda$ is a constant not depending on $B$, $0 < \lambda < 1$. Then $\{A_n\}$ is called a mixing sequence with density $\lambda$ (see Renyi 1970). In this case it is usual to speak about mixing in the sense of ergodic theory (see Doukhan 1994). The mixing property can be extended as follows. The sequence $\{A_n\}$ is called a stable sequence of events if for any $B \in \mathcal{F}$ the following limit exists:
$$\lim_{n \to \infty} P(A_nB) = Q(B).$$
According to Renyi (1970), $Q$ is a measure on $\mathcal{F}$ which is absolutely continuous with respect to $P$. The Radon-Nikodym derivative $dQ/dP = \alpha(\omega)$ exists and for every $B \in \mathcal{F}$, $Q(B) = \int_B \alpha(\omega)\, dP$. Here $0 \le \alpha(\omega) \le 1$ with probability 1. The r.v. $\alpha$ is called a (local) density of the stable sequence $\{A_n\}$. If $\alpha = \lambda = \text{constant}$ a.s., $0 < \lambda < 1$, clearly the stable sequence $\{A_n\}$ is mixing and has density $\lambda$. However, if $\alpha$ is not a constant, the stable sequence $\{A_n\}$ cannot be mixing. Let us illustrate this statement by an example. In the probability space $(\Omega, \mathcal{F}, P)$ let $B_1 \in \mathcal{F}$, $0 < P(B_1) < 1$ and $B_2 = B_1^c$. Consider two spaces, $(\Omega, \mathcal{F}, P_1)$ and $(\Omega, \mathcal{F}, P_2)$ where
$$P_i(A) = P(A \mid B_i) = P(AB_i)/P(B_i), \quad A \in \mathcal{F}, \; i = 1, 2.$$
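Suppose that $\{A_n'\}$ is a mixing sequence in $(\Omega, \mathcal{F}, P_1)$ with density $\lambda_1$ and $\{A_n''\}$ a mixing sequence in $(\Omega, \mathcal{F}, P_2)$ with density $\lambda_2$ where $0 < \lambda_1 < \lambda_2 < 1$. Put $A_n = A_n'B_1 + A_n''B_2$. Then for every $B \in \mathcal{F}$ we have
$$P(A_nB) = P(A_n'BB_1) + P(A_n''BB_2)$$
and hence, by the mixing property of $\{A_n'\}$ and $\{A_n''\}$,
$$\lim_{n \to \infty} P(A_nB) = \lambda_1 P(BB_1) + \lambda_2 P(BB_2) = Q(B).$$
Define the r.v. $\alpha = \alpha(\omega)$ as follows: $\alpha(\omega) = \lambda_1$ if $\omega \in B_1$, and $\alpha(\omega) = \lambda_2$ if $\omega \in B_2$. Then
$$Q(B) = \int_B \alpha(\omega)\, dP.$$
It follows that the sequence $\{A_n\}$ is stable but not mixing, since its density is not constant but takes two different values with positive probabilities. As noted by Renyi (1970), in a similar way we can construct a stable sequence of events such that its density has an arbitrarily prescribed discrete distribution.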
Part 2
Random Variables and Basic Characteristics
SECTION 5.
DISTRIBUTION FUNCTIONS OF RANDOM VARIABLES
Let $F(x)$, $x \in \mathbb{R}^1$, be a function satisfying the conditions:
(a) $F$ is non-decreasing, that is $x_1 < x_2 \Rightarrow F(x_1) \le F(x_2)$;
(b) $F$ is right-continuous and has left-hand limits at each $x \in \mathbb{R}^1$, that is $F(x+) := \lim_{u \downarrow x} F(u) = F(x)$;
(c) $\lim_{x \to -\infty} F(x) = 0$, $\lim_{x \to \infty} F(x) = 1$.
Any function $F$ satisfying conditions (a), (b) and (c) above is said to be a distribution function (d.f.). Now let $(\Omega, \mathcal{F}, P)$ be a probability space. Denote by $\mathcal{B}^1$ the Borel $\sigma$-field of the real line $\mathbb{R}^1 = (-\infty, \infty)$. Recall that any measurable function $X : (\Omega, \mathcal{F}) \to (\mathbb{R}^1, \mathcal{B}^1)$ is called a random variable (r.v.). By the equality $P_X(B) = P(X^{-1}(B))$, $B \in \mathcal{B}^1$, we define a probability measure on $\mathcal{B}^1$. Using the properties of the probability $P$ (see the introductory notes to Section 3) we can easily show that the function
$$F_X(x) = P_X((-\infty, x]), \quad x \in \mathbb{R}^1$$
satisfies the above conditions (a), (b) and (c) and hence $F_X$ is a d.f. In such a case we say that $F_X$ is the d.f. of the r.v. $X$. If there is a countable set of numbers $x_1, x_2, \ldots$ (finite or infinite) such that $F_X(x_n) - F_X(x_n-) := p_n > 0$, $\sum_n p_n = 1$, then the d.f. $F_X$ is called discrete. The probability measure $P_X$ is also discrete, that is $P_X$ is concentrated at the points $x_1, x_2, \ldots$, called atoms, and $P_X(\{x_n\}) = F_X(x_n) - F_X(x_n-) > 0$, $\sum_n P_X(\{x_n\}) = 1$. The set $\{p_1, p_2, \ldots\}$ is called a discrete probability distribution and $X$ a discrete r.v. with values in the set $\{x_1, x_2, \ldots\}$ and with a distribution $\{p_1, p_2, \ldots\}$. Clearly $P[X = x_n] = p_n$, $n = 1, 2, \ldots$. The d.f. $F_X$ is said to be absolutely continuous if there is a non-negative and integrable function $f(x)$, $x \in \mathbb{R}^1$, such that
$$F_X(x) = \int_{-\infty}^{x} f(u)\, du \quad \text{for all } x \in \mathbb{R}^1.$$
Here $f$ is called a probability density function (simply, density) of the d.f. $F_X$ as well as of the r.v. $X$ and of the measure $P_X$. Let us note that there are measures whose d.f.s are continuous but have their points of increase on sets of zero Lebesgue measure. Such measures and distributions are called singular. They will not be treated here. The interested reader is referred to the books by Feller (1971), Rao (1984), Billingsley (1995) and Shiryaev (1995). Now we shall define the multi-dimensional d.f. For the $n$-dimensional random vector $(X_1, \ldots, X_n)$, consider the function
$$G(x_1, \ldots, x_n) = P[X_1 \le x_1, \ldots, X_n \le x_n].$$
It is easy to derive the following properties of $G$:
$(a_1)$ $G(x_1, \ldots, x_n)$ is non-decreasing in each of its arguments;
$(b_1)$ $G(x_1, \ldots, x_n)$ is right-continuous in each of its arguments;
$(c_1)$ $G(x_1, \ldots, x_n) \to 0$ as $x_j \to -\infty$ for at least one $j$; $G(x_1, \ldots, x_n) \to 1$ as $x_j \to \infty$ for all $j = 1, \ldots, n$;
(d) if $a_i \le b_i$, $i = 1, \ldots, n$, and
$$\Delta_{a_i, b_i} G(x_1, \ldots, x_n) = G(x_1, \ldots, x_{i-1}, b_i, x_{i+1}, \ldots, x_n) - G(x_1, \ldots, x_{i-1}, a_i, x_{i+1}, \ldots, x_n)$$
then $\Delta_{a_1, b_1} \Delta_{a_2, b_2} \cdots \Delta_{a_n, b_n} G(x_1, \ldots, x_n) \ge 0$.
Any function $G$ satisfying conditions $(a_1)$, $(b_1)$, $(c_1)$ and (d) is called an $n$-dimensional d.f. Actually $G$ is the d.f. of the given random vector. Analogously to the one-dimensional case we can define the notion of a discrete multi-dimensional d.f. Further, we say that the d.f. $G$ and the random vector $(X_1, \ldots, X_n)$ are absolutely continuous with density $g(x_1, \ldots, x_n)$, $(x_1, \ldots, x_n) \in \mathbb{R}^n$, if
$$G(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} g(u_1, \ldots, u_n)\, du_1 \cdots du_n$$
for all $(x_1, \ldots, x_n) \in \mathbb{R}^n$. Here $g$ is non-negative and its integral over $\mathbb{R}^n$ is 1. The marginal d.f. $G_i(x_i) = P[X_i \le x_i]$ is obtained from $G$ in an obvious way, putting $x_j = \infty$ for $j \ne i$. If we integrate $g(x_1, \ldots, x_n)$ in the arguments $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$, each over $\mathbb{R}^1$, we obtain the function $g_i(x_i)$ which is the marginal density of $X_i$, $i = 1, \ldots, n$. We say that the r.v.s $X_1$ and $X_2$ are independent if
$$P[X_1 \in B_1, X_2 \in B_2] = P[X_1 \in B_1] P[X_2 \in B_2]$$
for any Borel sets $B_1$ and $B_2$ (that is, $B_1, B_2 \in \mathcal{B}^1$). Analogously to the case of random events we can introduce the notion of mutual and pairwise independence of a finite collection of r.v.s. If $X_1, \ldots, X_n$ are $n$ r.v.s with d.f.s $F_1, \ldots, F_n$ respectively, and $F(x_1, \ldots, x_n)$ is their joint d.f., then these variables are mutually independent (simply independent) iff
$$F(x_1, \ldots, x_n) = F_1(x_1) \cdots F_n(x_n), \quad x_1, \ldots, x_n \in \mathbb{R}^1.$$
In terms of the corresponding densities the independence of the r.v.s $X_1, \ldots, X_n$ is expressed in the form
$$f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n), \quad x_1, \ldots, x_n \in \mathbb{R}^1.$$
Let us now define the unimodality property for an absolutely continuous d.f. $F$: $F(x)$, $x \in \mathbb{R}^1$, is said to be unimodal with its mode (or vertex) at the point $x_0 \in \mathbb{R}^1$ if $F(x)$ is convex for $x < x_0$ and concave for $x > x_0$. For a detailed description of the properties of one-dimensional and multi-dimensional d.f.s we refer the reader to the works of Feller (1971), Chow and Teicher (1978), Laha and Rohatgi (1979) and Shiryaev (1995).
5.1.
Equivalent random variables are identically distributed but the converse is not true
Consider two r.v.s $X$ and $Y$ on the same probability space $(\Omega, \mathcal{F}, P)$ and suppose they are equivalent, that is $P\{\omega : X(\omega) \ne Y(\omega)\} = 0$. Hence
$$F_X(x) = P\{\omega : X(\omega) \le x\} = P\{\omega : Y(\omega) \le x\} = F_Y(x) \quad \text{for each } x \in \mathbb{R}^1.$$
Thus $X$ and $Y$ are identically distributed. In such a case we use the following notation: $X \overset{d}{=} Y$. To see that the converse is not true, take the r.v. $X$ which is absolutely continuous and symmetric with respect to the origin. Let $Y = -X$. Then the symmetry of $X$ implies that $F_X = F_Y$. Further, as a consequence of the absolute continuity of $X$, we obtain
$$P\{\omega : X(\omega) = Y(\omega)\} = P\{\omega : X(\omega) = -X(\omega)\} = P\{\omega : X(\omega) = 0\} = 0.$$
Therefore $X \overset{d}{=} Y$, however $X$ and $Y$ are not equivalent. The same conclusion can be drawn if $X$ is any discrete r.v. which is symmetric with respect to 0 and such that $P\{X = 0\} < 1$. (The last condition excludes the trivial case.) This means that $X$ takes a finite or infinite number of values $\ldots, -x_2, -x_1, x_0 = 0, x_1, x_2, \ldots$ with probabilities $p_0 = P\{X = 0\} < 1$, $p_j = P\{X = x_j\} = P\{X = -x_j\}$, $j = 1, 2, \ldots$, $p_0 + 2\sum_j p_j = 1$.
5.2.
If X, Y, Z are random variables on the same probability space, then $X \overset{d}{=} Y$ does not always imply that $XZ \overset{d}{=} YZ$
Let $X$ and $Y$ be r.v.s (defined, perhaps, on different probability spaces). It is well known that if $X \overset{d}{=} Y$ and $g(x)$, $x \in \mathbb{R}^1$, is a $\mathcal{B}^1$-measurable function, then $g(X)$ and $g(Y)$ are also r.v.s and $g(X) \overset{d}{=} g(Y)$. This fact could suggest the following conjecture. If $X$, $Y$ and $Z$ are defined on the same probability space, then
$$X \overset{d}{=} Y \;\Rightarrow\; XZ \overset{d}{=} YZ \quad \text{for any r.v. } Z.$$
A simple example will show that in general this is not true. Let the r.v. $X$ have a symmetric distribution and let $Y = -X$. Then $X \overset{d}{=} Y$. Now take $Z = Y$, that is $Z = -X$. Then the equality $XZ \overset{d}{=} YZ$ is impossible because $XZ = -X^2$ and $YZ = (-X)(-X) = X^2$. It suffices to note that all values of $XZ$ are non-positive while those of $YZ$ are non-negative. The trivial case $P\{X = 0\} = 1$ is of no interest.
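A tiny discrete instance makes the failure concrete. The sketch below is an added illustration: it assumes a hypothetical symmetric $X$ taking the values $-1$ and $1$ with probability $\frac12$ each, and the helper `law` (a name chosen here) computes the distribution of a function of $X$.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical symmetric discrete X: value -> probability.
outcomes = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

def law(f):
    """Distribution of f(X) as a dict value -> probability."""
    dist = Counter()
    for x, q in outcomes.items():
        dist[f(x)] += q
    return dict(dist)

law_X = law(lambda x: x)
law_Y = law(lambda x: -x)             # Y = -X
law_XZ = law(lambda x: x * (-x))      # Z = Y = -X, so XZ = -X^2
law_YZ = law(lambda x: (-x) * (-x))   # YZ = X^2
print(law_X == law_Y, law_XZ, law_YZ)
```

Here $X$ and $Y$ share the same law, yet $XZ$ is concentrated at $-1$ and $YZ$ at $1$.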
5.3.
Different distributions can be transformed by different functions to the same distribution
Suppose $\xi$ is a r.v. in $\mathbb{R}^1$ and $g_1(x) \ne g_2(x)$, $x \in \mathbb{R}^1$, are Borel-measurable functions. Then $g_1(\xi)$ and $g_2(\xi)$ are r.v.s with different distributions, i.e. $g_1(\xi) \overset{d}{\ne} g_2(\xi)$ (except trivial cases involving symmetry-type properties). Further, if $\xi_1$ and $\xi_2$ are r.v.s and $g(x)$, $x \in \mathbb{R}^1$, is a Borel-measurable function, then $\xi_1 \overset{d}{\ne} \xi_2$ implies that $g(\xi_1) \overset{d}{\ne} g(\xi_2)$ (again except easy cases). These two facts make it interesting to describe explicitly two r.v.s $\xi_1$ and $\xi_2$ and two Borel-measurable functions $g_1$ and $g_2$ such that $\xi_1 \overset{d}{\ne} \xi_2$, $g_1 \ne g_2$, but $g_1(\xi_1) \overset{d}{=} g_2(\xi_2)$. The multi-dimensional case is also of interest.
(i) Consider the r.v. $\xi_1 \sim N(0, \frac12)$, normally distributed with zero mean and variance $\frac12$, and the r.v. $\xi_2 \sim \gamma(a)$, $a > 0$, i.e. $\xi_2$ has a gamma distribution with density $(1/\Gamma(a))\,x^{a-1}e^{-x}$ if $x > 0$ and $0$ otherwise. Take also the functions $g_1(x) = |x|^p$ and $g_2(x) = |x|^\beta$, $x \in \mathbb{R}^1$, where $p > 0$ and $\beta > 0$ are arbitrary numbers. Let us see how the two r.v.s $\eta_1 = g_1(\xi_1) = |\xi_1|^p$ and $\eta_2 = g_2(\xi_2) = |\xi_2|^\beta = \xi_2^\beta$ are connected. If $f_1$ and $f_2$ are the densities of $\eta_1$ and $\eta_2$ respectively, we find that $f_1(x) = f_2(x) = 0$ if $x \le 0$ and, for $x > 0$,
$$f_1(x) = \frac{2}{p\sqrt{\pi}}\, x^{1/p - 1} \exp(-x^{2/p}), \qquad f_2(x) = \frac{1}{\beta\,\Gamma(a)}\, x^{a/\beta - 1} \exp(-x^{1/\beta}).$$
Now let us keep $p$ fixed and take $a = \frac12$ and $\beta = p/2$. Hence $\xi_1 \sim N(0, \frac12)$ as before, $\xi_2 \sim \gamma(\frac12)$ and, taking into account that $\Gamma(\frac12) = \sqrt{\pi}$ and comparing $f_1$ and $f_2$, we conclude that the r.v.s $\eta_1$ and $\eta_2$ have the same distribution. Therefore two different r.v.s $\xi_1 \sim N(0, \frac12)$ and $\xi_2 \sim \gamma(\frac12)$ can be transformed by different functions to identically distributed r.v.s:
$$|\xi_1|^p \overset{d}{=} \xi_2^{p/2}.$$
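The coincidence of the two densities for $a = \frac12$, $\beta = p/2$ can be checked numerically. The sketch below is an added illustration that assumes the density formulas $f_1(x) = \frac{2}{p\sqrt{\pi}} x^{1/p-1} e^{-x^{2/p}}$ and $f_2(x) = \frac{1}{\beta\Gamma(a)} x^{a/\beta-1} e^{-x^{1/\beta}}$ for $x > 0$; the value $p = 3$ and the grid of points are arbitrary choices made here.

```python
import math

def f1(x, p):
    """Density of |xi_1|^p at x > 0, where xi_1 ~ N(0, 1/2)."""
    return 2.0 / (p * math.sqrt(math.pi)) * x ** (1.0 / p - 1.0) * math.exp(-x ** (2.0 / p))

def f2(x, a, beta):
    """Density of xi_2^beta at x > 0, where xi_2 has the gamma(a) distribution."""
    return x ** (a / beta - 1.0) * math.exp(-x ** (1.0 / beta)) / (beta * math.gamma(a))

p = 3.0
grid = (0.1, 0.5, 1.0, 2.0, 5.0)
max_gap = max(abs(f1(x, p) - f2(x, 0.5, p / 2)) for x in grid)
other_gap = abs(f1(1.0, p) - f2(1.0, 1.0, p / 2))   # a = 1 instead of a = 1/2
print(max_gap, other_gap)
```

With $a = \frac12$ the two formulas agree up to floating-point error, while another shape parameter such as $a = 1$ gives a visibly different density.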
(ii) Here is another case involving more than two variables. Take three r.v.s $\xi$, $\eta$ and $\theta$ where $\xi \sim N(0, 1)$, $\eta \sim \text{Exp}(1)$ (exponential distribution with parameter 1) and $\theta$ follows the arcsine law, that is, the density of $\theta$ is $1/(\pi\sqrt{x(1 - x)})$ on $(0, 1)$ and $0$ otherwise. Consider now three new r.v.s, namely $\log(\frac12 \xi^2)$, $\log(\eta)$, $\log(\theta)$ and denote by $\psi_1$, $\psi_2$, $\psi_3$ their ch.f.s, respectively. Then
$$\psi_1(t) = \frac{\Gamma(\frac12 + it)}{\sqrt{\pi}}, \qquad \psi_2(t) = \Gamma(1 + it), \qquad \psi_3(t) = \frac{\Gamma(\frac12 + it)}{\sqrt{\pi}\,\Gamma(1 + it)}.$$
By using the obvious identity $\psi_1(t) = \psi_2(t)\psi_3(t)$ for all $t$ and assuming that $\eta$ and $\theta$ are independent, we easily arrive at the relation
$$\xi^2 \overset{d}{=} 2\eta\theta.$$
Note that the same conclusion can be derived directly by writing
$$\xi^2 = (\xi^2 + \tilde{\xi}^2) \cdot \frac{\xi^2}{\xi^2 + \tilde{\xi}^2}$$
where $\tilde{\xi} \sim N(0, 1)$ is independent of $\xi$ and showing that the two factors $\frac12(\xi^2 + \tilde{\xi}^2)$ and $\xi^2/(\xi^2 + \tilde{\xi}^2)$ are independent and follow exponential and arcsine laws respectively. The final remark is that all r.v.s considered in cases (i) and (ii) are absolutely continuous. Similar examples for discrete variables can also be constructed. This is left to the reader (try to avoid some trivial cases).
5.4.
A function which is a metric on the space of distributions but not on the space of random variables
Let us define the distance $r(X, Y)$ between any two r.v.s $X$ and $Y$ by

$$r(X, Y) = \sup_{x \in \mathbb{R}^1} |P\{X \le x\} - P\{Y \le x\}| = \sup_{x \in \mathbb{R}^1} |F_X(x) - F_Y(x)|$$

($r$ is called the uniform distance or Kolmogorov distance). Another suitable notation for $r(X, Y)$ is $r(F_X, F_Y)$, where $F_X$ and $F_Y$ are the d.f.s of $X$ and $Y$ respectively. The function $r$, considered on the space of all distribution functions on $\mathbb{R}^1$, is a metric. Indeed, it is easy to see that: (i) $r(X, Y) \ge 0$ and $r(X, Y) = 0$ iff $F_X = F_Y$; (ii) $r(X, Y) = r(Y, X)$; (iii) $r(X, Y) \le r(X, Z) + r(Z, Y)$. In (i)-(iii), $X$, $Y$ and $Z$ are arbitrary r.v.s. Suppose now that the function $r$ is considered on the space of the r.v.s on the underlying probability space. Then, referring to Example 5.1 above, we conclude that $r$ is not a metric because it violates the condition that $r(X, Y) = 0$ implies $X = Y$ a.s.
5.5.
On the n-dimensional distribution functions
Comparing the definitions of one-dimensional and multi-dimensional d.f.s, we see that in the one-dimensional case condition (d) is implied by (a1). However, if $n > 1$ then (d) is no longer a consequence of (a1), and even for $n = 2$ it is easy to construct a function $F(x, y)$ satisfying (a1), (b1) and (c1) but not (d). For example, take the function

$$F(x, y) = \begin{cases} 0, & \text{if } x < 0 \text{ or } x + y < 1 \text{ or } y < 0 \\ 1, & \text{otherwise.} \end{cases}$$

Obviously (a1), (b1) and (c1) are satisfied. Suppose $F$ is a d.f. of some random vector, say $(X, Y)$. Then for every parallelepiped $Q = [a_1, b_1] \times [a_2, b_2]$ (here a rectangle) we must have $P\{(X, Y) \in Q\} \ge 0$. However, if $Q = [\frac13, 1] \times [\frac13, 1]$ then

$$P\{(X, Y) \in Q\} = F(1, 1) - F(1, \tfrac13) - F(\tfrac13, 1) + F(\tfrac13, \tfrac13) = 1 - 1 - 1 + 0 = -1$$

which is impossible. Therefore conditions (a1), (b1) and (c1) are not sufficient for $F$ to be a d.f. in the $n$-dimensional case when $n \ge 2$. Let us suggest one additional example. Define

$$G(x, y) = \begin{cases} 0, & \text{if } x \le 0 \text{ or } y \le 0 \\ \min[1, \max[x, y]], & \text{otherwise.} \end{cases}$$

It can be checked that $G$ satisfies conditions (a1), (b1) and (c1) but not (d). It is sufficient here to take the rectangle $R = [x_1, x_2] \times [y_1, y_2]$, where $0 < x_1 \le y_1 < x_2 \le y_2 \le 1$, and calculate the probability $P\{(X, Y) \in R\}$, which would equal $G(x_2, y_2) - G(x_2, y_1) - G(x_1, y_2) + G(x_1, y_1) = y_1 - x_2 < 0$.
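The two rectangle computations of this example can be reproduced in a few lines. The sketch below is only an illustration; it assumes the functions $F$ (equal to 0 when $x < 0$, $y < 0$ or $x + y < 1$, and 1 otherwise) and $G(x, y) = \min[1, \max[x, y]]$ for $x, y > 0$, and the helper name `rectangle_mass` is a hypothetical choice.

```python
def F(x, y):
    """First candidate: 0 if x < 0 or y < 0 or x + y < 1, and 1 otherwise."""
    return 0.0 if (x < 0 or y < 0 or x + y < 1) else 1.0

def G(x, y):
    """Second candidate: 0 if x <= 0 or y <= 0, and min(1, max(x, y)) otherwise."""
    return 0.0 if (x <= 0 or y <= 0) else min(1.0, max(x, y))

def rectangle_mass(H, x1, x2, y1, y2):
    """Mass that a d.f. H would assign to the rectangle [x1, x2] x [y1, y2]."""
    return H(x2, y2) - H(x2, y1) - H(x1, y2) + H(x1, y1)
```

With these definitions `rectangle_mass(F, 1/3, 1, 1/3, 1)` evaluates to -1, and for $G$ any choice such as $x_1 = 0.2$, $y_1 = 0.4$, $x_2 = 0.6$, $y_2 = 0.8$ gives the negative value $y_1 - x_2$.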
5.6.
The continuity property of one-dimensional distributions may fail in the multi-dimensional case
Let $X$ be a r.v. on the probability space $(\Omega, \mathcal{F}, P)$ and $F$ its d.f. Suppose the values of $X$ fill some interval (finite or infinite) in $\mathbb{R}^1$ and for each $x$ of this interval, $P\{\omega : X(\omega) = x\} = 0$, that is, the probability measure $P$ has no atoms. Then $F(x)$ is continuous in $x$ everywhere in this interval. Thus we come naturally to the following question: does an analogous property hold in the multi-dimensional case? By an example for $n = 2$ we show that in general the answer is negative. Indeed, consider the following function in the plane:

$$F(x, y) = \begin{cases} xy, & \text{if } 0 \le x < 1,\ 0 \le y \le \frac12 \\ \frac12 x, & \text{if } 0 \le x < 1,\ \frac12 < y < \infty \\ y, & \text{if } 1 \le x < \infty,\ 0 \le y \le 1 \\ 1, & \text{if } x \ge 1,\ y > 1 \\ 0, & \text{otherwise.} \end{cases}$$

It is easy to check that $F$ is a two-dimensional d.f. Denote by $(X, Y)$ the random vector whose d.f. is $F$. We shall also use the notation in figure 1, where we have indicated the domains in $\mathbb{R}^2$ and the corresponding values of $F$. Note that the vector $(X, Y)$ takes values in the quadrant $Q = \{(x, y) : 0 \le x < \infty,\ 0 \le y < \infty\}$, and moreover each point $(x, y) \in Q$ has zero probability. Following the one-dimensional case we could expect that $F(x, y)$ is continuous everywhere in $Q$. But this conjecture is false. Indeed, it is easily seen that every point with coordinates $(1, y)$ where $\frac12 < y < \infty$ is a discontinuity point of $F$. If $\Delta(1, y) := F(1, y) - F(1 - 0, y - 0)$ is the size of the jump of $F$ at the point $(1, y)$, we find that $\Delta(1, y) = y - \frac12$ for $\frac12 < y < 1$ and $\Delta(1, y) = \frac12$ for $1 \le y < \infty$. The reason for the existence of this discontinuity of the d.f. $F$ is that there is a 'hyperplane' with strictly positive probability, namely $P\{X = 1,\ \frac12 \le Y \le 1\} = \frac12$ (see the bold vertical segment in figure 1).
[Figure 1: domains in the $(x, y)$-plane and the corresponding values of $F$ ($0$, $xy$, $\frac12 x$, $y$, $1$); the bold vertical segment is $\{x = 1,\ \frac12 \le y \le 1\}$.]
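The jump sizes $\Delta(1, y)$ can be examined numerically. The sketch below assumes the d.f. that puts mass $\frac12$ uniformly on $[0, 1] \times [0, \frac12]$ and mass $\frac12$ uniformly on the segment $\{x = 1,\ \frac12 \le y \le 1\}$, which is the reading of $F$ used in this example; the function names and the step `h` are illustrative.

```python
def F(x, y):
    """D.f. with mass 1/2 on [0,1] x [0,1/2] (uniform) and mass 1/2 on {x = 1, 1/2 <= y <= 1}."""
    if x < 0 or y < 0:
        return 0.0
    if x < 1:
        return x * min(y, 0.5)      # xy for y <= 1/2 and x/2 for y > 1/2
    return min(y, 1.0)              # y for 0 <= y <= 1 and 1 for y > 1

def jump_at_one(y, h=1e-9):
    """Approximates Delta(1, y) = F(1, y) - F(1 - 0, y - 0) with a small step h."""
    return F(1.0, y) - F(1.0 - h, y - h)
```

For instance `jump_at_one(0.75)` is close to $0.25 = y - \frac12$, `jump_at_one(2.0)` is close to $\frac12$, and `jump_at_one(0.3)` is close to 0, so the discontinuities sit exactly above the level $y = \frac12$ on the line $x = 1$.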
5.7.
On the absolute continuity of the distribution of a random vector and of its components
Consider for simplicity the two-dimensional case. Suppose $(X_1, X_2)$ has an absolutely continuous distribution. Then it is easy to see that each of $X_1$ and $X_2$ also has an absolutely continuous distribution. The question now is whether the converse is true. The answer is negative. To see this, take $X_1$ to be absolutely continuous and let $X_2 = X_1$, that is, $X_2(\omega) = X_1(\omega)$ for each $\omega \in \Omega$. Evidently $X_2$ is absolutely continuous. Suppose the vector $(X_1, X_2)$ has an absolutely continuous distribution with some density, say $f$. Then the following relation would hold:

$$P\{(X_1, X_2) \in B\} = \iint_B f(x_1, x_2)\,dx_1\,dx_2 \quad \text{for any set } B \in \mathcal{B}^2. \tag{1}$$

However, all values of the vector $(X_1, X_2)$ belong to the line $l : x_2 = x_1$. If we take $B = l = \{(x_1, x_2) : x_2 = x_1\}$ then the left-hand side of (1) is 1, but the right-hand side is 0 since the line $l$ has plane measure 0. Hence $(X_1, X_2)$ is not absolutely continuously distributed, but each of its components is. Note that if $X_1$ and $X_2$ are independent and absolutely continuous, then $(X_1, X_2)$ is also absolutely continuous.
5.8.
There are infinitely many multi-dimensional probability distributions with given marginals
If the random vector $(X_1, \ldots, X_n)$ has a d.f. $F(x_1, \ldots, x_n)$ then the marginal distributions $F_k(x_k) = P[X_k \le x_k]$, $k = 1, \ldots, n$, are uniquely determined. By a few examples we show that the converse is not true. It is sufficient here just to consider the two-dimensional case. The examples treat the discrete, absolutely continuous and the general cases.
(i) Let $p = \{p_{ij},\ i, j = 1, 2, \ldots\}$ be a two-dimensional discrete distribution. Select two points, say $(x_1, y_1)$ and $(x_2, y_2)$, each with positive probability and such that $x_1 \ne x_2$, $y_1 \ne y_2$. We can choose a small $\varepsilon$ satisfying the relations $0 < \varepsilon \le p_{11}$ and $0 < \varepsilon \le p_{22}$. Consider the set $q = \{q_{ij},\ i, j = 1, 2, \ldots\}$ defined as follows:

$$q_{11} = p_{11} - \varepsilon, \quad q_{12} = p_{12} + \varepsilon, \quad q_{21} = p_{21} + \varepsilon, \quad q_{22} = p_{22} - \varepsilon$$

and for all other pairs $(i, j)$ we put $q_{ij} = p_{ij}$. Then it is easy to check that $q$ is a two-dimensional distribution. Moreover, the marginal distributions of $q$ are the same as those of $p$ for each $\varepsilon$ as chosen above, even though $p \ne q$.
(ii) Consider the following two functions:
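$$f(x_1, x_2) = \begin{cases} \frac14(1 + x_1x_2), & \text{if } -1 \le x_1 \le 1,\ -1 \le x_2 \le 1 \\ 0, & \text{otherwise,} \end{cases} \qquad g(x_1, x_2) = \begin{cases} \frac14, & \text{if } -1 \le x_1 \le 1,\ -1 \le x_2 \le 1 \\ 0, & \text{otherwise.} \end{cases}$$

Then $f$ and $g$ are both two-dimensional probability density functions. For the marginal densities we find

$$f_1(x_1) = g_1(x_1) = \begin{cases} \frac12, & \text{if } -1 \le x_1 \le 1 \\ 0, & \text{otherwise,} \end{cases} \qquad f_2(x_2) = g_2(x_2) = \begin{cases} \frac12, & \text{if } -1 \le x_2 \le 1 \\ 0, & \text{otherwise,} \end{cases}$$

thus $f_1 = g_1$ and $f_2 = g_2$, but obviously $f \ne g$.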
(iii) Here is another specific but interesting example. For arbitrary positive constants $a$, $b$, $c$ consider the functions

$$f(x, y) = \begin{cases} \dfrac{\Gamma(a + b + c)}{\Gamma(a)\Gamma(b)\Gamma(c)}\,x^{a-1}y^{b-1}(1 - x - y)^{c-1}, & \text{if } 0 \le x,\ y,\ x + y \le 1 \\ 0, & \text{otherwise} \end{cases}$$

and

$$\tilde f(x, y) = \begin{cases} \dfrac{1}{B(a, b + c)B(b, a + c)}\,x^{a-1}(1 - x)^{b+c-1}y^{b-1}(1 - y)^{a+c-1}, & \text{if } 0 \le x,\ y \le 1 \\ 0, & \text{otherwise.} \end{cases}$$

Here $\Gamma(\cdot)$ and $B(\cdot, \cdot)$ are the well known gamma and beta functions of Euler. Both $f$ and $\tilde f$ are two-dimensional probability density functions. Note that $f$ is the density of the so-called Dirichlet distribution. Denote by $(X, Y)$ and $(\tilde X, \tilde Y)$ the random vectors whose densities are $f$ and $\tilde f$ respectively. Direct computations show that $X$ and $\tilde X$ have beta distribution with parameters $(a, b + c)$ and $Y$ and $\tilde Y$ have beta distribution with parameters $(b, a + c)$. Thus, again, the marginal densities of the vectors $(X, Y)$ and $(\tilde X, \tilde Y)$ are identical, but obviously $f \ne \tilde f$.
(iv) Suppose $F_1$ and $F_2$ are d.f.s possessing densities $f_1$ and $f_2$ respectively. Consider the function

$$f(x, y) = f_1(x)f_2(y)\,[1 + \varepsilon(2F_1(x) - 1)(2F_2(y) - 1)],$$

where $\varepsilon$ is an arbitrary number, $|\varepsilon| \le 1$. Then $f$ is a density function and for each $\varepsilon$ the marginal densities are $f_1$ and $f_2$ respectively. The answer to the question formulated at the beginning of this example can also be given in terms of the d.f.s only. Indeed, let $F_1$, $F_2$ be any d.f.s and $\varepsilon$ any real number, $|\varepsilon| \le 1$. Then by direct computation we see that

$$F(x, y) = F_1(x)F_2(y)\,[1 + \varepsilon(1 - F_1(x))(1 - F_2(y))]$$

is a two-dimensional d.f. whose marginals are just $F_1$ and $F_2$.
(v) Let $F$ and $G$ be arbitrary d.f.s in $\mathbb{R}^1$. Define
$$H_1(x, y) = \max\{0,\ F(x) + G(y) - 1\}, \qquad H_2(x, y) = \min\{F(x), G(y)\}.$$

For any $c_1, c_2 \ge 0$ with $c_1 + c_2 = 1$ let

$$H(x, y) = c_1H_1(x, y) + c_2H_2(x, y).$$

Then it is not difficult to check that the functions $H(x, y)$, $(x, y) \in \mathbb{R}^2$, are two-dimensional d.f.s (Fréchet distributions). Moreover, any d.f. of this class has $F$ and $G$ as its marginals. Hence there are infinitely many multi-dimensional distributions with the same marginal distributions. In other words, the marginal distributions do not uniquely determine the corresponding joint distribution. The only exception is the case when the components of the random vector are known to be independent: then the joint distribution equals the product of the marginals.
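Cases (i) and (v) lend themselves to a direct check. The following Python sketch is only illustrative: the $2 \times 2$ table `P`, the value of `eps`, and the particular marginals used with `frechet_mix` (a uniform and an exponential d.f.) are assumptions made for the demonstration.

```python
import math
from fractions import Fraction

def perturb(p, eps):
    """Case (i): shift mass eps inside the 2 x 2 block; p maps (i, j) to a probability."""
    q = dict(p)
    q[(1, 1)] -= eps
    q[(1, 2)] += eps
    q[(2, 1)] += eps
    q[(2, 2)] -= eps
    return q

def marginals(p):
    """Row and column sums of a discrete two-dimensional distribution."""
    rows, cols = {}, {}
    for (i, j), w in p.items():
        rows[i] = rows.get(i, 0) + w
        cols[j] = cols.get(j, 0) + w
    return rows, cols

def frechet_mix(F, G, c1):
    """Case (v): H = c1 * max(0, F + G - 1) + (1 - c1) * min(F, G)."""
    def H(x, y):
        return c1 * max(0.0, F(x) + G(y) - 1.0) + (1.0 - c1) * min(F(x), G(y))
    return H

# Illustrative data: a uniform 2 x 2 table and a perturbation with eps = 1/8.
P = {(i, j): Fraction(1, 4) for i in (1, 2) for j in (1, 2)}
Q = perturb(P, Fraction(1, 8))

def uniform_df(x):
    """D.f. of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

def exponential_df(y):
    """D.f. of the exponential distribution with parameter 1."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0
```

Here `P` and `Q` differ but `marginals(P) == marginals(Q)`, and for every weight `c1` in $[0, 1]$ the function `frechet_mix(uniform_df, exponential_df, c1)` reproduces `uniform_df(x)` as $y \to \infty$ and `exponential_df(y)` as $x \to \infty$.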
5.9.
The continuity of a two-dimensional probability density does not imply that the marginal densities are continuous
Let $f(x, y)$, $(x, y) \in \mathbb{R}^2$, be a probability density function which is continuous. Denote by $f_1(x)$, $x \in \mathbb{R}^1$, and $f_2(y)$, $y \in \mathbb{R}^1$, the corresponding marginal densities. There are problems which require the use of the marginal densities $f_1$ and $f_2$ and their continuity properties. Intuitively we might expect that $f_1$ and $f_2$ are continuous if $f$ is. However, such a conjecture is not generally true. Indeed, consider the following function:
It is easy to check that $f$ is a probability density function. For the first marginal density $f_1$ we find
if $x = 0$, if $x \ne 0$.
(2)
Clearly $f_1$ is discontinuous at $x = 0$ despite the fact that $f$ is continuous. Notice that the marginal density $f_1$ is discontinuous at one point only. Now, using (1), we construct a new probability density function which will be continuous, but one of whose marginal densities will have infinitely many points of discontinuity. Let $r_1, r_2, \ldots$ be the rational numbers in some order and let

$$g(x, y) = \sum_{n=1}^{\infty} 2^{-n} f(x - r_n, y). \tag{3}$$

Since $f$ given by (1) is bounded on $\mathbb{R}^2$, the series on the right-hand side of (3) is uniformly convergent on $\mathbb{R}^2$. Moreover, $g$ is a probability density function which is everywhere continuous. The marginal density $g_1$ of $g$ is

$$g_1(x) = \sum_{n=1}^{\infty} 2^{-n} f_1(x - r_n) \tag{4}$$

with $f_1$ given by (2). The boundedness of $f_1$ implies the uniform convergence of (4). It follows from (4) that $g_1$ is discontinuous at all rational points $r_1, r_2, \ldots$, though it is continuous at every irrational point of $\mathbb{R}^1$.
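The phenomenon can be illustrated numerically with one concrete continuous density. The sketch below assumes the function $f(x, y) = (2\sqrt{2\pi})^{-1}|x|\exp(-|x| - \frac12 x^2y^2)$, which is an assumption made here for illustration and is not necessarily the function meant by (1); for it the first marginal is $\frac12 e^{-|x|}$ for $x \ne 0$ and $0$ at $x = 0$. The name `marginal_x` and the truncation at 12 standard deviations are illustrative choices.

```python
import math

def f(x, y):
    """Assumed continuous joint density (illustrative choice, see the lead-in)."""
    return abs(x) / (2.0 * math.sqrt(2.0 * math.pi)) * math.exp(-abs(x) - 0.5 * (x * y) ** 2)

def marginal_x(x, n=20000):
    """Midpoint-rule approximation of the integral of f(x, y) over y."""
    if x == 0.0:
        return 0.0                      # f(0, y) = 0 for every y
    half_width = 12.0 / abs(x)          # in y the integrand is Gaussian with scale 1/|x|
    step = 2.0 * half_width / n
    return step * math.fsum(f(x, -half_width + (k + 0.5) * step) for k in range(n))
```

Evaluating `marginal_x(0.01)` gives a value close to $\frac12 e^{-0.01} \approx 0.495$, while `marginal_x(0.0)` is exactly 0, so the marginal has a jump at the origin although `f` itself is continuous.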
5.10.
The convolution of a unimodal probability density function with itself is not always unimodal
We present two examples and then discuss them briefly.
(i) Consider the following function:

$$f(x) = \begin{cases} 0, & \text{if } x \le -\frac{1}{30} \text{ or } x > \frac56 \\ 5, & \text{if } -\frac{1}{30} < x < 0 \\ 1, & \text{if } 0 \le x \le \frac56. \end{cases}$$

It is easy to see that $f$ is a probability density function which is unimodal. Direct calculation shows that the convolution $f^{*2}(x) := (f * f)(x)$ is:

$$f^{*2}(x) = \begin{cases} 0, & \text{if } x \le -\frac{1}{15} \text{ or } x \ge \frac53 \\ 25x + \frac53, & \text{if } -\frac{1}{15} < x \le -\frac{1}{30} \\ -15x + \frac13, & \text{if } -\frac{1}{30} < x \le 0 \\ x + \frac13, & \text{if } 0 \le x \le \frac45 \\ -9x + \frac{25}{3}, & \text{if } \frac45 < x \le \frac56 \\ \frac53 - x, & \text{if } \frac56 \le x \le \frac53. \end{cases}$$

Obviously $f^{*2}$ has two local maxima, at $x = -1/30$ and $x = 4/5$, with $f^{*2}(-1/30) = 5/6$ and $f^{*2}(4/5) = 17/15$, and one local minimum equal to $1/3$ at the point $x = 0$. Hence the convolution operation does not preserve unimodality.
(ii) Suppose $a$ and $b$ are positive numbers. Denote by $u_a$ and $v_a$ the densities of uniform distributions on $(0, a)$ and $(-\frac12 a, \frac12 a)$ respectively. Let $f = \frac12(u_a + u_b)$, $g = \frac12(v_a + v_b)$. Then each of $f$ and $g$ is a unimodal density and, moreover, $g$ is symmetric. We want to know whether the convolution $f * g$ is unimodal. To see this we use the equality

$$f * g = \tfrac14(u_a * v_a + u_a * v_b + u_b * v_a + u_b * v_b).$$

Considering separately each of the terms on the right-hand side of this representation we arrive at the following conclusions:
(1) $u_a * v_a$ linearly decreases on $(\frac12 a, \frac32 a)$ with slope $(-a^{-2})$ and vanishes on $(\frac32 a, \infty)$;
(2) $u_b * v_b$ linearly increases on $(-\frac12 b, \frac12 b)$ with slope $b^{-2}$;
(3) $u_a * v_b$ is constant on $(a - \frac12 b, \frac12 b)$;
(4) $u_b * v_a$ is constant on $(\frac12 a, b - \frac12 a)$ and then decreases linearly.
Now choose the parameters $a$, $b$ such that $b > 3a$. From (1)-(4) it follows that $f * g$ is decreasing in the interval $(\frac12 a, \frac32 a)$ and is increasing in $(\frac32 a, \frac12 b)$, but this means that $f * g$ is not unimodal. Let us note that in case (i) the density $f$ is unimodal but not symmetric, while in case (ii) both densities $f$ and $g$ are unimodal, $g$ is symmetric and $f$ is not symmetric. We have seen that the convolutions $f * f$ and $f * g$ are not unimodal. Thus in general the convolution operation does not preserve the unimodality property. Note that if $f$ and $g$ are unimodal densities and both are symmetric then their convolution $f * g$ is unimodal (Lukacs 1970; Dharmadhikari and Joag-Dev 1988).
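The values of $f * f$ quoted in case (i) can be confirmed with exact rational arithmetic. The sketch below assumes the piecewise-constant density of case (i), equal to 5 on $(-\frac{1}{30}, 0)$ and to 1 on $[0, \frac56]$; the names `PIECES`, `overlap` and `conv2` are illustrative.

```python
from fractions import Fraction as Fr

# Case (i): (left end, right end, height) of each constant piece of f.
PIECES = [(Fr(-1, 30), Fr(0), Fr(5)), (Fr(0), Fr(5, 6), Fr(1))]

def overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of the intervals (lo1, hi1) and (lo2, hi2)."""
    return max(Fr(0), min(hi1, hi2) - max(lo1, lo2))

def conv2(x):
    """(f * f)(x) = integral of f(t) f(x - t) dt for the piecewise-constant f."""
    total = Fr(0)
    for lo1, hi1, c1 in PIECES:
        for lo2, hi2, c2 in PIECES:
            # t lies in (lo1, hi1) and x - t in (lo2, hi2), i.e. t in (x - hi2, x - lo2).
            total += c1 * c2 * overlap(lo1, hi1, x - hi2, x - lo2)
    return total
```

Evaluating `conv2` at $-\frac{1}{30}$, $0$ and $\frac45$ returns $\frac56$, $\frac13$ and $\frac{17}{15}$ respectively, so the value at $0$ lies strictly below the values at two points on either side of it.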
5.11.
The convolution of unimodal discrete distributions is not always unimodal
Recall first the definition of unimodality. Let $P = \{p_n,\ n \in \mathbb{N}_0\}$ be a probability distribution on the set of the non-negative integers $\mathbb{N}_0$ or on some subset (or even on a countable subset of $\mathbb{R}^1$). We say that $P$ is unimodal if there is an integer $k_0$ such that $p_k$ is non-decreasing for $k \le k_0$ and non-increasing for $k \ge k_0$. The value $k_0$ is called a mode. We wish to know if the unimodal property is preserved under the convolution operation. Example 5.10 shows that the answer is negative in the absolutely continuous case. Let us find the answer in the discrete case. Consider two independent r.v.s, say $\xi$ and $\eta$, with values in the sets $\{0, 1, \ldots, m\}$ and $\{0, 1, \ldots, n\}$ respectively. For the probabilities $p_i = P[\xi = i]$ and $q_j = P[\eta = j]$ we suppose that

$$p_0 = \frac{m + 2}{2m + 2}, \quad p_1 = \ldots = p_m = \frac{1}{2m + 2}; \qquad q_0 = \frac{n + 2}{2n + 2}, \quad q_1 = \ldots = q_n = \frac{1}{2n + 2}.$$

Then each of the distributions $P_\xi = \{p_i,\ i = 0, 1, \ldots, m\}$ and $P_\eta = \{q_j,\ j = 0, 1, \ldots, n\}$ is unimodal. The sum $\theta = \xi + \eta$ is a r.v. with values in the set $\{0, 1, \ldots, m + n\}$ and its distribution $P_\theta = \{r_k,\ k = 0, 1, \ldots, m + n\}$, in view of the independence of $\xi$ and $\eta$, is equal to the convolution of $P_\xi$ and $P_\eta$: $P_\theta = P_\xi * P_\eta$. This means that

$$r_k = P[\theta = k] = P[\xi + \eta = k] = \sum p_iq_j, \qquad k = 0, 1, \ldots, m + n,$$

where the summation is over all $i \in \{0, 1, \ldots, m\}$ and $j \in \{0, 1, \ldots, n\}$ with $i + j = k$. In particular we can easily find that

$$r_0 = p_0q_0 = \frac{(m + 2)(n + 2)}{(2m + 2)(2n + 2)}, \qquad r_1 = p_0q_1 + p_1q_0 = \frac{m + n + 4}{(2m + 2)(2n + 2)},$$

$$r_2 = p_0q_2 + p_1q_1 + p_2q_0 = \frac{m + n + 5}{(2m + 2)(2n + 2)}, \quad \text{etc.}$$

Comparing $r_0$, $r_1$ and $r_2$ we see that $r_0 > r_1$ but $r_1 < r_2$. Even without additional calculations this is enough to conclude that the distribution $P_\theta$, that is, the convolution $P_\xi * P_\eta$, is not unimodal even though both $P_\xi$ and $P_\eta$ are unimodal.
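The first terms of the convolution can be computed exactly for concrete $m$ and $n$. The following is a minimal sketch using the distributions of this example with $p_0 = (m + 2)/(2m + 2)$ and $p_1 = \ldots = p_m = 1/(2m + 2)$; the function names and the choice $m = 3$, $n = 4$ are illustrative.

```python
from fractions import Fraction

def heavy_zero_uniform(m):
    """Masses p_0 = (m + 2)/(2m + 2) and p_1 = ... = p_m = 1/(2m + 2) on {0, ..., m}."""
    denom = 2 * m + 2
    return [Fraction(m + 2, denom)] + [Fraction(1, denom)] * m

def convolve(p, q):
    """Distribution of the sum of two independent integer-valued r.v.s with masses p and q."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

# Illustrative choice of the parameters.
R = convolve(heavy_zero_uniform(3), heavy_zero_uniform(4))
```

For $m = 3$, $n = 4$ this gives $r_0 = 30/80$, $r_1 = 11/80$ and $r_2 = 12/80$, so the sequence dips at $k = 1$ and the convolution is not unimodal.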
5.12.
Strong unimodality is a stronger property than the usual unimodality
The d.f. $G$ is called strongly unimodal if the convolution $G * F$ is unimodal for every unimodal $F$. (This notion was introduced by I. Ibragimov in 1956.) Note that several useful distributions are indeed strongly unimodal: the normal distribution $N(a, \sigma^2)$; the uniform distribution on the interval $[a, b]$; the gamma distribution with a shape parameter $\alpha \ge 1$; the beta distribution with parameters $(a, b)$, $a \ge 1$, $b \ge 1$, etc. However we have seen (see Example 5.10) that the convolution of two unimodal distributions is, in general, not unimodal. This implies that strong unimodality is a stronger property than (usual) unimodality. Obviously, Example 5.10 deals with absolutely continuous distributions. Hence it is of interest to consider such a case involving discrete distributions. Let $F_k$ denote the uniform distribution on the finite set $\{0, 1, \ldots, k\}$ and let $F = \frac12(F_0 + F_{m-1})$ for a fixed $m \ge 3$. Then $F$ is unimodal and our goal is to look at the convolution $G = F * F$. The distribution $G$ is concentrated on the set $\{0, 1, 2, \ldots, 2m - 2\}$ and if $g_i$, $i = 0, 1, 2, \ldots, 2m - 2$, are the masses of its 'atoms', then we easily find that

$$g_0 = \frac{(m + 1)^2}{4m^2}, \qquad g_1 = \frac{m + 1}{2m^2}, \qquad g_2 = \frac{2m + 3}{4m^2}.$$

It follows immediately that $g_1 < g_0$ and $g_1 < g_2$. Thus $g_1 < \min[g_0, g_2]$ and therefore the distribution $G = F * F$ is not unimodal. In other words, $F$ is unimodal but not strongly unimodal.
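The first three masses of $G = F * F$ can be computed exactly. The sketch below assumes the reading $F = \frac12(F_0 + F_{m-1})$, which is the one consistent with $G$ being concentrated on $\{0, 1, \ldots, 2m - 2\}$; the function names are illustrative.

```python
from fractions import Fraction

def mixture_masses(m):
    """Masses of F = (F_0 + F_{m-1}) / 2 on {0, ..., m - 1} (assumed reading of F)."""
    masses = [Fraction(1, 2 * m)] * m   # half of the uniform law on {0, ..., m - 1}
    masses[0] += Fraction(1, 2)         # half of the unit mass at 0
    return masses

def self_convolution(f):
    """Masses of the convolution of the distribution f with itself."""
    g = [Fraction(0)] * (2 * len(f) - 1)
    for i, fi in enumerate(f):
        for j, fj in enumerate(f):
            g[i + j] += fi * fj
    return g
```

For $m = 3$ the call `self_convolution(mixture_masses(3))` starts with $\frac49, \frac29, \frac14$, so the second mass is smaller than both of its neighbours.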
5.13.
Every unimodal distribution has a unimodal concentration function, but the converse does not hold
Let $X$ be a r.v. with a d.f. $F$ and $\mu_F$ be the measure on $(\mathbb{R}^1, \mathcal{B}^1)$ induced by $F$. Recall that the function

$$Q_F(l) = \sup_{x \in \mathbb{R}^1} \mu_F(\{x\} + [0, l]), \qquad l \ge 0, \tag{1}$$

is said to be a concentration function (of P. Lévy) corresponding to $F$ and also to $\mu_F$. (Here the sum of sets is defined in the usual sense: $A + B = \{a + b : a \in A, b \in B\}$.) Important results concerning concentration functions and their applications have been summarized by Hengartner and Theodorescu (1973). From (1) we can easily derive that $Q_F(l)$, $l \in \mathbb{R}^1$, is a d.f. Let us mention the following result (Hengartner and Theodorescu 1973). If $F(x)$, $x \in \mathbb{R}^1$, is a unimodal d.f. with mode $x^* = 0$, then the concentration function $Q_F(l)$, $l \in \mathbb{R}^1$, is unimodal with mode $l^* = 0$. By a concrete example we can show that the converse is not always true. We give below the d.f. $F$ and its concentration function $Q_F$ calculated by (1), namely:

$$F(x) = \begin{cases} 0, & \text{if } x < 0 \\ \frac14 x, & \text{if } 0 \le x < 1 \\ \frac14, & \text{if } 1 \le x < 2 \\ \frac14(x - 1), & \text{if } 2 \le x < 4 \\ \frac18(x + 2), & \text{if } 4 \le x < 6 \\ 1, & \text{if } x \ge 6, \end{cases} \qquad Q_F(l) = \begin{cases} 0, & \text{if } l < 0 \\ \frac14 l, & \text{if } 0 \le l < 2 \\ \frac18(l + 2), & \text{if } 2 \le l < 6 \\ 1, & \text{if } l \ge 6. \end{cases}$$

Clearly $Q_F$ is unimodal but $F$ is not unimodal.
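The concentration function of the piecewise-linear d.f. $F$ of Example 5.13 can be recomputed exactly. The sketch below assumes the Lévy concentration function in the form $Q_F(l) = \sup_x [F(x + l) - F(x)]$, which is legitimate here because $F$ is continuous; since $x \mapsto F(x + l) - F(x)$ is piecewise linear, the supremum is attained where $x$ or $x + l$ is a knot of $F$. All names are illustrative.

```python
from fractions import Fraction as Fr

KNOTS = [Fr(0), Fr(1), Fr(2), Fr(4), Fr(6)]   # points where the slope of F changes

def F(x):
    """The d.f. of the example: slopes 1/4, 0, 1/4, 1/8 on [0,1), [1,2), [2,4), [4,6)."""
    if x < 0:
        return Fr(0)
    if x < 1:
        return x / 4
    if x < 2:
        return Fr(1, 4)
    if x < 4:
        return (x - 1) / 4
    if x < 6:
        return (x + 2) / 8
    return Fr(1)

def Q(l):
    """sup over x of F(x + l) - F(x), searched over the breakpoints of the increment."""
    if l < 0:
        return Fr(0)
    candidates = KNOTS + [k - l for k in KNOTS]
    return max(F(x + l) - F(x) for x in candidates)

def Q_closed_form(l):
    """The concentration function as stated in the example."""
    if l < 0:
        return Fr(0)
    if l < 2:
        return l / 4
    if l < 6:
        return (l + 2) / 8
    return Fr(1)
```

On a grid of values of $l$ the two functions `Q` and `Q_closed_form` agree, and the slopes $\frac14$ on $(0, 2)$ and $\frac18$ on $(2, 6)$ show that $Q_F$ has a non-increasing density on $(0, \infty)$, whereas the density of $F$ vanishes on $(1, 2)$ between two intervals where it is positive.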
SECTION 6.
EXPECTATIONS AND CONDITIONAL EXPECTATIONS
For any r.v. $X$ on a given probability space $(\Omega, \mathcal{F}, P)$ we can define an important characteristic which is called an expectation, or an expected value, and is denoted by $EX$. If $X \ge 0$ and $P[X = \infty] > 0$ we put $EX = \infty$, while if $P[X = \infty] = 0$ we define

$$EX = \lim_{n \to \infty} \sum_{k=1}^{\infty} \frac{k}{2^n}\,P\left[\frac{k}{2^n} < X \le \frac{k + 1}{2^n}\right]. \tag{1}$$

For an arbitrary r.v. $X$ let $X^+ = \max\{X, 0\}$ and $X^- = \max\{-X, 0\}$. Since $X^+$ and $X^-$ are non-negative, their expectations $E[X^+]$ and $E[X^-]$ can be obtained by (1), and if either $E[X^+] < \infty$ or $E[X^-] < \infty$ then

$$EX = E[X^+] - E[X^-].$$

The expectation $EX$ is also called the Lebesgue integral of the $\mathcal{F}$-measurable function $X$ with respect to the probability measure $P$. We say that the expectation of $X$ is finite if both $E[X^+]$ and $E[X^-]$ are finite. Since $|X| = X^+ + X^-$, the finiteness of $EX$ is equivalent to $E[|X|] < \infty$. In this case the r.v. $X$ is said to be integrable. If $X$ is absolutely continuous with a density $f$, then $X$ is integrable iff $\int_{-\infty}^{\infty} |x|f(x)\,dx < \infty$ and $EX = \int_{-\infty}^{\infty} xf(x)\,dx$. If $X$ is discrete, $P[X = x_n] = p_n$, $p_n > 0$, $n = 1, 2, \ldots$, $\sum_n p_n = 1$, then $X$ is integrable iff $\sum_n |x_n|p_n < \infty$ and $EX = \sum_n x_np_n$. For some purposes it is necessary to consider the integral of $X$ over the set $A \in \mathcal{F}$. In such a case $\int_A X\,dP = \int_\Omega X(\omega)1_A(\omega)\,dP(\omega)$. It is convenient to introduce here the space $L^r(\Omega, \mathcal{F}, P)$, or simply $L^r$, of all $r$-integrable r.v.s where $r > 0$ and $X \in L^r$ iff $E[|X|^r] < \infty$. In addition to the expectation $EX$, important characteristics of the r.v. $X$ are the numbers (if defined)

$$E[(X - c)^k], \qquad E[|X - c|^k], \qquad k = 1, 2, \ldots,\ c \in \mathbb{R}^1,$$

which are known as the $k$th non-central moment and $k$th non-central absolute moment of $X$ about $c$ respectively. If $c = EX$ these moments are called central. In this section and later we use the notation $\alpha_k = E[X^k]$ for the $k$th moment of $X$. In the particular case when $k = 2$ and $c = EX$ we get the quantity $E[(X - EX)^2]$ which is said to be the variance of $X$ and is denoted by $VX$: $VX = E[(X - EX)^2]$. The expectation possesses several properties. We mention here only a few of them. If $X_1$ and $X_2$ are integrable r.v.s and $c_1, c_2 \in \mathbb{R}^1$ then $X_1 + X_2$ and $c_iX_i$ are also integrable and

$$E[c_1X_1 + c_2X_2] = c_1EX_1 + c_2EX_2 \quad \text{(linearity)},$$
$$EX_1 \le EX_2 \ \text{ if } X_1 \le X_2 \quad \text{(monotonicity)}.$$
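The dyadic approximation used to define $EX$ for a non-negative r.v. can be tried on a concrete case. The sketch below assumes $X$ exponential with parameter 1, so that $P[X > x] = e^{-x}$ and $EX = 1$, and approximating sums of the form $\sum_k (k/2^n)P[k/2^n < X \le (k + 1)/2^n]$; the truncation point `k_max` and all names are illustrative.

```python
import math

def dyadic_expectation(n, tail, k_max):
    """Sum over k = 1..k_max of (k / 2^n) * P[k / 2^n < X <= (k + 1) / 2^n], with tail(x) = P[X > x]."""
    h = 2.0 ** (-n)
    return math.fsum(k * h * (tail(k * h) - tail((k + 1) * h)) for k in range(1, k_max + 1))

def exp_tail(x):
    """P[X > x] for the exponential distribution with parameter 1."""
    return math.exp(-x)

# Approximations for n = 1, ..., 8, truncating the sum at the level x = 60.
APPROX = [dyadic_expectation(n, exp_tail, 60 * 2 ** n) for n in range(1, 9)]
```

The values in `APPROX` increase with $n$ and approach 1 from below; in this case the sum has the closed form $h/(e^h - 1)$ with $h = 2^{-n}$, up to the negligible truncation error.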
Other properties such as additivity over disjoint sets and different kinds of convergence theorems can be found in the literature (Chung 1974; Chow and Teicher 1978; Laha and Rohatgi 1979; Shiryaev 1995). The family $\{X_n : n \ge 1\}$ of r.v.s is said to be uniformly integrable if

$$\sup_n \int_{[|X_n| > a]} |X_n|\,dP(\omega) \to 0 \quad \text{as } a \to \infty$$

or, in another equivalent form, if

$$\sup_n E\left[|X_n|\,1_{[|X_n| > a]}\right] \to 0 \quad \text{as } a \to \infty.$$
Suppose now that $X$ is a r.v. on the given probability space and $\mathcal{D}$ is a sub-$\sigma$-field of $\mathcal{F}$. Following the same steps as for the definition of the expectation $EX$, we can define the conditional expectation $E[X|\mathcal{D}]$ of the r.v. $X$ with respect to the $\sigma$-field $\mathcal{D}$. So, if $X$ is an integrable r.v., then $E[X|\mathcal{D}]$ is a $\mathcal{D}$-measurable r.v. such that for every $A \in \mathcal{D}$ we have

$$\int_A E[X|\mathcal{D}]\,dP = \int_A X\,dP.$$

The existence of $E[X|\mathcal{D}]$, up to equivalence, is a consequence of the Radon-Nikodym theorem (Chow and Teicher 1978; Shiryaev 1995). Here are some properties of conditional expectations:
(i) if $X = c$ a.s. where $c$ = constant, then $E[X|\mathcal{D}] = c$ a.s.;
(ii) $X_1 \le X_2 \Rightarrow E[X_1|\mathcal{D}] \le E[X_2|\mathcal{D}]$ a.s.;
(iii) if $X_1$ and $X_2$ are integrable r.v.s and $c_1, c_2 \in \mathbb{R}^1$, then $E[c_1X_1 + c_2X_2|\mathcal{D}] = c_1E[X_1|\mathcal{D}] + c_2E[X_2|\mathcal{D}]$ a.s.;
(iv) $E\{E[X|\mathcal{D}]\} = EX$;
(v) if $\mathcal{D}_1 \subset \mathcal{D}_2 \subset \mathcal{F}$, then $E\{E[X|\mathcal{D}_2]|\mathcal{D}_1\} = E[X|\mathcal{D}_1]$ a.s.;
(vi) if $X$ is independent of the $\sigma$-field $\mathcal{D}$ (that is, $X$ is independent of $1_A$, $A \in \mathcal{D}$), then $E[X|\mathcal{D}] = EX$ a.s.;
(vii) if $X$ is $\mathcal{D}$-measurable and $E[|XY|] < \infty$, then $E[XY|\mathcal{D}] = XE[Y|\mathcal{D}]$ a.s.
Finally, let us mention an important particular case of the conditional expectation. For any event $A \in \mathcal{F}$ the conditional expectation $E[1_A|\mathcal{D}]$ is denoted by $P(A|\mathcal{D})$ and is called the conditional probability of the event $A$ with respect to the $\sigma$-field $\mathcal{D}$ (also see Example 2.4). This section includes examples devoted to various properties of expectations, conditional expectations and moments (in both one-dimensional and multi-dimensional cases). The Fubini theorem is introduced and analysed in Example 6.7, and conditional medians are considered in Example 6.10.
6.1.
On the linearity property of expectations
If one operates with expectations such as $E[X + Y]$ and $E[X + Y + Z]$ it is generally accepted that $E[X + Y] = EX + EY$ and $E[X + Y + Z] = EX + EY + EZ$. (Analogous relations can be written for more than three terms.) This is just the so-called linearity property of expectations. Its meaning is that the value of $E[\cdot]$ depends on the variables in $[\cdot]$ only through their marginal distributions. Recall that in the case of two r.v.s the linearity holds if $E[X + Y]$ is defined (in the sense that $E[(X + Y)^+]$ and/or $E[(X + Y)^-]$ are finite). Of course, if $EX$ and $EY$ both exist then $E[X + Y]$ exists and equals their sum. Moreover, the linearity holds even when $EX$ and $EY$ are not defined, or if one of them equals $+\infty$ and the other equals $-\infty$ (Simons 1977). Now the question is: what happens if we consider three variables? Does the linearity property of expectations still remain valid? The answer will follow from the example below. Let $\xi$ denote a r.v. distributed uniformly on $[0, 1]$. Then $1 - \xi$ and $\eta = |2\xi - 1|$ have the same distribution as $\xi$. Define three new r.v.s, say $X$, $Y$ and $Z$, in two different ways.

Case I. $X = Y = \tan(\pi\xi/2)$, $Z = -2\tan(\pi\xi/2)$.
Case II. $X' = \tan(\pi\xi/2)$, $Y' = \tan(\pi(1 - \xi)/2)$, $Z' = -2\tan(\pi\eta/2)$.

It is evident that $X \stackrel{d}{=} X'$, $Y \stackrel{d}{=} Y'$, $Z \stackrel{d}{=} Z'$. Our purpose now is to find the expectations $E[X + Y + Z]$ and $E[X' + Y' + Z']$. In Case I, $X + Y + Z = 0$ and hence $E[X + Y + Z] = 0$. In Case II we have: $Y' = \cot(\pi\xi/2)$, $Z' = \tan(\pi\xi/2) - \cot(\pi\xi/2)$ if $0 < \xi < \frac12$ and $Z' = \cot(\pi\xi/2) - \tan(\pi\xi/2)$ if $\frac12 < \xi < 1$. Thus $X' + Y' + Z' = 2\tan(\pi\xi/2) = 2X'$ if $0 < \xi < \frac12$ and $X' + Y' + Z' = 2\cot(\pi\xi/2) = 2Y'$ if $\frac12 < \xi < 1$. Hence $P[X' + Y' + Z' > 0] = 1$. Moreover, it is easy to calculate that $E[X' + Y' + Z'] = (4/\pi)\log 2$. Comparing the results from Cases I and II we see that the linearity property described above for two r.v.s can fail for three variables. Note that if one considers $\tilde X = X + Y$ and $\tilde Y = Z$ then $E[X + Y + Z] = E[\tilde X + \tilde Y]$ and $E[\tilde X + \tilde Y]$, when defined, depends on $\tilde X$ and $\tilde Y$ only through the distribution of $\tilde X$ and the distribution of $\tilde Y$. But $\tilde X = X + Y$ and thus the value of the expectation $E[X + Y + Z]$, when defined, depends on $X$, $Y$, $Z$ through the bivariate distribution of $X$ and $Y$, and the distribution of $Z$. The reader could try to clarify how the linearity property of expectations is expressed when considering more than three variables. In general we have to be careful when taking expectations of even such simple expressions as sums of r.v.s.
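The value $(4/\pi)\log 2$ in Case II can be checked by numerical integration over the uniform variable. The sketch below assumes the definitions $X' = \tan(\pi\xi/2)$, $Y' = \tan(\pi(1 - \xi)/2)$, $Z' = -2\tan(\pi\eta/2)$ with $\eta = |2\xi - 1|$, and in Case I the sum $\tan + \tan - 2\tan$; the midpoint rule and the number of nodes are illustrative choices.

```python
import math

def case_one_sum(xi):
    """X + Y + Z in Case I: tan + tan - 2 tan of the same argument."""
    t = math.tan(math.pi * xi / 2.0)
    return t + t - 2.0 * t

def case_two_sum(xi):
    """X' + Y' + Z' in Case II, with eta = |2 xi - 1|."""
    eta = abs(2.0 * xi - 1.0)
    return (math.tan(math.pi * xi / 2.0)
            + math.tan(math.pi * (1.0 - xi) / 2.0)
            - 2.0 * math.tan(math.pi * eta / 2.0))

def midpoint_mean(h, n=20000):
    """Midpoint-rule approximation of E[h(xi)] for xi uniform on [0, 1]."""
    return math.fsum(h((k + 0.5) / n) for k in range(n)) / n
```

Although the three summands in Case II are individually unbounded near the end points, their sum stays between 0 and 2, and `midpoint_mean(case_two_sum)` should be close to $(4/\pi)\log 2 \approx 0.8825$, while `midpoint_mean(case_one_sum)` is exactly 0.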
6.2.
An integrable sequence of non-negative random variables need not have a bounded supremum
Let $\{X_n,\ n \ge 1\}$ be a sequence of non-negative r.v.s such that for some $p > 0$, $X_n^p$ is integrable for each $n$, and, moreover, let $\sup_n E[X_n^p] < \infty$. Then intuitively we could expect that the variables $X_n$ as well as $\sup_n X_n$ are bounded. Let us show that such a conjecture need not be true. Consider the sequence $X_1, X_2, \ldots$ of i.i.d. r.v.s whose common d.f. is $F(x) = 0$ if $x < 0$ and $F(x) = 1 - e^{-x}$ if $x \ge 0$ (exponential distribution with parameter 1). Then for any $p > 0$ we have $E[X_n^p] = \Gamma(p + 1) < \infty$ and thus $\sup_n E[X_n^p] < \infty$. Further, for $x > 0$ and $m = 1, 2, \ldots$ we find

$$P\left[\max_{1 \le j \le m} X_j \le x\right] = (1 - e^{-x})^m.$$

Passing to the limit in both parameters $m$ and $x$ we get

$$\lim_{m \to \infty} P\left[\max_{1 \le j \le m} X_j \le x\right] = P\left[\sup_j X_j \le x\right] = 0 \quad \text{for all } x > 0$$

and

$$\lim_{x \to \infty} P\left[\sup_j X_j \le x\right] = P\left[\sup_j X_j < \infty\right] = 0.$$

Therefore we have shown that in general the integrability of any order $p > 0$ of members of the sequence $\{X_n,\ n \ge 1\}$ does not imply boundedness of the supremum of this sequence.
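The decay of $P[\max_{1 \le j \le m} X_j \le x]$ in $m$ is easy to observe numerically. A minimal sketch, assuming i.i.d. exponential variables with parameter 1 so that this probability equals $(1 - e^{-x})^m$; the function name is illustrative and `log1p` is used only for numerical accuracy.

```python
import math

def prob_max_at_most(x, m):
    """P[max of m i.i.d. Exp(1) variables <= x] = (1 - exp(-x))**m, for x > 0."""
    return math.exp(m * math.log1p(-math.exp(-x)))
```

For a fixed level such as $x = 10$ the value is close to 1 when $m = 1$, but it is already astronomically small when $m = 10^7$, in agreement with $P[\sup_j X_j \le x] = 0$ for every $x > 0$.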
6.3.
A necessary condition which is not sufficient for the existence of the first moment
Let $X$ be a r.v. with d.f. $F$. It is well known and easy to check that the condition $\lim_{x \to \infty} x(1 - F(x)) = 0$ is necessary for the existence of the expectation $EX$. Thus we arrive at the inverse question: if $F$ is such that $x(1 - F(x)) \to 0$ as $x \to \infty$, does this imply that $EX$ exists? The example below shows that in general the answer is negative. To see this take the following d.f.:

$$F(x) = \begin{cases} 0, & \text{if } x \le 1 \\ 1 - 1/(kx), & \text{if } e^{k-1} < x \le e^k,\ k = 1, 2, \ldots. \end{cases}$$

Direct reasoning shows that $x(1 - F(x)) \to 0$ as $x \to \infty$ while $\int_0^\infty (1 - F(x))\,dx = \infty$, and since $EX = \int_0^\infty (1 - F(x))\,dx$, then $EX$ does not exist. We can say even more: if $E[|X|^\alpha] < \infty$ for some $\alpha > 0$, then necessarily $n^\alpha P[|X| > n] \to 0$ as $n \to \infty$ (e.g. see Rohatgi 1976). Let us take $\alpha = 1$ and illustrate once again that a condition like $nP[X > n] \to 0$ as $n \to \infty$ is not sufficient for the existence of $EX$. Indeed, let us consider the following discrete r.v. $X$ defined by $P[X = n] = c/(n^2 \log n)$, $n = 2, 3, \ldots$, where $c$ is a norming constant. We can then show that for large $n$, $P(X > n) \sim c/(n \log n)$, implying that $nP[X > n] \to 0$ as $n \to \infty$. However $EX$ should be equal to $\sum_{n=2}^{\infty} c/(n \log n)$ and the divergence of this series shows that the expectation $EX$ does not exist. Finally, note that if $n^{\alpha + \delta}P[|X| > n] \to 0$ as $n \to \infty$ for some $\delta > 0$, then $E[|X|^\alpha]$ does exist.
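Both features of the first d.f. of this example can be made concrete. The sketch below assumes $1 - F(x) = 1/(kx)$ on $(e^{k-1}, e^k]$ and $F(x) = 0$ for $x \le 1$, so that the integral of $1 - F$ over $(e^{k-1}, e^k]$ equals $1/k$; the function names are illustrative.

```python
import math

def tail(x):
    """1 - F(x): equal to 1 for x <= 1 and to 1/(k x) for e^(k-1) < x <= e^k."""
    if x <= 1.0:
        return 1.0
    k = math.ceil(math.log(x))
    return 1.0 / (k * x)

def truncated_mean(K):
    """Integral of 1 - F over (0, e^K]: 1 from (0, 1] plus 1/k from each (e^(k-1), e^k]."""
    return 1.0 + math.fsum(1.0 / k for k in range(1, K + 1))
```

At the point $x = e^{39.5}$ the product `x * tail(x)` equals $1/40$, illustrating $x(1 - F(x)) \to 0$, while `truncated_mean(K)` grows like the harmonic series and so exceeds any fixed bound for large $K$.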
6.4.
A condition which is sufficient but not necessary for the existence of moment of order ( - 1) of a random variable
The moments of negative orders of r.v.s are used in some probabilistic problems and it is of interest to know the conditions which ensure their existence. If $X$ is a r.v. with a discrete distribution having positive mass at 0, then $E[X^{-1}]$ is infinite. The same holds if $X$ is absolutely continuous and its density $f$ satisfies the condition $f(0) > 0$. The following useful result is proved by Piegorsch and Casella (1985): let $X$ be a r.v. with density $f(x)$, $x \in (0, \infty)$, which is continuous and satisfies the condition

$$\lim_{x \to 0} x^{-\alpha}f(x) = 0 \quad \text{for some } \alpha > 0; \tag{1}$$

then $E[X^{-1}] < \infty$. By an example we aim to show that $E[X^{-1}]$ can be finite even if (1) fails: that is, in general, condition (1) is sufficient but not necessary for the moment of order minus one to be finite. Indeed, define the family of functions $\{f_n,\ n \ge 1\}$ by
(2) where $c$ = constant, $c \in (0, 1)$. It is easy to check that for each $n$, $f_n$ is a probability density function of some r.v., say $X_n$. Since
I.e Ilogn ul- du < I
00,
lim In(x) = 0,
x-+o
for every $\alpha > 0$, it follows that (1) is not satisfied. It then remains for us to determine whether $E[X_n^{-1}]$ exists. By (2) we find that $E[X_n^{-1}]$ is finite iff
(3)
For $n = 1$ the integral in (3) diverges for all $c \in (0, 1)$, but if $n = 2, 3, \ldots$, this integral is finite for any $c \in (0, 1)$. Consequently $E[X_n^{-1}] < \infty$ iff $n \ge 2$. So, for $n \ge 2$, $E[X_n^{-1}] < \infty$ but condition (1) does not hold.
6.5.
An absolutely continuous distribution need not be symmetric even though all its central odd-order moments vanish
Let $F(x)$, $x \in \mathbb{R}^1$, be an absolutely continuous d.f. with a density $f$. Suppose $F$ is symmetric, that is, $F(-x) = 1 - F(x)$, or, equivalently, $f(-x) = f(x)$ for all $x \in \mathbb{R}^1$. Suppose $F$ has moments of all orders. Then the central odd-order moments of a r.v. $X$ with d.f. $F$,

$$\alpha_{2n+1} = E[X^{2n+1}] = \int_{-\infty}^{\infty} x^{2n+1}f(x)\,dx,$$

are zero for all $n = 0, 1, \ldots$ since the integrand $x^{2n+1}f(x)$ is an odd function and the integral is taken over the interval $(-\infty, \infty)$, which is symmetric with respect to the origin 0. Suppose now that the distribution $G(x)$, $x \in \mathbb{R}^1$, has all its central odd-order moments vanishing. The question is: does it follow from this condition that $G$ is symmetric? The answer is negative as illustrated by the following example. Let the function $g(x)$, $x \in \mathbb{R}^1$, be defined by

$$g(x) = \begin{cases} \frac{1}{48}\exp(-|x|^{1/4})(1 + \sin|x|^{1/4}), & \text{if } x < 0 \\ \frac{1}{48}\exp(-x^{1/4})(1 - \sin x^{1/4}), & \text{if } x \ge 0. \end{cases} \tag{1}$$

It is easy to verify that $g$ is a probability density function. Denote by $Y$ a r.v. with this density. Then we can calculate explicitly the moments $\alpha_n = E[Y^n]$ for each $n$, $n = 0, 1, \ldots$. The result is

$$\alpha_{2n+1} = 0, \qquad \alpha_{2n} = \tfrac16(8n + 3)!.$$

Thus all central odd-order moments of $Y$ are zero, but obviously the distribution of $Y$ defined by the density (1) is not symmetric. (Also see Example 11.12.)
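The moment formulas can be verified exactly after the substitution $u = |x|^{1/4}$, which reduces everything to integrals $\int_0^\infty u^k e^{-u}\sin u\,du = \mathrm{Im}[k!/(1 - i)^{k+1}]$. The sketch below assumes the density $g$ with the factor $\frac{1}{48}$ and the signs $1 + \sin$ for $x < 0$ and $1 - \sin$ for $x \ge 0$; the helper names are illustrative.

```python
import math
from fractions import Fraction

def one_minus_i_power(k):
    """(1 - i)**k as an exact pair of integers (real part, imaginary part)."""
    re, im = 1, 0
    for _ in range(k):
        re, im = re + im, im - re       # multiplication by (1 - i)
    return re, im

def sine_integral(k):
    """Exact value of the integral of u**k * exp(-u) * sin(u) over (0, infinity)."""
    re, im = one_minus_i_power(k + 1)
    return Fraction(-im * math.factorial(k), re * re + im * im)

def odd_moment(n):
    """alpha_{2n+1} = -(1/6) * integral of u**(8n+7) * exp(-u) * sin(u) du."""
    return -sine_integral(8 * n + 7) / 6

def even_moment(n):
    """alpha_{2n} = (8n + 3)! / 6, since the sine terms cancel for even powers."""
    return Fraction(math.factorial(8 * n + 3), 6)

def g(x):
    """The density of the example."""
    u = abs(x) ** 0.25
    sign = 1.0 if x < 0 else -1.0
    return math.exp(-u) * (1.0 + sign * math.sin(u)) / 48.0
```

Since $(1 - i)^{8n+8} = 16^{n+1}$ is real, `odd_moment(n)` is exactly 0 for every $n$, `even_moment(0)` equals 1 (so $g$ integrates to 1 once `sine_integral(3) == 0` is noted), and yet `g(1.0)` and `g(-1.0)` differ.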
6.6.
A property of the moments of random variables which does not have an analogue for random vectors
Let $(X_1, \ldots, X_n)$ be a random vector on a given probability space $(\Omega, \mathcal{F}, P)$. Let $k_1, \ldots, k_n$ be non-negative integers. If $E[|X_1|^{k_1} \cdots |X_n|^{k_n}]$ exists, then the number

$$\alpha_{k_1 \ldots k_n} = E[X_1^{k_1} \cdots X_n^{k_n}]$$

is called a $(k_1, \ldots, k_n)$th mixed central moment of the random vector $(X_1, \ldots, X_n)$ and $k = k_1 + \ldots + k_n$ is its order. If $n = 1$ we have one r.v. $X_1$ only, and it is well known that the existence of the $k$th moment $\alpha_k$ implies the existence of all moments $\alpha_j$ for $0 \le j \le k$. It suffices to recall the Lyapunov inequality $(E[|X|^j])^{1/j} \le (E[|X|^k])^{1/k}$, $0 < j \le k$, or to use the elementary inequality $|x|^j \le 1 + |x|^k$, $x \in \mathbb{R}^1$, $0 \le j \le k$. This observation in the one-dimensional case leads to the following question: does a similar statement hold in the multi-dimensional case? The answer is negative and can be expressed as follows: in the case $n > 1$ the existence of a moment $\alpha_{k_1, \ldots, k_n}$ does not imply the existence of all moments $\alpha_{j_1, \ldots, j_n}$ for $0 \le j_m \le k_m$, $m = 1, \ldots, n$. To see this, take $\Omega = (0, 1)$, $\mathcal{F} = \mathcal{B}_{(0,1)}$ and $P$ the Lebesgue measure. For fixed numbers $c_1$ and $c_2$, $0 < c_1 \le c_2 < 1$, define the following r.v.s:

$$X_1(\omega) = \begin{cases} \omega^{-1}, & \text{if } 0 < \omega \le c_2 \\ 0, & \text{if } c_2 < \omega < 1, \end{cases} \qquad X_2(\omega) = \begin{cases} 0, & \text{if } 0 < \omega \le c_1 \\ (1 - \omega)^{-1}, & \text{if } c_1 < \omega < 1. \end{cases}$$

It is easy to check that the product $X_1 \cdot X_2$ is integrable, but neither $X_1$ nor $X_2$ is. Thus the moment $\alpha_{1,1}$ of the vector $(X_1, X_2)$ exists, but $\alpha_{0,1}$ and $\alpha_{1,0}$ do not exist. Obviously, if $c_1 < c_2$, then $\alpha_{1,1} > 0$ and if $c_1 = c_2$, then $\alpha_{1,1} = 0$.
6.7.
On the validity of the Fubini theorem
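Let $(\Omega_1, \mathcal{F}_1, P_1)$ and $(\Omega_2, \mathcal{F}_2, P_2)$ be two probability spaces. Then there exists only one probability $P$ on the product $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2)$ such that

$$P(A_1 \times A_2) = P_1(A_1)P_2(A_2), \qquad A_1 \in \mathcal{F}_1,\ A_2 \in \mathcal{F}_2.$$

Further, for every non-negative (or quasi-integrable) r.v. $X$ defined on $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2, P)$, the following formula is both meaningful and valid:

$$\int_{\Omega_1 \times \Omega_2} X\,dP = \int_{\Omega_1} P_1(d\omega_1) \int_{\Omega_2} X_{\omega_1}(\omega_2)\,P_2(d\omega_2) = \int_{\Omega_2} P_2(d\omega_2) \int_{\Omega_1} X_{\omega_2}(\omega_1)\,P_1(d\omega_1) \tag{1}$$

(for the proof see the books of Gihman and Skorohod (1974/1979) and Neveu (1965)). Our purpose now is to show that the assumption that $\int X\,dP$ exists is essential for the validity of (1). Let $Z \ge 0$ be a non-integrable r.v. on $(\Omega_1, \mathcal{F}_1, P_1)$ and define the variable $X$ on the product of this space with the discrete space $\{0, 1\}$, both points having equal probabilities, by

$$X(\omega, 0) = Z(\omega), \qquad X(\omega, 1) = -Z(\omega).$$

Then it is elementary to check that the second equality in (1) is violated. Indeed, integrating first over $\{0, 1\}$ gives $\frac12 Z(\omega) - \frac12 Z(\omega) = 0$ for every $\omega$, whereas integrating first over $\Omega_1$ leads to the undefined expression $\frac12(\infty - \infty)$.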
6.8. A non-uniformly integrable family of random variables

Consider the sequence of r.v.s $\{X_n, n \ge 1\}$ where

$$P[X_n = 2^n] = 2^{-n}, \qquad P[X_n = 0] = 1 - 2^{-n}.$$

($\{X_n\}$ arises in the so-called St. Petersburg paradox, see e.g. Szekely 1986.) Then the following relation clearly holds:

$$\int_{[|X_n| \ge a]} |X_n|\,dP = \begin{cases} 0, & \text{if } a > 2^n \\ 1, & \text{if } a \le 2^n. \end{cases}$$

This means that $\int_{[|X_n| \ge a]} |X_n|\,dP$ does not tend to zero uniformly in $n$ as $a \to \infty$. However, for each $n$, $X_n$ is integrable since $EX_n = 1$. Hence $\{X_n\}$ is an integrable but not uniformly integrable family of r.v.s.
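The failure of uniform integrability can be checked by direct arithmetic. The sketch below assumes that $X_n$ takes the value $2^n$ with probability $2^{-n}$ and the value 0 otherwise (the form consistent with $EX_n = 1$); the function names `tail_integral` and `sup_tail` are illustrative only.

```python
from fractions import Fraction

def tail_integral(n: int, a: int) -> Fraction:
    """E[|X_n|; |X_n| >= a] for a > 0, assuming P[X_n = 2**n] = 2**-n, P[X_n = 0] = 1 - 2**-n."""
    value = 2 ** n
    # Only the atom at 2**n can contribute; it carries mass 2**-n.
    return Fraction(value, 2 ** n) if value >= a else Fraction(0)

def sup_tail(a: int, n_max: int = 200) -> Fraction:
    """Supremum over n = 1..n_max of the tail integral at level a."""
    return max(tail_integral(n, a) for n in range(1, n_max + 1))

for level in (1, 10, 10**6, 10**30):
    print(level, sup_tail(level))
```

For every level $a$ tried, the supremum over $n$ stays equal to 1, which is exactly the statement that the tail integrals do not vanish uniformly in $n$.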
6.9. On the relation E[E(X|Y)] = EX

The definition of the conditional expectation of the r.v. $X$ given another r.v. $Y$, or some $\sigma$-field, requires $X$ to be integrable: $E[|X|] < \infty$. In this case the equality $E[E(X|Y)] = EX$ holds. However, the following 'reasoning' appears to contradict this result. Let $Y$ be a positive r.v. whose density $g_\nu(y)$ is given by
(compare this with a gamma distribution). Suppose the conditional distribution of X given Y = y is specified for y > 0 by the following probability density function:
(2) Therefore
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x f(x \mid y)\,dx = 0 \;\Rightarrow\; E[X \mid Y] = 0 \;\Rightarrow\; E[E(X \mid Y)] = 0.$$
On the other hand, (1) and (2) imply that the marginal density of $X$ is

$$h_\nu(x) = \Gamma(\tfrac12(\nu + 1)) \left(\Gamma(\tfrac12\nu)(\pi\nu)^{1/2}\right)^{-1} (1 + x^2/\nu)^{-\frac12(\nu+1)}, \quad x \in \mathbb{R}^1, \tag{3}$$

that is, $X$ has a Student distribution with $\nu$ degrees of freedom. In particular, for $\nu = 1$, $X$ has a Cauchy distribution. In this case $EX$ does not exist and hence

$$E[E(X \mid Y)] \ne EX.$$

The reason for this 'contradiction' is in the approach used above: we started from (2), which yields $E(X|Y) = 0$, then from (1) and (2) derived (3), which is a density of a r.v. without expectation.
6.10. Is it possible to extend one of the properties of the conditional expectation?

Consider three r.v.s, say $X$, $Y$, $Z$. Suppose $X$ is integrable and, moreover, $X$ and $Z$ are independent. Then $E[X|Z] = EX$ a.s. Having this property we can assume that

$$E[X \mid Y, Z] = E[X \mid Y] \quad \text{a.s.} \tag{1}$$

Our purpose now is to show that in general such an 'extension' is impossible. To see this, take $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ the Lebesgue measure. Define the
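following r.v.s:

$$X(\omega) = \begin{cases} 1, & \text{if } \omega \in [0, \tfrac12) \\ 0, & \text{if } \omega \in [\tfrac12, 1], \end{cases} \qquad Y(\omega) = \begin{cases} 1, & \text{if } \omega \in [0, \tfrac34) \\ 0, & \text{if } \omega \in [\tfrac34, 1], \end{cases} \qquad Z(\omega) = \begin{cases} 1, & \text{if } \omega \in [\tfrac14, \tfrac34] \\ 0, & \text{if } \omega \notin [\tfrac14, \tfrac34]. \end{cases}$$

Then we can check that $X$ and $Z$ are independent. Furthermore,

$$E[X \mid Y] = \begin{cases} \tfrac23, & \text{if } \omega \in [0, \tfrac34) \\ 0, & \text{otherwise}, \end{cases} \qquad E[X \mid Y, Z] = \begin{cases} 0, & \text{if } \omega \in [\tfrac34, 1] \\ \tfrac12, & \text{if } \omega \in [\tfrac14, \tfrac34) \\ 1, & \text{if } \omega \in [0, \tfrac14). \end{cases}$$

Therefore $E[X|Y, Z] \ne E[X|Y]$ and in general (1) does not hold.

6.11. The mean-median-mode inequality may fail to hold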
Suppose $X$ is a r.v. with mean $\mu$. A number $m$ is called a median of $X$ if $P(X \ge m) \ge \tfrac12$ and $P(X \le m) \ge \tfrac12$. It is easy to see that such $m$ always exists, but in general $X$ may have several medians. If $X$ is unimodal and $M$ is its mode, then the median $m$ is unique and for $M$, $m$ and $\mu$ we often have either $M \le m \le \mu$ or $M \ge m \ge \mu$: the median falls between the mean and the mode. A result of this kind is referred to as a mean-median-mode inequality. Recall that the symbol $>_s$ is used to denote a stochastic domination: for two r.v.s $\xi$ and $\eta$, $\xi >_s \eta \iff P[\xi > x] \ge P[\eta > x]$ for all $x$. Let us cite the following statement (Dharmadhikari and Joag-Dev 1988): if $X$ is a unimodal r.v. with mode $M$, median $m$ and mean $\mu$ and $(X - m)^+ >_s (X - m)^-$, then $M \le m \le \mu$. Our goal now is to describe a case when the mean-median-mode inequality does not hold. Consider a r.v. $X$ with density

$$f(x) = \begin{cases} 0, & \text{if } x \le 0 \\ x, & \text{if } 0 < x \le c \\ c\,e^{-\lambda(x - c)}, & \text{if } x > c. \end{cases}$$

Here $c$ and $\lambda$ are positive constants and $f$ is a density iff $c^2/2 + c/\lambda = 1$. We can easily find the mean, the median and the mode of $X$:

$$\mu = \frac{c^3}{3} + \frac{c^2}{\lambda} + \frac{c}{\lambda^2}, \qquad m = 1, \qquad M = c.$$

Now let $c \to 1$. Then $\lambda \to 2$, $\mu \to \tfrac{13}{12} > 1$ and if $c$ is sufficiently close to 1 but $c > 1$, then $\mu > c$ and $M = c > 1$. Here the median $m$ (= 1) does not fall between the mean $\mu$ (> 1) and the mode $M$ (> 1), i.e. the mean-median-mode inequality does not hold despite the fact that the density $f$ is unimodal.
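The ordering of mean, median and mode in this example can be evaluated numerically. The following is a minimal sketch using only the closed-form expressions quoted in the text; the function name `mean_median_mode` and the test value $c = 1.01$ are illustrative choices, and the restriction $1 \le c < \sqrt{2}$ is an assumption needed so that the median equals 1 and $\lambda$ is positive.

```python
import math

def mean_median_mode(c: float):
    """Return (lam, mean, median, mode) for the density x on (0, c], c*exp(-lam*(x-c)) on (c, inf).

    Assumes 1 <= c < sqrt(2): then the integral of x over (0, 1) is 1/2, so the median is 1,
    and the normalisation c**2/2 + c/lam = 1 gives a positive lam.
    """
    if not (1.0 <= c < math.sqrt(2.0)):
        raise ValueError("this sketch assumes 1 <= c < sqrt(2)")
    lam = c / (1.0 - c * c / 2.0)            # solves c^2/2 + c/lam = 1
    mean = c**3 / 3 + c**2 / lam + c / lam**2
    return lam, mean, 1.0, c

lam, mean, median, mode = mean_median_mode(1.01)
print(f"lambda={lam:.4f} mean={mean:.4f} median={median} mode={mode}")
```

With $c$ slightly above 1 the printed values satisfy median < mode < mean, so the median lies outside the interval between the mode and the mean.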
6.12. Not all properties of conditional expectations have analogues for conditional medians

Recall that the conditional median of the r.v. $X$ with respect to the $\sigma$-field $\mathcal{D}$ is defined as a $\mathcal{D}$-measurable r.v., say $m$, such that

$$P[X \ge m \mid \mathcal{D}] \ge \tfrac12 \le P[X \le m \mid \mathcal{D}].$$

By using the notation $\mu(X|\mathcal{D})$ for the conditional median we want to see if the properties of conditional expectations can be extended to conditional medians. In the examples below $X$ and $Y$ are r.v.s, $\mathcal{D}$ is a $\sigma$-field, $\mathcal{F}_0$ is the trivial $\sigma$-field ($\mathcal{F}_0 = \{\emptyset, \Omega\}$) and $I(\cdot)$ is the indicator function.
1. It is not always possible to find conditional medians satisfying

$$\mu(X + Y \mid \mathcal{D}) = \mu(X \mid \mathcal{D}) + \mu(Y \mid \mathcal{D}).$$

Indeed, let $X_1$ and $X_2$ be i.i.d. r.v.s with $P[X_1 = 0] = \tfrac13 = 1 - P[X_1 = 1]$ and put $X = X_1 X_2$, $Y = X_1 - X_1 X_2$. Then $\mu(X|\mathcal{F}_0) = 0$, $\mu(Y|\mathcal{F}_0) = 0$ and even $XY = 0$ while $\mu(X + Y|\mathcal{F}_0) = \mu(X_1|\mathcal{F}_0) = 1$. Thus the linear property of the conditional expectation ($E[X + Y|\mathcal{D}] = E[X|\mathcal{D}] + E[Y|\mathcal{D}]$) does not in general hold for conditional medians.
2. It is not always possible to find conditional medians satisfying

$$\mu(\mu(X \mid \mathcal{D}) \mid \mathcal{D}_1) = \mu(X \mid \mathcal{D}_1), \qquad \mathcal{D}_1 \subset \mathcal{D}.$$
=
Consider the r.v.s X and Y where pry = k] = ~, k = 0,1,2; P[X IIY = k] = = I-P[X == 0IY == k],k = 0, 1, and P[X - llY 2J ~ = I-P[X = 0IY = 2J. Let 'D be the a-field generated by Y. Since P[X = 1J then J,t(XI:To) = l. However, J,t(XI'D) J,t(XIY) = I(Y = 2) so J,t(J,t(XI'D)I:To) O. Therefore the smoothing property (E[E(XI'D)I'DJ] = E[XI'DJ]) also docs not in general hold for conditional medians.
l
=
1!
=
3. If the r.v. $X$ is independent of the $\sigma$-field $\mathcal{D}$, it does not necessarily follow that every conditional median $\mu(X|\mathcal{D})$ is constant. To see this we need the following result (Tomkins 1975a): if $X$ is independent of $\mathcal{D}$, then every median $\mu(X|\mathcal{F}_0)$ of $X$ is a conditional median of $X$ with respect to $\mathcal{D}$. Now consider two independent r.v.s $X$ and $Y$, each taking the values 1 and 0 with probability $\tfrac12$. Let $\mathcal{D} = \mathcal{D}_Y$ be the $\sigma$-field generated by $Y$. Then $X$ is independent of $\mathcal{D}_Y$ but $Y$ itself is a conditional median of $X$ with respect to $\mathcal{D}_Y$, that is, a conditional median which is not constant.
SECTION 7. INDEPENDENCE OF RANDOM VARIABLES

Two r.v.s $X_1$ and $X_2$ on a given probability space $(\Omega, \mathcal{F}, P)$ are called independent if

$$P[X_1 \in B_1, X_2 \in B_2] = P[X_1 \in B_1]\,P[X_2 \in B_2] \tag{1}$$

for any $B_1, B_2 \in \mathcal{B}^1$. If $F(x_1, x_2)$, $(x_1, x_2) \in \mathbb{R}^2$, is the joint d.f. of $X_1$ and $X_2$ and $F_1(x_1)$, $x_1 \in \mathbb{R}^1$, and $F_2(x_2)$, $x_2 \in \mathbb{R}^1$, are their respective marginal d.f.s, then (1) is expressed as

$$F(x_1, x_2) = F_1(x_1) F_2(x_2), \quad (x_1, x_2) \in \mathbb{R}^2. \tag{2}$$

In the absolutely continuous case the independence of $X_1$ and $X_2$ can be written in terms of the corresponding densities by

$$f(x_1, x_2) = f_1(x_1) f_2(x_2), \quad (x_1, x_2) \in \mathbb{R}^2. \tag{3}$$

If $X_1$ and $X_2$ are discrete r.v.s with $P[X_1 = x_{1i}] = p_{1i}$, $p_{1i} > 0$, $i \ge 1$, $\sum_i p_{1i} = 1$ and $P[X_2 = x_{2j}] = p_{2j}$, $p_{2j} > 0$, $j \ge 1$, $\sum_j p_{2j} = 1$, then $X_1$ and $X_2$ are independent iff

$$P[X_1 = x_{1i}, X_2 = x_{2j}] = P[X_1 = x_{1i}]\,P[X_2 = x_{2j}] \tag{4}$$

or, equivalently, $p_{ij} = p_{1i} p_{2j}$ for all possible $i, j$, where $p_{ij} := P[X_1 = x_{1i}, X_2 = x_{2j}]$.

We say that $X_1, \dots, X_n$ is a family of mutually independent r.v.s if for every $k$, $2 \le k \le n$, and $1 \le i_1 < i_2 < \dots < i_k \le n$ the following relation holds:

$$P[X_{i_1} \in B_{i_1}, \dots, X_{i_k} \in B_{i_k}] = P[X_{i_1} \in B_{i_1}] \cdots P[X_{i_k} \in B_{i_k}] \tag{5}$$

for arbitrary Borel sets $B_{i_1}, \dots, B_{i_k}$. If (5) is valid only for $k = 2$, the variables $X_1, \dots, X_n$ are called pairwise independent. It is clear how mutual independence and pairwise independence of r.v.s can be expressed through the corresponding d.f.s (see (2)), and how to do this in the absolutely continuous case (see (3)) and in the discrete case (see (4)).

Parallel to the notion of independence we can introduce the closely related notion of conditional independence. Let $\mathcal{D}$ be a $\sigma$-field, $\mathcal{D} \subset \mathcal{F}$, and $\mathcal{D}_1$, $\mathcal{D}_2$ be classes of events. Then $\mathcal{D}_1$ and $\mathcal{D}_2$ are said to be conditionally independent given $\mathcal{D}$ if, for all $D_1 \in \mathcal{D}_1$ and $D_2 \in \mathcal{D}_2$, the following relation holds:

$$P(D_1 D_2 \mid \mathcal{D}) = P(D_1 \mid \mathcal{D})\,P(D_2 \mid \mathcal{D}) \quad \text{a.s.}$$

Obviously this definition includes the conditional independence of random events and of random variables. Let $X$ and $Y$ be r.v.s with $0 < VX < \infty$, $0 < VY < \infty$. The quantity

$$\rho(X, Y) = \frac{E[(X - EX)(Y - EY)]}{(VX \cdot VY)^{1/2}}$$

is said to be a correlation coefficient between $X$ and $Y$ (simply, a correlation of $X$ and $Y$). If $\rho(X, Y) = 0$, the variables $X$ and $Y$ are called uncorrelated. We refer the reader to the books by Feller (1968, 1971), Chung (1974), Chow and Teicher (1978), Laha and Rohatgi (1979), Shiryaev (1995) for a detailed treatment of the notion of independence and several related topics. The examples in this section examine the relationship between independence, dependence and related properties of r.v.s.
7.1. Discrete random variables which are pairwise but not mutually independent
Using some of the examples in Section 3 we can easily construct sets of r.v.s with different independence/dependence properties.

(i) Let $(X, Y, Z)$ take each of the values (1,0,0), (0,1,0), (0,0,1), (1,1,1) with probability $\tfrac14$. Then $X$, $Y$ and $Z$ are pairwise independent. For example, $P[X = 1, Z = 0] = \tfrac14 = \tfrac12 \cdot \tfrac12 = P[X = 1]P[Z = 0]$. However,

$$P[X = 1, Y = 1, Z = 1] = \tfrac14 \ne \tfrac18 = P[X = 1]\,P[Y = 1]\,P[Z = 1],$$

and hence the three variables are not mutually independent.

(ii) Let $\Omega$ consist of nine points: the permutations of 1, 2, 3 and the triplets (1,1,1), (2,2,2), (3,3,3). Each has probability $\tfrac19$. Introduce three r.v.s, say $X_1$, $X_2$, $X_3$, where $X_k$ equals the number appearing at the $k$th place. The possible values of these variables are 1, 2, 3 and we can easily show that

$$P[X_k = i] = \tfrac13, \quad P[X_k = i, X_l = j] = \tfrac19, \quad k, l = 1, 2, 3,\ k \ne l,\ i, j = 1, 2, 3. \tag{1}$$

It follows immediately from (1) that $X_1$, $X_2$, $X_3$ are pairwise independent. Since $X_1$ and $X_2$ uniquely determine $X_3$, the three variables are not mutually independent.

(iii) Let us continue the construction in case (ii). Consider new triplets $(X_4, X_5, X_6)$, $(X_7, X_8, X_9)$, ..., similar in structure to $(X_1, X_2, X_3)$ and each independent of $(X_1, X_2, X_3)$. Thus we obtain an infinite sequence of r.v.s $X_1, X_2, \dots, X_n, \dots$. Clearly, any two members $X_k$, $X_l$ of this sequence satisfy relations (1). However the product rule does not hold for any $k$, $k \ge 3$, of these variables. Thus the r.v.s $\{X_n, n \ge 1\}$ are only pairwise independent.
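Cases (i) and (ii) are small enough to verify by exhaustive enumeration. The sketch below uses exact rational arithmetic and assumes, as in the text, that all listed outcomes are equally likely; the helper name `product_rule_holds` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

def product_rule_holds(outcomes, idx):
    """Check P[X_i = v_i, i in idx] == prod_i P[X_i = v_i] under the uniform law on outcomes."""
    total = len(outcomes)
    values = [sorted({w[i] for w in outcomes}) for i in idx]
    for combo in product(*values):
        joint = Fraction(sum(all(w[i] == v for i, v in zip(idx, combo)) for w in outcomes), total)
        marginals = Fraction(1)
        for i, v in zip(idx, combo):
            marginals *= Fraction(sum(w[i] == v for w in outcomes), total)
        if joint != marginals:
            return False
    return True

case_i = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
case_ii = list(permutations((1, 2, 3))) + [(1, 1, 1), (2, 2, 2), (3, 3, 3)]

for name, outcomes in (("(i)", case_i), ("(ii)", case_ii)):
    pairs_ok = all(product_rule_holds(outcomes, pair) for pair in combinations(range(3), 2))
    triple_ok = product_rule_holds(outcomes, (0, 1, 2))
    print(name, "pairwise:", pairs_ok, "triple:", triple_ok)
```

In both cases every pair passes the product rule while the full triple fails it.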
7.2. Absolutely continuous random variables which are pairwise but not mutually independent
Let $\xi$ and $\eta$ be two independent r.v.s uniformly distributed in the interval $(0, \pi)$. Define the variables $X_1 = \tan\xi$, $X_2 = \tan\eta$, $X_3 = -\tan(\xi + \eta)$. The variables $X_1$ and $X_2$ are independent, as a consequence of the independence of $\xi$ and $\eta$. By finding the distribution of $X_3$ we can establish that $X_3$ and $X_1$ are independent, as are $X_3$ and $X_2$. However, these variables are functionally dependent by the relation $X_1 + X_2 + X_3 = X_1 X_2 X_3$ and thus they cannot be mutually independent. Thus we have constructed a triplet of r.v.s which are pairwise but not mutually independent (equivalently, independent at level 2 and dependent at level 3).
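The functional relation between the three variables is just the tangent addition formula, and it can be spot-checked numerically. This is a minimal sketch with arbitrarily chosen angles (kept away from the poles of the tangent); the function name `triple` is illustrative.

```python
import math

def triple(xi: float, eta: float):
    """Return (X1, X2, X3) = (tan xi, tan eta, -tan(xi + eta))."""
    return math.tan(xi), math.tan(eta), -math.tan(xi + eta)

for xi, eta in ((0.3, 1.1), (2.0, 2.9), (0.7, 0.2)):
    x1, x2, x3 = triple(xi, eta)
    # The sum and the product agree up to floating-point rounding.
    print(xi, eta, x1 + x2 + x3, x1 * x2 * x3)
```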
7.3. A set of dependent random variables such that any of its subsets consists of mutually independent variables
If $X_1, \dots, X_n$ are r.v.s, $n \ge 3$, and we know that they are mutually independent, then any proper subset of them consists of mutually independent variables. However, in general the converse statement is not true (see Examples 7.1 and 7.2, or construct analogues to some of the examples in Section 3). Here we shall consider examples covering the discrete and the absolutely continuous cases.

(i) Let $n \ge 3$ and $A \subset \mathbb{R}^{n-1}$ be the set of all $(n-1)$-dimensional vectors of the type $a = (\alpha_1, \dots, \alpha_{n-1})$ where $\alpha_i = 1$ or 0, $i = 1, \dots, n-1$. Obviously $A$ contains $2^{n-1}$ elements (vectors): $|A| = 2^{n-1}$. Let $l(a) = \alpha_1 + \dots + \alpha_{n-1}$, so $l(a)$ takes values $0, 1, \dots, n-1$. Let $B \subset \mathbb{R}^n$ be the set of all vectors $b$ where

$$b = \begin{cases} (\alpha_1, \dots, \alpha_{n-1}, 1), & \text{if } l(a) \text{ is even} \\ (\alpha_1, \dots, \alpha_{n-1}, 0), & \text{if } l(a) \text{ is odd}. \end{cases}$$

Then $T: a \mapsto b$ is a one-one mapping of $A$ onto $B$, so $|B| = 2^{n-1}$ and, moreover, $B$ is permutation invariant. Let $a^{(j)}$ be an $(n-1)$-dimensional vector obtained from $b$ by eliminating the $j$th component of $b$. Denote by $A^{(j)}$ the set of all such vectors $a^{(j)}$. Thus we have defined the mapping $T_j^{-1}: B \mapsto A^{(j)}$. Clearly, $A^{(n)} = A$ and since $B$ is permutation invariant, we have $A^{(j)} = A^{(n)}$ for all $j = 1, \dots, n-1$ and hence $A^{(j)} = A$ for all $j = 1, \dots, n$. Now define the $n$-dimensional random vector $X = (X_1, \dots, X_n)$ taking values in the set $B$ and with a distribution given by

$$P[X = x] = \begin{cases} 2^{-(n-1)}, & \text{if } x \in B \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$

Let $X^{(j)} = T_j^{-1}(X) = (X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_n)$. Since $T_j^{-1}$ are one-one mappings of $B$ onto $A$, we find easily that the distribution of $X^{(j)}$ is given by

$$P[X^{(j)} = x^{(j)}] = \begin{cases} 2^{-(n-1)}, & \text{if } x^{(j)} \in A \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$

The next step is to use relation (2) in order to find the marginal distribution of each of the components $X_i$ of the vector $X$. We have

$$P[X_i = x_i] = \begin{cases} \tfrac12, & \text{if } x_i = 0 \text{ or } 1 \\ 0, & \text{otherwise}. \end{cases} \tag{3}$$

Now, comparing (1), (2) and (3) we arrive at the following conclusion: we have constructed $n$ dependent discrete r.v.s $X_1, \dots, X_n$ which are $(n-1)$-wise independent, that is, any proper subset of them consists of mutually independent variables, which in this case are even identically distributed.
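The parity construction of case (i) can be checked exhaustively for small $n$. The sketch below builds the set $B$ as described (last coordinate 1 when the sum of the first $n-1$ coordinates is even, 0 when it is odd), puts the uniform distribution on it, and tests the product rule on every subset of coordinates; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

def parity_support(n):
    """Set B: binary (n-1)-vectors extended by 1 if their sum is even, by 0 if it is odd."""
    return [a + ((1 if sum(a) % 2 == 0 else 0),) for a in product((0, 1), repeat=n - 1)]

def subset_is_independent(support, idx):
    """Product rule for the coordinates in idx under the uniform law on support."""
    total = len(support)
    for combo in product((0, 1), repeat=len(idx)):
        joint = Fraction(sum(all(x[i] == v for i, v in zip(idx, combo)) for x in support), total)
        marginals = Fraction(1)
        for i, v in zip(idx, combo):
            marginals *= Fraction(sum(x[i] == v for x in support), total)
        if joint != marginals:
            return False
    return True

n = 4
B = parity_support(n)
proper_ok = all(subset_is_independent(B, idx)
                for k in range(2, n) for idx in combinations(range(n), k))
print(len(B), proper_ok, subset_is_independent(B, tuple(range(n))))
```

For $n = 4$ the support has 8 points, every proper subset of coordinates satisfies the product rule, and the full vector does not.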
(ii) Let $X$ be a r.v. with density function $f$ and mean $\mu = EX$. Let $X_1, \dots, X_n$, $n \ge 3$, be r.v.s and take a function of the following type as their joint density:

$$g_n(x_1, \dots, x_n) = \Big[\prod_{i=1}^{n} f(x_i)\Big] \Big[1 + \prod_{j=1}^{n} (x_j - \mu) f(x_j)\Big], \quad \text{each } x_j \in \mathbb{R}^1. \tag{4}$$

We consider $g_n$ only for those $x_j \in \mathbb{R}^1$, $j = 1, \dots, n$, for which $|x_j - \mu| f(x_j) < 1$. Otherwise we put $g_n(\cdot) = 0$. Then $g_n$ is a non-negative function. In order for (4) to be a density function, the integral of $g_n$ over the range of $(x_1, \dots, x_n)$ described above must be equal to 1. This leads to the condition

$$\int_{-\infty}^{\infty} (x - \mu) f^2(x)\,dx = 0. \tag{5}$$

Notice that (5) is satisfied if, for example, the density $f$ is symmetric about its mean value $\mu$. Let the density $f$ satisfy (5), $g_n$ be defined by (4), and $X_1, \dots, X_n$ be r.v.s with density $g_n$. Our purpose now is to establish what dependence there is between these $n$ variables. By direct integration of (4) we find that each of the r.v.s $X_1, \dots, X_n$ has as its density the given function $f$. Suppose we have chosen $k$ of the $X$s; without restriction we can choose $X_1, \dots, X_k$, $2 \le k < n$. Denote by $h_k(x_1, \dots, x_k)$, $(x_1, \dots, x_k) \in \mathbb{R}^k$, the joint density of $X_1, \dots, X_k$. Then from (4) and (5) we can easily show that

$$h_k(x_1, \dots, x_k) = f(x_1) \cdots f(x_k).$$

Obviously this relation implies that $X_1, \dots, X_k$ are mutually independent. Of course, the same holds for any $k$-subset of $X_1, \dots, X_n$ where $2 \le k < n$. Nevertheless all $n$ r.v.s $X_1, \dots, X_n$ are not mutually independent because (4) implies that $g_n(x_1, \dots, x_n) \ne f(x_1) \cdots f(x_n)$.

It is useful to consider the following case. Let $X$ be distributed uniformly on the interval $(0, c)$, $0 < c < \infty$. Its density is $f(x) = 1/c$ for $0 < x < c$ and 0 otherwise. Then $\mu = EX = \tfrac12 c$ and (5) is satisfied. Take the random vector $(X_1, \dots, X_n)$ with density

$$g_n(x_1, \dots, x_n) = \begin{cases} c^{-n}\Big[1 + \prod_{i=1}^{n} (x_i - \tfrac12 c)\,c^{-1}\Big], & \text{if } 0 < x_i < c,\ i = 1, \dots, n \\ 0, & \text{otherwise}. \end{cases}$$

Clearly $g_n$ is not the uniform density on the $n$-dimensional cube $(0, c)^n$ in $\mathbb{R}^n$ and $X_1, \dots, X_n$ cannot be mutually independent. However, any $k$ of them, $2 \le k < n$, will be distributed uniformly in the cube $(0, c)^k$ in $\mathbb{R}^k$ and these $k$ variables are mutually independent. Hence we have described collections of $n$ dependent absolutely continuous r.v.s which are $(n-1)$-wise independent.
(iii) Consider the following function

$$f(x_1, \dots, x_n) = \begin{cases} (2\pi)^{-n}[1 - \cos x_1 \cdots \cos x_n], & \text{if } (x_1, \dots, x_n) \in Q_n \\ 0, & \text{otherwise} \end{cases}$$

where $Q_n$ is the $n$-dimensional cube $[0, 2\pi]^n$ in $\mathbb{R}^n$. It is easy to check that $f$ is non-negative and the integral of $f$ over $\mathbb{R}^n$ equals 1. Hence $f$ is a probability density function of a random vector in $\mathbb{R}^n$, say of $(X_1, \dots, X_n)$. Denoting by $f_k(x_k)$ the marginal density of the component $X_k$, we find that

$$f_k(x_k) = \begin{cases} (2\pi)^{-1}, & \text{if } x_k \in [0, 2\pi] \\ 0, & \text{otherwise}, \end{cases}$$

implying that $X_k$ is uniformly distributed on the interval $[0, 2\pi]$ and this holds for any (single) r.v. $X_1, X_2, \dots, X_n$. The form of their joint density $f$ shows that these variables are not independent. If, however, we take any $k$ of them, we conclude that for $2 \le k \le n - 1$ they are independent (their joint density is equal to $1/(2\pi)^k$ on the cube $Q_k = [0, 2\pi]^k$ in $\mathbb{R}^k$). Therefore $X_1, \dots, X_n$ is another collection of $n$ dependent r.v.s which are $(n-1)$-wise independent. (Compare with case (ii) above.)
7.4. Collection of n dependent random variables which are m-wise independent
In Example 7.3 we have described collections of $n$ dependent r.v.s which are $(n-1)$-wise independent. Thus it is of general interest to see collections of $n$ dependent r.v.s which are $m$-wise independent with $m < n - 1$. We present two examples: in the first $n = 4$, $m = 2$, while in the second $n = 5$, $m = 3$.

(i) Let $F_1, F_2, F_3, F_4$ be d.f.s on $\mathbb{R}^1$ (or on its subsets). Denote $G_j = 1 - F_j$ and define the function $H_{1234}(x_1, x_2, x_3, x_4)$, $(x_1, x_2, x_3, x_4) \in \mathbb{R}^4$, as follows (for simplicity we omit the arguments but we know they are real):

Our first claim is that if $c_1, c_2, c_3, c_4$ are non-zero numbers in the interval $(-1, 1)$ and $|c_1| + |c_2| + |c_3| + |c_4| < 1$, then $H_{1234}$ is a four-dimensional d.f. Let $(\xi_1, \xi_2, \xi_3, \xi_4)$ be a random vector whose d.f. is just $H_{1234}$. We are interested in the independence/dependence properties of the components of this vector, so we need to know its $k$-dimensional marginal distributions for $k = 3$, 2 and 1. For example, if $H_{123}$, $H_{12}$ and $H_1$ are the d.f.s of $(\xi_1, \xi_2, \xi_3)$, $(\xi_1, \xi_2)$ and $\xi_1$ respectively, we easily find that
It is quite clear how to write down the d.f. of any possible subset of components of the vector $(\xi_1, \xi_2, \xi_3, \xi_4)$. Thus we arrive at the following conclusions:
(a) $\xi_j$ has a d.f. equal to $F_j$, $j = 1, 2, 3, 4$;
(b) any two of the r.v.s $\xi_1, \xi_2, \xi_3, \xi_4$ are independent;
(c) any three of them as well as all four are dependent.
Therefore $\{\xi_1, \xi_2, \xi_3, \xi_4\}$ is a collection of dependent r.v.s which are two-wise (= pairwise) independent.
(ii) Now we have five d.f.s $F_1, F_2, F_3, F_4, F_5$ and as above we use the notation $G_j = 1 - F_j$, $j = 1, \dots, 5$. Define the function $H_{12345}(x_1, x_2, x_3, x_4, x_5)$, $(x_1, x_2, x_3, x_4, x_5) \in \mathbb{R}^5$, as follows:

$$H_{12345} = F_1 F_2 F_3 F_4 F_5 \{1 + \varepsilon_1 G_2 G_3 G_4 G_5 + \varepsilon_2 G_1 G_3 G_4 G_5 + \varepsilon_3 G_1 G_2 G_4 G_5 + \varepsilon_4 G_1 G_2 G_3 G_5 + \varepsilon_5 G_1 G_2 G_3 G_4\}.$$

If $\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4, \varepsilon_5$ are non-zero numbers in the interval $(-1, 1)$ and $|\varepsilon_1| + |\varepsilon_2| + |\varepsilon_3| + |\varepsilon_4| + |\varepsilon_5| < 1$, then $H_{12345}$ is a five-dimensional d.f. of a random vector in $\mathbb{R}^5$, say $(\eta_1, \eta_2, \eta_3, \eta_4, \eta_5)$. In order to clarify what kind of independence/dependence there exists between the components of this vector, we have first to find all $k$-dimensional marginal distributions for $k = 4, 3, 2, 1$. In particular, if $H_{1234}$, $H_{123}$, $H_{12}$ and $H_1$ are the d.f.s of $(\eta_1, \eta_2, \eta_3, \eta_4)$, $(\eta_1, \eta_2, \eta_3)$, $(\eta_1, \eta_2)$ and $\eta_1$ respectively, we find that

Similarly we can write the d.f.s in all the remaining cases, thus arriving at the following conclusions:
(a) $\eta_j$ has a d.f. equal to $F_j$, $j = 1, 2, 3, 4, 5$;
(b) any two of the r.v.s $\eta_1, \eta_2, \eta_3, \eta_4, \eta_5$ are independent;
(c) any three of them are independent;
(d) any four, as well as all five, variables are dependent.
Hence $\{\eta_1, \eta_2, \eta_3, \eta_4, \eta_5\}$ is a collection of dependent r.v.s which are three-wise independent. Note finally that a similar idea can be used when describing $n$ dependent r.v.s which are $m$-wise independent. In cases (i) and (ii) above, as well as in the general case, the description can be done in terms of probability density functions.
7.5. An independence-type property for random variables

Let $X_1, X_2, \dots$ be positive integer-valued r.v.s and $S_k = X_1 + \dots + X_k$. Suppose that $Y_1, Y_2, \dots$ is another sequence of i.i.d. positive integer-valued r.v.s with $P[Y_1 = i] = p_i$, $p_i > 0$, $\sum_{i \ge 1} p_i = 1$, and for all $k \ge 1$ and $i \ge 1$ the following relation holds:

$$P[S_k = i] = P[Y_1 + \dots + Y_k = i]. \tag{1}$$
For various purposes one needs to find $P[S_1 = i_1, S_2 = i_2, \dots, S_k = i_k]$. Taking into account (1), the equalities $S_2 = i_1 + X_2$, $S_3 = i_2 + X_3$, ..., $S_k = i_{k-1} + X_k$ and the independence of the $Y$s, we can suppose that

$$P[S_1 = i_1, S_2 = i_2, \dots, S_k = i_k] = p_{i_1}\, p_{i_2 - i_1} \cdots p_{i_k - i_{k-1}}. \tag{2}$$

Obviously (2) is satisfied if the variables $X_1, X_2, \dots$ are independent. Thus we want to know whether or not relation (2) holds for any choice of the sequence $\{X_k\}$. Let $p_1, p_2, p_3$ be positive numbers with $p_1 + p_2 + p_3 = 1$. Denote by $Y$ a r.v. taking the values 1, 2, 3 with probabilities $p_1, p_2, p_3$ respectively, and let $\{Y_k, k \ge 1\}$ be a sequence of independent copies of $Y$. Now define the pair of r.v.s $(X_1, X_2)$ as follows:

$$P[X_1 = i, X_2 = j] = p_i p_j + \varepsilon_{ij}, \quad i, j = 1, 2, 3,$$

with $\varepsilon_{11} = \varepsilon_{22} = \varepsilon_{33} = 0$, $\varepsilon_{21} = \varepsilon_{32} = \varepsilon_{13} = \varepsilon$ and $\varepsilon_{12} = \varepsilon_{23} = \varepsilon_{31} = -\varepsilon$, where the real number $\varepsilon$ is chosen so that $|\varepsilon| \le \min\{p_1 p_2, p_2 p_3, p_1 p_3\}$. Let $(X_3, X_4)$, $(X_5, X_6)$, ... be independent copies of the pair $(X_1, X_2)$. Thus we obtain the sequence $X_1, X_2, \dots, X_n, \dots$. We want to determine whether the sequences $\{X_k\}$ and $\{Y_k\}$ just defined satisfy conditions (1) and (2). Evidently, for all $i, j$ we have

$$P[X_1 = i] = p_i \quad \text{and} \quad P[X_1 + X_2 = j] = P[Y_1 + Y_2 = j]$$

and (1) holds. Furthermore, if $\varepsilon \ne 0$ then

$$P[S_1 = 2, S_2 = 3] = P[X_1 = 2, X_2 = 1] = p_2 p_1 + \varepsilon \ne p_2 p_1$$

and hence (2) is not satisfied. Therefore the independence property for the sequence $\{X_k\}$ is essential for the validity of (2).
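The claims about the perturbed pair $(X_1, X_2)$ can be confirmed with exact arithmetic. The sketch below assumes the joint law $p_i p_j + \varepsilon_{ij}$ with the sign pattern stated in the text; the particular numbers $p = (1/2, 1/3, 1/6)$ and $\varepsilon = 1/20$ are illustrative choices satisfying $|\varepsilon| \le \min\{p_1p_2, p_2p_3, p_1p_3\} = 1/18$.

```python
from fractions import Fraction

p = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}   # illustrative probabilities
eps = Fraction(1, 20)                                            # illustrative perturbation
sign = {(2, 1): 1, (3, 2): 1, (1, 3): 1, (1, 2): -1, (2, 3): -1, (3, 1): -1}

# Joint law of (X1, X2) and the law of an independent pair with the same marginals.
joint = {(i, j): p[i] * p[j] + sign.get((i, j), 0) * eps for i in p for j in p}
independent = {(i, j): p[i] * p[j] for i in p for j in p}

def law_of_sum(table):
    """Distribution of the sum of the two coordinates."""
    out = {}
    for (i, j), q in table.items():
        out[i + j] = out.get(i + j, Fraction(0)) + q
    return out

marginal_x1 = {i: sum(joint[(i, j)] for j in p) for i in p}
print(marginal_x1 == p, law_of_sum(joint) == law_of_sum(independent), joint[(2, 1)] - p[2] * p[1])
```

The marginal of $X_1$ and the law of $X_1 + X_2$ coincide with those of the independent pair, while $P[X_1 = 2, X_2 = 1]$ differs from $p_2 p_1$ by $\varepsilon$.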
7.6. Dependent random variables X and Y such that X^2 and Y^2 are independent
It is well known that if $X$ and $Y$ are independent r.v.s, then for any continuous functions $g$ and $h$, the r.v.s $g(X)$ and $h(Y)$ are also independent (see Gnedenko 1962; Feller 1971). The converse statement is true if the functions $g$ and $h$ are one-one mappings of $\mathbb{R}^1$ to $\mathbb{R}^1$. However, we can choose functions $g$ and $h$ without this condition such that $g(X)$ and $h(Y)$ are independent r.v.s but $X$ and $Y$ themselves are not. We present examples treating the discrete and the absolutely continuous cases.

(i) Consider the two-dimensional random vector $(X, Y)$ with

$$p_{i,j} := P[X = i, Y = j], \quad i, j = -1, 0, 1,$$

where $p_{1,1} = p_{-1,1} = 1/32$, $p_{-1,-1} = p_{1,-1} = p_{1,0} = p_{0,1} = 3/32$, $p_{-1,0} = p_{0,-1} = 5/32$, $p_{0,0} = 8/32$. It is easy to check that $X^2$ and $Y^2$ are independent r.v.s but $X$ and $Y$ are not.

(ii) Let $X_1$ and $X_2$ be two independent absolutely continuous r.v.s. Take another r.v. $Y$ which is independent of $X_1$, $X_2$ and assumes the values $+1$ and $-1$ with probability $\tfrac12$ each. Define two new r.v.s, say $Z_1$ and $Z_2$, by

The absolute continuity of $X_1$ and $X_2$ implies that $Z_1$ and $Z_2$ are absolutely continuous. Obviously, $Z_1$ and $Z_2$ are functionally connected and thus they cannot be independent. However, $Z_1^2 = X_1^2$, $Z_2^2 = X_2^2$ and, since $X_1$ and $X_2$ are independent, $Z_1^2$ and $Z_2^2$ are independent.
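(iii) Here is another illustration. Let the random vector $(X, Y)$ have the following density (compare with Example 5.8(ii)):

$$f(x, y) = \begin{cases} \tfrac14(1 + xy), & \text{if } |x| < 1 \text{ and } |y| < 1 \\ 0, & \text{otherwise}. \end{cases}$$

We easily find the marginal densities $f_1(x)$ of $X$ and $f_2(y)$ of $Y$:

$$f_1(x) = \begin{cases} \tfrac12, & \text{if } |x| < 1 \\ 0, & \text{otherwise}, \end{cases} \qquad f_2(y) = \begin{cases} \tfrac12, & \text{if } |y| < 1 \\ 0, & \text{otherwise}. \end{cases}$$

Obviously the relation $f(x, y) = f_1(x) f_2(y)$ does not hold for all $x$ and $y$, hence $X$ and $Y$ are dependent. Each of the variables $X^2$ and $Y^2$ takes values in $(0, 1)$ and for $x \in (0, 1)$ and $y \in (0, 1)$ we find

$$P[X^2 < x, Y^2 < y] = P[-\sqrt{x} < X < \sqrt{x},\ -\sqrt{y} < Y < \sqrt{y}] = \int_{-\sqrt{x}}^{\sqrt{x}} \int_{-\sqrt{y}}^{\sqrt{y}} \tfrac14(1 + uv)\,du\,dv = \sqrt{x}\sqrt{y} = P[X^2 < x]\,P[Y^2 < y].$$

Thus $X^2$ and $Y^2$ are independent r.v.s.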
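The discrete case (i) of this example can be verified by brute force. The sketch below assumes the probabilities read as $p_{1,1} = p_{-1,1} = 1/32$, $p_{-1,-1} = p_{1,-1} = p_{1,0} = p_{0,1} = 3/32$, $p_{-1,0} = p_{0,-1} = 5/32$, $p_{0,0} = 8/32$ (they sum to 1); the helper names are illustrative.

```python
from fractions import Fraction

# Numerators of p_{i,j} over the common denominator 32 (assumed reading of case (i)).
weights = {(1, 1): 1, (-1, 1): 1, (-1, -1): 3, (1, -1): 3, (1, 0): 3, (0, 1): 3,
           (-1, 0): 5, (0, -1): 5, (0, 0): 8}
p = {key: Fraction(w, 32) for key, w in weights.items()}

def marginal(table, axis):
    """Marginal law of the coordinate with the given index."""
    out = {}
    for key, q in table.items():
        out[key[axis]] = out.get(key[axis], Fraction(0)) + q
    return out

def is_independent(table):
    """True if the joint table factorises into the product of its marginals."""
    m0, m1 = marginal(table, 0), marginal(table, 1)
    return all(table.get((a, b), Fraction(0)) == m0[a] * m1[b] for a in m0 for b in m1)

# Joint law of (X^2, Y^2).
squares = {}
for (i, j), q in p.items():
    squares[(i * i, j * j)] = squares.get((i * i, j * j), Fraction(0)) + q

print(is_independent(p), is_independent(squares))
```

The joint table of $(X, Y)$ does not factorise, whereas the table of $(X^2, Y^2)$ does (every cell equals $1/4$).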
7.7. The independence of random variables in terms of characteristic functions

If $X$ is a r.v. defined on a given probability space $(\Omega, \mathcal{F}, P)$, then the function $\phi(t) = E[e^{itX}]$, $t \in \mathbb{R}^1$, is called the characteristic function of $X$. Let $X_1$, $X_2$ be independent r.v.s and $\phi_1$, $\phi_2$ their characteristic functions (ch.f.s) respectively. Then the ch.f. $\phi$ of the sum $X_1 + X_2$ is $\phi_1 \phi_2$:

$$\phi(t) = \phi_1(t)\phi_2(t) \quad \text{for all } t \in \mathbb{R}^1. \tag{1}$$
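We can pose the converse question: if $\phi_1$, $\phi_2$ and $\phi$ are the ch.f.s of $X_1$, $X_2$ and $X_1 + X_2$ and (1) holds, does it follow that $X_1$ and $X_2$ are independent? Let us show that the answer to this question is negative.

(i) Let the random vector $(X_1, X_2)$ have density

$$f(x_1, x_2) = \begin{cases} \tfrac14[1 + x_1 x_2 (x_1^2 - x_2^2)], & \text{if } |x_1| \le 1 \text{ and } |x_2| \le 1 \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$

First, we find from (2) the marginal densities $f_1$ and $f_2$ of $X_1$ and $X_2$, namely

$$f_1(x_1) = \begin{cases} \tfrac12, & \text{if } |x_1| \le 1 \\ 0, & \text{otherwise}, \end{cases} \qquad f_2(x_2) = \begin{cases} \tfrac12, & \text{if } |x_2| \le 1 \\ 0, & \text{otherwise}. \end{cases}$$

Since $f(x_1, x_2) \ne f_1(x_1) f_2(x_2)$, the r.v.s $X_1$ and $X_2$ are not independent. Second, the variables $X_1$ and $X_2$ are identically distributed and for their ch.f.s

$$\phi_1(t) = \phi_2(t) = t^{-1} \sin t, \quad t \in \mathbb{R}^1.$$

Third, denote by $g$ the density of the sum $X_1 + X_2$. Then $g$ is expressed by $f$ from (2) as $g(x) = \int_{-\infty}^{\infty} f(x_1, x - x_1)\,dx_1$ and a direct integration yields

$$g(x) = \begin{cases} \tfrac14(2 + x), & \text{if } -2 \le x \le 0 \\ \tfrac14(2 - x), & \text{if } 0 < x \le 2 \\ 0, & \text{if } |x| > 2. \end{cases}$$

Having $g$, we find that the ch.f. $\phi$ of $X_1 + X_2$ is $\phi(t) = t^{-2} \sin^2 t$. Therefore $\phi(t) = \phi_1(t)\phi_2(t)$, that is, relation (1) is satisfied, but, as we saw above, the variables $X_1$ and $X_2$ are dependent.

(ii) Take $X_1 = X_2 = X$ where $X$ has a Cauchy distribution with density $1/[\pi(1 + x^2)]$, $x \in \mathbb{R}^1$. If $\phi_1$, $\phi_2$ and $\phi$ are the ch.f.s of $X_1$, $X_2$ and $X_1 + X_2$ respectively, we have $\phi_1(t) = \phi_2(t) = e^{-|t|}$, $\phi(t) = e^{-2|t|}$. Hence $\phi(t) = \phi_1(t)\phi_2(t)$ for all $t \in \mathbb{R}^1$, but clearly $X_1$ and $X_2$ are not independent.

Finally, let us recall that the r.v.s $X_1, \dots, X_n$ with ch.f.s $\phi_1, \dots, \phi_n$ are independent iff for all real $t_1, \dots, t_n$

$$E[\exp\{i(t_1 X_1 + \dots + t_n X_n)\}] = \phi_1(t_1) \cdots \phi_n(t_n).$$

Comparing (1) with this general condition enables us to explain the conclusions obtained in the examples above: (1) is only the special case $t_1 = t_2 = t$ of the general condition.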
7.8. The independence of random variables in terms of generating functions
If $X$ is a non-negative integer-valued r.v., then the function $p(z) = E[z^X]$ is called a probability generating function (p.g.f.). Recall that $p(z)$ is defined for all complex numbers $z$ with $|z| \le 1$. Further, if $X$ is an arbitrary r.v., then the function $M(z) = E[e^{zX}]$, $z$ complex, is called a moment generating function (m.g.f.) of $X$. More on p.g.f.s and m.g.f.s is included in Section 8. Here we are interested in expressing the independence property of r.v.s by the corresponding generating functions.

(i) Let $X$ and $Y$ be independent non-negative integer-valued r.v.s. Denote by $p_X$, $p_Y$ and $p_{X+Y}$ the probability generating functions of $X$, $Y$ and $X + Y$ respectively. Then

$$p_{X+Y}(z) = p_X(z)\,p_Y(z). \tag{1}$$
It is natural to ask the following question: if $X$ and $Y$ are non-negative, integer-valued r.v.s such that (1) is satisfied, does it follow that $X$ and $Y$ are independent? We show by an example that in general the answer is negative. Let $\xi$ and $\eta$ be independent r.v.s such that $\xi$ takes the values 0, 1 and 2 with probability $\tfrac13$ each, and $\eta$ takes the values 0 and 1 with probabilities $\tfrac13$ and $\tfrac23$ respectively. Define $X = \xi$ and $Y = \xi + \eta$ (mod 3). Then $Y$ takes the values 0, 1 and 2 with probability $\tfrac13$ each. Further, the sum $X + Y$ takes the values 0, 1, 2, 3 and 4 with probabilities $\tfrac19$, $\tfrac29$, $\tfrac39$, $\tfrac29$ and $\tfrac19$ respectively. Obviously relation (1) is satisfied for the p.g.f.s of $X$, $Y$ and $X + Y$. However, the variables $X$ and $Y$ are not independent; they are functionally dependent. In addition, we can show that $X$ and $Y$ are uncorrelated (for this property see Examples 7.10 and 7.11 below).
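Since a p.g.f. determines the distribution, relation (1) is equivalent to saying that the law of $X + Y$ equals the convolution of the marginal laws. The sketch below checks this for case (i), assuming $\xi$ is uniform on {0, 1, 2} and $\eta$ takes the values 0 and 1 with probabilities 1/3 and 2/3; the helper names are illustrative.

```python
from fractions import Fraction

xi_law = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
eta_law = {0: Fraction(1, 3), 1: Fraction(2, 3)}   # assumed probabilities for eta

# Joint law of (X, Y) = (xi, (xi + eta) mod 3).
joint = {}
for a, pa in xi_law.items():
    for b, pb in eta_law.items():
        key = (a, (a + b) % 3)
        joint[key] = joint.get(key, Fraction(0)) + pa * pb

def marginal(table, axis):
    out = {}
    for key, q in table.items():
        out[key[axis]] = out.get(key[axis], Fraction(0)) + q
    return out

def law_of_sum(table):
    out = {}
    for (i, j), q in table.items():
        out[i + j] = out.get(i + j, Fraction(0)) + q
    return out

def convolve(m0, m1):
    """Law of the sum of two independent variables with laws m0 and m1."""
    return law_of_sum({(i, j): m0[i] * m1[j] for i in m0 for j in m1})

mx, my = marginal(joint, 0), marginal(joint, 1)
exy = sum(i * j * q for (i, j), q in joint.items())
cov = exy - sum(i * q for i, q in mx.items()) * sum(j * q for j, q in my.items())
print(law_of_sum(joint) == convolve(mx, my), joint.get((0, 2), 0) == mx[0] * my[2], cov)
```

The convolution identity holds and the covariance is zero, yet the cell $P[X = 0, Y = 2] = 0$ differs from the product of the marginals.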
(ii) If $X$ and $Y$ are arbitrary r.v.s which are independent and $M_X$, $M_Y$ and $M_{X+Y}$ are the m.g.f.s of $X$, $Y$ and $X + Y$ respectively, then

$$M_{X+Y}(z) = M_X(z)\,M_Y(z). \tag{2}$$

As in case (i) we want to know if (2) implies the independence of $X$ and $Y$. The answer will follow from the example below. Let $(X, Y)$ be a two-dimensional random vector defined by the table of probabilities $P[X = i, Y = j]$:

           Y = 1    Y = 2    Y = 3
  X = 1    2/18     1/18     3/18
  X = 2    3/18     2/18     1/18
  X = 3    1/18     3/18     2/18
We can easily find that $X$ and $Y$ are identically distributed r.v.s taking each of the values 1, 2, 3 with probability $\tfrac13$. The sum $Z = X + Y$ is a r.v. taking the values 2, 3, 4, 5, 6 with probabilities $\tfrac19$, $\tfrac29$, $\tfrac39$, $\tfrac29$, $\tfrac19$ respectively. Since $X$, $Y$ and $X + Y$ are non-negative and integer-valued, we can study their properties in terms of the p.g.f.s. But in all cases we can use m.g.f.s. Thus for the m.g.f.s we get

$$M_X(z) = E[e^{zX}] = M_Y(z) = E[e^{zY}] = \tfrac13(e^{z} + e^{2z} + e^{3z}),$$
$$M_Z(z) = M_{X+Y}(z) = \tfrac19(e^{2z} + 2e^{3z} + 3e^{4z} + 2e^{5z} + e^{6z}).$$

Clearly $M_{X+Y}(z) = M_X(z)M_Y(z)$, i.e. relation (2) is satisfied. However, the r.v.s $X$ and $Y$ are not independent, as can be seen easily from the table above: $P[X = i, Y = j] \ne P[X = i]P[Y = j]$ for all $i \ne j$.

Finally, let us comment on both cases (i) and (ii). The independence of two (or more) r.v.s can be expressed in terms of the p.g.f.s or the m.g.f.s. Let us illustrate this for two variables. If $(X_1, X_2)$ is a random vector whose components $X_1$ and $X_2$ are non-negative integer-valued r.v.s, then its p.g.f., say $p(z_1, z_2)$, is defined as

$$p(z_1, z_2) = E[z_1^{X_1} z_2^{X_2}], \quad |z_1| \le 1,\ |z_2| \le 1.$$

Denote by $p_1(z_1)$ the p.g.f. of $X_1$ and $p_2(z_2)$ the p.g.f. of $X_2$. Then $X_1$ and $X_2$ are independent iff $p(z_1, z_2) = p_1(z_1)p_2(z_2)$ for all $z_1$ and $z_2$. For $z_1 = z_2 = z$, the function $p(z, z) = E[z^{X_1 + X_2}]$ is the p.g.f. of the sum $X_1 + X_2$, in which case $p(z, z) = p_1(z)p_2(z)$. This is exactly case (i) above where we do not have $p(z_1, z_2) = p_1(z_1)p_2(z_2)$ for all $z_1$, $z_2$, i.e. we do not have independent $X_1$ and $X_2$. For an arbitrary random vector $(X_1, X_2)$ the m.g.f. is defined by

$$M(z_1, z_2) = E[\exp(z_1 X_1 + z_2 X_2)], \quad |z_1| \le r,\ |z_2| \le r, \text{ given } r > 0.$$

Denote by $M_1(z_1)$ and $M_2(z_2)$ the m.g.f.s of $X_1$ and $X_2$ respectively. Then $X_1$ and $X_2$ are independent iff $M(z_1, z_2) = M_1(z_1)M_2(z_2)$ for all $z_1$, $z_2$. If we take $z_1 = z$, $z_2 = z$ we get the function $M(z, z)$ which is the m.g.f. of the sum $X_1 + X_2$. Obviously in this case $M(z, z) = M_1(z)M_2(z)$. We met this equality in case (ii) above. However, in this case $M(z_1, z_2) = M_1(z_1)M_2(z_2)$ does not hold for all $z_1$ and $z_2$. This explains why $X_1$ and $X_2$ are not independent.
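The table of case (ii) can be checked in the same spirit: equality of the m.g.f.s in (2) is equivalent to the law of $X + Y$ being the convolution of the marginals. The sketch below assumes the table entries, in eighteenths, are 2, 1, 3 / 3, 2, 1 / 1, 3, 2 (rows indexed by $X$, columns by $Y$); the conclusions are unchanged if the table is transposed.

```python
from fractions import Fraction

numerators = {1: (2, 1, 3), 2: (3, 2, 1), 3: (1, 3, 2)}   # assumed reading of the table
joint = {(i, j): Fraction(numerators[i][j - 1], 18) for i in (1, 2, 3) for j in (1, 2, 3)}

def marginal(table, axis):
    out = {}
    for key, q in table.items():
        out[key[axis]] = out.get(key[axis], Fraction(0)) + q
    return out

def law_of_sum(table):
    out = {}
    for (i, j), q in table.items():
        out[i + j] = out.get(i + j, Fraction(0)) + q
    return out

mx, my = marginal(joint, 0), marginal(joint, 1)
product_law = {(i, j): mx[i] * my[j] for i in mx for j in my}
off_diagonal_differs = all(joint[(i, j)] != product_law[(i, j)]
                           for i in mx for j in my if i != j)
print(law_of_sum(joint) == law_of_sum(product_law), off_diagonal_differs)
```

The sum has the same law as the sum of independent copies, while every off-diagonal cell of the table differs from the product of the marginals.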
7.9. The distribution of a sum can be expressed by the convolution even if the variables are dependent

If $X_1$ and $X_2$ are r.v.s with d.f.s $F_1$ and $F_2$ respectively, and $X_1$, $X_2$ are independent, the distribution of the sum $X_1 + X_2$ is $F_1 * F_2$. If $X_1$ and $X_2$ are absolutely continuous with densities $f_1$ and $f_2$ respectively, then the density of $X_1 + X_2$ is $f_1 * f_2$.

Now we are interested in the converse: what is the connection between the r.v.s $X_1$ and $X_2$ if we know that the sum $X_1 + X_2$ has distribution $F_1 * F_2$ or density $f_1 * f_2$? The answer will follow from an example based on the Cauchy distribution. Let $f_a(x) = a/[\pi(a^2 + x^2)]$, $x \in \mathbb{R}^1$, be the density of a Cauchy distribution, where $a > 0$. It is easy to check, for example by using ch.f.s, that the family of Cauchy densities is closed under convolutions. Consider two independent r.v.s $\xi$ and $\eta$ each with density $f_a$. Let $X = \alpha\xi + \beta\eta$, $Y = \gamma\xi + \delta\eta$ where $\alpha, \beta, \gamma, \delta$ are arbitrary positive real numbers. Then the sum $X + Y$ has density $f_{(\alpha+\beta+\gamma+\delta)a}$, which is the convolution of the densities $f_{(\alpha+\beta)a}$ of $X$ and $f_{(\gamma+\delta)a}$ of $Y$. Nevertheless, $X$ and $Y$ are not independent.
7.10. Discrete random variables which are uncorrelated but not independent

It is a well known result that if $X$ and $Y$ are integrable and independent r.v.s, they are uncorrelated. The property of uncorrelatedness is weaker than independence. This will be demonstrated by a few examples. Here we consider discrete r.v.s; the absolutely continuous case is treated in Example 7.11.

(i) Let $X$ and $Y$ be r.v.s such that $p_{i,j} = P[X = i, Y = j]$ are given by

$$p_{1,1} = p_{-1,1} = p_{1,-1} = p_{-1,-1} = \tfrac14\varepsilon, \qquad p_{0,1} = p_{0,-1} = p_{1,0} = p_{-1,0} = \tfrac14(1 - \varepsilon)$$

where $0 < \varepsilon < 1$. It is easy to find the marginal distributions of $X$, $Y$ and compute that $EX = 0$, $EY = 0$. Moreover, we also find that $E[XY] = 0$ and hence the variables $X$ and $Y$ are uncorrelated. However,

$$P[X = 0, Y = 0] = 0 \ne P[X = 0]\,P[Y = 0] = \tfrac14(1 - \varepsilon)^2$$

and thus $X$ and $Y$ are not independent.
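(ii) Let $\Omega = \{1, 2, 3\}$ and let each $\omega \in \Omega$ have probability $\tfrac13$. Define two r.v.s $X$ and $Y$ by

$$X(\omega) = \begin{cases} 1, & \text{if } \omega = 1 \\ 0, & \text{if } \omega = 2 \\ -1, & \text{if } \omega = 3, \end{cases} \qquad Y(\omega) = \begin{cases} 0, & \text{if } \omega = 1 \\ 1, & \text{if } \omega = 2 \\ 0, & \text{if } \omega = 3. \end{cases}$$

Then $EX = 0$, $E[XY] = 0$, so $X$ and $Y$ are uncorrelated. But

$$P[X = 1, Y = 1] = 0 \ne \tfrac13 \cdot \tfrac13 = P[X = 1]\,P[Y = 1]$$

and therefore $X$ and $Y$ are not independent.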
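Case (ii) lives on a three-point space, so both properties can be computed directly. The sketch below assumes the values $X = (1, 0, -1)$ and $Y = (0, 1, 0)$ on $\omega = 1, 2, 3$, each point having probability 1/3; the helper names `expect` and `prob` are illustrative.

```python
from fractions import Fraction

omega = (1, 2, 3)
weight = Fraction(1, 3)              # each sample point has probability 1/3
X = {1: 1, 2: 0, 3: -1}              # assumed values of X on omega
Y = {1: 0, 2: 1, 3: 0}               # assumed values of Y on omega

def expect(f):
    """Expectation of f(w) under the uniform law on omega."""
    return sum(weight * f(w) for w in omega)

def prob(event):
    """Probability of the set of sample points satisfying event."""
    return sum((weight for w in omega if event(w)), Fraction(0))

cov = expect(lambda w: X[w] * Y[w]) - expect(lambda w: X[w]) * expect(lambda w: Y[w])
joint_11 = prob(lambda w: X[w] == 1 and Y[w] == 1)
product_11 = prob(lambda w: X[w] == 1) * prob(lambda w: Y[w] == 1)
print(cov, joint_11, product_11)
```

The covariance vanishes, while the joint probability of $\{X = 1, Y = 1\}$ is 0 and the product of the marginal probabilities is 1/9.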
(iii) Let $X$ and $Y$ be r.v.s each taking the values $-1$, 0, 1. The joint probability $p_{i,j} = P[X = i, Y = j]$ is given by

$$p_{1,0} = p_{-1,0} = p_{0,1} = p_{0,-1} = \tfrac14.$$

Then obviously $EX = 0$, $EY = 0$ and $E[XY] = 0$. Thus $X$ and $Y$ are uncorrelated. Further,

$$P[X = 1] = P[X = -1] = \tfrac14, \quad P[X = 0] = \tfrac12, \quad P[Y = 1] = P[Y = -1] = \tfrac14, \quad P[Y = 0] = \tfrac12$$

and clearly the relation $P[X = i, Y = j] = P[X = i]P[Y = j]$ is not valid for all pairs $(i, j)$. So the variables $X$ and $Y$ are dependent.

(iv) Let $\xi$ be a r.v. taking the values 0, $\tfrac12\pi$ and $\pi$ with probability $\tfrac13$ each. Then it is easy to see that $X = \sin\xi$ and $Y = \cos\xi$ are uncorrelated. However, they are not independent. Moreover, $X$ and $Y$ are functionally connected: $X^2 + Y^2 = 1$.

7.11. Absolutely continuous random variables which are uncorrelated but not independent
(i) Let $X_1$ and $X_2$ have a joint probability density function $f$ where $f(x_1, x_2) = \pi^{-1}$ if $x_1^2 + x_2^2 \le 1$ and $f(x_1, x_2) = 0$ otherwise (uniform distribution on the unit disk). By symmetry $EX_1 = EX_2 = 0$, and a simple computation shows that $E[X_1X_2] = 0$. Thus the variables $X_1$ and $X_2$ are uncorrelated. It is very easy to find the marginal densities $f_1$ and $f_2$ and see that $f(x_1, x_2) \ne f_1(x_1)f_2(x_2)$. This means that $X_1$ and $X_2$ are not independent.
(ii) If $\xi$ is uniformly distributed on the interval $(0, 2\pi)$ then $X_1 = \sin\xi$ and $X_2 = \cos\xi$ satisfy the relations:
$$EX_1 = 0, \qquad EX_2 = 0, \qquad E[X_1X_2] = 0.$$
Therefore $X_1$ and $X_2$ are uncorrelated but not independent. They are functionally dependent: $X_1^2 + X_2^2 = 1$. (See case (iv) of Example 7.10.)
(iii) Recall that the r.v. $X$ is said to be normally distributed with parameters $a$ and $\sigma^2$, where $a \in \mathbb{R}^1$, $\sigma^2 > 0$, if $X$ is absolutely continuous and has a density $(\sqrt{2\pi}\,\sigma)^{-1}\exp[-\tfrac{1}{2}(x - a)^2/\sigma^2]$, $x \in \mathbb{R}^1$. In such a case we use the following standard notation: $X \sim N(a, \sigma^2)$. Several properties of the normal distribution will be discussed further in Section 10. Let $X_1 \sim N(0, 1)$ and $X_2 = X_1^2 - 1$. Then $EX_2 = 0$, $E[X_1X_2] = 0$. Hence $X_1$ and $X_2$ are uncorrelated. However they are functionally dependent.
7.12.
Independent random variables have zero correlation ratio, but the converse is not true
Let $X$ and $Y$ be r.v.s such that $EY$ and $VY$ exist. As usual, $E[Y|X]$ denotes the conditional expectation of $Y$ given $X$. The quantity
$$K_X(Y) = V[E(Y|X)]/VY$$
is called a correlation ratio of $Y$ with respect to $X$. Obviously $0 \le K_X(Y) \le 1$ and $K_X(Y)$ is defined for $Y$ with $VY > 0$ (see Rényi 1970). Note that $K_X(Y)$ gives us information about the mutual dependence of $X$ and $Y$. Obviously, if $X$ and $Y$ are independent and $0 < VY < \infty$ then $K_X(Y) = 0$, but not conversely. To see this, take $(X, Y)$ to be uniformly distributed on the unit disk $x^2 + y^2 < 1$. Let $g(y|x)$ be the conditional density of $Y$ given $X = x$. We have
$$g(y|x) = \begin{cases} \dfrac{1}{2\sqrt{1 - x^2}}, & \text{if } |y| < \sqrt{1 - x^2} \\ 0, & \text{otherwise}, \end{cases} \qquad |x| < 1.$$
Hence $E[Y|X] = 0$ and consequently $K_X(Y) = 0$, though the variables $X$ and $Y$ are not independent.
7.13.
The relation E[Y|X] = EY almost surely does not imply that the random variables X and Y are independent
If $X$ and $Y$ are independent r.v.s on a given probability space and $Y$ is integrable, then $E[Y|X] = EY$ a.s. Now the question is whether or not the converse is true. Let $Z$ be any integrable r.v. which is distributed symmetrically with respect to zero, and let $X$ be a r.v. independent of $Z$ and such that $X \ge 1$ a.s. (we assume that neither $X$ nor $Z$ is degenerate). Let $Y = Z/X$. Then $Y$ is integrable and the conditional expectation $E[Y|X]$ is well defined. Obviously we have
$$EZ = 0, \qquad EY = 0, \qquad E[Y|X] = 0.$$
Therefore the relation $E[Y|X] = EY$ a.s. is satisfied but the variables $X$ and $Y$ are dependent.
7.14.
There is no relationship between the notions of independence and conditional independence
Intuitively the notions of independence and conditional independence are close to each other (see the introductory notes to this section). By a few examples we can show that neither of them implies the other one. (Also see Example 3.6.)
(i) Let $X_n$, $n \ge 1$, be independent Bernoulli r.v.s, that is, $X_n$ are i.i.d. and each takes two values, 1 and 0, with probabilities $p$ and $1 - p$ respectively, where $0 < p < 1$. As usual, let $S_n = X_1 + \cdots + X_n$. Then on the event $[S_2 = 1]$ we have $P[X_1 = 1|S_2] > 0$ and $P[X_2 = 1|S_2] > 0$, whereas $P[X_1 = 1, X_2 = 1|S_2] = 0$; that is, the equality
$$P[X_1 = 1, X_2 = 1|S_2] = P[X_1 = 1|S_2]\,P[X_2 = 1|S_2]$$
is not satisfied. Therefore the independence property of r.v.s can be lost under conditioning.
(ii) Let $X_n$, $n \ge 1$, be independent integer-valued r.v.s and $S_n = X_1 + \cdots + X_n$. Then clearly the r.v.s $S_n$, $n \ge 1$, are dependent. However, given that the event $[S_2 = k]$ has a positive probability and occurs, we can easily show that
$$P[S_1 = i, S_3 = j|S_2] = \frac{P[S_1 = i, S_2 = k, S_3 = j]}{P[S_2 = k]} = \frac{P[S_1 = i]\,P[X_2 = k - i]\,P[X_3 = j - k]}{P[S_2 = k]}$$
$$= P[S_1 = i|S_2]\,\frac{P[X_3 = j - k]\,P[S_2 = k]}{P[S_2 = k]} = P[S_1 = i|S_2]\,P[S_3 = j|S_2].$$
Therefore there are dependent r.v.s which are conditionally independent.
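The computation in case (ii) can be reproduced exactly for a small example. The sketch below assumes a hypothetical common step distribution on $\{0, 1, 2\}$ (the text allows any integer-valued independent steps) and enumerates the joint law of $(S_1, S_2, S_3)$.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# Hypothetical common distribution of the integer-valued steps X1, X2, X3.
step = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

# Joint pmf of the partial sums (S1, S2, S3).
joint = defaultdict(Fraction)
for x1, x2, x3 in product(step, repeat=3):
    joint[(x1, x1 + x2, x1 + x2 + x3)] += step[x1] * step[x2] * step[x3]

def prob(pred):
    """Probability of the event described by the predicate pred(s1, s2, s3)."""
    return sum(p for s, p in joint.items() if pred(s))

def cond_indep_given_s2(i, k, j):
    """Check P[S1=i, S3=j | S2=k] = P[S1=i | S2=k] * P[S3=j | S2=k]."""
    pk = prob(lambda s: s[1] == k)
    lhs = prob(lambda s: s == (i, k, j)) / pk
    rhs = (prob(lambda s: s[0] == i and s[1] == k) / pk) * (
        prob(lambda s: s[1] == k and s[2] == j) / pk
    )
    return lhs == rhs

# S2 ranges over 0..4, each value having positive probability.
all_ok = all(
    cond_indep_given_s2(i, k, j)
    for i in range(3) for k in range(5) for j in range(7)
)

# Unconditionally, S1 and S3 are dependent.
dependent = prob(lambda s: s[0] == 2 and s[2] == 0) != (
    prob(lambda s: s[0] == 2) * prob(lambda s: s[2] == 0)
)
```

Both `all_ok` and `dependent` evaluate to true: $S_1$ and $S_3$ are conditionally independent given $S_2$, yet dependent.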
(iii) Consider three r. v.s, X, Y and Z, with the following joint distribution:
$$P[X = k, Y = m, Z = n] = p^3 q^{n-3}$$
where $0 < p < 1$, $q = 1 - p$, $k = 1, \ldots, m - 1$, $m = 2, \ldots, n - 1$, $n = 3, 4, \ldots$. Firstly, we can easily find the distributions of the pairs $(X, Y)$, $(X, Z)$ and $(Y, Z)$, then the marginal distribution of each of $X$, $Y$ and $Z$, and see in particular that the r.v.s $Z$ and $X$ are dependent. Further, we have
$$P[X = k, Y = m] = p^2 q^{m-2}, \qquad k = 1, \ldots, m - 1, \quad m = 2, 3, \ldots,$$
$$P[Z = n|X = k, Y = m] = p q^{n-m-1}, \qquad k = 1, \ldots, m - 1, \quad m = 2, \ldots, n - 1.$$
Hence for $k = 1, \ldots, m - 1$ and $m = 2, 3, \ldots$ we can obtain that
$$E[Z|X = k, Y = m] = \sum_{n=m+1}^{\infty} n\,p q^{n-m-1} = m + \frac{1}{p}$$
and write the relation
$$E[Z|X, Y] = Y + \frac{1}{p} \quad \text{a.s.}$$
Moreover, for any measurable and bounded function $g$,
$$E[g(Z)|X = k, Y = m] = \sum_{j=1}^{\infty} g(j + m)\,p q^{j-1}$$
so that
$$E[g(Z)|X, Y] = \sum_{j=1}^{\infty} g(Y + j)\,p q^{j-1} \quad \text{a.s.}$$
Obviously the right-hand side of the last equality does not depend on X, which means that Z is conditionally independent of X given Y despite the fact (mentioned above) that Z and X are dependent r.v.s.
7.15.
Mutual independence implies the exchangeability of any set of random variables, but not conversely
Let $X_1, \ldots, X_n$ be i.i.d. r.v.s. Clearly for any permutation $(i_1, \ldots, i_n)$ of $(1, \ldots, n)$, the random vectors $(X_1, \ldots, X_n)$ and $(X_{i_1}, \ldots, X_{i_n})$ have the same distribution. Thus $X_1, \ldots, X_n$ is a set of exchangeable variables. However, the converse is not generally true and this is illustrated by the following examples.
(i) Let $\theta$ be an arbitrary r.v. with values in the interval $(0, 1)$. Let $Y_1, Y_2, \ldots$ be r.v.s which, conditionally on $\theta$, are independent and take the values 1 and 0 with probabilities $\theta$ and $1 - \theta$ respectively. Then for any sequence $u_1, \ldots, u_n$ of 0s and 1s, we have
$$P[Y_1 = u_1, \ldots, Y_n = u_n|\theta] = \theta^k(1 - \theta)^{n-k} \qquad (1)$$
where $k = u_1 + \cdots + u_n$ and $n$ is an arbitrary natural number. We are interested in the properties of the set of r.v.s $Y_1, \ldots, Y_n$. Taking the expectation of both sides of (1) we find that the probability
$$P[Y_1 = u_1, \ldots, Y_n = u_n] = E[P(Y_1 = u_1, \ldots, Y_n = u_n|\theta)] = E[\theta^k(1 - \theta)^{n-k}]$$
depends only on the sum $u_1 + \cdots + u_n$, which is $k$, and on $n$ of course. Therefore $Y_1, \ldots, Y_n$, for any $n$, is a set of exchangeable variables. However, $Y_1, \ldots, Y_n$ are not mutually independent. Indeed, $P[Y_j = 1] = E\theta$ for each $j \ge 1$. Further, (1) implies that $P[Y_1 = 1, \ldots, Y_n = 1] = E[\theta^n]$. On the other hand $\prod_{j=1}^{n} P[Y_j = 1] = (E\theta)^n$. But $\theta$ is an arbitrary r.v. with values in $(0, 1)$. If, for example, $\theta$ is uniformly distributed on $(0, 1)$, then $(E\theta)^n = (\tfrac{1}{2})^n \ne 1/(n + 1) = E[\theta^n]$ for $n \ge 2$. This justifies our statement that $Y_1, \ldots, Y_n$ are not mutually independent.
(ii) Suppose that an urn containing balls of two colours, say $w$ white and $b$ black, is used, and after each draw the chosen ball is returned, together with $s$ balls of the same colour. Introduce the r.v.s $Y_1, \ldots, Y_n$ such that
$$Y_i = \begin{cases} 1, & \text{if the } i\text{th draw is black} \\ 0, & \text{if the } i\text{th draw is white}. \end{cases}$$
It can be shown that the variables $Y_1, \ldots, Y_n$ are not independent but they are exchangeable. The last statement follows from the fact that $P[\bigcap_{i=1}^{n}(Y_i = y_i)]$ depends only on the sum $\sum_{i=1}^{n} y_i$ (for details we refer the reader to Johnson and Kotz (1977)).
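For a small urn the exchangeability and the dependence in case (ii) can be confirmed by exact enumeration. The sketch below uses illustrative parameter values ($w = 2$, $b = 3$, $s = 1$, $n = 4$), which are assumptions and not taken from the text.

```python
from fractions import Fraction
from itertools import product

def sequence_prob(ys, w, b, s):
    """Exact probability of the colour sequence ys (1 = black, 0 = white)
    when each drawn ball is returned together with s balls of its colour."""
    prob, white, black = Fraction(1), w, b
    for y in ys:
        total = white + black
        if y == 1:
            prob *= Fraction(black, total)
            black += s
        else:
            prob *= Fraction(white, total)
            white += s
    return prob

w, b, s, n = 2, 3, 1, 4  # assumed illustrative urn parameters
probs = {ys: sequence_prob(ys, w, b, s) for ys in product((0, 1), repeat=n)}

# Exchangeability: the probability depends on ys only through sum(ys).
by_sum = {}
for ys, p in probs.items():
    by_sum.setdefault(sum(ys), set()).add(p)
exchangeable = all(len(v) == 1 for v in by_sum.values())

# Dependence: compare P[Y1 = 1, Y2 = 1] with P[Y1 = 1] * P[Y2 = 1].
p1 = sum(p for ys, p in probs.items() if ys[0] == 1)
p2 = sum(p for ys, p in probs.items() if ys[1] == 1)
p12 = sum(p for ys, p in probs.items() if ys[0] == 1 and ys[1] == 1)
```

With these values `p1` and `p2` both equal 3/5, while `p12` equals 2/5 rather than 9/25.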
7.16.
Different kinds of monotone dependence between random variables
Recall that the r.v. $Y$ is said to be completely dependent on the r.v. $X$ if there exists a function $g$ such that $P[Y = g(X)] = 1$. Another measure of dependence between two non-degenerate r.v.s $X$ and $Y$ is that of sup correlation, defined by
$$\bar\rho(X, Y) = \sup \rho(f(X), g(Y))$$
where the supremum is taken over all measurable $f$ and $g$ such that $0 < V[f(X)] < \infty$, $0 < V[g(Y)] < \infty$ and $\rho$ is the ordinary correlation coefficient. Let $X$ and $Y$ be absolutely continuous r.v.s. They are called monotone dependent if there exists a monotone function $g$ for which $P[Y = g(X)] = 1$. The quantity
$$\rho^*(X, Y) = \sup \rho(f(X), g(Y))$$
where the supremum is taken over all monotone functions $f$ and $g$ with $0 < V[f(X)] < \infty$ and $0 < V[g(Y)] < \infty$, is said to be a monotone correlation. Let us try to compare these kinds of monotone dependence. It is clear that if $X$ and $Y$ are monotone dependent, then their monotone correlation is 1. However, the converse statement is false. Indeed, let $(X, Y)$ have a uniform distribution over the region $[(0, 1) \times (0, 1)] \cup [(1, 2) \times (1, 2)]$. Then
$$\rho^*(X, Y) \ge \rho(I_{(0,1)}(X), I_{(0,1)}(Y)) = 1$$
but $X$ and $Y$ are not monotone dependent. Further, it is obvious that
$$|\rho(X, Y)| \le \rho^*(X, Y) \le \bar\rho(X, Y). \qquad (1)$$
For a bivariate normally distributed $(X, Y)$, it is well known that $|\rho(X, Y)| = \bar\rho(X, Y)$, and in this case we should have equalities in (1). On the other hand, it can easily be seen that in general $\rho^*$ is not equal to $\bar\rho$. Indeed, take $(X, Y)$ with a uniform distribution on the region
$$[(0, 1) \times (0, 1)] \cup [(0, 1) \times (2, 3)] \cup [(1, 2) \times (1, 2)] \cup [(2, 3) \times (2, 3)].$$
Let $f(x) = I_{(0,1)}(x) + I_{(2,3)}(x)$. Then $\rho^*(X, Y) < 1$, but
$$\bar\rho(X, Y) \ge \rho(f(X), f(Y)) = 1.$$
SECTION 8. CHARACTERISTIC AND GENERATING FUNCTIONS
Let $X$ be a r.v. defined on the probability space $(\Omega, \mathcal{F}, P)$. The function
$$\phi(t) = E[e^{itX}], \qquad t \in \mathbb{R}^1, \quad i = \sqrt{-1} \qquad (1)$$
is said to be a characteristic function (ch.f.) of $X$. If $F(x)$, $x \in \mathbb{R}^1$, is the d.f. of $X$ then $\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$. Thus $\phi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx$ if $X$ is absolutely continuous with density $f$ and $\phi(t) = \sum_n e^{itx_n} p_n$ if $X$ is discrete with $P[X = x_n] = p_n$, $p_n > 0$, $\sum_n p_n = 1$. Recall some of the basic properties of a ch.f.
(i) $\phi(0) = 1$, $\phi(-t) = \overline{\phi(t)}$, $|\phi(t)| \le 1$, $t \in \mathbb{R}^1$.
(ii) If $E[X^n]$ exists, then $\phi^{(n)}(0)$ exists and $E[X^n] = i^{-n}\phi^{(n)}(0)$.
(iii) If $\phi^{(n)}(0)$ exists and $n$ is even, then $E[X^n]$ exists; if $n$ is odd, then $E[X^{n-1}]$ exists.
(iv) If $E[X^n]$ exists (and hence $E[X^k]$ exists for $k < n$) then
$$\phi(t) = \sum_{k=0}^{n} (k!)^{-1}(it)^k E[X^k] + o(t^n)$$
in the neighbourhood of the origin.
(v) $\phi(t)$, $t \in \mathbb{R}^1$, is a ch.f. iff $\phi$ is continuous, $\phi(0) = 1$ and $\phi$ is positive definite.
(vi) If $X_1$ and $X_2$ are r.v.s with ch.f.s $\phi_1$ and $\phi_2$, and $X_1$ and $X_2$ are independent, then the ch.f. $\phi$ of the sum $X_1 + X_2$ is given by
$$\phi(t) = \phi_1(t)\phi_2(t), \qquad t \in \mathbb{R}^1.$$
(vii) If we know the ch.f. $\phi$ of a r.v. $X$ then we can find the d.f. $F$ of $X$ by the so-called inversion formula and, moreover, if $\phi$ is absolutely integrable over $\mathbb{R}^1$ then $X$ is absolutely continuous and its density is the inverse Fourier transform of $\phi$.
Let us introduce two other functions which, like the ch.f. $\phi$, are essentially used in probability theory. For an arbitrary r.v. $X$ with a d.f. $F$ denote
$$M(z) = E[e^{zX}] = \int e^{zx}\,dF(x), \qquad z \text{ a complex number}. \qquad (2)$$
Suppose for some real $r > 0$ the function $M(z)$ is well defined for all $z$, $|z| < r$. In such a case $M$ is called a moment generating function (m.g.f.) of $X$ and also of $F$. The relationship between the m.g.f. $M$ and the ch.f. $\phi$, see (1), is obvious: $M(it) = \phi(t)$ for real $t$. If $X$ is a non-negative integer-valued r.v. we can introduce the function
$$p(z) = E[z^X], \qquad z \text{ complex} \qquad (3)$$
which is called a probability generating function (p.g.f.) of $X$. (Note that the m.g.f. and p.g.f. were briefly introduced in Example 7.8 and used to analyse the independence property.) Some of the properties of $\phi$ listed above can be reformulated for the generating functions $M$ and $p$. However, note that the ch.f. of a distribution always exists while the m.g.f. need not always exist (excluding the trivial case when $t = 0$).
The ch.f. $\phi$ is called analytic if there is a number $r > 0$ such that $\phi$ can be represented by a convergent power series in the interval $(-r, r)$, that is if $\phi(t) = \sum_{k=0}^{\infty} a_k t^k/k!$, $t \in (-r, r)$, with some complex coefficients $a_k$. The following important result is often used (see Lukacs 1970; Chow and Teicher 1978). If $F$ and $\phi$ are a pair of a d.f. and a ch.f., then the following conditions are equivalent:
(a) $\phi$ is $r$-analytic;
(b) the moments $\alpha_k = \int x^k\,dF(x)$, $k \ge 1$, are finite and $\phi$ admits the representation $\phi(t) = \sum_{k=0}^{\infty} \alpha_k (it)^k/k!$, $t \in (-r, r)$;
(c) $\int e^{t|x|}\,dF(x) < \infty$, $0 \le t < r$.
A ch.f. $\phi$ is called decomposable if it can be written as a product $\phi(t) = \phi_1(t)\phi_2(t)$, $t \in \mathbb{R}^1$, where $\phi_1$ and $\phi_2$ are both ch.f.s of non-degenerate distributions. If $\phi$ admits only a trivial product representation (that is, if $\phi_1$ or $\phi_2$ is of the form $e^{iat}$, $a$ = constant), it is called indecomposable. We refer the reader to the books by Lukacs (1970), Ramachandran (1967), Feller (1971), Chow and Teicher (1978), Rao (1984), Shiryaev (1995) and Bauer (1996) where the theory of characteristic functions and related topics can be found in detail. In this section we have included various counterexamples which explain the meaning of some of the properties of ch.f.s and of generating functions.
8.1.
Different characteristic functions which coincide on a finite interval but not on the whole real line
Suppose $\phi_1$, $\phi_2$ are ch.f.s such that $\phi_1(t) = \phi_2(t)$ for $t \in [-l, l]$ where $l$ is an arbitrary positive number. Does it then follow that $\phi_1(t)$ coincides with $\phi_2(t)$ for all $t \in \mathbb{R}^1$? This important problem was considered and solved by Gnedenko (1937). Let us present his solution. Consider the function $h(x) = 0$ if $|x| > \pi/2$ and $h(x) = x$ if $|x| \le \pi/2$. If $c(t) = \int_{-\infty}^{\infty} h(x)h(x + t)\,dx$, then the ratio $\phi_1(t) = c(t)/c(0)$ is a ch.f. An easy calculation shows that
$$\phi_1(t) = \begin{cases} 1 + 3\pi^{-1}t - 2\pi^{-3}t^3, & \text{if } -\pi \le t \le 0 \\ 1 - 3\pi^{-1}t + 2\pi^{-3}t^3, & \text{if } 0 \le t \le \pi \\ 0, & \text{if } |t| > \pi. \end{cases}$$
Now introduce another function, say $\phi_2$, as follows:
$$\phi_2(t) = \phi_1(t) \ \text{ if } |t| \le \pi, \qquad \phi_2(t + 2\pi) = \phi_2(t) \ \text{ if } t \in \mathbb{R}^1.$$
Let us show that $\phi_2$ is a ch.f. Obviously $\phi_2$ is an even function with the Fourier expansion
$$\tfrac{1}{2}a_0 + \sum_{n=1}^{\infty} a_n \cos nt. \qquad (1)$$
A standard calculation shows that
$$a_0 = 0, \qquad a_n = \frac{6[1 + (-1)^n]}{\pi^2 n^2} + \frac{24[1 - (-1)^n]}{\pi^4 n^4}, \quad n \ge 1.$$
Thus the series (1) converges uniformly, its coefficients are non-negative and their sum equals $\phi_2(0) = 1$. Hence $\phi_2(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$ for some d.f. $F$. This means that $\phi_2$ is a ch.f. Therefore we have that $\phi_2(t) = \phi_1(t)$ for $t \in [-\pi, \pi]$ but not for all $t \in \mathbb{R}^1$. In a similar way we can construct two ch.f.s $\phi_1$ and $\phi_2$ which coincide on the interval $[-l, l]$ for an arbitrarily large $l$ but not for all $t \in \mathbb{R}^1$. Note finally that at the end of Gnedenko's paper we can find a very important remark made by A. Ya. Khintchine concerning the above result. Let $F_1$ and $F_2$ be the d.f.s corresponding to $\phi_1$ and $\phi_2$. The above reasoning implies the equality
$$\phi_1(t)\phi_1(t) = \phi_1(t)\phi_2(t) \quad \text{for all } t \in \mathbb{R}^1$$
which is equivalent to the relation
$$F_1 * F_1 = F_1 * F_2. \qquad (2)$$
Equation (2) states that there exists a d.f. whose convolutions with two different d.f.s coincide. In other words, the convolution equality (2) does not in general imply that $F_1 = F_2$.
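The key step in Gnedenko's construction is that the cosine coefficients of the periodic extension are non-negative and sum to 1. The following sketch checks this numerically; the closed form used in `closed_form` is obtained here by integrating by parts and should be read as a working assumption that the quadrature is meant to confirm.

```python
import math

def phi1(t):
    """Gnedenko's ch.f.: 1 - 3|t|/pi + 2|t|^3/pi^3 on [-pi, pi], zero outside."""
    u = abs(t)
    return 1 - 3 * u / math.pi + 2 * u**3 / math.pi**3 if u <= math.pi else 0.0

def cosine_coeff(n, steps=20000):
    """a_n = (2/pi) * integral_0^pi phi1(t) cos(nt) dt, by the midpoint rule."""
    h = math.pi / steps
    total = sum(
        phi1((k + 0.5) * h) * math.cos(n * (k + 0.5) * h) for k in range(steps)
    )
    return 2 / math.pi * total * h

def closed_form(n):
    """Candidate closed form for a_n, n >= 1 (derived by integration by parts)."""
    sign = (-1) ** n
    return 6 * (1 + sign) / (math.pi**2 * n**2) + 24 * (1 - sign) / (math.pi**4 * n**4)

numeric = [cosine_coeff(n) for n in range(0, 7)]
exact = [0.0] + [closed_form(n) for n in range(1, 7)]

# Partial sum of the coefficients; it should approach phi2(0) = 1.
partial_sum = sum(closed_form(n) for n in range(1, 200001))
```

Non-negative coefficients summing to 1 mean that the periodic function is the ch.f. of a discrete distribution placing mass $a_n/2$ at each of the points $\pm n$.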
8.2.
Discrete and absolutely continuous distributions can have characteristic functions coinciding on the interval [- 1, 1]
Let $X$ be a r.v. whose ch.f. $\phi_1$ is given by
$$\phi_1(t) = \begin{cases} 1 - |t|, & \text{if } |t| \le 1 \\ 0, & \text{otherwise}. \end{cases} \qquad (1)$$
Obviously $\phi_1$ is absolutely integrable on $\mathbb{R}^1$ and the density $f$ of $X$ is
$$f(x) = \frac{1 - \cos x}{\pi x^2}, \qquad x \in \mathbb{R}^1.$$
Consider now the r.v. $Y$ where
$$P[Y = 0] = \tfrac{1}{2}, \qquad P[Y = (2k - 1)\pi] = 2/[(2k - 1)^2\pi^2], \quad k = 0, \pm 1, \pm 2, \ldots.$$
If $\phi_2$ is the ch.f. of $Y$, then
$$\phi_2(t) = \frac{1}{2} + \frac{4}{\pi^2}\sum_{k=1}^{\infty} \frac{\cos(2k - 1)\pi t}{(2k - 1)^2}. \qquad (2)$$
Let us show that $\phi_1$ given by (1) equals $\phi_2$ given by (2) for each $t \in [-1, 1]$. The function $h(t) = |t|$ has the following Fourier expansion: $h(t) = \tfrac{1}{2}a_0 + \sum_{n=1}^{\infty} a_n \cos n\pi t$, $|t| \le 1$, where $a_0 = 1$, $a_n = 2(\cos n\pi - 1)/(n^2\pi^2)$. For even $n$, $a_n = 0$ and for odd $n$, that is for $n = 2k - 1$, we have $a_{2k-1} = -4/((2k - 1)^2\pi^2)$. Now comparing $\phi_1(t)$ and $\phi_2(t)$ we conclude that $\phi_1(t) = \phi_2(t)$ for each $t \in [-1, 1]$. Nevertheless $\phi_1$ and $\phi_2$ correspond to quite different distributions, one of which is absolutely continuous while the other is purely discrete. Note additionally that $\phi_1$ and $\phi_2$ do not coincide outside $[-1, 1]$: $\phi_2$ is periodic with period 2, whereas $\phi_1(t) = 0$ for $|t| > 1$.
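The agreement of the two ch.f.s on $[-1, 1]$ and their disagreement outside can be illustrated numerically. This is a rough sketch that truncates the cosine series for the discrete law; the truncation length and the test points are arbitrary choices.

```python
import math

def phi1(t):
    """Triangular ch.f. 1 - |t| on [-1, 1], zero outside."""
    return max(0.0, 1.0 - abs(t))

def phi2(t, terms=50000):
    """Partial sum of 1/2 + (4/pi^2) * sum_k cos((2k-1) pi t) / (2k-1)^2."""
    s = sum(
        math.cos((2 * k - 1) * math.pi * t) / (2 * k - 1) ** 2
        for k in range(1, terms + 1)
    )
    return 0.5 + 4 / math.pi**2 * s

# Differences at a few points inside [-1, 1] and at one point outside.
inside = [abs(phi1(t) - phi2(t)) for t in (-1.0, -0.3, 0.0, 0.45, 1.0)]
outside = abs(phi1(1.5) - phi2(1.5))
```

Inside the interval the differences are at the level of the truncation error, while at $t = 1.5$ the periodic function takes the value $\tfrac{1}{2}$ and the triangular one vanishes.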
8.3.
The absolute value of a characteristic function is not necessarily a characteristic function
If $\phi$ is a ch.f., then it is of general interest to know whether $|\phi|$ is also a ch.f. Consider the function
$$\phi(t) = \tfrac{1}{8}(1 + 7e^{it}), \qquad t \in \mathbb{R}^1.$$
Obviously $\phi$ is a ch.f. of a r.v. taking two values. We now want to know whether
$$|\phi(t)| = (|\phi(t)|^2)^{1/2} = (\phi(t)\overline{\phi(t)})^{1/2} = \tfrac{1}{8}(50 + 7e^{-it} + 7e^{it})^{1/2}$$
is a ch.f. If the answer were positive then $\psi := |\phi|$ would have to be of the form
$$\psi(t) = p e^{itx_1} + (1 - p)e^{itx_2}$$
where $0 < p < 1$ and $x_1$, $x_2$ are different real numbers. Comparing $\psi^2$ and $|\phi|^2$ we see that $p$ should satisfy the relations
$$p^2 = (1 - p)^2 = \tfrac{7}{64}, \qquad 2p(1 - p) = \tfrac{50}{64}$$
which are obviously incompatible. Hence $|\phi|$ is not a ch.f. although $\phi$ is.
8.4.
The ratio of two characteristic functions need not be a characteristic function
Let $\phi_1$ and $\phi_2$ be ch.f.s. Is it true that the ratio $\phi_1/\phi_2$ is also a ch.f.? The answer is based on the following result (see Lukacs 1970). A necessary condition for a function, analytic in some neighbourhood of the origin, to be a ch.f., is that in either half-plane the singularity nearest to the real axis is located on the imaginary axis. Consider the following two functions
$$\phi_1(t) = \left[\left(1 - \frac{it}{a}\right)\left(1 - \frac{it}{a + ib}\right)\left(1 - \frac{it}{a - ib}\right)\right]^{-1}, \qquad \phi_2(t) = \left(1 - \frac{it}{a}\right)^{-1}, \qquad t \in \mathbb{R}^1$$
where $a \ge b > 0$. One can check that both $\phi_1$ and $\phi_2$ are analytic ch.f.s. Furthermore, their quotient $\psi(t) = \phi_1(t)/\phi_2(t)$ satisfies some of the elementary properties of ch.f.s, namely $\psi(-t) = \overline{\psi(t)}$, $|\psi(t)| \le \psi(0) = 1$ for all $t \in \mathbb{R}^1$. However, the condition in the result cited above is violated since $\psi$ has no singularity on the imaginary axis while it has a pair of conjugate complex poles $\pm b - ia$. Therefore in general the ratio of two ch.f.s is not a ch.f.
8.5.
The factorization of a characteristic function into indecomposable factors may not be unique
We shall give two examples concerning the discrete and the absolutely continuous case respectively.
(i) The function $\phi(t) = \tfrac{1}{6}\sum_{k=0}^{5} e^{itk}$ is the ch.f. of a discrete uniform distribution on the set $\{0, 1, 2, 3, 4, 5\}$. Take the functions
$$\phi_1(t) = \tfrac{1}{3}(1 + e^{2it} + e^{4it}), \qquad \phi_2(t) = \tfrac{1}{2}(1 + e^{it}),$$
$$\psi_1(t) = \tfrac{1}{3}(1 + e^{it} + e^{2it}), \qquad \psi_2(t) = \tfrac{1}{2}(1 + e^{3it}).$$
Obviously we have
$$\phi(t) = \phi_1(t)\phi_2(t) = \psi_1(t)\psi_2(t), \qquad t \in \mathbb{R}^1.$$
It is easy to see that $\phi_1$, $\phi_2$, $\psi_1$, $\psi_2$ are all ch.f.s of some (discrete) distributions. Moreover, $\phi_2$ and $\psi_2$ correspond to two-point distributions and hence they are indecomposable (see Gnedenko and Kolmogorov 1954; Lukacs 1970). Thus it only remains to show that $\phi_1$ and $\psi_1$ are also indecomposable. Suppose that $\psi_1(t) = \psi_{11}(t)\psi_{12}(t)$, where $\psi_{11}$ and $\psi_{12}$ are non-trivial factors. Clearly $\psi_1$ corresponds to a distribution, say $G_1$, concentrated at three points, 0, 1, 2, each with probability $\tfrac{1}{3}$. However, the discontinuity points of $G_1$ are of the type $x_j + y_k$ where $x_j$ and $y_k$ are discontinuity points of the distributions corresponding to the ch.f.s $\psi_{11}$ and $\psi_{12}$ respectively (see Lukacs 1970). Since $G_1$ has three discontinuity points and $\psi_{11}$, $\psi_{12}$ are non-trivial, we conclude that
$$\psi_{11}(t) = p e^{itx_1} + (1 - p)e^{itx_2}, \qquad \psi_{12}(t) = q e^{ity_1} + (1 - q)e^{ity_2}$$
where $0 < p < 1$, $0 < q < 1$. But $\psi_1(t) = \psi_{11}(t)\psi_{12}(t)$ implies that $p$, $q$ must satisfy the relations
$$pq = (1 - p)(1 - q) = p(1 - q) + q(1 - p) = \tfrac{1}{3}.$$
Clearly this is not possible. We have therefore shown that $\psi_1$ is indecomposable and, since $\phi_1(t) = \psi_1(2t)$, we conclude that $\phi_1$ is also indecomposable.
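Multiplying ch.f.s of lattice distributions corresponds to convolving their probability mass functions, so the two factorizations in case (i) can be checked with exact arithmetic. In the sketch below each list holds the masses at the points 0, 1, 2, and so on; the variable names are illustrative.

```python
from fractions import Fraction

def convolve(p, q):
    """pmf of the sum of independent r.v.s with pmfs p and q on {0, 1, 2, ...}."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

third, half = Fraction(1, 3), Fraction(1, 2)
phi_1 = [third, 0, third, 0, third]   # mass 1/3 at 0, 2, 4
phi_2 = [half, half]                  # mass 1/2 at 0, 1
psi_1 = [third, third, third]         # mass 1/3 at 0, 1, 2
psi_2 = [half, 0, 0, half]            # mass 1/2 at 0, 3

uniform6 = [Fraction(1, 6)] * 6       # uniform distribution on {0, ..., 5}
first = convolve(phi_1, phi_2)
second = convolve(psi_1, psi_2)
```

Both `first` and `second` coincide with `uniform6`, so the uniform law on six points has two genuinely different factorizations.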
(ii) Consider now a uniform distribution over the interval $(-1, 1)$. The ch.f. $\phi$ of this distribution is $\phi(t) = t^{-1}\sin t$, $t \in \mathbb{R}^1$. Using the elementary formula $t^{-1}\sin t = \cos(t/2)\,(t/2)^{-1}\sin(t/2)$ repeatedly we obtain
$$\phi(t) = t^{-1}\sin t = \left[\prod_{k=1}^{n} \cos(t/2^k)\right](t/2^n)^{-1}\sin(t/2^n).$$
Passing to the limit in $n$, as $n \to \infty$, we get the following well known representation:
$$\phi(t) = t^{-1}\sin t = \prod_{k=1}^{\infty} \cos(t/2^k). \qquad (1)$$
Now it only remains for us to show that $\cos(t/2^k)$ is an indecomposable ch.f. This is a consequence of the equality $\cos(t/2^k) = \tfrac{1}{2}(e^{it/2^k} + e^{-it/2^k})$ which implies that $\cos(t/2^k)$ is a ch.f. of a distribution concentrated at two points and hence it is indecomposable. Another factorization can be obtained by using the formula
$$t^{-1}\sin t = (t/3)^{-1}\sin(t/3)\,[2\cos(2t/3) + 1]/3.$$
In this case we have
$$\phi(t) = t^{-1}\sin t = \tfrac{1}{3}[2\cos(2t/3) + 1]\prod_{k=1}^{\infty} \cos(t/(3 \cdot 2^k)). \qquad (2)$$
The first factor in (2) is the ch.f. of the distribution with mass $\tfrac{1}{3}$ at each of the points $-\tfrac{2}{3}$, $0$, $\tfrac{2}{3}$, which is indecomposable by the argument used for $\psi_1$ in case (i). It follows from (2) that $\phi$ is a product of indecomposable factors and obviously (1) and (2) are different factorizations of the ch.f. $\phi$.
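Both infinite products can be compared numerically with $t^{-1}\sin t$. The sketch below truncates each product after 40 factors, an arbitrary choice that is far more than double precision needs; the second product uses the identity $\sin 3x = \sin x\,(2\cos 2x + 1)$.

```python
import math

def phi(t):
    """Ch.f. of the uniform distribution on (-1, 1)."""
    return math.sin(t) / t if t != 0 else 1.0

def product_1(t, n=40):
    """Partial product prod_{k=1}^n cos(t / 2^k)."""
    return math.prod(math.cos(t / 2**k) for k in range(1, n + 1))

def product_2(t, n=40):
    """(1/3)(2cos(2t/3) + 1) * prod_{k=1}^n cos(t / (3 * 2^k))."""
    head = (2 * math.cos(2 * t / 3) + 1) / 3
    return head * math.prod(math.cos(t / (3 * 2**k)) for k in range(1, n + 1))

points = (0.5, 1.0, 2.0, 7.3)
err_1 = max(abs(phi(t) - product_1(t)) for t in points)
err_2 = max(abs(phi(t) - product_2(t)) for t in points)
```

Both errors are at the level of rounding error, which is consistent with two different factorizations of the same ch.f.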
8.6.
An absolutely continuous distribution can have a characteristic function which is not absolutely integrable
Let $\phi$ be a ch.f. and $F$ be its d.f. Recall that if $\phi$ is absolutely integrable on $\mathbb{R}^1$, then $F$ is absolutely continuous and the density $f = F'$ is the inverse Fourier transform of $\phi$ (see Feller 1971; Lukacs 1970; Loève 1977/1978). Let us now clarify if the converse statement holds. For this purpose we shall use the following theorem of G. Pólya (see Lukacs 1970; Feller 1971). Let $\psi(t)$, $t \in \mathbb{R}^1$, be a real-valued continuous function such that: (i) $\psi(0) = 1$; (ii) $\psi(-t) = \psi(t)$; (iii) $\psi(t)$ is convex for $t > 0$; (iv) $\lim_{t \to \infty} \psi(t) = 0$. Then $\psi$ is a ch.f. of a distribution which is absolutely continuous.
Take for example the following two functions:
$$\psi_1(t) = (1 + |t|)^{-1}, \qquad t \in \mathbb{R}^1,$$
and
$$\psi_2(t) = \begin{cases} 1 - |t|, & \text{if } 0 \le |t| \le \tfrac{1}{2} \\ 1/(4|t|), & \text{if } |t| \ge \tfrac{1}{2}. \end{cases}$$
According to the result cited above we conclude that $\psi_1$ and $\psi_2$ are ch.f.s which correspond to absolutely continuous distributions. However, it is easy to check that $\psi_1$ and $\psi_2$ are not absolutely integrable. Finally, suppose $X$ is a r.v. exponentially distributed, $X \sim \mathcal{E}xp(\lambda)$. By definition $X$ is absolutely continuous (its density is $\lambda e^{-\lambda x}$, $x > 0$). However its ch.f. is equal to $\lambda/(\lambda - it)$, and obviously this function is not absolutely integrable on $\mathbb{R}^1$. Therefore the absolute integrability condition for the ch.f. is sufficient but not necessary for the corresponding d.f. to be absolutely continuous.
8.7.
A discrete distribution without a first-order moment but with a differentiable characteristic function
This and the next example are given to show that the existence of the derivative $\phi^{(n)}(0)$ for odd $n$ does not necessarily imply that the moment $\alpha_n = E[X^n]$ exists. To see this, consider the r.v. $X$ with
$$P[X = n] = P[X = -n] = \frac{c}{n^2 \log n}, \qquad n = 2, 3, \ldots,$$
where $c$ is a norming constant. Then the ch.f. $\phi$ of $X$ is
$$\phi(t) = 2c\sum_{n=2}^{\infty} (\cos nt)/(n^2 \log n), \qquad t \in \mathbb{R}^1. \qquad (1)$$
Since the partial sums of the series $\sum_{n=2}^{\infty} \sin nt/n$ are uniformly bounded, the series $\sum_{n=2}^{\infty} \sin nt/(n \log n)$ obtained from (1) by differentiation is uniformly convergent (see Zygmund 1968). This implies the differentiability of the ch.f. $\phi(t)$ for all $t \in \mathbb{R}^1$. In particular, if $t = 0$, $\phi'(0) = 0$ but the expectation $EX$ does not exist because the series $\sum_{n=2}^{\infty} 1/(n \log n)$ is divergent.
8.8.
An absolutely continuous distribution without expectation but with a differentiable characteristic function
(i) Let $X$ be a r.v. with the following density:
$$f(x) = \begin{cases} 0, & \text{if } |x| \le 2 \\ c/(x^2 \log|x|), & \text{if } |x| > 2 \end{cases}$$
where $c$ is a norming constant, $0 < c < \infty$ (the exact value is not essential). Since $\int_2^{\infty} (x \log x)^{-1}\,dx = \infty$, the expectation $EX$ does not exist. Nevertheless we can ask whether the ch.f. $\phi(t)$ of $X$ is differentiable at $t = 0$. Since
$$\phi(t) = 2c\int_2^{\infty} \frac{\cos tx}{x^2 \log x}\,dx$$
we can write the difference $[1 - \phi(t)]/(2c)$, for small $t > 0$, in the following way:
$$\frac{1 - \phi(t)}{2c} = \int_2^{1/t} \frac{1 - \cos tx}{x^2 \log x}\,dx + \int_{1/t}^{\infty} \frac{1 - \cos tx}{x^2 \log x}\,dx.$$
Obviously $1 - \phi(t)$ is a real-valued, non-negative and even function. For an arbitrary $u \in \mathbb{R}^1$ we have $0 \le 1 - \cos u \le \min\{2, u^2\}$. This implies that $1 - \phi(t)$ is not greater than some constant multiplied by the function $h(t)$ where
$$h(t) = t^2\int_2^{1/t} (\log x)^{-1}\,dx + 2\int_{1/t}^{\infty} (x^2 \log x)^{-1}\,dx.$$
However, since $h(t) = O(-t/\log t) = o(t)$ as $t \to 0$, we find that
$$\phi(t) = 1 + o(t) \quad \text{as } t \to 0.$$
Therefore the ch.f. $\phi(t)$ is differentiable at $t = 0$ and $\phi'(0) = 0$.
(ii) Let us extend case (i). Suppose now that $X$ is a r.v. with the following density ($K$ is a norming constant):
$$f(x) = \begin{cases} 0, & \text{if } |x| \le 2 \\ K/(x^4 \log|x|), & \text{if } |x| > 2. \end{cases}$$
It can be shown that the ch.f. $\phi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx$, $t \in \mathbb{R}^1$, is differentiable at $t = 0$ three times and e.g. $\phi^{(3)}(0) = 0$ ($\phi^{(4)}(t)$ does not exist at $t = 0$). However $E[|X|^3] = \int_{-\infty}^{\infty} |x|^3 f(x)\,dx = \infty$, i.e. $\alpha_3$, the third-order moment of $X$, does not exist. (For details see Rao 1984.)
8.9.
The convolution of two indecomposable distributions can even have a normal component
Let $F_1$, $F_2$ be d.f.s and $\phi_1$, $\phi_2$ their ch.f.s respectively. If at least one of $\phi_1$, $\phi_2$ is decomposable, then the convolution $F_1 * F_2$ has a ch.f. $\phi_1\phi_2$ which is also decomposable. If $F_1$ and $F_2$ are both indecomposable, is it true that $F_1 * F_2$ is indecomposable? Regardless of our intuition we shall show that $F_1 * F_2$ can contain a decomposable component which in particular can be chosen to be normal. To see this, let us consider the d.f. $F$ with the following ch.f.: $\phi(t) = (1 - t^2)e^{-t^2/2}$, $t \in \mathbb{R}^1$. According to Linnik and Ostrovskii (1977) any ch.f. of the form $(1 - b^2t^2)\exp[ict - b^2t^2/2]$ where $b, c \in \mathbb{R}^1$, $b \ne 0$, is indecomposable. So, $\phi$ is indecomposable. Denote by $\psi$ the ch.f. of the d.f. $G := F * F$. Then $\psi(t) = \phi^2(t) = (1 - t^2)^2 e^{-t^2}$. Write $\psi(t)$ in the form $\psi(t) = \psi_1(t)\psi_2(t)$ where
$$\psi_1(t) = (1 - t^2)^2\exp(-3t^2/4), \qquad \psi_2(t) = \exp(-t^2/4).$$
It is then not difficult to check that the integral $\int_{-\infty}^{\infty} \psi_1(t)\exp(-itx)\,dt$ is real-valued and non-negative for all $x \in \mathbb{R}^1$ (it is a positive multiple of $(1 - \tfrac{4}{9}x^2)^2\exp(-x^2/3)$). This implies that $\psi_1$ is a ch.f. of some distribution (it is not important which one). On the other hand, $\psi_2$ can be identified as the ch.f. of the normal distribution $N(0, \tfrac{1}{2})$ since in general the normal distribution $N(a, \sigma^2)$ has a ch.f. equal to $\exp[iat - \tfrac{1}{2}\sigma^2t^2]$, $t \in \mathbb{R}^1$. Hence the indecomposability property is not preserved under convolution. The same example considered above can be interpreted as follows. Let $X_1$, $X_2$ be independent r.v.s with a common d.f. $F$. Then the sum $X_1 + X_2$ has a d.f. $G$ and, moreover, the following relation holds:
$$X_1 + X_2 \stackrel{d}{=} Y_1 + Y_2$$
where $Y_1$, $Y_2$ are independent r.v.s such that $Y_1$ has a ch.f. $\psi_1$, $Y_2 \sim N(0, \tfrac{1}{2})$.
8.10.
Does the existence of all moments of a distribution guarantee the analyticity of its characteristic and moment generating functions?
Let $X$ be a r.v. with ch.f. $\phi$ and m.g.f. $M$. Then if $\phi(t)$ and $M(z)$ are analytic functions for $|t| \le t_0$ or $|z| \le r_0$ with $t_0 > 0$, $r_0 > 0$, the r.v. $X$ possesses moments of all orders. Thus we come to the question of whether the converse of the statement is true. Suppose $Z$ is a r.v. with density
$$f(x) = \begin{cases} 0, & \text{if } x < 0 \\ \tfrac{1}{2}\exp(-\sqrt{x}), & \text{if } x \ge 0. \end{cases}$$
Then $\alpha_k = E[Z^k] = \Gamma(2k + 2)$, $k = 0, 1, \ldots$, and hence $Z$ possesses moments of any order. For clarifying the properties of the ch.f. of $Z$ we need the following result (see Laha and Rohatgi 1979). The ch.f. $\phi$ of the r.v. $X$ is analytic iff: (a) $X$ has moments $\alpha_k$ of any order $k$, $k \ge 1$; (b) there exists a constant $c > 0$ such that $|\alpha_k| \le k!\,c^k$ for all $k \ge 1$. Since in our example $\alpha_k = (2k + 1)!$ we can easily find that
$$\frac{\alpha_k}{k!} = (k + 1)(k + 2)\cdots(2k + 1) \ge (k + 1)^{k+1},$$
which exceeds $c^k$ for every fixed $c$ as soon as $k$ is large enough, and clearly condition (b) in the above result is not satisfied. Therefore the ch.f. $\phi$ of $Z$ cannot be analytic. It follows that the m.g.f. $M$ does not exist. Note that the last statement can be derived directly. Indeed, $M(z)$ can be written in the form
$$M(z) = \frac{1}{2}\int_0^{\infty} \exp(zx - \sqrt{x})\,dx.$$
If $\varepsilon > 0$ is small enough then for every $z$ with $0 < z < \varepsilon$ we have $zx - \sqrt{x} \to \infty$ as $x \to \infty$. This implies that $\int_0^{\infty} \exp(zx - \sqrt{x})\,dx = \infty$. Therefore $M(z)$ does not exist in spite of the fact that all moments of $Z$ do exist. Finally, let us show a case which is an extension of the above example. Suppose $U$ is a r.v. with density
$$g(x) = c\exp(-|x|^{\gamma}), \qquad x \in \mathbb{R}^1,$$
where $0 < \gamma < 1$ and $c$ is a norming constant, $c^{-1} = \int_{-\infty}^{\infty} \exp(-|x|^{\gamma})\,dx$. Then $E[|U|^k] < \infty$ for every $k \ge 1$, so $U$ possesses moments of any order. Nevertheless, the ch.f. of $U$ is not analytic and consequently the m.g.f. of $U$ does not exist.
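The failure of the growth condition $|\alpha_k| \le k!\,c^k$ for the moments $\alpha_k = (2k + 1)!$ can be seen with exact integer arithmetic. The sketch below searches, for a few arbitrarily chosen integer constants $c$, for the first index at which the bound breaks; the search limit `k_max` is an assumption made only to keep the loop finite.

```python
import math

def moment(k):
    """alpha_k = E[Z^k] = (2k + 1)! for the density (1/2) exp(-sqrt(x)), x > 0."""
    return math.factorial(2 * k + 1)

def first_violation(c, k_max=2000):
    """Smallest k >= 1 with alpha_k > k! * c**k, or None if none up to k_max."""
    for k in range(1, k_max + 1):
        if moment(k) > math.factorial(k) * c**k:
            return k
    return None

# Integer constants keep every comparison exact.
violations = {c: first_violation(c) for c in (10, 100, 1000)}
```

Whatever constant is tried, the bound fails from some index on, which agrees with the inequality $(2k + 1)!/k! \ge (k + 1)^{k+1}$.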
SECTION 9.
INFINITELY DIVISIBLE AND STABLE DISTRIBUTIONS
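Let $X$ be a r.v. with d.f. $F$ and ch.f. $\phi$. We say that $X$, as well as $F$ and $\phi$, is infinitely divisible if for each $n \ge 1$ there exist i.i.d. r.v.s $X_{n1}, \ldots, X_{nn}$ such that
$$X \stackrel{d}{=} X_{n1} + \cdots + X_{nn}$$
or equivalently, if for a d.f. $F_n$ and a ch.f. $\phi_n$,
$$F = (F_n)^{*n}, \qquad \phi = (\phi_n)^n.$$
Let us note the following properties.
(i) A distribution $F$ with bounded support is infinitely divisible iff it is degenerate.
(ii) The infinitely divisible ch.f. does not vanish.
(iii) The product of a finite number of infinitely divisible ch.f.s is a ch.f. which is again infinitely divisible.
(iv) The r.v. $X$ can be a limit in distribution of sums $S_n = \sum_{k=1}^{n} X_{nk}$ of i.i.d. r.v.s $X_{n1}, \ldots, X_{nn}$ iff $X$ is infinitely divisible.
Fundamental in this field is the following result (see Feller 1971; Chow and Teicher 1978; Shiryaev 1995). The r.v. $X$ with ch.f. $\phi$ is infinitely divisible iff $\phi$ admits the following canonical representation:
$$\phi(t) = \exp\left\{ i\gamma t + \int_{-\infty}^{\infty} \left( e^{itx} - 1 - \frac{itx}{1 + x^2} \right)\frac{1 + x^2}{x^2}\,dG(x) \right\} \qquad (1)$$
where $\gamma \in \mathbb{R}^1$, and $G(x)$, $x \in \mathbb{R}^1$, is a non-decreasing left-continuous function of bounded variation with $G(-\infty) = 0$ (the integrand is defined by continuity at $x = 0$).
Now let us introduce another notion. The r.v. $X$, its d.f. $F$ and its ch.f. $\phi$ are called stable if for every $n \ge 1$ there exist constants $a_n$ and $b_n > 0$ and independent r.v.s $X_1, \ldots, X_n$ distributed like $X$ such that
$$X_1 + \cdots + X_n \stackrel{d}{=} b_n X + a_n$$
or, equivalently,
$$F\left(\frac{x - a_n}{b_n}\right) = [F(x)]^{*n}, \quad \text{or} \quad [\phi(t)]^n = \phi(b_n t)e^{ia_n t}.$$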
The basic result concerning stable distributions is as follows (see Chow and Teicher 1978; Zolotarev 1986). The r.v. $X$ with ch.f. $\phi$ is stable iff $\phi$ admits the following canonical representation:
$$\phi(t) = \exp\left\{ i\gamma t - c|t|^{\alpha}\left[ 1 + i\beta\frac{t}{|t|}\,\omega(t, \alpha) \right] \right\} \qquad (2)$$
where $\gamma \in \mathbb{R}^1$, $0 < \alpha \le 2$, $|\beta| \le 1$, $c \ge 0$ and
$$\omega(t, \alpha) = \begin{cases} \tan\tfrac{1}{2}\pi\alpha, & \text{if } \alpha \ne 1 \\ (2/\pi)\log|t|, & \text{if } \alpha = 1. \end{cases}$$
Recall that (2) is also known as the Lévy-Khintchine representation. In particular, if $\gamma = 0$, $\beta = 0$, we obtain the symmetric stable distributions. They have ch.f.s of the type $\exp(-c|t|^{\alpha})$ where $c \ge 0$, $0 < \alpha \le 2$. A detailed investigation of the infinitely divisible distributions and the stable distributions can be found in the books by Gnedenko and Kolmogorov (1954), Lukacs (1970), Feller (1971), Linnik and Ostrovskii (1977), Loève (1978), Chow and Teicher (1978) and Zolotarev (1986). The next examples illustrate different properties of infinitely divisible and stable distributions. Two examples deal with random vectors.
9.1.
A non-vanishing characteristic function which is not infinitely divisible
Let the r.v. $X$ with ch.f. $\phi(t)$, $t \in \mathbb{R}^1$, be infinitely divisible. Then $\phi$ does not vanish. The example below shows that in general the converse is not true. Consider the discrete r.v. $X$ which takes the values $-1, 0, 1$ with probabilities $\tfrac{1}{8}$, $\tfrac{3}{4}$, $\tfrac{1}{8}$ respectively. The ch.f. $\phi$ of $X$ is
$$\phi(t) = \tfrac{1}{8}e^{-it} + \tfrac{3}{4}e^{it \cdot 0} + \tfrac{1}{8}e^{it} = \tfrac{1}{4}(3 + \cos t).$$
Obviously $\phi(t) > 0$ for all $t \in \mathbb{R}^1$, so $\phi$ does not vanish. Nevertheless, $X$ is not infinitely divisible. To see this, let us assume that $X$ can be written as
$$X \stackrel{d}{=} X_1 + X_2 \qquad (1)$$
where $X_1$ and $X_2$ are i.i.d. r.v.s. Since $X$ has three possible values, it is clear that each of $X_1$ and $X_2$ can take only two values, say $a$ and $b$, $a < b$. Let $P[X_1 = a] = p$, $P[X_1 = b] = 1 - p$ for some $p$, $0 < p < 1$. Then $X_1 + X_2$ takes the values $2a$, $a + b$ and $2b$ with probabilities $p^2$, $2p(1 - p)$ and $(1 - p)^2$ respectively. Thus we should have the relations
$$2a = -1, \quad a + b = 0, \quad 2b = 1, \quad p^2 = \tfrac{1}{8}, \quad 2p(1 - p) = \tfrac{3}{4}, \quad (1 - p)^2 = \tfrac{1}{8}$$
which are clearly incompatible. Hence the representation (1) is not possible, implying that $X$ is not infinitely divisible.
9.2.
If |φ| is an infinitely divisible characteristic function, this does not always imply that φ is also infinitely divisible
Recall that if cP is an infinitely divisible ch.f. then its absolute value IcPl is so. It is not so trivial that in general the converse statement is false. This was discovered by Gnedenko and Kolmogorov (1954) and we present here their example of a ch.f. cP such that IcPl is infmitely divisible, but ¢ is not. Consider the function
(I)
¢(t)
where 0
= [(1- b)/(I
< a ::; b <
- a)][(1
+ ae-it)/(I
- be it )],
I. Obviously ¢ is continuous, ¢(O)
t E jRl
= 1 and
It follows that ¢ is the ch.f. of a r.v. X with
P[ X
= - 1] = (I
- b) a / (I - a),
P[ X
= k]
= (1 - b) (1
+ ab) bk / (I
- a),
k = 0, 1,2, .... Let us show that ¢ is not infinitely divisible. Indeed, we find that 00
(2)
log¢(t)
= L)(-I)k-lk-lak(e-itk -
1)
+ bkk-I(e itk
- 1)].
k=1
We can also write $\log\phi(t)$ in its canonical form (see the introductory notes to this section; the Lévy-Khintchine formula) by taking $\gamma = \sum_{k=1}^{\infty}(b^k + (-1)^k a^k)/(k^2 + 1)$ and G(x) to be a function of bounded variation with jumps of size $kb^k/(k^2 + 1)$ at x = k and $(-1)^{k-1}ka^k/(k^2 + 1)$ at x = -k for k = 1, 2, .... However, G is not monotone, which automatically implies that $\phi$ cannot be infinitely divisible. Furthermore, the function
$$\bar\phi(t) = \frac{1 - b}{1 + a} \cdot \frac{1 + a e^{it}}{1 - b e^{-it}}$$
is also a ch.f. but not infinitely divisible. Our next step is to show that the function
$\psi(t) = |\phi(t)|^2 = \phi(t)\bar\phi(t)$ is infinitely divisible. Note that $\psi$ is a ch.f. as a product of two ch.f.s. It is easy to write first $\log\bar\phi(t)$ in the form (2) and then obtain $\log\psi(t)$, namely
$$\log\psi(t) = \sum_{k=1}^{\infty}k^{-1}\big[b^k + (-1)^{k-1}a^k\big](e^{itk} - 1) + \sum_{k=1}^{\infty}k^{-1}\big[b^k + (-1)^{k-1}a^k\big](e^{-itk} - 1).$$
Thus in the Lévy-Khintchine formula for $\log\psi(t)$ we can take $\gamma = 0$ and G(x) to be a non-decreasing function with jumps of size $k(k^2 + 1)^{-1}[b^k + (-1)^{k-1}a^k]$ at the points $x = \pm k$, k = 1, 2, .... Since this representation of $\log\psi(t)$ is unique, we conclude that the ch.f. $\psi$ is infinitely divisible. Moreover $|\phi| = (|\phi|^2)^{1/2}$ is also infinitely divisible despite the fact that $\phi$ given by (1) is not. Another interesting observation is that the infinitely divisible ch.f. $\psi$ is the product of the two non-infinitely divisible ch.f.s $\phi$ and $\bar\phi$.
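The sign pattern of the jumps of G can be inspected numerically. The sketch below assumes the normalising factor (1 - b)/(1 + a), which makes the probabilities sum to one, and uses the illustrative values a = 0.3, b = 0.5; the function names are hypothetical.

```python
def jumps_phi(a, b, kmax):
    """Jumps of G for log phi at x = +k (first list) and x = -k (second list)."""
    pos = [k * b**k / (k**2 + 1) for k in range(1, kmax + 1)]
    neg = [(-1) ** (k - 1) * k * a**k / (k**2 + 1) for k in range(1, kmax + 1)]
    return pos, neg


def jumps_psi(a, b, kmax):
    """Common jump of G at x = +k and x = -k for log psi, psi = |phi|^2."""
    return [k * (b**k + (-1) ** (k - 1) * a**k) / (k**2 + 1)
            for k in range(1, kmax + 1)]


def total_probability(a, b, kmax):
    """Partial sum of P[X = -1] + sum_k P[X = k] with normaliser (1 - b)/(1 + a)."""
    norm = (1 - b) / (1 + a)
    return norm * (a + (1 + a * b) * sum(b**k for k in range(kmax + 1)))


a, b = 0.3, 0.5  # illustrative values with 0 < a <= b < 1
pos, neg = jumps_phi(a, b, 20)
psi = jumps_psi(a, b, 20)
```

For these values the jumps of G at x = -k alternate in sign for $\log\phi$, while every jump for $\log\psi$ is non-negative because $a \le b$.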
9.3.
The product of two independent non-negative and infinitely divisible random variables is not always infinitely divisible
(i) Define two independent r.v.s X and Y having values in the sets {0, 1, 2, 3, ...} and {1, 1 + c, 1 + 2c, 1 + 3c, ...} respectively, where $1 < c < \tfrac32$. The corresponding probabilities for X and Y are $\{p_0, p_1, p_2, \ldots\}$ and $\{q_0, q_1, q_2, \ldots\}$ where $p_j > 0$, $\sum p_j = 1$, $q_j > 0$, $\sum q_j = 1$. Consider the product Z = XY and suppose it is infinitely divisible. Then
$$Z \stackrel{d}{=} Z_1 + Z_2 \tag{1}$$
where $Z_1$ and $Z_2$ are i.i.d. r.v.s. Evidently, the 'first' six possible values of Z are 0, 1, 2, 1 + c, 3, 1 + 2c. It follows that 0, 1 and 1 + c are among the values of $Z_1$ (and hence of $Z_2$). But this implies that 2 + c is a possible value of Z. Since 3 < 2 + c < 1 + 2c we get a contradiction. Consequently a relation similar to (1) is not possible. Thus Z cannot be infinitely divisible. Notice that X and Y take their values from different sets. The same answer concerning the non-infinite divisibility of the product XY can be obtained in the case of X and Y taking values in the same space.
(ii) Let us now exhibit an example in which the reasoning is based on the following result (see Katti 1967). Suppose $\{p_n, n \in \mathbb{N}_0\}$ is a distribution with $p_0 > 0$ and $p_1 > 0$. Then $\{p_n\}$ is infinitely divisible iff the numbers $r_k$, k = 0, 1, ..., defined by
$$(n + 1)p_{n+1} = \sum_{k=0}^{n} r_k p_{n-k}, \qquad n = 0, 1, 2, \ldots \tag{2}$$
are all non-negative. Let us use this result to prove a new and not too well known statement: let $\xi$ and $\eta$ be independent r.v.s each having a Poisson distribution $Po(\lambda)$. Then both $\xi$ and $\eta$ are infinitely divisible, but the product $X = \xi\eta$ is not. Indeed, take n > 1 such that n + 1 is a prime number. Then
$$p_{n+1} = P[X = n + 1] = P[\xi\eta = n + 1] = P[\xi = 1, \eta = n + 1] + P[\xi = n + 1, \eta = 1] = 2\lambda^{n+2}e^{-2\lambda}/(n + 1)!.$$
The number n itself is even and hence n has at least two (integer) factorizations: $n = 1 \cdot n = 2 \cdot (n/2)$. Therefore
$$p_n = P[X = n] > 2 \cdot (\lambda^2 e^{-\lambda}/2!) \cdot (\lambda^{n/2}e^{-\lambda}/(n/2)!) = \lambda^{n/2+2}e^{-2\lambda}/(n/2)!.$$
Obviously $p_0 = P[X = 0] = 1 - (1 - e^{-\lambda})^2 > 0$, $p_1 = \lambda^2 e^{-2\lambda} > 0$, and so $r_0 = p_1/p_0 > 0$. Further, suppose that $r_1, r_2, \ldots, r_{n-1}$ in (2) are all non-negative. Let us check the sign of $r_n$. We have
$$r_n p_0 \le (n + 1)p_{n+1} - r_0 p_n < 2\lambda^{n+2}e^{-2\lambda}/n! - r_0\lambda^{n/2+2}e^{-2\lambda}/(n/2)! = e^{-2\lambda}\lambda^{n/2+2}\big[2\lambda^{n/2}/n! - r_0/(n/2)!\big].$$
Since $\lambda > 0$ is fixed and $\lambda^{n/2}/n!$ goes to zero as $n \to \infty$ faster than $1/(n/2)!$, we conclude that for sufficiently large n the number $r_n$ becomes negative. This does not agree with the property in (2) that all $r_n$ are non-negative. Hence the product $\xi\eta$ of two independent Poisson r.v.s is not infinitely divisible.
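The recursion (2) in part (ii) is easy to run numerically. The following sketch assumes $\lambda = 1$ (an illustrative choice, not fixed by the text) and computes the law of $\xi\eta$ by summing over divisor pairs; the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P[xi = k] for a Poisson(lam) variable."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def product_pmf(m, lam):
    """P[xi * eta = m] for independent Poisson(lam) variables xi and eta."""
    if m == 0:
        return 1.0 - (1.0 - math.exp(-lam)) ** 2
    return sum(poisson_pmf(d, lam) * poisson_pmf(m // d, lam)
               for d in range(1, m + 1) if m % d == 0)


def katti_r(p):
    """Solve (n + 1) p[n+1] = sum_{k=0}^{n} r[k] p[n-k] successively for r[n]."""
    r = []
    for n in range(len(p) - 1):
        known = sum(r[k] * p[n - k] for k in range(n))
        r.append(((n + 1) * p[n + 1] - known) / p[0])
    return r


lam = 1.0  # illustrative parameter value
p = [product_pmf(m, lam) for m in range(12)]
r = katti_r(p)
first_negative = next(k for k, value in enumerate(r) if value < 0)
```

With this choice of $\lambda$ a negative value already appears at index 4, where n + 1 = 5 is prime, in line with the asymptotic argument above.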
9.4.
Infinitely divisible products of non-infinitely divisible random variables
There are many examples of the following kind: if X is a r.v. which is absolutely continuous and infinitely divisible and $X_1$, $X_2$ are independent copies of X, then the product $X_1X_2$ is again infinitely divisible. As a first example take $X \sim N(0, 1)$. Then $X_1X_2$ has a ch.f. equal to $1/(1 + t^2)^{1/2}$ and hence $X_1X_2$ is infinitely divisible. As a second example, take $X \sim C(0, 1)$, i.e. X has a Cauchy density $f(x) = 1/(\pi(1 + x^2))$, $x \in \mathbb{R}^1$. If $X_1$ and $X_2$ are independent copies of X, it can be checked that the ch.f. of the product $X_1X_2$ is infinitely divisible. These and other examples (discussed by Rohatgi et al (1990)) lead to the following question. Suppose $X_1$ and $X_2$ are independent copies of the absolutely continuous r.v. X. Suppose further that the product $Y = X_1X_2$ is infinitely divisible. Does this imply that X itself is infinitely divisible?
Let Y be a r.v. distributed N(0, 1). Then there exists a r.v. X such that by taking two independent copies, $X_1$ and $X_2$, we obtain $X_1X_2 \stackrel{d}{=} Y$ (for details see Groeneboom and Klaassen (1982)). Thus $P[|Y| > x^2] \ge (P[|X| > x])^2$ which implies that
$$P[|X| > x] \le (P[|Y| > x^2])^{1/2} = O(e^{-x^4/4}) \quad \text{as } x \to \infty.$$
Referring to the paper of Steutel (1973) for details we conclude that X cannot be infinitely divisible. Hence the answer to the above question is negative.
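The ch.f. of the product of two independent N(0, 1) variables quoted in the first example can be checked numerically. The sketch below relies on the conditioning identity $E[e^{itX_1X_2} \mid X_1] = \exp(-t^2X_1^2/2)$ and a plain trapezoidal rule; the function name, integration range and step count are illustrative assumptions.

```python
import math


def chf_product_normal(t, half_width=10.0, steps=20000):
    """E[exp(i t X1 X2)] = E[exp(-t^2 X1^2 / 2)] by the trapezoidal rule."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        density = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
        total += weight * math.exp(-0.5 * t * t * x * x) * density
    return total * h


# Largest deviation from 1 / sqrt(1 + t^2) over a few test points.
max_error = max(abs(chf_product_normal(t) - 1 / math.sqrt(1 + t * t))
                for t in (0.0, 0.5, 1.0, 2.0, 5.0))
```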
9.5.
Every distribution without indecomposable components is infinitely divisible, but the converse is not true
Following tradition, denote by $I_0$ the class of distributions which have no indecomposable components. Recall that $F \in I_0$ means that the ch.f. $\phi$ of F cannot be represented in the form $\phi = \phi_1\phi_2$ where $\phi_1$ is the ch.f. of an indecomposable distribution. Detailed study of the class $I_0$ is due to A. Ya. Khintchine. In this connection see Linnik and Ostrovskii (1977) where, among a variety of results, the following theorem is proved: the class $I_0$ is a subclass of the class of infinitely divisible distributions. Our purpose now is to show that this inclusion is strict. Indeed, take the following ch.f.:
$$\phi(t) = (1 - a)(1 - ae^{it})^{-1}, \qquad 0 < a < 1, \quad t \in \mathbb{R}^1. \tag{1}$$
The representation
$$\phi(t) = \exp[\log(1 - a) - \log(1 - ae^{it})] = \exp\Big[\sum_{n=1}^{\infty} a^n n^{-1}(e^{itn} - 1)\Big]$$
shows that $\phi$ is a limit of products of ch.f.s corresponding to Poisson distributions. Then (see Gnedenko and Kolmogorov 1954; Loève 1977/1978) the ch.f. $\phi$ is infinitely divisible. Further, the identity $1/(1 - x) = \prod_{k=0}^{\infty}(1 + x^{2^k})$, $|x| < 1$, implies that
$$\phi(t) = \prod_{k=0}^{\infty}\big(1 + a^{2^k}e^{it2^k}\big)\big/\big(1 + a^{2^k}\big).$$
Recall that $(1 + a^{2^k}e^{it2^k})/(1 + a^{2^k})$ is a ch.f. of a distribution concentrated at two points, namely 0 and $2^k$. However, such a distribution is indecomposable (see Example 9.1). Hence the ch.f. $\phi$ defined by (1) is infinitely divisible but $\phi$ does not belong to the class $I_0$.
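The product representation of the geometric ch.f. can be verified numerically. This is a minimal sketch with the illustrative value a = 0.5 and eight factors of the product; the function names are not from the text.

```python
import cmath


def chf_geometric(t, a):
    """phi(t) = (1 - a) / (1 - a e^{it})."""
    return (1 - a) / (1 - a * cmath.exp(1j * t))


def chf_partial_product(t, a, terms):
    """Product of the two-point ch.f.s for k = 0, ..., terms - 1."""
    value = 1.0 + 0.0j
    for k in range(terms):
        m = 2**k
        value *= (1 + a**m * cmath.exp(1j * t * m)) / (1 + a**m)
    return value


a = 0.5  # illustrative value in (0, 1)
gap = max(abs(chf_geometric(t, a) - chf_partial_product(t, a, 8))
          for t in (0.0, 0.3, 1.0, 2.5))
```

Because the partial product over k = 0, ..., K equals $(1 - x^{2^{K+1}})/(1 - x)$ with $x = ae^{it}$, a handful of factors already agrees with $\phi$ to machine precision.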
9.6.
A non-infinitely divisible random vector with infinitely divisible subsets of its coordinates
Let $(X_1, X_2, X_3)$ be a random vector and $\psi(t_1, t_2, t_3)$, $t_1, t_2, t_3 \in \mathbb{R}^1$, its ch.f.:
$$\psi(t_1, t_2, t_3) = E\big[\exp[i(t_1X_1 + t_2X_2 + t_3X_3)]\big].$$
The vector $(X_1, X_2, X_3)$ is said to be infinitely divisible if for each $\alpha > 0$, $\psi^{\alpha}(t_1, t_2, t_3)$ is again a ch.f. Obviously this notion can be introduced for random vectors in $\mathbb{R}^n$ with n > 3. We confine ourselves to the three-dimensional case for simplicity. Let us note that if $(X_1, X_2, X_3)$ is infinitely divisible, then each subset of its coordinates $X_1, X_2, X_3$ is infinitely divisible. This follows easily from the properties of the usual one-dimensional infinitely divisible distributions. Thus it is natural to ask whether the converse statement is true.
Consider two independent r.v.s X and Y each N(0, 1). Let
$$Z_1 = X^2, \qquad Z_2 = XY, \qquad Z_3 = Y^2.$$
It is easy to check that each of $Z_1, Z_2, Z_3$ is infinitely divisible. Moreover, each of the two-dimensional random vectors $(Z_1, Z_2)$, $(Z_1, Z_3)$ and $(Z_2, Z_3)$ is also infinitely divisible. However, the vector $(Z_1, Z_2, Z_3)$ is indecomposable; it has a trivariate gamma distribution which is not infinitely divisible. For details we refer the reader to works by Lévy (1948), Griffiths (1970) and Rao (1984).
9.7.
A non-infinitely divisible random vector with infinitely divisible linear combinations of its components
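If (X, Y) is an infinitely divisible random vector, then any linear combination $Z = a_1X + a_2Y$, $a_1, a_2 \in \mathbb{R}^1$, is an infinitely divisible r.v. The question to be considered is whether the converse is true. This problem, posed by C.R. Rao, has been solved by Ibragimov (1972). The following example shows that the answer is negative.
For $x = (x_1, x_2) \in \mathbb{R}^2$, $|x| = (x_1^2 + x_2^2)^{1/2}$ and small $\varepsilon > 0$ define the function
$$a_\varepsilon(x) = \begin{cases} 1, & \text{if } |x| \le \tfrac12 - \varepsilon \text{ or } \tfrac12 + \varepsilon < |x| \le 1, \\ 0, & \text{if } |x| > 1, \\ -\varepsilon, & \text{if } \tfrac12 - \varepsilon < |x| \le \tfrac12 + \varepsilon. \end{cases}$$
Let $A_\varepsilon(U)$, $U \subset \mathbb{R}^2$, be the signed measure with density $a_\varepsilon$, that is
$$A_\varepsilon(U) = \int_U a_\varepsilon(x)\,dx,$$
and also introduce the function
$$\psi_\varepsilon(t) = \exp\Big[\int_{\mathbb{R}^2}(e^{i(t,x)} - 1)\,dA_\varepsilon(x)\Big], \qquad t \in \mathbb{R}^2. \tag{1}$$
For all sufficiently small $\varepsilon > 0$, $\psi_\varepsilon$ is positive definite and hence $\psi_\varepsilon$ is the ch.f. of some d.f. $F_\varepsilon$ in $\mathbb{R}^2$. Indeed, from (1)
$$\psi_\varepsilon(t) = c\Big[1 + \sum_{n=1}^{\infty}\frac{1}{n!}\Big(\int_{\mathbb{R}^2}e^{i(t,x)}a_\varepsilon(x)\,dx\Big)^n\Big]$$
where $c = \exp[-A_\varepsilon(\mathbb{R}^2)]$. Thus $F_\varepsilon$ can be written in the form $F_\varepsilon = c(G_0 + G_\varepsilon)$ where $G_0$ is a probability measure with $G_0(\{0\}) = 1$ and $G_\varepsilon$ is a measure with density $f_\varepsilon(x) = \sum_{n=1}^{\infty}a_\varepsilon^{*n}(x)/n!$. Furthermore we can check that for all small $\varepsilon$,
$$a_\varepsilon^{*2}(x) = \int_{\mathbb{R}^2}a_\varepsilon(x - u)a_\varepsilon(u)\,du \ge 0 \quad \text{and} \quad a_\varepsilon^{*3}(x) \ge 0.$$
Hence for $n \ge 4$ we have $a_\varepsilon^{*n}(x) = a_\varepsilon^{*(n-2)} * a_\varepsilon^{*2}(x) \ge 0$. For small $\varepsilon$, $a_\varepsilon^{*2}(x)$ is close to the function $a_0^{*2}(x)$. It is easy to see that $a_0^{*2}(x)$ is bounded away from zero in a neighbourhood of the circle $|x| = \tfrac12$, that is, $\inf_x a_0^{*2}(x) = c_1 > 0$ there. Thus for small $\varepsilon$, $a_\varepsilon^{*2}(x) > 2\varepsilon$ if $\tfrac12 - \varepsilon < |x| \le \tfrac12 + \varepsilon$. Evidently this implies that $f_\varepsilon(x) \ge 0$ for all $x \in \mathbb{R}^2$. Therefore $F_\varepsilon$ described above is a probability measure in $\mathbb{R}^2$.
Denote by $(X_1, X_2)$ a random vector with d.f. $F_\varepsilon$. Since $A_\varepsilon$ is a signed measure (its values are not only positive), $F_\varepsilon$ cannot be infinitely divisible. It remains to be shown that any linear combination $\alpha_1X_1 + \alpha_2X_2$, $\alpha_1, \alpha_2 \in \mathbb{R}^1$, has a distribution which is infinitely divisible. Indeed, for $s \in \mathbb{R}^1$
$$\phi_\alpha(s) := E\{\exp[is(\alpha_1X_1 + \alpha_2X_2)]\} = \psi_\varepsilon(\alpha_1s, \alpha_2s) = \exp\Big[\int_{\mathbb{R}^2}(e^{is(\alpha,x)} - 1)\,dA_\varepsilon(x)\Big].$$
Denoting $(\alpha, x) = u$ where $u \in \mathbb{R}^1$ we can write $\phi_\alpha(s)$ in the form
$$\phi_\alpha(s) = \exp\Big[\int_{-\infty}^{\infty}(e^{isu} - 1)\,dH_\alpha(u)\Big], \qquad H_\alpha(u) = \int_{(\alpha,x) \le u}dA_\varepsilon(x).$$
Since for sufficiently small $\varepsilon$ every strip $\{x : u_1 \le (\alpha, x) \le u_2\}$ has positive $A_\varepsilon$-measure, we conclude (again see Ibragimov 1972) that the function $H_\alpha(u)$, $u \in \mathbb{R}^1$, is a d.f. and moreover, $\phi_\alpha(s) = \psi_\varepsilon(\alpha_1s, \alpha_2s)$, $s \in \mathbb{R}^1$, is a ch.f. of a distribution which is infinitely divisible. Thus we have established that any linear combination $\alpha_1X_1 + \alpha_2X_2$ is an infinitely divisible r.v. but $(X_1, X_2)$ is not an infinitely divisible vector.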
9.8.
Distributions which are infinitely divisible but not stable
Usually we introduce and study the class of infinitely divisible distributions and then the class of stable distributions. One of the first observed properties is that every stable distribution is infinitely divisible. Let us show that the converse is not always true.
(i) Let X be a r.v. with Poisson distribution, $X \sim Po(\lambda)$, that is
$$P[X = n] = \lambda^n e^{-\lambda}/n!, \qquad n = 0, 1, 2, \ldots$$
where the parameter $\lambda > 0$ is given. If $\phi$ is the ch.f. of X then
$$\phi(t) = \exp[\lambda(e^{it} - 1)], \qquad t \in \mathbb{R}^1. \tag{1}$$
Since $\phi(t) = [\phi_n(t)]^n$ for $\phi_n(t) = \exp[\lambda n^{-1}(e^{it} - 1)]$ and $\phi_n$ is again a ch.f. (of $Po(\lambda/n)$), X is infinitely divisible. However, $\phi$ from (1) does not satisfy any relation of the type $\phi(b_1t)\phi(b_2t) = \phi(bt)e^{i\gamma t}$ (see the introductory notes to this section). This means that the Poisson distribution is not stable despite the fact that it is infinitely divisible.
(ii) Let Y be a r.v. with Laplace distribution, that is, its density g is
$$g(x) = (1/2\lambda)\exp[-|x - \mu|/\lambda], \qquad x \in \mathbb{R}^1$$
where $\mu \in \mathbb{R}^1$, $\lambda > 0$. For the ch.f. $\psi$ of Y we have
$$\psi(t) = (1 + \lambda^2t^2)^{-1}e^{i\mu t}, \qquad t \in \mathbb{R}^1. \tag{2}$$
It is not difficult to verify that $\psi$ is infinitely divisible. But $\psi$ from (2) does not satisfy any relation of the type $\psi(b_1t)\psi(b_2t) = \psi(bt)e^{i\gamma t}$ and hence $\psi$ is not stable. Therefore the Laplace distribution is an example of an absolutely continuous distribution which is infinitely divisible without being stable.
(iii) Suppose the gamma distributed r.v. Z has a density
$$g(x) = (1/\sqrt{2\pi})\,x^{-1/2}e^{-x/2}, \qquad x > 0$$
(and g(x) = 0 for $x \le 0$). Then by using the explicit form of the ch.f. of Z we can show that Z is infinitely divisible but not stable. An additional example of a distribution which is infinitely divisible but not stable is given in Example 21.8.
9.9.
A stable distribution which can be decomposed into two infinitely divisible but not stable distributions
Let X be a r.v. with Cauchy distribution C(1, 0), that is, its density is
$$f(x) = 1/[\pi(1 + x^2)], \qquad x \in \mathbb{R}^1.$$
If $\phi$ denotes the ch.f. of X, then $\phi(t) = e^{-|t|}$, $t \in \mathbb{R}^1$. It is well known that this distribution is stable. Let us show that X can be written as
$$X \stackrel{d}{=} X_1 + X_2 \tag{1}$$
where $X_1$ and $X_2$ are independent r.v.s whose distributions are infinitely divisible but not stable. For this purpose introduce the following two functions:
We claim that $\phi_1$ and $\phi_2$ are ch.f.s of distributions which are infinitely divisible. This follows from the fact that each of $\phi_1$, $\phi_2$ can be expressed in the form $\exp[-\int_0^t\int_0^u \psi(v)\,dv\,du]$ with a suitable integrand $\psi$ and the only assumption is that $\psi$ is a ch.f. Then our conclusion concerning $\phi_1$ and $\phi_2$ is a consequence of a result of Lukacs (1970, Th. 12.2.8). It is easy to verify that $\phi_1$ and $\phi_2$ are not stable ch.f.s. Thus we have
$$\phi(t) = \phi_1(t)\phi_2(t), \qquad t \in \mathbb{R}^1. \tag{2}$$
Now take two independent r.v.s, say $X_1$ and $X_2$, whose ch.f.s are $\phi_1$ and $\phi_2$ respectively. It only remains to see that (2) implies (1). Therefore we have constructed two r.v.s $X_1$ and $X_2$ which are independent, both are infinitely divisible but not stable and they are such that the sum $X_1 + X_2$ has a stable distribution.
SECTION 10.
NORMAL DISTRIBUTION
We say that the r.v. X has a normal distribution with parameters a and $\sigma^2$, $a \in \mathbb{R}^1$, $\sigma > 0$, if X is absolutely continuous and has a density
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big[-\frac{(x - a)^2}{2\sigma^2}\Big], \qquad x \in \mathbb{R}^1. \tag{1}$$
In such a case we use the notation $X \sim N(a, \sigma^2)$. It is easy to write explicitly the d.f. corresponding to (1). Consider the particular case when a = 0, $\sigma = 1$. We obtain the functions
$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\exp(-\tfrac12x^2), \qquad x \in \mathbb{R}^1 \tag{2}$$
and
$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp(-\tfrac12u^2)\,du, \qquad x \in \mathbb{R}^1. \tag{3}$$
These two functions, $\varphi$ and $\Phi$, are called a standard normal density function and a standard normal d.f. respectively. They correspond to a r.v. N(0, 1). Recall that the r.v. $X \sim N(a, \sigma^2)$ has EX = a, $VX = \sigma^2$ and a ch.f. $\phi(t) = \exp(iat - \tfrac12\sigma^2t^2)$. If a = 0, then all odd-order moments are zero, that is, $\alpha_{2n+1} = E[X^{2n+1}] = 0$, while the even-order moments are $\alpha_{2n} = E[X^{2n}] = \sigma^{2n}(2n - 1)!!$.
Consider now the random vector $X = (X_1, \ldots, X_n)$. If $EX_i = a_i$, i = 1, ..., n, then $a = (a_1, \ldots, a_n)$ is called a mean value vector (or vector of the expectation) of X. The matrix $C = (c_{ij})$ where $c_{ij} = E[(X_i - a_i)(X_j - a_j)]$, i, j = 1, ..., n, is called a covariance matrix of X. We say that X has an n-dimensional normal distribution if X possesses a density function
$$f(x_1, \ldots, x_n) = (2\pi)^{-n/2}|D|^{1/2}\exp\Big\{-\tfrac12\sum_{i,j=1}^{n}d_{ij}(x_i - a_i)(x_j - a_j)\Big\}, \qquad (x_1, \ldots, x_n) \in \mathbb{R}^n. \tag{4}$$
Here the matrix $D = (d_{ij})$ is the inverse matrix to C. Clearly, D exists if C is positive definite, and $|D| := \det D$. Note that we could start with the vector $a = (a_1, \ldots, a_n) \in \mathbb{R}^n$ and the symmetric positive definite matrix $C = (c_{ij})$, then invert C to yield the matrix D, and finally use the vector a and the matrix D to write the function f as in (4). This function f is an n-dimensional density and thus there is a random vector, say $(X_1, \ldots, X_n)$, whose density is f. By definition this vector is called normally distributed, and (4) defines an n-dimensional normal density.
For some of the examples below we need the explicit form of (4) when n = 2. The two-dimensional (or bivariate) normal density can be written as
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\exp\Big\{-\frac{1}{2(1 - \rho^2)}\Big[\frac{(x_1 - a_1)^2}{\sigma_1^2} - 2\rho\frac{(x_1 - a_1)(x_2 - a_2)}{\sigma_1\sigma_2} + \frac{(x_2 - a_2)^2}{\sigma_2^2}\Big]\Big\} \tag{5}$$
where $\sigma_1, \sigma_2 > 0$ and $|\rho| < 1$. If $(X_1, X_2)$ is a random vector with density (5) then $EX_1 = a_1$, $EX_2 = a_2$, $VX_1 = \sigma_1^2$, $VX_2 = \sigma_2^2$ and $\rho$ equals the correlation coefficient $\rho(X_1, X_2)$.
The normal distribution over $\mathbb{R}^1$ and $\mathbb{R}^n$ is considered in almost all textbooks and lecture notes. We refer the reader to the books by Anderson (1958), Parzen (1960), Gnedenko (1962), Papoulis (1965), Thomasian (1969), Feller (1971), Laha and Rohatgi (1979), Rao (1984), Shiryaev (1995) and Bauer (1996). In this section we have given various examples which clarify the properties of the normal distribution.
10.1.
Non-normal bivariate distributions with normal marginals
(i) Take two independent r.v.s $\xi_1$ and $\xi_2$, each distributed N(0, 1). Consider the following two-dimensional random vector:
Obviously the distribution of $(X_1, X_2)$ is not bivariate normal, but each of the components $X_1$ and $X_2$ is normally distributed.
(ii) Suppose h(x), $x \in \mathbb{R}^1$, is any odd continuous function vanishing outside the interval [-1, 1] and satisfying the condition $|h(x)| \le (2\pi e)^{-1/2}$. Using the standard normal density $\varphi$, define
$$f(x, y) = \varphi(x)\varphi(y) + h(x)h(y). \tag{1}$$
It is easy to check that f(x, y), $(x, y) \in \mathbb{R}^2$, is a two-dimensional density function and f(x, y) is not bivariate normal, but the marginal densities $f_1(x)$ and $f_2(y)$ are both normal. The function h in (1) can be chosen as follows:
where $I_{[-1,1]}(\cdot)$ is the indicator function of the interval [-1, 1].
(iii) For any number $\varepsilon$, $|\varepsilon| \le 1$, define the function
$$H(x, y) = \Phi(x)\Phi(y)[1 + \varepsilon(1 - \Phi(x))(1 - \Phi(y))], \qquad (x, y) \in \mathbb{R}^2$$
($\Phi$ is the standard normal d.f.). It is easy to check that H is a two-dimensional d.f. with marginal distributions $\Phi(x)$ and $\Phi(y)$ respectively. Obviously, if $\varepsilon \ne 0$, H is non-normal. Another possibility is to take the function
$$h(x, y) = \varphi(x)\varphi(y)[1 + \varepsilon(2\Phi(x) - 1)(2\Phi(y) - 1)].$$
Then h(x, y) is a two-dimensional density function with marginals $\varphi(x)$ and $\varphi(y)$.
(iv) Consider the following function:
$$f(x, y) = \begin{cases} [1/(\pi(1 - \rho^2)^{1/2})]\exp[-\tfrac12(1 - \rho^2)^{-1}(x^2 - 2\rho xy + y^2)], & \text{if } xy \ge 0, \\ 0, & \text{if } xy < 0 \end{cases}$$
where $\rho \in (-1, 1)$. It is easy to verify that f is a two-dimensional density function. Denote by (X, Y) the random vector whose density is f(x, y). Obviously the distribution of (X, Y) is not normal, but each of the components X and Y is distributed N(0, 1).
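The marginal claim in part (iii) can be checked numerically. The sketch below assumes the density $h(x, y) = \varphi(x)\varphi(y)[1 + \varepsilon(2\Phi(x) - 1)(2\Phi(y) - 1)]$, obtained by differentiating H, and integrates out y with a trapezoidal rule; the helper names and the value $\varepsilon = 0.8$ are illustrative.

```python
import math


def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def cdf(x):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def h(x, y, eps):
    """Bivariate density with normal marginals, non-normal when eps != 0."""
    return pdf(x) * pdf(y) * (1 + eps * (2 * cdf(x) - 1) * (2 * cdf(y) - 1))


def marginal_x(x, eps, half_width=10.0, steps=4000):
    """Integrate h(x, y) over y with the trapezoidal rule."""
    step = 2 * half_width / steps
    vals = [h(x, -half_width + j * step, eps) for j in range(steps + 1)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


eps = 0.8  # illustrative value with |eps| <= 1
worst = max(abs(marginal_x(x, eps) - pdf(x)) for x in (-2.0, -0.5, 0.0, 1.0, 3.0))
```

The perturbation integrates to zero in y because $2\Phi(y) - 1$ is odd, so the marginal is standard normal while h itself differs from the product density.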
10.2.
If $(X_1, X_2)$ has a bivariate normal distribution then $X_1$, $X_2$ and $X_1 + X_2$ are normally distributed, but not conversely
Let $(X_1, X_2)$ have a bivariate normal distribution and $f(x_1, x_2)$, $(x_1, x_2) \in \mathbb{R}^2$, be its density. Then each of the r.v.s $X_1$, $X_2$ and $X_1 + X_2$ has a one-dimensional normal distribution. We are interested in whether the converse is true. Suppose $X_1$, $X_2$ are independent r.v.s each distributed N(0, 1). Then their joint density is $f(x_1, x_2) = (2\pi)^{-1}\exp[-\tfrac12(x_1^2 + x_2^2)]$.
[Figure 2: eight equal squares placed symmetrically about the axes with alternating signs (+) and (-), and the pairs of strips $(S_1, S_1')$, $(S_2, S_2')$, $(S_3, S_3')$]
Firstly, let us draw eight equal squares at a fixed distance from the origin O and located symmetrically about the axes $Ox_1$ and $Ox_2$ as shown in figure 2. Put alternately the signs (+) and (-) in the squares. Let the small positive number $\varepsilon$ denote the amount of 'mass' which we transfer from a square with (-) to a square with (+). Now define the function
$$g(x_1, x_2) = \begin{cases} f(x_1, x_2) + \varepsilon, & \text{if } (x_1, x_2) \in Q^+, \\ f(x_1, x_2) - \varepsilon, & \text{if } (x_1, x_2) \in Q^-, \\ f(x_1, x_2), & \text{otherwise} \end{cases} \tag{1}$$
where $Q^+$ is the union of the squares with (+) and $Q^-$ the union of those with (-).
For such squares we can choose $\varepsilon > 0$ sufficiently small such that $g(x_1, x_2) \ge 0$ for all $(x_1, x_2) \in \mathbb{R}^2$. From (1) we find immediately that $\iint_{\mathbb{R}^2}g(x_1, x_2)\,dx_1\,dx_2 = 1$. Hence g is a density function of a two-dimensional random vector, say $(Y_1, Y_2)$. Next we want to find the distributions of $Y_1$, $Y_2$ and $Y_1 + Y_2$. The strips drawn in figure 2 will help us to do this. These strips can be arbitrarily wide and arbitrarily located but parallel to $Ox_1$ or $Ox_2$ or to the bisector of quadrants II and IV. Evidently the strips either do not intersect any of the squares, or each intersects just two of them, one signed by (+) and another by (-). Since the total mass in any strip remains unchanged and we know the distribution of $(X_1, X_2)$ (recall it is normal), we easily conclude that $Y_1 \sim N(0, 1)$, $Y_2 \sim N(0, 1)$, $Y_1 + Y_2 \sim N(0, 2)$. For example, look at the pairs of strips $(S_1, S_1')$, $(S_2, S_2')$, $(S_3, S_3')$. However it is clear that the distribution of $(Y_1, Y_2)$ given by the density (1) is not bivariate normal. Therefore the normality of $Y_1$, $Y_2$ and $Y_1 + Y_2$ is not enough to ensure that $(Y_1, Y_2)$ is normally distributed.
10.3.
A non-normally distributed random vector such that any proper subset of its components consists of jointly normally distributed and mutually independent random variables
We present here two examples based on different ideas. The first one is related to Example 7.3.
(i) Let the r.v. X have a distribution $N(a, \sigma^2)$ and let f be its density. Take n r.v.s, say $X_1, \ldots, X_n$, $n \ge 3$, and define their joint density $g_n$ as follows:
$$g_n(x_1, \ldots, x_n) = \Big[\prod_{i=1}^{n}f(x_i)\Big]\Big[1 + \prod_{j=1}^{n}(x_j - a)f(x_j)\Big], \qquad (x_1, \ldots, x_n) \in \mathbb{R}^n. \tag{1}$$
Firstly we have to check that $g_n$ is a probability density. Since in this case we know f explicitly, $f(x) = (2\pi\sigma^2)^{-1/2}\exp[-(x - a)^2/2\sigma^2]$, we can easily find that $\int_{-\infty}^{\infty}(x - a)f^2(x)\,dx = 0$. Then we can derive that $g_n$ is non-negative and its integral over $\mathbb{R}^n$ is 1. Thus $g_n$, given by (1), is a density and, as we accepted, $g_n$ is the density of the vector $(X_1, \ldots, X_n)$. Let us choose k of the variables $X_1, \ldots, X_n$, $2 \le k \le n - 1$. Without loss of generality assume that $X_1, \ldots, X_k$ is our choice. Denote by $g_k$ the density function of $(X_1, \ldots, X_k)$. From (1) we obtain
$$g_k(x_1, \ldots, x_k) = \prod_{i=1}^{k}f(x_i) \tag{2}$$
(recall that f is the density of X and $X \sim N(a, \sigma^2)$). Therefore the variables $X_1, \ldots, X_k$ are jointly normally distributed and, moreover, they are independent. This conclusion holds for all choices of k variables among $X_1, \ldots, X_n$ and, let us repeat, $2 \le k \le n - 1$. It is also clear that each $X_j$, j = 1, ..., n, has a normal distribution $N(a, \sigma^2)$.
Therefore we have described a set of n r.v.s which, according to (1), are dependent but, as follows from (2), are (n - 1)-wise independent.
(ii) Let $(X_1, \ldots, X_n)$ be an n-dimensional normally distributed random vector. Then its distribution is determined uniquely if we know the distributions of all pairs $(X_i, X_j)$, i, j = 1, ..., n. This observation leads to the question of whether $(X_1, \ldots, X_n)$ is necessarily normal if all the pairs $(X_i, X_j)$ are two-dimensional normal vectors. We shall show that the joint normality of all pairs and even of all (n - 1)-tuples does not imply that $(X_1, \ldots, X_n)$ is normally distributed. (Look at case (i) above.)
Firstly, let $n \ge 3$ and $(\xi_1, \ldots, \xi_n)$ be such that $\xi_j = \pm 1$ and any particular sign vector $(y_1, \ldots, y_n)$ is taken with probability p if $\prod_{j=1}^{n}y_j = +1$ and with probability $q = 2^{-(n-1)} - p$ if $\prod_{j=1}^{n}y_j = -1$. Here $0 \le p \le 2^{-(n-1)}$. It is not difficult to see that all subsets of n - 1 of the r.v.s $\xi_1, \ldots, \xi_n$ are independent (that is, any n - 1 of them are mutually independent). Moreover, if $p \ne 2^{-n}$, all n variables are not mutually independent. Indeed, if $1 \le k \le n - 1$ the vector $(y_{i_1}, \ldots, y_{i_k})$ can be extended in $2^{n-k-1}$ ways to a vector $(y_1, \ldots, y_n)$ with $\prod_{j=1}^{n}y_j = 1$ and in as many ways to one for which $\prod_{j=1}^{n}y_j = -1$. Thus
$$P[\xi_{i_1} = y_{i_1}, \ldots, \xi_{i_k} = y_{i_k}] = 2^{n-k-1}p + 2^{n-k-1}q = 2^{-k}$$
and this equality holds for any $k \le n - 1$. Hence $\xi_1, \ldots, \xi_n$ are (n - 1)-wise independent. Since $P[\xi_1 = 1, \ldots, \xi_n = 1] = p$ it is obvious that $\xi_1, \ldots, \xi_n$ are not independent when $p \ne 2^{-n}$.
Now take $Z_1, \ldots, Z_n$ to be n mutually independent standard normal r.v.s which are independent of the vector $(\xi_1, \ldots, \xi_n)$. Define a new vector $(X_1, \ldots, X_n)$ where $X_j = \xi_j|Z_j|$, j = 1, ..., n. Then clearly the $X_j$ are again standard normal. The independence of the $Z_j$ together with the above reasoning concerning the properties of the vector $(\xi_1, \ldots, \xi_n)$ imply that all subsets of n - 1 of the variables $X_1, \ldots, X_n$ are independent. Thus any (n - 1)-tuple out of $X_1, \ldots, X_n$ has an (n - 1)-dimensional normal distribution. It remains for us to clarify whether all n variables $X_1, \ldots, X_n$ are independent. It is easy to see that
$$P[X_1 > 0, \ldots, X_n > 0] = P[\xi_1 = 1, \ldots, \xi_n = 1] = p.$$
We conclude from this that if $p \ne 2^{-n}$ then the variables $X_1, \ldots, X_n$ are not independent and not normally distributed. Let us note finally that in both cases, (i) and (ii), the joint normality and the mutual independence of any n - 1 of the variables $X_1, \ldots, X_n$ do not imply that the vector $(X_1, \ldots, X_n)$ is normally distributed.
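The sign-vector construction in part (ii) can be enumerated exactly for a small case. The sketch below takes n = 3 and p = 1/5 (illustrative values with $0 \le p \le 2^{-(n-1)}$ and $p \ne 2^{-n}$) and uses exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def sign_law(n, p):
    """Probability of each sign vector: p if the product of signs is +1, else q."""
    q = Fraction(1, 2 ** (n - 1)) - p
    law = {}
    for y in product((1, -1), repeat=n):
        sign = 1
        for v in y:
            sign *= v
        law[y] = p if sign == 1 else q
    return law


def marginal(law, keep):
    """Law of the coordinates whose indices are listed in `keep`."""
    out = {}
    for y, prob in law.items():
        key = tuple(y[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + prob
    return out


n, p = 3, Fraction(1, 5)  # illustrative: any p in [0, 1/4] other than 1/8
law = sign_law(n, p)
pair = marginal(law, (0, 1))
all_plus = law[(1, 1, 1)]
```

Every pair of signs is uniform on its four values, as independence of n - 1 coordinates requires, while the probability that all three signs are positive is p rather than 1/8.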
10.4.
The relationship between two notions: normality and uncorrelatedness
Let (X, Y) be a random vector with normal distribution. Recall that both X and Y are also normally distributed, and if X and Y are uncorrelated, they are independent.
The examples below will show how important the normality of (X, Y) is.
(i) Let $X \sim N(0, 1)$. For a fixed number $c \ge 0$ define the r.v. Y by
$$Y = \begin{cases} X, & \text{if } |X| \le c, \\ -X, & \text{if } |X| > c. \end{cases}$$
It is easy to see that $Y \sim N(0, 1)$ for each c. Further,
$$E[XY] = E[X^2 I(|X| \le c)] - E[X^2 I(|X| > c)].$$
This implies that E[XY] = -1 if c = 0 and $E[XY] \to 1$ as $c \to \infty$. Since E[XY] depends continuously on c, there exists $c_0$ for which $\rho(X, Y) = E[XY] = 0$. In fact, $c_0 \approx 1.54$ is the only solution of the equation $E[XY] = 4\int_0^{c}x^2\varphi(x)\,dx - 1 = 0$ ($\varphi$ is the standard normal density). For this $c_0$ the r.v.s X and Y are uncorrelated. However, $P[X > c, Y > c] = 0 \ne P[X > c]P[Y > c]$ and hence X and Y are not independent.
(ii) Let $\varphi_1(x, y)$ and $\varphi_2(x, y)$, $(x, y) \in \mathbb{R}^2$, be standard bivariate normal densities with correlation coefficients $\rho_1$ and $\rho_2$ respectively. Define
$$f(x, y) = c_1\varphi_1(x, y) + c_2\varphi_2(x, y), \qquad (x, y) \in \mathbb{R}^2$$
where $c_1$, $c_2$ are arbitrary numbers, $c_1, c_2 \ge 0$, $c_1 + c_2 = 1$. One can see that f is non-normal if $\rho_1 \ne \rho_2$. If we denote by (X, Y) a random vector with density f then we can easily find that $X \sim N(0, 1)$, $Y \sim N(0, 1)$. Moreover, the correlation coefficient between X and Y is $\rho = c_1\rho_1 + c_2\rho_2$. Choosing $c_1, c_2, \rho_1, \rho_2$ such that $c_1\rho_1 + c_2\rho_2 = 0$, we obtain two normally distributed and uncorrelated r.v.s X and Y. However, they are not independent.
(iii) Let (X, Y) be a two-dimensional random vector with density
$$f(x, y) = (2\pi\sqrt{3})^{-1}\big\{\exp[-\tfrac23(x^2 + xy + y^2)] + \exp[-\tfrac23(x^2 - xy + y^2)]\big\}, \qquad (x, y) \in \mathbb{R}^2.$$
Obviously the distribution of (X, Y) is not bivariate normal. Direct calculation shows that $X \sim N(0, 1)$, $Y \sim N(0, 1)$ and E[XY] = 0. Thus X and Y are uncorrelated but dependent.
(iv) Let $X = \xi_1 + i\xi_2$ and $Y = \xi_3 + i\xi_4$ where $i = \sqrt{-1}$ and $(\xi_1, \xi_2, \xi_3, \xi_4)$ is a normally distributed random vector with zero mean and covariance matrix
$$C = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}.$$
The reader can check that C is a covariance matrix. Since X and Y are complex-valued, their covariance is
$$E[X\bar{Y}] = E[\xi_1\xi_3 + \xi_2\xi_4] + iE[\xi_2\xi_3 - \xi_1\xi_4] = 0 + i(-1 + 1) = 0.$$
Hence X and Y are uncorrelated. Let us see if they are independent. If so, then $\xi_1$ and $\xi_4$ would be independent, and thus uncorrelated. But $E[\xi_1\xi_4] = -1$. This contradiction shows that X and Y are dependent.
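The value $c_0 \approx 1.54$ in part (i) can be reproduced with a few lines of bisection. The sketch below uses the closed form $\int_0^c x^2\varphi(x)\,dx = \Phi(c) - \tfrac12 - c\varphi(c)$, which follows from integration by parts; the function names and the bracketing interval [0, 5] are illustrative assumptions.

```python
import math


def exy(c):
    """E[XY] = 4 * integral_0^c x^2 phi(x) dx - 1, evaluated in closed form."""
    big_phi = 0.5 * (1 + math.erf(c / math.sqrt(2)))
    small_phi = math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi)
    return 4 * (big_phi - 0.5 - c * small_phi) - 1


def solve_c0(lo=0.0, hi=5.0, tol=1e-12):
    """Bisection for the root of E[XY] as a function of c (increasing in c)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exy(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c0 = solve_c0()
```

The root is unique because the integrand $x^2\varphi(x)$ is positive, so E[XY] increases strictly from -1 at c = 0 towards 1.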
10.5.
It is possible that X, Y, X + Y, X - Y are each normally distributed, X and Y are uncorrelated, but (X, Y) is not bivariate normal
Consider the following function:
$$f(x, y) = (2\pi)^{-1}\exp[-\tfrac12(x^2 + y^2)]\big\{1 + xy(x^2 - y^2)\exp[-\tfrac12(x^2 + y^2) - c]\big\}$$
where $(x, y) \in \mathbb{R}^2$ and the constant c > 0 is chosen in such a way that
$$|xy(x^2 - y^2)|\exp[-\tfrac12(x^2 + y^2) - c] \le 1 \quad \text{for all } (x, y) \in \mathbb{R}^2.$$
In order to establish that f is a two-dimensional probability density function and then derive some other properties, it is best to find first the Fourier transform $\phi$ of f. We have
$$\phi(s, t) = \iint_{\mathbb{R}^2}\exp(isx + ity)f(x, y)\,dx\,dy = \exp[-\tfrac12(s^2 + t^2)] + \tfrac{1}{32}st(s^2 - t^2)\exp[-c - \tfrac14(s^2 + t^2)], \qquad (s, t) \in \mathbb{R}^2.$$
From this we deduce the following conclusions.
(1) Since $\phi(0, 0) = 1$, f(x, y) is the density of a two-dimensional vector (X, Y).
(2) $\phi(t, 0) = \phi(0, t) = \exp(-\tfrac12t^2)$, hence $X \sim N(0, 1)$, $Y \sim N(0, 1)$.
(3) $\phi(t, t) = \exp(-t^2)$, that is $X + Y \sim N(0, 2)$.
(4) X - Y is also normally distributed.
(5) X and Y are uncorrelated.
However, the random vector (X, Y) as defined by the density f(x, y) is not bivariate normal despite the fact that properties (2)-(5) are satisfied.
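Conclusions (2)-(4) amount to the perturbation term of the ch.f. vanishing on the four lines s = 0, t = 0, s = t and s = -t. The sketch below assumes the form $\phi(s, t) = \exp[-\tfrac12(s^2 + t^2)] + \tfrac{1}{32}st(s^2 - t^2)\exp[-c - \tfrac14(s^2 + t^2)]$ with the illustrative value c = 1; the function name is hypothetical.

```python
import math


def phi(s, t, c=1.0):
    """Assumed ch.f.: Gaussian part plus a term vanishing on s=0, t=0, s=+-t."""
    gauss = math.exp(-0.5 * (s * s + t * t))
    bump = s * t * (s * s - t * t) * math.exp(-c - 0.25 * (s * s + t * t)) / 32
    return gauss + bump


ts = (0.3, 1.0, 2.0)
on_axes = max(abs(phi(t, 0.0) - math.exp(-0.5 * t * t)) for t in ts)
on_diagonal = max(abs(phi(t, t) - math.exp(-t * t)) for t in ts)
on_antidiagonal = max(abs(phi(t, -t) - math.exp(-t * t)) for t in ts)
off_lines = abs(phi(2.0, 1.0) - math.exp(-2.5))
```

Away from those four lines the extra term is non-zero, which is why the joint ch.f. is not that of a bivariate normal law.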
10.6.
If the distribution of $(X_1, \ldots, X_n)$ is normal, then any linear combination and any subset of $X_1, \ldots, X_n$ is normally distributed, but there is a converse statement which is not true
This example can be considered as a natural continuation of Examples 10.2 and 10.5. Let us introduce the function
$$f_\varepsilon(x_1, \ldots, x_n) = (2\pi)^{-n/2}\Big[\prod_{k=1}^{n}\varphi_0(x_k)\Big]\Big[1 + \varepsilon(x_1^2 - x_2^2)\prod_{k=1}^{n}x_k I_{(-1,1)}(x_k)\exp(\tfrac12x_k^2)\Big] \tag{1}$$
where $(x_1, \ldots, x_n) \in \mathbb{R}^n$, $\varphi_0(x) = \exp(-\tfrac12x^2)$, $I_{(-1,1)}$ is the indicator function of the interval (-1, 1) and the constant $\varepsilon$ is chosen such that
$$\Big|\varepsilon(x_1^2 - x_2^2)\prod_{k=1}^{n}x_k I_{(-1,1)}(x_k)\exp(\tfrac12x_k^2)\Big| \le 1 \quad \text{for all } (x_1, \ldots, x_n) \in \mathbb{R}^n.$$
Under this condition we can check that $f_\varepsilon$ is a density of some n-dimensional random vector, say $(X_1, \ldots, X_n)$. Evidently the density $f_\varepsilon$ defined by (1) is not normal. Now let us derive some statements for the distributions of the components of $(X_1, \ldots, X_n)$. For this purpose we find the ch.f. $\phi$ of $f_\varepsilon$ explicitly, namely
where
$$\psi(t) = \begin{cases} (2i/t^2)(\sin t - t\cos t), & \text{if } t \ne 0, \\ 0, & \text{if } t = 0, \end{cases} \qquad \theta(t) = \begin{cases} (2i/t) + (6/t^2)\psi(t), & \text{if } t \ne 0, \\ 0, & \text{if } t = 0. \end{cases}$$
From (2) one can draw the following conclusions.
(a) Each of the components $X_1, \ldots, X_n$ is distributed N(0, 1).
(b) For each k, k < n, the vector $(X_{i_1}, \ldots, X_{i_k})$ is normally distributed.
(c) If $U = X_1 \pm X_2$ and V is any linear combination of the variables $X_3, \ldots, X_n$, then U + V is normally distributed.
(d) If $a_1, \ldots, a_n$ are real numbers such that $a_k \ne 0$ for k = 1, ..., n and $|a_1| \ne |a_2|$, then $\sum_{k=1}^{n}a_kX_k$ is not normally distributed.
(e) $E[\prod_{k=1}^{n}X_k] = 0$.
For the particular case n = 2 (which can be compared with Example 10.5) we obtain that: (a) implies that $X_1$ and $X_2$ have standard normal distribution; (c) implies that $X_1 + X_2$ and $X_1 - X_2$ are normally distributed; (e) implies that $X_1$ and $X_2$ are uncorrelated. However $(X_1, X_2)$ is not normal, which follows from (d).
Return again to the general case. Let $U = X_1 \pm X_2$, $U_1$ be a linear combination of any k of the variables $X_3, X_4, \ldots, X_n$, $0 \le k \le n - 2$, and Y be a linear combination of the remaining n - k - 2 of these variables. Then the r.v.s $X = U + U_1$ and Y are independent and normally distributed. Indeed, X and Y are uncorrelated and normal r.v.s and (c) implies that a countably infinite number of distinct linear combinations of them are distributed normally.
10.7.
The condition characterizing the normal distribution by normality of linear combinations cannot be weakened
Let us start with the formulation of the following result (Hamedani and Tata 1975). Suppose {( ak, bk ), k = 1,2, ... } is a countable 'distinct' sequence in ~? such that for each k, akX + bkY is a normal r.v. Then (X, Y) has a bivariate normal distribution. (Here 'distinct' means that the parametric equations t, = akt, t2 = bkt represent an infinite number of lines in ]R2.) We are now interested in whether the condition of this theorem can be weakened. More precisely, let X and Y be r.v.s satisfying the following condition:
(ak,bk),k = 1, ... ,N, N a fixed natural number, the linear combinations ak X + bkY, k = 1, ... , N are normally distributed.
(CN) for given N pairs
The question is of whether (CN) implies that (X, Y) has a bivariate normal distribution. To see this, consider the following function:
>(8, t) = exp [-
[g
~(8' + t')1+ exp [-E - ~c(s' + t') 1 (b~8' - a~t')1
where s, t E ]R', c, c E ]R+. Firstly, we shall show that for a suitable choice of c and c, ¢(s, t) is the ch.f. of some two-dimensional distribution. Indeed, denoting by I(x, y) the inverse Fourier transform of ¢(s, t), we obtain:
I(x,y)
=
(27r)-2/L2 exp(-isx-ity)¢(s,t)dsdt
= (27r)-' exp [- ~(x2
he ,c(x, y) .- (27r)-2e-e X
fL2
+ y2)] + he,c(x, y),
exp( -isx - ity) exp [- ~C(s2
+ t 2 )]
n~=, (b~S2 - a~t2) ds dt.
Further, we need an estimate for the function he ,c which has just been introduced. It can be shown (Hamedani and Tata 1975) that for suitably chosen constants c and c (1)
Ihe,c(x,y)l::; (27r)-' exp [-i(x 2 +y2)]
for all (x,y) E
]R2.
Since ¢(s, t) is a continuous function, ¢(O, 0) = I and I(x, y) is real-valued, we conclude that I and ¢ is a pair of functions where I is a two-dimensional density and ¢ its corresponding ch.f. Denote by (~, 17) the random vector whose density is f. Further, the definition of ¢ immediately implies that
¢(akt,bkt) = exp [-~(a~ +b~)t2],
k = I, ... ,N.
Obviously, this means that the r.v. ak~ +b k17 is normally distributed as N(O, a~ +b~), k = 1, ... N. However, ¢ itself is not the ch.f. of a bivariate normal distribution.
Therefore we have constructed a pair of r.v.s $\xi$ and $\eta$ for which condition $(C_N)$ is satisfied, but $(\xi, \eta)$ is not normal. Thus $(C_N)$ is not enough for normality of $(\xi, \eta)$. It should be noted that condition (1) holds only if $N$ is finite.
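The reason why $\phi$ reduces to a Gaussian ch.f. along each of the $N$ lines can be written out in one line. The display below is only a sketch; it assumes that the product in the definition of $\phi$ runs over the factors $(b_j^2 s^2 - a_j^2 t^2)$, $j = 1, \ldots, N$, and substitutes $s = a_kt$, $t = b_kt$:

$$\phi(a_kt, b_kt) = e^{-\frac{1}{2}(a_k^2 + b_k^2)t^2} + e^{-\varepsilon - \frac{1}{2}c(a_k^2 + b_k^2)t^2} \prod_{j=1}^{N} (b_j^2 a_k^2 - a_j^2 b_k^2)\,t^2 .$$

The factor with $j = k$ equals $(b_k^2 a_k^2 - a_k^2 b_k^2)\,t^2 = 0$, so the whole product vanishes and only the Gaussian term survives. Away from these $N$ lines the product is in general non-zero, which is why $\phi$ is not the ch.f. of a bivariate normal distribution.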
10.8. Non-normal distributions such that all or some of the conditional distributions are normal
(i) Let $f(x, y)$, $(x, y) \in \mathbb{R}^2$ be a bivariate normal density. Then it is easy to check that each of the conditional densities $f_1(x|y)$ and $f_2(y|x)$ is normal. This observation leads naturally to the question of whether the converse statement is true. We shall show that the answer is negative. Consider the following function:
$$g(x, y) = C \exp[-(1 + x^2)(1 + y^2)], \quad (x, y) \in \mathbb{R}^2. \tag{1}$$
Here $C > 0$ is a norming constant such that $\iint_{\mathbb{R}^2} g(x, y)\,dx\,dy = 1$. A standard calculation shows that the conditional densities $g_1(x|y)$ and $g_2(y|x)$ of $g(x, y)$ are expressed as follows:
$$g_1(x|y) = (2\pi\sigma_y^2)^{-1/2} \exp(-x^2/2\sigma_y^2), \qquad g_2(y|x) = (2\pi\sigma_x^2)^{-1/2} \exp(-y^2/2\sigma_x^2)$$
where $\sigma_y^2 = 1/(2(1 + y^2))$, $\sigma_x^2 = 1/(2(1 + x^2))$, $x \in \mathbb{R}^1$, $y \in \mathbb{R}^1$. Obviously $g_1(x|y)$ and $g_2(y|x)$ are normal densities of $N(0, \sigma_y^2)$ and $N(0, \sigma_x^2)$ respectively. However, $g(x, y)$ given by (1) is not a two-dimensional normal density. Therefore the normality of the conditional densities does not imply that the two-dimensional density is normal.
Let us note that similar properties hold for any (non-normal) density of the type
$$\tilde g(x, y) = C \exp\left[-\sum_{i,j=0}^{2} b_{ij}\,x^i y^j\right]$$
(for details see Castillo and Galambos (1987, 1989)). One particular case of $\tilde g$ is the function $g$ given by (1).
(ii) Consider now another interesting situation. Let $\xi$ be a r.v. distributed uniformly on the interval $[0, 1]$ and $\eta_1, \eta_2, \eta_3, \eta_4, \eta_5$ be independent r.v.s each with distribution $N(0, 1)$. Suppose additionally that $\xi$ and $\eta_k$, $k = 1, \ldots, 5$, are independent. Define the r.v.s
It is then not difficult to check that each of $X_1, X_2, X_3$ has a standard normal distribution $N(0, 1)$. Further, if $\phi_{3|1,2}(t)$ denotes the conditional ch.f. of $X_3$ given $X_1$,
$X_2$, then we find
$$\phi_{3|1,2}(t) = \exp(-\tfrac{1}{2}t^2)$$
and hence $X_3$, conditionally on $X_1$ and $X_2$, has a normal distribution $N(0, 1)$. Given these properties one might conjecture that the vector $(X_1, X_2, X_3)$ has a trivariate normal distribution. Let us check whether or not this is correct. For this purpose we compute the joint ch.f. $\psi(t_1, t_2)$ of $X_1$ and $X_2$ as follows:
$$\begin{aligned}
\psi(t_1, t_2) &= E\{\exp(it_1X_1 + it_2X_2)\} = E\{E[\exp(it_1X_1 + it_2X_2)\,|\,\xi]\} \\
&= E\{\exp[-\tfrac{1}{2}t_1^2\xi - \tfrac{1}{2}t_2^2\xi - \tfrac{1}{2}(t_1 + t_2)^2(1 - \xi)]\} \\
&= \exp[-\tfrac{1}{2}(t_1 + t_2)^2]\,E[\exp(t_1t_2\xi)] = (t_1t_2)^{-1}(e^{t_1t_2} - 1)\,e^{-\frac{1}{2}(t_1 + t_2)^2}.
\end{aligned}$$
This form of the ch.f. $\psi(t_1, t_2)$ shows that the distribution of $(X_1, X_2)$ is not bivariate normal. Therefore the vector $(X_1, X_2, X_3)$ cannot have a trivariate normal distribution despite the normality of each of the components $X_1, X_2, X_3$ and the conditional normality of $X_3$ given $X_1, X_2$. Note that under some additional assumptions the conditional normality of the components of a random vector will imply the normality of the vector itself (see Ahsanullah 1985; Ahsanullah and Sinha 1986; Bischoff and Fieger 1991; Hamedani 1992).
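Returning to part (i), both the conditional normality and the non-normality of $g(x, y) = C \exp[-(1 + x^2)(1 + y^2)]$ can be read off from a single factorization. The derivation below is a sketch that uses only this definition of $g$; the symbol $g_Y$ for the marginal density of the second coordinate is introduced here purely for illustration.

$$g(x, y) = C\,e^{-(1 + y^2)}\,e^{-(1 + y^2)x^2}, \qquad g_Y(y) = \int_{-\infty}^{\infty} g(x, y)\,dx = C\sqrt{\pi}\,\frac{e^{-(1 + y^2)}}{\sqrt{1 + y^2}},$$

$$g_1(x|y) = \frac{g(x, y)}{g_Y(y)} = \sqrt{\frac{1 + y^2}{\pi}}\;e^{-(1 + y^2)x^2}.$$

For fixed $y$ the last expression is the $N(0, 1/(2(1 + y^2)))$ density. The marginal $g_Y$, however, carries the extra factor $(1 + y^2)^{-1/2}$ and is therefore not a normal density, whereas every bivariate normal distribution has normal marginals.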
10.9. Two random vectors with the same normal distribution can be obtained in different ways from independent standard normal random variables
Recall that there are a few equivalent definitions of a multi-variate normal distribution. According to one of them a set of r.v.s $X_1, \ldots, X_N$ with zero means is said to have a multi-variate normal distribution if these variables are linear combinations of independent r.v.s, say $\xi_1, \ldots, \xi_M$, where $\xi_j \sim N(0, 1)$, $j = 1, \ldots, M$. That is, we have
$$X_i = \sum_{j=1}^{M} c_{ij}\,\xi_j, \quad i = 1, \ldots, N. \tag{1}$$
Note that there is no restriction on $M$. It may be possible that $M < N$, $M = N$ or $M > N$.
Suppose we are given the r.v.s $\xi_1, \ldots, \xi_M$ which are independent and distributed $N(0, 1)$. Any fixed matrix $(c_{ij})$ generates by (1) a random vector with a multi-variate normal distribution. Then the natural question which arises is whether different matrices generate random vectors with different (multi-variate normal) distributions. To find the answer we need the following result (see Breiman 1969): if both random vectors $(X_1, \ldots, X_N)$ and $(Y_1, \ldots, Y_N)$ have a multi-variate normal distribution with the same mean and the same covariance matrix, then they have the same distribution.
Now we shall use this result to answer the above question. According to (1) each of the vectors $(X_1, \ldots, X_N)$ and $(Y_1, \ldots, Y_N)$ is a transformation of independent $N(0, 1)$ r.v.s obtained by using a matrix. Thus the question to be considered is whether this matrix is unique for both vectors. By a simple example we can show that the answer is negative. Take $\xi_1$ and $\xi_2$ to be independent $N(0, 1)$ r.v.s and let
$$X_1 = \xi_1 + \xi_2, \qquad X_2 = 2\xi_1 + \xi_2.$$
Define also
$$Y_1 = \sqrt{2}\,\xi_1, \qquad Y_2 = (3/\sqrt{2})\,\xi_1 + (1/\sqrt{2})\,\xi_2.$$
Thus we obtain two random vectors, $(X_1, X_2)$ and $(Y_1, Y_2)$. It is easy to see that $(X_1, X_2)$ has zero mean and covariance matrix $\begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix}$. Further, $(Y_1, Y_2)$ has zero mean and the same covariance matrix. Moreover, both vectors, $(X_1, X_2)$ and $(Y_1, Y_2)$, are multi-variate normal. By the above result, $(X_1, X_2)$ and $(Y_1, Y_2)$ have the same distribution. However, as we have seen, these identically distributed vectors are obtained in quite different ways from independent $N(0, 1)$ r.v.s.
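The claim that both vectors share the covariance matrix with rows (2, 3) and (3, 5) can be checked mechanically: if a vector is obtained from independent N(0, 1) r.v.s through a coefficient matrix C, its covariance matrix is C Cᵀ. The following is a minimal sketch in plain Python; the helper name `covariance_from_coefficients` and the variable names are illustrative choices, not part of the original text.

```python
from math import sqrt, isclose

def covariance_from_coefficients(C):
    """Covariance matrix C C^T of X = C xi, where xi has i.i.d. N(0, 1) entries."""
    n = len(C)
    return [[sum(C[i][m] * C[j][m] for m in range(len(C[i]))) for j in range(n)]
            for i in range(n)]

# Coefficient matrices read off from the definitions of (X1, X2) and (Y1, Y2).
C_X = [[1.0, 1.0],
       [2.0, 1.0]]
C_Y = [[sqrt(2.0), 0.0],
       [3.0 / sqrt(2.0), 1.0 / sqrt(2.0)]]

cov_X = covariance_from_coefficients(C_X)
cov_Y = covariance_from_coefficients(C_Y)

# The two covariance matrices should agree entry by entry (up to rounding).
same_covariance = all(
    isclose(cov_X[i][j], cov_Y[i][j], rel_tol=1e-12)
    for i in range(2) for j in range(2)
)
print(cov_X, cov_Y, same_covariance)
```

The two coefficient matrices are clearly different (one is lower triangular, the other is not), yet they produce the same covariance matrix and hence, by the result quoted above, the same normal distribution.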
10.10. A property of a Gaussian system may hold even for discrete random variables
A set of r.v.s $\{\xi_1, \ldots, \xi_n\}$ is said to be a Gaussian system if any of its subsets has a Gaussian (normal) distribution. Suppose for convenience that each $\xi_j$ has zero mean and denote $c_{ij} = \mathrm{cov}(\xi_i, \xi_j) = E[\xi_i\xi_j]$. Then for an arbitrary choice of four indices $i, j, k, l$ (including any possible number of coincidences) the following relation holds:
$$E[\xi_i\xi_j\xi_k\xi_l] = c_{ij}c_{kl} + c_{ik}c_{jl} + c_{il}c_{jk}. \tag{1}$$
Note that a similar property is satisfied also for a larger even number of variables chosen from the given Gaussian system (including coincidences of indices). To prove such a property it is enough to use the ch.f. of the random vector whose components are involved in the product.
The above property has some useful applications but it is also of independent interest. It is natural to ask if this property holds for Gaussian systems only. If the answer were positive, then (1) would be a property characterizing the given system of r.v.s as Gaussian. It turns out, however, that this is not the case. Here is a simple illustration. Consider the sequence $\eta_1, \eta_2, \ldots$ of i.i.d. r.v.s such that
$$P[\eta_1 = -\sqrt{3}] = P[\eta_1 = \sqrt{3}] = 1/6, \qquad P[\eta_1 = 0] = 2/3.$$
It is easy to see that for all choices of indices (including possible coincidences)
$$E\eta_i = 0, \qquad E[\eta_i\eta_j] = c_{ij} \qquad \text{and} \qquad E[\eta_i\eta_j\eta_k] = 0,$$
where $c_{ij} = 1$ if $j = i$ and $c_{ij} = 0$ if $j \ne i$. Direct calculations show that
$$E[\eta_i\eta_i\eta_i\eta_i] = 3, \qquad E[\eta_i\eta_i\eta_j\eta_j] = 1 \quad \text{for } j \ne i$$
and
$$E[\eta_i\eta_j\eta_k\eta_l] = 0$$
for all other choices of indices. All these facts taken together justify the following relation (compare with (1)):
$$E[\eta_i\eta_j\eta_k\eta_l] = c_{ij}c_{kl} + c_{ik}c_{jl} + c_{il}c_{jk}$$
which is valid for arbitrary indices $i, j, k, l$. Hence (1) is satisfied for a collection of r.v.s whose distribution is very far from Gaussian.
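The "direct calculations" mentioned above can be reproduced exactly by enumerating the finitely many outcomes. The sketch below is only an illustration: it writes each η as s·√3 with s in {-1, 0, 1}, so that a product of four η's equals 9 times the product of the signs, and uses exact rational arithmetic. The function names `fourth_moment` and `gaussian_rhs` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Each eta equals s * sqrt(3) with s in {-1, 0, 1} and these probabilities.
SIGN_PROBS = {-1: Fraction(1, 6), 0: Fraction(2, 3), 1: Fraction(1, 6)}

def fourth_moment(i, j, k, l, n_vars=4):
    """Exact E[eta_i eta_j eta_k eta_l] for i.i.d. eta's, by full enumeration."""
    total = Fraction(0)
    for signs in product(SIGN_PROBS, repeat=n_vars):
        prob = Fraction(1)
        for s in signs:
            prob *= SIGN_PROBS[s]
        # sqrt(3)**4 == 9, so the product of four eta's is 9 * (product of signs).
        total += prob * 9 * signs[i] * signs[j] * signs[k] * signs[l]
    return total

def gaussian_rhs(i, j, k, l):
    """Right-hand side c_ij c_kl + c_ik c_jl + c_il c_jk with c_ij = 1 iff i == j."""
    c = lambda a, b: 1 if a == b else 0
    return c(i, j) * c(k, l) + c(i, k) * c(j, l) + c(i, l) * c(j, k)

# Compare both sides for every choice of four indices among four variables.
all_match = all(
    fourth_moment(*idx) == gaussian_rhs(*idx)
    for idx in product(range(4), repeat=4)
)
print(all_match)
```

Four variables suffice here because a product of four factors involves at most four distinct indices.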
SECTION 11. THE MOMENT PROBLEM
Let $\{\alpha_0 = 1, \alpha_1, \alpha_2, \ldots\}$ be a sequence of real numbers and $I$ be a fixed interval, $I \subset \mathbb{R}^1$. Suppose there is at least one d.f. $F(x)$, $x \in I$, such that
$$\alpha_n = \int_I x^n\,dF(x), \quad n = 0, 1, 2, \ldots.$$
If $F$ is uniquely specified by $\{\alpha_n\}$ we say that the moment problem is determinate (the distribution is uniquely determined by its moments). Otherwise the moment problem is indeterminate. Note that the moment problem in the case $I = [0, \infty)$ is called the Stieltjes moment problem, while in the case $I = (-\infty, \infty)$ we speak of the Hamburger moment problem.
There are sufficient conditions for the moment problem to be determinate or indeterminate. Let us formulate the following three criteria.
Criterion (C1). Let $F(x)$, $x \in \mathbb{R}^1$ be a d.f. whose ch.f. $\phi(t)$, $t \in \mathbb{R}^1$ is $r$-analytic for some $r > 0$. Then $F$ is uniquely determined by its moment sequence $\{\alpha_n\}$ where $\alpha_n = \int_{-\infty}^{\infty} x^n\,dF(x)$. Further, the ch.f. $\phi$ is $r$-analytic for some $r > 0$ iff
$$\limsup_{n \to \infty} \frac{(\alpha_{2n})^{1/(2n)}}{2n} < \infty.$$
(Equivalently, the m.g.f. $M(t) = E[e^{tX}]$ exists for $|t| < t_0$, $t_0 > 0$.)
Criterion (C2). Let $\{\alpha_0 = 1, \alpha_1, \alpha_2, \ldots\}$ be the moments of a d.f. $F(x)$, $x \in \mathbb{R}^1$ and let
$$\sum_{n=1}^{\infty} (\alpha_{2n})^{-1/(2n)} = \infty \quad \text{(Carleman condition)}. \tag{1}$$
Then $F$ is uniquely determined by $\{\alpha_n\}$. If the d.f. $F$ has as support the interval $[0, \infty)$ (instead of $(-\infty, \infty)$) then a sufficient condition for uniqueness is $\sum_{n=1}^{\infty} (\alpha_n)^{-1/(2n)} = \infty$.
Criterion (C3). (a) Suppose the d.f. $F(x)$, $x \in \mathbb{R}^1$ is absolutely continuous with density $f(x) > 0$, $x \in \mathbb{R}^1$ and let $F$ have moments of all orders. If
$$\int_{-\infty}^{\infty} \frac{-\log f(x)}{1 + x^2}\,dx < \infty \quad \text{(Krein condition)} \tag{2a}$$
then the distribution $F$ is indeterminate.
(b) Let the d.f. $F(x)$, $x \in \mathbb{R}^+$ ($F(0) = 0$) be absolutely continuous with density $f(x) > 0$, $x > 0$ ($f(x) = 0$, $x \le 0$) and let $F$ have moments of all orders. If
$$\int_0^{\infty} \frac{-\log f(x^2)}{1 + x^2}\,dx < \infty \quad \text{(Krein condition)} \tag{2b}$$
then the distribution $F$ is indeterminate.
The proof of criteria (C1) and (C2) can be found in Shohat and Tamarkin (1943). Criterion (C3) was suggested by Krein (1944) and discussed intensively by Akhiezer (1965) and Berg (1995). In these sources, as well as in Kendall and Stuart (1958), Feller (1971), Chow and Teicher (1978) and Shiryaev (1995), the reader will find discussions of these and other related topics.
The examples in this section clarify the role of the conditions which guarantee the uniqueness of the moment problem and reveal the relationships between different sufficient conditions.
11.1. The moment problem for powers of the normal distribution
If $\xi$ is a r.v., $\xi \sim N(a, \sigma^2)$, then the distribution of $\xi$ (the normal distribution) as well as that of $\xi^2$ ($\chi^2$-distribution) are uniquely determined by the corresponding moment sequences. These facts are well known but they can also be easily checked by, e.g., the Carleman criterion. Thus a reasonable question is: what can we say about higher powers of $\xi$? It turns out 'the picture' changes even for $\xi^3$.
The first observation is that all moments $E[(\xi^3)^k]$, $k = 1, 2, \ldots$, exist, however $E[\exp(t\xi^3)]$ exists only if $t = 0$. Hence the m.g.f. does not exist and Criterion (C1) cannot be applied; this suggests, but does not by itself prove, that the distribution of $\xi^3$ is indeterminate. The case $\xi^3$ allows us to make a more detailed analysis. For let us take a r.v. $\eta \sim N(0, \tfrac{1}{2})$ whose density is $\pi^{-1/2} \exp(-x^2)$, $x \in \mathbb{R}^1$. Then the new r.v. $X = \eta^3$ has the following density:
$$f(x) = \frac{1}{3\sqrt{\pi}}\,|x|^{-2/3} \exp(-|x|^{2/3}), \quad x \in \mathbb{R}^1,\ x \ne 0.$$
By using some standard integrals ($\int_0^{\infty} (1 + x^2)^{-1}\,dx = \pi/2$, $\int_0^{\infty} [(\log x)/(1 + x^2)]\,dx = 0$, $\int_0^{\infty} [x^{\delta}/(1 + x^2)]\,dx = \pi/(2\cos(\delta\pi/2))$, $-1 < \delta < 1$) we can easily conclude that
$$\int_{-\infty}^{\infty} [-\log f(x)/(1 + x^2)]\,dx < \infty.$$
Hence, according to the Krein criterion (C3), the distribution of the r.v. $X = \eta^3$ is not determined uniquely by the moment sequence $\{\alpha_k = E[X^k], k = 1, 2, \ldots\}$. In a similar way we can show that the moment problem is indeterminate for the distribution of any r.v. $\eta^{2n+1}$, $n = 1, 2, \ldots$. Let us return to the r.v. $X = \eta^3$. Knowing that the distribution of $\eta^3$ is indeterminate, we should like to describe explicitly another r.v. with the same moments as those of $\eta^3$. One possible way to do this is to consider the following function:
It can be shown that for $\varepsilon \in [-\tfrac{1}{2}, \tfrac{1}{2}]$, $f_\varepsilon(x)$, $x \in \mathbb{R}^1$, is a probability density function. Denote by $X_\varepsilon$ a r.v. having $f_\varepsilon$ as its density. Obviously $f_\varepsilon \ne f$, except for the trivial case $\varepsilon = 0$. Our further reasoning is based on the equality
This immediately implies that
$$E[X_\varepsilon^k] = E[X^k], \quad k = 1, 2, \ldots,$$
despite the fact that $X_\varepsilon$ and $X$ are r.v.s with different distributions, since for their densities we have $f_\varepsilon \ne f$, $\varepsilon \in [-\tfrac{1}{2}, \tfrac{1}{2}]$ (except $\varepsilon = 0$).
It is interesting (and even curious) to note that the distribution of the absolute value $|X|$ is determinate! Indeed, the r.v. $|X| = |\eta|^3$, where $\eta \sim N(0, \tfrac{1}{2})$, has a density $(2/(3\sqrt{\pi}))\,x^{-2/3} \exp(-x^{2/3})$ for $x > 0$ (and 0 for $x \le 0$). Then for the moment $\alpha_k = E[|X|^k]$ of order $k$, $k = 1, 2, \ldots$, we have $\alpha_k = (1/\sqrt{\pi})\,\Gamma((3k + 1)/2)$. For large $k$, $\alpha_k^{1/k} \sim c\,k^{3/2}$, $c$ = constant, implying that $\sum_{k=1}^{\infty} (\alpha_k)^{-1/(2k)} = \infty$, i.e. the Carleman condition is satisfied. Therefore in this case the moment problem is determinate.
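The growth rate used in the last step can be inspected numerically. The sketch below assumes only the moment formula α_k = Γ((3k + 1)/2)/√π stated above and works with logarithms via `math.lgamma` to avoid overflow. The constant (2e/3)^(3/4) is what Stirling's formula suggests for the limit of α_k^(-1/(2k))·k^(3/4); it is included as an assumption to be checked rather than as a quoted result, and the function names are illustrative.

```python
from math import e, exp, lgamma, log, pi

def log_moment(k):
    """log of alpha_k = Gamma((3k + 1) / 2) / sqrt(pi), the k-th moment of |eta|^3."""
    return lgamma((3 * k + 1) / 2) - 0.5 * log(pi)

def carleman_term(k):
    """The k-th term alpha_k ** (-1 / (2k)) of the Carleman series (Stieltjes form)."""
    return exp(-log_moment(k) / (2 * k))

def partial_sum(n_terms):
    """Partial sum of the Carleman series over k = 1, ..., n_terms."""
    return sum(carleman_term(k) for k in range(1, n_terms + 1))

# Constant suggested by Stirling's formula: the terms behave like LIMIT * k ** (-3/4).
LIMIT_CONSTANT = (2 * e / 3) ** 0.75

for k in (10, 100, 1000, 10000):
    print(k, carleman_term(k) * k ** 0.75, LIMIT_CONSTANT)
for n in (500, 1000, 2000):
    print(n, partial_sum(n))
```

Since the terms decay like k^(-3/4), the partial sums keep growing (roughly like n^(1/4)), in line with the divergence of the Carleman series.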
11.2. The lognormal distribution and the moment problem
Let $X$ be a r.v. such that $\log X \sim N(0, 1)$. In this case we say that $X$ has a (standard) lognormal distribution. The density $f$ of $X$ is given by
$$f(x) = \begin{cases} (2\pi)^{-1/2}\,x^{-1} \exp[-\tfrac{1}{2}(\log x)^2], & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases} \tag{1}$$
The moments $\alpha_n = E[X^n]$, $n \ge 1$, can be calculated explicitly, namely $\alpha_n = e^{n^2/2}$. It is easy to check that the moments $\{\alpha_n\}$ do not satisfy the Carleman condition $\sum_{n=1}^{\infty} (\alpha_n)^{-1/(2n)} = \infty$. Since this condition is only sufficient, we cannot say whether the sequence $\{\alpha_n\}$ determines uniquely the d.f. $F$ of $X$. Further, we have
the following relations:
$$\int_0^{\infty} (1 + x^2)^{-1}\,|\log x|^k\,dx = \int_{-\infty}^{\infty} (1 + e^{2y})^{-1}\,|y|^k e^{y}\,dy \le \int_{-\infty}^{0} |y|^k e^{y}\,dy + \int_0^{\infty} |y|^k e^{-y}\,dy < \infty, \quad k = 0, 1, 2, \ldots.$$
From this we conclude that the density (1) does satisfy the Krein condition (2b). According to Criterion (C3) the lognormal distribution is not determined uniquely by its moments. Note also that $E[e^{tX}] = \infty$ for each $t > 0$, meaning that the m.g.f. of $X$ does not exist; hence Criterion (C1) is not applicable here, which is consistent with (though not by itself a proof of) indeterminacy. Thus we come to the following interesting question: is it possible to find explicitly other distributions with the same moments as those of the lognormal distribution? Two possible answers are given below.
(i) Let $\{f_\varepsilon(x), x \in \mathbb{R}^1, \varepsilon \in [-1, 1]\}$ be a family of functions defined as:
$$f_\varepsilon(x) = \begin{cases} f(x)[1 + \varepsilon \sin(2\pi \log x)], & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases} \tag{2}$$
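where $f$ is given by (1). Obviously, $f_\varepsilon(x) \ge 0$ for all $x \in \mathbb{R}^1$ and any $\varepsilon \in [-1, 1]$. In order to establish other properties of $f_\varepsilon$, we have to prove that
$$J_k := \int_0^{\infty} x^k f(x) \sin(2\pi \log x)\,dx = 0, \quad k = 0, 1, 2, \ldots. \tag{3}$$
Indeed, by the substitution $\log x = u = y + k$ we reduce $J_k$ to the integrals
$$(2\pi)^{-1/2} \int_{-\infty}^{\infty} \exp\left(-\tfrac{1}{2}u^2 + ku\right) \sin(2\pi u)\,du = (2\pi)^{-1/2} \exp\left(\tfrac{1}{2}k^2\right) \int_{-\infty}^{\infty} \exp\left(-\tfrac{1}{2}y^2\right) \sin(2\pi y)\,dy.$$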
The last integral is zero since the integrand is an odd function and the interval $(-\infty, \infty)$ is symmetric with respect to 0. So, based on (3) we draw the following conclusions concerning the family (2). If $k = 0$ then for any $\varepsilon \in [-1, 1]$, $f_\varepsilon(x)$, $x \in \mathbb{R}^1$ is a probability density function of some r.v., say $X_\varepsilon$. Obviously, if $\varepsilon = 0$, $f_\varepsilon$ and $X_\varepsilon$ are respectively $f$ and $X$ defined at the beginning of this section. Moreover, we have
$$E[X_\varepsilon^k] = E[X^k] \quad \text{for any } k,\ k = 1, 2, \ldots$$
despite the fact that $f_\varepsilon \ne f$ for $\varepsilon \ne 0$. Therefore we have described explicitly the family $\{X_\varepsilon\}$ containing infinitely many absolutely continuous r.v.s having the same moments as those of the r.v. $X$ with
lognormal distribution. This example, after the paper by Heyde (1963a) appeared, became one of the most popular examples illustrating the classical moment problem.
(ii) Now we shall exhibit another family of d.f.s $\{H_a, a > 0\}$ having the same moments as the lognormal distribution (1). Let us announce a priori that $H_a$, for each $a > 0$, will correspond to some discrete r.v. $Y_a$. For $a > 0$ consider the function
$$h_a(t) = \sum_{n=-\infty}^{\infty} a^{-n} e^{-n^2/2} \exp(iae^nt), \quad t \in \mathbb{R}^1. \tag{4}$$
It is easy to see that the series in (4) is convergent for all $t \in \mathbb{R}^1$ and all $a > 0$. Moreover, the functions $h_a(t)$, $t \in \mathbb{R}^1$ are continuous and positive definite in the standard sense: $\sum_{j,k} z_j \bar z_k h_a(t_j - t_k) \ge 0$, where $t_j, t_k \in \mathbb{R}^1$ and $z_j, z_k$ are complex numbers. By the Bochner theorem (see Feller 1971) the function
$$\psi_a(t) = h_a(t)/h_a(0), \quad t \in \mathbb{R}^1$$
is a ch.f. Denote respectively by $H_a$ and $Y_a$ the d.f. and the r.v. whose ch.f. is $\psi_a$. The explicit form (4) of the function $h_a$ allows us to describe explicitly the r.v. $Y_a$. We have
$$P[Y_a = ae^n] = p_a(n) := a^{-n} e^{-n^2/2}/h_a(0), \quad n = 0, \pm 1, \pm 2, \ldots.$$
The next step is to find the moments $\alpha_n = E[Y_a^n] = \sum_{k=-\infty}^{\infty} (ae^k)^n p_a(k)$, $n = 1, 2, \ldots$. Since $p_a(k) = a^{-k} e^{-k^2/2}/h_a(0)$, we can easily obtain
$$e^{-n^2/2}\,\alpha_n = i^{-n} \sum_k a^{-k} (iae^k)^n e^{-(n^2 + k^2)/2}/h_a(0) = \sum_k a^{-(k-n)} e^{-(k-n)^2/2}/h_a(0) = 1.$$
It follows from this that
$$E[Y_a^n] = \alpha_n = e^{n^2/2} = E[X^n], \quad n = 1, 2, \ldots.$$
Therefore there is an infinite family $\{Y_a\}$ of discrete r.v.s such that $Y_a$ has the same moments as the r.v. $X$ with lognormal distribution. We refer the reader to papers by Leipnik (1981) and Pakes and Khattree (1992) for more comments about this example and other related topics.
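Both families can be checked numerically against the lognormal moments e^(n²/2). The following sketch is an illustration under stated assumptions: for family (i) it uses the substitution y = log x and a plain trapezoidal rule on a truncated interval (the width and step count are arbitrary choices), and for family (ii) it assumes the atoms a·e^k carry weights proportional to a^(-k)·e^(-k²/2), truncated to |k| ≤ K. The function names are hypothetical.

```python
from math import exp, pi, sin, sqrt

def moment_continuous(k, eps, half_width=12.0, steps=2400):
    """k-th moment of f_eps(x) = f(x) * (1 + eps * sin(2*pi*log x)), computed
    in the variable y = log x with the trapezoidal rule on [k - w, k + w]."""
    lo = k - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * exp(k * y - 0.5 * y * y) * (1.0 + eps * sin(2.0 * pi * y))
    return total * h / sqrt(2.0 * pi)

def moment_discrete(n, a, K=60):
    """n-th moment of Y_a: atoms a * e^k with weights a^(-k) e^(-k^2/2),
    normalised by their sum (a truncation of h_a(0)) over |k| <= K."""
    ks = range(-K, K + 1)
    weights = [a ** (-k) * exp(-0.5 * k * k) for k in ks]
    norm = sum(weights)
    return sum((a * exp(k)) ** n * w for k, w in zip(ks, weights)) / norm

for n in range(1, 4):
    target = exp(n * n / 2.0)
    print(n, target, moment_continuous(n, 1.0), moment_discrete(n, 2.0))
```

Up to rounding and truncation error, the printed values in each row coincide, although the three distributions involved (lognormal, perturbed lognormal, discrete) are quite different.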
11.3. The moment problem for powers of an exponential distribution
Let $\xi$ be a r.v., $\xi \sim \mathrm{Exp}(1)$. The density of $\xi$ is $e^{-x}$ for $x > 0$ and 0 for $x \le 0$, so the moment of order $k$ is $\alpha_k = E[\xi^k] = k!$, $k = 1, 2, \ldots$. The distribution of $\xi$, i.e. the exponential distribution, is uniquely determined by the moment sequence $\{\alpha_k\}$. This follows from the Carleman criterion as well as from the existence of the m.g.f. of $\xi$ and referring to Criterion (C1).
Now we want to clarify whether powers of $\xi$ have determinate distributions. For let $\delta > 0$ and let $X = \xi^\delta$. If $f$ is the density of $X$, we easily find that
$$f(x) = (1/\delta)\,x^{1/\delta - 1} \exp(-x^{1/\delta}), \quad x > 0$$
(of course, $f(x) = 0$ for $x \le 0$). The moments of $X$ exist, $E[X^k] = \Gamma(\delta k + 1)$, $k = 1, 2, \ldots$. The explicit form of the density $f$ allows us to find that $\int_0^{\infty} [-(\log f(x^2))/(1 + x^2)]\,dx < \infty$ iff $\delta > 2$. Hence for $\delta > 2$ the distribution of the r.v. $X = \xi^\delta$ is indeterminate.
Let us show that in this case, with $\delta > 2$, there is a family of r.v.s all having the same moments as those of $\xi^\delta$. Indeed, consider the function:
$$f_\varepsilon(x) = f(x)\{1 + \varepsilon[\cos(c_\delta x^{1/\delta}) - (1/c_\delta) \sin(c_\delta x^{1/\delta})]\}, \quad x > 0$$
(and $f_\varepsilon(x) = 0$ for $x \le 0$), where $c_\delta = \tan(\pi/\delta)$ and $|\varepsilon| \le \sin(\pi/\delta)$. The equality
$$\int_0^{\infty} x^k f(x)[\cos(c_\delta x^{1/\delta}) - (1/c_\delta) \sin(c_\delta x^{1/\delta})]\,dx = 0, \quad k = 0, 1, 2, \ldots$$
shows that $f_\varepsilon$ is a probability density function of a r.v. $X_\varepsilon$ and
$$E[X_\varepsilon^k] = E[X^k], \quad k = 1, 2, \ldots,$$
even though $f_\varepsilon \ne f$ (except the trivial case $\varepsilon = 0$).
For completeness we have to consider the case $\delta \in (0, 2]$. Since $\alpha_k = E[(\xi^\delta)^k] = \Gamma(\delta k + 1)$, we can use the properties of the gamma function $\Gamma(\cdot)$ and show that the Carleman condition is satisfied. Thus we conclude that if $\xi \sim \mathrm{Exp}(1)$ and $0 < \delta \le 2$, then the distribution of $\xi^\delta$ is uniquely determined by its moment sequence (also see Example 11.4).
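The role of the threshold $\delta = 2$ in the Carleman condition can be made visible with Stirling's formula. The following is a sketch of the asymptotics rather than a complete proof; here $\sim$ means that the ratio of the two sides tends to 1, and the moments are $\alpha_k = \Gamma(\delta k + 1)$ as above.

$$\alpha_k^{1/(2k)} = \Gamma(\delta k + 1)^{1/(2k)} \sim \left(\frac{\delta k}{e}\right)^{\delta/2}, \qquad \text{so that} \qquad \alpha_k^{-1/(2k)} \sim \left(\frac{e}{\delta}\right)^{\delta/2} k^{-\delta/2}, \quad k \to \infty.$$

Hence $\sum_k \alpha_k^{-1/(2k)}$ diverges exactly when $\delta/2 \le 1$, i.e. for $0 < \delta \le 2$, which is the range in which the Carleman condition (in its form for distributions on $[0, \infty)$) yields determinacy. For $\delta > 2$ the series converges, so the Carleman condition gives no information, and indeterminacy has to be obtained from the Krein condition.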
11.4. A class of hyper-exponential distributions with an indeterminate moment problem
Recall first that the one-sided hyper-exponential distribution $\mathcal{H}_+(a, b, c)$, where $a, b, c$ are positive numbers, is given by the density function
$$f(x) = \begin{cases} c\,b^{-a/c}\,(\Gamma(a/c))^{-1}\,x^{a-1} \exp(-x^c/b), & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases} \tag{1}$$
(Notice that the gamma distribution $\gamma(a, b)$ and the exponential distribution $\mathrm{Exp}(\lambda)$ are special cases of the hyper-exponential distributions.) It can be shown (for details see Hoffmann-Jorgensen 1994) that if $X$ is a r.v., $X \sim \mathcal{H}_+(a, b, c)$, then the quantity $E[X^k]$ does exist for any $k \ge 0$ (not just for integer $k$) and, moreover,
$$E[X^k] = b^{k/c}\,\Gamma((a + k)/c)/\Gamma(a/c).$$
Hence the r.v. $X$ has finite moments $\alpha_k = E[X^k]$, $k = 1, 2, \ldots$, and the question is whether or not the moment sequence $\{\alpha_k\}$ determines uniquely the distribution of $X$.
Let us take some $a > 0$, $b > 0$ and $0 < c < \tfrac{1}{2}$. Then we can choose $p > 0$ such that $m := a + p$ is an integer number, set $r = p/c$,"\ = l' + 11(3, $s = \tan(c\pi)$ and introduce the function
Since $e^{-x} \le r^r x^{-r}$ for all $x > 0$, we easily see that $|\psi(u)| \le 1$, $u > 0$. Let $k$ be a fixed non-negative integer. Then $n = k + m$ is an integer, $v = c\pi \in (0, \tfrac{1}{2}\pi)$ and substituting $x = u^c$ yields
1
00
where
f
1
E
the following relation
00
uk fe(u) du
=
uk f(u) du
is the hyper-exponential density (1) and
fe(x)
= { f(x)[1 + E1jJ(X)], 0,
if x> 0 if x < O.
Since I1jJ( x) ::; 1, x > 0, it is easy to see that for any E E [- 1, 1], fe is a probability density function of a r. v. X e and
despite the fact that fe f:. f (except the trivial case E = 0). Therefore for a > 0, b > 0 and c E (0, the hyper-exponential distribution J(+ (a, b, c) is not determined uniquely by its moment sequence. It can be shown, however, that for a > 0, b > 0 and c 2: ~, the moment problem for J(+ (a, b, c) has a unique solution (Berg 1988).
1)
Since for the exponential distribution $\mathrm{Exp}(\lambda)$, the gamma distribution $\gamma(a, b)$ and the hyper-exponential distribution $\mathcal{H}_+(a, b, c)$ we have
$$\mathrm{Exp}(\lambda) = \gamma(1, 1/\lambda), \qquad \gamma(a, b) = \mathcal{H}_+(a, b, 1)$$
it follows that if $X \sim \mathrm{Exp}(\lambda)$ or $X \sim \gamma(a, b)$, where $\lambda > 0$, $a > 0$ and $b > 0$ are arbitrary, then the distribution of $X^\delta$ for $\delta > 2$ is not determined uniquely by the corresponding moment sequence. (Also see Example 11.3.)
11.5. Different distributions with equal absolute values of the characteristic functions and the same moments of all orders
Let us start with the number sequence $\{a_k, k = 1, 2, \ldots\}$ where $a_k > 0$ and $a := \sum_{k=1}^{\infty} a_k < \infty$. We shall consider a special sequence of distributions and study the corresponding sequence of the ch.f.s. The uniform distribution on $[-a_k, a_k]$ has a ch.f. $\sin(a_kt)/(a_kt)$. Denote by $f_k(x)$, $x \in [-2a_k, 2a_k]$ the convolution of this distribution with itself. Then the ch.f. $\phi_k$ of $f_k$ is $\phi_k(t) = [\sin(a_kt)/(a_kt)]^2$, $k = 1, 2, \ldots$, $t \in \mathbb{R}^1$. The product $\prod_{j=1}^{m} \phi_j(t)$ converges, as $m \to \infty$, to the function $\phi(t)$ where
$$\phi(t) := \prod_{j=1}^{\infty} \phi_j(t), \quad t \in \mathbb{R}^1. \tag{1}$$
Using the Taylor expansion of $\sin x$ for small $x$, we find that
$$\phi(t) \approx \exp\left(-\tfrac{1}{3}\tilde a\,t^2\right) \quad \text{as } t \to 0$$
where $\tilde a = \sum_{j=1}^{\infty} a_j^2$. Therefore according to Lukacs (1970, Th. 3.6.1), $\phi(t)$, $t \in \mathbb{R}^1$ is a ch.f. The d.f. corresponding to $\phi$ is absolutely continuous. Let its density be denoted by $f$. Clearly $f$ is an infinite convolution: that is, $f = f_1 * f_2 * \cdots$. By the inversion formula (Feller 1971; Lukacs 1970; Shiryaev 1995) we find that $f(0) = (2\pi)^{-1} \int_{-\infty}^{\infty} \phi(t)\,dt$. Since $\phi(t) \ge 0$ for all $t \in \mathbb{R}^1$, we can construct the density $f_0$ by setting $f_0(x) = (2\pi f(0))^{-1} \phi(x)$, $x \in \mathbb{R}^1$. If $\phi_0$ denotes the ch.f. of $f_0$ then
$$\phi_0(t) = \int_{-\infty}^{\infty} \exp(itx) f_0(x)\,dx = (2\pi f(0))^{-1} \int_{-\infty}^{\infty} \exp(-itx)\,\phi(x)\,dx = f(t)/f(0).$$
Note especially that the support of $\phi_0$ is contained in the interval $[-2a, 2a]$. Using the function $\phi_0$ and the function $\phi$ from (1) we define the following four functions:
$$\psi_1(t) = \phi_0(t) + \tfrac{1}{2}[\phi_0(t + 4a) + \phi_0(t - 4a)], \tag{2}$$
$$\psi_2(t) = \phi_0(t) - \tfrac{1}{2}[\phi_0(t + 4a) + \phi_0(t - 4a)], \tag{3}$$
$$g_1(x) = (2\pi f(0))^{-1} \phi(x)(1 + \cos 4ax), \qquad g_2(x) = (2\pi f(0))^{-1} \phi(x)(1 - \cos 4ax). \tag{4}$$
The above reasoning shows that $g_1(x)$, $x \in \mathbb{R}^1$ and $g_2(x)$, $x \in \mathbb{R}^1$ are probability density functions. Moreover, $\psi_1$ and $\psi_2$ given by (2) and (3) are just their ch.f.s. Also $|\psi_1(t)| = |\psi_2(t)|$ for each $t \in \mathbb{R}^1$. Denote by $X_1$ and $X_2$ r.v.s with densities $g_1$ and $g_2$ respectively. If $c_n^{-1} := \pi f(0) \prod_{j=1}^{n} a_j^2$ then it is easy to derive from (4) that $|g_1(x)| \le c_n |x|^{-2n}$ for each $n \in \mathbb{N}$. The same estimation holds for $g_2$. Hence both variables $X_1$ and $X_2$ possess moments of all orders. As a consequence we obtain (see Feller 1971) that the ch.f.s $\psi_1$ and $\psi_2$ have derivatives of all orders, and, since $|\psi_1(t)| = |\psi_2(t)|$, $\psi_1(t) = \psi_2(t)$ for $t$ in a small neighbourhood of 0. This implies that the moments of $X_1$ and $X_2$ of all orders coincide.
Looking again at the pairs $(g_1, g_2)$, $(\psi_1, \psi_2)$ and $(X_1, X_2)$ we conclude that $|\psi_1(t)| = |\psi_2(t)|$ for all $t \in \mathbb{R}^1$, $E[X_1^k] = E[X_2^k]$ for each $k \in \mathbb{N}$ but nevertheless $g_1 \ne g_2$.
11.6. Another class of absolutely continuous distributions which are not determined uniquely by their moments
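Consider the r.v. $X$ with the following density:
$$f(x) = \begin{cases} c \exp(-\alpha x^{\lambda}), & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases}$$
Here $\alpha > 0$, $0 < \lambda < \tfrac{1}{2}$ and $c$ is a norming constant. For $\varepsilon \in (-1, 1)$ and $\beta = \alpha \tan \lambda\pi$ define the function $f_\varepsilon$ by
$$f_\varepsilon(x) = \begin{cases} c \exp(-\alpha x^{\lambda})(1 + \varepsilon \sin(\beta x^{\lambda})), & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases}$$
Obviously $f_\varepsilon(x) \ge 0$, $x \in \mathbb{R}^1$. Next we shall use the relation
$$\int_0^{\infty} x^n \exp(-\alpha x^{\lambda}) \sin(\beta x^{\lambda})\,dx = 0, \quad n = 0, 1, 2, \ldots. \tag{1}$$
Let us establish the validity of (1). If $p > 0$ and $q$ is a complex number with $\operatorname{Re} q > 0$, then we use the well known identity
$$\int_0^{\infty} t^{p-1} e^{-qt}\,dt = \Gamma(p)/q^p.$$
Denoting $p = (n + 1)/\lambda$, $q = \alpha + i\beta$, $t = x^{\lambda}$, we find
$$\begin{aligned}
\int_0^{\infty} x^{\lambda[(n+1)/\lambda - 1]} \exp[-(\alpha + i\beta)x^{\lambda}]\,\lambda x^{\lambda - 1}\,dx &= \lambda \int_0^{\infty} x^n \exp[-(\alpha + i\beta)x^{\lambda}]\,dx \\
&= \lambda \int_0^{\infty} x^n \exp(-\alpha x^{\lambda}) \cos(\beta x^{\lambda})\,dx - i\lambda \int_0^{\infty} x^n \exp(-\alpha x^{\lambda}) \sin(\beta x^{\lambda})\,dx \\
&= \Gamma((n + 1)/\lambda) \Big/ \left[\alpha^{(n+1)/\lambda} (1 + i \tan \lambda\pi)^{(n+1)/\lambda}\right].
\end{aligned}$$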
The last ratio is real-valued because $\sin[\pi(n + 1)] = 0$ and
$$(1 + i \tan \lambda\pi)^{(n+1)/\lambda} = (\cos \lambda\pi)^{-(n+1)/\lambda} e^{i\pi(n+1)} = (\cos \lambda\pi)^{-(n+1)/\lambda} \cos \pi(n + 1).$$
Thus (1) is proved. Taking $n = 0$ we see that $f_\varepsilon$ is a probability density function. Denote by $X_\varepsilon$ a r.v. whose density is $f_\varepsilon$. The relationship between $f_\varepsilon$ and $f$, together with (1), imply that
$$E[X_\varepsilon^n] = E[X^n] \quad \text{for each } n = 1, 2, \ldots.$$
Therefore we have constructed infinitely many r.v.s $X_\varepsilon$ with the same moments as those of $X$ though their densities $f_\varepsilon$ and $f$ are different ($f_\varepsilon = f$ only if $\varepsilon = 0$). So in this case the moment problem is indeterminate. However, this fact is not surprising because the density $f$ does satisfy criterion (C3).
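The key point of the proof of (1), namely that Γ(p)/q^p is real for p = (n + 1)/λ and q = α + iα·tan(λπ), can be checked in floating-point arithmetic. Since Γ(p) is real and positive, it is enough to look at q^(-p). The sketch below is only a numerical illustration; the function name `relative_imaginary_part` and the sample parameter values are assumptions made for this example.

```python
import cmath
from math import pi, tan

def relative_imaginary_part(n, alpha, lam):
    """|Im(q**-p)| / |q**-p| for q = alpha + i*alpha*tan(lam*pi), p = (n+1)/lam.

    The sine integral in (1) is proportional to Im(Gamma(p) / q**p), so a value
    close to zero (up to rounding) supports relation (1). Requires 0 < lam < 1/2.
    """
    beta = alpha * tan(lam * pi)
    q = complex(alpha, beta)
    p = (n + 1) / lam
    value = cmath.exp(-p * cmath.log(q))  # principal branch of q ** (-p)
    return abs(value.imag) / abs(value)

for lam in (0.1, 0.3, 0.45):
    print(lam, max(relative_imaginary_part(n, 1.5, lam) for n in range(11)))
```

The argument of q equals λπ, so the argument of q^(-p) is -(n + 1)π, a multiple of π; the printed maxima are therefore of the order of rounding error only.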
11.7. Two different discrete distributions on a subset of natural numbers both having the same moments of all orders
Let $q \ge 2$ be a fixed natural number and $M_q = \{q^m : m = 0, 1, 2, \ldots\}$. Clearly $M_q \subset \mathbb{N}$ for $q = 2, 3, \ldots$. If $n \in M_q$ then $n$ has the form $q^m$ and we can define $p_n$ by $p_n = e^{-q} q^m/m!$. It is easy to see that $\{p_n\}$ is a discrete probability distribution over the set $M_q$. Denote by $X$ a r.v. with values in $M_q$ and a distribution $\{p_n\}$. In this case we say that $X$ has a log-Poisson distribution. Then the $k$th-order moment $\alpha_k$ of $X$ is
$$\alpha_k = E[X^k] = \sum_{m=0}^{\infty} e^{-q} q^{km} q^m/m! = \exp[q(q^k - 1)] < \infty.$$
Our purpose now is to construct many other r.v.s with the same moments. Consider the function $h(z) := \prod_{k=1}^{\infty} (1 - zq^{-k})$. Since $\sum_{k=1}^{\infty} q^{-k} < \infty$ for $q > 1$, $h(z)$ is an analytic function in the whole complex plane. Let $h(z) = \sum_{m=0}^{\infty} c_m z^m$ be its Taylor expansion around 0. Taking into account the equality $h(qz) = (1 - z)h(z)$ we have the relation $c_m/c_{m-1} = -(q^m - 1)^{-1}$ where $c_0 = 1$ and for $m \ge 1$ we find
$$c_m = (-1)^m [(q - 1)(q^2 - 1) \cdots (q^m - 1)]^{-1}.$$
Setting $a_m = m!\,c_m$ we see that $|a_m| \le 1$ for all $m$. This implies that
$$e^{-q} \sum_{m=0}^{\infty} q^{km} a_m q^m/m! = e^{-q} h(q^{k+1}) = 0 \quad \text{for all } k = 0, 1, 2, \ldots. \tag{1}$$
Now introduce the number set $\{p_n^{(\varepsilon)}, n \in M_q\}$ where
$$p_n^{(\varepsilon)} := p_n(1 + \varepsilon a_m) = e^{-q} q^m \left[(m!)^{-1} + \varepsilon(-1)^m ((q - 1)(q^2 - 1) \cdots (q^m - 1))^{-1}\right], \quad n = q^m.$$
Here $\varepsilon$ is any number in the interval $[-1, 1]$. Obviously $p_n^{(\varepsilon)} \ge 0$ and (1) implies that $\sum_{n \in M_q} p_n^{(\varepsilon)} = 1$ for any $\varepsilon \in [-1, 1]$. Therefore $\{p_n^{(\varepsilon)}\}$ defines a discrete probability distribution over the set $M_q$. Let $X_\varepsilon$ be a r.v. with values in $M_q$ and a distribution $\{p_n^{(\varepsilon)}\}$. Using (1) again we conclude that $E[X_\varepsilon^k] = E[X^k]$ for each $k = 1, 2, \ldots$, $\varepsilon \in [-1, 1]$.
So, excluding the trivial case $\varepsilon = 0$, we have constructed discrete r.v.s $X$ and $X_\varepsilon$ whose distributions are different but whose moments of all orders coincide.
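The two facts that drive this construction, |a_m| ≤ 1 and h(q^(k+1)) = 0, can be examined with exact rational arithmetic. The sketch below builds the Taylor coefficients from the recurrence c_m/c_(m-1) = -1/(q^m - 1) and evaluates truncated sums of the series for h(q^(k+1)); the function names and the truncation level M are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial

def taylor_coefficients(q, M):
    """c_0, ..., c_M of h(z) = prod_{k>=1} (1 - z q^-k), via c_m = -c_{m-1} / (q^m - 1)."""
    coeffs = [Fraction(1)]
    for m in range(1, M + 1):
        coeffs.append(-coeffs[-1] / (q ** m - 1))
    return coeffs

def truncated_h(q, k, M):
    """sum_{m=0}^{M} c_m q^((k+1) m), a truncation of the series for h(q^(k+1))."""
    c = taylor_coefficients(q, M)
    return sum(c[m] * q ** ((k + 1) * m) for m in range(M + 1))

def a_coefficient(q, m):
    """a_m = m! c_m, the factor that perturbs the log-Poisson weights."""
    return factorial(m) * taylor_coefficients(q, m)[m]

for k in range(3):
    print(k, float(abs(truncated_h(2, k, 10))), float(abs(truncated_h(2, k, 20))))
print(max(abs(a_coefficient(2, m)) for m in range(15)))
```

Multiplying the truncated sum by e^(-q) gives the truncated left-hand side of (1), since q^(km)·a_m·q^m/m! = c_m·q^((k+1)m). The partial sums shrink very quickly because the coefficients c_m decay like q^(-m(m+1)/2).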
11.8. Another family of discrete distributions with the same moments of all orders
Let $N = \{0, \pm 1, \pm 2, \ldots\}$ and $X$ be a r.v. with the following distribution:
$$P[X = e^{8m}] = c\,e^{-m^2}, \quad m \in N, \qquad c^{-1} = \sum\nolimits^{*} e^{-m^2}.$$
Here and below $\sum^{*}$ is the sum over all $m \in N$. For any positive integer $k$ we can calculate explicitly the moment $\alpha_k = E[X^k]$ of order $k$, namely
$$\alpha_k = \sum\nolimits^{*} e^{8km}\,c\,e^{-m^2} = e^{16k^2} \sum\nolimits^{*} c \exp[-(m - 4k)^2] = e^{16k^2}.$$
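Now we shall construct a family consisting of infinitely many r.v.s with the same moments. For any $\varepsilon \in (0, 1)$ define the function
$$h_\varepsilon(m) = \begin{cases} 0, & \text{if } m \equiv 0 \pmod 4 \\ \varepsilon, & \text{if } m \equiv 1 \pmod 4 \\ 0, & \text{if } m \equiv 2 \pmod 4 \\ -\varepsilon, & \text{if } m \equiv 3 \pmod 4. \end{cases}$$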
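In the sequel we use the evident properties: for any fixed $\varepsilon$, $h_\varepsilon(m)$ is an odd function of $m$, that is $h_\varepsilon(-m) = -h_\varepsilon(m)$; $h_\varepsilon(m)$ is a periodic function of period 4; $h_\varepsilon(m + 4k)$ is an odd function in $m$ for each integer $k$. The next crucial step is to evaluate the sum $S_k$ where
$$S_k = \sum\nolimits^{*} e^{8km} h_\varepsilon(m)\,c\,e^{-m^2}, \quad k = 0, 1, 2, \ldots.$$
We have
$$S_k = c \sum\nolimits^{*} h_\varepsilon(m) \exp(8km - m^2) = c \exp(16k^2) \sum\nolimits^{*} h_\varepsilon(m) \exp[-(m - 4k)^2] = c \exp(16k^2) \sum\nolimits^{*} h_\varepsilon(u + 4k) \exp(-u^2).$$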
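The last sum is zero because $h_\varepsilon(u + 4k)$ is an odd function of $u$ for all $k = 0, 1, 2, \ldots$. Thus we have established that
$$S_k = 0 \quad \text{for all } k = 0, 1, 2, \ldots \text{ and all } \varepsilon \in (0, 1). \tag{1}$$
As a consequence of (1) we derive that
$$q_\varepsilon(m) := c\,e^{-m^2}[1 + h_\varepsilon(m)] \ge 0$$
for each $m \in N$ and any $\varepsilon \in (0, 1)$ and moreover $\sum^{*} q_\varepsilon(m) = 1$. This means that the set $\{q_\varepsilon(m), m \in N\}$ can be regarded as a discrete probability distribution of some r.v. which will be denoted by $X_\varepsilon$. Thus we have constructed a r.v. $X_\varepsilon$ whose values are the same as those of $X$ but whose distribution is
$$P[X_\varepsilon = e^{8m}] = q_\varepsilon(m), \quad m \in N.$$
Since (1) is satisfied for any $k = 1, 2, \ldots$ we find that
$$E[X_\varepsilon^k] = E[X^k] = e^{16k^2}, \quad k = 1, 2, \ldots.$$
Therefore for any $\varepsilon \in (0, 1)$ we have $X_\varepsilon \overset{d}{\ne} X$ but nevertheless $X_\varepsilon$ and $X$ have the same moments of all orders.
11.9. On the relationship between two sufficient conditions for the determination of the moment problem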
(i) Let $Z = X \log(1 + Y)$ where $X$ and $Y$ are independent r.v.s each distributed exponentially with parameter 1. Obviously $Z$ is absolutely continuous and takes non-negative values. The $n$th moment $\alpha_n$ of $Z$ is
$$\alpha_n = E[X^n]\,E[\log^n(1 + Y)] = n!\,v_n, \qquad v_n := \int_0^{\infty} [\log(1 + y)]^n e^{-y}\,dy.$$
It can be shown (the details are left to the reader) that
$$e^{-1} \log(1 + n) < v_n^{1/n} < c \log(1 + n), \quad c = \text{constant}. \tag{1}$$
Thus
$$(\alpha_n/n!)^{1/n} = v_n^{1/n} > e^{-1} \log(1 + n) \to \infty \quad \text{as } n \to \infty$$
which implies that the series $\sum_{n=1}^{\infty} \alpha_n t^n/n!$ does not converge for any $t \ne 0$. Therefore we cannot apply Criterion (C1) (see the introductory notes to this section) to decide whether the d.f. of $Z$ is determined by its moments.
From (1) we obtain that for large $n$,
$$\alpha_n^{1/n} \sim e^{-1} n\,v_n^{1/n} < c\,e^{-1} n \log(1 + n)$$
and hence $\sum_n \alpha_{2n}^{-1/(2n)} = \infty$ because the series $\sum_{m=1}^{\infty} 1/(m \log(1 + m))$ is divergent. Therefore, according to Criterion (C2), the d.f. $F$ of the r.v. $Z$ is determined uniquely by its moments. Note, however, that we used the Carleman criterion (C2) whereas Criterion (C1), based on the existence of the m.g.f., does not help in this case.
(ii) In case (i) we considered an absolutely continuous r.v. Here we take a discrete r.v. and tackle the same questions as raised in case (i). So, let $Z$ be a r.v. whose set of values is $\{3, 4, 5, \ldots\}$ and whose distribution is given by
$$P[Z = m] = c \exp(-m/\log m), \quad m = 3, 4, \ldots$$
where $c$ is a norming constant obtained from the condition $P[Z \ge 3] = 1$. Our purpose is to verify whether Criteria (C1) and (C2) apply. Firstly we derive suitable upper and lower bounds for the moment $\alpha_{2n}$ of order $2n$. Introducing the function
$$h(x) = x^n \exp(-x/\log x)$$
where $x \ge 3$, $n \ge 4$, we can easily show that
$$\frac{h'(x)}{h(x)} = \frac{n}{x} - \frac{1}{\log x} + \frac{1}{\log^2 x} = 0$$
iff $x = x_n$ where
$$n = \frac{x_n}{\log x_n}\left(1 - \frac{1}{\log x_n}\right) < \frac{x_n}{\log x_n}.$$
Since $n$ and $x_n$ tend to infinity simultaneously, we have $(n \log x_n)/x_n \to 1$ whence $\tfrac{1}{2} n \log n \le x_n \le 2n \log n$ for $n \ge n_0$. If we define $M_n$ by
$$M_n = \max_{x \ge 3}\,[x^n \exp(-x/\log x)] = x_n^n \exp(-x_n/\log x_n)$$
then for $n \ge n_0$ we obtain the following estimate:
$$\alpha_{2n} = c \sum_{m=3}^{\infty} m^{2n} e^{-m/\log m} \le c\,M_{2n+2} \sum_{m=3}^{\infty} (1/m^2) \le c\,M_{2n+2} \le c\,[(4n + 4) \log(2n + 2)]^{2n+2} e^{-2n-2}.$$
Therefore for all sufficiently large $n$
$$(\alpha_{2n})^{1/(2n)} \le 4c^{1/(2n)}\,[(2n + 2) \log(2n + 2)]^{1 + (1/n)} \le 8(2n + 2) \log(2n + 2). \tag{2}$$
On the other hand,
$$(\alpha_{2n})^{1/(2n)} > (c\,M_{2n})^{1/(2n)} > e^{-1}\left(\tfrac{1}{2} n \log n\right) \quad \text{for all large } n. \tag{3}$$
Now using (2) and (3) we easily find that
$$\lim_{n \to \infty} [(\alpha_{2n})^{1/(2n)}/(2n)] = \infty \qquad \text{and} \qquad \sum_{n=1}^{\infty} (\alpha_{2n})^{-1/(2n)} = \infty.$$
Therefore the ch.f. of the r.v. $Z$ is not analytic and we cannot apply Criterion (C1) to say whether the moment sequence $\{\alpha_k\}$ determines uniquely the distribution of $Z$. However, the Carleman criterion (C2) guarantees that the distribution of $Z$ is uniquely determined by its moments.
11.10. The Carleman condition is sufficient but not necessary for the determination of the moment problem
In two different ways we now illustrate that the Carleman condition is not necessary for the moment problem to be determined. (i) Let FH be a symmetric distribution on ( -00,00) and Fs a distribution on [0,00). (The subscripts Hand S correspond to the Hamburger and the Stieitjes cases.) By the relations 2 F1 ( ) _ { ~[1 + Fs(x )], if x > H x ~[1 - Fs(x 2 )], if x <
°°
we can define a one-one correspondence between the set of symmetric distributions on (-00,00) and the set of distributions on [0,00). It is clear that FH possesses moments {an} of all orders iff Fs possesses moments {an} of all orders. In this case a2n
= an,
a2n+ 1 = 0,
n = 0, I , 2, ....
Thus we conclude that the Hamburger problem for {an} is determinate iff the corresponding Stieltjes problem is determinate. Moreover the Carleman condition 2::(a2n)-I/(2n) = 00 for the determination of the Hamburger case becomes 2::(a n )-I/(2n) = 00 in the Stieltjes case. We shall use this result later but let us now formulate the following result (see Heyde 1963b): if a set {an} of moments corresponds to a determinate Stieltjes problem, the solution of which has no point of discontinuity at the origin, then the set {an} also corresponds to a determinate Hamburger problem. Consider the r. v. X with density f given by
$f(x) = [\beta/\Gamma(1/\beta)]\exp(-x^{\beta})$ if $x > 0$, $f(x) = 0$ if $x \le 0$,
where $0 < \beta < 1$. One can show that $\alpha_n = E[X^n] = \Gamma((n+1)/\beta)/\Gamma(1/\beta)$, $n = 0, 1, 2, \dots$, so that $\alpha_n^{1/n} \sim K n^{1/\beta}$ for some constant $K$. Then $\sum(\alpha_n)^{-1/(2n)} = \infty$
COUNTEREXAMPLES IN PROBABILITY
for $\tfrac{1}{2} \le \beta < 1$, and by the Carleman criterion (for the Stieltjes case) the Stieltjes problem for these moments is determinate for $\tfrac{1}{2} \le \beta < 1$. Since the distribution with density $f$ has no discontinuity at the origin, from the above result we conclude that the Hamburger problem corresponding to the moments $\alpha_n = \Gamma((n+1)/\beta)/\Gamma(1/\beta)$, $n = 0, 1, 2, \dots$, $\tfrac{1}{2} \le \beta < 1$, is also determinate. However, it is easy to check that $\sum(\alpha_{2n})^{-1/(2n)} < \infty$ for $0 < \beta < 1$. Hence the Carleman condition is not necessary for the determination of the moment problem (on $(-\infty, \infty)$).
(ii) Now we shall use the following interesting and intuitively unexpected result (see Heyde 1963b): let the moments $\{1, \alpha_1, \alpha_2, \dots\}$ correspond to a determinate Stieltjes problem. After a mass $\varepsilon$, $0 < \varepsilon < 1$, has been added at the origin and the distribution has been renormalized, it is possible for the new set of moments $\{1, \alpha_1(1+\varepsilon)^{-1}, \alpha_2(1+\varepsilon)^{-1}, \dots\}$ to correspond to an indeterminate Stieltjes problem. So, let $\{1, \alpha_1, \alpha_2, \dots\}$ and $\{1, \alpha_1(1+\varepsilon)^{-1}, \alpha_2(1+\varepsilon)^{-1}, \dots\}$, $0 < \varepsilon < 1$, be sets of moments corresponding respectively to a determinate and an indeterminate Stieltjes problem. Suppose the Carleman condition is necessary for the determination of the moment problem. Then we should have $\sum(\alpha_{2n})^{-1/(2n)} = \infty$, which is impossible because $\sum(\alpha_{2n})^{-1/(2n)}(1+\varepsilon)^{1/(2n)} < \infty$ (the second problem is indeterminate, so its Carleman series must converge, and $(1+\varepsilon)^{1/(2n)} \ge 1$).
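The behaviour of the two series in part (i) can be inspected numerically. The following is only a sketch: the function names are ours, and the moments are taken to be $\alpha_n = \Gamma((n+1)/\beta)/\Gamma(1/\beta)$ as stated above. It evaluates the general terms $\alpha_n^{-1/(2n)}$ and $\alpha_{2n}^{-1/(2n)}$ through the log-gamma function to avoid overflow.

```python
import math

def log_moment(n, beta):
    """log of alpha_n = Gamma((n+1)/beta) / Gamma(1/beta)."""
    return math.lgamma((n + 1) / beta) - math.lgamma(1 / beta)

def stieltjes_term(n, beta):
    """General term alpha_n^{-1/(2n)} of the Stieltjes-type Carleman series."""
    return math.exp(-log_moment(n, beta) / (2 * n))

def hamburger_term(n, beta):
    """General term alpha_{2n}^{-1/(2n)} of the Hamburger-type Carleman series."""
    return math.exp(-log_moment(2 * n, beta) / (2 * n))
```

For $\beta = 0.75$ the first kind of term stays above $1/n$ (consistent with divergence), while the second kind falls below $n^{-1.1}$ (consistent with convergence); this is an illustration over a finite range of $n$, not a proof.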
11.11. The Krein condition is sufficient but not necessary for the moment problem to be indeterminate
As shown in the introductory notes to this section, the Krein condition is sufficient for the moment problem to be indeterminate. Thus it is useful to consider examples showing that this condition is not necessary.
(i) Let $X$ be a r.v., $X \sim N(0, \tfrac{1}{2})$, and $\delta > 0$. Then the density of the r.v. $|X|^{\delta}$ is
($f_{\delta}(x) = 0$, if $x \le 0$) and it is easy to see that all moments $E[(|X|^{\delta})^k]$, $k = 1, 2, \dots$, exist. Berg (1988) has shown that the distribution of $|X|^{\delta}$ is determinate for $\delta \le 4$ and indeterminate for $\delta > 4$. However for the density $f_{\delta}$ we find that $\int_0^{\infty}\{\log f_{\delta}(x)/(\sqrt{x}(1+x))\}\,dx = -\infty$, i.e. the Krein condition is not satisfied, and hence it is not necessary for the moment problem to be indeterminate.
(ii) Take the function $h_{\gamma}(x) = \exp(-x^{\gamma})$, if $x > 0$ and $h_{\gamma}(x) = \exp(x)$, if $x \le 0$. Here $\gamma \in (0, \tfrac{1}{2})$ and let $c_{\gamma}$ be a constant such that $g_{\gamma}(x) = c_{\gamma}h_{\gamma}(x)$, $x \in \mathbb{R}^1$, is a probability density. If $Y$ is a r.v. with density $g_{\gamma}$, then all moments $E[Y^k]$, $k = 1, 2, \dots$, exist. Moreover, $\int_{-\infty}^{\infty}\{\log g_{\gamma}(x)/(1+x^2)\}\,dx = -\infty$. Hence the Krein condition is not satisfied but the distribution of $Y$ is indeterminate as follows from Example 11.6.
11.12. An indeterminate moment problem and non-symmetric distributions whose odd-order moments all vanish
In Example 6.5 we described a r.v. $Y$ such that $\alpha_{2n+1} = E[Y^{2n+1}] = 0$ for all $n = 0, 1, 2, \dots$ but the distribution of $Y$ is non-symmetric. However, we did not discuss the reason for this fact. Let us note that the distribution of the r.v. $Y$ in Example 6.5 is not determined uniquely by its moments. Now we shall show that the vanishing of the odd-order moments of a non-symmetric distribution is closely related to indeterminate Stieltjes problems. From Example 11.7 we know that there are indeterminate Stieltjes problems. Let the d.f.s $F_1$ and $F_2$ be two distinct solutions of such a problem for a given set of moments $\{1, \alpha_1, \alpha_2, \dots\}$. Then
is a d.f. which evidently is non-symmetric. Moreover, $F$ has the following moments: $1, 0, \alpha_2, 0, \alpha_4, 0, \dots$. Therefore any odd-order moment of $F$ is zero despite its non-symmetry. Finally, let us present one additional example based on the lognormal distribution considered in Example 11.2. Once again, let
$f(x) = (2\pi)^{-1/2}x^{-1}\exp[-\tfrac{1}{2}(\log x)^2]$, $f_1(x) = f(x)[1 - \sin(2\pi\log x)]$, $x > 0$.
Denote by $Z$ a r.v. whose density $g$ is defined as follows:
$g(x) = \tfrac{1}{2}f(x)$ if $x > 0$, $g(x) = \tfrac{1}{2}f_1(-x)$ if $x \le 0$.
Then one can check that all the moments of $Z$ are finite, all odd-order moments $E[Z^{2n+1}]$ are zero but $Z$ is non-symmetric.
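The key fact behind this example is that $f$ and $f_1$ have the same moments, i.e. $\int_0^{\infty} x^k f(x)\sin(2\pi\log x)\,dx = 0$ for every integer $k \ge 0$. A minimal numerical sketch follows; the function names, the step size and the truncation range are our own choices. After the substitution $y = \log x$ the integral is approximated by the trapezoid rule.

```python
import math

def perturbation_moment(k, h=0.01, half_width=12.0):
    """Trapezoid approximation of the integral of x^k f(x) sin(2*pi*log x) over x > 0.

    With y = log x the integrand becomes exp(k*y) * phi(y) * sin(2*pi*y),
    where phi is the standard normal density; its mass sits around y = k.
    """
    steps = int(round(2 * half_width / h))
    total = 0.0
    for i in range(steps + 1):
        y = k - half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        phi = math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)
        total += weight * math.exp(k * y) * phi * math.sin(2 * math.pi * y)
    return total * h

def lognormal_moment(k):
    """k-th moment exp(k^2 / 2) of the density f."""
    return math.exp(0.5 * k * k)
```

Since the perturbation contributes nothing, $\int_0^\infty x^{2n+1}f(x)\,dx = \int_0^\infty x^{2n+1}f_1(x)\,dx$, and the two halves of $g$ cancel in every odd-order moment of $Z$.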
11.13. A non-symmetric distribution with vanishing odd-order moments can coincide with the normal distribution only partially
Let us recall that in general no probability distribution is determined by a finite number of moments. The previous examples show that the distribution cannot be determined uniquely even if we know all (and hence an infinite number of) its moments. However, if we specify the class of distributions, then a member of this class could be determined by a finite number of moments. For example, a member of the so-called class of Pearson distributions is specified by a knowledge of at most four moments (Feller 1971; Heyde 1975). Certainly we have to indicate the normal distribution $N(a, \sigma^2)$ which is determined uniquely by its first two moments only. Thus we come to the following question: does there exist a r.v. $X$ such that for infinitely many $k$, but not for all $k \ge 1$, we have $E[X^k] = E[Z^k]$ where $Z \sim N(a, \sigma^2)$ but nevertheless $X \stackrel{d}{\ne} Z$?
Let $Z$ be a r.v. distributed $N(0, 1)$. We shall construct a r.v. $X$ such that
(1)
If (1) holds we can speak about a partial, but not full, coincidence of the distributions of $X$ and $Z$. So, let $Y_1$ be a r.v. with density
$g(x) = \tfrac{1}{48}c^4\exp(-c|x|^{1/4})[1 - \varepsilon\,\mathrm{sign}\,x\,\sin(c|x|^{1/4})]$, $x \in \mathbb{R}^1$,
where $c > 0$, $\varepsilon \ne 0$, $|\varepsilon| < 1$. Obviously $g$ is non-symmetric. The moments $\alpha_k = E[Y_1^k]$ can be calculated explicitly, namely
$\alpha_{2k+1} = 0$, $\alpha_{2k} = \tfrac{1}{6}c^{-8k}(8k+3)!$, $k = 0, 1, 2, \dots$
(see also Example 6.5). By choosing $c = (11!/6)^{1/8}$ we get $\alpha_2 = E[Y_1^2] = 1$. Take now a r.v. $Y_2$ which is independent of $Y_1$ and takes the values $1$ and $-1$ with probability $\tfrac{1}{2}$ each. For some constant $\beta$, $0 < \beta < 1$, which will be specified later, put
Clearly the distribution of $X$ is non-normal and non-symmetric, $E[X^{2k+1}] = 0 = E[Z^{2k+1}]$ for each $k = 0, 1, 2, \dots$
Finally we find
It remains to choose $\beta$ such that $E[X^4] = 3 = E[Z^4]$. Indeed, if the kurtosis coefficient of the r.v. $Y_1$ is $\gamma_2 = E[Y_1^4] - 3$, then take $\beta = (\gamma_2 - \sqrt{2\gamma_2})/(\gamma_2 - 2)$. Since $c$ was already fixed ($c = (11!/6)^{1/8}$), $\gamma_2$ and hence $\beta$ have definite values. Thus we have constructed a r.v. $X$ which coincides partially but not fully with the standard normal r.v. $Z$ in the sense of (1).
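The constants in this construction are easy to evaluate. The sketch below computes $c$, $\gamma_2$ and $\beta$ from the moment formula $\alpha_{2k} = \tfrac{1}{6}c^{-8k}(8k+3)!$. The helper `fourth_moment` rests on an assumption of ours: the display defining $X$ is not reproduced above, so we take, hypothetically, $X = (1-\beta)^{1/2}Y_1 + \beta^{1/2}Y_2$, a combination which has unit variance and is consistent with the stated value of $\beta$.

```python
import math

def alpha_even(k, c):
    """Even moment alpha_{2k} = (1/6) * c**(-8k) * (8k+3)! of Y_1."""
    return math.factorial(8 * k + 3) / (6.0 * c ** (8 * k))

# c is chosen so that alpha_2 = E[Y_1^2] = 1
c = (math.factorial(11) / 6.0) ** (1.0 / 8.0)
gamma2 = alpha_even(2, c) - 3.0  # kurtosis coefficient of Y_1
beta = (gamma2 - math.sqrt(2.0 * gamma2)) / (gamma2 - 2.0)

def fourth_moment(beta, m4_y1):
    """E[X^4] under the ASSUMED form X = sqrt(1-beta)*Y_1 + sqrt(beta)*Y_2.

    Odd moments of Y_1 and Y_2 vanish, E[Y_1^2] = 1 and Y_2 = +-1,
    so only three terms of the binomial expansion survive.
    """
    return (1 - beta) ** 2 * m4_y1 + 6 * beta * (1 - beta) + beta ** 2
```

Under this assumption `fourth_moment(beta, gamma2 + 3)` reproduces the normal value $3$, and `beta` indeed lies in $(0, 1)$.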
SECTION 12.
CHARACTERIZATION PROPERTIES OF SOME PROBABILITY DISTRIBUTIONS
There are probability distributions which can be characterized uniquely by some properties. In such cases it is natural to use the term 'characterization properties'. Let us formulate two important results connected with the most popular distributions, the normal distribution and the Poisson distribution.
Cramér theorem. If the sum $X_1 + X_2$ of the r.v.s $X_1$ and $X_2$ is normally distributed and these variables are independent, then each of $X_1$ and $X_2$ is normally distributed (Cramér 1936).
Raikov theorem. If $X_1$ and $X_2$ are non-negative integer-valued r.v.s such that $X_1 + X_2$ has a Poisson distribution and $X_1$ and $X_2$ are independent, then each of $X_1$ and $X_2$ has a Poisson distribution (Raikov 1938).
These important theorems, several useful corollaries and other characterization theorems can be found in the books by Fisz (1963), Moran (1968), Feller (1971), Kagan et al (1973), Chow and Teicher (1978) and Galambos and Kotz (1978). Let us note that some of the examples dealt with in Section 10 can be compared with the Cramér theorem. In particular this comparison shows that the assumption of the independence of $X_1$ and $X_2$ is essential. We present below various examples of discrete and absolutely continuous distributions and clarify whether or not some properties are characterization properties.
12.1. A binomial sum of non-binomial random variables
Let the r.v.s $X$ and $Y$ be non-negative integer-valued and let their sum $Z = X + Y$ have a binomial distribution with parameters $(n, p)$, $Z \sim \mathrm{Bi}(n, p)$. Then the probability generating function of $Z$ is $E[s^Z] = (ps + q)^n$. If additionally we suppose that $X$ and $Y$ are independent, then $(ps + q)^n = E[s^X]E[s^Y]$. Since all factors of the polynomial $(ps + q)^n$ have the form $(ps + q)^k$, $k = 0, 1, \dots, n$, it follows that each of the variables $X$ and $Y$ is also binomially distributed. This observation leads to the following question: does this conclusion hold without the hypothesis of independence between $X$ and $Y$? Let us show that the answer is negative.
Let $\zeta$ be any non-negative integer-valued r.v. Suppose $\zeta$ takes more than two different values. Define the r.v.s $\xi$ and $\eta$ by
$\xi = [\zeta/2]$, $\eta = [(\zeta + 1)/2]$
where $[x]$ denotes the 'integer part' of $x$. Obviously $\xi + \eta = \zeta$. Moreover, knowing the distribution of $\zeta$ we can easily compute $P[\xi = k]$ and $P[\eta = m]$ for all possible values of $k$, $m$. Since $P[\xi = k, \eta = m] = 0$ for those $k$, $m$ satisfying the relation $|k - m| > 1$, we see that the r.v.s $\xi$ and $\eta$ are not independent. Note that this property holds irrespective of the distribution of $\zeta$. In particular, suppose $\zeta$ is binomially distributed. Then neither $\xi$ nor $\eta$ is binomial, but their sum $\xi + \eta$, which is equal to $\zeta$, has a binomial distribution. Recall that $\xi$ and $\eta$ are dependent.
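The splitting used in this example can be tabulated directly. The sketch below (function names are ours) builds the joint law of $([\zeta/2], [(\zeta+1)/2])$ from any probability mass function of $\zeta$ given as a list indexed by the values $0, 1, 2, \dots$

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities P[zeta = k], k = 0..n, for zeta ~ Bi(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def split_joint(pmf_zeta):
    """Joint law of (xi, eta) = ([zeta/2], [(zeta+1)/2])."""
    joint = {}
    for z, prob in enumerate(pmf_zeta):
        key = (z // 2, (z + 1) // 2)
        joint[key] = joint.get(key, 0.0) + prob
    return joint

def marginals(joint):
    """Marginal laws of xi and eta as dictionaries value -> probability."""
    xi, eta = {}, {}
    for (k, m), prob in joint.items():
        xi[k] = xi.get(k, 0.0) + prob
        eta[m] = eta.get(m, 0.0) + prob
    return xi, eta
```

For $\zeta \sim \mathrm{Bi}(4, 0.3)$ the pair $(0, 2)$ has joint probability zero although both $P[\xi = 0]$ and $P[\eta = 2]$ are positive, which exhibits the dependence.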
12.2. A property of the geometric distribution which is not its characterization property
Recall that the r.v. $X$ has a geometric distribution with parameter $p$, $0 < p < 1$, if $P[X = n] = pq^n$, $q = 1 - p$, $n = 0, 1, \dots$. Let $X_1$ and $X_2$ be independent r.v.s each distributed as $X$. From the definition of a conditional probability we can easily derive that
(1) $P[X_1 = k \mid X_1 + X_2 = n] = \dfrac{1}{n+1}$, $k = 0, 1, \dots, n$.
We are interested now in whether (1) is a characterization property of the geometric distribution. More precisely: suppose $X_1$, $X_2$ are integer-valued independent r.v.s which satisfy relation (1), does it follow that $X_1$, $X_2$ are geometrically distributed? To find the answer let us consider the set $\Omega = \{\omega_{kn}: k = 0, 1, \dots, n,\ n = 0, 1, 2, \dots\}$ and let $p_n$, $n = 0, 1, \dots$ be positive numbers with $\sum_{n=0}^{\infty} p_n = 1$. Define a probability $P$ on $\Omega$ as follows:
$P(\omega_{kn}) = p_n/(n + 1)$.
This means that $\Omega = \bigcup_{n=0}^{\infty}\Omega_n$ where $\Omega_n = \{\omega_{kn}, k = 0, 1, \dots, n\}$, $P(\Omega_n) = p_n$ and each of the outcomes $\omega_{kn}$ has probability $p_n/(n + 1)$. Introduce two r.v.s, $Y_1$ and $Y_2$, such that $Y_1(\omega_{kn}) = k$, $Y_2(\omega_{kn}) = n - k$. Then for $k = 0, 1, \dots, n$,
$P[Y_1 = k \mid Y_1 + Y_2 = n] = \dfrac{P[Y_1 = k, Y_1 + Y_2 = n]}{P[Y_1 + Y_2 = n]} = \dfrac{P(\omega_{kn})}{P(\Omega_n)} = \dfrac{p_n/(n+1)}{p_n} = \dfrac{1}{n+1}.$
Thus relation (1) is true for the r.v.s $Y_1$ and $Y_2$. However, the distribution of $Y_1$ is
$P[Y_1 = k] = P[\{\omega_{kn}: n = k, k+1, \dots\}] = \sum_{n=k}^{\infty} p_n/(n + 1)$
and since the $p_n$ are arbitrary (with $\sum_n p_n = 1$), $P[Y_1 = k]$, $k = 0, 1, \dots$ can be very different from the geometric distribution. Therefore (1) is not a characterization property of the geometric distribution. Note, however, that if additionally we suppose that $X_1$ and $X_2$ are independent and identically distributed, it can be proved that each of these variables has a geometric distribution.
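The construction can be checked on a finite example. In the sketch below the weights $p_n$ are an arbitrary illustrative choice with finite support (the text allows any positive $p_n$ summing to 1), and the function names are ours.

```python
def joint_law(p):
    """P[Y1 = k, Y2 = n - k] = p[n] / (n + 1) for 0 <= k <= n < len(p)."""
    return {(k, n - k): p[n] / (n + 1) for n in range(len(p)) for k in range(n + 1)}

def conditional_given_sum(joint, n):
    """List of P[Y1 = k | Y1 + Y2 = n], k = 0..n."""
    total = sum(pr for (a, b), pr in joint.items() if a + b == n)
    return [joint[(k, n - k)] / total for k in range(n + 1)]

def marginal_y1(joint, k):
    """P[Y1 = k]."""
    return sum(pr for (a, b), pr in joint.items() if a == k)
```

With `p = [0.1, 0.2, 0.3, 0.4]` the conditional law given the sum is uniform, as relation (1) requires, while the marginal probabilities of $Y_1$ are $0.4, 0.3, 0.2, 0.1$, whose successive ratios are not constant, so $Y_1$ is not geometric.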
12.3. If the random variables X, Y and their sum X + Y each have a Poisson distribution, this does not imply that X and Y are independent
(i) Let $X$ and $Y$ be independent r.v.s each with a Poisson distribution. Then their sum $X + Y$ also has a Poisson distribution. We want to know whether the converse of the above statement is true: if $X$ and $Y$ are integer-valued and each of $X$, $Y$ and $X + Y$ has a Poisson distribution, are the variables $X$ and $Y$ independent? It turns out that the answer to this question is negative.
Take two r.v.s, $\xi$ and $\eta$, each with a Poisson distribution of a given rate. Denote their individual distributions by $\{q_i, i = 0, 1, \dots\}$ and $\{r_j, j = 0, 1, \dots\}$ where $q_i = P[\xi = i]$ and $r_j = P[\eta = j]$. Introduce the sets $A_1 = \{(0,1), (1,2), (2,0)\}$ and $A_2 = \{(0,2), (2,1), (1,0)\}$. The joint distribution of $\xi$ and $\eta$, $p_{ij} := P[\xi = i, \eta = j]$, will be defined in the following way:
(1) $p_{ij} = q_i r_j + \varepsilon$ if $(i,j) \in A_1$; $p_{ij} = q_i r_j - \varepsilon$ if $(i,j) \in A_2$; $p_{ij} = q_i r_j$ otherwise.
Here $\varepsilon \ne 0$ is a real number such that $|\varepsilon| < \min q_i r_j$, the minimum being taken over $(i,j) \in A_1 \cup A_2$. It is easy to check that $\{p_{ij}, i = 0, 1, \dots, j = 0, 1, \dots\}$ is a two-dimensional discrete probability distribution. Moreover, using (1) we find that the sum $\xi + \eta$ has a Poisson distribution. By definition $\xi$ and $\eta$ also have Poisson distributions. However, (1) implies that the r.v.s $\xi$ and $\eta$ are not independent.
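Case (i) works because the perturbation $\pm\varepsilon$ cancels along every row, every column and every anti-diagonal $i + j = n$ of the table $\{p_{ij}\}$. A minimal sketch of this check follows; the rates and the value of $\varepsilon$ used in the test are illustrative choices of ours, and the truncation level `n_max` is a numerical convenience.

```python
import math

A1 = {(0, 1), (1, 2), (2, 0)}
A2 = {(0, 2), (2, 1), (1, 0)}

def poisson_pmf(rate, n):
    return math.exp(-rate) * rate**n / math.factorial(n)

def joint(i, j, lam, mu, eps):
    """p_ij from relation (1): product law perturbed by +eps on A1, -eps on A2."""
    base = poisson_pmf(lam, i) * poisson_pmf(mu, j)
    if (i, j) in A1:
        return base + eps
    if (i, j) in A2:
        return base - eps
    return base

def marginal_xi(i, lam, mu, eps, n_max=60):
    """P[xi = i], summing the joint law over j < n_max."""
    return sum(joint(i, j, lam, mu, eps) for j in range(n_max))

def law_of_sum(n, lam, mu, eps):
    """P[xi + eta = n]."""
    return sum(joint(i, n - i, lam, mu, eps) for i in range(n + 1))
```

With rates 1 and 2 and $\varepsilon = 0.01$ the marginal of $\xi$ and the law of $\xi + \eta$ agree with the Poisson laws of rates 1 and 3, although `joint(0, 1, ...)` differs from the product of the marginals.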
(ii) Here is another case slightly similar to case (i). For fixed $\lambda > 0$ let $\varepsilon$ be an arbitrary number in the interval $(0, e^{-2\lambda}\lambda^4/6)$. Define $p_{ij}$, $i = 0, 1, \dots$, $j = 0, 1, \dots$, as follows:
$p_{11} = e^{-2\lambda}\lambda^2 + \varepsilon$, $p_{13} = e^{-2\lambda}\lambda^4/6 - \varepsilon$, $p_{31} = e^{-2\lambda}\lambda^4/6 - \varepsilon$, $p_{33} = e^{-2\lambda}\lambda^6/36 + \varepsilon$, $p_{ij} = e^{-2\lambda}\lambda^{i+j}/(i!\,j!)$ for all other $i$ and $j$.
Direct calculations lead to the following conclusions:
1) $\{p_{ij}, i, j = 0, 1, \dots\}$ is a two-dimensional distribution of a random vector, say $(X, Y)$;
2) $X \sim \mathrm{Po}(\lambda)$ and $Y \sim \mathrm{Po}(\lambda)$;
3) $X + Y \sim \mathrm{Po}(2\lambda)$.
However the two components $X$ and $Y$ are not independent.
12.4. The Raikov theorem does not hold without the independence condition
Recall that the independence of the variables $X$ and $Y$ is one of the hypotheses in the Raikov theorem (see the introductory notes). We are now interested in what happens if we do not assume that $X$ and $Y$ are independent. Our reasoning is similar to that used in Example 12.1. Let $\zeta$ be a r.v. with a Poisson distribution. Define the r.v.s $\xi$ and $\eta$ by
$\xi = [\zeta/2]$, $\eta = [(\zeta + 1)/2]$
(here $[x]$ denotes the integer part of $x$). It is easy to verify that each of $\xi$ and $\eta$ is an integer-valued r.v., neither $\xi$ nor $\eta$ has a Poisson distribution, and $\xi$ and $\eta$ are not independent, but the sum $\xi + \eta = \zeta$ has a Poisson distribution.
Therefore, as expected, the independence condition in the Raikov theorem cannot be dropped.
12.5. The Raikov theorem does not hold for a generalized Poisson distribution of order k, k ≥ 2
We say that the integer-valued non-negative r.v. $X$ has a generalized Poisson distribution of order $k$ and parameter $\lambda$, $\lambda > 0$, if
(1) $P[X = n] = \sum e^{-k\lambda}\,\lambda^{m_1 + \dots + m_k}/(m_1! \cdots m_k!)$, $n = 0, 1, 2, \dots$,
where the summation is taken over all non-negative integers $m_1, \dots, m_k$ such that $m_1 + 2m_2 + \dots + km_k = n$. Obviously, if $k = 1$, then (1) defines the usual Poisson distribution. By using (1) we find explicitly the p.g.f. $g(s) = E[s^X]$ (see Philippou 1983):
(2) $g(s) = \exp[-\lambda(k - s - s^2 - \dots - s^k)]$, $|s| \le 1$.
Suppose now that $Y_1$, $Y_2$ are independent r.v.s taking values in the set $\{0, 1, 2, \dots\}$ and such that the sum $Y_1 + Y_2$ has a generalized Poisson distribution of order $k$. The question is: does it follow from this that each of the variables $Y_1$, $Y_2$ has a generalized Poisson distribution of order $k$? Note that in the particular case $k = 1$ the usual Poisson distribution is obtained and it follows from the Raikov theorem that the answer to this question is positive (see Example 12.4). We have to find an answer for $k \ge 2$.
Consider two independent r.v.s, $Z_1$ and $Z_2$, where $Z_1$ has a generalized Poisson distribution of order $(k - 1)$ and parameter $\lambda$, and $Z_2$ has the following distribution:
$P[Z_2 = m] = e^{-\lambda}\lambda^{m/k}/(m/k)!$, $m = 0, k, 2k, 3k, \dots$
We shall use the explicit form of the p.g.f.s $g_1(s)$ and $g_2(s)$ of $Z_1$ and $Z_2$ respectively. Taking (2) into account we find that
$g_1(s) = \exp[-\lambda(k - 1 - s - s^2 - \dots - s^{k-1})]$, $|s| \le 1$.
On the other hand, direct computation shows that
$g_2(s) = \exp[-\lambda(1 - s^k)]$, $|s| \le 1$.
Since $Z_1$ and $Z_2$ are independent, the p.g.f. $g_3$ of the sum $Z_1 + Z_2$ is the product of $g_1$ and $g_2$. Thus
$g_3(s) = g_1(s)g_2(s) = \exp[-\lambda(k - s - s^2 - \dots - s^k)]$, $|s| \le 1$.
But, looking at (2), we see that $g_3$ is the p.g.f. of a generalized Poisson distribution of order $k$. Therefore the r.v. $Z = Z_1 + Z_2$ has just this distribution. Moreover, $Z$ is decomposed into a sum of two independent r.v.s $Z_1$ and $Z_2$, neither of which has a generalized Poisson distribution of order $k$. The Raikov theorem is therefore not valid for generalized Poisson distributions of order $k \ge 2$.
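The decomposition can be verified on the level of probabilities. The sketch below is built on an assumption we state explicitly: the p.g.f. of the generalized Poisson distribution of order $k$ is taken in the standard form $g(s) = \exp[-\lambda(k - s - s^2 - \dots - s^k)]$. Its power-series coefficients are obtained from the relation $g' = h'g$ with $h(s) = \lambda(s + \dots + s^k)$. All function names are ours.

```python
import math

def poisson_order_k_pmf(k, lam, n_max):
    """Coefficients g_0..g_{n_max} of exp(-lam*(k - s - s^2 - ... - s^k))."""
    g = [math.exp(-k * lam)]
    for n in range(1, n_max + 1):
        # g' = h' g with h(s) = lam*(s + ... + s^k) gives
        # n * g_n = sum_{j=1}^{min(n,k)} j * lam * g_{n-j}
        g.append(sum(j * lam * g[n - j] for j in range(1, min(n, k) + 1)) / n)
    return g

def z2_pmf(k, lam, n_max):
    """P[Z_2 = m] = exp(-lam) * lam^(m/k) / (m/k)! for m = 0, k, 2k, ..."""
    pmf = [0.0] * (n_max + 1)
    for r in range(n_max // k + 1):
        pmf[r * k] = math.exp(-lam) * lam**r / math.factorial(r)
    return pmf

def convolve(a, b):
    """Law of the sum of two independent variables, truncated to the common range."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[m - i] for i in range(m + 1)) for m in range(n)]
```

For example, with $k = 3$ the convolution of the order-2 law with the law of $Z_2$ coincides, term by term, with the order-3 law.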
12.6. A case when the Cramér theorem is not applicable
Recall first that the Cramér theorem can be reformulated in the following equivalent form. Let $F_1(x)$, $x \in \mathbb{R}^1$ and $F_2(x)$, $x \in \mathbb{R}^1$ be non-degenerate d.f.s satisfying the relation
(1) $(F_1 * F_2)(x) = \Phi_{a,\sigma}(x)$, $x \in \mathbb{R}^1$,
where $\Phi_{a,\sigma}$ is a d.f. corresponding to $N(a, \sigma^2)$. Then each of $F_1$ and $F_2$ is a normal d.f. Suppose now that the condition (1) is satisfied only for $x \le x_0$, where $x_0$ is a fixed number, $x_0 < \infty$ (i.e. not for all $x \in \mathbb{R}^1$). Is it true in this case that $F_1$ and $F_2$ are normal d.f.s? The answer follows from the next example. Denote by $\Phi = \Phi_{0,1}$ the standard normal d.f. and define the function:
$F_1(x) = 2\sum_{n=0}^{\infty}(-1)^n\Phi(x - n)$ if $x \le 0$, $F_1(x) = v(x)$ if $x > 0$,
where $v(x)$ is an arbitrary non-decreasing function defined for $x \in (0, \infty)$ and such that $v(0+) = F_1(0)$ and $v(\infty) = 1$. It is easy to check that $F_1$ is a d.f.; let $\xi_1$ be a r.v. with this d.f. Further, let $F_2$ be the d.f. of the r.v. $\xi_2$ taking two values, 0 and 1, each with probability $\tfrac{1}{2}$. Then we find that
$(F_1 * F_2)(x) = \tfrac{1}{2}[F_1(x) + F_1(x - 1)] = \Phi(x)$ for $x \le 0$,
i.e. condition (1) is satisfied for $x \le x_0$ with $x_0 = 0$. However if $x > 0$, then $(F_1 * F_2)(x) \ne \Phi(x)$. Obviously $F_1$ and $F_2$ are not normal d.f.s. Hence condition (1) in the Cramér theorem cannot be relaxed.
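The coincidence on the half-line $x \le 0$ comes from a telescoping sum and is easy to confirm numerically. The sketch below assumes that the alternating series defining $F_1$ starts at $n = 0$; the number of retained terms is a numerical choice of ours.

```python
import math

def Phi(x):
    """Standard normal d.f."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def F1_left(x, terms=60):
    """F_1(x) = 2 * sum_{n>=0} (-1)^n * Phi(x - n), valid for x <= 0."""
    return 2.0 * sum((-1) ** n * Phi(x - n) for n in range(terms))

def convolution_left(x):
    """(F_1 * F_2)(x) = (F_1(x) + F_1(x - 1)) / 2 for x <= 0,
    where F_2 puts mass 1/2 at each of the points 0 and 1."""
    return 0.5 * (F1_left(x) + F1_left(x - 1.0))
```

For any $x \le 0$ the value `convolution_left(x)` agrees with `Phi(x)` up to rounding error.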
12.7. A pair of unfair dice may behave like a pair of fair dice
Recall first that a standard and symmetric die (a fair die) is a term used for a 'real' material cube whose six faces are numbered 1, 2, 3, 4, 5, 6 and such that when rolling this die each of the outcomes has probability $\tfrac{1}{6}$. Suppose now we have at our disposal four dice: white, black, blue and red. The available information is that the white and the black dice are standard and symmetric. Then the sum $X + Y$ of the numbers on these two dice is a r.v. which is easy to describe. Clearly, $X + Y$ takes the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12 with probabilities $\tfrac{1}{36}, \tfrac{2}{36}, \tfrac{3}{36}, \tfrac{4}{36}, \tfrac{5}{36}, \tfrac{6}{36}, \tfrac{5}{36}, \tfrac{4}{36}, \tfrac{3}{36}, \tfrac{2}{36}$ and $\tfrac{1}{36}$, respectively.
Suppose we additionally know that the blue and the red dice are such that the sum $\xi + \eta$ of the numbers on these two dice is distributed exactly as the sum $X + Y$ obtained when rolling the white and the black dice (i.e. $\xi + \eta$ takes the same values as $X + Y$ with the same probabilities shown above). Does this information imply that the blue die and the red die are fair, i.e. that each is standard and symmetric? It turns out the answer to this question is negative as can be seen by the following physically realizable situation. Take a pair of ordinary dice changing, however, the numbers on the faces. Namely, the faces of the blue die are numbered 1, 2, 2, 3, 3 and 4 while those of the red die are numbered 1, 3, 4, 5, 6 and 8. If $\xi$ and $\eta$ are the numbers appearing after rolling these two dice, we easily find that indeed $\xi + \eta \stackrel{d}{=} X + Y$. Hence, despite the facts that the sum $X + Y$ comes from a pair of fair dice and that $X + Y$ has the same distribution as $\xi + \eta$, this does not imply that the blue and the red dice are fair. The practical advice is: do not rush to pay the same for a pair of dice with fair sums as for a pair of fair dice!
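The claim about the relabelled dice can be confirmed by enumerating all 36 equally likely outcomes. A short sketch (the names `fair`, `blue`, `red` and `sum_counts` are ours):

```python
from collections import Counter
from itertools import product

def sum_counts(die_a, die_b):
    """Number of face pairs producing each value of the sum."""
    return Counter(a + b for a, b in product(die_a, die_b))

fair = [1, 2, 3, 4, 5, 6]
blue = [1, 2, 2, 3, 3, 4]
red = [1, 3, 4, 5, 6, 8]
```

Comparing `sum_counts(blue, red)` with `sum_counts(fair, fair)` shows that every sum from 2 to 12 is produced by the same number of face pairs in both cases.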
12.8. On two properties of the normal distribution which are not characterizing properties
Let $X$ and $Y$ be independent r.v.s distributed $N(0, 1)$. Then the ratio $X/Y$ has a distribution
$P[X/Y \le z] = (2\pi)^{-1}\iint_{x/y \le z}\exp(-\tfrac{1}{2}x^2)\exp(-\tfrac{1}{2}y^2)\,dx\,dy$, $z \in \mathbb{R}^1$.
It is easy to check that
$(P[X/Y \le z])'_z = 1/[\pi(1 + z^2)]$.
Hence $X/Y$ has a Cauchy distribution. Let us call this property $(N/N \to C)$. The presence of the property $(N/N \to C)$ leads to the following question. Suppose $X$ and $Y$ are i.i.d. r.v.s with zero means such that $X/Y$ has a Cauchy distribution. Is it true that $X$ and $Y$ are normally distributed? By examples we show that the answer to this question is negative.
(i) Consider two i.i.d. r.v.s $\xi$ and $\eta$ having the density
$f(x) = (\sqrt{2}/\pi)/(1 + x^4)$, $x \in \mathbb{R}^1$.
If $g(z)$, $z \in \mathbb{R}^1$, denotes the density of the ratio $\xi/\eta$ then we easily find
$g(z) = \left((2/\pi^2)\iint_{x/y \le z}(1 + x^4)^{-1}(1 + y^4)^{-1}\,dx\,dy\right)'_z = (2/\pi^2)\int_{-\infty}^{\infty}(1 + y^4)^{-1}(1 + z^4y^4)^{-1}|y|\,dy = [\pi(1 + z^2)]^{-1}.$
Therefore the ratio $\xi/\eta$ has a Cauchy distribution but obviously the variables $\xi$ and $\eta$ are non-normally distributed. Thus we have established that the property $(N/N \to C)$ is not a characterization property of the normal distribution.
(ii) Let us consider another case on the same topic. It can be checked that
$f_1(x) = \dfrac{2x^4}{\pi(1 + x^2)(1 + x^4)}$
is a density function. Take two independent r.v.s $X_1$ and $Y_1$ each with density $f_1$. Then the ratio $Z_1 = X_1/Y_1$ has a Cauchy distribution (for details see Steck 1959). Clearly $X_1$ and $Y_1$ are not normally distributed. It should be noted that the variables $X$ and $Y$ above are independent and this condition is essential for $X/Y$ to be Cauchy distributed.
(iii) It turns out the ratio $X/Y$ has a Cauchy distribution for some cases when $X$ and $Y$ are dependent. (We can guess that now the reasoning is more complicated.) Indeed, consider the function
It can be shown that $\psi$ is the ch.f. of a two-dimensional random vector, say $(\xi, \eta)$, such that: 1) each of $\xi$ and $\eta$ is not normally distributed (look at the marginal ch.f.s of $\xi$ and $\eta$); 2) $\xi$ and $\eta$ are dependent; 3) the ratio $\xi/\eta$ has a Cauchy distribution. (For details see Rao 1984.)
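Returning to case (i), the final equality $(2/\pi^2)\int_{-\infty}^{\infty}(1+y^4)^{-1}(1+z^4y^4)^{-1}|y|\,dy = [\pi(1+z^2)]^{-1}$ can be checked numerically. In the sketch below we substitute $u = y^2$ and then $u = \tan\theta$, which turns the integral into $\int_0^{\pi/2}\cos^2\theta/(\cos^2\theta + z^4\sin^2\theta)\,d\theta$, and apply the midpoint rule. The function name and the number of steps are our choices.

```python
import math

def ratio_density(z, steps=2000):
    """Density of xi/eta at z for i.i.d. xi, eta with density (sqrt(2)/pi)/(1+x^4)."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoint of the i-th subinterval
        c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
        total += c2 / (c2 + z**4 * s2)
    return (2 / math.pi**2) * total * h

def cauchy_density(z):
    return 1.0 / (math.pi * (1.0 + z * z))
```

For moderate values of $z$ the two functions agree to many decimal places.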
(iv) Let $X_1, \dots, X_n$ be $n$ independent r.v.s each distributed normally $N(0, \sigma^2)$, $n \ge 2$. Define
It is well known (see Feller 1971) that $T$ has a Student distribution with $n - 1$ degrees of freedom. (Recall that $T$ is often used in mathematical statistics.) Let us consider the converse. Suppose $X_1, \dots, X_n$ are i.i.d. r.v.s with density $f(x)$, $x \in \mathbb{R}^1$, and we are given that the variable $T$ has a Student distribution. Does it follow from this that $f$ is a normal density? The example below shows that for $n = 2$ the answer is negative.
Let $X_1$, $X_2$ be independent r.v.s each with density $f$ and let
Our assumption is that $T$ has a Student distribution. Thus the problem is to find the unknown density $f$. Firstly, let us introduce the functions $h_1(x_1, x_2)$, $h_2(x, y)$ and $h_3(z, y)$ which are the densities of the random vectors $(X_1, X_2)$, $(\bar{X}, s)$ and $(T, s)$ respectively. We find that
$h_1(x_1, x_2) = f(x_1)f(x_2)$, $x_1 \in \mathbb{R}^1$, $x_2 \in \mathbb{R}^1$,
$h_2(x, y) = 2^{3/2}f(x + 2^{-1/2}y)f(x - 2^{-1/2}y)$, $x \in \mathbb{R}^1$, $y \in \mathbb{R}^+$,
$h_3(z, y) = 2y\,f[y(z + 1)/\sqrt{2}]\,f[y(z - 1)/\sqrt{2}]$, $z \in \mathbb{R}^1$, $y \in \mathbb{R}^+$.
By the assumption above $T$ has a Student distribution and clearly in this case (of two variables $X_1$ and $X_2$) $T$ has a Cauchy distribution, that is the density of $T$ is $1/[\pi(1 + z^2)]$, $z \in \mathbb{R}^1$. But the density of $T$ can be obtained from $h_3(z, y)$ by integration. Thus we come to the following relation:
(1) $\int_0^{\infty} 2y\,f[y(z + 1)/\sqrt{2}]\,f[y(z - 1)/\sqrt{2}]\,dy = 1/[\pi(1 + z^2)]$, $z \in \mathbb{R}^1$.
It can be shown (see Mauldon 1956) that $f$ is an even function and that the general solution of (1) is of the form $f(x) = \pi^{-1/2}g(x^2)$, $x \in \mathbb{R}^1$, where
(2) $\int_0^{\infty} g(u)g(au)\,du = 1/(1 + a)$, $a > 0$.
Furthermore, the integral equation (2) has infinitely many solutions. However, for our purpose it is enough to take e.g. only two solutions, namely:
(a) $g(u) = e^{-u}$, $f(x) = \pi^{-1/2}\exp(-x^2)$, $x \in \mathbb{R}^1$;
(b) $g(u) = \sqrt{2/\pi}/(1 + u^2)$, $f(x) = (\sqrt{2}/\pi)/(1 + x^4)$, $x \in \mathbb{R}^1$.
Obviously in case (a) the variables $X_1$ and $X_2$ are distributed normally $N(0, \tfrac{1}{2})$, while in case (b) the distributions of $X_1$ and $X_2$ are both non-normal. Therefore we have constructed two i.i.d. r.v.s $X_1$ and $X_2$ whose distribution is non-normal but the variable $T$ has a Student distribution. Finally it is interesting to note that the probability density function $f(x) = (\sqrt{2}/\pi)/(1 + x^4)$ has appeared both in case (i) and in the present case (iv).
(v) Recall the definition of the so-called beta distribution of the second kind denoted by $\beta^{(2)}(a, b)$. We say that the r.v. $X \sim \beta^{(2)}(a, b)$ if its density equals $x^{a-1}(1 + x)^{-a-b}/B(a, b)$, if $x > 0$ and $0$, if $x \le 0$. Here $a > 0$, $b > 0$.
The following result, being of independent interest (see Letac 1995), is used for the reasoning below: $Z$ has a Cauchy distribution $\mathcal{C}(0, 1)$ $\iff$ $Z$ is symmetric and $|Z|^2 \sim \beta^{(2)}(\tfrac{1}{2}, \tfrac{1}{2})$.
Consider now two independent and symmetric r.v.s $X_1$ and $X_2$ such that
$|X_1|^2 \sim \beta^{(2)}(\tfrac{1}{2}, b)$ and $|X_2|^2 \sim \beta^{(2)}(\tfrac{1}{2}, 1 + b)$ for some $b > 0$.
Then, referring again to the book by Letac (1995) for details, we can show that the quotient $X_2/X_1$ has a Cauchy distribution $\mathcal{C}(0, 1)$. Hence we have described another case of independent r.v.s $X_1$ and $X_2$ such that their quotient $X_2/X_1$ follows a Cauchy distribution. Obviously, neither $X_1$ nor $X_2$ is normally distributed. Note, however, that here we did not make the advance requirement that $X_1$ and $X_2$ have the same distribution.
12.9. Another interesting property which does not characterize the normal distribution
Let us start with the following result (for the proof see Baringhaus et al 1988). If $X$ and $Y$ are independent r.v.s, $Z = XY/(X^2 + Y^2)^{1/2}$ and $X \sim N(0, \sigma_1^2)$, $Y \sim N(0, \sigma_2^2)$, then $Z \sim N(0, \sigma^2)$ with $\sigma^2 = \sigma_1^2\sigma_2^2/(\sigma_1 + \sigma_2)^2$. It is interesting to point out that $Z$ is a non-linear function of two normally distributed r.v.s and, as stated, $Z$ itself also has a normal distribution. This leads to the inverse question: if $X$ and $Y$ are independent r.v.s and $Z$ is normally distributed, does it follow that $X \sim N$ and $Y \sim N$? Assume the answer to this question is positive. Thus we can suppose that $Z \sim N(0, 1)$. The definition of $Z$ implies that
$1/X^2 + 1/Y^2 \stackrel{d}{=} 1/Z^2$.
It is easy to find that the distribution of $1/Z^2$ has a Laplace transform $\psi(t) = \exp(-\sqrt{2t})$, $t \ge 0$, which means that $1/Z^2$ has a stable distribution with parameter $\tfrac{1}{2}$. Let us show that $1/Z^2$ admits the representation
$1/Z^2 \stackrel{d}{=} U_1 + U_2$
where $U_1$ and $U_2$ are independent non-negative r.v.s such that the distribution of each of them does not have an 'atom' at $0$ and does not belong to the class of stable distributions with parameter $\tfrac{1}{2}$. For this we write $\psi$ in the form
$\psi(t) = \exp\left[-\dfrac{1}{\sqrt{2\pi}}\int_0^{\infty}(1 - e^{-tx})x^{-3/2}\,dx\right]$, $t \ge 0$,
and introduce the following two functions of $x \in \mathbb{R}^1$:
(as usual $I_A(\cdot)$ is the indicator function of the set $A$). Denoting
$\varphi_j(t) = \int_0^{\infty}(1 - e^{-tx})x^{-1}h_j(x)\,dx$, $j = 1, 2$,
we see that the integrals $\int_1^{\infty}x^{-1}h_j(x)\,dx$, $j = 1, 2$, are convergent and both
$\psi_1(t) = \exp[-\varphi_1(t)]$, $t \in \mathbb{R}^+$ and $\psi_2(t) = \exp[-\varphi_2(t)]$, $t \in \mathbb{R}^+$
are Laplace transforms of infinitely divisible distributions with support $[0, \infty)$ (see Feller 1971). Since $\varphi_j(t) \to \infty$ as $t \to \infty$, $j = 1, 2$, these distributions do not have 'atoms' at $0$. Suppose now that $U_j$ is a r.v. having $\psi_j$ as its Laplace transform, $j = 1, 2$. We can take $U_1$ and $U_2$ to be independent. Then the Laplace transform of the distribution of $U_1 + U_2$ equals $\psi_1(t)\psi_2(t)$. However $\psi_1(t)\psi_2(t) = \psi(t)$. This fact and the reasoning above imply that $1/Z^2$ is the sum of $U_1$ and $U_2$ which are independent but, obviously, the distributions of $1/U_1$ and $1/U_2$ are not normal as they might be if the answer to the above question were positive. The interesting property described at the beginning is therefore not a characterizing property for a normal distribution.
12.10. Can we weaken some conditions under which two distribution functions coincide?
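Let us formulate the following result (see Riedel 1975). Suppose $F_1(x)$, $x \in \mathbb{R}^1$, is an infinitely divisible d.f. and $F_2(x) = \Phi(x)$, $x \in \mathbb{R}^1$, where $\Phi$ is the standard normal d.f. Then the condition
(1) $\lim_{x \to -\infty}[F_1(x)/F_2(x)] = 1$
implies that
(2) $F_1(x) = F_2(x)$, $x \in \mathbb{R}^1$.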
It is interesting to show the importance of the conditions in this result. In particular, the following question is discussed by Blank (1981): does (1) imply that $F_1 = F_2$ if we suppose that $F_1$ and $F_2$ are arbitrary infinitely divisible d.f.s, $F_2(x) > 0$, $x \in \mathbb{R}^1$? By an example we show that the answer is negative. Introduce the functions
$G_1(x) := e^{-1}\sum_{k \le x}1/k!$, $G_2(x) := e^{-1}\sum_{2k \le x}1/k!$
and define $F_1$ and $F_2$ as convolutions by
$F_1 = G_1 * \Phi$, $F_2 = G_2 * \Phi$.
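Then both $F_1$ and $F_2$ are infinitely divisible and $F_2(x) > 0$, $x \in \mathbb{R}^1$. Let us now estimate the quantity $[F_1(x)/F_2(x)] - 1$ in two ways. We have
$\dfrac{F_1(x)}{F_2(x)} - 1 = \dfrac{\sum_{k=0}^{\infty}(1/k!)\Phi(x - k)}{\sum_{k=0}^{\infty}(1/k!)\Phi(x - 2k)} - 1 \le \dfrac{\sum_{k=1}^{\infty}(1/k!)\Phi(x - k)}{\sum_{k=0}^{\infty}(1/k!)\Phi(x - 2k)} \le \dfrac{\sum_{k=1}^{\infty}(1/k!)\Phi(x - k)}{\Phi(x)} \le \dfrac{\Phi(x - 1)}{\Phi(x)}\sum_{k=1}^{\infty}(1/k!) \to 0$ as $x \to -\infty$.
On the other hand,
$1 - \dfrac{F_1(x)}{F_2(x)} \le \dfrac{\Phi(x - 2)}{\Phi(x)}\sum_{k=1}^{\infty}(1/k!) \to 0$ as $x \to -\infty$.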
Thus $\lim_{x \to -\infty}[F_1(x)/F_2(x)] = 1$, that is relation (1) is satisfied, but $F_1 \ne F_2$.
12.11. Does the renewal equation determine uniquely the probability density?
Let us start with a sequence $\{X_i, i = 1, 2, \dots\}$ of non-negative i.i.d. r.v.s with a common d.f. $F$ and density $f$. It is customary to interpret $X_i$ as a lifetime, or renewal time, and it is important to know the probability distribution, say $H_t$, of the variable $N_t$ defined as the number of renewals on the time interval $[0, t]$. In some cases it is even more important to find $U(t) = EN_t$, the average number of renewals up to time $t$, without asking explicitly for $H_t$. We have
(1) $U(t) = F(t) + \int_0^t F(t - s)\,dU(s)$
and hence the function $u(t) = dU(t)/dt$ (which exists since $f = F'$ exists), called a renewal density, satisfies
(2) $u(t) = f(t) + \int_0^t f(t - s)u(s)\,ds$ for $t > 0$.
The term renewal equation is used for both (1) and (2) and we are interested in how to find $U$ (or $u$) in terms of $F$ (or $f$), and conversely. If $f^*$ and $u^*$ are the Laplace transforms of $f$ and $u$ ($f^*(\alpha) = \int_0^{\infty}e^{-\alpha t}f(t)\,dt$, $u^*(\alpha) = \int_0^{\infty}e^{-\alpha t}u(t)\,dt$), we easily find from (2) that
(3) $u^*(\alpha) = \dfrac{f^*(\alpha)}{1 - f^*(\alpha)}$, $\alpha > 0$.
Obviously, (3) and (2) imply that $f$ determines $u$ uniquely. Consequently $F$ determines $U$ uniquely. E.g. if $F \sim \mathrm{Exp}(\lambda)$, then $f^*(\alpha) = \lambda/(\lambda + \alpha)$, $u^*(\alpha) = \lambda/\alpha$ $\Rightarrow$ $u(t) = \lambda$ for all $t$ $\Rightarrow$ $U(t) = \lambda t$, a well known result.
Let us now answer the inverse question: does $u$ determine $f$ uniquely? It turns out the answer is negative. Recall first that the classical renewal theorem (Feller 1971) states that in general $\lim_{t \to \infty}u(t) = 1/\mu$, where $\mu = E[X_1]$ is the average lifetime. Let us show that there is a renewal density $u(t)$ with $u(t) \to 1/\mu$ as $t \to \infty$ for some $\mu > 0$ and such that $f^*(\alpha) = u^*(\alpha)/[1 + u^*(\alpha)]$ found from (3) leads to a function $f(t)$ which may not be a probability density. Indeed, take $u(t) = (1 - e^{-\mu t})/\mu$ for $t > 0$ and fixed $\mu$. Obviously, $u(t) \to 1/\mu$ as $t \to \infty$. The Laplace transform $u^*(\alpha)$ of this $u(t)$ is
$u^*(\alpha) = 1/[\alpha(\alpha + \mu)]$ $\Rightarrow$ $f^*(\alpha) = 1/(\alpha^2 + \alpha\mu + 1)$.
Suppose now that $\mu < 2$ (by assumption $\mu > 0$). Inverting $f^*$ we find that $f(t) = e^{-\mu t/2}(c/2)^{-1}\sin(ct/2)$, $0 < t < \infty$, where $c = \sqrt{4 - \mu^2}$, and that $\int_0^{\infty}f(t)\,dt = 1$. However the function $f$ is not a probability density, since it takes negative values wherever $\sin(ct/2) < 0$.
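Both properties of the function $f$ found above (total integral 1, but negative values) can be seen numerically. The sketch below uses a plain trapezoid rule; the function names, the truncation point and the step are our choices.

```python
import math

def f(t, mu):
    """f(t) = exp(-mu*t/2) * sin(c*t/2) / (c/2), c = sqrt(4 - mu^2), 0 < mu < 2."""
    c = math.sqrt(4.0 - mu * mu)
    return math.exp(-mu * t / 2.0) * math.sin(c * t / 2.0) / (c / 2.0)

def integral_of_f(mu, upper=80.0, h=0.001):
    """Trapezoid approximation of the integral of f over [0, upper]."""
    steps = int(round(upper / h))
    total = 0.5 * (f(0.0, mu) + f(upper, mu))
    total += sum(f(i * h, mu) for i in range(1, steps))
    return total * h
```

For $\mu = 1$ we have $c = \sqrt{3}$, and at $t = 3\pi/c$ the sine factor equals $-1$, so $f(t) < 0$ there even though the integral of $f$ over $(0, \infty)$ is 1.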
12.12. A property not characterizing the Cauchy distribution
Suppose the r.v. $X$ has a Cauchy distribution with density $f(x) = 1/[\pi(1 + x^2)]$, $x \in \mathbb{R}^1$. Then it is easy to check that the r.v. $1/X$ has the same density. This property leads naturally to the following question. Let $X$ be a r.v. which is absolutely continuous and let its d.f. be denoted by $F(x)$, $x \in \mathbb{R}^1$. Suppose the r.v. $1/X$ has the same d.f. $F$. Does it follow that $F$ is the Cauchy distribution? Clearly, if the answer to this question is positive, then the property $X \stackrel{d}{=} 1/X$ would be a characterizing property of the Cauchy distribution. It turns out, however, that in general the answer is negative. Let us illustrate this by the following example. Suppose $X$ is a r.v. with density
$g(x) = 1/4$ if $|x| \le 1$, $g(x) = 1/(4x^2)$ if $|x| > 1$.
Thus $X$ is absolutely continuous and it is not difficult to check that $1/X$ has the same density $g$. Hence $X \stackrel{d}{=} 1/X$. It is obvious, however, that $X$ does not have a Cauchy distribution.
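The invariance under $x \mapsto 1/x$ follows from the change-of-variables formula: if $X$ has density $g$, then $1/X$ has density $g(1/y)/y^2$ at $y \ne 0$. A small sketch of this check (function names are ours):

```python
import math

def g(x):
    """Density equal to 1/4 on [-1, 1] and 1/(4 x^2) outside."""
    return 0.25 if abs(x) <= 1 else 0.25 / (x * x)

def density_of_reciprocal(y):
    """Density of 1/X at y != 0, obtained by change of variables."""
    return g(1.0 / y) / (y * y)

def cauchy_density(x):
    return 1.0 / (math.pi * (1.0 + x * x))
```

The functions `g` and `density_of_reciprocal` agree at every nonzero point, while `g` differs from `cauchy_density`, e.g. at $x = 0$.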
12.13. A property not characterizing the gamma distribution
Let $X$ and $Y$ be independent r.v.s each with a gamma distribution $\gamma(p, a)$, $p > 0$, $a > 0$; that is, the common density is
(1) $f(x) = 0$ if $x \le 0$, $f(x) = [a^p/\Gamma(p)]x^{p-1}e^{-ax}$ if $x > 0$.
Then the ratio $Z = X/Y$ has the following density:
(2) $g(z) = 0$ if $z \le 0$, $g(z) = [1/B(p, p)]z^{p-1}(1 + z)^{-2p}$ if $z > 0$
(beta distribution of the second kind; see Example 12.8). This connection between gamma and beta distributions leads to the next question. Let $X$ and $Y$ be positive independent r.v.s such that the ratio $Z = X/Y$ has a density given by (2). Does this imply that each of $X$ and $Y$ has a gamma distribution? Let us show that the answer to this question is negative. To see this, introduce the following two functions, where $a > 0$, $p > 0$:
$$f_1(x) = \begin{cases} 0, & \text{if } x \le 0 \\ C_1 x^{-p-1}e^{-a/x}, & \text{if } x > 0, \end{cases} \qquad f_2(x) = \begin{cases} 0, & \text{if } x \le 0 \\ C_2 x^{p-1}/(1 + x^2)^{p+1/2}, & \text{if } x > 0. \end{cases}$$
It is easy to check that with $C_1 = a^p/\Gamma(p)$ and $C_2 = 2\Gamma(p + \tfrac12)/[\Gamma(\tfrac12 p)\Gamma(\tfrac12 p + \tfrac12)]$, $f_1$ and $f_2$ are density functions. Take two independent r.v.s, say $\xi_1$ and $\eta_1$, each with density $f_1$. Then we can establish that the density $g_1$ of the ratio $\zeta_1 = \xi_1/\eta_1$ coincides with the density $g$ given by (2). Clearly, however, in this case $f_1$ does not have the form (1). The same conclusion can be derived if we start with two independent r.v.s, $\xi_2$ and $\eta_2$, each having the density $f_2$. In this case again the density of $\zeta_2 = \xi_2/\eta_2$ coincides with (2) while $f_2$ is not of the form described by (1).
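The statement about the second density can be probed numerically. The sketch below assumes the density $C_2 x^{p-1}/(1+x^2)^{p+1/2}$ with $C_2 = 2\Gamma(p+\tfrac12)/[\Gamma(\tfrac12 p)\Gamma(\tfrac12 p+\tfrac12)]$ and evaluates the ratio density through the standard formula $\int_0^\infty y\,f(zy)f(y)\,dy$, using a plain midpoint rule. The function names, the substitution $y = t/(1-t)$ and the number of steps are illustrative choices.

```python
import math


def f2(x: float, p: float) -> float:
    """Second density of Example 12.13 (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    c2 = 2.0 * math.gamma(p + 0.5) / (math.gamma(p / 2) * math.gamma(p / 2 + 0.5))
    return c2 * x ** (p - 1) / (1.0 + x * x) ** (p + 0.5)


def beta_second_kind(z: float, p: float) -> float:
    """Target density (2): z^(p-1) (1+z)^(-2p) / B(p, p) for z > 0."""
    beta = math.gamma(p) ** 2 / math.gamma(2 * p)
    return z ** (p - 1) * (1.0 + z) ** (-2 * p) / beta


def ratio_density(z: float, p: float, steps: int = 20000) -> float:
    """Density at z of the ratio of two independent variables with density f2.

    Computes the integral over y > 0 of y * f2(z*y) * f2(y) with the
    substitution y = t / (1 - t) and the midpoint rule on (0, 1).
    """
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        y = t / (1.0 - t)
        total += y * f2(z * y, p) * f2(y, p) / (1.0 - t) ** 2
    return total / steps


# Largest discrepancy over a few arbitrary (z, p) pairs.
max_error = max(
    abs(ratio_density(z, p) - beta_second_kind(z, p))
    for p in (1.0, 2.0)
    for z in (0.5, 1.0, 3.0)
)
```

For these parameter values the numerical ratio density agrees with (2) to within the quadrature error, although the density used is not of the gamma form (1).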
12.14. An interesting property which does not characterize uniquely the inverse Gaussian distribution
We say that the r.v. $X$ has an inverse Gaussian distribution with parameters $\mu > 0$ and $\lambda > 0$ if the density $f$ of $X$ is given by
$$f(x) = \begin{cases} \left(\dfrac{\lambda}{2\pi x^3}\right)^{1/2}\exp\left[-\dfrac{\lambda(x - \mu)^2}{2\mu^2 x}\right], & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases} \tag{1}$$
It is easy to see that all moments $\alpha_n = E[X^n]$, $n = 1, 2, \ldots$, exist. Moreover, it can be shown that this distribution is determined uniquely by its moment sequence $\{\alpha_n, n = 1, 2, \ldots\}$. It is interesting to note that all negative-order moments of $X$ are also finite, that is, $E[X^{-n}]$ exists for each positive integer $n$. Further, a standard transformation leads to the following interesting relation:
$$E[X^{-n}] = E[X^{n+1}]/\mu^{2n+1}, \quad n = 1, 2, \ldots \tag{2}$$
This relation and the uniqueness of the moment problem mentioned above motivate the conjecture: if $X$ is a positive r.v. such that all moments $E[X^n]$ and $E[X^{-n}]$, $n = 1, 2, \ldots$, exist and satisfy (2), then $X$ has an inverse Gaussian distribution. It turns out, however, that this conjecture is not correct. Note firstly that $EX = \mu$ and let for simplicity $\mu = 1$. Then (2) has the form $E[X^{-n}] = E[X^{n+1}]$. Further, the density (1) satisfies the relation
$$x f(x) = (1/x^2) f(1/x), \quad x > 0. \tag{3}$$
Thus the density $f$ of the inverse Gaussian distribution can be considered as a solution of the functional equation (3). Let $Y$ be a r.v. whose density $g$ satisfies (3). Then it is easy to check that the relation (2) is fulfilled for $Y$. So it is clear that if (3) has a unique solution, namely $f$ given by (1), then our conjecture will be true; otherwise it will be false. To clarify this consider the function $g$ given by
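$$g(x) = \begin{cases} \dfrac{\sqrt{2}}{\pi}\,\dfrac{1}{\sqrt{x}}\,\dfrac{1}{1 + x^2}, & \text{if } x > 0 \\ 0, & \text{if } x \le 0. \end{cases} \tag{4}$$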
It can be verified directly that $g$ is a probability density function which satisfies (3). As a consequence, $Y$ satisfies (2). Therefore we have found two r.v.s, $X$ and $Y$, whose densities (1) and (4) are different, and nevertheless both satisfy relation (2). Thus the relation (2) is not a characterizing property of the inverse Gaussian distribution. Finally we suggest that the reader consider equation (3) and try to find other solutions to it which will provide new r.v.s satisfying relation (2).
SECTION 13. DIVERSE PROPERTIES OF RANDOM VARIABLES
In this section we consider examples devoted to different properties of r.v.s and their numerical characteristics. Some notions are defined in the examples themselves.

13.1. On the symmetry property of the sum or the difference of two symmetric random variables
Recall first that the r.v. $X$ is called symmetric about 0 if $X \stackrel{d}{=} (-X)$. In terms of the d.f. $F$, the density $f$ and the ch.f. $\varphi$ this property is expressed as follows: $F(-x) = 1 - F(x)$ for all $x \ge 0$; $f(-x) = f(x)$ for all $x \in \mathbb{R}^1$; $\varphi(t)$, $t \in \mathbb{R}^1$, takes only real values. By the examples below we analyse the symmetry and the independence properties under summation or subtraction.
(i) If $X$ and $Y$ are identically distributed and independent r.v.s, then their difference $X - Y$ is symmetric about 0. Suppose we know that $X \stackrel{d}{=} Y$ and that the difference $X - Y$ is symmetric. Does it follow that $X$ and $Y$ are independent? To see this consider the random vector $(X, Y)$ defined as follows:
y
1 2 3
X 1
2
I
2
12 12 12 I I T2 0 12
3
4 2 12 0 12 .
It is easy to check that $X$ and $Y$ have the same distribution on the values 1, 2 and 3 (the row sums of the table coincide with its column sums). Obviously, $X$ and $Y$ are not independent. Further, the difference $Z = X - Y$ takes the values $-2, -1, 0, 1, 2$, and the probabilities of the values $-k$ and $k$ coincide for $k = 1, 2$. Clearly $Z$ and $(-Z)$ have the same distribution. In other words, $Z = X - Y$ is a symmetric r.v. despite the fact that the variables $X$ and $Y$ are not independent.
(ii) If $X$ and $Y$ are symmetric and independent r.v.s, then the sum $Z = X + Y$ is again symmetric. Thus it is of interest to discuss the following question. Suppose $X$ and $Y$ are independent r.v.s and we know that $X$ is symmetric and that the sum $Z = X + Y$ is also symmetric. Is it true that $Y$ is symmetric? Intuitively we could expect a positive answer. It turns out, however, that in general the answer is negative. This is illustrated by the following example. Let $\xi$ be a r.v. with the following ch.f. indicating that $\xi$ is symmetric:
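$$\psi_\xi(t) = \begin{cases} 1 - 2|t|, & \text{if } |t| \le 1/2 \\ 0, & \text{if } |t| > 1/2. \end{cases} \tag{1}$$
Consider two other ch.f.s:
$$h_1(t) = \begin{cases} 1 - |t|, & \text{if } |t| \le 1/2 \\ 1/(4|t|), & \text{if } |t| > 1/2, \end{cases} \qquad h_2(t) = \begin{cases} 1 - |t|, & \text{if } |t| \le 1 \\ 0, & \text{if } |t| > 1. \end{cases}$$
Introduce now a r.v. $\eta$ with ch.f. $\psi_\eta$ which is the following mixture built from $h_1$ and $h_2$:
$$\psi_\eta(t) = \tfrac12 e^{it}h_1(t) + \tfrac12 e^{-it}h_2(t), \quad t \in \mathbb{R}^1.$$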
Elementary transformations show that
$$\psi_\eta(t) = \begin{cases} (1 - |t|)\cos t, & \text{if } |t| \le 1/2 \\ e^{it}/(8|t|) + (1/2)c(t)e^{-it}(1 - |t|), & \text{if } |t| > 1/2 \end{cases} \tag{2}$$
where $c(t) = 1$ if $|t| < 1$ and $c(t) = 0$ if $|t| > 1$. The explicit form (2) of the ch.f. $\psi_\eta$ shows that the r.v. $\eta$ is not symmetric. Thus we have described two r.v.s, $\xi$ and $\eta$, the first being symmetric while the second is not. Assuming that $\xi$ and $\eta$ are independent we look for the properties of the sum $\zeta = \xi + \eta$. Since for the ch.f.s $\psi_\xi$, $\psi_\eta$ and $\psi_\zeta$ we have $\psi_\zeta = \psi_\xi\psi_\eta$, in view of (1) and (2) it is not difficult to find that
$$\psi_\zeta(t) = \begin{cases} (1 - 2|t|)(1 - |t|)\cos t, & \text{if } |t| \le 1/2 \\ 0, & \text{if } |t| > 1/2. \end{cases}$$
Obviously $\psi_\zeta$ takes only real values, which means that the r.v. $\zeta$ is symmetric. Therefore the symmetry of the two variables $\xi$ and $\zeta = \xi + \eta$, together with the independence of $\xi$ and $\eta$, does not imply that $\eta$ is symmetric. Here is another equivalent interpretation: the difference, and hence the sum, of two dependent r.v.s, both symmetric, need not be symmetric.
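The construction in case (ii) can be checked numerically. The sketch below assumes the ch.f.s $\psi_\xi(t) = \max(0, 1 - 2|t|)$, $h_1(t) = 1 - |t|$ for $|t| \le 1/2$ and $1/(4|t|)$ otherwise, $h_2(t) = \max(0, 1 - |t|)$, and $\psi_\eta = \tfrac12 e^{it}h_1 + \tfrac12 e^{-it}h_2$. It evaluates the imaginary parts of $\psi_\eta$ and of the product $\psi_\xi\psi_\eta$ on a grid; the function names and the grid are illustrative.

```python
import cmath


def psi_xi(t: float) -> float:
    """Triangular ch.f. of the symmetric variable xi."""
    return max(0.0, 1.0 - 2.0 * abs(t))


def h1(t: float) -> float:
    a = abs(t)
    return 1.0 - a if a <= 0.5 else 1.0 / (4.0 * a)


def h2(t: float) -> float:
    return max(0.0, 1.0 - abs(t))


def psi_eta(t: float) -> complex:
    """Ch.f. of eta: equal-weight mixture of shifted copies of h1 and h2."""
    return 0.5 * cmath.exp(1j * t) * h1(t) + 0.5 * cmath.exp(-1j * t) * h2(t)


def psi_zeta(t: float) -> complex:
    """Ch.f. of zeta = xi + eta for independent xi and eta."""
    return psi_xi(t) * psi_eta(t)


grid = [k / 100 for k in range(-300, 301)]
max_imag_eta = max(abs(psi_eta(t).imag) for t in grid)
max_imag_zeta = max(abs(psi_zeta(t).imag) for t in grid)
```

On this grid the imaginary part of $\psi_\eta$ is visibly non-zero for $|t| > 1/2$, while that of $\psi_\zeta$ vanishes up to rounding, which is exactly the asymmetry of $\eta$ combined with the symmetry of $\zeta$.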
13.2. When is a mixture of normal distributions infinitely divisible?
Let $G(u)$, $u \in \mathbb{R}^+$, be a d.f. Then the function $\psi(t)$, $t \in \mathbb{R}^1$, where
$$\psi(t) = \int_0^\infty \exp(-\tfrac12 t^2 u)\,dG(u) \tag{1}$$
is a ch.f. The d.f. $F$ with ch.f. $\psi$ is called a mixture of normal distributions and $G$ a mixing distribution. Note that the density $f$ of $F$ corresponding to (1) has the form (see Kelker 1971):
$$f(x) = \int_0^\infty (2\pi u)^{-1/2}\exp(-x^2/(2u))\,dG(u).$$
Since the normal distribution is infinitely divisible it is natural to ask whether such a mixture preserves the infinite divisibility. It is easy to check that if $G$ is an infinitely divisible d.f. then $\psi$ is an infinitely divisible ch.f. Now we want to answer the converse question: if $\psi$ is an infinitely divisible ch.f., does it follow that the mixing distribution $G$ is infinitely divisible? Consider the function $H(x)$, $x \in \mathbb{R}^1$, where $H(x) = 0$, $0.26$, $0.52$, $0.48$, $0.74$ and $1$ respectively in the intervals $(-\infty, 1]$, $(1, 2]$, $(2, 3]$, $(3, 4]$, $(4, 5]$ and $(5, \infty)$. Clearly $H$ is not a d.f., since it decreases from $0.52$ to $0.48$. However, we obtain the interesting and unexpected fact that the convolution $H * H$ is a d.f. Moreover, the function $\int_0^\infty (2\pi u)^{-1/2}\exp(-x^2/(2u))\,dH(u)$ is a density. Define $G$ as follows:
$$G(x) = e^{-1}\sum_{k=0}^{\infty}(k!)^{-1}H^{*k}(x). \tag{2}$$
We can verify that $G$ given by (2) is a d.f. and find that
$$\begin{aligned} \int_0^\infty \exp(-\tfrac12 t^2 u)\,dG(u) &= e^{-1}\sum_{k=0}^{\infty}(k!)^{-1}\left[\int_0^\infty \exp(-\tfrac12 t^2 u)\,dH(u)\right]^k \\ &= \exp\left[\int_0^\infty \exp(-\tfrac12 t^2 u)\,dH(u) - 1\right] \\ &= \exp\left(\int_{-\infty}^{\infty}[\cos(tx) - 1](2\pi)^{-1/2}\int_0^\infty u^{-1/2}\exp[-x^2/(2u)]\,dH(u)\,dx\right). \end{aligned}$$
The last expression in this chain of equalities coincides with the Kolmogorov canonical representation for an infinitely divisible ch.f. provided that $\int_0^\infty u^{-1/2}\exp(-x^2/(2u))\,dH(u) \ge 0$ for all $x > 0$ (see Gnedenko and Kolmogorov 1954). But $H$ satisfies this condition by construction. Therefore $\psi$ defined by (1), with $G$ given by (2), is an infinitely divisible ch.f. It remains for us to show that $G$ in (2) is not infinitely divisible. This follows from the Levy-Khintchine representation for the ch.f. of $G$ and from the fact that $H$ is not non-decreasing.
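The properties claimed for $H$ can be verified by direct computation, since $H$ is a step function with jumps at the points 1, ..., 5. The sketch below assumes the jump sizes $0.26, 0.26, -0.04, 0.26, 0.26$ read off from the listed values of $H$; the helper names and the grid of $x$ values are illustrative.

```python
import math

# Jumps of H at the points 1..5 (signed masses); the jump at 3 is negative.
jumps = {1: 0.26, 2: 0.26, 3: -0.04, 4: 0.26, 5: 0.26}


def convolve(a: dict, b: dict) -> dict:
    """Convolution of two signed discrete measures on the integers."""
    out = {}
    for i, x in a.items():
        for j, y in b.items():
            out[i + j] = out.get(i + j, 0.0) + x * y
    return out


h2 = convolve(jumps, jumps)  # masses of H * H
h3 = convolve(h2, jumps)     # masses of H * H * H


def kernel(x: float) -> float:
    """Integral of u^(-1/2) exp(-x^2 / (2u)) with respect to dH(u)."""
    return sum(m * u ** -0.5 * math.exp(-x * x / (2 * u)) for u, m in jumps.items())


min_mass_h2 = min(h2.values())
min_mass_h3 = min(h3.values())
min_kernel = min(kernel(k / 10) for k in range(1, 201))

# Mass that G = e^{-1} * sum_k H^{*k} / k! puts at the point 3; only the
# terms k = 1, 2, 3 can contribute there.
g_mass_at_3 = math.exp(-1) * (jumps[3] + h2[3] / 2 + h3[3] / 6)
```

All masses of $H * H$ and $H * H * H$ come out non-negative, the kernel is positive on the grid, and the mass of $G$ at the point 3 is positive even though $H$ itself has a negative jump there.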
13.3. A distribution function can belong to the class IFRA but not to IFR
Let $F(x)$, $x \ge 0$, be a d.f. with density $f$. We say that $F$ is an increasing failure rate distribution and write $F \in$ IFR, if its failure rate $r(x) := f(x)/(1 - F(x))$ is increasing in $x$, $x > 0$. In this case $-\log[1 - F(x)]$ is a convex function in the domain where $-\log[1 - F(x)]$ is finite. This observation motivates the more general definition: $F \in$ IFR if $-\log[1 - F(x)]$ is convex where finite. However, for some problems it is necessary to introduce a considerably weaker restriction on $F$. For example, if $F$ has density $f$ and failure rate $r$ such that $(1/x)\int_0^x r(u)\,du$ is increasing in $x$, we say that $F$ has an increasing failure rate average. In this case we write $F \in$ IFRA. More generally, $F \in$ IFRA if $(-1/x)\log[1 - F(x)]$ is increasing where finite. Thus we have introduced two classes of distributions, IFR and IFRA, and it is natural to ask what the relationship between them is. According to Barlow and Proschan (1966), if $F \in$ IFR and $F(0) = 0$ then $F \in$ IFRA. We are interested now in whether the converse is true. To see this, consider the function
$$F(x) = \begin{cases} 0, & \text{if } x \le 0 \\ (1 - e^{-x})(1 - e^{-kx}), & \text{if } x > 0, \quad k > 1. \end{cases}$$
It is easy to check that $F \in$ IFRA but $F \notin$ IFR.
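A numerical illustration of the two properties is sketched below for $F(x) = (1 - e^{-x})(1 - e^{-kx})$. The value $k = 3$, the grid and all names are arbitrary illustrative choices; the check is carried out on a finite grid only and is not a proof.

```python
import math

K = 3.0  # any k > 1 could be used; 3 is an arbitrary choice


def survival(x: float) -> float:
    """1 - F(x) for F(x) = (1 - e^{-x})(1 - e^{-Kx}), x > 0."""
    return math.exp(-x) + math.exp(-K * x) - math.exp(-(K + 1) * x)


def density(x: float) -> float:
    """Derivative of F."""
    return math.exp(-x) + K * math.exp(-K * x) - (K + 1) * math.exp(-(K + 1) * x)


def failure_rate(x: float) -> float:
    return density(x) / survival(x)


def failure_rate_average(x: float) -> float:
    """(-1/x) log[1 - F(x)]."""
    return -math.log(survival(x)) / x


xs = [0.01 * i for i in range(1, 801)]
rates = [failure_rate(x) for x in xs]
averages = [failure_rate_average(x) for x in xs]

rate_is_monotone = all(b >= a for a, b in zip(rates, rates[1:]))
average_is_monotone = all(b >= a for a, b in zip(averages, averages[1:]))
```

On this grid the failure rate first rises above 1 and then decreases back towards 1, so it is not monotone, while the failure rate average increases throughout.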
13.4. A continuous distribution function of the class NBU which is not of the class IFR
A d.f. $F$ (of a non-negative r.v.) is said to belong to the class NBU (new and better than used) if for any $x, y \ge 0$ we have
$$\bar{F}(x + y) \le \bar{F}(x)\bar{F}(y) \tag{1}$$
where $\bar{F} = 1 - F$. If for any $y > 0$ the function $[F(x + y) - F(x)]/\bar{F}(x)$ is increasing in $x$, we say that $F$ is of the class IFR (compare this definition with that given in Example 13.3). It is well known that $F \in$ IFR $\Rightarrow$ $F \in$ NBU, but in general the converse implication is not true (see Barlow and Proschan 1966). The d.f. $F \in$ IFR has the property that it is continuous on the set $\{x : F(x) < 1\}$ and, moreover, $h(x) = -\log\bar{F}(x)$ is a convex function. However, the elements of the class NBU need not be continuous. This follows from a simple example. Indeed, consider the function
$$F(x) = 1 - 2^{-k} \quad \text{for } x \in (k, k + 1], \quad k = 0, 1, 2, \ldots.$$
It is easy to check that (1) is satisfied and hence $F \in$ NBU. Obviously $F$ is discontinuous and hence $F \notin$ IFR. Suppose now that $F \in$ NBU and $F$ is continuous. Does it follow from these conditions that $F \in$ IFR? It turns out that the answer is negative. To see this consider the function
$$h(x) = \begin{cases} \sin^2 x, & \text{if } x \in [0, \tfrac12\pi] \\ \tfrac12\pi(x - \tfrac12\pi) + 1, & \text{if } x \in (\tfrac12\pi, \infty). \end{cases}$$
It is easy to check that $F(x) = 1 - e^{-h(x)}$, $x \ge 0$, is a d.f. and, moreover, that
$$h(x + y) \ge h(x) + h(y), \quad x, y \ge 0.$$
Therefore $F \in$ NBU and clearly $F$ is continuous. Nevertheless $F \notin$ IFR since $h(x) = -\log\bar{F}(x)$ is not a convex function.
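Both properties of $h$ can be probed on a grid. The sketch below assumes that $h(x) = \sin^2 x$ on $[0, \pi/2]$ and that the linear branch is $(\pi/2)(x - \pi/2) + 1$ for $x > \pi/2$ (this reading of the linear branch is an assumption). It checks superadditivity on a finite grid and exhibits one midpoint at which convexity fails; the names and grid are illustrative.

```python
import math

HALF_PI = math.pi / 2


def h(x: float) -> float:
    """h = -log(1 - F): sin^2 on [0, pi/2], then linear with slope pi/2."""
    if x <= HALF_PI:
        return math.sin(x) ** 2
    return HALF_PI * (x - HALF_PI) + 1.0


# Superadditivity h(x + y) >= h(x) + h(y), checked on a finite grid of [0, 6].
grid = [0.05 * i for i in range(0, 121)]
worst_superadditivity_gap = min(h(x + y) - h(x) - h(y) for x in grid for y in grid)

# Convexity would require h((a + b) / 2) <= (h(a) + h(b)) / 2 for all a, b.
a, b = math.pi / 4, math.pi / 2
convexity_gap = (h(a) + h(b)) / 2 - h((a + b) / 2)
```

The smallest superadditivity gap on the grid is zero (attained when $x = 0$ or $y = 0$), whereas `convexity_gap` is negative because $\sin^2 x$ is concave on $[\pi/4, \pi/2]$.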
13.5. Exchangeable and tail events related to sequences of random variables
Let $\{X_n, n \ge 1\}$ be an infinite sequence of r.v.s defined on the probability space $(\Omega, \mathcal{F}, P)$. Denote by $\sigma\{X_1, \ldots, X_n\}$ the $\sigma$-field generated by $X_1, \ldots, X_n$. Then clearly $\bigcup_{k=1}^{\infty}\sigma\{X_n, X_{n+1}, \ldots, X_{n+k}\}$ is a field and let $\sigma\{X_n, X_{n+1}, \ldots\}$ be the $\sigma$-field generated by this field. The sequence of $\sigma$-fields $\sigma\{X_n, X_{n+1}, \ldots\}$, $\sigma\{X_{n+1}, X_{n+2}, \ldots\}$, $\ldots$ is non-increasing, its limit exists and is a $\sigma$-field. This limit is denoted by
$$\mathcal{T} = \bigcap_{n=1}^{\infty}\sigma\{X_n, X_{n+1}, \ldots\}.$$
$\mathcal{T}$ is called the tail $\sigma$-field of the sequence $\{X_n, n \ge 1\}$. Any event $A \in \mathcal{T}$ is called a tail event, and any function on $\Omega$ which is measurable with respect to $\mathcal{T}$ is said to be a tail function. Let us formulate the basic result concerning the tail events and functions.

Kolmogorov 0-1 law. Let $\{X_n\}$ be a sequence of independent r.v.s and $\mathcal{T}$ be its tail $\sigma$-field. Then for any tail event $A \in \mathcal{T}$, either $P(A) = 0$ or $P(A) = 1$. Moreover, any tail function is a.s. constant, that is, if $Y$ is a r.v. such that $\sigma\{Y\} \subset \mathcal{T}$ then $P[Y = c] = 1$ with $c$ = constant.

We now introduce another notion (see also Example 7.14). We say that the r.v.s $X_1, \ldots, X_n$ are exchangeable (another term is symmetrically dependent) if for each of the $n!$ permutations $\{i_1, i_2, \ldots, i_n\}$ of $\{1, 2, \ldots, n\}$, the random vectors $(X_{i_1}, X_{i_2}, \ldots, X_{i_n})$ and $(X_1, X_2, \ldots, X_n)$ have the same distribution. Further, an infinite sequence $\{X_n, n \ge 1\}$ is said to be exchangeable if for each $n$ the r.v.s $X_1, \ldots, X_n$ are exchangeable. The $\mathcal{B}^\infty$-measurable function $g(X_1, X_2, \ldots)$ is called exchangeable if it is invariant under all finite permutations of its arguments: $g(X_1, \ldots, X_n, X_{n+1}, \ldots) = g(X_{i_1}, \ldots, X_{i_n}, X_{n+1}, \ldots)$. In particular, $A \in \sigma\{X_1, X_2, \ldots\}$ is called an exchangeable event if its indicator function $I(A)$ is an exchangeable function. Let us formulate a result concerning the exchangeability.

Hewitt-Savage 0-1 law. Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s which are independent and identically distributed. Then for any exchangeable event $A \in \sigma\{X_1, X_2, \ldots\}$ either $P(A) = 0$ or $P(A) = 1$.

Note that a detailed presentation of the notions and results given briefly above can be found in the books by Feller (1971), Laha and Rohatgi (1979), Chow and Teicher (1978), Aldous (1985) and Galambos (1988). Obviously tailness and exchangeability are close notions and it would be useful to illustrate by a few examples the relationships between them.
(i) The first question concerns the tail and exchangeable events. Let $\{X_n, n \ge 1\}$ be a sequence of (real-valued) r.v.s and $\mathcal{T}$ its tail $\sigma$-field. If $A \in \mathcal{T}$, then for any permutation $\{i_1, \ldots, i_n\}$ of $\{1, \ldots, n\}$, $n \ge 1$, we can write $A$ in the form $\{(X_{n+1}, X_{n+2}, \ldots) \in B_{n+1}\}$ where $B_{n+1}$ is a Borel set in $\mathbb{R}^\infty$, that is, $B_{n+1} \in \mathcal{B}^\infty$. Thus for each $n$, $A = \{(X_1, X_2, \ldots) \in \mathbb{R}^n \times B_{n+1}\} = \{(X_{i_1}, \ldots, X_{i_n}, X_{n+1}, \ldots) \in \mathbb{R}^n \times B_{n+1}\}$ and since $B^\infty := \mathbb{R}^n \times B_{n+1}$ is a Borel set in $\mathbb{R}^\infty$, this implies that the tail event $A$ is also an exchangeable event. However, there are exchangeable events which are not tail events. The simplest example is to take $A = \{X_n = 0 \text{ for all } n \ge 1\}$. Obviously $A$ is an exchangeable event. But $A \notin \sigma\{X_n, X_{n+1}, \ldots\}$ for every $n \ge 2$. So $A$ is not a tail event.
(ii) Now let us clarify the possibility of changing some of the conditions in the Hewitt-Savage 0-1 law. Consider the sequence $\{X_n, n \ge 1\}$ of independent r.v.s where $P[X_1 = 1] = P[X_1 = -1] = \tfrac12$ and $P[X_n = 0] = 1$ for $n \ge 2$. The event $A = \{\sum_{j=1}^{n} X_j > 0 \text{ for infinitely many } n\}$ is clearly an exchangeable (but not tail!) event with respect to the infinite sequence of r.v.s $\{X_n, n \ge 1\}$. Moreover, $P(A) = P[X_1 > 0] = \tfrac12$ and hence $P(A)$ is neither 0 nor 1 as we could expect. Here the Hewitt-Savage 0-1 law is not applicable since the independent r.v.s $X_n$, $n \ge 1$, are not identically distributed.
(iii) Let $X_n$, $n \ge 1$, be independent r.v.s such that
$$P[X_n = 1] = 2^{-n}, \quad P[X_n = 0] = 1 - 2^{-n}, \quad n = 1, 2, \ldots$$
and let $A = \{X_n = 0 \text{ for all } n \ge 1\}$. Then $A$ is an exchangeable but not a tail event. Further, we have
$$P(A) = \prod_{n=1}^{\infty} P[X_n = 0] = \prod_{n=1}^{\infty}(1 - 2^{-n}).$$
Since $\sum_{n=1}^{\infty} 2^{-n} < \infty$, the infinite product $\prod_{n=1}^{\infty}(1 - 2^{-n})$ converges to a positive limit which is strictly less than 1 and hence $P(A)$ is neither 0 nor 1. Therefore again, as in case (ii), the Hewitt-Savage 0-1 law is not applicable. Note that here the variables $X_n$, $n \ge 1$, are independent and take the same values but with different probabilities.
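The value of the infinite product in case (iii) is easy to approximate. The following minimal sketch multiplies the first terms of $\prod_{n \ge 1}(1 - 2^{-n})$; the function name and the truncation at 60 terms are illustrative choices (later factors differ from 1 by less than double-precision rounding).

```python
def partial_product(terms: int) -> float:
    """Product of (1 - 2^{-n}) for n = 1, ..., terms."""
    prod = 1.0
    for n in range(1, terms + 1):
        prod *= 1.0 - 2.0 ** (-n)
    return prod


# Numerical approximation of P(A) = P[X_n = 0 for all n >= 1].
p_a = partial_product(60)
```

The computed `p_a` lies strictly between 0 and 1, so the exchangeable event $A$ has a non-trivial probability.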
13.6. The de Finetti theorem for an infinite sequence of exchangeable random variables does not always hold for a finite number of such variables
Let $\{X_n, n \ge 1\}$ be an infinite sequence of exchangeable r.v.s each taking two values, 0 and 1. Then according to the de Finetti theorem (see Feller 1971) there is a unique probability measure $\mu$ on the Borel $\sigma$-field $\mathcal{B}[0, 1]$ of the interval $[0, 1]$ such that for each $n$ we have
$$P[X_1 = e_1, \ldots, X_n = e_n] = \int_0^1 p^k(1 - p)^{n-k}\,\mu(dp) \tag{1}$$
where $e_j = 0$ or $1$ and $k = e_1 + \ldots + e_n$. In other words, the distribution of the number of occurrences of the outcomes 0 and 1 in a finite segment of length $n$ of the infinite sequence $X_1, X_2, \ldots$ of exchangeable variables is always a mixture of binomial distributions with some proper mixing distribution over the interval $[0, 1]$. Thus we come to the question of whether this result holds for a fixed finite exchangeable sequence. The answer can be found from the following two examples.
(i) Consider the case $n = 2$ and the r.v.s $X_1$ and $X_2$:
$$P[X_1 = 0, X_2 = 1] = P[X_1 = 1, X_2 = 0] = \tfrac12, \quad P[X_1 = 0, X_2 = 0] = P[X_1 = 1, X_2 = 1] = 0.$$
It is easy to see that $X_1$ and $X_2$ are exchangeable. Suppose a representation like (1) holds. Then it would follow that
$$\int_0^1 p^2\,\mu(dp) = 0 \quad \text{and} \quad \int_0^1 (1 - p)^2\,\mu(dp) = 0.$$
This means that $\mu$ puts mass one both at 0 and at 1, which is not possible.
(ii) Let $Y_1, \ldots, Y_n$ be $n$ independent r.v.s with some common distribution. Let $S_n = Y_1 + \ldots + Y_n$ and $Z_k = Y_k - n^{-1}S_n$ for $k = 1, \ldots, n - 1$. Then it is easy to check that the r.v.s $Z_1, \ldots, Z_{n-1}$ are exchangeable but their joint distribution is not of the form (1). Therefore the de Finetti theorem does not always hold for a finite exchangeable sequence.
13.7. Can we always extend a finite set of exchangeable random variables?
If $\{X_n\}$ is a finite or an infinite sequence of exchangeable r.v.s then any subset consists of r.v.s which are exchangeable. Suppose now we are given the set $X_1, \ldots, X_m$ of exchangeable r.v.s. We say that $X_1, \ldots, X_m$ can be extended exchangeably if there is a finite set $X_1, \ldots, X_m, X_{m+1}, \ldots, X_{m+k}$, $k \ge 1$, or an infinite set $X_1, \ldots, X_m, X_{m+1}, X_{m+2}, \ldots$ of r.v.s which are exchangeable. Thus the question to consider is: can we extend any fixed set of exchangeable r.v.s to an infinite exchangeable sequence? Let us show that in general the answer is negative. Consider the particular case of three r.v.s $X_1, X_2, X_3$ each taking the values 0 or 1 with $P[X_j = 1] = P[X_j = 0] = \tfrac12$, $j = 1, 2, 3$. Let
$$P[X_1 = 1, X_2 = 1] = P[X_1 = 1, X_3 = 1] = P[X_2 = 1, X_3 = 1] = 0.2.$$
It is easy to see that $(X_1, X_2, X_3)$ is an exchangeable set. Assume this set can be extended to an infinite exchangeable sequence $X_1, X_2, X_3, X_4, X_5, \ldots$. This would mean that for each $n \ge 4$ the set $X_1, X_2, X_3, X_4, \ldots, X_n$ consists of exchangeable variables. Then we can easily show that
$$\begin{aligned} 0 &\le E\Big[\Big(\sum_{j=1}^{n} I(X_j = 1)\Big)^2\Big] - \Big(E\Big[\sum_{j=1}^{n} I(X_j = 1)\Big]\Big)^2 \\ &= \sum_{j=1}^{n}\sum_{k=1}^{n} P[X_j = 1, X_k = 1] - \tfrac14 n^2 \\ &= \tfrac12 n + \sum_{j \ne k} P[X_j = 1, X_k = 1] - \tfrac14 n^2 \\ &= \tfrac12 n + (0.2)n(n - 1) - \tfrac14 n^2 = (0.3)n - (0.05)n^2. \end{aligned}$$
Obviously it follows from this that $n$ must satisfy the restriction $n \le 6$. However, this contradicts the definition of infinite exchangeability and therefore the desired extension of a finite to an infinite exchangeable sequence is not always possible.
Interesting results on exchangeability of finite or infinite sequences of random events or r.v.s can be found in the works of Kendall (1967) and Galambos (1988). Finally let us mention that the variables in an infinite exchangeable sequence are necessarily non-negatively correlated. This follows directly from an examination of the terms of the variance $V[X_1 + \ldots + X_n]$. However, in the above specific example we have $\rho(X_1, X_2) < 0$.
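The variance bound used in this example can be evaluated exactly. The sketch below assumes the marginal probability $1/2$ and the pairwise probability $0.2$ from the example and computes the variance of the number of ones among $n$ exchangeable variables with rational arithmetic; the function name and the search range are illustrative.

```python
from fractions import Fraction

P_ONE = Fraction(1, 2)    # P[X_j = 1]
P_PAIR = Fraction(1, 5)   # P[X_j = 1, X_k = 1] for j != k


def variance_of_count(n: int) -> Fraction:
    """Variance of the number of ones among X_1, ..., X_n."""
    second_moment = n * P_ONE + n * (n - 1) * P_PAIR
    return second_moment - (n * P_ONE) ** 2


# Largest n for which the would-be variance is still non-negative.
largest_feasible_n = max(n for n in range(1, 50) if variance_of_count(n) >= 0)

# Correlation coefficient of any pair in the example.
pair_correlation = (P_PAIR - P_ONE ** 2) / (P_ONE * (1 - P_ONE))
```

The would-be variance equals $0.3n - 0.05n^2$, which is zero at $n = 6$ and negative from $n = 7$ on, and the pairwise correlation is $-1/5$.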
13.8. Collections of random variables which are or are not independent and are or are not exchangeable
Let $\mathcal{X} := \{X_n, n \ge 2\}$ be a finite or infinite sequence of r.v.s which are independent and identically distributed. Then $\mathcal{X}$ is exchangeable, that is, both properties, independence and exchangeability, hold for $\mathcal{X}$ in this case. If, however, the $X_n$ (at least two of them) have different distributions then $\mathcal{X}$ is not exchangeable regardless of whether $\mathcal{X}$ is independent or dependent. Our goal now is to describe a sequence of r.v.s which are totally dependent and exchangeable. To this end consider a sequence $\mathcal{X} = \{X_n, n \ge 2\}$ of i.i.d. r.v.s each with zero mean and finite variance. Let $\xi$ be another r.v. independent of $\mathcal{X}$. Assume that $\xi$ is non-degenerate with finite variance, that is, $0 < V\xi < \infty$. Let us define a new sequence $\mathcal{Y} := \{Y_n, n \ge 2\}$ where $Y_n = X_n + \xi$. It is easily seen that $\mathcal{Y}$ is exchangeable. Let us clarify whether or not $\mathcal{Y}$ is independent. The distribution of $Y_{i_1}, \ldots, Y_{i_k}$ is the same for any possible choice of $k$ variables from $\mathcal{Y}$, $k = 1, 2, \ldots$. Taking $k = 2$ we conclude that $\mathcal{Y}$ is characterized by a common correlation coefficient, say $\rho_0$, where
$$\rho_0 = \rho(Y_i, Y_j) = (E[Y_iY_j] - EY_i\,EY_j)/(VY_i\,VY_j)^{1/2}$$
for any two representatives $Y_i$ and $Y_j$ of $\mathcal{Y}$. A simple reasoning shows that
$$\rho_0 = (V\xi)/(VX_1 + V\xi)$$
where $VX_1$ $(= E[X_1^2])$ is the common variance of the sequence $\mathcal{X}$. The assumption $0 < V\xi < \infty$ implies that $\rho_0 \ne 0$ (in fact $\rho_0 > 0$) and hence $Y_i$ and $Y_j$ cannot be independent because they are not even uncorrelated. In other words, $\mathcal{Y}$ is totally dependent in the sense that there is no pair of variables in $\mathcal{Y}$ which is independent. Hence the sequence $\mathcal{Y}$, finite or infinite, is dependent and exchangeable. The final conclusion is that exchangeability does not imply independence: an exchangeable sequence can even be totally dependent.
13.9. Integrable randomly stopped sums with non-integrable stopping times
Let $X$ and $X_1, X_2, \ldots$ be i.i.d. r.v.s defined on the probability space $(\Omega, \mathcal{F}, P)$ and let $\{\mathcal{F}_n, n \ge 0\}$, where $\mathcal{F}_0 = \{\emptyset, \Omega\}$, be an increasing sequence of sub-$\sigma$-fields of $\mathcal{F}$. Recall that the r.v. $\tau$ with possible values in the set $\{1, 2, \ldots, \infty\}$ is said to be a stopping time with respect to $\{\mathcal{F}_n\}$ if the set $[\omega : \tau(\omega) = n]$, denoted further simply by $[\tau = n]$, belongs to $\mathcal{F}_n$ for each $n$. If $S_0 = 0$, $S_n = X_1 + \ldots + X_n$, then $S_\tau$ is the sum of the first $\tau$ of the variables $X_1, X_2, \ldots$, that is, $S_\tau = X_1 + \ldots + X_\tau$. For many problems it is important to have conditions under which the r.v.s $X$, $\tau$ and $S_\tau$ are integrable. Let us formulate the following result (see Gut and Janson 1986). Let $r \ge 1$ and $EX \ne 0$. Then $E[|X|^r] < \infty$ and $E[|S_\tau|^r] < \infty$ imply that $E[\tau^r] < \infty$. Our aim now is to show that the condition $EX \ne 0$ is essential for the validity of this result. So, let the r.v. $X$ have $EX = 0$. In particular, take $X$ such that $P[X = 1] = P[X = -1] = \tfrac12$ and $\tau = \min\{n : S_n = 1\}$. Clearly $\tau$ is a stopping time with respect to $\{\mathcal{F}_n\}$ where $\mathcal{F}_n = \sigma\{X_1, \ldots, X_n\}$. It is easy to check that the r.v. $X$ and the random sum $S_\tau$ have moments of all orders, that is, for any $r > 0$ we have $E[|X|^r] < \infty$ and $E[|S_\tau|^r] < \infty$. However, $E[\tau^{1/2}] = \infty$ and therefore $E[\tau^r]$ does not exist for any $r \ge \tfrac12$.
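The divergence of $E[\tau^{1/2}]$ can be made visible numerically. The sketch below relies on the classical first-passage distribution of the simple symmetric random walk, $P[\tau = 2m - 1] = \frac{1}{2m-1}\binom{2m-1}{m}2^{-(2m-1)}$, computed through the ratio of consecutive terms; this formula is a standard fact brought in for illustration and is not stated in the text above, and the function names are hypothetical.

```python
import math


def first_passage_probabilities(count: int) -> list:
    """P[tau = 2m - 1] for m = 1, ..., count, where tau = min{n : S_n = 1}."""
    probs = []
    p = 0.5  # P[tau = 1]
    for m in range(1, count + 1):
        probs.append(p)
        # Ratio of consecutive first-passage probabilities.
        p *= (2 * m - 1) / (2 * (m + 1))
    return probs


def partial_sqrt_moment(count: int) -> float:
    """Partial sum of sqrt(2m - 1) * P[tau = 2m - 1] over m = 1, ..., count."""
    probs = first_passage_probabilities(count)
    return sum(math.sqrt(2 * m - 1) * p for m, p in enumerate(probs, start=1))


growth = partial_sqrt_moment(10_000) - partial_sqrt_moment(100)
```

The partial sums keep growing roughly like a constant times the logarithm of the number of terms, which is consistent with $E[\tau^{1/2}] = \infty$, while $S_\tau = 1$ is trivially integrable.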
Part 3
Limit Theorems
Professor A. T. Fomenko, Moscow University.
SECTION 14. VARIOUS KINDS OF CONVERGENCE OF SEQUENCES OF RANDOM VARIABLES
On the probability space $(\Omega, \mathcal{F}, P)$ we have a given r.v. $X$ and a sequence of r.v.s $\{X_n, n \ge 1\}$. Important probabilistic problems require us to find the limit of $X_n$ as $n \to \infty$. However, this limit can be understood in different ways. Let us define the basic kinds of convergence considered in probability theory.
(a) We say that $\{X_n\}$ converges almost surely (a.s.), or with probability 1, to $X$ as $n \to \infty$, and write $X_n \xrightarrow{a.s.} X$, if
$$P[\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)] = 1.$$
(b) The sequence $\{X_n\}$ is said to converge in probability to $X$ as $n \to \infty$ if for any $\varepsilon > 0$ we have
$$\lim_{n\to\infty} P[\omega : |X_n(\omega) - X(\omega)| \ge \varepsilon] = 0.$$
In this case the following notation is used: $X_n \xrightarrow{P} X$ or $P\text{-}\lim_{n\to\infty} X_n = X$.
(c) Let $F$ and $F_n$ be the d.f.s of $X$ and $X_n$ respectively. The sequence $\{X_n\}$ is called convergent in distribution to $X$ if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
for all $x \in \mathbb{R}^1$ at which $F$ is continuous. Notation: $F_n \xrightarrow{d} F$ and $X_n \xrightarrow{d} X$.
(d) Suppose $X$ and $X_n$, $n \ge 1$, belong to the space $L^r$ for some $r > 0$ (that is, $E[|X|^r] < \infty$, $E[|X_n|^r] < \infty$). We say that the sequence $\{X_n\}$ converges to $X$ in $L^r$-sense, and write $X_n \xrightarrow{L^r} X$, if
$$\lim_{n\to\infty} E[|X_n - X|^r] = 0.$$
In particular, the $L^r$-convergence with $r = 2$ is called square mean (or quadratic mean) convergence and is used very often in probability theory and mathematical statistics. Note that the convergence in distribution defined in (c) is closely related to the weak convergence treated in Section 16 (see also Example 14.1(iii)). Some notions (complete convergence, weak $L^1$-convergence and convergence of the Cesaro means) are introduced and analysed in Examples 14.14-14.18. Practically all textbooks and lecture notes in probability theory deal extensively with the topics of convergence of random sequences. The reader is advised to consider the following references: Parzen (1960), Neveu (1965), Lamperti (1966), Moran (1968), Ash (1970), Renyi (1970), Feller (1971), Roussas (1973), Neuts (1973), Chung (1974), Petrov (1975), Lukacs (1975), Chow and Teicher (1978), Billingsley (1995), Laha and Rohatgi (1979), Serfling (1980) and Shiryaev (1995).
It is usual for any course in probability theory to justify the following scheme:
convergence a.s. $\Rightarrow$ convergence in probability $\Rightarrow$ convergence in distribution
convergence in $L^r$-sense $\Rightarrow$ convergence in probability
In this section we consider examples which are different in their content and level of difficulty but all illustrate this general scheme clearly. In particular, we show that the inclusions shown above are all strict inclusions. The relationship between these four kinds of convergence and other kinds of convergence of random sequences is also analysed.
14.1. Convergence and divergence of sequences of distribution functions
We now summarize a few elementary statements showing that a sequence of d.f.s $\{F_n, n \ge 1\}$ can have different behaviour as $n \to \infty$. In particular it can be divergent, or convergent, but not to a d.f.
(i) Let $F(x)$, $x \in \mathbb{R}^1$, be a d.f. which is continuous. Consider two sets of d.f.s, $\{F_n, n \ge 1\}$ and $\{G_n, n \ge 1\}$, where, for instance,
$$F_n(x) = F(x + n), \quad G_n(x) = F(x + (-1)^n n).$$
Obviously $F_n(x) \to 1$ as $n \to \infty$ for each $x \in \mathbb{R}^1$. But a function equal to 1 at all points is not a d.f. Hence $\{F_n\}$ is convergent but the limit $\lim_{n\to\infty} F_n(x)$ is not a d.f. Further, $G_{2n}(x) \to 1$ whereas $G_{2n+1}(x) \to 0$ as $n \to \infty$ for all $x \in \mathbb{R}^1$. Clearly the family $\{G_n\}$ does not converge.
(ii) Consider the family of d.f.s $\{F_n, n \ge 1\}$ where
$$F_n(x) = \begin{cases} 0, & \text{if } x < 1/n \\ 1, & \text{if } x \ge 1/n. \end{cases}$$
Then $F_n(x) \to F(x)$ if $x \ne 0$ and $F_n(0) = 0$ for each $n \ge 1$, where $F$ is the d.f. of a degenerate r.v. which is zero with probability 1. Thus $\lim_{n\to\infty} F_n(x)$ exists but is not a d.f. because it is not right-continuous at $x = 0$.
(iii) The following basic result is always used when considering convergence in distribution (see Lukacs 1970; Feller 1971; Chow and Teicher 1978; Billingsley 1995; Shiryaev 1995):
$$F_n \xrightarrow{d} F \iff \int_{\mathbb{R}^1} g(x)\,dF_n(x) \to \int_{\mathbb{R}^1} g(x)\,dF(x) \tag{1}$$
for all continuous and bounded functions $g(x)$, $x \in \mathbb{R}^1$.
Despite the fact that (1) contains a necessary and sufficient condition it is useful to show that the assumptions for $g$ cannot be weakened. For example take $g$ bounded and measurable (but not continuous), say
$$g(x) = \begin{cases} 0, & \text{if } x \le 0 \\ 1, & \text{if } x > 0. \end{cases}$$
Denote by $F$ and $F_n$ the d.f.s of the r.v.s $X = 0$ and $X_n = 1/n$ respectively. Then $F_n \xrightarrow{d} F$ and obviously $\int g\,dF_n = 1$ for each $n \ge 1$ but $\int g\,dF = 0$. Therefore (1) does not hold, as we of course expected. Finally, recall that the integral relation (1) can be used as a definition of the weak convergence of d.f.s (see Example 14.9 and the topics discussed in Section 16).
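Because $X_n = 1/n$ and $X = 0$ are degenerate, the integrals in (1) reduce to function values: $\int g\,dF_n = g(1/n)$ and $\int g\,dF = g(0)$. The sketch below contrasts the discontinuous indicator with a bounded continuous function; the choice of `math.atan` as the continuous test function and all names are illustrative.

```python
import math


def g_indicator(x: float) -> float:
    """Bounded, measurable, but discontinuous at 0."""
    return 1.0 if x > 0 else 0.0


def g_continuous(x: float) -> float:
    """A bounded continuous test function (arbitrary choice)."""
    return math.atan(x)


ns = [1, 10, 100, 1000]

# Integrals of g with respect to F_n, i.e. g evaluated at 1/n.
indicator_integrals = [g_indicator(1.0 / n) for n in ns]
indicator_limit_integral = g_indicator(0.0)

# Distance between the integrals w.r.t. F_n and w.r.t. F for the continuous g.
continuous_gaps = [abs(g_continuous(1.0 / n) - g_continuous(0.0)) for n in ns]
```

For the indicator the integrals stay equal to 1 while the integral with respect to $F$ is 0; for the continuous function the gaps shrink to 0 as $n$ grows.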
14.2. Convergence in distribution does not imply convergence in probability
We show by a few specific examples that in general, as $n \to \infty$,
$$X_n \xrightarrow{d} X \;\not\Rightarrow\; X_n \xrightarrow{P} X.$$
(i) Let $X$ be a Bernoulli variable, that is, $X$ is a r.v. taking the values 1 and 0 with probability $\tfrac12$ each. Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s such that $X_n = X$ for any $n$. Since $X_n \stackrel{d}{=} X$ then $X_n \xrightarrow{d} X$ as $n \to \infty$. Now let $Y = 1 - X$. Thus $X_n \xrightarrow{d} Y$ because $Y$ and $X$ have the same distribution. However, $X_n$ cannot converge to $Y$ in any other mode since $|X_n - Y| = 1$ always. In particular, $P[|X_n - Y| > \varepsilon] \not\to 0$ for an arbitrary $\varepsilon \in (0, 1)$ and therefore $X_n \not\xrightarrow{P} Y$ as $n \to \infty$.
(ii) Let $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, $\mathcal{F}$ be the $\sigma$-field of all subsets of $\Omega$ and $P$ the discrete uniform measure. Define the r.v.s $X_n$, $n \ge 1$, and $X$ where
$$X_n(\omega_1) = X_n(\omega_2) = 1, \quad X_n(\omega_3) = X_n(\omega_4) = 0, \quad n \ge 1,$$
$$X(\omega_1) = X(\omega_2) = 0, \quad X(\omega_3) = X(\omega_4) = 1.$$
Then $|X_n(\omega) - X(\omega)| = 1$ for all $\omega \in \Omega$ and $n \ge 1$. Hence as in case (i), $X_n$ cannot converge in probability to $X$ as $n \to \infty$. Further, if $F$ and $F_n$, $n \ge 1$, are the d.f.s of $X$ and $X_n$, $n \ge 1$, we have
$$F(x) = F_n(x) = \begin{cases} 0, & \text{if } x \le 0 \\ \tfrac12, & \text{if } 0 < x \le 1 \\ 1, & \text{if } x > 1. \end{cases}$$
Thus $F_n(x) = F(x)$ for all $x \in \mathbb{R}^1$ and trivially $F_n(x) \to F(x)$ at each continuity point of $F$. Therefore $X_n \xrightarrow{d} X$ but, as was shown, $X_n \not\xrightarrow{P} X$.
(iii) Let $X$ be any symmetric r.v., for example $X \sim N(0, 1)$, and put $X_n = -X$ for each $n \ge 1$. Then $X_n \stackrel{d}{=} X$ and $X_n \xrightarrow{d} X$. However, $X_n \not\xrightarrow{P} X$ because for an arbitrary $\varepsilon > 0$ we have
$$P[|X_n - X| > \varepsilon] = P[|X| > \tfrac12\varepsilon],$$
which does not depend on $n$ and so does not tend to 0 as $n \to \infty$ (for $X \sim N(0, 1)$ it is a positive constant).

14.3. Sequences of random variables converging in probability but not almost surely
(i) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}[0,1]$ and $P$ be the Lebesgue measure. For every number $n \in \mathbb{N}$ there is only one pair of integers, $m$ and $k$, where $m \ge 0$, $0 \le k \le 2^m - 1$, such that $n = 2^m + k$. Define the sequence of events $A_n = [k2^{-m}, (k + 1)2^{-m})$ and put $X_n = X_n(\omega) = 1_{A_n}(\omega)$. Thus we obtain a sequence of r.v.s $\{X_n, n \ge 1\}$. Obviously
$$P[|X_n| > \varepsilon] = \begin{cases} 2^{-m}, & \text{if } 0 \le \varepsilon < 1 \\ 0, & \text{if } \varepsilon \ge 1. \end{cases}$$
Since $n \to \infty$ iff $m \to \infty$, we can conclude that
$$X_n \xrightarrow{P} 0 \quad \text{as } n \to \infty. \tag{1}$$
Now we want to see whether in (1) the convergence in probability can be replaced by almost sure convergence. It is easy to show that for each fixed $\omega \in [0, 1)$ there are always infinitely many $n$ for which $X_n(\omega) = 1$ and infinitely many $n$ such that $X_n(\omega) = 0$. Indeed, for each $m$, $\omega \in [k2^{-m}, (k + 1)2^{-m})$ for exactly one $k$, where $k = 0, 1, \ldots, 2^m - 1$, that is, $\omega \in A_{2^m+k}$. Obviously, if $k < 2^m - 1$ then $\omega \notin A_{2^m+k+1}$, and if $k = 2^m - 1$ (and $m \ge 1$) then $\omega \notin A_{2^{m+1}}$. In other words, $\omega \in A_n$ i.o., and also $\omega \in \Omega \setminus A_n$ i.o., which means that $\limsup_{n\to\infty} X_n = 1$ and $\liminf_{n\to\infty} X_n = 0$. Therefore $X_n \not\xrightarrow{a.s.} 0$ as $n \to \infty$.
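The behaviour of this sequence at a fixed point can be traced directly. The sketch below assumes the indexing $n = 2^m + k$ with $A_n = [k2^{-m}, (k+1)2^{-m})$ and counts, for a sample point, how many indices in each dyadic block $2^m \le n < 2^{m+1}$ give $X_n(\omega) = 1$; the function name, the sample point $0.3$ and the number of blocks are illustrative.

```python
def typewriter(n: int, omega: float) -> int:
    """X_n(omega): indicator of A_n = [k 2^-m, (k+1) 2^-m), with n = 2^m + k."""
    m = n.bit_length() - 1   # largest m with 2^m <= n
    k = n - (1 << m)         # 0 <= k <= 2^m - 1
    return 1 if k / 2 ** m <= omega < (k + 1) / 2 ** m else 0


omega = 0.3
blocks = range(10)

# Number of indices n in the m-th block with X_n(omega) = 1.
ones_per_block = [
    sum(typewriter(n, omega) for n in range(2 ** m, 2 ** (m + 1))) for m in blocks
]
# Number of indices n in the m-th block with X_n(omega) = 0.
zeros_per_block = [2 ** m - ones for m, ones in zip(blocks, ones_per_block)]
```

Every block contains exactly one index with $X_n(\omega) = 1$ and, from $m = 1$ on, at least one with $X_n(\omega) = 0$, so the sequence $X_n(\omega)$ keeps oscillating although $P[X_n = 1] = 2^{-m} \to 0$.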
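(ii) Consider the sequence $\{X_n, n \ge 1\}$ of independent r.v.s where
$$P[X_n = 1] = \frac{1}{n}, \quad P[X_n = 0] = 1 - \frac{1}{n}, \quad n \ge 1.$$
Obviously for any $\varepsilon$, $0 < \varepsilon < 1$, we have
$$P[|X_n - 0| > \varepsilon] = P[X_n = 1] = \frac{1}{n} \to 0 \quad \text{as } n \to \infty$$
and thus $X_n \xrightarrow{P} 0$ as $n \to \infty$. It turns out, however, that the convergence $X_n \xrightarrow{a.s.} 0$ fails to hold. Let us analyse a more general situation. For given r.v.s $X$, $X_n$, $n \ge 1$, define the events
$$A_n(\varepsilon) = \{\omega : |X_n(\omega) - X(\omega)| > \varepsilon\}, \quad B_m(\varepsilon) = \bigcup_{n=m}^{\infty} A_n(\varepsilon).$$
Then
$$X_n \xrightarrow{a.s.} X \text{ as } n \to \infty \iff P(B_m(\varepsilon)) \to 0 \text{ as } m \to \infty \text{ for all } \varepsilon > 0. \tag{2}$$
Indeed, let
$$C = \{\omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}, \quad A(\varepsilon) = \{\omega \in \Omega : \omega \in A_n(\varepsilon) \text{ i.o.}\}.$$
Then $P(C) = 1$ iff $P(A(\varepsilon)) = 0$ for all $\varepsilon > 0$. However $\{B_m(\varepsilon)\}$ is a decreasing sequence of events, $B_m(\varepsilon) \downarrow A(\varepsilon)$ as $m \to \infty$ and so $P(A(\varepsilon)) = 0$ iff $P(B_m(\varepsilon)) \to 0$ as $m \to \infty$. Thus (2) is proved. Using statement (2) for our specific sequence $\{X_n\}$ yields
$$P(B_m(\varepsilon)) = 1 - \lim_{M\to\infty} P[X_n = 0 \text{ for all } n \text{ such that } m \le n \le M].$$
By the independence of the $X_n$,
$$P[X_n = 0 \text{ for all } n \text{ such that } m \le n \le M] = \prod_{k=m}^{M}(1 - k^{-1})$$
and since the product $\prod_{k=m}^{\infty}(1 - k^{-1})$ is zero for each $m \in \mathbb{N}$ we conclude that $P(B_m(\varepsilon)) = 1$ for all $m$, that is, $P(B_m(\varepsilon))$ does not tend to zero as (2) requires. Therefore the sequence $\{X_n\}$ does not converge almost surely.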
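The vanishing of the infinite product is a telescoping effect: for $m \ge 2$ the finite product $\prod_{k=m}^{M}(1 - 1/k)$ equals $(m - 1)/M$. The sketch below confirms this with exact rational arithmetic; the function name and the sample values of $m$ and $M$ are illustrative.

```python
from fractions import Fraction


def prob_all_zero(m: int, M: int) -> Fraction:
    """P[X_n = 0 for all m <= n <= M] for independent X_n with P[X_n = 1] = 1/n."""
    prod = Fraction(1)
    for k in range(m, M + 1):
        prod *= 1 - Fraction(1, k)
    return prod


example = prob_all_zero(5, 1000)           # telescopes to 4/1000
shrinking = [prob_all_zero(5, M) for M in (10, 100, 1000, 10000)]
```

For fixed $m$ the probability of seeing only zeros between $m$ and $M$ decreases like $1/M$, so with probability 1 some $X_n$ with $n \ge m$ equals 1.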
14.4. On the Borel-Cantelli lemma and almost sure convergence
Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s such that for each $\varepsilon > 0$,
$$\sum_{n=1}^{\infty} P[|X_n| > \varepsilon] < \infty. \tag{1}$$
According to the Borel-Cantelli lemma, if $\{A_n, n \ge 1\}$ is an arbitrary sequence of events and $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P[A_n \text{ i.o.}] = 0$ (see also Example 4.2). This lemma and condition (1) immediately imply that $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$. Moreover, the same conclusion, $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$, holds if for any sequence of numbers $\{\varepsilon_n\}$ with $\varepsilon_n \downarrow 0$, we have
$$\sum_{n=1}^{\infty} P[|X_n| > \varepsilon_n] < \infty. \tag{2}$$
We now want to clarify whether the converse of the statement is true. For this purpose consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}[0,1]$ and $P$ is the Lebesgue measure. Define the sequence of r.v.s $\{X_n, n \ge 1\}$ by
$$X_n = X_n(\omega) = \begin{cases} 0, & \text{if } 0 \le \omega \le 1 - n^{-1} \\ 1, & \text{if } 1 - n^{-1} < \omega \le 1. \end{cases}$$
Obviously $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$. However, for any $\varepsilon_n > 0$ with $\varepsilon_n \downarrow 0$ we have $P[|X_n| > \varepsilon_n] = P[X_n = 1] = n^{-1}$ for sufficiently large $n$. Thus
$$\sum_{n=1}^{\infty} P[|X_n| > \varepsilon_n] = \infty.$$
Therefore condition (2) is sufficient but not necessary for the almost sure convergence of $X_n$ to zero.
14.5. On the convergence of sequences of random variables in $L^r$-sense for different values of $r$
Suppose $X$ and $X_n$, $n \ge 1$ are r.v.s in $L^r$ for some fixed $r$, $r > 0$. Then $X, X_n \in L^s$ for each $s$, $0 < s < r$. This follows from the well known Lyapunov inequality (see Feller 1971; Shiryaev 1995), $(E[|X|^s])^{1/s} \le (E[|X|^r])^{1/r}$, $0 < s < r$, or from the elementary inequality $|x|^s \le 1 + |x|^r$, $x \in \mathbb{R}^1$, $0 < s < r$ (used once before in Example 6.5). In other words

$X_n \xrightarrow{L^r} X \;\Rightarrow\; X_n \xrightarrow{L^s} X$ for $0 < s < r$.

Let us illustrate the fact that in general the converse is not true. Consider the sequence of r.v.s $\{X_n, n \ge 1\}$ where

$P[X_n = n] = n^{-(r+s)/2} = 1 - P[X_n = 0], \quad n \ge 1, \; 0 < s < r.$

Then we find

$E[X_n^s] = n^{(s-r)/2} \to 0$ as $n \to \infty$

which implies that $X_n \xrightarrow{L^s} 0$ as $n \to \infty$. However,

$E[X_n^r] = n^{(r-s)/2} \to \infty$ as $n \to \infty$

and therefore $X_n \xrightarrow{L^s} 0 \;\not\Rightarrow\; X_n \xrightarrow{L^r} 0$ for all $r > s$.
14.6. Sequences of random variables converging in probability but not in $L^r$-sense
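(i) Let $\{X_n, n \ge 1\}$ be r.v.s such that

$P[X_n = e^n] = \frac{1}{n}, \quad P[X_n = 0] = 1 - \frac{1}{n}, \quad n \ge 1.$

Then for any $\varepsilon > 0$ and all sufficiently large $n$ we have

$P[|X_n| < \varepsilon] = P[X_n = 0] = 1 - \frac{1}{n} \to 1$ as $n \to \infty$

and hence $X_n \xrightarrow{P} 0$ as $n \to \infty$. However, for each $r > 0$,

$E[X_n^r] = \frac{e^{rn}}{n} \to \infty$ as $n \to \infty$

and therefore $X_n \not\xrightarrow{L^r} 0$ as $n \to \infty$.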
(ii) Consider the sequence $\{X_n, n \ge 1\}$ where $X_n$ has a density

$f_n(x) = \frac{n}{\pi (1 + n^2 x^2)}, \quad x \in \mathbb{R}^1$

(that is, $X_n$ has a Cauchy distribution). Since for any $\varepsilon > 0$,

$P[|X_n| < \varepsilon] = \int_{-\varepsilon}^{\varepsilon} f_n(x)\,dx = \frac{2}{\pi} \arctan(n\varepsilon) \to 1$ as $n \to \infty$

we conclude that $X_n \xrightarrow{P} 0$ as $n \to \infty$. But for any $r \ge 1$, $E[|X_n|^r] = \infty$ and thus the sequence $\{X_n\}$ cannot converge to zero as $n \to \infty$ in $L^r$-sense for $r \ge 1$. Moreover, since $X_n$, $n \ge 1$, do not belong to the space $L^r$, it is not sensible to speak about $L^r$-convergence.

(iii) Define the sequences $\{Y_n, n \ge 2\}$ and $\{Z_n, n \ge 1\}$ as follows:

$P[Y_n = n] = 1/\log n = 1 - P[Y_n = 0],$

$P[Z_n = 0] = 1 - n^{-\alpha}, \quad P[Z_n = \pm n] = 1/(2n^{\alpha}), \quad 0 < \alpha \le 2.$

Then, as $n \to \infty$, $Y_n \xrightarrow{P} 0$ but $Y_n \not\xrightarrow{L^r} 0$ for any $r > 0$; $Z_n \xrightarrow{P} 0$ but $Z_n \not\xrightarrow{L^2} 0$.
14.7. Convergence in $L^r$-sense does not imply almost sure convergence
(i) Consider again the sequence $\{X_n, n \ge 1\}$ defined in Example 14.3(i): namely, $X_n = X_n(\omega) = 1_{A_n}(\omega)$ for $n = 2^m + k$ and $A_n = [k 2^{-m}, (k+1) 2^{-m})$. Since $\omega \in \Omega = [0, 1]$ and $P$ is the Lebesgue measure, $E[|X_n|] = E[X_n] = 2^{-m} \to 0$ as $n \to \infty$. Thus $X_n \xrightarrow{L^1} 0$ as $n \to \infty$. Nevertheless, as was shown in Example 14.3(i), $X_n \not\xrightarrow{a.s.} 0$ as $n \to \infty$.
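The behaviour of this sequence can be made concrete by a short computation. The sketch below assumes the indexing $n = 2^m + k$ with $0 \le k < 2^m$ (our reading of Example 14.3(i)); the function names are ours. For a fixed $\omega$ exactly one index in every dyadic block gives $X_n(\omega) = 1$, so $X_n(\omega) = 1$ infinitely often, although $E[X_n] = 2^{-m} \to 0$.

```python
from fractions import Fraction


def x(n, w):
    """X_n(w) = 1 if w lies in A_n = [k/2^m, (k+1)/2^m), where n = 2^m + k, 0 <= k < 2^m."""
    m = n.bit_length() - 1          # largest m with 2^m <= n
    k = n - (1 << m)
    lo = Fraction(k, 1 << m)
    hi = Fraction(k + 1, 1 << m)
    return 1 if lo <= w < hi else 0


def expectation(n):
    """E[X_n] is the Lebesgue measure of A_n, that is 2^(-m)."""
    m = n.bit_length() - 1
    return Fraction(1, 1 << m)


if __name__ == "__main__":
    w = Fraction(1, 3)
    hits = [n for n in range(1, 64) if x(n, w) == 1]
    print("indices n < 64 with X_n(1/3) = 1:", hits)
    print("E[X_32] =", expectation(32))
```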
(ii) Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s defined by

$P[X_n = n^{1/(2r)}] = 1/n, \quad P[X_n = 0] = 1 - 1/n, \quad n \ge 1$

where $r > 0$ is arbitrary. It is easy to see that

$E[|X_n|^r] = E[X_n^r] = (n^{1/(2r)})^r \cdot \frac{1}{n} = \frac{1}{n^{1/2}} \to 0$ as $n \to \infty$.

Therefore for any $r > 0$, $X_n \xrightarrow{L^r} 0$ as $n \to \infty$. Let $M < N$ be positive integers. Since the $X_n$ are independent, we find

$P[X_n = 0 \text{ for all } M \le n \le N] = \prod_{n=M}^{N} (1 - 1/n).$

The continuity of $P$ ($B_N \downarrow B_0 \Rightarrow P(B_N) \downarrow P(B_0)$) implies that

$P\left[\bigcap_{n=M}^{\infty} \{\omega : X_n(\omega) \le \varepsilon\}\right] = \lim_{N \to \infty} \prod_{n=M}^{N} (1 - 1/n)$

for arbitrary $\varepsilon$ with $0 < \varepsilon < 1$ and integer $M$. Separately we can check that $\prod_{n=2}^{\infty} (1 - 1/n) = 0$ and, for arbitrary $M$, $\prod_{n=M}^{\infty} (1 - 1/n) = 0$. Thus

$P\left[\bigcap_{n=M}^{\infty} \{\omega : X_n(\omega) \le \varepsilon\}\right] = 0.$

Since the r.v.s $X_n$ are non-negative this relation means that the sequence $\{X_n\}$ cannot converge almost surely.

(iii) Let $\{Y_n, n \ge 1\}$ be a sequence of independent r.v.s given by

$P[Y_n = 0] = 1 - 1/n^{1/4}, \quad P[Y_n = \pm 1] = 1/(2n^{1/4}), \quad n \ge 1.$

Then it can be shown that $Y_n \xrightarrow{L^2} 0$ but $Y_n \not\xrightarrow{a.s.} 0$ as $n \to \infty$.

(iv) Let $\{S_n, n \ge 1\}$ be a symmetric Bernoulli walk, that is, $S_n = \xi_1 + \dots + \xi_n$ where the $\xi_j$ are i.i.d. r.v.s each taking the values $(+1)$ or $(-1)$ with probability $\frac{1}{2}$. Define $X_n = X_n(\omega) = 1_{[S_n = 0]}(\omega)$, $n \ge 1$. Then for every $r > 0$ we have

$\lim_{n \to \infty} E[X_n^r] = 0.$

Thus $X_n \xrightarrow{L^r} 0$ as $n \to \infty$ for $r > 0$. However, the symmetric random walk $\{S_n\}$ is recurrent in the sense that $S_n$ crosses the level zero for infinitely many values of $n$ (for details see Feller 1968; Chung 1974; Shiryaev 1995). This means that $X_n = 1$ i.o. almost surely, and therefore $X_n \not\xrightarrow{a.s.} 0$ as $n \to \infty$.
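For case (iv) the moments are available in closed form, since $E[X_n^r] = P[S_n = 0]$ for every $r > 0$. The following sketch uses the standard formula $P[S_{2j} = 0] = \binom{2j}{j} 2^{-2j}$ (a fact not stated in the text above, supplied here as an aid); the function name `return_prob` is ours. The probabilities tend to zero, while their sum grows without bound, which is consistent with the recurrence of the walk.

```python
from math import comb


def return_prob(n):
    """P[S_n = 0] for the symmetric walk: C(n, n/2) / 2^n for even n, and 0 for odd n."""
    if n % 2:
        return 0.0
    return comb(n, n // 2) / 2 ** n


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, return_prob(n))
    # Partial sums of the return probabilities keep growing.
    print(sum(return_prob(n) for n in range(1, 2001)))
```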
14.8. Almost sure convergence does not necessarily imply convergence in $L^r$-sense
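(i) Define the sequence of r.v.s $\{X_n, n \ge 1\}$ as follows:

$P[X_n = 0] = 1 - 1/n^{\alpha}, \quad P[X_n = n] = P[X_n = -n] = 1/(2n^{\alpha}), \quad \alpha > 0.$

Since $E[|X_n|^{1/2}] = 1/n^{\alpha - 1/2}$ we find that $\sum_{n=1}^{\infty} E[|X_n|^{1/2}] < \infty$ for any $\alpha > \frac{3}{2}$. According to the Markov inequality we have $P[|X_n| > \varepsilon] \le \varepsilon^{-1/2} E[|X_n|^{1/2}]$ and hence $\sum_{n=1}^{\infty} P[|X_n| > \varepsilon] < \infty$ for every $\varepsilon > 0$. Using the Borel-Cantelli lemma as in Example 14.4 we conclude that $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$. Further, $E[|X_n|^2] = 1/n^{\alpha - 2}$ and hence for any $\alpha \le 2$, $X_n \not\xrightarrow{L^2} 0$ as $n \to \infty$. Therefore, if $\alpha \in (\frac{3}{2}, 2]$, then $X_n \xrightarrow{a.s.} 0$ but $X_n \not\xrightarrow{L^2} 0$ as $n \to \infty$.

(ii) Let $\{Y_n, n \ge 1\}$ be a sequence of r.v.s where $Y_n$ takes the values $e^n$ and $0$, with probability $n^{-2}$ and $1 - n^{-2}$ respectively. Since for any $\varepsilon > 0$ and all sufficiently large $n$, $P[|Y_n| > \varepsilon] = P[Y_n > \varepsilon] = P[Y_n = e^n] = n^{-2}$ and

$\sum_{n=1}^{\infty} P[|Y_n| > \varepsilon] \le \sum_{n=1}^{\infty} n^{-2} < \infty$

we conclude as in case (i) above that $Y_n \xrightarrow{a.s.} 0$ as $n \to \infty$. Obviously,

$E[Y_n^r] = e^{rn}/n^2 \to \infty$ as $n \to \infty$

for any $r > 0$. Therefore, as $n \to \infty$, $Y_n \xrightarrow{a.s.} 0$ but $Y_n \not\xrightarrow{L^r} 0$ for all $r > 0$.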
14.9. Weak convergence of the distribution functions does not imply convergence of the densities
Let $F, F_n$, $n \ge 1$ be d.f.s such that their densities $f, f_n$, $n \ge 1$ exist. According to the well known Scheffe theorem (see Billingsley 1968), if $f_n(x) \to f(x)$ as $n \to \infty$ for almost all $x \in \mathbb{R}^1$ then $F_n \xrightarrow{w} F$ as $n \to \infty$. It is natural to ask whether or not the converse is true. The example below shows that in general the answer is negative. Consider the function

$F_n(x) = \begin{cases} 0, & \text{if } x \le 0 \\ x\left(1 - \dfrac{\sin 2n\pi x}{2n\pi x}\right), & \text{if } 0 < x \le 1 \\ 1, & \text{if } x > 1. \end{cases}$

Then $F_n$ is an absolutely continuous d.f. with density

$f_n(x) = \begin{cases} 1 - \cos 2n\pi x, & \text{if } x \in [0, 1] \\ 0, & \text{otherwise.} \end{cases}$

Also introduce the functions

$F(x) = \begin{cases} 0, & \text{if } x \le 0 \\ x, & \text{if } 0 < x \le 1 \\ 1, & \text{if } x > 1, \end{cases} \qquad f(x) = \begin{cases} 1, & \text{if } x \in (0, 1] \\ 0, & \text{otherwise.} \end{cases}$

Obviously $F$ and $f$ are the d.f. and the density corresponding to a uniform distribution on the interval $(0, 1]$ respectively. It is easy to see that

$F_n(x) \to F(x)$ as $n \to \infty$ for all $x \in \mathbb{R}^1$.

However,

$f_n(x) \not\to f(x)$ as $n \to \infty$.

Therefore we have established that in general $F_n \xrightarrow{w} F \;\not\Rightarrow\; f_n \to f$.
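A numerical illustration of this example is sketched below; the function names `F_n` and `f_n` are ours. On $(0, 1]$ we have $F_n(x) - x = -\sin(2n\pi x)/(2n\pi)$, so the d.f.s differ from the uniform d.f. by at most $1/(2n\pi)$, while the density at $x = \frac{1}{2}$ equals $1 - \cos n\pi$ and alternates between $2$ and $0$.

```python
import math


def F_n(n, x):
    """The d.f. F_n of the example."""
    if x <= 0:
        return 0.0
    if x > 1:
        return 1.0
    return x - math.sin(2 * n * math.pi * x) / (2 * n * math.pi)


def f_n(n, x):
    """The density of F_n: 1 - cos(2*n*pi*x) on [0, 1], and 0 elsewhere."""
    return 1 - math.cos(2 * n * math.pi * x) if 0 <= x <= 1 else 0.0


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 50, 51):
        print(n, abs(F_n(n, 0.3) - 0.3), f_n(n, 0.5))
```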
14.10. The convergence $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ does not always imply that $X_n + Y_n \xrightarrow{d} X + Y$
Let $X, X_n$, $n \ge 1$ and $Y, Y_n$, $n \ge 1$ be r.v.s defined on the same probability space. Suppose $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ as $n \to \infty$. Does it follow from this that $X_n + Y_n \xrightarrow{d} X + Y$ as $n \to \infty$? There are cases when the answer to this question is positive, for example if $X_n$ and $Y_n$, $n \ge 1$ are independent, or if the joint distribution of $(X_n, Y_n)$ converges to that of $(X, Y)$ (see Grimmett and Stirzaker 1982). The examples below aim to show that the answer is negative if we drop the independence condition.

(i) Let $\{X_n, n \ge 1\}$ be i.i.d. r.v.s such that $X_n = 1$ or $0$ with probability $\frac{1}{2}$ each and put $Y_n = 1 - X_n$. Then $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ as $n \to \infty$ where each of $X$ and $Y$ takes the values $1$ and $0$ with equal probabilities. Further, since $X_n + Y_n = 1$, it is obvious that $X_n + Y_n$ does not tend in distribution to the sum $X + Y$ which is a r.v. with three possible values, $0$, $1$ and $2$, with probabilities $\frac{1}{4}$, $\frac{1}{2}$ and $\frac{1}{4}$ respectively.

(ii) Suppose now the sequences of r.v.s $\{X_n, n \ge 1\}$ and $\{Y_n, n \ge 1\}$ are such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ where $X \sim N(0, 1)$, $Y \sim N(0, 1)$. If for each $n$, $X_n$ and $Y_n$ are independent, then $X_n + Y_n \xrightarrow{d} Z$ with $Z \sim N(0, 2)$. Moreover, in this case the distribution of $(X_n, Y_n)$ converges to the standard two-dimensional normal distribution with zero mean and covariance matrix

$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$

Let us now drop the condition that $X_n$ and $Y_n$ are independent. Again take $\{X_n, n \ge 1\}$ such that $X_n \xrightarrow{d} X$ with $X \sim N(0, 1)$ and let $Y_n = X_n$ for all $n \in \mathbb{N}$. Then $Y_n \xrightarrow{d} Y$ where $Y \sim N(0, 1)$. Obviously the sum $X_n + Y_n = 2X_n$ and it converges in distribution to a r.v. $Z$ where $Z \sim N(0, 4)$ but not to a r.v. distributed $N(0, 2)$ as expected.
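Case (i) can be verified by enumerating the two laws exactly. The sketch below assumes, as the stated probabilities $\frac{1}{4}, \frac{1}{2}, \frac{1}{4}$ suggest, that the limit variables $X$ and $Y$ are taken independent; the function names are ours.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
BERNOULLI = {0: HALF, 1: HALF}   # common law of X_n, Y_n, X and Y


def law_of_sum_dependent():
    """Law of X_n + Y_n when Y_n = 1 - X_n."""
    law = {}
    for x, p in BERNOULLI.items():
        s = x + (1 - x)
        law[s] = law.get(s, 0) + p
    return law


def law_of_sum_independent():
    """Law of X + Y when X and Y are independent (an assumption of this sketch)."""
    law = {}
    for (x, p), (y, q) in product(BERNOULLI.items(), repeat=2):
        law[x + y] = law.get(x + y, 0) + p * q
    return law


if __name__ == "__main__":
    print(law_of_sum_dependent())
    print(law_of_sum_independent())
```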
14.11. The convergence in probability $X_n \xrightarrow{P} X$ does not always imply that $g(X_n) \xrightarrow{P} g(X)$ for any function $g$
The following result is well known and is used in many probabilistic problems (see Feller 1971; Billingsley 1995; Serfling 1980).

If $X, X_n$, $n \ge 1$ are r.v.s such that $X_n \xrightarrow{P} X$ as $n \to \infty$ and $g(x)$, $x \in \mathbb{R}^1$ is a continuous function, then $g(X_n) \xrightarrow{P} g(X)$ as $n \to \infty$.

By a specific example we show that the continuity of $g$ is an essential condition in the sense that it cannot be replaced by measurability only. To see this, consider the function

$g(x) = \begin{cases} 0, & \text{if } x \le 0 \\ 1, & \text{if } x > 0. \end{cases}$

The sequence $\{X_n, n \ge 1\}$ can be taken arbitrarily but so as to satisfy the properties $X_n > 0$ for all $n \in \mathbb{N}$ and $X_n \xrightarrow{P} 0$ as $n \to \infty$. For example, let $X_n$ take the values $1$ and $n^{-1}$ with probabilities $n^{-1}$ and $1 - n^{-1}$ respectively. Then obviously $X_n \xrightarrow{P} X$ where $X = 0$ a.s. Moreover, for each $n$ we have $g(X_n) = 1$. However, $g(X) = 0$ and hence $g(X_n)$ cannot converge in any reasonable sense to $g(X)$. In particular,

$g(X_n) \not\xrightarrow{P} g(X)$ as $n \to \infty$

despite the fact that $X_n \xrightarrow{P} X$. We come to the same conclusion by considering the function $g$ defined above and the sequence of r.v.s $\{X_n, n \ge 1\}$ where $X_n \sim N(0, \sigma^2/n)$, $\sigma^2 > 0$. Obviously

$X_n \xrightarrow{P} X$ as $n \to \infty$

with $X = 0$ a.s. Since $X_n$ is symmetric, we have for each $n$,

$g(X_n) = \begin{cases} 0, & \text{with probability } \frac{1}{2} \\ 1, & \text{with probability } \frac{1}{2}. \end{cases}$

However, $g(X) = 0$ a.s. and hence $g(X_n) \not\xrightarrow{P} g(X)$ as $n \to \infty$.
14.12. Convergence in variation implies convergence in distribution but the converse is not always true
Let $X$ and $Y$ be discrete r.v.s such that

$P[X = a_k] = p_k, \quad P[Y = a_k] = q_k$

where $a_k \in \mathbb{R}^1$, $p_k \ge 0$, $q_k \ge 0$, $k = 1, 2, \ldots$, $\sum_k p_k = 1$, $\sum_k q_k = 1$. If $F$ and $G$ are the d.f.s of $X$ and $Y$ respectively, then the distance in variation, $v(F, G)$, is defined by

(1)  $v(F, G) = \sum_k |p_k - q_k|.$

If $X$ and $Y$ are absolutely continuous r.v.s whose d.f.s and densities are $F, G$ and $f, g$, then $v(F, G)$ is defined by

(2)  $v(F, G) = \int_{-\infty}^{\infty} |f(x) - g(x)|\,dx.$
Suppose $F, F_n$, $n \ge 1$, are the d.f.s of the r.v.s $X, X_n$, $n \ge 1$, respectively. If $v(F_n, F) \to 0$ as $n \to \infty$ we write $F_n \xrightarrow{v} F$ and also $X_n \xrightarrow{v} X$ and say that the sequence $\{X_n\}$ converges in variation to $X$ as $n \to \infty$. It is easy to see that convergence in variation implies convergence in distribution, that is, $F_n \xrightarrow{v} F \Rightarrow F_n \xrightarrow{w} F$. However, as we shall now see, the converse is not true.

(i) Let $F_n$ be the d.f. of a r.v. $X_n$ concentrated at the point $1/n$. Then $F_n \xrightarrow{w} F_0$ as $n \to \infty$ where $F_0$ is the d.f. of the r.v. $X_0 \equiv 0$, while the quantity $v(F_n, F_0)$ calculated by (1) does not tend to zero as $n \to \infty$.

(ii) Let $F(x)$, $x \in \mathbb{R}^1$ be a d.f. with density $f(x)$, $x \in \mathbb{R}^1$. Our goal is to construct a sequence of d.f.s $\{F_n, n \ge 1\}$ such that $F_n \xrightarrow{w} F$ but $F_n \not\xrightarrow{v} F$ as $n \to \infty$. Denote

$I_n = \int_{-\infty}^{\infty} f(x) \cos^2 nx\,dx, \qquad J_n = \int_{-\infty}^{\infty} f(x) \sin^2 nx\,dx, \qquad n \ge 1.$

The obvious identity $I_n + J_n = \int_{-\infty}^{\infty} f(x)\,dx = 1$ implies that the numerical sequences $\{I_n, n \ge 1\}$ and $\{J_n, n \ge 1\}$ cannot simultaneously tend to zero as $n \to \infty$. Thus, we can assume that e.g. $I_n \not\to 0$ as $n \to \infty$. In such a case we introduce the function

$f_n(x) = c_n f(x)(1 + \cos nx), \quad x \in \mathbb{R}^1$

where $c_n^{-1} = \int_{-\infty}^{\infty} f(x)(1 + \cos nx)\,dx$. Then for each $n$ the function $f_n$ is a density and let $F_n$ be the corresponding d.f. Let us try to find the limit of the sequence $\{F_n\}$ as $n \to \infty$. Since $f$ is a density, the well known Riemann-Lebesgue lemma (see e.g. Kolmogorov and Fomin 1970) gives that

$\int_B f(x) \cos nx\,dx \to 0$ as $n \to \infty$

holds for any Borel set $B \in \mathcal{B}^1$. Hence

$\int_B f_n(x)\,dx \to \int_B f(x)\,dx$ as $n \to \infty$ $\;\Rightarrow\;$ $F_n \xrightarrow{w} F$ as $n \to \infty$.

Let us now calculate the distance in variation $v(F_n, F)$. We have

$v(F_n, F) = \int_{-\infty}^{\infty} |f_n(x) - f(x)|\,dx = \int_{-\infty}^{\infty} |c_n f(x) \cos nx - (1 - c_n) f(x)|\,dx \ge \left| \int_{-\infty}^{\infty} |c_n f(x) \cos nx|\,dx - \int_{-\infty}^{\infty} |(1 - c_n) f(x)|\,dx \right|.$

We find that $c_n \to 1$ as $n \to \infty$ and

$\int_{-\infty}^{\infty} f(x) |\cos nx|\,dx \ge \int_{-\infty}^{\infty} f(x) \cos^2 nx\,dx = I_n \not\to 0.$

Therefore $v(F_n, F) \not\to 0$ as $n \to \infty$, i.e. the sequence of d.f.s $\{F_n\}$ does not converge in variation to $F$ despite the weak convergence established above.
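To see the construction in (ii) at work, one may take a concrete density. The sketch below assumes $f$ is the uniform density on $[0, 2\pi]$ (our choice, not the text's); then $c_n = 1$, $f_n(x) = (1 + \cos nx)/(2\pi)$ on $[0, 2\pi]$, the gap between the d.f.s is $F_n(x) - F(x) = \sin(nx)/(2\pi n)$, and the distance in variation is $\frac{1}{2\pi}\int_0^{2\pi} |\cos nx|\,dx = 2/\pi$ for every $n$. The function names and the midpoint rule are ours.

```python
import math


def variation_distance(n, steps=20_000):
    """Midpoint-rule estimate of the integral of |f_n - f| over [0, 2*pi] (uniform f assumed)."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += abs(math.cos(n * x)) / (2 * math.pi) * h
    return total


def cdf_gap(n, x):
    """F_n(x) - F(x) = sin(n*x) / (2*pi*n) for 0 <= x <= 2*pi."""
    return math.sin(n * x) / (2 * math.pi * n)


if __name__ == "__main__":
    for n in (1, 7, 50):
        print(n, variation_distance(n), abs(cdf_gap(n, 1.0)))
```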
14.13. There is no metric corresponding to almost sure convergence
It is well known that each of the following kinds of convergence: (i) in distribution; (ii) in probability; (iii) in $L^r$-sense, can be metrized (see Ash and Gardner 1975; Dudley 1976). It is therefore natural to ask whether almost sure convergence can be metrized. Let us show that the answer is negative. Let $\mathcal{R}$ denote a set of r.v.s defined on the probability space $(\Omega, \mathcal{F}, P)$ and $d : \mathcal{R} \times \mathcal{R} \to \mathbb{R}_+$ a metric on $\mathcal{R}$, that is, $d$ is non-negative, symmetric and satisfies the triangle inequality. Let us check the correctness of the following statement: for $X, X_1, X_2, \ldots \in \mathcal{R}$,

$d(X_n, X) \to 0$ iff $X_n \xrightarrow{a.s.} X$.

Suppose such a function $d$ does exist. Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s converging to some r.v. $X$ in probability but not almost surely (see Example 14.3). Then for some $\delta > 0$ the inequality $d(X_n, X) \ge \delta$ will be satisfied for infinitely many $n$. Let $\Lambda$ denote the set of these $n$. However, since $X_n \xrightarrow{P} X$ there exists a subsequence $\{X_{n_k}, n_k \in \Lambda\}$ of $\{X_n, n \in \Lambda\}$ converging to $X$ almost surely. But this would mean that $d(X_{n_k}, X) \to 0$ as $n_k \to \infty$, which is impossible because $d(X_{n_k}, X) \ge \delta$ for each $n_k \in \Lambda$. Thus the statement given above is incorrect and we conclude that a.s. convergence is not metrizable. Note, however, that this type of convergence can be metrized iff the probability space is atomic (see Thomasian 1957; Tomkins 1975a).
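14.14. Complete convergence of sequences of random variables is stronger than almost sure convergence

The sequence of r.v.s $\{X_n, n \ge 1\}$ is called completely convergent to $0$ if

(1)  $\lim_{n \to \infty} \sum_{m=n}^{\infty} P[|X_m| > \varepsilon] = 0$ for every $\varepsilon > 0$.

In this case the following notation is used: $X_n \xrightarrow{c} 0$. In order to compare this mode of convergence with a.s. convergence, recall that

(2)  $X_n \xrightarrow{a.s.} 0 \iff \lim_{n \to \infty} P\left[\bigcup_{m=n}^{\infty} \{|X_m| > \varepsilon\}\right] = 0$ for every $\varepsilon > 0$.

Since the probability $P$ is semi-additive, we obtain

$P\left[\bigcup_{m=n}^{\infty} \{|X_m| > \varepsilon\}\right] \le \sum_{m=n}^{\infty} P[|X_m| > \varepsilon]$

which immediately implies that $X_n \xrightarrow{c} 0 \Rightarrow X_n \xrightarrow{a.s.} 0$. However, the converse is not always true. To see this, consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ is the Lebesgue measure. Take the sequence $\{X_n, n \ge 1\}$ where

$X_n(\omega) = \begin{cases} 0, & \text{if } 0 \le \omega < 1 - \frac{1}{n} \\ 1, & \text{if } 1 - \frac{1}{n} \le \omega \le 1. \end{cases}$

Then clearly this sequence converges to zero almost surely but not completely. These two kinds of convergence are equivalent if the r.v.s $X_n$, $n \ge 1$, are independent (Hsu and Robbins 1947).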
14.15. The almost sure uniform convergence of a random sequence implies its complete convergence, but the converse is not true
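Recall that the sequence of r.v.s $\{X_n, n \ge 1\}$ is said to converge almost surely uniformly to a r.v. $X$ if there exists a set $A \in \mathcal{F}$ with $P(A) = 0$ such that $X_n = X_n(\omega)$ converge uniformly (in $\omega$) to $X$ on the complement $A^c$. Note that almost sure uniform convergence implies complete convergence discussed in Example 14.14 (see Lukacs 1975). Thus we come to the question of the validity of the converse statement. To find the answer we consider an example. Let the probability space $(\Omega, \mathcal{F}, P)$ be given by $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ is the Lebesgue measure. Consider the sequence $\{X_n, n \ge 1\}$ of r.v.s such that

$X_n(\omega) = \begin{cases} 1 - 2n^2\omega, & \text{if } 0 \le \omega \le 1/(2n^2) \\ 0, & \text{if } 1/(2n^2) < \omega \le 1 - 1/(2n^2) \\ 1 - 2n^2(1 - \omega), & \text{if } 1 - 1/(2n^2) < \omega \le 1. \end{cases}$

For arbitrary $\varepsilon$ with $0 < \varepsilon < 1$, $0 \le X_n < \varepsilon$ iff $\omega \in ((1 - \varepsilon)/(2n^2), 1 - (1 - \varepsilon)/(2n^2))$. Hence

$P[|X_n| \ge \varepsilon] = P[X_n \ge \varepsilon] = (1 - \varepsilon)/n^2$

so that

$\sum_{n=1}^{\infty} P[|X_n| \ge \varepsilon] = (1 - \varepsilon) \sum_{n=1}^{\infty} n^{-2} < \infty.$

This means that the sequence $\{X_n\}$ converges completely to zero. Now let us introduce the sets $B_n = [0, \frac{1}{4}n^{-2}) \cup (1 - \frac{1}{4}n^{-2}, 1]$, $n \ge 1$. Clearly $P(B_n) = 1/(2n^2)$. Suppose for some set $A$ with $P(A) = 0$, $X_n$ converges to zero almost surely uniformly on $A^c$. Then there exists a number $n_\varepsilon \in \mathbb{N}$ (independent of $\omega$) such that $|X_n| \le \varepsilon < \frac{1}{2}$ on $A^c$ provided $n \ge n_\varepsilon$. However, $X_n > \frac{1}{2}$ on $B_n$, so for $n \ge n_\varepsilon$ we have $B_n \cap A^c = \emptyset$ and in particular $B_{n_\varepsilon} \subset A$. Hence $P(A) \ge P(B_{n_\varepsilon}) = \frac{1}{2} n_\varepsilon^{-2} > 0$. This contradiction shows that the sequence $\{X_n\}$ defined above does not converge almost surely uniformly to zero.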
14.16. Converging sequences of random variables such that the sequences of the expectations do not converge
If the sequence of r.v.s $\{X_n\}$ converges in probability or a.s. to some r.v. $X$, then under additional assumptions we can show that the sequence of the expectations $\{EX_n\}$ will tend to $EX$. However, in general such a statement is not true without appropriate assumptions.

(i) Let $\{X_n, n \ge 1\}$ be r.v.s defined by

$P[X_n = -n - 4] = 1/(n + 4), \quad P[X_n = -1] = 1 - 4/(n + 4), \quad P[X_n = n + 4] = 3/(n + 4).$

Obviously for any $\varepsilon > 0$ and all sufficiently large $n$ we have $P[|X_n - (-1)| > \varepsilon] = 4/(n + 4)$ and hence $X_n \xrightarrow{P} (-1)$ as $n \to \infty$. On the other hand,

$EX_n = 1 + 4/(n + 4)$ and $\lim_{n \to \infty} EX_n = 1.$

Therefore

$\lim_{n \to \infty} EX_n = 1 \ne -1 = E\left[P\text{-}\lim_{n \to \infty} X_n\right]$

and the convergence in probability of $X_n$ to $X$ is not sufficient to ensure that $EX_n \to EX$. This can be explained by referring to the standard result (see Lukacs 1975; Chow and Teicher 1978): if $X_n$, $n \ge 1$ and $X$ are $L^r$ r.v.s and $X_n \xrightarrow{L^r} X$ then $E[|X_n|^k] \to E[|X|^k]$ for each $0 < k \le r$.

(ii) Consider the sequence $\{Y_n, n \ge 1\}$ of r.v.s where

$Y_n(\omega) = \begin{cases} n^2, & \text{if } 0 \le \omega \le n^{-1} \\ 0, & \text{if } n^{-1} < \omega \le 1 \end{cases}$

and also the r.v. $Y(\omega) = 0$, $\omega \in [0, 1]$. Then for every $\omega \in (0, 1]$ we have $Y_n(\omega) \to Y(\omega)$ as $n \to \infty$, so that $Y_n \to Y$ almost surely. However, $EY_n = n$ and $EY_n \not\to EY = 0$ as $n \to \infty$. Let us note that in case (ii) $EY_n$ is unbounded, while in case (i) $EX_n$ is bounded. According to Billingsley (1995), if $\{Z_n\}$ is uniformly bounded and $\lim_{n \to \infty} Z_n = Z$ on a set of probability 1, then $EZ = \lim_{n \to \infty} EZ_n$. Both cases (i) and (ii) show that uniform boundedness is essential.
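The expectations in both cases can be computed exactly. The following sketch encodes the law of $X_n$ from case (i) and the step function $Y_n$ from case (ii); the function names are ours.

```python
from fractions import Fraction


def law_x(n):
    """Law of X_n in case (i), as a dict value -> probability."""
    d = n + 4
    return {-(n + 4): Fraction(1, d), -1: 1 - Fraction(4, d), n + 4: Fraction(3, d)}


def mean(law):
    """Expectation of a discrete law given as value -> probability."""
    return sum(v * p for v, p in law.items())


def mean_y(n):
    """E[Y_n] in case (ii): Y_n equals n^2 on an interval of length 1/n and 0 elsewhere."""
    return n ** 2 * Fraction(1, n)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        # P[X_n != -1] = 4/(n+4) -> 0, while E[X_n] = 1 + 4/(n+4) -> 1.
        print(n, 1 - law_x(n)[-1], mean(law_x(n)), mean_y(n))
```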
14.17. Weak $L^1$-convergence of random variables is weaker than both weak convergence and convergence in $L^1$-sense
Recall that the sequence $\{X_n, n \ge 1\}$ of r.v.s in the space $L^1$ is said to converge weakly in $L^1$ to the r.v. $X$ iff for any bounded r.v. $Y$ we have

(1)  $\lim_{n \to \infty} E[X_n Y] = E[XY].$

In this case the following notation is used: $X_n \xrightarrow{w, L^1} X$ as $n \to \infty$. Clearly the limit $X$ belongs to $L^1$ and it is unique up to equivalence (see Neveu 1965; Chung 1974).

It is of general interest to clarify the connection between this mode of convergence and the others discussed in the previous examples. In particular, if $X_n \xrightarrow{w, L^1} X$, does it follow that $X_n \xrightarrow{w} X$ or that $X_n \xrightarrow{L^1} X$?

Remark. Here the notation $\xrightarrow{w}$ is used to denote the so-called weak convergence of $X_n$ to $X$ as $n \to \infty$ which in this case is equivalent to convergence in distribution. In a more general context weak convergence will be considered in Section 16.

To answer these questions consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ is the Lebesgue measure, and take the following sequence of r.v.s $X_n(\omega) = \sin 2\pi n\omega$, $n \ge 1$. Note that $\{X_n\}$ does not converge to $0$ in either sense, weak or $L^1$-sense. Nevertheless we shall show that $X_n \xrightarrow{w, L^1} 0$ in the sense of definition (1). Let $Y$ be any bounded r.v., that is, $Y = Y(\omega)$, $\omega \in [0, 1]$ is a bounded $\mathcal{F}$-measurable function. Then there is a sequence of stepwise functions $\{Y^{(m)}(\omega), m \ge 1\}$ such that $Y^{(m)} \to Y$ as $m \to \infty$ (see Loeve 1978). By the Egorov theorem (see Kolmogorov and Fomin 1970; Royden 1968) for any $\varepsilon > 0$ we can find an open set $A_0 \subset [0, 1]$ of measure less than $\varepsilon$ such that the convergence $Y^{(m)} \to Y$ as $m \to \infty$ is uniform for $\omega \in A_0^c = [0, 1] \setminus A_0$. Here we can also use the Lusin theorem (see Kolmogorov and Fomin 1970) on the existence of a continuous function $Y^*$ coinciding with $Y$ on the complement of a set of measure less than $\varepsilon$. In both cases, for stepwise or continuous $Y^*$, we have

$E[X_n Y^*] = \int_0^1 Y^*(\omega) \sin 2\pi n\omega\,d\omega \to 0$ as $n \to \infty$.

Since $Y$ and $Y^*$ are bounded and $Y^*$ is close to $Y$, the difference $|E[X_n Y^*] - E[X_n Y]|$ can be made arbitrarily small. Hence $E[X_n Y] \to 0$ as $n \to \infty$ for any bounded r.v. $Y$. Therefore $X_n \xrightarrow{w, L^1} 0$ as $n \to \infty$. However, as noted above, neither of the relations $X_n \xrightarrow{w} 0$ or $X_n \xrightarrow{L^1} 0$ is true.
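For an indicator $Y = 1_{[a, b]}$, the simplest stepwise function, the integral is available in closed form and is at most $1/(\pi n)$ in absolute value, whereas $E|X_n| = 2/\pi$ for every $n$. The sketch below illustrates both facts; the function names and the midpoint rule are ours.

```python
import math


def e_xn_indicator(n, a, b):
    """E[X_n * 1_[a,b]]: the integral of sin(2*pi*n*w) over [a, b], in closed form."""
    return (math.cos(2 * math.pi * n * a) - math.cos(2 * math.pi * n * b)) / (2 * math.pi * n)


def e_abs_xn(n, steps=20_000):
    """Midpoint-rule estimate of E|X_n|, the integral of |sin(2*pi*n*w)| over [0, 1]."""
    h = 1.0 / steps
    return sum(abs(math.sin(2 * math.pi * n * (i + 0.5) * h)) for i in range(steps)) * h


if __name__ == "__main__":
    for n in (1, 5, 100):
        print(n, e_xn_indicator(n, 0.1, 0.7), e_abs_xn(n))
```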
14.18. A converging sequence of random variables whose Cesaro means do not converge
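Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s. Then the following implication holds:

(1)  $X_n \xrightarrow{a.s.} 0$ as $n \to \infty$ $\;\Rightarrow\;$ $\frac{1}{n}(X_1 + \dots + X_n) \xrightarrow{a.s.} 0$ as $n \to \infty$.

This follows from the standard theorem in analysis about the Cesaro means. Our aim now is to show that almost sure convergence in (1) cannot be replaced by convergence in probability. Indeed, consider the sequence of independent r.v.s $\{\xi_n, n \ge 1\}$ where $\xi_n$ has a d.f. $F_n$ given by

$F_n(x) = \begin{cases} 0, & \text{if } x < 0 \\ 1 - 1/(x + n), & \text{if } x \ge 0. \end{cases}$

Then for every fixed $\varepsilon > 0$ we have

$P[|\xi_n| > \varepsilon] = 1 - F_n(\varepsilon) = 1/(\varepsilon + n)$

which means that $\xi_n \xrightarrow{P} 0$ as $n \to \infty$. Let us show that the Cesaro means

$\eta_n := \frac{1}{n}(\xi_1 + \dots + \xi_n) \not\xrightarrow{P} 0$ as $n \to \infty$.

Denoting $M_n = \max\{\xi_1, \ldots, \xi_n\}$ and taking into account the independence of the variables $\xi_j$ we can easily show that for any $x > 0$ and $n \ge 2$

$P[M_n \le x] = \left(1 - \frac{1}{x + 1}\right)\left(1 - \frac{1}{x + 2}\right) \cdots \left(1 - \frac{1}{x + n}\right) < \left(1 - \frac{1}{x + n}\right)^n.$

Therefore

(2)  $P[M_n/n \le \varepsilon] < \left(1 - \frac{1}{n\varepsilon + n}\right)^n.$

Since the $\xi_j$ are non-negative, $[M_n/n > \varepsilon] \subset [\eta_n > \varepsilon]$, so

$P[M_n/n > \varepsilon] \le P[\eta_n > \varepsilon]$ or $P[M_n/n \le \varepsilon] \ge P[\eta_n \le \varepsilon]$.

Combining the last relation with (2) we see that

$P[\eta_n \le \varepsilon] < \left(1 - \frac{1}{n(\varepsilon + 1)}\right)^n$

and hence

$\liminf_{n \to \infty} P[\eta_n > \varepsilon] = 1 - \limsup_{n \to \infty} P[\eta_n \le \varepsilon] \ge 1 - \exp(-(\varepsilon + 1)^{-1}) > 0.$

This means that $\eta_n$ does not converge to zero in probability. Therefore in general

$\xi_n \xrightarrow{P} 0 \;\not\Rightarrow\; \frac{1}{n}(\xi_1 + \dots + \xi_n) \xrightarrow{P} 0$ as $n \to \infty$.

Finally, let us indicate one additional case leading to the same result. Let $\{X_n, n \ge 1\}$ be independent r.v.s, $X_n$ taking the values $2^n$ and $0$, with probabilities $n^{-1}$ and $1 - n^{-1}$ respectively. Then $X_n \xrightarrow{P} 0$ but $\frac{1}{n}(X_1 + \dots + X_n) \not\xrightarrow{P} 0$ as $n \to \infty$ (the details are left to the reader).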
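The key numerical fact in the argument for the Cesaro means is that the bound $(1 - 1/(n(\varepsilon + 1)))^n$ stays below, and converges to, $\exp(-(\varepsilon + 1)^{-1}) < 1$. A small sketch, with function names of our own choosing:

```python
import math


def upper_bound(n, eps):
    """(1 - 1/(n*(eps + 1)))**n, the bound on P[eta_n <= eps] from the text."""
    return (1 - 1 / (n * (eps + 1))) ** n


def limit(eps):
    """exp(-1/(eps + 1)), the limit of the bound as n grows."""
    return math.exp(-1 / (eps + 1))


if __name__ == "__main__":
    eps = 0.5
    for n in (1, 10, 1000, 10 ** 6):
        print(n, upper_bound(n, eps), 1 - upper_bound(n, eps))
    print("limit of the bound:", limit(eps))
```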
SECTION 15. LAWS OF LARGE NUMBERS

Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s defined on the probability space $(\Omega, \mathcal{F}, P)$. Define $S_n = X_1 + \dots + X_n$, $a_k = EX_k$, $A_n = ES_n = a_1 + \dots + a_n$.

We say that the sequence $\{X_n\}$ satisfies the weak law of large numbers (WLLN) (or that $\{X_n\}$ obeys the WLLN) if $\frac{1}{n}S_n - \frac{1}{n}A_n \xrightarrow{P} 0$ as $n \to \infty$, that is, if for any $\varepsilon > 0$ we have

$\lim_{n \to \infty} P\left[\left|\frac{1}{n}S_n - \frac{1}{n}A_n\right| \ge \varepsilon\right] = 0.$

Further, if $\frac{1}{n}S_n - \frac{1}{n}A_n \xrightarrow{a.s.} 0$ as $n \to \infty$, that is, if

$P\left[\omega : \lim_{n \to \infty} \left(\frac{1}{n}S_n(\omega) - \frac{1}{n}A_n\right) = 0\right] = 1$

we say that the sequence $\{X_n\}$ satisfies the strong law of large numbers (SLLN) (or that $\{X_n\}$ obeys the SLLN). Let us formulate some of the basic results concerning the WLLN and the SLLN. In these results either $\{X_n\}$ is a sequence of identically distributed r.v.s or the variables are arbitrarily distributed.

Khintchine theorem. Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s with $E[|X_1|] < \infty$. Then this sequence satisfies the WLLN and $\frac{1}{n}S_n \xrightarrow{P} a$ as $n \to \infty$ where $a = EX_1$.

Kolmogorov theorem 1. Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s. The existence of $E[|X_1|]$ is a necessary and sufficient condition for the sequence $\{X_n\}$ to satisfy the SLLN, and $\frac{1}{n}S_n \xrightarrow{a.s.} a$ as $n \to \infty$ where $a = EX_1$.

Markov theorem. Suppose $\{X_n, n \ge 1\}$ is an arbitrary sequence of r.v.s such that the following condition holds:

(1)  $(1/n^2)\,V[X_1 + \dots + X_n] \to 0$ as $n \to \infty$ (Markov condition).

Then $\{X_n\}$ satisfies the WLLN.

Kolmogorov theorem 2. Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s with $\sigma_n^2 = VX_n < \infty$, $n \ge 1$. Suppose the following condition is fulfilled:

(2)  $\sum_{n=1}^{\infty} \sigma_n^2/n^2 < \infty$ (Kolmogorov condition).

Then the given sequence satisfies the SLLN.

In the examples below we refer to (1) and (2) as the Markov condition and the Kolmogorov condition respectively. A detailed presentation of the laws of large numbers can be found in the books by Doob (1953), Gnedenko (1962), Fisz (1963), Revesz (1967), Feller (1971), Chung (1974), Petrov (1975), Laha and Rohatgi (1979), Billingsley (1995) and Shiryaev (1995). In this section we consider examples which illustrate the importance of the conditions ensuring the validity of the WLLN or of the SLLN as well as the relationship between these two laws and some other related topics.
15.1. The Markov condition is sufficient but not necessary for the weak law of large numbers
(i) Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s such that $X_n$ has a $\chi^2$ distribution with $n$ degrees of freedom: that is, $X_n$ has a density

$f_n(x) = \begin{cases} [2\Gamma(n/2)]^{-1} (x/2)^{(n-2)/2} \exp(-x/2), & \text{if } x > 0 \\ 0, & \text{otherwise.} \end{cases}$

Then $EX_n = n$, $VX_n = 2n$ and clearly the Markov condition is not satisfied. Hence we cannot apply the Markov theorem to determine whether the sequence $\{X_n\}$ satisfies the WLLN. We use the following result (see Feller 1971 or Shiryaev 1995). If $\{\xi_n, n \ge 1\}$ is a sequence of r.v.s and $\xi_n$ has a ch.f. $\phi_n$, then

$\xi_n \xrightarrow{P} 0$ as $n \to \infty$ $\iff$ $\phi_n(t) \to 1$ as $n \to \infty$ for all $t \in \mathbb{R}^1$.

The ch.f. $\psi_n$ of $X_n$ is $\psi_n(t) = (1 - 2it)^{-n/2}$. Then calculating the ch.f. $\tilde{\psi}_n$ of $(S_n - ES_n)/n$ where $S_n = X_1 + \dots + X_n$ and $ES_n = \frac{1}{2}n(n + 1)$ we find that $\tilde{\psi}_n \to 1$ as $n \to \infty$ for all $t \in \mathbb{R}^1$ and in view of the above result we conclude that the sequence $\{X_n\}$ does satisfy the WLLN. Note that analogous reasoning leads to the same conclusion if we consider the sequence of discrete independent r.v.s $\{Y_n, n \ge 1\}$ where $P[Y_n = \pm 1] = \frac{1}{2}(1 - 2^{-n})$ and $P[Y_n = \pm 2^n] = 2^{-(n+1)}$. Therefore the Markov condition is not necessary for the WLLN.
(ii) Let $\{Y_n, n \ge 1\}$ be independent r.v.s where $Y_n$ has a density

$f_n(x) = (\sqrt{2}\sigma_n)^{-1} \exp(-\sqrt{2}|x|/\sigma_n), \quad x \in \mathbb{R}^1.$

It is easy to show that $EY_n = 0$ and $VY_n = \sigma_n^2$. Let us choose $\sigma_n^2$ in the following special way: $\sigma_n^2 = n^{1+\delta}$ where $0 \le \delta < 1$. Then the Markov condition is not fulfilled. Nevertheless, as will be shown, the sequence $\{Y_n\}$ satisfies the WLLN. However, to prove this statement we need the following result (see Feller 1971): let $\eta_n$ be independent r.v.s and let

(1)  $P\left[\frac{1}{n}|\eta_k| > \varepsilon\right] < \delta$

for any positive $\varepsilon$, $\delta$, $k = 1, 2, \ldots, n$ and all sufficiently large $n$. Denote by $\{\bar{\eta}_n\}$ the truncated sequence with some constant $c > 0$, that is, $\bar{\eta}_k = \eta_k$ if $|\eta_k| \le c$ and $\bar{\eta}_k = c$ if $|\eta_k| > c$. Then $\{\eta_k\}$ obeys the WLLN iff the following two conditions hold for any $\varepsilon > 0$ and $c > 0$:

(2)  $\lim_{n \to \infty} \sum_{k=1}^{n} P\left[\frac{1}{n}|\eta_k| > \varepsilon\right] = 0$

and

(3)  $\lim_{n \to \infty} n^{-2} \sum_{k=1}^{n} V\bar{\eta}_k = 0.$

Now for any fixed $\varepsilon > 0$ we can easily show that

$P\left[\frac{1}{n}|Y_k| > \varepsilon\right] = \exp[-\sqrt{2}\varepsilon n/k^{(1+\delta)/2}], \quad k = 1, 2, \ldots, n$

and for sufficiently large $n$ the right-hand side can be made arbitrarily small. Thus condition (1) holds. For given $\varepsilon > 0$ and constant $c > 0$ let $N$ be an integer satisfying the relations: $\varepsilon N \le c$ and $\varepsilon(N + 1) > c$. Choose $n > N$. Then

(4)

Since the sum on the right-hand side of (4) contains a finite number of terms and each term tends to zero as $n \to \infty$, (4) implies (2). It then remains for us to check condition (3). A direct calculation shows that

$V\bar{Y}_k = k^{1+\delta}[1 - \exp(-c\sqrt{2}/k^{(1+\delta)/2})] - \sqrt{2}k^{(1+\delta)/2} \exp(-c\sqrt{2}/k^{(1+\delta)/2}).$

Using a Taylor expansion we find an expression for $V\bar{Y}_k$ with a remainder $t_k$ which includes higher-order terms (their exact expressions are not important). From this we can easily derive (3). Therefore according to the Feller theorem cited above the sequence $\{Y_n\}$ satisfies the WLLN. Again, as in case (i), the Markov condition is not satisfied.
15.2. The Kolmogorov condition for arbitrary random variables is sufficient but not necessary for the strong law of large numbers

Consider the sequence $\{X_n, n \ge 1\}$ of independent r.v.s where

$P[X_n = \pm 1] = \frac{1}{2}(1 - 2^{-n}), \quad P[X_n = \pm 2^n] = 2^{-(n+1)}.$

Obviously $EX_n = 0$, $\sigma_n^2 = VX_n = 1 - 2^{-n} + 2^n$ so that $\sum_{n=1}^{\infty} \sigma_n^2/n^2$ diverges. Thus the Kolmogorov condition, $\sum_{n=1}^{\infty} \sigma_n^2/n^2 < \infty$, is not satisfied. Nevertheless we shall show that $\{X_n\}$ obeys the SLLN. Recall that two sequences of r.v.s $\{\xi_n\}$ and $\{\eta_n\}$ are said to be equivalent in the sense of Khintchine if $\sum_{n=1}^{\infty} P[\xi_n \ne \eta_n] < \infty$. According to Revesz (1967) two such sequences simultaneously satisfy or do not satisfy the SLLN.

Introduce the sequence $\{Y_n, n \ge 1\}$ where

$Y_n = X_n$ if $|X_n| = 1$ and $Y_n = 0$ if $|X_n| = 2^n$.

Clearly $EX_n = EY_n$ and $P[X_n \ne Y_n] = 2^{-n}$, $n \in \mathbb{N}$. Since the series $\sum_{n=1}^{\infty} P[X_n \ne Y_n] = \sum_{n=1}^{\infty} 2^{-n}$ is convergent, the sequences $\{X_n\}$ and $\{Y_n\}$ are equivalent in the sense of Khintchine. Further, $VY_n = 1 - 2^{-n}$ so that $\sum_{n=1}^{\infty} VY_n/n^2 < \infty$. Thus the Kolmogorov condition is satisfied for the sequence $\{Y_n\}$ and therefore $\{Y_n\}$ obeys the SLLN. By the above result it follows that the sequence $\{X_n\}$ also obeys the SLLN. Thus we have shown that the Kolmogorov condition for arbitrarily distributed r.v.s is not necessary for the validity of the SLLN.
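The quantities used in this example can be tabulated exactly. The sketch below assumes the laws $P[X_n = \pm 1] = \frac{1}{2}(1 - 2^{-n})$, $P[X_n = \pm 2^n] = 2^{-(n+1)}$ and $Y_n = X_n 1_{[|X_n| = 1]}$; these are our reading, chosen to agree with the stated values $VX_n = 1 - 2^{-n} + 2^n$, $VY_n = 1 - 2^{-n}$ and $P[X_n \ne Y_n] = 2^{-n}$. The function names are ours.

```python
from fractions import Fraction


def var_x(n):
    """V X_n = (1 - 2^-n) * 1 + 2^-n * (2^n)^2 under the assumed law of X_n."""
    return (1 - Fraction(1, 2 ** n)) + Fraction(2 ** (2 * n), 2 ** n)


def var_y(n):
    """V Y_n for the truncated variable Y_n (assumed equal to X_n when |X_n| = 1, else 0)."""
    return 1 - Fraction(1, 2 ** n)


def prob_differ(n):
    """P[X_n != Y_n] = P[|X_n| = 2^n]."""
    return Fraction(1, 2 ** n)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        # Terms of the Kolmogorov series for {X_n} blow up; those for {Y_n} are below 1/n^2.
        print(n, float(var_x(n)) / n ** 2, float(var_y(n)) / n ** 2)
    print(float(sum(prob_differ(n) for n in range(1, 31))))
```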
15.3. A sequence of independent discrete random variables satisfying the weak but not the strong law of large numbers
Let $\{X_n, n \ge 2\}$ be independent r.v.s such that

$P[X_n = \pm n] = 1/(2n \log n), \quad P[X_n = 0] = 1 - 1/(n \log n), \quad n = 2, 3, \ldots.$

Consider the events $A_n = \{|X_n| \ge n\}$, $n \ge 2$. Then

$P(A_n) = 1/(n \log n).$

The divergence of the series $\sum_{n=2}^{\infty} P(A_n)$, the mutual independence of the variables $X_n$ and the Borel-Cantelli lemma allow us to conclude that the event $[A_n \text{ i.o.}]$ has probability 1. In other words,

$P[|X_n| \ge n \text{ i.o.}] = 1 \;\Rightarrow\; P\left[\lim_{n \to \infty} S_n/n \ne 0\right] = 1.$

Therefore the sequence $\{X_n, n \ge 2\}$ cannot satisfy the SLLN. Now we shall show that $\{X_n\}$ obeys the WLLN. Obviously $VX_k = k/\log k$. Since the function $x/\log x$ has a local minimum at $x = e$ and $\sum_{k=3}^{n} k/\log k$ is a lower Riemann sum for the integral $\int_3^{n+1} (x/\log x)\,dx$, we easily obtain that

$\frac{1}{n^2} \sum_{k=2}^{n} VX_k \le \frac{1}{n^2}\left[\frac{2}{\log 2} + \int_3^{n+1} \frac{x}{\log x}\,dx\right] < \frac{2}{n^2 \log 2} + \frac{(n - 2)(n + 1)}{n^2 \log n} \to 0$ as $n \to \infty$.

Thus the Markov condition for $\{X_n\}$ is satisfied and therefore the sequence $\{X_n\}$ obeys the WLLN.
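Both halves of this example can be observed numerically. The sketch below evaluates the Markov ratio $n^{-2}\sum_{k=2}^{n} k/\log k$, which decreases slowly to zero, and the partial sums of $\sum 1/(k \log k)$, which increase without bound (at the rate of $\log\log n$). The function names are ours, and floating-point sums are used only for illustration.

```python
import math


def markov_ratio(n):
    """(1/n^2) * sum_{k=2}^{n} V X_k with V X_k = k / log k."""
    return sum(k / math.log(k) for k in range(2, n + 1)) / n ** 2


def bc_partial_sum(n):
    """sum_{k=2}^{n} P(A_k) with P(A_k) = 1 / (k * log k)."""
    return sum(1 / (k * math.log(k)) for k in range(2, n + 1))


if __name__ == "__main__":
    for n in (10, 1000, 100_000):
        print(n, markov_ratio(n), bc_partial_sum(n))
```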
Finally, let us indicate another sequence whose properties are close to those of $\{X_n\}$. Let $\{Y_n, n \ge 2\}$ be a sequence of i.i.d. r.v.s such that

$P[Y_1 = \pm n] = c/(n^2 \log n), \quad n = 2, 3, \ldots, \qquad c = \frac{1}{2}\left(\sum_{n=2}^{\infty} (n^2 \log n)^{-1}\right)^{-1}.$

It can be shown that this sequence obeys the WLLN but does not satisfy the SLLN. The easiest way to do this is to use ch.f.s showing, for example, that $\psi_n(t) \to 1$ as $n \to \infty$ where $\psi_n$ is the ch.f. of $\frac{1}{n}(Y_2 + \dots + Y_{n+1})$.
15.4. A sequence of independent absolutely continuous random variables satisfying the weak but not the strong law of large numbers
Let $\{X_n, n \ge 1\}$ be independent r.v.s where the density of $X_n$ is given by

$f_n(x) = (\sqrt{2}\sigma_n)^{-1} \exp(-\sqrt{2}|x|/\sigma_n), \quad x \in \mathbb{R}^1.$

It is easy to show that $VX_n = \sigma_n^2$. Let us define $\sigma_n^2$ in the following special way: $\sigma_n^2 = 2n^2/(\log n)^2$, $n \ge 2$. First we shall establish that $\{X_n\}$ does not obey the SLLN. In fact, if $A_n = \{|X_n| \ge n\}$ then

$P(A_n) = 2(\sqrt{2}\sigma_n)^{-1} \int_n^{\infty} \exp(-\sqrt{2}x/\sigma_n)\,dx = \exp(-\sqrt{2}n/\sigma_n) = \exp(-\log n) = \frac{1}{n}.$

Since $\sum_{n=2}^{\infty} 1/n = \infty$, we have $\sum_{n=2}^{\infty} P(A_n) = \infty$. Using similar reasoning to that in Example 15.3, we conclude that $\{X_n\}$ does not obey the SLLN. Our purpose now is to show that $\{X_n\}$ satisfies the WLLN. However, one can check that the Markov condition for $\{X_n\}$ does not hold. Then the proof uses the Feller theorem cited in Example 15.1. Indeed, we can see that

(1)  $P\left[\frac{1}{n}|X_k| > \varepsilon\right] = \exp(-n\varepsilon \log k/k), \quad k = 2, 3, \ldots, n$

and clearly this probability can be made arbitrarily small for large $n$. For any truncation level $c > 0$ and $\varepsilon > 0$ we introduce the variables $\bar{X}_k$ where $\bar{X}_k = X_k$ if $|X_k| \le c$ and $\bar{X}_k = c$ if $|X_k| > c$. Using (1) we obtain

$\sum_{k=2}^{n} P\left[\frac{1}{n}|X_k| > \varepsilon\right] \to 0$ as $n \to \infty$.

Similarly to Example 15.1 we can verify that

$(1/n^2) \sum_{k=2}^{n} V\bar{X}_k \to 0$ as $n \to \infty$.

Thus, by the Feller theorem, the sequence $\{X_n\}$ satisfies the WLLN.
15.5. The Kolmogorov condition $\sum_{n=1}^{\infty} \sigma_n^2/n^2 < \infty$ is the best possible condition for the strong law of large numbers
Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s with finite variances $\sigma_n^2$ and $\{b_n, n \ge 1\}$ be a non-decreasing sequence of positive constants with $b_n \to \infty$. We say that the sequence $\{X_n\}$ obeys the SLLN with respect to $\{b_n\}$ if $b_n^{-1}S_n - b_n^{-1}ES_n \xrightarrow{a.s.} 0$ as $n \to \infty$ where $S_n = X_1 + \dots + X_n$. According to the Kolmogorov theorem the condition $\sum_{n=1}^{\infty} \sigma_n^2/b_n^2 < \infty$ implies that $\{X_n\}$ satisfies the SLLN with respect to $\{b_n\}$. Note that in the classical Kolmogorov theorem $b_n = n$, $n \ge 1$. It is of general interest to understand the importance of the condition $\sum_{n=1}^{\infty} \sigma_n^2/b_n^2 < \infty$ in the SLLN. We shall now show that this condition is the best possible in the following sense. For simplicity we confine ourselves to the case $b_n = n$, $n \ge 1$. So, let $\{\sigma_n^2\}$ be a sequence of positive numbers with

(1)  $\sum_{n=1}^{\infty} \sigma_n^2/n^2 = \infty.$

We aim to construct a sequence $\{Y_n, n \ge 1\}$ of independent r.v.s with $VY_n = \sigma_n^2$ such that $\{Y_n\}$ does not satisfy the SLLN. Let us describe the sequence $\{Y_n\}$. If $\sigma_n^2/n^2 \le 1$ then the r.v. $Y_n$ takes the values $(-n)$, $0$ and $n$ with probabilities $\sigma_n^2/(2n^2)$, $1 - \sigma_n^2/n^2$ and $\sigma_n^2/(2n^2)$ respectively. If $\sigma_n^2/n^2 > 1$ then $Y_n = \pm\sigma_n$ with probability $\frac{1}{2}$ each. Clearly $EY_n = 0$, $VY_n = \sigma_n^2$. For any $\varepsilon$ with $0 < \varepsilon < 1$ we have

$P[|Y_n|/n > \varepsilon] = P[Y_n \ne 0] = \begin{cases} \sigma_n^2/n^2, & \text{if } \sigma_n^2/n^2 \le 1 \\ 1, & \text{if } \sigma_n^2/n^2 > 1. \end{cases}$

Suppose the sequence $\{Y_n\}$ does obey the SLLN. Then necessarily $Y_n/n \xrightarrow{a.s.} 0$ as $n \to \infty$. From (1) it is easy to derive that $\sum_{n=1}^{\infty} P[|Y_n| > \varepsilon n] = \infty$. Since the $Y_n$ are independent, by the Borel-Cantelli lemma the events $[|Y_n| > \varepsilon n]$ occur infinitely often, so the convergence $Y_n/n \xrightarrow{a.s.} 0$ as $n \to \infty$ is not possible. Therefore $\{Y_n\}$ does not obey the SLLN.
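The construction of $Y_n$ is easy to encode and to check. The sketch below builds the law of $Y_n$ for a prescribed variance and verifies numerically that the mean is $0$ and the variance is as prescribed; the function names are ours and floating-point arithmetic is used for simplicity.

```python
import math


def law_y(n, var):
    """Law of Y_n as a dict value -> probability, for a prescribed variance var = sigma_n^2."""
    ratio = var / n ** 2
    if ratio <= 1:
        return {-n: ratio / 2, 0: 1 - ratio, n: ratio / 2}
    s = math.sqrt(var)
    return {-s: 0.5, s: 0.5}


def moments(law):
    """Return (mean, variance) of a discrete law given as value -> probability."""
    mean = sum(v * p for v, p in law.items())
    second = sum(v * v * p for v, p in law.items())
    return mean, second - mean ** 2


def prob_nonzero(law):
    """P[Y_n != 0], which equals min(sigma_n^2 / n^2, 1)."""
    return sum(p for v, p in law.items() if v != 0)


if __name__ == "__main__":
    for n, var in ((10, 30.0), (3, 20.0)):
        law = law_y(n, var)
        print(n, var, moments(law), prob_nonzero(law))
```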
15.6. More on the strong law of large numbers without the Kolmogorov condition
Consider the sequence $\{X_n, n \ge 2\}$ of independent r.v.s where

$$P[X_n = \pm (n/\log n)^{1/2}] = \tfrac12. \tag{1}$$

Here $\sigma_n^2 = VX_n = n/\log n$, so $\sigma_n^2/n^2 = 1/(n \log n)$ and it is easy to check that the Kolmogorov condition $\sum_{n=2}^{\infty} \sigma_n^2/n^2 < \infty$ is not satisfied. However, Example 15.2 shows that the SLLN can also hold without this condition. In our specific case the most suitable result which can be applied is the following theorem (see Revesz 1967): let $\{\xi_n\}$ be independent r.v.s with $E\xi_n = 0$ and let for some $r \ge 1$

$$E[|\xi_n|^{2r}] < \infty \quad \text{and} \quad \sum_{n=1}^{\infty} E[|\xi_n|^{2r}]/n^{r+1} < \infty.$$

Then the sequence $\{\xi_n\}$ satisfies the SLLN. Clearly for the sequence $\{X_n\}$ defined by (1) it is sufficient to take $r = 2$ and verify directly the conditions in the Revesz theorem: $E[|X_n|^4]/n^3 = 1/(n \log^2 n)$, which is the general term of a convergent series. Thus we arrive at the conclusion that $\{X_n\}$ obeys the SLLN.
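The two series involved behave very differently, which a numerical sketch can illustrate (it is not a proof). The helper names `kolmogorov_partial_sum` and `revesz_partial_sum` are hypothetical; they compute partial sums of $\sigma_n^2/n^2 = 1/(n \log n)$ and of $E[|X_n|^{2r}]/n^{r+1}$ for the sequence defined by (1).

```python
import math

def kolmogorov_partial_sum(N):
    """Partial sum of sigma_n^2 / n^2 = 1 / (n log n) for n = 2..N."""
    return sum(1.0 / (n * math.log(n)) for n in range(2, N + 1))

def revesz_partial_sum(N, r=2):
    """Partial sum of E|X_n|^(2r) / n^(r+1), where |X_n| = (n / log n)^(1/2)."""
    return sum((n / math.log(n)) ** r / n ** (r + 1) for n in range(2, N + 1))
```

The first partial sum keeps growing roughly like $\log\log N$, while for $r = 2$ the increments of the second one between $N$ and any larger index stay below $1/\log N$.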
15.7.
Two 'near' sequences of random variables such that the strong law of large numbers holds for one of them and does not hold for the other
Consider two sequences of r.v.s, $\{X_n, n \ge 2\}$ and $\{Y_n, n \ge 2\}$ where

$$P[X_n = n/\log n] = P[X_n = -n/\log n] = \log n/(2n), \quad P[X_n = 0] = 1 - \log n/n,$$

$$P[Y_n = \beta n] = P[Y_n = -\beta n] = 1/(2\beta^2 n \log n), \quad P[Y_n = 0] = 1 - 1/(\beta^2 n \log n)$$

with $0 < \beta < 1$. Obviously $X_n$ and $Y_n$ are symmetric r.v.s with

$$EX_n = EY_n = 0, \quad VX_n = VY_n = n/\log n$$

and both satisfy the inequalities

$$|X_n| < n \ \text{a.s.}, \quad |Y_n| < n \ \text{a.s.}, \quad n = 3, 4, \ldots.$$

We are interested to know whether or not these sequences satisfy the SLLN. We shall show that $\{X_n\}$ obeys the SLLN while $\{Y_n\}$ does not. For this purpose we introduce $H_r$ where

$$H_r = 2^{-2r} \sum_{n=2^r+1}^{2^{r+1}} VX_n.$$

For any choice of $\varepsilon > 0$ we have $\exp(-\varepsilon/H_r) < \exp(-\varepsilon r \log 2/2)$ implying

$$\sum_{r=1}^{\infty} \exp(-\varepsilon/H_r) < \infty.$$

This condition is sufficient to conclude that the sequence $\{X_n\}$ obeys the SLLN (see Prohorov 1950). Suppose now that $\{Y_n\}$ also satisfies the SLLN. Then necessarily

$$P[Y_n/n \to 0] = 1.$$

It can easily be seen from the definition of $\{Y_n\}$ that

$$\sum_{n=2}^{\infty} P[|Y_n|/n = \beta] = \infty.$$

Then by the Borel-Cantelli lemma, the events $[|Y_n| = n\beta]$ occur infinitely often. This, however, contradicts the above relation, namely that $Y_n/n \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$, and therefore the sequence $\{Y_n\}$ does not obey the SLLN.
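The inequality $\exp(-\varepsilon/H_r) < \exp(-\varepsilon r \log 2/2)$ is equivalent to $H_r < 2/(r \log 2)$, which follows because each of the $2^r$ terms $n/\log n$ in the block is smaller than $2^{r+1}/(r \log 2)$. A small sketch, with the illustrative names `var_x` and `h`, evaluates $H_r$ directly so that this bound can be inspected numerically.

```python
import math

def var_x(n):
    """V X_n = n / log n for the sequence {X_n} of this example."""
    return n / math.log(n)

def h(r):
    """H_r = 2^(-2r) * sum of V X_n over n = 2^r + 1, ..., 2^(r+1)."""
    block = range(2**r + 1, 2 ** (r + 1) + 1)
    return sum(var_x(n) for n in block) / 4**r

def bound(r):
    """Upper bound 2 / (r log 2) used in the text."""
    return 2.0 / (r * math.log(2))
```

Comparing `h(r)` with `bound(r)` for the first few values of `r` shows how $H_r$ decays like $1/r$, which is what makes the series of exponentials converge.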
15.8.
The law of large numbers does not hold if almost sure convergence is replaced by complete convergence
Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s, $F(x)$, $x \in \mathbb{R}^1$ their common d.f. and $EX_1 = \int_{-\infty}^{\infty} x\,dF(x) = 0$. Suppose that $\{X_n\}$ satisfies the SLLN. Then

$$Y_n := \frac{1}{n}(X_1 + \dots + X_n) \xrightarrow{\text{a.s.}} 0 \quad \text{as } n \to \infty. \tag{1}$$

It is natural to ask whether the conditions for the SLLN could guarantee that in (1) almost sure convergence can be replaced by a stronger kind of convergence, in particular by complete convergence (see Example 14.14). Under the conditions

$$\int_{-\infty}^{\infty} x\,dF(x) = 0, \quad \sigma^2 = \int_{-\infty}^{\infty} x^2\,dF(x) < \infty \tag{2}$$

Hsu and Robbins (1947) have shown the convergence of the series $\sum_{n=1}^{\infty} P[|Y_n| > \varepsilon]$ for any $\varepsilon > 0$. Therefore if condition (2) is satisfied, the sequence $\{Y_n\}$ converges completely. Thus instead of (1) we have $Y_n \xrightarrow{c} 0$ as $n \to \infty$. Suppose now that condition (2) is relaxed a little as follows:

$$\int_{-\infty}^{\infty} x\,dF(x) = 0, \quad \int_{-\infty}^{\infty} |x|^{\alpha}\,dF(x) < \infty, \quad \int_{-\infty}^{\infty} x^2\,dF(x) = \infty \tag{3}$$

where $\alpha$ = constant and $\tfrac12(1 + \sqrt5) \le \alpha < 2$. Then the sequence $\{X_n\}$ satisfies the SLLN. However, the series $\sum_{n=1}^{\infty} P[|Y_n| > \varepsilon]$ diverges for every $\varepsilon > 0$ and hence the relation $Y_n \xrightarrow{c} 0$ fails to hold. Therefore there are sequences of i.i.d. r.v.s such that the corresponding arithmetic means $\{Y_n\}$ converge almost surely but not completely. Finally, it remains for us to indicate a particular case when conditions (3) are satisfied. For example, take $X_1$ to be absolutely continuous with density $f(x) = |x|^{-3}$ for $|x| \ge 1$ and $f(x) = 0$ otherwise.
15.9.
The uniform boundedness of the first moments of a tight sequence of random variables is not sufficient for the strong law of large numbers
Recall firstly that the sequence $\{X_n, n \ge 1\}$ of real-valued r.v.s is said to be tight if for each $\varepsilon > 0$ there exists a compact interval $K_\varepsilon \subset \mathbb{R}^1$ such that $P[X_n \in K_\varepsilon] > 1 - \varepsilon$ for all $n$. Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s. According to a result derived by Taylor and Wei (1979), if $\{X_n\}$ is a tight sequence and the $r$th moments with $r > 1$ are uniformly bounded ($E[|X_n|^r] < M$ = constant $< \infty$) then $\{X_n\}$ satisfies the SLLN. Is it then possible to weaken the assumption $r > 1$, replacing it by $r = 1$ in the above result? By a specific example we show that the answer to this question is negative. Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s such that

$$P[X_n = \pm n] = \tfrac12 [n \log(n + 2)]^{-1}, \quad P[X_n = 0] = 1 - [n \log(n + 2)]^{-1}.$$

Then $EX_n = 0$, $E[|X_n|] = 1/\log(n + 2)$. So $E[|X_n|]$ are uniformly bounded, and indeed $E[|X_n|] \to 0$. Taking into account the relation $P[|X_n| \ge n] = 1/[n \log(n + 2)]$, we conclude that the sequence $\{X_n\}$ is tight. Further, $\sum_{n=1}^{\infty} P[|X_n| \ge n] = \infty$ and the Borel-Cantelli lemma implies that the event $[|X_n| \ge n \text{ i.o.}]$ has probability 1. However, this means that the SLLN cannot be valid for the sequence $\{X_n\}$.
15.10.
The arithmetic means of a random sequence can converge in probability even if the strong law of large numbers fails to hold
Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s such that $E[|X_1|] = \infty$. According to the Kolmogorov theorem this sequence does not satisfy the SLLN, i.e. $Y_n = S_n/n$, where $S_n = X_1 + \dots + X_n$, is not a.s. convergent as $n \to \infty$. However we can still ask about the convergence of $Y_n$, the arithmetic means, in a weaker sense, e.g. in probability. This possibility is considered in the next example. Consider the sequence $\{\xi_n, n \ge 1\}$ of i.i.d. r.v.s where $P[\xi_1 = (-1)^{k-1} k] = 6/(\pi^2 k^2)$, $k = 1, 2, \ldots$. The divergence of the harmonic series implies that $E[|\xi_1|] = \infty$. Hence $\{\xi_n\}$ does not satisfy the SLLN. Let us show now that the arithmetic means $(\xi_1 + \dots + \xi_n)/n$ converge in probability to a fixed number as $n \to \infty$. Our reasoning is based on the following general and very useful result (see e.g. Feller 1971 or Shiryaev 1995): if the ch.f. $\psi(t) = E[e^{it\xi_1}]$, $t \in \mathbb{R}^1$ of $\xi_1$ is differentiable at $t = 0$ and $\psi'(0) = ic$, where $i = \sqrt{-1}$ and $c \in \mathbb{R}^1$, then $(\xi_1 + \dots + \xi_n)/n \xrightarrow{P} c$ as $n \to \infty$. Thus we first have to find the ch.f. $\psi$ of the r.v. $\xi_1$ defined above. Since $\xi_1$ takes the value $k$ for odd $k$ and $-k$ for even $k$, we have

$$\psi(t) = \frac{6}{\pi^2} \sum_{j=1}^{\infty} \frac{(e^{it})^{2j-1}}{(2j-1)^2} + \frac{6}{\pi^2} \sum_{k=1}^{\infty} \frac{(e^{-it})^{2k}}{(2k)^2}.$$

If we introduce the functions

$$h_1(u) = \sum_{j=1}^{\infty} \frac{u^{2j-1}}{(2j-1)^2} \quad \text{and} \quad h_2(u) = \sum_{k=1}^{\infty} \frac{u^{2k}}{(2k)^2}, \quad |u| \le 1,$$

we easily find that they both are differentiable for $0 < |u| < 1$ and

$$h_1'(u) = \frac{1}{2u} \log \frac{1+u}{1-u}, \quad h_2'(u) = -\frac{1}{2u} \log(1 - u^2).$$

Hence $h_1'(u) - h_2'(u) = (1/u) \log(1 + u)$ which implies that $\psi'(0)$ exists and

$$\psi'(0) = i\,\frac{6}{\pi^2} [h_1'(1) - h_2'(1)] = i\,\frac{6 \log 2}{\pi^2},$$

where $h_1'(1) - h_2'(1)$ is understood as the limit of $h_1'(u) - h_2'(u)$ as $u \to 1$. Thus we arrive at the final conclusion that

$$\frac{1}{n}(\xi_1 + \dots + \xi_n) \xrightarrow{P} \frac{6 \log 2}{\pi^2} \quad \text{as } n \to \infty.$$

Note that in this case the sequence $\{\xi_n\}$ satisfies the so-called generalized law of large numbers (see Example 15.12).
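The limit $6 \log 2/\pi^2$ is the value of the conditionally convergent series $\sum_k (-1)^{k-1} k \cdot 6/(\pi^2 k^2)$, while the series of absolute values diverges. The sketch below is only a numerical illustration of these two facts; `truncated_mean` and `truncated_abs_mean` are hypothetical helper names, and truncating at the first `K` atoms is an assumption made for the illustration.

```python
import math

LIMIT = 6 * math.log(2) / math.pi**2

def truncated_mean(K):
    """Sum of value * probability over the first K atoms of xi_1."""
    return 6 / math.pi**2 * sum((-1) ** (k - 1) / k for k in range(1, K + 1))

def truncated_abs_mean(K):
    """Sum of |value| * probability over the first K atoms of xi_1."""
    return 6 / math.pi**2 * sum(1 / k for k in range(1, K + 1))
```

As `K` grows, `truncated_mean(K)` settles near `LIMIT`, whereas `truncated_abs_mean(K)` grows without bound like a multiple of $\log K$.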
15.11.
The weighted averages of a sequence of random variables can converge even if the law of large numbers does not hold
Let $\{X_k, k \ge 1\}$ be a sequence of non-degenerate i.i.d. r.v.s, $\{c_k, k \ge 1\}$ a sequence of positive numbers and let $S_n = \sum_{k=1}^{n} c_k X_k$ and $C_n = \sum_{k=1}^{n} c_k$, $n \ge 1$. The ratios $S_n/C_n$, $n \ge 1$ are called weighted averages generated by $\{X_k, c_k, k \ge 1\}$. We say that the weak (strong) law holds for the weighted averages of $\{X_k, c_k, k \ge 1\}$ iff $S_n/C_n$ converges in probability (a.s.) to some constant as $n \to \infty$. Without any loss of generality we can suppose that $EX_k = 0$ for all $k \ge 1$. We now want to see whether $S_n/C_n$ converges to 0 as $n \to \infty$. Obviously if all $c_k = 1$ then $S_n = X_1 + \dots + X_n$, $C_n = n$ and we are in the framework of the classical laws of large numbers. Our aim now is to show that there is a sequence of i.i.d. r.v.s $\{X_k\}$ and a sequence of weights $\{c_n\}$ such that

$$S_n/C_n \xrightarrow{\text{a.s.}} 0 \quad \text{but} \quad \frac{1}{n}(X_1 + \dots + X_n) \overset{\text{a.s.}}{\nrightarrow} 0 \quad \text{as } n \to \infty.$$

In other words, the strong law holds for weighted averages but the classical SLLN is not valid. An analogous conclusion can be drawn about the weak law for weighted averages and the classical WLLN.

Firstly, consider the strong law. By assumption the variables $X_n$ are identically distributed and in this case the SLLN does not hold iff $E[|X_1|] = \infty$. Further, we need the following result (see Wright et al 1977): let $g(x)$, $x \in \mathbb{R}_+$ be a non-negative measurable function with $g(x) \to \infty$ as $x \to \infty$. Then there exists a sequence $\{X_k, c_k, k \ge 1\}$ whose weighted averages $S_n/C_n$, $n \ge 1$, satisfy the strong law and

$$E[g(X_1^+)] = E[g(X_1^-)] = \infty.$$

Actually this result contains all that we wanted, namely a sequence of i.i.d. r.v.s $\{X_k, k \ge 1\}$ with $E|X_1| = \infty$ and a sequence of weights $\{c_k, k \ge 1\}$ such that $S_n/C_n \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$, although the sequence $\{X_k\}$ does not obey the classical SLLN. A similar conclusion can be obtained by using a result of Chow and Teicher (1971) which states that there is a r.v. $X$ with $E|X| = \infty$ such that the sequence $\{X_k\}$ of independent copies of $X$ together with a suitable sequence of weights $\{c_k\}$ generates weighted averages $S_n/C_n$ which converge a.s. as $n \to \infty$. Obviously it is impossible in this case to take $c_k = 1$ since the classical SLLN for $\{X_k\}$ is not satisfied. In this connection Chow and Teicher (1971) give two specific examples. The first one arises in the so-called St Petersburg game ($X = 2^k$ with probability $2^{-k}$, $k = 1, 2, \ldots$), while in the second case $X$ has a Cauchy distribution.

It is of general interest to compare some consequences of the results cited above. In particular, let us look at the value of $E[|X|^r]$ for different $r$. Both examples considered by Chow and Teicher are such that

$$\lim_{x \to \infty} x^r P[|X| \ge x] = 0 \quad \text{for all } 0 < r < 1$$

which implies that $E[|X|^r] < \infty$ for all $0 < r < 1$. In the result of Wright et al (1977) we can take the function $g(x) = (\log x)^+$ and choose a sequence $\{X_k\}$ of i.i.d. r.v.s such that $E[|X|^r] = \infty$ for all $r > 0$ and find weights $\{c_k\}$ such that $S_n/C_n \xrightarrow{\text{a.s.}}$ constant. Clearly the SLLN fails to hold.

Consider now the weak law. It is easy to see that if the weak law holds for the weighted averages of $\{X_k, c_k\}$ then $\{c_k\}$ must satisfy the condition

$$C_n \to \infty, \quad c_n/C_n \to 0 \quad \text{as } n \to \infty. \tag{1}$$

According to Jamison et al (1965), the weak law holds for any sequence of weights $\{c_k\}$ satisfying (1) if $\int_{|x| \le T} x\,dF(x) \to a$ = constant as $T \to \infty$ and

$$\lim_{T \to \infty} T\,P[|X| \ge T] = 0 \tag{2}$$

where $F$ is the d.f. of $X$. This result and a statement by Loeve (1978) allow us to conclude that if $X$ has a fixed distribution (we consider only the case of i.i.d.) then the weak law holds for $\{X_k, c_k\}$ for any $\{c_k\}$ iff $\{X_k\}$ satisfies the classical WLLN (when all $c_k = 1$). However, using the result of Wright et al with $g(x) = x^r$ and $0 < r < 1$, one can obtain a sequence $\{X_k, c_k\}$ for which the weak law holds but condition (2) is not satisfied. In such a case the weak law does not hold for the sequence $\{X_k, 1\}$. Obviously this means that the sequence $\{X_k\}$ does not obey the classical WLLN in spite of the fact that for some weights $\{c_k\}$ the weighted averages $S_n/C_n$ converge in probability.
15.12.
The law of large numbers with a special choice of norming constants
Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s and $S_n = X_1 + \dots + X_n$. If for some number sequences $\{a_n, n \ge 1\}$ and $\{b_n, n \ge 1\}$, with all $b_n > 0$, the following relation holds:

$$(S_n - a_n)/b_n \to 0 \quad \text{as } n \to \infty, \tag{1}$$

then we say that $\{X_n\}$ satisfies a generalized law of large numbers (LLN). This law is weak or strong depending on the type of convergence in (1). If $a_n = ES_n$ and $b_n = n$ we obtain the scheme of the classical LLN. There are sequences of r.v.s for which the classical LLN does not hold, but for some choice of $\{a_n\}$ and $\{b_n\}$ the generalized LLN holds. Let us consider an example. In the well known St Petersburg game (also mentioned in Example 15.11), a player wins $2^k$ roubles if heads first appears at the $k$th toss of a symmetric coin, $k = 1, 2, \ldots$. Thus we get a sequence of independent r.v.s $\{X_n, n \ge 1\}$, the gains in successive games, with common distribution $P[X_n = 2^k] = 2^{-k}$, $k = 1, 2, \ldots$. It is easy to check that $\{X_n\}$ does not obey the WLLN. However, we can hope that a relation like (1) will hold. Using game terminology, suppose that a player pays variable entrance fees with a cumulative fee $b_n = n \log n$ for the first $n$ games (here $\log$ denotes the logarithm to base 2). Then the game becomes 'fair' in the sense that

$$\lim_{n \to \infty} S_n/b_n = 1 \quad \text{in probability}. \tag{2}$$

It is natural to ask whether this game is 'fair' in a strong sense, that is, whether (2) is satisfied with probability 1. Actually we shall show that the St Petersburg game with $b_n = n \log n$ is 'fair' in a weak but not in a strong sense. In other words, it will be shown that $\{X_n\}$ obeys the weak but not the strong generalized LLN with $a_n = b_n = n \log n$, $n \ge 2$. The result that $S_n/b_n \xrightarrow{P} 1$ as $n \to \infty$ is left to the reader as a useful exercise. Further, it is easy to see that $P[X_n > c] \ge 1/c$ for any $c > 1$ and every $n \ge 2$. Hence for $c$ = constant $> 1$ and $n \ge 2$ we have

$$P[X_n > cb_n] \ge 1/(cb_n) = 1/(cn \log n) \quad \text{and} \quad \sum_{n=2}^{\infty} P[X_n > cb_n] = \infty.$$

This and the Borel-Cantelli lemma imply that $P[X_n/b_n > c \text{ i.o.}] = 1$. Thus

$$P[\limsup_{n \to \infty} X_n/b_n = \infty] = 1 \quad \text{and} \quad P[\limsup_{n \to \infty} S_n/b_n = \infty] = 1.$$

Therefore

$$P[\lim_{n \to \infty} S_n/b_n = 1] = 0$$

showing that (2) is satisfied for convergence in probability but not a.s.
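The tail bound $P[X_n > c] \ge 1/c$ can be verified exactly under the assumption, made explicit here, that a single gain $X$ has the St Petersburg distribution $P[X = 2^k] = 2^{-k}$, $k = 1, 2, \ldots$. The following sketch uses the illustrative names `tail_prob` and `lower_bound_partial_sum`; the latter accumulates the divergent series $\sum 1/(cn \log_2 n)$ that drives the Borel-Cantelli argument.

```python
import math

def tail_prob(c):
    """P[X > c] when P[X = 2**k] = 2**-k for k = 1, 2, ... (assumed distribution)."""
    k = 1
    while 2**k <= c:          # find the smallest k with 2**k > c
        k += 1
    # Geometric tail: sum of 2**-j over j >= k equals 2**-(k - 1).
    return 2.0 ** (-(k - 1))

def lower_bound_partial_sum(c, N):
    """Partial sum over n = 2..N of 1 / (c * n * log2(n))."""
    return sum(1.0 / (c * n * math.log2(n)) for n in range(2, N + 1))
```

For any `c > 1` the value `tail_prob(c)` is at least `1 / c`, with equality exactly when `c` is a power of 2.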
SECTION 16.
WEAK CONVERGENCE OF PROBABILITY MEASURES AND DISTRIBUTIONS
In Section 14 we introduced the notion of convergence in distribution and illustrated it by examples. In particular, we mentioned that this kind of convergence is close to so-called weak convergence. In this section we define weak convergence and clarify its relationship with other kinds of convergence.

Let $F_n$, $n \ge 1$, and $F$ be d.f.s over the real line $\mathbb{R}^1$. Denote by $P_n$ and $P$ the probability measures over $(\mathbb{R}^1, \mathcal{B}^1)$ generated by $F_n$ and $F$ respectively. Recall that $P_n$ and $P$ are determined uniquely by the relations $P_n(-\infty, x] = F_n(x)$ and $P(-\infty, x] = F(x)$, $x \in \mathbb{R}^1$. Since $F$ is continuous at the point $x$ iff $P(\{x\}) = 0$, convergence in distribution $F_n \xrightarrow{d} F$ means that $P_n(-\infty, x] \to P(-\infty, x]$ for every $x$ such that $P(\{x\}) = 0$.

Let us consider a more general situation. For any Borel set $A$ in $\mathbb{R}^1$ (that is $A \in \mathcal{B}^1$), $\partial A$ will denote the boundary of $A$. Suppose $P$ and $P_n$, $n \ge 1$, are probability measures on $(\mathbb{R}^1, \mathcal{B}^1)$. We say that the sequence $\{P_n\}$ converges weakly to $P$ and write $P_n \xrightarrow{w} P$, if for any $A \in \mathcal{B}^1$ with $P(\partial A) = 0$ we have $P_n(A) \to P(A)$ as $n \to \infty$.

Now we formulate the following fundamental result.

Theorem 1. The following statements are equivalent:
(a) $P_n \xrightarrow{w} P$;
(b) $\limsup_{n \to \infty} P_n(A) \le P(A)$ for any closed set $A \in \mathcal{B}^1$;
(c) $\liminf_{n \to \infty} P_n(A) \ge P(A)$ for any open set $A \in \mathcal{B}^1$;
(d) For every continuous and bounded function $g$ on $\mathbb{R}^1$ we have

$$\int_{\mathbb{R}^1} g(x)\,P_n(dx) \to \int_{\mathbb{R}^1} g(x)\,P(dx) \quad \text{as } n \to \infty.$$

Weak convergence can be studied in much more general situations, not just for probability measures defined on the real line $\mathbb{R}^1$. However, convergence in distribution treated in Section 14 is equivalent to weak convergence discussed above. If we work with probability measures, the term weak convergence is preferable, while for d.f.s both terms, weak convergence and convergence in distribution, are used, as well as both notations, $F_n \xrightarrow{w} F$ and $F_n \xrightarrow{d} F$. We now formulate another fundamental result connecting the weak convergence of d.f.s with the pointwise convergence of the corresponding ch.f.s.

Theorem 2. (Continuity theorem.) Let $\{F_n, n \ge 1\}$ be a sequence of d.f.s on $\mathbb{R}^1$ and $\{\phi_n, n \ge 1\}$ be the corresponding sequence of the ch.f.s.
(a) If $F_n \xrightarrow{w} F$ where $F$ is a d.f., then $\phi_n(t) \to \phi(t)$, $t \in \mathbb{R}^1$ where $\phi$ is the ch.f. of $F$.
(b) If $\lim_{n \to \infty} \phi_n(t)$ exists for each $t \in \mathbb{R}^1$ and $\phi(t) := \lim_{n \to \infty} \phi_n(t)$ is continuous at $t = 0$, then $\phi$ is the ch.f. of a d.f. $F$ and $F_n \xrightarrow{w} F$ as $n \to \infty$.

We refer the reader to the books by Billingsley (1968, 1995), Chung (1974) or Shiryaev (1995) for a detailed proof of Theorems 1 and 2 and of several others. In this section we have included examples illustrating some aspects of the weak convergence of probability measures, distributions, and densities.
16.1.
Defining classes and classes defining convergence
Let $(\Omega, \mathcal{F})$ be a measurable space and $P$, $Q$ probabilities on this space. The class of events $\mathcal{A} \subset \mathcal{F}$ is said to be a defining class, if

$$P = Q \text{ on } \mathcal{A} \ \Rightarrow \ P = Q \text{ on } \mathcal{F}.$$

We say that $\mathcal{A} \subset \mathcal{F}$ is a class defining convergence if

$$P_n(A) \to P(A) \text{ for all sets } A \in \mathcal{A} \text{ with } P(\partial A) = 0 \ \Rightarrow \ P_n(A) \to P(A) \text{ for all sets } A \in \mathcal{F} \text{ with } P(\partial A) = 0,$$

that is, that $P_n \xrightarrow{w} P$ as $n \to \infty$. Let us illustrate the relationship between these two notions.

(i) Obviously every class defining convergence is a defining class. However, the converse is not always true. Let $\Omega = [0, 1)$, $\mathcal{F} = \mathcal{B}_{[0,1)}$ and $\mathcal{A} \subset \mathcal{F}$ be the field of all finite sums of disjoint subintervals of the type $[a, b)$ where $0 < a < b < 1$. Then $\mathcal{A}$ is a defining class but not a class defining convergence. To see this it is enough to consider the probabilities $P_n$ and $P$ concentrated at the points $1 - 1/n$ and $0$ respectively.
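Case (i) can be made concrete with a few lines of code. This is a sketch under the stated setup only: `p_n`, `p` and `integral_of_identity` are illustrative names for the unit masses at $1 - 1/n$ and $0$, and the test function $g(x) = x$ is a choice made here, being continuous and bounded on $[0, 1)$.

```python
def p_n(n, a, b):
    """P_n([a, b)) for the unit mass concentrated at 1 - 1/n."""
    return 1 if a <= 1 - 1 / n < b else 0

def p(a, b):
    """P([a, b)) for the unit mass concentrated at 0."""
    return 1 if a <= 0 < b else 0

def integral_of_identity(n=None):
    """Integral of g(x) = x with respect to P_n (or P when n is None)."""
    return 0.0 if n is None else 1 - 1 / n
```

For every interval $[a, b)$ with $0 < a < b < 1$ the values `p_n(n, a, b)` are eventually 0, which equals `p(a, b)`, so convergence holds on the class. Nevertheless the integrals of $g$ tend to 1 rather than to 0, so by statement (d) of Theorem 1 the sequence $\{P_n\}$ does not converge weakly to $P$.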
(ii) Let $\{P_n, n \ge 1\}$, $P$ and $Q$ be probabilities on $(\Omega, \mathcal{F})$ where $\Omega = \mathbb{R}^1$, $\mathcal{F} = \mathcal{B}^1$ and let $\mathcal{A} \subset \mathcal{F}$ be a defining class. Suppose two conditions are satisfied:

$$P_n(A) \to Q(A) \quad \text{as } n \to \infty \text{ for all } A \in \mathcal{A} \tag{1}$$

and

$$P_n \xrightarrow{w} P \quad \text{as } n \to \infty. \tag{2}$$

Since $\mathcal{A}$ is a defining class, from (1) and (2) we could expect that $P = Q$. However, this is not the case. Define $P_n$, $P$ and $Q$ as follows: [...]

It is easy to see that $P_n \xrightarrow{w} P$ as $n \to \infty$. Further, let $B$ consist of the points $0$, $1$, $1/n$, $1 + 1/n$ where $n \ge 1$. Denote by $\mathcal{A}$ the field containing all $A \in \mathcal{F}$ such that either $AB$ is finite and $0 \notin A$, or $A^c B$ is finite and $0 \notin A^c$. Then $\mathcal{A}$ is a defining class and $P_n(A) \to Q(A)$ as $n \to \infty$ for every $A \in \mathcal{A}$. So (1) and (2) are satisfied, but $P \ne Q$.

(iii) Let $C[0, 1]$ be the space of all continuous functions on $[0, 1]$ and $\mathcal{C}$ its Borel $\sigma$-field. For $k \in \mathbb{N}$ and $t_1, \ldots, t_k \in [0, 1]$ let $\pi_{t_1 \ldots t_k} : C[0, 1] \to \mathbb{R}^k$ map the point (function) $x \in C[0, 1]$ [...]

$$x_n(t) = \begin{cases} nt, & \text{if } 0 \le t \le 1/n \\ 2 - nt, & \text{if } 1/n < t \le 2/n \\ 0, & \text{if } 2/n < t \le 1. \end{cases}$$

Since $x_n$ does not converge to 0 uniformly in [...] $t_{\min} = \min\{t_j : t_j \ne 0\}$. [...]

This example shows that weak convergence in the space $C[0, 1]$ cannot be characterized by convergence for all finite-dimensional sets (as in $\mathbb{R}^\infty$).
16.2.
In the case of convergence in distribution, do the corresponding probability measures converge for all Borel sets?
Let $F_0(x)$, $F_n(x)$, $n \ge 1$, be d.f.s and $\mu_0$, $\mu_n$, $n \ge 1$, their probability measures on $(\mathbb{R}^1, \mathcal{B}^1)$. Suppose $F_n \xrightarrow{d} F_0$ as $n \to \infty$. It follows that $\mu_n((-\infty, x]) \to \mu_0((-\infty, x])$ for every $x \in \mathbb{R}^1$ which is a continuity point of $F_0$. However this is a convergence of $\mu_n$ to $\mu_0$ only for a special kind of sets, namely for infinite intervals which of course belong to $\mathcal{B}^1$. Thus we arrive at the following question. Is it true that $F_n \xrightarrow{d} F_0$ implies $\mu_n(B) \to \mu_0(B)$ for all $B \in \mathcal{B}^1$? In fact, the negative answer to this question is contained in the definition of convergence in distribution. Perhaps the easiest illustration is to take $F_n(x) = 1_{[1/n, \infty)}(x)$, $n \ge 1$, and $F_0(x) = 1_{[0, \infty)}(x)$, $x \in \mathbb{R}^1$. Then obviously $F_n(x) \to F_0(x)$ as $n \to \infty$ for all $x$ except the only point $x = 0$ where $F_0$ has a jump (of size 1). Thus in this completely degenerate case we obviously have $F_n \xrightarrow{d} F_0$ as $n \to \infty$. Taking, for example, the Borel set $(-\infty, 0]$, we find

$$\mu_n((-\infty, 0]) = F_n(0) = 0 \nrightarrow \mu_0((-\infty, 0]) = F_0(0) = 1 \quad \text{as } n \to \infty.$$

In the above case the limiting function $F_0$ is discontinuous. Let us assume now that $F_0$ is continuous everywhere on $\mathbb{R}^1$. Of course, if $F_n \xrightarrow{d} F_0$ and $B$ is a Borel set with $\mu_0(\partial B) = 0$, then $\mu_n(B) \to \mu_0(B)$ as $n \to \infty$. Let us illustrate what we can expect if $\mu_0(\partial B) \ne 0$. Consider the r.v.s $X_0$ and $X_n$, $n \ge 1$, where $X_0$ is uniformly distributed on the interval $(0, 1)$ and $X_n$ is defined by $P[X_n = k/n] = 1/n$ for $k = 0, 1, \ldots, n - 1$, $n \ge 1$ (uniform discrete distribution). If $F_0$ and $F_n$ are the d.f.s of $X_0$ and $X_n$ respectively, we have (with $[a]$ standing for the integer part of $a$)

$$F_0(x) = \begin{cases} 0, & \text{if } x \le 0 \\ x, & \text{if } 0 < x \le 1 \\ 1, & \text{if } x > 1, \end{cases} \qquad F_n(x) = \begin{cases} 0, & \text{if } x \le 0 \\ [nx]/n, & \text{if } 0 < x \le 1 \\ 1, & \text{if } x > 1. \end{cases}$$

Since $|[nx]/n - x| \le 1/n$ for any $x \in \mathbb{R}^1$ and any $n \ge 1$ we conclude that $X_n \xrightarrow{d} X_0$ as $n \to \infty$ (equivalently, that $F_n \xrightarrow{w} F_0$). Denote by $P_0$ and $P_n$ the measures on $\mathbb{R}^1$ induced by $F_0$ and $F_n$ and let $\mathbb{Q}$ be the set of all rational numbers in $\mathbb{R}^1$. Then $P_n(\mathbb{Q}) = 1$ for each $n$, $P_0(\mathbb{Q}) = 0$ and hence

$$\lim_{n \to \infty} P_n(\mathbb{Q}) = 1 \ne 0 = P_0(\mathbb{Q}).$$

In this example $P_0(\partial \mathbb{Q}) = 1$, that is the crucial condition $P_0(\partial B) = 0$ is not satisfied for $B = \mathbb{Q}$.

Note that the limiting function $F_0$ is not only continuous, it is absolutely continuous with a finite support (uniform distribution on $(0, 1)$). A conclusion similar to the above concerning the eventual convergence of $P_n$ to $P_0$ can also be derived for absolutely continuous $F_0$ having the whole real line $\mathbb{R}^1$ as its support. Consider a sequence of independent Bernoulli r.v.s $\xi_1, \xi_2, \ldots$: $P[\xi_i = 1] = p$, $P[\xi_i = 0] = q$, $q = 1 - p$, $0 < p < 1$. Denote by $G_n$ the d.f. of the quantity $S_n^* = (S_n - np)/(npq)^{1/2}$ where $S_n = \xi_1 + \dots + \xi_n$ and let $\theta$ be a r.v. distributed normally $N(0, 1)$. Then $S_n^* \xrightarrow{d} \theta$, or equivalently, $G_n \xrightarrow{w} \Phi$ as $n \to \infty$ ($\Phi$ is the standard normal d.f.). If $P_0$ and $P_n$ are the measures on $\mathbb{R}^1$ induced by $\Phi$ and $G_n$ and the Borel set $B$ is defined by $B = \bigcup_{n=1}^{\infty} \bigcup_{k=0}^{n} \{(k - np)/(npq)^{1/2}\}$, then obviously

$$P_n(B) = P[S_n^* \in B] = 1 \nrightarrow P_0(B) = P[\theta \in B] = 0 \quad \text{as } n \to \infty.$$

Once again this is due to the fact that the condition $P_0(\partial B) = 0$ is not satisfied.
16.3.
Weak convergence of probability measures need not be uniform
Let $F_0(x)$, $F_n(x)$, $x \in \mathbb{R}^1$, $n \ge 1$ be d.f.s and $\mu_0$, $\mu_n$, $n \ge 1$, their corresponding probability measures on $(\mathbb{R}^1, \mathcal{B}^1)$. Let us suppose that

$$\lim_{n \to \infty} \mu_n(B) = \mu_0(B) \quad \text{for all } B \in \mathcal{B}^1. \tag{1}$$

It is natural to ask if (1) holds uniformly in $B$. The example below shows that in general the answer is negative even for absolutely continuous d.f.s. Indeed, if $F_0$, $F_n$, $n \ge 1$, have densities $f_0$, $f_n$, $n \ge 1$, respectively, then (1) can be written in the form

$$\lim_{n \to \infty} \int_B f_n(x)\,dx = \int_B f_0(x)\,dx, \quad B \in \mathcal{B}^1. \tag{2}$$

Consider now the following functions:

$$f_n(x) = \begin{cases} 1 + \sin(2\pi nx), & \text{if } x \in [0, 1] \\ 0, & \text{if } x \notin [0, 1], \end{cases} \qquad f_0(x) = \begin{cases} 1, & \text{if } x \in [0, 1] \\ 0, & \text{if } x \notin [0, 1]. \end{cases}$$

It is easy to see that $f_0$ and $f_n$ for each $n \ge 1$ are density functions. Clearly $f_0$ is a uniform density on $[0, 1]$. If $F_0$ and $F_n$, $n \ge 1$, are the d.f.s with densities $f_0$ and $f_n$, $n \ge 1$, then $F_n \xrightarrow{w} F_0$ as $n \to \infty$. Moreover, applying the Riemann-Lebesgue theorem (see Rudin 1966; Royden 1968), we conclude that relation (2), and hence (1), is satisfied for this choice of $f_0$, $f_n$, $n \ge 1$, and for all $B \in \mathcal{B}^1$. Consider now the sets $B_n = \{x \in [0, 1] : f_n(x) \ge 1\}$, $n \ge 1$. Then

$$\int_{B_n} f_0(x)\,dx = \frac12, \qquad \int_{B_n} f_n(x)\,dx = \frac12 + \frac1\pi, \quad n = 1, 2, \ldots.$$

Therefore in general the convergence in (1) and (2) can be non-uniform.
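The two integrals over $B_n$ can be approximated numerically. The sketch below uses a plain midpoint rule; the function names `f_n` and `integrals_over_b_n` and the number of grid points are choices made for this illustration, not part of the original example.

```python
import math

def f_n(n, x):
    """Density f_n(x) = 1 + sin(2 pi n x) on [0, 1], zero elsewhere."""
    return 1.0 + math.sin(2 * math.pi * n * x) if 0.0 <= x <= 1.0 else 0.0

def integrals_over_b_n(n, steps=100_000):
    """Midpoint-rule approximations of the integrals of f_0 and f_n over B_n."""
    h = 1.0 / steps
    int_f0 = 0.0
    int_fn = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        fx = f_n(n, x)
        if fx >= 1.0:        # x belongs to B_n = {f_n >= 1}
            int_f0 += h      # f_0 equals 1 on [0, 1]
            int_fn += fx * h
    return int_f0, int_fn
```

For every `n` the gap between the two returned values stays close to $1/\pi$, so $\sup_B |\mu_n(B) - \mu_0(B)|$ does not tend to 0 even though (1) holds for each fixed $B$.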
16.4.
Two cases when the continuity theorem is not valid
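Let $F_0$, $F_n$, $n \ge 1$, be d.f.s with ch.f.s $\phi_0$, $\phi_n$, $n \ge 1$, respectively. The continuity theorem states that

$$F_n \xrightarrow{w} F_0 \iff \phi_n(t) \to \phi_0(t), \ t \in \mathbb{R}^1,$$

where $\phi_0$ is continuous at $t = 0$. Let us show that the continuity of $\phi_0$ at 0 is essential.

(i) Consider the sequence of r.v.s $\{X_n, n \ge 1\}$ where $X_n \sim N(0, n)$. Then the ch.f. $\phi_n$ of $X_n$ is given by $\phi_n(t) = \exp(-\tfrac12 nt^2)$, $t \in \mathbb{R}^1$. Obviously we have $\phi_n(t) \to \phi(t)$ as $n \to \infty$ where

$$\phi(t) = \begin{cases} 0, & \text{if } t \ne 0 \\ 1, & \text{if } t = 0. \end{cases}$$

Thus the limiting function $\phi$ is discontinuous at 0 and hence the continuity theorem does not hold. On the other hand, if $\Phi$ denotes the standard normal d.f., we have

$$F_n(x) = P[X_n \le x] = \Phi(x/\sqrt{n}) \to \Phi(0) = \tfrac12 \quad \text{as } n \to \infty.$$

Clearly $\lim_{n \to \infty} F_n(x) = F(x)$ exists for all $x \in \mathbb{R}^1$ but $F(x) \equiv \tfrac12$ is not a d.f.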
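Both limits in case (i) are easy to observe numerically. In this sketch `phi_n` and `cdf_n` are illustrative names; the d.f. of $N(0, n)$ is expressed through the error function from the standard library.

```python
import math

def phi_n(n, t):
    """Characteristic function of N(0, n): exp(-n t^2 / 2)."""
    return math.exp(-0.5 * n * t * t)

def cdf_n(n, x):
    """Distribution function of N(0, n), written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * n)))
```

For large `n`, `phi_n(n, t)` is essentially 0 at every `t != 0` but equals 1 at `t = 0`, while `cdf_n(n, x)` approaches 1/2 at every fixed `x`, so all the probability mass escapes to infinity.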
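(ii) Consider the family of functions $\{F_n, n \ge 1\}$ where

$$F_n(x) = \begin{cases} 0, & \text{if } x < -n \\ (n + x)/(2n), & \text{if } -n \le x \le n \\ 1, & \text{if } x > n. \end{cases}$$

Then for each $n$, $F_n$ is a d.f. and obviously for all $x \in \mathbb{R}^1$ we have $\lim_{n \to \infty} F_n(x) = \tfrac12$. Thus the sequence $\{F_n\}$ is convergent but its limit, the constant $\tfrac12$, is not a d.f. A simple explanation of this fact can be given if we consider the ch.f. $\phi_n$ of $F_n$. Since $\phi_n(t) = (\sin nt)/(nt)$ then

$$\lim_{n \to \infty} \phi_n(t) = \phi(t) = \begin{cases} 0, & \text{if } t \ne 0 \\ 1, & \text{if } t = 0. \end{cases}$$

Again, as in case (i), the limiting function $\phi$ is discontinuous at 0 and therefore the continuity theorem cannot be applied.
16.5.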
Weak convergence and Levy metric
For two given d.f.s $F(x)$, $x \in \mathbb{R}^1$ and $G(x)$, $x \in \mathbb{R}^1$ the following quantity

$$L(F, G) = \inf\{\varepsilon > 0 : F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon, \ x \in \mathbb{R}^1\}$$

is called the Levy metric (distance) between $F$ and $G$. Note that $L(\cdot, \cdot)$ is a metric in the space of all d.f.s and plays an essential role in probability theory; e.g. the following result is frequently used. Let $F$ and $F_n$, $n \ge 1$ be d.f.s. Then, as $n \to \infty$,

$$F_n \xrightarrow{w} F \iff L(F_n, F) \to 0.$$

Consider now the sequence $\{X_n, n \ge 1\}$ of independent r.v.s. Denote $S_n = X_1 + \dots + X_n$, $s_n^2 = VS_n$ and let $F_n$ be the d.f. of $S_n/s_n$. Suppose the variables $X_n$ are such that $F_n \xrightarrow{w} G$ as $n \to \infty$, where $G$ is a d.f. (Actually, $G$ belongs to the class of infinitely divisible distributions.) This is equivalent to saying that for any $\varepsilon > 0$ there is an index $n_\varepsilon$ such that for all $n \ge n_\varepsilon$ we have $L(F_n, G) < \varepsilon$. Since the quantity $L(F_n, G)$ is 'small', we can suggest that another related quantity, $L(\tilde F_n, G_n)$, is also 'small'. Here $\tilde F_n$ is the d.f. of $S_n$ (without normalization!) and $G_n(x) = G(x/s_n)$. In several cases such a statement is true, but not always, as in the next example. Let $X_{nj}$, $j = 1, \ldots, n$, $n \ge 1$ be independent r.v.s where

$$P[X_{nj} = \pm 1] = \frac12\left(1 - \frac1n\right), \quad P[X_{nj} = \pm n\sqrt5] = \frac{1}{2n}, \quad j = 1, \ldots, n.$$

If $S_n = X_{n1} + \dots + X_{nn}$, then $ES_n = 0$ and $s_n^2 = VS_n = 5n^2 + n - 1 \to \infty$ as $n \to \infty$. For the normalized variable $\eta_n = S_n/(n\sqrt5)$ we have $E\eta_n = 0$ and $V\eta_n = 1 + (n - 1)/(5n^2)$ implying that $V\eta_n \to 1$ as $n \to \infty$. Let us find the limit of the d.f. $F_n(x) = P[\eta_n \le x]$ as $n \to \infty$. In this case the best way is to find the ch.f. $\psi_n(t) = E[e^{it\eta_n}]$. By using the structure of the variables $X_{nj}$ and the properties of ch.f.s we find that

$$\lim_{n \to \infty} \psi_n(t) = \psi(t) = \exp(\cos t - 1), \quad t \in \mathbb{R}^1.$$

However $\psi(t) = \exp(\cos t - 1)$ is a ch.f. corresponding to a concrete r.v., say $\eta_0$, and $\eta_0 = \xi_1 - \xi_2$ with $\xi_1$ and $\xi_2$ independent r.v.s each having a Poisson distribution with parameter $\tfrac12$. Hence by the continuity theorem, we have

$$F_n \xrightarrow{w} G \quad \text{as } n \to \infty$$

with $G(x) = P[\eta_0 \le x]$, $x \in \mathbb{R}^1$, or equivalently $\lim_{n \to \infty} L(F_n, G) = 0$. Thus the quantity $L(F_n, G)$ is 'small' and we want to see if $L(\tilde F_n, G_n)$ is also 'small'. Recall that $\tilde F_n$ is the d.f. of $S_n$ itself, while $G_n(x) = G(x/s_n)$. Note first that $\tilde F_n$ and $G_n$ correspond to discrete r.v.s. Specifically, the values of $S_n$ are in the set $\{\pm j, \pm kn\sqrt5 \pm l : j, k, l = 1, \ldots, n\}$ and the d.f. $\tilde F_n$ has jumps at all points of this set. Further, $G_n(x) = P[\zeta_n \le x]$, where $\zeta_n = \eta_0 \cdot n\sqrt5$ and it is obvious that $\zeta_n$ takes its values in the set $\{0, \pm k \cdot n\sqrt5 : k = 1, 2, \ldots\}$ at each point of which $G_n$ has a jump. In particular, $P[\eta_0 = 0] > 0$ which implies that for an odd index $n$ we can find a number $\varepsilon > 0$ (expressed through $P[\eta_0 = 0]$) such that $L(\tilde F_n, G_n) \ge \varepsilon$. Hence we conclude that in this case the quantity $L(\tilde F_n, G_n)$ is not small.
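The limit $\psi(t) = \exp(\cos t - 1)$ can be checked numerically. The sketch below rests on an assumption about the example that should be stated loudly: each $X_{nj}$ is taken to be $\pm n\sqrt5$ with probability $1/(2n)$ each and $\pm 1$ with probability $\tfrac12(1 - 1/n)$ each, which is the symmetric choice consistent with $VS_n = 5n^2 + n - 1$. The names `psi_n`, `psi_limit` and `prob_eta0_zero` are illustrative.

```python
import math

def psi_n(n, t):
    """Ch.f. of eta_n = S_n / (n sqrt(5)) under the assumed law of X_nj."""
    scale = n * math.sqrt(5.0)
    # One summand X_nj / scale: values +/-1 (big atoms) and +/- 1/scale (small atoms).
    one_term = (1.0 / n) * math.cos(t) + (1.0 - 1.0 / n) * math.cos(t / scale)
    return one_term ** n

def psi_limit(t):
    """Ch.f. of the difference of two independent Poisson(1/2) variables."""
    return math.exp(math.cos(t) - 1.0)

def prob_eta0_zero(terms=30):
    """P[eta_0 = 0] = e^{-1} * sum over j of ((1/2)^j / j!)^2."""
    return math.exp(-1.0) * sum((0.5**j / math.factorial(j)) ** 2 for j in range(terms))
```

The strictly positive value of `prob_eta0_zero()` is the jump of $G_n$ at 0 which the argument in the text exploits.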
16.6.
A sequence of probability density functions can converge in the mean of order 1 without converging everywhere
Let $f_0(x), f_1(x), f_2(x), \ldots$, $x \in \mathbb{R}^1$ be probability density functions. Here we consider two kinds of convergence of $f_n$ to $f_0$: convergence almost everywhere and convergence in the mean of order 1 which are expressed respectively by

$$\lim_{n \to \infty} f_n(x) = f_0(x) \quad \text{almost everywhere} \tag{1}$$

and

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} |f_n(x) - f_0(x)|\,dx = 0. \tag{2}$$

Let us compare (1) and (2). According to a result by Robbins (1948), (1) $\Rightarrow$ (2). However, the converse is not always true. Indeed, for $n \ge 2$ let

$$f_n(x) = \begin{cases} n/(n - 1), & \text{if } (k - 1)/n < x < k/n - 1/n^2, \ k = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

and let $f_0$ be the uniform density on the interval $(0, 1)$. It is easy to see that for every $n$, $f_n$ is a density and if $B_n = \{x \in (0, 1) : f_n(x) > 0\}$, then

$$|f_n(x) - f_0(x)| = \begin{cases} 1/(n - 1), & \text{if } x \in B_n \\ 1, & \text{if } x \in B_n^c \cap (0, 1). \end{cases}$$

Since the sets $B_n$ and $B_n^c \cap (0, 1)$ have Lebesgue measures $(n - 1)/n$ and $1/n$ respectively, we obtain the relation

$$\int_0^1 |f_n(x) - f_0(x)|\,dx = \frac2n \ \Rightarrow \ \lim_{n \to \infty} \int_0^1 |f_n(x) - f_0(x)|\,dx = 0,$$

that is, $f_n$ converges to $f_0$ in the mean of order 1. It now remains to show that

$$f_n(x) \nrightarrow f_0(x) = 1, \quad x \in (0, 1).$$

For any fixed irrational number $z$ there exist infinitely many rational numbers $m/k$ such that $m/k - 1/k^2 < z < m/k$. This fact and the definition of $f_n$ imply that $f_n(x) = 0$ for infinitely many $n$ and for any fixed irrational $x \in (0, 1)$. Furthermore, if $x$ is a rational number in $(0, 1)$, then $x = m/k$ for some positive integers $m$ and $k$ with $m < k$, and moreover $f_n(x) = 0$ for $n = lk$, $l = 1, 2, \ldots$. Thus for any $x \in (0, 1)$ the densities $f_n(x)$ cannot converge to $f_0(x) \equiv 1$.
16.7.
A version of the continuity theorem for distribution functions which does not hold for some densities
Let $X_n$ be a r.v. with d.f. $F_n$, density $f_n$ and ch.f. $\phi_n$, $n \ge 1$. The continuity theorem provides necessary and sufficient conditions for the weak convergence of $\{F_n\}$ in terms of $\{\phi_n\}$. Now we want to find conditions which relate the ch.f.s $\{\phi_n\}$ and the densities $\{f_n\}$. For some r.v. $X_0$ with d.f. $F_0$, density $f_0$ and ch.f. $\phi_0$ we introduce the following three conditions:

$$\lim_{n \to \infty} f_n(x) = f_0(x) \quad \text{for almost all } x \in \mathbb{R}^1, \tag{1}$$

$$F_n \xrightarrow{w} F_0 \quad \text{as } n \to \infty, \tag{2}$$

$$\lim_{n \to \infty} \phi_n(t) = \phi_0(t) \text{ for all } t \in \mathbb{R}^1 \text{ and } \phi_0 \text{ is continuous at } 0. \tag{3}$$

By the continuity theorem we have (2) $\iff$ (3). According to the Scheffe theorem (see Example 14.9), (1) $\Rightarrow$ (2). Example 14.9 also shows that in general (2) $\nRightarrow$ (1). Thus we conclude that (1) $\Rightarrow$ (3) and can expect that in general (3) $\nRightarrow$ (1). Let us illustrate by an example that indeed (3) $\nRightarrow$ (1). Consider the standard normal density $\varphi(x) = (2\pi)^{-1/2} \exp(-\tfrac12 x^2)$ and its ch.f. $\phi_0(t) = \exp(-\tfrac12 t^2)$. Define the functions

$$f_\lambda(x) = \varphi(x)(1 - \cos \lambda x)/(1 - \phi_0(\lambda)), \quad x \in \mathbb{R}^1, \tag{4}$$

$$\phi_\lambda(t) = [2\phi_0(t) - \phi_0(t + \lambda) - \phi_0(t - \lambda)]/[2(1 - \phi_0(\lambda))], \quad t \in \mathbb{R}^1 \tag{5}$$

where $\lambda$ is any nonzero real number (e.g. take $\lambda = n$). It is not difficult to check that for each $\lambda$, $f_\lambda(x)$, $x \in \mathbb{R}^1$ is a probability density function, $\phi_\lambda(t)$, $t \in \mathbb{R}^1$ is a ch.f., and moreover, $\phi_\lambda$ corresponds to $f_\lambda$. Further, we find

$$\lim_{\lambda \to \infty} \phi_\lambda(t) = \phi_0(t), \quad t \in \mathbb{R}^1, \tag{6}$$

where the limiting function $\phi_0$ is continuous at 0 and thus (3) is satisfied. However,

$$f_\lambda(x) \nrightarrow \varphi(x) \quad \text{as } \lambda \to \infty \tag{7}$$

and hence condition (1) does not hold. Comparing (6) and (7) we see that in general the pointwise convergence of the ch.f.s $\phi_n$ given by (3) is not enough to ensure the convergence (1) of the densities $f_n$. At this point the following result may be useful (see Feller 1971). Let $\phi_n$ and $\phi$ be absolutely integrable ch.f.s such that

$$\int_{-\infty}^{\infty} |\phi_n(t) - \phi(t)|\,dt \to 0 \quad \text{as } n \to \infty. \tag{8}$$

Then the corresponding d.f.s $F_n$ and $F$ have bounded continuous densities $f_n$ and $f$ respectively, and (8) implies that

$$\lim_{n \to \infty} f_n(x) = f(x) \quad \text{uniformly in } x, \ x \in \mathbb{R}^1. \tag{9}$$

Obviously, in the above specific example, condition (9) is not satisfied (see (7)). It is easy to see that the pointwise convergence given by (6) does not imply the integral convergence (8).
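The contrast between (6) and (7) shows up clearly in a short computation. This is only an illustrative sketch: the function names are hypothetical, and the evaluation point $x = \pi$ with integer $\lambda$ is a choice made here because $\cos(\lambda\pi)$ then alternates between $1$ and $-1$.

```python
import math

def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def phi0(t):
    """Ch.f. of N(0, 1)."""
    return math.exp(-0.5 * t * t)

def f_lam(lam, x):
    """Density (4): the normal density modulated by 1 - cos(lam * x)."""
    return normal_density(x) * (1.0 - math.cos(lam * x)) / (1.0 - phi0(lam))

def phi_lam(lam, t):
    """Ch.f. (5) corresponding to f_lam."""
    num = 2.0 * phi0(t) - phi0(t + lam) - phi0(t - lam)
    return num / (2.0 * (1.0 - phi0(lam)))
```

At $x = \pi$ the density `f_lam(lam, math.pi)` vanishes for even `lam` and is about twice `normal_density(math.pi)` for odd `lam`, so it has no limit, while `phi_lam(lam, t)` is already indistinguishable from `phi0(t)` for moderately large `lam`.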
16.8.
Weak convergence of distribution functions does not imply convergence of the moments
Let $F$ and $F_n$, $n \ge 1$ be d.f.s. Denote by $\alpha_k$ and $\alpha_k^{(n)}$ their $k$th moments:

$$\alpha_k = \int_{-\infty}^{\infty} x^k\,dF(x), \quad \alpha_k^{(n)} = \int_{-\infty}^{\infty} x^k\,dF_n(x), \quad k = 1, 2, \ldots.$$

According to the Frechet-Shohat theorem (see Feller 1971), if $\alpha_k^{(n)} \to \alpha_k$ as $n \to \infty$ for all $k$ and the moment sequence $\{\alpha_k\}$ determines $F$ uniquely, then

$$F_n \xrightarrow{w} F \quad \text{as } n \to \infty. \tag{1}$$

(For such results also see the works of Kendall and Rao 1950, Lukacs 1970.) Now let us answer the converse question: does the weak convergence (1) imply convergence of the moments $\alpha_k^{(n)}$ to $\alpha_k$? By two examples we show that (1) can hold even if $\alpha_k^{(n)} \nrightarrow \alpha_k$ as $n \to \infty$ for any $k$.

(i) Consider the family of d.f.s $\{F_n, n \ge 1\}$ where
$$F_n(x) = \left(1 - \frac1n\right)(2\pi)^{-1/2} \int_{-\infty}^{x} e^{-u^2/2}\,du + \frac{1}{2n}\left(1 + 1_{[n, \infty)}(x)\right), \quad x \in \mathbb{R}^1.$$
It is easy to see that
lim Fn(x) = (x) for all x E lRl n-too where is the standard normal d.f., that is Fn ~ as n --+ 00. However, the moments Q~n) of any order k of Fn tend to infinity as n --+
00
and hence Q~n) cannot converge to the moments Qk of N(O, 1). Recall that here Q2k-1 = 0, Q2k = (2k - I)!!, k = 1,2, .... (ii) Let Fn be the d.f. of a r.v. Xn distributed uniformly on the interval [0, n] and Fo be the d.f. of a degenerate r.v. X o. for example. Xo = 0. Define
1
Gn(x) = ~Fn(x)
+
(1) Fo(x), x 1- ~
E lR I , n ~ I.
Then {G n , n ~ I} is a sequence of d.f.s. The limit behaviour of {G n } can easily be investigated in terms of the corresponding ch.f.s {1fn}. Since 1jJ n (t) =
1'
1
e t't :z: dGn(x) = _(e ttn - l)/(itn) -00 n 00
+
(1 1- - ) n
we find that limn-too 'l/Jn (t) = 1. t E ~ I which implies that
°
lim Gn(x) = Fo(x) n-too
for all x E lR 1 except x = (the value of Xo; the only point of jump of Fo). It remains for us to clarify whether the moments Q~n) of G n converge to the moments Qk of Fo. We have
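$$\alpha_k^{(n)} = \int_{-\infty}^{\infty} x^k \, dG_n(x) = \frac{1}{n} \cdot \frac{n^k}{k+1} = \frac{n^{k-1}}{k+1}, \qquad k = 1, 2, \ldots,$$
so that $\alpha_1^{(n)} = \frac{1}{2}$ for every $n$ and $\alpha_k^{(n)} \to \infty$ as $n \to \infty$ for every $k \ge 2$, while the moments $\alpha_k$ of $F_0$ are all zero.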
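The behaviour of the mixture in example (ii) can be checked numerically. The following is only a minimal sketch: the function names `moment_G` and `chf_G` are ours, and the formulas are the ones obtained from the definition of the mixture of the uniform law on $[0, n]$ (weight $1/n$) and the unit mass at 0 (weight $1 - 1/n$).

```python
import cmath
from fractions import Fraction


def moment_G(n: int, k: int) -> Fraction:
    """k-th moment of G_n = (1/n) * Uniform[0, n] + (1 - 1/n) * (unit mass at 0)."""
    # Uniform[0, n] has k-th moment n^k / (k + 1); the atom at 0 contributes nothing.
    return Fraction(1, n) * Fraction(n ** k, k + 1)


def chf_G(n: int, t: float) -> complex:
    """Characteristic function of G_n at a point t != 0."""
    uniform_part = (cmath.exp(1j * t * n) - 1) / (1j * t * n)
    return uniform_part / n + (1 - 1 / n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        moments = [float(moment_G(n, k)) for k in (1, 2, 3)]
        print(n, moments, abs(chf_G(n, 1.0) - 1))
```

The ch.f. approaches 1 (the ch.f. of the law degenerate at 0), while none of the moments approaches 0.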
16.9. Weak convergence of a sequence of distributions does not always imply the convergence of the moment generating functions
Recall first a version of the continuity theorem. Suppose $\{F_n, n = 1, 2, \ldots\}$ are d.f.s whose m.g.f.s $\{M_n, n = 1, 2, \ldots\}$ exist for all $|z| \le r_0$ and all $n$. If $F$ and $M$ is another pair of a d.f. and m.g.f. such that $M_n(z) \to M(z)$ as $n \to \infty$ for all $|z| \le r_1$, where $r_1 < r_0$, then $F_n \xrightarrow{w} F$. Thus under general conditions the convergence of the m.g.f.s implies the weak convergence of the corresponding d.f.s, and this motivates us to ask the inverse question: if $F_n \xrightarrow{w} F$, does it follow that $M_n \to M$ as $n \to \infty$? Intuitively we may guess that the answer is 'no' rather than 'yes', simply because when talking about a m.g.f. we assume at least the existence of moments of any order. The latter is not necessary for weak convergence. A simple example shows that the answer to the above question is negative. Consider the d.f.s $F$ and $F_n$, $n = 1, 2, \ldots$, defined by
$$F(x) = \begin{cases} 0, & \text{if } x < 0 \\ 1, & \text{if } x \ge 0, \end{cases} \qquad F_n(x) = \begin{cases} 0, & \text{if } x < -n \\ \frac{1}{2} + c_n \arctan(nx), & \text{if } -n \le x < n \\ 1, & \text{if } x \ge n \end{cases}$$
where $c_n = 1/[2\arctan(n^2)]$. It is easy to check that $F_n(x) \to F(x)$ as $n \to \infty$ at all points of continuity of $F$. Hence $F_n \xrightarrow{w} F$. Since $F$ is a degenerate distribution concentrated at 0, its m.g.f. is $M(z) = 1$ for all $z$. Further, the m.g.f. $M_n(z)$ of $F_n$,
$$M_n(z) = \int_{-n}^{n} \frac{c_n n \, e^{zx}}{1 + n^2 x^2} \, dx,$$
exists for all $z$. It is almost obvious that $M_n(z) \to M(z)$ as $n \to \infty$ only for $z = 0$. If $z \ne 0$, $M_n(z) \not\to M(z)$ as $n \to \infty$ since $|M_n(z)| \to \infty$ as $n \to \infty$.
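The growth of $M_n(z)$ for $z \ne 0$ can be observed numerically. The sketch below is an illustration only: the function name `mgf_Fn` and the choice of a plain midpoint rule are our assumptions, and the integrand is the density $c_n n/(1 + n^2 x^2)$ on $[-n, n]$ multiplied by $e^{zx}$.

```python
import math


def mgf_Fn(n: int, z: float) -> float:
    """Midpoint-rule approximation of M_n(z), the integral over [-n, n]
    of c_n * n * exp(z x) / (1 + n^2 x^2)."""
    c_n = 1.0 / (2.0 * math.atan(n * n))
    steps = 400 * n * n  # step 1/(200 n), fine enough for the peak of width ~1/n at 0
    h = 2.0 * n / steps
    total = 0.0
    for i in range(steps):
        x = -n + (i + 0.5) * h
        total += c_n * n * math.exp(z * x) / (1.0 + (n * x) ** 2)
    return total * h


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, mgf_Fn(n, 0.0), mgf_Fn(n, 1.0))
```

At $z = 0$ the value stays at 1, whereas at $z = 1$ the values increase rapidly with $n$.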
16.10. Weak convergence of a sequence of distribution functions does not always imply their convergence in the mean
Let $F_0, F_1, F_2, \ldots$ be d.f.s. Suppose for some $\beta > 0$ the following relation holds:
(1) $\lim_{n\to\infty} \int_{-\infty}^{\infty} |F_n(x) - F_0(x)|^{\beta} \, dx = 0.$
From here it is easy to derive that $F_n \xrightarrow{w} F_0$ as $n \to \infty$. Now let us analyse (1) but in the opposite direction. Firstly, suppose that $F_n \xrightarrow{w} F_0$. The question is, under what additional assumptions can we obtain a relation like (1) with a suitable $\beta > 0$? One possible answer is contained in the following result (see Laube 1973). If $F_n \xrightarrow{w} F_0$ and for some $\gamma > 0$,
(2)
then $F_n$ tends to $F_0$ in the mean of order $\beta > 1/\gamma$, that is, (2) and the weak convergence of $F_n$ to $F_0$ imply (1) with $\beta > 1/\gamma$. Our aim now is to show that (1) need not be true if we take $\beta = 1/\gamma$. To see this, consider the following d.f.s:
$$F_0(x) = 1_{[0,\infty)}(x), \qquad F_n(x) = \frac{1}{n} 1_{[-n,0)}(x) + 1_{[0,\infty)}(x), \qquad x \in \mathbb{R}^1, \ n = 1, 2, \ldots.$$
Then it can be easily seen that
$$\lim_{n\to\infty} F_n(x) = 1_{[0,\infty)}(x) = F_0(x), \qquad x \in \mathbb{R}^1.$$
Obviously condition (2) is valid for $\gamma = 1$, since
$$\int_{-\infty}^{\infty} |x| \, dF_n(x) = 1.$$
However, relation (1) does not hold for $\beta = 1$, that is, for $\beta = 1/\gamma$, since
$$\int_{-\infty}^{\infty} |F_n(x) - F_0(x)| \, dx = 1 \quad \text{for all } n.$$
Finally, note that relations like (1) can be used to obtain estimates for the global convergence behaviour in the central limit theorem (CLT) (see Laube 1973).
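The role of the exponent $\beta$ in this example can be illustrated by a short computation. This is a sketch under our own naming (`mean_distance` is a hypothetical helper); it integrates $|F_n - F_0|^{\beta}$ over $[-n, 0)$, the only region where the two d.f.s differ.

```python
def mean_distance(n: int, beta: float, steps_per_unit: int = 1000) -> float:
    """Midpoint-rule value of the integral of |F_n(x) - F_0(x)|**beta over [-n, 0)."""

    def F0(x: float) -> float:
        return 1.0 if x >= 0 else 0.0

    def Fn(x: float) -> float:
        if x >= 0:
            return 1.0
        return 1.0 / n if x >= -n else 0.0

    h = 1.0 / steps_per_unit
    total = 0.0
    for i in range(n * steps_per_unit):
        x = -n + (i + 0.5) * h
        total += abs(Fn(x) - F0(x)) ** beta * h
    return total


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, mean_distance(n, 1.0), mean_distance(n, 2.0))
```

For $\beta = 1$ the value equals 1 for every $n$, while for $\beta = 2$ it equals $1/n$ and tends to 0.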
SECTION 17.
CENTRAL LIMIT THEOREM
Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s defined on the probability space $(\Omega, \mathcal{F}, P)$. As usual, denote
$$S_n = X_1 + \cdots + X_n, \quad a_k = EX_k, \quad \sigma_k^2 = VX_k, \quad A_n = ES_n = a_1 + \cdots + a_n, \quad s_n^2 = VS_n = \sigma_1^2 + \cdots + \sigma_n^2.$$
We say that the sequence $\{X_n\}$ satisfies the central limit theorem (CLT) (or, that $\{X_n\}$ obeys the CLT) if
$$\lim_{n\to\infty} P[(S_n - A_n)/s_n \le x] = \Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-u^2/2} \, du \quad \text{for all } x \in \mathbb{R}^1.$$
Let $F_k$ denote the d.f. of $X_k$. Clearly, we can suppose that $EX_k = 0$ for all $k \ge 1$. Now introduce the following three conditions:
(L) $\lim_{n\to\infty} \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| \ge \varepsilon s_n} x^2 \, dF_k(x) = 0$ for all $\varepsilon > 0$ (Lindeberg condition);
(F) $\lim_{n\to\infty} \max_{1 \le k \le n} \sigma_k^2 / s_n^2 = 0$ (Feller condition);
(UAN) $\lim_{n\to\infty} \max_{1 \le k \le n} P[|X_{kn}| > \varepsilon] = 0$ for all $\varepsilon > 0$, where $X_{kn} = X_k / s_n$ (uniform asymptotic negligibility condition (u.a.n. condition)).
Now we shall formulate in a compact form two fundamental results.
Lindeberg theorem. (L) $\Rightarrow$ (CLT).
Lindeberg-Feller theorem. If (F), then (L) $\iff$ (CLT), or if (UAN), then (L) $\iff$ (CLT).
The proof of these theorems and several other related topics can be found in many books. We refer the reader to the books by Gnedenko (1962), Fisz (1963), Breiman (1968), Billingsley (1968, 1995), Thomasian (1969), Rényi (1970), Feller (1971), Ash (1972), Chung (1974), Chow and Teicher (1978), Loève (1978), Laha and Rohatgi (1979) and Shiryaev (1995). The examples below demonstrate the range of validity of the CLT and examine the importance of the conditions under which the CLT does hold. Some related questions are also considered.
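17.1. Sequences of random variables which do not satisfy the central limit theorem
(i) Let $X_1, X_2, \ldots$ be independent r.v.s defined as follows: $P[X_1 = 1] = P[X_1 = -1] = \frac{1}{2}$ and for $k \ge 2$ and some $c$, $0 < c < 1$,
$$P[X_k = \pm 1] = \tfrac{1}{2}(1 - c), \qquad P[X_k = \pm k] = \frac{c}{2k^2}, \qquad P[X_k = 0] = \Bigl(1 - \frac{1}{k^2}\Bigr) c.$$
First let us check if the Lindeberg condition is satisfied. We have $EX_k = 0$, $\sigma_k^2 = 1$ and hence $s_n^2 = n$. If $n$ is large enough and such that $\varepsilon \sqrt{n} > 1$, where $\varepsilon > 0$ is fixed, then we find
$$\frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|x| \ge \varepsilon s_n} x^2 \, dF_k(x) = \frac{1}{n} \sum_{\varepsilon\sqrt{n} \le k \le n} k^2 \cdot \frac{c}{k^2} \to c \ne 0 \quad \text{as } n \to \infty.$$
Therefore the given sequence $\{X_k\}$ does not satisfy the Lindeberg condition. However, this does not mean that the CLT fails to hold for the sequence $\{X_k\}$ because the Lindeberg condition is only a sufficient condition. Actually the sequence $\{X_k\}$ does not obey the CLT. This follows from the fact that $X_k / s_n$ satisfy the u.a.n. condition. Indeed, for such $n$,
$$P[|X_k / s_n| > \varepsilon] = \begin{cases} 0, & \text{if } k \le \varepsilon\sqrt{n} \\ c/k^2, & \text{if } k > \varepsilon\sqrt{n}. \end{cases}$$
Thus
$$\max_{1 \le k \le n} P[|X_k / s_n| > \varepsilon] \le \frac{c}{\varepsilon^2 n} \to 0 \quad \text{as } n \to \infty.$$
Now if we had $S_n / s_n \xrightarrow{d} \xi$ where $\xi \sim N(0, 1)$, this and the u.a.n. condition would imply the Lindeberg condition which, as we have seen above, is not satisfied. Thus our final conclusion is that the Lindeberg condition is not satisfied and the CLT does not hold.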
(ii) Let the r.v. $Y$ take two values, 1 and $-1$, with probability $\frac{1}{2}$ each, and let $\{Y_k, k \ge 1\}$ be a sequence of independent copies of $Y$. Define a new sequence $\{X_k, k \ge 1\}$ where $X_k = \sqrt{15}\, Y_k / 4^k$ and let $S_n = X_1 + \cdots + X_n$. Since $EY = 0$ and $VY = 1$ we easily find that
$$ES_n = 0 \quad \text{and} \quad s_n^2 = VS_n = 1 - (1/16)^n.$$
Thus $s_n^2 \approx 1$ for large $n$ (this is why the factor $\sqrt{15}$ was involved). On the other hand it is obvious that
$$P[|S_n| \le \tfrac{1}{2}] = 0 \quad \text{for every } n \ge 1.$$
Therefore the probabilities $P[S_n \le x]$ cannot converge to the standard normal d.f. $\Phi(x)$ for all $x$, so the sequence $\{X_k\}$ does not obey the CLT. Note that in this example $X_1$ 'dominates' the other terms.
(iii) Suppose that for each $n$,
$$S_n = X_{n1} + X_{n2} + \cdots + X_{nn}$$
where $X_{n1}, \ldots, X_{nn}$ are independent r.v.s and each has a Poisson distribution with mean $1/(2n)$. We could expect that the distribution of the quantity $(S_n - ES_n)/\sqrt{VS_n}$ will tend to the standard normal d.f. $\Phi$. However, this is not the case, in spite of the fact that $P[X_{nk} = 0] = e^{-1/(2n)} \approx 1$ for large $n$, that is, each $X_{nk}$ is 'almost' zero. It is enough to note that for each $n$ the sum $S_n$ has a Poisson distribution with parameter $\frac{1}{2}$. In particular, $P[S_n = 0] = e^{-1/2}$, implying that the distribution of $(S_n - ES_n)/s_n$ cannot be close to $\Phi$.
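The gap around the origin in example (ii) can be verified by brute force for moderate $n$. The sketch below enumerates all sign patterns; the helper names `partial_sums` and `variance` are ours and serve only as an illustration.

```python
import itertools
import math


def partial_sums(n: int) -> list:
    """All 2**n possible values of S_n = sum_k sqrt(15) * y_k / 4**k with y_k = +1 or -1."""
    weights = [math.sqrt(15) / 4 ** k for k in range(1, n + 1)]
    return [
        sum(sign * weight for sign, weight in zip(signs, weights))
        for signs in itertools.product((1, -1), repeat=n)
    ]


def variance(n: int) -> float:
    """V S_n = sum_k 15 / 16**k, which equals 1 - (1/16)**n."""
    return sum(15 / 16 ** k for k in range(1, n + 1))


if __name__ == "__main__":
    values = partial_sums(10)
    print(min(abs(v) for v in values), variance(10))
```

No attainable value of $S_n$ lies in $[-\frac{1}{2}, \frac{1}{2}]$, although the variance is practically 1.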
17.2. How is the central limit theorem connected with the Feller condition and the uniform negligibility condition?
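Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s such that $X_n \sim N(0, \sigma_n^2)$ where $\sigma_1^2 = 1$ and $\sigma_k^2 = 2^{k-2}$ for $k \ge 2$. Then $S_n = X_1 + \cdots + X_n$ has variance $s_n^2 = 2^{n-1}$, so $S_n / s_n \sim N(0, 1)$ for each $n$ and therefore the CLT for $\{X_k\}$ is satisfied trivially. Further,
$$\lim_{n\to\infty} \max_{1 \le k \le n} \frac{\sigma_k^2}{s_n^2} = \lim_{n\to\infty} \frac{2^{n-2}}{2^{n-1}} = \frac{1}{2} \ne 0$$
and moreover, since $X_k / s_k \sim N(0, \frac{1}{2})$ for $k \ge 2$, we find that for every $\varepsilon > 0$ and $n \ge 2$
$$\max_{1 \le k \le n} P[|X_k / s_n| > \varepsilon] \ge P[|X_n / s_n| > \varepsilon] = 2[1 - \Phi(\varepsilon\sqrt{2})] > 0.$$
Hence neither the Feller condition nor the u.a.n. condition holds. This implies that the Lindeberg condition also does not hold. However, despite these facts the sequence $\{X_n\}$ obeys the CLT.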
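The failure of the Feller condition here amounts to a one-line computation, which can be confirmed with exact rational arithmetic. This is a minimal sketch; the names `sigma2`, `s2` and `feller_ratio` are ours.

```python
from fractions import Fraction


def sigma2(k: int) -> Fraction:
    """Variance of X_k: 1 for k = 1 and 2**(k - 2) for k >= 2."""
    return Fraction(1) if k == 1 else Fraction(2) ** (k - 2)


def s2(n: int) -> Fraction:
    """Variance of S_n = X_1 + ... + X_n."""
    return sum(sigma2(k) for k in range(1, n + 1))


def feller_ratio(n: int) -> Fraction:
    """max over 1 <= k <= n of sigma_k^2 / s_n^2."""
    return max(sigma2(k) for k in range(1, n + 1)) / s2(n)


if __name__ == "__main__":
    print([feller_ratio(n) for n in range(1, 8)])
```

The ratio equals $\frac{1}{2}$ for every $n \ge 2$, so it cannot tend to 0.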
17.3. Two 'equivalent' sequences of random variables such that one of them obeys the central limit theorem while the other does not
Consider again the sequence $\{X_n, n \ge 1\}$ from Example 17.1: namely, $P[X_1 = \pm 1] = \frac{1}{2}$ and for $k \ge 2$ and $0 < c < 1$,
$$P[X_k = \pm 1] = \tfrac{1}{2}(1 - c), \qquad P[X_k = \pm k] = \frac{c}{2k^2}, \qquad P[X_k = 0] = \Bigl(1 - \frac{1}{k^2}\Bigr) c.$$
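Using truncation we define the sequence $\{X_{nk}, k = 1, \ldots, n, n \ge 1\}$ by
$$X_{nk} = \begin{cases} X_k, & \text{if } |X_k| \le \sqrt{n} \\ 0, & \text{otherwise.} \end{cases}$$
Denote $\tilde{S}_n = X_{n1} + \cdots + X_{nn}$, $\tilde{s}_n^2 = V\tilde{S}_n$. Since $VX_{nk} = 1$ if $k \le \sqrt{n}$ and $VX_{nk} = 1 - c$ if $k > \sqrt{n}$, we find that $\tilde{s}_n^2 = [\sqrt{n}] + (1 - c)(n - [\sqrt{n}]) \sim n(1 - c)$ and thus, for every fixed $\varepsilon > 0$ and all sufficiently large $n$,
$$\frac{1}{\tilde{s}_n^2} \sum_{k=1}^{n} E\bigl[X_{nk}^2 1_{[|X_{nk}| \ge \varepsilon \tilde{s}_n]}\bigr] \le \frac{c\sqrt{n}}{\tilde{s}_n^2} \to 0.$$
Therefore the Lindeberg condition holds and $\tilde{S}_n / \tilde{s}_n \xrightarrow{d} \eta$ where $\eta$ is a r.v. distributed $N(0, 1)$. So the sequence $\{X_{nk}\}$ obeys the CLT.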
We shall show that the sequences $\{S_n\}$ and $\{\tilde{S}_n\}$ (not $\{X_n\}$ and $\{X_{nk}\}$) are 'equivalent' in the following sense:
(1) $P[S_n \ne \tilde{S}_n] \to 0$ as $n \to \infty$.
Indeed,
$$P[S_n \ne \tilde{S}_n] \le \sum_{k=1}^{n} P[X_k \ne X_{nk}] \le \sum_{k=1}^{n} P[|X_k| > \sqrt{n}] \le \sum_{k=[\sqrt{n}]}^{n} P[|X_k| = k].$$
Therefore
$$P[|X_k| = k] = \frac{c}{k^2} \quad \text{and} \quad \sum_{k=1}^{\infty} \frac{c}{k^2} < \infty \quad \Rightarrow \quad P[S_n \ne \tilde{S}_n] \to 0 \text{ as } n \to \infty.$$
However (see Example 17.1) the sequence $\{X_n\}$ does not obey the CLT. Thus we have constructed two sequences, $\{X_n\}$ and $\{X_{nk}\}$, which are equivalent in the sense of (1) and such that the CLT holds for $\{X_{nk}\}$ but does not hold for $\{X_n\}$. Note again that the Lindeberg condition is valid for $\{X_{nk}\}$ but not for $\{X_n\}$.
17.4. If the sequence of random variables $\{X_n\}$ satisfies the central limit theorem, what can we say about the variance of $S_n/\sqrt{VS_n}$?
Consider two sequences, $\{X_k, k \ge 1\}$ and $\{Y_k, k \ge 1\}$, each consisting of independent r.v.s and such that
Denote
$$S_n = Y_1 + \cdots + Y_n, \qquad \tilde{S}_n = X_1 + \cdots + X_n.$$
Obviously the sequence $\{Y_n\}$ obeys the CLT: that is, $S_n/\sqrt{n} \xrightarrow{d} \xi$ where $\xi \sim N(0, 1)$. The truncation principle (see Gnedenko 1962; Feller 1971), when applied to the sequence $\{X_k\}$, shows that $\tilde{S}_n/\sqrt{n}$ has the same asymptotic behaviour as that of $S_n/\sqrt{n}$. Thus we conclude that $\tilde{S}_n/\sqrt{n} \xrightarrow{d} \eta$ as $n \to \infty$ where $\eta \sim N(0, 1)$.
Then we can expect intuitively that
$$V[S_n/\sqrt{n}] \to 1 \quad \text{and} \quad V[\tilde{S}_n/\sqrt{n}] \to 1 \quad \text{as } n \to \infty.$$
For the sequence $\{Y_k\}$ we have $EY_k = 0$, $VY_k = 1$. Thus for each $n$,
$$1 = V[S_n/\sqrt{n}] \to 1 \quad \text{as } n \to \infty.$$
On the other hand, for $\{X_k\}$ we find $EX_k = 0$, $VX_k = 2 - 1/k^2$ and
$$V[\tilde{S}_n/\sqrt{n}] = \frac{1}{n} \sum_{k=1}^{n} \Bigl(2 - \frac{1}{k^2}\Bigr) = 2 - \frac{1}{n} \sum_{k=1}^{n} \frac{1}{k^2} \to 2 \quad \text{as } n \to \infty$$
(since $\sum_{k=1}^{\infty} (1/k^2) < \infty$), that is, $V[\tilde{S}_n/\sqrt{n}] \not\to 1$ as we assumed. Therefore the CLT does not ensure in general the convergence of the moments of the normed sum $\tilde{S}_n/\sqrt{n}$ to the moments of the normal distribution $N(0, 1)$. For the convergence of the moments we need some additional integrability conditions. In particular,
17.5. Not every interval can be a domain of normal convergence
Suppose $\{X_n, n \ge 1\}$ is a sequence of i.i.d. r.v.s which satisfies the CLT. Denote by $F_n$ the d.f. of $(S_n - ES_n)/(VS_n)^{1/2}$ where $S_n = X_1 + \cdots + X_n$. The uniform convergence $F_n(x) \to \Phi(x)$, $x \in \mathbb{R}^1$, implies
(1) $\lim_{n\to\infty} \dfrac{1 - F_n(x)}{1 - \Phi(x)} = 1$ uniformly in $x$ on any finite interval of $\mathbb{R}^1$.
Note that (1) will hold uniformly on intervals of the type $[0, b_n]$ whose length $b_n$ increases with $n$. In general, intervals for which (1) holds are called domains of normal convergence. Obviously such intervals exist, but we now show that not every interval can be a domain of normal convergence. Consider $X_1, X_2, \ldots$ independent Bernoulli r.v.s with parameter $p$: that is, $P[X_1 = 1] = p = 1 - P[X_1 = 0]$. Obviously the sequence $\{X_n\}$ obeys the CLT. If $S_n = X_1 + \cdots + X_n$ then $ES_n = np$, $s_n^2 = VS_n = np(1 - p)$ and
$$1 - F_n(x) = P\Bigl[(np(1-p))^{-1/2} \Bigl(\sum_{k=1}^{n} X_k - np\Bigr) > x\Bigr] = P\Bigl[\sum_{k=1}^{n} X_k > x(np(1-p))^{1/2} + np\Bigr].$$
Hence for an arbitrary $x > (n(1-p)/p)^{1/2}$ we obtain the equality
$$[1 - F_n(x)]/[1 - \Phi(x)] = 0$$
which clearly contradicts (1). Therefore (1) cannot hold for any interval of the type $[0, O(\sqrt{n})]$. In particular the interval $[0, c_p\sqrt{n}]$, where $c_p > ((1-p)/p)^{1/2}$ ($p$ is fixed), cannot be a domain of normal convergence. Finally note that intervals of the type $[0, o(\sqrt{n})]$ are domains of normal convergence. This follows from the well known Berry-Esseen estimates in the CLT (see Feller 1971; Chow and Teicher 1978; Shiryaev 1995).
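The vanishing of the ratio beyond the point $(n(1-p)/p)^{1/2}$ can be observed with exact binomial tails. The code below is a sketch with hypothetical helper names (`normal_tail`, `binomial_upper_tail`, `tail_ratio`); it uses only the standard library (`math.comb` requires Python 3.8 or later).

```python
import math


def normal_tail(x: float) -> float:
    """1 - Phi(x) for the standard normal d.f."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def binomial_upper_tail(n: int, p: float, threshold: float) -> float:
    """P[S_n > threshold] for S_n ~ Binomial(n, p), summed term by term."""
    first = max(math.floor(threshold) + 1, 0)
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(first, n + 1)
    )


def tail_ratio(n: int, p: float, x: float) -> float:
    """[1 - F_n(x)] / [1 - Phi(x)] for the normed Bernoulli sum."""
    threshold = x * math.sqrt(n * p * (1 - p)) + n * p
    return binomial_upper_tail(n, p, threshold) / normal_tail(x)


if __name__ == "__main__":
    n, p = 100, 0.5
    critical = math.sqrt(n * (1 - p) / p)  # equals 10 here
    print(critical, tail_ratio(n, p, 2.0), tail_ratio(n, p, 9.9), tail_ratio(n, p, 10.5))
```

With $n = 100$, $p = \frac{1}{2}$ the ratio is exactly 0 for $x = 10.5$, while it is still positive just below the critical point 10.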
17.6. The central limit theorem does not always hold for random sums of random variables
Let $\{X_n, n \ge 1\}$ be a sequence of r.v.s which satisfies the CLT. Take another sequence $\{\nu_n, n \ge 1\}$ of integer-valued r.v.s such that $\nu_n \xrightarrow{P} \infty$ as $n \to \infty$ and define $T_n = S_{\nu_n} = X_1 + \cdots + X_{\nu_n}$ and $b_n^2 = VT_n$. If
$$\lim_{n\to\infty} P[(T_n - ET_n)/b_n \le x] = \Phi(x), \qquad x \in \mathbb{R}^1,$$
we say that the CLT holds for the random sums $\{S_{\nu_n}\}$ generated by $\{X_n\}$ and $\{\nu_n\}$. In the next two examples we show that the CLT does not always hold for $\{S_{\nu_n}\}$. In both cases $\{X_n, n \ge 1\}$ is a sequence of i.i.d. r.v.s such that $P[X_1 = 1] = P[X_1 = -1] = \frac{1}{2}$. Obviously if $\nu_n = n$ a.s. for each $n$, then $T_n = S_n = X_1 + \cdots + X_n$, $b_n^2 = n$ and $T_n/b_n \xrightarrow{d} \xi$ where $\xi \sim N(0, 1)$.
(i) Define the sequence $\{\nu_n, n \ge 0\}$ as follows: $\nu_0 = 0$ and $\nu_n = \min\{k > \nu_{n-1} : S_k = (-1)^k\}$ for $n \ge 1$. Then $\nu_n \xrightarrow{P} \infty$ as $n \to \infty$, $b_n^2 = VT_n = n^2$ and clearly
It follows that the distribution of $T_n/b_n$ does not have a limit as $n \to \infty$ and hence the CLT cannot be valid for the random sums $\{S_{\nu_n}\}$.
(ii) Let $\{\nu_n, n \ge 1\}$ be independent r.v.s such that $\nu_n$ takes the values $n$ and $2n$ with probabilities $p$ and $q = 1 - p$ respectively. Suppose additionally that $\{\nu_n\}$ is independent of $\{X_n\}$. Then
$$b_n^2 = VT_n = pE[S_n^2] + qE[S_{2n}^2] = (1 + q)n.$$
It is easy to check that $T_n/b_n$ does not converge in distribution to a r.v. $\xi \sim N(0, 1)$. More precisely, $P[T_n/b_n \le x]$ converges to the mixture of the distributions of two r.v.s, $\xi_1 \sim N(0, (1+q)^{-1})$ and $\xi_2 \sim N(0, 2(1+q)^{-1})$, with weights $p$ and $q$ respectively.
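One quick way to see that the limit in example (ii) is not normal is to compare moments. The sketch below (the function name `mixture_moments` is ours) assumes the limiting mixture has components with variances $1/(1+q)$ and $2/(1+q)$ and uses the fact that a centred normal law with variance $v$ has fourth moment $3v^2$.

```python
def mixture_moments(p: float) -> tuple:
    """Second and fourth moments of the mixture
    p * N(0, 1/(1+q)) + q * N(0, 2/(1+q)), where q = 1 - p."""
    q = 1.0 - p
    v1, v2 = 1.0 / (1.0 + q), 2.0 / (1.0 + q)
    second = p * v1 + q * v2
    fourth = 3.0 * (p * v1 ** 2 + q * v2 ** 2)
    return second, fourth


if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75):
        print(p, mixture_moments(p))
```

The second moment is always 1, but for $0 < p < 1$ the fourth moment exceeds 3, the value for $N(0, 1)$.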
17.7. Sequences of random variables which satisfy the integral but not the local central limit theorem
Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s. Denote by $F_n$ and $f_n$ respectively the d.f. and the density of $(S_n - ES_n)/s_n$ where as usual $S_n = X_1 + \cdots + X_n$, $s_n^2 = VS_n$. Let us set down the following relations:
(1) $\lim_{n\to\infty} F_n(x) = \Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-u^2/2} \, du$, $x \in \mathbb{R}^1$;
(2) $\lim_{n\to\infty} f_n(x) = \varphi(x) = (2\pi)^{-1/2} e^{-x^2/2}$, $x \in \mathbb{R}^1$.
Recall that if (1) holds we say that the sequence $\{X_n\}$ obeys the integral CLT, while in case (2) we say that $\{X_n\}$ obeys the local CLT (for the densities). It is easy to see that (2) $\Rightarrow$ (1). However, in general weak convergence does not imply convergence of the corresponding densities (see Example 14.9). Note that in (1) and (2) the limit distribution is $N(0, 1)$. Hence in this particular case we could expect that the implication (1) $\Rightarrow$ (2) is true. Two examples will be considered where (1) $\not\Rightarrow$ (2). In the first example the variables are identically distributed while in the second they have different distributions.
(i) Let $X$ be a r.v. with density
(3) $f(x) = \begin{cases} 0, & \text{if } |x| \ge e^{-1} \\ 1/(2|x| \log^2 |x|), & \text{if } |x| < e^{-1}. \end{cases}$
Since $X$ is a bounded r.v., the sequence $\{X_n, n \ge 1\}$ of independent copies of $X$ satisfies the (integral) CLT. So the aim is to study the limit behaviour of the density $f_n$ of $(X_1 + \cdots + X_n)/(\sigma\sqrt{n})$ where $\sigma^2 = VX = \int_0^{e^{-1}} (x/\log^2 x) \, dx$. If $g_2$ is the density of the sum $X_1 + X_2$ then $g_2$ is expressed by the convolution
$$g_2(x) = \int_{-e^{-1}}^{e^{-1}} f(u) f(x - u) \, du.$$
Let us now try to find a lower bound for $g_2$. It is enough to consider $x$ in a neighbourhood of 0; in particular we can assume that $|x| < e^{-1}$, and, even more, that $0 < x < e^{-1}$. Then $g_2(x) \ge \int_{-x}^{x} f(u) f(x - u) \, du$. Since $f(x - u)$ reaches its minimum in the domain $|u| \le x$ at $u = 0$, we have
$$g_2(x) > \frac{1}{2x \log^2 x} \int_{-x}^{x} \frac{1}{2|u| \log^2 |u|} \, du = \frac{1}{2x |\log^3 x|}.$$
Analogously we establish that in a neighbourhood of 0 the density $g_3$ of the sum $X_1 + X_2 + X_3$ satisfies the inequality
$$g_3(x) > \frac{c_3}{x \log^4 x}, \qquad c_3 = \text{constant} > 0.$$
In general, if $g_n$ is the density of $X_1 + \cdots + X_n$ we find that around 0,
$$g_n(x) > \frac{c_n}{x |\log^{n+1} x|}, \qquad c_n = \text{constant} > 0.$$
Thus for each $n$, $g_n(x)$ takes an infinite value at $x = 0$. Since the density $f_n$ is obtained from $g_n$ by suitable norming, we conclude that $f_n(x)$ cannot converge to $\varphi(x)$. Therefore the sequence $\{X_n\}$ defined by the density (3) does not obey the local CLT although the integral CLT holds.
(ii) Let $\{X_n, n \ge 1\}$ be independent r.v.s where $X_n$ has density
(4) $f_n(x) = \begin{cases} 2^n, & \text{if } -2^{-n-2} \le x \le 2^{-n-2} \text{ or } 1 - 2^{-n-3} < |x| < 1 + 2^{-n-3} \\ 0, & \text{otherwise.} \end{cases}$
It is easy to see that $EX_k = 0$, $VX_k = \frac{1}{2} + 5/(3 \cdot 2^{2k+7})$. Then for an arbitrary $k \ge 1$, $\frac{1}{2} < VX_k < 1$, the Lindeberg condition is satisfied and hence the sequence $\{X_n\}$ obeys the (integral) CLT. Denote by $g_k(x)$, $x \in \mathbb{R}^1$, the density of the sum $S_k = X_1 + \cdots + X_k$. Then for $k = 2$, $g_2$ is the convolution of $f_1$ and $f_2$, that is
$$g_2(x) = (f_1 * f_2)(x) = \int_{-\infty}^{\infty} f_1(u) f_2(x - u) \, du.$$
Let us find the value of $g_2(x)$ at the point $x = \frac{1}{2}$. By (4) we have
$$f_1(u) \ne 0 \quad \text{if } -\tfrac{1}{8} \le u \le \tfrac{1}{8} \text{ or } \tfrac{15}{16} < |u| < \tfrac{17}{16},$$
$$f_2(\tfrac{1}{2} - u) \ne 0 \quad \text{if } \tfrac{7}{16} \le u \le \tfrac{9}{16} \text{ or } \tfrac{31}{32} < |\tfrac{1}{2} - u| < \tfrac{33}{32}.$$
Comparing the intervals where $f_1 \ne 0$ and $f_2 \ne 0$ we see that $g_2(\frac{1}{2}) = 0$. Analogously we find that
$$g_3(\tfrac{1}{2}) = \int_{-\infty}^{\infty} g_2(u) f_3(x - u) \, du \Big|_{x = \frac{1}{2}} = 0$$
and, more generally, that $g_n(\frac{1}{2}) = 0$ for all $n \ge 2$. It is not difficult to see that $g_n(x) = 0$ for all $x$ of the form $x = \frac{1}{2}(2m + 1)$, $m = 0, \pm 1, \pm 2, \ldots$ and finally that $g_n(x) = 0$ for all $x = \frac{1}{2}(2m + 1) + \delta$ where $m = 0, \pm 1, \pm 2, \ldots$ and $|\delta| < \frac{1}{4}$. The sum $S_n = X_1 + \cdots + X_n$ has $ES_n = 0$ and $VS_n = s_n^2 = \frac{1}{2}n + \frac{5}{1152}(1 - 2^{-2n})$. Since the density $g_n$ of $S_n$ and the density $p_n$ of $S_n/s_n$ satisfy the relation $p_n(x) = s_n g_n(x s_n)$, we have to study the behaviour of the quantity $s_n g_n(x s_n)$ as $n \to \infty$. Again, take $x = \frac{1}{2}$. Then
$$s_n g_n(\tfrac{1}{2} s_n) = \bigl[\tfrac{1}{2}n + \tfrac{5}{1152}(1 - 2^{-2n})\bigr]^{1/2} \, g_n\Bigl(\tfrac{1}{2}\bigl[\tfrac{1}{2}n + \tfrac{5}{1152}(1 - 2^{-2n})\bigr]^{1/2}\Bigr).$$
If $n$ is of the form $n = 2(2N + 1)^2$, then the argument of $g_n$ becomes
$$\tfrac{1}{2}(2N + 1)\bigl[1 + \tfrac{5}{1152}(1 - 2^{-2 \cdot 2(2N+1)^2})(2N + 1)^{-2}\bigr]^{1/2}.$$
For large $N$ this expression takes the form $\frac{1}{2}(2N + 1) + \delta$ with $|\delta| < \frac{1}{4}$. From the properties of $g_n$ established above we conclude that
$$s_n g_n(\tfrac{1}{2} s_n) = 0 \quad \text{for sufficiently large } n \text{ of this form.}$$
This implies that $p_n(\frac{1}{2}) \to 0$ along such $n$. However, $\varphi(\frac{1}{2}) \ne 0$ and thus relation (2) is not possible. Therefore the sequence $\{X_n\}$ defined by the densities (4) does not obey the local CLT. General conditions ensuring convergence of both the d.f.s and the densities are described by Gnedenko and Kolmogorov (1954).
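The variance formulas used in example (ii) can be double-checked with exact rational arithmetic. The following sketch (with our own helper names `variance_X` and `variance_S`) assumes the density has height $2^k$ on $[-2^{-k-2}, 2^{-k-2}]$ and on the two intervals $1 - 2^{-k-3} < |x| < 1 + 2^{-k-3}$, and integrates $x^2$ piece by piece.

```python
from fractions import Fraction


def variance_X(k: int) -> Fraction:
    """V X_k for the density equal to 2**k on the central and the two side intervals."""
    a = Fraction(1, 2 ** (k + 2))  # half-width of the central interval
    b = Fraction(1, 2 ** (k + 3))  # half-width of each side interval around +1 and -1
    height = 2 ** k
    central = height * 2 * a ** 3 / 3                    # integral of x^2 over [-a, a]
    side = height * ((1 + b) ** 3 - (1 - b) ** 3) / 3    # integral of x^2 over (1-b, 1+b)
    return central + 2 * side


def variance_S(n: int) -> Fraction:
    """V S_n = sum of V X_k for k = 1, ..., n."""
    return sum(variance_X(k) for k in range(1, n + 1))


if __name__ == "__main__":
    print(variance_X(1), variance_S(4))
```

The values agree with $\frac{1}{2} + 5/(3 \cdot 2^{2k+7})$ and with $\frac{1}{2}n + \frac{5}{1152}(1 - 2^{-2n})$.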
SECTION 18.
DIVERSE LIMIT THEOREMS
In this section we have collected examples dealing with different kinds of limit behaviour of random sequences. The examples concern random series, conditional expectations, records and maxima of random sequences, versions of the law of the iterated logarithm and net convergence. The definitions of some of the notions are given in the examples themselves. For convenience we formulate one result here and give one definition.
Kolmogorov three-series theorem. Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.s and $X_n^{(c)} = X_n 1_{[|X_n| \le c]}$ for some $c > 0$. A necessary condition for the convergence of $\sum_{n=1}^{\infty} X_n$ with probability 1 is that the series
$$\sum_{n=1}^{\infty} E[X_n^{(c)}], \qquad \sum_{n=1}^{\infty} V[X_n^{(c)}], \qquad \sum_{n=1}^{\infty} P[|X_n| \ge c]$$
converge for every $c > 0$. A sufficient condition is that these series are convergent for some $c > 0$.
The proof of this theorem and some useful corollaries can be found in the books by Breiman (1968), Chow and Teicher (1978), Shiryaev (1995). Now let us define the so-called net convergence (see Neveu 1975). Let $\mathcal{T}$ be the set of all bounded stopping times with respect to the family $(\mathcal{F}_n, n \in \mathbb{N})$. Here $(\mathcal{F}_n)$ is a non-decreasing sequence of sub-$\sigma$-fields of $\mathcal{F}$ and a stopping time $\tau$ is a function with values in $[0, \infty]$ such that $[\tau = n] \in \mathcal{F}_n$ for each $n \in \mathbb{N}$. The family $(a_\tau, \tau \in \mathcal{T})$ of real numbers, called a net, is said to converge to the real number $b$ provided for every $\varepsilon > 0$ there is $\tau_0 \in \mathcal{T}$ such that for all $\tau \in \mathcal{T}$ with $\tau \ge \tau_0$ we have $|a_\tau - b| < \varepsilon$.
Each of the examples given below contains appropriate references for further reading.
18.1. On the conditions in the Kolmogorov three-series theorem
(i) Let $\{X_n, n \ge 1\}$ be independent r.v.s with $EX_n = 0$, $n \ge 1$. Then the condition $\sum_{n=1}^{\infty} VX_n < \infty$ implies that $\sum_{n=1}^{\infty} X_n$ converges a.s. Note that this is one of the simplest versions of the Kolmogorov three-series theorem.
Let us show that the condition $\sum_{n=1}^{\infty} VX_n < \infty$ is not necessary for the convergence of $\sum_{n=1}^{\infty} X_n$. Indeed, consider the sequence $\{X_n, n \ge 1\}$ of independent r.v.s where
Obviously $\sum_{n=1}^{\infty} VX_n = \infty$ but nevertheless the series $\sum_{n=1}^{\infty} X_n$ is convergent a.s. according to the Borel-Cantelli lemma.
(ii) The Kolmogorov three-series theorem yields the following result (see Chow and Teicher 1978): if $\{X_n, n \ge 1\}$ are independent r.v.s with $EX_n = 0$, $n \ge 1$, and
(1) $\sum_{n=1}^{\infty} E\bigl[X_n^2 1_{[|X_n| \le 1]} + |X_n| 1_{[|X_n| > 1]}\bigr] < \infty$
then the series $\sum_{n=1}^{\infty} X_n$ converges a.s. Let us clarify the role of condition (1) in the convergence of $\sum_{n=1}^{\infty} X_n$. For this purpose consider the sequence $\{\xi_n, n \ge 1\}$ of i.i.d. r.v.s with $P[\xi_1 = 1] = P[\xi_1 = -1] = \frac{1}{2}$ and define $X_n = \xi_n/\sqrt{n}$, $n \ge 1$. It is easy to check that for any $r > 2$ the following condition holds:
(2) $\sum_{n=1}^{\infty} E[|X_n|^r] < \infty.$
Condition (2) can be considered in some sense similar to (1). However the series $\sum_{n=1}^{\infty} X_n$ diverges a.s. This shows that the power 2 in the first term of the summands in (1) is essential. Finally let us note that if condition (2) is satisfied for some $0 < r \le 2$ then the series $\sum_{n=1}^{\infty} X_n$ does converge a.s. (see Loève 1978).
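For $X_n = \xi_n/\sqrt{n}$ we have $E[|X_n|^r] = n^{-r/2}$, so condition (2) reduces to the convergence of a numerical series, which is easy to inspect. A minimal sketch (the function name `partial_sum` is ours):

```python
def partial_sum(r: float, terms: int) -> float:
    """Partial sum of E|X_n|**r = n**(-r/2) for n = 1, ..., terms."""
    return sum(n ** (-r / 2) for n in range(1, terms + 1))


if __name__ == "__main__":
    for terms in (10 ** 3, 10 ** 4, 10 ** 5):
        # r = 3: bounded partial sums; r = 2: the harmonic series sum of V X_n, which diverges
        print(terms, partial_sum(3, terms), partial_sum(2, terms))
```

For $r = 3$ the partial sums stay bounded, while for $r = 2$ (the series of variances) they grow like $\log n$.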
18.2. The independency condition is essential in the Kolmogorov three-series theorem
Let us start with a direct consequence of the Kolmogorov three-series theorem (sometimes called the 'two-series' theorem). If $X_n$, $n \ge 1$, are independent r.v.s and the series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} \sigma_n^2$, with $a_n = EX_n$, $\sigma_n^2 = VX_n$, are convergent, then the random series $\sum_{n=1}^{\infty} X_n$ is convergent with probability 1. Our goal is to show that the independency property for $X_n$, $n \ge 1$, is essential for this and similar results to hold.
(i) Let $\xi$ be a r.v. with $E\xi = 0$ and $0 < V\xi = b^2 < \infty$ (i.e. $\xi$ is non-degenerate). Define $X_n = \xi/n$, $n \ge 1$. Then $a_n = EX_n = 0$, $\sigma_n^2 = VX_n = b^2/n^2$, implying that
$$\sum_{n=1}^{\infty} a_n = 0 \quad \text{and} \quad \sum_{n=1}^{\infty} \sigma_n^2 = b^2 \sum_{n=1}^{\infty} (1/n^2) < \infty.$$
Hence two of the conditions in the above result are satisfied and one condition, the independence of $X_n$, $n \ge 1$, is not. Nevertheless the question about the convergence of the random series $\sum_{n=1}^{\infty} X_n$ is reasonable. Since
$$\sum_{n=1}^{\infty} X_n(\omega) = \xi(\omega) \sum_{n=1}^{\infty} (1/n),$$
the series $\sum_{n=1}^{\infty} X_n(\omega)$ is convergent on the set $A = \{\omega : \xi(\omega) = 0\}$ and divergent on the set $A^c = \{\omega : \xi(\omega) \ne 0\}$. If the non-degenerate r.v. $\xi$ is such that $P(A) = p$ where $p$ is any number in $[0, 1)$, we get a random series $\sum_{n=1}^{\infty} X_n$ which is convergent with probability $p$ (strictly less than 1) and divergent with probability $1 - p$ (strictly greater than 0).
(ii) In case (i) the dependence among the variables $X_n$, $n \ge 1$, is 'quite strong': any two of them are functionally related. Let us see if the independence of $X_n$, $n \ge 1$, can be weakened and replaced e.g. by the exchangeability property. We use the following modification of the Kolmogorov three-series theorem. If $X_n$, $n \ge 1$, are i.i.d. r.v.s with $EX_1 = 0$, $E[X_1^2] < \infty$ and $c_n$, $n \ge 1$, are real numbers with $\sum_{n=1}^{\infty} c_n^2 < \infty$, then the random series $\sum_{n=1}^{\infty} c_n X_n$ is convergent with probability 1. Let us now consider the sequence of i.i.d. r.v.s $\xi_n$, $n \ge 1$, with $E\xi_1 = 0$ and $E[\xi_1^2] < \infty$ and let $\eta$ be another r.v. with $E\eta = 0$, $0 < E[\eta^2] < \infty$ and independent of $\{\xi_n, n \ge 1\}$. Define the sequence $X_n$, $n \ge 1$, by
$$X_n = \xi_n + \eta, \qquad n \ge 1.$$
Thus $X_n$, $n \ge 1$, is a sequence of identically distributed r.v.s with $EX_1 = 0$ and $E[X_1^2] < \infty$. Obviously the variables $X_n$, $n \ge 1$, are not independent. However $X_n$, $n \ge 1$, is an exchangeable sequence. (See also Example 13.8.) Our goal is to study the convergence of the series $\sum_{n=1}^{\infty} c_n X_n$ where $c_n$, $n \ge 1$, satisfy the condition $\sum_{n=1}^{\infty} c_n^2 < \infty$. Choose $c_n$, $n \ge 1$, such that $c_n > 0$ for any $n$ and $\sum_{n=1}^{\infty} c_n = \infty$ (an easy case is $c_n = 1/n$). Since $c_n X_n = c_n \xi_n + c_n \eta$, we have
$$\sum_{n=1}^{\infty} c_n X_n = \sum_{n=1}^{\infty} c_n \xi_n + \eta \sum_{n=1}^{\infty} c_n.$$
The independence of $\xi_n$, $n \ge 1$, implies that the series $\sum_{n=1}^{\infty} c_n \xi_n$ is convergent a.s. Hence, in view of $\sum_{n=1}^{\infty} c_n = \infty$, the series $\sum_{n=1}^{\infty} c_n X_n$ is convergent on the set $A = \{\omega : \eta(\omega) = 0\}$ and divergent on $A^c = \{\omega : \eta(\omega) \ne 0\}$. For a preliminarily given $p$, $p \in [0, 1)$, take the r.v. $\eta$ such that $P(A) = p$. Then the random series $\sum_{n=1}^{\infty} c_n X_n$ of exchangeable (but not independent) variables is convergent with probability $p < 1$ and divergent with probability $1 - p > 0$. We have seen in both cases (i) and (ii) the role of the independence property for random series to converge with probability 1. The same examples lead to one additional conclusion. According to the Kolmogorov 0-1 law, if $X_n$, $n \ge 1$, are independent r.v.s, then the set $\{\omega : \sum_{n=1}^{\infty} X_n(\omega) \text{ converges}\}$ has probability 0 or 1. Hence, if $X_n$, $n \ge 1$, are not independent we can obtain $P[\omega : \sum_{n=1}^{\infty} X_n(\omega) \text{ converges}] = p$ for arbitrarily given $p \in [0, 1)$.
18.3. The interchange of expectations and infinite summation is not always possible
Let us start with the formulation of a result showing that in some cases the operations of expectation and summation can be interchanged (see Chow and Teicher 1978). If $\{X_n, n \ge 1\}$ are non-negative r.v.s then
(1) $E\Bigl[\sum_{n=1}^{\infty} X_n\Bigr] = \sum_{n=1}^{\infty} EX_n.$
Our aim now is to show that (1) is not true without the non-negativity of the variables $X_n$ even if the series $\sum_{n=1}^{\infty} X_n$ is convergent. Consider $\{\xi_n, n \ge 1\}$ to be i.i.d. r.v.s with $P[\xi_1 = \pm 1] = \frac{1}{2}$ and define the stopping time $\tau = \inf\{n \ge 1 : \sum_{k=1}^{n} \xi_k = 1\}$ where $\inf\{\emptyset\} = \infty$. Then it is easy to check that $P[\tau < \infty] = 1$. Setting $X_n = \xi_n 1_{[\tau \ge n]}$, we get from the definition of $\tau$ that
$$\sum_{n=1}^{\infty} X_n = \sum_{n=1}^{\infty} \xi_n 1_{[\tau \ge n]} = \sum_{n=1}^{\tau} \xi_n = 1 \quad \Rightarrow \quad E\Bigl[\sum_{n=1}^{\infty} X_n\Bigr] = 1.$$
However, the event $[\tau \ge n] \in \sigma\{\xi_1, \ldots, \xi_{n-1}\}$, the r.v.s $\xi_n$ and $1_{[\tau \ge n]}$ are independent and from the properties of the expectation we obtain
$$EX_n = E\xi_n \, E1_{[\tau \ge n]} = 0, \qquad n \ge 1.$$
Thus $\sum_{n=1}^{\infty} EX_n = 0$ and therefore (1) is not satisfied.
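The mechanism behind this example can be seen by exact enumeration over a finite horizon. The sketch below (the name `stopped_walk_stats` is ours) lists all $\pm 1$ paths of a given length, stops each path at the first visit to 1, and returns the exact expectation of the stopped partial sum together with the probability that the walk has already stopped.

```python
from fractions import Fraction
from itertools import product


def stopped_walk_stats(horizon: int) -> tuple:
    """Return (E[X_1 + ... + X_horizon], P[tau <= horizon]) by enumerating all paths."""
    total = Fraction(0)
    hits = 0
    for steps in product((1, -1), repeat=horizon):
        position = 0
        for step in steps:
            position += step      # X_n = xi_n as long as tau >= n
            if position == 1:     # tau reached: all later X_n vanish
                hits += 1
                break
        total += position
    paths = 2 ** horizon
    return total / paths, Fraction(hits, paths)


if __name__ == "__main__":
    for horizon in (1, 3, 6, 12):
        print(horizon, stopped_walk_stats(horizon))
```

Every finite partial sum has expectation exactly 0, although the probability that the sum has already reached its final value 1 increases with the horizon.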
18.4. A relationship between a convergence of random sequences and convergence of conditional expectations
On the probability space $(\Omega, \mathcal{F}, P)$ we have given r.v.s $X$ and $X_n$, $n \ge 1$, all in the space $L^r$ (i.e. $r$-integrable) for some $r \ge 1$. Suppose $X_n \xrightarrow{L^r} X$ as $n \to \infty$. Then for any sub-$\sigma$-field $\mathcal{A} \subset \mathcal{F}$ we have $E[X_n|\mathcal{A}] \xrightarrow{L^r} E[X|\mathcal{A}]$ as $n \to \infty$ (e.g. see Neveu 1975 or Shiryaev 1995). This statement is a consequence of the Jensen inequality for conditional expectations. Obviously, we can ask the inverse question and the best way to answer it is to consider a specific example. Let $X_n$, $n \ge 1$, be i.i.d. r.v.s with $P[X_1 = 2c] = P[X_1 = 0] = \frac{1}{2}$, where $c \ne 0$ is a fixed real number. Take also (a trivial) r.v. $X = c$: $P[X = c] = 1$. Then, if $\mathcal{A} = \{\emptyset, \Omega\}$, the trivial $\sigma$-field, we obviously get
$$E[X_n|\mathcal{A}] = EX_n = 2c \cdot \tfrac{1}{2} + 0 \cdot \tfrac{1}{2} = c = E[X|\mathcal{A}] \quad \text{for any } n \ge 1.$$
Moreover because $X$, $X_n$ are bounded, then for any $r \ge 1$, one has
$$E[X_n|\mathcal{A}] \xrightarrow{L^r} E[X|\mathcal{A}] \quad \text{as } n \to \infty.$$
However $E[|X_n - X|^r] = E[|X_n - c|^r] = \frac{1}{2}|c|^r + \frac{1}{2}|c|^r = |c|^r$ for all $n \ge 1$, $c \ne 0$, and hence $X_n \not\xrightarrow{L^r} X$ as $n \to \infty$.
18.5. The convergence of a sequence of random variables does not imply that the corresponding conditional medians converge
Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\mathcal{F}_0 = \{\emptyset, \Omega\}$ the trivial $\sigma$-field and $\mathcal{D}$ a sub-$\sigma$-field of $\mathcal{F}$. If $X$ is a r.v., then the conditional median of $X$ with respect to $\mathcal{D}$ is defined as a $\mathcal{D}$-measurable r.v. $M$ such that
$$P[X \ge M|\mathcal{D}] \ge \tfrac{1}{2} \le P[X \le M|\mathcal{D}] \quad \text{a.s.}$$
Usually the conditional median is denoted by $\mu(X|\mathcal{D})$ (see Example 6.10). If $\{X_n, n \ge 1\}$ is a sequence of r.v.s which is convergent in a definite sense, then it is logical to expect that the corresponding sequence of the conditional medians also will be convergent. In this connection let us formulate the following result (see Tomkins 1975a). Let $\{X_n, n \ge 1\}$ and $\{M_n, n \ge 1\}$ be sequences of r.v.s such that for a given $\sigma$-field $\mathcal{D}$ we have $M_n = \mu(X_n|\mathcal{D})$ a.s. and there exist r.v.s $X$ and $M$ such that $X_n \xrightarrow{P} X$ and $M_n \xrightarrow{P} M$ as $n \to \infty$. Then $M = \mu(X|\mathcal{D})$ a.s. We can now try to answer the question of whether the convergence of $\{X_n\}$ always implies convergence of the conditional medians $\{M_n\}$. Let $\xi$ be a r.v. distributed uniformly on the interval $(-\frac{1}{2}, \frac{1}{2})$, $\mathcal{D} = \mathcal{F}_0$ (the trivial $\sigma$-field) and define the sequence $\{X_n\}$ by
It is easy to see that $X_n \xrightarrow{P} X$ as $n \to \infty$. Moreover, $X_n$ has a unique median $M_n$ and $M_n = 0$ or 1 accordingly as $n$ is odd or even. But clearly the sequence $\{M_n\}$ does not converge. (It would be useful for the reader to compare this example with the result cited above.)
A sequence of conditional expectations can converge only on a set of measure zero
If (O,:t, P) is a complete probability space, (:tn, n E N) an increasing family of sub-a-fields of 1",1"00 = limn-too 1"n and X a positive r. v., the following result holds (see Neveu 1975):
(1)
E(XI1"n] u)E[XI1"oo] outsidetheset {w:E[XI1"nl=oo forall n}.
197
LIMIT THEOREMS
We shall show that this result cannot be improved. More precisely, we give an example of an a.s. finite ~oo-measurable r.v. X such that E[XI~nJ = 00 a.s. for all n E N. Clearly in such a case the convergence in (1) holds only on a set of measure zero. Let n = [0, 1J. ~ = 13[0,1] and P be the Lebesgue measure. Consider the increasing sequence (~n) of sub-a-fields of ~ where ~n is generated by the dyadic partitions {(2- n k,2- n (k + 1)),0 ~ k < 2 n ,n EN}. For each n E N choose a positive measurabJefunctionln : [0, I) t-+ jR+ ofperiod2- n with
10' fn(w) dw =
I
and
10' I 1/. >oj (w) dw = 2-
n
,
Since the sum E:'= I ] [fn >0] is integrable, and hence a.s. finite, then X = E:'= lin is a positive r.v. which is finite a.s. Thus the series E~=I In contains no more than a finite number of non-zero terms for almost all w. On the other hand, for aJJ n E N and all k, 0 ~ k < 2n , we have
by the periodicity of 1m. Therefore we have shown that E[ X I~n1 = in (1) holds only on a set of measure zero.
18.7.
00
for all n E N and the a.s. convergence
When is a sequence oCconditional expectations convergent almost surely?
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_n, n \ge 1\}$ an independent sequence of sub-$\sigma$-fields of $\mathcal{F}$, that is, for $k = 1, 2, \ldots$ and $A_j \in \mathcal{F}_j$, $1 \le j \le k$, we have
$$P(A_1 A_2 \cdots A_k) = P(A_1) P(A_2) \cdots P(A_k).$$
Let $X$ be an integrable r.v. with $EX = a$ and let $X_n = E[X|\mathcal{F}_n]$. The following result is proved by Basterfield (1972): if $E[|X| \log^+ |X|] < \infty$ then $P[X_n \to a \text{ as } n \to \infty] = 1$ ($\log x$ is defined for $x > 0$ and $\log^+ x = \log x$ if $x > 1$, and 0 if $0 < x \le 1$). We aim to show that the assumption in this result cannot be weakened, e.g. it cannot be replaced by $E[|X|] < \infty$. To see this, consider a sequence $\{A_n, n \ge 1\}$ of independent events with $P(A_n) = 1/n$. Define the r.v. $X$ by
$$X = \sum_{m=1}^{\infty} m! \, \xi_{m+2}$$
where $\xi_m$ is the indicator function of the event $A_1 A_2 \cdots A_m$. Since
$$E\xi_m = P(A_1 A_2 \cdots A_m) = 1/m!$$
we obtain
$$EX = \sum_{m=1}^{\infty} \frac{m!}{(m+2)!} = \sum_{m=1}^{\infty} \Bigl(\frac{1}{m+1} - \frac{1}{m+2}\Bigr) = \frac{1}{2}.$$
Moreover, it is not difficult to verify that $E[X \log^+ X] = \infty$.
Consider now $\mathcal{F}_n = \{\emptyset, A_n, A_n^c, \Omega\}$ and $X_n = E[X|\mathcal{F}_n]$. We need to check if $X_n \xrightarrow{a.s.} \frac{1}{2}$ as $n \to \infty$. Since $E[X|A_n] = (1/P(A_n)) \int_{A_n} X \, dP = n \int_{A_n} X \, dP$, replacing $X$ by $\sum_{m=1}^{\infty} m! \, \xi_{m+2}$ we arrive at the equality
$$E[X|A_n] = \tfrac{3}{2} \quad (n \ge 3).$$
However, $\sum_{n=1}^{\infty} P(A_n) = \infty$ and, by the Borel-Cantelli lemma, almost all $\omega$ belong to infinitely many $A_n$. Therefore
$$\limsup_{n\to\infty} X_n = \tfrac{3}{2} \ne \tfrac{1}{2} = a = EX.$$
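The value of $E[X|A_n]$ can be confirmed with exact arithmetic on a truncated version of $X$. In the sketch below (the function name and the truncation level `terms` are our assumptions) the sum defining $X$ is cut at $m = $ `terms`; using $P(A_1 \cdots A_{m+2} A_n) = 1/(m+2)!$ when $n \le m + 2$ and $1/((m+2)! \, n)$ otherwise, the truncated conditional mean equals $\frac{3}{2} - n/(\text{terms} + 2)$ for $n \ge 3$.

```python
from fractions import Fraction
from math import factorial


def conditional_mean_truncated(n: int, terms: int) -> Fraction:
    """n * E[X_M 1_{A_n}] for the truncated variable X_M = sum_{m=1}^{terms} m! * xi_{m+2}."""
    total = Fraction(0)
    for m in range(1, terms + 1):
        joint = Fraction(1, factorial(m + 2))  # P(A_1 ... A_{m+2})
        if m + 2 < n:
            joint *= Fraction(1, n)            # A_n is not among A_1, ..., A_{m+2}
        total += factorial(m) * joint
    return n * total


if __name__ == "__main__":
    for terms in (98, 998):
        print(terms, conditional_mean_truncated(5, terms))
```

As the truncation level grows the value increases to $\frac{3}{2}$, in agreement with the equality above.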
Thus the condition $E[|X| \log^+ |X|] < \infty$ cannot be replaced by $E[|X|] < \infty$ so as to preserve the convergence $X_n = E[X|\mathcal{F}_n] \xrightarrow{a.s.} a = EX$ as $n \to \infty$.
18.8. The Weierstrass theorem for the unconditional convergence of a numerical series does not hold for a series of random variables
Let L:~=I an be an infinite series of real numbers. This series is said to converge unconditionally if L:r=1 an", < 00 for every rearrangement {nl,n2,"'} of {I, 2, ... }. (By rearrangement we understand a one-one map of N onto N.) We say that the series L:~=I an converges absolutely if L:~=I Ian I < 00. According to the classical Weierstrass theorem these two concepts, unconditional convergence and absolute convergence, are equivalent. Thus we arrive at the question: what happens when considering random series L:~=I Xn(w), that is series ofr.v.s? Let {Xn' n ~ I} be a sequence of r.v.s defined on some probability space (il, :f, P). The series L:~=I Xn is said to be a.s. unconditionally convergent if for every rearrangement {nd of N we have L:r=1 X n ", < 00 a.s. If L:~=I IXnl < 00 a.s., the given series is a.s. absolutely convergent. So, bearing in mind the Weierstrass theorem, we could suppose that the concepts a.s. unconditional and a.s. absolute convergence are equivalent. However, as will be seen later, such a conjecture is not generally true.
LIMIT THEOREMS
199
Consider the sequence $\{r_n, n = 0, 1, 2, \ldots\}$ of the so-called Rademacher functions, that is $r_n(\omega) = \operatorname{sign} \sin(2^n \pi \omega)$, $0 \le \omega \le 1$, $n = 0, 1, \ldots$ (see Lukacs 1975). Actually $r_n$ can also be written in the form
$$r_n(\omega) = \begin{cases} 1, & \text{if } 2k/2^n < \omega < (2k+1)/2^n, \\ -1, & \text{if } (2k+1)/2^n < \omega < (2k+2)/2^n, \\ 0, & \text{if } \omega = k/2^n, \ k = 0, 1, \ldots, 2^n. \end{cases}$$
Then $\{r_n\}$ is a sequence of independent r.v.s on the probability space $(\Omega, \mathcal{F}, P)$ with $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}[0, 1]$ and $P$ the Lebesgue measure. Moreover, $r_n$ takes the values $1$ and $-1$ with probability $\frac{1}{2}$ each, $E r_n = 0$, $V r_n = 1$. Now take any numerical sequence $\{a_n\}$ such that
$$\sum_{n=1}^{\infty} a_n^2 < \infty \quad \text{but} \quad \sum_{n=1}^{\infty} |a_n| = \infty.$$
For example, $a_n = (-1)^n/(n+1)$. Using the sequence $\{r_n\}$ of the Rademacher functions and the numerical sequence $\{a_n\}$ we construct the series
(1) $\sum_{n=1}^{\infty} a_n r_n(\omega).$
Applying the Kolmogorov three-series theorem we easily conclude that this series is a.s. convergent. If $\{n_k\}$ is any rearrangement of $\mathbb{N}$, then $\sum_{k} a_{n_k}^2 = \sum_{n} a_n^2 < \infty$, so the series $\sum_{k=1}^{\infty} a_{n_k} r_{n_k}(\omega)$ is also a.s. convergent. However, $\sum_{n=1}^{\infty} a_n r_n(\omega)$ is not absolutely convergent since $|r_n(\omega)| = 1$ a.s. and $\sum_{n=1}^{\infty} |a_n| = \infty$. Therefore the series (1) is a.s. unconditionally convergent but not a.s. absolutely convergent, and so these two concepts of convergence of random series are not equivalent.
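The ingredients of this example can be checked numerically. The following Python sketch is an illustration added here, not part of the original text; the function names are hypothetical. It evaluates the Rademacher functions exactly at rational points and compares the partial sums of $\sum |a_n r_n(\omega)|$ (which grow like the harmonic series) with those of $\sum a_n^2$ (which stay bounded) for $a_n = (-1)^n/(n+1)$.

```python
from fractions import Fraction


def rademacher(n, w):
    """r_n(w) = sign(sin(2^n * pi * w)) for a rational w in [0, 1], computed exactly."""
    scaled = Fraction(w) * 2 ** n
    if scaled.denominator == 1:
        # w = k / 2^n, so the sine vanishes
        return 0
    # floor(2^n * w) even -> sine positive, odd -> sine negative
    return 1 if (scaled.numerator // scaled.denominator) % 2 == 0 else -1


def a(n):
    """The example coefficients a_n = (-1)^n / (n + 1)."""
    return Fraction((-1) ** n, n + 1)


def abs_and_square_sums(w, N):
    """Partial sums of sum |a_n r_n(w)| and of sum a_n^2 for n = 1..N."""
    absolute = sum(abs(a(n) * rademacher(n, w)) for n in range(1, N + 1))
    squares = sum(a(n) ** 2 for n in range(1, N + 1))
    return absolute, squares
```

At any non-dyadic point the first partial sum keeps growing with $N$ while the second remains below $\pi^2/6 - 1$. Note that an individual point $\omega$ may behave atypically for the signed series; the a.s. statements above concern Lebesgue-almost all $\omega$.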
18.9. A condition which is sufficient but not necessary for the convergence of a random power series
Let $a_n = a_n(\omega)$, $n = 0, 1, 2, \ldots$, be a sequence of i.i.d. r.v.s. The random power series, that is a series of the type $\sum_{n=0}^{\infty} a_n(\omega) z^n$, is defined in the standard manner (see Lukacs 1975). As in the deterministic case (when $a_n$ are numbers), one of the basic problems is to find the so-called radius of convergence $r = r(\omega)$. This $r$ is a r.v. such that for all $|z| < r$ the series $\sum_{n=0}^{\infty} a_n(\omega) z^n$ is a.s. convergent. Moreover, $r(\omega) = (\limsup_{n \to \infty} |a_n(\omega)|^{1/n})^{-1}$. Among the variety of results concerning random power series, we formulate here the following (see Lukacs 1975). If $\{a_n, n \ge 0\}$ are i.i.d. r.v.s and the d.f. $F(x)$, $x \in \mathbb{R}^{+}$, of $|a_1|$ satisfies the condition
(1) $\int_{1}^{\infty} \log x \, dF(x) < \infty$
then the random power series $\sum_{n=0}^{\infty} a_n(\omega) z^n$ has a radius of convergence $r(\omega)$ such that $P[r(\omega) \ge 1] = 1$. Let us show by a concrete example that condition (1) is not necessary for the existence of $r$ with $P[r \ge 1] = 1$. Take $\xi$ as a r.v. distributed uniformly on the interval $[0, 1]$. Define $a_n$ by $a_n(\omega) = \exp(1/\xi(\omega))$. Then the common d.f. of $a_n$ is
$$F(x) = \begin{cases} 0, & \text{if } x < e, \\ 1 - 1/\log x, & \text{if } x \ge e. \end{cases}$$
Clearly
$$\int_{1}^{\infty} \log x \, dF(x) = \infty$$
and condition (1) is not satisfied. However, for any $\varepsilon > 0$ we have
$$P[\limsup_{n \to \infty} [|a_n| \ge (1 + \varepsilon)^n]] = P[\limsup_{n \to \infty} [0, 1/(n \log(1 + \varepsilon))]] = 0.$$
This relation, the definition of a radius of convergence and a result of Lukacs (1975) allow us to conclude that $P[r(\omega) < x] = 0$ for all $x \in (0, 1]$. For $x = 1$ we get $P[r(\omega) \ge 1] = 1$. Thus condition (1) is not necessary for the random power series $\sum_{n=0}^{\infty} a_n(\omega) z^n$ to have a radius of convergence $r \ge 1$.
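The divergence of the logarithmic moment can be made concrete. The sketch below is an added illustration with hypothetical function names. It assumes the d.f. $F(x) = 1 - 1/\log x$ for $x \ge e$, which follows from $a_n = \exp(1/\xi)$ with $\xi$ uniform on $[0, 1]$; with the substitution $u = \log x$ the truncated integral $\int_e^T \log x \, dF(x)$ becomes $\int_1^{\log T} du/u = \log \log T$, which grows without bound.

```python
import math


def F(x):
    """D.f. of a_n = exp(1/xi) with xi uniform on [0, 1]."""
    return 0.0 if x < math.e else 1.0 - 1.0 / math.log(x)


def truncated_log_moment(T, steps=200_000):
    """Midpoint-rule value of the integral of log x dF(x) over [e, T].

    With u = log x the integrand becomes du / u on [1, log T].
    """
    upper = math.log(T)
    h = (upper - 1.0) / steps
    return sum(h / (1.0 + (i + 0.5) * h) for i in range(steps))
```

Comparing `truncated_log_moment(T)` with `math.log(math.log(T))` for increasing `T` shows the slow but unbounded growth of the integral.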
18.10. A random power series without a radius of convergence in probability
As before consider a random power series and its partial sums
$$\sum_{n=0}^{\infty} a_n(\omega) z^n \quad \text{and} \quad U_N(z, \omega) = \sum_{n=0}^{N} a_n(\omega) z^n$$
where the coefficients $a_n(\omega)$, $n = 0, 1, \ldots$, are given r.v.s. If $U_N(z)$ are convergent in some definite sense as $N \to \infty$ then the random power series is said to converge in the same sense. There are several interesting results about the existence of the radii of convergence if we consider a.s. convergence, convergence in probability and $L^r$-convergence. Note that a.s. convergence was treated in Example 18.9. Now we aim to show that no circle of convergence exists for convergence in probability of a random power series. For this let $\{a_n(\omega), n \ge 0\}$ be independent r.v.s with
$$P[a_0 = 0] = 1, \quad P[a_n = n^n] = 1/n, \quad P[a_n = 0] = 1 - 1/n, \quad n \ge 1.$$
It is easy to check that the power series $\sum_{n=0}^{\infty} a_n(\omega) z^n$ is a.s. divergent, that is its radius of convergence is $r_0 = 0$. Clearly, this series cannot converge in probability or in $L^r$-sense. From the definition of $a_n$ we find that $P[a_n \neq 0] = 1/n \to 0$, that is $a_n \to 0$ in probability as $n \to \infty$.
Define another power series, say $\sum_{n=0}^{\infty} b_n(\omega) z^n$, whose coefficients $b_n$ are given by $b_0 = a_0$ and $b_n = a_n - a_{n-1}$ for $n \ge 1$. Obviously,
$$P\text{-}\lim_{N \to \infty} \sum_{n=0}^{N} b_n 1^n = P\text{-}\lim_{N \to \infty} a_N = 0.$$
Furthermore, we have
(1) $\sum_{n=0}^{N} b_n(\omega) z^n = a_N(\omega) z^N + (1 - z) \sum_{n=0}^{N-1} a_n(\omega) z^n.$
It is clear that the series $\sum_{n=0}^{\infty} b_n(\omega) z^n$ converges in probability at least at two points, namely $z = 0$ and $z = 1$. If we suppose that it is convergent for a point $z$ such that $z \neq 0$ and $z \neq 1$, we derive from (1) that
$$\sum_{n=0}^{N-1} a_n(\omega) z^n = \frac{1}{1 - z} \left[ \sum_{n=0}^{N} b_n(\omega) z^n - a_N(\omega) z^N \right]$$
which must also converge in probability as $N \to \infty$ (note that $a_N z^N \to 0$ in probability). However, this contradicts the fact that $r_0 = 0$. Therefore in general the random power series has no circle of convergence in probability. Finally, note that Arnold (1967) characterized probability spaces in which every random power series has a circle of convergence in probability. Such a property holds iff the probability space is atomic.
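The summation-by-parts identity $\sum_{n=0}^{N} b_n z^n = a_N z^N + (1 - z) \sum_{n=0}^{N-1} a_n z^n$ with $b_0 = a_0$, $b_n = a_n - a_{n-1}$ is purely algebraic and can be verified in exact arithmetic. The following Python sketch is an added illustration; the helper names and the seeded sampling of the coefficients are assumptions made only for this check.

```python
from fractions import Fraction
import random


def check_identity(a, z):
    """Check sum_{n<=N} b_n z^n == a_N z^N + (1 - z) sum_{n<N} a_n z^n exactly."""
    N = len(a) - 1
    b = [a[0]] + [a[n] - a[n - 1] for n in range(1, N + 1)]
    lhs = sum(b[n] * z ** n for n in range(N + 1))
    rhs = a[N] * z ** N + (1 - z) * sum(a[n] * z ** n for n in range(N))
    return lhs == rhs


def sample_coefficients(N, rng):
    """One realization of a_0 = 0 and a_n = n^n with probability 1/n, else 0."""
    return [Fraction(0)] + [
        Fraction(n ** n if rng.random() < 1.0 / n else 0) for n in range(1, N + 1)
    ]
```

For instance, `check_identity(sample_coefficients(12, random.Random(0)), Fraction(1, 3))` compares both sides for one sampled coefficient sequence.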
18.11. Two sequences of random variables can obey the same strong law of large numbers but one of them may not be in the domain of attraction of the other
Let $\{X_k, k \ge 1\}$ and $\{Y_k, k \ge 1\}$ each be a sequence of i.i.d. r.v.s. Omitting subscripts, we say that $X$ and $Y$ obey the same SLLN if for each number sequence $\{a_n, n \ge 1\}$ with $0 < a_n \uparrow \infty$ either
(1) $\sum_{k=1}^{n} X_k = o(a_n)$ and $\sum_{k=1}^{n} Y_k = o(a_n)$ a.s.
or
(2) $\limsup_{n \to \infty} (1/a_n) \sum_{k=1}^{n} X_k = \infty$ and $\limsup_{n \to \infty} (1/a_n) \sum_{k=1}^{n} Y_k = \infty$ a.s.
We also need the following result of Stout (1979): $X$ and $Y$ obey the same SLLN iff
(3) $P[|Y| > x]/P[|X| > x] = O(x)$ as $x \to \infty$.
Note that the statement that two r.v.s $X$ and $Y$, or more exactly two sequences $\{X_k, k \ge 1\}$ and $\{Y_k, k \ge 1\}$, obey the same SLLN is closely related to another statement involving the so-called domain of attraction. Let $U$ and $V$ be r.v.s with d.f.s $G$ and $H$ and ch.f.s
}RI.
n-t<Xl
We write $U \in N(\gamma)$ to denote that $U$ is in the domain of normal attraction of a stable law with index $\gamma$. Now let $X \in N(\gamma_X)$ and $Y \in N(\gamma_Y)$ where $\gamma_X < 2$, $\gamma_Y < 2$. Then, as a consequence of a result by Gnedenko and Kolmogorov (1954), we obtain that $X$ and $Y$ obey the same SLLN iff $\gamma_X = \gamma_Y$. Thus we come to the question: can a r.v. $Y$ fail to be in the domain of normal attraction of a stable law $X$ and yet obey the same SLLN as $X$? By an example we show that the answer is positive. Consider a r.v. $X$ with a Cauchy distribution and a r.v. $Y$ whose d.f. $F$ is given by
(4) $F(x) = \begin{cases} 1 - (x + 3)^{-1}[2 + \sin(\log x)], & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}$
It is easy to check that $X$ and $Y$ satisfy condition (3) and hence $X$ and $Y$ obey the same SLLN in the sense of (1) and (2). According to a result by Gnedenko and Kolmogorov (1954) the r.v. $Y$ is in the domain of attraction of a Cauchy distribution only if
(5) $P[|Y| > x] = (\alpha + \beta(x))/x$
for some $\alpha = \text{constant} > 0$ and $\beta(x) \to 0$ as $x \to \infty$. However, the d.f. $F$ given by (4) does not satisfy (5): by (4), $x P[|Y| > x] = x[2 + \sin(\log x)]/(x + 3)$ keeps oscillating between values close to 1 and values close to 3 and so has no limit as $x \to \infty$. Therefore $Y$ is not in the domain of normal attraction of the Cauchy-distributed r.v. $X$ despite the fact that $X$ and $Y$ obey the same SLLN.
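The oscillation that rules out (5) is easy to observe numerically. The Python sketch below is an added illustration with hypothetical function names. It takes $P[Y > x] = (2 + \sin(\log x))/(x + 3)$ from (4) and, as an assumption, a standard Cauchy r.v. $X$ with $P[|X| > x] = (2/\pi) \arctan(1/x)$ for $x > 0$. It evaluates $x P[Y > x]$ along the points $x = \exp(2\pi k \pm \pi/2)$, where the sine equals $\pm 1$, and also the ratio of the two tails.

```python
import math


def tail_Y(x):
    """P[Y > x] = (2 + sin(log x)) / (x + 3) for x > 0, from the d.f. (4)."""
    return (2.0 + math.sin(math.log(x))) / (x + 3.0)


def tail_cauchy(x):
    """P[|X| > x] for a standard Cauchy r.v. X and x > 0."""
    return (2.0 / math.pi) * math.atan(1.0 / x)


def scaled_tail(x):
    """x * P[Y > x]; condition (5) would force this to converge."""
    return x * tail_Y(x)


def tail_ratio(x):
    """P[Y > x] / P[|X| > x]."""
    return tail_Y(x) / tail_cauchy(x)
```

Along the first family of points `scaled_tail` approaches 3, along the second it approaches 1, while `tail_ratio` stays between fixed positive bounds for large `x`.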
18.12. Does a sequence of random variables always imitate normal behaviour?
Let $F(x)$, $x \in \mathbb{R}^1$, be a d.f. with zero mean and variance 1. Consider the sequence $\{X_n, n \ge 1\}$ of i.i.d. r.v.s whose d.f. is $F$, and another sequence of independent r.v.s $\{Y_n, n \ge 1\}$ each distributed $N(0, 1)$. We also have a non-decreasing sequence $\{a_n, n \ge 1\}$ of real numbers. As usual, let $S_n = X_1 + \ldots + X_n$. We say that the sequence $\{S_n, n \ge 1\}$ (generated by $\{X_n\}$) imitates normal behaviour to within $\{a_n\}$ if there is a probability space with r.v.s $\{X_n\}$ and $\{Y_n\}$ defined on it such that $\{X_n\}$ are i.i.d. with a common d.f. $F$, $\{Y_n\}$ are independent $N(0, 1)$ and
(1) $\frac{1}{a_n} [(X_1 + \ldots + X_n) - (Y_1 + \ldots + Y_n)] \xrightarrow{a.s.} 0$ as $n \to \infty$.
Note that the first result of this type was obtained by Strassen (1964), who showed that every sequence $\{S_n\}$ with $EX_1 = 0$ and $E[X_1^2] = 1$ imitates normal behaviour to within $\{a_n\}$ with $a_n = (n \log \log n)^{1/2}$. He used this result to prove the law of iterated logarithm (LIL) for all such sequences. The question now is whether it is possible to choose a sequence $\{a_n\}$ 'smaller' than $\{(n \log \log n)^{1/2}\}$ and preserve the property described by (1). Some results in this direction can be found in Breiman (1967). Our aim now is to show that the condition on $\{a_n\}$, that is $a_n = (n \log \log n)^{1/2}$, cannot be weakened too much. More precisely, let us show that the sequence $\{S_n\}$ defined above does not imitate normal behaviour to within $\{b_n\}$ where $b_n = n^{1/2}$. Firstly, define the sequence $\{n_k, k \ge 2\}$ by
Thus the differences $n_{k+1} - n_k$, $k \ge 2$, are increasing. Suppose now that $\{S_n\}$ imitates normal behaviour to within $\{b_n\}$. Then for $Z_n = \varepsilon_1 + \ldots + \varepsilon_n$, sums of independent $N(0, 1)$ r.v.s, the series
$$\sum_{k=2}^{\infty} P[S_{n_{k+1} - n_k} > b_{n_{k+1}}] \quad \text{and} \quad \sum_{k=2}^{\infty} P[Z_{n_{k+1} - n_k} > b_{n_{k+1}}]$$
must converge or diverge simultaneously as a consequence of (1). Take $X_k = \eta_k \theta_k$ where $\eta_1, \theta_1, \eta_2, \theta_2, \ldots$ are all mutually independent, $\theta_k \sim N(0, 1)$, $E\eta_1 = 0$, $E[\eta_1^2] = 1$ and the distribution of $\eta_1$ will be specified later. We have
For the sequence $\{S_n\}$ we find
where $U_k = \eta_1^2 + \ldots + \eta_{n_{k+1} - n_k}^2$. We can take $\eta_1$ distributed such that
where $h(n) = (\log n)(\log \log n)^{1 + \delta}$ and $\delta > 0$. From (3) and (4) it follows that
(5)
Taking (2) into account we find that
$$\sum_{k=2}^{\infty} P[Z_{n_{k+1} - n_k} > \sqrt{n_{k+1}}] < \infty.$$
On the other hand, since $\log n_{k+1} \sim k/(\beta \log k)$, for any fixed $\delta$, $0 < \delta < \frac{1}{2}$, we obtain from (5) that
$$\sum_{k=2}^{\infty} P[S_{n_{k+1} - n_k} > \sqrt{n_{k+1}}] = \infty.$$
However, these two series must converge or diverge together. This contradiction shows that the sequence $\{S_n\}$ does not imitate normal behaviour to within $\{b_n\}$ where $b_n = \sqrt{n}$. Therefore in the result of Strassen (1964) the sequence $a_n = (n \log \log n)^{1/2}$ cannot be replaced by $b_n = n^{1/2}$.
18.13. On the Chover law of iterated logarithm
Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s with a d.f. $F$. Denote
$$S_n = X_1 + \ldots + X_n, \quad \xi_n = S_n/b_n, \quad \eta_n = |\xi_n|^{1/\log \log n}, \quad n \ge 3$$
where $\{b_n, n \ge 1\}$ are norming constants, $b_n > 0$, $n \ge 1$. It is interesting to study the asymptotic behaviour of $\eta_n$ as $n \to \infty$. For example, Vasudeva (1984) has proved the following result. Suppose there exists a sequence $\{b_n\}$ such that $\xi_n \xrightarrow{d} \xi$ as $n \to \infty$ where $\xi$ is a stable r.v. with index $\gamma$, $0 < \gamma \le 2$. Then $\eta_n \xrightarrow{a.s.} \rho$ where $\rho$ is a definite number in the interval $[0, \infty)$. Let us note that the a.s. convergence of $\{\eta_n\}$ is known as the Chover law of iterated logarithm (for references and details see Vasudeva 1984). One can ask whether it is necessary to assume the weak convergence of $\{\xi_n\}$ in order to get the a.s. convergence of $\{\eta_n\}$. Our aim now is to describe a sequence of r.v.s $\{X_n, n \ge 1\}$ such that $\xi_n = (X_1 + \ldots + X_n)/b_n$, for given $\{b_n\}$, fails to converge weakly, but nevertheless
$$\eta_n = |\xi_n|^{1/\log \log n} \xrightarrow{a.s.} \text{constant} = e^{1/\sqrt{2}} \quad \text{as } n \to \infty.$$
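For this take the function $F(x)$, $x \in \mathbb{R}^1$, where
$$F(-x) = 1 - F(x) = \begin{cases} \frac{1}{2}, & \text{if } 0 < x \le 1, \\ \frac{1}{2} x^{-\sqrt{2}} \left( 1 + \frac{1}{12} \sin(\log x) \right), & \text{if } x > 1. \end{cases}$$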
It is easy to see that $F$ is a d.f. Let $\{X_n, n \ge 1\}$ be i.i.d. r.v.s whose common d.f. is $F$. Choose $b_n = n^{1/\sqrt{2}}$, $n \ge 1$, as a sequence of norming constants. Note that for all $x \ge 1$ we have
$$\frac{11}{12} x^{-\sqrt{2}} \le 1 - F(x) + F(-x) \le \frac{13}{12} x^{-\sqrt{2}}.$$
Then for $\eta_n = |S_n/b_n|^{1/\log \log n}$, $n \ge 3$ (with $b_n = n^{1/\sqrt{2}}$), we find the following two relations:
$$P[|S_n| \ge b_n (\log n)^{(1 - \varepsilon)/\sqrt{2}}] \ge c_1/(n (\log n)^{1 - \varepsilon}),$$
$$P[|S_n| \ge b_n (\log n)^{(1 + \varepsilon)/\sqrt{2}}] \le c_2/(n (\log n)^{1 + \varepsilon})$$
valid for any $\varepsilon \in (0, 1)$ and all $n \ge n_0$, where $n_0$ is a fixed natural number. By the Borel-Cantelli lemma and a result by Feller (1946) we find that
$$P[|S_n| \ge b_n (\log n)^{(1 - \varepsilon)/\sqrt{2}} \text{ i.o.}] = 1, \quad P[|S_n| \ge b_n (\log n)^{(1 + \varepsilon)/\sqrt{2}} \text{ i.o.}] = 0.$$
Therefore
$$P[\lim_{n \to \infty} \eta_n = e^{1/\sqrt{2}}] = 1.$$
Applying a result of Zolotarev and Korolyuk (1961) we see that the sequence $\{\xi_n\}$ cannot converge weakly (to a non-degenerate r.v.) for any choice of the norming constants $\{b_n\}$.
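The role of the thresholds $b_n (\log n)^{(1 \pm \varepsilon)/\sqrt{2}}$ becomes transparent once one notes that $((\log n)^{c/\sqrt{2}})^{1/\log \log n} = e^{c/\sqrt{2}}$ for every $n \ge 3$. The short Python sketch below is an added illustration with a hypothetical function name; it evaluates $\eta_n$ on the boundary $|S_n| = b_n (\log n)^{c/\sqrt{2}}$ and is only a check of this identity, not a simulation of the sequence $\{S_n\}$.

```python
import math


def eta_at_threshold(n, c):
    """Value of |S_n / b_n| ** (1 / log log n) when |S_n| = b_n (log n) ** (c / sqrt(2))."""
    ratio = math.log(n) ** (c / math.sqrt(2.0))
    return ratio ** (1.0 / math.log(math.log(n)))
```

Taking $c = 1 - \varepsilon$ and $c = 1 + \varepsilon$ gives the values $e^{(1 - \varepsilon)/\sqrt{2}}$ and $e^{(1 + \varepsilon)/\sqrt{2}}$, which squeeze the limit $e^{1/\sqrt{2}}$ as $\varepsilon \to 0$.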
18.14. On record values and maxima of a sequence of random variables
Let $\{X_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s with a common d.f. $F$. Recall that $X_k$ is said to be a record value of $\{X_n\}$ iff $X_k > \max\{X_1, \ldots, X_{k-1}\}$. By convention $X_1$ is a record. Define the r.v.s $\{\tau_n, n \ge 0\}$ by
$$\tau_0 = 0, \quad \tau_n = \min\{k : k > \tau_{n-1}, X_k > X_{\tau_{n-1}}\}.$$
Obviously the variables $\tau_n$ are the indices at which record values occur. Further, we shall analyse some properties of two sequences, $\{X_{\tau_n}, n \ge 1\}$ and the sequence of maxima $\{M_n, n \ge 1\}$ where $M_n = \max\{X_1, \ldots, X_n\}$.
The sequence of r.v.s $\{\xi_n, n \ge 1\}$ is called stable if there exist norming constants $\{b_n, n \ge 1\}$ such that $\xi_n/b_n \to 1$ as $n \to \infty$. If the convergence is with probability 1, then $\{\xi_n\}$ is a.s. stable, while if the convergence is in probability, we say that $\{\xi_n\}$ is stable in probability. Let us formulate a result connecting the sequences $\{X_{\tau_n}\}$ and $\{M_n\}$ (see Resnik 1973): if $\{X_{\tau_n}\}$ is stable in probability, then the same holds for $\{M_n\}$. Note firstly that the function $h(x) = -\log(1 - F(x))$ and its inverse $h^{-1}(x) = \inf\{y : h(y) > x\}$ play an important role in studying records and maxima of random sequences. In particular the above result of Resnik has the following precise formulation: as $n \to \infty$
$$X_{\tau_n}/h^{-1}(n) \xrightarrow{P} 1 \implies M_n/h^{-1}(\log n) \xrightarrow{P} 1.$$
This naturally raises the question of whether the converse is true. By an example we show that the answer is negative. Take the function $h(x) = (\log x)^2$, $x \ge 1$, and let $\{X_n\}$ be i.i.d. r.v.s with d.f. $F$ corresponding to this $h$. As above, $\{M_n\}$ and $\{X_{\tau_n}\}$ are the sequences of the maxima and the records respectively. Since the function $h^{-1}(\log y) = \exp[(\log y)^{1/2}]$ is slowly varying, according to Gnedenko (1943), $\{M_n\}$ is stable in probability. Moreover, from a result by Resnik (1973), $\{M_n\}$ is a.s. stable. Nevertheless, the sequence of records $\{X_{\tau_n}\}$ is not stable in probability since the function $h^{-1}((\log y)^2) = y$ (compare with the result cited above) is not slowly varying and this condition (see again Resnik 1973) is necessary for $\{X_{\tau_n}\}$ to be stable.
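The contrast between the two norming functions can be seen from the ratio $L(2y)/L(y)$, which tends to 1 for a slowly varying $L$. The Python sketch below is an added illustration with hypothetical function names. It uses $h(x) = (\log x)^2$ and its inverse $h^{-1}(u) = \exp(\sqrt{u})$ on $[1, \infty)$ and compares $h^{-1}(\log y) = \exp[(\log y)^{1/2}]$ with $h^{-1}((\log y)^2) = y$.

```python
import math


def h(x):
    """h(x) = (log x)^2 for x >= 1."""
    return math.log(x) ** 2


def h_inv(u):
    """Inverse of h on [1, infinity): h_inv(u) = exp(sqrt(u))."""
    return math.exp(math.sqrt(u))


def maxima_norming(y):
    """h_inv(log y) = exp((log y)^(1/2)), the function relevant for the maxima."""
    return h_inv(math.log(y))


def records_norming(y):
    """h_inv((log y)^2) = y, the function relevant for the records."""
    return h_inv(math.log(y) ** 2)


def ratio(L, y, lam=2.0):
    """L(lam * y) / L(y); tends to 1 as y grows when L is slowly varying."""
    return L(lam * y) / L(y)
```

For `maxima_norming` the ratio drifts down towards 1 as `y` increases, whereas for `records_norming` it stays equal to `lam`.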
Part 4
Stochastic Processes
Courtesy of Professor A. T. Fomenko of Moscow University.
SECTION 19. BASIC NOTIONS ON STOCHASTIC PROCESSES
Let $(\Omega, \mathcal{F}, P)$ be a probability space, $T$ a subset of $\mathbb{R}_+$ and $(E, \mathcal{E})$ a measurable space. The family $X = (X_t, t \in T)$ of random variables on $(\Omega, \mathcal{F}, P)$ with values in $(E, \mathcal{E})$ is called a stochastic process (or a random process, or simply a process). We call $T$ the parameter set (or time set, or index set) and $(E, \mathcal{E})$ the state space of the process $X$. For every $\omega \in \Omega$, the mapping from $T$ into $E$ defined by $t \mapsto X_t(\omega)$ is called a trajectory (path, realization) of the process $X$. We shall restrict ourselves to the case of real-valued processes, that is processes whose state space is $E = \mathbb{R}^1$ and $\mathcal{E} = \mathcal{B}^1$. The index set $T$ will be either discrete (a subset of $\mathbb{N}$) or some interval in $\mathbb{R}_+$.
Some classes of stochastic processes can be characterized by the family $\mathcal{P}$ of the finite-dimensional distributions. Recall that
$$\mathcal{P} = \{P_{t_1, \ldots, t_n}(B_1, \ldots, B_n), \ n \ge 1, \ t_1, \ldots, t_n \in T, \ B_1, \ldots, B_n \in \mathcal{B}^1\}$$
where
$$P_{t_1, \ldots, t_n}(B_1, \ldots, B_n) = P[X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n].$$
The following fundamental result (Kolmogorov theorem) is important in this analysis. Let $\mathcal{P}$ be a family of finite-dimensional distributions and suppose $\mathcal{P}$ satisfies the compatibility (consistency) condition (see Example 2.3). Then there exists a probability space $(\Omega, \mathcal{F}, P)$ with a stochastic process $X = (X_t, t \in T)$ defined on it such that $X$ has $\mathcal{P}$ as its family of the finite-dimensional distributions.
Let $T$ be a finite or infinite interval of $\mathbb{R}_+$. We say that the process $X = (X_t, t \in T)$ is continuous (more exactly, almost surely continuous) if almost all of its trajectories $X_t(\omega)$, $t \in T$, are continuous functions. In this case we say that $X$ is a process in the space $\mathbb{C}(T)$ (of the continuous functions from $T$ to $\mathbb{R}^1$). Further, $X$ is said to be right-continuous if almost all of its trajectories are right-continuous functions: that is, for each $t$, $X_t = X_{t+}$ where $X_{t+} := \lim_{s \downarrow t} X_s$. In this case, if the left-hand limits exist for each time $t$, the process $X$ is without discontinuity of second kind and we say that $X$ is a process in the space $\mathbb{D}(T)$ (the space of all functions from $T$ to $\mathbb{R}^1$ which are right-continuous and have left-hand limits). We can define the left-continuity of $X$ analogously.
Let $(\mathcal{F}_t, t \in \mathbb{R}_+)$ be an increasing family of sub-$\sigma$-fields of the basic $\sigma$-field $\mathcal{F}$, that is $\mathcal{F}_t \subset \mathcal{F}$ for each $t \in \mathbb{R}_+$ and $\mathcal{F}_s \subset \mathcal{F}_t$ if $s < t$. The family $(\mathcal{F}_t, t \in \mathbb{R}_+)$ is called a filtration of $\Omega$. For each $t \in \mathbb{R}_+$ define
$$\mathcal{F}_{t+} := \bigcap_{s > t} \mathcal{F}_s, \qquad \mathcal{F}_{t-} := \bigvee_{s < t} \mathcal{F}_s.$$
(Recall that V s
continuous if it is both left-continuous and right-continuous. We say that the process $X = (X_t, t \in \mathbb{R}_+)$ is adapted to the filtration $(\mathcal{F}_t, t \in \mathbb{R}_+)$ if for each $t \in \mathbb{R}_+$ the r.v. $X_t$ is $\mathcal{F}_t$-measurable. In this case we write simply that $X$ is $(\mathcal{F}_t)$-adapted. The quadruple $(\Omega, \mathcal{F}, (\mathcal{F}_t, t \in \mathbb{R}_+), P)$ is called a probability basis. The phrase '$X = (X_t, t \in \mathbb{R}_+)$ is a process on $(\Omega, \mathcal{F}, (\mathcal{F}_t, t \in \mathbb{R}_+), P)$' means that $(\Omega, \mathcal{F}, P)$ is a probability space, $(\mathcal{F}_t, t \in \mathbb{R}_+)$ is a filtration, $X$ is a process on $(\Omega, \mathcal{F}, P)$ and $X$ is $(\mathcal{F}_t)$-adapted. If the filtration $(\mathcal{F}_t, t \in \mathbb{R}_+)$ is right-continuous and is completed by all subsets in $\Omega$ of $P$-measure zero, we say that this filtration satisfies the usual conditions. Other important notions and properties concerning stochastic processes will be introduced in the examples themselves. The reader will find systematic presentations of the basic theory of stochastic processes in many books (see Doob 1953, 1984; Blumenthal and Getoor 1968; Gihman and Skorohod 1974/1979; Ash and Gardner 1975; Prohorov and Rozanov 1969; Dellacherie 1972; Dellacherie and Meyer 1978, 1982; Loeve 1978; Wentzell 1981; Metivier 1982; Jacod and Shiryaev 1987; Revuz and Yor 1991; Rao 1995).
19.1. Is it possible to find a probability space on which any stochastic process can be defined?
Let $(\Omega, \mathcal{F}, P)$ be a fixed probability space and $X = (X_t, t \in T \subset \mathbb{R}_+)$ be a stochastic process. It is quite natural to ask whether $X$ can be defined on this space. Our motive for asking such a question will be clear if we recall the following result (see Ash 1972): there exists a universal probability space on which all possible random variables can be defined. By the two examples considered below we show that some difficulties can arise when trying to extend this result to stochastic processes.
(i) Suppose the answer to the above question is positive and $(\Omega, \mathcal{F}, P)$ is such a space. Note that $\Omega$ is fixed, $\mathcal{F}$ is fixed and clearly the cardinality of $\mathcal{F}$ is less than or equal to that of $2^{\Omega}$. Choose the index set $T$ such that its cardinality is greater than that of $\mathcal{F}$ and consider the process $X = (X_t, t \in T)$ where $X_t$ are independent r.v.s with $P[X_t = 0] = P[X_t = 1] = \frac{1}{2}$ for each $t \in T$. However, if $t_1 \neq t_2$, $t_1, t_2 \in T$, the events $[X_{t_1} = 1]$ and $[X_{t_2} = 1]$ cannot be equivalent, since this would give
$$\tfrac{1}{2} = P[X_{t_1} = 1] = P[X_{t_1} = 1, X_{t_2} = 1] = P[X_{t_1} = 1] P[X_{t_2} = 1] = \tfrac{1}{4}.$$
This contradiction shows that $\mathcal{F}$ must contain events whose number is greater than or at least equal to the cardinality of $T$. But this contradicts the choice of $T$.
(ii) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}[0, 1]$ and $P$ be the Lebesgue measure. We shall show that on this space $(\Omega, \mathcal{F}, P)$ there does not exist a process $X = (X_t, t \in [0, 1])$ such that the variables $X_t$ are independent and $X_t$ takes the values 0 and 1 with probability $\frac{1}{2}$ each. (Compare this with case (i) above.)
Suppose $X$ does exist on $(\Omega, \mathcal{F}, P)$. Then $E[X_t] < \infty$ for every $t \in [0, 1]$. Let $\mathcal{S}$ be the countable set of simple r.v.s of the type $\sum_k c_k 1_{A_k}$ where $c_k$ are rational numbers and $\{A_k\}$ are finite partitions of the interval $[0, 1]$ into subintervals with rational endpoints. Since $E[X_t] < \infty$ then according to Billingsley (1995) there is a r.v. $Y_t \in \mathcal{S}$ with $E[|X_t - Y_t|] < \frac{1}{4}$. (Instead of $\frac{1}{4}$ we could take any smaller $\varepsilon > 0$.) However, for arbitrary $s \neq t$ we have $E[|X_s - X_t|] = \frac{1}{2}$, implying by the triangle inequality that $E[|Y_s - Y_t|] > 0$ for all $s \neq t$. Thus the variables $Y_t$, $t \in [0, 1]$, are all different. But there are only countably many variables $Y_t$, which is a contradiction.
19.2. What is the role of the family of finite-dimensional distributions in constructing a stochastic process with specific properties?
Let $\mathcal{P} = \{P_n := P_{t_1, \ldots, t_n}, n \in \mathbb{N}, t_1, \ldots, t_n \in T, T \subset \mathbb{R}_+\}$ be a compatible family of finite-dimensional distributions. Then we can always find a probability space $(\Omega, \mathcal{F}, P)$ and a process $X = (X_t, t \in T)$ defined on it with just $\mathcal{P}$ as a family of its finite-dimensional distributions. Note, however, that this result says nothing about any other properties of $X$. Now we aim to clarify whether the compatibility conditions are sufficient to define a process satisfying some preliminary prescribed properties. We are given the compatible family $\mathcal{P}$ and the family of functions $\{X_t(\omega), \omega \in \Omega, t \in T\}$. Denote by $\mathcal{A}$ and $\mathcal{H}$ respectively the smallest Borel field and the smallest $\sigma$-field with respect to which every $X_t$ is measurable. Since every set in $\mathcal{A}$ has the form $A = \{\omega : (X_{t_1}(\omega), \ldots, X_{t_n}(\omega)) \in B\}$ where $B \in \mathcal{B}^n$, by the relation
(1) $P(A) = P\{\omega : (X_{t_1}(\omega), \ldots, X_{t_n}(\omega)) \in B\} = \int_B dP_{t_1, \ldots, t_n}(x_1, \ldots, x_n)$
we define a probability measure $P$ on $(\Omega, \mathcal{A})$ and this measure is additive. However, it can happen that $P$ is not $\sigma$-additive and in this case we cannot extend $P$ from $(\Omega, \mathcal{A})$ to $(\Omega, \mathcal{H})$ (recall that $\mathcal{H}$ is the $\sigma$-field generated by $\mathcal{A}$) to get a process $(X_t, t \in T)$ with the prescribed finite-dimensional distributions. Let us illustrate this by an example. Take $T = [0, 1]$ and $\Omega = C[0, 1]$, the space of all continuous functions on $[0, 1]$. Let $X_t(\omega)$ be the coordinate function: $X_t(\omega) = \omega(t)$ where $\omega = \{\omega(t), t \in [0, 1]\} \in C[0, 1]$. Suppose the family $\mathcal{P} = \{P_n\}$ is defined as follows:
(2) $P_{t_1, \ldots, t_n}(x_1, \ldots, x_n) = \prod_{k=1}^{n} \int_{-\infty}^{x_k} g(u) \, du$
where $g(u)$, $u \in \mathbb{R}^1$, is any probability density function which is symmetric with respect to zero. It is easy to see that this family $\mathcal{P}$ is compatible. Further, the measure $P$ which we want to find must satisfy the relation
$$P[X_t > \varepsilon, X_s < -\varepsilon] = \left( \int_{\varepsilon}^{\infty} g(u) \, du \right)^2$$
for $s \neq t$ and any $\varepsilon > 0$.
Since $\Omega = C[0, 1]$, the sets $A_n = \{\omega : X_t(\omega) > \varepsilon, X_{t + 1/n}(\omega) < -\varepsilon\}$ must tend to the empty set $\emptyset$ as $n \to \infty$ for every $\varepsilon > 0$. However,
$$\lim_{n \to \infty} P(A_n) = \left( \int_{\varepsilon}^{\infty} g(u) \, du \right)^2 \neq 0.$$
Hence the measure $P$ defined by (1) and (2) is not continuous at $\emptyset$ (or, equivalently, $P$ is not $\sigma$-additive) and thus $P$ cannot be extended to the $\sigma$-field generated by $\mathcal{A}$. Moreover, for any probability measure $P$ on $(\Omega, \mathcal{A})$ we have $P(\Omega) = 1$ which means that with probability 1 every trajectory would be continuous. However, as we have shown, the family $\mathcal{P}$ is not consistent with this fact despite its compatibility. In particular, note that (2) implies that $(X_t, t \in [0, 1])$ is a set of r.v.s which are independent and each is distributed symmetrically with density $g$. The independence between $X_t$ and $X_s$, even for very close $s$ and $t$, is inconsistent with the desired continuity of the process. This example and others show that the family $\mathcal{P}$, even though compatible, must satisfy additional conditions in order to obtain a stochastic process whose trajectories possess specific properties (see Prohorov 1956; Billingsley 1968).
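For a concrete density the obstruction is easy to quantify. The Python sketch below is an added illustration; it assumes, purely for the sake of the example, that $g$ is the standard normal density, and the function names are hypothetical. Since $X_t$ and $X_{t + 1/n}$ are independent under (2), the value prescribed for $P(A_n)$ does not depend on $n$ and so cannot tend to 0, although the sets $A_n$ decrease to the empty set in $C[0, 1]$.

```python
import math


def upper_tail(eps):
    """Integral of the standard normal density g over [eps, infinity)."""
    return 0.5 * math.erfc(eps / math.sqrt(2.0))


def prob_A_n(eps, n):
    """Value prescribed by (1) and (2) for A_n = {X_t > eps, X_{t+1/n} < -eps}.

    The two coordinates are independent with the symmetric density g,
    so the value is the same for every n.
    """
    return upper_tail(eps) ** 2
```

For example, with $\varepsilon = 1$ the prescribed value is about $0.025$ for every $n$.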
19.3. Stochastic processes whose modifications possess quite different properties
Let $X = (X_t, t \in T)$ and $Y = (Y_t, t \in T)$ be two stochastic processes defined on the same probability space $(\Omega, \mathcal{F}, P)$ and taking values in the same state space $(E, \mathcal{E})$. We say that $Y$ is a modification of $X$ (and conversely) if for each $t \in T$, $P[\omega : X_t(\omega) \neq Y_t(\omega)] = 0$. If we have
$$P[\cup_{t \in T} \{\omega : X_t(\omega) \neq Y_t(\omega)\}] = 0$$
then the processes $X$ and $Y$ are called indistinguishable. The following examples illustrate the relationship between these two notions and show that two processes can have very different properties even if one of the processes is a modification of the other.
(i) Note firstly that if the parameter set $T$ is countable, then $X$ and $Y$ are indistinguishable iff $Y$ is a modification of $X$. Thus some differences can arise only if $T$ is not countable. So, define the probability space $(\Omega, \mathcal{F}, P)$ as follows: $\Omega = \mathbb{R}_+$, $\mathcal{F} = \mathcal{B}_+$ and $P$ is any absolutely continuous probability distribution. Take $T = \mathbb{R}_+$ and consider the processes $X = (X_t, t \in \mathbb{R}_+)$ and $Y = (Y_t, t \in \mathbb{R}_+)$ where $X_t(\omega) = 0$ and $Y_t(\omega) = 1_{\{t\}}(\omega)$. Obviously, $Y$ is a modification of $X$ and this fact is a consequence of the absolute continuity of $P$. Nevertheless the processes $X$ and $Y$ are not indistinguishable, as is easily seen.
(ii) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}[0, 1]$, $P$ be the Lebesgue measure and $T = \mathbb{R}_+$. As usual, denote by $[t]$ the integer part of $t$. Consider two processes $X = (X_t, t \in \mathbb{R}_+)$ and $Y = (Y_t, t \in \mathbb{R}_+)$ where
$$X_t(\omega) = 0 \text{ for all } \omega \text{ and all } t, \qquad Y_t(\omega) = \begin{cases} 0, & \text{if } t - [t] \neq \omega, \\ 1, & \text{if } t - [t] = \omega. \end{cases}$$
It is obvious that $Y$ is a modification of $X$. Moreover, all trajectories of $X$ are continuous while all trajectories of $Y$ are discontinuous. (A similar fact holds for the processes $X$ and $Y$ in case (i).)
(iii) Let $\tau$ be a non-negative r.v. with an absolutely continuous distribution. Define the processes $X = (X_t, t \in \mathbb{R}_+)$ and $Y = (Y_t, t \in \mathbb{R}_+)$ where
$$X_t = X_t(\omega) = 1_{[\tau(\omega) \le t]}(\omega), \qquad Y_t = Y_t(\omega) = 1_{[\tau(\omega) < t]}(\omega), \qquad X_0 = 0, \quad Y_0 = 0.$$
It is easy to see that for each $t \in \mathbb{R}_+$ we have
$$P[\omega : X_t(\omega) \neq Y_t(\omega)] = P[\omega : \tau(\omega) = t] = 0.$$
Hence each of the processes $X$ and $Y$ is a modification of the other. But let us look at their trajectories. Clearly $X$ is right-continuous with left-hand limits, while $Y$ is left-continuous with right-hand limits.
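Case (iii) can be traced along a single trajectory. The Python sketch below is an added illustration with hypothetical function names. For a fixed value of $\tau(\omega)$ it evaluates the two indicator paths $1_{[\tau \le t]}$ and $1_{[\tau < t]}$ on a grid of times and lists the times at which they differ; the only such time is $t = \tau(\omega)$, an event of probability zero for each fixed $t$ when $\tau$ has an absolutely continuous distribution.

```python
def X(t, tau):
    """Right-continuous path: indicator of the event [tau <= t]."""
    return 1 if tau <= t else 0


def Y(t, tau):
    """Left-continuous path: indicator of the event [tau < t]."""
    return 1 if tau < t else 0


def differing_times(tau, grid):
    """Times in the grid at which the two trajectories disagree."""
    return [t for t in grid if X(t, tau) != Y(t, tau)]
```

With `tau = 0.5` and the grid `[0.0, 0.25, 0.5, 0.75, 1.0]` the two paths disagree only at `0.5`, where $X$ has already jumped to 1 and $Y$ has not.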
19.4. On the separability property of stochastic processes
Let $X = (X_t, t \in T \subset \mathbb{R}^1)$ be a stochastic process defined on the probability space $(\Omega, \mathcal{F}, P)$ and taking values in the measurable space $(\mathbb{R}^1, \mathcal{B}^1)$. The process $X$ is said to be separable if there exists a countable dense subset $S_0 \subset T$ such that for every closed set $B \in \mathcal{B}^1$ and every open set $I \subset \mathbb{R}^1$,
$$\{\omega : X_t(\omega) \in B \text{ for all } t \in T \cap I\} = \{\omega : X_s(\omega) \in B \text{ for all } s \in S_0 \cap I\}.$$
Clearly, if the process $X$ is separable, then any event associated with $X$ can be represented by countably many operations like union and intersection. The last situation, as we know, is typical in probability theory. However, not every stochastic process is separable.
(i) Let $\tau$ be a r.v. distributed uniformly on the interval $[0, 1]$. Consider the process $X = (X_t, t \in [0, 1])$ where
$$X_t = X_t(\omega) = \begin{cases} 1, & \text{if } \tau(\omega) = t, \\ 0, & \text{if } \tau(\omega) \neq t. \end{cases}$$
If $S$ is any countable subset of $[0, 1]$ we have
$$P[X_t = 0 \text{ for } t \in S] = 1, \qquad P[X_t = 0 \text{ for } t \in [0, 1]] = 0.$$
Therefore the process $X$ is not separable.
(ii) Consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [0, 1]$, $\mathcal{F}$ is the $\sigma$-field of Lebesgue-measurable sets of $[0, 1]$ and $P$ is the Lebesgue measure. Let $T = [0, 1]$ and $A$ be a non-Lebesgue-measurable set contained in $[0, 1]$ (the construction of such sets is described by Halmos 1974). Define the function $X = (X_t, t \in T)$ by
$$X_t(\omega) = \begin{cases} 1, & \text{if } t \in A \text{ and } \omega = t, \\ 0, & \text{otherwise}. \end{cases}$$
Then for each $t \in T$, $t \notin A$, we have $X_t(\omega) = 0$ for all $\omega \in \Omega$. Further, for each $t \in T$, $t \in A$, we have $X_t(\omega) = 0$ for all $\omega \in \Omega$ except for $\omega = t$ when $X_t(\omega) = 1$. Thus for every $t \in T$, $X_t(\omega)$ is $\mathcal{F}$-measurable and hence $X$ is a stochastic process. Let us note that for each $\omega \in \Omega$, $\omega \notin A$, we have $X_t(\omega) = 0$ for all $t \in T$, and for each $\omega \in \Omega$, $\omega \in A$, we have $X_t(\omega) = 0$ except for $t = \omega$ when $X_t(\omega) = 1$. Therefore every sample function of $X$ is a Lebesgue-measurable function. Suppose now that the process $X$ is separable. Then there would be a countable dense subset $S_0 \subset T$ such that for every closed $B$ and open $I$,
$$\{\omega : X_t(\omega) \in B \text{ for all } t \in T \cap I\} = \{\omega : X_s(\omega) \in B \text{ for all } s \in S_0 \cap I\}$$
and both events belong to the $\sigma$-field $\mathcal{F}$. Take in particular $B = [0, \frac{1}{2}]$ and $I = \mathbb{R}^1$. Then the event
$$\{\omega : X_t(\omega) \in [0, \tfrac{1}{2}] \text{ for all } t \in T\} = [0, 1] \setminus A$$
does not belong to $\mathcal{F}$. Hence the process $X$ is not separable.
The processes considered in cases (i) and (ii) have modifications which are separable. For very general results concerning the existence of separable modifications of stochastic processes we refer the reader to the books by Doob (1953, 1984), Gihman and Skorohod (1974/1979), Yeh (1973), Ash and Gardner (1975) and Rao (1979, 1995).
19.5. Measurable and progressively measurable stochastic processes
Consider the process $X = (X_t, t \ge 0)$ defined on the probability basis $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ and taking values in some measurable space $(E, \mathcal{E})$. Here $(\mathcal{F}_t)$ is a filtration satisfying the usual conditions (see the introductory notes). Recall that if for each $t$, $X_t$ is $\mathcal{F}_t$-measurable, we say that the process $X$ is $(\mathcal{F}_t)$-adapted. The process $X$ is said to be measurable if the mapping $(t, \omega) \mapsto X_t(\omega)$ of $\mathbb{R}_+ \times \Omega$ to $E$ is measurable with respect to the product $\sigma$-field $\mathcal{B}_+ \times \mathcal{F}$. Finally, the process $X$ is called progressively measurable (or simply, a progressive process) if for each $t$, the map $(s, \omega) \mapsto X_s(\omega)$ of $[0, t] \times \Omega$ to $E$ is $\mathcal{B}[0, t] \times \mathcal{F}_t$-measurable. Now let us consider examples to answer the following questions. (a) Does every process have a measurable modification? (b) What is the relationship between measurability and progressive measurability?
(i) Let $X = (X_t, t \in [0, 1])$ be a stochastic process consisting of mutually independent r.v.s such that $EX_t = 0$ and $E[X_t^2] = 1$, $t \in [0, 1]$. We want to know if this process is measurable. Suppose the answer is positive: that is, there exists a $(t, \omega)$-measurable family $(X_t(\omega))$ with these properties: $EX_t = 0$ for $t \in [0, 1]$, $E[X_s X_t] = 0$ if $s \neq t$ and $E[X_s X_t] = 1$ if $s = t$. It follows that for every subinterval $I$ of $[0, 1]$ we should have
$$\int_{\Omega} \int_I \int_I |X_s(\omega) X_t(\omega)| \, P(d\omega) \, ds \, dt < \infty.$$
Hence using the Fubini theorem we obtain
$$E\left[ \left( \int_I X_t \, dt \right)^2 \right] = \int_I \int_I E[X_s X_t] \, ds \, dt = 0.$$
Thus for a set $N_I$ with $P(N_I) = 0$ we have $\int_I X_t(\omega) \, dt = 0$ if $\omega \notin N_I$. Consider now all subintervals $I = [r', r'']$ with rational endpoints $r'$, $r''$ and let $N = \cup_I N_I$. Then $P(N) = 0$ and for all $\omega$ in the complement $N^c$ of $N$ we have $\int_a^b X_t(\omega) \, dt = 0$ for any subinterval $[a, b]$ of $[0, 1]$. This means that for $\omega \in N^c$, $X_t(\omega) = 0$ for all $t$ except possibly for a set of Lebesgue measure zero. Applying the Fubini theorem again we find that
$$\int_{\Omega} \int_0^1 X_t^2(\omega) \, P(d\omega) \, dt = 0.$$
However, this is not possible, since
$$\int_{\Omega} \int_0^1 X_t^2(\omega) \, P(d\omega) \, dt = \int_0^1 E[X_t^2] \, dt = 1.$$
This contradiction shows that the process $X$ is not measurable. Moreover, the same reasoning shows that $X$ does not have a measurable modification.
(ii) Consider now a situation which could be compared with case (i). Let $X = (X_t, t \in T \subset \mathbb{R}^1)$ be a second-order stochastic process ($E[X_t^2] < \infty$ for all $t \in T$) and let $C(s, t) = E[X_s X_t]$ be its covariance function. If $X$ is a measurable process, then it follows from the Fubini theorem that $C(s, t)$ is a $\mathcal{B}_T \times \mathcal{B}_T$-measurable function. This fact leads naturally to the question: does the measurability of the covariance function $C(s, t)$ imply that the process $X$ is measurable? The example below shows that the answer is negative. Let $T = [0, 1]$ and $X = (X_t, t \in [0, 1])$ be a family of zero-mean r.v.s of unit variance and such that $X_s$ and $X_t$ for $s \neq t$ are uncorrelated: that is, $EX_t = 0$ for $t \in [0, 1]$, and $C(s, t) = E[X_s X_t] = 0$ if $s \neq t$, $C(s, t) = 1$ if $s = t$, $s, t \in [0, 1]$. Since $C$ is symmetric and non-negative definite, there exists a probability space $(\Omega, \mathcal{F}, P)$ and on it a real-valued process $X = (X_t, t \in [0, 1])$ with $C$ as its covariance function. Obviously, the given function $C$ is $\mathcal{B}[0, 1] \times \mathcal{B}[0, 1]$-measurable. Denote by
$H(X)$ the closure in $L^2 = L^2(\Omega, \mathcal{F}, P)$ of the linear space generated by the r.v.s $\{X_t, t \in [0, 1]\}$; $H(X)$ is called a linear space of the process $X$. According to Cambanis (1975) the following two statements are equivalent: (a) the process $X$ has a measurable modification; (b) the covariance function $C$ is $\mathcal{B}_T \times \mathcal{B}_T$-measurable and $H(X)$ is a separable space. Now, since the values of $X$ are orthogonal in $L^2$, that is $E[X_s X_t] = 0$ for $s \neq t$, $s, t \in [0, 1]$, the space $H(X)$ is not separable and therefore the process $X$ does not have a measurable modification. The same conclusion can be derived in the following way. Suppose $X$ has a measurable modification, say $Y = (Y_t, t \in [0, 1])$. Then
(1) $E\left[ \int_0^1 Y_t^2 \, dt \right] = \int_0^1 E[Y_t^2] \, dt = 1$
and this relation implies that $\int_0^1 Y_t^2 \, dt < \infty$ a.s. Let $\{\varphi_n, n \ge 1\}$ be a complete orthonormal system in the space $L^2[0, 1] = L^2([0, 1], \mathcal{B}[0, 1], \text{Leb})$ of all functions $f(t)$, $t \in [0, 1]$, which are $\mathcal{B}[0, 1]$-measurable and square-integrable: $\int_0^1 f^2(t) \, dt < \infty$. Then (see Loeve 1978)
$$Y_t = \sum_{n=1}^{\infty} \xi_n \varphi_n(t)$$
in $L^2[0, 1]$ where $\xi_n = \int_0^1 Y_t \varphi_n(t) \, dt$ a.s. Further, we have
$$E[\xi_n^2] = \int_0^1 \int_0^1 C(s, t) \varphi_n(s) \varphi_n(t) \, ds \, dt = 0,$$
so that $P[\xi_n = 0] = 1$, and hence
$$\int_0^1 Y_t^2 \, dt = \sum_{n=1}^{\infty} \xi_n^2 = 0 \quad \text{a.s.}$$
which contradicts equality (1). Therefore the process $X$ with the covariance function $C$ does not have a measurable modification.
(iii) Here we suggest a brief analysis of the usual and the progressive measurability of stochastic processes. Let $X = (X_t, t \ge 0)$ be an $(\mathcal{F}_t)$-progressive process. Obviously $X$ is $(\mathcal{F}_t)$-adapted and measurable. Is it then true that every $(\mathcal{F}_t)$-adapted and measurable process is progressive? The following example shows that this is not the case. Let $\Omega = \mathbb{R}_+$, $\mathcal{F} = \mathcal{B}_+$ and $P(dx) = e^{-x} dx$ where $dx$ corresponds to the Lebesgue measure. Define $\Delta = \{(x, x), x \in \mathbb{R}_+\}$ and let $\mathcal{F}_t$ for each $t \in \mathbb{R}_+$ be the $\sigma$-field generated by the points of $\mathbb{R}_+$ (this means that $A \in \mathcal{F}_t$ iff $A$ or $A^c$ is countable). Consider the process $X = (X_t, t \in \mathbb{R}_+)$ where
$$X_t(\omega) = 1_{\Delta}(t, \omega).$$
Then the process $X$ is $(\mathcal{F}_t)$-adapted and $\mathcal{B}_+ \times \mathcal{F}$-measurable but is not progressively measurable. It is useful to cite the following result: if $X$ is an adapted and right-continuous stochastic process on the probability basis $(\Omega, \mathcal{F}, (\mathcal{F}_t, t \ge 0), P)$ and takes values in the metric space $(E, \mathcal{E})$, then $X$ is progressively measurable. The proof of this result as well as a detailed presentation of many other results concerning measurability properties of stochastic processes can be found in the books by Doob (1953, 1984), Dellacherie and Meyer (1978, 1982), Dudley (1972), Elliott (1982), Rogers and Williams (1994) and Rao (1995).
19.6.
On the stochastic continuity and the weak $L^1$-continuity of stochastic processes
Let $X = (X(t), t \in T)$ be a stochastic process where $T$ is an interval in $\mathbb{R}^1$. We say that $X$ is stochastically continuous (P-continuous) at a fixed point $t_0 \in T$ if for $t \in T$, $X(t) \xrightarrow{P} X(t_0)$ as $t \to t_0$. The process $X$ is said to be stochastically continuous if it is P-continuous at all points of $T$. A second-order process $X = (X(t), t \in T \subset \mathbb{R}^1)$ is called weakly $L^1$-continuous if for every $t \in T$ and every r.v. $\xi$ with $E[\xi^2] < \infty$ we have
$$\lim_{s \to t} E[X(s)\xi] = E[X(t)\xi].$$
We now consider two specific examples. The first one shows that not every process is stochastically continuous, while the second examines the relationship between the two notions discussed above.

(i) Let the process $X = (X(t), t \in [0,1])$ consist of i.i.d. r.v.s with a common density $g(x)$, $x \in \mathbb{R}^1$. Let $t_0, t \in [0,1]$, $t \ne t_0$ and $\varepsilon > 0$. Then
$$P_\varepsilon := P[|X(t) - X(t_0)| \ge \varepsilon] = \iint_{|x-y| \ge \varepsilon} g(x)g(y)\,dx\,dy.$$
Obviously, if $\varepsilon \to 0$ then
$$P_\varepsilon \to \iint_{x \ne y} g(x)g(y)\,dx\,dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x)g(y)\,dx\,dy = 1.$$
This means that for some $\varepsilon_0 > 0$, $P_{\varepsilon_0} > \frac{1}{2}$, and hence
$$P[|X(t) - X(t_0)| \ge \varepsilon_0] \not\to 0 \quad \text{as } t \to t_0.$$
Therefore the process $X$ is not stochastically continuous at any point $t \in [0,1]$.

(ii) Let the probability space $(\Omega, \mathcal{F}, P)$ be defined by $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ with $P$ the Lebesgue measure. Take the sequence of r.v.s $\{\eta_n, n \ge 1\}$ where
$$\eta_n(\omega) = n^{3/4}\, 1_{[0,1/n]}(\omega).$$
Then for sufficiently small $\varepsilon > 0$ we have
$$P[\omega : |\eta_n(\omega)| \ge \varepsilon] = \frac{1}{n} \to 0$$
and hence $\eta_n \xrightarrow{P} 0$ as $n \to \infty$. However, $E[\eta_n^2] = n^{1/2}$, the sequence $\{\eta_n\}$ is not bounded and consequently $\{\eta_n\}$ is not weakly $L^1$-convergent (see Masry and Cambanis 1973).

Our plan now is to use the sequence $\{\eta_n\}$ to construct a stochastic process which is stochastically but not weakly $L^1$-continuous. Define the process $X = (X(t), t \in [0,1])$ by $X(t) = 0$ for $t = 0$, $\omega \in \Omega$ and
$$X(t) = (n+1)(1 - nt)\,\eta_{n+1}(\omega) + n((n+1)t - 1)\,\eta_n(\omega)$$
for $t \in \left[\frac{1}{n+1}, \frac{1}{n}\right]$ and $\omega \in \Omega$, $n \ge 1$. Thus $X(0) = 0$, $X(1/n) = \eta_n$, $n \ge 1$, and for all $\omega \in \Omega$, $X(\cdot, \omega)$ is a linear function on every interval $\left[\frac{1}{n+1}, \frac{1}{n}\right]$, $n \ge 1$. Since $X(1/n) = \eta_n$ and the sequence $\{\eta_n\}$ does not converge weakly in $L^1$-sense, the process $X$ is not weakly $L^1$-continuous at the point $t = 0$. It is easy to see that for all $\omega \in \Omega$ the process $X$ is continuous on $(0,1]$ and hence $X$ is stochastically continuous on the same interval $(0,1]$.

Clearly, it remains for us to show that $X$ is P-continuous at $t = 0$. Fix $\varepsilon > 0$ and $\delta > 0$. Since $\eta_n \xrightarrow{P} 0$, there exists $N = N(\varepsilon, \delta)$ such that for all $n \ge N$, $P[|\eta_n| \ge \varepsilon] < \frac{1}{2}\delta$. Now for all $t \in (0, N^{-1}]$ we have $t \in \left[\frac{1}{n+1}, \frac{1}{n}\right]$ for some concrete $n \ge N$ and it follows from the definition of $X$ that
$$0 \le |X(t)| \le \max\{|\eta_n|, |\eta_{n+1}|\} \quad \text{for all } \omega \in \Omega.$$
Thus
$$P[\omega : |X(t)| \ge \varepsilon] \le P[\omega : \max\{|\eta_n|, |\eta_{n+1}|\} \ge \varepsilon] \le P[\omega : |\eta_n| \ge \varepsilon] + P[\omega : |\eta_{n+1}| \ge \varepsilon] < \tfrac{1}{2}\delta + \tfrac{1}{2}\delta = \delta,$$
implying that $X(t) \xrightarrow{P} 0 = X(0)$ as $t \to 0$ and hence the process $X$ is stochastically continuous on the interval $[0,1]$.

Note that in this example the weak $L^1$-continuity of $X$ is violated at the point $t = 0$ only. Using arguments from the paper by Masry and Cambanis (1973) we can construct a process which is stochastically continuous and weakly $L^1$-discontinuous at a finite or even at a countable number of points in $[0,1]$.
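The construction in example (ii) can be checked numerically. The following is only a sketch: the function names `eta`, `X` and `second_moment` are ours, $t$ and $\omega$ are assumed to be passed as exact rationals, and the test points are chosen arbitrarily.

```python
from fractions import Fraction
import math

def eta(n, omega):
    """eta_n(omega) = n^(3/4) on [0, 1/n] and 0 elsewhere."""
    return n ** 0.75 if 0 <= omega <= Fraction(1, n) else 0.0

def X(t, omega):
    """Piecewise-linear process with X(0) = 0 and X(1/n) = eta_n (t a Fraction in [0, 1])."""
    if t == 0:
        return 0.0
    n = math.floor(1 / t)                      # then 1/(n+1) < t <= 1/n
    w_next = float((n + 1) * (1 - n * t))      # weight of eta_{n+1}
    w_curr = float(n * ((n + 1) * t - 1))      # weight of eta_n
    return w_next * eta(n + 1, omega) + w_curr * eta(n, omega)

def second_moment(n):
    """E[eta_n^2] = n^(3/2) * P[omega <= 1/n] = n^(1/2)."""
    return n ** 1.5 / n

omega = Fraction(1, 10)
print(X(Fraction(1, 5), omega) == eta(5, omega))      # interpolation node t = 1/5
print(X(Fraction(3, 10), omega), max(eta(3, omega), eta(4, omega)))
print([second_moment(n) for n in (1, 100, 10_000)])   # grows like sqrt(n)
```

The two weights are non-negative on $[\frac{1}{n+1}, \frac{1}{n}]$ and sum to 1, which is exactly why $|X(t)|$ is bounded by $\max\{|\eta_n|, |\eta_{n+1}|\}$ in the argument above.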
19.7.
Processes which are stochastically continuous but not continuous almost surely
We know that convergence in probability is weaker than a.s. convergence (see Section 14). So it is not surprising that there are processes which are stochastically but not a.s. continuous. Consider the following two examples.

(i) Let $X = (X_t, t \in [0,1])$ be a stochastic process defined on the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$, $P$ is the Lebesgue measure and
$$X_t = X_t(\omega) = \begin{cases} 1, & \text{if } t > \omega \\ 0, & \text{if } t \le \omega. \end{cases}$$
The state space of $X$ consists of the values 1 and 0, and the finite-dimensional distributions are expressed as follows: if $t_1 < t_2 < \dots < t_n$, then
$$P[X_{t_1} = 0, \dots, X_{t_{i-1}} = 0, X_{t_i} = 1, \dots, X_{t_n} = 1] = t_i - t_{i-1},$$
$$P[X_{t_1} = 0, \dots, X_{t_n} = 0] = 1 - t_n, \qquad P[X_{t_1} = 1, \dots, X_{t_n} = 1] = t_1.$$
In all other cases $P[X_{t_1} = k_1, \dots, X_{t_n} = k_n] = 0$ where $k_i = 0$ or 1. Clearly, the process $X$ is stochastically continuous since for any $\varepsilon \in (0,1)$ and $t_1 < t_2$ we have
$$P[|X_{t_2} - X_{t_1}| \ge \varepsilon] = P[X_{t_1} = 0, X_{t_2} = 1] = t_2 - t_1.$$
However, almost all trajectories of $X$ are discontinuous functions.
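The finite-dimensional distributions of example (i) can be computed mechanically, because every pattern of zeros and ones confines $\omega$ to an interval. The sketch below assumes exact rational time points in $[0,1]$; the helper name `pattern_probability` is ours.

```python
from fractions import Fraction
from itertools import product

def pattern_probability(times, values):
    """P[X_{t_1} = k_1, ..., X_{t_n} = k_n] for X_t(omega) = 1 if t > omega else 0,
    where omega is uniform on [0, 1]."""
    # X_t = 1 forces omega < t; X_t = 0 forces omega >= t.
    upper = min([t for t, k in zip(times, values) if k == 1], default=Fraction(1))
    lower = max([t for t, k in zip(times, values) if k == 0], default=Fraction(0))
    return max(Fraction(0), upper - lower)

times = (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4))
for values in product((0, 1), repeat=len(times)):
    print(values, pattern_probability(times, values))
```

Only the monotone patterns (zeros followed by ones) receive positive probability, in agreement with the formulas above.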
(ii) Consider the Poisson process $X = (X_t, t \ge 0)$ with a given parameter $\lambda$. That is, $X_0 = 0$ a.s., the increments of $X$ are independent, and $X_t - X_s$ for $t > s$ has a Poisson distribution: $P[X_t - X_s = k] = e^{-\lambda(t-s)}[\lambda(t-s)]^k / k!$, $k = 0, 1, 2, \dots$. The definition immediately implies that for each fixed $t_0 \ge 0$, $X_t \xrightarrow{P} X_{t_0}$ as $t \to t_0$. Hence $X$ is stochastically continuous. However, it can be shown that every trajectory of $X$ is a non-decreasing stepwise function with jumps of size 1 only. This and other results can be found e.g. in the book by Wentzell (1981). Therefore the Poisson process is stochastically continuous but not a.s. continuous.
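The step from the definition to stochastic continuity can be written out explicitly. The derivation below is a sketch using only the Poisson distribution of the increments; the abbreviation $h = |t - t_0|$ is introduced here for convenience.

```latex
\[
P\bigl[|X_t - X_{t_0}| \ge \varepsilon\bigr]
  \le P\bigl[X_t \ne X_{t_0}\bigr]
  = 1 - e^{-\lambda h}
  \longrightarrow 0
  \quad \text{as } h = |t - t_0| \to 0,
  \qquad \varepsilon > 0 .
\]
```

For $0 < \varepsilon \le 1$ the first inequality is an equality, since the increments take non-negative integer values only.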
19.8.
Almost sure continuity of stochastic processes and the Kolmogorov condition
Let $X = (X_t, t \in T \subset \mathbb{R}_+)$ be a real-valued stochastic process defined on some probability space $(\Omega, \mathcal{F}, P)$. Suppose $X$ satisfies the following classical Kolmogorov condition:
$$E[|X_t - X_s|^p] \le K|t - s|^{1+q}, \quad K = \text{constant},\ p > 0,\ q > 0,\ t, s \in T. \qquad (1)$$
Then $X$ is a.s. continuous. In other words, almost all of the trajectories of $X$ are continuous functions. The same result can be expressed in another form: if condition (1) is valid, a process $\tilde{X} = (\tilde{X}_t, t \in T)$ exists which is a.s. continuous and is a modification of the process $X$. Since (1) is a sufficient condition for the continuity of a stochastic process, it is natural to ask whether this condition is necessary.

Firstly, if we consider the Poisson process (see Example 19.7) again we can easily see that condition (1) is not satisfied. Of course, we cannot make further conclusions from this fact. However, we know by other arguments that the Poisson process is not continuous.

Consider now the standard Wiener process $w = (w_t, t \ge 0)$. Recall that $w_0 = 0$ a.s., the increments of $w$ are independent, and $w_t - w_s \sim N(0, |t - s|)$. It is easy to check that $w$ satisfies (1) with $p = 4$ and $q = 1$. Hence the Wiener process $w$ is a.s. continuous. Now based on the Wiener process we can construct an example of a process which is continuous but does not satisfy condition (1). Let $Y = (Y_t, t \ge 0)$ where $Y_t = \exp(w_t^4)$. This process is a.s. continuous. However, for any $p > 0$ the expectation $E[|Y_t - Y_s|^p]$ does not exist and thus condition (1) cannot be satisfied. This example shows that the Kolmogorov condition (1) is not generally necessary for the continuity of a stochastic process.

Important general results concerning continuity properties of stochastic processes, including some useful counterexamples, are given by Ibragimov (1983) and Balasanov and Zhurbenko (1985).
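Two of the claims above, that the Wiener process satisfies (1) with $p = 4$, $q = 1$ and that the Poisson process violates (1), admit short verifications. The lines below are a sketch; $Z$ denotes a standard normal r.v. and $h = t - s > 0$, both introduced here only for the computation.

```latex
\[
E\bigl[|w_t - w_s|^4\bigr] = |t - s|^2\, E[Z^4] = 3\,|t - s|^{1+1},
\qquad \text{so (1) holds with } p = 4,\ q = 1,\ K = 3 .
\]
\[
\text{Poisson case: }\;
E\bigl[|X_t - X_s|^p\bigr] \ge P[X_t - X_s = 1] = \lambda h\, e^{-\lambda h},
\]
\[
\text{so (1) would force }\;
\lambda e^{-\lambda h} \le K h^{q} \to 0 \ \text{ as } h \downarrow 0,
\ \text{ whereas } \lambda e^{-\lambda h} \to \lambda > 0 .
\]
```

Hence no choice of $p > 0$, $q > 0$ and $K$ makes condition (1) hold for the Poisson process.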
19.9.
Does the Riemann or Lebesgue integrability of the covariance function ensure the existence of the integral of a stochastic process?
Suppose $X = (X_t, t \in [a,b] \subset \mathbb{R}^1)$ is a second-order real-valued stochastic process with zero mean and covariance function $r(s,t) = E[X_s X_t]$, $s, t \in [a,b]$. We should like to analyse the conditions under which the integral $J = \int_a^b X_t\,dt$ can be constructed. As usual, we consider integral sums of the type $J_N = \sum_{k=1}^{N} X_{s_k}(t_k - t_{k-1})$, $s_k \in (t_{k-1}, t_k)$, and define $J$ as the limit of $\{J_N\}$ in a definite sense. One reasonable approach is to consider the convergence of the sequence $\{J_N\}$ in $L^2$-sense. In this case, if the limit exists, it is called an $L^2$-integral and is denoted as $(L^2)\int_a^b X_t\,dt$. According to results which are generally accepted as classical (see Lévy 1965; Loève 1978), the integral $(L^2)\int_a^b X_t\,dt$ exists iff the Riemann integral $(R)\int_a^b\!\int_a^b r(s,t)\,ds\,dt$ exists. Note however that a paper by Wang (1982) provides some explanations of certain differences in the interpretation of the double Riemann integral. As an important consequence Wang has shown that the existence of $(R)\int_a^b\!\int_a^b r(s,t)\,ds\,dt$ is a sufficient but not necessary condition for the existence of $(L^2)\int_a^b X_t\,dt$.

Let us consider this situation in more detail.
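Starting with the points $a = x_0 < x_1 < \dots < x_m = b$ on the axis $Ox$ and $a = y_0 < y_1 < \dots < y_n = b$ on the axis $Oy$ we divide the square $[a,b] \times [a,b]$ into rectangles in the standard way. Define
$$\Delta x_i = x_i - x_{i-1}, \quad \Delta y_j = y_j - y_{j-1}, \quad \delta_1 = \max_{1 \le i \le m} \Delta x_i, \quad \delta_2 = \max_{1 \le j \le n} \Delta y_j.$$
Introduce the following two conditions:
$$\lim_{\delta_1 \to 0,\, \delta_2 \to 0} \sum_{i=1}^{m} \sum_{j=1}^{n} r(u_i, v_j)\,\Delta x_i \Delta y_j \ \text{ exists}; \qquad (1)$$
$$\lim_{\delta_1 \to 0,\, \delta_2 \to 0} \sum_{i=1}^{m} \sum_{j=1}^{n} r(u_{ij}, v_{ij})\,\Delta x_i \Delta y_j \ \text{ exists}. \qquad (2)$$
(Here $u_i \in (x_{i-1}, x_i)$, $v_j \in (y_{j-1}, y_j)$ and $(u_{ij}, v_{ij}) \in (x_{i-1}, x_i) \times (y_{j-1}, y_j)$.) It is important to note that the Riemann integral $(R)\int_a^b\!\int_a^b r(x,y)\,dx\,dy$ exists iff condition (2) is fulfilled (not condition (1)). On the other hand, the integral $(L^2)\int_a^b X_t\,dt$ exists iff condition (1) is fulfilled. Since (2) $\Rightarrow$ (1), then obviously the existence of $(R)\int_a^b\!\int_a^b r(x,y)\,dx\,dy$ is a sufficient condition for the existence of $(L^2)\int_a^b X_t\,dt$.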
Following Wang (1982) we describe a stochastic process $X = (X_t, t \in [0,1])$ such that its covariance function $r(s,t)$ is not Riemann-integrable but the integral $(L^2)\int_a^b X_t\,dt$ does exist. In particular, this example will show that in general (1) $\not\Rightarrow$ (2), that is conditions (1) and (2) are not equivalent. For the construction of the process $X$ with the desired properties we need some notation and statements. Suppose that $[a,b] = [0,1]$. Let A = {(x, y) :
B
! < x < y < I},
= {x : x E 0,1) where x = (2 210 + j)/221o +l ,0 ~ j
and j(2k) _ 2k For x E B, x
~ 2210 ,j,k E N}
(mod j). 1 ~ j(2k) < 2k. Clearly, if j is odd. then j(2k) is odd. 210 21o (2 + j)/2 + 1 we define the function
=
Let us now formulate four statements numbered (I), (II), (III) and (IV). (For detailed proofs see Wang (1982).)

(I) If $x_1, x_2 \in B$ and $x_1 \ne x_2$, then $g(x_1) \ne g(x_2)$.

Now let
$$B' = \{x : x \in B,\ g(x) \notin B,\ x < g(x)\}, \qquad D = \{(x,y) : x \in B',\ y = g(x),\ (x,y) \in A\}.$$
(II) We have $D \subset A$ and for arbitrary $\delta > 0$ and $(x_0, y_0) \in A$ there exists $(x,y) \in D$ such that $d[(x_0, y_0), (x,y)] < \delta$. (Here $d[\cdot, \cdot]$ is the usual Euclidean distance in the plane.)

Introduce the set
$$B'' = \{y : y = g(x),\ x \in B'\} \cap (\tfrac{1}{2}, 1).$$
Then $B'' \cap B' = \emptyset$. In the square $[\frac{1}{2}, 1] \times [\frac{1}{2}, 1]$ we define the function $\gamma(x,y)$ as follows.
1) If $(x,y) \in A = \{(x,y) : \frac{1}{2} < x < y < 1\}$ we put $\gamma(x,y) = 1$, if $(x,y) \in D$ and $\gamma(x,y) = 0$, otherwise.
2) If $\frac{1}{2} < x = y < 1$, let $\gamma(x,y) = 1$, if $x = y \in B' \cup B''$ and $\gamma(x,y) = 0$, otherwise.
3) If $\frac{1}{2} < y < x < 1$ we take $\gamma(x,y) = \gamma(y,x)$.
For the boundary points of the square, $x = \frac{1}{2}$, $x = 1$, $y = \frac{1}{2}$, $y = 1$, let $\gamma(x,y) = 0$.

(III) The Riemann integral $(R)\int_{1/2}^{1}\!\int_{1/2}^{1} \gamma(x,y)\,dx\,dy$ does not exist.

(IV) $\lim_{\delta_1 \to 0,\, \delta_2 \to 0} \sum_{i=1}^{m} \sum_{j=1}^{n} \gamma(u_i, v_j)\,\Delta x_i \Delta y_j$ exists and is zero.

So, having statements (I) and (II) we can now define a stochastic process whose covariance function equals $\gamma(s,t)$. For $t$ in the interval $[\frac{1}{2}, 1]$ and $t \in B'$, let $\xi_t$ be independent r.v.s, each distributed $N(0,1)$. If $t \in B''$ then there exists a unique $s \in B'$ such that $t = g(s)$; let $\xi_t = \xi_s$. If $t \notin B' \cup B''$, let $\xi_t \equiv 0$. Then it is not difficult to find that $r(s,t) := E[\xi_s \xi_t] = \gamma(s,t)$ where $\gamma(s,t)$ is exactly the function introduced above.

It remains for us to apply statements (III) and (IV). Obviously, (IV) implies the existence of $(L^2)\int_{1/2}^{1} \xi_t\,dt$. However, (III) shows that the integral $(R)\int_{1/2}^{1}\!\int_{1/2}^{1} \gamma(s,t)\,ds\,dt$ does not exist. Therefore the Riemann integrability of the covariance function is not necessary for the existence of the integral of the stochastic process.

As we have seen, the existence of the integral $(L^2)\int_a^b X_t\,dt$ is related to the Riemann integrability of the covariance function of $X$. Thus we arrive at the question: is it possible to weaken this condition replacing it by the Lebesgue integrability? The next example gives the answer. Define the process $X = (X_t, t \in [0,1])$ as follows:
$$X_t = \begin{cases} 0, & \text{if } t \text{ is irrational} \\ \eta, & \text{if } t \text{ is rational} \end{cases}$$
where $\eta$ is a r.v. distributed $N(0,1)$. It is easy to see that $r(s,t) = E[X_s X_t] = 1$ if both $s$ and $t$ are rational, and $r(s,t) = 0$ otherwise. Since $r(s,t) \ne 0$ only over a set of plane Lebesgue measure zero, $r(s,t)$ is Lebesgue-integrable on the square $[0,1] \times [0,1]$ and
$\int_0^1\!\int_0^1 r(s,t)\,ds\,dt = 0$. However, the function $r(s,t)$ does not satisfy condition (1) which is necessary and sufficient for the existence of $(L^2)\int_0^1 X_t\,dt$. Hence the integral $(L^2)\int_0^1 X_t\,dt$ does not exist. Therefore the Lebesgue integrability of the covariance function $r$ of the process $X$ is not sufficient to ensure the existence of the integral $(L^2)\int_0^1 X_t\,dt$.
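The failure of condition (1) for this covariance function can be seen directly from the choice of tag points. The computation below is a sketch in the notation of condition (1), with $r = 1$ on pairs of rationals and $r = 0$ elsewhere.

```latex
\[
u_i, v_j \ \text{all rational:}\quad
\sum_{i=1}^{m}\sum_{j=1}^{n} r(u_i, v_j)\,\Delta x_i \Delta y_j
  = \sum_{i=1}^{m}\Delta x_i \sum_{j=1}^{n}\Delta y_j = 1 ,
\]
\[
u_i \ \text{all irrational:}\quad
\sum_{i=1}^{m}\sum_{j=1}^{n} r(u_i, v_j)\,\Delta x_i \Delta y_j = 0 .
\]
```

Both choices are available for arbitrarily fine partitions of $[0,1]$, so the sums have no limit as $\delta_1 \to 0$, $\delta_2 \to 0$. In terms of the process itself, the integral sums $J_N$ equal $\eta$ for rational tags and $0$ for irrational tags, and $E[(\eta - 0)^2] = 1$.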
19.10.
The continuity of a stochastic process does not imply the continuity of its own generated filtration, and vice versa
Let $X = (X_t, t \ge 0)$ be a stochastic process defined on the probability space $(\Omega, \mathcal{F}, P)$. Denote by $\mathcal{F}_t^X = \sigma\{X_s, s \le t\}$ the smallest $\sigma$-field generated by the process $X$ up to time $t$. Clearly, $\mathcal{F}_t^X \subset \mathcal{F}_{t_1}^X$ if $t < t_1$. The family $(\mathcal{F}_t^X, t \ge 0)$ is called the own generated filtration of the process. It is of general interest to clarify the relationship between the continuity of the process $X$ and the continuity of the filtration $(\mathcal{F}_t^X)$. Recall the following well known result (see Liptser and Shiryaev 1977/78): if $X = w$ is the standard Wiener process, then the filtration $(\mathcal{F}_t^w, t \ge 0)$ is continuous. Let us answer two questions. (a) Does the continuity of the process $X$ imply that the filtration $(\mathcal{F}_t^X)$ is continuous? (b) Is it possible to have a continuous filtration $(\mathcal{F}_t^X)$ which is generated by a discontinuous process $X$?
(i) Let $\Omega = \mathbb{R}^1$, $\mathcal{F} = \mathcal{B}^1$ and $P$ be an arbitrary probability measure on $\mathcal{B}^1$. Consider the process $X = (X_t, t \ge 0)$ where $X_t = X_t(\omega) = t\xi(\omega)$ and $\xi$ is a r.v. distributed $N(0,1)$. Obviously, the process $X$ is continuous. Further, it is easy to see that for $t = 0$, $\mathcal{F}_0^X$ is the trivial $\sigma$-field $\{\emptyset, \Omega\}$. If $t > 0$, we have $\mathcal{F}_t^X = \mathcal{B}^1$. Thus $\mathcal{F}_0^X \ne \mathcal{F}_{0+}^X$. Therefore the filtration $(\mathcal{F}_t^X)$ is not right-continuous and hence not continuous despite the continuity of $X$.

(ii) Let $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ be the Lebesgue measure. Choose the function $h \in C^\infty(\mathbb{R}_+)$ so that $h(x) = 0$ for $x \le \frac{1}{2}$ and for $x > \frac{1}{2}$, $h(x) > 0$, and $h$ is strictly increasing on the interval $[\frac{1}{2}, \infty)$. (It is easy to find examples of such functions.) Consider the process $X = (X_t, t \ge 0)$ where $X_t = X_t(\omega) = \omega h(t)$, $\omega \in \Omega$, $t \ge 0$ and let $(\mathcal{F}_t^X, t \ge 0)$ be its own generated filtration: $\mathcal{F}_t^X = \sigma\{X_s, s \le t\}$. Then it is easy to check that
$$\mathcal{F}_t^X = \begin{cases} \{\emptyset, \Omega\}, & \text{if } 0 \le t \le \frac{1}{2} \\ \mathcal{B}_{[0,1]}, & \text{if } t > \frac{1}{2}. \end{cases}$$
Hence the filtration $(\mathcal{F}_t^X)$ is discontinuous even though the trajectories of $X$ are in the space $C^\infty$.

(iii) Now we aim to show that the filtration $(\mathcal{F}_t^X)$ of the process $X$ can be continuous even if $X$ has discontinuous trajectories. Firstly, let $h : \mathbb{R}_+ \to \mathbb{R}^1$ be any function. Then a countable dense set $D \subset \mathbb{R}_+$ exists such that for all $t \ge 0$ there is a sequence $\{t_n, n \ge 1\}$ in $D$ with $t_n \to t$ and
$h(t_n) \to h(t)$ as $n \to \infty$. The reasoning is as follows. Let $(\Omega, \mathcal{F}, P)$ be a one-point probability space. Define $X_t(\omega) = h(t)$ for $\omega \in \Omega$ and all $t \ge 0$. Since the extended real line $\bar{\mathbb{R}}^1$ is compact, the separability theorem (see Doob 1953, 1984; Gihman and Skorohod 1974/1979; Ash and Gardner 1975) implies that $(X_t, t \ge 0)$ has a separable version $(Y_t, t \ge 0)$ with $Y_t : \Omega \to \mathbb{R}^1$, $t \ge 0$. But $Y_t = X_t$ a.s. and so $Y_t(\omega) = X_t(\omega) = h(t)$, $t \ge 0$. Thus we can construct a class of stochastic processes which are separable but whose trajectories need not possess any useful properties (for example, $X$ can be discontinuous, and even non-measurable; of course, everything depends on the properties of the function $h$ which, let us repeat, can be chosen arbitrarily).

Now take again the above one-point probability space $(\Omega, \mathcal{F}, P)$ and choose any function $h : \mathbb{R}_+ \to \mathbb{R}^1$. Define the process $X = (X_t, t \ge 0)$ by $X_t = X_t(\omega) = h(t)$, $\omega \in \Omega$, $t \ge 0$. Then for all $t \ge 0$, $\mathcal{F}_t^X = \sigma\{X_s, s \le t\}$ is a P-trivial $\sigma$-field in the sense that each event $A \in \mathcal{F}_t^X$ has either $P(A) = 1$ or $P(A) = 0$. Therefore the filtration $(\mathcal{F}_t^X, t \ge 0)$ is continuous. By the above result the process $X$ is separable but its trajectories are equal to $h$, and $h$ is chosen arbitrarily. It is enough to take $h$ as discontinuous.

Finally we can conclude that in general the continuity (and even the infinite smoothness) of a stochastic process does not imply the continuity of its own generated filtration (see case (i) and case (ii)). On the other hand, a discontinuous process can generate a continuous filtration (case (iii)). The interested reader can find several useful results concerning fine properties of stochastic processes and their filtrations in the books by Dellacherie and Meyer (1978, 1982), Métivier (1982), Jacod and Shiryaev (1987), Revuz and Yor (1991) and Rao (1995).
SECTION 20.
MARKOV PROCESSES
We recall briefly only a few basic notions concerning Markov processes. Some definitions will be given in the examples considered below. In a few cases we refer the reader to the existing literature.

Firstly, let $X = (X_t, t \in T \subset \mathbb{R}_+)$ be a family of r.v.s on the probability space $(\Omega, \mathcal{F}, P)$ such that for each $t$, $X_t$ takes values in some countable set $E$. We say that $X$ is a Markov chain if it satisfies the Markov property: for arbitrary $n \ge 1$, $t_1 < t_2 < \dots < t_n < t$, $t_j, t \in T$, $i_1, \dots, i_n, j \in E$,
$$P[X_t = j \mid X_{t_1} = i_1, \dots, X_{t_n} = i_n] = P[X_t = j \mid X_{t_n} = i_n]. \qquad (1)$$
The chain $X$ is finite or infinite accordingly as the state space $E$ is finite or infinite. If $T = \{0, 1, 2, \dots\}$ we write $X = (X_n, n \ge 0)$ or $X = (X_n, n = 0, 1, \dots)$ and say that $X$ is a discrete-time Markov chain. If $T = \mathbb{R}_+$ or $T = [a,b] \subset \mathbb{R}_+$ we say that $X = (X_t, t \ge 0)$ or $X = (X_t, t \in [a,b])$ is a continuous-time Markov chain.
The probabilistic characteristics of any Markov chain can be found if we know the initial distribution $(r_j, j \in E)$ where $r_j = P[X_0 = j]$, $r_j \ge 0$, $\sum_{j \in E} r_j = 1$ and the transition probabilities $p_{ij}(s,t) = P[X_t = j \mid X_s = i]$, $t \ge s$, $i, j \in E$. The chain $X$ is called homogeneous if $p_{ij}(s,t)$ depends on $s$ and $t$ only through $t - s$. In this case, if $X$ is a discrete-time Markov chain, it is enough to know $(r_j, j \in E)$ and the 1-step transition matrix $P = (p_{ij})$ where $p_{ij} = P[X_{n+1} = j \mid X_n = i]$, $n \ge 0$. The $n$-step transition probabilities form the matrix $P^{(n)} = (p_{ij}^{(n)})$ and satisfy the relation
$$p_{ij}^{(m+n)} = \sum_{k \in E} p_{ik}^{(m)} p_{kj}^{(n)} \qquad (2)$$
which is called the Chapman-Kolmogorov equation. Note that the transition probabilities $p_{ij}(t)$ or $p_{ij}(s,t)$ of any continuous-time Markov chain satisfy the so-called forward and backward Kolmogorov equations. In some of the examples below we assume that the reader is familiar with basic notions and results in the theory of Markov chains such as classification of the states, recurrence and transience properties, irreducibility, aperiodicity, infinitesimal matrix and Kolmogorov equations.

Now let us recall some more general notions. Let $X = (X_t, t \ge 0)$ be a real-valued process on the probability space $(\Omega, \mathcal{F}, P)$ and $(\mathcal{F}_t, t \ge 0)$ be its own generated filtration. We say that $X$ is a Markov process with state space $(\mathbb{R}^1, \mathcal{B}^1)$ if it satisfies the Markov property: for arbitrary $\Gamma \in \mathcal{B}^1$ and $t > s$,
$$P[X_t \in \Gamma \mid \mathcal{F}_s] = P[X_t \in \Gamma \mid X_s] \quad \text{a.s.} \qquad (3)$$
This property can also be written in other equivalent forms.

The function $P(s, x; t, \Gamma)$ defined for $s, t \in \mathbb{R}_+$, $s \le t$, $x \in \mathbb{R}^1$, $\Gamma \in \mathcal{B}^1$ is said to be a transition function if: (a) for fixed $s$, $t$ and $x$, $P(s, x; t, \cdot)$ is a probability measure on $\mathcal{B}^1$; (b) $P(s, x; t, \Gamma)$ is $\mathcal{B}^1$-measurable in $x$ for fixed $s$, $t$, $\Gamma$; (c) $P(s, x; s, \Gamma) = \delta_x(\Gamma)$ where $\delta_x(\Gamma)$ is the unit measure concentrated at $x$; (d) the following relation holds:
$$P(s, x; t, \Gamma) = \int_{\mathbb{R}^1} P(s, x; u, dy)\, P(u, y; t, \Gamma), \quad s < u < t.$$
This relation, called the Chapman-Kolmogorov equation, is the continuous analogue of (2). We say that $X = (X_t, t \ge 0)$ is a Markov process with transition function $P(s, x; t, \Gamma)$ if $X$ satisfies (3) and
$$P[X_t \in \Gamma \mid X_s] = P(s, X_s; t, \Gamma) \quad \text{a.s.}$$
The Markov process $X$ is called homogeneous if its transition function $P(s, x; t, \Gamma)$ depends on $s$ and $t$ only through $t - s$. In this case we can introduce the function $P(t, x, \Gamma) = P(0, x; t, \Gamma)$ of three arguments, $t \ge 0$, $x \in \mathbb{R}^1$, $\Gamma \in \mathcal{B}^1$, and express conditions (a)-(d) in a simpler form.
Note that the strong Markov property will be introduced and compared with the usual Markov property (3) in one of the examples.

Complete presentations of the theory of Markov chains in discrete and continuous time can be found in the books by Doob (1953), Chung (1960), Gihman and Skorohod (1974/1979), Isaacson and Madsen (1976) and Iosifescu (1980). Some important books are devoted to the general theory of Markov processes: see Dynkin (1961, 1965), Blumenthal and Getoor (1968), Rosenblatt (1971, 1974), Wentzell (1981), Chung (1982), Ethier and Kurtz (1986), Bhattacharya and Waymire (1990) and Rogers and Williams (1994).

In this section we have included examples which examine the relationships between some similar notions or illustrate some of the basic properties of the Markov chains and processes. Note especially that many other useful results and counterexamples can be found in the recent publications indicated in the Supplementary Remarks.
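For a homogeneous discrete-time chain the Chapman-Kolmogorov relation (2) is simply the matrix identity $P^{(m+n)} = P^{(m)} P^{(n)}$, which is easy to check with exact arithmetic. The following is a minimal sketch; the 2-state matrix and the helper names `mat_mul` and `n_step` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def n_step(P, n):
    """n-step transition matrix P^(n) of a homogeneous chain, n >= 1."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

# Hypothetical 2-state transition matrix, chosen only for illustration.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]

m, n = 2, 3
lhs = n_step(P, m + n)                       # entries p_ij^(m+n)
rhs = mat_mul(n_step(P, m), n_step(P, n))    # sums over k of p_ik^(m) p_kj^(n)
print(lhs == rhs)
```

The examples of Section 20.1 show that the converse direction fails: transition probabilities may satisfy (2) although the underlying sequence is not Markov.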
20.1.
Non-Markov random sequences whose transition functions satisfy the Chapman-Kolmogorov equation
Here we consider a few examples to illustrate the difference between the Markov property, which defines a Markov process, and the Chapman-Kolmogorov equation, which is a consequence of the Markov property.

(i) Suppose an urn contains four balls numbered 1, 2, 3, 4. Randomly we choose one ball, note its number and return it to the urn. This procedure is repeated many times. Denote by $\xi_n$ the number on the $n$th chosen ball. For $j = 1, 2, 3$ introduce the events $A_j^{(n)} = \{\text{either } \xi_n = j \text{ or } \xi_n = 4\}$ and let $X_{3(m-1)+j} = 1$ if $A_j^{(m)}$ occurs, and 0 otherwise, $m = 1, 2, \dots$. Thus we have defined the random sequence $(X_n, n \ge 1)$ and want to establish whether it satisfies the Markov property and the Chapman-Kolmogorov equation. The r.v.s $X_n$ are pairwise independent with $P[X_n = 1] = \frac{1}{2}$, so if each of $k_1, k_2$ is 1 or 0, then
$$P[X_n = k_2 \mid X_m = k_1] = \tfrac{1}{2} \quad \text{for all } m \ne n.$$
Therefore for $1 < m < n$ we have
$$P[X_n = k_2 \mid X_1 = k_1] = \tfrac{1}{2} = \tfrac{1}{2} \cdot \tfrac{1}{2} + \tfrac{1}{2} \cdot \tfrac{1}{2} = P[X_n = k_2 \mid X_m = 0]\,P[X_m = 0 \mid X_1 = k_1] + P[X_n = k_2 \mid X_m = 1]\,P[X_m = 1 \mid X_1 = k_1].$$
This means that the transition probabilities of the sequence $\{X_n, n \ge 1\}$ satisfy the Chapman-Kolmogorov equation. Further, the event $[X_{3m-2} = 1, X_{3m-1} = 1]$ means that $\xi_m = 4$ which implies that $X_{3m} = 1$. Thus
$$P[X_{3m} = 1 \mid X_{3m-2} = 1, X_{3m-1} = 1] = 1, \quad m = 1, 2, \dots.$$
This relation shows that the Markov property does not hold for the sequence $\{X_n, n \ge 1\}$. Therefore $\{X_n, n \ge 1\}$ is not a Markov chain despite the fact that its transition probabilities satisfy the Chapman-Kolmogorov equation.

(ii) In Example 7.1(iii) we constructed an infinite sequence of pairwise i.i.d. r.v.s $\{X_n, n \ge 1\}$ where $X_n$ takes the values 1, 2, 3 with probability $\frac{1}{3}$ each. Thus we have a random sequence such that $p_{ij} = P[X_{n+1} = j \mid X_n = i] = \frac{1}{3}$ for all possible $i, j$. The Chapman-Kolmogorov equation is trivially satisfied with $p_{ij}^{(n)} = \frac{1}{3}$, $n \ge 1$. However, the sequence $\{X_n, n \ge 1\}$ is not Markovian. To see this, suppose that at time $n = 1$ we have $X_1 = 2$. Then a transition to state 3 at the next step is possible iff the initial state was 1. Hence the transitions following the first step depend not only on the present state but also on the initial state. This means that the Markov property is violated although the Chapman-Kolmogorov equation is satisfied.

(iii) Every $N \times N$ stochastic matrix $P$ defines the transition probabilities of a Markov process with discrete time. Its $n$-step transition probabilities satisfy the Chapman-Kolmogorov equation which can be written as the semigroup relation $P^{m+n} = P^m P^n$. Now we are going to show that for $N \ge 3$ there is a non-Markov process with $N$ states whose transition probabilities satisfy the same equation.

Let $\Omega_1$ be the sample space whose points $(x^{(1)}, \dots, x^{(N)})$ are the random permutations of $(1, \dots, N)$ each with probability $1/N!$. Let $\Omega_2$ be the set of the $N$ points $(x^{(1)}, \dots, x^{(N)})$ such that $x^{(j)} = \nu$ for all $j = 1, \dots, N$, one point for each $\nu$ of the set $\{1, \dots, N\}$. Each point in $\Omega_2$ has probability $1/N$. Let $\Omega$ be the mixture of $\Omega_1$ and $\Omega_2$ with $\Omega_1$ carrying weight $1 - 1/N$ and $\Omega_2$ weight $1/N$. More formally, $\Omega$ contains $N! + N$ arrangements $(x^{(1)}, \dots, x^{(N)})$ which represent either a permutation of $(1, \dots, N)$ or the $N$-fold repetition of the integer $\nu$, $\nu = 1, \dots, N$. To each point of the first class we attribute probability $(1 - N^{-1})/N!$; to each point of the second class, probability $N^{-2}$. Then clearly
$$P[x^{(i)} = \nu] = N^{-1}, \qquad P[x^{(i)} = \nu, x^{(j)} = \mu] = N^{-2}, \quad i \ne j.$$
Thus all transition probabilities of the sequence constructed above are the same, namely $P[x^{(i)} = \nu \mid x^{(j)} = \mu] = N^{-1}$. If $x^{(1)} = 1$, $x^{(2)} = 1$, then $P[x^{(3)} \ne 1 \mid x^{(1)} = 1, x^{(2)} = 1] = 0$ which means that the Markov property is not satisfied. Nevertheless the Chapman-Kolmogorov equation is satisfied.
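Examples (i) and (iii) live on finite sample spaces, so their claims can be verified by exact enumeration. The sketch below is only an illustration: the helper names (`prob`, `cond`, `urn_space`, `permutation_mixture`) are ours, the urn space covers a single draw (the triple $X_1, X_2, X_3$), and $N = 3$ is chosen arbitrarily for example (iii).

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def prob(space, event):
    """Probability of `event` on a finite space given as {outcome: weight}."""
    return sum((w for x, w in space.items() if event(x)), Fraction(0))

def cond(space, event, given):
    """Conditional probability P[event | given]."""
    return prob(space, lambda x: event(x) and given(x)) / prob(space, given)

def urn_space():
    """(X_1, X_2, X_3) produced by one draw from the urn of example (i)."""
    return {tuple(int(ball in (j, 4)) for j in (1, 2, 3)): Fraction(1, 4)
            for ball in (1, 2, 3, 4)}

def permutation_mixture(N):
    """Sample space of example (iii): permutations plus N constant points (N >= 3)."""
    space = {perm: (1 - Fraction(1, N)) / factorial(N)
             for perm in permutations(range(1, N + 1))}
    for v in range(1, N + 1):
        space[(v,) * N] = Fraction(1, N * N)
    return space

urn = urn_space()
print(cond(urn, lambda x: x[2] == 1, lambda x: x[1] == 1))                # one-step transition
print(cond(urn, lambda x: x[2] == 1, lambda x: x[0] == 1 and x[1] == 1))  # depends on the past

mix = permutation_mixture(3)
print(cond(mix, lambda x: x[1] == 2, lambda x: x[0] == 1))                # one-step transition
print(cond(mix, lambda x: x[2] != 1, lambda x: x[0] == 1 and x[1] == 1))  # depends on the past
```

In both cases the one-step conditional probabilities are constant, so the Chapman-Kolmogorov equation holds trivially, while conditioning on two past values changes the answer.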
20.2.
Non-Markov processes which are functions of Markov processes
If $X = (X_t, t \ge 0)$ is a Markov process with state space $(E, \mathcal{E})$ and $g$ is a one-one mapping of $E$ into $E$, then $Y = (g(X_t), t \ge 0)$ is again a Markov process. However, if $g$ is not a one-one function, the Markov property may not hold. Let us illustrate this possibility by a few examples.
(i) Let $\{X_n, n = 0, 1, 2, \dots\}$ be a Markov chain with state space $E = \{1, 2, 3\}$, transition matrix
p=(~; 1~) 3
4
12
!,
and initial distribution r = (j, j). It is easy to see that the chain $\{X_n\}$ is stationary. Consider now the new process $\{Y_n, n = 0, 1, 2, \dots\}$ where $Y_n = g(X_n)$ and $g$ is a given function on $E$. Suppose the states $i$ of $X$ on which $g$ equals some fixed constant are collapsed into a single state of the new process $Y$ called an aggregated process. The collection of states on which $g$ takes the value $x$ will be called the set of states $S_x$. It is obvious that only non-empty sets of states are of interest. For the Markov chain given above let us collapse the set of states $S$ consisting of 1, 2 into one state. Then it is not difficult to find that
This relation implies that the new process Y is not Markov. (ii) Let {X n , n = 0, 1, ... } be a stationary Markov chain with state space E = {I, 2,3,4} and n-step transition matrices
p(n) -
k
111) l I n (01-10) 0 0 00 1111 +A 0-110 ( 1111 0 0 00
where $n = 1, 2, \dots$ and $\lambda$, $\lambda'$ are real numbers sufficiently small in absolute value. Take the function $g : E \to \{1, 2\}$ such that $g(1) = g(2) = 1$, $g(3) = g(4) = 2$, and consider the aggregated process $\{Y_n, n = 0, 1, \dots\}$ where $Y_n = g(X_n)$. If $Q^{(n)}$ denotes the $n$-step transition matrix of $Y$, we find
-4 ) 1
•
2
It turns out that $Q^{(n)}$ does not depend on $\lambda'$ and it is easy to check that $Q^{(n)}$, $n \ge 1$, satisfy the Chapman-Kolmogorov equation. However, the relation
P[Yo = 1, Y I
= 1, Y2 =
.1]
= k(1 + 2A + )"X)
implies that the sequence $\{Y_n, n \ge 0\}$ is not Markov when $\lambda \ne \lambda'$.
(iii) Consider two Markov chains, $X_1$ and $X_2$, with the same state space $E$ and initial distribution $r$, and with transition matrices $P_1$ and $P_2$ respectively. Define a new process, say $X$, with the same state space $E$, initial distribution $r$ and $n$-step transition matrix $P^{(n)} = \frac{1}{2}P_1^{(n)} + \frac{1}{2}P_2^{(n)}$. Then it can be shown that the process $X$, which is called a mixture of $X_1$ and $X_2$, is not Markov.
(iv) Let $w = (w_t, t \ge 0)$ be a standard Wiener process. Consider the processes
$$M = (M_t := \max_{0 \le s \le t} w_s,\ t \ge 0), \qquad Y = M - w.$$
Then obviously the process $M$ is not Markov. According to a result by Freedman (1971), see also Revuz and Yor (1991), $Y$ is a Markov process distributed as $|w|$ where $|w|$ is called a Wiener process with a reflecting barrier at 0. Since the Wiener process $w$ itself is a Markov process, we have the relation
$$M = Y + w.$$
Here the right-hand side is a sum of two Markov processes but the left-hand side is a process which is not Markov. In other words, the sum of two Markov processes need not be a Markov process. Note, however, that the sum of two independent Markov processes preserves this property.
20.3.
Comparison of three kinds of ergodicity of Markov chains
Let $X = \{X_n, n = 0, 1, \dots\}$ be a non-stationary Markov chain with state space $E$ ($E$ is a countable set, finite or infinite). The chain $X$ is described completely by the initial distribution $f^{(0)} = (f_j^{(0)}, j \in E)$ and the sequence $\{P_n, n \ge 1\}$ of the transition matrices. If $\lim_{n \to \infty} p_{ij}^{(k,k+n)} = \pi_j$ exists for all $j \in E$ independently of $i$, $\pi_j > 0$ and $\sum_{j \in E} \pi_j = 1$, we say that the chain $X$ is ergodic and $(\pi_j, j \in E)$ is its ergodic distribution. Introduce the following notation:
$$f^{(m)} = f^{(0)} P_1 P_2 \cdots P_m, \qquad f^{(k,m)} = f^{(0)} P_{k+1} P_{k+2} \cdots P_m, \qquad P^{(k,m)} = P_{k+1} P_{k+2} \cdots P_m.$$
The Markov chain $X$ is called weakly ergodic if for all $k \in \mathbb{N}$,
$$\lim_{m \to \infty} \sup_{f^{(0)}, g^{(0)}} \|f^{(k,m)} - g^{(k,m)}\| = 0 \qquad (1)$$
where $f^{(0)} = (f_j^{(0)}, j \in E)$ and $g^{(0)} = (g_j^{(0)}, j \in E)$ are arbitrary initial distributions of $X$.
The chain $X$ is called strongly ergodic if there is a probability distribution $q = (q_j, j \in E)$ such that for all $k \in \mathbb{N}$,
$$\lim_{m \to \infty} \sup_{f^{(0)}} \|f^{(k,m)} - q\| = 0. \qquad (2)$$
(In (1) and (2) the norm of the vector $x = (x_j, j \in E)$ is defined by $\|x\| = \sum_{j \in E} |x_j|$.)
Now we can easily make a distinction between the ergodicity, the weak ergodicity and the strong ergodicity in the case of stationary Markov chains. For every Markov chain we can introduce the so-called $\delta$-coefficient. If $P = (p_{ij})$ is the transition matrix of the chain we put
$$\delta(P) = 1 - \inf_{i,k \in E} \sum_{j \in E} \min(p_{ij}, p_{kj}).$$
This coefficient is effectively used for studying Markov chains and will be used in the examples below.

Our aim is to compare the three notions of ergodicity introduced above. Obviously, strong ergodicity implies weak ergodicity. Thus the first question is whether the converse is true. According to a result by Isaacson and Madsen (1976), if the state space $E$ is finite, the ergodicity and the weak ergodicity are equivalent notions. The second question is: what happens if $E$ is infinite? The examples below will answer these and other related questions. (i) Let
{Xn} be a non-stationary Markov chain with
We can easily see that $\delta(P_{2n}) = 1$ and $\delta(P_{2n-1}) = \frac{1}{2}$. Hence for all $k$,
$$\delta(P^{(k,m)}) \le \prod_{j=k+1}^{m} \delta(P_j) \le (1/2)^{[(m-k)/2]} \to 0 \quad \text{as } m \to \infty.$$
However, the condition $\delta(P^{(k,m)}) \to 0$ for all $k$ as $m \to \infty$ is necessary and sufficient for any Markov chain $X$ to be weakly ergodic (see Isaacson and Madsen 1976). Therefore the Markov chain considered here is weakly ergodic. Let us determine whether $X$ is strongly ergodic. Take $f^{(0)} = (0, 1)$ as an initial distribution. Then
j(2k) = j(0)(P]P2)(P3 P4)'" (P2k -IPZk )
= (0, 1) (~
:
r
= (0, I)
and
= (0, 1) (~
r
= (0, 1)
;
Hence $\|f^{(2j)} - f^{(2j+1)}\| = 2$ for $j = k, k+1, \dots$ and the sequence $\{f^{(k)}\}$ does not converge. Therefore the chain $X$ is not strongly ergodic. (ii) Again, let $\{X_n\}$ be a non-stationary Markov chain with
P2n-l
1 1 __ 2n-1 = ( 1- _1_
2n-1
2n~1
1
)
1 2n-1
Then for any initial distribution
P2n
'
1(0)
1 - ....L 2n
2n
=
1
(
1 - ....L 2n
2n
),
n = 1,2, ....
we have if m is odd jf 111, is even.
It is not difficult to check that condition (1) is satisfied while condition (2) is violated. Therefore this Markov chain is weakly, but not strongly, ergodic.
(iii) Let {Xn} be a stationary Markov chain with infinite state space E and transition matrix 1 1 2: 2: 0 0 0 1 0 1 0 0 2:
P=
2:
0
4
0
1 4
0
0
8
7
0
•
3
.o
.......
.o
•
.o
•
.o
0 1 8
..
.o
...
.o.o.o
•
It can be shown that this chain is irreducible, positive recurrent and aperiodic. Hence it is ergodic (see Doob 1953; Chung 1960), that is independently of the initial distribution $f^{(0)}$, $\lim_{n \to \infty} f^{(0)} P^{(n)} = \pi$ exists and $\pi = (\pi_j, j \in E)$ is a probability distribution. However, $\delta(P^{(m)}) = 1$ for all $m$ which implies that the chain is not weakly ergodic.
(iv) Since the condition $\delta(P^{(k,m)}) \to 0$ for all $k$ as $m \to \infty$ is necessary and sufficient for the weak ergodicity of non-stationary Markov chains, it is natural to ask how this condition is expressed in the stationary case. Let $\{X_n\}$ be a stationary Markov chain with transition matrix $P$. Then $P^{(k,m)} = P^{m-k}$ and for the $\delta$-coefficient we find
$$\delta(P^{(k,m)}) = \delta(P^{m-k}) \le [\delta(P)]^{m-k}.$$
232
COUNTEREXAMPLES IN PROBABILITY
This means that the condition c5 (P) < 1 is sufficient for the chain to be weakly ergodic. However, this condition is not necessary. Indeed, let
The Markov chain with this P is irreducible, aperiodic and pOSitive recurrent. Therefore (see Isaacson and Madsen 1976) the chain is weakly ergodic. At the same time c5(P) = l.
20.4. Convergence of functions of an ergodic Markov chain
Let $X = (X_n, n \ge 0)$ be a Markov chain with countable state space $E$ and $n$-step transition matrix $(p_{ij}^{(n)})$. Let $\pi_j := \lim_{n\to\infty} p_{ij}^{(n)}$ be the ergodic distribution of $X$ and $g : E \mapsto \mathbb{R}^1$ be a bounded and measurable function. We are interested in the conditions under which the following relation holds:
$$\lim_{n\to\infty} E[g(X_n)] = \sum_{j\in E} \pi_j g(j). \tag{1}$$
One of the possible answers, given by Holewijn and Hordijk (1975), can be formulated as follows. Let $X$ be an irreducible, positive recurrent and aperiodic Markov chain with values in the space $E$. Denote by $(\pi_j, j \in E)$ its ergodic distribution. Suppose that the function $g$ is non-negative and is such that $\sum_{j\in E} \pi_j g(j) < \infty$. Additionally, suppose that for some $i_0 \in E$, $P[X_0 = i_0] = 1$. Then relation (1) does hold. Our aim now is to show that the condition $X_0 = i_0$ is essential. In particular, this condition cannot be replaced by the assumption that $X$ has some proper distribution over the whole space $E$ at the time $n = 0$. Consider the Markov chain $X = (X_n, n \ge 0)$ which takes values in the set $E = \{0, 1, 2, \ldots\}$ and has the following transition probabilities:
$$p_{0j} = q^j p, \ \text{if } j = 0, 1, 2, \ldots, \qquad p_{i,i-1} = 1, \ \text{if } i = 1, 2, \ldots, \qquad p_{ij} = 0, \ \text{otherwise},$$
where $0 < p < 1$, $q = 1 - p$. A direct calculation shows that
$$p_{ij}^{(n)} = q^j p, \ \text{if } i = 0, 1, \ldots, n-1, \ j = 0, 1, 2, \ldots, \qquad p_{n-1+k,\,k-1}^{(n)} = 1, \ \text{if } k = 1, 2, \ldots,$$
and $p_{ij}^{(n)} = 0$ otherwise.
The chain $X$ is irreducible, aperiodic and positive recurrent and its ergodic distribution $(\pi_j, j \in E)$ is given by
$$\pi_j := \lim_{n\to\infty} p_{ij}^{(n)} = q^j p.$$
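Suppose now $g$ is a function on $E$ satisfying the following condition: $\sum_{j=0}^{\infty} q^j |g(j)| < \infty$. Suppose also that $(r_j, j \in E)$ is the initial distribution of the chain $X$. Then
$$E[g(X_n)] = \sum_{j=0}^{\infty}\Big(\sum_{i=0}^{\infty} r_i p_{ij}^{(n)}\Big) g(j) = (r_0 + \cdots + r_{n-1}) \sum_{j=0}^{\infty} p q^j g(j) + \sum_{j=0}^{\infty} r_{n+j}\, g(j).$$
Clearly, $E[g(X_n)]$ will converge to $\sum_{j=0}^{\infty} p q^j g(j)$ as $n \to \infty$ iff
$$\lim_{n\to\infty} \sum_{j=0}^{\infty} r_{n+j}\, g(j) = 0.$$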
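Now we can make our choice of $g$ and $(r_j, j \in E)$. Let
$$g(j) = j, \ \text{if } j = 0, 1, 2, \ldots \qquad \text{and} \qquad r_j = \begin{cases} 0, & \text{if } j = 0 \\ 6\pi^{-2} j^{-2}, & \text{if } j = 1, 2, \ldots. \end{cases}$$
Then $\sum_{j=0}^{\infty} q^j |g(j)| < \infty$ and obviously for all $n \ge 0$ we get
$$\sum_{j=0}^{\infty} r_{n+j}\, g(j) = \infty.$$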
Hence a relation like (1) is not possible. Therefore in general we cannot replace the condition $P[X_0 = i_0] = 1$ by another one assuming that $X_0$ is a non-degenerate r.v. with an arbitrary distribution over the whole state space $E$.
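The divergence of the tail term can be seen numerically. The sketch below assumes the choices $g(j) = j$ and $r_j = 6\pi^{-2}j^{-2}$ for $j \ge 1$ ($r_0 = 0$); the helper names are illustrative only. It truncates the series $\sum_j r_{n+j}\,g(j)$ and also checks the stationarity equation $\pi_j = \pi_0 p_{0j} + \pi_{j+1}$ for $\pi_j = q^j p$.

```python
import math

def r(j):
    """Assumed initial distribution: r_0 = 0, r_j = 6 / (pi^2 j^2) for j >= 1."""
    return 0.0 if j == 0 else 6.0 / (math.pi ** 2 * j ** 2)

def g(j):
    """Assumed test function g(j) = j."""
    return float(j)

def tail_partial_sum(n, terms):
    """Partial sum of sum_{j >= 0} r_{n+j} g(j), using the first `terms` terms."""
    return sum(r(n + j) * g(j) for j in range(terms))

def stationary_residual(p, j):
    """pi_j - (pi_0 * p_{0j} + pi_{j+1} * 1) with pi_j = q^j p; zero if pi is stationary."""
    q = 1.0 - p

    def pi(i):
        return q ** i * p

    return pi(j) - (pi(0) * q ** j * p + pi(j + 1))
```

The partial sums grow roughly like $(6/\pi^2)\log(\text{terms})$, so they exceed any bound as more terms are added, in line with the series being infinite for every fixed $n$.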
20.5. A useful property of independent random variables which cannot be extended to stationary Markov chains
It is well known that sequences of independent r.v.s obey several interesting properties (see Chung 1974; Stout 1974a; Petrov 1975). It turns out that the independence condition is essential for the validity of many of these properties. Let us formulate the following result: if $\{X_n, n \ge 1\}$ is a sequence of i.i.d. r.v.s and $EX_1 = \infty$ then $\limsup_{n\to\infty}(X_n/n) = \infty$ a.s. This result is a consequence of the Borel-Cantelli lemma (see Tanny 1974; O'Brien 1982). Our aim now is to clarify whether a similar result holds for a 'weakly' dependent random sequence. We shall treat the case of $\{X_n, n \ge 0\}$ forming a stationary Markov chain. Let $X = (X_n, n \ge 0)$ be a stationary Markov chain with state space $E = \{1, 2, \ldots\}$ and
$$P[X_n = k] = \frac{1}{k(k+1)}, \quad k = 1, 2, \ldots, \ n = 0, 1, 2, \ldots,$$
$$P[X_n = k+1 \mid X_{n-1} = k] = \frac{k}{k+2}, \qquad P[X_n = 1 \mid X_{n-1} = k] = \frac{2}{k+2}, \quad k = 1, 2, \ldots, \ n = 1, 2, \ldots.$$
It is easy to see that $EX_n = \infty$ for each $n$. However, we have $X_n \le X_0 + n$ a.s. for all $n$, which implies that $P[\limsup_{n\to\infty}(X_n/n) \le 1] = 1$. Hence $\limsup_{n\to\infty}(X_n/n)$ is not infinity as in the case of independent r.v.s. Using a result by O'Brien (1982) we conclude that $\limsup_{n\to\infty}(X_n/n) = 0$ a.s. Therefore we have constructed a stationary Markov chain such that for each $n$, $EX_n = \infty$ but with $\limsup_{n\to\infty}(X_n/n) = 0$ a.s.
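A short simulation makes the pathwise bound $X_n \le X_0 + n$ tangible, and exact fractions confirm that $1/(k(k+1))$ is indeed invariant under the stated transitions. This is a sketch only: the function names, the seed and the inverse-transform sampler for $X_0$ are choices made here, not part of the original example.

```python
import random
from fractions import Fraction as Fr

def pi(k):
    """Stationary law from the text: P[X_n = k] = 1 / (k (k + 1))."""
    return Fr(1, k * (k + 1))

def sample_initial(rng):
    """Sample X_0 by inverse transform: P[X_0 >= k] = 1/k, so X_0 = floor(1/U), U uniform on (0, 1]."""
    u = 1.0 - rng.random()          # lies in (0, 1]
    return int(1.0 / u)

def next_state(k, rng):
    """Move to k + 1 with probability k/(k+2), otherwise return to state 1."""
    return k + 1 if rng.random() < k / (k + 2) else 1

def simulate(n_steps, seed=0):
    """Return one sample path X_0, X_1, ..., X_{n_steps}."""
    rng = random.Random(seed)
    path = [sample_initial(rng)]
    for _ in range(n_steps):
        path.append(next_state(path[-1], rng))
    return path
```

Since each step either increases the state by one or resets it to 1, the bound holds on every path regardless of the random draws; the simulation merely illustrates it.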
20.6. The partial coincidence of two continuous-time Markov chains does not imply that the chains are equivalent
Let $X = (X_t, t \ge 0)$ and $\tilde X = (\tilde X_t, t \ge 0)$ be homogeneous continuous-time Markov chains with the same state space $E$, the same initial distribution and transition probabilities $p_{ij}(t)$ and $\tilde p_{ij}(t)$ respectively. If $p_{ij}(t) = \tilde p_{ij}(t)$, $i, j \in E$ for infinitely many $t$, but not for all $t \ge 0$, we say that $X$ and $\tilde X$ coincide partially. If instead we have $p_{ij}(t) = \tilde p_{ij}(t)$, $i, j \in E$ for all $t \ge 0$, then the processes $X$ and $\tilde X$ are equivalent (stochastically equivalent) in the sense that each one is a modification of the other (see Example 19.3). Firstly, let us note that the transition probabilities of any continuous-time Markov chain satisfy two systems of differential equations which are called Kolmogorov equations (see Chung 1960; Gihman and Skorohod 1974/1979). These equations are written in terms of the corresponding infinitesimal matrix $Q = (q_{ij})$ and under some natural conditions they uniquely define the transition probabilities $p_{ij}(t)$, $t > 0$. Let $X$ and $\tilde X$ be Markov chains each taking values in the set $\{1, 2, 3\}$. Suppose $X$ and $\tilde X$ are defined by their infinitesimal matrices $Q = (q_{ij})$ and $\tilde Q = (\tilde q_{ij})$ respectively, where
$$Q = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 1 & 0 & -1 \end{pmatrix}, \qquad \tilde Q = \begin{pmatrix} -1 & \frac12 & \frac12 \\ \frac12 & -1 & \frac12 \\ \frac12 & \frac12 & -1 \end{pmatrix}.$$
Thus, knowing $Q$ and $\tilde Q$ and using the Kolmogorov equations, we can show that the transition probabilities $p_{ij}(t)$ and $\tilde p_{ij}(t)$ have the following explicit form:
$$p_{11}(t) = p_{22}(t) = p_{33}(t) = \tfrac13 + \tfrac23 e^{-3t/2}\cos(\sqrt3\, t/2),$$
$$p_{12}(t) = p_{23}(t) = p_{31}(t) = \tfrac13 + \tfrac23 e^{-3t/2}\cos(\sqrt3\, t/2 - 2\pi/3),$$
$$p_{13}(t) = p_{21}(t) = p_{32}(t) = \tfrac13 + \tfrac23 e^{-3t/2}\cos(\sqrt3\, t/2 + 2\pi/3),$$
$$\tilde p_{11}(t) = \tilde p_{22}(t) = \tilde p_{33}(t) = \tfrac13 + \tfrac23 e^{-3t/2}, \qquad \tilde p_{ij}(t) = \tfrac13 - \tfrac13 e^{-3t/2}, \ \text{if } i \ne j, \ i, j = 1, 2, 3.$$
(The details are left to the reader.)
Obviously, $p_{ij}(t) = \tilde p_{ij}(t)$ for every $t = 4k\pi/\sqrt3$, $k \in \mathbb{N}$, but for all other $t$ we have $p_{ij}(t) \ne \tilde p_{ij}(t)$. Therefore the processes $X$ and $\tilde X$ are not equivalent, though they partially coincide.
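The partial coincidence can be verified directly from the closed-form transition probabilities. The sketch below evaluates both $3 \times 3$ transition matrices at a given $t$ and measures the largest entrywise difference; the function names are illustrative, and the matrices are built from the explicit expressions rather than by solving the Kolmogorov equations.

```python
import math

def p_cyclic(t):
    """Transition matrix of X at time t from the closed-form cosine expressions."""
    e = math.exp(-1.5 * t)
    w = math.sqrt(3.0) * t / 2.0
    a = 1 / 3 + (2 / 3) * e * math.cos(w)                      # p11 = p22 = p33
    b = 1 / 3 + (2 / 3) * e * math.cos(w - 2 * math.pi / 3)    # p12 = p23 = p31
    c = 1 / 3 + (2 / 3) * e * math.cos(w + 2 * math.pi / 3)    # p13 = p21 = p32
    return [[a, b, c], [c, a, b], [b, c, a]]

def p_symmetric(t):
    """Transition matrix of the second chain at time t."""
    e = math.exp(-1.5 * t)
    a = 1 / 3 + (2 / 3) * e     # diagonal entries
    b = 1 / 3 - (1 / 3) * e     # off-diagonal entries
    return [[a, b, b], [b, a, b], [b, b, a]]

def max_abs_diff(t):
    """Largest entrywise difference between the two transition matrices at time t."""
    P, R = p_cyclic(t), p_symmetric(t)
    return max(abs(P[i][j] - R[i][j]) for i in range(3) for j in range(3))
```

Evaluating `max_abs_diff` at $t = 4k\pi/\sqrt3$ gives a value at rounding-error level, while at a generic time such as $t = 1$ the two matrices differ visibly.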
20.7. Markov processes, Feller processes, strong Feller processes and relationships between them
Let $X = (X_t, t \ge s, P_{s,x})$ be a Markov family: that is, $(X_t, t \ge s)$ is a Markov process with respect to the probability measure $P_{s,x}$, $P_{s,x}[X_s = x] = 1$ and $P(s, x; t, \Gamma)$, $t \ge s$, $x \in \mathbb{R}^1$, $\Gamma \in \mathcal{B}^1$ is its transition function. As usual, $\mathbb{B} = \mathbb{B}(\mathbb{R}^1)$ and $\mathbb{C} = \mathbb{C}(\mathbb{R}^1)$ will denote the set of all bounded and measurable functions on $\mathbb{R}^1$, and the set of all bounded and continuous functions on $\mathbb{R}^1$ respectively. By the equality
$$P^{st}g(x) = E_{s,x}[g(X_t)] = \int_{\mathbb{R}^1} g(y)P(s, x; t, dy)$$
we define on $\mathbb{B}$ a semigroup of operators $\{P^{st}\}$. Obviously we have the inclusion $P^{st}\mathbb{B} \subset \mathbb{B}$ and, moreover, $P^{st}\mathbb{C} \subset \mathbb{B}$. A Markov process for which $P^{st}\mathbb{C} \subset \mathbb{C}$ is called a Feller process. This means that for each continuous and bounded function $g$, the function $P^{st}g(x)$ is continuous in $x$. In other words,
$$\int_{\mathbb{R}^1} g(y)P(s, x; t, dy) \to \int_{\mathbb{R}^1} g(y)P(s, x_0; t, dy) \quad \text{as } x \to x_0, \ x_0 \in \mathbb{R}^1,$$
which is equivalent to the weak continuity of $P(\cdot)$ with respect to the second argument (the starting point of the process). Let us now introduce another notion. If for each $g \in \mathbb{B}$ the function $P^{st}g(x)$ is continuous in $x$, the Markov process is called a strong Feller process. Clearly, the assumption for a process to be strong Feller is more restrictive than that for a process to be Feller. Thus, every strong Feller process is also a Feller process. However, the converse is not always true. There are also Markov processes which are not Feller processes, although the condition for a process to be Feller seems very natural and not too strong.
(i) Let the family $X = (X_t, t \ge s, P_{s,x})$ describe the motion for $t \ge s$ of a particle starting at time $s$ from the position $X_s = x$: if $X_s < 0$, the particle moves to the left with unit rate; if $X_s > 0$ it moves to the right with unit rate; if $X_s = 0$, the particle moves to the left or to the right with probability $\tfrac12$ for each of these two directions. Formally this can be expressed by:
$$P_{s,x}[X_t = x + (t - s),\ t \ge s] = 1, \quad \text{if } x > 0,$$
$$P_{s,x}[X_t = x - (t - s),\ t \ge s] = 1, \quad \text{if } x < 0,$$
$$P_{s,x}[X_t = t - s,\ t \ge s] = P_{s,x}[X_t = -(t - s),\ t \ge s] = \tfrac12, \quad \text{if } x = 0.$$
It is easy to see that $X = (X_t, t \ge s, P_{s,x})$ is a Markov family. Further, if $g$ is a continuous and bounded function, we find explicitly that
$$P^{st}g(x) = \begin{cases} g(x + (t - s)), & \text{if } x > 0 \\ g(x - (t - s)), & \text{if } x < 0 \\ \tfrac12 g(t - s) + \tfrac12 g(-(t - s)), & \text{if } x = 0. \end{cases}$$
Since $P^{st}g(x)$ has a discontinuity at $x = 0$ whenever $g(t - s) \ne g(-(t - s))$, it follows that $X$ is not a Feller process even though it is a Markov process.
(ii) It is easy to give an example of a process which is Feller but not strong Feller. Indeed, by the formula
$$P(t, x, \Gamma) = 1_\Gamma(x + vt), \quad t \ge 0, \ x \in \mathbb{R}^1, \ \Gamma \in \mathcal{B}^1, \ v = \text{constant} > 0$$
we define a transition function which corresponds to a homogeneous Markov process. Actually, this $P$ describes a motion with constant velocity $v$. All that remains is to check that the process is Feller but is not strong Feller.
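Both cases can be probed numerically. The sketch below implements the operator $P^{st}g$ of case (i) and the operator $P^t g(x) = g(x + vt)$ of case (ii); the test functions (`math.atan` as a bounded continuous $g$, and an indicator of $[0, \infty)$ as a bounded measurable but discontinuous $g$) are illustrative choices made here.

```python
import math

def particle_operator(g, x, t, s=0.0):
    """P^{st} g(x) for case (i): drift away from the origin, fair coin at x = 0."""
    d = t - s
    if x > 0:
        return g(x + d)
    if x < 0:
        return g(x - d)
    return 0.5 * g(d) + 0.5 * g(-d)

def uniform_motion_operator(g, x, t, v=1.0):
    """P^t g(x) for case (ii): the transition function is a unit mass at x + v t."""
    return g(x + v * t)

def indicator_nonneg(y):
    """Bounded measurable, but not continuous, test function."""
    return 1.0 if y >= 0 else 0.0
```

With $g = \arctan$ and $t - s = 1$, the values of `particle_operator` just to the right and just to the left of $0$ are close to $\pi/4$ and $-\pi/4$, so continuity in $x$ fails in case (i). In case (ii) a continuous $g$ is mapped to a continuous function, but the indicator is mapped to a function with a jump at $x = -vt$, which is why the process is not strong Feller.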
20.8. Markov but not strong Markov processes
In the introductory notes to this section we defined the Markov property of a stochastic process. For the examples below we need another property called the strong Markov property. For simplicity we consider the homogeneous case. Let $X = (X_t, t \ge 0)$ be a real-valued Markov process defined on the probability space $(\Omega, \mathcal{F}, P)$ and $(\mathcal{F}_t, t \ge 0)$ be its own generated filtration which is assumed to satisfy the usual conditions. Let $\tau$ be an $(\mathcal{F}_t)$-stopping time and $\mathcal{F}_\tau$ be the $\sigma$-field of all events $A \in \mathcal{F}$ such that $A \cap [\tau \le t] \in \mathcal{F}_t$ for all $t \ge 0$. Suppose the Markov process $X$ is $(\mathcal{F}_t)$-progressive, and let $\eta$ be an $\mathcal{F}_\tau$-measurable non-negative r.v. defined on the set $[\omega : \tau(\omega) < \infty]$. Then $X$ is said to be a strong Markov process if, for any $\Gamma \in \mathcal{B}^1$,
$$P[X_{\tau+\eta} \in \Gamma \mid \mathcal{F}_\tau] = P[X_{\tau+\eta} \in \Gamma \mid X_\tau] \quad \text{a.s.}$$
This relation defines the strong Markov property. In terms of the transition function $P(t, x, \Gamma)$ it can be written in the form
$$P[X_{\tau+\eta} \in \Gamma \mid \mathcal{F}_\tau] = P(\eta, X_\tau, \Gamma) \quad \text{a.s.} \tag{1}$$
If $(X_t, t \ge 0, P_x)$ is a homogeneous Markov family (also see Example 20.5), the strong Markov property can be expressed by
$$P_x\{A \cap [X_{\tau+\eta} \in \Gamma]\} = \int_A P(\eta, X_\tau, \Gamma)\,P_x(d\omega), \tag{2}$$
where $A \in \mathcal{F}_\tau$, $A \subset \{\omega : \tau(\omega) < \infty, \eta(\omega) < \infty\}$.
Two examples of processes which are Markov but not strong Markov are now given. Case (i) is the first ever known example of such a process, proposed by A. A. Yushkevich (see Dynkin and Yushkevich 1956).
(i) Let $W = (W_t, t \ge 0, P_x)$ be a Wiener process which can start from any point $x \in \mathbb{R}^1$. Define a new process $X = (X_t, t \ge 0)$ by
$$X_t = \begin{cases} W_t, & \text{if } W_0 \ne 0 \\ 0, & \text{if } W_0 = 0. \end{cases}$$
Then $X$ is a homogeneous Markov process whose transition function is
$$P(t, x, \Gamma) = \begin{cases} (2\pi t)^{-1/2}\int_\Gamma \exp[-(u - x)^2/(2t)]\,du, & \text{if } x \ne 0 \\ \delta_0(\Gamma), & \text{if } x = 0. \end{cases}$$
(Here $\delta_0(\cdot)$ is a unit measure concentrated at the point 0.) Let us check the usual Markov property for $X$. Clearly, $P$ satisfies all the conditions for a transition function. We then need to establish the relation
$$P_x\{A \cap [X_{t+h} \in \Gamma]\} = \int_A P(h, X_t, \Gamma)\,P_x(d\omega) \tag{3}$$
for $t, h \ge 0$, $A \in \mathcal{F}_t$ and $\Gamma \in \mathcal{B}^1$. (Note that by equation (3) we express the Markov property of the family $(X_t, t \ge 0, P_x)$, while the strong Markov property is given by (2).) If $x \ne 0$, then $X_t = W_t$ a.s. and (3) is reduced to the Markov property of the Wiener process. If $x = 0$, (3) is trivial since both sides are simultaneously either 1 or 0. Hence $X$ is a Markov process. Let us take $x \ne 0$, $\tau = \inf\{t : X_t = 0\}$, $\eta = (1 - \tau) \vee 0$, $A = \{\tau \le 1\}$ and $\Gamma = \mathbb{R}^1 \setminus \{0\}$. Then obviously $\tau < \infty$ a.s., $\eta < \infty$ a.s. Suppose $X$ is strong Markov. Then the following relation would hold (see (2)):
$$P_x\{\tau \le 1, X_1 \ne 0\} = \int_{\{\tau \le 1\}} P(1 - \tau, 0, \mathbb{R}^1 \setminus \{0\})\,P_x(d\omega). \tag{4}$$
However, the left-hand side is equal to $P_x\{\tau \le 1\} > 0$, while the right-hand side is 0. Thus (4) is not valid and therefore the Markov process $X$ is not strong Markov.
(ii) Let $\tau$ be a r.v. distributed exponentially with parameter 1, that is $P[\tau > t] = e^{-t}$. Define the process $X = (X_t, t \ge 0)$ by
$$X_t = X_t(\omega) = \max\{0, t - \tau(\omega)\}$$
and let $\mathcal{F}_t = \sigma\{X_s, s \le t\}$, $t \ge 0$ be its own generated filtration.
It is easy to see that if $X_t = a > 0$ for some $t$, then $X_{t+s} = a + s$ for all $s \ge 0$. If for some $t$ we have $X_t = 0$, then $X_s$ must be zero for $s \le t$, so the past does not provide new information. Thus we conclude that $X$ is a Markov process. Denote its transition function by $P(t, x, \Gamma)$. Let us show that $X$ is not a strong Markov process. Indeed, the relation $[\omega : \tau(\omega) \ge t] = [\omega : X_t(\omega) = 0] \in \mathcal{F}_t$ shows that the r.v. $\tau$ is an $(\mathcal{F}_t)$-stopping time. For a given $\tau$ we have $P[X_{\tau+s} = s] = 1$. Therefore
$$P[X_{\tau+s} \le x \mid \mathcal{F}_\tau] = \begin{cases} 1, & \text{if } x \ge s \\ 0, & \text{if } x < s. \end{cases} \tag{5}$$
If $X$ were strong Markov, then the following relation would hold:
$$P[X_{\tau+s} \le x \mid \mathcal{F}_\tau] = P[X_{\tau+s} \le x \mid X_\tau] \quad \text{a.s.} \tag{6}$$
and the conditional probability on the right-hand side could be expressed according to (1) by the transition function $P(t, x, \Gamma)$, namely
$$P[X_{\tau+s} \le x \mid X_\tau] = P(s, 0, (-\infty, x]).$$
However,
$$P(s, 0, (-\infty, x]) = \begin{cases} 1, & \text{if } x > s \\ e^{-(s - x)}, & \text{if } 0 \le x \le s. \end{cases} \tag{7}$$
From (5) and (7) it follows that (6) is not satisfied. The process $X$ is therefore Markov but not strong Markov.
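The mismatch between the two conditional laws in case (ii) is easy to tabulate. In this sketch (function names are illustrative) the first function is the law of $X_{\tau+s}$ given the past up to $\tau$, which is a point mass at $s$; the second is what the transition function started from state $0$ would give, computed from $P[\tau \ge t + s - x \mid \tau \ge t] = e^{-(s-x)}$ for an exponential time with parameter 1.

```python
import math

def cdf_after_stopping_time(x, s):
    """P[X_{tau+s} <= x | F_tau]: X_{tau+s} = s almost surely."""
    return 1.0 if x >= s else 0.0

def cdf_from_transition_function(x, s):
    """P(s, 0, (-inf, x]) for X_t = max(0, t - tau), tau exponential with parameter 1."""
    if x < 0:
        return 0.0
    if x >= s:
        return 1.0
    return math.exp(-(s - x))
```

For any $0 \le x < s$ the first function returns 0 while the second returns the strictly positive value $e^{-(s-x)}$, which is exactly the discrepancy that rules out the strong Markov property.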
20.9. Can a differential operator of order k > 2 be an infinitesimal operator of a Markov process?
Let $P(t, x, \Gamma)$, $t \ge 0$, $x \in \mathbb{R}^1$, $\Gamma \in \mathcal{B}^1$ be the transition function of a homogeneous Markov process $X = (X_t, t \ge 0)$ and $\{P^t, t \ge 0\}$ be the semigroup of operators associated with $P$: $P^tu(x) = \int_{\mathbb{R}^1} u(y)P(t, x, dy)$. The infinitesimal operator corresponding to $\{P^t\}$ (and also to $P$ and to $X$) is denoted by $A$ and is defined by:
$$Au(x) = \lim_{t \downarrow 0}\,(1/t)\Big[\int_{\mathbb{R}^1} u(y)P(t, x, dy) - u(x)\Big]. \tag{1}$$
Let $D(A)$ be the domain of $A$, that is, $D(A)$ contains all functions $u(x)$, $x \in \mathbb{R}^1$ for which the limit in (1) exists in the sense of convergence in norm in the space where $\{P^t\}$ is considered. Several important results concerning the infinitesimal operators of Markov processes and related topics can be found in the book by Dynkin (1965). In particular, Dynkin proves that under natural conditions the infinitesimal operator $A$ is a differential operator of first or second order. In the latter case $D(A) = C^2(\mathbb{R}^1)$, the space of twice continuously differentiable functions. So we come to the following question: can a differential operator of order $k > 2$ be an infinitesimal operator? Suppose the answer to this question is positive in the particular case $k = 3$ and let $Au(x) = u'''(x)$ with $D(A) = C^3(\mathbb{R}^1)$, the space of three times continuously differentiable functions $u(x)$, $x \in \mathbb{R}^1$. However, if $A$ is an infinitesimal operator, then according to the Hille-Yosida theorem (see Dynkin 1965; Wentzell 1981) $A$ must satisfy the minimum principle: if $u \in D(A)$ attains its minimum at the point $x_0$, then $Au(x_0) \ge 0$. Take the function $u(x) = 2(\sin 2\pi x)^2 - (\sin 2\pi x)^3$. Obviously $u \in C^3(\mathbb{R}^1)$ and it is a periodic function with period 1. It is easy to see that $u$ takes its minimal value at $x = 0$ and, moreover, $u'''(0) < 0$. This implies that the minimum principle is violated. Thus in general a differential operator of order $k = 3$ cannot be an infinitesimal operator. Similar arguments can be used in the case $k > 3$.
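The two facts used about $u$ can be checked numerically. In this sketch the third derivative is a formula obtained here by differentiating $u = 2s^2 - s^3$ by hand, with $s = \sin 2\pi x$ and $c = \cos 2\pi x$, namely $u'''(x) = (2\pi)^3 c\,(21s^2 - 16s - 6c^2)$; it is an assumption of this example rather than a formula from the text, so the check also compares it with a finite-difference estimate.

```python
import math

W = 2.0 * math.pi

def u(x):
    """u(x) = 2 sin^2(2 pi x) - sin^3(2 pi x)."""
    s = math.sin(W * x)
    return 2.0 * s ** 2 - s ** 3

def u_third(x):
    """Hand-derived third derivative of u (see the lead-in for the formula)."""
    s, c = math.sin(W * x), math.cos(W * x)
    return W ** 3 * c * (21.0 * s ** 2 - 16.0 * s - 6.0 * c ** 2)

def u_third_fd(x, h=1e-3):
    """Central finite-difference estimate of the third derivative."""
    return (u(x + 2 * h) - 2 * u(x + h) + 2 * u(x - h) - u(x - 2 * h)) / (2 * h ** 3)
```

Since $u = s^2(2 - s) \ge 0$ with equality exactly when $s = 0$, the minimum value $0$ is attained at $x = 0$, and there $u'''(0) = -6(2\pi)^3 < 0$.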
SECTION 21. STATIONARY PROCESSES AND SOME RELATED TOPICS
Let $X = (X_t, t \in T \subset \mathbb{R}^1)$ be a real-valued stochastic process defined on the probability space $(\Omega, \mathcal{F}, P)$. We say that $X$ is strictly stationary if for each $n \ge 1$ and $t_k, t_k + h \in T$, $k = 1, \ldots, n$, the random vectors
$$(X_{t_1}, \ldots, X_{t_n}) \quad \text{and} \quad (X_{t_1+h}, \ldots, X_{t_n+h})$$
have the same distribution. Suppose now that $X$ is an $L^2$-process (or second-order process), that is $E[X_t^2] < \infty$ for each $t \in T$. Such a process $X$ is said to be weakly stationary if $EX_t = c = \text{constant}$ for all $t \in T$ and the covariance function $C(s, t) = E[X_sX_t]$ depends on $s$ and $t$ only through $t - s$. This means that there is a function $C(t)$ of one argument $t$, $t \in T$, such that $C(t) = E[X_sX_{s+t}]$ for all $s, s + t \in T$. On the same lines we can define weak and strict stationarity for multi-dimensional processes and for complex-valued processes. The notions of strict and weak stationarity were introduced by Khintchine (1934). Let us note that the covariance function $C$ of any weakly stationary process admits the so-called spectral representation. If $T = \mathbb{R}^1$ or $T = \mathbb{R}_+$ we have a continuous-time weakly stationary process and its covariance function $C$ has the representation
$$C(t) = \int_{-\infty}^{\infty} e^{it\lambda}\,dF(\lambda)$$
where $F(\lambda)$, $\lambda \in \mathbb{R}^1$ is a non-decreasing, right-continuous and bounded function. $F$ is called a spectral d.f., while its derivative $f$, if it exists, is called a spectral density function. If $T = \mathbb{Z}$ or $T = \mathbb{N}$ we say that $X = (X_n)$ is a discrete-time weakly stationary process (or a weakly stationary sequence). In this case the covariance function $C$ of $X$ has the representation
$$C(n) = \int_{-\pi}^{\pi} e^{in\lambda}\,dF(\lambda)$$
where $F(\lambda)$, $\lambda \in [-\pi, \pi]$ possesses properties as in the continuous case. Note that many useful properties of stationary processes and sequences can be derived under conditions in terms of $C$, $F$ and $f$. It is important to note that stationary processes themselves also admit spectral representations in the form of integrals of the type $\int_{-\infty}^{\infty} e^{it\lambda}\,dZ_\lambda$ with respect to processes with orthogonal increments. Let $X = (X_n, n \in \mathbb{N})$ be a strictly stationary process. Denote by $\mathcal{M}_a^b$ the $\sigma$-field generated by the r.v.s $X_a, X_{a+1}, \ldots, X_b$. Without going into details here we note that in terms of probabilities of events belonging to the $\sigma$-fields $\mathcal{M}_{-\infty}^{0}$ and $\mathcal{M}_{n}^{\infty}$ we can define some important conditions, such as $\varphi$-mixing, strong mixing, regularity and absolute regularity, which are essential in studying stationary processes. In the examples below we give definitions of these conditions and analyse properties of the processes. A complete presentation of the theory of stationary processes and several related topics can be found in the books by Parzen (1962), Cramer and Leadbetter (1967), Rozanov (1967), Gihman and Skorohod (1974/1979), Ibragimov and Linnik (1971), Ash and Gardner (1975), Ibragimov and Rozanov (1978) and Wentzell (1981). In this section we consider only a few examples dealing with the stationarity property, as well as properties such as mixing and ergodicity.
21.1. On the weak and the strict stationary properties of stochastic processes
Since we shall be studying two classes of stationary processes, it is useful to clarify the relationship between them. Firstly, if $X = (X_t, t \in \mathbb{R}^1)$ is a strictly stationary process, and moreover $X$ is an $L^2$-process, then clearly $X$ is also a weakly stationary process. However, $X$ can be strictly stationary without being weakly stationary and this is the case when $X$ is not an $L^2$-process. It is easy to construct examples of this. Further, let $\theta$ be a r.v. with a uniform distribution on $[0, 2\pi]$ and let $Z_t = \sin\theta t$. Then the random sequence $(Z_n = \sin\theta n, n = 1, 2, \ldots)$ is weakly but not strictly stationary, while the process $(Z_t = \sin\theta t, t \in \mathbb{R}^1)$ is neither weakly nor strictly stationary. If we take another r.v., say $\zeta$, which has an arbitrary distribution and is independent of $\theta$, then the process $Y = (Y_t, t \in \mathbb{R}^1)$ where $Y_t = \cos(\zeta t + \theta)$ is both weakly and strictly stationary. Let us consider two other examples of weakly but not strictly stationary processes. Let $\xi_1$ and $\eta_1$ be r.v.s each distributed $N(0, 1)$ and such that the distribution of $(\xi_1, \eta_1)$ is not bivariate normal, and $\xi_1$ and $\eta_1$ are uncorrelated. Such examples exist and were described in Section 10. Now take an infinite sequence of independent copies of $(\xi_1, \eta_1)$, that is $(\xi_1, \eta_1), (\xi_2, \eta_2), \ldots$, which in this order are renamed $X_1, X_2, \ldots$, that is,
$$X_1 = \xi_1, \quad X_2 = \eta_1, \quad X_3 = \xi_2, \quad X_4 = \eta_2, \quad \ldots.$$
It is easy to check that the random sequence $(X_n, n = 1, 2, \ldots)$ is weakly stationary but not strictly stationary. Finally, it is not difficult to construct a continuous-time process $X = (X_t, t \in \mathbb{R}^1)$ with similar properties. For $t \ge 0$ take $X_t$ to be a r.v. distributed $N(1, 1)$ and for $t < 0$ let $X_t$ be exponentially distributed with parameter 1. Suppose also that for all $s \ne t$ the r.v.s $X_s$ and $X_t$ are independent. Then $X$ is a weakly but not strictly stationary process.
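For the example $Z_t = \sin\theta t$ with $\theta$ uniform on $[0, 2\pi]$, the first two moments have closed forms, which shows why integer times behave differently from real times. The sketch below uses formulas derived here by direct integration (they are not quoted from the text): $E[\sin\theta t] = (1 - \cos 2\pi t)/(2\pi t)$ and $E[\sin\theta s\,\sin\theta t] = \tfrac12[\kappa(s - t) - \kappa(s + t)]$ with $\kappa(a) = \sin(2\pi a)/(2\pi a)$, $\kappa(0) = 1$.

```python
import math

def mean_z(t):
    """E[sin(theta t)] for theta uniform on [0, 2 pi]."""
    if t == 0:
        return 0.0
    return (1.0 - math.cos(2.0 * math.pi * t)) / (2.0 * math.pi * t)

def _kappa(a):
    """(1 / (2 pi)) times the integral of cos(a theta) over [0, 2 pi]."""
    if a == 0:
        return 1.0
    return math.sin(2.0 * math.pi * a) / (2.0 * math.pi * a)

def second_moment_z(s, t):
    """E[sin(theta s) sin(theta t)], via sin A sin B = (cos(A - B) - cos(A + B)) / 2."""
    return 0.5 * (_kappa(s - t) - _kappa(s + t))
```

At positive integer times the mean vanishes and the second moment is $\tfrac12$ on the diagonal and $0$ off it, which is weak stationarity of the sequence. At non-integer times the mean already depends on $t$, so the continuous-time process cannot be weakly stationary.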
21.2. On the strict stationarity of a given order
Recall that the process $X = (X_t, t \in T \subset \mathbb{R}^1)$ is said to be strictly stationary of order $m$ if for arbitrary $t_1, \ldots, t_m \in T$ and $t_1 + h, \ldots, t_m + h \in T$ the random vectors $(X_{t_1}, \ldots, X_{t_m})$ and $(X_{t_1+h}, \ldots, X_{t_m+h})$ have the same distribution. Clearly, the process $X$ is strictly stationary if it is so of any order $m$, $m \ge 1$. It is easy to see that the $m$-order strictly stationary process $X$ is also $k$-order strictly stationary for every $k$, $1 \le k < m$. The following example determines if the converse is true. Let $\xi$ and $\eta$ be independent r.v.s with the same non-degenerate d.f. $F(x)$, $x \in \mathbb{R}^1$. Define the sequence $(X_n, n = 1, 2, \ldots)$ as follows:
$$X_1 = \xi, \ X_2 = \xi, \ X_3 = \xi, \ X_4 = \eta, \ X_5 = \eta, \ X_6 = \xi, \ X_7 = \xi, \ X_8 = \xi, \ X_9 = \eta, \ X_{10} = \eta, \ X_{11} = \xi, \ \ldots.$$
This means that
$$X_n = \begin{cases} \xi, & \text{if } n = 5k + 1,\ 5k + 2,\ 5k + 3 \\ \eta, & \text{if } n = 5k + 4,\ 5k + 5, \end{cases} \qquad \text{for } k = 0, 1, 2, \ldots.$$
It is obvious that the sequence $(X_n, n = 1, 2, \ldots)$ is strictly stationary of order 1. Let us check if it is strictly stationary of order 2. For example, the random vectors $(X_1, X_2), (X_2, X_3), (X_4, X_5), (X_6, X_7), \ldots$ are identically distributed. However, $(X_3, X_4)$, that is, $(X_{1+2}, X_{2+2})$, has a distribution which differs from that of $(X_1, X_2)$. Indeed, since $X_1 = \xi$, $X_2 = \xi$, $X_3 = \xi$ and $X_4 = \eta$ we have
$$P[X_1 \le x_1, X_2 \le x_2] = P[\xi \le x_1, \xi \le x_2] = P[\xi \le \min\{x_1, x_2\}] = F(\min\{x_1, x_2\})$$
$$\ne F(x_1)F(x_2) = P[\xi \le x_1, \eta \le x_2] = P[X_3 \le x_1, X_4 \le x_2],$$
where the inequality holds for suitable $x_1, x_2$ because $F$ is non-degenerate. Therefore the sequence $(X_n, n = 1, 2, \ldots)$ is not strictly stationary of order 2. It is clear what conclusion can be drawn in the general case.
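The comparison of the pair laws can be carried out exactly for a concrete choice of $F$. The sketch below assumes, purely for illustration, that $\xi$ and $\eta$ are independent fair 0/1 coins, and enumerates the joint law of $(X_{n_1}, X_{n_2})$ for the periodic pattern $\xi, \xi, \xi, \eta, \eta$.

```python
from fractions import Fraction as Fr
from itertools import product

def source(n):
    """Which variable X_n copies: 'xi' for n = 5k+1, 5k+2, 5k+3 and 'eta' for n = 5k+4, 5k+5."""
    return "xi" if (n - 1) % 5 < 3 else "eta"

def joint_law(n1, n2):
    """Exact law of (X_{n1}, X_{n2}) when xi and eta are independent fair 0/1 coins (assumed F)."""
    law = {}
    for xi, eta in product((0, 1), repeat=2):
        values = {"xi": xi, "eta": eta}
        pair = (values[source(n1)], values[source(n2)])
        law[pair] = law.get(pair, Fr(0)) + Fr(1, 4)
    return law
```

Here `joint_law(1, 2)` puts mass $\tfrac12$ on each of $(0, 0)$ and $(1, 1)$, whereas `joint_law(3, 4)` is uniform on the four pairs, so a shift by 2 changes the two-dimensional law even though every single $X_n$ has the same distribution.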
21.3. The strong mixing property can fail if we consider a functional of a strictly stationary strong mixing process
Suppose $X = (X_n, n \in \mathbb{N})$ is a strictly stationary process satisfying the strong mixing condition. This means that there is a numerical sequence $\alpha(n) \downarrow 0$ as $n \to \infty$ such that
$$\sup_{A, B} |P(AB) - P(A)P(B)| \le \alpha(n)$$
where sup is taken over all events $A \in \mathcal{M}_{-\infty}^{0}$, $B \in \mathcal{M}_{n}^{\infty}$. This condition is essential for establishing limit theorems for sequences of weakly dependent r.v.s (see Rosenblatt 1956, 1978; Ibragimov 1962; Ibragimov and Linnik 1971; Billingsley 1968). Let $g(x)$, $x \in \mathbb{R}^1$, be a measurable function and $\xi = (\xi_n, n \in \mathbb{N})$ a strictly stationary process. Then the process $(X_n, n \in \mathbb{N})$ where $X_n = g(\xi_n)$ is again strictly stationary (see e.g. Breiman 1968). Suppose now $\xi = (\xi_n, n \in \mathbb{N})$ is strongly mixing and $g(x)$, $x \in \mathbb{R}^\infty$ is a bounded and $\mathcal{B}^\infty$-measurable function. If we define the process $X = (X_n, n \in \mathbb{N})$ by $X_n = g(\xi_n, \xi_{n+1}, \ldots)$, the question to consider is whether the functional $g$ preserves the strong mixing property. In general the answer is negative and this is shown in the next example. Let $\{\varepsilon_j, j \in \mathbb{N}\}$ be a sequence of i.i.d. r.v.s such that $P[\varepsilon_j = 1] = P[\varepsilon_j = 0] = \tfrac12$. Define the random sequence $(X_j, j \in \mathbb{N})$ where
$$X_j = \sum_{k=1}^{\infty} 2^{-k}\varepsilon_{j+k}.$$
Since $\{\varepsilon_j\}$ consists of i.i.d. r.v.s, $\{\varepsilon_j\}$ is a strictly stationary sequence. This implies that the sequence $(X_j, j \in \mathbb{N})$ is also strictly stationary. Moreover, $\{\varepsilon_j\}$ satisfies the strong mixing condition and thus we could expect that $(X_j, j \in \mathbb{N})$ satisfies this condition. Suppose this conjecture is right. Then according to Ibragimov (1962) the sequence of d.f.s
$$F_n(x) = P[(X_1 + \cdots + X_n)/b_n - a_n < x],$$
where $a_n$ and $b_n$ are norming constants, can converge as $n \to \infty$ only to a stable law. (For properties of stable distributions see Section 9.) Moreover, if the limit law of $F_n$ has a parameter $\alpha$, then necessarily $b_n = (V[X_1 + \cdots + X_n])^{1/2} = n^{1/\alpha}h(n)$ where $h(n)$ is a slowly varying function. Consider the random sequence
$$g_j = \sum_{k=1}^{\infty} r_k(X_j)\,k^{-3/4}$$
where $r_k(x)$, $k = 1, 2, \ldots$ are the Rademacher functions: $r_k(X_j) = \operatorname{sign}\sin(2^k\pi X_j)$ or $r_k = -1 + 2\varepsilon_k$ ($\varepsilon_k$ as above). Since $r_k$, $k \ge 1$ are i.i.d. r.v.s, $P[r_k = \pm 1] = \tfrac12$, we can easily see that
$$E[g_1 g_{j+1}] = \sum_{k=1}^{\infty} k^{-3/4}(k + j)^{-3/4} > 2^{-3/4}j^{-3/4}$$
and
$$\sigma_n^2 = E\Big[\Big(\sum_{j=1}^{n} g_j\Big)^2\Big] > n^{5/4}(1 + o(1)).$$
Moreover, as a consequence of our assumption that $\{X_j\}$ is strongly mixing, the sequence $\{g_j\}$ must satisfy the CLT, that is
$$P[(g_1 + \cdots + g_n)/\sigma_n < z] \to \Phi(z), \quad z \in \mathbb{R}^1 \quad \text{as } n \to \infty.$$
However, as the variance $\sigma_n^2$ is greater than $n^{5/4}(1 + o(1))$ it cannot be represented in the form $nh(n)$ with $h(n)$ a slowly varying function. This contradiction shows that the strictly stationary process $(X_j, j \in \mathbb{N})$ defined above does not satisfy the strong mixing condition. This would be interesting even if we could only conclude that not every strictly stationary process is strongly mixing. Clearly, the example considered here provides a little more: the functional $(X_j)$ of a strictly stationary and strong mixing process $(\varepsilon_j)$ may not preserve the strong mixing property.
21.4. A strictly stationary process can be regular but not absolutely regular
Let $X = (X_t, t \in \mathbb{R}^1)$ be a strictly stationary process. We say that $X$ is regular if the $\sigma$-field
$$\mathcal{M}_{-\infty} := \bigcap_{t \in \mathbb{R}^1} \mathcal{M}_{-\infty}^{t}$$
is trivial, that is if $\mathcal{M}_{-\infty}$ contains only events of probability 0 or 1. This condition can be expressed also in another form: for all $B \in \mathcal{M}_{-\infty}^{\infty}$ and $A \in \mathcal{M}_{-\infty}^{-t}$ we have
$$\sup_{A} |P(AB) - P(A)P(B)| \to 0 \quad \text{as } t \to \infty.$$
Further, define $\rho(t) := \sup E[\eta_1\eta_2]$ where sup is taken over all r.v.s $\eta_1$ and $\eta_2$ such that $\eta_1$ is $\mathcal{M}_{-\infty}^{0}$-measurable, $\eta_2$ is $\mathcal{M}_{t}^{\infty}$-measurable, $E\eta_1 = 0$, $E\eta_2 = 0$, $E[\eta_1^2] = 1$, $E[\eta_2^2] = 1$. The quantity $\rho(t)$, $t \ge 0$ is called a maximal correlation coefficient between the $\sigma$-fields $\mathcal{M}_{-\infty}^{0}$ and $\mathcal{M}_{t}^{\infty}$. The process $X$ is said to be absolutely regular (completely, strictly regular) if $\rho(t) \to 0$ as $t \to \infty$. Note that for stationary processes which are also Gaussian, the notion of absolute regularity coincides with the so-called strong mixing condition (see Ibragimov and Rozanov 1978). It is obvious that any absolutely regular process is also regular. We now consider whether the converse is true.
Suppose $X$ is a strictly stationary process and $f(\lambda)$, $\lambda \in \mathbb{R}^1$, is its spectral density function. Then $X$ is regular iff
(1)
For the proof of this result and of many others we again refer the reader to the book by Ibragimov and Rozanov (1978). Consider now a stationary process $X$ whose spectral density is
(2)
with $p$ any positive integer. Then it is not difficult to check that $f$ given by (2) satisfies (1). Hence $X$ is a regular process. However, the process $X$ and its spectral density $f$ do not satisfy another condition which is necessary for a process to be absolutely regular (Ibragimov and Rozanov 1978, Th. 6.4.3). Thus we conclude that the stationary process $X$ with spectral density $f$ given by (2) is not absolutely regular even though it is regular.
21.5. Weak and strong ergodicity of stationary processes
Let $X = (X_n, n \ge 1)$ be a weakly stationary sequence with $EX_n = 0$, $n \ge 1$. We say that $X$ is weakly ergodic (or that $X$ satisfies the WLLN) if
$$\frac1n\sum_{k=1}^{n} X_k \xrightarrow{P} 0 \quad \text{as } n \to \infty. \tag{1}$$
If (1) holds with probability 1, we say that $X$ is strongly ergodic (or that $X$ satisfies the SLLN). If $X = (X_t, t \ge 0)$ is a weakly stationary (continuous-time) process with $EX_t = 0$ and
$$\frac1T\int_0^T X_t\,dt \xrightarrow{P} 0 \quad \text{as } T \to \infty \tag{2}$$
then $X$ is said to be weakly ergodic (to obey the WLLN); $X$ is strongly ergodic if (2) is satisfied with probability 1 (now $X$ obeys the SLLN). There are many results concerning the ergodicity of stationary processes. The conditions guaranteeing a certain type of ergodicity can be expressed in different terms. Here we discuss two examples involving the covariance functions and the spectral d.f.
(i) Let $X = (X_n, n \ge 1)$ be a weakly stationary sequence such that $EX_n = 0$, $E[X_n^2] = 1$ and the covariance function is $C(n) = E[X_kX_{k+n}]$. Then the condition
$$\lim_{n\to\infty} C(n) = 0 \tag{3}$$
is sufficient for the process $X$ to be weakly ergodic (see Gihman and Skorohod 1974/1979; Gaposhkin 1973). Note that (3) also implies that $(1/n)\sum_{k=1}^{n} X_k \xrightarrow{L^2} 0$, which means that (3) is a sufficient condition for $X$ to be $L^2$-ergodic. Moreover, if we suppose additionally that $X$ is strictly stationary, then it can be shown that condition (3) implies the strong ergodicity of $X$. Thus we come to the question: if $X$ is only weakly stationary, can condition (3) ensure that $X$ is strongly ergodic? It turns out that in general the answer is negative. It can be proved that there exists a weakly stationary sequence $X = (X_n, n \ge 1)$ such that its covariance function $C(n)$ satisfies the condition
$$C(n) = O[(\log\log n)^{-2}] \quad \text{as } n \to \infty$$
(hence $C(n) \to 0$) so that $X$ is weakly ergodic but $(1/n)\sum_{k=1}^{n} X_k$ diverges almost surely. Note that the construction of such a process as well as of a similar continuous-time process is given by Gaposhkin (1973).
(ii) We now consider a weakly stationary process $X = (X_t, t \in \mathbb{R}_+)$ with $EX_t = 0$, $E[X_t^2] = 1$ and covariance function $C(t) = E[X_sX_{s+t}]$ and discuss the conditions which ensure the strong ergodicity of $X$. Firstly let us formulate the following result (see Verbitskaya 1966). If the covariance function $C$ satisfies the condition
$$\int_1^\infty t^{-1}C(t)(\log t)^2\,dt < \infty$$
then the process $X$ is strongly ergodic. Moreover, if the process $X$ is bounded, then the condition $\int_1^\infty t^{-1}C(t)\,dt < \infty$ is sufficient for the strong ergodicity of $X$. Obviously this result contains conditions which are only sufficient for strong ergodicity. However, it is of general interest to look for necessary and sufficient conditions under which a stationary process will be strongly ergodic. The above result and other results in this area lead naturally to the conjecture that eventually such conditions can be expressed either as restrictions on the covariance function $C$ at infinity, or as restrictions on the corresponding spectral d.f. around 0. The following example will show if this conjecture is true. Consider two independent r.v.s, say $\zeta$ and $\theta$, where $\zeta$ has an arbitrary d.f. $F(x)$, $x \in \mathbb{R}^1$, and $\theta$ is uniformly distributed on $[0, 2\pi]$. Let
$$X_t = \sqrt2\cos(\zeta t + \theta), \quad t \in \mathbb{R}_+.$$
Then the process $X = (X_t, t \in \mathbb{R}_+)$ is both weakly and strictly stationary,
$$EX_t = 0 \quad \text{and} \quad C(t) = \int_{-\infty}^{\infty} \cos(tx)\,dF(x).$$
In particular this explicit form of the covariance function of $X$ shows that the d.f. $F$ of the r.v. $\zeta$ is just the spectral d.f. of the process $X$. Obviously this fact is very convenient when studying the ergodicity of $X$.
Suppose $F$ satisfies only one condition, namely it is continuous at 0:
$$F(0) - F(0-) = 0. \tag{4}$$
Recall that (4) is equivalent to the condition $\lim_{T\to\infty} T^{-1}\int_0^T C(t)\,dt = 0$ which implies that $X$ is weakly ergodic (see Gihman and Skorohod 1974/1979). Let us show that $X$ is strongly ergodic. A direct calculation leads to:
$$\frac1T\int_0^T X_t\,dt = \frac{\sqrt2}{T}\int_0^T \cos(\zeta t + \theta)\,dt = O(1/T), \ \text{if } \zeta \ne 0, \quad \text{and} \quad \frac1T\int_0^T X_t\,dt = \sqrt2\cos\theta, \ \text{if } \zeta = 0. \tag{5}$$
However, (4) implies that $P[\zeta = 0] = 0$. From (5) we can then conclude that $X$ is strongly ergodic. Note especially that (4) is the only condition imposed on the spectral d.f. $F$ of the process $X$. This example and other arguments given by Verbitskaya (1966) allow us to conclude that in general no restrictions on the spectral d.f. in a neighbourhood of 0 (excluding the continuity of $F$ at 0) could be necessary conditions for the strong ergodicity of a stationary process.
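The direct calculation behind (5) can be written out in closed form: for $\zeta \ne 0$ the time average equals $\sqrt2\,[\sin(\zeta T + \theta) - \sin\theta]/(\zeta T)$, which is bounded in absolute value by $2\sqrt2/(|\zeta| T)$. The following sketch (the function name is an illustrative choice) evaluates this average for fixed values of $\zeta$ and $\theta$.

```python
import math

def time_average(zeta, theta, T):
    """(1/T) times the integral over [0, T] of sqrt(2) cos(zeta t + theta), in closed form."""
    if zeta == 0:
        return math.sqrt(2.0) * math.cos(theta)
    return math.sqrt(2.0) * (math.sin(zeta * T + theta) - math.sin(theta)) / (zeta * T)
```

For any fixed $\zeta \ne 0$ the average tends to 0 at rate $1/T$, whereas for $\zeta = 0$ it stays at $\sqrt2\cos\theta$; this is why only the mass of $F$ at the origin matters.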
21.6. A measure-preserving transformation which is ergodic but not mixing
Stationary processes possess many properties such as mixing and ergodicity which can be studied in a unified way as transformations of the probability space on which the processes are defined (see Ash and Gardner 1975; Rosenblatt 1979; Shiryaev 1995). We give some definitions first and then discuss an interesting example. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $T$ a transformation of $\Omega$ into itself. $T$ is called measurable if $T^{-1}(A) = \{\omega : T\omega \in A\} \in \mathcal{F}$ for all $A \in \mathcal{F}$. We say that $T : \Omega \mapsto \Omega$ is a measure-preserving transformation if $P(T^{-1}A) = P(A)$ for all $A \in \mathcal{F}$. If the event $A \in \mathcal{F}$ is such that $T^{-1}A = A$, $A$ is called an invariant event. The class of all invariant events is a $\sigma$-field denoted by $\mathcal{J}$. If for every $A \in \mathcal{J}$ we have $P(A) = 0$ or 1, the measure-preserving transformation $T$ is said to be ergodic. The function $g : (\Omega, \mathcal{F}) \mapsto (\mathbb{R}^1, \mathcal{B}^1)$ is called invariant under $T$ iff $g(T\omega) = g(\omega)$ for all $\omega$. It can easily be shown that the measure-preserving transformation $T$ is ergodic iff every $T$-invariant function is $P$-a.s. constant. Finally, recall that $T$ is said to be a mixing transformation on $(\Omega, \mathcal{F}, P)$ iff for all $A, B \in \mathcal{F}$,
$$\lim_{n\to\infty} P(A \cap T^{-n}B) = P(A)P(B).$$
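To make the mixing condition concrete, one can evaluate $P(A \cap T^{-n}A)$ exactly for a rotation $T\omega = (\omega + \lambda) \pmod 1$ of the unit interval with Lebesgue measure and $A = [0, \tfrac12)$. In the sketch below the choice $\lambda = \sqrt2$ is an arbitrary irrational number picked for illustration, and the overlap formula $|\tfrac12 - d|$ with $d = n\lambda \pmod 1$ follows from intersecting $A$ with its shifted copy.

```python
import math

def overlap(n, lam):
    """Lebesgue measure of A intersected with T^{-n} A, for A = [0, 1/2) and T w = (w + lam) mod 1."""
    d = (n * lam) % 1.0
    return abs(0.5 - d)

def overlaps(lam, n_max):
    """Overlap values for n = 1, ..., n_max."""
    return [overlap(n, lam) for n in range(1, n_max + 1)]
```

Mixing would require these values to converge to $P(A)^2 = \tfrac14$. For the rotation they keep returning both close to $\tfrac12$ and close to $0$, so no limit exists.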
We now compare two of the notions introduced above-ergodicity and mixing. Let T be a measure-preserving transformation on the probability space (U, 1', P). Then: (a) T is ergodic iff any T-invariant function is P-a.s. constant; (b) T mixing
implies that $T$ is ergodic (see Ash and Gardner 1975; Rosenblatt 1979; Shiryaev 1995). Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ be the Lebesgue measure. Consider the transformation $T\omega = (\omega + \lambda) \pmod 1$, $\omega \in \Omega$. It is easy to see that $T$ is measure preserving. Thus we want to establish if $T$ is ergodic and mixing. Suppose first that $\lambda$ is a rational number, $\lambda = k/m$ for some integers $k$ and $m$. Then the set
$$A = \bigcup_{j=0}^{m-1} \left\{\omega : \frac{2j}{2m} \le \omega < \frac{2j+1}{2m}\right\}$$
is invariant and $P(A) = \frac12$. This implies that for $\lambda$ rational, the transformation $T$ cannot be ergodic. Now let $\lambda$ be an irrational number. Our goal is to show that in this case $T$ is ergodic. Consider a r.v. $\xi = \xi(\omega)$ on $(\Omega, \mathcal{F}, P)$ with $E[\xi^2] < \infty$. Then (see Ash 1972; Kolmogorov and Fomin 1970) the Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n \omega}$ of the function $\xi(\omega)$ is $L^2$-convergent and $\sum_{n=-\infty}^{\infty} |c_n|^2 < \infty$. Suppose that $\xi$ is an invariant r.v. Since $T$ is measure preserving we find for the Fourier coefficient $c_n$ that
$$c_n = E[\xi(\omega)e^{-2\pi i n \omega}] = E[\xi(T\omega)e^{-2\pi i n T\omega}] = e^{-2\pi i n \lambda}E[\xi(T\omega)e^{-2\pi i n \omega}] = e^{-2\pi i n \lambda}E[\xi(\omega)e^{-2\pi i n \omega}] = c_n e^{-2\pi i n \lambda}.$$
This implies that $c_n(1 - e^{-2\pi i n \lambda}) = 0$. However, as we have assumed that $\lambda$ is irrational, $e^{-2\pi i n \lambda} \ne 1$ for all $n \ne 0$. Hence $c_n = 0$ if $n \ne 0$ and $\xi(\omega) = c_0$ a.s., $c_0 = $ constant. From statement (a) we can conclude that the transformation $T$ is ergodic. It remains for us to show that $T$ is not mixing. Indeed, take the set $A = \{\omega : 0 \le \omega \le 1/2\}$ and let $B = A$. Since $T$ is measure preserving and invertible, then for any $n$ we get
$$P(A \cap T^{-n}B) = P(A \cap T^{-n}A) = P(T^n A \cap A).$$
Let us fix a number $\varepsilon \in (0, 1)$. Since $\lambda$ is irrational, then for infinitely many $n$ the difference between $e^{2\pi i n \lambda}$ and $e^{i \cdot 0} = 1$, in absolute value, does not exceed $\varepsilon$. For such $n$ the sets $A$ and $T^n A$ overlap except for a set of measure less than $\varepsilon$. Thus
$$P(A \cap T^{-n}B) \ge P(A) - \varepsilon$$
and for $0 < \varepsilon < \frac14$ we find $P(A \cap T^{-n}B) > \frac14$ for these $n$. If the transformation $T$ were mixing, then
$$P(A \cap T^{-n}B) \to P(A)P(B) \quad \text{as } n \to \infty$$
and hence $P(A)P(B) \ge \frac12 - \varepsilon > \frac14$. On the other hand, since $P(A) = \frac12$, $P(A)P(B) = [P(A)]^2 = \frac14$. Thus we come to a contradiction, so the mixing property of $T$ fails to hold. Therefore, for measure-preserving transformations, mixing is a stronger property than ergodicity.
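The failure of mixing can be seen numerically. The following is a minimal sketch, assuming the illustrative choice $\lambda = \sqrt{2}$ and the set $A = [0, 1/2]$ from the text; the function name `overlap` is hypothetical. It uses the fact that $T^{-n}A$ is again an arc of length $1/2$, so $P(A \cap T^{-n}A)$ has a closed form.

```python
import math

def overlap(n: int, lam: float) -> float:
    """Lebesgue measure of A ∩ T^{-n}A for A = [0, 1/2] and T(w) = (w + lam) mod 1."""
    # T^{-n}A is the arc [s, s + 1/2] (mod 1), where s = (-n * lam) mod 1.
    s = (-n * lam) % 1.0
    # Two arcs of length 1/2 whose left endpoints differ by s share a set of measure |1/2 - s|.
    return abs(0.5 - s)

lam = math.sqrt(2.0)  # an irrational rotation angle (illustrative assumption)
values = [overlap(n, lam) for n in range(1, 1001)]
near_half = [n for n, v in enumerate(values, start=1) if v > 0.49]
print(max(values), min(values), near_half[:5])
```

If $T$ were mixing, the values would settle near $P(A)^2 = 1/4$; instead they keep returning close to both $1/2$ and $0$.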
21.7.
On the convergence of sums of
It is well known that in the case of independent r.v.s $\{X_n, n \ge 1\}$ the infinite series $\sum_{n=1}^{\infty} X_n$ is convergent simultaneously in distribution, in probability and with probability 1. This statement, called the Levy equivalence theorem, can be found in a book by Ito (1984) and leads to the question: does a similar result hold for sequences of 'weakly' dependent r.v.s? Let $\{X_n, n \ge 1\}$ be a stationary random sequence satisfying the so-called
IP(AB) - P(A)P(B)I
~
where $A \in \mathcal{M}_1^m$, $B \in \mathcal{M}_{m+n}^{\infty}$, $m \ge 1$, $n \ge 1$ and $P(A) > 0$. Note that there are several results concerning the convergence of the partial sums $S_n = X_1 + \ldots + X_n$ as $n \to \infty$ of a
00;
Recall that for independent r.v.s a condition like Xn ~ 0 is not involved. Since for
!
21.8.
The central limit theorem for stationary random sequences
The classical CLT deals with independent r.v.s (see Section 17). Thus if we suppose that $\{X_n, n \ge 1\}$ is a sequence of 'weakly' dependent r.v.s we cannot expect that without additional assumptions the normed sums $S_n/s_n$ will converge to the standard normal distribution. As usual, $S_n = X_1 + \ldots + X_n$, $s_n^2 = VS_n$. There are works
(see Rosenblatt 1956; Ibragimov and Linnik 1971; Davydov 1973; Bradley 1980) where under appropriate conditions the CLT is proved for some classes of stationary sequences (see Ibragimov and Linnik 1971; Bradley 1980) and for stationary random fields (see Bulinskii 1988). We present below a few examples which show that for stationary sequences the normed sums $S_n/s_n$ can behave differently as $n \to \infty$. In particular, the limit distribution, if it exists, need not be the normal distribution $\mathcal{N}(0, 1)$.

(i) Let $\xi$ be a r.v. distributed uniformly on $[0, 1]$. Consider the random sequence $\{X_n, n = 0, \pm 1, \ldots\}$ where $X_n = \cos(2\pi n \xi)$. It is easy to see that the variables $X_n$ are uncorrelated (but not independent), so $\{X_n\}$ forms a weakly stationary sequence. If $S_n = X_1 + \ldots + X_n$ we can easily see that $ES_n = 0$ and $VS_n = \frac12 n$. Moreover,
$$S_n = -\frac12 + \frac12 \, \frac{\sin(2\pi(n + \frac12)\xi)}{\sin(\pi \xi)}.$$
According to a result by Grenander and Rosenblatt (1957), we have
$$S_n \xrightarrow{d} Y := -\frac12 + \frac12 \, \frac{\sin(\pi \eta)}{\sin(\pi \xi)} \quad \text{as } n \to \infty$$
where $\eta$ is another r.v. uniformly distributed on $[0, 1]$ and independent of $\xi$. Note especially that $S_n$ itself, not the normed quantity $S_n/s_n$, has a limit distribution. Moreover, it is obvious that $S_n/s_n$ does not converge to a r.v. distributed $\mathcal{N}(0, 1)$.
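The closed form for $S_n$ in example (i) and the values $ES_n = 0$, $VS_n = n/2$ can be checked numerically. This is only a sketch under stated assumptions: the helper names, the choice $n = 25$ and the 4096-point midpoint grid on $[0, 1]$ are illustrative and not part of the original text.

```python
import math

def partial_sum(n: int, xi: float) -> float:
    """S_n = sum_{k=1}^n cos(2*pi*k*xi), computed term by term."""
    return sum(math.cos(2 * math.pi * k * xi) for k in range(1, n + 1))

def closed_form(n: int, xi: float) -> float:
    """-1/2 + (1/2) * sin(2*pi*(n + 1/2)*xi) / sin(pi*xi)."""
    return -0.5 + 0.5 * math.sin(2 * math.pi * (n + 0.5) * xi) / math.sin(math.pi * xi)

n = 25
grid = [(j + 0.5) / 4096 for j in range(4096)]  # midpoints, so xi = 0 is never hit

# Largest discrepancy between the two expressions over the grid.
max_gap = max(abs(partial_sum(n, x) - closed_form(n, x)) for x in grid)
# Midpoint-rule approximations of E[S_n] and E[S_n^2] for xi uniform on [0, 1].
mean = sum(partial_sum(n, x) for x in grid) / len(grid)
second_moment = sum(partial_sum(n, x) ** 2 for x in grid) / len(grid)
print(max_gap, mean, second_moment)
```

The variance grows linearly in $n$ even though $S_n$ itself has a limit distribution, which is why normalising by $s_n$ cannot produce a normal limit here.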
(ii) Consider the sequence of r.v.s $\{X_n, n = 0, \pm 1, \ldots\}$ such that for an arbitrary integer $n$ and non-negative integer $m$, the random vector $(X_n, X_{n+1}, \ldots, X_{n+m})$ has the following density:
$$f(x_n, x_{n+1}, \ldots, x_{n+m}) = \frac12 (2\pi)^{-(m+1)/2} a_1^{m+1} \exp\left(-\frac12 a_1^2 \sum_{k=0}^{m} x_{n+k}^2\right) + \frac12 (2\pi)^{-(m+1)/2} a_2^{m+1} \exp\left(-\frac12 a_2^2 \sum_{k=0}^{m} x_{n+k}^2\right).$$
Here $a_1 > 0$, $a_2 > 0$ and we assume that $a_1 \ne a_2$. Obviously $\{X_n\}$ is a strictly stationary sequence. If $S_n = X_1 + \ldots + X_n$ it is not difficult to see that
$$\lim_{n \to \infty} P[S_n/s_n \le x] =: G(x) = \frac12 \Phi(a_1 x) + \frac12 \Phi(a_2 x).$$
Thus the limit distribution $G$ of $S_n/s_n$ is a mixture of two normal distributions and, since $a_1 \ne a_2$, $G$ is not normal.
(iii) Let $\{X_n, n = 0, \pm 1, \ldots\}$ be a strictly stationary sequence with $E[X_n^2] < \infty$ for all $n$. Denote by $\rho(n)$, $n \ge 1$, the maximal correlation coefficient associated with this sequence (see Example 21.4). Recall that in general
$$\rho(n) = \sup_{\eta_1, \eta_2} \left\{ E[(\eta_1 - E\eta_1)(\eta_2 - E\eta_2)] / (V\eta_1 V\eta_2)^{1/2} \right\}$$
where $\eta_1$ is $\mathcal{M}_{-\infty}^{m}$-measurable, $\eta_2$ is $\mathcal{M}_{m+n}^{\infty}$-measurable, $0 < V\eta_1, V\eta_2 < \infty$, $m$ is any integer and $n \ge 1$. Note that the condition $\rho(n) \to 0$ as $n \to \infty$
plays an important role in the theory of stationary processes. In particular, the $\varphi$-mixing condition implies that $\rho(n) \to 0$ and, further, $\rho(n) \to 0$ implies the strong mixing condition (for details see Ibragimov and Linnik 1971). Suppose $\{X_n, n = 0, \pm 1, \ldots\}$ is a strictly stationary sequence with $EX_n = 0$ and $E[X_n^2] < \infty$ for all $n$. Using the notation $S_n = X_1 + \ldots + X_n$ and $s_n^2 = E[S_n^2]$ we formulate the following result of Ibragimov (1975). If $\rho(n) \to 0$ then either $\sup_n s_n^2 < \infty$ or $s_n^2 = n h(n)$ where $h(n)$ is a slowly varying function as $n \to \infty$. If $s_n^2 \to \infty$, $\rho(n) \to 0$ and for some $\delta > 0$, $E[|X_0|^{2+\delta}] < \infty$, then $S_n/s_n \xrightarrow{d} Y$ as $n \to \infty$ for a r.v. $Y$ distributed $\mathcal{N}(0, 1)$. Our aim now is to see whether the conditions for this result can be weakened while preserving the asymptotic normality of $S_n/s_n$. In particular, an example will be described in which instead of the condition $E[|X_0|^{2+\delta}] < \infty$ we have $E[|X_0|^{2+\delta}] = \infty$ for each $\delta > 0$ but $E[|X_0|^2] < \infty$. This example is the main result in a paper by Bradley (1980) and is formulated as follows. There exists a strictly stationary sequence $\{X_n, n = 0, \pm 1, \ldots\}$ of real-valued r.v.s such that: (a) $EX_n = 0$ and $0 < VX_n < \infty$; (b) $s_n^2 \to \infty$ as $n \to \infty$; (c) $\rho(n) \to 0$ as $n \to \infty$; (d) for each $\lambda > 0$ there is an increasing sequence of positive integers $\{n(k)\}$ such that $\lambda^{1/2} S_{n(k)}/s_{n(k)} \xrightarrow{d} \xi_\lambda$ as $k \to \infty$
where $\xi_\lambda$ is a r.v. with a d.f. $F_\lambda$ defined by
Note that for each fixed $\lambda > 0$ the limit distribution $F_\lambda$ is a Poisson mixture of normal distributions and has a point-mass at 0. Thus $F_\lambda$ is not a normal distribution. Therefore the stationary sequence constructed above does not satisfy the CLT. It is interesting to note that $F_\lambda$ is an infinitely divisible but not a stable distribution. (Two other distributions with analogous properties were given in Example 9.7.) Let us note finally that Herrndorf (1984) constructed an example of a stationary sequence (not $m$-dependent) of mutually uncorrelated r.v.s such that the strong mixing coefficient tends to zero 'very fast' but nevertheless the CLT fails to hold. For more recent results on this topic see the papers by Janson (1988) and Bradley (1989).
SECTION 22.
DISCRETE-TIME MARTINGALES
Let $(X_n, n \ge 1)$ be a random sequence defined on the probability space $(\Omega, \mathcal{F}, P)$. We are also given the family $(\mathcal{F}_n, n \ge 1)$ of non-decreasing sub-$\sigma$-fields of $\mathcal{F}$, that is $\mathcal{F}_n \subset \mathcal{F}$ for each $n$ and $\mathcal{F}_n \subset \mathcal{F}_{n+1}$. As usual, if we write $(X_n, \mathcal{F}_n, n \ge 1)$, this means that the sequence $(X_n)$ is $(\mathcal{F}_n)$-adapted: $X_n$ is $\mathcal{F}_n$-measurable for each $n$. The sequence $(X_n, n \ge 1)$ is integrable if $E|X_n| < \infty$ for every $n \ge 1$. If $\sup_{n \ge 1} E|X_n| < \infty$ we say that the given sequence is $L^1$-bounded, while if $E[\sup_{n \ge 1} |X_n|] < \infty$ the sequence $(X_n, n \ge 1)$ is $L^1$-dominated. The system $(X_n, \mathcal{F}_n, n \ge 1)$ is called a martingale if $E|X_n| < \infty$, $n \ge 1$ and
$$E[X_n | \mathcal{F}_m] = X_m \quad \text{a.s.} \tag{1}$$
for all $m \le n$. If in (1) instead of equality we have $E[X_n | \mathcal{F}_m] \le X_m$ or $E[X_n | \mathcal{F}_m] \ge X_m$, then we have a supermartingale or a submartingale respectively. A stopping time with respect to $(\mathcal{F}_n)$ is a function $\tau : \Omega \mapsto \mathbb{N} \cup \{\infty\}$ such that $[\tau = n] \in \mathcal{F}_n$ for all $n \ge 1$. Denote by $T$ the set of all bounded stopping times. Recall that the family $(a_\tau, \tau \in T)$ of real numbers (such a family is called a net) is said to converge to the real number $b$ if for every $\varepsilon > 0$ there is $\tau_0 \in T$ such that for all $\tau \in T$ with $\tau \ge \tau_0$ we have $|a_\tau - b| < \varepsilon$.

Some definitions of systems whose properties are close to those of martingales but are in some sense generalizations of them are listed below. The random sequence $(X_n, \mathcal{F}_n, n \ge 1)$ is said to be:
(a) a quasimartingale if $\sum_{n=1}^{\infty} E[|X_n - E(X_{n+1} | \mathcal{F}_n)|] < \infty$;
(b) an amart if the net $(E[X_\tau], \tau \in T)$ converges;
(c) a martingale in the limit if $\sup_{m \ge n} |E(X_m | \mathcal{F}_n) - X_n| \to 0$ a.s. as $n \to \infty$;
(d) a game fairer with time if $\sup_{m > n} |E(X_m | \mathcal{F}_n) - X_n| \to 0$ in probability as $n \to \infty$;
(e) a progressive martingale if $A_n \subset A_{n+1}$ for $n \ge 1$ and $P[\bigcup_{n=1}^{\infty} A_n] = 1$ where $A_n = [E(X_{n+1} | \mathcal{F}_n) = X_n]$;
(f) an eventual martingale if $P[E(X_{n+1} | \mathcal{F}_n) \ne X_n \text{ i.o.}] = 0$.

Random sequences which possess the martingale, supermartingale or submartingale properties are of classic importance in the theory of stochastic processes. Complete presentations of them have been given by Doob (1953), Neveu (1975) and Chow and Teicher (1978). The martingale generalizations (a)-(f) given above have appeared in recent years. Many results and references in this new area can be found in the works of Gut and Schmidt (1983) and Tomkins (1984a, b). In this section we have included examples which illustrate the basic properties of martingales and martingale-like sequences (with discrete time) and reveal the relationships between them.
22.1.
Martingales which are $L^1$-bounded but not $L^1$-dominated
Let $X = (X_n, \mathcal{F}_n, n \ge 1)$ be a martingale. The relation $\sup_{n \ge 1} E|X_n| \le E[\sup_{n \ge 1} |X_n|]$ implies that every $L^1$-dominated martingale is also $L^1$-bounded. This raises the question of whether the converse is true. The answer is negative and will be illustrated by a few examples.

(i) Consider the discrete space $\Omega = \{1, 2, \ldots\}$ with probability $P$ on it defined by $P(\{n\}) = \frac1n - \frac{1}{n+1}$, $n \in \mathbb{N}$. Let $(\mathcal{F}_n, n \ge 1)$ be the increasing sequence of $\sigma$-fields where $\mathcal{F}_n$ is generated by the partition $\{\{1\}, \{2\}, \ldots, \{n\}, [n+1, \infty)\}$. Define the sequence $(X_n, n \ge 1)$ of r.v.s by
$$X_n = X_n(\omega) = (n+1) \, 1_{[n+1, \infty)}(\omega), \quad n \in \mathbb{N}.$$
Then $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a non-negative martingale such that $EX_n = 1$ for all $n \in \mathbb{N}$ and hence $X$ is $L^1$-bounded. However, $\sup_{n \in \mathbb{N}} X_n(\omega) = \omega$ for $\omega \ge 2$ and clearly it is not integrable. Therefore the martingale $X$ is not $L^1$-dominated.

(ii) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ be the Lebesgue measure. Define
$$X_n = X_n(\omega) = \begin{cases} 0, & \text{if } 1/n < \omega \le 1 \\ -n^2 \omega + n, & \text{if } 0 \le \omega \le 1/n \end{cases}$$
and $\mathcal{F}_n = \sigma\{X_1, \ldots, X_n\}$. Then $(X_n, \mathcal{F}_n, n \ge 1)$ is a martingale. Since $EX_n = \frac12$ for each $n \in \mathbb{N}$ this martingale is $L^1$-bounded. However, its supremum, $\sup_{n \in \mathbb{N}} |X_n|$, is not integrable and the $L^1$-domination property fails to hold.

(iii) Let $w = (w(t), t \ge 0)$ be a standard Wiener process, $\mathcal{F}_t = \sigma\{w_s, s \le t\}$. Take any numerical sequence $\{n_k, k \ge 1\}$ such that $0 < n_1 < n_2 < \ldots \to \infty$ as $k \to \infty$. Denote $M_k = \exp[w(n_k) - \frac12 n_k]$. Then it can be shown that $M = (M_k, \mathcal{F}_{n_k}, k \ge 1)$ is a non-negative martingale (and even that $M_k \to 0$ a.s. as $k \to \infty$) which is integrable but $E[\sup_{k \ge 1} M_k] = \infty$. Hence in this case the $L^1$-domination property again does not hold, despite the integrability of $M$. One additional example of an $L^1$-bounded but not $L^1$-dominated martingale will be given at the end of Example 22.2.
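The claims in example (i) of 22.1 can be verified with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the supremum is taken in the form $\sup_n X_n(\omega) = \omega$ for $\omega \ge 2$ (and $0$ for $\omega = 1$), which follows from the definition of $X_n$.

```python
from fractions import Fraction

def p(w: int) -> Fraction:
    """P({w}) = 1/w - 1/(w + 1) on the space {1, 2, ...}."""
    return Fraction(1, w) - Fraction(1, w + 1)

def tail(n: int) -> Fraction:
    """P[w >= n], computed as 1 minus the mass of {1, ..., n - 1}."""
    return 1 - sum((p(w) for w in range(1, n)), Fraction(0))

def mean_X(n: int) -> Fraction:
    """E[X_n] for X_n(w) = (n + 1) * 1{w >= n + 1}."""
    return (n + 1) * tail(n + 1)

def cond_mean_next(n: int) -> Fraction:
    """Value of E[X_{n+1} | F_n] on the atom [n + 1, oo) of F_n."""
    return (n + 2) * tail(n + 2) / tail(n + 1)

def truncated_sup_mean(N: int) -> Fraction:
    """Sum over w = 2..N of sup_n X_n(w) * P({w}), using sup_n X_n(w) = w."""
    return sum((w * p(w) for w in range(2, N + 1)), Fraction(0))

print([mean_X(n) for n in (1, 5, 50)])
print(float(truncated_sup_mean(10)), float(truncated_sup_mean(1000)))
```

The truncated expectation of the supremum is a partial sum of the harmonic series, so it grows without bound as the truncation level increases, while $EX_n$ stays equal to 1.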
22.2.
A property of a martingale which is not preserved under random stopping
Let $X = (X_n, \mathcal{F}_n, n \ge 1)$ be a martingale and $Y_n = \frac1n (X_1 + \ldots + X_n)$. Denote by $T$ the set of all bounded $(\mathcal{F}_n)$-stopping times and introduce the following four conditions:
$$\sup_{n \ge 1} E|X_n| < \infty, \tag{1}$$
$$\sup_{n \ge 1} E|Y_n| < \infty, \tag{2}$$
$$\sup_{\tau \in T} E|X_\tau| < \infty, \tag{3}$$
$$\sup_{\tau \in T} E|Y_\tau| < \infty. \tag{4}$$
Obviously, conditions (3) and (4) can be considered as 'randomly stopped versions' of (1) and (2) respectively. It is well known (see Yamazaki 1972) that conditions (1) and (2) are equivalent; moreover, conditions (1) and (3) are also equivalent. Thus it is natural to assume that (3) and (4) are equivalent. However, as we shall now see, this conjecture is wrong. Let $\tau \in T$, that is $\tau$ is a positive integer-valued r.v. such that $P[\tau < \infty] = 1$ and let $P[\tau > n] > 0$ for every $n \ge 1$. Denote by $\mathcal{F}_n$ the $\sigma$-field generated by the events $[\tau = 1], [\tau = 2], \ldots, [\tau = n]$. Clearly, $\tau$ is an $(\mathcal{F}_n)$-stopping time. Let $\{b_n, n \ge 1\}$ be a non-increasing sequence of positive numbers such that $b_{k-1} - b_k = 0$ for those $k$ for which $P[\tau = k] = 0$, and in such cases we also put $(b_{k-1} - b_k)/P[\tau = k] = 0$. Define the sequence $(X_n, n \ge 1)$ of r.v.s by
$$X_n(\omega) = \sum_{k=1}^{n} \left[(b_{k-1} - b_k)/P[\tau = k]\right] 1_{[\tau = k]}(\omega) + \left(b_n/P[\tau > n]\right) 1_{[\tau > n]}(\omega). \tag{5}$$
Then it is not difficult to check that $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a non-negative martingale. Indeed, taking into account that $[\tau = 1], \ldots, [\tau = n-1]$ and $[\tau > n-1]$ are atoms of $\mathcal{F}_{n-1}$, we can easily see that
$$\int_{[\tau = k]} (X_n - X_{n-1}) \, dP = 0, \quad k = 1, \ldots, n-1,$$
$$\int_{[\tau > n-1]} (X_n - X_{n-1}) \, dP = (b_{n-1} - b_n) + (b_n - b_{n-1}) = 0.$$
These relations imply the martingale property of $X$. We can check directly that condition (1) is satisfied and hence (2) and (3) hold. It then remains for us to clarify whether condition (4) is satisfied. To do this, consider the variable $Y_\tau = (1/\tau)(X_1 + \ldots + X_\tau)$. Clearly,
$$Y_\tau \ge \frac{1}{\tau} \sum_{k=1}^{\tau - 1} X_k = \frac{1}{\tau} \sum_{k=1}^{\tau - 1} \frac{b_k}{P[\tau > k]} =: \eta.$$
Here $\eta$ is a r.v. which takes the value $(1/n) \sum_{k=1}^{n-1} (b_k / P[\tau > k])$ with probability equal to $P[\tau = n]$. This implies that $EY_\tau \ge E\eta$. So our aim is to estimate the expectation $E\eta$. However, we need the following result from analysis (the proof is left to the reader): if $\{a_n, n \ge 1\}$ is a positive non-increasing sequence converging to zero and $\{b_n, n \ge 1\}$ is a non-negative and non-increasing sequence, then
(6)
Now let $a_n = P[\tau > n]$, and take the sequence $\{b_n, n \ge 1\}$ used to define $X$ by (5) to be non-increasing and bounded from below by some positive constant, that is $b_n \ge c = \text{constant} > 0$ for all $n \ge 1$. Then these two sequences, $\{a_n, n \ge 1\}$ and $\{b_n, n \ge 1\}$, satisfy the conditions required for the validity of (6). After some calculations we find that $E\eta = \infty$ and hence
$$E|Y_\tau| = EY_\tau \ge E\eta = \infty.$$
Therefore condition (4) does not hold in spite of the fact that (1), (2) and (3) are satisfied. Finally, let us look at the following possibility. It is easy to see that the martingale $(X_n, \mathcal{F}_n, n \ge 1)$ defined by (5) is uniformly integrable. If in particular we choose $b_n = 1/(n+1)$ and $P[\tau = n] = 2^{-n}$, then we can check that $E[\sup_{n \ge 1} X_n] = \infty$. Thus we obtain another example of a martingale which is $L^1$-bounded but not $L^1$-dominated (see also Example 22.1).
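The particular choice $b_n = 1/(n+1)$, $P[\tau = n] = 2^{-n}$ mentioned at the end of the example can be explored with exact arithmetic. The following is a sketch only; it assumes $b_0 = 1$ (the text introduces $b_n$ for $n \ge 1$, so the value of $b_0$ is an assumption made here to evaluate (5)), and all function names are hypothetical.

```python
from fractions import Fraction

def b(n: int) -> Fraction:
    """b_n = 1/(n + 1); this gives b_0 = 1, which is an assumption of the sketch."""
    return Fraction(1, n + 1)

def p_eq(m: int) -> Fraction:
    """P[tau = m] = 2^-m."""
    return Fraction(1, 2 ** m)

def p_gt(n: int) -> Fraction:
    """P[tau > n] = 2^-n."""
    return Fraction(1, 2 ** n)

def X(n: int, m: int) -> Fraction:
    """Value of X_n on the atom [tau = m], following formula (5)."""
    if m <= n:
        return (b(m - 1) - b(m)) / p_eq(m)
    return b(n) / p_gt(n)

def mean_X(n: int) -> Fraction:
    """E[X_n]: atoms [tau = m], m <= n, plus the event [tau > n] where X_n = b_n / P[tau > n]."""
    return sum(X(n, m) * p_eq(m) for m in range(1, n + 1)) + b(n)

def partial_mean_sup(M: int) -> Fraction:
    """Contribution of the atoms [tau = m], m <= M, to E[sup_n X_n]."""
    # On [tau = m] the path X_n is constant for n >= m, so n = 1..m covers the supremum.
    return sum(max(X(n, m) for n in range(1, m + 1)) * p_eq(m) for m in range(1, M + 1))

print([mean_X(n) for n in (1, 10, 30)])
print(float(partial_mean_sup(20)), float(partial_mean_sup(200)))
```

Under these assumptions the expectation $EX_n$ stays constant while the partial contributions to $E[\sup_n X_n]$ form half of a harmonic sum, which is consistent with the claim that the martingale is $L^1$-bounded but not $L^1$-dominated.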
22.3.
Martingales for which the Doob optional theorem fails to hold
Let $X = (X_n, \mathcal{F}_n, n \ge 0)$ be a martingale and $\tau$ be an $(\mathcal{F}_n)$-stopping time. Suppose the following two conditions are satisfied:
(a) $E|X_\tau| < \infty$;
(b) $\lim_{n \to \infty} \int_{[\tau > n]} X_n \, dP = 0$.
Then $EX_\tau = EX_0$. This statement, called the Doob optional theorem, is considered in many books (see Doob 1953; Kemeny et al. 1966; Neveu 1975). Conditions (a) and (b) together are sufficient conditions for the validity of the relation $EX_\tau = EX_0$. Our purpose now is to clarify whether both (a) and (b) are necessary.

(i) Let $\{\eta_n, n \ge 1\}$ be a sequence of i.i.d. r.v.s. Suppose $\eta_1$ takes only the values $-1, 0, 1$ and $E\eta_1 = 0$. Define $X_n = \eta_1 + \ldots + \eta_n$ and $\mathcal{F}_n = \sigma\{\eta_1, \ldots, \eta_n\}$ for $n \ge 1$ and $X_0 = 0$, $\mathcal{F}_0 = \{\emptyset, \Omega\}$. Clearly $X = (X_n, \mathcal{F}_n, n \ge 0)$ is a martingale. If $\tau = \inf\{n : X_n = 1\}$, then $\tau$ is an $(\mathcal{F}_n)$-stopping time such that $P[\tau < \infty] = 1$ and $X_\tau = 1$ a.s. Hence $EX_0 = 0 \ne 1 = EX_\tau$ which means that the Doob optional theorem does not hold for the martingale $X$ and the stopping time $\tau$. Let us check whether conditions (a) and (b) are satisfied. It is easy to see that $E|X_\tau| < \infty$ and thus condition (a) is satisfied. Furthermore,
$$0 = \int_{\Omega} X_n \, dP = \int_{[\tau \le n]} X_n \, dP + \int_{[\tau > n]} X_n \, dP =: J_1 + J_2.$$
The term $J_1$ is equal to the probability that level 1 has been reached by the martingale $X$ in time $n$ and this probability tends to 1 as $n \to \infty$. Since $J_1 + J_2 = 0$ we see that $J_2$ tends to $-1$, not to 0. Thus condition (b) is violated.
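The behaviour of $J_1$ and $J_2$ in example (i) can be computed exactly in a special case. The sketch below assumes the simple symmetric walk, $P[\eta_1 = 1] = P[\eta_1 = -1] = 1/2$ (one admissible choice of the distribution, not the only one); the function name `split_expectation` is hypothetical.

```python
from fractions import Fraction

def split_expectation(n: int):
    """Return (P[tau <= n], E[X_n ; tau > n]) for the simple symmetric random walk."""
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}   # law of X_k restricted to [tau > k]; all positions are <= 0
    hit = Fraction(0)          # P[tau <= k]
    for _ in range(n):
        nxt = {}
        for pos, pr in alive.items():
            if pos + 1 == 1:
                hit += pr * half   # first visit to level 1, so the path is stopped
            else:
                nxt[pos + 1] = nxt.get(pos + 1, Fraction(0)) + pr * half
            nxt[pos - 1] = nxt.get(pos - 1, Fraction(0)) + pr * half
        alive = nxt
    j1 = hit                                        # equals E[X_n ; tau <= n] by the martingale property
    j2 = sum(pos * pr for pos, pr in alive.items())  # E[X_n ; tau > n], computed directly
    return j1, j2

for n in (1, 10, 50, 200):
    j1, j2 = split_expectation(n)
    print(n, float(j1), float(j2))
```

The two terms always cancel, and $J_1$ approaches 1 only slowly, so $J_2$ drifts to $-1$ rather than to 0.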
(ii) Let $\xi_1, \xi_2, \ldots$ be independent r.v.s where $\xi_n \sim \mathcal{N}(0, b_n)$. Here the variances $b_n$, $n \ge 1$, are chosen as follows. We take $b_1 = 1$ and $b_{n+1} = a_{n+1}^2 - a_n^2$ for $n \ge 1$ where $a_n = (n-1)^2/\log(3+n)$. The reason for this special choice will become clear later. Define $X_n = \xi_1 + \ldots + \xi_n$ and $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$. Then $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a martingale. Let $g$ be a measurable function from $\mathbb{R}$ to $\mathbb{N}$ with $P[g(\xi_1) = n] = p_n$ where $p_n = n^{-2} - (n+1)^{-2}$, $n \ge 1$. Thus $\tau := g(\xi_1)$ is a stopping time and moreover its expectation is finite. It can be shown that the relation $EX_\tau = EX_1$ does not hold. So let us check whether conditions (a) and (b) are satisfied. Denote by $F$ the d.f. of $\xi_1$ and let $S_1 = 0$, $S_n = \xi_2 + \ldots + \xi_n$ for $n \ge 2$. Thus $\xi_1$ is independent of $S_1, S_2, \ldots$ and $X_n = \xi_1 + S_n$ where $S_n \sim \mathcal{N}(0, a_n^2)$. Now we have to compute the quantities $E|X_\tau|$ and $\int_{[\tau > n]} |X_n| \, dP$. We find
$$\int_{[\tau > n]} |X_n| \, dP = \int_{[g > n]} E[|y + S_n|] \, dF(y) \le \int_{[g > n]} \{|y| + E|S_n|\} \, dF(y) = \int_{[g > n]} |y| \, dF(y) + c \, a_n P[g > n]$$
where $c = E[|\xi_1|]$. It is easy to conclude that $\int_{[\tau > n]} |X_n| \, dP \to 0$ as $n \to \infty$ and hence condition (b) is satisfied. Furthermore,
$$E|X_\tau| \ge c \int a_{g(y)} \, dF(y) - c = c \sum_{n=1}^{\infty} p_n a_n - c = \infty$$
and condition (a) is not satisfied. Examples (i) and (ii) show that both conditions (a) and (b) are essential for the validity of the Doob optional theorem.
22.4.
Every quasimartingale is an amart, but not conversely
It is easy to show that every quasimartingale is also an amart (for details see Edgar and Sucheston 1976a; Gut and Schmidt 1983). However, the converse is not always true. This will be illustrated by two simple examples.

(i) Let $a_n = (-1)^n n^{-1}$, $n \ge 1$. Take $X_n = a_n$ a.s. and choose an arbitrary sequence $\{\tau_n, n \ge 1\}$ of bounded stopping times with the only condition that $\tau_n \uparrow \infty$ as $n \to \infty$. Since $a_n \to 0$ as $n \to \infty$, we have $X_{\tau_n} \to 0$ a.s. as $n \to \infty$. Moreover, $|a_n| \le 1$ implies that $EX_{\tau_n} \to 0$ as $n \to \infty$. Hence for any increasing family of $\sigma$-fields $(\mathcal{F}_n, n \ge 1)$ to which $(\tau_n)$ are related, the system $(X_n, \mathcal{F}_n, n \ge 1)$ is an amart. However, $\sum_{n=1}^{\infty} E|X_n - E(X_{n+1} | \mathcal{F}_n)| = \sum_{n=1}^{\infty} |a_n - a_{n+1}| = \infty$ and the amart $(X_n, \mathcal{F}_n, n \ge 1)$ is not a quasimartingale.
(ii) Let $(X_n, n \ge 1)$ be a sequence of i.i.d. r.v.s such that $P[X_n = 1] = P[X_n = -1] = \frac12$ and let $(c_n, n \ge 1)$ be positive real numbers, $c_n \downarrow 0$ as $n \to \infty$ and $\sum_{n=1}^{\infty} c_n = \infty$. Consider the sequence $(Y_n, n \ge 1)$ where $Y_n = c_n X_1 \cdots X_n$ and the $\sigma$-fields $\mathcal{F}_n = \sigma\{X_1, \ldots, X_n\}$. Clearly, $Y_n$ is $\mathcal{F}_n$-measurable for every $n \ge 1$. Since $|Y_n| \le c_n \downarrow 0$ a.s., $Y_{\tau_n} \to 0$ a.s. as $n \to \infty$ for any sequence of bounded stopping times $(\tau_n, n \ge 1)$ such that $\tau_n \uparrow \infty$ as $n \to \infty$. Applying the dominated convergence theorem, we conclude that $EY_{\tau_n} \to 0$ as $n \to \infty$, so $Y = (Y_n, \mathcal{F}_n, n \ge 1)$ is an amart. However,
$$\sum_{n=1}^{\infty} E|Y_n - E(Y_{n+1} | \mathcal{F}_n)| = \sum_{n=1}^{\infty} E|Y_n| = \sum_{n=1}^{\infty} c_n = \infty$$
and therefore the amart $Y$ is not a quasimartingale.
22.5.
Amarts, martingales in the limit, eventual martingales and relationships between them
(i) Let $\xi_1, \xi_2, \ldots$ be a sequence of positive i.i.d. r.v.s such that $E\xi_1 < \infty$ and $E[\xi_1 \log^{+} \xi_1] = \infty$. Consider the sequence $X_1, X_2, \ldots$ where $X_n = \xi_n/n$ and the family $(\mathcal{F}_n, n \ge 1)$ with $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$. It is easy to check that $X_n \to 0$ a.s. as $n \to \infty$. Moreover, $EX_n \to 0$ as $n \to \infty$ and $E[\sup_{n \ge 1} X_n] = \infty$. It follows that $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a martingale in the limit, but $X$ is not an amart because the net $(EX_\tau, \tau \in T)$ is unbounded where $T$ is the set of all bounded $(\mathcal{F}_n)$-stopping times.

(ii) Consider the sequence $(\eta_n, n \ge 1)$ of independent r.v.s, $P[\eta_n = 1] = n^{-2} = 1 - P[\eta_n = 0]$. Let $\mathcal{F}_n = \sigma\{\eta_1, \ldots, \eta_n\}$ and $X_n = \eta_1 + \ldots + \eta_n$. Since
$$E(X_n | \mathcal{F}_m) - X_m = \sum_{k=m+1}^{n} k^{-2} \quad \Rightarrow \quad \lim_{n > m \to \infty} (E(X_n | \mathcal{F}_m) - X_m) = 0 \quad \text{a.s.}$$
we conclude that $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a martingale in the limit. Moreover,
imply that X is even uniformly integrable. Despite these properties, X is not an eventual martingale. This follows from the relation
and definition (f) (see the introductory notes in this section).
22.6.
Relationships between amarts, progressive martingales and quasimartingales
(i) Let $(\xi_n, n \ge 1)$ be independent r.v.s such that $P[\xi_n = 1] = n/(n+1) = 1 - P[\xi_n = 0]$, $n \ge 1$. Define $\eta_1 = 1$ and for $n \ge 2$, $\eta_n = (-1)^{n-1} \xi_1 \xi_2 \cdots \xi_{n-1}$. Further, let $X_n = \eta_1 + \ldots + \eta_n$ and $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$. Obviously, for every $n$, $X_n$ is either 0 or 1. Moreover, by the Borel-Cantelli lemma, $P[\xi_n = 0 \text{ i.o.}] = 1$ which implies that $P[\eta_n \ne 0 \text{ i.o.}] = 0$. However, $E[\eta_n | \mathcal{F}_{n-1}] = \eta_n$ a.s. and $\eta_{n+1} = 0$ if $\eta_n = 0$. Hence $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a progressive martingale. Let us check if $X$ is a quasimartingale. We have
$$E\eta_n = (-1)^{n-1} \prod_{k=1}^{n-1} (k/(k+1)) = (-1)^{n-1} n^{-1}$$
and
$$\sum_{n=1}^{\infty} E|E(\eta_n | \mathcal{F}_{n-1})| = \sum_{n=1}^{\infty} n^{-1} = \infty.$$
Therefore the progressive martingale $X$ is not a quasimartingale.

(ii) Let us now describe a random sequence which is a progressive martingale but not an amart. Consider the sequence $(\xi_n, n \ge 1)$ of independent r.v.s where $P[\xi_n = 1] = n/(n+1) = 1 - P[\xi_n = 0]$ (case (i) above). Let $X_1 = 1$ and for $n > 1$, $X_n = n^2 \xi_1 \xi_2 \cdots \xi_{n-1}$, and $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$, $n \ge 1$. Clearly,
By the Borel-Cantelli lemma $P[\xi_n = 0 \text{ i.o.}] = 1$ and since $X_{n-1} = 0$ implies that $X_n = 0$, we conclude that $P[X_n \ne 0 \text{ i.o.}] = 0$. Consequently $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a progressive martingale. However, $EX_n = n \to \infty$ as $n \to \infty$ which shows that $X$ cannot be an amart.

(iii) Recall that every quasimartingale is also an amart and a martingale in the limit. Let us illustrate that the converse is false. Consider the sequence $(X_n, n \ge 1)$ given by $X_n = \sum_{k=1}^{n} (-1)^{k-1} k^{-1}$ and let $\mathcal{F}_n = \mathcal{F}_0$ for all $n \ge 1$. Then $X = (X_n, \mathcal{F}_n, n \ge 1)$ is an amart and also a martingale in the limit. Further, we have $0 < X_n < 1 + \sum_{k=1}^{\infty} (-1)^{k-1} k^{-1} < \infty$. However,
$$\sum_{n=1}^{\infty} E|E(X_n | \mathcal{F}_{n-1}) - X_{n-1}| = \sum_{n=1}^{\infty} n^{-1} = \infty$$
and therefore $X$ is not a quasimartingale.
22.7.
An eventual martingale need not be a game fairer with time
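Let $(\xi_n, n \ge 1)$ be independent r.v.s such that $P[\xi_n = -1] = 2^{-n} = 1 - P[\xi_n = 1]$, $n \ge 1$. Let $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$, $\eta_1 = \xi_1$, $\eta_{n+1} = 2^n \xi_{n+1} I(\xi_n = -1)$ for $n \ge 1$ and $X_n = \eta_1 + \ldots + \eta_n$, $n \ge 1$. Then for $k > 1$ we find
$$E[\eta_k | \mathcal{F}_{k-1}] = 2^{k-1} I(\xi_{k-1} = -1) E(\xi_k | \mathcal{F}_{k-1}) = (2^{k-1} - 1) I(\xi_{k-1} = -1).$$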
Hence
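Therefore $X = (X_n, \mathcal{F}_n, n \ge 1)$ is an eventual martingale. Now take $m \ge 2$. Then
$$E(X_{2m} - X_m | \mathcal{F}_m) = \sum_{k=m+1}^{2m} E[E(\eta_k | \mathcal{F}_{k-1}) | \mathcal{F}_m] = \sum_{k=m+2}^{2m} (2^{k-1} - 1) P[\xi_{k-1} = -1] + (2^m - 1) I(\xi_m = -1) \ge \sum_{k=m+2}^{2m} (1 - 2^{-k+1}) \ge 1 - 2^{-m-1} > \frac12.$$
Hence if $0 < \varepsilon < \frac12$ we obtain
$$P[|E(X_{2m} | \mathcal{F}_m) - X_m| > \varepsilon] = 1 \quad \text{for all } m \ge 2.$$
This means that $X$ is not a game fairer with time.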
22.8.
Not every martingale-like sequence admits a Riesz decomposition
Recall that the random sequence $(X_n, \mathcal{F}_n, n \ge 1)$ is said to admit the Riesz decomposition if $X_n = M_n + Z_n$, $n \ge 1$, where $(M_n, \mathcal{F}_n, n \ge 1)$ is a martingale and $E[Z_n 1_A] \to 0$ as $n \to \infty$ for every $A \in \bigcup_{n=1}^{\infty} \mathcal{F}_n$. If this property holds then the sequence $(EX_n)$ must converge since
There are of course martingale-like sequences which admit the Riesz decomposition. However, this property does not always hold. Consider the sequence $(\xi_n, n \ge 1)$ of i.i.d. r.v.s such that $P[\xi_1 = 4] = P[\xi_1 = 0] = \frac12$. Let $X_n = \xi_1 \xi_2 \cdots \xi_n$ and $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$, $n \ge 1$. Since $EX_n = 2^n \to \infty$, $(X_n, \mathcal{F}_n, n \ge 1)$ does not admit a Riesz decomposition. It remains for us to show that $(X_n, \mathcal{F}_n, n \ge 1)$ is a martingale-like sequence in the sense of at least one of the definitions (a)-(f) given in the introductory notes. In particular, it is easy to see that $E[X_{n+1} | \mathcal{F}_n] = 2X_n$. Also we have $X_{n+1} = 0$ if $X_n = 0$. By the Borel-Cantelli lemma we conclude that $P[X_n \ne 0 \text{ i.o.}] = 0$ and therefore $(X_n, \mathcal{F}_n, n \ge 1)$ is a progressive martingale.
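The two facts used in Example 22.8, $EX_n = 2^n$ and $P[X_n \ne 0] = 2^{-n}$, can be confirmed by brute-force enumeration for small $n$. This is an illustrative sketch; the function name `law_of_product` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def law_of_product(n: int) -> dict:
    """Exact law of X_n = xi_1 * ... * xi_n for i.i.d. xi_k uniform on {0, 4}."""
    law = {}
    weight = Fraction(1, 2 ** n)          # each of the 2^n outcomes is equally likely
    for outcome in product((0, 4), repeat=n):
        value = 1
        for x in outcome:
            value *= x
        law[value] = law.get(value, Fraction(0)) + weight
    return law

for n in (1, 2, 5, 10):
    law = law_of_product(n)
    mean = sum(v * pr for v, pr in law.items())
    nonzero = sum(pr for v, pr in law.items() if v != 0)
    print(n, mean, nonzero)
```

The mean doubles at each step while the probability of a non-zero value is summable, which is exactly the combination that rules out a Riesz decomposition but allows the Borel-Cantelli argument.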
22.9.
On the validity of two inequalities for martingales
Here we shall consider two important inequalities for martingales and analyse the conditions under which they hold or fail to hold.

(i) Let $(X_n, \mathcal{F}_n, n \ge 1)$ be a martingale and $g : \mathbb{R}^1 \mapsto \mathbb{R}^1$ be a measurable function which is: (a) positive over $\mathbb{R}^{+}$; (b) even; and (c) convex; that is, for any $x, y \in \mathbb{R}^1$, $g(\frac12 (x + y)) \le \frac12 g(x) + \frac12 g(y)$. Then for an arbitrary $\varepsilon > 0$,
$$P\left[\sup_{1 \le k \le n} |X_k| \ge \varepsilon\right] \le E[g(X_n)]/g(\varepsilon). \tag{1}$$
Note that this extension of the classical Kolmogorov inequality was obtained by Zolotarev (1961). Now we should like to show that the convexity of $g$ is essential for the validity of (1). Suppose $g$ satisfies conditions (a) and (b) but not (c). Since $g$ is not convex in this case, there exist $a$ and $h$, $0 < h \le a$, such that
$$g(a) > \tfrac12 g(a - h) + \tfrac12 g(a + h). \tag{2}$$
Consider the r.v.s $X_1 = \xi_1$ and $X_2 = \xi_1 + \xi_2$ where $\xi_1$ and $\xi_2$ are independent, $\xi_1$ takes the values $\pm a$ with probability $\frac12$ each, and $\xi_2$ takes the values $\pm h$ also with probability $\frac12$ each. It is easy to check that $E[X_2 | X_1] = X_1$ a.s. Thus letting $\mathcal{F}_1 = \sigma\{\xi_1\}$, $\mathcal{F}_2 = \sigma\{\xi_1, \xi_2\}$ we find that the system $(X_k, \mathcal{F}_k, k = 1, 2)$ is a martingale. Since $g$ is an even function, taking (2) into account we obtain
$$E[g(X_2)] = \tfrac14 [g(-a - h) + g(-a + h) + g(a - h) + g(a + h)] = \tfrac12 [g(a - h) + g(a + h)] < g(a)$$
and
$$P\left[\sup_{1 \le k \le 2} |X_k| \ge a\right] = 1 > E[g(X_2)]/g(a).$$
Therefore inequality (1) does not hold for the martingale constructed above, taking $\varepsilon = a$.

(ii) Let $X = (X_n, \mathcal{F}_n, n \ge 1)$ be a martingale and $[X]_n = \sum_{j=1}^{n} (\Delta X_j)^2$, $\Delta X_j = X_j - X_{j-1}$, $X_0 = 0$, be its quadratic variation. Then for every $p > 1$ there are universal constants $A_p$ and $B_p$ (independent of $X$) such that
$$A_p \left\| \sqrt{[X]_n} \right\|_p \le \|X_n\|_p \le B_p \left\| \sqrt{[X]_n} \right\|_p \tag{3}$$
where $\|X_n\|_p = (E(|X_n|^p))^{1/p}$. Note that inequalities (3), called Burkholder inequalities, are often used in the theory of martingales (for details see Burkholder and Gundy 1970; Shiryaev 1995).
We shall now check that the condition on $p$, namely $p > 1$, is essential. By a simple example we can illustrate that (3) fails to hold if $p = 1$. Let $\xi_1, \xi_2, \ldots$ be independent Bernoulli r.v.s with $P[\xi_i = 1] = P[\xi_i = -1] = \frac12$ and let $X_n = \sum_{j=1}^{\tau \wedge n} \xi_j$ where $\tau = \inf\{n \ge 1 : \sum_{j=1}^{n} \xi_j = 1\}$. If $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$ then it is easy to see that the sequence $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a martingale with the property $\|X_n\|_1 = E|X_n| = 2E[X_n^{+}] \to 2$ as $n \to \infty$. However,
Therefore in general inequalities (3) cannot hold for $p = 1$.
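Part (i) of this example can be made concrete with a specific non-convex function. The sketch below assumes the illustrative choice $g(x) = \sqrt{1 + |x|}$ (even, positive, concave on $[0, \infty)$) together with $a = 1$, $h = 1/2$; none of these values come from the original text.

```python
import math
from itertools import product

def g(x: float) -> float:
    """Even and positive, but not convex: sqrt(1 + |x|) is concave on [0, oo)."""
    return math.sqrt(1.0 + abs(x))

a, h = 1.0, 0.5   # illustrative values with 0 < h <= a

# The four equally likely outcomes of (xi_1, xi_2) = (+-a, +-h).
outcomes = list(product((a, -a), (h, -h)))

# E[g(X_2)] with X_2 = xi_1 + xi_2.
mean_g_X2 = sum(g(x1 + x2) for x1, x2 in outcomes) / len(outcomes)
# P[max(|X_1|, |X_2|) >= a]; it equals 1 because |X_1| = a on every outcome.
prob_sup = sum(1 for x1, x2 in outcomes if max(abs(x1), abs(x1 + x2)) >= a) / len(outcomes)
# Martingale check: E[X_2 | X_1 = x1] = x1 for both values of x1.
martingale_ok = all(abs(sum(x1 + s for s in (h, -h)) / 2 - x1) < 1e-12 for x1 in (a, -a))

print(prob_sup, mean_g_X2 / g(a), martingale_ok)
```

With these choices the left-hand side of (1) is 1 while the right-hand side $E[g(X_2)]/g(a)$ is strictly smaller than 1, so the inequality fails for $\varepsilon = a$.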
22.10.
On the convergence of submartingales almost surely and in $L^1$-sense
Let $(X_n, \mathcal{F}_n, n \ge 1)$ be a submartingale satisfying the condition
$$\sup_{n \ge 1} E|X_n| < \infty. \tag{1}$$
Then, according to the classical Doob theorem, the limit $X_\infty := \lim_{n \to \infty} X_n$ exists a.s. and $E|X_\infty| < \infty$. Moreover, if $(X_n, \mathcal{F}_n, n \ge 1)$ is a uniformly integrable submartingale, then there is a r.v. $X_\infty$ with $E|X_\infty| < \infty$ such that $X_n \to X_\infty$ a.s. and $X_n \xrightarrow{L^1} X_\infty$ as $n \to \infty$. The proofs of these and of many other closely related results can be found in the books by Doob (1953), Neveu (1975), Chow and Teicher (1978) and Shiryaev (1995). Let us now consider a few examples with the aim of illustrating the importance of the conditions under which the above results hold.

(i) Let $\{\xi_n, n \ge 1\}$ be i.i.d. r.v.s with $P[\xi_1 = 0] = P[\xi_1 = 2] = \frac12$. Define $X_n = \xi_1 \cdots \xi_n$ and $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$, $n \ge 1$. Then $(X_n, \mathcal{F}_n, n \ge 1)$ is a martingale with $EX_n = 1$ for all $n \ge 1$. Hence condition (1) implies that $X_n \to X_\infty$ a.s. as $n \to \infty$ where $X_\infty$ is a r.v. with $E|X_\infty| < \infty$. Clearly we have $P[X_n = 2^n] = 2^{-n}$, $P[X_n = 0] = 1 - 2^{-n}$ and $X_\infty = 0$ a.s. However, $E|X_n - X_\infty| = EX_n = 1$. Therefore $X_n$ does not converge to $X_\infty$ in $L^1$ despite the a.s. convergence of $X_n$ to $X_\infty$.

(ii) Let $(\Omega, \mathcal{F}, P)$ be a probability space defined by $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and let $P$ be the Lebesgue measure. On this space we consider the random sequence $(X_n, n \ge 1)$ where $X_n = X_n(\omega) = 2^n$ if $\omega \in [0, 2^{-n}]$ and $X_n = X_n(\omega) = 0$ if $\omega \in (2^{-n}, 1]$ and let $\mathcal{F}_n = \sigma\{X_1, \ldots, X_n\}$. Then $(X_n, \mathcal{F}_n, n \ge 1)$ is a martingale with $EX_n = 1$ for all $n \ge 1$. Hence by (1), $X_n \to X_\infty$ a.s. as $n \to \infty$ with $X_\infty = 0$. Again, as above, $X_n$ does not converge to 0 in $L^1$ as $n \to \infty$. So, having examples (i) and (ii), we conclude that the Doob condition (1) guarantees a.s. convergence but not convergence in the $L^1$-sense. In both cases we have $E|X_n| = 1$, $n \ge 1$, which means that the corresponding martingales are not uniformly integrable.
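The lack of uniform integrability in examples (i) and (ii) can be read off from the tail expectations $E[X_n ; X_n > K]$. The following sketch uses only the common law $P[X_n = 2^n] = 2^{-n}$, $P[X_n = 0] = 1 - 2^{-n}$ stated in the text; the function names and the cut-off `n_max` are hypothetical.

```python
from fractions import Fraction

def tail_mean(n: int, K: int) -> Fraction:
    """E[X_n ; X_n > K] when X_n = 2^n with probability 2^-n and X_n = 0 otherwise."""
    value, prob = 2 ** n, Fraction(1, 2 ** n)
    return value * prob if value > K else Fraction(0)

def sup_tail_mean(K: int, n_max: int = 64) -> Fraction:
    """Supremum over 1 <= n <= n_max of E[X_n ; X_n > K]."""
    return max(tail_mean(n, K) for n in range(1, n_max + 1))

for K in (1, 10, 10 ** 6):
    print(K, sup_tail_mean(K))
```

Uniform integrability would require these suprema to tend to 0 as $K$ grows; here they stay at 1 for every level $K$ tried, in line with $E|X_n - X_\infty| = 1$.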
(iii) Let us consider this further. Recall that the martingale $X = (X_n, \mathcal{F}_n, n \ge 1)$ is said to be regular if there exists an integrable r.v. $\xi$ such that $X_n = E[\xi | \mathcal{F}_n]$ a.s. for each $n \ge 1$. Clearly, if the parameter $n$ takes only a finite number of values, say $n = 1, \ldots, N$, then such a martingale is regular since $X_n = E[X_N | \mathcal{F}_n]$. However, if $n \in \mathbb{N}$, the martingale need not be regular. Note first the following result (see Shiryaev 1995): the martingale $X$ is regular iff $X$ is uniformly integrable. In this case $X_n = E[X_\infty | \mathcal{F}_n]$ where $X_\infty := \lim_{n \to \infty} X_n$. Consider the sequence $(\xi_k, k \ge 1)$ of i.i.d. r.v.s each distributed $\mathcal{N}(0, 1)$ and let $S_n = \xi_1 + \ldots + \xi_n$, $X_n = \exp(S_n - \frac12 n)$, $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$. Then we can easily check that $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a martingale. Applying the SLLN to the sequence $(\xi_k, k \ge 1)$ we find that
$$X_\infty := \lim_{n \to \infty} X_n = \lim_{n \to \infty} \exp\left[n\left(\tfrac1n S_n - \tfrac12\right)\right] = 0 \quad \text{a.s.}$$
Therefore
$$X_n \ne E[X_\infty | \mathcal{F}_n] = 0 \quad \text{a.s.}$$
Thus we have shown that the martingale $X$ is not regular and it can be verified that it is not uniformly integrable.
22.11.
A martingale may converge in probability but not almost surely
Recall that for series of independent r.v.s the two kinds of convergence, in probability and with probability 1, are equivalent (see e.g. Loeve 1978, Ito 1984, Rao 1984). This result leads to the following question for the martingale $M = (M_n, \mathcal{F}_n, n \ge 1)$. If we know that $M_n$ converges in probability as $n \to \infty$, does this imply its convergence with probability 1? (The converse, of course, is always true.)

(i) Let $(\xi_n, n \ge 1)$ be a sequence of independent r.v.s where
$$P[\xi_n = \pm 1] = (2n)^{-1}, \quad P[\xi_n = 0] = 1 - n^{-1}, \quad n \ge 1.$$
Consider a new sequence $(X_n, n \ge 0)$ given by $X_0 = 0$ and
$$X_n = \begin{cases} \xi_n, & \text{if } X_{n-1} = 0 \\ n X_{n-1} |\xi_n|, & \text{if } X_{n-1} \ne 0, \end{cases} \quad n \ge 1.$$
Let $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$. We can easily verify that the following four statements hold:
(a) $X = (X_n, \mathcal{F}_n, n \ge 1)$ is a martingale;
(b) for each $n \ge 1$, $X_n = 0$ iff $\xi_n = 0$;
(c) $P[X_n = 0] = 1 - n^{-1}$;
(d) $P[\xi_n \ne 0 \text{ i.o.}] = 1$.
Note that statement (d) follows from the relation $\sum_{n=1}^{\infty} P[|\xi_n| = 1] = \infty$. We are interested in the behaviour of $X_n$ as $n \to \infty$. Obviously, (c) implies that $X_n \xrightarrow{P} 0$ as $n \to \infty$. However, (d) shows that $P[\omega : X_n(\omega) \text{ converges}] = 0$. Thus the martingale $X$ converges in probability but not with probability 1.
«n,n
(ii) Let ~ 1) be a sequence of i.i.d. r.v.s each taking the values ±1 with probability!. Define :1"'n a{ ~I, ... ,
=
Xn+1 = Xn(1
+
1.
It is easy to check that X = (Xn, :1"'n, n ~ 1) is a martingale. Since
we conclude that
$$\lim_{n \to \infty} P(X_n = 0) = 1, \qquad P[\omega : X_n(\omega) \text{ converges}] = 0.$$
Therefore the martingale $X$ is a.s. divergent despite the fact that it converges in probability.

(iii) The existence of martingales obeying some special properties can be proved by using the following result (see Bojdecki 1977). Let the probability space $(\Omega, \mathcal{F}, P)$ consist of $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ the Lebesgue measure. For any sequence $(\zeta_n, n \ge 1)$ of simple r.v.s ($\zeta$ is simple if it takes a finite number of values) there exists a martingale $(X_n, \mathcal{F}_n, n \ge 1)$ such that
P[
= 1.
Recall that there are sequences of simple r.v.s converging in probability but not a.s., and other sequences which are bounded but not converging. Thus in these particular cases we come to the following two statements.
(a) There exists a martingale $(X_n, \mathcal{F}_n, n \ge 1)$ such that $X_n \xrightarrow{P} 0$ as $n \to \infty$ but $P[\omega : X_n(\omega) \text{ converges to } 0] = 0$.
(b) There exists a martingale $(X_n, \mathcal{F}_n, n \ge 1)$ such that $P[\omega : (X_n, n \ge 1) \text{ is bounded}] = 1$ but $P[\omega : X_n(\omega) \text{ converges}] = 0$.
22.12.
Zero-mean martingales which are divergent with a given probability
(i) Let $(\xi_n, n \ge 1)$ be a sequence of i.i.d. r.v.s with $E\xi_1 = 0$ and $E|\xi_1| > 0$. Take another sequence $(\eta_n, n \ge 1)$ of independent r.v.s with $E\eta_n = 0$, $E[\eta_n^2] = n^{-2}$, $n \ge 1$, and consider the two series $\sum_{n=1}^{\infty} \xi_n$ and $\sum_{n=1}^{\infty} \eta_n$. According to Chung and Fuchs (1951), the series $\sum_{n=1}^{\infty} \xi_n$ diverges a.s. On the other hand, the series $\sum_{n=1}^{\infty} \eta_n$ converges by the Kolmogorov three-series theorem. Assume that $(\xi_n, n \ge 1)$ and $(\eta_n, n \ge 1)$ are independent of each other and take another r.v. $X_0$ which is independent of both sequences $(\xi_n)$ and $(\eta_n)$ and is such that $P[X_0 = 1] = p = 1 - P[X_0 = -1]$ where $p$ is any fixed number in the interval $[0,1]$. Define the new sequence $(X_n, n \ge 1)$ as:
$$X_n = \xi_n I(X_0 = 1) + \eta_n I(X_0 = -1).$$
Let $\mathcal{F}_n = \sigma\{X_1, \ldots, X_n\}$ and put $S_n = \sum_{k=1}^{n} X_k$. Then $(S_n, \mathcal{F}_n, n \ge 1)$ is a martingale with $ES_n = 0$, $n \ge 1$. The question of obvious interest is what happens to the sequence $(S_n)$ when $n \to \infty$. Since
$$S_n = I(X_0 = 1) \sum_{k=1}^{n} \xi_k + I(X_0 = -1) \sum_{k=1}^{n} \eta_k$$
it follows that
$$P[S_n \text{ converges}] = P[X_0 = -1] = 1 - p, \qquad P[S_n \text{ diverges}] = P[X_0 = 1] = p.$$
(ii) Let $(w_t, t \ge 0)$ be a standard Wiener process on $(\Omega, \mathcal{F}, P)$ which is adapted to the given filtration $(\mathcal{F}_t, t \ge 0)$ where $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F} = \bigvee_{t \ge 0} \mathcal{F}_t$. Let us take an event $A \in \mathcal{F}$ with $0 < P(A) < 1$. Define the random sequence
$$X_n = X_n(\omega) = \begin{cases} 0, & \text{if } \omega \in A \\ w_n(\omega), & \text{if } \omega \in A^c, \end{cases} \qquad n \ge 1.$$
Then $(X_n, n \ge 1)$ is a martingale with respect to the filtration $(\mathcal{F}_n, n \ge 1)$. This is a simple consequence of the martingale property of the Wiener process. Indeed, for any $n > m$ we have a.s.
Furthermore, it is well known (see Freedman 1971) that
$$P\Big[\limsup_{n \to \infty} w_n = \infty\Big] = 1, \qquad P\Big[\liminf_{n \to \infty} w_n = -\infty\Big] = 1.$$
From these relations we conclude that
$$P[\omega : X_n(\omega) \text{ converges as } n \to \infty] = P(A)$$
where, to repeat, $P(A)$ is a fixed number between 0 and 1.
22.13.
More on the convergence of martingales
Here we present three examples of martingales $X = (X_n, \mathcal{F}_n, n \ge 1)$ which satisfy the condition $\sup_{n \ge 1} |X_n| < \infty$ a.s. but have quite different behaviour as $n \to \infty$. It will be shown that $X$ may not be convergent, or convergent with a given probability (as in Example 22.12), or a.s. divergent.
(i) Let $(\xi_k, k \ge 1)$ be independent r.v.s with $P[\xi_k = 2^k - 1] = 2^{-k}$ and $P[\xi_k = -1] = 1 - 2^{-k}$. Defining $\tau = \inf\{k : \xi_k \ne -1\}$ we find that
$$P[\tau = \infty] = \prod_{k=1}^{\infty} (1 - 2^{-k}) > 0.$$
Consider the sequence $(X_n, n \ge 1)$ and the family $(\mathcal{F}_n, n \ge 1)$ defined by
$$X_n = \sum_{k=1}^{n} (-1)^k I[\tau \ge k]\, \xi_k, \qquad \mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}, \qquad n \ge 1.$$
Then $(X_n, \mathcal{F}_n, n \ge 1)$ is a martingale and for $X^* = \sup_{n \ge 1} |X_n|$ we have
$$P[X^* > 2^n] \le \sum_{k=n+1}^{\infty} (1/2^k) = 2^{-n}.$$
Hence for all $n \ge 1$, we have $2^n P[X^* > 2^n] \le 1$ and $\lambda P[X^* > \lambda] \le 2$ for arbitrary $\lambda > 0$. Thus we have shown that $X^* < \infty$ a.s. However, on the set $[\tau = \infty]$ which has positive probability, $X_n$ alternates between 1 and 0, and hence $(X_n)$ does not converge as $n \to \infty$.
(ii) Let $(\xi_n, n \ge 1)$ be independent r.v.s such that
$$P[\xi_{2n} = 1] = 1 - n^{-2} = 1 - P[\xi_{2n} = -(n^2 - 1)],$$
$$P[\xi_{2n-1} = -1] = 1 - n^{-2} = 1 - P[\xi_{2n-1} = n^2 - 1], \qquad n \ge 1.$$
Obviously $E\xi_n = 0$ for all $n \ge 1$. By the Borel-Cantelli lemma
$$P[\xi_{2n+1} + \xi_{2n} \ne 0 \text{ i.o.}] = 0 \quad \text{and} \quad P[|\xi_n| \ne 1 \text{ i.o.}] = 0.$$
Let $S_n = \xi_1 + \cdots + \xi_n$ and $\mathcal{F}_n = \sigma\{\xi_1, \ldots, \xi_n\}$. Define the stopping time
$$\tau = \inf\{n \ge m : |\xi_n| \ne 1\}$$
where $m = m(p)$ is chosen so that $P[\bigcup_{n=m}^{\infty} \{|\xi_n| \ne 1\}] < 1 - p$ for some fixed $p$, $0 < p < 1$. Finally, let $X_n = S_{\tau \wedge n}$. Then it is easy to check that $(X_n, \mathcal{F}_n, n \ge 1)$ is a martingale and $X_n = S_n I(\tau > n) + S_\tau I(\tau \le n)$.
Let us note that $S_n I(\tau > n)$ is either 0 or +1 or -1, and so for each $n \ge m$
However, $X_n = S_n$ on the set $[\tau = \infty]$ and thus $S_n$ diverges a.s. since its summands $\xi_k$ alternate between 1 and -1 for all large $n$. Therefore
$$P[X_n \text{ diverges as } n \to \infty] \ge P[\tau = \infty] > p.$$
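Returning to case (i) of this example, the three facts it rests on are easy to check numerically: $E\xi_k = 0$, the infinite product for $P[\tau = \infty]$ is strictly positive, and on $[\tau = \infty]$ the path of $X_n$ alternates between 1 and 0. The following is a minimal sketch, assuming $P[\xi_k = 2^k - 1] = 2^{-k}$ and $X_n = \sum_{k \le n} (-1)^k I[\tau \ge k] \xi_k$; the helper names are illustrative only.

```python
from fractions import Fraction


def mean_xi(k):
    """Exact mean of xi_k, taking 2^k - 1 w.p. 2^-k and -1 otherwise."""
    p = Fraction(1, 2 ** k)
    return p * (2 ** k - 1) + (1 - p) * (-1)


def prob_tau_infinite(terms=200):
    """Partial product prod_{k <= terms} (1 - 2^-k), approximating P[tau = infinity]."""
    prod = 1.0
    for k in range(1, terms + 1):
        prod *= 1.0 - 2.0 ** (-k)
    return prod


def path_on_tau_infinite(n_max):
    """Values X_1, ..., X_{n_max} on the event [tau = infinity], where every xi_k = -1."""
    x, path = 0, []
    for k in range(1, n_max + 1):
        x += (-1) ** k * (-1)
        path.append(x)
    return path
```

The product stabilises quickly (the factors approach 1 geometrically), and the path returned by `path_on_tau_infinite` never settles, which is the non-convergence claimed on a set of positive probability.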
(iii) Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}_{[0,1]}$ and $P$ be the Lebesgue measure. Consider the random sequence $(X_n, n \ge 1)$ and the family $(\mathcal{F}_n, n \ge 1)$ defined as follows:
XI = XI{w) = 1, X 2 = X2(W)
1'1
wE [0,1], 1
= { -_23 , 2'
t)
if wE [0, I ] if wE [ 2,1, if if if if
w E w E w E
wE
= {[O, 1], 0},
(1:'2 J
-
{[O , 2I] '2' [I 1] [0 1] 0} , , , ,
[0, !)
[!, t) 1'3 = [t, ~) [i, 1],
{[O,~],
[!, ~), ... , [0, 1], 0}.
Similarly we can express $X_n(\omega)$ explicitly for $n \ge 4$ as well as $\mathcal{F}_n$. Further, we can easily check that $(X_n, \mathcal{F}_n, n \ge 1)$ is a martingale on the probability space $(\Omega, \mathcal{F}, P)$. Obviously $\sup_{n \ge 1} |X_n(\omega)|$ is bounded by a fixed constant for all $\omega \in [0, 1]$. However,
$$P[\omega : X_n(\omega) \text{ converges as } n \to \infty] = 0.$$
22.14.
A uniformly integrable martingale with a nonintegrable quadratic variation
Suppose $M = (M_n, n = 0, 1, \ldots)$ is a uniformly integrable martingale. Then the series $\sum_{n \ge 1} \Delta_n M$ of the successive differences $\Delta_n M = M_n - M_{n-1}$ ($M_0 = 0$) is $L^1$-convergent. A natural question is if $L^1$-convergence also holds for all subseries $\sum_{n \ge 1} v_n \Delta_n M$ called Burkholder martingale transforms. Here $v_n \in \{0, 1\}$, $n \ge 1$. Dozzi and Imkeller (1990) have shown that the integrability of the quadratic variation $S(M) := \{\sum_{n \ge 1} (\Delta_n M)^2\}^{1/2}$ implies that all series $\sum_{n \ge 1} v_n \Delta_n M$ are $L^1$-convergent. Moreover, if $S(M)$ is not integrable, then there is a sequence $(v_n, n \ge 1)$ such that $\sum_{n \ge 1} v_n \Delta_n M$ is not integrable. Let us describe an explicit example of a uniformly integrable martingale $M$ with a nonintegrable quadratic variation $S(M)$ and construct a nonintegrable martingale transform $\sum_{n \ge 1} v_n \Delta_n M$.
Consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [1, \infty)$, $\mathcal{F}$ is the $\sigma$-field of the Lebesgue-measurable sets in $\Omega$ and $P(d\omega) = c e^{-\omega} d\omega$. Here $c = e$ is the norming constant and $P$ corresponds to a shifted exponential distribution $\mathcal{E}xp(1)$. Introduce the r.v. $M_\infty$ and the filtration $(\mathcal{F}_k, k = 1, 2, \ldots)$ as follows:
Since $M_\infty$ is integrable, the conditional expectation $E[M_\infty | \mathcal{F}_k]$ is well defined and is $\mathcal{F}_k$-measurable for each $k \ge 1$. Hence with $M_k := E[M_\infty | \mathcal{F}_k]$ we obtain the martingale $M = (M_k, \mathcal{F}_k, k \ge 1)$. Let us derive some properties of $M$. For this we use the following representation of $M$:
$$M_k(\omega) = e^{\omega} \omega^{-2} 1_{[1,k)}(\omega) + e^{k} k^{-1} 1_{[k,\infty)}(\omega), \qquad k \ge 1.$$
For $A \in \sigma([1, k))$ this is trivial, and
$$\int_{[k,\infty)} M_\infty \, dP = c \int_{[k,\infty)} e^{\omega} \omega^{-2} e^{-\omega} \, d\omega = e^{k} k^{-1} P([k, \infty)) = \int_{[k,\infty)} M_k \, dP.$$
Similar reasoning shows that $M$ is uniformly integrable. The next property of $M$ is based on the variable
$$M^*(\omega) := \sup_{k \ge 1} |M_k(\omega)| = e^{[\omega]} [\omega]^{-1}.$$
Obviously $M^*$ is not integrable, that is $M^* \notin L^1(\Omega, \mathcal{F}, P)$, and the Davis inequality (see e.g. Dellacherie and Meyer (1982) or Liptser and Shiryaev (1989)) implies that $S(M) \notin L^1(\Omega, \mathcal{F}, P)$. Thus we have described a uniformly integrable martingale whose quadratic variation is not integrable. It now remains for us to construct a sequence $(v_k, k \ge 1)$, $v_k \in \{0, 1\}$, such that the partial sums $N_n := \sum_{k=1}^{n} v_k \Delta_k M$ are a.s. convergent as $n \to \infty$ but not $L^1$-convergent. Since $M_0 := 0$, then $\Delta_1 M(\omega) = M_1(\omega) = e$ and choosing $v_k = \tfrac12 (1 + (-1)^k)$, $k \ge 1$, and using the above representation of $M$ we easily find that
$$N_{2n}(\omega) = \sum_{k=1}^{n} \left\{ \left( \frac{e^{\omega}}{\omega^2} - \frac{e^{2k-1}}{2k-1} \right) 1_{[2k-1,2k)}(\omega) + \left( \frac{e^{2k}}{2k} - \frac{e^{2k-1}}{2k-1} \right) 1_{[2k,\infty)}(\omega) \right\}.$$
This shows in particular that $N_\infty := \lim_{n \to \infty} N_n$ exists a.s. If we write explicitly $N_{2n}(\omega) 1_{[2l,2l+1)}(\omega)$ for $l \le n$ and denote $B = \bigcup_{l \ge 1} [2l, 2l+1)$, then we see by a direct calculation that
$$\int_B N_\infty \, dP = \sum_{l=1}^{\infty} \sum_{k=1}^{l} \left( \frac{e^{2k}}{2k} - \frac{e^{2k-1}}{2k-1} \right) c \int_{2l}^{2l+1} e^{-\omega} \, d\omega = \infty.$$
Therefore the martingale transform $(N_n, n \ge 1)$ is not $L^1$-convergent. It is interesting to note the case when $M_n = \sum_{k=1}^{n} X_k$, $n \ge 1$, with $X_k$ independent r.v.s, $EX_k = 0$, $k \ge 1$. Here uniform integrability of $(M_n, n \ge 1)$ implies integrability of the quadratic variation $S(M)$.
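The two computations behind this example can be reproduced with closed-form integrals. The sketch below assumes, as the representation of $M_k$ suggests, that $M_\infty(\omega) = e^{\omega}\omega^{-2}$ and $M^*(\omega) = e^{[\omega]}[\omega]^{-1}$ under $P(d\omega) = e \cdot e^{-\omega} d\omega$ on $[1, \infty)$; the function names are hypothetical.

```python
import math

C = math.e  # norming constant of P(dw) = C * exp(-w) dw on [1, infinity)


def prob_tail(k):
    """P([k, infinity)) = exp(1 - k)."""
    return C * math.exp(-k)


def integral_M_inf_tail(k):
    """Integral of M_inf = exp(w) / w^2 over [k, inf) w.r.t. P: C * int_k^inf w^-2 dw = C / k."""
    return C / k


def integral_M_k_tail(k):
    """Integral of M_k over [k, inf): M_k equals the constant exp(k) / k there."""
    return math.exp(k) / k * prob_tail(k)


def truncated_mean_M_star(n):
    """E[M*; w < n + 1], using M* = exp(j) / j on [j, j + 1)."""
    return sum(math.exp(j) / j * (prob_tail(j) - prob_tail(j + 1))
               for j in range(1, n + 1))
```

The first two functions agree for every $k$, which is the defining identity of the conditional expectation on $[k, \infty)$. The truncated mean of $M^*$ equals $(e - 1)\sum_{j \le n} 1/j$, a multiple of the harmonic series, so it grows without bound.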
SECTION 23.
CONTINUOUS-TIME MARTINGALES
Suppose we have given a complete probability space $(\Omega, \mathcal{F}, P)$ and a filtration $(\mathcal{F}_t, t \ge 0)$ which satisfies the usual conditions: $\mathcal{F}_t \subset \mathcal{F}$ for each $t$; if $s < t$, then $\mathcal{F}_s \subset \mathcal{F}_t$; $(\mathcal{F}_t)$ is right-continuous; each $\mathcal{F}_t$ contains all $P$-null sets of $\mathcal{F}$. As usual, the notation $(X_t, \mathcal{F}_t, t \ge 0)$ means that the stochastic process $(X_t, t \ge 0)$ is adapted with respect to $(\mathcal{F}_t)$, that is for each $t$, $X_t$ is $\mathcal{F}_t$-measurable. The process $X = (X_t, \mathcal{F}_t, t \ge 0)$ with $E|X_t| < \infty$ for all $t \ge 0$ is called a martingale, submartingale or supermartingale, if $s \le t$ implies respectively that $E[X_t | \mathcal{F}_s] = X_s$ a.s., $E[X_t | \mathcal{F}_s] \ge X_s$ a.s., or $E[X_t | \mathcal{F}_s] \le X_s$ a.s. We say that the martingale $M = (M_t, \mathcal{F}_t, t \ge 0)$ is an $L^p$-martingale, $p \ge 1$, if $E[|M_t|^p] < \infty$ for all $t \ge 0$. If $p = 2$ we use the term square integrable martingale.
A r.v. $T$ on $\Omega$ with values in $\mathbb{R}_+ \cup \{\infty\}$ is called a stopping time with respect to $(\mathcal{F}_t)$ (or we say that $T$ is an $(\mathcal{F}_t)$-stopping time) if for all $t \in \mathbb{R}_+$, $[T \le t] \in \mathcal{F}_t$. Let $X = (X_t, \mathcal{F}_t, t \ge 0)$ be a right-continuous process. $X$ is said to be a local martingale if there exists an increasing sequence $(T_n, n \ge 1)$ of $(\mathcal{F}_t)$-stopping times with $T_n \to \infty$ as $n \to \infty$ such that for each $n$ the process $(X_{t \wedge T_n}, \mathcal{F}_t, t \ge 0)$ is a uniformly integrable martingale. Further, $X$ is called locally square integrable if $(X_{t \wedge T_n}, \mathcal{F}_t, t \ge 0)$ are square integrable martingales, that is if for each $n$, $E[X^2_{t \wedge T_n}] < \infty$.
If $M = (M_t, \mathcal{F}_t, t \ge 0)$ is a square integrable martingale, then there exists a unique predictable increasing process denoted by $\langle M \rangle = (\langle M \rangle_t, \mathcal{F}_t, t \ge 0)$ and called a quadratic variation of $M$, such that $(M_t^2 - \langle M \rangle_t, \mathcal{F}_t, t \ge 0)$ is a martingale. Suppose $X = (X_t, \mathcal{F}_t, t \ge 0)$ is a càdlàg process (that is, $X$ is right-continuous with left-hand limits) where the filtration $(\mathcal{F}_t)$ satisfies the usual conditions, and assume for simplicity that $\mathcal{F}_{0-} = \mathcal{F}_0$, $\mathcal{F}_{\infty-} = \mathcal{F}$. The process $X$ is said to be a semimartingale if it has the following decomposition:
$$X_t = X_0 + M_t + A_t, \qquad t \ge 0,$$
where $M = (M_t, \mathcal{F}_t, t \ge 0)$ is a local martingale with $M_0 = 0$, and $A = (A_t, \mathcal{F}_t, t \ge 0)$ is a right-continuous process, $A_0 = 0$, with paths of locally finite variation. A few other notions will be introduced and analysed in the examples below.
A great number of papers and books devoted to the theory of martingales and its various applications have been published recently. For an intensive and complete presentation of the theory of martingales we refer the reader to books by Dellacherie and Meyer (1978, 1982), Jacod (1979), Metivier (1982), Elliott (1982), Durrett (1984), Kopp (1984), Jacod and Shiryaev (1987), Liptser and Shiryaev (1989), Revuz and Yor (1991) and Karatzas and Shreve (1991). For the present section we have chosen a few examples which illustrate the relationship between different but close classes of processes obeying one or another martingale-type property. In general, the examples in this section can be considered jointly with the examples in Section 22.
23.1.
Martingales which are not locally square integrable
We now introduce and study close subclasses of martingale-like processes. This makes it necessary to compare these subclasses and clarify the relationships between them. In particular, the examples below show that in general a process can be a martingale without being locally square integrable. We shall suppose that the probability space $(\Omega, \mathcal{F}, P)$ is complete and the filtration $(\mathcal{F}_t, t \ge 0)$ satisfies the usual conditions.
(i) Let us construct a uniformly integrable martingale $X = (X_t, \mathcal{F}_t, t \ge 0)$ such that for every $(\mathcal{F}_t)$-stopping time $T$, $T$ not identically zero, we have $E[X_T^2] = \infty$. Obviously such an $X$ cannot be locally square integrable. Let $\Omega = \mathbb{R}_+$, $\mathcal{F} = \mathcal{B}_+$ and $\mathcal{F}_t$ be the $\sigma$-field generated by $\tau \wedge t$ where $\tau$ is a r.v. distributed exponentially with parameter 1: $P[\tau > x] = e^{-x}$, $x \ge 0$. Moreover, $\mathcal{F}$ and $\mathcal{F}_t$ are assumed to be completed by all $P$-null sets of $\Omega$. According to Dellacherie (1970) the following two statements hold.
(a) $(\mathcal{F}_t)$ is an increasing right-continuous family of $\sigma$-fields without points of discontinuity.
(b) The r.v. $T$ is a stopping time with respect to $(\mathcal{F}_t)$ iff there exists a number $u \in \mathbb{R}_+ \cup \{\infty\}$ such that $T \ge \tau$ a.s. on the set $[\tau \le u]$ and $T = u$ a.s. on the set $[\tau > u]$.
Thus for each stopping time $T$ with $P[T = 0] < 1$ there exists $u > 0$ such that $\tau \wedge u \le T$ a.s. Consider now the r.v. $Z = \tau^{-1/2} e^{\tau/2} I_{[0 < \tau \le 1]}$. We have
$$EZ = \int_0^1 x^{-1/2} e^{x/2} e^{-x} \, dx = \int_0^1 x^{-1/2} e^{-x/2} \, dx < \infty.$$
So $Z$ is an integrable r.v. Take the process $X = (X_t, t \ge 0)$ where
Then $X$ is a right-continuous martingale which is uniformly integrable. The next step is to check whether $X$ is locally square integrable. To see this, we use the following representation found by Doleans-Dade (1971):
Further, for every $a \in (0, 1)$ we have
$$X^2_{\tau \wedge a} \ge Z^2 I_{[\tau \le \tau \wedge a]} = \tau^{-1} e^{\tau} I_{[0 < \tau \le a]}, \qquad E[X^2_{\tau \wedge a}] \ge \int_0^a x^{-1} e^{x} e^{-x} \, dx = \infty.$$
Now let $T$ be a stopping time such that $P[T = 0] < 1$ and $a \in (0, 1)$ so that $\tau \wedge a \le T$ a.s. Then the inequality $E[X_T^2] < \infty$, which is necessary for square integrability, is not possible because this would imply that $E[X^2_{\tau \wedge a}] \le E[X_T^2] < \infty$ which leads to a contradiction. Therefore the martingale $X$ is not locally square integrable.
(ii) Let the r.v. $\tau$ be the moment of the first jump of a homogeneous Poisson process $N = (N_t, t \ge 0)$ with parameter 1. Define the filtration $(\mathcal{F}_t, t \ge 0)$ where $\mathcal{F}_t = \sigma\{N_s, s \le t\}$ and the process $m = (m_t, t \ge 0)$ by
$$m_t = \tau^{-1/2} I_{[\tau \le t]} - 2\sqrt{\tau \wedge t}.$$
According to Kabanov (1974), the process $m$ has the following representation as a Stieltjes integral:
$$m_t = \int_0^t s^{-1/2} I_{[\tau \ge s]} (dN_s - ds).$$
It can be derived from here that $(m_t, \mathcal{F}_t, t \ge 0)$ is a martingale. It also obeys other properties but the question to ask is whether $m$ is locally square integrable. To answer this we again use the result of Dellacherie (1970) cited above. So, take any $(\mathcal{F}_t)$-stopping time $T$. Then $T \wedge \tau = c \wedge \tau$ for some constant $c$ and for any $c > 0$ we have $E[\tau^{-1} I_{[\tau \le c]}] = \infty$. Hence $E[m_T^2] = \infty$ and the martingale $m$ is not locally square integrable.
(iii) Let $\xi$ be a r.v. defined on $(\Omega, \mathcal{F}, P)$. Consider the process $M = (M_t, t \ge 0)$ and the filtration $(\mathcal{F}_t, t \ge 0)$ given by
$$M_t = \begin{cases} 0, & \text{if } 0 \le t < 1 \\ \xi - E\xi, & \text{if } t \ge 1, \end{cases} \qquad \mathcal{F}_t = \begin{cases} \{\emptyset, \Omega\}, & \text{if } 0 \le t < 1 \\ \sigma\{\xi\}, & \text{if } t \ge 1. \end{cases}$$
In addition, suppose that $E|\xi| < \infty$ but $E[\xi^2] = \infty$. Then it is easy to verify that $(M_t, \mathcal{F}_t, t \ge 0)$ is a martingale. Following the definition we see that this martingale, which is also a local martingale, is not locally square integrable.
23.2.
Every martingale is a weak martingale but the converse is not always true
Let $M = (M_t, \mathcal{F}_t, t \ge 0)$ be a stochastic process. We say that $M$ is a weak martingale if for each $n$ there exists a right-continuous and uniformly integrable martingale $M^n = (M_t^n, \mathcal{F}_t, t \ge 0)$ such that $M_t = M_t^n$ for $0 \le t < T_n$, where $(T_n, n \ge 1)$ is an increasing sequence of $(\mathcal{F}_t)$-stopping times with $T_n \to \infty$ as $n \to \infty$. It is convenient to say that a stopping time $T$ reduces a right-continuous process $M = (M_t, \mathcal{F}_t, t \ge 0)$ if there exists a uniformly integrable martingale $H = (H_t, \mathcal{F}_t, t \ge 0)$ such that $M_t = H_t$ for $0 \le t < T$. It is obvious from the above definition that every martingale and every local martingale are also weak martingales. This observation leads naturally to the question of whether or not the converse statement is correct. The answer is contained in the next example.
Let $\pi = (\pi_t, t \ge 0)$ be a Poisson process with parameter $\lambda > 0$, $\pi_0 = 0$ and $(\mathcal{F}_t, t \ge 0)$ be its own generated filtration: $\mathcal{F}_t = \mathcal{F}_t^{\pi} = \sigma\{\pi_s, s \le t\}$. Let $T$ be the first jump time of $\pi$ so $T$ is an exponential r.v. with parameter $\lambda$. An easy computation shows that
if $t < T$
if $t \ge T$.
This relation will help us to construct the example we require. Indeed, for a suitable probability space, consider a sequence of such independent Poisson processes $\pi^n = (\pi_t^n, t \ge 0)$, $n \ge 1$, where $\pi^n$ has parameter $\lambda_n$ and suppose that $\lambda_n \to 0$ as $n \to \infty$. Let $T_n$ be the first jump time of the process $\pi^n$. Denote by $\mathcal{F}_t$ the $\sigma$-field generated by the r.v.s $\pi_s^n$ for all $n$ and $s \le t$ and including all sets of measure zero. Thus the family $(\mathcal{F}_t, t \ge 0)$ is right-continuous. Consider the process $M = (M_t, \mathcal{F}_t, t \ge 0)$ where $M_t = t$. Using the independence of the processes $\pi^n$ we obtain analogously that
if $t < T_n$
if $t \ge T_n$.
This relation shows that $T_n$ reduces $M$. If we take, for instance, $\lambda_n = n^{-3}$ then the series $\sum_n P[T_n \le n] = \sum_n (1 - e^{-n\lambda_n})$ converges and the Borel-Cantelli lemma says that $T_n \to \infty$ a.s. as $n \to \infty$. This and a result of Kazamaki (1972a) imply that the process $M$ is a weak martingale. However, $M$ is not a martingale, which is seen immediately if we stop $M$ at a fixed time $u$. Therefore we have described an example of a continuous and bounded weak martingale which is not a martingale.
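The Borel-Cantelli step with $\lambda_n = n^{-3}$ rests on the bound $P[T_n \le n] = 1 - e^{-n\lambda_n} \le n\lambda_n = n^{-2}$. A small numerical sketch of this comparison (illustrative helper names, standard library only):

```python
import math


def prob_jump_by_n(n):
    """P[T_n <= n] = 1 - exp(-n * lambda_n) for an exponential T_n with rate lambda_n = n^-3."""
    lam = n ** -3
    return -math.expm1(-n * lam)


def partial_sum(n_max):
    """Partial sum of the series sum_n P[T_n <= n]."""
    return sum(prob_jump_by_n(n) for n in range(1, n_max + 1))
```

Each term is dominated by $n^{-2}$, so every partial sum stays below $\pi^2/6$ and the series converges; only finitely many of the events $[T_n \le n]$ occur a.s.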
23.3.
The local martingale property is not always preserved under change of time
Again, let $(\Omega, \mathcal{F}, P)$ be a complete probability space and $(\mathcal{F}_t, t \ge 0)$ a filtration satisfying the usual conditions. All martingales considered here are assumed to be $(\mathcal{F}_t)$-adapted and right-continuous. By a change of time $(\tau_t, t \ge 0)$ we mean a family of $(\mathcal{F}_t)$-stopping times $(\tau_t)$ such that for all $\omega \in \Omega$ the mapping $\tau_{\cdot}(\omega)$ is increasing and right-continuous. If $X = (X_t, \mathcal{F}_t, t \ge 0)$ is a stochastic process, denote by $\hat{X} = (X_{\tau_t}, \mathcal{F}_{\tau_t}, t \ge 0)$ the new process obtained from $X$ by a change of time. So if $X$ obeys some useful property, it is of general interest to know whether the new process $\hat{X}$ obeys the same property. In particular, if $X$ is a martingale or a weak martingale we want to know whether under some mild conditions the process $\hat{X}$ is a martingale or a weak martingale respectively (see Kazamaki 1972a, b). Thus we come to the question: does a change of time preserve the local martingale property?
Let $M = (M_t, \mathcal{F}_t, t \ge 0)$, $M_0 = 0$, be a continuous martingale with
$$P\Big[\limsup_{t \to \infty} M_t = \infty\Big] = 1.$$
In particular, we can choose $M$ to be a standard Wiener process $w$. The r.v. $\tau_t$ defined by $\tau_t = \inf\{u : M_u > t\}$ is a finite $(\mathcal{F}_t)$-stopping time. Clearly, $\tau_0 = 0$ and $\tau_\infty = \infty$ a.s. It is easy to see that the change of time $(\tau_t, t \ge 0)$ satisfies the relation $M_{\tau_t} = t$ which is a consequence of the continuity of $M$. However, the process $\hat{M} = (t, \mathcal{F}_{\tau_t}, t \ge 0)$ is not a local martingale. Therefore in general the local martingale property is not invariant under a change of time. Dellacherie and Meyer (1982) give very general results on semimartingales when the semimartingale property is preserved under a change of time.
23.4.
A uniformly integrable supermartingale which does not belong to class (D)
Let $X = (X_t, t \in \mathbb{R}_+)$ be a measurable process. We say that $X$ is bounded in $L^1$ with respect to a given filtration $(\mathcal{F}_t, t \in \mathbb{R}_+)$ if the number
$$\|X\|_1 = \sup_T E[|X_T| I_{[T < \infty]}],$$
where sup is taken over all $(\mathcal{F}_t)$-stopping times $T$, is finite. If, moreover, all the r.v.s XTI[T
on $[0, \infty]$. Moreover, for every sequence $(t_n)$ of elements of $[0, \infty]$ converging to $t \in [0, \infty]$ we have $X_{t_n} \to X_t$ in $L^1$. So the mapping $t \mapsto X_t$ of $[0, \infty]$ into the space $L^1$ is continuous and since $[0, \infty]$ is compact, the r.v.s $X_t$, $t \in [0, \infty]$, are uniformly integrable (see Dellacherie and Meyer 1978). Therefore the process $X$ is a uniformly integrable supermartingale which is even continuous. It remains for us to check if $X$ belongs to class (D). For this purpose we use the following result (see Johnson and Helms 1963). Let $Z$ be a positive right-continuous supermartingale and let
$$T_n = T_n(\omega) = \inf\{t : Z_t(\omega) \ge n\}.$$
Then $Z$ belongs to class (D) iff $\lim_{n \to \infty} E[Z_{T_n} I_{[T_n < \infty]}] = 0$.
$$P[T_n < \infty] = \begin{cases} 1, & \text{if } |x| \le 1/n \\ 1/(n|x|), & \text{if } |x| \ge 1/n. \end{cases}$$
Hence $nP[T_n < \infty] = 1/|x|$ for sufficiently large $n$, so $nP[T_n < \infty]$ does not tend to 0 as $n \to \infty$ and according to the result of Johnson and Helms (1963) quoted above, the process $X$ does not belong to class (D).
23.5.
LP-bounded local martingale which is not a true martingale
Recall that the process $M = (M_t, \mathcal{F}_t, t \ge 0)$ is called an $L^p$-martingale, $p \ge 1$, iff it is a martingale and $M_t \in L^p$ for each $t \ge 0$. If $\sup_t E[|M_t|^p] < \infty$ we say that $M$ is $L^p$-bounded. For simplicity, let $M_0 = 0$. For $p \in [1, \infty)$, $M$ is called a local $L^p$-martingale if there is a sequence $(T_n, n \ge 1)$ of $(\mathcal{F}_t)$-stopping times such that $T_n \uparrow \infty$ as $n \to \infty$ and for each $n$ the process $M^n = (M_{t \wedge T_n}, \mathcal{F}_t, t \ge 0)$ is an $L^p$-martingale. In Example 23.1 we established that there are martingales and local martingales which are not locally square integrable. Similarly, we shall show below that an $L^p$-bounded local martingale need not be a true martingale.
(i) Let $w = (w_t, t \ge 0)$ be a standard Wiener process in $\mathbb{R}^3$. Let $h : \mathbb{R}^3 \setminus \{0\} \to \mathbb{R}^1$ be a function defined by $h(x) = |x|^{-1}$ for $x \in \mathbb{R}^3 \setminus \{0\}$ and let $T_n = \inf\{t > 0 : |w_t| < n^{-1}\}$. Then $(T_n, n \ge 1)$ is an increasing sequence of $(\mathcal{F}_t)$-stopping times, $\mathcal{F}_t = \mathcal{F}_t^w$, with $T_n \to \infty$ a.s. as $n \to \infty$. The function $h$ is harmonic in the domain $\mathbb{R}^3 \setminus \{0\}$ which obviously contains the domain $D_n = \{x : |x| > n^{-1}\}$ for each $n \ge 1$. Define a function $g_n$ on the closure $\bar{D}_n$ of $D_n$ by $g_n(x) = E_x[h(w_{T_n})]$, $x \in \bar{D}_n$, where $E_x$ denotes the expectation given $w_0 = x$ a.s. Since $w$ is a strong Markov process (see Dynkin 1965; Freedman 1971; Wentzell 1981) with spherical symmetry, $g_n$ possesses the mean-value property that its average value over the surface of any sufficiently small ball about $x \in D_n$ equals its value at $x$ (see Dynkin 1965). This implies that $g_n$ is a harmonic function in $D_n$ and it can be shown that $g_n$ is continuous in $\bar{D}_n$ with boundary values equal to those of the function $h$. By the maximum principle for harmonic functions we conclude that $g_n = h$ in $D_n$ for all $n$. Moreover, for $n \ge 1$, $x \in D_n$ and each fixed $t$ we have the following relation:
The strong Markov property of $w$ gives
on the set $[T_n > t]$. So if we combine these two relations and take into account that $g_n = h$ in $D_n$ we have the equality
Recall now that the initial state of the Wiener process is $w_0 \ne (0, 0, 0)$ and let $w_0 = x_0$. Then for all sufficiently large $n$ we have $x_0 \in D_n$. Thus we conclude that the process $(h(w_{t \wedge T_n}), t \ge 0)$ is a bounded martingale. This implies that $(h(w_t), t \ge 0)$ is a local martingale. So it remains for us to clarify whether this local martingale is a true martingale. We have $E_{x_0}[h(w_0)] = |x_0|^{-1}$ and we want to find $E_{x_0}[h(w_t)]$. If $t > 0$ and $c > 2|x_0|$ then
$$E_{x_0}[h(w_t)] = (2\pi t)^{-3/2} \int_{\mathbb{R}^3} |y|^{-1} \exp(-|y - x_0|^2/(2t)) \, dy$$
$$< (2\pi t)^{-3/2} \left\{ \int_{|y| \le c} |y|^{-1} \, dy + \int_{|y| > c} |y|^{-1} \exp(-|y|^2/(8t)) \, dy \right\} \le (2\pi t)^{-3/2} c_1 c^2 + c_2 c^{-1}.$$
Here $c_1 > 0$ and $c_2 > 0$ are constants not depending on $c$ and $t$. Obviously, if $t \to \infty$, $c \to \infty$ and $c t^{-3/4} \to 0$ then $E_{x_0}[h(w_t)] \to 0$. Hence for all sufficiently large $t$ we obtain
$$E_{x_0}[h(w_t)] < E_{x_0}[h(w_0)].$$
This relation means that the process $(h(w_t), t \ge 0)$ is not a true martingale despite the fact that it is a local martingale. A calculation similar to the one above shows that $h(w_t) \in L^2$ for each $t$ and also $\sup_t E[h^2(w_t)] < \infty$. Therefore $(h(w_t), t \ge 0)$ is an $L^2$-bounded local martingale although, let us repeat, it is not a true martingale. It would be useful for the reader to compare this case with Example 23.4.
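The decay of $E_{x_0}[h(w_t)]$ in case (i) can also be made explicit. The sketch below is not part of the argument above: it uses the classical shell (Newton) formula, by which the average of $|x - y|^{-1}$ over a sphere $|y| = s$ equals $1/\max(|x|, s)$, and this gives the closed form $E_{x_0}[|w_t|^{-1}] = \operatorname{erf}(|x_0|/\sqrt{2t})/|x_0|$. A midpoint quadrature of the radial integral is included as a cross-check; all function names are hypothetical.

```python
import math


def expected_h_closed(r0, t):
    """Closed form of E_{x0}[1 / |w_t|] in R^3 for |x0| = r0: erf(r0 / sqrt(2 t)) / r0."""
    return math.erf(r0 / math.sqrt(2.0 * t)) / r0


def expected_h_quadrature(r0, t, steps=100000, rho_max=10.0):
    """Midpoint rule for int f(rho) / max(r0, sqrt(t) * rho) d rho,
    where f is the density of the norm of a standard Gaussian vector in R^3."""
    sigma = math.sqrt(t)
    d = rho_max / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * d
        density = math.sqrt(2.0 / math.pi) * rho * rho * math.exp(-rho * rho / 2.0)
        total += density / max(r0, sigma * rho) * d
    return total
```

Since $\operatorname{erf}(z) < 1$ for every finite $z$, the mean is strictly below $h(x_0) = 1/|x_0|$ for each $t > 0$ and tends to 0 as $t \to \infty$, in agreement with the conclusion that $(h(w_t))$ is not a true martingale.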
(ii) Let us briefly consider another interesting example. Let $X = (X_t, t \ge 0)$ be a Bessel process of order $l$, $l > 2$. Recall that $X$ is a continuous Markov process whose infinitesimal operator on the space of twice differentiable functions has the form
$$\frac{1}{2} \frac{d^2}{dx^2} + \frac{l - 1}{2x} \frac{d}{dx}.$$
Note that if $l$ is integer, $X$ is identical in law with the process $(|w(t)|, t \ge 0)$ where $|w(t)| = (w_1^2(t) + \cdots + w_l^2(t))^{1/2}$ and $((w_1(t), \ldots, w_l(t)), t \ge 0)$ is a standard Wiener process in $\mathbb{R}^l$ (see Dynkin 1965; Rogers and Williams 1994). Suppose $X$ starts from a point $x > 0$, that is $X_0 = x$ a.s., and consider the process $M = (M_t, t \ge 0)$ where $M_t = 1/X_t^{l-2}$. If $\mathcal{F}_t = \sigma\{X_s, s \le t\}$ then it can be shown that $(M_t, \mathcal{F}_t, t \ge 0)$ is a local continuous martingale which, however, is not a martingale because $EM_t$ vanishes when $t \to \infty$ (compare with case (i)). On the other hand, $E[M_t^p] < \infty$ for any $p$ such that $p < l/(l - 2)$. Thus, if $l$ is close to 2, $p$ is 'big enough' and we have a continuous local martingale which is 'sufficiently' integrable in the sense that $M$ belongs to the space $L^p$ for 'sufficiently' large $p$; despite this fact, the process $M$ is not a true martingale.
23.6.
A sufficient but not necessary condition for a process to be a local martingale
We shall start by considering the following. Let $X = (X_t, \mathcal{F}_t, t \ge 0)$ be a càdlàg process with $X_0 = 0$ and $A = (A_t, \mathcal{F}_t, t \ge 0)$ be a continuous increasing process such that $A_0 = 0$. Assume that for $\lambda \in \mathbb{R}^1$ the process $Z^\lambda = (Z_t^\lambda, \mathcal{F}_t, t \ge 0)$ defined by
$$Z_t^\lambda = \exp(\lambda X_t - \tfrac12 \lambda^2 A_t)$$
is a local martingale. Then $X$ is a continuous local martingale and $A = \langle X \rangle$. Here $A = \langle X \rangle$ is the unique predictable process of finite variation such that $X^2 - \langle X \rangle$ is a martingale. (For details see Dellacherie and Meyer (1982) or Metivier (1982).) This result is due to M. Yor and is presented here in a form suggested by C. Stricker. It can also be found in a paper by Meyer and Zheng (1984). Now we shall show that the continuity of $A$ and the condition $A_0 = 0$ are essential for the validity of this result.
Let $X^n = (X_t^n, \mathcal{F}_t, t \ge 0)$ be a sequence of centred Gaussian martingales such that $X^n$ has the following increasing process $A^n = (A_t^n, \mathcal{F}_t, t \ge 0)$:
$$A_t^n = \begin{cases} 0, & \text{if } t \le c \\ 1, & \text{if } t \ge c + 1/n \\ \text{linear in between,} \end{cases} \qquad c = \text{constant}.$$
We now consider the limiting case as $n \to \infty$. Referring the reader to a paper by Meyer and Zheng (1984) for details, we get $A_t^n \to A_t$ and $X_t^n \to X_t$ weakly in the space $\mathbb{D}$ where $A_t = I_{[t \ge c]}$ and $X_t = \xi I_{[t \ge c]}$ with $\xi$ a r.v. distributed $N(0, 1)$. It is not difficult to check that for each $\lambda$ the process $Z^\lambda = (Z_t^\lambda, \mathcal{F}_t, t \ge 0)$, where $Z_t^\lambda = \exp(\lambda X_t - \tfrac12 \lambda^2 A_t)$, is a martingale. However, neither $A$ nor $X$ is continuous. Moreover, if $c = 0$, the property $X_0 = 0$ a.s. no longer holds.
23.7.
A square integrable martingale with a non-random characteristic need not be a process with independent increments
Let $X = (X_t, \mathcal{F}_t, t \ge 0)$ be a square integrable martingale defined on the complete probability space $(\Omega, \mathcal{F}, P)$ where the filtration $(\mathcal{F}_t, t \ge 0)$ satisfies the usual conditions. The well known Levy theorem asserts that if $X$ is continuous and its characteristic $\langle X \rangle$ is deterministic, then $X$ is a Gaussian process with independent increments (see Grigelionis 1977; Jacod 1979). Our purpose now is to answer the following question. Is it true that any square integrable martingale $X$ with a non-random characteristic $\langle X \rangle$ is a process with independent increments?
Let $\Omega = [0, 1]$, $P$ be the Lebesgue measure and the $\sigma$-field $\mathcal{F}$ be generated by the following three r.v.s $\eta_0$, $\eta_1$ and $\eta_2$, where $\eta_0 = 0$ for all $\omega \in \Omega$,
$$\eta_1 = \begin{cases} -1, & \text{if } \omega \in [0, \tfrac12) \\ 1, & \text{if } \omega \in [\tfrac12, 1], \end{cases} \qquad \eta_2 = \begin{cases} -2, & \text{if } \omega \in [0, \tfrac14) \\ 0, & \text{if } \omega \in [\tfrac14, \tfrac12) \\ 1 - \sqrt{3/2}, & \text{if } \omega \in [\tfrac12, \tfrac23) \\ 1, & \text{if } \omega \in [\tfrac23, \tfrac56) \\ 1 + \sqrt{3/2}, & \text{if } \omega \in [\tfrac56, 1]. \end{cases}$$
Denote by $\mathcal{G}_i$ the $\sigma$-field generated by the r.v. $\eta_i$, $i = 0, 1, 2$, and fix the points $s_0 = 0$, $s_1 = 1$, $s_2 = 2$, $s_3 = \infty$. Consider the stochastic process $X = (X_t, t \ge 0)$ defined by
$$X_t = \sum_{k=0}^{2} \eta_k I_{[s_k, s_{k+1})}(t), \qquad t \ge 0$$
and introduce the family $(\mathcal{F}_t, t \ge 0)$ of increasing and right-continuous sub-$\sigma$-fields of $\mathcal{F}$ where $\mathcal{F}_t = \mathcal{G}_k$ for $t \in [s_k, s_{k+1})$, $k = 0, 1, 2$. It is easy to check that $X = (X_t, \mathcal{F}_t, t \ge 0)$ is a martingale (and is bounded). Moreover, its characteristic $\langle X \rangle$ can be found explicitly, namely:
$$\langle X \rangle_t = \begin{cases} 0, & \text{if } 0 \le t < 1 \\ 1, & \text{if } 1 \le t < 2 \\ 2, & \text{if } t \ge 2. \end{cases}$$
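The martingale property and the values of the characteristic can be checked on the five atoms of the probability space. The sketch below assumes one reading of the definition: $\eta_1 = \mp 1$ on the two halves of $[0,1]$ and $\eta_2$ taking the values $-2, 0$ on two quarters and $1 - \sqrt{3/2}, 1, 1 + \sqrt{3/2}$ on three sixths. These values are an assumption chosen so that the conditional means and variances match the stated characteristic, and the names are illustrative.

```python
import math
from fractions import Fraction

A = math.sqrt(1.5)

# Atoms of the space as (probability, eta_1, eta_2); values are an assumed reading.
ATOMS = [
    (Fraction(1, 4), -1, -2.0),
    (Fraction(1, 4), -1, 0.0),
    (Fraction(1, 6), 1, 1.0 - A),
    (Fraction(1, 6), 1, 1.0),
    (Fraction(1, 6), 1, 1.0 + A),
]


def conditional_moments(eta1_value):
    """Return (E[eta_2 | eta_1], E[(eta_2 - eta_1)^2 | eta_1]) on the event eta_1 = eta1_value."""
    selected = [(float(p), e2) for p, e1, e2 in ATOMS if e1 == eta1_value]
    mass = sum(p for p, _ in selected)
    mean = sum(p * e2 for p, e2 in selected) / mass
    second = sum(p * (e2 - eta1_value) ** 2 for p, e2 in selected) / mass
    return mean, second


def joint_vs_product():
    """P[eta_1 = 1, eta_2 - eta_1 = 0] compared with the product of the marginals."""
    joint = sum(p for p, e1, e2 in ATOMS if e1 == 1 and e2 - e1 == 0)
    p_first = sum(p for p, e1, _ in ATOMS if e1 == 1)
    p_second = sum(p for p, e1, e2 in ATOMS if e2 - e1 == 0)
    return joint, p_first * p_second
```

Under these assumptions both conditional variances of the second increment equal 1, so the characteristic is deterministic, while the joint probability $1/6$ differs from the product $1/12$, so the two increments are dependent.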
Obviously the characteristic (X) is non-random. Further, the relations
imply that the increments of the process $X$ are not independent. Therefore we have constructed a square integrable martingale whose characteristic $\langle X \rangle$ is non-random, but this does not imply that $X$ has independent increments. It may be noted that here the process $X$ varies only by jumps while in the Levy theorem $X$ is supposed to be continuous. Thus the continuity condition is essential for this result. A correct generalization of the Levy theorem to arbitrary square integrable martingales (not necessarily continuous) was given by Grigelionis (1977). (See also the books of Liptser and Shiryaev 1989 or Jacod and Shiryaev 1987.)
23.8.
The time-reversal of a semimartingale can fail to be a semimartingale
Let $w = (w_t, t \ge 0)$ be a standard Wiener process in $\mathbb{R}^1$. Take some measurable function $h$ which maps the space $C[0, 1]$ one-one to the interval $[0, 1]$. Define the r.v. $\tau = \tau(\omega) = h(\{w_s(\omega), 0 \le s \le 1\})$ and the process $X = (X_t, t \ge 0)$ where
$$X_t = \begin{cases} w_t, & \text{if } 0 \le t < 1 \\ w_1, & \text{if } 1 \le t \le 1 + \tau \\ w_{t - \tau}, & \text{if } t \ge 1 + \tau. \end{cases}$$
Thus $X$ is a Wiener process with a flat spot of length $\tau \le 1$ interpolated from $t = 1$ to $t = 1 + \tau$. Since $\tau$ is measurable with respect to the $\sigma$-field $\sigma\{X_s, s \le 1\}$, it is easy to see that $X$ is a martingale (and hence a semimartingale). Now we shall reverse the process $X$ from the time $t = 2$. Let
$$\tilde{X}_t = X_{2-t} \quad \text{for } 0 \le t \le 2.$$
Denote by $(\tilde{\mathcal{F}}_t)$ the natural filtration of $\tilde{X}$. Note that the variable $\tau$ is $\tilde{\mathcal{F}}_1$-measurable, hence so is $\{\tilde{X}_t, 1 \le t \le 2\}$, since it is the time-reversal of $h^{-1}(\tau)$. Thus $\tilde{\mathcal{F}}_t = \tilde{\mathcal{F}}_1$ for $1 < t < 2$. This means that any martingale with respect to the filtration $(\tilde{\mathcal{F}}_t)$ will be constant on the interval $(1, 2)$ and any semimartingale will have a finite variation there. However, the Wiener process $w$ has an infinite variation on each interval and therefore $\tilde{X}$ has an infinite variation on the interval $(1, 2)$. Hence $\tilde{X}$, which was defined as the time-reversal of $X$, is not a semimartingale relative to its own generated filtration $(\tilde{\mathcal{F}}_t)$. According to a result by Stricker (1977), the process $\tilde{X}$ cannot be a semimartingale with respect to any other filtration.
23.9.
Functions of semimartingales which are not semimartingales
Let $X = (X_t, \mathcal{F}_t, t \ge 0)$ be a semimartingale on the complete probability space $(\Omega, \mathcal{F}, P)$ where the family of $\sigma$-fields $(\mathcal{F}_t, t \ge 0)$ satisfies the usual conditions. The following result is well known and often used. If $f(x)$, $x \in \mathbb{R}^1$, is a function of the space $C^2(\mathbb{R}^1)$ or $f$ is a difference of two convex functions, then the process $Y = (Y_t, \mathcal{F}_t, t \ge 0)$ where $Y_t = f(X_t)$ is again a semimartingale (see Dellacherie and Meyer 1982). In general, it is not surprising that for some 'bad' functions $f$ the process $Y = f(X)$ fails to be a semimartingale. However, it would be useful to have at least one particular example of this kind.
Take the function $f(x) = |x|^\alpha$, $0 < \alpha < 1$. Consider the process $Y = f(X)$, that is $Y = |X|^\alpha$, and try to clarify whether $Y$ is a semimartingale. In order to do this we need the following result (see Yor 1978): if $X$ is a continuous local martingale, $X_0 = 0$ a.s., then statements (a) and (b) below are equivalent: (a) $X \equiv 0$; (b) the local time of $X$ at 0 is $L^0 \equiv 0$. Let us suppose that the process $Y = |X|^\alpha$ is a semimartingale. Then applying the Ito formula (see Dellacherie and Meyer 1982; Elliott 1982; Metivier 1982; Chung and Williams 1990) for $\beta = 1/\alpha > 1$ we obtain
and
$$L_t^0 = \int_0^t 1_{[X_s = 0]} \, d|X_s| = \int_0^t 1_{[Y_s = 0]} \, d(Y^\beta)_s = 0, \qquad t \ge 0.$$
Thus by the above result we can conclude that $X \equiv 0$. This contradiction shows that the process $Y = |X|^\alpha$ is not a semimartingale. The following particular case of this example is fairly well known. If $w$ is the standard Wiener process, then $|w|^\alpha$, $0 < \alpha < \tfrac12$, is not a semimartingale (see Protter 1990). Other useful facts concerning the semimartingale properties of functions of semimartingales can be found in the books by Yor (1978), Liptser and Shiryaev (1989), Protter (1990), Revuz and Yor (1991), Karatzas and Shreve (1991) and Yor (1992, 1996).
23.10.
Gaussian processes which are not semimartingales
One of the 'best' representatives of Gaussian processes is the Wiener process which is also a martingale, and hence a semimartingale. Since any Gaussian process is square integrable, it seems natural to ask the following questions. What is the relationship between the Gaussian and semimartingale properties of a stochastic process? In particular, is any Gaussian process a semimartingale? Our aim now is to construct a family {x(a)} of Gaussian processes depending on a parameter Q such that for some Q, x(a) is a semimartingale, while for other 0:, it is not. Indeed, consider the function
278
COUNTEREXAMPLES IN PROBABILITY
It can be shown that for each $\alpha \in [1,2]$ the function $K^{(\alpha)}$ is positive definite. This implies (see Doob 1953; Ash and Gardner 1975) that for each $\alpha \in [1,2]$ there exists a Gaussian process, say $X^{(\alpha)} = (X_t^{(\alpha)}, t \in \mathbb{R}_+)$, such that $EX_t^{(\alpha)} = 0$ and its covariance function is $E[X_s^{(\alpha)} X_t^{(\alpha)}] = K^{(\alpha)}(s,t)$, $s, t \in \mathbb{R}_+$. The next step is to verify whether or not the process $X^{(\alpha)}$ is a semimartingale (with respect to its natural filtration). It is easy to see that for $\alpha = 1$ we have $K^{(1)}(s,t) = \min\{s,t\}$. This fact and the continuity of any of the processes $X^{(\alpha)}$ imply that $X^{(1)}$ is the standard Wiener process. Further, if $\alpha = 2$ we obtain that $X_t^{(2)} = t\xi$ where $\xi$ is a r.v. distributed $N(0,1)$. Therefore in these two particular cases, $\alpha = 1$ and $\alpha = 2$, the corresponding Gaussian processes $X^{(1)}$ and $X^{(2)}$ are semimartingales. To determine what happens if $1 < \alpha < 2$ we need the following result of A. Butov (see Liptser and Shiryaev 1989). Suppose $X = (X_t, t \ge 0)$ is a Gaussian process with zero mean and covariance function $r(s,t)$, $s, t \ge 0$, and conditions (a) and (b) below are satisfied.
(a) There does not exist a non-negative and non-decreasing function $F$ of bounded variation such that $(r(t,t) + r(s,s) - 2r(s,t))^{1/2} \le F(t) - F(s)$, $s < t$.
(b) For any interval $[0,T] \subset \mathbb{R}_+$ and any partition $0 = t_0 < t_1 < \cdots < t_n = T$ with $\max_k (t_{k+1} - t_k) \to 0$ we have $\sum_{k=0}^{n-1} (X_{t_{k+1}} - X_{t_k})^2 \to 0$ as $n \to \infty$.
Then the process $X$ is not a semimartingale. Now let us check conditions (a) and (b) for the process $X^{(\alpha)}$. We have
$$\left(K^{(\alpha)}(t,t) + K^{(\alpha)}(s,s) - 2K^{(\alpha)}(s,t)\right)^{1/2} = |t-s|^{\alpha/2}.$$
However, the function $|t-s|^{\alpha/2}$ with $1 < \alpha < 2$ cannot be dominated by a difference $F(t) - F(s)$ for some non-negative and non-decreasing $F$ of bounded variation: summing such a bound over a uniform partition of $[0,T]$ into $n$ pieces would give $n^{1-\alpha/2} T^{\alpha/2} \le F(T) - F(0)$ for every $n$, which is impossible since $\alpha/2 < 1$. So condition (a) is satisfied. Furthermore, for $t > s$ we can easily calculate that $E[(X_t^{(\alpha)} - X_s^{(\alpha)})^2] = |t-s|^{\alpha}$. It follows that
$$E\Big[\sum_{k=0}^{n-1} \big(X_{t_{k+1}}^{(\alpha)} - X_{t_k}^{(\alpha)}\big)^2\Big] = \sum_{k=0}^{n-1} (t_{k+1} - t_k)^{\alpha} \le T \max_k (t_{k+1} - t_k)^{\alpha - 1} \to 0,$$
which implies the validity of condition (b). Thus the Gaussian process $X^{(\alpha)}$ is not a semimartingale if $1 < \alpha < 2$. Therefore we have constructed the family $\{X^{(\alpha)}, 1 \le \alpha \le 2\}$ of Gaussian (indeed, continuous) processes such that some members of this family, those for $\alpha = 1$ and $\alpha = 2$, are semimartingales, while others, when $1 < \alpha < 2$, are not semimartingales. Consider another interpretation of the above case. Recall that a fractional standard Brownian motion $B_H = (B_H(t), t \in \mathbb{R}^1)$ with scaling parameter $H$, $0 < H < 1$, is a Gaussian process with zero mean, $B_H(0) = 0$ a.s. and covariance function
$$r(s,t) = \tfrac{1}{2}\left(|s|^{2H} + |t|^{2H} - |s-t|^{2H}\right)$$
(compare $r(s,t)$ with $K^{(\alpha)}(s,t)$ above) (see Mandelbrot and Van Ness 1968). Hence for any $H$, $1/2 < H < 1$, the fractional Brownian motion $B_H$ is not a semimartingale. A very interesting general problem (posed as far as we know by A. N. Shiryaev) is to characterize the class of Gaussian processes which are also semimartingales. Useful results on this topic can be found in papers by Emery (1982), Jain and Monrad (1982), Stricker (1983), Enchev (1984, 1988) and Galchuk (1985) and the book by Liptser and Shiryaev (1989).
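The two facts used for the family $X^{(\alpha)}$ depend only on the covariance, so they can be checked numerically. The sketch below assumes the covariance $K^{(\alpha)}(s,t) = \frac12(s^{\alpha} + t^{\alpha} - |s-t|^{\alpha})$; the function names are illustrative only. It evaluates the increment variance and the expected sum of squared increments over a uniform partition, which stays equal to $T$ for $\alpha = 1$ and tends to 0 for $1 < \alpha < 2$.

```python
def K(alpha, s, t):
    """Covariance K^(alpha)(s, t) = (s^a + t^a - |s - t|^a) / 2 for s, t >= 0."""
    return 0.5 * (s**alpha + t**alpha - abs(s - t)**alpha)


def increment_variance(alpha, s, t):
    """E[(X_t - X_s)^2] computed from the covariance alone."""
    return K(alpha, t, t) + K(alpha, s, s) - 2.0 * K(alpha, s, t)


def expected_squared_increments(alpha, n, T=1.0):
    """Expected sum of squared increments over a uniform n-piece partition of [0, T]."""
    h = T / n
    return sum(increment_variance(alpha, k * h, (k + 1) * h) for k in range(n))


if __name__ == "__main__":
    # alpha = 1 (Wiener case) keeps the sum at T; alpha = 1.5 drives it to 0.
    for n in (10, 100, 1000):
        print(n, expected_squared_increments(1.0, n), expected_squared_increments(1.5, n))
```

For a uniform partition the sum equals $n (T/n)^{\alpha} = T^{\alpha} n^{1-\alpha}$, which is the quantity appearing in the check of condition (b).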
23.11.
On the possibility of representing a martingale as a stochastic integral with respect to another martingale
(i) Let the process $X = (X_t, t \in [0,T])$ be a martingale relative to its own generated filtration $(\mathcal{F}_t^X, t \in [0,T])$. Suppose $M = (M_t, t \in [0,T])$ is another process which is a martingale with respect to $(\mathcal{F}_t^X)$. The question is whether $M$ can be represented as a stochastic integral with respect to $X$, that is whether there exists a 'suitable' function $(\varphi_s, s \in [0,T])$ such that $M_t = \int_0^t \varphi_s\,dX_s$. One reason for asking this question is that there is an important case when the answer is positive, e.g. when $X$ is a standard Wiener process (see Clark 1970; Liptser and Shiryaev 1977/78; Dudley 1977). The following example shows that in some cases the answer to the above question is negative. Consider two independent Wiener processes, say $w = (w_t, t \ge 0)$ and $v = (v_t, t \ge 0)$. Let $X_t = \int_0^t w_s\,dv_s$ and $\mathcal{F}_t^X = \sigma\{X_s, s \le t\}$. Then $\langle X\rangle_t$ is $\mathcal{F}_t^X$-measurable and since $\langle X\rangle_t = \int_0^t w_s^2\,ds$ it follows that $w_t^2$ is $\mathcal{F}_t^X$-measurable. Hence the process $M = (M_t, t \ge 0)$ where $M_t = w_t^2 - t$ is an $L^2$-martingale with respect to the filtration $(\mathcal{F}_t^X, t \ge 0)$. Suppose now that the martingale $M$ can be represented as a stochastic integral with respect to $X$: that is, for some predictable function $(H_s(\omega), s \ge 0)$ with $E[\int_0^{\infty} H_s^2\,d\langle X\rangle_s] < \infty$ we have $M_t = \int_0^t H_s\,dX_s$, $t \ge 0$. Since by the Ito formula we have $M_t = 2\int_0^t w_s\,dw_s$, it follows that
$$M_t = 2\int_0^t w_s\,dw_s = \int_0^t H_s\,dX_s = \int_0^t H_s w_s\,dv_s.$$
These relations imply that
$$0 = E\Big\{\Big[2\int_0^t w_s\,dw_s - \int_0^t H_s w_s\,dv_s\Big]^2\Big\} = 4E\Big[\int_0^t w_s^2\,ds\Big] + E\Big[\int_0^t H_s^2 w_s^2\,ds\Big]$$
which of course is not possible. Therefore the martingale $M = (M_t, \mathcal{F}_t^X, t \ge 0)$ cannot be represented as a stochastic integral with respect to the martingale $X$.
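The second-moment identity in part (i) rests on the Ito isometry and on the independence of $w$ and $v$. The following is a short sketch of that step; the bracket notation $\langle w, v\rangle$ for the covariation and the explicit lower bound in the last line are additions made here for illustration, not part of the original argument.

```latex
\begin{align*}
E\Big[\Big(2\int_0^t w_s\,dw_s - \int_0^t H_s w_s\,dv_s\Big)^2\Big]
 &= 4E\Big[\int_0^t w_s^2\,ds\Big]
   - 4E\Big[\int_0^t H_s w_s^2\,d\langle w, v\rangle_s\Big]
   + E\Big[\int_0^t H_s^2 w_s^2\,ds\Big] \\
 &= 4E\Big[\int_0^t w_s^2\,ds\Big] + E\Big[\int_0^t H_s^2 w_s^2\,ds\Big]
   \qquad (\langle w, v\rangle \equiv 0) \\
 &\ge 4\int_0^t s\,ds = 2t^2 > 0 .
\end{align*}
```

Since the left-hand side would have to vanish if the representation held, the strictly positive lower bound $2t^2$ gives the contradiction.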
(ii) Let $X$ be a r.v. which is measurable with respect to the $\sigma$-field $\mathcal{F}_1^w$ generated by the Wiener process $w$ in the interval $[0,1]$. Clearly in this case $X$ is a functional of the Wiener process and it is natural to expect that $X$ has some representation through $w$. The following useful result can be found in the book by Liptser and Shiryaev (1977/78). Let the r.v. $X$ be square integrable, that is $E[X^2] < \infty$. Suppose additionally that the r.v. $X$ and the Wiener process $w = (w(t), t \in [0,1])$ form a Gaussian system. Then there exists a deterministic measurable function $g(t)$, $t \in [0,1]$, with $\int_0^1 g^2(t)\,dt < \infty$ such that
$$X = EX + \int_0^1 g(t)\,dw(t). \qquad (1)$$
We now want to show that the conditions ensuring the validity of this result cannot be weakened. In particular, we cannot remove the condition that $(X, w = (w(t), t \in [0,1]))$ is a Gaussian system. Indeed, consider the process
$$X_t = \int_0^t h(w(s))\,dw(s), \quad t \in [0,1],$$
where $h(x) = 1$ if $x \ge 0$ and $h(x) = -1$ if $x < 0$. It is easy to check that $(X_t, t \in [0,1])$ is a Wiener process. Therefore the r.v. $X = X_1$ is a Gaussian and $\mathcal{F}_1^w$-measurable r.v. However, $X$ cannot be represented in the form given by (1) with a deterministic function $g$.
SECTION 24.
POISSON PROCESS AND WIENER PROCESS
The Poisson process and the Wiener process play important roles in the theory of stochastic processes, similar to the roles of the Poisson and the normal distributions in the theory of probability. In previous sections we considered the Poisson and the Wiener processes in order to illustrate some basic properties of stochastic processes. Here we shall analyse other properties of these two processes, but for convenience let us give the corresponding definitions again. We say that $w = (w_t, t \ge 0)$ is a standard Wiener process if: (i) $w_0 = 0$ a.s.; (ii) any increment $w_t - w_s$ where $s < t$ is distributed normally, $N(0, t-s)$; (iii) for each $n \ge 3$ and any $0 \le t_1 < t_2 < \cdots < t_n$ the increments $w_{t_2} - w_{t_1}, w_{t_3} - w_{t_2}, \ldots, w_{t_n} - w_{t_{n-1}}$ are independent. The process $N = (N_t, t \ge 0)$ is said to be a (homogeneous) Poisson process with parameter $\lambda$, $\lambda > 0$, if: (i) $N_0 = 0$ a.s.; (ii) any increment $N_t - N_s$ where $s < t$ has a Poisson distribution with parameter $\lambda(t-s)$; (iii) for each $n \ge 3$ and any $0 \le t_1 < t_2 < \cdots < t_n$ the increments $N_{t_2} - N_{t_1}, N_{t_3} - N_{t_2}, \ldots, N_{t_n} - N_{t_{n-1}}$ are independent. Note that the processes $w$ and $N$ can also be defined in different but equivalent ways. In particular we can consider the non-standard Wiener process, the Wiener process with drift, the non-homogeneous Poisson process, etc. Another possibility is to give the martingale characterization of each of these processes. The reader can find numerous important and interesting results concerning the Wiener and Poisson processes in the books by Freedman (1971), Yeh (1973), Cinlar (1975), Liptser and Shiryaev (1977/78), Wentzell (1981), Chung (1982), Durrett (1984), Kopp (1984), Protter (1990), Karatzas and Shreve (1991), Revuz and Yor (1991), Yor (1992, 1996) and Rogers and Williams (1994).
24.1.
On some elementary properties of the Poisson process and the Wiener process
(i) Take the standard Wiener process $w$ and the Poisson process $N$ with parameter 1, and let $\tilde{N} = (\tilde{N}_t, t \ge 0)$ where $\tilde{N}_t = N_t - t$ is the so-called centred Poisson process. It is easy to calculate their covariance functions:
$$C_w(s,t) = \min\{s,t\}, \qquad C_{\tilde{N}}(s,t) = \min\{s,t\}, \qquad s, t \ge 0.$$
Therefore these two quite different processes have the same covariance functions. Further, if we denote $\mathcal{F}_t^w = \sigma\{w_s, s \le t\}$ and $\mathcal{F}_t^{\tilde{N}} = \sigma\{\tilde{N}_s, s \le t\}$ then each of the processes $(w_t, \mathcal{F}_t^w, t \ge 0)$ and $(\tilde{N}_t, \mathcal{F}_t^{\tilde{N}}, t \ge 0)$ is a square integrable martingale. Recall that for every square integrable martingale $M = (M_t, \mathcal{F}_t, t \ge 0)$ we can find a unique process denoted by $\langle M\rangle = (\langle M\rangle_t, t \ge 0)$ and called a quadratic variation process, such that $M^2 - \langle M\rangle$ is a martingale with respect to $(\mathcal{F}_t)$ (see Dellacherie and Meyer 1982; Elliott 1982; Metivier 1982; Liptser and Shiryaev 1977/78, 1989). In our case we easily see that
$$\langle w\rangle_t = t \quad \text{and} \quad \langle \tilde{N}\rangle_t = t.$$
Again, two very different square integrable martingales have the same quadratic variation processes. Obviously, in both cases $\langle w\rangle$ and $\langle \tilde{N}\rangle$ are deterministic functions (indeed, continuous), the processes $w$ and $\tilde{N}$ have independent increments, $w$ is a.s. continuous, while almost all trajectories of $\tilde{N}$ are discontinuous (increasing stepwise functions, left- or right-continuous, with unit jumps only). Therefore, neither the covariance function nor the quadratic variation characterizes the processes $w$ and $\tilde{N}$ uniquely. (ii) The above reasoning can be extended. Take the function
$$C(s,t) = e^{-2\lambda|s-t|}, \qquad s, t \ge 0, \quad \lambda = \text{constant} > 0.$$
It can be checked that $C(s,t)$ is positive definite and hence there exists a Gaussian stationary process with zero-mean function and covariance function equal to $C$. We shall now construct two stationary processes, say $X$ and $Y$, each with a covariance function $C$; moreover, $X$ will be defined by the Wiener process $w$ and $Y$ by the Poisson process $N$ with parameter $\lambda$.
Consider the process $X = (X_t, t \ge 0)$ where
$$X_t = \sqrt{\alpha}\, e^{-\beta t}\, w(e^{2\beta t}).$$
Here $\alpha > 0$ and $\beta > 0$ are fixed constants. This process $X$ is called the Ornstein-Uhlenbeck process with parameters $\alpha$ and $\beta$. It is easy to conclude that $X$ is a continuous stationary Gaussian and Markov process with $EX_t = 0$, $t \ge 0$ and covariance function $C_X(s,t) = \alpha e^{-\beta|s-t|}$. So, if we take $\alpha = 1$ and $\beta = 2\lambda$, we obtain $C_X(s,t) = e^{-2\lambda|s-t|}$ (the function given at the beginning). Further, let $Y = (Y_t, t \ge 0)$ be a process defined by
$$Y_t = Y_0 (-1)^{N_t},$$
where $Y_0$ is a r.v. taking two values, 1 and $-1$, with probability $1/2$ each and $Y_0$ does not depend on $N$. The process $Y$, called a random telegraph signal, is a stationary process with $EY_t = 0$, $t \ge 0$ and covariance function $C_Y(s,t) = e^{-2\lambda|s-t|}$. Obviously, $Y$ takes only two values, 1 and $-1$; it is not continuous and is not Gaussian. Thus using the processes $w$ and $N$ we have constructed in very different ways two new processes, $X$ and $Y$, which have the same covariance functions. (iii) Here we look at other functionals of the processes $w$ and $N$. Consider the processes $U = (U_t, t \ge 0)$ and $V = (V_t, t \ge 0)$ defined by
$$U_t = \int_0^t X_s\,ds, \qquad V_t = v(N_t),$$
where $X$ is the Ornstein-Uhlenbeck process introduced above and the function $v$ is such that $v(2n) = v(2n+1) = n$, $n = 0, 1, \ldots$. What can we say about the processes $U$ and $V$? Obviously, $U$ is Gaussian because it is derived from the Gaussian process $X$ by a linear operation. Direct calculation shows that $EU_t = 0$, $t \ge 0$ and (for $\alpha = 1$)
$$C_U(s,t) = \frac{2}{\beta}\min\{s,t\} + \frac{1}{\beta^2}\left[e^{-\beta\min\{s,t\}} + e^{-\beta\max\{s,t\}} - e^{-\beta|s-t|} - 1\right].$$
Clearly, if we take $\beta = 2$, then for large $s$, $t$ and $|s-t|$ we have $C_U(s,t) \approx \min\{s,t\}$. So we can say that, asymptotically, the process $U$ has the same covariance structure as the Wiener process. Both processes are continuous but some of their other properties are very different. In particular, $U$ is not Markov and is not a stationary process. Consider now the process $V$. Does this process obey the properties of the original process $N$? From the definition it follows that $V$ is a counting process which, however, only counts the arrivals $t_2, t_4, t_6, \ldots$ from $N$. Further, for $0 < t - h < t < t + h$ we have
which means that $V$ is not a process with stationary increments. Moreover, it is easy to establish that the increments of $V$ are not independent. Finally, from the relations
$$\lim_{r \to 0} P[V_{t+h} = 1 \mid V_t = 1, V_{t-r} = 0] = P[N_h = 0 \text{ or } 1]$$
and
$$P[N_h = 0 \text{ or } 1] > P[V_{t+h} = 1 \mid V_t = 1]$$
we conclude that $V$ is not a Markov process. Thus the process $V$, obtained as a function of the Poisson process $N$, does not obey at least three of the essential properties of $N$. Actually, this is not so surprising, since $v$ as defined above is not a one-one function.
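Two of the claims in this example reduce to sums over the Poisson law and can be checked exactly. The sketch below assumes the telegraph signal has the form $Y_t = Y_0(-1)^{N_t}$, so that $E[Y_sY_{s+\tau}] = E[(-1)^{N_\tau}]$, and uses $V_t = v(N_t)$ with $v(2n) = v(2n+1) = n$; all function names are illustrative.

```python
import math


def poisson_pmf(k, mean):
    """P[N = k] for a Poisson r.v. with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def telegraph_covariance(lam, tau, terms=80):
    """E[Y_s Y_{s+tau}] = E[(-1)^{N_tau}], summed as a series over the Poisson law."""
    return sum((-1)**k * poisson_pmf(k, lam * tau) for k in range(terms))


def v_conditional(lam, t, h):
    """P[V_{t+h} = 1 | V_t = 1], where V_t = 1 means that N_t is 2 or 3."""
    p2, p3 = poisson_pmf(2, lam * t), poisson_pmf(3, lam * t)
    q0, q1 = poisson_pmf(0, lam * h), poisson_pmf(1, lam * h)
    # From N_t = 2 at most one further arrival is allowed, from N_t = 3 none.
    return (p2 * (q0 + q1) + p3 * q0) / (p2 + p3)


def n_h_zero_or_one(lam, h):
    """P[N_h = 0 or 1]."""
    return poisson_pmf(0, lam * h) + poisson_pmf(1, lam * h)


if __name__ == "__main__":
    lam, tau = 1.5, 0.4
    print(telegraph_covariance(lam, tau), math.exp(-2 * lam * tau))
    print(v_conditional(1.0, 2.0, 0.5), n_h_zero_or_one(1.0, 0.5))
```

The first pair of numbers agrees with $e^{-2\lambda\tau}$, and the second shows the strict inequality used to conclude that $V$ is not Markov.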
24.2.
Can the Poisson process be characterized by only one of its properties?
Recall that we can construct a Poisson process in the interval $[0,1]$ by choosing the number of points according to a Poisson distribution and then distributing them independently of each other and uniformly on $[0,1]$ (see Doob 1953). We now consider an example of a point process, say $S$, on the interval $[0,1]$ such that the number of points in any subinterval has a Poisson distribution with given parameter $\lambda$, but the numbers of points in disjoint subintervals are not independent. Obviously such a process cannot be a Poisson process. Fix $\lambda > 0$, choose a number $n$ with probability $e^{-\lambda}\lambda^n/n!$, $n = 0, 1, \ldots$, and define the d.f. $F_n$ of the $n$ points $t_1, \ldots, t_n$ of $S$ as follows. If $n \ne 3$ let
$$F_n(x_1, \ldots, x_n) = x_1 x_2 \cdots x_n$$
and if $n = 3$ let
$$F_3(x_1, x_2, x_3) = x_1x_2x_3 + \varepsilon x_1x_2x_3 (x_1 - x_2)^2 (x_1 - x_3)^2 (x_2 - x_3)^2 (1 - x_1)(1 - x_2)(1 - x_3). \qquad (1)$$
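The property of $F_3$ that drives this example is that its extra term vanishes whenever every argument is taken from a set of the form $\{a, b, 1\}$: either some argument equals 1, or two arguments coincide. A minimal check, with `eps` standing for the small constant in (1) and all names chosen here for illustration:

```python
from itertools import product


def perturbation(x1, x2, x3):
    """The extra term of F_3 without its small constant factor."""
    return (x1 * x2 * x3
            * (x1 - x2)**2 * (x1 - x3)**2 * (x2 - x3)**2
            * (1 - x1) * (1 - x2) * (1 - x3))


def F3(x1, x2, x3, eps):
    """Perturbed d.f. of three points as in (1)."""
    return x1 * x2 * x3 + eps * perturbation(x1, x2, x3)


def max_deviation_on_grid(a, b, eps):
    """Largest |F_3 - x1*x2*x3| over arguments taken from {a, b, 1}."""
    return max(abs(F3(*xs, eps) - xs[0] * xs[1] * xs[2])
               for xs in product((a, b, 1.0), repeat=3))


if __name__ == "__main__":
    print(max_deviation_on_grid(0.2, 0.7, 0.01))   # no deviation on the grid
    print(perturbation(0.2, 0.5, 0.7))             # nonzero off the grid
```

On the grid $F_3$ coincides with the uniform d.f., while at a generic point such as $(0.2, 0.5, 0.7)$ the perturbation is nonzero, so the three points are not independent and uniform.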
It is easy to see that for sufficiently small $\varepsilon > 0$, $F_3$ is a d.f. Moreover, it is obvious that the process $S$ described by the family of d.f.s $\{F_n\}$ is not a Poisson process. Thus it remains for us to show that the number of points of $S$ in any subinterval of $[0,1]$ has a Poisson distribution. For positive integers $m < n$ and $(a,b) \subset [0,1]$ we have
$$\begin{aligned}
G_{m,n}(a,b) &= P_n[\text{exactly } m \text{ of } t_1, \ldots, t_n \in (a,b)] \\
&= \binom{n}{m} P_n[t_1, \ldots, t_m \in (a,b),\ t_{m+1}, \ldots, t_n \notin (a,b)] \\
&= \binom{n}{m} E_n\Big[\prod_{j=1}^{m} \big(\chi_b(t_j) - \chi_a(t_j)\big) \prod_{j=m+1}^{n} \big(\chi_a(t_j) + \chi_1(t_j) - \chi_b(t_j)\big)\Big]
\end{aligned} \qquad (2)$$
where $\chi_a(t) = 1$ if $t < a$ and $\chi_a(t) = 0$ if $t \ge a$. Moreover, since
$$E_n[\chi_{a_1}(t_1) \cdots \chi_{a_n}(t_n)] = F_n(a_1, \ldots, a_n),$$
then in (2) only terms of the form $F_n(a_1, \ldots, a_n)$ appear where for each $i$, $a_i$ is equal either to $a$, or to $b$, or to 1. Hence if
$$F_n(a_1, \ldots, a_n) = a_1 a_2 \cdots a_n \qquad (3)$$
for all such values of $a_1, \ldots, a_n$, then $G_{m,n}(a,b)$ in (2) will be the same as in the Poisson case. For $n \ne 3$ this follows from the choice of $F_n$ as a uniform d.f. For $n = 3$, relation (3) follows from (1) and the remark before (3). The final conclusion is that the Poisson process cannot be characterized by only one of its properties even if this is the most important property.
24.3.
The conditions under which a process is a Poisson process cannot be weakened
Let $\nu$ be a point process on $\mathbb{R}^1$, $\nu(I)$ denote the number of points which fall into the interval $I$ and $|I|$ be the length of $I$. Recall that the stationary Poisson process can be characterized by the following two properties.
(A) For any interval $I$, $P[\nu(I) = k] = \dfrac{(\lambda|I|)^k}{k!} e^{-\lambda|I|}$, $k = 0, 1, 2, \ldots$.
(B) For any number $n$ of disjoint intervals $I_1, \ldots, I_n$, the r.v.s $\nu(I_1), \ldots, \nu(I_n)$ are independent, $n = 2, 3, \ldots$.
In Example 24.2 we have seen that condition (B) cannot be removed if the Poisson property of the process is to be preserved. Suppose now that condition (A) is satisfied but (B) is replaced by another condition which is weaker, namely:
(B$_2$) for any two disjoint intervals $I_1$ and $I_2$ the r.v.s $\nu(I_1)$ and $\nu(I_2)$ are independent.
Thus we come to the following question (posed in a similar form by A. Renyi): do conditions (A) and (B$_2$) imply that $\nu$ is a Poisson process? The construction below shows that in general the answer is negative. For our purpose it is sufficient to construct the process $\nu$ in the interval $[0,1]$. Let $\nu$ be a Poisson process with parameter $\lambda$ with respect to a given probability measure $P$. The idea is to introduce another measure, denoted by $\tilde{P}$, with respect to which the process $\nu$ will not be Poisson. Define the unconditional and conditional probabilities of $\nu$ with respect to $\tilde{P}$ by the relations
$$\tilde{P}[\nu([0,1]) = k] = P[\nu([0,1]) = k] = \frac{\lambda^k}{k!} e^{-\lambda}, \qquad k = 0, 1, 2, \ldots, \qquad (1)$$
$$\tilde{P}[\,\cdot \mid \nu([0,1]) = k] = P[\,\cdot \mid \nu([0,1]) = k], \qquad \text{if } k \ne 5. \qquad (2)$$
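If $\nu([0,1]) = k$ and we take a random permutation of the $k$ points of a Poisson process which fall into $[0,1]$, then the distribution of the $k$-dimensional vector obtained is the same as the distribution of a vector whose components are independent and uniformly distributed in $[0,1]$, that is its conditional d.f. $F_k$ given $\nu([0,1]) = k$ has the form
$$F_k(x_1, \ldots, x_k) = x_1 x_2 \cdots x_k, \qquad 0 \le x_j \le 1.$$
From (2) it follows that for $k \ne 5$ the d.f. $\tilde{F}_k$ of $\nu$ about $\tilde{P}$ satisfies the relations
$$\tilde{F}_k(x_1, \ldots, x_k) = F_k(x_1, \ldots, x_k) = x_1 x_2 \cdots x_k. \qquad (3)$$
For $k = 5$ and $0 \le x_j \le 1$, $j = 1, \ldots, 5$ we define $\tilde{F}_5$ as follows:
$$\tilde{F}_5(x_1, \ldots, x_5) = F_5(x_1, \ldots, x_5) + H(x_1, \ldots, x_5), \qquad (4)$$
where the perturbation $H$ carries the small factor $\varepsilon$ and a product over $1 \le i < j \le 5$ of differences of the arguments $x_i$ and $x_j$.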
It is easy to check that for $\varepsilon$ positive and sufficiently small the mixed partial derivative $(\partial^5/\partial x_1 \cdots \partial x_5)\tilde{F}_5(x_1, \ldots, x_5)$ is a probability density function and thus $\tilde{F}_5$ is a d.f. It is obvious that our process $\nu$, and also the measure $\tilde{P}$, are determined by (1), (2), (3) and (4). Moreover, (4) means that $\nu$ is not a Poisson process. Clearly it remains for us to verify that the probability measure $\tilde{P}$ satisfies conditions (A) and (B$_2$). These conditions are satisfied for the measure $P$, so it is sufficient to prove that for disjoint intervals $I_1$ and $I_2$ we have
$$\tilde{P}[\nu(I_1) = k_1, \nu(I_2) = k_2] = P[\nu(I_1) = k_1, \nu(I_2) = k_2].$$
By the definition of $\tilde{P}$ we see that we need to establish the relation
$$\tilde{P}[\nu(I_1) = k_1, \nu(I_2) = k_2 \mid \nu([0,1]) = 5] = P[\nu(I_1) = k_1, \nu(I_2) = k_2 \mid \nu([0,1]) = 5]. \qquad (5)$$
The probability in the left-hand side of (5) is a finite sum of the form $\sum(\pm\tilde{F}_5(a_1, \ldots, a_5))$ where each $a_j$ is either 0, or 1, or the endpoint of one of the intervals $I_1$ or $I_2$. So the difference between the two sides of (5) is equal to $\sum(\pm H(a_1, \ldots, a_5))$. Obviously, each term in this sum is 0. This is clear if 0 or 1 occurs among the $a$s; if not, then at least two of the $a$s are the same, so $H$ vanishes again. Therefore the measure $\tilde{P}$ satisfies conditions (A) and (B$_2$). This means that we have described a process $\nu$ which obeys the properties (A) and (B$_2$), but nevertheless $\nu$ is not a Poisson process. Condition (B$_2$) can be replaced by a slightly stronger condition of the same type, (B$_M$), which includes the mutual independence of the r.v.s $\nu(I_1), \ldots, \nu(I_M)$ for any $M$ disjoint intervals $I_1, \ldots, I_M$ where, let us emphasise, $M$ is finite. The conclusion in this case is the same as for $M = 2$.
24.4.
Two dependent Poisson processes whose sum is still a Poisson process
Let $X = (X(t), t \ge 0)$ and $Y = (Y(t), t \ge 0)$ be Poisson processes with given parameters. It is well known and easy to check that if $X$ and $Y$ are independent then their sum $X + Y = (X(t) + Y(t), t \ge 0)$ is also a Poisson process. Let us now consider the converse question. $X$ and $Y$ are Poisson processes and we know that their sum $X + Y$ is also a Poisson process. Does it follow that $X$ and $Y$ are independent? The example below shows that in general the answer is negative. Let $g(x,y) = e^{-x-y}$ for $x \ge 0$ and $y \ge 0$, and $g(x,y) = 0$ otherwise. So $g$ is the density of a pair of independent exponentially distributed r.v.s each of rate 1. We introduce the function
$$f_1(x,y) = \begin{cases}
\alpha, & \text{if } (0 \le x < 1,\ 3 \le y < 4) \text{ or } (1 \le x < 2,\ 2 \le y < 3) \\
 & \text{or } (2 \le x < 3,\ 0 \le y < 1) \text{ or } (3 \le x < 4,\ 1 \le y < 2), \\
-\alpha, & \text{if } (0 \le x < 1,\ 2 \le y < 3) \text{ or } (1 \le x < 2,\ 3 \le y < 4) \\
 & \text{or } (2 \le x < 3,\ 1 \le y < 2) \text{ or } (3 \le x < 4,\ 0 \le y < 1), \\
0, & \text{otherwise},
\end{cases}$$
where $\alpha = \text{constant}$, $0 < \alpha < e^{-6}$, and define $f(x,y) = g(x,y) + f_1(x,y)$, $(x,y) \in \mathbb{R}^2$.
It is easy to check that: (a) $f$ is a density of some d.f. on $\mathbb{R}^2$; (b) the marginals of $f$ are exponential of rate 1; (c) for each non-negative measurable function $h$ on $\mathbb{R}^2$ such that $h(x,y) = h(y,x)$ the following equality holds:
$$\int_{\mathbb{R}^2} f(x,y)h(x,y)\,dx\,dy = \int_{\mathbb{R}^2} g(x,y)h(x,y)\,dx\,dy. \qquad (1)$$
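Properties (a), (b) and (c) come down to three facts about the sign pattern of $f_1$ on the unit cells of $[0,4)^2$: every row and every column of cells sums to zero, the pattern is antisymmetric under swapping $x$ and $y$, and $\alpha < e^{-6}$ keeps $f$ non-negative. The sketch below encodes the cells as integer pairs; the particular value of `ALPHA` and the helper names are choices made here for illustration.

```python
import math

ALPHA = 0.5 * math.exp(-6)  # any constant with 0 < ALPHA < exp(-6)

# Unit cells [i, i+1) x [j, j+1) on which f_1 equals +ALPHA or -ALPHA.
PLUS = {(0, 3), (1, 2), (2, 0), (3, 1)}
MINUS = {(0, 2), (1, 3), (2, 1), (3, 0)}


def cell_sign(i, j):
    """+1, -1 or 0 according to the sign of f_1 on cell (i, j)."""
    return (1 if (i, j) in PLUS else 0) - (1 if (i, j) in MINUS else 0)


def f1(x, y):
    if x < 0 or y < 0:
        return 0.0
    return ALPHA * cell_sign(math.floor(x), math.floor(y))


def g(x, y):
    """Density of two independent exponential r.v.s of rate 1."""
    return math.exp(-x - y) if x >= 0 and y >= 0 else 0.0


def f(x, y):
    return g(x, y) + f1(x, y)


if __name__ == "__main__":
    rows = [sum(cell_sign(i, j) for j in range(4)) for i in range(4)]
    cols = [sum(cell_sign(i, j) for i in range(4)) for j in range(4)]
    print(rows, cols)                      # all zeros: marginals unchanged
    print(f(1.999, 3.999) > 0)             # smallest g on a negative cell
    print(f1(2.5, 0.5) == ALPHA)           # the cell {2<=x<3, 0<=y<1}
```

The cell $\{2 \le x < 3,\ 0 \le y < 1\}$ carries the value $+\alpha$; it corresponds to the event $\{X(2) = 0, X(3) \ge 1, Y(1) \ge 1\}$ expressed through the first coordinates $U_1$, $V_1$.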
Now let $\Omega = \mathbb{R}^2 \times \mathbb{R}^2 \times \cdots$ be the infinite and countable product of the space $\mathbb{R}^2$ with itself such that $\Omega = (\mathbb{R}^2)^{\mathbb{N}}$. Define $W_n(\omega) = (U_n(\omega), V_n(\omega))$, $n \ge 1$, as the $n$th coordinate of $\omega \in \Omega$. Let $\mathcal{A}$ be the $\sigma$-field generated by the coordinates. We shall provide $(\Omega, \mathcal{A})$ with two different probability measures, say $P$ and $Q$, as follows: (a) $P$ is a measure for which $\{W_n, n \ge 1\}$ is a sequence of independent r.v.s, $W_1$ has density $f$ and each $W_n$, $n \ge 2$, has density $g$; (b) $Q$ is a measure for which $\{W_n, n \ge 1\}$ is a sequence of i.i.d. r.v.s each having the same density $g$. Put $U_0 = V_0 = 0$ and define the processes $X$, $Y$ and $Z = X + Y$ where:
$$X(t) = n \ \text{ if } \ \sum_{k=0}^{n} U_k \le t < \sum_{k=0}^{n+1} U_k, \qquad Y(t) = n \ \text{ if } \ \sum_{k=0}^{n} V_k \le t < \sum_{k=0}^{n+1} V_k, \qquad Z(t) = X(t) + Y(t).$$
$\{U_n, n \ge 1\}$ is a sequence of independent exponential r.v.s of rate 1 with respect to each of the measures $P$ and $Q$. The same holds for the sequence $\{V_n, n \ge 1\}$. Hence $X$ and $Y$ are Poisson processes with respect to both $P$ and $Q$. Moreover, $X$ and $Y$ are independent for $Q$ which implies that $Z$ is a Poisson process for $Q$. The next step is to show that $X$ and $Y$ are not independent for $P$. This will follow from the relation
$$P[X(2) = 0, X(3) \ge 1, Y(1) \ge 1] = P[X(2) = 0, X(3) \ge 1]\,P[Y(1) \ge 1] + \alpha.$$
Now let $B \subset \Omega$ and $\tilde{B}$ be the set of all points $((x_1, y_1), \ldots, (x_n, y_n), \ldots)$ such that $((y_1, x_1), \ldots, (y_n, x_n), \ldots) \in B$. Using relation (1) we can prove that $P(B) = Q(B)$ for any measurable subset $B \subset \Omega$ such that $\tilde{B} = B$ (for details see Jacod 1975). It remains for us to show that $Z = (Z(t), t \ge 0)$ is a Poisson process for the measure $P$. Note first that each event $B$ which depends only on the process $Z$ (this means that $B$ belongs to the $\sigma$-field generated by $Z$) satisfies the equality $\tilde{B} = B$. Since $Z$ is a Poisson process for the measure $Q$, $Z$ must also be a Poisson process for the measure $P$. More precisely, if $s_1 \le t_1 \le \cdots \le s_n \le t_n$ we see from the above reasoning that
$$P\Big[\bigcap_{k=1}^{n} \{Z(t_k) - Z(s_k) = n_k\}\Big] = Q\Big[\bigcap_{k=1}^{n} \{Z(t_k) - Z(s_k) = n_k\}\Big] = \prod_{k=1}^{n} \exp\{-2(t_k - s_k)\}\,[2(t_k - s_k)]^{n_k} / n_k! \,.$$
Obviously this relation illustrates the fact that $Z$ is a Poisson process with respect to the probability measure $P$. Note that the present example is in some sense an analogue to Example 12.3 where we considered an interesting property of dependent Poisson r.v.s.
24.5.
Multidimensional Gaussian processes which are close to the Wiener process
Recall that $w = ((w_1(t), \ldots, w_n(t)), t \ge 0)$ is said to be an $n$-dimensional standard Wiener process if each component $w_j = (w_j(t), t \ge 0)$, $j = 1, \ldots, n$, is a one-dimensional standard Wiener process and $w_1, \ldots, w_n$ are independent. Further, the linear combinations
$$Y(t) = \sum_{j=1}^{n} \lambda_j w_j(t), \qquad t \ge 0, \quad \lambda_j \in \mathbb{R}^1,$$
are often called the projections of $w$ and it is very easy to see that
$$E[Y(s)Y(t)] = \Big(\sum_{j=1}^{n} \lambda_j^2\Big) \min\{s,t\}. \qquad (1)$$
Suppose now $X = ((X_1(t), \ldots, X_n(t)), t \ge 0)$ is a Gaussian process whose projections $Z(t) = \sum_{j=1}^{n} \lambda_j X_j(t)$, $\lambda_j \in \mathbb{R}^1$, satisfy the relation
$$E[Z(s)Z(t)] = \Big(\sum_{j=1}^{n} \lambda_j^2\Big) \min\{s,t\}. \qquad (2)$$
Comparing (2) and (1) we see that in some sense the projections of the process $X$ behave like those of the Wiener process $w$. Since in (2) and (1), $\lambda_1, \ldots, \lambda_n$ are arbitrary numbers in $\mathbb{R}^1$, and $s$ and $t$ are also arbitrary in $\mathbb{R}_+$, we could conjecture that $X$ is a standard Wiener process in $\mathbb{R}^n$. However, the example below shows that in general this is not the case. To see this, consider for simplicity the case $n = 2$. Take two independent Wiener processes, $w_1 = (w_1(t), t \ge 0)$ and $w_2 = (w_2(t), t \ge 0)$, and define the process $X = ((X_1(t), X_2(t)), t \ge 0)$ by
$$X_1(t) = w_1(\tfrac{2}{3}t) + w_2(\tfrac{1}{3}t), \qquad X_2(t) = w_1(\tfrac{1}{3}t) - w_2(\tfrac{2}{3}t).$$
Then $w = ((w_1(t), w_2(t)), t \ge 0)$ is a standard Wiener process in $\mathbb{R}^2$ and for any $\lambda_1, \lambda_2 \in \mathbb{R}^1$ the projections $Y(t) = \lambda_1 w_1(t) + \lambda_2 w_2(t)$ satisfy (1). Further, if we take the same $\lambda_1$, $\lambda_2$ we can easily show that the projections $Z(t) = \lambda_1 X_1(t) + \lambda_2 X_2(t)$ of the Gaussian process $X$ satisfy (2). However, this coincidence of the covariances of the projections of $X$ and $w$ does not imply that $X$ is a standard Wiener process in $\mathbb{R}^2$. It is enough to note that the components $X_1$ and $X_2$ of $X$ are not independent. Note that the Gaussian process $X$ with property (2) will be a standard Wiener process in $\mathbb{R}^n$ if we impose some other conditions (see Hardin 1985).
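Both claims about $X$ follow from the covariance $\min\{u,v\}$ of the two driving Wiener processes, so they can be verified without simulation. The sketch below assumes the time scalings $2/3$ and $1/3$ in the definition of $X_1$ and $X_2$ (any pair $A + B = 1$ with $A \ne B$ behaves the same way); the function names are illustrative.

```python
def cov_bm(u, v):
    """Covariance min{u, v} of a standard Wiener process."""
    return min(u, v)


A, B = 2.0 / 3.0, 1.0 / 3.0  # assumed time scalings, A + B = 1


def cov_X(i, j, s, t):
    """E[X_i(s) X_j(t)] for X_1(t) = w1(A t) + w2(B t), X_2(t) = w1(B t) - w2(A t)."""
    if i == j:
        # Both components have covariance A*min + B*min = min{s, t}.
        return cov_bm(A * s, A * t) + cov_bm(B * s, B * t)
    if (i, j) == (1, 2):
        return cov_bm(A * s, B * t) - cov_bm(B * s, A * t)
    return cov_X(1, 2, t, s)  # (i, j) = (2, 1)


def cov_projection(l1, l2, s, t):
    """E[Z(s) Z(t)] for the projection Z = l1 X_1 + l2 X_2."""
    return (l1 * l1 * cov_X(1, 1, s, t) + l2 * l2 * cov_X(2, 2, s, t)
            + l1 * l2 * (cov_X(1, 2, s, t) + cov_X(2, 1, s, t)))


if __name__ == "__main__":
    print(cov_projection(1.3, -0.4, 1.0, 2.0), (1.3**2 + 0.4**2) * 1.0)
    print(cov_X(1, 2, 1.0, 2.0))  # nonzero: X_1 and X_2 are dependent
```

The two cross-covariances $E[X_1(s)X_2(t)]$ and $E[X_2(s)X_1(t)]$ cancel inside every projection, yet each is nonzero for $s \ne t$, which is why relation (2) holds although the components are dependent.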
24.6.
On the Wald identities for the Wiener process
Let $w = (w(t), t \ge 0)$ be a standard Wiener process and $\tau$ be an $(\mathcal{F}_t^w)$-stopping time. The following three relations
$$E w(\tau) = 0, \qquad (1)$$
$$E[w^2(\tau)] = E\tau, \qquad (2)$$
$$E[\exp(w(\tau) - \tfrac{1}{2}\tau)] = 1 \qquad (3)$$
are called the Wald identities for the Wiener process. Let us introduce three conditions, namely
$$E[\tau^{1/2}] < \infty, \qquad (1^*)$$
$$E\tau < \infty, \qquad (2^*)$$
$$E[\exp(\tfrac{1}{2}\tau)] < \infty. \qquad (3^*)$$
Note that (1*), (2*) and (3*) are sufficient conditions for the validity of (1), (2) and (3) respectively (see Burkholder and Gundy 1970; Novikov 1972; Liptser and Shiryaev 1977/78).
Our purpose here is to analyse these conditions. In particular, to clarify what happens to (1), (2) and (3) when changing (1*), (2*) and (3*). Firstly, take the stopping time $\tau_1 = \inf\{t \ge 0 : w(t) \ge 1\}$. By the continuity of the Wiener process $w$ we have $w(\tau_1) = 1$ and hence $Ew(\tau_1) = 1$ but not 0 as in (1). However, the r.v. $\tau_1$ has density $(2\pi t^3)^{-1/2}\exp(-1/(2t))$, $t > 0$, and it is easy to check that $E[\tau_1^{\delta}] < \infty$ for all $0 < \delta < 1/2$ but $E[\tau_1^{1/2}] = \infty$, so (1*) is violated. Obviously, identity (2) is also not satisfied because $E[w^2(\tau_1)] = 1 \ne E\tau_1 = \infty$. Regarding the identity (1) we can go further. Among many other results Novikov (1983) proved the following statement. Let $f(t)$, $t \ge 0$, be a positive, continuous and non-decreasing function such that
$$\int_1^{\infty} t^{-3/2} f(t)\,dt = \infty.$$
Then for any $(\mathcal{F}_t^w)$-stopping time $\tau$ with $E[f(\tau)] < \infty$ and $E[|w(\tau)|] < \infty$ we have $Ew(\tau) = 0$. Let us show that the integrability condition for $f$ cannot be weakened. Suppose that $f$ is positive, continuous and non-decreasing, $f(0) > 0$ and $\int_1^{\infty} t^{-3/2} f(t)\,dt < \infty$. Consider the stopping time $\tau_2 = \inf\{t \ge 0 : w(t) \ge 1 - f(t)\}$. It can be shown (for details see Novikov (1983)) that
$$E[|w(\tau_2)|] < \infty, \qquad E[f(\tau_2)] < \infty \qquad \text{but} \qquad Ew(\tau_2) > 0.$$
Now consider condition (3*) and the identity (3). It is not difficult to show that (3*) cannot be essentially weakened. Indeed, define the stopping time
$$\tau_a = \inf\{t \ge 0 : w(t) \le -1 + at\}$$
where $a$ is an arbitrary real number. Since $\tau_a$ has the density
$$p_a(t) = (2\pi t^3)^{-1/2}\exp\Big(-\frac{(1 - at)^2}{2t}\Big), \qquad t > 0,$$
it is easy to verify that $E[\exp(\tfrac{1}{2}a^2\tau_a)] < \infty$ for each $a$. Furthermore,
$$E[\exp(w(\tau_a) - \tfrac{1}{2}\tau_a)] = \begin{cases} 1, & \text{if } a \ge 1, \\ c_a < 1, & \text{if } a < 1. \end{cases}$$
Here $c_a$ is a constant depending on $a$. Its exact value is not important but it is essential that $c_a < 1$. Therefore the coefficient $\tfrac{1}{2}$ in the exponent in condition (3*) is the 'best possible' case for which the Wald identity (3) still holds. The Wald identity (3) is closely connected with a more general problem of characterization of the uniform integrability of the class of exponential continuous local martingales (see Liptser and Shiryaev 1977/78; Novikov 1979; Kazamaki and Sekiguchi 1983; Liptser and Shiryaev 1989; Kazamaki 1994). (It is useful to compare (3) and (3*) with the description in Example 24.7.)
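The two expectations above can be obtained in closed form from the Laplace transform of a first-passage time. The following sketch assumes $a > 0$, the stopping time $\tau_a = \inf\{t \ge 0 : w(t) \le -1 + at\}$, and the standard transform for a Wiener process with drift hitting a level at distance 1; the explicit value of $c_a$ is a by-product of this calculation and is not quoted from the text.

```latex
\begin{align*}
w(\tau_a) = -1 + a\tau_a
 \quad&\Longrightarrow\quad
 \exp\big(w(\tau_a) - \tfrac12\tau_a\big) = e^{-1}\exp\big((a - \tfrac12)\tau_a\big),\\
E\big[e^{-\theta\tau_a}\big] &= \exp\big(a - \sqrt{a^2 + 2\theta}\,\big),
 \qquad \theta \ge -\tfrac12 a^2,\\
\theta = \tfrac12 - a:\qquad
E\big[\exp\big(w(\tau_a) - \tfrac12\tau_a\big)\big]
 &= \exp\big(a - 1 - |a - 1|\big)
 = \begin{cases} 1, & a \ge 1,\\ e^{2(a-1)} < 1, & 0 < a < 1,\end{cases}\\
\theta = -\tfrac12 a^2:\qquad
E\big[\exp\big(\tfrac12 a^2\tau_a\big)\big] &= e^{a} < \infty .
\end{align*}
```

The choice $\theta = \tfrac12 - a$ is admissible because $a^2 + 2\theta = (a-1)^2 \ge 0$. For $a < 1$ close to 1 the moment $E[\exp(\tfrac12 a^2 \tau_a)]$ is finite while identity (3) fails, which is the sense in which the coefficient $\tfrac12$ in (3*) cannot be lowered.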
24.7.
Wald identity and a non-uniformly integrable martingale based on the Wiener process
Let us formulate first the following very recent and general result (see Novikov 1996). Suppose $X = (X_t, \mathcal{F}_t, t \ge 0)$ is a square integrable local martingale with bounded jumps ($|\Delta X_t| = |X_t - X_{t-}| \le \text{constant}$ a.s. for each $t$) and such that $\langle X\rangle_{\infty} < \infty$ a.s. and $E[X_{\infty}]$ exists. Then
$$\liminf_{t \to \infty}\big(\sqrt{t}\,P[\langle X\rangle_{\infty} > t]\big) \ge \sqrt{2/\pi}\,|E[X_{\infty}]| \qquad (1)$$
and in particular $\liminf_{t \to \infty}(\sqrt{t}\,P[\langle X\rangle_{\infty} > t]) = 0$ implies $E[X_{\infty}] = 0$. From this result we can easily derive an elegant corollary for the Wiener process $w = (w_t, t \ge 0)$. Let $\tau$ be an $(\mathcal{F}_t^w)$-stopping time such that $\tau < \infty$ a.s. and $E[w_{\tau}]$ exists. Then the process $X_t := w_{t \wedge \tau}$, $t \ge 0$, is a square integrable local martingale (even continuous) with $\langle X\rangle_{\infty} = \tau$ and $X_{\infty} = w_{\tau}$ and in this case (1) takes the form
$$\liminf_{t \to \infty}\big(\sqrt{t}\,P[\tau > t]\big) \ge \sqrt{2/\pi}\,|E[w_{\tau}]|. \qquad (2)$$
In particular, $\liminf_{t \to \infty}(\sqrt{t}\,P[\tau > t]) = 0 \Rightarrow E[w_{\tau}] = 0$. Example 24.6 shows that the Wald identity $E[w_{\tau}] = 0$ does not hold for the stopping times $\tau_A = \inf\{t : w_t = A\}$, where $A$ is a real number. Note however that $P[\tau_A > t] \sim \sqrt{2/\pi}\,|A|\,t^{-1/2}$ for large $t$, implying that $\liminf_{t \to \infty}(\sqrt{t}\,P[\tau_A > t]) > 0$. Thus we arrive at the question: is there a more general martingale $X$ satisfying the conditions in the above result of Novikov and such that
$$\liminf_{t \to \infty}\big(\sqrt{t}\,P[\langle X\rangle_{\infty} > t]\big) = 0 \quad \text{but} \quad \limsup_{t \to \infty}\big(\sqrt{t}\,P[\langle X\rangle_{\infty} > t]\big) > 0 \qquad (3)$$
and, if so, what additional conclusion can be derived? Let us show by a specific example that both relations (3) are possible. Indeed, take the increasing sequence $1 = t_1 < t_2 < t_3 < \cdots$ and define the function $g(s)$, $s \ge 0$, where $g(s) = 1$ if $0 \le s$ …
O}
and define the process $m = (m_t, t \ge 0)$ by
$$m_t = \int_0^{t \wedge \tau} g(s)\,dw_s.$$
Then $m$ is a square integrable local martingale which is continuous and such that $\langle m\rangle_t = G(t) = \int_0^t g^2(s)\,ds$ and $\langle m\rangle_{\infty} = G(\tau) < \infty$ a.s. Moreover, the relations
$$m_{\infty} = \int_0^{\tau} g(s)\,dw_s = 2w_{\tau} - \int_0^{\tau} (2 - g(s))\,dw_s, \qquad w_{\tau} = 0 \text{ a.s.},$$
$$\int_0^{\infty} (2 - g(s))^2\,ds < 1 + \int_1^{\infty} (1/s)^2\,ds \le 2$$
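imply that $m_{\infty}$ is integrable. The next step is to check that for large $t$ we have $P[\tau > t] \sim C t^{-1/2}$ and, since $G(t)$, $t \ge 0$, is strictly monotone (due to the special choice of $g$ above), there is an inverse function $G^{-1}$ and
$$P[\langle m\rangle_{\infty} > t] = P[\tau > G^{-1}(t)] \sim C\,(G^{-1}(t))^{-1/2}.$$
Thus we conclude that
$$\liminf_{t \to \infty}\big(\sqrt{t}\,P[\langle m\rangle_{\infty} > t]\big) = 0 \quad (\Rightarrow E[m_{\infty}] = 0)$$
while
$$\limsup_{t \to \infty}\big(\sqrt{t}\,P[\langle m\rangle_{\infty} > t]\big) > 0. \qquad (4)$$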
It should be noted that (4) is a sufficient and necessary condition for the process $m$ to be non-uniformly integrable (see Azema et al 1980). Therefore we have described a continuous square integrable local martingale $m = (m_t, t \ge 0)$ with $E[m_{\infty}] = 0$ but despite these properties, $m$ is not uniformly integrable.
24.8.
On some properties of the variation of the Wiener process
(i) Let us consider the Wiener process $w$ in the unit interval $[0,1]$. For any fixed $p \ge 1$ let
$$V_p(w) = \sup_{\pi_n} \sum_{k=0}^{n-1} |w(t_{k+1}) - w(t_k)|^p$$
where sup is taken over all finite partitions $\pi_n = \{0 = t_0 < t_1 < \cdots < t_n = 1\}$ of $[0,1]$. The quantity $V_p(w)$ is called a $p$-variation (or maximal $p$-variation) of the Wiener process in $[0,1]$. Let us also introduce the so-called expected $p$-variation of $w$ as $E[V_p(w)]$. We are interested in the conditions ensuring that $V_p(w)$ and $E[V_p(w)]$ take finite values. It is better to consider an even more general situation. Suppose $X = (X(t), t \in [0,1])$ is a separable Gaussian process with $EX(t) = 0$, $t \in [0,1]$ and let $e_X(s,t) = E|X(s) - X(t)|$. Firstly, according to the 0-1 law for Gaussian processes, the probability $P[V_p(X) < \infty]$ is either 1 or 0 (see Jain and Monrad 1983). Further, it can be shown that if $P[V_p(X) < \infty] = 1$, then it is also true that $E[V_p(X)] < \infty$ (see Fernique 1974). Since
$$E\Big[\sup_{\pi_n} \sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)|^p\Big] \ge \sup_{\pi_n} E\Big[\sum_{k=0}^{n-1} |X(t_{k+1}) - X(t_k)|^p\Big] \ge c_p \sup_{\pi_n} \sum_{k=0}^{n-1} e_X^p(t_k, t_{k+1})$$
we conclude that the condition
$$\sup_{\pi_n} \sum_{k=0}^{n-1} e_X^p(t_k, t_{k+1}) < \infty \qquad (1)$$
is necessary for the Gaussian process $X$ to have trajectories of finite $p$-variation with probability 1. Take the particular case $p = 1$. The equality
$$E[V_1(X)] = \sup_{\pi_n} \sum_{k=0}^{n-1} e_X(t_k, t_{k+1})$$
shows that if $p = 1$, then condition (1) is also sufficient to ensure that the variation $V_1(X)$ of order 1 is finite. If $p > 1$, condition (1) is not sufficient to ensure that $V_p(X) < \infty$ a.s. To demonstrate this, consider the Wiener process $w$ again. In particular, for $p = 2$ we have
$$\sup_{\pi_n} \sum_{k=0}^{n-1} e_w^2(t_k, t_{k+1}) < \infty$$
that is, (1) is satisfied. However, the Wiener process $w$ has an infinite variation on every interval. Therefore the finiteness of the expected $p$-variation, $p > 1$, does not in general imply that the trajectories of the process have a.s. finite $p$-variation for $p = 1$. (ii) Let us now consider some properties of the quadratic variation $V_2(w, \pi_n)$ of the Wiener process $w$, which is defined by
$$V_2(w, \pi_n) = \sum_{k=0}^{n-1} [w(t_{k+1}) - w(t_k)]^2. \qquad (2)$$
It is useful to recall the following classical result (see Levy 1940). If the partition $\pi_n$ is defined by $\{k2^{-n}, k = 0, \ldots, 2^n\}$ then with probability 1
$$V_2(w, \pi_n) = \sum_{k=0}^{2^n - 1} \Big[w\Big(\frac{k+1}{2^n}\Big) - w\Big(\frac{k}{2^n}\Big)\Big]^2 \to 1 \quad \text{as } n \to \infty.$$
(Note that the limit value 1 is simply the length of the interval $[0,1]$.) Obviously, in this particular case the diameter of $\pi_n$ is $d_n = 2^{-n}$ which tends to 0 'very quickly' as $n \to \infty$. Thus we come to the question of the limit behaviour of $V_2(w, \pi_n)$ as $n \to \infty$ and $d_n \to 0$. Dudley (1973) proved that the condition $d_n = o(1/\log n)$ implies that $V_2(w, \pi_n) \to 1$ a.s. as $n \to \infty$. Even in more general situations he has shown that $o(1/\log n)$ is the 'best possible' order of $d_n$. More precisely, there exists a sequence $\{\pi_n\}$ of partitions of the interval $[0,1]$ with $d_n = O(1/\log n)$ and such that $V_2(w, \pi_n)$ does not converge a.s. to 1 as $n \to \infty$; $V_2(w, \pi_n)$ will converge a.s. to a number which is (strictly) greater than 1. A paper by Fernandez De La Vega (1974) gives details concerning the construction of $\{\pi_n\}$ with $d_n = O(1/\log n)$ and proof that the quadratic variation $V_2(w, \pi_n)$ converges a.s. to a number $1 + \delta$ where $\delta > 0$. (iii) Finally, let us mention another interesting result. It can be shown that if the diameter $d_n$ of the partition $\pi_n$ of the interval $[0,1]$ is of order less than $(1/\log n)^{\alpha}$ for any $0 < \alpha < 1$, then the quadratic variation $V_2(w, \pi_n)$ of the Wiener process $w$ diverges a.s. as $n \to \infty$. For details we refer the reader to a paper by Wrobel (1982).
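The classical dyadic result in part (ii) is easy to observe in simulation, since the increments over the cells of the partition $\{k2^{-n}\}$ are independent $N(0, 2^{-n})$ variables. The sketch below is only an illustration: it draws a fresh set of increments for each $n$ instead of refining one path, and the seed and function name are arbitrary choices.

```python
import random


def dyadic_quadratic_variation(n, rng):
    """V_2(w, pi_n) over the partition {k 2^-n} of [0, 1] for one simulated path."""
    cells = 2**n
    sd = (1.0 / cells) ** 0.5  # increments over a cell of length 2^-n are N(0, 2^-n)
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(cells))


if __name__ == "__main__":
    rng = random.Random(12345)
    for n in (4, 8, 12, 16):
        print(n, dyadic_quadratic_variation(n, rng))
```

The variance of the sum is $2^{1-n}$, so the printed values concentrate around 1 as $n$ grows. This says nothing about the slowly shrinking partitions of Dudley and Fernandez De La Vega, for which the limit can differ from 1.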
24.9.
A Wiener process with respect to different filtrations
The Wiener process W = (Wt, t ;::: 0) obeys several useful properties. One of them is that W is a martingale which, moreover, is square integrable (see Liptser and Shiryaev 1977178; Kallianpur 1980; Durrett 1984; Protter 1990). Recall, however, that for some martingale M (Mt, t ;::: 0) we mean that M is adapted with respect to a suitable filtration (:ft, t ;::: 0), that is for each t ;::: 0, Aft is :ft-measurable. In the case of the Wiener process W we can start with some of its definitions and establish that W is a martingale with respect to its own generated filtration (~, t ;::: 0): ~ = u{ W s , S ~ t}. Note that in general a process can be adapted with different filtrations; in particular, a process can be a martingale about different filtrations. Hence it is interesting to consider the following question. What is the role of the filtration and what happens if we replace one filtration by another? One possible answer will be given in the example below. Let (O,:f, P) be a probability space and let (Xt, t ;::: 0) and C~t, t ~ 0) be two filtrations on this space. Suppose W = (Wt, t > 0) is a Wiener process with respect to each of the filtrations (Xt, t ;::: 0) and Oh, t ;::: 0). Now let us define a new filtration, say (:ft, t ~ 0), where:ft = Xt V~t is the u-field generated by the union ofXt with}lt. How is the process W = (Wt, t ;::: 0) connected with the new filtration (:ft ,t ;::: O)? In particular, is it true that W is a Wiener process with respect to (:ft, t ;::: O)? Intuitively we could expect that the answer to the last question is positive. However, the example below shows that such a conjecture is false. Suppose we have found two r. v.s, say X and Y, which satisfy the following three conditions:
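(a) $X$ does not depend on the process $w = (w_t, t \ge 0)$;
(b) $Y$ does not depend on the process $w = (w_t, t \ge 0)$;
(c) the process $w = (w_t, t \ge 0)$ and the $\sigma$-field $\sigma(X, Y)$ generated by the r.v.s $X$ and $Y$ are dependent.
Now, denote $\mathcal{F}_t^w = \sigma\{w_s, s \le t\}$ and define $\mathcal{X}_t$ and $\mathcal{Y}_t$ as:
$$\mathcal{X}_t = \mathcal{F}_t^w \vee \sigma(X), \qquad \mathcal{Y}_t = \mathcal{F}_t^w \vee \sigma(Y).$$
It is easy to see that the new filtration $(\mathcal{F}_t, t \ge 0)$, where $\mathcal{F}_t = \mathcal{X}_t \vee \mathcal{Y}_t$, is such that $\mathcal{F}_t = \mathcal{F}_t^w \vee \sigma(X, Y)$.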
Clearly $w = (w_t, t \ge 0)$ is a Wiener process with respect to each of the filtrations $(\mathcal{X}_t, t \ge 0)$ and $(\mathcal{Y}_t, t \ge 0)$. However, $w = (w_t, t \ge 0)$ is not a Wiener process with respect to $(\mathcal{F}_t, t \ge 0)$, which follows from condition (c). Hence it only remains for us to construct r.v.s $X$ and $Y$ satisfying conditions (a), (b) and (c). For simplicity we consider the Wiener process $w$ in the interval $[0,1]$. Let $X$ be a r.v. with $P[X = 1] = P[X = -1] = \frac{1}{2}$ and suppose $X$ is independent of $w = (w_t, 0 \le t \le 1)$. Define the r.v. $Y$ by $Y = \frac{1}{2}|X + \operatorname{sign}(w_1)|$. Obviously the r.v. $\operatorname{sign}(w_1)$ is $\sigma(X, Y)$-measurable and thus condition (c) is satisfied. Condition (a) is satisfied by construction. Let us check the validity of (b). For this, let $0 \le t_1 < t_2 < \dots < t_n \le 1$ be any subdivision of $[0,1]$. Then for arbitrary continuous and bounded functions $f(x)$, $x \in \mathbb{R}^1$, and $g(x_1, \dots, x_n)$, $x_1, \dots, x_n \in \mathbb{R}^1$, we have
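$$E[f(Y)g(w_{t_1}, \dots, w_{t_n})] = E\{E[f(Y)g(w_{t_1}, \dots, w_{t_n}) \mid X, w_1]\} = E\{f(Y)E[g(w_{t_1}, \dots, w_{t_n}) \mid w_1]\}.$$
Since $E[g(w_{t_1}, \dots, w_{t_n}) \mid w_1]$ is a (measurable) function of $w_1$ only, it is sufficient to show that the variables $Y$ and $w_1$ are independent. Obviously, 1 and 0 are the possible values of $Y$, and since $P[w_1 = 0] = 0$ it is enough to consider events $[w_1 \in B]$ with $B \in \mathcal{B}^1$ contained either in $(0, \infty)$ or in $(-\infty, 0)$. If $B \subset (0, \infty)$, then $\operatorname{sign}(w_1) = 1$ on $[w_1 \in B]$ and the relation $[Y = 0] \cap [w_1 \in B] = [X = -1] \cap [w_1 \in B]$ holds; if $B \subset (-\infty, 0)$, the same relation holds with $[X = 1]$ in place of $[X = -1]$. Since $X$ is independent of $w_1$, in both cases we have
$$P\{[Y = 0] \cap [w_1 \in B]\} = \tfrac{1}{2} P[w_1 \in B].$$
Analogously,
$$P\{[Y = 1] \cap [w_1 \in B]\} = \tfrac{1}{2} P[w_1 \in B].$$
Therefore the variables $Y$ and $w_1$ are independent and so condition (b) is also satisfied.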
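The construction of $X$ and $Y$ can be checked empirically. The following is only a sketch: it samples $X$ and $w_1$, forms $Y = \frac{1}{2}|X + \operatorname{sign}(w_1)|$, and estimates the joint law of $Y$ and $\operatorname{sign}(w_1)$ (a coarse proxy for $w_1$, so this illustrates rather than proves condition (b)). It also verifies that $\operatorname{sign}(w_1)$ is recovered from the pair $(X, Y)$, which is the content of condition (c). The function name, the sample size and the seed are illustrative assumptions.

```python
import random


def sign(v):
    # P[w_1 = 0] = 0, so the value assigned at 0 is immaterial.
    return 1 if v > 0 else -1


def simulate(n_samples=200_000, seed=0):
    """Sample (X, w_1), build Y, and tabulate the pair (Y, sign(w_1))."""
    rng = random.Random(seed)
    counts = {}
    recovered = True  # is sign(w_1) always a function of (X, Y)?
    for _ in range(n_samples):
        x = rng.choice((-1, 1))      # P[X = 1] = P[X = -1] = 1/2
        w1 = rng.gauss(0.0, 1.0)     # w_1 ~ N(0, 1), independent of X
        s = sign(w1)
        y = abs(x + s) // 2          # Y = |X + sign(w_1)| / 2, values 0 or 1
        counts[(y, s)] = counts.get((y, s), 0) + 1
        recovered = recovered and (s == x * (2 * y - 1))
    freqs = {key: c / n_samples for key, c in counts.items()}
    return freqs, recovered


freqs, recovered = simulate()
```

If $Y$ and $w_1$ are independent, each of the four cells of `freqs` should be close to $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, while `recovered` being true reflects the dependence between $w$ and $\sigma(X, Y)$.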
24.10.
How to enlarge the filtration and preserve the Markov property of the Brownian bridge
Let $X = (X_t, t \ge 0)$ be a real-valued Markov process on the probability space $(\Omega, \mathcal{F}, P)$ and let $E|X_t| < \infty$ for all $t \ge 0$. For $s$, $t$ and $u$ with $0 \le s < t < u$, let us call $t$ the 'present' time. Then, with respect to the 'present' time $t$, the $\sigma$-field $\mathcal{H}_s = \sigma\{X_v, v \le s\}$ is the 'past' (the 'history') of the process $X$ up to time $s$, while the $\sigma$-field $\mathcal{F}_u = \sigma\{X_v, v \ge u\}$ is called the 'future' of $X$ from time $u$ on. Denote by $\mathcal{H}_s \vee \mathcal{F}_u$, $s \le u$, the minimal $\sigma$-field generated by the union of $\mathcal{H}_s$ and $\mathcal{F}_u$. The Markov property of $X$, usually written as $P[X_t \in \Gamma \mid \mathcal{H}_s] = P[X_t \in \Gamma \mid X_s]$ a.s., can be expressed in the following equivalent form involving the 'past' and the 'future' in a symmetric way (see e.g. Al-Hussaini and Elliott 1989):
$$E[X_t \mid \mathcal{H}_s \vee \mathcal{F}_u] = E[X_t \mid X_s, X_u] \quad \text{a.s.} \tag{1}$$
It is not difficult to derive from (1) the corollary:
$$E[X_t \mid \mathcal{H}_s \vee \sigma(X_u)] = E[X_t \mid X_s, X_u] \quad \text{a.s.} \tag{2}$$
Our goal now is to determine if the 'information' $\mathcal{H}_s \vee \sigma(X_u)$ can be enlarged whilst still keeping the Markov property (2). Let us consider a standard Wiener process $w = (w_t, t \ge 0)$ and let $1 < s < t < 2$. By $\mathcal{H}_s^w = \sigma\{w_v, v \le s\}$ we denote the 'past' of $w$ relative to the 'present' time $t$, and $\sigma\{w_1 + w_2\}$ is the $\sigma$-field generated by the r.v. $w_1 + w_2$. Note that the value $w_2$ plays the role of the fixed 'future' of the process $w_t$ at time $t = 2$. In such a case we speak about a Brownian bridge process. We want to compare two conditional expectations, $E[w_t \mid \mathcal{H}_s^w \vee \sigma(w_1 + w_2)]$ and $E[w_t \mid w_s, w_1 + w_2]$. In view of (2) we could suggest that these two quantities coincide. Let us check if this is true. The Markov property of $w$ implies
$$E[w_t \mid \mathcal{H}_s^w \vee \sigma(w_1 + w_2)] = E[w_t \mid \mathcal{H}_s^w \vee \sigma(w_2)] = E[w_t \mid w_s, w_2] \quad \text{a.s.}$$
Since W is a Gaussian process with independent increments, we can easily derive the following two relations:
and
Thus we have shown that
Hence the Markov property expressed by (2) will not be preserved if the 'past' $\mathcal{H}_s^w$ is enlarged by 'new information' taken strictly from the 'future'.
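The comparison of the two conditional expectations can be carried out explicitly. The sketch below relies on a standard fact: for centred jointly Gaussian variables the conditional expectation coincides with the $L^2$-projection onto the linear span of the conditioning variables. It computes, in exact rational arithmetic, the projections of $w_t$ onto $\{w_s, w_2\}$ and onto $\{w_s, w_1 + w_2\}$ using $E[w_u w_v] = \min\{u, v\}$. The values $s = 5/4$, $t = 3/2$ and all function names are illustrative assumptions.

```python
from fractions import Fraction as Fr


def wiener_cov(a, b, times):
    """Covariance of sum_i a[i]*w(times[i]) and sum_j b[j]*w(times[j])."""
    return sum(a[i] * b[j] * min(times[i], times[j])
               for i in range(len(times)) for j in range(len(times)))


def project(target, r1, r2, times):
    """Coefficient vector of the L2-projection of `target` onto span{r1, r2}."""
    g11 = wiener_cov(r1, r1, times)
    g12 = wiener_cov(r1, r2, times)
    g22 = wiener_cov(r2, r2, times)
    c1 = wiener_cov(target, r1, times)
    c2 = wiener_cov(target, r2, times)
    det = g11 * g22 - g12 * g12          # Gram determinant (Cramer's rule)
    a = (c1 * g22 - c2 * g12) / det
    b = (g11 * c2 - g12 * c1) / det
    return [a * u + b * v for u, v in zip(r1, r2)]


s, t = Fr(5, 4), Fr(3, 2)                # assumed values with 1 < s < t < 2
times = [Fr(1), s, t, Fr(2)]             # coordinates: w_1, w_s, w_t, w_2
w_t = [0, 0, 1, 0]
w_s = [0, 1, 0, 0]
w_2 = [0, 0, 0, 1]
w_1_plus_w_2 = [1, 0, 0, 1]

given_ws_w2 = project(w_t, w_s, w_2, times)            # E[w_t | w_s, w_2]
given_ws_sum = project(w_t, w_s, w_1_plus_w_2, times)  # E[w_t | w_s, w_1 + w_2]
diff = [p - q for p, q in zip(given_ws_w2, given_ws_sum)]
gap = wiener_cov(diff, diff, times)      # mean-square distance between the two
```

For these values the first projection is $\frac{2}{3} w_s + \frac{1}{3} w_2$, the second is $\frac{10}{19} w_s + \frac{5}{19}(w_1 + w_2)$, and `gap` is strictly positive, so the two conditional expectations do not coincide.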
SECTION 25. DIVERSE PROPERTIES OF STOCHASTIC PROCESSES
This section covers only a few counterexamples concerning different properties of stochastic processes. All new notions are defined in the examples themselves. Obviously, far more counterexamples could be considered here, but for various reasons we have restricted ourselves to indicating additional sources of interesting but rather complicated counterexamples in the Supplementary Remarks.
25.1.
How can we find the probabilistic characteristics of a function of a stationary Gaussian process?
Let $X = (X_t, t \in \mathbb{R}^1)$ be a real-valued stationary Gaussian process with $EX_t = 0$, $t \in \mathbb{R}^1$, and covariance function $C(t) = E[X_s X_{s+t}]$, $s, t \in \mathbb{R}^1$. Then the finite-dimensional distributions of $X$, and hence any other probabilistic characteristics, are completely determined by $C(t)$, $t \in \mathbb{R}^1$. In other words, if $X$ is any Gaussian process and we know its moments of order 1 and 2 (that is, we know the mean function $EX_t$ and the covariance function $E[X_s X_t]$), then we can find all probabilistic characteristics of $X$. It is interesting to clarify whether a similar fact holds for the process $Y = (Y_t, t \in \mathbb{R}^1)$ which is a function of $X$. We consider the following particular case: $Y_t = X_t^2$, $t \in \mathbb{R}^1$.
Does there exist a universal constant $m$ such that the moments of $Y$ of order not greater than $m$ are enough to determine the distributions of $Y$? As was mentioned above, for Gaussian processes the answer is positive and $m = 2$. For fixed $\varepsilon \in \mathbb{R}^1$ with $|\varepsilon| < 1$ and integer $n \ge 2$, introduce the function
$$f(\lambda) = \frac{(1 - \cos\lambda)(1 + \varepsilon\cos n\lambda)}{\pi\lambda^2}, \quad \lambda \in \mathbb{R}^1.$$
It can be shown that the Fourier transform $C_{\varepsilon,n}$ of $f$ has the form:
$$C_{\varepsilon,n}(t) = \begin{cases} 1 - |t|, & \text{if } |t| \le 1 \\ \frac{1}{2}\varepsilon(1 - |t - n|), & \text{if } |t - n| \le 1 \\ \frac{1}{2}\varepsilon(1 - |t + n|), & \text{if } |t + n| \le 1 \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
Moreover, for $|\varepsilon| < 1$ and $n \ge 2$ the function $C_{\varepsilon,n}(t)$, $t \in \mathbb{R}^1$, is positive definite. Therefore (see Doob 1953; Ash and Gardner 1975) there is a centred stationary Gaussian process, say $X$, with covariance function equal to $C_{\varepsilon,n}$. Now take $Y_t = X_t^2$, $t \in \mathbb{R}^1$, and suppose that we know all the moments of $Y$ of order not greater than $m$ where $m$ is a fixed natural number. This means that we know the quantities $E[Y_{t_1} Y_{t_2} \cdots Y_{t_k}]$ for all $k \le m$ and arbitrary $t_1, t_2, \dots, t_k \in \mathbb{R}^1$. However, $Y_{t_1} \cdots Y_{t_k} = X_{t_1}^2 \cdots X_{t_k}^2$ and since $X_{t_1}^2 \cdots X_{t_k}^2$ is a product of $2k$ Gaussian r.v.s, applying the well known Wick lemma we obtain
(2)
where the sum in (2) is taken over the group of all permutations $\pi$ of the $k$ elements $t_1, t_2, \dots, t_k$.
Now we show (by an appropriate choice of $n$) that the information contained in (2) is not sufficient to determine the sign of the parameter $\varepsilon$. Indeed, let us first clarify which of the terms in (2) give a non-zero contribution. It is easy to see that non-zero terms are those in which the difference $|t_{\pi i} - t_{\pi(i-1)}|$ is either smaller than 1 or is between $n - 1$ and $n + 1$. This observation is based on the explicit form (1) of the covariance function $C_{\varepsilon,n}$; together with the equality $(t_{\pi 1} - t_{\pi k}) + \dots + (t_{\pi k} - t_{\pi(k-1)}) = 0$, it implies that if we choose $n$ such that $n > 2m \ge 2k$ then the number of terms in (2) whose arguments are close to $n$ is the same as the number of terms with arguments close to $(-n)$. Obviously this means that the parameter $\varepsilon$ in (2) has an even power and thus its sign is lost. We have shown that for an arbitrary positive integer $m$, there exists a centred stationary Gaussian process $X$ such that the moments of order not greater than $m$ of $Y_t = X_t^2$, $t \in \mathbb{R}^1$, are not sufficient to determine the distributions of $Y$.
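The loss of the sign of $\varepsilon$ can be observed numerically. The following sketch evaluates the mixed moments $E[Y_{t_1} \cdots Y_{t_k}]$ in exact arithmetic through the Gaussian pairing (Wick, or Isserlis) formula, written here as a sum over all pairings of the $2k$ factors rather than in the permutation form of (2). The function names and the chosen time points are illustrative assumptions.

```python
from fractions import Fraction as Fr
from itertools import combinations_with_replacement


def cov(t, eps, n):
    """Covariance C_{eps,n}(t) from formula (1); assumes |eps| < 1 and n >= 2."""
    if abs(t) <= 1:
        return 1 - abs(t)
    if abs(t - n) <= 1:
        return eps * (1 - abs(t - n)) / 2
    if abs(t + n) <= 1:
        return eps * (1 - abs(t + n)) / 2
    return Fr(0)


def gaussian_moment(times, c):
    """E[X_{t_1} ... X_{t_2k}] for a centred Gaussian family: sum over pairings."""
    if not times:
        return Fr(1)
    first, rest = times[0], times[1:]
    # Pair the first factor with each remaining one and recurse on what is left.
    return sum(c(first - other) * gaussian_moment(rest[:i] + rest[i + 1:], c)
               for i, other in enumerate(rest))


def moment_of_squares(ts, eps, n):
    """E[Y_{t_1} ... Y_{t_k}] with Y_t = X_t^2: every time point enters twice."""
    doubled = [t for t in ts for _ in range(2)]
    return gaussian_moment(doubled, lambda d: cov(d, eps, n))


eps = Fr(1, 2)
same_for_both_signs = all(
    moment_of_squares(list(ts), eps, 7) == moment_of_squares(list(ts), -eps, 7)
    for k in (1, 2, 3)
    for ts in combinations_with_replacement(
        [Fr(0), Fr(1, 2), Fr(13, 2), Fr(7), Fr(15, 2), Fr(14)], k)
)
```

With $m = 3$ and $n = 7 > 2m$ the moments of order at most 3 agree for $\varepsilon$ and $-\varepsilon$ at all tested time points, whereas with $n = 2$ the third moment at the points $0, \frac{3}{4}, \frac{3}{2}$ already depends on the sign of $\varepsilon$, which shows why $n$ has to be large relative to $m$.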
25.2.
Cramér representation, multiplicity and spectral type of stochastic processes
Let $X = (X(t), t \ge 0)$ be a real-valued $L^2$-stochastic process defined on a given probability space $(\Omega, \mathcal{F}, P)$. Denote by $\mathcal{H}_t(X)$ the closed (in $L^2$-sense) linear manifold generated by the r.v.s $\{X(s), 0 \le s \le t\}$ and let
$$\mathcal{H}(X) = \bigcup_{t \in \mathbb{R}_+} \mathcal{H}_t(X).$$
Suppose now that $Y = (Y(t), t \ge 0)$ is an $L^2$-process with orthogonal increments. Then $\mathcal{H}(Y)$ coincides with the set of all integrals $\int_0^\infty g(u)\,dY(u)$ where $\int_0^\infty g^2(u)\,dF(u) < \infty$ and $dF(u) = E[dY^2(u)]$. Thus we come to the following interesting and important problem. For a given process $X$, find a process $Y$ with orthogonal increments such that
$$\mathcal{H}_t(X) = \mathcal{H}_t(Y), \quad t \ge 0. \tag{1}$$
Regarding this problem and other related topics we refer the reader to works by Cramér (1960, 1964), Hida (1960), Ivkovic et al (1974) and Rozanov (1977). In particular, Hida (1960) suggested the first example of a process $X$ for which relation (1) is not possible. Take two independent standard Wiener processes, $w_1 = (w_1(t), t \ge 0)$ and $w_2 = (w_2(t), t \ge 0)$, and define the process $X = (X(t), t \ge 0)$ by
$$X(t) = \begin{cases} w_1(t), & \text{if } t \text{ is rational} \\ w_2(t), & \text{if } t \text{ is irrational.} \end{cases}$$
Obviously, $X$ is discontinuous at each $t$ and we have the representation
$$\mathcal{H}_t(X) = \mathcal{H}_t(w_1) \oplus \mathcal{H}_t(w_2)$$
where as usual the symbol $\oplus$ denotes the sum of orthogonal subspaces. The general solution of the coincidence problem (1) was found by Cramér (1964) and can be described as follows. Let $F_1, F_2, \dots, F_N$ be an arbitrary sequence of measures on $\mathbb{R}_+$ ordered by absolute continuity, namely:
$$F_1 \succ F_2 \succ \dots \succ F_N. \tag{2}$$
Here $N$ is either a fixed natural number or infinity. Then there exist a continuous process $X$ and $N$ mutually orthogonal processes $Y_1, \dots, Y_N$, each with orthogonal increments and $dF_n(t) = E[dY_n^2(t)]$, $n = 1, \dots, N$, such that
$$\mathcal{H}_t(X) = \sum_{n=1}^{N} \oplus\, \mathcal{H}_t(Y_n), \quad t > 0. \tag{3}$$
This general result implies in particular that
$$X(t) = \sum_{n=1}^{N} \int_0^t g_n(t, u)\,dY_n(u) \tag{4}$$
where the functions $g_n$, $n = 1, \dots, N$, satisfy the condition
$$\sum_{n=1}^{N} \int_0^t g_n^2(t, u)\,dF_n(u) < \infty.$$
The equality (4) is called the Cramér representation for the process $X$ while the sequence (2) is called the spectral type of $X$. Finally, the number $N$ in (2) (also in (3) and (4)) is called the multiplicity of $X$. Our purpose now is to illustrate the relationships between the notions introduced above by suitable examples.
(i) Suppose $Y = (Y(t), t \ge 0)$ is an arbitrary $L^2$-process with orthogonal increments and $E[dY^2(t)] = dt$. Consider the process $X = (X(t), t \ge 0)$,
$$X(t) = \int_0^t h(t, u)\,dY(u) \tag{5}$$
where $h$ is some (deterministic) function. Comparing (5) and (4) we see that $X$ has a Cramér-type representation and it is natural to expect that the multiplicity of $X$ is equal to 1. However, the kernel $h$ can be chosen such that the multiplicity of $X$ is greater than 1. Indeed, take
$$h(t, u) = 0, \quad \text{if } 0 \le t \le t_0 \text{ and } a \le u \le b < t_0$$
where $0 < a < b < t_0$ are fixed numbers. Since for $t > 0$ any increment $Y(d) - Y(c)$ with $a \le c < d \le b$ belongs to $\mathcal{H}_t(Y)$ and is orthogonal to $\mathcal{H}_t(X)$, we conclude that $\mathcal{H}_t(X) \subset \mathcal{H}_t(Y)$ (the inclusion is strict). Further, the function $h$ can be chosen arbitrarily for $0 \le t \le t_0$ and $u \notin [a, b]$. Take, for example,
$$h(t, u) = \begin{cases} \sin u, & \text{if } t \text{ is rational} \\ \cos u, & \text{if } t \text{ is irrational} \end{cases}$$
and suppose that $b - a = 2\pi k$ for some natural number $k$. Then for any $t > t_0$, $X(t)$ is equal either to $Z_1$ or to $Z_2$ where $Z_1$ and $Z_2$ are r.v.s defined by
$$Z_1 = \int_a^b \sin u\,dY(u), \qquad Z_2 = \int_a^b \cos u\,dY(u).$$
Obviously,
$$E[Z_1 Z_2] = \int_0^{2\pi k} \sin u \cos u\,du = 0$$
which means that $Z_1$ and $Z_2$ are orthogonal. Moreover, both $Z_1$ and $Z_2$ are orthogonal to the space $\mathcal{H}_{t_0}(X)$. Thus for any $t > t_0$, $\mathcal{H}_t(X)$ consists of $\mathcal{H}_{t_0}(X)$, $Z_1$ and $Z_2$. Consequently
$$\bigcap_{\delta > 0} \mathcal{H}_{t_0 + \delta}(X) = \mathcal{H}_{t_0}(X) \oplus Z_1 \oplus Z_2.$$
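The orthogonality of $Z_1$ and $Z_2$ rests only on the choice $b - a = 2\pi k$. Since $E[dY^2(u)] = du$, the inner products of $Z_1$ and $Z_2$ in $L^2$ reduce to ordinary integrals over $[a, b]$, which the following sketch approximates by the midpoint rule. The values of $a$ and $k$ and the helper name are illustrative assumptions.

```python
import math


def inner_product(f, g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f(u) * g(u) over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)
                   for i in range(steps))


a, k = 0.3, 2                    # illustrative values only
b = a + 2 * math.pi * k          # interval of length 2*pi*k

cross = inner_product(math.sin, math.cos, a, b)      # E[Z_1 Z_2]
norm_sin = inner_product(math.sin, math.sin, a, b)   # E[Z_1^2]
norm_cos = inner_product(math.cos, math.cos, a, b)   # E[Z_2^2]
```

The cross term vanishes up to rounding while both norms are close to $\pi k$, so $Z_1$ and $Z_2$ are non-degenerate and orthogonal, which is what makes the jump of $\mathcal{H}_t(X)$ at $t_0$ two-dimensional.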
According to a result by Cramér (1960), the point $t_0$ is a point of increase of the space $\mathcal{H}_{t_0}(X)$ with dimension equal to 2. Therefore the multiplicity of the process $X$ at time $t_0$ cannot be less than 2.
(ii) Let $X_1 = (X_1(t), t \ge 0)$ and $X_2 = (X_2(t), t \ge 0)$ be Gaussian processes. Denote by $P_1$ and $P_2$ the probability measures induced by these processes in the sample function space. It is well known that $P_1$ and $P_2$ are either equivalent or singular (see Ibragimov and Rozanov 1978). The question of whether $P_1$ and $P_2$ are equivalent is closely related to some property of the corresponding spectral types of the processes $X_1$ and $X_2$. The following result can be found in a book by Rozanov (1977). If the measures $P_1$ and $P_2$ of the Gaussian processes $X_1$ and $X_2$ are equivalent, then $X_1$ and $X_2$ have the same spectral type. The next example shows whether the converse is true. Consider two processes, the standard Wiener process $w = (w(t), t \ge 0)$ and the process $\xi = (\xi(t), t \ge 0)$ defined by
$$\xi(t) = h(t)w(t), \quad t \ge 0.$$
Here $h$ is a function which will be chosen in a special way: $h$ is a non-random continuous function which is not absolutely continuous in any interval. Additionally, let $0 < m_1 \le h(t) \le m_2 < \infty$ for some constants $m_1$, $m_2$ and all $t$. It is obvious that $\mathcal{H}_t(\xi) = \mathcal{H}_t(w)$ for each $t > 0$. This implies that the processes $\xi$ and $w$ have the same spectral type.
Denote by $P_w$ and $P_\xi$ the measures in the space $C(\mathbb{R}_+)$ induced by the processes $w$ and $\xi$ respectively. Clearly, it remains for us to see whether these measures are equivalent. Indeed, if $C_w$ and $C_\xi$ are the covariance functions of $w$ and $\xi$, then the difference between them is
$$\Delta(s, t) = C_w(s, t) - C_\xi(s, t) = [1 - h(s)h(t)] \min\{s, t\}.$$
Since the function $\Delta(s, t)$, $(s, t) \in \mathbb{R}^2$, is not absolutely continuous, using a well known criterion (see Ibragimov and Rozanov 1978) we conclude that the measures $P_w$ and $P_\xi$ are not equivalent despite the coincidence of the spectral types of the processes $w$ and $\xi$.
25.3.
Weak and strong solutions of stochastic differential equations
A large class of stochastic processes can be obtained as solutions of stochastic differential equations of the type
$$X(t) = X_0 + \int_0^t a(s, X(s))\,ds + \int_0^t \sigma(s, X(s))\,dw(s), \quad t \ge 0 \tag{1}$$
where $a$ and $\sigma^2$, the drift and the diffusion coefficients respectively, satisfy appropriate conditions, and $\int_0^t \sigma(\cdot)\,dw(s)$ is a stochastic integral (in the sense of K. Ito) with respect to the standard Wiener process. Let us define two kinds of solutions of (1), weak and strong, and then analyse the relationship between them.
Let $w = (w(t), t \ge 0)$ be a standard Wiener process on the probability space $(\Omega, \mathcal{F}, P)$. Suppose that $w$ is adapted to the family $(\mathcal{F}_t, t \ge 0)$ of non-decreasing sub-$\sigma$-fields of $\mathcal{F}$. If there exists an $(\mathcal{F}_t)$-adapted process $X = (X(t), t \ge 0)$ satisfying (1) a.s., we say that (1) has a strong solution. If (1) has at most one $(\mathcal{F}_t)$-strong solution, we say that strong uniqueness (pathwise uniqueness) holds for (1), or that (1) has a unique strong solution.
Further, if there exist a probability space $(\Omega', \mathcal{F}', P')$, a family $(\mathcal{F}'_t, t \ge 0)$ of non-decreasing sub-$\sigma$-fields of $\mathcal{F}'$, and two $(\mathcal{F}'_t)$-adapted processes, $X' = (X'(t), t \ge 0)$ and $w' = (w'(t), t \ge 0)$, such that $w'$ is a standard Wiener process and $(X', w')$ satisfy (1) a.s., we say that a weak solution exists. If the law of the process $X'$ is unique (that is, the measure in the space $C$ generated by $X'$ is unique), we say that weak uniqueness holds for (1), or that (1) has a unique weak solution.
There are many books dealing entirely or partially with the theory of stochastic differential equations (see Doob 1953; McKean 1969; Gihman and Skorohod 1972, 1979; Liptser and Shiryaev 1977/78; Krylov 1980; Jacod 1979; Kallianpur 1980; Ikeda and Watanabe 1981; Durrett 1984). The purpose of the two examples below is to clarify the relationship between the weak and strong solutions of (1), looking at both aspects, existence and uniqueness. The survey paper by Zvonkin and Krylov (1981) provides a very useful and detailed
analysis of these two concepts (see also Barlow 1982; Barlow and Perkins 1984; Protter 1990; Karatzas and Shreve 1991). Let us briefly describe the first interesting example in this field. Consider the stochastic differential equation
$$x(t) = \int_0^t |x(s)|^{\alpha}\,dw(s), \quad t \ge 0$$
where $\alpha > 0$ is a fixed parameter. It can be shown that for $\alpha \ge \frac{1}{2}$ this equation has only one strong solution (with respect to the family $(\mathcal{F}_t^w)$), namely $x \equiv 0$. However, for $0 < \alpha < \frac{1}{2}$ it has infinitely many solutions. For the proof of this result we refer the reader to the original paper of Girsanov (1962) (see also McKean 1969). Thus the above stochastic equation has a strong solution for any $\alpha > 0$, but this strong solution need not be unique.
Among a variety of results concerning the properties of the solutions of stochastic differential equations (1), we quote the following (see Yamada and Watanabe 1971): strong uniqueness of the solution of (1) implies its weak uniqueness. Of course, this result is not surprising. However, it can happen that a weak solution exists and is unique, but no strong solution exists. For details of such an example we refer the reader to a book by Stroock and Varadhan (1979). Let us now consider an example of a stochastic differential equation which has a unique weak solution but several (at least two) strong solutions. Take the function $\sigma(x) = 1$ if $x \ge 0$ and $\sigma(x) = -1$ if $x < 0$ and consider the stochastic equation
$$x(t) = x_0 + \int_0^t \sigma(x(s))\,dw(s), \quad t \ge 0. \tag{2}$$
Firstly let us check that (2) has a solution. Suppose for simplicity that $x_0 = 0$ and let
$$x(t) = w(t) \quad \text{and} \quad \tilde{w}(t) := \int_0^t \frac{1}{\sigma(x(s))}\,dw(s), \quad t \ge 0.$$
Then $\tilde{w}$ is a continuous local martingale with $\langle \tilde{w} \rangle_t = t$ and so $\tilde{w}$ is a Wiener process. Moreover, the pair $(x(t), \tilde{w}(t), t \ge 0)$ is a solution of (2). Hence the stochastic equation (2) has a weak solution. Weak uniqueness of (2) follows from the fact that for any solution $x$, the stochastic integral $\int_0^t \sigma(x(s))\,dw(s)$, with the function $\sigma$ defined above, is again a Wiener process. It remains for us to show that strong uniqueness does not hold for the stochastic equation (2). Obviously, $\sigma(-x) = -\sigma(x)$ for $x \ne 0$ and if $x_0 = 0$ and $(x(t), t \ge 0)$ is a solution of (2), then the process $(-x(t), t \ge 0)$ is also its solution. Moreover, it is not only strong uniqueness which cannot hold for equation (2): the stochastic equation (2) does not have a strong solution at all. This can be shown by using the local time technique (for details see Karatzas and Shreve 1991).
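A discretized version of this construction may help to see why both $x$ and $-x$ solve (2) for the same driving process. The sketch below is an Euler-type illustration on a grid, not a proof: it sets $x = w$, builds the increments of $\tilde{w}$ as $dw / \sigma(x)$, and measures how well $x$ and $-x$ satisfy the discrete analogue of (2) at all steps where the path is not at zero. The grid size, seed and function names are illustrative assumptions.

```python
import random


def sigma(x):
    return 1.0 if x >= 0 else -1.0


def tanaka_paths(n_steps=10_000, horizon=1.0, seed=1):
    """Return a discretized path x = w (x_0 = 0) and the increments of w-tilde."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    x = [0.0]
    dw_tilde = []
    for _ in range(n_steps):
        dw = rng.gauss(0.0, dt ** 0.5)
        dw_tilde.append(dw / sigma(x[-1]))   # d(w-tilde) = dw / sigma(x)
        x.append(x[-1] + dw)
    return x, dw_tilde


def euler_residuals(path, dw_tilde):
    """|path[k+1] - path[k] - sigma(path[k]) * dw_tilde[k]| where path[k] != 0."""
    return [abs(path[k + 1] - path[k] - sigma(path[k]) * dw_tilde[k])
            for k in range(len(dw_tilde)) if path[k] != 0.0]


x, dw_tilde = tanaka_paths()
minus_x = [-v for v in x]
res_x = max(euler_residuals(x, dw_tilde))
res_minus_x = max(euler_residuals(minus_x, dw_tilde))
quad_var = sum(d * d for d in dw_tilde)      # approximates <w-tilde>_1 = 1
```

Both residuals are at the level of rounding error, and the quadratic variation of the constructed driving process over $[0, 1]$ is close to 1, in line with $\langle \tilde{w} \rangle_t = t$. The only step at which the two candidate paths are treated differently by the scheme is the one starting from zero.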
25.4.
A stochastic differential equation which does not have a strong solution but for which a weak solution exists and is unique
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space on which a standard Wiener process $w = (w_t, t \ge 0)$ is given. Suppose that $a(t, x)$ is a real-valued function on $[0,1] \times C([0,1])$ defined as follows. Let $(t_k, k = 0, -1, -2, \dots)$ be a sequence contained in the interval $[0,1]$ and such that $t_0 = 1 > t_{-1} > t_{-2} > \dots \to 0$ as $k \to -\infty$. For $x \in C([0,1])$ we put $a(0, x) \equiv 0$ and if $t > 0$, let
$$a(t, x) = \left\{ \frac{x_{t_k} - x_{t_{k-1}}}{t_k - t_{k-1}} \right\}, \quad \text{if } t_k \le t < t_{k+1}, \quad k = -1, -2, \dots$$
where $\{\alpha\}$ denotes the fractional part of the real number $\alpha$ and $x_t$ denotes the value of the continuous function $x$ at the point $t$. Clearly, $a$ satisfies the usual measurability conditions, $a$ is $(\mathcal{C}_t)$-adapted where $\mathcal{C}_t = \sigma\{x_s, s \le t\}$ and $\int_0^1 a^2(t, x)\,dt < \infty$ for each $x \in C([0,1])$. Consider the following stochastic differential equation:
$$\xi_t = \int_0^t a(s, \xi)\,ds + w_t, \quad t \in [0,1]. \tag{1}$$
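Firstly, according to general results given by Liptser and Shiryaev (1977/78), Stroock and Varadhan (1979) and Kallianpur (1980), equation (1) has a weak solution and this solution is unique. Let us now determine whether equation (1) has a strong solution. Suppose the answer is positive, that is, (1) has a strong solution $(\xi_t, 0 \le t \le 1)$ which is $(\mathcal{F}_t^w)$-adapted where $\mathcal{F}_t^w = \sigma\{w_s, s \le t\}$. Then if $t_k \le t < t_{k+1}$ we obtain from (1) that
$$\xi_t - \xi_{t_k} = \left\{ \frac{\xi_{t_k} - \xi_{t_{k-1}}}{t_k - t_{k-1}} \right\}(t - t_k) + (w_t - w_{t_k}).$$
Using the notations
$$\eta_k = \frac{\xi_{t_k} - \xi_{t_{k-1}}}{t_k - t_{k-1}} \quad \text{and} \quad \varepsilon_k = \frac{w_{t_k} - w_{t_{k-1}}}{t_k - t_{k-1}}$$
and letting $t \uparrow t_{k+1}$, we arrive at the relation
$$\eta_{k+1} = \{\eta_k\} + \varepsilon_{k+1}, \quad k = 0, -1, -2, \dots \tag{2}$$
Since we supposed that a strong solution of (1) exists, $\eta_k$ must be $\mathcal{F}_{t_k}^w$-measurable and, moreover, the family of r.v.s $\{\eta_m, m = k, k - 1, \dots\}$ is independent of $\varepsilon_{k+1}$. This independence and the equality (2)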
easily lead to the relation
$$d_{k+1} = d_k \exp\left[ -\frac{2\pi^2}{t_{k+1} - t_k} \right]$$
where we have introduced the notation $d_k = E[e^{2\pi i \eta_k}]$. Thus, for any $n = 0, 1, \dots$, we get inductively that
$$d_{k+1} = d_{k-n} \exp\left[ -2\pi^2 \left( \frac{1}{t_{k+1} - t_k} + \dots + \frac{1}{t_{k+1-n} - t_{k-n}} \right) \right].$$
It follows that $|d_{k+1}| \le e^{-2\pi^2 n}$ for any $n$ and so $d_{k+1} \to 0$ as $n \to \infty$. Hence
$$d_k = 0 \quad \text{for } k = 0, -1, -2, \dots$$
and also E[e21fi'1k+ll~ ] [t"-n,tk+d
where
Trf"
11.,tk+tl = a{wt
W s , h-n
Now, if n -t 00, then ~"-,,,tk+d each k, we come to the equality
t
= dk _ n e21fi(€k+ +"'+€k+I-n) 1
$ s $ t $ tk+d· Since dk-n = 0 we have
~+l and since 7Jk+1 is ~+I -measurable for
o _ E[e21fil1J.+ll~tk+1 ] =
e 21fil1 J.+I.
It is obvious, however, that this is not possible and this contradiction is a direct result of our assumption that (1) has a strong solution. Therefore, despite the fact that the stochastic differential equation (1) has a unique weak solution, it has no strong solution. In Examples 25.3 and 25.4 we analysed a few stochastic differential equations and have seen that the properties of their solutions (existence, non-existence, uniqueness, non-uniqueness) in the weak and strong sense depend completely on either the drift coefficient or the diffusion coefficient. More details on stochastic differential equations, not only theory but also examples and intricate counterexamples, can be found in many books (e.g. Liptser and Shiryaev 1977178 and 1989; Jacod 1979; Strook and Varadhan 1979; Kallianpur 1980; Ikeda and Watanabe 1981; Jacod and Shiryaev 1987; Protter 1990. Karatzas and Shreve 1991; Revuz and Yor 1991; Rogers and Williams 1994; Nualart 1995).
Supplementary Remarks
Section 1.
Classes of random events
Examples 1.1, 1.2, 1.3, 1.4 and 1.7 or their modifications can be found in many books. These examples are part of so-called probabilistic folklore. The idea of Example 1.5 is taken from Bauer (1996). Example 1.6 is based on arguments by Neveu (1965) and Kingman and Taylor (1966). Other interesting counterexamples and ideas for constructing counterexamples can be found in works by Chung (1974), Broughton and Huff (1977), Williams (1991) and Billingsley (1995).
Section 2.
Probabilities
Example 2.1 could be classified as folklore. Example 2.2 belongs to Breiman (1968). The presentation of Example 2.3 follows that in Neveu (1965) and Shiryaev (1995). Example 2.4 was originally suggested by Doob (1953) and has since been included in many books; see Halmos (1974), Loeve (1978), Laha and Rohatgi (1979), Rao (1979) and Billingsley (1995). We refer the reader also to works by Pfanzagl (1969), Blake (1973), Rogers and Williams (1994) and Billingsley (1995) where other interesting counterexamples concerning conditional probabilities can be found.
Section 3.
Independence of random events
Since the concept of independence plays a central role in probability theory, it is no wonder that we find it treated in almost all textbooks and lecture notes. Many examples concerning the independence properties of collections of random events could be qualified as probabilistic folklore. For Example 3.1 see Feller (1968) or Bissinger (1980). Example 3.2(i), suggested by Bohlmann (1908), and 3.2(ii), suggested by
Bernstein (1928), seem to be the oldest among all examples included in this book. Example 3.2(iii) is due to Feller (1968) and 3.2(v) to Roussas (1973). Examples 3.2(iv) and 3.3(ii) were suggested by an anonymous referee. Example 3.3(i) is given by Ash (1970) and Shiryaev (1995). The idea of Examples 3.3(iii) and 3.7 was taken from Neuts (1973). Example 3.4(i) belongs to Wong (1972) and case (ii) of the same example was suggested by Ambartzumian (1982). Example 3.5 is based on the papers of Wang et al (1993) and Stoyanov (1995). Example 3.6 is considered by Papoulis (1965). Example 3.7 is given by Sevastyanov et al (1985). For other counterexamples the reader is referred to works by Lancaster (1965), Kingman and Taylor (1966), Crow (1967), Moran (1968), Ramachandran (1975), Chow and Teicher (1978), Grimmett and Stirzaker (1982), Lopez and Moser (1980), Falk and Bar-Hillel (1983), Krewski and Bickis (1984), Wang et al (1993), Stoyanov (1995), Shiryaev (1995), Billingsley (1995) and Mori and Stoyanov (1995/1996).
Section 4.
Diverse properties of random events and their probabilities
The idea of Example 4.1 came from Gelbaum (1976) and, as the author noted, case (ii) was originally suggested by E. O. Thorp. Example 4.2 is folklore. Example 4.3 belongs to Krewski and Bickis (1984). Example 4.5 is from Renyi (1970). Several other counterexamples can be found in works by Lehmann (1966), Hawkes (1973), Ramachandran (1974), Lee (1985) and Billingsley (1995).
Section 5.
Distribution functions of random variables
Different versions of Examples 5.1, 5.3, 5.6 and 5.7 can be found in many sources and definitely belong to folklore. Example 5.2 was suggested by Zubkov (1986). Examples like 5.5 are noted by Gnedenko (1962), Cramér (1970) and Laha and Rohatgi (1979). Case (ii) of Example 5.8 is described by Ash (1970) and case (iii) is given by Olkin et al (1980). Cases (iv) and (v) of the same example are considered by Gumbel (1958) and Fréchet (1951). A paper by Clarke (1975) covers Example 5.9. Example 5.10(i) is treated by Chung (1953), while case (ii) is presented by Dharmadhikari and Jogdeo (1974). Example 5.12 follows the presentation in Dharmadhikari and Joag-Dev (1988). The last example, 5.13, is described by Hengartner and Theodorescu (1973). Other counterexamples concerning properties of one-dimensional and multi-dimensional d.f.s can be found in the works of Thomasian (1969), Feller (1971), Dall'Aglio (1960, 1972, 1990), Barndorff-Nielsen (1978), Rüschendorf (1991), Rachev (1991), Mikusinski et al (1992) and Kalashnikov (1994).
Section 6.
Expectations and conditional expectations
Example 6.1 belongs to Simons (1977). Example 6.2 is due to Takacs (1985) and is the answer to a problem proposed by Emmanuele and Villani (1984). Example 6.4 and other related topics can be found in Piegorsch and Casella (1985). Example 6.5, suggested by Churchill (1946), is probably the first example to be found of a nonsymmetric distribution with vanishing odd-order moments. Example 6.6 is indicated by Bauer (1996). Examples 6.7 and 6.8 can be classified as folklore. Example 6.9 belongs to Enis (1973) (see also Rao (1993)) while Example 6.10 was taken from Laha and Rohatgi (1979). The idea of Example 6.11 is taken from Dharmadhikari and Joag-Dev (1988). Finally, Example 6.12 belongs to Tomkins (1975a). Several other counterexamples concerning the integrability properties of r.v.s, conditional expectations and some related topics can be found in works by Robertson (1968), B. Johnson (1974), Witsenhausen (1975), Rao (1979), Leon and Masse (1992), Bryc and Smolenski (1992), Zieba (1993) and Rao (1993).
Section 7.
Independence of random variables
Examples 7.1(i), 7.8, 7.9(i) and (ii), 7.10(i), (ii) and (iii), and 7.12 can be described as folklore. Examples 7.1(ii), (iii), and 7.8 follow some ideas by Feller (1968, 1971). Example 7.2 is due to Pitman and Williams (1967), who assert that this is the first example of three pairwise independent but not mutually independent r.v.s in the absolutely continuous case. Example 7.3(i) is based on a paper by Wang (1979), case (ii) is considered by Han (1971), while case (iii) is outlined by Ying (1988). Example 7.4 is based on a paper by Wang (1990). Runnenburg (1984) is the originator of Example 7.5. Drossos (1984) suggested Example 7.6(i) to me and attributed it to E. Lukacs and R. Laha. Case (ii) of the same example was suggested by Falin (1985) and case (iii) is indicated by Rohatgi (1976). The description of Example 7.7(i) follows an idea of Fisz (1963) and Laha and Rohatgi (1979). Examples 7.7(ii) and 7.12 are indicated by Renyi (1970). The idea of Example 7.7(ii) belongs to Flury (1986). Case (iii) of Example 7.10 was suggested by an anonymous referee. Example 7.11(iii) is taken from Ash and Gardner (1975). Examples 7.14(i) and (ii) belong to Chow and Teicher (1978) while case (iii) of the same example is considered by Cinlar (1975). Case (i) of Example 7.15 follows an idea of Billingsley (1995) and case (ii) is indicated by Johnson and Kotz (1977). Finally, Example 7.16 is based on a paper by Kimeldorf and Sampson (1978). Note that a great number of additional counterexamples concerning the independence and dependence properties of r.v.s can be found in works by Geisser and Mantel (1962), Tsokos (1972), Roussas (1973), Coleman (1974), Chung (1974), Joffe (1974), Fortet (1977), Ganssler and Stute (1977), Loeve (1978), Wang (1979), O'Brien (1980), Grimmett and Stirzaker (1982), Galambos (1984), Gelbaum (1985, 1990), Heilmann and Schrater (1987), Ahmed (1990), Dall'Aglio (1990), Dall'Aglio et al (1991), Durrett (1991), Whittaker (1991), Liu and Diaconis (1993) and Mori and Stoyanov (1995/1996).
Section 8.
Characteristic and generating functions
Example 8.1 belongs to Gnedenko (1937) and can be classified as one of the most significant classical counterexamples in probability theory. Example 8.2 is contained in many books; see those by Fisz (1963), Moran (1968) and Ash (1972). Examples 8.3, 8.4, 8.5 and 8.6, or versions of them, can be found in the book by Lukacs (1970) and in later books by other authors. Example 8.7 was suggested by Zygmund (1947) and our presentation follows that in Renyi (1970) and Lamperti (1966). Example 8.8 is described by Wintner (1947) and also by Sevastyanov et al (1985). Example 8.9 is given by Linnik and Ostrovskii (1977). Finally, Example 8.10 is presented in a form close to that given by Lukacs (1970) and Laha and Rohatgi (1979). Note that other counterexamples on the topics in this section can be found in works by Ramachandran (1967), Thomasian (1969), Feller (1971), Loeve (1977/1978), Chow and Teicher (1978), Rao (1984), Rohatgi (1984), Dudley (1989) and Shiryaev (1995).
Section 9.
Infinitely divisible and stable distributions
Example 9.1 and other versions of it can be classified as folklore. Example 9.2 belongs to Gnedenko and Kolmogorov (1954) (see also Laha and Rohatgi (1979)). Example 9.3(i) is based on a paper by Shanbhag et al (1977) and answers a question proposed by Steutel (1973). Case (ii) of Example 9.3 as well as Example 9.4 are considered by Rohatgi et al (1990). Example 9.5 is described by Linnik and Ostrovskii (1977). Example 9.6 belongs to Levy (1948), but some arguments from Griffiths (1970) are also needed (also see Rao (1984)). Ibragimov (1972) proposed Example 9.7. Example 9.8 could also be considered as probabilistic folklore. The last example, 9.9, belongs to Lukacs (1970). Let us note that several other counterexamples which are interesting but rather complicated can be found in works by Ramachandran (1967), Steutel (1970), Kanter (1975), O'Connor (1979), Marcus (1983), Hansen (1988), Evans (1991), Jurek and Mason (1993), Rutkowski (1995) and Bondesson et al (1996).
Section 10.
Normal distribution
Some of the examples in this section are popular among probabilists and statisticians and can be found in different sources. In particular, cases (ii), (iii) and (iv) of Example 10.1 are noted respectively by Roussas (1973), Morgenstern (1956) and Olkin et al (1980). The idea of Example 10.2 is indicated by Papoulis (1965). Example 10.3(i) is based on papers by Pierce and Dykstra (1969) and Han (1971). Case (ii) of the same example is considered by Buhler and Mieshke (1981). Example 10.4(i) in this form belongs to Ash and Gardner (1975) and case (iii) is treated by Ijzeren (1972). Hamedani and Tata (1975) describe Examples 10.5 and 10.7, while Example 10.6 is considered by Hamedani (1984). Moran (1968) proposed the problem of finding
a non-normal density such that both conditional densities are normal. Example 10.8 presents one of the possible answers. Case (i) is a result of my discussions with N. V. Krylov and A. M. Zubkov, while case (ii) is due to Ahsanullah and Sinha (1986). Example 10.9 is given by Breiman (1969). Finally, Example 10.10 was suggested by Kovatchev (1996). Many useful facts, including counterexamples, concerning the normal distribution can be found in the works of Anderson (1958), Steck (1959), Geisser and Mantel (1962), Grenander (1963), Thomasian (1969), Feller (1971), Kowalski (1973), Vitale (1978), Hahn and Klass (1981), Melnick and Tenenbein (1982), Ahsanullah (1985), Devroye (1986), Janson (1988), Castillo and Galambos (1989), Whittaker (1991) and Hamedani (1992).
Section 11.
The moment problem
Example 11.1 follows the presentation of Berg (1988). Example 11.2(i) was originally suggested by Heyde (1963a) and has since been included in many textbooks; see Feller (1971), Rao (1973), Billingsley (1995), Laha and Rohatgi (1979). Case (ii) of this example belongs to Leipnik (1981). Example 11.3 is considered in a recent paper by Targhetta (1990). Example 11.4 is mentioned by Hoffmann-Jorgensen (1994), but also see Lukacs (1970) and Berg (1988). Example 11.5 follows an idea from Carnal and Dozzi (1989). Our presentation of Example 11.6 follows that in Kendall and Stuart (1958) and Shiryaev (1995). Examples 11.7 and 11.8 belong to Jagers (1983) and Fu (1984) respectively. As far as we know these are the first examples of this kind in the discrete case (also see Schoenberg (1983)). Example 11.9(i) is based on a paper by Dharmadhikari (1965). Case (ii) of the same example is considered by Chow and Teicher (1978). Both cases of Example 11.10 belong to Heyde (1963b). Example 11.12 is based on papers by Heyde (1963b) and Hall (1981). Example 11.13 is treated by Heyde (1975). Note that other counterexamples concerning the moment problem as well as related topics can be found in works by Fisz (1963), Neuts (1973), Prohorov and Rozanov (1969), Lukacs (1970), Schoenberg (1983), Devroye (1986), Berg and Thill (1991), Slud (1993), Hoffmann-Jorgensen (1994) and Shiryaev (1995). Readers interested in the history of progress in the moment problem are referred to works by Shohat and Tamarkin (1943), Kendall and Stuart (1958), Heyde (1963b), Akhiezer (1965) and Berg (1995).
Section 12.
Characterization properties of some probability distributions
Example 12.1 was suggested by Zubkov (1986). General characterization theorems for the binomial distribution can be found in Ramachandran (1967) and Chow and Teicher (1978). Example 12.2 is given by Klimov and Kuzmin (1985). Example
12.3 belongs to Steutel (1984) but, according to Jacod (1975), a similar result was proved by R. Serfling and included in a preprint which unfortunately I have never seen. Example 12.4 is a natural continuation of the reasoning in Example 12.1. Example 12.5 belongs to Philippou and Hadjichristos (1985). Example 12.6 is given by Rossberg et al (1985). Example 12.7 is based on an idea of Robertson et al (1988). Laha (1958) is the author of Example 12.8(i), while case (iv) of this example uses an idea from Mauldon (1956). Case (v) of Example 12.8 is discussed by Letac (1995). Baringhaus and Henze (1989) invented Example 12.9. Example 12.10 is based on a paper by Blank (1981). The idea of Example 12.11 can be found in the book by Syski (1991). Example 12.12 is outlined by Rohatgi (1976) and Example 12.14 was suggested to me by Seshadri (1986). Note that other counterexamples and useful facts concerning the characterization-type properties of different classes of probability distributions can be found in works by Mauldon (1956), Dykstra and Hewett (1972), Kagan et al (1973), Gani and Shanbhag (1974), Huang (1975), Galambos and Kotz (1978), Ahlo (1982), Azlarov and Volodin (1983), Hwang and Lin (1984), Rossberg et al (1985), Too and Lin (1989), Balasubrahmanyan and Lau (1991), Letac (1991, 1995), Prakasa Rao (1992), Yellott and Iverson (1992), Braverman (1993) and Huang and Li (1993).
Section 13.
Diverse properties of random variables
Example 13.1(i) is folklore while case (ii) is due to Behboodian (1989). Example 13.2 is indicated by Feller (1971), but we have followed the presentation given by Kelker (1971). Example 13.3 is outlined by Barlow and Proshan (1966). Example 13.4 is based on a paper by Pavlov (1978). The notion of exchangeability is intensively treated by Feller (1971), Chow and Teicher (1978), Laha and Rohatgi (1979) and Aldous (1985). Example 13.5 is based on these sources and on discussions with Rohatgi (1986). Diaconis and Dubins (1980) suggested Example 13.6, but a similar statement can also be found in the book by Feller (1971). The idea of Example 13.7 is indicated by Galambos (1987). Example 13.8 belongs to Taylor et al (1985). Example 13.9 is considered by Gut and Janson (1986). Other counterexamples classified as 'diverse' can be found in the works of Bhattacharjee and Sengupta (1966), Ord (1968), Fisher and Walkup (1969), Brown (1972), Burdick (1972), Dykstra and Hewett (1972), Klass (1973), Gleser (1975), Cambanis et al (1976), Freedman (1980), Tong (1980), Franken and Lisek (1982), Laue (1983), Chen and Shepp (1983), Galambos (1984), Aldous (1985), Taylor et al (1985), Husler (1989) and Metry and Sampson (1993). For more abstract topics, see Laha and Rohatgi (1979), Rao (1979), Tjur (1980), Vahaniya et al (1989), Gelbaum and Olmsted (1990), Dall'Aglio et al (1991), Ledoux and Talagrand (1991), Kalashnikov (1994) and Rao (1995).
Section 14.
Various kinds of convergence of sequences of random variables
Examples 14.1, 14.2, 14.4, 14.5, 14.6, 14.8(i), 14.10(ii), 14.12(i) or their modifications can be found in many publications. These examples can be classified as belonging to probabilistic folklore. Examples 14.3(i) and 14.7(i) are based on arguments by Roussas (1973), Laha and Rohatgi (1979) and Bauer (1996). Example 14.3(ii) is due to Grimmett and Stirzaker (1982). Examples 14.7(ii) and 14.8(ii) are considered by Thomas (1971). Fortet (1977) has described Example 14.7(iii). Example 14.7(iv) is treated by Chung (1974). The idea of Example 14.9 is indicated by Feller (1971), Lukacs (1975) and Billingsley (1995). Cases (i) and (ii) of Example 14.10 were suggested by Grimmett (1986) and Zubkov (1986) respectively. Example 14.11 is due to Rohatgi (1986) and a similar example is given in Serfling (1980). Case (ii) of Example 14.12 is briefly discussed by Cuevas (1987). Example 14.13 is presented in a form which is close to that of Ash and Gardner (1975). In Example 14.14 we follow Hsu and Robbins (1947) and Chow and Teicher (1978). Lukacs (1975) considers Examples 14.15 and 14.18. Example 14.17 was suggested by Zubkov (1986). Cases (i) and (ii) of Example 14.16 are described following Lukacs (1975) and Billingsley (1995) respectively. Note that other useful counterexamples can be found in works by Neveu (1965), Kingman and Taylor (1966), Hettmansperger and Klimko (1974), Stout (1974a), Dudley (1976), Gänssler and Stute (1977), Bartlett (1978), Rao (1984), Ledoux and Talagrand (1991), Lessi (1993) and Shiryaev (1995).
Section 15.
Laws of large numbers
Example 15.1(i) and its modifications can be classified as folklore. Examples 15.1(ii), 15.3 and 15.4 belong to Geller (1978). In Example 15.2 we follow the presentations of Lukacs (1975) and Bauer (1996). The statement in Example 15.5 is contained in many books: see those by Fisz (1963) or Lukacs (1975). Revesz (1967) is the author of Example 15.6. Example 15.7 is based on papers by Prohorov (1950) and Fisz (1959). Example 15.8 is due to Hsu and Robbins (1947). Taylor and Wei (1979) describe Example 15.9. For a presentation of Example 15.10 see Stoyanov et al (1988). For the presentation of Example 15.11 we used papers by Jamison et al (1965), Chow and Teicher (1971) and Wright et al (1977). The classical Example 15.12 is described by Feller (1968). Finally, let us note that other counterexamples about the laws of large numbers and related topics can be found in works by Prohorov (1959), Jamison et al (1965), Lamperti (1966), Revesz (1967), Chow and Teicher (1971), Feller (1971), Stout (1974a), Wright et al (1977), Asmussen and Kurtz (1980), Hall and Heyde (1980), Csörgő et al (1983), Dobric (1987), Ramakrishnan (1988) and Chandra (1989).
Section 16.
Weak convergence of probability measures and distributions
Example 16.1 and other similar examples were originally described by Billingsley (1968) and have since appeared in many books and lecture notes. Chung (1974) considered Example 16.2; its variations can be classified as folklore. Example 16.3, suggested by Robbins (1948), is presented in a form similar to that in Fisz (1963). Clearly, Example 16.4 belongs to probabilistic folklore. The idea of Example 16.5 was suggested by Zolotarev (1989). Takahasi (1971/72) is the originator of Example 16.6. The idea of Example 16.7 is indicated by Feller (1971). Example 16.8 is considered by Kendall and Rao (1950). Example 16.9 is outlined by Rohatgi (1976). Example 16.10 is described by Laube (1973). Other counterexamples devoted to weak convergence and related topics can be found in works by Billingsley (1968, 1995), Breiman (1968), Sibley (1971), Borovkov (1972), Roussas (1972), Chung (1974), Lukacs (1975) and Eisenberg and Shixin (1983).
Section 17.
Central limit theorem
Example 17.1(i) is based on arguments given by Ash (1972) and Chow and Teicher (1978). Cases (ii) and (iii) of the same example are considered by Thomasian (1969). Obviously Examples 17.2 and 17.5 can be classified as folklore. Example 17.3 is considered by Ash (1972). The idea of Example 17.4 is to be found in Feller (1971). Zubkov (1986) suggested Example 17.6. Case (i) of Example 17.7 is considered by Gnedenko and Kolmogorov (1954), while case (ii) is taken from Malisic (1970) and is presented as it is given by Stoyanov et al (1988). Additional counterexamples concerning the CLT can be found in works by Gnedenko and Kolmogorov (1954), Fisz (1963), Renyi (1970), Feller (1971), Chung (1974), Landers and Rogge (1977), Laha and Rohatgi (1979), Rao (1984), Shevarshidze (1984), Janson (1988) and Berkes et al (1991).
Section 18.
Diverse limit theorems
Example 18.1(i) is considered by Billingsley (1995). Case (ii) of this example and Example 18.3 are considered by Chow and Teicher (1978). Examples 18.2 and 18.4 are covered in many sources. Tomkins (1975a) is the author of Example 18.5, and Example 18.6 belongs to Neveu (1975). Basterfield (1972) considered Example 18.7 and noted that this example was suggested by Williams. Examples 18.8 and 18.9 are considered by Lukacs (1975). Example 18.10 belongs to Arnold (1966), but also see Lukacs (1975). Example 18.11 is based on a paper by Stout (1979). Example 18.12 is given by Breiman (1967). Vasudeva (1984) treated Example 18.13. Example
18.14 is due to Resnik (1973). A great number of other counterexamples concerning the limit behaviour of random sequences can be found in the literature. However, some of these counterexamples are either very specialized or very complicated. The interested reader is referred to works by Spitzer (1964), Kendall (1967), Feller (1968, 1971), Moran (1968), Sudderth (1971), Roussas (1972), Greenwood (1973), Berkes (1974), Chung (1974), Stout (1974a), Kuelbs (1976), Hall and Heyde (1980), Serfling (1980), Tomkins (1980), Rosalsky and Teicher (1981), Prohorov (1983), Daley and Hall (1984), Boss (1985), Kahane (1985), Wittmann (1985), Sato (1987), Alonso (1988), Barbour et al (1988), Husler (1989), Adler (1990), Jensen (1990), Tomkins (1990, 1992, 1996), Hu (1991), Ledoux and Talagrand (1991), Rachev (1991), Williams (1991), Adler et al (1992), Fu (1993), Rosalsky (1993), Klesov (1995) and Rao (1995).
Section 19.
Basic notions on stochastic processes
Example 19.1 is based on remarks by Ash and Gardner (1975) and Billingsley (1995). Examples 19.2, 19.3, 19.4(i), 19.6(i), 19.7, 19.8 and 19.10(i) or modifications of them can be found in many textbooks and can be classified as probabilistic folklore. Case (ii) of Example 19.4 is considered by Yeh (1973). Example 19.5(i) is described by Kallianpur (1980), case (ii) belongs to Cambanis (1975), while case (iii) is given by Dellacherie (1972) and Elliott (1982). Example 19.6(ii) is due to Masry and Cambanis (1973). Example 19.9 is based entirely on a paper by Wang (1982). Cases (ii) and (iii) of Example 19.10 are given in a form similar to that of Morrison and Wise (1987). For other counterexamples concerning the basic characteristics and properties of stochastic processes we refer the reader to works by Dudley (1973), Kallenberg (1973), Wentzell (1981), Dellacherie and Meyer (1982), Elliott (1982), Metivier (1982), Doob (1984), Hooper and Thorisson (1988), Edgar and Sucheston (1992), Rogers and Williams (1994), Billingsley (1995) and Rao (1995).
Section 20.
Markov processes
Examples 20.1(i) and 20.2(iii) are probabilistic folklore. Example 20.1, cases (ii) and (iii), are due to Feller (1968, 1959). Case (iv) of Example 20.1 as well as Example 20.2(i) and (ii) are considered by Rosenblatt (1971, 1974). Example 20.2(iv) is discussed by Freedman (1971). Arguments which are essentially from Isaacson and Madsen (1976) are used to describe Example 20.3. According to Holewijn and Hordijk (1975), Example 20.4 was suggested by Runnenburg. Example 20.5 is due to Tanny (1974) and O'Brien (1982). Speakman (1967) considered Example 20.6. Example 20.7 is considered by Dynkin (1965) and Wentzell (1981). Example 20.8(i) is due to A. A. Yushkevich (see Dynkin and Yushkevich 1956; and also Dynkin 1961; Wentzell 1981). Case (ii) of the same example is based on an idea from Wong (1971).
Example 20.9 is considered by Ito (1963). A great number of other counterexamples (some of them very complicated) can be found in the works of Chung (1960, 1982), Dynkin (1961, 1965), Breiman (1968), Chung and Walsh (1969), Kurtz (1969), Feller (1971), Freedman (1971), Rosenblatt (1971, 1974), Gnedenko and Solovyev (1973), D. P. Johnson (1974), Tweedie (1975), Monrad (1976), Lamperti (1977), Iosifescu (1980), Wentzell (1981), Portenko (1982), Salisbury (1986, 1987), Ethier and Kurtz (1986), Grey (1989), Liu and Neuts (1991), Revuz and Yor (1991), Alabert and Nualart (1992), Ihara (1993), Meyn and Tweedie (1993), Courbage and Hamdan (1994), Rogers and Williams (1994), Pakes (1995) and Eisenbaum and Kaspi (1995).
Section 21.
Stationary processes and some related topics
Examples 21.1 and 21.2 and other versions of them are probabilistic folklore. Example 21.3 is considered by Ibragimov (1962). Example 21.4 is based on arguments by Ibragimov and Rozanov (1978). Case (i) of Example 21.5 is discussed by Gaposhkin (1973), while case (ii) of the same example can be found in the paper by Verbitskaya (1966). Example 21.6 can be found in more than one source: we follow the presentation given by Shiryaev (1995); see also Ash and Gardner (1975). Example 21.7 is due to Stout (1974b). Cases (i) and (ii) of Example 21.8 are found in the work of Grenander and Rosenblatt (1957), while case (iii) is discussed by Bradley (1980). For other counterexamples we refer the reader to works by Krickeberg (1967), Billingsley (1968), Breiman (1969), Ibragimov and Linnik (1971), Davydov (1973), Rosenblatt (1979), Bradley (1982, 1989), Herrndorf (1984), Robertson and Womak (1985), Eberlein and Taqqu (1986), Cambanis et al (1987), Dehay (1987a, 1987b), Janson (1988), Rieders (1993), Doukhan (1994) and Rosalsky et al (1995).
Section 22.
Discrete-time martingales
Examples 22.1(iii), 22.4(i) and 22.10 can be classified as probabilistic folklore. Example 22.1(i) is given by Neveu (1975), while case (ii) of the same example was proposed by Küchler (1986). Example 22.2 is based on arguments by Yamazaki (1972). Case (i) and case (ii) of Example 22.3 are considered respectively by Kemeny et al (1965) and Freedman (1971). Examples 22.4(ii) and 22.5(i) were suggested by Melnikov (1983). Tomkins (1975b) described Examples 22.5(ii) and 22.7. Examples 22.6 and 22.8 can be found in Tomkins (1984b) and (1984a) respectively. Zolotarev (1961) is the author of Example 22.9, case (i), while case (ii) can be found in Shiryaev (1984). Example 22.11(i) is given by Stout (1974a) with an indication that it belongs to G. Simons. Case (ii) of the same example is treated by Neveu (1975), while the general possibility presented by case (iii) was suggested by Bojdecki (1985). Examples 22.12(ii) and 22.13(iii) were suggested by Marinescu (1985) and are given here in the form proposed by Iosifescu (1985). Example 22.13(i) is considered by
Edgar and Sucheston (1976a). The last example, 22.14, is based on a paper by Dozzi and Imkeller (1990). Other counterexamples concerning the properties of discrete-time martingales can be found in works by Cuculescu (1970), Nelson (1970), Baez-Duarte (1971), Ash (1972), Gilat (1972), Mucci (1973), Austin et al (1974), Stout (1974a), Edgar and Sucheston (1976a, 1976b, 1977), Blake (1978, 1983), Janson (1979), Rao (1979), Alvo et al (1981), Gut and Schmidt (1983), Tomkins (1984a, b), Alsmeyer (1990) and Durrett (1991).
Section 23.
Continuous-time martingales
Example 23.1(i) belongs to Doleans-Dade (1971). Case (ii) and case (iii) of the same example are described by Kabanov (1974) and Stricker (1986) respectively. According to Kazamaki (1972a), Example 23.2 was suggested by P. A. Meyer. Example 23.3 is given by Kazamaki (1972b). Johnson and Helms (1963) have given Example 23.4, but here we follow the presentation given by Dellacherie and Meyer (1982) and Rao (1979). Case (i) of Example 23.5 is treated by Chung and Williams (1990) (see also Revuz and Yor (1991)), while case (ii) was suggested to me by Yor (1986) (see Karatzas and Shreve (1991)). Example 23.6 is presented by Meyer and Zheng (1984). Example 23.7, considered by Radavicius (1980), is an answer to a question posed by B. Grigelionis. Example 23.8 belongs to Walsh (1982). Yor (1978) has treated topics covering Example 23.9. Example 23.10 was communicated to me by Liptser (1985) (see also Liptser and Shiryaev (1989)). According to Kallianpur (1980), Example 23.11(i) belongs to H. Kunita, and the presentation given here is due to Yor. Case (ii) of the same example is considered by Liptser and Shiryaev (1977). Several other counterexamples (some very complicated) can be found in works by Dellacherie and Meyer (1982), Metivier (1982), Kopp (1984), Liptser and Shiryaev (1989), Isaacson (1971), Kazamaki (1974, 1985a), Surgailis (1974), Edgar and Sucheston (1976b), Monroe (1976), Sekiguchi (1976), Stricker (1977, 1984), Janson (1979), Jeulin and Yor (1979), Azema et al (1980), Kurtz (1980), Enchev (1984, 1988), Bouleau (1985), Merzbach and Nualart (1985), Williams (1985), Ethier and Kurtz (1986), Jacod and Shiryaev (1987), Dudley (1989), Revuz and Yor (1991), Yor (1992, 1996), Kazamaki (1994) and Pratelli (1994).
Section 24.
Poisson process and Wiener process
Example 24.1 and its versions can be found in many sources and so can be classified as probabilistic folklore. According to Goldman (1967), Example 24.2 is due to L. Shepp. We present Example 24.3 following the paper of Szasz (1970). Example 24.4 belongs to Jacod (1975). Hardin (1985) described Example 24.5. Example 24.6, cases (i), (ii) and (iii), was treated by Novikov (1972, 1979, 1983) (but see also Liptser and Shiryaev (1977)). Example 24.7 was created recently by Novikov (1996).
Case (i) of Example 24.8 is considered by Jain and Monrad (1983); for case (ii) see Dudley (1973) and Fernandez De La Vega (1974). Case (iii) of the same example is the main result of Wrobel's work (1982). An anonymous enthusiast from Marseille wrote a letter describing the idea behind Example 24.9. Example 24.10 belongs to Al-Hussaini and Elliott (1989). Several other counterexamples can be found in the works of Moran (1967), Thomasian (1969), Wang (1977), Novikov (1979), Jain and Monrad (1983), Kazamaki and Sekiguchi (1983), Panaretos (1983), Williams (1985), Daley and Vere-Jones (1988), Mueller (1988), Huang and Li (1993) and Yor (1992, 1996). Finally, let us pose one interesting question concerning the Wiener process. Suppose $X = (X_t, t \ge 0)$ is a process such that: (i) $X_0 = 0$ a.s.; (ii) any increment $X_t - X_s$ with $s < t$ is distributed $N(0, t - s)$; (iii) any two increments, $X_{t_2} - X_{t_1}$ and $X_{t_4} - X_{t_3}$, where $0 \le t_1 < t_2 \le t_3 < t_4$, are independent. Question: Do these conditions imply that $X$ is a Wiener process? Conjecture: No.
Section 25.
Diverse properties of stochastic processes
Example 25.1 belongs to Grünbaum (1972). Case (i) of Example 25.2 is indicated in the work of Ephremides and Thomas (1974), while case (ii) of the same example was suggested to me by Ivkovic (1985). Example 25.3 is due to H. Tanaka (see Yamada and Watanabe 1971; Zvonkin and Krylov 1981; Durrett 1984). Example 25.4 was originally considered by Tsirelson (1975), but the proof of the non-existence of the strong solution given here belongs to N. V. Krylov (see also Liptser and Shiryaev 1977; Kallianpur 1980). For a variety of further counterexamples we refer the reader to the following sources: Kadota and Shepp (1970), Borovkov (1972), Davies (1973), Cairoli and Walsh (1975), Azema and Yor (1978), Rao (1979), Hasminskii (1980), Hill et al (1980), Kallianpur (1980), Krylov (1980), Metivier and Pellaumail (1980), Chitashvili and Toronjadze (1981), Csorgo and Revesz (1981), Follmer (1981), Liptser and Shiryaev (1981, 1982), Washburn and Willsky (1981), Kabanov et al (1983), Le Gall and Yor (1983), Melnikov (1983), Van der Hoeven (1983), Zaremba (1983), Barlow and Perkins (1984), Hoover and Keisler (1984), Engelbert and Schmidt (1985), Ethier and Kurtz (1986), Rogers and Williams (1987, 1994), Rutkowski (1987), Küchler and Sorensen (1989), Maejima (1989), Anulova (1990), Ihara (1993), Schachermayer (1993), Assing and Manthey (1995), Hu and Perez-Abreu (1995) and Rao (1995).
References
AAP = Advances in Applied Probability
AMM = American Mathematical Monthly
AmS = American Statistician
AMS = Annals of Mathematical Statistics
AP = Annals of Probability
AS = Annals of Statistics
JAP = Journal of Applied Probability
LNM = Lecture Notes in Mathematics
PTRF = Probability Theory and Related Fields (formerly ZW)
SPA = Stochastic Processes and Their Applications
SPL = Statistics and Probability Letters
TPA = Theory of Probability and Its Applications (transl. of Teoriya Veroyatnostey i Primeneniya)
ZW = Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete (new title PTRF since 1986)
Adell, J. A. (1996) Personal communication.
Adler, A. (1990) On the nonexistence of the LIL for weighted sums of identically distributed r.v.s. J. Appl. Math. Stoch. Anal. 3, 135-140.
Adler, A., Rosalsky, A. and Taylor, R. L. (1992) Some SLLNs for sums of random elements. Bull. Inst. Math. Acad. Sinica 20, 335-357.
Ahlo, J. (1982) A class of random variables which are not continuous functions of several independent random variables. ZW 60, 497-500.
Ahmed, A. H. N. (1990) Negative dependence structures through stochastic ordering. Trab. Estadistica 5, 15-26.
Ahsanullah, M. (1985) Some characterizations of the bivariate normal distribution. Metrika 32, 215-217.
Ahsanullah, M. and Sinha, B. K. (1986) On normality via conditional normality. Calcutta Statist. Assoc. Bulletin 35, 199-202.
Akhiezer, N. I. (1965) The Classical Moment Problem. Hafner, New York. (Russian edn 1961)
Al-Hussaini, A. N. and Elliott, R. (1989) Markov bridges and enlarged filtrations. Canad. J. Statist. 17, 329-332.
Alabert, A. and Nualart, D. (1992) Some remarks on the conditional independence and the Markov property. In: Stochastic Analysis and Related Topics. Eds H. Koreslioglu and A. Ustunel. Birkhauser, Basel. 343-364.
Aldous, D. J. (1985) Exchangeability and related topics. LNM 1117, 1-186.
Alonso, A. (1988) A counterexample on the continuity of conditional expectations. J. Math. Analysis Appl. 129, 1-5.
Alsmeyer, G. (1990) Convergence rates in LLNs for martingales. SPA 36, 181-194.
Alvo, M., Cabilio, P. and Feigin, P. D. (1981) A class of martingales with non-symmetric distributions. ZW 58, 87-93.
Ambartzumian, R. A. (1982) Personal communication.
Anderson, T. W. (1958) An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, New York.
Anulova, S. Y. (1990) Counterexamples: SDE with linearly increasing coefficients may have an explosive solution within a domain. TPA 35, 336-338.
Arnold, L. (1966) Über die Konvergenz einer zufälligen Potenzreihe. J. Reine Angew. Math. 222, 79-112.
Arnold, L. (1967) Convergence in probability of random power series and a related problem in linear topological spaces. Israel J. Math. 5, 127-134.
Ash, R. (1970) Basic Probability Theory. John Wiley & Sons, New York.
Ash, R. (1972) Real Analysis and Probability. Acad. Press, New York.
Ash, R. B. and Gardner, M. F. (1975) Topics in Stochastic Processes. Acad. Press, New York.
Asmussen, S. and Kurtz, T. (1980) Necessary and sufficient conditions for complete convergence in the law of large numbers. AP 8, 176-182.
Assing, S. and Manthey, R. (1995) The behaviour of solutions of stochastic differential inequalities. PTRF 103, 493-514.
Austin, D. G., Edgar, G. A. and Ionescu Tulcea, A. (1974) Pointwise convergence in terms of expectations. ZW 30, 17-26.
Azema, J. and Yor, M. (eds) (1978) Temps locaux. Asterisque 52-53.
Azema, J., Gundy, R. F. and Yor, M. (1980) Sur l'integrabilite uniforme des martingales continues. LNM 784, 53-61.
Azlarov, T. A. and Volodin, N. A. (1983) On the discrete analog of the Marshall-Olkin distribution. LNM 982, 17-23.
Baez-Duarte, L. (1971) An a.e. divergent martingale that converges in probability. J. Math. Analysis Appl. 36, 149-150.
Bagchi, A. (1989) Personal communication.
Balasanov, Yu. G. and Zhurbenko, I. G. (1985) Comments on the local properties of the sample functions of random processes. Math. Notes 37, 506-509.
Balasubrahmanyan, R. and Lau, K. S. (1991) Functional Equations in Probability Theory. Acad. Press, Boston.
Baringhaus, L., Henze, N. and Morgenstern, D. (1988) Some elementary proofs of the normality of $XY/(X^2 + Y^2)^{1/2}$ when $X$ and $Y$ are normal. Comp. Math. Appl. 15, 943-944.
Baringhaus, L. and Henze, N. (1989) An example of normal $XY/(X^2 + Y^2)^{1/2}$ with nonnormal $X$, $Y$. Preprint, Univ. Hannover.
Barbour, A. D., Holst, L. and Janson, S. (1988) Poisson approximation with the Stein-Chen method and coupling. Preprint, Uppsala Univ.
Barlow, M. T. (1982) One-dimensional stochastic differential equation with no strong solution. J. London Math. Soc. (2), 26, 335-347.
Barlow, R. E. and Proshan, F. (1966) Tolerance and confidence limits for classes of distributions based on failure rate. AMS 37, 1593-1601.
Barlow, M. T. and Perkins, E. (1984) One-dimensional stochastic differential equations involving a singular increasing process. Stochastics 12, 229-249.
Barndorff-Nielsen, O. (1978) Information and Exponential Families. John Wiley & Sons, Chichester.
Bartlett, M. (1978) An Introduction to Stochastic Processes (3rd edn). Cambr. Univ. Press, Cambridge.
Basterfield, J. G. (1972) Independent conditional expectations and L log L. ZW 21, 233-240.
Bauer, H. (1996) Probability Theory. Walter de Gruyter, Berlin.
Behboodian, J. (1989) Symmetric sum and symmetric product of two independent r.v.s. J. Theoret. Probab. 2, 267-270.
Belyaev, Yu. K. (1985) Personal communication.
Berg, C. (1988) The cube of a normal distribution is indeterminate. AP 16, 910-913.
Berg, C. (1995) Indeterminate moment problems and the theory of entire functions. J. Comput. Appl. Math. 65, 27-55.
Berg, C. and Thill, M. (1991) Rotation invariant moment problem. Acta Math. 167, 207-227.
Berkes, I. (1974) The LIL for subsequences of random variables. ZW 30, 209-215.
Berkes, I., Dehking, H. and Mori, T. (1991) Counterexamples related to the a.s. CLT. Studia Sci. Math. Hungarica 26, 153-164.
Bernstein, S. N. (1928) Theory of Probability. Gostechizdat, Moscow, Leningrad. (In Russian; preliminary edition 1916)
Bhattacharjee, A. and Sengupta, D. (1966) On the coefficient of variation of the classes L and L. SPL 27, 177-180.
Bhattacharya, R. and Waymire, E. (1990) Stochastic Processes and Applications. John Wiley & Sons, New York.
Billingsley, P. (1968) Convergence of Probability Measures. John Wiley & Sons, New York.
Billingsley, P. (1995) Probability and Measure (3rd edn). John Wiley & Sons, New York.
Bischoff, W. and Fieger, W. (1991) Characterization of the multivariate normal distribution by conditional normal distributions. Metrika 38, 239-248.
Bissinger, B. (1980) Stochastic independence versus intuitive independence. Two-Year College Math. J. 11, 122-123.
Blackwell, D. and Dubins, L. E. (1975) On existence and non-existence of proper regular conditional probabilities. AP 3, 741-752.
Blake, L. H. (1973) Simple extensions of measures and the preservation of regularity of conditional probabilities. Pacific J. Math. 46, 355-359.
Blake, L. H. (1978) Every amart is a martingale in the limit. J. London Math. Soc. (2), 18, 381-384.
Blake, L. H. (1983) Some further results concerning equiconvergence of martingales. Rev. Roum. Math. Pure Appl. 28, 927-932.
Blank, N. M. (1981) On the definiteness of functions of bounded variation and of d.f.s. by the asymptotic behavior as $x \to \infty$. In: Problems of Stability of Stochastic Models. Eds V. M. Zolotarev and V. V. Kalashnikov. Inst. Systems Sci., Moscow, 10-15. (In Russian)
Block, H. W., Sampson, A. R. and Savits, T. H. (eds) (1991) Topics in Statistical Dependence. (IMS Series, vol. 16). Inst. Math. Statist., Hayward (CA).
Blumenthal, R. M. and Getoor, R. K. (1968) Markov Processes and Potential Theory. Acad. Press, New York.
Blyth, C. (1986) Personal communication.
Bohlmann, G. (1908) Die Grundbegriffe der Wahrscheinlichkeitsrechnung in ihrer Anwendung auf die Lebensversicherung. In: Atti del 4. Congresso Internazionale dei Matematici, (Roma 1908), vol. 3. Ed G. Castelnuovo. 244-278.
Bojdecki, T. (1977) Discrete-time Martingales. Warsaw Univ. Press, Warsaw. (In Polish)
Bojdecki, T. (1985) Personal communication.
Bondesson, L., Kristiansen, G. K. and Steutel, F. W. (1996) Infinite divisibility of r.v.s and their integer parts. SPL 28, 271-278.
Borovkov, A. A. (1972) Convergence of distributions of functionals of stochastic processes. Russian Math. Surveys 27, 1-42.
Boss, D. D. (1985) A converse to Scheffé's theorem. AS 13, 423-427.
Bouleau, N. (1985) About stochastic integrals with respect to processes which are not semimartingales. Osaka J. Math. 22, 31-34.
Bradley, R. C. (1980) A remark on the central limit question for dependent random variables. JAP 17, 94-101.
Bradley, R. C. (1982) Counterexamples to the CLT under strong mixing conditions, I and II. Colloq. Math. Soc. Janos Bolyai 36, 153-171 and 57, 59-67.
Bradley, R. (1982) Personal communication.
Bradley, R. (1989) A stationary, pairwise independent, absolutely regular sequence for which the CLT fails. PTRF 81, 1-10.
Braverman, M. S. (1993) Remarks on characterization of normal and stable distributions. J. Theoret. Probab. 6, 407-415.
Breiman, L. (1967) On the tail behavior of sums of independent random variables. ZW 9, 20-25.
Breiman, L. (1968) Probability. Addison-Wesley, Reading (MA).
Breiman, L. (1969) Probability and Stochastic Processes. Houghton Mifflin, Boston.
Broughton, A. and Huff, B. W. (1977) A comment on unions of σ-fields. AMM 84, 553-554.
Brown, J. B. (1972) Stochastic metrics. ZW 24, 49-62.
Bryc, W. and Smolenski, W. (1992) On the stability problem for conditional expectation. SPL 15, 41-46.
Bühler, W. J. and Mieshke, K. L. (1981) On (n - 1)-wise and joint independence and normality of n random variables. Commun. Statist. Theor. Meth. 10, 927-930.
Bulinskii, A. (1988) On different mixing conditions and asymptotic normality. Soviet Math. Doklady 37, 443-447.
Bulinskii, A. (1989) Personal communications.
Burdick, D. L. (1972) A note on symmetric random variables. AMS 43, 2039-2040.
Burkholder, D. L. and Gundy, R. F. (1970) Extrapolation and interpolation of quasi-linear operators of martingales. Acta Math. 124, 249-304.
Cacoullos, T. (1985) Personal communication.
Cairoli, R. and Walsh, J. B. (1975) Stochastic integrals in the plane. Acta Math. 134, 111-183.
Cambanis, S. (1975) The measurability of a stochastic process of second order and its linear space. Proc. Amer. Math. Soc. 47, 467-475.
Cambanis, S., Simons, G. and Stout, W. (1976) Inequalities for E[k(X, Y)] when the marginals are fixed. ZW 36, 285-294.
Cambanis, S., Hardin, C. D. and Weron, A. (1987) Ergodic properties of stationary stable processes. SPA 24, 1-18.
Candiloro, S. (1993) Personal communication.
Capobianco, M. and Molluzzo, J. C. (1978) Examples and Counterexamples in Graph Theory. North-Holland, Amsterdam.
Carnal, H. and Dozzi, M. (1989) On a decomposition problem for multivariate probability measures. J. Multivar. Anal. 31, 165-177.
Castillo, E. and Galambos, J. (1987) Bivariate distributions with normal conditionals. In: Proc. Intern. Assoc. Sci.-Techn. Development (Cairo'87). Acta Press, Anaheim (CA). 59-62.
Castillo, E. and Galambos, J. (1989) Conditional distributions and the bivariate normal distribution. Metrika 36, 209-214.
Chandra, T. K. (1989) Uniform integrability in the Cesaro sense and the weak LLNs. Sankhya A 51, 309-317.
Chen, R. and Shepp, L. A. (1983) On the sum of symmetric r.v.s. AmS 7, 236.
Chernogorov, V. G. (1996) Personal communication.
Chitashvili, R. J. and Toronjadze, T. A. (1981) On one-dimensional SDEs with unit diffusion coefficient. Structure of solutions. Stochastics 4, 281-315.
Chow, Y. and Teicher, H. (1971) Almost certain summability of independent identically distributed random variables. AMS 42, 401-404.
Chow, Y. S. and Teicher, H. (1978) Probability Theory: Independence, Interchangeability, Martingales. Springer, New York.
Chung, K. L. (1953) Sur les lois de probabilite unimodales. C. R. Acad. Sci. Paris 236, 583-584.
Chung, K. L. (1960) Markov Chains with Stationary Transition Probabilities. Springer, Berlin.
Chung, K. L. (1974) A Course in Probability Theory (2nd edn). Acad. Press, New York.
Chung, K. L. (1982) Lectures from Markov Processes to Brownian Motion. Springer, New York.
Chung, K. L. (1984) Personal communication.
Chung, K. L. and Fuchs, W. H. J. (1951) On the distribution of values of sums of random variables. Memoirs Amer. Math. Soc. 6.
Chung, K. L. and Walsh, J. B. (1969) To reverse a Markov process. Acta Math. 123, 225-251.
Chung, K. L. and Williams, R. J. (1990) Introduction to Stochastic Integration (2nd edn). Birkhäuser, Boston.
Churchill, E. (1946) Information given by odd moments. AMS 17, 244-246.
Cinlar, E. (1975) Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs (NJ).
Clark, J. M. C. (1970) The representation of functionals of Brownian motion by stochastic integrals. AMS 41, 1282-1295.
Clarke, L. E. (1975) On marginal density functions of continuous densities. AMM 82, 845-846.
Coleman, R. (1974) Stochastic Processes. Problem Solvers. Allen & Unwin, London.
Courbage, M. and Hamdan, D. (1994) Chapman-Kolmogorov equation for non-Markovian shift-invariant measures. AP 22, 1662-1677.
Cramer, H. (1936) Über eine Eigenschaft der normalen Verteilungsfunktion. Math. Z. 41, 405-414.
Cramer, H. (1960) On some classes of non-stationary stochastic processes. In: Proc. 4th Berkeley Symp. Math. Statist. Probab. 2. Univ. California Press, Berkeley. 57-78.
Cramer, H. (1964) Stochastic processes as curves in Hilbert space. TPA 9, 169-179.
Cramer, H. (1970) Random Variables and Probability Distributions. Cambr. Univ. Press, Cambridge.
Cramer, H. and Leadbetter, M. R. (1967) Stationary and Related Stochastic Processes. John Wiley & Sons, New York.
Crow, E. L. (1967) A counterexample on independent events. AMM 74, 716-717.
Csorgo, M. and Revesz, P. (1981) Strong Approximations in Probability and Statistics. Akad. Kiadó, Budapest, and Acad. Press, New York.
Csorgo, S., Tandori, K. and Totik, V. (1983) On the strong law of large numbers for pairwise independent random variables. Acta Math. Hungar. 42, 319-330.
Cuculescu, I. (1970) Nonuniformly integrable non-negative martingales. Rev. Roum. Math. Pure Appl. 15, 327-337.
Cuevas, A. (1987) Density estimation: robustness versus consistency. In: New Perspectives in Theoretical and Applied Statistics. Ed M. L. Puri. John Wiley & Sons, New York. 259-264.
Cuevas, A. (1989) Personal communications.
Daley, D. J. and Hall, P. (1984) Limit laws for the maximum of weighted and shifted i.i.d. r.v.s. AP 12, 571-587.
Daley, D. J. and Vere-Jones, D. (1988) An Introduction to the Theory of Point Processes. Springer, New York.
Dall'Aglio, G. (1960) Les fonctions extremes de la classe de Fréchet a 3 dimensions. Publ. Inst. Statist. Univ. Paris 9, 175-188.
Dall'Aglio, G. (1972) Frechet classes and compatibility of distribution functions. Symp. Math. 9, 131-150.
Dall'Aglio, G. (1990) Somme di variabili aleatorie e convoluzioni. Preprint # 6/1990, Dip. Statist., Univ. Roma "La Sapienza".
Dall'Aglio, G. (1995) Personal communication.
Dall'Aglio, G., Kotz, S. and Salinetti, G. (eds) (1991) Advances in Probability Distributions with Given Marginals. (Proc. Symp., Roma'90). Kluwer Acad. Publ., Dordrecht.
Davies, P. L. (1973) A class of almost nowhere differentiable stationary Gaussian processes which are somewhere differentiable. JAP 10, 682-684.
Davis, M. H. A. (1990) Personal communication.
Davydov, Yu. A. (1973) Mixing conditions for Markov chains. TPA 18, 312-328.
De La Cal, J. (1996) Personal communication.
Dehay, D. (1987a) SLLNs for weakly harmonizable processes. SPA 24, 259-267.
Dehay, D. (1987b) On a class of asymptotically stationary harmonizable processes. J. Multivar. Anal. 22, 251-257.
Dellacherie, C. (1970) Un exemple de la theorie generale des processus. LNM 124, 60-70.
Dellacherie, C. (1972) Capacites et Processus Stochastiques. Springer, Berlin.
Dellacherie, C. and Meyer, P.-A. (1978) Probabilities and Potential. A. North-Holland, Amsterdam.
Dellacherie, C. and Meyer, P.-A. (1982) Probabilities and Potential. B. North-Holland, Amsterdam.
Devroye, L. (1986) Non-uniform Random Variate Generation. Springer, New York.
Devroye, L. (1988) Personal communication.
Dharmadhikari, S. W. (1965) An example in the problem of moments. AMM 72, 302-303.
Dharmadhikari, S. W. and Jogdeo, K. (1974) Convolutions of a-modal distributions. ZW 30, 203-208.
Dharmadhikari, S. and Joag-Dev, K. (1988) Unimodality, Convexity and Applications. Acad. Press, New York.
Diaconis, P. and Dubins, L. (1980) Finite exchangeable sequences. AP 8, 745-764.
Dilcher, K. (1992) Personal communication.
Dobric, V. (1987) The law of large numbers: examples and counterexamples. Math. Scand. 60, 273-291.
Dodunekova, R. D. (1985) Personal communication.
Doleans-Dade, C. (1971) Une martingale uniformement integrable mais non localement de carre integrable. LNM 191, 138-140.
Doob, J. L. (1953) Stochastic Processes. John Wiley & Sons, New York.
Doob, J. L. (1984) Classical Potential Theory and its Probabilistic Counterpart. Springer, New York.
Doukhan, P. (1994) Mixing: Properties and Examples. (Lecture Notes in Statist. 85.) Springer, New York.
Dozzi, M. (1985) Personal communication.
Dozzi, M. and Imkeller, P. (1990) On the integrability of martingale transforms. Preprint, Univ. Bern.
Drossos, C. (1984) Personal communication.
Dryginoff, M. B. L. (1996) Personal communication.
Dudley, R. M. (1972) A counterexample on measurable processes. In: Proc. Sixth Berkeley Symp. Math. Statist. Probab. II. Univ. California Press, Berkeley. 57-66.
Dudley, R. M. (1973) Sample functions of the Gaussian processes. AP 1, 66-103.
Dudley, R. M. (1976) Probabilities and Metrics. Lecture Notes Ser. no. 45. Aarhus Univ., Aarhus.
Dudley, R. M. (1977) Wiener functionals as Ito integrals. AP 5, 140-141.
Dudley, R. M. (1989) Real Analysis and Probability. Wadsworth & Brooks, Pacific Grove (CA).
Durrett, R. (1984) Brownian Motion and Martingales in Analysis. Wadsworth & Brooks, Monterey (CA).
Durrett, R. (1991) Probability: Theory and Examples. Wadsworth, Belmont (CA).
Dykstra, R. L. and Hewett, J. E. (1972) Examples of decompositions of chi-squared variables. AmS 26(4), 42-43.
Dynkin, E. B. (1961) Theory of Markov Processes. Prentice-Hall, Englewood Cliffs (NJ). (Russian edn 1959)
Dynkin, E. B. (1965) Markov Processes. Vols 1 and 2. Springer, Berlin. (Russian edn 1963)
Dynkin, E. B. and Yushkevich, A. A. (1956) Strong Markov processes. TPA 1, 134-139.
Eberlein, E. and Taqqu, M. (eds) (1986) Dependence in Probability and Statistics. Birkhäuser, Boston.
Edgar, G. A. and Sucheston, L. (1976a) Amarts: A class of asymptotic martingales. A. Discrete parameter. J. Multivar. Anal. 6, 193-221.
Edgar, G. A. and Sucheston, L. (1976b) Amarts: A class of asymptotic martingales. B. Continuous parameter. J. Multivar. Anal. 6, 572-591.
Edgar, G. A. and Sucheston, L. (1977) Martingales in the limit and amarts. Proc. Am. Math. Soc. 67, 315-320.
Edgar, G. A. and Sucheston, L. (1992) Stopping Times and Directed Processes. Cambr. Univ. Press, New York.
Eisenbaum, N. and Kaspi, H. (1995) A counterexample for the Markov property of local time for diffusions on graphs. LNM 1613, 260-265.
Eisenberg, B. and Shixin, G. (1983) Uniform convergence of distribution functions. Proc. Am. Math. Soc. 88, 145-146.
Elliott, R. J. (1982) Stochastic Calculus and Applications. Springer, New York.
Emery, M. (1982) Covariance des semimartingales Gaussiennes. C.R. Acad. Sci. Paris Ser. I 295, 703-705.
Emmanuele, G. and Villani, A. (1984) Problem 6452. AMM 91, 144.
Enchev, O. B. (1984) Gaussian random functionals. Math. Research Report. Techn. Univ. Rousse, Rousse (BG).
Enchev, O. B. (1985) Personal communication.
Enchev, O. (1988) Hilbert-space-valued semimartingales. Boll. Unione Mat. Italiana B 2(7), 19-39.
Engelbert, H. J. and Schmidt, W. (1985) On solutions of one-dimensional stochastic differential equations without drift. ZW 68, 287-314.
Enis, P. (1973) On the relation E[E(X|Y)] = EX. Biometrika 60, 432-433.
Ephremides, A. and Thomas, J. B. (1974) On random processes linearly equivalent to white noise. Inform. Sci. 7, 133-156.
Ethier, S. N. and Kurtz, T. G. (1986) Markov Processes. Characterization and Convergence. John Wiley & Sons, New York.
Evans, S. N. (1991) Association and infinite divisibility for the Wishart distribution and its diagonal marginals. J. Multivar. Analysis 36, 199-203.
Faden, A. M. (1985) The existence of regular conditional probabilities: necessary and sufficient conditions. AP 13, 288-298.
Falk, R. and Bar-Hillel, M. (1983) Probabilistic dependence between events. Two-Year College Math. J. 14, 240-247.
Falin, G. I. (1985) Personal communication.
Feller, W. (1946) A limit theorem for random variables with infinite moments. Am. J. Math. 68, 257-262.
Feller, W. (1959) Non-Markovian processes with the semigroup property. AMS 30, 1252-1253.
Feller, W. (1968) An Introduction to Probability Theory and its Applications 1 (3rd edn). John Wiley & Sons, New York.
Feller, W. (1971) An Introduction to Probability Theory and its Applications 2 (2nd edn). John Wiley & Sons, New York.
Fernandez De La Vega, W. (1974) On almost sure convergence of quadratic Brownian variation. AP 2, 551-552.
Fernique, X. (1974) Regularite des trajectoires des fonctions aleatoires Gaussiennes. LNM 480, 1-96.
Fisher, L. and Walkup, D. W. (1969) An example of the difference between the Levy and Levy-Prohorov metrics. AMS 40, 322-324.
Fisz, M. (1959) On necessary and sufficient conditions for the validity of the SLLN expressed in terms of moments. Bull. Acad. Polon. Sci. Ser. Math. 7, 221-225.
Fisz, M. (1963) Probability Theory and Mathematical Statistics (3rd edn). John Wiley & Sons, New York.
Flury, B. K. (1986) On sums of random variables and independence. AmS 40, 214-215.
Follmer, H. (1981) Dirichlet processes. LNM 851, 476-478.
Follmer, H. (1986) Personal communication.
Fortet, R. (1977) Elements of Probability Theory. Gordon and Breach, London.
Franken, P. and Lisek, B. (1982) On Wald's identity for dependent variables. ZW 60, 143-150.
Frechet, M. (1951) Sur les tableaux de correlation dont les marges sont donnees. Ann. Univ. Lyon 14, 53-77.
Freedman, D. (1971) Brownian Motion and Diffusion. Holden-Day, San Francisco.
Freedman, D. A. (1980) A mixture of i.i.d. r.v.s. need not admit a regular conditional probability given the exchangeable σ-field. ZW 51, 239-248.
Fu, J. C. (1984) The moments do not determine a distribution. AmS 38, 294.
Fu, J. C. (1993) Poisson convergence in reliability of a large linearly connected systems as related to coin tossing. Statistica Sinica 3, 261-275.
Gaidov, S. (1986) Personal communication.
Galambos, J. (1984) Introductory Probability Theory. Marcel Dekker, New York.
Galambos, J. (1987) The Asymptotic Theory of Extreme Order Statistics (2nd edn). Krieger, Malabar (FL).
Galambos, J. (1988) Advanced Probability Theory. Marcel Dekker, New York.
Galambos, J. and Kotz, S. (1978) Characterizations of Probability Distributions. (LNM 675). Springer, Berlin.
Galchuk, L. I. (1985) Gaussian semimartingales. In: Statistics and control of stochastic processes. Proc. Steklov Seminar 1984. Eds N. Krylov, R. Liptser and A. Novikov. Optimization Software, New York. 102-121.
Gani, J. and Shanbhag, D. N. (1974) An extension of Raikov's theorem derivable from a result in epidemic theory. ZW 29, 33-37.
Ganssler, P. and Stute, W. (1977) Wahrscheinlichkeitstheorie. Springer, Berlin.
Gaposhkin, V. F. (1973) On the SLLN for second-order stationary processes and sequences. TPA 18, 372-375.
Geisser, S. and Mantel, N. (1962) Pairwise independence of jointly dependent variables. AMS 33, 290-291.
Gelbaum, B. R. (1976) Independence of events and of random variables. ZW 36, 333-343.
Gelbaum, B. R. (1985) Some theorems in probability theory. Pacific J. Math. 118, 383-391.
Gelbaum, B. R. and Olmsted, J. M. H. (1964) Counterexamples in Analysis. Holden-Day, San Francisco.
Gelbaum, B. R. and Olmsted, J. M. H. (1990) Theorems and Counterexamples in Mathematics. Springer, New York.
Geller, N. L. (1978) Some examples of the WLLN and SLLN for averages of mutually independent random variables. AmS 32, 34-36.
Gihman, I. I. and Skorohod, A. V. (1972) Stochastic Differential Equations. Springer, Berlin. (Russian edn 1968)
Gihman, I. I. and Skorohod, A. V. (1974/79) Theory of Stochastic Processes. Vols 1, 2 and 3. Springer, New York. (Russian edns 1971/75)
Gilat, D. (1972) Convergence in distribution, convergence in probability and almost sure convergence of discrete martingales. AMS 43, 1374-1379.
Girsanov, I. V. (1962) An example of nonuniqueness of the solution of Ito stochastic integral equation. TPA 7, 325-331.
Gleser, L. J. (1975) On the distribution of the number of successes in independent trials. AP 3, 182-188.
Gnedenko, B. V. (1937) Sur les fonctions caracteristiques. Bull. Univ. Moscou, Ser. Internat., Sect. A 1, 16-17.
Gnedenko, B. V. (1943) Sur la distribution limite du terme maximum d'une serie aleatoire. Ann. Math. 44, 423-453.
Gnedenko, B. V. (1962) The Theory of Probability. Chelsea, New York. (Russian edn 1960)
Gnedenko, B. V. (1985) Personal communication.
Gnedenko, B. V. and Kolmogorov, A. N. (1954) Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Cambridge (MA). (Russian edn 1949)
Gnedenko, B. V. and Solovyev, A. D. (1973) On the conditions for existence of final probabilities for a Markov process. Math. Operationsforsch. Statist. 4, 379-390.
Goldman, J. R. (1967) Stochastic point processes: Limit theorems. AMS 38, 771-779.
Golec, J. (1994) Personal communication.
Goode, J. M. (1995) Personal communication.
Greenwood, P. (1973) Asymptotics of randomly stopped sequences with independent increments. AP 1, 317-321.
Grenander, U. (1963) Probabilities on Algebraic Structures. Almqvist & Wiksell, Stockholm and John Wiley & Sons, New York.
Grenander, U. and Rosenblatt, M. (1957) Statistical Analysis of Stationary Time Series. John Wiley & Sons, New York.
Grey, D. R. (1989) A note on explosiveness of Markov branching processes. AAP 21, 226-228.
Griffiths, R. C. (1970) Infinitely divisible multivariate gamma distributions. Sankhya A32, 393-404.
Grigelionis, B. (1977) On martingale characterization of stochastic processes with independent increments. Lithuanian Math. J. 17(1), 52-60.
Grigelionis, B. (1986) Personal communication.
Grimmett, G. (1986) Personal communication.
Grimmett, G. R. and Stirzaker, D. R. (1982) Probability and Stochastic Processes. Clarendon Press, Oxford.
Groeneboom, P. and Klaassen, C. A. J. (1982) Solution to Problem 121. Statist. Neerlandica 36, 160-161.
Grünbaum, F. A. (1972) An inverse problem for Gaussian processes. Bull. Am. Math. Soc. 78, 615-616.
Gumbel, E. (1958) Distributions a plusieurs variables dont les marges sont donnees. C. R. Acad. Sci. Paris 246, 2717-2720.
Gut, A. and Schmidt, K. D. (1983) Amarts and Set Function Processes. (LNM 1042). Springer, Berlin.
Gut, A. and Janson, S. (1986) Converse results for existence of moments and uniform integrability for stopped random walks. AP 14, 1296-1317.
Gyongy, I. (1985) Personal communication.
Hahn, M. G. and Klass, M. J. (1981) The multidimensional CLT for arrays normed by affine transformations. AP 9, 611-623.
Hall, P. (1981) A converse to the Spitzer-Rosen theorem. AP 9, 633-641.
Hall, P. and Heyde, C. C. (1980) Martingale Limit Theory and its Application. Acad. Press, New York.
Halmos, P. R. (1974) Measure Theory. Springer, New York.
Hamedani, G. G. (1984) Nonnormality of linear combinations of normal random variables. AmS 38, 295-296.
Hamedani, G. G. (1992) Bivariate and multivariate normal characterizations. Commun. Statist. Theory Methods 21, 2665-2688.
Hamedani, G. G. and Tata, M. N. (1975) On the determination of the bivariate normal distribution from distributions of linear combinations of the variables. AMM 82, 913-915.
Han, C. P. (1971) Dependence of random variables. AmS 25(4), 35.
Hansen, B. G. (1988) On the log-concave and log-convex infinitely divisible sequences and densities. AP 16, 1832-1839.
Hardin, C., Jr. (1985) A spurious Brownian motion. Proc. Am. Math. Soc. 93, 350.
Hasminskii, R. Z. (1980) Stochastic Stability of Differential Equations. Sijthoff & Noordhoff, Alphen aan den Rijn. (Russian edn 1969)
Hawkes, J. (1973) On the covering of small sets by random intervals. Quart. J. Math. 24, 427-432.
Heilmann, W.-R. and Schroter, K. (1987) Eine Bemerkung über bedingte Wahrscheinlichkeiten, bedingte Erwartungswerte und bedingte Unabhangigkeit. Blätter 28, 119-126.
Hengartner, W. and Theodorescu, R. (1973) Concentration Functions. Acad. Press, New York.
Herrndorf, N. (1984) A functional central limit theorem for weakly dependent sequences of random variables. AP 12, 141-153.
Hettmansperger, T. P. and Klimko, L. A. (1974) A note on the strong convergence of distributions. AS 2, 597-598.
Heyde, C. C. (1963a) On a property of the lognormal distribution. J. Royal Statist. Soc. B29, 392-393.
Heyde, C. C. (1963b) Some remarks on the moment problem. I and II. Quart. J. Math. (2) 14, 91-96, 97-105.
Heyde, C. C. (1975) Kurtosis and departure from normality. In: Statistical Distributions in Scientific Work. 1. Eds G. P. Patil et al. Reidel, Dordrecht. 193-221.
Heyde, C. C. (1986) Personal communication.
Hida, T. (1960) Canonical representations of Gaussian processes and their applications. Memoirs Coll. Sci. Univ. Kyoto 23, 109-155.
Hill, B. M., Lane, D. and Sudderth, W. (1980) A strong law for some generalized urn processes. AP 8, 214-226.
Hoffmann-Jorgensen, J. (1994) Probability with a View Toward Statistics vols 1 and 2. Chapman & Hall, London.
Holewijn, P. J. and Hordijk, A. (1975) On the convergence of moments in stationary Markov chains. SPA 3, 55-64.
Hooper, P. M. and Thorisson, H. (1988) On killed processes and stopped filtrations. Stoch. Analysis Appl. 6, 389-395.
Hoover, D. N. and Keisler, H. J. (1984) Adapted probability distributions. Trans. Am. Math. Soc. 286, 159-201.
Houdre, C. (1993) Personal communication.
Hsu, P. L. and Robbins, H. (1947) Complete convergence and the law of large numbers. Proc. Nat. Acad. Sci. USA 33(2), 25-31.
Hu, T. C. (1991) On the law of the iterated logarithm for arrays of random variables. Commun. Statist. Theory Methods 20, 1989-1994.
Hu, Y. and Perez-Abreu, V. (1995) On the continuity of Wiener chaos. Bol. Soc. Mat. Mexicana 1, 127-135.
Huang, J. S. (1975) A note on order statistics from Pareto distribution. Skand. Aktuarietidskr. 3, 187-190.
Huang, W. J. and Li, S. H. (1993) Characterization of the Poisson process using the variance. Commun. Statist. Theory Methods 22, 1371-1382.
Hüsler, J. (1989) A note on the independence and total dependence of max i.d. distributions. AAP 21, 231-232.
Hwang, J. S. and Lin, G. D. (1984) Characterizations of distributions by linear combinations of moments of order statistics. Bull. Inst. Math. Acad. Sinica 12, 179-202.
Ibragimov, I. A. (1962) Some limit theorems for stationary processes. TPA 7, 361-392.
Ibragimov, I. A. (1972) On a problem of C. R. Rao on infinitely divisible laws. Sankhya A34, 447-448.
Ibragimov, I. A. (1975) Note on the CLT for dependent random variables. TPA 20, 135-141.
Ibragimov, I. A. (1983) On the conditions for the smoothness of trajectories of random functions. TPA 28, 229-250.
Ibragimov, I. A. and Linnik, Yu. V. (1971) Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen. (Russian edn 1965)
Ibragimov, I. A. and Rozanov, Yu. A. (1978) Gaussian Random Processes. Springer, Berlin. (Russian edn 1970)
Ihara, S. (1993) Information Theory for Continuous Systems. World Scientific, Singapore.
Ihara, S. (1995) Personal communication.
Ijzeren, J. van (1972) A bivariate distribution with instructive properties as to normality, correlation and dependence. Statist. Neerland. 26, 55-56.
Ikeda, N. and Watanabe, S. (1981) Stochastic Differential Equations and Diffusion Processes. North-Holland, Amsterdam.
Iosifescu, M. (1980) Finite Markov Processes and their Applications. John Wiley & Sons, Chichester and Tehnica, Bucharest.
Iosifescu, M. (1985) Personal communication.
Isaacson, D. (1971) Continuous martingales with discontinuous marginal distributions. AMS 42, 2139-2142.
Isaacson, D. L. and Madsen, R. W. (1976) Markov Chains. Theory and Applications. John Wiley & Sons, New York.
Ito, K. (1963) Stochastic Processes. Inostr. Liter., Moscow. (In Russian; transl. from Japanese)
Ito, K. (1984) Introduction to Probability Theory. Cambr. Univ. Press, Cambridge.
Ivkovic, Z. (1985) Personal communication.
Ivkovic, Z., Bulatovic, J., Vukmirovic, J. and Zivanovic, S. (1974) Applications of Spectral Multiplicity in Separable Hilbert Space to Stochastic Processes. Matem. Inst., Belgrade.
Jacod, J. (1975) Two dependent Poisson processes whose sum is still a Poisson process. JAP 12, 170-172.
Jacod, J. (1979) Calcul Stochastique et Problemes de Martingales. (LNM 714). Springer, Berlin.
Jacod, J. and Shiryaev, A. (1987) Limit Theorems for Stochastic Processes. Springer, Berlin.
Jagers, A. A. (1983) Solution to Problem 650. Nieuw Archief voor Wiskunde. Ser. 4, 1, 377-378.
Jagers, A. A. (1988) Personal communication.
Jain, N. C. and Monrad, D. (1982) Gaussian quasimartingales. ZW 59, 139-159.
Jain, N. C. and Monrad, D. (1983) Gaussian measures in Bp. AP 11, 46-57.
Jamison, B., Orey, S. and Pruitt, W. (1965) Convergence of weighted averages of independent random variables. ZW 4, 40-44.
Jankovic, S. (1988) Personal communication.
Janson, S. (1979) A two-dimensional martingale counterexample. Report no. 8. Inst. Mittag-Leffler, Djursholm.
Janson, S. (1988) Some pairwise independent sequences for which the central limit theorem fails. Stochastics 23, 439-448.
Jensen, U. (1990) An example concerning the convergence of conditional expectations. Statistics 21, 609-611.
Jeulin, T. and Yor, M. (1979) Inegalite de Hardy, semimartingales et faux-amis. LNM 721, 332-359.
Joffe, A. (1974) On a set of almost deterministic k-dependent r.v.s. AP 2, 161-162.
Joffe, A. (1988) Personal communication.
Johnson, B. R. (1974) An inequality for conditional distributions. Math. Mag. 47, 281-283.
Johnson, D. P. (1974) Representations and classifications of stochastic processes. Trans. Am. Math. Soc. 188, 179-197.
Johnson, G. and Helms, L. L. (1963) Class (D) supermartingales. Bull. Am. Math. Soc. 69, 59-62.
Johnson, N. S. and Kotz, S. (1977) Urn Models and their Application. John Wiley & Sons, New York.
Jurek, Z. J. and Mason, J. D. (1993) Operator-Limit Distributions in Probability Theory. John Wiley & Sons, New York.
Kabanov, Yu. M. (1974) Integral representations of functionals of processes with independent increments. TPA 19, 853-857.
Kabanov, Yu. M. (1985) Personal communication.
Kabanov, Yu. M., Liptser, R. Sh. and Shiryaev, A. N. (1983) Weak and strong convergence of the distributions of counting processes. TPA 28, 303-306.
Kadota, T. T. and Shepp, L. A. (1970) Conditions for absolute continuity between a certain pair of probability measures. ZW 16, 250-260.
Kagan, A. M., Linnik, Yu. V. and Rao, C. R. (1973) Characterization Problems in Mathematical Statistics. John Wiley & Sons, New York. (Russian edn 1972)
Kahane, J.-P. (1985) Some Random Series of Functions (2nd edn). Cambr. Univ. Press, Cambridge.
Kaishev, V. (1985) Personal communication.
Kalashnikov, V. (1994) Topics on Regenerative Processes. CRC Press, Boca Raton (FL).
Kalashnikov, V. (1996) Personal communication.
Kallenberg, O. (1973) Conditions for continuity of random processes without discontinuities of second kind. AP 1, 519-526.
Kallianpur, G. (1980) Stochastic Filtering Theory. Springer, New York.
Kallianpur, G. (1989) Personal communication.
Kalpazidou, S. (1985) Personal communication.
Kanter, M. (1975) Stable densities under change of scale and total variation inequalities. AP 3, 697-707.
Karatzas, I. and Shreve, S. E. (1991) Brownian Motion and Stochastic Calculus (2nd edn). Springer, New York.
Katti, S. K. (1967) Infinite divisibility of integer-valued r.v.s. AMS 38, 1306-1308.
Kazamaki, N. (1972a) Changes of time, stochastic integrals and weak martingales. ZW 22, 25-32.
Kazamaki, N. (1972b) Examples of local martingales. LNM 258, 98-100.
Kazamaki, N. (1974) On a stochastic integral equation with respect to a weak martingale. Tohoku Math. J. 26, 53-63.
Kazamaki, N. (1985a) A counterexample related to Ap-weights in martingale theory. LNM 1123, 275-277.
Kazamaki, N. (1985b) Personal communication.
Kazamaki, N. (1994) Continuous Exponential Martingales and BMO. (LNM 1579). Springer, Berlin.
Kazamaki, N. and Sekiguchi, T. (1983) Uniform integrability of continuous exponential martingales. Tohoku Math. J. 35, 289-301.
Kelker, D. (1971) Infinite divisibility and variance mixture of the normal distribution. AMS 42, 802-808.
Kemeny, J. G., Snell, J. L. and Knapp, A. W. (1966) Denumerable Markov Chains. Van Nostrand, Princeton (NJ).
Kendall, D. G. (1967) On finite and infinite sequences of exchangeable events. Studia Sci. Math. Hung. 2, 319-327.
Kendall, D. G. (1985) Personal communication.
Kendall, D. G. and Rao, K. S. (1950) On the generalized second limit theorem in the theory of probability. Biometrika 37, 224-230.
Kendall, M. G. and Stuart, A. (1958) The Advanced Theory of Statistics. 1. Griffin, London.
Kenderov, P. S. (1992) Personal communication.
Khintchine, A. Ya. (1934) Korrelationstheorie der stationaren stochastischen Prozesse. Math. Ann. 109, 604-615.
Kimeldorf, D. and Sampson, A. (1978) Monotone dependence. AS 6, 895-903.
Kingman, J. F. C. and Taylor, S. J. (1966) Introduction to Measure and Probability. Cambr. Univ. Press, Cambridge.
Klass, M. J. (1973) Properties of optimal extended-valued stopping rules for Sn/n. AP 1, 719-757.
Klesov, O. I. (1995) Convergence a.s. of multiple sums of independent r.v.s. TPA 40, 52-65.
Klimov, G. P. and Kuzmin, A. D. (1985) Probability, Processes, Statistics. Exercises with Solutions. Moscow Univ. Press, Moscow. (In Russian)
Klopotowski, A. (1996) Personal communication.
Kolmogorov, A. N. (1956) Foundations of the Theory of Probability. Chelsea, New York. (German edn 1933; Russian edns 1936 and 1973)
Kolmogorov, A. N. and Fomin, S. V. (1970) Introductory Real Analysis. Prentice-Hall, Englewood Cliffs (NJ). (Russian edn 1968)
Kopp, P. E. (1984) Martingales and Stochastic Integrals. Cambr. Univ. Press, Cambridge.
Kordzahia, N. (1996) Personal communication.
Kotz, S. (1996) Personal communication.
Kovatchev, B. (1996) Personal communication.
Kowalski, C. J. (1973) Nonnormal bivariate distributions with normal marginals. AmS 27(3), 103-106.
Krein, M. (1944) On one extrapolation problem of A. N. Kolmogorov. Doklady Akad. Nauk SSSR 46(8), 339-342. (In Russian)
Krengel, U. (1989) Personal communication.
Krewski, D. and Bickis, M. (1984) A note on independent and exhaustive events. AmS 38, 290-291.
Krickeberg, K. (1967) Strong mixing properties of Markov chains with infinite invariant measure. In: Proc. 5th Berkeley Symp. Math. Statist. Probab. 2, part II. Univ. California Press, Berkeley. 431-446.
Kronfeld, B. (1982) Personal communication.
Krylov, N. V. (1980) Controlled Diffusion Processes. Springer, New York. (Russian edn 1977)
Krylov, N. V. (1985) Personal communication.
Küchler, U. (1986) Personal communication.
Küchler, U. and Sorensen, M. (1989) Exponential families of stochastic processes: A unified semimartingale approach. Int. Statist. Rev. 57, 123-144.
Kuelbs, J. (1976) A counterexample for Banach space valued random variables. AP 4, 684-689.
Kurtz, T. G. (1969) A note on sequences of continuous parameter Markov chains. AMS 40, 1078-1082.
Kurtz, T. G. (1980) The optional sampling theorem for martingales indexed by directed sets. AP 8, 675-681.
Kuznetsov, S. (1990) Personal communication.
Kwapien, S. (1985) Personal communication.
Laha, R. G. (1958) An example of a nonnormal distribution where the quotient follows the Cauchy law. Proc. Nat. Acad. Sci. USA 44, 222-223.
Laha, R. and Rohatgi, V. (1979) Probability Theory. John Wiley & Sons, New York.
Lamperti, J. (1966) Probability. Benjamin, New York.
Lamperti, J. (1977) Stochastic Processes. Springer, New York.
Lancaster, H. O. (1965) Pairwise statistical independence. AMS 36, 1313-1317.
Landers, D. and Rogge, L. (1977) A counterexample in the approximation theory of random summation. AP 5, 1018-1023.
Laube, G. (1973) Weak convergence and convergence in the mean of distribution functions. Metrika 20, 103-105.
Laue, G. (1983) Existence and representation of density functions. Math. Nachricht. 114, 7-21.
Le Breton, A. (1989, 1996) Personal communications.
Le Gall, J. and Yor, M. (1983) Sur l'equation stochastique de Tsirelson. LNM 986, 81-88.
Lebedev, V. (1985) Personal communication.
Ledoux, M. and Talagrand, M. (1991) Probability in Banach Space. Springer, New York.
Lee, M.-L. T. (1985) Dependence by total positivity. AP 13, 572-582.
Lehmann, E. L. (1966) Some concepts of dependence. AMS 37, 1137-1153.
Leipnik, R. (1981) The lognormal distribution and strong non-uniqueness of the moment problem. TPA 26, 850-852.
Leon, A. and Masse, J.-C. (1992) A counterexample on the existence of the L1-median. SPL 13, 117-120.
Lessi, O. (1993) Corso di Probabilita. Metria, Padova.
Letac, G. (1991) Counterexamples to P. C. Consul's theorem about the factorization of the GPD. Canad. J. Statist. 19, 229-232.
Letac, G. (1995) Integration and Probability: Exercises and Solutions. Springer, New York.
Levy, P. (1940) Le mouvement Brownien plan. Am. J. Math. 62, 487-550.
Levy, P. (1948) The arithmetic character of Wishart's distribution. Proc. Cambr. Phil. Soc. 44, 295-297.
Levy, P. (1965) Processus Stochastiques et Mouvement Brownien (2nd edn). Gauthier-Villars, Paris.
Lindemann, I. (1995) Personal communication.
Linnik, Yu. V. and Ostrovskii, I. V. (1977) Decomposition of Random Variables and Vectors. Am. Math. Soc., Providence (RI). (Russian edn 1972)
Liptser, R. Sh. (1985) Personal communication.
Liptser, R. Sh. and Shiryaev, A. N. (1977/78) Statistics of Random Processes. 1 & 2. Springer, New York. (Russian edn 1974)
Liptser, R. Sh. and Shiryaev, A. N. (1981) On necessary and sufficient conditions in the functional CLT for semimartingales. TPA 26, 130-135.
Liptser, R. Sh. and Shiryaev, A. N. (1982) On a problem of necessary and sufficient conditions in the functional CLT for local martingales. ZW 59, 311-318.
Liptser, R. Sh. and Shiryaev, A. N. (1989) Theory of Martingales. Kluwer Acad. Publ., Dordrecht. (Russian edn 1986)
Liu, D. and Neuts, M. F. (1991) Counterexamples involving Markovian arrival processes. Stoch. Models 7, 499-509.
Liu, J. S. and Diaconis, P. (1993) Positive dependence and conditional independence for bivariate exchangeable random variables. Techn. Rep. 430, Dept. Statist., Harvard Univ.
Loeve, M. (1977/78) Probability Theory. 1 & 2 (4th edn). Springer, New York.
Lopez, G. and Moser, J. (1980) Dependent events. Pi-Mu-Epsilon 7, 117-118.
Lukacs, E. (1970) Characteristic Functions (2nd edn). Griffin, London.
Lukacs, E. (1975) Stochastic Convergence (2nd edn). Acad. Press, New York.
McKean, H. P. (1969) Stochastic Integrals. Acad. Press, New York.
Maejima, M. (1989) Self-similar processes and limit theorems. Sugaku Expos. 2, 103-123.
Malisic, J. (1970) Collection of Exercises in Probability Theory with Applications. Gradjevinska Kniga, Belgrade. (In Serbo-Croatian)
Mandelbrot, B. B. and Van Ness, J. W. (1968) Fractional Brownian motions, fractional noises and applications. SIAM Rev. 10, 422-437.
Marcus, D. J. (1983) Non-stable laws with all projections stable. ZW 64, 139-156.
Marinescu, E. (1985) Personal communication.
Masry, E. and Cambanis, S. (1973) The representation of stochastic processes without loss of information. SIAM J. Appl. Math. 25, 628-633.
Mauldon, J. G. (1956) Characterizing properties of statistical distributions. Quart. J. Math. Oxford 7(2), 155-160.
Melnick, E. L. and Tenenbein, A. (1982) Misspecification of the normal distribution. AmS 36, 372-373.
Melnikov, A. V. (1983) Personal communication.
Merzbach, E. and Nualart, D. (1985) Different kinds of two-parameter martingales. Israel J. Math. 52, 193-208.
Metivier, M. (1982) Semimartingales. A Course on Stochastic Processes. Walter de Gruyter, Berlin.
Metivier, M. and Pellaumail, J. (1980) Stochastic Integration. Acad. Press, New York.
Metry, M. H. and Sampson, A. R. (1993) Ordering for positive dependence on multivariate empirical distribution. Ann. Appl. Probab. 3, 1241-1251.
Meyer, P.-A. and Zheng, W. A. (1984) Tightness criteria for laws of semimartingales. Ann. Inst. H. Poincare B20, 353-372.
Meyn, S. P. and Tweedie, R. L. (1993) Markov Chains and Stochastic Stability. Springer, London.
Mikusinski, P., Sherwood, H. and Taylor, M. D. (1992) Shuffles of min. Stochastica 13, 61-74.
Molchanov, S. A. (1986) Personal communication.
Monrad, D. (1976) Levy processes: absolute continuity of hitting times for points. ZW 37, 43-49.
Monroe, I. (1976) Almost sure convergence of the quadratic variation of martingales: a counterexample. AP 4, 133-138.
Moran, P. A. P. (1967) A non-Markovian quasi-Poisson process. Studia Sci. Math. Hungar. 2, 425-429.
Moran, P. A. P. (1968) An Introduction to Probability Theory. Oxford Univ. Press, New York.
Morgenstern, D. (1956) Einfache Beispiele zweidimensionaler Verteilungen. Mitt. Math. Statist. 8, 234-245.
Mori, T. F. and Stoyanov, J. (1995/1996) Realizability of a probability model and random events with given independence/dependence structure (to appear).
Morrison, J. M. and Wise, G. L. (1987) Continuity of filtrations of σ-algebras. SPL 6, 55-60.
Mucci, A. G. (1973) Limits for martingale-like sequences. Pacific J. Math. 48, 197-202.
Mueller, C. (1988) A counterexample for Brownian motion on manifolds. Contemporary Math. 73, 217-221.
Mutafchiev, L. (1986) Personal communication.
Negri, I. (1995) Personal communication.
Nelsen, R. B. (1992) Personal communication.
Nelson, P. I. (1970) A class of orthogonal series related to martingales. AMS 41, 1684-1694.
Neuts, M. (1973) Probability. Allyn & Bacon, Boston.
Neveu, J. (1965) Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.
Neveu, J. (1975) Discrete Parameter Martingales. North-Holland, Amsterdam.
Novikov, A. A. (1972) On an identity for stochastic integrals. TPA 17, 717-720.
Novikov, A. A. (1979) On the conditions of the uniform integrability of continuous nonnegative martingales. TPA 24, 820-824.
Novikov, A. A. (1983) A martingale approach in problems of first crossing time of nonlinear boundaries. Proc. Steklov Inst. Math. 158, 141-163.
Novikov, A. A. (1985) Personal communication.
Novikov, A. A. (1996) Martingales, Tauberian theorems and gambling systems. Preprint.
Nualart, D. (1995) The Malliavin Calculus and Related Topics. Springer, New York.
O'Brien, G. L. (1980) Pairwise independent random variables. AP 8, 170-175.
O'Brien, G. L. (1982) The occurrence of large values in stationary sequences. ZW 61, 347-353.
O'Connor, T. A. (1979) Infinitely divisible distributions with unimodal Levy spectral functions. AP 7, 494-499.
Olkin, I., Gleser, L. and Derman, C. (1980) Probability Models and Applications. Macmillan, New York.
Ord, J. K. (1968) The discrete Student's distribution. AMS 39, 1513-1516.
Pakes, A. G. (1995) Quasi-stationary laws for Markov processes: examples of an always proximate absorbing state. AAP 27, 120-145.
Pakes, A. G. and Khattree, R. (1992) Length-biasing, characterizations of laws and the moment problem. Austral. J. Statist. 34, 307-322.
Panaretos, J. (1983) On Moran's property of the Poisson distribution. Biometr. J. 25, 69-76.
Papageorgiou, H. (1985) Personal communication.
Papoulis, A. (1965) Probability, Random Variables and Stochastic Processes. McGraw-Hill, New York.
Parzen, E. (1960) Modern Probability Theory & Applications. John Wiley & Sons, New York.
Parzen, E. (1962) Stochastic Processes. Holden-Day, San Francisco.
Parzen, E. (1993) Personal communication.
Pavlov, H. V. (1978) Some properties of the distributions of the class NBU. In: Math. and Math. Education, 4 (Proc. Spring Conf. UBM). Academia, Sofia. 283-285. (In Russian)
Peligrad, M. (1993) Personal communication.
Pesarin, F. (1990) Personal communication.
Petkova, E. (1994) Personal communications.
Petrov, V. V. (1975) Sums of Independent Random Variables. Springer, Berlin. (Russian edn 1972)
Pfanzagl, J. (1969) On the existence of regular conditional probabilities. ZW 11, 244-256.
Pflug, G. (1991) Personal communication.
Philippou, A. N. (1983) Poisson and compound Poisson distributions of order k and some of their properties. Zap. Nauchn. Semin. LOMI AN SSSR (Leningrad) 130, 175-180.
Philippou, A. N. and Hadjichristos, J. H. (1985) A note on the Poisson distribution of order k and a result of Raikov. Preprint, Univ. Patras, Patras, Greece.
Piegorsch, W. W. and Casella, G. (1985) The existence of the first negative moments. AmS 39, 60-62.
Pierce, D. A. and Dykstra, R. L. (1969) Independence and the normal distribution. AmS 23(4), 39.
Pitman, E. J. G. and Williams, E. G. (1967) Cauchy-distributed functions of Cauchy variates. AMS 38, 916-918.
Pirinsky, Ch. (1995) Personal communication.
Portenko, N. I. (1982) Generalized Diffusion Processes. Naukova Dumka, Kiev. (In Russian)
Portenko, N. I. (1986) Personal communication.
Prakasa Rao, B. L. S. (1992) Identifiability in Stochastic Models. Acad. Press, Boston.
Pratelli, L. (1994) Deux contre-exemples sur la convergence d'integrales anticipative. LNM 1583, 110-112.
Prohorov, Yu. V. (1950) The strong law of large numbers. Izv. Akad. Nauk. SSSR, Ser. Mat. 14, 523-536. (In Russian)
Prohorov, Yu. V. (1956) Convergence of random processes and limit theorems in probability theory. TPA 1, 157-214.
Prohorov, Yu. V. (1959) Some remarks on the strong law of large numbers. TPA 4, 204-208.
Prohorov, Yu. V. (1983) On sums of random vectors with values in Hilbert space. TPA 28, 375-379.
Prohorov, Yu. V. and Rozanov, Yu. A. (1969) Probability Theory. Springer, Berlin. (Russian edn 1967)
Protter, P. (1990) Stochastic Integration and Differential Equations. A New Approach. Springer, New York.
Puri, M. L. (1993) Personal communication.
Rachev, S. T. (1991) Probability Metrics and the Stability of Stochastic Models. John Wiley & Sons, Chichester.
Radavicius, M. (1980) On the question of the P. Levy theorem generalization. Litovsk. Mat. Sbornik 20(4), 129-131. (In Russian)
Raikov, D. A. (1938) On the decomposition of Gauss and Poisson laws. Izv. Akad. Nauk. SSSR, Ser. Mat. 2, 91-124. (In Russian)
Ramachandran, B. (1967) Advanced Theory of Characteristic Functions. Statist. Society, Calcutta.
Ramachandran, D. (1974) Mixtures of perfect probability measures. AP 2, 495-500.
Ramachandran, D. (1975) On the two definitions of independence. Colloq. Math. 32, 227-231.
Ramakrishnan, S. (1988) A sequence of coin toss variables for which the strong law fails. AMM 95, 939-941.
Rao, C. R. (1973) Linear Statistical Inference and its Applications (2nd edn). John Wiley & Sons, New York.
Rao, M. M. (1979) Stochastic Processes and Integration. Sijthoff & Noordhoff, Alphen.
Rao, M. M. (1984) Probability Theory with Applications. Acad. Press, Orlando.
Rao, M. M. (1993) Conditional Measures and Applications. Marcel Dekker, New York.
Rao, M. M. (1995) Stochastic Processes: General Theory. Kluwer Acad. Publ., Dordrecht.
Regazzini, E. (1992) Personal communication.
Renyi, A. (1970) Probability Theory. Akad. Kiadó, Budapest, and North-Holland, Amsterdam.
Resnik, S. I. (1973) Record values and maxima. AP 1, 650-662.
Revesz, P. (1967) The Laws of Large Numbers. Akad. Kiadó, Budapest and Acad. Press, New York.
Revuz, D. and Yor, M. (1991) Continuous Martingales and Brownian Motion. Springer, Berlin.
Riedel, M. (1975) On the one-sided tails of infinitely divisible distributions. Math. Nachricht. 70, 115-163.
Rieders, E. (1993) The size of the averages of strongly mixing r.v.s. SPL 18, 57-64.
Robbins, H. (1948) Convergence of distributions. AMS 19, 72-76.
Robertson, J. B. and Womak, J. M. (1985) A pairwise independent stationary stochastic process. SPL 3, 195-199.
Robertson, L. C., Shortt, R. M. and Landry, S. S. (1988) Dice with fair sums. AMM 95, 316-328.
Robertson, T. (1968) A smoothing property for conditional expectations given σ-lattices. AMM 75, 515-518.
Robinson, P. M. (1990) Personal communication.
Rogers, L. C. G. and Williams, D. (1987) Diffusions, Markov Processes and Martingales. Vol. 2: Ito calculus. John Wiley & Sons, Chichester.
Rogers, L. C. G. and Williams, D. (1994) Diffusions, Markov Processes and Martingales. Vol. 1: Foundations (2nd edn). John Wiley & Sons, Chichester.
Rohatgi, V. (1976) Introduction to Probability Theory. John Wiley & Sons, New York.
Rohatgi, V. (1984) Statistical Inference. John Wiley & Sons, New York.
Rohatgi, V. (1986) Personal communication.
Rohatgi, V. K., Steutel, F. W. and Szekely, G. J. (1990) Infinite divisibility of products and quotients of iid random variables. Math. Sci. 15, 53-59.
Rosalsky, A. (1993) On the almost certain limiting behaviour of normed sums of identically distributed positive random variables. SPL 16, 65-70.
Rosalsky, A. and Teicher, H. (1981) A limit theorem for double arrays. AP 9, 460-467.
Rosalsky, A., Stoyanov, J. and Presnell, B. (1995) An ergodic-type theorem à la Feller for nonintegrable strictly stationary continuous time process. Stoch. Anal. Appl. 13, 555-572.
Rosenblatt, M. (1956) A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. USA 42, 43-47.
Rosenblatt, M. (1971) Markov Processes. Structure and Asymptotic Behavior. Springer, Berlin.
Rosenblatt, M. (1974) Random Processes. Springer, New York.
Rosenblatt, M. (1979) Dependence and asymptotic independence for random processes. In: Studies in Probability Theory. 18. Math. Assoc. of America, Washington (DC). 24-45.
Rossberg, H.-J., Jesiak, B. and Siegal, G. (1985) Analytic Methods of Probability Theory. Akademie, Berlin.
Rotar, V. (1985) Personal communication.
Roussas, G. (1972) Contiguity of Probability Measures. Cambr. Univ. Press, Cambridge.
Roussas, G. (1973) A First Course in Mathematical Statistics. Addison-Wesley, Reading (MA).
Royden, H. L. (1968) Real Analysis (2nd edn). Macmillan, New York.
Rozanov, Yu. A. (1967) Stationary Random Processes. Holden-Day, San Francisco. (Russian edn 1963)
Rozanov, Yu. A. (1977) Innovation Processes. Winston & Sons, Washington (DC). (Russian edn 1974)
Rozovskii, B. L. (1988) Personal communication.
Rudin, W. (1966) Real and Complex Analysis. McGraw-Hill, New York.
Rudin, W. (1973) Functional Analysis. McGraw-Hill, New York.
Rudin, W. (1994) Personal communication.
Runnenburg, J. Th. (1984) Problem 142 with the solution. Statist. Neerland. 39, 48-49.
Rüschendorf, L. (1991) On conditional stochastic ordering of distributions. AAP 23, 46-63.
Rutkowski, M. (1987) Strong solutions of SDEs involving local times. Stochastics 22, 201-218.
Rutkowski, M. (1995) Left and right linear innovation for a multivariate SαS random variables. SPL 22, 175-184.
Salisbury, T. S. (1986) An increasing diffusion. In: Seminar in Stochastic Processes 1984. Eds E. Cinlar et al. Birkhäuser, Basel. 173-194.
Salisbury, T. S. (1987) Three problems from the theory of right processes. AP 15, 263-267.
Sato, H. (1987) On the convergence of the product of independent random variables. J. Math. Kyoto Univ. 27, 381-385.
Schachermayer, W. (1993) A counterexample to several problems in the theory of asset pricing. Math. Finance 3, 217-229.
Schoenberg, I. J. (1983) Solution to Problem 650. Nieuw Archiff Vor Wiskunde. Ser. 4, 1, 377-378.
Sekiguchi, T. (1976) Note on the Krickeberg decomposition. Tohoku Math. J. 28, 95-97.
Serfling, R. (1980) Approximation Theorems of Mathematical Statistics. John Wiley & Sons, New York.
Seshadri, V. (1986) Personal communication.
Sevastyanov, B. A., Chistyakov, V. P. and Zubkov, A. M. (1985) Problems in the Theory of Probability. Mir, Moscow. (Russian edn 1980)
Shanbhag, D. N., Pestana, D. and Sreehari, M. (1977) Some further results in infinite divisibility. Math. Proc. Cambr. Phil. Soc. 82, 289-295.
Shevarshidze, T. (1984) On the multidimensional local limit theorem for densities. In: Limit Theorems and Stochastic Equations. Ed. G. M. Manya. Metsniereba, Tbilisi. 12-53.
Shiryaev, A. N. (1985) Personal communication.
Shiryaev, A. (1995) Probability (2nd edn). Springer, New York. (Russian edn 1980)
Shohat, J. and Tamarkin, J. (1943) The Problem of Moments. Am. Math. Soc., New York.
Shur, M. G. (1985) Personal communication.
Sibley, D. (1971) A metric for weak convergence of distribution functions. Rocky Mountain J. Math. 1, 437-440.
Simons, G. (1977) An unexpected expectation. AP 5, 157-158.
Slud, E. V. (1993) The moment problem for polynomial forms in normal random variables. AP 21, 2200-2214.
Solovay, R. M. (1970) A model of set theory in which every set of reals is Lebesgue measurable. Ann. Math. 92, 1-56.
Solovyev, A. D. (1985) Personal communication.
Speakman, J. M. O. (1967) Two Markov chains with common skeleton. ZW 7, 224.
Spitzer, F. (1964) Principles of Random Walk. Van Nostrand, Princeton (NJ).
Steck, G. P. (1959) A uniqueness property not enjoyed by the normal distribution. AMS 29, 604-606.
Steen, L. A. and Seebach, J. A. (1978) Counterexamples in Topology (2nd edn). Springer, New York.
Steutel, F. W. (1970) Preservation of Infinite Divisibility Under Mixing. 33. Math. Centre Tracts, Amsterdam.
Steutel, F. W. (1973) Some recent results in infinite divisibility. SPA 1, 125-143.
Steutel, F. W. (1984) Problem 153 and its solution. Statist. Neerland. 38, 215.
Steutel, F. W. (1989) Personal communications.
Stout, W. (1974a) Almost Sure Convergence. Acad. Press, New York.
Stout, W. (1974b) On convergence of φ-mixing sequences of random variables. ZW 31, 69-70.
Stout, W. (1979) Almost sure invariance principle when $EX_1^2 = \infty$. ZW 49, 23-32.
Stoyanov, J. (1995) Dependency measure for sets of random events or random variables. SPL 23, 108-115.
Stoyanov, J., Mirazchiiski, I., Ignatov, Zv. and Tanushev, M. (1988) Exercise Manual in Probability Theory. Kluwer Acad. Publ., Dordrecht. (Bulgarian edn 1985; Polish edn 1991)
Strassen, V. (1964) An invariance principle for the law of the iterated logarithm. ZW 3, 211-226.
Stricker, C. (1977) Quasimartingales, martingales locales, semimartingales et filtrations naturelles. ZW 39, 55-64.
Stricker, C. (1983) Semimartingales Gaussiennes-application au probleme de l'innovation. ZW 64, 303-312.
Stricker, C. (1984) Integral representation in the theory of continuous trading. Stochastics 13, 249-265.
Stricker, C. (1986) Personal communication.
Stroock, D. W. and Varadhan, S. R. S. (1979) Multidimensional Diffusion Processes. Springer, New York.
Stroud, T. F. (1992) Personal communication.
Sudderth, W. D. (1971) A 'Fatou equation' for randomly stopped variables. AMS 42, 2143-2146.
Surgailis, D. (1974) Characterization of a supermartingale by some stopping times. Lithuanian Math. J. 14(1), 147-150.
Syski, R. (1991) Introduction to Random Processes (2nd edn). Marcel Dekker, New York.
Szasz, D. O. H. (1970) Once more on the Poisson process. Studia Sci. Math. Hungar. 5, 441-444.
Szekely, G. J. (1986) Paradoxes in Probability Theory and Mathematical Statistics. Akad. Kiadó, Budapest and Kluwer Acad. Publ., Dordrecht.
Takacs, L. (1985) Solution to Problem 6452. AMM 92, 515.
Takahasi, K. (1971/72) An example of a sequence of frequency function which converges to a frequency function in the mean of order 1 but nowhere. J. Japan Statist. Soc. 2, 33-34.
Tanny, D. (1974) A zero-one law for stationary sequences. ZW 30, 139-148.
Targhetta, M. L. (1990) On a family of indeterminate distributions. J. Math. Anal. Appl. 147, 477-479.
Taylor, R. L. and Wei, D. (1979) Laws of large numbers for tight random elements in normed linear spaces. AP 7, 150-155.
Taylor, R. L., Daffer, P. Z. and Patterson, R. F. (1985) Limit Theorems for Sums of Exchangeable Random Variables. Rowman & Allanheld, Totowa (NJ).
Thomas, J. (1971) An Introduction to Applied Probability and Random Processes. John Wiley & Sons, New York.
Thomasian, A. J. (1957) Metrics and norms on spaces of random variables. AMS 28, 512-514.
Thomasian, A. (1969) The Structure of Probability Theory with Applications. McGraw-Hill, New York.
Tjur, T. (1980) Probability Based on Radon Measures. John Wiley & Sons, Chichester.
Tjur, T. (1986) Personal communication.
Tomkins, R. J. (1975a) On the equivalence of modes of convergence. Canad. Math. Bull. 10, 571-575.
Tomkins, R. J. (1975b) Properties of martingale-like sequences. Pacific J. Math. 61, 521-525.
Tomkins, R. J. (1980) Limit theorems without moment hypotheses for sums of independent random variables. AP 8, 314-324.
Tomkins, R. J. (1984a) Martingale generalizations and preservation of martingale properties. Canad. J. Statist. 12, 99-106.
Tomkins, R. J. (1984b) Martingale generalizations. In: Topics in Applied Statistics. Eds Y. P. Chaubey and T. D. Dviwedi. Concordia Univ., Montreal. 537-548.
Tomkins, R. J. (1986) Personal communication.
Tomkins, R. J. (1990) A generalized LIL. SPL 10, 9-15.
Tomkins, R. J. (1992) Refinements of Kolmogorov's LIL. SPL 14, 321-325.
Tomkins, R. J. (1996) Refinement of a 0-1 law for maxima. SPL 27, 67-69.
Tong, Y. L. (1980) Probability Inequalities in Multivariate Distributions. Acad. Press, New York.
Too, Y. H. and Lin, G. D. (1989) Characterizations of uniform and exponential distributions. SPL 7, 357-359.
Tsirelson, B. S. (1975) An example of a stochastic equation having no strong solution. TPA 20, 416-418.
Tsokos, C. (1972) Probability Distributions: An Introduction to Probability Theory with Applications. Duxbury Press, Belmont (CA).
Twardowska, K. (1991) Personal communication.
Tweedie, R. L. (1975) Sufficient conditions for ergodicity and recurrence of Markov chains on a general state space. SPA 3, 385-403.
Tzokov, V. S. (1996) Personal communication.
Vahaniya, N. N., Tarieladze, V. I. and Chobanyan, S. A. (1989) Probability Distributions in Banach Spaces. Kluwer Acad. Publ., Dordrecht. (Russian edn 1985)
Van der Hoeven, P. C. T. (1983) On Point Processes. 165. Math. Centre Tracts, Amsterdam.
Van Eeden, C. (1989) Personal communication.
Vandev, D. L. (1986) Personal communication.
Vasudeva, R. (1984) Chover's law of the iterated logarithm and weak convergence. Acta Math. Hung. 44, 215-221.
Verbitskaya, I. N. (1966) On conditions for the applicability of the SLLN to wide sense stationary processes. TPA 11, 632-636.
Vitale, R. A. (1978) Joint vs individual normality. Math Magazine 51, 123.
Walsh, J. B. (1982) A non-reversible semimartingale. LNM 920, 212.
Wang, A. (1977) Quadratic variation of functionals of Brownian motion. AP 5, 756-769.
Wang, Y. H. (1979) Dependent random variables with independent subsets. AMM 86, 290-292.
Wang, Y. H. (1990) Dependent random variables with independent subsets-II. Canad. Math. Bull. 33, 24-28.
Wang, Y. H., Stoyanov, J. and Shao, Q.-M. (1993) On independence and dependence properties of sets of random events. AmS 47, 112-115.
Wang, Zh. (1982) A remark on the condition of integrability in quadratic mean for second order random processes. Chinese Ann. Math. 3, 349-352. (In Chinese)
Washburn, R. B. and Willsky, A. S. (1981) Optional sampling of submartingales indexed by partially ordered sets. AP 9, 957-970.
Wentzell, A. D. (1981) A Course in the Theory of Stochastic Processes. McGraw-Hill, New York. (Russian edn 1975)
Whittaker, J. (1991) Graphical Models in Applied Multivariate Statistics. John Wiley & Sons, Chichester.
Williams, D. (1991) Probability with Martingales. Cambr. Univ. Press, Cambridge.
Williams, R. J. (1984) Personal communication.
Williams, R. J. (1985) Reflected Brownian motion in a wedge: semimartingale property. ZW 69, 161-176.
Wintner, A. (1947) The Fourier Transforms of Probability Distributions. Baltimore (MD). (Published by the author.)
Witsenhausen, H. S. (1975) On policy independence of conditional expectations. Inform. Control 28, 65-75.
Wittmann, R. (1985) A general law of iterated logarithm. ZW 68, 521-543.
Wong, C. K. (1972) A note on mutually independent events. AmS 26, April, 27-28.
Wong, E. (1971) Stochastic Processes in Information and Dynamical Systems. McGraw-Hill, New York.
Wright, F. T., Platt, R. D. and Robertson, T. (1977) A strong law for weighted averages of i.i.d. r.v.s. with arbitrarily heavy tails. AP 5, 586-590.
Wrobel, A. (1982) On the almost sure convergence of the square variation of the Brownian motion. Probab. Math. Statist. 3, 97-101.
Yamada, T. and Watanabe, S. (1971) On the uniqueness of solutions of SDEs. I and II. J. Math. Kyoto Univ. 11, 115-167, 553-563.
Yamazaki, M. (1972) Note on stopped average of martingales. Tohoku Math. J. 24, 41-44.
Yanev, G. P. (1993) Personal communication.
Yeh, J. (1973) Stochastic Processes and the Wiener integrals. Marcel Dekker, New York.
Yellott, J. and Iverson, G. J. (1992) Uniqueness properties of higher-order autocorrelation functions. J. Optical Soc. Am. 9, 388-404.
Ying, P. (1988) A note on independence of random events and random variables. Natural Sci. J. Hunan Normal Univ. 11, 19-21. (In Chinese)
Yor, M. (1978) Un exemple de processus qui n'est pas une semi-martingale. Asterisque 52-53, 219-221.
Yor, M. (1986, 1996) Personal communications.
Yor, M. (1989) De nouveaux resultats sur l'equation de Tsirelson. C. R. Acad. Sci. Paris, Ser. I 309, 511-514.
Yor, M. (1992) Some Aspects of Brownian Motion. Part I: Some Special Functionals. Birkhäuser, Basel.
Yor, M. (1996) Some Aspects of Brownian Motion. Part II: Recent Martingale Problems. Birkhäuser, Basel.
Zabczyk, J. (1986) Personal communication.
Zanella, A. (1990) Personal communication.
Zaremba, P. (1983) Embedding of semimartingales and Brownian motion. Litovsk. Mat. Sbornik 23(1), 96-100.
Zbaganu, G. (1985) Personal communication.
Zieba, W. (1993) Some special properties of conditional expectations. Acta Math. Hungar. 62, 385-393.
Zolotarev, V. M. (1961) Generalization of the Kolmogorov inequality. Issled. Mech. Prikl. Matem. (MITl) 7, 162-166. (In Russian)
Zolotarev, V. M. (1986) One-dimensional Stable Distributions. Am. Math. Soc., Providence (RI). (Russian edn 1983)
Zolotarev, V. M. (1989) Personal communication.
Zolotarev, V. M. and Korolyuk, V. S. (1961) On a hypothesis proposed by B. V. Gnedenko. TPA 6, 431-434.
Zubkov, A. M. (1986) Personal communication.
Zvonkin, A. K. and Krylov, N. V. (1981) On strong solutions of SDEs. Selecta Math. Sovietica 1, 19-61. (Russian publication 1975)
Zygmund, A. (1947) A remark on characteristic functions. AMS 18, 272-276.
Zygmund, A. (1968) Trigonometric Series Vols 1 and 2 (2nd edn). Cambr. Univ. Press, Cambridge.