MEASURES, INTEGRALS AND MARTINGALES
This is a concise and elementary introduction to measure and integration theory as it is nowadays needed in many parts of analysis and probability theory. The basic theory – measures, integrals, convergence theorems, $L^p$-spaces and multiple integrals – is explored in the first part of the book. The second part then uses the notion of martingales to develop the theory further, covering topics such as Jacobi’s general transformation theorem, the Radon–Nikodým theorem, differentiation of measures, Hardy–Littlewood maximal functions or general Fourier series. Undergraduate calculus and an introductory course on rigorous analysis in $\mathbb{R}$ are the only essential prerequisites, making this text suitable both for lecture courses and for self-study. Numerous illustrations and exercises are included, and these are not merely drill problems but are there to consolidate what has already been learnt and to discover variants, sideways and extensions to the main material. Hints and solutions will be available on the internet.

René Schilling is Professor of Stochastics at the University of Marburg.
MEASURES, INTEGRALS AND MARTINGALES RENÉ L. SCHILLING
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521850155
© Cambridge University Press 2005
This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published in print format 2005
ISBN-13 978-0-511-34456-5 eBook (EBL)
ISBN-10 0-511-34456-2 eBook (EBL)
ISBN-13 978-0-521-85015-5 hardback
ISBN-10 0-521-85015-0 hardback
ISBN-13 978-0-521-61525-9 paperback
ISBN-10 0-521-61525-9 paperback
Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Prelude
Dependence chart
1 Prologue
   Problems
2 The pleasures of counting
   Problems
3 $\sigma$-algebras
   Problems
4 Measures
   Problems
5 Uniqueness of measures
   Problems
6 Existence of measures
   Problems
7 Measurable mappings
   Problems
8 Measurable functions
   Problems
9 Integration of positive functions
   Problems
10 Integrals of measurable functions and null sets
   Null sets and the ‘a.e.’
   Problems
11 Convergence theorems and their applications
   Parameter-dependent integrals
   Riemann vs. Lebesgue integration
   Examples
   Problems
12 The function spaces $\mathcal{L}^p$, $1 \leq p \leq \infty$
   Problems
13 Product measures and Fubini’s theorem
   More on measurable functions
   Distribution functions
   Minkowski’s inequality for integrals
   Problems
14 Integrals with respect to image measures
   Convolutions
   Problems
15 Integrals of images and Jacobi’s transformation rule
   Jacobi’s transformation formula
   Spherical coordinates and the volume of the unit ball
   Continuous functions are dense in $\mathcal{L}^p(\lambda^n)$
   Regular measures
   Problems
16 Uniform integrability and Vitali’s convergence theorem
   Different forms of uniform integrability
   Problems
17 Martingales
   Problems
18 Martingale convergence theorems
   Problems
19 The Radon–Nikodým theorem and other applications of martingales
   The Radon–Nikodým theorem
   Martingale inequalities
   The Hardy–Littlewood maximal theorem
   Lebesgue’s differentiation theorem
   The Calderón–Zygmund lemma
   Problems
20 Inner product spaces
   Problems
21 Hilbert space
   Problems
22 Conditional expectations in $L^2$
   On the structure of subspaces of $L^2$
   Problems
23 Conditional expectations in $L^p$
   Classical conditional expectations
   Separability criteria for the spaces $L^p(X, \mathcal{A}, \mu)$
   Problems
24 Orthonormal systems and their convergence behaviour
   Orthogonal polynomials
   The trigonometric system and Fourier series
   The Haar system
   The Haar wavelet
   The Rademacher functions
   Well-behaved orthonormal systems
   Problems
Appendix A: lim inf and lim sup
Appendix B: Some facts from point-set topology
   Topological spaces
   Metric spaces
   Normed spaces
Appendix C: The volume of a parallelepiped
Appendix D: Non-measurable sets
Appendix E: A summary of the Riemann integral
   The (proper) Riemann integral
   The fundamental theorem of integral calculus
   Integrals and limits
   Improper Riemann integrals
Further reading
References
Notation index
Name and subject index
Prelude
The purpose of this book is to give a straightforward and yet elementary introduction to measure and integration theory that is within the grasp of second or third year undergraduates. Indeed, apart from interest in the subject, the only prerequisites for Chapters 1–13 are a course on rigorous ($\epsilon$-$\delta$)-analysis on the real line and basic notions of linear algebra and calculus in $\mathbb{R}^n$. The first few chapters form a concise (not to say minimalist) introduction to Lebesgue’s approach to measure and integration, based on a 10-week, 30-hour lecture course for Sussex University mathematics undergraduates.

Chapters 14–24 are more advanced and contain a selection of results from measure theory, probability theory and analysis. This material can be read linearly but it is also possible to select certain topics; see the dependence chart on page xi. Although more challenging than the first part, the prerequisites stay essentially the same and a reader who has worked through and understood Chapters 1–13 will be well prepared for all that follows. At some points, one or another concept from point-set topology will be (mostly superficially) needed; those readers who are not familiar with the topic can look up the basic results in Appendix B whenever the need arises.

Each chapter is followed by a section of Problems. They are not just drill exercises but contain variants, excursions from and extensions of the material presented in the text. The proofs of the core material do not depend on any of the problems and it is an exception that I refer to a problem in one of the proofs. Nevertheless I do advise you to attempt as many problems as possible. The material in the Appendices – on upper and lower limits, basic topology and the Riemann integral – is primarily intended as back-up, for when you want to look something up.

Unlike many textbooks this is not an introduction to integration for analysts or a probabilistic measure theory. I want to reach both (future) analysts and (future) probabilists, and to provide a foundation which will be useful for both communities and for further, more specialized, studies. It goes without saying that I have to leave out many pet choices of each discipline. On the other hand, I try to intertwine the subjects as far as possible, resulting – mostly in the latter part of the book – in the consistent use of the martingale machinery which gives ‘probabilistic’ proofs of ‘analytic’ results.

Measure and integration theory is often seen as an abstract and dry subject, disliked by many students. There are several reasons for this. One of them is certainly the fact that measure theory has traditionally been based on a thorough knowledge of real analysis in one and several dimensions. Many excellent textbooks are written for such an audience but today’s undergraduates find it increasingly hard to follow such tracts, which are often more aptly labelled graduate texts. Another reason lies within the subject: measure theory has come a long way and is, in its modern purist form, stripped of its motivating roots. If, for example, one starts out with the basic definition of measures, it takes unreasonably long until one arrives at interesting examples of measures – the proof of existence and uniqueness of something as basic as Lebesgue measure already needs the full abstract machinery – and it is not easy to entertain by constantly referring to examples made up of delta functions and artificial discrete measures. I try to alleviate this by postulating the existence and properties of Lebesgue measure early on, then justifying the claims as we proceed with the abstract theory.

Technically, measure and integration theory is no more difficult than, say, complex function theory or vector calculus. Most proofs are even shorter and have a very clear structure.
The one big exception, Carathéodory’s extension theorem, can be safely stated without proof since an understanding of the technique is not really needed at the beginning; we will refer to the details of it only in Chapter 14 in connection with regularity questions. The other exception is the (classical proof of the) Radon–Nikodým theorem, but we will follow a different route in this book and use martingales to prove this and other results. I am grateful to all students who went to my classes, challenged me to write, rewrite and improve this text and who drew my attention – sometimes unbeknownst to them – to many weaknesses. I owe a great debt to the patience and interest of my colleagues, in particular to Niels Jacob, Nick Bingham, David Edmunds and Alexei Tyukov who read the whole text, and to Charles Goldie and Alex Sobolev who commented on large parts of the manuscript. Without their encouragement and help there would be more obscure passages, blunders and typos in the pages to follow. It is a pleasure to acknowledge the interest and skill of the Cambridge University Press and its editor, Roger Astley, in the preparation of this book.
A few words on notation before getting started. I tried to keep unusual and special notation to a minimum. However, a few remarks are in order: $\mathbb{N}$ means the natural numbers $\{1, 2, 3, \ldots\}$ and $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. Positive or negative is always understood in the non-strict sense $\geq 0$ or $\leq 0$; to exclude 0, I say strictly positive/negative. A ‘+’ as sub- or superscript refers to the positive part of a function or the positive members of a set. Finally, $a \vee b$ resp. $a \wedge b$ denote the maximum resp. minimum of the numbers $a, b \in \mathbb{R}$. For any other general notation there is a comprehensive index of notation at the end of the book.

In some statements I indicate alternatives using square brackets, i.e., ‘if A [B] … then P [Q]’ should be read as ‘if A … then P’ and ‘if B … then Q’. The end of a proof is marked by Halmos’ ‘tombstone’ symbol □, and Bourbaki’s ‘dangerous bend’ symbol ☡ in the margin identifies a passage which requires some attention. As with every book, one cannot give all the details at every instance. On the other hand, the less experienced reader might glide over these places without even noticing that some extra effort is needed; for these readers – and, hopefully, not to the annoyance of all others – I use the symbol [✓] to indicate where some little verification is appropriate.

Cross-referencing. Throughout the text chapters are numbered with arabic numerals and appendices with capital letters. Formulae are numbered (n.k), referring to formula k from Chapter n. For theorems and the like I write n.m for Theorem m from Chapter n. The abbreviation Tn.m is sometimes used for Theorem n.m (with D standing for Definition, L for Lemma, P for Proposition and C for Corollary).
Dependence chart

Chapters 2–12 contain core material which is needed in all later chapters. Prerequisites within Chapters 13–24 are shown by arrows; dashed arrows indicate a minor dependence.

[Figure: dependence chart with the following nodes: §13.1–9 Product measure, Fubini T.; §13.10; §13.11–13 Distribution functions; §13.14; §14.1–3 Image measure & integrals; §14.4–8 Convolution; §15.1–4 Integrals of direct images; §15.5–15 Jacobi’s transformation T. (needs pf. of T. 5.1); §15.16–17 $C_c$ is dense in $L^p$; §15.18–20 Regularity of measures; §16.1–7 Uniform integrability, Vitali; §16.8–9 Different forms of UI; §17 Martingales; §18 Martingale convergence; §19.1–9 Radon–Nikodým theorem; §19.11–12 Martingale inequalities; §19.14–18 Maximal functions; §19.20–21 Lebesgue differentiation T.; §19.22 Calderón–Zygmund lemma; §§20, 21 Inner products, Hilbert space; §22.1–4 Cond. expectation in $L^2$; §22.5 Structure of subspaces of $L^2$; §23.1–13 Cond. expectation in $L^p$; §23.14–18 Martingales & cond. expectation; §23.19–21 Separability of $L^p$; §24.1–15 Orth. polynomials, Fourier series; §24.16–18 Haar functions; §24.19–20 Haar wavelets; §24.21–23 Rademacher functions; §24.24–28 Well-behaved ONSs; §24.29 Brownian motion.]
1 Prologue
The theme of this book is the problem of how to assign a size, a content, a probability, etc. to certain sets. In everyday life this is usually pretty straightforward; we
• count: $\{a, b, c, \ldots, x, y, z\}$ has 26 letters;
• take measurements: length (in one dimension), area (in two dimensions), volume (in three dimensions) or time;
• calculate: rates of radioactive decay or the odds of winning the lottery.
In each case we compare (and express the outcome) with respect to some base unit; most of the measurements just mentioned are intuitively clear. Nevertheless, let’s have a closer look at areas:

[Figure: rectangle with side lengths $\ell$ and $w$]

$$\text{area} = \text{length}\,(\ell) \times \text{width}\,(w) \tag{1.1}$$

An even more flexible shape than the rectangle is the triangle:

[Figure: triangle with base $b$ and height $h$]

$$\text{area} = \tfrac{1}{2} \times \text{base}\,(b) \times \text{height}\,(h) \tag{1.2}$$
Triangles are indeed more basic than rectangles since we can represent every rectangle, and actually any odd-shaped quadrangle, as the ‘sum’ of two non-overlapping triangles:

[Figure: quadrangle split by a diagonal into a shaded and a white triangle]

$$\text{area} = \text{area of shaded triangle} + \text{area of white triangle} \tag{1.3}$$

In doing so we have tacitly assumed a few things. In (1.2) we have chosen a particular base line and the corresponding height arbitrarily. But the concept of area should not depend on such a choice and the calculation this choice entails. Independence of the area from the way we calculate it is called well-definedness. Plainly,

[Figure: triangle with its three base lines $b_1, b_2, b_3$ and the corresponding heights $h_1, h_2, h_3$]

$$\text{area} = \tfrac{1}{2} \times h_1 \times b_1 = \tfrac{1}{2} \times h_2 \times b_2 = \tfrac{1}{2} \times h_3 \times b_3 \tag{1.4}$$

Notice that (1.4) allows us to pick the most convenient method to work out the area. In (1.3) we actually used two facts:
• the area of non-overlapping (disjoint) sets can be added, i.e. $\text{area}(A) = \alpha$, $\text{area}(B) = \beta$, $A \cap B = \emptyset$ $\implies$ $\text{area}(A \cup B) = \alpha + \beta$;
• congruent triangles have the same area.
This shows that the least we should expect from a reasonable measure $\mu$ is that it is

well-defined, takes values in $[0, \infty]$, and $\mu(\emptyset) = 0$; (1.5)

additive, i.e. $\mu(A \cup B) = \mu(A) + \mu(B)$ whenever $A \cap B = \emptyset$. (1.6)

The additional property that the measure $\mu$

is invariant under congruences (1.7)

turns out to be a very special property of length, area and volume, i.e. of Lebesgue measure on $\mathbb{R}^n$. The above rules allow us to measure arbitrarily odd-looking polygons using the following recipe: dissect the polygon into non-overlapping triangles and add their areas. But what about curved or even more complicated shapes, say,

[Figure: curved and irregular shapes]

?
Here is one possibility for the circle: inscribe a regular $2^j$-gon, $j \in \mathbb{N}$, into the circle, subdivide it into congruent triangles, find the area of each of these slices and then add all $2^j$ pieces. In the next step increase $j \rightsquigarrow j + 1$ by doubling the number of points on the circumference and repeat the above procedure. Eventually,

$$\text{area of circle} = \lim_{j \to \infty} 2^j \times \text{area}\,(\text{triangle at step } j) \tag{1.8}$$

[Figure: circle with one inscribed triangular slice of central angle $2\pi/2^j$ rad]
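The limit in (1.8) can be explored numerically. The following is a minimal sketch, assuming the elementary formula $\tfrac12 r^2 \sin\theta$ for the area of an isosceles triangle with two sides of length $r$ enclosing the angle $\theta$; that formula and the helper name `polygon_area` are illustrative assumptions and not part of the text.

```python
import math

def polygon_area(j: int, r: float = 1.0) -> float:
    """Area of the regular 2**j-gon inscribed in a circle of radius r.

    The polygon is cut into 2**j congruent triangles with apex angle
    2*pi / 2**j at the centre; each has area (1/2) * r**2 * sin(angle).
    """
    n = 2 ** j
    angle = 2 * math.pi / n
    return n * 0.5 * r * r * math.sin(angle)

if __name__ == "__main__":
    # The printed values increase with j and settle near pi for r = 1.
    for j in range(2, 12):
        print(j, polygon_area(j))
```

A numerical experiment of this kind suggests that the limit exists, but it does not replace the argument asked for in Problem 1.2.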
Again, there are a few problems: does the limit exist? Is it admissible to subdivide a set into arbitrarily many subsets? Is the procedure independent of the particular subdivision? In fact, nothing would have prevented us from paving the circle with ever smaller squares! For a reasonable notion of measure the answer to all of these questions should be yes and the way we pave the circle should not lead to different results, as long as our tiles are disjoint. However, finite additivity (1.6) is not enough for this and we have to use instead $\sigma$-additivity:

$$\text{area}\Big(\dot{\bigcup_{j \in \mathbb{N}}} A_j\Big) = \sum_{j \in \mathbb{N}} \text{area}(A_j) \tag{1.9}$$

where the notation $\dot{\bigcup}_j A_j$ means the disjoint union of the sets $A_j$, i.e. the union where the sets $A_j$ are pairwise disjoint: $A_j \cap A_k = \emptyset$ if $j \neq k$; a corresponding notation is used for unions of finitely many sets.

We will see that conditions (1.5) and (1.9) lead to the notion of measure which is powerful enough to cater for all our everyday measuring needs and for much more. We will also see that a good notion of measure allows us to introduce integrals, basically starting with the naïve idea that the integral of a positive function should stand for the area of the set between the graph of the function and the abscissa.

Problems

1.1. Consider the two figures below.

[Figure: two dissections of a region into pieces; the second shows an extra square]

They seem to indicate that there is no conclusive way to exhaust an area by squares (see the extra square in the second figure). Can that be?

1.2. Use (1.8) to find the area of a circle with radius $r$.
2 The pleasures of counting
Set algebra and countability play a major rôle in measure theory. In this chapter we briefly review notation and manipulations with sets and then introduce the notion of countability. If you are not already acquainted with set algebra, you should verify all statements in this chapter and work through the exercises.

Throughout this chapter $X$ and $Y$ denote two arbitrary sets. For any two sets $A, B$ (which are not necessarily subsets of a common set) we write

$$A \cup B = \{x : x \in A \text{ or } x \in B \text{ or } x \in A \text{ and } B\},$$
$$A \cap B = \{x : x \in A \text{ and } x \in B\},$$
$$A \setminus B = \{x : x \in A \text{ and } x \notin B\};$$

in particular we write $A \mathbin{\dot{\cup}} B$ for the disjoint union, i.e. for $A \cup B$ if $A \cap B = \emptyset$. $A \subset B$ means that $A$ is contained in $B$ including the possibility that $A = B$; to exclude the latter we write $A \subsetneq B$. If $A \subset X$, we set $A^c := X \setminus A$ for the complement of $A$ (relative to $X$). Recall also the distributive laws for $A, B, C \subset X$

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C), \qquad A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \tag{2.1}$$

and de Morgan’s identities

$$(A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c \tag{2.2}$$

which also hold for arbitrarily many sets $A_i \subset X$, $i \in I$ ($I$ stands for an arbitrary index set),

$$\Big(\bigcap_{i \in I} A_i\Big)^c = \bigcup_{i \in I} A_i^c, \qquad \Big(\bigcup_{i \in I} A_i\Big)^c = \bigcap_{i \in I} A_i^c. \tag{2.3}$$
A map $f : X \to Y$ is called

injective (or one-one) $\iff$ $f(x) = f(x') \implies x = x'$,
surjective (or onto) $\iff$ $f(X) := \{f(x) \in Y : x \in X\} = Y$,
bijective $\iff$ $f$ is injective and surjective.

Set operations and direct images under a map $f$ are not necessarily compatible: indeed, we have, in general,

$$f(A \cup B) = f(A) \cup f(B), \qquad f(A \cap B) \neq f(A) \cap f(B), \qquad f(A \setminus B) \neq f(A) \setminus f(B). \tag{2.4}$$

Inverse images and set operations are, however, always compatible. For $C, C_i, D \subset Y$ one has

$$f^{-1}\Big(\bigcup_{i \in I} C_i\Big) = \bigcup_{i \in I} f^{-1}(C_i), \qquad f^{-1}\Big(\bigcap_{i \in I} C_i\Big) = \bigcap_{i \in I} f^{-1}(C_i), \qquad f^{-1}(C \setminus D) = f^{-1}(C) \setminus f^{-1}(D). \tag{2.5}$$

If we have more information about $f$ we can, of course, say more.

2.1 Lemma $f : X \to Y$ is injective if, and only if, $f(A \cap B) = f(A) \cap f(B)$ for all $A, B \subset X$.

Proof ‘$\Rightarrow$’: Since $f(A \cap B) \subset f(A)$ and $f(A \cap B) \subset f(B)$, we have always $f(A \cap B) \subset f(A) \cap f(B)$. Let us check the converse inclusion ‘$\supset$’. If $y \in f(A) \cap f(B)$, we have $y = f(a)$ and $y = f(b)$ for some $a \in A$, $b \in B$. So, $f(a) = y = f(b)$ and, by injectivity, $a = b$. This means that $a = b \in A \cap B$, hence $y \in f(A \cap B)$ and $f(A) \cap f(B) \subset f(A \cap B)$ follows.
‘$\Leftarrow$’: Take $x, x' \in X$ with $f(x) = f(x')$ and set $A := \{x\}$, $B := \{x'\}$. Then $\emptyset \neq f(\{x\}) \cap f(\{x'\}) = f(\{x\} \cap \{x'\})$ which is only possible if $\{x\} \cap \{x'\} \neq \emptyset$, i.e. if $x = x'$. This shows that $f$ is injective. □

2.2 Lemma $f : X \to Y$ is injective if, and only if, $f(X \setminus A) = f(X) \setminus f(A)$ for all $A \subset X$.

Proof ‘$\Rightarrow$’: Assume that $f$ is injective. We show first that $f(x) \notin f(A)$ if, and only if, $x \notin A$. Indeed, if $f(x) \notin f(A)$, then $x \notin A$; if $x \notin A$ but $f(x) \in f(A)$, then we can find some $a \in A$ such that $f(a) = f(x) \in f(A)$. Since $f$ is injective, $x = a \in A$ and we have found a contradiction. Thus

$$f(X) \setminus f(A) = \{y \in Y : y = f(x),\ f(x) \notin f(A)\} = \{y \in Y : y = f(x),\ x \notin A\} = f(X \setminus A).$$

‘$\Leftarrow$’: Let $f(x) = f(x')$ and assume that $x \neq x'$. Then $f(x) \in f(X \setminus \{x'\}) = f(X) \setminus f(\{x'\})$ which cannot happen as $f(x) \in f(\{x'\})$. □

We can now start with the main topic of this chapter: counting.

2.3 Definition Two sets $X, Y$ have the same cardinality if there exists a bijection $f : X \to Y$. In this case we write $\#X = \#Y$. If there is an injection $g : X \to Y$, we say that the cardinality of $X$ is less than or equal to the cardinality of $Y$ and write $\#X \leq \#Y$. If $\#X \leq \#Y$ but $\#X \neq \#Y$, we say that $X$ is of strictly smaller cardinality than $Y$ and write $\#X < \#Y$ (in this case, no injection $g : X \to Y$ can be surjective).

That Definition 2.3 is indeed counting becomes clear if we choose $Y = \mathbb{N}$ since in this case $\#X = \#\mathbb{N}$ or $\#X \leq \#\mathbb{N}$ just means that we can label each $x \in X$ with a unique tag from the set $\{1, 2, 3, \ldots\}$, i.e. we are numbering $X$. This particular example is, in fact, of central importance.

2.4 Definition A set $X$ is countable if $\#X \leq \#\mathbb{N}$. If $\#\mathbb{N} < \#X$, the set $X$ is said to be uncountable. The cardinality of $\mathbb{N}$ is called $\aleph_0$, aleph null.

Plainly, Definition 2.4 requires that we can find for every countable set some enumeration $X = (x_1, x_2, x_3, \ldots)$ which may or may not be finite (and which may contain any $x_j$ more than once).

Caution: Some authors reserve the word countable for the situation where $\#X = \#\mathbb{N}$ while sets where $\#X \leq \#\mathbb{N}$ are called at most countable or finite or countable. This has the effect that a countable set is always infinite. We do not adopt this convention.
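The rules (2.4) and (2.5) and the criterion of Lemma 2.1 can be checked by brute force on small finite sets. The sketch below is only an illustration: the helper names `image`, `preimage`, `subsets` and `images_respect_intersections`, as well as the two sample maps, are assumptions made for this example and do not appear in the text.

```python
from itertools import combinations

def image(f, A):
    """Direct image f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}

def preimage(f, X, C):
    """Inverse image f^{-1}(C) = {x in X : f(x) in C}."""
    return {x for x in X if f(x) in C}

def subsets(X):
    """All subsets of the finite set X."""
    items = list(X)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def images_respect_intersections(f, X):
    """Check f(A & B) == f(A) & f(B) for all A, B in P(X), cf. Lemma 2.1."""
    return all(image(f, A & B) == image(f, A) & image(f, B)
               for A in subsets(X) for B in subsets(X))

X = {-2, -1, 0, 1, 2}
square = lambda x: x * x      # not injective on X
shift = lambda x: x + 10      # injective on X

A, B = {-2, -1}, {1, 2}
# A and B are disjoint, yet their images under `square` coincide.
print(image(square, A & B), image(square, A) & image(square, B))
print(images_respect_intersections(square, X))   # False
print(images_respect_intersections(shift, X))    # True

# Inverse images commute with set differences, cf. (2.5).
C, D = {0, 1}, {1, 4}
print(preimage(square, X, C - D)
      == preimage(square, X, C) - preimage(square, X, D))  # True
```

The non-injective map fails the intersection test while the injective one passes it, in line with Lemma 2.1; the inverse-image identity holds for both.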
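The following examples show that (countable) sets with infinitely many elements can behave strangely.

2.5 Examples (i) Finite sets are countable: $\{a, b, \ldots, z\} \to \{1, 2, \ldots, 26\}$, where $a \leftrightarrow 1, \ldots, z \leftrightarrow 26$, is bijective and $\{1, 2, 3, \ldots, 26\} \to \mathbb{N}$ is clearly an injection. Thus

$$\#\{a, b, c, \ldots, z\} = \#\{1, 2, 3, \ldots, 26\} \leq \#\mathbb{N}.$$

(ii) The even numbers are countable. This follows from the fact that the map

$$f : \{2, 4, 6, \ldots, 2j, \ldots\} \to \mathbb{N}, \qquad k \mapsto \frac{k}{2}$$

is an injection and even a bijection.[✓] This means that there are ‘as many’ even numbers as there are natural numbers.

(iii) The set of integers $\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$ is countable. The counting scheme is shown in the figure (run through in clockwise orientation starting from 0)

[Figure: the points $\ldots, -2, -1, 0, 1, 2, \ldots$ on a line, joined by arrows in the order $0, 1, -1, 2, -2, \ldots$]

or, more formally,

$$g : k \in \mathbb{Z} \mapsto \begin{cases} 2k & \text{if } k > 0, \\ 2|k| + 1 & \text{if } k \leq 0, \end{cases}$$

hence $\#\mathbb{Z} \leq \#\mathbb{N}$.[✓]

(iv) The Cartesian product $\mathbb{N} \times \mathbb{N} = \{(j, k) : j, k \in \mathbb{N}\}$ is countable. To see this, arrange the pairs $(j, k)$ in an array and count along the diagonals:

(1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)  ...
(2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)  ...
(3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)  ...
(4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)  ...
(5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)  ...
  ...     ...     ...     ...     ...

[Figure: arrows run along the diagonals of the array, which are numbered 1, 2, 3, 4, 5, ...]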
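Notice that each line contains only finitely many elements, so that each diagonal can be dealt with in finitely many steps. The map for the above counting scheme is given by

$$h : (j, k) \mapsto \frac{(j + k)(j + k - 1)}{2} - k + 1 \in \mathbb{N}, \qquad (j, k) \in \mathbb{N} \times \mathbb{N}. \tag{2.6}$$

(v) The rational numbers $\mathbb{Q}$ are countable. To see this, set $\mathbb{Q}^{\pm} := \{q \in \mathbb{Q} : \pm q > 0\}$. Every element $\frac{m}{n} \in \mathbb{Q}^+$ can be identified with at least one pair $(m, n) \in \mathbb{N} \times \mathbb{N}$, so that

$$\mathbb{Q}^+ \subset \Big\{ \underbrace{\tfrac{1}{1}}_{1},\ \underbrace{\tfrac{1}{2}, \tfrac{2}{1}}_{2},\ \underbrace{\tfrac{1}{3}, \tfrac{2}{2}, \tfrac{3}{1}}_{3},\ \underbrace{\tfrac{1}{4}, \tfrac{2}{3}, \tfrac{3}{2}, \tfrac{4}{1}}_{4},\ \ldots \Big\};$$

in the set on the right we distinguish between cancelled and uncancelled forms of a rational, i.e. $\frac{6}{18}, \frac{1}{3}, \frac{2}{6}, \frac{3}{9}$, etc. are counted whenever they appear. The numbers below the braces refer to the corresponding diagonals in the counting scheme in part (iv). This shows that we can find an injection of $\mathbb{Q}^+$ into the set on the right and from there into $\mathbb{N} \times \mathbb{N}$; the set $\mathbb{N} \times \mathbb{N}$ is countable, thus $\mathbb{Q}^+$ is countable[✓] and so is $\mathbb{Q}^-$. Finally,

$$\mathbb{Q} = \mathbb{Q}^- \mathbin{\dot{\cup}} \{0\} \mathbin{\dot{\cup}} \mathbb{Q}^+ = \{r_1, r_2, r_3, \ldots\} \mathbin{\dot{\cup}} \{0\} \mathbin{\dot{\cup}} \{q_1, q_2, q_3, \ldots\}$$

and $p_1 := 0$, $p_{2k} := q_k$, $p_{2k+1} := r_k$ gives an enumeration $(p_1, p_2, p_3, \ldots)$ of $\mathbb{Q}$.

2.6 Theorem Let $A_1, A_2, A_3, \ldots$ be countably many countable sets. Then $A = \bigcup_{j \in \mathbb{N}} A_j$ is countable, i.e. countable unions of countable sets are countable.

Proof Since each $A_j$ is countable we can find an enumeration $A_j = (a_{j,1}, a_{j,2}, \ldots, a_{j,k}, \ldots)$ (if $A_j$ is a finite set, we repeat the last element of the list infinitely often), so that

$$A = \bigcup_{j \in \mathbb{N}} A_j = \big(a_{j,k} : (j, k) \in \mathbb{N} \times \mathbb{N}\big).$$

Using Example 2.5(iv) we can relabel $\mathbb{N} \times \mathbb{N}$ by $\mathbb{N}$ and (after deleting all duplicates) we have found an enumeration. □

It is not hard to see that for cardinalities ‘$\leq$’ is reflexive ($\#A \leq \#A$) and transitive ($\#A \leq \#B$, $\#B \leq \#C$ $\implies$ $\#A \leq \#C$). Antisymmetry, which makes ‘$\leq$’ into a partial order relation, is less obvious. The proof of the following important result is somewhat technical and can be left out at first reading.

2.7 Theorem (Cantor–Bernstein) Let $X, Y$ be two sets. If $\#X \leq \#Y$ and $\#Y \leq \#X$, then $\#X = \#Y$.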
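As an aside before the proof, the counting schemes of Example 2.5(iii) and (iv) can be sketched in a few lines. The function names `g` and `h` mirror the maps of the text; `diagonal_pairs` is a hypothetical helper introduced only for this illustration, and the case $k \leq 0$ of `g` is read as $2|k| + 1$.

```python
def g(k: int) -> int:
    """Example 2.5(iii): 0, 1, -1, 2, -2, ... receive the tags 1, 2, 3, 4, 5, ..."""
    return 2 * k if k > 0 else 2 * abs(k) + 1

def h(j: int, k: int) -> int:
    """Formula (2.6): position of the pair (j, k) when counting along diagonals."""
    # (j + k) * (j + k - 1) is a product of consecutive integers, hence even.
    return (j + k) * (j + k - 1) // 2 - k + 1

def diagonal_pairs(n_diagonals: int):
    """List the pairs (j, k) diagonal by diagonal, as in Example 2.5(iv)."""
    pairs = []
    for d in range(2, n_diagonals + 2):   # d = j + k is constant on a diagonal
        for j in range(1, d):
            pairs.append((j, d - j))
    return pairs

if __name__ == "__main__":
    print([g(k) for k in (0, 1, -1, 2, -2)])       # [1, 2, 3, 4, 5]
    print(diagonal_pairs(3))                       # first three diagonals
    print([h(j, k) for j, k in diagonal_pairs(3)]) # [1, 2, 3, 4, 5, 6]
```

Listing the pairs along the diagonals and applying `h` returns the positions 1, 2, 3, ... in order, which is what (2.6) claims for the initial segment that was checked.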
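Proof By assumption,

$\#X \leq \#Y$ $\iff$ there exists an injection $f : X \to Y$,
$\#Y \leq \#X$ $\iff$ there exists an injection $g : Y \to X$.

In order to prove $\#X = \#Y$ we have to construct a bijection $h : X \to Y$.

Step 1. Without loss of generality we may assume that $Y \subset X$. Indeed, since $g : Y \to g(Y)$ is a bijection, we know that $\#Y = \#g(Y)$ and it is enough to show $\#g(Y) = \#X$. As $g(Y) \subset X$ we can simplify things and identify $g(Y)$ with $Y$, i.e. assume that $g = \mathrm{id}$ or, equivalently, $Y \subset X$.

Step 2. Let $Y \subset X$ and $g = \mathrm{id}$. Recursively we define

$$X_0 := X, \quad \ldots, \quad X_{j+1} := f(X_j), \qquad Y_0 := Y, \quad \ldots, \quad Y_{j+1} := f(Y_j).$$

As usual, we write $f^j := \underbrace{f \circ f \circ \cdots \circ f}_{j \text{ times}}$ and $f^0 := \mathrm{id}$. Then

$$X_{j+1} = f^{j+1}(X) = f^j(f(X)) \overset{f(X) \subset Y}{\subset} f^j(Y) = Y_j \overset{Y \subset X}{\subset} f^j(X) = X_j$$

and we can define a map $h : X \to Y$ by

$$h(x) := \begin{cases} f(x) & \text{if } x \in X_j \setminus Y_j \text{ for some } j \in \mathbb{N}_0, \\ x & \text{if } x \notin \bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j. \end{cases}$$

Step 3. The map $h$ is surjective: $h(X) = Y$. Indeed, we have by definition (all complements are taken relative to $X$, and the second equality uses that $f$ is injective, cf. Lemma 2.2)

$$\begin{aligned}
h(X) &= \bigcup_{j \in \mathbb{N}_0} f(X_j \setminus Y_j) \cup \Big(\bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j\Big)^c \\
&= \bigcup_{j \in \mathbb{N}_0} \big(f(X_j) \setminus f(Y_j)\big) \cup \Big[(X \setminus Y) \cup \bigcup_{j \in \mathbb{N}} X_j \setminus Y_j\Big]^c \\
&= \underbrace{\bigcup_{j \in \mathbb{N}_0} \big(X_{j+1} \setminus Y_{j+1}\big)}_{=: A} \cup \Big[(X \setminus Y)^c \cap \Big(\bigcup_{j \in \mathbb{N}_0} X_{j+1} \setminus Y_{j+1}\Big)^c\Big] \\
&= A \cup \big[(X^c \cup Y) \cap A^c\big] = A \cup (Y \cap A^c) = Y \cap X = Y
\end{aligned}$$

where we used that $A = \bigcup_{j \in \mathbb{N}_0} X_{j+1} \setminus Y_{j+1} \subset \bigcup_{j \in \mathbb{N}_0} X_{j+1} \subset X_1 = f(X) \subset Y$.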
Step 4. The map $h$ is injective. To see this, let $x, x' \in X$ and $h(x) = h(x')$. We have four possibilities:
(a) $x \in X_j \setminus Y_j$ and $x' \in X_k \setminus Y_k$ for some $j, k \in \mathbb{N}_0$. Then $f(x) = h(x) = h(x') = f(x')$ so that $x = x'$ since $f$ is injective.
(b) $x, x' \notin \bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j$. Then $x = h(x) = h(x') = x'$.
(c) $x \in X_j \setminus Y_j$ for some $j \in \mathbb{N}_0$ and $x' \notin X_k \setminus Y_k$ for all $k \in \mathbb{N}_0$. As $f(x) = h(x) = h(x') = x'$ we see, again using that $f$ is injective,
$$x' = f(x) \in f(X_j \setminus Y_j) = f(X_j) \setminus f(Y_j) = X_{j+1} \setminus Y_{j+1}$$
which is impossible, i.e. (c) cannot occur.
(d) $x' \in X_j \setminus Y_j$ for some $j \in \mathbb{N}_0$ and $x \notin X_k \setminus Y_k$ for all $k \in \mathbb{N}_0$. This is analogous to (c). □

Theorem 2.7 says that $\#X < \#Y$ and $\#Y < \#X$ cannot occur at the same time; it does not claim that we can compare the cardinality of any two sets $X$ and $Y$, i.e. that ‘$\leq$’ is a linear ordering. This is indeed true but its proof requires the axiom of choice, see Hewitt and Stromberg [20, p. 19].

Not all sets are countable. The following proof goes back to G. Cantor and is called Cantor’s diagonal method.

2.8 Theorem The interval $(0, 1)$ is uncountable; its cardinality $\mathfrak{c} := \#(0, 1)$ is called the continuum.

Proof Recall that we can write each $x \in (0, 1)$ as a decimal fraction, i.e. $x = 0.y_1 y_2 y_3 \ldots$ with $y_j \in \{0, 1, \ldots, 9\}$. If $x$ has a finite decimal representation, say $x = 0.y_1 y_2 y_3 \ldots y_n$, $y_n \neq 0$, we replace the last digit $y_n$ by $y_n - 1$ and fill it up with trailing 9s. For example, $0.24 = 0.2399\ldots$. This yields a unique representation of $x$ by an infinite decimal expansion. Assume that $(0, 1)$ were countable and let $(x_1, x_2, \ldots)$ be an enumeration (containing no element more than once!). Then we can write

$$\begin{aligned}
x_1 &= 0.a_{11} a_{12} a_{13} a_{14} \ldots \\
x_2 &= 0.a_{21} a_{22} a_{23} a_{24} \ldots \\
x_3 &= 0.a_{31} a_{32} a_{33} a_{34} \ldots \\
x_4 &= 0.a_{41} a_{42} a_{43} a_{44} \ldots \\
&\ \ \vdots
\end{aligned} \tag{2.7}$$

and construct a new number $x := 0.y_1 y_2 y_3 \ldots \in (0, 1)$ with digits

$$y_j := \begin{cases} 1 & \text{if } a_{jj} = 5, \\ 5 & \text{if } a_{jj} \neq 5. \end{cases} \tag{2.8}$$
By construction, $x \neq x_j$ for any $x_j$ from the list (2.7): $x$ and $x_j$ differ at the $j$th decimal. But then we have found a number $x \in (0, 1)$ which is not contained in our supposedly complete enumeration of $(0, 1)$ and we have arrived at a contradiction. □

By $(0, 1)^{\mathbb{N}}$ we denote the set of all sequences $(x_j)_{j \in \mathbb{N}}$ where $x_j \in (0, 1)$.

2.9 Theorem We have $\#(0, 1)^{\mathbb{N}} = \mathfrak{c}$.

Proof We have to assign to every sequence $(x_j)_{j \in \mathbb{N}} \subset (0, 1)$ a unique number $x \in (0, 1)$ – and vice versa. For this we write, as in the proof of Theorem 2.8, each $x_j$ as a unique infinite decimal fraction

$$x_j = 0.a_{j1} a_{j2} a_{j3} a_{j4} \ldots, \qquad j \in \mathbb{N},$$

and we organize the array $(a_{jk})_{j,k \in \mathbb{N}}$ into one sequence with the help of the counting scheme of Example 2.5(iv):

$$x := 0.\underbrace{a_{11}}_{1}\ \underbrace{a_{12} a_{21}}_{2}\ \underbrace{a_{13} a_{22} a_{31}}_{3}\ \underbrace{a_{14} a_{23} a_{32} a_{41}}_{4} \ldots$$

(The numbers below the braces refer to the corresponding diagonals in the counting scheme of Example 2.5(iv).) Since the counting scheme was bijective, this procedure is reversible, i.e. we can start with the decimal expansion of $x \in (0, 1)$ and get a unique sequence of $x_j$s. We have thus found a bijection between $(0, 1)^{\mathbb{N}}$ and $(0, 1)$. □
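The diagonal rule (2.8) from the proof of Theorem 2.8 can be tried out on a finite array of digits. This is only a sketch on a truncated list: the helper name `diagonal_digits` and the sample digits are assumptions chosen for illustration.

```python
def diagonal_digits(rows):
    """Apply rule (2.8) to a square array of decimal digits.

    rows[j][k] plays the role of a_{j+1, k+1}; the result differs from
    rows[j] in position j for every j.
    """
    return [1 if rows[j][j] == 5 else 5 for j in range(len(rows))]

# Hypothetical first four digits of four numbers x1, ..., x4 in (0, 1).
rows = [
    [5, 1, 2, 3],   # x1 = 0.5123...
    [3, 3, 3, 3],   # x2 = 0.3333...
    [1, 4, 1, 5],   # x3 = 0.1415...
    [9, 9, 5, 5],   # x4 = 0.9955...
]
y = diagonal_digits(rows)
print(y)  # [1, 5, 5, 1]
```

Because the new digits are only ever 1 or 5, the constructed expansion has no trailing 0s or 9s, so the ambiguity between finite and infinite decimal representations does not arise.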
We write $\mathcal{P}(X)$ for the power set $\{A : A \subset X\}$ which is the family of all subsets of a given set $X$. For finite sets it is clear that the power set is of strictly larger cardinality than $X$. This is still true for infinite sets.

2.10 Theorem For any set $X$ we have $\#X < \#\mathcal{P}(X)$.

Proof We have to show that no injection $\Phi : X \to \mathcal{P}(X)$ can be surjective. Fix such an injection and define $B := \{x \in X : x \notin \Phi(x)\}$ (mind: $\Phi(x)$ is a set!). Clearly $B \in \mathcal{P}(X)$. If $\Phi$ were surjective, $B = \Phi(z)$ for some element $z \in X$. Then, however,

$$z \in B \overset{\text{def}}{\iff} z \notin \Phi(z) \iff z \notin B \qquad (\text{since } \Phi(z) = B)$$

which is impossible. Thus $\Phi$ cannot be surjective. □
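For a small finite set the argument can be checked exhaustively. The sketch below enumerates every map from a three-element set into its power set and confirms that the set $B$ from the proof is never attained; the names `power_set` and `cantor_set` and the choice $X = \{0, 1, 2\}$ are illustrative assumptions.

```python
from itertools import combinations, product

def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def cantor_set(X, phi):
    """B = {x in X : x not in phi(x)} as in the proof of Theorem 2.10."""
    return frozenset(x for x in X if x not in phi[x])

X = {0, 1, 2}
P = power_set(X)
points = sorted(X)

# A map phi: X -> P(X) is a choice of one subset per point (8**3 = 512 maps).
missed = all(
    cantor_set(X, dict(zip(points, choice))) not in choice
    for choice in product(P, repeat=len(points))
)
print(len(P), missed)  # 8 True
```

The check runs over all maps, not only injections; the argument of the proof never uses injectivity when showing that $B$ is missed.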
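Problems

2.1. Let $A, B, C \subset X$ be sets. Show that
(i) $A \setminus B = A \cap B^c$;
(ii) $(A \setminus B) \setminus C = A \setminus (B \cup C)$;
(iii) $A \setminus (B \setminus C) = (A \setminus B) \cup (A \cap C)$;
(iv) $A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$;
(v) $A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$.

2.2. Let $A, B, C \subset X$. The symmetric difference of $A$ and $B$ is defined as $A \mathbin{\triangle} B := (A \setminus B) \cup (B \setminus A)$. Verify that
$$(A \cup B \cup C) \setminus (A \cap B \cap C) = (A \mathbin{\triangle} B) \cup (B \mathbin{\triangle} C).$$

2.3. Prove de Morgan’s identities (2.2) and (2.3).

2.4. (i) Find examples which illustrate that $f(A \cap B) \neq f(A) \cap f(B)$ and $f(A \setminus B) \neq f(A) \setminus f(B)$. In both relations one inclusion ‘$\subset$’ or ‘$\supset$’ is always true. Which one?
(ii) Prove (2.5).

2.5. The indicator function of a set $A \subset X$ is defined by
$$1_A(x) := \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$
Check that
(i) $1_{A \cap B} = 1_A 1_B$;
(ii) $1_{A \cup B} = \min\{1_A + 1_B, 1\}$;
(iii) $1_{A \setminus B} = 1_A - 1_{A \cap B}$;
(iv) $1_{A \cup B} = 1_A + 1_B - 1_{A \cap B}$;
(v) $1_{A \cup B} = \max\{1_A, 1_B\}$;
(vi) $1_{A \cap B} = \min\{1_A, 1_B\}$.

2.6. Let $A, B, C \subset X$ and denote by $A \mathbin{\triangle} B$ the symmetric difference as in Problem 2.2. Show that
(i) $1_{A \triangle B} = 1_A + 1_B - 2 \cdot 1_A 1_B = 1_A + 1_B \bmod 2$;
(ii) $A \mathbin{\triangle} (B \mathbin{\triangle} C) = (A \mathbin{\triangle} B) \mathbin{\triangle} C$;
(iii) $\mathcal{P}(X)$ is a commutative ring (in the usual algebraists’ sense) with ‘addition’ $\triangle$ and ‘multiplication’ $\cap$.
[Hint: use indicator functions for (ii) and (iii).]

2.7. Let $f : X \to Y$ be a map, $A \subset X$ and $B \subset Y$. Show that, in general,
$$f \circ f^{-1}(B) \subset B \qquad \text{and} \qquad f^{-1} \circ f(A) \supset A.$$
When does ‘=’ hold in these relations? Provide an example showing that the above inclusions can be strict.

2.8. Let $f$ and $g$ be two injective maps. Show that $f \circ g$, if it exists, is injective.

2.9. Show that the following sets have the same cardinality as $\mathbb{N}$: $\{m \in \mathbb{N} : m \text{ is odd}\}$, $\mathbb{N} \times \mathbb{Z}$, $\mathbb{Q}^m$ ($m \in \mathbb{N}$), $\bigcup_{m \in \mathbb{N}} \mathbb{Q}^m$.

2.10. Use Theorem 2.7 to show that $\#\mathbb{N} \times \mathbb{N} = \#\mathbb{N}$.
[Hint: $\#\mathbb{N} = \#(\mathbb{N} \times \{1\})$ and $\mathbb{N} \times \{1\} \subset \mathbb{N} \times \mathbb{N}$.]

2.11. Show that if $E \subset F$ we have $\#E \leq \#F$. In particular, subsets of countable sets are again countable.

2.12. Show that $\{0, 1\}^{\mathbb{N}} = \{\text{all infinite sequences consisting of 0 and 1}\}$ is uncountable.
[Hint: diagonal method.]

2.13. Show that the set $\mathbb{R}$ is uncountable and that $\#(0, 1) = \#\mathbb{R}$.
[Hint: find a bijection $f : (0, 1) \to \mathbb{R}$.]

2.14. Let $(A_j)_{j \in \mathbb{N}}$ be a sequence of sets of cardinality $\mathfrak{c}$. Show that $\#\bigcup_{j \in \mathbb{N}} A_j = \mathfrak{c}$.
[Hint: map $A_j$ bijectively onto $(j - 1, j)$ and use that $(0, 1) \subset \bigcup_{j=1}^{\infty} (j - 1, j) \subset \mathbb{R}$.]

2.15. Adapt the proof of Theorem 2.8 to show that $\#\{1, 2\}^{\mathbb{N}} \leq \#(0, 1) \leq \#\{0, 1\}^{\mathbb{N}}$ and conclude that $\#(0, 1) = \#\{0, 1\}^{\mathbb{N}}$.
Remark. This is the reason for writing $\mathfrak{c} = 2^{\aleph_0}$.
[Hint: interpret $\{0, 1\}^{\mathbb{N}}$ as base-2 expansions of all numbers in $(0, 1)$ while $\{1, 2\}^{\mathbb{N}}$ are all infinite base-3 expansions lacking the digit 0.]

2.16. Extend Problem 2.15 to deduce $\#\{0, 1, 2, \ldots, n\}^{\mathbb{N}} = \#(0, 1)$ for all $n \in \mathbb{N}$.

2.17. Mimic the proof of Theorem 2.9 to show that $\#(0, 1)^2 = \mathfrak{c}$. Use the fact that $\#\mathbb{R} = \#(0, 1)$ to conclude that $\#\mathbb{R}^2 = \mathfrak{c}$.

2.18. Show that the set of all infinite sequences of natural numbers $\mathbb{N}^{\mathbb{N}}$ has cardinality $\mathfrak{c}$.
[Hint: use that $\#\{0, 1\}^{\mathbb{N}} = \#\{1, 2\}^{\mathbb{N}}$, $\{1, 2\}^{\mathbb{N}} \subset \mathbb{N}^{\mathbb{N}} \subset \mathbb{R}^{\mathbb{N}}$ and $\#\mathbb{R}^{\mathbb{N}} = \#(0, 1)^{\mathbb{N}}$.]

2.19. Let $\mathcal{F} := \{F \subset \mathbb{N} : \#F < \infty\}$. Show that $\#\mathcal{F} = \#\mathbb{N}$.
[Hint: embed $\mathcal{F}$ into $\bigcup_{k \in \mathbb{N}} \mathbb{N}^k$ or show that $F \mapsto \sum_{j \in F} 2^j$ is a bijection between $\mathcal{F}$ and $\mathbb{N}$.]

2.20. Show – not using Theorem 2.10 – that $\#\mathcal{P}(\mathbb{N}) > \#\mathbb{N}$. Conclude that there are more than countably many maps $f : \mathbb{N} \to \mathbb{N}$.
[Hint: diagonal method.]

2.21. If $A \subset \mathbb{N}$ we can identify the indicator function $1_A : \mathbb{N} \to \{0, 1\}$ with the 0-1-sequence $(1_A(j))_{j \in \mathbb{N}}$, i.e., $1_A \in \{0, 1\}^{\mathbb{N}}$. Show that the map $\mathcal{P}(\mathbb{N}) \ni A \mapsto 1_A \in \{0, 1\}^{\mathbb{N}}$ is a bijection and conclude that $\#\mathcal{P}(\mathbb{N}) = \mathfrak{c}$.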
3 -algebras
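We have seen in the prologue that a reasonable measure should be able to deal with countable partitions. Therefore, a measure function should be defined on a system of sets which is stable whenever we repeat any of the basic set operations ($\cup$, $\cap$, ${}^c$) countably many times.

3.1 Definition A σ-algebra $\mathcal{A}$ on a set $X$ is a family of subsets of $X$ with the following properties:

$$X \in \mathcal{A} \qquad (\Sigma_1)$$
$$A \in \mathcal{A} \implies A^c \in \mathcal{A} \qquad (\Sigma_2)$$
$$(A_j)_{j\in\mathbb{N}} \subset \mathcal{A} \implies \bigcup_{j\in\mathbb{N}} A_j \in \mathcal{A} \qquad (\Sigma_3)$$

A set $A \in \mathcal{A}$ is said to be ($\mathcal{A}$-)measurable.

3.2 Properties (of a σ-algebra)
(i) $\emptyset \in \mathcal{A}$. Indeed: $\emptyset = X^c \in \mathcal{A}$ by $(\Sigma_1)$, $(\Sigma_2)$.
(ii) $A, B \in \mathcal{A} \implies A \cup B \in \mathcal{A}$. Indeed: set $A_1 = A$, $A_2 = B$, $A_3 = A_4 = \dots = \emptyset$. Then $A \cup B = \bigcup_{j\in\mathbb{N}} A_j \in \mathcal{A}$ by $(\Sigma_3)$.
(iii) $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A} \implies \bigcap_{j\in\mathbb{N}} A_j \in \mathcal{A}$. Indeed: if $A_j \in \mathcal{A}$, then $A_j^c \in \mathcal{A}$ by $(\Sigma_2)$, hence $\bigcup_{j\in\mathbb{N}} A_j^c \in \mathcal{A}$ by $(\Sigma_3)$ and, again by $(\Sigma_2)$, $\bigcap_{j\in\mathbb{N}} A_j = \big(\bigcup_{j\in\mathbb{N}} A_j^c\big)^c \in \mathcal{A}$.

3.3 Examples
(i) $\mathcal{P}(X)$ is a σ-algebra (the maximal σ-algebra in $X$).
(ii) $\{\emptyset, X\}$ is a σ-algebra (the minimal σ-algebra in $X$).
(iii) $\{\emptyset, B, B^c, X\}$, $B \subset X$, is a σ-algebra.
(iv) $\{\emptyset, B, X\}$ is no σ-algebra (unless $B = \emptyset$ or $B = X$).
(v) $\mathcal{A} := \{A \subset X : \#A \le \#\mathbb{N} \text{ or } \#A^c \le \#\mathbb{N}\}$ is a σ-algebra.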
R.L. Schilling
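Proof: Let us verify $(\Sigma_1)$–$(\Sigma_3)$.
$(\Sigma_1)$: $X^c = \emptyset$ which is certainly countable.
$(\Sigma_2)$: if $A \in \mathcal{A}$, either $A$ or $A^c$ is by definition countable, so $A^c \in \mathcal{A}$.
$(\Sigma_3)$: if $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$, then two cases can occur:
• All $A_j$ are countable. Then $A = \bigcup_{j\in\mathbb{N}} A_j$ is a countable union of countable sets which is, by T2.6, itself countable.
• At least one $A_{j_0}$ is uncountable. Then $A_{j_0}^c$ must be countable, so that

$$\Big(\bigcup_{j\in\mathbb{N}} A_j\Big)^c = \bigcap_{j\in\mathbb{N}} A_j^c \subset A_{j_0}^c.$$

Hence $\big(\bigcup_{j\in\mathbb{N}} A_j\big)^c$ is countable (Problem 2.11) and so $\bigcup_{j\in\mathbb{N}} A_j \in \mathcal{A}$.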
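(vi) (trace σ-algebra) Let $E \subset X$ be any set and let $\mathcal{A}$ be some σ-algebra in $X$. Then

$$\mathcal{A}_E := E \cap \mathcal{A} := \{E \cap A : A \in \mathcal{A}\} \qquad (3.1)$$

is a σ-algebra in $E$.
(vii) (pre-image σ-algebra) Let $f : X \to X'$ be a map and let $\mathcal{A}'$ be a σ-algebra in $X'$. Then

$$\mathcal{A} := f^{-1}(\mathcal{A}') := \{f^{-1}(A') : A' \in \mathcal{A}'\}$$

is a σ-algebra in $X$.

3.4 Theorem (and Definition) (i) The intersection $\bigcap_{i\in I} \mathcal{A}_i$ of arbitrarily many σ-algebras $\mathcal{A}_i$ in $X$ is again a σ-algebra in $X$.
(ii) For every system of sets $\mathcal{G} \subset \mathcal{P}(X)$ there exists a smallest (also: minimal, coarsest) σ-algebra containing $\mathcal{G}$. This is the σ-algebra generated by $\mathcal{G}$, denoted by $\sigma(\mathcal{G})$, and $\mathcal{G}$ is called its generator.

Proof (i) We check $(\Sigma_1)$–$(\Sigma_3)$. $(\Sigma_1)$: since $X \in \mathcal{A}_i$ for all $i \in I$, $X \in \bigcap_i \mathcal{A}_i$. $(\Sigma_2)$: if $A \in \bigcap_i \mathcal{A}_i$, then $A^c \in \mathcal{A}_i$ for all $i \in I$, so $A^c \in \bigcap_i \mathcal{A}_i$. $(\Sigma_3)$: let $(A_k)_{k\in\mathbb{N}} \subset \bigcap_i \mathcal{A}_i$. Then $A_k \in \mathcal{A}_i$ for all $k \in \mathbb{N}$ and all $i \in I$, hence $\bigcup_{k\in\mathbb{N}} A_k \in \mathcal{A}_i$ for each $i \in I$ and so $\bigcup_{k\in\mathbb{N}} A_k \in \bigcap_{i\in I} \mathcal{A}_i$.
(ii) Consider the family

$$\mathcal{A} := \bigcap_{\substack{\mathcal{C} \text{ σ-alg.} \\ \mathcal{C} \supset \mathcal{G}}} \mathcal{C}.$$

Since $\mathcal{G} \subset \mathcal{P}(X)$ and since $\mathcal{P}(X)$ is a σ-algebra, the above intersection is nonvoid. This means that the definition of $\mathcal{A}$ makes sense and yields, by part (i), a σ-algebra containing $\mathcal{G}$. If $\mathcal{A}'$ is a further σ-algebra with $\mathcal{A}' \supset \mathcal{G}$, then $\mathcal{A}'$ would be included in the intersection used for the definition of $\mathcal{A}$, hence $\mathcal{A} \subset \mathcal{A}'$. In this sense, $\mathcal{A}$ is the smallest σ-algebra containing $\mathcal{G}$.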
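On a finite set $X$ the generated σ-algebra can be computed by brute force, which may help to build intuition for Theorem 3.4(ii). The following is only a sketch under that finiteness assumption: instead of intersecting all σ-algebras containing the generator (as in the proof), it closes the generator under complements and pairwise unions until nothing new appears; on a finite set both routes give the same family. The function name `generate_sigma_algebra` is a hypothetical helper introduced for illustration.

```python
from itertools import combinations

def generate_sigma_algebra(X, generator):
    """Close `generator` under complement and union (valid for finite X only)."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(G) for G in generator}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # closure under complements
        for A in current:
            if X - A not in family:
                family.add(X - A)
                changed = True
        # closure under (pairwise, hence finite) unions
        for A, B in combinations(current, 2):
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family

X = {1, 2, 3, 4}
sigma_single = generate_sigma_algebra(X, [{1}])       # generated by one set A = {1}
sigma_two = generate_sigma_algebra(X, [{1}, {2}])     # generated by {1} and {2}
print(sorted(map(sorted, sigma_single)))
print(len(sigma_two))
```

For a single generating set $A$ the result consists of $\emptyset$, $A$, $A^c$ and $X$; for the two singletons the atoms are $\{1\}$, $\{2\}$, $\{3,4\}$ and the family has $2^3$ members.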
Measures, Integrals and Martingales
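3.5 Remarks (i) If $\mathcal{G}$ is a σ-algebra, then $\mathcal{G} = \sigma(\mathcal{G})$.
(ii) For $A \subset X$ we have $\sigma(\{A\}) = \{\emptyset, A, A^c, X\}$.
(iii) If $\mathcal{F} \subset \mathcal{G} \subset \mathcal{A}$, then $\sigma(\mathcal{F}) \subset \sigma(\mathcal{G}) \subset \sigma(\mathcal{A}) = \mathcal{A}$, where the last equality holds by 3.5(i).

On the Euclidean space $\mathbb{R}^n$ there is a canonical σ-algebra, which is generated by the open sets. Recall that

$$U \subset \mathbb{R}^n \text{ is open} \iff \forall\, x \in U \ \exists\, \varepsilon > 0 : B_\varepsilon(x) \subset U$$

where $B_\varepsilon(x) := \{y \in \mathbb{R}^n : |x - y| < \varepsilon\}$ is the open ball with centre $x$ and radius $\varepsilon$. A set is closed if its complement is open. The system of open sets $\mathcal{O}^n$ in $X = \mathbb{R}^n$ has the following properties:

$$\emptyset, X \in \mathcal{O}^n \qquad (\mathcal{O}_1)$$
$$U, V \in \mathcal{O}^n \implies U \cap V \in \mathcal{O}^n \qquad (\mathcal{O}_2)$$
$$U_i \in \mathcal{O}^n,\ i \in I \text{ (arbitrary)} \implies \bigcup_{i\in I} U_i \in \mathcal{O}^n \qquad (\mathcal{O}_3)$$

Note, however, that countable or arbitrary intersections of open sets need not be open[✓]. A family $\mathcal{O}$ of subsets of a general space $X$ satisfying the conditions $(\mathcal{O}_1)$–$(\mathcal{O}_3)$ is called a topology, and the pair $(X, \mathcal{O})$ is called a topological space; in analogy to $\mathbb{R}^n$, $U \in \mathcal{O}$ is said to be open while closed sets are exactly the complements of open sets; see Appendix B.

3.6 Definition The σ-algebra $\sigma(\mathcal{O}^n)$ generated by the open sets $\mathcal{O}^n$ of $\mathbb{R}^n$ is called Borel σ-algebra, and its members are the Borel sets or Borel measurable sets. We write $\mathcal{B}^n$ or $\mathcal{B}(\mathbb{R}^n)$ for the Borel sets in $\mathbb{R}^n$.

The Borel sets are fundamental for the study of measures on $\mathbb{R}^n$. Since the Borel σ-algebra depends on the topology of $\mathbb{R}^n$, $\mathcal{B}(\mathbb{R}^n)$ is often also called the topological σ-algebra.

3.7 Theorem Denote by $\mathcal{O}^n$, $\mathcal{C}^n$ and $\mathcal{K}^n$ the families of open, closed and compact¹ sets in $\mathbb{R}^n$. Then

$$\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{O}^n) = \sigma(\mathcal{C}^n) = \sigma(\mathcal{K}^n).$$

Proof Since compact sets are closed, we have $\mathcal{K}^n \subset \mathcal{C}^n$ and by Remark 3.5(iii), $\sigma(\mathcal{K}^n) \subset \sigma(\mathcal{C}^n)$. On the other hand, if $C \in \mathcal{C}^n$, then $C_k := C \cap \overline{B_k(0)}$ is² closed and bounded, hence $C_k \in \mathcal{K}^n$. By construction $C = \bigcup_{k\in\mathbb{N}} C_k$, thus $\mathcal{C}^n \subset \sigma(\mathcal{K}^n)$ and also $\sigma(\mathcal{C}^n) \subset \sigma(\mathcal{K}^n)$.

¹ i.e. closed and bounded.
² $B_k(0)$, $\overline{B_k(0)}$ denote the open, resp., closed balls with centre 0 and radius $k$.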
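Since $(\mathcal{O}^n)^c := \{U^c : U \in \mathcal{O}^n\} = \mathcal{C}^n$ (and $(\mathcal{C}^n)^c = \mathcal{O}^n$) we have $\mathcal{C}^n = (\mathcal{O}^n)^c \subset \sigma(\mathcal{O}^n)$, hence $\sigma(\mathcal{C}^n) \subset \sigma(\mathcal{O}^n)$ and the converse inclusion is similar.

The Borel σ-algebra $\mathcal{B}(\mathbb{R}^n)$ is generated by many different systems of sets. For our purposes the most interesting generators are the families of open rectangles

$$\mathcal{J}^o = \mathcal{J}^{o,n} = \mathcal{J}^o(\mathbb{R}^n) = \{(a_1, b_1) \times \dots \times (a_n, b_n) : a_j, b_j \in \mathbb{R}\}$$

and (from the right) half-open rectangles

$$\mathcal{J} = \mathcal{J}^n = \mathcal{J}(\mathbb{R}^n) = \{[a_1, b_1) \times \dots \times [a_n, b_n) : a_j, b_j \in \mathbb{R}\}.$$

We use the convention that $[a_j, b_j) = (a_j, b_j) = \emptyset$ if $b_j \le a_j$ and, of course, that $[a_1, b_1) \times \dots \times \emptyset \times \dots \times [a_n, b_n) = \emptyset$. Sometimes we use the shorthand $[\![a, b)\!) = [a_1, b_1) \times \dots \times [a_n, b_n)$ for vectors $a = (a_1, \dots, a_n)$, $b = (b_1, \dots, b_n)$ from $\mathbb{R}^n$. Finally, we write $\mathcal{J}_{rat}$, $\mathcal{J}^o_{rat}$ for the (half-)open rectangles with only rational endpoints. Notice that the half-open rectangles are intervals in $\mathbb{R}$, rectangles in $\mathbb{R}^2$, cuboids in $\mathbb{R}^3$, and hypercubes in dimensions $n > 3$.

[Figure: a half-open interval in $\mathbb{R}$, a rectangle in $\mathbb{R}^2$ and a cuboid in $\mathbb{R}^3$ with corners $a$ and $b$.]

3.8 Theorem We have $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}^n_{rat}) = \sigma(\mathcal{J}^{o,n}_{rat}) = \sigma(\mathcal{J}^n) = \sigma(\mathcal{J}^{o,n})$.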
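Proof We begin with open rectangles having rational endpoints. Since the open rectangle $(\!(a, b)\!)$ is an open set[✓], we find $\sigma(\mathcal{O}^n) \supset \sigma(\mathcal{J}^o) \supset \sigma(\mathcal{J}^o_{rat})$. Conversely, if $U \in \mathcal{O}^n$, we have

$$U = \bigcup_{I \in \mathcal{J}^o_{rat},\ I \subset U} I. \qquad (3.2)$$

[Figure: an open set $U$, a ball $B_\varepsilon(x) \subset U$ and a rectangle $I(x)$ inside the ball.]

Here '$\supset$' is clear from the definition and for the other direction '$\subset$' we fix $x \in U$. Since $U$ is open, there is some ball $B_\varepsilon(x) \subset U$ (see the picture) and we can inscribe a square into $B_\varepsilon(x)$ and then shrink this square to get a rectangle $I = I(x) \in \mathcal{J}^o_{rat}$ containing $x$. Since every rectangle $I$ is uniquely determined by its main diagonal, there are at most $\#(\mathbb{Q}^n \times \mathbb{Q}^n) = \#\mathbb{N}$ many $I$ in the union (3.2). Thus $U \in \sigma(\mathcal{J}^o_{rat})$, proving the other inclusion $\mathcal{O}^n \subset \sigma(\mathcal{J}^o_{rat})$, and so $\sigma(\mathcal{O}^n) = \sigma(\mathcal{J}^o_{rat}) = \sigma(\mathcal{J}^o)$.

Every half-open rectangle (with rational endpoints) can be written as

$$[a_1, b_1) \times \dots \times [a_n, b_n) = \bigcap_{j\in\mathbb{N}} \big(a_1 - \tfrac{1}{j}, b_1\big) \times \dots \times \big(a_n - \tfrac{1}{j}, b_n\big)$$

while every open rectangle (with rational endpoints) can be represented as

$$(c_1, d_1) \times \dots \times (c_n, d_n) = \bigcup_{j\in\mathbb{N}} \big[c_1 + \tfrac{1}{j}, d_1\big) \times \dots \times \big[c_n + \tfrac{1}{j}, d_n\big).$$

These formulae imply that $\mathcal{J} \subset \sigma(\mathcal{J}^o)$ and $\mathcal{J}^o \subset \sigma(\mathcal{J})$ [resp. $\mathcal{J}_{rat} \subset \sigma(\mathcal{J}^o_{rat})$ and $\mathcal{J}^o_{rat} \subset \sigma(\mathcal{J}_{rat})$], hence by Remark 3.5(iii), $\sigma(\mathcal{J}) = \sigma(\mathcal{J}^o)$ [resp. $\sigma(\mathcal{J}_{rat}) = \sigma(\mathcal{J}^o_{rat})$] and the proof follows since we know $\sigma(\mathcal{J}^o_{rat}) = \sigma(\mathcal{J}^o) = \sigma(\mathcal{O}^n)$ from the first part.

3.9 Remark The Borel sets of the real line $\mathbb{R}$ are also generated by any of the following systems:

$\{(-\infty, a) : a \in \mathbb{Q}\}$, $\{(-\infty, a) : a \in \mathbb{R}\}$,
$\{(-\infty, b] : b \in \mathbb{Q}\}$, $\{(-\infty, b] : b \in \mathbb{R}\}$,
$\{(c, \infty) : c \in \mathbb{Q}\}$, $\{(c, \infty) : c \in \mathbb{R}\}$,
$\{[d, \infty) : d \in \mathbb{Q}\}$, $\{[d, \infty) : d \in \mathbb{R}\}$.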
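3.10 Remark One might think that $\sigma(\mathcal{G})$ can be explicitly constructed for any given $\mathcal{G}$ by adding to the family $\mathcal{G}$ all possible countable unions of its members and their complements:

$$\mathcal{G}^{\sigma c} := \Big\{ \bigcup_{j\in\mathbb{N}} G_j,\ \Big(\bigcup_{j\in\mathbb{N}} G_j\Big)^c : G_j \in \mathcal{G} \Big\}.$$

But $\mathcal{G}^{\sigma c}$ is not necessarily a σ-algebra.[✓] Even if we repeat this procedure countably often, i.e. if we form

$$\bigcup_{n\in\mathbb{N}} \mathcal{G}_n, \qquad \mathcal{G}_n := \underbrace{(\dots(\mathcal{G}^{\sigma c})^{\sigma c}\dots)^{\sigma c}}_{n \text{ times}},$$

we end up, in general, with a set that is too small: $\bigcup_{n\in\mathbb{N}} \mathcal{G}_n \subsetneq \sigma(\mathcal{G})$.³

³ A 'constructive' approach along these lines is nevertheless possible if we use transfinite induction, see Hewitt and Stromberg [20, Theorem 10.23] or Appendix D.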
This shows that the σ-operation produces a pretty big family; so big, in fact, that no approach using countably many countable set operations will give the whole of $\sigma(\mathcal{G})$. On the other hand, it is rather typical that a σ-algebra is given through its generator. In order to deal with these cases, we need the notion of Dynkin systems which will be introduced in Chapter 5.
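Problems
3.1. Let $\mathcal{A}$ be a σ-algebra. Show that (i) if $A_1, A_2, \dots, A_N \in \mathcal{A}$, then $A_1 \cap A_2 \cap \dots \cap A_N \in \mathcal{A}$; (ii) $A \in \mathcal{A}$ if, and only if, $A^c \in \mathcal{A}$; (iii) if $A, B \in \mathcal{A}$, then $A \setminus B \in \mathcal{A}$ and $A \,\triangle\, B \in \mathcal{A}$.
3.2. Prove the assertions made in Example 3.3 (iv), (vi) and (vii). [Hint: use (2.5) for (vii).]
3.3. Verify the assertions made in Remark 3.5.
3.4. Let $X = [0, 1]$. Find the σ-algebra generated by the sets (i) $(0, \tfrac12)$; (ii) $[0, \tfrac14)$, $(\tfrac34, 1]$; (iii) $[0, \tfrac34]$, $[\tfrac14, 1]$.
3.5. Let $A_1, A_2, \dots, A_N$ be subsets of $X$. (i) If the $A_j$ are disjoint and $\dot\bigcup_j A_j = X$, then $\#\sigma(A_1, A_2, \dots, A_N) = 2^N$. Remark. A set $A$ in a σ-algebra $\mathcal{A}$ is called an atom, if there is no proper subset $\emptyset \ne B \subsetneq A$ such that $B \in \mathcal{A}$. In this sense all $A_j$ are atoms. (ii) Show that $\sigma(A_1, A_2, \dots, A_N)$ consists of finitely many sets. [Hint: show that $\sigma(A_1, A_2, \dots, A_N)$ has only finitely many atoms.]
3.6. Verify the properties $(\mathcal{O}_1)$–$(\mathcal{O}_3)$ for open sets in $\mathbb{R}^n$. Is $\mathcal{O}^n$ a σ-algebra?
3.7. Find an example (e.g. in $\mathbb{R}$) showing that $\bigcap_{j\in\mathbb{N}} U_j$ need not be open even if all $U_j$ are open sets.
3.8. Prove any one of the assertions made in Remark 3.9.
3.9. Is this still true for the family $\{B_r(x) : x \in \mathbb{Q}^n,\ r \in \mathbb{Q}^+\}$? [Hint: mimic the Proof of T3.8.]
3.10. Let $\mathcal{O}^n$ be the collection of open sets (topology) in $\mathbb{R}^n$ and let $A \subset \mathbb{R}^n$ be an arbitrary subset. We can introduce a topology $\mathcal{O}_A$ on $A$ as follows: a set $V \subset A$ is called open (relative to $A$) if $V = U \cap A$ for some $U \in \mathcal{O}^n$. We write $\mathcal{O}_A$ for the open sets relative to $A$. (i) Show that $\mathcal{O}_A$ is a topology on $A$, i.e. a family satisfying $(\mathcal{O}_1)$–$(\mathcal{O}_3)$. (ii) If $A \in \mathcal{B}(\mathbb{R}^n)$, show that the trace σ-algebra $A \cap \mathcal{B}(\mathbb{R}^n)$ coincides with $\sigma(\mathcal{O}_A)$ (the latter is usually denoted by $\mathcal{B}(A)$: the Borel sets relative to $A$).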
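3.11. Monotone classes. A family $\mathcal{M} \subset \mathcal{P}(X)$ is called a monotone class if it is stable under countable unions and countable intersections, i.e.

$$(A_j)_{j\in\mathbb{N}} \subset \mathcal{M} \implies \bigcup_{j\in\mathbb{N}} A_j,\ \bigcap_{j\in\mathbb{N}} A_j \in \mathcal{M}.$$

(i) Mimic the proof of T3.4 to show that for every $\mathcal{G} \subset \mathcal{P}(X)$ there is a smallest monotone class $\mathfrak{m}(\mathcal{G})$ containing $\mathcal{G}$. (ii) Assume that $\emptyset \in \mathcal{G}$ and that $E \in \mathcal{G} \implies E^c \in \mathcal{G}$. Show that the system $\Sigma := \{B \in \mathfrak{m}(\mathcal{G}) : B^c \in \mathfrak{m}(\mathcal{G})\}$ is a σ-algebra. (iii) Show that in (ii) $\mathcal{G} \subset \Sigma \subset \mathfrak{m}(\mathcal{G}) \subset \sigma(\mathcal{G})$ holds and conclude that $\mathfrak{m}(\mathcal{G}) = \sigma(\mathcal{G})$.
3.12. Alternative characterization of $\mathcal{B}(\mathbb{R}^n)$. In older books the Borel sets are often introduced as the smallest family $\mathcal{M}$ of sets which is stable under countable intersections and countable unions and which contains all open sets $\mathcal{O}^n$. The purpose of this exercise is to verify that $\mathcal{M} = \mathcal{B}(\mathbb{R}^n)$. Show that
(i) $\mathcal{M}$ is well-defined and $\mathcal{M} \subset \mathcal{B}(\mathbb{R}^n)$;
(ii) $U \in \mathcal{O}^n \implies U^c \in \mathcal{M}$, i.e. $\mathcal{M}$ contains all closed sets;
(iii) $\{B \in \mathcal{M} : B^c \in \mathcal{M}\}$ is a σ-algebra;
(iv) $\mathcal{O}^n \subset \{B \in \mathcal{M} : B^c \in \mathcal{M}\} \subset \mathcal{M}$.
[Hints: (i) mimic T3.4(ii); (ii) every closed set $F$ is the intersection of the open sets $U_n := F + B_{1/n}(0) := \{y : x \in F,\ |x - y| < 1/n\}$, $n \in \mathbb{N}$.]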
4 Measures
We are now ready to introduce one of the central concepts of measure and integration theory: measures. As before, $X$ is some set and $\mathcal{A}$ is a σ-algebra on $X$.

4.1 Definition A (positive) measure $\mu$ on $X$ is a mapping $\mu : \mathcal{A} \to [0, \infty]$ defined on a σ-algebra $\mathcal{A}$ satisfying

$$\mu(\emptyset) = 0 \qquad (M_1)$$

and, for any countable family of pairwise disjoint sets $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$, σ-additivity:

$$\mu\Big(\dot\bigcup_{j\in\mathbb{N}} A_j\Big) = \sum_{j\in\mathbb{N}} \mu(A_j). \qquad (M_2)$$

If $(M_1)$, $(M_2)$ hold, but $\mathcal{A}$ is not a σ-algebra, $\mu$ is said to be a pre-measure.

Caution: $(M_2)$ requires implicitly that $\dot\bigcup_j A_j$ is again in $\mathcal{A}$. This is clearly the case for σ-algebras, but needs special attention if one deals with pre-measures.

4.2 Definition Let $X$ be a set and $\mathcal{A}$ be a σ-algebra on $X$. The pair $(X, \mathcal{A})$ is called measurable space. If $\mu$ is a measure on $X$, $(X, \mathcal{A}, \mu)$ is called measure space. A finite measure is a measure with $\mu(X) < \infty$ and a probability measure is a measure with $\mu(X) = 1$. The corresponding measure spaces are called finite measure space resp. probability space. An exhausting sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ is an increasing sequence of sets $A_1 \subset A_2 \subset A_3 \subset \dots$ such that $\bigcup_{j\in\mathbb{N}} A_j = X$. A measure $\mu$ is said to be σ-finite and $(X, \mathcal{A}, \mu)$ is called a σ-finite measure space, if $\mathcal{A}$ contains an exhausting sequence $(A_j)_{j\in\mathbb{N}}$ such that $\mu(A_j) < \infty$ for all $j \in \mathbb{N}$.

Let us derive some immediate properties of (pre-)measures.
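4.3 Proposition Let $(X, \mathcal{A}, \mu)$ be a measure space and $A, B \in \mathcal{A}$. Then
(i) $A \cap B = \emptyset \implies \mu(A \mathbin{\dot\cup} B) = \mu(A) + \mu(B)$ (finitely additive)
(ii) $A \subset B \implies \mu(A) \le \mu(B)$ (monotone)
(iii) $A \subset B$, $\mu(A) < \infty \implies \mu(B \setminus A) = \mu(B) - \mu(A)$
(iv) $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$ (strongly additive)
(v) $\mu(A \cup B) \le \mu(A) + \mu(B)$ (subadditive)

Proof (i) Set $A_1 = A$, $A_2 = B$, $A_3 = A_4 = \dots = \emptyset$. Then $(A_j)_{j\in\mathbb{N}}$ is a family of pairwise disjoint sets from $\mathcal{A}$. Moreover $A \mathbin{\dot\cup} B = \dot\bigcup_j A_j$ and by $(M_2)$

$$\mu(A \mathbin{\dot\cup} B) = \mu\Big(\dot\bigcup_{j\in\mathbb{N}} A_j\Big) = \sum_{j\in\mathbb{N}} \mu(A_j) = \mu(A) + \mu(B) + \mu(\emptyset) + \dots = \mu(A) + \mu(B).$$

(ii) If $A \subset B$, we have $B = A \mathbin{\dot\cup} (B \setminus A)$, and by (i)

$$\mu(B) = \mu\big(A \mathbin{\dot\cup} (B \setminus A)\big) = \mu(A) + \mu(B \setminus A) \qquad (4.1)$$
$$\ge \mu(A). \qquad (4.2)$$

(iii) If $A \subset B$, we can subtract the finite number $\mu(A)$ from both sides of (4.1) to get $\mu(B) - \mu(A) = \mu(B \setminus A)$.
(iv) For all $A, B \in \mathcal{A}$ we have

$$A \cup B = \big(A \setminus (A \cap B)\big) \mathbin{\dot\cup} (A \cap B) \mathbin{\dot\cup} \big(B \setminus (A \cap B)\big)$$

and using (i) twice we get

$$\mu(A \cup B) = \mu\big(A \setminus (A \cap B)\big) + \mu(A \cap B) + \mu\big(B \setminus (A \cap B)\big).$$

Adding $\mu(A \cap B)$ (which may assume the value $+\infty$) on both sides and using again (4.1) yields

$$\mu(A \cup B) + \mu(A \cap B) = \mu\big(A \setminus (A \cap B)\big) + \mu(A \cap B) + \mu\big(B \setminus (A \cap B)\big) + \mu(A \cap B) = \mu(A) + \mu(B).$$

(v) From (iv) we get $\mu(A) + \mu(B) = \mu(A \cup B) + \mu(A \cap B) \ge \mu(A \cup B)$ for all $A, B \in \mathcal{A}$.

So far we have not really used the σ-additivity of $\mu$ in its full strength. The next theorem shows that σ-additivity is, in fact, some kind of continuity condition for (pre-)measures.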
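The identities of Proposition 4.3 can be checked numerically on a small example. The sketch below assumes a measure on a three-point set given by point weights (a discrete probability measure of the kind listed among the examples of this chapter); the weights and the names `weights` and `P` are illustrative assumptions, not taken from the text. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

# Hypothetical point weights on the set {"a", "b", "c"}; they sum to 1.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def P(A):
    """Measure of a subset A: the sum of the weights of its points."""
    return sum((weights[w] for w in A), Fraction(0))

A = {"a", "b"}
B = {"b", "c"}

strongly_additive = P(A | B) + P(A & B) == P(A) + P(B)   # Proposition 4.3(iv)
subadditive = P(A | B) <= P(A) + P(B)                     # Proposition 4.3(v)
monotone = P(A & B) <= P(A)                               # Proposition 4.3(ii)
difference_rule = P(A - (A & B)) == P(A) - P(A & B)       # Proposition 4.3(iii)
print(strongly_additive, subadditive, monotone, difference_rule)
```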
We call a sequence of sets $(A_j)_{j\in\mathbb{N}}$ increasing, if $A_1 \subset A_2 \subset A_3 \subset \dots$, and we write in this case $A_j \uparrow A$ with limit $A = \bigcup_j A_j$. Decreasing sequences of sets are defined accordingly and we write $A_j \downarrow A$ with limit $A = \bigcap_j A_j$. All σ-algebras are stable under increasing or decreasing limits of their members.

4.4 Theorem Let $(X, \mathcal{A})$ be a measurable space. A map $\mu : \mathcal{A} \to [0, \infty]$ is a measure if, and only if,
(i) $\mu(\emptyset) = 0$,
(ii) $\mu(A \mathbin{\dot\cup} B) = \mu(A) + \mu(B)$ for all $A, B \in \mathcal{A}$ with $A \cap B = \emptyset$,
(iii) (continuity of measures from below) for any increasing sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ with $A_j \uparrow A \in \mathcal{A}$ we have

$$\mu(A) = \lim_{j\to\infty} \mu(A_j) = \sup_{j\in\mathbb{N}} \mu(A_j).$$

If $\mu(A) < \infty$ for all $A \in \mathcal{A}$, (iii) can be replaced by either of the following equivalent conditions
(iii′) (continuity of measures from above) for any decreasing sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ with $A_j \downarrow A \in \mathcal{A}$ we have

$$\mu(A) = \lim_{j\to\infty} \mu(A_j) = \inf_{j\in\mathbb{N}} \mu(A_j);$$

(iii″) (continuity of measures at $\emptyset$) for any decreasing sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ with $A_j \downarrow \emptyset$ we have

$$\lim_{j\to\infty} \mu(A_j) = 0.$$

4.5 Remark With some obvious rewordings, P4.3 and T4.4 are still valid for pre-measures, i.e. for families $\mathcal{A}$ which are not σ-algebras. Of course, one has to make sure that $\emptyset \in \mathcal{A}$ and that $\mathcal{A}$ is stable under finite unions, intersections and differences of sets¹ (for P4.3) and, for T4.4, that increasing and decreasing sequences of the sets under consideration have their limits in $\mathcal{A}$. The proofs are literally the same.

Proof (of Theorem 4.4) Let us, first of all, check that every measure $\mu$ enjoys all the properties (i)–(iii) and (iii′), (iii″). Property (i) is clear from the definition of a measure and (ii) follows from P4.3(i). Let $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ be an increasing sequence of sets $A_j \uparrow A$ and set

$$B_1 := A_1, \qquad B_{j+1} := A_{j+1} \setminus A_j.$$

¹ Such a family is called a ring of sets.
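Obviously, $B_j \in \mathcal{A}$, the $B_j$ are pairwise disjoint, $A_k = \dot\bigcup_{j=1}^{k} B_j$ and $\bigcup_{k\in\mathbb{N}} A_k = \dot\bigcup_{j\in\mathbb{N}} B_j = A$. Thus

$$\mu(A) = \mu\Big(\dot\bigcup_{j\in\mathbb{N}} B_j\Big) = \sum_{j\in\mathbb{N}} \mu(B_j) = \lim_{k\to\infty} \sum_{j=1}^{k} \mu(B_j) = \lim_{k\to\infty} \mu(B_1 \mathbin{\dot\cup} \dots \mathbin{\dot\cup} B_k) = \lim_{k\to\infty} \mu(A_k).$$

[Figure: the disjoint rings $B_1$, $B_2$, $B_3$ making up the increasing sets $A_k$.]

If $A_j \downarrow A$ we see easily that $(A_1 \setminus A_j) \uparrow (A_1 \setminus A)$ as $j \to \infty$. Since $\mu(A_1) < \infty$, the previous argument shows that

$$\mu(A_1 \setminus A) = \lim_{j\to\infty} \mu(A_1 \setminus A_j) = \lim_{j\to\infty} \big(\mu(A_1) - \mu(A_j)\big) = \mu(A_1) - \lim_{j\to\infty} \mu(A_j).$$

This means that $\mu(A_1) - \mu(A) = \mu(A_1) - \lim_{j\to\infty} \mu(A_j)$ and (iii′) follows. If we take, in particular, $A = \emptyset$, the above calculation also proves (iii″).

Let us now assume that (i)–(iii) hold for the set-function $\mu : \mathcal{A} \to [0, \infty]$. In order to see that $\mu$ is a measure, we have to check $(M_2)$. For this take a sequence $(B_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ of pairwise disjoint sets and define

$$A_k := B_1 \mathbin{\dot\cup} \dots \mathbin{\dot\cup} B_k \in \mathcal{A}, \qquad A := \bigcup_{k\in\mathbb{N}} A_k = \dot\bigcup_{j\in\mathbb{N}} B_j. \qquad (4.3)$$

Clearly $A_k \uparrow A$, and using repeatedly property (ii) we get $\mu(A_k) = \mu(B_1) + \dots + \mu(B_k)$. From (iii) we conclude

$$\mu(A) = \lim_{k\to\infty} \mu(A_k) = \lim_{k\to\infty} \sum_{j=1}^{k} \mu(B_j) = \sum_{j\in\mathbb{N}} \mu(B_j).$$

Finally assume that $\mu(A) < \infty$ for all $A \in \mathcal{A}$ and that (i), (ii) and (iii′) or (iii″) hold. We will show that under the finiteness assumption (iii′)⇒(iii″)⇒(iii); the assertion follows then from the considerations of the first part of the proof.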
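For (iii′)⇒(iii″) there is nothing to show. For the remaining implication take a sequence $(B_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ of pairwise disjoint sets and define sets $A_k$ and $A$ as in (4.3). Then $A \setminus A_k \downarrow \emptyset$ and from (iii″) we conclude that $\lim_{k\to\infty} \mu(A \setminus A_k) = 0$. Since $\mu(A_k) < \infty$ we get $\mu(A) = \lim_{k\to\infty} \mu(A_k)$ and (iii) follows.

4.6 Corollary Every measure [pre-measure] $\mu$ is σ-subadditive, i.e.

$$\mu\Big(\bigcup_{j\in\mathbb{N}} A_j\Big) \le \sum_{j\in\mathbb{N}} \mu(A_j) \qquad (4.4)$$

holds for all sequences $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ of not necessarily disjoint sets [such that² $\bigcup_{j\in\mathbb{N}} A_j \in \mathcal{A}$].

Proof Since the arguments are virtually the same, we may assume that $\mathcal{A}$ is a σ-algebra, so that $\mu$ becomes a measure. Set $B_k := A_1 \cup \dots \cup A_k \uparrow \bigcup_{j\in\mathbb{N}} A_j$ as $k \to \infty$. By T4.4(iii) and repeated applications of P4.3(v),

$$\mu\Big(\bigcup_{j\in\mathbb{N}} A_j\Big) = \lim_{k\to\infty} \mu(A_1 \cup \dots \cup A_k) \le \lim_{k\to\infty} \big(\mu(A_1) + \dots + \mu(A_k)\big) = \sum_{j\in\mathbb{N}} \mu(A_j).$$

It is about time to give some examples of measures. At this stage this is, unfortunately, a somewhat difficult task! The main problem is that we have to explain for every set $A$ of the σ-algebra $\mathcal{A}$ what its measure $\mu(A)$ shall be. Since $\mathcal{A}$ can be very large (see Remark 3.10) this is, in general, only (explicitly!) possible if either $\mu$ or $\mathcal{A}$ is very simple. Nevertheless ...

4.7 Examples (i) (Dirac measure, unit mass) Let $(X, \mathcal{A})$ be any measurable space and let $x \in X$ be some point. Then $\delta_x : \mathcal{A} \to \{0, 1\}$, defined for $A \in \mathcal{A}$ by

$$\delta_x(A) := \begin{cases} 0 & \text{if } x \notin A, \\ 1 & \text{if } x \in A, \end{cases}$$

is a measure. It is called Dirac's delta measure or unit mass at the point $x$.
(ii) Consider $(\mathbb{R}, \mathcal{A})$ with $\mathcal{A}$ from Example 3.3(v) (i.e. $A \in \mathcal{A}$ if $A$ or $A^c$ is countable). Then $\gamma : \mathcal{A} \to \{0, 1\}$, defined for $A \in \mathcal{A}$ by

$$\gamma(A) := \begin{cases} 0 & \text{if } A \text{ is countable}, \\ 1 & \text{if } A^c \text{ is countable}, \end{cases}$$

is a measure.

² This is automatically fulfilled for a measure on a σ-algebra.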
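(iii) (Counting measure) Let $(X, \mathcal{A})$ be a measurable space. Then

$$|A| := \begin{cases} \#A & \text{if } A \text{ is finite}, \\ +\infty & \text{if } A \text{ is infinite}, \end{cases}$$

defines a measure. It is called counting measure.
(iv) (Discrete probability measure) Let $\Omega = \{\omega_1, \omega_2, \dots\}$ be a countable set and $(p_j)_{j\in\mathbb{N}}$ be a sequence of real numbers $p_j \in [0, 1]$ such that $\sum_{j\in\mathbb{N}} p_j = 1$. On $(\Omega, \mathcal{P}(\Omega))$ the set-function

$$P(A) = \sum_{j :\, \omega_j \in A} p_j = \sum_{j\in\mathbb{N}} p_j\, \delta_{\omega_j}(A), \qquad A \subset \Omega,$$

defines a probability measure. The triplet $(\Omega, \mathcal{P}(\Omega), P)$ is called discrete probability space.
(v) (Trivial measures) Let $(X, \mathcal{A})$ be a measurable space. Then

$$\mu(A) := \begin{cases} 0 & \text{if } A = \emptyset, \\ +\infty & \text{if } A \ne \emptyset, \end{cases} \qquad \text{and} \qquad \nu(A) := 0, \quad A \in \mathcal{A},$$

are measures.

Note that our list of examples does not include the most familiar of all measures: length, area and volume.

4.8 Definition The set-function $\lambda^n$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ that assigns every half-open rectangle $[\![a, b)\!) = [a_1, b_1) \times \dots \times [a_n, b_n) \in \mathcal{J}$ the value

$$\lambda^n\big([\![a, b)\!)\big) := \prod_{j=1}^{n} (b_j - a_j)$$

is called n-dimensional Lebesgue measure.

The problem here is that we do not know whether $\lambda^n$ is a measure in the sense of Definition 4.1: $\lambda^n$ is only explicitly given on the half-open rectangles $\mathcal{J}$ and it is not obvious at all that $\lambda^n$ is a pre-measure on $\mathcal{J}$; much less clear is the question if and how we can extend this pre-measure from $\mathcal{J}$ to a proper measure on $\sigma(\mathcal{J})$. Over the next few chapters we will see that such an extension is indeed possible. But this requires some extra work and a more abstract approach. One of the main obstacles is, of course, that $\sigma(\mathcal{J})$ cannot be obtained by a bare-hands construction from $\mathcal{J}$. Let us, meanwhile, note the upshot of what will be proved in the next chapters.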
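On the half-open rectangles themselves the prescription of Definition 4.8 is elementary, and it can be tried out directly. The following sketch computes the product of side lengths for a rectangle given by its corner vectors, using the convention that a rectangle with some $b_j \le a_j$ is empty. It only illustrates the formula on rectangles; it says nothing about the extension to all Borel sets. The helper name `volume` is an assumption made for this illustration.

```python
from fractions import Fraction
from math import prod

def volume(a, b):
    """Product of side lengths of [a_1,b_1) x ... x [a_n,b_n); 0 for an empty rectangle."""
    if any(bj <= aj for aj, bj in zip(a, b)):
        return Fraction(0)
    return prod((Fraction(bj) - Fraction(aj) for aj, bj in zip(a, b)), start=Fraction(1))

whole = volume((0, 0), (2, 3))
# Cut [0,2) x [0,3) along x = 1 into two disjoint half-open rectangles.
left = volume((0, 0), (1, 3))
right = volume((1, 0), (2, 3))
# Translate the rectangle by the vector (5, -1).
shifted = volume((5, -1), (7, 2))
print(whole, left + right, shifted)
```

The two pieces add up to the volume of the whole rectangle, and the translated rectangle has the same volume; both facts are instances of properties that are established in general later in the text.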
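4.9 Theorem Lebesgue measure $\lambda^n$ exists, is a measure on the Borel sets $\mathcal{B}(\mathbb{R}^n)$ and is unique. Moreover, $\lambda^n$ enjoys the following additional properties for $B \in \mathcal{B}(\mathbb{R}^n)$:
(i) $\lambda^n$ is invariant under translations: $\lambda^n(x + B) = \lambda^n(B)$, $x \in \mathbb{R}^n$;
(ii) $\lambda^n$ is invariant under motions: $\lambda^n(R^{-1}(B)) = \lambda^n(B)$ where $R$ is a motion, i.e. a combination of translations, rotations and reflections;
(iii) $\lambda^n(M^{-1}(B)) = |\det M|^{-1}\, \lambda^n(B)$ for any invertible matrix $M \in \mathbb{R}^{n\times n}$.

The attentive reader will have noticed that the sets $x + B := \{x + y : y \in B\}$, $R^{-1}(B) := \{R^{-1}(y) : y \in B\}$ and $M^{-1}(B)$ must again be Borel sets, otherwise the statement of T4.9 would be senseless, cf. T5.8 and Chapter 7.

Problems
4.1. Extend Proposition 4.3(i), (iv) and (v) to finitely many sets $A_1, A_2, \dots, A_N \in \mathcal{A}$.
4.2. Check that the set-functions defined in Example 4.7 are measures in the sense of Definition 4.1.
4.3. Is the set-function $\gamma$ of 4.7(ii) still a measure on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$? And on the measurable space $(\mathbb{Q}, \mathbb{Q} \cap \mathcal{B}(\mathbb{R}))$?
4.4. Let $X = \mathbb{R}$. For which σ-algebras are the following set-functions measures:
(i) $\mu(A) = 0$ if $A = \emptyset$, and $\mu(A) = 1$ if $A \ne \emptyset$;
(ii) $\nu(A) = 0$ if $A$ is finite, and $\nu(A) = 1$ if $A^c$ is finite?
4.5. Find an example showing that the finiteness condition in Theorem 4.4 (iii′) or (iii″) is essential. [Hint: use Lebesgue measure or the counting measure on infinite tails $[k, \infty) \downarrow \emptyset$.]
4.6. Let $(X, \mathcal{A})$ be a measurable space. (i) Let $\mu, \nu$ be two measures on $(X, \mathcal{A})$. Show that for all $a, b \ge 0$ the set-function $\rho(A) := a\mu(A) + b\nu(A)$, $A \in \mathcal{A}$, is again a measure. (ii) Let $\mu_1, \mu_2, \dots$ be countably many measures on $(X, \mathcal{A})$ and let $(\alpha_j)_{j\in\mathbb{N}}$ be a sequence of positive numbers. Show that $\mu(A) := \sum_{j=1}^{\infty} \alpha_j \mu_j(A)$, $A \in \mathcal{A}$, is again a measure. [Hint: to show σ-additivity use (and prove) the following helpful lemma: for any double sequence $\beta_{ij}$, $i, j \in \mathbb{N}$, of real numbers we have

$$\sup_{i\in\mathbb{N}} \sup_{j\in\mathbb{N}} \beta_{ij} = \sup_{j\in\mathbb{N}} \sup_{i\in\mathbb{N}} \beta_{ij}.$$

Thus $\lim_{i\to\infty} \lim_{j\to\infty} \beta_{ij} = \lim_{j\to\infty} \lim_{i\to\infty} \beta_{ij}$ if $i \mapsto \beta_{ij}$ and $j \mapsto \beta_{ij}$ increase when the other index is fixed.]
4.7. Let $(X, \mathcal{A}, \mu)$ be a measure space and $F \in \mathcal{A}$. Show that $\mathcal{A} \ni A \mapsto \mu(A \cap F)$ defines a measure.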
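4.8. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ a sequence of sets with $P(A_j) = 1$ for all $j \in \mathbb{N}$. Show that $P\big(\bigcap_{j\in\mathbb{N}} A_j\big) = 1$.
4.9. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $(A_j)_{j\in\mathbb{N}}, (B_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ such that $A_j \supset B_j$ for all $j \in \mathbb{N}$. Show that

$$\mu\Big(\bigcup_{j\in\mathbb{N}} A_j\Big) - \mu\Big(\bigcup_{j\in\mathbb{N}} B_j\Big) \le \sum_{j\in\mathbb{N}} \big(\mu(A_j) - \mu(B_j)\big).$$

[Hint: show first that $\bigcup_j A_j \setminus \bigcup_k B_k \subset \bigcup_j (A_j \setminus B_j)$, then use C4.6.]
4.10. Null sets. Let $(X, \mathcal{A}, \mu)$ be a measure space. A set $N \in \mathcal{A}$ is called a null set or $\mu$-null set if $\mu(N) = 0$. We write $\mathcal{N}_\mu$ for the family of all $\mu$-null sets. Check that $\mathcal{N}_\mu$ has the following properties: (i) $\emptyset \in \mathcal{N}_\mu$; (ii) if $N \in \mathcal{N}_\mu$, $M \in \mathcal{A}$ and $M \subset N$ then $M \in \mathcal{N}_\mu$; (iii) if $(N_j)_{j\in\mathbb{N}} \subset \mathcal{N}_\mu$, then $\bigcup_{j\in\mathbb{N}} N_j \in \mathcal{N}_\mu$.
4.11. Let $\lambda$ be one-dimensional Lebesgue measure. (i) Show that for all $x \in \mathbb{R}$ the set $\{x\}$ is a Borel set with $\lambda(\{x\}) = 0$. [Hint: consider the intervals $[x - 1/k, x + 1/k)$, $k \in \mathbb{N}$, and use Theorem 4.4.] (ii) Prove that $\mathbb{Q}$ is a Borel set and that $\lambda(\mathbb{Q}) = 0$ in two ways: a) by using the first part of the problem; b) by considering the set $C(\varepsilon) := \bigcup_{k\in\mathbb{N}} [q_k - \varepsilon 2^{-k}, q_k + \varepsilon 2^{-k})$, where $(q_k)_{k\in\mathbb{N}}$ is an enumeration of $\mathbb{Q}$, and letting $\varepsilon \to 0$. (iii) Use the trivial fact that $[0, 1] = \bigcup_{0 \le x \le 1} \{x\}$ to show that a non-countable union of null sets (here: $\{x\}$) is not necessarily a null set.
4.12. Determine all null sets of the measure $\delta_a + \delta_b$, $a, b \in \mathbb{R}$, on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
4.13. Completion (1). We have seen in Problem 4.10 that measurable subsets of null sets are again null sets: if $M \in \mathcal{A}$, $M \subset N \in \mathcal{A}$, $\mu(N) = 0$, then $\mu(M) = 0$; but there might be subsets of $N$ which are not in $\mathcal{A}$. This motivates the following definition: a measure space $(X, \mathcal{A}^*, \mu)$ (or a measure $\mu$) is complete if all subsets of $\mu$-null sets are again in $\mathcal{A}^*$. In other words: if all subsets of a null set are null sets. The following exercise shows that a measure space $(X, \mathcal{A}, \mu)$ which is not yet complete can be completed.
(i) $\mathcal{A}^* := \{A \cup N : A \in \mathcal{A},\ N \text{ is a subset of some } \mathcal{A}\text{-measurable null set}\}$ is a σ-algebra satisfying $\mathcal{A} \subset \mathcal{A}^*$.
(ii) $\bar\mu(A^*) := \mu(A)$ for $A^* = A \cup N \in \mathcal{A}^*$ is well-defined, i.e. it is independent of the way we can write $A^*$, say as $A^* = A \cup N = B \cup M$ where $A, B \in \mathcal{A}$ and $M, N$ are subsets of null sets.
(iii) $\bar\mu$ is a measure on $\mathcal{A}^*$ and $\bar\mu(A) = \mu(A)$ for all $A \in \mathcal{A}$.
(iv) $(X, \mathcal{A}^*, \bar\mu)$ is complete.
(v) We have $\mathcal{A}^* = \{A^* \subset X : \exists\, A, B \in \mathcal{A},\ A \subset A^* \subset B,\ \mu(B \setminus A) = 0\}$.
4.14. Restriction. Let $(X, \mathcal{A}, \mu)$ be a measure space and let $\mathcal{B} \subset \mathcal{A}$ be a sub-σ-algebra. Denote by $\nu := \mu|_{\mathcal{B}}$ the restriction of $\mu$ to $\mathcal{B}$. (i) Show that $\nu$ is again a measure. (ii) Assume that $\mu$ is a finite measure [a probability measure]. Is $\nu$ still a finite measure [a probability measure]? (iii) Does $\nu$ inherit σ-finiteness from $\mu$?
4.15. Show that a measure space $(X, \mathcal{A}, \mu)$ is σ-finite if, and only if, there exists a sequence of measurable sets $(E_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ such that $\bigcup_{j\in\mathbb{N}} E_j = X$ and $\mu(E_j) < \infty$ for all $j \in \mathbb{N}$.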
5 Uniqueness of measures
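Before we embark on the proof of the existence of measures in the following chapter, let us first check whether it is enough to consider measures on some generator $\mathcal{G}$ of a σ-algebra $\mathcal{A}$; otherwise our construction of Lebesgue measure would be flawed from the start. As mentioned in Remark 3.10 a major problem is that, apart from trivial cases, $\sigma(\mathcal{G})$ cannot be constructively obtained from $\mathcal{G}$. To overcome this obstacle we need a new concept.

5.1 Definition A family $\mathcal{D} \subset \mathcal{P}(X)$ is a Dynkin system if

$$X \in \mathcal{D} \qquad (\Delta_1)$$
$$D \in \mathcal{D} \implies D^c \in \mathcal{D} \qquad (\Delta_2)$$
$$(D_j)_{j\in\mathbb{N}} \subset \mathcal{D} \text{ pairwise disjoint} \implies \dot\bigcup_{j\in\mathbb{N}} D_j \in \mathcal{D} \qquad (\Delta_3)$$

5.2 Remark As for σ-algebras, cf. Properties 3.2, one sees that $\emptyset \in \mathcal{D}$ and that finite disjoint unions are again in $\mathcal{D}$: $D, E \in \mathcal{D}$, $D \cap E = \emptyset \implies D \mathbin{\dot\cup} E \in \mathcal{D}$. Of course, every σ-algebra is a Dynkin system, but the converse is, in general, wrong[✓], Problem 5.2.

5.3 Proposition Let $\mathcal{G} \subset \mathcal{P}(X)$. Then there is a smallest (also minimal, coarsest) Dynkin system $\delta(\mathcal{G})$ containing $\mathcal{G}$. $\delta(\mathcal{G})$ is called the Dynkin system generated by $\mathcal{G}$. Moreover, $\mathcal{G} \subset \delta(\mathcal{G}) \subset \sigma(\mathcal{G})$.

Proof The proof that $\delta(\mathcal{G})$ exists parallels the proof of T3.4(ii). As in the case of σ-algebras, $\delta(\mathcal{D}) = \mathcal{D}$ if $\mathcal{D}$ is a Dynkin system (by minimality) and so $\delta(\sigma(\mathcal{G})) = \sigma(\mathcal{G})$. Hence, $\mathcal{G} \subset \sigma(\mathcal{G})$ implies that $\delta(\mathcal{G}) \subset \delta(\sigma(\mathcal{G})) = \sigma(\mathcal{G})$.

It is important to know when a Dynkin system is already a σ-algebra.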
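That a Dynkin system need not be a σ-algebra can be seen by brute force on a four-point set, following the example of Problem 5.2 (all subsets of even cardinality). The sketch below assumes $X = \{1, 2, 3, 4\}$; since $X$ is finite, checking disjoint unions of two sets suffices for $(\Delta_3)$. The variable names are illustrative.

```python
from itertools import combinations

X = frozenset({1, 2, 3, 4})
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]

# Candidate Dynkin system: all subsets with an even number of elements.
D = {A for A in subsets if len(A) % 2 == 0}

has_X = X in D                                                   # (Delta_1)
closed_under_complement = all(X - A in D for A in D)             # (Delta_2)
closed_under_disjoint_union = all(
    A | B in D for A in D for B in D if not (A & B)              # (Delta_3), finite case
)
intersection_stable = all(A & B in D for A in D for B in D)      # fails, e.g. {1,2} and {2,3}
print(has_X, closed_under_complement, closed_under_disjoint_union, intersection_stable)
```

The family passes the three Dynkin conditions but is not stable under intersections, which is exactly the property singled out in the lemma that follows.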
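5.4 Lemma A Dynkin system $\mathcal{D}$ is a σ-algebra if, and only if, it is stable under finite intersections:¹ $D, E \in \mathcal{D} \implies D \cap E \in \mathcal{D}$.

Proof Since a σ-algebra is $\cap$-stable (cf. Properties 3.2, Problem 3.1) as well as a Dynkin system (Remark 5.2) it only remains to show that a $\cap$-stable Dynkin system $\mathcal{D}$ is a σ-algebra. Let $(D_j)_{j\in\mathbb{N}}$ be a sequence of subsets in $\mathcal{D}$. We have to show that $D := \bigcup_{j\in\mathbb{N}} D_j \in \mathcal{D}$. Set $E_1 := D_1 \in \mathcal{D}$ and

$$E_{j+1} := \big(\dots((D_{j+1} \setminus D_j) \setminus D_{j-1}) \setminus \dots\big) \setminus D_1 = D_{j+1} \cap D_j^c \cap D_{j-1}^c \cap \dots \cap D_1^c \in \mathcal{D}$$

where we used $(\Delta_2)$ and the assumed $\cap$-stability of $\mathcal{D}$. The $E_j$ are obviously mutually disjoint and $D = \dot\bigcup_{j\in\mathbb{N}} E_j \in \mathcal{D}$ by $(\Delta_3)$.

Lemma 5.4 is not applicable if $\mathcal{D}$ is given in terms of a generator $\mathcal{G}$, which is often the case. The next theorem is very important for applications as it extends Lemma 5.4 to the much more convenient setting of generators.

5.5 Theorem If $\mathcal{G} \subset \mathcal{P}(X)$ is stable under finite intersections, then $\delta(\mathcal{G}) = \sigma(\mathcal{G})$.

Proof We have already established $\delta(\mathcal{G}) \subset \sigma(\mathcal{G})$ in P5.3. If we knew that $\delta(\mathcal{G})$ were a σ-algebra, the minimality of $\sigma(\mathcal{G})$ and $\mathcal{G} \subset \delta(\mathcal{G})$ would immediately imply $\sigma(\mathcal{G}) \subset \delta(\mathcal{G})$, hence equality. In view of L5.4 it is enough to show that $\delta(\mathcal{G})$ is $\cap$-stable. For this we fix some $D \in \delta(\mathcal{G})$ and introduce the family

$$\mathcal{D}_D := \{Q \subset X : Q \cap D \in \delta(\mathcal{G})\}.$$

Let us check that $\mathcal{D}_D$ is a Dynkin system: $(\Delta_1)$ is obviously true. $(\Delta_2)$: take $Q \in \mathcal{D}_D$. Then

$$Q^c \cap D = (Q^c \cup D^c) \cap D = (Q \cap D)^c \cap D = \big((Q \cap D) \mathbin{\dot\cup} D^c\big)^c \qquad (5.1)$$

where $Q \cap D \in \delta(\mathcal{G})$ and $D^c \in \delta(\mathcal{G})$, and disjoint unions of sets from $\delta(\mathcal{G})$ are still in $\delta(\mathcal{G})$. Thus $Q^c \in \mathcal{D}_D$. $(\Delta_3)$: let $(Q_j)_{j\in\mathbb{N}}$ be a sequence of pairwise disjoint sets from $\mathcal{D}_D$. By definition, $(Q_j \cap D)_{j\in\mathbb{N}}$ is a disjoint sequence in $\delta(\mathcal{G})$ and $(\Delta_3)$ for the Dynkin system $\delta(\mathcal{G})$ shows

$$\Big(\dot\bigcup_{j\in\mathbb{N}} Q_j\Big) \cap D = \dot\bigcup_{j\in\mathbb{N}} (Q_j \cap D) \in \delta(\mathcal{G})$$

which means that $\dot\bigcup_{j\in\mathbb{N}} Q_j \in \mathcal{D}_D$.

¹ $\cap$-stable, for short.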
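Since $\mathcal{G} \subset \delta(\mathcal{G})$ and since $\mathcal{G}$ is $\cap$-stable, we have $\mathcal{G} \subset \mathcal{D}_G$ for all $G \in \mathcal{G}$.[✓] But $\mathcal{D}_G$ is a Dynkin system and so $\delta(\mathcal{G}) \subset \mathcal{D}_G$ for all $G \in \mathcal{G}$ (use P5.3, Problem 5.4). Consequently, if $D \in \delta(\mathcal{G})$ and $G \in \mathcal{G}$ we find because of $\delta(\mathcal{G}) \subset \mathcal{D}_G$ and the very definition of $\mathcal{D}_G$ that

$$G \cap D \in \delta(\mathcal{G}) \quad \forall\, G \in \mathcal{G},\ \forall\, D \in \delta(\mathcal{G}),$$

so

$$\mathcal{G} \subset \mathcal{D}_D \quad \forall\, D \in \delta(\mathcal{G}),$$

and

$$\delta(\mathcal{G}) \subset \mathcal{D}_D \quad \forall\, D \in \delta(\mathcal{G}).$$

The latter just says that $\delta(\mathcal{G})$ is stable under intersections with $D \in \delta(\mathcal{G})$. By Lemma 5.4 $\delta(\mathcal{G})$ is a σ-algebra and the theorem is proved.

5.6 Remark The technique used in the proof of Theorem 5.5 is an extremely important and powerful tool. We will use it almost exclusively in this chapter to prove the uniqueness of measures theorem and some properties of Lebesgue measure $\lambda^n$.

5.7 Theorem (Uniqueness of measures). Assume that $(X, \mathcal{A})$ is a measurable space and that $\mathcal{A} = \sigma(\mathcal{G})$ is generated by a family $\mathcal{G}$ such that
• $\mathcal{G}$ is stable under finite intersections: $G, H \in \mathcal{G} \implies G \cap H \in \mathcal{G}$;
• there exists an exhausting sequence $(G_j)_{j\in\mathbb{N}} \subset \mathcal{G}$ with $G_j \uparrow X$.
Any two measures $\mu, \nu$ that coincide on $\mathcal{G}$ and are finite for all members of the exhausting sequence, $\mu(G_j) = \nu(G_j) < \infty$, are equal on $\mathcal{A}$, i.e. $\mu(A) = \nu(A)$ for all $A \in \mathcal{A}$.

Proof For $j \in \mathbb{N}$ we define

$$\mathcal{D}_j := \{A \in \mathcal{A} : \mu(G_j \cap A) = \nu(G_j \cap A)\ (< \infty\,!)\}$$

and we claim that every $\mathcal{D}_j$ is a Dynkin system. $(\Delta_1)$ is clear. $(\Delta_2)$: if $A \in \mathcal{D}_j$ we have

$$\mu(G_j \cap A^c) = \mu(G_j \setminus A) = \mu(G_j) - \mu(G_j \cap A) = \nu(G_j) - \nu(G_j \cap A) = \nu(G_j \setminus A) = \nu(G_j \cap A^c)$$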
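so that $A^c \in \mathcal{D}_j$. $(\Delta_3)$: if $(A_k)_{k\in\mathbb{N}} \subset \mathcal{D}_j$ are mutually disjoint sets, we get

$$\mu\Big(G_j \cap \dot\bigcup_{k\in\mathbb{N}} A_k\Big) = \mu\Big(\dot\bigcup_{k\in\mathbb{N}} (G_j \cap A_k)\Big) = \sum_{k\in\mathbb{N}} \mu(G_j \cap A_k) = \sum_{k\in\mathbb{N}} \nu(G_j \cap A_k) = \nu\Big(\dot\bigcup_{k\in\mathbb{N}} (G_j \cap A_k)\Big) = \nu\Big(G_j \cap \dot\bigcup_{k\in\mathbb{N}} A_k\Big)$$

and $\dot\bigcup_{k\in\mathbb{N}} A_k \in \mathcal{D}_j$ follows.

Since $\mathcal{G}$ is $\cap$-stable, we know from T5.5 that $\delta(\mathcal{G}) = \sigma(\mathcal{G})$; therefore,

$$\mathcal{D}_j \supset \mathcal{G} \implies \mathcal{D}_j \supset \delta(\mathcal{G}) = \sigma(\mathcal{G}) \quad \forall\, j \in \mathbb{N}.$$

On the other hand, $\mathcal{A} = \sigma(\mathcal{G}) \subset \mathcal{D}_j \subset \mathcal{A}$, which means that $\mathcal{A} = \mathcal{D}_j$ for all $j \in \mathbb{N}$, and so

$$\mu(G_j \cap A) = \nu(G_j \cap A) \quad \forall\, j \in \mathbb{N},\ \forall\, A \in \mathcal{A}. \qquad (5.2)$$

Using T4.4(iii) we can let $j \to \infty$ in (5.2) to get

$$\mu(A) = \lim_{j\to\infty} \mu(G_j \cap A) = \lim_{j\to\infty} \nu(G_j \cap A) = \nu(A) \quad \forall\, A \in \mathcal{A}.$$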
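The following two theorems show why Lebesgue measure (if it exists) plays a very special rôle indeed.

5.8 Theorem (i) n-dimensional Lebesgue measure $\lambda^n$ is invariant under translations, i.e.

$$\lambda^n(x + B) = \lambda^n(B) \quad \forall\, x \in \mathbb{R}^n,\ \forall\, B \in \mathcal{B}(\mathbb{R}^n). \qquad (5.3)$$

(ii) Every measure $\mu$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ which is invariant under translations and satisfies $\kappa = \mu([0, 1)^n) < \infty$ is a multiple of Lebesgue measure: $\mu = \kappa\, \lambda^n$.

Proof First of all we should convince ourselves that

$$B \in \mathcal{B}(\mathbb{R}^n) \implies x + B \in \mathcal{B}(\mathbb{R}^n) \quad \forall\, x \in \mathbb{R}^n \qquad (5.4)$$

otherwise the statement of T5.8 would be senseless. For this set

$$\mathcal{A}_x := \{B \in \mathcal{B}(\mathbb{R}^n) : x + B \in \mathcal{B}(\mathbb{R}^n)\} \subset \mathcal{B}(\mathbb{R}^n).$$

It is clear that $\mathcal{A}_x$ is a σ-algebra and that $\mathcal{J} \subset \mathcal{A}_x$.[✓] Hence, $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}) \subset \mathcal{A}_x \subset \mathcal{B}(\mathbb{R}^n)$ and (5.4) follows. We can now start the proof proper.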
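(i) Set $\nu(B) := \lambda^n(x + B)$ for some fixed $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. It is easy to check that $\nu$ is a measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$[✓]. Take $I = [a_1, b_1) \times \dots \times [a_n, b_n) \in \mathcal{J}$ and observe that

$$x + I = [a_1 + x_1, b_1 + x_1) \times \dots \times [a_n + x_n, b_n + x_n) \in \mathcal{J}$$

so that

$$\nu(I) = \lambda^n(x + I) = \prod_{j=1}^{n} \big((b_j + x_j) - (a_j + x_j)\big) = \prod_{j=1}^{n} (b_j - a_j) = \lambda^n(I).$$

This means that $\nu|_{\mathcal{J}} = \lambda^n|_{\mathcal{J}}$.² But $\mathcal{J}$ is $\cap$-stable,³ generates $\mathcal{B}(\mathbb{R}^n)$ and admits the exhausting sequence

$$[-k, k)^n \uparrow \mathbb{R}^n, \qquad \nu\big([-k, k)^n\big) = \lambda^n\big([-k, k)^n\big) = (2k)^n < \infty.$$

We can now invoke the uniqueness theorem T5.7 to see that $\lambda^n = \nu$ on the whole of $\mathcal{B}(\mathbb{R}^n)$.

(ii) Take $I \in \mathcal{J}$ as in part (i) but with rational endpoints $a_j, b_j \in \mathbb{Q}$. Thus there is some $M \in \mathbb{N}$ and $k(I) \in \mathbb{N}$ and points $x^j \in \mathbb{R}^n$, such that

$$I = \dot\bigcup_{j=1}^{k(I)} \Big(x^j + \big[0, \tfrac{1}{M}\big)^n\Big),$$

i.e. we pave the rectangle $I$ by little squares $[0, \tfrac1M)^n$ of side-length $1/M$, where $M$ is, say, the common denominator of all $a_j$ and $b_j$. Using the translation invariance of $\mu$ and $\lambda^n$, we see

$$\mu(I) = k(I)\, \mu\big([0, \tfrac1M)^n\big), \qquad \mu\big([0, 1)^n\big) = M^n\, \mu\big([0, \tfrac1M)^n\big),$$
$$\lambda^n(I) = k(I)\, \lambda^n\big([0, \tfrac1M)^n\big), \qquad \underbrace{\lambda^n\big([0, 1)^n\big)}_{=1} = M^n\, \lambda^n\big([0, \tfrac1M)^n\big),$$

and dividing the top two and bottom two equalities gives

$$\mu(I) = \frac{k(I)}{M^n}\, \mu\big([0, 1)^n\big), \qquad \lambda^n(I) = \frac{k(I)}{M^n}\, \lambda^n\big([0, 1)^n\big) = \frac{k(I)}{M^n}.$$

Thus $\mu(I) = \mu([0, 1)^n)\, \lambda^n(I) = \kappa\, \lambda^n(I)$ for all $I \in \mathcal{J}_{rat}$ and, as in part (i), an application of T5.7 finishes the proof.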
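The paving argument in part (ii) is pure counting and can be replayed with exact rational arithmetic. The sketch below assumes a non-empty rectangle with rational corners, takes $M$ to be the least common multiple of all denominators (one admissible choice of common denominator) and counts the cubes of side $1/M$; it then confirms $k(I)/M^n = \lambda^n(I)$. The function name `paving` and the sample rectangle are illustrative assumptions.

```python
from fractions import Fraction
from math import lcm, prod

def paving(a, b):
    """Return (M, k): cubes of side 1/M pave [a_1,b_1) x ... x [a_n,b_n), and k is their number.

    Assumes rational corners with a_j < b_j for all j.
    """
    M = lcm(*(q.denominator for q in (*a, *b)))
    # (b_j - a_j) * M is an integer: the number of cubes along the j-th axis.
    per_axis = [((bj - aj) * M).numerator for aj, bj in zip(a, b)]
    return M, prod(per_axis)

a = (Fraction(1, 2), Fraction(0))
b = (Fraction(3, 2), Fraction(2, 3))
M, k = paving(a, b)
n = len(a)
lebesgue = prod((bj - aj for aj, bj in zip(a, b)), start=Fraction(1))
print(M, k, Fraction(k, M**n), lebesgue)
```

Here the rectangle $[\tfrac12, \tfrac32) \times [0, \tfrac23)$ is paved by squares of side $\tfrac16$, six along the first axis and four along the second.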
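Incidentally, Theorem 5.8 proves Theorem 4.9(i). Further properties of Lebesgue measure will be studied in the following chapters, but first we concentrate on its existence.

² This is short for $\nu(I) = \lambda^n(I)$ $\forall\, I \in \mathcal{J}$.
³ Use $\prod_{j=1}^{n} [a_j, b_j) \cap \prod_{j=1}^{n} [a_j', b_j') = \prod_{j=1}^{n} [a_j \vee a_j', b_j \wedge b_j')$, where $\prod$ denotes the Cartesian product.[✓]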
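Problems
5.1. Verify the claims made in Remark 5.2.
5.2. The following exercise shows that Dynkin systems and σ-algebras are, in general, different: Let $X = \{1, 2, 3, \dots, 2k - 1, 2k\}$ for some fixed $k \in \mathbb{N}$. Then the family $\mathcal{D} = \{A \subset X : \#A \text{ is even}\}$ is a Dynkin system, but not a σ-algebra.
5.3. Let $\mathcal{D}$ be a Dynkin system. Show that for all $A, B \in \mathcal{D}$ with $A \subset B$ the difference $B \setminus A \in \mathcal{D}$. [Hint: use $R \setminus Q = \big((R \cap Q) \mathbin{\dot\cup} R^c\big)^c$ where $R, Q \subset X$.]
5.4. Let $\mathcal{A}$ be a σ-algebra, $\mathcal{D}$ be a Dynkin system and $\mathcal{G} \subset \mathcal{H} \subset \mathcal{P}(X)$ two collections of subsets of $X$. Show that (i) $\sigma(\mathcal{A}) = \mathcal{A}$ and $\delta(\mathcal{D}) = \mathcal{D}$; (ii) $\delta(\mathcal{G}) \subset \delta(\mathcal{H})$; (iii) $\delta(\mathcal{G}) \subset \sigma(\mathcal{G})$.
5.5. Let $A, B \subset X$. Compare $\delta(\{A, B\})$ and $\sigma(\{A, B\})$. When are they equal?
5.6. Show that Theorem 5.7 is still valid, if $(G_j)_{j\in\mathbb{N}} \subset \mathcal{G}$ is not an increasing sequence but any countable family of sets such that

$$(1)\ \bigcup_{j\in\mathbb{N}} G_j = X \qquad \text{and} \qquad (2)\ \mu(G_j) = \nu(G_j) < \infty.$$

[Hint: set $F_N := G_1 \cup \dots \cup G_N = F_{N-1} \cup G_N$ and check by induction that $\mu(F_N) = \nu(F_N)$; use then T5.7.]
5.7. Show that the half-open intervals $\mathcal{J}^n$ in $\mathbb{R}^n$ are stable under finite intersections. [Hint: check that $I = \prod_{j=1}^{n} [a_j, b_j)$, $I' = \prod_{j=1}^{n} [a_j', b_j')$ satisfy $I \cap I' = \prod_{j=1}^{n} [a_j \vee a_j', b_j \wedge b_j')$, where $\prod$ denotes the Cartesian product.]
5.8. Dilations. Mimic the proof of Theorem 5.8(i) and show that $t \cdot B := \{tb : b \in B\}$ is a Borel set for all $B \in \mathcal{B}(\mathbb{R}^n)$ and $t > 0$. Moreover,

$$\lambda^n(t \cdot B) = t^n \lambda^n(B) \quad \forall\, B \in \mathcal{B}(\mathbb{R}^n),\ \forall\, t > 0. \qquad (5.5)$$

5.9. Invariant measures. Let $(X, \mathcal{A}, \mu)$ be a finite measure space where $\mathcal{A} = \sigma(\mathcal{G})$ for some $\cap$-stable generator $\mathcal{G}$. Assume that $\theta : X \to X$ is a map such that $\theta^{-1}(A) \in \mathcal{A}$ for all $A \in \mathcal{A}$. Prove that

$$\mu(G) = \mu(\theta^{-1}(G)) \quad \forall\, G \in \mathcal{G} \implies \mu(A) = \mu(\theta^{-1}(A)) \quad \forall\, A \in \mathcal{A}.$$

(A measure $\mu$ with this property is called invariant w.r.t. the map $\theta$.)
5.10. Independence (1). Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{B}, \mathcal{C} \subset \mathcal{A}$ be two sub-σ-algebras of $\mathcal{A}$. We call $\mathcal{B}$ and $\mathcal{C}$ independent, if

$$P(B \cap C) = P(B)\, P(C) \quad \forall\, B \in \mathcal{B},\ C \in \mathcal{C}.$$

Assume now that $\mathcal{B} = \sigma(\mathcal{G})$ and $\mathcal{C} = \sigma(\mathcal{H})$ where $\mathcal{G}$, $\mathcal{H}$ are $\cap$-stable collections of sets. Prove that $\mathcal{B}$ and $\mathcal{C}$ are independent if, and only if,

$$P(G \cap H) = P(G)\, P(H) \quad \forall\, G \in \mathcal{G},\ H \in \mathcal{H}.$$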
6 Existence of measures
In Chapter 4 we saw that it is not a trivial task to assign explicitly a $\mu$-value to every set $A$ from a σ-algebra $\mathcal{A}$. Rather than doing this it is often more natural to assign $\mu$-values to, say, rectangles (in the case of the Borel σ-algebra) or, in general, to sets from some generator $\mathcal{G}$ of $\mathcal{A}$. Because of Theorem 4.4 (and Remark 4.5) $\mu|_{\mathcal{G}}$ should be a pre-measure. If $\mathcal{G}$ and $\mu$ satisfy the conditions of the uniqueness theorem 5.7, this approach will lead to a unique measure on $\mathcal{A}$, provided we can extend $\mu$ from $\mathcal{G}$ onto $\sigma(\mathcal{G}) = \mathcal{A}$. To get such an automatic extension the following (technically motivated) class of generators is useful. A semi-ring is a family $\mathcal{S} \subset \mathcal{P}(X)$ with the following properties:

$$\emptyset \in \mathcal{S} \qquad (S_1)$$
$$S, T \in \mathcal{S} \implies S \cap T \in \mathcal{S} \qquad (S_2)$$
$$\text{for } S, T \in \mathcal{S} \text{ there exist finitely many disjoint } S_1, S_2, \dots, S_M \in \mathcal{S} \text{ such that } S \setminus T = \dot\bigcup_{j=1}^{M} S_j \qquad (S_3)$$
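Property $(S_3)$ can be made concrete for half-open intervals on the real line: removing one such interval from another leaves at most two disjoint half-open intervals. The sketch below represents $[a, b)$ as a pair `(a, b)` with the convention that $b \le a$ means the empty set; the helper name `difference` is an assumption made for illustration and covers only the one-dimensional case.

```python
def difference(S, T):
    """Write S minus T, for half-open intervals S = [a, b) and T = [c, d),
    as a list of at most two disjoint half-open intervals."""
    (a, b), (c, d) = S, T
    if b <= a:                       # S is empty
        return []
    if d <= c or d <= a or b <= c:   # T is empty or does not meet S
        return [(a, b)]
    pieces = []
    if a < c:                        # part of S to the left of T
        pieces.append((a, c))
    if d < b:                        # part of S to the right of T
        pieces.append((d, b))
    return pieces

print(difference((0, 10), (3, 5)))    # T strictly inside S: two pieces
print(difference((0, 10), (-1, 5)))   # T overlaps the left end: one piece
print(difference((0, 10), (-1, 20)))  # T covers S: nothing remains
```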
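The solution to our problems is the following deep extension theorem for measures which goes back to Carathéodory [9].

6.1 Theorem (Carathéodory) Let $\mathcal{S}$ be a semi-ring of subsets of $X$ and $\mu : \mathcal{S} \to [0, \infty]$ be a pre-measure, i.e. a set-function with
(i) $\mu(\emptyset) = 0$;
(ii) $(S_j)_{j\in\mathbb{N}} \subset \mathcal{S}$, disjoint and $S = \dot\bigcup_{j\in\mathbb{N}} S_j \in \mathcal{S} \implies \mu(S) = \sum_{j\in\mathbb{N}} \mu(S_j)$.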
Then $\mu$ has an extension to a measure $\mu$ on $\sigma(\mathcal{S})$. If, moreover, $\mathcal{S}$ contains an exhausting sequence $(S_j)_{j\in\mathbb{N}}$, $S_j \uparrow X$ such that $\mu(S_j) < \infty$ for all $j \in \mathbb{N}$, then the extension is unique.

6.2 Remark From the Definition 4.1 of a measure it is clear that the conditions 6.1(i) and (ii) are necessary for $\mu$ to become a measure. Theorem 6.1 says that they are even sufficient. Remarkable is the fact that (ii) is only needed relative to $\mathcal{S}$ – its extension to $\sigma(\mathcal{S})$ is then automatic. The proof of Carathéodory's theorem is a bit involved and not particularly rewarding when read superficially. Therefore we recommend skipping the proof on first reading and resuming after Remark 6.3.

Proof (of Theorem 6.1) We begin with the construction of an auxiliary set-function $\mu^* : \mathcal{P}(X) \to [0,\infty]$ which will, eventually, extend $\mu$. Define for each $A \subset X$ the family of countable $\mathcal{S}$-coverings of $A$
$$\mathcal{C}(A) = \Big\{ (S_j)_{j\in\mathbb{N}} \subset \mathcal{S} \,:\, \bigcup_{j\in\mathbb{N}} S_j \supset A \Big\}$$
($\mathcal{C}(A) = \emptyset$ is possible since we do not require $X \in \mathcal{S}$), and set
$$\mu^*(A) = \inf\Big\{ \sum_{j\in\mathbb{N}} \mu(S_j) \,:\, (S_j)_{j\in\mathbb{N}} \in \mathcal{C}(A) \Big\} \qquad (6.1)$$
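where, as usual, $\inf\emptyset = +\infty$.

Step 1. Claim: $\mu^*$ has the following three properties:^1
(OM1) $\mu^*(\emptyset) = 0$;
(OM2) (monotone) $A \subset B \implies \mu^*(A) \le \mu^*(B)$;
(OM3) ($\sigma$-subadditive) $(A_j)_{j\in\mathbb{N}} \subset \mathcal{P}(X) \implies \mu^*\big(\bigcup_{j\in\mathbb{N}} A_j\big) \le \sum_{j\in\mathbb{N}} \mu^*(A_j)$.

(OM1) is obvious since we can take in (6.1) the constant sequence $S_1 = S_2 = \ldots = \emptyset$ which is clearly in $\mathcal{C}(\emptyset)$.

(OM2): if $B \supset A$, then each $\mathcal{S}$-cover of $B$ also covers $A$, i.e. $\mathcal{C}(B) \subset \mathcal{C}(A)$. Therefore,
$$\mu^*(A) = \inf\Big\{ \sum_{j\in\mathbb{N}} \mu(S_j) : (S_j)_{j\in\mathbb{N}} \in \mathcal{C}(A) \Big\} \le \inf\Big\{ \sum_{k\in\mathbb{N}} \mu(T_k) : (T_k)_{k\in\mathbb{N}} \in \mathcal{C}(B) \Big\} = \mu^*(B).$$

^1 A set-function $\mu^* : \mathcal{P}(X) \to [0,\infty]$ satisfying (OM1)–(OM3) is called outer measure.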
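(OM3): without loss of generality we can assume that $\mu^*(A_j) < \infty$ for all $j \in \mathbb{N}$ and so $\mathcal{C}(A_j) \ne \emptyset$. Fix $\epsilon > 0$ and observe that by the very nature of the infimum we find for each $A_j$ a cover $(S_k^j)_{k\in\mathbb{N}} \in \mathcal{C}(A_j)$ with
$$\sum_{k\in\mathbb{N}} \mu(S_k^j) \le \mu^*(A_j) + \frac{\epsilon}{2^j}, \qquad j \in \mathbb{N}. \qquad (6.2)$$
The double sequence $(S_k^j)_{j,k\in\mathbb{N}}$ is an $\mathcal{S}$-cover of $A = \bigcup_{j\in\mathbb{N}} A_j$, and so
$$\mu^*(A) \le \sum_{(j,k)\in\mathbb{N}\times\mathbb{N}} \mu(S_k^j) = \sum_{j\in\mathbb{N}} \sum_{k\in\mathbb{N}} \mu(S_k^j) \le \sum_{j\in\mathbb{N}} \Big( \mu^*(A_j) + \frac{\epsilon}{2^j} \Big) = \sum_{j\in\mathbb{N}} \mu^*(A_j) + \epsilon,$$
where the second '$\le$' follows from (6.2). Letting $\epsilon \to 0$ proves (OM3).

Step 2. Claim: $\mu^*$ extends $\mu$, i.e. $\mu^*(S) = \mu(S)$ for all $S \in \mathcal{S}$.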
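Observe that $\mu$ can be uniquely extended to the set $\mathcal{S}_\cup = \{ S_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} S_M : M \in \mathbb{N},\ S_j \in \mathcal{S} \}$ of all finite unions of disjoint $\mathcal{S}$-sets by
$$\bar\mu(S_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} S_M) = \sum_{j=1}^{M} \mu(S_j). \qquad (6.3)$$
Since (6.3) is necessary for an additive set-function on $\mathcal{S}_\cup$, (6.3) implies the uniqueness of the extension once we know that $\bar\mu$ is well-defined, that is, independent of the particular representation of sets in $\mathcal{S}_\cup$. To see this assume that
$$S_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} S_M = T_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} T_N, \qquad M, N \in \mathbb{N},\ S_j, T_k \in \mathcal{S}.$$
Then
$$S_j = S_j \cap (T_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} T_N) = \dot{\bigcup}_{k=1}^{N} (S_j \cap T_k)$$
and the additivity of $\mu$ on $\mathcal{S}$ shows
$$\mu(S_j) = \sum_{k=1}^{N} \mu(S_j \cap T_k).$$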
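Summing over $j = 1, 2, \ldots, M$ and swapping the rôles of $S_j$ and $T_k$ gives
$$\sum_{j=1}^{M} \mu(S_j) = \sum_{j=1}^{M} \sum_{k=1}^{N} \mu(S_j \cap T_k) = \sum_{k=1}^{N} \mu(T_k),$$
which proves that (6.3) does not depend on the representation of $\mathcal{S}_\cup$-sets.

The family $\mathcal{S}_\cup$ is clearly stable under finite disjoint unions. If $S, T \in \mathcal{S}_\cup$ we find (notation as before)
$$S \cap T = (S_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} S_M) \cap (T_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} T_N) = \dot{\bigcup}_{j,k=1}^{M,N} (S_j \cap T_k) \in \mathcal{S}_\cup \qquad (\text{each } S_j \cap T_k \in \mathcal{S})$$
and, since by (S3) $S_j \setminus T_k \in \mathcal{S}_\cup$, also
$$S \setminus T = (S_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} S_M) \setminus (T_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} T_N) = \dot{\bigcup}_{j=1}^{M} \bigcap_{k=1}^{N} (S_j \cap T_k^c) = \dot{\bigcup}_{j=1}^{M} \bigcap_{k=1}^{N} (S_j \setminus T_k) \in \mathcal{S}_\cup,$$
where we used the $\cap$- and $\dot\cup$-stability of $\mathcal{S}_\cup$. Finally,
$$S \cup T = (S \setminus T) \mathbin{\dot\cup} (S \cap T) \mathbin{\dot\cup} (T \setminus S) \in \mathcal{S}_\cup,^2$$
and the prescription (6.3) can be used to extend $\mu$ to finite unions of $\mathcal{S}$-sets.

Let us show that $\bar\mu$ is $\sigma$-additive on $\mathcal{S}_\cup$, i.e. a pre-measure. For this take $(T_k)_{k\in\mathbb{N}} \subset \mathcal{S}_\cup$ such that $T = \dot{\bigcup}_{k\in\mathbb{N}} T_k \in \mathcal{S}_\cup$. By the definition of the family $\mathcal{S}_\cup$ we find a sequence of disjoint sets $(S_j)_{j\in\mathbb{N}} \subset \mathcal{S}$ and a sequence of integers $0 = n_0 \le n_1 \le n_2 \le \ldots$ such that
$$T_k = S_{n_{k-1}+1} \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} S_{n_k}, \qquad k \in \mathbb{N},$$
and $T = U_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} U_N$, where $U_\ell = \dot{\bigcup}_{j\in J_\ell} S_j \in \mathcal{S}$ with disjoint index sets $J_1 \mathbin{\dot\cup} J_2 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} J_N = \mathbb{N}$ partitioning $\mathbb{N}$. Thus
$$\bar\mu(T) \overset{\text{def}}{=} \sum_{\ell=1}^{N} \mu(U_\ell) \overset{6.1(ii)}{=} \sum_{\ell=1}^{N} \sum_{j\in J_\ell} \mu(S_j) = \sum_{k\in\mathbb{N}} \sum_{j=n_{k-1}+1}^{n_k} \mu(S_j) \overset{\text{def}}{=} \sum_{k\in\mathbb{N}} \bar\mu(T_k),$$
which proves $\sigma$-additivity of $\bar\mu$.

^2 This shows that $\mathcal{S}_\cup$ is the ring generated by $\mathcal{S}$, i.e. the smallest ring containing $\mathcal{S}$.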
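Using the pre-measure $\bar\mu$ we get from Corollary 4.6 for any cover $(S_j)_{j\in\mathbb{N}} \in \mathcal{C}(S)$, $S \in \mathcal{S}$, that
$$\mu(S) = \bar\mu(S) = \bar\mu\Big( \bigcup_{j\in\mathbb{N}} (S_j \cap S) \Big) \le \sum_{j\in\mathbb{N}} \bar\mu(S_j \cap S) = \sum_{j\in\mathbb{N}} \mu(S_j \cap S) \le \sum_{j\in\mathbb{N}} \mu(S_j),$$
and passing to the infimum over $\mathcal{C}(S)$ shows $\mu(S) \le \mu^*(S)$. The special cover $(S, \emptyset, \emptyset, \ldots) \in \mathcal{C}(S)$, on the other hand, yields $\mu^*(S) \le \mu(S)$ and this shows that $\mu = \mu^*$ on $\mathcal{S}$.

Step 3. Claim: $\mathcal{S} \subset \mathcal{A}^*$, where $\mathcal{A}^*$ is given by
$$\mathcal{A}^* = \big\{ A \subset X \,:\, \mu^*(Q) = \mu^*(Q \cap A) + \mu^*(Q \setminus A) \quad \forall\, Q \subset X \big\}. \qquad (6.4)$$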
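To see definitions (6.1) and (6.4) at work, here is a small brute-force sketch on a four-point set. Everything in it is a toy assumption: the set $X = \{0,1,2,3\}$, the semi-ring $\{\emptyset, \{0,1\}, \{2,3\}\}$ and the values of the pre-measure are chosen for illustration only. Because this semi-ring is finite and the pre-measure is non-negative, the infimum over countable covers in (6.1) is already attained on finite subfamilies, which is what the code enumerates.

```python
from fractions import Fraction
from itertools import chain, combinations
import math

X = frozenset({0, 1, 2, 3})

# Hypothetical pre-measure on the semi-ring {emptyset, {0,1}, {2,3}}.
mu = {
    frozenset(): Fraction(0),
    frozenset({0, 1}): Fraction(1, 2),
    frozenset({2, 3}): Fraction(1, 2),
}


def subfamilies(items):
    """All finite subfamilies (as tuples) of the given collection."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def powerset(base):
    return [frozenset(c) for c in subfamilies(base)]


def outer_measure(a):
    """mu*(a) as in (6.1): cheapest cover of a by sets of the semi-ring."""
    best = math.inf  # inf over the empty family of covers is +infinity
    for family in subfamilies(mu):
        if a <= frozenset().union(*family):
            best = min(best, sum((mu[s] for s in family), Fraction(0)))
    return best


def is_measurable(a):
    """Caratheodory's condition (6.4), tested against every Q in P(X)."""
    return all(
        outer_measure(q) == outer_measure(q & a) + outer_measure(q - a)
        for q in powerset(X)
    )


measurable_sets = [a for a in powerset(X) if is_measurable(a)]
```

In this toy model `outer_measure(frozenset({0}))` equals `Fraction(1, 2)`, and `measurable_sets` contains exactly $\emptyset$, $\{0,1\}$, $\{2,3\}$ and $X$: a singleton such as $\{0\}$ fails (6.4) for $Q = \{0,1\}$.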
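Let $S, T \in \mathcal{S}$. From (S3) we get
$$T = (S \cap T) \mathbin{\dot\cup} (T \setminus S) = (S \cap T) \mathbin{\dot\cup} \dot{\bigcup}_{j=1}^{M} S_j$$
for some mutually disjoint sets $S_j \in \mathcal{S}$, $j = 1, 2, \ldots, M$. Since $\mu$ is additive on $\mathcal{S}$ and $\mu^*$ is ($\sigma$-)subadditive by (OM3), we find
$$\mu^*(S \cap T) + \mu^*(T \setminus S) \le \mu(S \cap T) + \sum_{j=1}^{M} \mu(S_j) = \mu(T). \qquad (6.5)$$
Take any $B \subset X$ and some $\mathcal{S}$-cover $(T_j)_{j\in\mathbb{N}} \in \mathcal{C}(B)$. Using $\mu^*(T_j) = \mu(T_j)$ and summing the inequality (6.5) for $T = T_j$ over $j \in \mathbb{N}$ yields
$$\sum_{j\in\mathbb{N}} \mu^*(T_j \setminus S) + \sum_{j\in\mathbb{N}} \mu^*(T_j \cap S) \le \sum_{j\in\mathbb{N}} \mu^*(T_j),$$
and the $\sigma$-subadditivity (OM3) and monotonicity (OM2) of $\mu^*$ give (recall that $B \subset \bigcup_{j\in\mathbb{N}} T_j$)
$$\mu^*(B \setminus S) + \mu^*(B \cap S) \le \mu^*\Big( \bigcup_{j\in\mathbb{N}} T_j \setminus S \Big) + \mu^*\Big( \bigcup_{j\in\mathbb{N}} T_j \cap S \Big) \le \sum_{j\in\mathbb{N}} \mu^*(T_j) = \sum_{j\in\mathbb{N}} \mu(T_j).$$
We can now pass to the inf over $\mathcal{C}(B)$ and find $\mu^*(B \setminus S) + \mu^*(B \cap S) \le \mu^*(B)$. Since the reverse inequality follows easily from the ($\sigma$-)subadditivity (OM3) of $\mu^*$, $S \in \mathcal{A}^*$ holds for all $S \in \mathcal{S}$.

Step 4. Claim: $\mathcal{A}^*$ is a $\sigma$-algebra and $\mu^*$ is a measure on $(X, \mathcal{A}^*)$.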
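Clearly, $\emptyset \in \mathcal{A}^*$ and by the symmetry (w.r.t. $A$ and $A^c$) of definition (6.4) of $\mathcal{A}^*$ we have $A \in \mathcal{A}^*$ if, and only if, $A^c \in \mathcal{A}^*$. Let us show that $\mathcal{A}^*$ is $\cup$-stable. Using the ($\sigma$-)subadditivity (OM3) of $\mu^*$ we find for $A, A' \in \mathcal{A}^*$ and any $P \subset X$
$$\begin{aligned}
\mu^*(P \cap (A \cup A')) + \mu^*(P \setminus (A \cup A'))
&= \mu^*(P \cap (A \cup [A' \setminus A])) + \mu^*(P \setminus (A \cup A')) \\
&\le \mu^*(P \cap A) + \mu^*(P \cap (A' \setminus A)) + \mu^*(P \setminus (A \cup A')) \\
&= \mu^*(P \cap A) + \mu^*((P \setminus A) \cap A') + \mu^*((P \setminus A) \setminus A') \\
&\overset{(6.4)}{=} \mu^*(P \cap A) + \mu^*(P \setminus A) \qquad (6.6) \\
&\overset{(6.4)}{=} \mu^*(P) \qquad (6.6')
\end{aligned}$$
where we used in the last two steps the definition (6.4) of $\mathcal{A}^*$ with $Q = P \setminus A$ and $Q = P$, respectively. The reverse inequality follows from (OM3), hence equality, and we conclude that $A \cup A' \in \mathcal{A}^*$.

If $A, A'$ are disjoint, the equality (6.6) = (6.6') becomes, for $P = (A \mathbin{\dot\cup} A') \cap Q$, $Q \subset X$,
$$\mu^*(Q \cap (A \mathbin{\dot\cup} A')) = \mu^*(Q \cap A) + \mu^*(Q \cap A') \qquad \forall\, Q \subset X$$
and a simple induction argument yields
$$\mu^*(Q \cap (A_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} A_M)) = \sum_{j=1}^{M} \mu^*(Q \cap A_j) \qquad \forall\, Q \subset X$$
for all mutually disjoint $A_1, A_2, \ldots, A_M \in \mathcal{A}^*$. In particular, if $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}^*$ is a sequence of pairwise disjoint sets, we find for their union $A = \dot{\bigcup}_{j\in\mathbb{N}} A_j$ that
$$\mu^*(Q \cap A) \ge \mu^*(Q \cap (A_1 \mathbin{\dot\cup} \cdots \mathbin{\dot\cup} A_M)) = \sum_{j=1}^{M} \mu^*(Q \cap A_j). \qquad (6.7)$$
Since $A_1 \cup \cdots \cup A_M \in \mathcal{A}^*$, we can use the monotonicity (OM2) and (6.7) to deduce
$$\begin{aligned}
\mu^*(Q) &= \mu^*(Q \cap (A_1 \cup \cdots \cup A_M)) + \mu^*(Q \setminus (A_1 \cup \cdots \cup A_M)) \\
&\ge \mu^*(Q \cap (A_1 \cup \cdots \cup A_M)) + \mu^*(Q \setminus A) \\
&= \sum_{j=1}^{M} \mu^*(Q \cap A_j) + \mu^*(Q \setminus A). \qquad (6.8)
\end{aligned}$$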
The left-hand side is independent of $M$; therefore, we can let $M \to \infty$ and get
$$\mu^*(Q) \ge \sum_{j=1}^{\infty} \mu^*(Q \cap A_j) + \mu^*(Q \setminus A) \ge \mu^*(Q \cap A) + \mu^*(Q \setminus A). \qquad (6.9)$$
The reverse inequality $\mu^*(Q) \le \mu^*(Q \cap A) + \mu^*(Q \setminus A)$ follows at once from the subadditivity of $\mu^*$. This means that equality holds throughout (6.9) and we get $A \in \mathcal{A}^*$. If we take $Q = A$ in (6.9) we even see the $\sigma$-additivity of $\mu^*$ on $\mathcal{A}^*$. So far we have seen that $\mathcal{A}^*$ is a $\cup$-stable Dynkin system. Because of $A \cap B = (A^c \cup B^c)^c$ we see that $\mathcal{A}^*$ is also $\cap$-stable and, by L5.4, a $\sigma$-algebra.

Step 5. Claim: $\mu^*$ is a measure on $\sigma(\mathcal{S})$ which extends $\mu$. By step 3, $\mathcal{S} \subset \mathcal{A}^*$ and thus $\sigma(\mathcal{S}) \subset \sigma(\mathcal{A}^*) = \mathcal{A}^*$ since $\mathcal{A}^*$ is itself a $\sigma$-algebra (step 4). Again by step 4, $\mu^*|_{\sigma(\mathcal{S})}$ is a measure which, by step 2, extends $\mu$.

Step 6. Uniqueness of $\mu^*|_{\sigma(\mathcal{S})}$. If there is an exhausting sequence $(S_j)_{j\in\mathbb{N}} \subset \mathcal{S}$, $S_j \uparrow X$ such that $\mu(S_j) < \infty$ for all $j \in \mathbb{N}$, it follows from T5.7 that any two extensions of $\mu$ to $\sigma(\mathcal{S})$ coincide.

6.3 Remark The core of Carathéodory's theorem 6.1 is the definition (6.4) of $\mu^*$-measurable sets, i.e. of the $\sigma$-algebra $\mathcal{A}^*$. The proof shows that, in general, we cannot expect $\mu^*$ to be ($\sigma$-)additive outside $\mathcal{A}^*$. In many situations the $\sigma$-algebra $\mathcal{P}(X)$ is simply too big to support a non-trivial measure. Notable exceptions are countable sets $X$ or Dirac measures. For $n$-dimensional Lebesgue measure, this was first remarked by Hausdorff [19, pp. 401–402]. The general case depends on the cardinality of $X$ and the behaviour of $\mu$ on one-point sets; see the discussion in Oxtoby [33, Chapter 5]. Put in other words this says that even a household measure like Lebesgue measure cannot assign a content to every set! In $\mathbb{R}^3$ (and higher dimensions) we even have the Banach–Tarski paradox: the open balls $B_1(0)$ and $B_2(0)$ with centre $0$ and radii $1$ resp. $2$ have finite disjoint decompositions $B_1(0) = \dot{\bigcup}_{j=1}^{M} E_j$ and $B_2(0) = \dot{\bigcup}_{j=1}^{M} F_j$ such that for every $j = 1, 2, \ldots, M$ the sets $E_j$ and $F_j$ are geometrically congruent (hence, should have the same Lebesgue measure); see Stromberg [49] or Wagon [52]. Of course, not all of the sets $E_j$ and $F_j$ can be Borel sets. This brings us to the question if and how we can construct a non-Borel measurable set, i.e. a set $A \in \mathcal{P}(\mathbb{R}^n) \setminus \mathcal{B}(\mathbb{R}^n)$. Such constructions are possible but they are based on the axiom of choice, see for example Hewitt and Stromberg [20, pp. 136–7], Oxtoby [33, pp. 22–3] or Appendix D.
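Let us now apply Theorem 6.1 to prove the existence of $n$-dimensional Lebesgue measure $\lambda^n$ which was defined for half-open rectangles $\mathcal{J}^n = \mathcal{J}(\mathbb{R}^n)$ in D4.8:
$$\lambda^n([a,b)) = \prod_{j=1}^{n} (b_j - a_j), \qquad [a,b) = \times_{j=1}^{n} [a_j, b_j) \in \mathcal{J}^n.$$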
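6.4 Proposition The family of $n$-dimensional rectangles $\mathcal{J}^n$ is a semi-ring.

Proof (By induction) It is obvious that $\mathcal{J}^1$ satisfies the properties (S1)–(S3) from the definition of a semi-ring. Assume that $\mathcal{J}^n$ is a semi-ring for some $n \ge 1$. From the definition of rectangles it is clear that
$$\mathcal{J}^{n+1} = \mathcal{J}^n \times \mathcal{J}^1 = \{ I_n \times I_1 : I_n \in \mathcal{J}^n,\ I_1 \in \mathcal{J}^1 \}.$$
(S1) is obviously true and (S2) follows from the identity
$$(I_n \times I_1) \cap (J_n \times J_1) = (I_n \cap J_n) \times (I_1 \cap J_1) \qquad (6.10)$$
where $I_n, J_n \in \mathcal{J}^n$ and $I_1, J_1 \in \mathcal{J}^1$. Since
$$\begin{aligned}
(J_n \times J_1)^c &= \{ (x,y) : x \notin J_n,\ y \notin J_1 \ \text{ or } \ x \in J_n,\ y \notin J_1 \ \text{ or } \ x \notin J_n,\ y \in J_1 \} \\
&= (J_n^c \times J_1^c) \mathbin{\dot\cup} (J_n \times J_1^c) \mathbin{\dot\cup} (J_n^c \times J_1)
\end{aligned}$$
we see, using (6.10),
$$\begin{aligned}
(I_n \times I_1) \setminus (J_n \times J_1) &= (I_n \times I_1) \cap (J_n \times J_1)^c \\
&= ((I_n \setminus J_n) \times (I_1 \setminus J_1)) \mathbin{\dot\cup} ((I_n \cap J_n) \times (I_1 \setminus J_1)) \mathbin{\dot\cup} ((I_n \setminus J_n) \times (I_1 \cap J_1)).
\end{aligned}$$
Both $I_n \setminus J_n$ and $I_1 \setminus J_1$ are made up of finitely many disjoint rectangles from $\mathcal{J}^n$ and $\mathcal{J}^1$, and therefore $(I_n \times I_1) \setminus (J_n \times J_1)$ is a finite union of disjoint rectangles from $\mathcal{J}^n \times \mathcal{J}^1$; thus (S3) holds.

In $\mathbb{R}^2$ it is easy to depict the two typical situations that occur in the proof of (S3) in Proposition 6.4:

[Figure: two sketches of $I_n \times I_1$ with $J_n \times J_1$ removed; the remainder is cut into numbered disjoint rectangles (1–8 in the first sketch, 1–3 in the second).]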
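The proof of Proposition 6.4 reveals a bit more: the Cartesian product of any two semi-rings is again a semi-ring.

6.5 Proposition $\lambda^n$ is a pre-measure on $\mathcal{J}^n$.

Proof It is enough to verify (i), (ii) and (iii') of Theorem 4.4 since $\lambda^n$ assigns finite measure to every rectangle in $\mathcal{J}^n$. We consider only the case $n = 2$, since $n = 1$ is similar but easier and $n \ge 3$ adds only notational complications. Obviously, $\lambda^2(\emptyset) = 0$. To see additivity on $\mathcal{J}^2$, we may as well cut $I = [a_1, b_1) \times [a_2, b_2)$ along one direction (say, along $j = 2$) to get $I_1 = [a_1, b_1) \times [a_2, \gamma)$, $I_2 = [a_1, b_1) \times [\gamma, b_2)$ and reassemble it $I = I_1 \mathbin{\dot\cup} I_2$ (if $n \ge 3$ this is accomplished by a hyperplane).

[Figure: the rectangle with corners $(a_1, a_2)$ and $(b_1, b_2)$, cut at height $\gamma$ into $I_1$ (below) and $I_2$ (above).]

Thus
$$\begin{aligned}
\lambda^2(I_1) + \lambda^2(I_2) &= (b_1 - a_1)(\gamma - a_2) + (b_1 - a_1)(b_2 - \gamma) \\
&= (b_1 - a_1)\big( (b_2 - \gamma) + (\gamma - a_2) \big) = (b_1 - a_1)(b_2 - a_2) = \lambda^2(I).
\end{aligned}$$
Now let $(I_j)_{j\in\mathbb{N}} \subset \mathcal{J}^2$, $I_j = [a^j, b^j)$, be a decreasing sequence of rectangles $I_j \downarrow \emptyset$. We have to show that $\lim_{j\to\infty} \lambda^2(I_j) = 0$. Since $I_j \downarrow \emptyset$, it is clear that at least in one coordinate direction, say $k = 2$, we have $\lim_{j\to\infty} (b_2^j - a_2^j) = 0$, otherwise $\bigcap_{j\in\mathbb{N}} I_j$ would contain a rectangle with side-lengths $\lim_{j\to\infty} (b_k^j - a_k^j) > 0$ for $k = 1, 2$. But then
$$\lambda^2(I_j) = \prod_{k=1}^{2} (b_k^j - a_k^j) \le \Big[ \max_{k \ne 2} (b_k^j - a_k^j) \Big]^{2-1} (b_2^j - a_2^j) \xrightarrow{\ j\to\infty\ } 0.$$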
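The additivity step in the proof of Proposition 6.5 can be replayed with exact arithmetic. The sketch below is only an illustration: the encoding of a rectangle $[a,b)$ by its two corner tuples and the helper names `volume` and `cut` are assumptions of this example, not notation from the text.

```python
from fractions import Fraction


def volume(a, b):
    """Product of the side lengths of [a, b); zero if some side is empty."""
    result = Fraction(1)
    for lo, hi in zip(a, b):
        if hi <= lo:
            return Fraction(0)
        result *= Fraction(hi) - Fraction(lo)
    return result


def cut(a, b, axis, gamma):
    """Split [a, b) at x_axis = gamma into a lower and an upper rectangle."""
    lower_b = list(b)
    lower_b[axis] = gamma   # lower piece ends at gamma
    upper_a = list(a)
    upper_a[axis] = gamma   # upper piece starts at gamma
    return (list(a), lower_b), (upper_a, list(b))


a, b = (0, 0), (3, 2)                      # I = [0,3) x [0,2)
(i1_a, i1_b), (i2_a, i2_b) = cut(a, b, axis=1, gamma=Fraction(1, 2))
total = volume(i1_a, i1_b) + volume(i2_a, i2_b)
```

For this choice the two pieces have volumes $3/2$ and $9/2$, so `total` agrees with `volume(a, b)`, which is $6$.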
6.6 Corollary (Existence of Lebesgue measure) There is a unique extension of $n$-dimensional Lebesgue pre-measure $\lambda^n$ from $\mathcal{J}^n$ (Definition 4.8) to a measure on the Borel sets $\mathcal{B}(\mathbb{R}^n)$. This extension is again denoted by $\lambda^n$ and is called Lebesgue measure.

Proof We know from Theorem 3.8 that $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}^n)$. Since $[-k,k)^n \uparrow \mathbb{R}^n$ is an exhausting sequence of cubes and since $\lambda^n([-k,k)^n) = (2k)^n < \infty$, all conditions of Carathéodory's theorem 6.1 are fulfilled, and $\lambda^n$ extends to a measure on $\mathcal{B}(\mathbb{R}^n)$.
6.7 Remark The uniqueness of Lebesgue measure and its properties (cf. Theorem 4.9) show that it is necessarily the familiar elementary-geometric volume (length, area, …)-function $\mathrm{vol}^n(\cdot)$ in the sense that $\mathrm{vol}^n$ can in only one way be extended to a measure on the Borel $\sigma$-algebra.

Problems

6.1. Consider on $\mathbb{R}$ the family $\Sigma$ of all Borel sets which are symmetric w.r.t. the origin. Show that $\Sigma$ is a $\sigma$-algebra. Is it possible to extend a pre-measure $\mu$ on $\Sigma$ to a measure on $\mathcal{B}(\mathbb{R})$? If so, is this extension unique? Continues in Problem 9.12.

6.2. Completion (2). Recall from Problem 4.13 that a measure space $(X, \mathcal{A}, \mu)$ is complete, if every subset of a $\mu$-null set is a $\mu$-null set (thus, in particular, measurable). Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space – i.e. there is an exhausting sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ such that $\mu(A_j) < \infty$. As in the proof of Theorem 6.1 we write $\mu^*$ for the outer measure (6.1) – now defined using $\mathcal{A}$-coverings – and $\mathcal{A}^*$ for the $\sigma$-algebra defined by (6.4).
(i) Show that for every $Q \subset X$ there is some $A \in \mathcal{A}$ such that $\mu^*(Q) = \mu(A)$ and that $\mu(N) = 0$ for all $N \subset Q \setminus A$ with $N \in \mathcal{A}$. [Hint: since $\mu^*$ is defined as an infimum, every $Q$ with $\mu^*(Q) < \infty$ admits a sequence $B_k \in \mathcal{A}$ with $B_k \supset Q$ and $\mu(B_k) - \mu^*(Q) \le 1/k$. If $\mu^*(Q) = \infty$, consider for each $j \in \mathbb{N}$ the set $Q \cap A_j$.]
(ii) Show that $(X, \mathcal{A}^*, \mu^*|_{\mathcal{A}^*})$ is a complete measure space.
(iii) Show that $(X, \mathcal{A}^*, \mu^*|_{\mathcal{A}^*})$ is the completion of $(X, \mathcal{A}, \mu)$ in the sense of Problem 4.13.

6.3. (i) Show that non-void open sets in $\mathbb{R}$ (resp. $\mathbb{R}^n$) have always strictly positive Lebesgue measure. [Hint: let $U$ be open. Find a small ball in $U$ and inscribe a cube.]
(ii) Is (i) still true for closed sets?

6.4. (i) Show that $\lambda^1((a,b)) = b - a$ for all $a, b \in \mathbb{R}$, $a \le b$. [Hint: approximate $(a,b)$ by half-open intervals and use Theorem 4.4.]
(ii) Let $H \subset \mathbb{R}^2$ be a hyperplane which is perpendicular to the $x_1$-direction (that is to say: $H$ is a translate of the $x_2$-axis). Show that $H \in \mathcal{B}(\mathbb{R}^2)$ and $\lambda^2(H) = 0$. [Hint: consider the sets $A_k = [-\epsilon 2^{-k}, \epsilon 2^{-k}) \times [-k, k)$ and note that $H \subset y + \bigcup_{k\in\mathbb{N}} A_k$ for some $y$.]
(iii) State and prove the $\mathbb{R}^n$-analogues of (i) and (ii).

6.5. Let $(X, \mathcal{A}, \mu)$ be a measure space such that all singletons $\{x\} \in \mathcal{A}$. A point $x$ is called an atom, if $\mu(\{x\}) > 0$. A measure is called non-atomic or diffuse, if there are no atoms.
(i) Show that one-dimensional Lebesgue measure $\lambda^1$ is diffuse.
(ii) Give an example of a non-diffuse measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
(iii) Show that for a diffuse measure $\mu$ on $(X, \mathcal{A})$ all countable sets are null sets.
(iv) Show that every probability measure $P$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ can be decomposed into a sum of two measures $\mu + \nu$, where $\mu$ is diffuse and $\nu$ is a measure of the form $\nu = \sum_{j\in\mathbb{N}} \epsilon_j \delta_{x_j}$, $\epsilon_j > 0$, $x_j \in \mathbb{R}$. [Hint: since $P(\mathbb{R}) = 1$, there are at most $k$ points $y_1^k, y_2^k, \ldots, y_k^k$ such that $\frac{1}{k-1} > P(\{y_j^k\}) \ge \frac{1}{k}$. Find by recursion (in $k$) all points satisfying such a relation. There are at most countably many of these $y_j^k$. Relabel them as $x_1, x_2, \ldots$. These are the atoms of $P$. Now take $\epsilon_j = P(\{x_j\})$, define $\nu$ as stated and prove that $\nu$ and $P - \nu$ are measures.]

6.6. A set $A \subset \mathbb{R}^n$ is called bounded, if it can be contained in a ball $B_r(0) \supset A$ of finite radius $r$. A set $A \subset \mathbb{R}^n$ is called connected, if we can go along a curve from any point $a \in A$ to any other point $a' \in A$ without ever leaving $A$, cf. Appendix B.
(i) Construct an open and unbounded set in $\mathbb{R}$ with finite, strictly positive Lebesgue measure. [Hint: try unions of ever smaller open intervals centred around $n \in \mathbb{N}$.]
(ii) Construct an open, unbounded and connected set in $\mathbb{R}^2$ with finite, strictly positive Lebesgue measure. [Hint: try a union of adjacent, ever longer, ever thinner rectangles.]
(iii) Is there a connected, open and unbounded set in $\mathbb{R}$ with finite, strictly positive Lebesgue measure?

6.7. Let $\lambda = \lambda^1|_{[0,1]}$ be Lebesgue measure on $([0,1], \mathcal{B}[0,1])$. Show that for every $\epsilon > 0$ there is a dense open set $U \subset [0,1]$ with $\lambda(U) \le \epsilon$. [Hint: take an enumeration $(q_j)_{j\in\mathbb{N}}$ of $\mathbb{Q} \cap (0,1)$ and make each $q_j$ the centre of a small open interval.]

6.8. Let $\lambda = \lambda^1$ be Lebesgue measure on $\mathbb{R}$. Show that $N \in \mathcal{B}(\mathbb{R})$ is a null set if, and only if, for every $\epsilon > 0$ there is an open set $U = U_\epsilon \supset N$ such that $\lambda(U) < \epsilon$. [Hint: sufficiency is trivial, for necessity use $\mu^*$ constructed in Theorem 6.1 (6.1) from $\mu = \lambda|_{\mathcal{J}}$ and observe that by Theorem 6.1 $\lambda = \mu^*$ on the Borel sets. This gives the required open cover.]

6.9. Borel–Cantelli lemma (1) – the direct half. Prove the following theorem.
Theorem (Borel–Cantelli lemma). Let $(\Omega, \mathcal{A}, P)$ be a probability space. For every sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ we have
$$\sum_{j=1}^{\infty} P(A_j) < \infty \implies P\Big( \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} A_j \Big) = 0. \qquad (6.11)$$
[Hint: use Theorem 4.4 and the fact that $P\big( \bigcup_{j \ge n} A_j \big) \le \sum_{j \ge n} P(A_j)$.]
Remark. This is the 'easy' or direct half of the so-called Borel–Cantelli lemma; for the more difficult part see T18.9. The condition $\omega \in \bigcap_{n=1}^{\infty} \bigcup_{j=n}^{\infty} A_j$ means that $\omega$ happens to be in infinitely many of the $A_j$ and the lemma gives a simple sufficient condition when certain events happen almost surely not infinitely often, i.e. only finitely often with probability one.
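6.10. Non-measurable sets (1). Let $\mu$ be a measure on $\mathcal{A} = \{ \emptyset, [0,1), [1,2), [0,2) \}$, $X = [0,2)$, such that $\mu([0,1)) = \mu([1,2)) = \frac{1}{2}$ and $\mu([0,2)) = 1$. Denote by $\mu^*$ and $\mathcal{A}^*$ the outer measure and $\sigma$-algebra which appear in the proof of Theorem 6.1.
(i) Find $\mu^*((a,b))$ and $\mu^*(\{a\})$ for all $0 \le a < b < 2$ if we use $\mathcal{S} = \mathcal{A}$ in T6.1;
(ii) Show that $(0,1), \{0\} \notin \mathcal{A}^*$.

6.11. Non-measurable sets (2). Consider on $X = \mathbb{R}$ the $\sigma$-algebra $\mathcal{A} = \{ A \subset \mathbb{R} : A \text{ or } A^c \text{ is countable} \}$ from Example 3.3(v) and the measure $\gamma(A)$ from 4.7(ii) which is $0$ or $1$ according to $A$ or $A^c$ being countable. Denote by $\gamma^*$ and $\mathcal{A}^*$ the outer measure and $\sigma$-algebra which appear in the proof of Theorem 6.1.
(i) Find $\gamma^*$ if we use $\mathcal{S} = \mathcal{A}$ in T6.1;
(ii) Show that no set $B \subset \mathbb{R}$, such that both $B$ and $B^c$ are uncountable, is in $\mathcal{A}$ or in $\mathcal{A}^*$.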
7 Measurable mappings
In this chapter we consider maps $T : X \to X'$ between two measurable spaces $(X, \mathcal{A})$ and $(X', \mathcal{A}')$ which respect the measurable structures, that is $\sigma$-algebras, on $X$ and $X'$. Such maps can be used to transport a given measure $\mu$, defined on $(X, \mathcal{A})$, onto $(X', \mathcal{A}')$. We have already used this technique in Theorem 5.8, where we considered shifts of sets: $A \rightsquigarrow x + A$, but it is in probability theory where this concept is truly fundamental: you use it whenever you speak of the 'distribution' of a 'random variable'.

7.1 Definition Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be two measurable spaces. A map $T : X \to X'$ is called $\mathcal{A}/\mathcal{A}'$-measurable (or measurable unless this is too ambiguous) if the pre-image of every measurable set is a measurable set:
$$T^{-1}(A') \in \mathcal{A} \qquad \forall\, A' \in \mathcal{A}'. \qquad (7.1)$$
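A random variable is a measurable map from a probability space to any measurable space. Note that $T^{-1}(\mathcal{A}') \subset \mathcal{A}$ is a common shorthand for (7.1). In the language of Definition 7.1 the translation (and its inverse) which we used in Theorem 5.8 is a $\mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R}^n)$-measurable map:
$$\tau_x : \mathbb{R}^n \to \mathbb{R}^n,\ y \mapsto y - x \qquad \text{and} \qquad \tau_x^{-1} : \mathbb{R}^n \to \mathbb{R}^n,\ y \mapsto y + x. \qquad (7.2)$$
In fact, Theorem 5.8 states that
$$\lambda^n(B) = \lambda^n(x + B) \qquad \forall\, B \in \mathcal{B}(\mathbb{R}^n) \qquad (7.3)$$
and this requires $x + B = \tau_{-x}(B) = \tau_x^{-1}(B)$ to be a Borel set! Our proof of T5.8 needed (and proved) this for rectangles $B \in \mathcal{J}^n$ and not for all Borel sets – but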
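this is good enough even in the most general case. The following lemma shows that measurability needs only to be checked for the sets of a generator.

7.2 Lemma Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be measurable spaces and let $\mathcal{A}' = \sigma(\mathcal{G}')$. Then $T : X \to X'$ is $\mathcal{A}/\mathcal{A}'$-measurable if, and only if, $T^{-1}(\mathcal{G}') \subset \mathcal{A}$, i.e. if
$$T^{-1}(G') \in \mathcal{A} \qquad \forall\, G' \in \mathcal{G}'. \qquad (7.4)$$

Proof If $T$ is $\mathcal{A}/\mathcal{A}'$-measurable, we have $T^{-1}(\mathcal{G}') \subset T^{-1}(\mathcal{A}') \subset \mathcal{A}$, and (7.4) is obviously satisfied. Conversely, consider the system $\Sigma' = \{ A' \subset X' : T^{-1}(A') \in \mathcal{A} \}$. By (7.4), $\mathcal{G}' \subset \Sigma'$ and it is not difficult to see that $\Sigma'$ is itself a $\sigma$-algebra since $T^{-1}$ commutes with all set-operations. Therefore,
$$\mathcal{A}' = \sigma(\mathcal{G}') \subset \sigma(\Sigma') = \Sigma' \implies T^{-1}(A') \in \mathcal{A} \qquad \forall\, A' \in \mathcal{A}'.$$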
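On finite sets, Lemma 7.2 can be checked by brute force. The following sketch is an illustration under simplifying assumptions: on a finite set a $\sigma$-algebra is just a family closed under complements and finite unions, and the spaces, generators and maps below (`T`, `T_bad`) are made up for this example.

```python
from itertools import combinations


def generate_sigma_algebra(space, generator):
    """Smallest family containing the generator that is closed under
    complements and unions (enough for a sigma-algebra on a finite set)."""
    space = frozenset(space)
    family = {frozenset(), space} | {frozenset(g) for g in generator}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            if space - a not in family:
                family.add(space - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in family:
                family.add(a | b)
                changed = True
    return family


def preimage(t, domain, target_set):
    return frozenset(x for x in domain if t[x] in target_set)


def is_measurable_on(t, domain, sigma_domain, family):
    """Check that the pre-image of every set in `family` lies in sigma_domain."""
    return all(preimage(t, domain, b) in sigma_domain for b in family)


X = range(6)
A = generate_sigma_algebra(X, [{0, 1}, {2, 3}])   # atoms {0,1}, {2,3}, {4,5}
Y = {"a", "b", "c"}
G = [{"a"}, {"b"}]                                # generator of the power set of Y
A_prime = generate_sigma_algebra(Y, G)

T = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c"}
T_bad = {0: "a", 1: "b", 2: "b", 3: "b", 4: "c", 5: "c"}

on_generator = is_measurable_on(T, X, A, G)
on_everything = is_measurable_on(T, X, A, A_prime)
bad_on_generator = is_measurable_on(T_bad, X, A, G)
```

For `T` the check on the two generating sets already succeeds, and the check on all eight sets of `A_prime` succeeds as well, as the lemma predicts; `T_bad` fails on the generator because the pre-image of `{"a"}` is `{0}`, which is not in `A`.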
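On a topological space $(X, \mathcal{O})$ – see Appendix B – we consider usually the (topological) Borel $\sigma$-algebra $\mathcal{B}(X) = \sigma(\mathcal{O})$. The interplay between measurability and topology is often quite intricate. One of the simple and extremely useful aspects is the fact that continuous maps are measurable; let us check this for $\mathbb{R}^n$.

7.3 Example Every continuous map $T : \mathbb{R}^n \to \mathbb{R}^m$ is $\mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R}^m)$-measurable. From calculus^1 we know that $T$ is continuous if, and only if,
$$T^{-1}(U) \subset \mathbb{R}^n \text{ is open} \qquad \forall \text{ open } U \subset \mathbb{R}^m. \qquad (7.5)$$
Since the open sets $\mathcal{O}^m$ in $\mathbb{R}^m$ generate the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^m)$, we can use (7.5) to deduce
$$T^{-1}(\mathcal{O}^m) \subset \mathcal{O}^n \subset \sigma(\mathcal{O}^n) = \mathcal{B}(\mathbb{R}^n).$$
By Lemma 7.2, $T^{-1}(\mathcal{B}(\mathbb{R}^m)) \subset \mathcal{B}(\mathbb{R}^n)$ which means that $T$ is measurable.
Caution: Not every measurable map is continuous, e.g. $x \mapsto \mathbf{1}_{[-1,1]}(x)$.

7.4 Theorem Let $(X_j, \mathcal{A}_j)$, $j = 1, 2, 3$, be measurable spaces and $T : X_1 \to X_2$, $S : X_2 \to X_3$ be $\mathcal{A}_1/\mathcal{A}_2$- resp. $\mathcal{A}_2/\mathcal{A}_3$-measurable maps. Then $S \circ T : X_1 \to X_3$ is $\mathcal{A}_1/\mathcal{A}_3$-measurable.

Proof For $A_3 \in \mathcal{A}_3$ we have, since $S^{-1}(A_3) \in \mathcal{A}_2$,
$$(S \circ T)^{-1}(A_3) = T^{-1}\big( S^{-1}(A_3) \big) \in T^{-1}(\mathcal{A}_2) \subset \mathcal{A}_1.$$

^1 See also Appendix B, Theorem B.12 and B.19.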
Often we find ourselves in a situation where $T : X \to X'$ is given and where $X'$ is equipped with a natural $\sigma$-algebra $\mathcal{A}'$ – e.g. if $X' = \mathbb{R}$ and $\mathcal{A}' = \mathcal{B}(\mathbb{R})$ – but no $\sigma$-algebra is specified in $X$. Then the question arises: is there a (smallest) $\sigma$-algebra on $X$ which makes $T$ measurable? An obvious, but nevertheless useless, candidate is $\mathcal{P}(X)$, which renders every map measurable. From Example 3.3(vii) we know that $T^{-1}(\mathcal{A}')$ is a $\sigma$-algebra in $X$ but we cannot remove a single set from it without endangering the measurability of $T$. Let us formalize this observation.

7.5 Definition (and Lemma) Let $(T_i)_{i\in I}$ be arbitrarily many mappings $T_i : X \to X_i$ from the same space $X$ into measurable spaces $(X_i, \mathcal{A}_i)$. The smallest $\sigma$-algebra on $X$ that makes all $T_i$ simultaneously measurable is
$$\sigma(T_i : i \in I) = \sigma\Big( \bigcup_{i\in I} T_i^{-1}(\mathcal{A}_i) \Big). \qquad (7.6)$$
We say that $\sigma(T_i : i \in I)$ is generated by the family $(T_i)_{i\in I}$.

Although $T_i^{-1}(\mathcal{A}_i)$ is a $\sigma$-algebra this is, in general, no longer true for $\bigcup_{i\in I} T_i^{-1}(\mathcal{A}_i)$ if $\#I > 1$; this explains why we have to use the $\sigma$-hull in (7.6).
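7.6 Theorem Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be measurable spaces and $T : X \to X'$ be an $\mathcal{A}/\mathcal{A}'$-measurable map. For every measure $\mu$ on $(X, \mathcal{A})$,
$$\mu'(A') = \mu(T^{-1}(A')), \qquad A' \in \mathcal{A}', \qquad (7.7)$$
defines a measure on $(X', \mathcal{A}')$.

Proof If $A' = \emptyset$, then $T^{-1}(\emptyset) = \emptyset$ and $\mu'(\emptyset) = \mu(\emptyset) = 0$. If $(A'_j)_{j\in\mathbb{N}} \subset \mathcal{A}'$ is a sequence of mutually disjoint sets, then
$$\mu'\Big( \dot{\bigcup}_{j\in\mathbb{N}} A'_j \Big) = \mu\Big( T^{-1}\Big( \dot{\bigcup}_{j\in\mathbb{N}} A'_j \Big) \Big) = \mu\Big( \dot{\bigcup}_{j\in\mathbb{N}} T^{-1}(A'_j) \Big) = \sum_{j\in\mathbb{N}} \mu(T^{-1}(A'_j)) = \sum_{j\in\mathbb{N}} \mu'(A'_j).$$

Notice that we have seen a special case of Theorem 7.6 in the proof of Theorem 5.8 when considering translates of Lebesgue measure: $\lambda^n(x + B) = \lambda^n(\tau_x^{-1}(B))$.

7.7 Definition The measure $\mu'(\cdot)$ of Theorem 7.6 is called the image measure of $\mu$ under $T$ and is denoted by $T(\mu)(\cdot)$ or $\mu \circ T^{-1}(\cdot)$.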
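7.8 Example Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\xi : \Omega \to \mathbb{R}$ be a random variable, i.e. an $\mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable map. Then^2
$$\xi(P)(A) = P(\xi^{-1}(A)) = P(\{ \omega : \xi(\omega) \in A \}) = P(\xi \in A)$$
is again a probability measure, called the law or distribution of the random variable $\xi$.

More concretely, if $(\Omega, \mathcal{A}, P)$ describes throwing two fair dice, i.e. $\Omega = \{ (j,k) : 1 \le j, k \le 6 \}$, $\mathcal{A} = \mathcal{P}(\Omega)$ and $P(\{(j,k)\}) = 1/36$, we could ask for the total number of points thrown: $\xi : \Omega \to \{2, 3, \ldots, 12\}$, $\xi((j,k)) = j + k$, which is a measurable map. The law of $\xi$ is then given in the table below:

j            2     3     4     5    6     7    8     9    10    11    12
P(ξ = j)     1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36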
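The table can be recomputed directly from the definition of the image measure: the mass of each outcome $(j,k)$ is pushed forward to the value $j + k$. The following sketch uses exact fractions; the variable names (`omega`, `P`, `xi`, `law`) are chosen for this illustration only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space of two fair dice with the uniform probability measure.
omega = list(product(range(1, 7), repeat=2))
P = {outcome: Fraction(1, 36) for outcome in omega}


def xi(outcome):
    """Total number of points thrown."""
    j, k = outcome
    return j + k


# Image measure: law[s] = P(xi^{-1}({s})), accumulated outcome by outcome.
law = defaultdict(Fraction)
for outcome, prob in P.items():
    law[xi(outcome)] += prob

for total in sorted(law):
    print(total, law[total])
```

The printed values agree with the table above once fractions are reduced, e.g. $6/36 = 1/6$ for the total $7$, and they sum to $1$ as they must for a probability measure.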
We close this section with some transformation formulae for Lebesgue measure. Recall that $O(n)$ is the set of all orthogonal $n \times n$ matrices: $T \in O(n)$ if, and only if, ${}^tT \cdot T = \mathrm{id}$. Orthogonal matrices preserve lengths and angles, i.e. we have for all $x, y \in \mathbb{R}^n$
$$\langle x, y \rangle = \langle Tx, Ty \rangle \iff \|x\| = \|Tx\| \qquad (7.8)$$
where $\langle x, y \rangle = \sum_{j=1}^{n} x_j y_j$ and $\|x\|^2 = \langle x, x \rangle$ denote the usual Euclidean scalar product, resp. norm.

7.9 Theorem If $T \in O(n)$, then $\lambda^n = T(\lambda^n)$.

Proof Since $T$ is a linear orthogonal map it is continuous and by (7.8) even an isometry,
$$\|Tx - Ty\| = \|T(x - y)\| = \|x - y\|,$$
hence measurable by Example 7.3. Therefore, the image measure $\mu(B) = \lambda^n(T^{-1}(B))$ is well-defined (by T7.6) and satisfies for all $x \in \mathbb{R}^n$
$$\mu(x + B) = \lambda^n(T^{-1}(x + B)) = \lambda^n(T^{-1}x + T^{-1}(B)) \overset{5.8}{=} \lambda^n(T^{-1}(B)) = \mu(B)$$

^2 We use the shorthand $\{\xi \in A\}$ for $\xi^{-1}(A)$ and $P(\xi \in A)$ for $P(\{\xi \in A\})$.
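and, again by Theorem 5.8, $\mu(B) = \kappa\, \lambda^n(B)$ for all $B \in \mathcal{B}(\mathbb{R}^n)$. To determine the constant $\kappa$ we choose $B = B_1(0)$. Since $T \in O(n)$, (7.8) implies
$$B_1(0) = \{ x : \|x\| < 1 \} = \{ x : \|Tx\| < 1 \} = T^{-1}(B_1(0))$$
and thus
$$\lambda^n(B_1(0)) = \lambda^n(T^{-1}(B_1(0))) = \mu(B_1(0)) = \kappa\, \lambda^n(B_1(0)).$$
As $0 < \lambda^n(B_1(0)) < \infty$, we have $\kappa = 1$, and the theorem follows.

Theorem 7.9 is a particular case of the following general change-of-variable formula for Lebesgue measure. Recall that $GL(n, \mathbb{R})$ is the set of all invertible $n \times n$ matrices, i.e. $S \in GL(n, \mathbb{R}) \iff \det S \ne 0$.

7.10 Theorem Let $S \in GL(n, \mathbb{R})$. Then
$$S(\lambda^n) = |\det S^{-1}|\, \lambda^n = \frac{1}{|\det S|}\, \lambda^n. \qquad (7.9)$$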
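For $n = 2$ the scaling factor in (7.9) can be checked by hand on the unit square: its pre-image under $S$ is a parallelogram whose area should be $1/|\det S|$. The sketch below does this with exact fractions; the matrix `S` is an arbitrary invertible example and the helpers (`det2`, `inverse2`, `apply`, `polygon_area`) are assumptions of this illustration, with the area computed by the elementary shoelace formula.

```python
from fractions import Fraction


def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c


def inverse2(m):
    """Inverse of a 2x2 matrix with exact rational entries."""
    (a, b), (c, d) = m
    det = Fraction(det2(m))
    if det == 0:
        raise ValueError("matrix is not invertible")
    return ((d / det, -b / det), (-c / det, a / det))


def apply(m, v):
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given by its vertices in order."""
    twice = Fraction(0)
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice += x1 * y2 - x2 * y1
    return abs(twice) / 2


S = ((2, 1), (1, 3))                          # example matrix, det S = 5
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]    # corners of the unit square
image = [apply(inverse2(S), v) for v in corners]
area = polygon_area(image)
```

Here `area` comes out as `Fraction(1, 5)`, i.e. $1/|\det S|$. This only tests one set, of course; the theorem asserts the same factor for every Borel set.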
Proof Since $S$ is invertible, both $S$ and $S^{-1}$ are linear maps on $\mathbb{R}^n$, and as such continuous and measurable (Example 7.3). Set $\mu(B) = \lambda^n(S^{-1}(B))$ for $B \in \mathcal{B}(\mathbb{R}^n)$. Then we have for all $x \in \mathbb{R}^n$
$$\mu(x + B) = \lambda^n(S^{-1}(x + B)) = \lambda^n(S^{-1}x + S^{-1}(B)) \overset{5.8}{=} \lambda^n(S^{-1}(B)) = \mu(B)$$
and from Theorem 5.8 we conclude that
$$\mu(B) = \mu([0,1)^n)\, \lambda^n(B) = \lambda^n(S^{-1}([0,1)^n))\, \lambda^n(B).$$
From elementary geometry we know that $S^{-1}([0,1)^n)$ is a parallelepiped spanned by the vectors $S^{-1}e_j$, $j = 1, 2, \ldots, n$, $e_j = (0, \ldots, 0, 1, 0, \ldots)$ with the $1$ in the $j$th position. Its geometric volume is
$$\mathrm{vol}^n\big( S^{-1}([0,1)^n) \big) = |\det S^{-1}| = \frac{1}{|\det S|},$$
see also Appendix C. By Remark 6.7, $\mathrm{vol}^n = \lambda^n$ (at least on the Borel sets) and the proof is finished.

Theorem 7.9 or 7.10 allow us to complete the characterization of Lebesgue measure announced earlier in Theorem 4.9. A motion is a linear transformation of the form
$$M = \tau_x \circ T$$
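where $\tau_x(y) = y - x$ is a translation and $T \in O(n)$ is an orthogonal map (${}^tT \cdot T = \mathrm{id}$). In particular, congruent sets are connected by motions.

7.11 Corollary Lebesgue measure is invariant under motions: $\lambda^n = M(\lambda^n)$ for all motions $M$ in $\mathbb{R}^n$. In particular, congruent sets have the same measure.

Proof We know that $M$ is of the form $\tau_x \circ T$. Since $\det T = \pm 1$, we get
$$M(\lambda^n) = \tau_x(T(\lambda^n)) \overset{7.10}{=} \tau_x(\lambda^n) \overset{5.8}{=} \lambda^n.$$

Problems

7.1. Use Lemma 7.2 to show that $\tau_x$ of (7.2), i.e. $\tau_x(y) = y - x$, $x, y \in \mathbb{R}^n$, is $\mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R}^n)$-measurable.

7.2. Show that $\Sigma'$ defined in the proof of Lemma 7.2 is a $\sigma$-algebra.

7.3. Let $X$ be a set, $(X_i, \mathcal{A}_i)$, $i \in I$, be arbitrarily many measurable spaces, and $T_i : X \to X_i$ be a family of maps.
(i) Show that for every $i \in I$ the smallest $\sigma$-algebra in $X$ that makes $T_i$ measurable is given by $T_i^{-1}(\mathcal{A}_i)$.
(ii) Show that $\sigma\big( \bigcup_{i\in I} T_i^{-1}(\mathcal{A}_i) \big)$ is the smallest $\sigma$-algebra in $X$ that makes all $T_i$, $i \in I$, simultaneously measurable.

7.4. Let $X$ be a set, $(X_i, \mathcal{A}_i)$, $i \in I$, be arbitrarily many measurable spaces, and $T_i : X \to X_i$ be a family of maps. Show that a map $f$ from a measurable space $(F, \mathcal{F})$ to $(X, \sigma(T_i : i \in I))$ is measurable if, and only if, all maps $T_i \circ f$ are $\mathcal{F}/\mathcal{A}_i$-measurable.

7.5. Use Problem 7.4 to show that a function $f : \mathbb{R}^n \to \mathbb{R}^m$, $x \mapsto (f_1(x), \ldots, f_m(x))$ is measurable if, and only if, all coordinate maps $f_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, 2, \ldots, m$, are measurable. [Hint: show that the coordinate projections $x = (x_1, \ldots, x_n) \mapsto x_j$ are measurable.]

7.6. Let $T : X \to X'$ be a measurable map. Under which circumstances is the family of sets $T(\mathcal{A})$ a $\sigma$-algebra?

7.7. Use image measures to give a new proof of Problem 5.8, i.e. show that
$$\lambda^n(t \cdot B) = t^n \lambda^n(B) \qquad \forall\, B \in \mathcal{B}(\mathbb{R}^n),\ \forall\, t > 0.$$

7.8. Let $T : X \to Y$ be any map. Show that $T^{-1}(\sigma(\mathcal{G})) = \sigma(T^{-1}(\mathcal{G}))$ holds for arbitrary families $\mathcal{G}$ of subsets of $Y$.

7.9. Stieltjes measure (1). Throughout this exercise $(X, \mathcal{A}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and $\lambda = \lambda^1$ is one-dimensional Lebesgue measure.
(i) Let $\mu$ be a measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Show that
$$F_\mu(x) = \begin{cases} \mu([0,x)) & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -\mu([x,0)) & \text{if } x < 0, \end{cases}$$
is a monotonically increasing and left-continuous function $F_\mu : \mathbb{R} \to \mathbb{R}$.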
Remark. Increasing and left-continuous functions are called Stieltjes functions.
(ii) Let $F : \mathbb{R} \to \mathbb{R}$ be a Stieltjes function (cf. part (i)). Show that
$$\nu_F([a,b)) = F(b) - F(a) \qquad \forall\, a, b \in \mathbb{R},\ a < b$$
has a unique extension to a measure on $\mathcal{B}(\mathbb{R})$. [Hint: check the assumptions of Theorem 6.1 with $\mathcal{S} = \{ [a,b) : a \le b \}$.]
(iii) Use part (i) to show that every measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with $\mu([-n,n)) < \infty$, $n \in \mathbb{N}$, can be written in the form $\nu_F$ as in (ii) with some Stieltjes function $F = F_\mu$ as in (i).
(iv) Which Stieltjes function $F$ corresponds to $\lambda$?
(v) Which Stieltjes function $F$ corresponds to $\delta_0$?
(vi) Show that $F_\mu$ as in (i) is continuous at $x \in \mathbb{R}$ if, and only if, $\mu(\{x\}) = 0$.
(vii) Show that every measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ which has no atoms (see Problem 6.5) can be written as image measure of $\lambda$. [Hint: $\mu$ has no atoms implies that $F_\mu$ is continuous. So $G = F_\mu^{-1}$ exists and can be made left-continuous. Finally $\mu([a,b)) = F_\mu(b) - F_\mu(a) = \lambda(G^{-1}([a,b)))$.]
(viii) Is (vii) true for measures with atoms, say, $\mu = \delta_0$? [Hint: determine $F_{\delta_0}^{-1}$. Is it measurable?]

7.10. Cantor's ternary set. Let $(X, \mathcal{A}) = ([0,1], [0,1] \cap \mathcal{B}(\mathbb{R}))$, $\lambda = \lambda^1|_{[0,1]}$, and set $E_0 = [0,1]$. Remove the open middle third of $E_0$ to get $E_1 = I_1^1 \mathbin{\dot\cup} I_1^2$. Remove the open middle thirds of $I_1^j$, $j = 1, 2$, to get $E_2 = I_2^1 \mathbin{\dot\cup} I_2^2 \mathbin{\dot\cup} I_2^3 \mathbin{\dot\cup} I_2^4$ and so forth.
(i) Make a sketch of $E_0, E_1, E_2, E_3$.
(ii) Prove that each $E_n$ is compact. Conclude that $C = \bigcap_{n\in\mathbb{N}_0} E_n$ is non-void and compact.
(iii) The set $C$ is called the Cantor set or Cantor's discontinuum. It satisfies $C \cap \bigcup_{n\in\mathbb{N}} \bigcup_{k\in\mathbb{N}_0} \big( \frac{3k+1}{3^n}, \frac{3k+2}{3^n} \big) = \emptyset$.
(iv) Find the value of $\lambda(E_n)$ and show that $\lambda(C) = 0$.
(v) Show that $C$ does not contain any open interval. Conclude that the interior (of the closure) of $C$ is empty. Remark. Sets with empty interior are called nowhere dense.
(vi) We can write $x \in [0,1)$ as a base-3 ternary fraction, i.e. $x = 0.x_1x_2x_3\ldots$ where $x_j \in \{0, 1, 2\}$, which is short for $x = \sum_{j=1}^{\infty} x_j 3^{-j}$. (E.g. $\frac{1}{3} = 0.1 = 0.02222\ldots$; note that this representation is not unique, which is important for this exercise.) Show that $x \in C$ if, and only if, $x$ has a ternary representation involving only 0s and 2s. [Hint: the numbers in $(\frac{1}{3}, \frac{2}{3})$, the first interval to be removed, are all of the form $0.1{*}{*}{*}\ldots$, i.e. they contain at least one '1', while in $[0, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$ we have numbers of the form $0.0{*}{*}{*}\ldots$–$0.022222\ldots$ and $0.2{*}{*}{*}\ldots$–$0.2222\ldots$, respectively. The next step eliminates the $0.01{*}{*}{*}\ldots$s and $0.21{*}{*}{*}\ldots$s – etc.]
(vii) Use (vi) to show that $C$ is not countable and has even the same cardinality as $[0,1]$. Nevertheless, $\lambda(C) = 0 \ne 1 = \lambda([0,1])$.
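7.11. Factorization lemma. Let $X$ be a set, $(Y, \mathcal{A})$ be a measurable space and $T : X \to Y$ be a surjective map. Show that a function $f : X \to \mathbb{R}$ is $\sigma(T)/\mathcal{B}(\mathbb{R})$-measurable if, and only if, there exists some $\mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable function $g : Y \to \mathbb{R}$ such that $f = g \circ T$. [Hint: show first that $Tx = Tx'$ implies $fx = fx'$.]
Remark. The result is actually true for any map $T : X \to Y$, but the proof is quite difficult if $T(X) \notin \mathcal{A}$. The problem is that one has to extend the $T(X) \cap \mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable function $g : T(X) \to \mathbb{R}$ to an $\mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable function $g : Y \to \mathbb{R}$.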
8 Measurable functions
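A measurable function is a measurable map $u : X \to \mathbb{R}$ from some measurable space $(X, \mathcal{A})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Measurable functions will play a central rôle in the theory of integration. Recall that $u : X \to \mathbb{R}$ is $\mathcal{A}/\mathcal{B}$-measurable^1 ($\mathcal{B} = \mathcal{B}(\mathbb{R})$) if
$$u^{-1}(B) \in \mathcal{A} \qquad \forall\, B \in \mathcal{B} \qquad (8.1)$$
which is, due to Lemma 7.2, equivalent to
$$u^{-1}(G) \in \mathcal{A} \qquad \forall\, G \text{ from a generator } \mathcal{G} \text{ of } \mathcal{B}. \qquad (8.2)$$
As we have seen in Remark 3.9, $\mathcal{B}$ is generated by all sets of the form $[a, \infty)$ (or $(b, \infty)$ or $(-\infty, c)$ or $(-\infty, d]$) with $a, b, c, d \in \mathbb{R}$ or $\mathbb{Q}$, and we need
$$u^{-1}([a,\infty)) = \{ x \in X : u(x) \in [a,\infty) \} = \{ x \in X : u(x) \ge a \} \in \mathcal{A} \qquad (8.3)$$
with similar expressions for the other types of intervals. Let us introduce the following useful shorthand notation:
$$\{u \ge v\} = \{ x \in X : u(x) \ge v(x) \} \qquad (8.4)$$
and $\{u > v\}$, $\{u \le v\}$, $\{u < v\}$, $\{u = v\}$, $\{u \ne v\}$, $\{u \in B\}$, etc. which are defined in a similar fashion. In this new notation measurability of functions reads as

8.1 Lemma Let $(X, \mathcal{A})$ be a measurable space. The function $u : X \to \mathbb{R}$ is $\mathcal{A}/\mathcal{B}$-measurable if, and only if, one, hence all, of the following conditions hold
(i) $\{u \ge a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(ii) $\{u > a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(iii) $\{u \le a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(iv) $\{u < a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$).

^1 We will frequently drop the $\mathcal{B}$ since $\mathbb{R}$ is naturally equipped with the Borel $\sigma$-algebra and just say that $u$ is $\mathcal{A}$-measurable.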
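Proof Combine Remark 3.9 and Lemma 7.2.

It is sometimes practical to admit the values $+\infty$ and $-\infty$ in some calculations. To do this properly, consider the extended real line $\bar{\mathbb{R}} = [-\infty, +\infty]$. If we agree that $-\infty < x$ and $y < +\infty$ for all $x, y \in \mathbb{R}$, then $\bar{\mathbb{R}}$ inherits the ordering from $\mathbb{R}$ as well as the usual rules of addition and multiplication of elements from $\mathbb{R}$. The latter need to be augmented as follows: for all $x \in \mathbb{R}$ we have
$$x + (+\infty) = (+\infty) + x = +\infty, \qquad (+\infty) + (+\infty) = +\infty,$$
$$x + (-\infty) = (-\infty) + x = -\infty, \qquad (-\infty) + (-\infty) = -\infty,$$
and, if $x \in (0, \infty)$,
$$(\pm x)(+\infty) = (+\infty)(\pm x) = \pm\infty, \qquad (\pm x)(-\infty) = (-\infty)(\pm x) = \mp\infty,$$
$$0 \cdot (\pm\infty) = (\pm\infty) \cdot 0 = 0,^2 \qquad \frac{1}{\pm\infty} = 0.$$
Caution: $\bar{\mathbb{R}}$ is not a field. Expressions of the form $\infty - \infty$ and $\frac{\pm\infty}{\pm\infty}$ must be avoided.^2

Functions which take values in $\bar{\mathbb{R}}$ are called numerical functions. The Borel $\sigma$-algebra $\bar{\mathcal{B}} = \mathcal{B}(\bar{\mathbb{R}})$ is defined by
$$B^* \in \bar{\mathcal{B}} \iff B^* = B \cup S \ \text{ for some } B \in \mathcal{B} \text{ and } S \in \big\{ \emptyset, \{-\infty\}, \{+\infty\}, \{-\infty, +\infty\} \big\} \qquad (8.5)$$
and it is not hard to see that $\bar{\mathcal{B}}$ is again a $\sigma$-algebra whose trace w.r.t. $\mathbb{R}$ is $\mathcal{B}$.

8.2 Lemma $\mathcal{B}(\mathbb{R}) = \mathbb{R} \cap \mathcal{B}(\bar{\mathbb{R}})$.

Moreover,

8.3 Lemma $\bar{\mathcal{B}}$ is generated by all sets of the form $[a, \infty]$ (or $(b, \infty]$ or $[-\infty, c)$ or $[-\infty, d]$) where $a$ (or $b, c, d$) is from $\mathbb{R}$ or $\mathbb{Q}$.

^2 Conventions are tricky. The rationale behind our definitions is to understand '$\pm\infty$' in every instance as the limit of some (possibly each time different) sequence, and '$0$' as a bona fide zero. Then $0 \cdot (\pm\infty) = 0 \cdot \lim_n a_n = \lim_n 0 \cdot a_n = \lim_n 0 = 0$ while expressions of the type $\infty - \infty$ or $\frac{\pm\infty}{\pm\infty}$ become $\lim_n a_n - \lim_n b_n$ or $\frac{\lim_n a_n}{\lim_n b_n}$ where two sequences compete and do not lead to unique results.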
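Proof Set $\Sigma = \sigma(\{ [a, \infty] : a \in \mathbb{R} \})$. Since
$$[a, \infty] = [a, \infty) \cup \{+\infty\} \qquad \text{and} \qquad [a, \infty) \in \mathcal{B},$$
we see that $[a, \infty] \in \bar{\mathcal{B}}$ and $\Sigma \subset \bar{\mathcal{B}}$. On the other hand,
$$[a, b) = [a, \infty] \setminus [b, \infty] \in \Sigma \qquad \forall\, -\infty < a \le b < \infty$$
which means that $\mathcal{B} \subset \Sigma \subset \bar{\mathcal{B}}$. Since also
$$\{+\infty\} = \bigcap_{j\in\mathbb{N}} [j, \infty], \qquad \{-\infty\} = \bigcap_{j\in\mathbb{N}} [-\infty, -j) = \bigcap_{j\in\mathbb{N}} [-j, \infty]^c$$
we have $\{-\infty\}, \{+\infty\} \in \Sigma$ which entails that all sets of the form
$$B, \quad B \cup \{+\infty\}, \quad B \cup \{-\infty\}, \quad B \cup \{-\infty, +\infty\} \in \Sigma \qquad \forall\, B \in \mathcal{B};$$
therefore, $\bar{\mathcal{B}} \subset \Sigma$. The proofs for $a \in \mathbb{Q}$ and the other generating systems are similar.

8.4 Definition Let $(X, \mathcal{A})$ be a measurable space. We write $\mathcal{M} = \mathcal{M}(\mathcal{A})$ and $\mathcal{M}_{\bar{\mathbb{R}}} = \mathcal{M}_{\bar{\mathbb{R}}}(\mathcal{A})$ for the families of real-valued $\mathcal{A}/\mathcal{B}$-measurable and numerical $\mathcal{A}/\bar{\mathcal{B}}$-measurable functions on $X$.

8.5 Examples Let $(X, \mathcal{A})$ be a measurable space.
(i) The indicator function $f(x) = \mathbf{1}_A(x)$ is measurable if, and only if, $A \in \mathcal{A}$. This follows easily from Lemma 8.1 and
$$\{ \mathbf{1}_A > \lambda \} = \begin{cases} \emptyset & \text{if } \lambda \ge 1, \\ A & \text{if } 1 > \lambda \ge 0, \\ X & \text{if } \lambda < 0. \end{cases}$$
(ii) Let $A_1, A_2, \ldots, A_M \in \mathcal{A}$ be mutually disjoint sets and $y_1, \ldots, y_M \in \mathbb{R}$. Then the function
$$g(x) = \sum_{j=1}^{M} y_j \mathbf{1}_{A_j}(x) \qquad (8.6)$$
is measurable. This follows from Lemma 8.1 and the fact (compare with the picture!) that
$$\{ g > \lambda \} = \dot{\bigcup}_{j \,:\, y_j > \lambda} A_j \in \mathcal{A}.$$

[Figure: graph of a step function with values $y_1, \ldots, y_4$ on the sets $A_1, \ldots, A_4$ and a level $\lambda$; in the picture $\{ f > \lambda \} = A_2 \mathbin{\dot\cup} A_4$.]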
Functions of the form (8.6) are the building blocks for all measurable functions as well as for the definition of the integral.

8.6 Definition A simple function $g : X \to \mathbb{R}$ on a measurable space $(X, \mathcal{A})$ is a function of the form (8.6) with finitely many sets $A_1, \ldots, A_M \in \mathcal{A}$ and $y_1, \ldots, y_M \in \mathbb{R}$. The set of simple functions is denoted by $\mathcal{E}$ or $\mathcal{E}(\mathcal{A})$. If the sets $A_j$, $1 \le j \le M$, are mutually disjoint we call
$$\sum_{j=0}^{M} y_j \mathbf{1}_{A_j}(x) \qquad (8.7)$$
with $y_0 = 0$ and $A_0 = (A_1 \cup \cdots \cup A_M)^c$ a standard representation of $g$.
Caution: The representations (8.7) are not unique.

8.7 Examples (continued)
(iii) If a measurable function $h : X \to \mathbb{R}$ attains only finitely many values $y_1, y_2, \ldots, y_M \in \mathbb{R}$, then it is a simple function. Indeed: set
$$B_j = \{ h = y_j \} = \{ h \le y_j \} \setminus \{ h < y_j \} \in \mathcal{A}, \qquad j = 1, 2, \ldots, M,$$
and note that the $B_j$ are disjoint. Thus
$$h(x) = \sum_{j=1}^{M} y_j \mathbf{1}_{B_j}(x) = \sum_{j=1}^{M} y_j \mathbf{1}_{\{h = y_j\}}(x).$$
Since every simple function attains only finitely many values, this shows that every simple function has at least one standard representation. In particular, $\mathcal{E} \subset \mathcal{M}$ consists of measurable functions.
(iv) $f, g \in \mathcal{E} \implies f \pm g,\ fg \in \mathcal{E}$.
Indeed: let $f = \sum_{j=0}^{M} y_j \mathbf{1}_{A_j}$ and $g = \sum_{k=0}^{N} z_k \mathbf{1}_{B_k}$ be standard representations of $f$ and $g$.
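[Figure: graphs of the two simple functions $f$ and $g$.]
It is not hard to see (use the picture!) that
$$f \pm g = \sum_{j=0}^M \sum_{k=0}^N (y_j \pm z_k)\, \mathbf 1_{A_j \cap B_k}, \qquad fg = \sum_{j=0}^M \sum_{k=0}^N y_j z_k\, \mathbf 1_{A_j \cap B_k},$$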
and that $(A_j \cap B_k) \cap (A_{j'} \cap B_{k'}) = \emptyset$ whenever $(j,k) \ne (j',k')$. After relabelling and merging the double indexation into a single index, this shows that $f \pm g,\ fg \in \mathcal E$. Notice that $(A_j \cap B_k)_{j,k}$ is the common refinement of the partitions $(A_j)_j$ and $(B_k)_k$ and that inside each of the sets $A_j \cap B_k$ the functions $f$ and $g$ do not change their respective values.
(v) $f \in \mathcal E \implies f^+, f^- \in \mathcal E$.
Here we use the following notation: for a function $u : X \to \mathbb R$ we write
$$u^+(x) := \max\{u(x), 0\}, \qquad u^-(x) := -\min\{u(x), 0\} \tag{8.8}$$
for the positive ($u^+$) and negative ($u^-$) parts of $u$. Obviously,
$$u = u^+ - u^- \quad\text{and}\quad |u| = u^+ + u^-. \tag{8.9}$$
[Figure: graphs of $u$, $u^+$ and $u^-$.]
(vi) $f \in \mathcal E \implies |f| \in \mathcal E$.

Our next theorem reveals the fundamental rôle of simple functions.

8.8 Theorem Let $(X,\mathcal A)$ be a measurable space. Every $\mathcal A/\bar{\mathcal B}$-measurable numerical function $u : X \to \bar{\mathbb R}$ is the pointwise limit of simple functions: $u(x) = \lim_{j\to\infty} f_j(x)$, $f_j \in \mathcal E(\mathcal A)$ and $|f_j| \le |u|$. If $u \ge 0$, all $f_j$ can be chosen to be positive and increasing towards $u$ so that $u = \sup_{j\in\mathbb N} f_j$.
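The statement of Theorem 8.8 for positive $u$ can be illustrated numerically. The following is only a sketch under stated assumptions: it uses the usual dyadic step functions ($f_j = k2^{-j}$ on $\{k2^{-j} \le u < (k+1)2^{-j}\}$ and $f_j = j$ on $\{u \ge j\}$), and the helper name `dyadic_approx` as well as the test function `u` are hypothetical choices made for this example.

```python
import math


def dyadic_approx(u_value, j):
    """Value f_j(x) of the j-th dyadic step function, given u(x) >= 0.

    f_j = k * 2**-j on {k 2^-j <= u < (k+1) 2^-j} for k < j 2^j,
    and f_j = j on the top level set {u >= j}.
    """
    if u_value >= j:  # top level set; also covers u(x) = +infinity
        return float(j)
    return math.floor(u_value * 2 ** j) / 2 ** j


def u(x):
    # hypothetical positive test function, chosen only for illustration
    return x * x


for x in [0.0, 0.3, 1.7, 2.5, 10.0]:
    values = [dyadic_approx(u(x), j) for j in range(1, 12)]
    # the approximations increase in j and stay below u(x)
    assert all(a <= b for a, b in zip(values, values[1:]))
    assert all(v <= u(x) for v in values)
    # on {u < j} the error is at most 2^-j
    for j, v in zip(range(1, 12), values):
        if u(x) < j:
            assert u(x) - v <= 2.0 ** -j
```

The checks mirror the three properties one expects from such an approximation: monotonicity in $j$, the bound $f_j \le u$, and the error estimate $2^{-j}$ on $\{u < j\}$.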
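Proof Assume first that $u \ge 0$. Fix $j \in \mathbb N$ and define level sets
$$A_k^j := \begin{cases} \{k2^{-j} \le u < (k+1)2^{-j}\}, & k = 0, 1, 2, \dots, j2^j - 1, \\ \{u \ge j\}, & k = j2^j, \end{cases}$$
which slice up the graph of $u$ horizontally as shown in the picture.
[Figure: the level sets $A_k^j$ slicing the graph of $u$ in horizontal strips of height $2^{-j}$, together with the step function $f_j$.]
The approximating simple functions are
$$f_j(x) := \sum_{k=0}^{j2^j} k2^{-j}\, \mathbf 1_{A_k^j}(x)$$
and from the picture it is easy to see that
• $|f_j(x) - u(x)| \le 2^{-j}$ if $x \in \{u < j\}$;
• $A_k^j = \{k2^{-j} \le u\} \cap \{u < (k+1)2^{-j}\}$, $\{u \ge j\} \in \mathcal A$;
• $0 \le f_j \le u$ and $f_j \uparrow u$.
For a general $u$, we consider its positive and negative parts $u^\pm$. Since
$$\{u^+ > \lambda\} = \begin{cases} \{u > \lambda\} & \text{if } \lambda \ge 0, \\ X & \text{if } \lambda < 0, \end{cases}$$
and since $u^- = (-u)^+$, we have $\{u^\pm > \lambda\} \in \mathcal A$ for all $\lambda \in \mathbb R$. Thus $u^\pm$ are positive measurable functions, and we can construct, as above, simple functions $g_j \uparrow u^+$ and $h_j \uparrow u^-$. Clearly, $f_j := g_j - h_j \to u^+ - u^- = u$ as $j \to \infty$, as well as $|f_j| = g_j + h_j \le u^+ + u^- = |u|$, and we are done.

8.9 Corollary Let $(X,\mathcal A)$ be a measurable space. If $u_j : X \to \bar{\mathbb R}$, $j \in \mathbb N$, are measurable functions, then so are
$$\sup_{j\in\mathbb N} u_j, \qquad \inf_{j\in\mathbb N} u_j, \qquad \limsup_{j\to\infty} u_j, \qquad \liminf_{j\to\infty} u_j$$
and, whenever it exists, $\lim_{j\to\infty} u_j$.

Before we prove Corollary 8.9 let us stress again that expressions of the type $\sup_{j\in\mathbb N} u_j$ or $u_j \to u$ as $j \to \infty$, etc. are always understood in a pointwise, $x$-by-$x$ sense, i.e. they are short for $\sup_{j\in\mathbb N} u_j(x) = \sup\{u_j(x) : j \in \mathbb N\}$ or $\lim_{j\to\infty} u_j(x) = u(x)$ at each $x$ (or for a specified range). The infimum 'inf' and supremum 'sup' are familiar from calculus. Recall the following useful formula
$$\inf_{j\in\mathbb N} u_j(x) = -\sup_{j\in\mathbb N} \bigl(-u_j(x)\bigr) \tag{8.10}$$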
which allows us to express an inf as a sup, and vice versa. Recall also the definition of the lower resp. upper limits lim inf and lim sup,
$$\liminf_{j\to\infty} u_j(x) := \sup_{k\in\mathbb N} \Bigl(\inf_{j\ge k} u_j(x)\Bigr) = \lim_{k\to\infty} \Bigl(\inf_{j\ge k} u_j(x)\Bigr), \tag{8.11}$$
$$\limsup_{j\to\infty} u_j(x) := \inf_{k\in\mathbb N} \Bigl(\sup_{j\ge k} u_j(x)\Bigr) = \lim_{k\to\infty} \Bigl(\sup_{j\ge k} u_j(x)\Bigr); \tag{8.12}$$
more details can be found in Appendix A. In the extended real line $\bar{\mathbb R}$, lim inf and lim sup always exist (but they may attain the values $+\infty$ and $-\infty$) and we have
$$\liminf_{j\to\infty} u_j(x) \le \limsup_{j\to\infty} u_j(x). \tag{8.13}$$
Moreover, $\lim_{j\to\infty} u_j(x)$ exists [and is finite] if, and only if, upper and lower limits coincide, $\liminf_{j\to\infty} u_j(x) = \limsup_{j\to\infty} u_j(x)$ [and are finite]; in this case all three limits have the same value.

Proof (of Corollary 8.9) We show that $\sup_j u_j$ and $(-1)u = -u$ (for a measurable function $u$) are again measurable. Observe that for all $a \in \mathbb R$
$$\Bigl\{\sup_{j\in\mathbb N} u_j > a\Bigr\} = \bigcup_{j\in\mathbb N} \underbrace{\{u_j > a\}}_{\in \mathcal A} \in \mathcal A.$$
The inclusion '$\supset$' is trivial since $a < u_j(x) \le \sup_{j\in\mathbb N} u_j(x)$ always holds; the direction '$\subset$' follows by contradiction: if $u_j(x) \le a$ for all $j \in \mathbb N$, then also $\sup_{j\in\mathbb N} u_j(x) \le a$. This proves the measurability of $\sup_{j\in\mathbb N} u_j$. If $u$ is measurable, we have for all $a \in \mathbb R$
$$\{-u > a\} = \{u < -a\} \in \mathcal A,$$
which shows that $-u$ is also measurable. The measurability of $\inf_{j\in\mathbb N} u_j$, $\liminf_{j\to\infty} u_j$ and $\limsup_{j\to\infty} u_j$ follows now from formulae (8.10)–(8.12), which can be written down in terms of $\sup_j$s and several multiplications by $-1$. If $\lim_{j\to\infty} u_j$ exists, it coincides with $\liminf_{j\to\infty} u_j = \limsup_{j\to\infty} u_j$ and inherits their measurability.

8.10 Corollary Let $u, v$ be $\mathcal A/\bar{\mathcal B}$-measurable numerical functions. Then the functions
$$u \pm v, \qquad uv, \qquad u \vee v := \max\{u, v\}, \qquad u \wedge v := \min\{u, v\} \tag{8.14}$$
are $\mathcal A/\bar{\mathcal B}$-measurable (whenever they are defined).
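[Figure: graphs of $u$, $v$, $\max\{u, v\}$ and $\min\{u, v\}$.]
The maximum $u \vee v$ and minimum $u \wedge v$ of two functions is always meant pointwise, i.e.
$$(u \vee v)(x) = \max\{u(x), v(x)\} \overset{[✓]}{=} \tfrac12 \bigl(u(x) + v(x) + |u(x) - v(x)|\bigr),$$
$$(u \wedge v)(x) = \min\{u(x), v(x)\} \overset{[✓]}{=} \tfrac12 \bigl(u(x) + v(x) - |u(x) - v(x)|\bigr).$$

Proof (of Corollary 8.10) If $u, v \in \mathcal E$ are simple functions, all functions in (8.14) are again simple functions[✓] and, therefore, measurable. For general $u, v \in \mathcal M_{\bar{\mathbb R}}$ choose sequences $(f_j)_{j\in\mathbb N}, (g_j)_{j\in\mathbb N} \subset \mathcal E$ of simple functions such that $f_j \to u$ and $g_j \to v$ as $j \to \infty$. The claim now follows from the usual rules for limits.

8.11 Corollary A function $u$ is $\mathcal A/\bar{\mathcal B}$-measurable if, and only if, $u^\pm$ are $\mathcal A/\bar{\mathcal B}$-measurable.

8.12 Corollary If $u, v$ are $\mathcal A/\bar{\mathcal B}$-measurable numerical functions, then
$$\{u < v\}, \qquad \{u \le v\}, \qquad \{u = v\}, \qquad \{u \ne v\} \in \mathcal A.$$

Let us finally show an interesting result on the structure of $\sigma(T)$-measurable functions; see also Problem 7.11 of the previous chapter.

8.13 Lemma (Factorization lemma) Let $T : (X,\mathcal A) \to (X',\mathcal A')$ be an $\mathcal A/\mathcal A'$-measurable map and let $\sigma(T) \subset \mathcal A$ be the $\sigma$-algebra generated by $T$. Then $u = w \circ T$ for some $\mathcal A'/\bar{\mathcal B}$-measurable function $w : X' \to \bar{\mathbb R}$ if, and only if, $u : X \to \bar{\mathbb R}$ is $\sigma(T)/\bar{\mathcal B}$-measurable.

Proof Suppose that $u$ is $\sigma(T)$-measurable. If $u$ is an indicator function, $u = \mathbf 1_A$ with $A \in \sigma(T)$, we know from the definition of $\sigma(T)$ that $A = T^{-1}(A')$ for some $A' \in \mathcal A'$. Thus $u = \mathbf 1_A = \mathbf 1_{T^{-1}(A')} = \mathbf 1_{A'} \circ T$ and $w := \mathbf 1_{A'}$ will do. This consideration remains true for simple functions $u \in \mathcal E(\sigma(T))$ since they are just sums of scalar multiples of indicator functions;[✓] hence $u = w \circ T$ for a suitable $w \in \mathcal E(\mathcal A')$.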
We can now use Theorem 8.8 and approximate the $\sigma(T)$-measurable function $u$ by a sequence $(u_j)_{j\in\mathbb N} \subset \mathcal E(\sigma(T))$. By what was said above, $u_j = w_j \circ T$ for suitable $w_j \in \mathcal E(\mathcal A')$. Then $w := \limsup_{j\to\infty} w_j$ is measurable by C8.9 and satisfies
$$w \circ T = \limsup_{j\to\infty} (w_j \circ T) \overset{*}{=} \lim_{j\to\infty} (w_j \circ T) = \lim_{j\to\infty} u_j = u,$$
where we used for the equality marked $*$ the fact that the limit $\lim_{j\to\infty} u_j$ exists. The converse, that $u = w \circ T$ is $\sigma(T)$-measurable, is obvious.

Problems
8.1. Show directly that condition (i) of Lemma 8.1 is equivalent to either of (ii), (iii), (iv).
8.2. Verify that $\bar{\mathcal B}$ defined in (8.5) is a $\sigma$-algebra. Moreover, prove that $\mathcal B(\mathbb R) = \mathbb R \cap \mathcal B(\bar{\mathbb R})$.
8.3. Let $(X,\mathcal A)$ be a measurable space.
(i) Let $f, g : X \to \mathbb R$ be measurable functions. Show that for every $A \in \mathcal A$ the function $h(x) := f(x)$, if $x \in A$, and $h(x) := g(x)$, if $x \notin A$, is measurable.
(ii) Let $(f_j)_{j\in\mathbb N}$ be a sequence of measurable functions and let $(A_j)_{j\in\mathbb N} \subset \mathcal A$ such that $\bigcup_{j\in\mathbb N} A_j = X$. Suppose that $f_j|_{A_j \cap A_k} = f_k|_{A_j \cap A_k}$ for all $j, k \in \mathbb N$ and set $f(x) := f_j(x)$ if $x \in A_j$. Show that $f : X \to \mathbb R$ is measurable.
8.4. Let $(X,\mathcal A)$ be a measurable space and let $\mathcal B \subsetneq \mathcal A$ be a sub-$\sigma$-algebra. Show that $\mathcal M(\mathcal B) \subsetneq \mathcal M(\mathcal A)$.
8.5. Show that $f \in \mathcal E$ implies that $f^\pm \in \mathcal E$. Is the converse valid?
8.6. Show that for every real-valued function $u = u^+ - u^-$ and $|u| = u^+ + u^-$.
8.7. Scrutinize the proof of Theorem 8.8 and check that bounded [positive] measurable functions $u \in \mathcal M$ can be approximated uniformly by an [increasing] sequence $(f_j)_{j\in\mathbb N} \subset \mathcal E$ of [positive] simple functions.
8.8. Show that every continuous function $u : \mathbb R \to \mathbb R$ is $\mathcal B(\mathbb R)/\mathcal B(\mathbb R)$-measurable. [Hint: check that for continuous functions $\{f > \lambda\}$ is an open set.]
8.9. Show that $x \mapsto \max\{x, 0\}$ and $x \mapsto \min\{x, 0\}$ are continuous, and by Problem 8.8 or Example 7.3, measurable functions from $\mathbb R \to \mathbb R$. Conclude that on any measurable space $(X,\mathcal A)$ positive and negative parts $u^\pm$ of a measurable function $u : X \to \mathbb R$ are measurable.
8.10. Check that the approximating sequence $(f_j)_{j\in\mathbb N}$ for $u$ in Theorem 8.8 consists of $\sigma(u)$-measurable functions.
8.11. Complete the proofs of Corollaries 8.11 and 8.12.
8.12. Let $u : \mathbb R \to \mathbb R$ be differentiable. Explain why $u$ and $u' = du/dx$ are measurable.
8.13. Find $\sigma(u)$, i.e. the $\sigma$-algebra generated by $u$, for the following functions $f, g, h : \mathbb R \to \mathbb R$ and $F, G : \mathbb R^2 \to \mathbb R$:
(i) $f(x) = x$; (ii) $g(x) = x^2$; (iii) $h(x) = |x|$; (iv) $F(x,y) = x + y$; (v) $G(x,y) = x^2 + y^2$.
[Hint: under $f, g, h$ the pre-images of intervals are (unions of) intervals, under $F$ we get strips in the plane, under $G$ annuli and discs.]
8.14. Consider $(\mathbb R, \mathcal B(\mathbb R))$ and $u : \mathbb R \to \mathbb R$. Show that $\{x\} \in \sigma(u)$ for all $x \in \mathbb R$ if, and only if, $u$ is injective.
8.15. Let $\lambda$ be one-dimensional Lebesgue measure. Find $\lambda \circ u^{-1}$, if $u(x) = |x|$.
8.16. Let $u : \mathbb R \to \mathbb R$ be measurable. Which of the following functions are measurable: $u(x - 2)$, $e^{u(x)}$, $\sin(u(x) + 8)$, $u''(x)$, $\operatorname{sgn} u(x - 7)$?
8.17. One can show that there are non-Borel measurable sets $A \subset \mathbb R$, cf. Appendix D. Taking this fact for granted, show that measurability of $|u|$ does not, in general, imply the measurability of $u$. (The converse is, of course, true: measurability of $u$ always guarantees that of $|u|$.)
8.18. Show that every increasing function $u : \mathbb R \to \mathbb R$ is $\mathcal B(\mathbb R)/\mathcal B(\mathbb R)$-measurable. Under which additional condition(s) do we have $\sigma(u) = \mathcal B(\mathbb R)$? [Hint: show that $\{u < \lambda\}$ is an interval by distinguishing three cases: $u$ is continuous and strictly increasing when passing the level $\lambda$, $u$ jumps over the level $\lambda$, $u$ is 'flat' at level $\lambda$. Make a picture of these situations.]
9 Integration of positive functions
Throughout this chapter $(X,\mathcal A,\mu)$ will be some measure space. Recall that $\mathcal M^+$ [$\mathcal M^+_{\bar{\mathbb R}}$] are the $\mathcal A$-measurable positive real [numerical] functions and $\mathcal E$ [$\mathcal E^+$] are the [positive] simple functions. The fundamental idea of integration is to measure the area between the graph of a function and the abscissa. For a positive simple function $f \in \mathcal E^+$ in standard representation¹ this is easily done:
$$\text{if}\quad f = \sum_{j=0}^M y_j \mathbf 1_{A_j} \in \mathcal E^+ \quad\text{then}\quad \sum_{j=0}^M y_j\, \mu(A_j) \tag{9.1}$$
should be the $\mu$-area enclosed by the graph and the abscissa.
[Figure: the graph of a simple function $f$; the rectangle of height $y_j$ over $A_j$ has $\mu$-area $y_j\,\mu(A_j)$.]
There is only the problem that (9.1) might depend on the particular (standard) representation of $f$, and this should not happen.

9.1 Lemma Let $\sum_{j=0}^M y_j \mathbf 1_{A_j} = \sum_{k=0}^N z_k \mathbf 1_{B_k}$ be two standard representations of the same function $f \in \mathcal E^+$. Then
$$\sum_{j=0}^M y_j\, \mu(A_j) = \sum_{k=0}^N z_k\, \mu(B_k).$$

¹ In the sense of Definition 8.6. By 8.5(iii) every $f \in \mathcal E$ has a standard representation.
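Proof Since $A_0 \,\dot\cup\, A_1 \,\dot\cup \dots \dot\cup\, A_M = X = B_0 \,\dot\cup\, B_1 \,\dot\cup \dots \dot\cup\, B_N$ we get
$$A_j = \dot\bigcup_{k=0}^N (A_j \cap B_k) \quad\text{and}\quad B_k = \dot\bigcup_{j=0}^M (B_k \cap A_j).$$
Using the (finite) additivity of $\mu$ we see that
$$\sum_{j=0}^M y_j\, \mu(A_j) = \sum_{j=0}^M y_j \sum_{k=0}^N \mu(A_j \cap B_k) = \sum_{j=0}^M \sum_{k=0}^N y_j\, \mu(A_j \cap B_k) \tag{9.2}$$
(since all $y_j$ are positive, the above sums always exist in $[0,\infty]$). Similarly,
$$\sum_{k=0}^N z_k\, \mu(B_k) = \sum_{k=0}^N z_k \sum_{j=0}^M \mu(A_j \cap B_k) = \sum_{k=0}^N \sum_{j=0}^M z_k\, \mu(A_j \cap B_k). \tag{9.3}$$
But $y_j = z_k$ whenever $A_j \cap B_k \ne \emptyset$, while for $A_j \cap B_k = \emptyset$ we have $\mu(A_j \cap B_k) = \mu(\emptyset) = 0$. Thus
$$y_j\, \mu(A_j \cap B_k) = z_k\, \mu(A_j \cap B_k) \qquad \forall\, (j,k),$$
and (9.2) and (9.3) have the same value.

Lemma 9.1 justifies the following definition based on (9.1).

9.2 Definition Let $f = \sum_{j=0}^M y_j \mathbf 1_{A_j} \in \mathcal E^+$ be a simple function in standard representation. Then the number
$$I_\mu(f) := \sum_{j=0}^M y_j\, \mu(A_j) \in [0,\infty]$$
(which is independent of the representation of $f$) is called the ($\mu$-)integral of $f$.

9.3 Properties (of $I_\mu : \mathcal E^+ \to [0,\infty]$). Let $f, g \in \mathcal E^+$. Then
(i) $I_\mu(\mathbf 1_A) = \mu(A)$ $\forall\, A \in \mathcal A$;
(ii) $I_\mu(\lambda f) = \lambda I_\mu(f)$ $\forall\, \lambda \ge 0$; (positive homogeneous)
(iii) $I_\mu(f + g) = I_\mu(f) + I_\mu(g)$; (additive)
(iv) $f \le g \implies I_\mu(f) \le I_\mu(g)$. (monotone)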
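On a finite set, Definition 9.2 and Properties 9.3 can be checked by direct computation. The sketch below is only an illustration under assumptions that are not in the text: the space `X`, the point masses `weights`, and all function names are hypothetical, and the standard representation used is the particular one given by the level sets $\{f = y\}$.

```python
from fractions import Fraction

X = frozenset(range(6))
# hypothetical point masses defining a measure mu on all subsets of X
weights = {0: Fraction(1, 2), 1: Fraction(1), 2: Fraction(3),
           3: Fraction(0), 4: Fraction(2), 5: Fraction(1, 4)}


def mu(A):
    """mu(A) as the sum of the point masses in A."""
    return sum((weights[x] for x in A), Fraction(0))


def evaluate(rep, x):
    """Value at x of sum_j y_j 1_{A_j}(x); the sets A_j need not be disjoint."""
    return sum((y for y, A in rep if x in A), Fraction(0))


def standard_representation(rep):
    """Group the points of X by function value: disjoint sets {f = y} covering X."""
    level_sets = {}
    for x in X:
        level_sets.setdefault(evaluate(rep, x), set()).add(x)
    return [(y, frozenset(A)) for y, A in level_sets.items()]


def integral(rep):
    """I_mu(f), computed from a standard representation as in (9.1)."""
    return sum((y * mu(A) for y, A in standard_representation(rep)), Fraction(0))


# two different representations of the same simple function f
f1 = [(Fraction(2), frozenset({0, 1})), (Fraction(5), frozenset({2}))]
f2 = [(Fraction(2), frozenset({0})), (Fraction(2), frozenset({1})),
      (Fraction(5), frozenset({2})), (Fraction(0), frozenset({3, 4, 5}))]
g = [(Fraction(1), frozenset({1, 2, 4}))]   # the indicator function of {1, 2, 4}
h = [(Fraction(1), frozenset({0, 1}))]      # h <= f pointwise

assert integral(f1) == integral(f2)                           # independent of the representation
assert integral(g) == mu({1, 2, 4})                           # (i)
assert integral([(3 * y, A) for y, A in f1]) == 3 * integral(f1)  # (ii)
assert integral(f1 + g) == integral(f1) + integral(g)         # (iii): list concatenation is f + g
assert integral(h) <= integral(f1)                            # (iv)
```

Exact rational arithmetic is used so that the equalities can be tested without rounding tolerances.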
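Proof (i) and (ii) are obvious from the definition of $I_\mu$.
(iii): take standard representations
$$f = \sum_{j=0}^M y_j \mathbf 1_{A_j} \quad\text{and}\quad g = \sum_{k=0}^N z_k \mathbf 1_{B_k}$$
and observe that, as in Example 8.5(iv),
$$f + g = \sum_{j=0}^M \sum_{k=0}^N (y_j + z_k)\, \mathbf 1_{A_j \cap B_k} \in \mathcal E^+$$
is a standard representation of $f + g$. Thus
$$\begin{aligned}
I_\mu(f + g) &= \sum_{j=0}^M \sum_{k=0}^N (y_j + z_k)\, \mu(A_j \cap B_k) \\
&= \sum_{j=0}^M y_j \sum_{k=0}^N \mu(A_j \cap B_k) + \sum_{k=0}^N z_k \sum_{j=0}^M \mu(A_j \cap B_k) \\
&\overset{(9.2),(9.3)}{=} \sum_{j=0}^M y_j\, \mu(A_j) + \sum_{k=0}^N z_k\, \mu(B_k) \\
&= I_\mu(f) + I_\mu(g).
\end{aligned}$$
(iv): If $f \le g$, then $g = f + (g - f)$ where $g - f \in \mathcal E^+$, see Examples 8.7(iv). By part (iii) of this proof, $I_\mu(g) = I_\mu(f) + I_\mu(g - f) \ge I_\mu(f)$ since $I_\mu(\bullet)$ is positive.

In Theorem 8.8 we have seen that every $u \in \mathcal M^+_{\bar{\mathbb R}}$ can be written as an increasing limit of simple functions; by Corollary 8.9, suprema of simple functions are again measurable, so that
$$u \in \mathcal M^+_{\bar{\mathbb R}} \iff u = \sup_{j\in\mathbb N} f_j, \qquad f_j \in \mathcal E^+, \quad f_j \le f_{j+1} \le \dots$$
We will use this to 'inscribe' simple functions (which we know how to integrate) below the graph of a positive measurable function $u$ and exhaust the $\mu$-area below $u$.

9.4 Definition Let $(X,\mathcal A,\mu)$ be a measure space. The ($\mu$-)integral of a positive numerical function $u \in \mathcal M^+_{\bar{\mathbb R}}$ is given by
$$\int u\, d\mu := \sup\bigl\{ I_\mu(g) : g \le u,\ g \in \mathcal E^+ \bigr\} \in [0,\infty]. \tag{9.4}$$
If we need to emphasize the integration variable, we also write $\int u(x)\, \mu(dx)$ or $\int u(x)\, d\mu(x)$.

The key observation is that the integral $\int \dots\, d\mu$ extends $I_\mu$, i.e.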
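9.5 Lemma For all $f \in \mathcal E^+$ we have $\int f\, d\mu = I_\mu(f)$.

Proof Let $f \in \mathcal E^+$. Since $f \le f$, $f$ is an admissible function in the supremum appearing in (9.4), hence
$$I_\mu(f) \le \sup\bigl\{ I_\mu(g) : g \le f,\ g \in \mathcal E^+ \bigr\} \overset{\text{def}}{=} \int f\, d\mu.$$
On the other hand, $\mathcal E^+ \ni g \le f$ implies that $I_\mu(g) \le I_\mu(f)$ by Properties 9.3(iv), and
$$\int f\, d\mu \overset{\text{def}}{=} \sup\bigl\{ I_\mu(g) : g \le f,\ g \in \mathcal E^+ \bigr\} \le I_\mu(f).$$

The next result is the first of several convergence theorems. It shows, in particular, that we could have defined (9.4) using any increasing sequence $f_j \uparrow u$ of simple functions $f_j \in \mathcal E^+$.

9.6 Theorem (Beppo Levi) Let $(X,\mathcal A,\mu)$ be a measure space. For an increasing sequence of numerical functions $(u_j)_{j\in\mathbb N} \subset \mathcal M^+_{\bar{\mathbb R}}$, $0 \le u_j \le u_{j+1} \le \dots$, we have $u := \sup_{j\in\mathbb N} u_j \in \mathcal M^+_{\bar{\mathbb R}}$ and
$$\int \sup_{j\in\mathbb N} u_j\, d\mu = \sup_{j\in\mathbb N} \int u_j\, d\mu. \tag{9.5}$$
Note that we can write $\lim_{j\to\infty}$ instead of $\sup_{j\in\mathbb N}$ in (9.5) since the supremum of an increasing sequence is its limit. Moreover, (9.5) holds in $[0,+\infty]$, i.e. the case '$+\infty = +\infty$' is possible.

Proof (of Theorem 9.6) That $u \in \mathcal M^+_{\bar{\mathbb R}}$ follows from Corollary 8.9.
Step 1. Claim: $u, v \in \mathcal M^+_{\bar{\mathbb R}}$, $u \le v \implies \int u\, d\mu \le \int v\, d\mu$.
This follows from the monotonicity of the supremum since every simple $f \in \mathcal E^+$ with $f \le u$ also satisfies $f \le v$, and so
$$\int u\, d\mu = \sup\bigl\{ I_\mu(f) : f \le u,\ f \in \mathcal E^+ \bigr\} \le \sup\bigl\{ I_\mu(f) : f \le v,\ f \in \mathcal E^+ \bigr\} = \int v\, d\mu.$$
Step 2. Claim: $\sup_{j\in\mathbb N} \int u_j\, d\mu \le \int \sup_{j\in\mathbb N} u_j\, d\mu$; this shows '$\ge$' in (9.5).
Because of step 1 and $u_j \le u = \sup_{j\in\mathbb N} u_j$ we see
$$\int u_j\, d\mu \le \int u\, d\mu \qquad \forall\, j \in \mathbb N.$$
The right-hand side is independent of $j$, so that we may take the supremum over all $j \in \mathbb N$ on the left.
Step 3. Claim: $f \le u$, $f \in \mathcal E^+ \implies I_\mu(f) \le \sup_{j\in\mathbb N} \int u_j\, d\mu$.
This will prove '$\le$' in (9.5) since the right-hand side does not depend on $f$ and so we may take the supremum over all $f \in \mathcal E^+$ with $f \le u$ on the left (which is, by definition, the integral $\int u\, d\mu$).
To prove the claim we fix some $f \in \mathcal E^+$, $f \le u$. Since $u = \sup_{j\in\mathbb N} u_j$ we can find[✓] for every $\alpha \in (0,1)$ and every $x \in X$ some $N(x,\alpha) \in \mathbb N$ with
$$\alpha f(x) \le u_j(x) \qquad \forall\, j \ge N(x,\alpha),$$
which means that the sets $B_j := \{\alpha f \le u_j\}$ increase as $j \uparrow \infty$ towards $X$ and are, by Corollary 8.12, measurable as $\alpha f, u_j \in \mathcal M^+_{\bar{\mathbb R}}$. By the very definition of the $B_j$,
$$\alpha \mathbf 1_{B_j} f \le \mathbf 1_{B_j} u_j \le u_j$$
and, if $f = \sum_{k=0}^M y_k \mathbf 1_{A_k}$, we get from Lemma 9.5 and step 1
$$\alpha \sum_{k=0}^M y_k\, \mu(A_k \cap B_j) = I_\mu(\alpha \mathbf 1_{B_j} f) \le \int u_j\, d\mu \le \sup_{j\in\mathbb N} \int u_j\, d\mu. \tag{9.6}$$
At this point we use the $\sigma$-additivity of $\mu$ (in the guise of T4.4(iii)) to get
$$\mu(A_k \cap B_j) \uparrow \mu(A_k \cap X) = \mu(A_k) \qquad\text{as } B_j \uparrow X,\ j \uparrow \infty,$$
which implies (the far right of (9.6) no longer depends on $j$)
$$\alpha I_\mu(f) = \alpha \sum_{k=0}^M y_k\, \mu(A_k) \le \sup_{j\in\mathbb N} \int u_j\, d\mu.$$
Since we were free in our choice of $\alpha \in (0,1)$, we can make $\alpha \to 1$, and the claim and the theorem follow.

One can see the next corollary just as a special case of Theorem 9.6. Its true meaning, however, is that it allows us to calculate the integral of a measurable function using any approximating sequence of elementary functions, and this is a considerable simplification of the original definition (9.4).

9.7 Corollary Let $u \in \mathcal M^+_{\bar{\mathbb R}}$. Then
$$\int u\, d\mu = \lim_{j\to\infty} \int f_j\, d\mu$$
holds for every increasing sequence $(f_j)_{j\in\mathbb N} \subset \mathcal E^+$ with $\lim_{j\to\infty} f_j = u$.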
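Corollary 9.7 can be tried out on a concrete function. The following sketch rests on assumptions that go beyond the text: it takes $u(x) = x^2$ on $[0,1]$ with one-dimensional Lebesgue measure, uses the dyadic step functions $f_j$ from the proof of Theorem 8.8, and relies on the fact that the level sets $\{k2^{-j} \le u < (k+1)2^{-j}\}$ are then intervals whose lengths can be written down with square roots. The function name is hypothetical.

```python
import math


def integral_of_dyadic_approx(j):
    """I(f_j) for u(x) = x^2 on [0, 1] with Lebesgue measure.

    The level set {k 2^-j <= x^2 < (k+1) 2^-j} is the interval
    [sqrt(k 2^-j), sqrt((k+1) 2^-j)), so its measure is a difference of roots.
    For j >= 1 the top level set {u >= j} is empty or the single point {1},
    hence a null set, and contributes nothing.
    """
    h = 2.0 ** -j
    total = 0.0
    for k in range(2 ** j):
        left = math.sqrt(k * h)
        right = math.sqrt((k + 1) * h)
        total += k * h * (right - left)
    return total


approximations = [integral_of_dyadic_approx(j) for j in range(1, 13)]

# the integrals of the step functions increase ...
assert all(a < b for a, b in zip(approximations, approximations[1:]))
# ... stay below the known value 1/3 of the integral of x^2 over [0, 1] ...
assert all(a < 1 / 3 for a in approximations)
# ... and differ from it by at most 2^-j (plus a small rounding allowance)
for j, a in zip(range(1, 13), approximations):
    assert 1 / 3 - a <= 2.0 ** -j + 1e-12
```

The error bound reflects $0 \le u - f_j \le 2^{-j}$ on $\{u < j\}$, integrated over a set of measure $1$.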
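9.8 Properties (of the integral) Let $u, v \in \mathcal M^+_{\bar{\mathbb R}}$. Then
(i) $\int \mathbf 1_A\, d\mu = \mu(A)$ $\forall\, A \in \mathcal A$;
(ii) $\int \alpha u\, d\mu = \alpha \int u\, d\mu$ $\forall\, \alpha \ge 0$; (positive homogeneous)
(iii) $\int (u + v)\, d\mu = \int u\, d\mu + \int v\, d\mu$; (additive)
(iv) $u \le v \implies \int u\, d\mu \le \int v\, d\mu$. (monotone)

Proof (i) follows from Properties 9.3(i) and Lemma 9.5 and (ii), (iii) follow from the corresponding properties of $I_\mu$, Corollary 9.7 and the usual rules for limits. (iv) has been proved in step 1 of the proof of Theorem 9.6.

9.9 Corollary Let $(u_j)_{j\in\mathbb N} \subset \mathcal M^+_{\bar{\mathbb R}}$. Then $\sum_{j=1}^\infty u_j$ is measurable and we have
$$\int \sum_{j=1}^\infty u_j\, d\mu = \sum_{j=1}^\infty \int u_j\, d\mu \tag{9.7}$$
(including the possibility $+\infty = +\infty$).

Proof Set $s_M := u_1 + u_2 + \dots + u_M$ and apply Properties 9.8(iii) and T9.6.

9.10 Examples Let $(X,\mathcal A)$ be a measurable space.
(i) Let $\mu = \delta_y$ be the Dirac measure for fixed $y \in X$. Then
$$\int u\, d\delta_y = \int u(x)\, \delta_y(dx) = u(y) \qquad \forall\, u \in \mathcal M^+_{\bar{\mathbb R}}.$$
Indeed: for any $f \in \mathcal E^+$ with standard representation $f = \sum_{j=0}^M \phi_j \mathbf 1_{A_j}$, we know that $y \in X$ lies in exactly one of the $A_j$, say $y \in A_{j_0}$. Then
$$\int f(x)\, \delta_y(dx) = \int \sum_{j=0}^M \phi_j \mathbf 1_{A_j}(x)\, \delta_y(dx) = \sum_{j=0}^M \phi_j\, \delta_y(A_j) = \phi_{j_0} = f(y).$$
Now take any sequence of simple functions $f_k \uparrow u$. By Corollary 9.7
$$\int u(x)\, \delta_y(dx) = \lim_{k\to\infty} \int f_k(x)\, \delta_y(dx) = \lim_{k\to\infty} f_k(y) = u(y).$$
(ii) Let $(X,\mathcal A) = (\mathbb N, \mathcal P(\mathbb N))$ and $\mu := \sum_{j=1}^\infty \alpha_j \delta_j$. As we have seen in Problem 4.6(ii), $\mu$ is indeed a measure and $\mu\{k\} = \alpha_k$. On the other hand, all measurable functions $u \in \mathcal M^+_{\bar{\mathbb R}}$ are of the form
$$u(k) = \sum_{j=1}^\infty u_j \mathbf 1_{\{j\}}(k) \qquad \forall\, k \in \mathbb N$$
for a suitable sequence $(u_j)_{j\in\mathbb N} \subset [0,\infty]$.² Thus by Corollary 9.9,
$$\int u\, d\mu = \int \sum_{j=1}^\infty u_j \mathbf 1_{\{j\}}\, d\mu = \sum_{j=1}^\infty \int u_j \mathbf 1_{\{j\}}\, d\mu = \sum_{j=1}^\infty u_j\, \mu\{j\} = \sum_{j=1}^\infty u_j \alpha_j.$$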
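Example 9.10(ii) turns integration with respect to $\mu = \sum_j \alpha_j \delta_j$ into summation of a series. As a small sketch, with the weights $\alpha_j = 2^{-j}$ and the function $u(j) = j$ chosen purely for illustration (neither is from the text), the integrals of the truncated functions $u \mathbf 1_{\{1,\dots,n\}}$ are the partial sums of $\sum_j j\,2^{-j}$ and increase to the value $2$:

```python
from fractions import Fraction


def alpha(j):
    # hypothetical weights alpha_j = 2^-j of the measure mu = sum_j alpha_j * delta_j
    return Fraction(1, 2 ** j)


def u(j):
    # hypothetical positive function on the natural numbers
    return Fraction(j)


def integral_truncated(n):
    """Integral of u * 1_{1,...,n} with respect to mu: the n-th partial sum."""
    return sum((u(j) * alpha(j) for j in range(1, n + 1)), Fraction(0))


partial = [integral_truncated(n) for n in range(1, 30)]

# the truncated integrals increase, as in Beppo Levi's theorem ...
assert all(a < b for a, b in zip(partial, partial[1:]))
# ... and match the closed form 2 - (n + 2) / 2^n of the partial sums
for n, s in zip(range(1, 30), partial):
    assert s == 2 - Fraction(n + 2, 2 ** n)
```

Since the partial sums equal $2 - (n+2)2^{-n}$, their supremum is $2$, which is the value of $\int u\, d\mu$ for this particular choice.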
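We close this chapter with another convergence theorem due to P. Fatou and which is often called Fatou's lemma.

9.11 Theorem (Fatou) Let $(u_j)_{j\in\mathbb N} \subset \mathcal M^+_{\bar{\mathbb R}}$ be a sequence of positive measurable numerical functions. Then $u := \liminf_{j\to\infty} u_j$ is measurable and
$$\int \liminf_{j\to\infty} u_j\, d\mu \le \liminf_{j\to\infty} \int u_j\, d\mu.$$

Proof Recall that $\liminf_{j\to\infty} u_j = \sup_{k\in\mathbb N} \inf_{j\ge k} u_j$ always exists in $\bar{\mathbb R}$; the measurability of lim inf was shown in C8.9. Applying T9.6 to the increasing sequence $\bigl(\inf_{j\ge k} u_j\bigr)_{k\in\mathbb N}$, which is in $\mathcal M^+_{\bar{\mathbb R}}$ by C8.9, we find
$$\int \liminf_{j\to\infty} u_j\, d\mu \overset{9.6}{=} \sup_{k\in\mathbb N} \int \inf_{j\ge k} u_j\, d\mu \overset{9.8(\mathrm{iv})}{\le} \sup_{k\in\mathbb N} \inf_{\ell\ge k} \int u_\ell\, d\mu = \liminf_{\ell\to\infty} \int u_\ell\, d\mu,$$
where we used that $\inf_{j\ge k} u_j \le u_\ell$ for all $\ell \ge k$ and the monotonicity of the integral, cf. Properties 9.8(iv).

Problems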
9.1. Let $f : X \to \mathbb R$ be a positive simple function of the form $f(x) = \sum_{j=1}^m \xi_j \mathbf 1_{A_j}(x)$, $\xi_j \ge 0$, $A_j \in \mathcal A$, but not necessarily disjoint. Show that $I_\mu(f) = \sum_{j=1}^m \xi_j\, \mu(A_j)$. [Hint: use additivity and positive homogeneity of $I_\mu$.]
9.2. Complete the proof of Properties 9.8 (of the integral).
9.3. Find an example showing that an 'increasing sequence of functions' is, in general, different from a 'sequence of increasing functions'.

² This means that we can identify $\mathcal P(\mathbb N)$-measurable functions $f : \mathbb N \to [0,\infty]$ and arbitrary sequences $(u_j)_{j\in\mathbb N} \subset [0,\infty]$ by $u(k) = u_k$.
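9.4. Complete the proof of Corollary 9.9 and show that (9.7) is actually equivalent to (9.5) in Beppo Levi's theorem 9.6.
9.5. Let $(X,\mathcal A,\mu)$ be a measure space and $u \in \mathcal M^+$. Show that the set-function $A \mapsto \int \mathbf 1_A u\, d\mu$, $A \in \mathcal A$, is a measure.
9.6. Prove: Every function $u : \mathbb N \to \mathbb R$ on $(\mathbb N, \mathcal P(\mathbb N))$ is measurable.
9.7. Let $(X,\mathcal A)$ be a measurable space and $(\mu_j)_{j\in\mathbb N}$ be a sequence of measures thereon. Set, as in 9.10(ii), $\mu = \sum_{j\in\mathbb N} \mu_j$. By Problem 4.6(ii) this is again a measure. Show that
$$\int u\, d\mu = \sum_{j\in\mathbb N} \int u\, d\mu_j \qquad \forall\, u \in \mathcal M^+.$$
[Instructions: (1) consider $u = \mathbf 1_A$. (2) consider $u = f \in \mathcal E^+$. (3) approximate $u \in \mathcal M^+$ by an increasing sequence of simple functions and use Theorem 9.6. To interchange increasing limits/suprema use the hint to Problem 4.6(ii).]
9.8. Reverse Fatou lemma. Let $(X,\mathcal A,\mu)$ be a measure space and $(u_j)_{j\in\mathbb N} \subset \mathcal M^+$. If $u_j \le u$ for all $j \in \mathbb N$ and some $u \in \mathcal M^+$ with $\int u\, d\mu < \infty$, then
$$\limsup_{j\to\infty} \int u_j\, d\mu \le \int \limsup_{j\to\infty} u_j\, d\mu.$$
9.9. Fatou's lemma for measures. Let $(X,\mathcal A,\mu)$ be a measure space and let $(A_j)_{j\in\mathbb N}$, $A_j \in \mathcal A$, be a sequence of measurable sets. We set
$$\liminf_{j\to\infty} A_j := \bigcup_{k\in\mathbb N} \bigcap_{j\ge k} A_j \quad\text{and}\quad \limsup_{j\to\infty} A_j := \bigcap_{k\in\mathbb N} \bigcup_{j\ge k} A_j. \tag{9.8}$$
(i) Prove that $\mathbf 1_{\liminf_{j\to\infty} A_j} = \liminf_{j\to\infty} \mathbf 1_{A_j}$ and $\mathbf 1_{\limsup_{j\to\infty} A_j} = \limsup_{j\to\infty} \mathbf 1_{A_j}$. [Hint: check first that $\mathbf 1_{\bigcap_{j\in\mathbb N} A_j} = \inf_{j\in\mathbb N} \mathbf 1_{A_j}$ and $\mathbf 1_{\bigcup_{j\in\mathbb N} A_j} = \sup_{j\in\mathbb N} \mathbf 1_{A_j}$.]
(ii) Prove that $\mu\bigl(\liminf_{j\to\infty} A_j\bigr) \le \liminf_{j\to\infty} \mu(A_j)$.
(iii) Prove that $\limsup_{j\to\infty} \mu(A_j) \le \mu\bigl(\limsup_{j\to\infty} A_j\bigr)$ if $\mu$ is a finite measure.
(iv) Provide an example showing that (iii) fails if $\mu$ is not finite.
9.10. Let $(A_j)_{j\in\mathbb N} \subset \mathcal A$ be a sequence of disjoint sets such that $\dot\bigcup_{j\in\mathbb N} A_j = X$. Show that for every $u \in \mathcal M^+$
$$\int u\, d\mu = \sum_{j=1}^\infty \int \mathbf 1_{A_j} u\, d\mu.$$
Use this to construct on a $\sigma$-finite measure space $(X,\mathcal A,\mu)$ a function $w$ which satisfies $w(x) > 0$ for all $x \in X$ and $\int w\, d\mu < \infty$.
9.11. Kernels. Let $(X,\mathcal A,\mu)$ be a measure space. A map $N : X \times \mathcal A \to [0,\infty]$ is called kernel if
$A \mapsto N(x,A)$ is a measure for every $x \in X$,
$x \mapsto N(x,A)$ is a measurable function for every $A \in \mathcal A$.
(i) Show that $\mathcal A \ni A \mapsto \mu N(A) := \int N(x,A)\, \mu(dx)$ is a measure on $(X,\mathcal A)$.
(ii) For $u \in \mathcal M^+$ define $Nu(x) := \int u(y)\, N(x,dy)$. Show that $u \mapsto Nu$ is additive, positive homogeneous and $Nu(\bullet) \in \mathcal M^+$.
(iii) Let $\mu N$ be the measure introduced in (i). Show that $\int u\, d(\mu N) = \int Nu\, d\mu$ for all $u \in \mathcal M^+$.
[Hint: consider in each part of this problem first indicator functions $u = \mathbf 1_A$, then simple functions $u \in \mathcal E^+$ and then approximate $u \in \mathcal M^+$ by simple functions using 8.8 and 9.6.]
9.12. (Continuation of Problem 6.1) Consider on $\mathbb R$ the $\sigma$-algebra $\Sigma$ of all Borel sets which are symmetric w.r.t. the origin. Set $A^+ := A \cap [0,\infty)$, $A^- := (-\infty,0] \cap A$ and consider their symmetrizations $\tilde A^\pm := A^\pm \cup (-A^\pm) \in \Sigma$. Show that for every $u \in \mathcal M^+(\Sigma)$ with $0 \le u \le 1$ and for every measure $\mu$ on $(\mathbb R,\Sigma)$ the set-function
$$\mathcal B(\mathbb R) \ni A \mapsto \int \mathbf 1_{\tilde A^+} u\, d\mu + \int \mathbf 1_{\tilde A^-} (1 - u)\, d\mu$$
is a measure on $\mathcal B(\mathbb R)$ that extends $\mu$. Why does this not contradict the uniqueness theorem 5.7 for measures?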
10 Integrals of measurable functions and null sets
Throughout this chapter $(X,\mathcal A,\mu)$ will be a measure space. Let us briefly review how we constructed the integral for positive measurable functions $u \in \mathcal M^+_{\bar{\mathbb R}}$. Guided by the idea that the integral should be the area between the graph of a function and the $x$-axis, we defined for indicator functions $\int \mathbf 1_A\, d\mu = \mu(A)$ and extended this definition by linearity to all positive simple functions $\mathcal E^+$, which are just linear combinations of indicator functions (there was an issue about well-definedness which was addressed in L9.1). Since all positive measurable functions can be obtained as increasing limits of simple functions (Theorem 8.8), we could then define the integral of $u \in \mathcal M^+_{\bar{\mathbb R}}$ by exhausting the area below $u$ with elementary functions $f \le u$, see Definition 9.4. Beppo Levi's theorem (in the form of C9.7) finally allowed us to replace the sup by an increasing limit. The integral turned out to be positive homogeneous, additive and monotone.

We want to extend this integral now to not necessarily positive measurable functions $u \in \mathcal M_{\bar{\mathbb R}}$ by linearity. The fundamental observation here is that
$$u \in \mathcal M_{\bar{\mathbb R}} \iff u = u^+ - u^-, \qquad u^+, u^- \in \mathcal M^+_{\bar{\mathbb R}}$$
(cf. Corollary 8.11). This remark suggests the following definition.

10.1 Definition A function $u : X \to \bar{\mathbb R}$ on a measure space $(X,\mathcal A,\mu)$ is said to be ($\mu$-)integrable, if it is $\mathcal A/\bar{\mathcal B}$-measurable and if the integrals $\int u^+\, d\mu$, $\int u^-\, d\mu < \infty$ are finite. In this case we call
$$\int u\, d\mu := \int u^+\, d\mu - \int u^-\, d\mu \in (-\infty,\infty) \tag{10.1}$$
the ($\mu$-)integral of $u$. We write $\mathcal L^1(\mu)$ [$\mathcal L^1_{\bar{\mathbb R}}(\mu)$] for the set of all real-valued [numerical] $\mu$-integrable functions.

In case we need to exhibit the integration variable, we write
$$\int u\, d\mu = \int u(x)\, \mu(dx) = \int u(x)\, d\mu(x).$$
If $\mu = \lambda^n$, we call $\int u\, d\lambda^n$ the ($n$-dimensional) Lebesgue integral and $u \in \mathcal L^1_{\bar{\mathbb R}}(\lambda^n)$ is said to be Lebesgue integrable.¹ Traditionally one writes $\int u(x)\, dx$ or $\int u\, dx$ for the formally more correct $\int u\, d\lambda^n$. If we want to stress $X$ or $\mathcal A$, etc., we will also write $\mathcal L^1_{\bar{\mathbb R}}(X)$ or $\mathcal L^1_{\bar{\mathbb R}}(\mathcal A)$, etc.

10.2 Remark In the definition of the integral for positive $u \in \mathcal M^+_{\bar{\mathbb R}}$ we did allow that $\int u\, d\mu = \infty$. Since we want to avoid the case '$\infty - \infty$' in (10.1), we impose the finiteness condition $\int u^\pm\, d\mu < \infty$. In particular, a positive function is said to be integrable only if the integral is finite:
$$u \in \mathcal L^1_{\bar{\mathbb R}}(\mu),\ u \ge 0 \iff u \in \mathcal M^+_{\bar{\mathbb R}} \text{ and } \int u\, d\mu < \infty$$
(which is clear since for positive functions $u^+ = u$ and $u^- = 0$).
Caution: Some authors call $u$ $\mu$-integrable (in the wide sense) whenever $\int u^+\, d\mu - \int u^-\, d\mu$ makes sense in $\bar{\mathbb R}$, i.e. whenever it is not of the form '$\infty - \infty$'. We will not use this convention.

Let us briefly summarize the most important integrability criteria.

10.3 Theorem Let $u \in \mathcal M_{\bar{\mathbb R}}$. Then the following conditions are equivalent:
(i) $u \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$;
(ii) $u^+, u^- \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$;
(iii) $|u| \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$;
(iv) $\exists\, w \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$, $w \ge 0$ such that $|u| \le w$.

Proof (i)⇔(ii): this is just the definition of integrability.
(ii)⇒(iii): since $|u| = u^+ + u^-$, we can use additivity of the integral on $\mathcal M^+_{\bar{\mathbb R}}$, see 9.8(iii), to get $\int |u|\, d\mu = \int u^+\, d\mu + \int u^-\, d\mu < \infty$.
(iii)⇒(iv): take $w := |u|$.
(iv)⇒(i): we have to show that $u^\pm \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$. Since $u^\pm \le |u| \le w$ we find by the monotonicity of the integral 9.8(iv) that $\int u^\pm\, d\mu \le \int w\, d\mu < \infty$.

¹ The letter $\mathcal L$ is in honour of H. Lebesgue who was one of the pioneers of modern integration theory. If $\mu$ is other than $\lambda^n$, $\int \dots\, d\mu$ is sometimes called the abstract Lebesgue integral.
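It is now easy to see that the properties 9.8 of the integral on $\mathcal M^+_{\bar{\mathbb R}}$ extend to the set $\mathcal L^1_{\bar{\mathbb R}}(\mu)$:

10.4 Theorem Let $(X,\mathcal A,\mu)$ be a measure space and $u, v \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$, $\alpha \in \mathbb R$. Then
(i) $\alpha u \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ and $\int \alpha u\, d\mu = \alpha \int u\, d\mu$; (homogeneous)
(ii) $u + v \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ and $\int (u + v)\, d\mu = \int u\, d\mu + \int v\, d\mu$ (whenever $u + v$ is defined); (additive)
(iii) $\min\{u, v\},\ \max\{u, v\} \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$; (lattice property)
(iv) $u \le v \implies \int u\, d\mu \le \int v\, d\mu$; (monotone)
(v) $\bigl|\int u\, d\mu\bigr| \le \int |u|\, d\mu$. (triangle inequality)

Proof There are principally two ways to prove this theorem: either we consider positive and negative parts for (i)–(v) and show that their integrals are finite, or we use T10.3(iii), (iv). Doing this we find
(i) $|\alpha u| = |\alpha| \cdot |u| \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ by 9.8(ii).
(ii) $|u + v| \le |u| + |v| \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ by 9.8(iii).
(iii) $|\max\{u, v\}| \le |u| + |v| \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ and $|\min\{u, v\}| \le |u| + |v| \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ by 9.8(iii).
(iv) If $u \le v$, we find that $u^+ \le v^+$ and $v^- \le u^-$. Thus
$$\int u\, d\mu = \int u^+\, d\mu - \int u^-\, d\mu \overset{9.8(\mathrm{iv})}{\le} \int v^+\, d\mu - \int v^-\, d\mu = \int v\, d\mu.$$
(v) Using $\pm u \le |u|$ we deduce from (iv) that
$$\Bigl|\int u\, d\mu\Bigr| = \max\Bigl\{ \int u\, d\mu,\ -\int u\, d\mu \Bigr\} \le \int |u|\, d\mu.$$

10.5 Remark If $\alpha u(x) \pm \beta v(x)$ is defined in $\bar{\mathbb R}$ for all $x \in X$, i.e. if we can exclude '$\infty - \infty$', then T10.4(i),(ii) just say that the integral is linear:
$$\int (\alpha u + \beta v)\, d\mu = \alpha \int u\, d\mu + \beta \int v\, d\mu, \qquad \alpha, \beta \in \mathbb R. \tag{10.2}$$
This is always true for real-valued $u, v \in \mathcal L^1(\mu)$, i.e. $\mathcal L^1(\mu)$ is a vector space with addition and scalar ($\mathbb R$) multiplication defined by
$$(u + v)(x) := u(x) + v(x), \qquad (\alpha \cdot u)(x) := \alpha \cdot u(x),$$
and
$$\int \dots\, d\mu : \mathcal L^1(\mu) \to \mathbb R, \qquad u \mapsto \int u\, d\mu,$$
is a positive linear functional.

10.6 Examples Let us reconsider the examples from 9.10:
(i) On $(X,\mathcal A,\delta_y)$, $y \in X$ fixed, we have $\int u(x)\, \delta_y(dx) = u(y)$ and
$$u \in \mathcal L^1_{\bar{\mathbb R}}(\delta_y) \iff u \in \mathcal M_{\bar{\mathbb R}} \text{ and } |u(y)| < \infty.$$
(ii) On $\bigl(\mathbb N, \mathcal P(\mathbb N), \mu := \sum_{j=1}^\infty \alpha_j \delta_j\bigr)$ every $u : \mathbb N \to \mathbb R$ is measurable, cf. Problem 9.6. From 9.10(ii) we know that $\int |u|\, d\mu = \sum_{j=1}^\infty \alpha_j |u(j)|$, so that
$$u \in \mathcal L^1(\mu) \iff \sum_{j=1}^\infty \alpha_j |u(j)| < \infty.$$
If $\alpha_1 = \alpha_2 = \dots = 1$, $\mathcal L^1(\mu)$ is called the set of summable sequences and customarily denoted by $\ell^1(\mathbb N) = \bigl\{ (x_j)_{j\in\mathbb N} \subset \mathbb R : \sum_{j=1}^\infty |x_j| < \infty \bigr\}$. This space is important in functional analysis.
(iii) Let $(\Omega,\mathcal A,P)$ be a probability space. Then every bounded measurable function ('random variable') $\xi \in \mathcal M(\mathcal A)$, $C := \sup_{\omega\in\Omega} |\xi(\omega)| < \infty$, is integrable. This follows immediately from
$$\int |\xi|\, dP \le \int \sup_{\omega\in\Omega} |\xi(\omega)|\, P(d\omega) = C \int P(d\omega) = C < \infty.$$
Caution: Not every $P$-integrable function is bounded.[✓]

For $A \in \mathcal A$ and $u \in \mathcal M^+_{\bar{\mathbb R}}$ [or $\mathcal L^1_{\bar{\mathbb R}}(\mu)$] we know from 8.5(i) and C8.10 [and 10.3(iv) using $|\mathbf 1_A u| \le |u|$] that $\mathbf 1_A u$ is again measurable [or integrable].

10.7 Definition Let $(X,\mathcal A,\mu)$ be a measure space and $u \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ or $u \in \mathcal M^+_{\bar{\mathbb R}}$. Then
$$\int_A u\, d\mu := \int \mathbf 1_A u\, d\mu = \int \mathbf 1_A(x) u(x)\, \mu(dx) \qquad \forall\, A \in \mathcal A.$$
Of course, $\int_X u\, d\mu = \int u\, d\mu$.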
10.8 Lemma On the measure space $(X,\mathcal A,\mu)$ let $u \in \mathcal M^+$. The set-function
$$\mathcal A \ni A \mapsto \int_A u\, d\mu = \int \mathbf 1_A u\, d\mu$$
is a measure on $(X,\mathcal A)$. It is called the measure with density (function) $u$ with respect to $\mu$ and denoted by $\nu = u\mu$.

Proof Exercise.

If $\nu$ has a density w.r.t. $\mu$, one writes traditionally $d\nu/d\mu$ for the density function. This notation is to be understood in a purely symbolical way; it is motivated by the well-known fundamental theorem of integral and differential calculus (for Riemann integrals)
$$u(b) - u(a) = \int_a^b u'(x)\, dx,$$
where $u' = d(u'\lambda^1)/d\lambda^1$ in our notation $= du/dx$. At least if $u'(x) \ge 0$ one can show that $\nu[a,b) := u(b) - u(a)$ defines a measure and that, taking the fundamental theorem of integral calculus for granted, $\nu = u'\lambda^1 = u'\, dx$, compare with Problem 7.9. A more advanced discussion of derivatives can be found in Chapter 19, Theorem 19.20 and Appendix E.16–E.19.

Null sets and the 'a.e.'

We will now discuss the behaviour of integrable functions on null sets which we have already encountered in Problem 4.10. Let $(X,\mathcal A,\mu)$ be a measure space. A ($\mu$-)null set $N \in \mathcal N_\mu$ is a measurable set $N \in \mathcal A$ satisfying
$$N \in \mathcal N_\mu \iff N \in \mathcal A \text{ and } \mu(N) = 0. \tag{10.3}$$
If a property $\Pi = \Pi(x)$ is true for all $x \in X$ apart from some $x$ contained in a null set $N \in \mathcal N_\mu$, we say that $\Pi(x)$ holds for ($\mu$-)almost all (a.a.) $x \in X$ or that $\Pi$ holds ($\mu$-)almost everywhere (a.e.). In other words,
$$\Pi \text{ holds a.e.} \iff \{x : \Pi(x) \text{ fails}\} \subset N \in \mathcal N_\mu,$$
but we do not a priori require that the set $\{\Pi \text{ fails}\}$ is itself measurable. Typically we are interested in properties $\Pi(x)$ of the type: $u(x) = v(x)$, $u(x) \le v(x)$, etc. and we say, for example,
$$u = v \text{ a.e.} \iff \{x : u(x) \ne v(x)\} \text{ is (contained in) a } \mu\text{-null set.}$$
Caution: The assertions '$u$ enjoys a property $\Pi$ a.e.' and '$u$ is a.e. equal to $v$ which satisfies $\Pi$ everywhere' are, in general, far apart; see in this connection Problem 10.14.
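10.9 Theorem Let $u \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ be a numerical integrable function on a measure space $(X,\mathcal A,\mu)$. Then
(i) $\int |u|\, d\mu = 0 \iff |u| = 0$ a.e. $\iff \mu\{u \ne 0\} = 0$;
(ii) $\int_N u\, d\mu = 0$ $\forall\, N \in \mathcal N_\mu$.

Proof Let us begin with (ii). Obviously, $\min\{|u|, j\} \uparrow |u|$ as $j \uparrow \infty$. By Beppo Levi's theorem 9.6 we find
$$\begin{aligned}
\Bigl|\int_N u\, d\mu\Bigr| = \Bigl|\int \mathbf 1_N u\, d\mu\Bigr| &\overset{10.4(\mathrm v)}{\le} \int \mathbf 1_N |u|\, d\mu \\
&\overset{9.6}{=} \sup_{j\in\mathbb N} \int \mathbf 1_N \min\{|u|, j\}\, d\mu \le \sup_{j\in\mathbb N} \int j\, \mathbf 1_N\, d\mu \\
&= \sup_{j\in\mathbb N} j \int \mathbf 1_N\, d\mu = \sup_{j\in\mathbb N} j \underbrace{\mu(N)}_{=0} = 0.
\end{aligned}$$
The second equivalence in (i) is clear since, due to the measurability of $u$, the set $\{u \ne 0\}$ is not just a subset of a null set, but measurable, hence a proper null set.
In order to see '⇐' of the first equivalence, we use (ii) with $N = \{u \ne 0\}$:
$$\int |u|\, d\mu = \int_{\{u \ne 0\}} |u|\, d\mu + \int_{\{u = 0\}} |u|\, d\mu = \int_{\{u \ne 0\}} |u|\, d\mu + \int_{\{u = 0\}} 0\, d\mu \overset{(\mathrm{ii})}{=} 0.$$
For '⇒' we use the so-called Markov inequality: for $A \in \mathcal A$ and $c > 0$ we have
$$\begin{aligned}
\mu\bigl(\{|u| \ge c\} \cap A\bigr) &= \int \mathbf 1_{\{|u| \ge c\} \cap A}(x)\, \mu(dx) \\
&= \int_A \frac cc\, \mathbf 1_{\{|u| \ge c\}}(x)\, \mu(dx) \\
&\le \frac 1c \int_A |u(x)|\, \mathbf 1_{\{|u| \ge c\}}(x)\, \mu(dx) \\
&\le \frac 1c \int_A |u(x)|\, \mu(dx),
\end{aligned} \tag{10.4}$$
and for $A = X$ this inequality implies that
$$\mu\{|u| > 0\} \overset{[✓]}{=} \mu\Bigl( \bigcup_{j\in\mathbb N} \bigl\{|u| \ge \tfrac 1j\bigr\} \Bigr) \overset{4.6}{\le} \sum_{j\in\mathbb N} \mu\bigl\{|u| \ge \tfrac 1j\bigr\} \le \sum_{j\in\mathbb N} j \underbrace{\int |u|\, d\mu}_{=0} = 0.$$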
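10.10 Corollary Let $u, v \in \mathcal M_{\bar{\mathbb R}}$ such that $u = v$ $\mu$-almost everywhere. Then
(i) $u, v \ge 0 \implies \int u\, d\mu = \int v\, d\mu$;²
(ii) $u \in \mathcal L^1_{\bar{\mathbb R}}(\mu) \implies v \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$ and $\int u\, d\mu = \int v\, d\mu$.

Proof Since $u, v$ are measurable, $N := \{u \ne v\} \in \mathcal N_\mu$. Therefore (i) follows from
$$\begin{aligned}
\int u\, d\mu &= \int_{N^c} u\, d\mu + \int_N u\, d\mu \\
&\overset{10.9(\mathrm i)}{=} \int_{N^c} v\, d\mu + 0 \qquad (\text{use that } u = v \text{ on } N^c) \\
&\overset{10.9(\mathrm i)}{=} \int_{N^c} v\, d\mu + \int_N v\, d\mu = \int v\, d\mu.
\end{aligned}$$
For (ii) we observe first that $u = v$ a.e. implies that $u^\pm = v^\pm$ a.e. and then apply (i) to positive and negative parts: $\int v^\pm\, d\mu = \int u^\pm\, d\mu < \infty$; the claim follows.

10.11 Corollary If $u \in \mathcal M_{\bar{\mathbb R}}$ and $v \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$, $v \ge 0$, then
$$|u| \le v \text{ a.e.} \implies u \in \mathcal L^1_{\bar{\mathbb R}}(\mu).$$

Proof We have $u^\pm \le |u| \le v$ a.e., and by C10.10 $\int u^\pm\, d\mu \le \int v\, d\mu < \infty$. This shows that $u$ is integrable.

10.12 Proposition (Markov inequality) For all $u \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$, $A \in \mathcal A$ and $c > 0$
$$\mu\bigl(\{|u| \ge c\} \cap A\bigr) \le \frac 1c \int_A |u|\, d\mu, \tag{10.5}$$
and if $A = X$, in particular,
$$\mu\{|u| \ge c\} \le \frac 1c \int |u|\, d\mu. \tag{10.6}$$

Proof See (10.4) in the proof of Theorem 10.9(i).

² including, possibly, $+\infty = +\infty$.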
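The Markov inequality (10.6) is easy to test on a finite measure space, where integrals are finite weighted sums. The following is a sketch under assumptions not taken from the text: the point masses `masses`, the function values `u_values`, and both helper names are hypothetical.

```python
from fractions import Fraction

# hypothetical finite measure space: points 0..4 carrying these masses
masses = [Fraction(1, 2), Fraction(2), Fraction(1, 3), Fraction(5), Fraction(1)]
# hypothetical values u(0), ..., u(4) of an integrable function
u_values = [Fraction(-3), Fraction(1, 2), Fraction(4), Fraction(0), Fraction(-1)]


def integral_abs(values, weights):
    """Integral of |u| with respect to the measure given by the point masses."""
    return sum((abs(v) * m for v, m in zip(values, weights)), Fraction(0))


def measure_of_superlevel(values, weights, c):
    """mu{|u| >= c}."""
    return sum((m for v, m in zip(values, weights) if abs(v) >= c), Fraction(0))


total = integral_abs(u_values, masses)
for c in [Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(3), Fraction(10)]:
    # Markov inequality (10.6): mu{|u| >= c} <= (1/c) * integral of |u|
    assert measure_of_superlevel(u_values, masses, c) <= total / c
```

For large $c$ the left-hand side is $0$ here, while for small $c$ the bound $\frac 1c \int |u|\, d\mu$ becomes weak; the inequality is informative mainly for intermediate levels.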
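10.13 Corollary If $u \in \mathcal L^1_{\bar{\mathbb R}}(\mu)$, then $u$ is almost everywhere $\mathbb R$-valued. In particular, we can find a version $\tilde u \in \mathcal L^1(\mu)$ such that $\tilde u = u$ a.e. and $\int \tilde u\, d\mu = \int u\, d\mu$.

Proof Set $N := \{|u| = \infty\} = \{u = +\infty\} \,\dot\cup\, \{u = -\infty\} \in \mathcal A$. Now
$$N = \bigcap_{j\in\mathbb N} \{|u| \ge j\}$$
and by 3.4(iii′)³ and the Markov inequality we get
$$\mu(N) = \lim_{j\to\infty} \mu\{|u| \ge j\} \le \lim_{j\to\infty} \frac 1j \underbrace{\int |u|\, d\mu}_{<\infty} = 0.$$
The function $\tilde u := \mathbf 1_{N^c} u$ is real-valued, measurable and coincides outside $N$ with $u$. From C10.10 we deduce that $\tilde u$ is integrable (and even $\in \mathcal L^1(\mu)$) with $\int \tilde u\, d\mu = \int u\, d\mu$.

Corollary 10.13 allows us to identify (up to null sets) functions from $\mathcal L^1_{\bar{\mathbb R}}$ and $\mathcal L^1$. Since $\mathcal L^1$ is a much nicer space (it is a vector space and we need not take any precautions when adding functions, etc.) we will work from now on only with $\mathcal L^1$. The corresponding statements for $\mathcal L^1_{\bar{\mathbb R}}$ are then easily derived.

We close this section with a technique which will be useful in many applications later on.

10.14 Corollary Let $\mathcal G \subset \mathcal A$ be a sub-$\sigma$-algebra.
(i) If $u, w \in \mathcal L^1(\mathcal G)$ and if $\int_G u\, d\mu = \int_G w\, d\mu$ for all $G \in \mathcal G$, then $u = w$ $\mu$-a.e.
(ii) If $u, w \in \mathcal M^+(\mathcal G)$ and if $\int_G u\, d\mu = \int_G w\, d\mu$ for all $G \in \mathcal G$, then $u = w$ $\mu$-a.e. under the additional assumption that $\mu|_{\mathcal G}$ is $\sigma$-finite.⁴

Proof (i) Since $u$ and $w$ are $\mathcal G$-measurable functions, we have $G \cap \{u \ge w\},\ G \cap \{u < w\} \in \mathcal G$ for all $G \in \mathcal G$. Thus
$$\int_G |u - w|\, d\mu = \int_{G \cap \{u \ge w\}} (u - w)\, d\mu + \int_{G \cap \{u < w\}} (w - u)\, d\mu \tag{10.7}$$
while
$$\int_{G \cap \{u \ge w\}} (u - w)\, d\mu = \int_{G \cap \{u \ge w\}} u\, d\mu - \int_{G \cap \{u \ge w\}} w\, d\mu = 0 \tag{10.8}$$
by the linearity of the integral and because of our assumption. The other term on the right-hand side of (10.7) can be treated similarly, and the conclusion follows from Theorem 10.9(i).
(ii) If $u, w$ are positive measurable functions, we cannot use the linearity of the integral in (10.8) as this may yield an undefined expression of the form '$\infty - \infty$'. We can avoid this by an approximation procedure which relies on the fact that $\mu|_{\mathcal G}$ is $\sigma$-finite. Pick an exhausting sequence $(G_j)_{j\in\mathbb N} \subset \mathcal G$ with $G_j \uparrow X$ and $\mu(G_j) < \infty$. Then the sets $F_j := G \cap \{u \ge w\} \cap G_j \cap \{u \le j\}$ are in $\mathcal G$ and have finite $\mu$-measure. Moreover, the function $(u - w)\mathbf 1_{F_j}$ is integrable[✓] and increases towards $(u - w)\mathbf 1_{G \cap \{u \ge w\}}$, so that by Beppo Levi's Theorem 9.6,
$$\int_{G \cap \{u \ge w\}} (u - w)\, d\mu = \sup_{j\in\mathbb N} \int_{F_j} (u - w)\, d\mu = \sup_{j\in\mathbb N} \underbrace{\Bigl( \int_{F_j} u\, d\mu - \int_{F_j} w\, d\mu \Bigr)}_{=0\ \forall\, j\in\mathbb N} = 0.$$

³ [✓] This is applicable since $\mu$ is finite on $\{|u| \ge 1\}$, say.
⁴ i.e. there exists an exhausting sequence $(G_j)_j \subset \mathcal G$ with $G_j \uparrow X$ and $\mu(G_j) < \infty$.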
A similar argument applies to the other term in (10.7) and the claim follows.

Problem 10.16 below shows that the $\sigma$-finiteness of $\mu$ in Corollary 10.14(ii) is really needed.

Problems

10.1. Prove Remark 10.5, i.e. prove the linearity of the integral.

10.2. Let $(\Omega,\mathcal A,P)$ be a probability space. Find a counterexample to the claim: every $P$-integrable function $u\in\mathcal L^1(P)$ is bounded.
[Hint: you could try to take $\Omega=(0,1)$, $P=\lambda^1$ and show that $1/\sqrt{x}$ is Lebesgue integrable on $(0,1)$ by finding a sequence of suitable simple functions that is above $1/\sqrt{x}$ on, say, $[1/m,1]$ and then let $m\to\infty$ using Beppo Levi's Theorem 9.6.]

10.3. True or false: if $f\in\mathcal L^1$ we can change $f$ on a set $N$ of measure zero, e.g. by
$$\tilde f(x) := \begin{cases} f(x) & \text{if } x\notin N,\\ \beta & \text{if } x\in N,\end{cases}$$
where $\beta\in\bar{\mathbb R}$ is any number, and $\tilde f$ is still integrable, even $\int f\,d\mu=\int\tilde f\,d\mu$?

10.4. Every countable set is a $\lambda^1$-null set. Use the Cantor ternary set $C$ (cf. Problem 7.10) to illustrate that the converse is not true. What happens if we change $\lambda^1$ to $\lambda^2$?

10.5. Prove the following variants of the Markov inequality P10.12: for all $c,\alpha>0$ and whenever the expressions involved make sense/are finite, then:
(i) $\mu(\{|u|>c\})\le\frac1c\int|u|\,d\mu$;
(ii) $\mu(\{|u|>c\})\le\frac1{c^p}\int|u|^p\,d\mu$ for all $0<p<\infty$;
(iii) $\mu(\{|u|\ge c\})\le\frac1{\phi(c)}\int\phi(|u|)\,d\mu$ for an increasing function $\phi:\mathbb R_+\to\mathbb R_+$;
(iv) $\mu(\{u\ge\alpha\int u\,d\mu\})\le\frac1\alpha$;
(v) $\mu(\{|u|<c\})\le\frac1{\psi(c)}\int\psi(|u|)\,d\mu$ for a decreasing function $\psi:\mathbb R_+\to\mathbb R_+$;
(vi) $P(|X-EX|\ge\alpha\sqrt{VX})\le\frac1{\alpha^2}$, where $(\Omega,\mathcal A,P)$ is a probability space and, in probabilistic jargon, $X$ is a random variable (i.e. a measurable function $X:\Omega\to\mathbb R$), $EX=\int X\,dP$ the expectation or mean value and $VX=\int(X-EX)^2\,dP$ the variance.
Remark. This is Chebyshev's inequality.

10.6. Show that $\int|u|^p\,d\mu<\infty$ implies that $|u|$ is a.e. real-valued (in the sense $(-\infty,\infty)$-valued!). Is this still true if we have $\int\arctan(u)\,d\mu<\infty$?

10.7. Let $(A_j)_{j\in\mathbb N}\subset\mathcal A$ be a sequence of pairwise disjoint sets. Show that
$$u\,1_{\bigcup_j A_j}\in\mathcal L^1 \iff u\,1_{A_n}\in\mathcal L^1 \text{ for all } n \text{ and } \sum_{j=1}^\infty\int_{A_j}|u|\,d\mu<\infty.$$

10.8. Generalized Fatou lemma. Assume that $(u_j)_{j\in\mathbb N}\subset\mathcal L^1$. Prove:
(i) If $u_j\ge v$ for all $j\in\mathbb N$ and some $v\in\mathcal L^1$, then
$$\int\liminf_{j\to\infty}u_j\,d\mu\le\liminf_{j\to\infty}\int u_j\,d\mu.$$
(ii) If $u_j\le w$ for all $j\in\mathbb N$ and some $w\in\mathcal L^1$, then
$$\limsup_{j\to\infty}\int u_j\,d\mu\le\int\limsup_{j\to\infty}u_j\,d\mu.$$
(iii) Find examples that show that the upper and lower bounds in (i) and (ii) are necessary.
[Hint: mimic and scrutinize the proof of Fatou's Lemma 9.11, especially when it comes to the application of Beppo Levi's theorem. What goes wrong if we do not have this upper/lower bound? Note that we have an 'invisible' $v=0$ in T9.11.]

10.9. Let $(\Omega,\mathcal A,P)$ be a probability space. Show that for $u\in\mathcal M(\mathcal A)$
$$u\in\mathcal L^1(P)\iff\sum_{j=0}^\infty P(|u|\ge j)<\infty.$$

10.10. Independence (2). Let $(\Omega,\mathcal A,P)$ be a probability space. Recall the notion of independence of two $\sigma$-algebras $\mathcal B,\mathcal C\subset\mathcal A$ introduced in Problem 5.10. Show that $u\in\mathcal M^+(\mathcal B)$ and $w\in\mathcal M^+(\mathcal C)$ satisfy
$$\int uw\,dP=\int u\,dP\cdot\int w\,dP$$
and that for $u\in\mathcal M(\mathcal B)$ and $w\in\mathcal M(\mathcal C)$
$$u\in\mathcal L^1 \text{ and } w\in\mathcal L^1 \implies uw\in\mathcal L^1.$$
Find an example proving that this fails if and are not independent. [Hint: start with simple functions and use Beppo Levi’s theorem 9.6.] 10.11. Completion (3). Let X ∗ ¯ be the completion of X – cf. Problems 4.13, 6.2. (i) Show that for every f ∗ ∈ +∗ there are f g ∈ + with f f ∗ g and f = g = 0 as well as f d = f ∗ d ¯ = g d. ∗ ∗ (ii) u X → is -measurable if, and only if, there exist -measurable ¯ with u u∗ w and u = w -a.e. functions u w X → ∗ 1 (iii) If ¯ then u w from (ii) can be chosen from 1 such that u ∈ , u d = u∗ d ¯ = w d. [Hint: (i) use Problem 4.13(v). (ii) for ‘⇒’ consider the sets u∗ > and use 4.13(v). The other direction is harder. For this consider first step functions using again 4.13(v) and then general functions by monotone convergence. (iii) by 4.13(iii), = ¯ on , and thus f d = f d ¯ for -measurable f .] 10.12. Completion (4). Inner measure and outer measure. Let X be a finite measure space. Define for every E ⊂ X the outer resp. inner measure ∗ E = inf A A ∈ A ⊃ E
and
∗ E = sup A A ∈ A ⊂ E
(i) Show that ∗ E ∗ E
∗ E + ∗ E c = X
∗ E ∪ F ∗ E + ∗ F
∗ E + ∗ F ∗ E ∪ F
(ii) For every E ⊂ X there exist sets E∗ E ∗ ∈ such that E∗ = ∗ E and E ∗ = ∗ E. [Hint: use the definition of the infimum to find sets E n ⊃ E such that E n − ∗ E n1 and consider n E n ∈ .] (iii) Show that ∗ = E ⊂ X ∗ E = ∗ E is a -algebra and that it is the completion of w.r.t. . Conclude, in particular, that ∗ ∗ = ∗ ∗ = ¯ if ¯ is the completion of . 10.13. Let X be a measure space and u ∈ . Assume that u ∈ and u = w almost everywhere w.r.t. . When can we say that w ∈ ? 10.14. ‘a.e.’ is a tricky business. When working with ‘a.e.’ properties one has to be extremely careful. For example, the assertions ‘u is continuous a.e.’ and ‘u is a.e. equal to an (everywhere) continuous function’ are far apart! Illustrate this by considering the functions u = 1 and u = 10 . 10.15. Let be a -finite measure on the measurable space X . Show that there exists a finite measure P on X such that = P , i.e. and P have the same null sets.
10.16. Construct an example showing that for u w ∈ + the equality G u d = w d for all G ∈ does not necessarily imply that u = w almost everywhere. G [Hint: In view of 10.14 cannot be -finite. Consider on the measure = m 1 where m = 1 x1 + 1 x>1 , u ≡ 1 and w = 1 x1 +2 1 x>1 . Then all Borel subsets of x > 1 have either -measure 0 or +, thus B u d = w d for all B ∈ while u = w = .] B
11 Convergence theorems and their applications
Throughout this chapter $(X,\mathcal A,\mu)$ will be some measure space.

One of the shortfalls of the Riemann integral is the fact that we do not have sufficiently general results that allow us to interchange limits and integrals; typically one has to assume uniform convergence for this. This has partly to do with the fact that the set of Riemann integrable functions is somewhat limited, see Theorem 11.8. The classical counterexample for this defect is Dirichlet's jump function $x\mapsto 1_{\mathbb Q\cap[0,1]}(x)$, which is not Riemann integrable since its upper function is $1_{[0,1]}$ while the lower function is $0\cdot 1_{[0,1]}$.[] For the Lebesgue integral of positive measurable numerical functions we have already seen more powerful convergence results in the form of Beppo Levi's theorem 9.6 or Fatou's lemma 9.11. They can deal with Dirichlet's jump function: for any enumeration $\mathbb Q=\{q_j : j\in\mathbb N\}$ we get
$$\int 1_{\mathbb Q\cap[0,1]}\,d\lambda^1=\int\sup_{N\in\mathbb N}1_{\{q_1,\dots,q_N\}\cap[0,1]}\,d\lambda^1\overset{9.6}{=}\sup_{N\in\mathbb N}\int 1_{\{q_1,\dots,q_N\}\cap[0,1]}\,d\lambda^1=\sup_{N\in\mathbb N}\underbrace{\lambda^1\big(\{q_j\in[0,1] : 1\le j\le N\}\big)}_{=0}=0.$$

In this chapter we study systematically convergence theorems for $\mathcal L^1(\mu)$ and some of their most important applications. The first is a generalization of Beppo Levi's theorem 9.6.

11.1 Theorem (Monotone convergence). Let $(X,\mathcal A,\mu)$ be a measure space.
(i) Let $(u_j)_{j\in\mathbb N}\subset\mathcal L^1(\mu)$ be an increasing sequence of integrable functions $u_1\le u_2\le\dots$ with limit $u:=\sup_{j\in\mathbb N}u_j$. Then $u\in\mathcal L^1(\mu)$ if, and only if, $\sup_{j\in\mathbb N}\int u_j\,d\mu<+\infty$, in which case
$$\sup_{j\in\mathbb N}\int u_j\,d\mu=\int\sup_{j\in\mathbb N}u_j\,d\mu.$$
(ii) Let $(v_k)_{k\in\mathbb N}\subset\mathcal L^1(\mu)$ be a decreasing sequence of integrable functions $v_1\ge v_2\ge\dots$ with limit $v:=\inf_{k\in\mathbb N}v_k$. Then $v\in\mathcal L^1(\mu)$ if, and only if, $\inf_{k\in\mathbb N}\int v_k\,d\mu>-\infty$, in which case
$$\inf_{k\in\mathbb N}\int v_k\,d\mu=\int\inf_{k\in\mathbb N}v_k\,d\mu.$$
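Proof. Obviously, (i) implies (ii) as $u_j:=-v_j$ fulfils all the assumptions of (i). To see (i), we remark that $u_j-u_1\in\mathcal L^1(\mu)$ defines an increasing sequence of positive functions $0\le u_j-u_1\le u_{j+1}-u_1\le\dots$ for which we may use the Beppo Levi theorem 9.6:
$$0\le\sup_{j\in\mathbb N}\int(u_j-u_1)\,d\mu=\int\sup_{j\in\mathbb N}(u_j-u_1)\,d\mu. \qquad (11.1)$$
Assume that $u\in\mathcal L^1(\mu)$. Since the 'sup' in (11.1) stands for an increasing limit, we find that
$$\sup_{j\in\mathbb N}\int u_j\,d\mu=\int(u-u_1)\,d\mu+\int u_1\,d\mu\overset{(10.2)}{=}\int u\,d\mu-\int u_1\,d\mu+\int u_1\,d\mu=\int u\,d\mu<\infty.$$
Conversely, if $\sup_{j\in\mathbb N}\int u_j\,d\mu<\infty$, we see from (11.1) that $u-u_1\in\mathcal L^1(\mu)$ and, as $u_1\in\mathcal L^1(\mu)$, $u=(u-u_1)+u_1\in\mathcal L^1(\mu)$ by (10.2). Therefore, (11.1) implies
$$\int u\,d\mu=\int(u-u_1)\,d\mu+\int u_1\,d\mu=\sup_{j\in\mathbb N}\int u_j\,d\mu<\infty.$$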
One of the most useful and versatile convergence theorems is the following.

11.2 Theorem (Lebesgue. Dominated convergence). Let $(X,\mathcal A,\mu)$ be a measure space and $(u_j)_{j\in\mathbb N}\subset\mathcal L^1(\mu)$ be a sequence of functions such that $|u_j|\le w$ for all $j\in\mathbb N$ and some $w\in\mathcal L^1_+(\mu)$. If $u(x)=\lim_{j\to\infty}u_j(x)$ exists for almost every $x\in X$, then $u\in\mathcal L^1(\mu)$ and we have
(i) $\lim_{j\to\infty}\int|u_j-u|\,d\mu=0$;
(ii) $\lim_{j\to\infty}\int u_j\,d\mu=\int\lim_{j\to\infty}u_j\,d\mu=\int u\,d\mu$.
Proof. Since all $u_j$ are measurable, the set $N:=\{x : \lim_j u_j(x)\text{ does not exist}\}$ is measurable and, by assumption, a $\mu$-null set. We can therefore assume that $N=\emptyset$, as the integral over the null set $N$ gives no contribution, cf. Theorem 10.9(ii); alternatively we could consider $1_{N^c}u$ and $1_{N^c}u_j$ instead of $u$ and $u_j$. From $|u_j|\le w$ we get $|u|=\lim_{j\to\infty}|u_j|\le w$, and $u\in\mathcal L^1(\mu)$ by C10.11(iv). Therefore,
$$\Big|\int u_j\,d\mu-\int u\,d\mu\Big|=\Big|\int(u_j-u)\,d\mu\Big|\overset{10.4(v)}{\le}\int|u_j-u|\,d\mu,$$
which means that (i) implies (ii). Since
$$|u_j-u|\le|u_j|+|u|\le 2w\qquad\forall j\in\mathbb N,$$
we get $2w-|u_j-u|\ge 0$ and Fatou's lemma 9.11 tells us that
$$\int 2w\,d\mu=\int\liminf_{j\to\infty}\big(2w-|u_j-u|\big)\,d\mu\le\liminf_{j\to\infty}\int\big(2w-|u_j-u|\big)\,d\mu=\int 2w\,d\mu-\limsup_{j\to\infty}\int|u_j-u|\,d\mu.\,{}^{1}$$
Thus $0\le\liminf_{j\to\infty}\int|u_j-u|\,d\mu\le\limsup_{j\to\infty}\int|u_j-u|\,d\mu\le 0$, and consequently $\lim_{j\to\infty}\int|u_j-u|\,d\mu=0$.

11.3 Remark. The uniform boundedness assumption
$$|u_j|\le w\qquad\forall j\in\mathbb N\text{ and some } w\in\mathcal L^1_+(\mu) \qquad (11.2)$$
is very important for Theorem 11.2. To see this, consider $(\mathbb R,\mathcal B(\mathbb R),\lambda^1)$ and set
$$u_j(x):=j\,1_{[0,1/j]}(x)\xrightarrow[j\to\infty]{}\infty\cdot 1_{\{0\}}(x)\overset{\text{a.e.}}{=}0,$$
whereas $\int u_j\,d\lambda^1=j\cdot\frac1j=1\ne 0=\int\infty\cdot 1_{\{0\}}\,d\lambda^1$. The only obvious possibility to weaken (11.2) would be to require it to hold only almost everywhere.[]

Lebesgue's theorem gives merely sufficient, but easily verifiable, conditions for the interchange of limits and integrals; the ultimate version for such a result with necessary and sufficient conditions will be given in the form of Vitali's convergence theorem 16.6 in Chapter 16 below.
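The counterexample of Remark 11.3 can be checked with exact arithmetic. The following is only a minimal sketch: the helper names `u` and `integral_u` are ours, and the integral is computed from the elementary formula "height times length" for the step function $j\,1_{[0,1/j]}$ rather than by a general integration routine.

```python
from fractions import Fraction

def u(j, x):
    """u_j(x) = j * 1_[0, 1/j](x), the sequence from Remark 11.3."""
    return Fraction(j) if 0 <= x <= Fraction(1, j) else Fraction(0)

def integral_u(j):
    """Exact integral of u_j: height j times the length 1/j of [0, 1/j]."""
    return Fraction(j) * Fraction(1, j)

x = Fraction(1, 1000)                      # any fixed x > 0 would do
indices = (10, 100, 1000, 1001, 5000)
values_at_x = [u(j, x) for j in indices]   # vanishes as soon as 1/j < x
integrals = [integral_u(j) for j in indices]

print(values_at_x)
print(integrals)
```

For every fixed $x>0$ the values $u_j(x)$ are eventually $0$, while the integrals stay equal to $1$; no integrable majorant $w$ can exist for this sequence.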
1 Recall that $\liminf_{j\to\infty}(-x_j)=-\limsup_{j\to\infty}x_j$.

∗ ∗ ∗
Let us now have a look at two of the most important applications of the convergence theorems.

Parameter-dependent integrals

Again $(X,\mathcal A,\mu)$ is some measure space.

11.4 Theorem (Continuity lemma). Let $\emptyset\ne(a,b)\subset\mathbb R$ be a non-degenerate open interval and $u:(a,b)\times X\to\mathbb R$ be a function satisfying
(a) $x\mapsto u(t,x)$ is in $\mathcal L^1(\mu)$ for every fixed $t\in(a,b)$;
(b) $t\mapsto u(t,x)$ is continuous for every fixed $x\in X$;
(c) $|u(t,x)|\le w(x)$ for all $(t,x)\in(a,b)\times X$ and some $w\in\mathcal L^1_+(\mu)$.
Then the function $v:(a,b)\to\mathbb R$ given by
$$t\mapsto v(t):=\int u(t,x)\,\mu(dx) \qquad (11.3)$$
is continuous.

Proof. Let us, first of all, remark that (11.3) is well-defined thanks to assumption (a). We are going to show that for any $t\in(a,b)$ and every sequence $(t_j)_{j\in\mathbb N}\subset(a,b)$ with $\lim_{j\to\infty}t_j=t$ we have $\lim_{j\to\infty}v(t_j)=v(t)$. This proves continuity of $v$ at the point $t$. Because of (b), $u(\bullet,x)$ is continuous and, therefore,
$$u_j(x):=u(t_j,x)\xrightarrow[j\to\infty]{}u(t,x)\quad\text{and}\quad|u_j(x)|\le w(x)\qquad\forall x\in X.$$
Thus we can use Lebesgue's dominated convergence theorem, and conclude
$$\lim_{j\to\infty}v(t_j)=\lim_{j\to\infty}\int u(t_j,x)\,\mu(dx)=\int\lim_{j\to\infty}u(t_j,x)\,\mu(dx)=\int u(t,x)\,\mu(dx)=v(t).$$
A very similar consideration leads to

11.5 Theorem (Differentiability lemma). Let $\emptyset\ne(a,b)\subset\mathbb R$ be a non-degenerate open interval and $u:(a,b)\times X\to\mathbb R$ be a function satisfying
(a) $x\mapsto u(t,x)$ is in $\mathcal L^1(\mu)$ for every fixed $t\in(a,b)$;
(b) $t\mapsto u(t,x)$ is differentiable for every fixed $x\in X$;
(c) $|\partial_t u(t,x)|\le w(x)$ for all $(t,x)\in(a,b)\times X$ and some $w\in\mathcal L^1_+(\mu)$.
Then the function $v:(a,b)\to\mathbb R$ given by
$$t\mapsto v(t):=\int u(t,x)\,\mu(dx) \qquad (11.4)$$
is differentiable and its derivative is
$$\partial_t v(t)=\int\partial_t u(t,x)\,\mu(dx).\,{}^{2} \qquad (11.5)$$
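Proof. Let $t\in(a,b)$ and fix some sequence $(t_j)_{j\in\mathbb N}\subset(a,b)$ such that $t_j\ne t$ and $\lim_{j\to\infty}t_j=t$. Set
$$u_j(x):=\frac{u(t_j,x)-u(t,x)}{t_j-t}\xrightarrow[j\to\infty]{}\partial_t u(t,x),$$
which shows, in particular, that $x\mapsto\partial_t u(t,x)$ is measurable. By the mean value theorem of differential calculus and (c) we see for some intermediate value $\theta=\theta(j,x)\in(a,b)$
$$|u_j(x)|=\big|\partial_t u(t,x)\big|_{t=\theta}\big|\le w(x)\qquad\forall j\in\mathbb N.$$
Thus $u_j\in\mathcal L^1(\mu)$, and the sequence $(u_j)_{j\in\mathbb N}$ satisfies all conditions of the dominated convergence theorem 11.2. Finally,
$$\partial_t v(t)=\lim_{j\to\infty}\frac{v(t_j)-v(t)}{t_j-t}=\lim_{j\to\infty}\int\frac{u(t_j,x)-u(t,x)}{t_j-t}\,\mu(dx)=\lim_{j\to\infty}\int u_j(x)\,\mu(dx)\overset{11.2}{=}\int\lim_{j\to\infty}u_j(x)\,\mu(dx)=\int\partial_t u(t,x)\,\mu(dx).$$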
Later in this chapter we will give examples of how to apply the continuity and differentiability lemmas.

Riemann vs. Lebesgue integration

From here to the end of this chapter we choose $(X,\mathcal A,\mu)=(\mathbb R,\mathcal B(\mathbb R),\lambda^1)$.

2 This formula is very effectively remembered as '$\partial_t\int\dots=\int\partial_t\dots$'.
Let us briefly recall the definition of the Riemann integral (see Appendix E for a more detailed discussion). Consider on the finite interval $[a,b]\subset\mathbb R$ the partitions
$$\pi:=\{a=t_0<t_1<\dots<t_k=b\},$$
define for a given function $u:[a,b]\to\mathbb R$
$$m_j:=\inf_{x\in[t_{j-1},t_j]}u(x),\qquad M_j:=\sup_{x\in[t_{j-1},t_j]}u(x),\qquad j=1,2,\dots,k,$$
and introduce the lower resp. upper Darboux sums
$$S_\pi[u]:=\sum_{j=1}^k m_j\,(t_j-t_{j-1})\quad\text{resp.}\quad S^\pi[u]:=\sum_{j=1}^k M_j\,(t_j-t_{j-1}).$$
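11.6 Definition. A bounded function $u:[a,b]\to\mathbb R$ is said to be Riemann integrable, if the values
$$\int_* u:=\sup_\pi S_\pi[u]=\inf_\pi S^\pi[u]=:\int^* u$$
(sup, inf range over all partitions $\pi$ of $[a,b]$) coincide and are finite. Their common value is called the Riemann integral of $u$ and denoted by $(R)\int_a^b u(x)\,dx$ or $\int_a^b u(x)\,dx$.

What is going on here? First of all, it is not difficult to see that lower [upper] Darboux sums increase [decrease] if we add points to the partition $\pi$, i.e. the sup [inf] in Definition 11.6 makes sense. Moreover, to $S_\pi[u]$ and $S^\pi[u]$ there correspond simple functions, namely $\sigma_\pi[u]$ and $\Sigma^\pi[u]$ given by
$$\sigma_\pi[u](x):=\sum_{j=1}^k m_j\,1_{[t_{j-1},t_j)}(x)\quad\text{and}\quad\Sigma^\pi[u](x):=\sum_{j=1}^k M_j\,1_{[t_{j-1},t_j)}(x),$$
which satisfy $\sigma_\pi[u](x)\le u(x)\le\Sigma^\pi[u](x)$ and which increase resp. decrease as $\pi$ refines.

[Figure: the step functions $\Sigma^\pi[u]$ and $\sigma_\pi[u]$ above and below the graph of $u$, over the partition points $t_j,t_{j+1},t_{j+2},t_{j+3}$.]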
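The monotonicity of the Darboux sums under refinement can be observed numerically. The sketch below makes two simplifying assumptions that are not part of the text: the function is increasing, so that the infimum and supremum on each subinterval sit at the endpoints, and the partitions are equidistant with $k$ subintervals (doubling $k$ gives nested partitions). The helper name `darboux_sums` is ours.

```python
def darboux_sums(u, a, b, k):
    """Lower and upper Darboux sums of an increasing function u on [a, b]
    for the equidistant partition with k subintervals.
    For increasing u: m_j = u(t_{j-1}) and M_j = u(t_j)."""
    h = (b - a) / k
    lower = sum(u(a + (j - 1) * h) * h for j in range(1, k + 1))
    upper = sum(u(a + j * h) * h for j in range(1, k + 1))
    return lower, upper

def square(x):
    return x * x

# Doubling k refines the partition: lower sums go up, upper sums go down.
for k in (1, 2, 4, 8, 16, 1024):
    lower, upper = darboux_sums(square, 0.0, 1.0, k)
    print(k, lower, upper)
```

For $u(x)=x^2$ on $[0,1]$ both sums close in on the Riemann integral $1/3$; the gap between them is $(u(1)-u(0))/k$.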
11.7 Remark. The above construction gives the 'usual' integral which is often introduced as the anti-derivative. Unfortunately, this notion of integration is somewhat insufficient. Nice general convergence theorems (such as monotone or dominated convergence) hold only under unnatural restrictions or are not available at all. Moreover, it cannot deal with functions of the type $x\mapsto 1_{\mathbb Q\cap[0,1]}(x)$: the smallest upper function is $1_{[0,1]}$ while the largest lower function is identically $0$.[] Thus the Riemann integral of $1_{\mathbb Q\cap[0,1]}$ does not exist, whereas by T10.9(ii) the Lebesgue integral $\int 1_{\mathbb Q\cap[0,1]}\,d\lambda=0$.

Roughly speaking, the reason for this is the fact that the Riemann sums partition the domain of the function without taking into account the shape of the function, thus slicing up the area under the function vertically. Lebesgue's approach is exactly the opposite: the domain is partitioned according to the values of the function at hand, leading to a horizontal decomposition of the area.

[Figure: two panels, 'Lebesgue' and '(equidistant) Riemann', showing the horizontal resp. vertical decomposition of the area under a function.]

There is a beautifully simple connection with Lebesgue integrals which characterizes at the same time the class of Riemann integrable functions. It may come as a surprise that one needs the notion of Lebesgue null sets to understand Riemann's integral completely.

11.8 Theorem. Let $u:[a,b]\to\mathbb R$ be a measurable function.
(i) If $u$ is Riemann integrable, then $u$ is in $\mathcal L^1(\lambda)$ and the Lebesgue and Riemann integrals coincide: $\int_{[a,b]}u\,d\lambda=(R)\int_a^b u(x)\,dx$.
(ii) A bounded function $f:[a,b]\to\mathbb R$ is Riemann integrable if, and only if, the points in $(a,b)$ where $f$ is discontinuous are a Lebesgue null set.

Caution: Theorem 11.8(ii) is often phrased in the following way: $f$ is Riemann integrable if, and only if, $f$ is (Lebesgue) a.e. continuous. Although correct, this is a dangerous way of putting things since one is led to read this statement (incorrectly) as 'if $f=\phi$ a.e. with $\phi\in C[a,b]$, then $f$ is Riemann integrable'. That this is wrong is easily seen from $f=1_{\mathbb Q\cap[a,b]}$ and $\phi\equiv 0$; see Problems 10.14 and 11.16.
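Proof (of Theorem 11.8). (i) As $u$ is Riemann integrable, we find a sequence of partitions $\pi(j)$ of $[a,b]$ such that
$$\lim_{j\to\infty}S_{\pi(j)}[u]=\int_* u=\int^* u=\lim_{j\to\infty}S^{\pi(j)}[u].$$
Without loss of generality we may assume that the partitions are nested, $\pi(j)\subset\pi(j+1)\subset\dots$; otherwise we could switch to the increasing sequence $\pi(1)\cup\dots\cup\pi(j)$ of partitions, where we also observe that the lower [upper] Riemann sums increase [decrease] as the partitions refine. The corresponding simple functions $\sigma_{\pi(j)}[u]$ and $\Sigma^{\pi(j)}[u]$ increase and decrease towards
$$\sigma[u]:=\sup_{j\in\mathbb N}\sigma_{\pi(j)}[u]\le u\le\inf_{j\in\mathbb N}\Sigma^{\pi(j)}[u]=:\Sigma[u],$$
and from the monotone convergence theorem 11.1 we conclude
$$\int_* u=\lim_{j\to\infty}S_{\pi(j)}[u]=\lim_{j\to\infty}\int_{[a,b]}\sigma_{\pi(j)}[u]\,d\lambda=\int_{[a,b]}\sigma[u]\,d\lambda \qquad (11.6)$$
and also
$$\int^* u=\lim_{j\to\infty}S^{\pi(j)}[u]=\lim_{j\to\infty}\int_{[a,b]}\Sigma^{\pi(j)}[u]\,d\lambda=\int_{[a,b]}\Sigma[u]\,d\lambda. \qquad (11.7)$$
In other words $\sigma[u],\Sigma[u]\in\mathcal L^1(\lambda)$. Since $u$ is Riemann integrable,
$$\int_{[a,b]}\underbrace{\big(\Sigma[u]-\sigma[u]\big)}_{\ge 0}\,d\lambda=\int_{[a,b]}\Sigma[u]\,d\lambda-\int_{[a,b]}\sigma[u]\,d\lambda=\int^* u-\int_* u=0,$$
which implies by Theorem 10.9(i) that $\Sigma[u]=\sigma[u]$ Lebesgue a.e. Thus
$$\{u\ne\Sigma[u]\}\cup\{u\ne\sigma[u]\}\subset\{\Sigma[u]\ne\sigma[u]\}\in\mathcal N_\lambda \qquad (11.8)$$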
and by Corollary 10.10(ii) we conclude that u is Lebesgue integrable. (ii) We continue to use the notation from part (i). The set = j∈ j of all partition points is countable, and by Problem 6.5(i),(iii) a Lebesgue null set. If f is Riemann integrable, we can find for > 0 and each x ∈ a b some nx ∈ such that for some suitable tj0 −1 tj0 ∈ nx we have x ∈ tj0 −1 tj0 and
j f x − f x + j f x − f x ∀ j nx By construction of the Riemann integral, all x y ∈ tj0 −1 tj0 satisfy fx − fy Mj0 − mj0 = nx f x − nx f x + f x − f x
This inequality shows on the one hand that[] x fx is not continuous ⊂ ∪ f = f ∈ ∈ by (i)
is a null set if f is Lebesgue integrable. On the other hand, the above inequality shows also[] that f = f ⊂ x fx is continuous ∪ ∗ so that (11.6), (11.7) imply f = ∗ f , i.e. f is Riemann integrable. ∗
∗
∗
Let us finally discuss improper Riemann integrals of the type a ux dx = lim R ux dx R a→
0
(11.9)
0
provided the limit exists (cf. Appendix E for other types of improper integrals). 11.9 Corollary Let u 0 → be a measurable function which is Riemann integrable for every interval 0 N, N ∈ . Then u ∈ 1 0 if, and only if, N lim R ux dx < (11.10) N → 0 In this case, R 0 ux dx = 0 u d. Proof Using Theorem 11.8 we see that Riemann integrability of u implies Riemann integrability of u± .[] Moreover, N R u± x dx = u± x dx = u± 10N d (11.11) 0
0N
If u is Riemann integrable and satisfies (11.9) and (11.10), the limit N → of the left side of (11.11) exists and guarantees that the right-hand side has also a finite limit. The monotone convergence theorem 11.1 together with Theorem 10.3(ii) shows that u ∈ 1 0 . Conversely, if u is Lebesgue integrable, then so are u± u 10a and u± 10a for every a > 0. Since u is Riemann integrable over each interval [0 N ], we see from Theorem 11.8 that u and u± are Riemann integrable over each interval 0 a. The monotone convergence theorem 11.1 shows that for every increasing sequence aj ↑ lim u± 10aj d = u± d < j→
which yields that the limits (11.9), (11.10) exist.
Measures, Integrals and Martingales
97
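11.10 Remark. We can avoid in T11.8(i) and C11.9 the assumption that $u$ is Borel measurable. If we admit an arbitrary $u$, our proofs show that $u$ is, outside a subset of a null set, equal to the Borel measurable function $\sigma[u]$; to wit: $\{u\ne\sigma[u]\}\subset\{\sigma[u]\ne\Sigma[u]\}\in\mathcal N_\lambda$, but $\{u\ne\sigma[u]\}$ is not necessarily measurable. In other words, $u$ becomes automatically measurable w.r.t. the completed Borel $\sigma$-algebra. This entails, of course, that we have to replace $\lambda$ and $\mathcal L^1(\lambda)$ with the completed versions $\bar\lambda$ and $\mathcal L^1(\bar\lambda)$, see Problems 4.13, 6.2, 10.11 and 10.12.

11.11 Remark. Lebesgue integration does not allow cancellations, but improper Riemann integrals do. More precisely: the limit (11.9) can make sense even if $\lim_{N\to\infty}(R)\int_0^N|u(x)|\,dx=\infty$. This is illustrated by the following example, which is typical in the theory of Fourier series:

The function $x\mapsto s(x):=\frac{\sin x}{x}$, $x\in(0,\infty)$, is improperly Riemann integrable but not Lebesgue integrable.

For $a>0$ we can find $N=N(a)\in\mathbb N$ such that $N\pi\le a<(N+1)\pi$. Thus
$$\lim_{a\to\infty}\int_0^a\frac{\sin x}{x}\,dx=\lim_{a\to\infty}\Big(\int_0^{N\pi}\frac{\sin x}{x}\,dx+\int_{N\pi}^a\frac{\sin x}{x}\,dx\Big)=\lim_{N\to\infty}\sum_{j=0}^{N-1}\underbrace{\int_{j\pi}^{(j+1)\pi}\frac{\sin x}{x}\,dx}_{=:\,a_j},$$
where we used
$$\lim_{a\to\infty}\Big|\int_{N\pi}^a\frac{\sin x}{x}\,dx\Big|\le\lim_{N\to\infty}\int_{N\pi}^{(N+1)\pi}\frac{|\sin x|}{x}\,dx\le\lim_{N\to\infty}\frac1N=0.$$
Observe that the $a_j$ have alternating signs since
$$a_j=\int_{j\pi}^{(j+1)\pi}\frac{\sin x}{x}\,dx=\int_0^\pi\frac{\sin(y+j\pi)}{y+j\pi}\,dy=(-1)^j\int_0^\pi\frac{\sin y}{y+j\pi}\,dy,$$
both as Riemann and Lebesgue integrals, by Theorem 11.8. Further, since $y+j\pi\ge(j+1)y$ for $0<y\le\pi$,
$$|a_j|=\int_0^\pi\frac{\sin y}{y+j\pi}\,dy\le\int_0^\pi\frac{\sin y}{(j+1)\,y}\,dy=\frac1{j+1}\int_0^\pi\frac{\sin y}{y}\,dy,$$
and also
$$|a_j|=\int_0^\pi\frac{\sin y}{y+j\pi}\,dy\ge\int_0^\pi\frac{\sin y}{y+(j+1)\pi}\,dy=|a_{j+1}|\ge\int_0^\pi\frac{\sin y}{\pi+(j+1)\pi}\,dy=\frac{2}{(j+2)\pi}.$$
Since the function $y\mapsto\frac{\sin y}{y}$ is continuous and has a finite limit as $y\downarrow 0$[], we see that $C:=\int_0^\pi\frac{\sin y}{y}\,dy<\infty$, so that
$$\frac{2/\pi}{j+2}\le|a_{j+1}|\le|a_j|\le\frac{C}{j+1}.$$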
This and Leibniz's convergence test prove that the alternating series $\sum_{j=0}^\infty a_j$ converges conditionally but not absolutely, i.e. we get a finite improper Riemann integral, but the Lebesgue integral does not exist.

Examples

As we have seen in this chapter, the Lebesgue integral provides very powerful tools that justify the interchange of limits and integrals. On the other hand, the Riemann theory is quite handy when it comes to calculating the primitive (anti-derivative) of some concrete integrand. Theorem 11.8 tells us when we can switch between these two notions.

11.12 Example. Let $f_\alpha(x):=x^\alpha$, $x>0$ and $\alpha\in\mathbb R$. Then
$$f_\alpha\in\mathcal L^1(0,1)\iff\alpha>-1,\qquad f_\alpha\in\mathcal L^1[1,\infty)\iff\alpha<-1.$$
We show only the first assertion; the second follows similarly (or, indeed, from C11.9). Since $f_\alpha$ is continuous, it is Borel measurable, and since $f_\alpha\ge 0$ it is enough to show that $\int_{(0,1)}f_\alpha\,d\lambda<\infty$. For $\alpha\ne -1$ we find
$$\int_{(0,1)}x^\alpha\,\lambda(dx)\overset{9.6}{=}\lim_{j\to\infty}\int x^\alpha\,1_{[1/j,1)}(x)\,\lambda(dx)\overset{11.8}{=}\lim_{j\to\infty}(R)\int_{1/j}^1 x^\alpha\,dx=\lim_{j\to\infty}\Big[\frac{x^{\alpha+1}}{\alpha+1}\Big]_{1/j}^1=\lim_{j\to\infty}\frac{1}{\alpha+1}\Big(1-\frac{1}{j^{\alpha+1}}\Big),$$
and the last limit is finite if, and only if, $\alpha>-1$. (For $\alpha=-1$ the primitive is $\ln x$ and the corresponding limit $\lim_{j\to\infty}\ln j$ is infinite.)

11.13 Example. The function $f(x):=x^\alpha e^{-\beta x}$, $x>0$, is Lebesgue integrable over $(0,\infty)$ for all $\alpha>-1$ and $\beta>0$.
Measurability of $f$ follows from its continuity. Using the exponential series, we find for all $N\in\mathbb N$ and $x>0$
$$\frac{(\beta x)^N}{N!}\le\sum_{j=0}^\infty\frac{(\beta x)^j}{j!}=e^{\beta x}\implies e^{-\beta x}\le\frac{N!}{\beta^N}\,x^{-N}.$$
As $e^{-\beta x}\le 1$ for $x>0$, we obtain the following majorization:
$$f(x)=x^\alpha e^{-\beta x}\le\underbrace{x^\alpha\,1_{(0,1)}(x)}_{\in\mathcal L^1(0,1)\text{ if }\alpha>-1\text{ by Example 11.12}}+\underbrace{\frac{N!}{\beta^N}\,x^{\alpha-N}\,1_{[1,\infty)}(x)}_{\in\mathcal L^1[1,\infty)\text{ if }\alpha-N<-1\text{ by Example 11.12}} \qquad (11.12)$$
and $f\in\mathcal L^1(0,\infty)$ follows from T10.3(iv).

11.14 Example (Euler's Gamma function). The parameter-dependent integral
$$\Gamma(t):=\int_{(0,\infty)}x^{t-1}e^{-x}\,\lambda(dx),\qquad t>0, \qquad (11.13)$$
is called the Gamma function. It has the following properties:
(i) $\Gamma$ is continuous;
(ii) $\Gamma$ is arbitrarily often differentiable (see Problem 11.13(i));
(iii) $t\,\Gamma(t)=\Gamma(t+1)$, in particular $\Gamma(n+1)=n!$ (see Problem 11.13(ii));
(iv) $\ln\Gamma(t)$ is convex (see Problem 11.13(iii)).

Example 11.13 shows that the Gamma function is well-defined for all $t>0$. We prove (i) and (ii) first for every interval $(a,b)$ where $0<a<b<\infty$. Since both continuity and differentiability are local properties, i.e. they need to be checked locally at each point, (i) and (ii) follow for the half-line if we let $a\to 0$ and $b\to\infty$.

(i) We apply the continuity lemma T11.4. Set $u(t,x):=x^{t-1}e^{-x}$. We have already seen in Example 11.13 that $u(t,\bullet)\in\mathcal L^1(0,\infty)$ for all $t>0$; the continuity of $u(\bullet,x)$ is clear and all that remains is to find a uniform (for $t\in(a,b)$) dominating function. An argument similar to (11.12) gives for $N>b+1$
$$x^{t-1}e^{-x}\le x^{t-1}\,1_{(0,1)}(x)+N!\,x^{t-1-N}\,1_{[1,\infty)}(x)\le x^{a-1}\,1_{(0,1)}(x)+N!\,x^{b-1-N}\,1_{[1,\infty)}(x)\le x^{a-1}\,1_{(0,1)}(x)+N!\,x^{-2}\,1_{[1,\infty)}(x).$$
The expression on the right no longer depends on $t$, and is integrable according to Example 11.12. (Note that $N=N(b)$ depends on the fixed interval $(a,b)$, but not on $t$.) This shows that $\Gamma(t)=\int_{(0,\infty)}u(t,x)\,\lambda(dx)$ is continuous for all $t\in(a,b)$.
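(ii) We apply the differentiability lemma T11.5. The integrand $u(t,\bullet)$ is integrable, and $u(\bullet,x)$ is differentiable for fixed $x>0$. In fact,
$$\partial_t u(t,x)=\partial_t\,x^{t-1}e^{-x}=x^{t-1}e^{-x}\ln x.$$
We still have to show that $\partial_t u(t,x)$ has an integrable majorant uniformly for all $t\in(a,b)$. First we observe that $\ln x\le x$, thus
$$|\partial_t u(t,x)|\le x^t e^{-x}\le x^b e^{-x}\qquad\forall a<t<b,\ x\ge 1.$$
For $0<x<1$ we use $|\ln x|=\ln\frac1x$, so that
$$|\partial_t u(t,x)|=x^{t-1}e^{-x}\ln\frac1x\le x^{a-1}e^{-x}\ln\frac1x\qquad\forall a<t<b,\ 0<x<1,$$
and since $a>0$, we find some $\epsilon>0$ with $a-\epsilon-1>-1$, so that
$$|\partial_t u(t,x)|\le x^{a-1-\epsilon}e^{-x}\underbrace{x^\epsilon\ln\frac1x}_{\to 0\text{ as}^{3}\ x\to 0}\le C_\epsilon\,x^{a-1-\epsilon}e^{-x}\qquad\forall a<t<b,\ 0<x<1.$$
Combining these calculations, we arrive at
$$|\partial_t u(t,x)|\le C_\epsilon\,x^{a-1-\epsilon}e^{-x}\,1_{(0,1)}(x)+x^b e^{-x}\,1_{[1,\infty)}(x)\qquad\forall a<t<b,$$
which is an integrable majorant (by Examples 11.12, 11.13) independent of $t\in(a,b)$. This shows that $\Gamma(t)$ is differentiable on $(a,b)$, with derivative
$$\Gamma'(t)=\int_{(0,\infty)}x^{t-1}e^{-x}\ln x\,\lambda(dx),\qquad t\in(a,b).$$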
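The formula for $\Gamma'(t)$ can be compared numerically with a difference quotient of the Gamma function itself. The sketch below is only an illustration under our own choices: the integral is truncated at `upper=60` and approximated by a midpoint rule, the derivative of `math.gamma` is approximated by a central difference, and the helper names are hypothetical.

```python
import math

def gamma_prime_integral(t, upper=60.0, n=200_000):
    """Midpoint-rule approximation of the integral of x^(t-1) e^(-x) ln(x)
    over (0, upper); the tail beyond `upper` is negligible for moderate t."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x ** (t - 1) * math.exp(-x) * math.log(x)
    return total * h

def gamma_prime_difference(t, h=1e-5):
    """Central difference quotient of math.gamma at t."""
    return (math.gamma(t + h) - math.gamma(t - h)) / (2 * h)

t = 2.5
print(gamma_prime_integral(t), gamma_prime_difference(t))
```

Both numbers agree to several digits, which is what interchanging $\partial_t$ and the integral predicts.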
A similar calculation proves that $\Gamma^{(n)}$ exists for every $n\in\mathbb N$; see Problem 11.13.

3 To see this, use $\lim_{x\to 0}x^\epsilon\ln\frac1x\overset{x=\exp(-t)}{=}\lim_{t\to\infty}e^{-\epsilon t}\,t=0$ if $\epsilon>0$.

Problems

11.1. Adapt the proof of Theorem 11.2 to show that any sequence $(u_j)_{j\in\mathbb N}\subset\mathcal M(\mathcal A)$ with $\lim_{j\to\infty}u_j(x)=u(x)$ and $|u_j|\le g$ for some $g$ with $g^p\in\mathcal L^1_+(\mu)$ satisfies
$$\lim_{j\to\infty}\int|u_j-u|^p\,d\mu=0.$$
[Hint: mimic the proof of 11.2 using $|u_j-u|^p\le(|u_j|+|u|)^p\le 2^p g^p$.]

11.2. Give an alternative proof of Lebesgue's dominated convergence theorem 11.2(ii) using the generalized Fatou theorem from Problem 10.8.
11.3. Prove the following result of W. H. Young [56]; among statisticians it is also known as Pratt's lemma, cf. J. W. Pratt [36].
Theorem (Young; Pratt): Let $(f_k)_k$, $(g_k)_k$ and $(G_k)_k$ be sequences of integrable functions on a measure space $(X,\mathcal A,\mu)$. If
(i) $f_k(x)\to f(x)$, $g_k(x)\to g(x)$, $G_k(x)\to G(x)$ as $k\to\infty$ for all $x\in X$,
(ii) $g_k(x)\le f_k(x)\le G_k(x)$ for all $k\in\mathbb N$ and all $x\in X$,
(iii) $\int g_k\,d\mu\to\int g\,d\mu$ and $\int G_k\,d\mu\to\int G\,d\mu$ as $k\to\infty$, with $\int g\,d\mu$ and $\int G\,d\mu$ finite,
then $\lim_{k\to\infty}\int f_k\,d\mu=\int f\,d\mu$ and $\int f\,d\mu$ is finite.
Explain why this generalizes Lebesgue's dominated convergence theorem 11.2(ii).

11.4. Let $(u_j)_{j\in\mathbb N}$ be a sequence of integrable functions on $(X,\mathcal A,\mu)$. Show that, if $\sum_{j=1}^\infty\int|u_j|\,d\mu<\infty$, the series $\sum_{j=1}^\infty u_j$ converges a.e. to a real-valued function $u(x)$, and that in this case
$$\int\sum_{j=1}^\infty u_j\,d\mu=\sum_{j=1}^\infty\int u_j\,d\mu.$$
[Hint: use C9.9 to see that the series $\sum_j u_j$ converges absolutely for almost all $x\in X$. The rest is then dominated convergence.]

11.5. Let $(u_j)_{j\in\mathbb N}$ be a sequence of positive integrable functions on a measure space $(X,\mathcal A,\mu)$. Assume that the sequence decreases to 0: $u_1\ge u_2\ge u_3\ge\dots$ and $u_j\downarrow 0$ as $j\to\infty$. Show that $\sum_{j=1}^\infty(-1)^j u_j$ converges, is integrable and that the integral is given by
$$\int\sum_{j=1}^\infty(-1)^j u_j\,d\mu=\sum_{j=1}^\infty(-1)^j\int u_j\,d\mu.$$
[Hint: mimic the proof of the Leibniz test for alternating series.]

11.6. Give an example of a sequence of integrable functions $(u_j)_{j\in\mathbb N}$ with $u_j(x)\to u(x)$ as $j\to\infty$ for all $x$ and an integrable function $u$, but such that $\lim_{j\to\infty}\int u_j\,d\mu\ne\int u\,d\mu$. Does this contradict Lebesgue's dominated convergence theorem 11.2?

11.7. Let $\lambda$ be one-dimensional Lebesgue measure. Show that for every integrable function $u$, the integral function
$$x\mapsto\int_{(0,x)}u(t)\,\lambda(dt),\qquad x>0,$$
is continuous. What happens if we exchange $\lambda$ for a general measure $\mu$?

11.8. Consider the functions
(i) $u(x)=\frac1x$, $x\in[1,\infty)$;
(ii) $v(x)=\frac1{x^2}$, $x\in[1,\infty)$;
(iii) $w(x)=\frac1{\sqrt x}$, $x\in(0,1]$;
(iv) $y(x)=\frac1x$, $x\in(0,1]$;
and check whether they are Lebesgue integrable in the regions given. What would happen if we consider $[\frac12,2]$ instead?
[Hint: consider first $u_k=u\,1_{[1,k]}$, resp., $w_k=w\,1_{[1/k,1]}$, etc. and use monotone convergence and the fact that Riemann and Lebesgue integrals coincide if both exist.]

11.9. Show that the function $\mathbb R\ni x\mapsto\exp(-x^\alpha)$ is $\lambda^1(dx)$-integrable over the set $[0,\infty)$ for every $\alpha>0$.
[Hint: find dominating integrable functions $u$ resp. $w$ if $0\le x\le 1$ resp. $1<x<\infty$ and glue them together by $u\,1_{[0,1]}+w\,1_{(1,\infty)}$ to get an overall integrable upper bound.]

11.10. Show that for every parameter $\alpha>0$ the function $x\mapsto\big(\frac{\sin x}{x}\big)^3 e^{-\alpha x}$ is integrable over $(0,\infty)$ and continuous as a function of the parameter.
[Hint: find piecewise dominating integrable functions like in Problem 11.9; use the continuity lemma 11.4.]

11.11. Show that the function
$$G:\mathbb R\to\mathbb R,\qquad G(x):=\int_{\mathbb R\setminus\{0\}}\frac{\sin(tx)}{t(1+t^2)}\,dt$$
is differentiable and find $G(0)$ and $G'(0)$. Use a limit argument, integration by parts for $\int_{(-n,n)}\dots\,dt$ and the formula $t\,\partial_t\sin(tx)=x\,\partial_x\sin(tx)$ to show that
$$x\,G'(x)=\int_{\mathbb R}\frac{2t\sin(tx)}{(1+t^2)^2}\,dt.$$

11.12. Denote by $\lambda$ one-dimensional Lebesgue measure. Prove that
(i) $\int_{(1,\infty)}e^{-x}\ln(x)\,\lambda(dx)=\lim_{k\to\infty}\int_{(1,k)}\big(1-\frac xk\big)^k\ln(x)\,\lambda(dx)$.
(ii) $\int_{(0,1)}e^{-x}\ln(x)\,\lambda(dx)=\lim_{k\to\infty}\int_{(0,1)}\big(1-\frac xk\big)^k\ln(x)\,\lambda(dx)$.

11.13. Euler's Gamma function. Show that the function
$$\Gamma(t):=\int_{(0,\infty)}e^{-x}x^{t-1}\,dx,\qquad t>0,$$
(i) is $m$-times differentiable with $\Gamma^{(m)}(t)=\int_{(0,\infty)}e^{-x}x^{t-1}(\log x)^m\,dx$.
[Hint: take $t\in(a,b)$, use induction in $m$. Note that $|e^{-x}x^{t-1}(\log x)^m|\le x^{m+t-1}e^{-x}\le M x^{-2}$ for $x\ge 1$, and $\le M' x^{\epsilon-1}$ for $x<1$ and some $\epsilon>0$ because $\lim_{x\to 0}x^{a-\epsilon}|\log x|^m=0$; use, e.g. the substitution $x=e^{-y}$.]
(ii) satisfies $\Gamma(t+1)=t\,\Gamma(t)$.
[Hint: use integration by parts for $\int_{(1/n,n)}\dots\,dt$ and let $n\to\infty$.]
(iii) is logarithmically convex, i.e. $t\mapsto\ln\Gamma(t)$ is convex.
[Hint: calculate $(\ln\Gamma(t))''$ and show that this is positive.]

11.14. Show that $x\mapsto x^n f(u,x)$, $f(u,x)=e^{ux}/(e^x+1)$, $0<u<1$, is integrable over $\mathbb R$ and that $g(u):=\int x^n f(u,x)\,dx$, $0<u<1$, is arbitrarily often differentiable.

11.15. Moment generating function. Let $X$ be a random variable on the probability space $(\Omega,\mathcal A,P)$. The function $\phi_X(t):=\int e^{-tX}\,dP$ is called the moment generating function. Show that $\phi_X$ is $m$-times differentiable at $t=0+$ if the absolute $m$th moment $\int|X|^m\,dP$ exists. If this is the case, the following formulae hold:
(i) $\int X^k\,dP=(-1)^k\frac{d^k}{dt^k}\phi_X(t)\Big|_{t=0+}$ for all $0\le k\le m$.
(ii) $\phi_X(t)=\sum_{k=0}^m\frac{\int X^k\,dP}{k!}(-1)^k t^k+o(t^m)$. ($f(t)=o(t^m)$ means that $\lim_{t\to 0}f(t)/t^m=0$.)
(iii) $\Big|\phi_X(t)-\sum_{k=0}^{m-1}\frac{\int X^k\,dP}{k!}(-1)^k t^k\Big|\le\frac{|t|^m}{m!}\int|X|^m\,dP$.
(iv) If $\int|X|^k\,dP<\infty$ for all $k\in\mathbb N$, then
$$\phi_X(t)=\sum_{k=0}^\infty\frac{\int X^k\,dP}{k!}(-1)^k t^k$$
for all $t$ within the convergence radius of the series.

11.16. Consider the functions $u(x)=1_{\mathbb Q\cap[0,1]}(x)$ and $v(x)=1_{\{n^{-1} : n\in\mathbb N\}}(x)$. Prove or disprove:
(i) The function $u$ is 1 on the rationals and 0 otherwise. Thus $u$ is continuous everywhere except the set $\mathbb Q\cap[0,1]$. Since this is a null set, $u$ is a.e. continuous, hence Riemann integrable by Theorem 11.8.
(ii) The function $v$ is 0 everywhere but for the values $x=1/n$, $n\in\mathbb N$. Thus $v$ is continuous everywhere except a countable set, i.e. a null set, and $v$ is a.e. continuous, hence Riemann integrable by Theorem 11.8.
(iii) The functions $u$ and $v$ are Lebesgue integrable and $\int u\,d\lambda=\int v\,d\lambda=0$.
(iv) The function $u$ is not Riemann integrable.

11.17. Construct a sequence of functions $(u_j)_{j\in\mathbb N}$ which are Riemann integrable but converge to a limit $u_j\to u$ (as $j\to\infty$) which is not Riemann integrable.
[Hint: consider, e.g. $u_j=1_{\{q_1,q_2,\dots,q_j\}}$ where $(q_j)_j$ is an enumeration of $\mathbb Q$.]

11.18. Assume that $u:[0,\infty)\to\mathbb R$ is positive and improperly Riemann integrable. Show that $u$ is also Lebesgue integrable.

11.19. Fresnel integrals. Show that the following improper Riemann integrals exist:
$$\int_0^\infty\sin x^2\,dx\quad\text{and}\quad\int_0^\infty\cos x^2\,dx.$$
Do they exist as Lebesgue integrals?
Remark. The above integrals have the value $\frac12\sqrt{\frac\pi2}$. This can be proved by Cauchy's theorem or the residue theorem.

11.20. Frullani's integral. Let $f:[0,\infty)\to\mathbb R$ be a continuous function such that $\lim_{x\to 0}f(x)=m$ and $\lim_{x\to\infty}f(x)=M$. Show that the two-sided improper Riemann integral
$$\lim_{\substack{r\to 0\\ s\to\infty}}\int_r^s\frac{f(bx)-f(ax)}{x}\,dx=(M-m)\ln\frac ba$$
exists for all $a,b>0$. Does this integral have a meaning as Lebesgue integral?
[Hint: use the mean value theorem for integrals, E.12.]

11.21. Denote by $\lambda$ one-dimensional Lebesgue measure on the interval $(0,1)$.
(i) Show that for all $k\in\mathbb N_0$ one has
$$\int_{(0,1)}(x\ln x)^k\,\lambda(dx)=(-1)^k\Big(\frac1{k+1}\Big)^{k+1}\Gamma(k+1).$$
(ii) Use (i) to conclude that
$$\int_{(0,1)}x^{-x}\,\lambda(dx)=\sum_{k=1}^\infty k^{-k}.$$
[Hint: note that $x^{-x}=e^{-x\ln x}$ and use the exponential series.]
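Before attempting Problem 11.21(ii) it can be reassuring to see the identity numerically. The following sketch is no substitute for the proof: it compares a midpoint-rule approximation of $\int_{(0,1)}x^{-x}\,dx$ with a partial sum of $\sum_k k^{-k}$; the grid size, the number of terms and the function names are our own choices.

```python
import math

def integral_x_to_minus_x(n=100_000):
    """Midpoint-rule approximation of the integral of x^(-x) over (0, 1),
    using x^(-x) = exp(-x ln x)."""
    h = 1.0 / n
    return h * sum(math.exp(-x * math.log(x))
                   for x in ((i + 0.5) * h for i in range(n)))

def series_k_to_minus_k(terms=30):
    """Partial sum of k^(-k), k = 1, ..., terms; the terms decay super-exponentially."""
    return sum(float(k) ** (-k) for k in range(1, terms + 1))

print(integral_x_to_minus_x(), series_k_to_minus_k())
```

The two values agree to many digits, in line with the claimed identity.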
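12 The function spaces $\mathcal L^p$, $1\le p\le\infty$

Throughout this chapter $(X,\mathcal A,\mu)$ will be some measure space. We will now discuss functions whose (absolute) $p$th power or $p$th (absolute) moment is integrable. More precisely, we are interested in the sets
$$\mathcal L^p(\mu):=\Big\{u:X\to\mathbb R : u\in\mathcal M(\mathcal A),\ \int|u|^p\,d\mu<\infty\Big\},\qquad p\in[1,\infty). \qquad (12.1)$$
As usual, we suppress $\mu$ if the choice of measure is clear, and we write $\mathcal L^p(X)$ or $\mathcal L^p(\mathcal A)$ if we want to stress the underlying space or $\sigma$-algebra. It is convenient to have the following notation:
$$\|u\|_p:=\Big(\int|u(x)|^p\,\mu(dx)\Big)^{1/p}. \qquad (12.2)$$
Clearly, $u\in\mathcal L^p(\mu)\iff u\in\mathcal M(\mathcal A)$ and $\|u\|_p<\infty$. It is no accident that the notation $\|\bullet\|_p$ resembles the symbol for a norm:$^1$ indeed, we have because of T10.9(i)
$$\|u\|_p=0\iff u=0\ \text{a.e.}, \qquad (12.3)$$
and for all $\alpha\in\mathbb R$
$$\|\alpha u\|_p=\Big(\int|\alpha u|^p\,d\mu\Big)^{1/p}=\Big(|\alpha|^p\int|u|^p\,d\mu\Big)^{1/p}=|\alpha|\cdot\|u\|_p. \qquad (12.4)$$
The triangle inequality for $\|\bullet\|_p$ and deeper results on $\mathcal L^p$ depend much on the following elementary inequality.

12.1 Lemma (Young's inequality). Let $p,q\in(1,\infty)$ be conjugate numbers, i.e. $\frac1p+\frac1q=1$ or $q=\frac{p}{p-1}$. Then
$$AB\le\frac{A^p}{p}+\frac{B^q}{q} \qquad (12.5)$$
holds for all $A,B\ge 0$; equality occurs if, and only if, $B=A^{p-1}$.

1 See Appendix B.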
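Proof. There are various different methods to prove (12.5) but probably the most intuitive one is through the following picture:

[Figure: graph of $\eta=\xi^{p-1}$, equivalently $\xi=\eta^{q-1}$, in the $(\xi,\eta)$-plane; $S_1$ is the region between the graph and the $\xi$-axis for $0\le\xi\le A$, $S_2$ the region between the graph and the $\eta$-axis for $0\le\eta\le B$.]

The shaded area representing the pieces $S_1$ and $S_2$ between the graph and the $\xi$- resp. $\eta$-axis is given by
$$\int_0^A\xi^{p-1}\,d\xi=\frac{A^p}{p}\quad\text{and}\quad\int_0^B\eta^{q-1}\,d\eta=\frac{B^q}{q},$$
respectively. The picture shows that their combined area is greater than the area of the darker rectangle, thus $\frac{A^p}{p}+\frac{B^q}{q}\ge AB$. Equality obtains if, and only if, the lighter shaded area vanishes, i.e. if $B=A^{p-1}$.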
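Young's inequality (12.5) and its equality case are easy to probe numerically. The snippet below is a small sketch; the function name `young_gap` and the sample values of $A$, $B$, $p$ are arbitrary choices of ours.

```python
def young_gap(A, B, p):
    """A^p/p + B^q/q - A*B for the conjugate exponent q = p/(p-1).
    Young's inequality says this gap is >= 0, with 0 exactly when B = A^(p-1)."""
    q = p / (p - 1)
    return A ** p / p + B ** q / q - A * B

p = 3.0
samples = [(0.5, 2.0), (1.0, 1.0), (2.0, 0.3), (1.5, 1.5 ** (p - 1))]
for A, B in samples:
    print(A, B, young_gap(A, B, p))
```

The pairs with $B=A^{p-1}$ give a gap of zero up to rounding, the others a strictly positive gap.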
We can now prove the following fundamental inequality.

12.2 Theorem (Hölder's inequality). Assume that $u\in\mathcal L^p(\mu)$ and $v\in\mathcal L^q(\mu)$ where $p,q\in(1,\infty)$ are conjugate numbers: $\frac1p+\frac1q=1$. Then $uv\in\mathcal L^1(\mu)$, and the following inequality holds:
$$\Big|\int uv\,d\mu\Big|\le\int|uv|\,d\mu\le\|u\|_p\cdot\|v\|_q. \qquad (12.6)$$
Equality occurs if, and only if, $|u(x)|^p/\|u\|_p^p=|v(x)|^q/\|v\|_q^q$ a.e.

Proof. The first inequality of (12.6) follows directly from T10.4(v). For the other inequality we may assume that $\|u\|_p\,\|v\|_q>0$, since otherwise $uv=0$ a.e. by (12.3) and there is nothing to show. We use (12.5) with
$$A:=\frac{|u(x)|}{\|u\|_p}\quad\text{and}\quad B:=\frac{|v(x)|}{\|v\|_q}$$
to get
$$\frac{|u(x)v(x)|}{\|u\|_p\,\|v\|_q}\le\frac{|u(x)|^p}{p\,\|u\|_p^p}+\frac{|v(x)|^q}{q\,\|v\|_q^q}.$$
Integrating both sides of this inequality over $x$ yields
$$\frac{\int|u(x)v(x)|\,\mu(dx)}{\|u\|_p\,\|v\|_q}\le\frac{\|u\|_p^p}{p\,\|u\|_p^p}+\frac{\|v\|_q^q}{q\,\|v\|_q^q}=\frac1p+\frac1q=1.$$
Equality can only happen if we have equality in (12.5). Because of our choice of $A$ and $B$, the condition for equality from L12.1 becomes $|v(x)|/\|v\|_q=\big(|u(x)|/\|u\|_p\big)^{p-1}$ a.e. Raising both sides to the $q$th power gives $|v(x)|^q/\|v\|_q^q=|u(x)|^p/\|u\|_p^p$ since $(p-1)q=p$.

Hölder's inequality with $p=q=2$ is usually called the Cauchy–Schwarz inequality.

12.3 Corollary (Cauchy–Schwarz inequality). Let $u,v\in\mathcal L^2(\mu)$. Then $uv\in\mathcal L^1(\mu)$ and
$$\int|uv|\,d\mu\le\|u\|_2\cdot\|v\|_2. \qquad (12.7)$$
Equality occurs if, and only if, $|u(x)|^2/\|u\|_2^2=|v(x)|^2/\|v\|_2^2$ a.e.

Another consequence of Hölder's inequality is the Minkowski or triangle inequality for $\|\bullet\|_p$.

12.4 Corollary (Minkowski's inequality). Let $u,v\in\mathcal L^p(\mu)$, $p\in[1,\infty)$. Then $u+v\in\mathcal L^p(\mu)$ and
$$\|u+v\|_p\le\|u\|_p+\|v\|_p. \qquad (12.8)$$
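Proof. Since
$$|u+v|^p\le(|u|+|v|)^p\le 2^p\max\{|u|^p,|v|^p\}\le 2^p\big(|u|^p+|v|^p\big),$$
we get that $|u+v|^p\in\mathcal L^1(\mu)$ or $u+v\in\mathcal L^p(\mu)$. Now
$$\int|u+v|^p\,d\mu=\int|u+v|\cdot|u+v|^{p-1}\,d\mu\le\int|u|\cdot|u+v|^{p-1}\,d\mu+\int|v|\cdot|u+v|^{p-1}\,d\mu$$
(if $p=1$ the proof stops here)
$$\overset{12.2}{\le}\|u\|_p\cdot\big\||u+v|^{p-1}\big\|_q+\|v\|_p\cdot\big\||u+v|^{p-1}\big\|_q.$$
Dividing both sides by $\big\||u+v|^{p-1}\big\|_q$ proves our claim since
$$\big\||u+v|^{p-1}\big\|_q=\Big(\int|u+v|^{(p-1)q}\,d\mu\Big)^{1/q}=\Big(\int|u+v|^p\,d\mu\Big)^{1-1/p},$$
where we also used that $q=\frac{p}{p-1}$.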
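Both Hölder's inequality (12.6) and Minkowski's inequality (12.8) can be tried out on a finite measure space, where integrals are weighted sums. The sketch below assumes a hypothetical measure with four point masses; the weights, the two functions and the helper name `p_norm` are made up for illustration.

```python
def p_norm(u, weights, p):
    """(sum_i |u_i|^p * mu_i)^(1/p): the L^p semi-norm with respect to a
    measure that puts mass mu_i on the i-th point of a finite set."""
    return sum(abs(x) ** p * m for x, m in zip(u, weights)) ** (1.0 / p)

weights = [0.5, 1.0, 2.0, 0.25]   # hypothetical point masses
u = [1.0, -2.0, 0.5, 3.0]
v = [-0.5, 1.5, 2.0, -1.0]
p = 3.0
q = p / (p - 1)                   # conjugate exponent

# Hoelder: integral of |uv| is at most ||u||_p * ||v||_q
integral_abs_uv = sum(abs(a * b) * m for a, b, m in zip(u, v, weights))
holder_bound = p_norm(u, weights, p) * p_norm(v, weights, q)

# Minkowski: ||u + v||_p is at most ||u||_p + ||v||_p
sum_norm = p_norm([a + b for a, b in zip(u, v)], weights, p)
minkowski_bound = p_norm(u, weights, p) + p_norm(v, weights, p)

print(integral_abs_uv, holder_bound)
print(sum_norm, minkowski_bound)
```

In both printed pairs the first number does not exceed the second, as the two inequalities require.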
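12.5 Remarks (i) Formulae (12.4) and (12.8) imply
\[ u, v \in \mathcal{L}^p(\mu) \implies \alpha u + \beta v \in \mathcal{L}^p(\mu) \quad \forall\, \alpha, \beta \in \mathbb{R} \]
which shows that $\mathcal{L}^p(\mu)$ is a vector space.

(ii) Formulae (12.3), (12.4) and (12.8) show that $\|\bullet\|_p$ is a semi-norm for $\mathcal{L}^p(\mu)$: the definiteness of a norm is not fulfilled since $\|u\|_p = 0$ only implies that $u(x) = 0$ for almost every $x$ but not for all $x$. There is a standard recipe to fix this: since $\mathcal{L}^p$-functions can be altered on null sets without affecting their integration behaviour, we introduce the following equivalence relation: we call $u, v \in \mathcal{L}^p(\mu)$ equivalent if they differ on at most a $\mu$-null set, i.e.
\[ u \sim v \iff \{u \ne v\} \in \mathcal{N}_\mu. \]
The quotient space $L^p(\mu) := \mathcal{L}^p(\mu)/\!\sim$ consists of all equivalence classes of $\mathcal{L}^p$-functions. If $[u]_p \in L^p(\mu)$ denotes the equivalence class induced by the function $u \in \mathcal{L}^p(\mu)$, it is not hard to see that
\[ [\alpha u + \beta v]_p = \alpha [u]_p + \beta [v]_p \quad\text{and}\quad [uv]_1 = [u]_p\,[v]_q \]
hold, turning $L^p(\mu)$ into a bona fide vector space with the canonical norm
\[ \big\| [u]_p \big\|_p := \inf\{\|w\|_p : w \in \mathcal{L}^p(\mu),\ w \sim u\} \]
for quotient spaces. Fortunately, $\|[u]_p\|_p = \|u\|_p$ and later on we will often follow the usual abuse of notation and identify $[u]$ with $u$.

(iii) All results of this chapter are still valid for $\bar{\mathbb{R}}$-valued numerical functions. Indeed, if $f \in \mathcal{M}_{\bar{\mathbb{R}}}(\mathcal{A})$ and $\int |f|^p\,d\mu < \infty$, then
\[ \mu(\{|f| = \infty\}) = \mu(\{|f|^p = \infty\}) = \mu\Big(\bigcap_{j \in \mathbb{N}} \{|f|^p > j\}\Big) \overset{4.4}{=} \lim_{j\to\infty} \mu(\{|f|^p > j\}) \overset{10.12}{\le} \lim_{j\to\infty} \frac1j \int |f|^p\,d\mu = 0 \]
by the Markov inequality. This means, however, that $f$ is a.e. $\mathbb{R}$-valued, so sums and products of such functions are always defined outside a $\mu$-null set. In particular there is no need to distinguish between the classes $L^p = L^p_{\mathbb{R}}$ and $L^p_{\bar{\mathbb{R}}}$.

∗ ∗ ∗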
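We will need the concept of convergence of a sequence in the space $\mathcal{L}^p(\mu)$. A sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ is said to be convergent in $\mathcal{L}^p(\mu)$ with limit $\mathcal{L}^p\text{-}\lim_{j\to\infty} u_j = u$ if, and only if,
\[ \lim_{j\to\infty} \|u_j - u\|_p = 0. \]
Remember, however, that $\mathcal{L}^p$-limits are only almost everywhere unique. If $u, w$ are both $\mathcal{L}^p$-limits of the same sequence $(u_j)_{j\in\mathbb{N}}$, we have
\[ \|u - w\|_p \overset{12.4}{\le} \lim_{j\to\infty} \big( \|u - u_j\|_p + \|u_j - w\|_p \big) = 0 \]
implying only $u = w$ almost everywhere. We call $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ a ($\mathcal{L}^p$-) Cauchy sequence, if
\[ \forall\, \epsilon > 0 \quad \exists\, N_\epsilon \in \mathbb{N} \quad \forall\, j, k \ge N_\epsilon : \ \|u_j - u_k\|_p < \epsilon. \]
Note that these definitions reduce convergence in $\mathcal{L}^p$ to convergence questions of the semi-norm $\|\bullet\|_p$ in $\mathbb{R}_+$. This means that, apart from uniqueness, many formal properties of limits in $\mathbb{R}$ carry over to $\mathcal{L}^p$ – most of them even with the same proof!

Caution: Pointwise convergence of a sequence $u_j(x) \to u(x)$ of $\mathcal{L}^p$-functions $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ does not guarantee convergence in $\mathcal{L}^p$ – but in view of Lebesgue’s dominated convergence theorem 11.2, the additional condition that
\[ |u_j(x)| \le g(x) \quad\text{for some function } g \in \mathcal{L}^p_+(\mu) \]
is sufficient since $|u_j - u|^p \le (|u_j| + |u|)^p \le 2^p g^p$ and $|u_j(x) - u(x)| \to 0$.[✓]

Clearly, a convergent sequence $(u_j)_{j\in\mathbb{N}}$ is also a Cauchy sequence,
\[ \|u_j - u_k\|_p \le \|u_j - u\|_p + \|u - u_k\|_p < 2\epsilon \quad \forall\, j, k \ge N_\epsilon; \]
the converse of this assertion is also true, but much more difficult to prove. We start with a simple observation:

12.6 Lemma For any sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, of positive functions $u_j \ge 0$ we have
\[ \Big\| \sum_{j=1}^\infty u_j \Big\|_p \le \sum_{j=1}^\infty \|u_j\|_p. \tag{12.9} \]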
Proof Repeated applications of Minkowski’s inequality (12.8) show that
\[ \Big\| \sum_{j=1}^N u_j \Big\|_p \le \sum_{j=1}^N \|u_j\|_p \le \sum_{j=1}^\infty \|u_j\|_p \]
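and since the right-hand side is independent of $N$, the inequality remains valid even if we pass to the $\sup_{N\in\mathbb{N}}$ on the left. By Beppo Levi’s theorem 9.6, we find
\[ \sup_{N\in\mathbb{N}} \Big\| \sum_{j=1}^N u_j \Big\|_p^p = \sup_{N\in\mathbb{N}} \int \Big( \sum_{j=1}^N u_j \Big)^p d\mu = \int \sup_{N\in\mathbb{N}} \Big( \sum_{j=1}^N u_j \Big)^p d\mu = \int \Big( \sum_{j=1}^\infty u_j \Big)^p d\mu \]
and the proof follows.

The completeness of $\mathcal{L}^p$ was proved by E. Fischer (for $p = 2$) and F. Riesz (for $1 \le p < \infty$).

12.7 Theorem (Riesz–Fischer) The spaces $\mathcal{L}^p(\mu)$, $p \in [1,\infty)$, are complete, i.e. every Cauchy sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ converges to some limit $u \in \mathcal{L}^p(\mu)$.

Proof The main difficulty here is to identify the limit $u$. By the definition of a Cauchy sequence we find numbers $1 < n_1 < n_2 < \ldots < n_k < \ldots$ such that
\[ \|u_{n_{k+1}} - u_{n_k}\|_p < 2^{-k}, \qquad k \in \mathbb{N}. \]
To find $u$, we turn the sequence into a series by
\[ u_{n_{k+1}} = \sum_{j=0}^k (u_{n_{j+1}} - u_{n_j}), \qquad u_{n_0} := 0, \tag{12.10} \]
and the limit as $k \to \infty$ would formally be $u := \sum_{j=0}^\infty (u_{n_{j+1}} - u_{n_j})$ – if we can make sense of this infinite sum. Since
\[ \Big\| \sum_{j=0}^\infty |u_{n_{j+1}} - u_{n_j}| \Big\|_p \overset{(12.9)}{\le} \sum_{j=0}^\infty \|u_{n_{j+1}} - u_{n_j}\|_p \le \|u_{n_1}\|_p + \sum_{j=1}^\infty \frac{1}{2^j} < \infty \tag{12.11} \]
we conclude with C10.13 that $\big( \sum_{j=0}^\infty |u_{n_{j+1}} - u_{n_j}| \big)^p < \infty$ a.e., so that $u = \sum_{j=0}^\infty (u_{n_{j+1}} - u_{n_j})$ is a.e. (absolutely) convergent.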
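Let us show that $u = \mathcal{L}^p\text{-}\lim_{k\to\infty} u_{n_k}$. For this, observe that by the (ordinary) triangle inequality and (12.11),
\[ \|u - u_{n_k}\|_p \overset{\text{def } u}{=} \Big\| \sum_{j=k}^\infty (u_{n_{j+1}} - u_{n_j}) \Big\|_p \le \Big\| \sum_{j=k}^\infty |u_{n_{j+1}} - u_{n_j}| \Big\|_p \overset{(12.9)}{\le} \sum_{j=k}^\infty \|u_{n_{j+1}} - u_{n_j}\|_p \xrightarrow{k\to\infty} 0. \]
Finally, using that $(u_j)_{j\in\mathbb{N}}$ is a Cauchy sequence, we get, for all $\epsilon > 0$ and suitable $N_\epsilon \in \mathbb{N}$,
\[ \|u - u_j\|_p \le \|u - u_{n_k}\|_p + \|u_{n_k} - u_j\|_p \le \|u - u_{n_k}\|_p + \epsilon \qquad \forall\, j, n_k \ge N_\epsilon. \]
Letting $k \to \infty$ shows $\|u - u_j\|_p \le \epsilon$ if $j \ge N_\epsilon$.

The proof of Theorem 12.7 shows even a weak form of pointwise convergence:

12.8 Corollary Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, with $\mathcal{L}^p\text{-}\lim_{j\to\infty} u_j = u$. Then there exists a subsequence $(u_{n_k})_{k\in\mathbb{N}}$ such that $\lim_{k\to\infty} u_{n_k}(x) = u(x)$ holds for almost every $x \in X$.

Proof Since $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p$, it is also an $\mathcal{L}^p$-Cauchy sequence and the claim follows from (12.11).

As we have already remarked, pointwise convergence alone does not guarantee convergence in $\mathcal{L}^p$, not even of a subsequence, see Problem 12.7. Let us repeat the following sufficient criterion, which we have already proved on page 109.

12.9 Theorem Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, be a sequence of functions such that $|u_j| \le w$ for all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^p_+(\mu)$. If $u(x) = \lim_{j\to\infty} u_j(x)$ exists for almost every $x \in X$, then
\[ u \in \mathcal{L}^p(\mu) \quad\text{and}\quad \lim_{j\to\infty} \|u - u_j\|_p = 0. \]
Of a different flavour is the next result which is sometimes called F. Riesz’s convergence theorem.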
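12.10 Theorem (Riesz) Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, be a sequence such that $\lim_{j\to\infty} u_j(x) = u(x)$ for almost every $x \in X$ and some $u \in \mathcal{L}^p(\mu)$. Then
\[ \lim_{j\to\infty} \|u_j - u\|_p = 0 \iff \lim_{j\to\infty} \|u_j\|_p = \|u\|_p. \tag{12.12} \]
Proof The direction ‘⇒’ in (12.12) follows from the lower triangle inequality² $\big| \|u_j\|_p - \|u\|_p \big| \le \|u_j - u\|_p$ for $\|\bullet\|_p$. For ‘⇐’ we observe that
\[ |u_j - u|^p \le (|u_j| + |u|)^p \le 2^p \max\{|u_j|^p, |u|^p\} \le 2^p (|u_j|^p + |u|^p) \]
and we can apply Fatou’s lemma 9.11 to the sequence $2^p(|u_j|^p + |u|^p) - |u_j - u|^p \ge 0$ to get
\[ 2^{p+1} \int |u|^p\,d\mu = \int \liminf_{j\to\infty} \big( 2^p(|u_j|^p + |u|^p) - |u_j - u|^p \big)\,d\mu \]
\[ \le \liminf_{j\to\infty} \Big( 2^p \int |u_j|^p\,d\mu + 2^p \int |u|^p\,d\mu - \int |u_j - u|^p\,d\mu \Big) \]
\[ = 2^{p+1} \int |u|^p\,d\mu - \limsup_{j\to\infty} \int |u_j - u|^p\,d\mu \]
where we used that $\lim_{j\to\infty} \int |u_j|^p\,d\mu = \int |u|^p\,d\mu$. This shows that $\limsup_{j\to\infty} \int |u_j - u|^p\,d\mu = 0$, hence $\lim_{j\to\infty} \int |u_j - u|^p\,d\mu = 0$.

Let us note the following structural result on $\mathcal{L}^p$, which will become important later on.

12.11 Corollary The simple $p$-integrable functions $\mathcal{E} \cap \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, are a dense subset of $\mathcal{L}^p(\mu)$, i.e. for every $u \in \mathcal{L}^p(\mu)$ one can find a sequence $(f_j)_{j\in\mathbb{N}} \subset \mathcal{E} \cap \mathcal{L}^p(\mu)$ such that $\lim_{j\to\infty} \|f_j - u\|_p = 0$.

Proof Assume first that $u \in \mathcal{L}^p_+(\mu)$ is positive. By Theorem 8.8 we find an increasing sequence $(f_j)_{j\in\mathbb{N}}$ of positive simple functions with $\sup_{j\in\mathbb{N}} f_j = u$. Since $0 \le f_j \le u$, we have $f_j \in \mathcal{L}^p(\mu)$ as well as $\sup_{j\in\mathbb{N}} \int f_j^p\,d\mu = \int u^p\,d\mu$.[✓] We can now apply Theorem 12.10 and deduce that $\lim_{j\to\infty} \|f_j - u\|_p = 0$.

² Follows exactly as $\big| |a| - |b| \big| \le |a - b|$ follows from $|a + b| \le |a| + |b|$, $a, b \in \mathbb{R}$.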
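For a general $u \in \mathcal{L}^p(\mu)$, we consider its positive and negative parts $u^\pm$ and construct, as before, sequences $g_j, h_j \in \mathcal{E} \cap \mathcal{L}^p(\mu)$ with $g_j \to u^+$ and $h_j \to u^-$ in $\mathcal{L}^p$. But then $g_j - h_j \in \mathcal{E} \cap \mathcal{L}^p(\mu)$, and
\[ \|u - (g_j - h_j)\|_p \le \|u^+ - g_j\|_p + \|u^- - h_j\|_p \xrightarrow{j\to\infty} 0 \]
finishes the proof.

With a special choice of $(X, \mathcal{A}, \mu)$ we can see that integrals generalize infinite series.

12.12 Example Consider the counting measure $\mu = \sum_{j=1}^\infty \delta_j$, cf. Example 4.7(iii), on the measurable space $(\mathbb{N}, \mathcal{P}(\mathbb{N}))$. As we have seen in Examples 9.10(ii) and 10.6(ii), a function $u : \mathbb{N} \to \mathbb{R}$ is $\mu$-integrable if, and only if,
\[ \sum_{j=1}^\infty |u(j)| < \infty \quad\text{in which case}\quad \int u\,d\mu = \sum_{j=1}^\infty u(j). \]
In a similar way one shows that $v \in \mathcal{L}^p(\mu)$ if, and only if, $\sum_{j=1}^\infty |v(j)|^p < \infty$. Functions $u : \mathbb{N} \to \mathbb{R}$ are determined by their values $(u(1), u(2), u(3), \ldots)$ and every sequence $(a_j)_{j\in\mathbb{N}} \subset \mathbb{R}$ defines a function $u$ by $u(j) := a_j$. This means that we can identify the function $u$ with the sequence $(u(j))_{j\in\mathbb{N}}$ of real numbers. Thus
\[ \mathcal{L}^p(\mu) = \Big\{ u : \mathbb{N} \to \mathbb{R} : \sum_{j=1}^\infty |u(j)|^p < \infty \Big\} = \Big\{ (a_j)_{j\in\mathbb{N}} \subset \mathbb{R} : \sum_{j=1}^\infty |a_j|^p < \infty \Big\} =: \ell^p(\mathbb{N}), \]
the latter being a so-called sequence space. Note that in this context Hölder’s and Minkowski’s inequalities become
\[ \sum_{j=1}^\infty |a_j b_j| \le \Big( \sum_{j=1}^\infty |a_j|^p \Big)^{1/p} \Big( \sum_{j=1}^\infty |b_j|^q \Big)^{1/q} \tag{12.13} \]
if $p, q \in (1,\infty)$ are conjugate numbers, and
\[ \Big( \sum_{j=1}^\infty |a_j \pm b_j|^p \Big)^{1/p} \le \Big( \sum_{j=1}^\infty |a_j|^p \Big)^{1/p} + \Big( \sum_{j=1}^\infty |b_j|^p \Big)^{1/p}. \tag{12.14} \]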
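We close this chapter with a useful convexity, resp. concavity, inequality. Recall that a function $\Phi : [a,b] \to \mathbb{R}$ on an interval $[a,b] \subset \bar{\mathbb{R}}$ is convex [concave] if
\[ \Phi(tx + (1-t)y) \le t\,\Phi(x) + (1-t)\,\Phi(y), \qquad 0 < t < 1, \]
\[ \big[\ \Phi(tx + (1-t)y) \ge t\,\Phi(x) + (1-t)\,\Phi(y), \qquad 0 < t < 1\ \big] \tag{12.15} \]
holds for all $x, y \in [a,b]$. Geometrically this means that the graph of a convex [concave] function between the points $(x, \Phi(x))$ and $(y, \Phi(y))$ lies below [above] the chord linking $(x, \Phi(x))$ and $(y, \Phi(y))$. Convex [concave] functions have nice properties: they are continuous in $(a,b)$ and if $\Phi'$ exists, it is increasing [decreasing]. If $\Phi$ is twice differentiable, convexity [concavity] is equivalent to $\Phi'' \ge 0$ [$\Phi'' \le 0$]. Further details and proofs can be found in Boas [8].

[Figure: a concave function $\Phi$, with points $z$, $y$, $x$ marked on the horizontal axis.]

For our purposes we need the following lemma.

12.13 Lemma A convex [concave] function $\Phi : [a,b] \to \mathbb{R}$ has at every point in the open interval $(a,b)$ a finite right-hand derivative $\Phi'_+$ and satisfies
\[ \Phi'_+(y)(x - y) + \Phi(y) \le \Phi(x) \qquad \forall\, x, y \in (a,b) \]
\[ \big[\ \Phi(x) \le \Phi'_+(y)(x - y) + \Phi(y) \qquad \forall\, x, y \in (a,b)\ \big]. \tag{12.16} \]
In particular, a convex [concave] function is the upper [lower] envelope of all linear functions below [above] its graph
\[ \Phi(x) = \sup\{ \ell(x) : \ell(z) = \alpha z + \beta \le \Phi(z) \ \ \forall\, z \in (a,b) \} \]
\[ \big[\ \Phi(x) = \inf\{ \ell(x) : \ell(z) = \alpha z + \beta \ge \Phi(z) \ \ \forall\, z \in (a,b) \}\ \big]. \tag{12.17} \]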
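Proof Since the graph of a convex [concave] function looks like a smile [frown], the last statement of the lemma is intuitively clear. A rigorous argument uses (12.16) which says that $\Phi$ admits at every point a tangent below [above] its graph.

We show (12.16) only for concave functions. Pick numbers $z < \eta < y < \xi < x$ in $(a,b)$ and choose $t \in (0,1)$ such that $\xi = ty + (1-t)x$, $t' \in (0,1)$ such that $y = t'\eta + (1-t')\xi$ and $t'' \in (0,1)$ such that $\eta = t''z + (1-t'')y$. Using these values in (12.15) we see after some simple manipulations that
\[ \frac{\Phi(x) - \Phi(y)}{x - y} \le \frac{\Phi(\xi) - \Phi(y)}{\xi - y} \le \frac{\Phi(y) - \Phi(\eta)}{y - \eta} \le \frac{\Phi(\eta) - \Phi(z)}{\eta - z}, \]
cf. the picture above. This shows that $\frac{\Phi(\xi) - \Phi(y)}{\xi - y}$ is bounded and increasing as $\xi \downarrow y$. Therefore, the right-hand derivative $\Phi'_+(y) = \lim_{\xi \downarrow y} \frac{\Phi(\xi) - \Phi(y)}{\xi - y}$ exists and is finite. In particular, $\Phi$ is continuous from the right at the point $y$, so that $\lim_{\xi \downarrow y} \Phi(\xi) = \Phi(y)$. Letting $\xi \to y$ in the above chain of inequalities therefore yields
\[ \frac{\Phi(x) - \Phi(y)}{x - y} \le \Phi'_+(y) \le \frac{\Phi(y) - \Phi(\eta)}{y - \eta} \qquad \forall\, \eta < y < x \]
and rearranging the first of these inequalities gives (12.16).

12.14 Theorem (Jensen’s inequality) Let $\Lambda : [0,\infty) \to [0,\infty)$ be a concave and $V : [0,\infty) \to [0,\infty)$ be a convex function. For any $w \in \mathcal{L}^1_+(\mu)$ we have
\[ \frac{\int \Lambda(u)\,w\,d\mu}{\int w\,d\mu} \le \Lambda\!\left( \frac{\int u\,w\,d\mu}{\int w\,d\mu} \right) \qquad \forall\, u \in \mathcal{M}_+(\mathcal{A}), \tag{12.18} \]
\[ V\!\left( \frac{\int u\,w\,d\mu}{\int w\,d\mu} \right) \le \frac{\int V(u)\,w\,d\mu}{\int w\,d\mu} \qquad \forall\, u \in \mathcal{M}_+(\mathcal{A}). \tag{12.19} \]
If $uw \in \mathcal{L}^1(\mu)$, then $\Lambda(u)\,w \in \mathcal{L}^1(\mu)$.

Proof We prove only (12.18) since (12.19) is similar. If the right-hand side of (12.18) is infinite, there is nothing to show. Therefore we may assume $\int uw\,d\mu < \infty$. Since $\Lambda(x)$ is concave, we find for any $\ell(x) = \alpha x + \beta \ge \Lambda(x)$ that
\[ \frac{\int \Lambda(u)\,w\,d\mu}{\int w\,d\mu} \le \frac{\int (\alpha u + \beta)\,w\,d\mu}{\int w\,d\mu} = \alpha\,\frac{\int uw\,d\mu}{\int w\,d\mu} + \beta = \ell\!\left( \frac{\int uw\,d\mu}{\int w\,d\mu} \right) \]
and the inequality follows from (12.17) if we pass to the inf over all linear functions satisfying $\ell \ge \Lambda$.

12.15 The case $p = \infty$. In Theorem 12.2 and Corollary 12.4 we avoided the cases $p = 1$ or $\infty$. This can be overcome by introducing the space $\mathcal{L}^\infty(\mu)$:
\[ \mathcal{L}^\infty(\mu) := \{ u \in \mathcal{M}(\mathcal{A}) : u \text{ is a.e. bounded} \}. \tag{12.20} \]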
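Obviously, $\mathcal{L}^\infty(\mu)$ is a vector space, and we can introduce by
\[ \|u\|_\infty := \inf\{ C : \mu(\{|u| > C\}) = 0 \} \tag{12.21} \]
a norm[✓] which is, for continuous $u$, just $\|u\|_\infty = \sup |u|$. Interpreting $p = 1$ and $q = \infty$ as conjugate numbers, it is not hard to verify T12.2 and C12.4 for these values of $p$ and $q$.

The completeness of $\mathcal{L}^\infty(\mu)$ is much easier to prove than T12.7: if $(u_j)_{j\in\mathbb{N}}$ is a Cauchy sequence in $\mathcal{L}^\infty(\mu)$, we set
\[ A_{k,\ell} := \{ |u_k| > \|u_k\|_\infty \} \cup \{ |u_k - u_\ell| > \|u_k - u_\ell\|_\infty \}, \qquad A := \bigcup_{k,\ell\in\mathbb{N}} A_{k,\ell}. \]
By definition, $\mu(A_{k,\ell}) = 0$ and $\mu(A) = 0$, so that $\|u_j 1_A\|_\infty = 0$ for all $j \in \mathbb{N}$. On the set $A^c$, however, $(u_j)_{j\in\mathbb{N}}$ converges uniformly to a bounded function $u$, i.e. $u 1_{A^c} \in \mathcal{L}^\infty(\mu)$ as well as $\|(u_j - u) 1_{A^c}\|_\infty \to 0$.

As in Remark 12.5 we write $L^\infty(\mu)$ for $\mathcal{L}^\infty(\mu)/\!\sim$, where $u \sim v$ means that $\{u \ne v\} \in \mathcal{N}_\mu$ is a $\mu$-null set.

Note also that T12.10 and C12.11 are no longer true for $p = \infty$. This can be seen on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda^1)$ from $u_j(x) := e^{-|x|/j} \xrightarrow{j\to\infty} 1_{\mathbb{R}}(x)$ for the former and from[✓] $u(x) := \sum_{j=-\infty}^\infty 1_{[2j, 2j+1]}(x)$ for the latter.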
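Jensen’s inequalities (12.18) and (12.19) from Theorem 12.14 can be checked numerically on a finite measure space. The sketch below assumes a finite set $X$ with counting measure, so that all integrals are finite sums; the function names `weighted_mean`, `jensen_concave_gap` and `jensen_convex_gap` are hypothetical helpers introduced only for this illustration.

```python
import math

def weighted_mean(values, weights):
    """(integral of values * w dmu) / (integral of w dmu) on a finite set X."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jensen_concave_gap(conc, u, w):
    """conc(mean of u) minus mean of conc(u); non-negative by (12.18)."""
    return conc(weighted_mean(u, w)) - weighted_mean([conc(x) for x in u], w)

def jensen_convex_gap(conv, u, w):
    """mean of conv(u) minus conv(mean of u); non-negative by (12.19)."""
    return weighted_mean([conv(x) for x in u], w) - conv(weighted_mean(u, w))

u = [0.5, 1.0, 4.0, 9.0]   # a positive function on a four-point set
w = [0.1, 0.4, 0.3, 0.2]   # a positive integrable weight

# The square root is concave and x -> x^2 is convex on [0, infinity).
print(jensen_concave_gap(math.sqrt, u, w))
print(jensen_convex_gap(lambda x: x * x, u, w))
```

For a constant function $u$ both gaps vanish, which is the trivial case of equality.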
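Problems

12.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and let $1 \le q < p < \infty$.
(i) Show that $\|u\|_q \le \mu(X)^{1/q - 1/p}\,\|u\|_p$. [Hint: use Hölder’s inequality for $u \cdot 1$.]
(ii) Conclude that $\mathcal{L}^p(\mu) \subset \mathcal{L}^q(\mu)$ for all $p \ge q \ge 1$ and that a Cauchy sequence in $\mathcal{L}^p$ is also a Cauchy sequence in $\mathcal{L}^q$.
(iii) Is this still true if the measure $\mu$ is not finite?

12.2. Let $(X, \mathcal{A}, \mu)$ be a general measure space and $1 \le p \le r \le q \le \infty$. Prove that $\mathcal{L}^p(\mu) \cap \mathcal{L}^q(\mu) \subset \mathcal{L}^r(\mu)$ by establishing the inequality
\[ \|u\|_r \le \|u\|_p^\lambda \cdot \|u\|_q^{1-\lambda} \qquad \forall\, u \in \mathcal{L}^p(\mu) \cap \mathcal{L}^q(\mu) \]
with $\lambda = \big( \tfrac1r - \tfrac1q \big) \big/ \big( \tfrac1p - \tfrac1q \big)$. [Hint: use Hölder’s inequality.]

12.3. Extend the proof of Hölder’s inequality 12.2 to $p = 1$ and $q = \infty$, i.e. show that
\[ \int |uv|\,d\mu \le \|u\|_1 \cdot \|v\|_\infty \tag{12.22} \]
holds for all $u \in \mathcal{L}^1(\mu)$ and $v \in \mathcal{L}^\infty(\mu)$.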
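12.4. Generalized Hölder inequality. Iterate Hölder’s inequality to derive the following generalization:
\[ \int |u_1 \cdot u_2 \cdots u_N|\,d\mu \le \|u_1\|_{p_1} \cdot \|u_2\|_{p_2} \cdots \|u_N\|_{p_N} \tag{12.23} \]
for all $p_j \in (1,\infty)$ such that $\sum_{j=1}^N p_j^{-1} = 1$ and all measurable $u_j$.

12.5. Young functions. Let $\phi : [0,\infty) \to [0,\infty)$ be a strictly increasing continuous function such that $\phi(0) = 0$ and $\lim_{\xi\to\infty} \phi(\xi) = \infty$. Denote by $\psi := \phi^{-1}$ the inverse function. The functions
\[ \Phi(A) := \int_{[0,A)} \phi(\xi)\,\lambda^1(d\xi) \quad\text{and}\quad \Psi(B) := \int_{[0,B)} \psi(\eta)\,\lambda^1(d\eta) \tag{12.24} \]
are called conjugate Young functions. Adapt the proof of L12.1 to show the following general Young’s inequality:
\[ AB \le \Phi(A) + \Psi(B) \qquad \forall\, A, B \ge 0. \tag{12.25} \]
[Hint: interpret $\Phi(A)$ and $\Psi(B)$ as areas below the graph of $\phi$, resp. $\psi$.]

12.6. Let $1 \le p < \infty$ and $u, u_k \in \mathcal{L}^p(\mu)$ such that $\sum_{k=1}^\infty \|u - u_k\|_p < \infty$. Show that $\lim_{k\to\infty} u_k(x) = u(x)$ almost everywhere. [Hint: mimic the proof of the Riesz–Fischer theorem using $\sum_j (u_{j+1} - u_j)$.]

12.7. Consider one-dimensional Lebesgue measure on $[0,1]$. Verify that the sequence $u_n(x) := n\,1_{(0,1/n)}(x)$, $n \in \mathbb{N}$, converges pointwise to the function $u \equiv 0$, but that no subsequence of $u_n$ converges in $\mathcal{L}^p$-sense for any $p \ge 1$.

12.8. Let $p, q \in (1,\infty)$ be conjugate indices, i.e. $p^{-1} + q^{-1} = 1$, and assume that $(u_k)_{k\in\mathbb{N}} \subset \mathcal{L}^p$ and $(w_k)_{k\in\mathbb{N}} \subset \mathcal{L}^q$ are sequences with limits $u$ and $w$ in $\mathcal{L}^p$-, resp. $\mathcal{L}^q$-sense. Show that $u_k w_k$ converges in $\mathcal{L}^1$ to the function $uw$.

12.9. Prove that $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^2$ converges in $\mathcal{L}^2$ if, and only if, $\lim_{n,m\to\infty} \int u_n u_m\,d\mu$ exists. [Hint: verify and use the identity $\|u - w\|_2^2 = \|u\|_2^2 + \|w\|_2^2 - 2 \int uw\,d\mu$.]

12.10. Let $(X, \mathcal{A}, \mu)$ be a finite measure space. Show that every measurable $u \ge 0$ with $\int \exp(h\,u(x))\,\mu(dx) < \infty$ for some $h > 0$ is in $\mathcal{L}^p(\mu)$ for every $p \ge 1$. [Hint: check that $|t|^N/N! \le e^{|t|}$ implies $u \in \mathcal{L}^N$, $N \in \mathbb{N}$; then use Problem 12.1.]

12.11. Let $\lambda$ be Lebesgue measure in $(0,\infty)$ and $p, q \ge 1$ arbitrary.
(i) Show that $u_n(x) := n^\alpha (x + n)^{-\beta}$ ($\alpha \in \mathbb{R}$, $\beta > 1$) is for every $n \in \mathbb{N}$ in $\mathcal{L}^p(\lambda)$.
(ii) Show that $v_n(x) := n^\gamma e^{-nx}$ ($\gamma \in \mathbb{R}$) is for every $n \in \mathbb{N}$ in $\mathcal{L}^q(\lambda)$.

12.12. Let $u(x) = (x^\alpha + x^\beta)^{-1}$, $x, \alpha, \beta > 0$. For which $p \ge 1$ is $u \in \mathcal{L}^p(\lambda^1, (0,\infty))$?

12.13. Consider the measure space $(\Omega = \{1, 2, \ldots, n\}, \mathcal{P}(\Omega), \mu)$, $n \ge 2$, where $\mu$ is the counting measure. Show that $\big( \sum_{j=1}^n |x_j|^p \big)^{1/p}$ is a norm if $p \in [1,\infty)$, but not for $p \in (0,1)$. [Hint: you can identify $\mathcal{L}^p(\mu)$ with $\mathbb{R}^n$.]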
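12.14. Let $(X, \mathcal{A}, \mu)$ be a measure space. The space $\mathcal{L}^p(\mu)$ is called separable, if there exists a countable dense subset $\mathcal{D}_p \subset \mathcal{L}^p(\mu)$. Show that $\mathcal{L}^p(\mu)$, $p \in (1,\infty)$, is separable if, and only if, $\mathcal{L}^1(\mu)$ is separable. [Hint: use Riesz’s convergence theorem 12.10.]

12.15. Let $u_n \in \mathcal{L}^p(\mu)$, $p \ge 1$, for all $n \in \mathbb{N}$. What can you say about $u$ and $w$ if you know that $\lim_{n\to\infty} \int |u_n - u|^p\,d\mu = 0$ and $\lim_{n\to\infty} u_n(x) = w(x)$ for almost every $x$?

12.16. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and let $u \in \mathcal{L}^1(\mu)$ be strictly positive with $\int u\,d\mu = 1$. Show that
\[ \int (\log u)\,d\mu \le \mu(X) \log \frac{1}{\mu(X)}. \]

12.17. Let $u$ be a positive measurable function on $[0,1]$. Which of the following is larger:
\[ \int_{(0,1)} u(x) \log u(x)\,\lambda(dx) \quad\text{or}\quad \int_{(0,1)} u(s)\,\lambda(ds) \cdot \int_{(0,1)} \log u(t)\,\lambda(dt)? \]
[Hint: show that $\log x \le x \log x$, $x > 0$, and assume first that $\int u\,d\lambda = 1$, then consider $u / \int u\,d\lambda$.]

12.18. Let $(X, \mathcal{A}, \mu)$ be a measure space and $p \in (0,1)$. The conjugate index is given by $q := p/(p-1) < 0$. Prove for all measurable $u, v, w : X \to (0,\infty)$ with $\int u^p\,d\mu, \int v^p\,d\mu < \infty$ and $0 < \int w^q\,d\mu < \infty$ the inequalities
\[ \int uw\,d\mu \ge \Big( \int u^p\,d\mu \Big)^{1/p} \Big( \int w^q\,d\mu \Big)^{1/q} \]
and
\[ \Big( \int (u + v)^p\,d\mu \Big)^{1/p} \ge \Big( \int u^p\,d\mu \Big)^{1/p} + \Big( \int v^p\,d\mu \Big)^{1/p}. \]
[Hint: consider Hölder’s inequality for $u$ and $1/w$.]

12.19. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $u \in \mathcal{M}(\mathcal{A})$ be a bounded function with $\|u\|_\infty > 0$. Prove that for all $n \in \mathbb{N}$:
(i) $M_n := \int |u|^n\,d\mu \in (0,\infty)$;
(ii) $M_{n+1} M_{n-1} \ge M_n^2$;
(iii) $\mu(X)^{-1/n}\,\|u\|_n \le M_{n+1}/M_n \le \|u\|_\infty$;
(iv) $\lim_{n\to\infty} M_{n+1}/M_n = \|u\|_\infty$.
[Hint: (ii) – use Hölder’s inequality; (iii) – use Jensen’s inequality for the lower estimate, Hölder’s inequality for the upper estimate; (iv) – observe that $\int |u|^n\,d\mu \ge \int_{\{|u| > \|u\|_\infty - \epsilon\}} (\|u\|_\infty - \epsilon)^n\,d\mu = \mu(\{|u| > \|u\|_\infty - \epsilon\})\,(\|u\|_\infty - \epsilon)^n$, take the $n$th root and let $n \to \infty$.]

12.20. Let $(X, \mathcal{A}, \mu)$ be a general measure space and let $u \in \bigcap_{p \ge 1} \mathcal{L}^p(\mu)$. Then
\[ \lim_{p\to\infty} \|u\|_p = \|u\|_\infty \]
where $\|u\|_\infty = \infty$ if $u$ is unbounded.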
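[Hint: start with $\|u\|_\infty < \infty$. Show that for any sequence $q_n \to \infty$ one has $\|u\|_{p+q_n} \le \|u\|_\infty^{q_n/(p+q_n)} \cdot \|u\|_p^{p/(p+q_n)}$ and conclude that $\limsup_{p\to\infty} \|u\|_p \le \|u\|_\infty$. The other estimate follows from $\|u\|_p \ge \mu(\{|u| > (1-\epsilon)\|u\|_\infty\})^{1/p}\,(1-\epsilon)\|u\|_\infty$ and $p \to \infty$, $\epsilon \to 0$, see also the hint to Problem 12.19, where $\mu(\{|u| > (1-\epsilon)\|u\|_\infty\})$ is finite in view of the Markov inequality. If $\|u\|_\infty = \infty$, use part one of the hint and observe that
\[ \liminf_{p\to\infty} \|u\|_p \ge \sup_{k\in\mathbb{N}} \lim_{p\to\infty} \big\| |u| \wedge k \big\|_p = \sup_{k\in\mathbb{N}} \big\| |u| \wedge k \big\|_\infty = \sup_{k\in\mathbb{N}} \sup_x \big( |u(x)| \wedge k \big) = \sup_x \sup_{k\in\mathbb{N}} \big( |u(x)| \wedge k \big) = \|u\|_\infty = \infty.] \]

12.21. Let $(X, \mathcal{A}, \mu)$ be a measure space and $1 \le p < \infty$. Show that $f \in \mathcal{E} \cap \mathcal{L}^p(\mu)$ if, and only if, $f \in \mathcal{E}$ and $\mu(\{f \ne 0\}) < \infty$. In particular, $\mathcal{E} \cap \mathcal{L}^p(\mu) = \mathcal{E} \cap \mathcal{L}^1(\mu)$.

12.22. Use Jensen’s inequality (12.18) to derive Hölder’s and Minkowski’s inequalities. Instructions: use
\[ \Lambda(x) = x^{1/q},\ x \ge 0, \qquad w = |f|^p \quad\text{and}\quad u = |g|^q\,|f|^{-p}\,1_{\{f \ne 0\}} \]
for Hölder’s inequality and
\[ \Lambda(x) = (x^{1/p} + 1)^p,\ x \ge 0, \qquad w = |f|^p\,1_{\{f \ne 0\}} \quad\text{and}\quad u = |f|^{-p}\,|g|^p\,1_{\{f \ne 0\}} \]
for Minkowski’s inequality.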
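13 Product measures and Fubini’s theorem

Lebesgue measure on $\mathbb{R}^n$ has, inherent in its definition, an interesting additional property: if $n > d \ge 1$
\[ \lambda^n\big( [a_1,b_1) \times \cdots \times [a_n,b_n) \big) = (b_1 - a_1) \cdots (b_d - a_d) \cdot (b_{d+1} - a_{d+1}) \cdots (b_n - a_n) \]
\[ = \lambda^d\big( [a_1,b_1) \times \cdots \times [a_d,b_d) \big) \cdot \lambda^{n-d}\big( [a_{d+1},b_{d+1}) \times \cdots \times [a_n,b_n) \big), \tag{13.1} \]
i.e. it is – at least for rectangles – the product of Lebesgue measures in lower-dimensional spaces. In this chapter we will see that (13.1) remains true for any product $A \times B$ of sets $A \in \mathcal{B}(\mathbb{R}^d)$ and $B \in \mathcal{B}(\mathbb{R}^{n-d})$. More importantly, we will prove the following version of Cavalieri’s principle
\[ \lambda^n(E) = \int 1_E\,d\lambda^n = \int_{\mathbb{R}^{n-d}} \int_{\mathbb{R}^d} 1_E(x, y_0)\,\lambda^d(dx)\,\lambda^{n-d}(dy_0) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^{n-d}} 1_E(x_0, y)\,\lambda^{n-d}(dy)\,\lambda^d(dx_0) \]
which just says that we carve up the set $E \subset \mathbb{R}^n$ horizontally or vertically, measure the volume of the slices and ‘sum’ them up along the other direction to get the volume of the whole set $E$.

[Figure: the set $E$ with a horizontal slice $1_E(x, y_0)$ at height $y_0$ and a vertical slice $1_E(x_0, y)$ at $x_0$; axes $x \in \mathbb{R}^d$ and $y \in \mathbb{R}^{n-d}$.]

Clearly, we should be careful about the measurability of products of sets. Recall the following simple rules for Cartesian products of sets $A, A', A_i \subset X$, $i \in I$, and $B, B' \subset Y$:
\[ \Big( \bigcup_{i \in I} A_i \Big) \times B = \bigcup_{i \in I} (A_i \times B), \]
\[ \Big( \bigcap_{i \in I} A_i \Big) \times B = \bigcap_{i \in I} (A_i \times B), \]
\[ (A \times B) \cap (A' \times B') = (A \cap A') \times (B \cap B'), \tag{13.2} \]
\[ A^c \times B = (X \times B) \setminus (A \times B), \]
\[ A \times B \subset A' \times B' \iff A \subset A' \text{ and } B \subset B', \]
which are easily derived from the formula $A \times B = (A \times Y) \cap (X \times B) = \pi_1^{-1}(A) \cap \pi_2^{-1}(B)$ where $\pi_1 : X \times Y \to X$ and $\pi_2 : X \times Y \to Y$ are the coordinate projections, and the compatibility of inverse mappings and set operations.

To treat measurability, we assume throughout this chapter that $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are $\sigma$-finite measure spaces. Following (13.1) we want to define a measure $\rho$ on rectangles of the form $A \times B$ such that $\rho(A \times B) = \mu(A)\,\nu(B)$ for $A \in \mathcal{A}$ and $B \in \mathcal{B}$. The first problem which we encounter is that the family
\[ \mathcal{A} \times \mathcal{B} := \{ A \times B : A \in \mathcal{A},\ B \in \mathcal{B} \} \tag{13.3} \]
is, in general, no $\sigma$-algebra.

13.1 Lemma Let $\mathcal{A}$ and $\mathcal{B}$ be two $\sigma$-algebras (or only semi-rings). Then $\mathcal{A} \times \mathcal{B}$ is a semi-ring.¹

Proof Literally the same as the induction step in the proof of P6.4.

13.2 Definition Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be two measurable spaces. Then $\mathcal{A} \otimes \mathcal{B} := \sigma(\mathcal{A} \times \mathcal{B})$ is called a product $\sigma$-algebra, and $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ is the product of measurable spaces.

The following lemma is quite useful since it allows us to reduce considerations for $\mathcal{A} \otimes \mathcal{B}$ to generators $\mathcal{F}$ and $\mathcal{G}$ of $\mathcal{A}$ and $\mathcal{B}$ – just as we did in (13.1).

¹ See (S1)–(S3) on p. 37 for the definition of a semi-ring.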
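13.3 Lemma If $\mathcal{A} = \sigma(\mathcal{F})$ and $\mathcal{B} = \sigma(\mathcal{G})$ and if $\mathcal{F}, \mathcal{G}$ contain exhausting sequences $(F_j)_{j\in\mathbb{N}} \subset \mathcal{F}$, $F_j \uparrow X$ and $(G_j)_{j\in\mathbb{N}} \subset \mathcal{G}$, $G_j \uparrow Y$, then
\[ \sigma(\mathcal{F} \times \mathcal{G}) = \sigma(\mathcal{A} \times \mathcal{B}) \overset{\text{def}}{=} \mathcal{A} \otimes \mathcal{B}. \]
Proof Since $\mathcal{F} \times \mathcal{G} \subset \mathcal{A} \times \mathcal{B}$ we have $\sigma(\mathcal{F} \times \mathcal{G}) \subset \mathcal{A} \otimes \mathcal{B}$. On the other hand, the system
\[ \Sigma := \{ A \in \mathcal{A} : A \times G \in \sigma(\mathcal{F} \times \mathcal{G}) \ \ \forall\, G \in \mathcal{G} \} \]
is a $\sigma$-algebra: Let $A, A_j \in \Sigma$, $j \in \mathbb{N}$, and $G \in \mathcal{G}$; ($\Sigma_1$) follows from
\[ X \times G = \bigcup_{j\in\mathbb{N}} (F_j \times G) \in \sigma(\mathcal{F} \times \mathcal{G}), \]
($\Sigma_2$) from $A^c \times G = (X \times G) \setminus (A \times G) \in \sigma(\mathcal{F} \times \mathcal{G})$, and ($\Sigma_3$) from
\[ \Big( \bigcup_{j\in\mathbb{N}} A_j \Big) \times G = \bigcup_{j\in\mathbb{N}} (A_j \times G) \in \sigma(\mathcal{F} \times \mathcal{G}). \]
Obviously, $\mathcal{F} \subset \Sigma \subset \mathcal{A}$, and therefore $\mathcal{A} = \Sigma$; by the very definition of $\Sigma$ we conclude that $\mathcal{A} \times \mathcal{G} \subset \sigma(\mathcal{F} \times \mathcal{G})$. A similar consideration shows $\mathcal{F} \times \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$. This means that for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$
\[ A \times B = (A \times Y) \cap (X \times B) = \bigcup_{j,k\in\mathbb{N}} (A \times G_k) \cap (F_j \times B), \]
where each $A \times G_k$ and each $F_j \times B$ is in $\sigma(\mathcal{F} \times \mathcal{G})$, so that $\mathcal{A} \times \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$ and thus $\mathcal{A} \otimes \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$.

If the generators are rich enough, we have not too many choices of measures $\rho$ with $\rho(F \times G) = \mu(F)\,\nu(G)$. In fact,

13.4 Theorem (Uniqueness of product measures) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two measure spaces and assume that $\mathcal{A} = \sigma(\mathcal{F})$ and $\mathcal{B} = \sigma(\mathcal{G})$. If
• $\mathcal{F}, \mathcal{G}$ are $\cap$-stable,
• $\mathcal{F}, \mathcal{G}$ contain exhausting sequences $F_j \uparrow X$ and $G_k \uparrow Y$ with $\mu(F_j) < \infty$ and $\nu(G_k) < \infty$ for all $j, k \in \mathbb{N}$,
then there is at most one measure $\rho$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ satisfying
\[ \rho(F \times G) = \mu(F)\,\nu(G) \qquad \forall\, F \in \mathcal{F},\ G \in \mathcal{G}. \]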
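Proof By Lemma 13.3 $\mathcal{F} \times \mathcal{G}$ generates $\mathcal{A} \otimes \mathcal{B}$. Moreover, $\mathcal{F} \times \mathcal{G}$ inherits the $\cap$-stability of $\mathcal{F}$ and $\mathcal{G}$[✓], the sequence $F_j \times G_j$ increases towards $X \times Y$ and $\rho(F_j \times G_j) = \mu(F_j)\,\nu(G_j) < \infty$. These were the assumptions of the uniqueness theorem 5.7, showing that there is at most one such product measure $\rho$.

As so often, it is the existence which is more difficult than uniqueness.

13.5 Theorem (Existence of product measures) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces. Then the set-function
\[ \rho : \mathcal{A} \times \mathcal{B} \to [0,\infty], \qquad \rho(A \times B) := \mu(A)\,\nu(B) \]
extends uniquely to a $\sigma$-finite measure on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that
\[ \rho(E) = \iint 1_E(x,y)\,\mu(dx)\,\nu(dy) = \iint 1_E(x,y)\,\nu(dy)\,\mu(dx) \tag{13.4} \]
holds² for all $E \in \mathcal{A} \otimes \mathcal{B}$. In particular, the functions
\[ x \mapsto 1_E(x,y), \quad y \mapsto 1_E(x,y), \quad x \mapsto \int 1_E(x,y)\,\nu(dy), \quad y \mapsto \int 1_E(x,y)\,\mu(dx) \]
are $\mathcal{A}$-, resp. $\mathcal{B}$-measurable for every fixed $y \in Y$, resp. $x \in X$.

Proof Uniqueness of $\rho$ follows from T13.4.
Existence: Let $(A_j)_{j\in\mathbb{N}}$, $(B_j)_{j\in\mathbb{N}}$ be sequences in $\mathcal{A}$ resp. $\mathcal{B}$ with $A_j \uparrow X$, $B_j \uparrow Y$ and $\mu(A_j), \nu(B_j) < \infty$. Clearly, $E_j := A_j \times B_j \uparrow X \times Y$. For every $j \in \mathbb{N}$ we consider the family $\mathcal{D}_j$ of all subsets $D \subset X \times Y$ satisfying the following conditions:
• $x \mapsto 1_{D \cap E_j}(x,y)$ and $y \mapsto 1_{D \cap E_j}(x,y)$ are measurable,
• $x \mapsto \int 1_{D \cap E_j}(x,y)\,\nu(dy)$ and $y \mapsto \int 1_{D \cap E_j}(x,y)\,\mu(dx)$ are measurable,
• $\iint 1_{D \cap E_j}(x,y)\,\mu(dx)\,\nu(dy) = \iint 1_{D \cap E_j}(x,y)\,\nu(dy)\,\mu(dx)$.
That $\mathcal{A} \times \mathcal{B} \subset \mathcal{D}_j$ follows from
\[ \iint 1_{(A \times B) \cap E_j}(x,y)\,\mu(dx)\,\nu(dy) = \iint 1_{A \cap A_j}(x)\,1_{B \cap B_j}(y)\,\mu(dx)\,\nu(dy) = \mu(A \cap A_j) \int 1_{B \cap B_j}(y)\,\nu(dy) \]
\[ = \mu(A \cap A_j)\,\nu(B \cap B_j) = \cdots = \iint 1_{(A \times B) \cap E_j}(x,y)\,\nu(dy)\,\mu(dx) \]

² We use the symbols $\int \ldots d\mu$ like brackets, i.e. $\iint \ldots d\mu\,d\nu = \int \big( \int \ldots d\mu \big)\,d\nu$.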
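where the ellipsis $\cdots$ stands for the same calculations run through backwards. In each step the measurability conditions needed to perform the integrations are fulfilled because of the product structure.[✓] In particular, $X \times Y, \emptyset, E_k \in \mathcal{D}_j$.

If $D \in \mathcal{D}_j$, then $1_{D^c \cap E_j} = 1_{E_j} - 1_{E_j \cap D}$ and
\[ \iint 1_{D^c \cap E_j}(x,y)\,\mu(dx)\,\nu(dy) = \int \Big( \int 1_{E_j}(x,y)\,\mu(dx) - \int 1_{E_j \cap D}(x,y)\,\mu(dx) \Big)\,\nu(dy) \]
\[ = \iint 1_{E_j}(x,y)\,\mu(dx)\,\nu(dy) - \iint 1_{E_j \cap D}(x,y)\,\mu(dx)\,\nu(dy) \]
\[ = \iint 1_{E_j}(x,y)\,\nu(dy)\,\mu(dx) - \iint 1_{E_j \cap D}(x,y)\,\nu(dy)\,\mu(dx) \qquad \text{(by definition, since } E_j, D \in \mathcal{D}_j\text{)} \]
\[ = \cdots = \iint 1_{D^c \cap E_j}(x,y)\,\nu(dy)\,\mu(dx). \]
Again, in each step the measurability conditions hold since measurable functions form a vector space.

If $(D_k)_{k\in\mathbb{N}} \subset \mathcal{D}_j$ are mutually disjoint sets, $D := \bigcup_{k\in\mathbb{N}} D_k$ (a disjoint union), the linearity of the integral and Beppo Levi’s theorem in the form of C9.9 show that
\[ \iint 1_{D \cap E_j}(x,y)\,\mu(dx)\,\nu(dy) = \iint \sum_{k=1}^\infty 1_{D_k \cap E_j}(x,y)\,\mu(dx)\,\nu(dy) = \sum_{k=1}^\infty \iint 1_{D_k \cap E_j}(x,y)\,\mu(dx)\,\nu(dy) \]
\[ = \sum_{k=1}^\infty \iint 1_{D_k \cap E_j}(x,y)\,\nu(dy)\,\mu(dx) \qquad \text{(by definition, since } D_k \in \mathcal{D}_j\text{)} \]
\[ = \cdots = \iint 1_{D \cap E_j}(x,y)\,\nu(dy)\,\mu(dx) \]
and the measurability conditions hold since measurability is preserved under sums and increasing limits.

The last three calculations show that $\mathcal{D}_j$ is a Dynkin system containing the $\cap$-stable family $\mathcal{A} \times \mathcal{B}$. By Theorem 5.5, $\mathcal{A} \otimes \mathcal{B} \subset \mathcal{D}_j$ for every $j \in \mathbb{N}$. Since $E_j \uparrow X \times Y$, Beppo Levi’s theorem 9.6 proves (13.4) along with the measurability of the functions $1_E(\bullet, y)$, $1_E(x, \bullet)$, $\int 1_E(\bullet, y)\,\nu(dy)$ and $\int 1_E(x, \bullet)\,\mu(dx)$ since $\mathcal{M}$ is stable under pointwise limits.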
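Replacing in the above calculations $E_j$ by $X \times Y$ finally proves that
\[ E \mapsto \rho(E) := \iint 1_E(x,y)\,\mu(dx)\,\nu(dy) \]
is indeed a measure on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ with $\rho(A \times B) = \mu(A)\,\nu(B)$.

13.6 Definition Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces. The unique measure $\rho$ constructed in Theorem 13.5 is called the product of the measures $\mu$ and $\nu$, denoted by $\mu \times \nu$. $(X \times Y, \mathcal{A} \otimes \mathcal{B}, \mu \times \nu)$ is called the product measure space.

Returning to the example considered at the beginning we find

13.7 Corollary If $n > d \ge 1$,
\[ (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), \lambda^n) = \big( \mathbb{R}^d \times \mathbb{R}^{n-d},\ \mathcal{B}(\mathbb{R}^d) \otimes \mathcal{B}(\mathbb{R}^{n-d}),\ \lambda^d \times \lambda^{n-d} \big). \]
The next step is to see how we can integrate w.r.t. $\mu \times \nu$. The following two results are often stated together as the Fubini or Fubini–Tonelli theorem. We prefer to distinguish between them since the first result, Theorem 13.8, says that we can always swap iterated integrals of positive functions (even if we get $+\infty$), whereas 13.9 applies to more general functions but requires the (iterated) integrals to be finite.

13.8 Theorem (Tonelli) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and let $u : X \times Y \to [0,\infty]$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. Then
(i) $x \mapsto u(x,y)$, $y \mapsto u(x,y)$ are $\mathcal{A}$-, resp. $\mathcal{B}$-measurable for all $y \in Y$, resp. $x \in X$;
(ii) $x \mapsto \int_Y u(x,y)\,\nu(dy)$, $y \mapsto \int_X u(x,y)\,\mu(dx)$ are $\mathcal{A}$-, resp. $\mathcal{B}$-measurable;
(iii) $\int_{X \times Y} u\,d(\mu \times \nu) = \int_Y \int_X u(x,y)\,\mu(dx)\,\nu(dy) = \int_X \int_Y u(x,y)\,\nu(dy)\,\mu(dx)$ with values in $[0,\infty]$.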
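To see why positivity (in Tonelli’s theorem) or finiteness of the iterated integrals of $|u|$ (in Fubini’s theorem) cannot simply be dropped, one can look at counting measure on $\mathbb{N}_0 \times \mathbb{N}_0$. The sketch below uses a standard signed double sequence, which is an illustration chosen here and not taken from the text: $+1$ on the diagonal, $-1$ immediately to the right of it. Every inner sum has only finitely many non-zero terms, so the complete row and column sums can be evaluated exactly; the names `signed_term`, `rows_first` and `columns_first` are hypothetical.

```python
def signed_term(j, k):
    """+1 on the diagonal k == j, -1 at k == j + 1, and 0 elsewhere."""
    if k == j:
        return 1
    if k == j + 1:
        return -1
    return 0

def rows_first(n):
    """Sum over j < n of the complete inner sum over k.

    Row j vanishes for k > j + 1, so range(j + 2) covers the whole row.
    """
    return sum(sum(signed_term(j, k) for k in range(j + 2)) for j in range(n))

def columns_first(n):
    """Sum over k < n of the complete inner sum over j.

    Column k vanishes for j > k, so range(k + 1) covers the whole column.
    """
    return sum(sum(signed_term(j, k) for j in range(k + 1)) for k in range(n))

# Every row sums to 0, while column 0 sums to 1 and all other columns to 0.
print(rows_first(20), columns_first(20))  # prints: 0 1
```

The two iterated sums stay at 0 and 1 for every cut-off $n \ge 1$, so they have different limits. This does not contradict the theorems: the terms change sign and the double sum of their absolute values is infinite.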
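Proof Since $u$ is positive and $\mathcal{A} \otimes \mathcal{B}$-measurable, we find an increasing sequence of simple functions $f_j \in \mathcal{E}_+(\mathcal{A} \otimes \mathcal{B})$ with $\sup_{j\in\mathbb{N}} f_j = u$. Each $f_j$ is of the form $f_j(x,y) = \sum_{k=0}^{N(j)} \alpha_k 1_{E_k}(x,y)$, where $\alpha_k \ge 0$ and the $E_k \in \mathcal{A} \otimes \mathcal{B}$, $0 \le k \le N(j)$,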
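are disjoint. By Theorem 13.5, the fact that $\mathcal{M}(\mathcal{A} \otimes \mathcal{B})$ is a vector space and the linearity of the integral we conclude that
\[ x \mapsto f_j(x,y), \quad y \mapsto f_j(x,y), \quad x \mapsto \int_Y f_j(x,y)\,\nu(dy), \quad y \mapsto \int_X f_j(x,y)\,\mu(dx) \]
are measurable functions and (i), (ii) follow from the usual Beppo Levi argument since $\mathcal{M}(\mathcal{A})$ and $\mathcal{M}(\mathcal{B})$ are stable under increasing limits, cf. C8.9. Linearity of the integral and Theorem 13.5 also show
\[ \int_{X \times Y} f_j\,d(\mu \times \nu) = \int_Y \int_X f_j\,d\mu\,d\nu = \int_X \int_Y f_j\,d\nu\,d\mu \qquad \forall\, j \in \mathbb{N} \]
and (iii) follows from several applications of Beppo Levi’s theorem 9.6.

13.9 Corollary (Fubini’s theorem) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and let $u : X \times Y \to \bar{\mathbb{R}}$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. If at least one of the following three integrals is finite
\[ \int_{X \times Y} |u|\,d(\mu \times \nu), \qquad \int_Y \int_X |u(x,y)|\,\mu(dx)\,\nu(dy), \qquad \int_X \int_Y |u(x,y)|\,\nu(dy)\,\mu(dx), \]
then all three integrals are finite, $u \in \mathcal{L}^1(\mu \times \nu)$, and
(i) $x \mapsto u(x,y)$ is in $\mathcal{L}^1(\mu)$ for $\nu$-a.e. $y \in Y$;
(ii) $y \mapsto u(x,y)$ is in $\mathcal{L}^1(\nu)$ for $\mu$-a.e. $x \in X$;
(iii) $y \mapsto \int_X u(x,y)\,\mu(dx)$ is in $\mathcal{L}^1(\nu)$;
(iv) $x \mapsto \int_Y u(x,y)\,\nu(dy)$ is in $\mathcal{L}^1(\mu)$;
(v) $\int_{X \times Y} u\,d(\mu \times \nu) = \int_Y \int_X u(x,y)\,\mu(dx)\,\nu(dy) = \int_X \int_Y u(x,y)\,\nu(dy)\,\mu(dx)$.

Proof Tonelli’s theorem 13.8 shows that in $[0,\infty]$
\[ \int_{X \times Y} |u|\,d(\mu \times \nu) = \int_Y \int_X |u|\,d\mu\,d\nu = \int_X \int_Y |u|\,d\nu\,d\mu. \tag{13.5} \]
If one of the integrals is finite, all of them are finite and $u \in \mathcal{L}^1(\mu \times \nu)$ follows. Again by Tonelli’s theorem, $x \mapsto u^\pm(x,y)$ is $\mathcal{A}$-measurable and $y \mapsto \int u^\pm(x,y)\,\mu(dx)$ is $\mathcal{B}$-measurable. Since $u^\pm \le |u|$, (13.5) and C10.13 show that
\[ \int_X u^\pm(x,y)\,\mu(dx) \le \int_X |u(x,y)|\,\mu(dx) < \infty \qquad \text{for } \nu\text{-a.e. } y \in Y \]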
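and
\[ \int_Y \int_X u^\pm(x,y)\,\mu(dx)\,\nu(dy) \le \int_Y \int_X |u(x,y)|\,\mu(dx)\,\nu(dy) < \infty. \]
This proves (i) and (iii); (ii) and (iv) are shown in a similar way. Finally, (v) follows for $u^+$ and $u^-$ from Theorem 13.8 and for $u = u^+ - u^-$ by linearity, since (i)–(iv) exclude the possibility of ‘$\infty - \infty$’.

More on measurable functions

There is an alternative way to introduce the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$. Recall that the coordinate projections
\[ \pi_j : X_1 \times X_2 \to X_j, \qquad (x_1, x_2) \mapsto x_j, \qquad j = 1, 2, \]
induce the $\sigma$-algebra $\sigma(\pi_1, \pi_2)$ on $X_1 \times X_2$ which is by Definition 7.5 the smallest $\sigma$-algebra such that both $\pi_1$ and $\pi_2$ are measurable maps.

13.10 Theorem Let $(X_j, \mathcal{A}_j)$, $j = 1, 2$, and $(Z, \mathcal{C})$ be measurable spaces. Then
(i) $\mathcal{A}_1 \otimes \mathcal{A}_2 = \sigma(\pi_1, \pi_2)$;
(ii) $T : Z \to X_1 \times X_2$ is $\mathcal{C}/\mathcal{A}_1 \otimes \mathcal{A}_2$-measurable if, and only if, $\pi_j \circ T$ is $\mathcal{C}/\mathcal{A}_j$-measurable ($j = 1, 2$);
(iii) if $S : X_1 \times X_2 \to Z$ is measurable, then $S(x_1, \bullet)$ and $S(\bullet, x_2)$ are $\mathcal{A}_2/\mathcal{C}$- resp. $\mathcal{A}_1/\mathcal{C}$-measurable for every $x_1 \in X_1$, resp. $x_2 \in X_2$.

Proof (i) Since $\pi_1^{-1}(x) = \{x\} \times X_2$, $\pi_2^{-1}(y) = X_1 \times \{y\}$ and $A_1 \times A_2 = (A_1 \times X_2) \cap (X_1 \times A_2)$, we have
\[ \sigma(\pi_1, \pi_2) \overset{7.5}{=} \sigma\big( \pi_1^{-1}(\mathcal{A}_1),\ \pi_2^{-1}(\mathcal{A}_2) \big) = \sigma\big( \{ A_1 \times X_2,\ X_1 \times A_2 : A_j \in \mathcal{A}_j \} \big) \]
which shows that $\mathcal{A}_1 \times \mathcal{A}_2 \subset \sigma(\pi_1, \pi_2) \subset \mathcal{A}_1 \otimes \mathcal{A}_2$, hence $\sigma(\pi_1, \pi_2) = \mathcal{A}_1 \otimes \mathcal{A}_2$.

(ii) If $T : Z \to X_1 \times X_2$ is measurable, then so is $\pi_j \circ T$ by part (i) and T7.4. Conversely, if $\pi_j \circ T$, $j = 1, 2$, are measurable we find
\[ T^{-1}(A_1 \times A_2) = T^{-1}\big( \pi_1^{-1}(A_1) \cap \pi_2^{-1}(A_2) \big) = T^{-1}\big( \pi_1^{-1}(A_1) \big) \cap T^{-1}\big( \pi_2^{-1}(A_2) \big) = (\pi_1 \circ T)^{-1}(A_1) \cap (\pi_2 \circ T)^{-1}(A_2) \in \mathcal{C}. \]
Since $\mathcal{A}_1 \times \mathcal{A}_2$ generates $\mathcal{A}_1 \otimes \mathcal{A}_2$, $T$ is measurable by L7.2.

(iii) Fix $x_1 \in X_1$ and consider $y \mapsto S(x_1, y)$. Then $S(x_1, \bullet) = S \circ i_{x_1}(\bullet)$, where $i_{x_1} : X_2 \to X_1 \times X_2$, $y \mapsto (x_1, y)$. By part (ii), $i_{x_1}$ is $\mathcal{A}_2/\mathcal{A}_1 \otimes \mathcal{A}_2$-measurable since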
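the maps $\pi_j \circ i_{x_1}(x_2) = x_j$ are $\mathcal{A}_2/\mathcal{A}_j$-measurable ($j = 1, 2$). The claim follows now from T7.4.

Distribution functions

Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. For $u \in \mathcal{M}_+(\mathcal{A})$ the decreasing, left-continuous[✓] numerical function $t \mapsto \mu(\{u \ge t\})$ is called the distribution function of $u$ (under $\mu$). The next theorem shows that Lebesgue integrals still represent the area between the graph of a function and the abscissa.

13.11 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $u : X \to [0,\infty]$ be $\mathcal{A}$-measurable. Then
\[ \int u\,d\mu = \int_{(0,\infty)} \mu(\{u \ge t\})\,\lambda^1(dt) \in [0,\infty]. \tag{13.6} \]
Proof Consider the function $U(x,t) := (u(x), t)$ on $X \times [0,\infty)$. By Theorem 13.10(ii), $U$ is $\mathcal{A} \otimes \mathcal{B}[0,\infty)$-measurable, thus
\[ E := \{ (x,t) : u(x) \ge t \} \in \mathcal{A} \otimes \mathcal{B}[0,\infty). \]
An application of Tonelli’s theorem 13.8 shows
\[ \int u(x)\,\mu(dx) = \int_X \int_{(0,\infty)} 1_{(0,u(x)]}(t)\,\lambda^1(dt)\,\mu(dx) = \int_X \int_{(0,\infty)} 1_E(x,t)\,\lambda^1(dt)\,\mu(dx) \]
\[ = \int_{(0,\infty)} \int_X 1_E(x,t)\,\mu(dx)\,\lambda^1(dt) = \int_{(0,\infty)} \mu(\{u \ge t\})\,\lambda^1(dt). \]
If $\phi : [0,\infty) \to [0,\infty)$ is continuously differentiable, increasing and $\phi(0) = 0$, we even have in the setting of Theorem 13.11
\[ \int \phi(u)\,d\mu = \int_{(0,\infty)} \mu(\{\phi(u) \ge t\})\,\lambda^1(dt) \overset{(*)}{=} \int_0^\infty \mu(\{\phi(u) \ge t\})\,dt \overset{t = \phi(s)}{=} \int_0^\infty \phi'(s)\,\mu(\{\phi(u) \ge \phi(s)\})\,ds = \int_0^\infty \phi'(s)\,\mu(\{u \ge s\})\,ds. \]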
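The problem with this calculation is the step marked $(*)$ where we equate a Lebesgue integral with a Riemann integral. By Theorem 11.8(ii) we can do this if $t \mapsto \mu(\{\phi(u) \ge t\})$ is Lebesgue a.e. continuous and bounded. Boundedness is not a problem since we may consider $\mu(\{\phi(u) \ge t\}) \wedge N$, $N \in \mathbb{N}$, and let $N \to \infty$ using T9.6. For the a.e. continuity we need

13.12 Lemma Every monotone function $\phi : \mathbb{R} \to \mathbb{R}$ has at most countably many discontinuities and is, in particular, Lebesgue a.e. continuous.

Proof Without loss of generality we may assume that $\phi$ increases. Therefore, the one-sided limits $\lim_{s \uparrow t} \phi(s) = \phi(t-) \le \phi(t+) = \lim_{s \downarrow t} \phi(s)$ exist in $\mathbb{R}$, so that $\phi$ can only have jump discontinuities where $\phi(t-) < \phi(t+)$. Define for all $\epsilon > 0$
\[ J^\epsilon := \{ t \in \mathbb{R} : \Delta\phi(t) := \phi(t+) - \phi(t-) \ge \epsilon \}. \]
Since $\phi$ increases, the jumps at points in the interior of a compact interval $[a,b]$ add up to at most $\phi(b) - \phi(a) < \infty$. For every $\epsilon > 0$ we can therefore have at most $\frac{\phi(b) - \phi(a)}{\epsilon}$ such jumps of size $\epsilon$ or larger, that is $\#\big( [a,b] \cap J^\epsilon \big) < \infty$. Therefore, the set of all discontinuities of $\phi$
\[ J := \{ t \in \mathbb{R} : \Delta\phi(t) > 0 \} = \bigcup_{j,k\in\mathbb{N}} [-j,j] \cap J^{1/k} \]
is a countable set, hence a Lebesgue null set.

Since $t \mapsto \mu(\{u \ge t\})$ is decreasing, we finally have

13.13 Corollary Let $(X, \mathcal{A}, \mu)$ be $\sigma$-finite and let $\phi : [0,\infty) \to [0,\infty)$ with $\phi(0) = 0$ be increasing and continuously differentiable. Then
\[ \int \phi(u)\,d\mu = \int_0^\infty \phi'(s)\,\mu(\{u \ge s\})\,ds \tag{13.7} \]
holds for all $u \in \mathcal{M}_+(\mathcal{A})$; the right-hand side is an improper Riemann integral. Moreover, $\phi(u) \in \mathcal{L}^1(\mu)$ if, and only if, this Riemann integral is finite.

In the important special case where $\phi(t) = t^p$, $p \ge 1$, (13.7) reads
\[ \|u\|_p^p = \int |u|^p\,d\mu = \int_0^\infty p\,s^{p-1}\,\mu(\{|u| \ge s\})\,ds. \tag{13.8} \]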
Minkowski’s inequality for integrals

The following inequality is a generalization of Minkowski’s inequality C12.4 to double integrals. In some sense it is also a theorem on the change of the order of iterated integrals, but equality is only obtained if $p = 1$.

13.14 Theorem (Minkowski’s inequality for integrals) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and $u : X \times Y \to \bar{\mathbb{R}}$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. Then
\[ \left( \int_X \Big( \int_Y |u(x,y)|\,\nu(dy) \Big)^p \mu(dx) \right)^{1/p} \le \int_Y \Big( \int_X |u(x,y)|^p\,\mu(dx) \Big)^{1/p} \nu(dy) \]
holds for all $p \in [1,\infty)$, with equality for $p = 1$.

Proof If $p = 1$, the assertion follows directly from Tonelli’s theorem 13.8. If $p > 1$ we set
\[ U_k(x) := \Big( \int_Y |u(x,y)|\,\nu(dy) \wedge k \Big)\,1_{A_k}(x) \]
where $A_k \in \mathcal{A}$ is a sequence with $A_k \uparrow X$ and $\mu(A_k) < \infty$. Without loss of generality we may assume that $U_k(x) > 0$ on a set of positive $\mu$-measure, otherwise the left-hand side of the above inequality would be 0 (using Beppo Levi’s theorem 9.6) and there would be nothing to prove. By Tonelli’s theorem and Hölder’s inequality T12.2 with $\frac1p + \frac1q = 1$ or $q = \frac{p}{p-1}$, we find
\[ \int_X U_k^p(x)\,\mu(dx) \le \int_X U_k^{p-1}(x) \Big( \int_Y |u(x,y)|\,\nu(dy) \Big)\,\mu(dx) = \int_Y \int_X U_k^{p-1}(x)\,|u(x,y)|\,\mu(dx)\,\nu(dy) \]
\[ \le \int_Y \Big( \int_X U_k^p(x)\,\mu(dx) \Big)^{1 - 1/p} \Big( \int_X |u(x,y)|^p\,\mu(dx) \Big)^{1/p} \nu(dy). \]
The claim follows upon dividing both sides by $\big( \int_X U_k^p(x)\,\mu(dx) \big)^{1 - 1/p}$ and letting $k \to \infty$ with Beppo Levi’s theorem 9.6.
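For counting measures on two finite sets, Theorem 13.14 becomes an inequality between mixed norms of a matrix. The following sketch assumes that `u[x][y]` holds the value $u(x,y)$ on a finite grid; the function name `minkowski_integral_sides` is a made-up helper for this illustration.

```python
import random

def minkowski_integral_sides(u, p):
    """Return (lhs, rhs) of Theorem 13.14 for counting measures.

    u is a list of rows, u[x][y]; sums over y play the role of the
    nu-integral and sums over x the role of the mu-integral.
    """
    lhs = sum(sum(abs(v) for v in row) ** p for row in u) ** (1.0 / p)
    n_cols = len(u[0])
    rhs = sum(
        sum(abs(row[y]) ** p for row in u) ** (1.0 / p)
        for y in range(n_cols)
    )
    return lhs, rhs

rng = random.Random(1)  # fixed seed so that the run is reproducible
u = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
for p in (1.0, 2.0, 3.5):
    lhs, rhs = minkowski_integral_sides(u, p)
    # lhs should not exceed rhs, with equality (up to rounding) for p = 1.
    print(p, lhs, rhs)
```

With a single column the two sides coincide for every $p$, and with two columns the statement reduces to Minkowski’s inequality (12.8) for the column vectors.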
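Problems

13.1. Prove the rules (13.2) for Cartesian products.

13.2. Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two $\sigma$-finite measure spaces. Show that $A \times N$, where $A \in \mathcal{A}$ and $N \in \mathcal{B}$, $\nu(N) = 0$, is a $\mu \times \nu$-null set.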
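13.3. Denote by $\lambda$ Lebesgue measure on $(0,\infty)$. Prove that the following iterated integrals exist and that
\[ \int_{(0,\infty)} \int_{(0,\infty)} e^{-xy} \sin x \sin y\,\lambda(dx)\,\lambda(dy) = \int_{(0,\infty)} \int_{(0,\infty)} e^{-xy} \sin x \sin y\,\lambda(dy)\,\lambda(dx). \]
Does this imply that the double integral exists?

13.4. Denote by $\lambda$ Lebesgue measure on $(0,1)$. Show that the following iterated integrals exist, but yield different values:
\[ \int_{(0,1)} \int_{(0,1)} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,\lambda(dx)\,\lambda(dy) \ne \int_{(0,1)} \int_{(0,1)} \frac{x^2 - y^2}{(x^2 + y^2)^2}\,\lambda(dy)\,\lambda(dx). \]
What does this tell about the double integral?

13.5. Denote by $\lambda$ Lebesgue measure on $(-1,1)$. Show that the iterated integrals exist, coincide,
\[ \int_{(-1,1)} \int_{(-1,1)} \frac{xy}{(x^2 + y^2)^2}\,\lambda(dx)\,\lambda(dy) = \int_{(-1,1)} \int_{(-1,1)} \frac{xy}{(x^2 + y^2)^2}\,\lambda(dy)\,\lambda(dx), \]
but that the double integral does not exist.

13.6. (i) Prove that $\int_{(0,\infty)} e^{-tx}\,\lambda(dt) = \frac1x$ for all $x > 0$.
(ii) Use (i) and Fubini’s theorem to show that the sine integral
\[ \lim_{n\to\infty} \int_{(0,n)} \frac{\sin x}{x}\,\lambda(dx) = \frac{\pi}{2}. \]

13.7. Let $\mu(A) := \#A$ be the counting measure and $\lambda$ be Lebesgue measure on the measurable space $([0,1], \mathcal{B}[0,1])$. Denote by $\Delta := \{ (x,y) \in [0,1]^2 : x = y \}$ the diagonal in $[0,1]^2$. Check that
\[ \int_{[0,1]} \int_{[0,1]} 1_\Delta(x,y)\,\lambda(dx)\,\mu(dy) \ne \int_{[0,1]} \int_{[0,1]} 1_\Delta(x,y)\,\mu(dy)\,\lambda(dx). \]
Does this contradict Tonelli’s theorem?

13.8. (i) State Tonelli’s and Fubini’s theorems for spaces of sequences, i.e. for the measure space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), \mu)$ where $\mu := \sum_{j\in\mathbb{N}} \delta_j$, and obtain criteria when one can interchange two infinite summations.
(ii) Using similar considerations as in part (i) deduce the following.
Lemma Let $(A_j)_j$ be countably many (i.e. a finite or countably infinite number of) mutually disjoint sets whose union is $\mathbb{N}$, and let $(x_k)_{k\in\mathbb{N}} \subset \mathbb{R}$ be a sequence. Then
\[ \sum_{k\in\mathbb{N}} x_k = \sum_j \sum_{k \in A_j} x_k \]
in the sense that if either side converges absolutely, so does the other, in which case both sides are equal.

13.9. Let $u : \mathbb{R} \to [0,\infty)$ be a Borel measurable function. Denote by $S[u] := \{ (x,y) : 0 \le y \le u(x) \}$ the set above the abscissa and below the graph $\Gamma[u] := \{ (x, u(x)) : x \in \mathbb{R} \}$ of $u$.
(i) Show that $S[u] \in \mathcal{B}(\mathbb{R}^2)$.
(ii) Is it true that $\lambda^2(S[u]) = \int u\,d\lambda^1$?
(iii) Show that $\Gamma[u] \in \mathcal{B}(\mathbb{R}^2)$ and that $\lambda^2(\Gamma[u]) = 0$.
[Hint: (i) – use T8.8 to approximate $u$ by simple functions $f_j \uparrow u$. Thus $S[u] = \bigcup_j S[f_j]$ and $S[f_j] \in \mathcal{B}(\mathbb{R}^2)$ is easy to see; alternatively, use T13.10, set $U(x,y) := (u(x), y)$ and observe that $S[u] = U^{-1}(C)$ for the closed set $C := \{ (x,y) : x \ge y \}$; (ii) – use Tonelli’s theorem; (iii) – use $\Gamma[u] \subset S[u] \setminus S[(u - \epsilon)^+]$ or $\Gamma[u] = U^{-1}(\{ (x,y) : x = y \})$; show first that $\lambda^2(\Gamma[u] \cap [-n,n]^2) = 0$ for every $n \in \mathbb{N}$ and observe that $\Gamma[u] \cap [-n,n]^2 = \Gamma[u\,1_{[-n,n]} \wedge n]$.]

13.10. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $u \in \mathcal{M}_+(\mathcal{A})$ be a $[0,\infty]$-valued measurable function. Show that the set $Y := \{ y \in \mathbb{R} : \mu(\{ x : u(x) = y \}) \ne 0 \} \subset \mathbb{R}$ is countable.
[Hint: assume that $u \in \mathcal{L}^1_+(\mu)$. Set $Y_{\epsilon,\eta} := \{ y > \epsilon : \mu(\{u = y\}) > \eta \}$ and observe that for $t_1, \ldots, t_N \in Y_{\epsilon,\eta}$ we have $N \epsilon \eta \le \sum_{j=1}^N t_j\,\mu(\{u = t_j\}) \le \int u\,d\mu$. Thus $Y_{\epsilon,\eta}$ is a finite set, and $Y = \bigcup_{k,n\in\mathbb{N}} Y_{\frac1n, \frac1k}$ is countable. If $u$ is not integrable, consider $(u \wedge m)\,1_{A_m}$, $m \in \mathbb{N}$, where $A_m \uparrow X$ is an exhaustion.]

13.11. Completion (5). Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be any two measure spaces such that $\mathcal{A} \ne \mathcal{P}(X)$ and such that $\mathcal{B}$ contains non-empty null sets.
(i) Show that $\mu \times \nu$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ is not complete, even if both $\mu$ and $\nu$ were complete.
(ii) Conclude from (i) that neither $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}), \lambda \times \lambda)$ nor the product of the completed spaces $(\mathbb{R}^2, \mathcal{B}^*(\mathbb{R}) \otimes \mathcal{B}^*(\mathbb{R}), \bar\lambda \times \bar\lambda)$ are complete.
[Hint: you may assume in (ii) that $\mathcal{B}^*(\mathbb{R}) \ne \mathcal{P}(\mathbb{R})$.]

13.12. Let $\mu$ be a bounded measure on the measure space $([0,\infty), \mathcal{B}[0,\infty))$.
(i) Show that $A \in \mathcal{B}[0,\infty) \otimes \mathcal{P}(\mathbb{N})$ if, and only if, $A = \bigcup_{j\in\mathbb{N}} B_j \times \{j\}$, where $(B_j)_{j\in\mathbb{N}} \subset \mathcal{B}[0,\infty)$.
(ii) Show that there exists a unique measure $\pi$ on $\mathcal{B}[0,\infty) \otimes \mathcal{P}(\mathbb{N})$ satisfying
\[ \pi(B \times \{n\}) = \int_B e^{-t}\,\frac{t^n}{n!}\,\mu(dt). \]

13.13. Stieltjes measure (2). Stieltjes integrals. This continues Problem 7.9. Let $\mu$ and $\nu$ be two measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\mu([-n,n]), \nu([-n,n]) < \infty$ for all $n \in \mathbb{N}$, and denote by
\[ F(x) := \begin{cases} \mu((0,x]) & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -\mu((x,0]) & \text{if } x < 0 \end{cases} \qquad\text{and}\qquad G(x) := \begin{cases} \nu((0,x]) & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -\nu((x,0]) & \text{if } x < 0 \end{cases} \]
the associated right-continuous distribution functions (in Problem 7.9 we considered left-continuous distribution functions). Moreover, set $\Delta F(x) := F(x) - F(x-)$ and $\Delta G(x) := G(x) - G(x-)$.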
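(i) Show that $F, G$ are increasing, right-continuous and that $\Delta F(x) = 0$ if, and only if, $\mu\{x\} = 0$. Moreover, $F$ and $\mu$ are in one-to-one correspondence.
(ii) Since measures and distribution functions are in one-to-one correspondence, it is customary to write $\int u\,d\mu = \int u\,dF$, etc. If $a<b$ we set $B := \{(x,y) : a<x\le b,\ x\le y\le b\}$. Show that $B$ is measurable and that
\[ \mu\times\nu(B) = \int_{(a,b]} F(s)\,dG(s) - F(a)\bigl(G(b)-G(a)\bigr). \]
(iii) Integration by parts. Show that
\begin{align*}
F(b)G(b) - F(a)G(a) &= \int_{(a,b]} F(s)\,dG(s) + \int_{(a,b]} G(s-)\,dF(s)\\
&= \int_{(a,b]} F(s-)\,dG(s) + \int_{(a,b]} G(s-)\,dF(s) + \sum_{a<s\le b}\Delta F(s)\,\Delta G(s).
\end{align*}
[Hint: expand $\mu\times\nu((a,b]^2)$ in two different ways, using (ii). Note that the sum in the second part of the formula is at most countable because of L13.12.]
(iv) Change of variable formula. Let $\phi$ be a $C^1$-function. Then
\[ \phi(F(b)) - \phi(F(a)) = \int_{(a,b]} \phi'(F(s-))\,dF(s) + \sum_{a<s\le b}\bigl[\phi(F(s)) - \phi(F(s-)) - \phi'(F(s-))\,\Delta F(s)\bigr]. \]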
[Hint: use (iii) to show the change of variable formula for polynomials and then use the fact that continuous functions can be uniformly approximated by a sequence of polynomials, cf. Weierstraß’ approximation theorem 24.6.]

13.14. Rearrangements. Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and let $f\in\mathcal{L}^p(\mu)$ for some $p\in[1,\infty)$. The distribution function of $f$ is given by $\mu_f(t) := \mu(\{|f|\ge t\})$ and the decreasing rearrangement of $f$ is the generalized inverse of $\mu_f$,
\[ f^*(\xi) := \inf\{t : \mu_f(t)\le\xi\},\quad \xi\ge 0 \qquad (\inf\emptyset = +\infty). \]
(i) Let $f = 2\cdot 1_{[1,3]} + 4\cdot 1_{[4,5]} + 3\cdot 1_{[6,9]}$. Make a sketch of the graphs of $f(x)$, $\mu_f(t)$ and $f^*(\xi)$.
(ii) Show that for $f\in\mathcal{L}^p$
\[ \int |f|^p\,d\mu = p\int_0^\infty t^{p-1}\mu_f(t)\,dt = \int_0^\infty (f^*)^p\,d\lambda. \]
In other words: $\|f\|_p = \|f^*\|_p$. Because of this the space $\mathcal{L}^p$ is said to be rearrangement invariant.
14 Integrals with respect to image measures
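Let $(X,\mathcal{A},\mu)$ be a measure space and $(X',\mathcal{A}')$ be a measurable space. As we have seen in T7.6 and D7.7, any $\mathcal{A}/\mathcal{A}'$-measurable map $T:X\to X'$ can be used to transport the measure $\mu$, defined on $(X,\mathcal{A})$, to a measure on $(X',\mathcal{A}')$:
\[ T(\mu)(A') := \mu(T^{-1}(A'))\qquad \forall\, A'\in\mathcal{A}'. \tag{14.1} \]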
Let us see how (14.1) extends to integrals. To make sense of the integral $\int\ldots\,dT(\mu)$ w.r.t. the image measure $T(\mu)$, we use again the recipe from Chapters 9, 10 when we introduced the integral. First, we calculate the image integrals for indicator functions and, by linearity, for (positive) simple functions. By monotone convergence T9.6 we extend the resulting formula to all positive measurable functions and, finally, considering positive and negative parts, to the whole class $\mathcal{L}^1(T(\mu))$. This is the blueprint for the proof of

14.1 Theorem Let $T:X\to X'$ be a measurable map between the measure space $(X,\mathcal{A},\mu)$ and the measurable space $(X',\mathcal{A}')$. For every $\mathcal{A}'/\mathcal{B}(\bar{\mathbb{R}})$-measurable and $T(\mu)$-integrable function $u:X'\to\bar{\mathbb{R}}$ we find that $u\circ T:X\to\bar{\mathbb{R}}$ is $\mathcal{A}/\mathcal{B}(\bar{\mathbb{R}})$-measurable, $\mu$-integrable and satisfies
\[ \int_{X'} u\,dT(\mu) = \int_X u\circ T\,d\mu. \tag{14.2} \]
If $u\ge 0$ is positive, (14.2) remains valid without assuming $T(\mu)$-integrability.

Proof Since $u$ and $T$ are measurable, so is $u\circ T$, see T7.4, and the integrals in (14.2) are well-defined.
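Let us assume that $u\ge 0$ but not necessarily integrable, i.e. $\int u\,dT(\mu) = +\infty$ is allowed. For a simple function $f\in\mathcal{E}^+(\mathcal{A}')$,
\[ f = \sum_{j=0}^M y_j 1_{A'_j},\qquad A'_j\in\mathcal{A}',\ y_j\ge 0, \]
the identity (14.2) follows from (14.1) by linearity:
\begin{align*}
\int_{X'} f\,dT(\mu) &= \sum_{j=0}^M y_j\int_{X'} 1_{A'_j}\,dT(\mu) = \sum_{j=0}^M y_j\,T(\mu)(A'_j)\\
&\overset{(14.1)}{=} \sum_{j=0}^M y_j\,\mu(T^{-1}(A'_j))\\
&= \sum_{j=0}^M y_j\int_X 1_{T^{-1}(A'_j)}(x)\,\mu(dx)\\
&= \sum_{j=0}^M y_j\int_X 1_{A'_j}(T(x))\,\mu(dx) = \int_X f(T(x))\,\mu(dx), \tag{14.3}
\end{align*}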
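where we used that $1_{T^{-1}(A')}(x) = 1_{A'}(T(x))$ for all $A'\in\mathcal{A}'$. Since every $u\in\mathcal{M}^+(\mathcal{A}')$ is the limit of an increasing sequence of positive simple functions $f_j\in\mathcal{E}^+(\mathcal{A}')$, see T8.8, we can use Beppo Levi’s theorem 9.6 and (14.3) to get
\[ \int_{X'} u\,dT(\mu) \overset{9.6}{=} \sup_{j\in\mathbb{N}}\int_{X'} f_j\,dT(\mu) \overset{(14.3)}{=} \sup_{j\in\mathbb{N}}\int_X f_j\circ T\,d\mu \overset{9.6}{=} \int_X u\circ T\,d\mu. \]
If $u$ is $T(\mu)$-integrable, we write $u = u^+ - u^-$ and apply (14.2) to $u^\pm$ separately. All we have to do is to observe that $u^\pm\circ T = (u\circ T)^\pm$ and that, due to the integrability assumption,
\[ \int (u\circ T)^\pm\,d\mu \overset{(14.2)}{=} \int u^\pm\,dT(\mu) < \infty. \]

Often we are in the situation where $T:X\to X'$ is invertible with an inverse map $T^{-1}:X'\to X$. In this case we can strengthen Theorem 14.1.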
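14.2 Corollary If in the situation of Theorem 14.1 the measurable map $T:X\to X'$ has a measurable inverse $T^{-1}:X'\to X$, then $u:X'\to\bar{\mathbb{R}}$ is $T(\mu)$-integrable (and, a fortiori, $\mathcal{A}'/\mathcal{B}(\bar{\mathbb{R}})$-measurable) if, and only if, $u\circ T$ is $\mu$-integrable (and, a fortiori, $\mathcal{A}/\mathcal{B}(\bar{\mathbb{R}})$-measurable). In this case (14.2) holds.

Proof Apply Theorem 14.1 to $u$ and $u\circ T$ using the measurable transformations $T$ and $T^{-1}$ respectively.

14.3 Examples We will frequently encounter the following particular situation of Corollary 14.2: let $(X,\mathcal{A},\mu)$ be $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n),\lambda^n)$ where $\lambda^n$ is $n$-dimensional Lebesgue measure and let $(X',\mathcal{A}') = (\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$. The maps
\[ \sigma:\mathbb{R}^n\to\mathbb{R}^n,\ y\mapsto -y \qquad\text{and}\qquad \tau_x:\mathbb{R}^n\to\mathbb{R}^n,\ y\mapsto y-x \]
are continuous, hence measurable (Example 7.3) and so are their inverses $\sigma^{-1} = \sigma$ and $\tau_x^{-1} = \tau_{-x}$.

(i) By Corollary 7.11, Lebesgue measure $\lambda^n$ is invariant under reflections and translations, so that $\lambda^n = \sigma(\lambda^n)$ and $\lambda^n = \tau_x(\lambda^n)$ for all $x\in\mathbb{R}^n$. But then (14.2) becomes
\[ \int u(-y)\,\lambda^n(dy) = \int u(\sigma(y))\,\lambda^n(dy) = \int u(y)\,\sigma(\lambda^n)(dy) = \int u(y)\,\lambda^n(dy) \tag{14.4} \]
and, for all $x\in\mathbb{R}^n$,
\[ \int u(y\mp x)\,\lambda^n(dy) = \int u(\tau_{\pm x}(y))\,\lambda^n(dy) = \int u(y)\,\tau_{\pm x}(\lambda^n)(dy) = \int u(y)\,\lambda^n(dy). \tag{14.5} \]

(ii) If we consider Lebesgue measure with a density $f\ge 0$, $f\lambda^n$, cf. L10.8, we find
\begin{align*}
\int u(y)\,\tau_x(f\lambda^n)(dy) &= \int u(\tau_x(y))\,f(y)\,\lambda^n(dy)\\
&= \int u(\tau_x(y))\,f(\tau_{-x}(\tau_x(y)))\,\lambda^n(dy) = \int u(y)\,f(y+x)\,\lambda^n(dy), \tag{14.6}
\end{align*}
which also proves that $\tau_x(f\lambda^n) = (f\circ\tau_{-x})\,\lambda^n$.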
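The transformation rule (14.2) can be checked by hand on a finitely supported measure, where every integral is a finite sum. The following is only a minimal sketch: the measure `mu`, the map `T` and the function `u` are hypothetical data chosen for illustration, and exact rational arithmetic is used so that both sides can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction


def image_measure(mu, T):
    """Return T(mu), i.e. the measure A' -> mu(T^{-1}(A')), as a dict of point masses."""
    out = defaultdict(Fraction)
    for x, mass in mu.items():
        out[T(x)] += mass  # the mass sitting at x is transported to T(x)
    return dict(out)


def integrate(u, measure):
    """Integral of u with respect to a finitely supported measure (a finite sum)."""
    return sum(u(point) * mass for point, mass in measure.items())


# Hypothetical example data: a measure on X = {-2, -1, 0, 1, 2} and T(x) = x^2.
mu = {-2: Fraction(1, 10), -1: Fraction(2, 10), 0: Fraction(3, 10),
      1: Fraction(1, 10), 2: Fraction(3, 10)}
T = lambda x: x * x
u = lambda y: 3 * y + 1

lhs = integrate(u, image_measure(mu, T))   # integral of u w.r.t. the image measure T(mu)
rhs = integrate(lambda x: u(T(x)), mu)     # integral of u o T w.r.t. mu
```

Here `T` is not injective, so several point masses of `mu` are merged in `T(mu)`; the two integrals nevertheless agree, as (14.2) predicts.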
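Convolutions

The convolution or Faltung of functions and measures on $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ appears naturally in functional analysis, Fourier analysis, probability theory and other branches of mathematics. One can understand it as an averaging process that respects translations and results in a gain of smoothness.

14.4 Definition Let $\mu$ and $\nu$ be measures on $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ and $u,v:\mathbb{R}^n\to\bar{\mathbb{R}}$ be measurable numerical functions. The convolution of …
• …two functions $u$ and $v$ is the function
\[ u\star v(x) := \int u(x-y)\,v(y)\,\lambda^n(dy) \tag{14.7} \]
provided $u(x-\bullet)\,v$ is positive or contained in $\mathcal{L}^1(\lambda^n)$;
• …of the function $u$ and the measure $\mu$ is the function
\[ u\star\mu(x) := \int u(x-y)\,\mu(dy) \tag{14.8} \]
provided $u(x-\bullet)$ is positive or contained in $\mathcal{L}^1(\mu)$;
• …of two measures $\mu$ and $\nu$ is the measure
\[ \mu\star\nu(B) := \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} 1_B(x+y)\,\mu(dx)\,\nu(dy),\qquad B\in\mathcal{B}(\mathbb{R}^n). \tag{14.9} \]

14.5 Remarks (i) The convolution of two functions (or of a function with a measure or of two measures) is linear in each of its arguments, e.g.
\[ (\alpha u+\beta v)\star w = \alpha\,u\star w + \beta\,v\star w,\qquad \alpha,\beta\in\mathbb{R}. \]
Similar formulae hold in the second argument and for the other cases.

(ii) The function $\eta:\mathbb{R}^{2n}\to\mathbb{R}^n$, $\eta(x,y) := x+y$, is Borel measurable and
\[ \mu\star\nu(B) = \iint 1_B(x+y)\,\mu(dx)\,\nu(dy) = \int 1_{\eta^{-1}(B)}(x,y)\,d(\mu\times\nu)(x,y) = \eta(\mu\times\nu)(B). \]
If $\mu$ and $\nu$ have densities $u,v\ge 0$ w.r.t. Lebesgue measure, that is $\mu = u\lambda^n$ and $\nu = v\lambda^n$, we find for all $B\in\mathcal{B}(\mathbb{R}^n)$ that
\[ (u\lambda^n)\star(v\lambda^n)(B) = \iint 1_B(x+y)\,u(x)\,v(y)\,\lambda^n(dx)\,\lambda^n(dy) = \int_B\Bigl(\int u(x-y)\,v(y)\,\lambda^n(dy)\Bigr)\lambda^n(dx), \]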
where we used Tonelli’s theorem 13.8 and (14.6). Thus $(u\lambda^n)\star(v\lambda^n) = (u\star v)\,\lambda^n$; in a similar way one shows that $(u\star\mu)\,\lambda^n = (u\lambda^n)\star\mu$. Interpreting (14.9) as $\int 1_B\,d(\mu\star\nu) = \iint 1_B(x+y)\,\mu(dx)\,\nu(dy)$, we easily see that
\[ \int\phi\,d(\mu\star\nu) = \iint\phi(x+y)\,\mu(dx)\,\nu(dy) \]
first for simple functions, then by T9.6 for positive measurable functions, and finally by linearity for general $\phi\in\mathcal{L}^1(\mu\star\nu)$.

Note that the definition of $u\star v$ is not really straightforward since we require $u(x-\bullet)\,v$ to be positive or integrable. Here is a much handier criterion due to W.H. Young.

14.6 Theorem (Young’s inequality) Let $u\in\mathcal{L}^1(\lambda^n)$ and $v\in\mathcal{L}^p(\lambda^n)$, $p\in[1,\infty)$. Then the convolution $u\star v$ defines a function in $\mathcal{L}^p(\lambda^n)$, satisfies $u\star v = v\star u$ and
\[ \|u\star v\|_p \le \|u\|_1\cdot\|v\|_p. \tag{14.10} \]
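Proof Assume first that $u,v\ge 0$. Let $\delta(x,y) := x-y$. Then $\delta$ is Borel measurable and so are $(x,y)\mapsto u(x-y)\,v(y)$ and $(x,y)\mapsto u(y)\,v(x-y)$. Since $\lambda^n$ is invariant under translations, we see using (14.4)–(14.6)
\[ u\star v(x) = \int u(x-y)\,v(y)\,\lambda^n(dy) = \int u(y)\,v(x-y)\,\lambda^n(dy) = v\star u(x). \]
Moreover,
\begin{align*}
\|u\star v\|_p^p &= \int\Bigl(\int u(y)\,v(x-y)\,\lambda^n(dy)\Bigr)^p\lambda^n(dx)\\
&= \|u\|_1^p\int\Bigl(\int v(x-y)\,\frac{u(y)}{\|u\|_1}\,\lambda^n(dy)\Bigr)^p\lambda^n(dx)\\
&\overset{12.14}{\le} \|u\|_1^p\iint v(x-y)^p\,\frac{u(y)}{\|u\|_1}\,\lambda^n(dy)\,\lambda^n(dx)\\
&\overset{13.8}{=} \|u\|_1^p\int\frac{u(y)}{\|u\|_1}\underbrace{\Bigl(\int v(x-y)^p\,\lambda^n(dx)\Bigr)}_{=\,\|v\|_p^p\ \text{by (14.5)}}\lambda^n(dy)\\
&= \|u\|_1^p\,\|v\|_p^p,
\end{align*}
which implies that $u\star v\in\mathcal{L}^p(\lambda^n)$. The general case follows now from considering $u = u^+-u^-$ and $v = v^+-v^-$ and the fact that $(u^+-u^-)\star v^\pm = u^+\star v^\pm - u^-\star v^\pm$, where the difference is a.e. defined since $u^\pm\star v^\pm\in\mathcal{L}^p(\lambda^n)$ is a.e. finite.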
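Young's inequality (14.10) and the symmetry $u\star v = v\star u$ can be observed numerically in a discrete analogue. The sketch below is an illustration under a stated assumption: Lebesgue measure $\lambda^n$ is replaced by counting measure on $\mathbb{Z}$, so that functions become finite sequences and integrals become sums. The sequences `u` and `v` are arbitrary example data and not taken from the text.

```python
def convolve(u, v):
    """Discrete convolution (u * v)[k] = sum_j u[j] * v[k - j] of two finite sequences."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def p_norm(seq, p):
    """p-norm of a finite sequence with respect to counting measure."""
    return sum(abs(t) ** p for t in seq) ** (1.0 / p)


# Arbitrary example sequences (hypothetical data).
u = [0.5, -1.0, 2.0]
v = [1.0, 3.0, -2.0, 0.25]

uv = convolve(u, v)
vu = convolve(v, u)

for p in (1, 1.5, 2, 3):
    # Discrete counterpart of (14.10): ||u * v||_p <= ||u||_1 * ||v||_p
    print(p, p_norm(uv, p), p_norm(u, 1) * p_norm(v, p))
```

For each exponent the first printed norm should not exceed the second, and `uv` and `vu` should agree up to rounding.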
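The convolution $u\star v$ is a hybrid of $u$ and $v$ which inherits those properties which are preserved under translations and averages, cf. Problems 14.6–14.8. In general, $u\star v$ is smoother than $u$ and $v$. To see this, we need the following result which, although similar to Corollary 12.11, is much deeper and uses the topological structure of $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n),\lambda^n)$.

14.7 Lemma The continuous functions with compact support $C_c(\mathbb{R}^n)$ are a dense subset of $\mathcal{L}^p(\lambda^n)$, $p\in[1,\infty)$.

Proof Postponed to Chapter 15, Theorem 15.17.

14.8 Theorem Let $u\in\mathcal{L}^p(\lambda^n)$, $p\in[1,\infty)$.
(i) The map $x\mapsto\int|u(x+y)-u(y)|^p\,\lambda^n(dy)$ is uniformly continuous.
(ii) If $u\in\mathcal{L}^1(\lambda^n)$, $v\in\mathcal{L}^\infty(\lambda^n)$, then $u\star v$ is bounded and continuous.

Proof (i) Because of Lemma 14.7 we find for every $\epsilon>0$ some $\phi_\epsilon\in C_c(\mathbb{R}^n)$ such that $\|u-\phi_\epsilon\|_p\le\epsilon$. By the lower triangle inequality for $\|\bullet\|_p$ and the translation invariance of $\lambda^n$ we find for any two $x,x'\in\mathbb{R}^n$
\[ \bigl|\,\|u(x+\bullet)-u\|_p - \|u(x'+\bullet)-u\|_p\,\bigr| \le \|u(x+\bullet)-u(x'+\bullet)\|_p \overset{(14.5)}{=} \|u(x-x'+\bullet)-u\|_p. \]
Using again the triangle inequality and translation invariance we get for every $R>0$ and all $x,x'$ with $|x-x'|<R/2$
\begin{align*}
\|u(x-x'+\bullet)-u\|_p &\le \|(u(x-x'+\bullet)-u)\,1_{B_R(0)}\|_p + \|(u(x-x'+\bullet)-u)\,1_{B_R^c(0)}\|_p\\
&\le \|(u(x-x'+\bullet)-u)\,1_{B_R(0)}\|_p + 2\Bigl(\int_{B_{R/2}^c(0)}|u|^p\,d\lambda^n\Bigr)^{1/p}.
\end{align*}
Since $u\in\mathcal{L}^p(\lambda^n)$, it follows from the monotone convergence theorem 11.1 that $\lim_{R\to\infty}\int_{B_R^c(0)}|u|^p\,d\lambda^n = 0$, so that we can achieve
\[ \Bigl(\int_{B_{R/2}^c(0)}|u|^p\,d\lambda^n\Bigr)^{1/p}\le\epsilon\qquad\forall\,R>R_\epsilon. \]
Since $\phi_\epsilon$ is continuous with compact support, it is uniformly continuous, which means that there is a $\delta = \delta_{R,\epsilon}>0$ such that for all $y\in\mathbb{R}^n$, $|x|<\delta$, and any fixed $R>R_\epsilon$ we have $|\phi_\epsilon(x+y)-\phi_\epsilon(y)|\le\epsilon/\lambda^n(B_R(0))^{1/p}$.

Another application of the triangle inequality for $\|\bullet\|_p$ and translation invariance yields
\begin{align*}
\|(u(x-x'+\bullet)-u)\,1_{B_R(0)}\|_p &\le \|(u(x-x'+\bullet)-\phi_\epsilon(x-x'+\bullet))\,1_{B_R(0)}\|_p + \|(u-\phi_\epsilon)\,1_{B_R(0)}\|_p\\
&\qquad + \|(\phi_\epsilon(x-x'+\bullet)-\phi_\epsilon)\,1_{B_R(0)}\|_p\\
&\le 2\,\|u-\phi_\epsilon\|_p + \Bigl(\int_{B_R(0)}\underbrace{|\phi_\epsilon(x-x'+y)-\phi_\epsilon(y)|^p}_{\le\,\epsilon^p/\lambda^n(B_R(0))\ \text{if}\ |x-x'|<\delta}\lambda^n(dy)\Bigr)^{1/p}\\
&\le 3\,\epsilon.
\end{align*}
Combining the above estimates, we get
\[ \|u(x+\bullet)-u(x'+\bullet)\|_p\le 5\,\epsilon\qquad\forall\,x,x',\ |x-x'|<\delta<R/2, \]
which is but uniform continuity.

(ii) We have for any $x,x'\in\mathbb{R}^n$
\begin{align*}
|u\star v(x)-u\star v(x')| &\le \int|v(y)\,u(x-y)-v(y)\,u(x'-y)|\,\lambda^n(dy)\\
&\le \|v\|_\infty\,\|u(x-\bullet)-u(x'-\bullet)\|_1\\
&\overset{(14.4)}{=} \|v\|_\infty\,\|u(x+\bullet)-u(x'+\bullet)\|_1
\end{align*}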
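and continuity follows from part (i). The boundedness of $u\star v$ is proved with a similar calculation.

Problems

14.1. Let $(X,\mathcal{A},\mu)$ be a measure space and $T:X\to X$ be a bijective measurable map whose inverse $T^{-1}:X\to X$ is again measurable. Show that for every $f\in\mathcal{M}^+(\mathcal{A})$ one has
\[ \int u\,dT(f\mu) = \int u\circ T\,f\,d\mu = \int u\,f\circ T^{-1}\,dT(\mu) = \int u\,d\bigl(f\circ T^{-1}\,T(\mu)\bigr). \]

14.2. Let $(X,\mathcal{A},\mu)$ be a measure space and $(Y,\mathcal{B})$ be a measurable space. Assume that $T:A\to B$, $A\in\mathcal{A}$, $B\in\mathcal{B}$, is an invertible measurable map. Show that $T(\mu)|_B = T(\mu|_A)$ with the restrictions $\mu|_A(\bullet) := \mu(A\cap\bullet)$ and $T(\mu)|_B(\bullet) := T(\mu)(B\cap\bullet)$.

14.3. Let $\mu$ be a measure on $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ and $x,y,z\in\mathbb{R}^n$. Find $\delta_x\star\delta_y$ and $\delta_z\star\mu$.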
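14.4. Let $\mu,\nu$ be two $\sigma$-finite measures on $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$. Show that $\mu\star\nu$ has no atoms (cf. Problem 6.5) if $\mu$ has no atoms.

14.5. Let $P$ be a probability measure on $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ and denote by $P^{\star n}$ the $n$-fold convolution product $P\star P\star\cdots\star P$. Show that
\[ \int|\omega|\,P^{\star n}(d\omega)\le n\int|\omega|\,P(d\omega); \]
if $\int|\omega|\,P(d\omega)<\infty$, then
\[ \int\omega\,P^{\star n}(d\omega) = n\int\omega\,P(d\omega). \]

14.6. Let $p:\mathbb{R}\to\mathbb{R}$ be a polynomial and $u\in C_c(\mathbb{R})$. Show that $u\star p$ exists and is again a polynomial.

14.7. Let $w:\mathbb{R}\to\mathbb{R}$ be an increasing (hence, measurable, by Problem 8.18) and bounded function. Show that for every $u\in\mathcal{L}^1(\lambda^1)$ the convolution $u\star w$ is again increasing, bounded and continuous.

14.8. Assume that $u\in C_c(\mathbb{R})$ and $w\in C^\infty(\mathbb{R})$. Show that $u\star w$ exists, is of class $C^\infty$ and satisfies $\partial^j(u\star w) = u\star\partial^j w$.

14.9. Young’s inequality. Adapt the proof of Theorem 14.6 and show that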
u w r u p · w q for all p q r ∈ 1 , u ∈ p , w ∈ q and r −1 + 1 = p−1 + q −1 . 14.10. Friedrichs mollifiers. Let n → + be a C -function such that dn = 1 and supp = B1 0. For > 0 define the function x = −n x/. The function u is called the Friedrichs mollifier of u ∈ p , 1 p < . (i) Show that x = exp1/x2 − 1 1B1 0 x has, for a suitable > 0, the properties mentioned above. Determine . (ii) Show that ∈ Cc n , supp = B 0, and 1 = 1. (iii) Show that supp u ⊂ supp u + supp = y ∀ x ∈ supp u x − y . (iv) Show that u is in C ∩ p and
u p u p
∀ > 0
(v) Show that Lp -lim→0 u = u. [Hint: split the region of integration as in the proof Theorem 14.8 and use the uniform boundedness shown in (iv).] 14.11. Define → by x = 1−cos x 102 x and let ux = 1, vx = x, and wx = −x t dt. Then
(i) u vx = 0 for all x ∈ ; (ii) u wx = x > 0 for all x ∈ 0 4; (iii) u v w ≡ 0 = u v w. Does this contradict the commutativity of the convolution which was used in Theorem 14.6?
15 Integrals of images and Jacobi’s transformation rule1
The previous chapter dealt with image measures and, by their definition, with measures of pre-images of sets. Sometimes one needs to know the measure of the direct image of a set under $T:X\to Y$. If $T^{-1}$ exists and is measurable, we can apply the results of Chapter 14 to $S := T^{-1}$ and we are done. If, however, $T^{-1}$ is not measurable, the direct image $T(A)$ of a set $A\in\mathcal{A}$ need not be $\mathcal{B}$-measurable; in particular, an expression of the type $\nu(T(A))$ – here $\nu$ is any measure on $(Y,\mathcal{B})$ – may not be well-defined, let alone a measure. Let us consider this problem in a very particular setting, where
\[ X\subset\mathbb{R}^n,\quad Y\subset\mathbb{R}^d\quad\text{and}\quad\mu,\nu\ \text{are Lebesgue measures}\ \lambda^n\ \text{resp.}\ \lambda^d. \]
We need some notation: if $\Phi:X\to\mathbb{R}^d$, we write $\Phi = (\Phi_1,\Phi_2,\ldots,\Phi_d)$ for its components and we set for vectors $x = (x_1,\ldots,x_n)\in\mathbb{R}^n$ and matrices $A = (a_{jk})_{j=1,\ldots,n,\ k=1,\ldots,d}$
\[ |x|_\infty := \max_{1\le j\le n}|x_j|,\qquad |A|_\infty := \max_{1\le j\le n,\ 1\le k\le d}|a_{jk}|. \tag{15.1} \]
A set $F\subset\mathbb{R}^n$ [$G\subset\mathbb{R}^n$] is called an $F_\sigma$-set [$G_\delta$-set] if it is the countable union of closed sets [countable intersection of open sets], i.e. if
\[ F = \bigcup_{j\in\mathbb{N}} C_j\qquad\Bigl[\,G = \bigcap_{j\in\mathbb{N}} U_j\Bigr] \tag{15.2} \]
for closed sets $C_j$ [open sets $U_j$]. Obviously, both $F_\sigma$- and $G_\delta$-sets are Borel sets; but, in general, neither are $F_\sigma$-sets closed nor are $G_\delta$-sets open.
[Footnote 1: The proofs in this chapter can be left out at first reading.]
15.1 Theorem Let F ⊂ n be an F -set and F → d be an -Hölder continuous map, that is x − y L x − y
∀ x y ∈ F
(15.3)
with constant L and exponent ∈ 0 1. For every F -set E ⊂ n , F ∩ E is an F -set in d , hence Borel measurable. If d n, we have d F ∩ E Ld n E
(15.4)
Proof Since E F are F -sets, they have representations E = F = j∈ Cj with closed sets j Cj ⊂ n . Moreover, E ∩ Bk 0 = j ∩ Bk 0 = K E= k∈
kj∈
j∈ j
and
∈
where K ∈ is an enumeration of the family j ∩ Bk 0jk∈ of closed and bounded, hence compact, sets. Thus Cj ∩ K is a compact set, and since images of compact sets under continuous maps are compact, we see that Cj ∩ K is compact and, in particular, closed. So, (2.4) F ∩ E = Cj ∩ K j ∈
is an F -set. Assume now that d n. If n E = , (15.4) is trivial and we will consider only the case n E < . The proof of Carathéodory’s extension theorem 6.1 – in particular (6.1) – for = n and the semi-ring = n of n-dimensional half-open rectangles (cf. P6.4) shows that we can find for every > 0 a sequence Jj j∈ ⊂ n with n Jj and Jj n E + (15.5) E⊂ j∈
j∈
Without loss of generality we can assume that all Jj are squares, i.e. have sides of equal length sj < s < 1, otherwise we could subdivide each Jj into finitely many non-overlapping squares of this type.[] So, 4.6
F ∩ Jj d F ∩ Jj (15.6) d F ∩ E d j∈
j∈
which means that it is enough to check (15.4) for a square E = J of side-length s < 1 and centre c ∈ n . Because of (15.3), n d J = × ck − 21 s ck + 21 s ⊂ × k c − L2 s1/ k c + L2 s1/ k=1
k=1
and (notice that n J 1 and d/ n 1)
d/ n d F ∩ J Ls1/ d = Ld n J Ld n J From (15.5), (15.6) we conclude d F ∩ E
Ld n Jj Ld n E +
j∈
and the claim follows upon letting → 0. Theorem 15.1 can be improved if we use the completed Borel- -algebra cf. Problems 4.13, 6.2, 10.11 and 10.12. Recall that
∗ n ,
B∗ ∈ ∗ n ⇐⇒
B∗ = B ∪ N for some B ∈ n and a subset N of a Borel null set.
The advantage of ∗ n over n is that -Hölder continuous maps n → d map ∗ n -measurable sets into ∗ d -sets if d n; this is not true for n . To see this we need a few preparations. 15.2 Lemma Let B ∈ n be a Borel set. Then there exists an F -set F and a G -set G such that F ⊂B⊂G
and
n F = n B = n G
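Proof The proof consists of three stages:

Step 1: Construction of the set $G$. If $\lambda^n(B)=\infty$, we take $G=\mathbb{R}^n$. If $\lambda^n(B)<\infty$, we find as in the proof of Theorem 15.1 (or as in Carathéodory’s extension theorem 6.1, (6.1), with $\mu=\lambda^n$ and $\mathcal{S}=\mathcal{J}^n$) for every $k\in\mathbb{N}$ a sequence of half-open squares $(J_j^k)_{j\in\mathbb{N}}\subset\mathcal{J}^n$ of side-length $s_j$ such that
\[ B\subset\bigcup_{j\in\mathbb{N}} J_j^k\qquad\text{and}\qquad\sum_{j\in\mathbb{N}}\lambda^n(J_j^k)\le\lambda^n(B)+\frac1k. \]
We can now enlarge $J_j^k$ by moving the lower left corner by $\epsilon_j := \bigl(s_j^n+2^{-j}/k\bigr)^{1/n}-s_j$ units ‘to the left’ in each coordinate direction. The new open square $\tilde J_j^k$ has volume
\[ \lambda^n(\tilde J_j^k) = \lambda^n(J_j^k)+2^{-j}\,\frac1k \]
[Figure: the half-open square $J_j^k$ of side-length $s_j$ and the enlarged open square $\tilde J_j^k$ of side-length $s_j+\epsilon_j$.]
and we see that the open sets $G^k := \bigcup_{j\in\mathbb{N}}\tilde J_j^k\supset B$ satisfy
\[ \lambda^n(G^k)\overset{4.6}{\le}\sum_{j\in\mathbb{N}}\lambda^n(\tilde J_j^k) = \sum_{j\in\mathbb{N}}\lambda^n(J_j^k)+\frac1k\le\lambda^n(B)+\frac1k+\frac1k. \]
Replacing $G^k$ by $G^1\cap\cdots\cap G^k$ if necessary, we may assume that the sets $G^k$ decrease. Thus $G := \bigcap_{k\in\mathbb{N}} G^k$ is a $G_\delta$-set with $G\supset B$, and
\[ \lambda^n(B)\le\lambda^n(G)\overset{4.4}{=}\lim_{k\to\infty}\lambda^n(G^k)\le\lim_{k\to\infty}\Bigl(\lambda^n(B)+\frac2k\Bigr) = \lambda^n(B). \]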
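Step 2: Construction of the set $F$ if $\lambda^n(B)<\infty$. Denote by $\bar B$ the closure of $B$ [Footnote 2: i.e. the smallest closed set containing $B$, cf. Appendix B, Definition B.3(iii).]. Since $\bar B\setminus B$ is a Borel set, we find as in step 1 open sets $U^k$ with
\[ \bar B\setminus B\subset U^k\qquad\text{and}\qquad\lambda^n(U^k)\le\lambda^n(\bar B\setminus B)+\frac1k. \tag{15.7} \]
Observe that
\[ B\subset(B\setminus U^k)\cup(U^k\cap B)\subset(\bar B\setminus U^k)\cup\bigl(U^k\setminus(\bar B\setminus B)\bigr) \]
so that by the subadditivity of measures
\begin{align*}
\lambda^n(B) &\le \lambda^n(\bar B\setminus U^k)+\lambda^n\bigl(U^k\setminus(\bar B\setminus B)\bigr)\\
&= \lambda^n(\bar B\setminus U^k)+\lambda^n(U^k)-\lambda^n(\bar B\setminus B)\\
&\overset{(15.7)}{\le} \lambda^n(\bar B\setminus U^k)+\frac1k.
\end{align*}
By construction, $C_k := \bar B\setminus U^k\subset\bar B\setminus(\bar B\setminus B) = B$ is a closed set and $F := \bigcup_{k\in\mathbb{N}} C_k\subset B$ is an $F_\sigma$-set satisfying
\[ \lambda^n(B)-\frac1k\le\lambda^n(C_k)\le\lambda^n\Bigl(\bigcup_{j\in\mathbb{N}} C_j\Bigr) = \lambda^n(F)\le\lambda^n(B). \]
The claim follows as $k\to\infty$.

Step 3: Construction of the set $F$ if $\lambda^n(B)=\infty$. Setting
\[ B_j := B\cap\bigl(B_j(0)\setminus B_{j-1}(0)\bigr),\qquad j\in\mathbb{N}, \]
we get a disjoint partitioning of $B$ into sets $B_j$, $j\in\mathbb{N}$, where each set $B_j$ is a Borel set with finite volume. Applying step 2 to each $B_j$, we find $F_\sigma$-sets $F_j\subset B_j$ with $\lambda^n(F_j) = \lambda^n(B_j)$, $j\in\mathbb{N}$. Since the $B_j$ are mutually disjoint, so are the $F_j$, and since $F := \bigcup_{j\in\mathbb{N}} F_j$ is again an $F_\sigma$-set (cf. Problem 15.1) we end up with $F\subset B$ and
\[ \lambda^n(F) = \sum_{j\in\mathbb{N}}\lambda^n(F_j) = \sum_{j\in\mathbb{N}}\lambda^n(B_j) = \lambda^n(B). \]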
The proof of the lemma is now complete. 15.3 Lemma Let n → d be an -Hölder continuous map with ∈ 0 1 and d n. If N ∗ is a subset of a Borel null set N ∈ n , then N ∗ is a subset of a Borel null set M ∈ d . Proof Since N ∗ ⊂ N ∈ n where n N = 0, we can repeat the argument of the proof of Theorem 15.1 to find for k ∈ a covering of N by half-open squares Jjk ∈ n such that N⊂
Jjk
and
n
j∈
1 Jjk n Jjk k j∈ j∈
Since n Jjk = n J¯jk , J¯jk is the closed square, we have also n
1 J¯jk n J¯jk k j∈ j∈
Applying T15.1 to the F -set F k = well as
¯k j∈ Jj
shows that
d F k Ld n F k Since
k∈ F
k
k∈ F
k ∈ n
as
Ld k
⊃ N ⊃ N ∗ , we conclude
d
Ld k→ F d F k −−−→ 0 k ∈
Lemma 15.3 is just a special case of the following theorem which has already been announced above. 15.4 Theorem Let F ⊂ n be an F -set, F → d be an -Hölder continuous map with exponent ∈ 0 1. If d n, then maps the completed Borel
-algebra F ∩ ∗ n into ∗ d , and the inequality (15.4) holds for all B ∈ ∗ n with the completed Lebesgue measures3 ¯n and ¯d . 3
See Problems 4.13, 6.2, 10.11, 10.12, 13.3 for the completion of measures and their properties.
Proof Pick B∗ ∈ ∗ n and write B∗ = B ∪ N ∗ where B ∈ n and N ∗ is a subset of a Borel null set N ∈ n . According to L15.2 we have B∗ = E ∪ M ∗ ∪ N ∗ = E ∪ N ∗∗ where E is an F -set, n E = n B, and M ∗ N ∗∗ = N ∗ ∪ M ∗ are subsets of Borel null sets. Thus B∗ = E ∪ N ∗∗ = E ∪ N ∗∗ and E is an F -set, see T15.1, and N ∗∗ is contained in a Borel null set ⊂ d , see L15.3, hence B∗ ∈ ∗ d . Finally, by T15.1,
¯ d F ∩ B∗ = ¯ d F ∩ E ∪ N ∗∗
¯ d F ∩ E ∪ N ∗∗
= d F ∩ E 15.1
Ld n E = Ld n B = Ld ¯ n B∗
Let us stress that both H¨older continuity of and the condition d n are crucial for Theorem 15.4; one can find counterexamples if we have only ∈ CF d or d < n.
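Jacobi’s transformation formula

One of the most interesting situations arises if $\Phi = (\Phi_1,\ldots,\Phi_n):\mathbb{R}^n_x\to\mathbb{R}^n_y$ (we write $\mathbb{R}^n_x$ if we want to indicate the generic variable in order to distinguish between the domain and range of $\Phi$) is a $C^1$-map with everywhere defined inverse $\Phi^{-1}:\mathbb{R}^n_y\to\mathbb{R}^n_x$ which is again a $C^1$-map. Such maps are called $C^1$-diffeomorphisms. As usual, we write $D\Phi(x) = \bigl(\frac{\partial}{\partial x_j}\Phi_k(x)\bigr)_{j,k=1,\ldots,n}$ for the Jacobian at the point $x\in\mathbb{R}^n_x$. By Taylor’s theorem we find for all $x,x'\in K$ from a compact set $K\subset\mathbb{R}^n_x$, with a suitable intermediate point $\xi$,
\[ |\Phi_k(x)-\Phi_k(x')|\le\sum_{j=1}^n\Bigl|\frac{\partial}{\partial x_j}\Phi_k(\xi)\Bigr|\cdot|x_j-x_j'|\le n\,\sup_{\xi\in K}|D\Phi(\xi)|_\infty\cdot|x-x'|_\infty, \tag{15.8} \]
i.e. $\Phi$ is locally Lipschitz (1-Hölder) continuous with Lipschitz constant $L = L_K = n\,\sup_{\xi\in K}|D\Phi(\xi)|_\infty$.

15.5 Theorem (Jacobi’s transformation theorem) Let $\Phi:\mathbb{R}^n_x\to\mathbb{R}^n_y$ be a $C^1$-diffeomorphism. Then
\[ \lambda^n(\Phi(B)) = \int_B|\det D\Phi(x)|\,\lambda^n(dx) \tag{15.9} \]
holds for all Borel sets $B\in\mathcal{B}(\mathbb{R}^n_x)$.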
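In dimension $n=1$ formula (15.9) can be checked directly, since the image of an interval under an increasing diffeomorphism is again an interval. The sketch below is illustrative only: the map $\Phi(x) = x^3+x$ (a $C^1$-diffeomorphism of $\mathbb{R}$ because $\Phi'(x) = 3x^2+1>0$), the interval $B=[-1,2]$ and the midpoint rule used to approximate the integral are all assumptions made for this example.

```python
def midpoint_integral(f, a, b, steps=1000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Hypothetical example: Phi(x) = x^3 + x with derivative Phi'(x) = 3x^2 + 1 > 0.
phi = lambda x: x ** 3 + x
dphi = lambda x: 3 * x ** 2 + 1

a, b = -1.0, 2.0

# Left-hand side of (15.9): Phi is increasing, so Phi([a, b]) = [Phi(a), Phi(b)].
image_length = phi(b) - phi(a)

# Right-hand side of (15.9): integral of |det D Phi| = |Phi'| over B = [a, b].
jacobian_integral = midpoint_integral(lambda x: abs(dphi(x)), a, b)
```

Both quantities are (up to the quadrature error) equal to 12, the length of the image interval $[-2,10]$.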
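The proof of Theorem 15.5 is based on two auxiliary results.

15.6 Lemma Let $\mu$ and $\nu$ be two measures on the measurable space $(X,\mathcal{A})$ and let $\mathcal{S}$ be a semi-ring such that $\mathcal{A} = \sigma(\mathcal{S})$. If $\mu|_{\mathcal{S}}\le\nu|_{\mathcal{S}}$ (see footnote 4) and if there is a sequence $(S_j)_{j\in\mathbb{N}}\subset\mathcal{S}$ with $S_j\uparrow X$, then $\mu\le\nu$.

Proof It is clear from the properties of $\mu$ and $\nu$ that $\rho := \nu|_{\mathcal{S}}-\mu|_{\mathcal{S}}:\mathcal{S}\to[0,\infty]$ is a pre-measure. By T6.1, $\rho$ has a unique extension to a measure $\tilde\rho$ on $\mathcal{A}$ and
\[ \widetilde{\rho+\mu}(S) = (\rho+\mu)(S) = \nu(S)\qquad\forall\,S\in\mathcal{S}, \]
where $\widetilde{\rho+\mu}$ is the unique extension of the pre-measure $\rho+\mu|_{\mathcal{S}}$ to a measure on $\mathcal{A}$. But the measures $\tilde\rho+\mu$ and $\nu$ satisfy
\[ (\tilde\rho+\mu)(S) = \tilde\rho(S)+\mu(S) = \rho(S)+\mu(S) = (\rho+\mu)(S) = \nu(S)\qquad\forall\,S\in\mathcal{S} \]
and we conclude from the uniqueness of the extensions that $\nu = \tilde\rho+\mu$ on $\mathcal{A}$, i.e. $\nu(A)-\mu(A) = \tilde\rho(A)\ge 0$ for all $A\in\mathcal{A}$.

Caution: Lemma 15.6 fails if $\mathcal{S}$ is not a semi-ring; see Problem 15.4.

15.7 Lemma For every $C^1$-diffeomorphism $\Phi:\mathbb{R}^n_x\to\mathbb{R}^n_y$ we have
\[ \lambda^n(\Phi(J))\le\int_J|\det D\Phi(x)|\,\lambda^n(dx)\qquad\forall\,J\in\mathcal{J}(\mathbb{R}^n_x). \]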
Proof Let J = a b, a b ∈ nx , and note that J¯ = a b is a compact set. Since D −1 is continuous, we find on the compact set J¯ L = sup D x−1 sup D −1 y x∈J
y∈ J¯
(15.10)
where we used the inverse function theorem.[] Since D is uniformly continuous on J¯, we find for a given > 0 some > 0 such that sup D x − D x (15.11) L xx ∈ab x−x
[Footnote 4: This is short for $\mu(S)\le\nu(S)$ for all $S\in\mathcal{S}$.]
Partition J into N disjoint half-open squares J1 JN ∈ nx of the same side-length < . Since D and det D are continuous functions[] , we can find for each = 1 2 N a point x ∈ J¯
such that det D x = inf det D x x∈J¯
Set T = D x ∈ n×n and observe that DT −1 x = T −1 D x = idn +T −1 D x − D x (idn is the identity matrix in n×n ). The estimates (15.10), (15.11) show that sup DT −1 x 1 + L
x∈J¯
= 1+ L
∀ 1 N
i.e. T −1 is Lipschitz (1-H¨older) continuous with constant 1 + , see (15.8). Therefore, the special transformation rule T6.10 for Lebesgue measure and T15.1 show
n J = n T T −1 J
= det T · n T −1 J det T 1 + n n J N Since J = · =1 J and det T det D x for all x ∈ J , we get n J
N
n J 1 + n
=1
N
det T n J
=1
1 + n = 1 + n
N =1 J
J
det D x n dx
det D x n dx
and the proof is finished by letting → 0. We can finally proceed to the proof of Theorem 15.5. Proof (of Theorem 15.5) Set = −1 . Since is continuous, = d = d is a measure on nx , compare T7.6 and D7.7. The determinant det D is also continuous, thus A = A det D x n dx defines a measure on nx , see L10.8. From Lemma 15.7 we know that J J < for all
rectangles J ∈ nx , and Lemma 15.6 shows that holds on the whole of nx , i.e. n X det D x n dx ∀ X ∈ nx (15.12) X
This proves ‘’ of (15.9). For the other direction our strategy is to apply Lemma 15.7 to the inverse function = −1 . If X = −1 Y , Y ∈ ny , (15.12) becomes 1Y y n dy = n Y 1 −1 Y x det D x n dx = 1Y x det D x n dx and with exactly the same arguments which we used to prove Theorem 14.1, this inequality is easily extended from indicator functions to all u ∈ + ny : uy n dy u x det D x n dx (15.13) ny
nx
Switching in (15.13) the rôles of nx ↔ ny , x ↔ y and considering the C 1 -diffeomorphism ny → nx (instead of ) and the measurable[] function ux = 1 A x det D x for some A ∈ nx yields 1 A x det D x n dx nx
= = =
ny
1 A det D −1 y det D −1 y n dy
ny
ny
ny
1 A y · detD −1 y · det D −1 y n dy
1 A y · det
D −1 y · D −1 y
idn =Didn =D −1 =D −1 ·D −1
1 A y n dy = n A
This proves that for all A ∈ nx 1A x det D x n dx = nx
n dy
nx
1 A x det D x n dx
n A and, together with the converse inequality (15.12), the theorem follows.
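If $X\subset\mathbb{R}^n_x$, $Y\subset\mathbb{R}^n_y$ are open sets and $\Phi:X\to Y$ is a $C^1$-diffeomorphism, we still can apply Theorem 15.5 to $A = \Phi^{-1}(B)$, $A\in X\cap\mathcal{B}(\mathbb{R}^n_x)$, $B\in Y\cap\mathcal{B}(\mathbb{R}^n_y)$ to get
\[ \lambda^n(Y\cap\bullet) = \lambda^n\bigl(\Phi(X\cap\Phi^{-1}(\bullet))\bigr) = \int_{\Phi^{-1}(\bullet)\cap X}|\det D\Phi(x)|\,\lambda^n(dx), \tag{15.14} \]
i.e. Theorem 14.1 yields the following important result.

15.8 Corollary (General transformation theorem) Let $X,Y\subset\mathbb{R}^n$ be open sets and $\Phi:X\to Y$ be a $C^1$-diffeomorphism. A function $u:Y\to\bar{\mathbb{R}}$ is integrable w.r.t. $\lambda^n$ if, and only if, the function $u\circ\Phi\cdot|\det D\Phi|:X\to\bar{\mathbb{R}}$ is integrable w.r.t. $\lambda^n$. In this case
\[ \int_Y u(y)\,\lambda^n(dy) = \int_X u(\Phi(x))\,|\det D\Phi(x)|\,\lambda^n(dx). \tag{15.15} \]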
For many applications we need a somewhat reinforced version of C15.8 since $\Phi$ is often only almost everywhere a diffeomorphism. The following simple generalization takes care of that. Recall that $\bar\lambda^n$ is the completed Lebesgue measure, cf. Problems 4.13, 6.2, 10.11, 10.12, 13.11.

15.9 Corollary Let $\Phi:X\to\mathbb{R}^n_y$ be a $C^1$-map on a measurable set $X\in\mathcal{B}^*(\mathbb{R}^n_x)$ whose open interior is denoted by $X^\circ$. If $X\setminus X^\circ$ is a $\bar\lambda^n$-null set [Footnote 5: i.e. a subset of a Borel null set.] and $\Phi|_{X^\circ}$ is a $C^1$-diffeomorphism onto $\Phi(X^\circ)$, then
\[ \int_{\Phi(X)} u(y)\,\bar\lambda^n(dy) = \int_X u(\Phi(x))\,|\det D\Phi(x)|\,\bar\lambda^n(dx) \tag{15.16} \]
holds for all $\mathcal{B}^*$-measurable positive functions $u:\Phi(X)\to[0,\infty]$. Moreover, $u:\Phi(X)\to\bar{\mathbb{R}}$ is $\bar\lambda^n$-integrable if, and only if, $u\circ\Phi\cdot|\det D\Phi|:X\to\bar{\mathbb{R}}$ is $\bar\lambda^n$-integrable; in this case (15.16) remains valid.

Proof The argument proving C15.8 remains literally valid for $\bar\lambda^n$, i.e. the difficulty of C15.9 is not the completion of the measure but the fact that $\Phi$ is only almost everywhere a diffeomorphism. Since $\bar\lambda^n(X\setminus X^\circ) = 0$, we get $\Phi(X)\setminus\Phi(X^\circ)\subset\Phi(X\setminus X^\circ)$, cf. Chapter 2, which is again a $\bar\lambda^n$-null set by Lemma 15.3. In view of C10.10 we can alter $\mathcal{L}^1$-functions on null sets, which means that the equality
\[ \int_{\Phi(X^\circ)} u\,d\bar\lambda^n = \int 1_{X^\circ}\cdot u\circ\Phi\cdot|\det D\Phi|\,d\bar\lambda^n \]
from C15.8 immediately implies (15.16).
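15.10 Remark Formulae (15.9) and (15.15) have the following interesting interpretation in connection with the Radon–Nikodým theorem 19.2 and Lebesgue’s differentiation theorem for measures T19.20, in particular C19.21:
\[ \frac{d(\lambda^n\circ\Phi)}{d\lambda^n}(x) = |\det D\Phi(x)| = \lim_{r\to 0}\frac{\lambda^n(\Phi(B_r(x)))}{\lambda^n(B_r(x))}. \]

Spherical coordinates and the volume of the unit ball

Some of the most interesting applications of Corollaries 15.8 and 15.9 are coordinate changes.

15.11 Example (Planar polar coordinates) Consider the map
\[ P:(0,\infty)\times(0,2\pi)\to\mathbb{R}^2\setminus\bigl([0,\infty)\times\{0\}\bigr),\qquad P(r,\theta) := (r\cos\theta,\ r\sin\theta), \]
which introduces polar coordinates $(r,\theta)$ in $\mathbb{R}^2$. It is not hard to see that $P$ is bijective and even a $C^1$-diffeomorphism. The determinant of the Jacobian is given by
\[ \det DP(r,\theta) = \det\frac{\partial P(r,\theta)}{\partial(r,\theta)} = \det\begin{pmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{pmatrix} = r\cos^2\theta+r\sin^2\theta = r. \]
Since $[0,\infty)\times\{0\}$ is a $\lambda^2$-null set, we can apply Corollary 15.8 (or 15.9) and find for every $u:\mathbb{R}^2\to\mathbb{R}$, $u\in\mathcal{L}^1(\lambda^2)$,
\begin{align*}
\int_{\mathbb{R}^2} u(x,y)\,d\lambda^2(x,y) &= \int_{(0,\infty)\times(0,2\pi)} r\,u(r\cos\theta,r\sin\theta)\,d\lambda^2(r,\theta)\\
&= \int_{(0,\infty)}\int_{(0,2\pi)} r\,u(r\cos\theta,r\sin\theta)\,\lambda^1(d\theta)\,\lambda^1(dr),
\end{align*}
where we used Fubini’s theorem 13.9 for the last equality. This shows, in particular, that
\[ u\in\mathcal{L}^1(\mathbb{R}^2)\iff(r,\theta)\mapsto r\,u(r\cos\theta,r\sin\theta)\in\mathcal{L}^1\bigl((0,\infty)\times(0,2\pi)\bigr). \]
A simple but quite interesting application of planar polar coordinates is the following formula which plays a central rôle in probability theory: this is where the norming factor $\frac{1}{\sqrt{2\pi}}$ for the Gaussian distribution comes from.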
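15.12 Example We have
\[ \int_{\mathbb{R}} e^{-x^2}\,d\lambda^1(x) = \sqrt{\pi}. \tag{15.17} \]
Proof: We use the following trick: by Tonelli’s theorem 13.8
\begin{align*}
\Bigl(\int_{\mathbb{R}} e^{-x^2}\,d\lambda^1(x)\Bigr)^2 &= \int_{\mathbb{R}}\int_{\mathbb{R}} e^{-x^2}e^{-y^2}\,d\lambda^1(x)\,d\lambda^1(y)\\
&= \int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,d\lambda^2(x,y)\\
&= \int_{(0,\infty)}\int_{(0,2\pi)} r\,e^{-r^2}\,d\lambda^1(\theta)\,d\lambda^1(r).
\end{align*}
Since $r\,e^{-r^2}$ is positive and improperly Riemann integrable, we know that Lebesgue and Riemann integrals coincide (cf. 11.8, 11.18), and therefore
\[ \Bigl(\int_{\mathbb{R}} e^{-x^2}\,d\lambda^1(x)\Bigr)^2 = \lambda^1\bigl((0,2\pi)\bigr)\int_0^\infty r\,e^{-r^2}\,dr = 2\pi\Bigl[-\tfrac12 e^{-r^2}\Bigr]_0^\infty = \pi. \]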
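The two sides of the polar-coordinate computation in Example 15.12 can be compared numerically. This is only a rough sketch: the integrals are approximated by a midpoint rule on the truncated ranges $[-10,10]$ and $[0,10]$ (the neglected tails are of order $e^{-100}$), and the helper `midpoint_integral` is introduced for this illustration only.

```python
import math


def midpoint_integral(f, a, b, steps=20000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# One-dimensional Gaussian integral, truncated to [-10, 10].
gauss_1d = midpoint_integral(lambda x: math.exp(-x * x), -10.0, 10.0)

# Radial integral from the polar-coordinate formula, truncated to [0, 10];
# the angular integral contributes the factor 2 * pi.
radial = midpoint_integral(lambda r: r * math.exp(-r * r), 0.0, 10.0)
polar_value = 2.0 * math.pi * radial

print(gauss_1d, math.sqrt(math.pi))   # both close to sqrt(pi)
print(gauss_1d ** 2, polar_value)     # both close to pi
```

The squared one-dimensional integral and the polar-coordinate value agree with $\pi$ up to the quadrature error, which is what (15.17) asserts.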
Polar coordinates also exist in higher dimensions but, unfortunately, the formulae become quite messy. The idea here is that we parametrize $\mathbb{R}^n$ by the radius $r\in[0,\infty)$, and $n-1$ angles $\theta\in[0,2\pi)$ and $\omega\in(-\pi/2,\pi/2)^{n-2}$, so that $x = P(r,\theta,\omega)$. The Jacobian is now of the form $r^{n-1}J(\theta,\omega)$ and, if we denote by $v := u\circ P$ the function $u$ expressed in polar coordinates, the transformation formula gives
\[ \int_{\mathbb{R}^n} u\,d\lambda^n = \int_{(0,\infty)\times(0,2\pi)\times(-\pi/2,\pi/2)^{n-2}} r^{n-1}\,v(r,\theta,\omega)\,|\det J(\theta,\omega)|\,d\lambda^1(r)\,d\lambda^1(\theta)\,d\lambda^{n-2}(\omega). \]
We will not give further details but settle for the slightly simpler case of spherical coordinates which will lead to a similar formula. Let $S^{n-1} := \{x\in\mathbb{R}^n : |x|_2 = 1\}$ be the unit sphere of $\mathbb{R}^n$ ($|x|_2^2 = x_1^2+\cdots+x_n^2$ is the Euclidean norm) and set
\[ \Phi:\mathbb{R}^n\setminus\{0\}\to(0,\infty)\times S^{n-1},\qquad x\mapsto\Bigl(|x|_2,\ \frac{x}{|x|_2}\Bigr), \]
where $x/|x|_2\in S^{n-1}$ is the directional unit vector for $x$. Obviously, $\Phi$ is bijective, differentiable and has a differentiable inverse $\Phi^{-1}(r,s) = r\cdot s$.
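15.13 Theorem On $\mathcal{B}(S^{n-1}) = S^{n-1}\cap\mathcal{B}(\mathbb{R}^n)$ there exists a measure $\sigma^{n-1}$ which is invariant under rotations and satisfies
\[ \int_{\mathbb{R}^n} u\,d\lambda^n = \int_{(0,\infty)\times S^{n-1}} r^{n-1}\,u(rs)\,\lambda^1(dr)\,\sigma^{n-1}(ds) \tag{15.18} \]
for all $u\in\mathcal{L}^1(\lambda^n)$. In other words, $\Phi(\lambda^n) = \mu\times\sigma^{n-1}$ where $\mu(dr) = r^{n-1}1_{(0,\infty)}(r)\,\lambda^1(dr)$; in particular
\[ u\in\mathcal{L}^1(\mathbb{R}^n,\lambda^n)\iff r^{n-1}u(rs)\in\mathcal{L}^1\bigl((0,\infty)\times S^{n-1},\lambda^1\times\sigma^{n-1}\bigr). \]
Proof We define $\sigma^{n-1}$ by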
n−1 A = n n −1 A ∩ B1 0 ∀ A ∈ S n−1 which is an image measure, hence a measure, cf. T7.6. Since −1 and n are invariant w.r.t. rotations around the origin, see T7.9, it is obvious that n−1 inherits this property, too. Both and −1 are continuous, hence measurable. Therefore,
−1 ⊗ S n−1 ⊂ n
n ⊂ ⊗ S n−1
and
which shows that n = −1 ⊗ S n−1 . To see (15.18), fix A ∈ S n−1 and consider first the set B = x ∈ n x ∈ a b x ∈ A = −1 A ∩ x a x < b, which is clearly a Borel set of n . Thus
n B = n −1 A ∩ x a x < b
= n −1 A ∩ Bb 0 − n −1 A ∩ Ba 0
= bn n −1 A ∩ B1 0 − an n −1 A ∩ B1 0
= bn − an n −1 A ∩ B1 0 where we used that n a · B = an n B, cf. T7.10 or Problems 5.8, 7.7, and that −1 is invariant under dilations. This shows n B = n1 bn − an n−1 A =
ab
r n−1 n−1 A 1 dr
= × n−1 a b × A
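Since the family $\{[a,b)\times A : a<b,\ A\in\mathcal{B}(S^{n-1})\}$ generates $\mathcal{B}(0,\infty)\otimes\mathcal{B}(S^{n-1})$, see Lemma 13.3, and satisfies the conditions of the uniqueness theorem 5.7, the above relation extends to all sets $B'\in\mathcal{B}(0,\infty)\otimes\mathcal{B}(S^{n-1})$. Since $\mathcal{B}(\mathbb{R}^n) = \Phi^{-1}\bigl(\mathcal{B}(0,\infty)\otimes\mathcal{B}(S^{n-1})\bigr)$, we have $B' = \Phi(B)$ for some $B\in\mathcal{B}(\mathbb{R}^n)$, so that
\[ \lambda^n(B) = \lambda^n(\Phi^{-1}\circ\Phi(B)) = \lambda^n(\Phi^{-1}(B')) = \mu\times\sigma^{n-1}(B'). \]
All other assertions follow now from Theorem 14.1 on image integrals and Fubini’s theorem 13.9.

Let us note the particularly interesting case where $u(x) = f(|x|)$ is rotationally invariant.

15.14 Corollary If $u(x) = f(|x|)$ is a rotationally invariant function, then $u\in\mathcal{L}^1(\mathbb{R}^n,\lambda^n)$ if, and only if, $r\mapsto r^{n-1}f(r)\in\mathcal{L}^1((0,\infty),\lambda^1)$. In this case
\[ \int_{\mathbb{R}^n} f(|x|)\,\lambda^n(dx) = n\,\omega_n\int_{(0,\infty)} r^{n-1}f(r)\,\lambda^1(dr) \]
where $\omega_n = \lambda^n(B_1(0))$ denotes the volume of the unit ball in $\mathbb{R}^n$. In particular, we get for the functions $f_\alpha(x) = |x|^\alpha$, $\alpha\in\mathbb{R}$,
\[ f_\alpha\in\mathcal{L}^1(B_1(0)\setminus\{0\})\iff\alpha>-n,\qquad f_\alpha\in\mathcal{L}^1(\mathbb{R}^n\setminus B_1(0))\iff\alpha<-n. \]
Proof The integral formula follows from (15.18) where the constant $n\,\omega_n = \sigma^{n-1}(S^{n-1})$ [Footnote 6: This is, actually, the surface area of the unit ball $B_1(0)$ in $\mathbb{R}^n$.]. That $\omega_n$ must be the volume of $B_1(0)$ is immediately clear if we choose $u(x) = 1_{B_1(0)}(x)$. The integrability of $f_\alpha$ follows now from Example 11.12.

Let us finally determine $\omega_n$, the volume of the unit ball in $\mathbb{R}^n$. For this we use the same method which we employed in Example 15.12:
\begin{align*}
(\sqrt{\pi})^n &\overset{(15.17)}{=} \Bigl(\int_{\mathbb{R}} e^{-t^2}\,\lambda^1(dt)\Bigr)^n\\
&= \int\cdots\int e^{-(x_1^2+\cdots+x_n^2)}\,\lambda^1(dx_1)\cdots\lambda^1(dx_n)\\
&\overset{15.14}{=} n\,\omega_n\int_{(0,\infty)} r^{n-1}e^{-r^2}\,\lambda^1(dr).
\end{align*}
Since $r^{n-1}e^{-r^2}$ is positive and improperly Riemann integrable, Riemann and Lebesgue integrals coincide (use 11.8, 11.18), and we find after a change of variables according to $s = r^2$
\[ (\sqrt{\pi})^n = n\,\omega_n\int_0^\infty r^{n-1}e^{-r^2}\,dr = \frac n2\,\omega_n\int_0^\infty s^{n/2-1}e^{-s}\,ds = \omega_n\,\frac n2\,\Gamma\Bigl(\frac n2\Bigr), \]
see Example 11.14. Since $\frac n2\,\Gamma\bigl(\frac n2\bigr) = \Gamma\bigl(\frac n2+1\bigr)$, we have finally established

15.15 Corollary
\[ \omega_n = \lambda^n(B_1(0)) = \frac{\pi^{n/2}}{\Gamma\bigl(\frac n2+1\bigr)}. \]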
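The closed formula of Corollary 15.15 is easy to evaluate and to compare with the familiar low-dimensional values. The following sketch uses only `math.gamma` and `math.pi` from the Python standard library; the function names `unit_ball_volume` and `sphere_surface` are chosen here for illustration and do not appear in the text.

```python
import math


def unit_ball_volume(n):
    """omega_n = pi^(n/2) / Gamma(n/2 + 1), the volume of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def sphere_surface(n):
    """n * omega_n, the total mass sigma^{n-1}(S^{n-1}) from Corollary 15.14."""
    return n * unit_ball_volume(n)


for n in range(1, 6):
    print(n, unit_ball_volume(n), sphere_surface(n))
```

For $n = 1, 2, 3$ this reproduces the length $2$ of $[-1,1]$, the area $\pi$ of the unit disc and the volume $\frac43\pi$ of the unit ball, and $n\,\omega_n$ gives the circumference $2\pi$ and the surface area $4\pi$.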
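Continuous functions are dense in $\mathcal{L}^p(\lambda^n)$

We will now establish a result that is closely related to Lemma 15.2: we show that the continuous functions with compact support $C_c(\mathbb{R}^n)$ are dense in the space of Lebesgue $p$-integrable functions $\mathcal{L}^p(\lambda^n)$, $1\le p<\infty$, that is, if $u\in\mathcal{L}^p(\lambda^n)$, then
\[ \forall\,\epsilon>0\quad\exists\,\phi = \phi_{\epsilon,u}\in C_c(\mathbb{R}^n) : \|u-\phi\|_p\le\epsilon. \]
Since every compact set $K\subset\mathbb{R}^n$ is bounded, we find for some sufficiently large $R>0$ that $K\subset[-R,R]^n$, hence $\lambda^n(K)\le(2R)^n$. Thus for $\phi\in C_c(\mathbb{R}^n)$ with support $\operatorname{supp}\phi = \overline{\{\phi\ne 0\}}\subset K$,
\[ \|\phi\|_p^p = \int|\phi|^p\,d\lambda^n = \int_K|\phi|^p\,d\lambda^n\le\sup_{x\in K}|\phi(x)|^p\,(2R)^n<\infty \]
so that $C_c(\mathbb{R}^n)\subset\mathcal{L}^p(\lambda^n)$ (measurability is clear because of continuity). Our strategy will be to approximate first indicator functions of Borel sets and simple functions. For this we need the following

15.16 Lemma (Urysohn) Let $K\subset\mathbb{R}^n$ be a compact set and $U\supset K$ be an open set. Then there exists a continuous function $\phi = \phi_{K,U}\in C(\mathbb{R}^n)$ such that $1_K\le\phi\le 1_U$.

Proof Let $d(x,A) := \inf_{y\in A}|x-y|$ be the distance of the point $x\in\mathbb{R}^n$ from the set $A\subset\mathbb{R}^n$. For $x,x'\in\mathbb{R}^n$ we have
\[ d(x,A) = \inf_{y\in A}|x-y|\le\inf_{y\in A}\bigl(|x-x'|+|x'-y|\bigr) = |x-x'|+d(x',A) \]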
which shows, due to the symmetry in x and x , that dx A − dx A x − x , or, in other words, that x → dx A is continuous. It is now easy to see that the function dx U c x = dx K + dx U c is continuous and satisfies 1K 1U . 15.17 Theorem Cc n is a dense subset of p n , 1 p < . Proof We have already verified that Cc n ⊂ p n . Step 1: Cn ∩ p n is dense in n ∩ p n . Let B ∈ n such that 1B ∈ p n (i.e. n B < ). In steps 1,2 of the proof of Lemma 15.2 we constructed for such sets open sets U and closed sets C such that C ⊂ B ⊂ U
and n U − n B + n B − n C p
By the continuity of measures T4.4(iii) we find bounded, hence
for the closed and n n compact, sets Bj 0 ∩ C ↑ C that limj→ Bj 0 ∩ C = C . This means that we can replace C by a compact set K ⊂ C and still have K ⊂ B ⊂ U
and n U − n K 2p
Using Lemma 15.16 we find a continuous function = U K ∈ Cn with 1K 1U . As 1K 1B 1U we have, in particular, 1B − p 1B − 1K p + 1K − p 2 1U − 1K p 4 which also shows that ∈ p n . Since any f ∈ n ∩ p n has a standard representation of the form f = M j=0 yj 1Bj where y0 = 0 and B1 BM are Borel sets of finite volume, it is clear that Cn ∩ p n is dense in the set of all pth power integrable simple functions. Step 2 : Cn ∩ p n is dense in p n . Fix > 0. Since n ∩ p n is dense in p n , cf. C12.11, there exists some f ∈ n ∩ p n such that f − up Using step 1 we find some ∈ Cn ∩ p n with − f p and the claim follows from Minkowski’s inequality for •p − up − f p + f − up 2
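Step 3: $C_c(\mathbb{R}^n)$ is dense in $\mathcal{L}^p(\lambda^n)$. Let $\phi_\epsilon \in C(\mathbb{R}^n)$ be the function constructed in step 2. Using Lemma 15.16 we obtain a sequence of functions $\chi_j$ such that $\mathbf{1}_{\overline{B_j(0)}} \le \chi_j \le \mathbf{1}_{B_{j+1}(0)}$. Obviously, $\chi_j \phi_\epsilon \to \phi_\epsilon$ as $j \to \infty$, $|\chi_j \phi_\epsilon| \le |\phi_\epsilon|$ and $\chi_j \phi_\epsilon \in C_c(\mathbb{R}^n)$. Lebesgue's dominated convergence theorem 11.2 (or 12.9) therefore shows that

$$\lim_{j\to\infty} \|u - \chi_j \phi_\epsilon\|_p = \|u - \phi_\epsilon\|_p \le 2\epsilon$$

and the theorem is proved.

Regular measures

The seemingly innocuous question whether the continuous functions are a dense subset of $\mathcal{L}^p$ is – even for Lebesgue measure in $\mathbb{R}^n$ – quite hard to answer, as we have seen in Theorem 15.17. In general measure spaces, such results require a connection between measure and topology that reaches further than just considering the Borel (= topological) $\sigma$-algebra on a topological space $X$. This connection is made in the following

15.18 Definition Let $(X, \mathcal{O})$ be a topological space, denote by $\mathcal{K}$ the compact subsets of $X$ and let $\mu$ be a measure on $(X, \mathcal{B}(X))$, $\mathcal{B}(X) = \sigma(\mathcal{O})$. The measure $\mu$ is called outer regular if

$$\mu(B) = \inf\{\mu(U) : U \in \mathcal{O},\ U \supset B\} \qquad \forall\, B \in \mathcal{B}(X)$$

and (compact) inner regular if

$$\mu(B) = \sup\{\mu(K) : K \in \mathcal{K},\ K \subset B\} \qquad \forall\, B \in \mathcal{B}(X).$$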
For Lebesgue measure $\lambda^n$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ we have proved outer and inner regularity in Lemma 15.2, see also step 1 in the proof of Theorem 15.17 and Problem 15.2. Let us note, without proof, the following characterization of outer regular measures.

15.19 Theorem Let $X$ be a complete separable metric space (footnote 7) and denote the open sets by $\mathcal{O}$ and the compact sets by $\mathcal{K}$. Every measure $\mu$ on $(X, \mathcal{B}(X))$ which is locally finite, i.e. every $x \in X$ has an open neighbourhood $U = U(x)$ of finite measure $\mu(U) < \infty$, is both outer regular and inner regular, i.e.

$$\mu(B) = \inf\{\mu(U) : U \in \mathcal{O},\ U \supset B\} = \sup\{\mu(K) : K \in \mathcal{K},\ K \subset B\}.$$

7 cf. Appendix B.
A proof can be found in Bauer [6, §26]. Note the analogy to Lemma 15.2 and the proof of Theorem 15.17 where we (essentially) verified Theorem 15.19 for Lebesgue measure. Also note that the measure in Theorem 15.19 is -finite: since X is separable, there is a countable dense subset D ⊂ X, and the collection = Br d r ∈ + d ∈ D Br d ⊂ Ud Ud as in T15.19 is a countable family of open balls with finite -measure. Moreover, since every U ∈ can be written in the form8 U= Br d
Br d⊂U
N
we find that X = N =1 j=1 Brj dj with Brj dj < . Almost the same argument that was used in the proof of Theorem 15.17 is valid in the abstract setting. 15.20 Theorem Let X be a topological space and be an outer regular measure on X X. Then the set Cfin X = u X → u is continuous u = 0 < is dense in Lp X , 1 p < . Proof Let A ∈ be a set with A < . Since is outer regular, we find for every > 0 some U ∈ such that A⊂U
and U − p A U
Literally as in step 2 of the proof of Lemma 15.2 we can find some closed set F with F ⊂A
and
F A F + p
and, consequently, U − F 2p . The rest of the proof is now as in T15.17. Problems 15.1. Let F F1 F2 F3 be F -sets in n . Show that (i) F1 ∩ F2 ∩ ∩ FN is for every N ∈ an F -set; (ii) Fj is an F -set; j∈ 8
This is similar to (3.2) in the proof of T3.8: the inclusion ‘⊂’ is obvious, for ‘⊃’ fix x ∈ U . Then there exists some r ∈ + with Br x ⊂ U . Since D is dense, x ∈ Br/2 d for some d ∈ D with d x < r/4, so that x ∈ Br/2 d ⊂ U .
(iii) F c and j∈ Fjc are G -sets; (iv) all closed sets are F -sets. 15.2. Prove the following corollary to Lemma 15.2: Lebesgue measure n on n is outer regular, i.e. ∀ B ∈ n n B = inf n U U ⊃ B U open and inner regular, i.e. n B = sup n F F ⊂ B F closed = sup n K K ⊂ B K compact
∀ B ∈ n ∀ B ∈ n
15.3. Completion (6). Combine Problems 15.2 and 10.12 to show that the completion ¯ n of n-dimensional Lebesgue measure is again inner and outer regular. 15.4. Consider the Borel -algebra 0 and write = 1 0 for Lebesgue measure on the half-line 0 . (i) Show that = a a 0 generates 0 . (ii) Show that B = B 124 dx and B = 5 · B, B ∈ 0 are measures on 0 such that but not in general. Why does this not contradict Lemma 15.6? 15.5. Use Jacobi’s transformation formula to recover Theorem 5.8(i), Problem 5.8 and Theorem 7.10. Show, in particular, that for all integrable functions u n → 0 ux + y n dx = ux n dx ∀ y ∈ n 1 ux n dx tn 1 uAx n dx = ux n dx det A
ut x n dx =
∀ t > 0 ∀ A ∈ GLn
In particular, the l.h.s. of the above equalities exists and is finite if, and only if, the r.h.s. exists and is finite. Why can’t we use 15.5 and 15.8 to prove these formulae? 15.6. Arc-length. Let f → be a twice continuously differentiable function and denote by f = t ft t ∈ its graph. Define a function → 2 by x = x fx. Then (i) → f is a C 1 -diffeomorphism and det D x = 1 + f x2 . (ii) = det D 1 is a measure on f . (iii) f ux y d x y = ut ft 1 + f t2 d1 t with the understanding that whenever one side of the equality makes sense (measurability!) and is finite, so does the other.
The measure is called canonical surface measure on f . This name is justified by the following compatibility property w.r.t. 2 : Let nx be a unit normal vector ˜ × → 2 by x ˜ to f at the point x fx and define a map r = x + r nx. Then ˜ r = 1+f x2 −r f x. (iv) nx = −f x 1/ 1 + f x2 and det D x Conclude that for every compact interval c d there exists some > 0 such ˜ cd×− is a C 1 -diffeomorphism. that (v) Let C ⊂ f cd and r < with as in (iv). Make a sketch of the set
˜ −1 C × −r r and show that it is Borel measurable. Cr = (vi) Use dominated convergence to show that for every x ∈ c d 1 det D x ˜ ˜ 0 r 1 dr = det D x lim r↓0 2r −rr (vii) Use the general transformation theorem 15.8, Tonelli’s theorem 13.8, (vi) and dominated convergence to show that det D x 1 dx lim 2 Cr = −1 C
r↓0
(viii) Conclude that
1 + f t2 dt is the arc-length of the graph of f .
15.7. Let d → M ⊂ n , d n, be a C 1 -diffeomorphism.
(i) Show that M = det D d is a measure on M. Find a formula for u dM . M (ii) Show that for a dilation r n → n , x → r x, r > 0, we have ur r n dM = u dM M
r M
(iii) Let M = x = 1 = S be the unit sphere in n , so that d = n − 1. Show that for every integrable u ∈ 1 n and = M uxn dx = ux dx 1 dr n−1
0 x=r
=
ur x dx 1 dr
0 x=1
Remark. With somewhat more effort it is possible to show the analogue of the approximation formula in Problem 15.6(vii) for M ; all that changes are technical details, the idea of the proof is the same, cf. Stroock [50, pp. 94–101] for a nice presentation. 15.8. In Example 11.14 we introduced Euler’s Gamma function: xt−1 e−x 1 dx t = Show that
21
=
√
0
.
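15.9. 3-d polar coordinates. Define $\Phi : (0,\infty) \times (0, 2\pi) \times (-\pi/2, \pi/2) \to \mathbb{R}^3$ by

$$\Phi(r, \theta, \omega) = (r\cos\theta\cos\omega,\ r\sin\theta\cos\omega,\ r\sin\omega).$$

Show that $\det D\Phi(r, \theta, \omega) = r^2 \cos\omega$ and find the integral formula for the coordinate change from Cartesian to polar coordinates $(x, y, z) \leadsto (r, \theta, \omega)$.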
15.10. Compute for m n ∈ the integral
B1 0
xm yn d2 x y.
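The determinant formula of Problem 15.9 can be sanity-checked numerically before one proves it. The following is only a sketch: the angle names `theta` and `omega`, the helper names and the step size `h` are illustrative assumptions, and a finite-difference check is of course no substitute for the computation the problem asks for.

```python
import math

def polar_map(r, theta, omega):
    """3-d polar coordinates as in Problem 15.9 (angle names are assumptions)."""
    return (r * math.cos(theta) * math.cos(omega),
            r * math.sin(theta) * math.cos(omega),
            r * math.sin(omega))

def jacobian_det(f, point, h=1e-6):
    """Determinant of the 3x3 Jacobian of f at point, via central differences."""
    rows = []
    for i in range(3):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        # Row i holds the partial derivatives of all components w.r.t. variable i.
        rows.append([(a - b) / (2 * h) for a, b in zip(f(*plus), f(*minus))])
    # This is the transposed Jacobian; transposing does not change the determinant.
    (a, b, c), (d, e, g), (p, q, s) = rows
    return a * (e * s - g * q) - b * (d * s - g * p) + c * (d * q - e * p)

r, theta, omega = 2.0, 0.7, 0.3
print(jacobian_det(polar_map, (r, theta, omega)), r * r * math.cos(omega))
```

The two printed numbers should agree up to the discretization error of the difference quotient.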
16 Uniform integrability and Vitali’s convergence theorem
Lebesgue's dominated convergence theorem 11.2 gives sufficient conditions which allow us to interchange limits and integrals. A crucial ingredient is the assumption that $|u_j| \le w$ a.e. for all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^1_+$. This condition is not necessary, but a slightly weaker one is indeed necessary and sufficient in order to swap limits and integrals. The key idea is to control the size of the sets where the $u_j$ exceed a given reference function. This is the rationale behind the next definition.

16.1 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{F} \subset \mathcal{M}(\mathcal{A})$ be a family of measurable functions. We call $\mathcal{F}$ uniformly integrable (also: equi-integrable) if

$$\forall\, \epsilon > 0 \quad \exists\, w_\epsilon \in \mathcal{L}^1_+(\mu) : \quad \sup_{u \in \mathcal{F}} \int_{\{|u| > w_\epsilon\}} |u| \, d\mu < \epsilon. \tag{16.1}$$

Note that there are other (but for $\mu(X) < \infty$ usually equivalent) definitions of uniform integrability, see Theorem 16.8 below for a discussion. We follow the universal formulation due to G. A. Hunt [21, p. 33].

The other key assumption in Theorem 11.2 was that $u_j(x) \to u(x)$ as $j \to \infty$ for (almost) all $x \in X$; we can weaken this assumption, too.

16.2 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space. A sequence of $\mathcal{A}$-measurable numerical functions $u_j : X \to \overline{\mathbb{R}}$ converges in measure (footnote 1) if

$$\forall\, \epsilon > 0 \quad \forall\, A \in \mathcal{A},\ \mu(A) < \infty : \quad \lim_{j\to\infty} \mu\big(\{|u_j - u| > \epsilon\} \cap A\big) = 0 \tag{16.2}$$

holds for some $u \in \mathcal{M}(\mathcal{A})$. We write $\mu\text{-}\lim_{j\to\infty} u_j = u$ or $u_j \xrightarrow{\mu} u$.

1 If $\mu$ is a probability measure one usually speaks of convergence in probability.
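16.3 Example Convergence in measure is strictly weaker than pointwise convergence. To see this, take $(X, \mathcal{A}, \mu) = ([0,1], \mathcal{B}[0,1], \lambda^1|_{[0,1]})$ and set

$$u_n(x) = \mathbf{1}_{[j2^{-k},\,(j+1)2^{-k}]}(x), \qquad n = j + 2^k,\quad 0 \le j < 2^k.$$

This is a sequence of rectangular functions of width $2^{-k}$ which move in $2^k$ steps through $[0,1]$, jump back to $x = 0$, halve their width and start moving again. Obviously,

$$\lambda^1\big(\{|u_n| > \epsilon\}\big) = 2^{-k} \xrightarrow{\ n = n(k) \to \infty\ } 0 \qquad \forall\, \epsilon \in (0,1)$$

so that $u_n \xrightarrow{\lambda^1} 0$ in measure, but the pointwise limit $\lim_{n\to\infty} u_n(x)$ does not exist anywhere.

16.4 Lemma Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, and $(w_k)_{k\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$. Then

(i) $\lim_{j\to\infty} \|u_j - u\|_p = 0$ implies $u_j \xrightarrow{\mu} u$;
(ii) $\lim_{k\to\infty} w_k(x) = w(x)$ a.e. implies $w_k \xrightarrow{\mu} w$.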
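Proof (i) follows immediately from the Markov inequality P10.12,

$$\mu\big(\{|u_j - u| > \epsilon\} \cap A\big) \le \mu\big(\{|u_j - u| > \epsilon\}\big) = \mu\big(\{|u_j - u|^p > \epsilon^p\}\big) \le \frac{1}{\epsilon^p}\, \|u_j - u\|_p^p.$$

(ii) Observe that for all $\epsilon > 0$

$$\{|w_k - w| > \epsilon\} \subset \{\epsilon \wedge |w_k - w| \ge \epsilon\}.$$

An application of the Markov inequality P10.12 yields

$$\mu\big(\{|w_k - w| > \epsilon\} \cap A\big) \le \mu\big(\{\epsilon \wedge |w_k - w| \ge \epsilon\} \cap A\big) \le \frac{1}{\epsilon} \int_A \epsilon \wedge |w_k - w| \, d\mu = \frac{1}{\epsilon} \int \big(\epsilon \wedge |w_k - w|\big) \mathbf{1}_A \, d\mu.$$

If $\mu(A) < \infty$, the function $\epsilon \mathbf{1}_A \in \mathcal{L}^1_+$ is integrable, dominates the integrand $(\epsilon \wedge |w_k - w|) \mathbf{1}_A$, and Lebesgue's dominated convergence theorem 11.2 implies that $\lim_{k\to\infty} \int_A \epsilon \wedge |w_k - w| \, d\mu = 0$.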
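The moving-rectangle sequence of Example 16.3 is easy to experiment with. The sketch below uses exact rational arithmetic; the function names are illustrative, and the intervals are taken half-open, $[j2^{-k}, (j+1)2^{-k})$, which is an assumption made here so that each generation is a partition of $[0,1)$ (it changes the functions only on a null set).

```python
from fractions import Fraction

def typewriter_interval(n):
    """Interval carrying u_n, where n = j + 2**k with 0 <= j < 2**k."""
    if n < 1:
        raise ValueError("n must be >= 1")
    k = n.bit_length() - 1          # generation: 2**k <= n < 2**(k + 1)
    j = n - 2**k                    # position inside the generation
    width = Fraction(1, 2**k)
    return j * width, (j + 1) * width

def u(n, x):
    """Value of the n-th rectangle function at x (half-open convention)."""
    left, right = typewriter_interval(n)
    return 1 if left <= x < right else 0

def measure_of_exceedance(n):
    """Lebesgue measure of {u_n > eps} for any 0 < eps < 1: the interval width."""
    left, right = typewriter_interval(n)
    return right - left

x = Fraction(1, 3)
for k in range(1, 6):
    generation = range(2**k, 2**(k + 1))
    hits = sum(u(n, x) for n in generation)
    print(k, measure_of_exceedance(2**k), hits)
```

In every generation the measure of the exceedance set shrinks to $2^{-k}$, while the point $x = 1/3$ is hit exactly once and missed otherwise, so $u_n(x)$ takes both values 0 and 1 infinitely often.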
16.5 Lemma Assume that $(X, \mathcal{A}, \mu)$ is $\sigma$-finite and that $(u_j)_{j\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$ converges in measure to $u$. Then $u$ is a.e. unique.
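Proof Let $(A_k)_{k\in\mathbb{N}} \subset \mathcal{A}$ be a sequence with $A_k \uparrow X$ and $\mu(A_k) < \infty$. Suppose that $u$ and $w$ are two measurable functions such that $u_j \xrightarrow{\mu} u$ and $u_j \xrightarrow{\mu} w$. Because of $|u - w| \le |u - u_j| + |u_j - w|$ we find for all $j, n \in \mathbb{N}$ that

$$\{|u - w| > \tfrac{2}{n}\} \subset \{|u - u_j| > \tfrac{1}{n}\} \cup \{|u_j - w| > \tfrac{1}{n}\}.$$

Therefore,

$$\mu\big(A_k \cap \{|u - w| > \tfrac{2}{n}\}\big) \le \mu\big(A_k \cap \{|u - u_j| > \tfrac{1}{n}\}\big) + \mu\big(A_k \cap \{|u_j - w| > \tfrac{1}{n}\}\big) \xrightarrow{\ j \to \infty\ } 0$$

holds for all $k, n \in \mathbb{N}$, i.e. $A_k \cap \{|u - w| > \tfrac{2}{n}\}$ is a null set for all $k, n \in \mathbb{N}$; but then

$$\{u \ne w\} \subset \bigcup_{n\in\mathbb{N}} \{|u - w| > \tfrac{2}{n}\} = \bigcup_{k,n\in\mathbb{N}} A_k \cap \{|u - w| > \tfrac{2}{n}\}$$

is also a null set, and we are done.

Caution: Limits in measure on a non-$\sigma$-finite measure space $(X, \mathcal{A}, \mu)$ need not be unique, see Problem 16.6.

We are now ready for the main result of this chapter, which generalizes Lebesgue's dominated convergence theorem 11.2.

16.6 Theorem (Vitali) Let $(X, \mathcal{A}, \mu)$ be $\sigma$-finite and let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, be a sequence which converges in measure to some measurable function $u \in \mathcal{M}(\mathcal{A})$. Then the following assertions are equivalent:

(i) $\lim_{j\to\infty} \|u_j - u\|_p = 0$;
(ii) $(|u_j|^p)_{j\in\mathbb{N}}$ is a uniformly integrable family;
(iii) $\lim_{j\to\infty} \int |u_j|^p \, d\mu = \int |u|^p \, d\mu$.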
Proof (iii)⇒(ii): Since lim uj p d = up d, there exists some constant j→ C < such that supj∈ uj p d C, and for every > 0 there is some N ∈ such that ∀ j N
uj p d − up d p p
Setting w = maxu1 u2 uN u , we have w ∈ + [] and we see for every ∈ 0 1 that uj > 1 w = ∅ ∀ j N uj > 1 w ⊂ uj > u ∀ j ∈
This implies for all j ∈ that p p p p
uj d
uj − u u d d +
1 1 uj > 1 w uj > w uj > w up d p + uj >u p + p sup uj p d 1 + C p
j∈
Since
uj > 1 w = uj p > 1p wp (16.3) we have established the uniform integrability of uj p j∈ . (ii)⇒(i): Let us first check that the double sequence uj − uk p jk∈ is again uniformly integrable. In view of (16.3), our assumption reads uj p d < ∀j ∈ (16.4) p
w ∈ + ⇔ wp ∈ 1+
and
uj >w p
for some suitable w = w ∈ + . From a − b a + b 2 maxa b we deduce p p p uj − uk d 2 uj ∨ uk d uj −uk >2w
uj −uk >2w
and since uj − uk uj + uk we get uj − uk > 2w ⊂ uj > w ∪ uk > w
Consequently, uj − uk p d uj −uk >2w
2
p
+
uj >w ∩uk >w
2p
uj >wuk
uj p d
uj >w ∩uk >w
+ 2p
uj p d + 2p
uj >w 16 4
4 · 2p = 2p+2
+
+
uk >wuj
uj >w ∩uk >w
uk >w
uk p d
uj p ∨ uk p d
uk p d
p
From this we conclude that for W = 2w ∈ + and large R > 0 uj − uk p d uj − uk p d + uj − uk p d = uj −uk >W
uj −uk W
2p+2 +
uj −uk W ∧
2
p+2
+
2p+2 +
p
W d + p
uj −uk > ∩ W>R
p ∧ W p d +
uj − uk p d
<uj −uk W
∧ W d + p
uj − uk p d +
p
W d
uj −uk > ∩ <W R
W p d
W>R
+ R uj − uk > ∩ < W R
p
Letting first j k → we find because of uj −→ u that[] lim sup uj − uk p d 2p+2 + p ∧ W p d + jk→
W p d
W>R
The last two terms vanish as $\epsilon \to 0$ and $R \to \infty$ by the dominated convergence theorem 12.9, so that $\lim_{j,k\to\infty} \int |u_j - u_k|^p \, d\mu = 0$. Since $\mathcal{L}^p(\mu)$ is complete (cf. T12.7), $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p$ to a limit $\tilde{u} \in \mathcal{L}^p$.

Due to Lemma 16.4, $\mathcal{L}^p$-convergence also implies $u_j \xrightarrow{\mu} \tilde{u}$ and, by Lemma 16.5, we have $u = \tilde{u}$ a.e., hence $\mathcal{L}^p\text{-}\lim_{j\to\infty} u_j = u$.

(i)⇒(iii): is a consequence of the lower triangle inequality for the $\mathcal{L}^p$-norm, cf. the first part of the proof of Theorem 12.10.

16.7 Remark Vitali's theorem 16.6 still holds for measure spaces $(X, \mathcal{A}, \mu)$ which are not $\sigma$-finite. In this case, however, we can no longer identify the $\mathcal{L}^p$-limit and the theorem reads: If $u_j \xrightarrow{\mu} u$, then the following are equivalent:

(i) $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p$;
(ii) $(|u_j|^p)_{j\in\mathbb{N}}$ is uniformly integrable;
(iii) $(\|u_j\|_p)_{j\in\mathbb{N}}$ converges in $\mathbb{R}$.

The reason for this is evident from the proof of T16.6: the last few lines of the step (ii)⇒(i) require $\sigma$-finiteness of $(X, \mathcal{A}, \mu)$.
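A standard pair of examples on $([0,1], \lambda^1)$ shows what Vitali's theorem separates for $p = 1$. The spikes $u_j = j\,\mathbf{1}_{(0,1/j)}$ converge to 0 in measure but $\int u_j\,d\lambda^1 = 1$ for all $j$, so (iii) fails; the thinner spikes $v_j = j\,\mathbf{1}_{(0,1/j^2)}$ satisfy $\int v_j\,d\lambda^1 = 1/j \to 0$. The sketch below compares the tail integrals $\sup_j \int_{\{u_j > R\}} u_j\,d\lambda^1$ with constant reference functions $w = R$ (admissible here because the measure is finite). Both families and all helper names are illustrative assumptions, and the index set is truncated to finitely many $j$.

```python
from fractions import Fraction

def tail_integral(height, width, R):
    """Integral of u over {u > R} for the step function u = height * 1_(0, width)."""
    return height * width if height > R else Fraction(0)

def sup_tail(family, R):
    """Largest tail integral over a finite list of (height, width) pairs."""
    return max(tail_integral(h, w, R) for h, w in family)

J = range(1, 2001)  # finite truncation of the index set, for illustration only
spike = [(Fraction(j), Fraction(1, j)) for j in J]      # u_j = j * 1_(0, 1/j)
tame = [(Fraction(j), Fraction(1, j * j)) for j in J]   # v_j = j * 1_(0, 1/j^2)

for R in (10, 100, 1000):
    print(R, sup_tail(spike, R), sup_tail(tame, R))
```

For the first family the tail integrals stay at 1 however large $R$ is chosen, whereas for the second they equal $1/(R+1)$ and tend to 0, in line with the equivalence of (i), (ii) and (iii).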
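Different forms of uniform integrability (footnote 2)

2 This section can be left out at first reading.

In view of Vitali's convergence theorem 16.6 one is led to suspect that uniform integrability is essentially a sufficient (and also necessary, if $(X, \mathcal{A}, \mu)$ is $\sigma$-finite) condition for weak sequential relative compactness in $\mathcal{L}^1$, i.e.

$$(16.1) \implies \left\{ \begin{array}{l} \text{every } (u_j)_{j\in\mathbb{N}} \subset \mathcal{F} \text{ has a subsequence } (u_{j_k})_{k\in\mathbb{N}} \text{ such} \\ \text{that } \lim_{k\to\infty} \int u_{j_k} \cdot \phi \, d\mu \text{ exists for all } \phi \in \mathcal{L}^\infty \end{array} \right.$$

(see Dunford and Schwartz [15, pp. 289–90, 386–7]). In $\mathcal{L}^p$, $1 < p < \infty$, uniform boundedness of $\mathcal{F} \subset \mathcal{L}^p$ is enough for this:

$$\sup_{u \in \mathcal{F}} \|u\|_p < \infty \iff \left\{ \begin{array}{l} \text{every } (u_j)_{j\in\mathbb{N}} \subset \mathcal{F} \text{ has a subsequence } (u_{j_k})_{k\in\mathbb{N}} \\ \text{such that } \lim_{k\to\infty} \int u_{j_k} \cdot \phi \, d\mu \text{ exists for all } \phi \in \mathcal{L}^q,\ \tfrac{1}{p} + \tfrac{1}{q} = 1. \end{array} \right.$$

This is a consequence of the reflexivity of the spaces $\mathcal{L}^p$, $p > 1$. Let us give various equivalent conditions for uniform integrability.

16.8 Theorem Let $(X, \mathcal{A}, \mu)$ be some measure space and $\mathcal{F} \subset \mathcal{L}^1(\mu)$. Then the following statements (i)–(iv) are equivalent:

(i) $\mathcal{F}$ is uniformly integrable, i.e. (16.1) holds;
(ii) a) $\sup_{u\in\mathcal{F}} \int |u| \, d\mu < \infty$;
 b) $\forall\, \epsilon > 0\ \exists\, w_\epsilon \in \mathcal{L}^1_+,\ \delta > 0\ \forall\, B \in \mathcal{A}$: $\int_B w_\epsilon \, d\mu < \delta \implies \sup_{u\in\mathcal{F}} \int_B |u| \, d\mu < \epsilon$;
(iii) a) $\sup_{u\in\mathcal{F}} \int |u| \, d\mu < \infty$;
 b) $\forall\, \epsilon > 0\ \exists\, K_\epsilon \in \mathcal{A},\ \mu(K_\epsilon) < \infty$: $\sup_{u\in\mathcal{F}} \int_{K_\epsilon^c} |u| \, d\mu < \epsilon$;
 c) $\forall\, \epsilon > 0\ \exists\, \delta > 0\ \forall\, B \in \mathcal{A}$: $\mu(B) < \delta \implies \sup_{u\in\mathcal{F}} \int_B |u| \, d\mu < \epsilon$;
(iv) a) $\forall\, \epsilon > 0\ \exists\, K_\epsilon \in \mathcal{A},\ \mu(K_\epsilon) < \infty$: $\sup_{u\in\mathcal{F}} \int_{K_\epsilon^c} |u| \, d\mu < \epsilon$;
 b) $\lim_{R\to\infty} \sup_{u\in\mathcal{F}} \int_{\{|u| > R\}} |u| \, d\mu = 0$.

If $(X, \mathcal{A}, \mu)$ is a $\sigma$-finite measure space, (i)–(iv) are also equivalent to

(v) a) $\sup_{u\in\mathcal{F}} \int |u| \, d\mu < \infty$;
 b) $\lim_{j\to\infty} \sup_{u\in\mathcal{F}} \left| \int_{A_j} u \, d\mu \right| = 0$ for every decreasing sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$, $A_j \downarrow \emptyset$. [Note: $\mu(A_j) < \infty$ is not assumed.]

If $(X, \mathcal{A}, \mu)$ is a finite measure space, (i)–(v) are also equivalent to

(vi) $\lim_{R\to\infty} \sup_{u\in\mathcal{F}} \int_{\{|u| > R\}} |u| \, d\mu = 0$;
(vii) $\sup_{u\in\mathcal{F}} \int \Phi(|u|) \, d\mu < \infty$ for some increasing, convex function $\Phi : [0,\infty) \to [0,\infty)$ such that $\lim_{t\to\infty} \Phi(t)/t = \infty$.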
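As a concrete instance of condition (vii), one may take $\Phi(t) = t^p$ with $p > 1$, which is increasing, convex and satisfies $\Phi(t)/t = t^{p-1} \to \infty$. The following short computation is a sketch under the assumption that $\mu(X) < \infty$ and that $\mathcal{F}$ is bounded in $\mathcal{L}^p(\mu)$ with $C := \sup_{u\in\mathcal{F}} \|u\|_p < \infty$ (the constant $C$ is notation introduced here); it shows directly how (vii) yields (vi) in this special case.

```latex
% On the set {|u| > R} we have |u|^{1-p} <= R^{1-p} because p > 1, hence
\int_{\{|u| > R\}} |u| \, d\mu
  = \int_{\{|u| > R\}} |u|^{p} \, |u|^{1-p} \, d\mu
  \le R^{1-p} \int |u|^{p} \, d\mu
  \le C^{p} R^{1-p}
  \xrightarrow[R \to \infty]{} 0
  \qquad \text{uniformly for all } u \in \mathcal{F}.
```

The same mechanism, with $t^{p-1}$ replaced by $\Phi(t)/t$, drives the general step (vii)⇒(i).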
16.9 Remark Almost any combination of the above criteria appears in the literature as uniform integrability or under different names. Here is a short list:

(ii-a) – uniform boundedness
(iii-b) – tightness
(iii-c) – uniform absolute continuity
(v-b) – uniform $\sigma$-additivity
(vii) – de la Vallée Poussin's condition
(iii) – Dieudonné's condition (weak seq. relative compactness)
(v) – Dunford–Pettis condition (weak seq. relative compactness)

Proof (of Theorem 16.8) First we show (iv)⇒(iii)⇒(ii)⇒(i)⇒(iv) for general measure spaces, then (ii)⇒(v)⇒(i) for $\sigma$-finite measure spaces and, finally, for finite measure spaces (iv)⇒(vi)⇒(vii)⇒(i).

(iv)⇒(iii): Condition (iii-b) is clear. Given $\epsilon > 0$ we can pick $K = K_{\epsilon/2} \in \mathcal{A}$ and $R = R_{\epsilon/2} > 0$ such that

$$\int |u| \, d\mu = \int_{K \cap \{|u| > R\}} |u| \, d\mu + \int_{K^c} |u| \, d\mu + \int_{K \cap \{|u| \le R\}} |u| \, d\mu \le \frac{\epsilon}{2} + \frac{\epsilon}{2} + R\,\mu(K) < \infty$$

uniformly for all $u \in \mathcal{F}$. Setting $\delta = \epsilon/(2R)$ we see for every $B \in \mathcal{A}$ with $\mu(B) < \delta$ that

$$\int_B |u| \, d\mu = \int_{B \cap \{|u| > R\}} |u| \, d\mu + \int_{B \cap \{|u| \le R\}} |u| \, d\mu \le \int_{\{|u| > R\}} |u| \, d\mu + R\,\mu(B) \le \frac{\epsilon}{2} + R\,\delta = \epsilon$$

and (iii) follows.
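(iii)⇒(ii): Condition (ii-a) is clear. Given $\epsilon > 0$ we pick $K = K_\epsilon \in \mathcal{A}$ with $\mu(K) < \infty$ and $\delta = \delta_\epsilon > 0$ and set $w_\epsilon = \mathbf{1}_K$. If $B \in \mathcal{A}$ is such that $\mu(B \cap K) = \int_B w_\epsilon \, d\mu < \delta$, we get from (iii-c) and (iii-b) that

$$\int_B |u| \, d\mu = \int_{B \cap K} |u| \, d\mu + \int_{B \cap K^c} |u| \, d\mu \le \epsilon + \epsilon$$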
uniformly for all u ∈ which is just (ii-b). (ii)⇒(i): Take w = w and = > 0 as in (ii). If R > and so
u d
u>Rw
u>Rw
u d R
w d
1
sup u d we see u∈
w d u>Rw
1 sup u d
R u∈
From (ii-b) we infer that supu∈ u>Rw u d . (i)⇒(iv): Let w = w be as in (i) resp. (16.1). Since u w ∩ u R ⊂ w R , we have u d = u d + u d u>R
u>w ∩u>R
u>w
+
u d +
uw ∩u>R
(16.5)
w d w>R
w 1w>R d
From the dominated convergence theorem 11.2 we see that the right-hand side tends (uniformly for all u ∈ ) to as R → and (iv-b) follows. To see (iv-a) we choose r = r > 0 so small that wr w d w ∧ r d ; this is possible since by Lebesgue’s convergence theorem 11.2 limr→0 w ∧ r d = 0. By the Markov inequality P10.12 we see w > r 1r w d < , and we get for K = w > r u d = sup u d + u d sup u∈ K c
wr ∩u>w
u∈ (16.1)
+ sup
u∈ wr ∩uw
+ 2
w d wr
wr ∩uw
u d
This proves (iv). Assume for the rest of the proof that is -finite (ii)⇒(v): (v-a) is clear. If A j ↓ ∅ we see from the monotone convergence theorem 11.1 that limj→ A w d = 0, so that for we have by (ii-b) j supu∈ A u d supu∈ A u d < for sufficiently large j ∈ . j
j
(v)⇒(i): Note that for the positive, resp. negative parts u± of u u± d = ±u d and Aj ∩ ±u 0 ↓ ∅ Aj ∩±u0
Aj
which implies that we may replace u in (v-b) by u. Since is -finite, we can find an exhausting sequence Ek ∈ , Ek ↑ X, Ek < . The function w =
2−k 1 1 + Ek Ek k∈
is clearly positive and ∈ 1+ . Assume (i) false; in particular, u d >
∃ > 0 ∀ j ∈ sup u∈ u>j w
But Aj = u > j w ↓ ∅ and (v) (with the above discussed modification) will then lead to a contradiction. Assume for the rest of the proof that is finite (iv)⇒(vi): is trivial. (vi)⇒(vii): For u ∈ we set n = n u = u > n and define t =
s ds
s =
0t
n 1nn+1 s
n=1
We will now determine the numbers 1 2 3 . Clearly, t =
n
n=1
0t
1nn+1 s ds =
n t − n+ ∧ 1
n=1
and
u d =
n=1
n
n u > n
u − n+ ∧ 1 d n=1
(16.6)
If we can construct n n∈ such that it increases to and (16.6) is finite (uniformly for all u ∈ ), then we are done: s will increase to , t will be convex3 and satisfy t 1 1 s ds = s ds 21 2t ↑
t t 0t t t/2t By assumption we sequence rj j∈ ⊂ such that can find an increasing −j limj→ rj = and u>r u d 2 . Thus j
u > k =
k=rj
< u + 1
k=rj =k
=
< u + 1
=rj k=rj
< u + 1
=rj
Now sum the above inequality over j = 1 2 3 to get
u > k
j=1 k=rj
< u + 1
j=1 =rj
=
j=1 =rj <u+1 j=1
u d
u d 1
u>rj
2−j by assumption 3
Usually one argues that 0 a.e., but for this we need to know that the monotone function = is almost everywhere differentiable – and this requires Lebesgue’s differentiation theorem 19.20. Here is an alternative elementary argument: it is not hard to see that a b → is convex if, and only if, y−x z−x holds for all a < x < y < z < b, use e.g. the technique of the proof of Lemma 12.13. y−x z−x x Since x = 0 s ds (by L13.12 and T11.8), this is the same as 1 y 1 z 1 y 1 z s ds s ds ⇐⇒ s ds s ds y−x x z−x x y−x x z−y y 1 1 ⇐⇒ sy − x + x ds sz − y + y ds
0
0
The latter inequality follows from the fact that is increasing and sy − x + x ∈ x y while sz − y + y ∈ y z for 0 s 1.
and interchange the order of summation in the first double sum on the left: u > k = 11k rj u > k 1
j=1 k=rj
k=1
j=1
= k
This finishes the construction of the sequence k k∈ . (vii)⇒(i): Since X < , constants are integrable and we may take w x = r for all x ∈ X. Fix > 0 and choose r so big that t−1 t > 1/ for all t > r . Then u d u d u d u>r
u>r
and (i) follows. Problems 16.1. Let X be a finite measure space and uj j∈ ⊂ . Prove that
j→ lim sup uj > = 0 ∀ > 0 =⇒ fj −−→ 0 a.e. k→
jk
[Hint: uj → 0 a.e. if, and only if, jk uj > is small for all > 0 and big k k .] 16.2. Show that for a sequence uj j∈ of measurable functions on a finite measure space
lim sup uj > = lim sup uj > ∀ > 0 k→
j→
jk
and combine this with Problem 16.1 to give a new criterion for a.e. convergence. j→
16.3. Let X be a measure space and uj j∈ ⊂ . Show that uj −−→ u in jk→
measure if, and only if, uj − uk −−−→ 0 in measure. 16.4. Consider one-dimensional Lebesgue measure on 0 1 0 1 . Compare the convergence behaviour (a.e., p , in measure) of the following sequences: (i) fnj = n 1j−1/nj/n , n ∈ 1 j n run through in lexicographical order; (ii) gn = n 101/n , n ∈ ; (iii) hn = an 1 − nx+ , n ∈ , x ∈ 0 1 and a sequence an n∈ ⊂ + . 16.5. Let uj j∈ wj j∈ be two sequences of measurable functions on X . Sup
→ u and wj − → w. Show that auj + bwj , a b ∈ , maxuj wj , pose that uj − minuj wj and uj converge in measure and find their limits. 16.6. Let X be a measure space which is not -finite. Construct an example of a sequence uj j∈ ⊂ which converges in measure but whose limit is not unique. Can this happen in a -finite measure space?
[Hint: let Xf = F F < be the -finite part of X. Show that X \ Xf = ∅, that every measurable E ⊂ X \ Xf satisfies E = and that we can change every limit of uj j∈ outside Xf .] 16.7. (i) Prove, without using Vitali’s convergence theorem, the following Theorem (Bounded convergence). Let X be a measure space, A ∈ be a set with A < and uj j∈ be a sequence of measurable functions. Suppose that all uj vanish on Ac , that uj C for all j ∈ and some constant
→ u. Then L1 -limj uj = u. C > 0 and that uj − (ii) Use one-dimensional Lebesgue measure and the sequence uj = 1jj+1 to show that the assumption A < is really needed in (i). (iii) As L1 -limit the function u is unique but, as we have seen in Problem 16.6, this is not the case for limits in measure. Why does the uniqueness of the limit in (i) not contradict Problem 16.6? 16.8. Let P be a probability space. Define for two random variables X Y X Y = inf > 0 PX − Y
(i) is a pseudo-metric on the space of random variables , i.e. satisfies properties d2 , d3 of a metric, cf. Appendix B, Definition B.15. (ii) A sequence Xj j∈ ⊂ converges in probability to a random variable j→
X if, and only if, Xj X −−→ 0. (iii) is a complete pseudo-metric on , i.e. every -Cauchy sequence converges in probability to some limit in . (iv) Show that g X Y =
X − Y dP 1 + X − Y
and
X Y =
X − Y ∧ 1 dP
are pseudo-metrics on which have the same Cauchy sequences as . 16.9. Let X be a -finite measure space. Suppose that Aj j∈ ⊂ satisfies j→
Aj −−→ 0. Show that lim
j→ Aj
u d = 0
∀ u ∈ 1
[Hint: use Vitali’s convergence theorem 16.6.] 16.10. Let X be a measure space and un n∈ ⊂ . n→
(i) Let xn n∈ ⊂ . Show that xn −−→ 0 if, and only if, every subsequence k→
xnk k∈ satisfies xnk −−→ 0.
→ u if, and only if, every subsequence unk k∈ has a sub(ii) Show that un − subsequence ˜unk k∈ which converges a.e. to u on every set A ∈ of finite -measure.
[Hint: use L16.4 for necessity. For sufficiency show that u˜ nk → u in measure, hence the sequence of reals A ∩ unk − u > has a subsequence converging to 0; use (i) to conclude that A ∩ un − u > → 0.] → u entails that un − → u for every (iii) Use part (ii) to show that un − continuous function → . 16.11. Let and be two families of uniformly integrable functions on an arbitrary measure space X . Show that (i) every finite collection of functions f1 fn ⊂ 1 is uniformly integrable. (ii) ∪ f1 fn , f1 fn ∈ 1 is uniformly integrable. (iii) + = f + g f ∈ g ∈ is uniformly integrable. (iv) c.h. = tf + 1 − t f ∈ 0 t 1 (‘c.h.’ stands for convex hull) is uniformly integrable. (v) the closure of c.h. in the space 1 is uniformly integrable. 16.12. Assume that uj j∈ is uniformly integrable. Show that 1 lim sup uj d = 0
k→ k jk 16.13. Let P be a probability space. Adapt the proof of Theorem 16.8 to show that a sequence uj j∈ ⊂ 1 is uniformly integrable if it is bounded in some space p P with p > 1, i.e. if supj∈ uj p < . Use Vitali’s convergence theorem 16.6 to construct an example illustrating that 1 -boundedness of uj j∈ does not guarantee uniform integrability. 16.14. Let X be a finite measure space and ⊂ 1 be a family of integrable functions. Show that is uniformly integrable if, and only if, j=1 j j < f j + 1 converges uniformly for all f ∈ . [Hint: compare (vi)⇒(vii) of the proof of Theorem 16.8.]
17 Martingales
Martingales are a key tool of modern probability theory, in particular, when it comes to a.e. convergence assertions and related limit theorems. The origins of martingale techniques can be traced back to analysis papers by Kac, Marcinkiewicz, Paley, Steinhaus, Wiener and Zygmund from the early 1930s on independent (or orthogonal) functions and the convergence of certain series of functions, see e.g. the paper by Marcinkiewicz and Zygmund [28] which contains many references. The theory of martingales as we know it now goes back to Doob and most of the material of this and the following chapter can be found in his seminal monograph [13] from 1953.

We want to understand martingales as an analysis tool which will be useful for the study of $L^p$- and almost everywhere convergence and, in particular, for the further development of measure and integration theory. Our presentation differs somewhat from the standard way to introduce martingales – conditional expectations will be defined later in Chapter 22 – but the results and their proofs are pretty much the usual ones. The only difference is that we develop the theory for $\sigma$-finite measure spaces rather than just for probability spaces. Those readers who are familiar with martingales and the language of conditional expectations we ask for patience until Chapter 23, in particular Theorem 23.9, when we catch up with these notions.

Throughout this chapter $(X, \mathcal{A}, \mu)$ is a measure space which admits a filtration, i.e. an increasing sequence

$$\mathcal{A}_0 \subset \mathcal{A}_1 \subset \dots \subset \mathcal{A}_j \subset \dots \subset \mathcal{A}$$

of sub-$\sigma$-algebras of $\mathcal{A}$. If $(X, \mathcal{A}_0, \mu)$ is $\sigma$-finite (footnote 1) we call $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ a $\sigma$-finite filtered measure space. This will always be the case from now on. Finally, we write $\mathcal{A}_\infty = \sigma(\mathcal{A}_j : j = 0, 1, 2, \dots)$ for the smallest $\sigma$-algebra generated by all $\mathcal{A}_j$.

1 i.e. there is a sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}_0$ with $A_j \uparrow X$ and $\mu(A_j) < \infty$.

17.1 Definition Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space. A sequence of $\mathcal{A}$-measurable functions $(u_j)_{j\in\mathbb{N}}$ is called a martingale (w.r.t. the filtration $(\mathcal{A}_j)_{j\in\mathbb{N}}$), if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ for each $j \in \mathbb{N}$ and if

$$\int_A u_{j+1} \, d\mu = \int_A u_j \, d\mu \qquad \forall\, A \in \mathcal{A}_j. \tag{17.1}$$

We say that $(u_j)_{j\in\mathbb{N}}$ is a submartingale (w.r.t. $(\mathcal{A}_j)_{j\in\mathbb{N}}$) if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ and

$$\int_A u_{j+1} \, d\mu \ge \int_A u_j \, d\mu \qquad \forall\, A \in \mathcal{A}_j \tag{17.2}$$

and a supermartingale (w.r.t. $(\mathcal{A}_j)_{j\in\mathbb{N}}$) if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ and

$$\int_A u_{j+1} \, d\mu \le \int_A u_j \, d\mu \qquad \forall\, A \in \mathcal{A}_j. \tag{17.3}$$
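If we want to emphasize the underlying filtration, we write $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$.

17.2 Remark (i) It is enough to assume instead of (17.1) that $\int_G u_{j+1} \, d\mu = \int_G u_j \, d\mu$ for all $G \in \mathcal{G}_j$ where $\mathcal{G}_j$ is a generator of $\mathcal{A}_j$ containing an exhausting sequence $(G_k)_{k\in\mathbb{N}} \subset \mathcal{G}_j$ with $G_k \uparrow X$. This follows from the fact that

$$\int_A u_{j+1} \, d\mu = \int_A u_j \, d\mu \iff \underbrace{\int_A \big(u_{j+1}^+ + u_j^-\big) \, d\mu}_{=:\ \nu(A)} = \underbrace{\int_A \big(u_{j+1}^- + u_j^+\big) \, d\mu}_{=:\ \rho(A)}$$

where $\nu, \rho$ are finite measures on $\mathcal{A}_j$ and from the uniqueness theorem 5.7: $\nu|_{\mathcal{G}_j} = \rho|_{\mathcal{G}_j}$ implies – under our assumptions on $\mathcal{G}_j$ – that $\nu = \rho$ on $\mathcal{A}_j$. (For sub- or supermartingales we need, in addition, that $\mathcal{G}_j$ is a semi-ring, cf. Lemma 15.6.)

(ii) Set $\mathcal{G}_j = \{A \in \mathcal{A}_j : \mu(A) < \infty\}$. It is not hard to see that $\mathcal{G}_j$ is a semi-ring and that, because of $\sigma$-finiteness, $\sigma(\mathcal{G}_j) = \mathcal{A}_j$. Therefore (i) means that it is enough to assume (17.1)–(17.3) for all sets in $\mathcal{G}_j$, i.e. for all sets with finite $\mu$-measure.

(iii) Condition (17.2) in Definition 17.1 is equivalent to

$$\int \phi\, u_{j+1} \, d\mu \ge \int \phi\, u_j \, d\mu \qquad \forall\, \phi \in \mathcal{L}^\infty_+(\mathcal{A}_j). \tag{17.2$'$}$$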
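Indeed: Since $\phi = \mathbf{1}_A \in \mathcal{L}^\infty_+(\mathcal{A}_j)$ for all $A \in \mathcal{A}_j$, (17.2$'$) implies (17.2). Conversely, if $\phi \in \mathcal{L}^\infty_+(\mathcal{A}_j)$ is a simple function, (17.2$'$) follows from (17.2) by linearity. For general $\phi \in \mathcal{L}^\infty_+(\mathcal{A}_j)$, we find by T8.8 a sequence of $\mathcal{A}_j$-measurable simple functions $\phi_k$ such that $\phi_k \ge 0$ and $\phi_k \uparrow \phi$. Since $u_j, u_{j+1} \in \mathcal{L}^1$, we can use Lebesgue's dominated convergence theorem 11.2 and get

$$\int \phi\, u_{j+1} \, d\mu = \lim_{k\to\infty} \int \phi_k\, u_{j+1} \, d\mu \overset{(17.2')}{\ge} \lim_{k\to\infty} \int \phi_k\, u_j \, d\mu = \int \phi\, u_j \, d\mu.$$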
Similar statements hold for martingales (17.1) and supermartingales (17.3). (iv) With some obvious (notational) changes in Definiton 17.1 we can also consider other index sets such as 0 , or −. 17.3 Examples Let X j be a -finite filtered measure space. (i) uj j∈ is a martingale if, and only if, it is both a sub- and a supermartingale. (ii) uj j∈ is a supermartingale if, and only if, −uj j∈ is a submartingale. (iii) Let uj j∈ and wj j∈ be [sub-]martingales and let be [positive] real numbers. Then uj + wj j∈ is a [sub-]martingale. (iv) Let uj j∈ be a submartingale. Then u+ j j∈ is a submartingale. Indeed: Take A ∈ j and observe that uj 0 ∈ j . Then + + uj+1 d uj+1 d uj+1 d A
A∩uj 0
A∩uj 0
(17.2)
A∩uj 0
uj d =
A
u+ j d
(v) Let uj j∈ be a martingale. Then uj j∈ is a submartingale. This follows from uj = 2u+ j − uj , (iii) and (iv). (vi) Let uj j∈ be a martingale. If uj ∈ p j for some p ∈ 1 , then uj p j∈ a submartingale. y Indeed: Note that y p − x p = x p tp−1 dt p x p−1 y − x for all x y ∈ y x where we set, as usual, x = − y if x > y . If we take y = uj+1 and x = uj and integrate over A ∈ j , we find by dominated convergence T11.2 uj+1 p − uj p d p 1A uj p−1 uj+1 − uj d A
=
lim p
N →
1A uj p−1 ∧ N uj+1 − uj d ∈ + j
(17.2 ),(v)
0
since uj j∈ is, by (v), a submartingale.
(vii) Let uj ∈ 1 j , j ∈ , and u1 u2 u3 . Then uj j∈ is a submartingale. (viii) Let X = 0 1 0 1 = 1 01 and consider the finite (-) algebras generated by all dyadic intervals of 0 1 of length 2−j , j ∈ 0 : −j −j −j j −j j = 0 2 k2 k + 12 2 − 12 1 Obviously, 0 ⊂ 1 ⊂ ⊂ 0 1 and 0 1 0 1 j is a (-) finite filtered measure space. Then uj j∈0 , uj = 2j 102−j , is a martingale. Indeed: Since the sets k2−j k + 12−j , k = 0 1 2j − 1 are a disjoint partition of 0 1, every A ∈ consists of a (finite) disjoint union of such sets. If 0 2−j ⊂ A, we have uj+1 d = 2j+1 1A∩02−j+1 d = 2j+1 2−j+1 A
= 2j 2−j =
2j 1A∩02−j d =
A
uj d
and, otherwise, uj+1 d = 2j+1 102−j+1 d = 0 = 2j 102−j d = uj d A
A
A
A
(ix) Let X = 0 n 0 n = n 0n and consider the algebras j generated by the lattice of half-open dyadic squares of sidelength 2−j , j ∈ 0 , −j n −j n j ∈ 0 j = z + 0 2 z ∈ 2 0 n n n Then 0 ⊂ 1 ⊂ ⊂ 0 , and 0 0 j is a -finite filtered measure space. For every real-valued function u ∈ 1 0 n we can define an j measurable step function uj on the dyadic squares in j by z+02−j n u d 1z+02−j n x uj x = −j n z∈2−j n0 z + 0 2 (17.4)
1z+02−j n d 1z+02−j n x = u −j n z + 0 2 n −j z∈2 0
Then uj j j∈ is a martingale.
Indeed: Since the sets z + 0 2−j n are disjoint for different z ∈ 2−j n0 , the sums in (17.4) are actually finite sums.
That uj ∈ 1 j is clear from the construction. To see (17.1), fix z ∈ 2−j n0 and j ∈ 0 and observe that for all k = j j + 1 j + 2
z +02−j n
uk x dx
=
1z+02−k n d · 1z+02−k n 1z +02−j n d u z + 0 2−k n
z∈2−k n0
=
z∈2−k n0 z+02−k n ⊂z +02−j n
=
z∈2−k n0 z+02−k n ⊂z +02−j n
=
1z+02−k n d · z + 0 2−k n u z + 0 2−k n
z+02−k n
ux dx
z +02−j n
ux dx
The r.h.s. is independent of k and, therefore, we get uj d = u d = z +02−j n
z +02−j n
z +02−j n
uj+1 d
Since j is generated by (disjoint unions of) squares of the form z + n −j n
−j 0 2 , z ∈ 2 0 , the claim follows from Remark 17.2(i).
(x) Assume that X is a probability space, i.e. a measure space where X = 1. A family of real functions uj j∈ ⊂ 1 is called independent, if M M −1 uj Bj = u−1 (17.5) j Bj j=1
j=1
holds for all M ∈ and any choice of B1 B2 BM ∈ . If k = u1 u2 uk is the -algebra generated by u1 u2 uk , then the sequence of partial sums sk = u1 + u2 + · · · + uk is an k k∈ -submartingale if, and only if,
k ∈
uj d 0 for all j.
To see this we need an auxiliary result which is of some interest on its own: If u1 u2 uk+1 are independent integrable functions, then A
uk+1 d = A
uk+1 d
∀ A ∈ u1 u2 uk
(17.6)
∀ ∈ 1 u1 uk
(17.7)
and
uk+1 d =
d ·
uk+1 d
In particular, integrable independent functions satisfy k
uj d =
j=1
k
uj d
j=1
The proof of (17.6) and (17.7) will be given in Scholium 17.4 below. Returning to the original problem, we find for all A ∈ k that A
sk+1 d =
A
sk + uk+1 d = (17.6)
=
A
A
sk d +
A
uk+1 d
sk d + A
uk+1 d
Thus uk+1 d 0 is necessary and sufficient for sk k∈ to be a submartingale. (xi) Let uj j∈ ⊂ 1+ ∩ + be independent functions (in the sense of (x)). Then pk = u0 · u1 · · uk , k ∈ , isa submartingale w.r.t. the filtration k = u0 u1 uk if, and only if, uj d 1 for all j. This follows directly from A
pk+1 d =
(17.7)
1A pk uk+1 d =
=
A
1A pk d · pk d ·
uk+1 d uk+1 d ∀ A ∈ k
17.4 Scholium (on independent functions) (i) Let u1 u2 uk+1 be independent integrable functions on the probability space X . Then A
uk+1 d = A
uk+1 d
∀ A ∈ u1 u2 uk
(17.6)
and
uk+1 d =
d ·
∀ ∈ 1 u1 uk
uk+1 d
(17.7)
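Proof. We begin with (17.6). Pick a set $A_M = \bigcap_{j=1}^M u_j^{-1}(B_j)$, $B_1, \dots, B_M \in \mathcal{B}(\mathbb{R})$, $M \le k$, from the generator of $\mathcal{A}_k = \sigma(u_1, u_2, \dots, u_k)$. Because of Theorem 8.8 (and Problem 8.10) we find a sequence of simple functions $(f_\ell)_{\ell\in\mathbb{N}} \subset \mathcal{E}(\sigma(u_{k+1}))$ such that $|f_\ell| \le |u_{k+1}|$ and $\lim_{\ell\to\infty} f_\ell = u_{k+1}$. For the standard representations $f_\ell = \sum_{j=0}^{N_\ell} y_j^\ell 1_{H_j^\ell}$, $H_j^\ell \in \sigma(u_{k+1})$, we get using dominated convergence T11.2
\[
\begin{aligned}
\int_{A_M} u_{k+1}\,d\mu
&\overset{11.2}{=} \lim_{\ell\to\infty} \int_{A_M} \sum_{j=0}^{N_\ell} y_j^\ell 1_{H_j^\ell}\,d\mu
= \lim_{\ell\to\infty} \sum_{j=0}^{N_\ell} y_j^\ell\, \mu(A_M \cap H_j^\ell) \\
&\overset{(17.5)}{=} \lim_{\ell\to\infty} \sum_{j=0}^{N_\ell} y_j^\ell\, \mu(A_M)\, \mu(H_j^\ell)
\overset{11.2}{=} \mu(A_M) \int u_{k+1}\,d\mu
\end{aligned}
\]
where we applied (17.5) for $H_j^\ell \in \sigma(u_{k+1}) \iff H_j^\ell = u_{k+1}^{-1}(C_j^\ell)$ with some suitable $C_j^\ell \in \mathcal{B}(\mathbb{R})$ and $A_M$. This proves (17.6) for a generator of $\mathcal{A}_k$ which satisfies the conditions stated in Remark 17.2(i); a similar argument as the one in this remark now proves that (17.6) holds for all $A \in \mathcal{A}_k$.

For (17.7) let us first assume that $\phi$ is bounded. Set $\mathcal{A}_k = \sigma(u_1, \dots, u_k)$. By Theorem 8.8 (and Problem 8.10) we find a sequence of simple functions $(f_\ell)_{\ell\in\mathbb{N}} \subset \mathcal{E}(\mathcal{A}_k)$ such that $|f_\ell| \le |\phi|$ and $\lim_{\ell\to\infty} f_\ell = \phi$. For the standard representations $f_\ell = \sum_{j=0}^{N_\ell} y_j^\ell 1_{A_j^\ell}$, $A_j^\ell \in \mathcal{A}_k$, we get using dominated convergence T11.2 and (17.6)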
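\[
\begin{aligned}
\int \phi\, u_{k+1}\,d\mu
&\overset{11.2}{=} \lim_{\ell\to\infty} \sum_{j=0}^{N_\ell} y_j^\ell \int 1_{A_j^\ell}\, u_{k+1}\,d\mu
= \lim_{\ell\to\infty} \sum_{j=0}^{N_\ell} y_j^\ell\, \mu(A_j^\ell) \int u_{k+1}\,d\mu \\
&= \lim_{\ell\to\infty} \int \sum_{j=0}^{N_\ell} y_j^\ell 1_{A_j^\ell}\,d\mu \cdot \int u_{k+1}\,d\mu
\overset{11.2}{=} \int \phi\,d\mu \cdot \int u_{k+1}\,d\mu.
\end{aligned}
\]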
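If $\phi$ is integrable but not bounded, we apply the previous calculation to the bounded functions $|\phi| \wedge \ell$ (with $|u_{k+1}|$ in place of $u_{k+1}$) and use dominated convergence on the right and monotone convergence on the left to get
\[
\int |\phi| \cdot |u_{k+1}|\,d\mu
\overset{9.6}{=} \lim_{\ell\to\infty} \int (|\phi| \wedge \ell) \cdot |u_{k+1}|\,d\mu
= \lim_{\ell\to\infty} \int (|\phi| \wedge \ell)\,d\mu \cdot \int |u_{k+1}|\,d\mu
\overset{11.2}{=} \int |\phi|\,d\mu \cdot \int |u_{k+1}|\,d\mu.
\]
This shows, in particular, that $\phi\, u_{k+1} \in \mathcal{L}^1(\mathcal{A})$. We can therefore apply dominated convergence to $\phi_\ell = (-\ell) \vee \phi \wedge \ell$ to derive
\[
\int \phi\, u_{k+1}\,d\mu
= \lim_{\ell\to\infty} \int \phi_\ell\, u_{k+1}\,d\mu
= \lim_{\ell\to\infty} \int \phi_\ell\,d\mu \cdot \int u_{k+1}\,d\mu
= \int \phi\,d\mu \cdot \int u_{k+1}\,d\mu.
\]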
(ii) In Example 17.3(x) we assumed the existence of infinitely many independent functions. As a matter of fact, this is not a completely trivial matter. If we want to construct finitely many independent functions $u_1, u_2, \dots, u_n$, we can proceed as follows. Replace the probability space $(X, \mathcal{A}, \mu)$ by the $n$-fold product measure space $(X^n, \mathcal{A}^{\otimes n}, \mu^{\times n})$ (which is again a probability space[✓]) and define $\tilde{u}_j(x_1, \dots, x_n) = u_j(x_j)$ for $j = 1, 2, \dots, n$. Since each of the new functions $\tilde{u}_j$ depends only on the variable $x_j$, their independence follows from a simple Fubini-type argument. A similar argument can be applied to countably many functions, provided we know how to construct infinite-dimensional products.

We will not follow this route but construct instead countably many independent functions $(X_j)_{j\in\mathbb{N}}$ on the probability space $([0,1], \mathcal{B}[0,1], \lambda = \lambda^1|_{[0,1]})$ which are identically distributed, i.e. the image measures satisfy $X_1(\lambda) = X_j(\lambda)$ for all $j \in \mathbb{N}$, with a Bernoulli distribution $X_1(\lambda) = p\,\delta_1 + (1-p)\,\delta_0$, $p \in (0,1)$. Consider the interval map $\beta_p : [0,1] \to [0,1]$,
\[ \beta_p(x) = \frac{x}{p}\, 1_{[0,p)}(x) + \frac{x-p}{1-p}\, 1_{[p,1]}(x) \]
and its iterates $\beta_p^n = \underbrace{\beta_p \circ \dots \circ \beta_p}_{n \text{ times}}$, see the pictures for the graphs of $\beta_p$ and $\beta_p^2$. Define
\[ X_n(x) = 1_{[0,p)}\big(\beta_p^{n-1}(x)\big), \qquad n \in \mathbb{N}. \]
In the first step the interval $[0,1]$ is split according to $p : (1-p)$ into two intervals $[0,p)$ and $[p,1]$, and $X_1$ is 1 on the left segment and 0 on the right. The subsequent iterations split each of the intervals of the previous step (say, step $n-1$) into two new sub-intervals according to the ratio $p : (1-p)$, and we define $X_n$ to be 1 on each new left subinterval and 0 otherwise, see the picture for $n = 1, 2$. Thus $\lambda(X_n = 1) = p$ and $\lambda(X_n = 0) = 1-p$, which means that the $X_n$ are identically Bernoulli distributed. To see independence, fix $\epsilon_j \in \{0,1\}$, and observe that $\{X_1 = \epsilon_1\} \cap \{X_2 = \epsilon_2\} \cap \dots \cap \{X_{n-1} = \epsilon_{n-1}\}$ exactly determines the segment before the $n$th split. Since each split preserves the proportion between $p$ and $1-p$, we find
[Figure: graphs of $\beta_p$ (axis marks $p$, $1$) and of $\beta_p^2$ (axis marks $p^2$, $p$, $2p - p^2$, $1$) on $[0,1]$.]
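\[ \lambda\big(\{X_1 = \epsilon_1\} \cap \dots \cap \{X_{n-1} = \epsilon_{n-1}\} \cap \{X_n = 1\}\big) = \lambda\big(\{X_1 = \epsilon_1\} \cap \dots \cap \{X_{n-1} = \epsilon_{n-1}\}\big) \cdot p \]
so that
\[ \lambda\big(\{X_1 = \epsilon_1\} \cap \dots \cap \{X_{n-1} = \epsilon_{n-1}\} \cap \{X_n = \epsilon_n\}\big) = p^{\epsilon_1 + \dots + \epsilon_n} (1-p)^{n - \epsilon_1 - \dots - \epsilon_n} = \prod_{j=1}^n \lambda(X_j = \epsilon_j). \]
This shows that the $X_j$ are all independent.

For later reference purposes let us derive some formulae for the arithmetic means $\frac{1}{n} S_n = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$. The mean value is
\[ \int \frac{1}{n} S_n\,d\lambda = \frac{1}{n} \int (X_1 + \dots + X_n)\,d\lambda = \int X_1\,d\lambda = 1 \cdot p + 0 \cdot (1-p) = p \]
while the variance is given by
\[
\begin{aligned}
\int \Big(\frac{1}{n} S_n - p\Big)^2 d\lambda
&= \frac{1}{n^2} \int \Big(\sum_{j=1}^n (X_j - p)\Big)^2 d\lambda
= \frac{1}{n^2} \sum_{j,k=1}^n \int (X_j - p)(X_k - p)\,d\lambda \\
&= \frac{1}{n^2} \sum_{j=1}^n \int (X_j - p)^2\,d\lambda && \text{(independence)} \\
&= \frac{1}{n} \int (X_1 - p)^2\,d\lambda && \text{(identical distr.)} \\
&= \frac{1}{n} \big((1-p)^2 p + p^2 (1-p)\big) = \frac{1}{n}\, p(1-p).
\end{aligned}
\]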
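The construction of the $X_n$ and the two formulae for $\frac{1}{n} S_n$ can be reproduced with exact rational arithmetic. This is a minimal sketch, not part of the text: the function names and the choice of evaluating each $X_k$ at segment midpoints are assumptions made for illustration.

```python
from fractions import Fraction


def beta(p, x):
    """Interval map beta_p: stretch [0, p) and [p, 1] linearly onto [0, 1]."""
    return x / p if x < p else (x - p) / (1 - p)


def X(n, p, x):
    """X_n(x) = 1 if the (n-1)-fold iterate of beta_p at x lies in [0, p)."""
    for _ in range(n - 1):
        x = beta(p, x)
    return 1 if x < p else 0


def segments(n, p):
    """Return (pattern, left, right) for the 2**n segments after n splits."""
    segs = [((), Fraction(0), Fraction(1))]
    for _ in range(n):
        split = []
        for eps, a, b in segs:
            cut = a + p * (b - a)
            split.append((eps + (1,), a, cut))  # new left part: X = 1
            split.append((eps + (0,), cut, b))  # new right part: X = 0
        segs = split
    return segs


def mean_and_variance(n, p):
    """Mean and variance of S_n / n, summing over the segments."""
    segs = segments(n, p)
    mean = sum((b - a) * Fraction(sum(eps), n) for eps, a, b in segs)
    var = sum((b - a) * (Fraction(sum(eps), n) - p) ** 2 for eps, a, b in segs)
    return mean, var
```

For a rational `p` such as `Fraction(1, 3)`, each segment's length equals $p^{\epsilon_1 + \dots + \epsilon_n}(1-p)^{n - \epsilon_1 - \dots - \epsilon_n}$ for its pattern, and `mean_and_variance(n, p)` returns `p` and `p * (1 - p) / n`.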
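In the next chapter we study the convergence behaviour of a martingale $(u_j)_{j\in\mathbb{N}}$; therefore, it is natural to ask questions of the type: from which index $j$ onwards does $u_j(x)$ exceed a certain threshold, etc. This means that we must be able to admit indices $\tau$ which may depend on the argument $x$ of $u_j(x)$: $u_{\tau(x)}(x)$. The problem is measurability.

17.5 Definition Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space. A stopping time is a map $\tau : X \to \mathbb{N} \cup \{\infty\}$ which satisfies $\{\tau \le j\} \in \mathcal{A}_j$ for all $j \in \mathbb{N}$. The associated $\sigma$-algebra is given by
\[ \mathcal{A}_\tau = \big\{A \in \mathcal{A} : A \cap \{\tau \le j\} \in \mathcal{A}_j \ \ \forall j \in \mathbb{N}\big\}. \]
As usual, we write $u_\tau(x)$ instead of the more correct $u_{\tau(x)}(x)$.

17.6 Lemma Let $\sigma, \tau$ be stopping times on a $\sigma$-finite filtered measure space $(X, \mathcal{A}, \mathcal{A}_j, \mu)$.
(i) $\sigma \wedge \tau$, $\sigma \vee \tau$, $\sigma + k$, $k \in \mathbb{N}_0$, are stopping times.
(ii) $\{\sigma < \tau\} \in \mathcal{A}_\sigma \cap \mathcal{A}_\tau$ and $\mathcal{A}_\sigma \subset \mathcal{A}_\tau$ if $\sigma \le \tau$.
(iii) If $(u_j)_{j\in\mathbb{N}}$ is a sequence of real functions such that $u_j \in \mathcal{M}(\mathcal{A}_j)$, then $u_\sigma$ is $\mathcal{A}_\sigma / \mathcal{B}(\mathbb{R})$-measurable.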
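Proof (i) follows immediately from the identities
\[
\begin{aligned}
\{\sigma \wedge \tau \le j\} &= \{\sigma \le j\} \cup \{\tau \le j\} \in \mathcal{A}_j, \\
\{\sigma \vee \tau \le j\} &= \{\sigma \le j\} \cap \{\tau \le j\} \in \mathcal{A}_j, \\
\{\sigma + k \le j\} &= \{\sigma \le j - k\} \in \mathcal{A}_{(j-k) \vee 0} \subset \mathcal{A}_j.
\end{aligned}
\]
(ii) Since for all $j \in \mathbb{N}$
\[
\{\sigma < \tau\} \cap \{\sigma \le j\}
= \bigcup_{k=1}^j \{\sigma = k\} \cap \{k < \tau\}
= \bigcup_{k=1}^j \underbrace{\{\sigma \le k\}}_{\in \mathcal{A}_k} \cap \underbrace{\{\sigma \le k-1\}^c}_{\in \mathcal{A}_k} \cap \underbrace{\{\tau \le k\}^c}_{\in \mathcal{A}_k} \in \mathcal{A}_j
\]
we find that $\{\sigma < \tau\} \in \mathcal{A}_\sigma$, while a similar calculation for $\{\sigma < \tau\} \cap \{\tau \le j\}$ yields $\{\sigma < \tau\} \in \mathcal{A}_\tau$. If $\sigma \le \tau$ we find for $A \in \mathcal{A}_\sigma$
\[
A \cap \{\tau \le j\}
= A \cap \underbrace{\{\sigma \le \tau\}}_{= X} \cap \{\tau \le j\}
= \underbrace{A \cap \{\sigma \le j\}}_{\in \mathcal{A}_j} \cap \underbrace{\{\tau \le j\}}_{\in \mathcal{A}_j} \in \mathcal{A}_j
\]
i.e. $A \in \mathcal{A}_\tau$, hence $\mathcal{A}_\sigma \subset \mathcal{A}_\tau$.
(iii) We have for all $B \in \mathcal{B}(\mathbb{R})$ and $j \in \mathbb{N}$
\[
\{u_\sigma \in B\} \cap \{\sigma \le j\}
= \bigcup_{k=1}^j \{u_k \in B\} \cap \{\sigma = k\}
= \bigcup_{k=1}^j \underbrace{\{u_k \in B\}}_{\in \mathcal{A}_k} \cap \underbrace{\{\sigma \le k\}}_{\in \mathcal{A}_k} \cap \underbrace{\{\sigma \le k-1\}^c}_{\in \mathcal{A}_k} \in \mathcal{A}_j.
\]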
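The next result is a very useful characterization of (sub-)martingales.

17.7 Theorem Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space. For a sequence $(u_j)_{j\in\mathbb{N}}$, $u_j \in \mathcal{L}^1(\mathcal{A}_j)$, the following assertions are equivalent:
(i) $(u_j)_{j\in\mathbb{N}}$ is a submartingale;
(ii) $\int u_\sigma\,d\mu \le \int u_\tau\,d\mu$ for all bounded stopping times $\sigma \le \tau$;
(iii) $\int_A u_\sigma\,d\mu \le \int_A u_\tau\,d\mu$ for all bounded stopping times $\sigma \le \tau$ and $A \in \mathcal{A}_\sigma$.

Proof (i)⇒(ii): Let $\sigma \le \tau \le N$ be two stopping times. By Lemma 17.6 $u_\tau$ is measurable, and since
\[ \int |u_\tau|\,d\mu = \sum_{j=1}^N \int_{\{\tau = j\}} |u_j|\,d\mu \le \sum_{j=1}^N \int |u_j|\,d\mu < \infty \]
we find that $u_\sigma, u_\tau \in \mathcal{L}^1(\mathcal{A})$.

Step 1: $\tau - \sigma \le 1$. In this case $\{\sigma < \tau\} \cap \{\sigma = j\} = \{\tau > j\} \cap \{\sigma = j\} = \{\tau \le j\}^c \cap \{\sigma = j\} \in \mathcal{A}_j$ and we see
\[
\begin{aligned}
\int u_\sigma\,d\mu
&= \int_{\{\sigma = \tau\}} u_\sigma\,d\mu + \sum_{j=1}^{N-1} \int_{\{\sigma < \tau\} \cap \{\sigma = j\}} u_j\,d\mu \\
&\overset{(17.2)}{\le} \int_{\{\sigma = \tau\}} u_\sigma\,d\mu + \sum_{j=1}^{N-1} \int_{\{\sigma < \tau\} \cap \{\sigma = j\}} u_{j+1}\,d\mu \\
&= \int_{\{\sigma = \tau\}} u_\tau\,d\mu + \int_{\{\sigma < \tau\}} u_\tau\,d\mu && \text{(use } \tau - \sigma \le 1\text{)} \\
&= \int u_\tau\,d\mu.
\end{aligned}
\]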
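Step 2: if $\sigma \le \tau \le N$ we introduce (at most $N$) intermediate stopping times $\rho_j = (\sigma + j) \wedge \tau$, $j = 0, 1, 2, \dots, k \le N$. For some $k \le N$ we get $\sigma = \rho_0 \le \rho_1 \le \dots \le \rho_k = \tau$ while $\rho_{j+1} - \rho_j \le 1$. Repeating step 1 from above $k$ times yields
\[ \int u_\sigma\,d\mu = \int u_{\rho_0}\,d\mu \le \int u_{\rho_1}\,d\mu \le \dots \le \int u_{\rho_k}\,d\mu = \int u_\tau\,d\mu. \]
(ii)⇒(iii): Note that for any $A \in \mathcal{A}_\sigma$ the function $\rho = \sigma_A = \sigma 1_A + \tau 1_{A^c}$ is again a bounded stopping time. This follows from
\[ \{\rho \le j\} = \big(\{\sigma \le j\} \cap A\big) \cup \big(\{\tau \le j\} \cap A^c\big) \in \mathcal{A}_j, \qquad j \in \mathbb{N}, \]
where we used that $A \in \mathcal{A}_\sigma \subset \mathcal{A}_\tau$, cf. Lemma 17.6. Since $\rho \le \tau$, (ii) shows
\[ \int \big(u_\sigma 1_A + u_\tau 1_{A^c}\big)\,d\mu = \int u_\rho\,d\mu \le \int u_\tau\,d\mu \]
which is but $\int_A u_\sigma\,d\mu \le \int_A u_\tau\,d\mu$.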
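(iii)⇒(i): Take $\sigma = j$ and $\tau = j + 1$.

17.8 Remark One should read Theorem 17.7(iii) in the following way: Let $\tau_1 \le \tau_2 \le \dots \le \tau_k \le N$ be bounded stopping times. Then
\[ (u_j, \mathcal{A}_j)_{j\in\mathbb{N}} \text{ is a submartingale} \implies (u_{\tau_j}, \mathcal{A}_{\tau_j})_{j=1,\dots,k} \text{ is a submartingale.} \]
This statement is often called the optional sampling theorem.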
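Theorem 17.7(ii) can be tried out on a random walk with $\pm 1$ steps, whose partial sums form a submartingale when the up-probability is at least $\frac12$. The sketch below is an illustration under assumptions: the walk, the hitting time `first_hit`, and the brute-force check `is_stopping_time` (with respect to the filtration generated by the steps) are all hypothetical helpers, not notation from the text.

```python
from fractions import Fraction
from itertools import product


def paths(N, p):
    """All +-1 step sequences of length N with their probabilities."""
    for steps in product((1, -1), repeat=N):
        ups = steps.count(1)
        yield steps, p ** ups * (1 - p) ** (N - ups)


def partial_sums(steps):
    """u_1, ..., u_N with u_j = steps[0] + ... + steps[j-1]."""
    out, s = [], 0
    for x in steps:
        s += x
        out.append(s)
    return out


def first_hit(level, N):
    """sigma = first j with u_j >= level, capped at N (bounded by N)."""
    def sigma(steps):
        for j, s in enumerate(partial_sums(steps), start=1):
            if s >= level:
                return j
        return N
    return sigma


def is_stopping_time(tau, N):
    """Check that {tau <= j} depends on the first j steps only, for all j."""
    for j in range(1, N + 1):
        seen = {}
        for steps in product((1, -1), repeat=N):
            decided = tau(steps) <= j
            if seen.setdefault(steps[:j], decided) != decided:
                return False
    return True


def integral_stopped(tau, N, p):
    """Integral of u_tau, summed path by path."""
    return sum(w * partial_sums(s)[tau(s) - 1] for s, w in paths(N, p))
```

With `sigma = first_hit(1, N)` and the constant time `tau = N`, one has $1 \le \sigma \le \tau$, so the theorem predicts $\int u_1\,d\mu \le \int u_\sigma\,d\mu \le \int u_N\,d\mu$ for up-probabilities $p \ge \frac12$. A time such as "the index at which the path attains its maximum" fails the check in `is_stopping_time`, since it looks into the future.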
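Problems
Unless otherwise stated $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ will be a $\sigma$-finite filtered measure space.

17.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a martingale. Set $\mathcal{A}_0 = \{\emptyset, X\}$. Show that $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}_0}$ is a martingale if, and only if, $u_0 = \frac{1}{\mu(X)} \int u_1\,d\mu$.

17.2. Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a (sub-, super-)martingale and let $(\mathcal{B}_j)_{j\in\mathbb{N}}$ and $(\mathcal{C}_j)_{j\in\mathbb{N}}$ be filtrations in $\mathcal{A}$ which are smaller resp. larger than $(\mathcal{A}_j)_{j\in\mathbb{N}}$, i.e. such that $\mathcal{B}_j \subset \mathcal{A}_j \subset \mathcal{C}_j$.
(a) Show that $(u_j, \mathcal{B}_j)_{j\in\mathbb{N}}$ is again a (sub-, super-)martingale.
(b) Show that $(u_j, \mathcal{C}_j)_{j\in\mathbb{N}}$ is, in general, no longer a (sub-, super-)martingale.

17.3. Completion (7). Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a submartingale and denote by $\mathcal{A}_j^*$ the completion of $\mathcal{A}_j$. Then $(u_j, \mathcal{A}_j^*)_{j\in\mathbb{N}}$ is still a submartingale.

17.4. Show that $(u_j)_{j\in\mathbb{N}}$ is a submartingale if, and only if, $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ for all $j \in \mathbb{N}$ and
\[ \int_A u_j\,d\mu \le \int_A u_k\,d\mu \qquad \forall j < k,\ \forall A \in \mathcal{A}_j. \]
Find similar statements for martingales and supermartingales.

17.5. Prove the assertion made in Remark 17.2(ii).

17.6. Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a martingale with $u_j \in \mathcal{L}^2(\mathcal{A}_j)$. Show that
\[ \int u_j u_k\,d\mu = \int u_{j \wedge k}^2\,d\mu. \]
[Hint: assume that $j < k$. Approximate $u_j$ by simple functions from $\mathcal{E}(\mathcal{A}_j)$, use dominated convergence and (17.1).]

17.7. Martingale transform Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a martingale and let $(f_j)_{j\in\mathbb{N}}$ be a sequence of bounded functions such that $f_j \in \mathcal{M}(\mathcal{A}_j)$ for every $j \in \mathbb{N}$. Set $f_0 = 0$ and $u_0 = \int u_1\,d\mu$. Then the so-called martingale transform
\[ (f \bullet u)_k = \sum_{j=1}^k f_{j-1} \cdot (u_j - u_{j-1}), \qquad k \in \mathbb{N}, \]
is again a martingale w.r.t. $(\mathcal{A}_j)_{j\in\mathbb{N}}$.

17.8. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(X_j)_{j\in\mathbb{N}}$ be a sequence of independent identically distributed random variables with $X_j \in \mathcal{L}^2(\mathcal{A})$ and $\int X_j\,dP = 0$. Set $\mathcal{A}_j = \sigma(X_1, X_2, \dots, X_j)$.
(i) Show, without using Example 17.3(vi), that $S_n^2 = (X_1 + X_2 + \dots + X_n)^2$ is a submartingale w.r.t. $(\mathcal{A}_n)_{n\in\mathbb{N}}$.
(ii) Show that there exists a constant $\kappa$ such that $S_n^2 - \kappa n$ is a martingale w.r.t. $(\mathcal{A}_n)_{n\in\mathbb{N}}$.

17.9. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(X_j)_{j\in\mathbb{N}}$ be a sequence of independent random variables with $X_j \in \mathcal{L}^2(\mathcal{A})$, $\int X_j\,dP = 0$ and $\int X_j^2\,dP = \sigma_j^2$. Set $\mathcal{A}_j = \sigma(X_1, X_2, \dots, X_j)$ and $A_j = \sigma_1^2 + \dots + \sigma_j^2$. Show that
\[ M_n = S_n^2 - A_n = \Big(\sum_{j=1}^n X_j\Big)^2 - \sum_{j=1}^n \sigma_j^2 \]
is a martingale. [Hint: use formulae (17.6), (17.7) and Remark 17.2(ii).]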
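17.10. Martingale difference sequence Let $(d_j)_{j\in\mathbb{N}}$ be a sequence in $\mathcal{L}^2(\mathcal{A}) \cap \mathcal{L}^1(\mathcal{A})$ and define $\mathcal{A}_0 = \{\emptyset, X\}$ and $\mathcal{A}_j = \sigma(d_1, d_2, \dots, d_j)$. Suppose that for each $j \in \mathbb{N}$
\[ \int_A d_j\,d\mu = 0 \qquad \forall A \in \mathcal{A}_{j-1}. \]
Show that $(u_n^2)_{n\in\mathbb{N}}$, where $u_n = d_1 + \dots + d_n$, is a submartingale which satisfies
\[ \int u_n^2\,d\mu = \sum_{j=1}^n \int d_j^2\,d\mu. \]
Show that on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda^1)$ the sequence $d_j(x) = \operatorname{sgn} \sin(2^j \pi x)$, $x \in \mathbb{R}$, $j \in \mathbb{N}$, is a martingale difference sequence. (See Chapter 24, in particular pp. 299 and 302 for more details.)

17.11. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(X_j)_{j\in\mathbb{N}}$ be a sequence of independent identically Bernoulli $(p, 1-p)$-distributed random variables with values $\pm 1$, i.e. such that $P(X_j = 1) = p$ and $P(X_j = -1) = 1-p$; this can be constructed as in Scholium 17.4. Set $S_n = X_1 + \dots + X_n$. Then $\big(\frac{1-p}{p}\big)^{S_n}$ is a martingale w.r.t. the filtration given by $\mathcal{A}_n = \sigma(X_1, \dots, X_n)$.

17.12. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space, let $\nu$ be a further measure on $\mathcal{A}$ and let $(A_{n,j})_{j\in\mathbb{N}} \subset \mathcal{A}$ be for each $n \in \mathbb{N}$ a sequence of mutually disjoint sets such that $X = \dot\bigcup_{j\in\mathbb{N}} A_{n,j}$. Assume, moreover, that each set $A_{n,j}$ is the union of finitely many sets from the sequence $(A_{n+1,k})_{k\in\mathbb{N}}$. Show that
(i) the $\sigma$-algebras $\mathcal{A}_n = \sigma(A_{n,j} : j \in \mathbb{N})$ form a filtration;
(ii) if $\mu(A_{n,j}) > 0$ for all $n, j \in \mathbb{N}$, then
\[ u_n = \sum_{j=1}^\infty \frac{\nu(A_{n,j})}{\mu(A_{n,j})}\, 1_{A_{n,j}} \]
is a martingale w.r.t. $(\mathcal{A}_n)_{n\in\mathbb{N}}$.

17.13. Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a supermartingale and $u_j \ge 0$ a.e. Prove that $u_k = 0$ a.e. implies that $u_{k+j} = 0$ a.e. for all $j \in \mathbb{N}$.

17.14. Verify that the family $\mathcal{A}_\tau$ defined in Definition 17.5 is indeed a $\sigma$-algebra.

17.15. Show that $\tau$ is a stopping time if, and only if, $\{\tau = j\} \in \mathcal{A}_j$ for all $j \in \mathbb{N}$.

17.16. Show that, in the notation of Lemma 17.6, $\mathcal{A}_{\sigma \wedge \tau} = \mathcal{A}_\sigma \cap \mathcal{A}_\tau$ for any two stopping times $\sigma, \tau$.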
18 Martingale convergence theorems
Throughout this chapter $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ is a $\sigma$-finite filtered measure space. One of the foremost applications of martingales is to convergence theorems. Let us begin with the following simple observation for a sequence $(u_j)_{j\in\mathbb{N}}$ of real numbers. If $(u_j)_{j\in\mathbb{N}}$ has a limit $\ell = \lim_{j\to\infty} u_j$ and if we know that $\ell \in (a, b)$, only finitely many of the $u_j$ can be outside of $(a, b)$. In particular, if infinitely many $u_j$ are bigger than $b$ and infinitely many smaller than $a$, then the sequence has no limit at all. We call any occurrence of
\[ u_j \le a \quad\text{and}\quad u_{j+k} \ge b \quad (\text{for some } k \in \mathbb{N}) \]
an upcrossing of $[a, b]$ (the picture below shows three such upcrossings if $j = 0, 1, \dots, N$) and we have just observed that, if for some pair $a, b \in \mathbb{R}$, $a < b$,
\[ \#\{\text{upcrossings of } [a, b]\} = \infty \implies (u_j)_{j\in\mathbb{N}} \text{ has no limit.} \tag{18.1} \]
b
a (uN – a)–
190
Measures, Integrals and Martingales
191
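18.1 Lemma (Doob's upcrossing estimate) Let $(u_j)_{j\in\mathbb{N}}$ be a submartingale and denote by $U([a,b]; N; x)$ the number of upcrossings of $(u_j(x))_{j\in\mathbb{N}}$ across $[a, b]$ which occur for $1 \le j \le N$. Then
\[ \int_A U([a,b]; N)\,d\mu \le \frac{1}{b-a} \int_A (u_N - a)^+\,d\mu \qquad \forall A \in \mathcal{A}_0. \]
Proof In order to keep track of the upcrossings we introduce the following stopping times[✓], cf. Problem 18.1: $\tau_0 = 0$ and
\[ \sigma_k = \inf\{j > \tau_{k-1} : u_j \le a\} \wedge N, \qquad \tau_k = \inf\{j > \sigma_k : u_j \ge b\} \wedge N \]
(as usual we set $\inf \emptyset = +\infty$). Then
\[ \tau_0 = 0 < \sigma_1 \le \tau_1 \le \sigma_2 \le \dots \le \sigma_N = \tau_N = N. \]
By the very definition of an upcrossing we find
\[ (b - a)\, U([a,b]; N) \le \underbrace{(u_{\tau_1} - a)}_{\ge\, b-a} + \underbrace{(u_{\tau_2} - u_{\sigma_2})}_{\ge\, b-a} + \dots + (u_{\tau_N} - u_{\sigma_N}) \]
and integrating both sides of this inequality over $A \in \mathcal{A}_0$ yields, after some simple rearrangements,
\[
\begin{aligned}
(b - a) \int_A U([a,b]; N)\,d\mu
&\le -\int_A a\,d\mu + \underbrace{\int_A (u_{\tau_1} - u_{\sigma_2})\,d\mu}_{\le\, 0} + \dots + \underbrace{\int_A (u_{\tau_{N-1}} - u_{\sigma_N})\,d\mu}_{\le\, 0} + \int_A u_{\tau_N}\,d\mu \\
&\overset{17.7}{\le} \int_A (u_{\tau_N} - a)\,d\mu \le \int_A (u_N - a)^+\,d\mu.
\end{aligned}
\]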
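The bookkeeping with the times $\sigma_k$ and $\tau_k$ translates into a simple scan of a finite path, and the estimate of Lemma 18.1 (with $A = X$) can then be compared with exact sums for a small random walk. This is a sketch under assumptions: the $\pm 1$ walk with up-probability `p` and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def upcrossings(u, a, b):
    """Count completed passages from a value <= a to a later value >= b."""
    count, below = 0, False
    for x in u:
        if not below and x <= a:
            below = True          # sigma_k: the path has dropped to level a
        elif below and x >= b:
            count += 1            # tau_k: the upcrossing is completed
            below = False
    return count


def doob_sides(N, p, a, b):
    """Both sides of the estimate with A = X for a +-1 walk of length N."""
    lhs = rhs = Fraction(0)
    for steps in product((1, -1), repeat=N):
        ups = steps.count(1)
        w = p ** ups * (1 - p) ** (N - ups)
        u, s = [], 0
        for x in steps:
            s += x
            u.append(s)           # u[j-1] = u_j
        lhs += w * upcrossings(u, a, b)
        rhs += w * max(u[-1] - a, 0) / (b - a)
    return lhs, rhs
```

For example, `upcrossings([0, -1, 0, 1, 0, -1, 1], -1, 1)` counts two upcrossings. For up-probabilities $p \ge \frac12$ the walk is a submartingale, so `doob_sides` should return a left-hand side that does not exceed the right-hand side.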
The upcrossing lemma is the basis for all martingale convergence theorems.

18.2 Theorem (Submartingale convergence) Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a submartingale on the $\sigma$-finite filtered measure space $(X, \mathcal{A}, \mathcal{A}_j, \mu)$. If $\sup_{j\in\mathbb{N}} \int u_j^+\,d\mu < \infty$, then $u_\infty(x) = \lim_{j\to\infty} u_j(x)$ exists for almost all $x \in X$ and defines an $\mathcal{A}_\infty$-measurable function.

Before we give the details of the proof, let us note some immediate consequences.
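18.3 Corollary Under any of the following conditions the pointwise limit $\lim_{j\to\infty} u_j$ exists a.e. in $\mathbb{R}$:
(i) $(u_j)_{j\in\mathbb{N}}$ is a supermartingale and $\sup_{j\in\mathbb{N}} \int u_j^-\,d\mu < \infty$.
(ii) $(u_j)_{j\in\mathbb{N}}$ is a positive supermartingale.
(iii) $(u_j)_{j\in\mathbb{N}}$ is a martingale and $\sup_{j\in\mathbb{N}} \int |u_j|\,d\mu < \infty$.

Proof (of Theorem 18.2) In view of (18.1) we have
\[
\big\{x : \lim_{j\to\infty} u_j(x) \text{ does not exist}\big\}
= \big\{x : \limsup_{j\to\infty} u_j(x) > \liminf_{j\to\infty} u_j(x)\big\}
= \bigcup_{a < b,\ a, b \in \mathbb{Q}} \big\{x : \sup_{N\in\mathbb{N}} U([a,b]; N; x) = \infty\big\}.
\]
Since $\mathcal{A}_0$ is $\sigma$-finite, there exists an exhausting sequence $(A_k)_{k\in\mathbb{N}} \subset \mathcal{A}_0$ such that $A_k \uparrow X$ and $\mu(A_k) < \infty$. From the inequality $(u_N - a)^+ \le u_N^+ + |a|$ we find
\[
\begin{aligned}
\int_{A_k} \sup_{N\in\mathbb{N}} U([a,b]; N)\,d\mu
&\overset{9.6}{=} \sup_{N\in\mathbb{N}} \int_{A_k} U([a,b]; N)\,d\mu
\overset{18.1}{\le} \frac{1}{b-a} \sup_{N\in\mathbb{N}} \int_{A_k} (u_N - a)^+\,d\mu \\
&\le \frac{1}{b-a} \Big(\sup_{N\in\mathbb{N}} \int_{A_k} u_N^+\,d\mu + |a|\, \mu(A_k)\Big) < \infty
\end{aligned}
\]
and a routine application of Markov's inequality P10.12 yields
\[ \mu\big(\{\sup_{N\in\mathbb{N}} U([a,b]; N) = \infty\} \cap A_k\big) = 0. \]
Since
\[ \bigcup_{k\in\mathbb{N}}\ \bigcup_{a < b,\ a, b \in \mathbb{Q}} \big\{x : \sup_{N\in\mathbb{N}} U([a,b]; N; x) = \infty\big\} \cap A_k \]
is a countable union of null sets, it is itself a null set; thus the limit $u_\infty(x) = \lim_{j\to\infty} u_j(x)$ exists for almost all $x \in X$ and is $\mathcal{A}_\infty$-measurable. An application of Fatou's lemma T9.11 shows
\[ \int |u_\infty|\,d\mu = \int \liminf_{j\to\infty} |u_j|\,d\mu \le \liminf_{j\to\infty} \int |u_j|\,d\mu \le \sup_{j\in\mathbb{N}} \int |u_j|\,d\mu \]
while the submartingale property gives
\[ \sup_{j\in\mathbb{N}} \int |u_j|\,d\mu = \sup_{j\in\mathbb{N}} \Big(2 \int u_j^+\,d\mu - \int u_j\,d\mu\Big) \le 2 \sup_{j\in\mathbb{N}} \int u_j^+\,d\mu - \int u_1\,d\mu. \]
The last expression is, by assumption, finite and we conclude that $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty)$ and $|u_\infty| < \infty$ a.e.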
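We have seen in Example 17.3(v) that for a martingale $(u_j)_{j\in\mathbb{N}}$ the sequence $(|u_j|)_{j\in\mathbb{N}}$ is a submartingale. Therefore,
\[ \int |u_1|\,d\mu \le \int |u_2|\,d\mu \le \int |u_3|\,d\mu \le \dots \]
which means that, if we had a martingale with index set running to the left, say, $(w_\ell, \mathcal{A}_\ell)_{\ell\in-\mathbb{N}}$, condition (iii) of C18.3 would be automatically fulfilled: $\int |w_{-j}|\,d\mu \le \int |w_{-1}|\,d\mu < \infty$, and the limit $\lim_{j\to+\infty} w_{-j}$ would always exist.

18.4 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{A}_{-1} \supset \mathcal{A}_{-2} \supset \mathcal{A}_{-3} \supset \dots$ be a decreasing filtration of sub-$\sigma$-algebras of $\mathcal{A}$ such that $\mu|_{\mathcal{A}_{-j}}$ is $\sigma$-finite for every $j \in \mathbb{N}$. A (sub-, super-)martingale $(w_\ell, \mathcal{A}_\ell)_{\ell\in-\mathbb{N}}$ is called reversed or backwards (sub-, super-)martingale.

18.5 Corollary (Backwards convergence theorem) Let $(w_\ell, \mathcal{A}_\ell)_{\ell\in-\mathbb{N}}$ be a backwards submartingale. If on $\mathcal{A}_{-\infty} = \bigcap_{\ell\in-\mathbb{N}} \mathcal{A}_\ell$ the measure $\mu|_{\mathcal{A}_{-\infty}}$ is $\sigma$-finite, then $\lim_{j\to+\infty} w_{-j}(x) \in [-\infty, +\infty)$ exists for almost all $x$ and defines an $\mathcal{A}_{-\infty} / \mathcal{B}(\overline{\mathbb{R}})$-measurable function.

Proof The proof of Doob's upcrossing lemma 18.1 obviously applies to the (finite) sequence $w_{-N}, w_{-N+1}, \dots, w_{-1}$ and yields the following variant of the upcrossing inequality:
\[ (b - a) \int_A U([a,b]; -N)\,d\mu \le \int_A (w_{-1} - a)^+\,d\mu \qquad \forall A \in \mathcal{A}_{-N}. \]
This means that the arguments of the proof of Theorem 18.2 remain valid without further conditions on $(w_\ell)_{\ell\in-\mathbb{N}}$. But $(w_\ell^+)_{\ell\in-\mathbb{N}}$ is again a submartingale, see Example 17.3(iv), thus
\[ \int w_{-\infty}^+\,d\mu = \int \liminf_{j\to+\infty} w_{-j}^+\,d\mu \overset{9.11}{\le} \liminf_{j\to+\infty} \int w_{-j}^+\,d\mu \le \int w_{-1}^+\,d\mu < \infty. \]
By Corollary 10.13, $\mu\{w_{-\infty} = +\infty\} = \mu\{w_{-\infty}^+ = +\infty\} = 0$.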
Theorem 18.2 does not guarantee $\mathcal{L}^1$-convergence of a martingale. An example of a martingale which satisfies all conditions of T18.2 but fails to have a limit in $\mathcal{L}^1$ is given in 17.3(viii): here $u_j \xrightarrow{j\to\infty} 0$ a.e. while $\int_{[0,1]} u_j\,d\lambda = 1 \not\to 0$. Such phenomena can be avoided if we assume that the submartingale is uniformly integrable (UI).
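The failure of $\mathcal{L}^1$-convergence can be made concrete numerically. The sketch below assumes that the example referred to is the standard one, $u_j = 2^j 1_{[0, 2^{-j})}$ on $[0,1]$ with Lebesgue measure; that formula is not restated in this section, so it is an assumption, as are the function names.

```python
from fractions import Fraction


def u(j, x):
    """Assumed example: u_j = 2**j on [0, 2**-j) and 0 elsewhere on [0, 1]."""
    return Fraction(2) ** j if 0 <= x < Fraction(1, 2 ** j) else Fraction(0)


def integral(j):
    """Lebesgue integral of u_j: height times length of the support."""
    return Fraction(2) ** j * Fraction(1, 2 ** j)


def tail_integral(j, level):
    """Integral of u_j over {u_j > level}; u_j only takes the values 0, 2**j."""
    return integral(j) if 2 ** j > level else Fraction(0)
```

Every `integral(j)` equals 1, while for a fixed `x > 0` the values `u(j, x)` vanish as soon as $2^{-j} \le x$. Moreover `tail_integral(j, level)` equals 1 whenever $2^j$ exceeds `level`, so with constant bounds the tail integrals do not become uniformly small, which illustrates the lack of uniform integrability.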
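Recall from Definition 16.1 that $(u_j)_{j\in\mathbb{N}}$ is uniformly integrable if
\[ \forall \epsilon > 0 \quad \exists w_\epsilon \in \mathcal{L}^1_+(\mathcal{A}) : \quad \sup_{j\in\mathbb{N}} \int_{\{|u_j| > w_\epsilon\}} |u_j|\,d\mu < \epsilon. \]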
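18.6 Theorem (Convergence of UI submartingales) Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a submartingale on the $\sigma$-finite filtered measure space $(X, \mathcal{A}, \mathcal{A}_j, \mu)$. Then the following assertions are equivalent:
(i) $u_\infty(x) = \lim_{j\to\infty} u_j(x)$ exists a.e., $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty)$, $\lim_{j\to\infty} \int u_j\,d\mu = \int u_\infty\,d\mu$, and $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}\cup\{\infty\}}$ is a submartingale.
(ii) $(u_j)_{j\in\mathbb{N}}$ is uniformly integrable.
(iii) $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^1(\mathcal{A})$.

Proof (i)⇒(ii): Since $\mathcal{A}_0$ is $\sigma$-finite, we can fix an exhausting sequence $(A_k)_{k\in\mathbb{N}} \subset \mathcal{A}_0$ with $A_k \uparrow X$ and $\mu(A_k) < \infty$. It is not hard to see that the function $w = \sum_{k=1}^\infty 2^{-k} (1 + \mu(A_k))^{-1} 1_{A_k}$ is strictly positive, $w > 0$, and integrable, $w \in \mathcal{L}^1(\mathcal{A}_0)$. Because of $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty)$, we find for every $\epsilon > 0$ some $\kappa > 0$ and some $N \in \mathbb{N}$ such that
\[ \int_{\{u_\infty^+ > \kappa\}} u_\infty^+\,d\mu + \int_{A_j^c} u_\infty^+\,d\mu < \epsilon \qquad \text{for all } j \ge N. \]
Example 17.3(iv) shows that $(u_j^+)_{j\in\mathbb{N}\cup\{\infty\}}$ is still a submartingale, so that for every $L > 0$
\[
\begin{aligned}
\int_{\{u_j^+ > Lw\}} u_j^+\,d\mu
&\le \int_{\{u_j^+ > Lw\}} u_\infty^+\,d\mu \\
&\le \int_{\{u_j^+ > Lw\} \cap \{u_\infty^+ \le \kappa\} \cap A_N} u_\infty^+\,d\mu + \int_{\{u_\infty^+ > \kappa\} \cup A_N^c} u_\infty^+\,d\mu \\
&\le \kappa\, \mu\big(\{u_j^+ > Lw\} \cap A_N\big) + \epsilon \\
&\le \kappa\, \mu\big\{u_j^+ > L\, 2^{-N} (1 + \mu(A_N))^{-1}\big\} + \epsilon
\end{aligned}
\]
where we used that $w \ge 2^{-N} (1 + \mu(A_N))^{-1}$ on $A_N$. The Markov inequality P10.12 and the submartingale property imply
\[
\sup_{j\in\mathbb{N}} \int_{\{u_j^+ > Lw\}} u_j^+\,d\mu
\le \kappa\, \frac{2^N (1 + \mu(A_N))}{L} \sup_{j\in\mathbb{N}} \int u_j^+\,d\mu + \epsilon
\le \kappa\, \frac{2^N (1 + \mu(A_N))}{L} \int u_\infty^+\,d\mu + \epsilon.
\]
Since we may choose $L > 0$ arbitrarily large, we have found that $(u_j^+)_{j\in\mathbb{N}}$ is uniformly integrable. From $\lim_{j\to\infty} u_j = u_\infty$ a.e., we conclude $\lim_{j\to\infty} u_j^+ = u_\infty^+$, and Vitali's convergence theorem 16.6 shows that $\lim_{j\to\infty} \int u_j^+\,d\mu = \int u_\infty^+\,d\mu$. Thus
\[ \int |u_j|\,d\mu = \int (2 u_j^+ - u_j)\,d\mu \xrightarrow{j\to\infty} \int (2 u_\infty^+ - u_\infty)\,d\mu = \int |u_\infty|\,d\mu \]
and another application of Vitali's theorem proves that $(u_j)_{j\in\mathbb{N}}$ is uniformly integrable.

(ii)⇒(iii): Because of uniform integrability we have for some $\epsilon > 0$ and a suitable $w_\epsilon \in \mathcal{L}^1_+$
\[ \int |u_j|\,d\mu = \int_{\{|u_j| > w_\epsilon\}} |u_j|\,d\mu + \int_{\{|u_j| \le w_\epsilon\}} |u_j|\,d\mu \le \epsilon + \int w_\epsilon\,d\mu < \infty \]
and the martingale convergence theorem 18.2 guarantees that the pointwise limit $u_\infty = \lim_{j\to\infty} u_j$ exists a.e.; $\mathcal{L}^1$-convergence follows from Vitali's convergence theorem 16.6.

(iii)⇒(i): Since $\mathcal{L}^1$-$\lim_{j\to\infty} u_j = u$ exists we find (e.g. as in Theorem 12.10) that $\sup_{j\in\mathbb{N}} \int |u_j|\,d\mu < \infty$. By the martingale convergence theorem 18.2, the pointwise limit $u_\infty = \lim_{j\to\infty} u_j$ exists a.e. On the other hand, by Corollary 12.8, $u = \lim_{k\to\infty} u_{j_k}$ a.e. for some subsequence. This implies that $u = u_\infty$ a.e. and, in particular, that $u_\infty = \mathcal{L}^1$-$\lim_{j\to\infty} u_j$; this entails $\lim_{j\to\infty} \int_A u_j\,d\mu = \int_A u_\infty\,d\mu$ for all $A \in \mathcal{A}$.[✓] Since $(u_j)_{j\in\mathbb{N}}$ is a submartingale, we find for all $k > j$ and $A \in \mathcal{A}_j$
\[ \int_A u_j\,d\mu \le \int_A u_k\,d\mu \xrightarrow{k\to\infty} \int_A u_\infty\,d\mu \]
so that $(u_j)_{j\in\mathbb{N}\cup\{\infty\}}$ is also a submartingale.

Again, $\mathcal{L}^1$-convergence of backwards (sub-)martingales holds under much weaker assumptions.

18.7 Theorem Let $(w_\ell, \mathcal{A}_\ell)_{\ell\in-\mathbb{N}}$ be a backwards submartingale and assume that $\mu|_{\mathcal{A}_{-\infty}}$ is $\sigma$-finite. Then
(i) $\lim_{j\to+\infty} w_{-j} = w_{-\infty} \in [-\infty, \infty)$ exists a.e.
(ii) $\mathcal{L}^1$-$\lim_{j\to+\infty} w_{-j} = w_{-\infty}$ if, and only if, $\inf_{j\in\mathbb{N}} \int w_{-j}\,d\mu > -\infty$. In this case, $(w_\ell, \mathcal{A}_\ell)_{\ell\in-\mathbb{N}\cup\{-\infty\}}$ is a submartingale and $w_{-\infty}$ is a.e. real-valued.

For a backwards martingale, the condition in (ii) is automatically satisfied.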
Proof Part (i) has already been proved in Corollary 18.5. For (ii) we start with the observation that for a backwards submartingale
\[ \sup_{j\in\mathbb{N}} \int |w_{-j}|\,d\mu < \infty \iff \inf_{j\in\mathbb{N}} \int w_{-j}\,d\mu > -\infty \iff \lim_{j\to+\infty} \int w_{-j}\,d\mu \in \mathbb{R}. \]
Indeed: the second equivalence follows from the submartingale property, $\int w_{-j-1}\,d\mu \le \int w_{-j}\,d\mu \le \int w_{-1}\,d\mu$, while '⇐' of the first equivalence derives from the fact that $(w_\ell^+)_{\ell\in-\mathbb{N}}$ is again a submartingale, cf. Example 17.3(iv), and
\[ \int |w_{-j}|\,d\mu = \int (2 w_{-j}^+ - w_{-j})\,d\mu \le 2 \int w_{-1}^+\,d\mu - \int w_{-j}\,d\mu; \]
the other direction '⇒' is obvious. With exactly the same reasoning which was used in the proof of T18.6, (i)⇒(ii), we can now show that $(w_\ell^+)_{\ell\in-\mathbb{N}}$ and $(w_\ell)_{\ell\in-\mathbb{N}}$ are uniformly integrable (of course, the function $w$ used as a bound for uniform integrability is now $\mathcal{A}_{-\infty}$-measurable). The submartingale property of $(w_\ell)_{\ell\in-\mathbb{N}\cup\{-\infty\}}$ follows literally with the same arguments as the corresponding assertion in (iii)⇒(i) of T18.6.

We close this chapter with a simple but far-reaching application of the (backwards) martingale convergence theorem.

18.8 Example (Kolmogorov's strong law of large numbers) For every sequence $(X_j)_{j\in\mathbb{N}}$ of identically distributed independent random variables on the probability space $(\Omega, \mathcal{A}, P)$ (that is, all $X_j : \Omega \to \mathbb{R}$ are measurable, independent functions in the sense of Example 17.3(x) and Scholium 17.4 such that $X_j(P) = X_1(P)$ for all $j \in \mathbb{N}$) the strong law of large numbers holds, i.e. the limit
\[ \lim_{n\to\infty} \frac{1}{n} (X_1 + \dots + X_n) \]
exists and is finite for a.e. $\omega \in \Omega$ if, and only if, the $X_j$ are integrable. If this is the case, the above limit is given by $\int X_1\,dP$.

Sufficiency: Suppose the $X_j$ are integrable. Then $Y_j = X_j - \int X_j\,dP$ are again independent identically distributed random variables with zero mean: $\int Y_j\,dP = 0$. Set $S_n = Y_1 + Y_2 + \dots + Y_n$ and $\mathcal{A}_{-n} = \sigma(S_n, S_{n+1}, S_{n+2}, \dots)$. Then $\big(\frac{1}{n} S_n, \mathcal{A}_{-n}\big)_{n\in\mathbb{N}}$ is a backwards martingale. In fact, any function of $Y_1, Y_2, \dots, Y_n$, in particular $S_n$, is independent of $Y_{n+1}, Y_{n+2}, \dots$, and (17.6) yields for every set of the form $A = \bigcap_{j=1}^N \{Y_{n+j} \in B_j\} \cap \{S_n \in B_0\}$, $B_0, \dots, B_N \in \mathcal{B}(\mathbb{R})$, $N \in \mathbb{N}$, and all $k = 1, 2, \dots, n$
\[
\begin{aligned}
\int_A Y_k\,dP
&= \int \prod_{j=1}^N 1_{\{Y_{n+j} \in B_j\}}\, 1_{\{S_n \in B_0\}}\, Y_k\,dP \\
&= \int_{\{S_n \in B_0\}} Y_k\,dP \cdot P\Big(\bigcap_{j=1}^N \{Y_{n+j} \in B_j\}\Big) && \text{(by (17.6))} \\
&= \int_{\{S_n \in B_0\}} Y_1\,dP \cdot P\Big(\bigcap_{j=1}^N \{Y_{n+j} \in B_j\}\Big)
\end{aligned}
\]
noting that the $Y_k$ are identically distributed. Summing over $k = 1, \dots, n$ gives
\[ \int_A S_n\,dP = n \int_{\{S_n \in B_0\}} Y_1\,dP \cdot P\Big(\bigcap_{j=1}^N \{Y_{n+j} \in B_j\}\Big) = n \int_A Y_1\,dP. \]
This means that $\int_A Y_1\,dP = \int_A \frac{1}{n} S_n\,dP$ for all $n \in \mathbb{N}$ and all sets $A$ from a generator of $\mathcal{A}_{-n}$ which clearly satisfies the conditions of Remark 17.2(i), proving that $\big(\frac{1}{n} S_n, \mathcal{A}_{-n}\big)_{n\in\mathbb{N}}$ is a backwards martingale. Theorem 18.7 now guarantees that
\[ L = \lim_{n\to\infty} \frac{S_n}{n} = \lim_{n\to\infty} \frac{S_{n^2}}{n^2} \quad \text{exists a.e. and in } \mathcal{L}^1. \]
S −S exp − Snn dP exp − n2n2 n dP = lim n→
=
e− L dP
2
Thus
2 2 2 e− L − e− L dP dP = e− L dP − e− L dP = 0
198
R.L. Schilling
and we conclude with Theorem 10.9(i) that e− L = e− L dP a.e.; as a consequence, L is almost everywhere constant. Using L = L1 - limn→ Sn /n, we get S n L = L dP = lim dP = 0 a.e. n→ n =0
Necessity: Suppose the a.e. limit $L = \lim_{n\to\infty} \frac{1}{n} (X_1 + \dots + X_n)$ exists and is finite. If all $X_j$ were positive, we could argue as follows: the truncated random variables $X_j^c = X_j \wedge c$ are still independent and identically distributed. Since they are also integrable, the sufficiency direction of Kolmogorov's law shows that for all $c > 0$
\[ \int X_1^c\,dP = \lim_{n\to\infty} \frac{X_1^c + \dots + X_n^c}{n} \le \lim_{n\to\infty} \frac{X_1 + \dots + X_n}{n} = L. \]
Letting $c \to \infty$, Beppo Levi's theorem 9.6 proves $\int X_1\,dP < \infty$. Such a simple argument is not available in the general case. For this we need the converse or 'difficult' half of the Borel–Cantelli lemma (cf. Problem 6.9).

18.9 Theorem (Borel–Cantelli) Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$. Then
\[ \sum_{j=1}^\infty P(A_j) < \infty \implies P\big(\limsup_{j\to\infty} A_j\big) = 0; \]
if the sets $A_j$ are pairwise independent,¹ then
\[ \sum_{j=1}^\infty P(A_j) = \infty \implies P\big(\limsup_{j\to\infty} A_j\big) = 1. \]
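Proof Recall that $\limsup_{j} A_j = \bigcap_{k} \bigcup_{j \ge k} A_j$. Thus $\omega \in \limsup_j A_j$ if, and only if, $\omega$ appears in infinitely many of the $A_j$. This shows that $\limsup_j A_j = \big\{\sum_{j=1}^\infty 1_{A_j} = \infty\big\}$. The first of the two implications follows thus: by the Beppo Levi theorem for series C9.9, we see
\[ \int \sum_{j=1}^\infty 1_{A_j}\,dP = \sum_{j=1}^\infty \int 1_{A_j}\,dP = \sum_{j=1}^\infty P(A_j) < \infty. \]
Corollary 10.13 then shows $\sum_{j=1}^\infty 1_{A_j} < \infty$ a.e., and $P(\limsup_{j\to\infty} A_j) = 0$ follows.

¹ i.e. $P(A_j \cap A_k) = P(A_j) P(A_k)$ for all $j \ne k$.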
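For the second implication we set $S_n = \sum_{j=1}^n 1_{A_j}$ and $S = \sum_{j=1}^\infty 1_{A_j}$. Then $m_n = \int S_n\,dP = \sum_{j=1}^n P(A_j)$ and, by pairwise independence,
\[
\begin{aligned}
\int (S_n - m_n)^2\,dP
&= \sum_{j,k=1}^n \int \big(1_{A_j} - P(A_j)\big)\big(1_{A_k} - P(A_k)\big)\,dP \\
&= \sum_{j=1}^n \int \big(1_{A_j} - P(A_j)\big)^2\,dP
= \sum_{j=1}^n P(A_j)\big(1 - P(A_j)\big) \le m_n.
\end{aligned}
\]
Since $S_n \le S$, we can use Markov's inequality P10.12 to get
\[
\begin{aligned}
P\big(S \le \tfrac{1}{2} m_n\big)
&\le P\big(S_n \le \tfrac{1}{2} m_n\big)
= P\big(S_n - m_n \le -\tfrac{1}{2} m_n\big)
\le P\big(|S_n - m_n| \ge \tfrac{1}{2} m_n\big) \\
&= P\big(|S_n - m_n|^2 \ge \tfrac{1}{4} m_n^2\big)
\le \frac{4}{m_n^2} \int (S_n - m_n)^2\,dP \le \frac{4}{m_n}.
\end{aligned}
\]
By assumption $m_n \xrightarrow{n\to\infty} \infty$, hence $P(S < \infty) = \lim_{n\to\infty} P\big(S \le \tfrac{1}{2} m_n\big) = 0$.

18.8 Example (continued) We can now continue with the proof of the necessity part of Kolmogorov's strong law of large numbers. Write $S_n = X_1 + \dots + X_n$. Since the a.e. limit exists, we get
\[ \frac{X_n}{n} = \frac{S_n}{n} - \frac{n-1}{n} \cdot \frac{S_{n-1}}{n-1} \xrightarrow{n\to\infty} 0 \]
which shows that $\omega \in A_n = \{|X_n| \ge n\}$ happens only for finitely many $n$. In other words, $P\big(\sum_{j=1}^\infty 1_{A_j} = \infty\big) = 0$; since the $A_n$ are all independent, the Borel–Cantelli lemma T18.9 shows that $\sum_{j=1}^\infty P(A_j) < \infty$. Thus
\[
\begin{aligned}
\int |X_1|\,dP
&= \sum_{j=1}^\infty \int_{\{j-1 \le |X_1| < j\}} |X_1|\,dP
\le \sum_{j=1}^\infty j\, P(j-1 \le |X_1| < j) \\
&= \sum_{j=1}^\infty \sum_{n=1}^j P(j-1 \le |X_1| < j)
= \sum_{n=1}^\infty \sum_{j=n}^\infty P(j-1 \le |X_1| < j) \\
&= 1 + \sum_{n=1}^\infty P(|X_1| \ge n)
= 1 + \sum_{n=1}^\infty P(|X_n| \ge n) < \infty
\end{aligned}
\]
since $X_1$ and $X_n$ have the same distribution.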
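The last chain of inequalities rests on comparing $\int |X_1|\,dP$ with the tail sum $\sum_{n \ge 1} P(|X_1| \ge n)$. For a finitely supported law both quantities can be computed exactly. The following is a minimal sketch; the dictionary representation of a law and the function names are assumptions for illustration.

```python
from fractions import Fraction
from math import floor


def expectation_abs(law):
    """Integral of |X| for a finitely supported law {value: probability}."""
    return sum(abs(x) * w for x, w in law.items())


def tail_sum(law):
    """Sum over n >= 1 of P(|X| >= n); finite since the support is bounded."""
    top = floor(max(abs(x) for x in law))
    return sum(w for n in range(1, top + 1)
               for x, w in law.items() if abs(x) >= n)
```

For any such law one should find `tail_sum(law) <= expectation_abs(law) <= 1 + tail_sum(law)`; the right-hand inequality is the one used above.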
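We will see more applications of the martingale convergence theorems in the following chapters.

Problems
Unless otherwise stated $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ will be a $\sigma$-finite filtered measure space.

18.1. Verify that the random times $\sigma_k$ and $\tau_k$ defined in the proof of Lemma 18.1 are stopping times.

18.2. Let $(\mathcal{A}_{-j})_{j\in\mathbb{N}}$ be a decreasing filtration such that $\mu|_{\mathcal{A}_{-\infty}}$ is $\sigma$-finite. Assume that $(u_{-j}, \mathcal{A}_{-j})_{j\in\mathbb{N}}$ is a backwards supermartingale which converges a.e. to a real-valued function $u_{-\infty} \in \mathcal{L}^1(\mathcal{A}_{-\infty})$ which closes the supermartingale to the left, i.e. such that $(u_{-j}, \mathcal{A}_{-j})_{j\in\mathbb{N}\cup\{\infty\}}$ is still a supermartingale. Then
\[ \lim_{j\to\infty} \int u_{-j}\,d\mu = \int u_{-\infty}\,d\mu. \]

18.3. Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a supermartingale such that $u_j \ge 0$ and $\lim_{j\to\infty} \int u_j\,d\mu = 0$. Then $u_j \xrightarrow{j\to\infty} 0$ pointwise a.e. and in $\mathcal{L}^1$. Remark: Positive supermartingales with $\lim_{j\to\infty} \int u_j\,d\mu = 0$ are called potentials.

18.4. Let $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ be a martingale. If $\mathcal{L}^1$-$\lim_{j\to\infty} u_j$ exists, then the pointwise limit $\lim_{j\to\infty} u_j(x)$ exists for almost every $x$.

18.5. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Find a martingale $(u_j)_{j\in\mathbb{N}}$ for which $0 < P(u_j \text{ converges}) < 1$. [Hint: take a sequence $(X_k)_{k\in\mathbb{N}_0}$ of independent Bernoulli $(\frac12, \frac12)$-distributed random variables with values $\pm 1$; try $u_j = \frac12 (X_0 + 1)(X_1 + X_2 + \dots + X_j)$.]

18.6. The following exercise furnishes an example of a martingale $(M_j)_{j\in\mathbb{N}}$ on the probability space $([0,1], \mathcal{B}[0,1], \lambda = \lambda^1|_{[0,1]})$ such that $\lambda$-$\lim_{j\to\infty} M_j$ exists but the pointwise limit $\lim_{j\to\infty} M_j(x)$ doesn't. Compare this with Problem 18.4.
(i) Construct a sequence $(X_j)_{j\in\mathbb{N}}$ of independent, identically Bernoulli distributed random variables with $\lambda(X_1 = 1) = \lambda(X_1 = -1) = \frac12$.
(ii) Let $\mathcal{A}_n = \sigma(X_1, \dots, X_{2^n})$. Show that $A_n = \{X_{2^{n-1}+1} + \dots + X_{2^n} = 0\}$ is for each $n \in \mathbb{N}$ contained in $\mathcal{A}_n$ and
\[ \lim_{n\to\infty} \lambda(A_n) = 0 \quad\text{and}\quad \lambda\big(\limsup_{n\to\infty} A_n\big) = 1. \]
Conclude that the set of all $x$ for which $\lim_n 1_{A_n}(x)$ exists is a null set. [Hint: use the 'difficult' direction of the Borel–Cantelli lemma T18.9. Moreover, Stirling's formula $n! \sim \sqrt{2\pi n}\,(n/e)^n$ might come in handy.]
(iii) The sequence $M_0 = 0$ and $M_{n+1} = M_n (1 + X_{2^n+1}) + 1_{A_n} X_{2^n+1}$, $n \in \mathbb{N}_0$, defines a martingale $(M_n, \mathcal{A}_n)_{n \ge 1}$.
(iv) Show that $\lambda(M_{n+1} \ne 0) \le \frac12 \lambda(M_n \ne 0) + \lambda(A_n)$.
(v) Show that for every $x \in \{\lim_n M_n \text{ exists}\}$ the limit $\lim_n 1_{A_n}(x)$ exists, too. Conclude that $\lim_n \lambda(M_n = 0) = 1$ and that $\lambda(\lim_n M_n \text{ exists}) = 0$.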
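18.7. Consider the probability space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), P)$ with $P(\{j\}) = \frac{1}{j} - \frac{1}{j+1}$. Set $\mathcal{A}_n = \sigma\big(\{1\}, \{2\}, \dots, \{n\}, [n+1, \infty) \cap \mathbb{N}\big)$ and show that $X_n = (n+1)\, 1_{[n+1, \infty) \cap \mathbb{N}}$, $n \in \mathbb{N}$, is a positive martingale such that $\int X_n\,dP = 1$, $\lim_{n\to\infty} X_n = 0$ but $\sup_{n\in\mathbb{N}} X_n = \infty$.

18.8. $\mathcal{L}^2$-bounded martingales A martingale $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ is called $\mathcal{L}^2$-bounded, if $\sup_{j\in\mathbb{N}} \int u_j^2\,d\mu < \infty$. For ease of notation set $u_0 = 0$.
(i) Show that $(u_j)_{j\in\mathbb{N}}$ is $\mathcal{L}^2$-bounded if, and only if, $\sum_{j=1}^\infty \int (u_j - u_{j-1})^2\,d\mu < \infty$. [Hint: use Problem 17.6.]
Assume from now on that $(u_j)_{j\in\mathbb{N}}$ is $\mathcal{L}^2$-bounded.
(ii) Show that $\lim_{j\to\infty} u_j = u$ exists a.e. [Hint: argue that $(u_j^2)_{j\in\mathbb{N}}$ is a submartingale.]
(iii) Show that $\lim_{j\to\infty} \int (u - u_j)^2\,d\mu = 0$. [Hint: check that $\int (u_{j+k} - u_j)^2\,d\mu = \sum_{\ell=j+1}^{j+k} \int (u_\ell - u_{\ell-1})^2\,d\mu$ and apply Fatou's lemma T9.11.]
(iv) Assume now that $\mu(X) < \infty$. Show that $(u_j)_{j\in\mathbb{N}}$ is uniformly integrable, that $u_j \xrightarrow{j\to\infty} u$ in $\mathcal{L}^1$ and that $u_\infty = u$ closes the martingale to the right, i.e. that $(u_j)_{j\in\mathbb{N}\cup\{\infty\}}$ is again a martingale.

18.9. Let $(\Omega, \mathcal{A}, P)$ be a probability space.
(i) Let $(\epsilon_j)_{j\in\mathbb{N}}$ be a sequence of independent identically Bernoulli $(\frac12, \frac12)$-distributed random variables with values $\pm 1$. Show that for any sequence $(y_j)_{j\in\mathbb{N}} \subset \mathbb{R}$
\[ \sum_{j=1}^\infty y_j^2 < \infty \iff \sum_{j=1}^\infty \epsilon_j y_j \text{ converges a.e.} \]
(ii) Generalize (i) to a sequence of independent random variables $(X_j)_{j\in\mathbb{N}}$ with zero mean $\int X_j\,dP = 0$ and finite variances $\int X_j^2\,dP = \sigma_j^2 < \infty$ and prove
\[ \sum_{j=1}^\infty \sigma_j^2 < \infty \implies \sum_{j=1}^\infty X_j \text{ converges a.e.} \]
[Hint: consider the martingale $S_n = X_1 + \dots + X_n$ and use Problem 18.8.]
(iii) If $|X_j| \le C$ for all $j \in \mathbb{N}$, the converse of (ii) is also true, i.e.
\[ \sum_{j=1}^\infty \sigma_j^2 < \infty \iff \sum_{j=1}^\infty X_j \text{ converges a.e.} \]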
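[Hint: show that $M_n = (X_1 + \dots + X_n)^2 - (\sigma_1^2 + \dots + \sigma_n^2) = S_n^2 - A_n$ is a martingale, use optional sampling 17.8 for $(M_n)_{n\in\mathbb{N}}$ with $\tau = \tau_\kappa = \inf\{j : |M_j| > \kappa\}$, observe that $|M_{n \wedge \tau}| \le C + \kappa$ and that $\int A_{n \wedge \tau}\,dP \le (K + c)^2$.]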
19 The Radon–Nikodým theorem and other applications of martingales
After our excursion into the theory of martingales we want to apply martingales to continue the development of measure and integration theory. The central topics of this chapter are
• the Radon–Nikodým theorem 19.2 and Lebesgue's decomposition theorem 19.9;
• the Hardy–Littlewood maximal theorem 19.17;
• Lebesgue's differentiation theorem 19.20.
For the last two we need (maximal) inequalities for martingales. These will be treated in a short interlude which is also of independent interest.
The Radon–Nikodým theorem
Let $(X, \mathcal{A}, \mu)$ be a measure space. We have seen in Lemma 10.8 that for any $f \in \mathcal{L}^1_+(\mathcal{A})$, or indeed for $f \in \mathcal{M}^+(\mathcal{A})$, the set-function $\nu = f\mu$ given by $\nu(A) = \int_A f(x)\,\mu(dx)$ is again a measure. From Theorem 10.9(ii) we know that
\[ N \in \mathcal{A},\ \mu(N) = 0 \implies \nu(N) = 0. \tag{19.1} \]
This observation motivates the following

19.1 Definition Let $\mu, \nu$ be two measures on the measurable space $(X, \mathcal{A})$. If (19.1) holds, we call $\nu$ absolutely continuous w.r.t. $\mu$ and write $\nu \ll \mu$.

Measures with densities are always absolutely continuous w.r.t. their base measure: $f\mu \ll \mu$. Remarkably, the converse is also true.

19.2 Theorem (Radon–Nikodým). Let $\mu, \nu$ be two measures on the measurable space $(X, \mathcal{A})$. If $\mu$ is $\sigma$-finite, then the following assertions are equivalent
(i) $\nu(A) = \int_A f(x)\,\mu(dx)$ for some a.e. unique $f \in \mathcal{M}^+(\mathcal{A})$;
(ii) $\nu \ll \mu$.
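The unique function $f$ is called the Radon–Nikodým derivative and (traditionally) denoted by $f = d\nu/d\mu$. Above we have just verified that (i)⇒(ii). The converse direction is less obvious and we want to use a martingale argument for its proof. For this we need a few more preparations which extend the notion of martingale to directed index sets.

Let $I$ be any partially ordered index set. We call $I$ upwards filtering or upwards directed if
\[ \alpha, \beta \in I \implies \exists \gamma \in I : \alpha \le \gamma,\ \beta \le \gamma. \tag{19.2} \]
A family $(\mathcal{A}_\alpha)_{\alpha\in I}$ of sub-$\sigma$-algebras of $\mathcal{A}$ is called a filtration if
\[ \alpha, \beta \in I,\ \alpha \le \beta \implies \mathcal{A}_\alpha \subset \mathcal{A}_\beta; \]
as before, we set $\mathcal{A}_\infty = \sigma\big(\bigcup_{\alpha\in I} \mathcal{A}_\alpha\big)$, and we treat $\infty$ as the biggest element of $I \cup \{\infty\}$, i.e. $\alpha < \infty$ for all $\alpha \in I$. If a $\sigma$-algebra $\mathcal{A}_0 \subset \mathcal{A}_\alpha$ for all $\alpha \in I$ and if $\mu|_{\mathcal{A}_0}$ is $\sigma$-finite, we call $(X, \mathcal{A}, \mathcal{A}_\alpha, \mu)$ a $\sigma$-finite filtered measure space.

19.3 Definition Let $(X, \mathcal{A}, \mathcal{A}_\alpha, \mu)$ be a $\sigma$-finite filtered measure space. A family of measurable functions $(u_\alpha)_{\alpha\in I}$ is called a martingale (w.r.t. the filtration $(\mathcal{A}_\alpha)_{\alpha\in I}$), if $u_\alpha \in \mathcal{L}^1(\mathcal{A}_\alpha)$ for each $\alpha \in I$ and if
\[ \int_A u_\beta\,d\mu = \int_A u_\alpha\,d\mu \qquad \forall \alpha \le \beta,\ \forall A \in \mathcal{A}_\alpha. \tag{19.3} \]
The notion of convergence along an upwards filtering set is slightly more complicated than for the index set $\mathbb{N}$. We say
\[ u = \mathcal{L}^1\text{-}\lim_{\alpha\in I} u_\alpha \iff \forall \epsilon > 0 \ \ \exists \gamma_\epsilon \in I \ \ \forall \alpha \ge \gamma_\epsilon : \ \|u - u_\alpha\|_1 < \epsilon. \]
We can now extend Theorem 18.6.

19.4 Theorem Let $I$ be an upwards filtering index set, $(X, \mathcal{A}, \mathcal{A}_\alpha, \mu)$ be a $\sigma$-finite filtered measure space and $(u_\alpha, \mathcal{A}_\alpha)_{\alpha\in I}$ be a martingale. Then the following assertions are equivalent.
(i) There exists a unique $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty)$ such that $(u_\alpha, \mathcal{A}_\alpha)_{\alpha\in I\cup\{\infty\}}$ is a martingale. In this case $u_\infty = \mathcal{L}^1$-$\lim_{\alpha\in I} u_\alpha$.
(ii) $(u_\alpha)_{\alpha\in I}$ is uniformly integrable.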
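Proof (i)⇒(ii): (compare with T18.6) Denote by $(A_j)_{j\in\mathbb{N}}$ an exhausting sequence in $\mathcal{A}_0$. Since $u_\infty \in \mathcal{L}^1(\mathcal{A}_\infty)$, we find for every $\epsilon > 0$ some $\kappa > 0$ and $N \in \mathbb{N}$ such that
\[ \int_{\{|u_\infty| > \kappa\}} |u_\infty|\,d\mu + \int_{A_j^c} |u_\infty|\,d\mu \le \epsilon \qquad \forall j \ge N. \]
Clearly, the function $w(x) = \sum_{j\in\mathbb{N}} 2^{-j} (1 + \mu(A_j))^{-1} 1_{A_j}(x)$ is in $\mathcal{L}^1_+(\mathcal{A}_0)$, $w > 0$ and, as $(|u_\alpha|)_{\alpha\in I\cup\{\infty\}}$ is a submartingale (cf. Example 17.3(v)), we find for every $L > 0$
\[
\begin{aligned}
\sup_{\alpha\in I} \int_{\{|u_\alpha| > Lw\}} |u_\alpha|\,d\mu
&\le \sup_{\alpha\in I} \int_{\{|u_\alpha| > Lw\}} |u_\infty|\,d\mu \\
&\le \sup_{\alpha\in I} \int_{\{|u_\alpha| > Lw\} \cap A_N \cap \{|u_\infty| \le \kappa\}} |u_\infty|\,d\mu + \int_{A_N^c} |u_\infty|\,d\mu + \int_{\{|u_\infty| > \kappa\}} |u_\infty|\,d\mu \\
&\le \kappa \sup_{\alpha\in I} \mu\big\{|u_\alpha| > L\, 2^{-N} (1 + \mu(A_N))^{-1}\big\} + \epsilon
\end{aligned}
\]
(use for the last step that $w(x) \ge 2^{-N} (1 + \mu(A_N))^{-1}$ for $x \in A_N$). By Markov's inequality P10.12 and the submartingale property we get
\[
\sup_{\alpha\in I} \int_{\{|u_\alpha| > Lw\}} |u_\alpha|\,d\mu
\le \kappa\, \frac{2^N (1 + \mu(A_N))}{L} \sup_{\alpha\in I} \int |u_\alpha|\,d\mu + \epsilon
\le \kappa\, \frac{2^N (1 + \mu(A_N))}{L} \int |u_\infty|\,d\mu + \epsilon
\]
and (ii) follows since we can choose $L > 0$ as large as we want.

(ii)⇒(i): Step 1: uniqueness. Assume that $u, w \in \mathcal{L}^1(\mathcal{A}_\infty)$ are two functions which close the martingale $(u_\alpha)_{\alpha\in I}$, i.e. functions satisfying
\[ \int_A u\,d\mu = \int_A w\,d\mu = \int_A u_\alpha\,d\mu \qquad \forall A \in \mathcal{A}_\alpha,\ \alpha \in I. \]
Since $u$ and $w$ are integrable functions, the family
\[ \Sigma = \Big\{A \in \mathcal{A}_\infty : \int_A u\,d\mu = \int_A w\,d\mu\Big\} \]
is a $\sigma$-algebra which satisfies $\bigcup_{\alpha\in I} \mathcal{A}_\alpha \subset \Sigma \subset \mathcal{A}_\infty$. Since $\mathcal{A}_\infty$ is generated by the $\mathcal{A}_\alpha$, we get $\Sigma = \mathcal{A}_\infty$, which means that $\int_A u\,d\mu = \int_A w\,d\mu$ holds for all $A \in \mathcal{A}_\infty$. Now Corollary 10.14 applies and we get $u = w$ almost everywhere.

Step 2: existence of the limit. We claim that
\[ \forall \epsilon > 0 \ \ \exists \gamma_\epsilon \in I \ \ \forall \alpha, \beta \ge \gamma_\epsilon : \ \int |u_\alpha - u_\beta|\,d\mu < \epsilon. \tag{19.4} \]
Otherwise we could find a sequence $(\alpha_j)_{j\in\mathbb{N}} \subset I$ such that $\int |u_{\alpha_{j+1}} - u_{\alpha_j}|\,d\mu > \epsilon$ for all $j \in \mathbb{N}$. Since $I$ is upwards filtering, we can assume that $(\alpha_j)_{j\in\mathbb{N}}$ is an increasing sequence.[✓] Because of (ii), $(u_{\alpha_j}, \mathcal{A}_{\alpha_j})_{j\in\mathbb{N}}$ is a uniformly integrable martingale with index set $\mathbb{N}$ which is, by construction, not an $\mathcal{L}^1$-Cauchy sequence. This contradicts Theorem 18.6.

We will now prove the existence of the $\mathcal{L}^1$-limit. Pick in (19.4) $\epsilon = \frac{1}{n}$ and choose $\gamma_{1/n}$. Since $I$ is upwards directed, we can assume that $\gamma_{1/n}$ increases as $n \to \infty$;[✓] thus $(u_{\gamma_{1/n}})_{n\in\mathbb{N}} \subset \mathcal{L}^1(\mathcal{A}_\infty)$ is an $\mathcal{L}^1$-Cauchy sequence. By Theorem 18.6 it converges in $\mathcal{L}^1$ and a.e. to some $u_\infty = \lim_{n\to\infty} u_{\gamma_{1/n}} \in \mathcal{L}^1(\mathcal{A}_\infty)$. Moreover, for all $A \in \mathcal{A}_\infty$ and $\alpha > \gamma_{1/n}$ we have
\[ \int_A |u_\alpha - u_\infty|\,d\mu \le \underbrace{\int_A |u_\alpha - u_{\gamma_{1/n}}|\,d\mu}_{\le\, 1/n \text{ by (19.4)}} + \int_A |u_{\gamma_{1/n}} - u_\infty|\,d\mu \le \frac{2}{n}. \]
This shows, in particular, that $1_A u_\alpha \to 1_A u_\infty$ in $\mathcal{L}^1$ for all $A \in \mathcal{A}_\infty$, and in view of step 1, $u_\infty$ is the only possible limit. The same argument that we used in (iii)⇒(i) of T18.6 now yields that $(u_\alpha)_{\alpha\in I\cup\{\infty\}}$ is still a martingale.

Theorem 19.4 does not claim that $u_\alpha \xrightarrow{\text{a.e. along } I} u_\infty$. This is, in general, false for non-linearly ordered index sets $I$, see e.g. Dieudonné [12]. That uncountable, partially ordered index sets are not at all artificial is shown by the following example which will be essential for the proof of Theorem 19.2.

19.5 Example Let $(X, \mathcal{A}, \mu)$ be a finite measure space and assume that $\nu$ is a measure such that $\nu \ll \mu$. Set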
n I = = A1 A2 An n ∈ Aj ∈ and · Aj = X j=1
and define an order relation ‘ ’ on I through · ∪ · A where Ak ∈ ∈
⇐⇒ ∀ A ∈ A = A1 ∪ Since the common refinement of any two elements ∈ I, = A ∩ A A ∈ A ∈ is again in I and satisfies and , it is clear that I is upwards filtering. In particular, ∈I
where
= A A ∈
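is a filtration as $\mathcal{A}_\alpha \subset \mathcal{A}_\beta$ whenever $\alpha \leq \beta$. Moreover, $(f_\alpha)_{\alpha\in I}$ defined by
\[ f_\alpha = \sum_{A\in\alpha} \frac{\nu(A)}{\mu(A)}\, 1_A, \qquad \text{where } \frac{\nu(A)}{\mu(A)} := 0 \text{ if } \mu(A) = 0, \]
is a martingale. Indeed, if $\alpha \leq \beta$, $\alpha, \beta \in I$, then for $A \in \alpha$
\[ \int_A f_\alpha\,d\mu = \begin{cases} \dfrac{\nu(A)}{\mu(A)}\,\mu(A) = \nu(A) & \text{if } \mu(A) > 0 \\[1ex] 0 & \text{if } \mu(A) = 0 \end{cases} \;=\; \nu(A) \]
as $\nu \ll \mu$. Similarly, for $A \in \alpha$ with $A = B_1 \mathbin{\dot\cup} \dots \mathbin{\dot\cup} B_\ell$ and $B_1, \dots, B_\ell \in \beta$,
\[ \int_A f_\beta\,d\mu = \sum_{k=1}^{\ell} \int_{B_k} f_\beta\,d\mu = \sum_{k:\ \mu(B_k) > 0} \frac{\nu(B_k)}{\mu(B_k)}\,\mu(B_k) \overset{(*)}{=} \sum_{k=1}^{\ell} \nu(B_k) = \nu(A), \]
where we used in $(*)$ that $\nu \ll \mu$, i.e. $\nu(B_k) = 0$ if $\mu(B_k) = 0$. Thus $\int_A f_\alpha\,d\mu = \int_A f_\beta\,d\mu$ for all $A \in \alpha$, hence on $\mathcal{A}_\alpha$ since all $A \in \alpha$ are disjoint and generate $\mathcal{A}_\alpha$[✓] (cf. also Remark 17.2(i)).

What Example 19.5 really says is that
\[ \nu(A) = \int_A f_\alpha\,d\mu \qquad \forall\, A \in \mathcal{A}_\alpha \tag{19.5} \]
or $\nu|_{\mathcal{A}_\alpha} \ll \mu|_{\mathcal{A}_\alpha}$ and $d\nu|_{\mathcal{A}_\alpha} / d\mu|_{\mathcal{A}_\alpha} = f_\alpha$. Heuristically we should expect that, if the limit $f_\alpha \to f_\infty$ exists, $f_\infty$ is the Radon–Nikodým derivative $d\nu/d\mu = f_\infty$. This idea can be made rigorous and is the basis for the

Proof (of Theorem 19.2 (ii)⇒(i)) Let us first assume that $\mu$ and $\nu$ are finite measures. Denote by $(f_\alpha, \mathcal{A}_\alpha)_{\alpha\in I}$ the martingale of Example 19.5. It is enough to show that
\[ f_\infty = \mathcal{L}^1\text{-}\lim_{\alpha\in I} f_\alpha \ \text{ exists and that } \ \mathcal{A}_\infty = \mathcal{A}. \tag{19.6} \]
Indeed, (19.6) combined with (19.5) implies
\[ \nu(A) = \int_A f_\infty\,d\mu \qquad \forall\, A \in \bigcup_{\alpha\in I} \mathcal{A}_\alpha \]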
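and the uniqueness theorem 5.7 for measures extends this equality to $\mathcal{A}_\infty = \sigma\big(\bigcup_{\alpha\in I} \mathcal{A}_\alpha\big)$. Since $A \in \mathcal{A}$ is trivially contained in $\mathcal{A}_\alpha$ where $\alpha = \{A, A^c\}$ (at this point we use the finiteness of the measure $\mu$) we see
\[ \mathcal{A} \supset \mathcal{A}_\infty = \sigma\Big(\bigcup_{\alpha\in I} \mathcal{A}_\alpha\Big) \supset \bigcup_{\alpha\in I} \mathcal{A}_\alpha \supset \mathcal{A} \]
and all that remains is to prove the existence of the limit in (19.6). In view of Theorem 19.4 we have to show that $(f_\alpha)_{\alpha\in I}$ is uniformly integrable. We claim that $\sup_{\alpha\in I} \nu(\{f_\alpha > R\}) \leq \epsilon$ for all large enough $R = R_\epsilon > 0$. Otherwise we could find some $\epsilon_0 > 0$ with $\nu(\{f_{\alpha_n} > n\}) > \epsilon_0$ for all $n \in \mathbb{N}$, so that $\nu\big(\bigcap_{n\in\mathbb{N}} \{f_{\alpha_n} > n\}\big) > 0$ by the continuity of measures, T4.4. Since $\mu$ is a finite measure,
\[ \mu\Big(\bigcap_{n\in\mathbb{N}} \{f_{\alpha_n} > n\}\Big) \leq \inf_{n\in\mathbb{N}} \mu(\{f_{\alpha_n} > n\}) \overset{10.12}{\leq} \lim_{n\to\infty} \frac1n \int f_{\alpha_n}\,d\mu = \lim_{n\to\infty} \frac1n\,\nu(X) = 0, \]
which contradicts the fact that $\nu \ll \mu$. Finally, by (19.5),
\[ \int_{\{f_\alpha > R\}} f_\alpha\,d\mu = \nu(\{f_\alpha > R\}) \leq \epsilon \]
if $R = R_\epsilon > 0$ is sufficiently large, and uniform integrability follows since the constant function $R \in \mathcal{L}^1(\mu)$. The uniqueness of $f_\infty$ follows also from Theorem 19.4.

Assume that $\mu$ is finite and $\nu(X) = \infty$. Denote by $\mathcal{F} = \{F \in \mathcal{A} : \nu(F) < \infty\}$ the sets with finite $\nu$-measure. Obviously, $\mathcal{F}$ is $\cup$-stable, and the constant
\[ c = \sup_{F \in \mathcal{F}} \mu(F) \leq \mu(X) < \infty \]
can be approximated by an increasing sequence $(F_j)_{j\in\mathbb{N}} \subset \mathcal{F}$ such that $c = \mu\big(\bigcup_{j\in\mathbb{N}} F_j\big) = \sup_{j\in\mathbb{N}} \mu(F_j)$.[✓] When restricted to the set $F_\infty = \bigcup_{j\in\mathbb{N}} F_j$, $\nu$ is by definition $\sigma$-finite, while for $A \subset F_\infty^c$, $A \in \mathcal{A}$, we have
\[ \text{either} \quad \mu(A) = \nu(A) = 0 \quad \text{or} \quad 0 < \mu(A) < \nu(A) = \infty. \tag{19.7} \]
In fact, if $\nu(A) < \infty$, then $F_j \mathbin{\dot\cup} A \in \mathcal{F}$ for all $j \in \mathbb{N}$, which implies that
\[ c \geq \mu\Big(\bigcup_{j\in\mathbb{N}} (F_j \mathbin{\dot\cup} A)\Big) = \mu(F_\infty \mathbin{\dot\cup} A) = \mu(F_\infty) + \mu(A) = c + \mu(A), \]
that is $\mu(A) = 0$, hence $\nu(A) = 0$ by absolute continuity; if, however, $\nu(A) = \infty$ we have again by absolute continuity that $\mu(A) > 0$. Define now
\[ F_0 = \emptyset, \qquad \nu_j = \nu\big(\bullet \cap (F_j \setminus F_{j-1})\big), \qquad \mu_j = \mu\big(\bullet \cap (F_j \setminus F_{j-1})\big) \]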
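and it is clear that $\nu_j \ll \mu_j$ for every $j \in \mathbb{N}$. Since $\mu_j, \nu_j$ are finite measures, the first part of this proof shows that $\nu_j = f_j \mu_j$. Obviously, the function
\[ f(x) = \begin{cases} f_j(x) & \text{if } x \in F_j \setminus F_{j-1} \\ \infty & \text{if } x \in F_\infty^c \end{cases} \tag{19.8} \]
fulfils $\nu = f\mu$. By construction, $f$ is unique on the set $F_\infty$. But since every density $\tilde f$ of $\nu$ with respect to $\mu$ satisfies
\[ \nu\big(\{\tilde f \leq n\} \cap F_\infty^c\big) = \int_{\{\tilde f \leq n\} \cap F_\infty^c} \tilde f\,d\mu \leq n\,\mu\big(\{\tilde f \leq n\} \cap F_\infty^c\big) < \infty, \]
the alternative (19.7) reveals that $\mu(\{\tilde f \leq n\} \cap F_\infty^c) = \nu(\{\tilde f \leq n\} \cap F_\infty^c) = 0$ for all $n \in \mathbb{N}$, i.e. that $\tilde f|_{F_\infty^c} = \infty$. In other words: $f$, as defined in (19.8), is also unique on $F_\infty^c$.

Assume that $\mu$ is $\sigma$-finite and $\nu(X) \leq \infty$. Let $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ be an exhausting sequence with $A_j \uparrow X$ and $\mu(A_j) < \infty$. Then the measures
\[ h\mu \quad \text{and} \quad \mu, \qquad \text{where } h(x) = \sum_{j=1}^{\infty} \frac{2^{-j}}{1 + \mu(A_j)}\, 1_{A_j}(x), \]
have the same null sets.[✓] Therefore $\nu \ll \mu$ if, and only if, $\nu \ll h\mu$. Since $h\mu$ is a finite measure[✓], the first two parts of the proof show that $\nu = f \cdot (h\mu) = (fh)\mu$ for a suitable density $f \in \mathcal{M}^+(\mathcal{A})$. The last equality needs proof: if $f = \sum_{j=0}^{M} y_j 1_{A_j}$ is a positive simple function,
\[ \nu(A) = \int_A \sum_{j=0}^{M} y_j 1_{A_j}\,d(h\mu) = \sum_{j=0}^{M} y_j \int 1_{A_j \cap A}\, h\,d\mu = \int_A f h\,d\mu \]
and the general case follows from Beppo Levi's theorem 9.6. Uniqueness is clear as $f$ is $h\mu$-a.e. unique, which implies that $fh$ is $\mu$-a.e. unique since $h > 0$.

19.6 Corollary Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $\nu = f\mu$. Then
(i) $\nu(X) < \infty \iff f \in \mathcal{L}^1(\mu)$;
(ii) $\nu$ is $\sigma$-finite $\iff \mu(\{f = \infty\}) = 0$.

Proof The first assertion (i) is obvious. For (ii) assume first that $\mu(\{f = \infty\}) = 0$. Since $\mu$ is $\sigma$-finite, we find an exhausting sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$ with $A_j \uparrow X$ and $\mu(A_j) < \infty$. The sets
\[ B_k = \{0 \leq f \leq k\}, \qquad B_\infty = \{f = \infty\} \]
obviously satisfy $\bigcup_{k\in\mathbb{N}} B_k \cup B_\infty = X$ as well as $\nu(B_\infty) = 0$ and
\[ \nu(B_k \cap A_j) = \int_{B_k \cap A_j} f\,d\mu \leq \int_{A_j} k\,d\mu = k\,\mu(A_j) < \infty. \]
This shows that $\big((A_j \cap B_k) \cup B_\infty\big)_{j,k\in\mathbb{N}}$ is an exhausting sequence for $\nu$ which means that $\nu$ is $\sigma$-finite.

Conversely, let $\nu$ be $\sigma$-finite and assume that $\mu(\{f = \infty\}) > 0$. As we can find one exhausting sequence $(C_k)_{k\in\mathbb{N}} \subset \mathcal{A}$ for both $\mu$ and $\nu$[✓], we see that
\[ \{f = \infty\} = \bigcup_{k\in\mathbb{N}} \big(\{f = \infty\} \cap C_k\big) \supset \{f = \infty\} \cap C_{k_0} \]
for some fixed $k_0 \in \mathbb{N}$ with $\mu(\{f = \infty\} \cap C_{k_0}) > 0$. But then
\[ \nu(C_{k_0}) \geq \int_{\{f = \infty\} \cap C_{k_0}} f\,d\mu = \infty, \]
which is impossible.

It is clear that not all measures are absolutely continuous with respect to each other. In some sense, the next notion is the opposite of absolute continuity.

19.7 Definition Two measures $\mu, \nu$ on a measurable space $(X, \mathcal{A})$ are called (mutually) singular if there is a set $N \in \mathcal{A}$ such that $\mu(N) = 0 = \nu(N^c)$. We write in this case $\mu \perp \nu$ (or $\nu \perp \mu$ as '$\perp$' is symmetric).

19.8 Examples Let $(X, \mathcal{A}) = (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. Then
(i) $\delta_x \perp \lambda^n$ for all $x \in \mathbb{R}^n$;
(ii) $f\mu \perp g\mu$ if $\operatorname{supp} f \cap \operatorname{supp} g = \emptyset$.¹

The measures $\mu$ and $\nu$ are singular, if they have disjoint 'supports', that is, if $\mu$ lives in a region of $X$ which is not charged by $\nu$ and vice versa. In this sense, Example 19.8(ii) is the model case for singular measures. In general, however, two measures are neither purely absolutely continuous nor purely singular, but are a mixture of both.

19.9 Theorem (Lebesgue decomposition) Let $\mu, \nu$ be two $\sigma$-finite measures on a measurable space $(X, \mathcal{A})$. Then there exists a (up to null sets) unique decomposition $\nu = \nu^\circ + \nu^\perp$ where $\nu^\circ \ll \mu$ and $\nu^\perp \perp \mu$.

¹ $\operatorname{supp} f = \overline{\{f \neq 0\}}$.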
Proof Obviously $\nu + \mu$ is still a $\sigma$-finite measure[✓], and $\nu \ll \nu + \mu$. In this situation Theorem 19.2 applies and shows that
\[ \nu = f(\nu + \mu) = f\nu + f\mu. \tag{19.9} \]
For any $\epsilon > 0$ we conclude, in particular, that
\[ \nu(\{f \geq 1 + \epsilon\}) = \int_{\{f \geq 1+\epsilon\}} f\,d\nu + \int_{\{f \geq 1+\epsilon\}} f\,d\mu \geq (1 + \epsilon)\,\nu(\{f \geq 1 + \epsilon\}) + (1 + \epsilon)\,\mu(\{f \geq 1 + \epsilon\}), \]
i.e. $\mu(\{f \geq 1 + \epsilon\}) = \nu(\{f \geq 1 + \epsilon\}) = 0$ for all $\epsilon$, hence $\mu(\{f > 1\}) = \nu(\{f > 1\}) = 0$. Without loss of generality we may therefore assume that $0 \leq f \leq 1$. In this case (19.9) can be rewritten as
\[ (1 - f)\,\nu = f\mu \tag{19.10} \]
and on the set $N = \{f = 1\}$ we have
\[ \mu(N) = \int_{\{f=1\}} d\mu = \int_{\{f=1\}} f\,d\mu \overset{(19.10)}{=} \int_{\{f=1\}} (1 - f)\,d\nu = 0. \]
Therefore, $\nu^\perp \perp \mu$ where $\nu^\perp = \nu(\bullet \cap \{f = 1\})$, and for $\nu^\circ = \nu(\bullet \cap \{f < 1\})$ we get from (19.10)
\[ \nu^\circ(A) = \nu(A \cap \{f < 1\}) = \int_{A \cap \{f<1\}} d\nu = \int_{A \cap \{f<1\}} \frac{f}{1 - f}\,d\mu \qquad \forall\, A \in \mathcal{A}, \]
showing that $\nu^\circ \ll \mu$. The uniqueness (up to null sets) of this decomposition follows directly from the uniqueness of the Radon–Nikodým derivative $\frac{f}{1-f}\,1_{\{f<1\}}$.

19.10 Remark We have used the martingale convergence theorem to prove the Radon–Nikodým theorem. But the connection between these two theorems is much deeper. For measures with values in a Banach space ('vector measures') the Radon–Nikodým theorem holds if, and only if, the pointwise martingale convergence theorem is valid. One should add that the Radon–Nikodým theorem for Banach spaces is intimately connected with the geometry of Banach spaces. Note, however, that the techniques required in the theory of vector measures are distinctly different from those in the real case. For more on this see Diestel-Uhl [11, Chapter V.2], Benyamini-Lindenstrauss [7, Chapter 5.2] or Métivier [29, § 11].
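The construction in the proof of Theorem 19.9 becomes completely explicit on a finite set, where a measure is just a list of point masses. The following Python sketch is only an illustration and not part of the original text; the function name `lebesgue_decomposition` and the dictionary representation of measures are assumptions made for this example. It computes $f = d\nu/d(\nu+\mu)$ pointwise, splits along $N = \{f = 1\}$ and returns the density $f/(1-f)$ of the absolutely continuous part.

```python
from fractions import Fraction


def lebesgue_decomposition(mu, nu):
    """Split nu = nu_ac + nu_sing with respect to mu on a finite set.

    mu, nu: dicts mapping each point to its non-negative mass.
    Mirrors the proof: f = d nu / d(nu + mu), N = {f = 1}.
    """
    nu_ac, nu_sing, density = {}, {}, {}
    for x in set(mu) | set(nu):
        m, n = Fraction(mu.get(x, 0)), Fraction(nu.get(x, 0))
        total = m + n
        # f is only determined (nu + mu)-a.e.; on null points we choose f = 0.
        f = n / total if total > 0 else Fraction(0)
        if f == 1:
            # x lies in N = {f = 1}, a mu-null set carrying the singular part.
            nu_sing[x], nu_ac[x], density[x] = n, Fraction(0), Fraction(0)
        else:
            # On {f < 1} the density of nu_ac w.r.t. mu is f / (1 - f).
            nu_sing[x], nu_ac[x], density[x] = Fraction(0), n, f / (1 - f)
    return nu_ac, nu_sing, density


mu = {"a": 1, "b": 2, "c": 0}
nu = {"a": 3, "b": 0, "c": 5}
nu_ac, nu_sing, density = lebesgue_decomposition(mu, nu)
print(nu_ac, nu_sing, density)
```

In this toy case the point `c` carries no $\mu$-mass, so all of its $\nu$-mass ends up in the singular part, while on `a` and `b` the absolutely continuous part is recovered as density times $\mu$-mass.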
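Martingale inequalities

Martingales will allow us to prove maximal inequalities which are useful and important both in analysis and probability theory. In order to ease the exposition we introduce the following (quite common) shorthand notation:
\[ u_N^*(x) = \max_{1 \leq j \leq N} |u_j(x)| \qquad \text{and} \qquad u^*(x) = \lim_{N\to\infty} u_N^*(x) = \sup_{j\in\mathbb{N}} |u_j(x)|. \]
The following simple lemma is the key to all maximal inequalities.

19.11 Lemma Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space and let $(u_j)_{j\in\mathbb{N}}$ be a submartingale. Then we have for all $s > 0$
\[ \mu\Big(\Big\{\max_{1\leq j\leq N} u_j \geq s\Big\}\Big) \leq \frac1s \int_{\{\max_{1\leq j\leq N} u_j \geq s\}} u_N\,d\mu \leq \frac1s \int u_N^+\,d\mu. \tag{19.11} \]
If $u_j \in \mathcal{L}^p_+$ or if $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p$, $p \in [1, \infty)$, is a martingale, then
\[ \mu(\{u_N^* \geq s\}) \leq \frac{1}{s^p} \int_{\{u_N^* \geq s\}} |u_N|^p\,d\mu \leq \frac{1}{s^p} \int |u_N|^p\,d\mu. \tag{19.12} \]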
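Proof Consider the stopping time when $(u_j)_{j\in\mathbb{N}}$ exceeds the level $s$ for the first time:
\[ \tau = \inf\{j \leq N : u_j \geq s\} \wedge (N + 1) \qquad (\inf \emptyset = +\infty) \]
and set $A = \{\max_{1\leq j\leq N} u_j \geq s\} = \bigcup_{j=1}^{N} \{u_j \geq s\} = \{\tau \leq N\} \in \mathcal{A}_\tau$, where we used Lemma 17.6. From Theorem 17.7(iii) and the fact that $u_\tau \geq s$ on $A$, we conclude
\[ \mu\Big(\bigcup_{j=1}^{N} \{u_j \geq s\}\Big) = \mu(A) \leq \int_A \frac{u_\tau}{s}\,d\mu = \frac1s \int_A u_\tau\,d\mu \leq \frac1s \int_A u_N\,d\mu \leq \frac1s \int u_N^+\,d\mu. \]
The second inequality (19.12) follows along the same lines since, under our assumptions, $(|u_j|^p)_{j\in\mathbb{N}}$ is a submartingale, cf. Example 17.3(vi).

The next theorem is commonly referred to as Doob's maximal inequality.

19.12 Theorem (Doob's maximal $L^p$-inequality) Let $(X, \mathcal{A}, \mathcal{A}_j, \mu)$ be a $\sigma$-finite filtered measure space, $1 < p < \infty$ and let $(u_j)_{j\in\mathbb{N}}$ be a martingale or $(|u_j|^p)_{j\in\mathbb{N}}$ be a submartingale. Then we have
\[ \|u_N^*\|_p \leq \frac{p}{p-1}\,\|u_N\|_p \leq \frac{p}{p-1} \max_{1\leq j\leq N} \|u_j\|_p. \]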
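The inequality of Theorem 19.12 can be checked numerically in the same toy setting. This is a sketch under assumptions not made in the text: $u_j = S_j$ is the symmetric $\pm1$ random walk on the space of all paths of length $N$ with equal weights, and `doob_norms` is a hypothetical helper name. The function returns the pair $(\|u_N^*\|_p, \|u_N\|_p)$ computed by brute-force enumeration.

```python
from itertools import product


def doob_norms(N, p):
    """Return (||u_N^*||_p, ||u_N||_p) for the symmetric random walk."""
    paths = list(product((-1, 1), repeat=N))
    weight = 1.0 / len(paths)                 # uniform probability on paths
    max_moment = end_moment = 0.0
    for steps in paths:
        position = running_max = 0
        for step in steps:
            position += step
            running_max = max(running_max, abs(position))   # u_N^* on this path
        max_moment += weight * running_max ** p
        end_moment += weight * abs(position) ** p
    return max_moment ** (1 / p), end_moment ** (1 / p)


for p in (1.5, 2.0, 3.0):
    max_norm, end_norm = doob_norms(N=8, p=p)
    # Doob: max_norm should not exceed p / (p - 1) * end_norm.
    print(p, max_norm, p / (p - 1) * end_norm)
```

For each exponent the second printed number is the left-hand side and the third the right-hand side of Doob's inequality; the factor $p/(p-1)$ blows up as $p \downarrow 1$, which matches the fact that only a weak-type estimate survives at $p = 1$.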
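Proof It is enough to consider the case where $(u_j)_{j\in\mathbb{N}}$ is a martingale; the situation where $(|u_j|^p)_{j\in\mathbb{N}}$ is a submartingale is similar and simpler. If $\|u_N\|_p = \infty$, the inequality is trivial; if $u_N \in \mathcal{L}^p$, then $u_1, \dots, u_{N-1} \in \mathcal{L}^p$ since $(|u_j|^p)_{j\in\mathbb{N}}$ is a submartingale by 17.3(vi). Thus
\[ u_N^* \leq |u_1| + |u_2| + \dots + |u_N| \implies u_N^* \in \mathcal{L}^p \]
and using (13.8) of Corollary 13.13 and Tonelli's theorem 13.8 we find
\[ \int (u_N^*)^p\,d\mu \overset{(13.8)}{=} p \int_0^\infty s^{p-1}\,\mu(\{u_N^* \geq s\})\,ds \overset{(19.12)}{\leq} p \int_0^\infty s^{p-2} \Big( \int |u_N|\, 1_{\{u_N^* \geq s\}}\,d\mu \Big)\,ds \overset{13.8}{=} p \int |u_N| \Big( \int_0^{u_N^*} s^{p-2}\,ds \Big)\,d\mu = \frac{p}{p-1} \int |u_N|\,(u_N^*)^{p-1}\,d\mu, \]
where (19.12) is used with exponent $1$. Hölder's inequality T12.2 with $\frac1p + \frac1q = 1$, i.e. $q = \frac{p}{p-1}$, yields
\[ \int (u_N^*)^p\,d\mu \leq \frac{p}{p-1} \Big( \int |u_N|^p\,d\mu \Big)^{1/p} \Big( \int (u_N^*)^p\,d\mu \Big)^{1 - 1/p} \]
and the claim follows.

Using the continuity of measures T4.4, resp. Beppo Levi's Theorem 9.6 we derive from (19.11), resp. Theorem 19.12 the following result.

19.13 Corollary Let $(u_j)_{j\in\mathbb{N}}$ be a martingale on the $\sigma$-finite filtered measure space $(X, \mathcal{A}, \mathcal{A}_j, \mu)$. Then
\[ \mu(\{u^* \geq s\}) \leq \frac1s \sup_{j\in\mathbb{N}} \|u_j\|_1, \tag{19.13} \]
\[ \|u^*\|_p \leq \frac{p}{p-1} \sup_{j\in\mathbb{N}} \|u_j\|_p, \qquad p \in (1, \infty). \tag{19.14} \]
If $(u_j)_{j\in\mathbb{N}\cup\{\infty\}}$ is a martingale, we may replace $\sup_{j\in\mathbb{N}} \|u_j\|_p$, $p \in [1, \infty)$, in (19.13) and (19.14) by $\|u_\infty\|_p$.

An inequality of the form (19.13) is a so-called weak-type maximal inequality, as opposed to the strong-type $(p, p)$ inequalities of the form (19.14).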
If p = 1 and uj j∈∪ is a martingale, we cannot expect a 1 1 strong-type inequality like (19.14) and we have to settle for the weak-type maximal inequality (19.13) instead. Otherwise, the best we can hope for is e ∗ + u 1 X + u log u d if X < e−1 e
∗ + u 1 + u log u d u d else or e − 1 u∗
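Details can be found in Doob [13, pp. 313–4] apart from some obvious modifications if $\mu(X) = \infty$.

The Hardy–Littlewood maximal theorem

Doob's martingale inequalities T19.12 and C19.13 can be seen as abstract versions of the classical Hardy–Littlewood estimates for maximal functions in $\mathbb{R}^n$. To prepare the ground we begin with a dyadic example.

19.14 Example Consider in $\mathbb{R}^n$ the half-open squares
\[ Q_k(z) = z + [0, 2^{-k})^n, \qquad k \in \mathbb{Z},\ z \in 2^{-k}\mathbb{Z}^n, \]
with lower left corner $z$ and side-length $2^{-k}$. Then
\[ \mathcal{A}_k^0 = \sigma\big(Q_k(z) : z \in 2^{-k}\mathbb{Z}^n\big), \qquad k \in \mathbb{Z}, \]
defines a (two-sided infinite) filtration
\[ \dots \subset \mathcal{A}_{-2}^0 \subset \mathcal{A}_{-1}^0 \subset \mathcal{A}_0^0 \subset \mathcal{A}_1^0 \subset \mathcal{A}_2^0 \subset \dots \]
of sub-$\sigma$-algebras of $\mathcal{B}(\mathbb{R}^n)$. The superscript '0' indicates that the square lattice in each $\mathcal{A}_k^0$ contains some square with the origin $0 \in \mathbb{R}^n$ as lower left corner. Just as in Example 17.3(ix) one sees that for a function $f \in \mathcal{L}^1(\lambda^n)$
\[ f_k(x) = \sum_{z \in 2^{-k}\mathbb{Z}^n} \frac{1}{\lambda^n(Q_k(z))} \int_{Q_k(z)} f\,d\lambda^n\; 1_{Q_k(z)}(x), \qquad k \in \mathbb{Z}, \tag{19.15} \]
is a martingale. If you are unhappy about the two-sided infinite index set, then think of $(f_k, \mathcal{A}_k^0)_{k\in\mathbb{N}}$ as a martingale and of $(f_{-k}, \mathcal{A}_{-k}^0)_{k\in\mathbb{N}}$ as a backwards martingale.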
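For the square maximal function
\[ f_0^*(x) = \sup_{k\in\mathbb{Z}} |f_k(x)| = \sup_{k\in\mathbb{Z}} \Big\{ \frac{1}{\lambda^n(Q)} \Big| \int_Q f\,d\lambda^n \Big| : Q \in \mathcal{A}_k^0 \text{ a square},\ x \in Q \Big\} \]
and the submartingale $(|f_k|)_{k\in\mathbb{Z}}$, cf. Example 17.3(v), Doob's inequalities become
\[ \lambda^n(\{f_0^* \geq s\}) \leq \frac1s \sup_{k\in\mathbb{Z}} \int |f_k|\,d\lambda^n \leq \frac1s \int |f|\,d\lambda^n. \]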
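The dyadic averages of Example 19.14 are easy to compute by hand for step functions, and the following Python sketch does exactly that. It is an illustration under simplifying assumptions that are not in the text: we work on $[0,1)$ in dimension $n = 1$, only with the levels $k = 0, \dots, K$, and with a function that is constant on the $2^K$ finest cells; the names `dyadic_averages` and `dyadic_maximal` are hypothetical.

```python
from fractions import Fraction


def dyadic_averages(values, k):
    """Level-k averages f_k of a function constant on 2**K cells of [0, 1)."""
    block = len(values) // 2 ** k             # finest cells per level-k interval
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values), block)]


def dyadic_maximal(values):
    """max_k |f_k| on each finest cell (levels k = 0, ..., K only)."""
    K = len(values).bit_length() - 1          # len(values) == 2**K
    levels = [dyadic_averages(values, k) for k in range(K + 1)]
    # Cell i of the finest level lies in interval i >> (K - k) of level k.
    return [max(abs(levels[k][i >> (K - k)]) for k in range(K + 1))
            for i in range(len(values))]


f = [Fraction(v) for v in (4, 0, 1, 3, 8, 8, 0, 0)]
f0, f1, f2 = (dyadic_averages(f, k) for k in range(3))
f_star = dyadic_maximal(f)
print(f0, f1, f2, f_star)
```

Averaging two neighbouring entries of `f2` reproduces `f1`, and averaging `f1` reproduces `f0`; this is the martingale property of (19.15) in its most elementary form. Comparing the proportion of cells where `f_star` exceeds a level $s$ with $\frac1s$ times the mean of `f` gives a hands-on instance of the weak-type estimate.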
The classical Hardy–Littlewood maximal function is defined similarly to the square maximal function from Example 19.14, the only difference being that one uses balls rather than squares.

19.15 Definition The Hardy–Littlewood maximal function of the function $u \in \mathcal{L}^p(\lambda^n)$, $1 \leq p < \infty$, is defined by
\[ u^*(x) = \sup_{B :\ B \ni x} \frac{1}{\lambda^n(B)} \int_B |u|\,d\lambda^n \]
where $B \subset \mathbb{R}^n$ stands for a generic (open or closed) ball of any radius.

From the Hölder inequality we see that for all sets $A$ with finite Lebesgue measure
\[ \int_A |u|\,d\lambda^n \leq \big(\lambda^n(A)\big)^{1 - 1/p} \Big( \int_A |u|^p\,d\lambda^n \Big)^{1/p}, \qquad 1 \leq p < \infty, \]
so that $u^*$ is well-defined. However, since $u^*$ is given by a (possibly uncountable) supremum, it is not obvious whether $u^*$ is Borel measurable.

19.16 Lemma Let $u \in \mathcal{L}^p(\lambda^n)$, $1 \leq p < \infty$. The Hardy–Littlewood maximal function satisfies
\[ u^*(x) = \sup\Big\{ \frac{1}{\lambda^n(B_r(c))} \int_{B_r(c)} |u|\,d\lambda^n : r \in \mathbb{Q}^+,\ c \in \mathbb{Q}^n,\ x \in B_r(c) \Big\}. \]
In particular, $u^*$ is Borel measurable.

Proof Since $\mathbb{Q}^+ \times \mathbb{Q}^n$ is countable, the formula shows that $u^*$ arises from a countable supremum of Borel measurable functions and is, by Corollary 8.9, again Borel measurable. The inequality '$\geq$' is clear since every ball with rational centre and radius is admissible in the definition of the maximal function $u^*$. To see '$\leq$', we fix $x \in \mathbb{R}^n$
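and pick some generic (open or closed) ball $B$ with $x \in B$. Given some $\epsilon > 0$ we can find $r \in \mathbb{Q}^+$ and $c \in \mathbb{Q}^n$ such that $B' = B_r(c) \subset B$, $\frac12 \lambda^n(B) \leq \lambda^n(B')$ and
\[ \lambda^n(B \setminus B')^{1 - 1/p} \leq \epsilon\,\frac{\lambda^n(B)}{2\,\|u\|_p} \leq \epsilon\,\frac{\lambda^n(B')}{\|u\|_p}. \]
Then
\[ \begin{aligned} \frac{1}{\lambda^n(B)} \int_B |u|\,d\lambda^n &= \frac{1}{\lambda^n(B)} \int_{B \setminus B'} |u|\,d\lambda^n + \frac{1}{\lambda^n(B)} \int_{B'} |u|\,d\lambda^n \\ &\leq \frac{1}{\lambda^n(B)}\,\lambda^n(B \setminus B')^{1 - 1/p}\,\|u\|_p + \frac{1}{\lambda^n(B')} \int_{B'} |u|\,d\lambda^n \\ &\leq \epsilon + \sup_{B'' :\ x \in B''} \frac{1}{\lambda^n(B'')} \int_{B''} |u|\,d\lambda^n \end{aligned} \]
(the supremum ranges over all balls $B''$ with rational radius and centre s.t. $x \in B''$), where we again used Hölder's inequality in the penultimate line. Since $\epsilon$ and $B$ were arbitrary, the inequality '$\leq$' follows by considering the supremum over all balls $B$ with $x \in B$ and then letting $\epsilon \to 0$.

We will see now that $u^*$ is in $\mathcal{L}^p$ if $1 < p < \infty$.

19.17 Theorem (Hardy, Littlewood) Let $u \in \mathcal{L}^p(\lambda^n)$, $1 \leq p < \infty$, and write $u^*$ for the maximal function. Then
\[ \lambda^n(\{u^* \geq s\}) \leq \frac{c_n}{s}\,\|u\|_1, \qquad s > 0,\ p = 1, \tag{19.16} \]
\[ \|u^*\|_p \leq \frac{p\,c_n}{p-1}\,\|u\|_p, \qquad 1 < p < \infty, \tag{19.17} \]
with the universal constant $c_n = \big(\frac{16}{\sqrt{\pi}}\big)^n\, \Gamma\big(\frac n2 + 1\big)$.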
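To get a feeling for the size of the constant $c_n = (16/\sqrt{\pi})^n\,\Gamma(\frac n2 + 1)$ one can simply evaluate it. The short Python sketch below is only an illustration; the function name `hl_constant` is an assumption of this example and the formula is the one stated in Theorem 19.17.

```python
import math


def hl_constant(n):
    """c_n = (16 / sqrt(pi))**n * Gamma(n/2 + 1) from Theorem 19.17."""
    return (16 / math.sqrt(math.pi)) ** n * math.gamma(n / 2 + 1)


for n in (1, 2, 3, 5, 10):
    print(n, hl_constant(n))
```

Since $\Gamma(\frac32) = \frac{\sqrt{\pi}}{2}$, the one-dimensional constant is $c_1 = 8$ (up to floating-point rounding in the printed value). The constant grows very quickly with the dimension, which reflects the crude comparison between balls and cubes in the proof rather than the true behaviour of the maximal function.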
Proof If we could show that the square maximal function $u_0^*$ satisfies $u_0^* \geq u^*$, then (19.16), (19.17) would immediately follow from Doob's inequalities C19.13, compare Example 19.14. The problem, however, is that a ball $B_r$ of radius $r \in \big[\frac14 2^{-k-1}, \frac14 2^{-k}\big)$, $k \in \mathbb{Z}$, need not entirely fall into any single square of our lattice $\mathcal{A}_k^0$:
[Figure: a ball of radius $r$ that is not contained in any single square of side-length $2^{-k}$ of the lattice.]
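But if we move our lattice by $2 \cdot \frac14 2^{-k} = \frac12 2^{-k}$ in certain (combinations of) coordinate directions, we can 'catch' $B_r$ inside a single cube $Q'$ of the shifted lattice.[✓] More precisely, if $e = (\epsilon_1, \dots, \epsilon_n)$, $\epsilon_j \in \{0, \frac12 2^{-k}\}$, then
\[ \mathcal{A}_k^e = \sigma\big(e + Q_k(z) : z \in 2^{-k}\mathbb{Z}^n\big), \qquad k \in \mathbb{Z}, \]
are $2^n$ filtrations with corresponding martingales $(u_k^e)_{k\in\mathbb{Z}}$ and square maximal functions $u_e^*$. As in Example 19.14 we find that
\[ \lambda^n(\{u_e^* \geq s\}) \leq \frac1s\,\|u\|_1, \qquad s > 0. \tag{19.18} \]
Combining Corollary 15.15 with the translation invariance and scaling behaviour of Lebesgue measure we see that the volume of a ball $B_r$ of radius $\frac14 2^{-k-1} \leq r < \frac14 2^{-k}$ and arbitrary centre is
\[ \lambda^n(B_r) = r^n\,\lambda^n(B_1) \overset{15.15}{=} \frac{r^n\,\pi^{n/2}}{\Gamma\big(\frac n2 + 1\big)} \geq \frac{\big(\frac14 2^{-k-1}\big)^n\,\pi^{n/2}}{\Gamma\big(\frac n2 + 1\big)}, \]
hence we get from $x \in B_r \subset Q'$ and $\lambda^n(Q') = (2^{-k})^n$ that
\[ \begin{aligned} \frac{1}{\lambda^n(B_r)} \int_{B_r} |u|\,d\lambda^n &\leq \frac{\lambda^n(Q')}{\lambda^n(B_r)}\,\frac{1}{\lambda^n(Q')} \int_{Q'} |u|\,d\lambda^n \\ &\leq \frac{(2^{-k})^n\,\Gamma\big(\frac n2 + 1\big)}{\pi^{n/2}\,\big(\frac18 2^{-k}\big)^n}\,\frac{1}{\lambda^n(Q')} \int_{Q'} |u|\,d\lambda^n \\ &\leq \underbrace{\Big(\frac{8}{\sqrt{\pi}}\Big)^n\,\Gamma\Big(\frac n2 + 1\Big)}_{=\,\gamma_n}\ \max_e u_e^*(x). \end{aligned} \]
This shows that $u^* \leq \gamma_n \max_e u_e^*$ and
\[ \lambda^n(\{u^* \geq s\}) \leq \lambda^n\Big(\Big\{\max_e u_e^* \geq \frac{s}{\gamma_n}\Big\}\Big) \leq \sum_e \lambda^n\Big(\Big\{u_e^* \geq \frac{s}{\gamma_n}\Big\}\Big) \overset{(19.18)}{\leq} 2^n \gamma_n\,\frac1s\,\|u\|_1. \]
A very similar argument yields $\|u_e^*\|_p \leq \frac{p}{p-1}\,\|u\|_p$ for all shifts $e$, and Doob's inequality (19.14) applied to each $u_e^*$ finally shows
\[ \|u^*\|_p \leq \gamma_n\,\Big\|\max_e u_e^*\Big\|_p \leq \gamma_n \sum_e \|u_e^*\|_p \leq 2^n \gamma_n\,\frac{p}{p-1}\,\|u\|_p. \]
All that remains to be done is to call $c_n = 2^n \gamma_n$.

The proof of Theorem 19.17 extends with very little effort to maximal functions of finite measures.

19.18 Definition Let $\mu$ be a locally finite² measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. The maximal function is given by
\[ \mu^*(x) = \sup_{B :\ B \ni x} \frac{\mu(B)}{\lambda^n(B)} \]
where $B \subset \mathbb{R}^n$ stands for a generic open ball of any radius.

If we replace in the proof of Theorem 19.17 the expression $\int_B |u|\,d\lambda^n$ by $\mu(B)$ and $u_e^*(x)$ by
\[ \mu_e^*(x) = \sup_{k\in\mathbb{Z}} \Big\{ \frac{\mu(Q)}{\lambda^n(Q)} : Q \in \mathcal{A}_k^e \text{ a square},\ x \in Q \Big\} \]
we arrive at the following generalization of (19.16).

19.19 Corollary Let $\mu$ be a finite measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ with total mass $\|\mu\|$ and maximal function $\mu^*$. Then
\[ \lambda^n(\{\mu^* \geq s\}) \leq \frac{c_n}{s}\,\|\mu\|, \qquad s > 0, \tag{19.19} \]
with the universal constant $c_n = \big(\frac{16}{\sqrt{\pi}}\big)^n\,\Gamma\big(\frac n2 + 1\big)$.

² i.e. every point $x \in \mathbb{R}^n$ has a neighbourhood $U = U(x)$ such that $\mu(U) < \infty$. In $\mathbb{R}^n$ this is clearly equivalent to saying that $\mu(B) < \infty$ for every open ball $B$.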
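Lebesgue's differentiation theorem

Let us return once again to the Radon–Nikodým theorem 19.2. There we have seen that $\nu \ll \mu$ implies $\nu = f\mu$. The proof, though, shows even more, namely
\[ \nu|_{\mathcal{A}_\alpha} = f_\alpha\,\mu|_{\mathcal{A}_\alpha} \qquad \text{and} \qquad \mathcal{L}^1\text{-}\lim_{\alpha\in I} f_\alpha = f \]
(notation as in T19.2). Let us consider a concrete measure space $(X, \mathcal{A}, \mu) = (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), \lambda^n)$. In this case we could reduce our consideration to a countable sequence of $\sigma$-algebras (instead of $(\mathcal{A}_\alpha)_{\alpha\in I}$), cf. Problem 19.1, and use Theorem 18.6 instead of 19.4. In fact, this would even allow us to get $f(x)$ as pointwise limit. This is one way to prove Lebesgue's differentiation theorem.

19.20 Theorem (Lebesgue) Let $u \in \mathcal{L}^1(\lambda^n)$. Then
\[ \lim_{r\to 0} \frac{1}{\lambda^n(B_r(x))} \int_{B_r(x)} |u(y) - u(x)|\,\lambda^n(dy) = 0 \tag{19.20} \]
for (Lebesgue) almost all $x \in \mathbb{R}^n$. In particular,
\[ u(x) = \lim_{r\to 0} \frac{1}{\lambda^n(B_r(x))} \int_{B_r(x)} u(y)\,\lambda^n(dy). \tag{19.21} \]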
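The role of the exceptional null set in (19.21) is already visible for an indicator function. The following Python sketch is an illustration only; the choice $u = 1_{[0,1/2)}$ on the real line and the helper name `interval_average` are assumptions of this example. It computes the averages over $(x - r, x + r)$ exactly for shrinking radii.

```python
from fractions import Fraction


def interval_average(a, b, x, r):
    """Average of the indicator 1_[a,b) over the ball (x - r, x + r) in R."""
    a, b, x, r = (Fraction(t) for t in (a, b, x, r))
    # Length of the overlap of [a, b) with (x - r, x + r), or 0 if disjoint.
    overlap = max(Fraction(0), min(b, x + r) - max(a, x - r))
    return overlap / (2 * r)


half = Fraction(1, 2)
for x in (Fraction(1, 4), half, Fraction(3, 4)):
    averages = [interval_average(0, half, x, Fraction(1, 2 ** k)) for k in (3, 6, 9)]
    print(x, averages)
```

At the interior points $x = \frac14$ and $x = \frac34$ the averages agree with $u(x)$ for these radii, whereas at the jump $x = \frac12$ every average equals $\frac12$ and (19.21) fails. This is consistent with Theorem 19.20, since a single point is a Lebesgue null set.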
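We will not follow the route laid out above, but use instead the Hardy–Littlewood maximal theorem 19.17 to prove T19.20. The reason is mainly a didactic one since this is a beautiful example of how weak-type maximal inequalities (i.e. inequalities like (19.16) or (19.13)) can be used to get a.e. convergence. More on this theme can be found in Krantz [25, pp. 27–30] and Garsia [16, pp. 1–4]. Our proof will also show that the limits in (19.20) and (19.21) can be strengthened to $B \downarrow \{x\}$ where $B$ is any ball containing $x$ and, in the limit, shrinking to $\{x\}$.

Proof (of Theorem 19.20) We know from Theorem 15.17 that the continuous functions with compact support $C_c(\mathbb{R}^n)$ are dense in $\mathcal{L}^1(\lambda^n)$. Since $\phi \in C_c(\mathbb{R}^n)$ is uniformly continuous, we find for every $\epsilon > 0$ some $\delta > 0$ such that
\[ |\phi(x) - \phi(y)| < \epsilon \qquad \forall\, |x - y| \leq r,\ r < \delta. \]
Thus
\[ \lim_{r\to 0} \frac{1}{\lambda^n(B_r(x))} \int_{B_r(x)} |\phi(y) - \phi(x)|\,\lambda^n(dy) = 0 \qquad \forall\, \phi \in C_c(\mathbb{R}^n) \tag{19.22} \]
and (19.20) is true for $C_c(\mathbb{R}^n)$. For a general $u \in \mathcal{L}^1(\lambda^n)$ we pick a sequence $(\phi_j)_{j\in\mathbb{N}} \subset C_c(\mathbb{R}^n)$ with $\lim_{j\to\infty} \|u - \phi_j\|_1 = 0$. Denote by
\[ w^{*,\delta}(x) = \sup_{0 < \rho < \delta} \frac{1}{\lambda^n(B_\rho(x))} \int_{B_\rho(x)} |w|\,d\lambda^n \]
the restricted maximal function. Since the supremum is subadditive and since $(u - \phi_j)^{*,\delta} \leq (u - \phi_j)^*$, we get
\[ \begin{aligned} &\lambda^n\Big(\Big\{x : \limsup_{r\to 0} \frac{1}{\lambda^n(B_r(x))} \int_{B_r(x)} |u(y) - u(x)|\,\lambda^n(dy) > 3\epsilon\Big\}\Big) \\ &\quad= \lambda^n\Big(\Big\{x : \inf_{\delta > 0}\, (u - u(x))^{*,\delta}(x) > 3\epsilon\Big\}\Big) \\ &\quad\leq \lambda^n\big(\big\{x : (u - u(x))^{*,\delta}(x) > 3\epsilon\big\}\big) \\ &\quad= \lambda^n\big(\big\{x : \big((u - \phi_j) + (\phi_j - \phi_j(x)) + (\phi_j(x) - u(x))\big)^{*,\delta}(x) > 3\epsilon\big\}\big) \\ &\quad\leq \lambda^n\big(\{(u - \phi_j)^* > \epsilon\}\big) + \lambda^n\big(\big\{x : (\phi_j - \phi_j(x))^{*,\delta}(x) > \epsilon\big\}\big) + \lambda^n\big(\{x : |\phi_j(x) - u(x)| > \epsilon\}\big) \\ &\quad\leq \frac{c_n}{\epsilon}\,\|u - \phi_j\|_1 + 0 + \frac{1}{\epsilon}\,\|\phi_j - u\|_1, \end{aligned} \]
where we used Theorem 19.17, resp., (19.22) with $\delta \to 0$, resp., the Markov inequality 10.12 to deal with each of the above three terms respectively. The assertion now follows by letting first $j \to \infty$ and then $\epsilon \to 0$.

Let us now investigate the connection between ordinary derivatives and the Radon–Nikodým derivative. For this the following auxiliary notation will be useful. If $\mu$ is a measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ that assigns finite volume to any ball, we set
\[ \bar{D}\mu(x) = \limsup_{r\to 0} \frac{\mu(B_r(x))}{\lambda^n(B_r(x))} = \lim_{k\to\infty}\, \sup_{0 < r < 1/k} \frac{\mu(B_r(x))}{\lambda^n(B_r(x))} \]
and, whenever the limit exists and is finite,
\[ D\mu(x) = \lim_{r\to 0} \frac{\mu(B_r(x))}{\lambda^n(B_r(x))}. \]
Note that by Lemma 19.16, both $\bar{D}\mu$ and $D\mu$ are Borel measurable functions.

19.21 Corollary Let $\mu$ be a locally finite³ measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ which is absolutely continuous w.r.t. Lebesgue measure: $\mu \ll \lambda^n$. Then $D\mu$ exists Lebesgue a.e. and coincides a.e. with the Radon–Nikodým derivative $d\mu/d\lambda^n$, that is, $\mu = D\mu\,\lambda^n$.

³ See the footnote to Definition 19.18. Note that $\mu$ is automatically $\sigma$-finite.[✓]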
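Proof Assume first that $\mu$ is a finite measure. By the Radon–Nikodým Theorem 19.2 we know that there is a unique function $f \in \mathcal{L}^1(\lambda^n)$ with
\[ \frac{\mu(B_r(x))}{\lambda^n(B_r(x))} = \frac{1}{\lambda^n(B_r(x))} \int_{B_r(x)} f(y)\,\lambda^n(dy). \]
By Lebesgue's differentiation theorem the right-hand side of the above equality tends for $\lambda^n$-almost all $x$ to $f(x)$ as $r \to 0$, so that $D\mu = f$ almost everywhere.

If $\mu$ is not finite, we choose an exhausting sequence of open balls $B_k(0)$, $k \in \mathbb{N}$, and set $\mu_k(\bullet) = \mu(B_k(0) \cap \bullet)$ and $\lambda^n_k(\bullet) = \lambda^n(B_k(0) \cap \bullet)$. Since the measures $\mu_k$ and $\lambda^n_k$ are finite, the previous argument applies and shows that $D\mu_k = d\mu_k/d\lambda^n_k$ a.e. for every $k \in \mathbb{N}$. By the very definition of the Radon–Nikodým derivative, we find that $d\mu_k/d\lambda^n_k = d\mu_j/d\lambda^n_j$ on $B_j(0)$ whenever $j < k$ and the same is true for $D\mu_k$, resp. $D\mu_j$. Thus
\[ D\mu(x) = D\mu_k(x), \quad x \in B_k(0), \qquad \text{and} \qquad \frac{d\mu}{d\lambda^n}(x) = \frac{d\mu_k}{d\lambda^n_k}(x), \quad x \in B_k(0), \]
are well-defined functions which satisfy $D\mu = d\mu/d\lambda^n$ $\lambda^n$-almost everywhere.

19.22 Corollary Let $\mu$ be a locally finite⁴ measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ which is singular w.r.t. $\lambda^n$, i.e. $\mu \perp \lambda^n$. Then $D\mu = 0$ $\lambda^n$-almost everywhere.

Proof Assume first that $\mu$ is a finite measure. Write $\|\mu\|$ for the total mass of $\mu$. It is enough to show that $\bar{D}\mu = 0$ a.e. Since $\mu \perp \lambda^n$, there is some $\lambda^n$-null set $N$ with $\mu(N) = \|\mu\|$. From Theorem 15.19 we know that $\mu$ is inner regular. Thus for every $\epsilon > 0$, there is some compact set $K = K_\epsilon \subset N$ such that $\mu(K) > \|\mu\| - \epsilon$. Setting
\[ \mu_1(\bullet) = \mu(K \cap \bullet) \qquad \text{and} \qquad \mu_2 = \mu - \mu_1 \]
we obtain two measures $\mu_1, \mu_2$ with $\mu = \mu_1 + \mu_2$ and $\|\mu_2\| \leq \epsilon$. Since $K^c$ is open, we conclude from the definition of the derivative that $\bar{D}\mu_1(x) = 0$ for all $x \in K^c$, so that
\[ \bar{D}\mu(x) = \bar{D}\mu_1(x) + \bar{D}\mu_2(x) = \bar{D}\mu_2(x) \leq \mu_2^*(x) \qquad \forall\, x \in K^c, \]
where $\mu_2^*$ denotes the maximal function for the measure $\mu_2$. This shows that
\[ \{\bar{D}\mu > s\} \subset K \cup \{\mu_2^* > s\} \qquad \forall\, s > 0. \]
Using that $\lambda^n(K) \leq \lambda^n(N) = 0$ and the maximal inequality Corollary 19.19 we find
\[ \lambda^n(\{\bar{D}\mu > s\}) \leq \lambda^n(\{\mu_2^* > s\}) \leq \frac{c_n}{s}\,\|\mu_2\| \leq \frac{c_n}{s}\,\epsilon. \]

⁴ See the footnote to Definition 19.18.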
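Since $\epsilon > 0$ and $s > 0$ were arbitrary, we conclude that $\bar{D}\mu = 0$ Lebesgue a.e.

If $\mu$ is not finite, we choose an exhausting sequence of open balls $B_k(0)$, $k \in \mathbb{N}$, and set $\mu_k(\bullet) = \mu(B_k(0) \cap \bullet)$. Obviously, $\bar{D}\mu = \bar{D}\mu_k$ on $B_k(0)$, and therefore the first part of the proof shows that $\bar{D}\mu(x) = 0$ for Lebesgue almost all $x \in B_k(0)$. Denoting the exceptional set by $M_k$ we see that $\bar{D}\mu(x) = 0$ for all $x \notin M = \bigcup_{k\in\mathbb{N}} M_k$; the latter, however, is a $\lambda^n$-null set, and the theorem follows.

The Calderón–Zygmund lemma

Our last topic is the famous Calderón–Zygmund decomposition lemma which is the heart of many further developments in the theory of singular integral operators. We take the proof from Stein's book [47, p. 17] and rephrase it a little to bring out the martingale connection.

19.23 Lemma (Calderón–Zygmund decomposition) Let $u \in \mathcal{L}^1_+(\lambda^n)$ and $\alpha > 0$. Then there exists a decomposition of $\mathbb{R}^n$ such that
(i) $\mathbb{R}^n = F \cup \Omega$ and $F \cap \Omega = \emptyset$;
(ii) $u \leq \alpha$ almost everywhere on $F$;
(iii) $\Omega = \bigcup_{k} Q_k$ with mutually disjoint half-open axis-parallel squares $Q_k$ such that for each $Q_k$
\[ \alpha < \frac{1}{\lambda^n(Q_k)} \int_{Q_k} u\,d\lambda^n \leq 2^n \alpha. \]

Proof Let $\mathcal{A}_k = \mathcal{A}_k^0$, $k \in \mathbb{Z}$, be the dyadic filtration of Example 19.14 and let $(u_k)_{k\in\mathbb{Z}}$ be the corresponding martingale (19.15). Introduce a stopping time
\[ \tau = \inf\{k \in \mathbb{Z} : u_k > \alpha\}, \qquad \inf \emptyset = +\infty, \]
and set $F = \{\tau = +\infty\} \cup \{\tau = -\infty\}$ and $\Omega = \{-\infty < \tau < +\infty\}$. By the very definition of the martingale $(u_k)_{k\in\mathbb{Z}}$ we see
\[ u_k(x) \leq \frac{1}{\lambda^n(Q_k)} \int u\,d\lambda^n = 2^{nk}\,\|u\|_1 \xrightarrow{\ k \to -\infty\ } 0 \]
so that $\lim_{k\to-\infty} u_k(x) = 0$ and $\{\tau = -\infty\} = \emptyset$. If $x \in \{\tau = +\infty\}$, we have $u_k(x) \leq \alpha$ for all $k$ and so $u(x) = \lim_{k\to\infty} u_k(x) \leq \alpha$ a.e., as the almost everywhere pointwise limit exists by Corollary 18.3 (note that $\int u_k\,d\lambda^n = \int u\,d\lambda^n < \infty$). This settles (i) and (ii).

Since $\tau$ is a stopping time, $\{\tau = k\} = \{\tau \leq k\} \setminus \{\tau \leq k - 1\} \in \mathcal{A}_k$, hence $\{\tau = k\}$ as well as $\Omega = \mathop{\dot\bigcup}_{k\in\mathbb{Z}} \{\tau = k\}$ are unions of disjoint half-open squares. The estimate in (iii) can be written as
\[ \alpha < u_\tau(x) \leq 2^n \alpha \qquad \forall\, x \in \Omega. \]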
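From its definition, $u_\tau > \alpha$ is clear. For the upper estimate we note that every square $Q_{k-1} \in \mathcal{A}_{k-1}$ contains $2^n$ squares $Q_k \in \mathcal{A}_k$, so that $\lambda^n(Q_{k-1}(z)) = 2^n \lambda^n(Q_k(y))$ whenever $Q_k(y) \subset Q_{k-1}(z)$. Since $u \geq 0$, we get for $x \in Q_k(y) \subset Q_{k-1}(z)$
\[ u_k(x) = \frac{1}{\lambda^n(Q_k(y))} \int_{Q_k(y)} u\,d\lambda^n \leq \frac{2^n}{\lambda^n(Q_{k-1}(z))} \int_{Q_{k-1}(z)} u\,d\lambda^n = 2^n\,u_{k-1}(x), \]
that is, $u_k \leq 2^n u_{k-1}$. Finally, by the definition of $\tau$,
\[ 1_{\{-\infty < \tau < +\infty\}}\,u_\tau = \sum_{k\in\mathbb{Z}} u_k\,1_{\{\tau = k\}} \leq \sum_{k\in\mathbb{Z}} 2^n u_{k-1}\,1_{\{\tau = k\}} \leq \sum_{k\in\mathbb{Z}} 2^n \alpha\,1_{\{\tau = k\}} = 2^n \alpha\,1_{\{-\infty < \tau < +\infty\}} \]
and the proof is complete.

Note that all results that we have proved here for $\mathbb{R}^n$ and $\lambda^n$ can be extended to spaces of homogeneous type, i.e. metric spaces $X$ with a measure $\mu$ that is finite and strictly positive on balls and has the following volume doubling property: for some positive constant $\kappa > 0$ we have
\[ \mu(B_{2r}(x)) \leq \kappa\,\mu(B_r(x)) \qquad \forall\, x \in X,\ r > 0, \]
see Krantz [25, §6.1, pp. 235–61].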
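The stopping-time construction in the proof of Lemma 19.23 translates directly into a recursive subdivision. The Python sketch below is an illustration under simplifying assumptions that are not in the text: we work on $[0,1)$ with $n = 1$, the function $u \geq 0$ is constant on $2^K$ cells, and its mean over $[0,1)$ is at most $\alpha$, so that the interval $[0,1)$ plays the role of the large squares on which the averages are already small. The name `calderon_zygmund` is hypothetical.

```python
from fractions import Fraction


def calderon_zygmund(values, alpha):
    """Dyadic Calderon-Zygmund decomposition on [0, 1) in dimension one.

    values: u >= 0, constant on 2**K equal cells; requires mean(values) <= alpha.
    Returns (bad, good): `bad` lists the selected dyadic intervals as index
    ranges (start, stop) of finest cells, `good` lists the cells forming F.
    """
    values = [Fraction(v) for v in values]
    if sum(values) / len(values) > alpha:
        raise ValueError("the mean over [0, 1) must not exceed alpha")
    bad, good = [], []

    def split(start, stop):
        # Invariant: the average of u over [start, stop) is at most alpha.
        if stop - start == 1:
            good.append(start)                # finest cell with u <= alpha
            return
        mid = (start + stop) // 2
        for lo, hi in ((start, mid), (mid, stop)):
            average = sum(values[lo:hi]) / (hi - lo)
            if average > alpha:
                bad.append((lo, hi))          # first level with u_k > alpha
            else:
                split(lo, hi)

    split(0, len(values))
    return bad, good


u = [0, 0, 1, 0, 0, 9, 2, 2]
bad, good = calderon_zygmund(u, alpha=2)
print(bad, good)
```

A selected interval is a child of an interval whose average is at most $\alpha$, so its own average lies in $(\alpha, 2\alpha]$, which is the estimate (iii) with $2^n = 2$. On the remaining cells the function itself is bounded by $\alpha$, as in (ii).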
Problems

19.1. Show that Theorem 18.6 is enough to prove the Radon–Nikodým theorem 19.2 in the situation where $\mathcal{A}$ is countably generated, i.e. where $\mathcal{A} = \sigma\big((A_j)_{j\in\mathbb{N}}\big)$.
[Hint: set $\mathcal{A}_n = \sigma(A_1, A_2, \dots, A_n)$ and observe that the atoms of $\mathcal{A}_n$ are of the form $C_1 \cap \dots \cap C_n$ where $C_j \in \{A_j, A_j^c\}$, $1 \leq j \leq n$.]

19.2. A theorem of Doob. Let $(\mu_t)_{t\geq 0}$ and $(\nu_t)_{t\geq 0}$ be two families of measures on the $\sigma$-finite measure space $(X, \mathcal{A}, \mu)$ such that $\nu_t \ll \mu_t$ for all $t \geq 0$ and $t \mapsto \mu_t(A)$, $t \mapsto \nu_t(A)$ are measurable for all $A \in \mathcal{A}$. Then there exists a measurable function $(t, x) \mapsto p(t, x)$, $(t, x) \in [0, \infty) \times X$, such that $\nu_t = p(t, \bullet)\,\mu_t$ for all $t \geq 0$.
[Hint: set, as in the proof of Theorem 19.2, $p_\alpha(t, x) = \sum_{A\in\alpha} \frac{\nu_t(A)}{\mu_t(A)}\,1_A(x)$, and check that this function is jointly measurable in $t$ and $x$. Now argue as in the proof of 19.2.]

19.3. Conditional expectations. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $\mathcal{F} \subset \mathcal{A}$ be a sub-$\sigma$-algebra. Use the Radon–Nikodým theorem to show that for every $u \in \mathcal{L}^1(\mathcal{A})$ there exists an (up to null sets unique) $\mathcal{F}$-measurable function $u^{\mathcal{F}} \in \mathcal{L}^1(\mathcal{F})$ such that
\[ \int_F u^{\mathcal{F}}\,d\mu = \int_F u\,d\mu \qquad \forall\, F \in \mathcal{F}. \tag{19.23} \]
Use this result to rephrase the (sub-, super-)martingale property 17.1.
Remark. Since $u^{\mathcal{F}}$ is unique (modulo null sets) one often writes $u^{\mathcal{F}} = E^{\mathcal{F}} u$ where $E^{\mathcal{F}}$ is an operator which is called conditional expectation. We will introduce this operator in a different way in Chapter 22 and show in Theorem 23.9 that we could have defined $E^{\mathcal{F}}$ by (19.23).

19.4. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $\nu$ be a further measure. Show that $\nu \leq \mu$ entails that $\nu = f\mu$ for some (a.e. uniquely determined) density function $f$ such that $0 \leq f \leq 1$.

19.5. Let $\mu$, $\nu$ be two $\sigma$-finite measures on $(X, \mathcal{A})$ which have the same null sets. Show that $\nu = f\mu$ and $\mu = g\nu$ where $0 < f < \infty$ a.e. and $g = 1/f$ a.e.
Remark. Measures having the same null sets are called equivalent.

19.6. Give an example of a measure $\mu$ and a density $f$ such that $f\mu$ is not $\sigma$-finite.

19.7. Let $\mu, \nu$ be $\sigma$-finite measures on the measurable space $(X, \mathcal{A})$. Let $(\mathcal{A}_j)_{j\in\mathbb{N}}$ be a filtration of sub-$\sigma$-algebras of $\mathcal{A}$ such that $\mathcal{A} = \sigma\big(\bigcup_{j\in\mathbb{N}} \mathcal{A}_j\big)$ and denote by $\mu_j = \mu|_{\mathcal{A}_j}$ and $\nu_j = \nu|_{\mathcal{A}_j}$. If $\mu_j \ll \nu_j$ for all $j \in \mathbb{N}$, then $\mu \ll \nu$. Find an expression for the density $d\mu/d\nu$.

19.8. Let $\mu$ be Lebesgue measure on $[0, 2]$ and $\nu$ be Lebesgue measure on $[1, 3]$. Find the Lebesgue decomposition of $\nu$ with respect to $\mu$.

19.9. Stieltjes measure (3). Let $(X, \mathcal{A}, \mu)$ be a finite measure space and denote by $F$ the left-continuous distribution function of $\mu$ as in Problem 7.9. Use Lebesgue's decomposition theorem 19.9 to show that we can decompose $F = F_1 + F_2 + F_3$ and, accordingly, $\mu = \mu_1 + \mu_2 + \mu_3$ in such a way that
(1) $F_1$ is discrete, i.e. $\mu_1$ is the countable sum of weighted Dirac $\delta$-measures.
(2) $F_2$ is absolutely continuous, i.e. for every $\epsilon > 0$ there exists a $\delta > 0$ such that $\sum_{j=1}^{N} |F(y_j) - F(x_j)| \leq \epsilon$ for all points $x_1 < y_1 < x_2 < y_2 < \dots < x_N < y_N$ with $\sum_{j=1}^{N} |y_j - x_j| < \delta$.
(3) $F_3$ is continuous and singular, i.e. $\mu_3 \perp \lambda^1$.
[Hint: use in (19.2) the characterization of null sets of Problem 6.1.]

19.10. The devil's staircase. Recall the construction of Cantor's ternary set from Problem 7.10. In each step of the construction $E_k = I_k^1 \mathbin{\dot\cup} \dots \mathbin{\dot\cup} I_k^{2^k}$. Denote by $J_k^1, \dots, J_k^{2^k - 1}$ the intervals which make up $[0, 1] \setminus E_k$ arranged in increasing order of their endpoints. We construct a sequence of functions $F_k : [0, 1] \to [0, 1]$ by
\[ F_k(x) = \begin{cases} 0 & \text{if } x = 0 \\ j\,2^{-k} & \text{if } x \in J_k^j,\ 1 \leq j \leq 2^k - 1 \\ 1 & \text{if } x = 1 \end{cases} \]
and interpolate linearly between these values to get $F_k(x)$ for all other $x$.
(i) Sketch the first three functions $F_1, F_2, F_3$.
(ii) Show that the limit $F(x) = \lim_{k\to\infty} F_k(x)$ exists.
Remark. $F$ is usually called the Cantor function.
(iii) Show that $F$ is continuous and increasing.
(iv) Show that $F'$ exists a.e. and equals 0.
(v) Show that $F$ is not absolutely continuous (in the sense of Problem 19.9(2)) but singular, i.e. the corresponding measure $\mu$ with distribution function $F$ is singular w.r.t. Lebesgue measure $\lambda^1|_{[0,1]}$.
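For experimenting with part (i) of Problem 19.10 it can help to evaluate the functions $F_k$ on a computer. The following Python sketch is not part of the problem; it rests on the assumption that the piecewise linear functions $F_k$ described above satisfy the self-similar recursion $F_k(x) = \frac12 F_{k-1}(3x)$ on $[0, \frac13]$, $F_k(x) = \frac12$ on $[\frac13, \frac23]$ and $F_k(x) = \frac12 + \frac12 F_{k-1}(3x - 2)$ on $[\frac23, 1]$, started from the hypothetical initial function $F_0(x) = x$. The name `cantor_approx` is likewise an assumption.

```python
from fractions import Fraction


def cantor_approx(k, x):
    """Evaluate F_k(x) for x in [0, 1] via the self-similar recursion."""
    x = Fraction(x)
    if k == 0:
        return x                              # assumed starting function F_0(x) = x
    if x <= Fraction(1, 3):
        return cantor_approx(k - 1, 3 * x) / 2
    if x >= Fraction(2, 3):
        return Fraction(1, 2) + cantor_approx(k - 1, 3 * x - 2) / 2
    return Fraction(1, 2)                     # constant on the removed middle third


grid = [Fraction(i, 27) for i in range(28)]
for k in (1, 2, 3):
    print(k, [str(cantor_approx(k, x)) for x in grid])
```

Plotting these values against the grid points gives the sketches asked for in (i); the plateaus are exactly the intervals $J_k^j$.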
19.11. Kolmogorov's inequality. Let $(X_j)_{j\in\mathbb{N}}$ be a sequence of independent, identically distributed random variables on a probability space $(\Omega, \mathcal{A}, P)$. Then we have the following generalization of Chebyshev's inequality, cf. Problem 10.5(vi),
\[ P\Big( \max_{1\leq k\leq n} \Big| \sum_{j=1}^{k} (X_j - EX_j) \Big| \geq t \Big) \leq \frac{1}{t^2} \sum_{j=1}^{n} VX_j \]
where, in probabilistic notation, $EY = \int Y\,dP$ is the expectation or mean value and $VY = \int (Y - EY)^2\,dP$ the variance of the random variable (i.e. measurable function) $Y : \Omega \to \mathbb{R}$.

19.12. Let $u, w \geq 0$ be measurable functions on a $\sigma$-finite measure space $(X, \mathcal{A}, \mu)$.
(i) Show that $t\,\mu(\{u \geq t\}) \leq \int_{\{u \geq t\}} w\,d\mu$ for all $t > 0$ implies that
\[ \int u^p\,d\mu \leq \frac{p}{p-1} \int u^{p-1} w\,d\mu \qquad \forall\, p > 1. \]
(ii) Assume now that $u, w \in L^p$. Conclude from (i) that $\|u\|_p \leq \frac{p}{p-1}\,\|w\|_p$ for $p > 1$.
[Hint: use the technique of the proof of Theorem 19.12; for (ii) use Hölder's inequality.]

19.13. Show the following slight improvement of Doob's maximal inequality T19.12: Let $(u_j)_{j\in\mathbb{N}}$ be a martingale or $(|u_j|^p)_{j\in\mathbb{N}}$, $1 < p < \infty$, be a submartingale on a $\sigma$-finite filtered measure space. Then
\[ \max_{j\leq N} \|u_j\|_p \leq \|u_N^*\|_p \leq \frac{p}{p-1}\,\|u_N\|_p \leq \frac{p}{p-1} \max_{1\leq j\leq N} \|u_j\|_p. \]

19.14. $\mathcal{L}^p$-bounded martingales. A martingale $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ is called $\mathcal{L}^p$-bounded, if $\sup_{j\in\mathbb{N}} \int |u_j|^p\,d\mu < \infty$ for some $p > 1$. Show that the sequence $(u_j)_{j\in\mathbb{N}}$ converges a.e. and in $\mathcal{L}^p$-sense to a function $u \in \mathcal{L}^p$.
[Hint: compare with Problem 18.8]

19.15. Use Theorem 18.6 to show that the martingale of Example 19.14 is uniformly integrable.

19.16. Let $u : [a, b] \to \mathbb{R}$ be a continuous function. Show that $x \mapsto \int_a^x u(t)\,dt$ is everywhere differentiable and find its derivative. What happens if we only assume that $u \in \mathcal{L}^1(dt)$?
[Hint: Theorem 19.20.]

19.17. Let $f : \mathbb{R} \to \mathbb{R}$ be a bounded increasing function. Show that $f'$ exists Lebesgue almost everywhere and that $f(b) - f(a) \geq \int_{(a,b)} f'(x)\,dx$. When do we have equality?
[Hint: assume first that $f$ is left- or right-continuous. Then you can interpret $f$ as distribution function of a Stieltjes measure $\mu$. Use Lebesgue's decomposition theorem 19.9 to write $\mu = \mu^\circ + \mu^\perp$ and use Corollaries 19.21 and 19.22 to find $f'$. If $f$ is not one-sided continuous in the first place, use Lemma 13.12 to find a version $\phi$ of $f$ which is left- or right-continuous such that $\{\phi \neq f\}$ is at most countable, hence a Lebesgue null set.]

19.18. Fubini's 'other' theorem. Let $(f_j)_{j\in\mathbb{N}}$ be a sequence of monotone increasing functions $f_j : [a, b] \to \mathbb{R}$. If the series $s(x) = \sum_{j=1}^{\infty} f_j(x)$ converges, then $s'(x)$ exists a.e. and is given by $s'(x) = \sum_{j=1}^{\infty} f_j'(x)$ a.e.
[Hint: the partial sums $s_n(x)$ and $s(x)$ are again increasing functions and, by Problem 19.17, $s'(x)$ and $s_n'(x)$ exist a.e.; the latter can be calculated through term-by-term differentiation. Since the $f_j$ are increasing functions, the limits of the difference quotients show that $0 \leq s_n' \leq s_{n+1}' \leq s'$ a.e., hence $\sum_j f_j'$ converges a.e. To identify this series with $s'$, show that $\sum_k (s(x) - s_{n_k}(x))$ converges on $[a, b]$ for some suitable subsequence. The first part of the proof applied to this series implies that $\sum_k (s'(x) - s_{n_k}'(x))$ converges, thus $s' - s_{n_k}' \to 0$.]
20 Inner product spaces
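Up to now we have only considered functions with values in $\mathbb{R}$ or $\bar{\mathbb{R}}$. Often it is necessary to admit complex-valued functions, too. In what follows $\mathbb{K}$ will stand for $\mathbb{R}$ or $\mathbb{C}$. Recall that a $\mathbb{K}$-vector space is a set $V$ with a vector addition '$+$' $: V \times V \to V$, $(v, w) \mapsto v + w$ and a multiplication of a vector with a scalar '$\cdot$' $: \mathbb{K} \times V \to V$, $(\alpha, v) \mapsto \alpha \cdot v$ which are defined in such a way that $(V, +)$ is an Abelian group and that for all $\alpha, \beta \in \mathbb{K}$ and $v, w \in V$ the relations
\[ (\alpha + \beta)v = \alpha v + \beta v, \qquad \alpha(v + w) = \alpha v + \alpha w, \qquad (\alpha\beta)v = \alpha(\beta v), \qquad 1 \cdot v = v \]
hold. Typical examples of $\mathbb{R}$-vector spaces are the spaces $\mathcal{L}^p$ or $L^p$ (see Remark 12.5) and, in particular, the sequence spaces $\ell^p$ from Example 12.12. For the $\mathbb{C}$-versions we first need to know how to integrate complex functions.

20.1 Scholium (integral of complex functions) It is often necessary to consider complex-valued functions $u : X \to \mathbb{C}$ on a measurable space $(X, \mathcal{A})$. Since $\mathbb{C}$ is a normed space, we have a natural topology on $\mathbb{C}$ and we may consider the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{C})$ on $\mathbb{C}$. Since we can (even topologically) identify the complex plane $\mathbb{C}$ with $\mathbb{R}^2$, the Borel sets in $\mathbb{C}$ are generated by the half-open rectangles
\[ [[z, w)) = \{x + iy : \operatorname{Re} z \leq x < \operatorname{Re} w,\ \operatorname{Im} z \leq y < \operatorname{Im} w\}. \]
The correspondence $\mathbb{C} \leftrightarrow \mathbb{R}^2$ is accomplished by the map $\Phi : \mathbb{C} \to \mathbb{R}^2$, $z \mapsto \Phi(z) = (\operatorname{Re} z, \operatorname{Im} z) = \big(\frac12 (z + \bar z), \frac{1}{2i} (z - \bar z)\big)$ which is, along with its inverse $\Phi^{-1} : \mathbb{R}^2 \to \mathbb{C}$, $(x, y) \mapsto \Phi^{-1}(x, y) = x + iy$, continuous, hence measurable.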
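Consequently, we have
\[ f : X \to \mathbb{C} \text{ is } \mathcal{A}/\mathcal{B}(\mathbb{C}) \text{ measurable} \iff \operatorname{Re} f,\ \operatorname{Im} f : X \to \mathbb{R} \text{ are } \mathcal{A}/\mathcal{B}(\mathbb{R}) \text{ measurable.} \tag{20.1} \]
To see '⇒' note that the maps $\operatorname{Re} : z \mapsto \frac12 (z + \bar z)$ and $\operatorname{Im} : z \mapsto \frac{1}{2i} (z - \bar z)$ are continuous, hence measurable, and so are by Theorem 7.4 the compositions $\operatorname{Re} \circ f$ and $\operatorname{Im} \circ f$. Conversely, '⇐' follows (if we write $f = u + iv$) from the formula
\[ f^{-1}\big([[z, w))\big) = u^{-1}\big([\operatorname{Re} z, \operatorname{Re} w)\big) \cap v^{-1}\big([\operatorname{Im} z, \operatorname{Im} w)\big) \in \mathcal{A} \]
and the fact that the rectangles of the form $[[z, w))$ generate $\mathcal{B}(\mathbb{C})$. This means that we can define the integral of a $\mathbb{C}$-valued measurable function by linearity
\[ \int f\,d\mu = \int \operatorname{Re} f\,d\mu + i \int \operatorname{Im} f\,d\mu \tag{20.2} \]
and we call $f : X \to \mathbb{C}$ integrable and write $f \in \mathcal{L}^1_{\mathbb{C}}$ if $\operatorname{Re} f, \operatorname{Im} f : X \to \mathbb{R}$ are integrable in the usual sense. The following rules for $f \in \mathcal{L}^1_{\mathbb{C}}$ are readily checked:
\[ \operatorname{Re} \int f\,d\mu = \int \operatorname{Re} f\,d\mu, \qquad \operatorname{Im} \int f\,d\mu = \int \operatorname{Im} f\,d\mu, \qquad \overline{\int f\,d\mu} = \int \bar f\,d\mu, \tag{20.3} \]
\[ f \in \mathcal{L}^1_{\mathbb{C}} \iff f \in \mathcal{M}_{\mathbb{C}} \text{ and } |f| \in \mathcal{L}^1_{\mathbb{R}}. \tag{20.4} \]
In (20.4) the direction '⇒' follows since $|f| = \big((\operatorname{Re} f)^2 + (\operatorname{Im} f)^2\big)^{1/2}$ is measurable and $|f| \leq |\operatorname{Re} f| + |\operatorname{Im} f|$, while '⇐' is implied by $|\operatorname{Re} f|, |\operatorname{Im} f| \leq |f|$. The equivalence (20.4) can be used to show that $\mathcal{L}^1_{\mathbb{C}}$ is a $\mathbb{C}$-vector space: for $f, g \in \mathcal{L}^1_{\mathbb{C}}$ and $\alpha, \beta \in \mathbb{C}$ we have $\alpha f + \beta g \in \mathcal{L}^1_{\mathbb{C}}$, in which case
\[ \int (\alpha f + \beta g)\,d\mu = \alpha \int f\,d\mu + \beta \int g\,d\mu; \tag{20.5} \]
moreover, we have the following standard estimate:
\[ \Big| \int f\,d\mu \Big| \leq \int |f|\,d\mu. \tag{20.6} \]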
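Only (20.6) is not entirely straightforward. Since $\int f \, d\mu \in \mathbb{C}$, we can find some $\theta \in [0, 2\pi)$ such that
\[ \Bigl| \int f \, d\mu \Bigr| = e^{i\theta} \int f \, d\mu = \operatorname{Re} \Bigl( e^{i\theta} \int f \, d\mu \Bigr) \overset{(20.3),(20.5)}{=} \int \operatorname{Re} \bigl( e^{i\theta} f \bigr) \, d\mu \le \int \bigl| e^{i\theta} f \bigr| \, d\mu = \int |f| \, d\mu. \]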
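The definition (20.2) and the estimate (20.6) can be tried out numerically. The following is only a sketch under assumptions that are not part of the text: the measure is Lebesgue measure on $[0, 2]$, the integral is approximated by a midpoint Riemann sum, and the integrand `f` as well as the helper `integrate` are hypothetical choices made for illustration.

```python
import cmath

def integrate(f, a, b, n=2000):
    """Midpoint Riemann sum of a (real- or complex-valued) f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def f(x):
    """Hypothetical complex-valued test integrand."""
    return cmath.exp(3j * x) * (1 + x)

# (20.2): integrate real and imaginary parts separately and recombine.
by_parts = (integrate(lambda x: f(x).real, 0.0, 2.0)
            + 1j * integrate(lambda x: f(x).imag, 0.0, 2.0))
direct = integrate(f, 0.0, 2.0)

# (20.6): |int f| <= int |f|.
lhs = abs(direct)
rhs = integrate(lambda x: abs(f(x)), 0.0, 2.0)

# The rotation used in the proof: theta turns the integral onto the positive real axis.
theta = -cmath.phase(direct)
rotated = cmath.exp(1j * theta) * direct
```

Here `rotated` is (up to rounding) the non-negative real number `lhs`, which is exactly the first step of the argument for (20.6).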
The spaces $\mathcal{L}^p_{\mathbb{C}}(\mu)$, $1 < p \le \infty$, are now defined by
\[ \mathcal{L}^p_{\mathbb{C}}(\mu) = \bigl\{ f : X \to \mathbb{C} \text{ measurable} : |f| \in \mathcal{L}^p_{\mathbb{R}}(\mu) \bigr\} \tag{20.7} \]
and it is obvious that all assertions from Chapter 12 remain valid. In particular, $L^p_{\mathbb{C}}$ stands for the set of all equivalence classes of $\mathcal{L}^p_{\mathbb{C}}$-functions if we identify functions which coincide outside some $\mu$-null set. Note also that most of our results on $\mathbb{R}$-valued integrands carry over to $\mathbb{C}$-valued functions by considering real and imaginary parts separately.

As we have seen in Chapter 12, cf. Remark 12.5, the spaces $\mathcal{L}^p_{\mathbb{K}}$, resp. $L^p_{\mathbb{K}}$ are semi-normed, resp. normed vector spaces. The same and more is true for $\mathbb{R}^n$ and $\mathbb{C}^n$: here we can even define a product of two vectors which, however, results in a scalar. It is this notion which we want to study in greater detail.
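20.2 Definition A $\mathbb{K}$-vector space $V$ is an inner product space if it supports a scalar or inner product, i.e. a map $\langle \bullet, \bullet \rangle : V \times V \to \mathbb{K}$ with the following properties: for all $u, v, w \in V$ and $\alpha, \beta \in \mathbb{K}$

(SP1) definiteness: $\langle v, v \rangle > 0 \iff v \ne 0$;
(SP2) skew-symmetry: $\langle v, w \rangle = \overline{\langle w, v \rangle}$;
(SP3) linearity: $\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle$.

If $\mathbb{K} = \mathbb{R}$, (SP2) becomes symmetry and (SP2), (SP3) together show that both $v \mapsto \langle v, w \rangle$ and $w \mapsto \langle v, w \rangle$ are $\mathbb{R}$-linear; therefore we call $(v, w) \mapsto \langle v, w \rangle$ bilinear. If $\mathbb{K} = \mathbb{C}$, (SP2), (SP3) give
\[ \langle u, \alpha v + \beta w \rangle \overset{\text{SP2}}{=} \overline{\langle \alpha v + \beta w, u \rangle} \overset{\text{SP3}}{=} \overline{\alpha \langle v, u \rangle + \beta \langle w, u \rangle} = \bar\alpha\, \overline{\langle v, u \rangle} + \bar\beta\, \overline{\langle w, u \rangle} \overset{\text{SP2}}{=} \bar\alpha \langle u, v \rangle + \bar\beta \langle u, w \rangle, \]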
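i.e. $w \mapsto \langle v, w \rangle$ is skew-linear. We call $\langle \bullet, \bullet \rangle$ in this case a sesqui-linear form. Since $\mathbb{K} = \mathbb{C}$ always includes $\mathbb{K} = \mathbb{R}$, we will restrict ourselves to $\mathbb{K} = \mathbb{C}$.

20.3 Lemma (Cauchy–Schwarz inequality) Let $(V, \langle \bullet, \bullet \rangle)$ be an inner product space. Then
\[ |\langle v, w \rangle|^2 \le \langle v, v \rangle \, \langle w, w \rangle \qquad \forall\, v, w \in V. \tag{20.8} \]
Equality holds if, and only if, $v = \alpha w$ for some $\alpha \in \mathbb{C}$.

Proof If $v = 0$ or $w = 0$, there is nothing to show; the same is true if $\langle w, v \rangle = 0$, since then the left-hand side of (20.8) vanishes. For all other $v, w \in V$ and $\alpha \in \mathbb{C}$ we have
\[ 0 \le \langle v - \alpha w, v - \alpha w \rangle = \langle v, v \rangle - \alpha \langle w, v \rangle - \bar\alpha \langle v, w \rangle + \alpha \bar\alpha \langle w, w \rangle = \langle v, v \rangle - 2 \operatorname{Re} \bigl( \alpha \langle w, v \rangle \bigr) + |\alpha|^2 \langle w, w \rangle \]
where we used that $z + \bar z = 2 \operatorname{Re} z$. If we set $\alpha = \langle v, v \rangle / \langle w, v \rangle$, we get
\[ 0 \le \langle v, v \rangle - 2 \operatorname{Re} \langle v, v \rangle + \frac{\langle v, v \rangle^2}{|\langle w, v \rangle|^2} \, \langle w, w \rangle = - \langle v, v \rangle + \frac{\langle v, v \rangle^2}{|\langle w, v \rangle|^2} \, \langle w, w \rangle \]
which implies (20.8). Since $\langle v - \alpha w, v - \alpha w \rangle = 0$ only if $v = \alpha w$, this is necessary for equality in (20.8), too. If, indeed, $v = \alpha w$, we see
\[ |\langle v, w \rangle|^2 = |\langle \alpha w, w \rangle|^2 = \alpha \langle w, w \rangle \, \bar\alpha \langle w, w \rangle = \langle \alpha w, \alpha w \rangle \langle w, w \rangle = \langle v, v \rangle \langle w, w \rangle \]
showing that $v = \alpha w$ is also sufficient for equality in (20.8).

Lemma 20.3 is an abstract version of the Cauchy–Schwarz inequality for integrals, Corollary 12.3. Just as in Chapter 12 we will use it to show that in an inner product space $(V, \langle \bullet, \bullet \rangle)$
\[ \| v \| = \sqrt{\langle v, v \rangle}, \qquad v \in V \tag{20.9} \]
defines a norm, i.e. a map $\| \bullet \| : V \to [0, \infty)$ satisfying for all $v, w \in V$ and $\alpha \in \mathbb{C}$

(N1) definiteness: $\| v \| > 0 \iff v \ne 0$;
(N2) positive homogeneity: $\| \alpha v \| = |\alpha| \cdot \| v \|$;
(N3) triangle inequality: $\| v + w \| \le \| v \| + \| w \|$.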
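The Cauchy–Schwarz inequality (20.8), its equality case and the norm properties (N2), (N3) can be checked on concrete vectors. The sketch below assumes the standard inner product on $\mathbb{C}^3$ (linear in the first, conjugate-linear in the second argument, as in (SP3)); the helper names `inner` and `norm` and the sample vectors are hypothetical.

```python
import math

def inner(z, w):
    """Standard inner product on C^n: linear in z, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(z, w))

def norm(z):
    """Norm induced by the inner product, cf. (20.9)."""
    return math.sqrt(inner(z, z).real)

v = [1 + 2j, -1j, 3 + 0j]
w = [2 - 1j, 0.5 + 0j, 1j]

# Gap in (20.8): <v,v><w,w> - |<v,w>|^2 is non-negative ...
gap = inner(v, v).real * inner(w, w).real - abs(inner(v, w)) ** 2

# ... and vanishes if the first vector is a multiple alpha * w of the second.
alpha = 2 - 3j
aw = [alpha * b for b in w]
gap_parallel = inner(aw, aw).real * inner(w, w).real - abs(inner(aw, w)) ** 2

v_plus_w = [a + b for a, b in zip(v, w)]
```

For these vectors `gap` is strictly positive, since `v` is not a multiple of `w`, while `gap_parallel` is zero up to rounding.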
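20.4 Lemma $(V, \langle \bullet, \bullet \rangle^{1/2})$ is a normed space.

Proof Because of (SP1) the map $\| \bullet \| : V \to [0, \infty)$, $\| v \| = \sqrt{\langle v, v \rangle}$, is well-defined. All we have to do is to check the properties (N1)–(N3). Obviously (SP1) ⇔ (N1); (N2) follows from (SP2), (SP3):
\[ \| \alpha v \|^2 = \langle \alpha v, \alpha v \rangle = \alpha \bar\alpha \langle v, v \rangle = |\alpha|^2 \cdot \| v \|^2 \]
and the triangle inequality (N3) is a consequence of the Cauchy–Schwarz inequality (20.8):
\[ \| v + w \|^2 = \langle v + w, v + w \rangle = \langle v, v \rangle + \langle v, w \rangle + \langle w, v \rangle + \langle w, w \rangle = \| v \|^2 + 2 \operatorname{Re} \langle v, w \rangle + \| w \|^2 \]
\[ \le \| v \|^2 + 2 |\langle v, w \rangle| + \| w \|^2 \overset{(20.8)}{\le} \| v \|^2 + 2 \| v \| \cdot \| w \| + \| w \|^2 = \bigl( \| v \| + \| w \| \bigr)^2. \]

20.5 Examples (i) The typical finite-dimensional inner product spaces are

the $\mathbb{R}$-vector space $\mathbb{R}^n$ with
\[ \langle x, y \rangle = \sum_{j=1}^n x_j y_j, \qquad \| x \| = \Bigl( \sum_{j=1}^n x_j^2 \Bigr)^{1/2}, \]
and the $\mathbb{C}$-vector space $\mathbb{C}^n$ with
\[ \langle z, w \rangle = \sum_{j=1}^n z_j \bar w_j, \qquad \| z \| = \Bigl( \sum_{j=1}^n |z_j|^2 \Bigr)^{1/2}. \]

(ii) The typical separable¹ infinite-dimensional inner product spaces are

the $\mathbb{R}$-vector space $\ell^2_{\mathbb{R}}(\mathbb{N})$ with $x = (x_j)_{j \in \mathbb{N}}$, $y = (y_j)_{j \in \mathbb{N}}$ and
\[ \langle x, y \rangle = \langle x, y \rangle_2 = \sum_{j=1}^\infty x_j y_j, \qquad \| x \| = \| x \|_2 = \Bigl( \sum_{j=1}^\infty x_j^2 \Bigr)^{1/2}, \]
and the $\mathbb{C}$-vector space $\ell^2_{\mathbb{C}}(\mathbb{N})$ with $z = (z_j)_{j \in \mathbb{N}}$, $w = (w_j)_{j \in \mathbb{N}}$ and
\[ \langle z, w \rangle = \langle z, w \rangle_2 = \sum_{j=1}^\infty z_j \bar w_j, \qquad \| z \| = \| z \|_2 = \Bigl( \sum_{j=1}^\infty |z_j|^2 \Bigr)^{1/2}. \]

¹ Separable means that the space contains a countable dense subset, see Definition 21.14 below.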
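(iii) Let $(X, \mathcal{A}, \mu)$ be a measure space. The typical general (finite and infinite-dimensional) inner product spaces are

the $\mathbb{R}$-vector space $L^2_{\mathbb{R}}(\mu)$ with
\[ \langle u, v \rangle = \langle u, v \rangle_2 = \int u \, v \, d\mu, \qquad \| u \| = \| u \|_2 = \Bigl( \int u^2 \, d\mu \Bigr)^{1/2}, \]
and the $\mathbb{C}$-vector space $L^2_{\mathbb{C}}(\mu)$ with
\[ \langle f, g \rangle = \langle f, g \rangle_2 = \int f \, \bar g \, d\mu, \qquad \| f \| = \| f \|_2 = \Bigl( \int |f|^2 \, d\mu \Bigr)^{1/2}. \]

Every inner product space becomes a normed space with norm given by (20.9), but not every normed space is necessarily an inner product space. In fact, $L^p$ or $\ell^p$ are for all $1 \le p \le \infty$ normed spaces, but only for $p = 2$ inner product spaces. The reason for this is that in $L^p$, $p \ne 2$, the parallelogram law does not hold.

20.6 Lemma (Parallelogram identity) Let $(V, \langle \bullet, \bullet \rangle)$ be an inner product space. Then
\[ \Bigl\| \frac{v + w}{2} \Bigr\|^2 + \Bigl\| \frac{v - w}{2} \Bigr\|^2 = \frac12 \bigl( \| v \|^2 + \| w \|^2 \bigr) \qquad \forall\, v, w \in V. \tag{20.10} \]
Proof Obvious.

Geometrically $v + w$ and $v - w$ are the diagonals of the parallelogram spanned by $v$ and $w$. The proof of (20.10) in $\mathbb{R}^n$ would show the cosine law for the angle $\angle(x, y)$ between the vectors $x, y \in \mathbb{R}^n$:
\[ \langle x, y \rangle = \cos \angle(x, y) \, \| x \| \cdot \| y \|. \tag{20.11} \]
In fact, inner products induce a natural geometry on $V$ which resembles in many aspects the Euclidean geometry on $\mathbb{R}^n$ and $\mathbb{C}^n$.

20.7 Definition Let $(V, \langle \bullet, \bullet \rangle)$ be an inner product space. We call $v, w \in V$ orthogonal and write $v \perp w$ if $\langle v, w \rangle = 0$.

20.8 Remark (i) If $\| \bullet \|$ derives from a scalar product, we can recover $\langle \bullet, \bullet \rangle$ from $\| \bullet \|$ with the help of the so-called polarization identities: if $\mathbb{K} = \mathbb{R}$,
\[ \langle v, w \rangle = \tfrac14 \bigl( \| v + w \|^2 - \| v - w \|^2 \bigr) = \tfrac12 \bigl( \| v + w \|^2 - \| v \|^2 - \| w \|^2 \bigr) \tag{20.12} \]
and if $\mathbb{K} = \mathbb{C}$ (with the inner product linear in the first argument, as in (SP3)),
\[ \langle v, w \rangle = \tfrac14 \bigl( \| v + w \|^2 - \| v - w \|^2 + i \| v + i w \|^2 - i \| v - i w \|^2 \bigr). \tag{20.13} \]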
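As a plausibility check, the complex polarization identity can be evaluated on sample vectors, and the failure of the parallelogram identity for $p \ne 2$ can be seen on the unit vectors of $\mathbb{R}^2$. The sketch assumes the standard inner product on $\mathbb{C}^n$, linear in the first argument as in (SP3), so that $\langle v, w \rangle$ is recovered from $\tfrac14 ( \| v + w \|^2 - \| v - w \|^2 + i \| v + i w \|^2 - i \| v - i w \|^2 )$. All function names are hypothetical.

```python
def inner(z, w):
    """Standard inner product on C^n: linear in z, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(z, w))

def norm_sq(z):
    return inner(z, z).real

def add(z, w, c=1):
    """Return z + c*w componentwise."""
    return [a + c * b for a, b in zip(z, w)]

def polarize(v, w):
    """Recover <v, w> from squared norms only (complex polarization identity)."""
    return 0.25 * (norm_sq(add(v, w)) - norm_sq(add(v, w, -1))
                   + 1j * norm_sq(add(v, w, 1j)) - 1j * norm_sq(add(v, w, -1j)))

def p_norm(x, p):
    return sum(abs(a) ** p for a in x) ** (1 / p)

def parallelogram_defect(v, w, p):
    """Left-hand side minus right-hand side of (20.10) for the p-norm."""
    half_sum = [(a + b) / 2 for a, b in zip(v, w)]
    half_diff = [(a - b) / 2 for a, b in zip(v, w)]
    return (p_norm(half_sum, p) ** 2 + p_norm(half_diff, p) ** 2
            - 0.5 * (p_norm(v, p) ** 2 + p_norm(w, p) ** 2))

v = [1 + 2j, -1j, 3 + 0j]
w = [2 - 1j, 0.5 + 0j, 1j]
defect_p2 = parallelogram_defect([1.0, 0.0], [0.0, 1.0], 2)   # Euclidean norm
defect_p1 = parallelogram_defect([1.0, 0.0], [0.0, 1.0], 1)   # 1-norm
```

The defect vanishes for $p = 2$ but equals $1$ for $p = 1$, in line with the remark that only $p = 2$ gives an inner product space.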
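(ii) One can show that a norm $\| \bullet \|$ derives from a scalar product if, and only if, $\| \bullet \|$ satisfies the parallelogram identity (20.10). For a proof we refer to Yosida [55, p. 39], see also Problem 20.2.

(iii) Let $V = V_{\mathbb{R}}$ be an $\mathbb{R}$-inner product space with scalar product $\langle \bullet, \bullet \rangle_{\mathbb{R}}$. Then we can turn $V$ into a $\mathbb{C}$-inner product space using the following complexification procedure: $V_{\mathbb{C}} = V \oplus iV = \{ v + iw : v, w \in V \}$ with the following

addition: $(v + iw) + (v' + iw') = (v + v') + i (w + w')$, $\quad v, v', w, w' \in V$;
scalar multiplication: $(\alpha + i\beta)(v + iw) = (\alpha v - \beta w) + i (\beta v + \alpha w)$, $\quad \alpha, \beta \in \mathbb{R}$, $v, w \in V$;
inner product: $\langle v + iw, v' + iw' \rangle_{\mathbb{C}} = \langle v, v' \rangle_{\mathbb{R}} + i \langle w, v' \rangle_{\mathbb{R}} - i \langle v, w' \rangle_{\mathbb{R}} + \langle w, w' \rangle_{\mathbb{R}}$, $\quad v, v', w, w' \in V$;

and norm $\| \cdot \|_{\mathbb{C}} = \langle \bullet, \bullet \rangle_{\mathbb{C}}^{1/2}$.

Problems

20.1. Show that the examples given in 20.5 are indeed inner product spaces.

20.2. This exercise shows the following
Theorem (Fréchet–von Neumann–Jordan). A norm $\| \bullet \|$ on the $\mathbb{R}$-vector space $V$ derives from an inner product $\langle \bullet, \bullet \rangle$ if, and only if, the parallelogram identity (20.10) holds.
(i) Necessity: prove Lemma 20.6.
Assume from now on that $\| \bullet \|$ is a norm satisfying (20.10) and set $\langle v, w \rangle = \tfrac14 \bigl( \| v + w \|^2 - \| v - w \|^2 \bigr)$.
(ii) Show that $\langle v, w \rangle$ satisfies the properties (SP1) and (SP2) of Definition 20.2.
(iii) Prove that $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$.
(iv) Use (iii) to prove that $\langle q v, w \rangle = q \langle v, w \rangle$ for all dyadic numbers $q = j 2^{-k}$, $j \in \mathbb{Z}$, $k \in \mathbb{N}_0$, and conclude that (SP3) holds for dyadic $\alpha, \beta$.
(v) Prove that the maps $t \mapsto \| t v + w \|$ and $t \mapsto \| t v - w \|$ ($t \in \mathbb{R}$, $v, w \in V$) are continuous and conclude that $t \mapsto \langle t v, w \rangle$ is continuous. Use this and (iv) to show that (SP3) holds for all $\alpha, \beta \in \mathbb{R}$.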
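20.3. (Continuation of Problem 20.2) Assume now that $W$ is a $\mathbb{C}$-vector space with norm $\| \bullet \|$ satisfying the parallelogram identity (20.10) and let $\langle v, w \rangle_{\mathbb{R}} = \tfrac14 \bigl( \| v + w \|^2 - \| v - w \|^2 \bigr)$. Then $\langle v, w \rangle_{\mathbb{C}} = \langle v, w \rangle_{\mathbb{R}} + i \langle v, i w \rangle_{\mathbb{R}}$ is a complex-valued inner product.

20.4. Does the norm $\| \bullet \|_1$ on $L^1\bigl( [0,1], \mathcal{B}[0,1], \lambda^1|_{[0,1]} \bigr)$ derive from an inner product?

20.5. Let $(V, \langle \bullet, \bullet \rangle)$ be a $\mathbb{C}$-inner product space, $n \in \mathbb{N}$ and set $\theta = e^{2\pi i / n}$.
(i) Show that
\[ \frac1n \sum_{j=1}^n \theta^{jk} = \begin{cases} 1 & \text{if } k = 0, \\ 0 & \text{if } 1 \le k \le n - 1. \end{cases} \]
(ii) Use (i) to prove for all $n \ge 3$ the following generalization of (20.12) and (20.13):
\[ \langle v, w \rangle = \frac1n \sum_{j=1}^n \theta^j \, \| v + \theta^j w \|^2. \]
(iii) Prove the following continuous version of (ii):
\[ \langle v, w \rangle = \frac{1}{2\pi} \int_{(-\pi, \pi]} e^{i\theta} \, \| v + e^{i\theta} w \|^2 \, d\theta. \]

20.6. Let $V$ be an inner product space. Show that $v \perp w$ if, and only if, Pythagoras’ theorem $\| v + w \|^2 = \| v \|^2 + \| w \|^2$ holds.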
21 Hilbert space
Let $(V, \langle \bullet, \bullet \rangle)$ be an inner product space. As we have seen in Chapter 20, $(V, \| \bullet \| = \langle \bullet, \bullet \rangle^{1/2})$ is a normed space and the norm resembles in many aspects the Euclidean, resp. unitary norm in $\mathbb{R}^n$ and $\mathbb{C}^n$. In particular, we have a notion of convergence: a sequence $(v_j)_{j \in \mathbb{N}} \subset V$ converges to an element $v \in V$ if $(\| v - v_j \|)_{j \in \mathbb{N}}$ converges to $0$ in $\mathbb{R}_+$,
\[ \lim_{j \to \infty} v_j = v \iff \lim_{j \to \infty} \| v - v_j \| = 0. \]
But it is completeness and the study of Cauchy sequences in $V$,
\[ (v_j)_{j \in \mathbb{N}} \subset V \text{ Cauchy sequence} \iff \lim_{j, k \to \infty} \| v_j - v_k \| = 0, \]
that gets analysis really going. This leads to the very natural

21.1 Definition A Hilbert space is a complete inner product space, i.e. an inner product space where every Cauchy sequence converges. We will usually write $\mathfrak{H}$ for a Hilbert space.

21.2 Example The spaces $\mathbb{R}^n$, $\mathbb{C}^n$, $\ell^2_{\mathbb{K}}(\mathbb{N})$ and $L^2_{\mathbb{K}}(\mu)$ over any measure space $(X, \mathcal{A}, \mu)$ are Hilbert spaces and, indeed, the ‘typical’ ones. This follows from Example 20.5 and the Riesz–Fischer theorem 12.7.

Since every Hilbert space is an inner product space, we have the notion of orthogonality of $g, h \in \mathfrak{H}$, see Definition 20.7:
\[ g \perp h \iff \langle g, h \rangle = 0. \]
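21.3 Definition Let $\mathfrak{H}$ be a Hilbert space. The orthogonal complement $M^\perp$ of a subset $M \subset \mathfrak{H}$ is by definition
\[ M^\perp = \{ h \in \mathfrak{H} : h \perp m \ \ \forall\, m \in M \} = \{ h \in \mathfrak{H} : \langle h, m \rangle = 0 \ \ \forall\, m \in M \}. \tag{21.1} \]

21.4 Lemma Let $\mathfrak{H}$ be a Hilbert space and $M \subset \mathfrak{H}$ be any subset. The orthogonal complement $M^\perp$ is a closed linear subspace of $\mathfrak{H}$ and $M \subset (M^\perp)^\perp$.

Proof If $g, h \in M^\perp$ we find for all $\alpha, \beta \in \mathbb{C}$ that
\[ \langle \alpha g + \beta h, m \rangle = \alpha \langle g, m \rangle + \beta \langle h, m \rangle = 0 \qquad \forall\, m \in M, \]
i.e. $\alpha g + \beta h \in M^\perp$ and $M^\perp$ is a linear subspace of $\mathfrak{H}$. To see the closedness we take a sequence $(h_k)_{k \in \mathbb{N}} \subset M^\perp$ such that $\lim_{k \to \infty} h_k = h$. Then, for all $m \in M$,
\[ |\langle h, m \rangle| = |\langle h, m \rangle - \underbrace{\langle h_k, m \rangle}_{= 0}| = |\langle h - h_k, m \rangle| \overset{20.3}{\le} \| h - h_k \| \cdot \| m \| \xrightarrow[k \to \infty]{} 0; \]
this shows that $M^\perp$ is closed since $h \in M^\perp$. Finally, if $m \in M$ we get
\[ 0 = \langle h, m \rangle = \overline{\langle m, h \rangle} \quad \forall\, h \in M^\perp \implies m \in (M^\perp)^\perp. \]

The next theorem is central for the study of (the geometry of) Hilbert spaces. Recall that a set $C \subset \mathfrak{H}$ is convex if
\[ u, w \in C \implies t u + (1 - t) w \in C \qquad \forall\, t \in (0, 1). \]

21.5 Theorem (Projection theorem) Let $C \ne \emptyset$ be a closed convex subset of the Hilbert space $\mathfrak{H}$. For every $h \in \mathfrak{H}$ there is a unique minimizer $u \in C$ such that
\[ \| h - u \| = \inf_{w \in C} \| h - w \| = d(h, C). \tag{21.2} \]
This element $u = P_C h$ is called (orthogonal) projection of $h$ onto $C$ and is equally characterized by the property
\[ P_C h \in C \quad \text{and} \quad \operatorname{Re} \langle h - P_C h, w - P_C h \rangle \le 0 \quad \forall\, w \in C. \tag{21.3} \]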
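To see the characterization (21.3) at work, one can take a set where the projection is known in closed form. The sketch below assumes $\mathfrak{H} = \mathbb{R}^3$ with the Euclidean inner product and the closed convex box $C = [0,1]^3$, for which the projection is componentwise clipping; the function names and the random sample of points of $C$ are hypothetical choices for illustration, and a finite sample can of course only test, not prove, the inequalities.

```python
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def dist(x, y):
    d = sub(x, y)
    return dot(d, d) ** 0.5

def project_box(h, lo=0.0, hi=1.0):
    """Projection onto the closed convex box [lo, hi]^n (componentwise clipping)."""
    return [min(max(a, lo), hi) for a in h]

random.seed(0)
h = [1.7, -0.4, 0.3]
u = project_box(h)
samples = [[random.random() for _ in h] for _ in range(1000)]   # points w in C

# (21.3): <h - u, w - u> <= 0 for every sampled w in C.
worst = max(dot(sub(h, u), sub(w, u)) for w in samples)

# (21.2): no sampled point of C is closer to h than u.
closest_sample = min(dist(h, w) for w in samples)

# Contraction property of the projection, tested on a second point g.
g = [-0.5, 0.2, 2.5]
contraction_ok = dist(project_box(g), project_box(h)) <= dist(g, h)
```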
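Proof Existence: Let $d = \inf_{w \in C} \| h - w \|$. By the very definition of the infimum, there is a sequence $(w_k)_{k \in \mathbb{N}} \subset C$ such that
\[ \lim_{k \to \infty} \| h - w_k \| = d. \]
If we can show that $(w_k)_{k \in \mathbb{N}}$ is a Cauchy sequence, we know that the limit $u = \lim_{k \to \infty} w_k$ exists because of the completeness of $\mathfrak{H}$ and is in $C$ since $C$ is closed. Applying the parallelogram law (20.10) with $v = h - w_k$ and $w = h - w_\ell$ gives
\[ \Bigl\| h - \frac{w_k + w_\ell}{2} \Bigr\|^2 + \Bigl\| \frac{w_k - w_\ell}{2} \Bigr\|^2 = \frac12 \bigl( \| h - w_k \|^2 + \| h - w_\ell \|^2 \bigr). \]
Since $C$ is convex, $\tfrac12 w_k + \tfrac12 w_\ell \in C$, thus $d \le \| h - \tfrac12 (w_k + w_\ell) \|$ and
\[ d^2 + \tfrac14 \| w_k - w_\ell \|^2 \le \tfrac12 \bigl( \| h - w_k \|^2 + \| h - w_\ell \|^2 \bigr) \xrightarrow[k, \ell \to \infty]{} d^2. \]
This proves that $(w_k)_{k \in \mathbb{N}}$ is a Cauchy sequence.

Uniqueness: Assume that $u, \tilde u \in C$ satisfy both (21.2), i.e. $\| u - h \| = d = \| \tilde u - h \|$. Since by convexity $\tfrac12 u + \tfrac12 \tilde u \in C$, the parallelogram law (20.10) gives
\[ d^2 + \bigl\| \tfrac12 (u - \tilde u) \bigr\|^2 \le \bigl\| h - \bigl( \tfrac12 u + \tfrac12 \tilde u \bigr) \bigr\|^2 + \bigl\| \tfrac12 (u - \tilde u) \bigr\|^2 = \tfrac12 \bigl( \| h - u \|^2 + \| h - \tilde u \|^2 \bigr) = d^2 \]
and we conclude that $\| u - \tilde u \|^2 = 0$ or $u = \tilde u$.

Equivalence of (21.2), (21.3): Assume that $u \in C$ satisfies (21.2) and let $w \in C$. By convexity, $(1 - t) u + t w \in C$ for all $t \in (0, 1)$ and by (21.2)
\[ \| h - u \|^2 \le \| h - (1 - t) u - t w \|^2 = \| (h - u) - t (w - u) \|^2 = \| h - u \|^2 - 2 t \operatorname{Re} \langle h - u, w - u \rangle + t^2 \| w - u \|^2. \]
Hence, $2 \operatorname{Re} \langle h - u, w - u \rangle \le t \| w - u \|^2$ and (21.3) follows as $t \to 0$. Conversely, if (21.3) holds, we have for $u = P_C h \in C$
\[ \| h - u \|^2 - \| h - w \|^2 = 2 \operatorname{Re} \langle h - u, w - u \rangle - \| u - w \|^2 \le 0 \qquad \forall\, w \in C \]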
which implies (21.2).

We will now study the properties of the projection operator $P_C$. If $V, W \subset \mathfrak{H}$ are two subspaces with $V \cap W = \{ 0 \}$, we call $V + W = \{ v + w : v \in V, w \in W \}$ the direct sum and write $V \oplus W$.

21.6 Corollary (i) Let $\emptyset \ne C \subset \mathfrak{H}$ be a closed convex subset. The projection $P_C : \mathfrak{H} \to C$ is a contraction, i.e.
\[ \| P_C g - P_C h \| \le \| g - h \| \qquad \forall\, g, h \in \mathfrak{H}. \tag{21.4} \]
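(ii) If $\emptyset \ne C = F$ is a closed linear subspace of $\mathfrak{H}$, $P_F$ is a linear operator and $f = P_F h$ is the unique element with
\[ f \in F \quad \text{and} \quad h - f \in F^\perp. \tag{21.5} \]
In particular, $\mathfrak{H} = F \oplus F^\perp$.

(iii) If $F$ is not closed, then $\mathfrak{H} = \bar F \oplus F^\perp$ or, equivalently, $\bar F = (F^\perp)^\perp$.

Proof (i) follows from the inequality
\[ \| P_C g - P_C h \|^2 = \operatorname{Re} \bigl( \langle P_C g, P_C g - P_C h \rangle - \langle P_C h, P_C g - P_C h \rangle \bigr) \]
\[ = \operatorname{Re} \bigl( \langle P_C g - g, P_C g - P_C h \rangle + \langle P_C h - h, P_C h - P_C g \rangle + \langle g - h, P_C g - P_C h \rangle \bigr) \]
\[ \overset{(21.3)}{\le} \operatorname{Re} \langle g - h, P_C g - P_C h \rangle \le \| g - h \| \cdot \| P_C g - P_C h \| \]
where we used the Cauchy–Schwarz inequality, Lemma 20.3, for the last estimate.

(ii) Since $F$ is a linear subspace, $v \in F \implies \alpha v \in F$ for all $\alpha \in \mathbb{C}$ and (21.3) reads in this case
\[ \operatorname{Re} \langle h - P_F h, \alpha v - P_F h \rangle \le 0 \qquad \forall\, \alpha \in \mathbb{C},\ v \in F \]
or, equivalently,
\[ \operatorname{Re} \bigl( \bar\alpha \langle h - P_F h, v \rangle \bigr) \le \operatorname{Re} \langle h - P_F h, P_F h \rangle \qquad \forall\, \alpha \in \mathbb{C},\ v \in F \]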
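which is only possible if $\langle h - P_F h, v \rangle = 0$ for all $v \in F$ and, for $v = P_F h$ in particular, $\langle h - P_F h, P_F h \rangle = 0$; this shows (21.5). If, on the other hand, (21.5) is true, we get for all $v \in F$
\[ 0 = \operatorname{Re} \langle h - f, v \rangle - \operatorname{Re} \langle h - f, f \rangle = \operatorname{Re} \langle h - f, v - f \rangle \]
and $f = P_F h$ follows by the uniqueness of the projection. The decomposition $\mathfrak{H} = F \oplus F^\perp$ follows immediately as $h = P_F h + (h - P_F h)$ and $h \in F \cap F^\perp \iff \langle h, h \rangle = 0 \iff h = 0$. The decomposition also proves the linearity of $P_F$ since for all $g, h \in \mathfrak{H}$, $\alpha, \beta \in \mathbb{C}$ and every $v \in F$
\[ \bigl\langle \underbrace{\alpha (g - P_F g) + \beta (h - P_F h)}_{\in F^\perp},\ v \bigr\rangle = 0 \quad \text{as well as} \quad \bigl\langle \underbrace{(\alpha g + \beta h) - P_F (\alpha g + \beta h)}_{\in F^\perp},\ v \bigr\rangle = 0; \]
since $\alpha P_F g + \beta P_F h \in F$, this implies, again by (21.5) and the uniqueness of the projection, that $P_F (\alpha g + \beta h) = \alpha P_F g + \beta P_F h$.

(iii) We know from Lemma 21.4 that $F \subset (F^\perp)^\perp$ and that $(F^\perp)^\perp$ is closed; therefore, $\bar F \subset (F^\perp)^\perp$. Moreover, $F \subset \bar F$ implies $\bar F^\perp \subset F^\perp$, showing that
\[ \mathfrak{H} \overset{21.6(ii)}{=} \bar F \oplus \bar F^\perp \subset \bar F + F^\perp \subset (F^\perp)^\perp \oplus F^\perp \overset{21.6(ii)}{=} \mathfrak{H} \]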
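and $\mathfrak{H} = \bar F \oplus F^\perp$ or $\bar F = (F^\perp)^\perp$ follows.

21.7 Remarks (i) It is easy to show that the projection $P_F$ onto a subspace $F \subset \mathfrak{H}$ is symmetric, i.e. that
\[ \langle P_F g, h \rangle = \langle g, P_F h \rangle \qquad \forall\, g, h \in \mathfrak{H} \tag{21.6} \]
and that $P_F^2 = P_F$, i.e.
\[ \langle P_F^2 g, h \rangle = \langle P_F g, P_F h \rangle = \langle P_F g, h \rangle \qquad \forall\, g, h \in \mathfrak{H}. \tag{21.7} \]
In fact, (21.7) implies (21.6). Since $P_F g \in F$, $P_F (P_F g) = P_F g$ by the uniqueness of the projection and $\langle P_F^2 g, h \rangle = \langle P_F g, h \rangle$ follows. Finally,
\[ \langle P_F g, h \rangle = \langle P_F g, P_F h \rangle + \underbrace{\langle P_F g, h - P_F h \rangle}_{= 0} = \langle P_F g, P_F h \rangle. \]

(ii) Pythagoras’ theorem has a particularly nice form for projections:
\[ \| h \|^2 = \| P_F h \|^2 + \| h - P_F h \|^2 \qquad \forall\, h \in \mathfrak{H}. \tag{21.8} \]

(iii) A very useful interpretation of Corollary 21.6(iii) is the following: a linear subspace $F \subset \mathfrak{H}$ is dense in $\mathfrak{H}$ if, and only if, $F^\perp = \{ 0 \}$. In other words,
\[ F \subset \mathfrak{H} \text{ is dense} \iff \bigl( \langle f, h \rangle = 0 \ \ \forall\, f \in F \ \text{ entails } \ h = 0 \bigr). \]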
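The properties of $P_F$ collected in Remark 21.7 are easy to test in the simplest non-trivial case, a one-dimensional subspace. The sketch assumes $\mathfrak{H} = \mathbb{C}^3$ with the standard inner product and $F = \operatorname{span}\{ f \}$, for which $P_F h = \frac{\langle h, f \rangle}{\langle f, f \rangle} f$; the names `inner` and `project_line` and the sample vectors are hypothetical.

```python
def inner(z, w):
    """Standard inner product on C^n: linear in z, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(z, w))

def project_line(h, f):
    """Orthogonal projection of h onto F = span{f}, assuming f != 0."""
    c = inner(h, f) / inner(f, f)
    return [c * a for a in f]

f = [1 + 1j, 2 + 0j, -1j]
g = [0.5 + 0j, 1j, 2 - 1j]
h = [3 + 0j, -1 + 2j, 1 + 1j]

Pg, Ph = project_line(g, f), project_line(h, f)
residual = [a - b for a, b in zip(h, Ph)]           # h - P_F h, should lie in F-perp

orthogonality = abs(inner(residual, f))             # (21.5)
symmetry = abs(inner(Pg, h) - inner(g, Ph))         # (21.6)
idempotence = max(abs(a - b) for a, b in zip(project_line(Ph, f), Ph))   # P_F^2 = P_F
pythagoras = abs(inner(h, h).real
                 - inner(Ph, Ph).real - inner(residual, residual).real)  # (21.8)
```

All four quantities vanish up to rounding errors.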
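Let us briefly discuss two important consequences of the projection theorem 21.5: F. Riesz’ representation theorem on the structure of continuous linear functionals on $\mathfrak{H}$ and the problem of finding a basis in $\mathfrak{H}$.

21.8 Definition A continuous linear functional on $\mathfrak{H}$ is a map $\Lambda : \mathfrak{H} \to \mathbb{C}$, $h \mapsto \Lambda(h)$, which is linear,
\[ \Lambda(\alpha g + \beta h) = \alpha \Lambda(g) + \beta \Lambda(h) \qquad \forall\, \alpha, \beta \in \mathbb{C},\ \forall\, g, h \in \mathfrak{H}, \]
and satisfies
\[ |\Lambda(g) - \Lambda(h)| \le c(\Lambda) \, \| g - h \| \qquad \forall\, g, h \in \mathfrak{H} \]
with a constant $c(\Lambda) \ge 0$ independent of $g, h \in \mathfrak{H}$.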
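It is easy to find examples of continuous linear functionals on $\mathfrak{H}$. Just fix some $g \in \mathfrak{H}$ and set
\[ \Lambda_g(h) = \langle h, g \rangle, \qquad h \in \mathfrak{H}. \tag{21.9} \]
Linearity is clear and the Cauchy–Schwarz inequality, Lemma 20.3, shows
\[ |\Lambda_g(h) - \Lambda_g(\tilde h)| = |\langle h - \tilde h, g \rangle| \le \| g \| \cdot \| h - \tilde h \|, \]
so that we can take $c(\Lambda_g) = \| g \|$. That, in fact, all continuous linear functionals of $\mathfrak{H}$ arise in this way is the content of the next theorem, due to F. Riesz.

21.9 Theorem (Riesz representation theorem) Each continuous linear functional $\Lambda$ on the Hilbert space $\mathfrak{H}$ is of the form (21.9), i.e. there exists a unique $g \in \mathfrak{H}$ such that
\[ \Lambda(h) = \Lambda_g(h) = \langle h, g \rangle \qquad \forall\, h \in \mathfrak{H}. \]

Proof Set $F = \Lambda^{-1}(\{ 0 \})$ which is, due to the continuity and linearity of $\Lambda$, a closed linear subspace of $\mathfrak{H}$. If $F = \mathfrak{H}$, $\Lambda \equiv 0$ and $g = 0 \in \mathfrak{H}$ does the job. Otherwise we can pick some $g_0 \in \mathfrak{H} \setminus F$ and set
\[ g = \frac{g_0 - P_F g_0}{\| g_0 - P_F g_0 \|} \overset{(21.5)}{\in} F^\perp \implies \Lambda(g) \ne 0. \]
Since $\mathfrak{H} = F \oplus F^\perp$, we can write every $h \in \mathfrak{H}$ in the form
\[ h = \frac{\Lambda(h)}{\Lambda(g)} \, g + \Bigl( h - \frac{\Lambda(h)}{\Lambda(g)} \, g \Bigr) \in F^\perp \oplus F, \]
hence
\[ \Bigl\langle h - \frac{\Lambda(h)}{\Lambda(g)} \, g, \ g \Bigr\rangle = 0 \iff \langle h, g \rangle = \frac{\Lambda(h)}{\Lambda(g)} \underbrace{\langle g, g \rangle}_{= 1} \iff \Lambda(h) = \Lambda(g) \langle h, g \rangle = \bigl\langle h, \overline{\Lambda(g)} \, g \bigr\rangle \]
and the proof is finished.

We will finally see how to represent elements of a Hilbert space using an orthonormal base (ONB, for short). We begin with a definition.

21.10 Definition Let $\mathfrak{H}$ be a Hilbert space. (i) The (linear) span of a family $\{ e_k : k = 1, 2, \ldots, N \} \subset \mathfrak{H}$, $N \in \mathbb{N} \cup \{ \infty \}$, is the set of all finite linear combinations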
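of the $e_k$, i.e.
\[ \operatorname{span}\{ e_1, e_2, \ldots, e_N \} = \Bigl\{ \sum_{k=1}^n \alpha_k e_k : \alpha_1, \ldots, \alpha_n \in \mathbb{C},\ n \in \mathbb{N},\ n \le N \Bigr\}. \]
(ii) A sequence $(e_k)_{k \in \mathbb{N}} \subset \mathfrak{H}$ is called a (countable) orthonormal system (ONS, for short) if
\[ \langle e_k, e_\ell \rangle = \begin{cases} 0 & \text{if } k \ne \ell, \\ 1 & \text{if } k = \ell, \end{cases} \]
that is, $\| e_k \| = 1$ and $e_k \perp e_\ell$ whenever $k \ne \ell$.

21.11 Theorem Let $(e_k)_{k \in \mathbb{N}}$ be an ONS in the Hilbert space $\mathfrak{H}$ and denote by $E = E_N = \operatorname{span}\{ e_1, \ldots, e_N \}$ the linear span of $e_1, \ldots, e_N$, $N \in \mathbb{N}$.

(i) $E = E_N$ is a closed linear subspace, $P_E g = \sum_{k=1}^N \langle g, e_k \rangle e_k$ and
\[ \Bigl\| g - \sum_{k=1}^N \langle g, e_k \rangle e_k \Bigr\| < \| g - f \| \qquad \forall\, f \in E,\ f \ne P_E g, \]
and also $\| P_E g \|^2 = \sum_{k=1}^N |\langle g, e_k \rangle|^2$.

(ii) (Pythagoras’ theorem) For $g \in \mathfrak{H}$
\[ \| g \|^2 = \| g - P_E g \|^2 + \| P_E g \|^2 = \Bigl\| g - \sum_{k=1}^N \langle g, e_k \rangle e_k \Bigr\|^2 + \sum_{k=1}^N |\langle g, e_k \rangle|^2. \]

(iii) (Bessel’s inequality) For $g \in \mathfrak{H}$
\[ \sum_{k=1}^\infty |\langle g, e_k \rangle|^2 \le \| g \|^2. \]

(iv) (Parseval’s identity) The sequence $\bigl( \sum_{k=1}^m c_k e_k \bigr)_{m \in \mathbb{N}}$, $c_k \in \mathbb{C}$, converges to an element $g \in \mathfrak{H}$ if, and only if, $\sum_{k=1}^\infty |c_k|^2 < \infty$. In this case, Parseval’s identity holds:
\[ \sum_{k=1}^\infty |c_k|^2 = \sum_{k=1}^\infty |\langle g, e_k \rangle|^2 = \| g \|^2. \]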
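A small finite-dimensional experiment illustrates parts (i)–(iii). The sketch assumes $\mathfrak{H} = \mathbb{C}^3$ with the standard inner product and a hypothetical ONS consisting of two vectors $e_1, e_2$; since these do not span $\mathbb{C}^3$, Bessel's inequality is strict for a generic $g$.

```python
import math

def inner(z, w):
    """Standard inner product on C^n: linear in z, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(z, w))

s = 1 / math.sqrt(2)
ons = [[1 + 0j, 0j, 0j], [0j, s + 0j, s * 1j]]       # e_1, e_2 (orthonormal)
g = [2 - 1j, 1 + 1j, 3 + 0j]

coeffs = [inner(g, e) for e in ons]                  # <g, e_k>
Pg = [sum(c * e[i] for c, e in zip(coeffs, ons)) for i in range(len(g))]
residual = [a - b for a, b in zip(g, Pg)]

bessel_sum = sum(abs(c) ** 2 for c in coeffs)        # sum |<g, e_k>|^2
norm_g_sq = inner(g, g).real
resid_sq = inner(residual, residual).real            # ||g - P_E g||^2
```

Here `norm_g_sq` equals `resid_sq + bessel_sum` up to rounding, which is Pythagoras' theorem (ii), and consequently `bessel_sum <= norm_g_sq`.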
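Proof (i) That $E_N$ is a linear subspace is due to the very definition of ‘span’. The closedness follows from the fact that $E_N$ is generated by finitely many $e_k$: if $f \in E_N$ is of the form $f = \sum_{j=1}^N c_j e_j$, $c_j \in \mathbb{C}$, then
\[ \langle f, e_k \rangle = \Bigl\langle \sum_{j=1}^N c_j e_j, e_k \Bigr\rangle = \sum_{j=1}^N c_j \langle e_j, e_k \rangle = c_k. \]
Let $(f_n)_{n \in \mathbb{N}} \subset E_N$ be a sequence with $f_n \xrightarrow[n \to \infty]{} f \in \mathfrak{H}$. Then
\[ \Bigl\| f_n - \sum_{j=1}^N \langle f, e_j \rangle e_j \Bigr\| = \Bigl\| \sum_{j=1}^N \langle f_n - f, e_j \rangle e_j \Bigr\| \le \sum_{j=1}^N |\langle f_n - f, e_j \rangle| \cdot \| e_j \| \le \sum_{j=1}^N \| f_n - f \| = N \| f_n - f \| \xrightarrow[n \to \infty]{} 0 \]
(for the last estimate we used Lemma 20.3 and $\| e_j \| = 1$), which shows that $\lim_{n \to \infty} f_n = \sum_{j=1}^N \langle f, e_j \rangle e_j \in E_N$.

If $g \in \mathfrak{H}$, we observe that $g - \sum_{j=1}^N \langle g, e_j \rangle e_j \perp e_k$ for all $k = 1, 2, \ldots, N$, since for these $k$
\[ \Bigl\langle g - \sum_{j=1}^N \langle g, e_j \rangle e_j, e_k \Bigr\rangle = \langle g, e_k \rangle - \sum_{j=1}^N \langle g, e_j \rangle \langle e_j, e_k \rangle = \langle g, e_k \rangle - \langle g, e_k \rangle = 0. \]
Since $\mathfrak{H} = E_N \oplus E_N^\perp$, we get $P_{E_N} g = \sum_{j=1}^N \langle g, e_j \rangle e_j$, while (21.2) implies $\| g - \sum_{j=1}^N \langle g, e_j \rangle e_j \| \le \| g - f \|$ for $f \in E_N$, with equality holding only if $f = P_{E_N} g$ because of uniqueness of $P_{E_N} g$. Finally,
\[ \| P_{E_N} g \|^2 = \langle P_{E_N} g, P_{E_N} g \rangle = \Bigl\langle \sum_{j=1}^N \langle g, e_j \rangle e_j, \sum_{k=1}^N \langle g, e_k \rangle e_k \Bigr\rangle = \sum_{j,k=1}^N \langle g, e_j \rangle \overline{\langle g, e_k \rangle} \langle e_j, e_k \rangle = \sum_{j=1}^N |\langle g, e_j \rangle|^2 \]
where we used that $(e_j)_{j \in \mathbb{N}}$ is an ONS.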
(ii) follows from (21.8) and (i).

(iii) From (ii) we get for all $N \in \mathbb{N}$
\[ \sum_{j=1}^N |\langle g, e_j \rangle|^2 = \| g \|^2 - \| g - P_E g \|^2 \le \| g \|^2. \]
Since the right-hand side is independent of $N \in \mathbb{N}$, we can let $N \to \infty$ and the claim follows.

(iv) Since $\mathfrak{H}$ is complete, it is enough to show that $\bigl( \sum_{k=1}^m c_k e_k \bigr)_{m \in \mathbb{N}}$ is a Cauchy sequence. Because of the orthogonality of the $e_k$ we see (as in (i)) for $n > m$
\[ \Bigl\| \sum_{k=m+1}^n c_k e_k \Bigr\|^2 = \sum_{k=m+1}^n |c_k|^2 \, \| e_k \|^2 = \sum_{k=m+1}^n |c_k|^2 \]
which means that $\bigl( \sum_{k=1}^m c_k e_k \bigr)_{m \in \mathbb{N}}$ is a Cauchy sequence in $\mathfrak{H}$ if, and only if, $\sum_{k=1}^\infty |c_k|^2$ converges. In the latter case, Parseval’s identity follows from (iii): for $g = \sum_{k=1}^\infty c_k e_k$ we have $P_{E_N} g = \sum_{k=1}^N c_k e_k$ and $c_k = \langle g, e_k \rangle$ by (i). Thus by (ii),
\[ \| g \|^2 = \| g - P_{E_N} g \|^2 + \sum_{j=1}^N |\langle g, e_j \rangle|^2 \xrightarrow[N \to \infty]{} \sum_{j=1}^\infty |\langle g, e_j \rangle|^2 = \sum_{j=1}^\infty |c_j|^2. \]

Two questions remain: can we always find a countable ONS? If so, can we use it to represent all elements of $\mathfrak{H}$? The answer to the first question is ‘yes’, while the second question has to be answered by ‘no’, unless we are looking at separable Hilbert spaces, see Definition 21.14 below. Here we will restrict ourselves to the latter situation but we will point towards references where the general case is treated.

21.12 Definition An ONS $(e_k)_{k \in \mathbb{N}}$ in the Hilbert space $\mathfrak{H}$ is said to be maximal (also complete, total, an orthonormal basis) if for every $g \in \mathfrak{H}$
\[ \langle g, e_k \rangle = 0 \quad \forall\, k \in \mathbb{N} \implies g = 0. \]
The idea behind maximality is that we can obtain $\mathfrak{H}$ as limit of finite-dimensional projections, ‘$\mathfrak{H} = \lim_N P_{\operatorname{span}\{ e_1, \ldots, e_N \}} \mathfrak{H}$’ or ‘$\mathfrak{H} = \bigoplus_{k \in \mathbb{N}} \mathbb{C} e_k$’, if the limits and summations are understood in the right way. Here we see that the countability of the ONS entails that $\mathfrak{H}$ can be represented as closure of the span of countably many
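elements, and that this is indeed a restriction should be obvious. Let us make all this more precise.

21.13 Theorem Let $(e_k)_{k \in \mathbb{N}}$ be an ONS in the Hilbert space $\mathfrak{H}$. Then the following assertions are equivalent.
(i) $(e_k)_{k \in \mathbb{N}}$ is maximal;
(ii) $\bigcup_{N \in \mathbb{N}} \operatorname{span}\{ e_1, \ldots, e_N \}$ is dense in $\mathfrak{H}$;
(iii) $g = \sum_{j=1}^\infty \langle g, e_j \rangle e_j \quad \forall\, g \in \mathfrak{H}$;
(iv) $\sum_{j=1}^\infty |\langle g, e_j \rangle|^2 = \| g \|^2 \quad \forall\, g \in \mathfrak{H}$;
(v) $\sum_{j=1}^\infty \langle g, e_j \rangle \overline{\langle h, e_j \rangle} = \langle g, h \rangle \quad \forall\, g, h \in \mathfrak{H}$.

Proof (i)⇒(ii): Since $F = \bigcup_{N \in \mathbb{N}} \operatorname{span}\{ e_1, \ldots, e_N \} = \operatorname{span}\{ e_j : j \in \mathbb{N} \}$ is a linear subspace of $\mathfrak{H}$, the assertion follows from the definition of maximality and Remark 21.7(iii).

(ii)⇒(iii) is obvious since
\[ \sum_{j=1}^\infty \langle g, e_j \rangle e_j = \lim_{N \to \infty} \sum_{j=1}^N \langle g, e_j \rangle e_j = \lim_{N \to \infty} P_{E_N} g. \]
(iii)⇒(iv) follows from Theorem 21.11(iv).

(iv)⇒(v) follows from the polarization identity (20.13).

(v)⇒(i): If $\langle u, e_k \rangle = 0$ for some $u \in \mathfrak{H}$ and all $k \in \mathbb{N}$, we get from (v) with $g = h = u$ that
\[ 0 = \sum_{j=1}^\infty \langle u, e_j \rangle \overline{\langle u, e_j \rangle} = \langle u, u \rangle = \| u \|^2 \]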
and therefore $u = 0$.

Theorem 21.13 solves the representation issue. To find an ONS, we recall first what we do in a finite-dimensional vector space $V$ to get a basis. If $V = \operatorname{span}\{ v_1, \ldots, v_N \}$, we remove recursively all $v_1, \ldots, v_k$ such that still $V = \operatorname{span}\bigl( \{ v_1, \ldots, v_N \} \setminus \{ v_1, \ldots, v_k \} \bigr)$. This procedure gives us in at most $N$ steps a minimal system $\{ w_1, \ldots, w_n \} \subset \{ v_1, \ldots, v_N \}$, $N = n + k$, with the property that $V = \operatorname{span}\{ w_1, \ldots, w_n \}$. Note that this is, at the same time, a maximally independent system of vectors in $V$. We can now rebuild $w_1, \ldots, w_n$ into an ONS by the Gram–Schmidt orthonormalization procedure:
\[ e_1 = \frac{w_1}{\| w_1 \|} \quad \text{and recursively} \quad \tilde e_{j+1} = w_{j+1} - P_{\operatorname{span}\{ e_1, \ldots, e_j \}} w_{j+1} = w_{j+1} - \sum_{\ell=1}^j \langle w_{j+1}, e_\ell \rangle e_\ell, \qquad e_{j+1} = \frac{\tilde e_{j+1}}{\| \tilde e_{j+1} \|}. \tag{21.10} \]
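The recursion (21.10) translates almost literally into code. The following sketch works in $\mathbb{R}^n$ or $\mathbb{C}^n$ with the standard inner product; instead of first thinning out the generating system, it simply drops every vector whose remainder $\tilde e_{j+1}$ is numerically zero. The function name `gram_schmidt`, the tolerance `tol` and the sample vectors are assumptions made for illustration.

```python
import math

def inner(z, w):
    """Standard inner product: linear in z, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(z, w))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize `vectors` following (21.10); dependent vectors are skipped."""
    ons = []
    for w in vectors:
        # e_tilde = w - sum_l <w, e_l> e_l, the part of w orthogonal to span(ons)
        e_tilde = list(w)
        for e in ons:
            c = inner(w, e)
            e_tilde = [x - c * y for x, y in zip(e_tilde, e)]
        length = math.sqrt(inner(e_tilde, e_tilde).real)
        if length > tol:                  # w is not in the span of the previous vectors
            ons.append([x / length for x in e_tilde])
    return ons

# The third vector is the sum of the first two, so it contributes nothing new.
vs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(vs)
gram = [[inner(a, b) for b in es] for a in es]   # should be the 3x3 identity matrix
```

In floating-point arithmetic the plain recursion can lose orthogonality for nearly dependent inputs; the sketch ignores this issue.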
Another interpretation of (21.10) is this: if we had unleashed the Gram–Schmidt procedure on the set $\{ v_1, \ldots, v_N \}$, we would have obtained again $n$ orthonormal vectors, say, $f_1, \ldots, f_n$ (which are, in general, different from $e_1, \ldots, e_n$ constructed from $w_1, \ldots, w_n$). A close inspection of (21.10) shows that at each step $V = \operatorname{span}\{ f_1, \ldots, f_j, v_{j+1}, \ldots, v_N \}$, so that (21.10) extends a partially existing basis $f_1, \ldots, f_j$ to a full ONB $f_1, \ldots, f_n$. This means that (21.10) is also a ‘basis extension procedure’. To get (21.10) to work in infinite dimensions we must make sure that $\mathfrak{H}$ is the closure of the span of countably many vectors. This motivates the following convenient (but somewhat restrictive)

21.14 Definition A Hilbert space $\mathfrak{H}$ is said to be separable if $\mathfrak{H}$ contains a countable dense subset $G \subset \mathfrak{H}$.

21.15 Theorem Every separable Hilbert space has a maximal ONS.

Proof Let $G = (g_j)_{j \in \mathbb{N}}$ be an enumeration of some countable dense subset of $\mathfrak{H}$ and consider the subspaces $F_k = \operatorname{span}\{ g_1, \ldots, g_k \}$. Note that $F_k \subseteq F_{k+1}$, $\dim F_k \le k$ and that $\bigcup_{k \in \mathbb{N}} F_k$ is dense in $\mathfrak{H}$. Now construct an ONB in the finite-dimensional space $F_k$ and extend this ONB using (21.10) to an ONB in $F_{k+1}$, etc. This produces a sequence $(e_j)_{j \in \mathbb{N}}$ of orthonormal elements such that $\operatorname{span}\{ e_j : j \in \mathbb{N} \} = \bigcup_{k \in \mathbb{N}} F_k \supset G$ is dense in $\mathfrak{H}$ and Theorem 21.13 completes the proof.
21.16 Remarks (i) Assume that $\mathfrak{H}$ is separable. Then we have the following ‘algebraic’ interpretation of the results in 21.11–21.15. Consider the maps

coordinate projection: $\Pi : \mathfrak{H} \to \ell^2_{\mathbb{C}}(\mathbb{N})$, $\quad g \mapsto (\langle g, e_j \rangle)_{j \in \mathbb{N}}$;
(re-)construction map: $\coprod : \ell^2_{\mathbb{C}}(\mathbb{N}) \to \mathfrak{H}$, $\quad (c_j)_{j \in \mathbb{N}} \mapsto \sum_{j=1}^\infty c_j e_j$.

Because of Theorem 21.11(iv), both $\Pi$ and $\coprod$ are well-defined maps, and Theorem 21.11 shows that Diagram 1 (below, left) commutes, i.e. $\Pi \circ \coprod = \mathrm{id}_{\ell^2}$.

[Diagram 1: $\ell^2(\mathbb{N}) \xrightarrow{\coprod} \mathfrak{H} \xrightarrow{\Pi} \ell^2(\mathbb{N})$ equals $\mathrm{id}$. Diagram 2: $\mathfrak{H} \xrightarrow{\Pi} \ell^2(\mathbb{N}) \xrightarrow{\coprod} \mathfrak{H}$ equals $\mathrm{id}$.]

This means that, if we start with a square-summable sequence, associate an element from $\mathfrak{H}$ with it and project to the coordinates, we get the original sequence back. The converse operation, if we start with some $h \in \mathfrak{H}$, project $h$ down to its coordinates, and then try to reconstruct $h$ from the (square-summable) coordinate sequence, is much more difficult, as we have seen in Theorems 21.13 and 21.15. Nevertheless, it can be done in every separable Hilbert space, and Diagram 2 (above, right) becomes commutative, i.e. $\coprod \circ \Pi = \mathrm{id}_{\mathfrak{H}}$. This shows that every separable Hilbert space can be isometrically mapped onto $\ell^2_{\mathbb{C}}(\mathbb{N})$. The isometry is given by Parseval’s identity 21.11(iv):
\[ \| h \|_{\mathfrak{H}}^2 = \sum_{j \in \mathbb{N}} |\langle h, e_j \rangle|^2 = \| \Pi h \|_{\ell^2}^2. \]

(ii) If $\mathfrak{H}$ is not separable, we can still construct an ONB but now we need transfinite induction or Zorn’s lemma. A reasonably short account is given in Rudin’s book [40, pp. 83–88]. The results 21.11–21.13 carry over to this case if one makes some technical (what is an uncountable sum? etc.) modifications.
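Problems

21.1. Show that every convergent sequence in $\mathfrak{H}$ is a Cauchy sequence.

21.2. Show that $g \mapsto \langle g, h \rangle$, $h \in \mathfrak{H}$, is continuous.

21.3. Show that $\| (g, h) \| = \bigl( \| g \|^p + \| h \|^p \bigr)^{1/p}$ is for every $p \ge 1$ a norm on $\mathfrak{H} \times \mathfrak{H}$. For which values of $p$ does $\mathfrak{H} \times \mathfrak{H}$ become a Hilbert space?

21.4. Show that $(g, h) \mapsto \langle g, h \rangle$ and $(t, h) \mapsto t \cdot h$ are continuous on $\mathfrak{H} \times \mathfrak{H}$ resp. $\mathbb{R} \times \mathfrak{H}$.

21.5. Show that a Hilbert space $\mathfrak{H}$ is separable if, and only if, $\mathfrak{H}$ contains a countable maximal orthonormal system.

21.6. Show that for $\mathfrak{H} = L^2(X, \mathcal{A}, \mu)$ and $w \in L^2$ the set $M_w^\perp = \bigl\{ u \in L^2 : \int u \, w \, d\mu = 0 \bigr\}^\perp$ is either $\{ 0 \}$ or a one-dimensional subspace of $\mathfrak{H}$.

21.7. Let $(e_j)_{j \in \mathbb{N}} \subset \mathfrak{H}$ be an orthonormal system.
(i) Show that no subsequence of $(e_j)_{j \in \mathbb{N}}$ converges. However, $\lim_{j \to \infty} \langle e_j, h \rangle = 0$ for every $h \in \mathfrak{H}$. [Hint: show that it can’t be a Cauchy sequence. Use Bessel’s inequality.]
(ii) The Hilbert cube $Q = \bigl\{ h \in \mathfrak{H} : h = \sum_{j=1}^\infty c_j e_j,\ |c_j| \le \tfrac1j,\ j \in \mathbb{N} \bigr\}$ is closed, bounded and compact (i.e. every sequence has a convergent subsequence).
(iii) The set $R = \bigcup_{j=1}^\infty \overline{B_{1/j}(e_j)}$ is closed, bounded but not compact (cf. (ii)).
(iv) The set $S = \bigl\{ h \in \mathfrak{H} : h = \sum_{j=1}^\infty c_j e_j,\ |c_j| \le \delta_j,\ j \in \mathbb{N} \bigr\}$ is closed, bounded and compact (cf. (ii)) if, and only if, $\sum_{j=1}^\infty \delta_j^2 < \infty$.

21.8. Let $\mathfrak{H}$ be a real Hilbert space.
(i) Show that
\[ \| h \| = \sup_{g \ne 0} \frac{|\langle g, h \rangle|}{\| g \|} = \sup_{\| g \| \le 1} |\langle g, h \rangle| = \sup_{\| g \| = 1} |\langle g, h \rangle|. \]
(ii) Can we replace in (i) $|\langle \bullet, \bullet \rangle|$ by $\langle \bullet, \bullet \rangle$?
(iii) Is it enough to take $g$ in (i) from a dense subset rather than from $\mathfrak{H}$ (resp. $B_1(0)$ or $\{ k \in \mathfrak{H} : \| k \| = 1 \}$)?

21.9. Show that the linear span of a sequence $(e_k)_{k \in \mathbb{N}} \subset \mathfrak{H}$, $\operatorname{span}\{ e_k : e_k \in \mathfrak{H},\ k \in \mathbb{N} \}$, is a linear subspace of $\mathfrak{H}$.

21.10. A weak form of the uniform boundedness principle. Consider the real Hilbert space $\ell^2 = \ell^2_{\mathbb{R}}(\mathbb{N})$ and let $a = (a_j)_{j \in \mathbb{N}}$ and $b = (b_j)_{j \in \mathbb{N}}$ be two sequences of real numbers.
(i) Assume that $\sum_{j=1}^\infty a_j^2 = \infty$. Construct a sequence $(j_k)_{k \in \mathbb{N}}$ such that $j_1 = 0$ and $\sum_{j_k < j \le j_{k+1}} a_j^2 > 1$ for all $k \in \mathbb{N}$.
(ii) Define $b_j = \gamma_k a_j$ for all $j_k < j \le j_{k+1}$, $k \in \mathbb{N}$, and show that one can determine the $\gamma_k$ in such a way that $\sum_{j=1}^\infty a_j b_j = \infty$ while $\sum_{j=1}^\infty b_j^2 < \infty$.
(iii) Conclude that if $\langle a, b \rangle < \infty$ for all $a \in \ell^2$, we have necessarily $b \in \ell^2$.
(iv) State and prove the analogue of (iii) for all separable Hilbert spaces.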
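Remark. The general uniform boundedness principle states that in every Hilbert space $\mathfrak{H}$ and for any $H \subset \mathfrak{H}$ one has
\[ \sup_{h \in H} |\langle h, g \rangle| < \infty \quad \forall\, g \in \mathfrak{H} \implies \sup_{h \in H} \| h \| < \infty. \]
Interpreting $\Lambda_h : g \mapsto \langle g, h \rangle$ as linear map, this says that the boundedness of the orbits $\Lambda_h(g)$ for all $h \in H$ implies that the set $H$ is bounded. This formulation perseveres even in Banach spaces. The proof is normally based on Baire’s category theorem, cf. Rudin [40].

21.11. Let $F, G \subset \mathfrak{H}$ be linear subspaces. An operator $P$ defined on $G$ is called ($\mathbb{K}$-)linear, if $P(\alpha f + \beta g) = \alpha P f + \beta P g$ holds for all $\alpha, \beta \in \mathbb{K}$ and $f, g \in G$.
(i) Assume that $F$ is closed and that $P : \mathfrak{H} \to F$ is the orthogonal projection. Then
\[ P^2 = P \quad \text{and} \quad \langle P g, h \rangle = \langle g, P h \rangle \qquad \forall\, g, h \in \mathfrak{H}. \tag{21.11} \]
(ii) If $P : \mathfrak{H} \to \mathfrak{H}$ is a map satisfying (21.11), then $P$ is linear and $P$ is the orthogonal projection onto the closed subspace $P(\mathfrak{H})$.
(iii) If $P : \mathfrak{H} \to \mathfrak{H}$ is a linear map satisfying
\[ P^2 = P \quad \text{and} \quad \| P h \| \le \| h \| \qquad \forall\, h \in \mathfrak{H}, \]
then $P$ is the orthogonal projection onto the closed subspace $P(\mathfrak{H})$.

21.12. Let $(X, \mathcal{A}, \mu)$ be a measure space and $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ be mutually disjoint sets such that $X = \bigcup_{j \in \mathbb{N}} A_j$ (disjoint union). Set
\[ Y_j = \Bigl\{ u \in L^2(\mathcal{A}) : \int_{A_j^c} |u|^2 \, d\mu = 0 \Bigr\}, \qquad j \in \mathbb{N}. \]
(i) Show that $Y_j \perp Y_k$ if $j \ne k$.
(ii) Show that $\operatorname{span}\bigl( \bigcup_{j \in \mathbb{N}} Y_j \bigr)$ (i.e. the set of all linear combinations of finitely many elements from $\bigcup_{j \in \mathbb{N}} Y_j$) is dense in $L^2(\mathcal{A})$.
(iii) Find the projection $P_j : L^2(\mathcal{A}) \to Y_j$.

21.13. Let $(X, \mathcal{A}, \mu)$ be a measure space and assume that $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ is a sequence of pairwise disjoint sets such that $\bigcup_{j \in \mathbb{N}} A_j = X$ and $0 < \mu(A_j) < \infty$. Denote by $\mathcal{A}_n = \sigma(A_1, A_2, \ldots, A_n)$ and by $\mathcal{A}_\infty = \sigma(A_j : j \in \mathbb{N})$.
(i) Show that $L^2(\mathcal{A}_n) \subset L^2(\mathcal{A})$ and that $L^2(\mathcal{A}_n)$ is a closed subspace.
(ii) Find an explicit formula for $E^{\mathcal{A}_n} u$ where $E^{\mathcal{A}_n}$ is the orthogonal projection $E^{\mathcal{A}_n} : L^2(\mathcal{A}) \to L^2(\mathcal{A}_n)$.
(iii) Determine the orthogonal complement of $L^2(\mathcal{A}_n)$.
(iv) Show that $\bigl( E^{\mathcal{A}_n} u \bigr)_{n \in \mathbb{N} \cup \{ \infty \}}$ is a martingale.
(v) Show that $E^{\mathcal{A}_n} u \xrightarrow[n \to \infty]{} E^{\mathcal{A}_\infty} u$ a.e. and in $L^2$.
(vi) Conclude that $L^2(\mathcal{A}_\infty)$ is separable.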
22 Conditional expectations in $L^2$
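Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space. We have seen in Chapter 20 that $L^2_{\mathbb{C}} = L^2_{\mathbb{R}} \oplus i L^2_{\mathbb{R}}$. By considering real and imaginary parts separately, we can reduce many assertions concerning $L^2_{\mathbb{C}}$ to $L^2_{\mathbb{R}}$. From Chapter 21 we know that $L^2_{\mathbb{R}}$ is a Hilbert space with inner product, resp. norm
\[ \langle u, v \rangle = \langle u, v \rangle_2 = \int u \, v \, d\mu \quad \text{resp.} \quad \| u \| = \| u \|_2 = \Bigl( \int u^2 \, d\mu \Bigr)^{1/2}. \]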
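Since a function¹ $u \in L^2_{\bar{\mathbb{R}}}$ is only almost everywhere defined and since (square-)integrable functions with values in $\bar{\mathbb{R}}$ are almost everywhere finite, hence $\mathbb{R}$-valued, cf. Remark 12.5, we can identify $L^2_{\mathbb{R}}$ and $L^2_{\bar{\mathbb{R}}}$. We will do so and simply write $L^2$.

¹ Strictly speaking we should call it an equivalence class of functions, cf. Remark 12.5.

Caution: Note that for functions $u, v \in L^2$ expressions of the type $u = v$, $u \le v$ always mean $u(x) = v(x)$, $u(x) \le v(x)$ for all $x$ outside some $\mu$-null set.

In this chapter we are mainly interested in linear subspaces of $L^2$ and projections onto them. One particularly important class arises in the following way: if $\mathcal{G} \subset \mathcal{A}$ is a sub-$\sigma$-algebra of $\mathcal{A}$, then any $\mathcal{G}$-measurable function is certainly $\mathcal{A}$-measurable. Since $(X, \mathcal{G}, \mu|_{\mathcal{G}})$ is also a measure space, it seems natural to interpret $L^2(\mathcal{G})$ (with norm $\| \cdot \|_{L^2(\mathcal{G})}$) as a subspace of $L^2(\mathcal{A})$ (with norm $\| \cdot \|_{L^2(\mathcal{A})}$). This can indeed be done.

22.1 Lemma Let $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Then
\[ \iota : \mathcal{L}^2(\mathcal{G}) \to \mathcal{L}^2(\mathcal{A}) \quad \text{and} \quad j : L^2(\mathcal{G}) \to L^2(\mathcal{A}) \]
are isometric imbeddings, i.e. linear maps satisfying $\| \iota u \|_{\mathcal{L}^2(\mathcal{A})} = \| u \|_{\mathcal{L}^2(\mathcal{G})}$ and $\| j w \|_{L^2(\mathcal{A})} = \| w \|_{L^2(\mathcal{G})}$ for all $u \in \mathcal{L}^2(\mathcal{G})$, resp. $w \in L^2(\mathcal{G})$. In particular $\mathcal{L}^2(\mathcal{G})$, resp. $L^2(\mathcal{G})$ is a closed linear subspace of $\mathcal{L}^2(\mathcal{A})$, resp. $L^2(\mathcal{A})$.

Proof Since $\mathcal{G} \subset \mathcal{A}$ and since $\mu|_{\mathcal{G}}$ and $\mu$ coincide on $\mathcal{G}$, we have $\mathcal{E}(\mathcal{G}) \subset \mathcal{E}(\mathcal{A})$ for the simple functions. The map $\iota : \mathcal{E}(\mathcal{G}) \to \mathcal{E}(\mathcal{A})$,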
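\[ f \mapsto \iota(f) = f = \sum_{j=0}^N \alpha_j 1_{G_j}, \]
where the latter is a standard representation of $f$ with $\alpha_j \in \mathbb{R}$ and $G_j \in \mathcal{G}$, clearly satisfies
\[ \| \iota f \|_{\mathcal{L}^2(\mathcal{A})}^2 = \sum_{j=0}^N \alpha_j^2 \, \mu(G_j) = \sum_{j=0}^N \alpha_j^2 \, \mu|_{\mathcal{G}}(G_j) = \| f \|_{\mathcal{L}^2(\mathcal{G})}^2 \qquad \forall\, f \in \mathcal{E}(\mathcal{G}). \]
According to Corollary 12.11 we can find for every $u \in \mathcal{L}^2(\mathcal{G})$ a sequence $(f_k)_{k \in \mathbb{N}} \subset \mathcal{E}(\mathcal{G}) \cap \mathcal{L}^2(\mathcal{G})$ such that $\lim_{k \to \infty} \| u - f_k \|_{\mathcal{L}^2(\mathcal{G})} = 0$. Therefore,
\[ \| \iota(f_k) - \iota(f_\ell) \|_{\mathcal{L}^2(\mathcal{A})} = \| f_k - f_\ell \|_{\mathcal{L}^2(\mathcal{G})} \xrightarrow[k, \ell \to \infty]{} 0 \]
which shows because of the completeness of $\mathcal{L}^2(\mathcal{A})$ (cf. Theorem 12.7) that $\iota(u) = \mathcal{L}^2(\mathcal{A})\text{-}\lim_{k \to \infty} \iota(f_k)$ is a linear isometry from $\mathcal{L}^2(\mathcal{G})$ to $\mathcal{L}^2(\mathcal{A})$. Since $\mathcal{L}^2(\mathcal{G})$ is complete, $\iota(\mathcal{L}^2(\mathcal{G}))$ is a closed linear subspace of $\mathcal{L}^2(\mathcal{A})$.

Denote by $[u] \in L^2$ the equivalence class containing the function $u \in \mathcal{L}^2$. Since for any two $u, w \in \mathcal{L}^2(\mathcal{G})$ with $u = w$ a.e. we also have $\iota(u) = \iota(w)$ a.e., the map $j([u]) = [\iota(u)]$ is independent of the chosen representative for $[u]$ and defines a linear isometry $j : L^2(\mathcal{G}) \to L^2(\mathcal{A})$. As before, $j(L^2(\mathcal{G}))$ is a closed linear subspace of $L^2(\mathcal{A})$.

It is customary to identify $u \in L^2(\mathcal{G})$ with $j u \in L^2(\mathcal{A})$ and we will do so in the sequel. Unless we want to stress the $\sigma$-algebra, we will write $\mu$ instead of $\mu|_{\mathcal{G}}$ and $\| \cdot \|_2$ for the norm in $L^2(\mathcal{G})$ and $L^2(\mathcal{A})$. A key observation is that the choice of $\mathcal{G} \subset \mathcal{A}$ determines our knowledge about a function $u$.

22.2 Example Consider a finite measure space $(X, \mathcal{A}, \mu)$ and the sub-$\sigma$-algebra $\mathcal{G} = \{ \emptyset, G, G^c, X \}$ where $G \in \mathcal{A}$ and $\mu(G) > 0$, $\mu(G^c) > 0$. Let $f \in \mathcal{E}(\mathcal{A})$ be a simple function in standard representation:
\[ f = \sum_{j=0}^m y_j 1_{A_j}, \qquad y_j \in \mathbb{R},\ A_j \in \mathcal{A}. \]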
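Then
\[ \int_G f \, d\mu = \sum_{j=0}^m y_j \int_G 1_{A_j} \, d\mu = \underbrace{\sum_{j=0}^m y_j \frac{\mu(A_j \cap G)}{\mu(G)}}_{=: \eta_1} \, \mu(G) = \int \eta_1 1_G \, d\mu \tag{22.1} \]
and similarly,
\[ \int_{G^c} f \, d\mu = \underbrace{\sum_{j=0}^m y_j \frac{\mu(A_j \cap G^c)}{\mu(G^c)}}_{=: \eta_2} \, \mu(G^c) = \int \eta_2 1_{G^c} \, d\mu. \tag{22.2} \]
This indicates that we could have obtained the same results in the integrations (22.1) and (22.2) if we had not used $f \in \mathcal{E}(\mathcal{A})$ but the $\mathcal{G}$-simple function
\[ g = \eta_1 1_G + \eta_2 1_{G^c} \in \mathcal{E}(\mathcal{G}) \tag{22.3} \]
with $\eta_1, \eta_2$ from above. In other words, $f$ and $g$ are indistinguishable, if we evaluate (i.e. integrate) both of them on sets of the $\sigma$-algebra $\mathcal{G}$. Note that $g$ is much simpler than $f$, but we have lost nearly all information of what $f$ looks like on sets from $\mathcal{A}$ save $\mathcal{G}$: if we take a set from the standard representation of $f$, say, $A_{j_0} \subset G$, $A_{j_0} \in \mathcal{A}$, then
\[ \int_{A_{j_0}} f \, d\mu = y_{j_0} \, \mu(A_{j_0}), \quad \text{whereas} \quad \int_{A_{j_0}} g \, d\mu = \eta_1 \, \mu(A_{j_0} \cap G) = \sum_{j=0}^m y_j \frac{\mu(A_j \cap G)}{\mu(G)} \cdot \mu(A_{j_0}), \]
i.e. we would get a weighted average of the $y_j$ rather than precisely $y_{j_0}$.

Let us extend the process sketched in Example 22.2 to $\sigma$-finite measures and general square-integrable functions. Our starting point is the observation that, with the notation of 22.2,
\[ \int f \, g \, d\mu = \eta_1 \int_G f \, d\mu + \eta_2 \int_{G^c} f \, d\mu = \eta_1^2 \, \mu(G) + \eta_2^2 \, \mu(G^c) = \int g^2 \, d\mu, \]
that is, $\langle f - g, g \rangle = 0$ or $(f - g) \perp g$ in the space $L^2(\mathcal{A})$.

22.3 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{G} \subset \mathcal{A}$ be a sub-$\sigma$-algebra. The conditional expectation of $u \in L^2(\mathcal{A})$ relative to $\mathcal{G}$ is the orthogonal projection onto the closed subspace $L^2(\mathcal{G})$:
\[ E^{\mathcal{G}} : L^2(\mathcal{A}) \to L^2(\mathcal{G}), \qquad u \mapsto E^{\mathcal{G}} u. \]
Sometimes one writes $E(u \mid \mathcal{G})$ instead of $E^{\mathcal{G}} u$.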
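On a finite set the averaging procedure of Example 22.2 can be carried out explicitly. The sketch below assumes $X = \{ 0, \ldots, 5 \}$ with a measure given by positive weights `mu`, and a sub-$\sigma$-algebra generated by a partition of $X$ into blocks; on each block the projection replaces $u$ by its $\mu$-weighted average, in analogy with $\eta_1, \eta_2$ above. The names `cond_exp`, `fine` and `coarse` and all numerical values are hypothetical.

```python
def cond_exp(u, mu, partition):
    """Average u over each block of the partition (blocks are assumed to have mass > 0)."""
    out = [0.0] * len(u)
    for block in partition:
        mass = sum(mu[x] for x in block)
        avg = sum(u[x] * mu[x] for x in block) / mass
        for x in block:
            out[x] = avg
    return out

def inner(u, v, mu):
    """L^2(mu) inner product on the finite set X."""
    return sum(a * b * m for a, b, m in zip(u, v, mu))

mu = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
u = [4.0, -1.0, 2.0, 0.0, 5.0, 3.0]
fine = [[0, 1], [2, 3], [4, 5]]      # blocks generating the sigma-algebra G
coarse = [[0, 1, 2, 3], [4, 5]]      # blocks generating a smaller sigma-algebra

Eu = cond_exp(u, mu, fine)
g = [1.0, 1.0, -2.0, -2.0, 7.0, 7.0]                 # constant on the blocks of `fine`

orth = inner([a - b for a, b in zip(u, Eu)], g, mu)  # <u - Eu, g>, should vanish
tower_lhs = cond_exp(Eu, mu, coarse)                 # project twice
tower_rhs = cond_exp(u, mu, coarse)                  # project once onto the smaller space
```

The vanishing of `orth` is the orthogonality $(u - E^{\mathcal{G}} u) \perp g$ which characterizes the projection, and `tower_lhs` agrees with `tower_rhs`: projecting first onto the larger and then onto the smaller subspace is the same as projecting onto the smaller one directly.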
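The terminology ‘conditional expectation’ comes from probability theory where this notion is widely used and where $(X, \mathcal{A}, \mu)$ is usually a probability space. In slight abuse of language we continue to call $E^{\mathcal{G}}$ conditional expectation even if $\mu$ is not a probability measure. Let us collect some properties of $E^{\mathcal{G}}$.

22.4 Theorem Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{G} \subset \mathcal{A}$ a sub-$\sigma$-algebra. The conditional expectation $E^{\mathcal{G}}$ has the following properties ($u, w \in L^2(\mathcal{A})$):
(i) $E^{\mathcal{G}} u \in L^2(\mathcal{G})$;
(ii) $\| E^{\mathcal{G}} u \|_{L^2(\mathcal{G})} \le \| u \|_{L^2(\mathcal{A})}$;
(iii) $\langle E^{\mathcal{G}} u, w \rangle = \langle u, E^{\mathcal{G}} w \rangle = \langle E^{\mathcal{G}} u, E^{\mathcal{G}} w \rangle$;
(iv) $E^{\mathcal{G}} u$ is the unique minimizer in $L^2(\mathcal{G})$ such that $\| u - E^{\mathcal{G}} u \|_{L^2} = \inf_{g \in L^2(\mathcal{G})} \| u - g \|_{L^2}$;
(v) $u = w \implies E^{\mathcal{G}} u = E^{\mathcal{G}} w$;
(vi) $E^{\mathcal{G}} (\alpha u + \beta w) = \alpha E^{\mathcal{G}} u + \beta E^{\mathcal{G}} w$ for all $\alpha, \beta \in \mathbb{R}$;
(vii) If $\mathcal{H} \subset \mathcal{G}$ is a further sub-$\sigma$-algebra, then $E^{\mathcal{H}} E^{\mathcal{G}} u = E^{\mathcal{H}} u$;
(viii) $E^{\mathcal{G}} (g u) = g \, E^{\mathcal{G}} u$ for all $g \in L^\infty(\mathcal{G})$;
(ix) $E^{\mathcal{G}} g = g$ for all $g \in L^2(\mathcal{G})$;
(x) $0 \le u \le 1 \implies 0 \le E^{\mathcal{G}} u \le 1$;
(xi) $u \le w \implies E^{\mathcal{G}} u \le E^{\mathcal{G}} w$;
(xii) $|E^{\mathcal{G}} u| \le E^{\mathcal{G}} |u|$;
(xiii) $E^{\{ \emptyset, X \}} u = \dfrac{1}{\mu(X)} \displaystyle\int u \, d\mu$ for all $u \in L^1(\mathcal{A}) \cap L^2(\mathcal{A})$ $\bigl( \tfrac{1}{\infty} := 0 \bigr)$.

Before we turn to the proof of the above properties let us stress again that all (in-)equalities in (i)–(xiii) are between $L^2$-functions, i.e. they hold only $\mu$-almost everywhere. In particular, $E^{\mathcal{G}} u$ is itself only determined up to a $\mu$-null set $N \in \mathcal{G}$.

Proof (of Theorem 22.4) Properties (i)–(vi) and (ix) follow directly from Theorem 21.5, Corollary 21.6 and Remark 21.7.

(vii): For all $u, w \in L^2(\mathcal{A})$ we find because of (iii)
\[ \langle E^{\mathcal{H}} E^{\mathcal{G}} u, w \rangle \overset{\text{(iii)}}{=} \langle E^{\mathcal{G}} u, E^{\mathcal{H}} w \rangle \overset{\text{(iii)}}{=} \langle u, E^{\mathcal{G}} \underbrace{E^{\mathcal{H}} w}_{\in L^2(\mathcal{G})} \rangle \overset{\text{(ix)}}{=} \langle u, E^{\mathcal{H}} w \rangle = \langle E^{\mathcal{H}} u, w \rangle \]
as $E^{\mathcal{H}} w \in L^2(\mathcal{H}) \subset L^2(\mathcal{G})$. Since $w \in L^2(\mathcal{A})$ was arbitrary, we conclude that $E^{\mathcal{H}} E^{\mathcal{G}} u = E^{\mathcal{H}} u$.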
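(viii): Writing $u = f + f^\perp\in L^2(\mathcal{G})\oplus L^2(\mathcal{G})^\perp$, we get $gu = gf + gf^\perp$. Moreover, we have for any $\phi\in L^2(\mathcal{G})$ and $g\in L^\infty(\mathcal{G})$ that $g\phi\in L^2(\mathcal{G})$, thus $\langle gf^\perp,\phi\rangle = \langle f^\perp, g\phi\rangle = 0$, and from the uniqueness of the orthogonal decomposition we infer that

$$gf^\perp = (gu)^\perp \qquad\text{or}\qquad E^{\mathcal G}(gu) = gf = g\,E^{\mathcal G}u.$$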
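(x): Let $0\le u\le 1$. Since $E^{\mathcal G}u\in L^2(\mathcal{G})$, the Markov inequality P10.12 implies

$$\mu\big(\{|E^{\mathcal G}u| > \tfrac1n\}\big) \le n^2\,\|E^{\mathcal G}u\|_{L^2}^2 \le n^2\,\|u\|_{L^2}^2 < \infty \tag{22.4}$$

and so $\phi_n = 1_{\{|E^{\mathcal G}u|>1/n\}}\in L^2(\mathcal{G})$. Therefore,

$$\int E^{\mathcal G}u\;1_{\{E^{\mathcal G}u<0\}}\,\phi_n\,d\mu \overset{22.4\text{(iii)}}{=} \int u\;E^{\mathcal G}\big(1_{\{E^{\mathcal G}u<0\}}\,\phi_n\big)\,d\mu \overset{22.4\text{(ix)}}{=} \int u\;1_{\{E^{\mathcal G}u<0\}}\,\phi_n\,d\mu \ge 0,$$

which is only possible if $\mu\big(\{E^{\mathcal G}u<0\}\cap\{|E^{\mathcal G}u|>1/n\}\big) = 0$, that is, if

$$\mu\big(\{E^{\mathcal G}u<0\}\big) = \mu\Big(\bigcup_{n\in\mathbb N}\{E^{\mathcal G}u<0\}\cap\{|E^{\mathcal G}u|>1/n\}\Big) = \sup_{n\in\mathbb N}\mu\big(\{E^{\mathcal G}u<0\}\cap\{|E^{\mathcal G}u|>1/n\}\big) = 0,$$

hence $E^{\mathcal G}u\ge 0$. With very similar arguments we see that $1_{\{E^{\mathcal G}u>1\}}\in L^2(\mathcal{G})$, and since $u\le 1$ we have

$$\int E^{\mathcal G}u\;1_{\{E^{\mathcal G}u>1\}}\,d\mu \overset{22.4\text{(iii),(ix)}}{=} \int u\;1_{\{E^{\mathcal G}u>1\}}\,d\mu \le \mu\big(\{E^{\mathcal G}u>1\}\big),$$

which entails $\mu(\{E^{\mathcal G}u>1\}) = 0$ or $E^{\mathcal G}u\le 1$.

(xi): Using that $w-u\ge 0$, the first part of the proof of (x) shows $E^{\mathcal G}(w-u)\ge 0$, so that by linearity $E^{\mathcal G}w\ge E^{\mathcal G}u$.

(xii): Again by the proof of (x) we find for $|u|\pm u\ge 0$ that $E^{\mathcal G}(|u|\pm u)\ge 0$, and by linearity $\pm E^{\mathcal G}u\le E^{\mathcal G}|u|$. This proves $|E^{\mathcal G}u|\le E^{\mathcal G}|u|$.

(xiii): If $\mu(X) = \infty$, we have $L^2(\{\emptyset,X\},\mu) = \{0\}$, thus $E^{\{\emptyset,X\}}u = 0$ and the formula clearly holds.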
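If $\mu(X) < \infty$, we have $L^2(\{\emptyset,X\},\mu)\simeq\mathbb{R}$, and $E^{\{\emptyset,X\}}u = c$ is a constant. By (iv), $c = E^{\{\emptyset,X\}}u$ minimizes $\|u-c\|_{L^2}$, and

$$\int (u-c)^2\,d\mu = \int u^2\,d\mu - 2c\int u\,d\mu + \int c^2\,d\mu = \int u^2\,d\mu - \frac{1}{\mu(X)}\Big(\int u\,d\mu\Big)^2 + \mu(X)\Big(\frac{1}{\mu(X)}\int u\,d\mu - c\Big)^2$$

shows that $c = \dfrac{1}{\mu(X)}\displaystyle\int u\,d\mu$ is the unique minimizer.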
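The completing-the-square identity behind (xiii) can be checked numerically on a finite measure space. The following is only a sketch under stated assumptions: the point masses `mu`, the values `u` and the helper name `quadratic_gap` are hypothetical and chosen purely for illustration.

```python
import numpy as np

def quadratic_gap(u, mu, c):
    """Return both sides of the identity used in the proof of (xiii).

    lhs = int (u - c)^2 dmu
    rhs = int u^2 dmu - (int u dmu)^2 / mu(X) + mu(X) * (mean - c)^2
    """
    total = mu.sum()                       # mu(X), assumed finite and positive
    mean = (u * mu).sum() / total          # (1 / mu(X)) * int u dmu
    lhs = ((u - c) ** 2 * mu).sum()
    rhs = (u ** 2 * mu).sum() - (u * mu).sum() ** 2 / total + total * (mean - c) ** 2
    return lhs, rhs, mean

# Hypothetical finite measure space with four points.
mu = np.array([0.5, 1.0, 2.0, 0.25])
u = np.array([3.0, -1.0, 0.5, 4.0])

lhs, rhs, mean = quadratic_gap(u, mu, c=0.7)

# Brute-force search for the best constant on a grid with step 0.005.
grid = np.linspace(-5.0, 5.0, 2001)
best = min(grid, key=lambda c: ((u - c) ** 2 * mu).sum())
```

Up to the grid resolution, the brute-force minimizer `best` agrees with the weighted mean `mean`, which is what the identity predicts.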
In Chapter 23 we will extend the operator $E^{\mathcal G}$ to $\bigcup_{p\ge 1}L^p(\mathcal{A})$ and add a few further properties.
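On the structure of subspaces of $L^2$ $^{2}$

$^{2}$ This section can be left out at first reading.

In the rest of this chapter we want to address a different question. As we have seen, $E^{\mathcal G}: L^2(\mathcal{A})\to L^2(\mathcal{G})$ is a symmetric orthogonal projection onto the closed subspace $L^2(\mathcal{G})$ of the Hilbert space $L^2(\mathcal{A})$. It is natural to ask whether every orthogonal projection $\pi: L^2(\mathcal{A})\to\mathscr{H}$ onto a closed subspace $\mathscr{H}\subset L^2(\mathcal{A})$ is a conditional expectation. Equivalently we could ask under which conditions a closed subspace $\mathscr{H}$ of $L^2(\mathcal{A})$ is of the form $L^2(\mathcal{G}) = \mathscr{H}$ for a suitable sub-$\sigma$-algebra $\mathcal{G}\subset\mathcal{A}$.

22.5 Theorem Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space. For a closed linear subspace $\mathscr{H}\subset L^2(\mathcal{A})$ and its orthogonal projection $\pi = P_{\mathscr H}: L^2(\mathcal{A})\to\mathscr{H}$, the following assertions are equivalent.

(i) $\mathscr{H} = L^2(\mathcal{G})$ and $\pi = E^{\mathcal G}$ for some sub-$\sigma$-algebra $\mathcal{G}\subset\mathcal{A}$ containing an exhausting sequence $(G_j)_{j\in\mathbb N}\subset\mathcal{G}$ with $G_j\uparrow X$ and $\mu(G_j)<\infty$.
(ii) $\pi$ is a sub-Markovian operator, i.e. $0\le u\le 1 \implies 0\le\pi u\le 1$, $u\in L^2(\mathcal{A})$, and for some $u_0\in L^2(\mathcal{A})$ with $u_0>0$ we have $\pi u_0>0$.
(iii) $\mathscr{H}\cap L^\infty(\mathcal{A})$ is an algebra, i.e. it is closed under pointwise products: $f, h\in\mathscr{H}\cap L^\infty(\mathcal{A}) \implies fh\in\mathscr{H}\cap L^\infty(\mathcal{A})$, which is $L^2$-dense in $\mathscr{H}$ and contains an (everywhere) strictly positive function $h_0>0$.
(iv) $\mathscr{H}$ is a lattice, i.e. $f, h\in\mathscr{H} \implies f\wedge h\in\mathscr{H}$, containing an (everywhere) strictly positive function $h_0>0$, and for all $h\in\mathscr{H}$ also $h\wedge 1\in\mathscr{H}$.

Proof We show that (i)⇒(ii)⇒(iv)⇒(i)⇒(iii)⇒(iv).

(i)⇒(ii) The sub-Markov property of $\pi = E^{\mathcal G}$ follows from Theorem 22.4(x), while

$$u_0 = \sum_{j=1}^\infty\frac{2^{-j}}{\mu(G_j)+1}\,1_{G_j}\in\mathcal{M}^+(\mathcal{G})$$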
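clearly satisfies $0<u_0\le 1$ and

$$\|u_0\|_2 = \Big\|\sum_{j=1}^\infty\frac{2^{-j}}{\mu(G_j)+1}\,1_{G_j}\Big\|_2 \le \sum_{j=1}^\infty\frac{2^{-j}}{\mu(G_j)+1}\,\|1_{G_j}\|_2 = \sum_{j=1}^\infty 2^{-j}\,\frac{\sqrt{\mu(G_j)}}{\mu(G_j)+1} \le 1,$$

so that $u_0\in L^2(\mathcal{G})$ and, therefore, $0<u_0 = \pi u_0\le 1$.

(ii)⇒(iv) Since $\pi$ preserves positivity, we find for all $u\in L^2(\mathcal{A})$ that $\pi(u^+)\ge 0$ and $\pi u = \pi(u^+)-\pi(u^-)\le\pi(u^+)$, thus $(\pi u)\vee 0\le\pi(u^+)$. On the other hand, $\mathscr{H} = \{h\in L^2(\mathcal{A}) : \pi h = h\}$ and the above calculation shows for $h\in\mathscr{H}$

$$h^+ = (\pi h)^+ = (\pi h)\vee 0\le\pi(h^+). \tag{22.5}$$

Since $\pi$ is a contraction, see (21.4), we find also

$$\|h^+\|_2 \overset{(22.5)}{\le} \|\pi(h^+)\|_2 \le \|h^+\|_2,$$

which implies $\langle h^+, h^+\rangle = \langle\pi(h^+),\pi(h^+)\rangle = \langle\pi(h^+), h^+\rangle$. Because of (22.5) we get $\langle\pi(h^+)-h^+, h^+\rangle = 0$ or $\pi(h^+) = h^+$ on the set $\{h^+>0\}$. But then

$$\|\pi(h^+)\|_2^2 = \|h^+\|_2^2 \implies \int_{\{h^+=0\}}\big(\pi(h^+)\big)^2\,d\mu = \int_{\{h^+=0\}}(h^+)^2\,d\mu = 0,$$

which shows that $\pi(h^+) = 0$ on $\{h^+=0\}$ or $\mu(\{h^+=0\}) = 0$. In either case, $\pi(h^+) = h^+$ (almost everywhere) and $h^+\in\mathscr{H}$. Consequently, $f\wedge h = f-(f-h)^+\in\mathscr{H}$.

Similarly, $h\wedge 1 = h-(h-1)^+$ and, if $h\in\mathscr{H}$, we see $\pi(h\wedge 1)\le(\pi h)\wedge 1 = h\wedge 1$. Further, $(h-1)^+ = h-h\wedge 1\le\pi h-\pi(h\wedge 1) = \pi\big((h-1)^+\big)$ and since $\pi$ is a contraction, the same argument which we used to get $\pi(h^+) = h^+$ yields $\pi\big((h-1)^+\big) = (h-1)^+$, hence $(h-1)^+,\ h\wedge 1\in\mathscr{H}$. Finally, $h_0 = \pi u_0$, $u_0$ as in (ii), satisfies $h_0\in\mathscr{H}$ and $h_0>0$.

(iv)⇒(i) We set

$$\mathcal{G} = \{G\in\mathcal{A} : h\wedge 1_G\in\mathscr{H}\quad\forall\,h\in\mathscr{H}\}.$$

Let us first show that $\mathcal{G}$ is a $\sigma$-algebra. Clearly, $\emptyset, X\in\mathcal{G}$. If $G\in\mathcal{G}$, then

$$h\wedge 1_{G^c} + \underbrace{h\wedge 1_G}_{\in\,\mathscr H} = h\wedge 1 + h\wedge 0\in\mathscr{H}\qquad\forall\,h\in\mathscr{H},$$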
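which means that $h\wedge 1_{G^c}\in\mathscr{H}$ and $G^c\in\mathcal{G}$. For any two sets $G, H\in\mathcal{G}$ we see$^{3}$

$$h\wedge 1_{G\cup H} = h\wedge(1_G\vee 1_H) = (h\wedge 1_G)\vee(h\wedge 1_H)\in\mathscr{H}\qquad\forall\,h\in\mathscr{H},$$

so that $G\cup H\in\mathcal{G}$. Finally, let $(G_j)_{j\in\mathbb N}\subset\mathcal{G}$; since $\mathcal{G}$ is $\cup$-stable, we may assume that $G_j\uparrow G = \bigcup_{j\in\mathbb N}G_j$. Then $(h\wedge 1_{G_j})_{j\in\mathbb N}\subset\mathscr{H}$ and

$$\lim_{j\to\infty}h\wedge 1_{G_j} = h\wedge 1_G\in L^2(\mathcal{A})\qquad\forall\,h\in\mathscr{H}.$$

Since $|h\wedge 1_{G_j}|\le|h\wedge 1_G|\in L^2(\mathcal{A})$, an application of the dominated convergence theorem 12.9 shows that in $L^2$, $\lim_{j\to\infty}h\wedge 1_{G_j} = h\wedge 1_G$ for all $h\in\mathscr{H}$. Since $\mathscr{H}$ is a closed subspace, we conclude that $h\wedge 1_G\in\mathscr{H}$ for all $h\in\mathscr{H}$, thus $G\in\mathcal{G}$.

We will now show that $L^2(\mathcal{G}) = \mathscr{H}$. If $f\in\mathscr{H}$ we know from our assumptions that $(\pm f)\wedge 0\in\mathscr{H}$, so that $f^+ = -\big((-f)\wedge 0\big)$, $f^- = -(f\wedge 0)\in\mathscr{H}$. Thus $\mathscr{H} = \mathscr{H}^+-\mathscr{H}^+$, and since also $L^2(\mathcal{G}) = L^2_+(\mathcal{G})-L^2_+(\mathcal{G})$ it is clearly enough to show that $\mathscr{H}^+ = L^2_+(\mathcal{G})$.

Assume that $f\in\mathscr{H}^+$. Then for $a>0$, $f\wedge a = a\big(\tfrac{f}{a}\wedge 1\big)\in\mathscr{H}$, and by monotone convergence T11.1 and the closedness of $\mathscr{H}$,

$$h\wedge 1_{\{f>a\}} = h\wedge\sup_{n\in\mathbb N}\Big(\big[nf-n(f\wedge a)\big]\wedge 1\Big)\in\mathscr{H}\qquad\forall\,h\in\mathscr{H},$$

proving that $\{f>a\}\in\mathcal{G}$ for all $a>0$. Moreover, $\{f>a\} = X$ if $a<0$ and $\{f>0\} = \bigcup_{k\in\mathbb N}\{f>1/k\}$, which shows that $\{f>a\}\in\mathcal{G}$ for all $a\in\mathbb{R}$ and, consequently, $\mathscr{H}^+\subset L^2_+(\mathcal{G})$.

Conversely, if $g\in L^2_+(\mathcal{G})$, we can write $g$ as limit of simple functions, $g_n = \sum_{j=1}^{N(n)}y_j^n\,1_{G_j^n}$ with disjoint sets $G_1^n,\dots,G_{N(n)}^n\in\mathcal{G}$ and $y_j^n>0$. For all $h\in\mathscr{H}^+$ we find

$$g_n\wedge h = \sum_{j=1}^{N(n)}\big(y_j^n\,1_{G_j^n}\big)\wedge h = \sum_{j=1}^{N(n)}y_j^n\Big(1_{G_j^n}\wedge\frac{h}{y_j^n}\Big)\in\mathscr{H},$$

and dominated convergence T12.9 and the closedness of $\mathscr{H}$ imply that $g\wedge h\in\mathscr{H}$. Choosing, in particular, $h = n\,h_0$ for some a.e. strictly positive function $h_0$ and letting $n\to\infty$ gives

$$g = L^2\text{-}\lim_{n\to\infty}(n\,h_0)\wedge g\in\mathscr{H}^+,$$

where we again used monotone convergence T11.1 and the closedness of $\mathscr{H}$. This proves $L^2_+(\mathcal{G})\subset\mathscr{H}^+$.

Finally, the sets $G_j = \{h_0>1/j\}\in\mathcal{G}$ satisfy $G_j\uparrow X$ and, because of the Markov inequality P10.12, $\mu(G_j) = \mu(\{h_0>1/j\})\le j^2\int h_0^2\,d\mu<\infty$.

$^{3}$ Use that $\mathscr{H}$ is a vector space and $a\vee b = -\big((-a)\wedge(-b)\big)$.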
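(i)⇒(iii): Note that $\mathscr{H}\cap L^\infty(\mathcal{A}) = L^2(\mathcal{G})\cap L^\infty(\mathcal{G})$. An application of the dominated convergence theorem 12.9 shows that the sequence $f_n = (-n)\vee f\wedge n$, $n\in\mathbb{N}$, $f\in L^2(\mathcal{G})$, converges in $L^2$ to $f$, i.e. $L^2(\mathcal{G})\cap L^\infty(\mathcal{G})$ is a dense subset of $L^2(\mathcal{G})$. The element $h_0>0$ is now constructed as in the proof of (i)⇒(ii). That $L^2(\mathcal{G})\cap L^\infty(\mathcal{G})$ is an algebra is trivial.

(iii)⇒(iv): Let us show, first of all, that $\mathscr{H}\cap L^\infty(\mathcal{A})$ is stable under minima. To this end we define recursively a sequence of polynomials in $\mathbb{R}$,

$$p_0(x) = 0,\qquad p_{n+1}(x) = p_n(x)+\tfrac12\big(x^2-p_n^2(x)\big),\qquad n\in\mathbb{N}_0.$$

By induction it is easy to see that $p_n(0) = 0$ for all $n\in\mathbb{N}_0$ and that

$$0\le p_n(x)\le p_{n+1}(x)\le|x|\qquad\forall\,x\in[-1,1].$$

For $n = 0$ there is nothing to show. Otherwise we can use the induction assumption $p_n(x)\le p_{n+1}(x)\le|x|$ to get

$$0\le p_{n+1}(x)\le\underbrace{p_{n+1}(x)+\tfrac12\big(x^2-p_{n+1}^2(x)\big)}_{=\,p_{n+2}(x)\ \text{by definition}} = |x|-\underbrace{\big(|x|-p_{n+1}(x)\big)}_{\ge 0}\cdot\underbrace{\Big(1-\tfrac12\big(|x|+p_{n+1}(x)\big)\Big)}_{\ge 0\ \text{for}\ x\in[-1,1]}\le|x|.$$

Therefore, $\lim_{n\to\infty}p_n(x) = \sup_{n\in\mathbb N}p_n(x) = \varrho(x)$ exists for all $|x|\le 1$ and, according to the recursion relation, $\varrho(x) = |x|$. Since $\mathscr{H}\cap L^\infty(\mathcal{A})$ is a linear subspace which is stable under products, we get for every $h\in\mathscr{H}\cap L^\infty(\mathcal{A})$ that $p_n(h/\|h\|_\infty)\in\mathscr{H}$, and monotone convergence T11.1 and the closedness of $\mathscr{H}$ show

$$\frac{|h|}{\|h\|_\infty} = \sup_{n\in\mathbb N}p_n\Big(\frac{h}{\|h\|_\infty}\Big)\in\mathscr{H} \implies |h|\in\mathscr{H}.$$

As $\mathscr{H}\cap L^\infty(\mathcal{A})$ is dense in $\mathscr{H}$, we find for $h\in\mathscr{H}$ a sequence $(h_k)_{k\in\mathbb N}\subset\mathscr{H}\cap L^\infty(\mathcal{A})$ such that $L^2$-$\lim_{k\to\infty}h_k = h$. From above we know, however, that $|h_k|\in\mathscr{H}$ and $|h_k|\to|h|$ in $L^2$ as $k\to\infty$, thus $|h|\in\mathscr{H}$. This shows, in particular, that

$$f\wedge h = \tfrac12\big(f+h-|f-h|\big)\in\mathscr{H}\qquad\forall\,f, h\in\mathscr{H}.$$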
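The behaviour of the recursion $p_{n+1}(x) = p_n(x)+\tfrac12(x^2-p_n^2(x))$ used in the step (iii)⇒(iv) can be observed numerically. The sketch below is an illustration only; the grid on $[-1,1]$, the number of steps and the helper name `abs_approximants` are assumptions made for this example.

```python
import numpy as np

def abs_approximants(x, steps):
    """Iterate p_{n+1} = p_n + (x^2 - p_n^2) / 2 starting from p_0 = 0."""
    p = np.zeros_like(x, dtype=float)
    history = [p.copy()]
    for _ in range(steps):
        p = p + 0.5 * (x ** 2 - p ** 2)
        history.append(p.copy())
    return history

x = np.linspace(-1.0, 1.0, 201)
history = abs_approximants(x, steps=200)

# Monotonicity p_n <= p_{n+1} and the bound p_n <= |x| (small float tolerance).
increasing = all(np.all(b >= a - 1e-12) for a, b in zip(history, history[1:]))
below_abs = all(np.all(p <= np.abs(x) + 1e-12) for p in history)

# Uniform distance between the last iterate and |x| on the grid.
max_error = float(np.max(np.abs(history[-1] - np.abs(x))))
```

Since $|x|-p_{n+1}(x) = (|x|-p_n(x))\big(1-\tfrac12(|x|+p_n(x))\big)\le(|x|-p_n(x))(1-\tfrac12|x|)$, the error after $n$ steps is at most $|x|(1-\tfrac12|x|)^n$, so the convergence is uniform on $[-1,1]$ but slow near $x=0$.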
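Since $0<h_0\le\|h_0\|_\infty$, we get for all $n>\|h_0\|_\infty$ that

$$\frac{n}{n-h_0} = \sum_{j=0}^\infty\Big(\frac{h_0}{n}\Big)^j = \sup_{N\in\mathbb N}\sum_{j=0}^N\Big(\frac{h_0}{n}\Big)^j$$

and for $h\in\mathscr{H}$,

$$h\wedge\frac{n}{n-h_0} = \lim_{N\to\infty}\Big(\sum_{j=0}^N\Big(\frac{h_0}{n}\Big)^j\Big)\wedge h.$$

By monotone convergence T11.1 we conclude that $h\wedge\frac{n}{n-h_0}\in\mathscr{H}$. Finally, as $\frac{n}{n-h_0}\downarrow 1$ and $\big(h\wedge\frac{n}{n-h_0}\big)^2\le h^2$, we can use the dominated convergence theorem 12.9 and the closedness of $\mathscr{H}$ to see that $h\wedge 1\in\mathscr{H}$.

Problems

22.1. Let $(X,\mathcal{A},\mu)$ be a measure space and $\mathcal{H}\subset\mathcal{G}\subset\mathcal{A}$ be two sub-$\sigma$-algebras. Show that

$$E^{\mathcal G}E^{\mathcal H}u = E^{\mathcal H}E^{\mathcal G}u = E^{\mathcal H}u\qquad\forall\,u\in L^2(\mathcal{A}).$$

22.2. Let $(X,\mathcal{A},\mu)$ be a measure space, $\mathcal{G}\subset\mathcal{A}$ be a sub-$\sigma$-algebra and let $\nu = f\mu$ where $f\in\mathcal{M}^+(\mathcal{A})$ is a density $f>0$.
(i) Denote by $E_\mu^{\mathcal G}$ resp. $E_\nu^{\mathcal G}$ the projections in the spaces $L^2(\mathcal{A},\mu)$, resp. $L^2(\mathcal{A},\nu)$. Express $E_\nu^{\mathcal G}$ in terms of $E_\mu^{\mathcal G}$. [Hint: $E_\nu^{\mathcal G}u = \dfrac{E_\mu^{\mathcal G}(fu)}{E_\mu^{\mathcal G}f}\,1_{\{E_\mu^{\mathcal G}f>0\}}$.]
(ii) Under which condition do we have $E_\nu^{\mathcal G}u = E_\mu^{\mathcal G}u$ for all $u\in L^2(\mathcal{A},\mu)\cap L^2(\mathcal{A},\nu)$?
Remark. The above result allows us to study conditional expectations for finite measures $\mu$ only and to define for more general measures $\nu$ a conditional expectation by

$$E_\nu^{\mathcal G}u = \frac{E_\mu^{\mathcal G}(fu)}{E_\mu^{\mathcal G}f}\,1_{\{E_\mu^{\mathcal G}f>0\}}.$$

22.3. Let $(X,\mathcal{A},\mu)$ be a finite measure space, $G_1,\dots,G_n\in\mathcal{A}$ such that $G_1,\dots,G_n$ are disjoint with $\bigcup_{j=1}^n G_j = X$ and $\mu(G_j)>0$ for all $j = 1, 2,\dots,n$. Then

$$E^{\sigma(G_1,\dots,G_n)}u = \sum_{j=1}^n\Big(\int_{G_j}u(x)\,\frac{\mu(dx)}{\mu(G_j)}\Big)\,1_{G_j}.$$

Remark. The measure $1_{G_j}\mu/\mu(G_j) = \mu(\,\cdot\cap G_j)/\mu(G_j)$ is often called the conditional probability given $G_j$.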
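The block-average formula of Problem 22.3 is easy to experiment with on a finite set. The sketch below is not part of the text: the point masses, the random data and the helper names `cond_exp` and `inner` are hypothetical. It computes the conditional expectation with respect to a partition as weighted block averages and checks properties (ii), (iii) and (vii) of Theorem 22.4 numerically.

```python
import numpy as np

def cond_exp(u, mu, labels):
    """Conditional expectation of u given the partition encoded by `labels`.

    On each block G_j the result equals the mu-weighted average
    (1 / mu(G_j)) * int_{G_j} u dmu, as in Problem 22.3.
    """
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    for j in np.unique(labels):
        block = labels == j
        out[block] = (u[block] * mu[block]).sum() / mu[block].sum()
    return out

def inner(f, g, mu):
    """Scalar product <f, g> = int f g dmu on the finite space."""
    return float((f * g * mu).sum())

rng = np.random.default_rng(0)
n = 12
mu = rng.uniform(0.1, 1.0, size=n)       # strictly positive point masses
u = rng.normal(size=n)
w = rng.normal(size=n)

fine = np.repeat(np.arange(6), 2)        # 6 blocks, generating G
coarse = np.repeat(np.arange(3), 4)      # 3 blocks, generating H with H contained in G

Eu, Ew = cond_exp(u, mu, fine), cond_exp(w, mu, fine)

symmetry_gap = abs(inner(Eu, w, mu) - inner(u, Ew, mu))        # 22.4(iii)
contraction = inner(Eu, Eu, mu) <= inner(u, u, mu) + 1e-12     # 22.4(ii)
tower_gap = float(np.max(np.abs(cond_exp(Eu, mu, coarse)
                                - cond_exp(u, mu, coarse))))   # 22.4(vii)
```

Encoding the partition as an integer label per point keeps the example short; every block of `coarse` is a union of blocks of `fine`, which is what the tower property requires.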
23 Conditional expectations in Lp
Throughout this chapter $(X,\mathcal{A},\mu)$ will be some measure space. Our aim is to extend the operator $E^{\mathcal G}$ from $L^2(\mathcal{A})$ to a wider class of functions including the spaces $L^p(\mathcal{A})$, $1\le p\le\infty$. We will use the same technique that allowed us in Chapters 9 and 10 to extend the integral from the simple functions $\mathcal{E}(\mathcal{A})$ to the positive measurable functions $\mathcal{M}^+(\mathcal{A})$ and integrable functions $\mathcal{L}^1(\mathcal{A})$. Since we are now considering the spaces $L^p(\mathcal{A})$ of (equivalence classes of) $p$th power integrable functions, it is convenient to have an analogous notion for measurable functions.

23.1 Definition Let $(X,\mathcal{A},\mu)$ be a measure space. Two functions $u, v\in\mathcal{M}(\mathcal{A})$ are called equivalent, $u\sim v$, if $\{u\ne v\}\in\mathcal{A}$ is a $\mu$-null set. We write $M(\mathcal{A}) = \mathcal{M}(\mathcal{A})/\!\sim$ for the set of all equivalence classes of measurable functions $u\in\mathcal{M}(\mathcal{A})$.

As with $L^p$-functions, all (in-)equalities between elements from $M(\mathcal{A})$ hold pointwise almost everywhere.

23.2 Lemma Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space. Then $u\in M^+(\mathcal{A})$ if, and only if, there exists an increasing sequence $(u_j)_{j\in\mathbb N}\subset L^2_+(\mathcal{A})$ such that $u = \sup_{j\in\mathbb N}u_j$.

Proof The 'if' part '⇐' is trivial since suprema of countably many measurable functions are again measurable (C8.9). For '⇒' let $(A_j)_{j\in\mathbb N}\subset\mathcal{A}$ be an exhausting sequence such that $A_j\uparrow X$ and $\mu(A_j)<\infty$. If $u\in M^+(\mathcal{A})$, then $u_k = (u\wedge k)\,1_{A_k}\in L^2_+(\mathcal{A})$ and $\sup_{k\in\mathbb N}u_k = u$.

23.3 Remark Lemma 23.2 is no longer true if $(X,\mathcal{A},\mu)$ is not $\sigma$-finite. In fact, if $1\in M^+(\mathcal{A})$ can be approximated by an increasing sequence $(u_k)_{k\in\mathbb N}\subset L^2_+(\mathcal{A})$, $1 = \sup_{k\in\mathbb N}u_k$, the sets $A_k = \{u_k>1/k\}$ would form an increasing sequence $A_k\uparrow X$ with $\mu(A_k) = \mu(\{u_k>1/k\})\le k^2\int u_k^2\,d\mu<\infty$ by the Markov inequality P10.12.
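The key technical point is the following result.

23.4 Lemma Let $(X,\mathcal{A},\mu)$ be a measure space, $\mathcal{G}\subset\mathcal{A}$ be a sub-$\sigma$-algebra, and $(u_j)_{j\in\mathbb N}, (w_j)_{j\in\mathbb N}\subset L^2(\mathcal{A})$ be two increasing sequences. Then

$$\sup_{j\in\mathbb N}u_j = \sup_{j\in\mathbb N}w_j \implies \sup_{j\in\mathbb N}E^{\mathcal G}u_j = \sup_{j\in\mathbb N}E^{\mathcal G}w_j. \tag{23.1}$$

If $u = \sup_{j\in\mathbb N}u_j$ is in $L^2(\mathcal{A})$, the following conditional monotone convergence property holds:

$$\sup_{j\in\mathbb N}E^{\mathcal G}u_j = E^{\mathcal G}\Big(\sup_{j\in\mathbb N}u_j\Big) = E^{\mathcal G}u \tag{23.2}$$

in $L^2(\mathcal{G})$ and almost everywhere.

Proof Let us first of all assume that $u_j\uparrow u$ and $u\in L^2(\mathcal{A})$. Monotone convergence 11.1 and Theorem 12.7 show that $u_j\to u$ also in $L^2$-sense. By Theorem 22.4(xi), $(E^{\mathcal G}u_j)_{j\in\mathbb N}$ is again an increasing sequence and $E^{\mathcal G}u_j\le E^{\mathcal G}u$. From 22.4(ii), (vi) we get

$$\|E^{\mathcal G}u-E^{\mathcal G}u_j\|_2 = \|E^{\mathcal G}(u-u_j)\|_2\le\|u-u_j\|_2,$$

i.e. $L^2$-$\lim_{j\to\infty}E^{\mathcal G}u_j = E^{\mathcal G}u$. For a subsequence $(u_{j_k})_{k\in\mathbb N}\subset(u_j)_{j\in\mathbb N}$ we even have $\lim_{k\to\infty}E^{\mathcal G}u_{j_k} = E^{\mathcal G}u$ a.e., cf. Corollary 12.8. Because of the monotonicity of the sequence $(E^{\mathcal G}u_j)_{j\in\mathbb N}$, we get for all $j>j_k$

$$|E^{\mathcal G}u-E^{\mathcal G}u_j| = E^{\mathcal G}u-E^{\mathcal G}u_j\le E^{\mathcal G}u-E^{\mathcal G}u_{j_k},$$

and letting first $j\to\infty$ and then $k\to\infty$ gives $E^{\mathcal G}u_j\uparrow E^{\mathcal G}u$ a.e. This finishes the proof of (23.2).

If $(u_j)_{j\in\mathbb N}, (w_j)_{j\in\mathbb N}\subset L^2(\mathcal{A})$ are any two increasing sequences$^{1}$ such that $\sup_{j\in\mathbb N}u_j = \sup_{j\in\mathbb N}w_j$, we can apply (23.2) to the increasing sequences $u_j\wedge w_k\uparrow u_j$ (as $k\to\infty$ and for fixed $j$) and $u_j\wedge w_k\uparrow w_k$ (as $j\to\infty$ and for fixed $k$). This shows

$$\sup_{j\in\mathbb N}E^{\mathcal G}u_j \overset{(23.2)}{=} \sup_{j\in\mathbb N}\sup_{k\in\mathbb N}E^{\mathcal G}(u_j\wedge w_k) = \sup_{k\in\mathbb N}\sup_{j\in\mathbb N}E^{\mathcal G}(u_j\wedge w_k) \overset{(23.2)}{=} \sup_{k\in\mathbb N}E^{\mathcal G}w_k.$$

A combination of Lemmata 23.2 and 23.4 allows us to define conditional expectations for positive measurable functions in a $\sigma$-finite measure space.

$^{1}$ We do not assume that $\sup_{j\in\mathbb N}u_j,\ \sup_{j\in\mathbb N}w_j\in L^2(\mathcal{A})$.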
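23.5 Definition Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and $\mathcal{G}\subset\mathcal{A}$ be a sub-$\sigma$-algebra. Let $u\in M^+(\mathcal{A})$ and let $(u_j)_{j\in\mathbb N}\subset L^2_+(\mathcal{A})$ be an increasing sequence such that $u = \sup_{j\in\mathbb N}u_j$. Then

$$\mathbf{E}^{\mathcal G}u = \sup_{j\in\mathbb N}E^{\mathcal G}u_j \tag{23.3}$$

is called the conditional expectation of $u$ with respect to $\mathcal{G}$. If $u\in M(\mathcal{A})$ and $\mathbf{E}^{\mathcal G}u^\pm\in\mathbb{R}$ almost everywhere, we define (almost everywhere)

$$\mathbf{E}^{\mathcal G}u = \mathbf{E}^{\mathcal G}u^+-\mathbf{E}^{\mathcal G}u^- = \lim_{j\to\infty}\big(E^{\mathcal G}u_j^+-E^{\mathcal G}u_j^-\big) \tag{23.4}$$

where $u_j^\pm\uparrow u^\pm$ are suitable approximating sequences from $L^2_+(\mathcal{A})$. We write $L^{\mathcal G}(\mathcal{A})$ for the set of all functions $u\in M(\mathcal{A})$ such that (almost everywhere) $\mathbf{E}^{\mathcal G}u$ exists and is finite.

23.6 Theorem Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space. The conditional expectation $\mathbf{E}^{\mathcal G}$ extends $E^{\mathcal G}$, i.e. $L^2(\mathcal{A})\subset L^{\mathcal G}(\mathcal{A})$ and $\mathbf{E}^{\mathcal G}u = E^{\mathcal G}u$ for all $u\in L^2(\mathcal{A})$.

Proof Applying (23.2) to $u^+$ and $u^-$ shows $\mathbf{E}^{\mathcal G}u^\pm = E^{\mathcal G}u^\pm$ and, in particular, $\mathbf{E}^{\mathcal G}u^\pm\in L^2(\mathcal{G})$. As such, $\mathbf{E}^{\mathcal G}u^\pm$ is a.e. real-valued, so that (23.4) is always defined in $M(\mathcal{A})$, resp. $M(\mathcal{G})$.

23.7 Theorem Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space. Then $L^{\mathcal G}(\mathcal{A})$ is a vector space and $L^p(\mathcal{A})\subset L^{\mathcal G}(\mathcal{A})$ for all $1\le p\le\infty$.

Proof Let $1<p<\infty$ and take $u\in L^p(\mathcal{A})\cap L^2(\mathcal{A})$. Since $E^{\mathcal G}u\in L^2(\mathcal{G})$, the Markov inequality P10.12 shows that $\{|E^{\mathcal G}u| = \infty\}\in\mathcal{G}$ is a $\mu$-null set and that the sets $G_n = \{n>|E^{\mathcal G}u|>1/n\}\in\mathcal{G}$ have finite $\mu$-measure,

$$\mu(G_n)\le\mu\big(\{|E^{\mathcal G}u|>1/n\}\big)\le n^2\int(E^{\mathcal G}u)^2\,d\mu<\infty.$$

Moreover,

$$\big\|(E^{\mathcal G}u)\,1_{G_n}\big\|_p^p = \big\langle E^{\mathcal G}u,\ |E^{\mathcal G}u|^{p-1}\operatorname{sgn}(E^{\mathcal G}u)\,1_{G_n}\big\rangle = \big\langle u,\ |E^{\mathcal G}u|^{p-1}\operatorname{sgn}(E^{\mathcal G}u)\,1_{G_n}\big\rangle\le C_q\,\|u\|_p$$

(by T22.4(iii), (ix)), where we used Hölder's inequality T12.2 with $p^{-1}+q^{-1} = 1$ and

$$C_q = \Big(\int|E^{\mathcal G}u|^{(p-1)q}\,1_{G_n}\,d\mu\Big)^{1/q} = \Big(\int|E^{\mathcal G}u|^p\,1_{G_n}\,d\mu\Big)^{1/q}.$$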
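Dividing the above inequality by $C_q$ (if $C_q = 0$ there is nothing to show since in this instance $(E^{\mathcal G}u)\,1_{G_n} = 0$) gives

$$\big\|(E^{\mathcal G}u)\,1_{G_n}\big\|_p\le\|u\|_p.$$

As we have seen above, $\mu(\{|E^{\mathcal G}u| = \infty\}) = 0$, so we can use Beppo Levi's theorem 9.6 to find for all $u\in L^2(\mathcal{A})\cap L^p(\mathcal{A})$

$$\|E^{\mathcal G}u\|_p = \big\|(E^{\mathcal G}u)\,1_{\{0<|E^{\mathcal G}u|<\infty\}}\big\|_p = \sup_{n\in\mathbb N}\big\|(E^{\mathcal G}u)\,1_{G_n}\big\|_p\le\|u\|_p. \tag{23.5}$$

For general $u\in L^p(\mathcal{A})\subset M(\mathcal{A})$, $1<p<\infty$, we use Lemma 23.2 to find sequences $(u_j^\pm)_{j\in\mathbb N}\subset L^2_+(\mathcal{A})$ such that $0\le u_j^\pm\uparrow u^\pm$. Writing $u_j := u_j^+-u_j^-$ we have $|u_j|\le|u|$, and T22.4(x)–(xii) and Fatou's lemma T9.11 show

$$\|\mathbf{E}^{\mathcal G}u\|_p \overset{(23.4)}{=} \Big\|\lim_{j\to\infty}E^{\mathcal G}(u_j^+-u_j^-)\Big\|_p\le\liminf_{j\to\infty}\|E^{\mathcal G}u_j\|_p\le\liminf_{j\to\infty}\big\|E^{\mathcal G}|u_j|\big\|_p \overset{(23.5)}{\le} \liminf_{j\to\infty}\|u_j\|_p\le\|u\|_p, \tag{23.6}$$

which, in turn, shows that $\mathbf{E}^{\mathcal G}u$ is a well-defined function in $L^p(\mathcal{G})$.

If $p = 1$ and $u\in L^2(\mathcal{A})\cap L^1(\mathcal{A})$, the estimate (23.5) follows more easily from

$$\big\|(E^{\mathcal G}u)\,1_{G_n}\big\|_1 = \big\langle E^{\mathcal G}u,\ \operatorname{sgn}(E^{\mathcal G}u)\,1_{G_n}\big\rangle \overset{22.4\text{(iii),(ix)}}{=} \big\langle u,\ \operatorname{sgn}(E^{\mathcal G}u)\,1_{G_n}\big\rangle\le\|u\|_1$$

for all $u\in L^2(\mathcal{A})\cap L^1(\mathcal{A})$ (notation as above), and we conclude that $\|\mathbf{E}^{\mathcal G}u\|_1\le\|u\|_1$ for $u\in L^1(\mathcal{A})$ and that $\mathbf{E}^{\mathcal G}u$ is well-defined.

If $p = \infty$ and $u\in L^2(\mathcal{A})\cap L^\infty(\mathcal{A})$ we get from T22.4 and the observation that $|u|/\|u\|_\infty\le 1$

$$\Big|E^{\mathcal G}\Big(\frac{u}{\|u\|_\infty}\Big)\Big|\le E^{\mathcal G}\Big(\frac{|u|}{\|u\|_\infty}\Big)\le 1,$$

thus $\|E^{\mathcal G}u\|_\infty\le\|u\|_\infty$ for all $u\in L^2(\mathcal{A})\cap L^\infty(\mathcal{A})$. This extends to $\mathbf{E}^{\mathcal G}u$ for general $u\in L^\infty(\mathcal{A})$ as in the cases where $p\in[1,\infty)$.

It remains to show that $L^{\mathcal G}(\mathcal{A})$ is a vector space. This follows immediately from the formula $\mathbf{E}^{\mathcal G}(\alpha u+\beta w) = \alpha\,\mathbf{E}^{\mathcal G}u+\beta\,\mathbf{E}^{\mathcal G}w$, which is easily proved from the definition of $\mathbf{E}^{\mathcal G}$ via approximating sequences and the corresponding property (vi) for $E^{\mathcal G}$ of Theorem 22.4.

The properties of $\mathbf{E}^{\mathcal G}$ resemble those of $E^{\mathcal G}$. The theorem below is the analogue of Theorem 22.4.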
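23.8 Theorem Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and let $\mathcal{H}\subset\mathcal{G}\subset\mathcal{A}$ be sub-$\sigma$-algebras. The conditional expectation $\mathbf{E}^{\mathcal G}$ has the following properties ($u, w\in L^{\mathcal G}(\mathcal{A})$):

(i) $\mathbf{E}^{\mathcal G}u\in M(\mathcal{G})$;
(ii) $u\in L^p(\mathcal{A}) \implies \mathbf{E}^{\mathcal G}u\in L^p(\mathcal{G})$ and $\|\mathbf{E}^{\mathcal G}u\|_p\le\|u\|_p$, $p\in[1,\infty]$;
(iii) $\langle\mathbf{E}^{\mathcal G}u, w\rangle = \langle u,\mathbf{E}^{\mathcal G}w\rangle = \langle\mathbf{E}^{\mathcal G}u,\mathbf{E}^{\mathcal G}w\rangle$ for $u, w\in L^{\mathcal G}(\mathcal{A})$, $u\,\mathbf{E}^{\mathcal G}w\in L^1(\mathcal{A})$, e.g. if $u\in L^p(\mathcal{A})$ and $w\in L^q(\mathcal{A})$ with $p^{-1}+q^{-1} = 1$;
(iv) $u = w \implies \mathbf{E}^{\mathcal G}u = \mathbf{E}^{\mathcal G}w$;
(v) $\mathbf{E}^{\mathcal G}(\alpha u+\beta w) = \alpha\,\mathbf{E}^{\mathcal G}u+\beta\,\mathbf{E}^{\mathcal G}w$, $\alpha,\beta\in\mathbb{R}$;
(vi) $\mathcal{G}\supset\mathcal{H} \implies \mathbf{E}^{\mathcal H}\mathbf{E}^{\mathcal G}u = \mathbf{E}^{\mathcal H}u$;
(vii) $g\in M(\mathcal{G})$, $u\in L^{\mathcal G}(\mathcal{A}) \implies gu\in L^{\mathcal G}(\mathcal{A})$ and $\mathbf{E}^{\mathcal G}(gu) = g\,\mathbf{E}^{\mathcal G}u$;
(viii) $M(\mathcal{G})\subset L^{\mathcal G}(\mathcal{A})$ and $\mathbf{E}^{\mathcal G}g = g\,\mathbf{E}^{\mathcal G}1$ for all $g\in M(\mathcal{G})$;
(viii′) if $\mu|_{\mathcal G}$ is $\sigma$-finite, $1 = \mathbf{E}^{\mathcal G}1$ and $g = \mathbf{E}^{\mathcal G}g$ for all $g\in M(\mathcal{G})$;
(ix) $0\le u\le 1 \implies 0\le\mathbf{E}^{\mathcal G}u\le 1$;
(x) $u\le w \implies \mathbf{E}^{\mathcal G}u\le\mathbf{E}^{\mathcal G}w$;
(xi) $|\mathbf{E}^{\mathcal G}u|\le\mathbf{E}^{\mathcal G}|u|$;
(xii) $\mathbf{E}^{\{\emptyset,X\}}u = \dfrac{1}{\mu(X)}\displaystyle\int u\,d\mu$ for all $u\in L^1(\mathcal{A})$ $\big(\tfrac{1}{\infty} := 0\big)$.

Proof (i) is clear from the definition of $\mathbf{E}^{\mathcal G}$; (ii), (v) were already proved in Theorem 23.7. (iii), (iv), (ix), (x) follow by approximation from the corresponding properties of $E^{\mathcal G}$ from Theorem 22.4, and (xi) is derived from (ix) exactly as in the $L^2$-case.

(vi) Without loss of generality it is enough to consider the case $u\ge 0$. Pick a sequence $(u_j)_{j\in\mathbb N}\subset L^2(\mathcal{A})$ such that $u_j\uparrow u$. By Lemma 23.4 and the definition of $\mathbf{E}^{\mathcal G}$ we know that $E^{\mathcal G}u_j\uparrow\mathbf{E}^{\mathcal G}u$ as well as $E^{\mathcal H}u_j\uparrow\mathbf{E}^{\mathcal H}u$ and $E^{\mathcal H}E^{\mathcal G}u_j\uparrow\mathbf{E}^{\mathcal H}\mathbf{E}^{\mathcal G}u$. Since $E^{\mathcal H}E^{\mathcal G}u_j = E^{\mathcal H}u_j$, by Theorem 22.4(vii), we are done.

(vii) Assume first that $g, u\ge 0$. Define $g_j = g\wedge j\in L^\infty_+(\mathcal{G})$ and let $(u_j)_{j\in\mathbb N}\subset L^2_+(\mathcal{A})$ be an increasing sequence such that $\sup_{j\in\mathbb N}u_j = u$. Then $g_ju_j\in L^2(\mathcal{A})$, $g_ju_j\uparrow gu$ and T22.4 shows that $E^{\mathcal G}(g_ju_j) = g_j\,E^{\mathcal G}u_j$. Hence,

$$\mathbf{E}^{\mathcal G}(gu) = \sup_{j\in\mathbb N}E^{\mathcal G}(g_ju_j) = \sup_{j\in\mathbb N}g_j\,E^{\mathcal G}u_j = g\,\mathbf{E}^{\mathcal G}u.$$

If $g\ge 0$ and $u\in L^{\mathcal G}(\mathcal{A})$, the conditional expectation $\mathbf{E}^{\mathcal G}u^+-\mathbf{E}^{\mathcal G}u^-$ is well-defined and we find from the previous calculations

$$g\,\mathbf{E}^{\mathcal G}u = g\big(\mathbf{E}^{\mathcal G}u^+-\mathbf{E}^{\mathcal G}u^-\big) = \mathbf{E}^{\mathcal G}(gu^+)-\mathbf{E}^{\mathcal G}(gu^-) = \mathbf{E}^{\mathcal G}(gu).$$

Finally, if $g\in M(\mathcal{G})$ we see, using $g^+g^- = 0$, that

$$(g^+-g^-)\,\mathbf{E}^{\mathcal G}u = \mathbf{E}^{\mathcal G}(g^+u)-\mathbf{E}^{\mathcal G}(g^-u) = \mathbf{E}^{\mathcal G}(gu).$$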
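(viii) Since $(X,\mathcal{A},\mu)$ is $\sigma$-finite, $\mathbf{E}^{\mathcal G}1 = \sup_{j\in\mathbb N}E^{\mathcal G}1_{A_j}$ for some exhausting sequence $(A_j)_{j\in\mathbb N}\subset\mathcal{A}$ with $A_j\uparrow X$ and $\mu(A_j)<\infty$. We can now argue as in (vii) to get $\mathbf{E}^{\mathcal G}g = g\,\mathbf{E}^{\mathcal G}1$.

(viii′) If $\mu|_{\mathcal G}$ is $\sigma$-finite, we can find an exhausting sequence $(G_j)_{j\in\mathbb N}\subset\mathcal{G}$ with $G_j\uparrow X$ and $\mu(G_j)<\infty$. Since $1_{G_j}\in L^2(\mathcal{G})$, we find from T22.4(ix) that

$$\mathbf{E}^{\mathcal G}1 = \sup_{j\in\mathbb N}E^{\mathcal G}1_{G_j} = \sup_{j\in\mathbb N}1_{G_j} = 1.$$

For $g\in M(\mathcal{G})$, we use $g_j^\pm = (g^\pm\wedge j)\,1_{G_j}\in L^2(\mathcal{G})$ as approximating sequences and finish the proof as before.

(ix) If $(u_j)_{j\in\mathbb N}\subset L^2(\mathcal{A})$ approximates $0\le u\le 1$ such that $u_j\uparrow u = \sup_{k\in\mathbb N}u_k$, we still have $v_j = u_j^+\in L^2(\mathcal{A})$ and $v_j\uparrow u$. Thus T22.4(x) implies $0\le\mathbf{E}^{\mathcal G}u\le 1$.

(xii) Considering positive and negative parts separately we may assume that $u\ge 0$. Since $u\in L^{\mathcal G}(\mathcal{A})$, there is an approximating sequence $(u_j)_{j\in\mathbb N}\subset L^2_+(\mathcal{A})$, $u = \sup_{j\in\mathbb N}u_j$, and as $0\le u_j\le u\in L^1(\mathcal{A})$, we have $u_j\in L^1(\mathcal{A})$. Theorem 22.4(xiii) gives, together with the definition of $\mathbf{E}^{\mathcal G}$ and Beppo Levi's theorem 9.6,

$$\mathbf{E}^{\{\emptyset,X\}}u = \sup_{j\in\mathbb N}E^{\{\emptyset,X\}}u_j = \sup_{j\in\mathbb N}\frac{1}{\mu(X)}\int u_j\,d\mu = \frac{1}{\mu(X)}\int u\,d\mu.$$

Classical conditional expectations

From now on we will no longer distinguish between $E^{\mathcal G}$ and its extension $\mathbf{E}^{\mathcal G}$ but always write $E^{\mathcal G}$. In particular, we can now show that the operator $E^{\mathcal G}$ coincides with the traditional definition of conditional expectation for $L^1$-functions. The latter turns out to be a rather elegant way to rewrite the martingale property introduced in Definition 17.1.

23.9 Theorem Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and $\mathcal{G}\subset\mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite. For $u\in L^1(\mathcal{A})$ and $g\in L^1(\mathcal{G})$ the following conditions are equivalent:

(i) $E^{\mathcal G}u = g$;
(ii) $\displaystyle\int_G u\,d\mu = \int_G g\,d\mu\quad\forall\,G\in\mathcal{G}$;
(iii) $\displaystyle\int_G u\,d\mu = \int_G g\,d\mu\quad\forall\,G\in\mathcal{G},\ \mu(G)<\infty$.

If $\mathcal{G}$ is generated by a $\cap$-stable family $\mathcal{F}\subset\mathcal{P}(X)$ containing an exhausting sequence $(F_j)_{j\in\mathbb N}$, $F_j\uparrow X$, then (i)–(iii) are also equivalent to

(iv) $\displaystyle\int_F u\,d\mu = \int_F g\,d\mu\quad\forall\,F\in\mathcal{F}$.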
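Proof We begin with the general remark that by Theorem 23.8(iii), (viii′) we have for all $G\in\mathcal{G}$ and $u\in L^1(\mathcal{A})$

$$\int_G E^{\mathcal G}u\,d\mu = \langle E^{\mathcal G}u, 1_G\rangle = \langle u, E^{\mathcal G}1_G\rangle = \langle u, 1_G\rangle = \int_G u\,d\mu. \tag{23.7}$$

(i)⇒(ii): Because of (23.7) we get for all $G\in\mathcal{G}$

$$\int_G u\,d\mu \overset{(23.7)}{=} \int_G E^{\mathcal G}u\,d\mu \overset{23.9\text{(i)}}{=} \int_G g\,d\mu.$$

(ii)⇒(iii) is obvious.

(iii)⇒(i): Take an exhausting sequence $(G_k)_{k\in\mathbb N}\subset\mathcal{G}$ with $G_k\uparrow X$ and $\mu(G_k)<\infty$. Then we have for all $G\in\mathcal{G}$ and $k\in\mathbb{N}$

$$\int_{G\cap G_k}E^{\mathcal G}u\,d\mu \overset{(23.7)}{=} \int_{G\cap G_k}u\,d\mu \overset{23.9\text{(iii)}}{=} \int_{G\cap G_k}g\,d\mu.$$

Since $|1_{G\cap G_k}E^{\mathcal G}u|\le|E^{\mathcal G}u|\in L^1$ and $|1_{G\cap G_k}\,g|\le|g|\in L^1$, we can use dominated convergence T11.2 to let $k\to\infty$ and get

$$\int_G E^{\mathcal G}u\,d\mu = \int_G g\,d\mu\qquad\forall\,G\in\mathcal{G},$$

from which we conclude that $E^{\mathcal G}u = g$ a.e. by Corollary 10.14(i).

Assume now, in addition, that $\mathcal{G} = \sigma(\mathcal{F})$. In this case, (ii)⇒(iv) is obvious, while (iv)⇒(ii) follows with the technique used in Remark 17.2(i): because of (iv) the measures

$$\nu(G) = \int_G(u^++g^-)\,d\mu\qquad\text{and}\qquad\tilde\nu(G) = \int_G(u^-+g^+)\,d\mu$$

coincide on $\mathcal{F}$, and by the uniqueness theorem for measures 5.7, on $\mathcal{G}$.

If we combine Theorem 23.9 with the Beppo Levi theorem 9.6 or other convergence theorems we can derive all sorts of 'conditional' versions of these theorems.

23.10 Corollary (Conditional Beppo Levi theorem) Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and $\mathcal{G}\subset\mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite. For every increasing sequence $(u_j)_{j\in\mathbb N}\subset L^1_+(\mathcal{A})$ of positive functions the limit $u = \sup_{j\in\mathbb N}u_j$ admits a conditional expectation with values in $[0,\infty]$ and

$$\sup_{j\in\mathbb N}E^{\mathcal G}u_j = E^{\mathcal G}\Big(\sup_{j\in\mathbb N}u_j\Big) = E^{\mathcal G}u. \tag{23.8}$$

Proof Let $(A_j)_{j\in\mathbb N}\subset\mathcal{A}$ be an exhausting sequence of sets, i.e. $A_j\uparrow X$ and $\mu(A_j)<\infty$. Then the functions $w_j = (u_j\wedge j)\,1_{A_j}\in L^\infty_+(\mathcal{A})\cap L^1_+(\mathcal{A})$ and, in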
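particular, $w_j\in L^2_+(\mathcal{A})$. Moreover, the sequence $w_j$ increases towards $u$. From Definition 23.5 we get that

$$E^{\mathcal G}u := \sup_{j\in\mathbb N}E^{\mathcal G}w_j,$$

which is a numerical function with values in $[0,\infty]$. On the other hand, we know from Theorem 23.9 that for all $G\in\mathcal{G}$

$$\int_G E^{\mathcal G}w_j\,d\mu = \int_G w_j\,d\mu\qquad\text{and}\qquad\int_G E^{\mathcal G}u_j\,d\mu = \int_G u_j\,d\mu$$

holds. Since $\sup_{j\in\mathbb N}w_j = u = \sup_{j\in\mathbb N}u_j$ and since the sequences $(E^{\mathcal G}u_j)_{j\in\mathbb N}$ and $(E^{\mathcal G}w_j)_{j\in\mathbb N}$ are positive and increasing, cf. Theorem 22.4(xi) and 23.8(x), we conclude from Beppo Levi's theorem 9.6 that

$$\int_G\sup_{j\in\mathbb N}E^{\mathcal G}u_j\,d\mu = \sup_{j\in\mathbb N}\int_G E^{\mathcal G}u_j\,d\mu = \sup_{j\in\mathbb N}\int_G u_j\,d\mu = \int_G\sup_{j\in\mathbb N}u_j\,d\mu = \int_G u\,d\mu.$$

With a similar calculation we find

$$\int_G\sup_{j\in\mathbb N}E^{\mathcal G}w_j\,d\mu = \int_G u\,d\mu$$

and, consequently,

$$\int_G\sup_{j\in\mathbb N}E^{\mathcal G}u_j\,d\mu = \int_G\sup_{j\in\mathbb N}E^{\mathcal G}w_j\,d\mu\qquad\forall\,G\in\mathcal{G}.$$

By Corollary 10.14 we conclude that $\sup_{j\in\mathbb N}E^{\mathcal G}u_j = \sup_{j\in\mathbb N}E^{\mathcal G}w_j = E^{\mathcal G}u$ almost everywhere, the last equality being the definition of $E^{\mathcal G}u$.

In the same way as we deduced Fatou's lemma T9.11 and Lebesgue's dominated convergence theorem 11.2 from the monotonicity property of the integral and Beppo Levi's theorem 9.6, we can get their conditional versions from T22.4(xi), (xii) and C23.10. We leave the simple proofs to the reader.

23.11 Corollary (Conditional Fatou's lemma) Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space, $\mathcal{G}\subset\mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite, and $(u_j)_{j\in\mathbb N}\subset L^1_+(\mathcal{A})$. Then

$$E^{\mathcal G}\Big(\liminf_{j\to\infty}u_j\Big)\le\liminf_{j\to\infty}E^{\mathcal G}u_j. \tag{23.9}$$

23.12 Corollary (Conditional dominated convergence theorem) Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space, $\mathcal{G}\subset\mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite,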
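and $(u_j)_{j\in\mathbb N}\subset L^1(\mathcal{A})$ such that $|u_j|\le w$ for some $w\in L^1_+(\mathcal{A})$. Then

$$E^{\mathcal G}\Big(\lim_{j\to\infty}u_j\Big) = \lim_{j\to\infty}E^{\mathcal G}u_j. \tag{23.10}$$

23.13 Corollary (Conditional Jensen inequality) Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and $\mathcal{G}\subset\mathcal{A}$ be a sub-$\sigma$-algebra such that $\mu|_{\mathcal G}$ is $\sigma$-finite. Assume that $V:\mathbb{R}\to\mathbb{R}$ is a convex function with $V(0)\le 0$ and $\Lambda:\mathbb{R}\to\mathbb{R}$ a concave function with $\Lambda(0)\ge 0$. Then

$$E^{\mathcal G}\Lambda(u)\le\Lambda(E^{\mathcal G}u)\qquad\forall\,u\in L^{\mathcal G}(\mathcal{A}) \tag{23.11}$$

and, in particular, $\Lambda(u)\in L^{\mathcal G}(\mathcal{A})$. Moreover,

$$V(E^{\mathcal G}u)\le E^{\mathcal G}V(u)\qquad\forall\,u\in L^{\mathcal G}(\mathcal{A})\ \text{s.t.}\ V(u)\in L^{\mathcal G}(\mathcal{A}). \tag{23.12}$$

Proof The argument is very similar to the proof of Jensen's inequality T12.14. Note, however, that we do not have to require the finiteness of the reference measure, which was $w\mu$ in T12.14. Let us, for example, prove (23.12). Using Lemma 12.13 and denoting by $\sup_{\ell\le V}$ the supremum over all linear functions $\ell$ such that $\ell(x) = ax+b\le V(x)$ for all $x\in\mathbb{R}$, we get using T22.4(v), (x), (ix)

$$V(E^{\mathcal G}u) = \sup_{\ell\le V}\big(a\,E^{\mathcal G}u+b\big)\le\sup_{\ell\le V}E^{\mathcal G}(au+b)\le E^{\mathcal G}V(u),$$

since $b\le E^{\mathcal G}b$ where we observed that $b\le V(0)\le 0$. The inequality (23.11) is proved in the same way.

Because of Theorem 23.9 it is now very easy and convenient to express the martingale property D17.1 in terms of conditional expectations. In fact,

23.14 Corollary Let $(X,\mathcal{A},\mathcal{A}_j,\mu)$ be a $\sigma$-finite filtered measure space. A sequence $(u_j)_{j\in\mathbb N}\subset L^1(\mathcal{A})$ such that $u_j\in L^1(\mathcal{A}_j)$ is a martingale (resp. sub- or supermartingale) if, and only if, for all $j\in\mathbb{N}$

$$E^{\mathcal{A}_j}u_{j+1} = u_j\qquad\big(\text{resp.}\ E^{\mathcal{A}_j}u_{j+1}\ge u_j\ \text{or}\ E^{\mathcal{A}_j}u_{j+1}\le u_j\big).$$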
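The conditional Jensen inequalities (23.11) and (23.12) can be observed on a finite measure space, where the conditional expectation with respect to a partition is a weighted block average. The following sketch is an illustration under assumptions: the data, the choice $V(x) = e^x-1$ (convex, $V(0)=0$), the choice $\Lambda(x) = -|x|$ (concave, $\Lambda(0)=0$) and the helper name `cond_exp` are hypothetical.

```python
import numpy as np

def cond_exp(u, mu, labels):
    """mu-weighted block averages of u over the partition given by `labels`."""
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    for j in np.unique(labels):
        block = labels == j
        out[block] = (u[block] * mu[block]).sum() / mu[block].sum()
    return out

def V(x):
    """Convex function with V(0) = 0."""
    return np.expm1(x)

def Lam(x):
    """Concave function with Lam(0) = 0."""
    return -np.abs(x)

rng = np.random.default_rng(1)
mu = rng.uniform(0.1, 1.0, size=12)      # strictly positive point masses
u = rng.normal(size=12)
labels = np.repeat(np.arange(4), 3)      # partition into 4 blocks of 3 points

Eu = cond_exp(u, mu, labels)

# (23.12): V(E u) <= E V(u), and (23.11): E Lam(u) <= Lam(E u), pointwise.
convex_ok = bool(np.all(V(Eu) <= cond_exp(V(u), mu, labels) + 1e-12))
concave_ok = bool(np.all(cond_exp(Lam(u), mu, labels) <= Lam(Eu) + 1e-12))
```

On each block this reduces to the classical Jensen inequality for the probability measure obtained by normalizing `mu` on that block.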
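A great advantage of this way of putting things is that we can now formulate the convergence theorem for uniformly integrable martingales T18.6 in a very striking way:

23.15 Theorem (Closability of martingales) Let $(X,\mathcal{A},\mathcal{A}_j,\mu)$ be a $\sigma$-finite filtered measure space and $\mathcal{A}_\infty = \sigma\big(\bigcup_{j\in\mathbb N}\mathcal{A}_j\big)$.

(i) For every $u\in L^1(\mathcal{A})$ the sequence $(E^{\mathcal{A}_j}u)_{j\in\mathbb N}$ is a uniformly integrable martingale. In particular, $E^{\mathcal{A}_j}u\to E^{\mathcal{A}_\infty}u$ as $j\to\infty$ in $L^1$ and a.e.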
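(ii) Conversely, if $(u_j)_{j\in\mathbb N}$ is a uniformly integrable martingale, there exists a function $u_\infty\in L^1(\mathcal{A}_\infty)$ such that $u_j\to u_\infty$ as $j\to\infty$ in $L^1$ and a.e. such that $(u_j)_{j\in\mathbb N\cup\{\infty\}}$ is a martingale. In particular, $u_j = E^{\mathcal{A}_j}u_\infty$. In this sense, $u_\infty$ closes the martingale $(u_j)_{j\in\mathbb N}$.

Proof (i) That $(E^{\mathcal{A}_j}u)_{j\in\mathbb N}$ is a martingale follows at once from Theorem 23.8(vi). By assumption there exists an exhausting sequence $(A_k)_{k\in\mathbb N}\subset\mathcal{A}_0$ with $A_k\uparrow X$ and $\mu(A_k)<\infty$. Therefore, the function

$$w = \sum_{k=1}^\infty\frac{2^{-k}}{1+\mu(A_k)}\,1_{A_k}$$

is strictly positive and integrable. Since $u\in L^1(\mathcal{A})$ and $\{|u|\ge Nw\}\downarrow\emptyset$ as $N\to\infty$, we find by dominated convergence T11.2 that

$$\lim_{N\to\infty}\int_{\{|u|\ge Nw\}}|u|\,d\mu = 0.$$

This shows that, for all $\epsilon>0$, large enough $N = N_\epsilon$ and any $A\in\mathcal{A}$

$$\int_A|u|\,d\mu = \int_{A\cap\{|u|<Nw\}}|u|\,d\mu+\int_{A\cap\{|u|\ge Nw\}}|u|\,d\mu\le N\int_A w\,d\mu+\int_{\{|u|\ge Nw\}}|u|\,d\mu\le N\int_A w\,d\mu+\frac{\epsilon}{2}.$$

We can rephrase this as: for all $\epsilon>0$ there exists $\delta = \delta_\epsilon>0$ such that

$$\int_A w\,d\mu<\delta \implies \int_A|u|\,d\mu<\epsilon\qquad\forall\,A\in\mathcal{A}. \tag{23.13}$$

Since for all $j\in\mathbb{N}$ and $c>0$

$$c\int_{\{|E^{\mathcal{A}_j}u|>cw\}}w\,d\mu\le\int|E^{\mathcal{A}_j}u|\,d\mu\le\int E^{\mathcal{A}_j}|u|\,d\mu = \int|u|\,d\mu,$$

we may choose $c = c_0 = \delta^{-1}\int|u|\,d\mu$, which implies that for a given $\epsilon>0$

$$\int_{\{|E^{\mathcal{A}_j}u|>c_0w\}}w\,d\mu<\delta \overset{(23.13)}{\implies} \int_{\{|E^{\mathcal{A}_j}u|>c_0w\}}|u|\,d\mu<\epsilon.$$

Since $\{|E^{\mathcal{A}_j}u|>c_0w\}\in\mathcal{A}_j$, the martingale property implies

$$\int_{\{|E^{\mathcal{A}_j}u|>c_0w\}}|E^{\mathcal{A}_j}u|\,d\mu \overset{23.8\text{(xi)}}{\le} \int_{\{|E^{\mathcal{A}_j}u|>c_0w\}}E^{\mathcal{A}_j}|u|\,d\mu \overset{23.9}{=} \int_{\{|E^{\mathcal{A}_j}u|>c_0w\}}|u|\,d\mu<\epsilon\qquad\forall\,j\in\mathbb{N},$$

which is but uniform integrability of the family $(E^{\mathcal{A}_j}u)_{j\in\mathbb N}$.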
R.L. Schilling
The convergence assertions follow now from the convergence theorem for UI submartingales T18.6. (ii) follows directly from Theorem 18.6. Since the conditional Jensen inequality needs fewer assumptions than the classical Jensen inequality we can improve Example 17.3(v), (vi). 23.16 Corollary Let X j be a -finite filtered measure space and uj j∈ be a family of measurable functions uj ∈ Lj which satisfies the [sub-]martingale property2 uj = Ej uj+1 resp. uj Ej uj+1
If V → is a [monotone increasing] convex function such that Vuj ∈ L1 j , then Vuj j∈ is a submartingale. Proof Since uj j∈ satisfies the [sub-]martingale property, we find from Jensen’s inequality C23.13 [and the monotonicity of V ] that Vuj V Ej uj+1 Ej Vuj+1
23.17 Example In Example 17.3(ix) we introduced a dyadic filtration on the measure space 0 n 0 n = n 0n given by −j n −j n j ∈ 0
j = z + 0 2 z ∈ 2 0 For u ∈ L1 0 n and all j ∈ 0 we can now rewrite (17.4) as 1z+02−j n j d 1z+02−j n x
E ux = u z + 0 2−j n z∈2−j n 0
23.18 Remark In Theorem 22.5 we found necessary and sufficient conditions that a projection in L2 is a conditional expectation. This result has a counterpart in the spaces Lp , p = 2, which we want to mention here without proof. Details can be found in the monograph by Neveu [31, pp. 12–16]. Let X be a finite measure space. Then (i) Let p ∈ 1 . linear operator T Lp → Lp such that Every bounded Tf d = f d, f ∈ Lp , and Tf Tg = Tf Tg, f ∈ L g ∈ 1 L , is a conditional expectation w.r.t. some sub- -algebra ⊂ . (ii) Let p ∈ 1 , p = 2. Every linear contraction T Lp → Lp such that T 2 = T andT 1 = 1isaconditionalexpectationw.r.t.somesub- -algebra ⊂. 2
This is slightly more general than assuming that uj j∈ is a [sub-]martingale since [sub-]martingales are, by definition, integrable.
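The dyadic averaging operators of Example 23.17 and the closability statement of Theorem 23.15 can be illustrated by a small numerical experiment. The sketch below makes several simplifying assumptions that are not in the text: it works on $[0,1)$ instead of $[0,\infty)^n$, represents $u$ by a step function on $2^m$ equal cells, and uses the hypothetical helper name `dyadic_cond_exp`.

```python
import numpy as np

def dyadic_cond_exp(u, j):
    """Average u over the dyadic intervals of length 2**-j in [0, 1).

    `u` holds the values of a step function on 2**m equal cells, m >= j.
    """
    m = u.size.bit_length() - 1          # u.size == 2**m
    block = 2 ** (m - j)                 # number of cells per dyadic interval
    means = u.reshape(2 ** j, block).mean(axis=1)
    return np.repeat(means, block)

m = 10
x = (np.arange(2 ** m) + 0.5) / 2 ** m   # cell midpoints
u = np.sin(2 * np.pi * x) + x ** 2       # sample integrand (an assumption)

levels = [dyadic_cond_exp(u, j) for j in range(m + 1)]

# Martingale property: averaging level j+1 at level j gives back level j.
martingale_gap = max(
    float(np.max(np.abs(dyadic_cond_exp(levels[j + 1], j) - levels[j])))
    for j in range(m)
)

# L^1 distance between u and its dyadic averages (Lebesgue measure on [0, 1)).
l1_errors = [float(np.mean(np.abs(u - v))) for v in levels]
```

In this discretized setting the finest level reproduces `u` exactly, so the last entry of `l1_errors` vanishes, in line with the $L^1$ convergence asserted in Theorem 23.15(i).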
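Separability criteria for the spaces $L^p(X,\mathcal{A},\mu)$

Let $(X,\mathcal{A},\mu)$ be a measure space. Recall that $L^p(\mathcal{A})$ is separable if it contains a countable dense subset $(d_j)_{j\in\mathbb N}\subset L^p(\mathcal{A})$. We have seen in Chapter 21 that the Hilbert space $L^2(\mathcal{A})$ is separable if we can find a countable complete ONS $(e_j)_{j\in\mathbb N}\subset L^2(\mathcal{A})$ since the system $\{q_1e_1+\dots+q_Ne_N : N\in\mathbb{N},\ q_j\in\mathbb{Q}\}$ is both countable and dense. Conversely, using any countable dense subset $(d_j)_{j\in\mathbb N}$ as input for the Gram–Schmidt orthonormalization procedure (21.10) produces a complete countable ONS.

Here is a simple sufficient criterion for the separability of $L^p$.

23.19 Lemma Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space and assume that the $\sigma$-algebra $\mathcal{A}$ is countably generated, i.e. $\mathcal{A} = \sigma(A_j : j\in\mathbb{N})$, $A_j\subset X$. Then $L^p(X,\mathcal{A},\mu)$, $1\le p<\infty$, is separable.

Proof Step 1: Let us first assume that $\mu$ is a finite measure. Consider the $\sigma$-algebras $\mathcal{A}_n = \sigma(A_1,\dots,A_n)$; then $\mathcal{A}_1\subset\mathcal{A}_2\subset\dots\subset\mathcal{A}_\infty = \sigma(\mathcal{A}_j : j\in\mathbb{N})$ is a filtration, $\mu|_{\mathcal{A}_j}$ is trivially $\sigma$-finite for every $j\in\mathbb{N}$ and $\mathcal{A} = \mathcal{A}_\infty$.

Set $u_j = E^{\mathcal{A}_j}u$ for $u\in L^1(\mathcal{A})$. By Theorem 23.15 $(u_j,\mathcal{A}_j)_{j\in\mathbb N}$ is a uniformly integrable martingale, hence $u_j\to u$ as $j\to\infty$ in $L^1$ and a.e. If $v\in L^p(\mathcal{A})$, we set $v_j = |E^{\mathcal{A}_j}v|^p\le E^{\mathcal{A}_j}(|v|^p)$ (by Corollary 23.13) and observe that $(v_j,\mathcal{A}_j)_{j\in\mathbb N}$ is a submartingale, cf. Theorem 23.15, which is uniformly integrable. The latter follows easily from

$$\int_{\{v_j>w\}}|E^{\mathcal{A}_j}v|^p\,d\mu = \int_{\{v_j>w\}}v_j\,d\mu\le\int_{\{E^{\mathcal{A}_j}(|v|^p)>w\}}E^{\mathcal{A}_j}(|v|^p)\,d\mu$$

and the uniform integrability of the family $\big(E^{\mathcal{A}_j}(|v|^p)\big)_{j\in\mathbb N}$, see Theorem 23.15. From the (sub-)martingale convergence Theorem 18.6 we conclude that $v_j\to|E^{\mathcal{A}_\infty}v|^p = |v|^p$ as $j\to\infty$ in $L^1$ and a.e., and Riesz' theorem 12.10 shows $v_j^{1/p} = |E^{\mathcal{A}_j}v|\to|v|$ in $L^p$. Consequently, $E^{\mathcal{A}_j}v\to v$ in $L^p$.

Since the $\sigma$-algebra $\mathcal{A}_j$ is generated by finitely many sets, $E^{\mathcal{A}_j}u$, resp., $E^{\mathcal{A}_j}v$ are simple functions with canonical representations of the form

$$s = \sum_{k=1}^N y_k\,1_{B_k},\qquad y_k\ne 0,\quad B_1,\dots,B_N\in\mathcal{A}_j\ \text{disjoint};$$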
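as $j$ is kept fixed, we suppress the dependence of $y_k, B_k, N$ on $j$. If $y_k\in\mathbb{R}$, we find for every $\epsilon>0$ numbers $y_k^\epsilon\in\mathbb{Q}$ such that

$$|y_k-y_k^\epsilon|\le\frac{\epsilon}{N\,\mu(X)^{1/p}}.$$

The triangle inequality now shows

$$\Big\|s-\sum_{k=1}^N y_k^\epsilon\,1_{B_k}\Big\|_p\le\sum_{k=1}^N|y_k-y_k^\epsilon|\,\mu(B_k)^{1/p}\le\epsilon,$$

which proves that the system

$$D = \bigcup_{j\in\mathbb N}\Big\{\sum_{k=1}^N q_k\,1_{B_k} : N\in\mathbb{N},\ q_k\in\mathbb{Q},\ B_k\in\mathcal{A}_j\Big\}$$

is a countable dense subset of the space $L^p(X,\mathcal{A},\mu)$, $1\le p<\infty$.

Step 2: If $\mu$ is $\sigma$-finite but not finite, we choose an exhausting sequence $(C_j)_{j\in\mathbb N}\subset\mathcal{A}$ such that $C_j\uparrow X$ and $\mu(C_j)<\infty$ and consider the finite measures $\mu_j = \mu(\,\cdot\cap C_j)$, $j\in\mathbb{N}$, on $C_j\cap\mathcal{A}$. Since every $u\in L^p(\mu_j) = L^p(C_j, C_j\cap\mathcal{A},\mu_j)$ can be extended by $0$ on the set $X\setminus C_j$ and becomes an element of $L^p(\mu_{j+1})$, we can interpret the sets $L^p(\mu_j)$ as a chain of increasing subspaces of each other and of $L^p(\mu)$:

$$L^p(C_j, C_j\cap\mathcal{A},\mu_j)\subset L^p(C_{j+1}, C_{j+1}\cap\mathcal{A},\mu_{j+1})\subset\dots\subset L^p(X,\mathcal{A},\mu).$$

Applying the construction from step 1 to each of the sets $L^p(\mu_j)$ furnishes countable dense subsets $D_j$. Obviously, $D = \bigcup_{j\in\mathbb N}D_j$ is a countable set but it is also dense in $L^p(\mu)$. To see this, fix $\epsilon>0$ and $u\in L^p(\mu)$. Since $X\setminus C_j\downarrow\emptyset$, we find by Lebesgue's dominated convergence theorem 12.9 some $N_\epsilon\in\mathbb{N}$ such that $\int_{X\setminus C_j}|u|^p\,d\mu<\epsilon^p$ for all $j\ge N_\epsilon$. Since $D_j$ is dense in $L^p(\mu_j)$ and since $u\,1_{C_j}\in L^p(\mu_j)$, there is some $d_j\in D_j\subset D$ with $\|u\,1_{C_j}-d_j\|_{L^p(\mu_j)}\le\epsilon$, and altogether we get for large $j\ge N_\epsilon$

$$\|u-d_j\|_p\le\|u\,1_{C_j}-d_j\|_{L^p(\mu_j)}+\|u\,1_{X\setminus C_j}\|_{L^p(\mu)}\le 2\epsilon.$$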
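The rational approximation step in the proof of Lemma 23.19 (Step 1) can be made concrete. The following sketch is an illustration only: the coefficients, block measures and the helper name `rationalize` are hypothetical, and Python's `Fraction.limit_denominator` is used as one convenient way to produce nearby rationals.

```python
from fractions import Fraction
import numpy as np

def rationalize(y, mu_blocks, p, eps):
    """Replace the coefficients y_k by rationals q_k with
    |y_k - q_k| <= eps / (N * mu(X)**(1/p)), where mu(X) = sum of block measures."""
    N = len(y)
    tol = eps / (N * mu_blocks.sum() ** (1.0 / p))
    # Denominators up to D >= 1/tol give an error of at most 1/(2D) <= tol/2.
    max_den = int(np.ceil(1.0 / tol)) + 1
    q = [Fraction(float(v)).limit_denominator(max_den) for v in y]
    return q, tol

# Simple function s = sum_k y_k 1_{B_k} on disjoint blocks B_k (hypothetical data).
y = np.array([np.pi, -np.sqrt(2.0), np.e / 3.0])
mu_blocks = np.array([0.3, 1.2, 0.5])    # mu(B_k), with mu(X) = 2.0
p, eps = 3.0, 1e-3

q, tol = rationalize(y, mu_blocks, p, eps)
diff = np.array([abs(float(v) - float(r)) for v, r in zip(y, q)])

# L^p distance between s and its rational version, and the triangle-inequality bound.
lp_error = float((diff ** p * mu_blocks).sum() ** (1.0 / p))
triangle_bound = float((diff * mu_blocks ** (1.0 / p)).sum())
```

Because each $\mu(B_k)^{1/p}\le\mu(X)^{1/p}$, the bound `triangle_bound` is at most `eps`, which is exactly the estimate used in the proof.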
If the underlying set $X$ is a separable metric space (cf. Appendix B), the criterion of Lemma 23.19 becomes particularly simple.

23.20 Corollary Let $(X,\rho)$ be a separable metric space equipped with its Borel $\sigma$-algebra $\mathcal{B} = \mathcal{B}(X)$. Then $L^p(X,\mathcal{B},\mu)$, $1\le p<\infty$, is separable for every $\sigma$-finite measure $\mu$ on $(X,\mathcal{B})$. If $\mu$ is not $\sigma$-finite, $L^p(X,\mathcal{B},\mu)$ need not be separable.
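Proof Denote by $D\subset X$ a countable dense subset and consider the countable system of open balls

$$B_r(d) = \{x\in X : \rho(x,d)<r\},\qquad\mathfrak{B} = \{B_r(d) : d\in D,\ r\in\mathbb{Q}^+\}\subset\mathcal{O}(X).$$

Every open set $U\in\mathcal{O}(X)$ can be written as

$$U = \bigcup_{\substack{B_r(d)\subset U\\ B_r(d)\in\mathfrak{B}}}B_r(d),\ ^{3}$$

which shows that $\mathcal{O}(X)\subset\sigma(\mathfrak{B})\subset\sigma(\mathcal{O}(X)) = \mathcal{B}(X)$. Thus the Borel sets $\mathcal{B}(X) = \sigma(\mathfrak{B})$ are countably generated, and the assertion follows from Lemma 23.19.

If $\mu$ is not $\sigma$-finite, we have the following counterexample: take $X = [0,1]$ with its natural Euclidean metric $\rho(x,y) = |x-y|$ and let $\mu$ be the counting measure on $([0,1],\mathcal{B}[0,1])$, i.e. $\mu(B) = \#B$. Obviously, $\mu$ is not $\sigma$-finite. The $p$th power $\mu$-integrable simple functions are of the form

$$\mathcal{E}\cap L^p(\mu) = \Big\{\sum_{j=1}^N y_j\,1_{A_j} : N\in\mathbb{N},\ y_j\in\mathbb{R},\ A_j\in\mathcal{B}[0,1],\ \#A_j<\infty\Big\}$$

so that

$$L^p(\mu) = \Big\{u:[0,1]\to\mathbb{R} : \exists\,(x_j)_{j\in\mathbb N}\subset[0,1],\ u(x) = 0\ \ \forall\,x\ne x_j\ \text{and}\ \sum_{j=1}^\infty|u(x_j)|^p<\infty\Big\}.$$

Obviously, $(1_{\{x\}})_{x\in[0,1]}\subset L^p(\mu)$, but no single countable system can approximate this family since

$$\|1_{\{x\}}-1_{\{y\}}\|_p^p = 0\quad\text{or}\quad 2$$

according to whether $x = y$ or $x\ne y$.

With somewhat more effort we can show that the conditions of Lemma 23.19 are even necessary.

23.21 Theorem Let $(X,\mathcal{A},\mu)$ be a $\sigma$-finite measure space. Then the following assertions are equivalent.

(i) $\mathcal{A}$ is (almost) separable, i.e. there exists a countable family $\mathcal{F}\subset\mathcal{A}$ such that $\mu(F)<\infty$ for all $F\in\mathcal{F}$ and $\sigma(\mathcal{F})\approx\mathcal{A}$ in the sense that every set in $\mathcal{A}$ has, up to a null set, a version in $\sigma(\mathcal{F})$.

$^{3}$ The inclusion '⊃' is obvious; for '⊂' fix $x\in U$. Then there exists some $r\in\mathbb{Q}^+$ with $B_r(x)\subset U$. Since $D$ is dense, $x\in B_{r/2}(d)$ for some $d\in D$ with $\rho(d,x)<r/4$, so that $x\in B_{r/2}(d)\subset U$.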
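(ii) $\mu$ is separable,$^{4}$ i.e. there exists a countable family $\mathcal{F}\subset\mathcal{A}$ such that $\mu(F)<\infty$ and for every $A\in\mathcal{A}$ with $\mu(A)<\infty$ we have

$$\forall\,\epsilon>0\quad\exists\,F_\epsilon\in\mathcal{F} : \mu(A\setminus F_\epsilon)+\mu(F_\epsilon\setminus A)\le\epsilon.$$

(iii) $L^p(X,\mathcal{A},\mu)$ is separable, $1\le p<\infty$.

Proof (i)⇒(iii): The proof of Lemma 23.19 shows that $L^p(X,\sigma(\mathcal{F}),\mu)$ is separable. Since for each $A\in\mathcal{A}$ there is an $A^*\in\sigma(\mathcal{F})$ with

$$\mu\big((A\setminus A^*)\cup(A^*\setminus A)\big) = 0 \iff \int|1_A-1_{A^*}|\,d\mu = 0,$$

every simple function $\phi\in\mathcal{E}(\mathcal{A})$ has a version $\phi^*\in\mathcal{E}(\sigma(\mathcal{F}))$ such that $\int|\phi-\phi^*|\,d\mu = 0$. This proves that $L^p(X,\sigma(\mathcal{F}),\mu)\supset L^p(X,\mathcal{A},\mu)$ (we have, in fact, equality since $\sigma(\mathcal{F})\subset\mathcal{A}$), and we see that $L^p(X,\mathcal{A},\mu)$ is separable.

(iii)⇒(ii): Denote by $(d_j)_{j\in\mathbb N}$ a countable dense subset of $L^p(\mathcal{A})$. Since $\mathcal{E}(\mathcal{A})\cap L^p(\mathcal{A})$ is dense in $L^p(\mathcal{A})$, cf. Lemma 12.11, we find for each $d_j$ a sequence $(f_{jk})_{k\in\mathbb N}\subset\mathcal{E}(\mathcal{A})\cap L^p(\mathcal{A})$ such that $L^p$-$\lim_{k\to\infty}f_{jk} = d_j$. Thus $(f_{jk})_{j,k\in\mathbb N}$ is also dense in $L^p(\mathcal{A})$, and the system of subsets

$$\mathcal{F} = \Big\{\bigcup_{\ell=1}^N\{f_{jk} = r_\ell\} : N\in\mathbb{N},\ j,k\in\mathbb{N},\ r_\ell\in\mathbb{R}\Big\}$$

is countable since each $f_{jk}$ attains only finitely many values.

For every $A\in\mathcal{A}$, $\mu(A)<\infty$, we have $1_A\in L^p(\mathcal{A})$, and we find a subsequence $(f_\ell^A)_{\ell\in\mathbb N}\subset(f_{jk})_{j,k\in\mathbb N}$ with $\lim_{\ell\to\infty}\|f_\ell^A-1_A\|_p^p = 0$. Set $F_\ell = \{|f_\ell^A-1_A|\le 1/2\}\cap\{f_\ell^A>1/2\}$. Obviously $F_\ell\in\mathcal{F}$, and $F_\ell\subset A$ since

$$A^c\cap F_\ell = A^c\cap\{|f_\ell^A-1_A|\le 1/2\}\cap\{f_\ell^A>1/2\} = A^c\cap\{|f_\ell^A|\le 1/2\}\cap\{f_\ell^A>1/2\} = \emptyset.$$

Thus $\mu(F_\ell\setminus A) = 0$, while

$$\mu(A\setminus F_\ell)\le\mu\big(A\cap\{|f_\ell^A-1_A|>1/2\}\big)+\mu\big(A\cap\{f_\ell^A\le 1/2\}\big).$$

Using the triangle inequality, we infer

$$A\cap\{f_\ell^A\le 1/2\}\subset A\cap\{|f_\ell^A-1|\ge 1/2\} = A\cap\{|f_\ell^A-1_A|\ge 1/2\}$$

$^{4}$ This notion derives from the fact that $(\mathcal{A},\rho_\mu)$, $\rho_\mu(A,B) = \mu(A\setminus B)+\mu(B\setminus A)$, $A, B\in\mathcal{A}$, becomes a separable pseudo-metric space in the usual sense, cf. Appendix B.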
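and with the above calculation and an application of Markov's inequality we conclude that

$$\mu(A\setminus F_\ell)+\mu(F_\ell\setminus A)\le 2\,\mu\big(A\cap\{|f_\ell^A-1_A|\ge 1/2\}\big) \overset{10.12}{\le} 2^{p+1}\,\|f_\ell^A-1_A\|_p^p.$$

The right-hand side of the above inequality tends to $0$ as $\ell\to\infty$, and (ii) follows.

(ii)⇒(i): Fix $A\in\mathcal{A}$ with $\mu(A)<\infty$. Then we find, by assumption, sets $F_n\in\mathcal{F}$ with $\mu(A\setminus F_n)+\mu(F_n\setminus A)\le 2^{-n}$. Consider the sets

$$F^* = \bigcap_{k=1}^\infty\bigcup_{n=k}^\infty F_n\qquad\text{and}\qquad F_* = \bigcup_{k=1}^\infty\bigcap_{n=k}^\infty F_n.$$

Then using the continuity of measures T4.4 and $\sigma$-subadditivity C4.6,

$$\begin{aligned}
\mu(F^*\setminus A)+\mu(A\setminus F_*) &= \mu\Big(\bigcap_{k=1}^\infty\bigcup_{n=k}^\infty F_n\setminus A\Big)+\mu\Big(A\cap\bigcap_{k=1}^\infty\bigcup_{n=k}^\infty F_n^c\Big)\\
&= \mu\Big(\bigcap_{k=1}^\infty\bigcup_{n=k}^\infty(F_n\setminus A)\Big)+\mu\Big(\bigcap_{k=1}^\infty\bigcup_{n=k}^\infty(A\cap F_n^c)\Big)\\
&= \lim_{k\to\infty}\Big[\mu\Big(\bigcup_{n=k}^\infty(F_n\setminus A)\Big)+\mu\Big(\bigcup_{n=k}^\infty(A\setminus F_n)\Big)\Big]\\
&\le \lim_{k\to\infty}\sum_{n=k}^\infty\big(\mu(F_n\setminus A)+\mu(A\setminus F_n)\big)\\
&\le \lim_{k\to\infty}\sum_{n=k}^\infty 2^{-n} = 0.
\end{aligned}$$

This shows that for all $A\in\mathcal{A}$ with $\mu(A)<\infty$

$$\exists\,F_*, F^*\in\sigma(\mathcal{F}),\quad F_*\subset F^*\quad\text{and}\quad\mu(F^*\setminus A)+\mu(A\setminus F_*) = 0,$$

implying that $\mu(F^*\setminus A)+\mu(A\setminus F^*) = 0$ also.

If $\mu(A) = \infty$ we pick some exhausting sequence $(A_k)_{k\in\mathbb N}\subset\mathcal{A}$ with $A_k\uparrow X$ and $\mu(A_k)<\infty$. Then the sets $A\cap A_k$ have finite $\mu$-measure and we can construct, as before, sets $F_k^*$ and $F_{*k}$. Setting $F^* = \bigcup_{k\in\mathbb N}F_k^*$ we find

$$\Big(\bigcup_{k\in\mathbb N}F_k^*\Big)\setminus\bigcup_{j\in\mathbb N}(A\cap A_j) = \bigcup_{k\in\mathbb N}\Big(F_k^*\setminus\bigcup_{j\in\mathbb N}(A\cap A_j)\Big)\subset\bigcup_{k\in\mathbb N}\big(F_k^*\setminus(A\cap A_k)\big)$$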
and so ∗
F \ A
k∈
Fk∗ \ A ∩ Ak
Fk∗ \ A ∩ Ak = 0
! " k=1 =0
A \ F ∗
is handled analogously. The expression This shows that sets from and differ by at most a null set. Problems 23.1. Complete the proof of Theorem 23.8. 23.2. Show that E 1 = 1 if, and only if is -finite. Find a counterexample showing that E 1 1 is, in general, best possible. [Hint: use p = 2 and E = E .] 23.3. Let be a sub- -algebra of . Show that E g = g for all g ∈ Lp . [Hint: observe that, a.e., g = g 1j g>1/j and g > 1/j < . This emulates -finiteness.] 23.4. Let ⊂ be two sub- -algebras of . Show that E E u = E E u = E u p for all u ∈ L provided is -finite. resp. for all u ∈ M [Hint: if is not -finite, the set Lp can be very small ….] 23.5. Consider on the measure space 0 = 1 0 the filtration n = 0 n 0 1 0 2 n − 1 n n . Find E u for u ∈ Lp . 23.6. Let X be a measure space and ⊂ be a sub- -algebra. Show that, in general, E u d u d u ∈ L1 with equality holding only if is -finite. 23.7. Prove Corollaries 23.11 and 23.12. 23.8. Let X j be a -finite filtered measure space and denote by u the canonical dual pairing between u ∈ Lp and ∈ Lq , p−1 + q −1 = 1, i.e. u = u d. A sequence uj j∈ ⊂ Lp is weakly relatively compact if there exists a subsequence ujk k∈ such that k→
ujk − u −−→ 0 holds for all ∈ Lq and some u ∈ Lp . Show that for a martingale uj j∈ and every p ∈ 1 the following assertions are equivalent: (i) there exists some u ∈ Lp such that limj→ uj − u p = 0; (ii) there exists some u ∈ Lp such that uj = E j u ; (iii) the sequence uj j∈ is weakly relatively compact.
23.9. Let X be a measure space and uj j∈ ⊂ L1 . Show that m1 = u1
mj+1 − mj = uj+1 − E j uj+1
is a martingale under the filtration j = u1 uj . 23.10. (Continuation of Problem 23.9). If u1 d = 0 and E j uj+1 = 0 then uj j∈ is called a martingale difference sequence. Assume that uj ∈ L2 and denote by sk = u1 + · · · + uk the partial sums. Show that sj2 j j∈ is a submartingale satisfying k sk2 d = u2j d
j=1
23.11. Doob decomposition. Let X j be a -finite filtered measure space and let sj j j∈ be a submartingale. Show that there exists an a.e. unique martingale mj j j∈ and an increasing sequence of functions aj j∈ such that aj ∈ L1 j−1 for all j 2 and sj = mj + aj
j ∈
[Hint: set $m_0 = u_0$, $m_{j+1} - m_j = u_{j+1} - E^{\mathcal{A}_j} u_{j+1}$ and $a_0 = 0$, $a_{j+1} - a_j = E^{\mathcal{A}_j} u_{j+1} - u_j$. For uniqueness assume that $\tilde m_j + \tilde a_j$ is a further Doob decomposition and study the measurability properties of the martingale $M_j = m_j - \tilde m_j = \tilde a_j - a_j$.]
23.12. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $(X_j)_{j\in\mathbb{N}}$ be a sequence of independent identically distributed random variables such that $P(X_j = 0) = P(X_j = 2) = \frac12$. Set $M_k = \prod_{j=1}^k X_j$. Show that there does not exist any filtration $(\mathcal{A}_j)_{j\in\mathbb{N}}$ and no random variable $M$ such that $M_k = E^{\mathcal{A}_k} M$. [Hint: compare with Example 17.3(xi).]
Remark. This example shows that not all martingales can be obtained as conditional expectations of a single function.
24 Orthonormal systems and their convergence behaviour
In Chapter 21 we discussed the importance of orthonormal systems (ONSs) in Hilbert spaces. In particular, countable complete ONSs turned out to be bases of separable Hilbert spaces. We have also seen that a countable ONS gives rise to a family of finite-dimensional subspaces and a sequence of orthogonal projections onto these spaces. In the present chapter we are concerned with the following topics:
• to give concrete examples of (complete) ONSs;
• to see when the associated canonical projections are conditional expectations;
• to understand the $L^p$ ($p \neq 2$) and a.e. convergence behaviour of series expansions with respect to certain ONSs.
The latter is, in general, not a trivial matter. Here we will see how we can use the powerful martingale machinery of Chapters 17 and 18 to get $L^p$ ($1 \le p < \infty$) and a.e. convergence.
Throughout this chapter we will consider the Hilbert space $L^2(I, \mathcal{B}(I), \rho\lambda)$ where $I \subset \mathbb{R}$ is a finite or infinite interval of the real line, $\mathcal{B}(I) = I \cap \mathcal{B}(\mathbb{R})$ are the Borel sets in $I$, $\lambda = \lambda^1|_I$ is Lebesgue measure on $I$ and $\rho(x)$ is a density function. We will usually write $\rho(x)\,dx$ and $dx$ instead of $\rho\lambda$ and $d\lambda$.
One of the most important techniques to construct ONSs is the Gram–Schmidt orthonormalization procedure (21.10), which we can use to turn any countable family $(f_k)_{k\in\mathbb{N}}$ into an orthonormal sequence $(e_k)_{k\in\mathbb{N}}$. Something of a problem, however, is to find a reasonable sequence $(f_k)_{k\in\mathbb{N}}$ which can be used as input to the orthonormalization procedure.

Orthogonal polynomials
For many practical applications, such as interpolation, approximation or numerical integration, a natural set of $f_k$ to begin with is given by the polynomials on $I$. Usually one applies (21.10) to the sequence of monomials $1, t, t^2, t^3, \ldots = (t^j)_{j\in\mathbb{N}_0}$ to construct an ONS consisting of polynomials. Of course, this depends heavily on the underlying measure space where polynomials should be square integrable. With some (partly pretty tedious) calculations¹ one can get the following important classes of orthogonal polynomials in $L^2(I, \mathcal{B}(I), \rho(x)\,dx)$.

24.1 Jacobi polynomials $(J_k^{(\alpha,\beta)})_{k\in\mathbb{N}_0}$, $\alpha, \beta > -1$. We choose
$$I = [-1,1], \qquad \rho(x)\,dx = (1-x)^{\alpha}(1+x)^{\beta}\,dx, \qquad \alpha,\beta > -1,$$
and we get
$$J_k^{(\alpha,\beta)}(x) = \frac{(-1)^k}{k!\,2^k}\,(1-x)^{-\alpha}(1+x)^{-\beta}\,\frac{d^k}{dx^k}\Big[(1-x)^{\alpha+k}(1+x)^{\beta+k}\Big] = \frac{1}{2^k}\sum_{j=0}^{k}\binom{k+\alpha}{j}\binom{k+\beta}{k-j}(x-1)^{k-j}(x+1)^{j},$$
$$\big\|J_k^{(\alpha,\beta)}\big\|_2^2 = \frac{2^{\alpha+\beta+1}}{2k+\alpha+\beta+1}\cdot\frac{\Gamma(k+\alpha+1)\,\Gamma(k+\beta+1)}{\Gamma(k+1)\,\Gamma(k+\alpha+\beta+1)}.$$
Choosing in 24.1 particular values for $\alpha$ and $\beta$ yields other important families.

24.2 Chebyshev polynomials (of the first kind) $(T_k)_{k\in\mathbb{N}_0}$. We choose
$$I = [-1,1], \qquad \rho(x)\,dx = (1-x^2)^{-1/2}\,dx$$
and we get
⎧ ⎨ 2 cosk arccos x if k ∈ −1/2−1/2
Tk x = Jk x = ⎩ √1 if k = 0
1 k + 21 2 2 Tk 2 = 2 k + 1
The first few Chebyshev polynomials are
$$1, \quad x, \quad 2x^2 - 1, \quad 4x^3 - 3x, \quad 8x^4 - 8x^2 + 1, \quad 16x^5 - 20x^3 + 5x$$
¹ The material in Sections 24.1–24.5 below is taken from Alexits [1, pp. 30–37], Gradshteyn-Ryzhik [17, §8.9] and Kaczmarz-Steinhaus [22, §§IV.1–2, 8–9]. Another classic is the book by Szegő [51, §§1–5], and a good modern reference is the monograph by Andrews et al. [2, §§5.1, 6.1–6.3].
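and the following recursion formula holds:
$$T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x), \qquad k\in\mathbb{N}.$$

24.3 Legendre polynomials $(P_k)_{k\in\mathbb{N}_0}$. We choose
$$I = [-1,1], \qquad \rho(x)\,dx = dx$$
and we get
$$P_k(x) = J_k^{(0,0)}(x) = \frac{(-1)^k}{k!\,2^k}\,\frac{d^k}{dx^k}(1-x^2)^k, \qquad \|P_k\|_2^2 = \frac{2}{2k+1}.$$
The first few Legendre polynomials are
$$1, \quad x, \quad \tfrac12(3x^2 - 1), \quad \tfrac12(5x^3 - 3x), \quad \tfrac18(35x^4 - 30x^2 + 3), \quad \tfrac18(63x^5 - 70x^3 + 15x)$$
and the following recursion formula holds:
$$(k+1)\,P_{k+1}(x) = (2k+1)\,x\,P_k(x) - k\,P_{k-1}(x), \qquad k\in\mathbb{N}.$$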
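24.4 Laguerre polynomials $(L_k)_{k\in\mathbb{N}_0}$. We choose
$$I = [0,\infty), \qquad \rho(x)\,dx = e^{-x}\,dx$$
and we get
$$L_k(x) = e^{x}\,\frac{d^k}{dx^k}\big(e^{-x}x^k\big) = k!\sum_{j=0}^{k}(-1)^j\binom{k}{j}\frac{x^j}{j!}, \qquad \|L_k\|_2^2 = (k!)^2.$$
The first few Laguerre polynomials are
$$1, \quad 1 - x, \quad x^2 - 4x + 2, \quad -x^3 + 9x^2 - 18x + 6, \quad x^4 - 16x^3 + 72x^2 - 96x + 24$$
and the following recursion formula holds:
$$L_{k+1}(x) = (2k+1-x)\,L_k(x) - k^2\,L_{k-1}(x), \qquad k\in\mathbb{N}.$$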
24.5 Hermite polynomials $(H_k)_{k\in\mathbb{N}_0}$. We choose
$$I = (-\infty,\infty), \qquad \rho(x)\,dx = e^{-x^2}\,dx$$
and we get
$$H_k(x) = (-1)^k\,e^{x^2}\,\frac{d^k}{dx^k}\,e^{-x^2}, \qquad \|H_k\|_2^2 = \sqrt{\pi}\,2^k\,k!.$$
The first few Hermite polynomials are
$$1, \quad 2x, \quad 4x^2 - 2, \quad 8x^3 - 12x, \quad 16x^4 - 48x^2 + 12$$
and the following recursion formula holds:
$$H_{k+1}(x) = 2x\,H_k(x) - 2k\,H_{k-1}(x), \qquad k\in\mathbb{N}.$$
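The three-term recursions of 24.2–24.5 are easy to check against the listed polynomials. The following is a minimal sketch in exact rational arithmetic; the helper names (`mul`, `lin`, `three_term`, `integral_on_minus1_1`) are illustrative choices and not notation from the text, and polynomials are stored as coefficient lists with the lowest degree first.

```python
from fractions import Fraction

X = [Fraction(0), Fraction(1)]  # the polynomial x, lowest degree first


def mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def lin(a, p, b, q):
    """Linear combination a*p + b*q with trailing zero coefficients removed."""
    out = [Fraction(0)] * max(len(p), len(q))
    for i, c in enumerate(p):
        out[i] += Fraction(a) * c
    for i, c in enumerate(q):
        out[i] += Fraction(b) * c
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out


def three_term(n, p1, step):
    """[p_0, ..., p_n] with p_0 = 1, p_1 = p1, p_{k+1} = step(k, p_k, p_{k-1})."""
    polys = [[Fraction(1)], [Fraction(c) for c in p1]]
    for k in range(1, n):
        polys.append(step(k, polys[k], polys[k - 1]))
    return polys[: n + 1]


def chebyshev(n):  # T_{k+1} = 2x T_k - T_{k-1}
    return three_term(n, [0, 1], lambda k, p, q: lin(2, mul(X, p), -1, q))


def legendre(n):  # (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
    return three_term(
        n, [0, 1],
        lambda k, p, q: lin(Fraction(2 * k + 1, k + 1), mul(X, p), Fraction(-k, k + 1), q),
    )


def laguerre(n):  # L_{k+1} = (2k+1-x) L_k - k^2 L_{k-1}
    return three_term(
        n, [1, -1],
        lambda k, p, q: lin(1, mul([Fraction(2 * k + 1), Fraction(-1)], p), -k * k, q),
    )


def hermite(n):  # H_{k+1} = 2x H_k - 2k H_{k-1}
    return three_term(n, [0, 2], lambda k, p, q: lin(2, mul(X, p), -2 * k, q))


def integral_on_minus1_1(p):
    """Exact integral of a polynomial over [-1, 1]; odd powers integrate to 0."""
    return sum(c * Fraction(2, m + 1) for m, c in enumerate(p) if m % 2 == 0)


if __name__ == "__main__":
    for name, family in [("T", chebyshev(5)), ("P", legendre(5)),
                         ("L", laguerre(4)), ("H", hermite(4))]:
        print(name, [[str(c) for c in p] for p in family])
```

For the Legendre case the weight is $\rho \equiv 1$, so `integral_on_minus1_1(mul(P[j], P[k]))` can be compared directly with $\|P_k\|_2^2 = 2/(2k+1)$ and with $0$ for $j \neq k$.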
In order to decide if a family of polynomials $(p_k)_{k\in\mathbb{N}_0} \subset L^2(I, \rho(x)\,dx)$ is a complete ONS we have to show that
$$\int_I u(x)\,p_k(x)\,\rho(x)\,dx = 0 \quad \forall\, k\in\mathbb{N}_0 \implies u = 0 \text{ a.e.}$$
The key technical result is the Weierstraß approximation theorem.

24.6 Theorem (Weierstraß) Polynomials are dense in $C[0,1]$ w.r.t. uniform convergence.

Proof (S.N. Bernstein) Take a sequence $(X_j)_{j\in\mathbb{N}}$ of independent² measurable functions on $([0,1], \mathcal{B}[0,1], dx)$ which are all Bernoulli $(p, 1-p)$-distributed, $0 < p < 1$, i.e.
$$\lambda(X_j = 1) = p \quad\text{and}\quad \lambda(X_j = 0) = 1-p \qquad \forall\, j\in\mathbb{N},$$
cf. 17.4 for the construction of such a sequence. Write $S_n = X_1 + \cdots + X_n$ for the partial sum and observe that, due to independence,
$$\lambda(S_n = k) = \sum_{1\le j_1<\cdots<j_k\le n} \lambda\big(\{X_{j_1}=1\}\cap\ldots\cap\{X_{j_k}=1\}\cap\{X_{j_{k+1}}=0\}\cap\ldots\cap\{X_{j_n}=0\}\big) = \binom{n}{k}\,p^k(1-p)^{n-k},$$
which shows that
$$\int u\Big(\frac{S_n(x)}{n}\Big)\,dx = \sum_{k=0}^{n} u\Big(\frac{k}{n}\Big)\binom{n}{k}\,p^k(1-p)^{n-k} = B_n(u;p),$$
where $B_n(u;p)$ stands for the $n$th Bernstein polynomial.³ From 17.4 we also know that
$$\int \Big(\frac{S_n(x)}{n} - p\Big)^2 dx = \frac{p(1-p)}{n} \le \frac{1}{4n} \tag{24.1}$$
since the function $p \mapsto p(1-p)$ attains its maximum at $p = 1/2$. As $u \in C[0,1]$ is uniformly continuous, $|u(x) - u(y)| < \epsilon$ whenever $|x-y| < \delta$ is sufficiently small. Thus
$$|B_n(u;p) - u(p)| \le \int \Big|u\Big(\frac{S_n}{n}\Big) - u(p)\Big|\,d\lambda = \int_{\{|S_n/n - p| < \delta\}} \Big|u\Big(\frac{S_n}{n}\Big) - u(p)\Big|\,d\lambda + \int_{\{|S_n/n - p| \ge \delta\}} \Big|u\Big(\frac{S_n}{n}\Big) - u(p)\Big|\,d\lambda$$
$$\le \epsilon\,\lambda\Big(\Big|\frac{S_n}{n} - p\Big| < \delta\Big) + 2\|u\|_\infty\,\lambda\Big(\Big|\frac{S_n}{n} - p\Big| \ge \delta\Big) \le \epsilon + 2\|u\|_\infty\,\frac{1}{\delta^2}\int \Big(\frac{S_n}{n} - p\Big)^2 d\lambda \overset{(24.1)}{\le} \epsilon + \frac{\|u\|_\infty}{2n\delta^2}$$
by Markov's inequality P10.12 (in the penultimate step) and (24.1). The above inequality is independent of $p \in [0,1]$, and the assertion follows by letting first $n \to \infty$ and then $\epsilon \to 0$.

² In the sense of Example 17.3(x).
³ In view of the strong law of large numbers, Example 18.8, we observe that $n^{-1}S_n \xrightarrow{n\to\infty} p$ a.e., so that by dominated convergence $B_n(u;p) \xrightarrow{n\to\infty} u(p)$ for each $p \in (0,1)$. Since our argument includes this result as a particular case, we leave it as a side-remark.

24.7 Remark The key ingredient in the above proof is (24.1) which shows that the variance of the random variable $S_n/n$ vanishes uniformly (in $p$) as $n \to \infty$. A short calculation confirms that this is equivalent to saying that
$$B_n\big((\bullet - p)^2; p\big) \xrightarrow{n\to\infty} 0 \quad\text{uniformly for } p \in [0,1].$$
With this information, the proof then yields that $B_n(u;p) \xrightarrow{n\to\infty} u$ uniformly in $p$ for all continuous $u$. This is, in fact, a special case of Korovkin's theorem: A sequence of positive linear operators from $C[0,1]$ to $C[0,1]$ converges uniformly for every $u \in C[0,1]$ if, and only if, it converges uniformly for each of the following three test functions: $1, x, x^2$.
(In the present case, the operators are $u \mapsto B_n(u;p)$.) More on this topic can be found in Korovkin's monograph [24, pp. 1–30] or the expository paper [4] by Bauer.

24.8 Corollary The monomials $(t^j)_{j\in\mathbb{N}_0}$ are complete in $L^1 = L^1([0,1], dt)$, that is, $\int_{[0,1]} u(t)\,t^j\,dt = 0$ for all $j \in \mathbb{N}_0$ implies that $u = 0$ a.e.

Proof Assume first that $u \in C[0,1]$ satisfies $\int_{[0,1]} u(t)\,t^j\,dt = 0$ for all $j \in \mathbb{N}_0$. This implies, in particular, that
$$\int_{[0,1]} u(t)\,p(t)\,dt = 0 \quad\text{for all polynomials } p(t).$$
Using Weierstraß' approximation theorem 24.6 we find a sequence of polynomials $(p_k)_{k\in\mathbb{N}}$ which approximate $u$ uniformly on $[0,1]$. Since $C[0,1] \subset L^2[0,1] \subset L^1[0,1]$, we see
$$\int_{[0,1]} u^2\,dt = \int_{[0,1]} u\cdot(u - p_k)\,dt \le \|u\|_2\cdot\|u - p_k\|_2 \le \|u\|_2\cdot\|u - p_k\|_\infty \xrightarrow{k\to\infty} 0$$
and conclude that $u = 0$ a.e. (even everywhere since $u$ is continuous).
Assume now that $u \in L^1([0,1], dt) \setminus C[0,1]$ such that $\int_{[0,1]} u(t)\,t^j\,dt = 0$ for all $j \in \mathbb{N}_0$. The primitive
$$U(x) = \int_{(x,1)} u(t)\,dt$$
is a continuous function, cf. Problem 11.7, and by Fubini's theorem 13.9 we see for all $j \in \mathbb{N}$
$$\int_{[0,1]} U(x)\,x^{j-1}\,dx = \int_{[0,1]}\int_{[0,1]} 1_{(x,1)}(t)\,u(t)\,x^{j-1}\,dt\,dx = \int_{[0,1]}\Big(\int_{[0,1]} 1_{(0,t)}(x)\,x^{j-1}\,dx\Big)u(t)\,dt = \int_{[0,1]} \frac{t^j}{j}\,u(t)\,dt = 0.$$
This means that $\int_{[0,1]} U(x)\,x^k\,dx = 0$ for all $k \in \mathbb{N}_0$ and, by the first part of the proof, that $U \equiv 0$. Lebesgue's differentiation theorem 19.20 finally shows that $u(x) = -U'(x) = 0$ a.e.
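Bernstein's proof of Theorem 24.6 is constructive, so the approximation can be observed numerically. The sketch below evaluates $B_n(u;p) = \sum_{k=0}^n u(k/n)\binom{n}{k}p^k(1-p)^{n-k}$ directly from the formula in the proof; the function names and the test function $u(x) = |x - \tfrac12|$ are illustrative assumptions, and the supremum is only taken over a finite grid of $p$-values.

```python
from math import comb


def bernstein(u, n, p):
    """n-th Bernstein polynomial B_n(u; p) = sum_k u(k/n) C(n,k) p^k (1-p)^(n-k)."""
    return sum(u(k / n) * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


def sup_error(u, n, grid=101):
    """Maximum of |B_n(u; p) - u(p)| over an equidistant grid of p-values in [0, 1]."""
    ps = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(u, n, p) - u(p)) for p in ps)


def u(x):
    """Continuous on [0, 1] but not differentiable at 1/2."""
    return abs(x - 0.5)


if __name__ == "__main__":
    for n in (10, 50, 250):
        print(n, sup_error(u, n))
```

Applying `bernstein` to the function $x \mapsto (x-p)^2$ returns $p(1-p)/n$ up to rounding, which is the variance identity (24.1) on which the proof rests.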
It is not hard to see that Theorem 24.6 and Corollary 24.8 also hold for the interval $[-1,1]$ and even for general compact intervals $[a,b]$ (cf. Problem 24.3). This we can use to show that the Jacobi (hence, Legendre and Chebyshev) polynomials are dense in $L^2(I, \rho(x)\,dx)$ and form a complete ONS. Note that
$$\int_{[-1,1]} |u|\,\rho\,dx = \int_{[-1,1]} \sqrt{u^2\rho}\cdot\sqrt{\rho}\,dx \le \Big(\int_{[-1,1]} u^2\rho\,dx\Big)^{1/2}\cdot\Big(\int_{[-1,1]} \rho\,dx\Big)^{1/2} < \infty$$
implies that $u\rho \in L^1([-1,1], dx)$, and from Corollary 24.8 and the fact that $\rho > 0$ we get
$$\int_{[-1,1]} u(x)\,\rho(x)\,x^j\,dx = 0 \ \ \forall\, j\in\mathbb{N}_0 \implies u\rho = 0 \text{ a.e.} \implies u = 0 \text{ a.e.}$$
This does not quite work for the Hermite and Laguerre polynomials, which are defined on infinite intervals. For the latter we take $u \in L^2([0,\infty), e^{-x}\,dx)$, and find for all $s \ge 1$
$$\int_{[0,\infty)} u(x)\,e^{-sx}\,dx = \int_{[0,\infty)} u(x)\,e^{(1-s)x}\,e^{-x}\,dx = \sum_{k=0}^{\infty} \frac{(1-s)^k}{k!}\underbrace{\int_{[0,\infty)} u(x)\,x^k\,e^{-x}\,dx}_{=0} = 0$$
(note that the integral and the sum can be interchanged by dominated convergence). Using Jacobi's formula C15.8 to change coordinates according to $t = e^{-x}$, $dt/dx = -e^{-x}$, we get
$$0 = \int_{[0,\infty)} u(x)\,e^{-sx}\,dx = \int_{[0,1]} u(-\ln t)\,t^{s-1}\,dt \qquad \forall\, s \ge 1,$$
and for $s \in \mathbb{N}$ the above equality reduces to the case covered by Corollary 24.8. A very similar calculation can be used for the Hermite polynomials since
$$\int_{(-\infty,\infty)} u(x)\,e^{-sx^2}\,dx = \int_{[0,\infty)} \big(u(x) + u(-x)\big)\,e^{-sx^2}\,dx = \int_{[0,\infty)} \big(u(\sqrt{t}) + u(-\sqrt{t})\big)\,e^{-st}\,\frac{dt}{2\sqrt{t}}$$
where we used the obvious substitution $x = \sqrt{t}$.
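
The trigonometric system and Fourier series
We consider now $L^2 = L^2((-\pi,\pi), \mathcal{B}(-\pi,\pi), \lambda = \lambda^1|_{(-\pi,\pi)})$. As before we use $dx$ as a shorthand for $\lambda(dx)$. The trigonometric system consists of the functions
$$\frac{1}{\sqrt{2\pi}},\ \frac{\cos x}{\sqrt{\pi}},\ \frac{\sin x}{\sqrt{\pi}},\ \frac{\cos 2x}{\sqrt{\pi}},\ \frac{\sin 2x}{\sqrt{\pi}},\ \ldots,\ \frac{\cos kx}{\sqrt{\pi}},\ \frac{\sin kx}{\sqrt{\pi}},\ \ldots \tag{24.2}$$
or, equivalently,
$$\frac{1}{\sqrt{2\pi}}\,e^{ikx}, \qquad k\in\mathbb{Z},\ i = \sqrt{-1}. \tag{24.3}$$
Since $e^{ix} = \cos x + i\sin x$, we can see that (24.2) and (24.3) are equivalent, and from now on we will only consider (24.2). Orthogonality of the functions in (24.2) follows easily from the classical result that
$$\int_{(-\pi,\pi)} \cos kx\,\cos \ell x\,dx = \begin{cases} 0 & \text{if } k \neq \ell, \\ \pi & \text{if } k = \ell \ge 1, \\ 2\pi & \text{if } k = \ell = 0, \end{cases} \tag{24.4}$$
together with the analogous formulae for $\sin kx\,\sin \ell x$ and $\cos kx\,\sin \ell x$ (the latter integral always vanishes), which we leave as an exercise for the reader, see Problem 24.4.

24.9 Definition A trigonometric polynomial (of order $n$) is an expression of the form
$$T(x) = \alpha_0 + \sum_{j=1}^{n}\big(\alpha_j\cos jx + \beta_j\sin jx\big) \tag{24.5}$$
where $n \in \mathbb{N}_0$, $\alpha_j, \beta_j \in \mathbb{R}$ and $\alpha_n^2 + \beta_n^2 > 0$.
It is not hard to see that the representation (24.5) of $T(x)$ is equivalent to
$$T(x) = \sum_{j,k=0}^{n} \gamma_{jk}\,\cos^j x\,\sin^k x$$
with coefficients $\gamma_{jk} \in \mathbb{R}$, cf. Problem 24.5. It is this way of writing $T(x)$ that justifies the name trigonometric polynomial.

24.10 Theorem The trigonometric system (24.2) is a complete ONS in $L^2 = L^2((-\pi,\pi), dx)$.

Proof We have to show that
$$\left.\begin{aligned} \int_{(-\pi,\pi)} u(x)\cos kx\,dx &= 0 \quad \forall\, k\in\mathbb{N}_0 \\ \int_{(-\pi,\pi)} u(x)\sin \ell x\,dx &= 0 \quad \forall\, \ell\in\mathbb{N} \end{aligned}\right\} \implies u = 0 \text{ a.e.} \tag{24.6}$$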
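Assume first that $u$ is continuous and that, contrary to (24.6), $u(x_0) = c \neq 0$ for some $x_0 \in (-\pi,\pi)$. Without loss of generality we may assume that $c > 0$. Since the trigonometric functions are $2\pi$-periodic, we can extend $u$ periodically onto the whole real line. Then $w(x) = c^{-1}u(x + x_0)$ is continuous around $x = 0$, orthogonal on $(-\pi,\pi)$ to any of the functions in (24.2) [✓], and satisfies $w(0) = 1$. As $w$ is continuous, there is some $0 < \delta < \pi$ such that
$$w(x) > \tfrac12 \qquad \forall\, x\in(-\delta,\delta).$$
Consider the trigonometric polynomial
$$t(x) = 1 - \cos\delta + \cos x.$$
[Figure: graph of $t(x) = 1 - \cos\delta + \cos x$ on $[-\pi,\pi]$; the maximum value $2 - \cos\delta$ is attained at $x = 0$ and $t(\pm\delta) = 1$.]
Obviously, $t(x)$ and all powers $t^N(x)$ are polynomials in $\cos x$. From de Moivre's formula
$$e^{ikx} = (\cos x + i\sin x)^k$$
it is easy to see that $\cos^k x = \sum_{j=0}^{k} c_j\cos jx$ [✓], see also Gradshteyn and Ryzhik [17, 1.32]. We can thus write $t^N(x)$ as linear combination of $\cos kx$, $k = 0, 1, \ldots, N$. By assumption, $w$ is orthogonal to all of them, and so
$$0 = \int_{(-\pi,\pi)} w(x)\,t^N(x)\,dx = \Big(\int_{(-\pi,-\delta)} + \int_{(-\delta,\delta)} + \int_{(\delta,\pi)}\Big)\,w(x)\,t^N(x)\,dx. \tag{24.7}$$
On $(-\delta,\delta)$ we have $w(x) > \tfrac12$ as well as $t(x) > 1$, hence
$$\int_{(-\delta,\delta)} w(x)\,t^N(x)\,dx \ge \frac12\int_{(-\delta,\delta)} t^N(x)\,dx \xrightarrow{N\to\infty} \infty$$
by monotone convergence T9.6. On the other hand, $|t(x)| \le 1$ for $x \in (-\pi,-\delta)\cup(\delta,\pi)$ and
$$\Big|\int_{(-\pi,-\delta)\cup(\delta,\pi)} w(x)\,t^N(x)\,dx\Big| \le 2(\pi-\delta)\,\|w\|_\infty < \infty \qquad \forall\, N\in\mathbb{N},$$
which means that (24.7) is impossible, i.e. $w \equiv 0$ and $u \equiv 0$.
An arbitrary function $u \in L^2((-\pi,\pi), dx)$ is, due to the finiteness of the measure, integrable [✓], and we may consider the primitive
$$U(x) = \int_{(-\pi,x)} u(t)\,dt$$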
which is a continuous function, cf. Problem 11.7. Moreover,
$$U(-\pi) = 0 = \int_{(-\pi,\pi)} u(t)\,dt = U(\pi)$$
because of the assumption that $u$ is orthogonal to every function from (24.2) and, in particular, to $t \mapsto 1/\sqrt{2\pi}$. By Fubini's theorem 13.9 we get
$$\int_{(-\pi,\pi)} U(x)\cos kx\,dx = \int_{(-\pi,\pi)}\int_{(-\pi,\pi)} 1_{(-\pi,x)}(t)\,u(t)\cos kx\,dt\,dx = \int_{(-\pi,\pi)}\Big(\int_{(-\pi,\pi)} 1_{(t,\pi)}(x)\cos kx\,dx\Big)u(t)\,dt = -\int_{(-\pi,\pi)} u(t)\,\frac{\sin kt}{k}\,dt = 0$$
and we conclude from the first part of the proof that $U \equiv 0$. Lebesgue's differentiation theorem 19.20 finally shows that $u(x) = U'(x) = 0$ a.e.

Since the trigonometric system is one of the most important ONSs, we provide a further proof of the completeness theorem which gives some more insight into Fourier series and yields even an independent proof of Weierstraß' approximation theorem 24.6 for trigonometric polynomials, cf. Corollary 24.12 below.
We begin with an elementary but fundamental consideration which goes back to Fejér. If $u \in L^2((-\pi,\pi), dt)$, we write
$$a_j = \frac{1}{\pi}\int_{(-\pi,\pi)} u(t)\cos jt\,dt, \qquad b_k = \frac{1}{\pi}\int_{(-\pi,\pi)} u(t)\sin kt\,dt \tag{24.8}$$
($j \in \mathbb{N}_0$, $k \in \mathbb{N}$) for the Fourier cosine and sine coefficients of $u$ and set
$$s_N(u;x) = \frac{a_0}{2} + \sum_{j=1}^{N}\big(a_j\cos jx + b_j\sin jx\big) = \frac{1}{\pi}\int_{(-\pi,\pi)} u(t)\Big(\frac12 + \sum_{j=1}^{N}\big(\cos jt\,\cos jx + \sin jt\,\sin jx\big)\Big)dt = \frac{1}{\pi}\int_{(-\pi,\pi)} u(t)\underbrace{\Big(\frac12 + \sum_{j=1}^{N}\cos j(t-x)\Big)}_{=:\,D_N(t-x)}dt \tag{24.9}$$
where we used the trigonometric formula
$$\cos a\,\cos b + \sin a\,\sin b = \cos(a-b). \tag{24.10}$$
The function $D_N(\bullet)$ is called the Dirichlet kernel. In Problem 24.6 we will see that $D_N(\bullet)$ has the following closed-form expression:
$$D_N(x) = \frac{\sin\big(N + \tfrac12\big)x}{2\sin\tfrac{x}{2}} \tag{24.11}$$
but we do not need this formula in the sequel. Now we introduce the Cesàro C-1 mean
$$\sigma_N(u;x) = \frac{1}{N+1}\big(s_0(u;x) + s_1(u;x) + \cdots + s_N(u;x)\big) \tag{24.12}$$
and in view of (24.9) we want to compute what is known as the Fejér kernel
$$K_N(x) = \frac{1}{N+1}\big(D_0(x) + D_1(x) + \cdots + D_N(x)\big).$$
Using again (24.10) and observing that the cosine is even, we find for every $k = 0, 1, \ldots, N$
$$(1-\cos x)\,D_k(x) = \frac12\sum_{j=-k}^{k}(1-\cos x)\cos jx \overset{(24.10)}{=} \frac12\sum_{j=-k}^{k}\big(\cos jx - \cos(j-1)x + \sin x\,\sin jx\big) = \frac12\big(\cos kx - \cos(k+1)x\big)$$
since $\sin jx = -\sin(-jx)$ is an odd function which cancels if we sum over $-k \le j \le k$. Summing over all values of $k = 0, 1, \ldots, N$ shows
$$K_N(x) = \frac{1}{N+1}\big(D_0(x) + D_1(x) + \cdots + D_N(x)\big) = \frac{1}{2(N+1)}\cdot\frac{1-\cos(N+1)x}{1-\cos x}. \tag{24.13}$$
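The telescoping argument leading to (24.13) can be cross-checked numerically. The following sketch compares the arithmetic mean of the Dirichlet kernels $D_0, \ldots, D_N$, computed from the cosine sums in (24.9), with the closed form (24.13), and also evaluates the normalisation $\frac1\pi\int K_N\,dt$; the function names and the midpoint-rule quadrature are illustrative choices, not part of the text.

```python
from math import cos, pi


def dirichlet(N, x):
    """D_N(x) = 1/2 + sum_{j=1}^N cos(jx), as in (24.9)."""
    return 0.5 + sum(cos(j * x) for j in range(1, N + 1))


def fejer_mean(N, x):
    """K_N(x) as the arithmetic mean of D_0(x), ..., D_N(x)."""
    return sum(dirichlet(k, x) for k in range(N + 1)) / (N + 1)


def fejer_closed(N, x):
    """Closed form (24.13); only meaningful where cos(x) != 1."""
    return (1 - cos((N + 1) * x)) / (2 * (N + 1) * (1 - cos(x)))


def normalised_integral(f, M=256):
    """(1/pi) * integral of f over (-pi, pi), midpoint rule with M cells."""
    h = 2 * pi / M
    return sum(f(-pi + (i + 0.5) * h) for i in range(M)) * h / pi


if __name__ == "__main__":
    for N in (1, 5, 12):
        x = 0.7
        print(N, fejer_mean(N, x), fejer_closed(N, x),
              normalised_integral(lambda t: fejer_mean(N, t)))
```

The closed form also makes the non-negativity of $K_N$ visible, a property which the Dirichlet kernel $D_N$ does not share.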
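
24.11 Lemma (Fejér) If $u \in C[-\pi,\pi]$, then $\lim_{N\to\infty}\|\sigma_N(u) - u\|_p = 0$ for all $1 \le p \le \infty$.

Proof From (24.9), (24.12) and (24.13) we get after a change of variables in the integrals
$$\sigma_N(u;x) = \frac{1}{\pi}\int_{(-\pi,\pi)} u(t)\,K_N(x-t)\,dt = \frac{1}{2\pi(N+1)}\int_{(-\pi,\pi)} u(x-t)\,\frac{1-\cos(N+1)t}{1-\cos t}\,dt.$$
Since $\frac{1}{\pi}\int_{(-\pi,\pi)} K_N(t)\,dt = 1$ [✓], we see for all $\epsilon > 0$ and sufficiently small $\delta > 0$
$$\|\sigma_N(u) - u\|_p = \Big\|\frac{1}{2\pi(N+1)}\int_{(-\pi,\pi)} \big(u(\bullet - t) - u\big)\,\frac{1-\cos(N+1)t}{1-\cos t}\,dt\Big\|_p$$
$$\overset{12.14}{\le} \frac{1}{2\pi(N+1)}\int_{(-\pi,\pi)} \|u(\bullet - t) - u\|_p\,\frac{1-\cos(N+1)t}{1-\cos t}\,dt$$
$$\le \frac{1}{2\pi(N+1)}\int_{(-\delta,\delta)} \|u(\bullet - t) - u\|_p\,\frac{1-\cos(N+1)t}{1-\cos t}\,dt + \frac{\|u\|_p}{\pi(N+1)}\int_{(-\pi,-\delta)\cup(\delta,\pi)} \frac{1-\cos(N+1)t}{1-\cos t}\,dt$$
$$\le \epsilon + \frac{\|u\|_p}{N+1}\cdot\frac{4}{1-\cos\delta}$$
where we used Jensen's inequality and the fact that $\lim_{t\to 0}\|u(\bullet - t) - u\|_p = 0$ by dominated convergence ($p < \infty$), resp. uniform continuity ($p = \infty$). Letting first $N \to \infty$ and then $\epsilon \to 0$ finishes the proof.

24.12 Corollary (Weierstraß) The trigonometric polynomials are dense in $C[-\pi,\pi]$ under $\|\bullet\|_\infty$ and dense in $L^p((-\pi,\pi), dt)$ w.r.t. $\|\bullet\|_p$, $1 \le p < \infty$.

Proof From (24.9), (24.12) it is obvious that $\sigma_N(u;\bullet)$ is a trigonometric polynomial. The density of the trigonometric polynomials in $C[-\pi,\pi]$ is just Lemma 24.11. Since $C[-\pi,\pi]$ is dense in $L^p((-\pi,\pi), dt)$, cf. Theorem 15.17, we can find for every $\epsilon > 0$ and $u \in L^p(-\pi,\pi)$ some $g \in C[-\pi,\pi]$ with $\|u - g\|_p \le \epsilon$ and a trigonometric polynomial $t$ such that $\|g - t\|_\infty \le \epsilon\,(2\pi)^{-1/p}$. This shows
$$\|u - t\|_p \le \|u - g\|_p + \|g - t\|_p \le \epsilon + (2\pi)^{1/p}\,\|g - t\|_\infty \le 2\epsilon.$$
For the last estimate we also used that $\|w\|_p \le (2\pi)^{1/p}\,\|w\|_\infty$.

24.13 Corollary The trigonometric system (24.2) is a complete ONS in $L^2 = L^2((-\pi,\pi), dt)$.

Proof (of C24.13 and, again, of T24.10) Let $u \in L^2(-\pi,\pi)$ and pick a trigonometric polynomial $t$ such that $\|u - t\|_2 \le \epsilon$, cf. Corollary 24.12. Let $n = \operatorname{degree}(t)$. As in the proof of Theorem 24.10 we use de Moivre's formula to see that $\cos^k x$ and $\sin^k x$ can be represented as linear combinations of $1, \cos x, \ldots, \cos kx$ and $\sin x, \ldots, \sin kx$ [✓].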
Recall that the partial sum $s_n(u;x) = a_0/2 + \sum_{j=1}^{n}(a_j\cos jx + b_j\sin jx)$ is the projection of $u$ onto $\operatorname{span}\{1, \cos x, \sin x, \ldots, \cos nx, \sin nx\}$. Therefore, Theorem 21.11(i) applies and
$$\|u - s_n(u)\|_2 \le \|u - t\|_2 \le \epsilon$$
proves completeness.

The above proof of the completeness of the trigonometric system has a further advantage as it allows a glimpse into other modes of convergence of Fourier series. We have

24.14 Corollary (M. Riesz' theorem) Let $u \in L^p((-\pi,\pi), dt)$ and $1 \le p < \infty$. Then
$$\lim_{n\to\infty}\|u - s_n(u)\|_p = 0 \iff \|s_n(u)\|_p \le C_p\,\|u\|_p \quad \forall\, n\in\mathbb{N} \tag{24.14}$$
with an absolute constant $C_p$ not depending on $u$ or $n \in \mathbb{N}$.

Proof The 'only if' part is a consequence of the uniform boundedness principle (Banach–Steinhaus theorem) from functional analysis, see e.g. Rudin [40, §5.8] or Problem 21.10. The 'if' part follows from the observation that every trigonometric polynomial $T$ of degree $n$ satisfies $s_n(T) = T$ [✓]. Choosing for $u \in L^p$ the polynomial $T = t$ with $\|u - t\|_p \le \epsilon$, cf. Corollary 24.12, we infer that for sufficiently large $n > \operatorname{degree}(t)$
$$\|u - s_n(u)\|_p \le \|u - t\|_p + \underbrace{\|t - s_n(t)\|_p}_{=0} + \|s_n(t) - s_n(u)\|_p \le (1 + C_p)\,\|u - t\|_p \le (1 + C_p)\,\epsilon.$$

Establishing the estimate $\|s_n(u)\|_p \le C_p\,\|u\|_p$ is an altogether different matter and so is the whole $L^p$- and pointwise convergence theory for Fourier series. Here we want to mention only a few facts:
• $L^p$-convergence ($1 \le p < \infty$) of the Cesàro means $\sigma_n(u)$ follows immediately from Lemma 24.11. This is in stark contrast to…
• $L^p$-convergence ($1 \le p < \infty$, $p \neq 2$) of the partial sums $s_n(u)$ requires the estimate (24.14); see Corollary 24.14 and, for more details, Wheeden and Zygmund [53, §12.88].
• Pointwise a.e. convergence of the partial sums $s_n(u) \xrightarrow{n\to\infty} u$ when $u \in L^2$ or $u \in L^p$, $1 < p < \infty$, which had been an open problem until 1966. A.N. Kolmogorov constructed in 1922/23 a function $u \in L^1$ whose Fourier series diverges a.e. In his famous 1966 paper L. Carleson proved that a.e. convergence holds for $u \in L^2$, and R.A. Hunt extended this result in 1968 to $u \in L^p$, $1 < p < \infty$.
All these deep results depend on estimates of the type (24.14) and, more importantly, on estimates for $\max_{0\le j\le n}|s_j(u)|$ which resemble the maximal martingale estimates which we have encountered in Chapter 19, e.g. T19.12. But there is a catch.

24.15 Lemma The subspace $\Sigma_n = \operatorname{span}\{1, \cos x, \sin x, \ldots, \cos nx, \sin nx\}$ of $L^2((-\pi,\pi), dx)$ is not of the form $L^2(\mathcal{A}_n)$ where $\mathcal{A}_n$ is a sub-$\sigma$-algebra of the Borel sets $\mathcal{B}(-\pi,\pi)$.

Proof The space $L^2(\mathcal{A}_n)$ is a lattice, i.e. if $f \in L^2(\mathcal{A}_n)$, then $|f| \in L^2(\mathcal{A}_n)$. Take $f(x) = \sin x$. Unfortunately,
$$|\sin x| = \frac{2}{\pi} - \frac{4}{\pi}\Big(\frac{\cos 2x}{1\cdot 3} + \frac{\cos 4x}{3\cdot 5} + \frac{\cos 6x}{5\cdot 7} + \cdots\Big)$$
so that $\sin(\bullet) \in \Sigma_n$ but $|\sin(\bullet)| \notin \Sigma_n$. (You might also want to have a look at Theorem 22.5 for a more systematic treatment.)

This means that martingale methods are not (immediately) applicable to Fourier series.

The Haar system
In contrast to Fourier series, the Haar system allows a complete martingale treatment. Throughout this section we consider $L^2 = L^2([0,1), \mathcal{B}[0,1), \lambda)$, $\lambda = \lambda^1|_{[0,1)}$.

24.16 Definition The Haar system consists of the functions
$$\left.\begin{aligned} \chi_{0,0}(x) &= 1_{[0,1)}(x), \\ \chi_{j,k}(x) &= 2^{k/2}\Big(1_{\left[\frac{2j-2}{2^{k+1}},\,\frac{2j-1}{2^{k+1}}\right)}(x) - 1_{\left[\frac{2j-1}{2^{k+1}},\,\frac{2j}{2^{k+1}}\right)}(x)\Big), \qquad 1 \le j \le 2^k,\ k\in\mathbb{N}_0 \end{aligned}\right\} \tag{24.15}$$
Obviously, each Haar function is normalized to give $\|\chi_{j,k}\|_2 = 1$. The first few Haar functions are
[Figure: graphs of the Haar functions $\chi_{0,0}$, $\chi_{1,0}$, $\chi_{1,1}$, $\chi_{2,1}$, $\chi_{1,2}$, $\chi_{2,2}$, $\chi_{3,2}$, $\chi_{4,2}$ on $[0,1)$.]
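It is often more convenient to arrange the double sequence (24.15) in lexicographical order: $\chi_{0,0}$; $\chi_{1,0}$; $\chi_{1,1}, \chi_{2,1}$; $\chi_{1,2}, \chi_{2,2}, \chi_{3,2}, \chi_{4,2}$; … and to relabel them in the following way
$$H_0 = \chi_{0,0}, \qquad H_n = H_{2^k+\ell} = \chi_{\ell+1,k}, \qquad 0 \le \ell \le 2^k - 1 \tag{24.16}$$
(note that the representation $n = 2^k + \ell$, $0 \le \ell \le 2^k - 1$ is unique). We can now associate with the sequence $(H_n)_{n\in\mathbb{N}_0}$ a canonical filtration
$$\mathcal{A}_n^H = \sigma(H_0, H_1, \ldots, H_n), \qquad n\in\mathbb{N}_0,$$
which is the smallest $\sigma$-algebra that makes all functions $H_0, \ldots, H_n$ measurable, cf. Definition 7.5.

24.17 Theorem The Haar functions are a complete ONS in $L^2([0,1), dx)$. Moreover,
$$M_N = \sum_{n=0}^{N} a_n H_n, \qquad N\in\mathbb{N}_0,\ a_n\in\mathbb{R},$$
is a martingale w.r.t. the filtration $(\mathcal{A}_N^H)_{N\in\mathbb{N}_0}$, and for every $u \in L^p([0,1), dx)$, $1 \le p < \infty$, the Haar–Fourier series
$$s_N(u;x) = \sum_{n=0}^{N} \langle u, H_n\rangle\,H_n(x)$$
converges to $u$ in $L^p$ and almost everywhere, and the maximal inequality
$$\Big\|\sup_{n\in\mathbb{N}}|s_n(u)|\Big\|_p \le \frac{p}{p-1}\,\|u\|_p$$
holds for all $u \in L^p$ and $1 < p < \infty$.

Proof Step 1. Orthonormality: That $\|\chi_{j,k}\|_2 = 1$ is obvious. If the functions $\chi_{j,k} \neq \chi_{\ell,m}$ satisfy $\{\chi_{j,k} \neq 0\} \cap \{\chi_{\ell,m} \neq 0\} = \emptyset$, it is clear that $\int \chi_{j,k}\,\chi_{\ell,m}\,d\lambda = 0$. Otherwise, we can assume that $k < m$, so that either
$$\{\chi_{\ell,m} \neq 0\} \subset \{\chi_{j,k} > 0\} \qquad\text{or}\qquad \{\chi_{\ell,m} \neq 0\} \subset \{\chi_{j,k} < 0\}$$
obtains. In either case,
$$\int \chi_{\ell,m}\,\chi_{j,k}\,d\lambda = \pm 2^{k/2}\int \chi_{\ell,m}\,d\lambda = 0.$$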
Step 2. Martingale property: Let n = 2k + . Then, for all n ∈ , H n = 00 10 11 2k−1 k−1 1k 2k +1k 2 + 1 2 + 2 + 1 + 2 2k − 1 1 = 0 k+1 k k+1 1 k k+1 2 2 2k 2 2 2 = n
= n
where we used that the dyadic intervals are nested and refine. Assume, for simplicity, that < 2k − 1. Then Hn+1 = 0 ∈ n , and so Hn+1 x dx = 0 ∀ J ∈ n or J ∈ n J
(If = 2k − 1 we get an analogous conclusion with a rollover as H n is just the dyadic -algebra generated by all disjoint half-open intervals of length 2−k−1 in H 0 1.) By Theorem 23.5 we have En Hn+1 = 0, and by Theorem 23.8
EN MN +1 = EN MN + aN +1 HN +1 = MN + aN +1 EN HN +1 = MN This shows that MN H N N ∈ is indeed a martingale, cf. Corollary 23.14. H
H
H
Step 3. Convergence in L1 and a.e. if u ∈ L1 ∩ L : Set ak = u Hk , so that MN = sN u becomes the Haar–Fourier partial sum. Using Bessel’s inequality (Theorem 21.11) we see sN u 22 =
N k=0
u Hk 2 u 22
(24.17)
where the right-hand side is finite since L1 ∩ L ⊂ L2 ,[] and from the Cauchy– Schwarz C12.3 and Markov P10.12 inequalities we get for all R > 0 1/2 sN u d sN u 2 sN u > R sN u>R
1 1 sN u 22 u 22 R R
Since the constant function R is in L2 0 1 dx, the martingale sN uN ∈ is uniformly integrable in the sense of Definition 16.1, and we conclude from Theorems 18.6 and 23.15 that N →
sN u −−−→ u
in L1 and almost everywhere
Since H n n∈ contains the sequence k k∈ of dyadic -algebras – we have H H H indeed n = 2n −1 – we know that = n n ∈ = 0 1. Just as in Example 23.17 we see that
E2n −1 u = En u = s2n −1 u H
and in view of Theorem 23.15 we conclude that u = u a.e. Step 4. Convergence in Lp if u ∈ L1 ∩ L : Observe that L1 ∩ L ⊂ Lp for all 1 < p < .[] Applying the inequality b p p p p p−1 a − b a − b = p t dt a
p a − b max ap−1 bp−1 p a − b max ap−1 bp−1 a b ∈ , 1 < p < , to the martingale EN u = sN u, we get after integrating over 0 1 ± sN up − up d p sN u − u 1 u p−1 p > 1 H
H where we also used that sN u = EN u u as u u < , cf. Theorem 23.8(ix). From Riesz’ convergence theorem T12.10 we conclude that N →
sN u −−−→ u in Lp for all 1 < p < and all u ∈ L1 ∩ L . Step 5. Convergence in Lp if u ∈ Lp : If u ∈ Lp , 1 p < , is not bounded, we set uk = −k ∨ u ∧ k. Since we have a finite measure space, uk ∈ Lp ∩ L ⊂
L1 ∩ L , and we see from the triangle inequality and Theorem 23.8(v),(ii) sN u − u p sN u − sN uk p + sN uk − uk p + uk − u p sN uk − uk p + 2 uk − u p The claim follows as N → and then k → . Step 6. A.e. convergence if u ∈ Lp : Since sN u± = EN u± 0, we know from Corollary 23.16 that sN u± p N ∈ are submartingales which satisfy, by Theorem 23.8(ii), H
H
p p sN u± p d EN u± d = EN u± p u± pp H
Therefore, the submartingale convergence theorem 18.2 applies and shows that limN → sN u± xp exists a.e., hence, limN → sN u x exists a.e. Since step 5 and Corollary 12.8 already imply limj→ sNj u x = ux a.e. for some subsequence, we can identify the limit and get limN → sN u x = ux a.e. Step 7. Completeness follows from lim sN u − u 2 = 0 and T21.13. N →
Step 8. The maximal inequality is just Doob’s maximal Lp -inequality for martingales T19.12 since sn un∈ is a uniformly integrable martingale which is, by step 5 and Theorem 23.15, closed by s u = u. 24.18 Remark As a matter of fact, ordering the Haar functions in a sequence like Hn n∈0 does play a rôle. If p = 1, we can find (after some elementary but very tedious calculations) that
2n √
k/2
00 + 10 + 2 1k
2 1
k=1
while the lacunary series satisfies
n
k
10 + 2 12k
cn
k=1
1
for some absolute constant c > 0. Therefore, we can rearrange n=0 an Hn in such a way that it becomes a divergent series n=0 an Hn for some necessarily infinite permutation 0 → 0 . This phenomenon does not happen if 1 < p < . In fact, Hn n∈0 is what one calls an unconditional basis of Lp , 1 < p < , which means that every p rearrangement of the series n=0 an Hn converges in L and leads to the same limit. The Haar system is even the litmus test for the existence of unconditional bases: every Banach space B where Hn n∈0 is a basis has an unconditional
basis if, and only if, the basis Hn n∈0 is unconditional, cf. Olevski˘ı [32, p. 73, Corollary] or Lindenstrauss and Tzafriri [27, vol. II, p. 161, Corollary 2.c.11]. Since the unconditionality of Hn n∈0 rests on a martingale argument, we include a sketch of its proof. First we need the following Burkholder–Davis– Gundy inequalities for a martingale uj j∈0 on a probability space X :
p sup uj u• u• N Kp sup uj
(BDG) 0jN
p
p
0jN
p
for all N ∈ 0 , all 0 < p < and some absolute constants Kp p > 0. The expression u• u• N stands for the quadratic variation of the martingale u• u• N = u0 2 +
N −1
uj+1 − uj 2
j=0
A proof of (BDG) can be found in Rogers and Williams [38, vol. 2, pp. 94–6]. If we combine (BDG) with Doob’s maximal Lp -inequality 19.12 we get
p Kp
p uN p u• u• N u (BDG ) p−1 N p p for all N ∈ 0 and 1 < p < – mind the different range for p in (BDG ) compared to (BDG). Obviously, uN =
N
u Hk Hk
and
k=0
wN =
N
k u Hk Hk
k=0
k ∈ −1 +1, are uniformly integrable martingales (use the argument of the proof of Theorem 24.17) and their quadratic variations u• u• N = w• w• N coincide. Therefore, (BDG ) shows that the martingales uN − un N n and wN − wn N n satisfy
1/2
1/2
uN − un p ∼ u• − un u• − un N p = w• − wn w• − wn N p ∼ wN − wn p where a ∼ b means that a b K a for some absolute constants K > 0, so that either both sequences converge or diverge. Let us assume that uN N ∈0 converges. Then every lacunary series
u Hkj Hkj
converges
(24.18)
j=1
since we can produce its partial sums by adding and subtracting uN and wN with suitable ±1-sequences k k∈ . This entails that for every fixed permutation
0 → 0
N
u Hk Hk
N>n
sufficiently large
p
k=n
Otherwise, we could find finite sets 0 1 2 ⊂ 0 with kj j∈ = and
u Hk Hk
∀ n ∈
>
!
n∈ n
p
k∈n
contradicting (24.18). For more on this topic we refer to Lindenstrauss and Tzafriri [27].
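The statements of Definition 24.16 and Theorem 24.17 can be explored on a finite dyadic grid. The sketch below is an illustration under stated assumptions: functions are represented by their values on the $2^M$ dyadic cells of length $2^{-M}$ (so every $H_n$ with $n < 2^M$ is represented exactly), and all names (`haar`, `inner`, `partial_sum`, `dyadic_average`) are hypothetical helpers. It checks orthonormality of $H_0, \ldots, H_{2^M-1}$ and that the Haar–Fourier partial sum $s_{2^n-1}(u)$ is the average of $u$ over the dyadic intervals of length $2^{-n}$, which is the conditional-expectation picture behind the martingale argument.

```python
def haar(n, x):
    """Value H_n(x) of the n-th Haar function in the ordering (24.16), x in [0, 1)."""
    if n == 0:
        return 1.0
    k = n.bit_length() - 1      # n = 2^k + l with 0 <= l <= 2^k - 1
    j = n - (1 << k) + 1        # H_n = chi_{j,k} with j = l + 1
    scale = 2 ** (k + 1)
    if (2 * j - 2) / scale <= x < (2 * j - 1) / scale:
        return 2 ** (k / 2)
    if (2 * j - 1) / scale <= x < (2 * j) / scale:
        return -(2 ** (k / 2))
    return 0.0


M = 4                           # resolution: 2^M cells of length 2^-M
CELLS = 2 ** M
MID = [(i + 0.5) / CELLS for i in range(CELLS)]          # cell midpoints
H = [[haar(n, x) for x in MID] for n in range(CELLS)]    # H_0, ..., H_{2^M - 1}


def inner(f, g):
    """Integral of f*g over [0, 1) for functions given by their cell values."""
    return sum(a * b for a, b in zip(f, g)) / CELLS


def partial_sum(u, N):
    """Haar-Fourier partial sum s_N(u) = sum_{n <= N} <u, H_n> H_n on the grid."""
    coeffs = [inner(u, H[n]) for n in range(N + 1)]
    return [sum(c * H[n][i] for n, c in enumerate(coeffs)) for i in range(CELLS)]


def dyadic_average(u, n):
    """Average of u over the dyadic intervals of length 2^-n (0 <= n <= M)."""
    block = CELLS >> n
    return [sum(u[(i // block) * block:(i // block) * block + block]) / block
            for i in range(CELLS)]


if __name__ == "__main__":
    u = [x * x for x in MID]    # a step-function sample of u(x) = x^2
    for n in range(M + 1):
        s = partial_sum(u, 2 ** n - 1)
        print(n, max(abs(a - b) for a, b in zip(s, dyadic_average(u, n))))
```

On this grid the partial sum with all $2^M$ Haar functions reproduces the sampled function exactly, a finite-dimensional shadow of the completeness statement.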

The Haar wavelet
Let us now consider a Haar system on the whole real line, i.e. in $L^2 = L^2(\mathbb{R}, dx)$. We begin with the remark that the functions $\chi_{0,0} = 1_{[0,1)}$ and $\chi_{1,0} = 1_{[0,1/2)} - 1_{[1/2,1)}$ are the two basic Haar functions, since we can reconstruct all Haar functions $\chi_{j,k}$ from them by scaling and shifting:
$$\chi_{j,k}(x) = 2^{k/2}\,\chi_{1,0}(2^k x - j + 1), \qquad k\in\mathbb{N}_0,\ j = 1, 2, \ldots, 2^k. \tag{24.19}$$
The advantage of (24.19) over the definition (24.15) is that (24.19) easily extends to all pairs j k ∈ 2 and, thus to a system of functions on . 24.19 Definition The Haar wavelets are the system jk jk∈ where the mother wavelet is x = 101/2 x − 11/21 x and jk x = 2k/2 2k x − j = 2k/2 1 2j 2j+1 x − 1 2j+1 2j+2 x 2k+1 2k+1
2k+1 2k+1
for all j k ∈ . Note that = 10 = 10 , j−1k = jk for all j = 1 2 2k and k ∈ 0 while −10 x = 2−1/2 00 x for 0 x < 1. The Haar wavelets can be treated by martingale methods. To do so, we introduce the two-sided dyadic filtration j j+1 = n ∈ j ∈ = jn j ∈ n+1 2n+1 2n+1 (24.20) " = = ∅ = = − n n n∈
n∈
The last assertion follows from the fact that D = j2−n−1 j ∈ n ∈ is a dense subset of and that is generated by all intervals of the form a b where a b ∈ D (or, indeed, any other dense subset).[] In what follows we have to consider double summations. To keep notation simple, we write # $ ajk as a shorthand for ajk k=− j=−
and call kconst. the double sum.
j=−
the right tail and
k=− −
j=−
j=−
the left tail of
24.20 Theorem The Haar wavelets jk jk∈ are a complete ONS in L2 = L2 dx. Moreover, for all 1 < p < , u=
u jk jk
u ∈ Lp
(24.21)
k=− j=−
in Lp and almost everywhere, and
N
sup
2p − 1 u p u jk jk
MN ∈
p−1 p k=−M j=−
(24.22)
holds for all 1 < p < and u ∈ Lp . Proof Step 1. Orthonormality of the family jk jk∈ can be seen with arguments similar to those in step 1 of the proof of Theorem 24.17. Step 2. Lp 1 p < and a.e. convergence of the right tail of (24.21) if u ∈ L1 ∩ L : Note that the inner sum is pointwise convergent since jk k = 0 whenever j = . Consider now u ∈ L1 ∩ L ⊂ Lp .[] Set uN−M =
N
u jk jk = EN +1 u − E−M u
(24.23)
k=−M j=−
The latter equality follows from the fact that En is the orthogonal projection onto L2 n – whose basis is jk j ∈ k ∈ k n − 1 – and by 22.4(vii), En+1 u − En u = En+1 u − En+1 En u = En+1 u − En u
⊥ 2 which is the orthogonal projection of L2 n onto L n+1 . This means that En+1 u − En u = u jn jn (24.24) j∈
since the resulting function must be $\mathcal{A}_{n+1}^\Delta$-measurable as well as orthogonal to $L^2(\mathcal{A}_n^\Delta)$: i.e. we must include $(\psi_{j,n})_{j\in\mathbb{Z}}$ and exclude $(\psi_{j,k})_{j\in\mathbb{Z},\,k<n}$. Summing (24.24) over $n = -M, \ldots, N$ yields (24.23). Since $(E^N u)_{N\in\mathbb{N}}$ is by Theorem 23.15 a uniformly integrable martingale, and since $\mathcal{A}_\infty^\Delta = \mathcal{B}(\mathbb{R})$, we find that
$$E^N u \xrightarrow{N\to\infty} u \quad\text{in } L^1 \text{ and a.e. for all } u \in L^1 \cap L^\infty.$$
As in step 4 of the proof of Theorem 24.17 we see that this also holds in $L^p$.

Step 3. $L^p$ ($1 \le p < \infty$) convergence of the right tail of (24.21) if $u \in L^p$: For a general $u \in L^p$ we can use dominated convergence T12.9 to see that the functions $u_k = ((-k) \vee u \wedge k)\,1_{[-k,k]} \in L^1 \cap L^\infty$ approximate $u$ in $L^p$-sense. By Theorem 23.8(v),(ii),
$$\|E^N u - u\|_p \le \|E^N u - E^N u_k\|_p + \|E^N u_k - u_k\|_p + \|u_k - u\|_p \le \|E^N u_k - u_k\|_p + 2\,\|u_k - u\|_p.$$
In view of the result of the previous step, we can let first $N \to \infty$, then $k \to \infty$, and find that $E^N u \to u$ in $L^p$ as $N \to \infty$ for every $u \in L^p$.
Step 4. A.e. convergence of the right tail of (24.21) if $u \in L^p$, $1 < p < \infty$, follows from exactly the same arguments that were used in step 6 of the proof of Theorem 24.17.

Step 5. $L^p$-convergence ($1 < p < \infty$) of the left tail of (24.21) if $u \in L^p$: It remains to consider $(E^{-M}u)_{M\in\mathbb{N}}$. Although this is a backwards martingale, we cannot use Theorem 18.7 as $\lambda|_{\mathcal{A}_{-\infty}^\Delta}$ is not $\sigma$-finite. Instead, we take $u \in L^p$, $1 < p < \infty$, and set $u_R = u\,1_{[-R,R]}$, $R > 0$. For all $M \in \mathbb{N}$ with $2^M > R$ we find
$$E^{-M}u_R = 2^{-M}\int_{[-R,0)} u(x)\,dx\;1_{[-2^M,0)} + 2^{-M}\int_{[0,R]} u(x)\,dx\;1_{[0,2^M)},$$
where we used that $E^{-M}$ projects onto the intervals $[j2^M, (j+1)2^M)$, and we find from the Hölder inequality T12.2 with $p^{-1} + q^{-1} = 1$ that
$$|E^{-M}u_R(x)| \le 2^{-M} R^{1/q}\,\|u\|_p\,1_{[-2^M,2^M)}(x),$$
which implies
$$\|E^{-M}u_R\|_p \le 2^{-M} R^{1/q}\,\|u\|_p\,(2\cdot 2^M)^{1/p} = c_R\,2^{-M(1-1/p)}\,\|u\|_p.$$
Finally, by Theorem 23.8(v),(ii),
$$\|E^{-M}u\|_p \le \|E^{-M}(u - u_R)\|_p + \|E^{-M}u_R\|_p \le \|u - u_R\|_p + c_R\,2^{-M(1-1/p)}\,\|u\|_p,$$
and we get $\lim_{M\to\infty}\|E^{-M}u\|_p = 0$ for all $u \in L^p$, $1 < p < \infty$, letting first $M \to \infty$ and then $R \to \infty$. This shows that $u_{N,-M} \to u$ in $L^p$, $1 < p < \infty$, as $M, N \to \infty$, and the proof of the convergence of (24.21) in $L^p$, $1 < p < \infty$, is complete.
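Step 6. Completeness of the Haar wavelets in $L^2$ follows if we apply (24.21) in the case $p = 2$, cf. Theorem 21.13.

Step 7. A.e. convergence of the left tail of (24.21): Observe that
$$A = \{|E^{-M}u| > \epsilon \text{ for infinitely many } M \in \mathbb{N}\} = \bigcap_{M=1}^{\infty}\bigcup_{j=M}^{\infty}\{|E^{-j}u| > \epsilon\} \in \mathcal{A}_{-\infty}^\Delta,$$
where each union $\bigcup_{j=M}^{\infty}\{|E^{-j}u| > \epsilon\}$ is in $\mathcal{A}_{-M}^\Delta$. By the martingale maximal inequality, Lemma 19.11, for the reversed martingale $(E^{-j}u)_{j\in\mathbb{N}}$ and Theorem 23.8(ii) we see
$$\lambda(A) \le \lambda\Big(\bigcup_{j=M}^{\infty}\{|E^{-j}u| > \epsilon\}\Big) \le \lambda\Big(\sup_{j\in\mathbb{N}}|E^{-j}u| > \epsilon\Big) \le \frac{1}{\epsilon^p}\,\|E^{-1}u\|_p^p \le \frac{1}{\epsilon^p}\,\|u\|_p^p.$$
This shows that $\lambda(A) < \infty$. Since $\mathcal{A}_{-\infty}^\Delta = \{\emptyset, \mathbb{R}\}$ is the trivial $\sigma$-algebra, we conclude that $\lambda(A) = 0$ or $A = \emptyset$. Therefore, $E^{-M}u \to 0$ almost everywhere as $M \to \infty$, and so $u_{N,-M} \to u$ almost everywhere as $M, N \to \infty$.

Step 8: The maximal inequality (24.22): From step 2 we know that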
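$$\Big\|\sup_{N,M\in\mathbb{N}}|u_{N,-M}|\Big\|_p = \Big\|\sup_{N,M\in\mathbb{N}}\Big|\sum_{k=-M}^{N}\sum_{j=-\infty}^{\infty}\langle u,\psi_{j,k}\rangle\,\psi_{j,k}\Big|\Big\|_p = \Big\|\sup_{N\in\mathbb{N}} E^{N+1}u - \inf_{M\in\mathbb{N}} E^{-M}u\Big\|_p \le \frac{p}{p-1}\,\|u\|_p + \|E^{-1}u\|_p.$$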
The last estimate follows from a combination of Minkowski's inequality, Doob's maximal $L^p$-inequality for martingales T19.12 applied to the closed (by $u$) martingale $(E^N u)_{N\in\mathbb{N}\cup\{\infty\}}$, cf. step 3 and Theorem 23.15, and the fact that $(|E^{-M}u|^p)_{M\in\mathbb{N}}$ is a reversed submartingale, cf. Example 17.3(vi) or Corollary 23.16, which entails $\|E^{-M}u\|_p \le \|E^{-1}u\|_p$. Since by T23.8(ii) conditional expectations are contractions on $L^p$, we have $\|E^{-1}u\|_p \le \|u\|_p$, and the proof is completed. □
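A nice introduction to the Haar and other wavelets is Pinsky [35].

The Rademacher functions

Let $L^2 = L^2([0,1), \mathcal{B}[0,1), \lambda)$, $\lambda = \lambda^1|_{[0,1)}$. The Rademacher functions $(R_k)_{k\in\mathbb{N}_0}$ are functions in $L^2$ defined by
$$R_0 = 1_{[0,1)}, \qquad R_1 = 1_{[0,\frac12)} - 1_{[\frac12,1)}, \qquad R_2 = 1_{[0,\frac14)} - 1_{[\frac14,\frac12)} + 1_{[\frac12,\frac34)} - 1_{[\frac34,1)}, \quad \ldots$$

[Figure: graphs of the first four Rademacher functions $R_0$, $R_1$, $R_2$, $R_3$ on $[0,1)$, with ticks at $\frac14$, $\frac12$, $\frac34$, $1$.]

In terms of Haar functions we have
$$R_0 = \chi_{0,0}, \qquad R_{k+1} = \frac{1}{2^{k/2}}\sum_{j=1}^{2^k}\chi_{j,k}, \quad k \in \mathbb{N}_0. \tag{24.25}$$
Another equivalent definition of the Rademacher system is the following: expand each $x \in [0,1)$ as binary series, $x = \sum_{j=1}^{\infty}\epsilon_j 2^{-j}$ with $\epsilon_j \in \{0,1\}$ (we exclude expansions terminating with a string of 1s to enforce uniqueness) and set
$$R_0(x) = 1_{[0,1)}(x), \qquad R_k(x) = 1 - 2\epsilon_k,$$
so that $R_k = +1$ where the $k$th binary digit is 0, in agreement with the formulae for $R_1$ and $R_2$ above. Yet another way to think of the functions $R_k$ is as right-continuous versions of sign changes: $R_k(x) \approx \operatorname{sgn}\sin(2^k\pi x)$, $k \in \mathbb{N}_0$.

24.21 Lemma The system of Rademacher functions $(R_k)_{k\in\mathbb{N}_0}$ is an ONS of independent⁴ functions in $L^2([0,1), dx)$ which is not complete.

⁴ In the sense of Example 17.3(x) and Scholium 17.4.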
Proof Orthonormality follows since $\int_{\{R_k = \pm 1\}} R_\ell\,d\lambda = 0$ for all $k < \ell$, thus $\int R_k R_\ell\,d\lambda = 0$, while $\int R_k^2\,d\lambda = 1$ is obvious. In very much the same way we deduce that $\int R_k\,(R_1 R_2)\,d\lambda = 0$ for all $k \in \mathbb{N}_0$, which shows that the system $(R_k)_{k\in\mathbb{N}_0}$ is not complete. Independence is a special case of Scholium 17.4 with $p = q = 1/2$. □

Although $(R_k)_{k\in\mathbb{N}_0}$ is not complete in $L^2$, it still has good a.e. convergence properties. The reason for this is formula (24.25) and independence.

24.22 Theorem The Rademacher series $\sum_{k=0}^{\infty} c_k R_k$, $c_k \in \mathbb{R}$, converges almost everywhere if, and only if, $\sum_{k=0}^{\infty} c_k^2 < \infty$.
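Proof Assume first that $\sum_{k=0}^{\infty} c_k^2 < \infty$. In view of (24.25) we set $c_{j,k} = 2^{-k/2} c_k$ and rearrange the absolutely convergent series as
$$\sum_{k=0}^{\infty} c_k^2 = \sum_{k=0}^{\infty}\sum_{j=1}^{2^k} c_{j,k}^2 < \infty.$$
We can now interpret the double sequence $(c_{j,k} : 1 \le j \le 2^k,\ k \in \mathbb{N}_0)$ as coefficients of the complete (!) Haar ONS $(\chi_{j,k} : 1 \le j \le 2^k,\ k \in \mathbb{N}_0)$. From Parseval's identity T21.11(iv) we then conclude that the series
$$\sum_{k=0}^{\infty} c_k R_k = \sum_{k=0}^{\infty}\sum_{j=1}^{2^k} c_{j,k}\,\chi_{j,k}$$
converges almost everywhere and in $L^2$ to some element $u \in L^2$.

Conversely, assume that the series $\sum_{k=0}^{\infty} c_k R_k$ converges to a finite limit $s(x)$ for all $x \in E \in \mathcal{B}[0,1)$ such that $\lambda(E) > 0$. Writing $s_N$ for the $N$th partial sum of this series, we see that
$$A_N = \bigcup_{j=N}^{\infty}\big\{x \in E : |s_j(x) - s(x)| > \tfrac12\big\} \quad\text{and}\quad \bigcap_{N\in\mathbb{N}} A_N = \emptyset.$$
By the continuity of measures T4.4 we find for every $\epsilon > 0$ some $N = N_\epsilon \in \mathbb{N}$ such that
$$\lambda(A_N) < \epsilon < \tfrac12\lambda(E) \quad\text{and}\quad \lambda(E \setminus A_N) > 0.$$
In particular, if $E^* = E \setminus A_N$,
$$|s_j(x) - s_k(x)| \le |s_j(x) - s(x)| + |s(x) - s_k(x)| \le 1 \qquad \forall\, j, k > N,\ x \in E^*,$$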
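and an application of the Cauchy–Schwarz inequality for (double) series, cf. (12.13), shows
$$\lambda(E^*) \ge \int_{E^*}\Big(\sum_{k=M+1}^{N} c_k R_k\Big)^2 d\lambda = \lambda(E^*)\sum_{k=M+1}^{N} c_k^2 + 2\sum_{M<j<k\le N} c_j c_k \int_{E^*} R_j R_k\,d\lambda$$
$$\ge \lambda(E^*)\sum_{k=M+1}^{N} c_k^2 - 2\Big(\sum_{M<j<k\le N} c_j^2 c_k^2\Big)^{1/2}\Big(\sum_{M<j<k\le N}\Big(\int_{E^*} R_j R_k\,d\lambda\Big)^2\Big)^{1/2}$$
$$\ge \lambda(E^*)\sum_{k=M+1}^{N} c_k^2 - 2\sum_{k=M+1}^{N} c_k^2\,\Big(\sum_{M<j<k\le N}\Big(\int_{E^*} R_j R_k\,d\lambda\Big)^2\Big)^{1/2}. \tag{24.26}$$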
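Consider now the system $(R_j R_k)_{0\le j<k<\infty}$. Since for $j < k$ and $\ell < m$ with $k < m$ the integral $\int_{\{R_j R_k R_\ell = \pm 1\}} R_m\,d\lambda = 0$ (the remaining cases are similar), we see that
$$\int R_j R_k R_\ell R_m\,d\lambda = 0 \qquad\text{if } (j,k) \ne (\ell,m).$$
This shows that $(R_j R_k)_{0\le j<k<\infty}$ is an ONS in $L^2$, so that by Bessel's inequality $\sum_{0\le j<k<\infty}\big(\int_{E^*} R_j R_k\,d\lambda\big)^2 \le \lambda(E^*) < \infty$. For sufficiently large values of $M \in \mathbb{N}$ we can thus achieve that
$$\sum_{M<j<k<\infty}\Big(\int_{E^*} R_j R_k\,d\lambda\Big)^2 \le \Big(\frac{\lambda(E^*)}{4}\Big)^2,$$
and (24.26) yields, as $N \to \infty$,
$$\lambda(E^*) \ge \lambda(E^*)\sum_{k>M} c_k^2 - 2\,\frac{\lambda(E^*)}{4}\sum_{k>M} c_k^2 = \frac{\lambda(E^*)}{2}\sum_{k>M} c_k^2,$$
which implies $\sum_{k>M} c_k^2 \le 2$, i.e. $\sum_{k=0}^{\infty} c_k^2 < \infty$, and we are done. □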
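The orthogonality relations behind Lemma 24.21 can be checked by exact arithmetic on a dyadic grid. The following is only a sketch in Python: the helper names `rademacher` and `integrate` are illustrative and not from the text, and $R_k$ is implemented through the parity of $\lfloor 2^k x\rfloor$, which reproduces the formulae for $R_1$ and $R_2$ given above.

```python
from fractions import Fraction

def rademacher(k, x):
    """R_0 = 1 on [0,1); for k >= 1, R_k(x) = +1 if floor(2^k x) is even, -1 otherwise."""
    if k == 0:
        return 1
    return 1 if int(x * 2**k) % 2 == 0 else -1

def integrate(f, level):
    """Exact integral over [0,1) of a function that is constant on dyadic intervals of length 2^-level."""
    n = 2**level
    return sum(Fraction(f(Fraction(i, n)), n) for i in range(n))

K = 5  # R_0, ..., R_K and their products are constant on intervals of length 2^-K

# Gram matrix of R_0, ..., R_K: should be the identity (orthonormality)
gram = [[integrate(lambda x: rademacher(k, x) * rademacher(l, x), K)
         for l in range(K + 1)] for k in range(K + 1)]

# u = R_1 R_2 has norm 1 but all its Rademacher coefficients vanish (incompleteness)
coeffs_u = [integrate(lambda x: rademacher(1, x) * rademacher(2, x) * rademacher(k, x), K)
            for k in range(K + 1)]
norm_u_sq = integrate(lambda x: (rademacher(1, x) * rademacher(2, x)) ** 2, K)
```

Since every coefficient $\langle R_1 R_2, R_k\rangle$ computed here is zero while $\|R_1 R_2\|_2 = 1$, the partial sums of the Rademacher expansion of $R_1 R_2$ are identically zero, which is the incompleteness used in the lemma.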
It is possible to extend the Rademacher system explicitly to a complete ONS. This can be achieved by the following construction:
$$w_0 = R_0, \qquad w_n = R_{j_1+1}\cdot R_{j_2+1}\cdots R_{j_k+1}, \quad n \in \mathbb{N}, \tag{24.27}$$
where $n = 2^{j_1} + 2^{j_2} + \cdots + 2^{j_k}$ is the unique dyadic representation of $n \in \mathbb{N}$ where $0 \le j_1 < j_2 < \cdots < j_k$. A similar argument to the one used in the second part of the proof of Theorem 24.22 shows that $(w_n)_{n\in\mathbb{N}_0}$ is indeed an ONS. Note that $R_{k+1} = w_{2^k}$, so that $(R_k)_{k\in\mathbb{N}_0} \subset (w_n)_{n\in\mathbb{N}_0}$.

24.23 Definition The system (24.27) is called the Walsh orthonormal system (in Paley's ordering).

The Walsh system is a complete ONS, cf. Alexits [1, pp. 61–3] or Schipp et al. [41], and it is susceptible to a complete martingale treatment, cf. [41]. Again one considers the filtration of dyadic $\sigma$-algebras $(\mathcal{A}_n^\Delta)_{n\in\mathbb{N}_0}$ on $[0,1)$ and the special partial sums
$$s_{2^n-1}(u) = \sum_{j=0}^{2^n-1}\langle u, w_j\rangle\,w_j.$$
Then $s_{2^n-1}(u) = E^n u$ and we have the full martingale toolkit at our disposal. With the methods used so far it is possible to show that $s_{2^n-1}(u) \to u$ a.e. and in $L^p$, $1 \le p < \infty$, as $n \to \infty$. The case of general partial sums $s_n(u)$ is somewhat harder to handle but it is still doable with some variations of the techniques presented here; see Schipp et al. [41, Chapters 4, 6].
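Construction (24.27) translates directly into a loop over the binary digits of $n$. The sketch below (Python, exact rational arithmetic; the names `walsh`, `rademacher` and `integrate` are illustrative, and $R_k$ is again implemented via the parity of $\lfloor 2^k x\rfloor$) checks orthonormality of $w_0, \ldots, w_7$, the relation $R_{k+1} = w_{2^k}$, and that the partial sum over $w_0, \ldots, w_{2^n-1}$ reproduces a step function that is constant on dyadic intervals of length $2^{-n}$.

```python
from fractions import Fraction

def rademacher(k, x):
    """R_0 = 1; for k >= 1, R_k(x) = +1 if floor(2^k x) is even, -1 otherwise."""
    if k == 0:
        return 1
    return 1 if int(x * 2**k) % 2 == 0 else -1

def walsh(n, x):
    """w_n = product of R_{j+1} over all binary digits j of n that equal 1; w_0 = R_0 = 1."""
    value, j = 1, 0
    while n:
        if n & 1:
            value *= rademacher(j + 1, x)
        n >>= 1
        j += 1
    return value

def integrate(f, level):
    """Exact integral over [0,1) of a function constant on dyadic intervals of length 2^-level."""
    n = 2**level
    return sum(Fraction(f(Fraction(i, n)), n) for i in range(n))

LEVEL = 3
N = 2**LEVEL  # w_0, ..., w_{N-1} are constant on dyadic intervals of length 2^-LEVEL

gram = [[integrate(lambda x: walsh(m, x) * walsh(n, x), LEVEL) for n in range(N)]
        for m in range(N)]

def u(x):
    """Test function: indicator of [0, 3/8), measurable w.r.t. the dyadic sigma-algebra of level 3."""
    return 1 if x < Fraction(3, 8) else 0

coeff = [integrate(lambda x: u(x) * walsh(n, x), LEVEL) for n in range(N)]
recon = [sum(coeff[n] * walsh(n, Fraction(i, N)) for n in range(N)) for i in range(N)]
```

Here `recon` holds the values of $s_{2^3-1}(u)$ on the eight dyadic intervals of length $1/8$; for this particular $u$ it coincides with $u$ itself, as expected from $s_{2^n-1}(u) = E^n u$.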
Well-behaved orthonormal systems

For the Haar system and the Haar wavelet we could use martingale methods. A close inspection of our proofs reveals that the crucial input for getting martingales is that the ONS $(e_j)_{j\in\mathbb{N}_0}$ satisfies
$$E^{\mathcal{A}_n} e_{n+1} = 0, \qquad n \in \mathbb{N}_0, \tag{24.28}$$
where $\mathcal{A}_n = \sigma(e_0, e_1, \ldots, e_n)$. This condition implies immediately that the partial sum $\sum_{j=0}^{n} c_j e_j$ is a martingale w.r.t. the filtration $(\mathcal{A}_n)_{n\in\mathbb{N}_0}$ generated by the ONS $(e_j)_{j\in\mathbb{N}_0}$.

24.24 Definition Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $1 \le p < \infty$. A family of functions $(e_j)_{j\in\mathbb{N}_0} \subset L^p(X, \mathcal{A}, \mu)$ satisfying (24.28) is called a system of martingale differences.

For a system of martingale differences no orthogonality is required. The archetype of martingale differences are sequences of independent⁵ functions $(f_k)_{k\in\mathbb{N}_0} \subset L^2(\mu)$ ($\subset L^1(\mu)$, since $\mu$ is a probability measure) which are normalized such that $\int f_k\,d\mu = 0$ and $\int f_k^2\,d\mu = 1$. Our methods used in connection with the Haar system and Haar wavelets still apply and yield

⁵ In the sense of Example 17.3(x).
24.25 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $(e_j)_{j\in\mathbb{N}_0}$ be an ONS of martingale differences in $L^2(X, \mathcal{A}, \mu)$. Then
$$s_n(u; x) = \sum_{j=0}^{n}\langle u, e_j\rangle\,e_j(x), \qquad n \in \mathbb{N},\ u \in L^1 \cap L^2,$$
is a martingale w.r.t. the filtration $\mathcal{A}_n = \sigma(e_0, e_1, \ldots, e_n)$. For every $u \in L^2$ the sequence $(s_n(u))_{n\in\mathbb{N}}$ converges a.e. and satisfies the following maximal inequality:
$$\Big\|\sup_{n\in\mathbb{N}}|s_n(u)|\Big\|_2 \le 2\,\|u\|_2, \qquad u \in L^2.$$

Proof That the sequence of partial sums satisfies $E^{\mathcal{A}_n} s_{n+1}(u) = s_n(u)$ for $u \in L^2$ and is, for $u \in L^1 \cap L^2$, a martingale is clear. Therefore, Corollary 23.16 shows that $(s_n(u^\pm)^2)_{n\in\mathbb{N}}$ are submartingales, and from Bessel's inequality, cf. Theorem 21.11,
$$\sup_{n\in\mathbb{N}}\|s_n(u^\pm)\|_2^2 \le \|u^\pm\|_2^2, \qquad u \in L^2,$$
we conclude that $(s_n(u^\pm)^2)_{n\in\mathbb{N}}$ satisfy the conditions of the submartingale convergence theorem 18.2. Thus $\lim_{n\to\infty} s_n(u^\pm)^2$ exists a.e. in $[0,\infty)$, and, since $s_n(u^\pm) \ge 0$, so does $\lim_{n\to\infty} s_n(u)$. From Doob's maximal inequality T19.12 and Bessel's inequality T21.11 we get
$$\Big\|\sup_{n\le N}|s_n(u)|\Big\|_2 \le 2\,\|s_N(u)\|_2 \le 2\,\|u\|_2, \qquad N \in \mathbb{N},$$
and the usual monotone convergence argument proves the maximal inequality as $N \to \infty$. □

In the situation of Theorem 24.25 we cannot say much more about the limit $\lim_{n\to\infty} s_n(u; x)$ apart from its mere existence. In particular, the partial sums $s_n(u)$ can converge to something completely different from $u$! Consider, for example, the system of Rademacher functions $(R_n)_{n\in\mathbb{N}_0}$, which is clearly a system of martingale differences. If $u = R_1 R_2$ we get
$$\langle u, R_j\rangle = \langle R_1 R_2, R_j\rangle = \int_{[0,1)} R_1(x) R_2(x) R_j(x)\,dx = 0$$
for all $j \in \mathbb{N}_0$. Thus $s_n(R_1 R_2) \equiv 0$ is convergent, but $\lim_{n\to\infty} s_n(R_1 R_2) \equiv 0 \ne R_1 R_2$. The reason is that the Rademacher functions are not complete in $L^2$. This also means that we cannot hope to get $L^p$-convergence in Theorem 24.25.

24.26 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $(e_j)_{j\in\mathbb{N}_0} \subset L^2$ be an ONS of martingale differences. Denote by $s_n(u)$ the partial sum
$$s_n(u; x) = \sum_{j=0}^{n}\langle u, e_j\rangle\,e_j(x), \qquad u \in L^2,$$
and by $\mathcal{A}_n = \sigma(e_0, e_1, \ldots, e_n)$ the associated canonical filtration. Then the following assertions are equivalent:
(i) $(e_j)_{j\in\mathbb{N}_0}$ is a complete ONS.
(ii) $\int_A s_n(u)\,d\mu = \int_A u\,d\mu$ for all $A \in \mathcal{A}_n$, $\mu(A) < \infty$, and $u \in L^2$.
(iii) $E^{\mathcal{A}_n} u = s_n(u)$ for all $u \in L^2$.
(iv) $\lim_{n\to\infty}\|s_n(u) - u\|_p = 0$ for all $u \in L^p$ and all $1 \le p < \infty$.
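Proof (i)⇒(ii): Since $(e_j)_{j\in\mathbb{N}_0}$ is complete, we know from Theorem 21.13 that $\lim_{j\to\infty}\|s_j(u) - u\|_2 = 0$ for all $u \in L^2$. Using the Cauchy–Schwarz inequality 12.3 we see for every $A \in \mathcal{A}_n$ with $\mu(A) < \infty$
$$\Big|\int_A (s_j(u) - u)\,d\mu\Big| \le \|s_j(u) - u\|_2\cdot\|1_A\|_2 \xrightarrow{j\to\infty} 0.$$
Thus $\lim_{j\to\infty}\int_A s_j(u)\,d\mu = \int_A u\,d\mu$. Since $(e_j)_{j\in\mathbb{N}_0}$ is a system of martingale differences, we know that $E^{\mathcal{A}_n} e_{n+k} = 0$, $k \in \mathbb{N}$, and by Theorem 23.9, applied to the function $1_A e_{n+k} \in L^1$ and $A \in \mathcal{A}_n$,
$$\int_A e_{n+k}\,d\mu = \int 1_A\,E^{\mathcal{A}_n} e_{n+k}\,d\mu = 0.$$
Therefore $\int_A u\,d\mu = \lim_{j\to\infty}\int_A s_j(u)\,d\mu = \int_A s_n(u)\,d\mu$ holds for all $n \in \mathbb{N}$ and $A \in \mathcal{A}_n$ with $\mu(A) < \infty$.

(ii)⇒(iii): Since $1_A u \in L^1$ for all $A \in \mathcal{A}_n$ with $\mu(A) < \infty$ and $u \in L^2$, Theorem 23.9 and 23.8(vii) show
$$\int_A u\,d\mu = \int 1_A u\,d\mu = \int E^{\mathcal{A}_n}(1_A u)\,d\mu = \int 1_A\,E^{\mathcal{A}_n} u\,d\mu = \int_A E^{\mathcal{A}_n} u\,d\mu.$$
Together with the assumption this gives
$$\int_A E^{\mathcal{A}_n} u\,d\mu = \int_A s_n(u)\,d\mu \qquad \forall\, A \in \mathcal{A}_n,\ \mu(A) < \infty.$$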
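Choose, in particular, for every $k \in \mathbb{N}$ the set $\{s_n(u) > \frac1k + E^{\mathcal{A}_n} u\} \in \mathcal{A}_n$. By Markov's inequality P10.12 we see
$$\mu\big\{s_n(u) - E^{\mathcal{A}_n} u > \tfrac1k\big\} \le k^2\,\|s_n(u) - E^{\mathcal{A}_n} u\|_2^2 < \infty,$$
so that the above equality becomes
$$0 = \int_{\{s_n(u) > \frac1k + E^{\mathcal{A}_n} u\}} \big(s_n(u) - E^{\mathcal{A}_n} u\big)\,d\mu \ge \frac1k\,\mu\big\{s_n(u) > \tfrac1k + E^{\mathcal{A}_n} u\big\}.$$
This is only possible if $\mu\{s_n(u) > \frac1k + E^{\mathcal{A}_n} u\} = 0$. A similar argument for the set $\{s_n(u) < -\frac1k + E^{\mathcal{A}_n} u\}$ finally shows
$$\mu\{s_n(u) \ne E^{\mathcal{A}_n} u\} = \mu\Big(\bigcup_{k\in\mathbb{N}}\big\{|s_n(u) - E^{\mathcal{A}_n} u| > \tfrac1k\big\}\Big) \le \sum_{k\in\mathbb{N}}\mu\big\{|s_n(u) - E^{\mathcal{A}_n} u| > \tfrac1k\big\} = 0.$$
Therefore, $s_n(u) = E^{\mathcal{A}_n} u$ a.e.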
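(iii)⇒(iv): For $u \in L^1 \cap L^\infty$ Theorem 23.15 shows that $(E^{\mathcal{A}_n} u)_{n\in\mathbb{N}}$ is a uniformly integrable martingale and that $E^{\mathcal{A}_n} u \to u$ in $L^1$ and a.e. as $n \to \infty$. As in step 4 of the proof of Theorem 24.17, we use the inequality
$$|a^p - b^p| \le p\,|a - b|\,\max\{a^{p-1}, b^{p-1}\}, \qquad a, b \ge 0,\ p > 1,$$
to deduce that
$$\Big|\int \big(|E^{\mathcal{A}_n} u|^p - |u|^p\big)\,d\mu\Big| \le p\,\|E^{\mathcal{A}_n} u - u\|_1\cdot\|u\|_\infty^{p-1} \xrightarrow{n\to\infty} 0$$
and, by Riesz' convergence theorem 12.10, that $E^{\mathcal{A}_n} u \to u$ in $L^p$. If $u \in L^p$ is not bounded, we take an exhausting sequence $(A_k)_{k\in\mathbb{N}} \subset \mathcal{A}$ with $A_k \uparrow X$ and $\mu(A_k) < \infty$ and set $u_k = ((-k) \vee u \wedge k)\,1_{A_k}$. Clearly, $u_k \in L^1 \cap L^\infty$, and we see using Theorem 23.8(v),(ii),
$$\|E^{\mathcal{A}_n} u - u\|_p \le \|E^{\mathcal{A}_n} u - E^{\mathcal{A}_n} u_k\|_p + \|E^{\mathcal{A}_n} u_k - u_k\|_p + \|u_k - u\|_p \le \|E^{\mathcal{A}_n} u_k - u_k\|_p + 2\,\|u_k - u\|_p.$$
The claim follows if we let first $n \to \infty$ and then $k \to \infty$.

(iv)⇒(i) is just $p = 2$ combined with Theorem 21.13. □

If we know that the elements of the ONS are independent, we obtain the following necessary and sufficient conditions for pointwise convergence which generalize Theorem 24.22.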
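24.27 Theorem Let $(X, \mathcal{A}, P)$ be a probability space and $(e_j)_{j\in\mathbb{N}_0} \subset L^2(P)$ be independent random variables such that $\int e_j\,dP = 0$ and $\int e_j^2\,dP = 1$ and let $(c_j)_{j\in\mathbb{N}_0} \subset \mathbb{R}$ be a sequence of real numbers. Then
(i) The family $(e_j)_{j\in\mathbb{N}_0}$ is an ONS of martingale differences;
(ii) If $\sum_{j=0}^{\infty} c_j^2 < \infty$, then $\sum_{j=0}^{\infty} c_j e_j$ converges in $L^2(P)$ and a.e.;
(iii) If $\sup_{j\in\mathbb{N}_0}\|e_j\|_\infty < \infty$ and if $\sum_{j=0}^{\infty} c_j e_j$ converges almost everywhere, then $\sum_{j=0}^{\infty} c_j^2 < \infty$.

Proof (i) We set
$$\mathcal{A}_n = \sigma(e_0, e_1, \ldots, e_n) \quad\text{and}\quad u_n = \sum_{j=0}^{n} c_j e_j.$$
Since $P$ is a probability measure, $u_j \in L^2(P) \subset L^1(P)$ and under our assumptions it is clear that $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}_0}$ is a martingale. By independence we have
$$\int e_j e_k\,dP = \begin{cases}\int e_j\,dP\cdot\int e_k\,dP = 0 & \text{if } j \ne k,\\[2pt] \int e_j^2\,dP = 1 & \text{if } j = k,\end{cases}$$
which entails
$$\int u_n^2\,dP = \sum_{j,k=0}^{n} c_j c_k \int e_j e_k\,dP = \sum_{j=0}^{n} c_j^2 \tag{24.29}$$
and also
$$\int u_n u_{n+k}\,dP = \int E^{\mathcal{A}_n}(u_n u_{n+k})\,dP = \int u_n\,E^{\mathcal{A}_n} u_{n+k}\,dP = \sum_{j=0}^{n} c_j^2. \tag{24.30}$$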
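(ii) Because of (24.29) we see that
$$\|u_n\|_1^2 \le \|u_n\|_2^2 = \sum_{j=0}^{n} c_j^2 \le \sum_{j=0}^{\infty} c_j^2 < \infty$$
and the martingale convergence theorem C18.3 shows that $u_n \to u$ a.e. as $n \to \infty$. Using (24.30) we conclude that
$$\int (u_{n+k} - u_n)^2\,dP = \int \big(u_{n+k}^2 - 2 u_n u_{n+k} + u_n^2\big)\,dP = \int u_{n+k}^2\,dP - \int u_n^2\,dP = \sum_{j=n+1}^{n+k} c_j^2 \le \sum_{j=n+1}^{\infty} c_j^2.$$
Thus, by Fatou's lemma 9.11,
$$\int (u - u_n)^2\,dP \le \liminf_{k\to\infty}\int (u_{n+k} - u_n)^2\,dP \le \sum_{j=n+1}^{\infty} c_j^2 \xrightarrow{n\to\infty} 0$$
and $u_n \to u$ follows in the $L^2$-sense.

(iii) Since $e_j$ and $\mathcal{A}_{j-1}$ are independent, we find for all $A \in \mathcal{A}_{j-1}$
$$\int_A (u_j - u_{j-1})^2\,dP = c_j^2\int_A e_j^2\,dP = c_j^2\,P(A) = \int_A c_j^2\,dP. \tag{24.31}$$
Essentially the same calculation that was used in (24.30) also yields
$$\int_A (u_j - u_{j-1})^2\,dP = \int (1_A u_j - 1_A u_{j-1})^2\,dP = \int (1_A u_j)^2\,dP - \int (1_A u_{j-1})^2\,dP = \int_A \big(u_j^2 - u_{j-1}^2\big)\,dP,$$
which can be combined with (24.31) to give
$$\int_A \Big(u_n^2 - \sum_{j=0}^{n} c_j^2\Big)\,dP = \int_A \Big(u_{n-1}^2 - \sum_{j=0}^{n-1} c_j^2\Big)\,dP \qquad \forall\, A \in \mathcal{A}_{n-1}.$$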
This means, however, that wn = u2n − nj=0 cj2 is a martingale. Consider the stopping time = = infn ∈ 0 un > , inf ∅ = . Since the series j=0 cj ej converges a.e., we can choose > 0 in such a way that 2 P < < 21 P = Without loss of generality we may also take 2 > w0 dP + u20 dP . The optional sampling theorem 17.8 proves that wn∧ n∈ is again a martingale and, therefore, n∧ 2 w0 dP = wn∧ dP = u2n∧ dP − cj dP (24.32) j=0
Taking into account the very definition of we find furthermore 2 2 2 un∧ dP + un∧ dP + u2n∧ dP un∧ dP = >n
2 2 +
1n
1n
u2 dP
=0
= 2 2 +
c e + u−1 2 dP
1n
2 2 + 2
c2 e2 + u2−1 dP
1n
where we used the elementary inequality a + b2 2a2 + 2b2 in the last line. Since the ej are uniformly bounded by and since u−1 , we get u2n∧ dP 4 2 + 2 c2 dP n
4 2 + 2 P n
n
cj2
(24.33)
j=0
4 2 + 21 P =
n
cj2
j=0
since, by construction, 2 P n 2 P < < 21 P = . Rearranging (24.32) and combining this with the above estimates we obtain n∧ n∧ n 2 2 cj2 = cj dP cj dP P = j=0
=
j=0
j=0 (24.32)
=
u2n∧ dP −
w0 dP
(24.33)
4 2 + 21 P =
n
cj2 + 2
j=0
uniformly for all n ∈ . Since, by assumption, P = > 0 for sufficiently 2 large , we conclude that j=0 cj < . Theorem 24.27 has an astonishing corollary if we apply the Burkholder–Davis– Gundy inequalities (BDG) from p. 294 to the martingale n+k
wn = un+k − uk =
cj ej
j=k+1
w.r.t. the filtration n = n+k = e0 e1 en+k . The part of the inequalities which is important for our purposes reads
p wn p p sup wn w• w• n (24.34) 0jn
p
p
where n ∈ 0 , 0 < p < , and the quadratic variation is given by w• w• n = u•+k − uk u•+k − uk n =
n−1
n+k
uj+k+1 − uj+k 2 =
j=0
cj2 ej2
j=k+1
If we happen to know that supj∈ ej < , we even find n+k 1/2
2
cj
w• w• n
j=k+1
and we conclude from (24.34) that for all n k ∈ and 0 < p <
p un+k − uk p u•+k − uk u•+k − uk 1/2 n p
u•+k − uk u•+k − uk 1/2 n
n+k
1/2 cj2
j=k+1
holds. This proves immediately the following 24.28 Corollary Let X P be a probability space and let ej n∈0 be a sequence of independent random variables such that sup ej < ej dP = 0 and ej2 dP = 1 j∈0
Then un = nj=0 cj ej converges in L2 and a.e. to some u ∈ L2 if, and only if, 2 j=0 cj < . If the latter is the case, u ∈ Lp and the convergence takes place in Lp -sense for all 0 < p < . Unfortunately, many ONSs of martingale differences are incomplete and seem to behave more often like Rademacher functions than Haar functions. More on this topic can be found in the paper by Gundy [18] and the book by Garsia [16]. 24.29 Epilogue The combination of martingale methods and orthogonal expansions opens up a whole new world. Let us illustrate this by a rapid construction of one of the most prominent stochastic process: the Wiener process or Brownian motion. Choose in Theorem 24.27 X P = 0 1 0 1 where is onedimensional Lebesgue measure on 0 1 ; denoting points in 0 1 by , we will often write d instead of d. Assume that the independent, identically
310
R.L. Schilling
distributed random variables ej are all standard normal Gaussian random variables, i.e. 1 −x2 /2 Pej ∈ B = √ e dx B ∈ 0 1 2 B and consider the series expansion Wt =
en 10t Hn
∈ 0 1
n=0
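Here $t \in [0,1]$ is a parameter, $\langle u, v\rangle = \int_0^1 u(x) v(x)\,dx$, and $H_n$, $n = 2^k + j$, $0 \le j < 2^k$, denote the lexicographically ordered Haar functions (24.16). A short calculation confirms for $n \ge 1$
$$\langle 1_{[0,t]}, H_n\rangle = \int_0^t H_n(x)\,dx = 2^{k/2}\int_0^t H_1(2^k x - j)\,dx = \tfrac12\,2^{-k/2}\,F_n(t),$$
where $F_1(t) = 2\int_0^t H_1(x)\,dx\;1_{[0,1]}(t) = 2t\,1_{[0,\frac12)}(t) - (2t - 2)\,1_{[\frac12,1]}(t)$ is a tent-function and $F_n(t) = F_1(2^k t - j)$. Since $0 \le F_n \le 1$ and since, for fixed $t$ and each level $k$, at most one of the functions $F_{2^k+j}$, $0 \le j < 2^k$, does not vanish at $t$, we see
$$\sum_{n=1}^{\infty}\langle 1_{[0,t]}, H_n\rangle^2 \le \frac14\sum_{k=0}^{\infty} 2^{-k} = \frac12,$$
and Theorem 24.27(ii) guarantees that $W_t$ exists, for each $t \in [0,1]$, both in $L^2(d\omega)$-sense and $d\omega$-almost everywhere. More is true. Since the $e_n$ are independent Gaussian random variables, so are their finite linear combinations (e.g. Bauer [5, §24]) and, in particular, the partial sums
$$S_N(t; \omega) = \sum_{n=0}^{N} e_n(\omega)\,\langle 1_{[0,t]}, H_n\rangle.$$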
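Gaussianity is preserved under $L^2$-limits;⁶ we conclude that $W_t$ has a Gaussian distribution for each $t$. The mean is given by
$$\int_0^1 W_t(\omega)\,d\omega = \sum_{n=0}^{\infty}\int_0^1 e_n(\omega)\,d\omega\;\langle 1_{[0,t]}, H_n\rangle = 0$$
(to change integration and summation use that $L^2(d\omega)$-convergence entails $L^1(d\omega)$-convergence on a finite measure space). Since $\int e_n e_m\,d\omega = 0$ or $1$ according to $n \ne m$ or $n = m$, we can calculate for $0 \le s < t \le 1$ the variance by
$$\int_0^1 (W_t - W_s)^2\,d\omega = \sum_{n,m=0}^{\infty}\int_0^1 e_n e_m\,d\omega\;\langle 1_{[0,t]} - 1_{[0,s]}, H_n\rangle\,\langle 1_{[0,t]} - 1_{[0,s]}, H_m\rangle = \sum_{n=0}^{\infty}\langle 1_{(s,t]}, H_n\rangle^2 = \langle 1_{(s,t]}, 1_{(s,t]}\rangle = t - s,$$
where the last but one equality uses 24.17 and 21.13.

⁶ (cf. [5, §§23, 24]) If $X_n$ is normal distributed with mean 0 and variance $\sigma_n^2$, its Fourier transform is $\int e^{i\xi X_n}\,dP = e^{-\sigma_n^2\xi^2/2}$. If $X_n \to X$ in $L^2$-sense as $n \to \infty$, we have $\sigma_n^2 \to \sigma^2$ and, by dominated convergence, $\int e^{i\xi X}\,dP = \lim_n \int e^{i\xi X_n}\,dP = \lim_n e^{-\sigma_n^2\xi^2/2} = e^{-\sigma^2\xi^2/2}$; the claim follows from the uniqueness of the Fourier transform.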
In particular, the increment $W_t - W_s$ has the same probability distribution as $W_{t-s}$. In the same vein we find for $0 \le s < t \le u < v \le 1$ that
$$\int_0^1 (W_t - W_s)(W_v - W_u)\,d\omega = \langle 1_{(s,t]}, 1_{(u,v]}\rangle = 0.$$
Since $W_t - W_s$ is Gaussian, this proves already the independence of the two increments $W_t - W_s$ and $W_v - W_u$, cf. [5, §24]. By induction, we conclude that
$$W_{t_n} - W_{t_{n-1}},\ \ldots,\ W_{t_1} - W_{t_0}$$
are independent for all $0 \le t_0 \le \cdots \le t_n \le 1$.

Let us finally turn to the dependence of $W_t(\omega)$ on $t$. Note that for $M < N$
$$\int_0^1 \sup_{t\in[0,1]}|S_N(t;\omega) - S_M(t;\omega)|\,d\omega = \int_0^1 \sup_{t\in[0,1]}\Big|\sum_{n=M+1}^{N} e_n(\omega)\,\langle 1_{[0,t]}, H_n\rangle\Big|\,d\omega \le \sum_{n=M+1}^{N}\int_0^1 |e_n(\omega)|\,d\omega\;\sup_{t\in[0,1]}\langle 1_{[0,t]}, H_n\rangle \le C\sum_{n=M+1}^{N}\tfrac12\,2^{-k/2} < \infty,$$
where $\int_0^1 |e_n(\omega)|\,d\omega = \text{const.}$, which means that the partial sums $S_N(t;\omega)$ of $W_t(\omega)$ converge in $L^1(d\omega)$ uniformly for all $t \in [0,1]$. By C12.8 we can extract a subsequence, which converges (uniformly in $t$) for $d\omega$-almost all $\omega$ to $W_t(\omega)$; since for fixed $\omega$ the partial sums $t \mapsto S_N(t;\omega)$ are continuous functions of $t$, this property is inherited by the a.e. limit $W_t(\omega)$.

The above construction is a variation of a theme by Lévy [26, Chap. I.1, pp. 15–20] and Ciesielski [10]. In one or another form it can be found in many probability textbooks, e.g. Bass [3, pp. 11–13] or Steele [45, pp. 35–39]. A related construction of Wiener, see Paley and Wiener [34, Chapter XI], using random Fourier series, is discussed in Kahane [23, §16.1–3].
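The partial sums $S_N(t;\omega)$ of the epilogue are easy to simulate. The following Python sketch is illustrative only: the function names (`tent`, `haar_integral`, `partial_sum`, `variance`), the seed and the grid are assumptions made for this example, and the Gaussian coefficients come from Python's pseudo-random generator rather than from a construction on $([0,1], \lambda)$. It uses the formula $\langle 1_{[0,t]}, H_n\rangle = \tfrac12 2^{-k/2} F_1(2^k t - j)$ for $n = 2^k + j$ and $\langle 1_{[0,t]}, H_0\rangle = t$.

```python
import random

def tent(s):
    """F_1: tent function with F_1(0) = F_1(1) = 0, peak value 1 at s = 1/2, zero outside [0,1]."""
    if 0.0 <= s < 0.5:
        return 2.0 * s
    if 0.5 <= s <= 1.0:
        return 2.0 - 2.0 * s
    return 0.0

def haar_integral(n, t):
    """<1_[0,t], H_n>: t for n = 0, and (1/2) 2^(-k/2) F_1(2^k t - j) for n = 2^k + j >= 1."""
    if n == 0:
        return t
    k = n.bit_length() - 1
    j = n - 2**k
    return 0.5 * 2.0 ** (-k / 2) * tent(2**k * t - j)

def partial_sum(e, t):
    """S_N(t) = sum_{n=0}^{N} e_n <1_[0,t], H_n> for coefficients e = (e_0, ..., e_N)."""
    return sum(e_n * haar_integral(n, t) for n, e_n in enumerate(e))

def variance(s, t, num_terms):
    """sum_{n < num_terms} <1_(s,t], H_n>^2, the variance of the truncated increment."""
    return sum((haar_integral(n, t) - haar_integral(n, s)) ** 2 for n in range(num_terms))

num_terms = 2**6                      # Haar functions H_0, ..., H_63
rng = random.Random(0)                # fixed seed, for reproducibility only
e = [rng.gauss(0.0, 1.0) for _ in range(num_terms)]
grid = [i / 64 for i in range(65)]
path = [partial_sum(e, t) for t in grid]   # one approximate Brownian path on the grid
```

Each partial sum is a continuous, piecewise linear function of $t$ that starts at 0. For dyadic $s < t$ with denominators at most 64, `variance(s, t, num_terms)` reproduces $t - s$ up to rounding, in line with the variance computation above.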
Problems

24.1. Prove the orthogonality relation for the Jacobi polynomials 24.1.

24.2. Use the Gram–Schmidt orthonormalization procedure to verify the formulae for the first few Chebyshev, Legendre, Laguerre and Hermite polynomials given in 24.1–24.5.

24.3. State and prove Theorem 24.6 and Corollary 24.8 for an arbitrary compact interval $[a,b]$.

24.4. Prove the orthogonality relations (24.4) for the trigonometric system. [Hint: observe that $\operatorname{Im}\big(e^{i(x+y)} + e^{i(x-y)}\big) = 2\sin x\cos y$.]

24.5. (i) Show that for suitable constants $c_j, s_j \in \mathbb{R}$ and all $k \in \mathbb{N}_0$
$$\cos^k x = \sum_{j=0}^{k} c_j\cos jx \quad\text{and}\quad \sin^{k+1} x = \sum_{j=1}^{k+1} s_j\sin jx.$$
(ii) Show that for suitable constants $a_j, b_j \in \mathbb{R}$ and all $k \in \mathbb{N}$
$$\cos kx = \sum_{j=0}^{k} a_j\cos^{k-j} x\,\sin^j x \quad\text{and}\quad \sin kx = \sum_{j=1}^{k} b_j\cos^{k-j} x\,\sin^j x.$$
(iii) Deduce that every trigonometric polynomial $T_n(x)$ of order $n$ can be written in the form
$$U_n(x) = \sum_{j,k=0}^{n}\gamma_{j,k}\cos^j x\,\sin^k x$$
and vice versa.

24.6. Use the formula $\sin a - \sin b = 2\cos\frac{a+b}{2}\sin\frac{a-b}{2}$ to show that $D_N(x)\sin\frac{x}{2} = \frac12\sin\big(N + \frac12\big)x$. This proves (24.11).

24.7. Find the Fourier series expansion for the function $|\sin x|$.

24.8. Let $u(x) = 1_{[0,1)}(x)$. Show that the Haar–Fourier series for $u$ converges for all $1 \le p < \infty$ in $L^p$-sense to $u$. Is this also true for the Haar wavelet expansion?

24.9. Show that the Haar–Fourier series for $u \in C_c(\mathbb{R})$ converges uniformly for every $x \in \mathbb{R}$ to $u(x)$. Show that this remains true for functions $u \in C_\infty(\mathbb{R})$, i.e. the set of continuous functions such that $\lim_{|x|\to\infty} u(x) = 0$. [Hint: use the fact that $u \in C_c$ is uniformly continuous. For $u \in C_\infty$ observe that $C_\infty = \overline{C_c}^{\|\bullet\|_\infty}$ (closure in sup-norm) and check that $|s_N(u; x)| \le \|u\|_\infty$.]

24.10. Extend Problem 24.9 to the Haar wavelet expansion. [Hint: use Problem 24.9 and show that $E^{-N} u \to 0$ as $N \to \infty$ for all $u \in C_c$.]

24.11. Let $u(x) = 1_{[0,1/3)}(x)$. Prove that the Haar–Fourier series diverges at $x = \frac13$. [Hint: verify $\liminf_{N\to\infty} s_N(u; \frac13) < \limsup_{N\to\infty} s_N(u; \frac13)$.]
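Appendix A
lim inf and lim sup

For a sequence of real numbers $(a_j)_{j\in\mathbb{N}} \subset \mathbb{R}$ the limes inferior or lower limit is defined as
$$\liminf_{j\to\infty} a_j = \sup_{k\in\mathbb{N}}\inf_{j\ge k} a_j \tag{A.1}$$
and the limes superior or upper limit is defined as
$$\limsup_{j\to\infty} a_j = \inf_{k\in\mathbb{N}}\sup_{j\ge k} a_j. \tag{A.2}$$
Lower and upper limits of a sequence are always defined as numbers in $[-\infty, +\infty]$. This is due to the fact that the sequences $(\inf_{j\ge k} a_j)_{k\in\mathbb{N}} \subset [-\infty, +\infty)$ and $(\sup_{j\ge k} a_j)_{k\in\mathbb{N}} \subset (-\infty, +\infty]$ are in- resp. decreasing, so that the $\sup_{k\in\mathbb{N}}$ and $\inf_{k\in\mathbb{N}}$ in (A.1) and (A.2) are actually (improper) limits $\lim_{k\to\infty}$. Let us collect a few simple properties of lim inf and lim sup.

A.1 Properties (of lim inf and lim sup). Let $(a_j)_{j\in\mathbb{N}}$ and $(b_j)_{j\in\mathbb{N}}$ be sequences of real numbers.
(i) $\liminf_{j\to\infty} a_j = \lim_{k\to\infty}\inf_{j\ge k} a_j$ and $\limsup_{j\to\infty} a_j = \lim_{k\to\infty}\sup_{j\ge k} a_j$.
(ii) $\liminf_{j\to\infty} a_j = -\limsup_{j\to\infty}(-a_j)$.
(iii) $\liminf_{j\to\infty} a_j \le \limsup_{j\to\infty} a_j$.
(iv) $\liminf_{j\to\infty} a_j$ and $\limsup_{j\to\infty} a_j$ are limits of subsequences of $(a_j)_{j\in\mathbb{N}}$ and all other limits $L$ of subsequences of $(a_j)_{j\in\mathbb{N}}$ satisfy $\liminf_{j\to\infty} a_j \le L \le \limsup_{j\to\infty} a_j$.
(v) $\lim_{j\to\infty} a_j \in \mathbb{R}$ exists $\iff$ $-\infty < \liminf_{j\to\infty} a_j = \limsup_{j\to\infty} a_j < +\infty$. In this case $\lim_{j\to\infty} a_j = \liminf_{j\to\infty} a_j = \limsup_{j\to\infty} a_j$.
(vi) $\liminf_{j\to\infty} a_j + \liminf_{j\to\infty} b_j \le \liminf_{j\to\infty}(a_j + b_j)$, $\quad\limsup_{j\to\infty}(a_j + b_j) \le \limsup_{j\to\infty} a_j + \limsup_{j\to\infty} b_j$.
(vii) If $a_j, b_j \ge 0$ for all $j \in \mathbb{N}$, then $\liminf_{j\to\infty} a_j\,\liminf_{j\to\infty} b_j \le \liminf_{j\to\infty} a_j b_j$, $\quad\limsup_{j\to\infty} a_j b_j \le \limsup_{j\to\infty} a_j\,\limsup_{j\to\infty} b_j$.
(viii) $\liminf_{j\to\infty}(a_j + b_j) \le \liminf_{j\to\infty} a_j + \limsup_{j\to\infty} b_j \le \limsup_{j\to\infty}(a_j + b_j)$.
(ix) If, for all $j \in \mathbb{N}$, $a_j, b_j \ge 0$, then $\liminf_{j\to\infty} a_j b_j \le \liminf_{j\to\infty} a_j\,\limsup_{j\to\infty} b_j \le \limsup_{j\to\infty} a_j b_j$.
(x) If the limit $\lim_{j\to\infty} a_j$ exists, then $\liminf_{j\to\infty}(a_j + b_j) = \lim_{j\to\infty} a_j + \liminf_{j\to\infty} b_j$, $\quad\limsup_{j\to\infty}(a_j + b_j) = \lim_{j\to\infty} a_j + \limsup_{j\to\infty} b_j$.
(xi) If $a_j, b_j \ge 0$ for all $j \in \mathbb{N}$ and if $\lim_{j\to\infty} a_j$ exists, then $\liminf_{j\to\infty} a_j b_j = \lim_{j\to\infty} a_j\,\liminf_{j\to\infty} b_j$, $\quad\limsup_{j\to\infty} a_j b_j = \lim_{j\to\infty} a_j\,\limsup_{j\to\infty} b_j$.
(xii) $\limsup_{j\to\infty}|a_j| = 0 \implies \lim_{j\to\infty} a_j = 0$.

Proof (i) follows from the remark preceding A.1, (ii) is clear since $\inf_j a_j = -\sup_j(-a_j)$ and (iii) follows from the inequality $\inf_{j\ge k} a_j \le \sup_{j\ge k} a_j$ where we can pass to the limit $k \to \infty$ on both sides. Notice that (ii) reduces any statement about lim sup to a dual statement for lim inf. This means that we need to show (iv)–(xi) for the lower limit only.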
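(iv): Let $(a_{n_j})_{j\in\mathbb{N}} \subset (a_j)_{j\in\mathbb{N}}$ be some subsequence with (improper) limit $L = \lim_{j\to\infty} a_{n_j}$. Then
$$\inf_{j\ge k} a_j \le \inf_{j\ge k} a_{n_j} \le L \implies \lim_{k\to\infty}\inf_{j\ge k} a_j \le L,$$
i.e. $\liminf_{j\to\infty} a_j$ is smaller than any limit of any subsequence. Let us now construct a subsequence which has $L^* = \liminf_{j\to\infty} a_j > -\infty$ as its limit. By the very definition of $L^*$ and the infimum we find for all $\epsilon > 0$ some $N_\epsilon \in \mathbb{N}$ such that
$$\Big|L^* - \inf_{j\ge k} a_j\Big| \le \epsilon \qquad \forall\, k \ge N_\epsilon.$$
Since then $\inf_{j\ge k} a_j > -\infty$, we find by the definition of the infimum some $\ell \ge k \ge N_\epsilon$, $\ell = \ell_{k,\epsilon}$, and $a_\ell$ with
$$\Big|a_\ell - \inf_{j\ge k} a_j\Big| \le \epsilon.$$
Specializing $\epsilon = \frac1n$, $n \in \mathbb{N}$, we obtain an infinite family of $a_{\ell(n)}$ from which we can extract a subsequence with limit $L^*$. If $L^* = -\infty$, the sequence $(a_j)_{j\in\mathbb{N}}$ is unbounded from below and it is obvious that there must exist a subsequence tending to $-\infty$.

(v): If $\lim_{j\to\infty} a_j$ exists, then all subsequences converge and have the same limit, thus $\liminf_{j\to\infty} a_j = \lim_{j\to\infty} a_j = \limsup_{j\to\infty} a_j$ by (iv). Conversely, if $L = \liminf_{j\to\infty} a_j = \limsup_{j\to\infty} a_j$, we get for all $k \in \mathbb{N}$
$$0 \le a_k - \inf_{j\ge k} a_j \le \sup_{j\ge k} a_j - \inf_{j\ge k} a_j \xrightarrow{k\to\infty} 0$$
and $\lim_{k\to\infty} a_k = \lim_{k\to\infty}\inf_{j\ge k} a_j = L$ follows from a sandwiching argument.

(vi) follows immediately from
$$\inf_{j\ge k} a_j + \inf_{j\ge k} b_j \le a_\ell + b_\ell \quad \forall\, \ell \ge k \implies \inf_{j\ge k} a_j + \inf_{j\ge k} b_j \le \inf_{\ell\ge k}(a_\ell + b_\ell)$$
if we pass to the limit $k \to \infty$ on both sides.

(vii): We have $0 \le \inf_{j\ge k} b_j \le b_\ell$ for all $\ell \ge k$ and multiplying this inequality with $0 \le \inf_{j\ge k} a_j \le a_\ell$, $\ell \ge k$, gives
$$\inf_{j\ge k} a_j\,\inf_{j\ge k} b_j \le a_\ell b_\ell \quad \forall\, \ell \ge k \implies \inf_{j\ge k} a_j\,\inf_{j\ge k} b_j \le \inf_{\ell\ge k} a_\ell b_\ell.$$
The assertion follows as we go to the limit $k \to \infty$ on both sides.

(viii): We have
$$\inf_{j\ge k}(a_j + b_j) \le a_\ell + b_\ell \le a_\ell + \sup_{j\ge k} b_j \qquad \forall\, \ell \ge k$$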
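so that $\inf_{j\ge k}(a_j + b_j) \le \inf_{j\ge k} a_j + \sup_{j\ge k} b_j$, and the assertion follows as we go to the limit $k \to \infty$ on both sides.

(ix) is similar to (viii) taking into account the precautions set out in (vii).

(x): If $\lim_{j\to\infty} a_j$ exists, we know from (v) that $\lim_{j\to\infty} a_j = \liminf_{j\to\infty} a_j = \limsup_{j\to\infty} a_j$. Thus
$$\lim_{j\to\infty} a_j + \liminf_{j\to\infty} b_j \overset{\text{A.1(v)}}{=} \liminf_{j\to\infty} a_j + \liminf_{j\to\infty} b_j \overset{\text{A.1(vi)}}{\le} \liminf_{j\to\infty}(a_j + b_j) \overset{\text{A.1(viii)}}{\le} \limsup_{j\to\infty} a_j + \liminf_{j\to\infty} b_j \overset{\text{A.1(v)}}{=} \lim_{j\to\infty} a_j + \liminf_{j\to\infty} b_j.$$

(xi) is similar to (x) using (v), (vii) and (ix).

(xii): since $|a_j| \ge 0$,
$$0 \le \liminf_{j\to\infty}|a_j| \overset{\text{A.1(iii)}}{\le} \limsup_{j\to\infty}|a_j| = 0$$
and we conclude from (v) that
$$\lim_{j\to\infty}|a_j| = \liminf_{j\to\infty}|a_j| = \limsup_{j\to\infty}|a_j| = 0.$$
Thus $\lim_{j\to\infty} a_j = 0$. □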
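The monotonicity behind (A.1) and (A.2) can be watched on a finite stretch of a concrete sequence. The Python sketch below is an illustration under stated assumptions: the helper names `tail_infima` and `tail_suprema` are hypothetical, and the sequence $a_j = (-1)^j(1 + 1/j)$ is truncated at $N = 2000$, so the computed tails are only approximations of $\inf_{j\ge k} a_j$ and $\sup_{j\ge k} a_j$ (reliable for $k$ well below $N$).

```python
def tail_infima(a):
    """For a finite list a, return the list of min(a[k:]) for every k, computed from the right."""
    out, running = [], float("inf")
    for value in reversed(a):
        running = min(running, value)
        out.append(running)
    return out[::-1]

def tail_suprema(a):
    """For a finite list a, return the list of max(a[k:]) for every k, computed from the right."""
    out, running = [], float("-inf")
    for value in reversed(a):
        running = max(running, value)
        out.append(running)
    return out[::-1]

N = 2000
a = [(-1) ** j * (1 + 1 / j) for j in range(1, N + 1)]   # lim inf = -1, lim sup = +1
b = [-x for x in a]                                      # so that a_j + b_j = 0 for all j

lower = tail_infima(a)    # increasing in k, approaches -1
upper = tail_suprema(a)   # decreasing in k, approaches +1

# Property (vi) at a fixed tail index: inf a + inf b <= inf (a + b) = 0
k = 998
lhs = lower[k] + tail_infima(b)[k]
rhs = tail_infima([x + y for x, y in zip(a, b)])[k]
```

For this sequence the tail infima increase towards $-1$ and the tail suprema decrease towards $+1$, so $\lim a_j$ does not exist by A.1(v), and the inequality in (vi) is strict.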
∗ ∗ ∗
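Sometimes the following definitions for upper and lower limits of a sequence of sets $(A_j)_{j\in\mathbb{N}}$, $A_j \subset X$, are used:
$$\liminf_{j\to\infty} A_j = \bigcup_{k\in\mathbb{N}}\bigcap_{j\ge k} A_j \quad\text{and}\quad \limsup_{j\to\infty} A_j = \bigcap_{k\in\mathbb{N}}\bigcup_{j\ge k} A_j. \tag{A.3}$$
The connection between set-theoretic and numerical upper and lower limits is given by

A.2 Lemma For all $x \in X$ we have
$$\liminf_{j\to\infty} 1_{A_j}(x) = 1_{\liminf_{j\to\infty} A_j}(x), \tag{A.4}$$
$$\limsup_{j\to\infty} 1_{A_j}(x) = 1_{\limsup_{j\to\infty} A_j}(x). \tag{A.5}$$

Proof Note that
$$1_{\bigcap_{k\in\mathbb{N}} B_k} = \inf_{k\in\mathbb{N}} 1_{B_k} \quad\text{and}\quad 1_{\bigcup_{k\in\mathbb{N}} B_k} = \sup_{k\in\mathbb{N}} 1_{B_k},$$
which follows from
$$1_{\bigcap_{k\in\mathbb{N}} B_k}(x) = 1 \iff x \in \bigcap_{k\in\mathbb{N}} B_k \iff \forall\, k \in \mathbb{N} : x \in B_k \iff \forall\, k \in \mathbb{N} : 1_{B_k}(x) = 1 \iff \inf_{k\in\mathbb{N}} 1_{B_k}(x) = 1.$$
A similar argument proves the assertion for $\sup_{k\in\mathbb{N}} 1_{B_k}$. Hence,
$$1_{\liminf_{j\to\infty} A_j} = 1_{\bigcup_{k\in\mathbb{N}}\bigcap_{j\ge k} A_j} = \sup_{k\in\mathbb{N}} 1_{\bigcap_{j\ge k} A_j} = \sup_{k\in\mathbb{N}}\inf_{j\ge k} 1_{A_j} = \liminf_{j\to\infty} 1_{A_j}$$
and (A.5) follows analogously. □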
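For a periodic sequence of sets the limits in (A.3) can be computed exactly, since every tail $\bigcap_{j\ge k} A_j$ (resp. $\bigcup_{j\ge k} A_j$) is just the intersection (resp. union) over one period. The Python sketch below uses this to check Lemma A.2 on a small example; the ground set, the three sets and the helper names `liminf_sets` and `limsup_sets` are made up for illustration.

```python
def liminf_sets(period):
    """lim inf of the periodic sequence A_1, ..., A_p, A_1, ...: intersection over one period."""
    return set.intersection(*period)

def limsup_sets(period):
    """lim sup of the periodic sequence A_1, ..., A_p, A_1, ...: union over one period."""
    return set.union(*period)

X = set(range(6))
period = [{0, 1, 2}, {1, 2, 3}, {2, 4}]   # A_1, A_2, A_3, repeated periodically

lower = liminf_sets(period)   # points lying in all but finitely many A_j
upper = limsup_sets(period)   # points lying in infinitely many A_j

# Numerical lim inf / lim sup of the periodic 0-1 sequences 1_{A_j}(x):
# for a periodic sequence these are the minimum and maximum over one period.
ind_liminf = {x: min(int(x in A) for A in period) for x in X}
ind_limsup = {x: max(int(x in A) for A in period) for x in X}
```

Comparing `ind_liminf[x]` with `int(x in lower)` and `ind_limsup[x]` with `int(x in upper)` for every `x` in `X` gives exactly the identities (A.4) and (A.5) for this example.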
Appendix B
Some facts from point-set topology

The following diagram gives a survey of various types of abstract spaces used in this book. The arrows '→' indicate how the spaces are connected. In brackets we mention the key concepts that define the notion of convergence in these spaces.

[Diagram: $\mathbb{R}^n$; Banach space (norm, complete); Hilbert space (scalar product, complete); normed space (norm); inner product space (scalar product); metric space (distance); topological space (open set).]

Note that due to the Riesz–Fischer theorem 12.7 the space $(L^2, \langle\bullet,\bullet\rangle)$ is a Hilbert space and all $(L^p, \|\bullet\|_p)$, $1 \le p < \infty$, are Banach spaces. The material below can be found in many introductory texts on general topology and real analysis. For this compilation we used the books by Willard [54], Steen and Seebach [46] and Rudin [39]. Complete proofs are given in [54] and in the first few chapters of [39].

Topological spaces

Topological spaces are characterized by the notion of openness of sets.

B.1 Definition A topological space $(X, \mathcal{O})$ consists of a set $X$ and a system $\mathcal{O} = \mathcal{O}(X)$ of subsets of $X$, called a topology, which satisfies the following properties:
$(\mathcal{O}_1)$ $\emptyset, X \in \mathcal{O}$;
$(\mathcal{O}_2)$ $U, V \in \mathcal{O} \implies U \cap V \in \mathcal{O}$;
$(\mathcal{O}_3)$ $U_i \in \mathcal{O}$, $i \in I$ (arbitrary) $\implies \bigcup_{i\in I} U_i \in \mathcal{O}$.
A set $U \in \mathcal{O}$ is called an open set. A set $F \subset X$ is closed, if its complement $F^c$ is open. We write $\mathcal{C} = \mathcal{C}(X)$ for the family of closed sets in $X$. From de Morgan's identities (2.2) it is not hard to see that
• $X$ and $\emptyset$ are closed sets,
• unions of finitely many closed sets are again closed,
• intersections of arbitrarily many closed sets are again closed.

B.2 Examples Let $X$ be an arbitrary set.
(i) $\{\emptyset, X\}$ is a topology on $X$.
(ii) The power set $\mathcal{P}(X)$ is a topology on $X$.
(iii) Let $U$ be a 'classical' open set in $\mathbb{R}^n$, i.e. for every $x \in U$ one can find some $\epsilon > 0$ such that $B_\epsilon(x) \subset U$. The classical open sets $\mathcal{O}(\mathbb{R}^n)$ are a topology in $\mathbb{R}^n$. Unless otherwise stated, we will always consider this natural topology on $\mathbb{R}^n$.
(iv) (Trace topology) Let $(X, \mathcal{O}(X))$ be a topological space and $A \subset X$ be any subset. Then the relatively open subsets of $A$, $\mathcal{O}(A) = A \cap \mathcal{O}(X) = \{A \cap U : U \in \mathcal{O}(X)\}$, turn $(A, \mathcal{O}(A))$ into a topological space.
(v) (Product topology) Let $(X, \mathcal{O}(X))$ and $(Y, \mathcal{O}(Y))$ be topological spaces. Then $X \times Y$ becomes a topological space under the product topology $\mathcal{O}(X \times Y)$: by definition, a set $W \in \mathcal{O}(X \times Y)$ if $W \subset X \times Y$ and if for each $w = (x,y) \in W$ there exist $U \in \mathcal{O}(X)$ and $V \in \mathcal{O}(Y)$ such that $w = (x,y) \in U \times V \subset W$. This makes $\mathcal{O}(X \times Y)$ the smallest topology containing $\mathcal{O}(X) \times \mathcal{O}(Y)$.

B.3 Definition Let $(X, \mathcal{O})$ be a topological space.
(i) An open neighbourhood of a point $x \in X$ is an open set $U = U(x)$ containing $x$. A neighbourhood of $x$ is any set containing an open neighbourhood of $x$.
(ii) The space $X$ is called separated or a Hausdorff space if any two different points $x, y \in X$ have disjoint neighbourhoods.
(iii) Let $A \subset X$. The closure of $A$, denoted by $\bar{A}$, is the smallest closed set containing $A$, i.e. $\bar{A} = \bigcap_{F\in\mathcal{C},\,F\supset A} F$.
(iv) Let $A \subset X$. The (open) interior of $A$, denoted by $A^\circ$, is the largest open set inside $A$, i.e. $A^\circ = \bigcup_{U\in\mathcal{O},\,U\subset A} U$.
(v) A set $A \subset X$ is dense in $X$, if $\bar{A} = X$.
(vi) The space $X$ is separable if it contains a countable dense subset.

B.4 Examples
(i) The space $(\mathbb{R}^n, \mathcal{O}(\mathbb{R}^n))$ is a Hausdorff space.
(ii) The space $(X, \mathcal{P}(X))$ and all spaces mentioned in the diagram at the beginning of the section are Hausdorff spaces.
(iii) The space $(X, \{\emptyset, X\})$ is not separated.
(iv) A set $U$ in a topological space $(X, \mathcal{O})$ is open if, and only if, it is a neighbourhood of each of its points.
(v) The open ball $B_r(x) = \{y \in \mathbb{R}^n : |x - y| < r\}$ in $\mathbb{R}^n$ is an open neighbourhood of $x$. The closed ball $K_r(x) = \{y \in \mathbb{R}^n : |x - y| \le r\}$ is the closure of $B_r(x)$, thus $\overline{B_r(x)} = K_r(x)$.
(vi) The set of rational numbers $\mathbb{Q}$ is dense in $\mathbb{R}$. Therefore $\mathbb{R}$ is separable. The same is true for $\mathbb{R}^n$ when we consider the countable dense set $\mathbb{Q}^n$. Density assertions are often expressed through approximation theorems such as Corollary 12.11, Theorem 24.6 or Corollary 24.12.

B.5 Definition A subset $K$ of a Hausdorff space $(X, \mathcal{O})$ is called compact, if every cover of $K$ by open sets, $K \subset \bigcup_{i\in I} U_i$, $U_i \in \mathcal{O}$, $I$ is an arbitrary index set, admits a finite sub-cover, i.e. if there are finitely many $U_{i_1}, \ldots, U_{i_n}$ such that $K \subset U_{i_1} \cup \cdots \cup U_{i_n}$. A set $L$ is relatively compact if $\bar{L}$ is compact.

B.6 Proposition Let $(X, \mathcal{O})$ be a Hausdorff space.
(i) Every compact set $K$ is closed.
(ii) Closed subsets of compact sets are compact.
(iii) A family $(K_i)_{i\in I}$ of compact sets (indexed by an arbitrary set $I$) has non-empty intersection $\bigcap_{i\in I} K_i \ne \emptyset$ if, and only if, every finite subcollection $(K_{i_j})_{j=1}^{n}$ has non-empty intersection $K_{i_1} \cap K_{i_2} \cap \cdots \cap K_{i_n} \ne \emptyset$.

B.7 Example A set $K \subset \mathbb{R}^n$ is compact if, and only if, it is closed and bounded. This is also equivalent to saying that every sequence $(x_j)_{j\in\mathbb{N}} \subset K$ has a convergent subsequence. Such a simple characterization of compactness fails in infinite-dimensional spaces, notably in the Hilbert space $L^2$ or the Banach spaces $L^p$, $1 \le p < \infty$, see Theorem B.22 and B.27.
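On a finite set the axioms of Definition B.1 and the closure and interior of Definition B.3 can be checked mechanically. The following Python sketch is an illustration only: the three-point space, its topology and the function names (`is_topology`, `interior`, `closure`) are chosen for this example. For a finite family of sets, closure under pairwise unions already gives closure under arbitrary unions, so checking pairs suffices.

```python
from itertools import combinations

def is_topology(X, opens):
    """Check (O1)-(O3) for a finite family `opens` of subsets of the finite set X."""
    family = {frozenset(U) for U in opens}
    if frozenset() not in family or frozenset(X) not in family:
        return False                      # (O1) fails
    for U, V in combinations(family, 2):
        if U & V not in family:           # (O2): finite intersections
            return False
        if U | V not in family:           # (O3): unions (pairs suffice for finite families)
            return False
    return True

def interior(A, opens):
    """Largest open set inside A: the union of all open subsets of A."""
    result = frozenset()
    for U in opens:
        if frozenset(U) <= frozenset(A):
            result |= frozenset(U)
    return result

def closure(A, X, opens):
    """Smallest closed set containing A: the intersection of all closed supersets of A."""
    result = frozenset(X)
    for U in opens:
        F = frozenset(X) - frozenset(U)   # closed sets are complements of open sets
        if frozenset(A) <= F:
            result &= F
    return result

X = {1, 2, 3}
opens = [set(), {1}, {1, 2}, {1, 2, 3}]       # a (non-Hausdorff) topology on X
not_a_topology = [set(), {1}, {2}, {1, 2, 3}]  # {1} | {2} = {1, 2} is missing
```

In this example the closure of $\{1\}$ is all of $X$, so the single point $1$ is dense, while $\{2, 3\}$ has empty interior; the space is not Hausdorff since every open set containing $2$ also contains $1$.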
Proposition B.6(iii) is an abstract version of the well known interval principle in $\mathbb{R}$: a sequence of nested closed intervals $[a_j, b_j] \subset \mathbb{R}$, $j \in \mathbb{N}$, has non-empty intersection $\bigcap_{j\in\mathbb{N}}[a_j, b_j] \ne \emptyset$. If, in addition, $\lim_{j\to\infty}(b_j - a_j) = 0$ then $\bigcap_{j\in\mathbb{N}}[a_j, b_j] = \{L\}$ where $L = \lim_{j\to\infty} a_j = \lim_{j\to\infty} b_j$.

B.8 Definition Let $(X, \mathcal{O}(X))$ and $(Y, \mathcal{O}(Y))$ be two topological spaces. A map $f : X \to Y$ is called continuous at $x \in X$, if for every neighbourhood $V = V(f(x))$ we can find a neighbourhood $U = U(x)$ of $x$ such that $f(U) \subset V$. If $f$ is continuous at every $x \in X$, we call $f$ continuous.

B.9 Example Definition B.8 coincides on Euclidean spaces with the classical notion of continuity, i.e. a map $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $x \in \mathbb{R}^n$ if, and only if, for every convergent sequence $x_j \to x$ we have $f(x_j) \to f(x)$ as $j \to \infty$, cf. Theorem B.19.

B.10 Definition Let $(X, \mathcal{O})$ be a topological space. A set $A \subset X$ is called connected, if $A$ cannot be written in the form $A = U \cup V$ where $U, V \in \mathcal{O}(A)$ are non-empty and $U \cap V = \emptyset$. The set $A$ is called pathwise connected, if for any two points $x, y \in A$ there is a continuous curve or path $\gamma : [0,1] \to A$ such that $\gamma(0) = x$ and $\gamma(1) = y$.

B.11 Examples
(i) The only connected sets in $\mathbb{R}$ are finite or infinite intervals. The set $(a,b) \cup (c,d)$ where $a < b < c < d$ is not connected.
(ii) Pathwise connected sets are connected; the converse is, in general, wrong: the set $V = \{(x, 0) \in \mathbb{R}^2 : x \le 0\} \cup \{(x, \sin\frac1x) \in \mathbb{R}^2 : x > 0\}$ is connected, but no path can be found from $(0,0)$ to any point $(x, \sin\frac1x)$.

B.12 Theorem Let $f : X \to Y$ be a map between the topological spaces $(X, \mathcal{O}(X))$ and $(Y, \mathcal{O}(Y))$.
(i) The map $f$ is continuous if, and only if, for all open $V \in \mathcal{O}(Y)$ the pre-image $f^{-1}(V) \in \mathcal{O}(X)$ is open.
(ii) Let $f$ be continuous. The image $f(K) \subset Y$ of a compact [connected, pathwise connected] set $K \subset X$ is again compact [connected, pathwise connected].
(iii) Let $K \subset X$ be a compact set. A continuous map¹ $g : K \to \mathbb{R}$ attains its maximum and minimum.
(iv) Let $K \subset X$ be a compact set and $f : K \to Y$ be an injective and continuous map. Then the inverse map $f^{-1} : f(K) \to K$ exists and is continuous.

¹ We consider here the trace topology $\mathcal{O}(K)$, cf. Example B.2.

Since for our purposes the characterization of continuity by open sets is of central importance, cf. Example 7.3, we include the short
Proof (of Theorem B.12(i)) '⇐': Assume first that $f^{-1}(\mathcal{O}_Y) \subset \mathcal{O}_X$. Every neighbourhood $\tilde V$ of $f(x)$ contains by definition an open set $V \subset \tilde V$ with $f(x) \in V$. By assumption, $U = f^{-1}(V)$ is open, and since $x \in U$, $U$ is an (open) neighbourhood of $x$ with $f(U) = f(f^{-1}(V)) \subset V$.
'⇒': Assume now that $f$ is continuous. Take any open set $B \subset Y$, set $A = f^{-1}(B)$ and fix some $x \in A$. Since $B$ is open, there is some open neighbourhood $V = V(f(x)) \subset B$ and by continuity we find some neighbourhood $U = U(x) \subset X$ of $x$ with $f(U) \subset V$. Thus
$U \subset f^{-1}(f(U)) \subset f^{-1}(V) \subset f^{-1}(B) \overset{\text{def}}{=} A$,
which shows that $A$ contains, for each of its points, a whole neighbourhood. This is to say that $A$ is open.

B.13 Example Let $g : [a, b] \to \mathbb{R}$ be a continuous function. Since $[a, b]$ is compact, $g$ attains its maximum $M = \sup g([a, b]) = g(x_{\max})$ and minimum $m = \inf g([a, b]) = g(x_{\min})$ at some points $x_{\max}, x_{\min} \in [a, b]$. Since $[a, b]$ is compact and pathwise connected, so is $g([a, b])$, hence it is of the form $[m, M]$. In particular, we have recovered the intermediate value theorem for functions of a real variable.

B.14 Definition Let $(x_j)_{j \in \mathbb{N}} \subset X$ be a sequence in the topological space $(X, \mathcal{O})$. We say that $x_j$ converges to $x \in X$ and write $\lim_{j \to \infty} x_j = x$ or $x_j \to x$ ($j \to \infty$) if for every open neighbourhood $U = U(x)$ there is some $N = N_U \in \mathbb{N}$ such that $x_j \in U$ for all $j \ge N_U$.

This is also the 'usual' convergence in the spaces $\mathbb{R}$ and $\mathbb{R}^n$. Note that limits are only unique if $X$ is a Hausdorff space. Sometimes we can use limits of sequences to give an equivalent description of the topology. This is always the case if every point $x \in X$ has a countable system of open neighbourhoods $(U_n)_{n \in \mathbb{N}}$ with the property that for every neighbourhood $V = V(x)$ of $x$ there is at least one $U_{n_0} \subset V$; this is always true in metric spaces, cf. B.19.

Metric spaces

In metric spaces we have a notion of distance between any two points.

B.15 Definition A metric space $(X, d)$ is a set $X$ with a distance function or metric $d : X \times X \to [0, \infty)$ such that for all $x, y, z \in X$
(d1) definiteness: $d(x, y) = 0 \iff x = y$;
(d2) symmetry: $d(x, y) = d(y, x)$;
(d3) triangle inequality: $d(x, y) \le d(x, z) + d(z, y)$.
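B.16 Examples
(i) Let $(X, d)$ be a metric space. Then $(A, d_A)$, $d_A = d|_{A \times A}$, is again a metric space for all $A \subset X$.
(ii) The real line $\mathbb{R}$ is a metric space with $d(x, y) = |x - y|$. The space $\mathbb{R}^n$ becomes a metric space with each of the following metrics:
$d_p(x, y) = \bigl(\sum_{j=1}^n |x_j - y_j|^p\bigr)^{1/p}$ if $1 \le p < \infty$, and $d_\infty(x, y) = \max_{1 \le j \le n} |x_j - y_j|$ if $p = \infty$.
(iii) The topological space $(X, \mathcal{P}(X))$ is a metric space with metric $d(x, y) = 1$ if $x \neq y$ and $d(x, y) = 0$ if $x = y$.
(iv) Let $(X_j, d_j)$, $j = 1, 2$, be two metric spaces. Then $X_1 \times X_2$ becomes a metric space for any of the following metrics ($x_j, y_j \in X_j$, $1 \le p < \infty$):
$\rho_p((x_1, x_2), (y_1, y_2)) = \bigl(d_1(x_1, y_1)^p + d_2(x_2, y_2)^p\bigr)^{1/p}$
or
$\rho_\infty((x_1, x_2), (y_1, y_2)) = \max_{j = 1, 2} d_j(x_j, y_j)$.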
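The metrics $d_p$ of Example B.16(ii) are easy to evaluate numerically. The following is only a minimal sketch; the function name `d_p` and the sample points are illustrative choices, not part of the text. It computes $d_p(x, y)$ for finite $p$ and for $p = \infty$ and prints three distances between the same pair of points.

```python
import math

def d_p(x, y, p):
    """Distance d_p(x, y) on R^n for 1 <= p <= infinity.

    p = math.inf selects the maximum metric. The name d_p and the
    argument layout are illustrative assumptions.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(t ** p for t in diffs) ** (1.0 / p)

x, y, z = (0.0, 0.0), (3.0, 4.0), (1.0, -2.0)
print(d_p(x, y, 1))         # taxicab distance
print(d_p(x, y, 2))         # Euclidean distance
print(d_p(x, y, math.inf))  # maximum metric
```

For a fixed pair of points the value of $d_p$ decreases as $p$ grows, and each $d_p$ satisfies the triangle inequality (d3).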
B.17 Definition Let $(X, d)$ be a metric space. We call
$B_r(x) = \{y \in X : d(x, y) < r\}$ resp. $K_r(x) = \{y \in X : d(x, y) \le r\}$
an open resp. closed ball with centre $x$ and radius $r > 0$. An open set is a set $U \subset X$ such that for every $x \in U$ there is some $\epsilon > 0$ with $B_\epsilon(x) \subset U$. Closed sets arise as complements of open sets.

Using the triangle inequality it is easy to see that open balls in $X$ are also open sets and that closed balls are closed sets. Mind, however, that in general $\overline{B_r(x)} \neq K_r(x)$.

B.18 Lemma The family of open sets of a metric space $(X, d)$ is a topology in the sense of Definition B.1. $X$ is a separated topological space.

The converse of Lemma B.18 is wrong: the topology $\{\emptyset, \{a\}, X\}$ of the space $X = \{a, b\}$ cannot be generated by any metric. The topology of metric spaces can be described by sequences.
B.19 Theorem Let $(X, d)$, $(Y, \rho)$ be metric spaces.
(i) A sequence $(x_j)_{j \in \mathbb{N}} \subset X$ converges to $x$, $x_j \to x$ ($j \to \infty$), if, and only if, $d(x_j, x) \to 0$ ($j \to \infty$). Moreover, the limit $x$ is unique.
(ii) A set $F \subset X$ is closed if, and only if, every convergent sequence $(x_j)_{j \in \mathbb{N}} \subset F$ has its limit $\lim_{j \to \infty} x_j \in F$.
(iii) A set $K \subset X$ is compact if, and only if, every sequence $(x_j)_{j \in \mathbb{N}} \subset K$ has a convergent subsequence whose limit is in $K$.
(iv) A set $A \subset X$ is dense if, and only if, for every $x \in X$ there is a sequence $(a_j)_{j \in \mathbb{N}} \subset A$ with $d(a_j, x) \to 0$ ($j \to \infty$).
(v) A function $f : X \to Y$ is continuous at $x \in X$ if, and only if, for every sequence $x_j \to x$ ($j \to \infty$) we have $f(x_j) \to f(x)$ ($j \to \infty$).

Since for our purposes the characterization of continuity is of central importance, cf. Example 7.3, we include the short

Proof (of Theorem B.19(v)) We begin with the observation that every neighbourhood $\tilde U = \tilde U(x)$ of a point $x \in X$ contains some open set $U \subset \tilde U$ with $x \in U$. Since $U$ is open, we find by definition some $\epsilon > 0$ such that $B_\epsilon(x) \subset U$. This shows that we can restate the definition of continuity B.8 at a point $x$ in the following form:
$\forall \epsilon > 0 \;\; \exists \delta > 0 : \; f(B_\delta(x)) \subset B_\epsilon(f(x))$
(mind that the balls are taken in $X$ and $Y$, respectively).
'⇒': If $x_j \to x$ ($j \to \infty$), we know from the definition of convergence that for every $\delta > 0$ there is some $N = N_\delta$ such that $x_j \in B_\delta(x)$ for all $j \ge N_\delta$. Since $f$ is continuous at $x$, we can choose for every $\epsilon > 0$ some $\delta = \delta_\epsilon > 0$ such that $f(B_\delta(x)) \subset B_\epsilon(f(x))$. Thus
$f(x_k) \in f(\{x_j : j \ge N_\delta\}) \subset f(B_\delta(x)) \subset B_\epsilon(f(x)) \quad \forall k \ge N_\delta$,
which shows that $f(x_k) \to f(x)$ ($k \to \infty$).
'⇐': Assume that $x_j \to x$ implies $f(x_j) \to f(x)$ ($j \to \infty$) but that $f$ is not continuous at $x$. Thus there is some $\epsilon > 0$ such that for all $n \in \mathbb{N}$ the set $f(B_{1/n}(x))$ is not (entirely) contained in $B_\epsilon(f(x))$. Thus we can pick for each $n \in \mathbb{N}$ some $x_n \in B_{1/n}(x)$ such that $f(x_n) \notin B_\epsilon(f(x))$. This means, however, that $x_n \to x$ ($n \to \infty$) while $\rho(f(x_n), f(x)) \ge \epsilon > 0$ for all $n \in \mathbb{N}$, contradicting that $f(x_n)$ converges to $f(x)$.
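B.20 Definition Let $(X, d)$ be a metric space. A sequence $(x_j)_{j \in \mathbb{N}}$ is a Cauchy sequence if
$\forall \epsilon > 0 \;\; \exists N = N_\epsilon \in \mathbb{N} \;\; \forall j, k \ge N_\epsilon : \; d(x_j, x_k) < \epsilon$.
A metric space is complete if every Cauchy sequence converges. An isometry is a surjective map $j : X \to Y$ between two metric spaces $(X, d)$ and $(Y, \rho)$ which satisfies $d(x, x') = \rho(j(x), j(x'))$.

B.21 Theorem (Completion) For every metric space $(X, d)$ there exists a complete metric space $(\hat X, \hat d)$ such that $\hat d|_{X \times X} = d$ and $(X, d) \subset (\hat X, \hat d)$ is a dense subset. Any two completions of $X$ are, up to isometries, identical.

By covering a compact set $K$ with the open sets $(B_1(x))_{x \in K}$ and extracting a finite subcover we can easily see that $K$ has finite diameter $\operatorname{diam} K = \sup_{x, y \in K} d(x, y)$ and is, therefore, bounded. Thus compact sets are closed and bounded. The converse is, in general, not true; however,

B.22 Theorem (Heine–Borel) A subset of $\mathbb{R}^n$ is compact if, and only if, it is closed and bounded. Moreover, all metrics on $\mathbb{R}^n$ which are induced by a norm (such as the metrics $d_p$ of Example B.16(ii)) are equivalent in the sense that for any two such metrics $d$ and $\rho$ there are absolute constants $c, C > 0$ such that
$c\, d(x, y) \le \rho(x, y) \le C\, d(x, y) \quad \forall x, y \in \mathbb{R}^n$.

Normed spaces

B.23 Definition A normed space $(X, \|\cdot\|)$ is a $\mathbb{K}$-vector space $X$ (here $\mathbb{K}$ stands for either $\mathbb{R}$ or $\mathbb{C}$) with a norm $\|\cdot\|$, i.e. a map $\|\cdot\| : X \to [0, \infty)$ which satisfies for $x, y \in X$ and $\alpha \in \mathbb{K}$ the following properties:
(N1) definiteness: $\|x\| > 0 \iff x \neq 0$;
(N2) pos. homogeneity: $\|\alpha x\| = |\alpha| \cdot \|x\|$;
(N3) triangle inequality: $\|x + y\| \le \|x\| + \|y\|$.
If we drop the definiteness (N1), $\|\cdot\|$ is called a semi-norm and $(X, \|\cdot\|)$ is a semi-normed space.

B.24 Examples
(i) The spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ equipped with
$\|x\|_p = \bigl(\sum_{j=1}^n |x_j|^p\bigr)^{1/p}$, $1 \le p < \infty$, or $\|x\|_\infty = \max_{1 \le j \le n} |x_j|$
are normed spaces.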
(ii) Let $(X_j, \|\cdot\|_j)$, $j = 1, 2$, be two normed spaces. Then $X_1 \times X_2$ becomes a normed space under any of the following norms ($x_j \in X_j$, $1 \le p < \infty$):
$\|(x_1, x_2)\|_p = \bigl(\|x_1\|_1^p + \|x_2\|_2^p\bigr)^{1/p}$
or
$\|(x_1, x_2)\|_\infty = \max_{j = 1, 2} \|x_j\|_j$.
(iii) Every normed space is a metric space with metric given by $d(x, y) = \|x - y\|$. Therefore, all notions and results for metric spaces carry over to normed spaces. In particular, open and closed balls are given by
$B_r(x) = \{y \in X : \|x - y\| < r\}$ and $K_r(x) = \{y \in X : \|x - y\| \le r\}$.
Since $X$ is a vector space, we now have $\overline{B_r(x)} = K_r(x)$. However, not every metric space arises from a normed space, e.g. the metric $d(x, y) = 1$ or $0$ according to $x \neq y$ or $x = y$ on $\mathbb{R}^n$ cannot be realized by any norm.

B.25 Lemma Let $X$ be a normed space. Then the following maps are continuous:
$X \ni x \mapsto \|x\|$, $\quad X \times X \ni (x, y) \mapsto x + y$, $\quad \mathbb{K} \times X \ni (\alpha, x) \mapsto \alpha x$.

B.26 Definition A Banach space is a complete normed space.

The following result, due to F. Riesz, says that the Heine–Borel theorem B.22 holds if, and only if, the underlying space is finite-dimensional.

B.27 Theorem (Riesz) In a normed space $V$ closed and bounded sets are compact if, and only if, $V$ is finite-dimensional.

Let $\sim$ be an equivalence relation on the normed space $X$. We write $[x] = \{y \in X : x \sim y\}$ for the equivalence class with representative $x$. The quotient space $X/{\sim}$ consists of all equivalence classes. It is not hard to see that $X/{\sim}$ is again a vector space and that
$[\alpha x + \beta y] = \alpha [x] + \beta [y] \quad \forall \alpha, \beta \in \mathbb{K},\; x, y \in X$.

B.28 Theorem Let $(X, \|\cdot\|)$ be a (complete) normed space. Then $X/{\sim}$ is a (complete) normed space under the quotient norm given by $\|[x]\|_\sim = \inf\{\|y\| : y \in [x]\}$.
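Essentially the same procedure allows us to turn any semi-normed space $(X, \|\cdot\|)$ into a normed space. We use the following equivalence relation for $x, y \in X$:
$x \approx y \iff \|x - y\| = 0$
and observe that $\inf\{\|y\| : y \in [x]\} = \|x\|$.

B.29 Corollary Let $(X, \|\cdot\|)$ be a (complete) semi-normed space. Then $X/{\approx}$ is a (complete) normed space with norm given by $\|[x]\|_\approx = \|x\|$.

B.30 Example Denote by $\mathcal{L}^p(X, \mathcal{A}, \mu)$, $1 \le p < \infty$, the $p$th power integrable functions of the measure space $(X, \mathcal{A}, \mu)$. Then $\|u\|_p = \bigl(\int |u|^p \, d\mu\bigr)^{1/p}$ is a semi-norm on $\mathcal{L}^p(X, \mathcal{A}, \mu)$, and $L^p(X, \mathcal{A}, \mu) = \mathcal{L}^p(X, \mathcal{A}, \mu)/{\sim}$ is a Banach space if we identify $u, w \in \mathcal{L}^p(X, \mathcal{A}, \mu)$ whenever $\|u - w\|_p = 0$.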
Appendix C The volume of a parallelepiped
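In this appendix we give a simple derivation for the volume of the parallelepiped
$A([0, 1)^n) = \{Ax \in \mathbb{R}^n : x \in [0, 1)^n\}$, $A \in GL(n, \mathbb{R})$,
for a non-degenerate $n \times n$ matrix $A \in \mathbb{R}^{n \times n}$.

C.1 Theorem $\lambda^n(A([0, 1)^n)) = |\det A|$ for all $A \in GL(n, \mathbb{R})$.

The proof of Theorem C.1 requires two auxiliary results.

C.2 Lemma If $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, $\lambda_j > 0$, is a diagonal $n \times n$ matrix, then $\lambda^n(D(B)) = \det D \cdot \lambda^n(B)$ for all Borel sets $B \in \mathcal{B}(\mathbb{R}^n)$.

Proof Since both $D$ and $D^{-1}$ are continuous maps, $D(B)$ is a Borel set if $B \in \mathcal{B}(\mathbb{R}^n)$, cf. Example 7.3. In view of the uniqueness theorem 5.7 for measures it is enough to prove the lemma for half-open rectangles $[a, b) = [a_1, b_1) \times \dots \times [a_n, b_n)$, $a, b \in \mathbb{R}^n$. Obviously,
$D([a, b)) = [\lambda_1 a_1, \lambda_1 b_1) \times \dots \times [\lambda_n a_n, \lambda_n b_n)$
and
$\lambda^n(D([a, b))) = \prod_{j=1}^n (\lambda_j b_j - \lambda_j a_j) = \lambda_1 \cdots \lambda_n \prod_{j=1}^n (b_j - a_j) = \det D \cdot \lambda^n([a, b))$.

C.3 Lemma Every $A \in GL(n, \mathbb{R})$ can be written as $A = SDT$, where $S, T \in O(n)$ are orthogonal $n \times n$ matrices and $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix with positive entries $\lambda_j > 0$.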
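Proof The matrix ${}^t\!A A$ is symmetric and so we can find some orthogonal matrix $U \in O(n)$ such that
${}^tU \, {}^t\!A A U = \tilde D = \operatorname{diag}(\mu_1, \dots, \mu_n)$.
Since for $e_j = {}^t(0, \dots, 0, 1, 0, \dots, 0)$ (with the entry $1$ in the $j$th position) and the Euclidean norm $\|\cdot\|$
$\mu_j = {}^te_j \tilde D e_j = {}^te_j \, {}^tU \, {}^t\!A A U e_j = \|A U e_j\|^2 > 0$,
we can define $D = \sqrt{\tilde D} = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ where $\lambda_j = \sqrt{\mu_j}$. Thus
$D^{-1} \, {}^tU \, {}^t\!A A U D^{-1} = \operatorname{id}_n$
and this proves that $S = A U D^{-1} \in O(n)$. Since $T = {}^tU \in O(n)$, we easily see that
$SDT = A U D^{-1} D \, {}^tU = A$.

Proof (of Theorem C.1) We have for $A \in GL(n, \mathbb{R})$
$\lambda^n(A[0, 1)^n) \overset{\text{C.3}}{=} \lambda^n(SDT[0, 1)^n) \overset{\text{7.9}}{=} \lambda^n(DT[0, 1)^n) \overset{\text{C.2}}{=} \det D \cdot \lambda^n(T[0, 1)^n) \overset{\text{7.9}}{=} \det D \cdot \lambda^n([0, 1)^n) = \det D$.
Since $S, T \in O(n)$, their determinants are either $+1$ or $-1$, and we conclude that $|\det A| = |\det(SDT)| = |\det S| \cdot |\det D| \cdot |\det T| = \det D$.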
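Theorem C.1 can be checked by hand in dimension $n = 2$, where the image of the unit square is a parallelogram whose area is given by the shoelace formula. The sketch below is only an illustration; the helper names `det2`, `apply` and `polygon_area` and the sample matrix are assumptions made for this example. It compares the area of the image with $|\det A|$.

```python
def det2(A):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    return a * d - b * c

def apply(A, v):
    """Matrix-vector product A v for a 2x2 matrix A."""
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

def polygon_area(vertices):
    """Shoelace formula; vertices must be listed in order around the boundary."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

A = ((2.0, 1.0), (-1.0, 3.0))              # an arbitrary invertible sample matrix
square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # corners of the unit square, in order
image = [apply(A, v) for v in square]      # corners of the parallelogram A([0,1]^2)
print(polygon_area(image), abs(det2(A)))
```

Since a linear map sends the boundary of the square to the boundary of the parallelogram in the same cyclic order, the shoelace formula applies directly to the image corners.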
Appendix D Non-measurable sets
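Let $(X, \mathcal{A}, \mu)$ be a measure space and denote by $(X, \mathcal{A}^*, \bar\mu)$ its completion, cf. Problem 4.13 for the definition and Problems 6.2, 10.11, 10.12, 13.11 and 15.3 for various properties. Here we only need that
$\mathcal{A}^* = \{A \cup N : A \in \mathcal{A},\; N \text{ is a subset of some } \mathcal{A}\text{-measurable } \mu\text{-null set}\}$
is the completion of $\mathcal{A}$ with respect to the measure $\mu$. It is a natural question to ask how big $\mathcal{A}$ and $\mathcal{A}^*$ are and whether $\mathcal{A} \subset \mathcal{A}^* \subset \mathcal{P}(X)$ are proper inclusions. Sometimes, see Problems 6.10 or 6.11, these questions are easy to answer. For the Borel $\sigma$-algebra $\mathcal{A} = \mathcal{B}(\mathbb{R}^n)$ and Lebesgue measure $\mu = \lambda^n$ this is more difficult. The following definition helps to distinguish between sets in $\mathcal{B}(\mathbb{R}^n)$ and the completion $\mathcal{B}^*(\mathbb{R}^n)$ w.r.t. Lebesgue measure.

D.1 Definition The Lebesgue $\sigma$-algebra is the completion $\mathcal{B}^*(\mathbb{R}^n)$ of the Borel $\sigma$-algebra w.r.t. Lebesgue measure $\lambda^n$. A set $B \in \mathcal{B}^*(\mathbb{R}^n)$ is called Lebesgue measurable.

The next theorem shows that there are 'as many' Lebesgue measurable sets as there are subsets of $\mathbb{R}^n$.

D.2 Theorem We have $\#\mathcal{B}^*(\mathbb{R}^n) = \#\mathcal{P}(\mathbb{R}^n)$ for all $n \in \mathbb{N}$.

Proof Since $\mathcal{B}^*(\mathbb{R}^n) \subset \mathcal{P}(\mathbb{R}^n)$ we have that $\#\mathcal{B}^*(\mathbb{R}^n) \le \#\mathcal{P}(\mathbb{R}^n)$. On the other hand, we have seen in Problem 7.10 that the Cantor ternary set $C$ is an uncountable Borel measurable $\lambda^1$-null set of cardinality $\#C = \mathfrak{c}$. Consequently, $\mathbb{R}^{n-1} \times C$ is a $\lambda^n$-null set. By definition of the Lebesgue $\sigma$-algebra, all sets in $\mathcal{P}(\mathbb{R}^{n-1} \times C)$ are Lebesgue measurable (null) sets, i.e. $\mathcal{P}(\mathbb{R}^{n-1} \times C) \subset \mathcal{B}^*(\mathbb{R}^n)$, and therefore $\#\mathcal{P}(\mathbb{R}^{n-1} \times C) \le \#\mathcal{B}^*(\mathbb{R}^n)$. Using the fact that there is a bijection between $C$ and $\mathbb{R}$ we also get $\#\mathcal{P}(\mathbb{R}^n) = \#\mathcal{P}(\mathbb{R}^{n-1} \times C) \le \#\mathcal{B}^*(\mathbb{R}^n)$, and the Cantor–Bernstein theorem 2.7 proves that $\#\mathcal{P}(\mathbb{R}^n) = \#\mathcal{B}^*(\mathbb{R}^n)$.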
Unfortunately, we cannot use Theorem D.2 to decide whether there are sets which are not Lebesgue measurable. To answer this question we need the axiom of choice.

D.3 Axiom of choice (AC) Let $\{M_i : i \in I\}$ be a collection of non-empty and mutually disjoint subsets of $X$. Then there exists a set $L \subset \bigcup_{i \in I} M_i$ which contains exactly one element from each set $M_i$, $i \in I$.

Note that AC only asserts the existence of the set $L$ but does not tell us how or if the set $L$ can be constructed at all. (This problem is at the heart of the controversy over whether one should or should not accept AC.)

D.4 Theorem Assuming the axiom of choice, there exist non-Lebesgue measurable sets in $\mathbb{R}^n$.

Proof Assume first that $n = 1$. We will construct a non-Lebesgue measurable subset of $I = [0, 1)$. We call any two $x, y \in I$ equivalent if
$x \sim y \iff x - y \in \mathbb{Q}$.
The equivalence class containing $x$ is given by $[x] = \{y \in I : x - y \in \mathbb{Q}\} = (x + \mathbb{Q}) \cap I$. By construction, $I$ is partitioned by a family of mutually disjoint equivalence classes $[x_j]$, $j \in J$. By the axiom of choice there exists a set $L$ which contains exactly one element, say $m_j$, from each of the classes $[x_j]$, $j \in J$. (We have to use the axiom of choice since $J$ is uncountable. This follows from the observation that the uncountable set $I = \bigcup_{j \in J} [x_j]$ is the disjoint union of the countable sets $[x_j] = (x_j + \mathbb{Q}) \cap I$. It is known that all proofs for Theorem D.4 must use the axiom of choice or some equivalent statement, cf. Solovay [44].) We will show that $L$ cannot be Lebesgue measurable.

Assume $L$ were Lebesgue measurable. Since for every $x \in I$ we have $[x] \cap L = \{m_{j_0}\}$, $j_0 = j_0(x) \in J$, we can find some $q \in \mathbb{Q}$ such that $x = m_{j_0} + q$. Obviously, $-1 < q < 1$. Thus $I \subset L + (\mathbb{Q} \cap (-1, 1)) \subset I + (-1, 1) = (-1, 2)$, which we can rewrite as
$[0, 1) \subset \bigcup_{q \in \mathbb{Q} \cap (-1, 1)} (q + L) \subset (-1, 2)$.
Moreover, $(r + L) \cap (q + L) = \emptyset$ for all $r \neq q$, $r, q \in \mathbb{Q}$. Otherwise $r + x = q + y$ for some $x, y \in L$, so that $x \sim y$, which is impossible since $L$ contains only one representative of each equivalence class. Therefore we can use the $\sigma$-additivity of the measure $\bar\lambda^1$ to find
$1 = \bar\lambda^1([0, 1)) \le \sum_{q \in \mathbb{Q} \cap (-1, 1)} \bar\lambda^1(q + L) \le \bar\lambda^1((-1, 2)) = 3$.
Since $\bar\lambda^1$ is invariant under translations, we get $\bar\lambda^1(q + L) = \bar\lambda^1(L)$ for all $q \in \mathbb{Q} \cap (-1, 1)$. We conclude that
$1 \le \sum_{q \in \mathbb{Q} \cap (-1, 1)} \bar\lambda^1(L) \le 3$
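which is not possible, since a sum of countably infinitely many identical terms is either $0$ or $\infty$. This proves that $L$ cannot be Lebesgue measurable. If $n > 1$, a similar argument shows that $[0, 1)^{n-1} \times L$ is not Lebesgue measurable.

The question whether there are Lebesgue measurable sets which are not Borel measurable can be answered constructively. Since this is quite tedious, we content ourselves with the fact that there are 'fewer' Borel sets than there are Lebesgue measurable sets.

D.5 Theorem We have $\#\mathcal{B}(\mathbb{R}^n) = \mathfrak{c}$.

D.6 Corollary There are Lebesgue measurable sets which are not Borel measurable.

Proof (of D.6) We know from Theorem D.2 that $\#\mathcal{B}^*(\mathbb{R}^n) = \#\mathcal{P}(\mathbb{R}^n)$ and from Theorem D.5 that $\#\mathcal{B}(\mathbb{R}^n) = \mathfrak{c}$. Since by Theorem 2.9 and Problem 2.17 $\#\mathcal{P}(\mathbb{R}^n) > \#\mathbb{R}^n = \mathfrak{c}$, we conclude that $\mathcal{B}(\mathbb{R}^n) \subsetneq \mathcal{B}^*(\mathbb{R}^n)$.

To prove Theorem D.5 we show that the Borel sets are contained in a family of sets which has cardinality $\mathfrak{c}$. Let $\mathcal{F} = \bigcup_{k=1}^\infty \mathbb{N}^k$ be the set of all finite sequences of natural numbers and write $\mathcal{C}$ for the family of open balls $B_r(x) \subset \mathbb{R}^n$ with radius $r \in \mathbb{Q}^+$ and centre $x \in \mathbb{Q}^n$. We have seen in Problems 2.19 and 2.9 that $\#\mathcal{F} = \#\mathbb{N}$ and $\#\mathcal{C} = \#(\mathbb{Q}^+ \times \mathbb{Q}^n) = \#\mathbb{N}$. Therefore, the collection of all Souslin schemes
$s : \mathcal{F} \to \mathcal{C}, \quad (i_1, i_2, \dots, i_k) \mapsto C_{i_1 i_2 \dots i_k}$
has cardinality $\#\mathcal{C}^{\mathcal{F}} = \#\mathbb{N}^{\mathbb{N}} = \mathfrak{c}$, cf. Problem 2.18. With each Souslin scheme $s$ we can associate a set $A \subset \mathbb{R}^n$ in the following way: take any sequence $(i_j)_{j \in \mathbb{N}}$ of natural numbers and consider the sequence of finite tuples $(i_1), (i_1, i_2), (i_1, i_2, i_3), \dots, (i_1, i_2, \dots, i_k), \dots$ formed by the first $1, 2, \dots, k, \dots$ members of the sequence $(i_j)_{j \in \mathbb{N}}$. Using the Souslin scheme $s$ we pick for each tuple $(i_1, i_2, \dots, i_k)$ the corresponding set $C_{i_1 i_2 \dots i_k} \in \mathcal{C}$ to get a sequence of sets $C_{i_1}, C_{i_1 i_2}, C_{i_1 i_2 i_3}, \dots, C_{i_1 i_2 \dots i_k}, \dots$ from $\mathcal{C}$. Finally, we form the intersection of all these sets $C_{i_1} \cap C_{i_1 i_2} \cap C_{i_1 i_2 i_3} \cap \dots \cap C_{i_1 i_2 \dots i_k} \cap \dots$ and consider the union over all possible sequences $(i_j)_{j \in \mathbb{N}}$ of natural numbers:
$A = A(s) = \bigcup_{(i_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}} \; \bigcap_{k=1}^\infty C_{i_1 i_2 \dots i_k}$.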
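Note that this union is uncountable, so that $A$ is not necessarily a Borel set. It is often helpful to visualize this construction as a tree:

[Figure: tree of the Souslin scheme $s$. First generation: $C_1, C_2, C_3, \dots$; second generation: $C_{11}, C_{12}, C_{13}, \dots$, $C_{21}, C_{22}, C_{23}, \dots$, $C_{31}, C_{32}, C_{33}, \dots$; third generation: $C_{211}, C_{212}, C_{213}, \dots$]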
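where the $C_{i_1}, C_{i_1 i_2}, C_{i_1 i_2 i_3}, \dots \in \mathcal{C}$ are the sets of the 1st, 2nd, 3rd, etc. generation. We will also call $C_{i_1 i_2}$ or $C_{i_1 i_2 i_3}$ children or grandchildren of $C_{i_1}$.

D.7 Definition (Souslin) Let $\mathcal{F}$, $\mathcal{C}$ and $A(s)$ be as above. The sets in $\mathcal{S}(\mathcal{C}) = \{A(s) : s \in \mathcal{C}^{\mathcal{F}}\}$ are called analytic or Souslin sets (generated by $\mathcal{C}$).

D.8 Lemma Let $\mathcal{C}$ and $\mathcal{S}(\mathcal{C})$ be as before.
(i) $\mathcal{S}(\mathcal{C})$ is stable under countable unions and countable intersections;
(ii) $\mathcal{S}(\mathcal{C})$ contains all open and all closed subsets of $\mathbb{R}^n$;
(iii) $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{C}) \subset \mathcal{S}(\mathcal{C})$;
(iv) $\#\mathcal{S}(\mathcal{C}) \le \mathfrak{c}$.

Proof (i) Let $A_\ell \in \mathcal{S}(\mathcal{C})$, $\ell \in \mathbb{N}$, be a sequence of analytic sets
$A_\ell = \bigcup_{(i_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}} \; \bigcap_{k=1}^\infty C^\ell_{i_1 i_2 \dots i_k}$.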
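Since
$A = \bigcup_{\ell \in \mathbb{N}} A_\ell = \bigcup_{\ell \in \mathbb{N}} \; \bigcup_{(i_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}} \; \bigcap_{k=1}^\infty C^\ell_{i_1 i_2 \dots i_k}$
it is obvious that $A$ can be obtained from a Souslin scheme which arises by the juxtaposition of the Souslin schemes belonging to the $A_\ell$: arrange the double sequence $C^\ell_{i_1}$, $(i_1, \ell) \in \mathbb{N} \times \mathbb{N}$, in one sequence (e.g. using the counting scheme of Example 2.5(iv)) to get the first generation of sets, while all other generations follow suit in genealogical order. Thus $A \in \mathcal{S}(\mathcal{C})$.

For the countable intersection of the $A_\ell$ we observe first that
$B = \bigcap_{\ell \in \mathbb{N}} A_\ell = \bigcap_{\ell \in \mathbb{N}} \; \bigcup_{(i_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}} \; \bigcap_{k=1}^\infty C^\ell_{i_1 i_2 \dots i_k} \overset{[\checkmark]}{=} \bigcup_{(i^m_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}},\; m = 1, 2, 3, \dots} \; \bigcap_{\ell=1}^\infty \bigcap_{k=1}^\infty C^\ell_{i^\ell_1 i^\ell_2 \dots i^\ell_k}$
and then we merge the two infinite intersections indexed by $(k, \ell) \in \mathbb{N} \times \mathbb{N}$ into a single infinite intersection. Once again this can be achieved through the counting scheme of Example 2.5(iv): the sets
$C^1_{i^1_1} \cap C^1_{i^1_1 i^1_2} \cap C^2_{i^2_1} \cap C^1_{i^1_1 i^1_2 i^1_3} \cap C^2_{i^2_1 i^2_2} \cap C^3_{i^3_1} \cap \dots$
are enumerated along $(1, 1) \to (1, 2) \to (2, 1) \to (1, 3) \to (2, 2) \to (3, 1) \to \dots$, and so
$B = \bigcup_{(i^m_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}},\; m = 1, 2, 3, \dots} C^1_{i^1_1} \cap C^1_{i^1_1 i^1_2} \cap C^2_{i^2_1} \cap C^1_{i^1_1 i^1_2 i^1_3} \cap C^2_{i^2_1 i^2_2} \cap C^3_{i^3_1} \cap \dots$
We will now construct a Souslin scheme which produces $B$ by arranging the sets $C^m_{i^m_1 \dots i^m_k}$ in a tree:
• The first generation are the sets $C^1_{i^1_1}$, $i^1_1 \in \mathbb{N}$.
• The second generation are the sets $C^1_{i^1_1 i^1_2}$, $i^1_2 \in \mathbb{N}$, such that they are for fixed $i^1_1$ the children of $C^1_{i^1_1}$.
• Each $C^1_{i^1_1 i^1_2}$ has the same offspring, namely the sets $C^2_{i^2_1}$, $i^2_1 \in \mathbb{N}$, which form jointly the third generation.
• The fourth generation are the sets $C^1_{i^1_1 i^1_2 i^1_3}$, $i^1_3 \in \mathbb{N}$, such that they are for fixed $(i^1_1, i^1_2)$ the grandchildren of $C^1_{i^1_1 i^1_2}$.
• The fifth generation are the sets $C^2_{i^2_1 i^2_2}$, $i^2_2 \in \mathbb{N}$, such that they are for fixed $i^2_1$ the grandchildren of $C^2_{i^2_1}$.
• Each $C^2_{i^2_1 i^2_2}$ has the same offspring, namely the sets $C^3_{i^3_1}$, $i^3_1 \in \mathbb{N}$, which form jointly the sixth generation.
• ...
This shows that $B \in \mathcal{S}(\mathcal{C})$.

(ii) Every open set $U$ can be written as a countable union of $\mathcal{C}$-sets
$U = \bigcup_{B_r(x) \subset U,\; B_r(x) \in \mathcal{C}} B_r(x)$.
Indeed, the inclusion '⊃' is obvious; for '⊂' fix $x \in U$. Then there exists some $r \in \mathbb{Q}^+$ with $B_r(x) \subset U$. Since $\mathbb{Q}^n$ is dense in $\mathbb{R}^n$, $x \in B_{r/2}(y)$ for some $y \in \mathbb{Q}^n$ with $|x - y| < r/4$, so that $x \in B_{r/2}(y) \subset U$. Since there are only countably many sets in $\mathcal{C}$, the union is a fortiori countable. By part (i) we then get that $U \in \mathcal{S}(\mathcal{C})$, i.e. $\mathcal{S}(\mathcal{C})$ contains all open sets. For a closed set $F$ we know that
$F = \bigcap_{j \in \mathbb{N}} U_j$ where $U_j = F + B_{1/j}(0) = \{y : \exists\, x \in F,\; |x - y| < \tfrac{1}{j}\}$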
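is a countable intersection of open [✓] sets $U_j$. Since open sets are analytic, part (i) implies that $F \in \mathcal{S}(\mathcal{C})$.

(iii) Consider the system $\Sigma = \{A \in \mathcal{S}(\mathcal{C}) : A^c \in \mathcal{S}(\mathcal{C})\}$. We claim that $\Sigma$ is a $\sigma$-algebra. Obviously, $\Sigma$ satisfies conditions $(\Sigma_1)$, $(\Sigma_2)$, i.e. contains $\mathbb{R}^n$ and is stable under complementation. To see $(\Sigma_3)$ we take a sequence $(A_j)_{j \in \mathbb{N}} \subset \Sigma$ and observe that, by part (i),
$\bigcup_{j \in \mathbb{N}} A_j \in \mathcal{S}(\mathcal{C})$ and $\bigl(\bigcup_{j \in \mathbb{N}} A_j\bigr)^c = \bigcap_{j \in \mathbb{N}} A_j^c \in \mathcal{S}(\mathcal{C})$,
so that $\bigcup_j A_j \in \Sigma$. Because of (ii) we have $\mathcal{C} \subset \Sigma \subset \mathcal{S}(\mathcal{C})$ and this implies that $\sigma(\mathcal{C}) \subset \mathcal{S}(\mathcal{C})$. Since, by (ii), all open sets are countable unions of sets from $\mathcal{C}$, we get $\mathcal{O} \subset \sigma(\mathcal{C}) \subset \sigma(\mathcal{O})$ ($\mathcal{O}$ denotes the family of open sets) or $\sigma(\mathcal{C}) = \sigma(\mathcal{O}) \overset{\text{def}}{=} \mathcal{B}(\mathbb{R}^n)$.

(iv) follows immediately from the fact that there are $\#\mathcal{C}^{\mathcal{F}} = \#\mathbb{N}^{\mathbb{N}} = \mathfrak{c}$ Souslin schemes, cf. Definition D.7.

The Proof of Theorem D.5 is now easy: By Lemma D.8 there are at most $\mathfrak{c}$ analytic sets. Since each singleton $\{x\}$, $x \in \mathbb{R}^n$, is a Borel set, there are at least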
$\mathfrak{c}$ Borel sets (use Problem 2.17 to see $\#\mathbb{R}^n = \mathfrak{c}$). So, $\mathfrak{c} \le \#\mathcal{B}(\mathbb{R}^n) \le \#\mathcal{S}(\mathcal{C}) \le \mathfrak{c}$, and an application of Theorem 2.7 finishes the proof.

D.9 Remark Our approach to analytic sets follows the original construction of Souslin [42], which makes it easy to determine the cardinality of $\mathcal{S}(\mathcal{C})$. This, however, comes at a price: if one wants to work with this definition, things become messy, as we have seen in the proof of Lemma D.8(i). Nowadays analytic sets are often introduced by one of the following characterizations. A set $A \subset \mathbb{R}^n$ is analytic if, and only if, one of the following equivalent conditions holds:
(i) $A = f(\mathbb{R})$ for some left-continuous function $f : \mathbb{R} \to \mathbb{R}^n$;
(ii) $A = g(\mathbb{N}^{\mathbb{N}})$ for some Borel measurable function $g : \mathbb{N}^{\mathbb{N}} \to \mathbb{R}^n$;
(iii) $A = h(B)$ for some Borel set $B \in \mathcal{B}(X)$, some Polish space $X$ (i.e. a space $X$ which can be endowed with a metric for which $X$ is complete and separable) and some Borel measurable function $h : B \to \mathbb{R}^n$;
(iv) $A = \pi_2(B)$ where $\pi_2 : Y \times \mathbb{R}^n \to \mathbb{R}^n$ is the coordinate projection onto $\mathbb{R}^n$, $Y$ is a compact Hausdorff space (cf. Appendix B, Definition B.3) and $B \subset Y \times \mathbb{R}^n$ is a $\mathcal{K}_{\sigma\delta}$-set, i.e. $B$ can be written as a countable intersection ('$\delta$') of countable unions ('$\sigma$') of compact subsets ('$\mathcal{K}$') of $Y \times \mathbb{R}^n$.
For a proof we refer to Srivastava [43] which is also our main reference for analytic sets. The Souslin operation can be applied to other systems of sets than $\mathcal{C}$. Without proof we mention the following facts:
$\mathcal{S}(\mathcal{C}) = \mathcal{S}(\{\text{open sets}\}) = \mathcal{S}(\{\text{closed sets}\}) = \mathcal{S}(\{\text{compact sets}\})$
and also
$\mathcal{S}(\mathcal{S}(\mathcal{C})) = \mathcal{S}(\mathcal{C})$ and $\mathcal{B}(\mathbb{R}^n) \subsetneq \mathcal{S}(\mathcal{C}) \subsetneq \mathcal{B}^*(\mathbb{R}^n)$.
Most constructions of sets which are not Borel but still Lebesgue measurable are actually constructions of non-Borel analytic sets, cf. Dudley [14, §13.2].
Appendix E A summary of the Riemann integral
In this appendix we give a brief outline of the Riemann integral on the real line. The notion of integration was well known for a long time and ever since the creation of differential calculus by Newton and Leibniz, integration was perceived as anti-derivative. Several attempts to make this precise were made, but the problem with these approaches was partly that the notion of integral was implicit (i.e. axiomatically given rather than constructively defined), partly that the choice of possible integrands was rather limited and partly that some fundamental points were unclear.

Out of the need to overcome these insufficiencies and to have a sound foundation, Bernhard Riemann asked in his Habilitationsschrift Über die Darstellbarkeit einer Function durch eine trigonometrische Reihe (On the representability of a function by a trigonometric series) the question Also zuerst: Was hat man unter $\int_a^b f(x)\,dx$ zu verstehen? (First of all: what is the meaning of $\int_a^b f(x)\,dx$?) (p. 239) and proposed a general way to define an integral which is constructive, which is (at least for continuous integrands) the anti-derivative, and which can deal with a wider range of integrands than all its predecessors.

We do not follow Riemann's original approach but use the Darboux technique of upper and lower integrals. Riemann's original definition will be recovered in Theorem E.5(iv).

The (proper) Riemann integral

Riemann integrals are defined only for bounded functions on compact intervals $[a, b] \subset \mathbb{R}$; this avoids all sorts of complications arising when either the domain or the range of the integrand is infinite. Both cases can be dealt with by various extensions of the Riemann integral, one of which, the so-called improper Riemann integral, we will discuss later on.
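A partition $\pi$ of the interval $[a, b]$ consists of finitely many points satisfying
$\pi = \{a = t_0 < t_1 < \dots < t_{k-1} < t_k = b\}$, $\quad k = k(\pi)$.
We call $\operatorname{mesh} \pi = \max_{1 \le j \le k} (t_j - t_{j-1})$ the mesh or fineness of the partition. Given a partition $\pi$ and a bounded function $u : [a, b] \to \mathbb{R}$ we define
$m_j = \inf_{x \in [t_{j-1}, t_j]} u(x)$ and $M_j = \sup_{x \in [t_{j-1}, t_j]} u(x)$
for all $j = 1, 2, \dots, k$, and introduce the lower, resp. upper Darboux sums
$S_\pi[u] = \sum_{j=1}^k m_j (t_j - t_{j-1})$ resp. $S^\pi[u] = \sum_{j=1}^k M_j (t_j - t_{j-1})$.
Obviously, $S_\pi[\bullet]$ and $S^\pi[\bullet]$ are positively homogeneous (but in general only super- resp. sub-additive), and if $|u(x)| \le M$, they satisfy
$|S_\pi[u]| \le S^\pi[|u|] \le M(b - a)$, $\quad |S^\pi[u]| \le S^\pi[|u|] \le M(b - a)$. (E.1)

E.1 Lemma Let $\pi$ be a partition of $[a, b]$ and $\pi' \supset \pi$ be a refinement of $\pi$. Then
$S_\pi[u] \le S_{\pi'}[u] \le S^{\pi'}[u] \le S^\pi[u]$
holds for all bounded functions $u : [a, b] \to \mathbb{R}$.

Proof Since $S_\pi[u] = -S^\pi[-u]$ and since $S_{\pi'}[u] \le S^{\pi'}[u]$ is trivially fulfilled, it is enough to show $S_\pi[u] \le S_{\pi'}[u]$. The partitions contain only finitely many points and we may assume that $\pi' = \pi \cup \{\tau\}$ where $t_{j_0 - 1} < \tau < t_{j_0}$ for some index $1 \le j_0 \le k$. The rest follows by iteration. Clearly,
$S_\pi[u] = \sum_{j \neq j_0} m_j (t_j - t_{j-1}) + m_{j_0} (t_{j_0} - \tau) + m_{j_0} (\tau - t_{j_0 - 1})$
$\le \sum_{j \neq j_0} m_j (t_j - t_{j-1}) + \inf_{x \in [\tau, t_{j_0}]} u(x)\, (t_{j_0} - \tau) + \inf_{x \in [t_{j_0 - 1}, \tau]} u(x)\, (\tau - t_{j_0 - 1}) = S_{\pi'}[u]$.

Lemma E.1 shows that the following definition makes sense.

E.2 Definition Let $u : [a, b] \to \mathbb{R}$ be a bounded function. The lower and upper integrals of $u$ are given by
${\int_*}_a^b u = \sup_\pi S_\pi[u]$ and ${\int^*}_a^b u = \inf_\pi S^\pi[u]$
where sup and inf range over all finite partitions $\pi$ of $[a, b]$.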
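For an increasing function the infimum and supremum over each subinterval are attained at its left and right endpoint, so both Darboux sums can be computed exactly from the values at the partition points. The sketch below relies on that assumption; the function name `darboux_sums_increasing` and the choice $u(x) = x^2$ on $[0, 1]$ are illustrative. It evaluates the lower and upper sums for equidistant partitions of increasing fineness.

```python
def darboux_sums_increasing(u, a, b, k):
    """Lower and upper Darboux sums of an increasing function u on [a, b]
    for the equidistant partition with k subintervals.

    Assumes u is increasing, so inf/sup on [t_{j-1}, t_j] are
    u(t_{j-1}) and u(t_j) respectively.
    """
    points = [a + j * (b - a) / k for j in range(k + 1)]
    lower = sum(u(points[j - 1]) * (points[j] - points[j - 1]) for j in range(1, k + 1))
    upper = sum(u(points[j]) * (points[j] - points[j - 1]) for j in range(1, k + 1))
    return lower, upper

def square(x):
    return x * x  # increasing on [0, 1]

for k in (10, 100, 1000):
    lo, hi = darboux_sums_increasing(square, 0.0, 1.0, k)
    print(k, lo, hi, hi - lo)
```

The lower sums increase and the upper sums decrease as the partition is refined, in line with Lemma E.1, and both bracket the lower and upper integrals of Definition E.2.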
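E.3 Lemma ${\int_*}_a^b u \le {\int^*}_a^b u$ and ${\int_*}_a^b u = -{\int^*}_a^b (-u)$.

E.4 Definition A bounded function $u : [a, b] \to \mathbb{R}$ is said to be (Riemann) integrable if the upper and lower integrals coincide. Their common value is denoted by
$\int_a^b u(x)\,dx = {\int_*}_a^b u = {\int^*}_a^b u$
and is called the (Riemann) integral of $u$. The collection of all Riemann integrable functions in $[a, b]$ is denoted by $\mathcal{R}[a, b]$.

E.5 Theorem (Characterization of $\mathcal{R}[a, b]$) Let $u : [a, b] \to \mathbb{R}$ be a bounded function. Then the following assertions are equivalent:
(i) $u \in \mathcal{R}[a, b]$.
(ii) For every $\epsilon > 0$ there is some partition $\pi$ such that $S^\pi[u] - S_\pi[u] \le \epsilon$.
(iii) For every $\epsilon > 0$ there is some $\delta > 0$ such that $S^\pi[u] - S_\pi[u] \le \epsilon$ for all partitions $\pi$ with $\operatorname{mesh} \pi < \delta$.
(iv) The limit $I = \lim_{\operatorname{mesh} \pi \to 0} \sum_{t_j \in \pi} u(\xi_j)(t_j - t_{j-1})$ exists for every choice of intermediate values $t_{j-1} \le \xi_j \le t_j$; this means that for all $\epsilon > 0$ there exists a $\delta > 0$ such that for all partitions $\pi$ with $\operatorname{mesh} \pi < \delta$
$\bigl| I - \sum_{t_j \in \pi} u(\xi_j)(t_j - t_{j-1}) \bigr| \le \epsilon$
independently of the intermediate points.
If the limit exists, $I = {\int^*}_a^b u = {\int_*}_a^b u$.

Proof We show the implications (i)⇒(ii)⇒(iii)⇒(iv)⇒(i).
(i)⇒(ii): By the very definition of the lower and upper integrals in terms of sup and inf, we find for every $\epsilon > 0$ partitions $\pi$ and $\pi'$ such that
${\int_*}_a^b u - S_\pi[u] \le \frac{\epsilon}{2}$ and $S^{\pi'}[u] - {\int^*}_a^b u \le \frac{\epsilon}{2}$.
Using the common refinement $\pi'' = \pi \cup \pi'$ we get from Lemma E.1 and the integrability of $u$
$S^{\pi''}[u] - S_{\pi''}[u] \le S^{\pi'}[u] - S_\pi[u] = S^{\pi'}[u] - {\int^*}_a^b u + {\int_*}_a^b u - S_\pi[u] \le \epsilon$.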
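(ii)⇒(iii): This is the most intricate step in the proof. Fix $\epsilon > 0$ and denote by $\pi_\epsilon = \{a = t^\epsilon_0 < t^\epsilon_1 < \dots < t^\epsilon_k = b\}$ the partition in (ii). We choose $\delta > 0$ in such a way that
$\delta < \frac{1}{2} \min_{1 \le j \le k} (t^\epsilon_j - t^\epsilon_{j-1})$ and $\delta < \frac{\epsilon}{4k \|u\|_\infty}$.
If $\pi = \{a = t_0 < t_1 < \dots < t_N = b\}$ is any partition with $\operatorname{mesh} \pi < \delta$ we find
$S^\pi[u] - S_\pi[u] = \sum_{j :\, \pi_\epsilon \cap [t_{j-1}, t_j] \neq \emptyset} (M_j - m_j)(t_j - t_{j-1}) + \sum_{j :\, \pi_\epsilon \cap [t_{j-1}, t_j] = \emptyset} (M_j - m_j)(t_j - t_{j-1})$ (E.2)
where $M_j, m_j$ indicate that the supremum resp. infimum is taken w.r.t. the intervals defined by the partition $\pi$; below, $M^\epsilon_j, m^\epsilon_j$ refer in the same way to $\pi_\epsilon$. The first sum has at most $2k$ terms since $\pi_\epsilon \cap [a, t_1] = \{a\}$, $\pi_\epsilon \cap [t_{N-1}, b] = \{b\}$ and since all other $t^\epsilon_j$, $1 \le j \le k - 1$, appear in exactly one or two intervals defined by $\pi$. Thus
$\sum_{j :\, \pi_\epsilon \cap [t_{j-1}, t_j] \neq \emptyset} (M_j - m_j)(t_j - t_{j-1}) \le 2k \cdot 2\|u\|_\infty \cdot \delta \le \epsilon$. (E.3)
The second sum in (E.2) can be written as a double sum
$\sum_{j :\, \pi_\epsilon \cap [t_{j-1}, t_j] = \emptyset} (M_j - m_j)(t_j - t_{j-1}) = \sum_{j=1}^k \Bigl[ \sum_{\ell :\, [t_{\ell-1}, t_\ell] \subset (t^\epsilon_{j-1}, t^\epsilon_j)} (M_\ell - m_\ell)(t_\ell - t_{\ell-1}) \Bigr]$
$\le \sum_{j=1}^k \Bigl[ \sum_{\ell :\, [t_{\ell-1}, t_\ell] \subset (t^\epsilon_{j-1}, t^\epsilon_j)} (M^\epsilon_j - m^\epsilon_j)(t_\ell - t_{\ell-1}) \Bigr]$ (E.4)
$\le \sum_{j=1}^k (M^\epsilon_j - m^\epsilon_j)(t^\epsilon_j - t^\epsilon_{j-1}) = S^{\pi_\epsilon}[u] - S_{\pi_\epsilon}[u] \le \epsilon$.
Together (E.2)–(E.4) show $S^\pi[u] - S_\pi[u] \le 2\epsilon$ for any partition $\pi$ with $\operatorname{mesh} \pi < \delta$.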
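(iii)⇒(iv): Fix $\epsilon > 0$ and choose $\delta > 0$ as in (iii). Then we have for any partition $\pi = \{a = t_0 < \dots < t_k = b\}$ with $\operatorname{mesh} \pi < \delta$ and any choice of intermediate points $\xi_j \in [t_{j-1}, t_j]$,
$S^\pi[u] - \epsilon \le S_\pi[u] \le \sum_{j=1}^k u(\xi_j)(t_j - t_{j-1}) \le S^\pi[u] \le S_\pi[u] + \epsilon$.
Since ${\int^*}_a^b u \le S^\pi[u]$ and $S_\pi[u] \le {\int_*}_a^b u \le {\int^*}_a^b u$, this implies
${\int^*}_a^b u - \epsilon \le \sum_{j=1}^k u(\xi_j)(t_j - t_{j-1}) \le {\int_*}_a^b u + \epsilon$,
which means that $\sum_{j=1}^k u(\xi_j)(t_j - t_{j-1}) \to I = {\int_*}_a^b u = {\int^*}_a^b u$ as $\operatorname{mesh} \pi \to 0$.
(iv)⇒(i): Assume that $\sum_{j=1}^k u(\xi_j)(t_j - t_{j-1}) \to I$ as $\operatorname{mesh} \pi \to 0$ exists for any choice of intermediate values. We have to show that $I = {\int^*}_a^b u = {\int_*}_a^b u$. By definition of the limit, there is for every $\epsilon > 0$ some $\delta > 0$ and some partition $\pi$ with $\operatorname{mesh} \pi < \delta$ such that
$I - \epsilon \le \sum_{j=1}^k u(\xi_j)(t_j - t_{j-1}) \le I + \epsilon$.
Since this must hold uniformly for any choice of intermediate values, we can pass to the infimum and supremum of these values and get
$I - \epsilon \le \sum_{j=1}^k \inf_{\xi \in [t_{j-1}, t_j]} u(\xi)\, (t_j - t_{j-1}) \le \sum_{j=1}^k \sup_{\xi \in [t_{j-1}, t_j]} u(\xi)\, (t_j - t_{j-1}) \le I + \epsilon$.
Thus $I - \epsilon \le S_\pi[u] \le S^\pi[u] \le I + \epsilon$, and
$I - \epsilon \le S_\pi[u] \le {\int_*}_a^b u \le {\int^*}_a^b u \le S^\pi[u] \le I + \epsilon$.
Since $\epsilon > 0$ is arbitrary, the upper and lower integrals coincide and are equal to $I$.

Once we know that $u$ is Riemann integrable, we can work out the value of the integral by particular Riemann sums: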
E.6 Corollary If $u : [a, b] \to \mathbb{R}$ is Riemann integrable, then the integral is the limit of Riemann sums
$\lim_{n \to \infty} \sum_{j=1}^{k_n} u(\xi^{(n)}_j)\,(t^{(n)}_j - t^{(n)}_{j-1})$
where $\pi_n = \{a = t^{(n)}_0 < t^{(n)}_1 < \dots < t^{(n)}_{k_n} = b\}$ is any sequence of partitions with $\operatorname{mesh} \pi_n \to 0$ ($n \to \infty$) and where $\xi^{(n)}_j \in [t^{(n)}_{j-1}, t^{(n)}_j]$ are some intermediate points.

The existence of the limit of Riemann sums for some particular sequence of partitions does not guarantee integrability.

E.7 Example The Dirichlet jump function $u(x) = 1_{[0, 1] \cap \mathbb{Q}}(x)$ on $[0, 1]$ is not Riemann integrable, since for each partition $\pi$ of $[0, 1]$ we have $M_j = 1$ and $m_j = 0$, so that ${\int_*}_0^1 u = S_\pi[u] = 0$ while ${\int^*}_0^1 u = S^\pi[u] = 1$.
On the other hand, the equidistant Riemann sum
$\sum_{j=1}^k u(\xi_j) \Bigl( \frac{j}{k} - \frac{j-1}{k} \Bigr) = \frac{1}{k} \sum_{j=1}^k u(\xi_j)$
takes the value $\frac{n}{k}$, $0 \le n \le k$, if we choose $\xi_1, \dots, \xi_n$ rational and $\xi_{n+1}, \dots, \xi_k$ irrational. This allows us to construct sequences of Riemann sums which converge to any value in $[0, 1]$.

Let us now find concrete functions which are Riemann integrable. A step function on $[a, b]$ is a function $f : [a, b] \to \mathbb{R}$ of the form
$f(x) = \sum_{j=1}^N y_j 1_{I_j}(x)$
where $N \in \mathbb{N}$, $y_j \in \mathbb{R}$ and $I_j$ are (open, half-open, closed, even degenerate) adjacent intervals such that $I_1 \cup \dots \cup I_N = [a, b]$ and $I_j$, $I_k$, $j \neq k$, intersect in at most one point. We denote by $\mathcal{T}[a, b]$ the family of all step functions on $[a, b]$.

E.8 Theorem Continuous functions, monotone functions, and step functions on $[a, b]$ are Riemann integrable.

Proof Notice that the functions from all three classes are bounded on $[a, b]$.
Continuous functions: Let $u : [a, b] \to \mathbb{R}$ be continuous. Since $[a, b]$ is compact, $u$ is uniformly continuous and we find for all $\epsilon > 0$ some $\delta > 0$ such that
$|u(x) - u(y)| \le \epsilon \quad \forall x, y \in [a, b],\; |x - y| < \delta$.
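If $\pi$ is a partition of $[a, b]$ with $\operatorname{mesh} \pi < \delta$ we find
$S^\pi[u] - S_\pi[u] = \sum_{t_j \in \pi} (M_j - m_j)(t_j - t_{j-1}) \le \epsilon \sum_{t_j \in \pi} (t_j - t_{j-1}) = \epsilon (b - a)$
since, by uniform continuity,
$M_j - m_j = \sup u([t_{j-1}, t_j]) - \inf u([t_{j-1}, t_j]) = \sup_{\xi, \eta \in [t_{j-1}, t_j]} (u(\xi) - u(\eta)) \le \epsilon$.
Thus $u \in \mathcal{R}[a, b]$ by Theorem E.5(iii).
Monotone functions: We can safely assume that $u : [a, b] \to \mathbb{R}$ is monotone increasing, otherwise we would consider $-u$. For the equidistant partition $\pi_k$ with points $t_j = a + j \frac{b - a}{k}$, $0 \le j \le k$, we get
$S^{\pi_k}[u] - S_{\pi_k}[u] = \sum_{j=1}^k (u(t_j) - u(t_{j-1}))(t_j - t_{j-1}) = \frac{b - a}{k} \sum_{j=1}^k (u(t_j) - u(t_{j-1})) = \frac{b - a}{k} (u(b) - u(a))$
where we used that $\sup u([t_{j-1}, t_j]) = u(t_j)$ and $\inf u([t_{j-1}, t_j]) = u(t_{j-1})$ because of monotonicity. Since $\frac{b - a}{k} (u(b) - u(a))$ can be made arbitrarily small, $u \in \mathcal{R}[a, b]$ by Theorem E.5(ii).
Step functions: Let $u$ be a step function which has value $y_j$ on the interval $I_j$, $j = 1, \dots, k$. The endpoints of the non-degenerate intervals form a partition of $[a, b]$, $\pi = \{a = t_0 < t_1 < \dots < t_N = b\}$, $N \le k$, and we set for every $\epsilon > 0$
$\pi_\epsilon = \{a = s_0 < s'_1 < s_1 < s'_2 < s_2 < \dots < s'_{N-1} < s_{N-1} < s_N = b\}$
where $s'_j < t_j < s_j$, $1 \le j \le N - 1$, and $s_j - s'_j < \epsilon / (2N \|u\|_\infty)$; for convenience we write $s'_N = s_N = b$. Since $u$ is constant with value $y_j$ on each interval $[s_{j-1}, s'_j]$, we find
$S^{\pi_\epsilon}[u] - S_{\pi_\epsilon}[u] = \sum_{j=1}^N (y_j - y_j)(s'_j - s_{j-1}) + \sum_{j=1}^{N-1} \bigl( \sup u([s'_j, s_j]) - \inf u([s'_j, s_j]) \bigr)(s_j - s'_j) \le \sum_{j=1}^{N-1} 2\|u\|_\infty \frac{\epsilon}{2N \|u\|_\infty} \le \epsilon$.
Therefore Theorem E.5(ii) proves that $u \in \mathcal{R}[a, b]$.

With somewhat more effort one can prove the following general theorem.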
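E.9 Theorem Any bounded function $u : [a,b] \to \mathbb{R}$ with at most countably many points of discontinuity is Riemann integrable.

An elementary proof of this based on a compactness argument can be found in Strichartz [48, §6.2.3], but since Theorem 11.8 supersedes this result anyway, we do not include a proof here. A combination of Theorems E.8 and E.5 yields the following quite useful criterion for integrability.

E.10 Corollary $u \in \mathcal{R}[a,b]$ if, and only if, for every $\epsilon > 0$ there are $f, g \in \mathcal{T}[a,b]$ such that $f \le u \le g$ and $\int_a^b (g - f)\,dt \le \epsilon$.

E.11 Theorem The Riemann integral is a positive linear form on the vector lattice $\mathcal{R}[a,b]$, that is, for all $\alpha, \beta \in \mathbb{R}$ and $u, w \in \mathcal{R}[a,b]$ one has
(i) $\alpha u + \beta w \in \mathcal{R}[a,b]$ and $\int_a^b (\alpha u + \beta w)\,dt = \alpha \int_a^b u\,dt + \beta \int_a^b w\,dt$;
(ii) $u \le w \implies \int_a^b u\,dt \le \int_a^b w\,dt$;
(iii) $u \vee w,\ u \wedge w,\ u^+,\ u^-,\ |u| \in \mathcal{R}[a,b]$ and $\big|\int_a^b u\,dt\big| \le \int_a^b |u|\,dt$;
(iv) $|u|^p,\ uw \in \mathcal{R}[a,b]$, $1 \le p < \infty$.

Proof (i) follows immediately from the linearity of the limit criterion in Theorem E.5(iv).

(ii): In view of (i) it is enough to show that $v := w - u \ge 0$ entails $\int_a^b v\,dt \ge 0$. This, however, is clear since $v \in \mathcal{R}[a,b]$ and
$$0 \le \int_{*a}^{b} v = \int_a^b v\,dt.$$

(iii): Since $u \vee w = -\big((-u) \wedge (-w)\big)$, $u^+ = u \vee 0$, $u^- = (-u) \vee 0$ and $|u| = u^+ + u^-$, it is enough to prove that $u \wedge w \in \mathcal{R}[a,b]$. By Corollary E.10 there are for every $\epsilon > 0$ step functions $f, g, \phi, \gamma \in \mathcal{T}[a,b]$ such that $f \le u \le g$, $\phi \le w \le \gamma$ and $\int_a^b (g - f)\,dt + \int_a^b (\gamma - \phi)\,dt \le \epsilon$. Obviously, $f \wedge \phi$ and $g \wedge \gamma$ are again step functions with $f \wedge \phi \le u \wedge w \le g \wedge \gamma$ and
$$\int_a^b |f \wedge \phi - g \wedge \gamma|\,dt \le \int_a^b \big((g - f) + (\gamma - \phi)\big)\,dt \le \epsilon$$
where we used (ii) and the elementary inequality for $a, b, A, B \in \mathbb{R}$
$$|a \wedge A - b \wedge B| \le \max\{|a - b|, |A - B|\} \le |a - b| + |A - B|.$$
Finally, since $\pm u \le |u|$ we find by parts (i), (ii) that $\pm \int_a^b u\,dt \le \int_a^b |u|\,dt$ which implies $\big|\int_a^b u\,dt\big| \le \int_a^b |u|\,dt$.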
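(iv): By (iii), $|u| \in \mathcal{R}[a,b]$ and, by Corollary E.10, we find for each $\epsilon > 0$ step functions $f \le |u| \le g$ such that $\int_a^b (g - f)\,dt \le \epsilon$. Without loss of generality, we may assume that $f \ge 0$ and $g \le \|u\|_\infty$ – otherwise we could consider $f^+$ and $g \wedge \|u\|_\infty$ and note that $f^+,\ g \wedge \|u\|_\infty \in \mathcal{T}[a,b]$, $f^+ \le |u| \le g \wedge \|u\|_\infty$ and
$$\int_a^b \big(g \wedge \|u\|_\infty - f^+\big)\,dt \le \int_a^b (g - f)\,dt \le \epsilon.$$
Thus $f^p \le |u|^p \le g^p$, where $f^p, g^p \in \mathcal{T}[a,b]$. By the mean value theorem of differential calculus we get
$$g^p - f^p \le p\, g^{p-1} (g - f) \le p\, \|u\|_\infty^{p-1} (g - f).$$
Thus, by (ii),
$$\int_a^b (g^p - f^p)\,dt \le p\, \|u\|_\infty^{p-1} \int_a^b (g - f)\,dt \le p\, \|u\|_\infty^{p-1}\, \epsilon,$$
so that $|u|^p \in \mathcal{R}[a,b]$ by Corollary E.10. Since $uw = \frac{1}{4}\big((u + w)^2 - (u - w)^2\big)$, we conclude that $uw \in \mathcal{R}[a,b]$ from (i) and the fact that $u^2 = |u|^2 \in \mathcal{R}[a,b]$.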
Note that Theorem E.11(iii) has no converse: $|u| \in \mathcal{R}[a,b]$ does not imply that $u \in \mathcal{R}[a,b]$ (unlike the Lebesgue integral, where this implication holds for measurable $u$, cf. T10.3). This can be seen by the modified Dirichlet jump function $u = \mathbf{1}_{[0,1] \cap \mathbb{Q}} - \mathbf{1}_{[0,1] \setminus \mathbb{Q}}$ which is not Riemann integrable but whose modulus $|u| = \mathbf{1}_{[0,1]}$ is Riemann integrable.

E.12 Corollary (Mean value theorem for integrals) Let $u \in \mathcal{R}[a,b]$ be either positive or negative and let $v \in C[a,b]$. Then there exists some $\xi \in [a,b]$ such that
$$\int_a^b u(t) v(t)\,dt = v(\xi) \int_a^b u(t)\,dt. \qquad \text{(E.5)}$$

Proof The case $u \le 0$ being similar, we may assume that $u \ge 0$. By Theorem E.8 and E.11(iv), $uv$ is integrable and because of E.11(ii) we have
$$\inf v([a,b]) \int_a^b u(t)\,dt \le \int_a^b u(t) v(t)\,dt \le \sup v([a,b]) \int_a^b u(t)\,dt.$$
Since $v$ is continuous on $[a,b]$, the intermediate value theorem guarantees the existence of some $\xi \in [a,b]$ such that (E.5) holds.

E.13 Theorem Let $[c,d] \subset [a,b]$. Then $\mathcal{R}[a,b] \subset \mathcal{R}[c,d]$ in the sense that $u \in \mathcal{R}[a,b]$ satisfies $u|_{[c,d]} \in \mathcal{R}[c,d]$. Moreover, for any $u \in \mathcal{R}[a,b]$
$$\int_a^b u\,dt = \int_a^c u\,dt + \int_c^b u\,dt.$$

Proof By Theorem E.8 and E.11 we find that $\mathbf{1}_{[c,d]} u \in \mathcal{R}[a,b]$. Since we can always add the points $c$ and $d$ to any of the partitions appearing in one of the criteria of Theorem E.5, we see that $u|_{[c,d]} = (\mathbf{1}_{[c,d]} u)|_{[c,d]} \in \mathcal{R}[c,d]$ and
$$\int_a^b \mathbf{1}_{[c,d]} u\,dt = \int_c^d u\,dt.$$
Considering $u = \mathbf{1}_{[a,c]} u + \mathbf{1}_{(c,b]} u$ proves also the formula in the statement of the theorem.

The fundamental theorem of integral calculus

Since $[a,x] \subset [a,b]$, Theorem E.13 allows us to treat $\int_a^x u(t)\,dt$, $u \in \mathcal{R}[a,b]$, as a function of its upper limit $x \in [a,b]$.

E.14 Lemma For every $u \in \mathcal{R}[a,b]$ the function $U(x) := \int_a^x u(t)\,dt$ is continuous for all $x \in [a,b]$.

Proof Since $u$ is bounded, $M := \sup_{x \in [a,b]} |u(x)| < \infty$. For all $x, y \in [a,b]$, $x < y$, we have by Theorem E.13 and E.11
$$|U(y) - U(x)| = \Big| \int_a^y u(t)\,dt - \int_a^x u(t)\,dt \Big| = \Big| \int_x^y u(t)\,dt \Big| \le \int_x^y |u(t)|\,dt \le M\,(y - x) \xrightarrow{\;x - y \to 0\;} 0$$
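showing even uniform continuity.

We can now discuss the connection between differentiation and integration. Let us begin with a few examples.

E.15 Example (i) Let $[0,1] \subset (a,b)$. Then $u(x) = \mathbf{1}_{[0,1]}(x)$ is an integrable function and
$$U(x) := \int_a^x u(t)\,dt = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } 0 < x < 1 \\ 1 & \text{if } x \ge 1 \end{cases} \;=\; x^+ \wedge 1.$$
Note that $U'(x)$ does not exist at $x = 0$ or $x = 1$, so that $u(x)$ cannot be the derivative of any function (at every point).

(ii) Let $[a,b] = [0,1]$ and take an enumeration $(q_j)_{j \in \mathbb{N}}$ of $[0,1] \cap \mathbb{Q}$. Then the function
$$u(x) := \sum_{j\,:\,q_j \le x} 2^{-j} = \sum_{j=1}^{\infty} 2^{-j}\, \mathbf{1}_{[q_j, 1]}(x), \qquad x \in [0,1],$$
is increasing, satisfies $0 \le u \le 1$ and its discontinuities are jumps at the points $q_j$ of height $u(q_j+) - u(q_j-) = 2^{-j}$ – this is as bad as it can get for a monotone function, cf. Lemma 13.12. By Theorem E.8 $u$ is integrable, and since $(q_j)_{j \in \mathbb{N}}$ is dense, there is no interval $[c,d] \subset [0,1]$ such that $U'(x) = u(x)$ for all $x \in [c,d]$ for any function $U(x)$.

(iii) Consider on $[-1,1]$ the function
$$u(x) := \begin{cases} x^2 \sin\frac{1}{x^2} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}$$
It is an elementary exercise to show that $u'(x)$ exists on $[-1,1]$ and
$$u'(x) = \begin{cases} 2x \sin\frac{1}{x^2} - \frac{2}{x} \cos\frac{1}{x^2} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}$$
Thus $u'$ exists everywhere, but it is not Riemann integrable in any neighbourhood of $x = 0$ since $u'$ is unbounded.

(iv) Let $(q_n)_{n \in \mathbb{N}}$ be an enumeration of $(0,1) \cap \mathbb{Q}$. The function
$$u(x) := \begin{cases} 2^{-n} & \text{if } x = q_n,\ n \in \mathbb{N} \\ 0 & \text{if } x \in ([0,1] \setminus \mathbb{Q}) \cup \{0, 1\} \end{cases}$$
is discontinuous for every $x \in (0,1) \cap \mathbb{Q}$ and continuous otherwise. Moreover, $u \in \mathcal{R}[0,1]$ which follows from Theorem E.9 or directly from the following argument: fix $\epsilon > 0$ and $n \in \mathbb{N}$ such that $2^{-n} < \epsilon$. Choose a partition $\pi = \{0 = t_0 < t_1 < \ldots < t_N = 1\}$ with mesh $\pi = \delta < \epsilon / n$ in such a way that each $q_k$ from $Q_n := \{q_1, q_2, \ldots, q_n\}$ is the midpoint of some $[t_{j-1}, t_j]$, $j = 1, 2, \ldots, N$. Therefore, if $M_j$ denotes $\sup u([t_{j-1}, t_j])$,
$$0 \le S_{\pi}[u] \le S^{\pi}[u] = \sum_{j=1}^{N} M_j (t_j - t_{j-1}) = \sum_{j\,:\,[t_{j-1}, t_j] \cap Q_n \neq \emptyset} M_j (t_j - t_{j-1}) + \sum_{j\,:\,[t_{j-1}, t_j] \cap Q_n = \emptyset} M_j (t_j - t_{j-1})$$
$$\le n \cdot \frac{\epsilon}{n} + 2^{-n} \sum_{j\,:\,[t_{j-1}, t_j] \cap Q_n = \emptyset} (t_j - t_{j-1}) \le \epsilon + \epsilon \sum_{j=1}^{N} (t_j - t_{j-1}) = 2\epsilon.$$
This proves $u \in \mathcal{R}[0,1]$ and $0 \le \int_0^x u(t)\,dt \le \int_0^1 u(t)\,dt = 0$. Thus $U'(x) = 0 = u(x)$ for all $x$ from a dense subset.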
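Example E.15(i) can be checked numerically. The following is a minimal Python sketch, assuming $a \le 0$; the names `U` and `one_sided_quotients` and the step size `h` are illustrative choices and not notation from the text. It evaluates the one-sided difference quotients of $U(x) = x^+ \wedge 1$ and shows that they disagree at $x = 0$ and $x = 1$, while they agree at an interior point such as $x = 0.5$.

```python
def U(x):
    """U(x) = integral of the indicator of [0, 1] from a <= 0 up to x,
    which equals min(max(x, 0), 1)."""
    return min(max(x, 0.0), 1.0)


def one_sided_quotients(x, h=1e-6):
    """Return the left and right difference quotients of U at x."""
    left = (U(x) - U(x - h)) / h
    right = (U(x + h) - U(x)) / h
    return left, right


for point in (0.0, 0.5, 1.0):
    # At 0 and 1 the two quotients differ, so U is not differentiable there.
    print(point, one_sided_quotients(point))
```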
The above examples show that the Riemann integral is not always the antiderivative, nor is the antiderivative an extension of the Riemann integral. The two concepts, however, coincide on a large class of functions.

E.16 Definition Let $u : [a,b] \to \mathbb{R}$ be a bounded function. Every function $U \in C[a,b]$ such that $U'(x) = u(x)$ for all but possibly finitely many $x \in [a,b]$ is called a primitive of $u$.

Obviously, primitives are only unique up to constants: for every constant $c$, $U + c$ is again a primitive of $u$. On the other hand, if $U, W$ are two primitives of $u$, we have $U' - W' = 0$ at all points except finitely many, say $a = x_0 < x_1 < \ldots < x_n = b$. Thus the mean value theorem of differential calculus shows $U = W + \text{const.}$ (cf. Rudin [39, Thm. 5.11]), first on each interval $(x_{j-1}, x_j)$, $j = 1, 2, \ldots, n$, and then, by continuity, on the whole interval $[a,b]$.

E.17 Proposition Every $u \in C[a,b]$ has $U(x) := \int_a^x u(t)\,dt$ as a primitive. Moreover,
$$U(b) - U(a) = \int_a^b u(t)\,dt.$$

Proof Since continuous functions are integrable, $U(x)$ is well-defined by Theorem E.13 and continuous by Lemma E.14. For $a < x < x + h < b$ and sufficiently small $h$ we find
$$|U(x+h) - U(x) - h\,u(x)| = \Big| \int_a^{x+h} u(t)\,dt - \int_a^x u(t)\,dt - \int_x^{x+h} u(x)\,dt \Big| = \Big| \int_x^{x+h} \big(u(t) - u(x)\big)\,dt \Big|$$
$$\le \int_x^{x+h} |u(t) - u(x)|\,dt \le \epsilon \int_x^{x+h} dt = \epsilon\,h$$
where we used that $u(t)$ is continuous at $t = x$. With a similar calculation we get $|U(x) - U(x-h) - h\,u(x)| \le \epsilon\,h$ and a combination of both inequalities shows that
$$\lim_{y \to x} \frac{U(y) - U(x)}{y - x} = u(x).$$
The formula $U(b) - U(a) = \int_a^b u(t)\,dt$ follows from the fact that $U(a) = 0$.
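Proposition E.17 lends itself to a quick numerical check. The sketch below is only an illustration under stated assumptions: it takes $u = \cos$ on $[0,1]$, replaces the Riemann integral by a midpoint Riemann sum, and uses the hypothetical helper names `riemann_integral` and `difference_quotient`. It compares the difference quotient of $U$ at $x$ with $u(x)$, and $U(b) - U(a)$ with the known antiderivative $\sin$.

```python
import math


def riemann_integral(u, a, x, n=20_000):
    """Midpoint Riemann sum of u over [a, x] with n equal subintervals."""
    h = (x - a) / n
    return h * sum(u(a + (j + 0.5) * h) for j in range(n))


def difference_quotient(u, x, h):
    """(U(x + h) - U(x)) / h for U(x) = integral of u over [a, x].

    By Theorem E.13 the numerator is the integral of u over [x, x + h],
    so the lower limit a does not enter the computation.
    """
    return riemann_integral(u, x, x + h, n=2_000) / h


x = 1.0
for h in (1e-1, 1e-2, 1e-3):
    # The quotient approaches u(x) = cos(1) as h shrinks.
    print(h, difference_quotient(math.cos, x, h), math.cos(x))

# U(b) - U(a) for a = 0, b = 1 should be close to sin(1) - sin(0).
print(riemann_integral(math.cos, 0.0, 1.0), math.sin(1.0))
```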
E.18 Theorem (Fundamental theorem of calculus) Assume that $U$ is a primitive of $u \in \mathcal{R}[a,b]$. Then
$$U(b) - U(a) = \int_a^b u(t)\,dt.$$

Proof Let $C$ be some finite set such that $U'(x) = u(x)$ if $x \in [a,b] \setminus C$. Fix $\epsilon > 0$. Since $u$ is integrable, we find by E.5(ii) a partition $\pi$ of $[a,b]$ such that $S^{\pi}[u] - S_{\pi}[u] \le \epsilon$. Because of Lemma E.1 this inequality still holds for the partition $\pi' := \pi \cup C$ whose points we denote by $a = t_0 < t_1 < \ldots < t_k = b$. Since
$$U(b) - U(a) = \sum_{j=1}^{k} \big(U(t_j) - U(t_{j-1})\big)$$
and since $U$ is differentiable in each segment $(t_{j-1}, t_j)$ and continuous on $[a,b]$, we can use the mean value theorem of differential calculus to find points $\xi_j \in (t_{j-1}, t_j)$ with
$$U(t_j) - U(t_{j-1}) = U'(\xi_j)(t_j - t_{j-1}) = u(\xi_j)(t_j - t_{j-1}), \qquad 1 \le j \le k.$$
Using $m_j = \inf u([t_{j-1}, t_j]) \le u(\xi_j) \le \sup u([t_{j-1}, t_j]) = M_j$ we can sum the above equality over $j = 1, \ldots, k$ and get
$$S^{\pi'}[u] - \epsilon \le S_{\pi'}[u] \le U(b) - U(a) \le S^{\pi'}[u] \le S_{\pi'}[u] + \epsilon.$$
By integrability, $S_{\pi'}[u] \le \int_a^b u\,dt \le S^{\pi'}[u]$, and this shows
$$\int_a^b u\,dt - \epsilon \le U(b) - U(a) \le \int_a^b u\,dt + \epsilon \qquad \forall\, \epsilon > 0$$
which proves our claim.

E.19 Remark There is not much room to improve the fundamental theorem E.18. On one hand, Example E.15(ii) shows that an integrable function need not have a primitive and E.15(iv) gives an example where $\int_a^x u\,dt$ exists, but is not a primitive in any interval; on the other hand, E.15(iii) provides an example of a function $u'$ which has a primitive $u$ but which is itself not Riemann integrable since it is unbounded. Volterra even constructed an example of a bounded but not Riemann integrable function with a primitive, see Sz.-Nagy [30, pp. 155–7]. To overcome this phenomenon was one of the motivations for Lebesgue when he introduced the Lebesgue integral. And, in fact, every bounded function $f$ on the interval $[a,b]$ with a primitive $F$ is Lebesgue integrable: indeed, since $F$ is continuous, it is measurable in the sense of Chapter 8 and so is the limit $f(x) = \lim_{n \to \infty} \big(F(x + \frac{1}{n}) - F(x)\big) / \frac{1}{n}$, cf. Corollary 8.9 – the finitely many points where the limit does not exist are a Lebesgue null set and pose no problem. Since $f$ is dominated by the (Lebesgue) integrable function $M\,\mathbf{1}_{[a,b]}$, $M := \sup |f|([a,b])$, we conclude that $f \in \mathcal{L}^1[a,b]$.

An immediate consequence of the integral as antiderivative are the following integration formulae which are easily proved by ‘integrating up’ the corresponding differentiation rules.

E.20 Theorem (Integration by parts) Let $u'$ and $v'$ be integrable functions on $[a,b]$ with primitives $u$ and $v$. Then $uv$ is a primitive of $u'v + uv'$ and, in particular,
$$\int_a^b u'(t) v(t)\,dt = u(b) v(b) - u(a) v(a) - \int_a^b u(t) v'(t)\,dt.$$

E.21 Theorem (Integration by substitution) Let $u \in \mathcal{R}[a,b]$ and assume that $\phi : [c,d] \to [a,b]$ is a strictly increasing differentiable function such that $\phi(c) = a$ and $\phi(d) = b$. If $(u \circ \phi) \cdot \phi' \in \mathcal{R}[c,d]$ and if $u$ has a primitive $U$, then $U \circ \phi$ is a primitive of $(u \circ \phi) \cdot \phi'$ as well as
$$\int_a^b u(t)\,dt = \int_{\phi^{-1}(a)}^{\phi^{-1}(b)} u(\phi(s))\, \phi'(s)\,ds = \int_c^d u(\phi(s))\, \phi'(s)\,ds.$$
E.22 Corollary (Bonnet’s mean value theorem³) Let $u, v \in \mathcal{R}[a,b]$ have primitives $U$ and $V$. If $u \le 0$ [resp. $u \ge 0$] and $U \ge 0$, then there exists some $\xi \in [a,b]$ such that
$$\int_a^b U(t) v(t)\,dt = U(a) \int_a^{\xi} v(t)\,dt \qquad \text{(E.6)}$$
$$\Big[\text{resp. } \int_a^b U(t) v(t)\,dt = U(b) \int_{\xi}^{b} v(t)\,dt \Big]. \qquad \text{(E.6′)}$$

³ Also known as the second mean value theorem of integral calculus.

Proof By subtracting a suitable constant from $V$ we may assume that $V(a) = 0$ and, by the fundamental theorem E.18, $V(x) = \int_a^x v(t)\,dt$. Integration by parts now shows
$$\int_a^b U(t) v(t)\,dt = U(b) V(b) - \int_a^b u(t) V(t)\,dt.$$
Since $u \le 0$ we get
$$\int_a^b U(t) v(t)\,dt \le U(b) V(b) - \sup V([a,b]) \int_a^b u(t)\,dt = U(b) V(b) - \sup V([a,b])\, \big(U(b) - U(a)\big)$$
$$= U(b)\, \big(V(b) - \sup V([a,b])\big) + \sup V([a,b])\, U(a) \le \sup V([a,b])\, U(a)$$
and a similar calculation yields the other inequality below:
$$\inf V([a,b])\, U(a) \le \int_a^b U(t) v(t)\,dt \le \sup V([a,b])\, U(a).$$
Applying the intermediate value theorem to the continuous function $V$ furnishes some $\xi \in [a,b]$ such that (E.6) holds.

Integrals and limits

One of the strengths of Lebesgue integration is the fact that we have fairly general theorems that allow interchanging pointwise limits and Lebesgue integrals. Similar results for the Riemann integral regularly require uniform convergence. Recall that a sequence of functions $(u_n(\cdot))_{n \in \mathbb{N}}$ on $[a,b]$ converges uniformly (in $x$) to $u$, if
$$\forall\, \epsilon > 0 \quad \exists\, N_\epsilon \in \mathbb{N} \quad \forall\, x \in [a,b] \quad \forall\, n \ge N_\epsilon : \quad |u_n(x) - u(x)| \le \epsilon.$$
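The basic convergence result for the Riemann integral is the following.

E.23 Theorem Let $(u_n)_{n \in \mathbb{N}} \subset \mathcal{R}[a,b]$ be a sequence which converges uniformly to a function $u$. Then $u \in \mathcal{R}[a,b]$ and
$$\lim_{n \to \infty} \int_a^b u_n\,dt = \int_a^b \lim_{n \to \infty} u_n\,dt = \int_a^b u\,dt.$$

Proof Let $\pi$ be a partition of $[a,b]$ and let $\epsilon > 0$ be given. Since $u_n \xrightarrow{n \to \infty} u$ uniformly, we can find some $N_\epsilon \in \mathbb{N}$ such that $|u(x) - u_n(x)| \le \epsilon / (b - a)$ uniformly in $x \in [a,b]$ for all $n \ge N_\epsilon$. Because of (E.1) we find for all $n \ge N_\epsilon$
$$S^{\pi}[u] - S_{\pi}[u] \le S^{\pi}[u - u_n] + S^{\pi}[u_n] - S_{\pi}[u_n] - S_{\pi}[u - u_n] \le 2\epsilon + S^{\pi}[u_n] - S_{\pi}[u_n],$$
thus
$$\int_a^{*b} u - \int_{*a}^{b} u \le 2\epsilon + S^{\pi}[u_n] - S_{\pi}[u_n] \qquad \forall\, n \ge N_\epsilon.$$
Fixing some $n_0 \ge N_\epsilon$ we can use that $u_{n_0}$ is integrable and choose $\pi$ in such a way that $S^{\pi}[u_{n_0}] - S_{\pi}[u_{n_0}] \le \epsilon$. This shows that
$$\int_a^{*b} u - \int_{*a}^{b} u \le 3\epsilon$$
and $u \in \mathcal{R}[a,b]$. Once $u$ is known to be integrable, we get for all $n \ge N_\epsilon$
$$\Big| \int_a^b (u - u_n)\,dt \Big| \le \int_a^b |u - u_n|\,dt \le (b - a) \cdot \frac{\epsilon}{b - a} = \epsilon \xrightarrow{\;\epsilon \to 0\;} 0.$$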
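To make the estimate in the proof of Theorem E.23 concrete, here is a small Python sketch. The example sequence $u_n(x) = \sqrt{x^2 + 1/n}$ on $[-1,1]$, which converges uniformly to $|x|$ with $\sup |u_n - u| = n^{-1/2}$, is an illustrative choice that does not appear in the text, and the integrals are approximated by midpoint sums (helper name `midpoint_integral` is hypothetical). The printed gap between the integrals stays below $(b - a) \sup |u_n - u|$.

```python
import math


def midpoint_integral(f, a, b, n=2_000):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))


def u_n(n):
    """u_n(x) = sqrt(x^2 + 1/n); converges uniformly to |x| on [-1, 1]."""
    return lambda x: math.sqrt(x * x + 1.0 / n)


a, b = -1.0, 1.0
limit_integral = midpoint_integral(abs, a, b)  # integral of |x| over [-1, 1]

for n in (10, 100, 1000, 10000):
    sup_distance = 1.0 / math.sqrt(n)  # sup |u_n - u|, attained at x = 0
    gap = abs(midpoint_integral(u_n(n), a, b) - limit_integral)
    # The gap is bounded by (b - a) * sup |u_n - u| and tends to 0.
    print(n, gap, (b - a) * sup_distance)
```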
We can now consider Riemann integrals which depend on a parameter.
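E.24 Theorem (Continuity theorem) Let $u : [a,b] \times \mathbb{R} \to \mathbb{R}$ be a continuous function. Then
$$w(y) := \int_a^b u(t, y)\,dt$$
is continuous for all $y \in \mathbb{R}$.

Proof Since $u(\cdot, y)$ is continuous, the above Riemann integral exists. Fix $y \in \mathbb{R}$ and consider any sequence $(y_n)_{n \in \mathbb{N}}$ with limit $y$. Without loss of generality we can assume that $(y_n)_{n \in \mathbb{N}} \subset I := [y - 1, y + 1]$. Since $[a,b] \times I$ is compact, $u|_{[a,b] \times I}$ is uniformly continuous, and we can find for all $\epsilon > 0$ some $\delta > 0$ such that
$$\sqrt{(t - \tau)^2 + (y - \eta)^2} < \delta \implies |u(t, y) - u(\tau, \eta)| < \epsilon.$$
As $y_n \xrightarrow{n \to \infty} y$, there is some $N_\epsilon \in \mathbb{N}$ with
$$|u(t, y_n) - u(t, y)| < \epsilon \qquad \forall\, t \in [a,b]\ \ \forall\, n \ge N_\epsilon,$$
i.e. $u(t, y_n) \xrightarrow{n \to \infty} u(t, y)$ uniformly in $t \in [a,b]$. Theorem E.23 and the continuity of $u(t, \cdot)$ therefore show
$$\lim_{n \to \infty} w(y_n) = \lim_{n \to \infty} \int_a^b u(t, y_n)\,dt = \int_a^b \lim_{n \to \infty} u(t, y_n)\,dt = \int_a^b u(t, y)\,dt = w(y)$$
which is but the continuity of $w$ at $y$.

E.25 Theorem (Differentiation theorem) Let $u : [a,b] \times \mathbb{R} \to \mathbb{R}$ be a continuous function with continuous partial derivative $\frac{\partial}{\partial y} u(t, y)$. Then
$$w(y) := \int_a^b u(t, y)\,dt$$
is continuously differentiable and
$$w'(y) = \frac{d}{dy} \int_a^b u(t, y)\,dt = \int_a^b \frac{\partial}{\partial y} u(t, y)\,dt.$$

Proof Since $u(\cdot, y)$ and $\frac{\partial}{\partial y} u(\cdot, y)$ are continuous, the above integrals exist. Fix $y \in \mathbb{R}$ and consider any sequence $(y_n)_{n \in \mathbb{N}}$ with limit $y$. Without loss of generality we can assume that $(y_n)_{n \in \mathbb{N}} \subset I := [y - 1, y + 1]$. We introduce the following auxiliary function
$$h(t, z) := u(t, z) - u(t, y) - \frac{\partial}{\partial y} u(t, y)\,(z - y).$$
Clearly, $h(t, y) = 0$ and $\frac{\partial}{\partial z} h(t, z) = \frac{\partial}{\partial z} u(t, z) - \frac{\partial}{\partial y} u(t, y)$ is continuous and uniformly continuous on $[a,b] \times I$, i.e. for all $\epsilon > 0$ there is some $\delta > 0$ such that
$$\sqrt{(t - \tau)^2 + (z - \zeta)^2} < \delta \implies \Big| \frac{\partial}{\partial z} h(t, z) - \frac{\partial}{\partial \zeta} h(\tau, \zeta) \Big| < \epsilon.$$
From the mean value theorem of differential calculus we infer that for some $\zeta$ between $z$ and $y$
$$|h(t, z)| = |h(t, z) - h(t, y)| = \Big| \frac{\partial}{\partial \zeta} h(t, \zeta) \Big| \cdot |z - y| = \Big| \frac{\partial}{\partial \zeta} h(t, \zeta) - \frac{\partial}{\partial y} h(t, y) \Big| \cdot |z - y| \le \epsilon\, |z - y|$$
whenever $z, y \in I$ and $|z - y| < \delta$. This shows that for some $N_\epsilon \in \mathbb{N}$
$$\Big| u(t, y_n) - u(t, y) - \frac{\partial}{\partial y} u(t, y)\,(y_n - y) \Big| \le \epsilon\, |y_n - y| \qquad \forall\, t \in [a,b]\ \ \forall\, n \ge N_\epsilon.$$
Theorem E.23 now shows that
$$w'(y) = \lim_{n \to \infty} \frac{w(y_n) - w(y)}{y_n - y} = \lim_{n \to \infty} \int_a^b \frac{u(t, y_n) - u(t, y)}{y_n - y}\,dt = \int_a^b \lim_{n \to \infty} \frac{u(t, y_n) - u(t, y)}{y_n - y}\,dt = \int_a^b \frac{\partial}{\partial y} u(t, y)\,dt.$$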
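The differentiation theorem E.25 can be illustrated numerically. The sketch below is a hedged example, not part of the text: it assumes the integrand $u(t, y) = \sin(ty)$ on $[a,b] = [0,1]$, approximates integrals by midpoint sums, and uses hypothetical helper names. It compares the integral of the partial derivative $t \cos(ty)$ with a central difference quotient of $w$ and with the closed form $w(y) = (1 - \cos y)/y$, whose derivative is $(y \sin y - (1 - \cos y))/y^2$.

```python
import math


def midpoint_integral(f, a, b, n=2_000):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))


def w(y):
    """w(y) = integral of sin(t * y) over t in [0, 1]."""
    return midpoint_integral(lambda t: math.sin(t * y), 0.0, 1.0)


def w_prime_by_theorem(y):
    """Integral of the partial derivative d/dy sin(t * y) = t * cos(t * y)."""
    return midpoint_integral(lambda t: t * math.cos(t * y), 0.0, 1.0)


def w_prime_by_difference(y, h=1e-5):
    """Central difference quotient of w at y."""
    return (w(y + h) - w(y - h)) / (2 * h)


def w_prime_closed_form(y):
    """Derivative of (1 - cos y) / y, valid for y != 0."""
    return (y * math.sin(y) - (1.0 - math.cos(y))) / (y * y)


y = 1.5
print(w_prime_by_theorem(y), w_prime_by_difference(y), w_prime_closed_form(y))
```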
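Improper Riemann integrals

Let us finally have a glance at various extensions of the Riemann integral to unbounded intervals and/or unbounded integrands. The following cases can occur:
A. the interval of integration is $[a, +\infty)$ or $(-\infty, b]$;
B. the interval of integration is $[a, b)$ or $(a, b]$, and the integrand $u(t)$ is unbounded as $t \uparrow b$ resp. $t \downarrow a$;
C. the interval of integration is $(a, b)$ with $-\infty \le a < b \le +\infty$ and the integrand may or may not be unbounded.

A. Improper Riemann integrals of the type $\int_a^{\infty} u\,dt$ or $\int_{-\infty}^{b} u\,dt$

E.26 Definition If $u \in \mathcal{R}[a,b]$ for all $b \in (a, \infty)$ [resp. $a \in (-\infty, b)$] and if the limit
$$\lim_{b \to \infty} \int_a^b u\,dt \qquad \Big[\text{resp. } \lim_{a \to -\infty} \int_a^b u\,dt \Big]$$
exists and is finite, we call $u$ improperly Riemann integrable and write $u \in \mathcal{R}[a, \infty)$ [resp. $u \in \mathcal{R}(-\infty, b]$]. The value of the above limit is called the (improper Riemann) integral and denoted by $\int_a^{\infty} u\,dt$ [resp. $\int_{-\infty}^{b} u\,dt$].

The typical examples of improper integrals of this kind are expressions of the type $\int_1^{\infty} t^{\lambda}\,dt$ if $\lambda < 0$. In fact, if $\lambda \neq -1$,
$$\int_1^{\infty} t^{\lambda}\,dt = \lim_{b \to \infty} \int_1^b t^{\lambda}\,dt = \lim_{b \to \infty} \frac{b^{\lambda + 1} - 1}{\lambda + 1} = \begin{cases} \dfrac{-1}{\lambda + 1} & \text{if } \lambda < -1 \\[1ex] \infty & \text{if } \lambda > -1 \end{cases}$$
and a similar calculation confirms that $\int_1^{\infty} t^{-1}\,dt = \infty$. Thus $t^{\lambda} \in \mathcal{R}[1, \infty)$ if, and only if, $\lambda < -1$.

From now on we will only consider integrals of the type $\int_a^{\infty} u\,dt$; the case of a finite upper and infinite lower limit is very similar. The following Cauchy criterion for improper integrals is quite useful.

E.27 Lemma $u \in \mathcal{R}[a, \infty)$ if, and only if, $u \in \mathcal{R}[a,b]$ for all $b \in (a, \infty)$ and $\lim_{x, y \to \infty} \int_x^y u\,dt = 0$ ($x, y \to \infty$ simultaneously).

Proof This is just Cauchy’s convergence criterion for $U(z) := \int_a^z u(t)\,dt$ as $z \to \infty$.

It is not hard to see that Lemma E.27 implies, in particular, that
• $\mathcal{R}[a, \infty)$ is a vector space, i.e. for all $\alpha, \beta \in \mathbb{R}$ and $u, w \in \mathcal{R}[a, \infty)$, $\int_a^{\infty} (\alpha u + \beta w)\,dt = \alpha \int_a^{\infty} u\,dt + \beta \int_a^{\infty} w\,dt$;
• $u \in \mathcal{R}[a, \infty)$ if, and only if, $\int_b^{\infty} u\,dt$ exists for all $b > a$.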
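The power-function example in this subsection can be tabulated directly. The following Python sketch is an illustration only (the function name `power_integral` and the sample exponents are assumptions): it evaluates the closed form $(b^{\lambda+1} - 1)/(\lambda + 1)$ of $\int_1^b t^{\lambda}\,dt$ for growing $b$, showing convergence to $-1/(\lambda + 1)$ when $\lambda < -1$ and unbounded growth when $\lambda > -1$.

```python
def power_integral(lam, b):
    """Closed form of the proper integral of t**lam over [1, b], lam != -1."""
    return (b ** (lam + 1) - 1.0) / (lam + 1)


for lam in (-2.0, -1.5, -0.5):
    values = [power_integral(lam, b) for b in (1e2, 1e4, 1e6)]
    # For lam < -1 the values approach -1 / (lam + 1); otherwise they grow.
    limit = -1.0 / (lam + 1) if lam < -1 else float("inf")
    print(lam, values, limit)
```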
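E.28 Corollary Let $u, w : [a, \infty) \to \mathbb{R}$ be two functions such that $|u| \le w$. If $w \in \mathcal{R}[a, \infty)$, and if $u \in \mathcal{R}[a,b]$ for all $b > a$, then $u, |u| \in \mathcal{R}[a, \infty)$. In particular, $|u| \in \mathcal{R}[a, \infty)$ implies that $u \in \mathcal{R}[a, \infty)$.

Proof For all $y > x > a$ we find using Theorem E.11 and Lemma E.27 that
$$\Big| \int_x^y u\,dt \Big| \le \int_x^y |u|\,dt \le \int_x^y w\,dt \xrightarrow{\;x, y \to \infty\;} 0$$
which shows, again by E.27, that $u, |u| \in \mathcal{R}[a, \infty)$.

Note that, unlike Lebesgue integrals, improper Riemann integrals are not absolute integrals since improper integrability of $u$ does NOT imply improper integrability of $|u|$, see e.g. Remark 11.11 where $\int_0^{\infty} \sin t / t\,dt$ is discussed. This means that the following convergence theorems for improper Riemann integrals are not necessarily covered by Lebesgue’s theory.

E.29 Theorem Let $(u_n)_{n \in \mathbb{N}} \subset \mathcal{R}[a, \infty)$. If for some $u : [a, \infty) \to \mathbb{R}$
• $u_n(t) \xrightarrow{n \to \infty} u(t)$ uniformly in $t \in [a,b]$ and for every $b > a$,
• $\lim_{b \to \infty} \int_a^b u_n\,dt$ exists uniformly for all $n \in \mathbb{N}$, i.e. for every $\epsilon > 0$ there is some $N_\epsilon \in \mathbb{N}$ such that
$$\sup_{n \in \mathbb{N}} \Big| \int_x^y u_n\,dt \Big| < \epsilon \qquad \forall\, y > x > N_\epsilon,$$
then $u \in \mathcal{R}[a, \infty)$ and
$$\lim_{n \to \infty} \int_a^{\infty} u_n\,dt = \int_a^{\infty} \lim_{n \to \infty} u_n\,dt = \int_a^{\infty} u\,dt.$$

Proof That $u \in \mathcal{R}[a,b]$ for all $b > a$ follows from Theorem E.23. Fix $\epsilon > 0$ and choose $N_\epsilon$ as in the above statement. For all $y > x > N_\epsilon$
$$\Big| \int_x^y u\,dt \Big| \le \Big| \int_x^y (u - u_n)\,dt \Big| + \Big| \int_x^y u_n\,dt \Big| \le (y - x) \sup_{t \in [x,y]} |u(t) - u_n(t)| + \epsilon$$
and as $n \to \infty$ we find $\big| \int_x^y u\,dt \big| \le \epsilon$ for all $y > x > N_\epsilon$, hence $u \in \mathcal{R}[a, \infty)$ by Lemma E.27.

In pretty much the same way as we derived Theorems E.24, E.25 from the basic convergence result E.23 we get now from E.29 the following continuity and differentiability theorems for improper integrals.

E.30 Theorem Let $I \subset \mathbb{R}$ be an open interval and $u : [a, \infty) \times I \to \mathbb{R}$ be continuous such that $u(\cdot, y) \in \mathcal{R}[a, \infty)$ for all $y \in I$ and
$$\lim_{b \to \infty} \int_a^b u(t, y)\,dt \quad \text{exists uniformly for all } y \in [c,d] \subset I.$$
Then $U(y) := \int_a^{\infty} u(t, y)\,dt$ is continuous for all $y \in (c,d)$.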
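Proof (sketch) Fix $y \in (c,d)$ and choose any sequence $(y_n)_{n \in \mathbb{N}} \subset (c,d)$ with limit $y$. By the assumptions $u_n(t) := u(t, y_n) \xrightarrow{n \to \infty} u(t, y)$ uniformly for all $t \in [a,b]$. Now the basic convergence theorem for improper integrals E.29 applies and shows $U(y_n) \xrightarrow{n \to \infty} U(y)$.

E.31 Theorem Let $I \subset \mathbb{R}$ be an open interval and $u : [a, \infty) \times I \to \mathbb{R}$ be continuous with continuous partial derivative $\frac{\partial}{\partial y} u(t, y)$. If $u(\cdot, y),\ \frac{\partial}{\partial y} u(\cdot, y) \in \mathcal{R}[a, \infty)$ for all $y \in I$, and if
$$\lim_{b \to \infty} \int_a^b u(t, y)\,dt \quad \text{and} \quad \lim_{b \to \infty} \int_a^b \frac{\partial}{\partial y} u(t, y)\,dt$$
exist uniformly for all $y \in [c,d] \subset I$, then $W(y) := \int_a^{\infty} u(t, y)\,dt$ exists and is differentiable on $(c,d)$ with derivative
$$W'(y) = \frac{d}{dy} \int_a^{\infty} u(t, y)\,dt = \int_a^{\infty} \frac{\partial}{\partial y} u(t, y)\,dt.$$

Proof (sketch) Set $U(x, y) := \int_a^x u(t, y)\,dt$. By Theorem E.25 $\frac{\partial}{\partial y} U(x, y)$ exists and equals $\int_a^x \frac{\partial}{\partial y} u(t, y)\,dt$. By assumption,
$$U(x, y) \xrightarrow{\;x \to \infty\;} \int_a^{\infty} u(t, y)\,dt \quad \text{pointwise for all } y \in (c,d),$$
$$\frac{\partial}{\partial y} U(x, y) \xrightarrow{\;x \to \infty\;} \int_a^{\infty} \frac{\partial}{\partial y} u(t, y)\,dt \quad \text{uniformly for all } y \in (c,d).$$
By a standard theorem on uniform convergence and differentiability, cf. Rudin [39, Theorem 7.17], we now conclude
$$\frac{d}{dy} \int_a^{\infty} u(t, y)\,dt = \int_a^{\infty} \frac{\partial}{\partial y} u(t, y)\,dt.$$

E.32 Theorem Let $u, w \in \mathcal{R}[a,b]$ for all $b \in (a, \infty)$ and assume that $u, w \ge 0$ and that $\lim_{x \to \infty} u(x) / w(x) = A > 0$ exists. Then $u \in \mathcal{R}[a, \infty)$ if, and only if, $w \in \mathcal{R}[a, \infty)$.

Proof By assumption we find for every $0 < \epsilon < A$ some $N_\epsilon \in \mathbb{N}$ such that
$$0 < A - \epsilon \le \frac{u(x)}{w(x)} \le A + \epsilon \qquad \forall\, x \ge N_\epsilon > a.$$
Thus $(A - \epsilon)\, w(x) \le u(x) \le (A + \epsilon)\, w(x)$ for all $x \ge N_\epsilon$. Thus, if $w \in \mathcal{R}[a, \infty)$, we get $(A + \epsilon)\, w \in \mathcal{R}[a, \infty)$ (cf. the remark following Lemma E.27) and, by Corollary E.28, $u \in \mathcal{R}[a, \infty)$. Similarly, if $u \in \mathcal{R}[a, \infty)$, we have $u / (A - \epsilon) \in \mathcal{R}[a, \infty)$ and, again by E.28, $w \in \mathcal{R}[a, \infty)$.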
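We will finally study the interplay of series and improper integrals.

E.33 Theorem Let $a = b_0 < b_1 < b_2 < \ldots$ be a strictly increasing sequence with $b_k \to \infty$.
(i) If $u \in \mathcal{R}[a, \infty)$, then $\sum_{k=1}^{\infty} \int_{b_{k-1}}^{b_k} u\,dt$ converges.
(ii) If $u \ge 0$ and $u \in \mathcal{R}[b_{k-1}, b_k]$ for all $k \in \mathbb{N}$, then the convergence of $\sum_{k=1}^{\infty} \int_{b_{k-1}}^{b_k} u\,dt$ implies $u \in \mathcal{R}[a, \infty)$.

Proof (i): Since $u \in \mathcal{R}[a, \infty)$,
$$\int_a^{\infty} u\,dt = \lim_{n \to \infty} \int_a^{b_n} u\,dt = \lim_{n \to \infty} \sum_{k=1}^{n} \int_{b_{k-1}}^{b_k} u\,dt = \sum_{k=1}^{\infty} \int_{b_{k-1}}^{b_k} u\,dt.$$

(ii): Define $S := \sum_{k=1}^{\infty} \int_{b_{k-1}}^{b_k} u\,dt$. Since $b_k$ increases to $\infty$, we find for all $b > a$ some $N \in \mathbb{N}$ such that $b_N > b$. Consequently,
$$\int_a^b u\,dt \le \int_a^{b_N} u\,dt = \sum_{k=1}^{N} \int_{b_{k-1}}^{b_k} u\,dt \le S$$
which shows that the limit $\lim_{b \to \infty} \int_a^b u\,dt = \sup_{b > a} \int_a^b u\,dt \le S$ exists.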
E.34 Theorem (Integral test for series) Let $u \in C[0, \infty)$, $u \ge 0$, be a decreasing function. Then
$$\int_0^{\infty} u\,dt \quad \text{and} \quad \sum_{k=0}^{\infty} u(k)$$
either both converge or diverge.

Proof Note that by Theorem E.8 $u \in \mathcal{R}[0, b]$ for all $b > 0$, so that the improper integral can be defined. Since $u$ is decreasing,
$$u(k+1) \le \int_k^{k+1} u(t)\,dt \le u(k),$$
cf. Theorem E.11, and summing these inequalities over $k = 0, 1, \ldots, N$ yields
$$\sum_{k=1}^{N+1} u(k) = \sum_{k=0}^{N} u(k+1) \le \int_0^{N+1} u(t)\,dt \le \sum_{k=0}^{N} u(k).$$
Since $u$ is positive and since the series has only positive terms, it is obvious that $\int_0^{\infty} u\,dt$ converges if, and only if, the series $\sum_{k=0}^{\infty} u(k)$ is finite.
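The two-sided estimate in the proof of the integral test is easy to verify for a concrete function. The Python sketch below assumes the illustrative choice $u(t) = (1 + t)^{-2}$, which is continuous, positive and decreasing on $[0, \infty)$ with $\int_0^b u\,dt = 1 - 1/(1 + b)$; the helper names are hypothetical. It prints the lower sum $\sum_{k=1}^{N+1} u(k)$, the integral over $[0, N+1]$ and the upper sum $\sum_{k=0}^{N} u(k)$.

```python
def u(t):
    """A continuous, positive, decreasing function on [0, infinity)."""
    return 1.0 / (1.0 + t) ** 2


def integral_of_u(b):
    """Closed form of the integral of u over [0, b]."""
    return 1.0 - 1.0 / (1.0 + b)


def sandwich(N):
    """Return (sum_{k=1}^{N+1} u(k), integral over [0, N+1], sum_{k=0}^{N} u(k))."""
    lower = sum(u(k) for k in range(1, N + 2))
    upper = sum(u(k) for k in range(0, N + 1))
    return lower, integral_of_u(N + 1), upper


for N in (1, 10, 100, 1000):
    # Each printed triple is increasing from left to right.
    print(N, sandwich(N))
```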
B. Improper Riemann integrals with unbounded integrands

E.35 Definition If $u \in \mathcal{R}[a, c]$ [resp. $u \in \mathcal{R}[c, b]$] for all $c \in (a, b)$ and if the limit
$$\lim_{c \uparrow b} \int_a^c u\,dt \qquad \Big[\text{resp. } \lim_{c \downarrow a} \int_c^b u\,dt \Big]$$
exists and is finite, we call $u$ improperly Riemann integrable and write $u \in \mathcal{R}[a, b)$ [resp. $u \in \mathcal{R}(a, b]$]. The value of the limit is called the (improper Riemann) integral and denoted by $\int_a^b u\,dt$.

Notice that the function $u$ in E.35 need not be bounded in $[a, b)$. If it is, the improper integral coincides with the ordinary Riemann integral.

E.36 Lemma If the function $u \in \mathcal{R}[a, b)$ [or $u \in \mathcal{R}(a, b]$] has an extension to $[a, b]$ which is bounded, then the extension is Riemann integrable over $[a, b]$, and proper and improper Riemann integrals coincide.

Proof We consider only $[a, b)$, since the other case is similar. Denote, for notational simplicity, the extension of $u$ again by $u$. Let $M := \sup |u|([a, b])$, fix $\epsilon > 0$ and pick $c < b$ with $b - c \le \epsilon / M$. Since $u \in \mathcal{R}[a, c]$, we can find a partition $\pi$ of $[a, c]$ such that $S^{\pi}[u] - S_{\pi}[u] \le \epsilon$. For the partition $\pi' := \pi \cup \{b\}$ of $[a, b]$ we get
$$\big| S^{\pi'}[u] - S^{\pi}[u] \big| = \big| \sup u([c, b]) \big|\,(b - c) \le M \cdot \frac{\epsilon}{M} = \epsilon$$
and
$$\big| S_{\pi'}[u] - S_{\pi}[u] \big| = \big| \inf u([c, b]) \big|\,(b - c) \le M \cdot \frac{\epsilon}{M} = \epsilon$$
which implies that $S^{\pi'}[u] - S_{\pi'}[u] \le 3\epsilon$ and $u \in \mathcal{R}[a, b]$ by Theorem E.5. The claim now follows from Lemma E.14.

Many of the results for improper integrals of the form $\int_a^{\infty} u\,dt$ resp. $\int_{-\infty}^{b} u\,dt$ carry over with minor notational changes to the case of half-open bounded intervals. Note, however, that in the convergence theorems some assertions involving uniform convergence are senseless in the presence of unbounded integrands. We leave the details to the reader.

The typical examples of improper integrals of this kind are expressions of the type $\int_0^1 t^{\lambda}\,dt$ if $\lambda < 0$. In fact, if $\lambda \neq -1$,
$$\int_0^1 t^{\lambda}\,dt = \lim_{\epsilon \to 0} \int_{\epsilon}^1 t^{\lambda}\,dt = \lim_{\epsilon \to 0} \frac{1 - \epsilon^{\lambda + 1}}{\lambda + 1} = \begin{cases} \dfrac{1}{\lambda + 1} & \text{if } \lambda > -1 \\[1ex] \infty & \text{if } \lambda < -1 \end{cases}$$
and a similar calculation confirms that $\int_0^1 t^{-1}\,dt = \infty$. Thus $t^{\lambda} \in \mathcal{R}(0, 1]$ if, and only if, $\lambda > -1$.
C. Improper Riemann integrals where both limits are critical

Assume now that the integration interval is $(a, b)$ and that both endpoints $a$ and $b$, $-\infty \le a < b \le +\infty$, are critical, i.e. that the integrand is unbounded at one or both endpoints and/or that one or both endpoints are infinite. Let $u \in \mathcal{R}(a, c] \cap \mathcal{R}[c, b)$ for some point $a < c < b$ and suppose that $d$ satisfies $c < d < b$. By the remark following Lemma E.27 and Theorem E.13 we find
$$\int_a^c u\,dt + \int_c^b u\,dt = \lim_{x \downarrow a} \int_x^c u\,dt + \lim_{y \uparrow b} \int_c^y u\,dt$$
$$= \lim_{x \downarrow a} \int_x^c u\,dt + \int_c^d u\,dt + \lim_{y \uparrow b} \int_d^y u\,dt$$
$$= \lim_{x \downarrow a} \int_x^d u\,dt + \lim_{y \uparrow b} \int_d^y u\,dt$$
$$= \int_a^d u\,dt + \int_d^b u\,dt$$
which shows that $u \in \mathcal{R}(a, d] \cap \mathcal{R}[d, b)$. Therefore, the following definition makes sense.

E.37 Definition Let $-\infty \le a < b \le +\infty$ and let $(a, b) \subset \mathbb{R}$ be a bounded or unbounded open interval. Then $u : (a, b) \to \mathbb{R}$ is said to be improperly integrable if for some (hence, all) $c \in (a, b)$ the function $u$ is improperly integrable both over $(a, c]$ and $[c, b)$, i.e. we define $\mathcal{R}(a, b) := \mathcal{R}(a, c] \cap \mathcal{R}[c, b)$. The (improper Riemann) integral is then given by
$$\int_a^b u\,dt := \int_a^c u\,dt + \int_c^b u\,dt = \lim_{x \downarrow a} \int_x^c u\,dt + \lim_{y \uparrow b} \int_c^y u\,dt.$$

The typical example of an improper integral of this kind is Euler’s Gamma function
$$\Gamma(x) := \int_0^{\infty} t^{x-1} e^{-t}\,dt, \qquad x > 0,$$
which is treated in Example 10.14 in the framework of Lebesgue theory, but the arguments are essentially similar. The Gamma function is a two-sided improper integral only for $0 < x < 1$, since for $x \ge 1$ it can be interpreted as a one-sided improper integral over $[0, \infty)$, cf. Lemma E.36.
Further reading
Measure theory is used in many mathematical disciplines. We have touched on a few of them in this book, and the purpose of this section is to point towards literature which treats these subjects in depth. The choice of books and topics is certainly not comprehensive. On the contrary, it is very personal, limited by my knowledge of the literature and, of course, my own mathematical taste. I decided to include only books which are in English and which I thought would be accessible to readers of the present text. Real analysis (in particular measure and integration theory for analysts) Bass, R. F., Probabilistic Techniques in Analysis, New York: Springer 1995. Dudley, R. M., Real Analysis and Probability (2nd edn), Cambridge: Cambridge University Press, Studies in Adv. Math. vol. 74, 2002. Hewitt, E. and K. R. Stromberg, Real and Abstract Analysis, New York: Springer, Grad. Texts in Math. vol. 25, 1975. Kolmogorov, A. N. and F. V. Fomin, Introductory Real Analysis, Mineola (NY): Dover, 1975. Lieb, E. H. and M. Loss, Analysis (2nd edn), Am. Mathematical Society, Grad. Studies in Math. vol. 14, Providence (RI) 2001. Rudin, W., Real and Complex Analysis (3rd edn), McGraw-Hill, New York 1987. Saks, S., Theory of the Integral (2nd revised edn), Hafner, Monografie Matematyczne Tom VII, New York 1937. [Reprinted by Dover, 1964. Free online edition in the Wirtualna Biblioteka Nauki: http://matwbn.icm.edu.pl/kstresc.php?tom=7&wyd=10] Stroock, D., A Concise Introduction to the Theory of Integration (3rd edn), Birkhäuser, Boston 1999. Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions, Oxford University Press, Univ. Texts in the Math. Sci., New York 1965. Wheeden, R. L. and A. Zygmund, Measure and Integral. An Introduction to Real Analysis, Marcel Dekker, Pure Appl. Math. vol. 43, New York 1977.
Functional analysis Bollobas, B., Linear Analysis. An Introductory Course (2nd edn), Cambridge University Press, Cambridge 1999. Hirsch, F. and G. Lacombe, Elements of Functional Analysis, Springer, Grad. Texts in Math. vol. 192, New York 1999. Kolmogorov, A. N. and F. V. Fomin, Introductory Real Analysis, Mineola (NY): Dover, 1975. Yosida, K., Functional Analysis (6th edn), Springer, Grundlehren math. Wiss. Bd. 123, Berlin 1980. Zaanen, A. C., Integration (completely revised edn. of An Introduction to the Theory of Integration), North-Holland, Amsterdam 1967.
Fourier series, harmonic analysis, orthonormal systems, wavelets Alexits, G., Convergence Problems of Orthogonal Series, Pergamon, Int. Ser. Monogr. Pure Appl. Math. vol. 20, Oxford 1961. Andrews, G. E., Askey, R. and R. Roy, Special Functions, Cambridge University Press, Encycl. Math. Appl. vol. 71, Cambridge 1999. Garsia, A. M., Topics in Almost Everywhere Convergence, Markham, Chicago 1970. Helson, H., Harmonic Analysis, Addison-Wesley, London, 1983. Kahane, J.-P., Some Random Series of Functions (2nd edn), Cambridge University Press, Stud. Adv. Math. vol. 5, Cambridge 1985. Krantz, S. G., A Panorama of Harmonic Analysis, Mathematical Association of America, Carus Math. Monogr. vol. 27, Washington 1999. Pinsky, M. A., Introduction to Fourier Analysis and Wavelets, Brooks/Cole, Ser. Adv. Math., Pacific Grove (CA) 2002. Schipp, F., Wade, W. R. and P. Simon, Walsh Series. An Introduction to Dyadic Harmonic Analysis, Adam Hilger, Bristol 1990. Stein, E. M., Singular Integrals and Differentiability Properties of Functions, Princeton University Press, Math. Ser. vol. 30, Princeton (NJ) 1970. Stein, E. M. and R. Shakarchi, Fourier Analysis: An Introduction, Princeton University Press, Princeton (NJ) 2003. Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions, Oxford University Press, Univ. Texts in the Math. Sci., New York 1965. Wojtaszczyk, P., A Mathematical Introduction to Wavelets, Cambridge University Press, London Math. Society Student Texts vol. 37, Cambridge 1997. Zygmund, A., Trigonometric Series (2nd edn), Cambridge University Press, Cambridge 1959. [Almost unaltered softcover editions: Cambridge: Cambridge University Press, 1969, 1988 and 2003.]
Geometric measure theory, Hausdorff measure, fine properties of functions Evans, L. C. and R. F. Gariepy, Measure Theory and Fine Properties of Functions, CRC Press, Boca Raton (FL) 1992. Mattila, P., Geometry of Sets and Measures in Euclidean Spaces, Cambridge University Press, Studies in Adv. Math. vol. 44, Cambridge 1995. Morgan, F., Geometric Measure Theory: A Beginner’s Guide (3rd edn), Academic Press, San Diego, 2000. Rogers, C. A., Hausdorff Measures, Cambridge University Press, Cambridge Math. Library, Cambridge 1970. Ziemer, W. P., Weakly Differentiable Functions, Springer, Grad. Texts in Math. vol. 120, New York 1989. Topological measure theory, functional analytic aspects of integration and measure Bauer, H., Measure and Integration Theory, de Gruyter, Studies in Math. vol. 26, Berlin 2001. Choquet, G., Lectures on Analysis. vol. 1: Integration and Topological Vector Spaces, W. A. Benjamin, New York 1969. Dieudonné, J., Treatise on Analysis, vol. II, Academic Press, Pure Appl. Math. vol. 10-II, New York 1969. Hewitt, E. and K. A. Ross, Abstract Harmonic Analysis, vol. 1, Springer, Grundlehren math. Wiss. Bd. 115, Berlin 1963. Malliavin, P., Integration and Probability, Springer, Grad. Texts in Math. 157, New York 1995. Oxtoby, J. C., Measure and Category (2nd edn), Springer, Grad. Texts Math. vol. 2, New York 1980. Weir, A. J., General Integration and Measure, Cambridge University Press, Cambridge 1974. Borel and analytic sets Rogers, C. A. et al., Analytic Sets, Academic Press, London 1980. Srivastava, S. M., A Course on Borel Sets, Springer, Grad. Texts Math. vol. 180, New York 1998. Probability theory (in particular probabilistic measure theory) Ash, R. B. and C. A. Doléans-Dade, Probability and Measure Theory (2nd edn), Academic Press, San Diego (CA) 2000. Billingsley, P., Probability and Measure (3rd edn), Wiley, Ser. Probab. Math. Stat., New York 1995. Chow, Y. S. and H. Teicher, Probability Theory. 
Independence, Interchangeability, Martingales (3rd edn), Springer, Texts in Stat., New York 1997.
Durrett, R., Probability: Theory and Examples (3rd edn), Thomson Brooks/Cole, Duxbury Adv. Studies, Belmont (CA) 2004. Kallenberg, O., Foundations of Modern Probability, Springer, New York 2001. Malliavin, P., Integration and Probability, Springer, Grad. Texts in Math. 157, New York 1995. Neveu, J., Mathematical Foundations of the Calculus of Probability, Holden Day, San Francisco (CA) 1965. Stromberg, K., Probability for Analysts, Chapman and Hall, Probab. Ser., New York 1994. Martingales and their applications Ash, R. B. and C. A. Doléans-Dade, Probability and Measure Theory (2nd edn), Academic Press, San Diego (CA) 2000. Chow, Y. S. and H. Teicher, Probability Theory. Independence, Interchangeability, Martingales (3rd edn), Springer, Texts in Stat., New York 1997. Dellacherie, C. and P. A. Meyer, Probabilities and Potential Pt. B: Theory of Martingales, North Holland, Math. Studies, Amsterdam 1982. [Note that Probabilities and Potential Pt. A, Amsterdam 1979, by the same authors is a prerequisite for this text.] Garsia, A. M., Topics in Almost Everywhere Convergence, Markham, Chicago 1970. Meyer, P. A., Probabilities and Potentials, Blaisdell, London 1966. Neveu, J., Discrete-parameter Martingales, North Holland, Math. Libr. vol. 10, Amsterdam 1975. Rogers, L. C. G. and D. Williams, Diffusions, Markov Processes and Martingales (2 vols., 2nd edn), Cambridge Math. Library, Cambridge 2000.
References
[1] Alexits, G., Convergence Problems of Orthogonal Series, Oxford: Pergamon, Int. Ser. Monogr. Pure Appl. Math. vol. 20, 1961. [2] Andrews, G. E., Askey, R. and R. Roy, Special Functions, Cambridge: Cambridge University Press, Encycl. Math. Appl. vol. 71, 1999. [3] Bass, R. F., Probabilistic Techniques in Analysis, New York: Springer, 1995. [4] Bauer, H., Approximation and abstract boundaries, Am. Math. Monthly 85 (1978), 632–647. Also in: H. Bauer, Selecta, Berlin: de Gruyter, 2003, 436–451. [5] Bauer, H., Probability Theory, Berlin: de Gruyter, Studies in Math. vol. 23, 1996. [6] Bauer, H., Measure and Integration Theory, Berlin: de Gruyter, Studies in Math. vol. 26, 2001. [7] Benyamini, Y. and J. Lindenstrauss, Geometric Nonlinear Functional Analysis, vol. 1, Providence (RI): Am. Math. Soc., Coll. Publ. vol. 48, 2000. [8] Boas, R. P., A Primer of Real Functions, Math. Association of America, Carus Math. Monogr. vol. 13, 1960. [9] Carathéodory, C., Über das lineare Maß von Punktmengen – eine Verallgemeinerung des Längenbegriffs, Nachr. Kgl. Ges. Wiss. Göttingen Math.-Phys. Kl. (1914), 404–426. Also in: C. Carathéodory, Gesammelte mathematische Schriften (5 Bde.), München: C.H. Beck, 1954-57, Bd. 4, 249–275. [10] Ciesielski, Z., Hölder condition for realizations of Gaussian processes, Trans. Am. Math. Soc. 99 (1961), 403–413. [11] Diestel, J. and J. J. Uhl Jr., Vector Measures, Providence (RI): American Mathematical Society, Math. Surveys no. 15, 1977. [12] Dieudonné, J., Sur un théorème de Jessen, Fundam. Math. 37 (1950), 242–248. Also in: J. Dieudonné, Choix d’œuvres mathématiques (2 tomes), Paris: Hermann, 1981, t. 1, 369–275. [13] Doob, J. L., Stochastic Processes, New York: Wiley, Ser. Probab. Math. Stat., 1953. [14] Dudley, R. M., Real Analysis and Probability, Pacific Grove (CA): Wadsworth & Brooks/Cole, Math. Ser., 1989. [15] Dunford, N. and J. T. Schwartz, Linear Operators I, New York: Pure Appl. Math. vol. 7, Interscience, 1957. [16] Garsia, A. M., Topics in Almost Everywhere Convergence, Chicago: Markham, 1970. [17] Gradshteyn, I. and I. Ryzhik, Tables of Integrals, Series, and Products (4th corrected and enlarged edn), San Diego (CA): Academic Press, 1992.
[18] Gundy, R. F., Martingale theory and pointwise convergence of certain orthogonal series, Trans. Am. Math. Soc. 124 (1966), 228–248.
[19] Hausdorff, F., Grundzüge der Mengenlehre, Leipzig: Veit & Comp., 1914 (1st edn). Reprint of the original edn, New York: Chelsea, 1949.
[20] Hewitt, E. and K. R. Stromberg, Real and Abstract Analysis, New York: Springer, Grad. Texts Math. vol. 25, 1975.
[21] Hunt, G. A., Martingales et processus de Markov, Paris: Dunod, Monogr. Soc. Math. France t. 1, 1966.
[22] Kaczmarz, S. and H. Steinhaus, Theorie der Orthogonalreihen (2nd corr. reprint), New York: Chelsea, 1951. First edition appeared under the same title with PWN, Warsaw: Monogr. Mat. Warszawa vol. VI, 1935.
[23] Kahane, J.-P., Some Random Series of Functions (2nd edn), Cambridge: Cambridge University Press, Stud. Adv. Math. vol. 5, 1985.
[24] Korovkin, P. P., Linear Operators and Approximation Theory, Delhi: Hindustan Publ. Corp., 1960.
[25] Krantz, S. G., A Panorama of Harmonic Analysis, Washington: Mathematical Association of America, Carus Math. Monogr. vol. 27, 1999.
[26] Lévy, P., Processus stochastiques et mouvement Brownien, Paris: Gauthier-Villars, Monographies des Probabilités Fasc. VI, 1948.
[27] Lindenstrauss, J. and L. Tzafriri, Classical Banach Spaces I, II, Berlin: Springer, Ergeb. Math. Grenzgeb. 2. Ser. Bde. 92, 97, 1977–79.
[28] Marcinkiewicz, J. and A. Zygmund, Sur les fonctions indépendantes, Fundam. Math. 29 (1937), 309–335. Also in: J. Marcinkiewicz, Collected Papers, Warsaw: PWN, 1964, 233–259.
[29] Métivier, M., Semimartingales. A Course on Stochastic Processes, Berlin: de Gruyter, Stud. Math. vol. 2, 1982.
[30] Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions, New York: Oxford University Press, Univ. Texts in the Math. Sci., 1965.
[31] Neveu, J., Discrete-parameter Martingales, Amsterdam: North Holland, Math. Libr. vol. 10, 1975. Slightly updated version of the French original: Martingales à temps discret, Paris: Masson, 1972.
[32] Olevskiĭ, A. M., Fourier Series with Respect to General Orthogonal Systems, Berlin: Springer, Ergeb. Math. Grenzgeb. 2. Ser. Bd. 86, 1975.
[33] Oxtoby, J. C., Measure and Category (2nd edn), New York: Springer, Grad. Texts Math. vol. 2, 1980.
[34] Paley, R. E. A. C. and N. Wiener, Fourier Transforms in the Complex Domain, Providence (RI): American Mathematical Society, Coll. Publ. vol. 19, 1934.
[35] Pinsky, M. A., Introduction to Fourier Analysis and Wavelets, Pacific Grove (CA): Brooks/Cole, Ser. Adv. Math., 2002.
[36] Pratt, J. W., On interchanging limits and integrals, Ann. Math. Stat. 31 (1960), 74–77. [Acknowledgement of Priority, Ann. Math. Stat. 37 (1966), 1407.]
[37] Riemann, B., Über die Darstellbarkeit einer Function durch eine trigonometrische Reihe, Nachr. Kgl. Ges. Wiss. Göttingen 13 (1867), 227–271. Also in: Bernhard Riemann, Collected Papers, Berlin: Springer, 1990, 259–303.
[38] Rogers, L. C. G. and D. Williams, Diffusions, Markov Processes and Martingales (2 vols., 2nd edn), Cambridge: Cambridge Mathematical Library, 2000.
[39] Rudin, W., Principles of Mathematical Analysis (3rd edn), New York: McGraw-Hill, 1976.
[40] Rudin, W., Real and Complex Analysis (3rd edn), New York: McGraw-Hill, 1987.
[41] Schipp, F., Wade, W. R. and P. Simon, Walsh Series. An Introduction to Dyadic Harmonic Analysis, Bristol: Adam Hilger, 1990.
[42] Souslin, M. Y., Sur une définition des ensembles mesurables B sans nombres transfinis, C. R. Acad. Sci. Paris 164 (1917), 88–91.
[43] Srivastava, S. M., A Course on Borel Sets, New York: Springer, Grad. Texts Math. vol. 180, 1998.
[44] Solovay, R. M., A model of set theory in which every set of reals is Lebesgue measurable, Ann. Math. 92 (1970), 1–56.
[45] Steele, J. M., Stochastic Calculus and Financial Applications, New York: Springer, Appl. Math. vol. 45, 2000.
[46] Steen, L. A. and J. A. Seebach, Counterexamples in Topology, New York: Dover, 1995.
[47] Stein, E. M., Singular Integrals and Differentiability Properties of Functions, Princeton (NJ): Princeton University Press, Math. Ser. vol. 30, 1970.
[48] Strichartz, R. S., The Way of Analysis (rev. edn), Sudbury (MA): Jones and Bartlett, 2000.
[49] Stromberg, K., The Banach–Tarski paradox, Am. Math. Monthly 86 (1979), 151–161.
[50] Stroock, D. W., A Concise Introduction to the Theory of Integration (3rd edn), Boston: Birkhäuser, 1999.
[51] Szegő, G., Orthogonal Polynomials, Providence (RI): Am. Math. Soc., Coll. Publ. vol. 23, 1939.
[52] Wagon, S., The Banach–Tarski Paradox, Cambridge: Cambridge University Press, Encycl. Math. Appl. vol. 24, 1985.
[53] Wheeden, R. L. and A. Zygmund, Measure and Integral. An Introduction to Real Analysis, New York: Marcel Dekker, Pure Appl. Math. vol. 43, 1977.
[54] Willard, S., General Topology, Reading (MA): Addison-Wesley, 1970.
[55] Yosida, K., Functional Analysis (6th edn), Berlin: Springer, Grundlehren Math. Wiss. Bd. 123, 1980.
[56] Young, W. H., On semi-integrals and oscillating successions of functions, Proc. London Math. Soc. (2) 9 (1910/11), 286–324.
Notation index
This is intended to aid cross-referencing, so notation that is specific to a single section is generally not listed. Some symbols are used locally, without ambiguity, in senses other than those given below. Numbers following entries are page numbers with the occasional (Pr m.n) referring to Problem m.n on the respective page. Unless otherwise stated, binary operations between functions such as f ± g, f · g, f ∧ g, f ∨ g, comparisons f ≤ g, f < g or limiting relations f_j → f (as j → ∞), lim_j f_j, lim inf_j f_j, lim sup_j f_j, sup_i f_i or inf_i f_i are always understood pointwise. Alternatives are indicated by square brackets, i.e., ‘if A [B] … then P [Q]’ should be read as ‘if A … then P’ and ‘if B … then Q’.

Abbreviations and shorthand notation
a.a.       almost all, 80
a.e.       almost every(where), 80
ONB        orthonormal basis, 239
ONS        orthonormal system, 239
UI         uniformly integrable, 163, 194
w.r.t.     with respect to
negative   always in the sense ≤ 0
positive   always in the sense ≥ 0
∪-stable   stable under finite unions
∩-stable   stable under finite intersections, 32
(end-of-proof mark)   end of proof, x
[ ]        indicates that a small intermediate step is required, x
(in the margin)   caution, x
Special labels, defining properties
(Δ1), (Δ2), (Δ3)   Dynkin system, 31
(M1), (M2)         measure, 22
(S1), (S2), (S3)   semi-ring, 37
(Σ1), (Σ2), (Σ3)   σ-algebra, 15
Mathematical symbols

Sub- and superscripts
+    positive part, positive elements
⊥    orthogonal complement, 235
b    bounded
c    compact support
Symbols, binary operations
∀    for all, for every
∃    there exists, there is
#    cardinality, 7
→    converges to
→ (with μ over the arrow)   convergence in measure, 163
↑    increases to
↓    decreases to
:=   defining equality
= (with ‘def’ over the sign)   equal by definition
≡    identically equal
∨, [f ∨ g]   maximum [of f and g], 64
∧, [f ∧ g]   minimum [of f and g], 64
≪    absolutely continuous, 202
⊥    - measures: singular, 209; - Hilbert space: orthogonal, 231, 235
∗    convolution, 137
⊕    direct sum, 236
×    - Cartesian product of sets; - Cartesian product of σ-algebras, 121; - product of measures, 125
⊗    product of σ-algebras, 121
Set operations
∅          empty set
A ∪ B      union, 5
A ⊍ B      union of disjoint sets, 3, 5
A ∩ B      intersection, 5
A \ B      set-theoretic difference, 5
A^c        complement of A, 5
A △ B      symmetric difference, 13 (Pr 2.2)
A ⊂ B      subset, 5
A ⊊ B      proper subset, 5
Ā          closure of A, 320
A°         open interior of A, 320
A_j ↑ A    24
A_j ↓ A    24
A × B      Cartesian product
A^n        n-fold Cartesian product
A^ℕ        infinite sequences with values in A
#A         cardinality of A, 7
#A ≤ #B, #A < #B   7
t·A        = {ta : a ∈ A}, 36 (Pr 5.8)
x + A      = {x + a : a ∈ A}, 28
E ∩ 𝒜      = {E ∩ A : A ∈ 𝒜}, 16
(a, b), [a, b], [a, b), (a, b]   open, closed, half-open intervals
[[a, b)), ((a, b))   rectangles in ℝ^n, 18
{u ≥ v}, {u = v}, {u ∈ B} etc.   57

Functions, norms, measures & integrals
f(A)       = {f(x) : x ∈ A}, 6
f^{-1}(𝒜)  = {f^{-1}(B) : B ∈ 𝒜}, 16
f ∘ g      composition: (f ∘ g)(x) = f(g(x))
f^+        = f ∨ 0 positive part, 61
f^−        = −(f ∧ 0) negative part, 61
1_A        indicator function of A: 1_A(x) = 1 if x ∈ A, 1_A(x) = 0 if x ∉ A
sgn        sign function: sgn(x) = 1 if x > 0, sgn(x) = 0 if x = 0, sgn(x) = −1 if x < 0
•          maximum-norm in ℝ^n and ℝ^{n×n}, 142
‖•‖_p      L^p-norm, 105, 108
‖•‖_∞      L^∞-norm, 116
⟨•, •⟩     scalar product, 228
μ̄
completion of the measure , 29 (Pr 4.13) restriction of the measure to the family of sets restriction of the measure X to the canonical -algebra on X T −1 image measure, 51 - limj→ convergence in measure, 163 T image measure, 51 u· measure with density, 79–80 ux d x 69, 76 u d , ux dx, u d = 1 u d , 79 A A
, X,
∫ u dx, ∫ u(x) dx   77
∫ u dT(μ) = ∫ u ∘ T dμ   134
∫_a^b u(x) dx, (R)∫_a^b u(x) dx   Riemann integral, 93, 339

Other notation in alphabetical order
ℵ0 ∗ j j , − ,
cardinality of , 7 completion, 29 (Pr 4.13) filtration, 176 =
i i ∈ I, 177, 203 = ∈− , 193 185
Br x
open ball with radius r and centre x, 17, 323 Borel sets in A, 20 (Pr 3.10) Borel sets in , 226 Borel sets in n , 17 completion of the Borel sets, 132 (Pr 13.11), 144, 330 ¯ 58 Borel sets in ,
A n n ∗ n ¯ ¯ CU Cc U C U
x det Dx d d
E , E• E
cardinality of 0 1, 11 complex numbers continuous functions f U → continuous functions f U → with compact support functions f U → differentiable arbitrarily often Dynkin system generated by , 31 unit mass at x, Dirac measure at x, 26 determinant (of a matrix) Jacobian, 147 Radon-Nikodým derivative, 203 conditional expectation, 250, 263 = E conditional expectation, 260, 263 simple functions, 60
GLn
invertible n×n -matrices
id
identity map or matrix
n n half-open rectangles in n , 18 n …with rational endpoints, rat rat 18 o on o n open rectangles in n , 18 on o rat rat …with rational endpoints, 18
or
n 1 p
1 , 1¯
1
1 -lim∈I Lp
p , Lp
p
p -limj→
, L L lim inf j aj lim supj aj lim inf j Aj lim supj Aj
Lebesgue measure in n , 27 79 113 76 227 203 108 228 105 109 116 260 = supk inf jk aj , 313 = inf sup a , 313 k jk j = A , 316 k∈ jk j = k∈ jk Aj , 316
, ¯ M
59 258
ℕ      natural numbers: 1, 2, 3, …
ℕ_0    positive integers: 0, 1, 2, …
𝒩_μ    μ-null sets, 29 (Pr 4.10), 80
n X n , n
volume of the unit ball in n , 156 topology, open sets, 17 topology, open sets in n , 17
PC , PF X
(orthogonal) projection, 235 all subsets of X, 12
ℚ      rational numbers
¯
real numbers extended real line − +, 58 n Euclidean n-space nx , ny 147 n×n real n × n-matrices a b 339 a , − b 354 a b, a b 358 a b, − 359 ,
stopping times, 185 -algebra generated by , 16
T Ti i ∈ I -algebra generated by the map(s) T , resp., Ti , 51 span all finite linear combinations of the elements in , 239 supp f = f = 0 support of f , x
stopping times, 185 shift x y = y − x, 49
X measurable space, 22 X measure space, 22 X j , X filtered measure space, 176, 203
ℤ      integers: 0, ±1, ±2, …
Name and subject index
This should be used in conjunction with the Bibliography and the Index of Notation. Numbers following entries are page numbers which, if accompanied by (Pr n.m), refer to Problem n.m on that page; a number with a trailing ‘n’ indicates that a footnote is being referenced. Unless otherwise stated, ‘integral’, ‘integrability’ etc. always refer to the (abstract) Lebesgue integral. Within the index we use ‘L-…’ and ‘R-…’ as a shorthand for ‘(abstract) Lebesgue-…’ and ‘Riemann-…’.
ℵ₀ aleph null, 7 absolutely continuous, 202 uniformly absolutely continuous, 169 Alexits, György, 277n, 302 almost all (a.a.), 80 almost everywhere (a.e.), 80 analytic set, 333 Andrews, George, 277n arc-length, 160–161 (Pr 15.6) Askey, Richard, 277n atom, 20 (Pr 3.5), 46 (Pr 6.5) axiom of choice, 331 Banach, Stefan, 43 Banach space, 326 Banach–Tarski paradox, 43 basic convergence result for improper R-integrals, 355 for R-integrals, 351 basis, 242 unconditional basis, 293–295 Bass, Richard, 311 Bauer, Heinz, 159, 281, 310 Benyamini, Yoav, 210 Bernoulli distribution, 183 Bernstein, Sergeĭ N., 279
Bernstein polynomials, 280 bijective map, 6 Boas, Ralph, 114 Borel, Émile Borel measurable, 17 Borel set, 17 Borel σ-algebra, 17 alternative definition, 21 (Pr 3.12) cardinality, 332 completion, 330 generator of, 18, 19 in a subset, 20 (Pr 3.10) in ℝ̄, 58 Brownian motion, 309–311 continuum, 11 Calderón, Alberto Calderón–Zygmund decomposition, 221 Cantor, Georg, 11 Cantor’s diagonal method, 11 Cantor discontinuum, 55 (Pr 7.10), 223–224 (Pr 19.10) Cantor function, 224 (Pr 19.10) Cantor (ternary) set, 55 (Pr 7.10), 223–224 (Pr 19.10) Carathéodory, Constantin, 37
cardinality, 7 of the Borel σ-algebra, 332 of the Lebesgue σ-algebra, 330 Carleson, Lennart, 289 Cartesian product rules for Cartesian products, 121 Cauchy sequence in ℒ^p, 109 in metric spaces, 325 in normed spaces, 234 Cavalieri’s principle, 120 Cesàro mean, 286 change of variable formula for Lebesgue integrals, 151 for Riemann integrals, 350 for Stieltjes integrals, 133 (Pr 13.13) Chebyshev, Pafnuti L., 85 (Pr 10.5) Chebyshev polynomials (first kind), 277 Ciesielski, Z., 311 closed ball, 323 compactness (weak sequential), 169 in ℒ^1, 168 in ℒ^p, 168, 274 (Pr 23.8) and uniform integrability, 169 completeness of ℒ^p, 1 ≤ p < ∞, 110 of ℒ^∞, 116 in normed spaces, 234 completion, 29 (Pr 4.13) and Hölder maps, 146 and inner measure, 86 (Pr 10.12) and inner/outer regularity, 160 (Pr 15.6) integration w.r.t. complete measures, 86 (Pr 10.11) of metric spaces, 325 and outer measure, 46 (Pr 6.2), 86 (Pr 10.12) and product measures, 132 (Pr 13.11) and submartingales, 187 (Pr 16.3) complexification, 232 conditional conditional Beppo Levi Theorem, 264 conditional dominated convergence Theorem, 266 conditional Fatou’s Lemma, 265 conditional Jensen inequality, 266 conditional monotone convergence property, 259 conditional probability, 257 (Pr 22.3)
conditional expectation in L^p and L^∞, 260 in L^1, 263–264 in L^2, 250 properties (in L^2), 251 properties (in L^p), 261–262 via Radon-Nikodým Theorem, 223 (Pr 19.3) conjugate numbers (also conj. indices), 105 conjugate Young functions, 117 (Pr 12.5) continuity implies measurability, 50 of measures at ∅, 24 of measures from above, 24 of measures from below, 24 in metric spaces, 324 in topological spaces, 321 continuous function is measurable, 50 is Riemann integrable, 342 continuous linear functional in Hilbert space, 238 representation of continuous linear functionals, 239 convergence along an upwards filtering set, 203 criteria for a.e. convergence, 173 (Pr 16.1, 16.2) in ℒ^p, 109 in ℒ^p implies in measure, 164 in measure, 163 criterion, 174 (Pr 16.10) is metrizable, 174 (Pr 16.8) no unique limit, 173 (Pr 16.6) weaker than pointwise, 164 in metric spaces, 323–324 in normed spaces, 234 pointwise implies in measure, 164 pointwise vs. ℒ^p, 109 in probability, 163n of series of random variables, 201 (Pr 16.9) in topological spaces, 322 uniform convergence, 351 convex function, 114–115, 172n convex set, 235 convolution formula for integrals, 138 of a function and a measure, 137 of functions, 137
as image measure, 137 of measures, 137 cosine law, 231 countable set, 7 counting measure, 27 Darboux, Gaston, 337 Darboux sum, 93, 338 de la Vallée Poussin, Charles de la Vallée Poussin’s condition, 169 de Morgan’s identities, 5, 6 dense subset, 320 of C, 279, 287 in Hilbert space, 238 of ℒ^p, 157, 159, 139 of L^2, 282 density (function), 80, 202 derivative of a measure, 152, 219 of a measure singular to λ^n, 220 of a monotone function, 225 (Pr 19.17) Radon-Nikodým derivative, 203 of a series of monotone functions, 225 (Pr 19.18) diagonal method, 11 Diestel, Joseph, 210 Dieudonné, Jean, 205 Dieudonné’s condition, 169 diffeomorphism, 147 diffuse measure, 46 (Pr 6.5) Dirac measure, 26 direct sum, 236 Dirichlet (Lejeune-D.), Gustav Dirichlet’s jump function, 88 not Riemann integrable, 342 Dirichlet kernel, 286 disjoint union, 3, 5 distribution distribution function, 128 of a random variable, 52 Doob, Joseph, 176, 213 Doob decomposition, 275 (Pr 23.11) Doob’s upcrossing estimate, 191 Dudley, Richard, 336 Dunford, Nelson, 168 Dunford–Pettis condition, 169 dyadic interval, 179 dyadic square, 179 Dynkin system, 31 conditions to be σ-algebra, 32
generated by a family, 31 minimal Dynkin system, 31 not σ-algebra, 36 (Pr 5.2) enumeration, 7 equi-integrable, see uniformly integrable exhausting sequence, 22 factorization lemma, 56 (Pr 7.11), 64 Faltung, see convolution Fatou, Pierre, 73 Fejér, Lipót, 285, 286 Fejér kernel, 286 filtration, 176, 203 dyadic filtration, 213, 221, 268, 295, 302 finite additivity, 23 Fischer, Ernst, 110 Fourier, Jean Baptiste Fourier coefficients, 285 Fourier series, a.e.-convergence, 288 Fourier series, Kolmogorov’s example of a nowhere convergent Fourier series, 288 Fourier series, L^p-convergence, 288 Fréchet, Maurice, 232 (Pr 20.2) Fresnel integral, 103 (Pr 11.19) Friedrichs mollifier, 141 (Pr 14.10) Frullani integral, 103 (Pr 11.20) F_σ set, 142, 159 (Pr 15.1) Fubini, Guido, 125 function absolutely continuous function, 223 (Pr 19.9) concave function, 114–115 convex function, 114–115, 172n distribution function, see distribution function independent function, see independent functions indicator function, see indicator function integrable function, 76 measurable function, see measurable function(s) moment generating function, 102 (Pr 11.15) monotone function, see monotone function negative part of, 61 numerical function, 58
positive part of, 61 Riemann integrable function, 339 simple function, see simple functions Γ (Gamma) function, 99, 161 (Pr 15.8) Garsia, Adriano, 218, 309 Gaussian distribution, 152, 310 G_δ set, 142 generator of the Borel σ-algebra, 18, 19 of a Dynkin system, 31 of a σ-algebra, 16 Gradshteyn, Izrail S., 277n, 284 Gram-Schmidt orthonormalization, 243–244 Gundy, Richard, 309 Haar, Alfréd Haar–Fourier series, 290 a.e.-convergence, 291 L^p-convergence, 291 Haar functions, 289, 310 Haar system, 289, 310 complete ONS, 290 Haar wavelet, 295 a.e.-convergence, 296 complete ONS, 296 L^p-convergence, 296 Hausdorff, Felix, 43 Hausdorff space, 320 Hermite polynomials, 278 Hewitt, Edwin, 11, 19n, 43 Hilbert cube, 246 (Pr 21.7) Hilbert space, 234 isomorphic to ℓ^2, 244–245 separable Hilbert space, 230, 244–245, 245 (Pr 21.5) Hölder continuity, 143 Hunt, G., 163 Hunt, R., 289 image measure, 51, 134 integral w.r.t. image measure, 134 of measure with density, 140 (Pr 14.1) independence and integrability, 85 (Pr 10.10) of σ-algebras, 36 (Pr 5.10) independent functions, 180–184, 279, 299, 302
existence of independent functions, 183 independent random variables, 188 (Pr 16.8, 16.9), 196, 201 (Pr 18.9), 224 (Pr 19.11), 275 (Pr 23.12), 306 convergence of independent random variables, 309 indicator function, 13 (Pr 2.5), 59 measurability, 59 rules for indicator functions, 74 (Pr 9.9), 316 inequality Bessel inequality, 240 Burkholder–Davis–Gundy inequality, 294 Cauchy–Schwarz inequality, 107, 229 Chebyshev inequality, 85 (Pr 10.5) conditional Jensen inequality, 266 Doob’s maximal L^p inequality, 211, 224 (Pr 19.13) generalized Hölder inequality, 117 (Pr 12.4) Hardy–Littlewood maximal inequality, 215 Hölder inequality, 106 for series, 113 for 0 < p < 1, 118 (Pr 12.18) Jensen inequality, 115 Kolmogorov’s inequality, 224 (Pr 19.11) Markov inequality, 82 variants thereof, 84–85 (Pr 10.5) Minkowski’s inequality, 107 for integrals, 130 for series, 113 for 0 < p < 1, 118 (Pr 12.18) moment inequality, 118 (Pr 12.19) strong-type inequality, 212 weak-type maximal inequality, 212 Young inequality, 105, 117 (Pr 12.5), 138, 141 (Pr 14.9) injective map, 6 inner product, 228 inner product space, 228 integrability comparison test, 85 (Pr 10.9) of complex functions, 227 integrability criterion
for improper R-integrals, 354, 356, 357 for L-integrals, 77 for L-integrals of image measures, 134, 135 for R-integrals, 94, 339, 344 of exponentials, 98, 102 (Pr 11.9) w.r.t. image measures, 135 of measurable functions, 76 of positive functions, 77 of (fractional) powers, 98, 155 Riemann integrability, 93, 339 integrable function, see also ℒ^1, ℒ^p etc. improperly R-, not L-integrable function, 97 is a.e. ℝ-valued, 83 Riemann integrable function, 93, 339 integral, see also Lebesgue integral, Riemann integral, Stieltjes integral and alternating series, 101 (Pr 11.5) of complex functions, 226–228 examples, 72–73, 79 generalizing series, 113 w.r.t. image measures, 134 and infinite series, 101 (Pr 11.4) iterated vs. double, 130–131 (Pr 13.3–13.5) lattice property, 78 of measurable functions, 76 over a null set, 81 of positive functions, 69 examples, 72–73 properties, 71–72 is positive linear functional, 79 properties, 78–79 of rotationally invariant functions, 155 of simple functions, 68 sine integral, 131 (Pr 13.6) over a subset, 79 integral test for series, 357 integration by parts for Riemann integrals, 350 for Stieltjes integrals, 133 (Pr 13.13) integration by substitution, 350 see also change of variable formula isometry, 245, 325 Jacobi polynomials, 277 Jacobian, 147 Jordan, Pascual, 232 (Pr 20.2)
Kac, Mark, 176 Kaczmarz, Stefan, 277n Kahane, Jean-Pierre, 311 kernel, 74 (Pr 9.11) Dirichlet kernel, 286 Fejér kernel, 286 Kolmogorov, Andrei N., 196, 288 Kolmogorov’s law of large numbers, 196–200 Korovkin, Pavel P., 281 Krantz, Steven, 218, 222 ℓ^1 (summable sequences), 79 ℓ^2 being isomorphic to separable Hilbert spaces, 245 ℒ^1, ℒ^1_ℝ̄ (integrable functions), 76 ℒ^1_ℂ, 227 ℓ^p, 113 ℒ^p, 105 dense subset of ℒ^p, 140, 157, 159 ℒ^p_ℂ, L^p_ℂ, 228 L^p, 108 being not separable, 271 L^p = L^p_ℝ̄, 108 separability criterion, 269, 271, 272 Laguerre polynomials, 278 lattice, 253 law of large numbers, 196–200 ℒ^∞, L^∞, 116 Lebesgue, Henri, 77, 77n, 349 Lebesgue integrable, 77 Lebesgue measurable set, 330 Lebesgue pre-measure, 45 Lebesgue σ-algebra, 330 cardinality, 330 Lebesgue integral, 77 abstract Lebesgue integral, 77n invariant under reflections, 136 invariant under translations, 136 transformation formula, 151 and differentiation, 152 Lebesgue measure, 27 change of variable formula, 53, 148 and differentiation, 152 characterized by translation invariance, 34 is diffuse, 46 (Pr 6.5) dilations, 36 (Pr 5.8) existence, 28, 45 and Hölder maps, 143
is inner/outer regular, 158 invariant under motions, 54 invariant under orthogonal maps, 52 invariant under translations, 34 null sets, 29 (Pr 4.11), 47 (Pr 6.8) under Hölder maps, 146 as product measure, 125 properties of Lebesgue measure, 28, 46 (Pr 6.3) transformation formula, 148 and differentiation, 152 uniqueness, 28 Legendre polynomials, 278 lemma Borel-Cantelli lemma, 48 (Pr 6.9), 198 Calderón-Zygmund lemma, 221 conditional Fatou’s lemma, 265 continuity lemma (L-integral), 91 differentiability lemma (L-integral), 91 Doob’s upcrossing lemma, 191 factorization lemma, 56 (Pr 7.11), 64 Fatou’s lemma, 73 Fatou’s lemma for measures, 74 (Pr 9.9) generalized Fatou’s lemma, 85 (Pr 10.8) Pratt’s lemma, 101 (Pr 11.3) reverse Fatou lemma, 74 (Pr 9.8) Urysohn’s lemma, 156 Lévy, Paul, 311 lim inf, lim sup (limit inferior/superior) of a numerical sequence, 63, 313–314 of a sequence of sets, 74 (Pr 9.9), 316 Lindenstrauss, Joram, 210, 294, 295 linear span, 240, 246 (Pr 21.9) lower integral, 338 map bijective map, 6 continuity in metric spaces, 324 continuous map, 321 Hölder continuous map, 143 and completion, 146 injective map, 6 measurable map, 49, 54 (Pr 7.5) surjective map, 6 Marcinkiewicz, Jozef, 176 martingale, 177 see also submartingale backwards martingale, 193 characterization of martingales, 186
closure of a martingale, 266–267 and conditional expectation, 266 and convex functions, 178, 268 martingale difference sequence, 188 (Pr 17.9), 275 (Pr 23.10), 302 a.e.-convergence, 303, 304, 306 L^2-convergence, 306 L^p-convergence, 304 of independent functions, 306 ONS, 303 with directed index set, 203 ℒ^1-convergence, 203 example of non-closable martingale, 275 (Pr 23.12) martingale inequality, 210–213, 294 L^1-convergence, 267 ℒ^2-bounded martingale, 201 (Pr 18.8) ℒ^p-bounded martingale, 225 (Pr 19.14) quadratic variation, 294 reverse martingale, see backwards martingale martingale transform, 188 (Pr 16.7) uniformly integrable (UI) martingale, 194, 267 maximal function Hardy–Littlewood maximal function, 214 of a measure, 217 square maximal function, 214 measurability of continuous maps, 50 of coordinate functions, 54 (Pr 7.5) of indicator functions, 59 μ*-measurability, 43 measurable function(s), 57 complex valued measurable function, 227 stable under limits, 62 vector lattice, 63 measurable map, 49, 54 (Pr 7.5) measurable set, 15, 17 measurable space, 22 measure, 22, see also Lebesgue measure, Stieltjes measure complete measure, 29 (Pr 4.13) continuous at ∅, 24 continuous from above, 24 continuous from below, 24 counting measure, 27 δ-measure, 26
diffuse measure, 46 (Pr 6.5) Dirac measure, 26 discrete probability measure, 27 equivalent measures, 223 (Pr 19.5) examples of measures, 26–28 finite measure, 22 inner regular measure, 158 invariant measure, 36 (Pr 5.9) locally finite measure, 218, 217n non-atomic measure, 46 (Pr 6.5) outer measure, 38n outer regular measure, 158 pre-measure, 22, 24, 45 probability measure, 22 product measure, 122–123 properties of measures, 23 separable measure, 272 σ-additivity, 22 σ-finite measure, 22, 30 (Pr 4.15) σ-subadditivity, 26 singular measure, 209 on S^{n−1}, 153–156 strong additivity, 23 subadditivity, 23 surface measure, 153–156, 161 (Pr 15.6) uniqueness, 33 measure with density, 80, 202 measure space, 22 complete measure space, 29 (Pr 4.13) finite measure space, 22 probability space, 22 σ-finite measure space, 22, 30 (Pr 4.15) σ-finite filtered measure space, 176, 203 mesh, 338 Métivier, Michel, 210 metric (distance function), 322 metric space, 322 monotone class, 21 (Pr 3.11) monotone function discontinuities of monotone functions, 129 is Lebesgue a.e. continuous, 129 is Lebesgue a.e. differentiable, 225 (Pr 19.17) is Riemann integrable, 342 monotonicity monotonicity of the integral, 78, 344 monotonicity of measures, 23
neighbourhood, 319 open neighbourhood, 319 von Neumann, John, 232 (Pr 20.2) Neveu, Jacques, 268 non-measurable set, 48 (Pr 6.10), 48 (Pr 6.11) for the Borel σ-algebra, 332 for the Lebesgue σ-algebra, 331 norm, 105, 325 normed space, 325 and inner products, 229 L^p, 108 ℒ^p, 105 ℒ^∞, L^∞, 116 quotient norm, 326 null set, 29 (Pr 4.10), 47 (Pr 6.8), 80 subsets of a null set, 29 (Pr 4.13) under Hölder map, 146 Olevskiĭ, A.M., 294 open ball, 323 optional sampling, 187 orthogonal orthogonal complement, 235 orthogonal elements of a Hilbert space, 234 orthogonal projection, 235, 246–247 (Pr 21.1) as conditional expectation, 253 orthogonal vectors, 231 orthogonal polynomials, 277–279 Chebyshev polynomials, 277 complete ONS, 282 dense in L^2, 282 Hermite polynomials, 278 Jacobi polynomials, 277 Laguerre polynomials, 278 Legendre polynomials, 278 orthonormal basis (ONB), 239, 242 characterization of, 242 orthonormal system (ONS), 240 complete orthogonal system, 242 maximal orthogonal system, 242 total orthogonal system, 242 orthonormalization procedure, 243–244 Oxtoby, John, 43 Paley, Raymond, 176, 302, 311 parallelogram identity, 231 parameter-dependent
improper R-integrals, 103 (Pr 11.20), 355–356 L-integrals, 92, 99 R-integrals, 352–353 Parseval’s identity, 240 partial order, 9 partition, 338 Pettis, B.J., 169 Pinsky, Mark, 299 polar coordinates 3-dimensional, 162 (Pr 15.9) n-dimensional, 153 planar, 152 polarization identity, 231 generalized polarization identity, 233 (Pr 20.5) Polish space, 336, 336n power set, 12 Pratt, John, 101 (Pr 11.3) pre-measure, 22, 24, 45 extension of a pre-measure, 37 σ-subadditivity, 26 primitive, 346, 348 bounded functions with primitive are L-integrable, 349 of a continuous function, 225 (Pr 19.16) differentiability of a primitive, 225 (Pr 19.16) probability space, 22 product of measurable spaces, 121 product measure space, 125 product measures, 122–123 product σ-algebra, 121, 127 projection, 235, 246–247 (Pr 21.11) orthogonal projection, 238 Pythagoras’ theorem, 233 (Pr 20.6), 238, 240 quadratic variation (of a martingale), 294 quotient norm, 326 quotient space, 326 Rademacher, Hans Rademacher functions, 299 are an incomplete ONS, 299
completion of, 301–302 Rademacher series, a.e.-convergence, 300 Radon–Nikodým derivative, 203 random variable, 49 see also independent random variables distribution of a random variable, 52 ℝ̄ (extended real line), 58 arithmetic of ℝ̄, 58 rearrangement decreasing rearrangement, 133 (Pr 13.14) rearrangement invariant, 133 (Pr 13.14) rectangle, 18 Riemann, Bernhard, 337 Riemann integrability, 339 criteria for Riemann integrability, 94, 339, 344 Riemann sum, 342 Riemann integral, 93, 339, 339 vs. antiderivative, 348 coincides with Lebesgue integral, 94 and completed Borel σ-algebra, 97 function of upper limit, 346 improper Riemann integral, 96, 129, 353–359 improper Riemann integral and infinite series, 357 properties of Riemann integral, 344 Riesz, Frigyes, 110, 111, 238, 239, 326 Riesz, Marcel, 288 ring of sets, 24n, 40n Rogers, Chris (L.C.G.), 294 Roy, Ranjan, 277n Rudin, Walter, 245, 246 (Pr 21.10), 288, 318, 348, 356 Ryzhik, Iosif, 277n, 284 scalar product, see inner product Schipp, Ferenc, 302 Schwartz, Jacob, 168 Seebach, J. Arthur, 318 semi-norm, 325 in ℒ^p, 108 semi-ring (of sets), 37 𝒥^n is semi-ring, 44 separable separable Hilbert space, 244, 230, 244–245, 245 (Pr 21.5) separable L^p-space, 272
separable measure, 272 separable σ-algebra, 272 separable space, 320 sesqui-linear form, 229 set analytic set, 333 Borel set, 17 cardinality of, 7 closed set, 319 closed in metric spaces, 323 closed in ℝ^n, 17 closure, 320 compact set, 320, 324–326, see also compactness connected set, 321 convex set, 235 countable set, 7 dense set, 320, see also dense subset F_σ set, 159 (Pr 15.1), 142 G_δ set, 142 (open) interior, 320 measurable set, 15 μ*-measurable set, 43 non-measurable set, see non-measurable set nowhere dense set, 55 (Pr 7.10) open set, 319 open in metric spaces, 323 open in ℝ^n, 17, 319 pathwise connected set, 321 relatively compact set, 320 relatively open set, 319 Souslin set, 333 uncountable set, 7 upwards filtering index set, 203 σ-additivity, 3, 22 σ-algebra, 15 Borel set, 17 examples, 15–16 generated by a family of maps, 51 generated by a family of sets, 16 generated by a map, 51 generator of, 16 inverse image, 16, 49 minimal σ-algebra, 16, 51 product σ-algebra, 121, 127 properties, 15, 20 (Pr 3.1) separable σ-algebra, 272 topological σ-algebra, 17 trace σ-algebra, 16, 20 (Pr 3.10)
σ-finite filtered measure space, 176 σ-finite measure, 22 σ-finite measure space, 22 σ-subadditivity, 26 Simon, Peter, 302 simple functions, 60 dense in ℒ^p, 1 ≤ p < ∞, 112 dense in ℳ, 61 integral of simple functions, 68 not dense in ℒ^∞, 116 standard representation, 60 uniformly dense in ℳ_b, 65 (Pr 8.7) singleton, 46 (Pr 6.5) Souslin, Michel, 333, 336 Souslin operation, 336 Souslin scheme, 332 Souslin set, 333 span, 240, 246 (Pr 21.9) spherical coordinates, 153 Srivastava, Sashi, 336 standard representation, 60 Steele, Michael, 311 Steen, Lynn, 318 Stein, Elias, 221 Steinhaus, Hugo, 176, 277n step function, 342, see also simple functions is Riemann integrable, 342 Stieltjes, Thomas Stieltjes function, 55 (Pr 7.9) Stieltjes integral, 132 (Pr 13.13) change of variable, 133 (Pr 13.13) integration by parts, 133 (Pr 13.13) Stieltjes measure, 54 (Pr 7.9), 132 (Pr 13.13) Lebesgue decomposition of Stieltjes measure, 223 (Pr 19.9) stopping time, 184 characterization of, 189 (Pr 17.9) Strichartz, Robert, 344 Stromberg, Karl, 11, 19n, 43 strong additivity, 23 Stroock, Daniel, 161 (Pr 15.7) subadditivity, 23 submartingale, 177 backwards submartingale, 193 convergence theorem, 193 ℒ^1-convergence, 195 change of filtration, 187 (Pr 16.2) characterization of submartingales, 186
w.r.t. completed filtration, 187 (Pr 16.3) and conditional expectation, 266 and convex functions, 178, 268 Doob decomposition, 275 (Pr 23.11) Doob’s maximal inequality, 211, 224 (Pr 19.13) examples, 178–181, 200 (Pr 18.6) inequalities for, 210–213 ℒ^1-convergence, 194 pointwise convergence, 191 reversed submartingale, see backwards martingale uniformly integrable (UI) martingale, 193, 194 upcrossing estimate, 191 supermartingale, 177 see also submartingale surface measure, 153–156, 161 (Pr 15.6) surjective map, 6 symmetric difference, 13 (Pr 2.2) Sz.-Nagy, Béla, 349 Szegő, Gábor, 277n Tarski, Alfred, 43 theorem, see also lemma or inequality backwards convergence theorem, 193 Beppo Levi theorem, 70 Bonnet’s mean value theorem, 350 bounded convergence theorem, 174 (Pr 16.7) Cantor–Bernstein theorem, 9 Carathéodory’s existence theorem, 37 completion of metric spaces, 325 conditional Beppo Levi theorem, 264 conditional dominated convergence theorem, 266 continuity theorem (improper R-integral), 355 continuity theorem (R-integral), 352 continuity lemma (L-integral), 91 convergence of UI submartingales, 194 differentiability lemma (L-integral), 91 differentiation theorem (improper R-integral), 356 differentiation theorem (R-integral), 352 dominated convergence theorem, 89 ℒ^p-version, 100 (Pr 11.1), 111 Doob’s theorem, 222 (Pr 19.2) existence of product measures, 123
extension of measures, 37 Fréchet-v. Neumann-Jordan theorem, 232 (Pr 20.2) Fubini’s theorem, 126 Fubini’s theorem on series, 225 (Pr 19.18) fundamental theorem of calculus, 349 general transformation theorem, 151 Hardy–Littlewood maximal inequalities, 215 Heine–Borel theorem, 325, 326 integral test for series, 357 integration by parts, 350 integration by substitution, 350 Jacobi’s transformation theorem, 148 Korovkin’s theorem, 280 Lebesgue’s convergence theorem, 89 Lp-version, 100 (Pr 11.1), 111 Lebesgue decomposition, 209 Lebesgue’s differentiation theorem, 218 mean value theorem for integrals, 345 monotone convergence theorem, 88 optional sampling theorem, 187 projection theorem, 235 Pythagoras’ theorem, 233 (Pr 20.6), 238, 240 Radon–Nikodým theorem, 202 Riesz representation theorem, 239 Riesz’ convergence theorem, 112 Riesz–Fischer theorem, 110 M. Riesz’ theorem, 288 second mean value theorem for integrals, 350n submartingale convergence theorem, 191 Tonelli’s theorem, 125 uniqueness of measures, 33 alternative statement, 36 (Pr 5.6) uniqueness of product measures, 122 Vitali’s convergence theorem, 165 non-σ-finite case, 167 Weierstrass approximation theorem, 279, 287 tightness (of measures), 169 Tonelli, Leonida, 125 topological σ-algebra, 17 topological space, 17, 319 topology, 17, 319 examples, 319 trace σ-algebra, 16, 20 (Pr 3.10)
transformation formula for Lebesgue integrals, 151 for Lebesgue measure, 53, 148 and differentiation, 152 trigonometric polynomial, 283 trigonometric polynomials are dense in C[−π, π], 287 trigonometric system, 283 complete in L2 , 283, 287 Tzafriri, Lior, 294, 295 Uhl, John, 210 unconditional basis, 293–295 uncountable, 7 uniform boundedness principle, 246 (Pr 21.10) uniformly integrable, 163, 175 (Pr 16.11) vs. compactness, 169 equivalent conditions, 168 uniformly σ-additive, 169 unit mass, 26 upcrossing, 190 upcrossing estimate, 191 upper integral, 338 upwards filtering, upwards directed, 203
vector space, 226 Volterra, Vito, 349 volume of unit ball, 155–156 Wade, William, 302 Wagon, Stan, 43 Walsh system, 302 wavelet, see Haar wavelet Weierstrass, Karl, 279 Wheeden, Richard, 288 Wiener, Norbert, 176, 311 Wiener process, 309, 311 Willard, Stephen, 318 Williams, David, 294 Yosida, Kôsaku, 232 Young, William, 101 (Pr 11.3), 117 (Pr 12.5), 138 Young function, 117 (Pr 12.5) Young’s inequality, 105, 117 (Pr 12.5), 138, 141 (Pr 14.9) Zygmund, Antoni, 176, 221, 288