Fractal sets and Preparation to Geometric Measure Theory
Heinrich von Weizsäcker
Notes by Johannes Geißler
Universität Kaiserslautern, Winter-Semester 2002/03, Revision February 2006
Preface

Geometric measure theory studies the interplay between geometry and measure theory. This short course can provide only a first glance at some aspects of the field. More specifically, we have two main goals:

1. The introduction of fractal sets and of the Hausdorff measure, in particular the study of 'objects of noninteger dimension'.

2. The extension of some concepts of classical vector analysis of smooth objects to certain nonsmooth settings. As a simple example, Rademacher's theorem says that for a Lipschitz function $f : U \to \mathbb{R}^k$ with $U \subset \mathbb{R}^n$, the set $\{x \in U \mid f \text{ not differentiable at } x\}$ has Lebesgue measure 0. Moreover, for $k = n$ we still have the classical change of variable formula
\[ \int_U \varphi(f(x))\,|\det Df(x)|\,dx = \int_{f(U)} \varphi(y)\,dy. \]
Results of this sort are useful e.g. for PDEs; they eventually lead to the theory of 'rectifiable currents' and 'varifolds' as generalized manifolds. We do not show the application of such tools to variational problems of the following type, which are also at the core of classical geometric measure theory:

3. Characterisation of some geometric objects by their measure theoretic properties, the most famous example being the isoperimetric inequality: in Euclidean space a ball has the largest volume among all bodies of equal perimeter (= size of surface); equivalently $\mathrm{surface}(\mathrm{ball}) = \inf\{\mathrm{surface}(C) \mid \mathrm{vol}(C) = \mathrm{vol}(\mathrm{ball})\}$. Thus the ball can be characterised in terms of an optimisation problem involving volume and surface measure.

Books: There are several introductions to the theory of fractal sets. A first introduction is given by Barnsley [1]. We recommend in particular the books of Falconer: a popular introduction is [8] (2003, first edition 1990, German translation 1993). It contains many important examples and is mathematically precise, but does not give all proofs. Mathematically more sophisticated are [6] and [7]. A textbook which also provides most of the necessary mathematical background is Edgar [4]. Another more advanced expert exposition, which also contains a number of deep results by the author, is given by Pertti Mattila in [11]. It is a bridge between the two areas described above. The extension of 'smooth analysis' to wider classes of functions is described in the monograph [5] by Evans and Gariepy. The classical advanced text on geometric measure theory is Federer's [9]. Another classical, more elementary introduction to the theory of Hausdorff measure is C. Rogers [14]. Specialists in the analytical aspects of geometric measure theory often quote L. Simon [15]. A beautifully written text on the variational problems mentioned above (soap bubbles etc.) is Frank Morgan's 'Beginner's Guide' [12]. However, despite the title, the reader should be well experienced in real analysis. We hope that the student of our course will be sufficiently prepared to give [12] a try. Finally I would like to mention that I have learned a lot about this area from Peter Mörters, see e.g. [13].
Contents

1 Hausdorff dimension and Hausdorff measure
  1.1 The Minkowski dimension
  1.2 The definition of Hausdorff dimension
  1.3 The definition of the Hausdorff measure
  1.4 Lebesgue measure as Hausdorff measure
  1.5 Determining the Hausdorff dimension: Some techniques

2 Self-similar sets
  2.1 Definitions
  2.2 Coding of the attractors
  2.3 The Chaos Algorithm
  2.4 The Hausdorff dimension of self-similar sets
  2.5 A glimpse of Julia sets

3 Differentiation
  3.1 Upper and Lower Densities
  3.2 Connection to Potential Theory
  3.3 The Hausdorff dimension of Brownian orbits
  3.4 Lebesgue's Differentiation Theorem
  3.5 Rademacher's Theorem

4 The Area and Coarea Formulas
  4.1 Egorov's Theorem and an application to Lipschitz maps
  4.2 The area formula
  4.3 Examples
  4.4 The Coarea Formula
Chapter 1

Hausdorff dimension and Hausdorff measure

1.1 The Minkowski dimension
The concept of Hausdorff dimension is only one of a variety of different measure theoretic notions of dimension. Its definition is a little involved, so we start with the simpler notion of the Minkowski dimension.

Definition 1.1 Let $E \subset \mathbb{R}^n$ be bounded and $\varepsilon > 0$. Then let
\[ M(E,\varepsilon) := \min\Big\{ k \in \mathbb{N} \;\Big|\; \exists\, x_1,\dots,x_k \text{ such that } E \subset \bigcup_{i=1}^k B(x_i,\varepsilon) \Big\}. \]
Here $B(x,\varepsilon) = \{y \in \mathbb{R}^n \mid d(x,y) < \varepsilon\}$. We observe that if $E$ is a 'typical $d$-dimensional set' then the number $M(E,\varepsilon)$ is approximately equal to $c\,\varepsilon^{-d}$ for some constant $c > 0$.

Example 1.1 Let $E$ be a union of finitely many intervals of the real line. Then
\[ M(E,\varepsilon) \sim \frac{\mathrm{Length}(E)}{2\varepsilon} \tag{1.1} \]
since for any interval $E$ we have $(M(E,\varepsilon) - 1)\,2\varepsilon \le \mathrm{Length}(E) \le M(E,\varepsilon)\,2\varepsilon$. Similarly (1.1) holds if $E$ is the orbit of a smooth curve $\gamma : [a,b] \to \mathbb{R}^n$; in this case
\[ \mathrm{Length}(E) = \int_a^b |\gamma'(t)|\,dt. \]
If however $E$ is a 'typical planar set' then
\[ M(E,\varepsilon) \sim \frac{\mathrm{Area}(E)}{\pi\varepsilon^2}, \]
which extends to two-dimensional surfaces in $\mathbb{R}^n$.
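The two-sided estimate behind (1.1) can be checked numerically for a single interval. The following is a minimal sketch, assuming that $k$ open intervals of radius $\varepsilon$ cover a closed interval of length $L$ exactly when $2\varepsilon k > L$; the function name `covering_number_interval` is ours and does not appear in the notes.

```python
from fractions import Fraction
from math import floor

def covering_number_interval(length, eps):
    """Smallest number of open intervals of radius eps that cover a closed interval.

    Assumption of this sketch: k open intervals of length 2*eps cover a closed
    interval of length L if and only if k * 2*eps > L.
    """
    return floor(length / (2 * eps)) + 1

length = Fraction(1)
for j in range(1, 5):
    eps = Fraction(1, 10 ** j)
    m = covering_number_interval(length, eps)
    # (M - 1) * 2 eps <= Length <= M * 2 eps, the estimate used for (1.1)
    assert (m - 1) * 2 * eps <= length <= m * 2 * eps
    print(j, m, float(m * 2 * eps / length))  # the last ratio approaches 1
```

Exact fractions are used so that the floor is not disturbed by rounding when the length is an exact multiple of $2\varepsilon$.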
Definition 1.2 We define the lower Minkowski dimension of $E$ as
\[ \underline{\dim}_M E := \liminf_{\varepsilon\to 0} \frac{\log M(E,\varepsilon)}{\log(1/\varepsilon)}. \]
Similarly the upper Minkowski dimension is given by
\[ \overline{\dim}_M E := \limsup_{\varepsilon\to 0} \frac{\log M(E,\varepsilon)}{\log(1/\varepsilon)}. \]
One always has $\underline{\dim}_M E \le \overline{\dim}_M E$, but not necessarily equality. In the case of equality we call the common value $\dim_M E$ the Minkowski dimension of $E$.

Example 1.2 (Cantor's middle third set) Cantor's middle third set is defined by
\[ C = \Big\{ \sum_{i=1}^\infty \varepsilon_i 3^{-i} \;\Big|\; \varepsilon_i \in \{0,2\} \Big\}. \]
Alternatively one can describe $C$ by the following iterative procedure: Start with the unit interval and remove the open middle third (first step). After the $n$-th step there are $2^n$ closed intervals left, each of length $3^{-n}$. We call their union $C_n$. In the $(n+1)$-st step we remove the open middle third in each of the intervals which constitute $C_n$. The remaining $2^{n+1}$ intervals form $C_{n+1}$. Then
\[ C = \bigcap_{n=1}^\infty C_n. \]
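The iterative description of $C$ can be sketched in a few lines. This is only an illustration under our own naming (`cantor_intervals` is not from the notes); it builds the intervals of $C_n$ with exact fractions and prints the heuristic ratio $\log 2^n / \log 3^n$ obtained from counting $2^n$ intervals of length $3^{-n}$.

```python
from fractions import Fraction
import math

def cantor_intervals(n):
    """Return the 2**n closed intervals of C_n as (left, right) pairs of exact fractions."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for left, right in intervals:
            third = (right - left) / 3
            refined.append((left, left + third))     # keep the left closed third
            refined.append((right - third, right))   # keep the right closed third
        intervals = refined
    return intervals

for n in (1, 2, 5):
    level = cantor_intervals(n)
    total_length = sum(right - left for left, right in level)
    # number of intervals, total length (2/3)**n, and log(2**n) / log(3**n)
    print(len(level), total_length, math.log(len(level)) / math.log(3 ** n))
```

The total length $(2/3)^n$ tends to zero, so $C$ is a Lebesgue nullset, while the printed ratio stays at $\log 2/\log 3$ for every $n$.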
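Lemma 1.1 $\dim_M(C) = \dfrac{\log 2}{\log 3}$.

Proof: Let $\alpha = \frac{\log 2}{\log 3}$.

1. Upper bound: We need a good covering of $C$ by intervals of radius $\varepsilon$. For $\varepsilon > 0$ let $n$ be the unique integer satisfying $\frac{1}{3^n} < \varepsilon \le \frac{1}{3^{n-1}}$. The sets
\[ \Big[ \sum_{i=1}^n \frac{x_i}{3^i},\; \sum_{i=1}^n \frac{x_i}{3^i} + \varepsilon \Big] \]
cover $C$. Here the coefficients $x_i \in \{0,2\}$ are arbitrary. Each of these intervals has radius $\varepsilon/2$ and is in turn contained in an interval of radius $\varepsilon$ with center in $C$. Therefore
\[ M(C,\varepsilon) \le 2^n = 3^{\frac{\log 2}{\log 3} n} = 3^{\alpha n} \le 3^\alpha \cdot \frac{1}{\varepsilon^\alpha} \]
and hence
\[ \log M(C,\varepsilon) \le \log 3^\alpha + \alpha \log\frac{1}{\varepsilon}, \]
which implies for $\varepsilon \to 0$ that $\overline{\dim}_M C \le \frac{\log 2}{\log 3}$.

2. Lower bound: We assume that $C \subset \bigcup_{i=1}^m B(x_i,\varepsilon)$ for certain points $x_i$. We have to prove $m \gtrsim \mathrm{const}\cdot\varepsilon^{-\alpha}$. Note that
\[ B(x_j,\varepsilon) \cap C \subset \Big\{ \sum_{i=1}^\infty \frac{y_i}{3^i} \;\Big|\; y_1 = x_{j,1},\dots,y_n = x_{j,n} \Big\} =: C^n_{x_j} \]
where $n$ is such that $\frac{1}{3^{n+1}} \le \varepsilon < \frac{1}{3^n}$ and each $x_j$ is written as $x_j = \sum_i \frac{x_{j,i}}{3^i}$ with $x_{j,i} \in \{0,2\}$. This means that the expansion of each point in $C^n_{x_j}$ starts like the one of $x_j$. One needs at least $2^n$ sets of the type $C^n_{x_j}$ in order to cover all of $C$. So $m \ge 2^n$ and hence
\[ M(C,\varepsilon) \ge 2^n = 3^{\alpha n} \ge \Big(\frac{1}{3}\Big)^\alpha \frac{1}{\varepsilon^\alpha}. \]
As above this implies $\underline{\dim}_M C \ge \alpha$ and, together with part 1, $\underline{\dim}_M C = \overline{\dim}_M C = \dim_M C = \alpha$.

Example 1.3 For the set
\[ E = \{0\} \cup \Big\{ \frac{1}{k} \;\Big|\; k \in \mathbb{N} \Big\} \]
we have $\dim_M E = \frac{1}{2}$.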
Proof: Upper bound: Let $\varepsilon > 0$. Choose $n \in \mathbb{N}$ such that $\frac{1}{(n+1)^2} \le \varepsilon < \frac{1}{n^2}$. The set $\{\frac{1}{k} \mid k > n\} \cup \{0\}$ can be covered by $n+1$ intervals of radius $\varepsilon$. The remaining $n$ points need at most $n$ further intervals. Hence $E$ can be covered by $2n+1$ intervals of radius $\varepsilon$. This implies
\[ M(E,\varepsilon) \le 2n+1 \le \frac{2n+1}{n}\Big(\frac{1}{\varepsilon}\Big)^{1/2} \]
and $\overline{\dim}_M E \le 1/2$.

Lower bound: Two neighbouring points of $E$ have distance $\frac{1}{k} - \frac{1}{k+1} = \frac{1}{k(k+1)} \ge \frac{1}{(k+1)^2}$. An interval $B(x,\varepsilon)$ with $x \in E$, i.e. $x = \frac{1}{k}$ for some $k$, contains no other points of $E$ if $k < n$, where $n$ is chosen depending on $\varepsilon$ as above. Thus the number of intervals needed is at least $n$. Therefore
\[ M(E,\varepsilon) \ge n \ge \frac{n}{n+1}\Big(\frac{1}{\varepsilon}\Big)^{1/2} \]
which implies $\underline{\dim}_M E \ge \frac{1}{2}$.
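The value $1/2$ can also be observed numerically. The sketch below is an illustration under stated assumptions: it replaces $E$ by the finite truncation $\{0\} \cup \{1/k \mid k \le K\}$ (our simplification), allows arbitrary centers as in Definition 1.1, and uses a left-to-right greedy covering, which is optimal for finite subsets of the line. All function names are ours.

```python
import math

def covering_number_line(points, eps):
    """Minimal number of open intervals of radius eps covering a finite set of reals."""
    count = 0
    covered_up_to = -math.inf          # every point q < covered_up_to is already covered
    for p in sorted(points):
        if p >= covered_up_to:
            count += 1                 # new open interval with left end just below p
            covered_up_to = p + 2 * eps
    return count

def truncated_set(K):
    """Finite stand-in for E = {0} u {1/k}: keeps only k <= K (an assumption of this sketch)."""
    return [0.0] + [1.0 / k for k in range(1, K + 1)]

def dimension_estimate(eps, K):
    """log M / log(1/eps) for the truncated set."""
    m = covering_number_line(truncated_set(K), eps)
    return math.log(m) / math.log(1.0 / eps)

for eps in (1e-2, 1e-3, 1e-4):
    K = int(2 / eps)                   # beyond K the gaps are far smaller than eps
    print(eps, covering_number_line(truncated_set(K), eps), dimension_estimate(eps, K))
```

The estimates approach $1/2$ only slowly, because $M(E,\varepsilon)$ behaves like a constant times $\varepsilon^{-1/2}$ and the constant contributes a term of order $1/\log(1/\varepsilon)$.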
This example is instructive since the set $E$ is countable and, on the other hand, $\dim_M \{x\} = 0$ for every $x$. This shows that the Minkowski dimension does not have the following countable stability property, which is certainly desirable in a measure theoretic context:
\[ \dim\Big(\bigcup_{k=1}^\infty E_k\Big) = \sup_{k\in\mathbb{N}} \dim E_k. \]
There are two ways to modify the definition in order to enforce this property:

1. We allow countable coverings by arbitrarily small sets. This leads to the notion of Hausdorff dimension.

2. One first splits the set in question into countably many disjoint parts and then considers the smallest supremum of the dimensions of these subsets among all such partitions. This leads to the so-called packing dimension, which we shall not pursue further.
1.2 The definition of Hausdorff dimension
In order to define the Hausdorff dimension let $(X,d)$ be any metric space. We introduce the diameter $|U| := \sup\{d(x,y) \mid x,y \in U\}$ of a set $U \subset X$. A $\delta$-covering of a set $E$ is a system $(U_i)_{i\in I}$ of at most countably many sets $U_i$ satisfying $|U_i| < \delta$ and $E \subset \bigcup_i U_i$. Fix for the moment the number $\delta > 0$. For each real number $\alpha \ge 0$ define
\[ \mathcal{H}^\alpha_\delta(E) := \inf\Big\{ \underbrace{\sum_i |U_i|^\alpha}_{\alpha\text{-value of the covering}} \;\Big|\; (U_i) \text{ is a } \delta\text{-covering of } E \Big\}, \]
i.e. $\mathcal{H}^\alpha_\delta(E)$ is the $\alpha$-value of the most efficient $\delta$-covering of $E$.
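We remark that if $\delta' < \delta$ then $\mathcal{H}^\alpha_{\delta'}(E) \ge \mathcal{H}^\alpha_\delta(E)$, since for $\delta$ more coverings are allowed in the formation of the infimum than for $\delta'$.

Lemma 1.2
1. We have $\inf\{\alpha \mid \mathcal{H}^\alpha_\delta(E) = 0\} = \sup\{\alpha \mid \mathcal{H}^\alpha_\delta(E) > 0\}$.
2. The value of this infimum resp. supremum does not depend on $\delta$.

Proof: 1. Let $s > \inf\{\alpha \mid \mathcal{H}^\alpha_\delta(E) = 0\}$. We claim that $\mathcal{H}^s_\delta(E) = 0$. This will prove part 1 of the assertion. For each $\varepsilon \in (0,1)$ there is some $\alpha < s$ and a $\delta$-covering with $\sum_i |U_i|^\alpha \le \varepsilon$. Then $|U_i| \le 1$ for all $i$. For $x \in [0,1]$ the function $s \mapsto x^s$ is decreasing. Thus
\[ \sum_i |U_i|^s \le \sum_i |U_i|^\alpha \le \varepsilon \]
and hence $\mathcal{H}^s_\delta(E) = 0$.

2. For the proof of the second assertion suppose $\mathcal{H}^s_\delta(E) = 0$. For $\delta' > \delta$ there are more $\delta'$-coverings. Thus $\mathcal{H}^s_{\delta'}(E) = 0$. For $\delta' < \delta$ let $\varepsilon > 0$ be given. Then, because of $\mathcal{H}^s_\delta(E) = 0$, there is a $\delta$-covering with $\sum_i |U_i|^s < \min(\varepsilon, (\delta')^s)$. Then $|U_i| < \delta'$ for each $i$. So $(U_i)$ is even a $\delta'$-covering of $E$ with $\sum_i |U_i|^s \le \varepsilon$. Hence $\mathcal{H}^s_{\delta'}(E) = 0$.

Definition 1.3 The number
\[ \dim E := \inf\{\alpha \mid \mathcal{H}^\alpha_\delta(E) = 0\} = \sup\{\alpha \mid \mathcal{H}^\alpha_\delta(E) > 0\} \]
is called the Hausdorff dimension of the set $E$.

Proposition 1.3 The Hausdorff dimension is at most equal to the lower Minkowski dimension: $\dim E \le \underline{\dim}_M E$.

Proof: Let $s > \underline{\dim}_M E$. Choose $\eta \in (\underline{\dim}_M E, s)$. Then there are arbitrarily small numbers $\varepsilon > 0$ with $2\varepsilon < \delta$ and a covering of $E$ by $k$ balls of radius $\varepsilon$, where
\[ \log k = \log M(E,\varepsilon) \le \eta \log\frac{1}{\varepsilon} = \log\Big(\frac{1}{\varepsilon}\Big)^{\eta}. \]
Hence the $s$-value of this covering is $\le k(2\varepsilon)^s \le 2^s \frac{\varepsilon^s}{\varepsilon^\eta} = 2^s \varepsilon^{s-\eta}$, which converges to zero as $\varepsilon \to 0$. This implies $\mathcal{H}^s_\delta(E) = 0$ and therefore $\dim E \le s$. This completes the proof.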
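To see how the $\alpha$-value of a covering reacts to the exponent, one can look at the natural coverings of the Cantor set $C$ by the $2^n$ intervals of $C_n$, each of diameter $3^{-n}$. The sketch below (with our own names `alpha_value` and `CRITICAL`) only evaluates these particular coverings, so it yields upper bounds for $\mathcal{H}^\alpha_\delta(C)$ and nothing more.

```python
import math

CRITICAL = math.log(2) / math.log(3)

def alpha_value(alpha, n):
    """alpha-value of the covering of C by the 2**n intervals of C_n (diameter 3**-n each)."""
    return (2.0 * 3.0 ** (-alpha)) ** n

for alpha in (0.5, CRITICAL, 0.7):
    # the values grow, stay at 1, or decay to 0 as n increases
    print(alpha, [alpha_value(alpha, n) for n in (10, 50, 200)])
```

For every $\alpha > \log 2/\log 3$ the values tend to zero, which gives $\mathcal{H}^\alpha_\delta(C) = 0$ and hence $\dim C \le \log 2/\log 3$, in agreement with Proposition 1.3 and Lemma 1.1. The growth for smaller $\alpha$ proves nothing by itself, since other coverings might be more efficient.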
1.3 The definition of the Hausdorff measure
Definition 1.4 a. A map $\mu$ from the power set $\mathcal{P}(X)$ to the extended positive axis $[0,\infty]$ is called an outer measure if it has the following three properties:
1. $\mu(\emptyset) = 0$.
2. If $A \subset B$ then $\mu(A) \le \mu(B)$ (monotonicity).
3. $\mu(\bigcup_{i=1}^\infty E_i) \le \sum_{i=1}^\infty \mu(E_i)$ for all sequences of subsets $E_i$ of the set $X$ (countable subadditivity).

b. If $(X,d)$ is a metric space then the number
\[ \mathcal{H}^\alpha(E) := \lim_{\delta\to 0} \mathcal{H}^\alpha_\delta(E) \]
is called the $\alpha$-dimensional (outer) Hausdorff measure of $E$.

Proposition 1.4 The map $\mathcal{H}^\alpha$ is an outer measure.
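Proof: Properties 1) and 2) are trivial for each of the functions $\mathcal{H}^\alpha_\delta$ and, by a simple limit argument, also for $\mathcal{H}^\alpha$. For 3) we may assume $\sum_i \mathcal{H}^\alpha(E_i) < \infty$. Let $\varepsilon > 0$ be given and fix $\delta > 0$. For each $i \in \mathbb{N}$ there is a $\delta$-covering $(U^i_j)_{j\in\mathbb{N}}$ of $E_i$ such that $\sum_j |U^i_j|^\alpha < \mathcal{H}^\alpha(E_i) + \varepsilon 2^{-i}$. Then the system $(U^i_j)_{i,j\in\mathbb{N}}$ is a $\delta$-covering of $\bigcup_{i=1}^\infty E_i$ with
\[ \sum_{i,j} |U^i_j|^\alpha \le \sum_{i=1}^\infty \big(\mathcal{H}^\alpha(E_i) + \varepsilon 2^{-i}\big) = \sum_{i=1}^\infty \mathcal{H}^\alpha(E_i) + \varepsilon. \]
Since $\varepsilon > 0$ and $\delta > 0$ are arbitrary this implies the assertion.

The Proposition shows in particular that Hausdorff dimension does indeed have the countable stability property.

Corollary 1.5 For every sequence of sets $E_k \subset X$
\[ \dim\Big(\bigcup_{k=1}^\infty E_k\Big) = \sup_{k\in\mathbb{N}} \dim E_k. \]

Proof: Recall that $\dim E = \inf\{\alpha \mid \mathcal{H}^\alpha_\delta(E) = 0\}$. Clearly $E \subset F$ implies $\dim E \le \dim F$. Therefore "≥" is trivial. For "≤" let $s > \dim E_k$ for all $k$. Then $\mathcal{H}^s_\delta(E_k) = 0$ for all $k$. Thus
\[ \mathcal{H}^s_\delta\Big(\bigcup_k E_k\Big) \le \sum_{k=1}^\infty \mathcal{H}^s_\delta(E_k) = 0. \]
Hence $s \ge \dim(\bigcup_k E_k)$.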
Our next goal is to show that $\mathcal{H}^\alpha$ is even σ-additive on the system of all Borel sets. We need

Lemma 1.6 (Carathéodory) Let $\nu$ be an outer measure on $\mathcal{P}(X)$. Then the system $\mathcal{E}$ of all $\nu$-measurable sets, i.e.
\[ \mathcal{E} = \{E \subset X \mid \nu(A) = \nu(A \cap E) + \nu(A \cap E^c) \quad \forall A \subset X\}, \]
is a σ-algebra and $\nu|_{\mathcal{E}}$ is a measure.

The proof of this lemma can be found in most textbooks on measure theory (e.g. [2]) in the chapter on measure extension. Intuitively the partition $X = E \cup E^c$ yields a 'clean' cut if and only if $E$ is measurable, i.e. if the strict inequality $\nu(A) < \nu(A \cap E) + \nu(A \cap E^c)$ holds for some $A$ then $E$ and $E^c$ cannot be separated cleanly in terms of the set function $\nu$. In order to explain why the passage to the limit $\delta \to 0$ is important in the definition of the Hausdorff measure we first note the following.

Remark 1.1 For every $\delta > 0$ and convex $E \subset \mathbb{R}^n$ with $|E| < \delta$ one has $\mathcal{H}^1_\delta(E) = |E|$.
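Proof: Since $\{E\}$ is a $\delta$-covering of $E$ we have the inequality $\mathcal{H}^1_\delta(E) \le |E|$. For the converse let $x, y \in E$ and denote by $S$ the segment between $x$ and $y$. The convexity of $E$ gives $S \subset E$. Let $\lambda^1$ be the 1-dimensional Lebesgue measure, restricted to $S$. Let $(U_i)$ be a $\delta$-covering of $E$. Then
\[ |x - y| = \lambda^1(S) \le \sum_i \lambda^1(U_i \cap S) \le \sum_i |U_i \cap S| \le \sum_i |U_i| \]
and hence $\mathcal{H}^1_\delta(E) \ge |x - y|$ for all $x, y \in E$, i.e. $\mathcal{H}^1_\delta(E) \ge |E|$.

Corollary 1.7 For $\delta > 0$ Borel sets are in general not $\mathcal{H}^\alpha_\delta$-measurable.

Proof: Let $\alpha = 1$, $X = \mathbb{R}^n$ and let $E$ be a halfspace. Choose a ball $A$ with diameter $\varepsilon < \delta$ and center in the separating hyperplane. Then $\mathcal{H}^1_\delta(A) = \varepsilon$ but $\mathcal{H}^1_\delta(E \cap A) + \mathcal{H}^1_\delta(A \cap E^c) = 2\varepsilon$.

Definition 1.5 Let $\nu$ be an outer measure on the power set of a metric space $(X,d)$. Then $\nu$ is called a metric outer measure if for any two subsets $A, B$ of $X$ with $\mathrm{dist}(A,B) := \inf\{d(x,y) \mid x \in A, y \in B\} > 0$ one has $\nu(A \cup B) = \nu(A) + \nu(B)$.

Lemma 1.8 The outer measure $\mathcal{H}^\alpha$ is a metric outer measure.

Proof: Only the inequality $\mathcal{H}^\alpha(A) + \mathcal{H}^\alpha(B) \le \mathcal{H}^\alpha(A \cup B)$ has to be checked. Let $(U_i)$ be a $\delta$-covering of $A \cup B$ with $\delta < \frac{1}{2}\mathrm{dist}(A,B)$. Then $\{U_i \mid A \cap U_i \ne \emptyset\} \cap \{U_i \mid U_i \cap B \ne \emptyset\} = \emptyset$ and hence
\[ \sum_i |U_i|^\alpha \ge \sum_{i:\, U_i \cap A \ne \emptyset} |U_i|^\alpha + \sum_{i:\, U_i \cap B \ne \emptyset} |U_i|^\alpha \ge \mathcal{H}^\alpha_\delta(A) + \mathcal{H}^\alpha_\delta(B) \ge \mathcal{H}^\alpha(A) + \mathcal{H}^\alpha(B) - \varepsilon \]
for sufficiently small $\delta$. Passing to the limit $\delta \to 0$ we get $\mathcal{H}^\alpha(A \cup B) \ge \mathcal{H}^\alpha(A) + \mathcal{H}^\alpha(B)$.

Lemma 1.9 Let $\nu$ be a metric outer measure and let $A_1 \subset A_2 \subset \dots \subseteq \bigcup_{j=1}^\infty A_j =: A \subset X$ satisfy $\mathrm{dist}(A_i, A \setminus A_{i+1}) > 0$ for each $i$. Then
\[ \nu(A_j) \xrightarrow[j\to\infty]{} \nu(A). \]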
Proof: First note that by monotonicity the limit $\lim_{j\to\infty} \nu(A_j)$ exists and is $\le \nu(A)$. For the converse inequality let $B_1 = A_1$ and $B_j = A_j \setminus A_{j-1}$ for $j \ge 2$. Then $\mathrm{dist}(B_i, B_j) > 0$ if $j \ge i + 2$. Hence
\[ \nu\Big(\underbrace{\bigcup_{k=1}^m B_{2k-1}}_{\subseteq A_{2m}}\Big) = \sum_{k=1}^m \nu(B_{2k-1}); \qquad \nu\Big(\underbrace{\bigcup_{k=1}^m B_{2k}}_{\subseteq A_{2m}}\Big) = \sum_{k=1}^m \nu(B_{2k}). \]
Without loss of generality we may assume $\lim \nu(A_{2m}) < \infty$. Hence both series, $\sum_{k=1}^\infty \nu(B_{2k})$ and $\sum_{k=1}^\infty \nu(B_{2k-1})$, converge. Moreover
\[ \nu(A) = \nu\Big(\bigcup_{j=1}^\infty A_j\Big) \le \nu(A_j) + \nu\Big(\bigcup_{k=j}^\infty B_k\Big) \le \nu(A_j) + \sum_{k=j}^\infty \nu(B_k). \]
The last sum converges to 0 as $j \to \infty$, which implies the result.

Theorem 1.10 For every metric outer measure $\nu$, in particular for $\mathcal{H}^\alpha$, the Borel-σ-algebra $\mathcal{B}(X)$ is contained in the σ-algebra of $\nu$-measurable sets.

Proof: We only need to verify the $\nu$-measurability of closed sets. So let $E$ be closed. Consider any set $A$ and define $A_j = A \setminus E[\frac{1}{j}]$ where
\[ E\big[\tfrac{1}{j}\big] := \Big\{x \in X \;\Big|\; \mathrm{dist}(x,E) < \frac{1}{j}\Big\}. \]
Then $\mathrm{dist}(A_j, A \setminus A_{j+1}) \ge \frac{1}{j} - \frac{1}{j+1} > 0$. Since $E$ is closed we have $A \setminus E = \bigcup_j A_j$ and hence by the lemma $\nu(A_j) \to \nu(A \setminus E)$. On the other hand $\mathrm{dist}(A \cap E, A_j) > 0$ and so
\[ \nu(A) \ge \lim_{j\to\infty} \big(\nu(A \cap E) + \nu(A_j)\big) = \nu(A \cap E) + \nu(A \setminus E). \]
As the converse inequality is clear we have ν(A) = ν(A ∩ E) + ν(A \ E) for all A ⊂ X. Thus E belongs to the system E of all ν-measurable sets.
1.4 Lebesgue measure as Hausdorff measure
Every norm on $\mathbb{R}^n$ induces a translation invariant metric and hence translation invariant Hausdorff measures for all dimensions $\alpha$ between 0 and $n$. We are now interested in the special case $\alpha = n$. Since, up to a multiplicative constant, Lebesgue measure is the only translation invariant measure which assigns finite volume to bounded sets, the Hausdorff measures corresponding to different norms differ only by a constant. This constant will be determined in Theorem 1.17 below. To explain the idea we first treat the simplest case separately.
Theorem 1.11 In $\mathbb{R}^n$ Lebesgue measure $\lambda^n$ is the Hausdorff measure $\mathcal{H}^n$ with respect to the max-norm $\|x\|_\infty = \max_i |x_i|$.

Proof: 1. A particular property of this norm is that every bounded set $U \subset \mathbb{R}^n$ is contained in a $\|\cdot\|_\infty$-ball with diameter $|U|$. In fact for $i = 1,\dots,n$ let $a_i := \inf\{x_i \mid x \in U\}$ and $b_i := \sup\{x_i \mid x \in U\}$. Then $|U| = \max_{1\le i\le n}(b_i - a_i)$ and $U$ is contained in an equilateral box (or $\|\cdot\|_\infty$-ball) of this diameter.

2. Let now $E \subset \mathbb{R}^n$ and let $(U_i)$ be a $\delta$-covering of $E$. Without enlarging $\sum_i |U_i|^n$ we may assume that the $U_i$ are boxes because of the observation above. Then
\[ \sum_i |U_i|^n = \sum_i \lambda^n(U_i) \ge \lambda^n\Big(\bigcup_i U_i\Big) \ge \lambda^n(E). \tag{1.2} \]
Taking the infimum yields $\mathcal{H}^n(E) \ge \lambda^n(E)$.

3. For the converse estimate note that each equilateral box $W$ of side length $l$ can be decomposed, for each $k \in \mathbb{N}$, into $(2^k)^n$ small boxes of side length $l2^{-k}$. For $\delta > 0$ choose $k$ with $l2^{-k} < \delta$. These small boxes form a $\delta$-covering of $W$ with
\[ \sum_i |W_i|^n = (l2^{-k})^n \cdot (\text{number of small boxes}) = l^n = \lambda^n(W). \]
This implies $\mathcal{H}^n(W) \le \lambda^n(W)$ for all equilateral boxes. The same inequality then follows for all open and finally for all Borel sets.

Steps 2 and 3 of this proof have somewhat harder analogues for general norms. In step 2 the initial remark implies that the volume of any bounded set is at most equal to the volume of a ball with equal diameter. This 'isodiametric inequality' holds true for every norm, as we shall prove in Corollary 1.14 as a consequence of the Brunn-Minkowski inequality. In step 3 it was essential that the small boxes can be chosen to be disjoint. This will no longer be true for other norms, but if we disregard nullsets then Vitali's covering theorem 1.15 will give what we need.

Theorem 1.12 (Brunn-Minkowski inequality) For any two compact subsets $A, B$ of $\mathbb{R}^n$ one has
\[ \lambda^n(A + B)^{\frac{1}{n}} \ge \lambda^n(A)^{\frac{1}{n}} + \lambda^n(B)^{\frac{1}{n}}. \]

Proof: Since $\lambda^n$ is translation invariant we can shift the sets freely without changing the validity of the assertion.

1. First we treat the special case where $A, B$ are boxes with sides parallel to the axes and side lengths $a_1,\dots,a_n$ and $b_1,\dots,b_n$, respectively. Then $A + B = \{x + y \mid x \in A, y \in B\}$ is a box with side lengths $a_1 + b_1,\dots,a_n + b_n$. We thus need to show that
\[ \prod_{i=1}^n (a_i + b_i)^{\frac{1}{n}} \ge \prod_{i=1}^n a_i^{\frac{1}{n}} + \prod_{i=1}^n b_i^{\frac{1}{n}} \qquad \forall a_i, b_i \ge 0. \]
To see this divide by the left-hand side and put $\alpha_i := \frac{a_i}{a_i + b_i}$. The assertion is equivalent to
\[ 1 \ge \prod_{i=1}^n \alpha_i^{\frac{1}{n}} + \prod_{i=1}^n (1 - \alpha_i)^{\frac{1}{n}} \qquad \forall \alpha_i \in (0,1). \]
Now the geometric mean $(c_1 \cdots c_n)^{\frac{1}{n}}$ is always at most equal to the arithmetic mean $\frac{1}{n}\sum_{i=1}^n c_i$, which can be verified by applying Jensen's inequality to the concave function $\log$. Thus we get the desired estimate
\[ \prod_{i=1}^n \alpha_i^{\frac{1}{n}} + \prod_{i=1}^n (1 - \alpha_i)^{\frac{1}{n}} \le \frac{1}{n}\sum_{i=1}^n \alpha_i + \frac{1}{n}\sum_{i=1}^n (1 - \alpha_i) = \frac{1}{n}\sum_{i=1}^n 1 = 1. \]
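The box case of the Brunn-Minkowski inequality is easy to test numerically. The following sketch, with the hypothetical helper name `brunn_minkowski_gap`, samples random side lengths and checks that the difference of both sides is nonnegative up to rounding; it is a sanity check, not a proof.

```python
import math
import random

def brunn_minkowski_gap(a, b):
    """lambda(A+B)^(1/n) - lambda(A)^(1/n) - lambda(B)^(1/n) for axis-parallel boxes.

    a and b hold the side lengths of A and B; A + B then has side lengths a_i + b_i.
    """
    n = len(a)
    volume_of_sum = math.prod(x + y for x, y in zip(a, b))
    return volume_of_sum ** (1 / n) - math.prod(a) ** (1 / n) - math.prod(b) ** (1 / n)

rng = random.Random(0)
for n in (1, 2, 3, 7):
    for _ in range(1000):
        a = [rng.uniform(0.0, 5.0) for _ in range(n)]
        b = [rng.uniform(0.0, 5.0) for _ in range(n)]
        assert brunn_minkowski_gap(a, b) >= -1e-9   # tolerance for floating point rounding

# homothetic boxes (b = 2a): both sides agree up to rounding
print(brunn_minkowski_gap([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Equality for homothetic boxes corresponds to the case where all $\alpha_i$ coincide, in which the arithmetic and geometric means agree.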
2. Now suppose that $A$ and $B$ are unions of $k$ resp. $m$ such boxes which are moreover quasidisjoint. Here we call two sets $C, D$ quasidisjoint if their intersection is contained in a hyperplane. We give a proof by induction with respect to $k$ and $m$. The case $k = m = 1$ has already been treated. So assume without loss of generality $k \ge 2$. We can split $A$ by a hyperplane into two sets $A', A''$, each of which consists of fewer than $k$ boxes. We may even assume that the separating hyperplane is parallel to $\{x_n = 0\}$, and by suitable shifting we get $A' \subset \{x_n \ge 0\}$, $A'' \subset \{x_n \le 0\}$. Let $r := \frac{\lambda^n(A')}{\lambda^n(A)} \in (0,1)$. Next split the set $B$ into two quasidisjoint sets $B', B''$ such that
\[ \frac{\lambda^n(B')}{\lambda^n(B)} = r. \]
Moreover, again by shifting we may suppose $B' \subset \{x_n \ge 0\}$, $B'' \subset \{x_n \le 0\}$. Then $A + B$ contains $(A' + B') \cup (A'' + B'')$, where the two sets in the brackets are quasidisjoint and have a smaller number of boxes. Thus by induction
\begin{align*}
\lambda^n(A + B) &\ge \lambda^n(A' + B') + \lambda^n(A'' + B'') \\
&\ge \big[\lambda^n(A')^{\frac{1}{n}} + \lambda^n(B')^{\frac{1}{n}}\big]^n + \big[\lambda^n(A'')^{\frac{1}{n}} + \lambda^n(B'')^{\frac{1}{n}}\big]^n \\
&= r\big[\lambda^n(A)^{\frac{1}{n}} + \lambda^n(B)^{\frac{1}{n}}\big]^n + (1 - r)\big[\lambda^n(A)^{\frac{1}{n}} + \lambda^n(B)^{\frac{1}{n}}\big]^n \\
&= \big[\lambda^n(A)^{\frac{1}{n}} + \lambda^n(B)^{\frac{1}{n}}\big]^n,
\end{align*}
which proves the assertion in case 2.

3. If $A, B$ are general compact sets then there are two sequences $(A_l), (B_l)$ of finite unions of boxes which decrease to $A$ and $B$ respectively, and the theorem follows by passing to the limit.
As an application, the reader may easily verify the following

Corollary 1.13 Let $C \subseteq \mathbb{R}^{n+1}$ be convex. For $x \in \mathbb{R}^{n+1}$, $t \in \mathbb{R}$ consider the level sets $C_t := \{y \in C \mid \langle x, y\rangle = t\}$. Then the map $t \mapsto \lambda^n(C_t)^{\frac{1}{n}}$ is concave on the interval $\{t : C_t \ne \emptyset\}$.

Our main application of the Brunn-Minkowski inequality is the isodiametric inequality for general norms on $\mathbb{R}^n$. The simple proof is taken from [3].

Corollary 1.14 (Isodiametric inequality for norms) Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$ and let $C$ be a bounded Lebesgue-measurable set. Then the volume of $C$ is at most equal to the volume of a $\|\cdot\|$-ball with equal diameter.

Proof: By definition of the diameter we have $|C| = \sup\{\|x - y\| \mid x, y \in C\}$. Therefore $C - C$ is contained in the ball $B(0,|C|)$. Thus by the Brunn-Minkowski inequality
\[ 2^n \lambda^n\big(B(0, \tfrac{|C|}{2})\big) = \lambda^n(B(0,|C|)) \ge \lambda^n(C - C) \ge \big[\lambda^n(C)^{\frac{1}{n}} + \lambda^n(-C)^{\frac{1}{n}}\big]^n = 2^n \lambda^n(C), \]
which implies the assertion since the ball $B(0, \frac{|C|}{2})$ has diameter $|C|$.
The second tool for relating the Hausdorff measure $\mathcal{H}^n$ with respect to general norms to Lebesgue measure is the Vitali covering theorem. It will also be useful later on.

Definition 1.6 Let $E$ be a subset of the metric space $(X,d)$. Let $\Phi$ be a class of subsets of $X$ such that for all $\varepsilon > 0$ and $x \in E$ there is some $U \in \Phi$ with $x \in U$ and $|U| < \varepsilon$. Then the family $\Phi$ is called a Vitali class for $E$.

Theorem 1.15 (Vitali's covering theorem) Let $E \subset X$ and let $\Phi$ be a Vitali class for $E$ consisting of closed subsets of $X$. Then there is a (finite or infinite) sequence $(U_i)$ of disjoint elements of $\Phi$ such that for each $\alpha \ge 0$ either $\sum_i |U_i|^\alpha = +\infty$ or $\mathcal{H}^\alpha(E \setminus \bigcup_i U_i) = 0$.

Proof: We are going to choose the sets $U_i$ recursively. Without loss of generality we may assume that all elements of $\Phi$ have diameter at most 1, since the others are not needed in the definition of a Vitali class. We start with any $U_1 \in \Phi$. For the recursion step suppose that $U_1, U_2, \dots, U_m$ are already chosen. Let
\[ d_m := \sup\{|U| \mid U \in \Phi,\ U \cap U_i = \emptyset \text{ for } i = 1,\dots,m\}. \tag{1.3} \]
Now choose $U_{m+1} \in \Phi$ such that $|U_{m+1}| > \frac{1}{2} d_m$ and $U_{m+1} \cap U_i = \emptyset$ for $i \le m$. If $d_m = 0$ then stop. This is the construction.
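For a finite family of closed intervals the recursive selection can be imitated directly. The sketch below is an illustration under our own simplifications: it always takes a largest admissible interval next, which is one way to satisfy $|U_{m+1}| > \frac{1}{2} d_m$ in a finite family, and it enlarges each chosen interval to the ball with the same midpoint and radius three times its diameter. The helper names are hypothetical.

```python
import random

def disjoint(u, v):
    """Closed intervals u = (a, b), v = (c, d) are disjoint iff one lies strictly left of the other."""
    return u[1] < v[0] or v[1] < u[0]

def greedy_disjoint_subfamily(family):
    """Select pairwise disjoint intervals, always taking a largest admissible one next."""
    chosen = []
    for u in sorted(family, key=lambda iv: iv[1] - iv[0], reverse=True):
        if all(disjoint(u, v) for v in chosen):
            chosen.append(u)
    return chosen

def enlarged(v):
    """Closed ball centred at the midpoint of v with radius 3 * |v|."""
    mid, diam = (v[0] + v[1]) / 2, v[1] - v[0]
    return (mid - 3 * diam, mid + 3 * diam)

rng = random.Random(1)
family = []
for _ in range(200):
    left = rng.uniform(0.0, 1.0)
    family.append((left, left + rng.uniform(0.0, 0.2)))

chosen = greedy_disjoint_subfamily(family)
balls = [enlarged(v) for v in chosen]
# every interval of the family lies inside one of the enlarged balls
assert all(any(b[0] <= u[0] and u[1] <= b[1] for b in balls) for u in family)
print(len(family), len(chosen))
```

In this finite setting every discarded interval meets a chosen interval that is at least as long, which is why the enlarged balls swallow the whole family.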
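In order to show that these sets have the desired property let, for each $i$, $B_i$ be a ball with center in $U_i$ and radius $3|U_i|$. We claim that $\sum_i |U_i|^\alpha < \infty$ implies
\[ E \setminus \bigcup_{i=1}^k U_i \subset \bigcup_{i=k+1}^\infty B_i. \tag{1.4} \]
Let $x \in E \setminus \bigcup_{i=1}^k U_i$. Since the $U_i$ are closed we have $d(x, \bigcup_{i=1}^k U_i) > 0$. The Vitali property implies that there is some $U \in \Phi$ with $x \in U$ and $U \cap U_i = \emptyset$ for all $i \le k$. Since $|U_{m+1}| > \frac{1}{2} d_m$ and the above series converges, we have $d_m \to 0$, so there is a first index $m$ with $|U| > d_m$. This inequality implies $U \cap U_j \ne \emptyset$ for some $j \le m$, because otherwise $U$ would have been admissible in the definition (1.3) of the number $d_m$. Let $j$ be the smallest such index. Now let $x_j$ be the center of $B_j$ and let $y$ be any element of $U \cap U_j$. Since $U$ is disjoint from $U_1, \dots, U_{j-1}$ we get, for all $z \in U$,
\[ d(y,z) \le |U| \le d_{j-1} < 2|U_j| \]
and hence
\[ d(x_j, z) \le d(x_j, y) + d(y, z) \le 3|U_j|. \]
Thus $U \subset B_j$ and in particular $x \in B_j$. Of course the index $j$ is larger than $k$ since $U \cap U_i = \emptyset$ for $i \le k$. This proves (1.4). As a consequence we get
\[ \mathcal{H}^\alpha\Big(E \setminus \bigcup_{i=1}^k U_i\Big) \le \mathcal{H}^\alpha\Big(\bigcup_{i=k+1}^\infty B_i\Big) \le \sum_{i=k+1}^\infty |B_i|^\alpha \le 6^\alpha \sum_{i=k+1}^\infty |U_i|^\alpha \]
and hence
\[ \mathcal{H}^\alpha\Big(E \setminus \bigcup_{i=1}^\infty U_i\Big) \le \lim_{k\to\infty} 6^\alpha \sum_{i=k+1}^\infty |U_i|^\alpha = 0. \]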
For many purposes the following corollary is sufficient, in which the first alternative in the conclusion does not appear.

Corollary 1.16 Let $\Phi$ be a family of closed balls (for some norm on $\mathbb{R}^m$) and let $A \subset \mathbb{R}^m$ be a set such that each point in $A$ is contained in a member of $\Phi$ of arbitrarily small radius. Then there is a sequence of disjoint elements of $\Phi$ whose union covers all of $A$ except for a $\lambda^m$-nullset.

Proof: As in the previous proof we may assume that the radii of the members of $\Phi$ are uniformly bounded. Let $A_1 = A \cap B(0,1)$. Choose a sequence $(U^1_i)$ of members of $\Phi$ for the set $E = A_1$ according to the Theorem such that in addition $A_1 \cap U^1_i \ne \emptyset$. Then the union of the $U^1_i$ is bounded and hence only the second alternative in the conclusion of the Theorem is possible, i.e. the union of the $U^1_i$ covers all of $A_1$ except for a nullset with respect to $\mathcal{H}^\alpha$ where $\alpha = m$, and hence with respect to $\lambda^m$, since we already know that Lebesgue measure coincides up to a constant with $m$-dimensional Hausdorff measure. Choose an index $n_1$ such that $\bigcup_{i=1}^{n_1} U^1_i$ covers all of $A_1$ up to a set of measure $2^{-1}$.

Recursively suppose that we have already defined $A_j$, $n_j \in \mathbb{N}$ and pairwise disjoint sets $U^j_i \in \Phi$ for $j < k$ such that
\[ \lambda\Big(A_j \setminus \bigcup_{i=1}^{n_j} U^j_i\Big) < 2^{-j}. \tag{1.5} \]
Then we set
\[ A_k = A \cap B(0,k) \setminus \bigcup_{j<k,\ i \le n_j} U^j_i. \tag{1.6} \]
Since the balls $U^j_i$ are closed, the family $\Phi_k$ of members of $\Phi$ which are disjoint from the $U^j_i$ with $j < k$, $i \le n_j$ is still a Vitali class for $A_k$. Thus we can find $n_k$ and sets $U^k_i$ in $\Phi_k$ such that (1.5) holds with $j$ replaced by $k$. Because of (1.5) and (1.6) the set of points of $A$ which belong to $A_k$ for all sufficiently large $k$ is a Lebesgue nullset, and each point in $A$ is either in this nullset or it is contained in one member of the disjoint countable collection of balls $U^k_i$ ($k \in \mathbb{N}$, $i \le n_k$). This completes the proof.

Now we are in the position to prove our main result of this section.

Theorem 1.17 Let $\|\cdot\|$ be any norm on $\mathbb{R}^n$ and let $c_n = \lambda^n\{x \mid \|x\| \le 1\}$. Then $\lambda^n = \frac{c_n}{2^n} \mathcal{H}^n_{\|\cdot\|}$, where $\mathcal{H}^n_{\|\cdot\|}$ is the $n$-dimensional Hausdorff measure corresponding to the metric which is induced by $\|\cdot\|$.

Proof: 1. The proof of $\frac{c_n}{2^n} \mathcal{H}^n_{\|\cdot\|} \ge \lambda^n$ is similar to part 2 in the proof of Theorem 1.11, except that in the estimate (1.2) the first equality is replaced by the inequality
\[ \sum_{i=1}^\infty \frac{c_n}{2^n} |U_i|^n \ge \sum_{i=1}^\infty \lambda^n(U_i), \]
which follows from the isodiametric inequality 1.14.

2. Similarly, for the reverse inequality we can now use essentially the same argument as in part 3 of the proof of Theorem 1.11, since now due to the Vitali covering theorem we know that every 'simple' set (e.g. a box) can be covered up to a Lebesgue nullset by a sequence of disjoint small $\|\cdot\|$-balls.

The previous result concerns only the highest dimensional nontrivial Hausdorff measure in $\mathbb{R}^n$. The lower dimensional Hausdorff measures for non-Euclidean norms are no longer rotation invariant. As an example, the one-dimensional Hausdorff measure $\mathcal{H}^1_{\|\cdot\|}$ of any interval $[x,y]$ is equal to the norm $\|y - x\|$ of the difference of the end points of the interval. But this quantity is clearly only rotation invariant if the norm is. In order to get a rotation invariant Hausdorff measure $\mathcal{H}^s$ for $s < n$ we have to use the Euclidean norm. On the other hand, for integer dimensions and flat subsets of the same dimension one would like the Hausdorff measure to coincide with Lebesgue measure. This is the reason why in many texts on geometric measure theory the $\alpha$-dimensional Hausdorff measure is defined by
\[ \mathcal{H}^\alpha(E) = \lim_{\delta\to 0} \inf\Big\{ \sum_i \frac{\omega_\alpha}{2^\alpha} |U_i|^\alpha \;\Big|\; (U_i) \text{ is a } \delta\text{-covering of } E \Big\} \tag{1.7} \]
where for $\alpha \in \mathbb{N}$ the number $\omega_\alpha$ is chosen as the Lebesgue volume of the $\alpha$-dimensional Euclidean unit ball and $|U_i|$ denotes the Euclidean diameter of $U_i$. This measure is clearly rotation invariant and according to our Theorem 1.17 it coincides indeed with Lebesgue measure for $\alpha = n$. But it differs precisely by the factor $\frac{\omega_\alpha}{2^\alpha}$ from our definition of Hausdorff measure (which is the one in many texts on fractal geometry). In order to avoid ambiguity we sometimes speak of the measure defined by (1.7) as the Euclidean Hausdorff measure.
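The normalising factor $c_n/2^n$ from Theorem 1.17 is easy to tabulate. The sketch below uses the standard Gamma-function formula for the volume of the Euclidean unit ball, which is not derived in these notes, and the fact that the max-norm unit ball is the cube $[-1,1]^n$; the function names are ours.

```python
import math

def euclidean_unit_ball_volume(n):
    """Lebesgue volume omega_n of the Euclidean unit ball in R^n (Gamma-function formula)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def lebesgue_factor(n, unit_ball_volume):
    """The factor c_n / 2**n in lambda^n = (c_n / 2**n) * H^n for a norm with unit ball volume c_n."""
    return unit_ball_volume / 2 ** n

for n in (1, 2, 3):
    max_norm = lebesgue_factor(n, 2.0 ** n)   # unit ball of the max-norm is the cube [-1, 1]^n
    euclid = lebesgue_factor(n, euclidean_unit_ball_volume(n))
    print(n, max_norm, euclid)
```

For the max-norm the factor is 1 in every dimension, in line with Theorem 1.11, while for the Euclidean norm it is 1 only for $n = 1$.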
1.5 Determining the Hausdorff dimension: Some techniques
For an upper estimate of the Hausdorff dimension of a given set it is enough to find one sufficiently efficient covering. For a lower estimate of the Hausdorff dimension one has to prove something for all conceivable coverings. One useful procedure for this is provided by the mass distribution principle.

Definition 1.7 Let $E \subseteq X$. A measure $\mu$ on the Borel σ-algebra $\mathcal{B}(X)$ is called a mass distribution over $E$ if there is a compact subset $A$ of $E$ such that $\mu(A^c) = 0$ and $0 < \mu(A) < \infty$.

Theorem 1.18 (Mass distribution principle) Let $\mu$ be a mass distribution over $E$ such that for some $\alpha \ge 0$ and some positive constants $c$ and $\delta$ we have $\mu(U) \le c|U|^\alpha$ for all $U$ with $|U| < \delta$. Then
\[ \mathcal{H}^\alpha(E) \ge \frac{\mu(E)}{c} \quad \text{and} \quad \dim E \ge \alpha. \]

Proof: We may suppose without loss of generality $\mathcal{H}^\alpha(E) < \infty$. Let $(U_i)$ be a $\delta$-covering of $E$ satisfying $\sum_i |U_i|^\alpha \le \mathcal{H}^\alpha(E) + \varepsilon$. Then
\[ 0 < \mu(E) \le \mu\Big(\bigcup_i U_i\Big) \le \sum_i \mu(U_i) \le \sum_i c|U_i|^\alpha \le c\,\mathcal{H}^\alpha(E) + c\varepsilon \]
and hence $\infty > \mathcal{H}^\alpha(E) \ge \frac{\mu(E)}{c} > 0$, which implies the first assertion. In particular we have $\dim E \ge \alpha$.
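As an illustration of the hypothesis $\mu(U) \le c|U|^\alpha$, consider the Cantor set $C$ of Example 1.2 with the measure $\mu$ that gives mass $2^{-n}$ to each of the $2^n$ intervals of $C_n$. This choice of $\mu$ is our assumption for the example and is not made in the text above. The sketch evaluates $\mu([a,b])$ through its distribution function and compares it with $|b-a|^\alpha$ for $\alpha = \log 2/\log 3$ on a grid of intervals; the function names are hypothetical.

```python
import math

ALPHA = math.log(2) / math.log(3)

def cantor_cdf(x, depth=40):
    """Distribution function F(x) = mu([0, x]) of the Cantor measure, up to about 2**-depth."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)              # next ternary digit of x
        x -= digit
        if digit == 1:              # x lies in a removed middle third, where F is constant
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2.0
    return value

def worst_ratio(grid=300):
    """Largest mu([a, b]) / |b - a|**ALPHA over all intervals with end points i / grid."""
    cdf = [cantor_cdf(i / grid) for i in range(grid + 1)]
    return max((cdf[j] - cdf[i]) / ((j - i) / grid) ** ALPHA
               for i in range(grid + 1) for j in range(i + 1, grid + 1))

print(worst_ratio())
```

Since an interval $U$ with $3^{-(k+1)} \le |U| < 3^{-k}$ meets at most one interval of $C_k$, one has $\mu(U) \le 2^{-k} \le 2|U|^\alpha$, so the principle applies with $c = 2$ and yields $\mathcal{H}^\alpha(C) \ge \frac{1}{2}$ and $\dim C \ge \log 2/\log 3$. The grid search above is only a numerical sanity check of that bound.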
As a first simple application we get

Corollary 1.19 Let $E \subset \mathbb{R}^n$ be Lebesgue measurable with $\lambda^n(E) > 0$. Then $\dim E = n$.

Proof: Since $\lambda^n$ is inner regular there is a compact set $A \subset E$ with $0 < \lambda^n(A) < \infty$. Then $\mu = \lambda^n(A \cap \cdot)$ is a mass distribution over $E$. For every bounded set $U$ one has $\lambda^n(U) \le c|U|^n$ with a constant $c$ depending only on $n$; the optimal constant is given by the isodiametric inequality.

Another useful principle is the application of Lipschitz maps.

Theorem 1.20 (Lipschitz principle) Let $E, F$ be subsets of metric spaces. We assume that there is a surjective Lipschitz map $f : E \to F$ with Lipschitz constant $L$. Then $\mathcal{H}^\alpha(F) \le L^\alpha \mathcal{H}^\alpha(E)$ for all $\alpha \ge 0$ and $\dim F \le \dim E$.

Proof: Again without loss of generality $\mathcal{H}^\alpha(E) < \infty$. Let $(U_i)$ be a $\delta$-covering of $E$. Let $V_i := f(U_i \cap E) \subset F$. Then
\[ |V_i| = \sup\{d(f(x), f(y)) \mid x, y \in E \cap U_i\} \le L|U_i| \]
and hence $(V_i)$ is an $L\delta$-covering of $F$ with
\[ \sum_i |V_i|^\alpha \le L^\alpha \sum_i |U_i|^\alpha \le L^\alpha(\mathcal{H}^\alpha(E) + \varepsilon) \]
if $\sum_i |U_i|^\alpha \le \mathcal{H}^\alpha(E) + \varepsilon$. Letting $\delta$ and $\varepsilon$ tend to zero we get $\mathcal{H}^\alpha(F) \le L^\alpha \mathcal{H}^\alpha(E)$. In particular $\mathcal{H}^\alpha(E) = 0$ implies $\mathcal{H}^\alpha(F) = 0$, and hence $\dim F \le \dim E$.

Corollary 1.21 Isometries leave Hausdorff measure and dimension invariant.

As an illustration of this principle we consider Lipschitz curves.

Theorem 1.22 Let $\gamma : [0,1] \to \mathbb{R}^n$ be a nonconstant Lipschitz curve. Then
a) $\dim(\gamma[0,1]) = 1$.
b) If $\gamma$ is injective then the arc length is given by the 1-dimensional Hausdorff measure:
\[ \mathcal{H}^1(\gamma[0,1]) = \sup\Big\{ \sum_i |\gamma(t_i) - \gamma(t_{i-1})| \;\Big|\; \{t_0 < \dots < t_n\} \subset [0,1] \Big\}. \]
Proof: 1. Since $\gamma$ is a Lipschitz map we have $\dim \gamma[0,1] \le \dim[0,1] = 1$ by the previous theorem. Since $\gamma$ is not constant there are two points $x < y$ in $[0,1]$ such that $\gamma(x) \ne \gamma(y)$. Let $\pi : \mathbb{R}^n \to g$ be the orthogonal projection onto the line $g$ through $\gamma(x)$ and $\gamma(y)$. For the Euclidean norm $\pi$ is Lipschitz with constant 1. Thus the theorem implies $\dim \gamma[0,1] \ge \dim \gamma[x,y] \ge \dim \pi\gamma[x,y] = 1$, since $\pi\gamma[x,y]$ is connected and hence contains a nondegenerate subinterval of the line $g$, which in turn is isometric to $\mathbb{R}$.

2. If moreover $\gamma$ is injective then we have for an arbitrary partition $t_0 < \dots < t_n$ of $[0,1]$
\[ \mathcal{H}^1(\gamma[0,1]) = \sum_i \mathcal{H}^1(\gamma[t_{i-1},t_i]) \ge \sum_i \mathcal{H}^1(\text{interval between } \gamma(t_{i-1}), \gamma(t_i)) = \sum_i |\gamma(t_i) - \gamma(t_{i-1})| \]
and hence $\mathcal{H}^1(\gamma[0,1]) \ge l(1)$, where $l$ denotes the arc length
\[ l(t) := \sup\Big\{ \sum_i |\gamma(t_i) - \gamma(t_{i-1})| \;\Big|\; \{t_0 < \dots < t_n\} \subset [0,t] \Big\}. \]
It remains to be shown that $\mathcal{H}^1(\gamma[0,1]) \le l(1)$: The function $l$ is strictly increasing because of injectivity, it is continuous, and the inverse function $l^{-1}$ satisfies
\[ |\gamma \circ l^{-1}(l(b)) - \gamma \circ l^{-1}(l(a))| = |\gamma(b) - \gamma(a)| \le l(b) - l(a) \]
by definition of $l$. Therefore $\gamma \circ l^{-1}$ has a Lipschitz constant $\le 1$. So again our theorem implies
\[ \mathcal{H}^1(\gamma[0,1]) = \mathcal{H}^1(\gamma \circ l^{-1}([0,l(1)])) \le \mathcal{H}^1([0,l(1)]) = l(1). \]

Attention: Sets with $\dim E = 1$ (so-called 1-sets) can look quite different from Lipschitz curves.

Example 1.4 (Cantor-like set in $\mathbb{R}^2$) We construct a subset $E$ of the plane recursively, starting with the unit square $E_0$. In the $k$-th step we assume that $E_k$ is the disjoint union of $4^k$ squares, the sides of each being parallel to the axes and having length $4^{-k}$. Let $Q$ be one of these squares. Then choose the four subsquares of $Q$ with side length $4^{-(k+1)}$ which are situated in the four corners of $Q$. Then let $E_{k+1}$ be the union of these $4^{k+1}$ smaller squares. Having defined the sequence $(E_k)$, finally let $E$ be their intersection $E = \bigcap_{k=0}^\infty E_k$.
We want to prove that $E$ has dimension 1. If $\delta > 0$ and $k$ is chosen such that $\sqrt{2} \cdot 4^{-k} < \delta$, then the $4^k$ squares $Q$ which constitute $E_k$ have diameter $|Q| = \sqrt{2} \cdot 4^{-k}$ and form a $\delta$-covering of $E$ with $\sum_Q |Q|^1 = 4^k \cdot \sqrt{2} \cdot 4^{-k} = \sqrt{2}$. This implies $\mathcal{H}^1(E) \le \sqrt{2}$ and hence $\dim E \le 1$. For "$\ge$" we construct a Lipschitz map $f : E \to \mathbb{R}$ whose image contains a nondegenerate interval. Actually we give two completely different constructions.

1. Consider the following coding of the points of $E$ by real numbers which are represented in 4-adic expansion. For every $x \in E$ suppose that the first digits $y_1(x), \dots, y_{k-1}(x)$ of the code $y(x)$ are already determined by the square of the $(k-1)$-th level in which $x$ lies. Then $x$ lies in exactly one of the corresponding four subsquares of the $k$-th level. Let $y_k(x)$ be the number in $\{0,1,2,3\}$ of this subsquare. Then $y_1(x), \dots, y_k(x)$ determine the unique $k$-th level subsquare containing $x$. We claim that the associated map $f : x \mapsto \sum_{k=1}^{\infty} y_k(x) 4^{-k}$ from $E$ to the unit interval is Lipschitz and surjective. Surjectivity is clear since every possible sequence in $\{0,1,2,3\}^{\mathbb{N}}$ is used. For the verification of the Lipschitz property let $x, x'$ be any two distinct points of $E$. If $|x - x'| \ge 1$ there is nothing to show since $f$ takes its values in $[0,1]$. Otherwise choose $k \ge 0$ such that $4^{-(k+1)} \le |x - x'| < 4^{-k}$. Then $x, x'$ are in the same square of level $k$ and hence the first $k$ digits of the 4-adic numbers $f(x)$ and $f(x')$ coincide. This implies $|f(x) - f(x')| \le 4^{-k} \le 4|x - x'|$.

2. Our second idea is to consider the orthogonal projection $\pi$ onto a line with slope $1/2$. Clearly this projection is Lipschitz, and one can verify that the image set $\pi(E_1)$ is a full interval since the images of the four squares just touch each other. By induction the image set $\pi(E_k)$ is then the same interval, and hence even $\pi(E)$ is an interval. This implies $\dim(E) \ge 1$.

Remark The last argument does not work for lines with arbitrary slope. Actually we shall prove later that for almost every direction the image of $E$ under the corresponding projection is a one-dimensional nullset.
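The touching of the projected squares in the second construction can be checked with exact arithmetic. The following is a minimal sketch, not part of the original notes: the helper names `corner_squares`, `projected_intervals` and `covers_without_gaps` are made up for this illustration, and the projection is used in the unnormalised form $(x,y) \mapsto ax + by$ with $(a,b) = (2,1)$ for the line of slope $1/2$ (the true orthogonal projection differs only by the factor $1/\sqrt{5}$).

```python
from fractions import Fraction


def corner_squares(level):
    """Lower-left corners and side length of the 4**level squares forming E_level."""
    squares = [(Fraction(0), Fraction(0))]
    side = Fraction(1)
    for _ in range(level):
        side /= 4  # side length of the new subsquares
        # the four corner subsquares sit at offsets 0 and 3*side in each coordinate
        squares = [(x + dx * 3 * side, y + dy * 3 * side)
                   for (x, y) in squares for dx in (0, 1) for dy in (0, 1)]
    return squares, side


def projected_intervals(level, direction):
    """Images of the squares of E_level under (x, y) -> a*x + b*y, for a, b >= 0."""
    a, b = direction
    squares, side = corner_squares(level)
    return sorted((a * x + b * y, a * x + b * y + (a + b) * side)
                  for x, y in squares)


def covers_without_gaps(intervals, lo, hi):
    """True if the sorted closed intervals cover [lo, hi] completely."""
    reach = lo
    for left, right in intervals:
        if left > reach:  # a gap between consecutive images
            return False
        reach = max(reach, right)
    return reach >= hi


# Slope 1/2: the images tile [0, 3] at every level that is checked here.
tiling_ok = all(covers_without_gaps(projected_intervals(k, (2, 1)), 0, 3)
                for k in range(6))
# Projection onto the x-axis: already the first level leaves a gap.
axis_ok = covers_without_gaps(projected_intervals(1, (1, 0)), 0, 1)
print(tiling_ok, axis_ok)
```

At level $k$ there are $4^k$ image intervals of length $3 \cdot 4^{-k}$, so their total length equals the length of $[0,3]$; covering without gaps therefore means that they overlap at endpoints only, which is the "just touching" used in the argument. The comparison with the coordinate axis illustrates the remark that the argument depends on the slope.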
Chapter 2
Self-similar sets

2.1 Definitions
Roughly speaking, a set is self-similar if it is formed by the union of several smaller copies of itself.

Example 2.1
1. The Sierpinski triangle is the union of three translates of a copy of itself, each of half of the original size.
2. The Koch curve consists of 4 parts, each being a copy of the original curve scaled down by the factor $\frac{1}{3}$.

Definition 2.1
a) Let $(X, d)$ be a metric space. A contraction is a map $\varphi : X \to X$ with Lipschitz constant $c < 1$.
b) Let $\varphi_1, \dots, \varphi_n$ be finitely many contractions. A non-empty compact set $K \subset X$ is an attractor of $(\varphi_1, \dots, \varphi_n)$ if $K = \bigcup_{i=1}^{n} \varphi_i(K)$.
c) If $K$ has a representation as an attractor of finitely many contractions then $K$ is called self-similar in the wide sense.
d) If the maps $\varphi_i$ can even be chosen to be similarities, i.e. $d(\varphi_i(x), \varphi_i(y)) = r_i\, d(x, y)$ for all $x, y \in X$ for suitable $r_i \in (0,1)$, then the attractor $K$ is called self-similar (in the strict sense).

Example 2.2 The Cantor middle third set $K$ is self-similar: Choose $\varphi_1 x = \frac{1}{3} x$ and $\varphi_2 x = \frac{2}{3} + \frac{1}{3} x$. Clearly these two maps are similarities.

Proof of $K = \varphi_1 K \cup \varphi_2 K$:
$\supseteq$. Let $x = \sum_k \frac{x_k}{3^k}$ with $x_k \in \{0, 2\}$ be the ternary representation of $x \in K$. Then both points
\[ \varphi_1 x = \sum_k \frac{y_k}{3^k} \quad \text{where } y_1 = 0,\ y_{k+1} = x_k, \]
\[ \varphi_2 x = \sum_k \frac{y_k}{3^k} \quad \text{where } y_1 = 2,\ y_{k+1} = x_k, \]
are in $K$.
$\subseteq$. Let $z = \sum_k \frac{z_k}{3^k} \in K$ with $z_k \in \{0, 2\}$. Then
\[ z = \begin{cases} \varphi_1 x \text{ for suitable } x \in K & \text{if } z_1 = 0, \\ \varphi_2 x \text{ for suitable } x \in K & \text{if } z_1 = 2. \end{cases} \]
The existence and uniqueness of attractors is ensured in great generality by the following result.

Theorem 2.1 Let $(X, d)$ be a complete metric space. Let $\varphi_1, \dots, \varphi_n$ be finitely many contractions. Then there is a unique non-empty compact set $K \subset X$ such that $K = \bigcup_{i=1}^{n} \varphi_i K$.

The main tool in the proof is the following metric between compact sets:

Definition 2.2 Let $\mathcal{K} = \{K \subset X \mid K \ne \emptyset,\ K \text{ compact}\}$ and let $K, L \in \mathcal{K}$. Then define the Hausdorff distance between $K$ and $L$ by
\[ d_H(K, L) = \inf\{\delta > 0 \mid K \subset L[\delta] \text{ and } L \subset K[\delta]\}, \]
where $K[\delta]$ denotes the $\delta$-neighbourhood of a set $K$, i.e. $K[\delta] = \{x \in X \mid \operatorname{dist}(x, K) < \delta\}$.

Lemma 2.2 If $(X, d)$ is a complete metric space then so is $(\mathcal{K}, d_H)$.

Proof (of the lemma): The inequality $d_H(K, L) \ge 0$ and the symmetry of $d_H$ are obvious. If $K \ne L$, e.g. if there is some $x \in K \setminus L$, then $d_H(K, L) \ge \operatorname{dist}(x, L) > 0$ since $L$ is compact. The triangle inequality follows from
\[ K \subset L[\delta],\ L \subset M[\varepsilon] \ \Rightarrow\ K \subset M[\varepsilon + \delta]. \]
So the main point is the completeness. Let $(K_n)$ be a Cauchy sequence. Let $\varepsilon_n := \sup\{d_H(K_l, K_m) \mid l, m \ge n\}$. Without loss of generality (passing to a subsequence) we can assume that
\[ \sum_{n=1}^{\infty} \varepsilon_n < \infty. \]
Let
\[ K = \bigcap_{n=1}^{\infty} \overline{\bigcup_{m=n}^{\infty} K_m}. \]
We claim that $K \in \mathcal{K}$ and $d_H(K_n, K) \to 0$ as $n \to \infty$.

a) $K$, being the intersection of closed sets, is closed. Moreover
\[ K \subset K_n[\varepsilon_n] \quad \text{for each } n. \tag{2.1} \]
$K$ is totally bounded: Let $\varepsilon > 0$ be given. Choose $n$ such that $\varepsilon_n < \frac{\varepsilon}{2}$. Cover $K_n$ by finitely many balls, $K_n \subset \bigcup_{i=1}^{m} B(x_i, \frac{\varepsilon}{2})$. Then $K \subset \bigcup_{i=1}^{m} B(x_i, \varepsilon)$. Being closed and totally bounded in a complete space, $K$ is compact.

b) We claim now
\[ K_n \subset K\Big[\sum_{j=n}^{\infty} \varepsilon_j\Big]. \tag{2.2} \]
Let $x \in K_n$. Put $x_n = x$. Choose recursively $x_{k+1} \in K_{k+1}$ with $d(x_k, x_{k+1}) < \varepsilon_k$, which is possible since $d_H(K_k, K_{k+1}) < \varepsilon_k$. Then $\sum_k \varepsilon_k < \infty$ implies that $(x_k)$ is a Cauchy sequence in $(X, d)$. Let $x^* = \lim x_k$, which exists by the completeness of $X$. Then $x^* \in K$ and $d(x, x^*) = \lim_{k \to \infty} d(x_n, x_k) \le \sum_{k=n}^{\infty} \varepsilon_k$, i.e. (2.2) holds. In particular $K$ is non-empty. (2.1) and (2.2) together imply $d_H(K_n, K) \to 0$ as $n \to \infty$ and hence the assertion.

Proof (of the theorem): Define $\Phi : \mathcal{K} \to \mathcal{K}$ by $\Phi(K) = \bigcup_{i=1}^{n} \varphi_i K$. Then $\Phi$ is a contraction of the space $\mathcal{K}$: Let $K, L \in \mathcal{K}$ be given. Then
\[ d_H(\Phi(K), \Phi(L)) = d_H\Big(\bigcup_{i=1}^{n} \varphi_i K, \bigcup_{i=1}^{n} \varphi_i L\Big) \le \max_{1 \le i \le n} d_H(\varphi_i K, \varphi_i L) \le \Big(\max_{1 \le i \le n} c_i\Big)\, d_H(K, L). \]
Then by Banach's fixed point theorem there is a unique set $K \in \mathcal{K}$ with $\Phi(K) = K$. This implies the theorem.

Remark 2.1 The iterative procedure $K_{n+1} = \Phi K_n$ converges for every initial set $K_0 \in \mathcal{K}$ to the unique attractor $K$.

Example 2.3 Let $\varphi_1, \varphi_2$ be the two maps with the Cantor middle third set as their attractor as in Example 2.2. Let us first start with the singleton $K_0 = \{\frac{1}{2}\}$. Then
\[ \Phi(K_0) = \{\tfrac{1}{6}, \tfrac{5}{6}\} = K_1, \qquad \Phi(K_1) = \{\tfrac{1}{18}, \tfrac{5}{18}, \tfrac{13}{18}, \tfrac{17}{18}\} = K_2. \]
The set $\Phi(K_2) = K_3$ has 8 elements, etc. Similarly if we start with $L_0 = [0,1]$ then
\[ \Phi L_0 = [0, \tfrac{1}{3}] \cup [\tfrac{2}{3}, 1] = L_1, \qquad \Phi L_1 = [0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}] \cup [\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1] = L_2. \]
The sets $K_n$ and $L_n$ become more and more similar to each other, even though all the $K_n$ are finite and all the $L_n$ are unions of disjoint nondegenerate intervals.
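The iteration of Example 2.3 and the contraction property of $\Phi$ can be reproduced with exact rational arithmetic. This is only an illustrative sketch: the function names `Phi` and `hausdorff_distance` are chosen here, and the Hausdorff distance is computed by the max-min formula, which for finite sets agrees with the infimum in Definition 2.2.

```python
from fractions import Fraction


def phi1(x):
    return x / 3


def phi2(x):
    return Fraction(2, 3) + x / 3


def Phi(points):
    """One step K -> phi1(K) ∪ phi2(K) for a finite set K."""
    return {phi(x) for x in points for phi in (phi1, phi2)}


def hausdorff_distance(A, B):
    """Hausdorff distance of two finite non-empty subsets of the real line."""
    a_to_b = max(min(abs(a - b) for b in B) for a in A)
    b_to_a = max(min(abs(a - b) for a in A) for b in B)
    return max(a_to_b, b_to_a)


# K[0] = {1/2}, K[n+1] = Phi(K[n])
K = [{Fraction(1, 2)}]
for _ in range(4):
    K.append(Phi(K[-1]))

# successive distances d_H(K[n], K[n+1]) for n = 0, ..., 3
distances = [hausdorff_distance(K[n], K[n + 1]) for n in range(4)]
print(sorted(K[2]), distances)
```

In this example the successive distances shrink by the factor $\frac{1}{3} = \max_i c_i$ in each step, in line with the contraction estimate for $\Phi$ used in the proof of Theorem 2.1.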
2.2 Coding of the attractors

Let $K$ be the attractor associated to the contractions $\varphi_1, \dots, \varphi_n$.

Definition 2.3 Let $i_1, \dots, i_k$ be any indices from $\{1, \dots, n\}$. Then define
\[ K_{i_1 \dots i_k} := \varphi_{i_1} \circ \dots \circ \varphi_{i_k}(K) \subset K. \]

Remark 2.2 We have $K_{i_1 \dots i_k} \subset K_{i_1 \dots i_{k-1}}$. Note that for this property the order in which the maps $\varphi_{i_l}$ are composed is important. The sequences $(K_{i_1 \dots i_k})_{k \in \mathbb{N}}$ are formed by decreasing subsets of $K$. Moreover for every pair of points $x, y \in X$ we get
\[ d(\varphi_{i_1} \circ \dots \circ \varphi_{i_k} x,\ \varphi_{i_1} \circ \dots \circ \varphi_{i_k} y) \le c_{i_1} \cdots c_{i_k}\, d(x, y). \tag{2.3} \]
If $x \in K$ then the point $\varphi_{i_1} \circ \dots \circ \varphi_{i_k} x$ is in $K$. The estimate (2.3) implies
\[ |K_{i_1 \dots i_k}| \le c_{i_1} \cdots c_{i_k} |K|. \tag{2.4} \]
In particular for every $x \in K$ and every sequence $(i_k)$ of indices the sequence $(\varphi_{i_1} \circ \dots \circ \varphi_{i_k} x)_k$ converges, and again by (2.3) the same holds, with the same limit, for every starting point $y \in X$.

This observation provides a coding of the points of the attractor $K$.

Theorem 2.3 Let the space $\Sigma_n = \{1, \dots, n\}^{\mathbb{N}}$ of index sequences be endowed with the product topology (topology of convergence of the coordinates). Then $\Sigma_n$ is compact and totally disconnected. There is a surjective continuous function $\pi : \Sigma_n \to K$ such that for every point $(y_k)_k \in \Sigma_n$ we have
\[ \pi((y_k)_k) = \lim_{k \to \infty} \varphi_{y_1} \circ \dots \circ \varphi_{y_k}(x) \quad \text{for all } x \in X. \tag{2.5} \]
If the maps $\varphi_i$ are injective and if $\varphi_i(K) \cap \varphi_j(K) = \emptyset$ for $i \ne j$ then $\pi$ is a homeomorphism; in particular the set $K$ is totally disconnected.

Proof: 1. That $\Sigma_n$ is compact and totally disconnected is an elementary fact of topology. The convergence in the defining relation (2.5) of $\pi$ has been observed already in the above remark.

2. $\pi$ is surjective: Let $z \in K$ be given. Since $K_{i_1 \dots i_k} = \bigcup_{j=1}^{n} K_{i_1 \dots i_k j}$ for each $k$, one can construct a sequence $y_1, y_2, \dots$ by recursion such that $z \in K_{y_1 \dots y_k}$ for all $k \in \mathbb{N}$. Then by (2.4) $z = \pi(y)$, and for this sequence $(y_k)$ the relation (2.5) holds by the remark.

3. The map $\pi$ is continuous: The topology on $\Sigma_n$ is induced e.g. by the metric
\[ d((y_k), (y'_k)) := \max\Big\{ \frac{1}{m} \;\Big|\; y_m \ne y'_m \Big\} \]
(with $d = 0$ if the two sequences coincide). Now assume $d((y_k), (y'_k)) < \delta$. Then $y_k = y'_k$ for all $k$ which satisfy $\frac{1}{k} > \delta$, or equivalently $k < \frac{1}{\delta}$. For each such $k$ both points $\pi((y_k))$ and $\pi((y'_k))$ are contained in the same set $K_{y_1 \dots y_k} = K_{y'_1 \dots y'_k}$ and hence by (2.4)
\[ d(\pi((y_k)), \pi((y'_k))) \le c^k |K|, \quad \text{where } c = \max_{1 \le i \le n} c_i. \]
Thus $\pi$ is (uniformly) continuous.

4. If now in addition $\varphi_i K \cap \varphi_j K = \emptyset$ for $i \ne j$ and each $\varphi_i$ is injective, then $\pi$ is injective. On the other hand $\Sigma_n$ is compact (this is a special case of Tychonoff's theorem, but it is also easy to verify directly that $\Sigma_n$ is complete and totally bounded with respect to the above metric). Therefore the inverse function $\pi^{-1}$ is continuous as well, i.e. $\pi$ is a homeomorphism.
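For the two maps of the Cantor middle third set the coding map can be evaluated on finite prefixes. The sketch below is an illustration only; `MAPS` and `pi_prefix` are names introduced here. It evaluates $\varphi_{y_1} \circ \dots \circ \varphi_{y_k}(x)$ and shows both that the order of composition matters and that the dependence on the starting point $x$ disappears at the rate $c^k$ given by (2.3).

```python
from fractions import Fraction

# Index 1 corresponds to the ternary digit 0, index 2 to the ternary digit 2.
MAPS = {
    1: lambda x: x / 3,
    2: lambda x: Fraction(2, 3) + x / 3,
}


def pi_prefix(indices, x=Fraction(0)):
    """Evaluate phi_{y1} ∘ ... ∘ phi_{yk} (x) for the prefix (y1, ..., yk).

    The innermost map belongs to the LAST index, so the maps are applied
    in reversed order of the prefix.
    """
    for i in reversed(indices):
        x = MAPS[i](x)
    return x


# Different orders of the same indices give different points.
order_a = pi_prefix([1, 2])  # phi1(phi2(0))
order_b = pi_prefix([2, 1])  # phi2(phi1(0))

# Two starting points: after k = 3 maps the images differ by 3**-3 * |0 - 1|.
gap = abs(pi_prefix([2, 1, 2], Fraction(0)) - pi_prefix([2, 1, 2], Fraction(1)))
print(order_a, order_b, gap)
```

With starting point $0$ the value of the prefix $(y_1, \dots, y_k)$ is the ternary number whose digits are $0$ for index 1 and $2$ for index 2, which matches the description of $K$ in Example 2.2.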
2.3 The Chaos Algorithm

The coding theorem of the last section is useful not only for theoretical purposes; it also provides an algorithm for producing good and efficient approximations of self-similar sets by computer. In this section we assume that the reader is familiar with a few basic concepts of probability theory. Our algorithm is defined as follows:

1. Choose any probability distribution $(p_1, \dots, p_n)$ on the index set $\{1, \dots, n\}$ with $p_i > 0$ for all $i$.
2. Simulate a sequence $Y_1, Y_2, \dots$ of independent random variables with values in $\{1, \dots, n\}$ such that $\mathbb{P}(Y_k = i) = p_i$ for all $i$ and $k$.
3. Choose an arbitrary point $x \in X$ as starting point. Choose the depth $t \in \mathbb{N}$ of the algorithm.
4. Plot (or save) the $k$ points
\[ Z_1^t = \varphi_{Y_1} \circ \varphi_{Y_2} \circ \dots \circ \varphi_{Y_t} x, \quad Z_2^t = \varphi_{Y_{t+1}} \circ \dots \circ \varphi_{Y_{2t}} x, \quad \dots, \quad Z_k^t = \varphi_{Y_{(k-1)t+1}} \circ \dots \circ \varphi_{Y_{kt}} x. \]

The main properties of the resulting collection of points are summarized in

Theorem 2.4 The sequence of points $Z_1^t, Z_2^t, \dots$ consists of independent identically distributed random variables with values in $X$. If the starting point $x$ is already in $K$ then they are $K$-valued. Let $\mu_t$ be the law of $Z_1^t$ and let $\mu$ be the law of the random variable $Z^\infty = \pi((Y_1, Y_2, \dots))$, where $\pi$ is the coding map of the previous theorem. Then the sequence $(\mu_t)_{t \in \mathbb{N}}$ converges in distribution to $\mu$, or $(Z_1^t)_t$ converges in law to $Z^\infty$, i.e.
\[ \mathbb{E}(f(Z_1^t)) = \int_X f \, d\mu_t \xrightarrow{t \to \infty} \int_X f \, d\mu = \mathbb{E}(f(Z^\infty)) \quad \text{for all } f \in C_b(X). \]
Moreover the measure $\mu$ has the topological support $K$.

Proof: The random $\{1, \dots, n\}^t$-valued 'vectors' $(Y_1, \dots, Y_t), (Y_{t+1}, \dots, Y_{2t}), \dots, (Y_{(k-1)t+1}, \dots, Y_{kt})$ are iid (independent and identically distributed). The random variables $Z_1^t, \dots, Z_k^t$ are produced from these by application of one and the same function
\[ f : (y_1, \dots, y_t) \mapsto \varphi_{y_1} \circ \dots \circ \varphi_{y_t} x. \]
Hence they are also iid.

Now let us prove the convergence of the laws. The two points $Z_1^t$ and $Z^\infty$ $(= \pi(Y_1, Y_2, \dots))$ are both in the image of the same random map $\varphi_{Y_1} \circ \dots \circ \varphi_{Y_t}$, namely the images of $x$ and of $\pi(Y_{t+1}, Y_{t+2}, \dots) \in K$. Therefore their mutual distance can be estimated from above by (2.3), i.e. we get
\[ d(Z_1^t, Z^\infty) \le c^t \sup_{z \in K} d(x, z), \quad \text{where } c = \max_{1 \le i \le n} c_i < 1, \]
and the supremum is finite since $K$ is compact. So $(Z_1^t)$ converges to $Z^\infty$ uniformly, and hence in probability and a fortiori in law.

Finally, since $Z^\infty \in K$ we have $\mu(K) = 1$. On the other hand let $U$ be a nonempty open subset of $K$. Then the preimage $\pi^{-1}(U)$ in the sequence space $\Sigma_n$ is also open and non-empty since $\pi$ is continuous and surjective by the coding theorem. Therefore there is some finite sequence $i_1, \dots, i_{k^*}$ of indices such that $(y_k) \in \pi^{-1}(U)$ whenever $y_1 = i_1, \dots, y_{k^*} = i_{k^*}$. Hence
\[ \mu(U) = \mathbb{P}(Z^\infty \in U) = \mathbb{P}((Y_1, Y_2, \dots) \in \pi^{-1}(U)) \ge \mathbb{P}(Y_1 = i_1, \dots, Y_{k^*} = i_{k^*}) > 0. \]
This proves that the support of $\mu$ is not smaller than our self-similar set $K$.

The last part of the theorem concerning the support shows that all parts of the set $K$ are hit with positive probability. The convergence results say that this property is already true, at least approximately, at finite depth. Therefore the collection of points which the algorithm produces forms with high probability a good approximation of the whole set $K$, even if the starting point is an arbitrary point of $X$. This is important since nothing about the true shape of $K$ needs to be known in advance.

Another possibility to get a finite approximation of the set $K$ would be the iterative procedure given by Remark 2.1, starting with any singleton $K_0 = \{x\}$. Then the set $K_t = \Phi^t K_0$ has up to $n^t$ elements, which is a huge number even for moderate depth $t$. The advantage of the chaos algorithm is that the number $k$ of points produced and the depth $t$ can be chosen more or less independently, and it suffices to take a relatively small number $k$ of points in order to get a reasonable picture of $K$.

Another aspect of this algorithm is that it produces, in addition to the self-similar set $K$, also a 'self-similar measure' which is carried by $K$.

Theorem 2.5 The probability distribution $\mu$ has the following property:
\[ \mu = \sum_{i=1}^{n} p_i\, (\mu \circ \varphi_i^{-1}). \]
Thus $\mu$ is a convex combination of its own images under the contractions $\varphi_i$. (One calls such a measure a self-similar measure.)

Proof: Let us first remark that $\varphi_i K_{y_1 \dots y_m} = \varphi_i \circ \varphi_{y_1} \circ \dots \circ \varphi_{y_m} K$, and that a point $z \in K$ is of the form $z = \varphi_i \pi(y_1, y_2, \dots)$ if and only if $z = \pi(i, y_1, y_2, y_3, \dots)$. Let $A$ be any Borel subset of $K$. Then
\[ \Big(\sum_{i=1}^{n} p_i\, \mu \circ \varphi_i^{-1}\Big)(A) = \sum_{i=1}^{n} p_i\, \mu(\varphi_i^{-1}(A)) = \sum_{i=1}^{n} p_i\, \mathbb{P}(Z^\infty \in \varphi_i^{-1}(A)) \]
\[ = \sum_{i=1}^{n} p_i\, \mathbb{P}(\varphi_i \pi(Y_1, Y_2, \dots) \in A) = \sum_{i=1}^{n} p_i\, \mathbb{P}(\pi(i, Y_1, Y_2, \dots) \in A) = \mathbb{P}(\pi(Y, Y_1, Y_2, \dots) \in A), \]
where $Y$ is a random variable with $\mathbb{P}(Y = i) = p_i$, independent of the $Y_k$. Then the sequence $Y, Y_1, Y_2, \dots$ has the same joint distribution as the sequence $Y_1, Y_2, Y_3, \dots$, and so the above sum is also equal to $\mathbb{P}(\pi(Y_1, Y_2, \dots) \in A) = \mathbb{P}(Z^\infty \in A) = \mu(A)$. This completes the proof.
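The chaos algorithm of this section can be sketched in a few lines. The code below is an illustration under stated assumptions, not an implementation taken from the notes: the function name `chaos_points`, the use of Python's `random.Random.choices` for simulating the $Y_k$, and the choice of the Cantor maps with starting point $x = 5$ (deliberately outside $K$) are all made here for the example. Indices are 0-based in the code.

```python
import random


def chaos_points(maps, probs, x0, depth, count, rng):
    """Return `count` points Z_j^t = phi_{Y1} ∘ ... ∘ phi_{Yt}(x0) with t = depth.

    A fresh block of `depth` independent indices is drawn for every point,
    so the returned points are independent and identically distributed.
    """
    points = []
    for _ in range(count):
        indices = rng.choices(range(len(maps)), weights=probs, k=depth)
        x = x0
        for i in reversed(indices):  # the innermost map belongs to the last index
            x = maps[i](x)
        points.append(x)
    return points


# Example: the two similarities of the Cantor middle third set.
CANTOR_MAPS = [lambda x: x / 3, lambda x: 2 / 3 + x / 3]

rng = random.Random(0)  # fixed seed only to make the run reproducible
sample = chaos_points(CANTOR_MAPS, (0.5, 0.5), 5.0, 25, 2000, rng)

# Empirical mass of the left third; by Theorem 2.5 the exact value is p_1.
left_mass = sum(p < 0.5 for p in sample) / len(sample)
print(min(sample), max(sample), left_mass)
```

Although the starting point is far from $K$, at depth $t = 25$ every produced point lies within $3^{-25} \cdot \sup_{z \in K} |5 - z|$ of $K$, as in the estimate used in the proof of Theorem 2.4. The empirical mass of $\varphi_1 K$ approximates $p_1$, which is the self-similarity relation of Theorem 2.5 evaluated on $A = \varphi_1 K$.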
2.4 The Hausdorff dimension of self-similar sets

For general sets which are self-similar in the wide sense one has the following upper estimate of the Hausdorff dimension.

Theorem 2.6 [upper estimate] Let $\varphi_1, \dots, \varphi_n$ be finitely many contractions of a complete metric space and let $c_1, \dots, c_n$ be the corresponding contraction coefficients. Let $s$ be the unique solution of the equation
\[ \sum_{i=1}^{n} c_i^s = 1. \]
Then the attractor $K$ of $\varphi_1, \dots, \varphi_n$ satisfies $\mathcal{H}^s(K) \le |K|^s$ and $\dim K \le s$.

Proof: The map $s \mapsto \sum_{i=1}^{n} c_i^s$ is continuous and strictly decreasing; it converges to 0 for $s \to \infty$ and to $n$ for $s \to 0$. Thus there is indeed exactly one solution of the equation $\sum_i c_i^s = 1$. Let $\delta > 0$. Choose $k$ large enough such that $c^k |K| < \delta$, where $c = \max_{1 \le i \le n} c_i$. Recall that
\[ K = \bigcup_{i_1 \dots i_k} K_{i_1 \dots i_k}. \]
Since $|K_{i_1 \dots i_k}| \le c^k |K| < \delta$ by (2.4), these sets form a $\delta$-covering of $K$. Moreover
\[ \sum_{i_1 \dots i_k} |K_{i_1 \dots i_k}|^s \le \sum_{i_1 \dots i_k} c_{i_1}^s \cdots c_{i_k}^s\, |K|^s = \Big(\sum_{i=1}^{n} c_i^s\Big)^k |K|^s = |K|^s. \]
This implies $\mathcal{H}^s(K) \le |K|^s$ and in particular $\dim K \le s$.

A corresponding lower estimate is true only under more restrictive conditions. We first give a version of the result under a strong disjointness condition in order to give the main idea of the argument.

Theorem 2.7 Suppose that the contractions of the previous Theorem 2.6 satisfy
a) The $\varphi_i$ are injective and the sets $\varphi_i K$ are disjoint (strong open set condition).
b) There are positive numbers $\varepsilon$ and $b_1, \dots, b_n$ such that $d(\varphi_i x, \varphi_i y) \ge b_i\, d(x, y)$ whenever $x, y \in K$ and $d(x, y) < \varepsilon$.
Let $r$ be the unique solution of the equation $\sum_{i=1}^{n} b_i^r = 1$. Then $\dim K \ge r$.
Proof: We want to apply the mass distribution principle (Theorem 1.18) to the measure $\mu$ of the chaos algorithm (Theorem 2.4) with starting point in $K$ and weights $(p_1, \dots, p_n) = (b_1^r, \dots, b_n^r)$. We need to show that there is some $c > 0$ such that for every $U \subset K$ we have $\mu(U) \le c|U|^r$. Because of condition a), for each $k \in \mathbb{N}$ all sets of the form $K_{i_1, \dots, i_k}$ are disjoint. Moreover by the contraction property we can and will fix $k$ in such a way that these sets all have diameter $< \varepsilon$, where $\varepsilon$ is as in condition b). Since these sets are compact there is some $\eta > 0$ which is smaller than the distance of any two of these sets. We claim that
\[ \mu(U) \le \frac{1}{\eta^r} |U|^r \]
for all subsets $U$ of $K$. Fix $U$. Let $i_1, \dots, i_m$ be a maximal multiindex such that $U \subset K_{i_1, \dots, i_m}$. Then there are two different indices $i_{m+1}$ and $j_{m+1}$ such that the set $U$ meets both $K_{i_1, \dots, i_m, i_{m+1}}$ and $K_{i_1, \dots, i_m, j_{m+1}}$. We now consider the cases $m < k$ and $m \ge k$ separately.

First suppose $m < k$. Then $U$ even meets two different sets of the type $K_{i_1, \dots, i_k}$ and hence $|U| > \eta$. Therefore we have indeed
\[ \mu(U) \le \mu(K) \le \frac{\mu(K)}{\eta^r} |U|^r = \frac{1}{\eta^r} |U|^r. \]
In the case $m \ge k$ we get from assumption b) and the construction of $\eta$
\[ |U| \ge \operatorname{dist}\big(K_{i_1, \dots, i_m, i_{m+1}},\ K_{i_1, \dots, i_m, j_{m+1}}\big) \]
\[ = \operatorname{dist}\big(\varphi_{i_1} \cdots \varphi_{i_{m-k}} K_{i_{m-k+1}, \dots, i_m, i_{m+1}},\ \varphi_{i_1} \cdots \varphi_{i_{m-k}} K_{i_{m-k+1}, \dots, i_m, j_{m+1}}\big) \]
\[ \ge b_{i_1} \cdots b_{i_{m-k}} \operatorname{dist}\big(K_{i_{m-k+1}, \dots, i_m, i_{m+1}},\ K_{i_{m-k+1}, \dots, i_m, j_{m+1}}\big) \ge b_{i_1} \cdots b_{i_{m-k}}\, \eta. \]
Thus
\[ \mu(U) \le \mu(K_{i_1, \dots, i_m}) \le b_{i_1}^r \cdots b_{i_m}^r = (b_{i_1} \cdots b_{i_{m-k}})^r (b_{i_{m-k+1}} \cdots b_{i_m})^r \le \Big(\frac{|U|}{\eta}\Big)^r \Big(\max_{1 \le i \le n} b_i\Big)^{kr}. \]
Together we get the desired estimate since we may assume $b_i \le 1$.

The local condition b) can be checked for smooth functions via the derivative; see the last remark in this chapter. We shall use this in the next section. Of course the condition implies only local injectivity, and hence the global injectivity in a) is not superfluous.

For similarities we can choose $b_i = c_i$ in the two results above and apply them together to get:

Corollary 2.8 Let $\varphi_1, \dots, \varphi_n$ be finitely many similarities of a complete metric space and let $c_1, \dots, c_n$ be the corresponding coefficients. Let $s$ be the unique solution of the equation $\sum_{i=1}^{n} c_i^s = 1$. If the Strong Open Set Condition of Theorem 2.7 holds then the attractor $K$ satisfies $\mathcal{H}^s(K) \le |K|^s$ and $\dim K = s$.

Example 2.4
1. For Cantor's middle third set we have $\varphi_1 x = \frac{1}{3} x$, $\varphi_2 x = \frac{1}{3} x + \frac{2}{3}$; then $c_1 = c_2 = \frac{1}{3}$ and $\dim E = s$ where $(\frac{1}{3})^s + (\frac{1}{3})^s = 1$, i.e. $(\frac{1}{3})^s = \frac{1}{2}$ or $s = \frac{\log 2}{\log 3}$.
2. In the case of the Cantor set in the plane of Example 1.4 we get $c_1 = c_2 = c_3 = c_4 = \frac{1}{4}$ and therefore for the dimension indeed $(\frac{1}{4})^1 + (\frac{1}{4})^1 + (\frac{1}{4})^1 + (\frac{1}{4})^1 = 1$.
In these two cases the assumptions of the corollary are satisfied. In the following two cases the same formal calculation gives the correct dimension even though the strong open set condition is not fulfilled. But for these two examples the next, more subtle result Theorem 2.9 can be applied.
3. The Koch curve is contained in a closed triangle which is mapped under the four similarities into itself. The four closed image triangles touch each other. But the interior of the large triangle has four disjoint images. So we get the condition $4 (\frac{1}{3})^s = 1$ or $s = \frac{\log 4}{\log 3}$.
4. A similar argument applies to the Sierpinski triangle and yields $\dim E = \frac{\log 3}{\log 2}$.

Definition 2.4 We assume that $X = \mathbb{R}^d$. The similarities $\varphi_1, \dots, \varphi_n$ satisfy the open set condition (O.S.C.) if there is an open bounded set $V \ne \emptyset$ such that the sets $\varphi_1 V, \dots, \varphi_n V$ are disjoint and contained in $V$.

Remark 2.3 The hypothesis of our previous result Theorem 2.7 is indeed, as the name suggests, stronger than the O.S.C.: If the sets $\varphi_i K$ are disjoint then a sufficiently small open neighbourhood $V = K[\delta]$ of $K$ has the desired property.

The following partial extension of the Corollary is the main result of this section.

Theorem 2.9 Let $\varphi_1, \dots, \varphi_n$ be similarities in $\mathbb{R}^d$ which satisfy the O.S.C. Then the corresponding self-similar set $K$ has Hausdorff dimension $s$, where $s$ is determined by $1 = \sum_{i=1}^{n} c_i^s$.
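For the proof we need the following

Lemma 2.10 Let $V_1, V_2, \dots$ be disjoint balls in $\mathbb{R}^d$ with radius $\varepsilon\varrho$ and let $V$ be a ball with radius $\varrho$. Then $V$ meets at most $\big(\frac{1 + 2\varepsilon}{\varepsilon}\big)^d$ of the small balls.

Proof (of the lemma): The balls $V_i$ each have the volume $\omega_d (\varepsilon\varrho)^d$, where $\omega_d$ is the volume of the unit ball. If $V_i \cap V \ne \emptyset$ then $V_i$ is contained in the ball with radius $(1 + 2\varepsilon)\varrho$ around the center of $V$. The volume of this enlarged ball therefore satisfies
\[ \big((1 + 2\varepsilon)\varrho\big)^d \omega_d \ge \sum_{i : V_i \cap V \ne \emptyset} (\varepsilon\varrho)^d \omega_d, \]
which implies that the cardinality of the index set on the right hand side is at most $\frac{(1 + 2\varepsilon)^d}{\varepsilon^d}$.

Proof (of the Theorem): 1. We start by showing $K \subset \overline{V}$. The closure $\overline{V}$ is compact since $V$ is bounded, and $\bigcup_i \varphi_i V \subset V$ by hypothesis. Thus by continuity $\Phi \overline{V} = \bigcup_{i=1}^{n} \varphi_i \overline{V} \subset \overline{V}$ and hence (cf. Theorem 2.1 and Remark 2.1) $K = \lim_{n \to \infty} \Phi^n \overline{V} \subset \overline{V}$.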
2. The set $V$ contains a small ball of, say, radius $a_1$, since it is open, and $V$ is contained in a large ball of, say, radius $a_2$, since it is bounded. Then similarly each of the sets $V_{i_1 \dots i_k} := \varphi_{i_1} \circ \dots \circ \varphi_{i_k}(V)$ is contained in a ball of radius $c_{i_1} \cdots c_{i_k} \cdot a_2$, and conversely it contains a ball of radius $c_{i_1} \cdots c_{i_k} \cdot a_1$, since the maps $\varphi_i$ are similarities.

3. We want to apply the mass distribution principle. Let $\mu$ be the measure introduced in the last section which corresponds to the weights $p_i = c_i^s$. By the choice of $s$ the $c_i^s$ form indeed a probability vector. Let now $U$ be a set with $U \cap K \ne \emptyset$ and $|U| = \varrho$. We claim that there is a constant $C$ such that $\mu(U) \le C|U|^s$. We may assume that $U$ is open (approach $U$ and $\varrho$ from above by open sets resp. slightly larger numbers). For each coding sequence $y = (y_l) \in \Sigma_n$ there is exactly one index $k = k(y)$ such that
\[ c_{y_1} \cdots c_{y_k} > \varrho \ge c_{y_1} \cdots c_{y_{k+1}}. \tag{2.6} \]
We consider the system of initial segments $S := \{y_1 \dots y_{k(y)} \mid y \in \Sigma_n\}$ and claim that the sets $V_{y_1 \dots y_{k(y)}}$ are pairwise disjoint. Let $V_{i_1, \dots, i_k}$, $V_{y_1, \dots, y_{k'}}$ be two different sets of this type. Then there is a first index $k_0$ such that $i_{k_0} \ne y_{k_0}$. By construction
\[ V_{i_{k_0} \dots i_k} \subset \varphi_{i_{k_0}} V, \qquad V_{y_{k_0} \dots y_{k'}} \subset \varphi_{y_{k_0}} V. \]
The two sets on the right hand side are disjoint by assumption. Hence so are the sets on the left. Moreover our two sets $V_{i_1, \dots, i_k}$, $V_{y_1, \dots, y_{k'}}$ are the images of the sets on the left under the injective map $\varphi_{i_1} \circ \dots \circ \varphi_{i_{k_0 - 1}} = \varphi_{y_1} \circ \dots \circ \varphi_{y_{k_0 - 1}}$, and thus they are also disjoint.

4. Suppose $y = (y_1, y_2, \dots)$ is a coding sequence such that $\pi(y) \in U$. Let $\Sigma_y$ be the set of all coding sequences with the initial segment $y_1 \dots y_{k(y)}$. Then the set $\pi(\Sigma_y)$ meets $U$. Since by step 1 $\pi(\Sigma_y) = K_{y_1 \dots y_{k(y)}} \subset \overline{V}_{y_1 \dots y_{k(y)}}$ and since $U$ is open, we have even $V_{y_1 \dots y_{k(y)}} \cap U \ne \emptyset$. Choose a point $z_0 \in U$. Then $U \subset B(z_0, \varrho)$. Therefore
\[ \mu(U) = \mathbb{P}(\pi(Y_1, Y_2, \dots) \in U) \le \sum_{S} \big\{ \mathbb{P}(Y_1 = y_1, \dots, Y_k = y_k) : V_{y_1 \dots y_k} \cap B(z_0, \varrho) \ne \emptyset \big\}. \]
We want to estimate the number of terms in this sum. Let $y_1 \dots y_k$ be such that $V_{y_1 \dots y_k} \cap B(z_0, \varrho) \ne \emptyset$. Then
\[ V_{y_1 \dots y_k} \subset B(z_0, \varrho + a_2 \cdot c_{y_1} \cdots c_{y_k}) \subset B(z_0, \varrho(1 + a_2/c)), \]
where $c = \min_i c_i$ and we have used the second inequality in (2.6). On the other hand $V_{y_1 \dots y_k}$ contains a ball with radius $a_1 \cdot c_{y_1} \cdots c_{y_k} > a_1 \varrho$. Moreover these sets are disjoint by step 3. Letting $\varepsilon = \frac{a_1}{1 + a_2/c}$, the lemma (applied to the disjoint balls of radius $a_1 \varrho$ and to the ball $B(z_0, \varrho(1 + a_2/c))$) implies that there is a uniform bound $M$ on the number of disjoint sets $V_{y_1 \dots y_k}$ which can meet $B(z_0, \varrho)$. Therefore, using the second inequality in (2.6) once more,
\[ \mu(U) \le M \max_S \mathbb{P}(Y_1 = y_1, \dots, Y_k = y_k) = M \max_S \big(c_{y_1}^s \cdots c_{y_k}^s\big) \le M \varrho^s \frac{1}{(\min_i c_i)^s} = C|U|^s \]
with $C = M (\min_i c_i)^{-s}$, and Theorem 1.18 gives the result.

Remark 2.4 Schief (96) proved that without the O.S.C. either $\dim K < s$ or at least $\mathcal{H}^s(K) = 0$.
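The equation $\sum_i c_i^s = 1$ of Theorems 2.6 and 2.9 can be solved numerically, using exactly the monotonicity noted in the proof of Theorem 2.6. The following bisection routine is a minimal sketch; the name `similarity_dimension` and the tolerance are choices made here, and the routine assumes at least two ratios, all in $(0,1)$.

```python
import math


def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(c**s for c in ratios) == 1 for s by bisection.

    Assumes len(ratios) >= 2 and 0 < c < 1 for every ratio, so that the
    left-hand side is strictly decreasing in s, from n at s = 0 to 0.
    """
    lo, hi = 0.0, 1.0
    while sum(c ** hi for c in ratios) > 1:  # enlarge the bracket if needed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(c ** mid for c in ratios) > 1:
            lo = mid  # the sum is still too large, so s lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


cantor = similarity_dimension([1 / 3, 1 / 3])          # compare log 2 / log 3
koch = similarity_dimension([1 / 3] * 4)               # compare log 4 / log 3
sierpinski = similarity_dimension([1 / 2] * 3)         # compare log 3 / log 2
four_corners = similarity_dimension([1 / 4] * 4)       # compare 1
print(cantor, koch, sierpinski, four_corners)
```

For equal ratios the closed forms of Example 2.4 are available, so the routine is mainly useful for unequal ratios, where the equation has no such elementary solution in general. Whether the resulting number is the Hausdorff dimension still depends on the open set condition, as in Theorem 2.9.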
2.5 A glimpse of Julia sets

In the previous section we gave some examples where the precise Hausdorff dimension could be determined. Here we want to give another example, which is related to the Mandelbrot set that played a big role in the popularization of the field. Here our techniques yield only lower and upper bounds for the dimension.

Definition 2.5 Let $f : \mathbb{C} \to \mathbb{C}$ be an entire function. A point $z \in \mathbb{C}$ is called $k$-periodic if it is a fixed point of the $k$-fold iteration $f^{\circ k}$ of $f$. It is a repellent periodic point if for some $k$ it is $k$-periodic with $|(f^{\circ k})'(z)| > 1$. The Julia set $J(f)$ of $f$ is the closure of the set of all repellent periodic points of $f$.

Example 2.5 Let $f(z) = z^n$. Then every periodic point of $f$ is either 0 or on the unit circle. We have $f^{\circ k}(z) = z^{n^k}$. Hence $|(f^{\circ k})'(z)| = n^k |z^{n^k - 1}|$, and thus a periodic point is repellent if and only if it is on the unit circle. The $k$-periodic points on the unit circle are precisely the $(n^k - 1)$-th roots of unity. If we let $k$ run we see that the repellent periodic points of $f$ form a dense subset of the unit circle, and hence the Julia set $J(f)$ is equal to the unit circle.

We are interested in the Julia set of the simple quadratic function $f_c(z) = z^2 + c$. As we shall see, the structure of the set $J(f_c)$ depends a lot on the number $c$. The following Theorem collects some facts which we need, but we omit the proof. It involves the concept of normal families and Montel's Theorem (see [8]).

Theorem 2.11 If $f$ is a polynomial function of degree $> 1$, the Julia set $J(f)$ is a compact nonempty set invariant under $f$ and under $f^{-1}$.

Here the nontrivial fact is $f^{-1}(J(f)) \subset J(f)$. Of particular interest is the question whether the Julia set is connected. This leads to the

Definition 2.6 The Mandelbrot set $M$ is defined by
\[ M = \{c \in \mathbb{C} : J(f_c) \text{ is connected}\}. \]
It is itself a bounded connected set with a very interesting boundary structure. By the Example above the point $c = 0$ belongs to $M$.

One can prove ([8], and see also the picture of $M$ there) that the disk $\{|c| \le \frac{1}{4}\}$ is contained in $M$ and that $M$ has the following alternative description:
\[ M = \{c \in \mathbb{C} : \{f_c^{\circ k}(0)\}_{k \in \mathbb{N}} \text{ is bounded}\} = \{c \in \mathbb{C} : \{f_c^{\circ k}(0)\}_{k \in \mathbb{N}} \text{ does not converge to } \infty\}. \]
The result below gives in particular a disk which contains the Mandelbrot set. But its main objective is the structure of the Julia sets for large $c$.

Theorem 2.12 For $|c|$ sufficiently large, more specifically if
\[ |c| > \tfrac{1}{4}(5 + 2\sqrt{6}) \approx 2.475, \]
the Julia set of $f_c(z) = z^2 + c$ is totally disconnected, and for its Hausdorff dimension we have
\[ \dim J(f_c) \sim \frac{2 \log 2}{\log |c|} \quad \text{as } |c| \to \infty. \]
Proof: 1. As a preparation we want to describe geometrically the inverse image under $f_c$ of the circle $C = \{z \in \mathbb{C} : |z| = |c|\}$ and of its interior $D$. The inverse image $f_c^{-1}(C)$ is the set of all points of the form $(c e^{i\theta} - c)^{\frac{1}{2}} = c^{\frac{1}{2}} (e^{i\theta} - 1)^{\frac{1}{2}}$, where with each choice of the square root the negative of that value is also admissible. Let us first study the values $\sqrt{e^{i\theta} - 1}$ as $\theta$ moves. For $\theta \in [0, 2\pi)$ the curve $(e^{i\theta} - 1)$ forms a unit circle shifted to the left by 1, starting at 0 in the positive vertical direction, passing through $-2$ and returning to 0 from the negative vertical direction as $\theta \uparrow 2\pi$. We first consider the square roots of the points of this circle which have positive imaginary part. Thinking in polar coordinates one easily sees that they form a smooth loop leaving 0 in the positive direction of the first diagonal ('northeast') and returning to the origin from the diagonal direction of the second quadrant ('northwest'). Note that the interior of this loop is precisely the set of square roots with positive imaginary part of points in the interior of the circle $\{e^{i\theta} - 1\}$. Now, choosing for $\theta \in [2\pi, 4\pi)$ the negative imaginary part for the square root, we follow the mirror image of this loop under reflection at the real axis: We leave the origin into the fourth quadrant ('southeast') and return from 'southwest'. Together the values $\sqrt{e^{i\theta} - 1}$ form a symmetric 'double loop' with center at 0, attaining the maximal distance $\sqrt{2}$ from its center 0 at $\pm\sqrt{2}\, i$. If we multiply this double loop with $\sqrt{c}$ we get a point symmetric double loop with center at 0 and maximal distance $\sqrt{2|c|}$ from its center 0. This double loop is the set $f_c^{-1}(C)$. Let $G_1$ and $G_2$ be the interiors of the two closed parts of this double loop.

2. Then $f_c$ maps both $G_1$ and $G_2$ bijectively and analytically to the interior $D$ of $C$. We call $g_1$ and $g_2$ the corresponding inverse functions. They form the two branches of $f_c^{-1}(z) = \pm (z - c)^{\frac{1}{2}}$. To these two functions we would like to apply the results of the previous section. Are they contractions? Not in general. But consider the disk $V = \{z : |z| < \sqrt{2|c|}\}$. Then its closure $\overline{V}$ is just large enough to contain $f_c^{-1}(C)$ and hence the disjoint domains $G_i = g_i(D)$. As soon as the closed disk $\overline{V}$ is contained in the open disk $D$, i.e. $\sqrt{2|c|} < |c|$ or equivalently $|c| > 2$, the functions $g_i$ both map $\overline{V}$ into itself. The derivative of $g_i$, $i = 1, 2$, is given by $g_i'(z) = \pm\frac{1}{2} (z - c)^{-\frac{1}{2}}$. On $\overline{V}$ we get the estimates
\[ \tfrac{1}{2} \big(|c| + \sqrt{2|c|}\big)^{-\frac{1}{2}} \le |g_i'(z)| \le \tfrac{1}{2} \big(|c| - \sqrt{2|c|}\big)^{-\frac{1}{2}}. \]

3. Solving the corresponding quadratic equation (in $\sqrt{|c|}$) we see that the upper bound is less than 1 if and only if $|c| > \frac{1}{4}(5 + 2\sqrt{6})$. From now on we assume that this condition holds. So for these values of $|c|$ the restrictions of the $g_i$ to the set $\overline{V}$ are contractions with contraction coefficients $\frac{1}{2} (|c| - \sqrt{2|c|})^{-\frac{1}{2}}$ according to the mean value theorem (cf. also Lemma 2.13 below). Since $\overline{V}$ is a compact subset of the open set $D$, the sets $g_1(\overline{V})$ and $g_2(\overline{V})$ are compact subsets of the two disjoint domains $G_1$ and $G_2$. In particular $g_i(\overline{V}) \subset V$ and $g_1(\overline{V}) \cap g_2(\overline{V}) = \emptyset$.

4. According to Theorem 2.1 there is an associated attractor, namely the unique nonempty compact subset $K$ of $\overline{V}$ such that $K = g_1(K) \cup g_2(K)$. We claim that $K$ is equal to our Julia set $J(f_c)$: According to Theorem 2.11 $J(f_c)$ is nonempty. Since $|f_c(z)| \ge |z|^2 - |c| > |z|$ whenever $|z| > \sqrt{2|c|}$, i.e. whenever $z \notin \overline{V}$, we see that $\overline{V}$ contains all periodic points and hence the set $J(f_c)$. On the other hand by Theorem 2.11
\[ J(f_c) = f_c^{-1}(J(f_c)) = g_1(J(f_c)) \cup g_2(J(f_c)), \]
and by uniqueness of $K$ we get $J(f_c) = K$, and by Theorem 2.3 $J(f_c)$ is totally disconnected.

5. Finally we estimate the dimension of $J(f_c)$. Let $d = \frac{1}{2} (|c| - \sqrt{2|c|})^{-\frac{1}{2}}$ and let $b$ be a number less than $\frac{1}{2} (|c| + \sqrt{2|c|})^{-\frac{1}{2}}$. Then
\[ b < \frac{|g_i(z) - g_i(z')|}{|z - z'|} \le d \]
(at least locally) according to Lemma 2.13 below. Let $r$ and $s$ be the solutions of $2b^r = 2d^s = 1$. We get $r \le \dim J(f_c) \le s$ from Theorems 2.6 and 2.7. Let $b$ increase to its upper bound and calculate $r$ and $s$ explicitly; for instance $2d^s = 1$ gives $s = \log 2 / (\log 2 + \frac{1}{2} \log(|c| - \sqrt{2|c|}))$. The result is
\[ \frac{2 \log 2}{\log\big(4(|c| + \sqrt{2|c|})\big)} \le \dim J(f_c) \le \frac{2 \log 2}{\log\big(4(|c| - \sqrt{2|c|})\big)}. \]
As $|c| \to \infty$ both sides are asymptotically equal to $\frac{2 \log 2}{\log |c|}$, which completes the proof.
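The two bounds of step 5 are easy to evaluate. Solving $2d^s = 1$ with $d = \frac{1}{2}(|c| - \sqrt{2|c|})^{-1/2}$ gives $s = 2\log 2 / \log(4(|c| - \sqrt{2|c|}))$, and in the same way the limiting value of $r$ is $2\log 2 / \log(4(|c| + \sqrt{2|c|}))$. The sketch below simply codes these two expressions; the function name `julia_dimension_bounds` is introduced here for illustration.

```python
import math

# Theorem 2.12 requires |c| > (5 + 2*sqrt(6)) / 4, roughly 2.475.
THRESHOLD = (5 + 2 * math.sqrt(6)) / 4


def julia_dimension_bounds(abs_c):
    """Lower and upper bound for dim J(f_c), f_c(z) = z**2 + c, from step 5."""
    if abs_c <= THRESHOLD:
        raise ValueError("the estimate needs |c| > (5 + 2*sqrt(6))/4")
    root = math.sqrt(2 * abs_c)
    lower = 2 * math.log(2) / math.log(4 * (abs_c + root))  # solves 2*b**r = 1
    upper = 2 * math.log(2) / math.log(4 * (abs_c - root))  # solves 2*d**s = 1
    return lower, upper


for abs_c in (3.0, 10.0, 1e3, 1e6):
    lower, upper = julia_dimension_bounds(abs_c)
    leading = 2 * math.log(2) / math.log(abs_c)  # asymptotic expression
    print(abs_c, lower, upper, leading)
```

Close to the threshold the upper bound is of little use, since the contraction coefficient $d$ is then close to 1. For large $|c|$ both bounds approach the leading term $2\log 2 / \log|c|$, but only slowly, because $\log(4|c|)$ and $\log|c|$ differ by the constant $\log 4$.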
In the proof we used the following versions of the mean value theorem.

Lemma 2.13 Let $g : D \to V$ be a holomorphic map between two complex domains.
a) If $D$ is convex then
\[ \sup_{z \ne z' \in D} \Big| \frac{g(z) - g(z')}{z - z'} \Big| \le \sup_{z \in D} |g'(z)|. \]
b) If $g$ is bijective and $V$ is convex then
\[ \inf_{z \ne z' \in D} \Big| \frac{g(z) - g(z')}{z - z'} \Big| \ge \inf_{z \in D} |g'(z)|. \]
c) For every compact subset $K$ of $\{z \in D : g'(z) \ne 0\}$ and every positive number $b < \inf_{z \in K} |g'(z)|$ there is some $\varepsilon > 0$ such that
\[ \inf_{z, z' \in K,\ 0 < |z - z'| < \varepsilon} \Big| \frac{g(z) - g(z')}{z - z'} \Big| \ge b. \]

Proof: a) This is a standard form of the multidimensional mean value theorem.

b) Since $g$ is bijective the inverse function $g^{-1} : V \to D$ exists and it satisfies $\sup_{y \in V} |(g^{-1})'(y)| = (\inf_{z \in D} |g'(z)|)^{-1}$. Applying part a) to $g^{-1}$ we get, with $y = g(z)$ and $y' = g(z')$,
\[ |z - z'| = |g^{-1}(y) - g^{-1}(y')| \le \Big(\inf_{z \in D} |g'(z)|\Big)^{-1} |y - y'| = \Big(\inf_{z \in D} |g'(z)|\Big)^{-1} |g(z) - g(z')|, \]
which is the estimate in b).

c) Replacing $D$ by a suitable neighbourhood of $K$ we may assume that $b < \inf_{z \in D} |g'(z)|$. By the open mapping theorem, for each point $z \in D$ there is a neighbourhood $U \subset D$ of $z$ which is mapped bijectively to an open ball $B$ around $g(z)$. Applying part b) to the restriction of $g$ to $U$ we get
\[ \inf_{z' \ne z'' \in U} \Big| \frac{g(z') - g(z'')}{z' - z''} \Big| > b. \]
By compactness there is a covering of $K$ by finitely many sets $U_j$ of this type, and there is some $\varepsilon > 0$ such that every pair $z', z''$ of points in $K$ with distance $< \varepsilon$ lies together in one of these sets $U_j$. This implies the assertion.

Remark 2.5
1. The convexity of the image in part b) is essential, as the simple example $g(z) = e^z$ on a rectangle $D = (-\varepsilon, \varepsilon) + i(-\pi + \varepsilon, \pi - \varepsilon)$ shows. A closer look at the construction in the first part of the proof of Theorem 2.12 shows that the sets $G_1$ and $G_2$ are indeed convex, so it is not really necessary to use part c) of the Lemma. Moreover for the purpose of the above proof it would be sufficient to have Theorem 2.7 only under the assumption of the stronger global lower estimate. However our local argument also works in similar situations where the contractions do not have convex range.
2. As is well known, part a) of this Lemma extends to $C^1$-maps in the Euclidean space $\mathbb{R}^d$ if one replaces $|g'(z)|$ by the operator norm $\|Dg(z)\|$, or equivalently by $\sigma_{\max}(z)$, where $\sigma_{\min} \le \dots \le \sigma_{\max}$ are the singular values of $Dg(z)$. Parts b) and c) extend similarly, but with $\|Dg^{-1}(g(z))\|^{-1} = \sigma_{\min}(z)$ instead of $|g'(z)|$.
Chapter 3
Differentiation

3.1 Upper and Lower Densities
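Definition 3.1 Let $\mu$ be a finite Borel measure on the metric space $(X, d)$ and let $\alpha > 0$ and $x \in X$. Then we introduce
\[ \overline{d}^{\alpha}(\mu, x) := \limsup_{r \searrow 0} \frac{\mu(B(x, r))}{r^{\alpha}} \]
as the upper $\alpha$-density and
\[ \underline{d}^{\alpha}(\mu, x) := \liminf_{r \searrow 0} \frac{\mu(B(x, r))}{r^{\alpha}} \]
as the lower $\alpha$-density of $\mu$ at $x$. If the two values coincide, i.e. $\underline{d}^{\alpha}(\mu, x) = \overline{d}^{\alpha}(\mu, x) = d^{\alpha}(\mu, x)$, we call the common value $d^{\alpha}(\mu, x)$ the $\alpha$-density of $\mu$ at $x$.

Example 3.1 Let $f : \mathbb{R}^n \to \mathbb{R}_+$ be continuous and consider the measure $\mu(A) := \int_A f\,dx$. Then
\[ d^{n}(\mu, x) = \lim_{r \searrow 0} \frac{\int_{B(x,r)} f(y)\,dy}{r^n} = \omega_n \lim_{r \searrow 0} \frac{\int_{B(x,r)} f(y)\,dy}{\operatorname{vol} B(x, r)} = \omega_n f(x). \]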
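As a small numerical illustration of Example 3.1 in dimension $n = 1$ (where $\omega_1 = 2$), one can take the hypothetical density $f(y) = y^2$, for which the mass of a ball is available in closed form. The names `ball_mass` and `density_ratio` are chosen here for illustration only.

```python
def ball_mass(x, r):
    """mu(B(x, r)) for mu(A) = integral over A of y**2 dy on the real line.

    The integral of y**2 over (x - r, x + r) equals 2*x**2*r + 2*r**3/3;
    the expanded form avoids cancellation for small r.
    """
    return 2.0 * x * x * r + 2.0 * r ** 3 / 3.0

def density_ratio(x, r, alpha=1.0):
    """The quotient mu(B(x, r)) / r**alpha from Definition 3.1."""
    return ball_mass(x, r) / r ** alpha

x = 1.5
ratios = [density_ratio(x, 10.0 ** -k) for k in range(1, 6)]
limit = 2.0 * x ** 2  # omega_1 * f(x) with omega_1 = 2

if __name__ == "__main__":
    for k, q in enumerate(ratios, start=1):
        print("r = 1e-%d  ratio = %.12f  limit = %.12f" % (k, q, limit))
```

The quotients decrease towards $\omega_1 f(x) = 2x^2$ as $r \searrow 0$, in line with the example.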
Remark 3.1 A remarkable theorem of Marstrand gives a partial converse: Let $\mu$ be a measure on $\mathbb{R}^n$ and $\alpha \ge 0$ such that $d^{\alpha}(\mu, x)$ exists $\mu$-almost everywhere. Then $\alpha \in \mathbb{Z}_+$.

A first application of the upper density is given by the following result.

Theorem 3.1 [local mass distribution principle] Let $E \subset X$ be such that $\overline{d}^{\alpha}(\mu, x) < \infty$ for all $x \in E$ and for some mass distribution $\mu$ on $E$. Then $\dim E \ge \alpha$. If moreover $E \in \mathcal{B}(X)$ and $\overline{d}^{\alpha}(\mu, x) \le c$ for all $x \in E$ then
\[ H^{\alpha}(E) \ge \frac{\mu(E)}{c}. \]
Lemma 3.2 For every $r > 0$ the map $x \mapsto \mu(B(x, r))$ is lower semicontinuous. The functions $\overline{d}^{\alpha}(\mu, \cdot)$ and $\underline{d}^{\alpha}(\mu, \cdot)$ are Borel measurable.

Proof: For the first statement we have to show that $\{x \in E \mid \mu(B(x, r)) > c\}$ is open for all $c$: Let $\mu(B(x_0, r)) > c$. Since the balls $B(x_0, r - \frac1n)$ increase towards $B(x_0, r)$ there is some $\varepsilon > 0$ such that $\mu(B(x_0, r - \varepsilon)) > c$. For every $x \in B(x_0, \varepsilon)$ we have $B(x_0, r - \varepsilon) \subset B(x, r)$ and hence $\mu(B(x, r)) > c$. From the first statement it follows that the upper and lower densities are lim sup and lim inf, respectively, of a countable family of Borel functions and hence measurable.

Proof of the Theorem: Suppose $E \in \mathcal{B}(X)$ is such that $\overline{d}^{\alpha}(\mu, x) \le c$ for all $x \in E$. Let $\delta > 0$ and $c' > c$. Consider the set
\[ E_\delta = \{x \in E \mid \mu(B(x, r)) < c' r^{\alpha} \text{ for all } r < \delta\} = \{x \in E \mid \mu(B(x, r)) < c' r^{\alpha} \text{ for all } r < \delta,\ r \in \mathbb{Q}\}. \]
Then $E_\delta$ is a Borel set. Let $(U_i)$ be a $\delta$-covering of $E$. If $x \in E_\delta \cap U_i$ then $U_i \subset B(x, |U_i|)$ and therefore
\[ \mu(U_i) \le \mu(B(x, |U_i|)) \le c' |U_i|^{\alpha}. \]
Hence
\[ \mu(E_\delta) \le \sum_{U_i \cap E_\delta \ne \emptyset} \mu(U_i) \le c' \sum_{U_i \cap E_\delta \ne \emptyset} |U_i|^{\alpha}. \]
Taking the infimum over all such coverings we get $H^{\alpha}(E_\delta) \ge \frac{\mu(E_\delta)}{c'}$ and hence $H^{\alpha}(E_\delta) \ge \frac{\mu(E_\delta)}{c}$ since $c' > c$ was arbitrary. Since $E_\delta \nearrow E$ as $\delta \searrow 0$ and since $\mu$ and $H^{\alpha}$ are measures on the Borel $\sigma$-algebra we finally get $H^{\alpha}(E) \ge \frac{\mu(E)}{c}$.

Let now $E$ be not necessarily Borel. For a suitable compact $K \subset E$ we have $\mu(K^c) = 0$ by assumption. Then the set $K_m := \{x \in K \mid \overline{d}^{\alpha}(\mu, x) \le m\}$ is Borel and hence $H^{\alpha}(E) \ge H^{\alpha}(K_m) \ge \frac{\mu(K_m)}{m} > 0$ for some $m$, and thus $\dim E \ge \alpha$.
3.2 Connection to Potential Theory
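Definition 3.2 Let $\mu$ be a mass distribution on $\mathbb{R}^d$ and let $\alpha \ge 0$. Then we call
- $\Phi_{\alpha}(x) := \int \frac{1}{|x - y|^{\alpha}}\,d\mu(y)$ the $\alpha$-potential of $\mu$ at the point $x$;
- $I_{\alpha}(\mu) := \int \Phi_{\alpha}(x)\,d\mu(x) = \iint \frac{1}{|x - y|^{\alpha}}\,d\mu(x)\,d\mu(y)$ the $\alpha$-energy of $\mu$;
- $C_{\alpha}(E) := \sup\{ \frac{1}{I_{\alpha}(\mu)} \mid \mu \text{ is a mass distribution on } E \text{ with } \mu(E) = 1\}$ the $\alpha$-capacity of $E$.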
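Theorem 3.3 [Potential theoretic dimension estimate] If $E \subset (X, d)$ and $\mu$ is a mass distribution on $E$ with $I_{\alpha}(\mu) < \infty$ then $H^{\alpha}(E) = \infty$, in particular $\dim E \ge \alpha$.

Proof: Suppose $I_{\alpha}(\mu) < \infty$. Then $\mu\{x\} = 0$ for each $x$. In the set $E_1 := \{x \in E \mid \overline{d}^{\alpha}(\mu, x) > 0\}$ choose a point $x \in E_1$. Then there is some $\varepsilon > 0$ and a sequence $r_i \searrow 0$ such that $\mu(B(x, r_i)) \ge \varepsilon r_i^{\alpha}$ for all $i$. Choose a number $q_i < r_i$ such that $\mu(B(x, q_i)) < \frac12 \varepsilon r_i^{\alpha}$, which is possible since $\mu(B(x, r)) \searrow 0$ as $r \searrow 0$. For a suitable subsequence we then even get $r_{i+1} < q_i < r_i$ and
\[ \mu(\underbrace{B(x, r_i) \setminus B(x, q_i)}_{=: B_i}) \ge \tfrac12 \varepsilon r_i^{\alpha}. \]
Then
\[ \Phi_{\alpha}(x) = \int \frac{1}{d(x, y)^{\alpha}}\,d\mu(y) \ge \sum_i \int_{B_i} \frac{1}{d(x, y)^{\alpha}}\,d\mu(y) \ge \sum_i \int_{B_i} \frac{1}{r_i^{\alpha}}\,d\mu(y) = \sum_i r_i^{-\alpha} \mu(B_i) \ge \frac12 \sum_{i=1}^{\infty} \varepsilon r_i^{\alpha} r_i^{-\alpha} = +\infty. \]
But $\infty > I_{\alpha}(\mu) = \int \Phi_{\alpha}\,d\mu \ge \int_{E_1} \Phi_{\alpha}(x)\,d\mu$. Thus we have necessarily $\mu(E_1) = 0$ and therefore $\overline{d}^{\alpha}(\mu, x) = 0$ $\mu$-almost everywhere. In particular we have $\overline{d}^{\alpha}(\mu, x) \le c$ $\mu$-a.e. for every $c > 0$. Letting $E_c := \{x \in E : \overline{d}^{\alpha}(\mu, x) \le c\}$ we get $\frac{\mu(E_c)}{c} \le H^{\alpha}(E)$ from Theorem 3.1, and for $c \searrow 0$ finally $H^{\alpha}(E) = +\infty$.

In particular we remember the two rules:
1. Finite $\alpha$-potential implies $\overline{d}^{\alpha}(\mu, x) = 0$.
2. Positive $\alpha$-capacity on $E$ implies $\dim E \ge \alpha$.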
3.3 The Hausdorff dimension of Brownian orbits
There are many types of random sets whose dimension can be determined by the methods presented above. As an example, in this section we determine the Hausdorff dimension of the orbit of a multidimensional Brownian motion.

Definition 3.3 A standard Brownian motion is a family $B = (B(t), t \ge 0)$ of random variables on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with the following properties:
1) $B(0) = 0$ a.s.
2) For all time parameters $0 \le t_0 < t_1 < \dots < t_n$ the increments $B(t_1) - B(t_0), \dots, B(t_n) - B(t_{n-1})$ are independent random variables.
3) For all $s < t$ the increment $B(t) - B(s)$ is $N(0, t - s)$-distributed.
4) For $\mathbb{P}$-almost all $\omega \in \Omega$ the 'path' $t \mapsto B(t)(\omega)$ is continuous.
Remark Property 4) is in the following sense actually a consequence of the other three conditions: One can prove that for every process $B'$ which has the corresponding properties 1) to 3) there is a process $B$ on the same probability space which has in addition property 4) such that $B(t) = B'(t)$ a.s. for every $t$.

We first collect a couple of basic properties of Brownian motion. Many of them can be derived from the following simple lemma.

Lemma 3.4 If the random variable $X$ has an $N(0, t)$ law then $aX$ has an $N(0, a^2 t)$ law.

Corollary 3.5 Let $\alpha > 0$ and $\mathcal{L}(X) = N(0, t)$. Then $E(|X|^{\alpha}) = t^{\alpha/2} c(\alpha)$ for some $c(\alpha)$ which does not depend on $t$.

Proof: Choose $c(\alpha) = E(|Z|^{\alpha})$ where the law of $Z$ is $N(0, 1)$ and apply the lemma.

The continuity can be made more precise. We use the following result without proof.

Theorem 3.6 For $\mathbb{P}$-almost all $\omega \in \Omega$ the 'path' $t \mapsto B(t)(\omega)$ is locally $\gamma$-Hölder-continuous for each $\gamma < \frac12$, i.e. for each $T < \infty$ and $\gamma < \frac12$ there is a constant $C_T^{\gamma}(\omega)$ such that for all $s, t \in [0, T]$ one has
\[ |B(t) - B(s)|(\omega) \le C_T^{\gamma}(\omega)\,|t - s|^{\gamma}. \]
This result will be used in connection with the Hölder principle:

Theorem 3.7 Let $f : X \to Y$ be a Hölder continuous map between two metric spaces with Hölder exponent $\alpha > 0$. Then for every subset $E$ of $X$ we have $\dim(f(E)) \le \frac{1}{\alpha} \dim(E)$.

Proof: The proof is an adaptation of the argument in the proof of the Lipschitz principle. Let $\delta > 0$ and let $s > \dim E$. Then there is a $\delta$-covering $(U_i)$ of $E$ with $\sum_i |U_i|^s < \infty$. Then $(f(U_i))$ is a covering of $f(E)$ with $|f(U_i)| \le C|U_i|^{\alpha} \le C\delta^{\alpha}$ by the Hölder continuity. Therefore $\sum_i |f(U_i)|^{s/\alpha} \le C^{s/\alpha} \sum_i |U_i|^s < \infty$. Such a covering exists for each $\delta > 0$. Therefore $\dim(f(E)) \le s/\alpha$ for each $s > \dim(E)$ and hence $\dim(f(E)) \le \dim(E)/\alpha$.

Finally we need the concept of a d-dimensional Brownian motion.
Definition 3.4 A family $(B(t) : t \ge 0)$ of $\mathbb{R}^d$-valued random vectors is a standard d-dimensional Brownian motion if it satisfies the conditions 1) - 4) of a standard Brownian motion, where the one-dimensional normal random variables are replaced by d-dimensional random vectors with law $N(0, t I_d)$ and $I_d$ is the d-dimensional unit matrix.
This definition is equivalent to saying that the component processes $(B_i(t) : t \ge 0)$, $i = 1, \dots, d$, form $d$ independent one-dimensional Brownian motions. Now we are able to prove the main result of this section.

Theorem 3.8 Let $B$ be a d-dimensional Brownian motion, $d \ge 2$. Then $\mathbb{P}$-almost all of the random sets $B[0, 1] = \{B(t) : 0 \le t \le 1\}$ have Hausdorff dimension 2.

Proof: 1. For all $\gamma < \frac12$ the set $B[0, 1](\omega)$ is a.s. the image of the unit interval under a Hölder map with exponent $\gamma$ by Theorem 3.6. By Theorem 3.7 we get $\dim B[0, 1] \le \gamma^{-1}$. Since this is true for all $\gamma = \frac12 - \frac1m$, $m \in \mathbb{N}$, $m > 2$, we conclude $\dim B[0, 1] \le 2$ a.s.

2. For the lower estimate we want to apply the method of the previous section. Let $\mu$ be the (random) sojourn time distribution on $B[0, 1]$, i.e. $\mu(A) = \lambda(B|_{[0,1]}^{-1}(A)) = \lambda\{t \in [0, 1] : B(t) \in A\}$. Then by the usual transformation rule for image measures we have
\[ \int_{B[0,1]} h(x)\,d\mu = \int_0^1 h(B(t))\,dt \]
for every measurable function $h \ge 0$ and similarly
\[ \iint_{B[0,1]^2} |x - y|^{-\alpha}\,d\mu(x)\,d\mu(y) = \int_0^1\!\!\int_0^1 |B(t) - B(s)|^{-\alpha}\,dt\,ds. \]
Now we use Fubini and the multidimensional version of Corollary 3.5 to get
\[ E \iint_{B[0,1]^2} |x - y|^{-\alpha}\,d\mu(x)\,d\mu(y) = E \int_0^1\!\!\int_0^1 |B(t) - B(s)|^{-\alpha}\,dt\,ds = \int_0^1\!\!\int_0^1 E\big(|B(t) - B(s)|^{-\alpha}\big)\,dt\,ds = \mathrm{const} \int_0^1\!\!\int_0^1 |t - s|^{-\alpha/2}\,dt\,ds. \]
The last integral is finite if and only if $-\alpha/2 > -1$, or $\alpha < 2$. In this case the random integral $\iint_{B[0,1]^2} |x - y|^{-\alpha}\,d\mu(x)\,d\mu(y)$ is finite a.s. and thus, according to Theorem 3.3, the random set $B[0, 1]$ has a.s. dimension at least $\alpha$ for each $\alpha < 2$. This implies the result.
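The finiteness criterion for the last double integral can be made explicit. The following short computation is a sketch using the abbreviation $\beta := \alpha/2$ (a symbol introduced here only for convenience), the symmetry of the integrand and the substitution $u = t - s$:

```latex
\int_0^1\!\!\int_0^1 |t-s|^{-\beta}\,dt\,ds
  = 2\int_0^1 (1-u)\,u^{-\beta}\,du
  = 2\left(\frac{1}{1-\beta}-\frac{1}{2-\beta}\right)
  = \frac{2}{(1-\beta)(2-\beta)}
  \qquad (\beta < 1),
```

since for fixed $u \in (0,1)$ the set of pairs $0 \le s < t \le 1$ with $t - s = u$ has length $1 - u$. For $\beta \ge 1$ the function $u^{-\beta}$ is not integrable at $0$, so the double integral is infinite.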
3.4 Lebesgue’s Differentiation Theorem
We start with a simple lemma concerning the regularity of finite Borel measures.
Lemma 3.9 Let $\mu$ be a finite measure on $\mathcal{B}(X)$ where $X$ is a metric space. Then $\mu$ is regular:
\[ \mu(A) = \inf\{\mu(U) \mid A \subset U,\ U \text{ open}\} \quad \text{(outer regularity)} \qquad (3.1) \]
\[ \phantom{\mu(A)} = \sup\{\mu(F) \mid F \subset A,\ F \text{ closed}\} \quad \text{(inner regularity)} \qquad (3.2) \]

Proof: (Sketch) If $A \subset X$ is open then property (3.1) is trivial and property (3.2) is easy: consider the sets $A_n = \{x \in X \mid \operatorname{dist}(x, A^c) \ge \frac1n\} \nearrow A$. It remains to show that $\mathcal{A} = \{A \mid A \text{ satisfies (3.1) and (3.2)}\}$ is a $\sigma$-algebra. Clearly this system is closed under taking complements. That it is closed under countable unions is easily verified with the typical $\varepsilon 2^{-n}$-argument.

The following local result is easy for continuous integrands. But it is somewhat surprising that the conclusion holds at least almost everywhere for arbitrary Lebesgue integrable functions. It is another nice application of Vitali's covering theorem. The class of all balls in this result can be replaced by many other classes of sets.

Theorem 3.10 [Lebesgue's differentiation Theorem] Let $f : \mathbb{R}^m \to \mathbb{R}$ be Lebesgue integrable, i.e. $f \in L^1(\mathbb{R}^m)$. Then for $\lambda^m$-almost all $x \in \mathbb{R}^m$ the following holds: For every sequence $(K_n)_n$ of balls which contain $x$ and whose diameter converges to 0 we have
\[ \lim_{n \to \infty} \frac{\int_{K_n} f(y)\,dy}{\lambda^m(K_n)} = f(x). \qquad (3.3) \]

As an illustration we prove that measurable sets consist almost exclusively of 'density points'.

Definition 3.5 A set $A \subset \mathbb{R}^m$ has a density point at $x$ if
\[ \lim_{r \to 0} \frac{\lambda^m(A \cap B(x, r))}{\lambda^m(B(x, r))} = 1. \]
Corollary 3.1 Let $A \subset \mathbb{R}^m$ be Lebesgue measurable. Then Lebesgue-almost every point in $A$ is a density point of $A$.

For the proof it suffices to consider bounded sets, in which case one can apply the Theorem to the indicator function of $A$.

Proof: (of the Theorem). First of all it suffices to prove the result for nonnegative $f$. Also we note that it does not matter whether the balls in the statement are closed or open since the boundary of a ball always is a Lebesgue nullset. Let now
\[ \overline{f}(x) := \lim_{\delta \to 0} \sup\Big\{ 1_K(x)\, \frac{\int_K f(y)\,dy}{\lambda^m(K)} \ \Big|\ K \text{ is a ball containing } x \text{ with radius } r < \delta \Big\}. \]
The function $\overline{f}$ is measurable. Indeed, if the balls $K$ are open and $s_\delta(x)$ denotes the sup in the definition of $\overline{f}$ then the sets $\{s_\delta > c\}$ are open. Since $s_\delta$ is monotone in $\delta$ it suffices to let $\delta$ run through a nullsequence, and the function $\overline{f}$ is measurable as the limit of a sequence of lower semicontinuous functions.

Now let $r > 0$ and consider the set $A_r := \{x \in \mathbb{R}^m \mid f(x) < r < \overline{f}(x)\}$. We want to show that $A_r$ is a nullset, i.e. $\lambda^m(A_r) = 0$. We proceed by contradiction. Suppose $\lambda^m(A_r) > 0$. Then
\[ \int_{A_r} f(x)\,dx < \int_{A_r} r\,dx = r\lambda^m(A_r). \]
Due to outer regularity of the measure $\mu(B) := \int_B f(x)\,dx$ (cf. Lemma 3.9) we find an open set $U$ containing $A_r$ which still satisfies $\int_U f(x)\,dx < r\lambda^m(A_r)$. Let now $\Phi$ be the collection of all balls $K$ contained in $U$ such that
\[ \frac{\int_K f(y)\,dy}{\lambda^m(K)} > r. \qquad (3.4) \]
By definition of $A_r$ each point of $A_r$ is an element of an arbitrarily small member of $\Phi$, i.e. $\Phi$ is a Vitali class for $A_r$. According to Corollary 1.16 there is a disjoint sequence $(K_l)$ in $\Phi$ whose union covers $A_r$ up to a nullset. Therefore
\[ r\lambda^m(A_r) > \int_U f(y)\,dy \ge \int_{\bigcup_l K_l} f(y)\,dy = \sum_l \int_{K_l} f(y)\,dy \ge r \sum_l \lambda^m(K_l) \ge r\lambda^m(A_r). \qquad (3.5) \]
This contradiction shows that $A_r$ is a nullset and hence $\{f < \overline{f}\} = \bigcup_{r \in \mathbb{Q}_+} A_r$ as well. This proves the lim sup part of (3.3). Similarly we treat the lim inf part.
Another application of this result is the following version of the one-dimensional Fundamental Theorem of calculus.

Corollary 3.11 Let $f : \mathbb{R} \to \mathbb{R}$ be Lebesgue integrable and let $F(x) = \int_{-\infty}^x f(s)\,ds$. Then $F$ is $\lambda^1$-a.e. differentiable with $F'(x) = f(x)$.

Proof: Let $x$ be a point for which the assertion of Theorem 3.10 holds. We claim that $F$ is differentiable at $x$. Let $(t_n)$ be a nullsequence of positive numbers. We use the balls $[x, x + t_n]$. Then
\[ \lim_{n \to \infty} \frac{F(x + t_n) - F(x)}{t_n} = \lim_{n \to \infty} \frac{\int_x^{x + t_n} f(y)\,dy}{\lambda^1([x, x + t_n])} = f(x). \]
This shows the assertion for the right hand derivative. A similar argument works for the left hand derivative.
For later reference we note here yet another result which shows that the common assumption of continuous differentiability in one-dimensional calculus is not really essential.

Lemma 3.12 [Integration by parts] Let $F, G$ be the primitives of $f$ and $g$ respectively, i.e.
\[ F(x) = \int_{-\infty}^x f(s)\,ds; \qquad G(x) = \int_{-\infty}^x g(s)\,ds \qquad \forall x \in \mathbb{R}. \]
Then
\[ \int_a^b f(x)G(x)\,dx = FG\,\Big|_a^b - \int_a^b g(x)F(x)\,dx. \]

Proof: The key is a 'Fubini flip':
\begin{align*}
\int_a^b F(x)g(x)\,dx &= \int_a^b \Big[F(a) + \int_a^x f(y)\,dy\Big] g(x)\,dx \\
&= \int_a^b \Big[F(a) + \int_a^b f(y) 1_{[y<x]}\,dy\Big] g(x)\,dx \\
&= F(a)[G(b) - G(a)] + \int_a^b\!\!\int_a^b f(y) 1_{[y<x]}\, g(x)\,dy\,dx \\
&= F(a)[G(b) - G(a)] + \int_a^b f(y) \int_y^b g(x)\,dx\,dy \\
&= F(a)[G(b) - G(a)] + \int_a^b f(y)[G(b) - G(y)]\,dy \\
&= F(a)[G(b) - G(a)] + G(b)[F(b) - F(a)] - \int_a^b f(y)G(y)\,dy.
\end{align*}
Remark 3.2 A function $F$ which is locally the primitive of an integrable function in the sense of the previous two results is also called an absolutely continuous function. We shall later study two extensions of this concept, one for measures and one for functions of several variables.
3.5 Rademacher’s Theorem
Our final goal in this chapter, Rademacher’s Theorem, plays an essential role in the program of softening the smoothness assumptions in classical calculus. We start with the one-dimensional case.
Lemma 3.13 Let $F : [a, b] \to \mathbb{R}$ be Lipschitz continuous with Lipschitz constant $L$. Then $F$ is the primitive of a Lebesgue-integrable function $f$ with $|f| \le L$ a.e. In particular every Lipschitz function $F$ on $\mathbb{R}$ is absolutely continuous and differentiable almost everywhere.

Proof: We use a little Hilbert space theory. Since the assertion is of a local character we may assume that the interval is the unit interval. Let $\psi$ be a step function on the interval $[0, 1]$, i.e. $\psi = \sum_i \alpha_i 1_{]a_i, a_{i+1}]}$. Then we define
\[ I_{[0,1]}(\psi) := \sum_i \alpha_i [F(a_{i+1}) - F(a_i)]. \]
Note that
\[ |I(\psi)| \le L \sum_i |\alpha_i|\,|a_{i+1} - a_i| = L\|\psi\|_1 \le L\|\psi\|_2 \]
by Jensen's inequality. Moreover the functional $I$ is linear. The step functions are dense in the Hilbert space $L^2([0, 1])$. Hence there is a unique continuous extension of $I$ to the whole space $L^2([0, 1])$. According to the representation theorem of Fischer-Riesz there is a function $f \in L^2([0, 1])$ such that for all $\psi$
\[ I(\psi) = \langle f, \psi \rangle = \int_0^1 \psi(x) f(x)\,dx. \]
Choose now $\psi := 1_{[x, y]}$ for some subinterval $[x, y] \subset [0, 1]$. Then
\[ F(y) - F(x) = I(\psi) = \int_0^1 1_{[x, y]}(u) f(u)\,du = \int_x^y f(u)\,du. \]
Therefore $F$ is the primitive of $f$. Moreover from the Lipschitz condition we see that $|F'(x)| \le L$ wherever this derivative exists. Now Corollary 3.11 implies that $|f| \le L$ a.e.

The multivariate result reads as follows.

Theorem 3.14 (Rademacher) Let $U \subset \mathbb{R}^m$ be open and let $g : U \to \mathbb{R}^n$ be locally Lipschitz continuous. Then $g$ is differentiable at $\lambda^m$-almost all points $x \in U$.

Remark 3.3 For $m = 1$ the Theorem is sharp: Let $A \in \mathcal{B}(\mathbb{R})$ with $\lambda^1(A) = 0$. Then there is a Lipschitz function $f : \mathbb{R} \to \mathbb{R}$ which is differentiable at no point of $A$ (Zahorski 74). However for $m = 2$ there are Borel measurable Lebesgue nullsets $A$ such that every Lipschitz function $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at at least one point $x \in A$ (Preiss '90).
In the proof we shall need the following well known continuity principle.

Proposition 3.15 Let $X, Y$ be metric spaces and assume that $Y$ is complete. Let $(f_i)_{i \in I}$ be a family of maps from $X$ to $Y$ which is uniformly equicontinuous, i.e. for all $\varepsilon > 0$ there is some $\delta > 0$ such that for all $x, x' \in X$ with $d(x, x') < \delta$ and all $i \in I$ we have $d(f_i(x), f_i(x')) < \varepsilon$. Moreover suppose that $(f_i(x))_{i \in I}$ converges in $Y$ for all $x$ in a dense subset $X_0$ of $X$. Then
1. $(f_i)$ converges uniformly on the whole space $X$.
2. The limit function $f(x) := \lim_{i \in I} f_i(x)$ is uniformly continuous.

Proof: Let $\varepsilon > 0$. Choose any point $x \in X$. Then we can find some $x_0 \in X_0$ with $d(x, x_0) < \delta$ where $\delta$ is chosen for $\varepsilon$ according to the assumption. Then there is some $i_0 \in I$ such that for all $i, j > i_0$ we have $d(f_i(x_0), f_j(x_0)) < \varepsilon$ since $\lim_i f_i(x_0)$ exists. Then
\[ d(f_i(x), f_j(x)) \le d(f_i(x), f_i(x_0)) + d(f_i(x_0), f_j(x_0)) + d(f_j(x_0), f_j(x)) < 3\varepsilon. \]
The space $Y$ is complete and hence $f(x) = \lim_i f_i(x)$ exists. Moreover $d(f_i(x), f(x)) \le 3\varepsilon$ for all $i > i_0$ and $x \in X$, which implies uniform convergence. Finally $d(x, x') < \delta$ implies $d(f(x), f(x')) \le \varepsilon$, i.e. the limit function is uniformly continuous.

Proof (of the Theorem): 1. It suffices to treat the components of the function $f$ separately, i.e. we only consider the scalar valued case. Moreover differentiability is a local property. Thus we assume that $f$ is globally Lipschitz.

2. First we consider directional derivatives. Let $v \in \partial B(0, 1)$ be any vector of unit length. Define
\[ D_v f(x) := \lim_{t \searrow 0} \frac{f(x + tv) - f(x)}{t} \]
whenever the limit exists. Since $f$ is continuous, we may restrict our attention to $t \in \mathbb{Q}$. Therefore the set $A_v := \{x \mid D_v f(x) \text{ exists}\}$ belongs to $\mathcal{B}(\mathbb{R}^m)$ and $D_v f\, 1_{A_v}$ is Borel measurable. For every $x \in v^{\perp}$ the complement of the set $\{\xi \in \mathbb{R} \mid D_v f \text{ exists at the point } x + \xi v\} = \{\xi \in \mathbb{R} \mid x + \xi v \in A_v\}$ is a $\lambda^1$-nullset because of the one-dimensional version of the Theorem. Thus, by Fubini,
\[ \lambda^m(A_v^c) = \int 1_{A_v^c}(z)\,dz = \int_{v^{\perp}} \int_{\mathbb{R}} 1_{A_v^c}(x + \xi v)\,d\xi\,dx = 0. \]
Thus each directional derivative exists a.e.

3. Applying part 2. to $v = e_1, \dots, e_m$ we can define
\[ \operatorname{grad} f(x) = \Big( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_m}(x) \Big) \]
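$\lambda^m$-almost everywhere. We claim: For fixed $v \in \partial B(0, 1)$ one has
\[ D_v f(x) = \langle v, \operatorname{grad} f(x) \rangle \quad \lambda^m\text{-a.e.} \qquad (3.6) \]
Let $g \in C_c^{\infty}(\mathbb{R}^m)$ be a test function. Lemma 3.12 implies the corresponding multivariate integration by parts:
\[ \int_{\mathbb{R}^m} D_v f(x) g(x)\,dx = \int_{v^{\perp}} \int_{\mathbb{R}} D_v f(y + \xi v) g(y + \xi v)\,d\xi\,dy = \int_{v^{\perp}} \Big( - \int_{\mathbb{R}} f(y + \xi v) D_v g(y + \xi v)\,d\xi \Big) dy = - \int_{\mathbb{R}^m} f(x) D_v g(x)\,dx. \]
But the last expression can be rewritten as
\[ - \int_{\mathbb{R}^m} f(x) \sum_{i=1}^m v_i \frac{\partial g}{\partial x_i}(x)\,dx = - \sum_{i=1}^m v_i \int_{\mathbb{R}^m} f(x) \frac{\partial g}{\partial x_i}(x)\,dx = \sum_{i=1}^m v_i \int_{\mathbb{R}^m} \frac{\partial f}{\partial x_i}(x) g(x)\,dx = \int_{\mathbb{R}^m} \langle v, \operatorname{grad} f(x) \rangle\, g(x)\,dx. \]
Thus in order to conclude (3.6) we need only the following general remark: If for two locally integrable functions one has $\int f_1 g\,dx = \int f_2 g\,dx$ for all $g \in C_c^{\infty}$ then $f_1 = f_2$ a.e. For this consider the class $\mathcal{D} := \{g \mid \int f_1 g\,dx = \int f_2 g\,dx\}$, which is closed under monotone convergence. Thus $1_U \in \mathcal{D}$ for every bounded open set $U$. Then by outer regularity $1_A \in \mathcal{D}$ for each bounded Borel set $A$. Applying this to bounded subsets of $\{f_1 > f_2\}$ and $\{f_2 > f_1\}$ respectively one concludes that both sets must be nullsets. This concludes the proof of (3.6).

4. Let now $\{v_k\}_{k = 1, 2, \dots}$ be a countable dense subset of $\partial B(0, 1)$. Define
\[ A_k := \{x \in \mathbb{R}^m \mid D_{v_k} f(x) \text{ and } \operatorname{grad} f(x) \text{ exist and } D_{v_k} f(x) = \langle v_k, \operatorname{grad} f(x) \rangle\}. \]
Then $\lambda^m(A_k^c) = 0$ and $A := \bigcap_{k=1}^{\infty} A_k$ equals $\mathbb{R}^m$ up to a nullset. We claim that $f$ is differentiable on $A$, i.e. $f(x + w) = f(x) + \langle w, \operatorname{grad} f(x) \rangle + o(|w|)$. Fix $x \in A$ and consider for $t > 0$ the expression
\[ Q_t(v) := \frac{f(x + tv) - f(x)}{t} - \langle v, \operatorname{grad} f(x) \rangle. \]
Because of
\[ |D_v f(x)| = \lim_{t \searrow 0} \frac{|f(x + tv) - f(x)|}{t} \le \lim_{t \searrow 0} \frac{\operatorname{Lip}(f)\, t |v|}{t} = \operatorname{Lip}(f)|v| \]
we get
\[ |\operatorname{grad} f(x)| = \sqrt{\sum_i (D_{e_i} f(x))^2} \le \sqrt{m}\, \operatorname{Lip}(f) \]
and hence
\[ |Q_t(v) - Q_t(v')| \le \frac{\operatorname{Lip}(f)\, t |v - v'|}{t} + \sqrt{m}\, \operatorname{Lip}(f) |v - v'|. \]
Thus the family $Q_t$ is equicontinuous as a function of $v$. Since $x \in A$ we have $Q_t(v_k) \to 0$ for all $k$ as $t$ approaches 0. Since $\{v_k\}$ is dense in $\partial B(0, 1)$, the functions $Q_t$ converge to 0 uniformly on the whole unit sphere by Proposition 3.15. With $t := |w|$ and $v := \frac{w}{|w|}$ this implies
\[ f(x + w) = f\Big(x + |w| \frac{w}{|w|}\Big) = f(x) + t\, \frac{f(x + tv) - f(x)}{t} = f(x) + t\big(\langle v, \operatorname{grad} f(x) \rangle + Q_t(v)\big) = f(x) + \langle w, \operatorname{grad} f(x) \rangle + |w| Q_t(v). \]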
The error term $|w| Q_t(v)$ is of the order $o(|w|)$ by the above uniform convergence statement.

As an illustration of the fact that in this general setting of Lipschitz functions quite intuitive statements of calculus need a little additional attention, we prove that the derivative of a Lipschitz function vanishes almost everywhere on each level set of this function.

Corollary 3.2 If $f$ is locally Lipschitz on $\mathbb{R}^m$ and $Z = \{f = 0\}$ then $\lambda^m\{x \in Z \mid Df(x) \ne 0\} = 0$.

Proof: Again we may restrict ourselves to the scalar valued case. We may assume that $\lambda^m(Z) > 0$. Let $x \in Z$ be a point of density of $Z$ at which $f$ is differentiable (recall from 3.1 that almost all points in $Z$ are density points). Suppose now $a := Df(x) \ne 0$. Let $S := \{v \in \partial B(0, 1) \mid |\langle v, a \rangle| > \frac12 |a|\}$. This is a nonempty open subset of $\partial B(0, 1)$. Therefore there is a number $c > 0$ such that
\[ \lambda^m\big(B(x, r) \cap (x + \mathbb{R}_+ S)\big) \ge c\, \lambda^m(B(x, r)) \]
for all $r > 0$. Since the density point property implies
\[ 1 = \lim_{r \to 0} \frac{\lambda^m(Z \cap B(x, r))}{\lambda^m(B(x, r))}, \]
there is for small $r$ a point $y \in Z \cap (x + \mathbb{R}_+ S) \cap B(x, r)$ with $y \ne x$. In fact for sufficiently small $r$ we get
\[ \lambda^m\big(B(x, r) \setminus (x + \mathbb{R}_+ S)\big) \le (1 - c)\lambda^m(B(x, r)) < \Big(1 - \frac{c}{2}\Big)\lambda^m(B(x, r)) \le \lambda^m(B(x, r) \cap Z) \]
and thus $Z \cap B(x, r) \not\subset B(x, r) \cap (x + \mathbb{R}_+ S)^c$. If $y$ is such a point we can write it as $y = x + \rho v$ with $0 < \rho < r$ and $v \in S$, and for sufficiently small $r$ we get $|f(x + \rho v)| = |\rho \langle a, v \rangle + o(\rho)| > 0$, in contradiction to $y \in Z = \{f = 0\}$.
Remark 3.4 Rademacher's theorem stays valid for functions $f$ which are only defined on a subset $E$ of $\mathbb{R}^m$ in the sense that the formula $f(x + w) = f(x) + Df(x)w + o(|w|)$ holds only for those pairs for which both $x$ and $x + w$ are in $E$. In fact we have the following extension theorem for Lipschitz functions.

Theorem 3.16 Let $(X, d)$ be a metric space, let $E \subset X$ be any subset and let $f : E \to \mathbb{R}^n$ be a Lipschitz function. Then $f$ has a Lipschitz extension $\overline{f} : X \to \mathbb{R}^n$ with Lipschitz constant $\sqrt{n} L$ where $L = \operatorname{Lip}(f)$.

Proof: First let us consider the scalar case $n = 1$: Define
\[ \overline{f}(x) := \inf_{e \in E} \{f(e) + L d(x, e)\} \]
for all $x \in X$. Then

a) $\overline{f}|_E = f$: Let $x \in E$ be given. Then $f(x) = f(x) + 0 = f(x) + L d(x, x)$ is an admissible element of the set the inf of which is $\overline{f}(x)$. Thus $f(x) \ge \overline{f}(x)$. Conversely for all $e \in E$ we have $f(x) = f(e) + f(x) - f(e) \le f(e) + L d(x, e)$ and hence $f(x) \le \overline{f}(x)$.

b) In order to see that $\operatorname{Lip}(\overline{f}) = \operatorname{Lip}(f)$ let $x, y \in X$ and, say, $\overline{f}(x) < \overline{f}(y)$. Choose $\varepsilon > 0$. Then there is some $e \in E$ such that $f(e) + L d(x, e) \le \overline{f}(x) + \varepsilon$. Then
\[ \overline{f}(y) - \overline{f}(x) \le f(e) + L d(y, e) - \overline{f}(x) \le f(e) + L d(x, e) + L d(x, y) - \overline{f}(x) \le L d(x, y) + \varepsilon. \]

$n > 1$: For each $i = 1, \dots, n$ the $i$-th component $f_i$ is also Lipschitz with constant $L$. So there is a corresponding extension $\overline{f}_i$. With these we define $\overline{f}(x) := (\overline{f}_1(x), \dots, \overline{f}_n(x))$. Clearly $\overline{f}|_E = f$ and
\[ |\overline{f}(x) - \overline{f}(y)| = \sqrt{\sum_i (\overline{f}_i(x) - \overline{f}_i(y))^2} \le \sqrt{n L^2 d^2(x, y)} = \sqrt{n}\, L\, d(x, y). \]

Remark 3.5 The constant in the vector valued case can be improved: According to the Theorem of Kirszbraun one can replace the factor $\sqrt{n}$ by 1 (cf. [9]).
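The scalar extension formula $\overline{f}(x) = \inf_{e \in E}\{f(e) + L d(x, e)\}$ from the proof of Theorem 3.16 can be tried out directly when $E$ is finite, so that the infimum is a minimum. The sketch below is only an illustration: the helper name `lipschitz_extension`, the sample set `E` and the values `f_on_E` are hypothetical choices, not part of the notes.

```python
def lipschitz_extension(points, values, lipschitz, dist):
    """Return x -> min over e in E of (f(e) + L * d(x, e)) for a finite set E.

    points    : the finite set E
    values    : the values f(e), in the same order as points
    lipschitz : a Lipschitz constant L of f on E
    dist      : the metric d on the ambient space
    """
    def extended(x):
        return min(v + lipschitz * dist(x, e) for e, v in zip(points, values))
    return extended

# Hypothetical sample data on the real line with d(x, y) = |x - y|.
E = [0.0, 1.0, 3.0]
f_on_E = [0.0, 1.0, 0.0]
L = 1.0  # largest difference quotient of f on E is |1 - 0| / |1 - 0| = 1

def dist(x, y):
    return abs(x - y)

f_bar = lipschitz_extension(E, f_on_E, L, dist)

if __name__ == "__main__":
    for x in (-1.0, 0.5, 2.0, 4.0):
        print(x, f_bar(x))
```

On the sample data the extension agrees with $f$ on $E$ and is a minimum of finitely many functions with Lipschitz constant $L$, which mirrors steps a) and b) of the proof.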
Chapter 4
The Area and Coarea Formulas

From Rademacher's Theorem we know that Lipschitz functions are differentiable a.e. We have seen in the proof that in one dimension they share a couple of properties which in classical calculus are shown for $C^1$ functions. The purpose of this chapter is to prove in an analogous way extensions of the classical change of variable formula in multidimensional integration theory, where the transformation is only assumed to be Lipschitz. It is remarkable that with the help of Hausdorff measure one even gets such results for highly noninjective maps. The generality of these results makes the proofs considerably more involved than in the classical situation. However one gets as a byproduct a neat and natural extension of the theory of integration on manifolds. Throughout this chapter we use the euclidean Hausdorff measures $H^{\alpha}$ defined by 1.7 for integer dimension $\alpha$. They are normalized in such a way that they coincide with standard Lebesgue measure $\lambda^{\alpha}$ for subsets of $\alpha$-dimensional linear subspaces of $\mathbb{R}^n$.
4.1 Egorov’s Theorem and an application to Lipschitz maps
The main tool in this chapter besides Rademacher's Theorem and the Vitali covering Theorem is Egorov's Theorem from general measure theory, which says that every a.e. converging sequence of measurable functions is even uniformly converging outside a small exceptional set. This will imply that a Lipschitz function $f$ on $\mathbb{R}^n$ is even $C^1$ on large parts of the space and that locally $f$ can be sufficiently well approximated by affine functions.

Theorem 4.1 [Egorov's Theorem] Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space. Let $(f_n)$ be a sequence of measurable functions on $\Omega$ with values in the metric space $(X, d)$ such that the limit $f(\omega) = \lim_{n \to \infty} f_n(\omega)$ exists $\mu$-a.e. Then for every $\varepsilon > 0$ there is a set $F \in \mathcal{A}$ with $\mu(F^c) < \varepsilon$ such that $f_n$ converges uniformly on $F$.

Proof: Consider for $k, m \in \mathbb{N}$ the sets
\[ A_{k,m} = \{\omega \in \Omega \mid d(f_n(\omega), f(\omega)) \le \tfrac1k \ \ \forall n \ge m\}. \]
For fixed $k$, the sets $A_{k,m}$ increase to almost all of $\Omega$ as $m \to \infty$. Since $\mu(\Omega) < \infty$ there exists an index $m_k$ such that $\mu(A_{k,m_k}^c) < \varepsilon 2^{-k}$. Let
\[ F := \bigcap_{k=1}^{\infty} A_{k,m_k}. \]
Then $\mu(F^c) = \mu(\bigcup_k A_{k,m_k}^c) \le \sum_k \mu(A_{k,m_k}^c) < \varepsilon$, and for $\omega \in F$ and each $k \in \mathbb{N}$ we have $\omega \in A_{k,m_k}$ and hence $d(f_n(\omega), f(\omega)) \le \frac1k$ for all $n \ge m_k$. This implies uniform convergence on $F$.

The announced application to Lipschitz maps reads as follows.

Proposition 4.2 Let $f : \mathbb{R}^m \to \mathbb{R}^n$ be locally Lipschitz. Then every Borel set $A$ has a decomposition
\[ A = \bigcup_{k=1}^{\infty} A_k \cup C \]
into disjoint Borel sets such that $C$ is a Lebesgue nullset and for each $k$ the total derivative $Df$ is uniformly continuous on $A_k$ and the function $f$ is 'uniformly differentiable' in $A_k$, i.e. the functions $\Delta_\ell$ defined by
\[ \Delta_\ell(x) := \sup_{x' \ne x'',\ x', x'' \in B(x, \frac1\ell) \cap A_k} \Big\{ \frac{|f(x') - f(x'') - Df(x)(x' - x'')|}{|x' - x''|} \Big\} \qquad (4.1) \]
converge uniformly to 0 on $A_k$.

Proof: 1. By Rademacher's Theorem we may without loss of generality assume that $f$ is differentiable everywhere on $A$, putting the 'bad' points into the nullset $C$. If $A$ is unbounded we first decompose $A$ into a countable number of disjoint bounded sets and it suffices to prove the result for each of these separately, i.e. we also may assume that $A$ is bounded. Then Egorov's theorem can be applied to the restriction of Lebesgue measure to $A$. For each $i \in [1, n]$ and $j \in [1, m]$ let
\[ d^{\ell}_{i,j}(x) := \frac{f_i(x + \frac1\ell e_j) - f_i(x)}{\frac1\ell} \]
and let $d^{\ell}(x) : \mathbb{R}^m \to \mathbb{R}^n$ denote the linear map induced by this matrix function. Then $d^{\ell}$ converges pointwise on $A$ to $Df$. Hence $Df$ is measurable, and the restriction of $Df$ is continuous on any subset of $A$ on which this convergence is uniform. By Egorov's theorem 4.1 we can thus split $A$ into countably many disjoint subsets $A_k$ plus a nullset $C$ such that $Df$ is continuous on each $A_k$.

2. We also introduce the preliminary function $\tilde{\Delta}_\ell$ by
\[ \tilde{\Delta}_\ell(x) := \sup_{\xi \ne 0,\ \xi \in B(0, \frac1\ell)} \Big\{ \frac{|f(x + \xi) - f(x) - Df(x)(\xi)|}{|\xi|} \Big\}. \qquad (4.2) \]
This function is measurable because it suffices to let $\xi$ run through all rational vectors in $B(0, \frac1\ell)$. The differentiability of $f$ on $A$ implies that $\tilde{\Delta}_\ell$ converges to 0. Applying Egorov's theorem again we may even assume that this convergence is uniform on each $A_k$. By inner regularity of Lebesgue measure and increasing the nullset we can split up further such that each $A_k$ becomes compact, and hence $Df$ is even uniformly continuous on each $A_k$.

3. The differentiability of $f$ on $A_k$ implies that for all $x', x'' \in B(x, \frac{1}{2\ell}) \cap A_k$
\begin{align*}
|f(x') - f(x'') - Df(x)(x' - x'')| &\le |f(x') - f(x'') - Df(x')(x' - x'')| + \|Df(x) - Df(x')\| \cdot |x' - x''| \\
&\le \big( \tilde{\Delta}_\ell(x') + \|Df(x) - Df(x')\| \big) |x' - x''|.
\end{align*}
Thus
\[ \sup_{x \in A_k} \Delta_{2\ell}(x) \le \sup_{x' \in A_k} \tilde{\Delta}_\ell(x') + \sup_{x, x' \in A_k,\ |x - x'| \le \frac1\ell} \|Df(x) - Df(x')\|. \]
The right hand side tends to 0 as $\ell \to \infty$, which concludes the proof.
4.2 The area formula
The main result of this section is the following. In contrast to the classical transformation formula the image space can be higher-dimensional than the domain. This is due to the fact that with Hausdorff measure we already have the natural concept of surface measure. Therefore we do not need to establish first the transformation rule before we can define integration over the image set. Note the form of the Jacobian. It uses the transpose $Df(x)^*$ of the derivative $Df(x)$. This assumes the usual euclidean inner product so that the composition $T^* T$ is meaningful for all linear maps $T$ from $\mathbb{R}^n$ to $\mathbb{R}^N$.

Theorem 4.3 (Area formula) Let $f : \mathbb{R}^n \to \mathbb{R}^N$ $(N \ge n)$ be locally Lipschitz. Then
\[ \int_A J(x)\,dx = \int_{\mathbb{R}^N} \#\big(f^{-1}\{y\} \cap A\big)\,dH^n(y) = \int_{f(A)} \#\big(f^{-1}\{y\} \cap A\big)\,dH^n(y) \qquad (4.3) \]
for all $A \in \mathcal{B}(\mathbb{R}^n)$. Here $J(x) = \sqrt{\det Df(x)^* Df(x)}$.
In order to see that the formula is reasonable we first verify it for affine nonsingular $f$.

Lemma 4.4 The theorem holds if $f$ is injective and affine, i.e. $f(x) = a + F(x)$ with a nonsingular linear map $F$.

Proof: Let us first assume that $f(0) = 0$ and that the image space $F(\mathbb{R}^n)$ equals $\mathbb{R}^n \times \{(0, \dots, 0)\}$, where the second factor has $N - n$ components. Then we can apply the classical transformation rule to the composition $\pi_n^N \circ F : \mathbb{R}^n \to \mathbb{R}^n$ of $F$ with the canonical projection. We get
\[ J(x) = \sqrt{\det F^* F} = |\det(\pi_n^N F)| = |\det D\pi_n^N F(x)| \]
and hence by the classical transformation rule
\[ \int_A J(x)\,dx = \lambda^n(\pi_n^N F(A)) = H^n(F(A)). \]
Now we pass to general $f(0) = a$ and $F$. Then there is an orthogonal transformation $U : \mathbb{R}^N \to \mathbb{R}^N$ such that $UF(\mathbb{R}^n) = \mathbb{R}^n \times \{(0, \dots, 0)\}$. Let $\tilde{f}(x) = Uf(x) - U(a)$. Then $\tilde{f}(0) = 0$ and $\tilde{f}(\mathbb{R}^n) = \mathbb{R}^n \times \{0\}$. Moreover $H^n$ is invariant under rotation and translation. Thus
\[ \int_A J(x)\,dx = \int_A J_{\tilde{f}}(x)\,dx = H^n(\tilde{f}(A)) = H^n(Uf(A) - U(a)) = H^n(f(A)), \]
where we used
\[ J(x) = \sqrt{\det F^* F} = \sqrt{\det (UF)^* UF} = J_{\tilde{f}}(x). \]
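For $n = 2$ and $N = 3$ the Jacobian $\sqrt{\det F^* F}$ of a linear map can be compared with the elementary area of the parallelogram spanned by the two columns of $F$, which is the norm of their cross product. The following sketch checks this on one hypothetical matrix; the helper names `gram_jacobian` and `cross_norm` are chosen here for illustration.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))

def gram_jacobian(F):
    """sqrt(det(F^T F)) for a 3x2 matrix F given as a list of three rows."""
    u = [row[0] for row in F]  # first column of F
    v = [row[1] for row in F]  # second column of F
    # F^T F is the 2x2 Gram matrix [[u.u, u.v], [u.v, v.v]]
    return math.sqrt(dot(u, u) * dot(v, v) - dot(u, v) ** 2)

def cross_norm(u, v):
    """Norm of the cross product u x v in R^3 (area of the parallelogram)."""
    c = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    return math.sqrt(dot(c, c))

# Hypothetical injective linear map F : R^2 -> R^3.
F = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

jacobian = gram_jacobian(F)
area = cross_norm([row[0] for row in F], [row[1] for row in F])

if __name__ == "__main__":
    print(jacobian, area)
```

Since the unit square has Lebesgue measure 1, the lemma predicts that its image under $F$ has two-dimensional Hausdorff measure equal to the Jacobian, and both computed numbers agree with the parallelogram area.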
In the proof of the area formula we proceed in a number of steps. First we note that if one splits the set $A$ into countably many parts it suffices to establish the result for each part separately.

Lemma 4.5 If $A$ is the union of a sequence of disjoint sets $E_k$ and if the theorem holds for each $E_k$ then it holds also for $A$.

Proof: Under the above assumptions we get
\[ \int_A J(x)\,dx = \sum_k \int_{E_k} J(x)\,dx = \sum_k \int_{\mathbb{R}^N} \#\{E_k \cap f^{-1}(y)\}\,dH^n(y) = \int_{\mathbb{R}^N} \#\{A \cap f^{-1}(y)\}\,dH^n(y). \]
As a consequence, we can and shall assume below that $f$ is not only locally Lipschitz but globally Lipschitz, since $\mathbb{R}^n$ can be decomposed into countably many disjoint parts on which $f$ is Lipschitz. Also, with the help of the previous lemma, we may assume that $A$ is bounded. By Rademacher's Theorem the set $A$ can be decomposed into three parts:
• The set of points $x \in A$ for which $Df(x)$ exists and has (maximal) rank $n$, i.e. $J_f(x) > 0$,
• The points $x \in A$ for which $Df(x)$ exists and $J_f(x) = 0$,
• The remaining nullset where $f$ is not differentiable.

Particularly simple is the case of nullsets.

Lemma 4.6 The theorem holds for $E$ if $\lambda^n(E) = 0$, since then $\mathcal{H}^n(f(E)) = 0$.

Proof: Let $L = \operatorname{Lip} f$. Then by the Lipschitz mapping principle $\mathcal{H}^n(f(E)) \le L^n \mathcal{H}^n(E) = 0$. Therefore both integrals in the area formula vanish.

The heart of the proof is contained in the following step. The main idea is that for nonsingular $Df$ the relative local error which one makes in replacing $f$ by its Taylor approximation can be seen to be small with the help of the Lipschitz principle, Theorem 1.20.

Lemma 4.7 The Theorem holds if $f$ is differentiable everywhere on $A$ with $J_f(x) > 0$.

Proof: 1. Due to Lemma 4.5 we can assume that $A$ has the properties of one of the components $A_k$ in the decomposition of Proposition 4.2, i.e. we can even assume that the restriction of $Df$ to $A$ is continuous and the first Taylor approximation of $f$ is uniform on $A$ in the sense of that proposition.

2. We claim: For each $\varepsilon > 0$ and each $x \in A$ there is some $\delta > 0$ such that for all $x', x'' \in A \cap B(x,\delta)$ we have
\[
|J_f(x') - J_f(x)| < \varepsilon \tag{4.4}
\]
and
\[
(1-\varepsilon)\,|T_x(x') - T_x(x'')| \le |f(x') - f(x'')| \le (1+\varepsilon)\,|T_x(x') - T_x(x'')| \tag{4.5}
\]
where $T_x$ denotes the affine approximation of $f$ around the point $x$, i.e. $T_x(z) = f(x) + Df(x)(z - x)$. In fact, given $\eta > 0$ we can find $\ell$ such that $\Delta_\ell(x) \le \eta$ on $A$ (cf. the definition (4.1)). Moreover there is some $\delta > 0$ with $\delta < \frac{1}{\ell}$ such that (4.4) holds for all $x' \in A \cap B(x,\delta)$. Then
\begin{align*}
|f(x') - f(x'') - (T_x(x') - T_x(x''))| &= |f(x') - f(x'') - Df(x)(x' - x'')| \\
&\le \Delta_\ell(x)\,|x' - x''| \\
&\le 2\eta\,\|(Df(x))^{-1}\| \cdot |T_x(x') - T_x(x'')|
\end{align*}
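and it suffices to apply this to $\eta = \varepsilon\,(2\|(Df(x))^{-1}\|)^{-1}$ in order to get (4.5).

3. Let now $x$, $\varepsilon$ and $\delta$ be such that (4.4) and (4.5) hold on the set $U_\delta(x) = A \cap B(x,\delta)$. Since $T_x$ is injective we can consider the map $\phi = f \circ T_x^{-1} : T_x(U_\delta(x)) \to f(U_\delta(x))$. Because of (4.5) the map $\phi$ has the Lipschitz constant $1+\varepsilon$ and its inverse $\phi^{-1}$ has the Lipschitz constant $(1-\varepsilon)^{-1}$. Together with Lemma 4.4 and the Lipschitz principle, Theorem 1.20, we get, because of $J_{T_x}(\xi) = J_f(x)$,
\begin{align*}
\mathcal{H}^n(f(U_\delta(x))) &\le (1+\varepsilon)^n\, \mathcal{H}^n(T_x(U_\delta(x))) \\
&= (1+\varepsilon)^n \int_{U_\delta(x)} J_f(x)\,d\xi \le (1+\varepsilon)^n \Big( \int_{U_\delta(x)} J_f(\xi)\,d\xi + \varepsilon\lambda^n(U_\delta(x)) \Big) \tag{4.6}
\end{align*}
and similarly, using $\phi^{-1}$,
\begin{align*}
\mathcal{H}^n(f(U_\delta(x))) &\ge (1-\varepsilon)^n\, \mathcal{H}^n(T_x(U_\delta(x))) \\
&\ge (1-\varepsilon)^n \Big( \int_{U_\delta(x)} J_f(\xi)\,d\xi - \varepsilon\lambda^n(U_\delta(x)) \Big). \tag{4.7}
\end{align*}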
4. For each $\varepsilon$, by step 3 the class of all balls $B(x,\delta)$ for which the estimates (4.6) and (4.7) hold is a Vitali class for $A$. Hence by the Vitali covering theorem we can cover almost all of $A$ by a disjoint sequence of subsets $U_i \subset A$ such that $f$ is injective on each $U_i$ and
\[
(1-\varepsilon)^n \Big( \int_{U_i} J_f(x)\,dx - \varepsilon\lambda^n(U_i) \Big) \le \mathcal{H}^n(f(U_i)) \le (1+\varepsilon)^n \Big( \int_{U_i} J_f(x)\,dx + \varepsilon\lambda^n(U_i) \Big). \tag{4.8}
\]
Here the middle term can be replaced by
\[
\int_{\mathbb{R}^N} \#\big( f^{-1}\{y\} \cap U_i \big)\,d\mathcal{H}^n(y)
\]
since f : Ui → f (Ui ) is bijective. After this replacement all three terms in the estimate (4.8) become σ-additive as a function of the underlying set and hence the same estimate holds for A. Since A was assumed to be bounded and hence of finite measure and ε was arbitrary we see that (4.8) implies the area formula for A.
The final step shows that the 'critical' points, i.e. the points at which Df(x) does not have maximal rank (i.e. Jf(x) = 0), do not contribute to the right hand side of the area formula: their image under f is of Hausdorff measure 0. This is a preliminary form of Sard's theorem. (The 'real' Sard theorem is concerned with maps into lower dimensional spaces, but it requires more smoothness of the function f.)
Lemma 4.8 For every locally Lipschitz map $f$ from $\mathbb{R}^n$ into some Euclidean space one has
\[
\mathcal{H}^n\{ f(x) : Df(x) \text{ exists but } J_f(x) = 0 \} = 0.
\]
Proof: 1. With the help of Lemma 4.6 and Proposition 4.2 we can, as in the previous lemma, restrict our attention to points in a bounded set $A$ such that $Df(x)$ has rank $< n$ on $A$ and the Taylor approximation is uniform on $A$, i.e. the error functions $\Delta_\ell$ defined in (4.1) converge to 0 uniformly on $A$.

2. We show that for each $\varepsilon > 0$ and each $x \in A$ there is some $\delta > 0$ such that for all measurable subsets $E$ of $A \cap B(x,\delta)$ one has
\[
\mathcal{H}^n(f(E)) \le \varepsilon\,\mathcal{H}^n(E). \tag{4.9}
\]
Let $B_n$ be the closed unit ball in $\mathbb{R}^n$. Since the rank of $Df(x)$ is not maximal, the (compact) image $Df(x)(B_n)$ of $B_n$ under $Df(x)$ has affine dimension $< n$. Hence $\mathcal{H}^n(Df(x)(B_n)) = 0$. Thus there are open sets $U_i$ in $\mathbb{R}^N$ such that
\[
Df(x)(B_n) \subset U = \bigcup_{i=1}^{\infty} U_i \quad\text{and}\quad \sum_i |U_i|^n < \varepsilon 2^{-n}. \tag{4.10}
\]
The left hand side is compact, the right hand side is open and hence there is some η > 0 such that Df (x)(Bn ) + ηBN ⊂ U
where $B_N$ is the unit ball in $\mathbb{R}^N$. Next we choose $\ell$ such that $\Delta_\ell(x) < \eta$, i.e.
\[
|f(x'') - f(x') - Df(x)(x'' - x')| < \eta\,|x' - x''|
\]
for all $x', x'' \in A \cap B(x, \frac{1}{\ell})$, and we choose $\delta$ such that $\delta < \frac{1}{\ell}$. Now consider a small ball $B(x',\beta) \subset B(x,\delta)$. Each $x'' \in B(x',\beta)$ can be written in the form $x'' = x' + \beta v$ with $v \in B_n$ and hence
\[
|f(x'') - f(x') - \beta\,Df(x)(v)| \le \eta\beta
\]
or
\[
f(x'') \in f(x') + \beta\big( Df(x)(B_n) + \eta B_N \big) \subset f(x') + \beta U.
\]
This implies
\[
f(B(x',\beta)) \subset f(x') + \beta \bigcup_{i=1}^{\infty} U_i.
\]
Let $E$ be a subset of $B(x,\delta)$ and let $(V_j)$ be a cover of $E$ by subsets of diameter $|V_j| < \delta/2$, where we may assume that also $V_j \subset B(x,\delta)$. Then we embed each $V_j$ into a ball $B(x_j,\beta_j) \subset B(x,\delta)$ of radius $\beta_j = 2|V_j|$. Then $f(E)$ is contained in the union $\bigcup_{i,j} U_{i,j}$ where $U_{i,j} = f(x_j) + \beta_j U_i$ and
\[
\sum_{i,j} |U_{i,j}|^n \le \sum_i |U_i|^n \sum_j \beta_j^n \le \varepsilon 2^{-n}\, 2^n \sum_j |V_j|^n = \varepsilon \sum_j |V_j|^n,
\]
which implies $\mathcal{H}^n(f(E)) \le \varepsilon\,\mathcal{H}^n(E)$ as required in (4.9).

3. Finally, as in the previous proof, a straightforward Vitali covering argument in combination with Lemma 4.6 shows that for fixed $\varepsilon > 0$ the estimate (4.9) extends to all subsets of $A$, and this gives the result since $\varepsilon$ was arbitrary.

The last Lemma completes the proof of the area formula: according to the decomposition of a general set $A$ mentioned before Lemma 4.6, the three Lemmas 4.6, 4.7 and 4.8 show the area formula for all three parts of $A$, and Lemma 4.5 then implies the assertion for their union $A$.

Now let us draw some consequences.

Corollary 4.9 Let $f : \mathbb{R}^n \to \mathbb{R}^N$ ($N \ge n$) be locally Lipschitz. Then
a) Whenever $A \in \mathcal{B}(\mathbb{R}^n)$ is such that $\int_A J_f(x)\,dx < \infty$, the set $f^{-1}\{y\} \cap A$ is finite for $\mathcal{H}^n$-almost all $y \in f(A)$.
b) We have $\mathcal{H}^n(\{ y \in \mathbb{R}^N : f^{-1}(y) \text{ is uncountable} \}) = 0$.

Proof: a) This follows directly from the area formula.
b) Let $\mathcal{N}$ be the Lebesgue nullset of all $x$ at which $J_f(x)$ is not defined. It is mapped under $f$ onto an $\mathcal{H}^n$-nullset in $\mathbb{R}^N$. Outside $\mathcal{N}$ the function $J_f$ is measurable and finite. Hence $\mathbb{R}^n \setminus \mathcal{N}$ is a countable disjoint union of sets $A_k$ such that $\int_{A_k} J_f(x)\,dx < \infty$ for each $k = 1, 2, \dots$. Then by part a) the sets $M_k = \{ y \in f(A_k) : f^{-1}(y) \cap A_k \text{ is infinite} \}$ are also $\mathcal{H}^n$-nullsets. If now $y$ is a point such that $f^{-1}(y)$ is uncountable, then $y$ is either in the $\mathcal{H}^n$-nullset $f(\mathcal{N})$ or in one of the $\mathcal{H}^n$-nullsets $M_k$.
The following formula (4.11) coincides with the area formula if $g = 1_A$. The general case follows via the usual approximation procedure with step functions. The second formula (4.12) is just the first one applied to $g / J_f$.

Corollary 4.10 Let $f : \mathbb{R}^n \to \mathbb{R}^N$, $n \le N$, be a locally Lipschitz map. Let $g : \mathbb{R}^n \to \mathbb{R}$ be Lebesgue measurable and either $g \ge 0$ or $g \cdot J_f \in L^1(\lambda^n)$. Then
\[
\int_{\mathbb{R}^n} g(x)\,J_f(x)\,dx = \int_{\mathbb{R}^N} \sum_{x \in f^{-1}(y)} g(x)\;\mathcal{H}^n(dy). \tag{4.11}
\]
Similarly, if $g \ge 0$ or $g \in L^1(\lambda^n)$,
\[
\int_{\mathbb{R}^n} g(x)\,dx = \int_{\mathbb{R}^N} \sum_{x \in f^{-1}(y)} \frac{g(x)}{J_f(x)}\;\mathcal{H}^n(dy). \tag{4.12}
\]
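A quick numerical sanity check of (4.11) may help to see the role of the sum over the fibre. The sketch below is only an illustration under our own assumptions: it takes $n = N = 1$, the non-injective map $f(x) = x^2$, and the weight $g(x) = x + 2$ restricted to $[-1, 1]$ (none of these choices come from the text). Both sides should equal 4.

```python
import math

def f(x):
    return x * x

def jacobian(x):
    # For n = N = 1 the Jacobian J_f is |f'(x)|.
    return abs(2.0 * x)

def g(x):
    # Illustrative weight, understood to vanish outside [-1, 1].
    return x + 2.0

def midpoint(func, a, b, steps=100_000):
    # Plain midpoint rule; accurate enough for this check.
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# Left side of (4.11): integral of g * J_f over [-1, 1].
lhs = midpoint(lambda x: g(x) * jacobian(x), -1.0, 1.0)

# Right side of (4.11): for 0 < y <= 1 the fibre of f in [-1, 1] is
# {+sqrt(y), -sqrt(y)}, so the integrand is the sum of g over these two points.
rhs = midpoint(lambda y: g(math.sqrt(y)) + g(-math.sqrt(y)), 0.0, 1.0)

print(lhs, rhs)
```

The point of the example is that each $y \in (0, 1]$ is hit twice, and the right hand side counts both preimages.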
4.3 Examples
As a first application we recover our formula for the arc length of Lipschitz curves: Let $f : \mathbb{R} \to \mathbb{R}^N$ be locally Lipschitz and suppose that $f|_{[a,b]}$ is injective. Then
\[
\mathcal{H}^1(f([a,b])) = \int_a^b J_f(x)\,dx = \int_a^b |f'(x)|\,dx.
\]
These Lipschitz curves are ’rectifiable’.
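The arc length formula can be checked numerically on a concrete curve. The following sketch assumes, purely for illustration, the helix $f(t) = (\cos t, \sin t, t)$ on $[0, 2\pi]$, which is injective with $|f'(t)| = \sqrt{2}$. It compares the integral of the speed with the length of an inscribed polygon.

```python
import math

def curve(t):
    # Illustrative injective Lipschitz curve in R^3 (a helix).
    return (math.cos(t), math.sin(t), t)

def speed(t):
    # |f'(t)| for the helix; equals sqrt(2) for every t.
    return math.sqrt(math.sin(t) ** 2 + math.cos(t) ** 2 + 1.0)

def integral_of_speed(a, b, steps=20_000):
    # Midpoint rule for the integral of |f'| over [a, b].
    h = (b - a) / steps
    return h * sum(speed(a + (i + 0.5) * h) for i in range(steps))

def polygon_length(a, b, steps=20_000):
    # Length of the polygon through equally spaced points on the curve.
    h = (b - a) / steps
    points = [curve(a + i * h) for i in range(steps + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

a, b = 0.0, 2.0 * math.pi
length_by_formula = integral_of_speed(a, b)
length_by_polygon = polygon_length(a, b)
print(length_by_formula, length_by_polygon, 2.0 * math.pi * math.sqrt(2.0))
```

The polygon length approximates the one-dimensional Hausdorff measure of the image from below, and both numbers should agree with $2\pi\sqrt{2}$ up to discretisation error.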
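A very useful tool for higher dimensional applications is the

Lemma 4.11 (Binet-Cauchy formula) Let $A$ be an $N \times n$-matrix. Then
\[
\det A^*A = \sum_{1 \le i_1 < \dots < i_n \le N} \det(a_{i_1}, \dots, a_{i_n})^2 = \text{sum of squares of all } n \times n \text{ subdeterminants of } A,
\]
where the $a_{i_k}$ are the row vectors of $A$.

Proof: see e.g. [10], p.137 or [16].

Example 4.1 Consider the graph of a locally Lipschitz map $g : \mathbb{R}^n \to \mathbb{R}$ and define $f(x) = (x, g(x))$. Then
\[
Df(x) = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \\ \frac{\partial g}{\partial x_1} & \cdots & \frac{\partial g}{\partial x_n} \end{pmatrix}
\]
and $J_f^2 = 1 + |\nabla g(x)|^2$. Thus
\[
\mathcal{H}^n(\{ (x, g(x)) : x \in U \}) = \int_U \big( 1 + |\nabla g|^2 \big)^{\frac{1}{2}}\,dx.
\]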
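The Binet-Cauchy formula and the graph Jacobian of Example 4.1 can be verified on small integer matrices with exact arithmetic. The sketch below is only an illustration: the helper names `det`, `gram` and `sum_squared_minors` and the test matrices are our own choices, and the determinant uses the Leibniz expansion, which is fine only for tiny matrices.

```python
from itertools import combinations, permutations

def det(m):
    # Leibniz formula: sum over permutations of sign * product of entries.
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def gram(a):
    # The n x n matrix A^T A for an N x n matrix A given as a list of rows.
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]

def sum_squared_minors(a):
    # Sum of squares of all n x n subdeterminants (choose n of the N rows).
    n = len(a[0])
    return sum(det([a[i] for i in rows]) ** 2
               for rows in combinations(range(len(a)), n))

a = [[1, 2], [3, 4], [5, 6], [7, 8]]       # a 4 x 2 example
graph_df = [[1, 0], [0, 1], [3, 4]]        # Df for a graph with grad g = (3, 4)

print(det(gram(a)), sum_squared_minors(a))
print(det(gram(graph_df)), 1 + 3 ** 2 + 4 ** 2)
```

For the second matrix both sides give $1 + |\nabla g|^2 = 26$, in line with the expression for $J_f^2$ in Example 4.1.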
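Example 4.2 (parametrised hypersurface) Let $f : \mathbb{R}^n \to \mathbb{R}^{n+1}$ with $f = (f^1, \dots, f^{n+1})$ be locally Lipschitz. Then
\[
J_f^2(x) = \sum_{k=1}^{n+1} \left[ \frac{\partial(f^1, \dots, f^{k-1}, f^{k+1}, \dots, f^{n+1})}{\partial(x_1, \dots, x_n)} \right]^2
\]
and hence, if $f$ is injective on $U$,
\[
\mathcal{H}^n(f(U)) = \int_U \left( \sum_{k=1}^{n+1} \left[ \frac{\partial(f^1, \dots, f^{k-1}, f^{k+1}, \dots, f^{n+1})}{\partial(x_1, \dots, x_n)} \right]^2 \right)^{\frac{1}{2}} dx.
\]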
These are just two special cases of the following general situation.

Definition 4.1 A set $M \subset \mathbb{R}^N$ is called an $n$-dimensional Lipschitz-manifold if for every $a \in M$ there are open sets $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^N$ and a Lipschitz bijection $f : U \to V \cap M$ such that $a \in V$. This $f$ is called a chart.

Remark 4.1 We know already:
1. For $\lambda^n$-almost all $x \in U$ the derivative $Df(x)$ exists.
2. The function $f$ maps $\lambda^n$-nullsets onto $\mathcal{H}^n$-nullsets.
3. The image of the set $B = \{ x \in U \mid \operatorname{rank} Df(x) < n \}$ under $f$ is an $\mathcal{H}^n$-nullset.
Thus for $\mathcal{H}^n$-almost all points $y \in V \cap M$ we have $\operatorname{rank}(Df(f^{-1}(y))) = n$. Such points are called regular values of $f$ or regular points of $M$. For regular $y \in M$ define $T_yM := Df(x)(\mathbb{R}^n) = \operatorname{Im} Df(x)$ where $f(x) = y$. If the chart $f$ is given we can define
\[
g_{ij}(x) := \left\langle \frac{\partial f(x)}{\partial x_i}, \frac{\partial f(x)}{\partial x_j} \right\rangle = \sum_{k=1}^{N} \frac{\partial f^k(x)}{\partial x_i} \cdot \frac{\partial f^k(x)}{\partial x_j}.
\]
These coefficients form an $n \times n$-matrix which describes the linear map $Df^*(x) \circ Df(x)$. Observing $Df(x) = \big( \frac{\partial f}{\partial x_i} \big)_{i=1 \dots n}$ and $J_f(x) = \sqrt{\det g_{ij}}$, we get from the area formula for the $n$-dimensional Hausdorff volume of a set $E \subset V \cap M$ the expression
\[
\mathcal{H}^n(E) = \int_A \sqrt{\det g_{ij}}\,dx \quad\text{where } A = f^{-1}(E).
\]
This is precisely the measure induced by the volume form which is introduced in Riemannian geometry in the case of smooth manifolds.
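As an illustration of the volume expression in terms of $g_{ij}$, the sketch below integrates $\sqrt{\det g_{ij}}$ numerically for the spherical-coordinate chart of the unit sphere in $\mathbb{R}^3$. The chart, the finite-difference step and the grid sizes are our own assumptions. The chart covers the sphere up to a nullset, so the result should be close to $4\pi$.

```python
import math

def chart(theta, phi):
    # Spherical coordinates on the unit sphere, theta in (0, pi), phi in (0, 2*pi).
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def partials(theta, phi, h=1e-5):
    # Central finite differences for the two partial derivatives of the chart.
    d_theta = [(p - m) / (2 * h)
               for p, m in zip(chart(theta + h, phi), chart(theta - h, phi))]
    d_phi = [(p - m) / (2 * h)
             for p, m in zip(chart(theta, phi + h), chart(theta, phi - h))]
    return d_theta, d_phi

def sqrt_det_metric(theta, phi):
    # g_ij = <d_i f, d_j f>; for n = 2 the determinant is g11*g22 - g12^2.
    d1, d2 = partials(theta, phi)
    g11 = sum(x * x for x in d1)
    g22 = sum(x * x for x in d2)
    g12 = sum(x * y for x, y in zip(d1, d2))
    return math.sqrt(max(g11 * g22 - g12 * g12, 0.0))

def sphere_area(n_theta=100, n_phi=100):
    # Midpoint rule over the parameter rectangle (0, pi) x (0, 2*pi).
    h_theta, h_phi = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        for j in range(n_phi):
            total += sqrt_det_metric((i + 0.5) * h_theta, (j + 0.5) * h_phi)
    return total * h_theta * h_phi

area = sphere_area()
print(area, 4.0 * math.pi)
```

Here $\sqrt{\det g_{ij}} = \sin\theta$, which the finite-difference computation recovers without using this closed form.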
4.4 The Coarea Formula
The area formula is in some sense just the natural extension of the classical transformation rule from $C^1$-functions to Lipschitz functions. The result of the present section, the coarea formula, however, has no such well-known analogue in the usual introductory texts on multidimensional analysis. Various versions of this result for smooth maps can be found in more advanced texts on Riemannian geometry. However, again it turns out that the Lipschitz setting is both most natural and quite general. The result can be viewed as a variant of Fubini's theorem for Lipschitz coordinates. It is due to W. Fleming and R. Rishel (1960). The image space of the Lipschitz map $f$ is now lower dimensional. Thus the maximal possible rank of the derivative $Df(x)$ is the dimension of the image space. If this rank is achieved we have $\det Df(x)Df(x)^* > 0$.
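Theorem 4.12 (Coarea formula) Let $f : \mathbb{R}^n \to \mathbb{R}^q$, $q \le n$, be locally Lipschitz. Then for all $A \in \mathcal{B}(\mathbb{R}^n)$ the function $y \mapsto \mathcal{H}^{n-q}(A \cap f^{-1}\{y\})$ on $\mathbb{R}^q$ is Lebesgue measurable and we have
\[
\int_A J_f^*(x)\,dx = \int_{\mathbb{R}^q} \mathcal{H}^{n-q}(A \cap f^{-1}\{y\})\,dy \tag{4.13}
\]
where
\[
J_f^*(x) = \sqrt{\det Df(x) \circ (Df(x))^*}.
\]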
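To see the coarea formula (4.13) at work, one can compare both sides numerically in a simple case. The sketch below assumes, for illustration only, $n = 2$, $q = 1$, $f(x) = x_1^2 + x_2^2$ and $A$ the closed unit disc. The fibre lengths are entered by hand as circle circumferences, so this checks the formula rather than computing Hausdorff measures from scratch. Both sides should be close to $4\pi/3$.

```python
import math

R = 1.0  # radius of the disc A = {x in R^2 : |x| <= R} (illustrative choice)

def coarea_jacobian(x1, x2):
    # For f(x) = x1^2 + x2^2 we have Df = (2*x1, 2*x2), so J*_f = 2|x|.
    return 2.0 * math.hypot(x1, x2)

def left_side(cells=400):
    # Midpoint Riemann sum of J*_f over the disc on a Cartesian grid.
    h = 2.0 * R / cells
    total = 0.0
    for i in range(cells):
        x1 = -R + (i + 0.5) * h
        for j in range(cells):
            x2 = -R + (j + 0.5) * h
            if x1 * x1 + x2 * x2 <= R * R:
                total += coarea_jacobian(x1, x2)
    return total * h * h

def fibre_length(y):
    # A ∩ f^{-1}{y} is the circle of radius sqrt(y) for 0 < y <= R^2.
    return 2.0 * math.pi * math.sqrt(y) if 0.0 < y <= R * R else 0.0

def right_side(steps=100_000):
    # Midpoint rule for the integral of the fibre lengths over [0, R^2].
    h = R * R / steps
    return h * sum(fibre_length((k + 0.5) * h) for k in range(steps))

exact = 4.0 * math.pi * R ** 3 / 3.0
lhs, rhs = left_side(), right_side()
print(lhs, rhs, exact)
```

Note that $J_f^*$ vanishes at the origin, a critical point of $f$, without disturbing the identity.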
Remark 4.2 Let us first note that in the case $q = n$ the 0-dimensional Hausdorff measure $\mathcal{H}^0$ appears on the right hand side. It is easy to verify that this measure coincides with the counting measure. Moreover in this case $J_f^*(x) = J_f(x)$, and thus the coarea formula (4.13) and the area formula (4.3) are then equivalent.

The proof follows the same scheme as for the area formula. After verifying the formula for linear maps we prove it first for nullsets, then for those points with $J_f^*(x) > 0$ and finally for the critical points. However, each of these steps is a little more complicated than in the previous section. Note that a priori it is not clear that the integrand on the right hand side of the coarea formula (4.13) is Lebesgue measurable, even though at the end we shall see that this is the case.

Lemma 4.13 The Theorem holds for surjective affine functions $f$, i.e. if $f(x) = a + \varphi(x)$ for some $a \in \mathbb{R}^q$ and some linear map $\varphi$ of maximal rank.

Proof: We may assume that $a = 0$ since $\int_A J_f^*(x)\,dx = \int_A J_\varphi^*(x)\,dx$ and moreover the measures $\mathcal{H}^q$ and $\mathcal{H}^{n-q}$ are translation invariant. Let $V$ be the $(n-q)$-dimensional subspace $\ker\varphi$ of $\mathbb{R}^n$ and let $V^\perp$ be its ($q$-dimensional) orthogonal complement. We work in the coordinate system adapted to these spaces, i.e. we write $x \in \mathbb{R}^n$ in the form $x = (z, w)$ with $z \in V^\perp$, $w \in V$. Then the Lebesgue measure $\lambda^n$ on $\mathbb{R}^n$ is just the product measure $\mathcal{H}^q \otimes \mathcal{H}^{n-q}$, since the euclidean Hausdorff measures are invariant under orthogonal transformations and coincide with the Lebesgue measures on the corresponding linear subspaces. Note that $\varphi|_{V^\perp}$ is injective and $\varphi^*$ maps $\mathbb{R}^q$ onto $V^\perp$. Denote by $\psi := (\varphi|_{V^\perp})^{-1}$ the inverse of this restriction. Then
\[
\psi^* \circ \psi \circ \varphi \circ \varphi^* = \psi^* \circ \varphi^* = (\varphi \circ \psi)^* = \mathrm{id}_{\mathbb{R}^q}
\]
and hence
\[
J_\varphi^* = \sqrt{\det(\varphi\varphi^*)} = \sqrt{\frac{1}{\det \psi^*\psi}} = \frac{1}{J_\psi}.
\]
Fix now a Borel set $A \subset \mathbb{R}^n$. Consider the map $h : z \mapsto \mathcal{H}^{n-q}(\{ w \in V : (z, w) \in A \})$. Note that for each $y \in \mathbb{R}^q$
\[
\varphi^{-1}(y) \cap A = \{ (\psi(y), w) : w \in V \} \cap A
\]
and hence $h(\psi(y)) = \mathcal{H}^{n-q}(\varphi^{-1}(y) \cap A)$. By Fubini $h$ is a Borel function and
\[
\lambda^n(A) = \int_{V^\perp} \mathcal{H}^{n-q}(\{ w \in V : (z, w) \in A \})\,dz = \int_{V^\perp} h(z)\,d\mathcal{H}^q(z).
\]
With the area formula (4.12) applied to $\psi$ and the function $g = h \circ \psi$ we get
\begin{align*}
\int_A J_\varphi^*\,dx = J_\varphi^*\,\lambda^n(A) &= \int_{V^\perp} \frac{h(z)}{J_\psi}\,d\mathcal{H}^q(z) \\
&= \int_{\mathbb{R}^q} h(\psi(y))\,dy = \int_{\mathbb{R}^q} \mathcal{H}^{n-q}\big( \varphi^{-1}(y) \cap A \big)\,dy.
\end{align*}
For general, not necessarily measurable functions one can always define upper and lower integrals as follows.

Definition 4.2 Let $(X, \mathcal{B}, \mu)$ be a measure space and $g : X \to \mathbb{R}_+$ any function. Then
\[
\int_X^{*} g\,d\mu := \inf\Big\{ \int_X h\,d\mu : g \le h,\ h \text{ measurable} \Big\}
\]
and
\[
\int_{*\,X} g\,d\mu := \sup\Big\{ \int_X h\,d\mu : g \ge h,\ h \text{ measurable} \Big\}.
\]
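Remark 4.3 If $\int^* g\,d\mu = \int_* g\,d\mu$ with a finite common value, then $g$ is $\mu$-measurable and in $L^1(\mu)$. In fact the two functions $\overline{g} = \inf_j h^j$, where $\int h^j\,d\mu < \int^* g\,d\mu + \frac{1}{j}$ and $g \le h^j$, resp. $\underline{g} = \sup_j h_j$, where $\int h_j\,d\mu > \int_* g\,d\mu - \frac{1}{j}$ and $h_j \le g$, are $\mathcal{B}$-measurable and $\int (\overline{g} - \underline{g})\,d\mu = 0$, or $\mu\{ \underline{g} < \overline{g} \} = 0$, and these functions are integrable. In particular, if $g \ge 0$ and $\int^* g\,d\mu = 0$ then $g$ is $\mu$-measurable and the upper integral is an ordinary integral.

For $\lambda^n$-nullsets the coarea formula is a consequence of the following estimate, used for $k = n$.

Lemma 4.14 Let $A \subset \mathbb{R}^n$ and $f : A \to \mathbb{R}^q$ be a Lipschitz map. Then for all integers $k$ with $q \le k \le n$ we have
\[
\int^{*} \mathcal{H}^{k-q}(f^{-1}(y) \cap A)\,dy \le \frac{\omega_q\,\omega_{k-q}}{\omega_k}\,\operatorname{Lip}(f)^q\,\mathcal{H}^k(A). \tag{4.14}
\]
In particular, for $k = n$, we conclude that the Theorem holds for $\lambda^n$-nullsets $A$.

Proof: Let $\delta = \frac{1}{j}$. There are covers $((U_i^j)_{i \in I_j})_{j \in \mathbb{N}}$ of $A$ by closed sets such that for each $j$
\[
|U_i^j| \le \frac{1}{j} \quad\text{and}\quad \frac{\omega_k}{2^k} \sum_{i \in I_j} |U_i^j|^k < \mathcal{H}^k_{\frac{1}{j}}(A) + \frac{1}{j}.
\]
The images $f(U_i^j)$ are compact and for each $y \in \mathbb{R}^q$
\[
\frac{2^{k-q}}{\omega_{k-q}}\,\mathcal{H}^{k-q}_{\frac{1}{j}}(f^{-1}\{y\} \cap A) \le \sum_{i \in I_j} |U_i^j|^{k-q}\,1_{f(U_i^j)}(y) =: g_j(y).
\]
Then Fatou's lemma and the isodiametric inequality imply
\begin{align*}
\frac{2^{k-q}}{\omega_{k-q}} \int^{*} \mathcal{H}^{k-q}(f^{-1}\{y\} \cap A)\,dy &= \frac{2^{k-q}}{\omega_{k-q}} \int^{*} \lim_{j \to \infty} \mathcal{H}^{k-q}_{\frac{1}{j}}(f^{-1}\{y\} \cap A)\,dy \\
&\le \int \liminf_{j \to \infty} g_j(y)\,dy \le \liminf_{j \to \infty} \int g_j(y)\,dy = \liminf_{j \to \infty} \sum_{i \in I_j} |U_i^j|^{k-q}\,\lambda^q(f(U_i^j)) \\
&\le \frac{\omega_q}{2^q} \liminf_{j \to \infty} \sum_{i \in I_j} |U_i^j|^{k-q}\,|f(U_i^j)|^q \le \frac{\omega_q}{2^q} \operatorname{Lip}(f)^q \liminf_{j \to \infty} \sum_{i \in I_j} |U_i^j|^{k-q}\,|U_i^j|^q \\
&\le \frac{\omega_q}{2^q}\,\frac{2^k}{\omega_k} \operatorname{Lip}(f)^q \liminf_{j \to \infty} \Big( \mathcal{H}^k_{\frac{1}{j}}(A) + \frac{1}{j} \Big) = 2^{k-q}\,\frac{\omega_q}{\omega_k} \operatorname{Lip}(f)^q\,\mathcal{H}^k(A),
\end{align*}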
which is the assertion.

As in the proof of the area formula, if $A = \bigcup_{k=1}^{\infty} E_k$ where $E_k \cap E_l = \emptyset$ for all $k \ne l$ and the $E_k$ are Lebesgue measurable, then it suffices to prove the theorem for each $E_k$ separately. The second step is to prove the coarea formula for the regular points. As in the proof of Lemma 4.7 the main idea is that locally the problem can be reduced to the affine case via a locally bijective Lipschitz transformation such that the Lipschitz constants of both the transformation and its inverse are close to 1. Whereas in Lemma 4.7 the image space of $f$ was transformed, here the transformation acts in the domain space.

Lemma 4.15 The Theorem holds if $J_f^*(x) > 0$ on $A$.

Proof: 1. Again, as in Lemma 4.7, with the help of the decomposition of Proposition 4.2 and the fact that the Theorem is already proven in the case of nullsets, we can assume that $A$ is compact, $Df(x)|_A$ is continuous and that the functions $\Delta_\ell$ converge uniformly on $A$ to 0.

2. Fix $x \in A$ and consider the subspaces $V := \ker Df(x)$ and $V^\perp$, the orthogonal complement of $V$. As in the proof of Lemma 4.13, $Df(x)|_{V^\perp}$ is injective. Let $\pi_V$ denote the orthogonal projection of $\mathbb{R}^n$ to $V$. Define the function $h_x : \mathbb{R}^n \to \mathbb{R}^n$ by
\[
h_x(x') := x + \pi_V(x') + \big( Df(x)|_{V^\perp} \big)^{-1} (f(x') - f(x)). \tag{4.15}
\]
3. We claim: For each $\varepsilon > 0$ there is some $\delta > 0$ such that for all $x', x'' \in B(x,\delta) \cap A$ one has
\[
(1-\varepsilon)\,|x' - x''| \le |h_x(x') - h_x(x'')| \le (1+\varepsilon)\,|x' - x''| \tag{4.16}
\]
and
\[
|J_f^*(x') - J_f^*(x)| < \varepsilon. \tag{4.17}
\]
In fact (4.17) follows easily from the continuity of $Df$ on $A$. For (4.16) choose $\delta < \frac{1}{\ell}$ where $\ell$ is such that
\[
\big\| \big( Df(x)|_{V^\perp} \big)^{-1} \big\|\,\Delta_\ell(x) < \varepsilon.
\]
Then for all $x', x'' \in B(x, \frac{1}{\ell}) \cap A$
\[
Df(x)\,\pi_{V^\perp}(x' - x'') = Df(x)(x' - x'')
\]
and hence
\begin{align*}
|x' - x'' - (h_x(x') - h_x(x''))| &= \big| \pi_{V^\perp}(x' - x'') - \big( Df(x)|_{V^\perp} \big)^{-1} (f(x') - f(x'')) \big| \\
&= \big| \big( Df(x)|_{V^\perp} \big)^{-1} \big( Df(x)(x' - x'') - (f(x') - f(x'')) \big) \big| \\
&\le \big\| \big( Df(x)|_{V^\perp} \big)^{-1} \big\|\,\Delta_\ell(x)\,|x' - x''| < \varepsilon\,|x' - x''|.
\end{align*}
This proves (4.16).

4. The function $f$ can be written in the form $f = T_x \circ h_x$ where $T_x$ is the surjective affine map $T_x(x') = f(x) + Df(x)(x' - x)$. In fact
\begin{align*}
T_x(h_x(x')) &= f(x) + Df(x)(h_x(x') - x) \\
&= f(x) + Df(x)(\pi_V(x')) + Df(x) \big( Df(x)|_{V^\perp} \big)^{-1} (f(x') - f(x)) \\
&= f(x) + (f(x') - f(x)) = f(x').
\end{align*}
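5. Let $B_\delta$ be the compact set $B(x,\delta) \cap A$ as above. According to (4.16), on $B_\delta$ the map $h_x$ is Lipschitz with Lipschitz constant $(1+\varepsilon)$ and its inverse on $h_x(B_\delta)$ has Lipschitz constant $(1-\varepsilon)^{-1}$. Thus the Lipschitz principle 1.20 gives
\[
(1-\varepsilon)^n\,\lambda^n(h_x(B_\delta)) \le \lambda^n(B_\delta) \le (1+\varepsilon)^n\,\lambda^n(h_x(B_\delta))
\]
and for each set $E \subset h_x(B_\delta)$
\[
(1+\varepsilon)^{-(n-q)}\,\mathcal{H}^{n-q}(E) \le \mathcal{H}^{n-q}(h_x^{-1}(E)) \le (1-\varepsilon)^{-(n-q)}\,\mathcal{H}^{n-q}(E).
\]
In particular
\begin{align*}
\mathcal{H}^{n-q}(f^{-1}(y) \cap B_\delta) &= \mathcal{H}^{n-q}\big( h_x^{-1}\big( T_x^{-1}(y) \cap h_x(B_\delta) \big) \big) \\
&\le (1-\varepsilon)^{-(n-q)}\,\mathcal{H}^{n-q}\big( T_x^{-1}(y) \cap h_x(B_\delta) \big). \tag{4.18}
\end{align*}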
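Lemma 4.13 applied to $T_x$ and the compact set $h_x(B_\delta)$ shows that the right hand side is a Borel function of $y$ and
\begin{align*}
\int_{\mathbb{R}^q} \mathcal{H}^{n-q}\big( T_x^{-1}(y) \cap h_x(B_\delta) \big)\,dy &= \int_{h_x(B_\delta)} J_{T_x}^*(x')\,dx' = J_f^*(x)\,\lambda^n(h_x(B_\delta)) \\
&\le (1-\varepsilon)^{-n}\,\lambda^n(B_\delta)\,J_f^*(x) \\
&\le (1-\varepsilon)^{-n} \Big( \int_{B_\delta} J_f^*(x')\,dx' + \varepsilon\lambda^n(B_\delta) \Big). \tag{4.19}
\end{align*}
Here (4.17) was used. Similarly,
\[
\mathcal{H}^{n-q}(f^{-1}(y) \cap B_\delta) \ge (1+\varepsilon)^{-(n-q)}\,\mathcal{H}^{n-q}\big( T_x^{-1}(y) \cap h_x(B_\delta) \big) \tag{4.20}
\]
and
\[
\int_{\mathbb{R}^q} \mathcal{H}^{n-q}\big( T_x^{-1}(y) \cap h_x(B_\delta) \big)\,dy \ge (1+\varepsilon)^{-n} \Big( \int_{B_\delta} J_f^*(x')\,dx' - \varepsilon\lambda^n(B_\delta) \Big). \tag{4.21}
\]
6. For fixed $\varepsilon > 0$, by Vitali's covering theorem the set $A$ can be covered up to a $\lambda^n$-nullset by a disjoint sequence of sets of the type $B_\delta$. We can disregard the nullset by Lemma 4.14. Taking the sum over this partition we get from (4.18)-(4.21)
\begin{align*}
(1+\varepsilon)^{-(2n-q)} \Big( \int_A J_f^*(x')\,dx' - \varepsilon\lambda^n(A) \Big) &\le \int_{*\,\mathbb{R}^q} \mathcal{H}^{n-q}(f^{-1}(y) \cap A)\,dy \\
&\le \int_{\mathbb{R}^q}^{*} \mathcal{H}^{n-q}(f^{-1}(y) \cap A)\,dy \le (1-\varepsilon)^{-(2n-q)} \Big( \int_A J_f^*(x')\,dx' + \varepsilon\lambda^n(A) \Big).
\end{align*}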
Since $\varepsilon > 0$ was arbitrary the upper and the lower integral coincide, i.e. by Remark 4.3 the function $y \mapsto \mathcal{H}^{n-q}(f^{-1}(y) \cap A)$ is Lebesgue measurable and the coarea formula holds for the set $A$.

Finally, in the last step of the proof of the coarea formula, we deal with the critical points. A subtle point is that even if $\operatorname{rank} Df(x) < q$ the neighbourhoods of $x$ are generally not mapped onto 'smaller' sets. Consider the following simple example: Let $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x_1, x_2) = x_1^2 + x_2^2$. Then $Df(x) = \nabla f(x_1, x_2) = (2x_1, 2x_2)$ and $\nabla f(0, 0) = (0, 0)$. For $U = [-\varepsilon, \varepsilon]^2$ we get $f(U) = [0, 2\varepsilon^2]$, so $\mathcal{H}^1(f(U)) = 2\varepsilon^2$ while $\mathcal{H}^2(U) = (2\varepsilon)^2$. So
\[
\frac{\operatorname{vol}_q f(U)}{\operatorname{vol}_n U} \not\to 0 \quad (\varepsilon \to 0).
\]
Nevertheless the formula holds. In fact

Lemma 4.16 Let $A \subset \{ x \mid Df(x) \text{ exists but } \operatorname{rank} Df(x) < q \}$. Then
\[
\mathcal{H}^{n-q}(A \cap f^{-1}\{y\}) = 0
\]
for $\lambda^q$-almost all $y$.
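Proof: 1. The idea of the proof is to perturb the function $f$ a little such that it becomes regular but with a very small value of $J_f^*$. However, in order to keep track of the individual fibers we have to enlarge the dimension of the domain space.

2. First of all we can assume that $f$ is globally Lipschitz. This means that all partial derivatives $\frac{\partial f_i}{\partial x_j}$ are uniformly bounded on $A$. Let $\varepsilon > 0$ be fixed. We extend $f : \mathbb{R}^n \to \mathbb{R}^q$ to $\tilde f : \mathbb{R}^{n+q} \to \mathbb{R}^q$ by setting
\[
\tilde f(x, z) := f(x) + \varepsilon z.
\]
Then $D\tilde f(x, z)$ is given by the $q \times (n+q)$-matrix $\begin{pmatrix} Df(x) & \varepsilon I \end{pmatrix}$ and hence
\[
D\tilde f(x, z)\,D\tilde f(x, z)^* = \begin{pmatrix} Df(x) & \varepsilon I \end{pmatrix} \begin{pmatrix} Df(x)^* \\ \varepsilon I \end{pmatrix} = Df(x)Df(x)^* + \varepsilon^2 I.
\]
Because of $\operatorname{rank} Df(x) < q$ on $A$ we have
\[
\sqrt{\det\big( Df(x)Df(x)^* + \varepsilon^2 \cdot I \big)} \le C \cdot \varepsilon
\]
for some constant $C$ determined by the uniform bound of the components of $Df(x)$. Together we get the global estimate on $A \times \mathbb{R}^q$
\[
0 < J_{\tilde f}^*(x, z) \le C\varepsilon.
\]
3. Thus we can already apply the coarea formula to the function $\tilde f$ according to Lemma 4.15. Before we do this let us compare the fibers of $f$ and $\tilde f$. Let $Q$ be the unit box in $\mathbb{R}^q$. Then for each $y \in \mathbb{R}^q$ we have
\begin{align*}
\tilde f^{-1}(y) \cap (A \times Q) &= \{ (x, z) \in A \times Q : f(x) + \varepsilon z = y \} \\
&= \{ (x, z) \in A \times Q : f(x) = y - \varepsilon z \} \\
&= \bigcup_{z \in Q} \big( f^{-1}(y - \varepsilon z) \cap A \big) \times \{z\}.
\end{align*}
Therefore for each pair $y, z$ of points in $\mathbb{R}^q$
\[
\mathcal{H}^{n-q}(f^{-1}(y - \varepsilon z) \cap A)\,1_Q(z) = \mathcal{H}^{n-q}\big( \pi^{-1}(z) \cap \tilde f^{-1}(y) \cap (A \times Q) \big) \tag{4.22}
\]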
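where $\pi : \mathbb{R}^{n+q} \to \mathbb{R}^q$ is the canonical projection.

4. Since $\lambda^{n+q}(A \times Q) = \lambda^n(A)$, the coarea formula applied to $\tilde f$ (cf. Lemma 4.15) together with the bound $J_{\tilde f}^* \le C\varepsilon$ gives
\[
C\varepsilon\,\lambda^n(A) \ge \int_{A \times Q} J_{\tilde f}^*(x, z)\,dx\,dz = \int_{\mathbb{R}^q} \mathcal{H}^n\big( \tilde f^{-1}(y) \cap (A \times Q) \big)\,dy. \tag{4.23}
\]
In order to get back to $f$ we need to get rid of the variable $z$. For this we want to apply Lemma 4.14 for fixed $y$ to the projection $\pi$. A slight difficulty is the fact that we do not know in advance that the function $y \mapsto \mathcal{H}^{n-q}(f^{-1}(y) \cap A)$ is measurable on $\mathbb{R}^q$. (This is why in Lemma 4.14 one needs the upper integral.) Let $g$ be a Borel-measurable upper envelope of this function with respect to Lebesgue measure $\lambda^q$. Thus $g(y) \ge \mathcal{H}^{n-q}(f^{-1}(y) \cap A)$ for all $y$, and for every other Borel function $h$ such that $h(y) \ge \mathcal{H}^{n-q}(f^{-1}(y) \cap A)$ for all $y$ we have $g(y) \le h(y)$ $\lambda^q$-a.e. One can replace the upper integral of a possibly non-measurable function by the integral of a measurable upper envelope of this function. For each $y \in \mathbb{R}^q$ we have, by Lemma 4.14 applied to $\pi$ (note that the Lipschitz constant of $\pi$ is 1, so we can ignore it) and equation (4.22),
\[
\frac{\omega_q\,\omega_{n-q}}{\omega_n}\,\mathcal{H}^n\big( \tilde f^{-1}(y) \cap (A \times Q) \big) \ge \int_Q^{*} \mathcal{H}^{n-q}(f^{-1}(y - \varepsilon z) \cap A)\,dz = \int_Q g(y - \varepsilon z)\,dz.
\]
Combining with (4.23) we get
\begin{align*}
\frac{\omega_q\,\omega_{n-q}}{\omega_n}\,\lambda^n(A)\,C\varepsilon &\ge \int_{\mathbb{R}^q} \int_Q g(y - \varepsilon z)\,dz\,dy \\
&= \int_Q \int_{\mathbb{R}^q} g(y - \varepsilon z)\,dy\,dz \\
&= \int_Q \int_{\mathbb{R}^q} g(y)\,dy\,dz \\
&= \int_{\mathbb{R}^q} g(y)\,dy \\
&= \int_{\mathbb{R}^q}^{*} \mathcal{H}^{n-q}(f^{-1}(y) \cap A)\,dy.
\end{align*}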
Now $\varepsilon > 0$ was arbitrary and hence the Lemma and the Theorem are proved.

Corollary 4.17 For a measurable function $g : \mathbb{R}^n \to \mathbb{R}_+$ and $D \in \mathcal{B}(\mathbb{R}^q)$ we have
\[
\int_{f^{-1}(D) \cap \{ J_f^*(x) > 0 \}} g(x)\,dx = \int_D \int_{f^{-1}\{y\}} \frac{g(x)}{J_f^*(x)}\,\mathcal{H}^{n-q}(dx)\,dy.
\]
Proof: By Lemma 4.16 the inner integral is well defined for almost every $y \in D$. Letting $\tilde g(x) = \frac{g(x)}{J_f^*(x)}\,1_{\{ J_f^*(x) > 0 \}}$ we only need to prove
\[
\int_{f^{-1}(D)} \tilde g(x)\,J_f^*(x)\,dx = \int_D \int_{f^{-1}\{y\}} \tilde g(x)\,\mathcal{H}^{n-q}(dx)\,dy.
\]
If $\tilde g = 1_B$ this identity follows from the coarea formula:
\begin{align*}
\int_{f^{-1}(D)} \tilde g(x)\,J_f^*(x)\,dx &= \int_{B \cap f^{-1}(D)} J_f^*(x)\,dx \\
&= \int_{\mathbb{R}^q} \mathcal{H}^{n-q}(B \cap f^{-1}(D) \cap f^{-1}\{y\})\,dy \\
&= \int_D \mathcal{H}^{n-q}(f^{-1}\{y\} \cap B)\,dy \\
&= \int_D \int_{f^{-1}\{y\}} \tilde g(x)\,\mathcal{H}^{n-q}(dx)\,dy.
\end{align*}
Then the same identity holds for all $\tilde g \ge 0$ by the usual monotone approximation by step functions.

This has the following application in probability theory.

Corollary 4.18 Let $X$ be an $n$-dimensional random variable on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ such that $X$ has the density $g$. Let $f : \mathbb{R}^n \to \mathbb{R}^q$ be locally Lipschitz. If $J_f^*(X) > 0$ holds $\mathbb{P}$-a.s. then the $q$-dimensional random variable $Y := f(X)$ has the density $p$ with
\[
p(y) = \int_{f^{-1}(y)} \frac{g(x)}{J_f^*(x)}\,\mathcal{H}^{n-q}(dx).
\]
Proof: According to the above Corollary the integral of this function $p$ over a $q$-dimensional Borel set $D$ gives the probability
\[
\mathbb{P}\big( X \in f^{-1}(D) \cap \{ J_f^* > 0 \} \big) = \mathbb{P}\big( X \in f^{-1}(D) \big) = \mathbb{P}(Y \in D).
\]
This expression for the density p can be simplified if the density g of X is constant on the fibers of f .
Example 4.3 Let $X_1, \dots, X_n$ be independent $N(0,1)$-distributed, i.e. the random vector $X := (X_1, \dots, X_n)$ has the density
\[
g(x) = \Big( \frac{1}{\sqrt{2\pi}} \Big)^n e^{-\frac{\|x\|^2}{2}}.
\]
Define $Y := \sum_i X_i^2$, i.e. $Y$ is a $\chi^2$-variable with $n$ degrees of freedom. Then $Y$ has the density
\[
\frac{\tau_n}{2(2\pi)^{\frac{n}{2}}}\; y^{\frac{n}{2} - 1}\, e^{-\frac{y}{2}},
\]
where $\tau_n$ is the surface area of the unit sphere in $\mathbb{R}^n$.

Proof: We apply our preceding results to the function $f$ with $f(x) = \|x\|^2 = \sum_i x_i^2$. Then
\[
Df(x) = \Big( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \Big) = 2(x_1, \dots, x_n) = 2x^t
\]
and hence $Df(x)Df(x)^* = 4x^t x = 4\|x\|^2$ and $J_f^*(x) = 2\|x\|$. Therefore the density of $Y$ is given by
\begin{align*}
p(y) &= \int_{f^{-1}(y)} \frac{g(x)}{J_f^*(x)}\,\mathcal{H}^{n-q}(dx) \\
&= \int_{\{ x : \|x\|^2 = y \}} \frac{1}{(2\pi)^{\frac{n}{2}}}\,\frac{e^{-\frac{y}{2}}}{2\|x\|}\,\mathcal{H}^{n-1}(dx) \\
&= \frac{e^{-\frac{y}{2}}}{2\sqrt{y}\,(2\pi)^{\frac{n}{2}}}\,\mathcal{H}^{n-1}\{ x : \|x\| = \sqrt{y} \} = \frac{1}{2\sqrt{y}\,(2\pi)^{\frac{n}{2}}}\,\sqrt{y}^{\,n-1}\,\tau_n\,e^{-\frac{y}{2}},
\end{align*}
which yields the result.

Remark 4.4 A main idea of the 'Malliavin calculus' in stochastic analysis is that this method of proving the existence of densities of random variables can even be extended to the case where $f$ is defined on an infinite dimensional space. Some other finite dimensional aspects of these ideas will be treated in the subsequent chapters.
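The density obtained in Example 4.3 can be compared with the usual textbook form of the $\chi^2$ density. The sketch below relies on the standard value $\tau_n = 2\pi^{n/2} / \Gamma(n/2)$ for the surface area of the unit sphere, which is not stated in the text and is used here as an assumption. The function names are our own.

```python
import math

def sphere_area(n):
    # Surface area of the unit sphere in R^n (standard formula, assumed here).
    return 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)

def density_from_example(y, n):
    # Density of Y as derived in Example 4.3 via the coarea formula.
    return (sphere_area(n) * y ** (n / 2 - 1) * math.exp(-y / 2)
            / (2.0 * (2.0 * math.pi) ** (n / 2)))

def chi2_density(y, n):
    # Textbook chi-square density with n degrees of freedom.
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2.0 ** (n / 2) * math.gamma(n / 2))

def total_mass(n, upper=80.0, steps=200_000):
    # Midpoint rule for the integral of the derived density over (0, upper).
    h = upper / steps
    return h * sum(density_from_example((k + 0.5) * h, n) for k in range(steps))

for n in (1, 2, 3, 5):
    print(n, density_from_example(2.0, n), chi2_density(2.0, n))
print(total_mass(3))
```

The two expressions agree because $\tau_n / (2 (2\pi)^{n/2}) = 1 / (2^{n/2}\,\Gamma(n/2))$, and the numerical integral for $n = 3$ should be close to 1.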
Bibliography

[1] M. Barnsley. Fractals Everywhere. Academic Press, San Diego, London, 1988.
[2] H. Bauer. Maß- und Integrationstheorie, 2nd ed. de Gruyter, Berlin - New York, 1992.
[3] W. Doster. Zur Berechnung des Hausdorffmaßes von Familien von Wahrscheinlichkeitsmaßen. PhD thesis, Univ. Kaiserslautern, 1994.
[4] G.A. Edgar. Measure, Topology, and Fractal Geometry. Springer, New York, Berlin, 1990.
[5] L.C. Evans and R.F. Gariepy. Measure Theory and Fine Properties of Functions. CRC Press, New York etc., 1992.
[6] K.J. Falconer. The Geometry of Fractal Sets. Cambridge UP, Cambridge, 1990.
[7] K.J. Falconer. Techniques in Fractal Geometry. John Wiley & Sons, West Sussex, 1997.
[8] K.J. Falconer. Fractal Geometry, 2nd Edition. John Wiley & Sons, West Sussex, 2003.
[9] H. Federer. Geometric Measure Theory. Springer Verlag, Berlin, 1968.
[10] O. Forster. Analysis 3, 3. Auflage. Vieweg Studium 52, Wiesbaden, 1996.
[11] P. Mattila. Geometry of Sets and Measures in Euclidean Spaces. Cambridge University Press, Cambridge, 1995.
[12] F. Morgan. Geometric Measure Theory: A Beginner's Guide, 3rd edition. Academic Press, San Diego, London, 2000.
[13] P. Mörters. Fractal Geometry: From Self-Similarity to Brownian Motion. Kaiserslautern, 2001.
[14] C.A. Rogers. Hausdorff Measures. Cambridge University Press, Cambridge, 1970.
[15] L. Simon. Lectures in Geometric Measure Theory. Proceedings of the Centre for Mathematical Analysis. Australian National University, Canberra, 1983.
[16] J. Voss and H. v. Weizsäcker. Einführung in die Vektoranalysis. Available at http://www.mathematik.uni-kl.de/~wwwstoch/2001s/vec-ana.html, Kaiserslautern, 2001.