Editorial, Sales, and Customer Service Office A K Peters, Ltd. 63 South Avenue Natick, MA 01760 Copyright © 1998 by A K Peters, Ltd. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner.
Library of Congress Cataloging-in-Publication Data
Blatter, Christian, 1935- [Wavelets. English] Wavelets: a primer / Christian Blatter. p. cm. ISBN 1-56881-095-4 1. Wavelets (Mathematics) I. Title. QA403.3.B5713 1998 515'.2433-dc21 98-29959 CIP Rev.
Originally published in the German language by Friedr. Vieweg & Sohn Verlagsgesellschaft mbH, D-65189 Wiesbaden, with the title "Wavelets. Eine Einführung. 1st Edition" © by Friedr. Vieweg & Sohn Verlagsgesellschaft mbH, Braunschweig/Wiesbaden, 1998
Printed in the United States of America 10 9 8 7 6 5 4 3 2 1 02 01 00 99 98
Contents
Preface
Read Me
1 Formulating the problem
1.1 A central theme of analysis
1.2 Fourier series
1.3 Fourier transform
1.4 Windowed Fourier transform
1.5 Wavelet transform
1.6 The Haar wavelet
2 Fourier analysis
2.1 Fourier series
2.2 Fourier transform on R
2.3 The Heisenberg uncertainty principle
2.4 The Shannon sampling theorem
3 The continuous wavelet transform
3.1 Definitions and examples
3.2 A Plancherel formula
3.3 Inversion formulas
3.4 The kernel function
3.5 Decay of the wavelet transform
4 Frames
4.1 Geometrical considerations
4.2 The general notion of a frame
4.3 The discrete wavelet transform
4.4 Proof of theorem (4.10)
5 Multiresolution analysis
5.1 Axiomatic description
5.2 The scaling function
5.3 Constructions in the Fourier domain
5.4 Algorithms
6 Orthonormal wavelets with compact support
6.1 The basic idea
6.2 Algebraic constructions
6.3 Binary interpolation
6.4 Spline wavelets
References
Index
Preface
This book is neither the grand retrospective view of a protagonist nor an encyclopedic research monograph, but the approach of a working mathematician to a subject that has stimulated approximation theory and inspired users in many diverse domains of applied mathematics, unlike any other since the invention of the Fast Fourier Transform. As a matter of fact, I had only set out to draw up a one-semester course for our students at ETH Zurich that would introduce them to the world of wavelets ab ovo; indeed, such a course hadn't been given here before. But in the end, thanks to encouraging comments from colleagues and people in the audience, the present booklet came into existence.
I had imagined that the target group for this course would be the following: students of mathematics in their senior year or first graduate year, having the usual basic knowledge of analysis, carrying around a knapsack full of convergence theorems, but without any practical experience, say, in Fourier analysis. In the back of my mind I also entertained the hope that some people from the field of engineering would attend the course. In fact, they did, and afterward I found out that exactly these students had profited the most from my efforts.
The contents of the book can be summarized as follows: The introductory Chapter 1 presents a tour d'horizon over various ways of signal representation; it is here that the Haar wavelet makes its first appearance. Chapter 2 serves primarily as a tutorial of Fourier analysis (without proofs); it is supplemented by the discussion of two theorems that define ultimate limits of signal theory: the Heisenberg uncertainty principle and Shannon's sampling theorem. In Chapter 3 we are finally ready for a treatment of the continuous wavelet transform, and Chapter 4, entitled "Frames", describes a general framework (pun not intended) allowing us to handle the continuous and the discrete wavelet transforms in a uniform way.
All this being accomplished, we finally arrive at the main course: multiresolution analysis with its fast algorithms in Chapter 5 and the construction of orthonormal wavelets with compact support in Chapter 6. The book ends with a brief treatment of spline wavelets in Section 6.4.
Given the small size of this treatise, some things had to be left out: biorthogonal systems, wavelets in two dimensions, and a detailed description of applications, to name a few. Furthermore, I decided to leave distributions out of the picture. This means that there aren't any Sobolev spaces, nor a discussion of pointwise convergence, etc., of wavelet approximations, and the Paley-Wiener theorem is not at our disposal either. Fortunately, there is an elementary argument coming to our rescue in proving that the Daubechies wavelets indeed have compact support.
When putting the material together, I made generous use of the work of other authors. In the first place, of course, I borrowed from Ingrid Daubechies' incomparable "Ten Lectures on Wavelets" [D], to some lesser extent from [L], which at the time (winter 1996-97) was the only wavelet book available in German, and from Kaiser's "Friendly Guide to Wavelets" [K]. Concerning further sources of inspiration, I refer the reader to the list of references at the end of the book. I have deliberately kept this list short and have refrained from reprinting the more extensive, but not updated, lists of references given in [D] or [L]. A substantial and at the same time very recent (1998) list of references can be found in [Bu], which, by the way, takes an approach to wavelets that is fairly similar to ours.
Let me comment briefly on the figures. Most graphs of mathematically defined functions were first computed with the help of Mathematica®, then output as Plot, and finally were finished in the graphics environment "Canvas". A few of the figures, e.g., Figures 3.7 and 6.1, were generated by means of "Think Pascal" as bitmaps, then printed out in letter format and finally reduced to the required width photographically.
This book was published first in German by Vieweg-Verlag under the title "Wavelets - Eine Einführung".
I am grateful to Klaus Peters that he consented to give the present English edition a chance, and to his collaborators for streamlining the schoolboy's English of my raw translation.
Christian Blatter
Zurich, 14 August, 1998
Read Me
This book is divided into six chapters, and each chapter is subdivided into a certain number of sections. Formulas that are used again at some later point are numbered sectionwise in parentheses: (1). When referring to formula (5) of the current section, we do not give the section number; 3.4.(2), however, denotes formula (2) of Section 3.4. New terms are printed in slanted type at their place of definition or first appearance; as a rule there is no further warning of the "Watch out: Here comes a definition!" type. The exact spot where a term is defined is referenced in the index at the back of the book. Propositions and theorems are numbered by chapters, the boldface marker (4.3) denoting the third theorem in Chapter 4. Theorems are usually announced; in any case they are recognizable from the marker at the beginning and from their text being printed in slanted type. The two corners ⌈ and ⌋ denote the beginning and the end of a proof.
Circled numbers (①, ②, ...) mark the beginning of examples, some of them of a more explanatory nature, some of them describing famous animals created by means of the general theory. The numbering of examples begins anew in each section; the empty circle ○ marks the end of an example.
A family of objects $c_\alpha$ over the index set $I$ (called an array for short) is designated by
$$(c_\alpha \mid \alpha \in I) =: c .$$
$1_A$ denotes the characteristic function of the set $A$ and $1_X$ the identity mapping of the vector space $X$. If $e$ resp. $a_1, \ldots, a_r$ are given vectors of a vector space $X$, then $\langle e \rangle$ resp. $\operatorname{span}(a_1, \ldots, a_r)$ denote the subspace spanned by $e$ resp. the $a_k$.
$\mathbb{R}^* := \mathbb{R} \setminus \{0\}$ is the multiplicative group of real numbers. $\mathbb{R}^2_- := \mathbb{R}^* \times \mathbb{R}$ is the $(a, b)$-plane "cut up into two halves". Note that in corresponding figures the $a$-axis is drawn vertically and the $b$-axis horizontally, as explained in Section 1.5.
The symbol $\int$ without upper and lower limits always denotes the integral over all of $\mathbb{R}$ with respect to the Lebesgue measure:
$$\int f(t)\,dt := \int_{-\infty}^{\infty} f(t)\,dt .$$
In an analogous manner, sums $\sum_k$ without upper and lower limits are meant to be sums over all of $\mathbb{Z}$:
$$\sum_k a_k := \sum_{k=-\infty}^{\infty} a_k .$$
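The Fourier transform is defined as
$$\hat f(\xi) := \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{-i\xi t}\,dt ,$$
and the Fourier inversion formula, sometimes called the Fourier$^\vee$ transform, reads
$$f(t) = \frac{1}{\sqrt{2\pi}} \int \hat f(\xi)\, e^{i\xi t}\,d\xi .$$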
By $j_a^N f$ we denote the $N$-jet (the Taylor polynomial of order $N$) of $f$ at the point $a \in \mathbb{R}$, given by
$$j_a^N f(t) := \sum_{k=0}^{N} \frac{f^{(k)}(a)}{k!}\,(t - a)^k .$$
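The symbol $\mathbf{e}_\alpha$ denotes the function
$$\mathbf{e}_\alpha\colon\ \mathbb{R} \to \mathbb{C}, \qquad t \mapsto e^{i\alpha t} .$$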
If $f$ is a complex-valued function defined on $X := \mathbb{R}$ or $X := \mathbb{Z}$, then $a(f)$ and $b(f)$ denote the left and right ends of the support of $f$, respectively:
$$a(f) := \inf\{x \in X \mid f(x) \ne 0\}, \qquad b(f) := \sup\{x \in X \mid f(x) \ne 0\} .$$
A time signal is simply a function $f\colon \mathbb{R} \to \mathbb{C}$.
1 Formulating the problem
1.1 A central theme of analysis
The approximation, resp. the representation, of arbitrary known or unknown functions $f$ by means of special functions can be viewed as a central theme of analysis. "Special functions" are functions taken from a catalogue, e.g., monomials $t \mapsto t^k$, $k \in \mathbb{N}$, or functions of the form $t \mapsto e^{ct}$, $c \in \mathbb{C}$ a parameter. As a rule special functions are well understood, very often they are easy to compute and have interesting analytical properties; in particular, they tend to incorporate and re-express the evident or hidden symmetries of the situation under consideration.
In order to fix ideas we consider a (given or unknown) function
$$f\colon\ \mathbb{R} \to \mathbb{C} ,$$
assuming that $f$ is sufficiently many times differentiable in a neighbourhood $U$ of the point $a \in \mathbb{R}$. Such a function can be approximated within $U$ by its Taylor polynomials
$$j_a^n f(t) := \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(t - a)^k \tag{1}$$
(jets for short), up to an error that can be quantitatively controlled, and under suitable assumptions the function $f$ is actually represented by its Taylor series, meaning that one has
$$f(t) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}\,(t - a)^k$$
for all $t$ in a certain neighbourhood $U' \subset U$.
The general setup in this realm is the following: Depending on the particular situation at hand one chooses a family $(e_\alpha \mid \alpha \in I)$ of basis functions $t \mapsto e_\alpha(t)$; the index set $I$ may be a discrete or a "continuous" set. An approximation of a more or less arbitrary function $f$ by means of the $e_\alpha$ then has the form
$$f(t) \approx \sum_{k=1}^{N} c_k\, e_{\alpha_k}(t)$$
with coefficients $c_k$ to be determined, and a representation of $f$ has the form
$$f(t) \equiv \sum_{\alpha \in I} c_\alpha\, e_\alpha(t) ; \tag{2}$$
or it appears as an integral over the index set $I$:
$$f(t) \equiv \int_I d\alpha\ c(\alpha)\, e_\alpha(t) . \tag{3}$$
In the ideal case there are exactly as many basis functions at our disposal as are needed to represent any function $f$ of the considered kind in exactly one way in the form (2) resp. (3). The operation that assigns a given function $f$ the corresponding coefficient vector or array $(c_\alpha \mid \alpha \in I)$ is called the analysis of $f$ with respect to the family $(e_\alpha \mid \alpha \in I)$. The coefficients $c_\alpha$ are particularly easy to determine if the basis functions $e_\alpha$ are orthonormal (see below). In the case of the Taylor expansion (1) the coefficients have to be determined by computing recursively ever higher derivatives of $f$; and in the case of the so-called Tchebycheff approximation there are no formulas for the coefficients $c_k$, even though they are uniquely determined. The inverse operation that takes a given coefficient vector $(c_\alpha \mid \alpha \in I)$ as input and returns the function itself as output is called the synthesis of $f$ by means of the $e_\alpha$.
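The notions of analysis and synthesis can be tried out in a finite-dimensional toy setting. The following Python sketch is not part of the original text: it assumes a randomly generated orthonormal basis of $\mathbb{C}^n$ as a stand-in for the family $(e_\alpha \mid \alpha \in I)$, and the names `make_orthonormal_basis`, `analysis` and `synthesis` are chosen here for illustration only. For an orthonormal family the coefficients are simply scalar products of $f$ with the basis vectors.

```python
import numpy as np

def make_orthonormal_basis(n, seed=0):
    """Hypothetical helper: columns of the returned matrix form an
    orthonormal basis of C^n (obtained from a QR decomposition)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(a)
    return q

def analysis(f, basis):
    # c_k = <f, e_k> = sum_j f_j * conj(e_k[j]) for every basis vector e_k
    return basis.conj().T @ f

def synthesis(c, basis):
    # f = sum_k c_k e_k
    return basis @ c

basis = make_orthonormal_basis(8)
f = np.arange(8, dtype=complex)      # an arbitrary "signal" in C^8
c = analysis(f, basis)               # coefficient array of f
f_rebuilt = synthesis(c, basis)      # should reproduce f
```

Because the basis is orthonormal, no linear system has to be solved: each coefficient is obtained independently of the others, which is the convenience the paragraph above alludes to.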
① Suppose that the $x$-interval $[0, L]$ is modeling a heat conducting rod $S$ (see Figure 1.1). The spatially and temporally variable temperature within this rod is described by a function $(x, t) \mapsto u(x, t)$ that satisfies the one-dimensional heat equation
$$\frac{\partial u}{\partial t} = a^2\, \frac{\partial^2 u}{\partial x^2} ; \tag{4}$$
here $a > 0$ is a material constant. The initial temperature $x \mapsto f(x)$ along the rod is given, as is the boundary condition that the two ends of the rod are kept at temperature 0 at all times. Along the rod, i.e., for $0 < x < L$, there is no heat exchange with the surroundings. The task is to determine the resulting temperature fluctuation $u(\cdot, \cdot)$ within the rod. In connection with problems of this kind the following procedure (called separation of variables) has turned out to be useful: One begins by determining functions $U(\cdot, \cdot)$ of the special form
$$(x, t) \mapsto U(x, t) = X(x)\, T(t) ,$$
satisfying (4) and vanishing at the two ends of the rod. A collection of functions fulfilling these requirements is given by
$$U_k(x, t) := \exp\left(-\frac{k^2 \pi^2 a^2}{L^2}\, t\right) \sin\frac{k \pi x}{L} .$$
Figure 1.1 (the rod $S$ along the $x$-axis, temperature $u$ on the vertical axis)
Since the conditions imposed on the $U_k$ are linear and homogeneous, it follows that arbitrary linear combinations
$$u(x, t) := \sum_{k=1}^{\infty} c_k\, U_k(x, t)$$
of the $U_k$ are in their turn solutions of the heat equation vanishing at the ends of the rod. Therefore we shall have the solution of the original problem in our hands, if we are able to specify the coefficients $c_k$ in such a way that the initial condition $u(x, 0) \equiv f(x)$ is fulfilled as well. This means that we would have to guarantee the identity
$$\sum_{k=1}^{\infty} c_k \sin\frac{k \pi x}{L} = f(x) \qquad (0 < x < L) . \tag{5}$$
It is at this point that the question arises as to whether the function system
$$e_k(x) := \sin\frac{k \pi x}{L} \qquad (k \in \mathbb{N}_{\ge 1})$$
is "complete", that is to say, is rich enough to allow the representation of an arbitrarily given function $f\colon\ ]0, L[\ \to \mathbb{R}$ in the form (5). The answer to this question is yes, as is proven in the theory of Fourier series (see below). ○
As we move along, another issue enters the picture: If a function $f$ is analyzed or synthesized not only in thought and for theoretical purposes, but concretely, as in the analysis of ECGs or of long term climate changes, then for the numerical work a more or less complete discretization becomes almost indispensable. The discretization refers, on the one hand, to the collection of basis functions (in case the latter has not been discrete from the outset) and, on the other hand, to the space parametrized by the independent variable $t$
(resp. $x$, $\mathbf{x}$, ...): The values of all occurring (given or unknown) functions are evaluated, measured or computed only at the discrete places
$$t := kT \qquad (k \in \mathbb{Z},\ T > 0 \text{ fixed}) .$$
The fact that the function values f(t) themselves are represented in the computer in a "quantized" form only, instead of with "infinite precision", does not concern us here.
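The separation-of-variables recipe of example ① lends itself to a small numerical experiment once both the basis (finitely many sine modes) and the variable $x$ (a grid) are discretized. The sketch below is an illustration under stated assumptions, not the book's procedure: the coefficient formula $c_k = \frac{2}{L}\int_0^L f(x)\sin\frac{k\pi x}{L}\,dx$ is the standard orthogonality relation for the sine system, which the text only alludes to via "the theory of Fourier series", and the function names, the midpoint rule and the sample data are chosen here for illustration.

```python
import numpy as np

def sine_coefficients(f, L, n_terms, n_grid=2000):
    """Approximate c_k = (2/L) * integral_0^L f(x) sin(k pi x / L) dx
    for k = 1..n_terms by the midpoint rule (assumed quadrature)."""
    x = (np.arange(n_grid) + 0.5) * L / n_grid        # midpoints of the grid cells
    k = np.arange(1, n_terms + 1)
    modes = np.sin(np.outer(k, x) * np.pi / L)        # row k-1 holds sin(k pi x / L)
    return (2.0 / L) * (modes @ f(x)) * (L / n_grid)

def temperature(c, x, t, L, a):
    """Evaluate u(x, t) = sum_k c_k exp(-(k pi a / L)^2 t) sin(k pi x / L)."""
    k = np.arange(1, len(c) + 1)
    decay = np.exp(-(k * np.pi * a / L) ** 2 * t)
    modes = np.sin(np.outer(k, np.atleast_1d(x)) * np.pi / L)
    return (c * decay) @ modes

# Hypothetical data: rod of length 1, material constant a = 0.5,
# initial temperature made of the first and third sine mode.
L, a = 1.0, 0.5
f = lambda x: np.sin(np.pi * x / L) + 0.3 * np.sin(3 * np.pi * x / L)
c = sine_coefficients(f, L, n_terms=5)
x = np.linspace(0.0, L, 11)
u = temperature(c, x, t=0.2, L=L, a=a)
```

For this particular $f$ the computed coefficients should be close to $(1, 0, 0.3, 0, 0)$, and the higher mode decays much faster in time than the fundamental one.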
Wavelets are novel systems of basis functions used for the representation, filtration, compression, storage, and so on of any "signals"
$$f\colon\ \mathbb{R}^n \to \mathbb{C} .$$
In the case $n = 1$, the variable $t$ represents time, and one works with time signals $f\colon \mathbb{R} \to \mathbb{C}$. The case $n = 2$ refers to image processing; a concrete example is the representation and storage of millions upon millions of fingerprints in the FBI's computer, see [1]. We shall approach these wavelets by recalling briefly some facts about Fourier series and the Fourier transform. A more complete tutorial of Fourier analysis is given in Sections 2.1 and 2.2.
1.2 Fourier series
Fourier series concern $2\pi$-periodic functions
$$f\colon\ \mathbb{R} \to \mathbb{C}, \qquad f(t + 2\pi) \equiv f(t) ,$$
equivalently written as $f\colon \mathbb{R}/2\pi \to \mathbb{C}$. The "natural" domain of definition of such a function is the unit circle $S^1$ in the complex $z$-plane, see Figure 1.2. On $S^1$ the infinitely many modulo $2\pi$ equivalent points $t + 2k\pi$, $k \in \mathbb{Z}$, appear as a single point $z = e^{it}$.
Figure 1.2 (the equivalent points $t - 2\pi$, $t$, $t + 2\pi$ on the $t$-axis)
Expressing the monomial power functions
$$z \mapsto z^k$$
in terms of the variable $t$, one arrives at the trigonometrical basis functions or pure harmonics
$$\mathbf{e}_k\colon\ t \mapsto e^{ikt} \qquad (k \in \mathbb{Z}) .$$
(Unfortunately there is no universally used and accepted notation for these functions; so we shall give the boldface $\mathbf{e}$ a try here.) The natural scalar product for functions $f\colon \mathbb{R}/2\pi \to \mathbb{C}$ is given by
$$\langle f, g \rangle := \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, \overline{g(t)}\,dt . \tag{1}$$
The $\mathbf{e}_k$ are orthonormal:
$$\langle \mathbf{e}_j, \mathbf{e}_k \rangle = \delta_{jk} ;$$
in particular, they are linearly independent. From general principles of linear algebra it follows that
$$c_k := \langle f, \mathbf{e}_k \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-ikt}\,dt \tag{2}$$
is the "$k$-th coordinate of $f$ with respect to the basis $(\mathbf{e}_k \mid k \in \mathbb{Z})$", and
$$s_N := \sum_{k=-N}^{N} c_k\, \mathbf{e}_k \qquad \text{resp.} \qquad s_N(t) := \sum_{k=-N}^{N} c_k\, e^{ikt}$$
is the orthogonal projection of $f$ onto the subspace
$$U_N := \operatorname{span}(\mathbf{e}_{-N}, \ldots, \mathbf{e}_N)$$
formed by all linear combinations of the $\mathbf{e}_k$ having $|k| \le N$. Being the foot of the perpendicular from $f$ to $U_N$ (see Figure 1.3), the point $s_N$ is nearest to $f$ among all points of $U_N$. In saying this we have tacitly assumed that in our function space the distance function
$$d(f, g) := \|f - g\| := \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(t) - g(t)|^2\,dt \right)^{1/2}$$
corresponding to the scalar product (1) has been adopted.
Figure 1.3
This has been the easy part. But what is crucial here, and much more difficult to prove as well, is that the system $(\mathbf{e}_k \mid k \in \mathbb{Z})$ is complete: Any reasonable function $f\colon \mathbb{R}/2\pi \to \mathbb{C}$ is represented by its Fourier series
$$\sum_{k=-\infty}^{\infty} c_k\, e^{ikt} ,$$
meaning that in some sense, to be made precise in each individual case, one has the convergence $\lim_{N \to \infty} s_N = f$ resp.
$$f(t) = \sum_{k=-\infty}^{\infty} c_k\, e^{ikt} . \tag{3}$$
We shall look into this in more detail in Section 2.1 below. What can be said about "discretization" here? The system $(\mathbf{e}_k \mid k \in \mathbb{Z})$ is already discrete: There are only integer frequencies $k$. In numerical computations one is of course restricted to a finite frequency range $[-N\,..\,N]$; thus instead of representations (3) there are only approximations $s_N$. If one discretizes with respect to the time variable $t$ as well, one arrives at the so-called discrete Fourier transform. The latter is a purely algebraic matter, since convergence questions no longer enter the picture. The discrete Fourier transform has received an enormous boost by the invention of fast algorithms (Cooley & Tukey, 1965; but there are predecessors). The key phrase here is fast Fourier transform, FFT for short. We shall see that wavelets are structured for fast algorithms right from the outset. This was a key ingredient in making wavelets a powerful tool in various application fields within a small number of years.
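As a hedged illustration of the last paragraph, the following Python sketch approximates the Fourier coefficients of a $2\pi$-periodic function from equidistant samples $f(2\pi n/M)$, $0 \le n < M$, by means of NumPy's FFT routine, and then evaluates a partial sum $s_N$. The function names, the sample count $M = 16$ and the test signal are assumptions made here for the example; replacing the integral over one period by a Riemann sum is what turns the coefficient formula into a discrete Fourier transform.

```python
import numpy as np

def fourier_coefficients(f, n_samples):
    """Approximate c_k = (1/2pi) * integral of f(t) e^{-ikt} over one period
    by the Riemann sum (1/M) * sum_n f(t_n) e^{-ik t_n}, t_n = 2 pi n / M.
    np.fft.fft computes exactly this sum up to the factor 1/M."""
    t = 2 * np.pi * np.arange(n_samples) / n_samples
    return np.fft.fft(f(t)) / n_samples

def partial_sum(c, N, t):
    """s_N(t) = sum over |k| <= N of c_k e^{ikt}.
    The FFT stores the coefficient for negative k at index k mod M."""
    k = np.arange(-N, N + 1)
    return np.exp(1j * np.outer(t, k)) @ c[k % len(c)]

# Hypothetical test signal with c_0 = 3 and c_2 = c_{-2} = 1
f = lambda t: 3.0 + 2.0 * np.cos(2 * t)
c = fourier_coefficients(f, 16)
ts = np.linspace(0.0, 2 * np.pi, 7)
s2 = partial_sum(c, 2, ts)
```

For a trigonometric polynomial of low degree, as here, the sampled coefficients agree with the exact ones up to rounding; for general $f$ they are only approximations.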
The "Fourier transform" that assigns a $2\pi$-periodic function $f$ its array of Fourier coefficients $(c_k \mid k \in \mathbb{Z})$ treats $f$ as an "overall object" (Gesamtobjekt in German). In particular, there is no localization on the time axis. In an array $(y_k \mid 0 \le k < N)$,
$$y_k := f\!\left(\frac{2\pi k}{N}\right) \qquad (0 \le k < N) ,$$
i.e., a simple table of values of $f$, information about $f$ is stored in a way that allows easy and precise localization of individual features (e.g., local maxima, turning points, and so on) on the time axis. In marked contrast to this characteristic of a table $(y_k \mid 0 \le k < N)$, each individual Fourier coefficient $c_k$ contains information about $f$ originating from the entire domain of definition of $f$. One cannot decide, merely from looking at the $c_k$, where $f$ has, e.g., its maximum or a jump discontinuity.
Figure 1.4
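② The jump function
$$f(t) := \begin{cases} \tfrac{1}{2}(\pi - t) & (0 < t < 2\pi) \\ 0 & (t = 0) \end{cases} , \qquad f(t + 2\pi) = f(t) \quad \forall t$$
(Figure 1.4) can be developed into a Fourier series as follows:
$$f(t) = \sum_{k=1}^{\infty} \frac{1}{k} \sin(kt) .$$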
The given series actually represents $f$ at all points $t$, but it is converging "uniformly poorly": Since the coefficients $1/k$ decay so slowly when $k \to \infty$, at each point $t \not\equiv 0 \pmod{2\pi}$ one is dependent on the oscillations of $k \mapsto \sin(kt)$ to obtain convergence. Furthermore, the well known Gibbs phenomenon rears its ugly head: Any partial sum $s_N$ of the Fourier series overshoots the maximal function value $\frac{\pi}{2}$ at some point $t_N$ near $0$ by about 18%. Now if, e.g., the Fourier analysis of the function $g$ shown in Figure 1.5 is at stake, then, because of the jump discontinuity at $t_0$, this function has a Fourier series that is everywhere poorly convergent to begin with; furthermore, one cannot see from looking at the $c_k$ where the jump is, even though it may be that this is the only interesting thing about $g$.
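The Gibbs phenomenon described above can be observed numerically. The following Python sketch is an illustration only; the function names, the evaluation grid and the chosen values of $N$ are assumptions made here. It evaluates the partial sums $s_N(t) = \sum_{k=1}^{N} \frac{1}{k}\sin(kt)$ on a fine grid to the right of the jump and reports the relative overshoot over the value $\frac{\pi}{2}$.

```python
import numpy as np

def partial_sum(N, t):
    """s_N(t) = sum_{k=1}^{N} sin(k t) / k for an array of points t."""
    k = np.arange(1, N + 1)
    return np.sin(np.outer(t, k)) @ (1.0 / k)

def gibbs_overshoot(N, t_max=0.2, n_grid=4001):
    """Relative overshoot of max s_N over pi/2, searched on (0, t_max].
    The grid is an assumption; the first maximum of s_N lies at pi/(N+1),
    so t_max must exceed that value."""
    t = np.linspace(1e-4, t_max, n_grid)
    return partial_sum(N, t).max() / (np.pi / 2) - 1.0

for N in (50, 200, 1000):
    print(N, f"{100 * gibbs_overshoot(N):.1f} %")
```

The overshoot does not disappear as $N$ grows; it merely moves closer to the jump, and its relative size approaches roughly 18%, in line with the figure quoted in the text.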
Figure 1.5
If one approximates a function $f$ by means of wavelets then there will definitely be some kind of localization; moreover, this localization is, so to speak, tailored to measure: Transient features (short-lived details) of $f$, like, e.g., jump discontinuities or marked peaks can easily be localized from looking at the wavelet coefficients, whereas longtime trends of $f$ are stored in deeper layers of the coefficient hierarchy and are automatically represented in a smaller scale; as a consequence they are less precisely localized on the time axis.
1.3 Fourier transform
Fourier transform on $\mathbb{R}$, FT for short, has as its goal the analysis and synthesis of functions $f\colon \mathbb{R} \to \mathbb{C}$, using the pure harmonics
$$\mathbf{e}_\alpha\colon\ \mathbb{R} \to \mathbb{C}, \qquad t \mapsto e^{i\alpha t} \tag{1}$$
as basis functions, but this time of arbitrary real frequencies $\alpha$. In other words, the index set is $\mathbb{R}$ and so is isomorphic (i.e., structurally equal) to the domain of definition of the functions $f$ under consideration. The relevant scalar product now is
$$\langle f, g \rangle := \int_{-\infty}^{\infty} f(t)\, \overline{g(t)}\,dt$$
(cf. 1.2.(1)); it is the decisive structural element of the so-called $L^2$-theory (for details see Section 2.2). Since the functions $\mathbf{e}_\alpha$ do not lie in $L^2$, it makes no sense to ask whether they are orthonormal: The scalar product $\langle \mathbf{e}_\alpha, \mathbf{e}_\beta \rangle$ is not defined. Nevertheless it is allowed and makes sense for a great many functions $f \in L^2$ to define a "coefficient vector" $(\hat f(\alpha) \mid \alpha \in \mathbb{R})$ by means of the formula
$$\hat f(\alpha) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\alpha t}\,dt .$$
The function $\hat f$ is called the Fourier transform, sometimes also the spectral function, of the function $f$. An individual value $\hat f(\alpha)$ may be viewed as the complex amplitude by which the frequency $\alpha$ is present in the signal $f$. Again in this case there is no localization with respect to the variable $t$: One cannot read off from the value $\hat f(\alpha)$ at which time the "note" $\alpha$ was played.
In the field of image processing one would like to make use of the two-dimensional Fourier transform. Think, e.g., of a picture of a landscape. In different areas of the image you see totally different textures (a forest, a newly plowed field, a lake, clouds, and so on). These textures cause the occurrence of characteristic patterns in the Fourier transform $\hat f\colon \mathbb{R}^2 \to \mathbb{C}$ of this image. Again, from looking at the function $\hat f$ you might perhaps be able to tell which kinds of textures occur in the original picture, but definitely not where in the picture these textures manifest themselves. For this reason one does not subject the picture as a whole to the Fourier transform. Instead one divides it into small squares that can be considered homogeneously textured, then these small squares are individually Fourier transformed.
Simultaneous localization with respect to both variables $t$ and $\alpha$ in a single data array is available only within specific bounds, and these bounds cannot be transgressed even with wavelets. An "oscillation impulse" manifest in the time interval $[t_0 - h, t_0 + h]$ (and $\equiv 0$ outside) and having a frequency range $[\alpha_0 - \delta, \alpha_0 + \delta]$, where $h > 0$, $\delta > 0$ are arbitrarily small, does not exist. The quantitative expression of this fundamental fact is the Heisenberg uncertainty principle
$$\int_{-\infty}^{\infty} t^2\, |f(t)|^2\,dt \cdot \int_{-\infty}^{\infty} \alpha^2\, |\hat f(\alpha)|^2\,d\alpha \ \ge\ \frac{1}{4}\, \|f\|^4 \tag{2}$$
(see Section 2.3). Here the first factor on the left is a certain measure for the "spread" of the graph of $f$ over the $t$-axis, and the second factor is a measure for the "spread" of the graph of $\hat f$ over the $\alpha$-axis (Figure 1.6). The inequality (2) says that the graphs of $f$ and $\hat f$ cannot simultaneously have a single marked peak at the origin. For the constant multiples of the functions $t \mapsto \exp(-ct^2)$, $c > 0$, and only for these, one has equality in (2).
Figure 1.6
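The inequality (2) can be checked numerically for concrete functions. The sketch below is an illustration under stated assumptions: $\hat f$ is computed by a plain Riemann sum on a truncated grid (adequate only for rapidly decaying $f$), and the function name `heisenberg_ratio`, the grid parameters and the two test functions are chosen here for the example. The returned ratio is the left side of (2) divided by $\frac{1}{4}\|f\|^4$, so the inequality predicts a value of at least 1.

```python
import numpy as np

def heisenberg_ratio(f, half_width=12.0, n=1201):
    """Left side of the uncertainty inequality divided by ||f||^4 / 4.
    Assumption: f is negligible outside [-half_width, half_width], and
    so is its Fourier transform on the same frequency range."""
    t = np.linspace(-half_width, half_width, n)
    alpha = t.copy()                      # reuse the grid for the frequencies
    dt = t[1] - t[0]
    ft = f(t)
    # fhat(alpha) = (2 pi)^(-1/2) * integral f(t) e^{-i alpha t} dt (Riemann sum)
    fhat = (np.exp(-1j * np.outer(alpha, t)) @ ft) * dt / np.sqrt(2 * np.pi)
    time_spread = np.sum(t**2 * np.abs(ft)**2) * dt
    freq_spread = np.sum(alpha**2 * np.abs(fhat)**2) * dt
    norm_sq = np.sum(np.abs(ft)**2) * dt  # ||f||^2
    return time_spread * freq_spread / (0.25 * norm_sq**2)

gauss = lambda t: np.exp(-t**2 / 2)          # of the form exp(-c t^2): equality expected
bump_pair = lambda t: t * np.exp(-t**2 / 2)  # not a Gaussian: strict inequality expected

r_gauss = heisenberg_ratio(gauss)
r_pair = heisenberg_ratio(bump_pair)
```

For the Gaussian the ratio should come out as 1 up to numerical error, while for the second function, whose transform is again of the form $\alpha\, e^{-\alpha^2/2}$ up to a constant factor, a direct calculation gives the value 9.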
For reasonable functions $f\colon \mathbb{R} \to \mathbb{C}$, the Fourier inversion formula
$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \hat f(\alpha)\, e^{i\alpha t}\,d\alpha \qquad \text{resp.} \qquad f = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} d\alpha\ \hat f(\alpha)\, \mathbf{e}_\alpha \tag{3}$$
is valid. This formula represents resp. synthesizes the function $f$ as an (integral) superposition of pure harmonics (1). It is of course fundamental in theoretical considerations, but for practical purposes it produces more than one really needs: A real-world signal is negligibly weak or even identically zero outside of some $t$-interval $I$. The user knows this from the start, and he is not interested at all in synthesizing the signal outside of the interval $I$. But the inversion formula (3) produces a function value at all points of the $t$-axis; in particular, it goes to great pains to generate "identically 0" on $\mathbb{R} \setminus I$ by mutual complete cancellation of the $\mathbf{e}_\alpha$ - and nobody is looking.
1.4 Windowed Fourier transform
It may be clear from what we have said in the last two sections that we are looking out for a "data type" that allows easy extraction or retrieval of both temporal (resp. spatial) and frequency information about a signal $f\colon \mathbb{R} \to \mathbb{C}$. A musical score is a data type having just these characteristics: If you can read music and are given a musical score, then you can see at a glance at which instants of time which frequencies are activated. The so-called windowed or short time Fourier transform, abbreviated WFT resp. STFT, constitutes a continuous version of such a data type. However, the simultaneous localization (within the fundamental bounds, of course) with respect to the time and frequency variables comes at the price of an enormous redundancy, insofar as now the index set of the resulting data vector
$$(Gf(\alpha, s) \mid (\alpha, s) \in \mathbb{R} \times \mathbb{R})$$
is two-dimensional, although a function of only one real variable $t$ is encoded.
Figure 1.7 (a window function $y = g(t)$ with support $[-h, h]$ of width $2h$)
The WFT can be described as follows: One begins by choosing a window function $g\colon \mathbb{R} \to \mathbb{R}_{\ge 0}$ once and for all. The function $g$ should have "total mass" 1 and be more or less concentrated around $t = 0$, which means that it should have, e.g., a compact support containing 0 (see Figure 1.7) or at least a maximum at $t = 0$ and fast decay when $|t| \to \infty$. A widely used window is given by the function
$$g(t) := N_{\sigma,0}(t) := \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{t^2}{2\sigma^2}\right) , \tag{1}$$
$\sigma$ being a fixed parameter.¹ The corresponding transform is often called Gabor transform, since Dennis Gabor (Nobel prize in physics, 1971) was one of the first to use the WFT systematically; in particular, he remarked that the window $N_{\sigma,0}$ is in some sense optimal.
For a given $s \in \mathbb{R}$, the function
$$g_s\colon\ t \mapsto g(t - s)$$
represents the window $g$, translated by the amount $s$ (to the right, if $s > 0$). We retain the functions 1.3.(1) as our basic oscillation patterns and define the window transform
$$Gf\colon\ \mathbb{R} \times \mathbb{R} \to \mathbb{C}, \qquad (\alpha, s) \mapsto Gf(\alpha, s)$$
of a function $f$ by
$$Gf(\alpha, s) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, g(t - s)\, e^{-i\alpha t}\,dt . \tag{2}$$
If we had chosen, e.g., the window function $g$ shown in Figure 1.7, then formula (2) may be interpreted as follows: The value $Gf(\alpha, s)$ represents to some measure the complex amplitude by which the pure harmonic $\mathbf{e}_\alpha$ is present in $f$ during the time interval $[s - h, s + h]$. If during this interval, among others, the "note" $\alpha$ is played, then $|Gf(\alpha, s)|$ will be large.
Since the information about $f$ is represented redundantly in $Gf$, there are several inversion formulas for the windowed Fourier transform $f \mapsto Gf$, see, e.g., [K], Section 2.3. For practical-numerical purposes, one of course has to resort to a discrete version of the WFT, using equidistant subdivisions both on the $t$- and the $\alpha$-axis.
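A discrete version of formula (2) can be sketched in a few lines of Python. The code below is an illustration, not a reference implementation: it assumes the Gaussian window $N_{\sigma,0}$ with $\sigma = 1$, replaces the integral by a Riemann sum on an equidistant $t$-grid, and uses a made-up test signal in which the "note" 5 is played for $t < 0$ and the "note" 15 for $t > 0$. All function and variable names are chosen here for the example.

```python
import numpy as np

def gauss_window(t, sigma):
    """The window N_{sigma,0}(t): Gaussian density with total mass 1."""
    return np.exp(-t**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def windowed_ft(f_vals, t, alphas, shifts, sigma=1.0):
    """Riemann-sum approximation of
    Gf(alpha, s) = (2 pi)^(-1/2) * integral f(t) g(t - s) e^{-i alpha t} dt
    on an equidistant grid t; returns an array indexed by [alpha, s]."""
    dt = t[1] - t[0]
    harmonics = np.exp(-1j * np.outer(alphas, t))      # one row per frequency
    G = np.empty((len(alphas), len(shifts)), dtype=complex)
    for j, s in enumerate(shifts):
        windowed = f_vals * gauss_window(t - s, sigma)  # f seen through the shifted window
        G[:, j] = (harmonics @ windowed) * dt / np.sqrt(2 * np.pi)
    return G

# Hypothetical signal: frequency 5 before t = 0, frequency 15 afterwards
t = np.linspace(-10.0, 10.0, 4001)
f_vals = np.where(t < 0, np.cos(5 * t), np.cos(15 * t))
alphas = np.array([5.0, 15.0])
shifts = np.array([-4.0, 4.0])
G = windowed_ft(f_vals, t, alphas, shifts)
```

In this example $|Gf(5, -4)|$ and $|Gf(15, 4)|$ are large while the two mixed entries are negligible, which is the "musical score" behaviour described above: the transform tells both which note is played and when.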
It is a consequence of the constant window width $2h$ (resp. $\sim 2\sigma$ in the case (1)) that for $|\alpha| \gg \frac{1}{h}$ the "key pattern" $t \mapsto g(t - s)\, e^{-i\alpha t}$ has the shape shown in Figure 1.8. Now, a given signal might contain just a couple of oscillations of frequency $\alpha$ within the interval $[s - h, s + h]$, and these will take place in a very small part of this interval. Therefore $Gf(\alpha, s)$ will have a respectable value, but the "key pattern" shown in Figure 1.8 will not be able to detect the location of such an oscillatory impulse with the desired precision.
¹ The official symbol for this function is $N(0, \sigma)$, but the symbol we are proposing here is in accordance with the notation 1.5.(1) commonly used in wavelet theory.
Figure 1.8 (the key pattern $y = g(t - s)\cos(\alpha t)$ for $|\alpha|$ large, over the interval $[s - h, s + h]$)
At the lower end of the audible range, i.e., for frequencies $|\alpha| \ll \frac{1}{h}$, things are even worse. In this case the "key pattern" has the shape shown in Figure 1.9. If the signal $f$ possesses a (perhaps highly interesting) oscillatory component of a characteristic frequency $|\alpha| \ll \frac{1}{h}$, then the transformation $G$ will not detect it: The window in Figure 1.9 is too narrow to encompass even a single full turnaround of such a low frequency.
Figure 1.9 (the key pattern $y = g(t - s)\cos(\alpha t)$ for $|\alpha|$ small, over the interval $[s - h, s + h]$)
1.5 Wavelet transform
In order to make clear what is so decisively new about the wavelet transform, WT for short, as compared to the FT and WFT described in the preceding sections, we are going to repeat resp. summarize the main features of the latter as follows:
• The Fourier transform of functions $f\colon \mathbb{R} \to \mathbb{C}$ uses a special analyzing function $t \mapsto e^{it}$ that is distinguished by a host of interesting analytical properties. This analyzing function is dilated by the real frequency parameter $\alpha$ and appears as $t \mapsto e^{i\alpha t}$ in the transformation formulas.
• The windowed Fourier transform uses the same analyzing function $t \mapsto e^{it}$ as well as its dilated versions. There is an additional element in the form of a movable but otherwise rigid window function $g$. Note that there is a certain freedom in choosing this window function.
Figure 1.10 (a mother wavelet $\psi$ with compact support $[0, L]$)
The basic model of the wavelet transform works on complex-valued time signals $f\colon \mathbb{R} \to \mathbb{C}$, also. One begins by choosing a suitable analyzing wavelet, also called the mother wavelet or simply a wavelet, $x \mapsto \psi(x)$. Figure 1.10 shows a $\psi$ having compact support $[0, L]$. Dilated and translated copies of the mother wavelet $\psi$ we shall call wavelet functions. The "key patterns" used for the analysis of time signals $f$ will be just such wavelet functions, and the following notation shall be adopted for them:
$$\psi_{a,b}\colon\ t \mapsto \frac{1}{|a|^{1/2}}\, \psi\!\left(\frac{t - b}{a}\right) . \tag{1}$$
The double index $(a, b)$ appearing here runs through the set $\mathbb{R}^* \times \mathbb{R}$ or $\mathbb{R}_{>0} \times \mathbb{R}$. The variable $a$ is called the scaling parameter, and $b$ is the translation parameter. The factor $1/|a|^{1/2}$ in (1) is not crucial and is more of a technical nature; it is thrown in to guarantee $\|\psi_{a,b}\| = 1$.
Figure 1.11 (graph of $y = \psi\!\left(\frac{t - b}{a}\right)$, with $b$ and the width $aL$ marked on the $t$-axis)
As may be gathered from Figures 1.11 and 1.12, the width of the "key pattern" resp. "key window" grows proportionally to $|a|$, and for all values of $a$ and $b$ this window presents a single and complete copy of the analyzing wavelet. Of the following facts one should take note right at the beginning:
• Scaling parameter values $a$ of modulus $0 < |a| \ll 1$ result in very narrow windows and serve for the precisely localized registration of high frequency resp. transient phenomena present in the signal $f$.
• Scaling parameter values $a$ of modulus $|a| \gg 1$ result in very wide windows and serve for the registration of slow phenomena resp. long wave oscillatory components of $f$.
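Due to everything that has been said so far it is now clear that the wavelet transform
$$Wf\colon\ \mathbb{R}^* \times \mathbb{R} \to \mathbb{C}, \qquad (a, b) \mapsto Wf(a, b)$$
of a time signal $f$ is defined as follows:
$$Wf(a, b) := \langle f, \psi_{a,b} \rangle = \frac{1}{|a|^{1/2}} \int_{-\infty}^{\infty} f(t)\, \overline{\psi\!\left(\frac{t - b}{a}\right)}\,dt .$$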
Figure 1.12 (graph of $y = \psi\!\left(\frac{t - b}{a}\right)$ for $a \gg 1$, with $b$ and the width $aL$ marked on the $t$-axis)
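The definition of $Wf(a, b)$ can be evaluated directly by numerical integration. The following Python sketch is an illustration under stated assumptions: as mother wavelet it uses the Haar function ($+1$ on $[0, \frac{1}{2})$, $-1$ on $[\frac{1}{2}, 1)$, $0$ elsewhere), which this section has not yet introduced, and the signal is a made-up unit jump at $t_0 = 0.3$. The integral is replaced by a Riemann sum; all names are chosen here for the example.

```python
import numpy as np

def haar(x):
    """Assumed example wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return (np.where((x >= 0) & (x < 0.5), 1.0, 0.0)
            - np.where((x >= 0.5) & (x < 1.0), 1.0, 0.0))

def wavelet_transform(f_vals, t, a, b, psi=haar):
    """Riemann-sum approximation of Wf(a, b) = <f, psi_{a,b}> with
    psi_{a,b}(t) = |a|^(-1/2) psi((t - b) / a) on an equidistant grid t."""
    dt = t[1] - t[0]
    key_pattern = psi((t - b) / a) / np.sqrt(abs(a))
    return np.sum(f_vals * np.conj(key_pattern)) * dt

# Hypothetical signal: unit jump at t0 = 0.3
t = np.linspace(-1.0, 2.0, 3001)
f_vals = np.where(t >= 0.3, 1.0, 0.0)

a = 0.1                                   # a narrow window
b_grid = np.linspace(0.0, 0.6, 121)
w = np.array([wavelet_transform(f_vals, t, a, b) for b in b_grid])
b_peak = b_grid[np.argmax(np.abs(w))]     # where |Wf(a, .)| is largest
```

With the narrow window $a = 0.1$ the modulus $|Wf(a, b)|$ is essentially zero except when the window $[b, b + a]$ straddles the jump, and it peaks when the jump sits in the middle of the window, i.e., near $b = t_0 - a/2$. This is the localization of transient features that the two bullet points above describe.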
To be completely correct we should write $W_\psi f$ instead of $Wf$, for the resulting data array
$$(Wf(a, b) \mid (a, b) \in \mathbb{R}^* \times \mathbb{R})$$
depends on the mother wavelet $\psi$ chosen at the beginning. In all cases where there is only one mother wavelet at stake, we are allowed to do without the full notation $W_\psi$. The domain of definition of the transform $Wf$ is the $(a, b)$-plane, "cut into two halves". Since the variable $b$ denotes a translation along the time axis, it has become standard in wavelet theory to draw the $b$-axis horizontally and the $a$-axis vertically, contrary to the usual disposition of the axes corresponding to the first and second factors of a cartesian product. We shall see in Section 3.3 that for the wavelet transform there is again an inversion formula. This formula represents the original signal $f$ as a "linear combination" of the basis functions $\psi_{a,b}$, with the values $Wf(a, b)$ of the wavelet transform serving as coefficients. In order to set up such a formula one needs a characteristic "volume element" on the index set $\mathbb{R}^* \times \mathbb{R}$. If the functions $\psi_{a,b}$ are given by (1), then one has
$$f = \frac{1}{C_\psi} \int_{\mathbb{R}^* \times \mathbb{R}} \frac{da\,db}{|a|^2}\ Wf(a, b)\, \psi_{a,b}$$
with a constant $C_\psi$ depending only on the chosen $\psi$ (Theorem (3.7)). It is a fundamental feature of the setup described here that on the scaling axis (the wavelet analog of the frequency axis) a logarithmic scale becomes prevalent. Such an experience is maybe familiar to the reader from acoustics resp. from music: Equal tone steps correspond to equal frequency ratios $\omega_2/\omega_1$ (e.g., 5 : 4 for the major third) and not to equal frequency differences $\omega_2 - \omega_1$.
This fact becomes particularly evident when, as our next step, we discretize the index set $\mathbb{R}_{>0} \times \mathbb{R}$: We choose a zoom step $\sigma > 1$ (the value $\sigma := 2$ is most commonly used) and consider from here on only the discrete set of dilation factors
$$a_r := \sigma^r \qquad (r \in \mathbb{Z}) .$$
Note that larger numbers $r \in \mathbb{Z}$ correspond to larger dilation factors $a_r > 0$. With regard to the translation parameter $b$, we cannot simply choose a base step $\beta > 0$ and then have a single grid of translation values $b_k := k\beta$ $(k \in \mathbb{Z})$ as in the case of the Fourier transform. At finer scales, which is to say for smaller values of $r$, we need a correspondingly smaller translational step size as well, if everything is to come out right. Concretely, on the level $a_r$ in the $(a,b)$-plane ($a$ scaled vertically, $b$ horizontally!) we select as grid values the numbers
$$b_{r,k} := k\,\sigma^r \beta \qquad (k \in \mathbb{Z})$$
(see Figure 4.4). This means that consecutive $b_{r,k}$ have a distance $\sigma^r\beta$ from each other. A moment's reflection shows that this choice is in fact quite natural; in particular, it allows in an optimal way the precise localization of high-frequency and/or transient phenomena occurring in the processed time signal $f$.

In this way a discrete group of self-similarities of $\mathbb{R}$ on the one hand and between $\psi$ and its scaled versions on the other hand has been established. The systematic exploitation of this group leads to the so-called multiresolution analysis and to the fast algorithm that goes with it. The latter, called fast wavelet transform, FWT for short, serves for the computation of the wavelet coefficients
$$c_{r,k} := \mathcal{W}f(a_r, b_{r,k})$$
and likewise for the reconstruction (i.e., synthesis) of the signal $f$ from the stored data $c_{r,k}$.
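The discretization just described can be tabulated with a few lines of Python. This is only a sketch: the function name `wavelet_grid` and the default ranges for $r$ and $k$ are assumptions made for the illustration, with $\sigma = 2$ and $\beta = 1$ as sample values.

```python
def wavelet_grid(sigma=2.0, beta=1.0, r_values=range(-2, 3), k_values=range(-3, 4)):
    """Map each level r to (a_r, [b_{r,k}]) with a_r = sigma**r and b_{r,k} = k * sigma**r * beta."""
    grid = {}
    for r in r_values:
        a_r = sigma ** r
        grid[r] = (a_r, [k * a_r * beta for k in k_values])
    return grid

grid = wavelet_grid()
for r in sorted(grid):
    a_r, b_values = grid[r]
    step = b_values[1] - b_values[0]   # distance between consecutive grid points on level r
    print(f"r = {r:+d}   a_r = {a_r:<5}   step = {step:<5}   b = {b_values}")
```

The printed table shows the translational step shrinking together with the dilation factor: each level has its own grid of $b$-values, finer at small scales and coarser at large ones.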
In choosing the analyzing wavelet $\psi$ one has great freedom, this being in marked contrast to the rigid framework of Fourier analysis. Essentially it is enough to make sure that $\psi$ belongs to $L^1 \cap L^2$ and that $\int_{-\infty}^{\infty} \psi(t)\,dt = 0$. Depending on circumstances and requirements, things can always be set up in such a way that
• $\psi$ has compact support,
• the wavelet functions (the "key patterns") $\psi_{r,k} := \psi_{a_r,\,b_{r,k}}$ belonging to the described discretization are orthonormal,
• fast algorithms are available,
• $\psi$ is so and so many times differentiable,
• the wavelet coefficients have optimal decay when $r \to -\infty$,
• and so on.
As we proceed through the chapters of this book we shall meet several "famous" mother wavelets $\psi$, some of them represented by simple formulas, others given in the form of theoretical constructs; and in each case we shall also present a numerical or graphical realization of the wavelet under discussion. These are, in order of appearance (at the left the number of the corresponding figure is shown):

1.13   Haar wavelet
3.4    Mexican hat
3.5    Modulated Gaussian
3.9    Derivative of the Gaussian
4.8    Daubechies-Grossmann-Meyer wavelet corresponding to $\sigma = 2$, $\beta = 1$
5.4    Meyer wavelet
6.4    Daubechies wavelet ${}_3\psi$
6.6    Daubechies wavelet ${}_2\psi$
6.9    Battle-Lemarié wavelet corresponding to $n = 1$
6.11   Battle-Lemarié wavelet corresponding to $n = 3$
The central aim of this book is to present the mathematical foundations of wavelet analysis in a form readily accessible to the student. Nevertheless it is appropriate and perhaps even mandatory to take a quick glance at the applications of this new theory, too.
Fourier analysis is a mighty tool within mathematics as well as in applied fields. Within mathematics it is primarily used in the theory of (linear) partial differential equations. A toy model for this kind of application is given by Example 1.1. Outside mathematics Fourier theory comes to the fore in the modeling, description, and analysis of any spatially or temporally periodic phenomena, to mention the most obvious. The Fourier transform draws its power from the overwhelming invariance and symmetry properties of the pure harmonics $e_\alpha$.
In marked contrast to the above, the invention of wavelets is directly tied to practical applications (to the analysis of seismic waves, as a matter of fact). The analytic properties of wavelets are decidedly more intricate than those of the pure harmonics $e_\alpha$; as a consequence their use within mathematics, i.e., as a tool for the working mathematician, has been somewhat limited (but things are beginning to change). A nice example of this type can be found in [M], Chapter 5.

The two applied fields where wavelets have been used with the greatest success are signal processing and image processing. Signal processing is concerned with time signals, so it makes use of the "one-dimensional" wavelets whose theory is presented in this book. In the realm of image processing two-dimensional wavelets are used. The theory of these two-dimensional wavelets is in part a straightforward "squaring" of the one-dimensional theory, but it also contains other elements; it is not treated in the present book.

Under the term processing we subsume the analysis, "purification", filtering, efficient storage, retrieval, and transmission of time signals or image data, and above all their compression. In information theory an image is viewed as the result of a random process, in the ultimate limit as a bitmap without any correlation between adjacent pixels. But in a real-world image (or audio document) there are typically regions of high information density and other regions (e.g., cloudless sky) where there is almost no pictorial content. Now assume that the given image is subjected to a (discrete) wavelet transform, resulting in a large amount of data $c_{r,k}$, say. Then it is easy to single out those coefficients $c_{r,k}$ whose values exceed a certain threshold. Only these $c_{r,k}$ are actually stored or transmitted.
In this way (and now we are coming to the essence of the whole set-up) in each region of the image exactly as much image content per unit of area is expressed as is in fact present there. That is to say, by dynamically adapting the image resolution to the changing local information density one can achieve respectable data compression ratios with no noticeable loss in overall image quality.

The reader who wants to go more deeply and in more detail into the various applications of wavelets is referred to the volumes [Be], [C'] and [D'], each of which contains a collection of essays by various authors, or to [L], Chapter 3. The computational and programming aspects of signal and image processing using wavelets are extensively treated in [W]. As a novel descriptive tool wavelets have found their way into various subdomains of mathematical physics as well; in this regard see, e.g., [K], Part II.

We conclude this section with a very brief historical note. Predecessors of wavelets, albeit without the melodious name, have been in existence since 1910 (see the next section). Over the course of subsequent decades several communication theorists have attempted to overcome the aforementioned drawbacks of Fourier analysis and of the WFT by various wavelet-like constructions. We should also mention a famous integral formula by Calderón (1964) which in a way is the godfather of the inversion formula for the wavelet transform. The main breakthrough, however, came only in the late 1980s with the axiomatic description of multiresolution analysis (by Mallat and Meyer [12]) and with the construction of orthonormal wavelets having compact support, by Ingrid Daubechies [3]. For a more detailed presentation of this course of events, accompanied by an extensive bibliography (complete as of 1992), we refer the interested reader to the standard treatise [D].
1.6 The Haar wavelet

Many important aspects of wavelet theory can already be observed and comprehended by studying the simplest wavelet of all, the so-called Haar wavelet. To do this we don't need any profound preparations; on the contrary, it is possible to begin with our bare hands. It goes without saying that the Haar wavelet will show up time and again in later chapters and so will serve as a handy example throughout the book.

In 1910 the mathematician Alfred Haar was the first to describe a complete orthonormal system for the Hilbert space $L^2 := L^2(\mathbb{R})$, and in so doing he proved that this space is isomorphic to the space
$$\ell^2 := \Bigl\{ (c_k \mid k \in \mathbb{N}) \Bigm| \sum_k |c_k|^2 < \infty \Bigr\}$$
of square-summable sequences. Nowadays, and in connection with the matter under discussion, we view the basis functions given by Haar as dilated and translated copies of a certain mother wavelet $\psi$, as described in the foregoing Section 1.5. The Haar wavelet is the following simple step function:
$$\psi(x) := \begin{cases} 1 & (0 \le x < \tfrac12) \\ -1 & (\tfrac12 \le x < 1) \\ 0 & (\text{otherwise}) \end{cases}$$
(see Figure 1.13). This $\psi =: \psi_{\rm Haar}$ has compact support; furthermore, it is obvious that
$$\int_{-\infty}^{\infty} \psi(x)\,dx = 0, \qquad \int_{-\infty}^{\infty} |\psi(x)|^2\,dx = 1 .$$
[Figure 1.13: The Haar wavelet]
The Haar wavelet is well localized in the time domain, but unfortunately not continuous. The Fourier transform $\hat\psi$ of $\psi_{\rm Haar}$ is computed as follows:
$$\hat\psi(\alpha) = \frac{1}{\sqrt{2\pi}} \Bigl( \int_0^{1/2} e^{-i\alpha x}\,dx - \int_{1/2}^{1} e^{-i\alpha x}\,dx \Bigr) = \frac{1}{\sqrt{2\pi}}\, \frac{1}{-i\alpha} \Bigl( e^{-i\alpha x}\Big|_{x:=0}^{1/2} - e^{-i\alpha x}\Big|_{x:=1/2}^{1} \Bigr) = \frac{i}{\sqrt{2\pi}}\, \frac{\sin^2(\alpha/4)}{\alpha/4}\, e^{-i\alpha/2} . \tag{1}$$
The (even) function $|\hat\psi|$ has its maximum at the frequency $\alpha_0 \approx 4.6622$, see Figure 1.14, and decays like $1/\alpha$ when $\alpha \to \infty$. As a consequence one might say that $\hat\psi$ is "fairly well" localized at the frequency $\alpha_0$, but the discontinuity of $\psi_{\rm Haar}$ causes a slow decay of $\hat\psi$ at infinity.

[Figure 1.14: the function $|\hat\psi|$]
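The closed form (1) and the location of the maximum can be checked numerically. The following Python sketch compares (1) with a midpoint-rule evaluation of the defining integral and searches for the maximum of $|\hat\psi|$ on a grid; the function names, the grid spacing $0.001$ and the search interval $(0, 10]$ are assumptions of the sketch.

```python
import cmath
import math

def haar_hat_closed(alpha):
    """Closed form (1): i/sqrt(2 pi) * sin^2(alpha/4) / (alpha/4) * exp(-i alpha / 2)."""
    u = alpha / 4.0
    return 1j / math.sqrt(2.0 * math.pi) * (math.sin(u) ** 2 / u) * cmath.exp(-1j * alpha / 2.0)

def haar_hat_numeric(alpha, n=4000):
    """Midpoint rule for (1/sqrt(2 pi)) * (int_0^{1/2} - int_{1/2}^1) exp(-i alpha x) dx."""
    h = 1.0 / n
    total = 0j
    for i in range(n):
        x = (i + 0.5) * h
        sign = 1.0 if x < 0.5 else -1.0      # value of the Haar wavelet at x
        total += sign * cmath.exp(-1j * alpha * x)
    return total * h / math.sqrt(2.0 * math.pi)

# Grid search for the maximum of |psi_hat| over 0 < alpha <= 10.
alphas = [0.001 * j for j in range(1, 10001)]
alpha_0 = max(alphas, key=lambda al: abs(haar_hat_closed(al)))
print(alpha_0, abs(haar_hat_closed(alpha_0)))
```

Setting the derivative of $\sin^2 u / u$ to zero with $u = \alpha/4$ gives $\tan u = 2u$, whose first positive root corresponds to the value of $\alpha_0$ quoted above.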
Using $\psi_{\rm Haar}$ as a template we now generate the wavelet functions
$$\psi_{r,k}(t) := 2^{-r/2}\, \psi_{\rm Haar}\Bigl( \frac{t - k \cdot 2^r}{2^r} \Bigr) \qquad (r, k \in \mathbb{Z}) \tag{2}$$
(see Figure 1.15). The function $\psi_{r,k}$ has as its support the interval
$$I_{r,k} := [\,k \cdot 2^r,\ (k+1) \cdot 2^r\,[$$
of length $2^r$. Let us repeat the following here: A larger value of $r$ means longer intervals $I_{r,k}$, and the corresponding wavelet functions $\psi_{r,k}$ are mimicking longer "waves". The amplitude of $\psi_{r,k}$ is chosen in such a way that
$$\|\psi_{r,k}\| = 1 \tag{3}$$
for all $r$ and all $k$. But in reality much more is true:

(1.1) The functions $\psi_{r,k}$ $(r \in \mathbb{Z},\ k \in \mathbb{Z})$ constitute an orthonormal basis of the space $L^2(\mathbb{R})$.

[Figure 1.15: the wavelet function $\psi_{r,k}$, supported on the interval from $k \cdot 2^r$ to $(k+1) \cdot 2^r$]
If $k \ne l$, then the functions $\psi_{r,k}$ and $\psi_{r,l}$ (same $r$!) have disjoint supports, and
$$\langle \psi_{r,k}, \psi_{r,l} \rangle = 0 \qquad (k \ne l)$$
is an immediate consequence. If, on the other hand, $s < r$, then $\psi_{r,k}$ is constant (equal to $-2^{-r/2}$, $0$ or $2^{-r/2}$) on the support of $\psi_{s,l}$, see Figure 1.16. Since $\psi_{s,l}$ has mean value $0$, we therefore have
$$\langle \psi_{r,k}, \psi_{s,l} \rangle = 0 \qquad (s \ne r,\ \text{all } k, l) ,$$
and in conjunction with (3) it follows that the $\psi_{r,k}$ do indeed form an orthonormal system.

[Figure 1.16: the functions $\psi_{r,k}$ and $\psi_{s,l}$ for $s < r$]

Now to the essential point: We have to show that any $f \in L^2$ can be approximated arbitrarily well (in terms of the $L^2$-metric) by finite linear combinations of the $\psi_{r,k}$. Such linear combinations we shall call wavelet polynomials. By general principles it is enough to consider an $f\colon \mathbb{R} \to \mathbb{C}$ of the following kind: There is an $m \ge 0$ and an $n \ge 0$ such that
(a) $f(x) \equiv 0$ for $|x| \ge 2^m$, and
(b) $f$ is a step function, constant on the intervals $I_{-n,k}$ of length $2^{-n}$.

We are now going to construct a sequence $(\Psi_r \mid r \ge -n)$ of wavelet polynomials
$$\Psi_r := \sum_{j=-n+1}^{r} \Bigl( \sum_k c_{j,k}\, \psi_{j,k} \Bigr)$$
as follows: Beginning with the finest details in the signal $f$ itself we shall extract recursively out of the remainder $f_r := f - \Psi_r$ the finest details still present therein, the latter becoming ever more spread out as we go along. This means in particular that in the limit $r \to \infty$ the lowest frequency parts of $f$ are treated last, just the reverse of what one has in Fourier analysis and synthesis. We start the construction with
$$\Psi_{-n} := 0, \qquad f_{-n} := f .$$
For the induction step $r \rightsquigarrow r' := r + 1$ we make the following assumption (which is obviously fulfilled for $r := -n$):

$A_r$: The wavelet polynomial $\Psi_r$ and the remainder $f_r$ have been determined in such a way that
$$f = \Psi_r + f_r \tag{4}$$
and such that $f_r$ is constant on each of the intervals $I_{r,k}$. The value of $f_r$ on $I_{r,k}$, denoted by $f_{r,k}$, is nothing other than the mean value of the original function $f$ on the interval $I_{r,k}$.
Now we define the quantities
$$b_{r',k} := \tfrac12 \bigl( f_{r,2k} - f_{r,2k+1} \bigr), \qquad f_{r',k} := \tfrac12 \bigl( f_{r,2k} + f_{r,2k+1} \bigr)$$
(see Figure 1.17) and put
$$c_{r',k} := 2^{r'/2}\, b_{r',k} \qquad (\text{cf. the normalization of the } \psi_{r,k}) , \tag{5}$$
$$\Psi_{r'} := \Psi_r + \sum_k c_{r',k}\, \psi_{r',k} , \qquad f_{r'}(x) := f_{r',k} \quad (x \in I_{r',k}) .$$
Then (4) is true with $r'$ instead of $r$, the function $f_{r'}$ is constant on the intervals $I_{r',k}$, and $f_{r',k}$ is the mean value of $f$ on $I_{r',k}$; in other words: $A_{r'}$ is fulfilled.
[Figure 1.17: the values $f_{r,2k}$ and $f_{r,2k+1}$, their mean $f_{r',k}$ and the half-difference $b_{r',k}$ on the interval $I_{r',k}$, which ends at $(2k+2) \cdot 2^r$ and has midpoint $(2k+1) \cdot 2^r$]

Beginning with $r := -n$, one arrives after $n + m$ such steps at the formula
$$f = \Psi_m + f_m = \sum_{j=-n+1}^{m} \Bigl( \sum_k c_{j,k}\, \psi_{j,k} \Bigr) + f_m .$$
The remainder $f_m$ is constant on the intervals $I_{m,k}$ of length $2^m$. Note, however, that at most the two values
$$A := f_{m,-1} = \text{mean of } f \text{ on } [-2^m, 0[ \qquad \text{and} \qquad B := f_{m,0} = \text{mean of } f \text{ on } [0, 2^m[$$
are different from $0$; for up to this moment all functions coming into the picture were $\equiv 0$ for $|x| \ge 2^m$.
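The induction step can be sketched in a few lines of Python. The sketch assumes, unlike the text, that the signal is supported on $[0, 2^m[$, so that the list position $k$ corresponds to the interval $I_{r,k}$ and the final remainder is a single mean value; the function names and the sample data are hypothetical.

```python
import math

def haar_step(values, r_next):
    """One step r -> r' = r + 1: values[k] holds f_{r,k}; returns ([f_{r',k}], [c_{r',k}])."""
    scale = 2.0 ** (r_next / 2.0)                        # c_{r',k} = 2^{r'/2} * b_{r',k}, cf. (5)
    means, coeffs = [], []
    for k in range(len(values) // 2):
        left, right = values[2 * k], values[2 * k + 1]
        means.append(0.5 * (left + right))               # f_{r',k}
        coeffs.append(scale * 0.5 * (left - right))      # c_{r',k}
    return means, coeffs

def haar_analysis(samples, n, steps):
    """Start at level r = -n (cells of length 2^-n) and perform `steps` induction steps."""
    values, coeffs = list(samples), {}
    for r_next in range(-n + 1, -n + steps + 1):
        values, coeffs[r_next] = haar_step(values, r_next)
    return coeffs, values

def haar_synthesis(coeffs, remainder):
    """Undo the steps from the coarsest level down to the finest one."""
    values = list(remainder)
    for r_next in sorted(coeffs, reverse=True):
        scale = 2.0 ** (-r_next / 2.0)
        finer = []
        for mean, c in zip(values, coeffs[r_next]):
            b = scale * c
            finer.extend([mean + b, mean - b])
        values = finer
    return values

samples = [4.0, 2.0, 5.0, 5.0]            # values of f on four cells of length 1/2 (n = 1)
coeffs, remainder = haar_analysis(samples, n=1, steps=2)
restored = haar_synthesis(coeffs, remainder)
print(coeffs, remainder, restored)
```

In this small example the remainder after two steps is the mean value of the four samples, and the synthesis loop recovers the samples from the coefficients and that single mean.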
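We can continue our doubling procedure with the as yet unprocessed remainder $f_m$. After $p$ further steps we have
$$f_m = \sum_{j=m+1}^{m+p} \Bigl( \sum_k c_{j,k}\, \psi_{j,k} \Bigr) + f_{m+p} ,$$
the function $f_{m+p}$ being constant on the two intervals $[-2^{m+p}, 0[$, $[0, 2^{m+p}[$ and $\equiv 0$ outside. Since $f$ is identically zero outside the interval $[-2^m, 2^m[$, it follows that
$$f_{m+p,-1} = 2^{-p} A, \qquad f_{m+p,0} = 2^{-p} B .$$
Therefore we have
$$\|f_{m+p}\|^2 = 2^{m+p} \cdot 2^{-2p} \bigl( |A|^2 + |B|^2 \bigr) ,$$
that is,
$$\|f - \Psi_{m+p}\|^2 = 2^{m-p} \bigl( |A|^2 + |B|^2 \bigr) .$$
Letting $p \to \infty$, we finally obtain
$$\lim_{r \to \infty} \|f - \Psi_r\| = 0 ,$$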
as stated. This proof of theorem (1.1) is constructive in the sense that it also yields an algorithm for the determination of the wavelet coefficients $c_{j,k}$, and, what's more, this is not any old algorithm, but what people call a fast algorithm. We can easily convince ourselves that this is indeed the case by counting the number of arithmetical operations required for the complete analysis. The original function $f$ is determined by
$$N := 2^{m+n+1}$$
individual entries. The first reduction step concerns $N/2$ pairs of intervals and requires essentially two additions per pair (dividing by 2 does not count, neither does the scaling (5)). Every subsequent reduction step requires half as many operations as the preceding one; furthermore, it makes sense to stop the process after $m + n$ steps. This means that for the determination of all coefficients $c_{j,k}$ altogether only
$$\frac{N}{2} \Bigl( 1 + \frac12 + \frac14 + \ldots \Bigr) \cdot 2 \ \le\ 2N$$
arithmetical operations are required, a number that grows linearly with the input length. We shall see in Section 5.4 that the reconstruction of $f$, using the $c_{j,k}$ as input, can be accomplished with about the same number of operations. By way of comparison: The straightforward multiplication of a data vector of length $N$ by a square matrix of order $N$ requires $O(N^2)$ arithmetical operations.

The most welcome algorithmic facts we have encountered here are not a specialty of the Haar wavelet; on the contrary, they are guaranteed to us for all mother wavelets $\psi$ admitting, as $\psi_{\rm Haar}$ does, a so-called multiresolution analysis. For more details we refer the reader to Section 5.4: Algorithms.

We bring this section and with it the introductory chapter to a close by pointing our finger at a certain paradox that is apt to worry the novice. It is the following: All wavelet functions $\psi_{r,k}$ (including the ones that we shall meet only later) have mean value $0$:
$$\int_{-\infty}^{\infty} \psi_{r,k}(t)\,dt = 0 \qquad (r, k \in \mathbb{Z}) .$$
How is it possible to approximate, e.g., the function $f$ shown in Figure 1.18 by linear combinations of such functions?

[Figure 1.18: the graph $y = f(t)$]
Well, the approximation $\Psi_r \to f$ $(r \to \infty)$ takes place in $L^2$, in many practical cases even pointwise, but not in $L^1$. The latter may be seen formally as follows: The functional
$$L\colon\ f \mapsto \int_{-\infty}^{\infty} f(t)\,dt$$
is continuous on $L^1$, and for a function $f$ as shown in Figure 1.18 one has $L(f) > 0$. Since on the other hand for all approximating functions the equality $L(\Psi_r) = 0$ holds, we cannot have $\lim_{r \to \infty} \Psi_r = f$ in $L^1$.
What happens in reality can best be examined with the help of the following simple example: We are going to approximate the function
$$\phi(x) := \begin{cases} 1 & (0 \le x < 1) \\ 0 & (\text{otherwise}) \end{cases}$$
by means of the procedure used for the proof of theorem (1.1). To simplify matters we replace the wavelet functions $\psi_{r,k}$ as defined in (2) by the functions
$$\tilde\psi_{r,k}(t) := \psi_{\rm Haar}\Bigl( \frac{t - k \cdot 2^r}{2^r} \Bigr) ,$$
i.e., the normalization factor appearing in (2) is omitted. In addition, we introduce the functions
$$g_r(t) := \begin{cases} 1 & (0 \le t < 2^r) \\ 0 & (\text{otherwise}) \end{cases} \qquad (r \ge 0) ;$$
they are related to the $\tilde\psi_{r,k}$ by means of the recursion formula
$$g_{r-1} = \tfrac12 \bigl( \tilde\psi_{r,0} + g_r \bigr) \qquad (r \ge 1) ,$$
as is easily verified by looking at Figure 1.19. From the last equation it follows by induction that
$$\phi = g_0 = \sum_{j=1}^{r} \frac{1}{2^j}\, \tilde\psi_{j,0} + \frac{1}{2^r}\, g_r \qquad (r \ge 0) .$$
Here the sum on the right hand side is just the approximating wavelet polynomial $\Psi_r$ appearing in the proof of theorem (1.1), whereas the term $g_r/2^r$ is constant on the interval $I_{r,0}$ and therefore represents the remainder $f_r$.

We now can see the following: The function $\phi$ being approximated by the wavelet polynomials $\Psi_r$ has the interval $[0, 1[$ as its support, but the supports of the approximating functions $\Psi_r$ are ever more spread out over the $t$-axis. The discrepancy that "for mean value reasons" necessarily has to persist between $\phi$ and the $\Psi_r$ is smeared out over a larger and larger domain: $\Psi_r$ has the value $1 - \frac{1}{2^r}$ on the interval $[0, 1[$ and the value $-\frac{1}{2^r}$ on the interval $[1, 2^r[$. As was to be expected, one has
$$\|\phi - \Psi_r\|^2 = \|f_r\|^2 = 2^{-r} \to 0$$
as well as
$$\int \bigl( \phi(t) - \Psi_r(t) \bigr)\,dt = \int f_r(t)\,dt = 1 \qquad (r \to \infty) ,$$
the latter in agreement with (6), and finally the formula
$$\lim_{r \to \infty} |\phi(t) - \Psi_r(t)| = \lim_{r \to \infty} |f_r(t)| = 0 \qquad \forall t$$
is true as well, the convergence even being uniform in $t$.
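The behaviour of the remainder in this example can be reproduced with a short Python sketch. It builds the wavelet polynomial as the sum of the unnormalized Haar functions with index $k = 0$, weighted by $2^{-j}$, and measures the difference to the indicator function of $[0, 1[$ in three norms. The function names are hypothetical; the sketch relies on the fact that all functions involved are constant on the unit cells $[i, i+1[$ and vanish outside $[0, 2^r[$.

```python
def haar(x):
    """Haar step function: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def wavelet_polynomial(r, t):
    """Sum over j = 1..r of 2^-j * haar(t / 2^j): unnormalized Haar functions with k = 0."""
    return sum(haar(t / 2.0 ** j) / 2.0 ** j for j in range(1, r + 1))

def remainder_norms(r):
    """(L1 norm, squared L2 norm, sup norm) of the indicator of [0, 1) minus the wavelet polynomial."""
    diffs = []
    for i in range(2 ** r):                      # one sample per unit cell [i, i + 1)
        phi = 1.0 if i == 0 else 0.0
        diffs.append(phi - wavelet_polynomial(r, i + 0.5))
    return (sum(abs(d) for d in diffs),
            sum(d * d for d in diffs),
            max(abs(d) for d in diffs))

for r in range(1, 9):
    l1, l2_squared, sup = remainder_norms(r)
    print(f"r = {r}:  L1 = {l1:.4f}   L2^2 = {l2_squared:.6f}   sup = {sup:.6f}")
```

The printed table shows the squared $L^2$ norm and the supremum of the remainder halving with every step while its $L^1$ norm stays at 1, which is the numerical face of the paradox discussed above.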
2 Fourier analysis
The most important tool in the construction of wavelet theory is Fourier analysis. The subsequent chapters rely on many of the well-known theorems and formulas relating to Fourier series, as well as on a basic understanding of the Fourier transform on $\mathbb{R}$. These ideas will be presented in the following sections by way of a review, so that they can readily be used later on. For the corresponding proofs we refer the reader to the pertinent textbooks, e.g., [2], [5], [10], [15]. In Sections 2.3 and 2.4 we give an account of the Heisenberg uncertainty principle and of the Shannon sampling theorem. These two theorems point to certain definitive limits of signal theory, and, in consequence, they also play a decisive, if sometimes hidden, role in all work with wavelets.
2.1 Fourier series

As our basic environment we use the function space $L^2_\circ := L^2(\mathbb{R}/2\pi)$. The points of this space are measurable functions $f\colon \mathbb{R} \to \mathbb{C}$ which are $2\pi$-periodic:
$$f(t + 2\pi) = f(t) \qquad \forall t \in \mathbb{R} ,$$
and for which the integral
$$\int_0^{2\pi} |f(t)|^2\,dt$$
is finite. To be precise, the space $L^2_\circ$ consists of equivalence classes of such functions; two functions $f$ and $g$ differing only on a set of $t$-values of measure $0$ are considered to be the same point in $L^2_\circ$. Among other things, this has the following consequence: A function $f \in L^2_\circ$, about which nothing more specific is known, has no definite values at individual points. Under these circumstances, it makes no sense to speak, for example, about the value $f(0)$. It takes some time to become familiar with this not very function-like behavior. On the other hand, arbitrary integrals $\int_a^b f(t)\,dt$ have a well-determined value.
The formula
$$\langle f, g \rangle := \frac{1}{2\pi} \int_0^{2\pi} f(t)\, \overline{g(t)}\,dt$$
defines a scalar product on $L^2_\circ$. To this scalar product belong the norm
$$\|f\| := \sqrt{\langle f, f \rangle} = \Bigl( \frac{1}{2\pi} \int_0^{2\pi} |f(t)|^2\,dt \Bigr)^{1/2}$$
and the distance function $d(f, g) := \|f - g\|$. With regard to this distance function, our space $L^2_\circ$ becomes a complete metric space, which means that Cauchy sequences of functions $f_n \in L^2_\circ$ are automatically convergent to some point $f \in L^2_\circ$. All in all (don't forget that $L^2_\circ$ is also a vector space over $\mathbb{C}$), the space $L^2_\circ$ is an example of a (complex) Hilbert space. The functions
$$e_k\colon\ t \mapsto e^{ikt} = \cos(kt) + i \sin(kt) \qquad (k \in \mathbb{Z})$$
are $2\pi$-periodic, and because of
$$\langle e_j, e_k \rangle = \frac{1}{2\pi} \int_0^{2\pi} e^{i(j-k)t}\,dt = \begin{cases} 1 & (j = k) \\[4pt] \dfrac{1}{2\pi i (j-k)}\, e^{i(j-k)t} \Big|_0^{2\pi} = 0 & (j \ne k) \end{cases}$$
they form an orthonormal system in $L^2_\circ$. Any $f \in L^2_\circ$ has Fourier coefficients
$$c_k := \hat f(k) := \langle f, e_k \rangle = \frac{1}{2\pi} \int_0^{2\pi} f(t)\, e^{-ikt}\,dt \qquad (k \in \mathbb{Z}) . \tag{1}$$
The $c_k$ are nothing more than the coordinates of $f$ with respect to the orthonormal basis $(e_k \mid k \in \mathbb{Z})$, cf. the analogous formulas for vectors of the euclidean $\mathbb{R}^n$. The following so-called Riemann-Lebesgue lemma is not very difficult to prove:

(2.1) $$\lim_{k \to \pm\infty} c_k = 0 .$$
But the central result of $L^2_\circ$-theory is Parseval's formula. It says that the scalar product of any two functions $f$ and $g \in L^2_\circ$ coincides with the "formal scalar product" of the corresponding coefficient vectors $\hat f$ and $\hat g$:

(2.2) For arbitrary $f$ and $g \in L^2_\circ$, the equality
$$\sum_{k=-\infty}^{\infty} \hat f(k)\, \overline{\hat g(k)} = \langle f, g \rangle$$
is valid; in particular, one has $\sum_{k=-\infty}^{\infty} |c_k|^2 = \|f\|^2$.

Using the Fourier coefficients of $f$, one forms the series
$$\sum_{k=-\infty}^{\infty} c_k\, e^{ikt} , \tag{2}$$
called the (formal) Fourier series of $f$. Occasionally one writes
$$f(t) \rightsquigarrow \sum_{k=-\infty}^{\infty} c_k\, e^{ikt} \tag{3}$$
to express the fact that the series (2) belongs to the given function $f$. The analogies between the geometries of $L^2_\circ$ and of $\mathbb{R}^n$ lead one to conjecture that the series (2) "represents" the function $f$ in a certain sense. In this regard we can say the following: The series (2) has partial sums
$$s_N(t) := \sum_{k=-N}^{N} c_k\, e^{ikt} .$$
In Section 1.2 we remarked that $s_N$ is nothing but the orthogonal projection of $f$ onto the $(2N+1)$-dimensional subspace
$$\operatorname{span}\{ e_k \mid -N \le k \le N \} .$$
In particular the vector $s_N$ is orthogonal to $f - s_N$, see Figure 1.3. From this observation it follows by Pythagoras' theorem that
$$\|f - s_N\|^2 = \|f\|^2 - \|s_N\|^2 = \|f\|^2 - \sum_{k=-N}^{N} |c_k|^2 .$$
On account of (2.2), we therefore may conclude that $\lim_{N \to \infty} \|f - s_N\|^2 = 0$, which is to say
(2.3) The formal Fourier series of a function $f \in L^2_\circ$ converges to $f$ in the sense of the $L^2_\circ$-metric.

For most practical purposes one would need much more than this, namely a theorem that guarantees the pointwise convergence of $s_N(t)$ to $f(t)$ for sufficiently regular functions. The deepest result in this direction is Carleson's theorem (1966). Its proof is so difficult that it has not shown up in the usual textbooks on Fourier series. Since we shall make use of the theorem in several places, we state it here:

(2.4) The partial sums $s_N(t)$ of a function $f \in L^2_\circ$ converge to $f(t)$ for almost all $t$.
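The mean-square convergence of the partial sums and the Pythagoras identity behind it can be observed numerically. The following Python sketch uses a square wave as test function and a midpoint rule with 2048 nodes for the scalar product; the test function, the number of nodes and all names are assumptions made for this illustration.

```python
import cmath
import math

N_GRID = 2048
H = 2.0 * math.pi / N_GRID
T = [(i + 0.5) * H for i in range(N_GRID)]                 # midpoints of the cells in [0, 2 pi)

def inner(u, v):
    """Midpoint-rule value of (1 / 2 pi) * integral over [0, 2 pi) of u * conj(v)."""
    return sum(a * b.conjugate() for a, b in zip(u, v)) * H / (2.0 * math.pi)

# Square wave (an assumption of this sketch): +1 on [0, pi), -1 on [pi, 2 pi).
F = [complex(1.0 if t < math.pi else -1.0) for t in T]

def coefficient(k):
    """Fourier coefficient c_k = <f, e_k>."""
    return inner(F, [cmath.exp(1j * k * t) for t in T])

def partial_sum_error(n_max):
    """Return (||f - s_N||^2 computed directly, ||f||^2 - sum of |c_k|^2 for |k| <= N)."""
    c = {k: coefficient(k) for k in range(-n_max, n_max + 1)}
    s = [sum(c[k] * cmath.exp(1j * k * t) for k in c) for t in T]
    diff = [a - b for a, b in zip(F, s)]
    direct = inner(diff, diff).real
    pythagoras = inner(F, F).real - sum(abs(ck) ** 2 for ck in c.values())
    return direct, pythagoras

for n_max in (1, 2, 4, 8, 16):
    direct, pythagoras = partial_sum_error(n_max)
    print(f"N = {n_max:2d}:  ||f - s_N||^2 = {direct:.6f}   ||f||^2 - sum |c_k|^2 = {pythagoras:.6f}")
```

For the square wave the coefficients with odd $k$ have modulus $2/(\pi |k|)$, so the squared error decreases only like $1/N$: the convergence holds in the mean-square sense, although the partial sums overshoot near the jumps.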
The following theorems are easier to prove. In these theorems the notion of "variation" of a function $f\colon \mathbb{R}/2\pi \to \mathbb{C}$ appears (we are talking about a bona fide function here, not an equivalence class). This notion is explained as follows: To an arbitrary subdivision
$$\mathcal{T}\colon\quad 0 = t_0 < t_1 < t_2 < \ldots < t_n = 2\pi$$
of the interval $[0, 2\pi]$ belongs the increment sum
$$V_{\mathcal{T}}(f) := \sum_{k=1}^{n} |f(t_k) - f(t_{k-1})| .$$
(Note that the absolute values of the increments are summed here!) The total variation $V(f)$ of the $2\pi$-periodic function $f$ is the supremum of these sums over all subdivisions $\mathcal{T}$. If $V(f)$ is finite, then $f$ is called a function of bounded variation. One may consider the function $t \mapsto f(t)$ as a parametric representation of a closed curve $\gamma$ in the complex plane. In light of this interpretation the quantity $V(f)$ is nothing more than the length $L(\gamma)$ of this curve. If $f$ is, e.g., piecewise continuously differentiable, then
$$V(f) = L(\gamma) = \int_0^{2\pi} |f'(t)|\,dt < \infty .$$

(2.5) Let the function $f\colon \mathbb{R}/2\pi \to \mathbb{C}$ be continuous and of bounded variation. Then the partial sums $s_N(t)$ of the Fourier series of $f$ converge for $N \to \infty$ uniformly on $\mathbb{R}/2\pi$ to $f(t)$.

Using the idea of variation we can formulate the following "quantitative version" of the Riemann-Lebesgue lemma:
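(2.6) Let $f^{(r)}$ denote the $r$-th derivative, $r \ge 0$, of the function $f\colon \mathbb{R}/2\pi \to \mathbb{C}$. If $f^{(r)}$ is continuous and $V(f^{(r)}) =: V$ is finite, then
$$|c_k| \le \frac{V}{2\pi\, |k|^{r+1}} \qquad \forall k \ne 0 .$$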
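A numerical spot check of the estimate $|c_k| \le V / (2\pi |k|^{r+1})$ may look as follows. The two test functions are assumptions of this sketch: $g(t) = |t - \pi|$ is continuous with $V(g) = 2\pi$ (case $r = 0$), and $f$ is the periodic antiderivative of $g - \pi/2$, so that $f'$ is continuous with $V(f') = 2\pi$ (case $r = 1$). The coefficients are approximated by a midpoint rule.

```python
import cmath
import math

def fourier_coefficient(func, k, n=4096):
    """Midpoint-rule value of c_k = (1 / 2 pi) * integral over [0, 2 pi) of func(t) * exp(-i k t)."""
    h = 2.0 * math.pi / n
    total = 0j
    for i in range(n):
        t = (i + 0.5) * h
        total += func(t) * cmath.exp(-1j * k * t)
    return total * h / (2.0 * math.pi)

def g(t):
    """Continuous, total variation 2 pi over one period (r = 0)."""
    return abs(t - math.pi)

def f(t):
    """Antiderivative of g - pi/2 with f(0) = f(2 pi) = 0; f' is continuous with variation 2 pi (r = 1)."""
    if t <= math.pi:
        return 0.5 * t * (math.pi - t)
    s = t - math.pi
    return 0.5 * s * (s - math.pi)

def bound(variation, r, k):
    """Right hand side V / (2 pi |k|^(r+1)) of the estimate."""
    return variation / (2.0 * math.pi * abs(k) ** (r + 1))

for k in range(1, 10):
    cg = abs(fourier_coefficient(g, k))
    cf = abs(fourier_coefficient(f, k))
    print(f"k = {k}:  |c_k(g)| = {cg:.6f} <= {bound(2 * math.pi, 0, k):.6f}"
          f"   |c_k(f)| = {cf:.6f} <= {bound(2 * math.pi, 1, k):.6f}")
```

In both columns the computed moduli stay below the bound, and the smoother function $f$ shows the faster decay; for odd $k$ the exact values are $2/(\pi k^2)$ for $g$ and $2/(\pi |k|^3)$ for $f$.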
This can be summarized as follows: The smoother the function $f$, the faster the Fourier coefficients $c_k$ decay as $k \to \pm\infty$. Theorem (2.6) can, in a way, be reversed:

(2.7) If the coefficients $c_k$ obey an estimate of the form
$$|c_k| \le \frac{C}{|k|^{r+1+\varepsilon}} \qquad (k \ne 0)$$
for some $\varepsilon > 0$, then the function $f(t) := \sum_k c_k\, e^{ikt}$ is at least $r$ times continuously differentiable.

⌜ When the series defining the function $f$ is differentiated term by term $p$ times, one obtains
$$\sum_k c_k\, (ik)^p\, e^{ikt} .$$
The estimate
$$\bigl| c_k\, (ik)^p\, e^{ikt} \bigr| \le \frac{C}{|k|^{r+1+\varepsilon-p}} \le \frac{C}{|k|^{1+\varepsilon}} \qquad (k \ne 0)$$
shows that the resulting series is uniformly convergent (to a continuous function) as long as $p \le r$. In fact, for such $p$ the series represents $f^{(p)}$, so altogether we have $f \in C^r$. ⌟

The phenomena described in (2.6) and (2.7) become manifest again when we are dealing with Fourier analysis on $\mathbb{R}$ and will have decisive consequences for the smoothness of our wavelets; we shall come back to this.

We conclude this section by writing down the relevant formulas for the Fourier series and its coefficients in case of a period of arbitrary length $L > 0$ instead of $2\pi$. For $L := 2\pi$, these formulas must become (1) and (3), and similarly for Parseval's formula.
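(2.8) Let $f\colon \mathbb{R} \to \mathbb{C}$ be a periodic function with period $L > 0$, and suppose $\int_0^L |f(x)|^2\,dx < \infty$. Then the formal Fourier series of $f$ is given by
$$f(x) \rightsquigarrow \sum_{k=-\infty}^{\infty} c_k\, e^{2k\pi i x/L} , \qquad c_k := \frac{1}{L} \int_0^L f(x)\, e^{-2k\pi i x/L}\,dx , \tag{4}$$
and Parseval's formula appears as
$$\sum_{k=-\infty}^{\infty} |c_k|^2 = \frac{1}{L} \int_0^L |f(x)|^2\,dx .$$

⌜ The function $g(t) := f\bigl( \frac{L}{2\pi} t \bigr)$ is $2\pi$-periodic, thus the relations (4) are obtained by a simple substitution of variables. From (2.2), it follows that for $L$-periodic functions an equality of the form
$$\sum_{k=-\infty}^{\infty} |c_k|^2 = C \int_0^L |f(x)|^2\,dx$$
must hold. The special function $f(t) :\equiv 1$ has Fourier coefficients $c_k = \delta_{0k}$ (Kronecker delta), which leads to the conclusion $C = \frac{1}{L}$. ⌟

2.2 Fourier transform on $\mathbb{R}$

Notation: From this point on until the end of the book an integral sign $\int$ without upper and lower limits denotes the integral with respect to the Lebesgue measure on $\mathbb{R}$, extended over the whole real axis:
$$\int f(t)\,dt := \int_{-\infty}^{\infty} f(t)\,dt .$$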
Fourier analysis on $\mathbb{R}$ is governed not by one theory but by at least three different theories, depending on which function space is chosen as the basic environment. All of these theories deal with functions of the type
$$f\colon\ \mathbb{R} \to \mathbb{C} ; \tag{1}$$
we shall call such functions time signals for short.
The space $L^1$ consists of the measurable functions (1) for which the integral
$$\int |f(t)|\,dt =: \|f\|_1$$
(the 1 is a notational index!) is finite; to be precise, it consists of equivalence classes of such functions. Analogously, the space $L^2$ consists of the functions (1) for which the integral
$$\int |f(t)|^2\,dt =: \|f\|^2$$
(the 2 is an exponent!) is finite. The third of these spaces is the so-called Schwartz space $\mathcal{S}$; its elements are the functions (1) with the following properties: $f$ has derivatives of all orders (in symbols, $f \in C^\infty(\mathbb{R})$), and for $|t| \to \infty$ all derivatives decay to $0$ faster than any negative power $1/|t|^n$. Examples of such functions are
$$t \mapsto e^{-t^2} , \qquad t \mapsto \frac{1}{\cosh t} .$$
Figure 2.1 shows the inclusions that are valid between these spaces. All wavelets of any practical significance belong to the intersection $L^1 \cap L^2$, so the $L^1$-theory as well as the $L^2$-theory is available for them. The famous "Mexican hat" (see Figure 3.4) even lies in $\mathcal{S}$.

[Figure 2.1: the inclusions between the spaces $L^1$, $L^2$ and $\mathcal{S}$]

The Fourier transform $\hat f$ of a function $f \in L^1$ is defined by the integral
$$\hat f(\xi) := \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{-i\xi t}\,dt \qquad (\xi \in \mathbb{R}) . \tag{2}$$
The definition of $\hat f$ is not uniform in the mathematical literature. In addition to the integral given here, one also encounters
$$\int f(t)\, e^{-2\pi i \xi t}\,dt$$
and others. The content of the theory remains intact under such changes, of course, but the formulas will look a little different throughout.

For a given $\xi \in \mathbb{R}$, the well-determined value $\hat f(\xi)$ may be interpreted as follows: $\hat f(\xi)$ is the complex amplitude with which the pure oscillation $e_\xi$ is represented in $f$. The following "Gedankenexperiment" (thought experiment) will illustrate this: Consider a time signal $f$ whose value $f(t)$ moves around the origin (not necessarily in circles) with an angular velocity of approximately $\xi$ during some length of time and is very weak the rest of the time. If $I$ is the time interval of this encircling motion, then $\arg\bigl( f(t)\, e^{-i\xi t} \bigr)$ is more or less constant on $I$, and the integral
$$\int_I f(t)\, e^{-i\xi t}\,dt$$
has a large absolute value, since there is little cancellation. The remaining integral
$$\int_{\mathbb{R} \setminus I} f(t)\, e^{-i\xi t}\,dt ,$$
on the other hand, will have a very small value, since the signal reading $f(t)$ is more or less constant on $\mathbb{R} \setminus I$, while $e_\xi$ is oscillating rapidly and harmonically there, so that we have a great deal of cancellation during the summation process on $\mathbb{R} \setminus I$.
(2.9) The Fourier transform $\hat f$ of a function $f \in L^1$ is automatically continuous. Furthermore, one has
$$\lim_{\xi \to \pm\infty} \hat f(\xi) = 0 .$$

The vanishing of $\hat f$ at $\pm\infty$ is nothing more than the Fourier transform version of the Riemann-Lebesgue lemma. We now derive a few rules for calculating the Fourier transforms of functions related to some given $f$ by translation, dilation and the like. For any time signal $f$ and arbitrary $h \in \mathbb{R}$, the function $T_h f$ is defined by
$$T_h f(t) := f(t - h) .$$
[Figure 2.2: the graph of $f$ and of its translate $T_h f$]
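If $h$ is positive, then $T_h$ translates the graph of $f$ by $h$ to the right (see Figure 2.2). Let $f$ be in $L^1$ and $g(t) := T_h f(t)$. Then the Fourier transform of $g$ is computed as
$$\hat g(\xi) = \frac{1}{\sqrt{2\pi}} \int f(t - h)\, e^{-i\xi t}\,dt = \frac{1}{\sqrt{2\pi}} \int f(t')\, e^{-i\xi (t' + h)}\,dt' = e^{-i\xi h}\, \hat f(\xi) .$$
This proves our first rule:
$$(T_h f)^\wedge(\xi) = e^{-i\xi h}\, \hat f(\xi) , \tag{R1}$$
which may be expressed in words as follows: If $f$ is translated by $h$ to the right along the time axis, then its Fourier transform $\hat f$ picks up a factor $e_{-h}\colon \xi \mapsto e^{-ih\xi}$.

We again consider an arbitrary signal $f \in L^1$ and modulate $f$ with a pure oscillation $e_\omega$, $\omega \in \mathbb{R}$; that is to say, we consider the function $g(t) := e^{i\omega t} f(t)$. The Fourier transform of $g$ is given by
$$\hat g(\xi) = \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{i\omega t}\, e^{-i\xi t}\,dt = \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{-i(\xi - \omega) t}\,dt = \hat f(\xi - \omega) .$$
So we have the following rule, which is in a way "dual" to (R1):
$$(e_\omega f)^\wedge(\xi) = \hat f(\xi - \omega) . \tag{R2}$$
In words: If the signal $f$ is modulated with $e_\omega$, then the graph of $\hat f$ is translated by $\omega$ (to the right, if $\omega > 0$) on the $\xi$-axis.

Speaking philosophically, one can say that Fourier theory is the systematic exploitation of translational symmetry. In the realm of wavelets dilations of the time axis play a role of even more importance. For this reason we have to investigate how the Fourier transform behaves under the operation $D_a$, which for arbitrary $a \in \mathbb{R}^*$ is defined by
$$D_a f(t) := f\Bigl( \frac{t}{a} \Bigr) .$$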
[Figure 2.3: the graph of $f$ and of $D_a f$ for $a = 3$]

The effect of $D_a$ on the graph of a signal $f$ is shown in Figure 2.3 for the case $a := 3$. If $|a| > 1$, then the graph $\mathcal{G}(f)$ is stretched horizontally by the factor $|a|$, and for $|a| < 1$ the graph is compressed horizontally by the factor $|a|$. If $a < 0$, then, in addition, $\mathcal{G}(f)$ is reflected in the vertical axis. So let $g(t) := D_a f(t)$. In order to compute $\hat g$ we use the substitution
$$t := a t' \quad (t' \in \mathbb{R}) , \qquad dt = |a|\,dt'$$
(absolute value of the Jacobian!) and obtain
$$\hat g(\xi) = \frac{1}{\sqrt{2\pi}} \int f\Bigl( \frac{t}{a} \Bigr)\, e^{-i\xi t}\,dt = \frac{|a|}{\sqrt{2\pi}} \int f(t')\, e^{-i\xi a t'}\,dt' = |a|\, \hat f(a\xi) .$$
All in all, we have proven the formula
$$(D_a f)^\wedge(\xi) = |a|\, \hat f(a\xi) \qquad (a \in \mathbb{R}^*) . \tag{R3}$$
In terms of the graphs of $f$ and $\hat f$ this means the following: If the graph of $f$ is stretched horizontally by a factor $a > 1$, then the graph of $\hat f$ is compressed horizontally to the fraction $\frac{1}{a} < 1$ of its original width; moreover, it is scaled vertically by the factor $|a|$.
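The three rules (R1) to (R3) can be spot-checked numerically. The following Python sketch uses the Gaussian $t \mapsto e^{-t^2/2}$, which under the normalization (2) is its own Fourier transform, and approximates the Fourier integral by a midpoint rule on $[-40, 40]$. The test function, the truncation interval, the sample parameters and all names are assumptions of the sketch.

```python
import cmath
import math

def fourier_transform(f, xi, t_max=40.0, n=8000):
    """Midpoint-rule value of (1 / sqrt(2 pi)) * integral over [-t_max, t_max] of f(t) exp(-i xi t)."""
    h = 2.0 * t_max / n
    total = 0j
    for i in range(n):
        t = -t_max + (i + 0.5) * h
        total += f(t) * cmath.exp(-1j * xi * t)
    return total * h / math.sqrt(2.0 * math.pi)

def gauss(t):
    """Test signal: exp(-t^2 / 2), whose Fourier transform under (2) is exp(-xi^2 / 2)."""
    return math.exp(-0.5 * t * t)

def rule_errors(xi, h, omega, a):
    """Absolute differences between both sides of (R1), (R2) and (R3) at the frequency xi."""
    f_hat = lambda x: fourier_transform(gauss, x)
    r1 = fourier_transform(lambda t: gauss(t - h), xi) - cmath.exp(-1j * xi * h) * f_hat(xi)
    r2 = fourier_transform(lambda t: cmath.exp(1j * omega * t) * gauss(t), xi) - f_hat(xi - omega)
    r3 = fourier_transform(lambda t: gauss(t / a), xi) - abs(a) * f_hat(a * xi)
    return abs(r1), abs(r2), abs(r3)

errors = rule_errors(xi=0.7, h=1.5, omega=2.0, a=-3.0)
print(errors)
```

All three differences are at the level of rounding errors. Since the Gaussian is an even function, the negative dilation factor in the example tests the factor $|a|$ in (R3) but not the reflection.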
For any two given functions defined by
f
* g(x)
:=
f and 9
J
E L1, their
f(x - t) g(t) dt
convolution product f
*9 is
(x E JR) .
In any case the object f * 9 is an element of L1. This means that a priori it is only an equivalence class of functions. In most concrete cases, however, f *9 is a bona fide function with well-determined values. One can even say more:
2 2 Fourier transform on lR
39
The function f * 9 is at least as smooth as the smoother of the two functions I and g. A typical application of convolution is the so-called regularization of a given function f by means of smooth bump functions ge E Coo. The ge have total mass f ge(t) dt = 1 and are identically zero outside of the interval I-e, e J, see Figure 2.4. The value f * ge(x) can then be regarded as a weighed average of the f-values taken in an c-neighbourhood of x, so the Coo-function Ie := f * ge is an "c-smeared out" version of the given function f.
Figure 2.4 (a bump function $g_\varepsilon$ with $\int g_\varepsilon(t)\,dt = 1$)
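The regularization just described can be sketched in a few lines. This is an illustration under stated assumptions, not the author's construction: the bump $\exp(-1/(1-(t/\varepsilon)^2))$ is one common choice of $C^\infty$ function supported on $[-\varepsilon, \varepsilon]$, its unit mass is enforced numerically, and the names `bump` and `regularize` are hypothetical.

```python
import math


def bump(t, eps):
    """Unnormalized C-infinity bump that vanishes outside (-eps, eps)."""
    u = t / eps
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0


def regularize(f, x, eps, n=2000):
    """Midpoint-rule value of (f * g_eps)(x), with g_eps scaled to total mass 1."""
    h = 2.0 * eps / n
    nodes = [-eps + (k + 0.5) * h for k in range(n)]
    weights = [bump(t, eps) for t in nodes]
    mass = sum(weights)  # dividing by this makes the discrete weights sum to 1
    return sum(f(x - t) * w for t, w in zip(nodes, weights)) / mass


def step(t):
    """A discontinuous test signal: the unit step at t = 0."""
    return 1.0 if t >= 0.0 else 0.0


if __name__ == "__main__":
    for x in (-0.2, -0.05, 0.0, 0.05, 0.2):
        print(x, regularize(step, x, eps=0.1))
```

Away from the jump the smoothed signal agrees with the step; inside the $\varepsilon$-neighbourhood of the jump it passes smoothly from 0 to 1.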
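With the help of Fubini's theorem (on the interchange of the order of integration) we can now easily compute the Fourier transform of $f * g$:

$$(f * g)^\wedge(\xi) = \frac{1}{\sqrt{2\pi}} \int \left( \int f(x - t)\, g(t)\,dt \right) e^{-i\xi x}\,dx = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}\times\mathbb{R}} f(x - t)\, g(t)\, e^{-i\xi x}\,d(x, t) = \frac{1}{\sqrt{2\pi}} \int g(t) \left( \int f(x - t)\, e^{-i\xi x}\,dx \right) dt.$$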
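By rule (R1), the resulting inner integral has the value $\sqrt{2\pi}\, e^{-i\xi t} \hat f(\xi)$, and here only the factor $e^{-i\xi t}$ is dependent on $t$. Thus we may continue the above chain of equalities with

$$\ldots = \hat f(\xi) \int g(t)\, e^{-i\xi t}\,dt = \sqrt{2\pi}\, \hat f(\xi)\, \hat g(\xi).$$

Our computation proves the so-called convolution theorem:

(2.10) For $f, g \in L^1$ one has $(f * g)^\wedge = \sqrt{2\pi}\, \hat f \cdot \hat g$.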
In words: The Fourier transform converts the convolution product of the two functions $f$ and $g$ into the ordinary, meaning pointwise, product of their Fourier transforms.

Now for the $L^2$-theory. On $L^2$ one defines a scalar product by

$$\langle f, g \rangle := \int f(t)\, \overline{g(t)}\,dt. \tag{3}$$

For any two functions $f, g \in L^2$, their scalar product $\langle f, g \rangle$ is a well-determined complex number. Any $f \in L^2$ has a finite 2-norm, norm for short,

$$\|f\| := \sqrt{\langle f, f \rangle} = \left( \int |f(t)|^2\,dt \right)^{1/2},$$

and one easily proves Schwarz' inequality

$$|\langle f, g \rangle| \le \|f\|\,\|g\|. \tag{4}$$
$L^2$ is a Hilbert space, as was $L^2_\circ$, but not everything carries over. For a general $f \in L^2$, the Fourier integral (2) need not exist: Since $e_\xi$ is not an element of $L^2$, this integral cannot be regarded as being the scalar product $\frac{1}{\sqrt{2\pi}} \langle f, e_\xi \rangle$. Fortunately, the subset $X := L^1 \cap L^2$ is dense in $L^2$, and this makes it possible to extend the Fourier transform

$$\mathcal{F}\colon\ f \mapsto \hat f,$$

defined on $X$ by formula (2), in a unique way to all of $L^2$. This implies, of course, that the Fourier transform of a function $f \in L^2 \setminus X$ becomes accessible only through an additional limiting process. Working out the details, one arrives at the following picture: The Fourier transform $\hat f$ of a function $f \in L^2$ about which nothing else is known is again an $L^2$-object, i.e., an equivalence class of functions, and does not have well-determined values at individual points $\xi \in \mathbb{R}$. But as a map

$$\mathcal{F}\colon\ L^2 \to L^2$$

the Fourier transform is well-defined and bijective (a miracle!). In fact, even more is true: $\mathcal{F}$ is an isometry with respect to the scalar product (3). This is analytically expressed by the following theorem, called the Parseval-Plancherel formula:
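(2.11) For arbitrary $f, g \in L^2$ one has

$$\langle \hat f, \hat g \rangle = \langle f, g \rangle,$$

or, written out in full,

$$\int \hat f(\xi)\, \overline{\hat g(\xi)}\,d\xi = \int f(t)\, \overline{g(t)}\,dt.$$

In particular, $\|\hat f\|^2 = \|f\|^2$, resp. $\int |\hat f(\xi)|^2\,d\xi = \int |f(t)|^2\,dt$.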
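As a plausibility check of the norm identity in (2.11), one can compare both sides for a function whose transform is known in closed form. The sketch below is an illustration only: it assumes the $\frac{1}{\sqrt{2\pi}}$ convention, under which $f(t) = e^{-|t|}$ has $\hat f(\xi) = \sqrt{2/\pi}\,/(1+\xi^2)$, and it truncates both integrals to finite intervals.

```python
import math


def integrate(fn, lo, hi, n=100000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(n))


def f(t):
    return math.exp(-abs(t))


def f_hat(xi):
    # Closed-form transform of exp(-|t|) under the 1/sqrt(2*pi) convention.
    return math.sqrt(2.0 / math.pi) / (1.0 + xi * xi)


def norm_sq_time():
    return integrate(lambda t: f(t) ** 2, -30.0, 30.0)


def norm_sq_frequency():
    # The integrand decays like xi^-4, so the truncated tail is negligible.
    return integrate(lambda xi: f_hat(xi) ** 2, -300.0, 300.0)


if __name__ == "__main__":
    print(norm_sq_time(), norm_sq_frequency())
```

Both values should agree with the exact common value 1 up to the truncation and quadrature error.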
A periodic function $f$ can be reconstructed from its Fourier coefficients $c_k = \hat f(k)$ by summing the series. In a similar vein, there is also a reconstruction procedure (called the inversion formula) for the Fourier transform. It accepts the Fourier transform $\hat f$ of a time signal $f$ as input and reproduces the original signal $f$ by means of a summation process. In the textbooks on Fourier analysis one finds various approaches to such an inversion formula under ever weaker assumptions about $f$ and $\hat f$. Let us note here the following version:

(2.12) If $f$ and $\hat f$ are both in $L^1$, then

$$f(t) = \frac{1}{\sqrt{2\pi}} \int \hat f(\xi)\, e^{i\xi t}\,d\xi$$

almost everywhere, in particular at all points $t$ where $f$ is continuous.

This formula can be written "abstractly" in the form

$$f = \frac{1}{\sqrt{2\pi}} \int d\xi\, \hat f(\xi)\, e_\xi,$$

which may be interpreted as follows: The original signal $f$ is a linear combination of pure oscillations of all possible frequencies $\xi \in \mathbb{R}$; to be more precise, any individual oscillation $e_\xi$ occurs in $f$ with complex amplitude $\hat f(\xi)$ (cf. our remarks following the definition (2) of $\hat f$). In Theorem (2.12) there are assumptions not only about the original signal $f$ but also about $\hat f$. Thus we have to address the following question: How are the properties of $\hat f$ (continuity, decay at infinity, etc.) related to those of $f$? Generally speaking, the following can be said in this regard: The smoother
the time signal $f$, the faster the decay of $\hat f(\xi)$ for $|\xi| \to \infty$. Reflecting this in a logical mirror, one has the following dual statement: The faster the original signal decays for $|t| \to \infty$, the smoother, or more regular, is its Fourier transform $\hat f$. (Following the general custom, we use the word regular to convey a not very precise notion of smoothness.) A function $f$ in Schwartz space $S$ is "super smooth", and as a consequence its Fourier transform decays "super fast". On the other hand, $f$ and all its derivatives enjoy "super fast" decay, and as a consequence $\hat f$ is "super smooth". All in all, it turns out that $\mathcal{F}$, restricted to $S$, maps this space bijectively onto itself.
We want to formulate the described general principle somewhat more precisely, i.e., in a more quantitative way. The smoothness (regularity) of a function is most easily expressed by the number of times it can be continuously differentiated. So we first have to investigate the interplay between the Fourier transform and differentiation. Let $f$ be a $C^1$-function and assume that $f$ as well as $f'$ are integrable, i.e., in $L^1$. Then in any case one has $\lim_{t \to \pm\infty} f(t) = 0$ (an exercise!), and partial integration of the Fourier integral (2) gives

$$\int f'(t)\, e^{-i\xi t}\,dt = f(t)\, e^{-i\xi t}\Big|_{t:=-\infty}^{\infty} + i\xi \int f(t)\, e^{-i\xi t}\,dt,$$

from which we can read off the following rule for computing the Fourier transform of a derivative:

$$(f')^\wedge(\xi) = i\xi\, \hat f(\xi). \tag{R4}$$

Continuing in this way, we obtain, at least formally, for arbitrary $r \ge 0$, the formula

$$(f^{(r)})^\wedge(\xi) = (i\xi)^r\, \hat f(\xi). \tag{5}$$
Assume, e.g., that our signal $f$ is $r$ times continuously differentiable and that the derivatives $f^{(k)}$ $(0 \le k \le r)$ are in $L^1$. Then formula (5) is applicable, and Theorem (2.9), applied to $f^{(r)}$, guarantees

$$\lim_{\xi \to \pm\infty} |\xi|^r\, \hat f(\xi) = 0.$$

This can be read as follows: Under the described circumstances the Fourier transform $\hat f$ has a decay at infinity (i.e., for $|\xi| \to \infty$) that is faster than the decay of $1/|\xi|^r$.
Using (2.11) instead of (2.9) we arrive at a similar result: If, under suitable assumptions about the derivatives $f^{(k)}$ $(0 \le k \le r)$, the integral $\int |f^{(r)}(t)|^2\,dt$ is finite, then the integral $\int |\xi|^{2r}\, |\hat f(\xi)|^2\,d\xi$ is finite as well, which implies that $\hat f$ must have corresponding decay at infinity.
As a counterpart to the considerations in the last paragraph we start afresh, but this time with time signals $f$ that have fast decay at infinity. We consider an $f \in L^1$ decaying for $|t| \to \infty$ at least fast enough to make the integral $\int |t|\,|f(t)|\,dt$ convergent. We shall denote the function $t \mapsto t\,f(t)$ by $tf$ for short, so we assume $tf \in L^1$. We now compute the derivative of $\hat f$. To this end we write

$$\frac{\hat f(\xi + h) - \hat f(\xi)}{h} = \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{-i\xi t}\, \frac{e^{-ith} - 1}{h}\,dt.$$

Here the integrand

$$g_h(t) := f(t)\, e^{-i\xi t}\, \frac{e^{-ith} - 1}{h}$$

can be estimated as follows (using $|e^{-ith} - 1| \le |th|$):

$$|g_h(t)| \le |t\,f(t)| \qquad \forall h \ne 0.$$
By Lebesgue's theorem (about the interchange of limit and integration) we conclude that the derivative

$$(\hat f)'(\xi) = \lim_{h \to 0} \frac{\hat f(\xi + h) - \hat f(\xi)}{h} = \frac{1}{\sqrt{2\pi}} \int f(t)\, e^{-i\xi t}\,(-it)\,dt$$

exists. If the last equation is read from right to left, one obtains the following rule for computing the Fourier transform of $tf$:

$$(tf)^\wedge(\xi) = i\,(\hat f)'(\xi). \tag{R5}$$
Because of (2.9), the function $(\hat f)'$ is even continuous. By induction one proves easily that the following is true for arbitrary $r \ge 1$:

(2.13) Assume that $f \in L^1$ decays fast enough for $|t| \to \infty$ to make the integral $\int |t|^r\, |f(t)|\,dt$ finite. Then the Fourier transform $\hat f$ is at least $r$ times continuously differentiable. Furthermore,

$$(t^r f)^\wedge = i^r\, (\hat f)^{(r)}. \tag{6}$$

An extremal case of fast decay is when the time signal $f \in L^1$ has in fact compact support. If $\operatorname{supp}(f) \subset [-b, b]$, we may write

$$\hat f(\zeta) = \frac{1}{\sqrt{2\pi}} \int_{-b}^{b} f(t)\, e^{-i\zeta t}\,dt. \tag{7}$$
Note that we have replaced the frequency variable $\xi$ by a $\zeta$, for something essential has happened: The Fourier transform $\hat f$ has become an entire holomorphic function of the complex variable $\zeta = \xi + i\eta$. Looking back, we remark that for the convergence of the Fourier integral (2) in general it was crucial that the factor $e^{-i\xi t}$ remain bounded when $t \to \pm\infty$. Now in the integral (7) over a finite interval, the factor $e^{-i\zeta t}$ can be estimated for complex $\zeta$ as follows:

$$|e^{-i\zeta t}| = |e^{-i(\xi + i\eta)t}| \le e^{b|\eta|} \qquad (-b \le t \le b).$$
This shows that the integral (7) is convergent for arbitrary values of $\zeta \in \mathbb{C}$, and as in the proof of (R5) it follows that one may differentiate (7) in the sense of complex function theory with respect to the variable $\zeta$. Furthermore, one has for $\hat f$ itself an estimate of the form

$$|\hat f(\zeta)| \le \frac{1}{\sqrt{2\pi}} \int_{-b}^{b} |f(t)|\, e^{|t\,\mathrm{Im}(\zeta)|}\,dt \le C\, e^{b\,|\mathrm{Im}(\zeta)|}.$$

Thus the size of the support of $f$ determines the rate of increase of the entire function $\zeta \mapsto \hat f(\zeta)$ in the vertical direction.

Since the Fourier transform $\hat f$ in this case has turned out to be an entire holomorphic function, it is impossible that $\hat f$ has compact support, if this is the case for $f$. Turned the other way around, a bandlimited signal (see Section 2.4) cannot have compact support.
We conclude this section with a few examples.
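① Let $a > 0$, and consider the function $f := 1_{[-a,a]}$. Its Fourier transform is computed as follows:

$$\hat f(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-i\xi t}\,dt = \frac{1}{\sqrt{2\pi}}\, \frac{1}{-i\xi}\, e^{-i\xi t}\Big|_{t:=-a}^{a} = \frac{2}{\sqrt{2\pi}\,\xi}\, \frac{e^{i\xi a} - e^{-i\xi a}}{2i} = \sqrt{\frac{2}{\pi}}\, \frac{\sin(a\xi)}{\xi} \qquad (\xi \ne 0).$$

The value $\xi = 0$ is special. By a separate calculation or by looking at $\lim_{\xi \to 0} \hat f(\xi)$, one finds

$$\hat f(0) = \sqrt{\frac{2}{\pi}}\, a.$$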
The graphs of both $f$ and $\hat f$ are shown in Figure 2.5. In the signal theoretic literature, very often the so-called sinc function is introduced as a standard tool. It is usually defined by

$$\operatorname{sinc}(x) := \begin{cases} \dfrac{\sin x}{x} & (x \ne 0) \\ 1 & (x = 0) \end{cases}$$

and is an entire holomorphic function of $x$, when $x$ is considered as a complex variable. Using this function we may write down our result about $\hat f$ in the following way:

$$\hat 1_{[-a,a]}(\xi) = \sqrt{\frac{2}{\pi}}\, a\, \operatorname{sinc}(a\xi). \tag{8}$$
Figure 2.5 (graphs of $f = 1_{[-a,a]}$ and of $\hat f$, with peak value $\sqrt{2/\pi}\,a$)
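The closed form for the transform of the indicator function can be compared with a direct quadrature of the Fourier integral. The sketch below is illustrative: it assumes the $\frac{1}{\sqrt{2\pi}}$ convention, and the function names are hypothetical.

```python
import cmath
import math


def sinc(x):
    """sin(x)/x, continued by the value 1 at x = 0."""
    return math.sin(x) / x if x != 0.0 else 1.0


def box_hat_numeric(xi, a, n=20000):
    """Midpoint-rule value of (1/sqrt(2*pi)) * integral over [-a, a] of exp(-i*xi*t) dt."""
    h = 2.0 * a / n
    total = sum(cmath.exp(-1j * xi * (-a + (k + 0.5) * h)) for k in range(n))
    return total * h / math.sqrt(2.0 * math.pi)


def box_hat_closed(xi, a):
    """sqrt(2/pi) * a * sinc(a*xi), valid for all real xi including xi = 0."""
    return math.sqrt(2.0 / math.pi) * a * sinc(a * xi)


if __name__ == "__main__":
    for xi in (0.0, 2.0, -7.5):
        print(xi, box_hat_numeric(xi, 1.5), box_hat_closed(xi, 1.5))
```

The numerical values are real up to rounding, as expected for an even real signal, and they include the special case $\xi = 0$ without a separate branch in the quadrature.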
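② As an exercise in using our rules, we compute the Fourier transform of the Haar wavelet (see Section 1.6) a second time. Considered as an element of $L^1$, the Haar wavelet may be written as follows:

$$\psi_{\mathrm{Haar}} = 1_{[0,\frac12]} - 1_{[\frac12,1]} = T_{\frac14}\, 1_{[-\frac14,\frac14]} - T_{\frac34}\, 1_{[-\frac14,\frac14]}.$$

Rule (R1) now allows us to read off $\hat\psi_{\mathrm{Haar}}$ directly from (8):

$$\hat\psi_{\mathrm{Haar}}(\xi) = \left( e^{-i\xi/4} - e^{-3i\xi/4} \right) \sqrt{\frac{2}{\pi}}\, \frac{\sin(\xi/4)}{\xi} = \frac{i}{\sqrt{2\pi}}\, e^{-i\xi/2}\, \frac{\sin^2(\xi/4)}{\xi/4},$$

as before. ○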
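The result for the Haar wavelet can be cross-checked by direct quadrature. This sketch assumes the closed form $\hat\psi_{\mathrm{Haar}}(\xi) = \frac{i}{\sqrt{2\pi}}\, e^{-i\xi/2}\, \sin^2(\xi/4)/(\xi/4)$, which follows from (R1) and the transform of the indicator function under the $\frac{1}{\sqrt{2\pi}}$ convention; the function names are illustrative.

```python
import cmath
import math


def haar(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def haar_hat_numeric(xi, n=20000):
    """Midpoint-rule Fourier integral over the support [0, 1]; n is even, so no node hits t = 1/2."""
    h = 1.0 / n
    total = sum(haar((k + 0.5) * h) * cmath.exp(-1j * xi * (k + 0.5) * h) for k in range(n))
    return total * h / math.sqrt(2.0 * math.pi)


def haar_hat_closed(xi):
    """Closed form obtained from two shifted box transforms; the value at xi = 0 is 0."""
    if xi == 0.0:
        return 0j
    return (1j / math.sqrt(2.0 * math.pi)) * cmath.exp(-1j * xi / 2.0) * math.sin(xi / 4.0) ** 2 / (xi / 4.0)


if __name__ == "__main__":
    for xi in (0.0, 3.0, -11.0):
        print(xi, haar_hat_numeric(xi), haar_hat_closed(xi))
```

The vanishing value at $\xi = 0$ reflects the fact that the Haar wavelet has mean value zero.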
③ The function

$$g(t) := 1_{[-a,a]}(t)\, e^{i\omega_0 t}$$

models a certain process setting in at the exact time $t := -a$ and abruptly stopping at time $t := a$. In between, we observe a pure oscillation of frequency (angular velocity, to be exact) $\omega_0$. The Fourier transform treats this process
mandatorily as an overall phenomenon extended over the full time axis. Rule (R2) gives, in this case:

$$\hat g(\xi) = \sqrt{\frac{2}{\pi}}\, \frac{\sin(a(\xi - \omega_0))}{\xi - \omega_0}.$$

As was to be expected, the function $\hat g$ has a more or less distinctive maximum at the frequency $\xi := \omega_0$ (see Figure 2.6). But because of the jump discontinuities of $g$ at the times $t := \pm a$, the absolute value $|\hat g|$ decays only slowly with $|\xi| \to \infty$; in fact, $\hat g$ is not even in $L^1$. ○
Figure 2.6
④ The Fourier transform of the function

$$g_0(t) := N_{1,0}(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$$

is most easily computed via the methods of complex function theory. Since $g_0$ is real and even, its Fourier transform $\hat g_0$ will also be a real and even function. So it suffices to discuss $\xi > 0$. Inspired by $g_0$, we consider the function $f(z) := e^{-z^2/2}$, holomorphic in the full complex $z$-plane, and draw the rectangle $R$ shown in Figure 2.7. Since, in the end, we shall take the limit $a \to \infty$, we may assume right from the start that $a \ge \xi > 0$; note that $\xi$ is fixed here.

Cauchy's integral theorem tells us that

$$\int_{\partial R} f(z)\,dz = 0.$$

Therefore we have

$$\int_{\sigma_1} f(z)\,dz = \int_{\sigma_0} f(z)\,dz + \int_{\sigma_+} f(z)\,dz - \int_{\sigma_-} f(z)\,dz,$$
Figure 2.7 (the rectangle $R$ with base $[-a, a]$ on the real axis)
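which we may abbreviate as

$$I_1 = I_0 + I_+ - I_-.$$

For $I_1$ we use the parametric representation

$$\sigma_1\colon\ t \mapsto z(t) := t + i\xi \qquad (-a \le t \le a)$$

and obtain

$$I_1 = \int_{-a}^{a} \exp\!\left( -\frac{t^2 + 2i\xi t - \xi^2}{2} \right) dt = e^{\xi^2/2} \int_{-a}^{a} e^{-t^2/2}\, e^{-i\xi t}\,dt = e^{\xi^2/2} \left( 2\pi\, \hat g_0(\xi) + o(1) \right) \qquad (a \to \infty). \tag{9}$$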
The integral $I_0$ can be written as

$$I_0 = \int_{-a}^{a} e^{-t^2/2}\,dt = \sqrt{2\pi} + o(1) \qquad (a \to \infty). \tag{10}$$

Here we have used a well-known special value of the probability integral, which can be obtained without excursion into the complex domain. To compute the remaining integrals $I_\pm$, we use the parametric representation

$$\sigma_\pm\colon\ t \mapsto z(t) := \pm a + it \qquad (0 \le t \le \xi)$$

and obtain

$$I_\pm = \int_0^{\xi} \exp\!\left( -\frac{a^2 \pm 2iat - t^2}{2} \right) i\,dt.$$
Because of $a \ge \xi$, the last integral can be estimated as follows:

$$|I_\pm| \le \int_0^a \exp\!\left( -\frac{(a - t)(a + t)}{2} \right) dt \le \int_0^a \exp\!\left( -\frac{a}{2}(a - t) \right) dt = \ldots = \frac{2}{a}\left( 1 - e^{-a^2/2} \right) = o(1) \qquad (a \to \infty).$$

This proves $I_1 = I_0 + o(1)$ $(a \to \infty)$; therefore from (9) and (10), by passing to the limit $a \to \infty$, we obtain

$$\hat g_0(\xi) = \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}.$$
We see that the special function $N_{1,0}$ has as its Fourier transform an identical copy of itself, but living on the $\xi$-axis.

Figure 2.8 ($\sigma = 1$, $\omega_0 = 5$)
We conclude the present example by computing the Fourier transform of the "wave train"

$$g(t) := N_{\sigma,0}(t) \cos(\omega_0 t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{t^2}{2\sigma^2} \right) \frac{e^{i\omega_0 t} + e^{-i\omega_0 t}}{2}$$

(see Figure 2.8). To this end we use our rules. First, one has $N_{\sigma,0} = \frac{1}{\sigma} D_\sigma g_0$, so rule (R3) gives

$$\hat N_{\sigma,0}(\xi) = \hat g_0(\sigma\xi) = \frac{1}{\sqrt{2\pi}}\, e^{-\sigma^2 \xi^2/2}.$$

To this we apply rule (R2) and obtain

$$\hat g(\xi) = \frac{1}{2\sqrt{2\pi}} \left( e^{-\sigma^2(\xi - \omega_0)^2/2} + e^{-\sigma^2(\xi + \omega_0)^2/2} \right).$$

We see that the Fourier transform of our "wave train" has peaks at the two points $\pm\omega_0$ of the $\xi$-axis, and these peaks become more and more pronounced as $\sigma$ increases, i.e., when the number of oscillations of frequency $\omega_0$ that in fact could be observed becomes larger and larger. ○

For additional formulas giving the Fourier transforms of special functions we refer the reader to the extensive tables in [13].
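The two-peak structure of the wave train's transform can be reproduced numerically. The sketch below assumes the closed form obtained from rules (R3) and (R2), namely $\hat g(\xi) = \frac{1}{2\sqrt{2\pi}}\big(e^{-\sigma^2(\xi-\omega_0)^2/2} + e^{-\sigma^2(\xi+\omega_0)^2/2}\big)$, together with the $\frac{1}{\sqrt{2\pi}}$ convention. The parameter values $\sigma = 1$, $\omega_0 = 5$ follow Figure 2.8; the function names are illustrative.

```python
import cmath
import math


def wave_train(t, sigma, omega0):
    """Gaussian envelope N_{sigma,0}(t) times cos(omega0 * t)."""
    envelope = math.exp(-t * t / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
    return envelope * math.cos(omega0 * t)


def wave_train_hat_numeric(xi, sigma, omega0, half_width=40.0, n=40000):
    """Midpoint-rule Fourier integral over [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0j
    for k in range(n):
        t = -half_width + (k + 0.5) * h
        total += wave_train(t, sigma, omega0) * cmath.exp(-1j * xi * t)
    return total * h / math.sqrt(2.0 * math.pi)


def wave_train_hat_closed(xi, sigma, omega0):
    """Sum of two Gaussian peaks centred at +omega0 and -omega0."""
    def peak(centre):
        return math.exp(-sigma ** 2 * (xi - centre) ** 2 / 2.0)
    return (peak(omega0) + peak(-omega0)) / (2.0 * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    for xi in (0.0, 3.0, 5.0):
        print(xi, wave_train_hat_numeric(xi, 1.0, 5.0).real, wave_train_hat_closed(xi, 1.0, 5.0))
```

Increasing `sigma` in this sketch narrows the peaks around $\pm\omega_0$, in line with the remark that they become more pronounced as more oscillations are observed.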
2.3 The Heisenberg uncertainty principle

We have noted at several places already that a time signal $f$ and its Fourier transform $\hat f$ cannot be simultaneously localized in a small domain of the $t$- resp. the $\xi$-axis.

• The scaling rule (R3) implies that the graph of $\hat f$ is stretched horizontally (and, in addition, flattened by vertical scaling) when the graph of $f$ is compressed horizontally.

• The Fourier transform of a pure oscillation cut off outside $\pm a$ has all of $\mathbb{R}$ as its support and is not even absolutely integrable for $|\xi| \to \infty$.

• A time signal with compact support cannot be bandlimited (see Section 2.4).

• Further observations can be made in the same vein, which the reader is invited to make on his own.
The phenomenon described here rather intuitively has found its quantitative expression in the famous Heisenberg uncertainty principle, a theorem of Fourier analysis that plays an important role in quantum mechanics. There the motion of a particle is described "abstractly" by a certain function $\psi \in S$ (no connection with our wavelets) in the following way: The function $f_X(x) := |\psi(x)|^2$ is interpreted as the probability density for the position $X$ of this particle, considered as a random variable, and $f_P(\xi) := |\hat\psi(\xi)|^2$ is the corresponding density for its momentum $P$. The uncertainty principle states in the form of a precise inequality that these two densities cannot simultaneously have a single marked peak. Here we have tacitly assumed $\psi \in L^2$, and, for the probabilistic interpretation,

$$\|\psi\|^2 = \int f_X(x)\,dx = 1.$$

The quantity

$$\|x\psi\|^2 := \int x^2\, |\psi(x)|^2\,dx$$

is the expectation of the random variable $X^2$ and consequently a measure for the horizontal spread of the function $\psi$. Analogously, the integral

$$\|\xi\hat\psi\|^2 := \int \xi^2\, |\hat\psi(\xi)|^2\,d\xi$$

can be regarded as a measure of the spread of $\hat\psi$ over the $\xi$-axis. In terms of these quantities, the Heisenberg uncertainty principle can be formulated as follows:

(2.14) Let $\psi$ be an arbitrary function in $L^2$. Then

$$\|x\psi\| \cdot \|\xi\hat\psi\| \ge \frac12\, \|\psi\|^2, \tag{1}$$

the left-hand side being allowed to assume the value $\infty$. The equality sign is valid exactly for the constant multiples of the functions $x \mapsto e^{-cx^2}$, $c > 0$.

If $\|x\psi\| = \infty$ or $\|\xi\hat\psi\| = \infty$, then there is nothing to prove. In this case at least one of the two functions $\psi$ and $\hat\psi$ is definitely "very spread out". Therefore we may assume that the left-hand side of (1) is finite and prove this inequality first for functions $\psi \in S$. Under this additional hypothesis all convergence questions are moved out of the way; in particular, we have $\lim_{x \to \pm\infty} x\,|\psi(x)|^2 = 0$.
The Fourier transform $\hat\psi$ may be eliminated from (1) by means of rule (R4) and Parseval's formula (2.11). One has

$$\|\xi\hat\psi\| = \|(\psi')^\wedge\| = \|\psi'\|,$$

from which it follows that the stated inequality (1) is equivalent to

$$\|x\psi\| \cdot \|\psi'\| \ge \frac12\, \|\psi\|^2. \tag{2}$$

Now by Schwarz' inequality 2.2.(4), we have

$$\|x\psi\| \cdot \|\psi'\| \ge |\langle x\psi, \psi' \rangle| \ge |\mathrm{Re}\,\langle x\psi, \psi' \rangle|. \tag{3}$$
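Here the right-hand side can be computed as follows:

$$2\,\mathrm{Re}\,\langle x\psi, \psi' \rangle = \langle x\psi, \psi' \rangle + \langle \psi', x\psi \rangle = \int x \left( \psi(x)\, \overline{\psi'(x)} + \psi'(x)\, \overline{\psi(x)} \right) dx = x\,|\psi(x)|^2 \Big|_{-\infty}^{\infty} - \int |\psi(x)|^2\,dx = -\|\psi\|^2.$$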
If we insert this on the right side of (3), the inequality (2) follows. To finish up the proof we have to get rid of the assumption $\psi \in S$. Since $S$ is dense in $L^2$, a simple approximation argument (which we leave as an exercise) will do the job. One has equality in (1), if and only if both $\ge$ relations in (3) are in fact equalities, and for this to be valid it is necessary, in the first place, that the two vectors $x\psi$ and $\psi' \in L^2$ are linearly dependent. So there has to be a $\mu + i\nu \in \mathbb{C}$ with

$$\psi'(x) \equiv (\mu + i\nu)\, x\, \psi(x) \qquad (x \in \mathbb{R}). \tag{4}$$
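The solutions of this differential equation are given by

$$\psi(x) = C\, e^{(\mu + i\nu)x^2/2},$$

and such a $\psi$ is an element of $L^2$ if and only if $\mu =: -c$ is negative. For the second $\ge$ in (3) to be an equality, $\langle x\psi, \psi' \rangle$ has to be real. Together with (4) we are led to the condition

$$\langle x\psi, \psi' \rangle = (\mu - i\nu)\, \|x\psi\|^2 \in \mathbb{R},$$

so $\nu$ has to be zero.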
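Inequality (2), the form of the uncertainty principle that avoids the Fourier transform, lends itself to a quick numerical experiment. The sketch below is illustrative only: it evaluates the ratio $\|x\psi\| \cdot \|\psi'\| / \|\psi\|^2$ by quadrature for the Gaussian (where equality is expected) and for $\psi(x) = 1/\cosh x$ (a test function chosen here, not taken from the text), whose exact ratio is $\pi/6$.

```python
import math


def l2_norm(fn, lo=-30.0, hi=30.0, n=20000):
    """Midpoint-rule approximation of the L2 norm of fn on [lo, hi]."""
    h = (hi - lo) / n
    return math.sqrt(h * sum(abs(fn(lo + (k + 0.5) * h)) ** 2 for k in range(n)))


def uncertainty_ratio(psi, dpsi):
    """Return ||x psi|| * ||psi'|| / ||psi||^2, which inequality (2) bounds below by 1/2."""
    return l2_norm(lambda x: x * psi(x)) * l2_norm(dpsi) / l2_norm(psi) ** 2


def gauss(x):
    return math.exp(-x * x / 2.0)


def dgauss(x):
    return -x * math.exp(-x * x / 2.0)


def sech(x):
    return 1.0 / math.cosh(x)


def dsech(x):
    return -math.sinh(x) / math.cosh(x) ** 2


if __name__ == "__main__":
    print(uncertainty_ratio(gauss, dgauss))  # the extremal case
    print(uncertainty_ratio(sech, dsech))    # slightly above 1/2
```

The Gaussian attains the bound, while the second function stays strictly above it, although only by a small margin.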
According to this theorem, the two functions $\psi$, $\hat\psi$ cannot simultaneously be sharply localized at $x := 0$, $\xi := 0$: At least one of the numbers $\|x\psi\|^2$ and $\|\xi\hat\psi\|^2$ is $\ge \|\psi\|^2/2$. Of course the same is true for an arbitrary pair $(x_0, \xi_0)$ instead of $(0, 0)$:

(2.15) For any $\psi \in L^2$ and arbitrary $x_0 \in \mathbb{R}$, $\xi_0 \in \mathbb{R}$ one has

$$\|(x - x_0)\psi\| \cdot \|(\xi - \xi_0)\hat\psi\| \ge \frac12\, \|\psi\|^2.$$

Here $\|(x - x_0)\psi\|$ resp. $\|(\xi - \xi_0)\hat\psi\|$ denote the following quantities:

$$\left( \int (x - x_0)^2\, |\psi(x)|^2\,dx \right)^{1/2} \quad\text{resp.}\quad \left( \int (\xi - \xi_0)^2\, |\hat\psi(\xi)|^2\,d\xi \right)^{1/2}.$$
g(t)
:= e-i~ot 'l/J(t
+ xo)
into play and compute
IIgll2 = jl'l/J(t + xo)1 2dt = 1I'l/J1I2 , IItgll 2 = jt21'l/J(t+xo)12= j(x-xo)21'l/J(x)1 2 dX. Writing 9 in the form
g(t) =
e-i~ot
h(t) ,
h(t)
:=
f(t
+ xo) ,
and with the help of rules (R2) and (Rl), we deduce that
This implies
If we now apply (2.14) to the function 9 and insert the values obtained for
IIgll, IItgll and
II T 911, we arrive at the stated formula.
J
24 The Shannon sampling theorem
53
2.4 The Shannon sampling theorem The Shannon sampling theorem gives a surprising answer to the following question: Is it possible to reconstruct a time signal f from discrete values (J(kT) IkE Z) completely, i.e., for all values of the continuous variable t? Without further assumptions about f the answer to this question of course has to be no, for in the open intervals between the sample points kT the graph of f could be filled in more or less arbitrarily. The sampling theorem has an interesting history; see [9J for a very readable account. The fact is that the series representation given by Shannon's theorem had been known long before Shannon by the name of cardinal series. :\. function fELl is called n-bandlimited if its Fourier transform vanishes :dentically for I~I > 0.:
1
(I~I
> 0.) .
Shannon's theorem states that an n-bandlimited function can be reconstructed completely from its values
(J(kT)
IkE Z) ,
T·-
n, 7r
(1)
sampled at the discrete points $kT$. By "completely" we mean that at all points $t \in \mathbb{R}$ we get back the exact original value $f(t)$. Now this might come as a surprise, but a moment's reflection shows that it is not so surprising after all: A bandlimited time signal $f$ is automatically an entire holomorphic function of the complex variable $t$ (cf. the corresponding statement about the Fourier transform of time signals having compact support), and it is well known that such a function is determined on all of $\mathbb{C}$ by giving its values on a comparatively "modest" set. So uniqueness follows from general principles, but Shannon's theorem even gives a formula for $f$.

In (1) a certain rigid relation between the bandwidth $\Omega$ and the sampling interval $T$ is stipulated. There is a lot to be said about that, and we shall come back to this matter later on. For the moment, the following will suffice: All harmonic components $e_\xi$ actually occurring in $f$ have a period length $\ge 2\pi/\Omega$. Thus, by requiring $T := \pi/\Omega$, one makes sure that any pure oscillation possibly present in $f$ would be sampled at least twice per period. Here is the sampling theorem (Figure 2.9):
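(2.16) Let the continuous function $f\colon \mathbb{R} \to \mathbb{C}$ be $\Omega$-bandlimited and assume that $f$ satisfies an estimate of the form

$$f(t) = O\!\left( \frac{1}{|t|^{1+\varepsilon}} \right) \qquad (t \to \pm\infty). \tag{2}$$

Let $T := \pi/\Omega$. Then

$$f(t) = \sum_{k=-\infty}^{\infty} f(kT)\, \operatorname{sinc}(\Omega(t - kT)) \qquad (t \in \mathbb{R}). \tag{3}$$

Figure 2.9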
The formal series appearing in (3) is called the cardinal series in the literature. Because the sinc-function is bounded on $\mathbb{R}$, the assumption (2) guarantees that the cardinal series is uniformly convergent on $\mathbb{R}$ and so represents a function $\tilde f$ that is continuous on all of $\mathbb{R}$. The relations $\operatorname{sinc}(k\pi) = \delta_{0k}$ imply that the function $\tilde f$ automatically interpolates the given values $f(kT)$. This means that the cardinal series can be used as a continuous interpolant of the given data $(f(kT) \mid k \in \mathbb{Z})$ even in cases where $f$ is not bandlimited. From what was said above about $f$, it is no restriction of generality to assume right from the start that $f$ is continuous. The assumption (2) could be weakened.
Because of (2) the function $f$ is in $L^1 \cap L^2$ and has a continuous Fourier transform $\hat f$ by (2.9). Since $\hat f$ vanishes for $|\xi| > \Omega$, it is in $L^1$ as well, and the right side of the inversion formula (2.12) produces a continuous function $t \mapsto \tilde f(t)$ which coincides with $f$ almost everywhere, so is actually $\equiv f$:

$$f(t) = \frac{1}{\sqrt{2\pi}} \int \hat f(\xi)\, e^{it\xi}\,d\xi \overset{A}{=} \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \hat f(\xi)\, e^{it\xi}\,d\xi \qquad (t \in \mathbb{R}). \tag{4}$$
Since $\hat f$ is continuous, one has $\hat f(-\Omega) = \hat f(\Omega) = 0$, and one may say that on the $\xi$-interval $[-\Omega, \Omega]$ the function $\hat f$ coincides with a certain periodic function $F$ of period $2\Omega$:

$$\hat f(\xi) \equiv F(\xi) \qquad (-\Omega \le \xi \le \Omega). \tag{5}$$

This function $F \in L^2(\mathbb{R}/(2\Omega))$ can be developed into a Fourier series according to the formulas (2.8):

$$F(\xi) \rightsquigarrow \sum_{k=-\infty}^{\infty} c_k\, e^{2k\pi i \xi/(2\Omega)}, \tag{6}$$
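and we know by Carleson's theorem (2.4) that the series written here converges for almost all $\xi$ to the true function value $F(\xi)$. The coefficients $c_k$ are computed as follows:

$$c_k = \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} F(\xi)\, e^{-2k\pi i \xi/(2\Omega)}\,d\xi = \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \hat f(\xi)\, e^{-ikT\xi}\,d\xi. \tag{7}$$

Comparing this equality with (4) we see that the last integral can be interpreted as an $f$-value, so we get

$$c_k = \frac{\sqrt{2\pi}}{2\Omega}\, f(-k\pi/\Omega) = \frac{\sqrt{2\pi}}{2\Omega}\, f(-kT),$$

and formula (6) becomes

$$F(\xi) = \frac{\sqrt{2\pi}}{2\Omega} \sum_{k=-\infty}^{\infty} f(kT)\, e^{-ikT\xi} \qquad (\text{almost all } \xi \in \mathbb{R}). \tag{8}$$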
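On account of (5) we may therefore replace (4) by

$$f(t) = \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \left( \sum_{k=-\infty}^{\infty} f(kT)\, e^{-ikT\xi} \right) e^{it\xi}\,d\xi.$$

Because of (2), the series under the integral sign converges uniformly, and we are allowed to integrate it term by term:

$$f(t) = \frac{1}{2\Omega} \sum_{k=-\infty}^{\infty} f(kT) \int_{-\Omega}^{\Omega} e^{i(t - kT)\xi}\,d\xi.$$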
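The last integral is computed as follows:

$$\int_{-\Omega}^{\Omega} e^{i(t - kT)\xi}\,d\xi = \int_{-\Omega}^{\Omega} \cos((t - kT)\xi)\,d\xi = \frac{2}{t - kT}\, \sin(\Omega(t - kT)) \quad (t \ne kT) \quad = 2\Omega\, \operatorname{sinc}(\Omega(t - kT)) \quad (t \in \mathbb{R}),$$

so that we definitively obtain the stated formula

$$f(t) = \sum_{k=-\infty}^{\infty} f(kT)\, \operatorname{sinc}(\Omega(t - kT)) \qquad (t \in \mathbb{R}).$$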
The frequency (angular velocity, to be exact) $\Omega := \pi/T$ is called the Nyquist frequency for the chosen sampling interval $T$. Conversely, the quantity $T^{-1}$ represents the number of samples taken per unit of time and is called the sampling rate. The sampling rate $T^{-1} := \Omega/\pi$ is called the Nyquist rate for functions of bandwidth $\Omega$. Assume now that a certain sampling rate is given, e.g., $T^{-1} := 40000\ \mathrm{sec}^{-1}$. What can be said when the actual bandwidth $\Omega'$ of the sampled function $f$ is larger than the Nyquist frequency $\Omega := \pi/T$? In order to answer this question we need to go once more through the above proof. The places A in (4) and B in (7) are the only two instances where the assumption that $\hat f$ vanishes identically outside of the interval $[-\Omega, \Omega]$ has actually been used. If this assumption is not fulfilled, i.e., if the true bandwidth $\Omega'$ of $f$ is larger than $\Omega = \pi/T$, then at the places A and B we no longer have equality, and the cardinal series will not represent $f$.
Which other function is then represented by the cardinal series? One might perhaps entertain the idea that simply the harmonic components $e_\xi$ with frequencies $|\xi| > \Omega$ are filtered out, so that the cardinal series would essentially produce the function

$$\tilde f := \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} d\xi\, \hat f(\xi)\, e_\xi.$$
Unfortunately, this conjecture is false. In reality, a new phenomenon occurs. It is called aliasing and is a nuisance in various fields of technology (telephone communications, computer tomography, etc.), where discretization of analog phenomena is an essential ingredient.

Things become more clear when we now consider an $f$ that is only "moderately" undersampled. We take $\Omega < \Omega' < 3\Omega$ and assume that $\hat f(\xi) \equiv 0$ for $|\xi| > \Omega'$. Then we can write (cf. (4))

$$f(kT) = \frac{1}{\sqrt{2\pi}} \left( \int_{-3\Omega}^{-\Omega} + \int_{-\Omega}^{\Omega} + \int_{\Omega}^{3\Omega} \right) \hat f(\xi)\, e^{ikT\xi}\,d\xi.$$

If we make the substitution $\xi := \xi' \pm 2\Omega$ in the two exterior integrals on the right, then $e^{ikT\xi} = e^{ikT\xi'}$ (because of $2\Omega T = 2\pi$), and we obtain

$$f(kT) = \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \left( \hat f(\xi) + \hat f(\xi - 2\Omega) + \hat f(\xi + 2\Omega) \right) e^{ikT\xi}\,d\xi. \tag{9}$$
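This brings into the game the continuous function $g \in L^2$ whose Fourier transform is given by

$$\hat g(\xi) := \begin{cases} \hat f(\xi) + \hat f(\xi - 2\Omega) + \hat f(\xi + 2\Omega) & (-\Omega \le \xi \le \Omega) \\ 0 & (|\xi| > \Omega). \end{cases} \tag{10}$$

Because of (9), the function $g$ satisfies

$$g(kT) = \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \hat g(\xi)\, e^{ikT\xi}\,d\xi = f(kT) \qquad (k \in \mathbb{Z}).$$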
We realize that $g$ has the same cardinal series as $f$, but $g$ is, contrary to $f$, truly $\Omega$-bandlimited. This implies that the common cardinal series of $f$ and $g$ represents not $f$ but $g$, and we are led to the following general conclusion: If the true bandwidth $\Omega'$ of $f$ is larger than the Nyquist frequency $\Omega := \pi/T$, then the high frequency parts of $f$ are not simply filtered out or "forgotten" by the cardinal series, but they appear therein, afflicted with a mysterious frequency shift. The cardinal series produces an $\Omega$-bandlimited function $g$ whose Fourier transform $\hat g$ is given by (10) and is shown in Figure 2.10.
Figure 2.10 Aliasing
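The frequency shift by multiples of $2\Omega$ in (10) can be seen directly on sampled pure oscillations. The sketch below is an illustration with assumed parameter values ($T = 1$, hence $\Omega = \pi$, and an oscillation of frequency $1.6\pi$ lying between $\Omega$ and $3\Omega$); the helper name `alias_frequency` is hypothetical.

```python
import math


def alias_frequency(omega, nyquist):
    """Shift omega by the multiple of 2*nyquist that brings it into [-nyquist, nyquist]."""
    return omega - 2.0 * nyquist * round(omega / (2.0 * nyquist))


def sample_cosine(omega, T, k_range):
    """Samples cos(omega * k * T) for the integers k in k_range."""
    return [math.cos(omega * k * T) for k in k_range]


if __name__ == "__main__":
    T = 1.0
    nyquist = math.pi / T
    omega = 1.6 * math.pi                      # undersampled: nyquist < omega < 3*nyquist
    folded = alias_frequency(omega, nyquist)   # equals omega - 2*nyquist here
    ks = range(-20, 21)
    gap = max(abs(u - v) for u, v in zip(sample_cosine(omega, T, ks),
                                         sample_cosine(folded, T, ks)))
    print(folded / math.pi, gap)
```

The two oscillations have identical samples at all points $kT$, so no reconstruction from these samples can tell them apart; the cardinal series returns the low-frequency one.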
While undersampling leads, as we have seen, to the undesirable effect of aliasing, the skillful deployment of oversampling can be used to improve the rate of convergence. We now show how this can be realized. Let a sampling rate $T^{-1}$ be given and let $\Omega := \pi/T$ be the corresponding Nyquist frequency. We assume that the signals $f$ taken into consideration are $\Omega'$-bandlimited for some $\Omega' < \Omega$. Let the auxiliary function $q \in L^2$ be defined by giving its Fourier transform:

$$\hat q(\xi) := \begin{cases} 1 & (|\xi| \le \Omega') \\[4pt] \dfrac12 \left( 1 - \sin \dfrac{\pi(2|\xi| - \Omega - \Omega')}{2(\Omega - \Omega')} \right) & (\Omega' \le |\xi| \le \Omega) \\[4pt] 0 & (|\xi| \ge \Omega). \end{cases}$$

Note that $q$ is, apart from the parameter values $\Omega$ and $\Omega'$, independent of $f$. Figure 2.11 shows the graphs of $\hat q$ and of a typical $\hat f$ under consideration.

Figure 2.11
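The signal $f$ satisfies the assumptions of theorem (2.16), therefore (8) is valid, and we may write

$$\hat f(\xi) = \frac{\sqrt{2\pi}}{2\Omega} \sum_{k=-\infty}^{\infty} f(kT)\, e^{-ikT\xi} \qquad (|\xi| \le \Omega).$$

Furthermore, we know that $\hat f$ is identically zero for $\Omega' \le |\xi| \le \Omega$. In the interval $|\xi| \le \Omega'$ we have $\hat q(\xi) \equiv 1$. This implies that, starting with (4), we may do the following computation:

$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\Omega}^{\Omega} \hat f(\xi)\, \hat q(\xi)\, e^{it\xi}\,d\xi = \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \left( \sum_{k=-\infty}^{\infty} f(kT)\, e^{-ikT\xi} \right) \hat q(\xi)\, e^{it\xi}\,d\xi = \sum_{k=-\infty}^{\infty} f(kT)\, \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \hat q(\xi)\, e^{i(t - kT)\xi}\,d\xi.$$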
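Using the abbreviation

$$\frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \hat q(\xi)\, e^{is\xi}\,d\xi =: Q(s), \tag{11}$$

we see that the cardinal series (3) has been transformed into the novel representation

$$f(t) = \sum_{k=-\infty}^{\infty} f(kT)\, Q(t - kT). \tag{12}$$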
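In order to be able to judge the announced improvement in convergence we need the "universal" (i.e., independent of $f$) function $Q$ in explicit form. Since $\hat q$ is an even function, the integral (11) is computed as follows:

$$Q(s) = \frac{1}{2\Omega} \int_{-\Omega}^{\Omega} \hat q(\xi) \cos(s\xi)\,d\xi = \frac{1}{\Omega} \left( \int_0^{\Omega'} \cos(s\xi)\,d\xi + \int_{\Omega'}^{\Omega} \ldots \cos(s\xi)\,d\xi \right) = \frac{\pi^2}{\pi^2 - (\Omega - \Omega')^2 s^2} \cdot \frac{\sin(\Omega' s) + \sin(\Omega s)}{2\Omega s}.$$

From this, we immediately deduce

$$Q(s) = O\!\left( \frac{1}{|s|^3} \right) \qquad (s \to \pm\infty).$$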
Let us consider an example. Oversampling the time signal $f$ twice means $\Omega' = \frac12 \Omega$. Imagine that we want to reconstruct the signal $f$ in the $t$-interval $[0, T]$. For the comparison of (12) and (3) we have to estimate the order of magnitude of the factor $Q(t - kT)$ in (12) when $|k| \to \infty$. It is given by

$$\frac{\pi^2}{(\Omega/2)^2 (kT)^2} \cdot \frac{2}{2\Omega \cdot |k| T} = \frac{4}{\pi}\, \frac{1}{|k|^3}.$$

In simplifying, we have used the relation $\Omega T = \pi$. Compare this with the cardinal series (3): The order of magnitude of the corresponding factor $\operatorname{sinc}(\Omega(t - kT))$ when $|k| \to \infty$ is much larger, namely

$$\frac{1}{\pi}\, \frac{1}{|k|}.$$

It follows that, using (3), one would have to take several times more terms into account as compared to (12) in order to guarantee the same level of precision.
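The explicit form of $Q$ and its cubic decay can be checked against a direct quadrature of (11). The sketch below is an illustration with assumed parameter values ($\Omega = 2$, $\Omega' = 1$, i.e. twofold oversampling); the function names are hypothetical, and the closed form is not evaluated at $s = 0$ or at $s = \pm\pi/(\Omega - \Omega')$, where the expression as written has removable singularities.

```python
import math


def q_hat(xi, om, om1):
    """Window equal to 1 on [-om1, om1], 0 outside [-om, om], with a sine-shaped transition."""
    x = abs(xi)
    if x <= om1:
        return 1.0
    if x >= om:
        return 0.0
    return 0.5 * (1.0 - math.sin(math.pi * (2.0 * x - om - om1) / (2.0 * (om - om1))))


def Q_numeric(s, om, om1, n=20000):
    """(1/om) * integral over [0, om] of q_hat(xi) * cos(s*xi) d(xi); uses that q_hat is even."""
    h = om / n
    total = sum(q_hat((k + 0.5) * h, om, om1) * math.cos(s * (k + 0.5) * h) for k in range(n))
    return total * h / om


def Q_closed(s, om, om1):
    """Closed form of Q(s); requires s != 0 and s != +-pi/(om - om1)."""
    d = om - om1
    return (math.pi ** 2 / (math.pi ** 2 - d * d * s * s)
            * (math.sin(om1 * s) + math.sin(om * s)) / (2.0 * om * s))


if __name__ == "__main__":
    for s in (0.5, 1.0, 7.3):
        print(s, Q_numeric(s, 2.0, 1.0), Q_closed(s, 2.0, 1.0))
    print(abs(Q_closed(100.3, 2.0, 1.0)) * 100.3 ** 3)  # stays bounded as s grows
```

The product $|Q(s)|\,|s|^3$ remaining bounded for large $s$ is the numerical counterpart of $Q(s) = O(1/|s|^3)$, compared with the $1/|s|$ decay of the sinc kernel.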
3 The continuous wavelet transform

3.1 Definitions and examples

A function $\psi\colon \mathbb{R} \to \mathbb{C}$ satisfying the conditions

$$\|\psi\| = 1 \tag{1}$$

and

$$2\pi \int_{\mathbb{R}^*} \frac{|\hat\psi(a)|^2}{|a|}\,da =: C_\psi < \infty \tag{2}$$
is called a mother wavelet or simply a wavelet. These two conditions represent the bare minimum that is necessary for the functioning of the theory described in this chapter. All wavelets occurring in practice are $L^1$-functions as well, most of them are continuous (the Haar wavelet isn't), many are differentiable, and the wavelets that are the most popular (as mathematical objects, if not in the applications) have compact support.

Whether a proposed function $\psi \in L^2$ fulfills condition (2) cannot be decided just by looking at it. That's why the following criterion is of help, at least for reasonable $\psi$'s; at the same time it gives an intuitively accessible interpretation of condition (2):
(3.1) For functions $\psi \in L^2$ satisfying $t\psi \in L^1$, i.e., $\int |t|\,|\psi(t)|\,dt < \infty$, condition (2) is equivalent to

$$\int_{-\infty}^{\infty} \psi(t)\,dt = 0 \quad\text{resp.}\quad \hat\psi(0) = 0. \tag{3}$$

According to this proposition a wavelet has mean value 0. From this we infer that the graph $\mathcal{G}(\psi)$ of a wavelet $\psi$ lies, as most graphs of "waves" do, partly above and partly below the $t$-axis.
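A function $\psi$ of the described kind is automatically in $L^1$, and one has

$$\hat\psi(0) = \frac{1}{\sqrt{2\pi}} \int \psi(t)\,dt.$$

By (2.9) the Fourier transform $\hat\psi$ is continuous. Then the integral (2) can only converge if $\hat\psi(0) = 0$. Conversely: The condition $t\psi \in L^1$ implies $\hat\psi \in C^1$ by (2.13). Let

$$\sup\{\, |\hat\psi'(\xi)| \,\mid\, |\xi| \le 1 \,\} =: M.$$

Now, if $\hat\psi(0) = 0$, then the mean value theorem of differential calculus implies

$$|\hat\psi(\xi)| \le M\,|\xi| \qquad (|\xi| \le 1),$$

and we obtain the estimate

$$\int_{\mathbb{R}^*} \frac{|\hat\psi(a)|^2}{|a|}\,da \le \int_{|a| \le 1} M^2\, |a|\,da + \int_{|a| \ge 1} |\hat\psi(a)|^2\,da \le M^2 + \|\hat\psi\|^2 < \infty.$$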
Assume that a certain wavelet function $\psi$ has been chosen and is held fixed. Then the function

$$Wf(a, b) := \frac{1}{|a|^{1/2}} \int f(t)\, \overline{\psi\!\left( \frac{t - b}{a} \right)}\,dt \qquad (a \ne 0) \tag{4}$$

is called the wavelet transform of the time signal $f \in L^2$ with respect to $\psi$. The domain of definition of $Wf$ is the $(a, b)$-plane, "cut into two halves", i.e., the set $\mathbb{R}^2_- := \{(a, b) \mid a \in \mathbb{R}^*,\ b \in \mathbb{R}\}$. Note again that in wavelet theory the $a$-axis is scaled vertically and the $b$-axis horizontally (see, for example, Figure 3.7). Very often the domain of $Wf$ is restricted to positive $a$-values. In this case, condition (2) has to be modified slightly (see below).
$Wf$ is a function of two real variables; therefore its graphical representation in a figure is not as easily accomplished as that of $f$ or $\hat f$. We refer the reader to Example ® for a version that is easily implemented on a computer.

Assume that a wavelet $\psi$ has been chosen once and for all. For arbitrary $a \ne 0$ let

$$\psi_a(t) := \frac{1}{|a|^{1/2}}\, \psi\!\left( \frac{t}{a} \right)$$

be the function obtained from $\psi$ by stretching its graph horizontally from 0 by the factor $|a|$, reflecting it at the vertical axis in case $a < 0$, and finally scaling it appropriately in the vertical direction, making $\|\psi_a\| = 1$.
If after this dilation process the function $\psi_a$ is translated along the time axis by the amount $b$ (to the right, if $b > 0$), one obtains the function

$$\psi_{a,b}(t) := \psi_a(t - b) = \frac{1}{|a|^{1/2}}\, \psi\!\left( \frac{t - b}{a} \right), \tag{5}$$

appearing in the integral (4); see Figure 3.1. We obviously have

$$\|\psi_{a,b}\| = 1 \qquad \forall (a, b) \in \mathbb{R}^2_-.$$

Using the $\psi_{a,b}$ we can write the definition (4) of the wavelet transform in the form of a scalar product:

$$Wf(a, b) = \langle f, \psi_{a,b} \rangle. \tag{6}$$

Figure 3.1
This implies, first, that at each point $(a, b) \in \mathbb{R}^* \times \mathbb{R}$ the wavelet transform $Wf$ has a well determined value $Wf(a, b)$ and, second, by Schwarz' inequality, that $Wf$ is uniformly bounded on $\mathbb{R}^2_-$:

$$|Wf(a, b)| \le \|f\| \qquad \forall (a, b) \in \mathbb{R}^2_-. \tag{7}$$

We now compute the Fourier transforms of the functions $\psi_{a,b}$. According to rule (R3) we have

$$\hat\psi_a(\xi) = |a|^{1/2}\, \hat\psi(a\xi),$$

whence we obtain by rule (R1), applied to (5):

$$\hat\psi_{a,b}(\xi) = |a|^{1/2}\, e^{-ib\xi}\, \hat\psi(a\xi). \tag{8}$$
On account of (2.11) (Parseval's formula) and (6) we therefore can write Wf(a, b) in the following form:
The last integral can be regarded as a Fourier integral; to be precise, it gives the Fourier v transform of the £l-function
(10) written as a function of the variable b. Altogether, we have proven the following proposition:
(3.2) For fixed $a \ne 0$ the function
$$Wf(a, \cdot)\colon\quad b \mapsto Wf(a, b)$$
can be regarded as the Fourier $\vee$-transform of the function $F_a$, the latter given by (10).
Because of (2.9) one may conclude in particular that the function $Wf$ is continuous on horizontal lines $a = \text{const.}$, and tends to 0 when $b \to \pm\infty$, keeping $a$ fixed.
① The function $\psi := \psi_{\rm Haar}$ is obviously a wavelet in the sense of the general definition. If $a > 0$ then
$$\psi_{a,b}(t) = \begin{cases} a^{-1/2} & (b \le t < b + \tfrac a2) \\ -a^{-1/2} & (b + \tfrac a2 \le t < b + a) \\ 0 & (\text{otherwise}) \end{cases}$$
and consequently
$$Wf(a, b) = \frac{1}{\sqrt a}\left(\int_b^{b+a/2} f(t)\,dt - \int_{b+a/2}^{b+a} f(t)\,dt\right) = \frac{\sqrt a}{2}\left(\frac 2a\int_b^{b+a/2} f(t)\,dt - \frac 2a\int_{b+a/2}^{b+a} f(t)\,dt\right) .$$
This shows that (apart from the normalizing factor) the value $Wf(a, b)$ represents a difference between two mean values of $f$, these means being taken over two adjacent intervals of length $\tfrac a2$ in the neighbourhood of $b$, as indicated in Figure 3.2.
Figure 3.2
We may also look at the same quantity $Wf(a, b)$ in a totally different way:
$$Wf(a, b) = \frac{1}{\sqrt a}\int_b^{b+a/2}\bigl(f(t) - f(t + \tfrac a2)\bigr)\,dt = -\frac{1}{\sqrt a}\int_b^{b+a/2}\left(\int_t^{t+a/2} f'(\tau)\,d\tau\right)dt = \ldots = -\frac{1}{\sqrt a}\int_{-a/2}^{a/2}\Bigl(\frac a2 - |\tau|\Bigr)\,f'\bigl(b + \tfrac a2 + \tau\bigr)\,d\tau .$$
Written in this form the value $Wf(a, b)$ appears as a weighted mean of the derivative $f'$ over the interval $[b, b + a]$. Figure 3.3 shows the graph of the weight function relating to this second interpretation of $Wf(a, b)$. ○
Figure 3.3
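The two readings of $Wf(a, b)$ in Example ① can be compared numerically. The following Python sketch is an illustration only and not part of the original text: the helper names `integrate`, `haar_wt` and `haar_wt_via_derivative`, as well as the choice of the midpoint rule, are assumptions made here.

```python
import math

def integrate(g, lo, hi, n=2000):
    """Composite midpoint rule for the integral of g over [lo, hi] (n even)."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def haar_wt(f, a, b):
    """Wf(a, b) for the Haar wavelet and a > 0: difference of two integrals of f."""
    mid = b + a / 2
    return (integrate(f, b, mid) - integrate(f, mid, b + a)) / math.sqrt(a)

def haar_wt_via_derivative(df, a, b):
    """Same quantity as a weighted mean of f' with the triangular weight a/2 - |tau|."""
    weighted = lambda tau: (a / 2 - abs(tau)) * df(b + a / 2 + tau)
    return -integrate(weighted, -a / 2, a / 2) / math.sqrt(a)

if __name__ == "__main__":
    a, b = 0.5, 0.3
    print(haar_wt(math.sin, a, b), haar_wt_via_derivative(math.cos, a, b))
```

For $f(t) = t$ both expressions reduce to $-a^{3/2}/4$, since the triangular weight has total mass $a^2/4$.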
② Consider the function
$$\psi(t) := \frac{2}{\sqrt 3}\,\pi^{-1/4}\,(1 - t^2)\,e^{-t^2/2} , \qquad (11)$$
where the leading numerical factor ($=: \gamma$) is chosen so as to make $\|\psi\| = 1$. The graph of $\psi$ is shown in Figure 3.4; its shape immediately reminds one of a Mexican hat. As is easily verified, one has $\psi(t) = -\gamma\,g''(t)$, where $g(t) := e^{-t^2/2}$ denotes the Gaussian. In Example 2.2.② we computed the Fourier transform of the latter and found that it is equal to $g$. We conclude, using rule (R4), that
$$\hat\psi(\xi) = \gamma\,\xi^2\,e^{-\xi^2/2} .$$
In particular, we have $\hat\psi(0) = 0$, and from Proposition (3.1) we infer that the function $\psi$ is indeed a wavelet. For obvious reasons this function is called the Mexican hat.
Figure 3.4 Mexican hat
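The normalization and the zero mean of the Mexican hat can be confirmed by a quick quadrature. The sketch below assumes that (11) reads $\psi(t) = \gamma\,(1 - t^2)\,e^{-t^2/2}$ with $\gamma = \tfrac{2}{\sqrt 3}\pi^{-1/4}$, which is what $\psi = -\gamma g''$ together with $\|\psi\| = 1$ forces; the function names are hypothetical.

```python
import math

GAMMA = 2.0 / math.sqrt(3.0) * math.pi ** -0.25  # assumed normalizing factor

def gaussian_second_derivative(t):
    """g''(t) for g(t) = exp(-t^2/2)."""
    return (t * t - 1.0) * math.exp(-t * t / 2.0)

def mexican_hat(t):
    """psi(t) = -gamma * g''(t) = gamma * (1 - t^2) * exp(-t^2/2)."""
    return -GAMMA * gaussian_second_derivative(t)

def integrate(g, lo, hi, n=4000):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

# The tails beyond |t| = 10 are negligible at double precision.
norm_squared = integrate(lambda t: mexican_hat(t) ** 2, -10.0, 10.0)
mean = integrate(mexican_hat, -10.0, 10.0)

if __name__ == "__main__":
    print(norm_squared, mean)
```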
③ In Figure 3.5 the graph of a modulated Gaussian is shown. It is constructed as follows: First a fundamental frequency $\omega > 0$ is chosen and held fixed. It seems that for certain practical reasons the value $\omega := 5$ is a good choice; see [D], 3.3.5.C, for details. It is evident that the "wave train"
$$\chi(t) := e^{i\omega t}\,e^{-t^2/2}$$
would be an interesting candidate to serve as a "key pattern". Unfortunately the condition $\hat\chi(0) = 0$ is not fulfilled. For this reason we modify $\chi$ slightly to
$$\psi(t) := e^{i\omega t}\,e^{-t^2/2} - A\,e^{-t^2/2}$$
and now have to pick a suitable value for $A$. Rule (R2) gives
$$\hat\psi(\xi) = e^{-(\xi - \omega)^2/2} - A\,e^{-\xi^2/2} ,$$
and consequently $\hat\psi(0) = e^{-\omega^2/2} - A$. This shows that setting $A := e^{-\omega^2/2}$ we can satisfy condition (3); therefore the complex valued function
$$\psi(t) := \bigl(e^{i\omega t} - e^{-\omega^2/2}\bigr)\,e^{-t^2/2}$$
is in principle acceptable as a wavelet. The $\psi$ as given by this formula has yet to be normalized. We leave it to the reader as an exercise to perform the necessary calculations to that end.
○
Figure 3.5 Modulated Gaussian ($\omega = 3$)
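As a numerical companion to Example ③, the following sketch checks that the corrected wave train has mean zero and computes its norm by quadrature, so that a normalized version can be formed. It assumes the form $\psi(t) = (e^{i\omega t} - e^{-\omega^2/2})\,e^{-t^2/2}$ discussed above; the names `psi`, `integrate` and `psi_normalized` are illustrative and not taken from the text.

```python
import cmath
import math

OMEGA = 5.0                        # fundamental frequency suggested in the text
A = math.exp(-OMEGA ** 2 / 2.0)    # correction that makes the mean vanish

def psi(t):
    """Unnormalized modulated Gaussian (e^{i*omega*t} - A) * exp(-t^2/2)."""
    return (cmath.exp(1j * OMEGA * t) - A) * math.exp(-t * t / 2.0)

def integrate(g, lo, hi, n=8000):
    """Composite midpoint rule; works for real and complex integrands."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

mean = integrate(psi, -12.0, 12.0)
norm = math.sqrt(integrate(lambda t: abs(psi(t)) ** 2, -12.0, 12.0))

def psi_normalized(t):
    """psi divided by its numerically computed L^2 norm."""
    return psi(t) / norm

if __name__ == "__main__":
    print(abs(mean), norm)
```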
④ An arbitrary function $\psi \in L^2 \cap L^1$ having norm 1, mean 0 and compact support is automatically a wavelet: Let $\psi(t)$ be $\equiv 0$ for $|t| > b$. The function $h(t) := |t|\,1_{[-b,b]}(t)$ is obviously in $L^2$, thus
$$\int |t|\,|\psi(t)|\,dt = \langle h, |\psi|\rangle < \infty ,$$
and the above statement follows using (3.1). ○
Figure 3.6
⑤ The following is an attempt to visualize the wavelet transform of a given time signal $f$ as a function of two real variables. As our analyzing wavelet we take the Mexican hat (11). We let the time signal $f$ be a superposition of the three "notes"
$$f_1(t) := \begin{cases} 2 - 2\,|t + 2| & (-3 \le t \le -1) \\ 0 & (\text{otherwise}) , \end{cases} \qquad f_2(t) := \begin{cases} 1 - \cos(2\pi t) & (0 \le t \le 3) \\ 0 & (\text{otherwise}) , \end{cases} \qquad f_3(t) := \begin{cases} \tfrac12\bigl(1 - \cos(5\pi t)\bigr) & (4 \le t \le 6) \\ 0 & (\text{otherwise}) \end{cases}$$
(see Figure 3.6) with suitably chosen coefficients:
$$f(t) := 2.883\,f_1(t) + 1.205\,f_2(t) + 0.968\,f_3(t) . \qquad (12)$$
In order to compensate for the natural decay of $Wf(a, b)$ when $a \to 0$ (see Theorem (3.15) below) we show a density plot of the function
$$w(a, b) := \frac{1}{a^{3/2}}\,|Wf(a, b)| \qquad (0 < a \le 0.4)$$
instead of $Wf$. The intensities appearing in (12) were chosen in such a way that the three components $w_1$, $w_2$, $w_3$ assume the same maximal value $w_{\max} = 10$ in the considered $(a, b)$-domain. Figure 3.7 consists of 480×768 pixels, each of them representing a point $(a, b)$ in the indicated rectangle. For each pixel we computed its test score $p := w(a, b)/w_{\max}$ numerically; subsequently the pixel in question was colored black with probability $p$, using a random number generator.
○
Figure 3.7 The wavelet transform of the function $f$ given by (12); cf. Figure 3.6
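The recipe of Example ⑤ can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used for Figure 3.7: the Mexican hat is taken as $\gamma(1 - t^2)e^{-t^2/2}$, the integral defining $Wf(a, b)$ is evaluated by the midpoint rule after the substitution $t = b + as$, and all function names are hypothetical.

```python
import math
import random

GAMMA = 2.0 / math.sqrt(3.0) * math.pi ** -0.25

def mexican_hat(s):
    return GAMMA * (1.0 - s * s) * math.exp(-s * s / 2.0)

def signal(t):
    """The three-note signal f of (12)."""
    f1 = 2.0 - 2.0 * abs(t + 2.0) if -3.0 <= t <= -1.0 else 0.0
    f2 = 1.0 - math.cos(2.0 * math.pi * t) if 0.0 <= t <= 3.0 else 0.0
    f3 = 0.5 * (1.0 - math.cos(5.0 * math.pi * t)) if 4.0 <= t <= 6.0 else 0.0
    return 2.883 * f1 + 1.205 * f2 + 0.968 * f3

def wavelet_transform(a, b, half_width=8.0, n=1600):
    """Wf(a, b) for a > 0: sqrt(a) * integral of f(b + a*s) * psi(s) ds."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        s = -half_width + (k + 0.5) * h
        total += signal(b + a * s) * mexican_hat(s)
    return math.sqrt(a) * h * total

def w(a, b):
    """Rescaled modulus |Wf(a, b)| / a^(3/2) shown in the density plot."""
    return abs(wavelet_transform(a, b)) / a ** 1.5

def pixel_is_black(a, b, w_max=10.0, rng=random):
    """Color the pixel black with probability p = w(a, b) / w_max."""
    return rng.random() < w(a, b) / w_max
```

Looping `pixel_is_black` over a grid of $(a, b)$ values with $0 < a \le 0.4$ would give a stochastic density plot in the spirit of Figure 3.7; the grid size and the plotting itself are left open here.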
3.2 A Plancherel formula
The wavelet transform accepts functions $f \in L^2(\mathbb{R})$ as input and produces functions $Wf\colon \mathbb{R}^2_- \to \mathbb{C}$ as output. If in such a situation we contemplate establishing a Plancherel formula, we of course need a scalar product for functions $u\colon \mathbb{R}^2_- \to \mathbb{C}$. For the definition of a scalar product we need a measure on the set $\mathbb{R}^2_- := \mathbb{R}^* \times \mathbb{R}$. The two-dimensional Lebesgue measure $da\,db$ comes to mind first, but it is not appropriate here for the following reason: The variables $a$ and $b$ are not on an equal footing, as, e.g., the variables $x$ and $y$ in the euclidean plane are. Looking at the integral 3.1.(4) defining the wavelet transform we see that a point $(a, b) \in \mathbb{R}^2_-$ is used implicitly to characterize the affine transformation
$$S_{a,b}\colon\quad \mathbb{R} \to \mathbb{R}, \qquad \tau \mapsto t := a\tau + b$$
of the time axis, and here it is for everyone to see that the stretching factor $|a|$ is of much greater importance than the translational variable $b$.
The totality
$$\mathrm{Aff}(\mathbb{R}) := \{S_{a,b} \mid (a, b) \in \mathbb{R}^2_-\} \qquad (1)$$
of these affine transformations is a topological group with respect to $\circ$ (i.e., composition) and as such it carries a "natural" measure $d\mu$, called left invariant Haar measure. Formula (1) defines a parametrization of the group $\mathrm{Aff}(\mathbb{R})$ by the set $\mathbb{R}^2_-$, so the measure $d\mu$ becomes manifest as a measure in the $(a, b)$-plane. The resulting expression for $d\mu = d\mu(a, b)$ can be computed explicitly; one finds
$$d\mu = d\mu(a, b) := \frac{1}{|a|^2}\,da\,db . \qquad (2)$$
The explanations given here only serve to motivate heuristically why we adopt the particular measure (2) on the set $\mathbb{R}^2_-$ and no other. For a more detailed account of Haar measure we refer the reader to the literature, e.g., [8] or [16]; but the general theory of Haar measure will not be needed in the remainder of the book.
This having been settled, we can talk about the Hilbert space
$$H := L^2(\mathbb{R}^2_-, d\mu) ,$$
whose scalar product is defined by
$$\langle u, v\rangle_H := \int_{\mathbb{R}^2_-} u(a, b)\,\overline{v(a, b)}\,\frac{da\,db}{|a|^2} .$$
Having all the necessary ingredients ready we can finally formulate the Plancherel theorem announced in the title of this section.
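(3.3) Let $\psi$ be an arbitrary wavelet and let $W$ denote the corresponding wavelet transform. Then for all $f, g \in L^2$ the following is true:
$$\langle Wf, Wg\rangle_H = C_\psi\,\langle f, g\rangle .$$
⌜ We work with the function $F_a$ introduced in 3.1.(10) and let the function $G_a$ be defined analogously from $g$. Using (3.2) and (2.11) we obtain successively
$$\begin{aligned} \langle Wf, Wg\rangle_H &= \int_{\mathbb{R}^*}\left(\int Wf(a, b)\,\overline{Wg(a, b)}\,db\right)\frac{da}{|a|^2} = \int_{\mathbb{R}^*}\left(\int F_a(\xi)\,\overline{G_a(\xi)}\,d\xi\right)\frac{da}{|a|^2} \\ &= 2\pi\int_{\mathbb{R}^*}\left(\int \hat f(\xi)\,\overline{\hat g(\xi)}\,|\hat\psi(a\xi)|^2\,d\xi\right)\frac{da}{|a|} = 2\pi\int \hat f(\xi)\,\overline{\hat g(\xi)}\left(\int_{\mathbb{R}^*}\frac{|\hat\psi(a\xi)|^2}{|a|}\,da\right)d\xi . \end{aligned} \qquad (3)$$
The inner integral in the last line ($=: Q$) is trivially 0 when $\xi = 0$, and for $\xi \ne 0$ the substitution
$$a := \frac{a'}{\xi} \quad (a' \in \mathbb{R}^*), \qquad da = \frac{da'}{|\xi|}$$
(absolute value of the Jacobian!) gives for $Q$ the value
$$Q = \int_{\mathbb{R}^*}\frac{|\hat\psi(a')|^2}{|a'|}\,da' = \frac{1}{2\pi}\,C_\psi ,$$
independently of $\xi$. Therefore we may continue the chain of equations (3) by
$$\ldots = C_\psi\int \hat f(\xi)\,\overline{\hat g(\xi)}\,d\xi = C_\psi\,\langle \hat f, \hat g\rangle = C_\psi\,\langle f, g\rangle .$$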
By Fubini's theorem the resulting expression justifies all our previous formal manipulations. ⌟
Before we analyze this theorem and its consequences we present some alternative versions of (3.3). In many cases only scaling factors $a > 0$ are taken into consideration; i.e., the wavelet transform $Wf$ is restricted to the upper half-plane
$$\mathbb{R}^2_> := \{(a, b) \mid a > 0,\ b \in \mathbb{R}\} ,$$
and on $\mathbb{R}^2_>$ the same measure (2) is adopted as before. Let
$$H' := L^2(\mathbb{R}^2_>, d\mu)$$
be the corresponding Hilbert space. If we insist that already "half the wavelet transform" $Wf\restriction\mathbb{R}^2_>$ should allow a Plancherel formula, then our wavelet $\psi$ must satisfy a certain symmetry condition, namely
$$2\pi\int_{<0}\frac{|\hat\psi(a)|^2}{|a|}\,da = 2\pi\int_{>0}\frac{|\hat\psi(a)|^2}{|a|}\,da =: C'_\psi . \qquad (4)$$
This condition is automatically fulfilled if $\psi$ is symmetric (i.e., even) or real-valued: If $\psi$ is symmetric, then $\hat\psi$ is symmetric as well, and if $\psi$ is a real-valued function, then $\hat\psi(-\xi) \equiv \overline{\hat\psi(\xi)}$.
(3.4) Let $\psi$ be a wavelet satisfying the symmetry condition (4) and let $W$ denote the corresponding wavelet transform. Then for all $f, g \in L^2$ the following is true:
$$\langle Wf, Wg\rangle_{H'} = C'_\psi\,\langle f, g\rangle .$$
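⌜ The chain of equations analogous to (3) now reads as follows:
$$\langle Wf, Wg\rangle_{H'} = \int_{>0}\left(\int Wf(a, b)\,\overline{Wg(a, b)}\,db\right)\frac{da}{|a|^2} = \ldots = 2\pi\int \hat f(\xi)\,\overline{\hat g(\xi)}\left(\int_{>0}\frac{|\hat\psi(a\xi)|^2}{|a|}\,da\right)d\xi .$$
The inner integral in the last line ($=: Q'$) is trivially 0 when $\xi = 0$. If $\xi > 0$, the substitution
$$a := \frac{a'}{\xi} \quad (a' \in \mathbb{R}_{>0}), \qquad da = \frac{da'}{\xi}$$
leads to
$$Q' = \int_{>0}|\hat\psi(a')|^2\,\frac{da'/\xi}{|a'/\xi|} = \int_{>0}\frac{|\hat\psi(a)|^2}{|a|}\,da = \frac{1}{2\pi}\,C'_\psi .$$
Similarly, in the case $\xi < 0$, the substitution
$$a := \frac{a'}{\xi} \quad (a' \in \mathbb{R}_{<0}), \qquad da = \frac{da'}{|\xi|}$$
gives
$$Q' = \int_{<0}\frac{|\hat\psi(a)|^2}{|a|}\,da = \frac{1}{2\pi}\,C'_\psi .$$
Now one continues as before:
$$\langle Wf, Wg\rangle_{H'} = C'_\psi\int \hat f(\xi)\,\overline{\hat g(\xi)}\,d\xi = C'_\psi\,\langle f, g\rangle . \qquad ⌟$$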
A second look at the proof of theorem (3.3) shows that the bilinearity of the Plancherel formula with respect to the variables $f$ and $g$ permits a considerable generalization of the theorem: One may transform $f$ and $g$ by means of two different wavelets and still get a formula of type (3.3). This fact of course increases the flexibility of the wavelet transform both for the analysis and for the synthesis of time signals $f$.
(3.5) Let $\psi$ and $\chi$ be two wavelets and assume that the integral
$$2\pi\int_{\mathbb{R}^*}\frac{\overline{\hat\psi(a)}\,\hat\chi(a)}{|a|}\,da =: C_{\psi\chi} \qquad (5)$$
is defined, i.e., finite. If $W_\psi$ and $W_\chi$ denote the wavelet transforms with respect to $\psi$ and $\chi$, then the following is true for arbitrary $f, g \in L^2$:
$$\langle W_\psi f, W_\chi g\rangle_H = C_{\psi\chi}\,\langle f, g\rangle .$$
⌜ Repeat the proof of (3.3) with $F_a$ defined by 3.1.(10) as before, while $G_a$ obviously has to be replaced by
$$G_a(\xi) := \sqrt{2\pi}\,|a|^{1/2}\,\hat g(\xi)\,\overline{\hat\chi(a\xi)} .$$
We leave the details to the reader. ⌟
The formulas established in this section are best understood in the framework of topological groups and their representations. For a short but very readable presentation of this aspect see [L], Section 1.6.
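The constants appearing in the Plancherel formulas of this section are easy to evaluate numerically. The sketch below does this for the Mexican hat of 3.1.(11), assuming $C_\psi = 2\pi\int_{\mathbb{R}^*}|\hat\psi(\xi)|^2/|\xi|\,d\xi$ (the full-line analogue of the half-line constant in (4)) and $\hat\psi(\xi) = \gamma\,\xi^2 e^{-\xi^2/2}$; the function names and the cutoff are illustrative choices.

```python
import math

GAMMA = 2.0 / math.sqrt(3.0) * math.pi ** -0.25

def psi_hat(xi):
    """Assumed Fourier transform of the Mexican hat: gamma * xi^2 * exp(-xi^2/2)."""
    return GAMMA * xi * xi * math.exp(-xi * xi / 2.0)

def integrate(g, lo, hi, n=4000):
    """Composite midpoint rule; with n even, no node falls on xi = 0."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def c_full(psi_hat, cutoff=10.0):
    """C_psi: 2*pi times the integral of |psi_hat|^2 / |xi| over the whole line."""
    g = lambda xi: abs(psi_hat(xi)) ** 2 / abs(xi)
    return 2.0 * math.pi * integrate(g, -cutoff, cutoff)

def c_half(psi_hat, cutoff=10.0):
    """C'_psi: the same integral restricted to xi > 0, cf. condition (4)."""
    g = lambda xi: abs(psi_hat(xi)) ** 2 / xi
    return 2.0 * math.pi * integrate(g, 0.0, cutoff)

if __name__ == "__main__":
    print(c_full(psi_hat), c_half(psi_hat))
```

Since the Mexican hat is even, the symmetry condition (4) holds and the half-line constant is exactly half of the full one.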
3.3 Inversion formulas
The continuous wavelet transform encodes a given time signal, i.e., a function $f$ of one real variable $t$, as a function $Wf$ of two real variables $a$ and $b$. Instead of $\infty^1$ data we now have, so to speak, $\infty^2$ of them, and this means that $f$ is represented in the data $(Wf(a, b) \mid (a, b) \in \mathbb{R}^2_-)$ with very high redundancy. It will come as no surprise that this circumstance greatly facilitates the reconstruction of the original signal $f$ from $Wf$. As a matter of fact, there is not only one inversion formula, as with the Fourier transform, but in the end there is an arbitrary number of such formulas. We shall see in the next chapter that even an appropriate discrete collection of values of $Wf$ suffices to restore $f$ completely; in other words, there is also a kind of Shannon theorem for the wavelet transform.
In purely set theoretic terms the set $\mathbb{R}^2_-$ has "the same number" of points as $\mathbb{R}$, and consequently there are "equally many" functions of the form $u\colon \mathbb{R}^2_- \to \mathbb{C}$ as there are functions $f\colon \mathbb{R} \to \mathbb{C}$. Nevertheless, it is beyond question that not every theoretically possible set of data $(u(a, b) \mid (a, b) \in \mathbb{R}^2_-)$ can actually occur as a wavelet transform of some function $f \in L^2$. This means that the values $Wf(a, b)$ of genuine wavelet transforms must be intercorrelated in an as yet mysterious way. We shall come back to this point in Section 3.4.
We will need the following regularization lemma:
(3.6) Let
$$g_\sigma(t) := \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{t^2}{2\sigma^2}\right)$$
denote the normal distribution with standard deviation $\sigma$, and assume that the function $f \in L^1$ is continuous at some given point $x$. Then
$$\lim_{\sigma \to 0+}(f * g_\sigma)(x) = f(x) .$$
⌜ Let an $\varepsilon > 0$ be given. There is an $h > 0$ (not dependent on $\sigma$) with
$$|f(x - t) - f(x)| < \varepsilon \qquad (|t| \le h) .$$
Because of $\int g_\sigma(t)\,dt = 1$ we may write
$$(f * g_\sigma)(x) - f(x) = \int \bigl(f(x - t) - f(x)\bigr)\,g_\sigma(t)\,dt ,$$
which can be estimated as follows:
$$\begin{aligned} |(f * g_\sigma)(x) - f(x)| &\le \int_{|t| \le h}|f(x - t) - f(x)|\,g_\sigma(t)\,dt + \int_{|t| \ge h}\bigl(|f(x - t)| + |f(x)|\bigr)\,g_\sigma(t)\,dt \\ &\le \varepsilon\int_{-h}^{h} g_\sigma(t)\,dt + \|f\|_1\,g_\sigma(h) + |f(x)|\int_{|t| \ge h} g_\sigma(t)\,dt . \end{aligned}$$
Here the first integral on the right hand side has a value $< 1$, and $g_\sigma(h)$ as well as the last integral tend to 0 with $\sigma \to 0+$; see Figure 3.8. Thus one can find a $\sigma_0$ so that for all $\sigma < \sigma_0$ the following is true:
$$|(f * g_\sigma)(x) - f(x)| < 2\varepsilon .$$
Since $\varepsilon > 0$ was arbitrary, the proof is complete. ⌟
We note as an addendum the following identity, valid for arbitrary $f \in L^2$:
$$(f * g_\sigma)(x) = \langle f, T_x g_\sigma\rangle . \qquad (1)$$
The left hand side of (1) is by definition equal to $\int f(t)\,g_\sigma(x - t)\,dt$, but the same is true for the right hand side, since $g_\sigma$ is a real symmetric (i.e., even) function.
Figure 3.8
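Lemma (3.6) can be watched at work numerically. In the sketch below the test function $f(t) = e^{-|t|}$, the evaluation point $x = 0.5$ and the truncation of the convolution integral to $|t| \le 8\sigma$ are all assumptions made for illustration; they are not taken from the text.

```python
import math

def g(t, sigma):
    """Normal density with standard deviation sigma."""
    return math.exp(-t * t / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def f(t):
    """An L^1 function, continuous everywhere, with a kink at t = 0."""
    return math.exp(-abs(t))

def smoothed(f, x, sigma, n=4000):
    """(f * g_sigma)(x) by the midpoint rule on |t| <= 8*sigma."""
    lo = -8.0 * sigma
    h = 16.0 * sigma / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += f(x - t) * g(t, sigma)
    return h * total

x = 0.5
sigmas = (0.5, 0.1, 0.01)
errors = [abs(smoothed(f, x, s) - f(x)) for s in sigmas]

if __name__ == "__main__":
    for s, e in zip(sigmas, errors):
        print(s, e)
```

The printed errors shrink as $\sigma$ decreases, in line with the statement of the lemma.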
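The Plancherel formula (3.3) can be written as follows:
$$\langle f, g\rangle = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} Wf(a, b)\,\langle \psi_{a,b}, g\rangle\,\frac{da\,db}{|a|^2} . \qquad (2)$$
Letting $g := T_x g_\sigma$ this becomes
$$\langle f, T_x g_\sigma\rangle = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} Wf(a, b)\,\langle \psi_{a,b}, T_x g_\sigma\rangle\,\frac{da\,db}{|a|^2} ,$$
so that by means of (1) we obtain the formula
$$(f * g_\sigma)(x) = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} Wf(a, b)\,(\psi_{a,b} * g_\sigma)(x)\,\frac{da\,db}{|a|^2} . \qquad (3)$$
We now let $\sigma \to 0+$ on both sides of (3) and use Lemma (3.6). This leads to the following reconstruction formula for our time signal $f$:
(3.7) Let $x$ be a point of continuity of the time signal $f$. Under suitable assumptions about $f$ and $\psi$ one has the equality
$$f(x) = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} Wf(a, b)\,\psi_{a,b}(x)\,\frac{da\,db}{|a|^2} . \qquad (4)$$
⌜ Performing the limit under the integral sign in (3) is quite subtle. For a complete proof we refer the reader to [D], Proposition 2.4.2. ⌟
Formula (4) can be viewed "abstractly" as saying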
$$f = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} d\mu\ Wf(a, b)\,\psi_{a,b} . \qquad (5)$$
Written in this form it represents the original signal $f$ as a superposition ("linear combination") of wavelet functions $\psi_{a,b}$, the values $Wf(a, b)$ of the wavelet transform serving as coefficients.
By the way, the validity of (5) in the so-called "weak sense" can be regarded as an immediate consequence of the Plancherel formula (3.3). We are referring here to the following functional-analytic hocus-pocus: Any vector $f \in L^2$ possesses a second ("weak") personality in the form of a continuous conjugate-linear functional, to wit
$$g \mapsto \langle f, g\rangle ;$$
and any continuous conjugate-linear functional $\phi\colon L^2 \to \mathbb{C}$ belongs to a well determined $f$. If we now look at the Plancherel formula in the form (2) for a fixed $f$ and variable $g \in L^2$, then it says no more and no less than
$$\langle f, \cdot\rangle = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} d\mu\ Wf(a, b)\,\langle \psi_{a,b}, \cdot\rangle .$$
This can be expressed in words as follows: The "weak version" of $f$ is retrieved from $Wf$ by superimposing the functionals $\langle \psi_{a,b}, \cdot\rangle$, using the values $Wf(a, b)$ as coefficients. The formal agreement with (5) is evident.
From the two variants (3.4) and (3.5) of the Plancherel formula one derives in the same way the following reconstruction formulas:
(3.8) Under suitable regularity assumptions one has
$$f(x) = \frac{1}{C'_\psi}\int_{\mathbb{R}^2_>} Wf(a, b)\,\psi_{a,b}(x)\,\frac{da\,db}{|a|^2} ,$$
if $\psi$ satisfies the symmetry condition 3.2.(4), and similarly
$$f(x) = \frac{1}{C_{\psi\chi}}\int_{\mathbb{R}^2_-} W_\psi f(a, b)\,\chi_{a,b}(x)\,\frac{da\,db}{|a|^2} ,$$
if the quantity $C_{\psi\chi}$, see 3.2.(5), is defined.
The last formula can be read as
$$f = \frac{1}{C_{\psi\chi}}\int_{\mathbb{R}^2_-} d\mu\ W_\psi f(a, b)\,\chi_{a,b} .$$
It performs the reconstruction of $f$ using a different set of wavelet functions from the ones previously used for the analysis of $f$. We shall encounter analysis-synthesis-pairings of this kind a second time in connection with the discretized version of the wavelet transform.
3.4 The kernel function
Formula 3.3.(5) can be paraphrased in the following way: The mapping
$$f \mapsto \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} d\mu\ Wf(a, b)\,\psi_{a,b} \qquad (1)$$
is the identity. If in this connection people talk about a resolution of the identity, then this is to be understood in an almost chemical sense: The map $\mathrm{id}\colon L^2 \to L^2$ is first resolved into its $(a, b)$-constituents and in the end recrystallized in the integral 3.3.(5) resp. (1).
Resolutions of the identity are encountered already on a very elementary level: If $(e_1, \ldots, e_n)$ is an orthonormal basis of the euclidean $\mathbb{R}^n$, then the formula
$$x = \sum_{k=1}^n \langle x, e_k\rangle\,e_k$$
is valid identically in $x \in \mathbb{R}^n$; in other words, the mapping
$$x \mapsto \sum_{k=1}^n \langle x, e_k\rangle\,e_k$$
is the identity. There is, however, an essential difference relative to 3.3.(5) resp. (1): The vectors $e_k$ $(1 \le k \le n)$ are linearly independent, but the functions $\psi_{a,b}$ $(a \in \mathbb{R}^*,\ b \in \mathbb{R})$ are not. In Sections 4.1 and 4.2 we shall study these matters once again and in a more general setting.
For the moment we stay with $H := L^2(\mathbb{R}^2_-, d\mu)$. From (3.3) we infer
$$\|Wf\| \le \sqrt{C_\psi}\,\|f\| ,$$
showing that the wavelet transform $W\colon L^2 \to H$ is a continuous map. Let
$$U := \mathrm{im}(W) = \{Wf \mid f \in L^2\}$$
be the image space. In the case at hand there is an inverse mapping
$$W^{-1}\colon\quad U \to L^2 ,$$
the inverse $W^{-1}$ being given (at least formally), according to 3.3.(5), by
$$W^{-1}u = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} d\mu\ u(a, b)\,\psi_{a,b} .$$
The space $U$ consisting of all wavelet transforms $Wf$, $f \in L^2$, is a proper subspace of $H$. We know, e.g., that the functions $u \in U$ have a well determined value at all points $(a, b) \in \mathbb{R}^2_-$, and each individual $u \in U$ is globally bounded owing to 3.1.(7):
$$\|u\|_\infty := \sup\{|u(a, b)| \mid (a, b) \in \mathbb{R}^2_-\} < \infty .$$
More is true, however: The function space $U$ admits a so-called reproducing kernel, and this implies that the values of any given $u \in U$ are correlated over large distances, as is the case for holomorphic functions. We remind the reader that holomorphic functions have a reproducing property that can be described as follows: Let $G \subset \mathbb{C}$ be a bounded domain with sufficiently regular boundary $\partial G$, and let $f$ be holomorphic in a neighbourhood of $\overline G$. Then (Cauchy's integral formula)
$$f(z) = \frac{1}{2\pi i}\int_{\partial G}\frac{f(\zeta)}{\zeta - z}\,d\zeta \qquad (z \in G) .$$
Consider a fixed $u \in U$. There is an $f \in L^2$ with $u = Wf$. On account of (3.3) we may write
$$u(a, b) = \langle f, \psi_{a,b}\rangle = \frac{1}{C_\psi}\,\langle Wf, W\psi_{a,b}\rangle_H = \frac{1}{C_\psi}\,\langle u, W\psi_{a,b}\rangle_H \qquad ((a, b) \in \mathbb{R}^2_-) . \qquad (2)$$
If we want to present the right hand side of (2) in the form of an integral, we have to express the function $W\psi_{a,b}$ as a function of new variables $a'$, $b'$. To this end we regard the wavelet function $\psi_{a,b}$ as a time signal and deduce from 3.1.(6) the following expression for $W\psi_{a,b}(a', b')$:
$$W\psi_{a,b}(a', b') = \langle \psi_{a,b}, \psi_{a',b'}\rangle .$$
Inserting this into (2) we finally get
$$u(a, b) = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} u(a', b')\,\overline{\langle \psi_{a,b}, \psi_{a',b'}\rangle}\,\frac{da'\,db'}{|a'|^2} .$$
The function
$$K(a, b, a', b') := \langle \psi_{a',b'}, \psi_{a,b}\rangle$$
is well defined at all points $(a, b, a', b') \in \mathbb{R}^2_- \times \mathbb{R}^2_-$ and is called a reproducing kernel for the functions $u \in U$. Altogether we have proven the following theorem:
(3.9) ($C_\psi$, $U$ and $K$ are as explained in the text.) For arbitrary $u \in U$ and $(a, b) \in \mathbb{R}^2_-$ one has
$$u(a, b) = \frac{1}{C_\psi}\int_{\mathbb{R}^2_-} K(a, b, a', b')\,u(a', b')\,\frac{da'\,db'}{|a'|^2} . \qquad (3)$$
① Let us compute the kernel function belonging to the following wavelet:
$$\psi(t) := \sqrt 2\,\pi^{-1/4}\,t\,e^{-t^2/2} ;$$
for a picture of the graph, see Figure 3.9. The leading numerical factor was chosen so as to make $\|\psi\| = 1$. On account of rule (R4) and Example 2.2.② one has
$$\hat\psi(\xi) = -\sqrt 2\,\pi^{-1/4}\,i\,\xi\,e^{-\xi^2/2} .$$
Figure 3.9 Derivative of the Gaussian
If we restrict ourselves to positive $a$, then the reproducing formula (3) takes the form
$$u(a, b) = \frac{1}{C'_\psi}\int_{\mathbb{R}^2_>} K(a, b, a', b')\,u(a', b')\,\frac{da'\,db'}{a'^2} ,$$
where $C'_\psi$ is given by 3.2.(4) and is computed as follows:
$$C'_\psi = 2\pi\int_0^\infty\frac{|\hat\psi(\xi)|^2}{\xi}\,d\xi = 4\sqrt\pi\int_0^\infty \xi\,e^{-\xi^2}\,d\xi = 2\sqrt\pi .$$
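We shall arrive at $K(a, b, a', b')$ by means of Parseval's formula, therefore we need $\widehat{\psi_{a,b}}$. Rule 3.1.(8) gives
$$\widehat{\psi_{a,b}}(\xi) = a^{1/2}\,e^{-ib\xi}\,\hat\psi(a\xi) = -\sqrt 2\,\pi^{-1/4}\,a^{3/2}\,i\,e^{-ib\xi}\,\xi\,e^{-a^2\xi^2/2} ,$$
and a similar formula holds for $\widehat{\psi_{a',b'}}$. We now can write
$$K(a, b, a', b') = \langle \widehat{\psi_{a',b'}}, \widehat{\psi_{a,b}}\rangle = \frac{2}{\sqrt\pi}\,a^{3/2}\,a'^{3/2}\int \xi^2\,e^{-(a^2 + a'^2)\xi^2/2}\,e^{-i(b' - b)\xi}\,d\xi .$$
The resulting integral may be regarded as a Fourier integral, in fact
$$K(a, b, a', b') = \sqrt 8\,a^{3/2}\,a'^{3/2}\,\hat G(b' - b) , \qquad (4)$$
where the function $G(\cdot)$ is given by
$$G(\xi) := \xi^2\,e^{-(a^2 + a'^2)\xi^2/2} .$$
As an abbreviation we write $\sqrt{a^2 + a'^2} =: A$. Since the function $\xi \mapsto e^{-\xi^2/2}$ is reproduced by the Fourier transform, according to rule (R3) the Fourier transform of $g(\xi) := e^{-(A\xi)^2/2}$ can be written as
$$\hat g(x) = \frac 1A\,e^{-(x/A)^2/2} ,$$
so that with the help of (2.13) we get
$$\hat G(x) = -(\hat g)''(x) = \frac{1}{A^5}\,(A^2 - x^2)\,e^{-(x/A)^2/2} .$$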
Inserting this into (4) we finally obtain
$$K(a, b, a', b') = \sqrt 8\,\frac{a^{3/2}\,a'^{3/2}}{A^5}\,(A^2 - x^2)\,e^{-(x/A)^2/2} ,$$
where $x := b' - b$ and $A := \sqrt{a^2 + a'^2}$. ○
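The closed form of the kernel can be cross-checked against a direct evaluation of the scalar product $\langle \psi_{a',b'}, \psi_{a,b}\rangle$ in the time domain. The sketch assumes the wavelet $\psi(t) = \sqrt 2\,\pi^{-1/4}\,t\,e^{-t^2/2}$, positive scales, and the constant $\sqrt 8$ in front of the closed form; the function names are hypothetical.

```python
import math

C0 = math.sqrt(2.0) * math.pi ** -0.25

def psi(t):
    """Normalized derivative-of-Gaussian wavelet (real valued)."""
    return C0 * t * math.exp(-t * t / 2.0)

def psi_ab(t, a, b):
    """Dilated and translated copy psi_{a,b}(t) for a > 0."""
    return psi((t - b) / a) / math.sqrt(a)

def kernel_closed_form(a, b, a2, b2):
    """sqrt(8) * (a*a')^(3/2) / A^5 * (A^2 - x^2) * exp(-(x/A)^2 / 2)."""
    x = b2 - b
    big_a = math.sqrt(a * a + a2 * a2)
    return (math.sqrt(8.0) * (a * a2) ** 1.5 / big_a ** 5
            * (big_a ** 2 - x * x) * math.exp(-(x / big_a) ** 2 / 2.0))

def kernel_numeric(a, b, a2, b2, n=8000):
    """Midpoint-rule value of the integral of psi_{a',b'}(t) * psi_{a,b}(t) dt."""
    lo = min(b - 10.0 * a, b2 - 10.0 * a2)
    hi = max(b + 10.0 * a, b2 + 10.0 * a2)
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += psi_ab(t, a2, b2) * psi_ab(t, a, b)
    return h * total

if __name__ == "__main__":
    print(kernel_closed_form(1.0, 0.0, 0.5, 0.3), kernel_numeric(1.0, 0.0, 0.5, 0.3))
```

On the diagonal $a = a'$, $b = b'$ both versions return $\|\psi\|^2 = 1$, which is a convenient sanity check of the leading constant.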
② We leave it to the reader as an exercise to compute $C'_\psi$ and the kernel function for the Haar wavelet. Since in this case the scalar products $\langle \psi_{a',b'}, \psi_{a,b}\rangle$ can immediately be read off from suitable figures (see Figure 3.10), it is no longer necessary to make the detour via the Fourier transform. The other side of the matter is that there are many different cases to consider, so that in the end no simple expression for the kernel function $K$ results. ○
Figure 3.10
3.5 Decay of the wavelet transform
In this section we investigate the asymptotic properties of the function $(a, b) \mapsto Wf(a, b)$ in the limit $a \to 0$. The values $Wf(a, b)$ corresponding to arguments $|a| \ll 1$ encode information about high frequency and/or short-lived (called transient in signal theoretic circles) components of $f$.
We have seen that in the realm of the Fourier transform, say, jump discontinuities of the signal $f$ entail a slow decay of $\hat f(\xi)$ when $\xi \to \pm\infty$. As a consequence the inversion formula (in practice a suitable discretization and/or truncation of this formula) converges only poorly even in zones of the $t$-axis where the function $f$ is well behaved, e.g., infinitely differentiable. With the wavelet transform this slowing down of convergence can be localized: If the time signal $f$ is smooth in the neighborhood of $t = b$, then $Wf(a, b)$ converges very rapidly to 0 for $a \to 0$; and only in zones where the time signal $f$ has sharp peaks or clicks do we encounter a slow decay of $Wf(a, b)$ when $a \to 0$.
The circumstances we have just described have significant practical consequences: When a time signal $f$ is worked on numerically, then of its wavelet transform $Wf(a, b)$ only, e.g., the values $c_{r,k} := Wf(2^r, k\,2^r)$ are computed (resp. measured) and stored. Now, if the signal behaves very well over long stretches of the time axis, say, if it is many times differentiable there, then the overwhelming part of the $c_{r,k}$ will become so minuscule that these $c_{r,k}$ may as well be taken to be zero. In this way one can achieve an enormous rate of data compression: Only the $c_{r,k}$ whose absolute value exceeds a certain threshold are kept at all, then stored and used for the reconstruction of $f$ later on. A vast body of numerical evidence demonstrates that these "essential" $c_{r,k}$ are completely sufficient to restore the original signal $f$ with the desired precision. For a further glimpse into this matter we refer the interested reader to the article [19].
We begin with two statements of a rather simple type.
(3.10) Assume that a wavelet $\psi$ with $t\psi \in L^1$ has been chosen. Let the time signal $f \in L^2$ be globally bounded and assume that $f$ is Hölder continuous at the point $b$, i.e., there is an $\alpha \in\ ]0, 1]$ such that in a neighbourhood of $b$ an estimate of the form
$$|f(t) - f(b)| \le C\,|t - b|^\alpha \qquad (1)$$
holds. Then
$$|Wf(a, b)| \le C'\,|a|^{\alpha + \frac12} . \qquad (2)$$
⌜ It is enough to consider the case $a > 0$. Since $f$ is bounded, we may assume (enlarging $C$, if necessary) that (1) is true for all $t \in \mathbb{R}$. Because of $\int \psi(t)\,dt = 0$ we have
$$Wf(a, b) = \frac{1}{a^{1/2}}\int \bigl(f(t) - f(b)\bigr)\,\overline{\psi\!\left(\frac{t - b}{a}\right)}\,dt$$
and consequently
$$|Wf(a, b)| \le \frac{C}{a^{1/2}}\int |t - b|^\alpha\,\left|\psi\!\left(\frac{t - b}{a}\right)\right|\,dt .$$
In the integral on the right we substitute $t := b + ay$ $(-\infty < y < \infty)$ and get
$$|Wf(a, b)| \le C\,|a|^{\alpha + \frac12}\int |y|^\alpha\,|\psi(y)|\,dy .$$
From $\alpha \le 1$ we deduce $|y|^\alpha \le 1 + |y|$, therefore by assumption on $\psi$ the last integral has a finite value, and (2) is proven. ⌟
A Lipschitz continuous function $f \in L^2$ is necessarily bounded and is everywhere Hölder continuous with exponent $\alpha = 1$. Thus we get the following corollary:
(3.11) Assume that a wavelet $\psi$ with $t\psi \in L^1$ has been chosen. If the time signal $f \in L^2$ is globally Lipschitz continuous, then there is a $C$, not depending on $b$, such that
$$|Wf(a, b)| \le C\,|a|^{3/2} .$$
There are various converses to these statements; see, e.g., [D], Theorems 2.9.2 and 2.9.4. As an example we quote the following theorem; the reader is referred to [D] for a proof.
(3.12) Assume that a wavelet $\psi$ with compact support has been chosen. If $f \in L^2$ is a continuous time signal whose wavelet transform satisfies an estimate of the form
$$|Wf(a, b)| \le C\,|a|^{\alpha + \frac12} \qquad ((a, b) \in \mathbb{R}^2_-)$$
for some $\alpha \in\ ]0, 1]$, then $f$ is globally Hölder continuous with exponent $\alpha$.
The following theorems are of a more subtle nature. The essential lesson we learn from them is that in order to optimize the asymptotic properties of our wavelet transforms $Wf$ we have to impose additional conditions on the selected wavelet $\psi$. The regularity of $\psi$ is not an issue here, but it turns out that it is to our advantage to extend the basic requirement $\int \psi(t)\,dt = 0$ to higher order moments.
The indicated line of thought is based on the following definitions: For arbitrary $k \in \mathbb{N}$ the quantity $\int t^k\,\psi(t)\,dt$ (defined when $t^k\psi \in L^1$) is called the $k$-th moment of $\psi \in L^1$. The wavelet $\psi$ is a wavelet of order $N$ if it satisfies the following conditions: its moments of orders $0, 1, \ldots, N - 1$ vanish, and its $N$-th moment is finite and different from 0.
If no special measures are taken, the order of a wavelet is 1, by definition. Symmetric wavelets have an order $\ge 2$, if we assume the existence of the relevant moments. By (2.13) the Fourier transform $\hat\psi$ of a wavelet of order $N$ is $N$-times continuously differentiable, and the moment conditions imply
$$\hat\psi(0) = \hat\psi'(0) = \ldots = \hat\psi^{(N-1)}(0) = 0, \qquad \hat\psi^{(N)}(0) \ne 0 .$$
It follows that the Taylor expansion of $\hat\psi$ at 0 has the form
$$\hat\psi(\xi) = \gamma'\,\xi^N + \text{higher terms} . \qquad (3)$$
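(3.13) Assume that the chosen wavelet $\psi$ has order $N$ and compact support. If the time signal $f \in L^2$ is of class $C^N$ in a neighbourhood $U$ of the point $b$, then
$$Wf(a, b) = |a|^{N + \frac12}\,\bigl(\gamma'\,f^{(N)}(b) + o(1)\bigr) \qquad (a \to 0) , \qquad (4)$$
where $\gamma' := \mathrm{sgn}^N(a)\,\bar\gamma / N!$ and $\gamma$ denotes the $N$-th moment of $\psi$.
⌜ Suppose that $\psi(t) \equiv 0$ for $|t| > T$. It suffices to consider the case $a > 0$, and one may assume from the beginning that $a$ is so small that the whole interval $[\,b - aT,\ b + aT\,]$ is contained in $U$. The function $f$ has a Taylor expansion centered at the point $b$: For given $t \in U$ there is a $\tau$ between $b$ and $t$ such that
$$f(t) = j_b^{N-1} f(t) + \frac{f^{(N)}(\tau)}{N!}\,(t - b)^N = j_b^N f(t) + \frac{f^{(N)}(\tau) - f^{(N)}(b)}{N!}\,(t - b)^N , \qquad (5)$$
where the leading term on the right hand side can be unpacked as
$$j_b^N f(t) = \sum_{k=0}^N c_k\,(t - b)^k .$$
This implies that for the computation of
$$Wf(a, b) := a^{-1/2}\int f(t)\,\overline{\psi\bigl((t - b)/a\bigr)}\,dt$$
we need among others the following integrals:
$$\int (t - b)^k\,\overline{\psi\!\left(\frac{t - b}{a}\right)}\,dt = a^{k+1}\int t'^k\,\overline{\psi(t')}\,dt' = \begin{cases} 0 & (0 \le k < N) \\ a^{N+1}\,\bar\gamma & (k = N) . \end{cases}$$
Altogether we have
$$Wf(a, b) = a^{N + \frac12}\,\bar\gamma\,\frac{f^{(N)}(b)}{N!} + R ,$$
and we now have to estimate the error term $R$, stemming from the remainder term in (5). Using the substitution $t := b + at'$ $(-T \le t' \le T)$ the quantity $R$ can be written as
$$R = \frac{a^{N + \frac12}}{N!}\int_{-T}^{T}\bigl(f^{(N)}(\tau) - f^{(N)}(b)\bigr)\,t'^N\,\overline{\psi(t')}\,dt' .$$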
The last integral suggests that we should introduce the auxiliary function
$$\omega(h) := \sup_{|\tau - b| \le h}|f^{(N)}(\tau) - f^{(N)}(b)| \qquad (h \ge 0) ;$$
by assumption on $f$ we are sure that
$$\lim_{h \to 0+}\omega(h) = 0 . \qquad (6)$$
Since the (variable) point $\tau$ is known to lie between $b$ and $t = b + at'$, we now can estimate $R$ as follows:
$$|R| \le \frac{a^{N + \frac12}}{N!}\int_{-T}^{T}\omega(a|t'|)\,|t'|^N\,|\psi(t')|\,dt' \le \frac{a^{N + \frac12}}{N!}\,\omega(aT)\int_{-T}^{T}|t'|^N\,|\psi(t')|\,dt' .$$
By assumption on $\psi$ the last integral is finite, therefore by means of (6) we arrive at the stated relation
$$R = a^{N + \frac12}\,o(1) \qquad (a \to 0) . \qquad ⌟$$
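The asymptotic formula (4) can be probed numerically with the Haar wavelet, which has compact support and order $N = 1$ (its first moment is $-\tfrac14$). The sketch below is a hedged illustration: the test signal $\sin$, the point $b = 0.3$ and the list of scales are arbitrary choices, and the helper names are not taken from the text.

```python
import math

def integrate(g, lo, hi, n=2000):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def haar_wt(f, a, b):
    """Wf(a, b) for the Haar wavelet and a > 0."""
    mid = b + a / 2
    return (integrate(f, b, mid) - integrate(f, mid, b + a)) / math.sqrt(a)

b = 0.3
scales = [0.08, 0.04, 0.02, 0.01]
values = [haar_wt(math.sin, a, b) for a in scales]

# Log-log slope of |Wf(a, b)| between the largest and the smallest scale;
# formula (4) with N = 1 predicts a slope close to 3/2.
slope = math.log(abs(values[0] / values[-1])) / math.log(scales[0] / scales[-1])

# Wf(a, b) / a^(3/2) should approach (first moment) * f'(b) = -cos(b) / 4.
limit_estimate = values[-1] / scales[-1] ** 1.5

if __name__ == "__main__":
    print(slope, limit_estimate, -math.cos(b) / 4.0)
```

Used in this way, the "zoom" $a \to 0$ acts as a crude measuring device for $f'(b)$, up to the known factor $-\tfrac14$.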
According to this theorem the rate of decay of the wavelet transform when $a \to 0$ is determined by the order $N$ of the chosen wavelet, at least in regions of the $b$- resp. $t$-axis where $f$ is sufficiently smooth. One can even say more: The proportionality factor appearing in the asymptotic formula (4) is essentially the exact value $f^{(N)}(b)$ of the $N$-th derivative of $f$ at $b$, which means that the "zoom"
$$a \mapsto Wf(a, b) \qquad (a \to 0)$$
can be used as a measuring device for this value. In any case, for the reasons indicated in the beginning of this section, it pays to choose a wavelet whose order $N$ is (under the given circumstances) as large as possible.
In cases where the smoothness of $f$ is smaller than is honoured by the order of the chosen wavelet, the following generalization of (3.11) gives an overall decay estimate:
(3.14) Assume that a wavelet $\psi$ of order $N$ has been chosen. If the time signal $f \in L^2$ is of class $C^r$, $r < N$, and if $f^{(r)}$ is Lipschitz continuous, then there is a $C$, not dependent on $b$, with
$$|Wf(a, b)| \le C\,|a|^{r + \frac32} .$$
⌜ We may again assume $a > 0$. Computing the Taylor expansion of $f$ at an arbitrary point $b \in \mathbb{R}$ one obtains (cf. (5))
$$f(t) = j_b^r f(t) + \frac{f^{(r)}(\tau) - f^{(r)}(b)}{r!}\,(t - b)^r ,$$
the point $\tau$ lying between $b$ and $t$. Because of $r < N$ only the remainder term is contributing anything to $Wf(a, b)$ at all; so we have
$$Wf(a, b) = \frac{1}{r!\,a^{1/2}}\int \bigl(f^{(r)}(\tau) - f^{(r)}(b)\bigr)\,(t - b)^r\,\overline{\psi\!\left(\frac{t - b}{a}\right)}\,dt = \frac{a^{r + \frac12}}{r!}\int \bigl(f^{(r)}(\tau) - f^{(r)}(b)\bigr)\,t'^r\,\overline{\psi(t')}\,dt' .$$
Since the point $\tau$ lies between $b$ and $t = b + at'$, by assumption on $f$ we are sure that
$$|f^{(r)}(\tau) - f^{(r)}(b)| \le C_{\rm Lip}\,a\,|t'|$$
for a suitable $C_{\rm Lip}$. Therefore we are able to estimate $Wf(a, b)$ as follows:
$$|Wf(a, b)| \le \frac{C_{\rm Lip}}{r!}\,a^{r + \frac32}\int |t'|^{r+1}\,|\psi(t')|\,dt' .$$
Here the last integral is finite by assumption on $\psi$. ⌟
We conclude this section by investigating how "clicks" of a time signal $f$ influence the decay of the wavelet transform $Wf$. In our terminology an $r$-click, $r \ge 0$, of $f$ is an isolated jump discontinuity of the $r$-th derivative of $f$ at some point $b \in \mathbb{R}$:
$$f^{(r)}(b+) - f^{(r)}(b-) =: \Delta .$$
Apart from that all derivatives $f^{(j)}$ of order $\le r$ are assumed to be continuous in a neighbourhood of the point $b$. About such clicks we prove the following:
(3.15) Assume that the chosen wavelet $\psi$ has order $N$ and compact support. If the time signal $f \in L^2$ has an $r$-click, $r < N$, at the point $b$, then
$$Wf(a, b) = |a|^{r + \frac12}\,\bigl(C\Delta + o(1)\bigr) \qquad (a \to 0) ,$$
the constant $C$ being independent of $f$.
The left part of Figure 3.7 illustrates the case $r = 1$, $N = 2$ of this theorem.
⌜ As in the proof of (3.13) we suppose that $\psi(t) \equiv 0$ for $|t| > T$. It is no restriction of generality to assume $b = 0$; furthermore it suffices to consider the limit $a \to 0+$. Instead of (5) we now have
$$f(t) = j_0^{r-1} f(t) + \frac{f^{(r)}(\tau)}{r!}\,t^r \qquad (t > 0)$$
for some $\tau$ between 0 and $t$, and similarly for $t < 0$. Setting
$$A := \frac{f^{(r)}(0+) + f^{(r)}(0-)}{2}$$
we obtain the following representation of $f$, valid for all $t \ne 0$:
$$f(t) = j_0^{r-1} f(t) + \frac{A}{r!}\,t^r + \frac{\Delta}{2\,r!}\,\mathrm{sgn}\,t \cdot t^r + \frac{f^{(r)}(\tau) - f^{(r)}(0\pm)}{r!}\,t^r .$$
Here the $\pm$-sign has to be interpreted as $+$ when $t > 0$ and as $-$ when $t < 0$. Because of $N > r$ this formula implies
$$Wf(a, 0) = \frac{1}{r!\,a^{1/2}}\int \Bigl(\frac\Delta2\,\mathrm{sgn}\,t + \bigl(f^{(r)}(\tau) - f^{(r)}(0\pm)\bigr)\Bigr)\,t^r\,\overline{\psi\!\left(\frac ta\right)}\,dt = \frac{a^{r + \frac12}}{r!}\int_{-T}^{T}\Bigl(\frac\Delta2\,\mathrm{sgn}\,t' + \bigl(f^{(r)}(\tau) - f^{(r)}(0\pm)\bigr)\Bigr)\,t'^r\,\overline{\psi(t')}\,dt' . \qquad (7)$$
Putting
$$\frac{1}{2\,r!}\int_{-T}^{T}\mathrm{sgn}\,t \cdot t^r\,\overline{\psi(t)}\,dt =: C$$
we arrive at
$$Wf(a, 0) = C\,\Delta\,a^{r + \frac12} + R .$$
It remains to estimate the error term $R$. To this end we use the auxiliary function
$$\omega(h) := \sup_{0 < |\tau| \le h}|f^{(r)}(\tau) - f^{(r)}(0\pm)| ,$$
defined for $h > 0$ and where again the $\pm$-sign is to be interpreted as $+$ when $\tau > 0$ and as $-$ when $\tau < 0$. By assumption on $f$ we have
$$\lim_{h \to 0+}\omega(h) = 0 . \qquad (8)$$
The (variable) point $\tau$ in the integral (7) is lying between 0 and $t = at'$. This implies that the remainder $R$ can be estimated as follows:
$$|R| \le \frac{a^{r + \frac12}}{r!}\int_{-T}^{T}\omega(a|t'|)\,|t'|^r\,|\psi(t')|\,dt' \le \frac{a^{r + \frac12}}{r!}\,\omega(aT)\int_{-T}^{T}|t'|^r\,|\psi(t')|\,dt' .$$
Since the integral on the right side of this inequality is finite, we may conclude with the help of (8) that the stated formula
$$Wf(a, 0) = a^{r + \frac12}\,\bigl(C\Delta + o(1)\bigr) \qquad (a \to 0+)$$
is true. ⌟
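The constant $C$ of Theorem (3.15) can be evaluated for a concrete case. The sketch below uses the Mexican hat of 3.1.(11), which is real and of order 2 but not compactly supported, so the theorem is applied here only heuristically; the tent function is the first "note" of Example 3.1.⑤, which has a 1-click with $\Delta = -4$ at $t = -2$. All names are illustrative assumptions.

```python
import math

GAMMA = 2.0 / math.sqrt(3.0) * math.pi ** -0.25

def mexican_hat(s):
    return GAMMA * (1.0 - s * s) * math.exp(-s * s / 2.0)

def integrate(g, lo, hi, n=4000):
    """Composite midpoint rule; with n even, no node falls on 0."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def click_constant(r, psi, cutoff=10.0):
    """C = 1/(2 r!) * integral of sgn(t) * t^r * psi(t) dt, for real psi."""
    g = lambda t: math.copysign(1.0, t) * t ** r * psi(t)
    return integrate(g, -cutoff, cutoff) / (2.0 * math.factorial(r))

def tent(t):
    """2 - 2|t + 2| on [-3, -1], zero elsewhere; f' jumps from 2 to -2 at t = -2."""
    return 2.0 - 2.0 * abs(t + 2.0) if -3.0 <= t <= -1.0 else 0.0

def wavelet_transform(f, a, b, cutoff=10.0, n=4000):
    """Wf(a, b) for a > 0 via the substitution t = b + a*s."""
    return math.sqrt(a) * integrate(lambda s: f(b + a * s) * mexican_hat(s), -cutoff, cutoff, n)

C = click_constant(1, mexican_hat)
delta = -4.0
a = 0.02
ratio = wavelet_transform(tent, a, -2.0) / a ** 1.5

if __name__ == "__main__":
    print(C, C * delta, ratio)
```

With the coefficient 2.883 of 3.1.(12) the product $2.883\,C\Delta$ comes out close to 10, which matches the value $w_{\max} = 10$ quoted in Example 3.1.⑤.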
4 Frames
The general notion of a "frame" will enable us to present the continuous wavelet transform and its discretized version (to be studied later on) from a single functional-analytic viewpoint. The next two sections, 4.1 and 4.2, are essentially borrowed from [K], where this unified aspect of the two theories is described in a particularly lucid way. To summarize the general idea in a few lines: A frame is a collection $a_\bullet := (a_\iota \mid \iota \in I)$ of vectors in a Hilbert space $X$ that is rich enough to make sure that no vector $x \in X$ other than 0 is orthogonal to all $a_\iota$. In the infinite-dimensional case this is not so easy to guarantee. The $a_\iota$ need not be linearly independent, let alone orthonormal. As a consequence, frames are in general a "redundant" collection of vectors.
4.1 Geometrical considerations
In order to get acquainted with the proposed "framework" we consider the following situation: Let $X$ be a finite-dimensional complex Hilbert space: $\dim X =: n < \infty$, and assume that $r$ vectors $a_1, \ldots, a_r \in X$ are given. The number $r$ of these vectors should be thought of by the reader as being larger than the dimension $n$ of the space $X$. With the aid of these $a_j$ we construct the mapping
X
-+
Cr
,
X r-.
(1 :S j :S r) .
Tx;
Denoting the canonical basis of C r =: Y by (et, ... , er ) we can write the mapping T in the following form: r
Tx = 2:)x, aj) ej . j=l
Since X has dimension n, the image space
U := im(T) := {Tx I x E X}
(1)
4.1 Geometrical considerations
x
91
T
..
..
Y
Figure 4.1
is at most $n$-dimensional; therefore $U$ is a proper subspace of the $r$-dimensional space $Y$ in case $r > n$; see Figure 4.1. We now want to investigate the following questions: Is a vector $x \in X$ uniquely determined by its image $y := Tx \in Y$? Or, to put it differently: Is $T$ an injective mapping? Or, expressed yet a third way: Is $\ker T = \{0\}$? And, if the answer is yes: How, in such a case, could one reconstruct the vector $x$ from its image $y$?

If $T$ is injective (and hence, in principle, invertible on its image), then the given collection $a_\bullet := (a_1, \ldots, a_r)$ of vectors $a_j \in X$ is called a frame for the (finite-dimensional) Hilbert space $X$, and the mapping $T$ is called the frame operator belonging to the collection $a_\bullet$. If we adopt on the space $Y$ the canonical scalar product

$$\langle y, z\rangle := \sum_{k=1}^r y_k\, \overline{z_k}, \tag{2}$$
the space $Y$ becomes a Hilbert space, too. This setup may be expressed in a more sophisticated way as follows: $Y = L^2(\{1, \ldots, r\}, \#)$. The fact is that the vectors $y \in Y$ can be regarded as complex-valued functions

$$\{1, \ldots, r\} \to \mathbb{C}, \qquad k \mapsto y_k,$$

and $\#$ denotes as usual the counting measure, which assigns each point of the domain under consideration the measure (mass) 1. In this way the mapping $T$ becomes a mapping between Hilbert spaces; therefore it is possible to consider its adjoint $T^*\colon Y \to X$. It is characterized by the following identity:

$$\langle x, T^*y\rangle_X = \langle Tx, y\rangle_Y \qquad \forall x \in X,\ \forall y \in Y.$$

In particular, one has

$$\langle x, T^*e_j\rangle = \langle Tx, e_j\rangle = (j\text{-th coordinate of } Tx) = \langle x, a_j\rangle \qquad \forall x \in X,$$

which allows the conclusion

$$T^*e_j = a_j \qquad (1 \le j \le r). \tag{3}$$
If we compose the mapping $T$ with $T^*$ we obtain the Gram operator (so called by us; see footnote 1 below)

$$G := T^*T\colon\quad X \to X,$$

a mapping from $X$ to $X$. Applying the mapping $T^*$ to both sides of (1), we obtain, thanks to (3), the following formula for $G$:

$$Gx = \sum_{j=1}^r \langle x, a_j\rangle\, a_j. \tag{4}$$
Regarding kernels we now assert:

$$\ker T = \ker G. \tag{5}$$

⌈ $Tx = 0$ of course implies $Gx = 0$, and the identity

$$\|Tx\|^2 = \langle Tx, Tx\rangle = \langle T^*Tx, x\rangle = \langle Gx, x\rangle \tag{6}$$

proves the converse. ⌋

Formula (5) admits the following conclusion:
(4.1) The mapping $T\colon X \to Y$ is injective if and only if the corresponding Gram operator $G := T^*T\colon X \to X$ is regular.

¹ The Gram matrix or Gramian of a collection of vectors $a_k \in X$ is by definition the matrix of the scalar products $\langle a_k, a_l\rangle$. This is not the matrix of $G$ but the matrix of the mapping $TT^*\colon Y \to Y$.
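We have to take a closer look at the Gram operator. Since for arbitrary $x, u \in X$ we have

$$\langle x, Gu\rangle = \langle x, T^*Tu\rangle = \langle Tx, Tu\rangle = \langle T^*Tx, u\rangle = \langle Gx, u\rangle, \tag{7}$$

we conclude that the operator $G$ is self-adjoint. This has the consequence that all its eigenvalues $\lambda_i$ are real, and, what's more, if $\lambda$ is an eigenvalue of $G$ and $x \ne 0$ a corresponding eigenvector, then from (6) one deduces

$$\lambda\,\langle x, x\rangle = \langle Gx, x\rangle = \|Tx\|^2 \ge 0,$$

which in turn implies $\lambda \ge 0$. We arrange the $\lambda_i$ in increasing order as follows:

$$0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n.$$

By the same token there is an orthonormal basis $(e_1, \ldots, e_n)$ of $X$ that diagonalizes $G$. With respect to this basis the image of the vector $x = (x_1, \ldots, x_n)$ is given by $Gx = (\lambda_1 x_1, \ldots, \lambda_n x_n)$. Computing $\|Tx\|^2$ using these coordinates, one gets from (6)

$$\lambda_1\,\|x\|^2 \ \le\ \|Tx\|^2 = \sum_{i=1}^n \lambda_i\, |x_i|^2 \ \le\ \lambda_n\,\|x\|^2.$$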
These inequalities are going to play an essential role in the rest of the book. For the time being we note the following proposition:
(4.2) A collection $a_\bullet = (a_1, \ldots, a_r)$ of vectors is a frame for the (finite-dimensional) Hilbert space $X$ if and only if there are constants $B \ge A > 0$ such that

$$A\,\|x\|^2 \ \le\ \|Tx\|^2 = \sum_{j=1}^r |\langle x, a_j\rangle|^2 \ \le\ B\,\|x\|^2 \qquad \forall x \in X.$$

The numbers $B \ge A > 0$ are the frame constants of the frame $a_\bullet$. If $A = B$, then the frame $a_\bullet$ is called a tight frame. In this case one has

$$\|Tx\|^2 = A\,\|x\|^2 \qquad \forall x \in X,$$

which means that $T$ maps $X$ essentially isometrically onto $U$; and the Gram operator belonging to a tight frame is given by

$$G = A \cdot 1_X,$$

where $1_X$ denotes the identity map of the vector space $X$.
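① Let $X$ be the space $\mathbb{C}^2$, fitted with the canonical scalar product (2). For an arbitrarily chosen number $r > 2$ we put $\omega := e^{2\pi i/r}$ and define the $r$ unit vectors

$$a_j := \frac{1}{\sqrt2}\,\bigl(\omega^j,\ \bar\omega^{\,j}\bigr) \qquad (0 \le j \le r-1).$$

[Figure 4.2: the first coordinates of the vectors $a_j$ in $\mathbb{C}$, at distance $1/\sqrt2$ from the origin]

Figure 4.2 shows the first coordinates of the vectors $a_j$. We now are going to study the corresponding frame operator $T\colon X \to \mathbb{C}^r$. For a general vector $x = (x_1, x_2) \in X$ we have

$$(Tx)_j = \langle x, a_j\rangle = \frac{1}{\sqrt2}\,\bigl(x_1\,\bar\omega^{\,j} + x_2\,\omega^j\bigr)$$

and consequently

$$\|Tx\|^2 = \frac12 \sum_{j=0}^{r-1} \bigl(x_1\bar\omega^{\,j} + x_2\omega^j\bigr)\bigl(\bar x_1\omega^j + \bar x_2\bar\omega^{\,j}\bigr) = \frac12 \sum_{j=0}^{r-1} \bigl(|x_1|^2 + |x_2|^2\bigr) = \frac r2\,\|x\|^2.$$

At the second equality sign we have made use of $\sum_{j=0}^{r-1} \omega^{2j} = 0$, which holds because $\omega^2 \ne 1$ for $r > 2$. The resulting identity shows that the collection $a_\bullet = (a_0, \ldots, a_{r-1})$ is a tight frame with frame constant $A = r/2$. One may regard the value $r/2$ as a measure of the redundancy of the frame $a_\bullet$. It is clear that for $\mathbb{C}^2$ two suitably chosen vectors would do. ○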
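The tight-frame identity of this example can be checked numerically. The following Python sketch is an illustration only: it assumes the vectors $a_j = (\omega^j, \bar\omega^{\,j})/\sqrt2$ and the scalar product (2); the helper names and the test vector are hypothetical and do not come from the text.

```python
import cmath
import math

def roots_of_unity_frame(r):
    """Vectors a_j = (w**j, conj(w)**j) / sqrt(2) in C^2, with w = exp(2*pi*i/r)."""
    w = cmath.exp(2j * math.pi / r)
    s = 1.0 / math.sqrt(2.0)
    return [(s * w**j, s * w.conjugate()**j) for j in range(r)]

def inner(x, a):
    """Canonical scalar product <x, a> = sum_k x_k * conj(a_k)."""
    return sum(xk * ak.conjugate() for xk, ak in zip(x, a))

def analysis(frame, x):
    """Frame operator T: x -> (<x, a_j>)_j."""
    return [inner(x, a) for a in frame]

def norm_sq(v):
    """Squared euclidean norm of a complex vector."""
    return sum(abs(c) ** 2 for c in v)

r = 5
frame = roots_of_unity_frame(r)
x = (1.0 + 2.0j, -0.5 + 0.25j)      # arbitrary test vector (assumption)
ratio = norm_sq(analysis(frame, x)) / norm_sq(x)
print(ratio)                         # should be close to r / 2
```

For $r = 2$ the same construction yields two parallel vectors, so that for instance $x = (1, -1)$ is mapped to $0$; this is why the sketch uses $r > 2$.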
② Let $a_\bullet = (a_1, \ldots, a_n)$ be an orthonormal basis of the Hilbert space $X$. If $T$ is the corresponding frame operator, then

$$\|Tx\|^2 = \sum_{j=1}^n |\langle x, a_j\rangle|^2 = \|x\|^2 \qquad \forall x \in X.$$

It follows that $a_\bullet$ is a tight frame with frame constant $A = 1$. ○
③ In order to strengthen the geometric intuition we consider in this last example the following real situation: Let

$$a_j = (a_{j1}, a_{j2}, a_{j3}) \qquad (1 \le j \le 3) \tag{8}$$

be three linearly independent vectors of the euclidean $\mathbb{R}^3$. Writing the three row vectors (8) one below the other, one obtains a regular $(3 \times 3)$-matrix $[M]$. The frame operator $T$ maps a general vector $x \in \mathbb{R}^3$ onto the vector

$$Tx := \Bigl(\sum_{k=1}^3 a_{1k}x_k,\ \sum_{k=1}^3 a_{2k}x_k,\ \sum_{k=1}^3 a_{3k}x_k\Bigr) \in \mathbb{R}^3.$$

Computing

$$\|Tx\|^2 = \sum_{j=1}^3 \Bigl(\sum_{k=1}^3 a_{jk}x_k\Bigr)^2 = \sum_{k,l=1}^3 Q_{kl}\, x_k x_l$$

brings into the picture the quadratic form $Q$ whose matrix elements $Q_{kl}$ are given by

$$Q_{kl} := \sum_{j=1}^3 a_{jk}\, a_{jl}.$$

These are not the scalar products of the $a_j$, but the scalar products of the column vectors of $[M]$. The above formula for the $Q_{kl}$ is equivalent to the matrix equation $[Q] = [M]'\,[M]$, where the prime $'$ denotes transposition. It follows that the symmetric matrix $[Q]$ is regular as well; since moreover $Q(x) = \|Tx\|^2 \ge 0$, the quadratic form $Q$ is positive definite. This implies that $Q$ assumes a certain maximum value $B$ and a positive minimum value $A$ on the unit sphere $S^2 \subset \mathbb{R}^3$, from which we may immediately conclude that the three given vectors form a frame with frame constants $B \ge A > 0$. ○
We now address the second question: How can the vector $x \in X$ be reconstructed from its image $y := Tx$? Thus we assume that the collection $a_\bullet = (a_1, \ldots, a_r)$ is indeed a frame and let $G\colon X \to X$ be the corresponding Gram operator. Since $G$ is regular, it has an inverse $G^{-1}\colon X \to X$. Using $G^{-1}$ we define the mapping

$$S := G^{-1}T^*\colon\quad Y \to X.$$

The formula

$$ST = G^{-1}T^*T = G^{-1}G = 1_X \tag{9}$$
shows that $S$ is a left inverse of the frame operator $T$ and so may be used for the reconstruction of $x$ from $y = Tx$. If the frame $a_\bullet$ is tight, then

$$G^{-1} = \frac1A\, 1_X$$

and consequently

$$S = \frac1A\, T^*.$$

This means that in the case of a tight frame the inverse transformation $S$ is obtained for free, i.e., without having to compute a matrix inverse. We now compose $S$ and $T$ the other way around and obtain the mapping

$$P := TS\colon\quad Y \to Y.$$
It can be characterized geometrically as follows:
(4.3) $P := TS$ is the orthogonal projection of the space $Y$ onto the subspace $U := \operatorname{im}(T)$.

⌈ Let $P_U$ be the orthogonal projection of $Y$ onto $U$. Any vector $y \in Y$ has a uniquely determined decomposition of the form

$$y = u + v, \qquad u = P_U\, y \in U, \quad v \in U^\perp.$$

For vectors $u = Tx \in U$, formula (9) implies the identity $Pu = TSTx = Tx = u$. For a $v \in U^\perp$ we have

$$\langle x, T^*v\rangle = \langle Tx, v\rangle = 0 \qquad \forall x \in X.$$

From this we conclude $T^*v = 0$, and this in turn gives $Pv = T(G^{-1}T^*)v = 0$. Altogether we obtain

$$Py = Pu + Pv = u = P_U\, y \qquad \forall y \in Y,$$
as stated. ⌋

Proposition (4.3) may be interpreted as follows (see Figure 4.3): The $S$-image $x := Su$ of a vector $u \in U$ is the uniquely determined vector $x \in X$ whose $T$-image is the given $u$, and the $S$-image $x := Sy$ of an arbitrary $y \in Y$ is the one vector $x \in X$ whose $T$-image is nearest to the given $y$. In this way we have obtained a simple geometric description of the mapping $S$.

[Figure 4.3: the mapping $S := G^{-1}T^*$ sends $y \in Y$ to $x = Sy \in X$, with $Tx = Py \in U$]

Now for the next step: Using $G^{-1}$ we define the vectors

$$\tilde a_j := G^{-1}a_j \qquad (1 \le j \le r).$$

The collection $\tilde a_\bullet := (\tilde a_1, \ldots, \tilde a_r)$ is called the dual frame of the frame $a_\bullet$. If the given frame $a_\bullet$ is tight, then the $\tilde a_j$ coincide with the $a_j$ up to the constant factor $\frac1A$. In the following theorem we sum up what can be said about the relation between a frame $a_\bullet$ and its dual $\tilde a_\bullet$.
(4.4) Let $a_\bullet$ be a frame with frame constants $B \ge A > 0$ and let $\tilde a_\bullet$ be the corresponding dual frame. Then the following are true:

(a) The two frames $a_\bullet$ and $\tilde a_\bullet$ together incorporate a resolution of the identity for the space $X$:

$$x = \sum_{j=1}^r \langle x, a_j\rangle\, \tilde a_j \qquad \forall x \in X.$$

(b) The image $Sy$ of an arbitrary vector $y = (y_1, \ldots, y_r) \in Y$ is given by

$$Sy = \sum_{j=1}^r y_j\, \tilde a_j.$$

(c) The collection $\tilde a_\bullet$ is in fact a frame with frame constants $\frac1A \ge \frac1B > 0$.

(d) The dual frame of $\tilde a_\bullet$ is $a_\bullet$; in particular, one has the following mirror formula to (a):

$$x = \sum_{j=1}^r \langle x, \tilde a_j\rangle\, a_j \qquad \forall x \in X.$$
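⌈ (a) Using (4) one immediately obtains

$$x = G^{-1}(Gx) = G^{-1}\Bigl(\sum_j \langle x, a_j\rangle\, a_j\Bigr) = \sum_j \langle x, a_j\rangle\, \tilde a_j.$$

(b) Formula (3) implies

$$Sy = G^{-1}T^*\Bigl(\sum_j y_j e_j\Bigr) = G^{-1}\Bigl(\sum_j y_j a_j\Bigr) = \sum_j y_j\, \tilde a_j.$$

(c) Let $\tilde T$ be the frame operator belonging to the collection $\tilde a_\bullet$. Since $G$ is self-adjoint, the same is true for $G^{-1}$. Now we have

$$(\tilde T x)_j = \langle x, \tilde a_j\rangle = \langle x, G^{-1}a_j\rangle = \langle G^{-1}x, a_j\rangle = (TG^{-1}x)_j$$

for all $x$ and all $j$. This proves

$$\tilde T = TG^{-1}, \tag{10}$$

and (6) implies, in turn,

$$\|\tilde T x\|^2 = \|TG^{-1}x\|^2 = \langle G\,G^{-1}x,\ G^{-1}x\rangle = \langle x, G^{-1}x\rangle.$$

There is an orthonormal basis $(e_1, \ldots, e_n)$ of $X$ that diagonalizes both $G$ and $G^{-1}$. Since all eigenvalues $\lambda_i$ of $G$ lie between $A$ and $B$, using this basis we now obtain the required estimates:

$$\|\tilde T x\|^2 = \langle x, G^{-1}x\rangle = \sum_{i=1}^n \frac{1}{\lambda_i}\,|x_i|^2 \ \begin{cases} \le \dfrac1A\,\|x\|^2, \\[1ex] \ge \dfrac1B\,\|x\|^2. \end{cases}$$

(d) With the help of (10) one obtains the following expression for the Gram operator $\tilde G$ belonging to the collection $\tilde a_\bullet$:

$$\tilde G := \tilde T^*\,\tilde T = G^{-1}T^*\,TG^{-1} = G^{-1}.$$

This implies $\tilde{\tilde a}_j := \tilde G^{-1}\tilde a_j = G\tilde a_j = a_j$ for all $j$, as stated. ⌋

If $r > n := \dim(X)$, then the $\tilde a_j$ are linearly dependent, so there have to be infinitely many representations of a given vector $x \in X$ as a linear combination of the $\tilde a_j$. Among these the representation (4.4)(a) is distinguished as follows: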
(4.5) Let $a_\bullet$ and $\tilde a_\bullet$ be dual frames, and let $x = \sum_{j=1}^r \xi_j\, \tilde a_j$ be an arbitrary representation of the vector $x \in X$ as a linear combination of the $\tilde a_j$. Then

$$\sum_{j=1}^r |\xi_j|^2 \ \ge\ \sum_{j=1}^r |\langle x, a_j\rangle|^2,$$

the equality sign holding only if $\xi_j = \langle x, a_j\rangle$ for $1 \le j \le r$.
⌈ Consider the point $(\xi_1, \ldots, \xi_r) =: y \in Y$. According to (4.4)(b) one has $x = Sy$, and (4.3) implies $Tx = TSy = P_U\, y$. This at once leads to

$$\sum_{j=1}^r |\langle x, a_j\rangle|^2 = \|Tx\|^2 = \|P_U\, y\|^2 \ \le\ \|y\|^2 = \sum_{j=1}^r |\xi_j|^2.$$

Here we can have equality only if $y = P_U\, y = Tx$. Expressing these geometric facts in terms of coordinates one obtains the statements of the theorem. ⌋

The content of Theorem (4.5) can be expressed in this way: The "natural" representation (4.4)(a) uses the least amount of "coefficient energy".
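The statements (4.4)(a) and (4.5) can be tried out on a small real example. The following Python sketch is an illustration under stated assumptions: the three vectors in $\mathbb{R}^2$, the test vector and all function names are hypothetical choices, and the $2 \times 2$ inversion stands in for the inversion of the Gram operator $G$.

```python
def gram(frame):
    """Matrix of G x = sum_j <x, a_j> a_j for real vectors a_j in R^2."""
    return [[sum(a[i] * a[k] for a in frame) for k in range(2)] for i in range(2)]

def inverse_2x2(m):
    """Inverse of a regular 2x2 matrix."""
    (p, q), (s, t) = m
    det = p * t - q * s
    return [[t / det, -q / det], [-s / det, p / det]]

def apply(m, v):
    """Matrix-vector product in R^2."""
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def synthesize(coeffs, vectors):
    """Linear combination sum_j coeffs[j] * vectors[j]."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(2))

def energy(coeffs):
    """Coefficient energy sum_j |xi_j|^2."""
    return sum(c * c for c in coeffs)

frame = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]     # illustrative frame (assumption)
g_inv = inverse_2x2(gram(frame))
dual = [apply(g_inv, a) for a in frame]           # dual vectors G^{-1} a_j

x = (3.0, -1.0)
natural = [dot(x, a) for a in frame]              # coefficients <x, a_j>
x_rec = synthesize(natural, dual)                 # representation (4.4)(a)

# A second coefficient vector for the same x: (1, 1, -1) is synthesized to 0.
other = [c + d for c, d in zip(natural, (1.0, 1.0, -1.0))]
print(x_rec, synthesize(other, dual), energy(natural), energy(other))
```

Both coefficient vectors reproduce $x$, but the "natural" one has the smaller energy, in line with (4.5).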
4.2 The general notion of a frame

The geometrical (and finite-dimensional) analysis presented in the foregoing section served to prepare us for the following general dispositions:

$X$ is a complex Hilbert space whose vectors we denote by $f$, $h$ and similar letters. One should imagine $X$ being infinite-dimensional.

$M$ is an "abstract" set of points $m$. On the set $M$ a measure $\mu$ is defined that assigns each measurable subset $E \subset M$ its "mass" or "volume" $\mu(E) \in [0, \infty]$. The measurable subsets form a so-called $\sigma$-algebra $\mathcal F$, and care is taken that any "reasonable" subset $E \subset M$ belongs to $\mathcal F$. According to general principles it is then possible to set up an integral calculus for functions on $M$, and it makes sense, e.g., to speak about the Hilbert space $Y := L^2(M, \mu)$. The pair $(M, \mu)$ is the abstraction of the pair $(\{1, 2, \ldots, r\}, \#)$ that played such a prominent role in the last section.

Furthermore, a family $h_\bullet := (h_m \mid m \in M)$ of vectors $h_m \in X$ is given, the measure space $M$ serving as index set for this family. The $h_m$ are (analogous to the $a_j$ of Section 4.1) to be viewed as "measuring probes", by means of which we want to explore the individual vectors $f \in X$ as completely as possible. In Section 1.5 we tentatively spoke of "key patterns" when actually the same "measuring probes" were meant. The fact is, for a given $f \in X$, one gets hold (numerically, experimentally, conceptually, or otherwise) of the family of all scalar products

$$Tf(m) := \langle f, h_m\rangle \qquad (m \in M).$$

In this way one obtains an array $(Tf(m) \mid m \in M)$ that is nothing other than a function $Tf\colon M \to \mathbb{C}$. The integral installed on $M$ now enables us to quantify the yield of our measuring efforts: The $L^2$-integral

$$\|Tf\|^2 := \int_M |Tf(m)|^2\, d\mu(m) \qquad (\le \infty) \tag{1}$$
is obviously a natural measure for the amount of information so collected about $f$. This brings us to the following definition: The family $h_\bullet$ is a frame if the following conditions are satisfied:

• the function $Tf$ is $\mu$-measurable for all $f \in X$, so that the integral (1) is always defined;
• there are constants $B \ge A > 0$ such that

$$A\,\|f\|^2 \ \underset{(a)}{\le}\ \|Tf\|^2 \ \underset{(b)}{\le}\ B\,\|f\|^2 \qquad \forall f \in X.$$

Here the inequality (b) guarantees that the frame operator

$$T\colon\quad X \to \mathbb{C}^M, \qquad f \mapsto Tf$$

is a bounded operator from $X$ to $Y := L^2(M, \mu)$. The inequality (a), in most cases the crucial one of the two, serves to make sure that $T$ is injective, signifying that no information is lost in the process $f \mapsto Tf$.

While we are at it, we proceed to explain the related notion of a "Riesz basis", which will play a certain role in connection with the discrete wavelet transform later on. Here the set $M$ is countable to start with, and $\mu$ is the counting measure $\#$ on $M$. A family $h_\bullet = (h_m \mid m \in M)$ of vectors $h_m \in X$ is called a Riesz basis of $X$ if the following conditions are satisfied:

• $\overline{\operatorname{span}}(h_\bullet) = X$;
• there are constants $B \ge A > 0$ such that

$$A \sum_m |\xi_m|^2 \ \underset{(c)}{\le}\ \Bigl\|\sum_m \xi_m h_m\Bigr\|^2 \ \le\ B \sum_m |\xi_m|^2 \qquad \forall\, \xi_\bullet \in l^2(M). \tag{2}$$

Altogether, these conditions say that the mapping

$$K\colon\quad l^2(M) \to X, \qquad \xi_\bullet \mapsto \sum_m \xi_m h_m$$

is a bounded operator having a bounded inverse $K^{-1}\colon X \to l^2(M)$.
The relation between the two concepts "frame" and "Riesz basis" is not obvious, because the two definitions speak about totally different things. Thus it is not a bad idea to prove the following proposition:

(4.6) A Riesz basis $h_\bullet$ with constants $B \ge A > 0$ is automatically a frame with $A$ and $B$ as frame constants.

⌈ Let $(e_m \mid m \in M)$ be the canonical orthonormal basis of $l^2(M)$. Then one has $Ke_m = h_m$ and consequently

$$K^*x = \sum_m \langle K^*x, e_m\rangle\, e_m = \sum_m \langle x, Ke_m\rangle\, e_m = \sum_m \langle x, h_m\rangle\, e_m = Tx$$

for all $x \in X$. By general principles of functional analysis the conditions (2) imply the analogous inequalities for $K^* = T$. This means that we also have

$$A\,\|x\|^2 \ \le\ \|Tx\|^2 \ \le\ B\,\|x\|^2 \qquad \forall x \in X.$$

⌋
The following somewhat vague statement is not so far from the truth: A Riesz basis is a countable frame whose vectors are linearly independent and stay so even "in the limit". To wit, the inequality (c) in (2) guarantees that it is impossible for a nontrivial linear combination $\sum_m \xi_m h_m$ to represent the zero vector.

In the finite-dimensional case the inverse $G^{-1}$ of the Gram operator and the dual frame $\tilde a_\bullet$ could be computed by inverting a certain matrix. In the case at hand, an operator

$$G\colon\ X \to X, \qquad \dim(X) = \infty,$$

has to be inverted. This can be accomplished by means of an iteration procedure whose rate of convergence is tied to the quotient $\frac BA$: The nearer this quotient is to 1, the better the convergence of our procedure is. In fact, we shall prove the following:

(4.7) Assume that $h_\bullet$ is a frame for $X$ with frame constants $B \ge A > 0$, and let $y \in X$ be an arbitrary vector. If the sequence $x_\bullet$ is recursively defined by

$$x_0 := 0, \qquad x_{n+1} := x_n + \frac{2}{A+B}\,(y - Gx_n) \quad (n \ge 0),$$

then $\lim_{n\to\infty} x_n = G^{-1}y$.

In practice (that is to say, in the actual numerical computation of the frame vectors $\tilde a_j := G^{-1}a_j$), the described procedure is cut short as soon as the increments $\frac{2}{A+B}(y - Gx_n)$ become negligibly small.
⌈ We consider the auxiliary operator

$$R := 1_X - \frac{2}{A+B}\,G.$$

In terms of $R$, the iteration formula can be rewritten as

$$x_{n+1} := \frac{2}{A+B}\,y + Rx_n.$$

Now $G$ is a positive definite self-adjoint operator, and by assumption on $T$ we know that $A\,1_X \le G \le B\,1_X$ (such inequalities make sense in this case!). This implies

$$\Bigl\|G - \frac{A+B}{2}\,1_X\Bigr\| \ \le\ \frac{B-A}{2},$$

so that we get the following estimate for the norm of $R$:

$$\|R\| = \Bigl\|\frac{2}{A+B}\,G - 1_X\Bigr\| \ \le\ \frac{B-A}{B+A} = \frac{B/A - 1}{B/A + 1} < 1.$$

By the contraction principle (i.e., the general fixed-point theorem) we can conclude now that $\lim_{n\to\infty} x_n =: x \in X$ exists, and furthermore that

$$x = \frac{2}{A+B}\,y + Rx = x + \frac{2}{A+B}\,(y - Gx).$$

The last equation implies $y - Gx = 0$, whence $x = G^{-1}y$, as stated. ⌋
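The iteration of (4.7) translates directly into a short program. The following Python sketch is an illustration under stated assumptions: the Gram operator is that of the hypothetical frame $(1,0)$, $(0,1)$, $(1,1)$ in $\mathbb{R}^2$, whose frame constants are $A = 1$ and $B = 3$ (the eigenvalues of its matrix); the function names, the tolerance and the iteration cap are not taken from the text.

```python
def frame_iteration(apply_g, y, a_const, b_const, tol=1e-12, max_iter=10_000):
    """Approximate G^{-1} y via x_{n+1} = x_n + 2/(A+B) * (y - G x_n), x_0 = 0.

    Stops as soon as the largest component of the increment is below tol.
    Returns the approximation and the number of steps performed.
    """
    relax = 2.0 / (a_const + b_const)
    x = [0.0] * len(y)
    for n in range(max_iter):
        residual = [yi - gi for yi, gi in zip(y, apply_g(x))]
        step = [relax * ri for ri in residual]
        x = [xi + si for xi, si in zip(x, step)]
        if max(abs(si) for si in step) < tol:
            return x, n + 1
    return x, max_iter

def apply_g(x):
    """Gram operator of the illustrative frame (1,0), (0,1), (1,1) in R^2."""
    return [2.0 * x[0] + 1.0 * x[1], 1.0 * x[0] + 2.0 * x[1]]

solution, steps = frame_iteration(apply_g, [3.0, -1.0], a_const=1.0, b_const=3.0)
print(solution, steps)
```

In this example $\|R\| = (B - A)/(B + A) = 1/2$, so every step roughly halves the error; for frames with $B/A$ close to 1 far fewer steps are needed.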
At this time we can see the following two applications of the concepts presented here: number one, of course, the finite-dimensional model discussed in Section 4.1, and number two, the continuous wavelet transform as treated in Chapter 3. We are now going to review and interpret the latter in the functional analytic framework (!) set up in this section.

$X$ is the space $L^2(\mathbb{R})$ of time signals $f$, and $M$ is the set

$$\mathbb{R}^2_- := \{(a, b) \mid a \in \mathbb{R}^*,\ b \in \mathbb{R}\},$$

provided with the measure $d\mu := da\,db/|a|^2$. The Hilbert space $Y := L^2(M)$ is the space $L^2(\mathbb{R}^2_-, d\mu)$ that was denoted by $H$ in Chapter 3. After a mother wavelet $\psi$ has been selected, one defines the wavelet functions

$$\psi_{a,b}(t) := \frac{1}{|a|^{1/2}}\ \psi\Bigl(\frac{t-b}{a}\Bigr)$$

and in this way installs a family

$$\psi_\bullet := \bigl(\psi_{a,b} \bigm| (a, b) \in \mathbb{R}^2_-\bigr)$$

of vectors $\psi_{a,b} \in L^2$. The corresponding frame operator $T$ transforms any function $f \in L^2$ into a function $Tf\colon \mathbb{R}^2_- \to \mathbb{C}$ according to the prescription

$$Tf(a, b) := \langle f, \psi_{a,b}\rangle = Wf(a, b).$$

We see that the wavelet transform $W$ is nothing other than the frame operator $T$ corresponding to the family $\psi_\bullet$. Now by Theorem (3.3) one has

$$\|Wf\|^2 = C_\psi\,\|f\|^2 \qquad \forall f \in L^2,$$

where the constant $C_\psi$ is given by

$$C_\psi := 2\pi \int_{\mathbb{R}^*} \frac{|\hat\psi(a)|^2}{|a|}\, da.$$

In terms of the concepts defined in the current chapter, we can express this fact as follows:

(4.8) Let $\psi$ be an arbitrary mother wavelet. Then the family $\psi_\bullet$ is a tight frame with frame constant $C_\psi$.

In view of this theorem, the inverse of the Gram operator is given by

$$G^{-1} = \frac{1}{C_\psi}\, 1_X,$$

and the dual frame $\tilde\psi_\bullet$ coincides with $\psi_\bullet$ up to the same constant factor:

$$\tilde\psi_{a,b} = \frac{1}{C_\psi}\, \psi_{a,b}.$$

If we now apply formula (4.4)(a), which reconstructs a given vector $x \in X$ from the values $(Tx)_j := \langle x, a_j\rangle$, to the situation at hand, we arrive at the following:

$$f = \int_{\mathbb{R}^2_-} Wf(a, b)\ \frac{1}{C_\psi}\, \psi_{a,b}\ \frac{da\,db}{|a|^2}. \tag{3}$$

This is in agreement with (3.7) and 3.3.(4). It must be admitted, however, that (4.4)(a) is related to a finite-dimensional model, so the validity of (3) is not guaranteed in the present situation. As a matter of fact, formula (3) is valid only in a "weak" sense or else under stronger assumptions on $f$ and $\psi$; see our remarks in Section 3.3 regarding this point.
4.3 The discrete wavelet transform

Shannon's sampling theorem (Section 2.4) accomplishes the full reconstruction of a bandlimited time signal $f$ from a discrete collection $(f(kT) \mid k \in \mathbb{Z})$ of sampled values. In this section we set out to attain something similar in the realm of the wavelet transform. The data that we shall use in the reconstruction of $f$ are no longer $f$-values at equally spaced points $kT$, but results of "wavelet measurements" $\langle f, \psi_{a,b}\rangle$; that is to say, suitably chosen values of the wavelet transform $Wf\colon \mathbb{R}^2_- \to \mathbb{C}$. One must always keep in mind that a given signal $f$ is encoded in its wavelet transform with an enormous redundancy. Under these circumstances it is not so surprising that a discrete set of $Wf$-values is already sufficient to reconstruct the given $f$ as an $L^2$-object or even pointwise, and all this even without the assumption that $f$ is bandlimited.

We now describe the class of "grids" in the $(a, b)$-plane that we shall use for the sampling of the function $Wf$: First a zoom step $\sigma > 1$ is chosen (the habitual choice is $\sigma = 2$) as well as a base step $\beta > 0$ (a good choice is $\beta = 1$). These two parameters characterize the chosen "grid" and are kept fixed in the following. Then one sets

$$a_m := \sigma^m, \qquad b_{m,n} := n\,\sigma^m\beta \qquad (m, n \in \mathbb{Z}),$$

[Figure 4.4: the grid $M$ in the $(a, b)$-plane, with spacing $\beta$ on the level $a = 1$ ($m = 0$)]

and with these numbers one defines the countable set

$$M := \bigl\{(a_m, b_{m,n}) \bigm| m, n \in \mathbb{Z}\bigr\}$$

shown in Figure 4.4. Note that negative $a$-values are no longer taken into consideration. From a structural point of view, i.e., for the purposes of addressing the individual points of $M$, we obviously can say that $M \sim \mathbb{Z} \times \mathbb{Z}$.

Our next question is: What should be the correct measure on this $M$? Each point $(a_m, b_{m,n}) \in M$ represents a rectangle $R_{m,n}$ of width $\sigma^m\beta$ and height $\sigma^m\sqrt\sigma - \sigma^m/\sqrt\sigma$ in the $(a, b)$-plane (see Figure 4.5), and the $R_{m,n}$ constitute a disjoint decomposition of the upper half-plane $\mathbb{R}^2_>$. The $\mu$-content of the rectangle $R_{m,n}$ is computed as follows:

$$\mu(R_{m,n}) = \sigma^m\beta \int_{\sigma^m/\sqrt\sigma}^{\sigma^m\sqrt\sigma} \frac{da}{a^2} = \beta\,\Bigl(\sqrt\sigma - \frac{1}{\sqrt\sigma}\Bigr);$$

therefore it is independent of $m$ and $n$. This crucial observation leads us to choose the counting measure $\#$ as our measure on the set $M \sim \mathbb{Z}^2$, so that the space $Y$ of the foregoing section becomes $Y := l^2(\mathbb{Z}^2)$.

[Figure 4.5: the rectangle $R_{m,n}$ in the $(a, b)$-plane]
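That the $\mu$-content of the cells does not depend on $m$ can also be confirmed numerically. The following Python sketch is an illustration under stated assumptions: it takes the grid points $(\sigma^m, n\sigma^m\beta)$, cells reaching from $\sigma^m/\sqrt\sigma$ to $\sigma^m\sqrt\sigma$ in the $a$-direction, and the measure $da\,db/a^2$; the function names and the midpoint rule are hypothetical choices.

```python
import math

def grid_point(m, n, sigma=2.0, beta=1.0):
    """Sampling point (a_m, b_{m,n}) = (sigma**m, n * sigma**m * beta)."""
    a = sigma ** m
    return a, n * a * beta

def cell_content(m, sigma=2.0, beta=1.0, steps=20_000):
    """mu-content of R_{m,n} for d(mu) = da db / a**2 (midpoint rule in a).

    The b-extent of the cell is sigma**m * beta; the a-extent runs from
    sigma**m / sqrt(sigma) to sigma**m * sqrt(sigma).
    """
    lo = sigma ** m / math.sqrt(sigma)
    hi = sigma ** m * math.sqrt(sigma)
    h = (hi - lo) / steps
    integral = sum(h / (lo + (i + 0.5) * h) ** 2 for i in range(steps))
    return sigma ** m * beta * integral

# Closed form beta * (sqrt(sigma) - 1/sqrt(sigma)) for sigma = 2, beta = 1.
exact = 1.0 * (math.sqrt(2.0) - 1.0 / math.sqrt(2.0))
for m in (-3, 0, 4):
    print(m, grid_point(m, 1), cell_content(m), exact)
```

The three printed cell contents agree with the closed form up to the quadrature error, although the cells themselves differ in size by several orders of magnitude.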
Assume now that a mother wavelet $\psi$ has been chosen once and for all. From the full set of wavelet functions $\psi_{a,b}$, $(a, b) \in \mathbb{R}^2_-$, we only retain the ones that belong to the points $(a_m, b_{m,n}) \in M$, and of course these functions get a new address: $\psi_{\sigma^m,\, n\sigma^m\beta} =: \psi_{m,n}$. This means we now have the family

$$\psi_\bullet := \bigl(\psi_{m,n} \bigm| (m, n) \in \mathbb{Z}^2\bigr)$$

consisting of the following wavelet functions:

$$\psi_{m,n}(t) = \sigma^{-m/2}\ \psi\Bigl(\frac{t - n\sigma^m\beta}{\sigma^m}\Bigr) = \sigma^{-m/2}\ \psi\bigl(\sigma^{-m}t - n\beta\bigr).$$

The corresponding frame operator $T\colon f \mapsto Tf$ is connected to the wavelet transform $W\colon f \mapsto Wf$ by means of the formula

$$Tf(m, n) = \langle f, \psi_{m,n}\rangle = Wf(\sigma^m, n\sigma^m\beta). \tag{1}$$

We are now ready for the essential questions of this section: Under which assumptions on $\psi$, $\sigma$ and $\beta$ can we be sure that the collection $\psi_\bullet$ is in fact a frame, and what are the resulting frame constants? Regarding the second question, in [D], Theorem 3.3.1, the following is proven:
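(4.9) Let $\psi$ be a wavelet and let $C_-$, $C_+$ be defined by

$$C_- := 2\pi \int_{-\infty}^0 \frac{|\hat\psi(\xi)|^2}{|\xi|}\, d\xi, \qquad C_+ := 2\pi \int_0^\infty \frac{|\hat\psi(\xi)|^2}{|\xi|}\, d\xi.$$

If the family $\psi_\bullet$ corresponding to given step sizes $\sigma$ and $\beta$ is in fact a frame, then the resulting frame constants $B \ge A > 0$ satisfy the following inequalities:

$$A \ \le\ \frac{1}{\beta \log\sigma}\, C_\pm \ \le\ B.$$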
In particular, one cannot have $A = B$ unless $C_- = C_+$. This is a consequence of the fact that we have rejected negative $a$-values; cf. the analogous condition in Theorem (3.4). For the proof of (4.9) we refer the reader to [D].

So far, so good, but what we really want is a theorem of the following kind: Under exactly described circumstances it is guaranteed that the collection $\psi_\bullet$ is a frame, with the frame constants $B \ge A > 0$ obeying tolerances stipulated in advance.
Assume that a zoom step $\sigma > 1$ is given. A wavelet $\psi$ is called admissible for the purposes of this discussion if its Fourier transform $\hat\psi$ fulfills the conditions (a) and (b) below.

(a) There are constants $\alpha > 0$, $\rho > 0$ and $C$ such that

$$|\hat\psi(\xi)| \le \ldots \quad (|\xi| \le 1), \qquad |\hat\psi(\xi)| \le \ldots \quad (|\xi| \ge 1). \tag{2}$$

This condition is in fact harmless and serves mainly to introduce the constants $\alpha$, $\rho$ and $C$.

(b) There is a constant $A' > 0$ such that

$$\sum_{m=-\infty}^{\infty} |\hat\psi(\sigma^m\xi)|^2 \ \ge\ A' \qquad (\xi \in \mathbb{R}^*). \tag{3}$$
Since the left hand side of (3) is invariant with respect to the transformation $\xi \mapsto \sigma\xi$, it is enough to check the required inequality on the domain $1 \le |\xi| \le \sigma$. According to this condition the zeros of $\hat\psi$ are in a way forbidden to be in "logarithmic conspiracy". Thus it is in particular excluded that the support of $\hat\psi$ is contained in a single interval of the form $]b, \sigma b[$. Assume, e.g., that $\psi$ has finite order $N$. Then because of 3.5.(3) there is an $h > 0$ with

$$\hat\psi(\xi) \ne 0 \qquad (0 < |\xi| < h),$$

and (3) is guaranteed. For the purposes of the current discussion we call the constants $\alpha$, $\rho$, $C$, and $A'$ the parameters of $\psi$.

After all these preparations we can finally formulate the central theorem of this chapter:
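(4.10) Let a zoom step $\sigma > 1$ be given and assume that $\psi$ is an admissible wavelet with parameters $\alpha$, $\rho$, $C$ and $A'$. Then there are constants $\beta_0$, $B'$ and $C'$, so that the following is true: For any base step $\beta < \beta_0$ the family $\psi_\bullet = (\psi_{m,n} \mid (m, n) \in \mathbb{Z}^2)$ is a frame with frame constants

$$A = \frac{2\pi}{\beta}\,\bigl(A' - C'\beta^{1+\rho}\bigr), \qquad B = \frac{2\pi}{\beta}\,\bigl(B' + C'\beta^{1+\rho}\bigr).$$

We defer the proof of this theorem to the next section. For the time being, the following heuristic argument should be sufficient: We have to show that the operator $T$ satisfies the frame condition

$$A\,\|f\|^2 \ \le\ \|Tf\|^2 \ \le\ B\,\|f\|^2 \qquad \forall f \in L^2. \tag{4}$$

According to (1) we have

$$\|Tf\|^2 = \sum_{m,n} |Tf(m, n)|^2 = \sum_{m,n} |Wf(\sigma^m, n\sigma^m\beta)|^2. \tag{5}$$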
Now the above considerations concerning the rectangles $R_{m,n}$ show that the right hand side of this equation can essentially be regarded as a Riemann sum for the integral

$$\int_{\mathbb{R}^2_>} |Wf(a, b)|^2\ \frac{da\,db}{|a|^2},$$

and according to Theorem (3.4) this integral has the value $\frac12 C_\psi\,\|f\|^2$. For this reason it is quite plausible that for sufficiently small $\sigma > 1$ and sufficiently small $\beta > 0$ the quantities $\|Tf\|^2$ and $\|f\|^2$ have the same order of magnitude, as required by (4).

Theorem (4.10) shows that in reality very modest assumptions about $\psi$ suffice to guarantee that the data

$$\bigl(Tf(m, n) \bigm| (m, n) \in \mathbb{Z}^2\bigr) \tag{6}$$

encode all features of the analyzed function $f$, as soon as $\beta$ is small enough; in particular, it is all right to take $\sigma := 2$ in such a case.

For the reconstruction of the original $f$ using the data (6) we need the frame $\tilde\psi_\bullet$, dual to $\psi_\bullet$. If the frame $\psi_\bullet$ is not tight, we have to compute the $\tilde\psi_{m,n}$ using the prescription $\tilde\psi_{m,n} := G^{-1}(\psi_{m,n})$. Unfortunately the $\tilde\psi_{m,n}$ cannot be obtained from a single $\tilde\psi$ by mere dilation and translation, unless of course $\psi$ is chosen in a very special way at the outset. The following considerations will make this more clear: The two operators

$$Df(t) := \frac{1}{\sqrt\sigma}\, f\Bigl(\frac t\sigma\Bigr) \qquad\text{and}\qquad Sf(t) := f(t - \beta)$$

are unitary; therefore we have $D^* = D^{-1}$ and $S^* = S^{-1}$. Consider now the Gram operator $G$, given by

$$Gf = \sum_{m,n} \langle f, \psi_{m,n}\rangle\, \psi_{m,n}.$$
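Regarding $D$, we have

$$D\psi_{m,n}(t) = \frac{1}{\sqrt\sigma}\,\psi_{m,n}\Bigl(\frac t\sigma\Bigr) = \frac{1}{\sigma^{(m+1)/2}}\ \psi\Bigl(\frac{t}{\sigma^{m+1}} - n\beta\Bigr) = \psi_{m+1,n}(t)$$

and consequently

$$DGf = \sum_{m,n} \langle f, \psi_{m,n}\rangle\, D\psi_{m,n} = \sum_{m,n} \langle f, \psi_{m,n}\rangle\, \psi_{m+1,n} = \sum_{m,n} \langle f, \psi_{m-1,n}\rangle\, \psi_{m,n} = \sum_{m,n} \langle f, D^{-1}\psi_{m,n}\rangle\, \psi_{m,n} = \sum_{m,n} \langle Df, \psi_{m,n}\rangle\, \psi_{m,n} = GDf. \tag{7}$$

Obviously in this case $G^{-1}$ commutes with $D$ as well, and we obtain

$$\tilde\psi_{m,n} = G^{-1}D^m\psi_{0,n} = D^m G^{-1}\psi_{0,n} = D^m\tilde\psi_{0,n},$$

that is to say,

$$\tilde\psi_{m,n}(t) = \sigma^{-m/2}\ \tilde\psi_{0,n}\bigl(\sigma^{-m}t\bigr).$$

Unfortunately $G$ and $S$ do not commute, so that the above calculation (7) cannot (mutatis mutandis) be repeated. The reason is the following: The functions $S\psi_{m,n}$ appearing on the right hand side of the formula

$$SGf = \sum_{m,n} \langle f, \psi_{m,n}\rangle\, S\psi_{m,n}$$

cannot be identified with certain $\psi_{m',n'}$, as the $D\psi_{m,n}$ could; rather, they look like this:

$$S\psi_{m,n}(t) = \sigma^{-m/2}\ \psi\bigl(\sigma^{-m}(t - \beta) - n\beta\bigr) = \sigma^{-m/2}\ \psi\bigl(\sigma^{-m}t - (n + \sigma^{-m})\,\beta\bigr),$$

and in general the factor $n + \sigma^{-m}$ is not an integer. From this observation one has to draw the conclusion that the dual wavelet functions $\tilde\psi_{0,n}$, $n \in \mathbb{Z}$, are not related to each other in a simple way, so they have to be determined individually.

For the reasons described above, in most circumstances one is eager to choose a tight frame $\psi_\bullet$ right at the outset. The following theorem shows that such a choice is indeed possible: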
(4.11) Assume that the Fourier transform $\hat\psi$ of the mother wavelet $\psi$ has compact support in the interval $I := [\omega, \omega']$, $\omega' > \omega > 0$, and that

$$\sum_{m=-\infty}^{\infty} |\hat\psi(\sigma^m\xi)|^2 \equiv A' > 0 \qquad (\xi > 0).$$

Then the collection $\psi_\bullet = (\psi_{m,n} \mid (m, n) \in \mathbb{Z}^2)$ belonging to the zoom step $\sigma$ and arbitrary base step

$$\beta \le \frac{2\pi}{\omega' - \omega}$$

is a tight frame for real-valued time signals $f \in L^2$.
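⌈ Without restriction of generality we may assume

$$\beta := \frac{2\pi}{\omega' - \omega}. \tag{8}$$

On account of Parseval's formula (2.11) and rule 3.1.(8) one has

$$\|Tf\|^2 = \sum_{m,n} |\langle f, \psi_{m,n}\rangle|^2 = \sum_{m,n} \sigma^m\, \Bigl|\int \hat f(\xi)\, \overline{\hat\psi(\sigma^m\xi)}\ e^{in\sigma^m\beta\xi}\, d\xi\Bigr|^2.$$

Introducing the auxiliary function

$$g(\xi) := \hat f(\xi)\, \overline{\hat\psi(\sigma^m\xi)}$$

we can write $\|Tf\|^2$ in the form

$$\|Tf\|^2 = \sum_{m,n} \sigma^m\, \Bigl|\int g(\xi)\, e^{in\sigma^m\beta\xi}\, d\xi\Bigr|^2 = \sum_{m,n} \sigma^m\, |Q_{mn}|^2,$$

the $Q_{mn}$ being given by

$$Q_{mn} := \int_{\sigma^{-m}I} g(\xi)\, e^{in\sigma^m\beta\xi}\, d\xi$$

(note that the function $g$ is identically zero outside the interval $\sigma^{-m}I$). The functions

$$\xi \mapsto e^{in\sigma^m\beta\xi} \qquad (n \in \mathbb{Z})$$

are the trigonometric basis functions for an interval of length

$$\frac{2\pi}{\sigma^m\beta} = \sigma^{-m}\,(\omega' - \omega),$$

in particular for the interval $\sigma^{-m}I$. This indicates that the $Q_{mn}$ are essentially Fourier coefficients; in fact the formulas (2.8) give

$$Q_{mn} = \sigma^{-m}\,\frac{2\pi}{\beta}\ \hat g(-n),$$

and summing over $n$ ($m$ is fixed) gives

$$\sum_n |Q_{mn}|^2 = \sigma^{-2m}\,\frac{4\pi^2}{\beta^2} \sum_n |\hat g(n)|^2 = \sigma^{-m}\,\frac{2\pi}{\beta} \int |\hat f(\xi)|^2\, |\hat\psi(\sigma^m\xi)|^2\, d\xi.$$

At the second equality sign we have used Parseval's formula for period length $2\pi/(\sigma^m\beta)$, as quoted in (2.8). In this way we finally obtain

$$\|Tf\|^2 = \frac{2\pi}{\beta} \int_0^\infty |\hat f(\xi)|^2 \sum_m |\hat\psi(\sigma^m\xi)|^2\, d\xi = \frac{2\pi}{\beta}\,A' \int_0^\infty |\hat f(\xi)|^2\, d\xi = \frac{\pi}{\beta}\,A'\,\|f\|^2.$$

It is only at the very end that we have used the assumption that $f$ should be real-valued. In this case the identity $\hat f(-\xi) \equiv \overline{\hat f(\xi)}$ holds. ⌋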
We now are confronted with the task of producing a mother wavelet $\psi$ that satisfies the assumptions of Theorem (4.11). Since these assumptions refer to the Fourier transform $\hat\psi$, it is natural to start with $\hat\psi$. In the following example, constructed by Daubechies-Grossmann-Meyer, a suitable $\hat\psi$ is given in terms of simple formulas; the actual wavelet $\psi$ in the time domain then has to be computed numerically. Now, this Fourier inversion concerns a single function and may be performed once and for all, preceding the wavelet analysis of any time signal $f$.

① We shall need the auxiliary function

$$\nu(x) := \begin{cases} 0 & (x \le 0) \\ 10x^3 - 15x^4 + 6x^5 & (0 \le x \le 1) \\ 1 & (x \ge 1) \end{cases} \tag{9}$$

(or some other function with similar properties). In the interval $0 \le x \le 1$ this function can be written as

$$\nu(x) = 30 \int_0^x t^2\,(1 - t)^2\, dt.$$

[Figure 4.6: illustration of the integrand; horizontal axis $t$, $x$ with marks $0$, $1/2$, $1$]

Looking at the integrand on the right hand side (see Figure 4.6), we see that it has a double zero both at $t = 0$ and at $t = 1$, is otherwise positive, and is symmetrical with respect to the point $t = \frac12$. It follows that $\nu(x)$ increases monotonically from $0$ to $1$ in the interval $0 \le x \le 1$, with $C^2$-transitions at the points $x = 0$ and $x = 1$; moreover, the mentioned symmetry implies the identity

$$\nu(1 - x) = 1 - \nu(x) \qquad \forall x \in \mathbb{R}, \tag{10}$$

which is going to play a certain role later on.

Let $\sigma > 1$ and $\beta > 0$ be given, and set

$$\omega := \frac{2\pi}{(\sigma^2 - 1)\,\beta}, \qquad \omega' := \sigma^2\omega;$$

in this way (8) is fulfilled. We now define $\hat\psi$ having support $I := [\omega, \omega']$ by the formula

$$\hat\psi(\xi) := \sqrt{A'}\ \begin{cases} \sin\Bigl(\dfrac\pi2\, \nu\Bigl(\dfrac{\xi - \omega}{\sigma\omega - \omega}\Bigr)\Bigr) & (\omega \le \xi \le \sigma\omega) \\[2ex] \cos\Bigl(\dfrac\pi2\, \nu\Bigl(\dfrac{\xi - \sigma\omega}{\sigma^2\omega - \sigma\omega}\Bigr)\Bigr) & (\sigma\omega \le \xi \le \sigma^2\omega) \\[2ex] 0 & (\text{otherwise}) \end{cases} \tag{11}$$

(see Figure 4.7). The constant $A'$ appearing here is determined by the condition $\|\psi\| = 1$.
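[Figure 4.7: the function $\hat\psi$ for $\sigma = 2$, $\beta = 1$, with the points $2\pi/3$ and $4\pi/3$ marked on the $\xi$-axis]

As we remarked earlier, the function

$$W(\xi) := \sum_m |\hat\psi(\sigma^m\xi)|^2$$

is invariant with respect to the transformation $\xi \mapsto \sigma\xi$. If we restrict our attention to the $\xi$-interval $[\omega, \sigma\omega]$, then we see that only the two terms corresponding to $m = 0$ and $m = 1$ contribute anything to $W(\xi)$ at all. Therefore we have

$$W(\xi) = |\hat\psi(\xi)|^2 + |\hat\psi(\sigma\xi)|^2 = A'\,\Bigl(\sin^2\Bigl(\frac\pi2\,\nu(x)\Bigr) + \cos^2\Bigl(\frac\pi2\,\nu(x)\Bigr)\Bigr) = A',$$

where we have used the abbreviation

$$\frac{\xi - \omega}{\sigma\omega - \omega} =: x.$$

So much for $\hat\psi$. The (complex-valued) wavelet $\psi$ having the given $\hat\psi$ as its Fourier transform is shown in Figure 4.8; one observes that $\operatorname{Re}(\psi)$ is an even, $\operatorname{Im}(\psi)$ an odd function. We shall come back to this example in Section 5.3. ○

[Figure 4.8: Daubechies-Grossmann-Meyer wavelet (step sizes $\sigma = 2$, $\beta = 1$); the curve $y = \operatorname{Im}(\psi(t))$ is labeled]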
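The constancy of $\sum_m |\hat\psi(\sigma^m\xi)|^2$ for this example can be checked numerically. The following Python sketch is an illustration under stated assumptions: it implements $\nu$ as in (9) and $\hat\psi$ with a sine branch on $[\omega, \sigma\omega]$ and a cosine branch on $[\sigma\omega, \sigma^2\omega]$, where $\omega = 2\pi/((\sigma^2-1)\beta)$; the normalization $A' = 1$, the truncation of the sum over $m$ and all function names are hypothetical choices.

```python
import math

def nu(x):
    """Auxiliary function: 0 for x <= 0, 10x^3 - 15x^4 + 6x^5 on [0, 1], 1 for x >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x**3 * (10.0 - 15.0 * x + 6.0 * x * x)

def psi_hat(xi, sigma=2.0, beta=1.0, a_prime=1.0):
    """Fourier-side wavelet; a_prime plays the role of A' (set to 1 as an assumption)."""
    omega = 2.0 * math.pi / ((sigma**2 - 1.0) * beta)
    if omega <= xi <= sigma * omega:
        arg = (xi - omega) / (sigma * omega - omega)
        return math.sqrt(a_prime) * math.sin(0.5 * math.pi * nu(arg))
    if sigma * omega < xi <= sigma**2 * omega:
        arg = (xi - sigma * omega) / (sigma**2 * omega - sigma * omega)
        return math.sqrt(a_prime) * math.cos(0.5 * math.pi * nu(arg))
    return 0.0

def w_sum(xi, sigma=2.0, beta=1.0, m_range=range(-40, 41)):
    """Truncated sum over m of |psi_hat(sigma**m * xi)|**2."""
    return sum(psi_hat(sigma**m * xi, sigma, beta) ** 2 for m in m_range)

samples = [0.01, 0.37, 1.0, 2.5, 7.0, 123.4]
print([round(w_sum(xi), 12) for xi in samples])   # each value should be close to 1
print(nu(0.3) + nu(0.7))                          # symmetry nu(1 - x) = 1 - nu(x)
```

For each positive sample exactly two values of $m$ contribute, one through the sine branch and one through the cosine branch with the same argument of $\nu$, which is why the sum collapses to $A'$.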
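4.4 Proof of theorem (4.10)

The following proof is essentially taken from [D], Section 3.3.2. We are confronted with the task of estimating the sum on the right hand side of 4.3.(5) as accurately as possible. To this end, we begin with 3.1.(9):

$$Wf(a, b) = |a|^{1/2} \int \hat f(\xi)\, \overline{\hat\psi(a\xi)}\ e^{ib\xi}\, d\xi.$$

Introducing the auxiliary function

$$g(\xi) := \hat f(\xi)\, \overline{\hat\psi(a\xi)}, \tag{1}$$

we can write

$$Wf(a, nb) = |a|^{1/2} \int g(\xi)\, e^{inb\xi}\, d\xi = |a|^{1/2} \int_0^{2\pi/b} \sum_l g\Bigl(\xi + l\,\frac{2\pi}{b}\Bigr)\, e^{inb\xi}\, d\xi, \tag{2}$$

where we have tacitly assumed $b \ne 0$. The function

$$G(\xi) := \sum_l g\Bigl(\xi + l\,\frac{2\pi}{b}\Bigr)$$

is periodic and of period $\frac{2\pi}{b}$. Because of the formulas (2.8) we therefore can interpret (2) as

$$Wf(a, nb) = |a|^{1/2}\,\frac{2\pi}{b}\ \hat G(-n).$$

Taking the sum with respect to $n$ we obtain

$$\sum_n |Wf(a, nb)|^2 = |a|\,\frac{4\pi^2}{b^2} \sum_n |\hat G(n)|^2 = \frac{2\pi\,|a|}{b} \int_0^{2\pi/b} |G(\xi)|^2\, d\xi, \tag{3}$$

where at the end we used Parseval's formula for period length $\frac{2\pi}{b}$, see (2.8). We now take a closer look at the last integral:

$$\int_0^{2\pi/b} |G(\xi)|^2\, d\xi = \sum_{k,l} \int_0^{2\pi/b} g\Bigl(\xi + l\,\frac{2\pi}{b}\Bigr)\, \overline{g\Bigl(\xi + (l + k)\,\frac{2\pi}{b}\Bigr)}\, d\xi.$$

Substituting $\xi + l\,\frac{2\pi}{b} =: \xi'$, we can continue with

$$\ldots = \sum_k \int g(\xi')\, \overline{g\Bigl(\xi' + k\,\frac{2\pi}{b}\Bigr)}\, d\xi'.$$

The last expression is now inserted into (3), leading to the following intermediate result:

$$\sum_n |Wf(a, nb)|^2 = \frac{2\pi\,|a|}{b} \sum_k \int g(\xi)\, \overline{g\Bigl(\xi + k\,\frac{2\pi}{b}\Bigr)}\, d\xi.$$

Here we set

$$a := \sigma^m, \qquad b := \sigma^m\beta \qquad (m \in \mathbb{Z})$$

and sum over $m$ as well, so that we finally obtain

$$\|Tf\|^2 = \sum_{m,n} |Wf(\sigma^m, n\sigma^m\beta)|^2 = \frac{2\pi}{\beta} \sum_{k,m} Q_{km}. \tag{4}$$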
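When the $Q_{km}$ appearing on the right are unpacked using the definition (1) of $g$, they look as follows:

$$Q_{km} = \int \hat f(\xi)\, \overline{\hat\psi(\sigma^m\xi)}\ \overline{\hat f\Bigl(\xi + \frac{2k\pi}{\sigma^m\beta}\Bigr)}\ \hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\, d\xi.$$

It will turn out that the terms with $k = 0$ in (4) account for the lion's share of $\|Tf\|^2$. For this reason we collect all terms $Q_{km}$ belonging to $k \ne 0$ into a single remainder term $Q$ and write (4) in the form

$$\|Tf\|^2 = \frac{2\pi}{\beta}\,\Bigl(\int |\hat f(\xi)|^2 \sum_m |\hat\psi(\sigma^m\xi)|^2\, d\xi + Q\Bigr), \qquad Q := \sum_{k \ne 0,\ m} Q_{km}.$$

We now have to play the dominant part and the remainder term against each other. In order to bring the main line of reasoning to a close, we formulate the following lemma:

(4.12) Let $\psi$ be an admissible wavelet with parameters $\alpha$, $\rho$, $C$ and $A'$. Then there is a constant $B'$ such that

$$\sum_m |\hat\psi(\sigma^m\xi)|^2 \le B' \qquad \forall \xi \in \mathbb{R},$$

and, more important, one has

$$|Q| \le C'\,\beta^{1+\rho}\,\|f\|^2 \tag{5}$$

with a constant $C'$ that does not depend on $\beta$.

Using this lemma and, of course, Definition 4.3.(3) of the parameter $A'$ we arrive at the inequalities

$$\frac{2\pi}{\beta}\,\bigl(A' - C'\beta^{1+\rho}\bigr)\,\|f\|^2 \ \le\ \|Tf\|^2 \ \le\ \frac{2\pi}{\beta}\,\bigl(B' + C'\beta^{1+\rho}\bigr)\,\|f\|^2$$

appearing in the statement of the theorem. This completes the proof of (4.10), modulo the lemma. ⌋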
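It remains to carry out the proof of Lemma (4.12).

⌈ In order to estimate the sum $\sum_m |\hat\psi(\sigma^m\xi)|^2$ from above we have to treat the terms corresponding to $m < 0$ and to $m \ge 0$ separately, using the appropriate inequality concerning $\hat\psi$ in each of the two cases. In this way we obtain an upper bound $B'$ that is independent of $\xi$, as stated.

Now we come to (5), but this is a longer story. We regard $Q_{km}$ as a scalar product, using a suitable decomposition of the various factors appearing in the definition of $Q_{km}$. In this way we obtain, by Schwarz' inequality,

$$|Q_{km}| \le \Bigl(\int |\hat f(\xi)|^2\, |\hat\psi(\sigma^m\xi)|\, \Bigl|\hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\Bigr|\, d\xi\Bigr)^{1/2} \Bigl(\int \Bigl|\hat f\Bigl(\xi + \frac{2k\pi}{\sigma^m\beta}\Bigr)\Bigr|^2\, |\hat\psi(\sigma^m\xi)|\, \Bigl|\hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\Bigr|\, d\xi\Bigr)^{1/2}.$$

If we use the substitution $\xi + 2k\pi/(\sigma^m\beta) =: \xi'$ in the second factor, this transforms into

$$|Q_{km}| \le \Bigl(\int |\hat f(\xi)|^2\, |\hat\psi(\sigma^m\xi)|\, \Bigl|\hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\Bigr|\, d\xi\Bigr)^{1/2} \Bigl(\int |\hat f(\xi')|^2\, |\hat\psi(\sigma^m\xi')|\, \Bigl|\hat\psi\Bigl(\sigma^m\xi' - \frac{2k\pi}{\beta}\Bigr)\Bigr|\, d\xi'\Bigr)^{1/2}.$$

For the estimate (5) we now have to sum the $|Q_{km}|$ over all $k \ne 0$ and all $m$. For the inner sum (with respect to $m$) we use Schwarz' inequality in the form

$$\sum_m \sqrt{x_m\, y_m} \ \le\ \Bigl(\sum_m x_m\Bigr)^{1/2} \Bigl(\sum_m y_m\Bigr)^{1/2} \qquad (x_m,\ y_m \ge 0),$$

leading to

$$|Q| \le \sum_{k \ne 0} \Bigl(\int |\hat f(\xi)|^2 \sum_m |\hat\psi(\sigma^m\xi)|\, \Bigl|\hat\psi\Bigl(\sigma^m\xi + \frac{2k\pi}{\beta}\Bigr)\Bigr|\, d\xi\Bigr)^{1/2} \Bigl(\int |\hat f(\xi)|^2 \sum_m |\hat\psi(\sigma^m\xi)|\, \Bigl|\hat\psi\Bigl(\sigma^m\xi - \frac{2k\pi}{\beta}\Bigr)\Bigr|\, d\xi\Bigr)^{1/2}. \tag{6}$$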
In order to estimate the sums $\sum_m$ under the integral signs we introduce the auxiliary function
$$q(s) := \sup_{\xi} \sum_m |\hat\psi(\sigma^m \xi)|\,|\hat\psi(\sigma^m \xi + s)|,$$
where, as we have seen in similar cases before, it is enough to take the supremum over the set of $\xi$'s with $1 \le |\xi| \le \sigma$. In terms of this function $q(\cdot)$ the inequality (6) takes the following form:
$$|Q| \le \|f\|^2 \sum_{k \ne 0} \sqrt{q(2k\pi/\beta)\, q(-2k\pi/\beta)}. \qquad (7)$$
In estimating $q(\cdot)$ we may assume $\beta \le \pi$ from the outset; this has the consequence that only values $q(s)$ for $|s| \ge 2$ need to be considered. As in the first part of the lemma we have to treat the terms corresponding to $m < 0$ resp. to $m \ge 0$ separately. To this end we split $q(\cdot)$ into the two parts
so that in any case
(8)
We take up $m < 0$ first. The inequalities $|\xi| \le \sigma$ and $|s| \ge 2$ together imply
Therefore the assumptions on $\hat\psi$ allow the estimate
and taking the sum over all m < 0 one obtains
In the case $m \ge 0$ we argue as follows: At least one of the two numbers $|\sigma^m \xi|$ and $|\sigma^m \xi + s|$ is $\ge |s|/2$ (note that $\xi$ and $s$ may be of different signs) and at least one is $\ge |\sigma^m \xi|$. Both $|s|/2$ and $|\sigma^m \xi|$ are $\ge 1$. Since $|\hat\psi(\xi)| \le C$ for all $\xi$, these circumstances allow the following conclusion:
Taking the sum over all $m \ge 0$, we see that $q_+(\cdot)$ can be estimated as follows:
Because of (8), we now have
($|s| \ge 2$) and consequently
($k \ne 0$).
Inserting this into (7) and performing the summation over all $k \ne 0$, we finally obtain the stated estimate for $Q$:
It is easy to verify that the introduced constants $C_1, \ldots, C_4$ and $A'$ do not depend on $\beta$. ⌋
5 Multiresolution analysis
The triumphant progress wavelets have made in a great variety of applications is based in the first place on the so-called "fast algorithms" (fast wavelet transform, FWT), and these in turn owe their existence to a careful choice of the mother wavelet $\psi$. So far in this book the particular mother wavelet chosen only had to fulfill some "technical" conditions, such as $t^r \psi \in L^1$ or $\psi \in C^r$ for some $r \ge 0$ and, of course, $\hat\psi(0) = 0$ or, even better, $\psi$ should be of a certain order $N > 1$.

The trigonometric basis functions $e_\alpha\colon t \mapsto e^{i\alpha t}$ are distinguished by the following linear reproducing property: If such a function is subject to a translation $T_h$, it simply picks up a constant factor:
Contrary to this, in the realm of wavelets the operation of scaling is the central theme, i.e., for arbitrary $a \in \mathbb{R}^*$ the operation
With respect to this operation, the wavelets considered so far did not behave in a special way (except $\psi_{\mathrm{Haar}}$). OK, their graph became flattened out or got compressed in the $t$-direction, depending on the value of $a$, but there was no reproduction property in the sense that the scaled version of a $\psi$ could be related to the original $\psi$ in some other way.

In the discrete case only the integer iterates of a single scaling operation $D_\sigma$, $\sigma > 1$ denoting the zoom step, enter the picture. From now until the end of the book we choose $\sigma := 2$; by the way, this is also the value most commonly used in practice. If we now adopt a mother wavelet that in a certain way "reproduces itself" when it is subject to the scaling $D_2$, then novel and highly desirable effects develop. That's what "multiresolution analysis" is all about. To be more specific, things are arranged in such a way that the mother wavelet $\psi$ satisfies a linear identity having the following structure:
$$D_2\psi(t) \equiv \sum_{k=0}^{n} c_k\,\psi(t - k).$$
This identity carries in its wake analogous linear formulas between the scalar products $\langle f, \psi_{n,k}\rangle$ and $\langle f, \psi_{n+1,k}\rangle$, so that these scalar products (called the wavelet coefficients of $f$) need not be computed by tedious integrations over and over again when going from one zoom level to the next one. The definitive formulas will look somewhat different, but this is the general idea.
5.1 Axiomatic description

In Section 4.3 we discretized the continuous wavelet transform, and we showed that under suitable assumptions a discrete, i.e., countable, set of "wavelet measurements" $(Tf(m,n) \mid (m,n) \in \mathbb{Z}^2)$ is sufficient to allow the complete reconstruction of $f$ in the $L^2$-sense or pointwise, etc., depending on the exact circumstances. Multiresolution analysis is discrete to begin with, and the wavelet functions $\psi_{j,k}$ being used form an orthonormal basis of $L^2$ by construction, so it is not necessary to compute any $\tilde\psi_{j,k}$'s.

We now come to the formal definition. A multiresolution analysis, abbreviated MRA, is constituted by the following ingredients (a)-(c).

(a) A bilateral sequence $(V_j \mid j \in \mathbb{Z})$ of closed subspaces of $L^2$. These $V_j$ are ordered by inclusion,
$$\ldots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset \ldots \subset V_j \subset V_{j-1} \subset \ldots \subset L^2 \qquad (1)$$
(smaller values of $j$ correspond to larger spaces $V_j$!), and one has
$$\bigcap_j V_j = \{0\} \quad \text{(separation axiom)}, \qquad (2)$$
$$\overline{\bigcup_j V_j} = L^2 \quad \text{(completeness axiom)}. \qquad (3)$$
The following intuitive description will be helpful later on: The time signals $f \in V_j$ only comprise features (i.e., details) exhibiting a spread of size $\ge 2^j$ on the time axis. The more negative $j$ is, the finer are the details that may occur in an $f \in V_j$, and "in the limit" every single $f \in L^2$ can be attained by functions $f_j \in V_j$.

(b) The $V_j$ are connected to each other by a rigid scaling property:
$$V_{j+1} = D_2(V_j) \qquad \forall j \in \mathbb{Z}. \qquad (4)$$
Referring to time signals $f$ this can be expressed as follows:
$$f \in V_j \iff f(2^j\,\cdot) \in V_0. \qquad (5)$$
(c) $V_0$ contains one basis vector per base step 1. To be precise, there is a function $\phi \in L^2 \cap L^1$ such that its translates $(\phi(\cdot - k) \mid k \in \mathbb{Z})$ form an orthonormal basis of $V_0$. This function $\phi$ is commonly called the scaling function of the MRA under consideration; it is the determining element of the whole setup.

Please note: Several authors number the $V_j$'s in the reverse direction compared to (1). We stick to the ordering used in [D].

According to (c) above, the space $V_0$ can be described as a set of time signals $f$ in the following way:
$$V_0 = \Bigl\{ f \in L^2 \Bigm| f(t) = \sum_k c_k\,\phi(t - k),\ \sum_k |c_k|^2 < \infty \Bigr\}. \qquad (6)$$
Using $\phi$ as a template we now define the functions
$$\phi_{j,k}(t) := 2^{-j/2}\,\phi\Bigl(\frac{t - k\cdot 2^j}{2^j}\Bigr) = 2^{-j/2}\,\phi\Bigl(\frac{t}{2^j} - k\Bigr) \qquad (j \in \mathbb{Z},\ k \in \mathbb{Z});$$
this being in obvious concordance with the formulas defining the wavelet functions $\psi_{m,n}$. It then follows immediately from (b) that the family $(\phi_{j,k} \mid k \in \mathbb{Z})$ is an orthonormal basis of $V_j$, two subsequent functions $\phi_{j,k}$ and $\phi_{j,k+1}$ now being translated by the amount $2^j$ with respect to each other.

According to our remarks concerning point (a) above, one may interpret the orthogonal projection $P_j$ of $L^2$ onto $V_j$ as a low-pass filter: The image $P_j f$ of a time signal $f \in L^2$ incorporates all features of $f$ whose horizontal spread over the time axis is of size $2^j$ or larger. $P_j$ is given by the following formula:
$$P_j f = \sum_{k=-\infty}^{\infty} \langle f, \phi_{j,k}\rangle\,\phi_{j,k}. \qquad (7)$$
① The simplest example of an MRA is obtained as follows: Choose $\phi := 1_{[0,1[}$ and set
$$V_0 := \{ f \in L^2 \mid f \text{ constant on intervals } [k, k+1[\ \}, \qquad V_j := D_{2^j}(V_0) \quad (j \ne 0).$$
Then (b) and (c) are obviously fulfilled, and (1) is also guaranteed. The separation axiom (2) holds trivially, and completeness (3) is an immediate consequence of the fact that the step functions with jumps at the binary rationals $k \cdot 2^j$ are dense in $L^2$. If one applies the general constructions described in Sections 5.1-5.3 to this example, one obtains the Haar wavelet. We shall explore this in detail in subsequent examples. ○
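The Haar example can be made tangible on sampled data. The following is a minimal sketch, assuming the signal is already a step function on a fine dyadic grid and is stored as a list of its values; the name `haar_projection` is hypothetical and not taken from the text. Projecting onto the functions that are constant on coarser intervals then amounts to averaging over blocks of samples.

```python
def haar_projection(samples, block):
    """Replace each run of `block` consecutive samples by its mean.

    For a step function on a fine dyadic grid this is the orthogonal
    projection onto the functions constant on the coarser intervals.
    """
    if len(samples) % block != 0:
        raise ValueError("block must divide the number of samples")
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        mean = sum(chunk) / block
        out.extend([mean] * block)
    return out


signal = [4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 9.0]
coarse = haar_projection(signal, 2)    # one level coarser
coarser = haar_projection(signal, 4)   # two levels coarser
```

Projecting `coarse` once more with block size 4 gives the same result as projecting `signal` directly, which mirrors the nesting of the spaces in (1).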
Because of the inclusions (1) the $\phi_{j,k}$ cannot be brought together to form a "big" orthonormal basis of $L^2$. For this reason we construct, besides the chain of spaces $V_j$, a system $(W_j \mid j \in \mathbb{Z})$ of pairwise orthogonal subspaces $W_j \subset L^2$ in the following way: $W_j$ is the space gained in the transition from $V_j$ to the next larger space $V_{j-1}$ in the chain (1). By this intuitive description we of course mean the following: $W_j$ is the orthogonal complement of $V_j$ in $V_{j-1}$. Then one has
$$V_{j-1} = V_j \oplus W_j \qquad \forall j \in \mathbb{Z}; \qquad (8)$$
furthermore, everything is set up in such a way that the formulas analogous to (4) and (5), namely
$$W_{j+1} = D_2(W_j) \quad\text{resp.}\quad f \in W_j \iff f(2^j\,\cdot) \in W_0, \qquad (9)$$
hold likewise; their easy verification may safely be left to the reader. Bearing the chain (1) and the definition (8) of the $W_j$ in mind, the following proposition becomes plausible:
(5.1) If the system $(V_j \mid j \in \mathbb{Z})$ possesses the properties (a) of an MRA, then the corresponding subspaces $W_j$ are pairwise orthogonal, and furthermore
$$\bigoplus_{j \in \mathbb{Z}} W_j = L^2 \quad \text{(orthogonal direct sum)}. \qquad (10)$$
⌈ If $i > j$, then $W_i \subset V_{i-1} \subset V_j$, and using (8) one concludes that $W_i \perp W_j$. For the proof of (10) we need completeness (3) as well as the separation condition (2). We have to prove that $f \in L^2$ and
$$f \perp W_j \qquad \forall j \in \mathbb{Z}$$
together imply $f = 0$.
Let an $\varepsilon > 0$ be given. By (3) there is a $j_0$ and an $h_0 \in V_{j_0}$ with $\|f - h_0\| < \varepsilon$; for the sake of simplicity we may assume $j_0 = 0$. Such an $h_0 \in V_0$ being chosen, there is an $h_1 \in V_1$ and a $g_1 \in W_1$ with
$$h_0 = h_1 + g_1,$$
and similarly there are $h_2 \in V_2$ and $g_2 \in W_2$ such that
$$h_1 = h_2 + g_2.$$
Proceeding in this manner along the descending chain $V_0 \supset V_1 \supset V_2 \supset \ldots$, one arrives after $n$ steps at the representation
$$h_0 = h_n + \sum_{k=1}^{n} g_k.$$
Since all vectors appearing on the right hand side of this equation are orthogonal to each other, we have
$$\|h_n\|^2 + \sum_{k=1}^{n} \|g_k\|^2 = \|h_0\|^2 \qquad \forall n.$$
This implies that the series $\sum_{k=1}^{\infty} \|g_k\|^2$ is convergent, whence the series $\sum_{k=1}^{\infty} g_k$ converges in $L^2$, from which in turn we may conclude that the limit $\lim_{n\to\infty} h_n =: h$ exists. Consider a fixed $j \in \mathbb{Z}$. For all $n \ge j$ one has $h_n \in V_n \subset V_j$; and since the $V_j$ are closed, we also have $h \in V_j$. This being true for all $j$ we conclude from (2) that $h = 0$. This implies
$$h_0 = \sum_{k=1}^{\infty} g_k.$$
Now, by assumption, the function $f$ is orthogonal to all $g_k \in W_k$, whence we have
$$\langle f, h_0\rangle = \sum_{k=1}^{\infty} \langle f, g_k\rangle = 0.$$
This implies the inequality
$$\|f\|^2 = \|f - h_0\|^2 - \|h_0\|^2 < \varepsilon^2$$
by the Pythagorean theorem; and since $\varepsilon$ was arbitrary, we come to the conclusion that $f = 0$. ⌋

Let $Q_j$ denote the orthogonal projection of $L^2$ onto $W_j$. From (8) we conclude by general principles that
$$Q_j = P_{j-1} - P_j \quad\text{resp.}\quad P_{j-1} = P_j + Q_j.$$
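The relation between the two projections and the detail operator can be illustrated in the Haar setting on sampled data. This is only a sketch under the assumption that block averaging stands in for the projections $P_{j-1}$ and $P_j$; the helper names `block_mean` and `dot` are hypothetical.

```python
def block_mean(samples, block):
    """Piecewise-constant projection: average over runs of `block` samples."""
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        out.extend([sum(chunk) / block] * block)
    return out


def dot(u, v):
    """Euclidean scalar product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


signal = [4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 9.0]
fine = block_mean(signal, 2)      # plays the role of P_{j-1} f
coarse = block_mean(signal, 4)    # plays the role of P_j f
detail = [a - b for a, b in zip(fine, coarse)]   # Q_j f = P_{j-1} f - P_j f

# The detail is orthogonal to the coarse part, so the energies add up.
energy_gap = dot(fine, fine) - dot(coarse, coarse) - dot(detail, detail)
```

Here `dot(detail, coarse)` and `energy_gap` both vanish, which is the finite-dimensional counterpart of the orthogonal decomposition (8).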
A few moments ago we interpreted the projection $P_j$ as a low-pass filter. Pursuing such ideas further we can now say the following: $P_{j-1}f$ incorporates all features or details of the signal $f$ exhibiting a spread of size $\ge 2^{j-1}$ on the time axis, and in forming the difference $P_{j-1}f - P_j f = Q_j f$ one removes from $P_{j-1}f$ all features with a time spread of size $\ge 2^j$. In this way we can regard $Q_j$ as a kind of filter that retains resp. sieves out of $f$ just those features or details that have a time spread of size $\sim 2^j/\sqrt{2}$. Or, to look at it another way, one obtains the more detailed $P_{j-1}f$ by adjoining to $P_j f$, the latter encompassing all features of $f$ with a time spread of size $2^j$ and greater, the details of size $\sim 2^j/\sqrt{2}$ stored in the vector $Q_j f$. Looking at the orthogonal decomposition
$$V_{-1} = V_0 \oplus W_0$$
we can put forward the following naive miscalculation: To fix up the space $V_{-1}$ we need two basis vectors per unit length, and from $V_0$ we already have one basis vector per unit length at our disposal. As a consequence the space $W_0$ should get by with one basis vector per unit length as well; furthermore, on account of symmetry reasons it should be possible to arrange matters in such a way that the basis vectors of $W_0$ are integer translates of a single function $\psi$, in the same way as the basis vectors of $V_0$ are integer translates of a single $\phi$. In other words, there is some hope that we can find a function $\psi \in L^2$ such that the collection $(\psi(\cdot - k) \mid k \in \mathbb{Z})$ is an orthonormal basis of $W_0$. Such a $\psi$ would then be our mother wavelet. If one subsequently sets
$$\psi_{j,k}(t) := 2^{-j/2}\,\psi\Bigl(\frac{t - k\cdot 2^j}{2^j}\Bigr) \qquad (j \in \mathbb{Z},\ k \in \mathbb{Z}),$$
as agreed upon in the beginning of Section 4.3, then the family
$$(\psi_{j,k} \mid k \in \mathbb{Z})$$
($j$ is fixed here) is an orthonormal basis of $W_j$, and the orthogonal projection $Q_j\colon L^2 \to W_j$ is given by the following formula:
$$Q_j f = \sum_{k=-\infty}^{\infty} \langle f, \psi_{j,k}\rangle\,\psi_{j,k}.$$
The totality of all $\psi_{j,k}$, i.e., the family
$$(\psi_{j,k} \mid j \in \mathbb{Z},\ k \in \mathbb{Z}),$$
would then be an orthonormal wavelet basis of all of $L^2$ by Proposition (5.1).
The following sections are devoted to the realization of this dream. In the particularly simple case of Example ① the above naive miscalculation is actually correct, because the supports of the functions

5.2 The scaling function

The scaling function
$$\Bigl|\int \phi(t)\,dt\Bigr| = 1 \quad\text{resp.}\quad |\hat\phi(0)| = \frac{1}{\sqrt{2\pi}} \qquad (1)$$
is necessary and sufficient for 5.1.(2) and 5.1.(3). The third and last condition that we have to take into account is maybe less obvious than the first two, but it is the most crucial of them all: We have to make sure that the inclusions 5.1.(1) are guaranteed. The verification of the following lemma is left to the reader:
(5.2) Assume that a
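(5.3) For the inclusion $V_0 \subset V_{-1}$ it is necessary and sufficient that an identity of the form
$$\phi(t) = \sqrt{2} \sum_{k=-\infty}^{\infty} h_k\,\phi(2t - k) \qquad (\text{almost all } t \in \mathbb{R}) \qquad (2)$$
is valid with a coefficient vector $h_{\cdot} \in l^2(\mathbb{Z})$.

⌈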
The relations 5.1.(5) and 5.1.(6) imply
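so in order for $\phi \in V_0 \subset V_{-1}$ to hold, condition (2) is necessary. Conversely, the identity (2) implies for arbitrary $l \in \mathbb{Z}$ the identity
$$\phi(t - l) = \sqrt{2} \sum_{k=-\infty}^{\infty} h_k\,\phi\bigl(2t - (k + 2l)\bigr) \qquad (\text{almost all } t \in \mathbb{R}),$$
and as a consequence one has
$$\phi_{0,l} = \sum_{k=-\infty}^{\infty} h_k\,\phi_{-1,k+2l} \in V_{-1} \qquad \forall l \in \mathbb{Z}.$$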
Under such circumstances it is clear that arbitrary linear combinations of the $\phi_{0,l}$ are lying in $V_{-1}$ as well, thus $V_0 \subset V_{-1}$ is proven. ⌋

The identity (2) goes by the name of the scaling equation; as we have said, it controls the entire multiresolution analysis. As a matter of fact, we shall see in Theorem (6.1) resp. 6.1.(2) that the coefficient vector $h_{\cdot}$ determines the scaling function $\phi$ uniquely. The coefficients $h_k$ also appear in the corresponding algorithms; in fact, they determine more or less everything. When doing numerical computations, one does not need the scaling function $\phi$ nor the corresponding mother wavelet (that we shall construct in due course) at one's constant disposal. This is in marked contrast to Fourier analysis, where one has to compute function values $e^{i\xi}$ time and again.

The scaling equation describes a kind of "self-similarity". It can be compared with the equation
$$K = \bigcup_{i=1}^{r} f_i(K)$$
appearing in the theory of fractal sets, resp. of iterated function systems, to be exact. The $f_i$ in this latter equation are contracting similarities of the euclidean plane; along the same vein the maps $\tau \mapsto t := \frac{1}{2}(\tau + k)$, playing a key role in wavelet theory, are contracting similarities of the real axis. That the scaling function $\phi$ should have the reproducing property (2) is obviously a very strong restriction on the possible choices for such a function.

The $h_k$ cannot be chosen arbitrarily, either. Indeed, we have to make sure that the $\phi_{0,k}$ form an orthonormal basis of $V_0$. Since the scalar product in $L^2$ is translation invariant, the equations
$$\langle \phi_{0,n}, \phi_{0,0}\rangle = \delta_{0n} \qquad \forall n \in \mathbb{Z}$$
are necessary and sufficient for that. In conjunction with (2) this leads to
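$$\begin{aligned} \delta_{0n} &= \int \phi(t - n)\,\overline{\phi(t)}\,dt = 2 \sum_{k,l} h_k\,\overline{h_l} \int \phi(2t - 2n - k)\,\overline{\phi(2t - l)}\,dt \\ &= \sum_{k,l} h_k\,\overline{h_l} \int \phi(t' - 2n - k)\,\overline{\phi(t' - l)}\,dt' = \sum_{k,l} h_k\,\overline{h_l}\,\delta_{2n+k,l}. \end{aligned}$$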
We see that in order for the $\phi_{0,k}$ to be orthonormal it is necessary that the $h_k$ satisfy the so-called consistency relations
$$\sum_{k=-\infty}^{\infty} h_k\,\overline{h_{k+2n}} = \delta_{0n} \qquad \forall n \in \mathbb{Z}; \qquad (5.4)$$
in particular, one must have $\sum_k |h_k|^2 = 1$.
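The consistency relations (5.4) are easy to test numerically for a concrete real coefficient vector. The sketch below uses two assumptions that go beyond this passage: the function name `consistency_sum` is hypothetical, and the four-tap Daubechies coefficients are quoted from the standard literature as a second test case, not derived here.

```python
import math


def consistency_sum(h, n):
    """Return sum_k h[k] * h[k + 2n] for a real filter stored as (h_0, h_1, ...)."""
    total = 0.0
    for k, hk in enumerate(h):
        j = k + 2 * n
        if 0 <= j < len(h):
            total += hk * h[j]
    return total


s2, s3 = math.sqrt(2.0), math.sqrt(3.0)
haar = [1 / s2, 1 / s2]
# Four-tap Daubechies filter; an assumption taken from the standard literature.
daub4 = [(1 + s3) / (4 * s2), (3 + s3) / (4 * s2),
         (3 - s3) / (4 * s2), (1 - s3) / (4 * s2)]

for name, h in (("haar", haar), ("daub4", daub4)):
    for n in (-1, 0, 1):
        expected = 1.0 if n == 0 else 0.0
        print(name, n, abs(consistency_sum(h, n) - expected) < 1e-12)
```

For a complex coefficient vector the second factor would have to be conjugated, as in (5.4).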
While we are at it, we are going to prove a certain linear relation among the $h_k$; the condition $q \ne 0$ appearing therein is of no importance because of (1).
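(5.5) Suppose that $h_{\cdot} \in l^1(\mathbb{Z})$ and that $\int \phi(t)\,dt =: q \ne 0$. Then
$$\sum_k h_k = \sqrt{2}.$$
⌈ Integrating the scaling equation (2) from $-N$ to $N$ with respect to $t$ gives
$$\int_{-N}^{N} \phi(t)\,dt = \sqrt{2} \sum_k h_k \int_{-N}^{N} \phi(2t - k)\,dt = \frac{1}{\sqrt{2}} \sum_k h_k \int_{-2N-k}^{2N-k} \phi(t')\,dt'. \qquad (3)$$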
Since
$$\Bigl|\int_{-2N-k}^{2N-k} \phi(t')\,dt'\Bigr| \le \|\phi\|_1 \qquad \forall k \in \mathbb{Z},$$
we can apply the theorem of Lebesgue to the sum on the right hand side of (3). Letting $N \to \infty$ in (3) we obtain
$$q = \frac{1}{\sqrt{2}} \sum_k h_k\,q,$$
from which the theorem follows. ⌋

But we should be careful: Even if we have a coefficient vector $h_{\cdot} \in l^2(\mathbb{Z})$ that satisfies the relations (5.4) and (5.5), we can by no means be sure that there exists a usable function $\phi$ fulfilling the scaling equation (2).

Let us assume for the moment that a multiresolution analysis according to (a)-(c) above is given to us. If we write (2) in the form
$$\phi = \sum_k h_k\,\phi_{-1,k},$$
then we see that according to general principles about orthonormal bases one has the formula
$$h_k = \langle \phi, \phi_{-1,k}\rangle \qquad (k \in \mathbb{Z}). \qquad (4)$$
The scalar product $\langle \phi, \phi_{-1,k}\rangle$ can only be $\ne 0$ if the supports of $\phi$ and of $\phi_{-1,k}$ overlap. Thus formula (4) allows us to conclude the following:
(5.6) If the scaling function $\phi$ has compact support, then only finitely many $h_k$ are different from 0.

But one can say even more. To this end, for arbitrary functions $f\colon \mathbb{R} \to \mathbb{C}$ we define the quantities
$$a(f) := \inf\{x \mid f(x) \ne 0\} \ge -\infty, \qquad b(f) := \sup\{x \mid f(x) \ne 0\} \le \infty.$$
Thus $a(f)$ and $b(f)$ are respectively the "left end" and the "right end" of the support of $f$. In the following theorem we assume for simplicity that $\phi$ is a bona fide function, not a mere $L^2$-object.
(5.7) If the scaling function $\phi$ has compact support, then the quantities $a := a(\phi)$ and $b := b(\phi)$ are integers, and at most the $h_k$ with $a \le k \le b$ are different from 0.

⌈ One has
$$a(\phi_{-1,k}) = \tfrac{1}{2}\bigl(a(\phi) + k\bigr), \qquad b(\phi_{-1,k}) = \tfrac{1}{2}\bigl(b(\phi) + k\bigr).$$
On account of (5.6), the integers
$$k_{\min} := \min\{k \mid h_k \ne 0\}, \qquad k_{\max} := \max\{k \mid h_k \ne 0\}$$
are well defined. Considering the right hand side of the identity (2) as a superposition of congruent graphs, translated with respect to each other by steps of $\tfrac{1}{2}$, and taking the $a(\cdot)$ and the $b(\cdot)$ on both sides we see that the following is true:
$$a = \tfrac{1}{2}(a + k_{\min}), \qquad b = \tfrac{1}{2}(b + k_{\max}).$$
The last two equations give at once $k_{\min} = a$, $k_{\max} = b$; in particular, one has $h_a h_b \ne 0$ as a bonus. ⌋
Taking into account that only the $h_k$ are going to play a role in the numerical algorithms, the last two propositions make it obvious that constructing scaling functions with compact support is not a mere academic exercise. But we still have a long way to go until we are there.

① The function $\phi := \phi_{\mathrm{Haar}} := 1_{[0,1[}$ considered in Example 5.1.① satisfies
$$\phi(t) \equiv \phi(2t) + \phi(2t - 1) \quad\text{resp.}\quad \phi = \frac{1}{\sqrt{2}}\,\phi_{-1,0} + \frac{1}{\sqrt{2}}\,\phi_{-1,1}$$
(see Figure 5.1). Thus in the case at hand we have
$$h_0 = h_1 = \frac{1}{\sqrt{2}}, \qquad h_k = 0 \quad \forall k \in \mathbb{Z} \setminus \{0, 1\}. \qquad (5)$$
It is easily verified that the statements (5.4), (5.5) and (5.7) are confirmed by this example. ○
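The Haar case of the scaling equation can be checked pointwise. The following minimal sketch evaluates both sides of 5.2.(2) on a grid, using the coefficients from (5); the names `phi_haar` and `scaling_rhs` are hypothetical, and the grid of sample points is an arbitrary choice.

```python
import math


def phi_haar(t):
    """Indicator function of the half-open interval [0, 1[."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


h = {0: 1 / math.sqrt(2.0), 1: 1 / math.sqrt(2.0)}   # the values in (5)


def scaling_rhs(t):
    """Right hand side of the scaling equation: sqrt(2) * sum_k h_k * phi(2t - k)."""
    return math.sqrt(2.0) * sum(hk * phi_haar(2 * t - k) for k, hk in h.items())


grid = [i / 64 - 1.0 for i in range(3 * 64)]   # sample points in [-1, 2[
max_error = max(abs(phi_haar(t) - scaling_rhs(t)) for t in grid)
```

The smallest and largest index with a nonzero coefficient are 0 and 1, the two ends of the support of the Haar scaling function, in line with the statement about $k_{\min}$ and $k_{\max}$.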
Figure 5.1
To conclude this section, we take up a problem that we have postponed so far: We have to formulate precise assumptions on the scaling function $\phi$ that guarantee separation 5.1.(2) and completeness 5.1.(3) of the resulting family $(V_j \mid j \in \mathbb{Z})$. The following theorem shows that under very mild technical assumptions on $\phi$, condition (1), listed at the beginning of this section, is indeed the only condition for these axioms to hold.

(5.8) Assume that the scaling function $\phi \in L^2$ satisfies an estimate of the form
$$|\phi(t)| \le \frac{C}{1 + t^2} \qquad (t \in \mathbb{R}) \qquad (6)$$
and that the family $(\phi_{0,k} \mid k \in \mathbb{Z})$ is an orthonormal basis of $V_0$. Then, first, one has separation:
$$\bigcap_j V_j = \{0\}; \qquad (7)$$
and second, if and only if the integral $\int \phi(t)\,dt =: q$ has absolute value 1, one also has $\overline{\bigcup_j V_j} = L^2$, i.e., completeness.

⌈
Any $f \in V_0$ has a representation of the form $f = \sum_k f_k\,\phi_{0,k}$ with $\sum_k |f_k|^2 = \|f\|^2 < \infty$. Because of (6) we have the further estimate
$$\sum_k |\phi(t - k)|^2 \le C^2 \qquad \forall t \in \mathbb{R}$$
(with another $C$), and this implies, by Schwarz' inequality, that
$$|f(t)| \le \sum_k |f_k|\,|\phi(t - k)| \le C\,\|f\| \qquad (\text{almost all } t \in \mathbb{R}).$$
Since $f \in V_0$ was arbitrary, we therefore can say that
$$\|f\|_\infty := \operatorname*{ess\,sup}_{t \in \mathbb{R}} |f(t)| \le C\,\|f\|.$$
For a given $g \in V_j$ the function $f := g(2^j\,\cdot)$ is in $V_0$, whence we can say the following:
Now, if such a $g$ belongs to all $V_j$ ($j > 0$) simultaneously, then this is possible only if $\|g\|_\infty = 0$, whence $g = 0$. This proves (7).

The space $V := \overline{\bigcup_j V_j}$ is invariant with respect to the translations $T_k$ ($k \in \mathbb{Z}$) and the dilations $D_{2^j}$ ($j \in \mathbb{Z}$); on the other hand, the step functions with jumps at the binary rationals $k \cdot 2^j$ are dense in $L^2$. To prove the second statement it is therefore enough to prove the following: The function $f := 1_{[-1,1[}$ belongs to $V$ if and only if $|q| = 1$.

The relation $f \in V$ can be expressed as follows: The function $f$ is arbitrarily well approximated in the $L^2$-sense by its projections $P_{-j}f$ when $j \to \infty$, i.e.,
$$\lim_{j\to\infty} \|f - P_{-j}f\| = 0.$$
By general principles this is equivalent with
$$\lim_{j\to\infty} \|P_{-j}f\|^2 = \|f\|^2 = 2. \qquad (8)$$
Keep $j > 0$ fixed for the moment. By 5.1.(7) we have
$$P_{-j}f = \sum_k c_k\,\phi_{-j,k},$$
and consequently
$$\|P_{-j}f\|^2 = \sum_k |c_k|^2.$$
The $c_k$ can be computed as follows:
$$c_k = \int_{-1}^{1} \overline{\phi_{-j,k}(t)}\,dt = 2^{j/2} \int_{-1}^{1} \overline{\phi(2^j t - k)}\,dt = 2^{-j/2} \int_{-N-k}^{N-k} \overline{\phi(t')}\,dt', \qquad (9)$$
where we have written $2^j =: N$ as an abbreviation. In the following, the letter $C$ denotes various positive constants that may depend on the chosen scaling function $\phi$, but not on $j$ (resp. $N$) and $k$, and the letter $\Theta$ denotes various complex numbers of absolute value $\le 1$. From (6) we deduce for arbitrary $a > 0$ the estimate
$$\int_{|t| \ge a} |\phi(t)|\,dt < 2 \int_a^{\infty} \frac{C}{t^2}\,dt = \frac{C}{a}. \qquad (10)$$
In order to obtain additional maneuvering space in the subsequent convergence discussion we now choose an $\varepsilon \in\ ]0, 1]$. It then follows that there is an $M \in \mathbb{N}$ with
$$\int_{|t| \ge M} |\phi(t)|\,dt \le \varepsilon. \qquad (11)$$
We are now going to estimate the integral on the right hand side of (9). We may assume from the outset that $N := 2^j \ge M$ and distinguish the following three cases:

(a) If $|k| \le N - M$, then one has $-N - k \le -N + (N - M) = -M$ and analogously $N - k \ge N - (N - M) = M$. Because of (11) we therefore may conclude that
$$c_k = 2^{-j/2}\,(\bar q + \Theta\varepsilon),$$
and from this we easily obtain
$$|c_k|^2 = 2^{-j}\,(|q|^2 + C\Theta\varepsilon).$$
(b) If $N - M < |k| \le N + M$, then
$$\Bigl|\int_{-N-k}^{N-k} \phi(t)\,dt\Bigr| \le \int |\phi(t)|\,dt = C$$
implies the estimate
$$|c_k| \le 2^{-j/2}\,C.$$
(c) If $|k| > N + M$ and, e.g., $k > 0$, then for the upper limit of the integral in question one has $N - k < -M < 0$. This implies in view of (10) that the corresponding $c_k$ can be estimated as follows:
Summing over all such $k$ one obtains
Taking into account the respective numbers of $k$'s in the two cases (a) and (b) we arrive at the following representation of $\|P_{-j}f\|^2$:
$$\begin{aligned} \|P_{-j}f\|^2 = \sum_k |c_k|^2 &= \bigl(2\,(2^j - M) + 1\bigr)\,2^{-j}\,(|q|^2 + C\Theta\varepsilon) + 2^{-j}\,\Theta\Bigl(4MC + \frac{C}{M}\Bigr) \\ &= (2|q|^2 + C\Theta\varepsilon) + 2^{-j}\,\Theta\Bigl(2M(|q|^2 + C) + 4MC + \frac{C}{M}\Bigr). \end{aligned}$$
Letting $j \to \infty$ we can draw the conclusion that
$$\lim_{j\to\infty} \|P_{-j}f\|^2 = 2|q|^2 + C\Theta\varepsilon.$$
As $\varepsilon > 0$ was arbitrary we see that (8) is valid if and only if $|q| = 1$.
5.3 Constructions in the Fourier domain

Multiresolution analysis is "invariant" with respect to (a) integer translations of the time axis and (b) dilations by powers of 2. In order to make the best use of this inner symmetry we shall transfer the actual construction of admissible scaling functions $\phi$ and corresponding mother wavelets $\psi$ into the "Fourier domain". As a consequence, e.g., the orthonormality of the $\phi_{0,k} = \phi(\cdot - k)$ has to be expressed in terms of properties of $\hat\phi$; of course we also need a Fourier version of the scaling equation, and so on.
For an arbitrary function $\phi \in L^2$ one may write
$$\|\phi\|^2 = \int |\hat\phi(\xi)|^2\,d\xi = \sum_l \int_0^{2\pi} |\hat\phi(\xi + 2\pi l)|^2\,d\xi.$$
The integral on the right hand side can be thought of as an integral over $\mathbb{Z} \times [0, 2\pi]$. If one interchanges the order of integration, the function
$$\Phi(\xi) := \sum_l |\hat\phi(\xi + 2\pi l)|^2$$
appears as the new inner integral. By Fubini's theorem $\Phi$ is defined almost everywhere, first on $[0, 2\pi]$, then on all of $\mathbb{R}$, is $2\pi$-periodic, and one has
$$\int_0^{2\pi} \Phi(\xi)\,d\xi = \|\phi\|^2.$$
We first prove the following lemma:

(5.9) The integer translates $\phi_k := \phi(\cdot - k)$ of an arbitrarily given function $\phi \in L^2$ constitute an orthonormal system if and only if the following identity holds:
$$\Phi(\xi) \equiv \frac{1}{2\pi} \qquad (\text{almost all } \xi \in \mathbb{R}). \qquad (1)$$
⌈ For symmetry reasons it is enough to consider the scalar products of the form $\langle \phi_0, \phi_k\rangle$. They are computed as follows:
This implies that the orthonormality condition $\langle \phi_0, \phi_k\rangle = \delta_{0k}$ is equivalent to
$$\hat\Phi(k) = \frac{1}{2\pi}\,\delta_{0k} \qquad \forall k \in \mathbb{Z},$$
and the latter obviously means $\Phi(\xi) \equiv \frac{1}{2\pi}$ almost everywhere. ⌋
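The criterion of Lemma (5.9) can be tried out on the Haar scaling function, whose integer translates are evidently orthonormal. The sketch below sums the squared modulus of its Fourier transform over a truncated range of $l$. It assumes the normalization in which the indicator function of $[0,1[$ has a transform of modulus $\frac{1}{\sqrt{2\pi}}\,|\sin(\xi/2)/(\xi/2)|$; the function names and the truncation length are hypothetical choices.

```python
import math


def phi_hat_haar_sq(xi):
    """|phi_hat(xi)|^2 for the indicator of [0, 1[: (1/2pi) * (sin(xi/2)/(xi/2))^2."""
    if xi == 0.0:
        return 1.0 / (2.0 * math.pi)
    x = xi / 2.0
    return (math.sin(x) / x) ** 2 / (2.0 * math.pi)


def periodized(xi, terms=20000):
    """Truncated version of the sum over l of |phi_hat(xi + 2 pi l)|^2."""
    return sum(phi_hat_haar_sq(xi + 2.0 * math.pi * l)
               for l in range(-terms, terms + 1))


target = 1.0 / (2.0 * math.pi)
errors = [abs(periodized(xi) - target) for xi in (0.0, 0.5, 1.0, 2.0, 3.0)]
```

The terms decay only like $1/l^2$, so the truncated sum agrees with $1/(2\pi)$ up to an error of the order of the reciprocal of the truncation length, not to machine precision.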
The next point on our agenda is the scaling equation
$$\phi(t) \equiv \sqrt{2} \sum_k h_k\,\phi(2t - k) \qquad (\text{almost all } t \in \mathbb{R}). \qquad (2)$$
Taking the Fourier transform on both sides of (2) we obtain, using the rules (R1) and (R2), the identity
$$\hat\phi(\xi) = \frac{1}{\sqrt{2}} \sum_k h_k\,e^{-ik\xi/2}\,\hat\phi(\xi/2).$$
Looking at this formula we are led to introduce (at first only formally) the function
$$H(\xi) := \frac{1}{\sqrt{2}} \sum_k h_k\,e^{-ik\xi}; \qquad (3)$$
we call it the generating function of the multiresolution analysis under consideration. Because of $\|h_{\cdot}\| = 1$, the series (3) is almost everywhere convergent, by Theorem (2.4), and defines $H$ as an actual $2\pi$-periodic function. If only finitely many …
(5.10) The generating function $H$ of a multiresolution analysis satisfies the identity
$$|H(\omega)|^2 + |H(\omega + \pi)|^2 \equiv 1 \qquad (\text{almost all } \omega \in \mathbb{R}).$$
This of course implies that $H$ is uniformly bounded on $\mathbb{R}$:
$$|H(\omega)| \le 1 \qquad (\omega \in \mathbb{R}). \qquad (5)$$
Furthermore, since $\hat\phi(0) \ne 0$ by 5.2.(1), it follows from (4) that $H(0) = 1$, and (5.10) in turn implies $H(\pi) = 0$.
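The properties of the generating function stated in (5.10) can be observed numerically for a finite coefficient vector. The sketch below evaluates $H$ from definition (3). The function names are hypothetical, and the four-tap Daubechies coefficients are an assumption taken from the standard literature, used here only as a second test case beside the Haar values.

```python
import cmath
import math


def generating_function(h, omega):
    """H(omega) = (1/sqrt 2) * sum_k h_k * exp(-i k omega), with h given as {k: h_k}."""
    total = sum(hk * cmath.exp(-1j * k * omega) for k, hk in h.items())
    return total / math.sqrt(2.0)


s2, s3 = math.sqrt(2.0), math.sqrt(3.0)
haar = {0: 1 / s2, 1: 1 / s2}
# Four-tap Daubechies coefficients; an assumption taken from the standard literature.
daub4 = {0: (1 + s3) / (4 * s2), 1: (3 + s3) / (4 * s2),
         2: (3 - s3) / (4 * s2), 3: (1 - s3) / (4 * s2)}


def worst_defect(h, samples=200):
    """Largest deviation of |H(w)|^2 + |H(w + pi)|^2 from 1 on a grid in [0, 2 pi[."""
    worst = 0.0
    for i in range(samples):
        w = 2.0 * math.pi * i / samples
        total = (abs(generating_function(h, w)) ** 2
                 + abs(generating_function(h, w + math.pi)) ** 2)
        worst = max(worst, abs(total - 1.0))
    return worst
```

Calling `worst_defect(haar)` or `worst_defect(daub4)` returns a value at rounding-error level, and `generating_function(daub4, 0.0)` and `generating_function(daub4, math.pi)` reproduce $H(0) = 1$ and $H(\pi) = 0$ up to rounding.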
Our next goal is to describe the space $W_0$, i.e., the orthogonal complement of $V_0$ in the larger space $V_{-1}$, as explicitly as possible. Having such a description in hand we shall be able to give an explicit formula for a possible mother wavelet $\psi$ belonging to the given scaling function $\phi$.

An $f \in V_{-1}$ has a representation of the form
$$f = \sum_k f_k\,\phi_{-1,k}, \qquad f_k = \langle f, \phi_{-1,k}\rangle \quad (k \in \mathbb{Z}),$$
and taking the Fourier transform on both sides we obtain (cf. the same calculation for the scaling function $\phi$)
$$\hat f(\xi) = \frac{1}{\sqrt{2}} \sum_k f_k\,e^{-ik\xi/2}\,\hat\phi(\xi/2). \qquad (6)$$
Therefore we introduce (analogously to $H$ above) the function
$$m_f(\xi) := \frac{1}{\sqrt{2}} \sum_k f_k\,e^{-ik\xi}. \qquad (7)$$
In this way formula (6) becomes
$$\hat f(\xi) = m_f(\xi/2)\,\hat\phi(\xi/2). \qquad (8)$$
The series appearing in (7) is convergent for almost every $\xi \in \mathbb{R}/2\pi$; therefore we can say that the representation (8) is valid for almost all $\xi \in \mathbb{R}$. The above chain of arguments can be reversed: If (8) is true for some function $m_f \in L^2_\circ$, then $f \in V_{-1}$.

A function $f \in W_0 \subset V_{-1}$ is orthogonal to $V_0$, and as a consequence one has
$$\langle f, \phi_{0,k}\rangle = 0 \qquad (k \in \mathbb{Z})$$
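for such $f$, and the latter is possible only if the periodic function
$$\sum_l \hat f(\xi + 2\pi l)\,\overline{\hat\phi(\xi + 2\pi l)}$$
vanishes for almost all $\xi \in \mathbb{R}/2\pi$. In the last sum we again separate the partial sums corresponding to even resp. odd values of $l$, then we express $\hat f$ by means of (8) and analogously $\hat\phi$ by means of (4), noting that $m_f$ and $H$ are $2\pi$-periodic. Altogether we obtain the following chain of equations, where in the end we again make use of (5.9):
$$\begin{aligned} 0 &\equiv \sum_l \hat f(\xi + 4\pi l)\,\overline{\hat\phi(\xi + 4\pi l)} + \sum_l \hat f(\xi + 2\pi + 4\pi l)\,\overline{\hat\phi(\xi + 2\pi + 4\pi l)} \\ &= \sum_l m_f(\tfrac{\xi}{2})\,\overline{H(\tfrac{\xi}{2})}\,\bigl|\hat\phi(\tfrac{\xi}{2} + 2\pi l)\bigr|^2 + \sum_l m_f(\tfrac{\xi}{2} + \pi)\,\overline{H(\tfrac{\xi}{2} + \pi)}\,\bigl|\hat\phi(\tfrac{\xi}{2} + \pi + 2\pi l)\bigr|^2 \\ &= \Bigl( m_f(\tfrac{\xi}{2})\,\overline{H(\tfrac{\xi}{2})} + m_f(\tfrac{\xi}{2} + \pi)\,\overline{H(\tfrac{\xi}{2} + \pi)} \Bigr) \cdot \frac{1}{2\pi}. \end{aligned}$$
It turns out that we have proven the following identity:
$$m_f(\omega)\,\overline{H(\omega)} + m_f(\omega + \pi)\,\overline{H(\omega + \pi)} = 0 \qquad (\text{almost all } \omega \in \mathbb{R}). \qquad (9)$$
Formulas (5.10) and (9) together can be paraphrased as follows: For (almost) every fixed $\omega$ the vector
$$\mathbf{H} := \bigl(H(\omega), H(\omega + \pi)\bigr)$$
is a unit vector in the unitary space $\mathbb{C}^2$, and the vector
$$\mathbf{m}_f := \bigl(m_f(\omega), m_f(\omega + \pi)\bigr)$$
is orthogonal on $\mathbf{H}$. It is easy to see that $\mathbf{H}$ and the further vector
$$\mathbf{H}' := \bigl(\overline{H(\omega + \pi)},\, -\overline{H(\omega)}\bigr)$$
together form an orthonormal basis of $\mathbb{C}^2$. This implies by general principles that
$$\mathbf{m}_f = \lambda(\omega)\,\mathbf{H}', \qquad (10)$$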
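where the coefficient $\lambda(\omega)$ is given by the formula
$$\lambda(\omega) = \langle \mathbf{m}_f, \mathbf{H}'\rangle = m_f(\omega)\,H(\omega + \pi) - m_f(\omega + \pi)\,H(\omega).$$
The function $\omega \mapsto \lambda(\omega)$ satisfies the identity $\lambda(\omega + \pi) \equiv -\lambda(\omega)$, consequently there is a $2\pi$-periodic function $\nu(\cdot)$ such that
$$\lambda(\omega) = e^{i\omega}\,\nu(2\omega). \qquad (11)$$
Inserting this into (10) and extracting the first coordinate we obtain the following representation of $m_f$:
$$m_f(\omega) = e^{i\omega}\,\nu(2\omega)\,\overline{H(\omega + \pi)}.$$
Introducing this into (8) we finally get for $\hat f$ the expression
$$\hat f(\xi) = e^{i\xi/2}\,\nu(\xi)\,\overline{H(\xi/2 + \pi)}\,\hat\phi(\xi/2) \qquad (\text{almost all } \xi \in \mathbb{R}). \qquad (12)$$
This line of reasoning leads us to the following theorem:

(5.11) A function $f \in L^2$ belongs to the space $W_0$ if and only if there exists a function $\nu(\cdot) \in L^2_\circ$ such that $\hat f$ can be written in the form (12).

⌈ We have already shown that $f \in W_0$ implies the existence of a $2\pi$-periodic function $\nu\colon \mathbb{R} \to \mathbb{C}$ such that $\hat f$ has a representation of the form (12). Solving (11) for $\nu(\cdot)$ we get the expression $\nu(\xi) = e^{-i\xi/2}\,\lambda(\xi/2)$, and we infer from (10) that
This implies
Conversely, if (12) is true for some $\nu(\cdot) \in L^2_\circ$, then we have (8) with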
Because of (5) we may conclude that $m_f \in L^2_\circ$, and this in turn implies $f \in V_{-1}$. Furthermore, we have
proving that the vector $\mathbf{m}_f$ is orthogonal on $\mathbf{H}$ for almost all $\omega$. This means that (9) is true for almost all $\omega$; on the other hand, for an $f \in V_{-1}$ this is equivalent to $f \perp V_0$. ⌋

Inspired by the identity (12) we now define the mother wavelet $\psi$ corresponding to the given $\phi$ by the following formula:
$$\hat\psi(\xi) := e^{i\xi/2}\,\overline{H(\xi/2 + \pi)}\,\hat\phi(\xi/2). \qquad (13)$$
It appears that in doing so we are successful:
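(5.12) If the mother wavelet $\psi$ is defined by (13), then the system of functions $(\psi_{0,k} \mid k \in \mathbb{Z})$ constitutes an orthonormal basis of $W_0$.

⌈ According to (5.9) the orthonormality of the $\psi_{0,k}$ is proven by the following calculation:
$$\begin{aligned} \sum_l |\hat\psi(\xi + 2\pi l)|^2 &= \sum_l |\hat\psi(\xi + 4\pi l)|^2 + \sum_l |\hat\psi(\xi + 2\pi + 4\pi l)|^2 \\ &= |H(\tfrac{\xi}{2} + \pi)|^2 \sum_l |\hat\phi(\tfrac{\xi}{2} + 2\pi l)|^2 + |H(\tfrac{\xi}{2})|^2 \sum_l |\hat\phi(\tfrac{\xi}{2} + \pi + 2\pi l)|^2 \\ &= \bigl(|H(\tfrac{\xi}{2} + \pi)|^2 + |H(\tfrac{\xi}{2})|^2\bigr) \cdot \frac{1}{2\pi} \equiv \frac{1}{2\pi}. \end{aligned}$$
As $1 \in L^2_\circ$, it follows from (5.11) that $\psi$ is indeed in $W_0$, whence all integer translates $\psi_{0,k}$ belong to $W_0$ as well. On the other hand, consider an arbitrary $f \in W_0$. By Theorem (5.11) resp. (12) and (13) we know that there is a $\nu(\cdot) \in L^2_\circ$ such that
$$\hat f(\xi) = \nu(\xi)\,\hat\psi(\xi) \qquad (\text{almost all } \xi \in \mathbb{R}). \qquad (14)$$
The function $\nu(\cdot)$ can be developed into a Fourier series $\sum_k \nu_k\,e^{-ik\xi}$, and by Carleson's theorem (2.4) this series converges almost everywhere to $\nu(\xi)$. It follows that we can replace (14) by
$$\hat f(\xi) = \sum_k \nu_k\,e^{-ik\xi}\,\hat\psi(\xi) \qquad (\text{almost all } \xi \in \mathbb{R}).$$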
Now, this is nothing more than the Fourier transform of the representation
$$f(t) = \sum_k \nu_k\,\psi(t - k) \quad\text{resp.}\quad f = \sum_k \nu_k\,\psi_{0,k},$$
the series appearing on the right converging in $L^2$. Altogether this proves that the $\psi_{0,k}$ do indeed form an orthonormal basis of $W_0$. ⌋

The scaling function $\phi$ does not determine the corresponding mother wavelet $\psi$ uniquely, thus formula (13) can be modified to a certain degree. For instance, amending it by factors $e^{i\alpha} e^{-iN\xi}$ with $\alpha \in \mathbb{R}$, $N \in \mathbb{Z}$, is allowed. An additional factor $e^{-iN\xi}$ in $\hat\psi$ produces a translation of the graph of $\psi$ by $N$ units to the right. In this way, depending on circumstances, one can achieve that $\psi$ has the same support as $\phi$.

Formula (13) gives only the Fourier transform of the wavelet $\psi$. In order to obtain the function $\psi$ itself we have to translate (13) back into the time domain. Using (3) we get
where at the very end we performed the substitution $k := -k' - 1$ ($k' \in \mathbb{Z}$). Therefore (13) can be replaced by
$$\hat\psi(\xi) = \frac{1}{\sqrt{2}} \sum_k (-1)^{k-1}\,\overline{h_{-k-1}}\,e^{-ik\xi/2}\,\hat\phi(\xi/2). \qquad (15)$$
According to the rules (R1) and (R3) the last formula is nothing other than the Fourier transform of the representation
$$\psi(t) = \sqrt{2} \sum_k (-1)^{k-1}\,\overline{h_{-k-1}}\,\phi(2t - k). \qquad (16)$$
In order to get a well-structured set of formulas we set
$$(-1)^{k-1}\,\overline{h_{-k-1}} =: g_k. \qquad (17)$$
In this way (16) becomes
$$\psi(t) = \sqrt{2} \sum_k g_k\,\phi(2t - k), \qquad (18)$$
an identity that has the same structure as the scaling equation 5.2.(2). Another admissible definition of the $g_k$ would have been
(19)
If, e.g., only the $h_k$ for $0 \le k \le 2N - 1$ are different from zero, then (19) implies the same state of affairs for the $g_k$, and all summations in the corresponding algorithms (see Section 5.4) range over the index set $\{0, 1, \ldots, 2N - 1\}$.

Let us summarize the results obtained so far in the following theorem:

(5.13) Assume that $(V_j \mid j \in \mathbb{Z})$ is a multiresolution analysis with scaling function $\phi$ and generating function $H$, and let the mother wavelet $\psi$ be defined by (13) resp. by (16). Then the function system
$$\psi_{j,k}(t) := 2^{-j/2}\,\psi\Bigl(\frac{t - k\cdot 2^j}{2^j}\Bigr) \qquad (j \in \mathbb{Z},\ k \in \mathbb{Z})$$
is an orthonormal wavelet basis of $L^2(\mathbb{R})$.

⌈ Consider a fixed $j \in \mathbb{Z}$. Since according to (5.12) the $\psi_{0,k}$ constitute an orthonormal basis of $W_0$, it is an easy consequence of the principle 5.1.(9) and a small calculation that $(\psi_{j,k} \mid k \in \mathbb{Z})$ is an orthonormal basis of $W_j$. The theorem now follows from Proposition (5.1). ⌋
① As our first example we take up the Haar multiresolution analysis again, cf. Example 5.2.①. This time we are in a position to construct the mother wavelet $\psi$ following the prescriptions of the general theory. It is easy to verify that $\phi := \phi_{\mathrm{Haar}}$ has as its Fourier transform the function
$$\hat\phi(\xi) = \frac{1}{\sqrt{2\pi}}\,\frac{\sin(\xi/2)}{\xi/2}\,e^{-i\xi/2}. \qquad (20)$$
On the other hand we now insert the values of the $h_k$, as computed in 5.2.(5), into (3) and obtain the following generating function:
$$H(\xi) = \frac{1}{\sqrt{2}}\,\frac{1}{\sqrt{2}}\,(1 + e^{-i\xi}) = \cos\frac{\xi}{2}\;e^{-i\xi/2}. \qquad (21)$$
It is easily seen that the functional equation (4) is fulfilled in this case. The recipe (13) now gives
which is the same as 1.6.(1), up to a factor $-e^{i\xi}$. This means that the $\psi$ we have constructed here is translated one unit to the left and is multiplied by $-1$ with respect to the "official" Haar wavelet. This fact is corroborated, if we now compute the $g_k$ by means of (17):
$$g_{-1} = \overline{h_0} = \frac{1}{\sqrt{2}}, \qquad g_{-2} = -\overline{h_1} = -\frac{1}{\sqrt{2}},$$
all remaining $g_k$ being zero. This gives
$$\psi = \frac{1}{\sqrt{2}}\,\phi_{-1,-1} - \frac{1}{\sqrt{2}}\,\phi_{-1,-2}$$
resp. $\psi(t) = \phi(2t + 1) - \phi(2t + 2)$, as announced above. The reader may convince himself on his own that the alternative definition (19) of the $g_k$ (in the case at hand we have $N = 1$) would have led to the "official" $\psi_{\mathrm{Haar}}$, whose support coincides with that of $\phi_{\mathrm{Haar}}$. ○
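The passage from the $h_k$ to the $g_k$ via (17), and from there to the wavelet via (18), can be traced in a few lines for the Haar case. This is a minimal sketch for a real coefficient vector (so the complex conjugation in (17) is omitted); the dictionary layout and the function names are hypothetical.

```python
import math


def phi_haar(t):
    """Indicator function of [0, 1[."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


h = {0: 1 / math.sqrt(2.0), 1: 1 / math.sqrt(2.0)}

# Rule (17) for a real filter: g_k = (-1)^(k-1) * h_(-k-1).
# Writing k = -m - 1, the sign (-1)^(k-1) equals (-1)^m.
g = {-m - 1: (-1) ** (m % 2) * hm for m, hm in h.items()}


def psi(t):
    """psi(t) = sqrt(2) * sum_k g_k * phi(2t - k), cf. (18)."""
    return math.sqrt(2.0) * sum(gk * phi_haar(2 * t - k) for k, gk in g.items())


grid = [i / 64 - 2.0 for i in range(4 * 64)]   # sample points in [-2, 2[
mismatch = max(abs(psi(t) - (phi_haar(2 * t + 1) - phi_haar(2 * t + 2)))
               for t in grid)
```

The only nonzero entries of `g` sit at the indices $-1$ and $-2$, and `psi` agrees on the grid with $\phi(2t+1) - \phi(2t+2)$, the shifted and sign-flipped Haar wavelet described in the example.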
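② As our second example we present the so-called Meyer wavelet. For its construction we again make use of the auxiliary function

$$\nu(x) := \begin{cases} 0 & (x \le 0) \\ 10x^3 - 15x^4 + 6x^5 & (0 \le x \le 1) \\ 1 & (x \ge 1) \end{cases}$$

shown in Figure 4.6 (this $\nu(\cdot)$ has nothing to do with the $\nu$'s appearing in Theorem (5.11)). We set

$$\hat\phi(\xi) := \begin{cases} \dfrac{1}{\sqrt{2\pi}} & \bigl(|\xi| \le \frac{2\pi}{3}\bigr) \\[2mm] \dfrac{1}{\sqrt{2\pi}}\,\cos\Bigl(\dfrac{\pi}{2}\,\nu\Bigl(\dfrac{3}{2\pi}|\xi| - 1\Bigr)\Bigr) & \bigl(\frac{2\pi}{3} \le |\xi| \le \frac{4\pi}{3}\bigr) \\[2mm] 0 & \bigl(|\xi| \ge \frac{4\pi}{3}\bigr) \end{cases}$$

Figure 5.2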
(see Figure 5.2). This defines a function $\phi \in L^2$ about which we can say the following right away: From the fact that $\hat\phi$ has compact support it follows that $\phi \in C^\infty$, and because of $\hat\phi \in C^2$ the assumption 5.2.(6) of Theorem (5.8) is satisfied by $\phi$; furthermore, one has

$$\int \phi(t)\,dt = \sqrt{2\pi}\,\hat\phi(0) = 1\,,$$

as is required for $\overline{\bigcup_j V_j} = L^2$, see (5.8). In view of Proposition (5.9) we now have to examine the function

$$\Phi(\xi) := \sum_l |\hat\phi(\xi + 2\pi l)|^2\,.$$

A short glance at Figure 5.2 shows that it is sufficient to verify condition (1) in the $\xi$-interval $\bigl[\frac{2\pi}{3}, \frac{4\pi}{3}\bigr]$. In this interval only the two terms corresponding to $l = 0$ and $l = -1$ contribute anything to $\Phi(\xi)$ at all. Because of

$$\frac{3}{2\pi}\,|\xi - 2\pi| - 1 = 1 - \Bigl(\frac{3}{2\pi}\,\xi - 1\Bigr) \qquad \Bigl(\frac{2\pi}{3} \le \xi \le \frac{4\pi}{3}\Bigr)$$

and

$$\nu(1 - x) \equiv 1 - \nu(x) \qquad (x \in \mathbb{R})$$
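it follows that

$$\Phi(\xi) = \frac{1}{2\pi}\,\cos^2\Bigl(\frac{\pi}{2}\,\nu\Bigl(\frac{3}{2\pi}\,\xi - 1\Bigr)\Bigr) + \frac{1}{2\pi}\,\sin^2\Bigl(\frac{\pi}{2}\,\nu\Bigl(\frac{3}{2\pi}\,\xi - 1\Bigr)\Bigr) = \frac{1}{2\pi}$$

is valid for $\frac{2\pi}{3} \le \xi \le \frac{4\pi}{3}$, as required.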
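The identity $\Phi(\xi) \equiv \frac{1}{2\pi}$ for the Meyer scaling function can also be confirmed numerically. The sketch below is a plausibility check under the definitions of $\nu$ and $\hat\phi$ given above; the helper names `nu`, `phi_hat` and `big_phi` are chosen here for illustration only.

```python
import math


def nu(x):
    """Auxiliary function: 0 for x <= 0, 10x^3 - 15x^4 + 6x^5 on [0, 1], 1 for x >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x ** 3 * (10.0 - 15.0 * x + 6.0 * x * x)


def phi_hat(xi):
    """Fourier transform of the Meyer scaling function."""
    a = abs(xi)
    if a <= 2.0 * math.pi / 3.0:
        return 1.0 / math.sqrt(2.0 * math.pi)
    if a >= 4.0 * math.pi / 3.0:
        return 0.0
    return math.cos(0.5 * math.pi * nu(3.0 * a / (2.0 * math.pi) - 1.0)) / math.sqrt(2.0 * math.pi)


def big_phi(xi, l_max=3):
    """Phi(xi) = sum over l of |phi_hat(xi + 2 pi l)|^2; only a few l contribute."""
    return sum(phi_hat(xi + 2.0 * math.pi * l) ** 2 for l in range(-l_max, l_max + 1))


# Check 2*pi*Phi(xi) = 1 on a grid covering one full period.
grid = [-math.pi + 2.0 * math.pi * k / 400 for k in range(401)]
max_err = max(abs(2.0 * math.pi * big_phi(xi) - 1.0) for xi in grid)
print(max_err)
```

The deviation stays at the level of rounding errors, which reflects the symmetry $\nu(1-x) = 1 - \nu(x)$ used in the argument.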
Figure 5.3 The scaling function for the Meyer wavelet
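We now define (out of the blue) the $2\pi$-periodic function

$$H(\xi) := \sqrt{2\pi}\,\sum_l \hat\phi(2\xi + 4\pi l) \qquad (22)$$

(this is, for any given $\xi \in \mathbb{R}$, a finite sum!) and assert that $H$ and $\hat\phi$ are in fact related to each other by the functional equation

$$\hat\phi(\xi) = H\bigl(\tfrac{\xi}{2}\bigr)\,\hat\phi\bigl(\tfrac{\xi}{2}\bigr) \qquad (23)$$

as called for by the general theory.

⌜ The function $\xi \mapsto \hat\phi(\xi/2)$ has as its support the interval $\bigl[-\frac{8\pi}{3}, \frac{8\pi}{3}\bigr]$. On the other hand, all functions $\hat\phi(\,\cdot + 4\pi l)$ belonging to an $l \ne 0$ are identically zero on this interval. Therefore we already know that

$$H\bigl(\tfrac{\xi}{2}\bigr)\,\hat\phi\bigl(\tfrac{\xi}{2}\bigr) = \sqrt{2\pi}\;\hat\phi(\xi)\,\hat\phi\bigl(\tfrac{\xi}{2}\bigr)$$

is true. But on the support $\bigl[-\frac{4\pi}{3}, \frac{4\pi}{3}\bigr]$ of $\hat\phi$ the identity $\hat\phi(\xi/2) \equiv \frac{1}{\sqrt{2\pi}}$ is true. This implies that the right hand side of (23) has for all $\xi$ the value $\hat\phi(\xi)$, as stated. ⌟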
Figure 5.4 The Meyer wavelet
According to what we just have proven, the function $\phi$ satisfies a scaling equation as well, so that now all circumstances required for a multiresolution analysis are established. Formula (13) gives the following expression for an admissible mother wavelet in this case:

$$\begin{aligned} \hat\psi(\xi) &= e^{i\xi/2}\,\overline{H\bigl(\tfrac{\xi}{2} + \pi\bigr)}\,\hat\phi\bigl(\tfrac{\xi}{2}\bigr) \\ &= \sqrt{2\pi}\,e^{i\xi/2}\,\sum_l \hat\phi(\xi + 2\pi + 4\pi l)\;\hat\phi\bigl(\tfrac{\xi}{2}\bigr) \\ &= \sqrt{2\pi}\,e^{i\xi/2}\,\bigl(\hat\phi(\xi + 2\pi) + \hat\phi(\xi - 2\pi)\bigr)\,\hat\phi\bigl(\tfrac{\xi}{2}\bigr)\,. \end{aligned}$$

The corresponding $\psi$ is called the Meyer wavelet. One easily verifies that it is, up to the "phase factor" $e^{i\xi/2}$, nothing other than the Daubechies-Grossmann-Meyer wavelet 4.3.(11) corresponding to the step sizes $\sigma := 2$, $\beta := 1$. We refer the reader to the corresponding example in Section 4.3 for details. There the $\psi_{j,k}$ constituted only a frame. Thanks to the additional factor $e^{i\xi/2}$ provided by the general theory we now even have an orthonormal wavelet basis. In Figures 5.3 and 5.4 the scaling function $\phi$ as well as the Meyer wavelet $\psi$ are shown in the time domain. ○

At the beginning of Section 5.2 we put on record that a scaling function $\phi$ has to meet three (sets of) requirements: First, the $\phi_{0,k}$ have to constitute an orthonormal system, second, there is the normalization condition 5.2.(1) securing separation and completeness, and third, there is of course the scaling equation. We conclude the current section by showing how a given $\phi$ that
satisfies only the second and the third of these conditions can be improved in such a way that the resulting $\phi^\#$ is a scaling function belonging to the same exhaustion $(V_j \mid j \in \mathbb{Z})$ of $L^2$ and such that its integer translates $\phi^\#(\,\cdot - k)$ are in fact orthonormal.

(5.14) Assume that the function $\phi \in L^1 \cap L^2$ satisfies a scaling equation as well as the condition $\int \phi(t)\,dt \ne 0$, and let the spaces $V_j \subset L^2$ be defined by 5.1.(5)-(6). If there are constants $B \ge A > 0$ such that

$$A \;\le\; \Phi(\xi) := \sum_l |\hat\phi(\xi + 2\pi l)|^2 \;\le\; B \qquad (\text{almost all } \xi \in \mathbb{R})\,,$$

then the following are true:

(a) The family $(\phi(\,\cdot - k) \mid k \in \mathbb{Z})$ is a Riesz basis of $V_0$; in particular, it is a frame for $V_0$ with frame constants $2\pi A$ and $2\pi B$.

(b) If one defines the function $\phi^\#$ via its Fourier transform by

$$\hat\phi^\#(\xi) := \frac{\hat\phi(\xi)}{\sqrt{2\pi\,\Phi(\xi)}}\,,$$

then $\phi^\#$ determines a multiresolution analysis with the same spaces $V_j$. This means, in particular, that the functions $(\phi^\#(\,\cdot - k) \mid k \in \mathbb{Z})$ constitute an orthonormal basis of $V_0$.
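⌜ (a) We have to show that for arbitrary

$$f := \sum_k c_k\,\phi(\,\cdot - k) \in V_0$$

the following inequalities are true:

$$2\pi A \sum_k |c_k|^2 \;\le\; \|f\|^2 \;\le\; 2\pi B \sum_k |c_k|^2\,.$$

The Fourier transform of $f$ is given by

$$\hat f(\xi) = \Bigl(\sum_k c_k\,e^{-ik\xi}\Bigr)\,\hat\phi(\xi)\,,$$

therefore we have

$$\|f\|^2 = \int \Bigl|\sum_k c_k e^{-ik\xi}\Bigr|^2 |\hat\phi(\xi)|^2\,d\xi = \int_0^{2\pi} \Bigl|\sum_k c_k e^{-ik\xi}\Bigr|^2\,\Phi(\xi)\,d\xi \;\le\; B \int_0^{2\pi} \Bigl|\sum_k c_k e^{-ik\xi}\Bigr|^2 d\xi = 2\pi B \sum_k |c_k|^2\,,$$

where $\Phi(\xi) = \sum_l |\hat\phi(\xi + 2\pi l)|^2$.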
In an analogous manner one argues with respect to $A$, and (4.6) shows that the $\phi(\,\cdot - k)$ are a fortiori forming a frame.

(b) Because of $\phi \in L^1$, the function $\hat\phi$ is continuous, and this implies in turn the continuity of one after another of

$$\Phi\,, \qquad \sqrt{2\pi\Phi}\,, \qquad \frac{1}{\sqrt{2\pi\Phi}}\,.$$

The two functions $\sqrt{2\pi\Phi}$ and $1/\sqrt{2\pi\Phi}$ belong to $L^2_\circ$. Denoting the Fourier coefficients of $1/\sqrt{2\pi\Phi}$ by $a_k$, we have

$$\frac{1}{\sqrt{2\pi\,\Phi(\xi)}} = \sum_k a_k\,e^{-ik\xi} \qquad (\text{almost all } \xi \in \mathbb{R})$$

and consequently

$$\hat\phi^\#(\xi) = \sum_k a_k\,e^{-ik\xi}\,\hat\phi(\xi) \qquad (\text{almost all } \xi \in \mathbb{R})\,.$$

Translating the last equation into the time domain we come to the conclusion that

$$\phi^\# = \sum_k a_k\,\phi(\,\cdot - k) \in V_0\,,$$

and this in turn implies $V_0^\# \subset V_0$. In an analogous manner, using the Fourier expansion of $\sqrt{2\pi\Phi}$, one proves the inclusion $V_0 \subset V_0^\#$. It follows that each one of the spaces $V_j^\#$ coincides with the corresponding $V_j$.
That the $\phi^\#(\,\cdot - k)$ are orthonormal is an immediate consequence of (5.9). But our proof is not completely finished. It still remains to show that $\phi^\#$ satisfies the normalization condition 5.1.(1), which means the same as saying that the $V_j$ fulfilled the separation and the completeness axioms to begin with. Because of $V_0 \subset V_{-1}$ the modified scaling function $\phi^\#$ satisfies a certain scaling equation as well, whence also an identity of the form (4):

$$\hat\phi^\#(\xi) = H^\#\bigl(\tfrac{\xi}{2}\bigr)\,\hat\phi^\#\bigl(\tfrac{\xi}{2}\bigr)\,. \qquad (24)$$

By assumption on $\phi$ we have $\hat\phi(0) \ne 0$ and consequently $\hat\phi^\#(0) \ne 0$ as well. Therefore we may conclude from (24) that $H^\#(0) = 1$ and, what's more, that $H^\#$ is continuous in a neighbourhood of 0. Since $H^\#$ satisfies the identity (5.10) we must have $H^\#(\pi) = 0$. We now assert that the following are true:

$$\hat\phi^\#(2\pi l) = 0 \qquad \forall l \in \mathbb{Z} \setminus \{0\}\,.$$

⌜ For any given $l \ne 0$ there is an $r \ge 1$ and an $n \in \mathbb{Z}$ such that $l = 2^{r-1}(2n+1)$. If we apply (24) recursively $r$ times, we get

$$\hat\phi^\#(2\pi l) = \prod_{j=1}^{r-1} H^\#\bigl(2^{r-j}(2n+1)\pi\bigr) \cdot H^\#\bigl((2n+1)\pi\bigr)\,\hat\phi^\#\bigl((2n+1)\pi\bigr) = 0\,,$$

since $H^\#$ vanishes at odd multiples of $\pi$. ⌟

In view of what we have just shown, we now have

$$2\pi\,|\hat\phi^\#(0)|^2 = 2\pi \sum_l |\hat\phi^\#(2\pi l)|^2 = 1\,,$$

as required by 5.1.(1). ⌟
5.4 Algorithms

At this point we pause for a moment in our pursuit of the general theory, in order to present at long last the "fast algorithms" that we have repeatedly announced in earlier sections. In the framework of multiresolution analysis such algorithms lend themselves almost automatically, contrary to Fourier analysis, where it took centuries from its invention (by Euler) until the advent of the FFT.

Maybe the reader has found the numerous factors of $\sqrt{2}$ and the like appearing in the foregoing sections to be kind of a nuisance, and he very likely might have thought that such factors could have been avoided by arranging definitions and notations more carefully. The truth of the matter is that the agreements we made are very sound: Everything is set up in such a way that these annoying factors do not occur anymore where it really matters, to wit, in repetitive numerical calculations.
The motor propelling the fast wavelet algorithms is the scaling equation

$$\phi(t) = \sqrt2\,\sum_k h_k\,\phi(2t - k)\,, \qquad (1)$$

paired with the analogous equation for $\psi$. The latter can by 5.3.(18) be written in the form

$$\psi(t) = \sqrt2\,\sum_k g_k\,\phi(2t - k)\,, \qquad (2)$$

the $g_k$ appearing in (2) being related to the $h_k$ according to 5.3.(17) or 5.3.(19). From (1) we deduce, for arbitrary $j \in \mathbb{Z}$, $n \in \mathbb{Z}$, the identity

$$\phi_{j,n}(t) = 2^{-j/2}\,\phi\Bigl(\frac{t}{2^j} - n\Bigr) = 2^{-(j-1)/2}\,\sum_k h_k\,\phi\Bigl(\frac{t}{2^{j-1}} - 2n - k\Bigr)\,.$$

This may be written in the form

$$\phi_{j,n} = \sum_k h_k\,\phi_{j-1,2n+k} \qquad \forall j,\ \forall n\,, \qquad (3)$$

that is to say, as a recursion formula for $\phi_{j-1,\cdot} \rightsquigarrow \phi_{j,\cdot}$. In an analogous way one obtains from (2) the formula

$$\psi_{j,n} = \sum_k g_k\,\phi_{j-1,2n+k} \qquad \forall j,\ \forall n\,, \qquad (4)$$

which leads from the array $\phi_{j-1,\cdot}$ to the array $\psi_{j,\cdot}$.

We are now going to analyze a time signal $f \in L^2$, and having done that we are going to synthesize it back to its original appearance. In the whole process there will be a finest scale to be considered; we may assume that it belongs to the value $j = 0$. Therefore the analysis begins with the data
$$a_{0,k} := \langle f, \phi_{0,k} \rangle = \int f(t)\,\overline{\phi(t - k)}\,dt\,.$$

These values could be determined, e.g., by numerical integration. It may also be the case that $f$ is only given in the form of a discrete array $(f(k) \mid k \in \mathbb{Z})$ to begin with. In such circumstances one simply puts

$$a_{0,k} := f(k) \qquad (k \in \mathbb{Z})\,.$$
This is not so farfetched in view of the fact that $\int \phi(t)\,dt = 1$, particularly in the case when $\phi$ has a narrow support and subsequent values of $f$ do not differ much from each other. Be that as it may, for the remaining discussion our basic assumption on $f$ can be summarized as follows:

$$P_0 f = \sum_k a_{0,k}\,\phi_{0,k}\,.$$

The wavelet analysis now proceeds in the direction of increasing $j$, and this means in the direction of ever longer waves resp. toward more drawn-out features of the signal $f$. We describe right away the step $j-1 \rightsquigarrow j$. Let $j \ge 1$ and assume

$$P_{j-1} f = \sum_k a_{j-1,k}\,\phi_{j-1,k}\,, \qquad (5)$$

where the values $a_{j-1,k}$ are known and stored in an array. Intuitively speaking, the image $P_{j-1} f$ encompasses all features of $f$ having a spread of size $\ge 2^{j-1}$ on the time axis; see our detailed explanations in this regard in Section 5.1. Our first task is the computing of the quantities $a_{j,n}$ $(n \in \mathbb{Z})$. Using (3) we obtain

$$a_{j,n} := \langle f, \phi_{j,n} \rangle = \sum_k \overline{h_k}\,\langle f, \phi_{j-1,2n+k} \rangle\,,$$

so that we can write down the following recursion formula for the step from $a_{j-1,\cdot}$ to $a_{j,\cdot}$:

$$a_{j,n} = \sum_k \overline{h_k}\;a_{j-1,2n+k}\,.$$

The array $a_{j,\cdot}$ encodes the next coarser approximation of $f$, to wit

$$P_j f = \sum_k a_{j,k}\,\phi_{j,k}\,.$$

The approximations $P_{j-1} f$ and $P_j f$ are related to each other by the formula

$$P_{j-1} f = P_j f + Q_j f\,,$$

$Q_j$ denoting the orthogonal projection onto $W_j$. The image $Q_j f$ contains all features (details) of $f$ that have a time spread of size $\sim 2^j/\sqrt2$. Since $(\psi_{j,k} \mid k \in \mathbb{Z})$ is an orthonormal basis of $W_j$, we can write

$$Q_j f = \sum_k d_{j,k}\,\psi_{j,k}\,,$$
and on account of (4) the coefficients appearing here are given by

$$d_{j,n} = \langle f, \psi_{j,n} \rangle = \sum_k \overline{g_k}\,\langle f, \phi_{j-1,2n+k} \rangle\,.$$

Expressing the scalar products on the right by means of (5) we therefore obtain the following formula for the "diagonal" step from $a_{j-1,\cdot}$ to $d_{j,\cdot}$:

$$d_{j,n} = \sum_k \overline{g_k}\;a_{j-1,2n+k}\,.$$
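One analysis step can be sketched in a few lines. The code below is an illustration only: it assumes real filter coefficients supported on $0 \le k \le 2N-1$ (so that complex conjugation plays no role), stores the arrays as dictionaries indexed by integers, and uses the Haar filters with $g_k = (-1)^k h_{1-k}$ as test data. The function name `analysis_step` is hypothetical.

```python
import math


def analysis_step(a_prev, h, g):
    """One step j-1 -> j: a[n] = sum_k h[k] a_prev[2n+k], d[n] = sum_k g[k] a_prev[2n+k].

    a_prev is a dict {index: value}; h and g are lists indexed 0..2N-1 (real filters).
    """
    k_max = len(h) - 1
    # a[n] can be nonzero only if {2n, ..., 2n + k_max} meets the support of a_prev.
    n_lo = -((k_max - min(a_prev)) // 2)   # smallest n with 2n + k_max >= min index
    n_hi = max(a_prev) // 2                # largest n with 2n <= max index
    a_next, d_next = {}, {}
    for n in range(n_lo, n_hi + 1):
        a_next[n] = sum(h[k] * a_prev.get(2 * n + k, 0.0) for k in range(k_max + 1))
        d_next[n] = sum(g[k] * a_prev.get(2 * n + k, 0.0) for k in range(k_max + 1))
    return a_next, d_next


s = 1.0 / math.sqrt(2.0)
h_haar = [s, s]
g_haar = [s, -s]                 # g_k = (-1)^k h_{1-k}
a0 = {0: 4.0, 1: 2.0, 2: 5.0, 3: 5.0}
a1, d1 = analysis_step(a0, h_haar, g_haar)

energy_in = sum(v * v for v in a0.values())
energy_out = sum(v * v for v in a1.values()) + sum(v * v for v in d1.values())
print(a1, d1, energy_in, energy_out)
```

Since the step is a change of orthonormal basis, the sum of squares of the coefficients is preserved, which the last two numbers illustrate.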
The information about the time signal $f$ that was extracted in the transition from $P_{j-1} f$ to $P_j f$ is now stored in the array $d_{j,\cdot}$. Contrary to the "temporary" quantities $a_{j,k}$, the $d_{j,k}$ are actual wavelet coefficients. Altogether we obtain the following cascade, in the course of which at each step the signal $f$ is made coarser by a factor of two and at the same time details having a time spread of size $\sim 2^j/\sqrt2$ are extracted:

$$\begin{array}{ccccccccccc}
a_{0,\cdot} & \xrightarrow{\;h\;} & a_{1,\cdot} & \xrightarrow{\;h\;} & a_{2,\cdot} & \xrightarrow{\;h\;} & a_{3,\cdot} & \xrightarrow{\;h\;} & \cdots & \xrightarrow{\;h\;} & a_{J,\cdot} \\
 & \searrow{\scriptstyle g} & & \searrow{\scriptstyle g} & & \searrow{\scriptstyle g} & & & & \searrow{\scriptstyle g} & \\
 & & d_{1,\cdot} & & d_{2,\cdot} & & d_{3,\cdot} & & \cdots & & d_{J,\cdot}
\end{array} \qquad (6)$$

The wavelet analysis (6) of the given time signal $f$ is terminated after $J$ steps, where the number $J$ comes out in a natural way, see below. We now address the following question: How many arithmetical operations were necessary for this analysis? In order to fix ideas we assume from the outset that the scaling function $\phi$ has compact support. We know from (5.7) that in this case the numbers $a(\phi)$ and $b(\phi)$ are integers. In keeping with the notation used in certain famous examples later on we assume that

$$a(\phi) = 0\,, \qquad b(\phi) = 2N - 1\,, \qquad N \ge 1\,.$$

It follows from (5.7) that only the $h_k$ with $0 \le k \le 2N-1$ are different from 0, and the same is true for the $g_k$, if we agree on 5.3.(19). We introduce the following piece of notation: If $x_\cdot$ is an arbitrary array over the index set $\mathbb{Z}$, then the formulas

$$\mathrm{supp}(x_\cdot) \subset [\,p, q\,[\,, \qquad \mathrm{length}(x_\cdot) \le q - p$$
express the fact that at most the $x_k$ with $p \le k < q$ are nonzero and that at most $q - p$ individual entries are considered resp. stored at all. (The numbers $p$ and $q$ need not be integers.) The array $a_{0,\cdot}$ encodes all the information that we are going to use about the time signal $f$. For simplicity, we assume, e.g.,

$$\mathrm{supp}(a_{0,\cdot}) \subset [\,0, 2^J\,[\,, \qquad \mathrm{length}(a_{0,\cdot}) = 2^J\,.$$

We assert that under the described circumstances the supports of the arrays $a_{j,\cdot}$ can be bounded as follows:

$$\mathrm{supp}(a_{j,\cdot}) \subset [\,-2N+2,\ 2^{J-j}\,[ \qquad (j \ge 0)\,. \qquad (7)$$

⌜ For $j = 0$ the assertion is true by assumption. For the step $j-1 \rightsquigarrow j$ we may suppose that $j \ge 1$ and that

$$\mathrm{supp}(a_{j-1,\cdot}) \subset [\,-2N+2,\ q\,[\,, \qquad q := 2^{J-(j-1)}\,.$$

Because of

$$a_{j,n} = \sum_{k=0}^{2N-1} \overline{h_k}\;a_{j-1,2n+k}\,,$$

a component $a_{j,n}$ can be $\ne 0$ only if the two sets

$$\{2n,\ 2n+1,\ \ldots,\ 2n+2N-1\} \qquad\text{and}\qquad [\,-2N+2,\ q\,[$$

have a nonempty intersection, and for the latter it is necessary and sufficient that the inequalities

$$2n < q \quad\wedge\quad 2n + 2N - 1 \ge -2N + 2$$

hold. The first of these says $n < q/2 = 2^{J-j}$, the second $n \ge -2N + \frac32$. Thus we may conclude that $\mathrm{supp}(a_{j,\cdot})$ is bounded as stated in (7). ⌟

Formula (7) suggests that we terminate the process after $J$ steps, since from then on $\mathrm{supp}(a_{j,\cdot})$ stays put at $[-2N+2, 0]$. How many multiplications have been carried out up to this point? (For the sake of simplicity we are disregarding the additions here.) The computation of an individual value $a_{j,n}$ requires at most $\mathrm{length}(h_\cdot) = 2N$ multiplications. On the other hand we conclude from (7) that

$$\mathrm{length}(a_{j,\cdot}) \le 2^{J-j} + 2N - 2 \qquad (j \ge 0)\,,$$
and for $\mathrm{length}(d_{j,\cdot})$ we obviously have the same bound. Altogether we obtain the following upper bound for the total number $\mu$ of multiplications required for the complete analysis of the given signal $f$:

$$\mu \;\le\; 2 \cdot 2N \cdot \sum_{j=1}^{J} \bigl(2^{J-j} + 2N - 2\bigr) = 4N\,\bigl(2^J - 1 + J\,(2N-2)\bigr)\,.$$
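The closed form of this bound and its linear growth in the input length $2^J$ are easy to confirm by direct summation. The following lines are a small sanity check, with function names invented for this purpose.

```python
def mult_bound_sum(N, J):
    """Upper bound for the number of multiplications, summed level by level."""
    return 2 * (2 * N) * sum(2 ** (J - j) + 2 * N - 2 for j in range(1, J + 1))


def mult_bound_closed(N, J):
    """Closed form 4N (2^J - 1 + J (2N - 2)) of the same bound."""
    return 4 * N * (2 ** J - 1 + J * (2 * N - 2))


# The two expressions agree for a range of filter lengths 2N and depths J.
agree = all(mult_bound_sum(N, J) == mult_bound_closed(N, J)
            for N in range(1, 6) for J in range(1, 13))

# Ratio to 2 * length(h) * length(a_0) = 2 * 2N * 2^J for N = 3, J = 20.
ratio = mult_bound_closed(3, 20) / (2 * 6 * 2 ** 20)
print(agree, ratio)
```

For $N = 3$ and $J = 20$ the ratio exceeds 1 only by $79/2^{20}$, in line with the factor $1 + o(1)$.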
This implies

$$\mu \;\le\; 2\,\mathrm{length}(h_\cdot)\,\mathrm{length}(a_{0,\cdot})\,\bigl(1 + o(1)\bigr)\,;$$

that is to say, the number of required operations is linear in the input length.

Starting from $a_{0,\cdot}$ and proceeding in the described way we have computed in $J \ge 1$ steps the coefficient arrays

$$d_{1,\cdot}\,,\ d_{2,\cdot}\,,\ \ldots\,,\ d_{J,\cdot}\,,\ a_{J,\cdot}$$

(the intermediate or "temporary" arrays $a_{0,\cdot}, \ldots, a_{J-1,\cdot}$ are no longer needed). The total length of these arrays is about equal to $\mathrm{length}(a_{0,\cdot})$, so that at first glance we have gained nothing in terms of storage requirements. But we have to bear in mind that the individual coefficient arrays $d_{j,\cdot}$ will contain long sequences of negligible entries $d_{j,k}$, depending on the fine structure of the time signal $f$ in different regions of the $t$-axis. By disregarding all $d_{j,k}$ whose absolute value is below a certain threshold and releasing the corresponding storage cells one is able to achieve spectacular compression ratios without significant loss of information. For instructive examples in this regard, we refer the reader to [19].

Now for the synthesis: Here we obtain an algorithm of a similar simplicity. Since the step $j-1 \rightsquigarrow j$ amounts to replacing the orthonormal basis $\phi_{j-1,\cdot}$ of $V_{j-1}$ by the likewise orthonormal basis $\phi_{j,\cdot} \cup \psi_{j,\cdot}$, the reverse step $j \rightsquigarrow j-1$ does not necessitate the inversion of a certain matrix. The details are as follows: One has

$$P_{j-1} f = P_j f + Q_j f = \sum_k a_{j,k}\,\phi_{j,k} + \sum_k d_{j,k}\,\psi_{j,k}$$

and consequently

$$a_{j-1,n} = \langle P_{j-1} f, \phi_{j-1,n} \rangle = \sum_k a_{j,k}\,\langle \phi_{j,k}, \phi_{j-1,n} \rangle + \sum_k d_{j,k}\,\langle \psi_{j,k}, \phi_{j-1,n} \rangle\,.$$

The scalar products appearing on the right can be read off from (3) and (4):

$$\langle \phi_{j,k}, \phi_{j-1,n} \rangle = h_{n-2k}\,, \qquad \langle \psi_{j,k}, \phi_{j-1,n} \rangle = g_{n-2k}\,,$$
so that altogether the following synthesis formula emerges:

$$a_{j-1,n} = \sum_k h_{n-2k}\;a_{j,k} + \sum_k g_{n-2k}\;d_{j,k}\,.$$
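That the synthesis formula undoes an analysis step can be tested on a small array. The sketch below is illustrative only: it assumes real filters, uses the six Daubechies coefficients tabulated in this section together with $g_k = (-1)^k h_{5-k}$, and the names `analysis_step` and `synthesis_step` are hypothetical.

```python
H = [0.3326705529500825, 0.8068915093110924, 0.4598775021184914,
     -0.1350110200102546, -0.0854412738820267, 0.0352262918857095]
G = [(-1) ** k * H[5 - k] for k in range(6)]


def analysis_step(a_prev, h, g):
    """a[n] = sum_k h[k] a_prev[2n+k], d[n] = sum_k g[k] a_prev[2n+k] (real filters)."""
    k_max = len(h) - 1
    n_lo = -((k_max - min(a_prev)) // 2)
    n_hi = max(a_prev) // 2
    a_next, d_next = {}, {}
    for n in range(n_lo, n_hi + 1):
        a_next[n] = sum(h[k] * a_prev.get(2 * n + k, 0.0) for k in range(k_max + 1))
        d_next[n] = sum(g[k] * a_prev.get(2 * n + k, 0.0) for k in range(k_max + 1))
    return a_next, d_next


def synthesis_step(a, d, h, g):
    """a_prev[n] = sum_k h[n-2k] a[k] + sum_k g[n-2k] d[k]."""
    out = {}
    for k, a_k in a.items():
        d_k = d.get(k, 0.0)
        for m in range(len(h)):          # m = n - 2k runs over the filter support
            n = 2 * k + m
            out[n] = out.get(n, 0.0) + h[m] * a_k + g[m] * d_k
    return out


a0 = {n: float((n * n) % 7) for n in range(8)}
a1, d1 = analysis_step(a0, H, G)
back = synthesis_step(a1, d1, H, G)
max_err = max(abs(back.get(n, 0.0) - a0.get(n, 0.0)) for n in set(back) | set(a0))
print(max_err)
```

No matrix is inverted anywhere; the reconstruction error is of the order of the rounding errors, as expected for a change between orthonormal bases.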
In this way we obtain as a counterpart to (6) an "upward" cascade that takes the coefficient arrays $a_{J,\cdot}, d_{J,\cdot}, d_{J-1,\cdot}, \ldots, d_{2,\cdot}, d_{1,\cdot}$ as its input and finally returns $a_{0,\cdot}$, i.e., $P_0 f$, as its output:

$$\begin{array}{ccccccccccc}
a_{J,\cdot} & \xrightarrow{\;h\;} & a_{J-1,\cdot} & \xrightarrow{\;h\;} & a_{J-2,\cdot} & \xrightarrow{\;h\;} & \cdots & \xrightarrow{\;h\;} & a_{1,\cdot} & \xrightarrow{\;h\;} & a_{0,\cdot} \\
 & \nearrow{\scriptstyle g} & & \nearrow{\scriptstyle g} & & & & \nearrow{\scriptstyle g} & & \nearrow{\scriptstyle g} & \\
d_{J,\cdot} & & d_{J-1,\cdot} & & \cdots & & d_{2,\cdot} & & d_{1,\cdot} & &
\end{array}$$

We leave it to the reader as an exercise to compute the total number $\mu'$ of multiplications required for such a synthesis. The resulting figure will be about twice as large as the $\mu$ from the "downward" cascade (6).

The boxed formulas show that we need only a table of the $h_k$ and the $g_k$ in order to be able to begin with concrete numerical work. Neither the scaling function $\phi$ nor the mother wavelet $\psi$ have to be stored, be it numerically or otherwise, nor do they have to be recomputed on end at runtime. (By the way, one does not need to understand anything of the underlying theory either...) In [D] one finds a great number of such tables; they relate to various wavelets $\psi$ that for one reason or another have proved their worth. The following example of such a table belongs to the so-called Daubechies wavelet ${}_3\psi$ having support $[0,5]$:
  k     h_k                     g_k = (-1)^k h_{5-k}
  0     .3326705529500825       .0352262918857095
  1     .8068915093110924       .0854412738820267
  2     .4598775021184914      -.1350110200102546
  3    -.1350110200102546      -.4598775021184914
  4    -.0854412738820267       .8068915093110924
  5     .0352262918857095      -.3326705529500825
                                                        (8)
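The tabulated values can be checked against the conditions a filter pair of this kind has to satisfy. The snippet below is a sketch of such a check, assuming the normalization $\sum_k h_k = \sqrt2$ and the orthogonality relations $\sum_k h_k h_{k-2m} = \delta_{0m}$ used throughout this chapter; the helper `corr` is introduced here for illustration.

```python
import math

H = [0.3326705529500825, 0.8068915093110924, 0.4598775021184914,
     -0.1350110200102546, -0.0854412738820267, 0.0352262918857095]
G = [(-1) ** k * H[5 - k] for k in range(6)]       # g_k = (-1)^k h_{5-k}


def corr(u, v, m):
    """sum_k u[k] v[k - 2m], with entries outside 0..5 treated as zero."""
    return sum(u[k] * v[k - 2 * m] for k in range(6) if 0 <= k - 2 * m < 6)


sum_h = sum(H)                                      # should be sqrt(2)
alt_sum_h = sum((-1) ** k * H[k] for k in range(6)) # should vanish, i.e. H(pi) = 0
hh = [corr(H, H, m) for m in (0, 1, 2)]             # should be 1, 0, 0
hg = [corr(H, G, m) for m in (-2, -1, 0, 1, 2)]     # should all vanish
print(sum_h - math.sqrt(2.0), alt_sum_h, hh, hg)
```

All residuals are of the order of $10^{-16}$, so the table is consistent to the printed precision.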
We shall construct this wavelet in 6.2.② ab ovo; only there shall we see how the values of the $h_k$ tabulated above come about.

① (Continuation of 5.3.②) We have not yet computed the $h_k$ corresponding to the Meyer wavelet. That's what we are going to do now. The generating function $H(\cdot)$ is given by 5.3.(22) and is an even function, as is $\hat\phi$. Thus on account of 5.3.(3) we obtain successively

$$h_k = \frac{\sqrt2}{2\pi} \int_{-\pi}^{\pi} H(\xi)\,e^{ik\xi}\,d\xi = \frac{\sqrt2}{\pi} \int_0^{\pi} H(\xi)\,\cos(k\xi)\,d\xi = \frac{\sqrt2}{\pi} \int_0^{\pi} \sqrt{2\pi}\,\sum_l \hat\phi(2\xi + 4\pi l)\,\cos(k\xi)\,d\xi\,.$$

In the last sum, only the term corresponding to $l = 0$ is contributing anything to the integral, whence we obtain

$$h_k = h_{-k} = \frac{2}{\sqrt\pi} \int_0^{\pi} \hat\phi(2\xi)\,\cos(k\xi)\,d\xi\,.$$

These integrals now have to be computed numerically. In view of the function $\nu(\cdot)$ used in the construction, the resulting $\hat\phi$ has 4-clicks at the two points $\pm\frac{2\pi}{3}$ and 3-clicks at the two points $\pm\frac{4\pi}{3}$; apart from that it is infinitely differentiable. This implies (cf. Example 1.2.②) that for $k \to \infty$ the $h_k$ decay only like $1/k^4$. The numerical computation results in the following values:
   k    h_k = h_{-k}        k    h_k = h_{-k}
   0     .748791           16    -.000329
   1     .442347           17     .000061
   2    -.039431           18     .000333
   3    -.127928           19    -.000231
   4     .033278           20    -.000059
   5     .057120           21     .000174
   6    -.024807           22    -.000115
   7    -.025310           23    -.000027
   8     .016000           24     .000115
   9     .009538           25    -.000067
  10    -.008556           26    -.000028
  11    -.002451           27     .000066
  12     .003416           28    -.000040
  13     .000058           29    -.000015
  14    -.000647           30     .000046
  15     .000225           31    -.000027
○
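The integral formula for the Meyer coefficients lends itself to a direct numerical evaluation. The following sketch uses a composite Simpson rule; the choice of quadrature rule and of 2000 subintervals is an assumption made here and not the method used for the table above.

```python
import math


def nu(x):
    """Auxiliary function: 10x^3 - 15x^4 + 6x^5 on [0, 1], continued by 0 and 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x ** 3 * (10.0 - 15.0 * x + 6.0 * x * x)


def phi_hat(xi):
    """Fourier transform of the Meyer scaling function."""
    a = abs(xi)
    if a <= 2.0 * math.pi / 3.0:
        return 1.0 / math.sqrt(2.0 * math.pi)
    if a >= 4.0 * math.pi / 3.0:
        return 0.0
    return math.cos(0.5 * math.pi * nu(3.0 * a / (2.0 * math.pi) - 1.0)) / math.sqrt(2.0 * math.pi)


def meyer_h(k, steps=2000):
    """h_k = (2/sqrt(pi)) * integral over [0, pi] of phi_hat(2 xi) cos(k xi), by Simpson's rule."""
    dx = math.pi / steps                      # steps must be even

    def f(xi):
        return phi_hat(2.0 * xi) * math.cos(k * xi)

    total = f(0.0) + f(math.pi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(i * dx)
    return 2.0 / math.sqrt(math.pi) * total * dx / 3.0


h = [meyer_h(k) for k in range(4)]
print(h)
```

The first values obtained in this way can be compared with the entries for $k = 0, 1, 2, 3$ of the table.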
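6 Orthonormal wavelets with compact support

6.1 The basic idea

We are confronted with the task of producing scaling functions $\phi\colon \mathbb{R} \to \mathbb{C}$ having the following properties:

(a) $\phi \in L^2$, $\mathrm{supp}(\phi)$ compact,

(b) $\phi(t) \equiv \sqrt2\,\sum_k h_k\,\phi(2t - k)$ resp. $\hat\phi(\xi) = H\bigl(\tfrac{\xi}{2}\bigr)\,\hat\phi\bigl(\tfrac{\xi}{2}\bigr)$,

(c) $\int \phi(t)\,dt = 1$ resp. $\hat\phi(0) = \frac{1}{\sqrt{2\pi}}$,

(d) $\int \phi(t)\,\overline{\phi(t-k)}\,dt = \delta_{0k}$ $\ \forall k$, resp. $\sum_l |\hat\phi(\xi + 2\pi l)|^2 \equiv \frac{1}{2\pi}$.

If all these conditions are met, then Theorem (5.13) will provide us with an orthonormal basis of wavelets $\psi_{j,k}$ having compact support.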
Condition (a) immediately implies $\phi \in L^1$ and $\hat\phi \in C^\infty$; furthermore, we know from (5.6) that only finitely many $h_k$ are nonzero. It follows that the generating function

$$H(\xi) := \frac{1}{\sqrt2}\,\sum_k h_k\,e^{-ik\xi}$$

is a trigonometric polynomial satisfying the identity

$$|H(\xi)|^2 + |H(\xi + \pi)|^2 \equiv 1 \qquad (1)$$

and having the special values $H(0) = 1$, $H(\pi) = 0$; see (5.10).

The systematic construction of polynomials with these properties is an algebraic problem that we shall take up in the next section. For the moment we assume that we have such an $H$ at our disposal, and we begin our undertaking by showing that the corresponding scaling function $\phi$, if there is one at all, is uniquely determined by $H$. Applying (b) recursively $r$ times we obtain

$$\hat\phi(\xi) = \prod_{j=1}^{r} H\Bigl(\frac{\xi}{2^j}\Bigr) \cdot \hat\phi\Bigl(\frac{\xi}{2^r}\Bigr)$$

and, therefore, because of (c),

$$\hat\phi(\xi) = \frac{1}{\sqrt{2\pi}}\,\lim_{r\to\infty}\,\prod_{j=1}^{r} H\Bigl(\frac{\xi}{2^j}\Bigr)\,, \qquad (2)$$

if the infinite product converges. In this regard, we show:

(6.1) Assume that the generating function $H \in C^1$ satisfies the identity (1) as well as $H(0) = 1$. Then the product (2) converges locally uniformly on $\mathbb{R}$ to a function $\hat\phi \in L^2$.
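For the Haar generating function $H(\xi) = \frac12(1 + e^{-i\xi})$ the limit of the product (2) is known in closed form from 5.3.(20), which makes it a convenient test case. The sketch below truncates the product after $r = 40$ factors; the truncation depth and the function names are choices made here for illustration.

```python
import cmath
import math


def H_haar(xi):
    """Haar generating function H(xi) = (1 + exp(-i xi)) / 2."""
    return 0.5 * (1.0 + cmath.exp(-1j * xi))


def phi_hat_product(H, xi, r=40):
    """Truncated product (2): (1/sqrt(2 pi)) * prod_{j=1..r} H(xi / 2^j)."""
    p = 1.0 + 0.0j
    for j in range(1, r + 1):
        p *= H(xi / 2.0 ** j)
    return p / math.sqrt(2.0 * math.pi)


def phi_hat_haar(xi):
    """Closed form 5.3.(20): (1/sqrt(2 pi)) * sin(xi/2)/(xi/2) * exp(-i xi/2)."""
    if xi == 0.0:
        return complex(1.0 / math.sqrt(2.0 * math.pi))
    return math.sin(0.5 * xi) / (0.5 * xi) * cmath.exp(-0.5j * xi) / math.sqrt(2.0 * math.pi)


points = [0.0, 0.7, 3.0, 10.0, -5.5]
max_err = max(abs(phi_hat_product(H_haar, xi) - phi_hat_haar(xi)) for xi in points)
print(max_err)
```

The same routine `phi_hat_product` can be fed with any other trigonometric polynomial $H$ satisfying $H(0) = 1$.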
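⌜ Setting

$$\max_\xi |H'(\xi)| =: M$$

and using the mean value theorem of differential calculus we obtain

$$|H(\xi) - 1| = |H(\xi) - H(0)| \le M\,|\xi| \qquad (\xi \in \mathbb{R})\,,$$

therefore we may conclude

$$\Bigl|H\Bigl(\frac{\xi}{2^j}\Bigr) - 1\Bigr| \le \frac{M\,|\xi|}{2^j} \qquad (j \ge 0)\,.$$

Because $\sum_{j \ge 1} 2^{-j} = 1$, this implies by general principles that the product (2) is converging locally uniformly to a continuous function $\hat\phi\colon \mathbb{R} \to \mathbb{C}$. In order to prove $\hat\phi \in L^2$ we have to modify the limiting process leading from $H$ to $\hat\phi$ slightly by means of a "cut-off function". To this end we set

$$\hat f_0(\xi) := \frac{1}{\sqrt{2\pi}}\,1_{[-\pi,\pi[}(\xi)$$

and define recursively, as in (b),

$$\hat f_{r+1}(\xi) := H\bigl(\tfrac{\xi}{2}\bigr)\,\hat f_r\bigl(\tfrac{\xi}{2}\bigr) \qquad (r \ge 0)\,. \qquad (3)$$

This implies

$$\hat f_r(\xi) = \frac{1}{\sqrt{2\pi}}\,\prod_{j=1}^{r} H\Bigl(\frac{\xi}{2^j}\Bigr) \cdot 1_{[-2^r\pi,\,2^r\pi[}(\xi)\,. \qquad (4)$$

For any given $\xi \in \mathbb{R}$ there is an $r_0$ such that

$$-2^r\pi \le \xi < 2^r\pi \qquad \forall r > r_0\,,$$

showing that the "cut-off factor" in (4) has no effect as soon as $r > r_0$. Therefore the comparison with (2) proves

$$\lim_{r\to\infty} \hat f_r(\xi) = \hat\phi(\xi) \qquad (\xi \in \mathbb{R})\,;$$

moreover, we have locally uniform convergence of the $\hat f_r$ as well. The next point on the agenda is the following lemma: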
(6.2) For each $r \ge 0$ the family $(f_r(\,\cdot - k) \mid k \in \mathbb{Z})$ is an orthonormal system.

⌜ Because of Proposition (5.9) the assertion of the lemma is equivalent to

$$\Phi_r(\xi) := \sum_l |\hat f_r(\xi + 2\pi l)|^2 \equiv \frac{1}{2\pi} \qquad (r \ge 0)\,. \qquad (5)$$

Now the recursion formula (3) for the $\hat f_r$ implies the following recursion formula for the functions $\Phi_r$:

$$\Phi_{r+1}(\xi) = \sum_l |\hat f_{r+1}(\xi + 4\pi l)|^2 + \sum_l |\hat f_{r+1}(\xi + 2\pi + 4\pi l)|^2 = \bigl|H\bigl(\tfrac{\xi}{2}\bigr)\bigr|^2\,\Phi_r\bigl(\tfrac{\xi}{2}\bigr) + \bigl|H\bigl(\tfrac{\xi}{2} + \pi\bigr)\bigr|^2\,\Phi_r\bigl(\tfrac{\xi}{2} + \pi\bigr)\,.$$

Since statement (5) is obviously true in the case $r = 0$, the last equation and (1) together imply that it is true for all $r \ge 0$. ⌟

In particular we have $\|f_r\|^2 = 1$ for all $r \ge 0$. Using Fatou's lemma we therefore may draw the following conclusion about the limit function $\hat\phi$:

$$\int |\hat\phi(\xi)|^2\,d\xi \;\le\; \liminf_{r\to\infty} \int |\hat f_r(\xi)|^2\,d\xi = 1\,.$$

This proves $\phi \in L^2$. ⌟

The existence of a scaling function $\phi$ corresponding to the given $H$ being established, we now have to take care of $\mathrm{supp}(\phi)$. How can we be certain that the scaling function (2) indeed has compact support, given that only finitely many $h_k$ are nonzero? The functions $f_r$ that were used in the proof of Theorem (6.1) and are converging to $\phi$ in $L^2$ certainly do not have compact support; in fact, they are holomorphic functions of the complex variable $t$, since the sets $\mathrm{supp}(\hat f_r)$ are compact. In order to get control over $\mathrm{supp}(\phi)$ we have to argue directly in the time domain. So let us assume that

$$a(h_\cdot) := \min\{k \mid h_k \ne 0\} = 0\,, \qquad b(h_\cdot) := \max\{k \mid h_k \ne 0\} = 2N - 1\,. \qquad (6)$$

If the resulting $\phi$ has indeed compact support, then we know from (5.7) that the latter is bounded below by $a(\phi) = 0$ and above by $b(\phi) = 2N-1$. We now construct a second sequence $(g_r \mid r \ge 0)$ that converges in some sense to $\phi$;
but this time we make sure that the supports of all $g_r$ are lying in the interval $[0, 2N-1]$ we are aspiring to. For the definition of such a sequence we recall the reproducing property of $\phi$ encoded in the scaling equation 5.2.(2). It can be expressed as follows: The scaling function $\phi$ is a fixed point of the transformation

$$Sg(t) := \sqrt2\,\sum_{k=0}^{2N-1} h_k\,g(2t - k)\,.$$

In functional analysis the common procedure to determine a fixed point of some mapping $S$ is the following: One chooses a suitable starting point $g_0$ and defines recursively a sequence $(g_r \mid r \ge 0)$ by the formula

$$g_{r+1} := S g_r \qquad (r \ge 0)\,. \qquad (7)$$

If one is lucky, this sequence converges to "the" fixed point $\phi$ of $S$. In view of 5.2.(1), in the case at hand we choose $g_0 := 1_{[0,1[}$ and define the sequence $(g_r \mid r \ge 0)$ by (7). The first thing we prove is

$$\mathrm{supp}(g_r) \subset [0, 2N-1] \qquad \forall r \ge 0\,. \qquad (8)$$

⌜ Because $N \ge 1$, the assertion is true for $r = 0$. If (8) is valid for a certain $r$, then the value $g_{r+1}(t) = S g_r(t)$ has to be 0, unless the two sets

$$\{2t - (2N-1),\ \ldots,\ 2t - 1,\ 2t\} \qquad\text{and}\qquad [0, 2N-1]$$

have a nonempty intersection; for this to be the case, the inequalities

$$2t \ge 0 \quad\wedge\quad 2t - (2N-1) \le 2N-1$$

must hold, which is the same thing as saying that $0 \le t \le 2N-1$. ⌟

The effect of $S$ in the Fourier domain is obviously given by

$$\widehat{Sg}(\xi) = H\bigl(\tfrac{\xi}{2}\bigr)\,\hat g\bigl(\tfrac{\xi}{2}\bigr)\,,$$

and iterating this $r$ times produces for our $g_r$ the formula

$$\hat g_r(\xi) = \prod_{j=1}^{r} H\Bigl(\frac{\xi}{2^j}\Bigr) \cdot \hat g_0\Bigl(\frac{\xi}{2^r}\Bigr)\,.$$
Now by 5.3.(20) we have

$$\hat g_0(\xi) = \frac{1}{\sqrt{2\pi}}\,e^{-i\xi/2}\,\mathrm{sinc}\Bigl(\frac{\xi}{2}\Bigr)$$

and therefore

$$\lim_{r\to\infty} \hat g_0\Bigl(\frac{\xi}{2^r}\Bigr) = \frac{1}{\sqrt{2\pi}}\,. \qquad (9)$$

This implies that at least in the Fourier domain we have what we hoped for, that is to say

$$\lim_{r\to\infty} \hat g_r(\xi) = \hat\phi(\xi) \qquad (\xi \in \mathbb{R})\,,$$

the convergence being locally uniform on $\mathbb{R}$. How well the $g_r$ themselves converge to $\phi$ depends strongly on the regularity properties of $\phi$, and these we don't know. At the moment the "function" $\phi$ is but an $L^2$ object. Nevertheless, it makes sense to talk about the support of $\phi$. The statement $\mathrm{supp}(\phi) \subset [0, 2N-1]$ can be deemed true, if

$$\int_{\mathbb{R} \setminus [0, 2N-1]} |\phi(t)|^2\,dt = 0$$

is guaranteed, and for the latter it is sufficient for $\phi$ to be orthogonal to all test functions $u \in C^2$ having compact support disjoint from $[0, 2N-1]$. This is precisely what we are going to prove in the following lemma:

(6.3) Let $u$ be a $C^2$-function having a support that is compact and disjoint from the interval $[0, 2N-1]$. Then

$$\langle \phi, u \rangle = \int \phi(t)\,\overline{u(t)}\,dt = 0\,.$$

⌜ Let an $\varepsilon > 0$ be given. By assumption on $u$ we know that $\hat u \in L^1$, thus there is an $M > 0$ such that

$$\int_{|\xi| \ge M} |\hat u(\xi)|\,d\xi \le \varepsilon\,.$$

Such an $M$ being fixed, one can find an $r \ge 0$ such that

$$|\hat\phi(\xi) - \hat g_r(\xi)| \le \varepsilon \qquad (|\xi| \le M)\,;$$
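furthermore, we deduce from 5.3.(5) and (9) that

$$|\hat\phi(\xi)| \le \frac{1}{\sqrt{2\pi}}\,, \qquad |\hat g_r(\xi)| \le \frac{1}{\sqrt{2\pi}} \qquad \forall \xi \in \mathbb{R},\ \forall r \ge 0\,.$$

In view of (8) the supports of $g_r$ and $u$ are disjoint, therefore we can write

$$\langle \phi, u \rangle = \langle g_r, u \rangle + \langle \phi - g_r, u \rangle = 0 + \langle \hat\phi - \hat g_r, \hat u \rangle\,,$$

so that we obtain the estimate

$$|\langle \phi, u \rangle| \;\le\; \int_{-M}^{M} |\hat\phi(\xi) - \hat g_r(\xi)|\,|\hat u(\xi)|\,d\xi + \int_{|\xi| \ge M} \bigl(|\hat\phi(\xi)| + |\hat g_r(\xi)|\bigr)\,|\hat u(\xi)|\,d\xi \;\le\; \Bigl(\|\hat u\|_1 + \sqrt{\tfrac{2}{\pi}}\Bigr)\,\varepsilon\,.$$

Since $\varepsilon > 0$ was arbitrary, we must have $\langle \phi, u \rangle = 0$, as stated in the lemma. ⌟

Altogether, we have arrived at the following theorem: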
(6.4) Assume that the coefficient vector $h_\cdot$ is bounded by (6) and that the corresponding function $H$ satisfies the identity (1), as well as $H(0) = 1$. Then the scaling equation admits a unique solution $\phi \in L^2$, and $\phi$ has compact support in the interval $[0, 2N-1]$.

By the way, the iteration procedure that we have used in the proof of (6.4) can easily be implemented for the actual numerical construction of $\phi$ as well. Figures 6.1 and 6.3 show the approximating step functions $g_r$ together with the limiting scaling function $\phi$.
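A possible implementation of this iteration is sketched below. It rests on the observation that $g_r$ is a step function which is constant on the dyadic intervals $[m\,2^{-r}, (m+1)\,2^{-r}[$, so that $g_r$ can be stored as a table of its values on these intervals. The dictionary representation, the function name `cascade` and the choice of the six-coefficient Daubechies filter as input are assumptions made for this illustration.

```python
import math

H = [0.3326705529500825, 0.8068915093110924, 0.4598775021184914,
     -0.1350110200102546, -0.0854412738820267, 0.0352262918857095]


def cascade(h, r):
    """Return g_r = S^r g_0 as a dict {m: value on [m 2^-r, (m+1) 2^-r)}, with g_0 = 1_[0,1)."""
    table = {0: 1.0}
    for level in range(r):
        shift = 2 ** level
        nxt = {}
        # g_{level+1}(t) = sqrt(2) sum_k h_k g_level(2t - k): the interval with index m
        # at the current level feeds the intervals m + k * 2^level at the next level.
        for m, value in table.items():
            for k, h_k in enumerate(h):
                idx = m + k * shift
                nxt[idx] = nxt.get(idx, 0.0) + math.sqrt(2.0) * h_k * value
        table = nxt
    return table


r = 6
g_r = cascade(H, r)
left_end = min(g_r) / 2.0 ** r            # left end of the support of g_r
right_end = (max(g_r) + 1) / 2.0 ** r     # right end of the support of g_r
integral = sum(g_r.values()) / 2.0 ** r   # integral of the step function g_r
print(left_end, right_end, integral)
```

The computed support stays inside $[0, 5]$, in agreement with (8) for $N = 3$, and the integral of $g_r$ remains equal to 1 because $\sum_k h_k = \sqrt2$.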
Figure 6.1 Iterative construction of Daubechies' scaling function ${}_2\phi$
In view of (6.1) resp. (6.4) the scaling function $\phi$ is uniquely determined by $H$ and explicitly given by (2). Therefore the following procedure suggests itself: One chooses a trigonometric polynomial $H$ that satisfies the identity (1), as well as $H(0) = 1$, and defines $\hat\phi$ by (2). Then (a), (b) and (c) at the beginning of this section are fulfilled automatically; it remains to prove (d). The following example shows that the consistency conditions encoded by (1) are necessary, but unfortunately not sufficient for (d).

① Taking off from Example 5.3.① we define

$$H(\xi) := \frac12\,\bigl(1 + e^{-3i\xi}\bigr) = e^{-3i\xi/2}\,\cos\frac{3\xi}{2}\,.$$

The identity (1) is fulfilled in this case:

$$|H(\xi)|^2 + |H(\xi + \pi)|^2 = \cos^2\frac{3\xi}{2} + \sin^2\frac{3\xi}{2} = 1\,.$$

The uniquely determined solution of the functional equation (b) that also satisfies (c) can be written down explicitly; it is

$$\hat\phi(\xi) = \frac{1}{\sqrt{2\pi}}\,e^{-3i\xi/2}\,\frac{\sin(3\xi/2)}{3\xi/2}\,.$$

Taking the inverse Fourier transform one gets

$$\phi(t) = \begin{cases} \frac13 & (0 \le t < 3) \\ 0 & (\text{otherwise}) \end{cases}$$

It is easy to see that the functions $\phi_{0,k} = \phi(\,\cdot - k)$ $(k \in \mathbb{Z})$ are not orthonormal. On the other hand it can be shown that the $\psi_{j,k}$ derived from this particular $\phi$ constitute a tight frame for $L^2$, see [D], Proposition 6.3.2. ○
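Both statements about this example, the validity of the scaling equation and the failure of orthonormality, can be checked with exact rational arithmetic. The sketch below assumes $\phi = \frac13\,1_{[0,3[}$ and the filter $h_0 = h_3 = 1/\sqrt2$, so that the scaling equation reads $\phi(t) = \phi(2t) + \phi(2t-3)$; the function names are illustrative.

```python
from fractions import Fraction


def phi(t):
    """phi = 1/3 on [0, 3), zero elsewhere."""
    return Fraction(1, 3) if 0 <= t < 3 else Fraction(0)


def scaling_rhs(t):
    """sqrt(2) * (h_0 phi(2t) + h_3 phi(2t - 3)) with h_0 = h_3 = 1/sqrt(2)."""
    return phi(2 * t) + phi(2 * t - 3)


def overlap(k, step=Fraction(1, 8)):
    """Integral of phi(t) phi(t - k), exact for step functions with integer breakpoints."""
    grid = [n * step for n in range(-40, 80)]
    return sum(phi(t) * phi(t - k) for t in grid) * step


samples = [Fraction(n, 8) for n in range(-16, 41)]
scaling_ok = all(phi(t) == scaling_rhs(t) for t in samples)
print(scaling_ok, overlap(0), overlap(1), overlap(2), overlap(3))
```

The overlaps $\frac13, \frac29, \frac19, 0$ show that the translates are neither normalized nor mutually orthogonal, although the scaling equation holds.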
Various additional assumptions on $H$ have been proposed to make property (d) come true; as a matter of fact the gap is not wide. We shall treat two such attempts in what follows. The following variant is due to Mallat [12]:

(6.5) Assume that the generating function $H \in C^1$ satisfies the identity (1) and $H(0) = 1$, as well as the additional condition

$$H(\xi) \ne 0 \qquad \bigl(|\xi| \le \tfrac{\pi}{2}\bigr)\,, \qquad (10)$$

and let $\hat\phi$ be defined by (2). Then the functions $\phi_{0,k}$ $(k \in \mathbb{Z})$ constitute an orthonormal basis of $V_0$.
We have to show that the orthonormality (6.2) of the functions];. ( . - k) is preserved in the limit. That's where the extra hypothesis (10) comes in.
If 1.;1 ~ 7r, then one has H(';/2 j ) :I 0 for all j ~ 1, and this implies by definition of the convergence of an infinite product that ¢;(.;) :I o. Because of the locally uniform convergence of (2) we know that ¢; is continuous, therefore we can find a 5 > 0 such that
(1.;1
~ 7r) .
(11)
A moment's reflection will show that the function];. can also be written in the following alternative way:
(otherwise) In view of (11) this implies that the universal estimate
is valid, so that in the concluding formula
J
¢(t) ¢(t - k) dt =
J
1¢;(.;Weikf; d';
= r->oo lim
J1];.(';)1
2
eikf; d';
= OOk
Vk
E
7l
we are allowed to apply Lebesgue's theorem (on limits under the integral sign) .
.-J
CD
(Continued) In order to see what went wrong in this example we compute
~ I2 = 27r 1 Ifr(';)
r
r
j=l
j=l
ITI H (.;) 12 = 27r 1 IT 2 3.. 2j cos 2j+1
Now consider the points ';r := one has
i 2r 7r
C
(r ~ 1). According to the last formula
6.1 The basic idea
165
er
for all r ~ 1. Since the tend to infinity when r -T that the 1];.1 2 have a common integrable majorant.
00,
it seems inconceivable
The deeper reason for the phenomenon observed here is the following: The action D: 'R./21r -T 'R./21r , has a closed orbit
(eo,··· ,en-I),
ek
:=
Dek-I Vk,
en
=
eo
(12)
F:,
~}. It is a sidesuch that IH(ek)1 = 1 for all k, namely the two-cycle effect of condition (10) to make orbits of this kind impossible. This can be seen as follows: Condition (10) implies
< c"(: < 311") 2
IH(e) 1< 1 the variable
( 1!: 2 -
e being understood modulo 21r.
'
Let (12) be an arbitrary closed
orbit of D. In the (necessarily periodic) binary representation of
g~
modulo 1
each of the two sequences 01 or 10 must occur somewhere. But this implies that after finitely many steps a point Dj eo falls into the interval [~, 3;]; therefore the orbit under discussion necessarily contains points ej for which 0 one has IH(ej) I < 1. Lawton [11] has found a condition of a more algebraic nature that likewise guarantees the orthonormality of the functions ¢>O,k. We again assume (6); then by Theorem (6.4) we have a(¢» = 0 and be¢»~ = 2N -1. At stake are the numbers
$$\alpha_m := \langle \phi, \phi_{0,m} \rangle = \int \phi(t)\,\overline{\phi(t-m)}\,dt \qquad (m \in \mathbb{Z})\,.$$

Because of $\mathrm{supp}(\phi) \subset [0, 2N-1]$ all $\alpha_m$ with $|m| \ge 2N-1$ are automatically zero. Due to the scaling equation 5.2.(2) one has

$$\alpha_m = 2 \sum_{k,l} h_k\,\overline{h_l} \int \phi(2t - k)\,\overline{\phi(2t - 2m - l)}\,dt = \sum_{k,l} h_k\,\overline{h_l} \int \phi(t')\,\overline{\phi(t' + k - 2m - l)}\,dt' = \sum_{k,l} h_k\,\overline{h_l}\;\alpha_{2m+l-k}\,.$$

If we substitute the summation variable $l$ according to $l := n + k - 2m$, where $n$ is the new running variable, we obtain

$$\alpha_m = \sum_n \Bigl(\sum_k h_k\,\overline{h_{n+k-2m}}\Bigr)\,\alpha_n\,. \qquad (13)$$

In this way the square matrix $A := [A_{mn}]$ of order $4N-3$, whose elements are defined by

$$A_{m,n} := \sum_k h_k\,\overline{h_{n+k-2m}} \qquad (|m|, |n| < 2N-1)\,, \qquad (14)$$

comes into play. Formula (13) can now be read as $\alpha_m = \sum_n A_{mn}\,\alpha_n$, meaning that the vector $\alpha_\cdot$ is an eigenvector of $A$ corresponding to the eigenvalue 1. The special vector

$$\beta_\cdot := (0, \ldots, 0, 1, 0, \ldots, 0)\,, \quad\text{i.e.}\quad \beta_m = \delta_{0m} \qquad (|m| < 2N-1)$$

is an eigenvector of $A$ corresponding to the eigenvalue 1 as well; for, because of (1) resp. (5.4), one has

$$\sum_n A_{mn}\,\beta_n = A_{m,0} = \sum_k h_k\,\overline{h_{k-2m}} = \delta_{0,m} = \beta_m \qquad (|m| < 2N-1)\,.$$
After all this work, we are in a position to state the following theorem:

(6.6) Assume that the coefficient vector $h_\cdot$ is bounded by (6), that the corresponding function $H$ satisfies the identity (1) and $H(0) = 1$, and that $\phi$ is the scaling function determined by (2). If 1 is a simple eigenvalue of the matrix $A$, then the functions $\phi_{0,k}$ $(k \in \mathbb{Z})$ are orthonormal.
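The matrix $A$ of (14) is easily set up for a concrete filter. The sketch below assumes real coefficients and uses the filter $h_0 = h_3 = 1/\sqrt2$, $h_1 = h_2 = 0$ belonging to the generating function $H(\xi) = \frac12(1 + e^{-3i\xi})$ considered earlier in this section; it merely checks that two linearly independent vectors are fixed by $A$, so that the eigenvalue 1 cannot be simple for this filter. The names `lawton_matrix` and `apply` are illustrative.

```python
import math


def lawton_matrix(h):
    """A[m][n] = sum_k h[k] h[n + k - 2m] for |m|, |n| < 2N - 1, where len(h) = 2N (real h)."""
    size = len(h)
    bound = size - 2                      # indices run over -bound..bound

    def c(p):
        # Autocorrelation sum_k h[k] h[k + p], entries outside 0..2N-1 being zero.
        return sum(h[k] * h[k + p] for k in range(size) if 0 <= k + p < size)

    return [[c(n - 2 * m) for n in range(-bound, bound + 1)]
            for m in range(-bound, bound + 1)]


def apply(A, v):
    """Matrix-vector product."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


s = 1.0 / math.sqrt(2.0)
A = lawton_matrix([s, 0.0, 0.0, s])       # N = 2, rows and columns numbered -2..2
v1 = [1.0, 2.0, 0.0, 2.0, 1.0]
v2 = [0.0, 0.0, 1.0, 0.0, 0.0]
err1 = max(abs(x - y) for x, y in zip(apply(A, v1), v1))
err2 = max(abs(x - y) for x, y in zip(apply(A, v2), v2))
print(err1, err2)
```

For a filter meeting the hypothesis of the theorem one would instead expect the eigenspace of the eigenvalue 1 to be spanned by $\beta_\cdot$ alone; deciding this in general requires an eigenvalue routine, which is deliberately left out of this sketch.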
I
By assumption on A there is a number c E C* such that 0:. = cf3.; that is to say, all O:m = (>, >O,m) corresponding to m =F 0 have the value 0 as stated, and 0:0 = c =F o. The computation carried out in the proof of (5.9) shows that under these circumstances the identity
holds. Now, if l = 2T(2n + 1) =F 0, then the calculation T-1
¢(27rl)
=
II H(2T-
j=l
j
(2n + 1)7r) . H(2n + 1)7r) ¢(2n + 1)7r)
=
0,
(15)
6.1 The basic idea
167
repeated from the proof of (5.14), shows that in fact c
=
27r 1¢(0)1 2
=1 .
CD (Continued) In this example we have N = 2, and the hk take the following values: 1 ho = h3 =
J2'
Inserting these into (14) one arrives at the matrix
(the rows and columns
a~e
numbered from -2 to 2), having the eigenvalues
-1,
1
1
-"2' "2'
1, 1 .
The eigenspace corresponding to the eigenvalue 1 is two-dimensional; it is 0 spanned by the vectors (1,2,0,2,1) and, of course, (0,0,1,0,0). So far we have not touched the question of how regular the scaling functions are that one obtains in this way. Figures 6.1 (resp. 6.5) and 6.3 show that 1 may indeed look quite jagged. Since such a 1 comes into being only as the limit of a certain "fractal" process, and is not at our disposal in the form of a simple expression, the investigation of its regularity, be it via the decay of ¢(~) for I~I - 00 or via a careful analysis of the operator S, is very delicate and requires subtle estimates of various sorts. In this way one is able to prove, e.g., that the Daubechies scaling function 31 and its corresponding mother wavelet 37/J are already continuously differentiable, and furthermore that the order of differentiability increases essentially linearly (with a proportionality factor rv 0.2) with N. For details we refer the reader to [DJ, Chapter 7, or to the paper [7J.
6.2 Algebraic constructions

In view of the results presented in the last section, only the following algebraic problem remains: We have to find trigonometric polynomials

$$H(\xi) := \frac{1}{\sqrt{2}} \sum_k h_k\, e^{-ik\xi}$$

that satisfy the identity

$$|H(\xi)|^2 + |H(\xi+\pi)|^2 \equiv 1$$

and, of course, the condition $H(0) = 1$. We shall insist here on real coefficients $h_k$; the corresponding scaling functions $\phi$ as well as the mother wavelets $\psi$ will then be real-valued as well. According to 5.3.(13) the Fourier transform of $\psi$ is given by
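Now, on account of what we said in Section 3.5 (see, e.g., Theorem (3.13)), we are interested in our wavelet $\psi$ having an order $N$ as high as possible, and according to 3.5.(3) this is equivalent to the requirement that $\hat\psi$ should vanish to an order $N$ as large as possible at $\xi = 0$. As a consequence the generating function $H$ should have a zero of order $N \gg 1$ at $\xi = \pi$, a fact that we express most elegantly by writing

$$H(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{N} B(\xi), \qquad N \ge 1.$$

Instead of looking for $H$ we switch for a moment to the function

$$M(\xi) := |H(\xi)|^2 \qquad (1)$$

that would have to satisfy the linear identity

$$M(\xi) + M(\xi+\pi) \equiv 1. \qquad (2)$$

For symmetry reasons the function $M$ is a polynomial in $\cos\xi$, and $M$ contains the factor

$$\Bigl|\frac{1+e^{-i\xi}}{2}\Bigr|^{2N} = \Bigl(\cos^2\frac{\xi}{2}\Bigr)^{N}.$$

Therefore we may write

$$M(\xi) = \Bigl(\cos^2\frac{\xi}{2}\Bigr)^{N} A(\xi), \qquad A(\xi) = p(\cos\xi), \qquad (3)$$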
where $p$ is a certain polynomial as well. Now we introduce a new variable $y$ by letting $y := \sin^2\frac{\xi}{2}$. This leads to

$$A(\xi) = p(\cos\xi) = p(1-2y) =: P(y), \qquad (4)$$

where again $P$ is a certain polynomial. In this way (3) becomes

$$M(\xi) = (1-y)^N P(y).$$

Because of

$$\cos^2\Bigl(\frac{\xi+\pi}{2}\Bigr) = \sin^2\frac{\xi}{2} = y$$

and

$$A(\xi+\pi) = p(-\cos\xi) = p(2y-1) = p\bigl(1-2(1-y)\bigr) = P(1-y),$$

the identity (2) takes the following form when expressed in terms of the variable $y$:

$$(1-y)^N P(y) + y^N P(1-y) \equiv 1. \qquad (5)$$

This formula is valid for $0 \le y \le 1$ at first, but by general principles on holomorphic functions we may conclude that it is true for arbitrary $y \in \mathbb{C}$. By the theorem on decomposition into partial fractions there are uniquely determined coefficients $C_k$, $C'_k$ such that

$$\frac{1}{y^N (1-y)^N} = \sum_{k=1}^{N} \Bigl(\frac{C_k}{y^k} + \frac{C'_k}{(1-y)^k}\Bigr),$$

and for symmetry reasons one has $C_k = C'_k$ for all $k$. Clearing denominators, we can infer that there is a polynomial $P_N$ of degree $\le N-1$ such that

$$(1-y)^N P_N(y) + y^N P_N(1-y) \equiv 1$$

holds, and $P_N$ is the only polynomial solution of (5) having a degree $\le N-1$. Now it is easy to see that any solution $P$ of (5) satisfies the identity

$$P(y) \equiv (1-y)^{-N}\bigl(1 - y^N P(1-y)\bigr)$$

as well. In particular, this is the case for $P_N$, and this allows us to draw the following conclusion:
$$P_N(y) = j_0^{N-1} P_N(y) = \sum_{k=0}^{N-1} \binom{-N}{k} (-y)^k = \sum_{k=0}^{N-1} \binom{N+k-1}{k}\, y^k. \qquad (6)$$

Here $j_0^{N-1}$ denotes the Taylor polynomial of degree $N-1$ at $y = 0$, and we have made use of the fact that the part of $P_N$ carrying the factor $y^N$ gives no contribution to $j_0^{N-1} P_N$. The solution of (5) having the smallest possible degree now has been determined explicitly: It is the right hand side of (6).

Now let $P$ be an arbitrary solution of (5). Then

$$(1-y)^N \bigl(P(y) - P_N(y)\bigr) + y^N \bigl(P(1-y) - P_N(1-y)\bigr) \equiv 0 \qquad (7)$$

and consequently

$$P(y) - P_N(y) = y^N P^*(y)$$

for some polynomial $P^*$. If we insert this into (7) again, we obtain

$$P^*(y) + P^*(1-y) \equiv 0,$$

which is equivalent to

$$P^*(y) = R(1-2y) = R(\cos\xi), \qquad R \text{ odd}.$$
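Formula (6) and the identity (5) lend themselves to a quick machine check. The following is a minimal sketch in Python using exact rational arithmetic; the function names `daubechies_P`, `poly_eval` and `identity_5_lhs` are illustrative choices of ours and do not appear in the text.

```python
from fractions import Fraction
from math import comb


def daubechies_P(N):
    """Coefficients [c_0, ..., c_{N-1}] of P_N(y) = sum_k C(N+k-1, k) y^k, as in (6)."""
    return [Fraction(comb(N + k - 1, k)) for k in range(N)]


def poly_eval(coeffs, y):
    """Evaluate sum_k coeffs[k] * y**k by Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * y + c
    return result


def identity_5_lhs(N, y):
    """Left hand side (1-y)^N P_N(y) + y^N P_N(1-y) of identity (5)."""
    P = daubechies_P(N)
    return (1 - y) ** N * poly_eval(P, y) + y ** N * poly_eval(P, 1 - y)


# Spot check of (5) at a few rational points for N = 1, ..., 6.
sample_points = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(5, 7), Fraction(1)]
all_ok = all(identity_5_lhs(N, y) == 1 for N in range(1, 7) for y in sample_points)
print(daubechies_P(3), all_ok)
```

For $N = 3$ the coefficient list is $1, 3, 6$, i.e. $P_3(y) = 1 + 3y + 6y^2$. A check at finitely many points is of course no proof; it merely guards against slips in the binomial coefficients.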
Since we can perform the same computations backward as well, all in all the following theorem has been proven:

(6.7) A trigonometric polynomial $M(\cdot)$ satisfies the identity (2) if and only if it has the following form:

$$M(\xi) = \Bigl(\cos^2\frac{\xi}{2}\Bigr)^{N} P\Bigl(\sin^2\frac{\xi}{2}\Bigr).$$

Here

$$P(y) = P_N(y) + y^N R(1-2y),$$

where $P_N$ is given by (6) and $R$ is an arbitrary odd polynomial.

In view of (1) such a function $M(\cdot)$ is of use only if $P$ satisfies the additional condition

$$P(y) \ge 0 \qquad (0 \le y \le 1).$$

Letting $P := P_N$, this condition is obviously satisfied.

So much for the admissible functions $M$, these being related to $H$ by (1). In order to get the generating functions $H$ themselves, we must, so to speak, "take the square root of $M$". In doing this we only have to bother about the factor

$$P\Bigl(\sin^2\frac{\xi}{2}\Bigr) = p(\cos\xi) = A(\xi)$$

introduced in (3). For carrying out this task a surprising lemma of Riesz will come to our help. It reads as follows:
(6.8) If

$$A(\xi) = \sum_{k=0}^{n} a_k \cos^k \xi,$$

and if $A(\xi) \ge 0$ for real $\xi$, in particular $A(0) = 1$, then there is a trigonometric polynomial

$$B(\xi) = \sum_{k=0}^{n} b_k\, e^{-ik\xi}$$

with real coefficients $b_k$ and $B(0) = 1$, such that

$$A(\xi) \equiv B(\xi)\, B(-\xi) \qquad (8)$$

identically in $\xi$.
The function $A(\cdot)$ possesses a product representation of the form

$$A(\xi) = a_n \prod_{j=1}^{n} (\cos\xi - c_j), \qquad (9)$$

the $c_j$ being real or else appearing in complex conjugate pairs. We introduce the complex variable $z$ by writing $e^{-i\xi} =: z$; then (9) goes over into

$$A(\xi) = a_n \prod_{j=1}^{n} \Bigl(\frac{z+z^{-1}}{2} - c_j\Bigr). \qquad (10)$$

In investigating the individual factors appearing in (10), we need the well known properties of the mapping $z \mapsto (z+z^{-1})/2$ as well as the identity

$$\frac{z+z^{-1}}{2} - \frac{s+s^{-1}}{2} \equiv -\frac{1}{2s}\,(z-s)\,(z^{-1}-s) \qquad (s \ne 0). \qquad (11)$$

(a) If $c_j \in \mathbb{R}$ and $|c_j| \ge 1$, then there is an $s \in \mathbb{R}^*$ such that $c_j = (s+s^{-1})/2$. Therefore we obtain, using (11):

$$\frac{z+z^{-1}}{2} - c_j = -\frac{1}{2s}\,(z-s)\,(z^{-1}-s).$$

(b) If $c_j \in \mathbb{R}$ and $|c_j| < 1$, then there is an $s = e^{i\alpha} \ne \pm 1$ such that

$$c_j = \frac{s+s^{-1}}{2} = \cos\alpha.$$
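This implies that $A(\xi)$ contains a factor $\cos\xi - \cos\alpha$, and the latter is not compatible with $A(\xi) \ge 0$ $(\xi \in \mathbb{R})$, unless this factor occurs an even number of times. Therefore there is a $j'$ such that $c_{j'} = c_j$, and using (11) we obtain the identity

$$\Bigl(\frac{z+z^{-1}}{2} - c_j\Bigr)\Bigl(\frac{z+z^{-1}}{2} - c_{j'}\Bigr) = \frac{1}{4e^{2i\alpha}}\,(z-e^{i\alpha})(z^{-1}-e^{i\alpha})(z-e^{i\alpha})(z^{-1}-e^{i\alpha})$$

$$= \frac14\,(z-e^{i\alpha})(z-e^{-i\alpha})(z^{-1}-e^{i\alpha})(z^{-1}-e^{-i\alpha}) = \frac14\,(z^2 - 2z\cos\alpha + 1)\,(z^{-2} - 2z^{-1}\cos\alpha + 1).$$

(c) If $c_j \notin \mathbb{R}$, then there is, first, a $j'$ such that $c_{j'} = \overline{c_j}$ and, second, an $s \in \mathbb{C}^*$ such that

$$c_j = \frac{s+s^{-1}}{2}.$$

Using (11) again we get

$$\Bigl(\frac{z+z^{-1}}{2} - c_j\Bigr)\Bigl(\frac{z+z^{-1}}{2} - c_{j'}\Bigr) = \frac{1}{4|s|^2}\,(z-s)(z-\bar s)\,(z^{-1}-s)(z^{-1}-\bar s).$$

All things considered, it follows that it is possible to combine and to regroup the factors appearing in (10) in such a way that the resulting representation of $A(\xi)$ assumes the following form:

$$A(\xi) = C \cdot Q(z)\, Q(z^{-1}).$$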
Here $Q(z) = \sum_{k=0}^{n} q_k z^k$ is a polynomial with real coefficients $q_k$, and the constant $C \in \mathbb{C}^*$ is obtained by collecting $a_n$ and the various numerical factors that have appeared in (a)-(c). The extra condition $A(0) = 1$ gives $C = 1/(Q(1))^2$. It follows that, if we set $B(\xi) := Q(e^{-i\xi})/Q(1)$, then (8) is valid; therefore the lemma is proven.

The decomposition (8) is not uniquely determined, since in the cases (a) and (c) interchanging $s$ and $s^{-1}$ leads to another decomposition of the corresponding partial product of $A(\cdot)$. This, albeit modest, flexibility can be used to make the resulting scaling function and in consequence the related mother wavelet more symmetrical. We shall not pursue this matter any further.

Assume that $N$ is given. If we choose for simplicity $P := P_N$, then $A(\cdot)$ becomes a polynomial of degree $N-1$ in $\cos\xi$ and $B(\cdot)$ a polynomial of degree $N-1$ in $e^{-i\xi}$. In this way the generating function

$$H(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{N} B(\xi)$$

is of degree $2N-1$ in $e^{-i\xi}$, and the support of the corresponding scaling function (=: ${}_N\phi$) turns out to be the interval $[0, 2N-1]$. The mother wavelets ${}_N\psi$ derived from the ${}_N\phi$ are called Daubechies wavelets.
① In the case $N = 1$ we of course obtain the Haar wavelet. Formula (6) gives $P_1(y) \equiv 1$, and this in turn implies $p(\cos\xi) \equiv 1$, $B(\xi) \equiv 1$, so that we finally get

$$H(\xi) = \tfrac12\bigl(1 + e^{-i\xi}\bigr),$$

which is in agreement with 5.3.(21).
The case $N = 2$ shall be dealt with in detail in the next section; the case $N = 3$ appears as Example ② below. In [D], Table 6.1, the coefficient vectors $(h_k \mid 0 \le k \le 2N-1)$ corresponding to the Daubechies wavelets ${}_N\psi$ are given to 16 decimal places for $2 \le N \le 10$. In [L], Table 2.3, one finds these coefficients to six decimal places for $N$ from 2 to 5.
② We now describe in detail the case $N = 3$, choosing

$$P(y) := P_3(y) = \binom{2}{0} + \binom{3}{1} y + \binom{4}{2} y^2 = 1 + 3y + 6y^2.$$

Inserting

$$y = \sin^2\frac{\xi}{2} = \frac14\bigl(-e^{-i\xi} + 2 - e^{i\xi}\bigr)$$

into (4) we get

$$A(\xi) = \frac38\, e^{-2i\xi} - \frac94\, e^{-i\xi} + \frac{19}{4} - \ldots$$

Figure 6.2 confirms that $A(\xi)$ is $\ge 0$ throughout so that it makes sense to proceed with our computation.

Figure 6.2

In the case at hand, the function $B(\cdot)$ has the form $B(\xi) = b_0 + b_1 e^{-i\xi} + b_2 e^{-2i\xi}$, so that we have to compare coefficients in the identity

$$\bigl(b_0 + b_1 e^{-i\xi} + b_2 e^{-2i\xi}\bigr)\bigl(b_0 + b_1 e^{i\xi} + b_2 e^{2i\xi}\bigr) = \frac38\, e^{-2i\xi} - \frac94\, e^{-i\xi} + \frac{19}{4} - \ldots$$

For symmetry reasons it is enough to check the coefficients corresponding to $e^{-2i\xi}$, $e^{-i\xi}$ and 1. In this way we obtain the three equations

$$b_0 b_2 = \frac38, \qquad b_1 (b_0 + b_2) = -\frac94, \qquad b_0^2 + b_1^2 + b_2^2 = \frac{19}{4}. \qquad (12)$$

Because $A(0) = P(0) = 1$, Lemma (6.8) guarantees that we can find real solutions $(b_0, b_1, b_2)$ that satisfy the additional condition $b_0 + b_1 + b_2 = 1$. If we use this condition to eliminate $b_0 + b_2$ from the second equation in (12), we get for $b_1$ the quadratic equation $b_1^2 - b_1 - \frac94 = 0$, and this in turn leads to

$$b_1 = \frac{1 \pm \sqrt{10}}{2}.$$

We leave it to the reader to pursue the upper choice of the sign here; it will result in complex solutions $b_0$ and $b_2$. This means that we definitively have $b_1 = (1 - \sqrt{10})/2$, and because of the first equation in (12) we can say that $b_0$ and $b_2$ are the two solutions of the quadratic equation

$$t^2 - \frac{1+\sqrt{10}}{2}\, t + \frac38 = 0.$$

Choosing arbitrarily (well, not quite ...) one of the two possible assignments, we get

$$B(\xi) = \frac{1 + \sqrt{10} + \sqrt{5 + 2\sqrt{10}}}{4} + \frac{1 - \sqrt{10}}{2}\, e^{-i\xi} + \frac{1 + \sqrt{10} - \sqrt{5 + 2\sqrt{10}}}{4}\, e^{-2i\xi},$$
so that we finally obtain

$$H(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{3} B(\xi) = \frac18\bigl(1 + 3e^{-i\xi} + \ldots\bigr)\Bigl(\frac{1 + \sqrt{10} + \sqrt{5 + 2\sqrt{10}}}{4} + \frac{1 - \sqrt{10}}{2}\, e^{-i\xi} + \ldots\Bigr)$$

$$= \frac{1 + \sqrt{10} + \sqrt{5 + 2\sqrt{10}}}{32} + \frac{5 + \sqrt{10} + 3\sqrt{5 + 2\sqrt{10}}}{32}\, e^{-i\xi} + \ldots.$$

From the part of $H$ that is actually printed out here one can immediately read off $h_0$ and $h_1$:

$$h_0 = \sqrt{2}\;\frac{1 + \sqrt{10} + \sqrt{5 + 2\sqrt{10}}}{32} = 0.33267\ldots, \qquad h_1 = \sqrt{2}\;\frac{5 + \sqrt{10} + 3\sqrt{5 + 2\sqrt{10}}}{32} = 0.80689\ldots,$$

both in agreement with Table 5.4.(8). We leave it to the reader as an exercise to compute the remaining $h_k$ as well and so convince herself that we have indeed determined the coefficient vector $h_\bullet$ corresponding to the Daubechies wavelet ${}_3\psi$. Figures 6.3 and 6.4 show the functions ${}_3\phi$ and ${}_3\psi$ in the time domain.
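The exercise of computing the remaining $h_k$ can be handed to a few lines of code. The sketch below, in floating point arithmetic, multiplies $\bigl((1+z)/2\bigr)^3$ by the Riesz factor $B$ found above (with $z = e^{-i\xi}$) and rescales by $\sqrt{2}$; the function name `daubechies3_filter` is ours, and the orthonormality sums are one possible sanity check, not a step taken from the text.

```python
import math


def daubechies3_filter():
    """Return (b, h): the Riesz factor coefficients b_0..b_2 and h_0..h_5 for N = 3."""
    s10 = math.sqrt(10.0)
    root = math.sqrt(5.0 + 2.0 * s10)
    # B(xi) = b0 + b1 e^{-i xi} + b2 e^{-2 i xi}, with the sign choices made in the example
    b = [(1.0 + s10 + root) / 4.0, (1.0 - s10) / 2.0, (1.0 + s10 - root) / 4.0]
    # ((1 + z)/2)^3 = (1 + 3z + 3z^2 + z^3)/8
    cube = [1.0 / 8.0, 3.0 / 8.0, 3.0 / 8.0, 1.0 / 8.0]
    H = [0.0] * 6
    for i, u in enumerate(cube):
        for j, v in enumerate(b):
            H[i + j] += u * v          # polynomial multiplication in z
    # H(xi) = (1/sqrt 2) sum_k h_k z^k, hence h_k = sqrt(2) * (coefficient of z^k)
    return b, [math.sqrt(2.0) * c for c in H]


def shifted_product(h, m):
    """sum_k h_k h_{k+2m}; equals 1 for m = 0 and 0 for m != 0 for an orthonormal filter."""
    return sum(h[k] * h[k + 2 * m] for k in range(len(h) - 2 * m))


b, h = daubechies3_filter()
print([round(c, 5) for c in h])
print([round(shifted_product(h, m), 12) for m in range(3)])
```

The first two entries reproduce the values $0.33267\ldots$ and $0.80689\ldots$ quoted above.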
Figure 6.3 The Daubechies scaling function ${}_3\phi$

Figure 6.4 The Daubechies wavelet ${}_3\psi$
6.3 Binary interpolation

In the two foregoing sections we obtained scaling functions and corresponding wavelets by means of constructions in the Fourier domain, and also as limiting functions of an iteration procedure. In neither approach, however, did we discuss the convergence behaviour in the time domain. Now there is a third approach, called the direct method, for constructing scaling functions $\phi$. This method yields without a limiting process the exact values $\phi(x)$ at all "binary rational" points $x \in \mathbb{R}$, and it is with the help of this method that one obtains the best regularity results, e.g. for the Daubechies wavelets ${}_N\psi$. In order to fix ideas, we assume that an $N > 1$ has been chosen once and for all and, furthermore, that

$$a(h_\bullet) = 0, \qquad b(h_\bullet) = 2N-1,$$

as agreed upon in connection with the Daubechies wavelets. The following abbreviations will prove useful:

$$\{0, 1, \ldots, 2N-1\} =: J.$$
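For the description of the binary rational numbers we use the handy notation

$$\mathbb{D}_r := \{k\, 2^{-r} \mid k \in \mathbb{Z}\} \quad (r \ge 0), \qquad \mathbb{D} := \bigcup_{r \ge 0} \mathbb{D}_r;$$

therefore we have the inclusions

$$\mathbb{Z} = \mathbb{D}_0 \subset \mathbb{D}_1 \subset \ldots \subset \mathbb{D}_r \subset \mathbb{D}_{r+1} \subset \ldots \subset \mathbb{D},$$

and $\mathbb{D}$ is dense in $\mathbb{R}$. The scaling equation now has the form

$$\phi(t) = \sqrt{2} \sum_{k=0}^{2N-1} h_k\, \phi(2t-k), \qquad h_0\, h_{2N-1} \ne 0. \qquad (1)$$

The "direct method" is founded on the following three simple facts:

• If $t \in \mathbb{D}_r$ for some $r \ge 1$, then the numbers $2t-k$ $(k \in J)$ belong to $\mathbb{D}_{r-1}$.
• If $t < 0$, then the numbers $2t-k$ $(k \in J)$ are $< 0$ as well.
• If $t > 2N-1$, then the numbers $2t-k$ $(k \in J)$ are $> 2N-1$ as well.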
On account of these facts the scaling equation (1) allows us to compute the values of $\phi$ successively on $\mathbb{D}_1$, $\mathbb{D}_2$, $\ldots$, and therefore on all of $\mathbb{D}$, if only these values have been determined on $\mathbb{D}_0 = \mathbb{Z}$ beforehand. Moreover, if $\phi(k) = 0$ for all integers $k < 0$ and $k > 2N-1$ to begin with, then automatically $\phi(t) = 0$ for all $t \in \mathbb{D}$ outside $[0, 2N-1]$. The assignment

$$\phi(k) := 0 \qquad (k \in \mathbb{Z} \setminus J)$$

is therefore in agreement with (1). Therefore, we are left with the system of homogeneous equations $\phi(j) = \sqrt{2} \sum_k h_k\, \phi(2j-k)$, or, equivalently,

$$\phi(j) = \sqrt{2} \sum_{k=0}^{2N-1} h_{2j-k}\, \phi(k) \qquad (0 \le j \le 2N-1), \qquad (2)$$

for the vector $(\phi(j) \mid j \in J) =: a$. This means that the $(J \times J)$-matrix

$$B_{jk} := \sqrt{2}\, h_{2j-k} \qquad ((j,k) \in J \times J)$$

should have an eigenvector $a$ corresponding to the eigenvalue 1. In this regard we shall prove the following:
(6.9) The matrix $B$ has 1 as an eigenvalue in any case. If this eigenvalue is simple, then there is exactly one corresponding eigenvector $a$ such that

$$\sum_{k \in J} a_k = 1. \qquad (3)$$

As an illustration of this theorem we show here the matrix $B$ in the case $N = 3$:

$$B = \sqrt{2} \begin{pmatrix} h_0 & 0 & 0 & 0 & 0 & 0 \\ h_2 & h_1 & h_0 & 0 & 0 & 0 \\ h_4 & h_3 & h_2 & h_1 & h_0 & 0 \\ 0 & h_5 & h_4 & h_3 & h_2 & h_1 \\ 0 & 0 & 0 & h_5 & h_4 & h_3 \\ 0 & 0 & 0 & 0 & 0 & h_5 \end{pmatrix} \qquad (4)$$
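The definition $B_{jk} = \sqrt{2}\, h_{2j-k}$ translates directly into code. As a minimal sketch we build $B$ for $N = 2$ and look at its column sums; the helper name `direct_method_matrix` is ours, and the four coefficients used as input are the $N = 2$ values $h_0, \ldots, h_3$ that are tabulated further on in this section.

```python
import math


def direct_method_matrix(h):
    """Matrix B_{jk} = sqrt(2) * h_{2j-k} for j, k in J = {0, ..., len(h) - 1}."""
    n = len(h)

    def coeff(i):
        # h_i vanishes outside the index range 0 .. 2N-1
        return h[i] if 0 <= i < n else 0.0

    return [[math.sqrt(2.0) * coeff(2 * j - k) for k in range(n)] for j in range(n)]


s3 = math.sqrt(3.0)
scale = 4.0 * math.sqrt(2.0)
h_n2 = [(1 + s3) / scale, (3 + s3) / scale, (3 - s3) / scale, (1 - s3) / scale]

B = direct_method_matrix(h_n2)
column_sums = [sum(B[j][k] for j in range(4)) for k in range(4)]
print(column_sums)
```

All four column sums come out as 1 up to rounding, which is the property on which the proof below is built.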
For the proof we argue about the column sums of $B$. To this end we consider again the generating function $H$, as given in 5.3.(3). Because of $H(\pi) = 0$ we have, in addition to (5.5), the equation

$$\sum_k (-1)^k h_k = 0,$$

so that the following is true:

$$\sum_l h_{2l} = \sum_l h_{2l+1} = \frac{1}{\sqrt{2}}.$$

A glance at (4) shows that the matrix $B$ has (at least in the case $N = 3$) constant column sums 1. Of course, this is true in general:

$$\sum_{j \in J} B_{jk} = \sqrt{2} \sum_{j \in J} h_{2j-k} = \begin{cases} \sqrt{2} \sum_l h_{2l} = 1 & (k \text{ even}) \\ \sqrt{2} \sum_l h_{2l+1} = 1 & (k \text{ odd}), \end{cases}$$

and it is easy to verify that for each $k \in J$ the sum extends over all $h_{2l} \ne 0$ resp. all $h_{2l+1} \ne 0$. What we have found can be expressed in other words as follows: The vector $e := (1 \mid j \in J)$ is an eigenvector of the matrix $B'$, corresponding to the eigenvalue 1. Such being the case, the matrix $B$ has 1 as an eigenvalue as well, and there is a corresponding eigenvector $a \ne 0$. For the proof of the second part of the theorem, we note the following: By general principles (see [6], §58, Theorem 1), our space $X$ is the direct sum
of two $B$-invariant subspaces $U$ and $V$ such that $B - 1_X$ is nilpotent on $U$ and invertible on $V$. Therefore the characteristic polynomial $q(\lambda)$ of $B$ can be decomposed as $q(\lambda) = (\lambda-1)^m q_1(\lambda)$, where $m := \dim(U)$. Now by assumption on $q(\cdot)$ we have $m = 1$; therefore $U = \langle a \rangle$ and $\dim(V) = \dim(X) - 1$. To any $y \in V$ there is an $x \in V$ such that $y = Bx - x$, and from this we conclude that

$$\langle e, y\rangle = \langle e, Bx\rangle - \langle e, x\rangle = \langle B'e, x\rangle - \langle e, x\rangle = 0.$$

This proves $V \subset \langle e\rangle^{\perp}$; by counting dimensions we therefore have $V = \langle e\rangle^{\perp}$. Because $a \notin V$, this implies

$$\sum_{k \in J} a_k = \langle e, a\rangle \ne 0,$$

which is enough to show that the sum on the left can be normalized to 1.
Condition (3), resp. $\sum_{k \in J} \phi(k) = 1$, does not come out of the blue. As a matter of fact, one has the following theorem (cf. (6.1)):

(6.10) Suppose that the generating function $H$ is as in Theorem (6.1) and that $\hat\phi \in L^2$ is defined by the infinite product 6.1.(2). If $\phi$ is in reality a continuous function, satisfying an estimate of the form

$$|\phi(t)| \le \frac{C}{1+t^2} \qquad (t \in \mathbb{R}),$$

then the following identity holds:

$$\sum_k \phi(x-k) \equiv 1 \qquad (x \in \mathbb{R}). \qquad (5)$$

By assumption on $\phi$ the auxiliary function

$$g(x) := \sum_k \phi(x-k)$$

is a continuous periodic function of period 1 and has Fourier coefficients

$$c_j = \int_0^1 g(x)\, e^{-2j\pi i x}\, dx = \sum_k \int_0^1 \phi(x-k)\, e^{-2j\pi i (x-k)}\, dx = \int \phi(x)\, e^{-2j\pi i x}\, dx = \sqrt{2\pi}\, \hat\phi(2j\pi) = \delta_{0j} \qquad (j \in \mathbb{Z}),$$
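where in the end we have used 6.1.(15). From this it follows that $g$ has the constant value 1, as stated.

For $N \ge 2$ the Daubechies scaling functions ${}_N\phi$ are continuous. We prove this here for $N = 2$:

(6.11) The Daubechies scaling function ${}_2\phi$ is continuous.

We begin as in Example 6.2.②: According to 6.2.(6) one has

$$P_2(y) = 1 + 2y$$

and consequently

$$A(\xi) = 1 + 2\sin^2\frac{\xi}{2} = 2 - \cos\xi = -\tfrac12\, e^{-i\xi} + 2 - \tfrac12\, e^{i\xi}.$$

To this $A(\cdot)$ we have to apply Riesz' Lemma (6.8). If we compare coefficients in the identity

$$\bigl(b_0 + b_1 e^{-i\xi}\bigr)\bigl(b_0 + b_1 e^{i\xi}\bigr) = -\tfrac12\, e^{-i\xi} + 2 - \tfrac12\, e^{i\xi},$$

then the two equations

$$b_0^2 + b_1^2 = 2, \qquad b_0 b_1 = -\frac12$$

result. We choose the solution $(b_0, b_1) = \bigl((1+\sqrt3)/2,\ (1-\sqrt3)/2\bigr)$, which leads to

$$H(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{2} B(\xi) = \frac18\bigl(1 + 2e^{-i\xi} + e^{-2i\xi}\bigr)\bigl(1 + \sqrt3 + (1-\sqrt3)\, e^{-i\xi}\bigr)$$

$$= \frac18\bigl(1 + \sqrt3 + (3+\sqrt3)\, e^{-i\xi} + (3-\sqrt3)\, e^{-2i\xi} + (1-\sqrt3)\, e^{-3i\xi}\bigr),$$

whence $H(0) = 1$ is satisfied, too. In this way we obtain the following table, representing the coefficient vector $h_\bullet$:

$$h_0 = \frac{1}{\sqrt2}\cdot\frac{1+\sqrt3}{4} = \phantom{-}.4829629131445341$$
$$h_1 = \frac{1}{\sqrt2}\cdot\frac{3+\sqrt3}{4} = \phantom{-}.8365163037378079$$
$$h_2 = \frac{1}{\sqrt2}\cdot\frac{3-\sqrt3}{4} = \phantom{-}.2241438680420134$$
$$h_3 = \frac{1}{\sqrt2}\cdot\frac{1-\sqrt3}{4} = -.1294095225512604$$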
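All calculations that follow will take place within the following range of real numbers:

$$\mathbb{D}[\sqrt3] := \{x + y\sqrt3 \mid x, y \in \mathbb{D}\}.$$

The set $\mathbb{D}[\sqrt3]$ is obviously a ring, and the conjugation (complex numbers do not occur any more in this section)

$$z = x + y\sqrt3 \ \mapsto\ \bar z := x - y\sqrt3$$

is an automorphism of $\mathbb{D}[\sqrt3]$ that keeps the elements of the ground ring $\mathbb{D}$ fixed. The following two numbers will play a special role in our computations:

$$a := \frac{1+\sqrt3}{4} = .6830\ldots, \qquad \bar a = \frac{1-\sqrt3}{4} = -.1830\ldots.$$

If $a$ and $\bar a$ are inserted into the scaling equation (1), it takes the form

$$\phi(t) = a\,\phi(2t) + (1-\bar a)\,\phi(2t-1) + (1-a)\,\phi(2t-2) + \bar a\,\phi(2t-3), \qquad (6)$$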
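and analogously the system of equations (2) becomes

$$\begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \phi(3) \end{pmatrix} = \begin{pmatrix} a & & & \\ 1-a & 1-\bar a & a & \\ & \bar a & 1-a & 1-\bar a \\ & & & \bar a \end{pmatrix} \begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \phi(3) \end{pmatrix}. \qquad (7)$$

The system (7) has exactly one solution that satisfies condition (3) as well, namely,

$$\begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \phi(3) \end{pmatrix} = \begin{pmatrix} 0 \\ 2a \\ 2\bar a \\ 0 \end{pmatrix}.$$

As has been said twice before, we set $\phi(k) := 0$ for all remaining $k \in \mathbb{Z}$. Then $\phi(\cdot)$ is recursively determined on all of $\mathbb{D}$ by (6). We assert that the resulting function $\phi\colon \mathbb{D} \to \mathbb{R}$ has the properties listed below: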
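(6.12) For all $x \in \mathbb{D}$, the following are true:

(a) $\phi(x) \in \mathbb{D}[\sqrt3]$,

(b) $\phi(3-x) = \overline{\phi(x)}$,

(c) $\sum_k \phi(x-k) = 1$,

(d) $\sum_k k\,\phi(x-k) = x - 2a - 4\bar a$.

For an $x \in \mathbb{D}_0 = \mathbb{Z}$, the statements (a)-(c) are true. In order to verify (d) we write $\phi|\mathbb{Z}$ in the form

$$\phi(x) = 2a\,\delta_{x,1} + 2\bar a\,\delta_{x,2} \qquad (x \in \mathbb{Z}).$$

Then

$$\phi(x-k) = 2a\,\delta_{x-k,1} + 2\bar a\,\delta_{x-k,2} = 2a\,\delta_{x-1,k} + 2\bar a\,\delta_{x-2,k} \qquad (x, k \in \mathbb{Z}),$$

and this implies the following chain of equations for arbitrary $x \in \mathbb{Z}$:

$$\sum_k k\,\phi(x-k) = 2a \sum_k k\,\delta_{x-1,k} + 2\bar a \sum_k k\,\delta_{x-2,k} = 2a(x-1) + 2\bar a(x-2) = x - 2a - 4\bar a.$$

We now assume that the relations (a)-(d) are true for all $x \in \mathbb{D}_r$ and consider an arbitrary $t \in \mathbb{D}_{r+1}$. All numbers $2t-k$ belong to $\mathbb{D}_r$, therefore, one may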
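read off immediately from (6) that $\phi(t)$ lies in $\mathbb{D}[\sqrt3]$ as well. Regarding (b) and (c), one has

$$\phi(3-t) = a\,\phi(6-2t) + (1-\bar a)\,\phi(5-2t) + (1-a)\,\phi(4-2t) + \bar a\,\phi(3-2t)$$

$$= a\,\overline{\phi(2t-3)} + (1-\bar a)\,\overline{\phi(2t-2)} + (1-a)\,\overline{\phi(2t-1)} + \bar a\,\overline{\phi(2t)} = \overline{\phi(t)}$$

and

$$\sum_k \phi(t-k) = \sum_k \bigl(a\,\phi(2t-2k) + (1-\bar a)\,\phi(2t-2k-1) + (1-a)\,\phi(2t-2k-2) + \bar a\,\phi(2t-2k-3)\bigr)$$

$$= \bigl(a + (1-a)\bigr) \sum_k \phi(2t-2k) + \bigl((1-\bar a) + \bar a\bigr) \sum_k \phi(2t-2k-1) = \sum_l \phi(2t-l) = 1.$$

Finally, the induction step for (d):

$$\sum_k k\,\phi(t-k) = \sum_k k\,\bigl(a\,\phi(2t-2k) + (1-\bar a)\,\phi(2t-2k-1) + (1-a)\,\phi(2t-2k-2) + \bar a\,\phi(2t-2k-3)\bigr)$$

$$= \sum_k \bigl(ak + (1-a)(k-1)\bigr)\,\phi(2t-2k) + \sum_k \bigl((1-\bar a)k + \bar a(k-1)\bigr)\,\phi(2t-2k-1)$$

$$= \frac12 \sum_k (2k + 2a - 2)\,\phi(2t-2k) + \frac12 \sum_k \bigl((2k+1) - 1 - 2\bar a\bigr)\,\phi(2t-2k-1)$$

$$= \frac12 \sum_l l\,\phi(2t-l) + (a-1) \sum_l \phi(2t-l) = \frac12\,(2t - 2a - 4\bar a) + a - 1 = t - 2a - 4\bar a.$$

In the last part we used the relation $2a + 2\bar a = 1$ several times.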
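The recursive determination of $\phi$ on $\mathbb{D}$ by means of (6), and the statements (6.12)(c) and (d), can be tried out numerically. The following sketch works in floating point rather than in the ring $\mathbb{D}[\sqrt3]$, which is a simplification on our part; the function name `binary_interpolation` and the list layout are likewise illustrative assumptions.

```python
import math

A = (1 + math.sqrt(3)) / 4        # the number a
A_BAR = (1 - math.sqrt(3)) / 4    # its conjugate
COEFFS = [A, 1 - A_BAR, 1 - A, A_BAR]   # coefficients of phi(2t - k) in (6), k = 0..3


def binary_interpolation(levels):
    """Return the list of values phi(m / 2**levels) for m = 0, ..., 3 * 2**levels."""
    phi = [0.0, 2 * A, 2 * A_BAR, 0.0]          # starting values on the integers 0..3
    for r in range(levels):
        step = 2 ** r                           # the current list lives on the grid m / 2**r
        size = 3 * 2 ** (r + 1) + 1
        finer = [0.0] * size
        for m in range(size):                   # t = m / 2**(r+1)
            total = 0.0
            for k, c in enumerate(COEFFS):
                idx = m - k * step              # 2t - k = idx / 2**r
                if 0 <= idx < len(phi):         # phi vanishes outside [0, 3]
                    total += c * phi[idx]
            finer[m] = total
        phi = finer
    return phi


levels = 6
n = 2 ** levels
phi = binary_interpolation(levels)

# (6.12)(c) and (d) on 0 <= x <= 1, where only the terms k = 0, -1, -2 survive
max_err_c = max(abs(phi[m] + phi[m + n] + phi[m + 2 * n] - 1) for m in range(n + 1))
max_err_d = max(abs(-phi[m + n] - 2 * phi[m + 2 * n] - (m / n - 2 * A - 4 * A_BAR))
                for m in range(n + 1))
print(max_err_c, max_err_d)
```

Both deviations are of the size of rounding errors. Each refinement step only uses values of the preceding level, in accordance with the first of the three "simple facts".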
In view of this induction proof, property (d) seems to come as a miracle. In reality this property may be related to certain general principles in a similar way as (c) has its theoretical foundation in Theorem (6.10).

Now consider the formulas (6.12)(c) and (d) when $x$ is restricted to the interval $0 \le x \le 1$. Because of $\operatorname{supp}(\phi) = [0,3]$ we obtain the two equations

$$\phi(x) + \phi(x+1) + \phi(x+2) = 1,$$
$$-\phi(x+1) - 2\phi(x+2) = x - 2a - 4\bar a,$$

and from these the following formulas result through elimination:

$$\left.\begin{aligned} \phi(x+1) &= -2\phi(x) + x + 2a \\ \phi(x+2) &= \phi(x) - x + 2\bar a \end{aligned}\right\} \qquad (x \in \mathbb{D},\ 0 \le x \le 1). \qquad (8)$$

We stick for a moment to the $x$-interval $[0,1]$. Because of $\operatorname{supp}(\phi) = [0,3]$, for such $x$ the number of terms in the scaling equation can be reduced as follows:

$$\phi(x) = \begin{cases} a\,\phi(2x) & (x \in \mathbb{D},\ 0 \le x \le \tfrac12) \\ a\,\phi(2x) + (1-\bar a)\,\phi(2x-1) & (x \in \mathbb{D},\ \tfrac12 \le x \le 1) \end{cases} \qquad (9)$$

The second line of (9) is not yet in its optimal form. If $\tfrac12 \le x \le 1$, then there is a $u \in [0,1]$ such that $2x = u+1$. Using the first formula (8) we therefore may write

$$\phi(2x) = \phi(u+1) = -2\phi(u) + u + 2a = -2\phi(2x-1) + 2x - 1 + 2a$$

and consequently

$$a\,\phi(2x) + (1-\bar a)\,\phi(2x-1) = (-2a + 1 - \bar a)\,\phi(2x-1) + 2ax - a + 2a^2 = \bar a\,\phi(2x-1) + 2ax + \tfrac14.$$

This means that we can replace (9) by

$$\phi(x) = \begin{cases} a\,\phi(2x) & (x \in \mathbb{D},\ 0 \le x \le \tfrac12) \\ \bar a\,\phi(2x-1) + 2ax + \tfrac14 & (x \in \mathbb{D},\ \tfrac12 \le x \le 1) \end{cases} \qquad (10)$$

In this way we have obtained a reproduction scheme for $\phi$ referring to the interval $[0,1]$ only. In both lines of (10) there is a single $\phi$-term on the right hand side, and, what's more, at both occurrences of such a term the coefficients have an absolute value $< 1$. This fact is going to be the main ingredient of our continuity proof.
We let $X$ be the space of all continuous functions $f\colon [0,1] \to \mathbb{R}$ assuming at 0 and 1 resp. the values 0 and $2a$ and provide it with the metric

$$d(f,g) := \sup_{0 \le x \le 1} |f(x) - g(x)|.$$

By general principles $X$ is a complete metric space. We now assert that the following proposition is valid:

(6.13) The formula

$$Tf(x) := \begin{cases} a\, f(2x) & (0 \le x \le \tfrac12) \\ \bar a\, f(2x-1) + 2ax + \tfrac14 & (\tfrac12 \le x \le 1) \end{cases} \qquad (11)$$

defines a contracting mapping $T\colon X \to X$; to be precise, one has

$$d(Tf, Tg) \le a\, d(f,g) \qquad \forall f, g \in X. \qquad (12)$$

If $f(0) = 0$ and $f(1) = 2a$, then $Tf(0) = 0$ and $Tf(1) = 2a$ as well. Furthermore, one has $Tf(\tfrac12) = 2a^2$, this being the case regardless of whether the value has been computed using the first or the second line of (11). Finally it becomes clear from looking at (11) that for any $f \in X$ the image $Tf$ is continuous on each of the two half-intervals $[0,\tfrac12]$ and $[\tfrac12, 1]$, and as a consequence $Tf$ is continuous on all of $[0,1]$. Altogether, we have shown that $T$ is a well defined map from $X$ to $X$.

Now let $f$ and $g$ be two arbitrary functions in $X$. For $0 \le x \le \tfrac12$ one has

$$|Tf(x) - Tg(x)| = a\,|f(2x) - g(2x)| \le a\, d(f,g),$$

and for $\tfrac12 \le x \le 1$ the following is true:

$$|Tf(x) - Tg(x)| = \bigl|\bigl(\bar a f(2x-1) + 2ax + \tfrac14\bigr) - \bigl(\bar a g(2x-1) + 2ax + \tfrac14\bigr)\bigr| = |\bar a|\,|f(2x-1) - g(2x-1)| \le |\bar a|\, d(f,g).$$

Because of $|\bar a| < a\ (< 1)$ we therefore have $|Tf(x) - Tg(x)| \le a\, d(f,g)$ for all $x \in [0,1]$, and (12) is proven.

From (6.13) it follows by the general fixed point theorem that there is a unique function $f^* \in X$ satisfying $Tf^* = f^*$. This function $f^*$ coincides in the points of $\mathbb{D} \cap [0,1]$ with the function $\phi\colon \mathbb{D} \to \mathbb{R}$ constructed earlier, because at the points 0 and 1 the function $f^*$ has the same values as $\phi$ has, and because the reproduction scheme (11), applied to $f := f^*$ ($\Rightarrow Tf = f^*$), goes over into the reproduction scheme (10) for the function $\phi|(\mathbb{D} \cap [0,1])$. From this it follows that our $\phi\colon \mathbb{D} \to \mathbb{R}$, restricted to $0 \le x \le 1$, has a continuous extension on all of $[0,1]$. Now from (8) one concludes that such continuous extensions exist in the intervals $[1,2]$ and $[2,3]$ as well, and outside of $[0,3]$ the definition $\phi(x) := 0$ trivially makes for a continuous extension.
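The contraction property (12) and the fixed point $f^*$ can be observed on a finite binary grid. The sketch below samples functions of $X$ at the points $m/n$ with $n = 2^8$; since $T$ only ever asks for $f(2x)$ or $f(2x-1)$, the grid is mapped into itself. The names `apply_T` and `sup_dist`, the grid size, and the two starting functions are choices made for this illustration only.

```python
import math

A = (1 + math.sqrt(3)) / 4        # the number a
A_BAR = (1 - math.sqrt(3)) / 4    # its conjugate


def apply_T(f):
    """Apply the map T of (11) to samples f[m] = f(m / n), m = 0..n, with n even."""
    n = len(f) - 1
    half = n // 2
    g = [0.0] * (n + 1)
    for m in range(n + 1):
        x = m / n
        if m <= half:
            g[m] = A * f[2 * m]                              # first line of (11)
        else:
            g[m] = A_BAR * f[2 * m - n] + 2 * A * x + 0.25   # second line of (11)
    return g


def sup_dist(f, g):
    """Maximum distance of two sampled functions on the grid."""
    return max(abs(u - v) for u, v in zip(f, g))


n = 2 ** 8
f0 = [2 * A * m / n for m in range(n + 1)]            # straight line from 0 to 2a
g0 = [2 * A * (m / n) ** 2 for m in range(n + 1)]     # a parabola with the same end values

contraction_ok = sup_dist(apply_T(f0), apply_T(g0)) <= A * sup_dist(f0, g0) + 1e-15

f_star = f0
for _ in range(60):                                   # fixed point iteration
    f_star = apply_T(f_star)
print(contraction_ok, f_star[n // 2], 2 * A * A)
```

After the iteration the sampled function is reproduced by $T$ up to rounding, and its value at $x = \tfrac12$ agrees with $2a^2$.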
Figure 6.5 The Daubechies scaling function ${}_2\phi$
Let us summarize our results so far:

(6.14) There is a unique continuous function $\phi\colon \mathbb{R} \to \mathbb{R}$ having support $[0,3]$ and satisfying, identically in $x$, the following equations:

(a) $\phi(x) = \sqrt2 \sum_{k=0}^{3} h_k\, \phi(2x-k)$,

(b) $\sum_k \phi(x-k) = 1$,

(c) $\sum_k k\,\phi(x-k) = x - \dfrac{3-\sqrt3}{2}$.

(a) The function $u(x) := \phi(x) - \sqrt2 \sum_{k=0}^{3} h_k\, \phi(2x-k)$ is continuous and vanishes at all points of $\mathbb{D}$, consequently $u(x) \equiv 0$. In any bounded $x$-interval, the left hand side of (b) is a finite sum and therefore a continuous function $v(\cdot)$. According to (6.12)(c) this function takes the value 1 at all points of $\mathbb{D}$, therefore we have $v(x) \equiv 1$ on all of $\mathbb{R}$. In an analogous manner one obtains the identity (c) from (6.12)(d).

The function $\phi\colon \mathbb{R} \to \mathbb{R}$ we constructed here is in fact the Daubechies scaling function ${}_2\phi$, for (6.14)(a) implies
and from (6.14)(b) one concludes

$$\sqrt{2\pi}\,\hat\phi(0) = \int_0^3 \phi(x)\,dx = \int_0^1 \sum_{k=0}^{2} \phi(x+k)\,dx = \int_0^1 \sum_k \phi(x+k)\,dx = 1.$$

Altogether this means that 6.1.(2) is true. It follows that our $\phi$ is the "original", i.e., time domain version of the unique scaling function belonging to the coefficient vector $(h_0, \ldots, h_3)$. This function, by definition, is ${}_2\phi$; but up to this point it was analytically available to us only in the form $\hat\phi$.

In Figures 6.5 and 6.6, the functions ${}_2\phi$ and ${}_2\psi$ are shown. These figures have been created by means of the described recursion procedure, computing $3 \cdot 256$ values in each of the two cases.

Figure 6.6 The Daubechies wavelet ${}_2\psi$
6.4 Spline wavelets

In this last section we construct the so-called Battle-Lemarié wavelets. The starting material consists of spline functions, and that's why these wavelets are occasionally called spline wavelets as well, even though they are no longer spline functions. At the same time, the Battle-Lemarié wavelets, in contradiction to the title of the current chapter, don't have compact support either. Nevertheless it will be possible to use the formalism that we have erected in the foregoing sections for the treatment of these wavelets as well. But let's take everything in turn!

Another glance at the scaling equation in the form 5.3.(4) shows that, given two pairs $(\hat\phi_1, H_1)$ and $(\hat\phi_2, H_2)$, each of them satisfying such an equation, the pair $(\hat\phi_1 \cdot \hat\phi_2,\ H_1 \cdot H_2)$ satisfies such an equation as well. To multiplication in the Fourier domain corresponds convolution in the time domain; in other words, if $\phi_1$ and $\phi_2$ are scaling functions, then $\phi_1 * \phi_2$ will satisfy a scaling equation as well. Therefore, beginning with $\phi_0 := \phi_{\text{Haar}}$ and setting up the recursion scheme $\phi_{n+1} := \phi_0 * \phi_n$ $(n \ge 0)$, we should obtain a sequence of ever more regular functions that a priori satisfy scaling equations and could maybe be adapted to be useful in the construction of wavelets.

We are going to change our notation to some extent, for the functions obtained in this way have previously appeared in numerical practice, going by the name of B-splines (for "basis splines"), and they play an important role in the general theory of spline approximation. Various notations for these functions can be found in the literature, among them the following, which suits our purposes well enough:

$$B_0(x) := \begin{cases} 1 & (0 \le x < 1) \\ 0 & (\text{otherwise}) \end{cases}, \qquad B_{n+1}(x) := (B_0 * B_n)(x) = \int_{x-1}^{x} B_n(t)\,dt \qquad (n \ge 0). \qquad (1)$$

Doing the actual computation one finds, e.g., that the cubic B-spline is given by the following formulas:

$$B_3(x) = \begin{cases} \tfrac16 x^3 & (0 \le x \le 1) \\ \tfrac23 - 2x + 2x^2 - \tfrac12 x^3 & (1 \le x \le 2) \\ B_3(4-x) & (2 \le x \le 4) \\ 0 & (\text{otherwise}). \end{cases}$$

Figure 6.7 shows the graphs of $B_1$, $B_2$ and $B_3$.
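The piecewise formulas for $B_3$ can be cross-checked against a closed form. The sketch below evaluates $B_n$ by the standard truncated-power representation $B_n(x) = \frac{1}{n!}\sum_{j=0}^{n+1} (-1)^j \binom{n+1}{j} (x-j)_+^n$, which is not taken from the text but is a well known consequence of the recursion (1). In addition it tests numerically the two-scale relation $B_n(x) = 2^{-n}\sum_k \binom{n+1}{k} B_n(2x-k)$ that the argument about products of generating functions leads one to expect and that is derived later in this section. All function names are ours.

```python
from math import comb, factorial


def bspline(n, x):
    """B-spline B_n (n >= 1) with support [0, n+1], via truncated powers (assumed closed form)."""
    if x <= 0 or x >= n + 1:
        return 0.0
    total = 0.0
    for j in range(n + 2):
        if x > j:
            total += (-1) ** j * comb(n + 1, j) * (x - j) ** n
    return total / factorial(n)


def cubic_from_text(x):
    """The piecewise formulas for B_3 as printed above."""
    if 0 <= x <= 1:
        return x ** 3 / 6
    if 1 <= x <= 2:
        return 2 / 3 - 2 * x + 2 * x ** 2 - x ** 3 / 2
    if 2 <= x <= 4:
        return cubic_from_text(4 - x)
    return 0.0


def two_scale_rhs(n, x):
    """Right hand side 2^{-n} sum_k C(n+1, k) B_n(2x - k) of the two-scale relation."""
    return sum(comb(n + 1, k) * bspline(n, 2 * x - k) for k in range(n + 2)) / 2 ** n


grid = [m / 16 for m in range(-8, 96)]
err_cubic = max(abs(bspline(3, x) - cubic_from_text(x)) for x in grid)
err_scale = max(abs(bspline(n, x) - two_scale_rhs(n, x)) for n in range(1, 5) for x in grid)
print(err_cubic, err_scale)
```

Both deviations stay at the level of rounding errors on the chosen grid.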
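The easy verification of the following statements is left to the reader:

$$\operatorname{supp}(B_n) = [0, n+1], \qquad \int B_n(x)\,dx = 1 \qquad (n \ge 0);$$

furthermore, one has

$(n \ge 1)$.

Figure 6.7

Since for all practical purposes $B_0 = \phi_{\text{Haar}}$, copying 5.3.(20) gives

$$\hat B_0(\xi) = \frac{1}{\sqrt{2\pi}}\; e^{-i\xi/2}\, \frac{\sin(\xi/2)}{\xi/2}.$$

The convolution theorem (2.10) converts the recursion formula (1) into the formula

$$\hat B_{n+1}(\xi) = \sqrt{2\pi}\; \hat B_0(\xi)\, \hat B_n(\xi),$$

and by multiplicative accumulation one obtains

$$\hat B_n(\xi) = \frac{1}{\sqrt{2\pi}} \Bigl(e^{-i\xi/2}\, \frac{\sin(\xi/2)}{\xi/2}\Bigr)^{n+1} \qquad (n \ge 0). \qquad (2)$$

The following can immediately be read off from this representation of $\hat B_n$: (3)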
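On account of what was said at the beginning of this section, we now expect that each B-spline $B_n$ satisfies a scaling equation. As a matter of fact, we have

$$\frac{\sin(\xi/2)}{\xi/2} = \cos\frac{\xi}{4} \cdot \frac{\sin(\xi/4)}{\xi/4}$$

and consequently

$$\hat B_n(\xi) = \Bigl(e^{-i\xi/4} \cos\frac{\xi}{4}\Bigr)^{n+1}\, \hat B_n\Bigl(\frac{\xi}{2}\Bigr).$$

This means that

$$\hat B_n(\xi) = H_n\Bigl(\frac{\xi}{2}\Bigr)\, \hat B_n\Bigl(\frac{\xi}{2}\Bigr), \qquad (4)$$

where the generating function $H_n$ is given by

$$H_n(\xi) := \Bigl(e^{-i\xi/2} \cos\frac{\xi}{2}\Bigr)^{n+1} = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{n+1}. \qquad (5)$$

We see that the coefficients $h_k$ ($h_k^{(n)}$, really) of $H_n$ have the following values:

$$h_k = \begin{cases} \dfrac{\sqrt2}{2^{n+1}} \dbinom{n+1}{k} & (0 \le k \le n+1) \\ 0 & (\text{otherwise}) \end{cases}$$

so that the scaling equation in the time domain takes on the following form:

$$B_n(x) = \frac{1}{2^n} \sum_{k=0}^{n+1} \binom{n+1}{k} B_n(2x-k) \qquad (x \in \mathbb{R}).$$

That the $B_n$ would satisfy such identities could not immediately be guessed from looking at their definition!

In order to check whether $B_n$ can be used as a scaling function, according to (5.9) we have to examine the $2\pi$-periodic function

$$\Phi_n(\xi) := \sum_l \bigl|\hat B_n(\xi + 2\pi l)\bigr|^2. \qquad (6)$$

Because of (3) the series appearing on the right is uniformly convergent. It follows that $\Phi_n$ is a continuous function (we shall compute $\Phi_n$ explicitly later on). Furthermore, we obtain, using (2) and the inequality

$$\frac{\sin x}{x} \ge \frac{2}{\pi} \qquad \bigl(0 < |x| \le \tfrac{\pi}{2}\bigr),$$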
the following estimate:

$$\bigl|\hat B_n(\xi)\bigr|^2 = \frac{1}{2\pi} \Bigl|\frac{\sin(\xi/2)}{\xi/2}\Bigr|^{2n+2} \ge \frac{1}{2\pi} \Bigl(\frac{2}{\pi}\Bigr)^{2n+2} \qquad (|\xi| \le \pi).$$

Under these circumstances there are numbers $B \ge A > 0$ ($B$ and $A$ depend on $n$) such that

$$A \le \Phi_n(\xi) \le B \qquad \forall \xi \in \mathbb{R},$$

and on account of part (a) of Theorem (5.14) we come to the conclusion that the translates $B_n(\cdot - k)$ $(k \in \mathbb{Z})$ constitute a Riesz basis of the space

$$V_0 := \overline{\operatorname{span}}\bigl(B_n(\cdot - k) \mid k \in \mathbb{Z}\bigr).$$

The proof of the following lemma is deferred to a later point:

(6.15) There are polynomials $P_n$ of respective degree $n$ such that the following is true:

$$\Phi_n(\xi) = \frac{1}{2\pi}\, P_n(\cos\xi) \qquad (n \ge 0).$$

The $P_n$ can be computed recursively and have rational coefficients.

We now suppose that an $n \ge 1$ has been chosen and remains fixed in what follows. Part (b) of Theorem (5.14) describes an orthonormalization procedure; in particular, it gives a formula for the "definitive" scaling function $\phi$ corresponding to the chosen $n$, meaning that the translates $\phi(\cdot - k)$ $(k \in \mathbb{Z})$ of $\phi$ are in fact orthonormal. The formula in question is

$$\hat\phi(\xi) = \frac{\hat B_n(\xi)}{\sqrt{P_n(\cos\xi)}}. \qquad (7)$$

In order to get an expression for $\phi$ in the time domain, we develop the function $1/\sqrt{P_n(\cos\xi)}$ into a Fourier series:

$$\frac{1}{\sqrt{P_n(\cos\xi)}} = \sum_k c_k\, e^{-ik\xi}.$$

Inserting this into (7) and applying rule (R1) we finally obtain the following representation of the scaling function $\phi$ corresponding to the chosen $n$:

$$\phi(x) = \sum_k c_k\, B_n(x-k). \qquad (8)$$
It has to be admitted, however, that the coefficients

$$c_k = c_{-k} = \frac{1}{\pi} \int_0^{\pi} \frac{\cos(k\xi)}{\sqrt{P_n(\cos\xi)}}\, d\xi \qquad (k \ge 0)$$

appearing here have to be computed numerically one by one. Since $1/\sqrt{P_n(\cos\xi)}$ is a real-analytic $2\pi$-periodic function, the $c_k$ have exponential decay when $|k| \to \infty$: There is a $\rho < 1$ such that

$$|c_k| \le C \rho^{|k|},$$

and because of $\operatorname{supp}(B_n) = [0, n+1]$ it easily follows from this that $\phi(x)$ is exponentially decaying when $|x| \to \infty$ as well. But the compact support of $B_n$ has been lost in the orthogonalization process.

Proceeding along the lines of the general theory, we further need the modified generating function $H^\#$, and in order to be able to work with the mother wavelet $\psi$ corresponding to the above $\phi$ we need the coefficients $h^\#_r$ in the representation

$$H^\#(\xi) = \frac{1}{\sqrt2} \sum_r h^\#_r\, e^{-ir\xi}. \qquad (9)$$

From (7) we conclude because of (4) that

$$\hat\phi(2\xi) = H_n(\xi) \sqrt{\frac{P_n(\cos\xi)}{P_n(\cos(2\xi))}}\; \hat\phi(\xi).$$

Therefore, by means of (5), we get the representation

$$H^\#(\xi) = \Bigl(\frac{1+e^{-i\xi}}{2}\Bigr)^{n+1} \sqrt{\frac{P_n(\cos\xi)}{P_n(\cos(2\xi))}}, \qquad (10)$$

from which one can read off already that $\psi$ has the order $n+1$. The square root on the right now has to be developed into a Fourier series:

$$\sqrt{\frac{P_n(\cos\xi)}{P_n(\cos(2\xi))}}\,; \qquad (11)$$

here again the coefficients $(k \ge 0)$
have to be computed numerically one by one. Comparing coefficients in (9) and (10) we obtain the following formula for the $h^\#_r$:

(12)

Only now are we in a position to compute the Battle-Lemarié wavelet resp. spline wavelet $\psi$ corresponding to the chosen $n$. On account of 5.3.(16) resp. (8) we have

$$\psi(t) = \sqrt2 \sum_k (-1)^{k-1}\, h^\#_{-k-1}\, \phi(2t-k) = \sqrt2 \sum_k \sum_l (-1)^{k-1}\, h^\#_{-k-1}\, c_l\, B_n(2t-k-l) = \sqrt2 \sum_r \sum_k (-1)^{k-1}\, h^\#_{-k-1}\, c_{r-k}\, B_n(2t-r).$$

This means that we should introduce the new set of coefficients

$$b_r := \sqrt2 \sum_k (-1)^{k-1}\, h^\#_{-k-1}\, c_{r-k},$$

and in this way we get definitively

$$\psi(t) = \sum_r b_r\, B_n(2t-r).$$

How many terms of this expansion actually have to be taken into consideration is best decided "at run time". The last formula has brought our discussion to a close. It remains to supply the proof of Lemma (6.15).
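Inserting (2) into the definition (6) of $\Phi_n$ we get

$$\Phi_n(\xi) = \frac{1}{2\pi} \sin^{2n+2}\frac{\xi}{2}\; \sum_l \frac{1}{\bigl(\frac{\xi}{2} + \pi l\bigr)^{2n+2}} = \frac{1}{2\pi} \sin^{2n+2}\frac{\xi}{2}\; S_n(\xi),$$

where we have introduced the auxiliary function

$$S_n(\xi) := \sum_l \frac{1}{\bigl(\frac{\xi}{2} + \pi l\bigr)^{2n+2}}.$$

As is easily verified, one has

$$S_n(\xi) = \frac{2}{n(2n+1)}\; S_{n-1}''(\xi) \qquad (n \ge 1),$$

and this leads to the following recursion formula for the $\Phi_n$:

$$\Phi_n(\xi) = \frac{2}{n(2n+1)}\; \sin^{2n+2}\frac{\xi}{2}\; \frac{d^2}{d\xi^2} \Bigl(\frac{\Phi_{n-1}(\xi)}{\sin^{2n}(\xi/2)}\Bigr). \qquad (13)$$

It remains to knead this prescription into a more practicable form. Since the $B_0(\cdot - k)$ $(k \in \mathbb{Z})$ are in fact orthonormal, we have $\Phi_0(\xi) \equiv \frac{1}{2\pi}$. Setting $\cos\xi =: y$, we introduce a new variable $y$ and write the function $\Phi_n$ in the following form:

$$\Phi_n(\xi) = \frac{1}{2\pi}\, P_n(y); \qquad P_0(y) \equiv 1.$$

We are now going to insert this into (13). In so doing we must observe the following differentiation rules:

$$\frac{d}{d\xi} = (-\sin\xi)\, \frac{d}{dy}, \qquad \frac{d^2}{d\xi^2} = -y\, \frac{d}{dy} + (1-y^2)\, \frac{d^2}{dy^2}.$$

In this way the recursion formula (13) becomes

$$P_n(y) = \frac{1}{n(2n+1)}\, (1-y)^{n+1} \biggl(-y \Bigl(\frac{P_{n-1}(y)}{(1-y)^n}\Bigr)^{\!\textstyle\cdot} + (1-y^2) \Bigl(\frac{P_{n-1}(y)}{(1-y)^n}\Bigr)^{\!\textstyle\cdot\cdot}\biggr), \qquad (14)$$

where the dot $\dot{}$ denotes differentiation with respect to the variable $y$. By computing successively

$$\Bigl(\frac{P_{n-1}(y)}{(1-y)^n}\Bigr)^{\!\textstyle\cdot} = \frac{\dot P_{n-1}}{(1-y)^n} + n\, \frac{P_{n-1}}{(1-y)^{n+1}},$$

$$\Bigl(\frac{P_{n-1}(y)}{(1-y)^n}\Bigr)^{\!\textstyle\cdot\cdot} = \frac{\ddot P_{n-1}}{(1-y)^n} + 2n\, \frac{\dot P_{n-1}}{(1-y)^{n+1}} + n(n+1)\, \frac{P_{n-1}}{(1-y)^{n+2}},$$

we get rid of the denominators in (14):

$$P_n(y) = \frac{1}{n(2n+1)} \Bigl(-y(1-y)\,\dot P_{n-1} - n y\, P_{n-1} + (1+y)\bigl((1-y)^2\, \ddot P_{n-1} + 2n(1-y)\, \dot P_{n-1} + n(n+1)\, P_{n-1}\bigr)\Bigr).$$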
6.4 Spline wavelets
195
This can be slightly simplified by collecting like terms. In this way we obtain the following definitive recursion formula for the $P_n$:
$$
P_n(y) = \frac{1}{n(2n+1)}\Bigl( n(n+1+ny)\,P_{n-1} + (1-y)\bigl(2n+(2n-1)y\bigr)\,\dot{P}_{n-1} + (1-y)^2(1+y)\,\ddot{P}_{n-1} \Bigr)\,;
$$
and it is easy to see that $P_n$ is a polynomial of degree $n$ in the variable $y = \cos\xi$, if $P_{n-1}$ had degree $n-1$.

If one feeds the final recursion formula to, e.g., Mathematica®, the following output is returned:
$$
P_1(y) = \tfrac{1}{3}\,(2+y)\,, \qquad
P_2(y) = \tfrac{1}{30}\,(16+13y+y^2)\,, \qquad
P_3(y) = \tfrac{1}{630}\,(272+297y+60y^2+y^3)\,,
$$
and so on.
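The recursion for the P_n can also be run in exact rational arithmetic without a computer algebra system. The following sketch uses plain Python instead of Mathematica and stores each polynomial as a list of `Fraction` coefficients, lowest degree first. The helper names (`poly_mul`, `poly_add`, `poly_diff`, `next_P`, `P`) are choices made here for illustration.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out

def poly_add(a, b):
    """Sum of two polynomials, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_diff(a):
    """Derivative with respect to y; the derivative of a constant is [0]."""
    return [k * a[k] for k in range(1, len(a))] or [Fraction(0)]

def next_P(P_prev, n):
    """One step of the recursion: compute P_n from P_{n-1}."""
    d1 = poly_diff(P_prev)
    d2 = poly_diff(d1)
    one_minus_y = [Fraction(1), Fraction(-1)]
    # n(n+1+ny) * P_{n-1}
    t0 = poly_mul([Fraction(n * (n + 1)), Fraction(n * n)], P_prev)
    # (1-y)(2n+(2n-1)y) * P_{n-1}'
    t1 = poly_mul(poly_mul(one_minus_y, [Fraction(2 * n), Fraction(2 * n - 1)]), d1)
    # (1-y)^2 (1+y) * P_{n-1}''
    t2 = poly_mul(poly_mul(poly_mul(one_minus_y, one_minus_y),
                           [Fraction(1), Fraction(1)]), d2)
    total = [coef / (n * (2 * n + 1)) for coef in poly_add(poly_add(t0, t1), t2)]
    while len(total) > 1 and total[-1] == 0:   # drop vanishing top coefficients
        total.pop()
    return total

def P(n):
    """Coefficient list of P_n, starting from P_0 = 1."""
    p = [Fraction(1)]
    for m in range(1, n + 1):
        p = next_P(p, m)
    return p

for n in range(1, 4):
    print(n, [str(coef) for coef in P(n)])
```

A convenient sanity check is the value at y = 1: there the two derivative terms vanish and the recursion reduces to P_n(1) = P_{n-1}(1), so the coefficients of every P_n must add up to 1.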
① In the case $n = 1$ one obtains by means of (11) and (12) the following table of coefficients $h^{\#}_r$:

| $r$ | $h^{\#}_r = h^{\#}_{2-r}$ | $r$ | $h^{\#}_r = h^{\#}_{2-r}$ |
|---|---|---|---|
| 1 | .8176464014 | 17 | .0000034798 |
| 2 | .3972970868 | 18 | .0000018656 |
| 3 | -.0691009838 | 19 | -.0000008823 |
| 4 | -.0519453464 | 20 | -.0000004712 |
| 5 | .0169710467 | 21 | .0000002249 |
| 6 | .0099905948 | 22 | .0000001198 |
| 7 | -.0038832619 | 23 | -.0000000576 |
| 8 | -.0022019510 | 24 | -.0000000306 |
| 9 | .0009233709 | 25 | .0000000148 |
| 10 | .0005116360 | 26 | .0000000078 |
| 11 | -.0002242963 | 27 | -.0000000038 |
| 12 | -.0001226863 | 28 | -.0000000020 |
| 13 | .0000553563 | 29 | .0000000010 |
| 14 | .0000300112 | 30 | .0000000005 |
| 15 | -.0000138188 | 31 | -.0000000003 |
| 16 | -.0000074444 | 32 | -.0000000001 |
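The tabulated values can be checked against the usual identities for the coefficients of an orthonormal scaling function, namely Σ_k h_k = √2 and Σ_k h_k h_{k+2m} = δ_{0m}. The sketch below assumes that these identities apply to the h#_r in the normalization used here. It extends the table to all integers r by the symmetry h#_r = h#_{2−r} and treats untabulated values as zero. The names `H_TABLE`, `h` and `shifted_product` are introduced here for illustration.

```python
import math

# Tabulated h#_r for r = 1, ..., 32 (case n = 1).  From r = 2 on the signs
# follow the pattern +, -, -, +, +, -, -, ...
H_TABLE = [
    0.8176464014, 0.3972970868, -0.0691009838, -0.0519453464,
    0.0169710467, 0.0099905948, -0.0038832619, -0.0022019510,
    0.0009233709, 0.0005116360, -0.0002242963, -0.0001226863,
    0.0000553563, 0.0000300112, -0.0000138188, -0.0000074444,
    0.0000034798, 0.0000018656, -0.0000008823, -0.0000004712,
    0.0000002249, 0.0000001198, -0.0000000576, -0.0000000306,
    0.0000000148, 0.0000000078, -0.0000000038, -0.0000000020,
    0.0000000010, 0.0000000005, -0.0000000003, -0.0000000001,
]

def h(r):
    """Return h#_r for any integer r, using the symmetry h#_r = h#_{2-r}."""
    if r < 1:
        r = 2 - r
    return H_TABLE[r - 1] if r <= len(H_TABLE) else 0.0

def shifted_product(m, span=40):
    """Return sum_k h#_k * h#_{k+2m}, summed over -span <= k <= span."""
    return sum(h(k) * h(k + 2 * m) for k in range(-span, span + 1))

coefficient_sum = sum(h(k) for k in range(-40, 41))
print("sum of coefficients:", coefficient_sum, "  sqrt(2):", math.sqrt(2.0))
print("sum of squares     :", shifted_product(0))
print("shift by 2         :", shifted_product(1))
```

Up to the rounding of the table, the first two numbers should come out as √2 and 1, and the shifted product should vanish.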
The scaling function $\phi$ and the Battle-Lemarié wavelet $\psi$ corresponding to $n = 1$ are shown in Figures 6.8 and 6.9. Both functions are piecewise linear.

Figure 6.8 The Battle-Lemarié scaling function corresponding to $n = 1$
Figure 6.9 The Battle-Lemarié wavelet corresponding to $n = 1$
Carrying out the same calculations for $n = 3$, one finds that the $h^{\#}_r$ now decay considerably more slowly than before when $|r| \to \infty$. As a consequence the following table gives these $h^{\#}_r$ to six decimal places only, although they were originally computed, using Mathematica®, to 14 decimal places.

| $r$ | $h^{\#}_r = h^{\#}_{4-r}$ | $r$ | $h^{\#}_r = h^{\#}_{4-r}$ |
|---|---|---|---|
| 2 | .766130 | 17 | -.000927 |
| 3 | .433923 | 18 | .000560 |
| 4 | -.050202 | 19 | .000462 |
| 5 | -.110037 | 20 | -.000285 |
| 6 | .032081 | 21 | -.000232 |
| 7 | .042068 | 22 | .000146 |
| 8 | -.017176 | 23 | .000118 |
| 9 | -.017982 | 24 | -.000075 |
| 10 | .008685 | 25 | -.000060 |
| 11 | .008201 | 26 | .000039 |
| 12 | -.004354 | 27 | .000031 |
| 13 | -.003882 | 28 | -.000020 |
| 14 | .002187 | 29 | -.000016 |
| 15 | .001882 | 30 | .000010 |
| 16 | -.001104 | 31 | .000008 |
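The difference in decay between the two cases can be quantified roughly from the tables themselves. The following sketch takes the magnitudes |h#_r| at r = 13, 17, 21, 25, 29 from each table and estimates a geometric decay factor per unit step of r. The step of 4 is chosen because the sign pattern of both tables repeats with period 4. The helper name `per_step_decay` and the selection of entries are assumptions made here, and the result is only a crude estimate, since the last entries carry very few significant digits.

```python
# Magnitudes |h#_r| taken from the two tables, at r = 13, 17, 21, 25, 29.
H1 = {13: 0.0000553563, 17: 0.0000034798, 21: 0.0000002249,
      25: 0.0000000148, 29: 0.0000000010}            # case n = 1
H3 = {13: 0.003882, 17: 0.000927, 21: 0.000232,
      25: 0.000060, 29: 0.000016}                    # case n = 3

def per_step_decay(table):
    """Estimate q in |h#_r| ~ const * q**r from the first and last entry."""
    r_first, r_last = min(table), max(table)
    return (table[r_last] / table[r_first]) ** (1.0 / (r_last - r_first))

decay_n1 = per_step_decay(H1)
decay_n3 = per_step_decay(H3)
print("estimated decay factor per step, n = 1:", round(decay_n1, 3))
print("estimated decay factor per step, n = 3:", round(decay_n3, 3))
```

A factor closer to 1 means slower decay, so the estimate for n = 3 should come out visibly larger than the one for n = 1.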
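The scaling function $\phi$ and the Battle-Lemarié wavelet $\psi$ corresponding to $n = 3$ are shown in Figures 6.10 and 6.11.

Figure 6.10 The Battle-Lemarié scaling function corresponding to $n = 3$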
Figure 6.11 The Battle-Lemarié wavelet corresponding to $n = 3$
References

Books on wavelets

[Be] John J. Benedetto and Michael W. Frazier eds.: Wavelets: Mathematics and applications. CRC Press 1994.

[Bu] C. Sidney Burrus, Ramesh A. Gopinath and Haitao Guo: Introduction to wavelets and wavelet transforms. Prentice Hall 1998.

[C] Charles K. Chui: An introduction to wavelets. Academic Press 1992.

[C'] Charles K. Chui ed.: Wavelets. A tutorial in theory and applications. Academic Press 1992.

[D] Ingrid Daubechies: Ten lectures on wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM 1992.

[D'] Ingrid Daubechies ed.: Different perspectives on wavelets. Proc. Symp. Appl. Math. 47, Amer. Math. Soc. 1993.

[K] Gerald Kaiser: A friendly guide to wavelets. Birkhäuser 1994.

[L] Alfred K. Louis, Peter Maaß and Andreas Rieder: Wavelets, Theorie und Anwendungen. Teubner 1994.

[M] Yves Meyer: Ondelettes et opérateurs, I: Ondelettes. Hermann 1990. The same in English: Wavelets and operators. Cambridge University Press 1992.

[W] Mladen Victor Wickerhauser: Adapted wavelet analysis from theory to software. A K Peters 1994.

Original papers and background material

[1] Christopher M. Brislawn: Fingerprints go digital. AMS Notices 42(11) (1995), 1278-1283.

[2] Paul L. Butzer and Rolf J. Nessel: Fourier analysis and approximation. Vol. I: One-dimensional theory. Birkhäuser 1971.

[3] Ingrid Daubechies: Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics 41 (1988), 909-996.