ANHA
Applied and Numerical Harmonic Analysis
Hogan Lakey
Time–Frequency and Time–Scale Methods: Adaptive Decompositions, Uncertainty Principles, and Sampling
Developed in this book are several deep connections between time–frequency (Fourier/Gabor) analysis and time–scale (wavelet) analysis, emphasizing the powerful adaptive methods that emerge when separate techniques from each area are properly assembled in a larger context. While researchers at the forefront of developments in time–frequency and time–scale analysis are well aware of the benefits of such a unified approach, there remains a knowledge gap in the larger community of practitioners about the precise strengths and limitations of Fourier/Gabor analysis versus wavelets. This book fills that gap by presenting the interface of time–frequency and time–scale methods as a rich area of work.
Topics and Features:
• Inclusion of historical, background material such as the pioneering ideas of von Neumann in quantum mechanics and Landau, Slepian, and Pollak in signal analysis
• Presentation of self-contained core material on wavelets, sampling reconstruction of bandlimited signals, and local trigonometric and wavelet packet bases
• Development of connections based on perspectives that emerged after the wavelet revolution of the 1980s
• Integrated approach to the use of Fourier/Gabor methods and wavelet methods
• Comprehensive treatment of Fourier uncertainty principles
• Explanations at the end of each chapter addressing other major developments and new directions for research

Applied mathematicians and engineers in signal/image processing and communication theory will find in the first half of the book an accessible presentation of principal developments in this active field of modern analysis, as well as the mathematical methods underlying real-world applications. Researchers and students in mathematical analysis, signal analysis, and mathematical physics will benefit from the coverage of deep mathematical advances featured in the second part of the work.
ISBN 0-8176-4276-5
Birkhäuser, www.birkhauser.com
Jeffrey A. Hogan, Joseph D. Lakey
Time–Frequency and Time–Scale Methods: Adaptive Decompositions, Uncertainty Principles, and Sampling November 3, 2004
Springer Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
To
Mysie and to Ellen
Contents
Preface

1 Wavelets: Basic properties, parameterizations and sampling
  1.1 Scaling and multiresolution analysis
    1.1.1 Orthonormal wavelet bases for $L^2(\mathbb{R})$
    1.1.2 Subband coding and FWT
    1.1.3 Biorthogonal multiresolution analyses
    1.1.4 Regularity for scaling distributions
  1.2 A construction of quadrature mirror filters
    1.2.1 The Zak transform
    1.2.2 Scaling functions in the Zak domain
    1.2.3 QMF construction algorithm
    1.2.4 Constraints on samples imposed by QMFs
    1.2.5 Parameterization of four-coefficient systems
    1.2.6 Cardinal scaling functions
  1.3 Computing the scaling function
  1.4 Notes

2 Derivatives and multiwavelets
  2.1 Wavelets and derivatives
    2.1.1 Nonstandard wavelet representation of $d/dx$
    2.1.2 Differentiation and commutation of MRAs
    2.1.3 Wavelet characterization of Sobolev norms
    2.1.4 Sobolev estimates for pointwise products
  2.2 Piecewise polynomial multiwavelets
    2.2.1 Multiwavelet introduction
    2.2.2 Alpert’s piecewise polynomial wavelets
    2.2.3 Interpolating scaling functions
    2.2.4 Multiscaling properties
  2.3 Multiwavelets based on fractal interpolation vectors
    2.3.1 Fractal interpolation functions
    2.3.2 DGHM multiwavelets
    2.3.3 Multiwavelets and Sobolev spaces on $\mathbb{R}_+$
    2.3.4 Strela’s two-scale transform and commutation
    2.3.5 Smoothing and roughening DGHM scaling filters
    2.3.6 Biorthogonal multiwavelets on $H_0^1(\mathbb{R}_+)$
  2.4 Notes

3 Sampling in Fourier and wavelet analysis
  3.1 Frames
    3.1.1 The frame algorithm
    3.1.2 Frame acceleration
  3.2 Sampling of trigonometric functions
    3.2.1 Uniform sampling and the fast Fourier transform
    3.2.2 Nonuniform (fast) Fourier transforms
    3.2.3 Algorithms based on Taylor polynomials
    3.2.4 The Dutt–Rokhlin algorithm
    3.2.5 The inverse transforms
    3.2.6 Nonuniform sampling and frames
  3.3 Sampling in the Paley–Wiener spaces
    3.3.1 Sampling sets for the Paley–Wiener spaces
    3.3.2 Iterative reconstructions in $PW_\Omega$
    3.3.3 Prolate spheroidal wavefunctions
    3.3.4 The $\Omega T$ theorem
    3.3.5 Quadrature for Paley–Wiener spaces
  3.4 Sampling in phase space: the short-time Fourier transform
    3.4.1 Regular Gabor frames
    3.4.2 Irregular Gabor frames
  3.5 Sampling in principal shift-invariant spaces
    3.5.1 Iterative reconstruction in PSI spaces
    3.5.2 Periodic nonuniform sampling in PSI spaces
  3.6 Notes

4 Bases for time–frequency analysis
  4.1 Wilson bases and the Zak transform
  4.2 Local trigonometric bases
    4.2.1 Smooth localization
    4.2.2 Locally bandlimited functions
  4.3 Wavelet packet bases
    4.3.1 High- and low-pass filters
    4.3.2 Subspaces and trees; splitting criteria
  4.4 Information cells and tilings
  4.5 The discrete Walsh model phase plane
    4.5.1 Subspaces spanned by finite sets of tiles
    4.5.2 Tilings and the notion of best basis
  4.6 Phase planes for finite Abelian groups
  4.7 Notes

5 Fourier uncertainty principles
  5.1 Fourier support properties
    5.1.1 Benedicks’ theorem
    5.1.2 Consequences of support properties
    5.1.3 Uncertainty and missing data
    5.1.4 Nazarov’s theorem
  5.2 Growth properties and Fourier uniqueness criteria
    5.2.1 Hardy’s theorem
    5.2.2 Beurling’s theorem
    5.2.3 Gelfand–Shilov spaces
  5.3 Finite uncertainty principles
  5.4 Symmetry and sharp inequalities
    5.4.1 The sharp Hausdorff–Young inequality
    5.4.2 Entropy and logarithmic Sobolev inequalities
    5.4.3 Other sharp inequalities
    5.4.4 Pitt’s inequalities
    5.4.5 Rearrangements and spectral concentration
  5.5 Uncertainty inequalities in phase space
    5.5.1 A Heisenberg inequality for the Wigner distribution
    5.5.2 Wigner consequences of Hausdorff–Young
    5.5.3 Benedicks’ theorem for the Wigner distribution
    5.5.4 Hardy’s theorem for $S(f, g)$
    5.5.5 Heisenberg’s inequality and phase plane rotations
    5.5.6 DeBruijn’s inequalities
  5.6 Weighted Fourier inequalities and uncertainty
  5.7 Embeddings, uncertainty and Poisson summation
    5.7.1 Weil’s approach to PSF and a generalized version
    5.7.2 Some necessary and sufficient conditions for PSF
    5.7.3 M1 and PSF in particular
    5.7.4 More counterexamples to PSF
    5.7.5 Proof of the embedding theorem
  5.8 Time–scale uncertainty principles
  5.9 Notes

6 Function spaces and operator theory
  6.1 Besov spaces: history and wavelets
  6.2 Unconditional bases as best bases
  6.3 Best nonlinear approximation
    6.3.1 Nonlinear wavelet approximation in Besov norms
    6.3.2 Temlyakov’s theorem and wavelet approximation
  6.4 Nonlinear approximation, wavelets and trees
  6.5 Wavelets and coding
    6.5.1 Kolmogorov entropy and coding
    6.5.2 Encoding
    6.5.3 Decoding
    6.5.4 Performance in Besov balls
  6.6 Boundedness and compression of operators
    6.6.1 Schur’s lemma
    6.6.2 Schur’s lemma and wavelet matrices
    6.6.3 Wavelet compression of operators
  6.7 Boundedness and compression of singular integrals
    6.7.1 Haar wavelets and the Hilbert transform
    6.7.2 Compression of Calderón–Zygmund operators
  6.8 Schur’s lemma and symbol classes
    6.8.1 Pseudodifferential operators
    6.8.2 Symbol conditions
    6.8.3 Estimates for singular values and compression of compact pseudodifferential operators
    6.8.4 Exotic symbols
  6.9 Dyadic structure and NWO sequences
  6.10 Notes

7 Uncertainty principles in mathematical physics
  7.1 Wave mechanics and uncertainty
    7.1.1 Spectral theory
    7.1.2 Measuring position and momentum
    7.1.3 Simultaneous observability
    7.1.4 Physical considerations of indeterminacy
  7.2 Eigenvalue estimates for Schrödinger operators
    7.2.1 Stability of the hydrogen atom
    7.2.2 Volume counting and its deficiencies
    7.2.3 Fefferman–Phong eigenvalue estimates
    7.2.4 Thomas–Fermi theory and stability of matter
    7.2.5 Sharpening the Fefferman–Phong condition
    7.2.6 NWO eigenvalue estimates for Schrödinger operators
    7.2.7 Eigenfunction estimates
  7.3 More on decay of wavelet coefficients
    7.3.1 Bounded variation and weak-$\ell^1$
    7.3.2 Wavelets and an improved Sobolev inequality
  7.4 More on the spectrum of Schrödinger operators
    7.4.1 WKB approximation
    7.4.2 Turning points and connection formulas
    7.4.3 Spectral estimates for Schrödinger operators with slowly decaying potentials
    7.4.4 Adapted martingales and pointwise bounds
    7.4.5 The endpoint $p = 2$ and Carleson-type operators
  7.5 Walsh models revisited
    7.5.1 A Walsh model for the Carleson operator
    7.5.2 A Walsh quartile operator and the BHT
    7.5.3 Estimates for the Walsh bilinear Hilbert transform
  7.6 WKB and WAM
    7.6.1 Cochlear modelling: early history
    7.6.2 The cochlear compromise
    7.6.3 Cochlear processing and WAM
  7.7 Notes

A Appendix
  A.1 Notation
  A.2 Miscellany from real and harmonic analysis
  A.3 Miscellany from functional analysis

References
Index
Preface
By time–frequency (TF) analysis we mean, loosely, techniques and principles used in signal analysis, PDE and harmonic analysis that combine consideration of spatial/temporal content, on the one hand, and spectral content on the other, in ways that yield more powerful results than from considering the two domains separately. Time–scale analysis encompasses those aspects involving wavelets and other multiscale methods. Wavelets provide particularly useful TF-decompositions, and the first two chapters of this book focus on certain aspects of wavelets.

The most appealing aspects of TF-analysis, in our view, are those that bring out connections between deep theoretical principles on the one hand and intended uses of the methods on the other. These aspects include adaptive decompositions that come from viewing wavelets and Fourier bases as special instances of a larger class of building blocks for function and signal decompositions, the sampling methods that can be attached to such building blocks, and the uncertainty principles (UPs) that different decompositions necessarily entail.

This is not a textbook. There are numerous resources, both on wavelets and on time–frequency analysis in general, that are more foundational. David Walnut’s An Introduction to Wavelet Analysis [356] and Karlheinz Gröchenig’s Foundations of Time–Frequency Analysis [168] are two excellent texts in this book series. This monograph is really geared toward readers who have some knowledge of the foundations of wavelets and time–frequency analysis. Although applications are mentioned, the word “methods” in the title refers to “mathematical methods” that readers working in applications will find useful when thinking of how to fine-tune their own analysis tools.

The first part of this book (Chapters 1–4) builds up material that, while technical, is more readily accessible to a more diverse audience.
The second part of the book (Chapters 5–7) builds on the first, but also makes use of important elements of harmonic analysis. Conceptually, three main threads intertwine throughout. These are: (i) adaptive decompositions, (ii) methods for passing from continuous to discrete information and vice versa, and (iii) uncertainty principles, that is, theoretical limitations on TF-localization. The following three pictures serve to illustrate in broad (and perhaps familiar) terms the main ideas of the book.
Fig. 0.1. A rectangular or “Gabor” grid
Fig. 0.2. A hyperbolic or “wavelet” grid
Figure 0.1 represents “classical” time–frequency analysis while Figure 0.2 represents wavelet analysis. The dyadic tiling in Figure 0.3 represents one of a vast family of pictures, of which the first two are especially important examples. When interpreted as having unit area, the cells in each picture are called Heisenberg tiles. Consider associating to such a picture a family of building blocks—one per cell—from which all signals or functions can be built. The building blocks are thought to be localized on the corresponding cell or tile. Time–frequency analysis addresses a whole raft of issues surrounding the signal or function representations that can arise in this way.
Fig. 0.3. A dyadic tiling
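The unit-area property of the cells in the first two pictures can be checked with a few lines of exact arithmetic. The following is only a minimal sketch: the `Tile` type, the function names, and the convention that scale $j$ of the wavelet grid occupies the octave $[2^j, 2^{j+1})$ are illustrative assumptions, not notation taken from the book.

```python
from fractions import Fraction
from typing import List, NamedTuple


class Tile(NamedTuple):
    """A rectangle [t0, t1) x [w0, w1) in the time-frequency plane."""
    t0: Fraction
    t1: Fraction
    w0: Fraction
    w1: Fraction

    def area(self) -> Fraction:
        return (self.t1 - self.t0) * (self.w1 - self.w0)


def gabor_tiles(n_time: int, n_freq: int) -> List[Tile]:
    """Unit squares [k, k+1) x [n, n+1), as in the rectangular grid."""
    return [Tile(Fraction(k), Fraction(k + 1), Fraction(n), Fraction(n + 1))
            for k in range(n_time) for n in range(n_freq)]


def wavelet_tiles(scales: range, t_max: int) -> List[Tile]:
    """Tiles [k 2^-j, (k+1) 2^-j) x [2^j, 2^(j+1)) covering 0 <= t < t_max.

    Assumed convention (hypothetical, for illustration): scale j occupies
    the frequency octave [2^j, 2^(j+1)).
    """
    tiles = []
    for j in scales:
        width = Fraction(2) ** (-j)   # time width 2^-j
        low = Fraction(2) ** j        # lower band edge 2^j
        count = int(Fraction(t_max) / width)
        for k in range(count):
            tiles.append(Tile(k * width, (k + 1) * width, low, 2 * low))
    return tiles


if __name__ == "__main__":
    grid = gabor_tiles(3, 2) + wavelet_tiles(range(0, 3), 2)
    print("all tiles have unit area:", all(t.area() == 1 for t in grid))
```

Halving the time width while doubling the frequency height is what keeps the area fixed as the scale changes; exact `Fraction` arithmetic avoids any rounding in the check.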
One aspect of the uncertainty principle concerns the possibility of localizing functions on a single rectangle $R = I_R \times \omega_R$ in which $I_R$ and $\omega_R$ denote the respective time and frequency intervals of $R$. If $f$ is well localized about $I_R$ then its Fourier transform $\hat{f}$ cannot be well localized about $\omega_R$ (and vice versa). Working at Bell Labs in the 1960s, Landau, Slepian and Pollak showed that, when appropriately phrased, the functions that are optimally localized on $R$ are prolate spheroidal wave functions. Moreover, when the area $|R|$ of $R$ is large, the dimension of the space of functions “well localized” on $R$ is essentially the area $|R|$ of $R$. Functions “localized on $R$” are said to be essentially time- and bandlimited.

Functions $f$ that are truly bandlimited live on a single horizontal strip (one row of rectangles) in Figure 0.1. Shannon’s sampling theorem says that, suitably normalized, such an $f$ can be expressed as a sum of its sample values via $f(t) = \sum_k f(k)\,\mathrm{sinc}(t - k)$ where $\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)$. The functions $\mathrm{sinc}(t - k)$ can be thought of as being localized about rectangles $[k, k+1) \times [-1/2, 1/2)$. However, the localization is poor in time. This is the price one must pay for the other nice properties of $\mathrm{sinc}(t)$. Specifically, the functions $\mathrm{sinc}(t - k)$ are orthogonal. Hence, for bandlimited $f$, $f(k) = \langle f, \mathrm{sinc}(\cdot - k) \rangle$. The orthogonality is best seen by dualizing Figure 0.1, i.e., interchanging the roles of $f$ and $\hat{f}$ (rotating by $\pi/2$). Then one is looking at a vertical strip; $\hat{f}$ is localized on this strip, on whose rectangles live the modulated Haar functions $e^{2\pi i n t}\chi_{[0,1)}(t)$, which form an orthonormal basis for $L^2([0, 1))$. By considering all integer translates and modulates $s_{nk}(t) = e^{2\pi i n t}\,\mathrm{sinc}(t - k)$ of the sinc function or all modulates and shifts $h_{nk}(t) = e^{2\pi i n t}\chi_{[k,k+1)}(t)$ of the Haar function, one obtains orthonormal bases for all of $L^2(\mathbb{R})$. However, the basis elements are either poorly localized in time ($s_{nk}$) or in frequency ($h_{nk}$).
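Shannon's series can be tried numerically. The sketch below is illustrative only: the test signal $\mathrm{sinc}(t/2)^2$ (bandlimited to $[-1/2, 1/2]$, since it is the square of a function bandlimited to $[-1/4, 1/4]$), the function names, and the truncation level are all assumptions chosen for the example. The slow $1/|t|$ decay of the sinc terms, noted above as poor localization in time, is the reason many samples are needed.

```python
import math


def sinc(t: float) -> float:
    """Normalized sinc, sin(pi t)/(pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def f(t: float) -> float:
    """Illustrative test signal sinc(t/2)^2, bandlimited to [-1/2, 1/2]."""
    return sinc(t / 2.0) ** 2


def shannon_sum(samples, t: float, k_max: int) -> float:
    """Truncated series: sum over |k| <= k_max of f(k) sinc(t - k)."""
    return sum(samples(k) * sinc(t - k) for k in range(-k_max, k_max + 1))


if __name__ == "__main__":
    # Compare the signal with its reconstruction from integer samples.
    for t in (0.3, 1.25, -2.7):
        approx = shannon_sum(f, t, k_max=200)
        print(f"t={t:+.2f}  f(t)={f(t):.6f}  series={approx:.6f}")
```

Because this particular signal decays like $1/t^2$, the truncated series converges quickly; for a signal whose samples decay more slowly, the truncation error would be correspondingly larger.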
Can a (Riesz) basis, not necessarily orthonormal, for $L^2(\mathbb{R})$ be formed of functions $g_{nk}$ that are “well localized” on the unit rectangles $[k, k+1) \times [n, n+1)$? The possibility of doing so with unit TF-shifts $e^{2\pi i n x} G(x - k)$ of the Gaussian $G(x) = e^{-\pi x^2}$ was suggested by Denis Gabor in 1946. This is why Figure 0.1 is often called the Gabor picture. However, the TF-shifted Gaussians turn out not to form a Riesz basis. The now famous Balian–Low theorem says that, in fact, no well-localized basis of TF-shifts can exist. Ideas due to Wilson actually furnish bases whose elements are “exponentially localized” on symmetric pairs of time–frequency tiles, but this was not realized until after the wavelet revolution. Thus, it came as a surprise that by modifying the geometry of the rectangles, replacing the squares in the Gabor picture by the “hyperbolic squares” of Figure 0.2 (the so-called wavelet picture), one could generate orthonormal bases of $L^2(\mathbb{R})$ whose elements $\psi_{jk}(t) = 2^{j/2}\psi(2^j t - k)$ are well localized about corresponding unit TF-rectangles.

It is constructive to think of the tiling in Figure 0.3 as being obtained from those of the first two figures by a sequence of recombinations of adjacent pairs of rectangles, trading time sibling tiles sharing the same frequency interval for frequency siblings or vice versa. Two specific methods for doing so correspond to wavelet packet and local trigonometric bases, respectively. Recombination criteria lead to adaptive signal decomposition techniques including best basis algorithms.

The preceding paragraphs give only a broad view of time–frequency analysis. The beauty of its methods comes out best in particular contexts and intricate technical constructions. Once immersed in details it may be difficult to follow the main threads and their connections, several of which are summarized in Figure 0.4, and to discover surprisingly diverse uses of common tools and methods, including the Zak transform (Sections 1.2, 3.5, 4.1, 5.1, 5.7), rearrangements (Sections 5.1, 5.4, 5.6, 6.2, 6.6, 7.3), and combinatorial arguments based on dyadic trees (Sections 3.5, 4.3, 4.5, 6.4, 6.5, 7.2, 7.3, 7.5).
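The indexing $\psi_{jk}(t) = 2^{j/2}\psi(2^j t - k)$ of a wavelet basis can be made concrete with the Haar wavelet, used here only because it is the simplest case to compute. This is a sketch under stated assumptions: the Haar wavelet is chosen for the example, it is poorly localized in frequency, and so it illustrates the dilation and translation structure and the orthonormality, not the joint localization discussed above. All function names are hypothetical.

```python
import itertools


def haar(t: float) -> float:
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def psi(j: int, k: int, t: float) -> float:
    """psi_jk(t) = 2^(j/2) psi(2^j t - k)."""
    return 2.0 ** (j / 2.0) * haar(2.0 ** j * t - k)


def inner(a, b, lo: float = -4.0, hi: float = 4.0, n: int = 2 ** 12) -> float:
    """Midpoint-rule inner product of psi_a and psi_b over [lo, hi).

    The grid step is 2^-9, finer than every breakpoint of the wavelets used
    below, so the rule is exact up to rounding for these step functions.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += psi(a[0], a[1], t) * psi(b[0], b[1], t)
    return total * h


def gram_error(indices) -> float:
    """Largest deviation of the Gram matrix from the identity."""
    worst = 0.0
    for a, b in itertools.product(indices, repeat=2):
        target = 1.0 if a == b else 0.0
        worst = max(worst, abs(inner(a, b) - target))
    return worst


if __name__ == "__main__":
    idx = [(j, k) for j in (-1, 0, 1, 2) for k in (-1, 0, 1)]
    print("max |<psi_a, psi_b> - delta_ab| =", gram_error(idx))
```

The factor $2^{j/2}$ is what keeps every $\psi_{jk}$ at unit norm as the support shrinks by $2^{-j}$.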
What follows is an outline of the first four chapters, plus some comments pertaining to the remainder of the book. More detailed descriptions can be found in the introductions of the individual chapters.

Though it is not an introduction, Chapter 1 addresses several basic properties of wavelets that crop up later. The technical heart of the chapter lies in Sections 1.1.4 and 1.2. Section 1.1.4 addresses the problem of pointwise regularity of wavelets, elaborating on the discussion in Daubechies’ book [99]. Regularity properties of wavelets play a fundamental role in the approximation properties discussed in Chapter 6 and in establishing the uncertainty relations of Chapter 7. Wavelets having any regularity are associated with multiresolution analyses (MRAs). An MRA is generated by a scaling function $\varphi$ whose shifts $\varphi(x - k)$ form a basis for a basic space $V(\varphi)$. When the shifts $\varphi_k = \varphi(x - k)$ are orthonormal, the orthogonal projection of $f \in L^2(\mathbb{R})$ onto $V(\varphi)$ is defined by $f \mapsto \sum_k \langle f, \varphi_k \rangle \varphi_k$. The Shannon MRA is generated by the function $\varphi(x) = \mathrm{sinc}(x)$. In this case the space $V(\mathrm{sinc})$ is the space of functions bandlimited to $[-1/2, 1/2)$. As noted above, if $f \in V(\mathrm{sinc})$ then $\langle f, \mathrm{sinc}(\cdot - k) \rangle = f(k)$. The problem of relating expansion coefficients of elements $f \in V(\varphi)$ and samples of $f$ has important ramifications, some of which are considered in Chapter 3. With these issues in mind, we present a construction of MRAs in Section 1.2 displaying some new aspects that boil down to determining the MRA from the integer values of the basic scaling function $\varphi$.

[Figure: a diagram linking four topic areas, WAVELETS (Chapters 1, 2, 6), SAMPLING (Chapter 3), ADAPTIVE DECOMPOSITIONS (Chapter 4) and UNCERTAINTY PRINCIPLES (Chapters 5, 7). Connecting labels shown in the diagram: 1.2 QMF Construction; 3.5 Sampling and MRA; 2.1, 2.3 Wavelets and Sobolev Estimates; 5.8 Time-scale UPs; 7.2 Fefferman-Phong Estimates; 7.3 Improved Sobolev Inequality; 4.3 Wavelet packets; Ch. 6: Best Basis, Nonlinear Approximation, Compression; 4.2 Locally Bandlimited; 5.7 UP and Poisson Summation; 3.2 Nonuniform FFTs; 3.3 Pseudo PSWFs; 3.4 Gabor Expansions; Ch. 4: UPs in Basis Constructions; 4.5 Discrete Walsh Plane; 7.5 Multilinear Expansions, Walsh Estimates.]
Fig. 0.4. Some relationships among main topics

In Chapter 2 we investigate further regularity properties of wavelets, with a focus on Sobolev spaces. Although we do not address any specific applications to PDEs, the properties that we consider have their origins in questions of how to apply wavelets in numerical analysis. We start out reviewing the fact that, for suitably regular wavelets, $L^2$-Sobolev space norms are characterized in terms of magnitudes of wavelet coefficients. Half of this characterization is one form of Sobolev inequality. The role of wavelets in proving sharper Sobolev inequalities and, consequently, their role in establishing sharper forms of the uncertainty principle, is documented in detail later in Chapter 7. The goal of constructing new wavelets should not simply be to have more wavelets, but rather to have ones that can do what could not be done before. One of the major concerns with first-generation wavelets was their inability to line up at boundaries of intervals, thus posing serious questions about their possible application to boundary value problems. Boundary adapted wavelets and biorthogonal wavelets were constructed to address this issue, but they each posed their own tradeoffs. Starting in Section 2.3 we address these concerns by means of the DGHM multiwavelets. These multiwavelets combine
symmetry and minimal support properties in crucial ways to produce bases for Sobolev spaces on half-lines and intervals. Implicit to this approach is the question of building, from a dual pair of MRAs, a new pair of MRAs related to the first by differentiating (roughening) one of the dual MRAs and integrating (smoothing) the other. In Chapter 3 we return to issues of sampling. The focus of the first part of the chapter is on finite sample data using uniform and nonuniform Fourier techniques. Several Fourier-based algorithms for processing sampled data are outlined. Frame expansions play a useful role here in iterative reconstruction algorithms. The second part of Chapter 3 addresses connections between phase space density and the existence of time–frequency localized building blocks for signal approximation and reconstruction. This includes a review of the work of Landau–Slepian–Pollak, identifying prolate spheroidal wave functions (PSWFs) as bases for spaces of approximately time- and bandlimited functions. Approximation properties of Gabor expansions are also considered. The third and last part of Chapter 3 addresses the problem of sampling in multiresolution and, more generally, shift-invariant, spaces. This includes iterative reconstruction as well as interpolation schemes. In the latter case, the full structure of an MRA is shown to play an important and natural role in validating reconstruction from samples. Because of their nestedness, MRA spaces also furnish a natural context for discussing aliasing errors. Chapter 4 develops specific time–frequency methods that underlie the general tilings represented by Figure 0.3. As the discussion to this point is disproportionately skewed toward the wavelet picture, we first review a construction of the Wilson bases conforming to the Gabor picture. We then review the local trigonometric bases (LTBs) and wavelet packets, which bear parallel relationships to the respective Gabor and wavelet pictures. 
Though it is constructive to think of Figure 0.3 as being derived from the respective Gabor and wavelet pictures via recombination, the utility of the corresponding decomposition tools thus derived is obscured by uncertainty issues. The discrete Walsh plane provides a model for the time–frequency plane. Orthogonal Walsh functions are indexed by sequency, a discrete but imperfect substitute for frequency. Unlike exponentials, which must be cut off in order to achieve time localization, Walsh functions are truly supported in $[0, 1)$. After the substitution of sequency for frequency is accepted, shifted and dilated Walsh packets $W_P$ associated with dyadic tiles $P$ can then be interpreted concretely as wavelet packets on the one hand and as a version of LTB functions on the other. The advantage is that there is no uncertainty. Packets associated with disjoint tiles are orthogonal. Thus, one is able to develop precise combinatorial statements, associating time–sequency projections to regions of the phase space. Specifically, if a region $R$ in the plane can be written as a finite pairwise disjoint union of tiles $P \in \mathcal{P}$, then there is a well-defined orthogonal projection associated to $R$: the projection onto the span of those $W_P$, $P \in \mathcal{P}$, comprising a pairwise disjoint cover of $R$. Any other pairwise disjoint covering of $R$ defines the same projection. The Walsh model thus leads to a clean geometrical interpretation of best basis algorithms normally associated with LTBs or wavelet packets.

Important theoretical uses of the ideas surrounding Figure 0.3 are discussed in Chapters 6 and 7. For example, associating LTBs to specific tilings leads to a simple proof of boundedness properties of so-called exotic operators discussed in Section 6.8, ones that are actually rather fundamental as regards certain problems in PDE. To some extent, the technique of associating basis functions with tilings of regions of the time–frequency plane has its origins in the problem of stability of matter, which boils down to eigenvalue estimates corresponding to bound states of Schrödinger operators. Associating function decompositions to tilings leads to even more sophisticated techniques for estimating multilinear operators that arise naturally in nonlinear PDE. We refer here to Lacey and Thiele’s solution of the famous Calderón conjecture [239] regarding boundedness of the bilinear Hilbert transform. The Walsh model for this problem, as well as connections of these ideas to perturbation theory of Schrödinger operators, are also discussed in Chapter 7.

Chapter 5 is about the uncertainty principle as a limitation on joint localization of a function and its Fourier transform. The chapter is intended to serve as a resource on methods for proving uncertainty principles and for establishing relationships among different forms of Fourier uncertainty. A major theme is how to transform statements about joint localization of a function $f$ and its Fourier transform into (sometimes sharper) statements about phase space decay of some time–frequency distribution of $f$.

Chapter 6 addresses consequences of regularity and geometric organization of wavelets in terms of approximation (and thereby compression) of functions and operators.
The accompanying techniques culminate in an effective coding scheme for wavelet approximations of signals due to Cohen, Dahmen, Daubechies and DeVore [81]. The same set of techniques leads, surprisingly, to a wavelet proof of a sharp Sobolev inequality (Section 7.3) due to Cohen, DeVore, Petrushev and Yu [85]. On the surface, this inequality bears no relation to signal compression. Other parallels between compression and operator bounds in the wavelet and Gabor pictures are also noted in Chapter 6.

Notational issues and some background results are mentioned in the Appendix. One point is worth mentioning here: in assigning notation, we have tried to be consistent with standard usage in the literature. Because of the breadth of subject matter in this monograph, this approach inevitably leads us to assign multiple meanings to particular symbols such as “∗” and “δ.” The intended usage should always be clear from the context.

Insofar as this is a book about mathematical methods, we have chosen to include proofs only of selected results that illuminate the power of the methods. Many other results are reported in order to sketch the mathematical landscape of which they serve as landmarks. We have omitted or merely outlined proofs of many important results due either to length, technical complication or redundancy. Finally, we are guilty of sins of omission on many
counts. We apologize for not including the multitude of brilliant and relevant ideas that we have failed even to mention.

Many persons have contributed to this book in one form or another. Lauren Schultz helped us to get off to an enthusiastic start. It is a pleasure to thank Luca Capogna, Mark Craddock, Chris Heil, Bill Heller, Loredana Lanzani, Chris Meaney, Sofian Obeidat, Cristina Pereyra, David Walnut and Ying Wang not only for sharing their brilliance but, most importantly, for their friendship and encouragement. On technical matters we are indebted to Gregory Beylkin, Pete Casazza, Sam Efromovich, Hans Feichtinger, Charly Gröchenig, Palle Jorgensen, Zioma Rseszotnik, Xiaoping Shen, Roy Slaven, Mark Tygert, Chris Weaver and Hong Xiao for discussion and feedback. We also owe special thanks to several anonymous reviewers who set us straight on some important points. We are grateful to David Donoho et al. for creating WaveLab and to Vasily Strela for creating MWMP which allowed us to produce many of the graphics without a hitch. We would also like to acknowledge the contributions of several past and present colleagues and mentors, including Lolina Alvarez, Dick Bagby, Charles Chui, Michael Cowling, Garth Gaudry, Doug Kurtz, Alan McIntosh, John Price, Adam Sikora, Caroline Sweezy and Guido Weiss for their moral support and help in understanding different aspects of the material.

Hogan is grateful for the support of a Macquarie University Research Grant which helped fuel this long-term collaboration with his good friend, and an NSF conference grant which brought the very best of modern applied harmonic analysis to the foothills of the Ozarks. Lakey is indebted to the Mathematics Department at the University of Texas at Austin, and particularly to Bruce Palka for arranging support for his sabbatical in 2002-2003. He would also like to acknowledge support from the Army Research Office.
We would like to thank the editorial staff at Birkhäuser Boston, particularly, Regina Gorenshteyn, Elizabeth Loew, and Tom Grasso for the amazing job they did in overseeing the development of this project. Final words of appreciation go to John Benedetto and to John Gilbert, who have profoundly influenced our development as mathematicians. And, of course, to our parents: thank you for making us do our homework.
Fayetteville, Arkansas, Las Cruces, New Mexico October, 2004
Jeff Hogan Joe Lakey
1 Wavelets: Basic properties, parameterizations and sampling
Wavelets play a fundamental role in decomposing the function spaces, and operators that act on them, that are considered throughout this book. Morlet and colleagues (e.g., [278,279]) coined the term ondelettes to describe families of shifted and dilated pulses generated from a single function ψ. Their discovery of the benefits of wavelets in geoexploration was quickly seen as a germination of similar ideas incubating collectively in the mathematics (e.g., Calderón–Zygmund theory), mathematical physics (e.g., coherent states) and electrical engineering (vis-à-vis subband coding) communities. The key features of wavelets that enable their exploitation in discrete signal analysis have long since been distilled into the conceptual framework of a multiresolution analysis (MRA).

Some basic questions regarding wavelets that will be important throughout this book are: what regularity properties can they have, and to what extent does regularity complement or obstruct other desirable properties? Statements regarding the inability of wavelets to possess simultaneous smoothness and localization properties are a manifestation of the uncertainty principle. Nevertheless, wavelets can have some degree of smoothness together with compact support as Daubechies [97] first showed. Another basic question that will be addressed in more detail in Chapter 3 is: how can multiresolution methods be exploited when analyzing sampled data?

This chapter is not intended as an introduction to wavelets. We assume that the reader has some familiarity with them already. The purpose, rather, is to develop properties of wavelets that can be used to address the questions just asked. Nevertheless, some basic discussion is needed both to set notation and to help establish the perspective needed to answer these questions.
This discussion, including a brief review of orthogonal and biorthogonal multiresolution analyses, subband coding schemes and computation of scaling functions via the cascade algorithm, will comprise the first few parts of Section 1.1.

Daubechies’ construction of continuous, compactly supported orthogonal wavelets [97] was a highlight among theoretical developments. Before then, the very existence of such wavelets was suspect. The dependence of regularity (as measured by local Hölder exponents) on scaling filters was first addressed in contraction estimates due to Daubechies and Lagarias [107, 108] showing, essentially, how the rate of convergence of the cascade algorithm depends on the spectral structure of the scaling filter. Section 1.1.4 contains details of these estimates.

In Section 1.2 we present a “new” way of constructing quadrature mirror filters predicated on a given sequence of integer sample values of the corresponding scaling function. This construction is motivated by the desire to have scaling functions amenable to extrapolation in the sense of Papoulis and Gerchberg [96, 154, 159, 293] and to sampling (see [197, 198, 200, 201] and Chapter 3). Such properties are not easily derived from standard constructions. Technically, this new construction hinges on properties of the Zak transform. The techniques also provide an alternative method for computing the values of the scaling function, as is discussed in Section 1.3.

The notes of this chapter focus largely on two other ways of parameterizing wavelets by building them from basic components. Pollen’s product provides a means of building orthogonal, compactly supported wavelets based on a certain factorization of unitary matrix-valued Laurent polynomials. A different factorization, based on the Euclidean algorithm, provides a means of building biorthogonal filters from lower-degree factors. This decomposition is the basis for Sweldens’ lifting construction.
1.1 Scaling and multiresolution analysis

This section contains a more or less standard approach to the construction of a wavelet basis from a multiresolution analysis (MRA) of $L^2(\mathbb{R})$. A two-scale MRA consists of a sequence of closed subspaces $\{V_j\}_{j\in\mathbb{Z}}$ of $L^2(\mathbb{R})$ that are nested in the sense that $V_j \subset V_{j+1}$ and, additionally, $f(x) \in V_j$ if and only if $f(2x) \in V_{j+1}$. The spaces also must satisfy: $\bigcap_j V_j = \{0\}$ while $\overline{\bigcup_j V_j} = L^2(\mathbb{R})$. The base space $V_0$ should be shift-invariant, in the sense that $f \in V_0$ if and only if $f(\cdot - k) \in V_0$ ($k \in \mathbb{Z}$). Moreover, we insist that there exists a function $\phi \in V_0$ whose integer shifts $\{\phi(\cdot - k)\}_{k\in\mathbb{Z}}$ form a Riesz basis for $V_0$, i.e., there exist constants $A, B > 0$ such that for any sequence $\{a_k\} \in \ell^2(\mathbb{Z})$,

\[
A \sum_k |a_k|^2 \;\le\; \Big\| \sum_k a_k\, \phi(\cdot - k) \Big\|_2^2 \;\le\; B \sum_k |a_k|^2 .
\]

We write $V_0 = V(\phi)$ when we wish to emphasize that $V_0$ is the principal shift-invariant space generated by $\phi$ (see Chapter 3). The nestedness and Riesz basis properties imply the existence of $\{h_k\} \in \ell^2(\mathbb{Z})$ such that

\[
\frac{1}{2}\,\phi\Big(\frac{x}{2}\Big) = \sum_k h_k\, \phi(x - k). \tag{1.1}
\]
This is known as the two-scale relation or scaling equation or dilation equation. In the Fourier domain this equation becomes

\[
\hat\phi(2\xi) = H(\xi)\,\hat\phi(\xi) \tag{1.2}
\]

in which the Fourier series $H(\xi) = \sum_k h_k e^{-2\pi i k\xi}$ is called the symbol, scaling filter, refinement mask, etc., of $\phi$, which itself is called a scaling function. Here we normalize the Fourier transform so that $\hat f(\xi) = \int f(x)\, e^{-2\pi i x\xi}\,dx$ when $f \in L^1(\mathbb{R})$. The simplest example of a scaling function $\phi$ in $L^2(\mathbb{R})$ is $\phi(x) = \chi_{[0,1]}(x)$, the Haar scaling function, with $H(\xi) = e^{-\pi i\xi}\cos \pi\xi$.

Since the integer shifts of $\phi$ generate a Riesz basis for $V_0$, a version of Gram–Schmidt can be used to find an element $\varphi$ of $V(\phi)$ whose shifts form an orthonormal basis for $V_0$. The idea goes back to Schweinler and Wigner [315] and is discussed by Daubechies (in Section 5.3.1 of [99]). Consider the Gram matrix with entries $G(k, l) = \langle \phi(\cdot - k), \phi(\cdot - l) \rangle$. By a change of variables, $G(k, l) = G(k - l, 0)$, while

\[
G(k, 0) = \int \hat\phi(\xi)\,\overline{\hat\phi(\xi)}\, e^{-2\pi i k\xi}\,d\xi
= \int |\hat\phi(\xi)|^2 e^{-2\pi i k\xi}\,d\xi
= \sum_l \int_l^{l+1} |\hat\phi(\xi)|^2 e^{-2\pi i k\xi}\,d\xi
= \int_0^1 \sum_l |\hat\phi(\xi + l)|^2 e^{-2\pi i k\xi}\,d\xi .
\]

Thus, $G(k) = G(k, 0)$ is the $k$th Fourier coefficient of the overlap function $\Phi(\xi)^2 = \sum_l |\hat\phi(\xi + l)|^2$. That the shifts of $\phi$ form a Riesz basis for $V(\phi)$ is equivalent to $\Phi$ being essentially bounded above and below since, by Plancherel’s theorem on $\mathbb{R}$,

\[
\Big\| \sum_k a_k\, \phi(\cdot - k) \Big\|_2^2
= \Big\| \sum_k a_k e^{-2\pi i k\xi}\, \hat\phi(\xi) \Big\|_2^2
= \int_0^1 \Big| \sum_k a_k e^{-2\pi i k\xi} \Big|^2 \Phi(\xi)^2\,d\xi .
\]
The claim then follows from Plancherel’s theorem on $\mathbb{T}$. In fact, the integer shifts of $\phi$ are orthonormal precisely when $\Phi(\xi) \equiv 1$. Since $1/\Phi$ is bounded and periodic, it can be expressed as the Fourier series of some $\ell^2$-sequence $\{v_k\}$. Define $\varphi(x) = \sum_k v_k\, \phi(x - k)$. Then $\varphi \in V(\phi)$ and the integer shifts of $\varphi$ form an orthonormal basis for $V(\phi) = V(\varphi)$. We say $\varphi$ is an orthogonal generator of $V_0 = V(\varphi)$.

Although an orthogonal generator always exists, certain properties of an MRA are often easier to deduce by referring to some other generator. For example, in the space $V = \{f \in L^2(\mathbb{R}) \cap C(\mathbb{R}) : f|_{[k,k+1]} \text{ is linear}\}$, the function $\phi(x) = (1 - |x|)_+$ serves as a nonorthogonal generator. It is cardinal in the sense that $\phi(k) = \delta_k$ so that each $f \in V$ admits the sampling formula $f(x) = \sum_k f(k)\,\phi(x - k)$. While there are orthogonal generators of $V$, none are simultaneously compactly supported and cardinal. In the remainder of this
section, though, we consider properties of MRAs predicated on the assumption that $\varphi$ is an orthogonal generator with scaling filter $H$.

Suppose now that $\varphi$ is an orthogonal scaling function. Since $\sum_l |\hat\varphi(\xi + l)|^2 \equiv 1$, (1.2) applied to $\varphi$ yields

\begin{align*}
1 = \sum_l |\hat\varphi(2\xi + l)|^2
&= \sum_l \Big|H\Big(\xi + \frac{l}{2}\Big)\Big|^2 \Big|\hat\varphi\Big(\xi + \frac{l}{2}\Big)\Big|^2 \\
&= \sum_l \Big( |H(\xi + l)|^2\, |\hat\varphi(\xi + l)|^2 + \Big|H\Big(\xi + l + \frac12\Big)\Big|^2 \Big|\hat\varphi\Big(\xi + l + \frac12\Big)\Big|^2 \Big) \\
&= \sum_l \Big( |H(\xi)|^2\, |\hat\varphi(\xi + l)|^2 + \Big|H\Big(\xi + \frac12\Big)\Big|^2 \Big|\hat\varphi\Big(\xi + l + \frac12\Big)\Big|^2 \Big) \\
&= |H(\xi)|^2 + \Big|H\Big(\xi + \frac12\Big)\Big|^2 .
\end{align*}

That is, orthogonality plus scaling implies

\[
|H(\xi)|^2 + \Big|H\Big(\xi + \frac12\Big)\Big|^2 \equiv 1. \tag{1.3}
\]
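The condition (1.3) is easy to test numerically for a concrete filter. The following is a minimal sketch, not part of the original text: it evaluates the symbol $H$ on a grid for the Haar filter and for the 4-tap Daubechies filter, both normalized so that $\sum_k h_k = 1$ as in this chapter. The function names `symbol` and `qmf_defect` are illustrative choices, and the Daubechies coefficients are quoted from the standard construction rather than derived here.

```python
import numpy as np

def symbol(h, xi):
    """H(xi) = sum_k h_k exp(-2*pi*i*k*xi) for coefficients h_0, ..., h_M."""
    k = np.arange(len(h))
    return np.exp(-2j * np.pi * np.outer(xi, k)) @ h

# Haar and 4-tap Daubechies filters, normalized so that sum(h) = H(0) = 1.
s3 = np.sqrt(3.0)
h_haar = np.array([0.5, 0.5])
h_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0

xi = np.linspace(0.0, 1.0, 257)

def qmf_defect(h):
    """Largest deviation of |H(xi)|^2 + |H(xi + 1/2)|^2 from 1 on the grid."""
    total = np.abs(symbol(h, xi)) ** 2 + np.abs(symbol(h, xi + 0.5)) ** 2
    return np.max(np.abs(total - 1.0))

# The Haar symbol should agree with the closed form exp(-pi*i*xi) * cos(pi*xi).
haar_closed_form = np.exp(-1j * np.pi * xi) * np.cos(np.pi * xi)
haar_symbol_error = np.max(np.abs(symbol(h_haar, xi) - haar_closed_form))

print(qmf_defect(h_haar), qmf_defect(h_d4), haar_symbol_error)
```

All three printed quantities should be at the level of floating-point rounding. A grid check of this kind is only a sanity test; for a trigonometric polynomial, (1.3) is equivalent to the finite set of coefficient identities $\sum_k h_k \bar h_{k+2m} = \delta_m/2$.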
For purposes of wavelet construction, one associates to $H$ a filter $G$ such that $H$ and $G$ satisfy $|H(\xi)|^2 + |G(\xi)|^2 \equiv 1$. Then $(H, G)$ is known as a quadrature mirror filter pair (QMF). One of the possible choices for $G$ is $G(\xi) = -e^{-2\pi i\xi}\,\overline{H}(\xi + 1/2)$. One often refers to $H$ itself as a quadrature mirror filter.

Now consider the converse problem: when does a QMF give rise to an orthogonal scaling function? Iterating (1.2) yields, at least formally,

\[
\hat\varphi(\xi) = \prod_{j=1}^{\infty} H\Big(\frac{\xi}{2^j}\Big). \tag{1.4}
\]

For convergence of (1.4) it is necessary that $H(0) = 1$, i.e., $\sum_k h_k = 1$. The product (1.4) will converge locally uniformly if $H$ is sufficiently well behaved at zero. Suppose, for example, that $H$ is Hölder continuous at zero of some positive order $\alpha$. That is, there is a $C > 0$ such that

\[
\frac{|H(\xi) - H(0)|}{|\xi - 0|^{\alpha}} = \frac{|H(\xi) - 1|}{|\xi|^{\alpha}} \le C \qquad (|\xi| \ll 1).
\]

Then, since $\big|\log |H(\xi)|\big| \le \log(1 + C|\xi|^{\alpha}) \le C'|\xi|^{\alpha}$, it follows that

\[
\sum_{j=N}^{\infty} \Big| \log \Big| H\Big(\frac{\xi}{2^j}\Big) \Big| \Big| \;\le\; C' |\xi|^{\alpha} \sum_{j=N}^{\infty} 2^{-j\alpha}
\]
converges absolutely which, in turn, implies absolute convergence of (1.4). Thus, if $H$ is a trigonometric polynomial or if its coefficients $\{h_k\}$ decay at some rate, then $\hat\varphi$ is well defined as a pointwise product.

Convergence in $L^2$ of the infinite product in (1.4) then follows from the QMF condition (1.3). Consider the sequence of truncated partial products defined by

\[
\hat\varphi_J(\xi) = \prod_{j=1}^{J} H\Big(\frac{\xi}{2^j}\Big)\, \chi_{[-2^{J-1},\,2^{J-1}]}(\xi). \tag{1.5}
\]

Since $\prod_{j=1}^{J} H(\xi/2^j)$ is periodic with period $2^J$, it follows that

\begin{align*}
\|\varphi_J\|_2^2
&= \int_{-2^{J-1}}^{2^{J-1}} \prod_{j=1}^{J} \Big|H\Big(\frac{\xi}{2^j}\Big)\Big|^2 d\xi
 = \int_{-2^{J-1}}^{2^{J-1}} \Big|H\Big(\frac{\xi}{2^J}\Big)\Big|^2 \prod_{j=1}^{J-1} \Big|H\Big(\frac{\xi}{2^j}\Big)\Big|^2 d\xi \\
&= \int_{0}^{2^{J-1}} \Big( \Big|H\Big(\frac{\xi}{2^J}\Big)\Big|^2 + \Big|H\Big(\frac{\xi}{2^J} - \frac12\Big)\Big|^2 \Big) \prod_{j=1}^{J-1} \Big|H\Big(\frac{\xi}{2^j}\Big)\Big|^2 d\xi \\
&= \int_{0}^{2^{J-1}} \prod_{j=1}^{J-1} \Big|H\Big(\frac{\xi}{2^j}\Big)\Big|^2 d\xi
 = \int_{-2^{J-2}}^{2^{J-2}} \prod_{j=1}^{J-1} \Big|H\Big(\frac{\xi}{2^j}\Big)\Big|^2 d\xi
 = \|\varphi_{J-1}\|_2^2
\end{align*}
because of the QMF condition (1.3). Therefore, by induction from the base case $\|\varphi_1\|_2 = 1$ one has $\|\varphi_J\|_2 = 1$. It follows from the weak-star compactness of the unit ball in $L^2(\mathbb{R})$ that $\varphi_J$ has a weak-star limit in $L^2$ which must agree with the pointwise limit defined by the infinite product (1.4). Thus, any QMF whose coefficients $h_k$ are well behaved gives rise to a scaling function $\varphi \in L^2(\mathbb{R})$.

The only remaining question is whether $\varphi$ defined through (1.3) and (1.4) must be orthogonal to its shifts. In fact, this is not always the case. The filter $H(\xi) = (1 + e^{-6\pi i\xi})/2$ satisfies (1.3) but gives rise through (1.4) to the stretched Haar function $3^{-1}\chi_{[0,3]}(x)$ (the normalization is forced by $\hat\varphi(0) = H(0) = 1$), which is not orthogonal to its integer shifts. More sophisticated examples can be found in Daubechies [99] where the pathology is described in more detail. For a trigonometric polynomial $H$ (cf. Proposition 1.4.1 for the more general case), A. Cohen characterized the pathology in the following simple way (cf. Daubechies, [99], p. 188). Let $\tau : [0, 1) \to [0, 1)$ be given by $\tau(\xi) = 2\xi \bmod 1$. A $\tau$-cycle is a collection $\{\xi_1, \dots, \xi_N\} \subset [0, 1)$ such that $\xi_{j+1} = \tau(\xi_j)$, $1 \le j \le N - 1$, and $\xi_1 = \xi_N$. The trivial $\tau$-cycle consists of the single point $\{0\}$. Cohen proved the following.

Theorem 1.1.1. Suppose that $H(\xi)$ is a trigonometric polynomial satisfying (1.3) with $H(0) = 1$ and $\varphi$ is the scaling function defined via (1.4). Then $\varphi$ is orthogonal to its integer shifts if and only if there is no nontrivial $\tau$-cycle $\{\xi_1, \dots, \xi_N\}$ such that $|H(\xi_j)| = 1$ for $1 \le j \le N - 1$.

The criterion of the theorem will be called the $\tau$-cycle condition. A separate characterization of this nondegeneracy of $\varphi$ was found by Lawton [252] (cf. [99], p. 190 and [251]).
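The degenerate filter $H(\xi) = (1 + e^{-6\pi i\xi})/2$ discussed above can be examined numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it evaluates a long but finite partial product from (1.4), omitting the cutoff used in (1.5), and compares it with the Fourier transform of the indicator of $[0,3]$ normalized to have integral one, namely $e^{-3\pi i\xi}\sin(3\pi\xi)/(3\pi\xi)$. It also confirms that $\{1/3, 2/3\}$ is a cycle of $\xi \mapsto 2\xi \bmod 1$ on which $|H| = 1$. The names `truncated_product` and `cycle_moduli` are hypothetical.

```python
import numpy as np

def H(xi):
    """Symbol of the filter with h_0 = h_3 = 1/2."""
    return 0.5 * (1.0 + np.exp(-6j * np.pi * xi))

def truncated_product(xi, J):
    """Partial product prod_{j=1}^{J} H(xi / 2^j), without the cutoff in (1.5)."""
    out = np.ones_like(xi, dtype=complex)
    for j in range(1, J + 1):
        out *= H(xi / 2.0 ** j)
    return out

xi = np.linspace(-4.0, 4.0, 801)

# Fourier transform of (1/3) * indicator of [0, 3] in the convention of this
# chapter; np.sinc(x) is sin(pi*x) / (pi*x).
target = np.exp(-3j * np.pi * xi) * np.sinc(3.0 * xi)
product_error = np.max(np.abs(truncated_product(xi, 40) - target))

# The filter satisfies the QMF condition (1.3) ...
qmf_defect = np.max(np.abs(np.abs(H(xi)) ** 2 + np.abs(H(xi + 0.5)) ** 2 - 1.0))

# ... yet |H| = 1 on the nontrivial cycle 1/3 -> 2/3 -> 1/3 of xi -> 2*xi mod 1.
cycle = np.array([1.0 / 3.0, 2.0 / 3.0])
cycle_moduli = np.abs(H(cycle))

print(product_error, qmf_defect, cycle_moduli)
```

The first two printed numbers should be negligible and both moduli should equal one up to rounding, consistent with the failure of the $\tau$-cycle condition for this filter.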
Theorem 1.1.2. Suppose that $H(\xi) = \sum_k h_k e^{-2\pi i k\xi}$ is a trigonometric polynomial satisfying (1.3) with $H(0) = 1$ and $\varphi$ is the scaling function defined via (1.4). Then $\varphi$ is orthogonal to its integer shifts when the eigenvalue $1/2$ of the matrix $A_{kl} = \sum_m h_m h_{l-2k+m}$ possesses a one-dimensional eigenspace.

Failure of the Cohen and Lawton conditions is easy to check for $H(\xi) = (1 + e^{-6\pi i\xi})/2$. Cohen’s condition fails because $\{1/3, 2/3\}$ forms a nontrivial $\tau$-cycle. For Lawton’s criterion, one has $h_0 = h_3 = 1/2$ and $h_k = 0$ otherwise. Thus $A_{kl} = \delta_{k-2l}/2 + \delta_{k-2l\pm 3}/4$ (where $-2 \le k, l \le 2$). Lawton’s condition then fails because $[0, 0, 1, 0, 0]^T$ and $[1, 1, 0, 1, 1]^T$ are both eigenvectors of $A$ with eigenvalue $1/2$.

1.1.1 Orthonormal wavelet bases for $L^2(\mathbb{R})$

Not every wavelet comes from an MRA, but every MRA gives rise to an orthonormal wavelet. As before, assume that $\varphi$ is an orthogonal scaling function with QMF $H(\xi) = \sum_k h_k e^{-2\pi i k\xi}$. As the spaces $V_0(\varphi) \subset V_1(\varphi)$ themselves are Hilbert spaces, one can define the relative orthogonal complement $W_0 = (V_0^{\perp}\,|\,V_1)$ where the notation denotes the orthogonal complement of $V_0$ as a subspace of $V_1$. The space $W_0$ is shift-invariant since $g \in W_0$ implies that $\langle g(\cdot - l), \varphi(\cdot - k)\rangle = \langle g, \varphi(\cdot - (k - l))\rangle = 0$ for all $k, l \in \mathbb{Z}$.

Suppose now that $g(x) = \sum_k c_k\, \varphi(2x - k) \in W_0$. Setting $C(\xi) = \sum_k c_k e^{-2\pi i k\xi}$ one has, for each $l \in \mathbb{Z}$:

\begin{align*}
\langle g, \varphi(\cdot - l)\rangle
&= \int \hat g(\xi)\, \overline{\hat\varphi(\xi)}\, e^{2\pi i l\xi}\,d\xi \\
&= \frac12 \int C\Big(\frac{\xi}{2}\Big)\, \overline{H}\Big(\frac{\xi}{2}\Big)\, \Big|\hat\varphi\Big(\frac{\xi}{2}\Big)\Big|^2 e^{2\pi i l\xi}\,d\xi \\
&= \int_0^1 C(\xi)\, \overline{H}(\xi)\, e^{4\pi i l\xi} \sum_k |\hat\varphi(\xi + k)|^2\,d\xi \\
&= \int_0^{1/2} \Big( C(\xi)\, \overline{H}(\xi) + C\Big(\xi + \frac12\Big)\, \overline{H}\Big(\xi + \frac12\Big) \Big) e^{4\pi i l\xi}\,d\xi = 0,
\end{align*}

where we have used the fact that $\sum_k |\hat\varphi(\xi + k)|^2 = 1$ for a.e. $\xi$. That is, all Fourier coefficients of the $1/2$-periodic function $C(\xi)\overline{H}(\xi) + C(\xi + 1/2)\overline{H}(\xi + 1/2)$ vanish. Thus, $C(\xi)\overline{H}(\xi) = -C(\xi + 1/2)\overline{H}(\xi + 1/2)$ a.e. on $[0, 1/2)$. Since, by (1.3), $H(\xi)$ and $H(\xi + 1/2)$ cannot vanish simultaneously, there is a scalar function $M$ on $[0, 1)$ such that

\[
\begin{bmatrix} C(\xi) \\ C(\xi + \tfrac12) \end{bmatrix}
= M(\xi) \begin{bmatrix} \overline{H}(\xi + \tfrac12) \\ -\overline{H}(\xi) \end{bmatrix}. \tag{1.6}
\]

Moreover, (1.3) implies $|C(\xi)|^2 + |C(\xi + 1/2)|^2 = |M(\xi)|^2$ so $M \in L^2(\mathbb{T})$. Replacing $\xi$ by $\xi + 1/2$ in (1.6), one also has $M(\xi + 1/2) = -M(\xi)$ a.e. so that
$M(\xi) = e^{2\pi i\xi} N(2\xi)$ for some $N \in L^2(\mathbb{T})$. Thus, any $g \in W_0$ can be expressed by $\hat g(\xi) = C(\xi/2)\,\hat\varphi(\xi/2) = e^{\pi i\xi}\,\overline{H}(\xi/2 + 1/2)\,N(\xi)\,\hat\varphi(\xi/2)$. Setting $\hat\psi(\xi) = -e^{-\pi i\xi}\,\overline{H}(\xi/2 + 1/2)\,\hat\varphi(\xi/2)$, any $g \in W_0$ then satisfies $\hat g(\xi) = N(\xi)\,\hat\psi(\xi)$ for some $N \in L^2(\mathbb{T})$. In particular, the functions $\{\psi(x - k)\}_{k\in\mathbb{Z}}$ form an orthonormal basis for $W_0$. Actually, if $\psi$ is defined by $\hat\psi(\xi) = \mu(\xi)\,\overline{H}(\xi/2 + 1/2)\,\hat\varphi(\xi/2)$ in which $|\mu(\xi)| \equiv 1$ and $\mu(\xi + 1) = -\mu(\xi)$, then the collection $\{\psi(x - k)\}_{k\in\mathbb{Z}}$ will also form an orthonormal basis for $W_0$. If $\varphi$ is supported on $[0, M]$ ($M$ odd), then taking $\mu(\xi) = -e^{-\pi i\xi}$ puts the support of $\psi$ in $[(1 - M)/2, (1 + M)/2]$.

From now on we set $G(\xi) = \sum_l (-1)^l\, \bar h_{1-l}\, e^{-2\pi i l\xi}$ so that $\hat\psi(2\xi) = G(\xi)\,\hat\varphi(\xi)$. For any $j \in \mathbb{Z}$, let $W_j = (V_j^{\perp}\,|\,V_{j+1})$. Arguing just as before, the functions $\psi_{jk}(x) = 2^{j/2}\psi(2^j x - k)$ form an orthonormal basis for $W_j$. Moreover, if $j \ne j'$ then the spaces $W_j$ and $W_{j'}$ are automatically orthogonal to each other: If, say, $j' > j$, then $W_j$ is a subspace of $V_{j+1}$ which is, in turn, a subspace of $V_{j'}$ that is orthogonal to $W_{j'}$. It follows then from the limit properties of the spaces $V_j$ that the functions $\psi_{jk}$ form an orthonormal basis for $L^2(\mathbb{R})$.

1.1.2 Subband coding and FWT

Well before Mallat’s discovery of MRA, quadrature mirror filters were utilized by Esteban and Galand [132] to address a concrete application in speech processing. The idea is simple. Start with a sequence $s(k)$ thought of as integer sample values of a continuous-time signal and form the Fourier series $S(\xi) = \sum_k s(k)\, e^{-2\pi i k\xi}$. The idea of subband coding is essentially to replace $S$ by the subband elements $SH$ and $SG$ which can be processed separately. Reconstruction is achieved by conjugate filtering $SH \mapsto SH\overline{H}$, $SG \mapsto SG\overline{G}$ and adding the components. The QMF condition (1.3) shows that this process recovers $S$, i.e., $S = SH\overline{H} + SG\overline{G}$. Strang and Nguyen’s book [332] is one of numerous excellent sources containing detailed descriptions of subband coding and related algorithms and applications.

Discrete wavelet transform. The spaces $V_j$ ($j \in \mathbb{Z}$) may be thought of as providing a scale of details or resolutions of images and the orthogonal projection $P_j$ onto $V_j$ as an operator that removes details above some level specified by the index $j$. Elements of $V_0$ are, roughly speaking, as detailed as $\varphi$ itself. The space $V_1$ contains signals that are twice as detailed. The differences (details) between signals in $V_1$ and their orthogonal projections onto $V_0$ are contained in the wavelet space $W_0 = V_1 \ominus V_0$. This heuristic leads to an efficient, hierarchical decomposition of sequences.

The collection $\{\varphi_{jk}\}_{k\in\mathbb{Z}}$ where $\varphi_{jk}(x) = 2^{j/2}\varphi(2^j x - k)$ forms an orthonormal basis for $V_j$. Similarly, $\{\psi_{jk}\}_{k\in\mathbb{Z}}$ forms an orthonormal basis for $W_j$. Hence, orthogonal projections $P_j$ and $Q_j$ of $f \in L^2(\mathbb{R})$ onto these respective spaces are achieved by
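\[
P_j f = \sum_k \langle f, \varphi_{jk}\rangle\, \varphi_{jk}; \qquad
Q_j f = \sum_k \langle f, \psi_{jk}\rangle\, \psi_{jk}.
\]

Given $f \in V_j$, let $c^j_k = c^j_k(f) = \langle f, \varphi_{jk}\rangle$ and $d^j_k = \langle f, \psi_{jk}\rangle$. Since $\varphi$ satisfies the dilation equation (1.1), it follows that

\[
\varphi_{jk}(x) = \sqrt{2} \sum_l h_{l-2k}\, \varphi_{j+1,l}(x).
\]

Define filters $\mathcal{H}, \mathcal{G} : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ by

\[
(\mathcal{H}a)_k = \sqrt{2} \sum_l \bar h_{l-2k}\, a_l; \qquad
(\mathcal{G}a)_k = \sqrt{2} \sum_l \bar g_{l-2k}\, a_l. \tag{1.7}
\]

Notice that $\mathcal{H}$ acts by convolution against the sequence $h^*_l = \bar h_{-l}$ followed by decimation/downsampling, and similarly for $\mathcal{G}$. Then

\[
c^j_k = \langle f, \varphi_{jk}\rangle = \sqrt{2} \sum_l \bar h_{l-2k}\, c^{j+1}_l,
\]

i.e., $c^j = \mathcal{H}c^{j+1}$. Similarly, $d^j = \mathcal{G}c^{j+1}$. This leads to a hierarchical scheme for the computation of wavelet coefficients, symbolically represented in the following diagram:

\[
\begin{array}{ccccccccc}
c^j & \xrightarrow{\ \mathcal{H}\ } & c^{j-1} & \xrightarrow{\ \mathcal{H}\ } & c^{j-2} & \xrightarrow{\ \mathcal{H}\ } & c^{j-3} & \cdots\longrightarrow & c^L \\
 & \searrow^{\mathcal{G}} & & \searrow^{\mathcal{G}} & & \searrow^{\mathcal{G}} & & \searrow & \\
 & & d^{j-1} & & d^{j-2} & & d^{j-3} & & d^L
\end{array}
\]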
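Given the sequence $c^j$ we compute $c^{j-1}$ and $d^{j-1}$ using the decimation and convolution filters $\mathcal{H}$ and $\mathcal{G}$ as in (1.7). Repeating the process on $c^{j-1}$ gives the next layer of coefficients $c^{j-2}$ and $d^{j-2}$. Continuing gives $c^L = \mathcal{H}^{j-L}c^j$ and $d^L = \mathcal{G}\mathcal{H}^{j-L-1}c^j$. This process is known as the discrete wavelet transform.

Inverse discrete wavelet transform. Reconstruction of $c^j$ from $c^L, d^L, d^{L+1}, \dots, d^{j-1}$ may be achieved with the aid of the filters $\mathcal{H}^*, \mathcal{G}^* : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ given by

\[
(\mathcal{H}^* a)_k = \sqrt{2} \sum_l h_{k-2l}\, a_l; \qquad
(\mathcal{G}^* a)_k = \sqrt{2} \sum_l g_{k-2l}\, a_l. \tag{1.8}
\]

The filters $\mathcal{H}^*$ and $\mathcal{G}^*$ are $\ell^2$-adjoints of $\mathcal{H}$ and $\mathcal{G}$ and act via upsampling followed by convolution, i.e., $(\mathcal{H}^* a)_k = \sqrt{2}\,(h * \tilde a)_k$ where

\[
\tilde a_l = \begin{cases} 0, & \text{if } l \text{ is odd}, \\ a_{l/2}, & \text{if } l \text{ is even}, \end{cases}
\]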
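and similarly for $\mathcal{G}^*$. Now $c^{j-1}, d^{j-1}$ determine $c^j$ via

\begin{align*}
c^j_k = \langle f, \varphi_{jk}\rangle
&= \langle P_{j-1}f + Q_{j-1}f, \varphi_{jk}\rangle \\
&= \sum_l c^{j-1}_l \langle \varphi_{j-1,l}, \varphi_{jk}\rangle + \sum_l d^{j-1}_l \langle \psi_{j-1,l}, \varphi_{jk}\rangle \\
&= \sqrt{2} \sum_l h_{k-2l}\, c^{j-1}_l + \sqrt{2} \sum_l g_{k-2l}\, d^{j-1}_l \\
&= (\mathcal{H}^* c^{j-1})_k + (\mathcal{G}^* d^{j-1})_k,
\end{align*}

i.e., $c^j = \mathcal{H}^* c^{j-1} + \mathcal{G}^* d^{j-1}$. Note that $\mathcal{H}\mathcal{H}^* = \mathcal{G}\mathcal{G}^* = I$ on $\ell^2(\mathbb{Z})$ and $\mathcal{H}^*\mathcal{H} + \mathcal{G}^*\mathcal{G} = I$ on $\ell^2(\mathbb{Z})$, an equation equivalent to (1.3). Repeating this process on the sequences $c^{j-2}$ and $d^{j-2}$ gives $c^{j-1} = \mathcal{H}^* c^{j-2} + \mathcal{G}^* d^{j-2}$ and continuing we find

\[
c^j = (\mathcal{H}^*)^{j-L} c^L + \sum_{m=0}^{j-L-1} (\mathcal{H}^*)^m\, \mathcal{G}^* d^{j-m-1}. \tag{1.9}
\]

This formula, known as the discrete inverse wavelet transform, is represented in the following diagram.

\[
\begin{array}{ccccccccccc}
c^L & \xrightarrow{\ \mathcal{H}^*\ } & c^{L+1} & \xrightarrow{\ \mathcal{H}^*\ } & c^{L+2} & \xrightarrow{\ \mathcal{H}^*\ } & c^{L+3} & \cdots\longrightarrow & c^{j-1} & \xrightarrow{\ \mathcal{H}^*\ } & c^j \\
 & \nearrow_{\mathcal{G}^*} & & \nearrow_{\mathcal{G}^*} & & \nearrow_{\mathcal{G}^*} & & \nearrow_{\mathcal{G}^*} & & \nearrow_{\mathcal{G}^*} & \\
d^L & & d^{L+1} & & d^{L+2} & & & & d^{j-2} \quad d^{j-1} & &
\end{array}
\]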
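The analysis and synthesis steps (1.7) and (1.8) can be written out directly in the simplest case. The following minimal sketch is not from the original text and is restricted to the Haar filter, $h_0 = h_1 = 1/2$, for which $g_0 = 1/2$ and $g_1 = -1/2$ and no boundary handling is needed on sequences of even length. The function names `analyze`, `synthesize`, `dwt` and `idwt` are illustrative.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def analyze(c):
    """One step of (1.7) for Haar: returns (H c, G c), each of half the length."""
    even, odd = c[0::2], c[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2

def synthesize(c_coarse, d_coarse):
    """One step c^j = H* c^{j-1} + G* d^{j-1}, using (1.8) for the Haar filters."""
    c = np.empty(2 * len(c_coarse))
    c[0::2] = (c_coarse + d_coarse) / SQRT2
    c[1::2] = (c_coarse - d_coarse) / SQRT2
    return c

def dwt(c, levels):
    """Return c^L and the list [d^{j-1}, d^{j-2}, ..., d^L] with L = j - levels."""
    details = []
    for _ in range(levels):
        c, d = analyze(c)
        details.append(d)
    return c, details

def idwt(c, details):
    """Invert dwt by running the synthesis step from the coarsest level upward."""
    for d in reversed(details):
        c = synthesize(c, d)
    return c

x = np.arange(16, dtype=float)
coarse, details = dwt(x, 3)
x_rec = idwt(coarse, details)
print(np.max(np.abs(x_rec - x)))
```

The reconstruction error should be at rounding level, reflecting $\mathcal{H}^*\mathcal{H} + \mathcal{G}^*\mathcal{G} = I$, and the total energy of the coefficients equals that of the input because the one-step map $c \mapsto (\mathcal{H}c, \mathcal{G}c)$ is orthogonal.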
Fast wavelet transform. In order to implement these decompositions in a practical way, one must preprocess signals in a suitable fashion. As in the case of the fast Fourier transform, one can work with periodic signals and periodized basis functions to build a fast algorithm for the wavelet transform and its inverse. Other preprocessing, including truncation or zero padding, will be addressed implicitly in Chapter 2.

Suppose $(V_j, \varphi)$ is an MRA of $L^2(\mathbb{R})$ with orthogonal generator $\varphi \in L^1 \cap L^\infty$ and $\psi$ is the associated wavelet. We define the periodizations $\varphi^{\mathrm{per}}_{jk}$ and $\psi^{\mathrm{per}}_{jk}$ of $\varphi_{jk}$ and $\psi_{jk}$, respectively, by

\[
\varphi^{\mathrm{per}}_{jk}(x) = \sum_l \varphi_{jk}(x + l); \qquad
\psi^{\mathrm{per}}_{jk}(x) = \sum_l \psi_{jk}(x + l)
\]

and the periodizations $V^{\mathrm{per}}_j$ and $W^{\mathrm{per}}_j$ of $V_j$ and $W_j$, respectively, as the closed subspaces of $L^2(\mathbb{T})$ given by

\[
V^{\mathrm{per}}_j = \mathrm{span}\,\{\varphi^{\mathrm{per}}_{jk}\}; \qquad
W^{\mathrm{per}}_j = \mathrm{span}\,\{\psi^{\mathrm{per}}_{jk}\}.
\]

Since $H(1/2) = 0$ and $H(0) = 1$, we have $\sum_k h_{2k} = \sum_k h_{2k+1} = 1/2$. Hence, if $F(x) = \sum_l \varphi(x + l) = \varphi^{\mathrm{per}}_{0l}(x)$, an application of the dilation equation (1.1) gives
\[
F(x) = 2 \sum_l \sum_k h_k\, \varphi(2x + 2l - k) = 2 \sum_k \sum_l h_{k+2l}\, \varphi(2x - k) = F(2x). \tag{1.10}
\]

However $F \in L^1(\mathbb{T})$, and (1.10) implies that its Fourier coefficients satisfy $\hat F(m) = \hat F(2^j m)$ for non-negative integers $j$, thus contradicting the Riemann–Lebesgue lemma unless $F$ is constant. Since $1 = \hat\varphi(0) = \int_{-\infty}^{\infty} \varphi(x)\,dx = \int_0^1 F(x)\,dx$, this constant must be 1, i.e., $\sum_l \varphi(x + l) \equiv 1$. As a consequence, for $j \le 0$, the spaces $V^{\mathrm{per}}_j$ are one-dimensional spaces containing only the constant functions. Similarly,

\[
\sum_l \psi(x + l/2) = 2 \sum_l \sum_k (-1)^k\, \bar h_{1-k}\, \varphi(2x + l - k) = 2 \sum_k (-1)^k\, \bar h_{1-k} = 0
\]
from which we see that $W^{\mathrm{per}}_j = \{0\}$ for $j \le -1$. We need then only concern ourselves with the spaces $V^{\mathrm{per}}_j$ and $W^{\mathrm{per}}_j$ for $j \ge 0$.

The nestedness of the multiresolution spaces $V_j$ is inherited by their periodizations $V^{\mathrm{per}}_j$ as is the orthogonal decomposition $V^{\mathrm{per}}_j = V^{\mathrm{per}}_{j-1} \oplus W^{\mathrm{per}}_{j-1}$. Further, each $V^{\mathrm{per}}_j$ has an orthonormal basis $\{\varphi^{\mathrm{per}}_{jk}\}_{k=0}^{2^j-1}$ and each $W^{\mathrm{per}}_j$ has an orthonormal basis $\{\psi^{\mathrm{per}}_{jk}\}_{k=0}^{2^j-1}$.

We denote $2^j$-periodized versions $h^{(j)}_k$ and $g^{(j)}_k$ of the filter sequences $h_k$ and $g_k$ by $h^{(j)}_k = \sum_l h_{k+2^j l}$ and similarly for $g^{(j)}_k$. Then $g^{(j)}_k = (-1)^k\, \bar h^{(j)}_{1-k}$ where the subscripts are now taken modulo $2^j$. Periodized filters $\mathcal{H}^{(j)}$ and $\mathcal{G}^{(j)}$ acting on $2^j$-periodic sequences are then defined by

\[
(\mathcal{H}^{(j)} a)_k = \sqrt{2} \sum_{l=0}^{2^j-1} \bar h^{(j)}_{l-2k}\, a_l; \qquad
(\mathcal{G}^{(j)} a)_k = \sqrt{2} \sum_{l=0}^{2^j-1} \bar g^{(j)}_{l-2k}\, a_l,
\]

and their adjoints $(\mathcal{H}^{(j)})^*$ and $(\mathcal{G}^{(j)})^*$ by

\[
((\mathcal{H}^{(j)})^* a)_k = \sqrt{2} \sum_{l=0}^{2^j-1} h^{(j)}_{k-2l}\, a_l; \qquad
((\mathcal{G}^{(j)})^* a)_k = \sqrt{2} \sum_{l=0}^{2^j-1} g^{(j)}_{k-2l}\, a_l.
\]
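The periodized filters can be made concrete as matrices. The sketch below is an illustration under stated assumptions rather than the implementation used in the text: it builds $\mathcal{H}^{(j)}$ and $\mathcal{G}^{(j)}$ as dense $2^{j-1} \times 2^j$ matrices for a real filter (so conjugates are dropped), using the 4-tap Daubechies coefficients normalized to sum to one, and uses matrix transposes for the adjoints. Dense matrices cost $O(N^2)$ per level and are chosen for transparency, not speed. The names `periodized_filters`, `fwt` and `ifwt` are hypothetical.

```python
import numpy as np

def periodized_filters(h, j):
    """Matrices of H^(j) and G^(j), mapping length 2^j to length 2^(j-1); needs j >= 1."""
    n = 2 ** j
    h = np.asarray(h, dtype=float)       # real filter assumed, so no conjugation
    hp = np.zeros(n)
    gp = np.zeros(n)
    for k, hk in enumerate(h):
        hp[k % n] += hk                  # h^(j)_k = sum_l h_{k + 2^j l}
        m = 1 - k                        # g_m = (-1)^m h_{1-m}, with m = 1 - k
        gp[m % n] += (-1.0) ** (m % 2) * hk
    rows = n // 2
    Hm = np.zeros((rows, n))
    Gm = np.zeros((rows, n))
    for k in range(rows):
        for l in range(n):
            Hm[k, l] = np.sqrt(2.0) * hp[(l - 2 * k) % n]
            Gm[k, l] = np.sqrt(2.0) * gp[(l - 2 * k) % n]
    return Hm, Gm

def fwt(c, h):
    """Decompose c^j of length 2^j into [d^{j-1}, ..., d^0, c^0]."""
    c = np.asarray(c, dtype=float)
    j = len(c).bit_length() - 1          # assumes len(c) is a power of two
    out = []
    while j >= 1:
        Hm, Gm = periodized_filters(h, j)
        out.append(Gm @ c)
        c = Hm @ c
        j -= 1
    out.append(c)
    return out

def ifwt(coeffs, h):
    """Rebuild c^j from [d^{j-1}, ..., d^0, c^0] using the adjoint (transposed) filters."""
    c = coeffs[-1]
    j = 1
    for d in reversed(coeffs[:-1]):
        Hm, Gm = periodized_filters(h, j)
        c = Hm.T @ c + Gm.T @ d
        j += 1
    return c

s3 = np.sqrt(3.0)
h_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
signal = np.cos(2 * np.pi * np.arange(16) / 16.0)
coeffs = fwt(signal, h_d4)
print([len(c) for c in coeffs], np.max(np.abs(ifwt(coeffs, h_d4) - signal)))
```

The coefficient lengths come out as 8, 4, 2, 1, 1, summing to the input length 16, and the reconstruction error should be at rounding level. For an FIR filter one would replace the dense products by short circular convolutions to obtain linear cost per level.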
Given a discrete signal $c^j$ of length $2^j$, we compute the signals $c^{j-1}$ and $d^{j-1}$ by $c^{j-1} = \mathcal{H}^{(j)}c^j$, $d^{j-1} = \mathcal{G}^{(j)}c^j$. Both $c^{j-1}$ and $d^{j-1}$ are signals of length $2^{j-1}$ (or, more precisely, are signals with period $2^{j-1}$). Continuing in this way we construct sequences $d^{j-1}, d^{j-2}, \dots, d^0$ of lengths $2^{j-1}, 2^{j-2}, \dots, 2, 1$, respectively, and a constant $c^0$. The sum of the lengths of these sequences is $2^{j-1} + 2^{j-2} + \cdots + 2 + 1 + 1 = 2^j$, the length of the original sequence $c^j$. When the fast Fourier transform is used to compute the convolutions that appear in the action of the operator $W_h : \mathbb{C}^{2^j} \to \mathbb{C}^{2^j}$ which decomposes a signal $c^j \in \mathbb{C}^{2^j}$ to the sequence $(d^{j-1}, d^{j-2}, \dots, d^0, c^0)$, it is easily shown that the algorithm has complexity $O(N \log N)$ where $N = 2^j$ is the length of the data sequence $c^j$. If the low-pass filter $\{h_k\}_k$ has finite impulse response (FIR) in
the sense that $h_k = 0$ if $k < 0$ or $k > M$ for some positive integer $M$, then the algorithm has complexity $O(MN)$. Under either of these circumstances, $W_h$ is known as the fast wavelet transform (FWT). In analogy with (1.9), $c^j$ may be recovered from $(d^{j-1}, d^{j-2}, \dots, d^0, c^0)$ with the aid of the adjoint operators $(\mathcal{H}^{(m)})^*$, $(\mathcal{G}^{(m)})^*$ ($1 \le m \le j - 1$). The operator $W_h^{-1}$ that implements this mapping is known as the fast inverse wavelet transform (FIWT).

The cascade algorithm. There are several schemes for computing the values of the scaling function $\varphi$ given the QMF $H$. The first is the spectral method as outlined in (1.4) and subsequent discussion. Another method will be given in Section 1.3. For now we concentrate on the cascade algorithm that arises directly from (1.1). For reasonable $H$, iterating $\mathcal{H}^*$ starting from the delta sequence provides convergence to the values of $\varphi$ at dyadic rationals.

Given a scaling function $\varphi$ with associated QMF $H(\xi) = \sum_k h_k e^{-2\pi i k\xi}$, define a bounded operator $T$ on $L^2(\mathbb{R})$ by

\[
T f(x) = 2 \sum_k h_k\, f(2x - k).
\]

By (1.1), $\varphi$ is a fixed point of $T$ and the condition $\sum_k h_k = 1$ ensures that $T$ preserves the integral (the zeroth moment), i.e., $\int Tf = \int f$. The iterates $\varphi_n = T^n\varphi_0$ ($n \ge 0$) of $\varphi_0 \in L^2(\mathbb{R})$ having integral one will converge to $\varphi$ under reasonable hypotheses on $H$, $\varphi_0$.

Alternatively, suppose $\int \varphi = 1$, $\varphi$ is Hölder continuous of order $\alpha > 0$ and $\int |t|^{\alpha} |\varphi(t)|\,dt < \infty$. Then if $k/2^J$ is a dyadic rational and $j$ is large enough,

\[
\Big| \varphi\Big(\frac{k}{2^J}\Big) - 2^{j/2} \langle \varphi, \varphi_{j,k2^{j-J}}\rangle \Big|
\le \int \Big| \varphi\Big(\frac{k}{2^J}\Big) - \varphi\Big(y + \frac{k}{2^J}\Big) \Big|\, 2^j |\varphi(2^j y)|\,dy
\le C 2^{-j\alpha}.
\]

Recall that $\mathcal{H}^* : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ acts via $(\mathcal{H}^* a)_k = \sqrt{2} \sum_l h_{k-2l}\, a_l$. The scaling function $\varphi$ is unique in $L^2(\mathbb{R})$ with the properties

\[
\langle \varphi, \varphi_{0k}\rangle = \delta_k, \qquad \langle \varphi, \psi_{jk}\rangle = 0 \qquad (j \ge 0,\ k \in \mathbb{Z}).
\]

We now run the inverse discrete wavelet transform on these sequences. Let $c^0$ be the delta sequence $c^0_k = \delta_k$ and define sequences $c^j$ by $c^j = (\mathcal{H}^*)^j c^0$ and $d^j = 0$ ($j \ge 0$). Then $\langle \varphi, \varphi_{1k}\rangle = c^1_k = (\mathcal{H}^* c^0)_k + (\mathcal{G}^* d^0)_k = (\mathcal{H}^* c^0)_k$. More generally, $\langle \varphi, \varphi_{jk}\rangle = c^j_k = ((\mathcal{H}^*)^j c^0)_k$ and as a consequence,

\[
\Big| \varphi\Big(\frac{k}{2^J}\Big) - 2^{j/2} \big((\mathcal{H}^*)^j \delta_0\big)_{k2^{j-J}} \Big|
= \Big| \varphi\Big(\frac{k}{2^J}\Big) - 2^{j/2} c^j_{k2^{j-J}} \Big| \le C 2^{-j\alpha}
\]
which is the desired convergence estimate. For visualization, one typically plots the piecewise linear interpolant $\eta^j$ of the pairs $(k/2^j, 2^{j/2}c^j_k)$. The following result and its proof appear in [99] (Proposition 6.5.2).

Proposition 1.1.3. If $\varphi$ is Hölder continuous of order $\alpha > 0$, then there exist $C > 0$ and $j_0 \in \mathbb{N}$ such that, for $j \ge j_0$, $\|\varphi - \eta^j\|_\infty \le C 2^{-j\alpha}$.

Initialization and design. Typical examples of MRAs, scaling functions, wavelets and QMFs illustrate the tradeoffs involved in using FWT algorithms in signal processing. First we consider the so-called Shannon MRA. The Shannon scaling function $\varphi_S(x) = \sin \pi x/(\pi x)$ has the Fourier transform $\hat\varphi_S(\xi) = \chi_{[-1/2,1/2]}(\xi)$. The associated QMF is the one-periodic function $H_S(\xi) = \chi_{[-1/4,1/4]}(\xi)$ on $\mathbb{T} \sim [-1/2, 1/2)$. The space $V_j(\varphi_S)$ is the space of $L^2$-functions $f$ bandlimited to $[-2^{j-1}, 2^{j-1}]$, i.e., $\hat f(\xi) = 0$ for $|\xi| > 2^{j-1}$. The statement that $\{\varphi_S(\cdot - k)\}_{k\in\mathbb{Z}}$ is an orthonormal basis for $V_0$ is the classical sampling theorem (see Chapter 3), which also states that if $f \in V_0(\varphi_S)$ then $f(x) = \sum_k f(k)\,\varphi_S(x - k)$, a consequence of the cardinality of the Shannon scaling function, i.e., $\varphi_S(k) = \delta_{k0}$. Xia and Zhang [368] proved that there is no compactly supported continuous orthogonal scaling function with this property (see Theorem 1.2.1 for an alternative proof). The Haar scaling function $\varphi_H = \chi_{[0,1)}$ is cardinal, compactly supported and has orthogonal shifts, but is, of course, not continuous, while the Shannon scaling function is continuous, orthogonal and cardinal, but not compactly supported.

In implementing the FWT, it is standard practice to regard the values $s_k$ of an integer-sampled sequence as the input coefficients $c_k$ of some $f \in V_0$. Strang and Nguyen refer to this as a wavelet crime (see [332], p. 232). For a fixed scaling function $\varphi$ and $f(x) = \sum_k c_k\,\varphi(x - k) \in V(\varphi)$, the samples $s_k = f(k)$ are effectively never the same as the coefficients $c_k$.
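The cascade algorithm described earlier in this section, and the gap between samples and coefficients just mentioned, can both be seen in a small computation. The following sketch is illustrative and not taken from the original text. It works with the rescaled sequences $a^j = 2^{j/2}(\mathcal{H}^*)^j\delta$, which satisfy $a^{j+1}_k = 2\sum_l h_{k-2l}\,a^j_l$ and approximate $\varphi(k/2^j)$. The filter is the 4-tap Daubechies filter normalized to sum to one; the reference values $\varphi(1) = (1+\sqrt3)/2$ and $\varphi(2) = (1-\sqrt3)/2$ follow from the dilation equation at the integers together with $\sum_l \varphi(x+l) \equiv 1$. The function name `cascade` is hypothetical.

```python
import numpy as np

def cascade(h, J):
    """Return a^J, whose entry k approximates phi(k / 2^J) for the filter h."""
    two_h = 2.0 * np.asarray(h, dtype=float)
    a = np.array([1.0])                   # delta sequence c^0
    for _ in range(J):
        up = np.zeros(2 * len(a) - 1)
        up[::2] = a                       # upsample: zeros between consecutive entries
        a = np.convolve(up, two_h)        # a_k <- 2 * sum_l h_{k-2l} a_l
    return a

s3 = np.sqrt(3.0)
h_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0

J = 12
a = cascade(h_d4, J)
phi_at_1 = a[2 ** J]                      # approximates phi(1)
phi_at_2 = a[2 ** (J + 1)]                # approximates phi(2)
riemann_sum = a.sum() / 2 ** J            # approximates the integral of phi

print(phi_at_1, (1 + s3) / 2, phi_at_2, (1 - s3) / 2, riemann_sum)
```

Convergence is slow, in line with the $C2^{-j\alpha}$ estimate and the low Hölder regularity of this scaling function, so the printed approximations agree with the reference values only to about two decimal places at $J = 12$. Since $\varphi(1)$ and $\varphi(2)$ are both nonzero, the samples of $f = \sum_k c_k\varphi(\cdot - k)$ are $f(k) = c_{k-1}\varphi(1) + c_{k-2}\varphi(2)$, a shifted and filtered version of the coefficients rather than the coefficients themselves.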
Even in the Shannon MRA, real signals are only approximately bandlimited. If one wishes only to compute, store, transmit and reconstruct samples from wavelet coefficients, then there is nothing wrong with this practice. However, as soon as one modifies wavelet coefficients (e.g., for compression or denoising) there is difficulty in interpreting the signal errors thus incurred. Moreover, such errors can be magnified rapidly when fed back into iterative schemes. Even in noniterative applications, though, such errors can be difficult to interpret, particularly when they are measured by some norm other than $L^2$. Suppose, for example, that one wishes to approximate a sampled signal from some subset of wavelet terms of $f(x) = \sum_k c_k\,\varphi(x - k)$ in such a way that the reconstruction captures important fluctuations of the original samples. This is possible if variation can be expressed in terms of the magnitude of wavelet coefficients. It is important to keep in mind that one is measuring the variational error between $f(x) = \sum_k c_k\,\varphi(x - k)$ and $f_{\mathrm{reconst}}(x) = \sum_k c'_k\,\varphi(x - k)$ where $c'_k$ are the reconstructed approximate coefficients. The sample variation of $f_{\mathrm{reconst}}$ is not the same as the discrete variation of $c'_k$. This is one reason why prefiltering seems desirable.

The ideal manner of prefiltering would be to find a convolutional inverse $d$ for the sequence of samples $\varphi(k)$, i.e., $\sum_l d_l\,\varphi(k - l) = \delta_k$. Then $\tilde f(x) = \sum_k \big(\sum_l d_l\, s_{k-l}\big)\varphi(x - k)$ at least agrees with $f$ at the integers. As we will see in Chapter 3, this problem is not always well posed, and even if it is, the filter coefficients $d_l$ may have undesirable properties.

Strang and Nguyen [332] suggest the following prefiltering scheme that at least serves to approximate the desired coefficients $c_k(f)$. In the case of an orthogonal generator one replaces the desired $c_k(f) = \int f(t)\,\varphi(t - k)\,dt$, in which $f(t)$ is a function having samples $s_k$, by $\gamma_k = \sum_l s_l\,\varphi(k - l)$. This discrete approximation of the desired processing coefficients is accurate on polynomials up to degree $d - 1$ whenever the filter $H$ of $\varphi$ has the following property (see [268, 330, 331], cf. Theorem 2.1.1).

Theorem 1.1.4. Suppose that $\varphi$ is an orthogonal scaling function. Then the space $P_{d-1}$ of polynomials of degree less than or equal to $d - 1$ is reproduced by $\varphi$ in the sense that

\[
p(t) = \sum_k p(k)\, \varphi(t - k)
\]
for all $p \in P_{d-1}$, if and only if the QMF $H(\xi)$ associated with $\varphi$ has a zero of order $d$ at $\xi = 1/2$.

In wavelet-based signal processing algorithms, one would postprocess as well. As mentioned, in order that the wavelet $\psi$ have certain properties desirable in signal analysis and processing, constraints must be placed on the scaling function $\varphi$. For example, symmetry is an important consideration in signal analysis when trying to avoid artifacts caused by the edges of images, and when solving boundary value problems as will be discussed below. Daubechies (see [99], p. 252) proved the following.

Theorem 1.1.5. If $\phi$ generates an MRA for $L^2(\mathbb{R})$ such that $\phi$ and the associated wavelet $\psi$ are both real and have compact support, and if $\psi$ has an axis either of symmetry or antisymmetry, then $\psi$ is the Haar wavelet.

Although orthogonality can have important ramifications for processing transformed data, from the point of view of straight subband coding, the perfect reconstruction property is what matters. This property is expressed naturally in terms of biorthogonal filterbanks.

1.1.3 Biorthogonal multiresolution analyses

The conditions for biorthogonal MRA filters can be deduced in much the same way as in the orthogonal case. One starts with a pair of scaling functions $\phi$ and $\tilde\phi$ satisfying $\langle \phi, \tilde\phi(\cdot - k)\rangle = \delta_k$. In the Fourier domain this becomes
\[ \sum_l \hat\phi(\xi+l)\,\overline{\hat{\tilde\phi}(\xi+l)} \equiv 1. \]
Following the orthogonal case routinely yields the mirror filter condition:
\[ H(\xi)\,\overline{\tilde H(\xi)} + H\bigl(\xi+\tfrac12\bigr)\,\overline{\tilde H\bigl(\xi+\tfrac12\bigr)} \equiv 1 \tag{1.11} \]
with $H$ and $\tilde H$ the scaling filters associated with $\phi$ and $\tilde\phi$, respectively. Often $H$ is called the primal filter and $\tilde H$ the dual filter.

B-splines furnish a natural family of examples (e.g., Cohen [79]). The B-splines are symmetric, compactly supported scaling functions whose integer shifts form Riesz bases for their respective spans. For example, the linear B-spline $\phi_1(x) = (1 - |x-1|)_+$ satisfies the two-scale equation
\[ \phi_1(x) = \tfrac12\,\phi_1(2x) + \phi_1(2x-1) + \tfrac12\,\phi_1(2x-2). \]
The inner products $\|\phi_1\|^2 = 2/3$, $\langle \phi_1, \phi_1(\cdot \pm 1)\rangle = 1/6$ and $\langle \phi_1, \phi_1(\cdot \pm k)\rangle = 0$ ($k > 1$) are determined by explicit integrations. Thus, if $f = \sum_k c_k\,\phi_1(x-k) \in V(\phi_1)$ with real coefficients $c_k$ then
\[ \|f\|_2^2 = \frac23 \sum_k |c_k|^2 + \frac13 \sum_k c_k\,c_{k-1} \]
and the Riesz basis property of $\{\phi_1(\cdot - k)\}$ follows directly from the Cauchy–Schwarz inequality. The standard orthogonal generator of $V(\phi_1)$ fails to have compact support. One seeks instead a dual generator $\tilde\phi_1$ that is biorthogonal to $\phi_1$ in the sense that $\langle \phi_1, \tilde\phi_1(\cdot - k)\rangle = \delta_k$. Such a biorthogonal generator is not necessarily unique, but instead might be chosen among a family of generators that allow for tradeoffs between regularity and support length.

Suppose now that $\tilde\phi$ is a scaling function whose shifts are biorthogonal to those of $\phi$. Then (1.11) must be satisfied by the dual scaling pair and this is the condition that one seeks to solve. For the linear spline centered at $x = 1$, the scaling filter has the form
\[ H_1(\xi) = \frac14 + \frac12\,e^{-2\pi i\xi} + \frac14\,e^{-4\pi i\xi} = \Bigl(\frac{1+e^{-2\pi i\xi}}{2}\Bigr)^{2} = e^{-2\pi i\xi}\cos^2 \pi\xi. \]
More generally, the B-spline of order $N$ is defined as the $(N+1)$-fold convolution of $\chi_{[0,1)}$ with itself. It is a scaling function with symbol
\[ H_N(\xi) = \Bigl(\frac{1+e^{-2\pi i\xi}}{2}\Bigr)^{N+1} = e^{-(N+1)\pi i\xi}\cos^{N+1}\pi\xi. \]
The Bezout equation
\[ (1-y)^L P_L(y) + y^L P_L(1-y) = 1 \]
has the solution [79, 99]:
\[ P_L(y) = \sum_{j=0}^{L-1}\binom{L-1+j}{j}\,y^j. \]
Upon setting $y = \cos^2\pi\xi$ one sees that
\[ (\cos^2\pi\xi)^L P_L(\sin^2\pi\xi) + (\sin^2\pi\xi)^L P_L(\cos^2\pi\xi) \equiv 1, \]
which, since $(1+e^{-2\pi i\xi})/2 = e^{-\pi i\xi}\cos\pi\xi$ and $(1-e^{-2\pi i\xi})/2 = i\,e^{-\pi i\xi}\sin\pi\xi$, can be rewritten as
\[ \Bigl(\frac{1+e^{-2\pi i\xi}}{2}\Bigr)^{2L} P_L(\sin^2\pi\xi) + (-1)^L\Bigl(\frac{1-e^{-2\pi i\xi}}{2}\Bigr)^{2L} P_L(\cos^2\pi\xi) \equiv e^{-2\pi i L\xi}. \]
Subject to $2L \ge N+1$, the dual filters
\[ \tilde H_{N,L}(\xi) = e^{-2\pi i L\xi}\Bigl(\frac{1+e^{-2\pi i\xi}}{2}\Bigr)^{2L-N-1} P_L(\sin^2\pi\xi) \]
provide all solutions to (1.11) for $H(\xi) = H_N(\xi)$ (see [99], p. 272 for plots of corresponding scaling functions and wavelets). The minimal choice $2L = N+1$ ($N$ odd) does not lead to a convergent scaling function when $N = 1$. In this case $\tilde H_{1,1}(\xi) = e^{-2\pi i\xi}$, the scaling filter for the (shifted) $\delta$ distribution. However, $\tilde H_{1,2}(\xi) = e^{-6\pi i\xi}\cos^2\pi\xi\,(1 + 2\sin^2\pi\xi)$ gives rise to a square-integrable scaling function.

It is worth pointing out that in the biorthogonal case the FWT follows the same pattern as in the orthogonal case. After the corresponding high-pass filters $G(\xi) = e^{-2\pi i\xi}\,\overline{\tilde H(\xi+1/2)}$ and $\tilde G(\xi) = e^{-2\pi i\xi}\,\overline{H(\xi+1/2)}$ are fixed, one uses the dual filters for the forward transform and the primal filters for the inverse transform (e.g., [79, 99]).

1.1.4 Regularity for scaling distributions

Consequences of the fact that wavelets form unconditional bases for a large scale of function spaces will be a basic theme of Chapter 6. This fact also goes a long way towards explaining their utility as tools for solving PDEs with initial conditions in suitable function spaces. To form convergent expansions, the wavelets themselves should belong to the space in question. It becomes important to be able to analyze local and global regularity properties of wavelets. One needs to know, for example, when a scaling function belongs to a Sobolev space (having a given number of derivatives in $L^2$) on the one hand, and a Hölder space (a uniform pointwise condition on divided differences) on the other. Here we will review methods for analyzing pointwise (i.e., local) regularity of a scaling distribution that has compact support. Global regularity considerations based on Fourier methods will be addressed in Chapter 2.
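The Bezout equation of Section 1.1.3 and its binomial solution $P_L$ are easy to check numerically. The following is a minimal sketch, not part of the original text; the helper names `bezout_poly` and `bezout_residual` are hypothetical, and the sample grid of $y$ values is an arbitrary choice.

```python
from math import comb

def bezout_poly(L, y):
    """P_L(y) = sum_{j=0}^{L-1} C(L-1+j, j) * y**j."""
    return sum(comb(L - 1 + j, j) * y**j for j in range(L))

def bezout_residual(L, y):
    """(1-y)^L P_L(y) + y^L P_L(1-y) - 1, which should vanish."""
    return (1 - y)**L * bezout_poly(L, y) + y**L * bezout_poly(L, 1 - y) - 1.0

# Largest residual over L = 1..5 and y = 0, 1/16, ..., 1.
max_residual = max(
    abs(bezout_residual(L, k / 16)) for L in range(1, 6) for k in range(17)
)
print(max_residual)
```

For $L = 2$ the polynomial is $P_2(y) = 1 + 2y$, which is the factor $1 + 2\sin^2\pi\xi$ appearing in $\tilde H_{1,2}$.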
Refinement methods are used to establish pointwise and difference estimates on scaling distributions based on the eigenspace structure of certain matrices attached to the scaling coefficients. Such methods are discussed in some detail in Chapter 7 of Daubechies [99], but first appeared in [107, 108] (see also Cabrelli et al. [63]). What follows is, to some extent, a modest elaboration of the discussion in [99].

Refinement methods. In this section we consider questions of existence and regularity of solutions of the refinement or scaling equation (cf. (1.1))
\[ \phi(x) = 2\sum_{k=0}^{M} h_k\,\phi(2x-k). \tag{1.12} \]
There are only $M+1$ terms in the sum on the right-hand side of (1.12), i.e., $H$ is an FIR filter. A solution of (1.12) is a fixed point of the refinement operator
\[ Tf(x) = 2\sum_{k=0}^{M} h_k\,f(2x-k) \tag{1.13} \]
associated to the scaling sequence $\{h_k\}_{k=0}^{M}$. Notice that if $f$ is supported on $[0, M]$, then so is $Tf$. Hence, if a solution $\phi$ of (1.12) can be expressed as a limit of iterates $T^j f$ of a function supported in $[0, M]$ then $\operatorname{supp}(\phi) \subset [0, M]$. Daubechies and Lagarias [107, 108] construct $\phi$ as a limit of piecewise linear splines having desired values at dyadic rationals. Very roughly speaking, differences in values in passing from one iteration to the next are closely related to eigenvectors of certain transition matrices associated with $\{h_k\}$, and sum rules on $\{h_k\}$ enable one to avoid large eigenvalues that lead to irregularity.

The refinement operator $T$ in (1.13) induces a refinement operator $T$ on vector-valued functions as follows. Let $G = (G_0, \dots, G_{M-1}) : [0,1) \to \mathbb{R}^M$. For $0 \le x < 1/2$ set $(TG)_j(x) = 2\sum_k h_k\,G_{2j-k}(2x)$ while, for $1/2 \le x < 1$, set $(TG)_j(x) = 2\sum_k h_k\,G_{2j-k+1}(2x-1)$. Then $T$ can be expressed in terms of the $M \times M$ matrices
\[
T_0 = 2\begin{pmatrix}
h_0 & 0 & 0 & 0 & \cdots & 0 & 0\\
h_2 & h_1 & h_0 & 0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & h_M & h_{M-1}
\end{pmatrix},
\qquad
T_1 = 2\begin{pmatrix}
h_1 & h_0 & 0 & 0 & \cdots & 0 & 0\\
h_3 & h_2 & h_1 & h_0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & 0 & h_M
\end{pmatrix},
\]
i.e., $TG(x) = T_0\,G(2x)$ ($0 \le x < 1/2$); $TG(x) = T_1\,G(2x-1)$ ($1/2 \le x < 1$). In terms of entries, $(T_0)_{jk} = 2h_{2j-k}$ and $(T_1)_{jk} = 2h_{2j-k+1}$ for $0 \le j, k \le M-1$. The action of $T$ on vector-valued functions may be written more economically with the aid of the operator $\tau : [0,1) \to [0,1)$ which acts via $\tau x = 2x \bmod 1$ (cf. Theorem 1.1.1). We have
\[ TG(x) = T_{\varepsilon_1(x)}\,G(\tau x) \]
where $x = \sum_{j=1}^{\infty} \varepsilon_j(x)\,2^{-j}$ is the binary expansion of $x$. The ambiguity at $x = 1/2$ causes no problems if $G_k(0) = 0 = G_{k-1}(1)$.

If $\phi$ is a continuous solution of (1.12) then $\phi(0) = \phi(M) = 0$. By evaluating both sides of (1.12) at integers $k$, the vector $v = (\phi(1), \phi(2), \dots, \phi(M-1))^T \in \mathbb{R}^{M-1}$ becomes an eigenvector of the $(M-1) \times (M-1)$ matrix $A$ with $(j,k)$-th entry $A_{jk} = 2h_{2j-k}$ ($1 \le j, k \le M-1$) and eigenvalue 1, i.e., $Av = v$. Note that $A$ is a submatrix of both $T_0$ and $T_1$: it may be obtained by removing the first row and column of $T_0$ or the last row and column of $T_1$. We insist, as before, that the sequence $\{h_k\}_{k=0}^{M}$ satisfies the sum rule
\[ \sum_k h_{2k} = \sum_k h_{2k+1} = \frac12. \tag{1.14} \]
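The transition matrices are straightforward to assemble from the scaling coefficients. The sketch below is illustrative only; the function names `transition_matrices` and `column_sums` are hypothetical. It uses the entry formulas $(T_0)_{jk} = 2h_{2j-k}$ and $(T_1)_{jk} = 2h_{2j-k+1}$ with zero-based indices, and takes the linear B-spline coefficients $(1/4, 1/2, 1/4)$ of Section 1.1.3 as test data.

```python
def transition_matrices(h):
    """Return (T0, T1) as nested lists for scaling coefficients h[0..M]."""
    M = len(h) - 1

    def coeff(i):
        # Coefficients outside 0..M are treated as zero.
        return h[i] if 0 <= i <= M else 0.0

    T0 = [[2 * coeff(2 * j - k) for k in range(M)] for j in range(M)]
    T1 = [[2 * coeff(2 * j - k + 1) for k in range(M)] for j in range(M)]
    return T0, T1

def column_sums(T):
    """Column sums, i.e. the row vector (1, ..., 1) applied from the left."""
    return [sum(row[k] for row in T) for k in range(len(T))]

h_spline = [0.25, 0.5, 0.25]          # linear B-spline, M = 2
T0, T1 = transition_matrices(h_spline)
print(T0, T1)
print(column_sums(T0), column_sums(T1))
```

Under the sum rule (1.14) every column of $T_0$ and $T_1$ sums to 1, which is the statement that the all-ones row vector is fixed from the left.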
Thus $\sum_{j=1}^{M-1} A_{jk} = \sum_{j=1}^{M-1} 2h_{2j-k} = 1$ for all $k$. Equivalently, the row vector $u = (1, 1, \dots, 1) \in \mathbb{R}^{M-1}$ is a left-eigenvector of $A$ with eigenvalue 1. Similarly, $e_0 = (1, 1, \dots, 1) \in \mathbb{R}^M$ is left-fixed by both $T_0$ and $T_1$. Suppose for the moment that the eigenvalue 1 of $A$ is nondegenerate (i.e., has multiplicity one). Later we will give a natural condition that ensures this. Then there is a right eigenvector $w$ of $A$ with eigenvalue 1 that one sees, upon consideration of the Jordan form of $A$, cannot be orthogonal to $u$. Normalize $w$ so that $uw = \sum_{j=1}^{M-1} u_j w_j = 1$. With $w = (w_1, w_2, \dots, w_{M-1})^T$ so normalized and extended to $\mathbb{R}^{M+1}$ by setting $w_0 = w_M = 0$, define the piecewise linear interpolant $f^{(0)}(x)$ on $\mathbb{R}$ by
\[ f^{(0)}(k) = \begin{cases} w_k, & \text{if } 0 \le k \le M,\\ 0, & \text{else.} \end{cases} \]
Then $f^{(0)}$ is supported on $[0, M]$. Define a (column) vector-valued function $F^{(0)} : [0,1) \to \mathbb{R}^M$ by $F^{(0)}_k(x) = f^{(0)}(x+k)$ ($0 \le x \le 1$, $0 \le k \le M-1$), i.e.,
\[ F^{(0)}(x) = \bigl(f^{(0)}(x), f^{(0)}(x+1), \dots, f^{(0)}(x+M-1)\bigr)^T \qquad (0 \le x < 1). \]
Then $F^{(0)}_k(0) = f^{(0)}(k) = f^{(0)}(1 + (k-1)) = F^{(0)}_{k-1}(1)$ and $F^{(0)}_0(0) = f^{(0)}(0) = 0 = f^{(0)}(M) = F^{(0)}_{M-1}(1)$.

Vector-valued functions $F^{(j)}$ are now defined recursively by $F^{(j)}(x) = T_{\varepsilon_1(x)} F^{(j-1)}(\tau x)$ ($j \ge 1$). Each $F^{(j)}$ may be unfolded to generate a piecewise linear function $f^{(j)}$ on the line with $f^{(j)}(x+k) = F^{(j)}_k(x)$ ($0 \le x < 1$). Since $F^{(0)}_k(x) = x\,w_{k+1} + (1-x)\,w_k$,
\[ e_0 F^{(0)}(x) = \sum_k F^{(0)}_k(x) = x \sum_{k=0}^{M-1} w_{k+1} + (1-x) \sum_{k=0}^{M-1} w_k = \sum_{k=0}^{M-1} w_k = 1 \]
and, since $e_0 T_\varepsilon = e_0$ for $\varepsilon = 0, 1$ and $\varepsilon_1(\tau^j x) = \varepsilon_{j+1}(x)$,
\[ e_0 F^{(j)}(x) = e_0\,T_{\varepsilon_1(x)} T_{\varepsilon_2(x)} \cdots T_{\varepsilon_j(x)}\,F^{(0)}(\tau^j x) = e_0 F^{(0)}(\tau^j x) = 1. \tag{1.15} \]
The proof of the following result due to Daubechies and Lagarias [108] will be reviewed here.

Theorem 1.1.6. Suppose that the finite sequence $\{h_k\}_{k=0}^{M}$ satisfies the sum condition (1.14), $T_0$, $T_1$, $f^{(j)}$ and $e_0$ are as above and $E_0 = \operatorname{span}\{e_0\}$. Suppose there exist constants $C > 0$ and $\lambda < 1$ such that for all integers $m \ge 1$,
\[ \max_{\varepsilon_j = 0 \text{ or } 1,\ j = 1, \dots, m} \bigl\| T_{\varepsilon_1} T_{\varepsilon_2} \cdots T_{\varepsilon_m}\big|_{E_0^\perp} \bigr\| \le C\lambda^m. \tag{1.16} \]
Then the functions $f^{(j)}$ converge uniformly to a continuous function $\phi$ with $\|f^{(j)} - \phi\|_\infty \le C\,2^{j\log_2(\lambda)}$. Moreover, $\phi$ is a solution of (1.12), is supported on $[0, M]$, satisfies $\int \phi = 1$ and is Hölder continuous of order $\alpha = -\log_2(\lambda)$.
Proof. First, as Daubechies showed in [99], the assumption (1.16) implies that the eigenvalue 1 is nondegenerate. Observe that by (1.15), $F^{(j)}(x) - F^{(k)}(x) \in E_0^\perp$ for all $j$, $k$ and $x$. Hence, with an application of (1.16),
\[ \|F^{(j+1)}(x) - F^{(j)}(x)\| = \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_j(x)}\bigl(F^{(1)}(\tau^j x) - F^{(0)}(\tau^j x)\bigr)\bigr\| \le C\lambda^j\,\|F^{(1)}(\tau^j x) - F^{(0)}(\tau^j x)\| \]
where $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^M$. Therefore,
\[
\begin{aligned}
\|F^{(j)}(x)\| &\le \sum_{k=0}^{j-1} \|F^{(k+1)}(x) - F^{(k)}(x)\| + \|F^{(0)}(x)\|\\
&\le \sum_{k=0}^{j-1} C\lambda^k\,\|F^{(1)}(\tau^k x) - F^{(0)}(\tau^k x)\| + \|F^{(0)}(x)\|\\
&\le \frac{C}{1-\lambda} \sup_{0 \le y < 1} \|F^{(1)}(y) - F^{(0)}(y)\| + \sup_{0 \le y < 1} \|F^{(0)}(y)\|
\end{aligned}
\]
so that the norms $\|F^{(j)}(x)\|$ are bounded independent of $x$, $j$. Furthermore,
\[ \|F^{(j+k)}(x) - F^{(k)}(x)\| = \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_k(x)}\bigl(F^{(j)}(\tau^k x) - F^{(0)}(\tau^k x)\bigr)\bigr\| \le C\lambda^k\,\|F^{(j)}(\tau^k x) - F^{(0)}(\tau^k x)\| \le C\lambda^k. \]
Since each $F^{(j)}(x)$ is continuous, $\{F^{(j)}\}_{j=0}^{\infty}$ is Cauchy in $L^\infty([0,1], \mathbb{R}^M)$ and so has a continuous limit $\Phi(x) = \lim_{j\to\infty} F^{(j)}(x)$. Now unfold $\Phi(x)$ to obtain a continuous solution $\phi$ of (1.12) by $\phi(x+k) = \Phi_k(x)$ ($0 \le x < 1$). Since $\Phi_k(0) = \lim_{j\to\infty} F^{(j)}_k(0) = \lim_{j\to\infty} F^{(j)}_{k-1}(1) = \Phi_{k-1}(1)$, the definition of $\phi$
at the integers is consistent. By (1.15) and the definition of $\phi$, the solution satisfies
\[
\int_0^M \phi(x)\,dx = \sum_{j=0}^{M-1} \int_0^1 \phi(x+j)\,dx = \sum_{j=0}^{M-1} \int_0^1 \Phi_j(x)\,dx = \int_0^1 e_0 \Phi(x)\,dx = \lim_{j\to\infty} \int_0^1 e_0 F^{(j)}(x)\,dx = 1.
\]
To obtain an estimate of the Hölder continuity of $\phi$, observe that since $\phi$ is a fixed point of $T$ in (1.13) and $\Phi(x) - F^{(0)}(x) \in E_0^\perp$ for all $x$,
\[ \|\Phi(x) - F^{(j)}(x)\| = \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_j(x)}(\Phi - F^{(0)})(\tau^j x)\bigr\| \le C\lambda^j\,\|(\Phi - F^{(0)})(\tau^j x)\| \le C\lambda^j. \tag{1.17} \]
Suppose $2^{-(j+1)} \le y - x < 2^{-j}$. Then, without loss of generality, the binary expansions of $x$ and $y$ agree up to the $j$th bit and
\[
\begin{aligned}
\|\Phi(x) - \Phi(y)\| &\le \|\Phi(x) - F^{(j)}(x)\| + \|F^{(j)}(x) - F^{(j)}(y)\| + \|F^{(j)}(y) - \Phi(y)\|\\
&\le 2C\lambda^j + \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_j(x)}\bigl(F^{(0)}(\tau^j x) - F^{(0)}(\tau^j y)\bigr)\bigr\|\\
&\le 2C\lambda^j + C\lambda^j\,\|F^{(0)}(\tau^j x) - F^{(0)}(\tau^j y)\|\\
&\le C'\lambda^j \le C'\,|x-y|^{\log_2(1/\lambda)},
\end{aligned}
\]
since $F^{(0)}(\tau^j x) - F^{(0)}(\tau^j y) \in E_0^\perp$. Hence $\phi$ is Hölder continuous of order $\alpha = -\log_2(\lambda)$. This proves the theorem.

In [97] and [99] Daubechies constructs continuous scaling functions ${}_N\varphi$ ($N \ge 1$) supported on $[0, 2N-1]$. To see how Theorem 1.1.6 works in a particular example, let $\mu = (1+\sqrt3)/2$ and let $\phi = {}_2\varphi$ be the lowest-order Daubechies scaling function supported on $[0, 3]$ with scaling coefficients $(h_0, h_1, h_2, h_3) = (\mu, 1+\mu, 2-\mu, 1-\mu)/4$. Then $T_0$, $T_1$ are given by
\[
T_0 = \frac12\begin{pmatrix} \mu & 0 & 0\\ 2-\mu & 1+\mu & \mu\\ 0 & 1-\mu & 2-\mu \end{pmatrix},
\qquad
T_1 = \frac12\begin{pmatrix} 1+\mu & \mu & 0\\ 1-\mu & 2-\mu & 1+\mu\\ 0 & 0 & 1-\mu \end{pmatrix}
\]
and have simple eigenvalues $1$, $1/2$, $(1+\sqrt3)/4$ and $1$, $1/2$, $(1-\sqrt3)/4$, respectively. It is a simple matter to show that $\|T_0|_{E_0^\perp}\| \approx 0.7954 < 1$; however $\|T_1|_{E_0^\perp}\| \approx 1.2646 > 1$. Thus, the simple estimate $\|T_{\varepsilon_1} \cdots T_{\varepsilon_L}|_{E_0^\perp}\| \le \prod_{j=1}^{L} \|T_{\varepsilon_j}|_{E_0^\perp}\|$ is insufficient to obtain (1.16). Nevertheless, Daubechies [99] gives an estimate that does handle this example.

Theorem 1.1.7. Let $\lambda_m = \max_{\varepsilon_j = 0 \text{ or } 1} \bigl\|\prod_{j=1}^{m} T_{\varepsilon_j}\big|_{E_0^\perp}\bigr\|^{1/m}$ ($m \ge 1$). Then a necessary and sufficient condition for (1.16) is that $\lambda_m < 1$ for some positive integer $m$.
When $\phi = {}_2\varphi$, $\lambda_1 = \|T_1|_{E_0^\perp}\| \approx 1.2646 > 1$ and $\lambda_2 = \|T_0 T_1|_{E_0^\perp}\|^{1/2} \approx 1.0028 > 1$, but $\lambda_3 = \|T_0^2 T_1|_{E_0^\perp}\|^{1/3} \approx 0.9104$. Consequently, ${}_2\varphi$ is Hölder continuous of order $\alpha = \log_2((0.9104)^{-1}) = 0.1354$. Higher values of $m$ can produce lower values of $\lambda_m$ and better estimates of the Hölder exponent of continuity.

For the optimal Hölder exponent, one needs to take account of extra sum rules satisfied by QMFs with higher-order zeros at $\xi = 1/2$. We assume in what follows that $H$ has a zero of order two at $\xi = 1/2$, i.e., apart from the sum rule (1.14), the sequence $\{h_k\}_{k=0}^{M}$ also satisfies
\[ \sum_k (-1)^k\,k\,h_k = 0. \tag{1.18} \]
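As a quick sanity check, the four Daubechies coefficients $(\mu, 1+\mu, 2-\mu, 1-\mu)/4$ can be tested against the sum rules (1.14) and (1.18), and the exponent quoted for $m = 3$ can be recomputed from the stated value $\lambda_3 \approx 0.9104$. This is only an illustrative sketch; the variable names are hypothetical, and the value of $\lambda_3$ is taken from the text rather than recomputed.

```python
from math import sqrt, log2

mu = (1 + sqrt(3)) / 2
h = [mu / 4, (1 + mu) / 4, (2 - mu) / 4, (1 - mu) / 4]

# Sum rule (1.14): even- and odd-indexed coefficients each sum to 1/2.
even_sum = h[0] + h[2]
odd_sum = h[1] + h[3]

# Sum rule (1.18): sum_k (-1)^k k h_k = 0.
first_moment = sum((-1) ** k * k * h[k] for k in range(4))

# Hoelder exponent implied by the quoted lambda_3 (value taken from the text).
alpha_m3 = log2(1 / 0.9104)

print(even_sum, odd_sum, first_moment, alpha_m3)
```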
Set $\sigma_2 = \sum_k 2k\,h_{2k}$. If $u = (1, 2, \dots, M) \in \mathbb{R}^M$ then, by sum rules (1.14) and (1.18), $uT_0 = u/2 + \sigma_2 e_0$. Similarly, $uT_1 = u/2 + (\sigma_2 - 1/2)e_0$. Putting $e_1^0 = u - 2\sigma_2 e_0$ gives
\[ e_1^0 T_0 = uT_0 - 2\sigma_2\,e_0 T_0 = \frac12 u - \sigma_2 e_0 = \frac12 e_1^0. \]
Further,
\[ e_1^0 T_1 = uT_1 - 2\sigma_2\,e_0 T_1 = \frac12\,(e_1^0 - e_0), \]
i.e., $e_1^0 T_\varepsilon = (e_1^0 - \varepsilon e_0)/2$ with $\varepsilon = 0, 1$. Hence, if $e_1^1 = e_1^0 + e_0$, then
\[ e_1^1 T_1 = e_1^0 T_1 + e_0 T_1 = \frac12\,(e_1^0 - e_0) + e_0 = \frac12\,(e_1^0 + e_0) = \frac12 e_1^1. \]
Thus, we have constructed left eigenvectors $e_k^\varepsilon$ for $T_\varepsilon$ ($\varepsilon = 0, 1$) with eigenvalues $1/2^k$ ($k = 0, 1$), where $e_0^\varepsilon = e_0$. With $E_1 = \operatorname{span}\{e_0, e_1^0\} = \operatorname{span}\{e_0, e_1^1\}$, Daubechies [99] showed that if
\[ \max_{\varepsilon_j = 0 \text{ or } 1,\ j = 1, \dots, m} \bigl\|T_{\varepsilon_1} T_{\varepsilon_2} \cdots T_{\varepsilon_m}\big|_{E_1^\perp}\bigr\| \le C\lambda^m \tag{1.19} \]
for some $C > 0$ and $\lambda < 1$ and the sum rules (1.14), (1.18) hold, then the left and right eigenspaces of $T_0$ and $T_1$ with eigenvalues 1 and 1/2 are nondegenerate. For $\varepsilon, k = 0, 1$, let $v_k^\varepsilon$ be the right eigenvector of $T_\varepsilon$ with eigenvalue $2^{-k}$. By considering the Jordan forms of $T_0$ and $T_1$, it is clear that $e_k^\varepsilon$ and $v_k^\varepsilon$ cannot be orthogonal ($e_k^\varepsilon v_k^\varepsilon \ne 0$), and one may normalize the right eigenvectors so that
\[ e_k^\varepsilon\,v_{k'}^\varepsilon = \delta_{k,k'} \qquad (\varepsilon, k, k' = 0, 1). \tag{1.20} \]
The remaining inner products $e_k^\varepsilon\,v_{k'}^{\varepsilon'}$ may be deduced from (1.20), the eigenproperties of these vectors and the relationship $e_1^1 = e_1^0 + e_0$, and may be summarized by
\[ e_1^1 v_k^0 = 1, \qquad e_1^0 v_k^1 = (-1)^{k+1}, \qquad e_0 v_1^\varepsilon = 0 \qquad (\varepsilon, k = 0, 1). \tag{1.21} \]
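Consider the operators $P_\varepsilon : \mathbb{R}^M \to \mathbb{R}^M$ given by $P_\varepsilon u = (e_0 u)\,v_0^\varepsilon + (e_1^\varepsilon u)\,v_1^\varepsilon$ ($\varepsilon = 0, 1$) and let $P_\varepsilon' u = (I - P_\varepsilon)u = u - P_\varepsilon u$. By (1.20) and (1.21) one has $e_0 P_\varepsilon' u = e_1^\varepsilon P_\varepsilon' u = 0$ so that $\operatorname{Ran}(P_\varepsilon') \subset E_1^\perp$.

Returning now to the question of the continuity of solutions of refinement equations, observe that
\[
\begin{aligned}
e_1^0 F^{(0)}(x) &= (1-x)\,e_1^0 F^{(0)}(0) + x\,e_1^0 F^{(0)}(1)\\
&= (1-x)\Bigl(\sum_{l=1}^{M} l\,w_l - 2\sigma_2 \sum_{l=1}^{M} w_{l-1}\Bigr) + x\Bigl(\sum_{l=1}^{M} l\,w_{l-1} - 2\sigma_2 \sum_{l=1}^{M} w_l\Bigr) = x
\end{aligned}
\]
since $\sum_l l\,w_l = 2\sigma_2$ and $\sum_l w_l = 1$. Similarly, $e_1^1 F^{(0)}(x) = x + 1$, and therefore
\[ P_\varepsilon F^{(0)}(x) = \bigl(e_0 F^{(0)}(x)\bigr)\,v_0^\varepsilon + \bigl(e_1^\varepsilon F^{(0)}(x)\bigr)\,v_1^\varepsilon = v_0^\varepsilon + (x + \varepsilon)\,v_1^\varepsilon \qquad (\varepsilon = 0, 1). \tag{1.22} \]
Hence, for all $x$, $y$,
\[ P_\varepsilon\bigl(F^{(0)}(x) - F^{(0)}(y)\bigr) = (x - y)\,v_1^\varepsilon. \tag{1.23} \]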
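Suppose now that $2^{-(j+1)} \le |x-y| \le 2^{-j}$. As in the proof of Theorem 1.1.6, one may assume that $x$ and $y$ have the same binary expansion up to the first $j$ terms. Then because of (1.23), assumption (1.19), the uniform bound on $F^{(0)}$ and the fact that $P_\varepsilon'$ maps $\mathbb{R}^M$ into $E_1^\perp$,
\[
\begin{aligned}
\|F^{(j)}(x) - F^{(j)}(y)\| &= \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_j(x)}\bigl(F^{(0)}(\tau^j x) - F^{(0)}(\tau^j y)\bigr)\bigr\|\\
&\le \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_j(x)}\,P_{\varepsilon_j(x)}\bigl(F^{(0)}(\tau^j x) - F^{(0)}(\tau^j y)\bigr)\bigr\|\\
&\quad + \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_j(x)}\,P_{\varepsilon_j(x)}'\bigl(F^{(0)}(\tau^j x) - F^{(0)}(\tau^j y)\bigr)\bigr\|\\
&\le \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_j(x)}\,(\tau^j x - \tau^j y)\,v_1^{\varepsilon_j(x)}\bigr\| + C\lambda^j\\
&= \frac12\,|\tau^j x - \tau^j y|\,\bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_{j-1}(x)}\,v_1^{\varepsilon_j(x)}\bigr\| + C\lambda^j
\end{aligned}
\]
where we have used the fact that $T_{\varepsilon_j(x)}\,v_1^{\varepsilon_j(x)} = v_1^{\varepsilon_j(x)}/2$. However,
\[ v_1^{\varepsilon_j(x)} = P_{\varepsilon_{j-1}(x)}\,v_1^{\varepsilon_j(x)} + P_{\varepsilon_{j-1}(x)}'\,v_1^{\varepsilon_j(x)} = v_1^{\varepsilon_{j-1}(x)} + P_{\varepsilon_{j-1}(x)}'\,v_1^{\varepsilon_j(x)}, \]
so
\[
\begin{aligned}
\bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_{j-1}(x)}\,v_1^{\varepsilon_j(x)}\bigr\| &\le \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_{j-1}(x)}\,P_{\varepsilon_{j-1}(x)}\,v_1^{\varepsilon_j(x)}\bigr\| + \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_{j-1}(x)}\,P_{\varepsilon_{j-1}(x)}'\,v_1^{\varepsilon_j(x)}\bigr\|\\
&\le \bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_{j-1}(x)}\,v_1^{\varepsilon_{j-1}(x)}\bigr\| + C\lambda^{j-1}\\
&\le \frac12\,\bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_{j-2}(x)}\,v_1^{\varepsilon_{j-1}(x)}\bigr\| + C\lambda^{j-1}.
\end{aligned}
\]
Combining these estimates yields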
\[ \|F^{(j)}(x) - F^{(j)}(y)\| \le \frac14\,|\tau^j x - \tau^j y|\,\bigl\|T_{\varepsilon_1(x)} \cdots T_{\varepsilon_{j-2}(x)}\,v_1^{\varepsilon_{j-1}(x)}\bigr\| + \frac{C}{2}\,|\tau^j x - \tau^j y|\,\lambda^{j-1} + C\lambda^j. \]
Continuing in this manner, one finds that
\[ \|F^{(j)}(x) - F^{(j)}(y)\| \le C\,|\tau^j x - \tau^j y| \sum_{k=1}^{j} \frac{\lambda^{j-k}}{2^k} + C\lambda^j. \tag{1.24} \]
If $\lambda < 1/2$, estimating the sum in (1.24) gives
\[ \|F^{(j)}(x) - F^{(j)}(y)\| \le C\,\frac{|\tau^j x - \tau^j y|}{2^j} + C\lambda^j \le C\,|x-y| + C\,2^{-j} \le C\,|x-y| \]
so that each $F^{(j)}$ is Lipschitz continuous. If $\lambda = 1/2$, one finds
\[ \|F^{(j)}(x) - F^{(j)}(y)\| \le Cj\,|x-y| + C\lambda^j \le C\,|x-y|\,\log(|x-y|^{-1}). \]
Finally, for $1/2 < \lambda < 1$,
\[ \|F^{(j)}(x) - F^{(j)}(y)\| \le C\,|\tau^j x - \tau^j y|\,\lambda^j \sum_{k=1}^{j} (2\lambda)^{-k} + C\lambda^j \le C\lambda^j \le C\,|x-y|^\alpha \]
where $\alpha = -\log_2(\lambda)$, i.e., each $F^{(j)}$ is Hölder continuous of order $\alpha = -\log_2(\lambda)$. As in the proof of Theorem 1.1.6, this implies the continuity of $\Phi$ and the scaling function $\phi$. In summary, if $0 < |x-y| < 1$,
\[
|\phi(x) - \phi(y)| \le C \begin{cases} |x-y|, & \text{if } \lambda < 1/2,\\ |x-y|\,\log(|x-y|^{-1}), & \text{if } \lambda = 1/2,\\ |x-y|^{-\log_2(\lambda)}, & \text{if } 1/2 < \lambda < 1. \end{cases}
\]
In the case of the Daubechies scaling function $\phi = {}_2\varphi$, the eigenvalues $1$, $1/2$ and $(1+\sqrt3)/4$ of $T_0$ and $1$, $1/2$ and $(1-\sqrt3)/4$ of $T_1$ are all nondegenerate. The space $E_1^\perp$ is one-dimensional so that $T_0|_{E_1^\perp}$ and $T_1|_{E_1^\perp}$ commute and act as multiplication by $(1+\sqrt3)/4$ and $(1-\sqrt3)/4$, respectively. Therefore the restriction $T_{\varepsilon_1} T_{\varepsilon_2} \cdots T_{\varepsilon_j}|_{E_1^\perp}$ acts as multiplication by $((1+\sqrt3)/4)^{j-s_j}\,((1-\sqrt3)/4)^{s_j}$ where $s_j = \sum_{l=1}^{j} \varepsilon_l$. This observation allows one to compute local Hölder estimates. For the worst case one obtains
\[ \bigl\|T_{\varepsilon_1} T_{\varepsilon_2} \cdots T_{\varepsilon_j}\big|_{E_1^\perp}\bigr\| \le \Bigl(\frac{1+\sqrt3}{4}\Bigr)^{j} \]
which gives Hölder continuity of order $-\log_2((1+\sqrt3)/4) \approx 0.5500$. This is the best global estimate and is attained at $x = 0$.
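Because the restricted product is just a scalar, the per-bit decay rate along any binary string can be computed directly. The sketch below is an illustration under that observation; the names `restricted_multiplier` and `decay_exponent` are hypothetical, and reading the resulting number as a local Hölder exponent follows the discussion in [99] rather than anything proved here.

```python
from math import sqrt, log2

R0 = (1 + sqrt(3)) / 4   # multiplier of T0 on the one-dimensional space
R1 = (1 - sqrt(3)) / 4   # multiplier of T1 on the one-dimensional space

def restricted_multiplier(bits):
    """Scalar by which T_{e1} ... T_{ej} acts, for bits = [e1, ..., ej]."""
    s = sum(bits)
    return R0 ** (len(bits) - s) * R1 ** s

def decay_exponent(bits):
    """-log2 |multiplier| per bit, i.e. the rate alpha in |multiplier| = 2^(-alpha j)."""
    return -log2(abs(restricted_multiplier(bits))) / len(bits)

worst = decay_exponent([0] * 8)   # all-zero bit string: slowest decay
best = decay_exponent([1] * 8)    # all-one bit string: fastest decay
print(worst, best)
```

The all-zero string reproduces the global exponent $-\log_2((1+\sqrt3)/4) \approx 0.55$. Since $|R_0 R_1| = 1/8$, the two extreme rates add up to 3.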
Statements and proofs of the general versions of these results (for QMFs with higher-order zeros at $\xi = 1/2$, equivalently with extra sum rules) as well as local estimates of Hölder continuity, differentiability and calculations relating to scaling functions with longer support, can be found in [99].

As has already been hinted, convergence properties of refinement schemes can be recast in terms of joint spectral radius properties (cf. [63]). If $\varepsilon_j(x) = \varepsilon_j(y)$ for all $j = 1, \dots, L$ then $\Phi(x) - \Phi(y) = T_{\varepsilon_1} \cdots T_{\varepsilon_L}\bigl(\Phi(\tau^L x) - \Phi(\tau^L y)\bigr)$. In particular, if $x_L = 0.\varepsilon_1(x) \dots \varepsilon_L(x)$ in binary notation then $x - x_L$ tends to zero as $L \to \infty$ so, assuming that $\phi$ is continuous and supported in $[0, M]$, it follows that $\phi(x+k) - \phi(x_L+k)$ tends to zero as $L \to \infty$ for each $k = 0, \dots, M-1$. That is, $\Phi(x) - \Phi(x_L)$ tends to the zero vector. On the other hand, the points $\tau^L x$ may be arbitrarily distributed in $[0,1)$, so it must be that the product $T_{\varepsilon_1} \cdots T_{\varepsilon_L}$ applied to any vector of the form $\Phi(x) - \Phi(y)$ tends to zero. The subspace $W = \operatorname{span}\{\Phi(x) - \Phi(y) : x, y \in [0,1)\} \subset \mathbb{C}^M$ is invariant under $T_0$ and $T_1$. Consequently $T_{\varepsilon_1} \cdots T_{\varepsilon_L}|_W \to 0$ as $L \to \infty$. Define the uniform joint spectral radius (JSR) $\hat\rho(A_0, A_1)$ of matrices $A_0$, $A_1$ to be
\[ \hat\rho(A_0, A_1) = \limsup_{m\to\infty}\ \max_{\varepsilon_j = 0, 1} \|A_{\varepsilon_1} A_{\varepsilon_2} \cdots A_{\varepsilon_m}\|^{1/m}. \]
The JSR for a subspace $W$ is defined similarly, in terms of the norms of the restrictions of $A_i$ to $W$. It is known that arbitrary products of the form $A_{\varepsilon_1} \cdots A_{\varepsilon_L}$ tend to zero if and only if the JSR of $A_0$, $A_1$ is strictly less than one. Consequently, a necessary condition for the continuity of $\phi$ is that $\hat\rho(T_0|_W, T_1|_W) < 1$. The space $W$ turns out to have a characterization as the smallest common invariant subspace of the $T_\varepsilon$ that also contains the vector $\Phi(1) - \Phi(0)$ (see [182], Proposition 4.2). Since $T_\varepsilon$ fixes $\Phi(\varepsilon)$ ($\varepsilon = 0, 1$), it is possible to determine these vectors directly from the scaling coefficients $h_k$. Using this approach one can show that $\hat\rho(T_0|_W, T_1|_W) < 1$ is also sufficient for the continuity of $\phi$ [182]. In general, the JSR can be difficult to estimate precisely, but matters simplify in important cases including the present refinement setting. Specifically, $W = \{v \in \mathbb{C}^N : v_1 + v_2 + \cdots + v_N = 0\}$ precisely when the integer shifts $\phi(\cdot - k)$ are linearly independent [63].
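The fixed-point relation $\Phi(x) = T_{\varepsilon_1(x)}\Phi(\tau x)$ gives a direct way to evaluate a scaling function at dyadic points: for $x = 0.\varepsilon_1 \dots \varepsilon_L$ one has $\tau^L x = 0$, so $\Phi(x) = T_{\varepsilon_1} \cdots T_{\varepsilon_L}\Phi(0)$. The following sketch, which is not from the original text, does this for the four-coefficient Daubechies example of Section 1.1.4. The closed form $\Phi(0) = (0, \mu, 1-\mu)$ is obtained by solving $T_0 v = v$ with entries summing to 1; the helper names `matvec` and `phi_vector` are hypothetical.

```python
from math import sqrt

MU = (1 + sqrt(3)) / 2

# Transition matrices for h = (MU, 1 + MU, 2 - MU, 1 - MU) / 4.
T = {
    0: [[MU / 2, 0.0, 0.0],
        [(2 - MU) / 2, (1 + MU) / 2, MU / 2],
        [0.0, (1 - MU) / 2, (2 - MU) / 2]],
    1: [[(1 + MU) / 2, MU / 2, 0.0],
        [(1 - MU) / 2, (2 - MU) / 2, (1 + MU) / 2],
        [0.0, 0.0, (1 - MU) / 2]],
}

# Phi(0) = (phi(0), phi(1), phi(2)): fixed by T0, entries sum to 1.
PHI_AT_0 = [0.0, MU, 1 - MU]

def matvec(A, v):
    """Plain matrix-vector product for nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def phi_vector(bits):
    """(phi(x), phi(x+1), phi(x+2)) for the dyadic x = 0.b1 b2 ... bL."""
    v = PHI_AT_0
    for eps in reversed(bits):   # the innermost factor T_{bL} acts first
        v = matvec(T[eps], v)
    return v

half = phi_vector([1])           # x = 1/2
print(half)
```

Each additional bit costs one $3 \times 3$ matrix-vector product, so values on the grid $2^{-L}\mathbb{Z}$ follow from products of length $L$.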
1.2 A construction of quadrature mirror filters

In Section 1.1.4 we saw that, in the case of a finite length scaling filter $H$, the integer values of its scaling function $\phi$ arise as an eigenvector of the matrix $A_{kl} = 2h_{2k-l}$. Here we propose to work in reverse, parameterizing QMF filters in terms of the possible integer values of the QMF scaling function $\varphi$. The present construction offers a different perspective and certainly a different set of techniques from the standard constructions of Daubechies [99] or Mallat [267] for designing QMFs and is motivated by applications to sampling and extrapolation of signals in wavelet subspaces. Sampling in wavelet subspaces will be considered in further detail in Chapter 3 (cf. [201]). As QMF coefficients are not arbitrary, constraints must be imposed on $\{\varphi(k)\}$ in order to ensure the existence of an orthogonal scaling function with these integer samples. Understanding these constraints will allow one to design QMFs suitable for signal extrapolation and for understanding when signals in $V(\varphi)$ can be represented efficiently in terms of their samples. The construction is based on the Zak transform.

1.2.1 The Zak transform

Here we define the Zak transform and consider some of its properties of immediate relevance. Other properties will be considered later in this book as needs arise. More details and insights, including some history and applications to signal processing, can be found in Janssen's tutorial [211]. The Zak transform has a natural place in Gabor analysis and accounts of this role are found in Daubechies' book [99] and the tutorial by Heil and Walnut [183]. Here we develop its role in the multiresolution context. Given $f$ in the Schwartz space $\mathcal{S}(\mathbb{R})$, the function
\[ Zf(x, \xi) = \sum_k f(x+k)\,e^{2\pi i k\xi} \qquad ((x, \xi) \in \mathbb{R} \times \mathbb{R}) \tag{1.25} \]
is called the Zak transform of $f$. For such an $f$, the Poisson summation formula can be expressed as $Zf(x, \xi) = e^{-2\pi i x\xi}\,Z\hat f(-\xi, x)$. The Zak transform is quasiperiodic in the sense that
\[ Zf(x+k, \xi+l) = e^{-2\pi i k\xi}\,Zf(x, \xi) \qquad (l, k \in \mathbb{Z}). \]
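Quasi-periodicity is easy to observe numerically. The sketch below is an illustration only: it truncates the sum (1.25) to $|k| \le K$ and uses the Gaussian $f(x) = e^{-\pi x^2}$ as a test function. The function name `zak`, the cutoff `K` and the sample points are arbitrary choices, not part of the original text.

```python
import cmath
import math

def zak(f, x, xi, K=40):
    """Truncated Zak transform: sum over |k| <= K of f(x + k) e^{2 pi i k xi}."""
    return sum(f(x + k) * cmath.exp(2j * math.pi * k * xi)
               for k in range(-K, K + 1))

def gaussian(x):
    return math.exp(-math.pi * x * x)

x, xi = 0.3, 0.2
shift_in_x = zak(gaussian, x + 1, xi)                       # Zf(x + 1, xi)
predicted = cmath.exp(-2j * math.pi * xi) * zak(gaussian, x, xi)
shift_in_xi = zak(gaussian, x, xi + 1)                      # Zf(x, xi + 1)
corner_value = zak(gaussian, 0.5, 0.5)                      # Gaussian: zero here
print(abs(shift_in_x - predicted), abs(shift_in_xi - zak(gaussian, x, xi)))
print(abs(corner_value))
```

For the Gaussian the terms $k$ and $-1-k$ cancel at $(x, \xi) = (1/2, 1/2)$, so the transform vanishes there.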
Consequently, the values of $Zf$ on the square $Q = [0,1) \times [0,1)$ determine the values of $Zf$ on the whole time–frequency plane and one thinks of $Q$ as the domain of $Zf$. A nontrivial consequence of quasi-periodicity (one that has well-established ramifications in Gabor theory and sampling and will play an important role in Chapter 3 as well) is that if $Zf(x, \xi)$ is continuous on $Q$ then $Zf$ has a zero in $Q$ (see [183]). The Zak transform is unitary: if $f, g \in \mathcal{S}(\mathbb{R})$, then
\[ \int f(x)\,\bar g(x)\,dx = \int_0^1\!\!\int_0^1 Zf(x, \xi)\,\overline{Zg(x, \xi)}\,dx\,d\xi. \]
Hence $Z$ extends to a unitary mapping from $L^2(\mathbb{R})$ to the space
\[ \mathcal{Z} = \Bigl\{ F : \mathbb{R}^2 \to \mathbb{C} \;:\; F(x+1, \xi+1) = e^{-2\pi i\xi}\,F(x, \xi) \ \text{ and } \int_0^1\!\!\int_0^1 |F(x, \xi)|^2\,dx\,d\xi < \infty \Bigr\}. \tag{1.26} \]
The inversion formula is particularly simple: $\int_0^1 Zf(x, \xi)\,d\xi = f(x)$ whenever the integral converges. It will certainly do so for $f \in \mathcal{S}(\mathbb{R})$; the formula extends to $f \in L^2(\mathbb{R})$ by a limiting argument.

It will be important later (in the context of sampling) and convenient now to extend the definition of the Zak transform from $\mathbb{R} \times \mathbb{R}$ to $\mathbb{R} \times \mathbb{C}$. To this end we define the complexified Zak transform $Z_{\mathbb{C}} f(x, z)$ for $x \in \mathbb{R}$, $z \in \mathbb{C}$ as the Laurent series
\[ Z_{\mathbb{C}} f(x, z) = \sum_k f(x+k)\,z^k \]
whenever the sum converges. For fixed $x$, $Z_{\mathbb{C}} f(x, z)$ is the z-transform of the sequence of samples $\{f(x+k)\}_{k \in \mathbb{Z}}$. Notice also that if $f$ has compact support, for fixed $x$ the Zak transform $Z_{\mathbb{C}} f(x, z)$ is a Laurent polynomial in $z$. Whenever it is clear from context, we omit the subscript $\mathbb{C}$ and, abusing notation, write $Zf(x, z)$ for $Z_{\mathbb{C}} f(x, z)$. As such, when $z = e^{2\pi i\xi}$ is restricted to the unit circle, one recovers the previous definition of the Zak transform, i.e., $Zf(x, z) = Z_{\mathbb{C}} f(x, e^{2\pi i\xi}) \equiv Zf(x, \xi)$.

1.2.2 Scaling functions in the Zak domain

Because of the fundamental role that the z-transforms of the integer values of scaling function $\varphi$ and wavelet $\psi$ will play, we denote in this section $\Phi(z) = Z\varphi(0, z)$ and $\Psi(z) = Z\psi(0, z)$ where $\varphi$ and $\psi$ are the scaling function and wavelet generated by a QMF. Under suitable a priori conditions on the integer samples of $\varphi$, its QMF can in fact be defined in terms of these samples.

Let $\varphi$ be a scaling function for an MRA of $L^2(\mathbb{R})$. In the Zak domain, the dilation and wavelet equations satisfied by $\varphi$ and its wavelet $\psi$ may be written
\[
\begin{aligned}
Z_{\mathbb{C}}\varphi(x, z^2) &= H(z)\,Z_{\mathbb{C}}\varphi(2x, z) + H(-z)\,Z_{\mathbb{C}}\varphi(2x, -z),\\
Z_{\mathbb{C}}\psi(x, z^2) &= G(z)\,Z_{\mathbb{C}}\varphi(2x, z) + G(-z)\,Z_{\mathbb{C}}\varphi(2x, -z),
\end{aligned}
\]
where $H$ is the QMF associated to $\varphi$ and $G$ is a conjugate filter (see [212]). We make frequent use of the involution $z^* = 1/\bar z$. Given a function $F$ defined on a region of the complex plane, we also define $F^*(z) = \bar F(z^*)$. This should not be confused with the conjugate transpose for matrices, which will use the same $*$-notation. When $F(z) = \sum_k c_k z^k$, $F^*(z) = \sum_k \bar c_k z^{-k}$. A common choice for the conjugate filter is $G(z) = -z^{2Q+1} H^*(-z)$ for some integer $Q$ whose role is to center $\psi$, as will be clear in a few examples below. With this choice, the fundamental scaling and wavelet equations can be expressed in matrix form:
\[ \begin{bmatrix} Z_{\mathbb{C}}\varphi(x, z^2)\\ Z_{\mathbb{C}}\psi(x, z^2) \end{bmatrix} = M(z) \begin{bmatrix} Z_{\mathbb{C}}\varphi(2x, z)\\ Z_{\mathbb{C}}\varphi(2x, -z) \end{bmatrix} \tag{1.27} \]
with $M(z)$ the $2 \times 2$ matrix
\[ M(z) = \begin{bmatrix} H(z) & H(-z)\\ -z^{2Q+1} H^*(-z) & z^{2Q+1} H^*(z) \end{bmatrix}. \tag{1.28} \]
Let $u, v, w : \mathbb{C} \to \mathbb{C}^2$ be defined by
\[ u(z) = \begin{bmatrix} H(z)\\ H(-z) \end{bmatrix}, \qquad v(z) = \begin{bmatrix} \Phi(z)\\ \Phi(-z) \end{bmatrix}, \qquad w(z) = \begin{bmatrix} \Phi(z^2)\\ \Psi(z^2) \end{bmatrix}. \]
Putting $x = 0$ in (1.27) then gives
\[ w(z) = M(z)\,v(z). \tag{1.29} \]
$M(z)$ is not necessarily unitary off the unit circle, but it is invertible since the QMF condition on $H$ extends to $\mathbb{C} \setminus \{0\}$ as
\[ H(z)\,H^*(z) + H(-z)\,H^*(-z) \equiv 1. \tag{1.30} \]
Consequently, $\det M(z) = z^{2Q+1}$ and
\[ M(z)\,\tilde M(z) = I_2 \tag{1.31} \]
where $\tilde M(z) = (M(z^*))^*$ is the paraconjugate of $M(z)$, i.e., the conjugate transpose of $M(z^*)$. Matrix-valued functions $M(z)$ satisfying (1.31) are said to be paraunitary in that they are invertible extensions of matrices that are unitary on the unit circle. With $\langle \cdot, \cdot \rangle$ now representing the usual complex inner product on $\mathbb{C}^2$, and with $c(z) = \Phi(z^2)\,\Phi^*(z^2) + \Psi(z^2)\,\Psi^*(z^2)$, (1.29) gives
\[
\begin{aligned}
c(z) &= \langle w(z), w(z^*) \rangle\\
&= \langle M(z)\,v(z), M(z^*)\,v(z^*) \rangle\\
&= \langle v(z), M(z)^* M(z^*)\,v(z^*) \rangle\\
&= \langle v(z), v(z^*) \rangle = \Phi(z)\,\Phi^*(z) + \Phi(-z)\,\Phi^*(-z)
\end{aligned}
\tag{1.32}
\]
by the paraunitarity of $M(z)$. Notice also that equation (1.29) may be rewritten in the form
\[ w(z) = Q(z)\,u(z) \tag{1.33} \]
where
\[ Q(z) = \begin{bmatrix} \Phi(z) & \Phi(-z)\\ z^{-2Q-1}\,\Phi^*(-z) & -z^{-2Q-1}\,\Phi^*(z) \end{bmatrix} \]
so that $Q(z)\,\tilde Q(z) = c(z)\,I_2$. Hence, if $c(z) \ne 0$ then $Q(z)$ is invertible with inverse $Q(z)^{-1} = c(z)^{-1}\,\tilde Q(z)$ and we may multiply both sides of (1.33) by $c(z)^{-1}\,\tilde Q(z)$ to obtain
\[ u(z) = c(z)^{-1}\,\tilde Q(z)\,w(z). \]
In particular, equating the top entries in this identity yields
\[ H(z) = \frac{1}{c(z)}\bigl(\Phi^*(z)\,\Phi(z^2) + z^{2Q+1}\,\Phi(-z)\,\Psi^*(z^2)\bigr). \tag{1.34} \]
Since $c(\xi) = |\Phi(\xi)|^2 + |\Phi(\xi + 1/2)|^2$ on the unit circle ($z = e^{2\pi i\xi}$), there (1.34) becomes
\[ H(-\xi) = \frac{\Phi(2\xi)\,\bar\Phi(\xi) + e^{2\pi i(2Q+1)\xi}\,\bar\Psi(2\xi)\,\Phi(\xi + 1/2)}{|\Phi(\xi)|^2 + |\Phi(\xi + 1/2)|^2}. \]
This is the fundamental equation defining $H$ in terms of the samples of $\varphi$ and its associated wavelet.

To see that $H$, as defined by (1.34), satisfies the QMF condition (1.30), observe that by (1.32), $c(z) = c(-z) = c^*(z) = c^*(-z)$. Also, since $Q(z)\,\tilde Q(z) = c(z)\,I_2$, (1.33) yields
\[
\begin{aligned}
H(z)\,H^*(z) + H(-z)\,H^*(-z) &= \langle u(z), u(z^*) \rangle\\
&= \bigl\langle \tilde Q(z)\,w(z)/c(z),\ \tilde Q(z^*)\,w(z^*)/c(z^*) \bigr\rangle\\
&= \bigl\langle w(z),\ \tilde Q(z)^*\,\tilde Q(z^*)\,w(z^*) \bigr\rangle / c^2(z)\\
&= \langle w(z), \bar c(z)\,w(z^*) \rangle / c^2(z) = 1
\end{aligned}
\]
since $c(z) = \langle w(z), w(z^*) \rangle$. Furthermore, the operator $T_H$ given by $T_H f(z^2) = H(z)\,f(z) + H(-z)\,f(-z)$ fixes $\Phi$, since
\[
\begin{aligned}
T_H \Phi(z^2) = \langle u(z), \bar v(z) \rangle &= \langle c(z)^{-1}\,\tilde Q(z)\,w(z), \bar v(z) \rangle\\
&= c(z)^{-1}\,\langle w(z), Q(z^*)\,\bar v(z) \rangle\\
&= c(z)^{-1}\,\Bigl\langle w(z), \begin{bmatrix} \bar c(z)\\ 0 \end{bmatrix} \Bigr\rangle = \Phi(z^2).
\end{aligned}
\]

1.2.3 QMF construction algorithm

In summary, one has the following algorithm for constructing QMFs:
1. Choose a smooth $\Phi$ on the complex plane with $\Phi(1) = 1$ and such that $c(z) = \Phi(z)\,\Phi^*(z) + \Phi(-z)\,\Phi^*(-z)$ is nonvanishing in a neighborhood of $|z| = 1$.
2. Choose $\Psi$ satisfying $\Psi(z^2)\,\Psi^*(z^2) = c(z) - \Phi(z^2)\,\Phi^*(z^2)$. If $\Phi(-1) \ne 0$ then one also requires $\Psi(1) = \Phi(-1)$.
3. Define $H$ by (1.34).

These conditions only ensure that $H(0) = 1$ and $|H(\xi)|^2 + |H(\xi + 1/2)|^2 = 1$. To generate orthogonal wavelets, one still must verify Cohen's condition (Proposition 1.4.1) independently. Moreover, additional conditions must be imposed to guarantee any regularity of the resulting wavelets. Though the algorithm is quite general, design becomes nontrivial when properties such as the compact support of scaling functions are sought. Then
$\Phi$ should also satisfy certain constraints. If one can choose $c(z) = c$, a constant, and still verify $\Phi(1) = 1$, then $c = c(1) = 1 + |\Phi(-1)|^2 \ge 1$. Then, by definition of $c$, $\Psi$ should satisfy $\Psi(z)\,\Psi^*(z) = c - \Phi(z)\,\Phi^*(z) = \Phi(-z)\,\Phi^*(-z)$. Possible choices for such $\Psi$ include $\Psi(z) = \Psi^{(1)}(z) = \omega z^N \Phi(-z)$ or $\Psi(z) = \Psi^{(2)}(z) = \omega z^N \Phi^*(-z)$ for some integer $N$ and $\omega \in \mathbb{C}$ with $|\omega| = 1$. If $\Phi(-1) \ne 0$, then the compatibility condition becomes $\omega\,\Phi(-1) = \Phi(-1)$, i.e., $\omega = 1$ for $\Psi = \Psi^{(1)}$, and $\omega\,\bar\Phi(-1) = \Phi(-1)$, i.e., $\omega = \Phi(-1)/\bar\Phi(-1)$ for $\Psi = \Psi^{(2)}$. In either case, $H$ can be defined by
\[ H(z) = \frac{\Phi(z^2)\,\Phi^*(z) + z^{2Q+1}\,\Psi^*(z^2)\,\Phi(-z)}{1 + |\Phi(-1)|^2}. \tag{1.35} \]

1.2.4 Constraints on samples imposed by QMFs

For a concrete construction and parameterization, let $\Phi$ be the trigonometric polynomial $\Phi(z) = \sum_{k=1}^{M-1} a_k z^k$. The condition $\Phi(1) = 1$ translates to
\[ \sum_{k=1}^{M-1} a_k = 1 \tag{1.36} \]
on the coefficient side. The condition $c(z) = c = \text{const.}$ can be expressed as
\[ \sum_{k=1}^{M-1} a_k\,\bar a_{k-2m} = \frac{c}{2}\,\delta_m. \tag{1.37} \]
If $M = 2N+1$ is an odd integer, (1.37) imposes $M-2$ nontrivial quadratic constraints and (1.36) imposes an additional linear constraint. Thus, an admissible $\Phi$ can be identified through its coefficients with a vector $a = (a_1, a_2, \dots, a_{M-1}) \in \mathbb{C}^{M-1}$ satisfying these constraints for some $c \ge 1$. This vector contains the integer values of the scaling function. With such a $\Phi$ given, $\Psi$ must be chosen so that $\Psi(z)\,\Psi^*(z) = \Phi(-z)\,\Phi^*(-z)$ and satisfying the compatibility condition. Then $H$ is determined by (1.35).

1.2.5 Parameterization of four-coefficient systems

When $M = 3$, the family (1.37) contains just one nontrivial equation, namely $|a_1|^2 + |a_2|^2 = c/2$. From (1.36) we have $a_2 = 1 - a_1$, hence $2|a_1|^2 - 2\operatorname{Re}(a_1) + 1 - c/2 = 0$. For simplicity we assume $a_1$ and $a_2$ real. Then $2a_1^2 - 2a_1 + 1 - c/2 = 0$ and therefore $a_1 = (1 \pm \sqrt{c-1})/2$ ($c \ge 1$). Let $a_1^{(+)} = (1 + \sqrt{c-1})/2$, $a_1^{(-)} = (1 - \sqrt{c-1})/2$. Then $a_2^{(+)} = a_1^{(-)}$ and $a_2^{(-)} = a_1^{(+)}$ so that $\Phi^{(+)}(z) = a_1^{(+)} z + a_2^{(+)} z^2$ and $\Phi^{(-)}(z) = a_1^{(-)} z + a_2^{(-)} z^2$ satisfy
\[ z^3\,\Phi^{(-)}(z^{-1}) = \Phi^{(+)}(z) \]
1.2 A construction of quadrature mirror filters
29
and the construction starting from $\Phi^{(-)}$ will lead simply to a time reversal of the scaling function determined by $\Phi^{(+)}$. From now on $\Phi = \Phi^{(+)}$ and $\nu = -1/\sqrt{c-1}$. Then
\[
\Phi(z) = \Bigl(\frac{\nu-1}{2\nu}\Bigr) z + \Bigl(\frac{\nu+1}{2\nu}\Bigr) z^2 .
\]
Let $\Psi(z) = \Psi^{(1)}(z) = z^N\Phi(-z)$. Then
\[
\begin{aligned}
H(z) &= c^{-1}\bigl(\Phi(z^2)\Phi^*(z) + z^{2Q-2N+1}\,\Phi^*(-z^2)\,\Phi(-z)\bigr) \\
&= c^{-1}\bigl((a_1 z^2 + a_2 z^4)(a_1 z^{-1} + a_2 z^{-2}) + z^{2Q-2N+1}(a_2 z^{-4} - a_1 z^{-2})(a_2 z^2 - a_1 z)\bigr)
\end{aligned}
\]
which is a polynomial for $N = Q - 1$. With this value of $N$, $H$ becomes
\[
H(z) = \frac{1}{c}\bigl((a_1^2 + a_2^2)z + (a_1^2 + a_2^2)z^2\bigr) = \frac{1}{2}(z + z^2),
\]
which produces a translation of the Haar scaling function. On the other hand, if $\Psi^{(2)}(z) = z^N\Phi^*(-z)$,
\[
H(z) = \frac{1}{c}\Bigl((a_1 z^2 + a_2 z^4)\Bigl(\frac{a_1}{z} + \frac{a_2}{z^2}\Bigr) + z^{2Q-2N+1}(a_2 z^4 - a_1 z^2)(a_2 z^2 - a_1 z)\Bigr)
\]
which is a polynomial of degree 3 if $N = Q + 2$. With this value of $N$, $H$ becomes
\[
\begin{aligned}
H(z) &= \frac{1}{c}\bigl(a_1(a_1 + a_2) + a_1(a_1 - a_2)z + a_2(a_2 - a_1)z^2 + a_2(a_1 + a_2)z^3\bigr) \\
&= \frac{1}{2(\nu^2 + 1)}\bigl(\nu(\nu - 1) + (1 - \nu)z + (\nu + 1)z^2 + \nu(\nu + 1)z^3\bigr).
\end{aligned} \tag{1.38}
\]
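As a quick numerical sanity check of (1.38), the following Python sketch builds the four coefficients for a given $\nu$, evaluates $H$ at $z = e^{-2\pi i\xi}$, and tests the QMF identity $|H(\xi)|^2 + |H(\xi + 1/2)|^2 = 1$. The function names `four_tap_filter` and `H_eval` are illustrative only, and the choice $\nu = -1/\sqrt{3}$ is used here because it reproduces the classical Daubechies coefficients $(1+\sqrt3,\ 3+\sqrt3,\ 3-\sqrt3,\ 1-\sqrt3)/8$.

```python
import numpy as np

def four_tap_filter(nu):
    """Coefficients (h0, h1, h2, h3) of H(z) = sum_k h_k z^k taken from (1.38)."""
    return np.array([nu * (nu - 1), 1 - nu, nu + 1, nu * (nu + 1)]) / (2 * (nu**2 + 1))

def H_eval(h, xi):
    """Evaluate H at z = exp(-2*pi*i*xi)."""
    k = np.arange(len(h))
    return np.sum(h * np.exp(-2j * np.pi * k * xi))

def qmf_defect(h, n_points=17):
    """Largest deviation of |H(xi)|^2 + |H(xi + 1/2)|^2 from 1 on a grid in [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_points)
    return max(abs(abs(H_eval(h, x))**2 + abs(H_eval(h, x + 0.5))**2 - 1) for x in grid)

s3 = np.sqrt(3.0)
d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8   # Daubechies 4-tap filter, sum = 1
h = four_tap_filter(-1 / s3)
```

Other negative values of `nu` can be passed to `four_tap_filter` in the same way; the QMF identity holds for each of them because the coefficients satisfy $h_0h_2 + h_1h_3 = 0$ and $\sum_k h_k^2 = 1/2$.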
As $\nu$ ranges over $(-1, 0)$ one recovers the Daubechies 4-tap QMFs (cf. [358]).

1.2.6 Cardinal scaling functions

These ideas furnish a simple proof of the nonexistence of orthogonal, continuous, compactly supported cardinal scaling functions (cf. Xia and Zhang [368]). Recall that a continuous function $f$ on $\mathbb{R}$ is cardinal if $f(k) = \delta_k$ ($k \in \mathbb{Z}$).

Theorem 1.2.1. Suppose that $\varphi$ is a compactly supported orthogonal cardinal scaling function. Then $\varphi$ must be the Haar function or its time reversal.

Proof. If $\varphi$ is cardinal, then $\Phi(z) \equiv 1$, $c(z) \equiv 2$ and by (1.32), $\Psi$ must satisfy $\Psi(z)\Psi^*(z) \equiv 1$. Since $\Psi$ is a trigonometric polynomial, it must be of the form $\Psi(z) = \lambda z^P$ for some $\lambda \in \mathbb{C}$ with $|\lambda| = 1$ and integer $P$. The compatibility condition $\Psi(1) = \Phi(-1)$ requires $\lambda = 1$ so that $\Psi(z) = z^P$. From (1.35) we have $H(z) = (1 + z^{2R+1})/2$ for some integer $R$ and when $z = e^{-2\pi i\xi}$,
\[
H(\xi) = \frac{1}{2}\bigl(1 + e^{-2\pi i(2R+1)\xi}\bigr) = e^{-\pi i(2R+1)\xi}\cos\bigl(\pi(2R+1)\xi\bigr) = H_{\mathrm{Haar}}((2R+1)\xi)
\]
where $H_{\mathrm{Haar}}(\xi) = e^{-\pi i\xi}\cos\pi\xi$ is the QMF associated to the Haar scaling function $\varphi_{\mathrm{Haar}} = \chi_{[0,1]}$. Then
\[
\hat\varphi(\xi) = \prod_{j=1}^{\infty} H_{\mathrm{Haar}}\Bigl(\frac{(2R+1)\xi}{2^j}\Bigr) = \hat\varphi_{\mathrm{Haar}}((2R+1)\xi).
\]
In particular, $\varphi(t) = \varphi_{\mathrm{Haar}}(t/(2R+1))/|2R+1|$ is not continuous at 0. If $\varphi$ is cardinal in the sense of one-sided limits, then $\lim_{t\to 0^+}\varphi(t) = 1$ or $\lim_{t\to 0^-}\varphi(t) = 1$, and this requires $1/|2R+1| = 1$. Hence $R = 0$ or $R = -1$. If $R = 0$ then $\varphi = \varphi_{\mathrm{Haar}}$ while if $R = -1$ then $\varphi(t) = \varphi_{\mathrm{Haar}}(-t)$.
1.3 Computing the scaling function

The cascade algorithm generates approximate values of the scaling function $\varphi$ at the dyadic rationals $D$. Exact values can be computed by the scaling relation (1.1): the values $\varphi(k)$ solve $\varphi(k) = \sum_l h_l\,\varphi(2k - l)$ and values at dyadic rationals $k/2^j$ are then computed by iterating (1.1). Exact values of $\varphi$ along $D$ can also be obtained using Fourier/Zak transform techniques, as we will see here. In Chapter 3 this approach will be put to use in verifying sampling formulas in $V(\varphi)$.

The integer samples of the scaling function $\varphi$ induced from $\Phi(\xi)$ as in the QMF construction algorithm above are just the Fourier coefficients of $\Phi$, that is, $\Phi(\xi) = \sum_k \varphi(k)\,e^{2\pi ik\xi} = Z\varphi(0, \xi)$. Let $H$ be the QMF associated with $\Phi$ by (1.35). Let $E$ denote the operation of multiplication by $e^{2\pi i\xi}$ and $T$ the operator
\[
Tf(\xi) = H(\xi/2)\,f(\xi/2) + H(\xi/2 + 1/2)\,f(\xi/2 + 1/2)
\]
acting on $L^2([0,1])$. By (1.27), $\varphi$ must satisfy $Z\varphi(x, \xi) = T(Z\varphi(2x, \cdot))(\xi)$. By quasiperiodicity, $Z\varphi(l, \xi) = E^{-l}Z\varphi(0, \xi) = E^{-l}\Phi(\xi)$ ($l \in \mathbb{Z}$). Combining and iterating these facts and the reconstruction formula $\varphi(x) = \int_0^1 Z\varphi(x, \xi)\,d\xi$, one must then have
\[
\varphi\Bigl(\frac{l}{2^J} + k\Bigr) = \int_0^1 E^{-k}\,T^J E^{-l}\Phi(\xi)\,d\xi .
\]
Suppose now that $\varphi$ is supported on $[0, M]$. Recall that if $A(\xi) = \sum_{j=1-M}^{M-1} a_j e^{2\pi ij\xi}$, then
\[
a_0 = \int_0^1 A(\xi)\,d\xi = \frac{1}{M}\sum_{k=0}^{M-1} A\Bigl(\frac{k}{M}\Bigr). \tag{1.39}
\]
If $0 \leq l \leq 2^J - 1$ then $Z\varphi(l/2^J, \xi)$ is a forward trigonometric polynomial of degree at most $M - 1$. It follows that, for each $0 \leq k \leq M - 1$, $e^{-2\pi ik\xi}Z\varphi(l/2^J, \xi) = \sum_{j=1-M}^{M-1} b_j e^{2\pi ij\xi}$ for some constants $b_j$ and we may use the quadrature formula (1.39) to obtain
\[
\begin{aligned}
\varphi\Bigl(\frac{l}{2^J} + k\Bigr) &= \frac{1}{M}\sum_{j=0}^{M-1} e^{-2\pi ijk/M}\,Z\varphi\Bigl(\frac{l}{2^J}, \frac{j}{M}\Bigr) \\
&= \frac{1}{M}\sum_{j=0}^{M-1} e^{-2\pi ijk/M}\,T^J E^{-l}\Phi\Bigl(\frac{j}{M}\Bigr) \\
&= \frac{1}{M}\sum_{j=0}^{M-1} e^{-2\pi ijk/M}\sum_{n=0}^{2^J-1} e^{-2\pi iln/2^J}\,F(j, n)
\end{aligned} \tag{1.40}
\]
where $F$ is the $M \times 2^J$ matrix with $(j, n)$-th entry
\[
F(j, n) = \Phi\Bigl(\frac{j + Mn}{M2^J}\Bigr)\prod_{p=1}^{J} H\Bigl(\frac{j + Mn}{M2^p}\Bigr) \qquad (0 \leq j \leq M - 1,\ 0 \leq n \leq 2^J - 1).
\]
Hence, values of $\varphi$ at dyadic rationals may be computed from (1.40) by
\[
\varphi\Bigl(\frac{l}{2^J} + k\Bigr) = \sqrt{\frac{2^J}{M}}\;\mathcal{F}_M^{(1)}\mathcal{F}_{2^J}^{(2)}F(k, l) \tag{1.41}
\]
where $\mathcal{F}_N^{(i)}$ denotes the $N$-point discrete Fourier transform in the $i$th variable.

In summary: if the sequence $a_l$ satisfies some mild constraints, then one can construct a scaling function $\varphi$ such that $\varphi(l) = a_l$. The QMF $H$ of $\varphi$ is obtained by passing through the Zak transform. While defining a scaling function $\varphi$ in terms of its integer samples has useful applications that we will see in Chapter 3, the previous considerations leave open one important problem, namely the possible regularity of a scaling function $\varphi$ when it is defined in terms of its integer values as in Section 1.2.3.
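As a rough illustration of the first approach mentioned in this section (solve for the integer values, then iterate the scaling relation), the Python sketch below computes exact dyadic samples for the Daubechies 4-tap filter. It assumes the normalization $\sum_k h_k = 1$, under which the scaling relation reads $\varphi(x) = 2\sum_k h_k\,\varphi(2x - k)$; if (1.1) is normalized differently, the factor 2 must be adjusted. The function names are illustrative and do not come from the text.

```python
import numpy as np

def integer_values(h):
    """phi(0), ..., phi(L) from the eigenvalue-1 problem phi(k) = 2 * sum_l h[l] * phi(2k - l)."""
    L = len(h) - 1
    A = np.zeros((L + 1, L + 1))
    for k in range(L + 1):
        for m in range(L + 1):
            l = 2 * k - m
            if 0 <= l <= L:
                A[k, m] = 2 * h[l]
    w, v = np.linalg.eig(A)
    phi = np.real(v[:, np.argmin(np.abs(w - 1))])
    return phi / phi.sum()              # normalize so that sum_k phi(k) = 1

def dyadic_values(h, J):
    """Values of phi on the grid {m / 2**J : 0 <= m <= L * 2**J}."""
    L = len(h) - 1
    phi = integer_values(h)
    for j in range(J):
        new = np.zeros(L * 2 ** (j + 1) + 1)
        for m in range(len(new)):
            for k in range(L + 1):
                i = m - k * 2 ** j      # position of 2x - k on the coarser grid
                if 0 <= i < len(phi):
                    new[m] += 2 * h[k] * phi[i]
        phi = new
    return phi

s3 = np.sqrt(3.0)
h_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8
phi_half = dyadic_values(h_d4, 1)       # samples at 0, 1/2, 1, 3/2, 2, 5/2, 3
```

Each refinement level reuses only values already computed on the coarser grid, so the samples are exact up to rounding, in contrast with the approximate values produced by the cascade algorithm.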
1.4 Notes

Frame multiresolution analysis. Lawton [251] made an initial investigation of QMF wavelets of the type of the stretched Haar filter that give rise to frames for $L^2(\mathbb{R})$. The theory was developed further by Benedetto and Li [41] in terms of the overlap function $\Phi(\xi) = \sum_k |\hat\varphi(\xi + k)|^2$. The shifts of $\varphi$ form a frame for $V(\varphi)$ precisely if $\Phi(\xi)$ is bounded and essentially bounded below on the set on which it does not vanish. In the case of an orthogonal scaling function, of course, one has $\Phi \equiv 1$. MRAs associated with such frames are studied in [44].

Basic properties of wavelets. In her book [99], Daubechies pointed out several basic problems in characterizing relationships between QMFs and wavelets on the one hand and wavelets and MRAs on the other. The latter was already settled by 1990 and Lemarié-Rieusset characterized those wavelets that have an associated MRA by the simple condition $\sum_{j=1}^{\infty}\sum_k |\hat\psi(2^j(\xi + k))|^2 = 1$ a.e. Another basic question is: which filters $H$ satisfying $H(0) = 1$ and $|H(\xi)|^2 + |H(\xi + 1/2)|^2 = 1$ a.e. give rise to orthogonal scaling functions? We saw that, when $H$ is a trigonometric polynomial, the $\tau$-cycle condition must be satisfied. When $H$ is only assumed to be $C^1$, a necessary and sufficient condition was discovered by Cohen (cf. [189], Section 7.4), namely:

Proposition 1.4.1. Under the hypotheses above, $H$ generates an orthogonal scaling function if and only if there is a finite union $K$ of closed and bounded intervals containing 0 in its interior such that $\sum_k \chi_K(\xi + k) = 1$ a.e. and $H(\xi/2^j) \neq 0$ for all $\xi \in K$ and $j = 1, 2, \dots$.

Still, several nagging issues remained. The WUTAM consortium (e.g., [367]) was a group of waveleteers at Washington University and Texas A&M who set out to settle several basic questions about wavelets once and for all. Some of the very basic properties appear in the book of Hernández and Weiss [189]. Here are a few others. A wavelet multiplier is a function $\nu$ such that $(\nu\hat\psi)^{\vee}$ is an orthonormal wavelet whenever $\psi$ itself is an orthonormal wavelet. WUTAM characterized such $\nu$ as those unimodular functions for which $\nu(2\xi)/\nu(\xi)$ is periodic of period one. They also used such multipliers to prove that MRA wavelets are arc-connected in the sense that if $\psi_0$ and $\psi_1$ are two MRA wavelets then there is a continuous path $A(t) : [0, 1] \to W$ (with $W$ the class of MRA wavelets on the line) such that $A(0) = \psi_0$ and $A(1) = \psi_1$. A characterization of scaling filters was subsequently obtained by part of WUTAM [292].

M-band wavelets. In this chapter we considered only two-scale dilation equations. If one replaces 2 by $m$ in the scaling equation (1.1) one obtains an $m$-scale dilation equation.
In many cases it is possible to construct, but now in a less canonical way, orthogonal $m$-scale wavelet bases having $m - 1$ generators. There are some advantages to doing so. For example, Chui and Lian [78] constructed a scaling function with $m = 3$ leading to a pair of wavelets, one of which is symmetric and the other antisymmetric. In terms of subband coding, this approach corresponds to using $m$ subbands rather than two; see [332].

Convolution structure on scaling distributions. As discussed in Section 1.1.3, the $N$th order B-splines are all scaling functions and, at the same time, are $N$th order autoconvolutions of the Haar scaling function. They are, however, not orthogonal to their shifts. These observations apply to more general convolutions of scaling functions. Suppose that $\phi^{(1)}$ and $\phi^{(2)}$ are scaling distributions, meaning that they are distributional solutions of $\phi = 2\sum_k h_k\,\phi(2\cdot - k)$ in the sense that, for every test function $f$ one has
\[
\langle f, \phi\rangle = 2\Bigl\langle f, \sum_k h_k\,\phi(2\cdot - k)\Bigr\rangle .
\]
If, in addition, $\phi$ is tempered then its Fourier transform is well defined and satisfies $\hat\phi(\cdot) = H(\cdot/2)\,\hat\phi(\cdot/2)$ where $H$ is the Fourier series of $\{h_k\}$, provided this product exists. In what follows we shall assume at the very least that $\hat\phi$ is defined as a locally square-integrable function and that $H$ is locally bounded and continuous. Then the pointwise product $\hat\phi^{(1)}\hat\phi^{(2)}$ of two such functions is well defined. Suppose now that we can justify rearranging the terms of the formal product to write
\[
\hat\phi^{(1)}(\xi)\,\hat\phi^{(2)}(\xi) = \prod_{j=1}^{\infty} H^{(1)}\Bigl(\frac{\xi}{2^j}\Bigr)\prod_{j=1}^{\infty} H^{(2)}\Bigl(\frac{\xi}{2^j}\Bigr) = \prod_{j=1}^{\infty} H^{(1)}\Bigl(\frac{\xi}{2^j}\Bigr)H^{(2)}\Bigl(\frac{\xi}{2^j}\Bigr)
\]
as is the case when the individual products converge locally uniformly. In short, reasonable subclasses of scaling distributions will be closed under convolution. However, the product of two orthogonal QMFs $H^{(1)}$, $H^{(2)}$ will never itself be orthogonal since, typically, $|H^{(1)}H^{(2)}(\xi)|^2 + |H^{(1)}H^{(2)}(\xi + 1/2)|^2 < 1$ (cf. (1.3)). Nevertheless, one can still attach to the convolution $\phi = \phi^{(1)} * \phi^{(2)}$ the orthogonal scaling function $\varphi^{(1,2)}$ satisfying $\hat\varphi^{(1,2)}(\xi) = \hat\phi^{(1)}\hat\phi^{(2)}(\xi)/\sqrt{\Phi(\xi)}$ where $\Phi(\xi) = \sum_k |\hat\phi^{(1)}\hat\phi^{(2)}(\xi + k)|^2$, as was originally suggested by Aldroubi and Unser (e.g., [4, 352]). The square root is what makes the overlap function of $\varphi^{(1,2)}$ identically one.

Polyphase representation. Compactly supported orthogonal scaling functions and wavelets are associated with a Laurent series $H$ satisfying (1.30). We define $G(z) = c\,z^{2Q+1}H^*(-z)$ for some integer $Q$ and constant $c$ with $|c| = 1$. Then $(H, G)$ form a QMF pair, i.e.,
\[
H(z)\,H^*(z) + G(z)\,G^*(z) = 1 \tag{1.42}
\]
wherever the series defining $H(z)$ and $G(z)$ are defined (a set containing the unit circle). If the scaling function has compact support then $H$ and $G$ are finite polynomials and (1.42) is well defined on $\mathbb{C} \setminus \{0\}$. As we have seen in Section 1.2.2, the QMF condition (1.42) can be expressed in terms of the paraunitarity of the modulation matrix (1.28), i.e., $M(z)\tilde M(z) = I$ with $\tilde M(z)$ the paraconjugate of $M(z)$. If $H$ and $G$ are Laurent polynomials, this is equivalent to the unitarity of $M(z)$ for all $z \in \mathbb{T}$. Any Laurent series $P(z) = \sum_n p_n z^n$ can be expressed in polyphase form: $P(z) = p_e(z^2) + z\,p_o(z^2)$ where $p_e(z) = \sum_n p_{2n}z^n$ and $p_o(z) = \sum_n p_{2n+1}z^n$. Given a QMF pair $L = (H, G)$, one associates the polyphase matrix
\[
P_L(z) = \sqrt{2}\begin{bmatrix} h_e(z) & g_e(z) \\ h_o(z) & g_o(z) \end{bmatrix}.
\]
For the Haar QMF, $H(z) = (z + 1)/2$ and $G(z) = (z - 1)/2$, the polyphase matrix simply takes the form $P_{\mathrm{Haar}}(z) = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}/\sqrt{2}$.
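The polyphase construction can be sketched in a few lines of Python. The example below splits coefficient sequences into even and odd parts, assembles $\sqrt{2}\,[h_e\ g_e;\ h_o\ g_o]$, and checks numerically that the result is unitary at a point of the unit circle. It assumes filters of the form $\sum_{k\ge 0} c_k z^k$ with real coefficients, and for the 4-tap example it takes $c = 1$, $Q = 1$ in $G(z) = c\,z^{2Q+1}H^*(-z)$, which gives $g_{3-k} = (-1)^k h_k$; these choices and the function name `polyphase_matrix` are illustrative assumptions.

```python
import numpy as np

def polyphase_matrix(h, g, z):
    """sqrt(2) * [[h_e(z), g_e(z)], [h_o(z), g_o(z)]] for filters sum_k c[k] z^k, k >= 0."""
    def even_odd(c):
        c = np.asarray(c, dtype=complex)
        pe = sum(c[2 * n] * z**n for n in range((len(c) + 1) // 2))
        po = sum(c[2 * n + 1] * z**n for n in range(len(c) // 2))
        return pe, po
    he, ho = even_odd(h)
    ge, go = even_odd(g)
    return np.sqrt(2) * np.array([[he, ge], [ho, go]])

# Haar pair H(z) = (z + 1)/2, G(z) = (z - 1)/2
P_haar = polyphase_matrix([0.5, 0.5], [-0.5, 0.5], 1.0)

# Daubechies 4-tap pair with G(z) = z^3 H*(-z) (assumed choice c = 1, Q = 1)
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8
g = np.array([(-1) ** k * h[k] for k in range(4)])[::-1]
z0 = np.exp(2j * np.pi * 0.13)          # an arbitrary point on the unit circle
P_d4 = polyphase_matrix(h, g, z0)
```

Evaluating at several points `z0` on the unit circle gives the same unitarity, in line with the equivalence between paraunitarity and unitarity on $\mathbb{T}$ for Laurent polynomials.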
Pollen's parameterization. Pollen [299] revealed a group structure on polyphase matrices that leads to a unique factorization of orthogonal QMF pairs into so-called elementary factors. This should be contrasted with lifting (outlined below), which emphasizes factorization of biorthogonal filters. What follows is a brief outline of Pollen's program. Let $U(2, [z, 1/z])$ be the group of paraunitary $2 \times 2$ matrices $A(z)$ whose entries are Laurent polynomials. The modulation matrices $M(z)$ of (1.28) are in $U(2, [z, 1/z])$. The key to the factorization is the construction of a mapping that assigns the identity matrix to a nontrivial QMF. Define a mapping $\Delta$ on $2 \times 2$ matrices by $\Delta(A) = P_{\mathrm{Haar}}A^T$. Since $P_{\mathrm{Haar}}^T P_{\mathrm{Haar}} = I_2$, $\Delta$ maps $U(2, [z, 1/z])$ to itself and $\Delta^{-1}(A) = A^T P_{\mathrm{Haar}}$. Also, $\Delta$ acts on the polyphase matrix $P_L(z)$ of a QMF pair $L = (H, G)$ by
\[
\Delta(P_L) = P_{\mathrm{Haar}}^T P_L = \begin{bmatrix} h_e(z) + h_o(z) & g_e(z) + g_o(z) \\ h_o(z) - h_e(z) & g_o(z) - g_e(z) \end{bmatrix}.
\]
Pollen defined the (Pollen) product $K_1 \sharp K_2$ of two QMF pairs $K_1 = (H_1, G_1)$ and $K_2 = (H_2, G_2)$ via the polyphase matrix
\[
P_{K_1 \sharp K_2}(z) = \Delta^{-1}\bigl(\Delta(P_{K_1}(z))\,\Delta(P_{K_2}(z))\bigr) = P_{K_2}(z)\,P_{\mathrm{Haar}}^T\,P_{K_1}(z).
\]
As $P_{K_1 \sharp K_2} \in U(2, [z, 1/z])$, the product $\sharp$ defines a group multiplication on $U(2, [z, 1/z])$. In this group, the Haar QMF acts via $P_{\mathrm{Haar}}$ as group identity. Furthermore, if $H_i(z) = h_e^i(z^2) + z\,h_o^i(z^2)$, $G_i(z) = g_e^i(z^2) + z\,g_o^i(z^2)$ ($i = 1, 2$) then $K_3 = K_1 \sharp K_2$ is defined through its polyphase terms
\[
\begin{aligned}
h_e^3 &= h_e^1(h_e^2 - g_e^2) + h_o^1(h_e^2 + g_e^2), & h_o^3 &= h_e^1(h_o^2 - g_o^2) + h_o^1(h_o^2 + g_o^2), \\
g_e^3 &= g_e^1(h_e^2 - g_e^2) + g_o^1(h_e^2 + g_e^2), & g_o^3 &= g_e^1(h_o^2 - g_o^2) + g_o^1(h_o^2 + g_o^2).
\end{aligned}
\]
Pollen inverses have the form $P_{K^{-1}}(z) = P_{\mathrm{Haar}}\tilde P_K(z)P_{\mathrm{Haar}}$ with $\tilde P_K(z)$ the paraconjugate of $P_K(z)$. In fact, given any QMF $L = (H, G)$ one could define a Pollen product $\sharp_L$ on $U(2, [z, 1/z])$ by putting $\Delta_L(A)(z) = \tilde P_L(z)A(z)^T$ and $P_{A \sharp_L B}(z) = \Delta_L^{-1}(\Delta_L(A)\Delta_L(B)) = B(z)\,\tilde P_L^T(z)\,A(z)$. Choosing $L$ to be the Haar QMF though is important for building wavelets from a set of minimal factors.

Pollen's factorization of $SU(2, [z, 1/z])$. Finite-degree QMFs are special pairs of Laurent polynomials. Pollen discovered a basic set of building blocks, referred to as factors, such that any QMF has a unique factorization under Pollen's product. These factors have minimal degree. Specifically, if $p(z) = \sum_{m=-l}^{k} a_m z^m$, one defines $d_{\max}(p) = \max(k, l)$. The degree function $d_{\max}$ can be extended to matrices $A(z, 1/z)$ by setting $d_{\max}(A) = \max_{i,j} d_{\max}(a_{ij}(z, 1/z))$.

Proposition 1.4.2. Every $P \in U(2, [z, 1/z])$ can be factored in the form $P = ABS$ such that $A$ is a scalar matrix in $U(2)$, $B$ has the form $B = \begin{bmatrix} 1 & 0 \\ 0 & z^k \end{bmatrix}$ for some integer $k$, and $S \in SU_I(2, [z, 1/z])$.
Here $SU(2, [z, 1/z])$ denotes the subgroup of $U(2, [z, 1/z])$ consisting of paraunitary matrices $A(z)$ of determinant 1 and $SU_I(2, [z, 1/z])$ is the subgroup of $SU(2, [z, 1/z])$ satisfying $A(1) = I$. To factorize $SU_I(2, [z, 1/z])$, Pollen defined a set $F$ of basic factors $X = \begin{bmatrix} p(z) & q(z) \\ -q^*(z) & p^*(z) \end{bmatrix}$ such that $p(z) = a + cz$, $q(z) = b + dz$ subject to the constraint that $X \in SU_I(2, [z, 1/z])$. The set $F$ itself does not contain all elements of $SU_I(2, [z, 1/z])$ having Laurent degree $d_{\max} = 1$. In fact, $F^* \cap F = \{I\}$, but it turns out that the mapping $(X, Y) \mapsto XY^*$ maps $F \times F$ onto the subset of $SU_I(2, [z, 1/z])$ consisting of those matrices of degree at most one. Pollen's unique factorization theorem is as follows.

Theorem 1.4.3. Any $P \in SU_I(2, [z, 1/z])$ of Laurent degree $m$ can be written in the form $P = \prod_{k=1}^{m} A_k^* B_k$ in which $A_k, B_k \in F$. Moreover this factorization is unique when considering $P$ as a product of elements of $F$.

The theorem is proved constructively by a series of lemmas providing conditions for pulling off factors of specific types. Existence and uniqueness are both proved by induction on the Laurent degree of $P$. We refer to [299] for its proof.

More on parameterizing wavelets. Scaling sequences form a subset of the unit ball in $\ell^2(\mathbb{Z})$. As such they form a metric space under the induced metric; however, this metric is not necessarily invariant under any transformation such as Pollen multiplication. To find a suitable metric, take the angle in $\ell^2$ between two scaling sequences $h = \{h_k\}$ and $k = \{k_l\}$, that is, $\theta = \arccos(\langle h, k\rangle)$. One then defines the new distance to be $\sqrt{2}\,|\sin(\theta/2)|$. It can be shown that Pollen multiplication is an isometry on the set of QMF sequences under this metric [361]. This result is of a different nature from the characterizations of wavelets by the WUTAM consortium.

Wavelet systems are not ideal. The group structure arising from Pollen's product is sometimes called the wavelet group.
It would be better termed the MRA group in the sense of Benedetto and Li's frame MRAs [41], since the product does not preserve Cohen's $\tau$-cycle condition (see Theorem 1.1.1). To give a simple example, the stretched Haar filter $K(z) = (1 + z^3)/2$ has Pollen square $K \sharp K = (z^{-2} - z + z^2 + 2z^3 + z^5)/4$. The quantities $|K|^2$ and $|K \sharp K|^2$ are plotted in Figure 1.1. Clearly the $\tau$-cycle condition fails for $K$ but is satisfied by $K \sharp K$.

[Fig. 1.1. Plots of $|K|^2$ (panel "stretched Haar filter") and $|K \sharp K|^2$ (panel "Pollen square of stretched Haar"); horizontal axis from −0.5 to 0.5, vertical axis from 0 to 1.]

Pollen's factorization: $m \times m$ case. Pollen's factorization has been considered in the case of integer dilations by Heller et al. [185], who also proved that the so-called $m$-band orthogonal wavelets form a group under the appropriate analogue of Pollen's product. Their work was extended further by Kautsky and Turcajova [226] who addressed the problem of factoring $SL(m, [z, 1/z])$. Pollen's factorization takes advantage of certain properties of $2 \times 2$ matrices that do not extend to larger matrices. In particular, when $m > 2$ the analog of Pollen's factorization no longer produces unique factorization. In [226], a quasi-canonical factorization was considered. It was extended to the biorthogonal case by Resnikoff et al. [305]. One calls a pair $(C(z), D(z))$ a factorization of $I$ provided (i) both $C(z)$ and $D(z)$ are matrix polynomials (not Laurent polynomials!), (ii) $C(1) = I = D(1)$ and (iii) $C(z)(D(\bar z))^* = I$. The quasi-canonical factorization states that a pair of polyphase matrices $(L, R)$ forms a biorthogonal matrix pair of rank $m$ if and only if there exist primitive matrices $V_1, \dots, V_d$, a factorization $(C(z), D(z))$ of $I$ and a $G$ in the group $GL(m - 1)$ of invertible $(m - 1) \times (m - 1)$ matrices, such that
\[
L(z) = z^{-k_0}\,V_1(z)\cdots V_d(z)\,C(z)\begin{bmatrix} 1 & 0 \\ 0 & G \end{bmatrix} H, \quad\text{and}\quad
R(z) = z^{-k_0}\,V_1(z)\cdots V_d(z)\,D(z)\begin{bmatrix} 1 & 0 \\ 0 & (G^{-1})^* \end{bmatrix} H.
\]
Here $H$ denotes a canonical Haar matrix of size $m - 1$. Assuming that $\det L(z) = cz^{-b}$, one has $d = b - mk_0$ while the degree of $C(z)$ is at most the maximal degree of the polyphase components of $L$.

Euclidean algorithm for Laurent polynomials. Pollen's parameterization of QMF wavelets relied on a unique factorization of $SU(2, [z, 1/z])$. A second type of factorization of Laurent series, which is not unique, leads to a natural and rapidly implementable construction of biorthogonal wavelets.
The Laurent sum degree of $q(z) = \sum_{k=M}^{N} c_k z^k \in \mathbb{C}[z, 1/z]$ (with $c_M c_N \neq 0$) is defined as $d_{\mathrm{sum}}(q) = N - M$. Then $d_{\mathrm{sum}}$ is a homomorphism from the multiplicative group $\mathbb{C}[z, 1/z]$ to the integers. Henceforth we will write $D(p) = d_{\mathrm{sum}}(p)$. Just as in the case of ordinary polynomials, given $a, b \in \mathbb{C}[z, 1/z]$ with $D(a) \geq D(b)$, one has the Euclidean algorithm factorization
\[
a(z) = b(z)\,q(z) + r(z), \qquad D(b) + D(q) = D(a), \qquad D(r) < D(b).
\]
However, in contrast with $\mathbb{C}[z]$, such a factorization is not necessarily unique. For example, $z + z^{-1} = (z + 1)(1 + z^{-1}) - 2 = (2z + 1)(1/2 + z^{-1}) - 5/2$. This nonuniqueness actually provides some freedom in defining lifting steps to pass from lower-degree filters to higher-degree filters.

Theorem 1.4.4. (Euclidean algorithm) Let $a, b$ be Laurent polynomials with $D(a) \geq D(b)$. Starting with $a = a_0$ and $b = b_0$, inductively define $a_{i+1} = b_i$ and $b_{i+1}$ the remainder obtained from dividing $b_i$ into $a_i$. Let $n$ be the smallest integer for which $b_n(z) = 0$. Then $a_n(z)$ is the gcd of $a$ and $b$.

If, eventually, $a_i$ is a monomial then $a, b$ are relatively prime. When $P = \begin{bmatrix} h^e & g^e \\ h^o & g^o \end{bmatrix}$ belongs to $SL(2, [z, 1/z])$, the group of matrix-valued Laurent polynomials having constant determinant one, the Laurent polynomials $h^e$ and $h^o$ are necessarily relatively prime: otherwise $\det P$ has a nonzero root. Therefore, $\gcd(h^e, h^o)$ is a monomial and in fact can be normalized to be a constant $K$ as above. After $n = D(h^e)$ steps of the Euclidean algorithm one has
\[
\begin{bmatrix} h^e \\ h^o \end{bmatrix} = \prod_{i=1}^{n}\begin{bmatrix} q_i(z) & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} K \\ 0 \end{bmatrix}.
\]
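The nonuniqueness of Laurent division can be checked mechanically. The following Python sketch represents a Laurent polynomial as a dictionary mapping exponents to exact rational coefficients and verifies both factorizations of $z + z^{-1}$ quoted above, together with the degree conditions $D(b) + D(q) = D(a)$ and $D(r) < D(b)$. The helper names (`ladd`, `lmul`, `sum_degree`) are illustrative only.

```python
from fractions import Fraction

def ladd(p, q):
    """Sum of two Laurent polynomials given as {exponent: coefficient} dictionaries."""
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}

def lmul(p, q):
    """Product of two Laurent polynomials."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}

def sum_degree(p):
    """Laurent sum degree D(p) = N - M for p = sum_{k=M}^{N} c_k z^k."""
    return max(p) - min(p)

a = {1: Fraction(1), -1: Fraction(1)}                      # z + 1/z

# First factorization: (z + 1)(1 + 1/z) - 2
b1, q1, r1 = {1: Fraction(1), 0: Fraction(1)}, {0: Fraction(1), -1: Fraction(1)}, {0: Fraction(-2)}
# Second factorization: (2z + 1)(1/2 + 1/z) - 5/2
b2, q2, r2 = {1: Fraction(2), 0: Fraction(1)}, {0: Fraction(1, 2), -1: Fraction(1)}, {0: Fraction(-5, 2)}

first = ladd(lmul(b1, q1), r1)
second = ladd(lmul(b2, q2), r2)
```

Both `first` and `second` reproduce `a` even though the quotients and remainders differ, which is exactly the freedom exploited when choosing lifting steps.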
In case $D(h^e) < D(h^o)$, take $q_1 = 0$, while if $n$ is odd, multiplying $h^e, h^o$ by $z$ and dividing $g^e, g^o$ by $z$ does not affect the determinant. Thus, $n$ may be assumed to be even. With these conventions, starting with a filter $H$ one can readily define a complementary filter $G$ in such a way that
\[
P(z) = \begin{bmatrix} h^e & g^e \\ h^o & g^o \end{bmatrix} = \prod_{i=1}^{n}\begin{bmatrix} q_i(z) & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix} \in SL(2, [z, 1/z]).
\]
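Because
\[
\begin{bmatrix} q(z) & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & q(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ q(z) & 1 \end{bmatrix},
\]
one can take advantage of the fact that $n$ is assumed even to write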
\[
P(z) = \begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix}\prod_{i=1}^{n/2}\begin{bmatrix} 1 & q_{2i-1}(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ q_{2i}(z) & 1 \end{bmatrix}. \tag{1.43}
\]
The purpose of this exercise, as we shall see, is to be able to associate readily to a filter $H$ any number of dual filters $G$. The Haar polyphase matrix gives the simplest nontrivial example of the factorization (1.43), namely,
\[
P_{\mathrm{Haar}}(z) = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & 1/\sqrt{2} \end{bmatrix}\begin{bmatrix} 1 & -1/2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.
\]
The role of the constant diagonal matrix is to give the QMF filters the right values at $z = 1$.

Complementary filters. We have observed that the Euclidean algorithm provides a means of attaching to $H$ a complementary filter $G$ such that $P(z) = \begin{bmatrix} h^e(z) & g^e(z) \\ h^o(z) & g^o(z) \end{bmatrix} \in SL(2, [z, 1/z])$. From now on, we say that the filter pair $(H, G)$ is complementary when $P \in SL(2, [z, 1/z])$. Then $G$ is said to complement $H$, and vice versa. Nonuniqueness of the factorization indicates that there is more than one way to complement $H$.

Theorem 1.4.5. If two Laurent polynomials $G_1$ and $G_2$ complement $H$, then $G_2(z) = G_1(z) + H(z)\,s(z^2)$ for some Laurent polynomial $s$. Conversely, if $G_1$ complements $H$ and $G_2(z) = G_1(z) + H(z)\,s(z^2)$, then $G_2$ complements $H$.

Proof. Suppose $G_1, G_2$ complement $H$ and let
\[
P_1(z) = \begin{bmatrix} h^e(z) & g_1^e(z) \\ h^o(z) & g_1^o(z) \end{bmatrix}, \qquad P_2(z) = \begin{bmatrix} h^e(z) & g_2^e(z) \\ h^o(z) & g_2^o(z) \end{bmatrix}
\]
be the polyphase matrices associated with the pairs $L_1 = (H, G_1)$ and $L_2 = (H, G_2)$, respectively. By considering the explicit form of $P^{-1}$ for $P \in SL(2, [z, 1/z])$, we see that $P_1^{-1}(z)P_2(z) = \begin{bmatrix} 1 & s(z) \\ 0 & 1 \end{bmatrix} = S(z)$ with $s$ the Laurent polynomial $s(z) = g_1^o(z)g_2^e(z) - g_1^e(z)g_2^o(z)$. Hence $P_2(z) = P_1(z)S(z)$ and by reading off the polyphase components of both sides we have $g_2^e(z) = h^e(z)s(z) + g_1^e(z)$, $g_2^o(z) = h^o(z)s(z) + g_1^o(z)$. Combining these polyphase terms gives $G_2(z) = g_2^e(z^2) + z\,g_2^o(z^2) = G_1(z) + H(z)\,s(z^2)$. On the other hand, if $G_1$ complements $H$ and $G_2(z) = G_1(z) + H(z)\,s(z^2)$, then the polyphase components of $G_2$ are $g_2^e(z) = h^e(z)s(z) + g_1^e(z)$ and $g_2^o(z) = h^o(z)s(z) + g_1^o(z)$. The polyphase matrix for the pair $(H, G_2)$ is then
\[
P_2(z) = \begin{bmatrix} h^e(z) & g_1^e(z) + h^e(z)\,s(z) \\ h^o(z) & g_1^o(z) + h^o(z)\,s(z) \end{bmatrix} = \begin{bmatrix} h^e(z) & g_1^e(z) \\ h^o(z) & g_1^o(z) \end{bmatrix}\begin{bmatrix} 1 & s(z) \\ 0 & 1 \end{bmatrix}
\]
from which it is clear that $P_2(z) \in SL(2, [z, 1/z])$, i.e., $G_2$ complements $H$.
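The converse direction of Theorem 1.4.5 is easy to test on an example. In the Python sketch below, Laurent polynomials are dictionaries from exponents to exact coefficients; the pair $H(z) = (1 + z)/2$, $G_1(z) = z - 1$ and the lifting polynomial $s(z) = z + 3z^{-1}$ are arbitrary illustrative choices (not taken from the text), picked so that $\det P_1 = 1$. The determinant here is that of the unscaled matrix $[h^e\ g^e;\ h^o\ g^o]$.

```python
from fractions import Fraction

def ladd(p, q):
    """Sum of two Laurent polynomials stored as {exponent: coefficient}."""
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}

def lmul(p, q):
    """Product of two Laurent polynomials."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}

def polyphase(p):
    """Return (p_e, p_o) with p(z) = p_e(z^2) + z * p_o(z^2); floor division handles negative exponents."""
    even = {k // 2: v for k, v in p.items() if k % 2 == 0}
    odd = {k // 2: v for k, v in p.items() if k % 2 != 0}
    return even, odd

def det_polyphase(H, G):
    """Determinant h_e * g_o - h_o * g_e of the polyphase matrix of the pair (H, G)."""
    he, ho = polyphase(H)
    ge, go = polyphase(G)
    minus = {k: -v for k, v in lmul(ho, ge).items()}
    return ladd(lmul(he, go), minus)

half = Fraction(1, 2)
H = {0: half, 1: half}                   # (1 + z)/2
G1 = {0: Fraction(-1), 1: Fraction(1)}   # z - 1, chosen so that det = 1
s = {1: Fraction(1), -1: Fraction(3)}    # arbitrary Laurent polynomial s(z)
s_up = {2 * k: v for k, v in s.items()}  # s(z^2)
G2 = ladd(G1, lmul(H, s_up))             # G2 = G1 + H * s(z^2)
```

Replacing `s` by any other Laurent polynomial leaves `det_polyphase(H, G2)` equal to the constant 1, which is the content of the converse statement.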
Theorem 1.4.5 is symmetric in $H$ and $G$. If two Laurent polynomials $H_1$ and $H_2$ complement $G$, then $H_2(z) = H_1(z) + G(z)\,t(z^2)$ for some Laurent polynomial $t$. Conversely, if $H_1$ complements $G$ and $H_2(z) = H_1(z) + G(z)\,t(z^2)$, then $H_2$ complements $G$. It is also worth noting that if $K_1 = (H_1, G)$ and $K_2 = (H_2, G)$ then $H_1$ and $H_2$ complement $G$ if and only if the polyphase matrices $Q_1(z)$ and $Q_2(z)$ associated to $K_1$ and $K_2$ respectively satisfy $Q_2(z) = Q_1(z)\begin{bmatrix} 1 & 0 \\ t(z) & 1 \end{bmatrix}$ for some Laurent polynomial $t(z)$.

Lifting. While paraunitarity of the polyphase matrix $P$ of $(H, G)$ identifies the pair as a QMF, invertibility of $P$ indicates an ability to complete $H$ to a perfect reconstruction filter bank. In particular, we require a polyphase matrix $Q$ such that $P(z)Q(z) = I$. The Laurent polynomials $\det P$ and $\det Q$ satisfy $\det P(z)\det Q(z) = 1$ so that $\det P$ must be a monomial $cz^k$ since it is invertible in $\mathbb{C}[z, 1/z]$. One can then renormalize $P$ and $Q$ so that they belong to $SL(2, [z, 1/z])$. The factorization technique just considered is the foundation of Sweldens' lifting scheme [109]. Here, perfect reconstruction subband filters are decomposed into (alternatively lifted from) readily implemented lifting steps.

Definition 1.4.6. A polyphase matrix $P_2 \in SL(2, [z, 1/z])$ is said to be lifted from $P_1 \in SL(2, [z, 1/z])$ via $S(z) = \begin{bmatrix} 1 & s(z) \\ 0 & 1 \end{bmatrix}$ if $P_2(z) = P_1(z)S(z)$. Similarly, $P_2 \in SL(2, [z, 1/z])$ is dual-lifted from $P_1$ via $S$ if $P_2 = P_1S^T$.

Theorem 1.4.5 tells us how to write a polyphase matrix $P$ as a product of lifting steps. That the same Laurent polynomial can be used to factor both the even and odd terms takes advantage of nonuniqueness of factorization. Upon reassembling the modulation matrix (1.28) from the polyphase matrix $P(z)$, the lifting scheme can be summarized as: given complementary finite filters $H, G$, any other finite filter $G^{\mathrm{new}}$ complementary to $H$ is obtained from $G$ by a lifting step: $G^{\mathrm{new}}(z) = G(z) + H(z)\,s(z^2)$, while any other finite filter $H^{\mathrm{new}}$ complementary to $G$ is obtained from $H$ by a dual lifting step: $H^{\mathrm{new}}(z) = H(z) + G(z)\,t(z^2)$. Starting with the Lazy wavelet defined as $P = I$, other wavelets may be built via such lifting steps, so named because these steps lift the degree of the Laurent filters in question. The Haar wavelet itself requires a nontrivial lifting step. Examples of biorthogonal wavelets, and special considerations leading to orthogonal ones through lifting, are provided in [109]. Subband coding is improved through lifting: rather than implementing the filters $H, G$ at once, their lifting steps may be implemented in series. The scheme is also adaptable to other multiscale contexts that lack the shift-invariance of the MRA spaces, e.g., [102].
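To illustrate what "implementing lifting steps in series" can look like, the Python sketch below applies the three factors of $P_{\mathrm{Haar}} = \mathrm{diag}(\sqrt2, 1/\sqrt2)\,\bigl[\begin{smallmatrix}1 & -1/2\\ 0 & 1\end{smallmatrix}\bigr]\bigl[\begin{smallmatrix}1 & 0\\ 1 & 1\end{smallmatrix}\bigr]$ one after another to pairs of even and odd samples, and inverts them in reverse order. The convention that the matrix acts on the column vector (even, odd) and the function names are assumptions made for this example; other texts arrange the polyphase matrix differently.

```python
import numpy as np

def haar_lifting_forward(x):
    """Apply P_Haar to (even, odd) sample pairs as three elementary steps."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    odd = odd + even            # factor [[1, 0], [1, 1]]
    even = even - odd / 2       # factor [[1, -1/2], [0, 1]]
    return np.sqrt(2) * even, odd / np.sqrt(2)   # factor diag(sqrt 2, 1/sqrt 2)

def haar_lifting_inverse(a, b):
    """Undo the three steps in reverse order and re-interleave the samples."""
    even, odd = a / np.sqrt(2), b * np.sqrt(2)
    even = even + odd / 2
    odd = odd - even
    x = np.empty(2 * len(even))
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([4.0, 2.0, 5.0, 9.0])
a, b = haar_lifting_forward(x)

# Direct application of the full matrix, for comparison
P_haar = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)
direct = P_haar @ np.vstack([x[0::2], x[1::2]])
```

Each step modifies only one of the two channels using the other, so every step is trivially invertible regardless of the filter coefficients; this is the practical appeal of implementing the steps in series.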
2 Derivatives and multiwavelets
This chapter addresses several aspects of wavelets that underlie their potential use in the numerical solution of partial differential equations (PDEs). We will not address the PDEs themselves, except to make a few remarks about the case of Navier–Stokes equations. Two principal issues are considered here. The first is that membership in Sobolev spaces is characterized in terms of magnitudes of wavelet coefficients across scales. The second principal issue is commutation, by which we mean existence of pairs of MRAs with base spaces $V(\phi_+)$ and $V(\phi)$ having the property that differentiation commutes with projections $P_+$ and $P$ onto $V(\phi_+)$ and $V(\phi)$, respectively, that is, $(d/dx)P_+ = P(d/dx)$. We start with a review of the so-called nonstandard representation of $d/dx$ in wavelet coordinates. This representation leads painlessly to a means of implementing the operator $d/dx$ by matrix multiplication in wavelet coordinates. Next we discuss commutation for MRAs. This is a means of associating to a given MRA a new MRA whose spaces contain derivatives of those of the first. In Section 2.1.3, we review the characterization of Sobolev norms in terms of magnitudes of wavelet coefficients. The change of basis matrix that maps coefficients in one wavelet basis to coefficients in another plays an important role here. The magnitudes of the entries of this matrix depend on the regularity of the two wavelets. Consequently, the wavelet characterization of Sobolev spaces depends only on the regularity of the mother wavelet. This observation will be extended to Besov spaces in Chapter 6. In Section 2.1.4 we prove an estimate for wavelet projections of pointwise products in Sobolev norms. Analogous estimates have been used by Federbush [134] and by Cannone and Meyer [66] in important matters of well-posedness of Navier–Stokes equations. Methods underlying product estimates also play a role in validating approximate numerical solutions of PDEs for divergence-form elliptic operators.
One of the early concerns about wavelets was the difficulty in adapting them to bounded domains. There are now several ways to do so—each with its tradeoffs. For particularly simple domains one can take advantage of symmetry properties, but to get symmetry one must work either with biorthogonal wavelets whose supports can be a problem, or with multiwavelets, which
exhibit a better tradeoff between regularity, support length and symmetry. In Sections 2.2 and 2.3 we study what are, perhaps, the two most familiar families of multiwavelets, namely piecewise polynomial multiwavelets, introduced as such by Alpert (e.g., [6]), and the famous DGHM wavelets [123] developed at Georgia Tech by Donovan, Geronimo, Hardin and Massopust. The Alpert wavelets are conceptually simpler, but they suffer the drawback of being discontinuous. This makes convergence of wavelet expansions in Sobolev norms difficult to analyze. A good illustration of this can be found in Chapter 7 where Fefferman and Phong's eigenvalue estimates for Schrödinger operators [136] are reviewed: they had to work hard to prove crucial Sobolev estimates for expansions in piecewise linear multiwavelets. Strela [333] developed criteria for desirable properties of multiwavelets, such as symmetry, when transforming one multi-MRA into another. We explore these criteria in the context of commutation in the DGHM setting. The ideas are then used to establish characterizations of Sobolev norms on the half-line in terms of DGHM wavelet coefficients. Natural extensions to higher dimensions are mentioned in the notes. The chapter notes also address further matters of commutation. Work of Donoho, Dyn, Levin and Yu [116], in which commutation is studied in relation to interpolation methods on a dyadic grid, is outlined. An extension of commutation to the setting of subdivision schemes for irregular grids, due to Daubechies, Guskov and Sweldens [103, 104], is also discussed. In summary, this chapter is about how wavelets are adapted and how they are adaptable: they are natural for describing function spaces of fundamental importance in PDEs and elsewhere, and wavelet constructions can take into account some geometric constraints.
The techniques discussed herein can be brought to bear on numerical implementation in the solution of boundary value problems, in adapting wavelet constructions to special domains in Rn , and in building multiwavelets with still more regularity and symmetry properties.
2.1 Wavelets and derivatives

2.1.1 Nonstandard wavelet representation of d/dx

Fix an orthogonal scaling function $\varphi$. Let $P_j$ denote the orthogonal projection onto $V_j(\varphi)$ and let $Q_j$ denote the orthogonal projection onto the corresponding wavelet subspace. A linear operator $T$ having the wavelet basis in its domain can be written as
\[
T = \sum_{\nu, j \in \mathbb{Z}} Q_\nu TQ_j = \Bigl(\sum_{\nu = j} + \sum_{\nu < j} + \sum_{\nu > j}\Bigr) Q_\nu TQ_j = \sum_j Q_jTQ_j + \sum_j P_jTQ_j + \sum_j Q_jTP_j .
\]
This formulation is called the nonstandard representation of $T$ because it makes special use of the multiresolution structure of the orthogonal projections, specifically, $Q_j = P_{j+1} - P_j$, to express $T$ as a sum of operators acting at a fixed scale. It is particularly useful when $T$ commutes with translations and dilations. Consider the special case $T = d/dx$. One has
\[
Q_j\frac{d}{dx}Q_jf = \sum_{k,l}\langle f, \psi_{jk}\rangle\Bigl\langle\frac{d}{dx}\psi_{jk}, \psi_{jl}\Bigr\rangle\psi_{jl} .
\]
The matrix coefficients of $Q_j(d/dx)Q_j$ are
\[
\begin{aligned}
\alpha^j_{kl} = \Bigl\langle\frac{d}{dx}\psi_{jk}, \psi_{jl}\Bigr\rangle &= 2^j\,2^j\int\psi'(2^jx - k)\,\bar\psi(2^jx - l)\,dx \\
&= 2^j\int\psi'(u)\,\bar\psi(u - (l - k))\,du = 2^j\langle\psi', \psi(\cdot - (l - k))\rangle = 2^j\alpha_{k-l} .
\end{aligned}
\]
Similarly, the wavelet matrix coefficients for $P_j(d/dx)Q_j$ and $Q_j(d/dx)P_j$ satisfy the following identities:
\[
\begin{aligned}
\alpha^j_{kl} &= \Bigl\langle\frac{d}{dx}\psi_{jk}, \psi_{jl}\Bigr\rangle = 2^j\alpha_{k-l}, \\
\beta^j_{kl} &= \Bigl\langle\frac{d}{dx}\psi_{jk}, \varphi_{jl}\Bigr\rangle = 2^j\beta_{k-l}, \\
\gamma^j_{kl} &= \Bigl\langle\frac{d}{dx}\varphi_{jk}, \psi_{jl}\Bigr\rangle = 2^j\gamma_{k-l} .
\end{aligned}
\]
In terms of the connection coefficients
\[
r_l = \Bigl\langle\frac{d}{dx}\varphi, \varphi(\cdot - l)\Bigr\rangle
\]
and coefficients $\{h_k\}$ and $\{g_k\}$ of the QMFs defining $\varphi$ and $\psi$, one has
\[
\alpha_k = \sum_l\sum_m g_l\,\bar g_m\,r_{2k+l-m}, \qquad
\beta_k = \sum_l\sum_m g_l\,\bar h_m\,r_{2k+l-m}, \qquad
\gamma_k = \sum_l\sum_m h_l\,\bar g_m\,r_{2k+l-m} .
\]
The scaling relationship (1.1) then leads to the equations
\[
r_k = 2\,r_{2k} + \sum_l\Bigl\{\sum_m h_m\,\bar h_{m+2l-1}\Bigr\}\bigl(r_{2k-2l+1} + r_{2k+2l-1}\bigr)
\]
which boil down to a finite linear system when $\varphi$ has compact support. Once the $r_k$ are determined, the nonstandard form of $d/dx$ can be implemented through matrix multiplication.
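The finite linear system for the connection coefficients can be solved directly. The Python sketch below does this for the Daubechies 4-tap filter under several stated assumptions that are not fixed by the text above: $\varphi$ is real, the scaling relation is written as $\varphi(x) = \sum_k c_k\,\varphi(2x - k)$ with $\sum_k c_k = 2$, the recursion is used in the equivalent double-sum form $r_k = \sum_{m,n} c_m c_n\,r_{2k+n-m}$, and the solution is pinned down by the normalization $\sum_k k\,r_k = -1$ (which follows from $\sum_k k\,\varphi(x - k) = x + \text{const}$ and $\int\varphi = 1$). The function name is illustrative.

```python
import numpy as np

def connection_coefficients(c):
    """Solve r_k = sum_{m,n} c_m c_n r_{2k+n-m} with sum_k k r_k = -1.

    c : scaling coefficients with sum(c) == 2 and support {0, ..., L}.
    Returns a dict {k: r_k} for |k| <= L - 1 (r_k vanishes outside this range).
    """
    L = len(c) - 1
    ks = np.arange(-(L - 1), L)
    n = len(ks)
    A = np.zeros((n, n))
    for i, k in enumerate(ks):
        for m in range(L + 1):
            for q in range(L + 1):
                idx = 2 * k + q - m
                if abs(idx) <= L - 1:
                    A[i, idx + L - 1] += c[m] * c[q]
    # Homogeneous equations (A - I) r = 0 stacked with the normalization row.
    system = np.vstack([A - np.eye(n), ks[None, :].astype(float)])
    rhs = np.zeros(n + 1)
    rhs[-1] = -1.0
    r, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    return dict(zip(ks.tolist(), r))

s3 = np.sqrt(3.0)
c_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 4   # sum = 2
r = connection_coefficients(c_d4)
```

For this filter the computation returns $r_1 = -2/3$ and $r_2 = 1/12$, with $r_{-k} = -r_k$; with these five numbers in hand, the coefficients $\alpha_k$, $\beta_k$, $\gamma_k$ follow from the double sums above.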
2.1.2 Differentiation and commutation of MRAs

The B-splines $\phi_N$ of order $N$ are the $N$-fold autoconvolutions of the Haar scaling function $\phi_0 = \chi_{[0,1)}$. The hat function $\phi_1$ has the form $\phi_1(x) = x$ on $[0,1]$ and $\phi_1(x) = 2 - x$ on $[1,2]$ and is zero elsewhere. Its derivative equals one on $(0,1)$ and equals minus one on $(1,2)$. Thus $\phi_1'(x) = \phi_0(x) - \phi_0(x-1) = \Delta_- \phi_0(x)$ where $\Delta_-$ is the backward difference operator $\Delta_- f = f(\cdot) - f(\cdot - 1)$. In fact, $d(\varphi * \chi_{[0,1)})/dx = \Delta_- \varphi$ for any $\varphi$: as $\chi_{[0,1)} = \chi_{[0,\infty)} - \chi_{[0,\infty)}(\cdot - 1)$ and since $\delta$ is the distributional derivative of $\chi_{[0,\infty)}$, one has $d\chi_{[0,1)}/dx = \delta - \delta(\cdot - 1)$. Since differentiation commutes with convolution, this identity applies in the sense of distributions. Integrating both sides shows that a smoothed scaling function $\varphi_+ = \varphi * \chi_{[0,1)}$ can be obtained by integrating $\Delta_- \varphi$.

It is natural to ask how to reverse this process: starting with a scaling function $\varphi$, can one obtain a roughened scaling function from a similar process? That is, given a scaling function $\varphi \in L^2(\mathbb{R})$, when does $\varphi_-$ defined by the distributional identity $\varphi' = \Delta_- \varphi_-$ also define a scaling function in $L^2(\mathbb{R})$? More precisely, what condition is required of the scaling filter of $\varphi$? The previous discussion suggests that this filter should be divisible by $(1 + e^{-2\pi i \xi})/2$, the Haar scaling filter.

Theorem 2.1.1. Suppose that the filter $H$ of the scaling function $\varphi$ has a zero of order $m$ at $\xi = 1/2$. Set $H_{-m}(\xi) = H(\xi) \big( (1 + e^{-2\pi i \xi})/2 \big)^{-m}$. Then $\varphi \in L^2(\mathbb{R})$ has $m$ derivatives in $L^2(\mathbb{R})$ if and only if $\prod_{j=1}^{\infty} H_{-m}(\xi/2^j)$ converges in $L^2(\mathbb{R})$.

The argument simplifies when $\varphi$ has compact support, as we shall assume now. Moreover, we will only consider the case $m = 1$. The general case follows by induction. It is clear, with $\varphi_-$ as above, that if $\varphi$ has a derivative in $L^2(\mathbb{R})$ then $\varphi = \chi_{[0,1)} * \varphi_-$. In the Fourier domain, $\hat\varphi(\xi) = \hat\chi_{[0,1)}(\xi)\, \hat\varphi_-(\xi)$. But $\hat\chi_{[0,1)}(\xi) = \prod_{j=1}^{\infty} \big( (1 + e^{-2\pi i \xi/2^j})/2 \big)$ while $\hat\varphi(\xi) = \prod_{j=1}^{\infty} H(\xi/2^j)$. This tells us that, at least formally, $\varphi_-$ is the scaling function with filter $H_{-1}$. In addition, since $H$ is a polynomial, so is $H_{-1}$, so, as a distribution, $\varphi_-$ has compact support. Also, $\varphi' \in L^2(\mathbb{R})$ implies that $\Delta_- \varphi_- \in L^2(\mathbb{R})$. By telescoping, for any $M = 1, 2, \dots$, $\varphi_-(x) - \varphi_-(x - M)$ belongs to $L^2(\mathbb{R})$. Since $\varphi_-(x)\varphi_-(x - M) \equiv 0$ for $M$ large enough, $\varphi_-$ itself must be in $L^2(\mathbb{R})$. This proves the result under our added assumption of compact support.

Intertwining of MRAs. The smoothing operation $\varphi \mapsto \varphi_+ = \varphi * \chi_{[0,1)}$ and roughening operation $\varphi \mapsto \varphi_-$ each preserve the scaling property, but the pair also has a natural use in deriving biorthogonal MRAs, as was originally pointed out by Lemarié-Rieusset [256]. If $H$ is the scaling filter of an orthogonal MRA then the derived filter pair $H_-(\xi) = 2H(\xi)/(1 + e^{-2\pi i \xi})$ and $H_+(\xi) = (1 + e^{-2\pi i \xi}) H(\xi)/2$ clearly satisfy
\[
H_+(\xi)\, \bar H_-(\xi) + H_+\big(\xi + \tfrac12\big)\, \bar H_-\big(\xi + \tfrac12\big) = |H(\xi)|^2 + \big| H\big(\xi + \tfrac12\big) \big|^2 = 1 .
\]
Setting $G_\pm(\xi) = e^{-2\pi i \xi}\, \bar H_\pm(\xi + 1/2)$, the orthogonality condition
\[
H_+(\xi)\, \bar H_-\big(\xi + \tfrac12\big) + G_+(\xi)\, \bar G_-\big(\xi + \tfrac12\big) \equiv 0
\]
also remains valid.

More generally, if $H, G, \tilde H, \tilde G$ satisfy the conditions of biorthogonality
\[
\begin{bmatrix} H(z) & H(-z) \\ G(z) & G(-z) \end{bmatrix}
\begin{bmatrix} \tilde H^*(z) & \tilde G^*(z) \\ \tilde H^*(-z) & \tilde G^*(-z) \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\tag{2.1}
\]
then the smoothed and roughened filters $H_-, G_-, \tilde H_+, \tilde G_+$ still satisfy the conditions of biorthogonality. These derived scaling filters then define new biorthogonal MRAs as long as $\varphi_-$ defined through $H_-$ converges in $L^2(\mathbb{R})$.

2.1.3 Wavelet characterization of Sobolev norms

As we will see in Chapter 6, wavelets form unconditional bases for a large family of function spaces, namely Besov and Triebel–Lizorkin spaces (see also Meyer [274] and Hernandez and Weiss [189]). Here we review the use of wavelets in defining equivalent norms on $L^2$-Sobolev spaces. The Sobolev space $H^s(\mathbb{R})$ ($s \in \mathbb{R}$) consists of those functions $f \in L^2(\mathbb{R})$ whose Fourier transforms belong to the space $L^2_s(\mathbb{R})$ of those $g$ such that $\|g\|^2_{L^2_s} = \int |g(\xi)|^2 (1 + |\xi|^2)^s\, d\xi < \infty$. One sets $\|f\|_{H^s} = \|\hat f\|_{L^2_s}$.
Theorem 2.1.2. Suppose that $\psi$ is the mother wavelet of a wavelet basis for $L^2(\mathbb{R})$, that $\psi \in C^\alpha(\mathbb{R})$ for some $\alpha > s \ge 0$ and that, for some sufficiently large $N$, $|\psi(x)| \le (1 + |x|)^{-N}$. Then there are constants $C_1, C_2$ such that
\[
C_1 \|f\|^2_{H^s} \le \sum_{j,k} (1 + 4^{js})\, |\langle f, \psi_{jk} \rangle|^2 \le C_2 \|f\|^2_{H^s} .
\]
To simplify matters we will only prove Theorem 2.1.2 for the case of orthonormal wavelets and for $0 < s < 1$. One argues the case of bandlimited wavelets first, then passes to any sufficiently regular wavelets by means of a change of wavelet argument, showing that changing from one suitably regular wavelet basis to another defines a bounded operator on $H^s(\mathbb{R})$. The arguments for the biorthogonal case are much the same; only the notion of change of wavelet is slightly more complicated. However, assuming the biorthogonal case for $0 < s < 1$, the result for $s > 1$ can be reduced to that for $0 < s < 1$ essentially by using Theorem 2.1.1.

The bandlimited wavelet case. We begin with a variant of the bandlimited Lemarié–Meyer wavelets as derived in Auscher et al. [10], as will be reviewed in the notes of Chapter 4. All we need here is the specific form of the wavelets. Let $b(\xi)$ be a smooth, non-negative function supported inside $[1/3, 4/3]$ and decreasing away from $\xi = 1$ such that $\sum_j |b(|\xi|/2^j)|^2 = 1$ for all $\xi \ne 0$. Set
\[
\omega(\xi) = \operatorname{sgn}(\xi)\, e^{\pi i \xi}\, b(2|\xi|) .
\]
In Chapter 4 we will see that when $b$ satisfies certain additional properties, the functions $\omega^j_n = 2^{-j/2} e^{-2\pi i n \xi/2^j} \omega(\xi/2^j)$ form an orthonormal basis for $L^2(\mathbb{R})$. We assume this henceforth. The inverse Fourier transforms of the $\omega^j_n$ are wavelets $\psi_{jn}$ (see [10], Theorem 7, p. 252). By Plancherel, $\langle f, \psi_{jn} \rangle = \langle \hat f, \omega^j_n \rangle$. Since the Fourier transform is an isomorphism between $H^s$ and $L^2_s$, to prove the bandlimited case of Theorem 2.1.2 it suffices to prove that $g \in L^2_s$ if and only if $\sum_j (1 + 4^{js}) \sum_n |\langle g, \omega^j_n \rangle|^2 < \infty$. Let $A_j = \{2^j/3 \le |\xi| \le 4 \cdot 2^j/3\}$. Clearly $\|g\|^2_{L^2_s}$ is equivalent in magnitude to $\sum_j (1 + 4^{js}) \|g \chi_{A_j}\|^2_{L^2}$. Since $\sum_{k=j-1}^{j+1} |b(|\xi|/2^k)|^2 = 1$ on $A_j$, it suffices to show that $\|g \chi_{A_j}\|^2_{L^2}$ is at most $C \sum_{k=j-1}^{j+1} \sum_n |\langle g, \omega^k_n \rangle|^2$ and at least $c \sum_n |\langle g, \omega^j_n \rangle|^2$ for appropriate constants $c, C$. To simplify matters, it will help to assume that $g$ is supported in $[0, \infty)$. There is no loss of generality here since $g \in L^2_s$ if and only if its restrictions to $[0, \infty)$ and $(-\infty, 0]$ are. Similar arguments as below can be applied to $g \chi_{(-\infty, 0]}$. For $g$ supported in $[0, \infty)$,
\[
\langle g, \omega^j_n \rangle = 2^{-j/2} \int b\Big(\frac{\xi}{2^j}\Big) g(\xi)\, \operatorname{sgn}(\xi)\, e^{-2\pi i (n + 1/2) \xi/2^j}\, d\xi
 = 2^{-j/2} \int_{2^j/3}^{2^{j+2}/3} e^{-\pi i \xi/2^j}\, b\Big(\frac{\xi}{2^j}\Big) g(\xi)\, e^{-2\pi i n \xi/2^j}\, d\xi .
\]
As $\{2^{-j/2} e^{-2\pi i n \xi/2^j}\}_{n \in \mathbb{Z}}$ forms an orthonormal basis for $L^2([2^j/3, 2^{j+2}/3))$,
\[
\sum_n |\langle g, \omega^j_n \rangle|^2 = \int |g(\xi)|^2 \Big| b\Big(\frac{\xi}{2^j}\Big) \Big|^2 d\xi \le C\, \|g \chi_{A_j}\|^2_{L^2} .
\]
On the other hand,
\[
\|g \chi_{A_j}\|^2_{L^2} \le \sum_{k=j-1}^{j+1} \|g(\cdot)\, b(2^{-k} |\cdot|)\|^2_{L^2} = \sum_{k=j-1}^{j+1} \sum_n |\langle g, \omega^k_n \rangle|^2 .
\]
Summing these inequalities with the appropriate weight factors $(1 + 4^{js})$ yields the desired coefficient estimates when $g$ is supported in $[0, \infty)$. Again, the same methods apply to $g$ supported in $(-\infty, 0]$, thus proving the theorem for the case of the Lemarié–Meyer type wavelets. Notice that the restriction $0 < s < 1$ is not necessary for this bandlimited case.

Passage to other wavelets. It will be convenient to return to the interval notation $\psi_I = \psi_{jk}$ when $I = I(j,k) = [k/2^j, (k+1)/2^j)$. The general case of Theorem 2.1.2 boils down to proving bounds on a change of wavelet matrix. This is where we use the hypothesis that $s < 1$. By expanding the wavelet $\psi^{(1)}$ in terms of the basis $\psi^{(2)}_I$, that is, $\psi^{(1)} = \sum_{I \in \mathcal{D}} \langle \psi^{(1)}, \psi^{(2)}_I \rangle \psi^{(2)}_I$, any $f \in L^2(\mathbb{R})$ can be expressed as
\[
f = \sum_I \langle f, \psi^{(1)}_I \rangle\, \psi^{(1)}_I = \sum_I \langle f, \psi^{(1)}_I \rangle \sum_J \langle \psi^{(1)}_I, \psi^{(2)}_J \rangle\, \psi^{(2)}_J .
\tag{2.2}
\]
Now let $\psi^{(1)} = \psi^b$ be the bandlimited wavelet considered above for which we have the norm equivalence $\|f\|^2_{H^s} \sim \sum_{I \in \mathcal{D}} (1 + |I|^{-2s}) |\langle f, \psi^b_I \rangle|^2$. Thus $H^s$ is isomorphic to the Hilbert sequence space $\mathcal{H}^s$ consisting of those sequences $\{c_I\}$ such that $\sum_I (1 + |I|^{-2s}) |c_I|^2 < \infty$. To prove that a more general wavelet $\psi = \psi^{(2)}$ under consideration also provides a norm equivalence between $H^s$ and $\mathcal{H}^s$, it is enough to show that the matrix $A_{IJ} = \langle \psi^b_I, \psi_J \rangle$ is bounded and continuously invertible on $\mathcal{H}^s$ or, equivalently, jointly on $\ell^2(\mathcal{D})$ and on $\dot{\mathcal{H}}^s$, the space of $\{c_I\}$ such that $\sum_{I \in \mathcal{D}} |I|^{-2s} |c_I|^2 < \infty$. Since $\{c_I\} \in \dot{\mathcal{H}}^s$ if and only if $|I|^{-s} c_I \in \ell^2(\mathcal{D})$, proving that $A_{IJ}$ is bounded on $\dot{\mathcal{H}}^s$ is equivalent to proving that $B_{IJ} = (|I|/|J|)^s A_{IJ}$ is $\ell^2(\mathcal{D})$-bounded. Thus, one finds a sufficient condition for $\ell^2$-boundedness of $A_{IJ}$ first, then verifies a corresponding condition for $B_{IJ}$. The following matrix boundedness criterion was established by Frazier and Jawerth [148]. The criterion boils down to a form of Schur's lemma that will be discussed further in Chapter 6. We take it for granted here.

Theorem 2.1.3. Suppose that the matrix $A_{IJ}$ satisfies
\[
\sup_{I, J \in \mathcal{D}} \frac{|A_{IJ}|}{\omega_{IJ}} \le C
\tag{2.3}
\]
where, for some fixed $\varepsilon > 0$,
\[
\omega_{IJ} = \Big( 1 + \frac{|x_I - x_J|}{\max\{|I|, |J|\}} \Big)^{-1-\varepsilon} \min\Big\{ \frac{|I|}{|J|}, \frac{|J|}{|I|} \Big\}^{(1+\varepsilon)/2} .
\]
Then $A$ is bounded on $\ell^2(\mathcal{D})$.

Such a matrix is "almost diagonal" in that the entries decay if the centers $x_I, x_J$ of $I, J$ are distant, or if the intervals lie on distant scales. Because we are assuming that $0 < s < 1$, we will just need Hölder continuity, vanishing mean and decay (the so-called vaguelette properties) of the wavelets to establish the condition (2.3).

Definition 2.1.4. The functions $\{f_I\}_{I \in \mathcal{D}}$ are said to form a vaguelette family of type $(\beta, \alpha)$ if they are continuous on $\mathbb{R}$ and satisfy the uniform estimates:
\[
|f_I(x)| \le C |I|^{-1/2} \Big( 1 + \frac{|x - x_I|}{|I|} \Big)^{-1-\beta},
\tag{2.4}
\]
\[
\int f_I = 0,
\tag{2.5}
\]
\[
|f_I(x) - f_I(x')| \le C |I|^{-1/2} \Big( \frac{|x - x'|}{|I|} \Big)^{\alpha} .
\tag{2.6}
\]
These vaguelette properties are satisfied by the wavelets of Theorem 2.1.2 with $\alpha > s$, $\beta = N - 1$ and $\beta > s$ as large as necessary to establish (2.3). By applying the decay estimate (2.4) alone one obtains the following.

Lemma 2.1.5. If the functions $\{f_I\}$ satisfy (2.4) then
\[
|\langle f_I, f_J \rangle| \le C \min\Big\{ \frac{|I|}{|J|}, \frac{|J|}{|I|} \Big\}^{1/2} \Big( 1 + \frac{|x_I - x_J|}{\max\{|I|, |J|\}} \Big)^{-1-\beta} = M_1(I, J) .
\tag{2.7}
\]

Proof (of Lemma 2.1.5). We follow [189]. By shifting and dilating it is enough to prove the estimate in the case in which $I$ is the unit interval $[0,1]$ and $|J| \le 1$. Specifically, for $J = J(j,k)$ it suffices to show that
\[
\int (1 + |u|)^{-1-\beta} (1 + |2^j u - k|)^{-1-\beta}\, du \le C\, (1 + 2^{-j}|k|)^{-1-\beta} .
\]
Set $E_1 = \{u : |u - 2^{-j}k| \le 3\}$, $E_2 = \{u : |u - 2^{-j}k| > 3 \text{ and } |u| \le 2^{-j-1}|k|\}$ and $E_3 = \{u : |u - 2^{-j}k| > 3 \text{ and } |u| > 2^{-j-1}|k|\}$. By the triangle inequality, if $u \in E_1 \cup E_3$ then $1 + 2^{-j}|k| \le 4(1 + |u|)$ so
\[
\int_{E_1 \cup E_3} (1 + |u|)^{-1-\beta} (1 + |2^j u - k|)^{-1-\beta}\, du
 \le \frac{4 C_1}{(1 + 2^{-j}|k|)^{1+\beta}} \int \frac{C_2\, du}{(1 + 2^j |2^{-j}k - u|)^{1+\beta}}
 \le \frac{C}{(1 + 2^{-j}|k|)^{1+\beta}} .
\]
Next, if $u \in E_2$ then $2^j |2^{-j}k - u| \ge 2^{j-2}(1 + 2^{-j}|k|)$ so that
\[
\int_{E_2} (1 + |u|)^{-1-\beta} (1 + |2^j u - k|)^{-1-\beta}\, du
 \le \frac{4 C_2\, 2^{-j\beta}}{(1 + 2^{-j}|k|)^{1+\beta}} \int \frac{C_1\, du}{(1 + |u|)^{1+\beta}}
 \le \frac{C}{(1 + 2^{-j}|k|)^{1+\beta}}
\]
since $j \ge 0$. Combining these estimates gives (2.7).

Lemma 2.1.6. If $\{f_I\}$ satisfy (2.4)–(2.6) with $\beta > \alpha$ then
\[
|\langle f_I, f_J \rangle| \le C \min\Big\{ \frac{|I|}{|J|}, \frac{|J|}{|I|} \Big\}^{\alpha + 1/2} = M_2(I, J) .
\tag{2.8}
\]

Proof (of Lemma 2.1.6). The proof is standard. One has
\[
|\langle f_I, f_J \rangle| = \Big| \int (f_I(x) - f_I(x_J))\, f_J(x)\, dx \Big|
 \le C |I|^{-1/2-\alpha} \int |x - x_J|^\alpha |f_J(x)|\, dx \le C \Big( \frac{|J|}{|I|} \Big)^{\alpha + 1/2} .
\]
Since one can interchange the roles of $I, J$, (2.8) follows.
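The integral estimate at the heart of the proof of Lemma 2.1.5 can be sanity-checked numerically. The sketch below is only an illustration, not part of the argument: it approximates the integral by a Riemann sum on a truncated interval and reports the ratio of the integral to $(1 + 2^{-j}|k|)^{-1-\beta}$, which the lemma asserts is bounded uniformly in $j \ge 0$ and $k$. The function name, the truncation width and the step size are arbitrary choices made here.

```python
import numpy as np

def decay_ratio(j, k, beta=1.0, half_width=400.0, step=1e-3):
    """Ratio of the integral in the proof of Lemma 2.1.5 to (1 + 2^-j |k|)^(-1-beta).

    The integral is approximated by a Riemann sum over [-half_width, half_width);
    truncation can only make the computed value smaller than the true one.
    """
    u = np.arange(-half_width, half_width, step)
    integrand = (1 + np.abs(u)) ** (-1 - beta) * (1 + np.abs(2.0 ** j * u - k)) ** (-1 - beta)
    integral = integrand.sum() * step
    return integral * (1 + 2.0 ** (-j) * abs(k)) ** (1 + beta)

# A small grid of scales j >= 0 and translations k.
ratios = {(j, k): decay_ratio(j, k) for j in (0, 2, 4) for k in (0, 5, 50, 200)}
```

For $\beta = 1$ a crude splitting of the line into $|u - 2^{-j}k| \le 2^{-j-1}|k|$ and its complement shows that the ratio never exceeds 16, so the computed values should all lie well inside that range.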
Proof (of Theorem 2.1.2). Observe that since the wavelets $\psi^b$ and $\psi$ satisfy the vaguelette conditions, the $\ell^2(\mathcal{D})$-boundedness of $A_{IJ}$ follows from Theorem 2.1.3 as soon as $2\alpha > \varepsilon$ and $\beta > 2/(2\alpha - \varepsilon)$, which hold for $\varepsilon$ sufficiently small. To prove boundedness on $\dot{\mathcal{H}}^s$ one must show that $(|J|/|I|)^s A_{IJ}$ is bounded on $\ell^2$. We treat the cases $|I| \le |J|$ and $|I| > |J|$ separately. Thus let $A^{(1)}_{IJ} = A_{IJ}$ if $|I| \le |J|$ and $A^{(1)}_{IJ} = 0$ otherwise, and set $A^{(2)}_{IJ} = A_{IJ} - A^{(1)}_{IJ}$. To apply Theorem 2.1.3 we need
\[
\sup_{|I| \le |J|} \Big( \frac{|J|}{|I|} \Big)^s \frac{|A_{IJ}|}{\omega_{IJ}} \le C
\]
where, for some fixed $\varepsilon > 0$,
\[
\omega_{IJ} = \Big( 1 + \frac{|x_I - x_J|}{|J|} \Big)^{-1-\varepsilon} \Big( \frac{|I|}{|J|} \Big)^{(1+\varepsilon)/2} .
\]
Taking geometric means of (2.7) and (2.8) one has, for any $\theta \in [0,1]$, $|\langle f_I, f_J \rangle| \le M_1^\theta M_2^{1-\theta}$. For $A^{(1)}_{IJ}$ with fixed $\theta$ this becomes
\[
\Big( \frac{|J|}{|I|} \Big)^s |A^{(1)}_{IJ}| \le C \Big( \frac{|I|}{|J|} \Big)^{1/2 + (1-\theta)\alpha - s} \Big( 1 + \frac{|x_I - x_J|}{|J|} \Big)^{-\theta - \theta\beta} .
\]
Thus, we need to find $\theta \in [0,1]$ and $\varepsilon > 0$ such that
\[
(1 - \theta)\alpha - s \ge \varepsilon/2 \quad \text{while} \quad \theta(1 + \beta) \ge 1 + \varepsilon .
\tag{2.9}
\]
The first estimate can be established for small enough $\theta$ since $\alpha > s$. Once such a $\theta$ is fixed, $\beta$ can be chosen large enough to satisfy the second condition. This gives the desired boundedness of $A^{(1)}_{IJ}$. Similar arguments apply to prove the boundedness of $A^{(2)}_{IJ}$ and of the adjoint $A_{JI}$. From these estimates Theorem 2.1.2 follows, at least for suitably regular wavelets $\psi$. The conditions (2.9) determine the tradeoff between the smoothness $\alpha$ and decay $N$ required of $\psi$ in the statement of Theorem 2.1.2. For example, the wavelet $\psi^b$ has as much regularity and decay as we wish whereas, when $\psi = \psi^{(2)}$ has compact support, we have $\beta$ as large as we wish and $\alpha > s$ is all that is required.

2.1.4 Sobolev estimates for pointwise products

Pointwise products are among the simplest operations that arise in nonlinear PDE. The Navier–Stokes equations provide perhaps the most familiar and dramatic illustration of the difficulties that Fourier techniques have with
products. Wavelets can handle multiplication. In this section we will obtain estimates for wavelet subspace projections of products in Sobolev norms that often arise in matters of well-posedness for PDEs. The estimates depend crucially on both the regularity and orthogonality properties of the wavelets. The variational form of the Navier–Stokes equations in $\mathbb{R}^n$ is:
\[
v(t) = S(t)\, v_0 - \int_0^t \mathbb{P} S(t - s)\, \nabla \cdot (v \otimes v)(s)\, ds .
\tag{2.10}
\]
One wishes to solve for the velocity field $v = v(x,t)$. Here $S(t) = \exp(t\Delta)$ is the heat semigroup and $\mathbb{P}$ is the singular integral operator that projects a vector field onto its divergence-free component. Cannone and Meyer [66] developed a method for obtaining so-called mild solutions of (2.10) with initial data in certain function spaces including Sobolev spaces. More will be said about their approach in the notes to this chapter. A crucial step was to develop a notion under which a function space $X$ is adapted to the bilinear product estimates required for application of a Picard iteration method to solve for $v$. Cannone and Meyer then used Littlewood–Paley theory to prove such estimates which can be regarded, in a sense, as estimates for wavelet projections of pointwise products specific to bandlimited wavelets. Extending such estimates to more general wavelets is one step required in adapting the techniques in [66] to the setting of domains in $\mathbb{R}^n$ with boundary such as half-spaces.

Here is an abstraction of the essential role of wavelets in obtaining useful estimates for operators such as $\mathbb{P} S(t - s) \nabla$. As before, $P_j$ and $Q_j$ denote the orthogonal projections of an orthonormal MRA. One seeks estimates of the form
\[
\|Q_j(fg)\|_X \le \eta_j\, \|f\|_X \|g\|_X .
\tag{2.11}
\]

Definition 2.1.7. A linear operator $T$ is adapted to pointwise products on $X$ provided $\sum_{j,j'} \eta_{j'} \|Q_j T Q_{j'}\|_{X \to X} < \infty$ where $\eta_j$ is as in (2.11).

Lemma 2.1.8. If $T$ is adapted to pointwise products on $X$, then $(f,g) \mapsto T(fg)$ is strongly continuous from $X \times X \to X$.

Since $Q_j$ is an orthogonal projection, the lemma is a straightforward application of Minkowski's inequality. In [66] the lemma is used, essentially, to obtain bounds $C(t)$ on $\mathbb{P} S(t - s) \nabla$ in (2.10). Local integrability of $C(t)$ is the main step in obtaining solutions of (2.10) continuous in $t$. In what follows we establish estimates of the form (2.11) for Sobolev spaces.

Theorem 2.1.9. Let $Q_j$ denote the projection onto the $j$th wavelet space of a multiresolution analysis of $L^2(\mathbb{R})$ where the wavelets are orthogonal and Lipschitz with compact support. Then for $0 \le \alpha \le 1$ and $j \ge 1$,
\[
\|Q_j(fg)\|_{H^\alpha} \le C\, 2^{j(1/2 - \alpha)}\, \|f\|_{H^\alpha} \|g\|_{H^\alpha} .
\]
The first stage of the argument is based on the so-called paraproduct decomposition. Writing $f = P_0 f + \sum_{j=0}^{\infty} Q_j f$, expressing $g$ similarly and rearranging sums suitably, one has
\[
fg - P_0 f\, P_0 g = \sum_{j=2}^{\infty} (Q_j f)(P_{j-2} g) + \sum_{j=2}^{\infty} (Q_j g)(P_{j-2} f) + \sum_{|i-j| \le 2;\ i,j \ge 0} Q_i f\, Q_j g .
\tag{2.12}
\]
One can estimate $Q_\nu(fg)$ by applying $Q_\nu$ to the three terms on the right in (2.12). In what follows we will pretend that $\varphi$ and $\psi$ are supported in $[-1,1]$. For convenience, the wavelets are assumed to be real-valued. We let $f = \sum_{jk} c_{jk} \psi_{jk}$ and $P_j g = \sum_k d_{jk} \varphi_{jk}$.

Estimate of the first term. To estimate $(f,g) \mapsto Q_i(Q_j f\, P_{j-2} g)$, one looks at a typical wavelet coefficient in its expansion.

The case $j < i$. Under the make-believe support hypothesis, one notes that $\langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{i0} \rangle = 0$ unless $k_1, k_2 \in \{0, \pm 1\}$. Using the moment and orthogonality conditions on the wavelets we can write $\langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{i0} \rangle$ as
\[
2^{(i-2)/2}\, 2^j \int_{-1/2^i}^{1/2^i} \big( \psi(2^j x - k_1) - \psi(-k_1) \big) \big( \varphi(2^{j-2} x - k_2) - \varphi(-k_2) \big)\, \psi(2^i x)\, dx .
\]
By the Lipschitz continuity of $\varphi, \psi$, the integral is at most $C\, 2^{2j - 3i}$ so
\[
|\langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{i0} \rangle| \le c\, 2^{i/2}\, 2^{3(j-i)} .
\tag{2.13}
\]
The coefficient of $\psi_{im}$ in $Q_i(Q_j f\, P_{j-2} g)$ will be
\[
\sum_{k_1, k_2} c_{jk_1}\, d_{j-2,k_2}\, \langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{im} \rangle .
\]
Because of the support properties of the wavelets, for $m$ fixed, there are at most nine combinations of $k_1, k_2$ such that $\langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{im} \rangle \ne 0$. It does no harm to pretend that the coefficient is nonzero for only one such choice $k_i = k_i(m)$, $i = 1, 2$, and write
\[
Q_i(Q_j f\, P_{j-2} g) = \sum_m c_{jk_1}\, d_{j-2,k_2}\, \langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{im} \rangle\, \psi_{im} .
\]
Conversely, for $k_1, k_2$ fixed there are at most $O(2^{i-j})$ choices of $m$ such that $k_i = k_i(m)$. By Theorem 2.1.2 and (2.13) we have
\[
\begin{aligned}
\|Q_i(Q_j f\, P_{j-2} g)\|^2_{H^\alpha}
 &\le C\, 2^{2i\alpha} \sum_m \big| c_{jk_1}\, d_{j-2,k_2}\, \langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{im} \rangle \big|^2 \\
 &\le C\, 2^{2i(\alpha + 1/2)}\, 2^{6(j-i)} \sum_m |c_{jk_1}\, d_{j-2,k_2}|^2 \\
 &\le C\, 2^{i(2\alpha + 1)}\, 2^{8(j-i)} \Big( \sum_k |c_{jk}|^2 \Big) \Big( \sum_k |d_{j-2,k}|^2 \Big) \\
 &\le C\, 2^{i}\, 2^{\alpha(2i - 4j)}\, 2^{8(j-i)}\, \|f\|^2_{H^\alpha} \|g\|^2_{H^\alpha}
\end{aligned}
\]
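and this estimate applies whenever $j < i$. Summing these estimates over all $j < i$ yields
\[
\begin{aligned}
\Big\| Q_i \Big( \sum_{0 \le j < i} Q_j f\, P_{j-2} g \Big) \Big\|_{H^\alpha}
 &\le \sum_{0 \le j < i} \|Q_i(Q_j f\, P_{j-2} g)\|_{H^\alpha} \\
 &\le C\, \|f\|_{H^\alpha} \|g\|_{H^\alpha}\, 2^{i(1/2 - \alpha)} \sum_{0 \le j < i} 2^{(4 - 2\alpha)(j - i)} \\
 &\le C\, \|f\|_{H^\alpha} \|g\|_{H^\alpha}\, 2^{i(1/2 - \alpha)}
\end{aligned}
\]
which is valid for $\alpha < 2$.

The case $j > i$. Here we estimate $\|Q_i(Q_j f\, P_{j-2} g)\|_{H^\alpha}$ where $j > i$. Since $\langle \psi_{jk_1}, \varphi_{j-2,k_2} \rangle = 0$, the Lipschitz condition on $\psi$ yields
\[
\langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{i0} \rangle
 = 2^{(i-2)/2}\, 2^j \int_{(k_1 - 1)/2^j}^{(k_1 + 1)/2^j} \psi(2^j x - k_1)\, \varphi(2^{j-2} x - k_2)\, \big( \psi(2^i x) - \psi(2^{i-j} k_1) \big)\, dx
 \le C\, 2^{3i/2 - j} .
\]
Estimating the number of possible nonzero terms as before,
\[
\begin{aligned}
\|Q_i(Q_j f\, P_{j-2} g)\|^2_{H^\alpha}
 &\le C\, 2^{2i\alpha} \sum_m \big| c_{jk_1}\, d_{j-2,k_2}\, \langle \psi_{jk_1} \varphi_{j-2,k_2}, \psi_{im} \rangle \big|^2 \\
 &\le C\, 2^{2i\alpha}\, 2^{3i - 2j}\, 2^{j - i} \sum_{k_1} \sum_{k_2 \sim k_1} |c_{jk_1}\, d_{j-2,k_2}|^2 \\
 &\le C\, 2^{2i(\alpha + 1) - j} \Big( \sum_k |c_{jk}|^2 \Big) \Big( \sum_k |d_{j-2,k}|^2 \Big) \\
 &\le C\, 2^{2i(\alpha + 1)}\, 2^{-j - 4j\alpha}\, \|f\|^2_{H^\alpha} \|g\|^2_{H^\alpha} .
\end{aligned}
\]
Summing over $j > i$ yields
\[
\Big\| Q_i \sum_{j > i} (Q_j f\, P_{j-2} g) \Big\|_{H^\alpha}
 \le C\, 2^{i(\alpha + 1)}\, \|f\|_{H^\alpha} \|g\|_{H^\alpha} \sum_{j > i} 2^{-j/2 - 2j\alpha}
 = C\, 2^{i(1/2 - \alpha)}\, \|f\|_{H^\alpha} \|g\|_{H^\alpha} .
\]
Together with a corresponding estimate when $i = j$, this completes the estimate of the first term in the paraproduct expansion (2.12). The estimate of the second term is exactly the same with the roles of $f, g$ reversed.

Estimate of the third term. To estimate the third term in (2.12), set $F = \sum_{j \ge i} Q_j f\, Q_j g$. One can express $Q_i F$ as integration of $F$ against the kernel $K_i(x,y) = \sum_m \psi_{im}(x) \psi_{im}(y)$. Then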
\[
\begin{aligned}
\|Q_i F\|_2^2 &= \int \Big| \int K_i(x,y)\, F(y)\, dy \Big|^2 dx
 = \int\!\!\int\!\!\int F(y)\, F(z)\, K_i(x,y)\, K_i(x,z)\, dx\, dy\, dz \\
 &\le \sup_{y,z} \Big| \int K_i(x,y)\, K_i(x,z)\, dx \Big|\, \|F\|_1^2 .
\end{aligned}
\tag{2.14}
\]
The order of the integrals is justified by their absolute convergence. For each fixed $y$ there are at most three values of $m$ such that $\psi_{im}(y) \ne 0$. Thus, by Cauchy–Schwarz and since $\psi_{im}(x) \psi_{in}(x) = 0$ if $|m - n| > 1$, one has
\[
|K_i(x,y)\, K_i(x,z)| = \Big| \sum_{m,n} \psi_{im}(x)\, \psi_{in}(x)\, \psi_{im}(y)\, \psi_{in}(z) \Big|
 \le 2^i \|\psi\|_\infty^2 \Big| \sum_{n,m=-1}^{1} \psi_{im}(x)\, \psi_{in}(x) \Big|
 \le C\, \|\psi\|_\infty^2\, 2^i \sum_m |\psi_{im}(x)|^2 .
\]
Substituting this estimate in (2.14) and using the fact that
\[
\Big\| \sum_{j \ge i} Q_j f\, Q_j g \Big\|_1 \le \sum_{j \ge i} \|Q_j f\|_2 \|Q_j g\|_2 \le C\, 2^{-2i\alpha}\, \|f\|_{H^\alpha} \|g\|_{H^\alpha},
\]
one concludes that
\[
\Big\| Q_i \sum_{j \ge i} Q_j f\, Q_j g \Big\|_{L^2} \le C\, 2^{i(1/2 - 2\alpha)}\, \|f\|_{H^\alpha} \|g\|_{H^\alpha} .
\]
Since $\|Q_i F\|_{H^\alpha} \sim 2^{i\alpha} \|Q_i F\|_2$, the desired Sobolev estimate follows. This, together with corresponding estimates for the wavelet projections of $P_0 f\, P_0 g$, obtained by similar methods, proves Theorem 2.1.9. Estimates could be obtained for larger values of $\alpha$ by subtracting higher moments, provided one has more regularity and vanishing moments of the wavelets themselves.
2.2 Piecewise polynomial multiwavelets

2.2.1 Multiwavelet introduction

Daubechies’ Theorem 1.1.5 states that a compactly supported, orthogonal scaling function cannot be both continuous and symmetric. Among other drawbacks, this creates some difficulty in using wavelets to solve two-point boundary value problems: special wavelets must be engineered for boundary conditions as was done by Cohen et al. [83]. Asymmetry can also be overcome
with biorthogonal filters but contention between localization and regularity still remains in this case. Theorem 1.1.5 reduces to a matter of the number of constraints that are imposed on orthogonal scaling coefficients. Vector refinement equations that give rise to scaling vectors and multiwavelets have more free variables than their scalar counterparts. Ultimately, this means that more design constraints (support, regularity, symmetry) can be met simultaneously. As in the case of uniwavelets, by now there is a multitude of multiwavelet constructions having varied aims. It is not our goal to unify multiwavelets into a coherent conceptual framework, though several design principles were laid out in Strela’s thesis [333]. These will be reviewed in the specific context of the DGHM fractal construction of Geronimo and colleagues [123, 155]. Because of their conceptual simplicity, we will review the piecewise polynomial wavelets attributed to Alpert (cf. [6]) first.

2.2.2 Alpert’s piecewise polynomial wavelets

Perhaps the simplest example of a multifunction MRA consists of those spaces of functions whose restrictions to dyadic intervals of a given scale are polynomials of at most a fixed degree. Thus one defines $V^m_I$, $I \in \mathcal{D}$, to be the space of polynomials of degree less than $m$ truncated to $I$ and sets $V^m_j = \{f \in L^2(\mathbb{R}) : f \chi_I \in V^m_I \text{ for all } I \in \mathcal{D}_j\}$. Clearly, for $m$ fixed, $f \in V^m_j$ if and only if $f(2\,\cdot) \in V^m_{j+1}$. Also, $\cap_j V^m_j = \{0\}$ follows since no nontrivial polynomial is in $L^2(\mathbb{R})$ while $\overline{\cup_j V^m_j} = L^2(\mathbb{R})$ since the Haar space $V^{\mathrm{Haar}}_j$ is contained in $V^m_j$. The only remaining problem is to find a set of $m$ functions in $V^m_0$ whose shifts form an orthonormal basis for $V^m_0$.

The Legendre polynomials $P_0, \dots, P_{m-1}$ are defined by applying Gram–Schmidt to the monomials $\{1, x, \dots, x^{m-1}\}$ on $[-1,1]$. Rodrigues’ formula states that
\[
P_l(x) = \sum_{j=0}^{\lfloor l/2 \rfloor} (-1)^j \frac{(2l - 2j)!}{2^l\, j!\, (l - j)!\, (l - 2j)!}\, x^{l - 2j} .
\]
We consider these functions to be truncated to zero outside $[-1,1]$. Then $\sqrt{l + 1/2}\, P_l$ has $L^2$-norm one. Now define
\[
\pi^l(x) = \sqrt{2l + 1}\, P_l(2x - 1) .
\tag{2.15}
\]
The functions $\pi^l(x)$, $l = 0, 1, \dots$ form an orthonormal basis for $L^2([0,1])$. Therefore the shifts of $\{\pi^l\}_{l=0}^{m-1}$ form an orthonormal basis for $V^m_0$.

2.2.3 Interpolating scaling functions

Gauss–Legendre quadrature (e.g., [360]) is a method for numerical approximation of integrals $\int_{-1}^{1} f(t)\, dt$ by sums of the form $\sum_{i=0}^{m-1} f(x_i) w_i$. This
method uses for nodes $\{x_0, \dots, x_{m-1}\}$ the roots of $P_m$. The quadrature weights $\{w_0, \dots, w_{m-1}\}$ are then defined by
\[
1/w_i = m\, P_m'(2x_i - 1)\, P_{m-1}(2x_i - 1) .
\tag{2.16}
\]
The method is exact on polynomials of degree $2m - 1$. By renormalizing to intervals on unit scale, one can express the orthogonal projection of $f$ onto $V^m_0$ in terms of this method. One particularly useful basis for $V^m_0$ is built from the fundamental Lagrange interpolation polynomials on $[-1,1]$ defined by
\[
L_l(x) = \prod_{i=0,\, i \ne l}^{m-1} \Big( \frac{x - x_i}{x_l - x_i} \Big), \qquad L_l(x_i) = \delta_{il}, \quad i = 0, \dots, m - 1 .
\tag{2.17}
\]
Here, as above, $\{x_0, \dots, x_{m-1}\}$ are the roots of $P_m$. The “prescaling” functions $\rho_l(x) = L_l(x)/\sqrt{w_l}$ satisfy the following.

Proposition 2.2.1. (i) The functions $\rho_l(x)$, $l = 0, \dots, m - 1$ form an orthonormal basis for the subspace of $L^2([-1,1])$ of polynomials of degree less than $m$ on $[-1,1]$. (ii) One has $\rho_l(x) = \sqrt{w_l} \sum_{i=0}^{m-1} (i + 1/2) P_i(x_l) P_i(x)$ and (iii) any polynomial $p$ of degree less than $m$ can be expressed by the interpolation formula $p(x) = \sum_{l=0}^{m-1} \sqrt{w_l}\, p(x_l)\, \rho_l(x)$.

To prove (i), one expresses the inner product of $\rho_i$ and $\rho_l$ in terms of the Gauss–Legendre quadrature formula, substituting $L_l(x_i) = \delta_{il}$ to obtain $\langle \rho_i, \rho_l \rangle = \delta_{il}$. The representation of $\rho_l$ is obtained by expanding $L_l$ in the Legendre basis: $L_l(x) = \sum_{i=0}^{m-1} c_{il}\, (i + 1/2)^{1/2} P_i(x)$ where $c_{il} = (i + 1/2)^{1/2} \langle L_l, P_i \rangle$. Applying the quadrature formula again gives $c_{il} = (i + 1/2)^{1/2} P_i(x_l)\, w_l$ and (ii) follows from the definition of $\rho_l$. Since $\{\rho_l\}$ forms an orthonormal basis, one has $p = \sum_l \langle p, \rho_l \rangle \rho_l$ and (iii) follows from the quadrature formula and the interpolating property of $L_l$. This proves the proposition.

Now one defines the interpolating scaling functions
\[
\varphi_i(x) = \sqrt{w_i} \sum_{k=0}^{m-1} \pi^k(x_i)\, \pi^k(x)
\tag{2.18}
\]
in which the points $x_i$ are the zeros of $\pi^m$ and $w_i$ are the quadrature weights defined by (2.16).

2.2.4 Multiscaling properties

The inclusion $V^m_I \subset V^m_{I_l} + V^m_{I_r}$ expresses the space of polynomials of degree less than $m$, truncated to $I$, as a sum of such truncations to its left and right subintervals $I_l$ and $I_r$. In particular, since $\{\pi^n\}_{n=0}^{m-1}$ spans the space of polynomials of degree at most $m - 1$, cutoff to $[0,1)$, while their dilations span
the corresponding piecewise polynomials on the left and right subintervals, the vector $\pi = [\pi^0, \dots, \pi^{m-1}]^T$ is a scaling vector. Since $\varphi = [\varphi_0, \dots, \varphi_{m-1}]^T$ is obtained from the Legendre basis by an invertible linear transformation, it too forms a scaling vector. We will refer to these scaling vectors as Legendre and interpolating scaling vectors, respectively. Wavelets that correspond to the Legendre scaling vector are often called Alpert wavelets, but we will refer to them as Legendre wavelets. For $\pi(x) = [\pi^0, \dots, \pi^{m-1}]^T$ there are $m \times m$ scaling matrices $H_k = H^m_k$, $k = 0, 1$ such that
\[
\pi(x) = 2 \sum_{k=0}^{1} H_k\, \pi(2x - k) .
\tag{2.19}
\]
Similarly, one can construct wavelet matrices $G_k = G^m_k$ by applying Gram–Schmidt in thinking of $V^m_{I_l} + V^m_{I_r}$ as a completion of $V^m_I$. Set $H(z) = H_0 + H_1 z$ and $G(z) = G_0 + G_1 z$. Then (note normalization)
\[
L(z)\, L(z^{-1})^T = 2\, I_{2m} \quad \text{where} \quad L(z) = \begin{bmatrix} H(z) & H(-z) \\ G(z) & G(-z) \end{bmatrix} .
\]
The entries of $L(z)$ are computed by Gaussian quadrature [1, 337, 360] with $w_i$ as in (2.16) but now the nodes $x_l$ are the zeros of $\pi^m$. For example, $H_0 = \int_0^{1/2} \pi(x)\, \pi^T(2x)\, dx$ yields
\[
H_0 = \frac12 \sum_{l=0}^{m-1} w_l\, \pi\Big(\frac{x_l}{2}\Big) \pi^T(x_l); \qquad
H_1 = \frac12 \sum_{l=0}^{m-1} w_l\, \pi\Big(\frac{x_l + 1}{2}\Big) \pi^T(x_l),
\]
\[
G_0 = \frac12 \sum_{l=0}^{m-1} w_l\, \psi\Big(\frac{x_l}{2}\Big) \pi^T(x_l); \qquad
G_1 = \frac12 \sum_{l=0}^{m-1} w_l\, \psi\Big(\frac{x_l + 1}{2}\Big) \pi^T(x_l),
\]
where $\psi$ is the wavelet vector corresponding to $\pi$. Moreover, from the symmetry properties
\[
\pi^i\Big(\frac12 - x\Big) = (-1)^i\, \pi^i\Big(\frac12 + x\Big)
\]
(see (2.1)) one obtains, again using superscripts for the matrix entries,
\[
H_1^{il} = (-1)^{i+l} H_0^{il}; \qquad G_1^{il} = (-1)^{i+l+m} G_0^{il} .
\]
From the point of view of fast wavelet algorithms this means that one need only store the matrices $H_0$ and $G_0$. For the record, when $m = 3$ one has [6]
\[
H_{0,\mathrm{Leg}} = \frac{1}{4\sqrt2} \begin{bmatrix} 4 & 0 & 0 \\ -2\sqrt3 & 2 & 0 \\ 0 & -\sqrt{15} & 1 \end{bmatrix}; \qquad
G_{0,\mathrm{Leg}} = \frac{1}{12\sqrt6} \begin{bmatrix} 4\sqrt3 & 12 & -4\sqrt{15} \\ 0 & 3\sqrt3 & 9\sqrt5 \\ -2\sqrt{15} & -6\sqrt5 & -8\sqrt3 \end{bmatrix} .
\]
These are the basic scaling and wavelet matrices for the Legendre-type basis in Figure 2.1.
Fig. 2.1. Plots of Legendre scaling functions (left) and wavelets (right) for m = 3
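The $m = 3$ Legendre matrices can be cross-checked numerically with Gauss–Legendre quadrature. The sketch below is an illustration under stated assumptions, not the book's procedure: it uses NumPy's `leggauss` nodes and weights rescaled to $[0,1]$, and it scales the quadrature sums by $1/\sqrt2$, a normalization inferred here from the printed entries of $H_{0,\mathrm{Leg}}$ (it is the scaling under which $H_0 H_0^T + H_1 H_1^T = I$). The helper names are hypothetical, and the arrays `H0_printed` and `G0_printed` are the matrices quoted from [6] above.

```python
import numpy as np
from numpy.polynomial import legendre as npleg

def pi_vec(x, m):
    """Rows l = 0..m-1 hold pi^l(x) = sqrt(2l+1) P_l(2x-1) at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = 2.0 * x - 1.0
    return np.array([np.sqrt(2 * l + 1) * npleg.legval(t, [0.0] * l + [1.0])
                     for l in range(m)])

def legendre_scaling_matrices(m):
    """Quadrature evaluation of H_0, H_1, scaled by 1/sqrt(2) (inferred normalization)."""
    t, w = npleg.leggauss(m)               # exact for polynomials of degree <= 2m - 1
    x, w = (t + 1.0) / 2.0, w / 2.0        # nodes and weights rescaled to [0, 1]
    P = pi_vec(x, m)                       # P[l, q] = pi^l(x_q)
    H0 = (pi_vec(x / 2.0, m) * w) @ P.T / np.sqrt(2.0)
    H1 = (pi_vec((x + 1.0) / 2.0, m) * w) @ P.T / np.sqrt(2.0)
    return H0, H1

H0, H1 = legendre_scaling_matrices(3)

# Matrices as printed for m = 3.
H0_printed = np.array([[4, 0, 0],
                       [-2 * np.sqrt(3), 2, 0],
                       [0, -np.sqrt(15), 1]]) / (4 * np.sqrt(2))
G0_printed = np.array([[4 * np.sqrt(3), 12, -4 * np.sqrt(15)],
                       [0, 3 * np.sqrt(3), 9 * np.sqrt(5)],
                       [-2 * np.sqrt(15), -6 * np.sqrt(5), -8 * np.sqrt(3)]]) / (12 * np.sqrt(6))

# Symmetry rules: H1[i,l] = (-1)^(i+l) H0[i,l], G1[i,l] = (-1)^(i+l+m) G0[i,l], m = 3.
signs = (-1.0) ** np.add.outer(np.arange(3), np.arange(3))
G1_printed = -signs * G0_printed
```

With this scaling the computed $H_0$ agrees with the printed matrix, $H_1$ obeys the stated sign rule, and the printed $G_0$ completes $[H_0\ H_1]$ to an orthogonal $6 \times 6$ matrix, which is the content of $L(z) L(z^{-1})^T = 2 I_{2m}$.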
In the case of the interpolating scaling vector one has scaling and wavelet coefficients
\[
H_0^{il} = \frac{w_l}{2}\, \varphi^i\Big(\frac{x_l}{2}\Big); \qquad H_1^{il} = H_0^{m-i-1,\, m-l-1},
\]
\[
G_0^{il} = \frac{w_l}{2}\, \psi^i\Big(\frac{x_l}{2}\Big); \qquad G_1^{il} = (-1)^{i+m}\, G_0^{i,\, m-l-1} .
\]
Symmetries alternate among the generators. For example, when $m = 3$, $\varphi_i(x) = \pm \varphi_{m-i}(1 - x)$. Again, for the case $m = 3$ one has the basic scaling and wavelet matrices
\[
H_{0,\mathrm{int}} = \frac{1}{72\sqrt2} \begin{bmatrix}
42 + 12\sqrt{15} & 6\sqrt{10} + 12\sqrt6 & 6 \\
6\sqrt{10} - 15\sqrt6 & 42 & 6\sqrt{10} + 15\sqrt6 \\
6 & 6\sqrt{10} - 12\sqrt6 & 42 - 12\sqrt{15}
\end{bmatrix};
\]
\[
G_{0,\mathrm{int}} = \frac{1}{72} \begin{bmatrix}
-12\sqrt3 - 4\sqrt5 & 28\sqrt2 & 12\sqrt3 - 4\sqrt5 \\
6\sqrt{15} - 9 & -15\sqrt6 & 6\sqrt{15} + 9 \\
6\sqrt{15} - 26 & 4\sqrt{10} & -6\sqrt{15} - 26
\end{bmatrix} .
\]
Polynomial interpolation properties lead to nice multiresolution expansion formulas. Suppose that for a sufficiently high scale $N$, $f$ is well approximated by polynomials on each dyadic subinterval of length $2^{-N}$. In particular, if $f \in V^m_N$ then Gauss–Legendre quadrature yields the formulas
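\[
\langle f, \pi^i_I \rangle = 2^{-N/2} \sum_{l=0}^{m-1} f\Big( \frac{x_l}{2^N} + x_I \Big)\, \pi^i(x_l)\, w_l
\quad \text{and} \quad
\langle f, \varphi^i_I \rangle = 2^{-N/2} \sqrt{w_i}\, f\Big( \frac{x_i}{2^N} + x_I \Big) \quad (i = 0, \dots, m - 1)
\]
for the normalized Legendre and interpolating scaling coefficients, respectively. Here $x_I$ is the left endpoint of $I$ and $w_l$ is defined as in (2.16).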
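The interpolating scaling functions (2.18) and the $m = 3$ matrix $H_{0,\mathrm{int}}$ can be checked in the same spirit. The following sketch makes several assumptions of its own: nodes and weights come from NumPy's `leggauss`, rescaled to $[0,1]$; the helper names are hypothetical; and the entries are computed as $\sqrt{w_l/2}\, \varphi_i(x_l/2)$, a scaling inferred from the printed $m = 3$ entries (it makes $H_0 H_0^T + H_1 H_1^T = I$) that may differ from the normalization intended in the coefficient formula.

```python
import numpy as np
from numpy.polynomial import legendre as npleg

def pi_vec(x, m):
    """Rows l = 0..m-1 hold pi^l(x) = sqrt(2l+1) P_l(2x-1) at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = 2.0 * x - 1.0
    return np.array([np.sqrt(2 * l + 1) * npleg.legval(t, [0.0] * l + [1.0])
                     for l in range(m)])

def interpolating_scaling(m):
    t, w = npleg.leggauss(m)
    x, w = (t + 1.0) / 2.0, w / 2.0                 # nodes and weights on [0, 1]
    at_nodes = pi_vec(x, m)                         # at_nodes[k, i] = pi^k(x_i)

    def phi(y):
        """phi[i, q] = phi_i(y_q), evaluated from (2.18)."""
        return np.sqrt(w)[:, None] * (at_nodes.T @ pi_vec(y, m))

    H0 = phi(x / 2.0) * np.sqrt(w / 2.0)[None, :]   # inferred normalization
    H1 = H0[::-1, ::-1]                             # H1[i,l] = H0[m-i-1, m-l-1]
    return x, w, phi, H0, H1

x, w, phi, H0, H1 = interpolating_scaling(3)

s6, s10, s15 = np.sqrt(6), np.sqrt(10), np.sqrt(15)
H0_printed = np.array([[42 + 12 * s15, 6 * s10 + 12 * s6, 6],
                       [6 * s10 - 15 * s6, 42, 6 * s10 + 15 * s6],
                       [6, 6 * s10 - 12 * s6, 42 - 12 * s15]]) / (72 * np.sqrt(2))
```

Evaluating `phi` at the nodes returns a diagonal matrix with entries $1/\sqrt{w_i}$, which is the interpolation property behind the one-point coefficient formula for $\langle f, \varphi^i_I \rangle$ above.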
2.3 Multiwavelets based on fractal interpolation vectors

2.3.1 Fractal interpolation functions

The hat function $h(x) = (1 - |x|)\chi_{[-1,1]}(x)$ satisfies the refinement equation:
\[
h(x) = \frac12 h(2x + 1) + h(2x) + \frac12 h(2x - 1) \qquad (x \in \mathbb{R})
\]
and gives rise to an MRA of $L^2(\mathbb{R})$, but $h$ is not orthogonal to its shifts. The Schweinler–Wigner orthogonalization method [315] yields an orthogonal generator $\phi$ for $V(h)$ where $\hat\phi(\xi) = \hat h(\xi) / \big( \sum_k |\hat h(\xi + k)|^2 \big)^{1/2}$, but this $\phi$ does not have compact support. Alternatively, one can build complementary biorthogonal MRAs in which regularity is traded for length of the generators. There is a third possibility: to extend $V(h)$ to a finitely generated shift-invariant space (FSI) $V(\Phi)$ that (i) contains $V(h)$, (ii) has compactly supported, orthogonal generators and (iii) remains refinable. One hopes to optimize the tradeoff between regularity and wavelet transform complexity in doing so. One might also wish to preserve some semblance of the interpolation property: if $f = \sum_k c_k h(x - k) \in V(h)$ then $c_k = f(k)$. Suppose that $\Phi = [h, w]^T$ is a scaling vector with
\[
w(x) = \sum_k \big[ a_k\, w(2x - k) + b_k\, h(2x - k) \big] .
\tag{2.20}
\]
Requiring properties of w imposes constraints on ak , bk . For example, if w is minimally supported in [0, 1] then bk = 0 in (2.20) unless k = 1 while ak = 0 unless k = 0 or k = 1. Suddenly (2.20) reduces to w(x) = a0 w(2x) + a1 w(2x − 1) + b h(2x − 1).
(2.21)
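To get a feel for solutions of (2.21), the sketch below evaluates $w$ on a dyadic grid in the symmetric, normalized case $a_0 = a_1 = s$, $b = 1$ treated in the rest of this section, by iterating the refinement relation starting from zero. This is an illustration with choices made here: the function name, the grid depth and the sample value $s = -1/5$ are not prescribed by the text. Because $w(0) = w(1) = 0$, each pass of the iteration fixes the values at one more dyadic level, so `levels` passes suffice for a grid of spacing $2^{-\text{levels}}$.

```python
import numpy as np

def fractal_interpolation_function(s, levels=10):
    """Grid values of w(x) = s w(2x) + s w(2x-1) + h(2x-1) on [0, 1] (hypothetical helper)."""
    N = 2 ** levels
    i = np.arange(N + 1)
    x = i / N
    hat = 1.0 - np.abs(2.0 * x - 1.0)          # h(2x - 1) restricted to [0, 1]
    # On [0, 1/2] only w(2x) contributes; on [1/2, 1] only w(2x - 1) does.
    src = np.where(i <= N // 2, 2 * i, 2 * i - N)
    w = np.zeros(N + 1)
    for _ in range(levels):
        w = s * w[src] + hat
    return x, w

s = -0.2
x, w = fractal_interpolation_function(s, levels=10)
N = len(x) - 1
```

The values can be checked by hand from the recursion: $w(1/2) = 1$, $w(1/4) = s + 1/2$ and $w(1/8) = s(s + 1/2) + 1/4$, and the computed $w$ is symmetric about $x = 1/2$.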
Moreover, if $w$ is to be symmetric then normalizing $b = 1$ forces $a_0 = a_1 \equiv s$. Remaining design concerns are regularity and (bi)-orthogonality. Since $h$ is merely Lipschitz one cannot expect better behavior of $w$, but it could be worse, depending on $s$. For $\alpha \in (0,1]$ let $C_0^\alpha([0,1])$ denote the space of $\alpha$-Hölder continuous functions supported in $[0,1]$, with seminorm
\[
\|f\|_{C^\alpha} = \sup_{x, y \in [0,1]} \frac{|f(x) - f(y)|}{|x - y|^\alpha} .
\]
The closed subspace $S^\alpha_{1/2}$ of functions symmetric about $1/2$ is invariant under the linear operator
\[
\Gamma_s f(x) = s\, \big( f(2x) + f(2x - 1) \big) .
\]

Lemma 2.3.1. Let $0 \le \alpha \le 1$. If $|s| < 2^{-\alpha}$ then $\Gamma_s$ is a contraction on $S^\alpha_{1/2}$. Consequently, there is a unique fixed point $w \in S^\alpha_{1/2}$ satisfying
\[
w(x) = s\, w(2x) + s\, w(2x - 1) + h(2x - 1), \qquad x \in [0,1] .
\tag{2.22}
\]
Such a solution $w$ will be called a fractal interpolation function.

Proof. If $x, y \in [0, 1/2]$ and $f \in C_0^\alpha([0,1])$ then $f(2x - 1) = 0 = f(2y - 1)$ and
\[
\frac{|\Gamma_s f(x) - \Gamma_s f(y)|}{|x - y|^\alpha} = |s| \Big( \frac{|f(2x) - f(2y)|}{|x - y|^\alpha} \Big) \le 2^\alpha |s|\, \|f\|_{C^\alpha} .
\]
The same estimate applies if $x, y \in [1/2, 1]$. Suppose now that $x \in [0, 1/2]$ and $y \in [1/2, 1]$ (or vice versa). The locations of $x$ and $y$ imply that $|x + y - 1| \le |1/2 - x| + |y - 1/2| = |x - y|$. Since $f \in S^\alpha_{1/2}$ implies $f(2y - 1) = f(2 - 2y)$, one has
\[
\frac{|\Gamma_s f(x) - \Gamma_s f(y)|}{|x - y|^\alpha}
 = \frac{|s|\, |f(2x) - f(2y - 1)|}{|x - y|^\alpha}
 = 2^\alpha |s|\, \frac{|f(2x) - f(2 - 2y)|}{|2x - (2 - 2y)|^\alpha}\, \frac{|x + y - 1|^\alpha}{|x - y|^\alpha}
 \le 2^\alpha |s|\, \|f\|_{C^\alpha} .
\]
Hence $\|\Gamma_s\| \le 2^\alpha |s| < 1$ as an operator on $S^\alpha_{1/2}$. Thus, $\Gamma_s$ is a contraction in $S^\alpha_{1/2}$ as claimed.

A solution of (2.22) in $S^\alpha_{1/2}$ satisfies
\[
(I - \Gamma_s)\, w = h(2x - 1) .
\]
The contraction mapping theorem implies that a unique solution exists. It can be computed by the Neumann expansion $w = \sum_{n=0}^{\infty} \Gamma_s^n \big( h(2 \cdot - 1) \big)$. The scaling vector $\Phi = [h, w]^T$ generates a closed subspace $V(\Phi)$ of $L^2(\mathbb{R})$ that is, heuristically, twice the size of $V(h)$. Since $w$ is supported in $[0,1]$, if $f(x) = \sum_k [c_k h(x - k) + d_k w(x - k)]$ then $f(k) = c_k$ while $d_k = \{2 f(k + 1/2) - (f(k) + f(k + 1))\} / (2 w(1/2))$ since $h(\pm 1/2) = 1/2$.

2.3.2 DGHM multiwavelets

One can use Gram–Schmidt to biorthogonalize the scaling vector $\Phi_s = [h, w_s]^T$. There are a couple of issues here. First, as the scaling vectors are parameterized by $s$, it would be nice to find biorthogonal generators within this family. This means that one seeks a pair $\{\phi^1_s, \phi^2_s\}$ in $V(\Phi_s)$ and a parameter
2 Derivatives and multiwavelets
$\tilde s$ such that $\langle \phi^i_s(\cdot), \phi^l_{\tilde s}(\cdot - k)\rangle = \delta_{0k}\,\delta_{il}$. We also wish that $\phi^i_s$ remain minimally supported and symmetric. Set
\[ \phi^1_s = \gamma\,\big(h - \beta\,(w_s(\cdot) + w_s(\cdot + 1))\big), \qquad \phi^2_s = \delta\, w_s, \]
with $\beta, \gamma, \delta$ to be chosen. In this formulation, $\phi^1_s$ is supported in $[-1, 1]$ and symmetric with respect to the origin while, of course, $\phi^2_s$ is supported in $[0, 1]$ and symmetric with respect to $x = 1/2$. Except in degenerate cases, these relations define a refinable vector that generates the same base space $V(\Phi)$ as $[h, w_s]^T$. The inner products between the shifts of $h$ and of $w_s$ are simple to determine from the refinement equation and change of variables. Once these are known, a Gram–Schmidt type biorthogonalization argument can be used to solve for the four parameters $\gamma$, $\beta$, $\delta$ and $\tilde s$ in order that the equations $\langle \phi^i_s(\cdot), \phi^j_{\tilde s}(\cdot - k)\rangle = \delta_{0k}\,\delta_{ij}$ hold. Once the support conditions are imposed, the dual pairs are determined uniquely provided, in addition, the scaling coefficients for $\Phi_s = [\phi^1_s, \phi^2_s]^T$ depend linearly on $s$ [246]. Without this condition the dual scaling vector is not uniquely defined, just as in the case of singly generated MRAs. Rather than listing the coefficients $\gamma, \beta, \delta$ here we will write out refinement filters below. Curiously, the equations of biorthogonality are satisfied within the parametric family $\Phi_s$ when $\tilde s = (1 + 2s)/(5s - 2)$; see [176]. In particular, when $s = -1/5$ one has $\tilde s = s$ and the $\phi^i_s$ define orthogonal multiscaling functions. In Table 2.1 we list the nonzero coefficient matrices $C_k$ and $D_k$ of the respective scaling and wavelet filters $H_s(z)$ and $G_s(z)$ that are obtained from application of Gram–Schmidt.

Table 2.1. DGHM scaling and wavelet coefficients
\[
\begin{aligned}
C_{-2} &= \frac{1}{24}\begin{bmatrix} 0 & -(1+2s)\sqrt{2} \\ 0 & 0 \end{bmatrix}, &
D_{-2} &= \frac{1}{24}\begin{bmatrix} 0 & -(1+2s)\sqrt{2} \\ 0 & -2-4s \end{bmatrix}, \\
C_{-1} &= \frac{1}{24}\begin{bmatrix} 8s-2 & (5-2s)\sqrt{2} \\ 0 & 0 \end{bmatrix}, &
D_{-1} &= \frac{1}{24}\begin{bmatrix} 8s-2 & (5-2s)\sqrt{2} \\ (8s-2)\sqrt{2} & 10-4s \end{bmatrix}, \\
C_{0} &= \frac{1}{24}\begin{bmatrix} 12 & (5-2s)\sqrt{2} \\ 0 & 8+4s \end{bmatrix}, &
D_{0} &= \frac{1}{24}\begin{bmatrix} -12 & (5-2s)\sqrt{2} \\ 0 & 4s-10 \end{bmatrix}, \\
C_{1} &= \frac{1}{24}\begin{bmatrix} 8s-2 & -(1+2s)\sqrt{2} \\ (8-8s)\sqrt{2} & 8+4s \end{bmatrix}, &
D_{1} &= \frac{1}{24}\begin{bmatrix} 8s-2 & -(1+2s)\sqrt{2} \\ (2-8s)\sqrt{2} & 2+4s \end{bmatrix}.
\end{aligned}
\]
Biorthogonality for $\tilde H_s = H_{\tilde s}$ and $\tilde G_s = G_{\tilde s}$ is easy to check with a symbolic algebra package. Verification that the scaling functions and wavelets form biorthogonal bases for $L^2(\mathbb{R})$ within the range $-1 < s < 1/7$ on which $s \mapsto \tilde s$ is one-to-one and onto can be found in [155]. We summarize these observations as follows (cf. [176, 246]).
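The filter identities can also be spot-checked numerically instead of symbolically. The sketch below transcribes the matrices of Table 2.1, evaluates $H_s$ and $G_s$ at points of the unit circle, and tests the perfect-reconstruction identities against the filters with parameter $\tilde s$. The helper names (`dual_parameter`, `dghm_filters`, `evaluate`, `cross_term`) and the sample values of $s$ are assumptions made for illustration; this is a floating-point check at a few points, not a proof.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def dual_parameter(s):
    """The map s -> (1 + 2s) / (5s - 2)."""
    return (1.0 + 2.0 * s) / (5.0 * s - 2.0)

def dghm_filters(s):
    """Coefficient matrices {k: C_k} and {k: D_k} transcribed from Table 2.1."""
    a, b, c = 1 + 2 * s, 5 - 2 * s, 8 * s - 2
    C = {-2: [[0, -a * SQRT2], [0, 0]],
         -1: [[c, b * SQRT2], [0, 0]],
          0: [[12, b * SQRT2], [0, 8 + 4 * s]],
          1: [[c, -a * SQRT2], [(8 - 8 * s) * SQRT2, 8 + 4 * s]]}
    D = {-2: [[0, -a * SQRT2], [0, -2 * a]],
         -1: [[c, b * SQRT2], [c * SQRT2, 2 * b]],
          0: [[-12, b * SQRT2], [0, -2 * b]],
          1: [[c, -a * SQRT2], [-c * SQRT2, 2 * a]]}
    scale = lambda M: {k: np.array(v, dtype=float) / 24.0 for k, v in M.items()}
    return scale(C), scale(D)

def evaluate(coeffs, z):
    """Evaluate the matrix Laurent polynomial sum_k coeffs[k] * z**k."""
    return sum(M * z ** k for k, M in coeffs.items())

def cross_term(A, B, z):
    """A(z) B(z)^* + A(-z) B(-z)^* for |z| = 1 (star = conjugate transpose)."""
    return (evaluate(A, z) @ evaluate(B, z).conj().T
            + evaluate(A, -z) @ evaluate(B, -z).conj().T)

s = 0.0
s_dual = dual_parameter(s)              # equals -1/2 for s = 0
C, D = dghm_filters(s)
C_dual, D_dual = dghm_filters(s_dual)
z = np.exp(-2j * np.pi * 0.3)
scaling_identity = cross_term(C, C_dual, z)   # expected: 2x2 identity
mixed_identity = cross_term(C, D_dual, z)     # expected: 2x2 zero matrix
```

The same helpers confirm that $s \mapsto \tilde s$ is an involution with fixed point $s = -1/5$ inside $(-1, 1/7)$.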
Theorem 2.3.2. Let $-1 < s < 1/7$ and set $\tilde s = (1 + 2s)/(5s - 2)$. Let $\Phi_s = (\phi^1_s, \phi^2_s)$ be the modified DGHM multiscaling function with filter $H_s$ and let $\{V_j\}$ be the corresponding MRA generated by its components. Then:
(a) $\Phi_s$ is a continuous multiscaling vector biorthogonal to $\Phi_{\tilde s}$.
(b) $\phi^1_s$ is supported on $[-1, 1]$ and symmetric about $x = 0$ while $\phi^2_s$ is supported on $[0, 1]$ and symmetric about $x = 1/2$.
(c) $h(x) = (1 - |x|)\chi_{[-1,1]}(x) \in V(\Phi_s)$.
(d) The associated multiwavelet vector $\Psi_s$ with filter $G_s$ is continuous, supported in $[-1, 1]$ and biorthogonal to $\Psi_{\tilde s}$.
(e) $\psi^1_s$ is symmetric about $x = 0$ while $\psi^2_s$ is antisymmetric about $x = 0$.

Because of the important use that we shall make of the symmetries and supports of these functions, for a fixed $s$ we will write $\phi^1 = \phi^e$, emphasizing its even symmetry with respect to zero, and $\phi^2 = \phi^i$, emphasizing that it lives inside $[0, 1]$. We shall also write $\psi^1 = \psi^e$ and $\psi^2 = \psi^o$, emphasizing their even and odd symmetries, respectively. We will omit reference to the parameter $s$ when it is not necessary. We refer to these wavelets as DGHM wavelets, as the construction leading to them is due to Donovan, Geronimo, Hardin and Massopust [123].

2.3.3 Multiwavelets and Sobolev spaces on $\mathbb{R}_+$

The DGHM wavelets are excellent candidates for numerical solution of boundary value problems on intervals. One piece of evidence supporting this assertion is that they provide coefficient characterizations of Sobolev spaces on $\mathbb{R}_+$ and on $[0, 1]$. Suitable boundary truncations of the DGHM wavelets form (bi)-orthogonal bases for $H^\alpha(\mathbb{R}_+)$ when $0 \le \alpha \le 1$. Higher regularity requires a smoother interpolation scheme based on longer scaling vectors. Sobolev spaces behave well under restrictions (as in the Sobolev embedding theorem) and extensions. Theorem 2.3.3 shows that DGHM wavelets can be used to extend functions from the half-line to the whole line, continuously in $H^\alpha$ (cf. [176, 246] for variations).
The Sobolev space $H^\alpha(\mathbb{R}_+)$ consists of restrictions to $[0, \infty)$ of functions in $H^\alpha(\mathbb{R})$ with quotient norm given by $\inf\{\|\tilde f\|_{H^\alpha(\mathbb{R})} : \tilde f \chi_{\mathbb{R}_+} = f\}$.

Remark. Theorem 2.1.2 implies that any basis of compactly supported wavelets in $C^\alpha(\mathbb{R}^n)$ provides expansions $f = \sum_{jk} c_{jk}\psi_{jk}$ converging to $f$ in $H^\alpha(\mathbb{R})$ when $\sum_{j=1}^{\infty} 2^{2j\alpha} \sum_{k \in \mathbb{Z}^n} |c_{jk}|^2 < \infty$. This fact extends routinely to the DGHM multiwavelets.

In this section $V_j$ and $W_j$ are the multiresolution spaces generated by $\Phi_s$ and $\Psi_s$ for a fixed $s \in (-1/2, 0]$. Table 2.2 summarizes multiresolution spaces on the half-line obtained by restricting appropriate generators to $[0, \infty)$. The spaces indicated represent the components of the multiresolution spaces in $\mathbb{R}$ that live inside $\mathbb{R}_+$ and at the interior boundary of $\mathbb{R}_+$, respectively.
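Table 2.2. Multiresolution spaces for $[0, \infty)$

Multiresolution space | spanned by or sum of | type
$V_j^{\mathrm{int}}(\mathbb{R}_+)$ | $\{\phi^e_{jk}\}_{k=1}^{\infty} \cup \{\phi^i_{jk}\}_{k=0}^{\infty}$ | interior
$W_j^{\mathrm{int}}(\mathbb{R}_+)$ | $\{\psi^e_{jk}\}_{k=1}^{\infty} \cup \{\psi^o_{jk}\}_{k=1}^{\infty}$ | interior
$V_j^{\mathrm{bd}}(\mathbb{R}_+)$ | $\phi^e_{j0}\chi_{[0,\infty)}$ | boundary
$W_j^{\mathrm{bd}}(\mathbb{R}_+)$ | $\psi^e_{j0}\chi_{[0,\infty)}$ | boundary
$L^2_{\mathrm{int}}(\mathbb{R}_+)$ | $V_0^{\mathrm{int}}(\mathbb{R}_+) \oplus \big[\oplus_{j \ge 0} W_j^{\mathrm{int}}(\mathbb{R}_+)\big]$ | interior
$L^2_{\mathrm{bd}}(\mathbb{R}_+)$ | $V_0^{\mathrm{bd}}(\mathbb{R}_+) \oplus \big[\oplus_{j \ge 0} W_j^{\mathrm{bd}}(\mathbb{R}_+)\big]$ | boundary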
The corresponding $L^2$-spaces $L^2_{\mathrm{bd}}(\mathbb{R}_+)$ and $L^2_{\mathrm{int}}(\mathbb{R}_+)$ defined in the table are oblique direct sums when $s \ne -1/5$.

Why only symmetric boundary terms are used. The space $L^2_{\mathrm{bd}}(\mathbb{R}_+)$ in Table 2.2 is generated by restrictions to $[0, \infty)$ of $\phi^e\chi_{[0,\infty)}$ and the even wavelets $\psi^e_{j0}\chi_{[0,\infty)}$. Thought of as an element of $L^2(\mathbb{R})$ that vanishes for all $x < 0$, any element of $L^2_{\mathrm{bd}}(\mathbb{R}_+)$ can be written either in terms of the even scaling and wavelet terms or the odd wavelet terms: the sum of the odd terms inside must equal the sum of the even terms inside, since these sums cancel for almost all $x < 0$. However, this flipping of even and odd expansions does not respect convergence in $H^\alpha$ when $\alpha > 1/2$. To outline what can go wrong, consider the wavelets in Figure 2.2 with $s = 0$. In this case $\psi^e$ is piecewise linear so we may write $\psi^e(x) = ax + b$, $b \ne 0$, on $[0, 1/2^{j_0}]$ for large enough $j_0$. Then, for $j > j_0$ one estimates
\[ \langle \psi^e \chi_{[0,\infty)}, \tilde\psi^o_j \rangle = 2^{j/2} \int_0^1 (ax + b)\,\tilde\psi^o(2^j x)\,dx \approx c\,2^{-j/2} + d\,2^{-3j/2}, \]
where $c = b\int_0^1 \tilde\psi^o(x)\,dx$ and $d = a\int_0^1 u\,\tilde\psi^o(u)\,du$. One can check that $c \ne 0$. Thus, when expanded in terms of the $\psi^o_{j0}\chi_{[0,\infty)}$, the coefficients of $\psi^e\chi_{[0,\infty)}$ decay no faster than $c\,2^{-j/2}$. Consequently, $\sum_{j=1}^{\infty} 2^{2j\alpha}\,|\langle \psi^e\chi_{[0,\infty)}, \tilde\psi^o_j\rangle|^2$ fails to converge whenever $\alpha \ge 1/2$. The expansion of $\psi^e\chi_{[0,\infty)}$ in terms of the odd wavelets fails to identify $\psi^e\chi_{[0,\infty)}$ as the restriction to $[0, \infty)$ of a function in $H^\alpha(\mathbb{R})$ when $\alpha \in [1/2, 1]$. This dichotomy at $\alpha = 1/2$ is expected since the subspace $H_0^\alpha(\mathbb{R}_+)$ of functions whose extension by zero to $(-\infty, 0]$ belongs to $H^\alpha(\mathbb{R})$ differs from $H^\alpha(\mathbb{R}_+) = \{f\chi_{[0,\infty)} : f \in H^\alpha(\mathbb{R})\}$ when $\alpha \ge 1/2$ (see [2]). In summary, even frame expansions serve to identify convergence in $H^\alpha(\mathbb{R}_+)$.

It follows from the Lipschitz property of the DGHM wavelets that, for any $\alpha \in [0, 1]$, $H^\alpha(\mathbb{R}) \cap W_j^{\mathrm{int}}(\mathbb{R}_+) \subset H_0^\alpha(\mathbb{R}_+)$. The even extension of a function $f$
defined on $\mathbb{R}_+$ is $f^e(-x) = f(x)$ when $x \ge 0$. The following theorem extends to half-spaces in higher dimensions [2].

Theorem 2.3.3. The even extension mapping $f \mapsto f^e$ is continuous from $H^\alpha(\mathbb{R}_+)$ to $H^\alpha(\mathbb{R})$ whenever $0 \le \alpha \le 1$.

Corollary 2.3.4. If $f \in L^2_{\mathrm{bd}}(\mathbb{R}_+) \cap H^\alpha(\mathbb{R}_+)$ $(0 \le \alpha \le 1)$ then the frame expansion $f = c_0\,\phi^e\chi_{[0,\infty)} + \sum_{j=0}^{\infty} c_j\,\psi^e_{j0}\chi_{[0,\infty)}$ satisfies $\sum_{j=0}^{\infty} 2^{2j\alpha}|c_j|^2 < \infty$ and hence converges to $f$ in $H^\alpha(\mathbb{R}_+)$.

Proof (of Corollary 2.3.4). Consider the mapping $f \mapsto f^e$. The biorthogonality and symmetry properties of the wavelets guarantee that, as the even extension of an element in $L^2_{\mathrm{bd}}$, $f^e$ has a formal expansion $f^e = c_0\,\phi^e_{10} + \sum_{j=1}^{\infty} c_j\,\psi^e_{j0}$ where $c_j = \langle f^e, \tilde\psi^e_{j0}\rangle$. But by Theorem 2.3.3, $f^e \in H^\alpha(\mathbb{R})$ so $\sum_{j=1}^{\infty} 2^{2j\alpha}|c_j|^2 < \infty$. Restricting $f^e$ back to $[0, \infty)$ gives the desired conclusion.
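The role of the threshold $\alpha = 1/2$ in this subsection can be seen on a model sequence. The sketch below assumes coefficients of exactly the size $|c_j| = c\,2^{-j/2}$ (the slowest decay allowed by the estimate for the odd expansion) and forms partial sums of $\sum_j 2^{2j\alpha}|c_j|^2$; the function name and the sample exponents are illustrative.

```python
def weighted_partial_sum(alpha, levels, c=1.0):
    """Partial sum over j = 1..levels of 2**(2*j*alpha) * |c * 2**(-j/2)|**2."""
    return sum(2.0 ** (2 * j * alpha) * (c * 2.0 ** (-j / 2)) ** 2
               for j in range(1, levels + 1))

below_threshold = weighted_partial_sum(0.25, 200)  # geometric series, ratio 2**(-1/2)
at_threshold = weighted_partial_sum(0.5, 40)       # every term equals c**2
```

For $\alpha < 1/2$ the partial sums stabilize, while at $\alpha = 1/2$ they grow linearly with the number of levels, which is the divergence described in the text.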
2.3.4 Strela's two-scale transform and commutation

Lemarié-Rieusset's technique of smoothing and roughening biorthogonal uniwavelet pairs outlined in Section 2.1.2 makes use of the simple observation that
\[ -2\pi i \xi \Big( \prod_{k=1}^{\infty} e^{-\pi i \xi/2^k} \cos\Big(\frac{\pi \xi}{2^k}\Big) H\Big(\frac{\xi}{2^k}\Big) \Big) = (e^{-2\pi i \xi} - 1)\,\hat\varphi(\xi) = -\Big(\frac{d\varphi^+}{dx}\Big)^{\wedge}(\xi), \]
with a similar identity for a roughened scaling function provided $H$ has a zero at $\xi = 1/2$. In the $z$-domain ($z = e^{-2\pi i \xi}$) it is the same to require that $H$ is divisible by $z + 1$. A parallel smoothing and roughening scheme applies in the multiwavelet setting but now nondegeneracy is required of the filter matrix $H$ as formalized in Strela's thesis [333]; cf. [334]. In fact, in his thesis Strela developed several principles also for preserving desirable properties such as symmetry and minimal support when transforming one multiwavelet family into another. We will state the corresponding transformation recipes. Rather than providing proofs, we will illustrate their implementation in the DGHM case. Again, we refer to [333] and [334] for complete justifications.

Two-scale similarity transform. First we consider a filter transformation that does not effectively change the multiresolution structure generated by a scaling vector $\Phi$. The principle is the same as in the case of uniwavelets. Set $\Phi^{\mathrm{new}} = \sum_k A_k \Phi(x - k)$ where $A_k$ are coefficient matrices. Then $\hat\Phi^{\mathrm{new}}(\xi) = \big(\sum_k A_k e^{-2\pi i k \xi}\big)\hat\Phi(\xi)$. If $A(\xi) = \sum_k A_k e^{-2\pi i k \xi}$ is an absolutely convergent Fourier series with nonvanishing determinant for each $\xi$ then it has an inverse with the same properties (cf. [29]) and $\Phi^{\mathrm{new}}$ generates the same multiresolution analysis as does $\Phi$. In fact,
Fig. 2.2. Plots of DGHM scaling functions (top left) and wavelets (top right) and their duals (bottom left and right) for s = 0
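\[ \hat\Phi^{\mathrm{new}}(\xi) = A(\xi)\,\hat\Phi(\xi) = A(\xi)\,H\Big(\frac{\xi}{2}\Big)\hat\Phi\Big(\frac{\xi}{2}\Big) = A(\xi)\,H\Big(\frac{\xi}{2}\Big) A^{-1}\Big(\frac{\xi}{2}\Big)\hat\Phi^{\mathrm{new}}\Big(\frac{\xi}{2}\Big). \]
Thus, in the $z$-domain the scaling filters of $\Phi$ and $\Phi^{\mathrm{new}}$ are related by the two-scale similarity transform (TST)
\[ H^{\mathrm{new}}(z) = A(z^2)\,H(z)\,A^{-1}(z). \tag{2.23} \]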
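As a hedged illustration of (2.23), the sketch below applies a TST to matrix Laurent polynomials evaluated pointwise. The filter `H_toy` and transition matrix `A_toy` are made-up data, not taken from the text; they are chosen only so that $\det A(z) = 2$ never vanishes and $H(1)$ has eigenvalues $1$ and $1/2$. At $z = 1$ the transform reduces to an ordinary similarity, so the spectrum of $H(1)$ is preserved.

```python
import numpy as np

def evaluate(coeffs, z):
    """Evaluate the matrix Laurent polynomial sum_k coeffs[k] * z**k."""
    return sum(np.asarray(M, dtype=complex) * z ** k for k, M in coeffs.items())

def two_scale_similarity(H, A):
    """Return the function z -> A(z^2) H(z) A(z)^{-1}, cf. (2.23)."""
    def H_new(z):
        return evaluate(A, z * z) @ evaluate(H, z) @ np.linalg.inv(evaluate(A, z))
    return H_new

# Toy data (illustrative assumptions, not from the text).
H_toy = {0: [[0.5, 0.1], [0.0, 0.25]], 1: [[0.5, -0.1], [0.2, 0.25]]}
A_toy = {0: [[2.0, 0.0], [0.0, 1.0]], 1: [[0.0, 1.0], [0.0, 0.0]]}
H_new = two_scale_similarity(H_toy, A_toy)

eig_old = np.sort(np.linalg.eigvals(evaluate(H_toy, 1.0)).real)
eig_new = np.sort(np.linalg.eigvals(H_new(1.0)).real)
```

Evaluating pointwise avoids symbolic inversion of $A(z)$; it is adequate for spot checks but does not by itself produce the coefficient matrices of $H^{\mathrm{new}}$.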
Singular two-scale transforms and smoothing. If A(z) is singular for some |z| = 1 then the TST defined by (2.23) will no longer be MRA preserving.
Instead, it can raise or lower the regularity of the MRA [334]. As in the case of scalar MRAs, an adjustment should be made here, multiplying the corresponding TST by a factor $1/2$, so that the new scaling function has the desired relationship with the original.

Theorem 2.3.5. Suppose that the MRA generated by the scaling filter $H(z)$ locally produces polynomials of degree $p$. Suppose that $H(1)u = u$ and let $S$ be a matrix polynomial such that $\det(S(z)) = c(1 - z)$ and $S(1)u = 0$. Then
\[ H^+(z) = \frac{1}{2}\,S(z^2)\,H(z)\,S^{-1}(z) \]
generates an MRA that locally produces polynomials of degree $p + 1$.

We will only give a rough sketch of the proof. The main step is to encode antidifferentiation in terms of properties of the two-scale transform. Let $\Phi$ be the scaling vector with filter $H$. The vector $\Phi^+$ defined by
\[ \hat\Phi^+(\xi) = \frac{1}{2\pi i \xi}\,S(\xi)\,\hat\Phi(\xi) \tag{2.24} \]
is, formally, a solution of the scaling equation $\hat\Phi^+(\xi) = H^+(\xi/2)\,\hat\Phi^+(\xi/2)$ since
\[ S(2\xi)\,\hat\Phi(2\xi) = S(2\xi)\,H(\xi)\,\hat\Phi(\xi) = S(2\xi)\,[2\,S^{-1}(2\xi)\,H^+(\xi)\,S(\xi)]\,\hat\Phi(\xi) = 2H^+(\xi)\,S(\xi)\,\hat\Phi(\xi) = 4\pi i \xi\,H^+(\xi)\,\hat\Phi^+(\xi). \]
One must justify convergence of the infinite product defining $H^+(z)$. This is where the eigenvector property comes in [334]. One should also show that the scaling equation has a unique solution. This is a nontrivial step, but one implied, for example, by the work of Colella and Heil [92]. The formal definition of $\Phi^+$ plus uniqueness then imply that the derivative of $\Phi^+$ belongs to the MRA space generated by $\Phi$. Elements of $V(\Phi^+)$ are essentially integrals of elements of $V(\Phi)$, as in the uniwavelet case. Thus, if $V(\Phi)$ locally contains polynomials of degree $m$ then $V(\Phi^+)$ locally contains polynomials of degree $m + 1$. Taking inverse Fourier transforms of (2.24) one has
\[ \frac{d}{dx}\Phi^+ = T_S\,\Phi, \]
where $T_S$ is a distributional convolution operator whose convolution matrix is the inverse Fourier transform of $S(\xi)$. When $S$ is a matrix trigonometric polynomial, $T_S$ is a matrix polynomial in the shift operator $\sigma f(x) = f(x - 1)$ obtained by replacing $z$ by $\sigma$ in the transition matrix $S$. Under a suitable cancellation condition, the new multiscaling function $\Phi^+$ will also be compactly supported when $\Phi$ is (see below).
Roughening. A dual approach leads to roughened scaling functions. Suppose that $\Phi$ is Lipschitz continuous, locally produces linear functions and has filter $H(z)$ such that $H(1)$ has a simple eigenvalue $\lambda = 1$ with left eigenvector $u$. One seeks a transition matrix $R(z)$ such that (i) $R(z)$ is invertible for all $z \ne 1$ and (ii) $u^T R(1) = 0$, with $\lambda = 0$ a simple eigenvalue. Given such an $R$, the two-scale similarity transform inverse filter
\[ H^-(z) = 2\,R^{-1}(z^2)\,H(z)\,R(z) \]
defines a new multiscaling function $\Phi^-$ satisfying $-2\pi i \xi\,\hat\Phi(\xi) = R(\xi)\,\hat\Phi^-(\xi)$, or
\[ \frac{d}{dx}\Phi = T_R\,\Phi^-, \]
where $T_R$ is a matrix with polynomial entries in the shift operator as before.

Commutation and biorthogonality. If $\Phi$ and $\tilde\Phi$ generate a biorthogonal pair of MRAs, once the roughening matrix $R(z)$ is found, one can choose $S(z) = -R^*(z)$ as the smoothing matrix to be applied to a dual scaling vector. Then $T_S = -T_R^*$, where $T^*$ here denotes the Hermitian adjoint of $T$. The new scaling pair generates a new pair of biorthogonal MRAs.

Preserving symmetry and support. The only condition imposed so far on $S$, $R$ is the eigenvalue condition. This leaves some flexibility in the construction of transition matrices. To make matters more concrete we return to the case of the DGHM multiscaling functions that have support and symmetry properties worthy of preserving under two-scale transforms. To minimize support of $\Phi^+$ the entries of $R(z)$ should be linear in $z$ (see [333], Lemma 3.6.2, p. 54). Set $R(z) = R_0 + R_1 z$ so that $R^*(z) = R_0^* + R_1^* z^{-1}$. The corresponding transition operators are given by
\[ T_R = R_0 + R_1\sigma, \qquad T_{R^*} = R_0^* + R_1^*\sigma^{-1} = T_R^*, \]
since the shift operator $\sigma$ satisfies $\sigma^* = \sigma^{-1}$. Strela also found a necessary condition such that the two-scale transform preserves any symmetries of the scaling vector: Suppose $d\Phi/dx = -T_R\Phi^-$, where both $\Phi = [\phi_1, \dots, \phi_m]^T$ and $\Phi^- = [\phi_1^-, \dots, \phi_m^-]^T$ are refinable and have components that are symmetric or antisymmetric about the points $t_1, \dots, t_m$ and $t_1^-, \dots, t_m^-$ respectively. Then necessarily:
\[ R(z) = E(z)\,R(z^{-1})\,E_-^{-1}(z), \]
where
\[ E(z) = \mathrm{diag}(\pm z^{2t_1}, \dots, \pm z^{2t_m}), \qquad E_-(z) = \mathrm{diag}(\pm z^{2t_1^-}, \dots, \pm z^{2t_m^-}). \]
The role of $E_-^{-1}$ is to shift symmetry points of $\Phi^-$ to the origin. The plus/minus signs amount to preserving/switching parities. Then $E$ moves the symmetry points back to those of $\Phi$.
2.3.5 Smoothing and roughening DGHM scaling filters

For the filter $H_s$ defined in Table 2.1, the matrix
\[ H_s(1) = \frac{1}{3}\begin{bmatrix} 2s+1 & \sqrt{2}\,(1-s) \\ \sqrt{2}\,(1-s) & 2+s \end{bmatrix} \]
has eigenvectors $[\sqrt{2}, -1]^T \leftrightarrow s$ and $[1, \sqrt{2}]^T \leftrightarrow 1$. Thus, the roughening criterion that $H_s(1)$ has simple eigenvalue one is satisfied. One seeks a matrix $R(z)$ linear in $z$ such that $[1, \sqrt{2}]\,R(1) = 0$. Since the components of $\Phi$ are symmetric with respect to $x = 0$ and $x = 1/2$ respectively, as prescribed above we set $E(z) = \mathrm{diag}(-1, -z)$ where minus signs reflect the antisymmetries of $d\Phi/dx$. To make matters simple, suppose that one wants the components of $\Phi^-$ to be symmetric and antisymmetric, respectively, at the origin. Then one should take $E_-(z) = \mathrm{diag}(1, -1)$. The simplest matrix $R(z)$ satisfying these eigenvalue, linearity, and given symmetry constraints is
\[ R(z) = \begin{bmatrix} 0 & 2\sqrt{2} \\ 1 - z & -1 - z \end{bmatrix}. \tag{2.25} \]
We come to the following.

Theorem 2.3.6. Let $-1/2 < s < 1/7$ so that $\Phi_s$ is Lipschitz continuous while $-1 < \tilde s < 0$ and hence $\Phi_{\tilde s}$ is continuous. Let $R(z)$ be given by (2.25) and set:
\[ H_s^-(z) = 2\,R^{-1}(z^2)\,H_s(z)\,R(z), \qquad H_{\tilde s}^+(z) = \frac{1}{2}\,R^*(z^2)\,H_{\tilde s}(z)\,(R^*)^{-1}(z). \]
Then the new filters satisfy the biorthogonality condition:
\[ H_{\tilde s}^+(z)\,(H_s^-)^*(z) + H_{\tilde s}^+(-z)\,(H_s^-)^*(-z) = I. \]
Moreover, the corresponding scaling vectors $\Phi_{\tilde s}^+$ and $\Phi_s^-$ satisfy the following:
(a) $d\Phi_s/dx = T_R\,\Phi_s^-$, $\; d\Phi_{\tilde s}^+/dx = -T_R^*\,\Phi_{\tilde s}$.
(b) Both components of $\Phi_s^-$ are piecewise continuous and supported on $[-1, 1]$; the first component $\phi_s^{e,-}$ is even, the second $\phi_s^{o,-}$ is odd.
(c) Both components of $\Phi_{\tilde s}^+$ are supported on $[-1, 1]$. The first component $\phi_{\tilde s}^{e,+}$ is even while the second, $\phi_{\tilde s}^{o,+}$, is odd.

Proof. The biorthogonality condition follows directly from that for $H_s$, $H_{\tilde s}$. The differential relations (a) are consequences of Strela's smoothing and roughening criteria. With $R$ as in (2.25) and $\sigma f(\cdot) = f(\cdot - 1)$, $T_R\Phi_s^-$ becomes:
\[ T_R\,\Phi_s^- = \begin{bmatrix} 0 & 2\sqrt{2}\,I \\ I - \sigma & -I - \sigma \end{bmatrix} \begin{bmatrix} \phi_s^{e,-} \\ \phi_s^{o,-} \end{bmatrix} = \begin{bmatrix} 2\sqrt{2}\,\phi_s^{o,-} \\ (I - \sigma)\,\phi_s^{e,-} - (I + \sigma)\,\phi_s^{o,-} \end{bmatrix}. \]
Since $d\Phi_s/dx = T_R\,\Phi_s^-$, we have $d\phi_s^e/dx = 2\sqrt{2}\,\phi_s^{o,-}$ while $d\phi_s^i/dx = (I - \sigma)\,\phi_s^{e,-} - (I + \sigma)\,\phi_s^{o,-}$. Consequently, $\phi_s^{o,-}$ inherits the support $[-1, 1]$ and odd
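symmetry of $d\phi_s^e/dx$. Strela's symmetry criterion also implies that $\phi_s^{e,-}$ is even. Thus one can write $\phi_s^{o,-}(x) = \alpha_1(x) - \alpha_1(-x) = \alpha_1(x) - \beta_1(x)$, where $\alpha_1$ is supported on $[0, 1]$, as well as $\phi_s^{e,-}(x) = \alpha_2(x) + \alpha_2(-x) + \eta(x) = \alpha_2(x) + \beta_2(x) + \eta(x)$, where $\alpha_2$ is supported on $[0, 1]$, and $\eta$ is even and supported outside $[-1, 1]$. With this notation we have
\[
\begin{aligned}
\frac{d}{dx}\phi_s^i(x) &= (I - \sigma)\,\phi_s^{e,-} - (I + \sigma)\,\phi_s^{o,-} \\
&= (\alpha_2 + \beta_2 + \eta) - \sigma(\alpha_2 + \beta_2 + \eta) - (\alpha_1 - \beta_1) - \sigma(\alpha_1 - \beta_1) \\
&= (\alpha_2 - \alpha_1) + \sigma(\beta_1 - \beta_2)
\end{aligned}
\]
by support considerations. Cancellation of the other terms yields:
\[ \sigma(\alpha_2 + \alpha_1) = \eta\,\chi_{[1,2]}, \qquad \beta_2 + \beta_1 = -\sigma\eta\,\chi_{[-1,0]}, \qquad \eta\,\chi_{\mathbb{R}\setminus[-1,2]} = \sigma\eta\,\chi_{\mathbb{R}\setminus[-1,2]}. \]
The last identity implies that $\eta(x) = \eta(x+1)$ outside $[-1, 2]$. Since $\eta \in L^2(\mathbb{R})$, this periodicity implies that $\eta = \eta\,\chi_{\mathbb{R}\setminus[-1,1]} = 0$. Therefore $\alpha_1 = -\alpha_2 = \alpha$ and $\beta_1 = -\beta_2 = \beta$. With these conventions we can write: $\phi_s^{e,-} = \alpha + \beta$ and $\phi_s^{o,-} = \beta - \alpha$, which proves (b).

To prove (c), one has
\[ -\frac{d}{dx}\Phi_{\tilde s}^+ = T_R^*\,\Phi_{\tilde s} = \begin{bmatrix} 0 & I - \sigma^{-1} \\ 2\sqrt{2}\,I & -I - \sigma^{-1} \end{bmatrix} \begin{bmatrix} \phi_{\tilde s}^e \\ \phi_{\tilde s}^i \end{bmatrix} = \begin{bmatrix} (I - \sigma^{-1})\,\phi_{\tilde s}^i \\ 2\sqrt{2}\,\phi_{\tilde s}^e - (I + \sigma^{-1})\,\phi_{\tilde s}^i \end{bmatrix}. \]
Since $\phi_{\tilde s}^i$ is supported in $[0, 1]$, $d\phi_{\tilde s}^{e,+}/dx = -(I - \sigma^{-1})\,\phi_{\tilde s}^i$ is supported on $[-1, 1]$ and is odd. Its antiderivative $\phi_{\tilde s}^{e,+}$ is even and supported on $[-1, 1]$. Similarly, $d\phi_{\tilde s}^{o,+}/dx = -2\sqrt{2}\,\phi_{\tilde s}^e + (I + \sigma^{-1})\,\phi_{\tilde s}^i$ is supported on $[-1, 1]$ and is even. Its antiderivative $\phi_{\tilde s}^{o,+}$ also lives in $[-1, 1]$, and is odd, as claimed. This proves (c) and the theorem.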
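The algebraic requirements placed on the roughening matrix can be checked directly for the DGHM choice. The sketch below verifies, in floating point, the eigenvector relations for $H_s(1)$, the condition $[1, \sqrt{2}]\,R(1) = 0$, the determinant of $R(z)$, and Strela's symmetry identity $R(z) = E(z)\,R(z^{-1})\,E_-^{-1}(z)$ with $E(z) = \mathrm{diag}(-1, -z)$ and $E_-(z) = \mathrm{diag}(1, -1)$. The function names and the sample values of $s$ and $z$ are illustrative assumptions.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def R(z):
    """Roughening matrix of (2.25)."""
    return np.array([[0.0, 2 * SQRT2], [1 - z, -1 - z]], dtype=complex)

def E(z):
    """E(z) = diag(-1, -z)."""
    return np.diag([-1.0, -z]).astype(complex)

def E_minus(z):
    """E_-(z) = diag(1, -1); it does not depend on z."""
    return np.diag([1.0, -1.0]).astype(complex)

def H_at_one(s):
    """The matrix H_s(1)."""
    return np.array([[2 * s + 1, SQRT2 * (1 - s)],
                     [SQRT2 * (1 - s), 2 + s]]) / 3.0

s = -0.2
u = np.array([1.0, SQRT2])      # eigenvector of H_s(1) for eigenvalue 1
v = np.array([SQRT2, -1.0])     # eigenvector of H_s(1) for eigenvalue s
z = np.exp(-2j * np.pi * 0.37)  # a sample point on the unit circle

left_kernel = u @ R(1.0)                                        # expected: zero row
symmetry_gap = R(z) - E(z) @ R(1 / z) @ np.linalg.inv(E_minus(z))  # expected: zero
```

Since $\det R(z) = -2\sqrt{2}\,(1 - z)$, the matrix is singular only at $z = 1$, which is condition (i) of the roughening recipe.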
For the record, in Table 2.3 we list the nonzero filter coefficients $C_k^{\pm}$ for $H_{\tilde s}^+$ and $H_s^-$, where
\[ H_s^-(z) = \sum_{k=-2}^{2} C_k^-(s)\,z^k; \qquad H_{\tilde s}^+(z) = \sum_{k=-1}^{1} C_k^+(\tilde s)\,z^k. \tag{2.26} \]
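The coefficients in (2.26) can be cross-checked against the defining two-scale transforms. The sketch below transcribes $C_k$ from Table 2.1 and $C_k^{\pm}$ from Table 2.3, evaluates $2R^{-1}(z^2)H_s(z)R(z)$ and $\tfrac12 R^*(z^2)H_{\tilde s}(z)(R^*)^{-1}(z)$ at a point of the unit circle with $z^2 \ne 1$, and compares. It uses the fact that $R^*(z)$ coincides with the conjugate transpose of $R(z)$ when $|z| = 1$. All function names and sample values are assumptions for illustration.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def evaluate(coeffs, z):
    """Evaluate the matrix Laurent polynomial sum_k coeffs[k] * z**k."""
    return sum(np.asarray(M, dtype=complex) * z ** k for k, M in coeffs.items())

def scaling_coeffs(s):
    """C_k of Table 2.1."""
    a, b, c = 1 + 2 * s, 5 - 2 * s, 8 * s - 2
    C = {-2: [[0, -a * SQRT2], [0, 0]],
         -1: [[c, b * SQRT2], [0, 0]],
          0: [[12, b * SQRT2], [0, 8 + 4 * s]],
          1: [[c, -a * SQRT2], [(8 - 8 * s) * SQRT2, 8 + 4 * s]]}
    return {k: np.array(M, dtype=float) / 24 for k, M in C.items()}

def roughened_coeffs(s):
    """C_k^- of Table 2.3."""
    a, e = 1 + 2 * s, 20 * s - 8
    C = {-2: [[-a, a], [-a, a]],
         -1: [[6, e], [6, e]],
          0: [[14 + 4 * s, 0], [0, 14 + 4 * s]],
          1: [[6, -e], [-6, e]],
          2: [[-a, -a], [a, a]]}
    return {k: np.array(M, dtype=float) / 24 for k, M in C.items()}

def smoothed_coeffs(t):
    """C_k^+ of Table 2.3, evaluated at the parameter t passed in."""
    C = {-1: [[6, 2 * (t - 1)], [9, 3 * (2 * t - 1)]],
          0: [[12, 0], [0, 6]],
          1: [[6, 2 * (1 - t)], [-9, 3 * (2 * t - 1)]]}
    return {k: np.array(M, dtype=float) / 24 for k, M in C.items()}

def R(z):
    return np.array([[0.0, 2 * SQRT2], [1 - z, -1 - z]], dtype=complex)

def R_star(z):
    return R(z).conj().T  # equals R^*(z) for |z| = 1

def H_minus(s, z):
    return 2 * np.linalg.inv(R(z * z)) @ evaluate(scaling_coeffs(s), z) @ R(z)

def H_plus(t, z):
    return 0.5 * R_star(z * z) @ evaluate(scaling_coeffs(t), z) @ np.linalg.inv(R_star(z))

s = 0.0
s_dual = (1 + 2 * s) / (5 * s - 2)
z = np.exp(-2j * np.pi * 0.3)
gap_minus = H_minus(s, z) - evaluate(roughened_coeffs(s), z)
gap_plus = H_plus(s_dual, z) - evaluate(smoothed_coeffs(s_dual), z)
```

Both gaps vanish to rounding error, and the tabulated filters satisfy the biorthogonality condition of Theorem 2.3.6 at the sampled points.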
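Table 2.3. Smoothed and roughened DGHM scaling coefficients
\[
\begin{aligned}
C_{-2}^- &= \frac{1+2s}{24}\begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}, &
C_{-1}^- &= \frac{1}{24}\begin{bmatrix} 6 & 20s-8 \\ 6 & 20s-8 \end{bmatrix}, &
C_{0}^- &= \frac{1}{24}\begin{bmatrix} 14+4s & 0 \\ 0 & 14+4s \end{bmatrix}, \\
C_{1}^- &= \frac{1}{24}\begin{bmatrix} 6 & 8-20s \\ -6 & 20s-8 \end{bmatrix}, &
C_{2}^- &= \frac{1+2s}{24}\begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}, \\
C_{-1}^+ &= \frac{1}{24}\begin{bmatrix} 6 & 2(s-1) \\ 9 & 3(2s-1) \end{bmatrix}, &
C_{0}^+ &= \frac{1}{24}\begin{bmatrix} 12 & 0 \\ 0 & 6 \end{bmatrix}, &
C_{1}^+ &= \frac{1}{24}\begin{bmatrix} 6 & 2(1-s) \\ -9 & 3(2s-1) \end{bmatrix}.
\end{aligned}
\]

Smoothed and roughened DGHM wavelet filters. In [246] a simple construction was presented for completing $H_s^-$ and $H_{\tilde s}^+$ to a biorthogonal filter bank, yielding smoothed and roughened wavelets related directly by differentiation and integration. The starting point there was an orthonormal MRA, but the construction still works in the DGHM biorthogonal case.

Theorem 2.3.7. Let $s$, $\tilde s$, $R$ and $H_s^-$, $H_{\tilde s}^+$ be as in Theorem 2.3.6 and let $G_s$, $G_{\tilde s}$ be the DGHM wavelet filters. The new wavelet filters
\[ G_s^-(z) = 2\,G_s(z)\,R(z); \qquad G_{\tilde s}^+(z) = \frac{1}{2}\,G_{\tilde s}(z)\,(R^*)^{-1}(z) \tag{2.27} \]
yield the conditions of biorthogonality (cf. (2.1)):
\[ \begin{bmatrix} H_s^-(z) & H_s^-(-z) \\ G_s^-(z) & G_s^-(-z) \end{bmatrix} \begin{bmatrix} (H_{\tilde s}^+)^*(z) & (G_{\tilde s}^+)^*(z) \\ (H_{\tilde s}^+)^*(-z) & (G_{\tilde s}^+)^*(-z) \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}. \]
The corresponding multiwavelets satisfy
\[ \frac{d}{dx}\Psi_{\tilde s}^+ = -\Psi_{\tilde s}, \qquad \frac{d}{dx}\Psi_s = \Psi_s^-. \]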
Smoothing and roughening does not change the supports $[-1, 1]$ of the wavelets. This is clear for the roughened wavelets and follows from vanishing moments for the smoothed wavelets. The symmetries of the smoothed and roughened wavelets are reversed from those of the DGHM wavelets.

Proof. The conditions of biorthogonality follow directly from the TST definition of the new filters and the conditions for the original filters. Using the differentiability properties of the scaling function one has:
\[ \hat\Psi_{\tilde s}^+(2\xi) = G_{\tilde s}^+(\xi)\,\hat\Phi_{\tilde s}^+(\xi) = \frac{1}{2}\,G_{\tilde s}(\xi)\,(R^*)^{-1}(\xi)\,\frac{R^*(\xi)}{2\pi i \xi}\,\hat\Phi_{\tilde s}(\xi) = \frac{1}{4\pi i \xi}\,\hat\Psi_{\tilde s}(2\xi), \]
which implies $d\Psi_{\tilde s}^+/dx = -\Psi_{\tilde s}$. Similarly, $d\Psi_s/dx = \Psi_s^-$. The Riesz basis property follows by integration by parts since the wavelets $\Psi_s$, $\Psi_{\tilde s}$ have the same regularity as $\Phi_s$, $\Phi_{\tilde s}$, as in Theorem 2.3.6.
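The coefficients of the smoothed and roughened DGHM multiwavelet filters
\[ G_s^-(z) = \sum_{k=-2}^{2} D_k^-(s)\,z^k \qquad \text{and} \qquad G_{\tilde s}^+(z) = \sum_{k=-1}^{1} D_k^+(\tilde s)\,z^k \tag{2.28} \]
are listed in Table 2.4. Corresponding scaling functions and wavelets are plotted in Figure 2.3.

Table 2.4. Smoothed and roughened DGHM wavelet coefficients
\[
\begin{aligned}
D_{-2}^- &= \frac{1+2s}{12}\begin{bmatrix} -\sqrt{2} & \sqrt{2} \\ -2 & 2 \end{bmatrix}, &
D_{-1}^- &= \frac{1}{12}\begin{bmatrix} 6\sqrt{2} & (20s-8)\sqrt{2} \\ 12 & 40s-16 \end{bmatrix}, \\
D_{0}^- &= \frac{1}{12}\begin{bmatrix} 0 & (4s-34)\sqrt{2} \\ 8s-20 & 0 \end{bmatrix}, &
D_{1}^- &= \frac{1}{12}\begin{bmatrix} -6\sqrt{2} & (20s-8)\sqrt{2} \\ 12 & 16-40s \end{bmatrix}, \\
D_{2}^- &= \frac{1+2s}{12}\begin{bmatrix} \sqrt{2} & \sqrt{2} \\ -2 & -2 \end{bmatrix}, \\
D_{-1}^+ &= \frac{1}{48}\begin{bmatrix} \frac{3\sqrt{2}}{2} & (4s-1)\frac{\sqrt{2}}{2} \\ 3 & 4s-1 \end{bmatrix}, &
D_{0}^+ &= \frac{1}{48}\begin{bmatrix} 0 & -3\sqrt{2} \\ -6 & 0 \end{bmatrix}, \\
D_{1}^+ &= \frac{1}{48}\begin{bmatrix} -\frac{3\sqrt{2}}{2} & (4s-1)\frac{\sqrt{2}}{2} \\ 3 & 1-4s \end{bmatrix}.
\end{aligned}
\]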
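As with the scaling filters, the wavelet coefficients of Table 2.4 can be compared with the products in (2.27). The sketch below transcribes $D_k$ from Table 2.1 and $D_k^{\pm}$ from Table 2.4, forms $2\,G_s(z)R(z)$ and $\tfrac12 G_{\tilde s}(z)(R^*)^{-1}(z)$ at a sample point of the unit circle, and measures the discrepancy. Function names and sample values are illustrative assumptions, and $R^*(z)$ is again computed as the conjugate transpose of $R(z)$, which is valid only for $|z| = 1$.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def evaluate(coeffs, z):
    """Evaluate the matrix Laurent polynomial sum_k coeffs[k] * z**k."""
    return sum(np.asarray(M, dtype=complex) * z ** k for k, M in coeffs.items())

def wavelet_coeffs(s):
    """D_k of Table 2.1."""
    a, b, c = 1 + 2 * s, 5 - 2 * s, 8 * s - 2
    D = {-2: [[0, -a * SQRT2], [0, -2 * a]],
         -1: [[c, b * SQRT2], [c * SQRT2, 2 * b]],
          0: [[-12, b * SQRT2], [0, -2 * b]],
          1: [[c, -a * SQRT2], [-c * SQRT2, 2 * a]]}
    return {k: np.array(M, dtype=float) / 24 for k, M in D.items()}

def roughened_wavelet_coeffs(s):
    """D_k^- of Table 2.4."""
    a, e = 1 + 2 * s, 20 * s - 8
    D = {-2: [[-a * SQRT2, a * SQRT2], [-2 * a, 2 * a]],
         -1: [[6 * SQRT2, e * SQRT2], [12, 2 * e]],
          0: [[0, (4 * s - 34) * SQRT2], [8 * s - 20, 0]],
          1: [[-6 * SQRT2, e * SQRT2], [12, -2 * e]],
          2: [[a * SQRT2, a * SQRT2], [-2 * a, -2 * a]]}
    return {k: np.array(M, dtype=float) / 12 for k, M in D.items()}

def smoothed_wavelet_coeffs(t):
    """D_k^+ of Table 2.4, evaluated at the parameter t passed in."""
    q = 4 * t - 1
    D = {-1: [[1.5 * SQRT2, q * SQRT2 / 2], [3, q]],
          0: [[0, -3 * SQRT2], [-6, 0]],
          1: [[-1.5 * SQRT2, q * SQRT2 / 2], [3, -q]]}
    return {k: np.array(M, dtype=float) / 48 for k, M in D.items()}

def R(z):
    return np.array([[0.0, 2 * SQRT2], [1 - z, -1 - z]], dtype=complex)

def G_minus(s, z):
    return 2 * evaluate(wavelet_coeffs(s), z) @ R(z)

def G_plus(t, z):
    return 0.5 * evaluate(wavelet_coeffs(t), z) @ np.linalg.inv(R(z).conj().T)

s = 0.0
s_dual = (1 + 2 * s) / (5 * s - 2)
z = np.exp(-2j * np.pi * 0.3)
gap_minus = G_minus(s, z) - evaluate(roughened_wavelet_coeffs(s), z)
gap_plus = G_plus(s_dual, z) - evaluate(smoothed_wavelet_coeffs(s_dual), z)
```

The sample point must avoid $z = 1$, where $R^*(z)$ is singular; the tabulated $D_k^+$ are nevertheless Laurent polynomial coefficients, so the singularity cancels in the product.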
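Commutation relations: In the remainder of this section we will abbreviate by $\langle f, \Phi_{jk}\rangle\,\tilde\Phi_{jk}$ the sum $\sum_i \langle f, \phi^i_{jk}\rangle\,\tilde\phi^i_{jk}$ when $\Phi = [\phi^1, \dots, \phi^m]^T$. Then $\tilde P_0 f = \sum_k \langle f, \Phi_{0k}\rangle\,\tilde\Phi_{0k}$ and $\tilde P_0^+ f = \sum_k \langle f, \Phi^-_{0k}\rangle\,\tilde\Phi^+_{0k}$ are the respective oblique projections onto $\tilde V_0$ and $\tilde V_0^+$. Differentiation commutes with oblique MRA projections. That is, $(d/dx)\tilde P_0^+ = \tilde P_0\,(d/dx)$, since
\[
\begin{aligned}
\tilde P_0 \frac{df}{dx} &= \sum_k \Big\langle \frac{df}{dx}, \Phi_{0k} \Big\rangle \tilde\Phi_{0k} = \sum_k \Big\langle f, -\frac{d}{dx}\Phi_{0k} \Big\rangle \tilde\Phi_{0k} = \sum_k \langle f, -T_R\,\Phi^-_{0k} \rangle\,\tilde\Phi_{0k} \\
&= \sum_k \langle f, \Phi^-_{0k} \rangle\,(-T_R^*)\,\tilde\Phi_{0k} = \sum_k \langle f, \Phi^-_{0k} \rangle\,\frac{d}{dx}\tilde\Phi^+_{0k} = \frac{d}{dx}\tilde P_0^+ f.
\end{aligned}
\]
By localizing these commutation relations to functions on a half-line one can characterize membership in the Sobolev space $H_0^1(\mathbb{R}_+)$ of functions in $H^1(\mathbb{R})$ that vanish on $(-\infty, 0]$.

2.3.6 Biorthogonal multiwavelets on $H_0^1(\mathbb{R}_+)$

Though the following discussion will pertain to the parameter range for $s$, $\tilde s$ in Theorem 2.3.6, we will fix $s = -1/5 = \tilde s$, the case of orthogonal DGHM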
Fig. 2.3. Plots of dual smoothed DGHM scaling functions (top left) and wavelets (top right) and roughened scaling functions (bottom left) and wavelets (bottom right) for s = 0
multiwavelets, for notational simplicity (see [246] for the general case). The smoothed MRA spaces for $[0, \infty)$ are summarized in Table 2.5. The main difference here from Table 2.2, where the interest was in characterizing $H^\alpha(\mathbb{R}_+)$, is the appearance of the truncated antisymmetric wavelets in the interior spaces. In contrast, $H_0^1(\mathbb{R}_+)$ is the oblique direct sum of the interior smoothed spaces, as follows.

Table 2.5. Smoothed multiresolution spaces for $[0, \infty)$

Multiresolution space | spanned by or sum of | type
$V_j^{+,\mathrm{int}}(\mathbb{R}_+)$ | $\{\phi^{+,e}_{jk}\}_{k=1}^{\infty} \cup \{\phi^{+,o}_{jk}\}_{k=1}^{\infty} \cup \{\phi^{+,o}_{j0}\chi_{[0,\infty)}\}$ | interior
$W_j^{+,\mathrm{int}}(\mathbb{R}_+)$ | $\{\psi^{+,e}_{jk}\}_{k=1}^{\infty} \cup \{\psi^{+,o}_{jk}\}_{k=1}^{\infty} \cup \{\psi^{+,o}_{j0}\chi_{[0,\infty)}\}$ | interior
$V_j^{+,\mathrm{bd}}(\mathbb{R}_+)$ | $\phi^{+,e}_{j0}\chi_{[0,\infty)}$ | boundary
$W_j^{+,\mathrm{bd}}(\mathbb{R}_+)$ | $\psi^{+,e}_{j0}\chi_{[0,\infty)}$ | boundary
$L^2_{\mathrm{int}}(\mathbb{R}_+)$ | $V_0^{+,\mathrm{int}}(\mathbb{R}_+) \oplus \big(\oplus_{j \ge 0} W_j^{+,\mathrm{int}}(\mathbb{R}_+)\big)$ | interior
$L^2_{\mathrm{bd}}(\mathbb{R}_+)$ | $V_0^{+,\mathrm{bd}}(\mathbb{R}_+) \oplus \big(\oplus_{j \ge 0} W_j^{+,\mathrm{bd}}(\mathbb{R}_+)\big)$ | boundary

Theorem 2.3.8. The families $\{\phi^{\pm,e/o}_{jk}\}_{k=1}^{\infty} \cup \{\phi^{\pm,o}_{j0}\chi_{[0,\infty)}\}$ form biorthogonal bases for $V_j^{\pm,\mathrm{int}}$, respectively. The families $\{\psi^{\pm,o/e}_{jk}\}_{k=1}^{\infty} \cup \{\psi^{\pm,o}_{j0}\chi_{[0,\infty)}\}$ form biorthogonal bases for $W_j^{\pm,\mathrm{int}}$. Together the spaces form biorthogonal MRAs for $L^2(\mathbb{R}_+)$. Moreover, $H_0^1(\mathbb{R}_+)$ is the $H^1$-closure of $V_0^{+,\mathrm{int}} \oplus (\oplus_{j \ge 0} W_j^{+,\mathrm{int}})$ and
\[ \|f\|^2_{H_0^1(\mathbb{R}_+)} \sim \sum_{e/o,\,j,\,k} (1 + 2^{2j})\,|\langle f, \psi^{e/o,-}_{jk}\rangle|^2 + \sum_{e/o,\,k} |\langle f, \phi^{e/o,-}_{k}\rangle|^2, \]
where the sum extends over those $j$, $k$ and $e/o$ defining the interior spaces.

The characterization of $H_0^\alpha(\mathbb{R}_+)$ actually extends up to $\alpha = 3/2$, but we will only consider the case $\alpha = 1$. A key step is to establish commutation formulas for the interior spaces.

Lemma 2.3.9. The following commutation relations hold:
\[ \frac{d}{dx} P_j^{+,\mathrm{int}} = (P_j^{\mathrm{int}} + P_j^{\mathrm{bd}})\,\frac{d}{dx}, \tag{2.29} \]
\[ \frac{d}{dx} Q_j^{+,\mathrm{int}} = (Q_j^{\mathrm{int}} + Q_j^{\mathrm{bd}})\,\frac{d}{dx}, \tag{2.30} \]
where $P_j^{+,\mathrm{int}}$ denotes the oblique projection onto $V_j^{+,\mathrm{int}}$ in Table 2.5 and $P_j^{\mathrm{int}}$ and $P_j^{\mathrm{bd}}$ are the corresponding projections onto $V_j^{\mathrm{int}}$ and $V_j^{\mathrm{bd}}$, respectively; similarly, $Q_j^{+,\mathrm{int}}$ denotes the oblique projection onto $W_j^{+,\mathrm{int}}$ and $Q_j^{\mathrm{int}}$, $Q_j^{\mathrm{bd}}$ are the respective projections onto $W_j^{\mathrm{int}}$ and $W_j^{\mathrm{bd}}$.

Corresponding relations hold for the roughened restrictions [246].

Proof (of Lemma 2.3.9). The relationship (2.30) follows directly from the relationship $(d/dx)\Psi^+ = -\Psi$ and the respective definitions of the interior and boundary wavelet spaces in Tables 2.2 and 2.5.
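To prove (2.29), since $d\Phi^+/dx = -T_R^*\Phi$, and $d/dx$ commutes with scaling and translation, the definition of the transformation $T_R^*$ on all of $\mathbb{R}$ gives
\[ \frac{d}{dx}\begin{bmatrix} \phi^{e,+}_{jk} \\ \phi^{o,+}_{jk} \end{bmatrix} = -2^j \begin{bmatrix} \phi^i_{jk} - \phi^i_{j,k-1} \\ 2\sqrt{2}\,\phi^e_{jk} - \phi^i_{jk} - \phi^i_{j,k-1} \end{bmatrix}. \tag{2.31} \]
When $k = 1, 2, \dots$ all terms on both sides are supported in $[0, \infty)$. The only term of $V_j^{\mathrm{int},+}$ remaining to be accounted for is the truncated term $\phi^{o,+}_{j0}\chi_{[0,\infty)}$ which is differentiable from the right at zero and satisfies
\[ \frac{d}{dx}\big(\phi^{o,+}_{j0}\chi_{[0,\infty)}\big)\,\chi_{[0,\infty)} = -2^j\sqrt{2}\,\big(2\phi^e_{j0}\chi_{[0,\infty)} - \phi^i_{j0}\big). \tag{2.32} \]
Appropriate linear combinations of the resulting component terms yield all of the interior components of $P_j^{\mathrm{int}}$ as well as its boundary component $\phi^e_{j0}\chi_{[0,\infty)}$. The wavelet spaces follow a similar, but simpler pattern based on the identity $d\Psi^+/dx = -\Psi$ as in Theorem 2.3.7.

To prove (2.29) one makes explicit use of (2.31) and (2.32). As in (2.29),
\[
\begin{aligned}
\frac{d}{dx} P_j^{+,\mathrm{int}} f &= \sum_{k=1}^{\infty} \langle f, \Phi^-_{jk}\rangle\,\frac{d}{dx}\Phi^+_{jk} + \langle f, \phi^{o,-}_{j0}\rangle\,\frac{d}{dx}\phi^{o,+}_{j0}\chi_{[0,\infty)} \\
&= -\sum_{k=1}^{\infty} \langle f, \Phi^-_{jk}\rangle\,2^j\,T_R^*\Phi_{jk} - \langle f, \phi^{o,-}_{j0}\rangle\,2^j\sqrt{2}\,(2\phi^e_{j0} - \phi^i_{j0})\,\chi_{[0,\infty)}.
\end{aligned}
\]
Using summation by parts and keeping track of boundary terms:
\[
\begin{aligned}
\frac{d}{dx} P_j^{+,\mathrm{int}} f &= \sum_{k=1}^{\infty} \Big\langle f, -\frac{d}{dx}\Phi_{jk} \Big\rangle \Phi_{jk} + \langle f, 2^j\sqrt{2}\,\phi^{o,-}_{j1}\rangle\,\phi^i_{j0} + \langle f, 2^j\sqrt{2}\,\phi^{e,-}_{j1}\rangle\,\phi^i_{j0} - \langle f, 2^j\sqrt{2}\,\phi^{o,-}_{j0}\rangle\,(2\phi^e_{j0}\chi_{[0,\infty)} - \phi^i_{j0}) \\
&= \sum_{k=1}^{\infty} \langle f', \Phi_{jk}\rangle\,\Phi_{jk} + \langle f', \phi^e_{j0}\rangle\,\phi^e_{j0}\chi_{[0,\infty)} + \big\langle f, 2^j(\sqrt{2}\,\phi^{o,-}_{j0} + \phi^{e,-}_{j1} + \phi^{o,-}_{j1})\big\rangle\,\phi^i_{j0} \\
&= \sum_{k=1}^{\infty} \langle f', \Phi_{jk}\rangle\,\Phi_{jk} + \langle f', \phi^i_{j0}\rangle\,\phi^i_{j0} + \langle f', \phi^e_{j0}\rangle\,\phi^e_{j0}\chi_{[0,\infty)} \\
&= P_j^{\mathrm{int}}\frac{df}{dx} + P_j^{\mathrm{bd}}\frac{df}{dx}.
\end{aligned}
\]
This proves the commutation relation (2.29). The lemma follows.

Proof (of Theorem 2.3.8). Since $W_j \subset V_{j+1} \cap \tilde V_j^{\perp}$, biorthogonality on $[0, \infty)$ is inherited from support conditions and biorthogonality on all of $\mathbb{R}$, except for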
the truly truncated wavelets and/or scaling functions. For truncated terms one always pairs two antisymmetric functions, whose product is an even function whose integral over the interval $[0, 1]$ is half the integral over the interval $[-1, 1]$. Thus, if the functions were orthogonal in the larger interval they would still be in the smaller one; otherwise one normalizes, multiplying by $\sqrt{2}$ so that the product of such a function with its dual integrates to 1. This verifies the statement about biorthogonality.

That the closure in $H^1(\mathbb{R}_+)$ of the union of the $V_j^{+,\mathrm{int}}$ is $H_0^1(\mathbb{R}_+)$ will follow from the commutation relations. Let $V^{+,\mathrm{int}} = \cup_{j \ge 0} V_j^{+,\mathrm{int}}$ and denote by $P^{+,\mathrm{int}}$ the oblique projection onto $V^{+,\mathrm{int}}$ parallel to $V^{-,\mathrm{int}} = \cup_{j \ge 0} V_j^{-,\mathrm{int}}$, i.e., $V^{+,\mathrm{int}} = V_0^{+,\mathrm{int}} \oplus (\oplus_{j \ge 0} W_j^{+,\mathrm{int}})$. For $f \in L^2(\mathbb{R}_+)$,
\[ P^{+,\mathrm{int}} f = \sum_{j,\,k,\,e/o} \langle f, \psi^{e/o,-}_{jk}\rangle\,\psi^{e/o,+}_{jk} + \sum_{e/o,\,k} \langle f, \phi^{e/o,-}_{k}\rangle\,\phi^{e/o,+}_{k}, \]
where the sum extends, again, over those indices defining the interior scaling and wavelet components in Table 2.5. Everything is truncated to $[0, \infty)$. One wants to show that if $f \in H_0^1(\mathbb{R}_+)$ then $P^{+,\mathrm{int}} f = f$ and that the norm equivalence holds.

If $f \in H_0^1(\mathbb{R}_+)$ then both $f$ and $f'$ belong to $L^2(\mathbb{R}_+)$. Moreover, one can assume that $f$ is locally absolutely continuous. Since $f'$ is square integrable and supported in $[0, \infty)$, it can be expanded in the biorthogonal basis given by truncation of the DGHM multiwavelets in Table 2.2. By (2.29) and (2.30), one then has
\[
\begin{aligned}
f' &= \sum_{e/o,\,j,\,k} \langle f', \psi^{e/o}_{jk}\rangle\,\psi^{e/o}_{jk} + P_0\frac{df}{dx} \\
&= \sum_{e/o,\,j,\,k} \Big\langle f, -\frac{d}{dx}\psi^{e/o}_{jk} \Big\rangle\,\psi^{e/o}_{jk} + \frac{d}{dx}\tilde P_0^+ f \\
&= -\sum_{e/o,\,j,\,k} 2^j\,\langle f, \psi^{e/o,-}_{jk}\rangle\,\psi^{e/o}_{jk} + \sum_{e/o,\,k} \langle f, \phi^{e/o,-}_{k}\rangle\,\frac{d}{dx}\phi^{e/o,+}_{k}.
\end{aligned}
\]
One concludes, on the one hand, that
\[ \|f'\|_2^2 \sim \sum_{e/o,\,j,\,k} 2^{2j}\,|\langle f, \psi^{e/o,-}_{jk}\rangle|^2 + \sum_{e/o,\,k} |\langle f, \phi^{e/o,-}_{k}\rangle|^2. \]
On the other hand, this expansion of $f'$ also yields
\[ \frac{df}{dx} = \frac{d}{dx}\Big( \sum_{e/o,\,j,\,k} \langle f, \psi^{e/o,-}_{jk}\rangle\,\psi^{e/o,+}_{jk} \Big) + \frac{d}{dx} P_0^+ f = \frac{d}{dx} P^{+,\mathrm{int}} f. \]
Integrating both sides one obtains $f = P^{+,\mathrm{int}} f + C$, but $C = 0$ since $f \in L^2(\mathbb{R})$. Finally,
\[ \|f\|_2^2 \sim \sum_{e/o,\,j,\,k} |\langle f, \psi^{e/o,-}_{jk}\rangle|^2 + \sum_{e/o,\,k} |\langle f, \phi^{e/o,-}_{k}\rangle|^2 \]
since the scaling and wavelet terms defining $P^{+,\mathrm{int}}$ form a Riesz basis for its range. Coupled with the norm estimate for the derivative one concludes that
\[ \|f\|^2_{H_0^1(\mathbb{R}_+)} = \|f\|_2^2 + \|f'\|_2^2 \sim \sum_{e/o,\,j,\,k} (1 + 4^j)\,|\langle f, \psi^{e/o,-}_{jk}\rangle|^2 + \sum_{e/o,\,k} |\langle f, \phi^{e/o,-}_{k}\rangle|^2. \]
2.4 Notes

Wavelets and Navier–Stokes equations. As discussed in Section 2.1.4, Cannone and Meyer [66] used the notion of adaptedness to products to prove well-posedness of Navier–Stokes in certain function spaces. Specifically, they defined a space $X$ to be well suited for Navier–Stokes if its norm is invariant under translation and if $\sum_{j \in \mathbb{Z}} 2^{-|j|}\eta_j < \infty$ with $\eta_j$ as in (2.11). Their projections in (2.11) required the use of bandlimited wavelets. They proved the following.

Theorem 2.4.1. If $X$ is well suited for Navier–Stokes, then for any vector data $v_0 \in X$ such that $\nabla \cdot v_0 = 0$, there is a time $T = T(\|v_0\|_X) > 0$ and a strong solution $v(t, x) \in C([0, T); X)$ of the mild form
\[ v(t) = S(t)\,v_0 - \int_0^t \mathbb{P}\,S(t - s)\,\nabla \cdot (v \otimes v)(s)\,ds \tag{2.33} \]
of Navier–Stokes (here, again, $S(t)$ denotes the heat semigroup).

In fact, the fluctuation $v(t) - S(t)v_0$ can be shown to behave even better for $0 \le t < T$. There are two main steps in proving Theorem 2.4.1. First, one shows that if $X$ is well suited then there is a $\pi(t) \in L^1(0, T)$ such that $\|\mathbb{P}S(t)\nabla \cdot (f \otimes g)\|_X \le \pi(t)\,\|f\|_X\,\|g'\|_X$. The second step is essentially a bilinear Picard iteration lemma. It says that if $\beta$ denotes the norm of a bilinear operator $B$ as a mapping from $Y \times Y$ to $Y$ then, whenever $y \in Y$ satisfies $\|y\|_Y < 1/(4\beta)$, the equation $x = y + B(x, x)$ has a solution $x \in Y$ that is unique if one also requires that $\|x\|_Y$ is not too large. One applies this lemma to the bilinear form defining $v$ above, where $Y$ is $C([0, T); X)$ with norm sup0≤t
1/2. It is worth mentioning that well-suitedness is not a necessary condition for existence of strong solutions (see [66]). Because the wavelets they used are necessarily supported on the whole space, the techniques of Cannone and Meyer do not automatically lend themselves to analysis on domains with boundary, such as half-spaces. The problem of extending their techniques to this setting is essentially open, though
see Cannone et al. [67]. There are two complicating issues here: first, there is the issue of boundary values. Secondly, there is the problem of divergence-free projections and heat kernel estimates. Some first steps along these lines were taken in Obeidat’s thesis [291]. Divergence-free extensions of the smoothed and roughened DGHM wavelets play an important role there. Several other issues relating the use of wavelets to the study of Navier–Stokes equations are outlined by Katz and Pavlović in [223].

d/dx in the Legendre basis. A few aspects of the use of the Legendre and interpolating wavelet expansions in numerical analysis bear special mention. Estimation of operators in divergence form $(d/dx)\,a(x)\,(d/dx)$ acting on, say, $C^\infty([0,1])$, requires an approximate representation both of $d/dx$ as well as of multiplication by $a(x)$. We will return to the problem of estimating products below. When $P_j$ is the orthogonal projector onto $V_j^m$, the operator $(d/dx)P_j$ is not well defined pointwise because elements of $V_j^m$ are discontinuous. Representing $d/dx$ in $V_J^m$, where $J$ is thought of as the finest numerical scale, amounts to determining the matrix coefficients $\langle (d/dx)\pi^i_{Jk}, \pi^l_{Jk'}\rangle$. For simplicity, we take $J = 0$. These coefficients need to be determined in some weak sense. Because the components $\pi^l$ are cut off outside $[0, 1)$ this problem reduces to defining the connection coefficients

$$r_k^{il} = \Big\langle \frac{d}{dx}\pi^i,\ \pi^l(\cdot + k)\Big\rangle, \qquad k = 0, \pm 1. \tag{2.34}$$

The condition that $P_0 (d/dx) P_0 = d/dx$ on polynomials $p \in V(\pi)$ imposes the conditions that

$$\langle p', \pi^i\rangle = p\,\pi^i\Big|_0^1 - \int_0^1 p\,\frac{d\pi^i}{dx}\,dx = p\,\pi^i\Big|_0^1 - \sum_{k=0,\pm 1}\sum_{l} r_k^{il}\,\langle p, \pi^l\rangle. \tag{2.35}$$
The boundary values leave two free conditions that can be expressed in terms of parameters $a, b$ such that:

$$\begin{aligned}
r_1^{il} &= -b\,\pi^i(0)\,\pi^l(1),\\
r_0^{il} &= (1 - a)\,\pi^i(1)\,\pi^l(1) - (1 - b)\,\pi^i(0)\,\pi^l(0) - k^{il},\\
r_{-1}^{il} &= a\,\pi^i(1)\,\pi^l(0).
\end{aligned} \tag{2.36}$$

The coefficients $k^{il}$ can be determined from the recursion relation

$$(2i + 1)\,P_i(x) = P'_{i+1}(x) - P'_{i-1}(x)$$

for the Legendre polynomials on $[-1, 1]$. Straightforward algebra then yields

$$k^{il} = 2\sqrt{2i + 1}\,\sqrt{2l + 1} \times \begin{cases} 1, & \text{if } i - l \in \{1, 3, \dots\},\\ 0, & \text{otherwise.}\end{cases}$$

On the other hand, by definition of the Legendre polynomials,

$$\pi^i(0) = (-1)^i\sqrt{2i + 1}, \qquad \pi^i(1) = \sqrt{2i + 1}.$$
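The formulas in (2.36) are easy to tabulate numerically. The following Python sketch (using NumPy) builds the three connection matrices for given parameters $a$, $b$ directly from the closed forms for $k^{il}$, $\pi^i(0)$ and $\pi^i(1)$ stated above. The function name `connection_matrices` and the sample choice $m = 3$, $a = b = 1/2$ are illustrative assumptions, not notation from the text.

```python
import numpy as np

def connection_matrices(m, a=0.5, b=0.5):
    """Return (r_{-1}, r_0, r_1) from (2.36) as m x m arrays indexed by (i, l)."""
    idx = np.arange(m)
    norm = np.sqrt(2 * idx + 1)
    pi0 = (-1.0) ** idx * norm          # pi^i(0) = (-1)^i sqrt(2i + 1)
    pi1 = norm                          # pi^i(1) = sqrt(2i + 1)
    diff = idx[:, None] - idx[None, :]  # i - l
    # k^{il} = 2 sqrt(2i+1) sqrt(2l+1) when i - l is a positive odd integer
    k = 2.0 * np.outer(norm, norm) * ((diff > 0) & (diff % 2 == 1))
    r_plus = -b * np.outer(pi0, pi1)
    r_zero = (1 - a) * np.outer(pi1, pi1) - (1 - b) * np.outer(pi0, pi0) - k
    r_minus = a * np.outer(pi1, pi0)
    return r_minus, r_zero, r_plus

# Illustrative choice: m = 3 with a = b = 1/2
r_minus, r_zero, r_plus = connection_matrices(3)
print(np.round(r_zero, 4))
print(np.round(r_plus, 4))
```

With $a = b = 1/2$ the matrix `r_zero` comes out antisymmetric and `r_plus` equals minus the transpose of `r_minus`.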
The relations (2.36) are particularly simple when one imposes periodic boundary values on the input functions. Then one necessarily has $a = b = 1/2$. This leads to $r_1^{il} = -r_{-1}^{li}$. In the case $m = 3$ and $a = b = 1/2$, the matrices $r_0$ and $r_1$ defined by (2.36) then take the explicit form:

$$r_1 = \frac{1}{2}\begin{pmatrix} -1 & -\sqrt{3} & -\sqrt{5}\\ \sqrt{3} & 3 & \sqrt{15}\\ -\sqrt{5} & -\sqrt{15} & -5 \end{pmatrix}; \qquad r_0 = \begin{pmatrix} 0 & \sqrt{3} & 0\\ -\sqrt{3} & 0 & \sqrt{15}\\ 0 & -\sqrt{15} & 0 \end{pmatrix}.$$

These same matrices are used to define the operator $P_J (d/dx) P_J$ when $J \ge 0$ represents the “finest scale” on which arbitrary smooth functions satisfying $f(0) = f(1)$ are to be represented. Full details of this approach, including detailed analysis of approximation errors, can be found in [6].

Dual wavelets based on refinement schemes. In Section 2.2 we considered a Legendre scaling vector $\pi(x) = [\pi^0, \dots, \pi^{m-1}]^T$ as well as a Lagrange interpolating vector $\varphi = [\varphi^0, \dots, \varphi^{m-1}]^T$. Both generate the same space $V^m$ of piecewise polynomials. There is a natural way to generalize these scaling vector families by means of subdivision schemes that brings out the natural duality between Hermite interpolation (HI) of pointwise values and interpolation of local moments (MI) in a manner analogous to constructions of smoothed and roughened MRAs. As was the case with the smoothed and roughened DGHM wavelets, the relationship between MI and HI boils down to the fundamental theorem of calculus: Given $f \in L^1(\mathbb{R})$, its local moments $\mu_k(f) = \int_k^{k+1} f$ are nothing but forward differences $F(k + 1) - F(k)$ of its antiderivative $F(t) = \int_{-\infty}^t f$. This suggests that interpolation of the point values $F(k)$ can give rise to a function $F(t)$ whose derivative $f = F'$ has prescribed averages. Roughly, differentiation of pointwise interpolants corresponds to average/moment interpolants. Taking this naive observation as motivation, Donoho et al. [116] devised a general method to construct dual scaling functions incorporating some nice features of the Lagrange/Legendre bases, but with the addition of some regularity.
Specifically, one designs a scaling vector $\Phi = [\phi^1, \dots, \phi^m]^T$ that is dual to the Legendre scaling vector $\pi = [\pi^1, \dots, \pi^m]^T$ in the sense that any $g \in V(\pi)$, the closure of the span of the $\{\pi^i(\cdot - k)\}$, can be written

$$g = \sum_k \langle g, \phi_k\rangle^T \pi_k(x) \tag{2.37}$$

in which $\langle g, \phi_k\rangle^T = [\langle g, \phi^1_k\rangle, \dots, \langle g, \phi^m_k\rangle]$ and $\pi_k(x) = \pi(x - k)$. Similarly, the shifts of the components of the vector $\Phi$ determine a space $V(\Phi)$ such that any $f \in V(\Phi)$ has an expansion

$$f = \sum_k \langle f, \pi_k\rangle^T \Phi_k(x). \tag{2.38}$$
One calls $\Phi$ a dual Legendre scaling vector. The projections $P_\pi$ and $P_\Phi$ defined, respectively, by (2.37) and (2.38) are oblique; however the scaling vector $\Phi$ can have some regularity. Moreover, one can define a multiresolution space $V_1(\Phi)$ consisting of two-scale dilates of elements of $V(\Phi)$. The orthogonal complements of $V(\pi)$ inside $V_1(\Phi)$ and $V(\Phi)$ inside $V_1(\pi)$ define biorthogonal wavelet spaces from which one can build biorthogonal wavelet bases. In the latter case these are the actual Legendre wavelets. The basic step to these developments is to understand how to build the vector $\Phi$. This is accomplished by means of moment interpolation refinement. Here is the setup.

Let $\pi^l_{jk}(x) = 2^{j/2}\pi^l(2^j x - k)$ denote the $L^2$-normalized translation and dilation of the Legendre function $\pi^l$ supported on $I = I(j, k)$. The $i$th Legendre polynomial moment of $f$ over $I = I(j, k)$ is $\mu^i_{jk}(f) = \langle f, \pi^i_{jk}\rangle$. Moment interpolating (MI) refinement seeks a solution to the problem of constructing a function $f^\mu$ such that $\langle f^\mu, \pi^i_k\rangle = \langle f, \pi^i_k\rangle$, $i = 0, \dots, m - 1$, $k \in \mathbb{Z}$, and having some prescribed regularity. A solution is constructed through a sequence of transition operators $S_j$ mapping moment data from one dyadic level to the next in such a way that certain differences remain bounded. At each level the process takes two steps: (i) find a local polynomial having the prescribed moments at level $j$, and (ii) use this polynomial to define the moment sequence at the next level. Accuracy is phrased in terms of the degrees of the polynomials thus produced. As one might expect, regularity comes at the price of support length.

Fix $L, D > 0$. Step one is to find a polynomial $p_{jk}$ of degree $D$ sharing the same local moments of $f$ in the sense that

$$\langle p_{jk}, \pi^i_{j,k+h}\rangle = \mu^i_{j,k+h}, \qquad -L \le h \le L. \tag{2.39}$$

When $D = (2L + 1)m - 1$, one condition is imposed per coefficient so $p_{jk}$ is unique provided the conditions are linearly independent, as they are [116].
Step two is to prescribe moments at the next level by setting

$$\hat\mu^i_{j+1,2k+h} = \langle p_{jk}, \pi^i_{j+1,2k+h}\rangle, \qquad h = 0, 1. \tag{2.40}$$

It turns out that the operator $S = S_{L,m}$ defined by

$$\big(\hat\mu^i_{j+1,k}\big)_{k,i} = S\big(\mu^i_{jk}\big)_{k,i} \tag{2.41}$$

does not depend on the scale $j$ and it is possible to compute explicit $m \times m$ matrices $M_h^e$ and $M_h^o$, $h = -L, \dots, L$, such that:

$$\hat\mu^i_{j+1,2k} = \sum_{h=-L}^{L}\sum_{l=0}^{m-1} M_h^e(i, l)\,\mu^l_{j,k+h}; \qquad \hat\mu^i_{j+1,2k+1} = \sum_{h=-L}^{L}\sum_{l=0}^{m-1} M_h^o(i, l)\,\mu^l_{j,k+h}.$$
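As an illustration of the two-step procedure, consider the simplest case $m = 1$, in which the only Legendre moment is, up to normalization, the local average. The Python sketch below works with plain (unnormalized) averages over unit cells, fits the polynomial of degree $D = 2L$ matching the $2L + 1$ neighboring averages, and reads off the averages over the two halves of the central cell. The function name `average_interpolation_masks` is an assumption made for this example; for the $L^2$-normalized moments of the text each mask would carry an extra factor $2^{-1/2}$.

```python
import numpy as np

def average_interpolation_masks(L):
    """Even/odd masks for m = 1 on the stencil h = -L..L, polynomial degree D = 2L."""
    D = 2 * L
    powers = np.arange(1, D + 2)                    # exponents d + 1 for d = 0..D
    h = np.arange(-L, L + 1, dtype=float)[:, None]  # left endpoints of the cells
    # A[h, d] = average of x**d over the cell [h, h + 1]
    A = ((h + 1.0) ** powers - h ** powers) / powers
    # averages of x**d over the two halves [0, 1/2] and [1/2, 1] of the central cell
    left = 2.0 * (0.5 ** powers) / powers
    right = 2.0 * (1.0 - 0.5 ** powers) / powers
    # polynomial coefficients are c = A^{-1} a, so a child average is (A^{-T} left) . a
    even = np.linalg.solve(A.T, left)
    odd = np.linalg.solve(A.T, right)
    return even, odd

even, odd = average_interpolation_masks(1)
print(even, odd)
```

For $L = 1$ this gives the masks $(1/8, 1, -1/8)$ and $(-1/8, 1, 1/8)$ acting on the averages over cells $k - 1$, $k$, $k + 1$. The two masks sum to $(0, 2, 0)$, which says that the two child averages always average back to the parent value.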
Moreover, for the Legendre scaling matrices $H_0, H_1$ in (2.19), one has $\mu_{jk} = H_0\,\mu_{j+1,2k} + H_1\,\mu_{j+1,2k+1}$. Therefore, $\hat\mu_{jk} = H_0\,\hat\mu_{j+1,2k} + H_1\,\hat\mu_{j+1,2k+1}$. Consequently, the sequences $\hat\mu_{jk}$ depend only on the starting sequence $\mu_{0k}$. Thus, if the functions $f_j^\mu = \sum_k \hat\mu_{jk}^T \pi_{jk}$ have a limit $f^\mu$ then this limit depends linearly on the initial moments $\mu_{0k}(f)$.

The components $\phi^i$ of a Legendre dual scaling function $\Phi$ (which also depends on $L$) can now be defined. Let $S^\infty = S^\infty_{L,m}$ denote the linear operator that sends moment data at level zero to a limit function via iteration of $S_{L,m}$. Then $\phi^i$ is simply $S^\infty(\delta_{0k}\delta^{il})$. Since the refinement scheme commutes with integer shifts, $\phi^i(x - k) = S^\infty(\delta_{kk'}\delta^{il})$. Thus, if $\mu_{0k} = \{\mu^i_{0k}\}_{i=0}^{m-1}$ is any bounded sequence of moment vectors then

$$S^\infty(\mu_{0k}) = \sum_k \mu_{0k}\,\Phi_{0k}.$$
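Because $S_{L,m}$ are defined in terms of moment interpolation, one has

$$\langle \pi^i_k, \phi^l_h\rangle = \delta_{kh}\,\delta^{il} \qquad\text{or}\qquad \Big\langle \pi_k, \sum_{k'} \mu_{0k'}\,\Phi_{0k'}\Big\rangle = \mu_{0k}$$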
from which (2.37) and (2.38) follow.

In contrast to moment interpolation, which seeks a limit function having the same prescribed local moments as a given $f$, Hermite interpolation (HI) refinement seeks a well-behaved function having prescribed point values $\beta^i_k \sim f^{(i)}(k)$ of a function $f$ and its first $m - 1$ derivatives at the integers. In this case, two-scale refinement prescribes values $\beta^i_{1,2k+1}$ at the half-integers and so on. Fixing $L > 0$ and $D$ as before, the Hermite refinement problem is to find a polynomial $\rho_{0k}$ of degree $D + m$ satisfying

$$\rho_{0k}^{(i)}(k + h) = \beta^i_{0,k+h}, \qquad -L \le h \le L + 1 \tag{2.42}$$

from which one prescribes the half-integer values:

$$\hat\beta^i_{1,2k+1} = \rho_{0k}^{(i)}(k + 1/2). \tag{2.43}$$
This procedure implicitly defines an operator $T = T_{L,m}$ yielding

$$\{\hat\beta^i_{j+1,k}\} = T\{\beta^i_{jk}\}.$$

Parallel to the case of MI, there happen to exist $m \times m$ matrices $N_h$, now with $-L \le h < L$ and such that

$$\hat\beta^i_{j+1,2k+1} = \frac{1}{2^{(j+1)L}}\sum_{h=-L}^{L-1}\sum_{l=0}^{m-1} 2^{jl}\,N_h(i, l)\,\beta^l_{j,k-h}; \qquad \hat\beta_{j+1,2k} = \beta_{jk}.$$
In order to relate an interpolant based on point-values of a function and a certain number of its derivatives with a corresponding interpolant of local moments up to a certain order, as suggested above, first one must link higher derivatives with higher polynomial moments and, second, one must pass this link from data on unit scale to infinitesimal scales.
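In the scalar case $m = 1$ no higher derivatives or moments are involved and the link is plain forward differencing. The Python sketch below checks this numerically for one refinement step: a four-point (cubic Lagrange) interpolatory step applied to point values $F(k)$, followed by differencing, gives the same cell averages as the three-cell (quadratic) average-interpolation step applied to the averages $F(k + 1) - F(k)$. The weights used are the standard cubic midpoint weights and the $L = 1$, $m = 1$ average-interpolation masks; the function names are assumptions made for this example.

```python
import numpy as np

DD_WEIGHTS = np.array([-1.0, 9.0, 9.0, -1.0]) / 16.0  # cubic Lagrange weights at the midpoint
AI_EVEN = np.array([1.0, 8.0, -1.0]) / 8.0            # left-half average from 3 cell averages
AI_ODD = np.array([-1.0, 8.0, 1.0]) / 8.0             # right-half average from 3 cell averages

def refine_point_values(F):
    """Interpolated values at k + 1/2 for the interior cells k = 1 .. len(F) - 3."""
    return np.array([DD_WEIGHTS @ F[k - 1:k + 3] for k in range(1, len(F) - 2)])

def refine_averages(a):
    """Averages over the left and right halves of the interior cells k = 1 .. len(a) - 2."""
    left = np.array([AI_EVEN @ a[k - 1:k + 2] for k in range(1, len(a) - 1)])
    right = np.array([AI_ODD @ a[k - 1:k + 2] for k in range(1, len(a) - 1)])
    return left, right

rng = np.random.default_rng(0)
F = np.cumsum(rng.standard_normal(12))   # point values F(0), ..., F(11) of an antiderivative
a = np.diff(F)                           # averages of f = F' over the cells [k, k + 1]

F_mid = refine_point_values(F)
left, right = refine_averages(a)

# averages over half cells obtained by differencing the refined point values
left_from_F = 2.0 * (F_mid - F[1:-2])
right_from_F = 2.0 * (F[2:-1] - F_mid)
print(np.max(np.abs(left - left_from_F)), np.max(np.abs(right - right_from_F)))
```

Both printed discrepancies are at the level of rounding error, which is the one-step, scalar instance of the statement that differentiation of pointwise interpolants corresponds to average interpolants.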
To map Legendre data to Hermite data, suppose that $\mu^i_{jk}$ is a finite Legendre moment sequence of $f = \sum_k \mu_{jk}^T \pi_{jk}$. Let $(m - 1)!\,F(x)$ be the $(m - 1)$-th order antiderivative of $f$ and define $\beta$ to be the Hermite vector data of $F$ as above. Then one has the following [116].

Theorem 2.4.2. Given a set of Legendre moment vectors $\{\mu_{jl} : k - L \le l \le k + L\}$ there exists a set of Hermite vectors $\{\beta_{jl} : k - L \le l \le k + L + 1\}$ such that the unique Hermite interpolant $\rho_{jk}$ of $\{\beta_{jl}\}$ and moment interpolant $p_{jk}(x)$ of $\{\mu_{jl}\}$ are related by

$$(m - 1)!\,\frac{d^m \rho_{jk}}{dx^m} = p_{jk}.$$
This theorem is not difficult to prove once one identifies the precise mechanism (essentially divided differencing) for mapping Hermite data to moment data. One desires to pass this connection through to limit functions. It turns out that Bernstein polynomials play a fundamental role in relating divided differences to moment sequences. One defines a matrix $B$ whose $i$th column contains the Legendre moments on $[0, 1)$ of the Bernstein polynomial

$$B^{m,i}(x) = \frac{(m - 1)!}{i!\,(m - 1 - i)!}\,x^i (1 - x)^{m-1-i}$$

where $0 \le i < m$. Then $B$ amounts to a change of polynomial basis. The passage from Hermite data at one level to Legendre data at the next level is captured by means of the following commuting diagram:

$$\begin{array}{ccc}
\{\beta^i_k\} & \xrightarrow{\ B^{-1}\circ\,\partial^m\ } & \{\mu^i_k\}\\[4pt]
\Big\downarrow{\scriptstyle\ \text{Hermite interpolation}} & & \Big\downarrow{\scriptstyle\ \text{moment interpolation}}\\[4pt]
\{\rho_k\} & \xrightarrow{\ (m-1)!\,\frac{d^m}{dx^m}\ } & \{p_k\}\\[4pt]
\Big\downarrow{\scriptstyle\ \text{sampling at } x_{j+1,2k+1}} & & \Big\downarrow{\scriptstyle\ \text{subinterval moments}}\\[4pt]
\{\beta^i_{1,k}\} & \xrightarrow{\ B^{-1}\circ\,\partial^m\ } & \{\mu^i_{1,k}\}
\end{array}$$
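The matrix $B$ described above can be tabulated by quadrature. The Python sketch below assumes the normalization $\pi^l(x) = \sqrt{2l + 1}\,P_l(2x - 1)$ on $[0, 1)$, which is consistent with the boundary values $\pi^l(1) = \sqrt{2l + 1}$ quoted earlier, and uses an $m$-point Gauss–Legendre rule, which is exact here because each integrand has degree at most $2m - 2$. The function name `bernstein_moment_matrix` is an assumption made for this example.

```python
import numpy as np
from math import comb
from numpy.polynomial.legendre import leggauss, Legendre

def bernstein_moment_matrix(m):
    """B[l, i] = integral over [0, 1] of B^{m,i}(x) * pi^l(x) dx."""
    nodes, weights = leggauss(m)   # Gauss-Legendre rule on [-1, 1]
    x = 0.5 * (nodes + 1.0)        # map the nodes to [0, 1]
    w = 0.5 * weights
    B = np.zeros((m, m))
    for i in range(m):
        bern = comb(m - 1, i) * x**i * (1.0 - x) ** (m - 1 - i)
        for l in range(m):
            # pi^l at the mapped nodes: sqrt(2l + 1) * P_l(2x - 1), and 2x - 1 = nodes
            leg = np.sqrt(2 * l + 1) * Legendre.basis(l)(nodes)
            B[l, i] = np.sum(w * bern * leg)
    return B

B = bernstein_moment_matrix(3)
print(np.round(B, 4))
```

Two quick sanity checks follow from the structure: the Bernstein polynomials sum to one, so the rows of `B` sum to $(1, 0, \dots, 0)$, and the first row is constant and equal to $1/m$ because each $B^{m,i}$ integrates to $1/m$.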
Again, we refer to [116] for full details. See also Dyn and Levin [129], which builds on an earlier idea of Dyn [128], for details regarding the regularity of limit functions. Naturally, regularity increases with the length parameter $L$. A straightforward consequence of Theorem 2.4.2 is that an MI refinement limit $MI(m, L)$ is precisely the $m$th derivative of a corresponding Hermite $(m, L + 1)$ refinement limit [116].

Multiwavelet estimation of pointwise products. In Section 2.1.4 we considered estimates for wavelet paraproducts that can come into play in function space estimates arising in nonlinear PDE. We consider now the problem of
devising numerically useful counterparts of such products. To describe the difficulty, if $f, g$ are locally (well approximated by) polynomials of degree at most $m - 1$, their product will typically be better approximated by a low-order polynomial on some subintervals than others. Thus one seeks to partition an interval, say $[0, 1]$, into dyadic subintervals $\mathcal{P}_f = \{a = x_0 < x_1 < \cdots < x_n = b\}$ such that $f$ is well approximated by polynomials of at most a fixed degree on each subinterval. Alpert et al. [6] suggest the following algorithm for approximating pointwise products, assuming an algorithm for determining partitions $\mathcal{P}_f$ and $\mathcal{P}_g$: (i) Find $\mathcal{P}_f$ and $\mathcal{P}_g$ and set $f \approx \sum_{I \in \mathcal{P}_f}\sum_{i=0}^{m-1} \alpha_I^i \pi_I^i$ and $g \approx \sum_{\tilde I \in \mathcal{P}_g}\sum_{i=0}^{m-1} \beta_{\tilde I}^i \pi_{\tilde I}^i$. (ii) Define the partition $\mathcal{P}_{fg}$ obtained by bisecting the intervals in the joint refinement of $\mathcal{P}_f$ and $\mathcal{P}_g$. (iii) Define the product coefficients $c_J^i = 2^{|J|/2}\alpha_J^i \beta_J^i/\sqrt{w_i}$ for each $J$ in $\mathcal{P}_{fg}$. This defines an initial approximation

$$fg \approx \sum_{J \in \mathcal{P}_{fg}}\sum_{i=0}^{m-1} c_J^i\,\pi_J^i.$$
The coefficients $\alpha_J^i$ are obtained by expanding the $\pi_I$ of the original partitions in terms of the scaling functions $\pi_J$ of the refined partitions. The $w_i$ are quadrature weights. (iv) One merges refinement intervals for the product if some criterion is met, e.g., if coefficients from sibling intervals are close to their average over their parent interval.

Legendre wavelets and recursive partitioning. The algorithm just outlined requires, in turn, an algorithm for computing appropriate function-dependent partitions of $[0, 1]$ on which $f, g$ could be estimated locally by Legendre sums. Perhaps the most basic problem of recursive partitioning is to estimate a function $f$, say, by a fixed number $N$ of local averages corresponding to a dyadic partition of $[0, 1]$; cf. [60]. Higher-order accuracy is possible using piecewise polynomial approximation. This leaves the problem of obtaining approximations efficiently. Legendre wavelets can aid in doing so [116].

A recursive dyadic partition (RDP) of $[0, 1)$ is any partition $\mathcal{P}$ determined as follows: (i) $\mathcal{P} = \{[0, 1)\}$ is an RDP and (ii) if $\mathcal{P} = \{[a_k, b_k)\}$ is a partition into $N$ subintervals then splitting one of the intervals in half results in a new partition $\mathcal{P}'$ called a refinement of $\mathcal{P}$. An RDP is naturally associated with a dyadic tree. One can measure spatial homogeneity or inhomogeneity in terms of the entropy of the lengths of the intervals of the partition, namely $E(\mathcal{P}) = \sum_j 2^{-j}\log\big(\#\{I \in \mathcal{P} : |I| = 2^{-j}\}/\#\{I \in \mathcal{P}\}\big)$. Given $f$, one sets $f_{\mathcal{P}} = f_{\mathcal{P}}^m$ the piecewise polynomial approximation to $f$ given by $f_{\mathcal{P}} = \sum_{I \in \mathcal{P}} p_I\,\chi_I$ in which $p_I$ is the $L^2$ orthogonal projection onto $V_I^m$. One says that $\mathcal{P}^* = \mathcal{P}^*(N, m)$ is an optimal RDP provided $\mathcal{P}^*$ minimizes $\|f - f_{\mathcal{P}}\|_2$ over all dyadic partitions into at most $N$ pieces and with piecewise polynomial approximants of degree at most $m$. The difficulty here is to satisfy the two constraints, on $m$ and $N$, simultaneously: it is not clear that there is a direct binary search strategy that will come up with the solution.
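To make the objects concrete, the Python sketch below computes the local $L^2$ projection error of $f$ onto polynomials of degree less than $m$ on a dyadic interval and grows an RDP by a greedy rule: always halve the interval with the largest local error. This greedy rule is an assumption made for illustration only; it produces a valid RDP with $N$ pieces but is not claimed to find the optimal partition $\mathcal{P}^*(N, m)$. Integrals are approximated with a fixed 20-point Gauss–Legendre rule, and the names `local_projection_error`, `greedy_rdp` and `bump` are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

NODES, WEIGHTS = leggauss(20)   # fixed quadrature rule on [-1, 1]

def local_projection_error(f, a, b, m):
    """Approximate squared L2 error on [a, b) of projecting f onto polynomials of degree < m."""
    x = 0.5 * (b - a) * (NODES + 1.0) + a
    w = 0.5 * (b - a) * WEIGHTS
    fx = f(x)
    err = np.sum(w * fx**2)     # ||f||^2 on [a, b)
    for l in range(m):
        # orthonormal Legendre function of degree l on [a, b)
        basis = np.sqrt((2 * l + 1) / (b - a)) * Legendre.basis(l)(NODES)
        err -= np.sum(w * fx * basis) ** 2
    return max(err, 0.0)

def greedy_rdp(f, m, N):
    """Greedy heuristic: repeatedly halve the interval with the largest local error."""
    parts = [(0.0, 1.0)]
    while len(parts) < N:
        errs = [local_projection_error(f, a, b, m) for a, b in parts]
        a, b = parts.pop(int(np.argmax(errs)))
        mid = 0.5 * (a + b)
        parts += [(a, mid), (mid, b)]
    parts.sort()
    total = sum(local_projection_error(f, a, b, m) for a, b in parts)
    return parts, total

def bump(x):
    return np.exp(-50.0 * (x - 0.3) ** 2)

parts, err = greedy_rdp(bump, m=2, N=8)
print(parts)
print(err)
```

For a localized function such as `bump`, the greedy partition concentrates short intervals near the feature, which is the kind of spatial inhomogeneity the entropy $E(\mathcal{P})$ is meant to quantify.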
The first step is to find a way of associating to an RDP $\mathcal{P}$ the solution $f_{\mathcal{P}}$. In the following theorem one denotes by $I(\mathcal{P})$ the collection of strict dyadic ancestors in $[0, 1)$ of elements of $\mathcal{P}$, with $I(\mathcal{P}) = \emptyset$ if $\mathcal{P} = \{[0, 1)\}$. Also, $\psi$ refers to an orthogonal Legendre mother wavelet vector.

Theorem 2.4.3. For each RDP $\mathcal{P}$ the operator $f \mapsto f_{\mathcal{P}}^m$ is diagonal in the Legendre basis, that is:

$$f_{\mathcal{P}}^m = \mu_0^T \pi + \sum_{I(j,k) \in I(\mathcal{P})} \langle f, \psi_{jk}\rangle^T \psi_{jk}$$

with

$$\|f - f_{\mathcal{P}}^m\|_2^2 = \sum_{I(j,k) \notin I(\mathcal{P})} \|\langle f, \psi_{jk}\rangle\|^2.$$

This theorem is simple to prove: if $I(j, k) \subset I \in \mathcal{P}$ then $f_{\mathcal{P}}^m|_{I(j,k)} \in V^m_{I(j,k)}$ so

$$\langle f_{\mathcal{P}}^m, \psi_{jk}\rangle = \langle p_I, \psi_{jk}\rangle = 0.$$

The set $N(\mathcal{P})$ of those piecewise polynomials having this orthogonality property has dimension $Nm$. The range of the operator mapping $f$ to $f_{\mathcal{P}}^m$ has the same dimension. This operator acts as the identity on its range, which contains the functions $\phi^l$ and the Legendre wavelets $\psi^l_{jk}$ for each $I(j, k) \in I(\mathcal{P})$. There are $Nm$ of these. They are linearly independent and so must span the range. The identities then follow from orthogonality of the Legendre wavelets.

The approximant $f_{\mathcal{P}}^m$ will generally be discontinuous, since the wavelets are. In [116] a method is proposed for smoothing this approximant, thus avoiding discontinuities at boundaries of partition elements, by extending the MI refinement method outlined above. Presumably this approach can be adapted to dealing with the operator $d/dx$ in terms of dual Legendre wavelets as well.

Subdivision and commutation. The dyadic rationals $\{k/2^j\}$ at level $j$ define a sequence of grids. For scaling functions, the refinement mask produces data on the next finer dyadic level from data on the given level. Extra properties of the scaling filter guarantee convergence and other desirable properties of the scaling function. One important direction in time–frequency analysis involves extending this approach of defining multiresolution analyses to situations that do not lend themselves to uniform multilevel grids, but in which nonuniform ones can be defined. A series of papers of Daubechies, Guskov and Sweldens addresses the particular issue that we consider now of extending commutation relations for MRAs to nonuniform subdivision structures on the real line. Basic issues arise in the nonuniform setting that are not present in the standard dyadic setting. For example, it becomes important to allow the transition operators that map data from one grid level to the next to depend on level. This becomes clear when one is faced with the task of producing specific
functions such as polynomials in the limit. In what follows we consider how to use divided differences to produce derivatives in the limit, just as was done using Strela’s techniques in Section 2.3 or the techniques in [116] above in the case of MI and HI schemes. Again, irregularity of the grid becomes an important consideration. We will develop the multigrid formalism loosely, leaving out some important subtle points. Precise details can be found in [103] and [104].

By a grid $X$ on $\mathbb{R}$ we mean simply a strictly increasing sequence $x_k$ such that $\lim_{k \to \pm\infty} x_k = \pm\infty$. By a dyadic multigrid we then mean a sequence $\{X_j\}_{j=0}^\infty$ of grids that are dyadically nested in the loose sense that if $x_{jk}$ denotes the $k$th element of $X_j$ then $x_{jk} = x_{j+1,2k}$. Important generalizations of this notion are made precise in [103]. One can associate to a multigrid a sequence of linear transition operators $T_j : \ell^\infty(X_j) \to \ell^\infty(X_{j+1})$ mapping data on $X_j$ to data on $X_{j+1}$. In the standard coordinates of $\ell^\infty(\mathbb{Z})$, these operators define matrices $T_j = \{T^j_{kl}\}$ (with countably many rows and columns). They are associated with the grid only insofar as sequences $f_j = \{f_{jk}\}$ input into $T_j$ are thought of as functions $f_{jk} = f_j(x_{jk})$ on $X_j$.

A fundamental question is: what functions on $\mathbb{R}$ can one build as limits of sequences on $X_0$? First one must make sense of what a limit means. In order that a function $f(x)$ is a well-defined limit of $f_j$ it must be that, whenever $x_{jk_j} \to x \in \mathbb{R}$ as $j \to \infty$, one has $f_j(x_{jk_j}) \to f(x)$. In order that such a limit function $f$ is defined on all of $\mathbb{R}$, the multigrid $\{X_j\}$ must be dense in the sense that any $x \in \mathbb{R}$ can be expressed as $x = \lim_{j \to \infty} x_{jk_j}$. Other basic requirements are needed. For example, it seems reasonable to impose that, if the initial data $f_0$ on $X_0$ is identically equal to one then the data $f_j$ produced by the transition operators at each level and the limit function $f$ are also identically one.
This can be accomplished by imposing the property

$$\sum_l T^j_{kl} = 1 \tag{2.44}$$

for all $j$ and $k$ on the transition matrices $\{T^j_{kl}\}$. It makes sense to refer to a scheme $\{X_j, T_j\}$ having this property as a subdivision scheme since it implies that if one starts with a unit mass at the point $x_{jk}$ this mass gets subdivided among the grid points at the next level and so on (note, though, that the $T^j_{kl}$ need not be non-negative). Actually, if (2.44) is satisfied, then one says $\{X_j, T_j\}$ is an affine scheme.

Consider next the problem of producing from $\{X_j, T_j\}$ a so-called derived scheme $\{X_j, T_j^-\}$ whose limits are derivatives of the limits produced by $\{X_j, T_j\}$. If $\{X_j, T_j^-\}$ is affine then $\{X_j, T_j\}$ must have produced the function $f(x) = x$ as a limit from some starting sequence $f_0$. In this case $\{X_j, T_j\}$ is said to produce linear functions. Given such a scheme $\{X_j, T_j\}$, how can one define $\{X_j, T_j^-\}$? In the remainder of these notes we abbreviate by $\Delta$ the forward difference operator $(\Delta s)_k = s_{k+1} - s_k$.
Lemma 2.4.4. If $f_0 \in C_0(\mathbb{Z})$ then the scheme with multigrid $\{X_j\}$ and transition matrices $S^j_{kl} = \sum_{m > l}\big(T^j_{k+1,m} - T^j_{km}\big)$ produces the sequences $\{\Delta f_j\}$ from $\Delta f_0$.

The lemma follows directly from a telescoping argument. Assuming that $\{X_j, T_j\}$ produces linear functions, denote henceforth by $a_{j+1} = T_j a_j$ the level sequences starting from an $a_0$ from which $\{X_j, T_j\}$ produces the limit $f(x) = x$. One can show then that the derived scheme $\{X_j, T_j^-\}$ having the same multigrid and with transition matrices

$$T^{j,-}_{kl} = \frac{\Delta(a_j)_l}{\Delta(a_{j+1})_k}\,S^j_{kl} = \frac{a_{j,l+1} - a_{jl}}{a_{j+1,k+1} - a_{j+1,k}}\sum_{m > l}\big(T^j_{k+1,m} - T^j_{km}\big) \tag{2.45}$$
produces derivatives of limits from $\{X_j, T_j\}$. More precisely, if $\{X_j, T_j\}$ produces linear functions then one has the following.

Theorem 2.4.5. If $\{X_j, T_j\}$ produces $f(x)$ from the starting sequence $f_0$ and $\{X_j, T_j^-\}$ produces $f^-(x)$ from $f_0$, then $f$ is differentiable with derivative $f^-$.

The forward divided difference operators $\partial_j^a(f_j) = \Delta f_j/\Delta a_j$ can be thought of as applying $\Delta$ then left multiplying by $\mathrm{diag}(1/(\Delta a_j))$ which commutes with $S_j$ in Lemma 2.4.4. Therefore

$$\big(T_j^-\,\partial_j^a f_j\big) = \frac{1}{\Delta(a_{j+1})}\,S_j\,\Delta(f_j) = \frac{\Delta f_{j+1}}{\Delta(a_{j+1})}. \tag{2.46}$$

In short, the schemes $\{X_j, T_j\}$ and $\{X_j, T_j^-\}$ satisfy the commutation formula

$$\partial^a_{j+1}\,T_j = T_j^-\,\partial_j^a. \tag{2.47}$$
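The commutation formula can be checked numerically on a finite piece of an irregular multigrid. The Python sketch below is a finite-window illustration under stated assumptions: the coarse grid is random, new points are inserted at random positions inside each cell, and $T_j$ is taken to be a local Lagrange interpolation rule (up to four points, fewer at the ends), which is affine and reproduces $f(x) = x$ with $a_j$ equal to the grid points themselves. This particular choice of $T_j$ and the function names are assumptions made for the example; they are not the schemes of [103, 104].

```python
import numpy as np

def lagrange_weights(nodes, x):
    """Weights w with sum_i w[i] * g(nodes[i]) = g(x) for polynomials g of degree < len(nodes)."""
    w = np.ones(len(nodes))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                w[i] *= (x - xj) / (xi - xj)
    return w

def transition_matrix(x0, x1):
    """Interpolatory T_j: keep old points, fill new points by local Lagrange interpolation."""
    n = len(x0)
    T = np.zeros((len(x1), n))
    T[2 * np.arange(n), np.arange(n)] = 1.0
    for k in range(n - 1):
        lo, hi = max(k - 1, 0), min(k + 3, n)
        T[2 * k + 1, lo:hi] = lagrange_weights(x0[lo:hi], x1[2 * k + 1])
    return T

def derived_matrix(T, a0, a1):
    """T_j^- from (2.45), with S_kl = sum_{m > l} (T_{k+1,m} - T_{k,m})."""
    D = T[1:] - T[:-1]
    tail = np.cumsum(D[:, ::-1], axis=1)[:, ::-1]  # tail[k, l] = sum_{m >= l} D[k, m]
    S = tail[:, 1:]                                # S[k, l] = sum_{m > l} D[k, m]
    return S * np.diff(a0)[None, :] / np.diff(a1)[:, None]

rng = np.random.default_rng(1)
x0 = np.cumsum(rng.uniform(0.5, 1.5, size=9))      # irregular coarse grid X_0
x1 = np.empty(2 * len(x0) - 1)
x1[0::2] = x0                                      # nesting: x_{0,k} = x_{1,2k}
x1[1::2] = x0[:-1] + rng.uniform(0.3, 0.7, size=len(x0) - 1) * np.diff(x0)

T = transition_matrix(x0, x1)
T_minus = derived_matrix(T, x0, x1)                # a_0 = x0 and a_1 = T a_0 = x1

f0 = rng.standard_normal(len(x0))
lhs = np.diff(T @ f0) / np.diff(x1)                # divided differences of the refined data
rhs = T_minus @ (np.diff(f0) / np.diff(x0))        # derived scheme applied to divided differences
print(np.max(np.abs(lhs - rhs)))
```

The printed discrepancy is at rounding level. The rows of `T_minus` also sum to one, so the derived scheme is again affine, as expected when $T_j$ produces linear functions.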
One would like to use Theorem 2.4.5 as a framework for building subdivision schemes that produce limits with some number of derivatives, but the theorem tells us only how to produce derivatives. One thus wishes to associate to $\{X_j, T_j\}$ an integrated scheme $\{X_j, T_j^+\}$ that produces antiderivatives of limits of $\{X_j, T_j\}$. In the commutation formula (2.47) the linear producing sequences $a_j$ are given. For inverse commutation one must come up both with sequences $b_j$ and transition operators $T_j^+$ verifying

$$\partial^b_{j+1}\,T_j^+ = T_j\,\partial_j^b. \tag{2.48}$$
Hypothesizing such sequences $b_j$ for the moment, let $B_j$ denote the matrix $\mathrm{diag}(\Delta b_j)$. If one imposes that the limit functions produced by $\{X_j, T_j^+\}$ have compact support when their derivatives produced by $\{X_j, T_j\}$ do, then

$$0 = \sum_k \big(f^+_{j+1,k+1} - f^+_{j+1,k}\big) = \sum_k \Delta b_{j+1,k}\,f_{j+1,k} = \mathbf{1}^T B_{j+1} T_j f_j = \mathbf{1}^T B_{j+1} T_j B_j^{-1}\,\Delta f_j^+$$

in which $\mathbf{1}$ is the infinite column vector having all entries one. Requiring this of all finitely supported sequences $f_j^+$ forces

$$\mathbf{1}^T B_{j+1} T_j B_j^{-1} = c_j\,\mathbf{1}^T \tag{2.49}$$
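for some constant $c_j$. One seeks $b_j$ for which this identity holds. From (2.48), $T_j^+$ must satisfy

$$\frac{T^{j,+}_{k+1,l} - T^{j,+}_{kl}}{(\Delta b_{j+1})_k} = -\frac{T^j_{kl}}{(\Delta b_j)_l} + \frac{T^j_{k,l-1}}{(\Delta b_j)_{l-1}}$$

which, upon iterating, yields

$$T^{j,+}_{kl} = \sum_{m > k} (\Delta b_{j+1})_m \left(\frac{T^j_{m,l}}{(\Delta b_j)_l} - \frac{T^j_{m,l-1}}{(\Delta b_j)_{l-1}}\right)$$

where the series on the right converges when (2.49) applies. Assuming as before that $T$ is affine, one has

$$\partial^b_{j+1}\,T_j^+ b_j = T_j\,\partial_j^b b_j = T_j \mathbf{1} = \mathbf{1} \qquad\text{or}\qquad \Delta T_j^+ b_j = \Delta b_{j+1}.$$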
This defines $b_j$ up to a constant that depends on $j$. By (2.48) the constants can be chosen such that $T_j^+ b_j = b_{j+1}$. Moreover, the $b_j$ must be the linear producing sequences for $T^+$ which tells us that the constants thus fixed will also have a well-defined limit as $j \to \infty$.

Commutation and biorthogonality. Theorem 2.3.7 shows that, starting with biorthogonal DGHM MRAs, the new MRAs produced by smoothing and roughening respectively remain biorthogonal. For irregular subdivision schemes as discussed here, not all properties of MRAs go through. However, this property of commutation with respective smoothing and roughening does. Here is an outline. For a subdivision scheme corresponding to a scalar MRA, the function $\phi_{jk}(x) = 2^{j/2}\phi(2^j x - k)$ is produced from iterating the refinement mask starting with the sequence $\delta(k/2^j)$ defined on dyadic rationals at level $j$ having value one at $k/2^j$ and zeros elsewhere. In the case of irregular subdivision, one defines $\phi_{jk}$ by analogy as the subdivision limit starting with the sequence $\delta(x_{jk})$. These functions are still called scaling functions (e.g., [104]) because they satisfy

$$\phi_{jk} = \sum_l T^j_{kl}\,\phi_{j+1,l}$$

but, of course, they are not related to other $\phi_{j'k'}$ by dilating and translating. Two subdivision schemes $\{X_j, T_j\}$ and $\{X_j, \tilde T_j\}$ sharing the same grid are said to be (uniformly) biorthogonal provided their scaling functions satisfy

$$\int \phi_{jk}\,\tilde\phi_{j\nu} = w_{jk}\,\delta_{k\nu}$$

with weights $w_{jk}$ uniformly bounded above and below for $j$ fixed.
Theorem 2.4.6. Let $\{X_j, T_j\}$ and $\{X_j, \tilde T_j\}$ be a pair of biorthogonal, affine, subdivision schemes with normalizing sequences $w_j = \{w_{jk}\}_k$. Then, when defined as above, the derived and integrated schemes $\{X_j, T_j^-\}$ and $\{X_j, \tilde T_j^+\}$ are again biorthogonal with normalizing sequences $\Delta a_j$, where $a_j$ produces $f(x) = x$ from $\{X_j, T_j\}$. Moreover, $b_j$ that produce $x$ from $\{X_j, \tilde T_j^+\}$ satisfy $\tilde\Delta b_j = w_j$ where $\tilde\Delta$ is the backward difference operator $(\tilde\Delta s)_k = s_k - s_{k-1}$.

It is worth commenting that, in the proof given in [104], the scaling functions are assumed further to be uniformly bounded with compact support. In [104], higher-order commutation between Lagrange interpolation schemes that interpolate data at the grid points $\{x_{jk}\}$ and spline schemes with knots at the grid points is considered. This represents a sort of generalization to the case of nonuniform grids of the results in [116] relating Hermite and moment interpolation refinement schemes.

Other irregular multiresolution methods. The preceding material was adapted mainly from [103] and [104]. An alternate multiwavelet based notion of an irregular MRA was introduced by Donovan et al. [122]. Here is a brief example of their construction. One is given a two-nested knot sequence $\{x_{jk}\}$. Let $V_j$ be the space of continuous piecewise quadratic splines with knots at $x_{jk}$. These spaces form an MRA in the sense of being nested and having dense limit. One introduces new spaces $V_j'$ based on the notion of a squeeze map that adds basis functions $w_{jk}$ supported in $[x_{jk}, x_{j,k+1}]$ to the basis generated by normalized linear and quadratic splines. For $[x_{jk}, x_{j,k+1}] = [0, 1]$ and $v = x_{j+1,2k+1} \in (0, 1)$ the basic splines $h(x) = (1 - |x|)_+$ and $q(x) = x(1 - x)\chi_{[0,1]}(x)$ are refined by taking $q_{1,0} = q(x/v)$, $q_{1,1}(x) = q((x - v)/v)$ and $h_1(x) = x/v$ on $[0, v]$, $h_1(x) = (1 - x)/(1 - v)$ on $[v, 1]$ and $h_1(x) = 0$ otherwise. One then produces orthogonal bases for $L^2(\mathbb{R})$ with desired approximation properties.

Lifting for multiwavelets.
Given a scaling function $\phi$ with filter $H$ there is, generally, more than one way to produce a filter bank $\{H, G, \tilde H, \tilde G\}$ satisfying the conditions of biorthogonality. The identification of complementary biorthogonal filters is due to Vetterli and Herley [353]. This identification was extended to the multiwavelet setting by Davis et al. [110], as follows.

Theorem 2.4.7. The polyphase matrix $P(z)$ corresponding to a compactly supported multiwavelet pair can be written

$$P(z) = \begin{bmatrix} I & 0\\ 0 & T'(z)\end{bmatrix}\begin{bmatrix} I & 0\\ S_N(z) & I\end{bmatrix}\begin{bmatrix} I & W_N(z)\\ 0 & I\end{bmatrix}\cdots\begin{bmatrix} I & 0\\ S_1(z) & I\end{bmatrix}\begin{bmatrix} I & W_1(z)\\ 0 & I\end{bmatrix}\begin{bmatrix} T(z) & 0\\ 0 & I\end{bmatrix}$$

in which $S_k(z), W_k(z)$ have finite Laurent degree, while $T$ and $T'$ have Laurent polynomial entries and monomial determinants.
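Each elementary factor in such a product is block triangular with identity diagonal blocks, so its inverse is obtained by negating the off-diagonal block. The Python sketch below illustrates this for one lower and one upper factor with $2 \times 2$ blocks. The Laurent coefficient matrices `S` and `W` are random placeholders chosen only for illustration (they do not come from any particular multiwavelet), and the factors are evaluated at a single point on the unit circle.

```python
import numpy as np

def eval_laurent(coeffs, z):
    """Evaluate sum_n C_n z**n for a dict {n: square matrix C_n}."""
    return sum(C * z**n for n, C in coeffs.items())

def lower(S):
    """Block matrix [[I, 0], [S, I]]."""
    r = S.shape[0]
    return np.block([[np.eye(r), np.zeros((r, r))], [S, np.eye(r)]])

def upper(W):
    """Block matrix [[I, W], [0, I]]."""
    r = W.shape[0]
    return np.block([[np.eye(r), W], [np.zeros((r, r)), np.eye(r)]])

rng = np.random.default_rng(0)
r = 2
S = {-1: rng.standard_normal((r, r)), 0: rng.standard_normal((r, r))}  # placeholder S(z)
W = {0: rng.standard_normal((r, r)), 1: rng.standard_normal((r, r))}   # placeholder W(z)

z = np.exp(0.7j)
Sz, Wz = eval_laurent(S, z), eval_laurent(W, z)

P = lower(Sz) @ upper(Wz)          # two lifting steps
P_inv = upper(-Wz) @ lower(-Sz)    # inverse: negate each step and reverse the order
print(np.max(np.abs(P @ P_inv - np.eye(2 * r))))
```

Because the inverse factors are again Laurent polynomial matrices of the same degree, no matrix inversion over Laurent polynomials is ever needed for these steps; only the diagonal factors $T$, $T'$ require separate treatment.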
The matrices $T, T'$ each possess a further LU type factorization. As in the uniwavelet case then: (i) construction of duals is simple, since one has only to invert each lifting step to construct the dual polyphase matrix, and (ii) scaling filters with good approximation properties can be built stepwise.

DGHM wavelets on $\mathbb{R}^2_+$. The symmetry properties of the DGHM wavelets lead to natural bases for half-spaces in higher dimensions. The $L^2$-Sobolev space $H^\alpha(\mathbb{R}^2_+)$ consists of those functions having extensions to $H^\alpha(\mathbb{R}^2)$. Just as in one variable, the even extension across the boundary is continuous from $H^\alpha(\mathbb{R}^2_+)$ to $H^\alpha(\mathbb{R}^2)$. On the other hand, elements of $H^\alpha(\mathbb{R}^2)$ have well-defined restrictions to $H^{\alpha-1/2}(\mathbb{R})$ (cf. [2]). Tensor products of DGHM wavelets provide simple, constructive proofs of these facts [245]. The characterization of Sobolev norms in terms of wavelet coefficients makes the restriction result particularly intuitive. Table 2.6 summarizes the tensor DGHM spaces on $\mathbb{R}^2$ that live inside $\mathbb{R}^2_+$ and at the interior boundary of $\mathbb{R}^2_+$. The corresponding $L^2$-spaces are orthogonal or oblique direct sums depending on whether the wavelets are orthogonal or biorthogonal.

Table 2.6. Multiresolution spaces for $\mathbb{R}^2_+$

MRA space                          Tensor product/direct sum description
$V_j^{int}(\mathbb{R}^2_+)$        $V_j(\mathbb{R}) \otimes V_j^{int}(\mathbb{R}_+)$
$W_j^{int}(\mathbb{R}^2_+)$        $[V_j(\mathbb{R}) \otimes W_j^{int}(\mathbb{R}_+)] \oplus [W_j(\mathbb{R}) \otimes V_j^{int}(\mathbb{R}_+)] \oplus [W_j(\mathbb{R}) \otimes W_j^{int}(\mathbb{R}_+)]$
$V_j^{bd}(\mathbb{R}^2_+)$         $V_j(\mathbb{R}) \otimes V_j^{bd}(\mathbb{R}_+)$
$W_j^{bd}(\mathbb{R}^2_+)$         $[V_j(\mathbb{R}) \otimes W_j^{bd}(\mathbb{R}_+)] \oplus [W_j(\mathbb{R}) \otimes V_j^{bd}(\mathbb{R}_+)] \oplus [W_j(\mathbb{R}) \otimes W_j^{bd}(\mathbb{R}_+)]$
$L^2_{int}(\mathbb{R}^2_+)$        $V_0^{int}(\mathbb{R}^2_+) \oplus \big[\oplus_{j \ge 0} W_j^{int}(\mathbb{R}^2_+)\big]$
$L^2_{bd}(\mathbb{R}^2_+)$         $V_0^{bd}(\mathbb{R}^2_+) \oplus \big[\oplus_{j \ge 0} W_j^{bd}(\mathbb{R}^2_+)\big]$
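The following is a two-variable analogue of Corollary 2.3.4.

Corollary 2.4.8. If $f \in L^2_{bd}(\mathbb{R}^2_+) \cap H^\alpha(\mathbb{R}^2_+)$ ($0 \le \alpha \le 1$) has expansion

$$f = \sum_{j=1}^\infty \sum_k \Big(\big(d^e_{jk}\psi^e_{jk} + d^o_{jk}\psi^o_{jk}\big) \otimes \phi^e_{j0} + \big(e^e_{jk}\psi^e_{jk} + e^o_{jk}\psi^o_{jk}\big) \otimes \psi^e_{j0}\Big)\chi_{[0,\infty)}$$

then

$$\sum_{j=1}^\infty \sum_k 2^{2j\alpha}\big(|d^e_{jk}|^2 + |d^o_{jk}|^2 + |e^e_{jk}|^2 + |e^o_{jk}|^2\big) < \infty.$$

That is, this frame expansion converges to $f$ in $H^\alpha(\mathbb{R}^2_+)$.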
Since $\psi^e_{j0}(0) = 2^{j/2}\psi^e(0)$ and $\phi^e_{j0}(0) = 2^{j/2}\phi^e(0)$, $f$ satisfies

$$f\big|_{\mathbb{R}} = \sum_{j=1}^\infty 2^{j/2}\sum_k \Big(\big(d^e_{jk}\psi^e_{jk} + d^o_{jk}\psi^o_{jk}\big)\phi^e(0) + \big(e^e_{jk}\psi^e_{jk} + e^o_{jk}\psi^o_{jk}\big)\psi^e(0)\Big).$$

By Theorem 2.1.2 and Corollary 2.4.8, this series converges in $H^{\alpha-1/2}(\mathbb{R})$ (in the weak sense if $\alpha \le 1/2$). Using techniques similar to those of the product estimates of Theorem 2.1.9 one can extend these estimates to restrictions of arbitrary elements of $H^\alpha(\mathbb{R}^2)$ to obtain the following classical restriction theorem.

Corollary 2.4.9. If $f \in L^2_{bd}(\mathbb{R}^2_+) \cap H^\alpha(\mathbb{R}^2_+)$ ($0 \le \alpha \le 1$) then $f\big|_{\mathbb{R}}$ belongs to $H^{\alpha-1/2}(\mathbb{R})$ with weak convergence when $\alpha \le 1/2$.

Corresponding extension and restriction theorems for the Morrey space analogues of $H^\alpha(\mathbb{R}^n)$ and $H^\alpha(\mathbb{R}^n_+)$ can be found in [245]. These Morrey estimates are of particular relevance for Navier–Stokes equations, e.g., [134].
3 Sampling in Fourier and wavelet analysis
The advent of high-speed digital computers has revolutionized many aspects of our lives. The modern technologies we take for granted rely not only on fantastically fast computer hardware, but also on various mathematical technologies. Sampling theory, in its many guises, is among the most important of these. It provides a means by which continuous-time phenomena such as the pressure waves emanating from a voicebox or musical instrument or loudspeaker, or the image of a natural scene formed in a camera, can be dealt with by fast, though finite, machines. Computers can only ever deal with a discretized, finite-length, finite-accuracy “sampled” version of these phenomena, and sampling theory provides answers to important questions such as: (1) Is the continuous-time signal being observed amenable to sampling, i.e., can the signal be “captured” by its values taken at discrete, separated points in time? (2) If so, how fast, and where, should the samples be taken? (3) If the samples form an equivalent discrete-time description of the signal, how should the samples be combined to reconstruct the signal? A description of the technology of (necessarily) imperfect samplers is beyond the scope of this book. On a purely mathematical level, we idealize the sampling problem(s) as follows:

Uniqueness: Given a space $X$ of continuous-time “signals” defined on an interval $I$, which discrete sets $\Lambda \subset I$ have the property that if $f \in X$ then the sequence $f|_\Lambda$ determines $f$? Such a set $\Lambda$ is called a set of uniqueness for $X$.

Reconstruction: Given a set of uniqueness $\Lambda$ for $X$, how can we reconstruct $f \in X$ from the sequence $f|_\Lambda$?
Another issue concerning the application of sampling algorithms is their complexity. Some applications will require inexpensive, real-time processing of samples and this must be considered in evaluating the benefits of algorithms. In this chapter we investigate a few important aspects of the theory and practice of sampling. Numerous articles and books on the subject have recently appeared and the literature has become vast. We concentrate on just a few major issues, and make connections with the other main themes of this monograph, namely uncertainty and wavelets. In Section 3.1 we review important results in the theory of frames, particularly its impact on sampling. Basic results are followed by more advanced topics related to the acceleration of frame algorithms. Sampling of trigonometric polynomials is the setting for Section 3.2. The discrete Fourier transform provides the link between the coefficients of trigonometric polynomials and their uniform samples. Uniformity of the samples endows the Fourier matrix with a group structure that may be exploited to compute the transform in a fast manner. When sampling is performed nonuniformly, as often happens in applications, this group structure is lost. We present some recent work by a number of researchers towards making these transforms fast through the use of various approximation techniques. The section closes with a development of frame-based techniques for the recovery of the Fourier coefficients of trigonometric polynomials from nonuniform samples. In Section 3.3 we review a very small sample of the huge literature on sampling sets and the existence of frames of exponentials for L2 ([−Ω/2, Ω/2]). The theory of frames is applied to generate iterative reconstructions from samples of bandlimited signals on the line. The remarkable properties of the prolate spheroidal wavefunctions (PSWFs) are reviewed and the so-called ΩT theorem is addressed. 
This result, due to Slepian, gives weight to the folk theorem that the class of signals bandlimited to $[-\Omega/2, \Omega/2]$ and timelimited to $[-T/2, T/2]$ has dimension $\approx \Omega T$. Finally, recent results on local approximate sampling formulas for bandlimited signals based on PSWFs are discussed. The setting then moves to phase space. Section 3.4 introduces the short-time Fourier transform (STFT) which has become a standard tool of signal analysis. Information encoded in the STFT is highly redundant; in any event, a finite machine will only be able to compute (approximately) a finite number of samples of the transform. The question is: how much is enough? It is important to understand which sampling sets are admissible for proper representation of the signal. This is where the connection between sampling and uncertainty is at its most obvious and powerful. Necessary conditions on the density of regular Gabor frames are discussed and references to further results provided. The focus then turns to irregular Gabor frames. Again it is uncertainty that provides the key as we develop Landau’s information-theoretic ideas. The PSWFs play a crucial role in the proof of a theorem giving minimum density requirements for lattices supporting irregular Gabor frames.
Finally, in Section 3.5 we turn to sampling in principal shift-invariant (PSI) spaces. Broadly speaking, these are the spaces of wavelet theory. Sampling is important, for example, to speed up the initialization of wavelet decomposition algorithms, and more fundamentally, to provide rigorous criteria for the treatment of discrete data in wavelet domains. The critical sampling algorithm of Janssen is presented. Frame-based iterative schemes due to Aldroubi and Gröchenig are also dealt with. These are similar in spirit to their algorithms for trigonometric polynomials and bandlimited signals, but here the machinery of analytic function theory is not available and other techniques involving Wiener spaces are employed. Then the oversampling schemes introduced by Djokovic and Vaidyanathan enter the discussion. We give conditions found by the authors which verify the validity of these schemes. The chapter closes with an investigation of the notion of aliasing of sampling operators in PSI spaces and a comparison of the aliasing performance of various sampling operators considered in the section.
3.1 Frames

Frames were first introduced by Duffin and Schaeffer [125] in the context of nonharmonic Fourier series, and have since become important tools in signal processing, particularly in communications. Since their value in the construction of “painless” nonorthogonal wavelet expansions was revealed by Daubechies et al. [101], applications of frames have multiplied and their theory is now well developed. It is the redundancy (or over-completeness) of frames that is their most attractive feature for applications in communications, leading to good behavior under erasures (loss of information due to signal drop-outs) and stability in the presence of noise. Frames have also become indispensable tools in sampling theory. For this reason we outline here some of the most basic results and, in the spirit of implementability, explore some more advanced topics regarding the acceleration of frame algorithms.

Let $H$ be a (separable) Hilbert space with inner product $\langle\cdot,\cdot\rangle$. A sequence $\{e_n\}_{n=0}^{\infty} \subset H$ is a frame for $H$ if there exist constants $0 < A \le B < \infty$ such that
\[ A\|f\|^2 \le \sum_{n=0}^{\infty} |\langle f, e_n\rangle|^2 \le B\|f\|^2 \tag{3.1} \]
for all $f \in H$. The constants $A$ and $B$ are the lower and upper frame bounds, respectively. Associated with each frame is the analysis (digitization) operator $D : H \to \ell^2(\mathbb{Z}_+)$ given by $Df = \{\langle f, e_n\rangle\}_{n=0}^{\infty}$ and its Hilbert space adjoint $S = D^* : \ell^2(\mathbb{Z}_+) \to H$, which is known as the synthesis operator and is given by $Sc = \sum_{n=0}^{\infty} c_n e_n$. The frame operator is the self-adjoint composition $T = S \circ D : H \to H$ given by $Tf = \sum_{n=0}^{\infty} \langle f, e_n\rangle e_n$. Equation (3.1) may be written as
\[ A\|f\|^2 \le \langle Tf, f\rangle \le B\|f\|^2 \tag{3.2} \]
i.e., $AI \le T \le BI$, so that $T$ is invertible with $B^{-1}I \le T^{-1} \le A^{-1}I$. Each $f \in H$ admits the frame expansion
\[ f = TT^{-1}f = \sum_{n=0}^{\infty} \langle T^{-1}f, e_n\rangle e_n = \sum_{n=0}^{\infty} \langle f, g_n\rangle e_n \tag{3.3} \]
with $g_n = T^{-1}e_n$. The collection $\{g_n\}_{n=0}^{\infty}$ is itself a frame (the dual frame).

3.1.1 The frame algorithm

Computation of the dual elements $g_n$ in (3.3) may be difficult, and desirable properties of the original frame elements such as smoothness and rapid decay may not be inherited by elements of the dual frame. Elementary results from functional analysis (see the Appendix) can be used to generate iterative algorithms for reconstruction of $f \in H$ from its frame coefficients $\langle f, e_n\rangle$. By (3.2), $|\langle (I - 2T/(A+B))f, f\rangle| \le ((B-A)/(B+A))\|f\|^2$ so that by the self-adjointness of $T$,
\[ \Big\| I - \frac{2}{A+B}\,T \Big\| \le \frac{B-A}{B+A} < 1 \tag{3.4} \]
and we see that $T$ is invertible. Further, the inverse of $T$ may be computed via the convergent Neumann series (see Appendix)
\[ T^{-1} = \frac{2}{A+B} \sum_{j=0}^{\infty} \Big( I - \frac{2}{A+B}\,T \Big)^{j}. \tag{3.5} \]
Since $f = T^{-1}Tf$, (3.5) gives
\[ f = T^{-1}Tf = \frac{2}{A+B} \sum_{j=0}^{\infty} \Big( I - \frac{2}{A+B}\,T \Big)^{j} Tf \tag{3.6} \]
cf. (3.3). Letting $f_n$ be the $n$th partial sum on the right-hand side of (3.6) gives the iteration
\[ f_0 = \frac{2}{A+B}\,Tf; \qquad f_{n+1} = f_n + \frac{2}{A+B}\,T(f - f_n). \tag{3.7} \]
Notice that by (3.4) and (3.7), the error $f - f_n$ satisfies
\[ \|f - f_n\| = \Big\| \Big( I - \frac{2}{A+B}\,T \Big)(f - f_{n-1}) \Big\| \le \Big\| \Big( I - \frac{2}{A+B}\,T \Big)^{n+1} f \Big\| \le \Big( \frac{\kappa-1}{\kappa+1} \Big)^{n+1} \|f\| \tag{3.8} \]
where $\kappa = B/A$ is the condition number of the frame, so that $f_n \to f$ geometrically as $n \to \infty$ with the rate of convergence determined by $\kappa$. In fact, the iteration converges whatever the initialization $f_0$.

There are two difficulties that arise in the application of (3.7) to the reconstruction of $f \in H$ from data $\{\langle f, e_n\rangle\}_{n=0}^{\infty}$. Notice that if the frame is tight (i.e., if $\kappa = 1$) then $f_0 = f$ from (3.8) so that $f = (2/(A+B)) \sum_{n=0}^{\infty} \langle f, e_n\rangle e_n$, a formula reminiscent of an expansion in an orthonormal basis. However, if $\kappa$ is large, so that $\rho = (\kappa-1)/(\kappa+1) \approx 1$, then the convergence $f_n \to f$ is slow. In fact, the number of iterations required to ensure that $\|f - f_n\| \le \varepsilon\|f\|$ is $n = n_{\mathrm{frame}} \ge \log(1/\varepsilon)/\log(1/\rho)$, which is of course large for $\rho \approx 1$. The second difficulty is that applying (3.7) assumes knowledge of the frame bounds $A$ and $B$. In fact there are many examples of systems for which the existence of frame bounds is known, while their precise value is not. See, for example, Theorem 3.3.1. In these situations, the reconstruction (3.7) is useless.

Poorly conditioned frames (those for which $\kappa = B/A \gg 1$) provide slow convergence of the iteration (3.7). In this section we introduce relaxation techniques and accelerated frame reconstruction algorithms which help alleviate these problems.

Relaxation techniques. We are interested here in what can be done when only the existence of frame bounds is known. For each $\gamma > 0$ (the relaxation parameter) consider the operator $I - \gamma T$ with $T$ the frame operator as above. Then by (3.2), $\langle (I - \gamma T)f, f\rangle = \langle f, f\rangle - \gamma\langle Tf, f\rangle \le (1 - \gamma A)\|f\|^2$ for all $f \in H$. Similarly, $\langle (I - \gamma T)f, f\rangle \ge (1 - \gamma B)\|f\|^2$ so that we have
\[ (1 - \gamma B)\|f\|^2 \le \langle (I - \gamma T)f, f\rangle \le (1 - \gamma A)\|f\|^2. \tag{3.9} \]
Hence, by (3.9) and the self-adjointness of $I - \gamma T$,
\[ \|I - \gamma T\| = \sup_{f \in H,\ \|f\| = 1} |\langle (I - \gamma T)f, f\rangle| \le c(\gamma) \]
where $c(\gamma) = \max\{|1 - \gamma A|, |1 - \gamma B|\}$. If $\gamma$ is sufficiently small then $c(\gamma) < 1$ and $\gamma T$ (hence $T$) is invertible with the inverse realized by a Neumann series. The sequence $\{\tilde f_n\}_{n=0}^{\infty}$ defined iteratively by
\[ \tilde f_0 = 0; \qquad \tilde f_{n+1} = \tilde f_n + \gamma T(f - \tilde f_n) \quad (n \ge 0) \tag{3.10} \]
converges to $f$ geometrically: $\|f - \tilde f_n\| \le c(\gamma)^n \|f\|$. If an upper frame bound $B$ is known, then $c(1/B) = (B-A)/B < 1$ so that the iteration (3.10) with $\gamma = 1/B$ will converge to $f$. However, if $A$ is unknown (as is often the case), then we have no estimate of the rate of convergence.
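The two iterations (3.7) and (3.10) can be tried out in finite dimensions. The following is a minimal sketch, assuming $H = \mathbb{R}^d$ and a randomly generated frame; the names `E`, `iterate`, `frame_its` and `relaxed_its` are illustrative choices and do not come from the text. The frame bounds are taken to be the extreme eigenvalues of the frame operator, and the errors are compared with the bounds $\rho^{n+1}\|f\|$ and $c(1/B)^n\|f\|$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 8, 20                        # H = R^d with N frame vectors (illustrative sizes)
E = rng.standard_normal((N, d))     # row n holds the frame vector e_n
T = E.T @ E                         # frame operator: T f = sum_n <f, e_n> e_n
eigs = np.linalg.eigvalsh(T)
A, B = eigs[0], eigs[-1]            # optimal frame bounds in finite dimensions

def iterate(Tf, gamma, start, steps):
    """Iterates of f_{n+1} = f_n + gamma * T(f - f_n); only T f is used, never f."""
    fs = [start]
    for _ in range(steps):
        fs.append(fs[-1] + gamma * (Tf - T @ fs[-1]))
    return fs

f = rng.standard_normal(d)
Tf = E.T @ (E @ f)                  # synthesized from the frame coefficients <f, e_n>

alpha = 2.0 / (A + B)
frame_its = iterate(Tf, alpha, alpha * Tf, 60)        # iteration (3.7)
relaxed_its = iterate(Tf, 1.0 / B, np.zeros(d), 60)   # iteration (3.10), gamma = 1/B

rho = (B - A) / (B + A)
c = (B - A) / B
nf = np.linalg.norm(f)
slack = 1.0 + 1e-9                  # small allowance for rounding
frame_ok = all(np.linalg.norm(f - fn) <= slack * rho ** (n + 1) * nf + 1e-12
               for n, fn in enumerate(frame_its))
relaxed_ok = all(np.linalg.norm(f - fn) <= slack * c ** n * nf + 1e-12
                 for n, fn in enumerate(relaxed_its))
```

Since $c(1/B) = (B-A)/B \ge (B-A)/(B+A) = \rho$, the relaxed iteration with $\gamma = 1/B$ is never faster than (3.7); its advantage is that only an upper bound is needed to run it.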
3.1.2 Frame acceleration

Increasing the rate of convergence of frame expansions is crucial if they are to be useful in practice. Ideas from numerical linear algebra may be adapted to aid the convergence of poorly conditioned frames and to develop frame algorithms that may be applied without knowledge of the frame bounds. We outline some of the work of Gröchenig [162] in this area.

Suppose $\{e_k\}_{k=0}^{\infty}$ is a frame for the Hilbert space $H$ with lower and upper frame bounds $A$ and $B$, respectively, and $\{f_j\}_{j=0}^{\infty}$ the successive approximations of $f \in H$ obtained from the frame algorithm (3.7). In polynomial acceleration algorithms, an element $g_n \in V_n = \mathrm{span}\{f_0, f_1, \dots, f_n\}$ is determined to be the best approximation of $f$ from $V_n$, i.e., $\|f - g_n\| \le \|f - g\|$ for all $g \in V_n$. Since $g_n \in V_n$, there exist constants $\{a_{nk}\}_{k=0}^{n}$ such that
\[ g_n = \sum_{k=0}^{n} a_{nk} f_k. \tag{3.11} \]
If $f_0 = f$, then all successive frame approximations $f_1, f_2, \dots$ also agree with $f$, and this requires $\sum_{k=0}^{n} a_{nk} = 1$. To determine the coefficients $a_{nk}$, observe that by (3.7) with $R = I - 2T/(A+B)$,
\[ f - f_n = R(f - f_{n-1}) = R^2(f - f_{n-2}) = \cdots = R^n(f - f_0), \tag{3.12} \]
and the error $f - g_n$ satisfies
\[ f - g_n = \sum_{k=0}^{n} a_{nk}(f - f_k) = \sum_{k=0}^{n} a_{nk} R^k (f - f_0) = Q_n(R)(f - f_0) \]
where $Q_n$ is the polynomial $Q_n(x) = \sum_{k=0}^{n} a_{nk} x^k$.

Suppose polynomials $Q_n$ have been found for which $\|f - g_n\|$ is small. Calculating the best approximations $g_n$ via (3.11) still requires the computation and storage of the first $n$ iterates of the frame algorithm and is therefore no faster than the standard frame algorithm. As we will see, there are simple three-term recurrence relations for the $g_n$ that do not refer to the iterates $f_n$ or the polynomials $Q_n$. Here we discuss two such accelerated algorithms which appear in [162]. The first of these, Chebyshev acceleration, uses the frame bounds in the algorithm. Despite accelerating the convergence markedly for poorly conditioned frames, the algorithm is useless when only the existence of the frame bounds is known. The polynomials $Q_n$ are independent of $f$ and the approximations depend linearly on $f$. By contrast, the second of the accelerated algorithms, the conjugate gradient (CG) frame algorithm, is adaptive. The polynomials $Q_n$ depend nonlinearly on $f$ and arise from a minimization procedure. The convergence is even faster than that of the Chebyshev algorithm, and the frame bounds do not appear in the algorithm, which makes the CG algorithm applicable even when the frame bounds are unknown.
Theorem 3.1.1. (Chebyshev acceleration algorithm) Let $\{e_n\}_{n=0}^{\infty}$ be a frame for a Hilbert space $H$ with frame bounds $A$ and $B$ as in (3.1). Let $T$ be the frame operator and $\rho = (B-A)/(B+A)$. Define constants $\lambda_n$ ($n \ge 1$) recursively by $\lambda_1 = 2$ and $\lambda_n = (1 - \rho^2\lambda_{n-1}/4)^{-1}$ ($n \ge 2$). Then each $f \in H$ may be reconstructed from its frame coefficients $\langle f, e_n\rangle$ by the recursion
\[ g_0 = 0; \qquad g_1 = \frac{2}{A+B}\,Tf; \qquad g_n = \lambda_n \Big( g_{n-1} - g_{n-2} + \frac{2}{A+B}\,T(f - g_{n-1}) \Big) + g_{n-2} \quad (n \ge 2). \tag{3.13} \]
With $\sigma = (\sqrt{B} - \sqrt{A})/(\sqrt{B} + \sqrt{A})$, we have the error estimate
\[ \|f - g_n\| \le \frac{2\sigma^n \rho}{1 + \sigma^{2n}}\,\|f\|. \]
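The recursion (3.13) is short enough to sketch directly. The block below is a minimal illustration, assuming $H = \mathbb{R}^d$, a random frame, and exact frame bounds obtained from the eigenvalues of $T$; the function name `chebyshev_frame` and the test setup are hypothetical. The numerical check is deliberately conservative: it compares the error with $2\sigma^n/(1+\sigma^{2n})\,\|f\|$, i.e., without the extra factor $\rho$.

```python
import numpy as np

def chebyshev_frame(T, Tf, A, B, steps):
    """Iterates g_0, ..., g_steps of recursion (3.13); needs T, T f and the bounds A, B."""
    rho = (B - A) / (B + A)
    alpha = 2.0 / (A + B)
    g_old, g = np.zeros_like(Tf), alpha * Tf     # g_0 and g_1
    gs = [g_old, g]
    lam = 2.0                                    # lambda_1
    for _ in range(2, steps + 1):
        lam = 1.0 / (1.0 - rho**2 * lam / 4.0)   # lambda_n from lambda_{n-1}
        g_old, g = g, lam * (g - g_old + alpha * (Tf - T @ g)) + g_old
        gs.append(g)
    return gs

rng = np.random.default_rng(0)
d, N = 8, 20                                     # illustrative sizes
E = rng.standard_normal((N, d))                  # rows are frame vectors
T = E.T @ E
eigs = np.linalg.eigvalsh(T)
A, B = eigs[0], eigs[-1]
f = rng.standard_normal(d)
gs = chebyshev_frame(T, T @ f, A, B, 25)

sigma = (np.sqrt(B) - np.sqrt(A)) / (np.sqrt(B) + np.sqrt(A))
nf = np.linalg.norm(f)
cheb_ok = all(
    np.linalg.norm(f - g) <= (1 + 1e-9) * 2 * sigma**n / (1 + sigma**(2 * n)) * nf + 1e-12
    for n, g in enumerate(gs)
)
```

Only the two most recent iterates are stored, which is the practical point of the three-term recurrence.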
As the name of the theorem suggests, properties of the Chebyshev polynomials play a crucial role in the proof. These polynomials may be defined either by the recursion
\[ C_0(x) = 1, \qquad C_1(x) = x, \qquad C_n(x) = 2xC_{n-1}(x) - C_{n-2}(x) \quad (n \ge 2) \tag{3.14} \]
or by
\[ C_n(x) = \cosh(n\cosh^{-1}x) = \frac{1}{2}\Big( \big(x + \sqrt{x^2-1}\big)^{n} + \big(x + \sqrt{x^2-1}\big)^{-n} \Big). \tag{3.15} \]
For $|x| < 1$, (3.15) may be written in the form $C_n(x) = \cos(n\cos^{-1}x)$.

Lemma 3.1.2. Let $a < b < 1$ and let $P_n$ be the normalized Chebyshev polynomial on $[a, b]$ given by
\[ P_n(x) = C_n\Big( \frac{2x - a - b}{b - a} \Big) \Big/ C_n\Big( \frac{2 - a - b}{b - a} \Big). \]
Then for any polynomial $Q_n$ of degree at most $n$ satisfying $Q_n(1) = 1$,
\[ \max_{a \le x \le b} |Q_n(x)| \ge \max_{a \le x \le b} |P_n(x)| = \Big( C_n\Big( \frac{2 - a - b}{b - a} \Big) \Big)^{-1}. \]
Proof (of Theorem 3.1.1). The proof relies on some elementary facts from functional analysis that are collected in the appendix. Let $P_n(x)$ be as in Lemma 3.1.2 with $a = -\rho$, $b = \rho$, i.e., $P_n(x) = C_n(x/\rho)/C_n(1/\rho) = \sum_{k=0}^{n} p_{nk}x^k$ and let $g_n = \sum_{k=0}^{n} p_{nk}f_k$. By (3.12), since $\sum_{k=0}^{n} p_{nk} = P_n(1) = 1$, we have
\[ f - g_n = \sum_{k=0}^{n} p_{nk}(f - f_k) = \sum_{k=0}^{n} p_{nk}R^k(f - f_0) = P_n(R)(f - f_0). \tag{3.16} \]
By (3.4), $|\langle Rf, f\rangle| \le \rho\|f\|^2$ so the spectrum $\sigma(R)$ of $R$ is contained in the interval $[-\rho, \rho]$. Since the frame operator is self-adjoint, the spectral theorem (see Appendix), (3.15) and (3.16) give
\[ \|f - g_n\| = \|P_n(R)(f - f_0)\| \le \|P_n(R)\|\,\|f - f_0\| \le \max_{|x| \le \rho} |P_n(x)|\,\|f - f_0\| = \frac{1}{C_n(1/\rho)}\,\|f - f_0\| \le \frac{2\sigma^n \rho}{1 + \sigma^{2n}}\,\|f\|, \]
where the last step uses $1/C_n(1/\rho) = 2\sigma^n/(1 + \sigma^{2n})$ and $\|f - f_0\| = \|Rf\| \le \rho\|f\|$.

We now need only check the recursions for $\lambda_n$ and $g_n$. With $s = 1/\rho$, (3.14) may be written
\[ C_n(s)P_n(x) - 2sx\,C_{n-1}(s)P_{n-1}(x) + C_{n-2}(s)P_{n-2}(x) = 0. \tag{3.17} \]
Replacing the real variable $x$ in (3.17) by the operator $R$, evaluating both sides of the resulting operator equation at $f - f_0 \in H$ and applying (3.16) gives
\[ \begin{aligned} 0 &= C_n(s)(f - g_n) - 2sC_{n-1}(s)R(f - g_{n-1}) + C_{n-2}(s)(f - g_{n-2}) \\ &= [C_n(s) - 2sC_{n-1}(s) + C_{n-2}(s)]f + \frac{4sC_{n-1}(s)}{A+B}\,T(f - g_{n-1}) \\ &\quad - C_n(s)g_n + 2sC_{n-1}(s)g_{n-1} - C_{n-2}(s)g_{n-2}. \end{aligned} \tag{3.18} \]
However, because of the recursion (3.14), the terms in the square brackets vanish and (3.18) may be rearranged to give the recurrence relation
\[ g_n = 2s\,\frac{C_{n-1}(s)}{C_n(s)}\,g_{n-1} + \frac{4s}{A+B}\,\frac{C_{n-1}(s)}{C_n(s)}\,T(f - g_{n-1}) - \frac{C_{n-2}(s)}{C_n(s)}\,g_{n-2}. \tag{3.19} \]
Let $\lambda_n = 2sC_{n-1}(s)/C_n(s)$. Then by (3.14), $\lambda_n - 1 = C_{n-2}(s)/C_n(s)$ so that (3.19) becomes
\[ \begin{aligned} g_n &= \lambda_n g_{n-1} + \frac{2}{A+B}\,\lambda_n T(f - g_{n-1}) - (\lambda_n - 1)g_{n-2} \\ &= \lambda_n \Big( g_{n-1} + \frac{2}{A+B}\,T(f - g_{n-1}) - g_{n-2} \Big) + g_{n-2}, \end{aligned} \]
the required recurrence relation for successive approximations. The recurrence (3.14) also implies a recursive definition for the $\lambda_n$. In fact, $\lambda_1 = 2sC_0(s)/C_1(s) = 2$ and
\[ \lambda_n = 1 + \frac{C_{n-2}(s)}{C_n(s)} = 1 + \frac{C_{n-2}(s)}{C_{n-1}(s)}\,\frac{C_{n-1}(s)}{C_n(s)} = 1 + \frac{1}{4s^2}\,\lambda_{n-1}\lambda_n. \]
Since $s = 1/\rho$ we have $\lambda_n = (1 - \rho^2\lambda_{n-1}/4)^{-1}$. The proof is complete.
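The first observation to make here is that the recurrence (3.13) assumes (dubiously in many situations) knowledge of the frame bounds. Secondly, the error estimate is a significant improvement on the standard frame reconstruction error. In fact, for the successive approximations $f_n$ generated by (3.7) to achieve an accuracy of $\|f - f_n\| \le \varepsilon\|f\|$, the number of iterations $n = n_{\mathrm{frame}}$ required satisfies $\rho^n < \varepsilon$, i.e.,
\[ n_{\mathrm{frame}} > \frac{\log(1/\varepsilon)}{\log(1/\rho)} \approx \frac{\kappa - 1}{2}\,\log(1/\varepsilon) \]
where, as before, $\kappa = B/A$ is the condition number of the frame. On the other hand, the Chebyshev accelerated algorithm requires $n = n_{\mathrm{Cheb}}$ iterations for the same accuracy, with $2\sigma^n/(1 + \sigma^{2n}) < \varepsilon$, which may be estimated by
\[ n_{\mathrm{Cheb}} > \frac{\log(\varepsilon/2)}{\log\sigma} = \frac{\log 2 + \log(1/\varepsilon)}{\log(1/\sigma)} \approx (\sqrt{\kappa} - 1)\,\frac{\log 2 + \log(1/\varepsilon)}{2}. \]
Hence we have
\[ \frac{n_{\mathrm{frame}}}{n_{\mathrm{Cheb}}} \approx \frac{(\kappa - 1)\log(1/\varepsilon)}{(\sqrt{\kappa} - 1)(\log 2 + \log(1/\varepsilon))} \approx \sqrt{\kappa} + 1 \]
which is large for ill-conditioned frames.

Theorem 3.1.3. (conjugate gradient acceleration) Let $\{e_j\}_{j=0}^{\infty}$ be a frame for $H$ with frame operator $T$. Define sequences $h_n$, $r_n$, $p_n$ in $H$ and $\gamma_n$ in $\mathbb{C}$ by $h_0 = p_{-1} = 0$, $r_0 = p_0 = Tf$ and
\[ \begin{aligned} p_{n+1} &= Tp_n - \frac{\langle Tp_n, Tp_n\rangle}{\langle p_n, Tp_n\rangle}\,p_n - \frac{\langle Tp_n, Tp_{n-1}\rangle}{\langle p_{n-1}, Tp_{n-1}\rangle}\,p_{n-1}; \qquad \gamma_n = \frac{\langle r_n, p_n\rangle}{\langle p_n, Tp_n\rangle}; \\ r_{n+1} &= r_n - \gamma_n Tp_n; \qquad h_{n+1} = h_n + \gamma_n p_n \end{aligned} \]
for $n \ge 0$. Then $h_n \to f$ in $H$. If $\|T\| = b$, $\|T^{-1}\| = a^{-1}$ and $\mu = (\sqrt{b} - \sqrt{a})/(\sqrt{b} + \sqrt{a})$, then we have the error estimate
\[ \|f - h_n\| \le \frac{2\mu^n}{1 + \mu^{2n}}\,\sqrt{\kappa}\,\|f\|. \]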
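The recursions of Theorem 3.1.3 translate almost line by line into code. The sketch below assumes a real, finite-dimensional $H = \mathbb{R}^d$ (so inner products are plain dot products) and treats the $p_{-1}$ term as absent at the first step; the function name `cg_frame` and the random test frame are illustrative. Note that the frame bounds are used only to evaluate the error estimate afterwards, never inside the iteration.

```python
import numpy as np

def cg_frame(T, Tf, steps):
    """Iterates h_0, ..., h_steps of the CG frame recursion; uses T and T f only."""
    h = np.zeros_like(Tf)
    r, p = Tf.copy(), Tf.copy()                  # r_0 = p_0 = T f
    p_old = Tp_old = pTp_old = None              # stands in for p_{-1} = 0
    hs = [h]
    for _ in range(steps):
        Tp = T @ p
        pTp = p @ Tp                             # <p_n, T p_n>
        gamma = (r @ p) / pTp                    # gamma_n
        h = h + gamma * p                        # h_{n+1}
        r = r - gamma * Tp                       # r_{n+1}
        hs.append(h)
        p_new = Tp - ((Tp @ Tp) / pTp) * p       # first two terms of p_{n+1}
        if p_old is not None:
            p_new = p_new - ((Tp @ Tp_old) / pTp_old) * p_old
        p_old, Tp_old, pTp_old = p, Tp, pTp
        p = p_new
    return hs

rng = np.random.default_rng(0)
d, N = 8, 20                                     # illustrative sizes
E = rng.standard_normal((N, d))                  # rows are frame vectors
T = E.T @ E
f = rng.standard_normal(d)
hs = cg_frame(T, T @ f, d)                       # d steps suffice in R^d (exact arithmetic)

eigs = np.linalg.eigvalsh(T)
a, b = eigs[0], eigs[-1]                         # here a = 1/||T^{-1}|| and b = ||T||
mu = (np.sqrt(b) - np.sqrt(a)) / (np.sqrt(b) + np.sqrt(a))
nf = np.linalg.norm(f)
cg_ok = all(
    np.linalg.norm(f - h) <= (1 + 1e-9) * 2 * mu**n / (1 + mu**(2 * n)) * np.sqrt(b / a) * nf + 1e-9
    for n, h in enumerate(hs)
)
final_rel_err = np.linalg.norm(f - hs[-1]) / nf
```

In exact arithmetic the iteration terminates with $h_d = f$ in dimension $d$; in floating point `final_rel_err` is small but not zero.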
Before beginning the proof, observe that because of (3.2) and the self-adjointness of $T$, the bilinear form $\langle f, g\rangle_T = \langle f, Tg\rangle$ defines an inner product on $H$ which generates a norm $\|f\|_T$ equivalent to the original norm on $H$. In fact, $A\|f\|^2 \le \|f\|_T^2 \le B\|f\|^2$ for all $f \in H$. A collection of vectors $\{f_1, f_2, \dots\} \subset H$ is said to be $T$-orthogonal if $\langle f_i, f_j\rangle_T = 0$ for $i \ne j$.

Proof. Let $n$ be the least number for which $\{Tf, T^2f, \dots, T^{n+1}f\}$ is linearly dependent. Let $V_n = \mathrm{span}\{Tf, T^2f, \dots, T^nf\}$. Now $p_0 = Tf \in V_1$ and
\[ p_1 = Tp_0 - \frac{\langle Tp_0, Tp_0\rangle}{\langle p_0, Tp_0\rangle}\,p_0 = T^2f - \frac{\langle Tp_0, Tp_0\rangle}{\langle p_0, Tp_0\rangle}\,Tf \in V_2. \]
More generally, an inductive proof shows that $\{p_0, p_1, \dots, p_{n-1}\} \subset V_n$. We aim to show that this set also forms a $T$-orthogonal basis for $V_n$. Again, the proof is by induction. Observe that each $p_j$ ($0 \le j \le n-1$) is nonzero for otherwise the set $\{Tf, T^2f, \dots, T^nf\}$ would be linearly dependent. That the collection $\{p_0, p_1\}$ is $T$-orthogonal is easily seen from the definition of $p_1$. Assume now that $\{p_0, p_1, \dots, p_{k-1}\}$ is a $T$-orthogonal basis for $V_k$ ($k < n$). We need to show that $\langle p_k, p_j\rangle_T = 0$ for $0 \le j \le k-1$. For $j = k-1$ we have
\[ \begin{aligned} \langle p_k, p_j\rangle_T &= \Big\langle Tp_{k-1} - \frac{\langle Tp_{k-1}, Tp_{k-1}\rangle}{\langle p_{k-1}, Tp_{k-1}\rangle}\,p_{k-1} - \frac{\langle Tp_{k-1}, Tp_{k-2}\rangle}{\langle p_{k-2}, Tp_{k-2}\rangle}\,p_{k-2},\ p_{k-1} \Big\rangle_T \\ &= \langle Tp_{k-1}, p_{k-1}\rangle_T - \frac{\langle Tp_{k-1}, Tp_{k-1}\rangle}{\langle p_{k-1}, Tp_{k-1}\rangle}\,\langle p_{k-1}, p_{k-1}\rangle_T = 0. \end{aligned} \]
That $\langle p_k, p_j\rangle_T = 0$ for $j = k-2$ is shown similarly. For $j \le k-3$, observe that $Tp_j \in V_{j+2} \subset V_{k-1}$ so that $Tp_j = \sum_{l=0}^{k-2} c_l p_l$. Hence, writing $\alpha_k$ and $\beta_k$ for the coefficients of $p_{k-1}$ and $p_{k-2}$ in the recursive definition of $p_k$, the inductive hypothesis gives
\[ \begin{aligned} \langle p_k, Tp_j\rangle &= \langle Tp_{k-1} - \alpha_k p_{k-1} - \beta_k p_{k-2},\ Tp_j\rangle \\ &= \langle Tp_{k-1}, Tp_j\rangle - \alpha_k\langle p_{k-1}, p_j\rangle_T - \beta_k\langle p_{k-2}, p_j\rangle_T = \sum_{l=0}^{k-2} \bar c_l\,\langle p_{k-1}, p_l\rangle_T = 0, \end{aligned} \]
and we see that $\{p_0, p_1, \dots, p_k\}$ is a $T$-orthogonal basis for $V_{k+1}$. This is true for all $k \le n-1$, so $\{p_0, p_1, \dots, p_{n-1}\}$ is a $T$-orthogonal basis for $V_n$.

Next we claim that $h_m = f$ for all $m \ge n$. To see this, observe that by the recursive definitions of $h_n$ and $r_n$,
\[ h_n = \sum_{j=0}^{n-1} \gamma_j p_j; \qquad r_n = r_0 - \sum_{j=0}^{n-1} \gamma_j Tp_j = T(f - h_n). \]
Let the $T$-orthogonal projection of $f$ onto $V_n$ be denoted $P^T_{V_n}(f)$. Then, since $\{p_0, p_1, \dots, p_{n-1}\}$ is a $T$-orthogonal basis for $V_n$,
\[ \begin{aligned} P^T_{V_n}(f) &= \sum_{j=0}^{n-1} \frac{\langle f, p_j\rangle_T}{\langle p_j, p_j\rangle_T}\,p_j = \sum_{j=0}^{n-1} \frac{\langle r_0, p_j\rangle}{\langle p_j, p_j\rangle_T}\,p_j \\ &= \sum_{j=0}^{n-1} \frac{\langle r_j + \sum_{k=0}^{j-1} \gamma_k Tp_k,\ p_j\rangle}{\langle p_j, p_j\rangle_T}\,p_j = \sum_{j=0}^{n-1} \frac{\langle r_j, p_j\rangle}{\langle p_j, p_j\rangle_T}\,p_j = h_n. \end{aligned} \tag{3.20} \]
Hence, for all $g \in V_n$,
\[ \|f - h_n\|_T \le \|f - g\|_T. \tag{3.21} \]
However, since $\{Tf, T^2f, \dots, T^{n+1}f\}$ is linearly dependent and $T^{n+1}f \ne 0$, we have $T^{n+1}f = \sum_{j=1}^{n} c_j T^jf$. Furthermore, $f$ admits the representation (3.6) from which we see that $f \in V_n$. Putting $g = f$ in (3.21) then gives $h_n = f$. Subsequent approximations $h_j$ ($j \ge n+1$) also agree with $f$ and the CG algorithm may be terminated after $n$ iterations with exact reconstruction.

Therefore, without loss of generality, we may assume that $\dim(V_n) = n$ for all $n$, $\{p_0, p_1, \dots, p_{n-1}\}$ is a $T$-orthogonal basis for $V_n$ and $h_n$ is the $T$-orthogonal projection of $f$ onto $V_n$, i.e., (3.21) is satisfied for all $g \in V_n$. If $g \in V_n$, there exist constants $c_1, c_2, \dots, c_n$ for which
\[ g = \sum_{j=1}^{n} c_j T^jf = \sum_{j=0}^{n-1} c_{j+1} T^j(Tf) = q_{n-1}(T)Tf \]
with $q_{n-1}(x) = \sum_{j=0}^{n-1} c_{j+1}x^j \in \mathcal{P}_{n-1}$, the class of polynomials of degree no greater than $n-1$. In particular, $h_n = Q_{n-1}(T)Tf$ for some $Q_{n-1} \in \mathcal{P}_{n-1}$, so that
\[ f - h_n = (I - Q_{n-1}(T)T)f = \tilde Q_n(I - T)f \]
where $\tilde Q_n(x) = 1 - (1-x)Q_{n-1}(1-x) \in \mathcal{P}_n$ satisfies $\tilde Q_n(1) = 1$. Hence (3.21) may be written
\[ \|f - h_n\|_T = \|\tilde Q_n(I - T)f\|_T \le \|f - q_{n-1}(T)Tf\|_T = \|\tilde q_n(I - T)f\|_T \tag{3.22} \]
with $\tilde q_n(x) = 1 - (1-x)q_{n-1}(1-x) \in \mathcal{P}_n$ satisfying $\tilde q_n(1) = 1$. Equation (3.22) holds for all polynomials $\tilde q_n \in \mathcal{P}_n$ with $\tilde q_n(1) = 1$ and in particular is true when $\tilde q_n$ is the normalized Chebyshev polynomial $\tilde P_n(x)$ as in Lemma 3.1.2 that minimizes $\max_{1-b \le x \le 1-a} |Q_n(x)|$ over the collection of polynomials $Q_n \in \mathcal{P}_n$ satisfying $Q_n(1) = 1$, i.e.,
\[ \tilde P_n(x) = C_n\Big( \frac{2x - 2 + b + a}{b - a} \Big) \Big/ C_n\Big( \frac{b + a}{b - a} \Big) \]
which has maximum value $1/C_n((b+a)/(b-a))$ on $[1-b, 1-a]$. Therefore, since $(1-b)\|f\|^2 \le \langle (I - T)f, f\rangle \le (1-a)\|f\|^2$,
\[ \begin{aligned} \|f - h_n\|_T &\le \|\tilde P_n(I - T)f\|_T = \|\tilde P_n(I - T)T^{1/2}f\| \le \max_{a \le x \le b} |\tilde P_n(1 - x)|\,\|f\|_T \\ &= \max_{1-b \le x \le 1-a} |\tilde P_n(x)|\,\|f\|_T = \|f\|_T \Big/ C_n\Big( \frac{b + a}{b - a} \Big) = \frac{2\mu^n}{1 + \mu^{2n}}\,\|f\|_T. \end{aligned} \]
Since $\|\cdot\|$ and $\|\cdot\|_T$ are equivalent norms, the proof is complete.
When applying a frame reconstruction algorithm, one of course can handle only a finite amount of data. Each iteration of the algorithms above requires an application of the frame operator which itself involves the computation of infinitely many frame coefficients $\langle f, e_n\rangle$. A more reasonable approach is to seek an optimal approximation of $f \in H$ from finitely many frame coefficients. Let $W = \mathrm{span}\{e_0, e_1, \dots, e_{N-1}\}$ and $P_W : H \to W$ be the orthogonal projection. The aim is now to compute $P_Wf$ from the coefficients $\{\langle f, e_n\rangle\}_{n=0}^{N-1}$. Since $W$ is finite-dimensional, the collection $\{e_n\}_{n=0}^{N-1}$ which spans $W$ is also a frame for $W$. Suppose the lower and upper frame bounds are $A'$ and $B'$, respectively. That is, for all $f \in W$,
\[ A'\|f\|^2 \le \sum_{n=0}^{N-1} |\langle f, e_n\rangle|^2 \le B'\|f\|^2. \]
Let $T_W : f \mapsto \sum_{n=0}^{N-1} \langle f, e_n\rangle e_n$ ($f \in H$). Define a sequence $\{f_j\}_{j=0}^{\infty} \subset W$ by
\[ f_0 = 0; \qquad f_{j+1} = f_j + \frac{2}{A' + B'}\,T_W(f - f_j) \quad (j \ge 0). \]
Then we have the following result.

Theorem 3.1.4. Let $H$, $\{e_n\}_{n=0}^{\infty}$, $W$, $P_W$, $T_W$, $\{f_j\}_{j=0}^{\infty}$ be as above. Then $f_j \to P_Wf$ as $j \to \infty$.
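The statement can be checked numerically in a small example. The following sketch assumes $H = \mathbb{R}^{10}$ and four random vectors spanning $W$; the bounds $A'$, $B'$ are taken as the extreme eigenvalues of the Gram matrix (which carries the nonzero spectrum of $T_W$), and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 10, 4                                 # dim H = 10, W spanned by N = 4 vectors
E = rng.standard_normal((N, d))              # rows e_0, ..., e_{N-1}
gram_eigs = np.linalg.eigvalsh(E @ E.T)      # nonzero eigenvalues of T_W
A1, B1 = gram_eigs[0], gram_eigs[-1]         # frame bounds A', B' of {e_n} on W

f = rng.standard_normal(d)                   # f need not lie in W
coeffs = E @ f                               # the finitely many coefficients <f, e_n>
TWf = E.T @ coeffs                           # T_W f, computable from the data alone

fj = np.zeros(d)                             # f_0 = 0
for _ in range(500):
    fj = fj + (2.0 / (A1 + B1)) * (TWf - E.T @ (E @ fj))

PWf = E.T @ np.linalg.solve(E @ E.T, coeffs) # orthogonal projection of f onto W
proj_err = np.linalg.norm(fj - PWf)
```

The iterates stay in $W$ because each update adds a combination of the $e_n$, which is why the limit is the projection $P_Wf$ rather than $f$ itself.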
Proof. When $T_W$ is restricted to $W$, it is the frame operator associated with the frame $\{e_n\}_{n=0}^{N-1}$ for $W$. If $f \in W$ then $\{f_j\}_{j=0}^{\infty}$ are the associated frame iterates converging geometrically to $f$. However $\langle f, e_n\rangle = \langle f, P_We_n\rangle = \langle P_Wf, e_n\rangle$ so that for any $f \in H$, $f_j \to P_Wf$.

If $H$ has dimension $N < \infty$, then the proof of Theorem 3.1.3 shows that the CG approximations $h_j$ agree with $f$ for $j \ge N$. This cannot be guaranteed for the iterates of the standard frame algorithm. Consider also the case where $H$ is (potentially) infinite-dimensional and $W$, $T_W$ are as above. In direct analogy with the iterations in Theorem 3.1.3, let $h_0 = 0$, $r_0 = p_0 = T_Wf$, $p_{-1} = 0$,
\[ \begin{aligned} p_{n+1} &= T_Wp_n - \frac{\langle T_Wp_n, T_Wp_n\rangle}{\langle p_n, T_Wp_n\rangle}\,p_n - \frac{\langle T_Wp_n, T_Wp_{n-1}\rangle}{\langle p_{n-1}, T_Wp_{n-1}\rangle}\,p_{n-1}, \qquad \gamma_n = \frac{\langle r_n, p_n\rangle}{\langle p_n, T_Wp_n\rangle}; \\ r_{n+1} &= r_n - \gamma_n T_Wp_n; \qquad h_{n+1} = h_n + \gamma_n p_n. \end{aligned} \]
Then, as in (3.20) we have $P^T_Wf = h_N$ where $P^T_Wf$ is the $T$-orthogonal projection of $f$ onto $W$.

To compare the error estimates of Theorems 3.1.1 and 3.1.3, observe that by (3.1), $A\|f\|^2 \le \langle Tf, f\rangle \le \|Tf\|\,\|f\|$ so that $a = \|T^{-1}\|^{-1} \ge A$. Similarly, $b = \|T\| \le B$ and we have the inequalities $A \le a \le b \le B$. Hence
\[ \mu = \frac{\sqrt{b} - \sqrt{a}}{\sqrt{b} + \sqrt{a}} \le \sigma = \frac{\sqrt{B} - \sqrt{A}}{\sqrt{B} + \sqrt{A}}, \]
i.e., the CG error is generally smaller than that of the Chebyshev approximation, and the number of iterations required to obtain a predetermined accuracy is therefore smaller for the CG algorithm.
3.2 Sampling of trigonometric functions

3.2.1 Uniform sampling and the fast Fourier transform

Consider the space $T_{N-1}$ of trigonometric polynomials of degree no greater than $N-1$:
\[ T_{N-1} = \Big\{ f(t) = \sum_{n=0}^{N-1} a_n e^{2\pi i n t};\ a_0, a_1, \dots, a_{N-1} \in \mathbb{C} \Big\}. \tag{3.23} \]
Such functions are periodic with period 1. Sampling $f \in T_{N-1}$ at the distinct points $0 \le \xi_0 < \xi_1 < \cdots < \xi_{N-1} < 1$ gives data
\[ b_k = f(\xi_k) = \sum_{n=0}^{N-1} a_n e^{2\pi i n \xi_k} = (Va)_k \quad (0 \le k \le N-1) \]
where $a = (a_0, a_1, \dots, a_{N-1})^T$, $b = (b_0, b_1, \dots, b_{N-1})^T \in \mathbb{C}^N$ and $V \in M_N(\mathbb{C})$ with entries $v_{kn} = e^{2\pi i n \xi_k}$. Since $\{\xi_k\}_{k=0}^{N-1}$ are distinct, $V$ is a Vandermonde matrix, hence invertible, and the coefficients $a_n$ may be recovered: $a = V^{-1}b$. Reconstruction of $f$ from its samples is achieved by
\[ f(t) = \sum_{n=0}^{N-1} (V^{-1}b)_n e^{2\pi i n t} = \sum_{n=0}^{N-1} \sum_{k=0}^{N-1} (V^{-1})_{nk} f(\xi_k) e^{2\pi i n t} = \sum_{k=0}^{N-1} f(\xi_k) S_k(t) \]
where the interpolating functions $S_k$ ($0 \le k \le N-1$) are given by $S_k(t) = \sum_{n=0}^{N-1} (V^{-1})_{nk} e^{2\pi i n t} \in T_{N-1}$. Notice that the interpolating functions satisfy the cardinality property $S_k(\xi_l) = \delta_{kl}$. When sampling is performed uniformly, i.e., when $\xi_k = k/N$ ($0 \le k \le N-1$), we have $S_k(t) = S_0(t - k/N)$ with $S_0(t) = e^{\pi i (N-1)t} \sin(N\pi t)/(N \sin \pi t)$.

In the case of uniform sampling, we denote the matrix $V/\sqrt{N}$ by $F_N$, i.e., $(F_N)_{jk} = e^{2\pi i jk/N}/\sqrt{N}$. Then $F_NF_N^* = I_N$ and with $b$, $a$ as above,
\[ b = \sqrt{N}\,F_Na, \qquad a = \frac{1}{\sqrt{N}}\,F_N^*b. \tag{3.24} \]
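The sampling relation $b = Va$ and its uniform special case (3.24) can be illustrated with a few lines of NumPy. This is a minimal sketch: the jittered node set and the names `sampling_matrix`, `a_rec`, `b_u` are illustrative assumptions. It also records how (3.24) lines up with NumPy's conventions, where `np.fft.ifft` carries the factor $1/N$ and the positive exponent.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
n = np.arange(N)
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # coefficients a_n

def sampling_matrix(xi):
    """Matrix V with entries v_{kn} = exp(2 pi i n xi_k)."""
    return np.exp(2j * np.pi * np.outer(xi, n))

# Nonuniform case: distinct (mildly jittered) nodes in [0, 1).
xi = (n + 0.2 * rng.uniform(0.0, 1.0, N)) / N
V = sampling_matrix(xi)
b = V @ a                                   # samples b_k = f(xi_k)
a_rec = np.linalg.solve(V, b)               # a = V^{-1} b

# Uniform case: xi_k = k/N and F_N = V / sqrt(N) is unitary.
F = sampling_matrix(n / N) / np.sqrt(N)
b_u = np.sqrt(N) * F @ a                    # b = sqrt(N) F_N a
a_u = F.conj().T @ b_u / np.sqrt(N)         # a = F_N^* b / sqrt(N)
b_fft = N * np.fft.ifft(a)                  # same samples via NumPy's inverse FFT
```

Solving with $V$ directly costs $O(N^3)$ and becomes ill-conditioned when nodes cluster, which is part of the motivation for the fast and frame-based methods in the rest of the section.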
As written in (3.24), the forward transform is known as the discrete $N$-point Fourier transform (DFT) and denoted $\mathcal{F}_N$, i.e., $\mathcal{F}_N : a \mapsto b = \sqrt{N}\,F_Na$. The DFT is performed by multiplication of an $N$-vector by an $N \times N$ matrix and therefore has complexity $O(N^2)$. Cooley and Tukey [94] demonstrated that the group structure of the Fourier basis $e_j(l) = e^{2\pi i jl/N}/\sqrt{N}$ may be exploited to dramatically lower the complexity of the computation of DFTs to $O(N \log N)$, thus opening the door to the widespread use of digital computers in scientific and engineering applications. When performed in this manner, the DFT is known as the fast Fourier transform (FFT).

3.2.2 Nonuniform (fast) Fourier transforms

In many applications, however, data is not given uniformly. The use of spectral methods on nonuniform grids has recently become feasible with the development of fast algorithms which generalize the (uniform) fast Fourier transform. There are at least three types of such transforms which we now briefly describe. Together with their associated inverses, these are known as nonuniform fast Fourier transforms (NUFFTs).

Given a finite-length sequence $a \in \mathbb{C}^N$ and “nodes” $0 \le \xi_0 < \xi_1 < \cdots < \xi_{N-1} < N$, consider the problem of computing the sums
\[ (\mathcal{F}_N^{\mathrm{nuo}}a)_k = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} a_l e^{-2\pi i l \xi_k/N} \quad (0 \le k \le N-1). \]
The transform $\mathcal{F}_N^{\mathrm{nuo}} : a \mapsto \mathcal{F}_N^{\mathrm{nuo}}a$ is known as the fast Fourier transform with nonuniform output and may be thought of as the evaluation of the trigonometric polynomial $f(\xi) = (\sum_{l=0}^{N-1} a_l e^{-2\pi i l \xi})/\sqrt{N}$ at the nonuniformly spaced points $\{\xi_k/N\}_{k=0}^{N-1} \subset [0, 1)$. The second type of transform maps $b \in \mathbb{C}^N$ to $\mathcal{F}_N^{\mathrm{nud}}b \in \mathbb{C}^N$ by
\[ (\mathcal{F}_N^{\mathrm{nud}}b)_k = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} b_l e^{-2\pi i \xi_l k/N} \quad (0 \le k \le N-1) \]
and is known as the fast Fourier transform with nonuniform data. The fully nonuniform Fourier transform of $c \in \mathbb{C}^N$ is given by
\[ (\mathcal{F}_N^{\mathrm{fnu}}c)_k = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} c_l e^{-2\pi i \xi_l x_k/N} \quad (0 \le k \le N-1) \]
where $\{\xi_l\}_{l=0}^{N-1}$ and $\{x_k\}_{k=0}^{N-1}$ are (potentially) nonuniformly distributed in $[0, N)$. Of course, the uniform Fourier transform is a special case of both the nonuniform data and nonuniform output Fourier transforms, which in turn are special cases of the fully nonuniform Fourier transform.

The transforms $\mathcal{F}_N^{\mathrm{nuo}}$, $\mathcal{F}_N^{\mathrm{nud}}$ and $\mathcal{F}_N^{\mathrm{fnu}}$ are known as forward transforms. Of great importance in sampling applications are the three corresponding inverse transforms, which we briefly consider at the end of this section.
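For reference, the three forward transforms can be evaluated directly from their defining sums. The functions below are a plain $O(N^2)$ sketch, not a fast algorithm; their names (`nuo_direct`, `nud_direct`, `fnu_direct`) are hypothetical and serve only as a baseline against which fast approximations can be compared.

```python
import numpy as np

def nuo_direct(a, xi):
    """(F^nuo a)_k = N^{-1/2} sum_l a_l exp(-2 pi i l xi_k / N): nonuniform output."""
    N = len(a)
    l = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(xi, l) / N) @ a / np.sqrt(N)

def nud_direct(b, xi):
    """(F^nud b)_k = N^{-1/2} sum_l b_l exp(-2 pi i xi_l k / N): nonuniform data."""
    N = len(b)
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, xi) / N) @ b / np.sqrt(N)

def fnu_direct(c, xi, x):
    """(F^fnu c)_k = N^{-1/2} sum_l c_l exp(-2 pi i xi_l x_k / N): fully nonuniform."""
    N = len(c)
    return np.exp(-2j * np.pi * np.outer(x, xi) / N) @ c / np.sqrt(N)

rng = np.random.default_rng(2)
N = 16
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)
grid = np.arange(N, dtype=float)            # uniform nodes xi_k = k
uniform = np.fft.fft(a) / np.sqrt(N)        # unitary uniform transform, same sign convention
```

With integer nodes all three functions reduce to the unitary uniform transform `np.fft.fft(a) / np.sqrt(N)`, which matches the remark that the uniform transform is a special case of each.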
As written, the forward transforms have complexity $O(N^2)$ and their direct inverses have complexity $O(N^3)$, and are therefore impractical for use on large datasets. In this section we describe several algorithms which trade off precision against speed.

3.2.3 Algorithms based on Taylor polynomials

Several algorithms for the computation of NUFFTs interpolate nonuniform data to a uniform grid and apply uniform FFTs. The interpolation may be done with the help of Taylor polynomials, but this is by no means the only possibility. We outline here an algorithm that employs Taylor expansions for the interpolation [156].

We write $(\mathcal{F}_N^{\mathrm{nud}}a)_k = (1/\sqrt{N}) \sum_{n=0}^{N-1} a_n \psi_k(\xi_n/N)$ where $\psi_k$ is the Fourier character $\psi_k(\omega) = e^{-2\pi i k\omega}$ and $0 \le \xi_n < N$ for all $n$. For each $0 \le r \le N-1$ let $I_r$ be the interval $I_r = \{\xi : -1/(2N) \le \xi - r/N < 1/(2N)\}$. Then, using the Taylor expansion of $\psi_k$ at the centers of these intervals and putting $a_{j,r} = \sum_{\{n:\ \xi_n/N \in I_r\}} a_n(\xi_n - r)^j/N^j$ ($j \ge 0$, $0 \le r \le N-1$) gives
\[ \begin{aligned} (\mathcal{F}_N^{\mathrm{nud}}a)_k &= \frac{1}{\sqrt{N}} \sum_{r=0}^{N-1} \sum_{\{n:\ \xi_n/N \in I_r\}} a_n \psi_k(\xi_n/N) \\ &= \frac{1}{\sqrt{N}} \sum_{r=0}^{N-1} \sum_{\{n:\ \xi_n/N \in I_r\}} a_n \sum_{j=0}^{\infty} \psi_k^{(j)}\Big(\frac{r}{N}\Big) \frac{(\xi_n - r)^j}{N^j j!} \\ &= \frac{1}{\sqrt{N}} \sum_{j=0}^{\infty} \frac{1}{j!} \Big(\frac{-2\pi i k}{N}\Big)^{j} \sum_{r=0}^{N-1} \psi_k\Big(\frac{r}{N}\Big) a_{j,r} \\ &= \frac{1}{\sqrt{N}} \sum_{j=0}^{\infty} \frac{1}{j!} \Big(\frac{-2\pi i k}{N}\Big)^{j} \sum_{r=0}^{N-1} a_{j,r} e^{-2\pi i rk/N} \\ &= \sum_{j=0}^{\infty} \frac{1}{j!} \Big(\frac{-2\pi i k}{N}\Big)^{j} (\mathcal{F}_Na_{j,\cdot})_k \end{aligned} \tag{3.25} \]
with $(\mathcal{F}_Na_{j,\cdot})$ denoting the usual uniform $N$-point FFT applied to the second index. The approximation arises by truncating the final sum in (3.25), i.e., the approximation is simply
\[ (\mathcal{F}_{N,\mathrm{TP}}^{\mathrm{nud}}a)_k = \sum_{j=0}^{M-1} \frac{1}{j!} \Big(\frac{-2\pi i k}{N}\Big)^{j} (\mathcal{F}_Na_{j,\cdot})_k. \tag{3.26} \]
The subscript on the operator is meant to denote that this transform has been computed with the help of Taylor polynomials. We break the algorithm into steps, computing the complexity of each step as we go.
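Step 1. Compute the collection $a_{j,r} = \sum_{\{n:\ \xi_n/N \in I_r\}} a_n(\xi_n - r)^j/N^j$. Calculating each $a_{j,r}$ requires $j+1$ multiplications and $c_r = \#\{n : \xi_n/N \in I_r\}$ additions. Hence the collection $a_{j,r}$ ($0 \le j \le M-1$, $0 \le r \le N-1$) requires $O(NM^2)$ operations.

Step 2. Compute the collection $b_{j,k} = (1/\sqrt{N}) \sum_{r=0}^{N-1} a_{j,r} e^{-2\pi i rk/N}$ ($0 \le j \le M-1$, $0 \le k \le N-1$). This requires the computation of $M$ uniform FFTs of length $N$, and can be performed in $O(MN \log N)$ operations.

Step 3. Compute $(\mathcal{F}_{N,\mathrm{TP}}^{\mathrm{nud}}a)_k = \sum_{j=0}^{M-1} \big((-2\pi i k)/N\big)^j b_{j,k}/j!$ directly. This requires $O(NM^2)$ operations.

The algorithm therefore has overall complexity $O(MN \log N + NM^2)$. The error incurred in truncating the sum in (3.25) is intimately connected to the complexity. The error in each term is given by
\[ e_k = (\mathcal{F}_N^{\mathrm{nud}}a)_k - (\mathcal{F}_{N,\mathrm{TP}}^{\mathrm{nud}}a)_k = \sum_{j=M}^{\infty} \frac{1}{j!} \Big(\frac{-2\pi i k}{N}\Big)^{j} b_{j,k} \]
so that the total $\ell^2$ error is
\[ \begin{aligned} \|e\|_{\ell^2} &= \Big( \sum_{k=0}^{N-1} \Big| \sum_{j=M}^{\infty} \frac{1}{j!} \Big(\frac{-2\pi i k}{N}\Big)^{j} b_{j,k} \Big|^2 \Big)^{1/2} \\ &\le \sum_{j=M}^{\infty} \frac{1}{j!}\,\frac{(2\pi)^j}{N^j} \Big( \sum_{k=0}^{N-1} k^{2j} |b_{j,k}|^2 \Big)^{1/2} \\ &\le \sum_{j=M}^{\infty} \frac{(2\pi)^j}{j!} \Big( \sum_{k=0}^{N-1} |b_{j,k}|^2 \Big)^{1/2}. \end{aligned} \tag{3.27} \]
An application of the Plancherel theorem for uniform FFTs and the Cauchy–Schwarz inequality to (3.27) give
\[ \begin{aligned} \|e\|_{\ell^2} &\le \sum_{j=M}^{\infty} \frac{(2\pi)^j}{j!} \Big( \sum_{r=0}^{N-1} |a_{j,r}|^2 \Big)^{1/2} \\ &= \sum_{j=M}^{\infty} \frac{(2\pi)^j}{j!} \Big( \sum_{r=0}^{N-1} \Big| \sum_{\{n:\ \xi_n/N \in I_r\}} a_n \Big(\frac{\xi_n - r}{N}\Big)^{j} \Big|^2 \Big)^{1/2} \\ &\le \sum_{j=M}^{\infty} \frac{(2\pi)^j}{j!} \Big( \sum_{r=0}^{N-1} \Big( \sum_{\{n:\ \xi_n/N \in I_r\}} |a_n|^2 \Big) \Big( \sum_{\{n:\ \xi_n/N \in I_r\}} (2N)^{-2j} \Big) \Big)^{1/2} \\ &\le \max_r \#I_r\,\|a\|_{\ell^2} \sum_{j=M}^{\infty} \frac{\pi^j}{N^j j!} \le e^{\pi/N} \max_r \#I_r\,\|a\|_{\ell^2}\,\frac{\pi^M}{N^M M!} \le \varepsilon\|a\|_{\ell^2} \end{aligned} \]
for $M > c\log(1/\varepsilon)/\log(N)$. So the algorithm has complexity $O(N\log(1/\varepsilon) + N(\log(1/\varepsilon)/\log N)^2)$.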
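The three steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the algorithm of [156] verbatim: the function names are hypothetical, nodes are assigned to the nearest integer $r$ with offsets $\delta_n = \xi_n - r$ (wrapping $r = N$ to $r = 0$ by periodicity), and the normalization keeps a single power of $N$ by pairing the moments $\sum a_n\delta_n^j$ with the factor $(-2\pi i k/N)^j/j!$, so that the truncated sum converges to the direct sum as $M$ grows. The number of terms `M` is simply chosen large enough for the test.

```python
import numpy as np

def nud_direct(a, xi):
    """Direct O(N^2) evaluation of (F^nud a)_k = N^{-1/2} sum_n a_n exp(-2 pi i k xi_n / N)."""
    N = len(a)
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, xi) / N) @ a / np.sqrt(N)

def nud_taylor(a, xi, M):
    """Taylor-polynomial approximation of F^nud a using M uniform FFTs of length N."""
    N = len(a)
    nearest = np.rint(xi).astype(int)
    delta = xi - nearest                     # offsets with |delta_n| <= 1/2
    r = nearest % N                          # grid index, wrapped by periodicity
    z = -2j * np.pi * np.arange(N) / N       # the factor -2 pi i k / N
    weight = np.ones(N, dtype=complex)       # holds z^j / j!
    moment = a.astype(complex)               # holds a_n * delta_n^j
    out = np.zeros(N, dtype=complex)
    for j in range(M):
        a_j = np.zeros(N, dtype=complex)
        np.add.at(a_j, r, moment)            # Step 1: bin the j-th moments onto the grid
        b_j = np.fft.fft(a_j) / np.sqrt(N)   # Step 2: one uniform FFT per j
        out += weight * b_j                  # Step 3: accumulate the Taylor sum
        weight = weight * z / (j + 1)
        moment = moment * delta
    return out

rng = np.random.default_rng(4)
N = 64
xi = np.sort(rng.uniform(0.0, N, N))         # nonuniform nodes in [0, N)
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)
approx = nud_taylor(a, xi, 30)
exact = nud_direct(a, xi)
max_err = np.max(np.abs(approx - exact))
```

Using `np.add.at` rather than fancy-index assignment matters here, since several nodes may share the same nearest grid point and their contributions must be summed.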
3.2.4 The Dutt–Rokhlin algorithm

The algorithm of Dutt and Rokhlin [126] relies on an approximation of complex exponentials of the form $e^{icx}$ by suitably modulated trigonometric polynomials. In particular, they prove the following result:

Theorem 3.2.1. Let $b > 1/2$, $c, d \in \mathbb{R}$ and $m \ge 2$, $q \ge 4\pi b$ be integers. Then for $|x| \le d$,
\[ \Big| e^{icx} - e^{b(x\pi/md)^2} \sum_{k=[mdc/\pi]-q/2}^{[mdc/\pi]+q/2} \rho_k(c)\,e^{ik\pi x/md} \Big| < (4b + 9)\,e^{-b\pi^2(1 - 1/m^2)} \]
where $\rho_k(c) = e^{-(c-k)^2/4b}/(2\sqrt{\pi b})$.

Notice that the modulated Gaussian $e^{-bx^2}e^{icx}$ is approximated by a linear combination of finitely many time–frequency shifts of another Gaussian $\rho_0$. This theme is generalized in [52] and [326]. The proof (which we omit) is ingenious, using some elementary facts regarding Gaussians. For application in fast algorithms for nonuniform Fourier transforms, we put $c = 2\pi\xi_l$, $d = 1/2$, $x = j/N$ to get
\[ \Big| e^{2\pi i j\xi_l/N} - e^{4b(\pi j/mN)^2} \sum_{k=[m\xi_l]-q/2}^{[m\xi_l]+q/2} \rho_{kl}\,e^{2\pi i jk/mN} \Big| < (4b + 9)\,e^{-b\pi^2(1 - 1/m^2)} \tag{3.28} \]
where $\rho_{kl} = e^{-(2\pi\xi_l - k)^2/4b}/(2\sqrt{b\pi})$ ($-N/2 \le j, k, l \le N/2 - 1$). The approximation (3.28) is the basis for the fast algorithms of Dutt–Rokhlin which we now describe.

Define the Dutt–Rokhlin fast Fourier transform for nonuniform data $\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nud}}$ by
\[ (\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nud}}a)_j = \frac{e^{4b(\pi j/mN)^2}}{\sqrt{N}} \sum_{l=0}^{N-1} a_l \sum_{k=[m\xi_l]-q/2}^{[m\xi_l]+q/2} \rho_{kl}\,e^{-2\pi i jk/mN}. \tag{3.29} \]
The transform depends on the parameter $b$, but we suppress this dependence in the notation. Of course, use of the word “fast” in the name of the transform is yet to be justified! The Fourier transforms $\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nuo}}$ and $\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{fnu}}$ may be similarly defined using (3.28), but we concentrate here on the case of nonuniform data.

Before addressing the issue of speed, we consider the precision of the transform. From (3.28) we have the uniform estimate
\[ \begin{aligned} |(\mathcal{F}_N^{\mathrm{nud}}a)_j - (\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nud}}a)_j| &\le \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} |a_l|\,\Big| e^{-2\pi i j\xi_l/N} - e^{4b(\pi j/mN)^2} \sum_{k=[m\xi_l]-q/2}^{[m\xi_l]+q/2} \rho_{kl}\,e^{-2\pi i jk/mN} \Big| \\ &< \frac{(4b + 9)}{\sqrt{N}}\,e^{-b\pi^2(1 - 1/m^2)} \sum_{l=0}^{N-1} |a_l|. \end{aligned} \]
To achieve an $\ell^2$-error estimate of the form
\[ \|\mathcal{F}_N^{\mathrm{nud}}a - \mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nud}}a\|_{\ell^2} \le \varepsilon\|a\|_{\ell^2}, \tag{3.30} \]
we require $(4b + 9)\sqrt{N}\,e^{-b\pi^2(1 - 1/m^2)} \le \varepsilon$ which in turn demands $b \ge C\log(N/\varepsilon)$. The relationship between the parameters $b$ and $q$ connects the error estimates with the speed of the algorithm for computing (3.29).

The algorithm for the computation of $\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nud}}a$ relies on a clever decomposition of the double sum in (3.29). Observe that with $c_j = e^{4b(\pi j/mN)^2}/\sqrt{N}$,
\[ \begin{aligned} (\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nud}}a)_j &= c_j \sum_{l=0}^{N-1} a_l \sum_{k=-q/2}^{q/2} \rho_{k+[m\xi_l],\,l}\,e^{-2\pi i j(k+[m\xi_l])/mN} \\ &= c_j \sum_{r=-q/2}^{mN+q/2} e^{-2\pi i jr/mN} \sum_{k=-q/2}^{q/2} \sum_{\{l:\ k+[m\xi_l]=r\}} a_l\rho_{rl}. \end{aligned} \tag{3.31} \]
We now describe the algorithm for the fast computation of the triple sum on the right-hand side of (3.31), and compute the complexity of each step as we go.

Step 1. Compute the collection $\beta_{rk}$ ($-q/2 \le r \le mN + q/2$, $0 \le k \le N-1$) given by
\[ \beta_{rk} = \sum_{\{l:\ [m\xi_l]=r-k\}} \rho_{rl}a_l. \]
This step has complexity $O(Nq)$.

Step 2. Compute the collection $\gamma_r = \sum_{k=-q/2}^{q/2} \beta_{rk}$ ($-q/2 \le r \le mN + q/2$) directly. This step has complexity $O(Nq + q^2)$.

With $\delta_j = \sum_{r=-q/2}^{mN+q/2} \gamma_r e^{-2\pi i rj/mN}$, we have
\[ (\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nud}}a)_j = \frac{e^{4b(\pi j/mN)^2}}{\sqrt{N}}\,\delta_j = \frac{e^{4b(\pi j/mN)^2}}{\sqrt{N}} \Big( \sum_{r=-q/2}^{-1} + \sum_{r=0}^{mN-1} + \sum_{r=mN}^{mN+q/2} \Big) \gamma_r e^{-2\pi i rj/mN}. \tag{3.32} \]
Step 3. Compute the three sums on the right-hand side of (3.32). The first and third sums may be computed directly with $O(Nq)$ operations. The second sum may be computed using a uniform $mN$-point fast Fourier transform in $O(mN\log N)$ operations. This step thus has complexity $O(mN\log N + Nq)$.

Step 4. Compute $(\mathcal{F}_{N,\mathrm{DR}}^{\mathrm{nud}}a)_j = e^{4b(\pi j/mN)^2}\delta_j/\sqrt{N}$ ($0 \le j \le N-1$) with $O(N)$ operations.
The algorithm is complete and has complexity $O(qN + q^2 + mN \log N)$. To compare the complexity with the precision, recall that for the $\ell^2$ error estimate (3.30) we need $b \ge C \log(N/\varepsilon)$. Hence $q \ge 4\pi b \ge C \log(N/\varepsilon)$ and we see that the complexity is $O(N \log(1/\varepsilon) + (\log(1/\varepsilon))^2 + N \log N)$. Thinking of $\varepsilon$ as a fixed precision, the asymptotic complexity (for large $N$) is $O(N \log(1/\varepsilon) + N \log N)$, an improvement over the Taylor polynomial-based algorithm of Section 3.2.3.
Variations on a theme. A variant of the Dutt–Rokhlin algorithm, which first appeared in [263, 290] with applications in [263], differs from the original in that the approximation of the complex exponential is made in discrete time. Whereas Dutt and Rokhlin uniformly approximate functions of the form $e^{-bt^2} e^{ict}$ ($|t| \le 1/2$) by trigonometric polynomials and then sample the approximations, Nguyen and Liu approximate discrete complex exponentials $e^{2\pi i c j/N}$ directly by
\[ e^{2\pi i c j/N} \approx s_j^{-1} \sum_{k=[mc]-q/2}^{[mc]+q/2} x_k(c)\, e^{2\pi i j k/mN} \tag{3.33} \]
with $c$ a real number (chosen successively to be the nonuniformly spaced data points $\xi_l$), $q$ a positive even integer, $m$ a positive integer and $\{s_j\}_{j=0}^{N-1}$ a sequence of positive numbers chosen to minimize the error of the approximation. The approximation is made with the aid of a pseudoinverse and is optimal in a least squares sense. Beyond the calculations with the pseudoinverse, the algorithm proceeds in much the same manner as in [126] and has similar asymptotic complexity and precision. Numerical experiments comparing the performance of the two algorithms appear in [290] and demonstrate the improved efficiency of the fully discrete algorithm.
3.2.5 The inverse transforms
Let $A \in M_N(\mathbb{C})$ be given by $A_{lk} = e^{-2\pi i l \xi_k/N}/\sqrt{N}$. Since $F_N^{\mathrm{nud}} a = Aa$, the inverse transform is given by $(F_N^{\mathrm{nud}})^{-1} b = A^{-1} b$. Here we outline a fast algorithm for the inversion due to Dutt and Rokhlin [126]. Observe that
\[ (AA^*)_{lj} = \frac{1}{N} \sum_{k=0}^{N-1} e^{-2\pi i (l-j)\xi_k/N} \qquad (0 \le l, j \le N-1). \]
Hence $AA^*$ is both Hermitian and Toeplitz, i.e., if $c \in \mathbb{C}^N$ has $m$th entry $c_m = \frac{1}{N}\sum_{k=0}^{N-1} e^{-2\pi i m \xi_k/N}$ ($0 \le m \le N-1$), then
\[ (AA^*)_{lj} = \begin{cases} c_{l-j}, & \text{if } l-j \ge 0, \\ \bar{c}_{j-l}, & \text{if } l-j < 0. \end{cases} \]
The entries of $AA^*$ are known once the entries of $c$ are known. However, if $\mathbf{1} = (1, 1, \dots, 1) \in \mathbb{C}^N$, then $c = (F_N^{\mathrm{nud}} \mathbf{1})/\sqrt{N}$ which, as we have just seen, may be approximated with precision $\varepsilon$ in $O(N \log N + N \log(1/\varepsilon))$ operations. Having computed the entries of $AA^*$, the computation of $AA^* u$ ($u \in \mathbb{C}^N$) may be performed in $O(N \log N)$ operations with the use of the uniform fast Fourier transform by exploiting the convolution structure of the product. These products form part of an (iterative) conjugate gradient algorithm [174] which computes $(AA^*)^{-1} v$ ($v \in \mathbb{C}^N$) in $O(\kappa(A) N \log N)$ operations, where $\kappa(A)$ is the condition number of $A$. Also, since $A^* w = \overline{(F_N^{\mathrm{nuo}} \bar{w})}$, the algorithm for the computation of Fourier transforms with nonuniform outputs can be used to calculate $A^* w$ (approximately) for any $w \in \mathbb{C}^N$. Hence we have the following algorithm for computing $(F_N^{\mathrm{nud}})^{-1} a$ with precision $\varepsilon$:
Step 1. Compute $AA^*$ with precision $\varepsilon$ using $F^{\mathrm{nud}}_{N,DR}$, requiring $O(N \log N + N \log(1/\varepsilon))$ operations.
Step 2. Compute $b = (AA^*)^{-1} a$ approximately using a conjugate gradient algorithm and the uniform FFT in $O(\kappa(A) N \log N)$ operations.
Step 3. Compute $A^* b$ with precision $\varepsilon$ in $O(N \log N + N \log(1/\varepsilon))$ operations using $F^{\mathrm{nuo}}_{N,DR}$.
Steps 1 and 3 apply $F^{\mathrm{nud}}_{N,DR}$ and $F^{\mathrm{nuo}}_{N,DR}$, respectively, introducing errors of $\varepsilon$. Step 2 introduces an error of $\kappa(A)\varepsilon$ for a total error of $C\kappa(A)\varepsilon$ and complexity $O(\kappa(A) N \log N + N \log(1/\varepsilon))$. The condition number $\kappa(A)$ is related to the uniformity of the distribution of nodes $\xi_k$ in the interval $[0, N)$. When the nodes are uniformly distributed, $A$ is unitary and $\kappa(A) = 1$. The condition number will be large for highly nonuniform distributions.
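The three steps can be sketched as follows in Python with NumPy. This is an illustration under stated assumptions rather than the Dutt–Rokhlin implementation: all function names are hypothetical, and the fast approximate transforms $F^{\mathrm{nud}}_{N,DR}$ and $F^{\mathrm{nuo}}_{N,DR}$ are replaced by dense $O(N^2)$ products with $A$ and $A^*$. Only the Toeplitz product in Step 2 is done with the uniform FFT, via a circulant embedding of length $2N$.

```python
import numpy as np

def nudft_matrix(xi):
    """Dense A with A[l, k] = exp(-2 pi i l xi_k / N) / sqrt(N)."""
    xi = np.asarray(xi, dtype=float)
    N = len(xi)
    l = np.arange(N)[:, None]
    return np.exp(-2j * np.pi * l * xi[None, :] / N) / np.sqrt(N)

def toeplitz_matvec(c, u):
    """Apply the Hermitian Toeplitz matrix with first column c using FFTs of length 2N."""
    N = len(c)
    col = np.concatenate([c, [0.0], np.conj(c[:0:-1])])  # circulant embedding
    return np.fft.ifft(np.fft.fft(col) * np.fft.fft(u, 2 * N))[:N]

def conjugate_gradient(matvec, v, tol=1e-12, maxiter=200):
    """Plain CG for a Hermitian positive definite operator given as a matvec."""
    x = np.zeros(len(v), dtype=complex)
    r = np.array(v, dtype=complex)
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rs / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) <= tol * np.linalg.norm(v):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def inverse_nudft(xi, b):
    """Solve A a = b through a = A^* (A A^*)^{-1} b."""
    A = nudft_matrix(xi)                   # dense stand-in for the fast transforms
    N = len(xi)
    c = (A @ np.ones(N)) / np.sqrt(N)      # Step 1: first column of A A^*
    y = conjugate_gradient(lambda u: toeplitz_matvec(c, u), b)  # Step 2
    return A.conj().T @ y                  # Step 3: apply A^*
```

For mildly jittered nodes such as $\xi_k = k + \epsilon_k$ with small $|\epsilon_k|$, the matrix $A$ is close to unitary and the conjugate gradient loop stops after a handful of iterations, in line with the role of $\kappa(A)$ described above.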
3.2.6 Nonuniform sampling and frames
Let $f(\xi) = \sum_{l=0}^{N-1} a_l e^{2\pi i l \xi} \in T_{N-1}$. Evaluating $f$ at $N$ distinct points $0 \le \xi_0 < \xi_1 < \cdots < \xi_{N-1} < 1$ gives the matrix equation $\mathbf{f} = A\mathbf{a}$ with $\mathbf{f}, \mathbf{a} \in \mathbb{C}^N$ and $A \in M_N(\mathbb{C})$ given by $\mathbf{f}_j = f(\xi_j)$, $\mathbf{a}_l = a_l$ and $A_{jl} = e^{2\pi i l \xi_j}$. Determining the coefficients $a_l$ from the samples $f(\xi_j)$ involves the time-expensive operations of inverting $A$ and multiplying the inverse by $\mathbf{f}$. The calculation may be performed quickly (though only approximately) with an application of the algorithm for the inversion of $F_N^{\mathrm{nuo}}$. An alternative, though related, approach is to use the theory of frames and the frame algorithms discussed in Section 3.1. We outline here a frame-based approach due to Gröchenig [163]. For convenience we work with the class of centered trigonometric polynomials
\[ \tilde{T}_M = \Bigl\{ f(\xi) = \sum_{j=-M}^{M} a_j e^{2\pi i j \xi} \ ;\ a_{-M}, \dots, a_M \in \mathbb{C} \Bigr\}, \]
i.e., $\tilde{T}_M = e^{-2\pi i M \xi}\, T_{2M}$ (cf. (3.23)). Let $D_M \in \tilde{T}_M$ be the Dirichlet kernel $D_M(\xi) = \sum_{k=-M}^{M} e^{2\pi i k \xi} = \sin((2M+1)\pi\xi)/\sin(\pi\xi)$.
In keeping with the periodicity of the trigonometric polynomials, the sampling nodes $\{\lambda_l\}_{l=0}^{N-1}$ are continued periodically to a doubly infinite sequence $\{\lambda_l\}_{l=-\infty}^{\infty}$ defined by $\lambda_{kN+l} = k + \lambda_l$ ($k \in \mathbb{Z}$, $0 \le l \le N-1$) so that, for example, $\lambda_N = \lambda_0 + 1$.
Theorem 3.2.2. Let $0 \le \lambda_0 < \lambda_1 < \cdots < \lambda_{N-1} < 1$, $\delta = \max_{0 \le l \le N-1} (\lambda_{l+1} - \lambda_l)$ and $v_l = (\lambda_{l+1} - \lambda_{l-1})/2$. Suppose $\delta < 1/(2M)$. Then the collection $\{\sqrt{v_l}\, D_M(\cdot - \lambda_l)\}_{l=0}^{N-1}$ is a frame for $\tilde{T}_M$ with lower frame bound $A = (1 - 2\delta M)^2$ and upper frame bound $B = (1 + 2\delta M)^2$.
Note that the condition $\delta < 1/(2M)$ forces $N > 2M$. In the uniform case ($\lambda_l = l/N$), if $N = 2M + 1$ then the collection $\{D_M(\cdot - l/N)/\sqrt{N}\}_{l=0}^{N-1}$ is an orthonormal basis for $\tilde{T}_M$. The appearance of the weights $\sqrt{v_l}$ in the definition of the frame elements (and subsequent frame algorithms) makes this an adaptive frame in that it adjusts to the local geometry of the sampling set. The effect is more pronounced in Theorem 3.3.4.
The proof uses Wirtinger's inequality [127]:
\[ \int_a^b |f(\xi)|^2\, d\xi \le \frac{4}{\pi^2} (b-a)^2 \int_a^b |f'(\xi)|^2\, d\xi \tag{3.34} \]
for all $f \in L^2([a, b])$ with either $f(a) = 0$ or $f(b) = 0$, and Bernstein's inequality for trigonometric polynomials: $\|f'\|_2 \le 2\pi M \|f\|_2$ for all $f \in \tilde{T}_M$.
Proof. Let $\eta_l = (\lambda_{l-1} + \lambda_l)/2$ be the midpoint of the interval $[\lambda_{l-1}, \lambda_l]$ and define an operator $Q : \tilde{T}_M \to L^2([0, 1])$ by
\[ Qf = \sum_{l=0}^{N-1} f(\lambda_l)\, \chi_{[\eta_l, \eta_{l+1}]} . \]
Then $\|Qf\|_2^2 = \sum_{l=0}^{N-1} v_l |f(\lambda_l)|^2$ and since $\lambda_l - \eta_l < \delta/2$, $\eta_l - \lambda_{l-1} < \delta/2$, the Wirtinger and Bernstein inequalities give
\[ \|f - Qf\|_2^2 = \sum_{l=0}^{N-1} \int_{\eta_l}^{\eta_{l+1}} |f(\xi) - f(\lambda_l)|^2\, d\xi = \sum_{l=0}^{N-1} \left( \int_{\eta_l}^{\lambda_l} |f(\xi) - f(\lambda_l)|^2\, d\xi + \int_{\lambda_l}^{\eta_{l+1}} |f(\xi) - f(\lambda_l)|^2\, d\xi \right) \]
\[ \le \sum_{l=0}^{N-1} \frac{4}{\pi^2} \left( \frac{\delta}{2} \right)^2 \int_{\eta_l}^{\eta_{l+1}} |f'(\xi)|^2\, d\xi \le 4\delta^2 M^2 \|f\|_2^2 . \]
Hence $\|(I - Q)|_{\tilde{T}_M}\| \le 2\delta M < 1$, and if $f \in \tilde{T}_M$,
\[ (1 - 2\delta M)\|f\|_2 \le \|Qf\|_2 \le (1 + 2\delta M)\|f\|_2 . \]
Finally, observing that $\sum_{l=0}^{N-1} |\langle f, \sqrt{v_l}\, D_M(\cdot - \lambda_l)\rangle|^2 = \sum_{l=0}^{N-1} v_l |f(\lambda_l)|^2 = \|Qf\|_2^2$ completes the proof.
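The frame bounds of Theorem 3.2.2 are easy to check numerically. In terms of the coefficient vector $a = (a_{-M}, \dots, a_M)$ of $f \in \tilde{T}_M$ one has $\|f\|_2 = \|a\|$ and $\sum_l v_l |f(\lambda_l)|^2 = a^* G a$ with $G = E^* \operatorname{diag}(v) E$, $E_{lj} = e^{2\pi i j \lambda_l}$, so the optimal frame bounds are the extreme eigenvalues of $G$. The sketch below (Python with NumPy; the function name and the jittered test nodes are illustrative assumptions) builds $G$ and $\delta$ from a set of nodes.

```python
import numpy as np

def weighted_frame_matrix(lam, M):
    """Return (G, delta) with G = E^H diag(v) E acting on the coefficients a_{-M..M}."""
    lam = np.sort(np.asarray(lam, dtype=float))   # nodes in [0, 1)
    nxt = np.roll(lam, -1)
    nxt[-1] += 1.0                                # lambda_{l+1}, periodic continuation
    prv = np.roll(lam, 1)
    prv[0] -= 1.0                                 # lambda_{l-1}, periodic continuation
    delta = np.max(nxt - lam)                     # maximal gap
    v = (nxt - prv) / 2.0                         # adaptive weights v_l
    E = np.exp(2j * np.pi * np.outer(lam, np.arange(-M, M + 1)))  # E[l, j] = e^{2 pi i j lam_l}
    G = E.conj().T @ (v[:, None] * E)
    return G, delta

# Usage: jittered nodes, M = 4, N = 40.
rng = np.random.default_rng(0)
M, N = 4, 40
lam = (np.arange(N) + 0.3 * rng.random(N)) / N
G, delta = weighted_frame_matrix(lam, M)
eigs = np.linalg.eigvalsh(G)
print(delta < 1 / (2 * M), (1 - 2 * delta * M) ** 2, eigs.min(), eigs.max(), (1 + 2 * delta * M) ** 2)
```

In experiments of this kind the extreme eigenvalues typically sit well inside the interval $[(1-2\delta M)^2, (1+2\delta M)^2]$, which reflects the fact that the bounds of the theorem are worst-case estimates.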
The frame operator is the mapping $T : \tilde{T}_M \to \tilde{T}_M$ given by
\[ Tf(\xi) = \sum_{l=0}^{N-1} \langle f, \sqrt{v_l}\, D_M(\cdot - \lambda_l)\rangle \sqrt{v_l}\, D_M(\xi - \lambda_l) = \sum_{l=0}^{N-1} v_l f(\lambda_l) D_M(\xi - \lambda_l). \]
The data is the collection of frame coefficients $\sqrt{v_l}\, f(\lambda_l)$ and with frame bounds as in the statement of Theorem 3.2.2, the frame iteration takes the form
\[ f_{n+1} = f_n + \frac{1}{1 + 4\delta^2 M^2} \sum_{l=0}^{N-1} v_l \bigl(f(\lambda_l) - f_n(\lambda_l)\bigr) D_M(\xi - \lambda_l). \tag{3.35} \]
What is required, however, is the collection of Fourier coefficients $a_j = \int_0^1 f(\xi) e^{-2\pi i j \xi}\, d\xi$ ($-M \le j \le M$). Let $a_j^{(n)}$ be the $j$th Fourier coefficient of $f_n$. Computing Fourier coefficients of both sides of equation (3.35) gives
\[ a_j^{(n+1)} = a_j^{(n)} + \frac{1}{1 + 4\delta^2 M^2} \sum_{l=0}^{N-1} v_l \bigl(f(\lambda_l) - f_n(\lambda_l)\bigr) e^{-2\pi i j \lambda_l} = a_j^{(n)} + a_j^{(0)} - \frac{1}{1 + 4\delta^2 M^2} \sum_{k=-M}^{M} a_k^{(n)} u_{j-k} \tag{3.36} \]
where $u_j = \sum_{l=0}^{N-1} v_l e^{-2\pi i j \lambda_l}$. The vectors $a^{(0)}, u \in \mathbb{C}^N$ need to be computed just once. In any event, they may be computed approximately by the fast algorithms of Sections 3.2.3 and 3.2.4. The convolutions $\sum_{k=-M}^{M} a_k^{(n)} u_{j-k}$ may be computed using the uniform FFT in $O(M \log M)$ operations. Hence, each iteration is an $O(M \log M)$ operation and we have
\[ \|a - a^{(n)}\|_{\ell^2} = \|f - f_n\|_2 \le (2\delta M)^{n+1} \left( \frac{1 + 2\delta M}{1 - 2\delta M} \right) \|f\| \]
which will converge quickly for $\delta \ll 1/(2M)$. If $\delta$ approaches $1/(2M)$, the convergence will be slow and the Chebyshev or conjugate gradient algorithms of Section 3.1.2 will produce faster convergence.
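The coefficient iteration (3.36) can be sketched directly. The code below (Python with NumPy) is a minimal illustration, not the fast algorithm: the function name `frame_iteration` is hypothetical, and the sums over $l$ are carried out with a dense matrix $E_{lj} = e^{2\pi i j \lambda_l}$ instead of the nonuniform FFT and the FFT-based convolution with $u$. Starting from the zero vector, the first pass produces $a^{(0)}$ and each further pass is one step of (3.36).

```python
import numpy as np

def frame_iteration(lam, samples, M, n_iter=60):
    """Recover a_{-M..M} from samples f(lam_l) by the frame iteration (3.35)/(3.36).

    lam: sorted nodes in [0, 1); samples: values f(lam_l).
    """
    lam = np.asarray(lam, dtype=float)
    samples = np.asarray(samples, dtype=complex)
    nxt = np.roll(lam, -1)
    nxt[-1] += 1.0                       # lambda_{l+1} with lambda_N = lambda_0 + 1
    prv = np.roll(lam, 1)
    prv[0] -= 1.0                        # lambda_{l-1} with lambda_{-1} = lambda_{N-1} - 1
    delta = np.max(nxt - lam)
    if delta >= 1.0 / (2 * M):
        raise ValueError("Theorem 3.2.2 requires delta < 1/(2M)")
    v = (nxt - prv) / 2.0
    E = np.exp(2j * np.pi * np.outer(lam, np.arange(-M, M + 1)))  # f_n(lam_l) = (E a)_l
    relax = 1.0 / (1.0 + 4.0 * delta ** 2 * M ** 2)               # 2/(A + B)
    a = np.zeros(2 * M + 1, dtype=complex)
    for _ in range(n_iter):
        residual = samples - E @ a                       # f(lam_l) - f_n(lam_l)
        a = a + relax * (E.conj().T @ (v * residual))    # Fourier coefficients of (3.35)
    return a
```

The dense products cost $O(NM)$ per pass; replacing them by the fast transforms described above is what brings each iteration down to $O(M \log M)$.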
3.3 Sampling in the Paley–Wiener spaces
The classical space in which to study all aspects of sampling is the space $V$ of finite energy signals on the line "bandlimited" to $[-\Omega/2, \Omega/2]$: $V = \{f \in L^2(\mathbb{R}) : \hat{f}(\xi) = 0 \text{ for } |\xi| > \Omega/2\}$. Because of its role in the celebrated Paley–Wiener theorem, $V$ is often called the Paley–Wiener space and denoted $PW_\Omega$. The classical sampling theorem states that if $f \in PW_\Omega$, then $f$ admits the expansion
\[ f(t) = \sum_n f\Bigl(\frac{n}{\Omega}\Bigr) \frac{\sin \pi(\Omega t - n)}{\pi(\Omega t - n)} . \tag{3.37} \]
See Benedetto and Ferreira's introduction to [34] for the history of this result. The Paley–Wiener spaces provide a good model for many phenomena. On the one hand, many natural and synthetic signal generators can only output slowly varying signals. On the other hand, many natural and synthetic processors can only process bandlimited signals. For example, the human ear cannot process/amplify signals beyond, say, 20 kHz. The classical sampling theorem is fundamental in digital signal processing since it provides a means of converting an analog signal $f$ to a digital signal $\{f(n/\Omega)\}$ in a lossless way, and a means of reconstructing $f$ continuously within $PW_\Omega$ via (3.37).
The slow decay of the cardinal sine function $\operatorname{sinc}(t) = \sin(\pi t)/(\pi t)$ causes many problems in the application of (3.37). It is preferable in many respects to sample at a rate faster than $\Omega$ (samples per unit time) if the cardinal sine function can be replaced in (3.37) by a function with more rapid decay. This can be done as follows. Let $a > 1$ and choose a smooth function $\varphi$ with the property
\[ \hat{\varphi}(\xi) = \begin{cases} 1, & \text{if } |\xi| \le \Omega/2, \\ 0, & \text{if } |\xi| > a\Omega/2. \end{cases} \]
Since $\{e^{2\pi i k \xi/a\Omega}/\sqrt{a\Omega}\}_{k=-\infty}^{\infty}$ is an orthonormal basis for $L^2([-a\Omega/2, a\Omega/2])$, the Fourier series for $\hat{f}$ can be written
\[ \hat{f}(\xi) = \frac{1}{a\Omega} \sum_k \left( \int_{-a\Omega/2}^{a\Omega/2} \hat{f}(\eta) e^{2\pi i k \eta/a\Omega}\, d\eta \right) e^{-2\pi i k \xi/a\Omega} = \frac{1}{a\Omega} \sum_k f\Bigl(\frac{k}{a\Omega}\Bigr) e^{-2\pi i k \xi/a\Omega} \]
on $[-a\Omega/2, a\Omega/2]$. Since $\hat{f} = \hat{f}\hat{\varphi}$,
\[ f(t) = \int_{-\infty}^{\infty} \hat{f}(\xi)\hat{\varphi}(\xi) e^{2\pi i t \xi}\, d\xi = \frac{1}{a\Omega} \sum_k f\Bigl(\frac{k}{a\Omega}\Bigr) \int_{-\infty}^{\infty} \hat{\varphi}(\xi) e^{2\pi i \xi (t - k/a\Omega)}\, d\xi = \frac{1}{a\Omega} \sum_k f\Bigl(\frac{k}{a\Omega}\Bigr) \varphi\Bigl(t - \frac{k}{a\Omega}\Bigr). \tag{3.38} \]
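A small numerical sketch of (3.38) follows (Python with NumPy). It rests on one labeled assumption: instead of a smooth $\varphi$, it uses the trapezoidal choice of $\hat{\varphi}$ (equal to 1 on $[-\Omega/2, \Omega/2]$ and decreasing linearly to 0 at $|\xi| = a\Omega/2$), which is only continuous but has the closed form $\varphi(t) = w_1 \operatorname{sinc}(w_1 t) \operatorname{sinc}(w_2 t)$ with $w_1 = (a+1)\Omega/2$ and $w_2 = (a-1)\Omega/2$, and already decays like $1/t^2$. The function names are hypothetical.

```python
import numpy as np

def phi(t, Omega, a):
    """Kernel whose Fourier transform is the trapezoid: 1 on |xi| <= Omega/2, 0 for |xi| >= a*Omega/2."""
    w1 = (a + 1.0) * Omega / 2.0
    w2 = (a - 1.0) * Omega / 2.0
    return w1 * np.sinc(w1 * t) * np.sinc(w2 * t)   # np.sinc(x) = sin(pi x)/(pi x)

def oversampled_reconstruct(f, t, Omega, a, K):
    """Truncation of (3.38) to |k| <= K: (1/(a Omega)) sum_k f(k/(a Omega)) phi(t - k/(a Omega))."""
    t = np.asarray(t, dtype=float)
    nodes = np.arange(-K, K + 1) / (a * Omega)
    terms = f(nodes)[None, :] * phi(t[:, None] - nodes[None, :], Omega, a)
    return terms.sum(axis=1) / (a * Omega)

# Usage: f(t) = sinc(Omega t / 2)^2 lies in PW_Omega (its transform is a triangle on [-Omega/2, Omega/2]).
Omega, a = 1.0, 2.0
f = lambda s: np.sinc(Omega * s / 2.0) ** 2
t = np.linspace(-1.0, 1.0, 21)
err = np.max(np.abs(oversampled_reconstruct(f, t, Omega, a, K=400) - f(t)))
print(err)
```

A smoother $\hat{\varphi}$ would give faster decay of $\varphi$ and hence a smaller truncation error for the same number of samples, at the price of a kernel that usually has no simple closed form.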
The fast decay of ϕ makes (3.38) a much more stable sampling formula than (3.37), the drawback being the higher rate (aΩ samples per unit time). Since bandlimited signals are analytic [294], no such signal can be compactly supported. Nevertheless, suppose the values of f ∈ P WΩ are negligible
off $[-T/2, T/2]$. Then in the sampling result (3.37), only the samples $f(n/\Omega)$ for $|n/\Omega| < T/2$ contribute significantly to the sum and with small error the sum may be truncated so that
\[ f(t) \approx \sum_{n=-[T\Omega/2]}^{[T\Omega/2]} f\Bigl(\frac{n}{\Omega}\Bigr) \frac{\sin \pi(\Omega t - n)}{\pi(\Omega t - n)} . \]
Hence, in a sense that will be made precise in Section 3.3.4, the manifold of functions bandlimited to $J = [-\Omega/2, \Omega/2]$ and approximately timelimited to $I = [-T/2, T/2]$ has dimension $2[T\Omega/2] + 1 \sim \Omega T$.
Equation (3.37), the template for sampling formulas, has been used for many years to move between the continuous and discrete realms. In many real-world situations, the assumption of bandlimitedness has ample justification. A serious drawback, however, is that the result applies only to uniformly sampled data. It is natural then to ask (a) which nonuniform discrete subsets $\Lambda \subset \mathbb{R}$ have the property that every $f \in PW_\Omega$ may be reconstructed from its samples $f|_\Lambda$, and (b) what methods do we have for performing the reconstruction? Of course the question of speed of reconstruction may also be a factor. In this section we show how the theory of frames enters the discussion, and we use some basic functional analysis techniques to create stable reconstruction algorithms.
3.3.1 Sampling sets for the Paley–Wiener spaces
We want to view the classical sampling theorem from another perspective. The functions $e_n(\xi) = e^{-2\pi i n \xi/\Omega} \chi_{[-\Omega/2, \Omega/2]}(\xi)/\sqrt{\Omega}$ ($n \in \mathbb{Z}$) form an orthonormal basis for $L^2([-\Omega/2, \Omega/2])$. Since the Fourier transform is a unitary mapping $F : L^2([-\Omega/2, \Omega/2]) \to PW_\Omega$, the collection $g_n(t) = \hat{e}_n(t) = \sqrt{\Omega}\, \operatorname{sinc} \Omega(t - n/\Omega)$ ($n \in \mathbb{Z}$) is an orthonormal basis for $PW_\Omega$. By the Plancherel theorem, if $f \in PW_\Omega$,
\[ f(t) = \sum_n \langle f, g_n \rangle g_n(t) = \sum_n \langle \hat{f}, e_n \rangle g_n(t) = \sum_n f\Bigl(\frac{n}{\Omega}\Bigr) \operatorname{sinc}(\Omega t - n) \]
with convergence in $L^2$. The sampling sets $\{\lambda_n\}_{n=-\infty}^{\infty} \subset \mathbb{R}$ for which the more general collection $e_n(\xi) = e^{2\pi i \lambda_n \xi}$ ($n \in \mathbb{Z}$) is a frame for $L^2([-\Omega/2, \Omega/2])$ are crucial in the sampling theory of Paley–Wiener spaces. The theory of these sets is well established and deep, and is surveyed clearly in [372]. Here we describe just a few of the most important results.
Given a sequence $\Lambda = \{\lambda_k\}_{k=-\infty}^{\infty} \subset \mathbb{R}$, its frame radius $R = R(\Lambda)$ is defined by
\[ R(\Lambda) = \sup\bigl\{ r : \{e^{2\pi i \lambda_k t}\}_{k=-\infty}^{\infty} \text{ is a frame for } L^2([-r, r]) \bigr\}. \]
$\Lambda$ is (uniformly) separated if $\inf_{j \ne k} |\lambda_j - \lambda_k| = \delta > 0$, in which case $\delta$ is said to be the separation constant of $\Lambda$. If $\Lambda$ is separated, then it has uniform density $d > 0$ if there exists $L > 0$ such that $|\lambda_k - k/d| \le L$ for all $k$. The following result is due to Duffin and Schaeffer [125].
Theorem 3.3.1. If $\Lambda$ is separated and has uniform density $d > 0$, then $R(\Lambda) \ge d/2$.
Again by the unitarity of the Fourier transform, Duffin and Schaeffer's result may be stated thus: if $\Lambda = \{\lambda_k\}_{k=-\infty}^{\infty}$ is separated with uniform density $d > \Omega$, then the collection $\{\operatorname{sinc} \Omega(t - \lambda_k)\}_{k=-\infty}^{\infty}$ forms a frame for $PW_\Omega$. Notice however that Theorem 3.3.1 provides no estimate of the frame bounds, so that neither the standard frame algorithm nor the accelerated Chebyshev algorithm (Theorem 3.1.1) is applicable to the reconstruction. On the other hand, Gröchenig's CG algorithm (Theorem 3.1.3) will still converge.
Theorem 3.3.1 raises an important issue regarding the density of sampling sets. Given a discrete set $\Lambda \subset \mathbb{R}$ and $r > 0$, let
\[ n_\Lambda^+(r) = \sup_{x \in \mathbb{R}} \#\Bigl( \Lambda \cap \Bigl[x - \frac{r}{2}, x + \frac{r}{2}\Bigr] \Bigr); \qquad n_\Lambda^-(r) = \inf_{x \in \mathbb{R}} \#\Bigl( \Lambda \cap \Bigl[x - \frac{r}{2}, x + \frac{r}{2}\Bigr] \Bigr). \]
The upper and lower Beurling densities of $\Lambda$ are
\[ D^+(\Lambda) = \lim_{r \to \infty} \frac{n_\Lambda^+(r)}{r}; \qquad D^-(\Lambda) = \lim_{r \to \infty} \frac{n_\Lambda^-(r)}{r}. \]
Notice that for the uniform sampling set $\{n/\Omega\}_{n=-\infty}^{\infty}$ in the classical sampling theorem (3.37), $D^+(\Lambda) = D^-(\Lambda) = \Omega$ while in Theorem 3.3.1 above, the sampling sets $\Lambda$ satisfy $D^-(\Lambda) = d > \Omega$. Hence, applying Theorem 3.3.1 to construct separated, perturbed sampling sets always requires oversampling (relative to the classical sampling theorem). The oversampling requirement persists in the case of general nonuniform sampling sets. To demonstrate this we have Landau's result [247]:
Theorem 3.3.2. If $\Lambda$ is a separated sequence, then $D^-(\Lambda)$ exists and $R(\Lambda) \le D^-(\Lambda)/2$.
Jaffard [208] provides another criterion for the density of sampling sets. Suppose $\Lambda$ has uniform density $d$ and let $\Delta(\Lambda)$ be the supremum of the densities of subsequences of $\Lambda$ that have a uniform density. Then $R(\Lambda) = 0$ if $\Lambda$ has no such subsequences, or if $\Lambda$ has arbitrarily many points in intervals of length 1, while $R(\Lambda) = \Delta(\Lambda)/2$ otherwise.
Seip [316] rounded out this circle of ideas as follows. A sequence $\Lambda = \{\lambda_k\}_{k=-\infty}^{\infty} \subset \mathbb{R}$ is relatively separated if it is a finite union of separated sequences.
Theorem 3.3.3. If $\{e^{2\pi i \lambda_k t}\}_{k=-\infty}^{\infty}$ is a frame for $L^2([-1/2, 1/2])$, then $\Lambda$ is relatively separated and $D^-(\Lambda) \ge 1$. On the other hand, if $\Lambda$ is relatively separated and $D^-(\Lambda) > 1$, then $\{e^{2\pi i \lambda_k t}\}_{k=-\infty}^{\infty}$ is a frame for $L^2([-1/2, 1/2])$.
For more details on frames of complex exponentials and nonharmonic Fourier series in general, the reader is referred to [372]. The results of Jaffard and Seip are also reviewed in [76].
3.3.2 Iterative reconstructions in $PW_\Omega$
In [3], frame-based algorithms for the reconstruction of irregularly sampled bandlimited signals are developed. The methods are generalizations of those used in the proof of Theorem 3.2.2. We state a simplified version of the result in [3] which draws out the parallels with Theorem 3.2.2.
Theorem 3.3.4. Let $\{\lambda_l\}_{l=-\infty}^{\infty}$ be such that $\lambda_l \to \pm\infty$ as $l \to \pm\infty$ and $\delta = \sup_l |\lambda_{l+1} - \lambda_l| < 1/\Omega$. Let $v_l = (\lambda_{l+1} - \lambda_{l-1})/2$. Then the collection $\{\sqrt{v_l}\, \operatorname{sinc} \Omega(t - \lambda_l)\}_{l=-\infty}^{\infty}$ is a frame for $PW_\Omega$ with lower frame bound $A = (1 - \delta\Omega)^2$ and upper frame bound $B = (1 + \delta\Omega)^2$.
Since $(A + B)/2 = 1 + \delta^2\Omega^2$, the frame iteration (3.7) becomes
\[ f_0 = \frac{1}{1 + \delta^2\Omega^2} \sum_l v_l f(\lambda_l) \operatorname{sinc} \Omega(t - \lambda_l), \qquad f_{n+1} = f_n + \frac{1}{1 + \delta^2\Omega^2} \sum_l v_l \bigl(f(\lambda_l) - f_n(\lambda_l)\bigr) \operatorname{sinc} \Omega(t - \lambda_l), \tag{3.39} \]
and converges in $PW_\Omega$ to $f$ with $\|f - f_n\|_2 \le (\delta\Omega)^{n+1} \frac{1 + \delta\Omega}{1 - \delta\Omega} \|f\|_2$ by (3.8). When $\delta$ approaches $1/\Omega$, the frame algorithm will converge slowly and the CG algorithm of Theorem 3.1.3 is the preferred reconstruction method.
Complete knowledge of $g \in PW_\Omega$ can be obtained from the samples $\{g(k/\Omega)\}_{k=-\infty}^{\infty}$ via the classical sampling formula (3.37). If $a_k^{(n)} = f_n(k/\Omega)$, then from (3.39) we have
\[ a_k^{(0)} = \frac{1}{1 + \delta^2\Omega^2} \sum_l v_l f(\lambda_l) \operatorname{sinc}(\Omega\lambda_l - k), \qquad a_k^{(n+1)} = a_k^{(n)} + a_k^{(0)} - \frac{1}{1 + \delta^2\Omega^2} \sum_j B_{jk}\, a_j^{(n)}, \]
where $B_{kj} = \sum_l v_l \operatorname{sinc}(\Omega\lambda_l - j) \operatorname{sinc}(\Omega\lambda_l - k)$. Slow decay of the sinc function means that these sums cannot be reliably and accurately truncated, which is necessary when only finitely many samples are available. Remedies for this situation have been considered in [140] where a completely discrete model for bandlimited signals is employed, and in [165, 167] where bandlimited signals are approximated by trigonometric polynomials.
3.3.3 Prolate spheroidal wavefunctions
A function cannot be both time- and bandlimited, yet digital devices treat signals as if they were. The simplest way of treating a signal as being essentially time- and bandlimited is to assume that most of the energy of $f$ and $\hat{f}$ is concentrated in a time–frequency rectangle. To make this precise we consider the timelimiting operator $Q_T$ and the bandlimiting operator $P_\Omega$ acting on $L^2(\mathbb{R})$ by
\[ Q_T f(t) = f(t)\chi_{[-T/2, T/2]}(t); \qquad P_\Omega f(t) = \bigl(\hat{f}\chi_{[-\Omega/2, \Omega/2]}\bigr)^{\vee}(t). \]
As there are no simultaneously time- and bandlimited functions, $P_\Omega$ and $Q_T$ have no joint eigenfunctions for the eigenvalue 1. Nevertheless, we say that $f$ is essentially time- and bandlimited if $f$ nearly lies in the span of those eigenfunctions of $P_\Omega Q_T$ whose eigenvalues are close to one. The goal here is to outline how this fundamental notion can be turned into a computationally useful one. It turns out that there are well-defined finite-dimensional spaces of essentially time- and bandlimited functions. However, numerical analysis of such spaces has been considered only recently by Xiao et al. [369] and Beylkin and Monzón [53] whose ideas we will also outline.
In a beautiful series of papers written by Landau, Pollak and Slepian at Bell Labs in the early 1960s [249, 250, 322, 324], the authors explore the concentrations of bandlimited functions, the prolate spheroidal wavefunctions (PSWFs), and the "dimension" of the family of signals timelimited "within $\varepsilon$" to an interval $I$ and bandlimited "within $\delta$" to an interval $J$. Here we briefly recount some of their results which are important in the current discussion, particularly the remarkable properties of PSWFs.
Notice that $\operatorname{Ran}(Q_T)$ is the space of $L^2$-signals timelimited to $[-T/2, T/2]$ while $\operatorname{Ran}(P_\Omega)$ is the Paley–Wiener space $PW_\Omega$ of $L^2$-signals bandlimited to $[-\Omega/2, \Omega/2]$. The composition $P_\Omega Q_T : PW_\Omega \to PW_\Omega$ is self-adjoint and acts by integration against the kernel $K(x, t) = \Omega \operatorname{sinc} \Omega(x - t)\chi_{[-T/2, T/2]}(t)$, i.e.,
\[ P_\Omega Q_T f(x) = \int_{-T/2}^{T/2} \frac{\sin \pi\Omega(x - t)}{\pi(x - t)} f(t)\, dt. \tag{3.40} \]
Since $K(x, t)$ satisfies $\|K\|_{L^2(\mathbb{R} \times \mathbb{R})}^2 = \Omega T < \infty$, $P_\Omega Q_T$ is a Hilbert–Schmidt operator and $\|P_\Omega Q_T\|_{HS}^2 = \|K\|_{L^2(\mathbb{R} \times \mathbb{R})}^2 = \Omega T$. Hence $P_\Omega Q_T$ is compact on $PW_\Omega$, and its eigenvalues $\gamma_0 \ge \gamma_1 \ge \cdots$ are all non-negative. The associated eigenfunctions may be chosen to be real-valued and, when suitably normalized, form an orthonormal basis for $PW_\Omega$. Under this normalization the PSWF $\psi_j$ ($j \ge 0$) is defined as the eigenfunction of $P_\Omega Q_T$ with eigenvalue $\gamma_j$. If $j$ is even then $\psi_j$ is an even function, while if $j$ is odd then $\psi_j$ is an odd function. A surprising feature is that the PSWFs, which are orthonormal on the line, are also orthogonal on $[-T/2, T/2]$:
\[ \langle Q_T\psi_j, Q_T\psi_k \rangle = \langle \psi_j, P_\Omega Q_T\psi_k \rangle = \gamma_k \langle \psi_j, \psi_k \rangle = \gamma_k \delta_{jk}. \tag{3.41} \]
Hence $\{\gamma_j^{-1/2} Q_T\psi_j\}_{j=0}^{\infty}$ is an orthonormal set and, in fact, an orthonormal basis for $L^2([-T/2, T/2])$. The eigenvalues have another interpretation as time-concentrations of bandlimited functions. Since $P_\Omega Q_T$ is self-adjoint, its operator norm is given by
\[ \gamma_0 = \|P_\Omega Q_T\| = \sup_{f \in PW_\Omega,\ \|f\| = 1} |\langle P_\Omega Q_T f, f \rangle| = \sup_{f \in PW_\Omega,\ \|f\| = 1} \|Q_T f\|^2, \]
i.e., the top eigenvalue $\gamma_0$ of $P_\Omega Q_T$ is in fact the maximum concentration of a function $f \in PW_\Omega$ on the interval $[-T/2, T/2]$. Since $P_\Omega Q_T$ is compact and self-adjoint, the supremum is attained. Hence $\gamma_0 < 1$ for otherwise there would exist a signal $f \in PW_\Omega$ timelimited to $[-T/2, T/2]$, thus contradicting the analyticity of bandlimited functions. The other eigenvalues also have a natural characterization. In fact they may be computed recursively as follows: let $V_j = \operatorname{span}\{\psi_k\}_{k=0}^{j}$. Then
\[ \gamma_{j+1} = \sup_{f \in PW_\Omega \ominus V_j,\ \|f\| = 1} \|Q_T f\|^2 . \]
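Computing the trace of $P_\Omega Q_T$ from its integral representation (3.40) gives
\[ \sum_{j=0}^{\infty} \gamma_j = \operatorname{tr}(P_\Omega Q_T) = \int K(t, t)\, dt = \int_{-T/2}^{T/2} \Omega\, dt = \Omega T. \tag{3.42} \]
On the other hand, observe that since $\{\psi_j\}_{j=0}^{\infty}$ is an orthonormal basis for $PW_\Omega$ which has $K(x, t)$ as its reproducing kernel,
\[ \frac{\sin \pi\Omega(x - t)}{\pi(x - t)} = \sum_{j=0}^{\infty} \psi_j(x)\psi_j(t). \tag{3.43} \]
The eigenvalue equation for $\psi_j$ reads
\[ \gamma_j\psi_j(x) = \int_{-T/2}^{T/2} \frac{\sin \pi\Omega(x - t)}{\pi(x - t)} \psi_j(t)\, dt. \]
Multiplying both sides by $\psi_j(x)$, summing over $j$ and using (3.43) yields
\[ \sum_{j=0}^{\infty} \gamma_j\psi_j(x)^2 = \int_{-T/2}^{T/2} \frac{\sin^2 \pi\Omega(x - t)}{\pi^2(x - t)^2}\, dt. \tag{3.44} \]
Since $\|Q_T\psi_j\|^2 = \gamma_j$, integrating both sides of (3.44) over $[-T/2, T/2]$ gives
\[ \sum_{j=0}^{\infty} \gamma_j^2 = \int_{-T/2}^{T/2} \int_{-T/2}^{T/2} \frac{\sin^2 \pi\Omega(t - s)}{\pi^2(t - s)^2}\, dt\, ds = \int_{-\Omega T/2}^{\Omega T/2} \int_{-\Omega T/2 - s}^{\Omega T/2 - s} \frac{\sin^2 \pi t}{\pi^2 t^2}\, dt\, ds. \]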
Landau [248] shows that an integration by parts on the outer integral gives
\[ \sum_{j=0}^{\infty} \gamma_j^2 \ge \Omega T - M_1 \ln(\Omega T) - M_2 \tag{3.45} \]
for positive constants $M_1, M_2$ independent of $\Omega$ and $T$, provided that $\Omega T \ge 3/4$. Let $c = \Omega T$. Then combining (3.42) and (3.45) gives
\[ \sum_{j=0}^{\infty} \gamma_j(1 - \gamma_j) < M_1 \log c + M_2. \tag{3.46} \]
For $0 \le \gamma < 1$ fixed, consider those eigenvalues $\gamma_j$ with $\gamma \le \gamma_j < 1 - \gamma$ and observe that
\[ \sum_{\gamma \le \gamma_j < 1 - \gamma} \gamma_j(1 - \gamma_j) \ge \gamma(1 - \gamma)\, \#\{j : \gamma \le \gamma_j < 1 - \gamma\}. \]
Consequently,
\[ \#\{j : \gamma \le \gamma_j < 1 - \gamma\} \le \frac{\sum_{\gamma \le \gamma_j < 1 - \gamma} \gamma_j(1 - \gamma_j)}{\gamma(1 - \gamma)} < \frac{M_1 \log c + M_2}{\gamma(1 - \gamma)} \tag{3.47} \]
from which we see that the eigenvalues $\gamma_j$ are near 1 for $j$ small, near 0 for $j$ large, and the transition occurs over a small range of values of $j$. Landau [248] also shows that
\[ \gamma_{[c]+1} \le 1/2 \le \gamma_{[c]-1} \tag{3.48} \]
so that the transition occurs around $j = [c]$. Other descriptions of this thresholding behavior are included in the following result, and will be needed in Section 3.3.5. The theorem is stated for time limit $T = 1$.
Theorem 3.3.5. (i) For any $\Omega > 0$ and $0 < \alpha < 1$ the number $N(\alpha)$ of eigenvalues of the operator $P_\Omega Q_1$ that are greater than $\alpha$ satisfies
\[ N(\alpha) = \Omega + \frac{1}{\pi} \log\Bigl(\frac{1 - \alpha}{\alpha}\Bigr) \log\Bigl(\frac{\Omega}{2}\Bigr) + O(\log \Omega). \tag{3.49} \]
(ii) For each $\eta > 0$ there is a $\rho$ such that, as $\Omega \uparrow \infty$,
\[ \gamma_n(\Omega) \to \begin{cases} 0, & \text{if } n = [(1 + \eta)\Omega], \\ [1 + e^{\pi\rho}]^{-1}, & \text{if } n = [\Omega + \frac{\rho}{\pi} \log\frac{\pi\Omega}{2}], \\ 1, & \text{if } n = [(1 - \eta)\Omega]. \end{cases} \tag{3.50} \]
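The final property of PSWFs we will need is their behavior under the Fourier transform:
\[ \hat{\psi}_j\Bigl(\frac{\Omega\xi}{T}\Bigr) = \varepsilon_j\, i^j \sqrt{\frac{T}{\Omega\gamma_j}}\; Q_T\psi_j(\xi), \qquad (\varepsilon_j = \pm 1). \tag{3.51} \]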
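The proof is simple once it is recognized that the operator $Q_T P_\Omega$ acting on $L^2([-T/2, T/2])$ has simple eigenvalues $\gamma_j$ with associated eigenfunctions $Q_T\psi_j$. In fact, if $\delta_a$ is the dilation $\delta_a f(t) = f(at)$ then since $P_\Omega Q_T\psi_j = \gamma_j\psi_j$ we have
\[ \gamma_j\, \delta_{\Omega/T}\hat{\psi}_j(\xi) = (P_\Omega Q_T\psi_j)^{\wedge}(\Omega\xi/T) = \int_{-\infty}^{\infty} e^{-2\pi i x \Omega\xi/T} \int_{-T/2}^{T/2} \frac{\sin \pi\Omega(x - t)}{\pi(x - t)} \psi_j(t)\, dt\, dx \]
\[ = \int_{-T/2}^{T/2} \psi_j(t) e^{-2\pi i t \Omega\xi/T} \int_{-\infty}^{\infty} \frac{\sin \pi\Omega x}{\pi x} e^{-2\pi i x \Omega\xi/T}\, dx\, dt = \chi_{[-T/2, T/2]}(\xi) \int_{-T/2}^{T/2} \psi_j(t) e^{-2\pi i t \Omega\xi/T}\, dt \]
\[ = \chi_{[-T/2, T/2]}(\xi) \int_{-\infty}^{\infty} \hat{\psi}_j\Bigl(\frac{\Omega u}{T}\Bigr) \frac{\sin \pi\Omega(\xi - u)}{\pi(\xi - u)}\, du = Q_T P_\Omega\, \delta_{\Omega/T}\hat{\psi}_j(\xi). \]
Hence $\delta_{\Omega/T}\hat{\psi}_j$ is an eigenfunction of $Q_T P_\Omega$ with eigenvalue $\gamma_j$. We conclude that $\hat{\psi}_j(\Omega\xi/T) = a_j Q_T\psi_j(\xi)$ and all that remains to be done is to compute the constants $a_j$. Now each $\psi_j$ is real-valued, and if $j$ is even then $\psi_j$ is even and $\hat{\psi}_j$ is real-valued, while if $j$ is odd then $\psi_j$ is odd and $\hat{\psi}_j$ is purely imaginary. Hence we may write $\hat{\psi}_j(\Omega\xi/T) = i^j A_j Q_T\psi_j(\xi)$ for real constants $A_j$ and
\[ 1 = \int |\hat{\psi}_j(\xi)|^2\, d\xi = \frac{\Omega}{T} \int \Bigl| \hat{\psi}_j\Bigl(\frac{\Omega\xi}{T}\Bigr) \Bigr|^2 d\xi = A_j^2 \frac{\Omega}{T} \int_{-T/2}^{T/2} |\psi_j(\xi)|^2\, d\xi = A_j^2 \frac{\Omega\gamma_j}{T}. \]
Hence $A_j = \pm\sqrt{T/(\Omega\gamma_j)}$ and (3.51) is proved.
3.3.4 The $\Omega T$ theorem
Essential time- and bandlimiting. We say $g \in L^2(\mathbb{R})$ is $\varepsilon$-timelimited to $[-1/2, 1/2]$ or has unit duration at level $\varepsilon$ provided $\|g\chi_{\{|t| > 1/2\}}\|_2^2 < \varepsilon$. Similarly one says that $g$ is $\delta$-bandlimited to $[-\Omega/2, \Omega/2]$ or has bandwidth $\Omega$ at level $\delta$ provided $\|\hat{g}\chi_{\{|\omega| > \Omega/2\}}\|_2^2 < \delta$. $F(\Omega; \varepsilon, \delta) = F_{\varepsilon,\delta}$ denotes the manifold of $L^2$-functions having unit duration at level $\varepsilon$ and bandwidth $\Omega$ at level $\delta$. To simplify notation we will restrict attention to $F_\varepsilon = F_{\varepsilon,\varepsilon}$. The Donoho–Stark uncertainty inequality (Corollary 5.1.5) states that $F_\varepsilon$ contains no elements of norm greater than or equal to one unless $\Omega \ge (1 - 2\sqrt{\varepsilon})^2$. Moreover, a function in $F_\varepsilon$ cannot have arbitrary energy (e.g., Nazarov's theorem 5.1.6). Slepian [323] posed the problem of maximizing the energy of an element of $F_\varepsilon$ in the following terms: Fix $\alpha < \varepsilon$ and $\beta < \varepsilon$ and consider those functions satisfying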
\[ \|g\chi_{\{|t| > 1/2\}}\|_2^2 = \alpha \quad \text{and} \quad \|\hat{g}\chi_{\{|\omega| > \Omega/2\}}\|_2^2 = \beta. \tag{3.52} \]
By Plancherel's theorem,
\[ \|\hat{g}\chi_{\{|\omega| \le \Omega/2\}}\|_2^2 = \iint g(t) \frac{\sin \pi\Omega(t - \tau)}{\pi(t - \tau)} g(\tau)\, dt\, d\tau. \]
Now introduce Lagrange multipliers $\mu, \nu$ and note that (with $\chi = \chi_{[-1/2, 1/2]}$)
\[ I = \|g\|_2^2 + \mu \int (1 - \chi(t))\, g^2(t)\, dt + \nu \left( \|g\|_2^2 - \iint g(t) \frac{\sin \pi\Omega(t - \tau)}{\pi(t - \tau)} g(\tau)\, dt\, d\tau \right) \]
must be stationary with respect to small changes in $g$ for the most energetic function subject to the constraints. Thus, taking the first variation of $I$ one sees that $g$ must satisfy
\[ Ag(t) + B\chi(t)g(t) = \int \frac{\sin \pi\Omega(t - \tau)}{\pi(t - \tau)} g(\tau)\, d\tau \equiv b(t) \tag{3.53} \]
with constants $A, B$ independent of $t$. The function $b$ is bandlimited to $[-\Omega/2, \Omega/2]$ while $Ag(t) = b(t)$ for $|t| > 1/2$ and $(A + B)g(t) = b(t)$ for $|t| < 1/2$. Making these substitutions in (3.53) yields
\[ Ag(t) + B\chi(t)g(t) = \frac{1}{A + B} \int_{-1/2}^{1/2} \frac{\sin \pi\Omega(t - \tau)}{\pi(t - \tau)} b(\tau)\, d\tau + \frac{1}{A} \int_{|\tau| > 1/2} \frac{\sin \pi\Omega(t - \tau)}{\pi(t - \tau)} b(\tau)\, d\tau \]
\[ = \frac{1}{A} b(t) + \left( \frac{1}{A + B} - \frac{1}{A} \right) \int_{-1/2}^{1/2} \frac{\sin \pi\Omega(t - \tau)}{\pi(t - \tau)} b(\tau)\, d\tau. \]
Taking $|t| < 1/2$ yields
\[ \frac{(1 - A)(A + B)}{B}\, b(t) = \int_{-1/2}^{1/2} \frac{\sin \pi\Omega(t - \tau)}{\pi(t - \tau)} b(\tau)\, d\tau = P_\Omega Q_1 b(t). \]
Consequently, $b$ must be an eigenfunction $k\psi_j$ of $P_\Omega Q_1$ with $(1 - A)(A + B)/B = \gamma_j$. Henceforth we refer to $g$ as $g_j$. The energy constraints (3.52) on $g_j$ together with (3.53) and (3.41) imply that
\[ g_j(t) = \sqrt{\frac{\alpha}{1 - \gamma_j}}\, \psi_j(t) + \sqrt{\frac{\beta}{\gamma_j(1 - \gamma_j)}}\, \chi(t)\psi_j(t). \]
Moreover, $\|g_j\|_2^2 = (\alpha + \beta + 2\sqrt{\gamma_j\alpha\beta})/(1 - \gamma_j)$. Since $\{\gamma_j\} \subset (0, 1)$ decreases monotonically, $g_0$ will have the greatest possible energy among the $g_j$. The energy of $g_j$ is maximized among $\alpha, \beta$ in $[0, \varepsilon]$ when $\alpha = \beta = \varepsilon$ as we shall assume
henceforth. It follows from (3.41) that the $g_j$ are orthogonal on the line as well as on the interval $[-1/2, 1/2]$. Furthermore, they have monotonically decreasing energies $\|g_j\|_2^2 = 2\varepsilon/(1 - \sqrt{\gamma_j})$ and $\|g_j\chi_{\{|t| < 1/2\}}\|_2^2 = \varepsilon(1 + \sqrt{\gamma_j})/(1 - \sqrt{\gamma_j})$. For large $j$ the energy of $g_j$ is nearly equally divided inside and outside of $[-1/2, 1/2]$. The $g_j$ give rise to economical approximations of $F_\varepsilon$. In fact, the following result is given in [323]:
Theorem 3.3.6. For every $\eta > 0$ and for every $\tilde{\varepsilon} > \varepsilon > 0$ there is a number $\omega(\eta, \varepsilon, \tilde{\varepsilon})$ such that, whenever $\Omega > \omega(\eta, \varepsilon, \tilde{\varepsilon})$ there are $N = N(\Omega, \varepsilon, \tilde{\varepsilon})$ elements $\{g_1, \dots, g_N\}$ of $F_\varepsilon$ such that: (i) $1 - \eta \le N/\Omega \le 1 + \eta$ and (ii) every element of $F_\varepsilon$ is within $\tilde{\varepsilon}$ in $L^2$-norm of some element of the span of $\{g_1, \dots, g_N\}$.
This theorem is known as the "$\Omega T$ theorem." It says that if $\Omega$ is large enough then there is a subspace of $L^2$ having dimension essentially $\Omega$ such that every essentially $\Omega$-bandlimited function of essential unit duration nearly belongs to this subspace. The general case of functions essentially timelimited to $[-T/2, T/2]$ is obtained from the unit duration case by rescaling. What follows is a brief outline of the proof.
The argument showing that $N/\Omega \le 1 + \eta$ is very similar to that used for finding the energy maximizers in $F_\varepsilon$. In this case, subject to fixed energy levels $\alpha, \beta$ of the restrictions of $g$ to $\{|t| > 1/2\}$ and $\hat{g}$ to $\{|\omega| > \Omega/2\}$, respectively, one finds that a maximizer of the energy of $\bigl( g - \sum_{j=0}^{m-1} \langle g, \psi_j\chi \rangle \psi_j/\gamma_j \bigr)\chi$ among $g \in F_\varepsilon$, subject to the energy constraints, is the same as a maximizer of $\|g\chi\|_2^2$ among those $g \in F_\varepsilon$ that are orthogonal to $\psi_j\chi$, $j = 0, \dots, m - 1$. Letting $\alpha$ and $\beta$ attain the optimal value $\varepsilon$, one has $g = g_m$ for such an optimizer. Then $\|g\chi\|_2^2 = \varepsilon(1 + \sqrt{\gamma_m})/(1 - \sqrt{\gamma_m})$. It follows now from (3.50) that given $\delta > 0$ there is an $n(\delta)$ such that $\Omega > n(\delta)$ implies $\gamma_m \le (1 + e^{\pi\rho})^{-1} + \delta$ when $m = [\Omega + (\rho/\pi) \log(\pi\Omega/2)]$.
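Taking $\delta = (\tilde{\varepsilon} - \varepsilon)/(2(\tilde{\varepsilon} + \varepsilon))$ and $\rho = (1/\pi) \log(1/\delta - 1)$ yields $\gamma_m \le 2\delta$ and hence $\|g\chi\|_2^2 < \tilde{\varepsilon}$. Taking
\[ c_1 = \frac{1}{\pi} \log\Bigl( 2\Bigl(\frac{\tilde{\varepsilon} + \varepsilon}{\tilde{\varepsilon} - \varepsilon}\Bigr) - 1 \Bigr) \quad \text{and} \quad c_2 = \Bigl( 2\Bigl(\frac{\tilde{\varepsilon} + \varepsilon}{\tilde{\varepsilon} - \varepsilon}\Bigr) - 1 \Bigr)\, n\Bigl( \frac{1}{2}\Bigl(\frac{\tilde{\varepsilon} - \varepsilon}{\tilde{\varepsilon} + \varepsilon}\Bigr) \Bigr) \]
it follows that if $\Omega > c_2$ then $N(\Omega, \varepsilon, \tilde{\varepsilon}) \le 1 + (1 + c_1 \log \Omega)/2$ and this proves the upper bound $N/\Omega \le 1 + \eta$ for large $\Omega$.
We will not prove the lower bound $N/\Omega \ge 1 - \eta$ but here is the idea: Upon choosing $p = [(1 - \eta)\Omega]$ it can be shown that $g = \sqrt{\varepsilon/(1 - \gamma_{p+1})}\; \psi_{p+1}$ belongs to $F_\varepsilon$ and $\|g - h\|_2^2 \ge \varepsilon\gamma_{p+1}/(1 - \gamma_{p+1})$ for all $h \in \operatorname{span}\{\psi_0, \dots, \psi_p\}$. This is enough to guarantee that one needs sufficiently many functions to form an approximate basis at level $\tilde{\varepsilon}$ for $F_\varepsilon$. This finishes our sketch of the proof of Theorem 3.3.6.
3.3.5 Quadrature for Paley–Wiener spaces
A bandlimited function can be sinc-interpolated exactly from its integer samples via the classical sampling theorem (3.37). There are drawbacks: real data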
is only essentially bandlimited and the sinc-series converges slowly. The $\Omega T$ theorem suggests that an essentially time- and bandlimited function $f$ can be approximated well by a linear combination $c_0\psi_0 + \cdots + c_N\psi_N$ of PSWFs where $N$ is as small as possible. But such a representation loses the advantage of needing only the samples to reproduce $f$. No magical formula relates the coefficients $c_n$ to the samples $f(n/\Omega)$. At best one can estimate one from the other, and approximate numerical solutions to this problem have been formulated by Xiao et al. [369] and by Beylkin and Monzón [53]. The approaches are based on quadrature properties of PSWFs.
The idea boils down to the construction of a quadrature rule which uniformly approximates the integral $\operatorname{sinc}(\Omega x) = \int_{-1/2}^{1/2} e^{2\pi i \Omega x \xi}\, d\xi$ by a sum of the form $\sum_{j=1}^{n} w_j e^{2\pi i \Omega x \xi_j}$. The sum can be approximated, in turn, by a suitable linear combination of PSWFs.
We make use of two different sets of PSWFs. As before, these are eigenfunctions of iterated projections $P_\Omega Q_T$. Each choice of $\Omega$ and $T$ produces a different set of PSWFs. Throughout this section we fix $T = 1$ and $Q_1$ will be written simply as $Q$. With $\Omega$ fixed and $T = 1$ we have the PSWFs $\psi_j = \psi_j^{(\Omega)}$ which are eigenfunctions of $P_\Omega Q$ with associated eigenvalues $\gamma_j = \gamma_j^{(\Omega)}$ ($j \ge 0$). With $\Omega$ replaced by $2\Omega$ and $T = 1$, we obtain the PSWFs $\psi_j^{(2\Omega)}$ which are eigenfunctions of $P_{2\Omega} Q$ with associated eigenvalues $\gamma_j^{(2\Omega)}$ ($j \ge 0$). We also make use of the collections $\tilde{\psi}_j = \gamma_j^{-1/2} Q\psi_j$ and $\tilde{\psi}_j^{(2\Omega)} = (\gamma_j^{(2\Omega)})^{-1/2} Q\psi_j^{(2\Omega)}$ which both form orthonormal bases of $L^2([-1/2, 1/2])$.
A collection of $n$ continuous functions $\varphi_0, \varphi_1, \dots, \varphi_{n-1}$ on an interval $[a, b]$ is said to form a Chebyshev system on $[a, b]$ if, given $n$ distinct points $a \le x_0 < x_1 < \cdots < x_{n-1} \le b$, the $n \times n$ matrix $A$ with $(j, k)$-th entry $A_{jk} = \varphi_k(x_j)$ is invertible. For a full theory of Chebyshev systems, the reader is referred to [150] and [222].
Suppose then that $\{\varphi_0, \varphi_1, \dots, \varphi_{n-1}\}$ forms a Chebyshev system on $[a, b]$ and let $V_n = \operatorname{span}\{\varphi_0, \varphi_1, \dots, \varphi_{n-1}\}$. Signals in $V_n$ enjoy a particularly strong sampling property. Given $f(t) = \sum_{k=0}^{n-1} c_k\varphi_k(t) \in V_n$, observe that evaluating $f$ at distinct points $x_0, x_1, \dots, x_{n-1} \in [a, b]$ produces a matrix equation $f(x_j) = \sum_{k=0}^{n-1} A_{jk} c_k$ with $A \in M_n(\mathbb{C})$ defined as above. Since $A$ is invertible, we have $c_k = \sum_{l=0}^{n-1} (A^{-1})_{kl} f(x_l)$ so that
\[ f(t) = \sum_{k=0}^{n-1} \sum_{l=0}^{n-1} (A^{-1})_{kl} f(x_l)\varphi_k(t) = \sum_{l=0}^{n-1} f(x_l) S_l(t) \tag{3.54} \]
where $S_l(t) = \sum_{k=0}^{n-1} (A^{-1})_{kl}\varphi_k(t) \in V_n$. The functions $S_l$ satisfy the important cardinality property $S_l(x_k) = \delta_{lk}$. Examples of Chebyshev systems include the trigonometric functions $\{1, \cos x, \sin x, \dots, \cos nx, \sin nx\}$ on $[0, 2\pi)$ and the monomials $\{1, x, \dots, x^n\}$ on the real line. It is a remarkable fact that for each integer $n \ge 1$, the prolate spheroidal wavefunctions $\{\psi_0, \psi_1, \dots, \psi_{n-1}\}$ form a Chebyshev system on $[-1/2, 1/2]$. This is due, in
part, to the “lucky accident” that the PSWFs are eigenfunctions of a Sturm–Liouville problem (see [150, 351]). We immediately have that functions in $V_n(\Omega) = \operatorname{span}\{\psi_0, \psi_1, \dots, \psi_{n-1}\}$ satisfy the sampling result (3.54).

While equation (3.54) is a pleasing reconstruction formula for signals in $V_n(\Omega)$, it can be difficult to work with if $A$ is ill-conditioned. In particular, if $\|A^{-1}\|$ is large, computing the coefficients $\sum_{l=0}^{n-1} (A^{-1})_{kl} f(x_l)$ or the sampling functions $\sum_{k=0}^{n-1} (A^{-1})_{kl}\psi_k(t)$ can be computationally expensive, perhaps completely impractical. To circumvent these difficulties, we truncate $A$ and deal with associated pseudoinverses.

If a collection of functions $\{\varphi_0, \varphi_1, \dots, \varphi_{n-1}\}$ forms a Chebyshev system on $[a, b]$ and $\{x_0, x_1, \dots, x_{n-1}\} \subset [a, b]$ are distinct nodes, then there exist weights $w_0, w_1, \dots, w_{n-1} \in \mathbb{C}$ such that
\[
\int_a^b f(x)\,dx = \sum_{j=0}^{n-1} w_j f(x_j) \tag{3.55}
\]
for all $f \in \operatorname{span}\{\varphi_0, \varphi_1, \dots, \varphi_{n-1}\}$. The weights are in fact the unique solutions of the matrix equation $\int_a^b \varphi_i(x)\,dx = \sum_{j=0}^{n-1} w_j\varphi_i(x_j)$. Particular choices of nodes allow this quadrature property to be extended. One such choice leads to what is known as generalized Gaussian quadrature: if $\{\varphi_0, \varphi_1, \dots, \varphi_{2n-1}\}$ is a collection of $2n$ functions which form a Chebyshev system on $[a, b]$ then there exist $n$ unique nodes $\{x_0, x_1, \dots, x_{n-1}\} \subset [a, b]$ and (unique) weights $w_0, w_1, \dots, w_{n-1} \in \mathbb{C}$ such that (3.55) holds for all $f \in \operatorname{span}\{\varphi_0, \varphi_1, \dots, \varphi_{2n-1}\}$. For a more complete discourse on generalized Gaussian quadrature, the reader is referred to [150] and [222].

Computing the nodes and weights amounts again to solving the equations $\int_a^b \varphi_i(x)\,dx = \sum_{j=0}^{n-1} w_j\varphi_i(x_j)$ ($0 \le i \le 2n-1$), this time for the $2n$ unknowns $x_0, x_1, \dots, x_{n-1}, w_0, w_1, \dots, w_{n-1}$. These equations are nonlinear in the variables $x_i$. For example, if $\varphi_i(x) = x^i$ ($i \ge 0$) are the monomials on the interval $[-1, 1]$ then the unique nodes which give Gaussian quadrature occur at the $n$ zeros of the $n$th Legendre polynomial on $[-1, 1]$.

In this section we consider the problem of constructing a signal in $V_n(\Omega)$ which approximates $f \in PW_\Omega$ on the interval $[-1/2, 1/2]$ (in an $L^\infty$ or $L^2$ sense) using as data only finitely many samples of $f$ at sampling/quadrature nodes $x_0, x_1, \dots, x_{n-1} \in [-1/2, 1/2]$. Such approximations will be referred to as local sampling approximations. We review the work of Xiao et al. [369] and Beylkin and Monzón [53], introducing several ideas regarding pseudoinverses which are absent, though perhaps implicit, in [369].

The starting point is the construction of appropriate quadratures. Given an approximation parameter $\varepsilon$, we require nodes $\{x_0, x_1, \dots, x_{n-1}\} \subset [-1/2, 1/2]$ and weights $w_0, w_1, \dots, w_{n-1}$ such that, for all $|b| < 1$,
\[
\left| \int_{-1/2}^{1/2} e^{2\pi i\Omega bx}\,dx - \sum_{k=0}^{n-1} w_k e^{2\pi i\Omega bx_k} \right| \le \varepsilon. \tag{3.56}
\]
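The requirement (3.56) can be explored numerically. The following is a minimal sketch, not the construction of [369] or [53]: it assumes classical Gauss–Legendre nodes and weights on $[-1/2, 1/2]$ as a stand-in for PSWF-based quadratures, and the helper names `gauss_legendre` and `quadrature_error` are hypothetical. It evaluates the left-hand side of (3.56) on a grid of values of $b$ and reports the largest deviation.

```python
import cmath
import math


def gauss_legendre(n):
    """Return nodes and weights of the n-point Gauss-Legendre rule on [-1/2, 1/2].

    Assumption: Legendre nodes stand in for the PSWF-based nodes of the text.
    """
    nodes, weights = [], []
    for i in range(n):
        # Classical initial guess for the i-th zero of the Legendre polynomial P_n.
        x = math.cos(math.pi * (i + 0.75) / (n + 0.5))
        for _ in range(100):
            # Three-term recurrence; on exit p0 = P_n(x) and p1 = P_{n-1}(x).
            p0, p1 = 1.0, 0.0
            for k in range(1, n + 1):
                p0, p1 = ((2 * k - 1) * x * p0 - (k - 1) * p1) / k, p0
            dp = n * (x * p0 - p1) / (x * x - 1.0)  # derivative P_n'(x)
            step = p0 / dp
            x -= step  # Newton update
            if abs(step) < 1e-15:
                break
        # Rescale node and weight from [-1, 1] to [-1/2, 1/2].
        nodes.append(0.5 * x)
        weights.append(1.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def quadrature_error(omega, n, num_b=201):
    """Largest value of the left-hand side of (3.56) over a grid of b in [-1, 1]."""
    nodes, weights = gauss_legendre(n)
    worst = 0.0
    for i in range(num_b):
        b = -1.0 + 2.0 * i / (num_b - 1)
        arg = math.pi * omega * b
        # Exact integral of exp(2 pi i omega b x) over [-1/2, 1/2].
        exact = 1.0 if arg == 0.0 else math.sin(arg) / arg
        approx = sum(w * cmath.exp(2j * math.pi * omega * b * x)
                     for x, w in zip(nodes, weights))
        worst = max(worst, abs(exact - approx))
    return worst


if __name__ == "__main__":
    for n in (4, 8, 12, 16, 20):
        print(n, quadrature_error(4.0, n))
```

With $\Omega$ fixed, the printed error should fall rapidly once $n$ exceeds a threshold that grows with $\Omega$. Quadratures adapted to bandlimited functions are designed to reach a given $\varepsilon$ with fewer nodes than this polynomial-based rule.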
The idea is that the left-hand side of (3.56) can be made arbitrarily small by using sufficiently many nodes and weights. Just as quadratures based on polynomials take advantage of the Euclidean algorithm, a corresponding factorization theorem for bandlimited functions serves to establish quadratures based on PSWFs (see [369]), in which case the nodes can be taken to be the zeros of a PSWF, in analogy with the classical case of Gaussian quadrature. Such nodes are not ideal for quadrature [53] but at least provide a convenient and reasonable starting point. We will not dwell here on the problems of finding suitable quadrature weights and nodes. Much more detail pertaining to these issues can be found in [53]. Instead, we show how generalized Gaussian quadrature nodes and weights for the space $V_n(2\Omega)$ can be used in (3.56) to obtain a good approximation.

Theorem 3.3.7. Let $\{x_k\}_{k=0}^{n-1} \subset [-1/2, 1/2]$ and $\{w_k\}_{k=0}^{n-1}$ be generalized Gaussian quadrature nodes and weights, respectively, for the Chebyshev system $\{\psi_j^{(2\Omega)}\}_{j=0}^{2n-1}$ on $[-1/2, 1/2]$. Then there exists a constant $c = c(\Omega)$ such that (3.56) holds for all $|b| < 1$ with $\varepsilon = c(\Omega)\sqrt{\gamma_{2n}^{(2\Omega)}}$.

Proof. Since $\{\tilde\psi_j^{(2\Omega)}\}_{j=0}^{\infty}$ forms an orthonormal basis for $L^2([-1/2, 1/2])$, the complex exponential $e^{2\pi i\Omega bx}$ has the expansion
\[
\begin{aligned}
e^{2\pi i\Omega bx} &= \sum_{j=0}^{\infty} \frac{1}{\gamma_j^{(2\Omega)}} \left( \int_{-1/2}^{1/2} e^{2\pi i\Omega bt}\,\psi_j^{(2\Omega)}(t)\,dt \right) \psi_j^{(2\Omega)}(x) \\
&= \sum_{j=0}^{\infty} \frac{1}{\gamma_j^{(2\Omega)}} \bigl(Q\psi_j^{(2\Omega)}\bigr)^{\vee}(\Omega b)\,\psi_j^{(2\Omega)}(x) \\
&= \sum_{j=0}^{\infty} \beta_j^{(2\Omega)}\,\psi_j^{(2\Omega)}(b/2)\,\psi_j^{(2\Omega)}(x)
\end{aligned} \tag{3.57}
\]
with $\beta_j^{(2\Omega)} = \varepsilon_j \big/ \bigl(i^j\sqrt{2\gamma_j^{(2\Omega)}\Omega}\bigr)$ and $\varepsilon_j = \pm 1$. The expansion is valid for $|x| < 1/2$. In obtaining (3.57) the Fourier transform property (3.51) has been applied. The quadrature integrates exactly the first $2n$ PSWFs $\{\psi_j^{(2\Omega)}\}_{j=0}^{2n-1}$, i.e.,
\[
\int_{-1/2}^{1/2} \psi_j^{(2\Omega)}(x)\,dx = \sum_{k=0}^{n-1} w_k\,\psi_j^{(2\Omega)}(x_k) \tag{3.58}
\]
for $0 \le j \le 2n-1$, and therefore by (3.57) and (3.58)
\[
\int_{-1/2}^{1/2} e^{2\pi i\Omega bx}\,dx - \sum_{k=0}^{n-1} w_k e^{2\pi i\Omega bx_k}
= \sum_{j=2n}^{\infty} \beta_j^{(2\Omega)}\,\psi_j^{(2\Omega)}(b/2) \left( \int_{-1/2}^{1/2} \psi_j^{(2\Omega)}(x)\,dx - \sum_{k=0}^{n-1} w_k\,\psi_j^{(2\Omega)}(x_k) \right).
\]
The PSWFs satisfy the bound
\[
|\psi_j^{(\Omega)}(x)| \le c_\infty(\Omega)\sqrt{\gamma_j^{(\Omega)}} \tag{3.59}
\]
for $|x| < 1/2$, a consequence of the convergence of the PSWFs to the Legendre polynomials (see [53]). The weights of a generalized Gaussian quadrature are positive, and their sum $\sum_{k=0}^{n-1} w_k$ is bounded by a constant independent of $n$ [351]. Therefore,
\[
\left| \int_{-1/2}^{1/2} e^{2\pi i\Omega bx}\,dx - \sum_{k=0}^{n-1} w_k e^{2\pi i\Omega bx_k} \right|
\le \frac{c_\infty^2}{\sqrt{2\Omega}} \sum_{j=2n}^{\infty} \sqrt{\gamma_j^{(2\Omega)}} \left( 1 + \sum_{k=0}^{n-1} w_k \right)
\le c(\Omega)\sqrt{\gamma_{2n}^{(2\Omega)}}.
\]
The last inequality used the estimate $\sum_{j=2n}^{\infty} \sqrt{\gamma_j^{(2\Omega)}} \le (c_1 + c_2\log\Omega)\sqrt{\gamma_{2n}^{(2\Omega)}}$ which is obtained from (3.49). Here $c_1$ and $c_2$ are independent of $\Omega$. This completes the proof.

Equation (3.56), in which integrals of complex exponentials are approximated by appropriate nonharmonic Fourier sums, can be extended to bandlimited functions in the following manner.

Lemma 3.3.8. Let $\{x_k\}_{k=0}^{n-1}$, $\{w_k\}_{k=0}^{n-1}$ be quadrature nodes and weights satisfying (3.56), $h \in PW_\Omega$ and $\mu_n(\xi) = \int_{-1/2}^{1/2} e^{2\pi i\xi x}\,dx - \sum_{k=0}^{n-1} w_k e^{2\pi i\xi x_k}$. Then
\[
\int_{-1/2}^{1/2} h(x)\,dx - \sum_{k=0}^{n-1} w_k h(x_k) = \int_{-\Omega}^{\Omega} \hat h(\xi)\,\mu_n(\xi)\,d\xi.
\]
The proof requires no more than an application of the Fourier inversion theorem and is omitted. Notice that by (3.56), $|\mu_n(\xi)| \le \varepsilon$ for $|\xi| < \Omega$.

Here, and from now on, $\|f\|_2 = \bigl(\int_{-\infty}^{\infty} |f|^2\bigr)^{1/2}$ is the $L^2$-norm of $f$ on the real line. If $f \in PW_\Omega$, then putting $h = f\psi_l \in PW_{2\Omega}$ in Lemma 3.3.8 gives

Corollary 3.3.9. If $f \in PW_\Omega$ then
\[
\left( \sum_{l=0}^{n-1} \left| \sum_{k=0}^{n-1} w_k f(x_k)\psi_l(x_k) \right|^2 \right)^{1/2} \le \left( \int_{-1/2}^{1/2} |f|^2 \right)^{1/2} + 2\Omega\varepsilon\|f\|_2.
\]
Proof. By Lemma 3.3.8 and the Plancherel identity,
\[
\sum_{k=0}^{n-1} w_k f(x_k)\psi_l(x_k) = \int_{-1/2}^{1/2} f(x)\psi_l(x)\,dx - \int_{-\Omega}^{\Omega} \widehat{f\psi_l}(\xi)\,\mu_n(\xi)\,d\xi
= \langle Qf, \psi_l\rangle - \langle f\,(P_{2\Omega}\mu_n)^{\vee}, \psi_l\rangle.
\]
Now $\|(P_{2\Omega}\mu_n)^{\vee}\|_\infty \le \|P_{2\Omega}\mu_n\|_1 \le 2\Omega\varepsilon$ and the collection $\{\psi_l\}_{l=0}^{\infty}$ is a Bessel sequence in $L^2(\mathbb{R})$, so
\[
\begin{aligned}
\left( \sum_{l=0}^{n-1} \left| \sum_{k=0}^{n-1} w_k f(x_k)\psi_l(x_k) \right|^2 \right)^{1/2}
&\le \left( \sum_{l=0}^{n-1} |\langle Qf, \psi_l\rangle|^2 \right)^{1/2} + \left( \sum_{l=0}^{n-1} |\langle f\,(P_{2\Omega}\mu_n)^{\vee}, \psi_l\rangle|^2 \right)^{1/2} \\
&\le \|Qf\|_2 + \|f\,(P_{2\Omega}\mu_n)^{\vee}\|_2 \le \|Qf\|_2 + 2\Omega\varepsilon\|f\|_2.
\end{aligned}
\]
The proof is complete.

Choose nodes $\{x_k\}_{k=0}^{n-1} \subset [-1/2, 1/2]$ and define a matrix $A \in M_n(\mathbb{R})$ by $A_{jk} = \psi_k(x_j)$ ($0 \le j, k \le n-1$). As we have already seen, $A$ is invertible since $\{\psi_0, \psi_1, \dots, \psi_{n-1}\}$ is a Chebyshev system on $[-1/2, 1/2]$. To be useful in numerical algorithms it is important that $A$ be well-conditioned, in particular that $\|A^{-1}\|$ be not too large. For this reason we fix $m < n$ and consider truncations $B \in M_{nm}(\mathbb{R})$ of $A$ defined by $B_{jk} = A_{jk}$ ($0 \le j \le n-1$, $0 \le k \le m-1$). Since it is not square, $B$ cannot be invertible, but its columns are columns of the invertible matrix $A$, so it has full rank and therefore has a pseudoinverse. Indeed, $B^*B \in M_m(\mathbb{R})$ is invertible: if $B^*Bv = 0$ for some $v \ne 0$, then $\|Bv\|^2 = \langle B^*Bv, v\rangle = 0$, contradicting the full rank of $B$. The pseudoinverse of $B$ is the matrix $B^+ = (B^*B)^{-1}B^* \in M_{mn}(\mathbb{R})$. Although $Bv = w$ may not have a solution for arbitrary $w \in \mathbb{C}^n$, $v = B^+w$ is the minimizer of $\|Bv - w\|$.

The critical result that gives the well-conditionedness of the matrices we use in the sampling theory is the following strengthening of a result which appears in [369].

Lemma 3.3.10. Let $\{x_k\}_{k=0}^{n-1}$ and $\{w_k\}_{k=0}^{n-1}$ be quadrature nodes and weights for which (3.56) holds. Let $W = \operatorname{diag}(w_0, w_1, \dots, w_{n-1})$ and let $\tilde B \in M_{nm}(\mathbb{R})$ have $(j, k)$-th entry $\tilde B_{jk} = \tilde\psi_k(x_j) = \gamma_k^{-1/2}B_{jk}$ ($m < n$). If $E = I_m - \tilde B^* W\tilde B$, then $\|E\| \le \Omega\varepsilon/\gamma_m$.
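Proof. By (3.51) and an application of (3.56), we have (with $\varepsilon_j = \pm 1$)
\[
\begin{aligned}
(\tilde B^* W\tilde B)_{jk} &= \frac{1}{\sqrt{\gamma_j\gamma_k}} \sum_{l=0}^{n-1} \psi_j(x_l)\,w_l\,\psi_k(x_l) \\
&= \frac{\Omega\varepsilon_j\varepsilon_k i^{k-j}}{\gamma_j\gamma_k} \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} \psi_j(t)\psi_k(s) \sum_{l=0}^{n-1} w_l e^{2\pi i\Omega x_l(s-t)}\,ds\,dt = I - II
\end{aligned}
\]
where
\[
I = \frac{\Omega\varepsilon_j\varepsilon_k i^{k-j}}{\gamma_j\gamma_k} \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} \psi_j(t)\psi_k(s)\,e^{2\pi i\Omega x(s-t)}\,dx\,ds\,dt
\]
and
\[
II = \frac{\Omega\varepsilon_j\varepsilon_k i^{k-j}}{\gamma_j\gamma_k} \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} \psi_j(t)\psi_k(s)\,f_\varepsilon(s-t)\,ds\,dt
\]
where $|f_\varepsilon(s)| \le \varepsilon$. With a second application of (3.51) we find
\[
I = \frac{\Omega\varepsilon_j\varepsilon_k i^{k-j}}{\gamma_j\gamma_k} \int_{-1/2}^{1/2} \overline{(Q\psi_j)^{\vee}(\Omega x)}\,(Q\psi_k)^{\vee}(\Omega x)\,dx
= \frac{1}{\sqrt{\gamma_j\gamma_k}} \int_{-1/2}^{1/2} \psi_j(x)\psi_k(x)\,dx = \delta_{jk}.
\]
Hence $E_{jk} = II$. To compute the operator norm of $E$, consider the bilinear form $\langle Eu, v\rangle$ with $u, v \in \mathbb{C}^m$:
\[
\langle Eu, v\rangle = \sum_{j=0}^{m-1}\sum_{k=0}^{m-1} E_{jk}u_k\bar v_j = \Omega \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} F_u(s)\,\overline{F_v(t)}\,f_\varepsilon(s-t)\,ds\,dt
\]
where $F_u(s) = \sum_{k=0}^{m-1} \varepsilon_k i^k u_k\psi_k(s)/\gamma_k$ and $F_v(t)$ is defined similarly. By the orthogonality of the PSWFs on $[-1/2, 1/2]$,
\[
\begin{aligned}
|\langle Eu, v\rangle|^2 &\le \Omega^2 \left( \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} |F_u(s)|\,|F_v(t)|\,|f_\varepsilon(s-t)|\,ds\,dt \right)^2 \\
&\le \Omega^2\varepsilon^2 \left( \int_{-1/2}^{1/2} |F_u(s)|^2\,ds \right) \left( \int_{-1/2}^{1/2} |F_v(t)|^2\,dt \right) \\
&\le \Omega^2\varepsilon^2 \left( \sum_{k=0}^{m-1} \frac{|u_k|^2}{\gamma_k} \right) \left( \sum_{l=0}^{m-1} \frac{|v_l|^2}{\gamma_l} \right)
\le \frac{\Omega^2\varepsilon^2}{\gamma_{m-1}^2}\,\|u\|^2\|v\|^2
\end{aligned}
\]
and we conclude that $\|E\| \le \Omega\varepsilon/\gamma_{m-1}$. Since the eigenvalues $\gamma_j$ decrease with $j$, this also gives the stated bound $\|E\| \le \Omega\varepsilon/\gamma_m$. This concludes the proof.

The lemma shows that for highly accurate quadratures (i.e., those for which (3.56) is satisfied with a small value of $\varepsilon$) $\tilde B^* W\tilde B = I_m - E$ is invertible and its inverse may be computed by a simple iteration (for instance the Neumann series $\sum_{k \ge 0} E^k$) which will converge quickly for small $\varepsilon$.

There are several possible projections from $PW_\Omega$ to $V_m(\Omega)$ that are worth considering. The first is the orthogonal projection
\[
P_m = P_m^{\Omega} : f \in PW_\Omega \mapsto \sum_{j=0}^{m-1} \langle f, \psi_j\rangle\psi_j.
\]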
Although $P_m$ will be important in computing suitable approximations, we need projections which use sampled data only. At this stage we insist that the weights $w_k$ be strictly positive (as they are when associated with a generalized Gaussian quadrature). In this situation, since $B$ has full rank and $W = \operatorname{diag}(w_0, w_1, \dots, w_{n-1})$ is invertible on $\mathbb{C}^n$, the matrix $B^*WB$ is also invertible and we define the (weighted) pseudoinverse $B_w^+$ of $B$ by $B_w^+ = (B^*WB)^{-1}B^*W$ and a projection $S_m : PW_\Omega \to V_m(\Omega)$ by
\[
S_m f(x) = \sum_{j=0}^{m-1} (B_w^+\mathbf{f})_j\,\psi_j(x) = \sum_{j=0}^{m-1}\sum_{k=0}^{n-1} (B_w^+)_{jk}\,f(x_k)\,\psi_j(x).
\]
Here $\mathbf{f} \in \mathbb{C}^n$ is the vector of samples with $k$th entry $\mathbf{f}_k = f(x_k)$ ($0 \le k \le n-1$). To see that $S_m$ is a projection, suppose $f = \sum_{j=0}^{m-1} c_j\psi_j \in V_m(\Omega)$ so that $f(x_k) = \sum_{j=0}^{m-1} c_j\psi_j(x_k) = (Bc)_k$. Then $B_w^+B = I_m$ and
\[
S_m f(x) = \sum_{j=0}^{m-1}\sum_{k=0}^{n-1} (B_w^+)_{jk}\,(Bc)_k\,\psi_j(x) = \sum_{j=0}^{m-1} c_j\psi_j(x) = f(x).
\]
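The reproduction property $B_w^+B = I_m$ is easy to check numerically. The sketch below is illustrative only: it assumes monomials $1, x, x^2$ in place of the PSWFs $\psi_j$, equispaced nodes, and equal positive weights, none of which come from the text, and the helper names `solve` and `weighted_pinv_apply` are hypothetical. It applies $B_w^+ = (B^*WB)^{-1}B^*W$ to samples of a function in the span and recovers its coefficients.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def weighted_pinv_apply(B, w, f):
    """Return (B^T W B)^{-1} B^T W f for a real n x m matrix B and positive weights w."""
    n, m = len(B), len(B[0])
    gram = [[sum(w[i] * B[i][j] * B[i][k] for i in range(n)) for k in range(m)]
            for j in range(m)]
    rhs = [sum(w[i] * B[i][j] * f[i] for i in range(n)) for j in range(m)]
    return solve(gram, rhs)


if __name__ == "__main__":
    n_nodes, m_funcs = 8, 3
    # Assumed stand-ins: equispaced nodes on [-1/2, 1/2], equal weights, monomial basis.
    xs = [-0.5 + k / (n_nodes - 1) for k in range(n_nodes)]
    ws = [1.0 / n_nodes] * n_nodes
    B = [[x ** j for j in range(m_funcs)] for x in xs]
    coeffs = [2.0, -1.0, 3.0]
    samples = [sum(c * row[j] for j, c in enumerate(coeffs)) for row in B]
    print(weighted_pinv_apply(B, ws, samples))  # approximately the entries of coeffs
```

For samples of a function outside the span, the same routine returns the coefficients of the weighted least-squares fit, which is the sense in which $S_m$ acts on general $f \in PW_\Omega$.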
We wish to show that if $f \in PW_\Omega$, then $S_m f$ is a good approximation for $f$ in $L^2([-1/2, 1/2])$. The first step is to show that $S_m$ is bounded from $PW_\Omega$ to $L^2([-1/2, 1/2])$.

Proposition 3.3.11. Let $f \in PW_\Omega$ and let $\{x_k\}_{k=0}^{n-1} \subset [-1/2, 1/2]$, $\{w_k\}_{k=0}^{n-1}$ be quadrature nodes in $[-1/2, 1/2]$ and positive weights, respectively, so that (3.56) holds with $\varepsilon < \gamma_m/\Omega$. Then
\[
\left( \int_{-1/2}^{1/2} |S_m f(x)|^2\,dx \right)^{1/2} \le \frac{\gamma_m}{\gamma_m - \Omega\varepsilon} \left( \|Qf\|_2 + \frac{2\varepsilon\Omega}{\sqrt{\gamma_m}}\|f\|_2 \right).
\]
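Proof. First note that if $\Gamma = \operatorname{diag}(\gamma_0, \gamma_1, \dots, \gamma_{m-1}) \in M_m(\mathbb{R})$, the orthogonality of $\{\psi_j\}_{j=0}^{\infty}$ on $[-1/2, 1/2]$ yields
\[
\int_{-1/2}^{1/2} |S_m f(x)|^2\,dx = \int_{-1/2}^{1/2} \left| \sum_{j=0}^{m-1} (B_w^+\mathbf{f})_j\,\psi_j(x) \right|^2 dx
= \sum_{j=0}^{m-1} \gamma_j\,|(B_w^+\mathbf{f})_j|^2 = \|\Gamma^{1/2}B_w^+\mathbf{f}\|^2. \tag{3.60}
\]
Since $E = I_m - \tilde B^* W\tilde B$, we have $E\Gamma^{1/2}B_w^+ = \Gamma^{1/2}B_w^+ - \Gamma^{-1/2}B^*W$. Therefore, by Lemma 3.3.10,
\[
\|\Gamma^{1/2}B_w^+\mathbf{f}\| \le \|E\Gamma^{1/2}B_w^+\mathbf{f}\| + \|\Gamma^{-1/2}B^*W\mathbf{f}\| \le \frac{\Omega\varepsilon}{\gamma_m}\|\Gamma^{1/2}B_w^+\mathbf{f}\| + \|\tilde B^*W\mathbf{f}\|
\]
so that
\[
\|\Gamma^{1/2}B_w^+\mathbf{f}\| \le \frac{\gamma_m}{\gamma_m - \Omega\varepsilon}\,\|\tilde B^*W\mathbf{f}\|. \tag{3.61}
\]
However, an application of Lemma 3.3.8 with $h = f\psi_l$ gives
\[
\begin{aligned}
\|\tilde B^*W\mathbf{f}\| &= \left( \sum_{j=0}^{m-1} \frac{1}{\gamma_j} \left| \sum_{k=0}^{n-1} \psi_j(x_k)\,w_k\,f(x_k) \right|^2 \right)^{1/2} \\
&\le \left( \sum_{j=0}^{m-1} |\langle f, \tilde\psi_j\rangle|^2 \right)^{1/2} + \left( \sum_{j=0}^{m-1} \frac{1}{\gamma_j}\,|\langle f\psi_j, \widehat{P_{2\Omega}\mu_n}\rangle|^2 \right)^{1/2} \\
&\le \|Qf\|_2 + \frac{1}{\sqrt{\gamma_m}}\|f\,\widehat{P_{2\Omega}\mu_n}\|_2 \le \|Qf\|_2 + \frac{2\varepsilon\Omega}{\sqrt{\gamma_m}}\|f\|_2.
\end{aligned} \tag{3.62}
\]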
Combining (3.60), (3.61) and (3.62) gives the result.

Now we require approximation results concerning the orthogonal projection $P_m$ of $PW_\Omega$ onto $V_m(\Omega)$. Let $Q'f = (I - Q)f = f\chi_{\{|x| > 1/2\}}$.

Proposition 3.3.12. Let $f \in PW_\Omega$ and $g = P_m f = \sum_{k=0}^{m-1} \langle f, \psi_k\rangle\psi_k$. Then
\[
\|Q(f - g)\|_2 \le \sqrt{\gamma_m}\,\|f\|_2 \tag{3.63}
\]
and
\[
\|f - g\|_2 \le \gamma_m\|f\|_2 + \|Q'f\|_2. \tag{3.64}
\]
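Proof. By the orthonormality of $\{\tilde\psi_k\}$ on $[-1/2, 1/2]$,
\[
\|Q(f - g)\|_2^2 = \int \left| \sum_{j=m}^{\infty} \langle f, \psi_j\rangle Q\psi_j \right|^2 = \sum_{j=m}^{\infty} |\langle f, \psi_j\rangle|^2\gamma_j \le \gamma_m \sum_{j=0}^{\infty} |\langle f, \psi_j\rangle|^2 = \gamma_m\|f\|_2^2
\]
which gives (3.63). Also, by the orthonormality of $\{\psi_j\}_{j=0}^{\infty}$ on the real line,
\[
\|f - g\|_2 = \left( \sum_{j=m}^{\infty} |\langle Qf, \psi_j\rangle + \langle Q'f, \psi_j\rangle|^2 \right)^{1/2}
= \left( \sum_{j=m}^{\infty} |\gamma_j\langle f, \psi_j\rangle + \langle Q'f, \psi_j\rangle|^2 \right)^{1/2}
\le \gamma_m\|f\|_2 + \|Q'f\|_2
\]
which completes the proof of (3.64).

Finally we are able to estimate the difference between $f \in PW_\Omega$ and its sampling projection $S_m f \in V_m(\Omega)$.

Theorem 3.3.13. Let $f \in PW_\Omega$ and let $\varepsilon$ be as in (3.56) with $\Omega\varepsilon < \gamma_m/2$. Then
\[
\|Q(S_m f - f)\|_2 \le \frac{\sqrt{\gamma_m}}{\gamma_m - \varepsilon\Omega} \Bigl[ \bigl(2\gamma_m(1 + \varepsilon\Omega) - \varepsilon\Omega\bigr)\|f\|_2 + 2\varepsilon\Omega\|Q'f\|_2 \Bigr].
\]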
Proof. Since $g = P_m f \in V_m(\Omega)$, $S_m g = g$ so that $Q(S_m f - f) = QS_m(f - g) - Q(f - g)$ and, with an application of Propositions 3.3.11 and 3.3.12,
\[
\begin{aligned}
\|Q(S_m f - f)\|_2 &\le \|QS_m(f - g)\|_2 + \|Q(f - g)\|_2 \\
&\le \frac{\gamma_m}{\gamma_m - \Omega\varepsilon} \left( \|Q(f - g)\|_2 + \frac{2\varepsilon\Omega}{\sqrt{\gamma_m}}\|f - g\|_2 \right) + \sqrt{\gamma_m}\,\|f\|_2 \\
&\le \frac{\gamma_m}{\gamma_m - \Omega\varepsilon} \left( \sqrt{\gamma_m}\,\|f\|_2 + \frac{2\Omega\varepsilon}{\sqrt{\gamma_m}}\bigl(\gamma_m\|f\|_2 + \|Q'f\|_2\bigr) \right) + \sqrt{\gamma_m}\,\|f\|_2
\end{aligned}
\]
which gives the result.

Note that if $\Omega\varepsilon < \gamma_m/2$, then since $\|Q'f\|_2 < \|f\|_2$ we immediately have $\|Q(S_m f - f)\|_2 < 7\sqrt{\gamma_m}\,\|f\|_2$.

Beylkin and Monzón [53] take a different approach to localized sampling approximations in which the spaces $V_m(\Omega)$ and the matrix algebra of [369] play no (explicit) role. As in [369], Beylkin and Monzón start with a quadrature scheme with nodes $\{x_k\}_{k=0}^{n-1}$ and weights $\{w_k\}_{k=0}^{n-1}$ for which (3.56) holds. They then show that complex exponentials of the form $e^{2\pi i\Omega bt}$, $|b| < 1/2$, may be uniformly well approximated by a linear combination of the functions $\{e^{2\pi i\Omega x_l t}\}_{l=0}^{n-1}$.

Proposition 3.3.14. Let $|t| < 1/2$, $|b| < 1/2$, $\Omega, \varepsilon > 0$ and let $\{x_k\}_{k=0}^{n-1} \subset [-1/2, 1/2]$, $\{w_k\}_{k=0}^{n-1}$ be a collection of quadrature nodes and weights for which (3.56) holds. Then there exist coefficients $\alpha_l = \alpha_l(b)$ ($0 \le l \le n-1$) and a constant $c(\Omega)$ such that
\[
\sup_{|t| < 1/2} \left| e^{2\pi i\Omega bt} - \sum_{l=0}^{n-1} \alpha_l e^{2\pi i\Omega x_l t} \right| \le c(\Omega) \left( \sqrt{\gamma_n} + \frac{\varepsilon}{\sqrt{\gamma_n}} \right).
\]
Proof. As in [369], the exponential $e^{2\pi i\Omega bt}$ is expanded into a series of PSWFs, this time with bandlimit $\Omega$. In fact, $e^{2\pi i\Omega bt} = \sum_{j=0}^{\infty} \beta_j\psi_j(b)\psi_j(t)$ for $|t| < 1/2$ with $\beta_j = \beta_j^{(\Omega)} = \varepsilon_j \big/ \bigl(i^j\sqrt{\gamma_j\Omega}\bigr)$ ($\varepsilon_j = \pm 1$, cf. (3.57)), so that
\[
\left| e^{2\pi i\Omega bt} - \sum_{j=0}^{n-1} \beta_j\psi_j(b)\psi_j(t) \right| = \left| \sum_{j=n}^{\infty} \beta_j\psi_j(b)\psi_j(t) \right| \le c(\Omega)\sqrt{\gamma_n}. \tag{3.65}
\]
We have used the inequality (3.59) and the estimate $\sum_{j=n}^{\infty} \sqrt{\gamma_j} \le (c_1 + c_2\log\Omega)\sqrt{\gamma_n}$ (with $c_1, c_2$ independent of $n, \Omega$) which may be obtained from (3.49). Then (3.51) may be rewritten in the form $\widehat{Q\psi_j}(\Omega t) = \gamma_j\bar\beta_j\psi_j(t)$ so that with the eigenfunction property $P_\Omega Q\psi_j = \gamma_j\psi_j$, an application of (3.56) gives
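\[
\begin{aligned}
&\left| \sum_{l=0}^{n-1} w_l e^{2\pi i\Omega tx_l}\,\bar\beta_j\psi_j(x_l) - \frac{1}{\Omega}\psi_j(t) \right| \\
&\quad = \left| \frac{1}{\gamma_j} \sum_{l=0}^{n-1} w_l e^{2\pi i\Omega tx_l} \int_{-1/2}^{1/2} \psi_j(s)e^{-2\pi i\Omega x_l s}\,ds - \frac{1}{\Omega}\psi_j(t) \right| \\
&\quad \le \frac{1}{\gamma_j} \left| \int_{-1/2}^{1/2} \psi_j(s) \left( \sum_{l=0}^{n-1} w_l e^{2\pi i\Omega(t-s)x_l} - \int_{-1/2}^{1/2} e^{2\pi i\Omega(t-s)u}\,du \right) ds \right| \\
&\qquad + \frac{1}{\gamma_j} \left| \int_{-1/2}^{1/2} \psi_j(s) \int_{-1/2}^{1/2} e^{2\pi i\Omega(t-s)u}\,du\,ds - \frac{\gamma_j}{\Omega}\psi_j(t) \right| \\
&\quad \le \frac{\varepsilon}{\gamma_j} \int_{-1/2}^{1/2} |\psi_j(s)|\,ds + \frac{1}{\gamma_j\Omega} \left| \int_{-\Omega/2}^{\Omega/2} \widehat{Q\psi_j}(\xi)e^{2\pi i\xi t}\,d\xi - \gamma_j\psi_j(t) \right| \\
&\quad \le \frac{\varepsilon}{\sqrt{\gamma_j}} + \frac{1}{\gamma_j\Omega}\,|P_\Omega Q\psi_j(t) - \gamma_j\psi_j(t)| = \frac{\varepsilon}{\sqrt{\gamma_j}}.
\end{aligned} \tag{3.66}
\]
Combining (3.59), (3.65) and (3.66) gives
\[
\begin{aligned}
&\left| e^{2\pi i\Omega bt} - \sum_{l=0}^{n-1} e^{2\pi i\Omega tx_l} \sum_{j=0}^{n-1} \frac{w_l}{\gamma_j}\psi_j(b)\psi_j(x_l) \right| \\
&\quad \le \left| e^{2\pi i\Omega bt} - \sum_{j=0}^{n-1} \beta_j\psi_j(b)\psi_j(t) \right|
+ \left| \sum_{j=0}^{n-1} \beta_j\psi_j(b)\psi_j(t) - \sum_{l=0}^{n-1}\sum_{j=0}^{n-1} \frac{w_l}{\gamma_j} e^{2\pi i\Omega tx_l}\psi_j(b)\psi_j(x_l) \right| \\
&\quad \le c(\Omega)\sqrt{\gamma_n} + \Omega \sum_{j=0}^{n-1} |\beta_j|\,|\psi_j(b)| \left| \frac{1}{\Omega}\psi_j(t) - \bar\beta_j \sum_{l=0}^{n-1} w_l e^{2\pi i\Omega tx_l}\psi_j(x_l) \right| \\
&\quad \le c(\Omega)\sqrt{\gamma_n} + \Omega \sum_{j=0}^{n-1} |\beta_j|\,|\psi_j(b)|\,\frac{\varepsilon}{\sqrt{\gamma_j}} \\
&\quad \le c(\Omega)\sqrt{\gamma_n} + c(\Omega)\frac{\varepsilon}{\sqrt{\gamma_n}}
\end{aligned} \tag{3.67}
\]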
where in the last step we have applied the estimate $\sum_{j=0}^{n-1} 1/\sqrt{\gamma_j} \le (c_1 + c_2\log\Omega)/\sqrt{\gamma_n}$ (with $c_1, c_2$ independent of $n, \Omega$) which is obtained from (3.49). The proof is complete.

Proposition 3.3.14 now implies the following approximate sampling formula for signals in $PW_\Omega$.

Theorem 3.3.15. Let $\{x_l\}_{l=0}^{n-1} \subset [-1/2, 1/2]$, $\{w_l\}_{l=0}^{n-1}$ be nodes and weights for which (3.56) holds. Let $\Psi_l \in V_n(\Omega)$ ($0 \le l \le n-1$) be given by
\[
\Psi_l(t) = w_l \sum_{j=0}^{n-1} \frac{1}{\gamma_j}\psi_j(x_l)\psi_j(t).
\]
Then for all $f \in PW_\Omega$ and all $t \in [-1/2, 1/2]$
\[
\left| f(t) - \sum_{l=0}^{n-1} f(x_l)\Psi_l(t) \right| \le c(\Omega) \left( \sqrt{\gamma_n} + \frac{\varepsilon}{\sqrt{\gamma_n}} \right) \|f\|_2.
\]
Proof. Equation (3.67) may be written in the form
\[
\left| e^{2\pi it\xi} - \sum_{l=0}^{n-1} e^{2\pi ix_l\xi}\,\Psi_l(t) \right| \le c(\Omega) \left( \sqrt{\gamma_n} + \frac{\varepsilon}{\sqrt{\gamma_n}} \right). \tag{3.68}
\]
The Fourier inversion theorem and the Cauchy–Schwarz inequality yield
\[
\begin{aligned}
\left| f(t) - \sum_{l=0}^{n-1} f(x_l)\Psi_l(t) \right| &= \left| \int_{-\Omega/2}^{\Omega/2} \hat f(\xi) \left( e^{2\pi it\xi} - \sum_{l=0}^{n-1} e^{2\pi ix_l\xi}\,\Psi_l(t) \right) d\xi \right| \\
&\le \|f\|_2 \left( \int_{-\Omega/2}^{\Omega/2} \left| e^{2\pi it\xi} - \sum_{l=0}^{n-1} e^{2\pi ix_l\xi}\,\Psi_l(t) \right|^2 d\xi \right)^{1/2}.
\end{aligned}
\]
Applying (3.68) now gives the result.

It is appropriate at this stage to make some comparisons and distinctions between the approaches and results of [53] and [369]. The first thing to observe is that Theorem 3.3.13 requires positive weights in an $n$-point quadrature satisfying (3.56). A further requirement of the quadrature is that the order $\varepsilon$ of the approximation should satisfy $\varepsilon < \sqrt{\gamma_m}$ with $m < n$. The sampling functions are cardinal in the sense that $S_l(x_k) = \delta_{lk}$. By contrast, in Theorem 3.3.15 the sampling functions are not cardinal and approximations are made in $V_n$ where $n$ is the number of nodes in the quadrature. The quadrature approximation $\varepsilon$ should be smaller than $\sqrt{\gamma_n}$, a stronger restriction than that of [369]. There are also similarities between the approaches of [53, 369] on the one hand, where bandlimited functions are locally approximated by a series of PSWFs, and [165, 167] on the other, where bandlimited functions are locally approximated by trigonometric polynomials.
3.4 Sampling in phase space: the short-time Fourier transform

The Fourier transform $f \mapsto \hat f(\xi) = \int_{-\infty}^{\infty} f(t)e^{-2\pi it\xi}\,dt$ is a first port of call when we want to extract frequency information from signals. The reconstruction of $f$ from its Fourier transform, $f(t) = \int_{-\infty}^{\infty} \hat f(\xi)e^{2\pi it\xi}\,d\xi$, is a reconstruction of $f$ from its frequency content. The spectrum $|\hat f(\xi)|$ gives an indication of which frequencies are important in the make-up of $f$.
Consider, however, a situation in which we want to determine the score of a piece of music from the audio signal we record at a concert. Certainly from the spectrum we can see which notes were played, but a musical score is much more than just a list of the notes to be played. It also tells the musician when they are to be played and, of course, how long each note is to be held. We therefore need to extract not just frequency information, but time-localized frequency information. We should be aiming to produce a “score” function $S$ of two variables (time $t$ and frequency $\xi$) so that if $S(t_0, \xi_0)$ is small, then no note of frequency $\xi_0$ (or near $\xi_0$) is being played at time $t_0$ (or near $t_0$), while a large value of $S(t_0, \xi_0)$ will indicate that a note of frequency $\xi_0$ is being played fortissimo at time $t_0$.

One way to generate such a function is to “window” the Fourier transform in the following way. Let $g \in L^2(\mathbb{R})$ (the window function) have unit $L^2$-norm. Then the short-time Fourier transform (or windowed Fourier transform) of $f$ with respect to the window $g$ is the function $S(f, g)$ on $\mathbb{R}^2$ defined by
\[
S(f, g)(t, \xi) = \int f(s)\,\bar g(s - t)\,e^{-2\pi is\xi}\,ds = \bigl(f\,\bar g(\cdot - t)\bigr)^{\wedge}(\xi) = \langle f, g_{t,\xi}\rangle \tag{3.69}
\]
where $g_{t,\xi}(s) = e^{2\pi is\xi}g(s - t)$. The functions $g_{t,\xi}$ are thought of as time–frequency translates of $g$. With $T_t$ and $M_\xi$, the translation and modulation operators, respectively, acting via $T_t f(s) = f(s - t)$ and $M_\xi f(s) = e^{2\pi is\xi}f(s)$, we have $g_{t,\xi} = M_\xi T_t g$. If $g$ satisfies $\Delta(g) = \int t^2|g(t)|^2\,dt < \infty$ and $\int t|g(t)|^2\,dt = 0$, then we interpret $g$ as being centered at 0 with finite variance. The function $\bar g(\cdot - t)$ is then centered at $t$ and $S(f, g)(t, \xi)$ gives frequency information about $f$ near time $t$. The degree of localization is interpreted to be the variance $\Delta(g)$ of the window. On the other hand, with an application of the Parseval relation to (3.69) we have
\[
S(f, g)(t, \xi) = \int \hat f(\eta)\,e^{2\pi it(\eta - \xi)}\,\overline{\hat g}(\eta - \xi)\,d\eta. \tag{3.70}
\]
If we further assume that $\Delta(\hat g) = \int \xi^2|\hat g(\xi)|^2\,d\xi < \infty$ and $\int \xi|\hat g(\xi)|^2\,d\xi = 0$ so that $\hat g$ is centered at 0, then from (3.70) we see that $S(f, g)(t, \xi)$ represents time information about $f$ near frequency $\xi$. Hence $S(f, g)(t, \xi)$ gives time–frequency localized information near the point $(t, \xi)$ in time–frequency space (phase space). This information is represented in phase space by the Heisenberg box $[t - \Delta(g)/2, t + \Delta(g)/2] \times [\xi - \Delta(\hat g)/2, \xi + \Delta(\hat g)/2]$. Note that the shape of the box is independent of $t$ and $\xi$. Furthermore, by the Heisenberg uncertainty inequality (Theorem 5.1), the area $\Delta(g)\Delta(\hat g)$ of the box is bounded below by $1/(16\pi^2)$. For purposes of good resolution, we would like the information we glean from $S(f, g)$ to be as well time–frequency localized as possible. This is achieved, in our heuristic, by minimizing the area of the Heisenberg box, which, again by Theorem 5.1, is achieved by choosing $g$ to be a shifted, modulated multiple of a Gaussian.
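The localization described above can be seen in a small numerical experiment. The sketch below is only an illustration: it assumes the Gaussian window $g(s) = 2^{1/4}e^{-\pi s^2}$, approximates the integral in (3.69) by a Riemann sum on a truncated interval, and uses hypothetical names (`gaussian`, `stft`, `signal`, `T0`, `XI0`). The test signal is the time–frequency shift $g_{t_0,\xi_0}$ of the window itself, so $|S(f, g)|$ should be largest at $(t_0, \xi_0)$ and decay away from that point.

```python
import cmath
import math


def gaussian(s):
    """Gaussian window 2^{1/4} exp(-pi s^2), which has unit L^2 norm."""
    return 2 ** 0.25 * math.exp(-math.pi * s * s)


def stft(f, g, t, xi, lo=-8.0, hi=8.0, step=0.01):
    """Riemann-sum approximation of S(f, g)(t, xi) as defined in (3.69).

    The truncation interval [lo, hi] and the step size are assumptions that
    suit rapidly decaying f and g centered near the origin.
    """
    count = int(round((hi - lo) / step))
    total = 0j
    for k in range(count + 1):
        s = lo + k * step
        total += f(s) * complex(g(s - t)).conjugate() * cmath.exp(-2j * math.pi * s * xi)
    return total * step


T0, XI0 = 1.0, 3.0  # center of the test signal in phase space


def signal(s):
    """Time-frequency shift of the window, centered at time T0 and frequency XI0."""
    return cmath.exp(2j * math.pi * XI0 * s) * gaussian(s - T0)


if __name__ == "__main__":
    for dt, dxi in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, -3.0)]:
        value = stft(signal, gaussian, T0 + dt, XI0 + dxi)
        print(dt, dxi, abs(value))
```

For this window the magnitude is known in closed form, $|S(g_{t_0,\xi_0}, g)(t, \xi)| = e^{-\pi((t - t_0)^2 + (\xi - \xi_0)^2)/2}$, which gives a convenient check on the discretization.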
The fixed shape of the Heisenberg box causes problems when using this transform for signal analysis. Phenomena of duration shorter than the time window will be poorly resolved: two notes of short duration played close together in time will not be distinguishable. Of course, this may be remedied by shortening the time window, which would make sense if we have some prior knowledge of the signal, for example, the shortest duration of a note in the score. But shortening the window in time has the undesirable effect of lengthening the frequency window (remembering that the area of the box is bounded below) and this gives poor frequency resolution, so it will be difficult to resolve two notes whose frequencies are close together.

Information encoded in $S(f, g)$ is extremely redundant. Unlike the Fourier transform, the output $S(f, g)(t, \xi)$ is a continuous function of $(t, \xi)$ no matter the input $f \in L^2$ or window $g \in L^2$. It is not surprising then that the short-time Fourier transform can be sampled without loss of information.

3.4.1 Regular Gabor frames

Mimicking again the musical score, we discretize the transform by dividing the time axis into equally spaced intervals and similarly in frequency. Given $a, b > 0$ we compute only the values $S(f, g)(ma, nb)$ ($m, n \in \mathbb{Z}$), i.e., we sample the short-time Fourier transform on the lattice $\Lambda = a\mathbb{Z} \times b\mathbb{Z}$. The determination of appropriate values of $a$ and $b$ is an important question and will be taken up shortly, but the heuristic, at least for the moment, is that each point $(t, \xi)$ in phase space should be covered by at least one of the Heisenberg boxes $B_{mn}(g) = [-\Delta(g)/2, \Delta(g)/2] \times [-\Delta(\hat g)/2, \Delta(\hat g)/2] + (ma, nb)$. If $a$ or $b$ is too large, phase space will not be covered by the boxes $B_{mn}(g)$ and we cannot expect the Gabor system $G(g, a, b) = \{g_{ma,nb}\}_{m,n=-\infty}^{\infty}$ to be complete. On the other hand, if $a$ and $b$ are both small, there will be redundancy in the coefficients $S(f, g)(ma, nb)$.
The short-time Fourier transform as defined by equation (3.69) is a multiple of a unitary operator from $L^2(\mathbb{R})$ into $L^2(\mathbb{R}^2)$, i.e.,
\[
\begin{aligned}
\langle S(f, g), S(h, g)\rangle_{L^2(\mathbb{R}^2)} &= \iint \bigl(f(\cdot)\bar g(\cdot - t)\bigr)^{\wedge}(\xi)\,\overline{\bigl(h(\cdot)\bar g(\cdot - t)\bigr)^{\wedge}(\xi)}\,d\xi\,dt \\
&= \iint f(s)\,|g(s - t)|^2\,\bar h(s)\,ds\,dt \\
&= \int f(s)\bar h(s) \int |g(s - t)|^2\,dt\,ds = \|g\|_2^2\,\langle f, h\rangle_{L^2(\mathbb{R})}.
\end{aligned}
\]
As a consequence, reconstruction of $f$ from its short-time Fourier transform is achieved by
\[
f(s) = \iint S(f, g)(t, \xi)\,g_{t,\xi}(s)\,dt\,d\xi,
\]
at least in the weak sense. Norm convergence and pointwise convergence theorems are also available with appropriate summability methods [183].
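A finite-dimensional analogue makes the energy identity concrete. The sketch below is an assumption-laden illustration rather than the setting of the text: signals are vectors on the cyclic group $\mathbb{Z}_N$, translation is cyclic, and the transform is a discrete short-time Fourier transform, for which the corresponding identity reads $\sum_{t,x} |S[t][x]|^2 = N\,\|f\|^2\|g\|^2$. The names `discrete_stft` and `energy` and the test vectors are hypothetical.

```python
import cmath
import math


def discrete_stft(f, g):
    """Discrete STFT on Z_N: S[t][x] = sum_s f[s] conj(g[(s - t) mod N]) e^{-2 pi i s x / N}."""
    size = len(f)
    return [[sum(f[s] * g[(s - t) % size].conjugate()
                 * cmath.exp(-2j * math.pi * s * x / size) for s in range(size))
             for x in range(size)]
            for t in range(size)]


def energy(values):
    """Sum of squared moduli of a sequence of complex numbers."""
    return sum(abs(v) ** 2 for v in values)


N = 16
# Arbitrary deterministic test signal and a sampled Gaussian-like window.
f = [complex(math.sin(0.7 * s), math.cos(1.3 * s)) for s in range(N)]
g = [complex(math.exp(-((s - N / 2) / 3.0) ** 2)) for s in range(N)]

S = discrete_stft(f, g)
lhs = sum(energy(row) for row in S)  # total energy of the transform
rhs = N * energy(f) * energy(g)      # N * ||f||^2 * ||g||^2

if __name__ == "__main__":
    print(lhs, rhs)
```

The factor $N$ comes from the unnormalized discrete Fourier transform. It plays the role that the normalization of the Fourier transform plays in the continuous identity.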
We might hope then that the discretized operator, mapping $f \in L^2(\mathbb{R})$ to the sequence $c_{mn} = S(f, g)(ma, nb)$ ($m, n \in \mathbb{Z}$) obtained by sampling the short-time Fourier transform on the lattice $a\mathbb{Z} \times b\mathbb{Z}$, remains unitary. In this case the Gabor system $G(g, a, b)$ is an orthonormal basis for $L^2(\mathbb{R})$ and $f$ may be recovered from its sampled short-time Fourier transform via
\[
f(t) = \sum_{m,n} S(f, g)(ma, nb)\,g_{ma,nb}(t).
\]
If we insist only that $G(g, a, b)$ form a frame for $L^2(\mathbb{R})$, then we have
\[
A\|f\|_2^2 \le \sum_{m,n} |S(f, g)(ma, nb)|^2 \le B\|f\|_2^2
\]
and reconstructions of $f$ from the coefficients $S(f, g)(ma, nb)$ may be computed using the frame reconstruction algorithms of Section 3.1. There is, however, a delicate trade-off between the density of the lattices $a\mathbb{Z} \times b\mathbb{Z}$ in phase space, the smoothness of $g$ and $\hat g$ and the ability of the Gabor system $G(g, a, b)$ to form a frame or orthonormal basis for $L^2(\mathbb{R})$. The trade-off is expressed in the next set of results.

Theorem 3.4.1. Let $g \in L^2(\mathbb{R})$. If the Gabor system $G(g, a, b)$ is a frame for $L^2(\mathbb{R})$ then $ab \le 1$.

Daubechies [98] was the first to prove this result (using Zak transform techniques) in the case where $ab$ is rational. The irrational case was dealt with by Baggett [15] with an application of a result by Rieffel [304] on the coupling constant of the von Neumann algebra generated by the time–frequency shift operators $M_{nb}T_{ma}$. Janssen [213] gave a proof that avoids von Neumann algebras, instead using Walnut’s remarkable representation of frame operators (see [168, 355]), and the Wexler–Raz biorthogonality relations (see [168, 363]).

To fix notation, given $g \in L^2(\mathbb{R})$ and a lattice $\Lambda = a\mathbb{Z} \times b\mathbb{Z}$ in phase space, we define the (analysis) operator $D_g^{a,b} = D_g$ taking $f \in L^2(\mathbb{R})$ to the doubly indexed sequence $d_{mn} = (D_g f)_{mn} = \langle f, M_{nb}T_{ma}g\rangle = S(f, g)(ma, nb)$. Then $G(g, a, b) = \{M_{nb}T_{ma}g\}_{m,n=-\infty}^{\infty}$ is a frame for $L^2(\mathbb{R})$ precisely when there exist constants $0 < A \le B < \infty$ for which
\[
A\|f\|_2^2 \le \|D_g^{a,b}f\|_{\ell^2(\mathbb{Z}^2)}^2 \le B\|f\|_2^2
\]
for all $f \in L^2(\mathbb{R})$. In this case, $D_g^{a,b}$ is bounded from $L^2(\mathbb{R})$ to $\ell^2(\mathbb{Z}^2)$ and its adjoint $(D_g^{a,b})^* = S_g^{a,b} : \ell^2(\mathbb{Z}^2) \to L^2(\mathbb{R})$ (the synthesis operator) given by
\[
S_g^{a,b}c = \sum_{m,n} c_{mn}\,M_{nb}T_{ma}g
\]
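is also bounded. Both $D_g^{a,b}$ and $S_g^{a,b}$ have bounded one-sided inverses: a left inverse for $D_g^{a,b}$ and a right inverse for $S_g^{a,b}$.

Given $g, h \in L^2(\mathbb{R})$ for which the analysis operators $D_g^{a,b}$ and $D_h^{a,b}$ are bounded, we consider the bounded operator $U_{g,h}^{a,b} = S_g^{a,b} \circ D_h^{a,b} : L^2(\mathbb{R}) \to L^2(\mathbb{R})$. If $G(g, a, b)$ is a frame for $L^2(\mathbb{R})$, then $U_{g,g}^{a,b}$ (the frame operator) is bounded and invertible on $L^2(\mathbb{R})$ and since $U_{g,g}^{a,b}$ is self-adjoint and commutes with the time–frequency shifts $M_{nb}T_{ma}$, we have a decomposition of $f \in L^2(\mathbb{R})$ of the form
\[
\begin{aligned}
f = U_{g,g}^{a,b}(U_{g,g}^{a,b})^{-1}f &= \sum_{m,n} \langle (U_{g,g}^{a,b})^{-1}f, M_{nb}T_{ma}g\rangle\,M_{nb}T_{ma}g \\
&= \sum_{m,n} \langle f, (U_{g,g}^{a,b})^{-1}M_{nb}T_{ma}g\rangle\,M_{nb}T_{ma}g = \sum_{m,n} \langle f, M_{nb}T_{ma}\tilde g\rangle\,M_{nb}T_{ma}g
\end{aligned}
\]
where $\tilde g = (U_{g,g}^{a,b})^{-1}g$. Equivalently, $U_{g,\tilde g}^{a,b} = S_g^{a,b}D_{\tilde g}^{a,b} = I$ on $L^2(\mathbb{R})$. Similarly,
\[
f = (U_{g,g}^{a,b})^{-1}U_{g,g}^{a,b}f = \sum_{m,n} \langle f, M_{nb}T_{ma}g\rangle\,M_{nb}T_{ma}\tilde g,
\]
i.e., $U_{\tilde g,g}^{a,b} = S_{\tilde g}^{a,b} \circ D_g^{a,b} = I$ on $L^2(\mathbb{R})$. The Wexler–Raz biorthogonality relations [168, 363] are as follows.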
Theorem 3.4.2. Suppose $g, h \in L^2(\mathbb{R})$, $a, b > 0$ and $S_g^{a,b}$, $S_h^{a,b}$ are bounded on $\ell^2(\mathbb{Z}^2)$. Then the following are equivalent:
(i) $U_{g,h}^{a,b} = U_{h,g}^{a,b} = I$
(ii) $\langle h, T_{n/b}M_{l/a}g\rangle = ab\,\delta_{l0}\delta_{n0}$ for all $l, n \in \mathbb{Z}$.

If $G(g, a, b)$ is a frame for $L^2(\mathbb{R})$, there will in general be many functions $h$ with the property that $U_{g,h}^{a,b} = U_{h,g}^{a,b} = I$. Theorem 3.4.2 characterizes these “dual” windows as precisely those with biorthogonality property (ii). One such window is the canonical dual window $\tilde g = (U_{g,g}^{a,b})^{-1}g$. Given Theorem 3.4.2, the proof of the density result Theorem 3.4.1 is straightforward.

Proof. Let $\tilde g$ be the canonical dual window for $g$. Then $U_{g,\tilde g}^{a,b} = I$ on $L^2(\mathbb{R})$ and by the Wexler–Raz biorthogonality relations, $ab\,\delta_{l0}\delta_{n0} = \langle \tilde g, T_{n/b}M_{l/a}g\rangle$. In particular,
\[
\langle \tilde g, g\rangle = ab. \tag{3.71}
\]
Since the Gabor system $G(g, a, b)$ is a frame rather than a Riesz basis, each signal $f \in L^2(\mathbb{R})$ admits an expansion of the form $f = \sum_{l,m} c_{lm}M_{mb}T_{la}g$, but the coefficients $c_{lm}$ are in general not unique. The minimum energy sequence $c_{lm}$ for which such an expansion is valid is that which arises from the canonical dual window, i.e.,
\[
g = \sum_{l,m} \langle g, M_{mb}T_{la}\tilde g\rangle\,M_{mb}T_{la}g. \tag{3.72}
\]
However, $g$ also admits the trivial expansion
\[
g = \sum_{l,m} \delta_{l0}\delta_{m0}\,M_{mb}T_{la}g. \tag{3.73}
\]
The coefficient sequence of the canonical expansion (3.72) has energy no greater than that of the coefficient sequence in (3.73). Using this and equation (3.71), we have
\[
(ab)^2 = |\langle g, \tilde g\rangle|^2 \le \sum_{l,m} |\langle g, M_{mb}T_{la}\tilde g\rangle|^2 \le \sum_{l,m} \delta_{l0}\delta_{m0} = 1,
\]
i.e., $ab \le 1$ and the proof is complete.

The case of the regular Gabor system with lattice density $1/(ab) = 1$ is intriguing. Consider the case where $a = b = 1$ and $g = \chi_{[0,1)}$. Then $g_{mn}(t) = \chi_{[m,m+1)}(t)e^{2\pi int}$ and if $f \in L^2(\mathbb{R})$, a simple calculation gives
\[
\sum_{m,n} |\langle f, g_{mn}\rangle|^2 = \sum_m \sum_n \left| \int_0^1 f(t + m)e^{-2\pi int}\,dt \right|^2 = \|f\|_2^2,
\]
i.e., the system is an orthonormal basis for $L^2(\mathbb{R})$. Notice, however, that this desirable property has been achieved at the expense of the time–frequency localization of the basis functions. In fact, the Heisenberg box associated with $g$ has infinite area since $\Delta(\hat\chi_{[0,1)}) = \infty$. Although the values of $S(f, g)(m, n)$ give information about $f$ that is well-localized in time, the information is poorly localized in frequency. This behavior is characteristic of windows $g$ for which the Gabor system $\{g_{mn}\}_{m,n=-\infty}^{\infty}$ forms an orthonormal basis for $L^2(\mathbb{R})$.

There is a beautiful interplay between the density $1/(ab)$ of the sampling lattice and the smoothness/decay properties of the window of a Gabor system $G(g, a, b)$. By Theorem 3.4.1, if $G(g, a, b)$ is a frame for $L^2(\mathbb{R})$, then $ab \le 1$. Furthermore, a Gabor system $G(g, a, 1/a)$ (with $\|g\|_2 = 1$) is a tight frame if and only if it is an orthonormal basis. Proofs of these statements appear in [168]. At the critical density ($b = 1/a$), the window $g$ of a Gabor frame $G(g, a, b)$ is either poorly localized in time or in frequency, as the following result demonstrates.

Theorem 3.4.3. If $g \in L^2(\mathbb{R})$ generates a Gabor Riesz basis $G(g, a, b)$ for $L^2(\mathbb{R})$ with $ab = 1$, then $\Delta(g)\Delta(\hat g) = \infty$.

This is the celebrated Balian–Low theorem. Notice the relationship between Theorem 3.4.1 and Theorem 3.4.3. The sampling rate $(ab)^{-1} = 1$ is a threshold rate. In Theorem 3.4.3, sampling is performed at the threshold rate. While this may produce a frame (perhaps even an orthonormal basis) and hence a stable reconstruction procedure, this comes at the expense of poor time–frequency localization of the window, hence of the representation. By
contrast, Theorem 3.4.1 allows for good time–frequency localization at the expense of a higher sampling rate (ab)−1 > 1. The Theorem 3.4.3 was observed independently by Balian [17] and Low [264]. The proofs they gave, however, had slight errors which were corrected and the result was extended by Coifman and Semmes to include frames rather than just orthonormal bases, as reported in [98]. Battle [23] provided a proof for orthonormal bases which appealed directly to the Heisenberg uncertainty inequality (5.2). We now sketch a proof due to Daubechies and Janssen [106] that uses properties of the Zak transform. By dilating if necessary, we may assume that a = b = 1. Notice that Zgmn (t, ξ) = e2πint e−2πimξ Zg(t, ξ). Hence, by the unitarity of the Zak transform ¯ ¯2 X X¯Z 1 Z 1 ¯ 2 −2πint 2πimξ ¯ |hf, gmn i| = Zf (t, ξ)e e Zg(t, ξ) dt dξ ¯¯ ¯ m,n
m,n
Z
1
0
Z
1
= 0
0
|Zf (t, ξ)|2 |Zg(t, ξ)|2 dt dξ.
0
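Consequently, if $\{g_{mn}\}_{m,n=-\infty}^{\infty}$ is a frame for $L^2(\mathbb{R})$ with frame bounds $0 < A \le B < \infty$, then
\[
A \int_0^1\!\!\int_0^1 |Zf(t,\xi)|^2\, dt\, d\xi \;\le\; \sum_{m,n} |\langle f, g_{mn}\rangle|^2 \;\le\; B \int_0^1\!\!\int_0^1 |Zf(t,\xi)|^2\, dt\, d\xi
\]
for all $f \in L^2(\mathbb{R})$ if and only if
\[
A \le |Zg(t,\xi)|^2 \le B \tag{3.74}
\]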
for a.e. $(t,\xi)$. However, all continuous Zak transforms have a zero on the unit square [183]. Either of the conditions $\Delta(g) < \infty$ or $\Delta(\hat g) < \infty$ forces the continuity of $Zg$, hence contradicting the lower bound of (3.74).

Heil and colleagues [35, 181] have proved the following Wiener space version of the Balian–Low theorem. The Wiener space $W = W(\mathbb{R})$ is the Banach space of measurable functions $f$ on the line for which $\|f\|_W = \sum_k \sup_{0 \le t < 1} |f(t+k)| < \infty$. $W_0(\mathbb{R})$ denotes the set of continuous elements of $W(\mathbb{R})$.

Theorem 3.4.4. If $g \in L^2(\mathbb{R})$ generates a Gabor frame $G(g,a,b)$ for $L^2(\mathbb{R})$ with $ab = 1$, then both $g \notin W_0(\mathbb{R})$ and $\hat g \notin W_0(\mathbb{R})$.

Theorem 3.4.4 extends to arbitrary dimensions. For purposes of time–frequency analysis, the Wiener spaces are natural candidates from which to draw windows, as Gröchenig explains in [168]. Neither of Theorems 3.4.3 and 3.4.4 implies the other.
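The zero of a continuous Zak transform that drives the proof sketch above can be observed numerically. The following is a minimal sketch, assuming the convention $Zf(t,\xi) = \sum_k f(t+k)e^{2\pi i k\xi}$ and a Gaussian window; the function names `zak` and `gaussian` and the truncation parameter `K` are illustrative choices, not notation from the text. For a Gaussian, the symmetry of the samples suggests a zero at $(t,\xi) = (1/2, 1/2)$.

```python
import numpy as np

def zak(f, t, xi, K=30):
    """Truncated Zak transform: sum over |k| <= K of f(t + k) * exp(2*pi*i*k*xi)."""
    k = np.arange(-K, K + 1)
    return np.sum(f(t + k) * np.exp(2j * np.pi * k * xi))

def gaussian(t):
    # L2-normalized Gaussian; it and its Fourier transform both decay rapidly,
    # so its Zak transform is continuous.
    return 2 ** 0.25 * np.exp(-np.pi * t ** 2)

z_center = zak(gaussian, 0.5, 0.5)   # terms cancel in pairs k <-> -1-k
z_origin = zak(gaussian, 0.0, 0.0)   # sum of positive samples, clearly nonzero
print(abs(z_center), abs(z_origin))
```

Because $|Zg|$ vanishes at a point, the lower bound in (3.74) fails for this window at the critical density $a = b = 1$, in line with the Balian–Low theorem.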
3.4.2 Irregular Gabor frames

A very general approach to the density of Gabor frames was provided by Landau [248], whose work used the prolate spheroidal wavefunctions (PSWFs), the properties of which were reviewed in Section 3.3.3.

Phase space density of Gabor systems. The theme of [248] is the local representation of signals in phase space. To explain this concept, suppose $\Lambda$ is a discrete subset of phase space and $g \in L^2(\mathbb{R})$ has the property that the (possibly irregular) Gabor system $G(g,\Lambda) = \{e^{2\pi i \lambda_2 t} g(t - \lambda_1)\}_{\lambda = (\lambda_1,\lambda_2) \in \Lambda}$ is a frame for $L^2(\mathbb{R})$. Each $f \in L^2(\mathbb{R})$ admits an expansion of the form $f = \sum_{\lambda \in \Lambda} \langle f, \tilde g_\lambda\rangle g_\lambda$ where $\{\tilde g_\lambda\}_{\lambda\in\Lambda}$ is the dual frame to $\{g_\lambda\}_{\lambda\in\Lambda}$. Given a subset $D \subset \mathbb{R}^2$, we study truncations $f_D$ of the frame expansion of $f$ given by
\[
f_D = \sum_{\lambda \in \Lambda \cap D} \langle f, \tilde g_\lambda\rangle g_\lambda .
\]
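For measurable subsets $S$, $\Sigma$ of $\mathbb{R}$ define the "timelimiting operator" $Q_S$ and "bandlimiting operator" $P_\Sigma$ by $Q_S f = \chi_S f$, $P_\Sigma f = (\chi_\Sigma \hat f)^\vee$. Given intervals $I$, $J$ and $\delta > 0$, let $\mathcal{F}(I,J;\delta)$ denote the set of those signals $f \in L^2(\mathbb{R})$ for which $\|Q_{\mathbb{R}\setminus I} f\|_2 \le \delta \|f\|_2$ and $\|P_{\mathbb{R}\setminus J} f\|_2 \le \delta \|f\|_2$. If $\delta$ is small, signals in $\mathcal{F}(I,J;\delta)$ are well localized in time to the interval $I$ and well localized in frequency to the interval $J$. The Donoho–Stark uncertainty principle (Corollary 5.1.5) states that if $\mathcal{F}(I,J;\delta)$ is nonempty, then $|I||J| \ge (1 - \sqrt{2}\,\delta)^2$.

The Gabor system $G(g,\Lambda)$ is said to provide a local representation of $L^2(\mathbb{R})$ in phase space if, for all intervals $I \subset \mathbb{R}$, $J \subset \widehat{\mathbb{R}}$ and $\delta > 0$, there exists a constant $K > 0$ and a neighborhood $\Gamma(I,J;\delta)$ of $I \times J$ in phase space such that $\|f - f_{\Gamma(I,J;\delta)}\| \le K\delta\|f\|$ for all $f \in \mathcal{F}(I,J;\delta)$. Landau weakens this requirement slightly by insisting only that for all $f \in L^2(\mathbb{R})$ and intervals $I \subset \mathbb{R}$, $J \subset \widehat{\mathbb{R}}$, there is a neighborhood $\Gamma(I,J)$ of $I \times J$ in phase space such that
\[
\Big\| f - \sum_{\lambda \in \Lambda \cap \Gamma(I,J)} \langle f, \tilde g_\lambda\rangle g_\lambda \Big\|_2 \le K\big(\|Q_{\mathbb{R}\setminus I} f\|_2 + \|P_{\mathbb{R}\setminus J} f\|_2\big) + \nu \|f\|_2 \tag{3.75}
\]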
for constants $\nu < 1$ and $K$ independent of $I$ and $J$. With this definition, we have the following remarkable result [248]:

Theorem 3.4.5. If $I$, $J$, $\Gamma(I,J)$, $K$ and $\nu$ are as above and satisfy (3.75) for all $f \in L^2(\mathbb{R})$, then $\#\{\Gamma(I,J) \cap \Lambda\} \ge |I||J| - L\log(|I||J|)$ with $L$ a constant depending only on $K$ and $\nu$.

Proof. Let $n = \#\{\Gamma(I,J) \cap \Lambda\}$ and $S$ be the linear span of $\{\tilde g_\lambda\}_{\lambda \in \Lambda \cap \Gamma(I,J)}$. Then $\dim(S) \le n$, and if $f \in L^2(\mathbb{R})$ is bandlimited to $J$ with $f \perp S$ then $\langle f, \tilde g_\lambda\rangle = 0$ for all $\lambda \in \Lambda \cap \Gamma(I,J)$. Therefore
\[
\|f\|_2 = \Big\| f - \sum_{\lambda \in \Lambda \cap \Gamma(I,J)} \langle f, \tilde g_\lambda\rangle g_\lambda \Big\|_2 \le K \|Q_{\mathbb{R}\setminus I} f\|_2 + \nu \|f\|_2 ,
\]
i.e., $\|Q_{\mathbb{R}\setminus I} f\|_2 \ge (1-\nu)\|f\|_2 / K$. Equivalently,
\[
\frac{\|Q_I f\|_2^2}{\|f\|_2^2} = \frac{\|f\|_2^2 - \|Q_{\mathbb{R}\setminus I} f\|_2^2}{\|f\|_2^2} \le 1 - \frac{(1-\nu)^2}{K^2}. \tag{3.76}
\]
The minimax characterization of the eigenvalues $\gamma_k$, namely
\[
\gamma_k = \min_{\dim(U) = k}\ \max_{P_J f = f,\ f \perp U} \frac{\|Q_I f\|_2^2}{\|f\|_2^2},
\]
then gives (since $\dim(S) \le n$)
\[
\gamma_n \le \max_{P_J f = f,\ f \perp S} \frac{\|Q_I f\|_2^2}{\|f\|_2^2} \le 1 - \frac{(1-\nu)^2}{K^2}
\]
by (3.76). Suppose $p = \max\{q : \gamma_q > (1-\nu)^2/K^2\}$. Then, applying (3.47),
\[
p - n \le \#\Big\{ k : \frac{(1-\nu)^2}{K^2} \le \gamma_k \le 1 - \frac{(1-\nu)^2}{K^2} \Big\} \le M(\nu,K)\log(|I||J|). \tag{3.77}
\]
However, $p \ge [|I||J|] + 1 > |I||J|$ by (3.48). Combining this with (3.77) gives $n \ge p - M(\nu,K)\log(|I||J|) \ge |I||J| - M(\nu,K)\log(|I||J|)$ as required. The proof is complete.

Given a discrete subset $\Lambda \subset \mathbb{R}^n$ and $r > 0$, let $Q$ be the cube centered at the origin of side length 1 with sides parallel to the axes and $rQ$ its dilate by $r$ (of side length $r$). For each $r > 0$ let
\[
n_\Lambda^+(r) = \sup_{x \in \mathbb{R}^n} \#\{\Lambda \cap (rQ + x)\}
\]
be the largest number of elements of $\Lambda$ in a translate of $rQ$. Define $n_\Lambda^-(r)$ similarly, replacing the supremum by an infimum. Then the upper and lower Beurling densities of $\Lambda$ are
\[
D^+(\Lambda) = \lim_{r \to \infty} \frac{n_\Lambda^+(r)}{r^n}; \qquad D^-(\Lambda) = \lim_{r \to \infty} \frac{n_\Lambda^-(r)}{r^n}.
\]
This extends the definitions of Beurling densities given in Section 3.3.1 to higher dimensions in a natural way. There are other possible definitions of densities of sampling sets. A discussion of other densities and their relationship to the Beurling densities is given in Benedetto and Ferreira [34].

A discrete set $\Lambda = \{\lambda_i\} \subset \mathbb{R}^2$ is said to be uniformly separated if there exists $\delta > 0$ such that $|\lambda_i - \lambda_j| > \delta$ for all $\lambda_i, \lambda_j \in \Lambda$ with $i \ne j$. The following result, due to Christensen et al. [77], provides a connection between discrete subsets $\Lambda \subset \mathbb{R}^2$ which satisfy a separation property and those for which $G(g,\Lambda)$ is a frame for some $g \in L^2(\mathbb{R})$.

Lemma 3.4.6. If $g \in L^2(\mathbb{R})$ is nontrivial, $\Lambda \subset \mathbb{R}^2$ is a discrete subset and the Gabor system $G(g,\Lambda)$ has upper frame bound $B < \infty$, then $\Lambda$ is uniformly separated.

Proof. With $g$ as in the statement of the lemma and $f \in L^2(\mathbb{R})$, $S(f,g)$ is a bounded continuous function on phase space since translation $T_t$ and modulation $M_\xi$ are continuous operators on $L^2(\mathbb{R})$ and $|S(f,g)(t,\xi)| \le \|f\|_2 \|M_\xi T_t g\|_2 = \|f\|_2 \|g\|_2$. Fix $0 \ne f \in L^2(\mathbb{R})$. $S(f,g)$ is nontrivial since $\|S(f,g)\|_{L^2(\mathbb{R}^2)} = \|f\|_2 \|g\|_2$, and its continuity implies that it must be bounded away from zero on some cube $Q_h(t_0,\xi_0)$ centered at $(t_0,\xi_0)$ with side length $h$:
\[
\mu = \inf_{(t,\xi) \in Q_h(t_0,\xi_0)} |S(f,g)(t,\xi)| > 0.
\]
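Suppose now that $\Lambda$ is not uniformly separated. Then for all integers $N \ge 1$, there exists $\lambda = (a,b) \in \Lambda$ such that $Q_h(a,b)$ contains at least $N$ elements $(c_i,d_i)$ $(1 \le i \le N)$ of $\Lambda$. The function $M_{b-\xi_0} T_{a-t_0} f$ satisfies $S(M_{b-\xi_0} T_{a-t_0} f, g)(t,\xi) = S(f,g)(t + t_0 - a, \xi + \xi_0 - b)$. However, if $(t,\xi) \in Q_h(a,b)$, then $(t + t_0 - a, \xi + \xi_0 - b) \in Q_h(t_0,\xi_0)$ and
\[
\begin{aligned}
B\|f\|_2^2 &\ge \sum_{\lambda \in \Lambda} |\langle M_{b-\xi_0} T_{a-t_0} f, g_\lambda\rangle|^2 \\
&\ge \sum_{i=1}^{N} |\langle M_{b-\xi_0} T_{a-t_0} f, M_{d_i} T_{c_i} g\rangle|^2 \\
&= \sum_{i=1}^{N} |S(f,g)(c_i + t_0 - a, d_i + \xi_0 - b)|^2 \ge N\mu^2,
\end{aligned}
\]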
thus contradicting the assumption that $G(g,\Lambda)$ has an upper frame bound. The proof is complete.

In [98], Daubechies uses the Poisson summation formula to show that if $\Lambda = a\mathbb{Z} \times b\mathbb{Z}$ is a uniform lattice and $g$, $\hat g$ satisfy the bounds
\[
|g(t)| \le \frac{c}{(1+|t|)^\alpha}, \qquad |\hat g(\xi)| \le \frac{c}{(1+|\xi|)^\alpha}, \tag{3.78}
\]
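with $\alpha > 1$, then the Gabor system $G(g,a,b)$ satisfies (3.75) with $\Gamma(I,J)$ a rectangle obtained by increasing the side lengths of $I$ and $J$ by a fixed amount. Clearly, $\Gamma(I) = \Gamma(I,I)$ then satisfies $\lim_{|I| \to \infty} |\Gamma(I)|/|I \times I| = 1$ and the density result $D^-(\Lambda) \ge 1$ is immediate. Daubechies' result has, of course, been superseded by the work of Janssen (Theorem 3.4.1).

It is possible to prove a density result based on Landau's notion of time–frequency localization (3.75) for nonuniform sampling sets $\Lambda$. The cost of abandoning the lattice structure in Daubechies' result is a stronger condition on the decay of $g$, $\hat g$.

Theorem 3.4.7. Suppose $g \in L^2(\mathbb{R})$ satisfies (3.78) with $\alpha > 3/2$, $\Lambda \subset \mathbb{R}^2$ is a discrete subset and $G(g,\Lambda)$ is a frame for $L^2(\mathbb{R})$. Then for each pair of intervals $I \subset \mathbb{R}$ and $J \subset \widehat{\mathbb{R}}$ with $|I|, |J| > 1$ and $0 < \nu < 1$, there exists a neighborhood $\Gamma(I,J;\nu)$ of $I \times J$ for which (3.75) is valid.

Proof. Let $I = [-T,T]$ and $J = [-\Omega,\Omega]$ with $T \ge \Omega$ and for $a > 0$ let $(I \times J)_a$ be the rectangle $(I \times J)_a = [-a-T, T+a] \times [-\Omega - a\Omega/T, \Omega + a\Omega/T]$ containing $I \times J$. The value of $a$ will depend on $\nu$ and will be determined later. Let $C$ be the conical domain $C = \{(t,\xi) \in \mathbb{R}^2;\ |\xi| \le |t|\Omega/T\}$ and $D$ its complement in $\mathbb{R}^2$. For non-negative integers $j$, let
\[
C_j = \{(t,\xi) \in C;\ T + 2^j a \le |t| < T + 2^{j+1} a\}, \qquad
D_j = \Big\{(t,\xi) \in D;\ \Omega + \frac{2^j \Omega a}{T} \le |\xi| < \Omega + \frac{2^{j+1} \Omega a}{T}\Big\}.
\]
Suppose the lower and upper frame bounds for $G(g,\Lambda)$ are $A$ and $B$, respectively. Then the lower and upper frame bounds for the dual frame $G(\tilde g,\Lambda)$ are $B^{-1}$ and $A^{-1}$, respectively, and we have
\[
\begin{aligned}
\Big\| f - \sum_{\lambda \in \Lambda \cap (I \times J)_a} \langle f, g_\lambda\rangle \tilde g_\lambda \Big\|_2^2
&= \Big\| \sum_{\lambda \in \Lambda \setminus (I \times J)_a} \langle f, g_\lambda\rangle \tilde g_\lambda \Big\|_2^2 \\
&= \sup_{\|h\|_2 = 1} \Big| \sum_{\lambda \in \Lambda \setminus (I \times J)_a} \langle f, g_\lambda\rangle \langle \tilde g_\lambda, h\rangle \Big|^2 \\
&\le \sup_{\|h\|_2 = 1} \Big( \sum_{\lambda \in \Lambda \setminus (I \times J)_a} |\langle f, g_\lambda\rangle|^2 \Big) \Big( \sum_{\lambda \in \Lambda \setminus (I \times J)_a} |\langle \tilde g_\lambda, h\rangle|^2 \Big) \\
&\le A^{-1} \sum_{\lambda \in \Lambda \setminus (I \times J)_a} |\langle f, g_\lambda\rangle|^2 .
\end{aligned} \tag{3.79}
\]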
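Decomposing $\mathbb{R}^2 \setminus (I \times J)_a$ as the union $(C \setminus (I \times J)_a) \cup (D \setminus (I \times J)_a)$ and applying the frame bounds for $\{g_\lambda\}_{\lambda \in \Lambda}$ gives
\[
\begin{aligned}
\sum_{\lambda \in \Lambda \setminus (I \times J)_a} |\langle f, g_\lambda\rangle|^2
&= \sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap C} |\langle f, g_\lambda\rangle|^2 + \sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap D} |\langle f, g_\lambda\rangle|^2 \\
&\le 2 \sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap C} \big( |\langle Q_{\mathbb{R}\setminus I} f, g_\lambda\rangle|^2 + |\langle Q_I f, g_\lambda\rangle|^2 \big) \\
&\qquad + 2 \sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap D} \big( |\langle P_{\mathbb{R}\setminus J} f, g_\lambda\rangle|^2 + |\langle P_J f, g_\lambda\rangle|^2 \big) \\
&\le 2B\big( \|Q_{\mathbb{R}\setminus I} f\|_2^2 + \|P_{\mathbb{R}\setminus J} f\|_2^2 \big) \\
&\qquad + 2 \sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap C} |\langle Q_I f, g_\lambda\rangle|^2 + 2 \sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap D} |\langle P_J f, g_\lambda\rangle|^2 .
\end{aligned} \tag{3.80}
\]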
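However, by (3.78) and the decomposition $C \setminus (I \times J)_a = \cup_{j=0}^{\infty} C_j$,
\[
\begin{aligned}
\sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap C} |\langle Q_I f, g_\lambda\rangle|^2
&= \sum_{j=0}^{\infty} \sum_{\lambda \in \Lambda \cap C_j} |\langle Q_I f, g_\lambda\rangle|^2 \\
&\le \sum_{j=0}^{\infty} \sum_{\lambda \in \Lambda \cap C_j} \Big( \int_I |f(t)|\,|g_\lambda(t)|\, dt \Big)^2 \\
&\le c^2 \sum_{j=0}^{\infty} \sum_{\lambda \in \Lambda \cap C_j} \Big( \int_I \frac{|f(t)|}{(1 + |t - \lambda_1|)^\alpha}\, dt \Big)^2
\end{aligned} \tag{3.81}
\]
where $\lambda = (\lambda_1,\lambda_2)$ and $g_\lambda(t) = g(t - \lambda_1) e^{2\pi i t \lambda_2}$. If $\lambda \in \Lambda \cap C_j$ and $t \in I$ then $|t - \lambda_1| \ge 2^j a$ so that when $\lambda_1 > 0$, the last integral in (3.81) may be estimated by Cauchy–Schwarz:
\[
\begin{aligned}
\Big( \int_I \frac{|f(t)|}{(1 + |t - \lambda_1|)^\alpha}\, dt \Big)^2
&\le \|Q_I f\|_2^2 \int_{-T}^{T} \frac{dt}{(1 + \lambda_1 - t)^{2\alpha}} \\
&\le \frac{\|Q_I f\|_2^2}{2\alpha - 1} (1 + \lambda_1 - T)^{1-2\alpha} \\
&\le \frac{\|Q_I f\|_2^2}{2\alpha - 1} (2^j a)^{1-2\alpha},
\end{aligned} \tag{3.82}
\]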
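where the condition $\alpha > 3/2$ is sufficient to ensure the convergence of the integrals. Substituting the estimate (3.82) into (3.81) gives
\[
\begin{aligned}
\sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap C} |\langle Q_I f, g_\lambda\rangle|^2
&\le c^2 \sum_{j=0}^{\infty} \sum_{\lambda \in \Lambda \cap C_j} \frac{\|Q_I f\|_2^2}{2\alpha - 1}\, a^{1-2\alpha}\, 2^{j(1-2\alpha)} \\
&\le c^2\, \frac{\|Q_I f\|_2^2}{2\alpha - 1}\, a^{1-2\alpha} \sum_{j=0}^{\infty} 2^{j(1-2\alpha)}\, \#(\Lambda \cap C_j).
\end{aligned} \tag{3.83}
\]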
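We now need to estimate $\#(\Lambda \cap C_j)$. By Lemma 3.4.6, $\Lambda$ is $\delta$-uniformly separated for some $\delta > 0$ and consequently, if $B_{\delta/2}(\lambda)$ is the ball of radius $\delta/2$ centered at $\lambda \in \Lambda$, then the balls $B_{\delta/2}(\lambda)$ $(\lambda \in \Lambda)$ are pairwise disjoint. Given a set $E \subset \mathbb{R}^2$ and $\varepsilon > 0$, let $E^\varepsilon = \{x \in \mathbb{R}^2;\ \operatorname{dist}(x,E) < \varepsilon\} = \cup_{y \in E} B_\varepsilon(y)$ be the $\varepsilon$-thickening of $E$. Then the disjoint balls $\{B_{\delta/2}(\lambda)\}_{\lambda \in \Lambda \cap C_j}$ are contained in the $\delta/2$-thickening $C_j^{\delta/2}$ of $C_j$. Consequently,
\[
\frac{\pi \delta^2}{4}\, \#(\Lambda \cap C_j) = \sum_{\lambda \in \Lambda \cap C_j} |B_{\delta/2}(\lambda)| = \Big| \bigcup_{\lambda \in \Lambda \cap C_j} B_{\delta/2}(\lambda) \Big| \le |C_j^{\delta/2}| \le 32\,\Omega a^2 2^{2j}
\]
since without loss of generality we may assume $\Omega > 1/2$ and $\delta < 1 \le a$. Hence, $\#(\Lambda \cap C_j) \le 128\,\Omega a^2 2^{2j}/(\pi \delta^2)$. Substituting this into (3.83) gives
\[
\begin{aligned}
\sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap C} |\langle Q_I f, g_\lambda\rangle|^2
&\le c\,\delta^{-2}\, \Omega\, a^{3-2\alpha}\, \frac{\|f\|_2^2}{2\alpha - 1} \sum_{j=0}^{\infty} 2^{-j(2\alpha - 3)} \\
&\le c\,\delta^{-2}\, \Omega\, a^{3-2\alpha}\, \frac{\|f\|_2^2}{1 - 2^{3-2\alpha}}
\end{aligned} \tag{3.84}
\]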
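where the sum converges since $\alpha > 3/2$. Similarly, we may estimate $\sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap D} |\langle P_J f, g_\lambda\rangle|^2$ with the aid of the Plancherel theorem, from which we obtain
\[
\sum_{\lambda \in (\Lambda \setminus (I \times J)_a) \cap D} |\langle P_J f, g_\lambda\rangle|^2 \le c\,\delta^{-2}\, a^{3-2\alpha} \Big( \frac{\Omega}{T} \Big)^{1-2\alpha} (\Omega + T)\, \frac{\|f\|_2^2}{1 - 2^{3-2\alpha}}. \tag{3.85}
\]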
Substituting (3.84), (3.85) and (3.80) into (3.79) gives
\[
\Big\| f - \sum_{\lambda \in \Lambda \cap (I \times J)_a} \langle f, g_\lambda\rangle \tilde g_\lambda \Big\|_2
\le c \Big( \frac{B}{A} \Big)^{1/2} \big( \|Q_{\mathbb{R}\setminus I} f\|_2 + \|P_{\mathbb{R}\setminus J} f\|_2 \big)
+ \frac{c\, a^{(3-2\alpha)/2}}{\delta \sqrt{A}\, (1 - 2^{3-2\alpha})^{1/2}} \Big( \Omega + \Big( \frac{\Omega}{T} \Big)^{1-2\alpha} (\Omega + T) \Big)^{1/2} \|f\|_2 .
\]
To show that $G(g,\Lambda)$ provides a local representation in the sense of (3.75), we need to show that for all $0 < \nu < 1$, $a$ may be chosen so that
\[
\frac{c\, a^{(3-2\alpha)/2}}{\delta \sqrt{A}\, (1 - 2^{3-2\alpha})^{1/2}} \Big( \Omega + \Big( \frac{\Omega}{T} \Big)^{1-2\alpha} (\Omega + T) \Big)^{1/2} < \nu .
\]
Since the exponent $(3 - 2\alpha)/2$ of $a$ in the inequality is negative, however, it is clear that the inequality can always be achieved for $a$ sufficiently large. The proof is complete.

To obtain the density result, suppose $\Omega = T$ so that $I \times J$ is the square $[-T,T] \times [-T,T]$. An appropriate value of $a$ for which (3.75) is valid for $\nu = 1/2$ is $a = dT^{1/(2\alpha - 3)}$ where $d = c(\delta^2 A)^{1/(3 - 2\alpha)}$. In that case,
\[
(I \times J)_a = [-T - dT^{1/(2\alpha-3)},\ T + dT^{1/(2\alpha-3)}] \times [-T - dT^{1/(2\alpha-3)},\ T + dT^{1/(2\alpha-3)}]
\]
and
\[
\frac{|(I \times J)_a|}{|I \times J|} = \frac{4(T + dT^{1/(2\alpha-3)})^2}{4T^2} = \big(1 + dT^{(4-2\alpha)/(2\alpha-3)}\big)^2 \to 1
\]
as $T \to \infty$, provided $\alpha > 2$. Hence, when coupled with Theorem 3.4.5, these estimates provide the following density result:

Theorem 3.4.8. Suppose $g \in L^2(\mathbb{R})$ satisfies (3.78) with $\alpha > 2$ and $\Lambda \subset \mathbb{R}^2$ is a discrete set for which $G(g,\Lambda)$ is a frame for $L^2(\mathbb{R})$. Then $D^-(\Lambda) \ge 1$.

Ramanathan and Steger [302] have proved that Theorem 3.4.8 remains valid without the decay condition on $g$, $\hat g$. They introduce as a tool in their proof the notion of the homogeneous approximation property (HAP) of a frame, which weakens Landau's requirement of a time–frequency localized frame representation. Christensen et al. [77] extend this result to higher dimensions and multiple generating functions. Their proof again uses the HAP and some complex function theory.
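To make the density $D^-(\Lambda)$ in Theorem 3.4.8 concrete, the following is a minimal sketch that estimates $n_\Lambda^\pm(r)/r^2$ for a uniform lattice $\Lambda = a\mathbb{Z} \times b\mathbb{Z}$, whose Beurling densities both equal $1/(ab)$. The function name `lattice_counts`, the random choice of translates, and the parameter values are assumptions made for illustration; the supremum and infimum are only approximated over the sampled translates.

```python
import numpy as np

def lattice_counts(a, b, r, n_shifts=200, seed=0):
    """Largest and smallest number of points of aZ x bZ found in sampled
    translates of a square of side r (an approximation of n^+ and n^-)."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_shifts):
        # It suffices to sample centers modulo the lattice periods.
        x0, y0 = rng.uniform(0, a), rng.uniform(0, b)
        # Number of multiples of a in [x0 - r/2, x0 + r/2], and likewise for b.
        nx = np.floor((x0 + r / 2) / a) - np.ceil((x0 - r / 2) / a) + 1
        ny = np.floor((y0 + r / 2) / b) - np.ceil((y0 - r / 2) / b) + 1
        counts.append(nx * ny)
    return max(counts), min(counts)

a, b, r = 0.5, 1.5, 400.0
n_plus, n_minus = lattice_counts(a, b, r)
upper, lower = n_plus / r**2, n_minus / r**2
print(upper, lower, 1.0 / (a * b))
```

With $ab = 0.75 < 1$ both estimates are close to $1/(ab) \approx 1.33 \ge 1$, consistent with the necessary condition $D^-(\Lambda) \ge 1$ for a Gabor frame.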
3.5 Sampling in principal shift-invariant spaces

Increasingly, the algorithms being used for compression, smoothing and other signal processing tasks arise from wavelet techniques, and the signals dealt with come from those spaces associated with wavelet theory, of which Paley–Wiener spaces are but one example. It is important then that sampling procedures be developed for signals in this broader class of spaces.

A principal shift-invariant (PSI) space is a closed subspace $V$ of $L^2(\mathbb{R})$ with the property that there exists $\varphi \in V$ such that $V$ is the closed linear span of the collection $\{\varphi(\cdot - n)\}_{n=-\infty}^{\infty}$ of integer shifts of $\varphi$. We then write $V = V(\varphi)$. From the definition, it is clear that $V(\varphi)$ is invariant under integer translations (which we call shifts), though not necessarily under arbitrary translations as in the Paley–Wiener case. To simplify the exposition, we will also insist that $\{\varphi(\cdot - n)\}_{n=-\infty}^{\infty}$ form an orthonormal basis for $V$, though sampling criteria considered here will often hold under weaker conditions. Then $\varphi$ is called an orthogonal generator of $V(\varphi)$.

We will consider two approaches to sampling reconstructions. The first involves iterative methods, in analogy to Section 3.3.2, and will be phrased in the full generality of PSI spaces. The second involves formulas for interpolating signals in PSI spaces from their samples. To verify the existence of interpolating functions, as well as to discuss aliasing errors, it will be convenient to work in the context of multiresolution spaces.
3.5.1 Iterative reconstruction in PSI spaces

We now turn to the iterative reconstruction of signals in PSI spaces and report on results of Aldroubi and Gröchenig [3]. These schemes are frame based, rely on some elementary ideas from functional analysis and can be viewed as natural successors of the extrapolation procedures of Papoulis [293] and Gerchberg [154].

The notion of PSI spaces may be lifted easily from $L^2(\mathbb{R})$ to $L^p(\mathbb{R})$. Given $1 \le p < \infty$, let $V^p(\varphi)$ be the space of signals of the form $f(t) = \sum_k c_k \varphi(t - k)$ with $\{c_k\}_{k=-\infty}^{\infty} \in \ell^p(\mathbb{Z})$. In [3], this picture is further complicated by the appearance of weights in the definition of the PSI spaces, but for the sake of clarity we will not consider weights. The main results of this section are still valid in the more general context, but the constants involved in the inequalities (whose computation is crucial in the determination of appropriate sampling rates) become more difficult to determine in the case of general $p$ or weighted spaces.

In Section 3.4.1 we introduced the Wiener amalgam space $W = W(L^\infty, \ell^1)$. We now need to broaden its definition. The Wiener amalgam space $W(\ell^p) = W(L^\infty, \ell^p)$ $(1 \le p \le \infty)$ is the Banach space of measurable functions $f$ on the line for which
\[
\|f\|_{W(\ell^p)} = \Big( \sum_k \sup_{0 \le t < 1} |f(t+k)|^p \Big)^{1/p} < \infty \quad (1 \le p < \infty), \qquad
\|f\|_{W(\ell^\infty)} = \sup_k \sup_{0 \le t < 1} |f(t+k)| < \infty .
\]
Notice that $W = W(\ell^1)$. Crucial for our purposes is the continuous inclusion $W(\ell^p) \subset L^p$. To see this, simply note that if $f \in W(\ell^p)$ then
\[
\|f\|_p^p = \int_{-\infty}^{\infty} |f(t)|^p\, dt = \sum_k \int_0^1 |f(t+k)|^p\, dt \le \sum_k \sup_{0 \le t < 1} |f(t+k)|^p = \|f\|_{W(\ell^p)}^p . \tag{3.86}
\]
It is important for us to be able to control the $W(\ell^p)$ norm of $f \in V^p(\varphi)$ by the $\ell^p$-norm of its coefficients. As a simple application of Young's inequality we have [3]:

Lemma 3.5.1. If $f = \sum_k c_k \varphi(\cdot - k) \in V^p(\varphi)$, then $\|f\|_{W(\ell^p)} \le \|c\|_{\ell^p} \|\varphi\|_W$.

Proof. Observe that with $b_k = \sup_{0 \le t < 1} |\varphi(t+k)|$ $(k \in \mathbb{Z})$,
\[
\begin{aligned}
\|f\|_{W(\ell^p)}^p &= \sum_l \sup_{0 \le t < 1} \Big| \sum_k c_k \varphi(t + l - k) \Big|^p \\
&\le \sum_l \Big( \sum_k |c_k| \sup_{0 \le t < 1} |\varphi(t + l - k)| \Big)^p \\
&= \big\| |c| * b \big\|_{\ell^p}^p \le \|c\|_{\ell^p}^p \|b\|_{\ell^1}^p = \|c\|_{\ell^p}^p \|\varphi\|_W^p .
\end{aligned}
\]
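The bound of Lemma 3.5.1 can be checked numerically on a simple example. The sketch below assumes the hat function on $[0,2]$ as generator (chosen only because it is continuous and compactly supported; it is not an orthogonal generator) and approximates the suprema on a grid; the names `hat` and `amalgam_norm` and the random coefficients are hypothetical.

```python
import numpy as np

def hat(t):
    """Hat function supported on [0, 2] with peak value 1 at t = 1."""
    return np.maximum(0.0, 1.0 - np.abs(t - 1.0))

def amalgam_norm(f, p, cells, n_grid=201):
    """Grid approximation of (sum_k sup_{0<=t<1} |f(t+k)|^p)^(1/p) over the given cells."""
    t = np.linspace(0.0, 1.0, n_grid)   # closed cell; harmless for continuous f
    sups = np.array([np.max(np.abs(f(t + k))) for k in cells])
    return np.sum(sups ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
c = rng.standard_normal(8)              # coefficients c_0, ..., c_7

def f(t):
    # f = sum_k c_k * hat(. - k), an element of V^p(hat) with finitely many terms
    return sum(ck * hat(t - k) for k, ck in enumerate(c))

p = 2
lhs = amalgam_norm(f, p, cells=range(-1, 12))
phi_W = amalgam_norm(hat, 1, cells=range(-1, 4))
rhs = np.linalg.norm(c, ord=p) * phi_W
print(lhs, rhs)
```

For this generator $\|\varphi\|_W = 2$, and the computed left-hand side stays below $\|c\|_{\ell^2}\|\varphi\|_W$, as the lemma predicts.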
Recall that in the proof of Theorem 3.2.2 (frame-based reconstruction from samples of trigonometric polynomials), a crucial ingredient was the application of Wirtinger's inequality. The same can be said for Theorem 3.3.4. In the case of PSI spaces, Wirtinger's inequality and other tools of complex function theory are no longer available. We must rely on methods that make no use of the Fourier transform.

Given a complex-valued function $f$ on the line and $\delta > 0$, define the $\delta$-oscillation of $f$, or modulus of continuity $\omega_\delta(f)$, as the non-negative function on the line given by
\[
\omega_\delta(f)(t) = \sup_{|y| \le \delta} |f(t+y) - f(t)|
\]
whenever the supremum exists. With this definition, $\omega_\delta$ is a sublinear operator on $W$. In fact, provided $\delta \le 1$, $\|\omega_\delta(\varphi)\|_W \le 5\|\varphi\|_W$. Furthermore, Aldroubi and Gröchenig [3] prove the following result.

Lemma 3.5.2. If $\varphi \in W_0$, then $\|\omega_\delta(\varphi)\|_W \to 0$ as $\delta \to 0$.

This property is inherited by the elements of $V^p(\varphi)$. In fact we have

Lemma 3.5.3. If $f = \sum_k c_k \varphi(\cdot - k) \in V^p(\varphi)$, then $\|\omega_\delta(f)\|_{W(\ell^p)} \le \|c\|_{\ell^p} \|\omega_\delta(\varphi)\|_W$.

Proof. Notice that
\[
\omega_\delta(f)(t) = \sup_{|y| \le \delta} \Big| \sum_k c_k \big( \varphi(t + y - k) - \varphi(t - k) \big) \Big| \le \sum_k |c_k|\, \omega_\delta(\varphi)(t - k).
\]
Applying Lemma 3.5.1 gives the result.

We are now in a position to prove the convergence of an iterative sampling scheme for signals in $V(\varphi) = V^2(\varphi)$. In particular we present a simplified version of a sampling result in [3].

Theorem 3.5.4. Let $\varphi \in W_0(L^1)$ be an orthogonal generator for a PSI space $V^2(\varphi)$, $\{\lambda_j\}_{j=-\infty}^{\infty} \subset \mathbb{R}$ be an increasing sequence such that $\lambda_j \to \pm\infty$ as $j \to \pm\infty$ and $\delta = \sup_l |\lambda_{l+1} - \lambda_l|$. Let $v_l = (\lambda_{l+1} - \lambda_{l-1})/2$ and $\Phi(x,t) = \sum_k \varphi(x - k)\bar\varphi(t - k)$ be the reproducing kernel for $V^2(\varphi)$. Then there exists $\delta_0$ such that if $\delta < \delta_0$, the collection $e_l(x) = \sqrt{v_l}\, \Phi(x, \lambda_l)$ $(l \in \mathbb{Z})$ is a frame for $V^2(\varphi)$.
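Proof. Let $\eta_l = (\lambda_{l-1} + \lambda_l)/2$ be the midpoint of the interval $[\lambda_{l-1}, \lambda_l]$. Define $Q : V^2(\varphi) \to L^2(\mathbb{R})$ by $Qf = \sum_l f(\lambda_l) \chi_{[\eta_l, \eta_{l+1}]}$. Then with an application of Lemma 3.5.3 we have
\[
\begin{aligned}
\|f - Qf\|_2^2 &= \sum_l \int_{\eta_l}^{\eta_{l+1}} |f(t) - f(\lambda_l)|^2\, dt \\
&\le \sum_l \int_{\eta_l}^{\eta_{l+1}} |\omega_{\delta/2}(f)(t)|^2\, dt \\
&= \|\omega_{\delta/2}(f)\|_2^2 \le \|c\|_{\ell^2}^2 \|\omega_{\delta/2}(\varphi)\|_W^2 = \|f\|_2^2 \|\omega_{\delta/2}(\varphi)\|_W^2 .
\end{aligned}
\]
Since $\|\omega_{\delta/2}(\varphi)\|_W \to 0$ as $\delta \to 0$ by Lemma 3.5.2, there exists $\delta_0$ such that $\gamma = \|\omega_{\delta/2}(\varphi)\|_W < 1$ for all $\delta < \delta_0$. Then if $\delta < \delta_0$, $\|f - Qf\|_2^2 \le \gamma^2 \|f\|_2^2$ and
\[
(1 - \gamma)^2 \|f\|_2^2 \le \|Qf\|_2^2 \le (1 + \gamma)^2 \|f\|_2^2 . \tag{3.87}
\]
With $e_l$ as in the statement of the theorem, we have
\[
\sum_l |\langle f, e_l\rangle|^2 = \sum_l v_l |\langle f, \Phi(\cdot, \lambda_l)\rangle|^2 = \sum_l v_l |f(\lambda_l)|^2 = \|Qf\|_2^2 .
\]
Hence, by (3.87), $\{e_l\}_{l=-\infty}^{\infty}$ is a frame for $V^2(\varphi)$ with frame bounds $A = (1 - \gamma)^2$ and $B = (1 + \gamma)^2$. The proof is complete.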
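The frame operator takes the form
\[
Tf(x) = \sum_l \langle f, e_l\rangle e_l(x) = \sum_k \varphi(x - k) \Big( \sum_l v_l f(\lambda_l) \bar\varphi(\lambda_l - k) \Big).
\]
If frame iterates $f_n(x) = \sum_k c_k^{(n)} \varphi(x - k)$ $(n \ge 0)$ are defined as in (3.7), then the coefficients $c_k^{(n)}$ are recoverable from the samples $f(\lambda_l)$ via the iteration
\[
c_k^{(0)} = \frac{1}{1 + \gamma^2} \sum_l v_l f(\lambda_l) \bar\varphi(\lambda_l - k); \qquad
c_k^{(n+1)} = c_k^{(n)} + c_k^{(0)} - \frac{1}{1 + \gamma^2} \sum_l B_{kl}\, c_l^{(n)} \tag{3.88}
\]
where $B_{kl} = \sum_j v_j \varphi(\lambda_j - l) \bar\varphi(\lambda_j - k)$. Here the relaxation factor $1/(1+\gamma^2)$ equals $2/(A+B)$ for the frame bounds $A = (1-\gamma)^2$ and $B = (1+\gamma)^2$. Note that if $\varphi$ is compactly supported, then the matrix $B$ is banded and in each iteration of (3.88), the sum has fewer than $2M$ terms with $M$ the length of the support of $\varphi$.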
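The structure of the iteration (3.88) is that of the relaxed frame algorithm $f_{n+1} = f_n + \frac{2}{A+B} T(f - f_n)$. The following is a minimal finite-dimensional sketch of that iteration, assuming a random frame in $\mathbb{R}^5$ in place of the sampling frame $\{e_l\}$; the function name `frame_algorithm`, the matrix `E`, and the use of exact eigenvalues as frame bounds are illustrative assumptions rather than part of the scheme in [3].

```python
import numpy as np

def frame_algorithm(E, coeffs, A, B, n_iter=200):
    """Recover f from its frame coefficients <f, e_l> by the relaxed frame iteration.

    E      : array whose rows are the frame vectors e_l (hypothetical finite frame)
    coeffs : array of the inner products <f, e_l>
    A, B   : lower and upper frame bounds
    """
    relax = 2.0 / (A + B)
    f0 = relax * (E.T @ coeffs)              # f_0 = relax * T f, built from the data only
    fn = f0.copy()
    for _ in range(n_iter):
        # f_{n+1} = f_n + f_0 - relax * T f_n, the analogue of (3.88)
        fn = fn + f0 - relax * (E.T @ (E @ fn))
    return fn

rng = np.random.default_rng(0)
E = rng.standard_normal((40, 5))             # 40 frame vectors in R^5
eigs = np.linalg.eigvalsh(E.T @ E)           # spectrum of the frame operator T = E^T E
A, B = eigs[0], eigs[-1]                     # optimal frame bounds
f = rng.standard_normal(5)
f_rec = frame_algorithm(E, E @ f, A, B)
print(np.max(np.abs(f_rec - f)))
```

The error contracts by a factor of at most $(B-A)/(B+A)$ per step, which for $A = (1-\gamma)^2$, $B = (1+\gamma)^2$ is $2\gamma/(1+\gamma^2)$; a denser sampling set (smaller $\gamma$) therefore gives faster convergence.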
3.5.2 Periodic nonuniform sampling in PSI spaces

The (uniform) interpolation problem for $V(\varphi)$ is: given an offset value $t_0 \in [0,1)$, identify an interpolating function $S_{t_0} \in V(\varphi)$ such that any $f \in V(\varphi)$ can be written
\[
f(t) = \sum_n f(t_0 + n)\, S_{t_0}(t - n). \tag{3.89}
\]
When $t_0 = 0$ this is just interpolation from integer samples and when $\varphi(t)$ is the sinc function, (3.89) is the classical sampling formula (3.37) with $\Omega = 1$. In an important, precise sense (cf. [368]) the sinc function is essentially the only cardinal, orthogonal scaling function.

To describe the role played by the offset $t_0$ when $V(\varphi)$ is not invariant under arbitrary translations, we first consider the problem of interpolation from integer samples in that case. If one has
\[
f(t) = \sum_k c_k \varphi(t - k) = \sum_k f(k)\, S(t - k) \tag{3.90}
\]
with convergence in $L^2(\mathbb{R})$ whenever $f \in V(\varphi)$ then, upon taking the Zak transform of (3.90) one finds that
\[
Zf(t,\xi) = C(\xi)\, Z\varphi(t,\xi) = F(\xi)\, ZS(t,\xi) \tag{3.91}
\]
in which $C(\xi) = \sum_k c_k e^{2\pi i k \xi}$ and $F(\xi) = \sum_k f(k) e^{2\pi i k \xi}$. Equating Fourier coefficients in (3.91) and applying the inversion formula $\int_0^1 Zf(x,\xi)\, d\xi = f(x)$ we obtain, at least formally,
\[
S_0(t + k) = \int_0^1 \frac{Z\varphi(t,\xi)}{\Phi(\xi)}\, e^{-2\pi i k \xi}\, d\xi \tag{3.92}
\]
where, as in Chapter 1, $\Phi(\xi) = Z\varphi(0,\xi)$. Whether (3.92) converges in $V(\varphi)$ depends on the zero set of $\Phi$. In particular, if $\varphi$ has compact support then the integral will be undefined if $\Phi$ has a zero on $\mathbb{T}$. If $\Phi$ does have a zero on $\mathbb{T}$, it might still be possible to verify offset sampling as in (3.89), which applies to any $f \in V(\varphi)$ provided $S_{t_0}$ defined by
\[
S_{t_0}(t + k) = \int_0^1 \frac{Z\varphi(t,\xi)}{Z\varphi(t_0,\xi)}\, e^{-2\pi i k \xi}\, d\xi \tag{3.93}
\]
converges in $L^2$. This holds, in turn, if $|Z\varphi(t_0,\xi)| \ge c > 0$ on $\mathbb{T}$ for some fixed $t_0$. However, the existence of such a $t_0$ is by no means guaranteed, as the following simple example shows. Set $g(t) = i \sin(2\pi t)\, \chi_{[0,1)}(t)$ and
\[
\varphi(t) = e^{2\pi i t} g(t) - g(t - 1) = i \sin(2\pi t) \times
\begin{cases}
e^{2\pi i t}, & \text{if } 0 \le t < 1, \\
-1, & \text{if } 1 \le t < 2, \\
0, & \text{else.}
\end{cases}
\]
Then $\varphi$ is continuous, supported on $[0,2]$, has orthogonal (integer) shifts, satisfies $\hat\varphi(0) = -1/2$, $\|\varphi\|_2 = 1$ and $Z\varphi(t,t) = 0$ for all $t$. In this case, the integral defining $S_{t_0}$ in (3.93) cannot converge for any value of $t_0 \in [0,1)$. Thus (3.89) fails to apply in $V(\varphi)$, no matter what the value of $t_0$.

This contrived example illustrates one possible shortcoming of regular sampling when working in the general setting of PSI spaces. Another limitation,
at least in the case of offset integer sampling, is that the interpolating function may not be well localized. This is the case when $\varphi$ has compact support. Then $Z\varphi(t_0,\xi)$ is a trigonometric polynomial so, even if it does not vanish on $\mathbb{T}$, the Fourier series of $1/Z\varphi(t_0,\xi)$ will have infinitely many terms. Consequently, $S_{t_0}$ defined as above cannot have compact support.

These considerations indicate that a broader class of sampling schemes might be needed in order to establish interpolation formulas for PSI spaces in which the basic interpolating functions have desirable properties. Several different sampling schemes are considered in some detail in the papers [200, 201]. We summarize a few of the key aspects of those papers here.

Periodic nonuniform sampling. Periodic nonuniform sampling (PNS) was introduced by Yen [371], in the context of sampling of bandlimited functions. Bracewell [59] also referred to such sampling as bunched sampling or interlaced sampling. By periodic sampling we mean a sampling pattern that is repeated every $P$ time steps. Then $P$ is called the sampling period. We will always take $P \in \mathbb{N}$. Interpolation operators for PNS take the form
\[
f \mapsto S_{\mathbf{t}} f = \sum_{j=0}^{L-1} \sum_{k \in \mathbb{Z}} f(t_j + Pk)\, S_j(x - Pk), \qquad \mathbf{t} = (t_0, \dots, t_{L-1}). \tag{3.94}
\]
Applications of PNS in the context of bandlimited signals include reduction of channel noise [8] and analysis of multiband signals [188], among others. In order to focus ideas, as well as to juxtapose with the bandlimited case, in what follows we shall always assume that $\varphi$ is continuous and has compact support in $[0,M]$. The PNS pattern is then determined by an offset vector $\mathbf{t} = (t_0, t_1, \dots, t_{L-1})$ in which $0 \le t_0 < t_1 < \cdots < t_{L-1} < P$, that gives rise to the sampling lattice $\mathbf{t} + P\mathbb{Z}$. The average rate of $L/P$ samples per unit time is called the sampling rate. Critical sampling refers to the case $L = P$; oversampling refers to $L > P$. Djokovic and Vaidyanathan [114] introduced several PNS schemes, each of which gives rise to an interpolation formula as in (3.94). We will focus on the oversampling case so as to address aliasing issues. Scaling relations can provide extra leverage in validating oversampling schemes, with control both on the sampling rate and on the supports of the interpolating functions. Although we will not address critical sampling in detail, it is worth mentioning here that, in contrast to simple offset sampling, the more flexible critical PNS schemes can give rise to compactly supported interpolating functions (see [201]).

Oversampling. To simplify notation, we will always assume a unit sampling period and a rate of $L$ samples per unit time. Several important properties of PNS sampling operators boil down to issues of finite-dimensional linear algebra. To take advantage of this aspect, it is useful to introduce some notation. Fix an offset vector $\mathbf{t} : 0 \le t_0 < t_1 < \cdots < t_{L-1} < 1$ and define the vector Zak transform $Z\varphi(\mathbf{t},\xi) = (Z\varphi(t_0,\xi), \dots, Z\varphi(t_{L-1},\xi))$ whose extension to $\mathbb{C}$,
when defined, is given by $Z_C\varphi(\mathbf{t},z) = (Z_C\varphi(t_0,z), \dots, Z_C\varphi(t_{L-1},z))$ where $Z_C\varphi(t_i,z) = \sum_k \varphi(t_i + k) z^k$ $(z \in \mathbb{C})$. We will use "$\cdot$" to denote the Hermitian inner product on $\mathbb{C}^L$. Then $\|Z\varphi(\mathbf{t},\xi)\|^2 = Z\varphi(\mathbf{t},\xi) \cdot Z\varphi(\mathbf{t},\xi)$ is the square of the standard Euclidean length of the vector $Z\varphi(\mathbf{t},\xi)$.

Just as with offset sampling, properties of PNS operators boil down to properties of zeros of Zak transforms. We will say that the polynomials $\{Z_C\varphi(t_i,z)\}_{i=0}^{L-1}$ are coprime if the vector $Z_C\varphi(\mathbf{t},z)$ has no zeros in $\mathbb{C}$. We will consider a condition that guarantees coprimality momentarily, but first we consider its consequences.

Suppose then that $\mathbf{t}$ is chosen so that the polynomials $\{Z_C\varphi(t_j,z)\}_{j=0}^{L-1}$ are coprime. Since they all have degree at most $M - 1$, Euclid's algorithm (e.g., [207]) furnishes a vector polynomial $\mathbf{P} = (P_0, P_1, \dots, P_{L-1})$ with entries of degree at most $M - 2$ such that
\[
\mathbf{P}(z) \cdot Z_C\varphi(\mathbf{t},z) = \sum_{j=0}^{L-1} P_j(z)\, Z_C\varphi(t_j,z) = 1 \tag{3.95}
\]
for all $z \in \mathbb{C}$. Thus, if $f(t) = \sum_k c_k \varphi(t - k) \in V(\varphi)$, so that $Zf(\cdot,\xi) = C(\xi) Z\varphi(\cdot,\xi)$, (3.95) yields
\[
C(z) = C(z)\, \mathbf{P}(z) \cdot Z_C\varphi(\mathbf{t},z) = \mathbf{P}(z) \cdot Z_C f(\mathbf{t},z).
\]
Writing $P_j(z) = \sum_{m=0}^{M-2} p_{jm} z^m$ and $P_j(\xi) = \sum_{m=0}^{M-2} p_{jm} e^{2\pi i m \xi}$, the coefficients $c_k$ of $f$ are recovered by
\[
c_k = \int_0^1 \mathbf{P}(\xi) \cdot Zf(\mathbf{t},\xi)\, e^{-2\pi i k \xi}\, d\xi = \sum_{j=0}^{L-1} \sum_{m=0}^{M-2} p_{jm}\, f(t_j + k - m).
\]
Consequently, $f$ can be reconstructed from its samples along $\mathbf{t} + \mathbb{Z}$ by
\[
f(t) = \sum_k \sum_{j=0}^{L-1} f(t_j + k) \sum_{m=0}^{M-2} p_{jm}\, \varphi(t - m - k) = \sum_k \sum_{j=0}^{L-1} f(t_j + k)\, S_j(t - k). \tag{3.96}
\]
Here, $S_j(t) = \sum_{m=0}^{M-2} p_{jm} \varphi(t - m) \in V(\varphi)$ is supported on $[0, 2M - 2]$.

Desirable properties of the interpolation formula are often obtainable by sampling at a higher rate. For example, under suitable conditions, increasing $L$ allows one to solve (3.95) using polynomials of degree less than $M - 2$ and, hence, interpolating functions with shorter support. A different advantage in terms of aliasing errors will be addressed below.

Validation of oversampling in MRA spaces. When $\varphi$ is a scaling function supported in $[0,M]$, it is possible to validate the convergence of (3.96) in $V(\varphi)$ with a sampling rate that depends (geometrically) on the length of the support of $\varphi$.
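The coefficient recovery behind (3.95) and (3.96) reduces to a small linear system. The sketch below assumes, purely for illustration, the quadratic B-spline on $[0,3]$ as generator (so $M = 3$; it is continuous and compactly supported but not an orthogonal generator, which the algebra of (3.95) does not require) and offsets $\mathbf{t} = (0, 1/2)$. The helper names `bspline2`, `bezout_vector` and `recover_coeffs` are hypothetical, and the Bezout identity is solved by least squares on the coefficient (Sylvester-type) system rather than by Euclid's algorithm.

```python
import numpy as np

def bspline2(t):
    """Quadratic B-spline supported on [0, 3] (assumed generator, M = 3)."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 1), 0.5 * t**2,
           np.where((t >= 1) & (t < 2), 0.5 * (-2 * t**2 + 6 * t - 3),
           np.where((t >= 2) & (t <= 3), 0.5 * (3 - t)**2, 0.0)))

def bezout_vector(zak_polys, M):
    """Solve sum_j P_j(z) Z_j(z) = 1 with deg P_j <= M-2; returns p[j, m]."""
    rows, cols = 2 * M - 2, []
    for zj in zak_polys:
        for m in range(M - 1):
            col = np.zeros(rows)
            col[m:m + M] = zj            # coefficients of z^m * Z_j(z)
            cols.append(col)
    S = np.column_stack(cols)
    rhs = np.zeros(rows)
    rhs[0] = 1.0                         # the constant polynomial 1
    sol, *_ = np.linalg.lstsq(S, rhs, rcond=None)
    assert np.allclose(S @ sol, rhs), "Zak polynomials appear not to be coprime"
    return sol.reshape(len(zak_polys), M - 1)

def recover_coeffs(f, offsets, P, ks):
    """c_k = sum_j sum_m p[j, m] * f(t_j + k - m)."""
    return np.array([sum(P[j, m] * f(tj + k - m)
                         for j, tj in enumerate(offsets)
                         for m in range(P.shape[1]))
                     for k in ks])

M, offsets = 3, (0.0, 0.5)
# Coefficients of Z_C phi(t_j, z) = sum_k phi(t_j + k) z^k, k = 0, ..., M-1
zak_polys = [bspline2(tj + np.arange(M)) for tj in offsets]
P = bezout_vector(zak_polys, M)

c = np.array([1.0, -2.0, 0.5, 3.0])      # test coefficients c_0, ..., c_3

def f(x):
    return float(np.sum(c * bspline2(x - np.arange(len(c)))))

c_rec = recover_coeffs(f, offsets, P, range(len(c)))
print(c_rec)
```

Each recovered coefficient uses only $L(M-1) = 4$ nearby samples, reflecting the compact support of the interpolating functions $S_j$ in (3.96).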
Theorem 3.5.5. If $\varphi$ is a continuous, orthogonal scaling function supported in $[0,M]$, then the polynomials $\{Z_C\varphi(4l/2^M, z)\}_{l=0}^{2^{M-2}-1}$ are coprime.

In the case of Daubechies' 4-coefficient systems, one has $M = 3$ so the theorem applies, for example, when $\mathbf{t} = (0, 1/2)$. The interpolating functions $S_0$ and $S_1$ as in (3.96) then are supported in $[0,4]$. To prove the theorem, we need a preliminary result on the zeros of Zak transforms of compactly supported scaling functions. It depends on scaling properties expressed in the Zak domain; see Section 1.2.2.

Lemma 3.5.6. If $\varphi$ is a continuous scaling function supported on $[0, 2^J + 1]$ and $\varphi(k/2^J) = 0$ for $0 \le k \le 2^J - 1$, then $\varphi \equiv 0$.

The full proof is lengthy but the idea is simple: one uses the scaling equation to deduce that $\varphi$ vanishes at all integers, thus contradicting the partition of unity property $\sum_k \varphi(x + k) = 1$. We illustrate this for the case $J = 3$. Suppose $\varphi$ is as in the statement of the lemma with $J = 3$ and $\varphi(k/8) = 0$ for $0 \le k \le 7$. The scaling equation (1.1) then implies that
\[
0 = \frac{1}{2}\, \varphi\Big( \frac{1}{2} \Big) = \sum_{k=0}^{\infty} h_k \varphi(1 - k) = h_0 \varphi(1).
\]
Thus, $\varphi(1) = 0$ since, without loss of generality, we may assume $h_0 \ne 0$. Using the dilation equation successively at $t = 5/4, 3/2, 7/4$ gives $\varphi(5/4) = \varphi(3/2) = \varphi(7/4) = 0$. Now putting $t = 5/2, 3, 7/2, 4$ gives $\varphi(5/2) = \varphi(3) = \varphi(7/2) = \varphi(4) = 0$. Finally, putting $t = 5, 6, 7, 8$ gives $\varphi(5) = \varphi(6) = \varphi(7) = \varphi(8) = 0$. We are now in a position to prove Theorem 3.5.5.

Proof. Suppose that $Z_C\varphi(4l/2^M, z_0) = 0$ $(0 \le l < 2^{M-2})$. Since $Z_C\varphi(x, 1) = 1$ for all $x$, we see that $z_0 \ne 1$. Further, by Lemma 3.5.6, $z_0 \ne 0$. Choose $z_1$ so that $z_1^2 = z_0$. Then for each $0 \le l \le 2^{M-2} - 1$, we have
\[
\begin{aligned}
0 &= Z_C\varphi\Big( \frac{4l}{2^M}, z_0 \Big) = H(z_1)\, Z_C\varphi\Big( \frac{8l}{2^M}, z_1 \Big) + H(-z_1)\, Z_C\varphi\Big( \frac{8l}{2^M}, -z_1 \Big), \\
0 &= z_1 Z_C\varphi\Big( \frac{4l}{2^M} + \frac{1}{2}, z_0 \Big) = H(z_1)\, Z_C\varphi\Big( \frac{8l}{2^M}, z_1 \Big) - H(-z_1)\, Z_C\varphi\Big( \frac{8l}{2^M}, -z_1 \Big).
\end{aligned}
\]
Hence, $H(z_1) Z_C\varphi(8l/2^M, z_1) = 0 = H(-z_1) Z_C\varphi(8l/2^M, -z_1)$. However, by the QMF condition (1.3), $H(z_1)$ and $H(-z_1)$ cannot both be zero. Thus, by swapping $z_1$ and $-z_1$ if necessary, we see that $Z_C\varphi(8l/2^M, z_1) = 0$ for $0 \le l \le 2^{M-3} - 1$. Since $z_1 \ne z_0$, taking $l = 0$ indicates a second zero $\pm z_1$ of $\Phi(z) = Z_C\varphi(0,z)$. We say $z_0$ is a common zero at level 0 and $z_1$ is a common zero at level 1 and begin a tree of zeros as in Figure 3.1. We can continue this process, producing common zeros $z_2$ at level 2, $z_3$ at level 3 and so on until finally we reach $z_{M-2}$ at level $M - 2$. If $z_0 \notin \mathbb{T}$ then, since $z_{i+1}^2 = z_i$, the collection $0, z_0, z_1, \dots, z_{M-2}$ forms $M$ distinct zeros of $\Phi$, which is a trigonometric polynomial of degree $M - 1$. Hence we have a contradiction.

Fig. 3.1. Seed of the tree of zeroes of $\Phi$ (the string $z_0$, $z_1$)

We are left with the situation in which $z_0 \in \mathbb{T}$. If the zeros $z_0, z_1, \dots, z_{M-2}$ are distinct then we have a contradiction as before. Otherwise, $z_j = z_k$ for some $j < k$. Then $z_0 = z_j^{2^j} = z_k^{2^j} = z_{k-j}$. However, since $z_1 \ne z_0$ (since $z_1^2 = z_0$ and $z_0 \ne 1$), we can assume that $z_0, z_1, \dots, z_{m-1}$ are all distinct with $m \ge 2$. We claim that the string $z_0, z_1, \dots, z_{m-1}$ must branch at some point, i.e., for some $0 \le j \le m - 1$, $w_j = -z_j$ is another zero at level $j$. If this were not the case then, for each $0 \le j \le m - 1$, $z_j$ is a zero at level $j$ but $w_j = -z_j$ is not. Since $Z_C\varphi(2^{2+j} l/2^M, z_j) = Z_C\varphi(2^{2+j} l/2^M + 1/2, z_j) = 0$ $(0 \le l \le 2^{M-2-j} - 1)$, we have
\[
0 = H(z_{j+1})\, Z_C\varphi\Big( \frac{8l}{2^{M-j}}, z_{j+1} \Big) = H(-z_{j+1})\, Z_C\varphi\Big( \frac{8l}{2^{M-j}}, -z_{j+1} \Big).
\]
But $Z_C\varphi(8l/2^{M-j}, -z_{j+1}) \ne 0$, so $H(-z_{j+1}) = 0$. By (1.30), $|H(z_{j+1})| = 1$. But this implies that $\{z_0, z_1, \dots, z_{m-1}\}$ forms a $\tau$-cycle for which $|H(z_n)| = 1$ for $0 \le n \le m - 1$, a contradiction of the $\tau$-cycle condition (see Theorem 1.1.1). Thus, the string must branch at some $z_{j-1}$ as in Figure 3.2.

Fig. 3.2. Branched tree of zeroes (the string $z_0, z_1, \dots, z_{j-1}$ branches into $z_j, z_{j+1}, \dots, z_m$ and $w_j$)
We can use $w_j$ as the seed for a new string of zeros $w_j, w_{j+1}, \dots$ which is easily seen to be distinct from the string $z_0, z_1, \dots, z_{m-1}$. Either this new string reaches level $M - 2$, in which case we have a contradiction as before, or there is a repetition in the new string, in which case it branches. Continuing in this way we find that each new string has length at least 2 since $w_{j+2}^2 = w_{j+1} \ne w_{j+1}^2$. The worst possible scenario (in the sense of production of the fewest new zeros per string) is that in which each new string has length 2. It is clear that after one branching we reach level 2 and the $m$th level is reached after no more than $m - 1$ branchings. In particular, level $M - 2$ is reached in no more than $M - 3$ branchings. In this way we are assured of finding $M - 1$ zeros of $\Phi(z)/z$, a contradiction. The proof is complete.
Aliasing. The interpolating operator defined in (3.94) can be thought of as an oblique projection from some suitable subspace of $L^2(\mathbb{R})$ onto $V(\varphi)$. In the classical theory, aliasing refers to errors that arise when the sampled signal is not actually bandlimited or is limited to a band that is too wide. As such, there is implicit reference to a larger space. If one is willing to accept a signal model that stipulates that all signals are (essentially) bandlimited then, in practice, aliasing becomes a problem when sampling is too sparse, for example, when one can only sample at half of the effective Nyquist rate. In this case one seeks to represent a signal using only half of the required bandwidth. In an MRA, the analogue is that of trying to use integer sampling to represent functions in $V_1$ in terms of elements of $V_0$. The errors incurred in doing so are what we seek to quantify. Recall that the wavelet space $W_0$ is the orthogonal complement of $V_0$ inside $V_1$. As a refinement of the aliasing ideas of Janssen [212], we measure the aliasing performance of a sampling operator $S : V_1 \to V_0$ by its aliasing bounds $N^*(S)$ and $N_*(S)$ defined by
\[ N^*(S) = \sup\Big\{ \frac{\|Sf\|}{\|f\|} \,;\, f \in W_0 \Big\}, \qquad N_*(S) = \inf\Big\{ \frac{\|Sf\|}{\|f\|} \,;\, f \in W_0 \Big\}. \]
$N^*(S)$ is of course the norm of $S$ acting on $W_0$. A low value of $N^*(S)$ means that aliasing effects are weak; that is, signals in $W_0$ are killed off in the sampling procedure. High values of $N_*(S)$ mean that aliasing effects are strong; that is, signals in $W_0$ are not being killed off. In the case of the classical sampling operator $S_c : f \mapsto \sum_k f(k)\,\mathrm{sinc}(t-k)$ acting on $PW_1$, we find $N^*(S_c) = 1$, $N_*(S_c) = 0$. This sampling projection is far from orthogonal.

Aliasing performance of critical sampling operators. The offset sampling operators $S_{t_0}$ map $f \in V(\varphi)$ to $S_{t_0} f(t) = \sum_k f(t_0 + k)\, S_{t_0}(t-k)$ as in (3.89) and were introduced by Janssen [212]. Here $S_{t_0}$ is the interpolating function defined in (3.93) or equivalently by $S_{t_0}(t) = \sum_k s_k \varphi(t-k)$ with $s_k = \int_0^1 (Z\varphi(t_0,\xi))^{-1} e^{-2\pi i k \xi}\, d\xi$. They are well defined whenever $\inf_\xi |Z\varphi(t_0,\xi)| > 0$. In [212], Janssen showed that
\[ N^*(S_{t_0}) = \sup_\xi \Big| \frac{Z\psi(t_0,\xi)}{Z\varphi(t_0,\xi)} \Big|, \qquad N_*(S_{t_0}) = \inf_\xi \Big| \frac{Z\psi(t_0,\xi)}{Z\varphi(t_0,\xi)} \Big| \tag{3.97} \]
when $\varphi$ is an orthogonal scaling function with associated mother wavelet $\psi$. The aliasing bounds can be optimized with respect to $t_0$. In Figure 3.3, $N^*(S_{t_0})$ and $N_*(S_{t_0})$ are computed via (3.97) over a range of values of $t_0$ for the Daubechies D4 scaling function. The norm $N^*(S_{t_0})$ is minimized for $t_0 \approx 0.63$.
Fig. 3.3. Plot of aliasing bounds N ∗ (St0 ) (upper graph) and N∗ (St0 ) (lower graph) for Daubechies D4 scaling function as a function of t0
Aliasing performance of oversampling operators. The classical sampling operator $S_c f(t) = \sum_k f(k)\,\mathrm{sinc}(t-k)$ has aliasing bounds $N^*(S_c) = 1$, $N_*(S_c) = 0$. On the other hand, a simple calculation shows that when $V_0 = V_0(\mathrm{sinc})$ and $W_0$ and $V_1$ are defined accordingly, the classical oversampling operator
\[ O_c f(t) = \frac{1}{2} \sum_k f\Big(\frac{k}{2}\Big)\, \mathrm{sinc}\Big(t - \frac{k}{2}\Big) \]
annihilates W0 . That is, sampling at twice the Nyquist rate eliminates aliasing, and Oc is orthogonal when thought of as a projection from V1 onto V0 . Although this extreme behavior is not observed in the case of general multiresolution spaces, oversampling does typically improve aliasing performance relative to critical sampling. As before, assume that an offset vector t : 0 ≤ t0 < t1 < · · · < tL−1 < 1 is chosen such that ZC ϕ(t, z) is nonvanishing on C. We also continue to assume that ϕ is an orthogonal generator of a PSI
space $V(\varphi)$ supported on $[0, M]$ and that $\mathbf{P} = (P_0, P_1, \dots, P_{L-1})$ is chosen so that (3.95) holds. To emphasize that we are oversampling here, we write
\[ O_{\mathbf t} f(t) = \sum_k \sum_{n=0}^{L-1} f(t_n + k)\, S_n(t-k) \tag{3.98} \]
with $S_n(t) = \sum_{j=0}^{M-2} p_{nj}\, \varphi(t-j)$. The aliasing performance of $O_{\mathbf t}$ is summarized in the following theorem.

Theorem 3.5.7. Let $\varphi$, $\mathbf{t}$, $\mathbf{P}$ and $O_{\mathbf t}$ be as above. Let $\psi$ be another orthogonal generator for a PSI space $V(\psi)$ and
\[ H_{\mathbf t}(\xi) = \mathbf{P}(e^{2\pi i \xi}) \cdot Z_C\psi(\mathbf{t}, e^{2\pi i \xi}). \tag{3.99} \]
Then $O_{\mathbf t}$ satisfies the norm bound $\|O_{\mathbf t}\|_{V(\psi)\to V(\varphi)} = \sup_\xi |H_{\mathbf t}(\xi)|$ while, if $H_{\mathbf t}$ is nonvanishing on $\mathbb{T}$, then $\|(O_{\mathbf t})^{-1}\|_{V(\varphi)\to V(\psi)} = (\inf_\xi |H_{\mathbf t}(\xi)|)^{-1}$.

When $\psi$ is the wavelet associated to $\varphi$, $\|O_{\mathbf t}\|_{V(\psi)\to V(\varphi)}$ has a parallel interpretation to the aliasing norm $N^*$ as does $1/\|(O_{\mathbf t})^{-1}\|_{V(\varphi)\to V(\psi)}$ to $N_*$. Thus we use the notation $N^*(O_{\mathbf t}) = \sup |H_{\mathbf t}(\xi)|$ and $N_*(O_{\mathbf t}) = \inf |H_{\mathbf t}(\xi)|$ when referring to the specific case in which $\varphi$ is an orthogonal scaling function and $\psi$ is an associated orthogonal wavelet. The proof illustrates the role of orthogonality and the fact that scaling plays no role in the actual result, but only in its interpretation.

Proof. Let $g(t) = \sum_l b_l\, \psi(t-l) \in V(\psi)$. Then
\[ O_{\mathbf t}\, g(t) = \sum_k \sum_{n=0}^{L-1} \sum_l b_l\, \psi(t_n + k - l)\, S_n(t-k). \]
Taking the Zak transform of both sides and letting $B(\xi)$ denote the Fourier series of $\{b_k\}$ yields
\[ Z O_{\mathbf t}\, g(t,\xi) = B(\xi) \sum_{n=0}^{L-1} Z\psi(t_n,\xi)\, Z S_n(t,\xi) = B(\xi)\, Z\varphi(t,\xi)\, H_{\mathbf t}(\xi) \]
with $H_{\mathbf t}$ as in (3.99). Since the integer shifts of $\varphi$ are orthonormal,
\[ \|O_{\mathbf t}\, g\|_2^2 = \int_0^1 \!\! \int_0^1 |Z O_{\mathbf t}\, g(t,\xi)|^2 \, dt\, d\xi = \int_0^1 |B(\xi)\, H_{\mathbf t}(\xi)|^2 \, d\xi. \]
The result follows by choosing $B$ to be arbitrarily concentrated near points at which $H_{\mathbf t}$ attains its supremum and infimum, respectively.
Fig. 3.4. Plot of aliasing bounds $N^*(O_{\mathbf t})$ (upper graph) and $N_*(O_{\mathbf t})$ (lower graph) for the Daubechies D4 scaling function where $\mathbf{t} = (0, \alpha)$ as a function of $\alpha$

Fig. 3.5. Plot of aliasing bounds $N^*(O_{\mathbf t})$ (upper graph) and $N_*(O_{\mathbf t})$ (lower graph) for the Daubechies D4 scaling function where $\mathbf{t} = (\alpha, \alpha + 1/2)$ as a function of $\alpha$. Observe that $N_*(O_{\mathbf t})$ is essentially zero for all $\alpha$
When $\varphi$ is the Daubechies D4 scaling function and $\mathbf{t} = (0, \alpha)$ ($0 \leq \alpha < 1$), $N^*(O_{\mathbf t})$ and $N_*(O_{\mathbf t})$ are plotted as functions of $\alpha$ in Figure 3.4. In Figure 3.5 the offset vector is changed to $\mathbf{t} = (\alpha, \alpha + 1/2)$.

Elimination of aliasing with a high sampling rate. The operator $O_{\mathbf t}$ is orthogonal from $W_0$ to $V_0$ precisely when the vector polynomial $\mathbf{P}$ that satisfies (3.95) also satisfies
\[ \mathbf{P}(z) \cdot Z_C\psi(\mathbf{t}, z) \equiv 0. \tag{3.100} \]
To illustrate the complexity of the problem of eliminating aliasing, suppose that $L$ polynomials $P_0, P_1, \dots, P_{L-1}$, each of degree no greater than $D$, can be chosen so that both (3.95) and (3.100) hold. Each $P_j$ is of the form $P_j(z) = \sum_{k=0}^{D} p_{jk} z^k$, so in simultaneously solving (3.95) and (3.100), we have $(D+1)L$ coefficients to determine. On the other hand, the polynomials $\mathbf{P}(z) \cdot Z_C\varphi(\mathbf{t}, z)$ and $\mathbf{P}(z) \cdot Z_C\psi(\mathbf{t}, z)$ have degree $D + M - 1$ when both $\varphi$, $\psi$ are continuously supported in $[0, M]$. Thus, to ensure a joint solution we need $2(D+M) \leq (D+1)L$, or
\[ L \geq \frac{2(D+M)}{D+1}. \tag{3.101} \]
If $M = 3$ and $D = 0$ then we need $L \geq 6$ while if $M = D = 3$ we need $L \geq 3$. For a given support length $M$, in order to make the length $M + D$ of the sampling functions “small,” the sampling rate $L$ must be “large.” In the case of the D4 scaling function and wavelet, one has $M = 3$ so, using interpolating functions of length $3 + 3 = 6$, it is possible to eliminate aliasing (from $V_1$) by sampling at a rate of three samples per unit interval.
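The counting argument behind (3.101) is easy to tabulate. The following is a minimal sketch, not part of the original text: the function name `min_sampling_rate` is hypothetical, and the code only evaluates the smallest integer $L$ allowed by the inequality $2(D+M) \leq (D+1)L$; it does not solve (3.95) or (3.100).

```python
def min_sampling_rate(M: int, D: int) -> int:
    """Smallest integer L with 2(D + M) <= (D + 1) L, i.e. the bound (3.101).

    M is the support length of the generators and D the maximal degree of
    the polynomials P_j (both names follow the text; the function itself is
    an illustrative helper).
    """
    if M < 1 or D < 0:
        raise ValueError("expected M >= 1 and D >= 0")
    unknowns_per_polynomial = D + 1   # coefficients of each P_j
    equations = 2 * (D + M)           # coefficients of the two product polynomials
    # Ceiling division carried out in exact integer arithmetic.
    return -(-equations // unknowns_per_polynomial)


if __name__ == "__main__":
    for M, D in [(3, 0), (3, 1), (3, 3)]:
        print(f"M={M}, D={D}: need L >= {min_sampling_rate(M, D)}")
```

For $M = 3$ the sketch returns 6 at $D = 0$ and 3 at $D = 3$, matching the two cases quoted above, and it makes the trade-off between the length $M + D$ and the rate $L$ explicit for intermediate degrees.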
3.6 Notes

Sufficient conditions for (ir)regular sampling of the short-time Fourier transform. According to Theorem 3.4.1, if $ab > 1$ then the regular Gabor system $G(g, a, b)$ cannot form a frame for $L^2(\mathbb{R})$. When $ab = 1$, there exist square-integrable functions $g$ for which $G(g, a, b)$ is a frame, but by the Balian–Low theorem (Theorem 3.4.3) they must be poorly localized in either time or frequency (or both). However, any reasonable function $g \in L^2(\mathbb{R})$ will generate a Gabor frame $G(g, a, b)$ for sufficiently small $a$, $b$. To see why this should be so, consider the short-time Fourier transform $S(f,g)(t,\xi) = \langle f, M_\xi T_t g \rangle$. Now $\|S(f,g)\|_{L^2(\mathbb{R}^2)} = \|g\|_2 \|f\|_2$, and since $S(f,g)$ is continuous on phase space we might expect that an appropriately fine Riemann sum approximation for the integral defining $\|S(f,g)\|^2_{L^2(\mathbb{R}^2)}$ would provide a good approximation:
\[ \|S(f,g)\|^2_{L^2(\mathbb{R}^2)} \approx ab \sum_{m,n} |S(f,g)(ma, nb)|^2 = ab \sum_{m,n} |\langle f, M_{ma} T_{nb}\, g \rangle|^2 \]
so that $\{M_{ma} T_{nb}\, g\}_{m,n}$ is a frame for $L^2(\mathbb{R})$ with bounds $A, B \approx \|g\|_2^2/(ab)$. Daubechies [98, 99] provides conditions on $g$, $a$, $b$ to ensure that $G(g, a, b)$ is a frame for $L^2(\mathbb{R})$. Many sufficient conditions are available. See [168] for further references. In particular, the Riemann sum approach, which involves analysis on the Heisenberg group, can be found in [139] and references therein. This point of view is taken up in [158] in the case of the wavelet transform, where analysis is done on the $ax + b$-group.

PSWFs as sampling functions. In [359], the PSWFs appear explicitly in sampling formulas for bandlimited functions. It is observed that since $\mathrm{sinc}(\Omega x) \in PW_\Omega$, it may be expanded in a PSWF series: $\Omega\, \mathrm{sinc}(\Omega x) = \sum_{j=0}^{\infty} a_j \psi_j(x)$. By the orthogonality of the $\{\psi_j\}_{j=0}^{\infty}$, we have $a_k = \psi_k(0)$. Similarly, $\Omega\, \mathrm{sinc}(\Omega x - k) = \sum_{j=0}^{\infty} \psi_j(k/\Omega)\, \psi_j(x)$, so that each $f \in PW_\Omega$ admits the expansion
\[ f(x) = \sum_n f\Big(\frac{n}{\Omega}\Big) \frac{\sin \pi(\Omega x - n)}{\pi(\Omega x - n)} = \sum_n f\Big(\frac{n}{\Omega}\Big) \Big[ \sum_{j=0}^{\infty} \psi_j\Big(\frac{n}{\Omega}\Big) \psi_j(x) \Big]. \tag{3.102} \]
Suppose however that $f$ is bandlimited to $[-\Omega/2, \Omega/2]$ and “well concentrated” on $[-1/2, 1/2]$ in the sense that $\int_{|x|>1/2} |f|^2 / \|f\|^2 < \varepsilon$ for some small $\varepsilon$. The cardinal sine expansion does not reflect this concentration since the cardinal sine function is itself poorly localized. Walter and Shen make precise the statement that truncations of the expansion (3.102) provide a good approximation provided $N$ is sufficiently large, i.e.,
\[ f(x) \approx \sum_n f\Big(\frac{n}{\Omega}\Big) \sum_{j=0}^{N-1} \psi_j\Big(\frac{n}{\Omega}\Big) \psi_j(x). \tag{3.103} \]
The decay of the eigenvalues $\gamma_0 > \gamma_1 > \cdots$ enters again, and provides estimates of the length $N$ of the expansion (3.103) to ensure the error does not exceed some specified threshold. For details, see [359].

A note on higher dimensions. The setting in this chapter has been the real line so that phase space is identified with $\mathbb{R}^2$. This choice was made to simplify both the notation and the presentation. The extra geometry of higher dimensions provides a richer theory and a wider range of questions can be asked. For example, Balian–Low type results for regular Gabor frames have been extended to symplectic lattices in [33] and [170]. Numerous other extensions of results for rectangular lattices may be transferred to the symplectic setting (see [168]).

Sharpness of the Balian–Low theorem. In [32] it is shown that for all $\varepsilon > 0$ there exists $g \in L^2(\mathbb{R})$ such that the Gabor system $G(g, 1, 1)$ is an orthonormal basis for $L^2(\mathbb{R})$ with
\[ \Big( \int |g(x)|^2\, \frac{1 + |x|^2}{\log^{1+\varepsilon}(2 + |x|)}\, dx \Big) \Big( \int |\widehat{g}(\xi)|^2\, \frac{1 + |\xi|^2}{\log^{1+\varepsilon}(2 + |\xi|)}\, d\xi \Big) < \infty; \]
cf. Theorem 3.4.3. Hence, the slightest possible weakening of the conditions of Theorem 3.4.3 allows for the existence of regular Gabor frames at the critical rate $ab = 1$.

Consequences of non–translation-invariance in PSI spaces. PSIs typically are not invariant under arbitrary translations. Madych [266] showed that the only MRAs $(V_j, \varphi)$ that are translation-invariant are those for which $|\widehat{\varphi}| = \chi_E$ for some measurable tile $E$ of measure 1 (the minimally supported in frequency, or MSF, scaling functions). The Paley–Wiener space is generated by an MSF scaling function, namely the cardinal sine. Translation-invariance is a very nice property to work with in signal analysis and sampling in particular; that is, it buys you some freedom in initializing a sampling scheme. Imagine a signal in $PW_1$ is sampled uniformly at the critical rate of once per unit time. Suppose the samples are $a_k = f(k + \alpha)$, but the value of $\alpha$ is unknown. If we ignore this uncertainty and assume $\alpha = 0$, the function we reconstruct by the classical sampling theorem is
\[ g(t) = \sum_k a_k\, \mathrm{sinc}(t - k) = \sum_k f(k + \alpha)\, \mathrm{sinc}(t - k) = f(t + \alpha). \]
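The identity $g(t) = f(t + \alpha)$ can be checked numerically. The sketch below is illustrative only and rests on choices that are not in the text: the test signal $f(t) = \mathrm{sinc}^2(t/2)$ (whose Fourier transform is a triangle supported in $[-1/2, 1/2]$, so it lies in $PW_1$ and decays fast enough for a truncated series), the offset $\alpha = 0.3$, and the truncation $|k| \leq 2000$.

```python
import math


def sinc(t: float) -> float:
    """Cardinal sine sin(pi t) / (pi t), with sinc(0) = 1."""
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)


def f(t: float) -> float:
    """Assumed test signal in PW_1: sinc(t/2)^2 is bandlimited to [-1/2, 1/2]."""
    return sinc(t / 2.0) ** 2


def reconstruct_assuming_zero_offset(alpha: float, t: float, K: int = 2000) -> float:
    """Truncated cardinal series built from the samples a_k = f(k + alpha),
    treating them as if they had been taken at the integers (alpha = 0)."""
    return sum(f(k + alpha) * sinc(t - k) for k in range(-K, K + 1))


if __name__ == "__main__":
    alpha = 0.3
    for t in (0.0, 0.5, 1.7, -2.25):
        g_t = reconstruct_assuming_zero_offset(alpha, t)
        print(f"t={t:6.2f}  g(t)={g_t:.8f}  f(t+alpha)={f(t + alpha):.8f}")
```

Up to the truncation error of the series, the printed columns agree: the wrongly initialized reconstruction is just the translate $f(\cdot + \alpha)$.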
Although we would reconstruct $f(\cdot + \alpha)$ rather than $f$, the result is simply a translation of the true signal and in any event, the reconstruction interpolates the data, i.e., $g(k) = f(\alpha + k) = a_k$. In general PSIs, the situation is vastly different and this theme is explored in some detail in [198, 200]. Large errors can be incurred when sampling in a PSI space if incorrect assumptions are made about the location of sampling points. All the sampling procedures for PSI spaces in this chapter rely heavily on knowledge of the location of sampling points. In particular, the sampling procedures used for reconstruction from a lattice $\Lambda \subset \mathbb{R}$ are different if the samples are taken on a translate $\Lambda + \alpha$ of the lattice. Unlike the classical case, the sampling functions we use on $\Lambda + \alpha$ are not simply translates of the sampling functions we use on $\Lambda$. In [198], the extent of these problems is explored. The discrepancy $d_\varphi$ of a PSI space $V(\varphi)$ is defined to be
\[ d_\varphi = \sup_{0 < \alpha < 1}\ \sup_{f \in V(\varphi),\, \|f\| = 1} \|T_\alpha f - P_\varphi T_\alpha f\|. \]
It is shown that if ϕ ∈ W then dϕ = 1, i.e., there exists a signal f ∈ V (ϕ) and 0 ≤ α < 1 such that Tα f is almost orthogonal to V (ϕ)! It is also shown in [198] that oversampling is an effective tool in reducing data translation errors. An algorithm is given which determines the translation of signals in PSI spaces from oversampled data, thus enabling correct initialization of sampling algorithms.
Σ∆-quantization. We have seen several examples in this chapter of perfect reconstruction formulas of the form
\[ f(t) = \sum_n f(t_n)\, \varphi(t - t_n) \]
for $f$ in some space $V$ of continuous functions on the line and $\varphi \in V$ cardinal in the sense that $\varphi(t_n) = \delta_n$. Much has been made of the fact that capturing the samples $\{f(t_n)\}_{n=-\infty}^{\infty}$ gives complete knowledge of $f$. What has been ignored is that the samples are assumed to be known with infinite precision, an impossibility in practice. A natural compromise is to use quantized versions of the samples $s_n = 2^{-N} \lfloor 2^N f(t_n) \rfloor$ as the quantities to be encoded into a bitstream to be stored or transmitted, a method known as pulse code modulation (PCM). An approximate reconstruction of $f$ is obtained from
\[ \tilde{f}(t) = \sum_n s_n\, \varphi(t - t_n). \]
When $\sum_n |\varphi(t - t_n)|$ is uniformly bounded, the $L^\infty$ error satisfies $\|f - \tilde{f}\|_\infty \leq 2^{-N} \sup_t \sum_n |\varphi(t - t_n)|$, which decays exponentially in the number $N$ of bits used per sample. Unfortunately, PCM suffers from several implementation difficulties and engineers prefer to use a method known as oversampled quantization in which samples are encoded with very few bits (perhaps one bit per sample), the downside being that high sampling rates must be used. When one bit is budgeted for each sample, simply rounding off each sample to its sign bit, i.e., $s_n = \mathrm{sgn}(f(t_n))$, is insufficient, even in the bandlimited case, since any two signals within the space that do not change sign would have the same quantized representation. Algorithms known as sigma–delta (Σ∆) quantization partially alleviate this problem. In [172], Güntürk considers one-bit quantization of signals in $PW_\Omega$ using a stable sampling formula and considers approximation properties of representations having the form
\[ \tilde{f}_\lambda(t) = \frac{1}{\lambda} \sum_{n=-\infty}^{\infty} q_n^\lambda\, \varphi\Big(t - \frac{n}{\lambda}\Big) \qquad (t \geq 0) \]
with $q_n^\lambda = \pm 1$. He proves:

Theorem 3.6.1. There exist positive constants $\mu$, $\lambda_0$, $r$ and $C$ such that for any $f \in PW_\Omega$ with $\|f\|_\infty \leq \mu$ and any $\lambda \geq \lambda_0$, there exists a sequence $\{q_n^\lambda\}_{n=-\infty}^{\infty} \subset \{1, -1\}$ and a number $T \geq 0$ independent of $f$ with
\[ \sup_{t \geq T} \Big| f(t) - \frac{1}{\lambda} \sum_{n=-\infty}^{\infty} q_n^\lambda\, \varphi\Big(t - \frac{n}{\lambda}\Big) \Big| \leq C\, 2^{-r\lambda}. \]
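As a small illustration of the two sample-level quantizers mentioned in this note, the following sketch applies the PCM rule $s_n = 2^{-N}\lfloor 2^N f(t_n) \rfloor$ and the naive sign-bit rule to a list of sample values. It is only a sketch under stated assumptions: the function names and the sample values are hypothetical, and no Σ∆ scheme is implemented here.

```python
import math


def pcm_quantize(x: float, N: int) -> float:
    """N-bit PCM value 2^{-N} * floor(2^N * x) of a single sample."""
    scale = 2 ** N
    return math.floor(scale * x) / scale


def max_pcm_error(samples, N: int) -> float:
    """Largest per-sample error |x - s|; it is always below 2^{-N}."""
    return max(abs(x - pcm_quantize(x, N)) for x in samples)


def sign_bits(samples):
    """Naive one-bit quantizer s_n = sgn(x_n), with sgn(0) taken as +1."""
    return [1 if x >= 0 else -1 for x in samples]


if __name__ == "__main__":
    # Hypothetical sample values standing in for f(t_n).
    samples = [0.8 * math.cos(0.37 * n) for n in range(50)]
    for N in (4, 8, 12):
        print(f"N={N:2d}  max error={max_pcm_error(samples, N):.3e}  bound={2.0 ** -N:.3e}")

    # Two different sign-preserving sample sets collapse to the same bits.
    print(sign_bits([0.2, 0.5, 0.1]) == sign_bits([0.9, 0.1, 0.7]))
```

The per-sample PCM error halves with every extra bit, which is the exponential decay in $N$ noted above, while the last line shows why sign bits alone cannot separate two positive signals.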
There is by now a significant engineering literature on Σ∆ quantization, and a rather smaller but growing mathematical literature; see [100, 172, 173] and references therein.

Generalizing Shannon sampling. Recent work of Smale and Zhou [325] addresses the problem of approximately reconstructing an analog signal $f$ from discrete, noisy sampled data under fairly general assumptions on the noise and the sampling methods, but with a view toward applications to likely convergence of fast algorithms based on sparse data. To set the stage, let $X \subset \mathbb{R}^n$ be closed and let $K : X \times X \to \mathbb{R}$ be a symmetric kernel. One fixes a discrete set $D \subset X$ such that $\{K(d_1, d_2)\}_{d_1, d_2 \in D}$ defines a positive, bounded and continuously invertible matrix on $\ell^2(D)$, and a discrete sampling set $S \subset X$ such that the matrix $\{K(s, d)\}$ is bounded from $\ell^2(D)$ to $\ell^2(S)$. An input space $H_{K,D}$ of analog signals is defined by completing the span of the functions $K_d(x) = K(d, x)$ with respect to the inner product $\langle K_{d_1}, K_{d_2} \rangle_K = K(d_1, d_2)$. One models noise in terms of a Borel measure $\rho$ on $X \times \mathbb{R}$. It is assumed that for each $x \in X$, the conditional measure $\rho_x(A) = \rho(x, A)$ is a probability measure with zero mean supported on $[-M_x, M_x]$ such that $M(S) = (\sum_{s \in S} M_s^2)^{1/2} < \infty$. With these conventions, one defines a noisy sampling operator on $H_{K,S}$ by sampling noisy data $f^*(x) = f(x) + \eta_x$ along $S$. Here, $\eta_x$ is randomly generated from $\rho_x$. Because of the role of the discrete set $D$ in defining the input space $H_{K,S}$, in order to minimize $\sum_{s \in S} |\tilde{f}(s) - S(f)(s)|^2$ among $\tilde{f} \in H_{K,D}$, the matrix $\{K(d, s)\} \cdot \{K(s, d)\}$ should have a bounded inverse on $\ell^2(D)$.

Under these hypotheses, Smale and Zhou show that, with the pseudoinverse matrix $\{L(d, s)\} = (\{K(d, s)\} \cdot \{K(s, d)\})^{-1} \cdot \{K(d, s)\}$, the interpolating operator $\tilde{f}(x) = \sum_{d \in D} K_d(x) \sum_{s \in S} L(d, s)\, f^*(s)$ serves to approximate $f \in H_{K,D}$ with error bounds expressed through the incantation “for any $\varepsilon > 0$, $\|\tilde{f} - f^*\|_K \leq \kappa \sigma^2 + \varepsilon$ with probability $1 - \delta$.” The quantity $\kappa$ is defined precisely in terms of the matrix norms of $\{K(d, d)\}$ and $L$, while $\delta$ also depends on the variance $\sigma^2 = \sum_{s \in S} \sigma^2(\rho_s) \sum_{d \in D} K(d, s)^2$ and on the magnitude $M(S)$. Applications to (irregular) sampling of bandlimited signals and to subsampling are outlined in [325].
4 Bases for time–frequency analysis
In the first part of this chapter we construct various bases having good time– frequency localization. The construction of wavelet packets, for example, is presented in Section 4.3. Wavelets are, at least metaphorically, localized in the upper time–frequency plane on rectangles whose heights are inversely proportional to the lengths of their time intervals. Wavelet packets arise when a pair of wavelets living on adjacent time intervals is replaced by a pair of packets whose time interval is the union of the wavelet time intervals and whose frequency intervals are the lower and upper halves of the wavelet frequency intervals. Further wavelet packets can be obtained by repeating this type of recombination. When suitably defined, such packets give rise to a general class of basis functions associated with certain rectangles in the plane. An alternative approach is to start with the Gabor tiling. The Balian– Low theorem asserts that Gabor functions cannot have good time–frequency localization. However, Wilson basis functions—in which the role of exponentials in Gabor functions is replaced by cosines—can have good time–frequency localization. They can be regarded as living on rectangles in the upper half plane having a unit time interval (support) and a unit frequency interval corresponding to the frequency of the cosine. Wilson bases having optimal time–frequency localization are constructed in Section 4.1. Local trigonometric bases (LTBs) generalize Wilson bases in the sense of being localized on tiles whose time intervals are determined by a partition of R into intervals of not necessarily uniform length. They are constructed explicitly in Section 4.2. In a certain sense, the splittings in time leading to LTBs are dual to the splittings in frequency leading to wavelet packets (the analogy is made explicit in the chapter notes). Neither lead to arbitrary tilings. 
It is natural to consider the possibility of tiling phase space by mixing wavelet packet and LTB based recombinations. However, the separate recombinations are possible only because of particular conditions on the “starting” basis functions: the recombination transformations are not arbitrary. In Section 4.4 we consider uncertainty issues that must be addressed in order to build bases adapted to given tilings.
In Section 4.5, we begin with a review of the Walsh functions. If one replaces the concept of frequency of a trigonometric function by sequency (rate of sign change) of a Walsh function, one can think of Walsh packets—shifted and dilated Walsh functions—alternately as wavelet packets—which they truly are—or as discrete analogues of LTBs. In this way one is equipped with an uncertainty-free Walsh model of the time–sequency plane in which Walsh packets live precisely on time–sequency rectangles—analogues of Heisenberg tiles. Following Thiele’s thesis [347] we then discuss combinatorial consequences of this Walsh picture, some of which will be applied to operator theory in Chapter 7. One particularly appealing result is the following: if a region in the plane can be expressed as a finite pairwise disjoint union of tiles, then there is a well-defined orthogonal projection onto the subspace of L2 (R) spanned by the Walsh packets of those tiles (Section 4.5.1). What is important here is that any other covering by pairwise disjoint tiles defines the same projection. This becomes particularly useful when one associates a measure of information cost to a tiling: if the measure satisfies certain properties then one can define a best tiling or best basis to a region, along with a fast algorithm for computing a best basis expansion (Section 4.5.2). In Section 4.6 we consider a finite version of the Walsh plane for working with finite data and a generalization of it to the phase plane for finite Abelian groups.
4.1 Wilson bases and the Zak transform

According to the Balian–Low theorem, if the functions $g_{kn}(x) = e^{2\pi i n x} g(x - k)$ form a Riesz basis for $L^2(\mathbb{R})$ then $g$ cannot be well localized both in time and in frequency. Wilson (see [99] for history) observed that matters are not as drastic as they seem: good localization is possible if the exponentials $e^{2\pi i n x}$ are replaced by sines and cosines. Given a suitable window function $w$, the Wilson basis construction of Daubechies et al. [105] yields functions
\[ \theta_k^n(x) = \begin{cases} \sqrt{2}\, w(x - k/2), & \text{if } n = 0,\ k \text{ even}, \\ 2\, w(x - k/2) \cos(2\pi n x), & \text{if } n \in \{1, 2, 3, \dots\},\ k \text{ even}, \\ 2\, w(x - (k+1)/2) \sin(2\pi (n+1) x), & \text{if } n \in \{0, 1, 2, \dots\},\ k \text{ odd}. \end{cases} \]
The alternating polarities of the contiguous shifts arise naturally as we will see, also in the case of local trigonometric bases in Section 4.2. Requiring that the $\theta_k^n$ be orthogonal imposes a strong constraint on the class of admissible windows. The constraint is not so severe in the case of Riesz bases and allows for modified Gaussian windows as Coifman and Meyer [90] showed. They considered specific Gaussians of the form $\exp(-\zeta(x - 1/4)^2)$ in which $\mathrm{Re}(\zeta) > 0$. The reason for this particular shift has to do with a folding operation introduced below. Bittner [54] used the Zak transform to construct a general family of biorthogonal Wilson bases on $\mathbb{R}$, including the Gaussian bases. We consider here a slightly simpler case of his construction.

Let $w$ be real-valued and symmetric with respect to $x = 1/4$. The Zak transform of $w$ therefore satisfies
\[ Zw\Big(\frac{1}{2} - x, \xi\Big) = \overline{Zw(x, \xi)}. \tag{4.1} \]
We will also want to assume that $w$ is well localized. A natural hypothesis is that both $w$ and $\widehat{w}$ belong to the Wiener space $W$ defined in Chapter 3. Then the sum defining $Zw(x, \xi)$ converges uniformly to a continuous function. One defines the folding matrix $M_w(x, \xi)$ by
\[ M_w(x, \xi) = \begin{bmatrix} \overline{Zw(x, \xi)} & \overline{Zw(-x, \xi)} \\ -Zw(-x, \xi) & Zw(x, \xi) \end{bmatrix} \tag{4.2} \]
for $(x, \xi) \in Q^+ = [0, 1/2) \times [-1/2, 1/2)$. Notice that $M_w(x, \xi)$ has determinant $\det M_w(x, \xi) = |Zw(x, \xi)|^2 + |Zw(-x, \xi)|^2$. Define the folding operator $T_w$ on $L^2(\mathbb{R})$ by
\[ \begin{bmatrix} ZT_w f(x, \xi) \\ ZT_w f(-x, \xi) \end{bmatrix} = M_w(x, \xi) \begin{bmatrix} Zf(x, \xi) \\ Zf(-x, \xi) \end{bmatrix}, \qquad (x, \xi) \in Q^+. \tag{4.3} \]
To fix some notation, set
\[ e_\varepsilon^n(x) = \begin{cases} \sqrt{2}, & \text{for } n = 0,\ \varepsilon = 0, \\ 2 \cos 2\pi n x, & \text{for } n \in \{1, 2, 3, \dots\},\ \varepsilon = 0, \\ 2 \sin(2(n+1)\pi x), & \text{for } n \in \{0, 1, 2, \dots\},\ \varepsilon = 1. \end{cases} \tag{4.4} \]
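The normalizations in (4.4) are chosen so that, for each fixed $\varepsilon$, the functions $e_\varepsilon^n$ are orthonormal on an interval of length $1/2$. A minimal numerical check on $[0, 1/2]$ is sketched below; the helper names `e_fn` and `inner_half_interval` are hypothetical, and the midpoint rule with 4000 nodes is simply a convenient choice.

```python
import math


def e_fn(n: int, eps: int, x: float) -> float:
    """The functions e^n_eps of (4.4): cosines for eps = 0, sines for eps = 1."""
    if eps == 0:
        return math.sqrt(2.0) if n == 0 else 2.0 * math.cos(2.0 * math.pi * n * x)
    return 2.0 * math.sin(2.0 * (n + 1) * math.pi * x)


def inner_half_interval(n: int, m: int, eps: int, steps: int = 4000) -> float:
    """Midpoint-rule approximation of the integral of e^n_eps * e^m_eps over [0, 1/2]."""
    h = 0.5 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += e_fn(n, eps, x) * e_fn(m, eps, x)
    return h * total


if __name__ == "__main__":
    for eps in (0, 1):
        for n in range(3):
            row = [round(inner_half_interval(n, m, eps), 6) for m in range(3)]
            print(f"eps={eps}, n={n}: {row}")
```

Each printed Gram matrix is, up to rounding, the identity, which is the orthonormality used later for the system $e_k^n \chi_{[k/2,(k+1)/2)}$.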
More generally, for integers $n$ and $k$, $n \geq 0$, set $e_k^n = e^n_{k \bmod 2}$ and define
\[ \psi_k^n(x) = e_k^n(x)\, w\Big(x - \frac{k}{2}\Big). \tag{4.5} \]
Since the functions $e_k^n$ are periodic,
\[ Z\psi_{2k}^n(x, \xi) = e^{2\pi i k \xi}\, e_0^n(x)\, Zw(x, \xi) \quad \text{and} \quad Z\psi_{2k+1}^n(x, \xi) = e^{2\pi i k \xi}\, e_1^n(x)\, Zw\Big(x - \frac{1}{2}, \xi\Big). \tag{4.6} \]
Boundedness of $Zw$ is enough to ensure boundedness of $T_w$ on $L^2(\mathbb{R})$ and invertibility of $M_w$ ensures invertibility of $T_w$ as the following result demonstrates.

Proposition 4.1.1. If $Zw$ is bounded, then $T_w$ is bounded on $L^2(\mathbb{R})$ and
\[ \|T_w\|^2 \leq \sup_{(x,\xi) \in Q^+} |\det M_w(x, \xi)|. \]
Further, if $\inf_{(x,\xi) \in Q^+} |\det M_w(x, \xi)| > 0$, then $T_w$ is invertible and
\[ \|T_w^{-1}\|^2 \leq \Big( \inf_{(x,\xi) \in Q^+} |\det M_w(x, \xi)| \Big)^{-1}. \]
Proof. From (4.3), (4.2), and the unitarity of the Zak transform,
\[ \begin{aligned} \|T_w f\|_2^2 &= \iint_Q |ZT_w f(x, \xi)|^2 \, dx\, d\xi \\ &= \iint_{Q^+} \big( |ZT_w f(x, \xi)|^2 + |ZT_w f(-x, \xi)|^2 \big) \, dx\, d\xi \\ &= \iint_{Q^+} \big( |\overline{Zw(x, \xi)}\, Zf(x, \xi) + \overline{Zw(-x, \xi)}\, Zf(-x, \xi)|^2 \\ &\qquad\qquad + |Zw(x, \xi)\, Zf(-x, \xi) - Zw(-x, \xi)\, Zf(x, \xi)|^2 \big) \, dx\, d\xi \\ &= \iint_Q \det M_w(x, \xi)\, |Zf(x, \xi)|^2 \, dx\, d\xi \end{aligned} \]
from which the result follows.

The upper and lower norm bounds of Proposition 4.1.1 are sharp. To see this, observe that $\det M_w(-x, \xi) = \det M_w(x, \xi)$. One can choose a bounded continuous function $E(x, \xi)$ on $Q^+$, concentrated near the set on which $\det M_w(x, \xi)$ attains its supremum, in such a way that for any $\eta > 0$,
\[ \iint_{Q^+} \det M_w(x, \xi)\, |E(x, \xi)|^2 \, dx\, d\xi \geq (1 - \eta)\, \|\det M_w\|_\infty \iint_{Q^+} |E(x, \xi)|^2 \, dx\, d\xi. \]
For $-1/2 < x < 0$, let $E(x, \xi) = E(-x, \xi)$. Upon extending $E$ quasiperiodically to the plane ($E(x + k, \xi + l) = e^{-2\pi i k \xi} E(x, \xi)$), $E$ becomes the Zak transform of a function $g \in L^2(\mathbb{R})$, namely $g(x) = \int_0^1 E(x, \xi)\, d\xi$. Then
\[ \begin{aligned} \|T_w g\|_2^2 &= \iint_{Q^+} \det M_w(x, \xi) \big( |E(x, \xi)|^2 + |E(-x, \xi)|^2 \big) \, dx\, d\xi \\ &= 2 \iint_{Q^+} \det M_w(x, \xi)\, |E(x, \xi)|^2 \, dx\, d\xi \\ &\geq 2 (1 - \eta) \sup_{(x,\xi) \in Q^+} \det M_w(x, \xi) \iint_{Q^+} |E(x, \xi)|^2 \, dx\, d\xi \\ &= (1 - \eta) \sup_{(x,\xi) \in Q^+} \det M_w(x, \xi) \iint_Q |Zg(x, \xi)|^2 \, dx\, d\xi \\ &= (1 - \eta) \sup_{(x,\xi) \in Q^+} \det M_w(x, \xi)\, \|g\|_2^2. \end{aligned} \]
Sharpness of the lower bound is similarly proved.

The connection between the folding operator and the basis functions $e_k^n$ and $\psi_k^n$ is provided by the following result.

Proposition 4.1.2. Let $T_w$, $e_k^n$ and $\psi_k^n$ be as above. Then for all $f \in L^2(\mathbb{R})$,
\[ \langle f, \psi_k^n \rangle = \int_{k/2}^{(k+1)/2} T_w f(x)\, e_k^n(x)\, dx. \]
Proof. The result hinges on the simple fact that
\[ Z(\chi_{[k, k+1/2)}\, e_0^n)(x, \xi) = \begin{cases} e^{2\pi i k \xi}\, e_0^n(x), & \text{for } x \in [0, 1/2), \\ 0, & \text{for } x \in [-1/2, 0); \end{cases} \]
\[ Z(\chi_{[k+1/2, k+1)}\, e_1^n)(x, \xi) = \begin{cases} e^{2\pi i (k+1) \xi}\, e_1^n(x), & \text{for } x \in [-1/2, 0), \\ 0, & \text{for } x \in [0, 1/2). \end{cases} \]
The unitarity of the Zak transform, even symmetry of $e_0^n$, (4.3) and (4.6) then imply that
\[ \begin{aligned} \langle f, \psi_{2k}^n \rangle &= \iint_Q Zf(x, \xi)\, \overline{Z\psi_{2k}^n(x, \xi)} \, dx\, d\xi \\ &= \iint_Q Zf(x, \xi)\, \overline{Zw(x, \xi)}\, e^{-2\pi i k \xi}\, e_0^n(x) \, dx\, d\xi \\ &= \iint_{Q^+} \big( Zf(x, \xi)\, \overline{Zw(x, \xi)} + Zf(-x, \xi)\, \overline{Zw(-x, \xi)} \big)\, e^{-2\pi i k \xi}\, e_0^n(x) \, dx\, d\xi \\ &= \iint_Q ZT_w f(x, \xi)\, \overline{Z(\chi_{[k, k+1/2)}\, e_0^n)(x, \xi)} \, dx\, d\xi \\ &= \int_k^{k+1/2} T_w f(x)\, e_0^n(x) \, dx. \end{aligned} \]
A similar calculation, using the fact that $e_1^n$ is odd, applies to $\langle f, \psi_{2k+1}^n \rangle$.

The uniform upper and lower bounds on $\det M_w$ were shown in Proposition 4.1.1 to be equivalent to the boundedness and invertibility of $T_w$. The following result shows that this is in turn equivalent to the system $\{\psi_k^n\}$ ($n \geq 0$, $k \in \mathbb{Z}$) forming a Riesz basis.

Theorem 4.1.3. Let the functions $\{\psi_k^n\}$ be defined as in (4.5). Then $\{\psi_k^n\}$ forms a Riesz system with lower Riesz bound $A = \inf_{(x,\xi) \in Q^+} \det M_w(x, \xi)$ and upper Riesz bound $B = \sup_{(x,\xi) \in Q^+} \det M_w(x, \xi)$.

Proof. Suppose $0 < A \leq B < \infty$. Then $T_w$ and its adjoint $T_w^*$ are bounded with bounded inverse. Since the collection $e_k^n \chi_{[k/2, (k+1)/2)}$ ($n \geq 0$, $k \in \mathbb{Z}$) is an orthonormal basis for $L^2(\mathbb{R})$, for each $f \in L^2(\mathbb{R})$ we have
\[ (T_w^*)^{-1} f = \sum_{k,n} \big\langle (T_w^*)^{-1} f,\ e_k^n \chi_{[k/2, (k+1)/2)} \big\rangle\, e_k^n \chi_{[k/2, (k+1)/2)}. \]
Hence, $f$ admits the expansion
\[ f = T_w^* (T_w^*)^{-1} f = \sum_{k,n} \big\langle (T_w^*)^{-1} f,\ e_k^n \chi_{[k/2, (k+1)/2)} \big\rangle\, T_w^*\big(e_k^n \chi_{[k/2, (k+1)/2)}\big). \tag{4.7} \]
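However, by Proposition 4.1.2, for all $g \in L^2(\mathbb{R})$
\[ \langle g, \psi_k^n \rangle = \big\langle T_w g,\ \chi_{[k/2, (k+1)/2)}\, e_k^n \big\rangle = \big\langle g,\ T_w^*(\chi_{[k/2, (k+1)/2)}\, e_k^n) \big\rangle, \]
so that $T_w^*(\chi_{[k/2, (k+1)/2)}\, e_k^n) = \psi_k^n$ and from (4.7) we have
\[ f = \sum_{k,n} \big\langle (T_w^*)^{-1} f,\ e_k^n \chi_{[k/2, (k+1)/2)} \big\rangle\, \psi_k^n \]
so that the system $\{\psi_k^n\}$ is complete. To check the Riesz bounds, let $\{c_{kn}\}$ be a square-summable sequence. Then since $\|T_w^*\| = \|T_w\|$,
\[ \begin{aligned} \Big\| \sum_{k,n} c_{kn} \psi_k^n \Big\|^2 &= \Big\| T_w^* \Big( \sum_{k,n} c_{kn}\, e_k^n \chi_{[k/2, (k+1)/2)} \Big) \Big\|^2 \\ &\leq \|T_w^*\|^2\, \Big\| \sum_{k,n} c_{kn}\, e_k^n \chi_{[k/2, (k+1)/2)} \Big\|^2 \\ &= \|T_w\|^2 \sum_{k,n} |c_{kn}|^2 = \sup_{(x,\xi) \in Q} \det M_w(x, \xi)\, \|c\|_{\ell^2}^2 \end{aligned} \]
which is the desired upper Riesz bound. Also, $T_w^*$ is invertible and $\|(T_w^*)^{-1}\|^2 = \|(T_w^{-1})^*\|^2 = \|T_w^{-1}\|^2 = (\inf_{(x,\xi) \in Q} \det M_w(x, \xi))^{-1}$. Hence
\[ \begin{aligned} \sum_{k,n} |c_{kn}|^2 &= \Big\| \sum_{k,n} c_{kn}\, e_k^n \chi_{[k/2, (k+1)/2)} \Big\|^2 = \Big\| \sum_{k,n} c_{kn}\, (T_w^*)^{-1}(\psi_k^n) \Big\|^2 \\ &\leq \|(T_w^*)^{-1}\|^2\, \Big\| \sum_{k,n} c_{kn} \psi_k^n \Big\|^2 = \Big( \inf_{(x,\xi) \in Q} \det M_w(x, \xi) \Big)^{-1} \Big\| \sum_{k,n} c_{kn} \psi_k^n \Big\|^2 \end{aligned} \]
which gives the desired lower Riesz bound.

We now compute the dual basis for $\{\psi_k^n\}$. Define a dual window $\tilde{w}$ by
\[ \tilde{w}(x) = \int_{-1/2}^{1/2} \frac{Zw(x, \xi)}{\det M_w(x, \xi)}\, d\xi \]
and in analogy with (4.5), a system $\{\tilde{\psi}_k^n\}$ by
\[ \tilde{\psi}_k^n(x) = e_k^n(x)\, \tilde{w}\Big(x - \frac{k}{2}\Big). \]
Then we have:

Theorem 4.1.4. $\{\tilde{\psi}_k^n\}$ forms a Riesz basis biorthogonal to $\{\psi_k^n\}$.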
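Proof. With these definitions, $Z\tilde{w}$ is continuous and bounded since
\[ Z\tilde{w}(x, \xi) = \frac{Zw(x, \xi)}{\det M_w(x, \xi)}. \]
Therefore, $\det M_{\tilde{w}}(x, \xi) = (\det M_w(x, \xi))^{-1}$, so that $\{\tilde{\psi}_l^m\}$ forms a Riesz basis with lower Riesz bound $(\sup_{(x,\xi) \in Q} \det M_w(x, \xi))^{-1}$ and upper Riesz bound $(\inf_{(x,\xi) \in Q} \det M_w(x, \xi))^{-1}$. The biorthogonality condition $\langle \psi_k^n, \tilde{\psi}_l^m \rangle = \delta_{nm} \delta_{kl}$ will be verified by considering the relative parity of the indices $k$, $l$. First,
\[ \begin{aligned} \langle \psi_{2k}^n, \tilde{\psi}_{2l}^m \rangle &= \iint_Q e^{2\pi i (k-l) \xi}\, e_0^n(x)\, e_0^m(x)\, \frac{|Zw(x, \xi)|^2}{\det M_w(x, \xi)} \, dx\, d\xi \\ &= \iint_{Q^+} e^{2\pi i (k-l) \xi}\, e_0^n(x)\, e_0^m(x) \, dx\, d\xi = \delta_{kl}\, \delta_{nm} \end{aligned} \]
where we have used the orthogonality of $\{e^{2\pi i k \xi}\}_{k=-\infty}^{\infty}$ on $[-1/2, 1/2]$ and the orthogonality of $\{e_0^n\}_{n=0}^{\infty}$ on $[0, 1/2]$. Next,
\[ \begin{aligned} \langle \psi_{2k+1}^n, \tilde{\psi}_{2l+1}^m \rangle &= \iint_Q e^{2\pi i (k-l) \xi}\, e_1^n(x)\, e_1^m(x)\, \frac{|Zw(x - 1/2, \xi)|^2}{\det M_w(x, \xi)} \, dx\, d\xi \\ &= \iint_{Q^+} e^{2\pi i (k-l) \xi}\, e_1^n(x)\, e_1^m(x) \, dx\, d\xi = \delta_{kl}\, \delta_{nm}. \end{aligned} \]
Finally, we have
\[ \begin{aligned} \langle \psi_{2k}^n, \tilde{\psi}_{2l+1}^m \rangle &= \iint_Q e^{2\pi i (k-l) \xi}\, e_0^n(x)\, e_1^m(x)\, \frac{Zw(x, \xi)\, \overline{Zw(x - 1/2, \xi)}}{\det M_w(x, \xi)} \, dx\, d\xi \\ &= \iint_{Q^+} e^{2\pi i (k-l) \xi}\, \frac{e_0^n(x)\, e_1^m(x)}{\det M_w(x, \xi)} \Big[ Zw(x, \xi)\, \overline{Zw\Big(x - \frac{1}{2}, \xi\Big)} - Zw(-x, \xi)\, \overline{Zw\Big(-x - \frac{1}{2}, \xi\Big)} \Big] \, dx\, d\xi = 0 \end{aligned} \]
since the symmetry condition (4.1) and the quasiperiodicity of Zak transforms force the integrand to vanish.

Consider now the problem of identifying the dual basis in the case where the window is the shifted Gaussian $w(x) = \exp(-\alpha (x - 1/4)^2)$ with $\alpha > 0$, which has the desired symmetry, $w(1/2 - x) = w(x)$. Direct calculation followed by Poisson summation shows that
\[ \begin{aligned} \det M_w(x, \xi) &= \Big( \sum_k e^{-\alpha (2x - 1/2 + k)^2 / 2} \Big) \Big( \sum_m e^{-\alpha m^2 / 2}\, e^{2\pi i m \xi} \Big) \\ &= \sqrt{\frac{2\pi}{\alpha}}\, \Big( \sum_k e^{-\alpha (2x - 1/2 + k)^2 / 2} \Big) \Big( \sum_m e^{-2\pi^2 (\xi + m)^2 / \alpha} \Big). \end{aligned} \]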
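The upper and lower Riesz bounds then take the form [54]
\[ A = \Big( \sum_k e^{-\alpha (1/2 + k)^2 / 2} \Big) \Big( \sum_m (-1)^m e^{-\alpha m^2 / 2} \Big) < \Big( \sum_k e^{-\alpha k^2 / 2} \Big)^2 = B \]
and the dual basis is generated by the dual window
\[ \tilde{w}(x) = \frac{\sum_k \gamma_k\, e^{-\alpha (2x - 1/2 + k)^2 / 2}}{\sum_l e^{-\alpha (2x - 1/2 + l)^2 / 2}}; \qquad \gamma_k = \sqrt{\frac{\alpha}{2\pi}} \int_{-1/2}^{1/2} \frac{e^{2\pi i k \xi}}{\sum_m e^{-2\pi^2 (\xi + m)^2 / \alpha}}\, d\xi. \]
When $\alpha = 2\pi$, $\tilde{w}$ itself is very nearly a shifted Gaussian.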
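The Gaussian Riesz bounds can be evaluated directly from the product formula for $\det M_w$. The sketch below is an illustration under stated assumptions: the function names, the truncation of the sums at $|k| \leq 60$, and the grid over $Q^+$ are choices made here, and $B$ is taken to be the supremum of the product, i.e. the square of $\sum_k e^{-\alpha k^2/2}$.

```python
import math


def gaussian_sum(alpha: float, shift: float = 0.0, alternating: bool = False, K: int = 60) -> float:
    """Truncated sum over k of (+/-1)^k * exp(-alpha * (k + shift)^2 / 2)."""
    total = 0.0
    for k in range(-K, K + 1):
        sign = -1.0 if (alternating and k % 2) else 1.0
        total += sign * math.exp(-alpha * (k + shift) ** 2 / 2.0)
    return total


def riesz_bounds(alpha: float):
    """Infimum A and supremum B of det M_w for w(x) = exp(-alpha (x - 1/4)^2)."""
    A = gaussian_sum(alpha, shift=0.5) * gaussian_sum(alpha, alternating=True)
    B = gaussian_sum(alpha) ** 2
    return A, B


def det_M(alpha: float, x: float, xi: float, K: int = 60) -> float:
    """Product formula for det M_w(x, xi); the xi-sum is real because its
    terms are even in m, so only the cosine part is kept."""
    in_x = sum(math.exp(-alpha * (2.0 * x - 0.5 + k) ** 2 / 2.0) for k in range(-K, K + 1))
    in_xi = sum(math.exp(-alpha * m * m / 2.0) * math.cos(2.0 * math.pi * m * xi)
                for m in range(-K, K + 1))
    return in_x * in_xi


if __name__ == "__main__":
    alpha = 2.0 * math.pi
    A, B = riesz_bounds(alpha)
    n = 40  # grid over Q+ = [0, 1/2) x [-1/2, 1/2); it contains x = 0, 1/4 and xi = -1/2, 0
    values = [det_M(alpha, i / (2.0 * n), -0.5 + j / n) for i in range(n) for j in range(n)]
    print(f"A = {A:.6f}, grid minimum = {min(values):.6f}")
    print(f"B = {B:.6f}, grid maximum = {max(values):.6f}")
```

On this grid the extreme values of the product coincide with $A$ and $B$: the minimum is attained at $(x, \xi) = (0, -1/2)$ and the maximum at $(1/4, 0)$.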
4.2 Local trigonometric bases

4.2.1 Smooth localization

The Wilson bases have good joint time and frequency-pair localization. The basis elements are approximately localized on frequency pairs of Heisenberg tiles of fixed length and width. In contrast, wavelet basis elements can be thought of as being concentrated on Heisenberg boxes of unit area but of constant relative bandwidth based on scale. By modifying the Wilson basis construction, it is possible to construct bases whose elements live under overlapping windows having compact supports of arbitrary lengths. This corresponds to tiling the time–frequency plane with rectangle pairs whose sides have constant ratio over fixed time intervals, but whose ratios can change from one time interval to the next. The construction of these local trigonometric bases on $\mathbb{R}$ is attributed to Coifman and Meyer [89] (see also [10] and [240]), although discrete analogues were introduced first by Malvar [269] (see [189] for further insights). In the forthcoming construction only adjacent intervals have overlapping windows. The case of multiple but finitely overlapping windows was also considered by Herley et al. [187]. Remaining problems in higher dimensions will not be addressed here.

Let $\eta(x)$ be a function that is non-negative, symmetric with respect to $x = 0$, supported in $[-1, 1]$ and having integral $\pi/2$. We can take $\eta$ to be $C^\infty$ if we wish. Set $\theta(x) = \int_{-\infty}^{x} \eta(t)\, dt$. Then $\theta(x) - \pi/4$ is smooth, nondecreasing, and antisymmetric. One can also define $\theta_\varepsilon(x) = \theta(x/\varepsilon)$. It follows that $\sin(\theta_\varepsilon(x))$ is also nondecreasing such that $\sin(\theta_\varepsilon(x)) = 0$ if $x < -\varepsilon$, $\sin(\theta_\varepsilon(x)) = 1$ if $x > \varepsilon$ and $\sin(\theta_\varepsilon(0)) = 1/\sqrt{2}$. Notice also that $\sin(\theta_\varepsilon(-x)) = \sin(\pi/2 - \theta_\varepsilon(x)) = \cos(\theta_\varepsilon(x))$. We will abbreviate $\sin(\theta_\varepsilon(x)) = s_\varepsilon(x)$ and $\cos(\theta_\varepsilon(x)) = c_\varepsilon(x)$. Suppose now that $I = [x_0, x_1)$ is a nontrivial interval. Choose numbers $\alpha$ and $\beta$ such that $x_0 + \alpha \leq x_1 - \beta$. Define $b_I(x) = s_\alpha(x - x_0)\, c_\beta(x - x_1)$.
The function bI (x) is called the bell over I because the graph of bI is, more or less, bell-shaped: it vanishes to the left of x0 − α, increases until it reaches the value one at x0 + α, stays flat on (x0 + α, x1 − β) then decreases steadily until it vanishes to the right of x1 + β.
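The construction of the bell can be sketched numerically. The following is a minimal illustration, not part of the original construction: it assumes the particular profile $\eta(t) = (\pi^2/8)\cos(\pi t/2)$ on $[-1, 1]$, which is non-negative, symmetric, and has integral $\pi/2$ but is only continuous rather than $C^\infty$. The function names `theta`, `s`, `c`, and `bell` and the numerical values of $x_0, x_1, \alpha, \beta$ are hypothetical choices made for the example.

```python
import numpy as np

def theta(x):
    """theta(x) = integral of eta up to x for the assumed profile
    eta(t) = (pi**2 / 8) * cos(pi * t / 2) on [-1, 1], zero elsewhere."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    return (np.pi / 4.0) * (1.0 + np.sin(np.pi * x / 2.0))

def s(x, eps):
    """Rising cutoff s_eps(x) = sin(theta(x / eps))."""
    return np.sin(theta(np.asarray(x, dtype=float) / eps))

def c(x, eps):
    """Falling cutoff c_eps(x) = cos(theta(x / eps))."""
    return np.cos(theta(np.asarray(x, dtype=float) / eps))

def bell(x, x0, x1, alpha, beta):
    """Bell b_I(x) = s_alpha(x - x0) * c_beta(x - x1) over I = [x0, x1)."""
    x = np.asarray(x, dtype=float)
    return s(x - x0, alpha) * c(x - x1, beta)

# Illustrative parameters satisfying x0 + alpha <= x1 - beta
x0, x1, alpha, beta = 0.0, 1.0, 0.25, 0.4
x = np.linspace(-1.0, 2.0, 3001)
b = bell(x, x0, x1, alpha, beta)

# Local bell condition near the left endpoint: b(2*x0 - x)^2 + b(x)^2
fold_left = bell(2 * x0 - x, x0, x1, alpha, beta) ** 2 + b ** 2
```

On the overlap region $|x - x_0| \le \alpha$ the array `fold_left` equals one up to rounding, which is the identity $b_I^2(2x_0 - x) + b_I^2(x) = 1$ that drives the orthogonality arguments of this section.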
Normally one thinks of the localization of a function $f$ to the interval $[x_0, x_1)$ as the product of $f$ with the characteristic function of $[x_0, x_1)$. Such a cutoff clearly defines a projection onto a subspace of $L^2(\mathbb{R})$ and two such projections are orthogonal when their cutoff intervals are disjoint. We aim to build analogous projections with smooth, hence necessarily overlapping cutoffs. Orthogonality is no longer automatic and requires folding at endpoints.

Definition 4.2.1. Let $I = [x_0, x_1)$ and let $b_I$ be as above. The localization $S_I f$ of $f$ to $I$ is defined as
\[ S_I f(x) = b_I(x) f(x) + b_I(2x_0 - x) f(2x_0 - x) - b_I(2x_1 - x) f(2x_1 - x). \]
One now defines an operator $P_I f(x) = b_I(x)\, S_I f(x)$. Symmetry properties of $s_\alpha$ and $c_\beta$ show that $S_I f(x)$ is symmetric on $(x_0 - \alpha, x_0 + \alpha)$ with respect to its midpoint $x_0$ while it is antisymmetric with respect to $x_1$ on $(x_1 - \beta, x_1 + \beta)$. This guarantees that $P_I$ defines a projection.

Proposition 4.2.2. $P_I \circ P_I = P_I$. In fact, $f = P_I(f)$ if and only if $f = b_I S$ with $S$ even with respect to $x_0$ on $(x_0 - \alpha, x_0 + \alpha)$ and odd with respect to $x_1$ on $(x_1 - \beta, x_1 + \beta)$.

Proof. The first statement follows from the second. Suppose now that $S$ has the indicated local symmetries. Then we claim that $S_I(b_I S) = S$ on $[x_0 - \alpha, x_1 + \beta)$, in which case $b_I S$ is certainly in the range of $P_I = b_I S_I$. The trick is to analyze the regions on which the behavior of the bell varies. If $x_0 - \alpha \le x \le x_0 + \alpha$ then $x_0 - \alpha \le 2x_0 - x \le x_0 + \alpha$, so $b_I(2x_1 - x) = 0$ and
\begin{align*}
S_I(b_I S)(x) &= b_I^2(x) S(x) + b_I^2(2x_0 - x) S(2x_0 - x) - b_I^2(2x_1 - x) S(2x_1 - x) \\
&= s_\alpha^2(x - x_0)\, S(x) + c_\alpha^2(x - x_0)\, S(2x_0 - x) \\
&= \bigl(s_\alpha^2(x - x_0) + c_\alpha^2(x - x_0)\bigr) S(x) = S(x),
\end{align*}
where we have used the symmetry of $S$ on $(x_0 - \alpha, x_0 + \alpha)$. If $x_0 + \alpha \le x \le x_1 - \beta$, then $b_I(x) = 1$ and $b_I(2x_0 - x) = b_I(2x_1 - x) = 0$ so that $S_I(b_I S)(x) = S(x)$.
Finally, if x1 − β ≤ x ≤ x1 + β, then x1 − β ≤ 2x1 − x ≤ x1 + β and bI (2x0 − x) = 0 and SI (bI S)(x) = c2β (x − x1 ) S(x) − c2β (x1 − x) S(2x1 − x) = S(x) where we have used the anti-symmetry of S on (x1 − β, x1 + β). Thus for such a function S one has SI (bI S) = S and, hence, PI (bI S) = bI S and bI S is in the range of PI . This completes the proof. The aim here is to build a family of orthogonal projections that give rise to a basis for L2 (R). Suppose that I = [x0 , x1 ) and J = [x1 , x2 ) are two adjacent intervals. Bells bI (x) and bJ (x) for I and J are said to be compatible provided bI (x) = sα (x − x0 )cβ (x − x1 ) while bJ (x) = sβ (x − x1 )cγ (x − x2 ) in which x0 + α ≤ x1 − β < x1 + β ≤ x2 − γ. The point is that the inner cutoff terms need to match up. Given such compatible bells one defines the bell bI∪J (x) = sα (x − x0 )cγ (x − x2 ).
Proposition 4.2.3. Suppose that $I$ and $J$ are contiguous intervals as above with compatible bells $b_I$, $b_J$ and let $b_{I\cup J}$ be defined as above. Then $P_I + P_J = P_{I\cup J}$ while $P_I P_J = P_J P_I = 0$.

Proof. The second statement follows from the first along with the idempotency of the projections, since then one has
\[ P_{I\cup J} = (P_I + P_J)^2 = P_{I\cup J} + P_I P_J + P_J P_I, \]
which shows that $P_I P_J = -P_J P_I$. On the other hand, $P_I P_J = P_I^2 P_J = -P_I P_J P_I = P_J P_I^2 = P_J P_I$, which provides a contradiction unless $P_I P_J = 0$. The first statement is proved directly from the definitions using observations similar to those used in proving Proposition 4.2.2 and we leave it as an exercise.

Armed with Proposition 4.2.3 one can build a resolution of the identity by choosing a strictly increasing sequence $\{x_k\} \subset \mathbb{R}$ such that $\lim_{k\to\pm\infty} x_k = \pm\infty$ and assigning to each interval $I_k = [x_{k-1}, x_k)$ the operator $P_k = P_{I_k} = b_{I_k} S_{I_k}$ in such a way that contiguous intervals are equipped with compatible bells. It then follows from Proposition 4.2.3 that
\[ \sum_{k=-N}^{M} P_k f = P_{[x_{-N-1},\, x_M]} f \to f \quad \text{in } L^2(\mathbb{R}) \text{ as } N, M \to \infty. \]
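The projection identities of Propositions 4.2.2 and 4.2.3 can be checked on a sampled signal. The sketch below is only an illustration under stated assumptions: it works on an integer grid so that reflections about the endpoints map grid points to grid points, it measures the cutoff widths in samples, and it uses the illustrative profile $\theta(u) = (\pi/4)(1 + \sin(\pi u/2))$ on $[-1, 1]$. The helper names `reflect` and `projection` and all numerical parameters are hypothetical.

```python
import numpy as np

def theta(u):
    # Assumed profile with theta(-u) = pi/2 - theta(u); constant outside [-1, 1]
    u = np.clip(u, -1.0, 1.0)
    return (np.pi / 4.0) * (1.0 + np.sin(np.pi * u / 2.0))

def bell(k, m_left, m_right, w_left, w_right):
    """Bell over the index interval [m_left, m_right) with cutoff widths in samples."""
    return np.sin(theta((k - m_left) / w_left)) * np.cos(theta((k - m_right) / w_right))

def reflect(v, m):
    """Samples of v(2*x_m - x): index i is sent to 2*m - i (zero when off-grid)."""
    idx = 2 * m - np.arange(len(v))
    ok = (idx >= 0) & (idx < len(v))
    out = np.zeros_like(v)
    out[ok] = v[idx[ok]]
    return out

def projection(f, b, m_left, m_right):
    """P_I f = b_I * S_I f with even folding at the left endpoint, odd at the right."""
    g = b * f
    return b * (g + reflect(g, m_left) - reflect(g, m_right))

n = 501
k = np.arange(n)
m0, m1, m2 = 100, 200, 350           # endpoints of I = [m0, m1) and J = [m1, m2)
alpha, beta, gamma = 30, 40, 50      # compatible widths: m0+alpha <= m1-beta, m1+beta <= m2-gamma
bI = bell(k, m0, m1, alpha, beta)
bJ = bell(k, m1, m2, beta, gamma)
bIJ = bell(k, m0, m2, alpha, gamma)

rng = np.random.default_rng(0)
f = rng.standard_normal(n)
PI_f = projection(f, bI, m0, m1)
PJ_f = projection(f, bJ, m1, m2)
PIJ_f = projection(f, bIJ, m0, m2)

print(np.max(np.abs(projection(PI_f, bI, m0, m1) - PI_f)))   # idempotency defect
print(np.max(np.abs(PI_f + PJ_f - PIJ_f)))                   # additivity defect
print(np.max(np.abs(projection(PJ_f, bI, m0, m1))))          # size of P_I P_J f
```

All three printed quantities are at the level of rounding error. The cancellation in the last two relies on the shared cutoff width `beta` at the common endpoint, which is exactly the compatibility condition on the bells.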
Thus, to build an orthonormal basis for $L^2(\mathbb{R})$ of functions localized about the interval $I_k$ it suffices to build, for each $k$, an orthonormal basis $\{e_{nk}\}_n$ of functions on $I_k$ having the indicated symmetry properties. Then the functions $b_{I_k} e_{nk}$ will be orthogonal in $L^2(\mathbb{R})$ and, being in the image of $P_k$, will automatically be orthogonal to corresponding orthogonal families generated over $P_{k'}$ for $k' \ne k$. Here is the idea.

Proposition 4.2.4. Suppose that $\{e_n\}$ forms an orthonormal basis for $L^2(I)$ where $I = [x_0, x_1)$. Given $\alpha, \beta > 0$ with $\alpha + \beta < x_1 - x_0$, let $\tilde e_n$ be obtained by symmetric extension of $e_n$ on $(x_0 - \alpha, x_0 + \alpha)$ and by antisymmetric extension of $e_n$ on $(x_1 - \beta, x_1 + \beta)$. Then the functions $b_I \tilde e_n$ form an orthonormal family in $L^2(\mathbb{R})$.

Proof. First notice that
\[ \langle b_I \tilde e_n, b_I \tilde e_m \rangle = \Bigl\{ \int_{x_0-\alpha}^{x_0+\alpha} + \int_{x_0+\alpha}^{x_1-\beta} + \int_{x_1-\beta}^{x_1+\beta} \Bigr\}\, b_I^2(x)\, \tilde e_n(x)\, \tilde e_m(x)\, dx. \tag{4.8} \]
The symmetry properties of the extensions $\tilde e_n$ and the change of variable $x \to 2x_0 - x$ give
\begin{align*}
\int_{x_0-\alpha}^{x_0+\alpha} b_I^2(x)\, \tilde e_n(x)\, \tilde e_m(x)\, dx &= \int_{x_0}^{x_0+\alpha} \bigl[ b_I^2(2x_0 - x) + b_I^2(x) \bigr]\, e_n(x)\, e_m(x)\, dx \\
&= \int_{x_0}^{x_0+\alpha} e_n(x)\, e_m(x)\, dx
\end{align*}
since $b_I^2(2x_0 - x) + b_I^2(x) = 1$ on $[x_0, x_0 + \alpha)$. Similarly, since $\tilde e_n \tilde e_m$ is locally symmetric about $x_1$,
\[ \int_{x_1-\beta}^{x_1+\beta} b_I^2(x)\, \tilde e_n(x)\, \tilde e_m(x)\, dx = \int_{x_1-\beta}^{x_1} e_n(x)\, e_m(x)\, dx. \]
Adding now the terms in (4.8) and using the fact that $b_I(x) = 1$ inside $(x_0 + \alpha, x_1 - \beta)$, one finds that $\langle b_I \tilde e_n, b_I \tilde e_m \rangle = \int_{x_0}^{x_1} e_n(x)\, e_m(x)\, dx = \delta_{nm}$ as claimed.

The particular structure of the local basis functions is not so crucial as is the local bell condition $b_I^2(2x_0 - x) + b_I^2(x) = 1$.

Corollary 4.2.5. Suppose that for each $k \in \mathbb{Z}$ one has an orthonormal basis $\{e_{nk}\}$ for $L^2(I_k)$. Then $\{b_{I_k} \tilde e_{nk}\}_{n,k}$ forms an orthonormal basis for $L^2(\mathbb{R})$.

The corollary follows from the orthogonality of the projection operators $P_k$ along with the fact that $P_k(b_{I_k} \tilde e_{nk}) = b_{I_k} \tilde e_{nk}$ by the symmetry properties of $\tilde e_{nk}$ and the characterization of the range of $P_I$ from Proposition 4.2.2.

Example. The functions $\{\sqrt{2}\cos((2n+1)\pi x/2)\}_{n=0,1,\dots}$ form an orthonormal basis for $L^2[0, 1]$. Further, the basis functions are locally symmetric near $x = 0$ and locally antisymmetric near $x = 1$. By dilating and shifting one can obtain a corresponding basis for $I = [x_0, x_1)$, namely
\[ e_n(x) = \sqrt{\frac{2}{|I|}}\, \cos\Bigl( \pi\, \frac{2n+1}{2|I|}\, (x - x_0) \Bigr) \qquad (n \ge 0), \]
which are locally symmetric near $x = x_0$ and locally antisymmetric near $x = x_1$. By choosing a corresponding basis for each interval $I_k$ one obtains an orthonormal basis for $L^2(\mathbb{R})$. Figure 4.1 illustrates several cosine packets.

Remark. The particular symmetries chosen at the left and right endpoints of $I$ are not crucial for building a projection over $I$. For example, by changing the “−” to a “+” in the definition of $S_I$ one would obtain a function $S_I f$ that is locally symmetric at both endpoints of $I$; by changing the “+” to a “−” one would obtain a function that is locally antisymmetric at both endpoints of $I$. The operator $b_I S_I$ would be idempotent in either case. However, in order that projections corresponding to contiguous intervals be orthogonal, one must have opposite polarity of the two projections at any shared endpoint, as is the case when symmetry is imposed at the left endpoint and antisymmetry at the right endpoint.
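The orthonormality asserted in Corollary 4.2.5 for the cosine example can be checked by quadrature. The sketch below is an illustration under assumptions, not the book's computation: it uses the profile $\theta(u) = (\pi/4)(1 + \sin(\pi u/2))$ on $[-1, 1]$, two adjacent intervals with hypothetical endpoints and cutoff widths, the first six cosines on each interval, and a simple Riemann sum on a fine grid.

```python
import numpy as np

def theta(u):
    # Assumed profile; any admissible theta with theta(-u) = pi/2 - theta(u) would do
    u = np.clip(u, -1.0, 1.0)
    return (np.pi / 4.0) * (1.0 + np.sin(np.pi * u / 2.0))

def bell(x, x0, x1, a, b):
    return np.sin(theta((x - x0) / a)) * np.cos(theta((x - x1) / b))

def local_cosines(x, x0, x1, a, b, count):
    """Rows are b_I(x) * e_n(x), n = 0..count-1, with e_n the dilated, shifted cosines."""
    length = x1 - x0
    n = np.arange(count)[:, None]
    e = np.sqrt(2.0 / length) * np.cos(np.pi * (2 * n + 1) * (x[None, :] - x0) / (2.0 * length))
    return bell(x, x0, x1, a, b)[None, :] * e

# Two contiguous intervals I = [0, 1), J = [1, 3) with compatible bells
x0, x1, x2 = 0.0, 1.0, 3.0
alpha, beta, gamma = 0.3, 0.4, 0.5
x = np.linspace(x0 - alpha, x2 + gamma, 40001)
h = x[1] - x[0]

U = np.vstack([local_cosines(x, x0, x1, alpha, beta, 6),
               local_cosines(x, x1, x2, beta, gamma, 6)])
gram = h * (U @ U.T)    # Riemann-sum approximation of all pairwise inner products
print(np.max(np.abs(gram - np.eye(12))))
```

The Gram matrix agrees with the identity up to quadrature error. No explicit extension step is needed here because the cosine formula, evaluated outside $I$, is already even about $x_0$ and odd about $x_1$.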
By choosing other symmetries one can build local bases of sinusoids. The bases obtained in the corollary, starting from trigonometric functions $\{e_{nk}\}$, are called local trigonometric bases.

Fig. 4.1. Plots of cosine packets of different scales and frequencies. [Plot titled "Some Cosine Packets": seven packets labeled (1, 0, 64), (2, 1, 32), (3, 2, 16), (4, 4, 8), (5, 8, 4), (6, 16, 2) and (7, 32, 1), drawn over the interval [0, 1].]

4.2.2 Locally bandlimited functions

The techniques just outlined are capable of generating large families of bases of $L^2(\mathbb{R})$. Intuitively, such bases are appropriate for analyzing signals having components of different onsets, durations, and basic waveforms. Best basis algorithms (e.g., [364]) seek to choose, among a given family of bases, one that optimizes a given function of the cost of computing approximate signal expansions.

If, for the sake of argument, one wishes to expand $f$ in local trigonometric functions, there is still the question of how to choose the segmentation points $\{x_k\}$ from which the basic bells are chosen. There are several issues here and we will consider just a few. First, often one does not know, a priori, what are good points at which to segment a given signal. One can attempt a type of recursive dyadic partitioning (see Section 2.4). However, the signal may have sharply localized features near points of the form $k/2^L$ where $k$ is odd and $L$ large, meaning there is a high cost in searching for good break points. A second issue is that time–frequency localization of bells $b_I(x) = s_\alpha(x - x_0)\, c_\beta(x - x_1)$ depends on the factors $\alpha, \beta$ that determine the steepness of the cutoff. One has $\alpha + \beta \le x_1 - x_0$ and, ideally, one wants $\alpha \sim \beta \sim (x_1 - x_0)/2$ in order not to introduce artificial high frequencies into localized signal components. Good time–frequency localization under adjacent intervals then requires that these intervals have lengths of roughly the same magnitude. Jawerth and Sweldens [216] suggest a type of multiscale folding to address this issue. A third issue is that of dealing with discrete data. Malvar's lapped orthogonal transforms (LOTs) are the discrete version of Coifman and Meyer's local trigonometric bases. When working with discrete data one is constrained by the Nyquist rate, whereas when working with LOTs one hopes that the signal is locally well approximated by local trigonometric terms of low degree. A local sampling theorem should tell us how long windows must be in order to contain well-defined frequency content of a sampled bandlimited signal. Such
a local sampling theorem was deduced by Bernardini and Vetterli [49]. The theorem really applies to the Wilson bases, that is, local cosine bases with uniform bells. Bernardini and Vetterli defined a subspace $V_j$ of $L^2(\mathbb{R})$ as $V_j = \mathrm{span}\{g_{jk} : k \in \mathbb{Z}\}$ in which
\[ g_{jk}(t) = b(t - k)\, \cos\Bigl( \pi \Bigl( j + \frac{1}{2} \Bigr) t \Bigr). \]
Here, $b$ is a bell function based on $[0, 1]$ as described in Section 4.2. From these $V_j$ one can build $U^{(N)} = \mathrm{span}\{g_{jk} : 0 \le j < N,\ k \in \mathbb{Z}\}$. Then $U^{(N)}$ may be thought of as a class of real signals that are locally bandlimited of order $N$. The space $U^{(N)}$ depends on the specific bell used in the definition of $g_{jk}$, so the extent to which signals can be regarded as approximately locally bandlimited is open to interpretation.

Given a continuous-time function $f$ and a positive integer $N$, consider the sample sequence $f^{(N)}[n] = f(1/(2N) + n/N)$ ($n \in \mathbb{Z}$). Similarly, we define $b^{(N)}[n] = b(1/(2N) + n/N)$ and $g_{jk}^{(N)}[n] = g_{jk}(1/(2N) + n/N)$. The sampling sequence $\{1/(2N) + n/N\}_n$ is chosen as it is in order to preserve local symmetry properties of signals with respect to integers. A space of discrete-time signals is obtained by sampling the signals in $U^{(N)}$; it consists of the sequences $\{f^{(N)}[n]\}_{n \in \mathbb{Z}}$ with $f \in U^{(N)}$. As is shown in [49], local symmetry properties inherited from $\{g_{jk}\}$ are enough to show that $\{g_{jk}^{(N)}\}_{j=0}^{N-1}$ is an orthonormal basis for this sampled space.

Consider now the kernel
\[ R(t, x) = \sum_{k} \sum_{j=0}^{N-1} g_{jk}(t)\, g_{jk}(x) \]
which, by the orthonormality of $\{g_{jk}\}$, is the reproducing kernel for $U^{(N)}$. The following result can be viewed as an analogue of the classical sampling theorem. It applies to continuous-time signals that are locally bandlimited and was first proved by Bernardini and Vetterli [49].

Theorem 4.2.6. If $f \in U^{(N)}$, then $f$ admits the sampling representation
\[ f(t) = \sum_{n} f^{(N)}[n]\, R\Bigl( t, \frac{n}{N} \Bigr). \]
As a basic component of the proof, one must relate the coefficients of the continuous representation to those of the discrete representation.

Proof. If $f \in U^{(N)}$, then there are constants $a_{jk}$ such that
\[ f(t) = \sum_{k} \sum_{j=0}^{N-1} a_{jk}\, g_{jk}(t). \tag{4.9} \]
Since $f$ is continuous, sampling both sides of this equation at $t = 1/(2N) + n/N$ yields
\[ f^{(N)}[n] = f(1/(2N) + n/N) = \sum_{k} \sum_{j=0}^{N-1} a_{jk}\, g_{jk}^{(N)}[n]. \]
Applying the discrete orthonormality of $\{g_{jk}^{(N)}\}_{jk}$ then gives
\[ a_{jk} = \sum_{n} f^{(N)}[n]\, g_{jk}^{(N)}[n]. \tag{4.10} \]
Substituting (4.10) into (4.9) gives the result.
4.3 Wavelet packet bases

4.3.1 High- and low-pass filters

Recall that scaling functions $\varphi(x)$ are associated with scaling sequences $\{h_k\}$ such that (1.2), i.e., $\hat\varphi(2\xi) = H(\xi)\hat\varphi(\xi)$, holds, where the low-pass scaling filter is $H(\xi) = \sum_k h_k e^{-2\pi i k\xi}$. When $\varphi$ is orthogonal to its shifts, by (1.3) the high-pass filter $G(\xi) = \sum_k g_k e^{-2\pi i k\xi}$ with $g_k = (-1)^k \bar h_{1-k}$ satisfies the identity $|H(\xi)|^2 + |G(\xi)|^2 \equiv 1$. As in Section 1.1.2 one defines the discrete convolution–decimation operator $\mathcal{H}$ acting on sequences by
\[ (\mathcal{H}c)_k = \sqrt{2} \sum_l \bar h_{l-2k}\, c_l. \]
The adjoint operator $\mathcal{H}^*$, which acts by upsampling followed by convolution, is given by
\[ (\mathcal{H}^* c)_k = \sqrt{2} \sum_l h_{k-2l}\, c_l. \]
Operators $\mathcal{G}$ and $\mathcal{G}^*$ are defined similarly by replacing the coefficients $h_k$ by $g_k = (-1)^k \bar h_{1-k}$, the Fourier coefficients of $G$. The QMF condition may be written
\[ \mathcal{H}^*\mathcal{H} + \mathcal{G}^*\mathcal{G} = I \quad \text{and} \quad \mathcal{G}\mathcal{H}^* = \mathcal{H}\mathcal{G}^* = 0 \tag{4.11} \]
which means that the dual high- and low-pass filter pairs give rise to a perfect reconstruction subband scheme. From the point of view of functions, on the other hand, the conditions (4.11) account for the direct sum decomposition $V_0 = W_{-1} \oplus V_{-1}$ in terms of multiresolution spaces and this is the point of view that we will build on here. At the level of coefficients $\{c_k\}$ of $f(x) = \sum_k c_k \varphi(x - k)$ in $V_0$ this decomposition is accomplished through the mappings $c^{(1)} = \mathcal{H}c$ and $d^{(1)} = \mathcal{G}c$. The operators $\mathcal{H}$ and $\mathcal{G}$ are also used to obtain $V_{-1} = V_{-2} \oplus W_{-2}$ so that one has $V_0 = V_{-2} \oplus W_{-2} \oplus W_{-1}$. Wavelet decompositions amount to iterations of this
procedure. The idea behind wavelet packets is simply that the wavelet spaces can also be decomposed. Given a scaling function $\varphi$ that is orthogonal to its integer shifts, basic wavelet packets are defined recursively as follows. Set
\[ w_0(x) = \varphi(x) = 2 \sum_k h_k\, \varphi(2x - k); \qquad w_1(x) = \psi(x) = 2 \sum_k g_k\, \varphi(2x - k). \]
Notice that if $\bar{\mathcal{H}} : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ is the convolution–decimation operator $(\bar{\mathcal{H}}c)_k = \sqrt{2} \sum_l h_{l-2k}\, c_l$, with a similar definition for $\bar{\mathcal{G}}$, then
\[ w_0(x) = \sqrt{2}\, (\bar{\mathcal{H}} w_0(2x - \cdot))_0, \qquad w_1(x) = \sqrt{2}\, (\bar{\mathcal{G}} w_0(2x - \cdot))_0. \]
Iterating the operators $\mathcal{H}$ and $\mathcal{G}$ on $w_0$ and $w_1$ gives rise to
\begin{align*}
w_{2n}(x) &= 2 \sum_k h_k\, w_n(2x - k) = \sqrt{2}\, (\bar{\mathcal{H}} w_n(2x - \cdot))_0, \\
w_{2n+1}(x) &= 2 \sum_k g_k\, w_n(2x - k) = \sqrt{2}\, (\bar{\mathcal{G}} w_n(2x - \cdot))_0.
\end{align*}
The QMF property of the filters $H(\xi)$ and $G(\xi)$ is of course shared by $H(-\xi)$ and $G(-\xi)$. Consequently, the discrete filters $\bar{\mathcal{H}}$ and $\bar{\mathcal{G}}$ satisfy $\bar{\mathcal{H}}^*\bar{\mathcal{H}} + \bar{\mathcal{G}}^*\bar{\mathcal{G}} = I$. Hence, reconstruction of $w_n$ from $w_{2n}$ and $w_{2n+1}$ is achieved by
\begin{align*}
w_n(2x) &= \frac{1}{\sqrt{2}} \Bigl[ (\bar{\mathcal{H}}^* w_{2n}(x - \cdot))_0 + (\bar{\mathcal{G}}^* w_{2n+1}(x - \cdot))_0 \Bigr] \\
&= \sum_l \bar h_{2l}\, w_{2n}(x + l) + \bar g_{2l}\, w_{2n+1}(x + l). \tag{4.12}
\end{align*}
The formulas express the fact that wavelet packets are being employed to decompose wavelet subspaces further into high- and low-frequency components. To express this view more precisely, define
\[ \Omega_n = \Bigl\{ \sum_k a_k\, w_n(x - k) : \{a_k\} \in \ell^2(\mathbb{Z}) \Bigr\}. \]
Then, setting $\delta f(x) = \sqrt{2} f(2x)$, the formula (4.12) expresses the orthogonal decomposition
\[ \delta\Omega_n = \Omega_{2n} \oplus \Omega_{2n+1}. \]
Since $W_1 = \delta\Omega_0 \ominus \Omega_0 = \Omega_1$ and $W_{n+1} = \delta W_n$, we have $W_2 = \delta W_1 = \delta\Omega_1 = \Omega_2 \oplus \Omega_3$ and by iterating,
\[ W_m = \bigoplus_{j=2^{m-1}}^{2^m - 1} \Omega_j. \]
It follows that the collection $\{w_n(x - k) : k \in \mathbb{Z} \text{ and } 2^{m-1} \le n \le 2^m - 1\}$ forms an orthonormal basis for $W_m$.

Wavelet packet splittings can be viewed as tilings of the half-plane as follows. Recall that a wavelet can be thought of as being concentrated in
the upper time–frequency plane in the Heisenberg rectangle $I \times \omega$ where, if $I = [k/2^j, (k+1)/2^j)$ then $\omega = [2^j, 2^{j+1})$. Suppose now that $k$ is even and consider the wavelet pair $\psi_I, \psi_{I'}$ where $I' = [(k+1)/2^j, (k+2)/2^j)$. This pair occupies the bitile $[k/2^j, (k+2)/2^j) \times [2^j, 2^{j+1})$ which we think of as a pair of time sibling tiles. On the other hand, this same bitile can be expressed as the pair of frequency sibling tiles
\[ [l/2^{j-1}, (l+1)/2^{j-1}) \times \bigl( [2 \cdot 2^{j-1}, 3 \cdot 2^{j-1}) \cup [3 \cdot 2^{j-1}, 4 \cdot 2^{j-1}) \bigr) \]
where $k = 2l$. These frequency siblings correspond to wavelet packets from spaces $\Omega_j$ of lower order.

4.3.2 Subspaces and trees; splitting criteria

Wavelet packet decompositions can be associated with dyadic trees and such a scheme is consistent with a general notion of wavelet packets. Notationally, set $\Gamma = V_0 \simeq \ell^2(\mathbb{Z})$ so that we have attached a concrete pair of orthogonal projection operators $\mathcal{H}^*\mathcal{H}$ and $\mathcal{G}^*\mathcal{G}$ associated to $V_{-1}$ and $W_{-1}$ which we now label $\Gamma_0$ and $\Gamma_1$, i.e., $\Gamma_0 = V_{-1} = \mathcal{H}^*\mathcal{H}\, V_0$, $\Gamma_1 = W_{-1} = \mathcal{G}^*\mathcal{G}\, V_0$ and $\Gamma = \Gamma_0 \oplus \Gamma_1$. Iterating the operators on these subspaces then gives $\Gamma_0 = \Gamma_{0,0} \oplus \Gamma_{0,1} \simeq V_{-2} \oplus W_{-2}$ and $\Gamma_1 = \Gamma_{1,0} \oplus \Gamma_{1,1} \simeq \delta^{-2}\Omega_2 \oplus \delta^{-2}\Omega_3$. After $m$ iterations one has a family of $2^m$ subspaces indexed by binary sequences of length $m$ or, even better, by dyadic subintervals of $[0, 1)$ of length $2^{-m}$. We denote these as $\Gamma_I$, understood to mean $\Gamma_{\varepsilon_1 \dots \varepsilon_m}$ when the left endpoint of $I$ has binary expansion $2^{-m} \sum_{k=0}^{m-1} \varepsilon_k 2^k$. The following theorem is due to Coifman et al. [91].

Theorem 4.3.1. Suppose that, except for a countable set, a dyadic interval $I$ is expressed as the disjoint union $I = \bigcup_{j=1}^{\infty} I_j$. Then $\Gamma_I = \bigoplus_{j=1}^{\infty} \Gamma_{I_j}$.

To interpret Theorem 4.3.1, to each interval $I$ one associates an orthonormal basis of $\Gamma_I$. Splitting $I$ into its left and right subintervals amounts to replacing an expansion of a signal in $\Gamma_I$ in terms of the “standard” basis for $\Gamma_I$ by the expansion in terms of the basis functions standard to the left and right subintervals.
The point here is that, taken as a union, all of the different bases are vastly overcomplete but, by fixing a partition of $I$ into dyadic subintervals, one chooses a specific basis from all of the ones available. In the concrete setting of wavelet packets, the entire family $2^{m/2} w_n(2^m x - k)$, $n \in \mathbb{N}$, $m \in \mathbb{Z}$, $k \in \mathbb{Z}$ of wavelet packets is overcomplete. One would like a convenient way of labelling those subfamilies that form bases. If one identifies $\Gamma \simeq \ell^2(\mathbb{Z})$ with $V_m$ (rather than $V_0$) equipped with its standard basis $\{2^{m/2}\varphi(2^m x - k)\}_{k=-\infty}^{\infty}$ and denotes $I(-j, n) = [2^j n, 2^j(n+1))$ restricted to those $n, j$ such that $2^{-m} I(-j, n) \subset [0, 1)$ (denote these intervals by $E_m$) one has the following.

Theorem 4.3.2. If $E \subset \mathbb{N} \times \mathbb{Z}$ has the property that, except for a countable set, $[0, \infty)$ is covered by a disjoint union of dyadic intervals $I(-j, n)$ where $(j, n) \in E$, then the wavelet packets
\[ \bigl\{ 2^{j/2}\, w_n(2^j x - k) : k \in \mathbb{Z},\ (j, n) \in E \bigr\} \]
form an orthonormal basis of $L^2(\mathbb{R})$.
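The filter-level mechanics behind these splittings can be sketched on finite sequences. The following illustration rests on assumptions not made in the text: the sequences are treated as periodic, and the scaling sequence is taken to be the Daubechies four-tap filter normalized so that $\sum_k h_k = 1$. The names `analysis_matrix` and `packet_tree` are hypothetical helpers. The code builds the periodized convolution–decimation operators with entries $\sqrt{2}\,h_{l-2k}$ and $\sqrt{2}\,g_{l-2k}$, and applies them recursively to every block, which is the wavelet packet idea of also splitting the high-pass outputs.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# Assumed example filter: Daubechies four-tap scaling sequence with sum(h) = 1
h = {0: (1 + SQ3) / 8, 1: (3 + SQ3) / 8, 2: (3 - SQ3) / 8, 3: (1 - SQ3) / 8}
# g_k = (-1)^k * conj(h_{1-k}); h is real, so no conjugation is needed
g = {1 - k: (-1) ** (1 - k) * hk for k, hk in h.items()}

def analysis_matrix(taps, n):
    """Periodized convolution-decimation: M[k, l] = sqrt(2) * taps[l - 2k], l taken mod n."""
    m = np.zeros((n // 2, n))
    for k in range(n // 2):
        for j, t in taps.items():
            m[k, (2 * k + j) % n] += np.sqrt(2.0) * t
    return m

def packet_tree(c, depth):
    """Apply the low- and high-pass operators to every block, `depth` times."""
    blocks = [np.asarray(c, dtype=float)]
    for _ in range(depth):
        nxt = []
        for b in blocks:
            nxt.append(analysis_matrix(h, len(b)) @ b)   # low-pass child
            nxt.append(analysis_matrix(g, len(b)) @ b)   # high-pass child
        blocks = nxt
    return blocks

n = 16
Hm, Gm = analysis_matrix(h, n), analysis_matrix(g, n)
rng = np.random.default_rng(0)
c = rng.standard_normal(n)

# Perfect reconstruction from one split, as in (4.11)
c_rec = Hm.T @ (Hm @ c) + Gm.T @ (Gm @ c)
blocks = packet_tree(c, 3)
print(np.max(np.abs(c_rec - c)), len(blocks))
```

Because each split is an orthogonal change of coordinates, the total energy of the blocks at any depth equals that of the input. Choosing a basis in the sense of Theorem 4.3.1 amounts to stopping the recursion at different depths in different branches.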
4.4 Information cells and tilings Theorem 4.3.1 remains valid if, instead of using the same pair of operators H and G to split each ΓI into “left” and “right” subspaces, one simply has a consistent, but possibly interval-dependent, means of splitting ΓI into an orthogonal direct sum of two subspaces indexed by its left and right subintervals. Intuitively, wavelet packets split a frequency interval into its upper and lower parts, whereas local trigonometric bases serve to split an interval space spatially in terms of its left and right subintervals. One might wish to be able to interchange such splittings so as to optimize some function of them. Herley et al. [187] introduced a general approach to joint time–frequency splittings of discrete signals, though lapped orthogonal transforms were at the technical heart of that work. Given a suitable means of interpolating between the Malvar time splitting and the wavelet packet frequency splitting, one could recursively construct signal decompositions associated with arbitrary tilings of the time–frequency plane. We will not review the techniques of [187]. Rather, we will consider some subsequent ideas of Bernardini and Kovacevic [48] addressing the problem of designing appropriate filters for these arbitrary tilings. As usual there are two competing issues: the cost of computing an expansion (rate) and the cost of compressing (distortion). The problem of localizing a basic signal about a tile is fundamental here. As in [48], we consider discrete time vectors f [n], n = 0, . . . , M −1 thought of as critically sampled sequences of function values. To analyze f , choose an orthogonal family gk , k = 1, . . . , M of vectors in RM and extend them symmetrically to the left and antisymmetrically to the right, setting g˜k [n] = gk [−1 − n] when −M/2 ≤ n < 0 and g˜k [n] = −gk [2M − 1 − n] when M ≤ n < 3M/2 where we have assumed that M is even. 
The extension process is represented by left multiplication of the orthogonal matrix G having columns gk by a 2M ×M matrix E. The columns of EG remain orthogonal. Subsequent windowing is represented by multiplication of EG on the left by the 2M × 2M matrix W = diag (w(−M/2), . . . , w(3M/2−1)). Here we shall assume that the window sequence w(n) is chosen a priori to have some desirable properties— as in the case of lapped orthogonal transforms. The columns of H = W EG then form a filter bank with extra properties determined by W and G. For example, in the standard lapped orthogonal transforms G is essentially a DFT matrix, hence is amenable to fast filter implementations while the columns hk of H admit some interpretation of being time–frequency localized. We focus here on the problem of optimizing G for a fixed choice of W . Optimization here is expressed in terms of minimizing “uncertainty.”
Localization of a sequence $h(n)$, $n \in \mathbb{Z}$, about the tile $I \times \omega$ in $\mathbb{T} \times \mathbb{Z}$ means that
\[ E_I = \sum_{n \notin I} |h(n)|^2 \quad \text{and} \quad E_\omega = \int_{\mathbb{T} \setminus \omega} \Bigl| \sum_n h(n)\, e^{2\pi i n t} \Bigr|^2 dt \]
should each be small. In the finite case one wishes to impose corresponding localization constraints, namely that $E_{I_k}$ and $E_{\omega_k}$ should be small, jointly on the columns of $H$. The errors $E_{I_k}$ and $E_{\omega_k}$ can be expressed in terms of a pair of $2M \times 2M$ symmetric positive semidefinite uncertainty matrices: $C_{I_k}$, which serves to restrict the sum to appropriate indices, and $C_{\omega_k}$, which represents the analogous quantity $E_{\omega_k}(h_k) = h_k^T C_{\omega_k} h_k$. The uncertainty cost associated with the basis $\{h_k\}$ can be expressed as
\begin{align*}
C\{h_1, \dots, h_M\} &= \sum_{k=1}^{M} h_k^T (C_{I_k} + C_{\omega_k})\, h_k \\
&= \sum_{k=1}^{M} g_k^T E^T W^T (C_{I_k} + C_{\omega_k})\, W E\, g_k \\
&\equiv \sum_{k=1}^{M} g_k^T D_k\, g_k. \tag{4.13}
\end{align*}
Thus, given W , one seeks a basis {gk } that minimizes the cost associated with localizing the basis elements about a corresponding collection of tiles in time– frequency. The gk need not be Fourier vectors. Rather, they are the discrete analogues of the local basis functions enk in Corollary 4.2.5. Minimizing (4.13) is untenable insofar as the matrices Dk have no preordained structure. A suboptimal approach is to minimize each term of (4.13) by finding an eigenvector ak of Dk with minimal eigenvalue. The ak will not generally be orthogonal to one another, so it is desirable to find the cost of orthogonalizing the resultant set as well. This involves computing the singular value decomposition of the matrix A = [a1 · · · aM ], A = OΣQ, in the sense that OQ is orthogonal and its columns have the property of having the minimal distance, among all orthogonal bases, from the original vectors. Proposition 4.4.1. The matrix OQ in the singular value decomposition A = OΣQ minimizes tr [(U − A)T (U − A)] among all orthogonal matrices U . Proof. A simple calculation gives tr [(OQ − OΣQ)T (OQ − OΣQ)] = M + tr (Σ 2 ) − 2tr (Σ). Further, tr [(U − OΣQ)T (U − OΣQ)] = tr [(U T − QT ΣOT )(U − OΣQ)] = tr [(I + QT Σ 2 Q − QT ΣOT U − U T OΣQ)] = M + tr (QT Σ 2 Q) − 2tr (QT ΣOT U ) = M + tr (Σ 2 ) − 2tr (QT ΣOT U ), so the problem is to maximize tr (QT ΣOT U ) = tr (ΣOT U QT ) over all choices of U . Equivalently, one needs to show that tr (ΣV ) is maximized over orthogonal V when V = I. But this is clear because
\[ |\mathrm{tr}(\Sigma V)| = \Bigl| \sum_{n=1}^{M} \Sigma_{nn} V_{nn} \Bigr| \le \sum_{n=1}^{M} \Sigma_{nn} |V_{nn}| \le \mathrm{tr}(\Sigma) \]
since $|V_{nn}| \le 1$. Here one has also used the fact that $\Sigma_{nn} \ge 0$. This proves the proposition.

Of course, minimizing uncertainty cost is but one issue to be confronted in choosing an optimal basis corresponding to a time–frequency tiling. At this stage we turn to the problem of finding a best-adapted basis chosen from a fixed family of tiling bases, specifically in the setting of Walsh functions.
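Returning briefly to the orthogonalization step of Proposition 4.4.1, the two-stage procedure (minimal eigenvectors, then the nearest orthogonal matrix) can be sketched with NumPy. This is only an illustration: the matrices $D_k$ below are random positive semidefinite stand-ins rather than matrices built from actual windows and tiles, and the names `D`, `A`, `U_best`, and `distance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6

# Hypothetical stand-ins for the cost matrices D_k (symmetric positive semidefinite)
D = []
for _ in range(M):
    B = rng.standard_normal((M, M))
    D.append(B.T @ B)

# Suboptimal step: for each k, a unit eigenvector a_k of D_k with minimal eigenvalue.
# numpy.linalg.eigh returns eigenvalues in ascending order, so column 0 is the one wanted.
A = np.column_stack([np.linalg.eigh(Dk)[1][:, 0] for Dk in D])

# Orthogonalization: A = O @ diag(sigma) @ Q, and O @ Q is the candidate orthogonal basis
O, sigma, Q = np.linalg.svd(A)
U_best = O @ Q

def distance(U, A):
    """tr[(U - A)^T (U - A)], the quantity minimized in Proposition 4.4.1."""
    return np.trace((U - A).T @ (U - A))

print(distance(U_best, A), M + np.sum(sigma ** 2) - 2 * np.sum(sigma))
```

The two printed numbers agree, matching the value $M + \mathrm{tr}(\Sigma^2) - 2\,\mathrm{tr}(\Sigma)$ computed in the proof. The columns of `U_best` then play the role of the orthogonalized vectors $g_k$.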
4.5 The discrete Walsh model phase plane

In Thiele's thesis [347], the Walsh model phase plane was formalized as a model phase space for time–frequency analysis. Walsh packets are supported in an interval. It is useful to think of them as being localized in a frequency bin as well but they are discontinuous, hence poorly localized in frequency. Thus a time–frequency tile can be associated with a Walsh function only in a heuristic way. To each standard Walsh function on $[0, 1)$ one can, in fact, attach a unique number $n$ that specifies the number of times the function changes sign on $[0, 1)$. This sign change frequency or sequency provides a rough notion of wavenumber. What one gains from this point of view is a pairwise disjoint (a.e.) decomposition of the plane, thought of loosely as time–wavenumber space, defined by cells occupied by separate orthogonal Walsh functions. In short, the precise meaning of frequency is traded off for precise orthogonal time–wavenumber decompositions.

The sequency order on Walsh functions is defined recursively on $[0, 1)$ by
\begin{align*}
W_0(x) &= \chi_{[0,1)}(x), \\
W_{2n}(x) &= W_n(2x) + (-1)^n W_n(2x - 1), \\
W_{2n+1}(x) &= W_n(2x) - (-1)^n W_n(2x - 1),
\end{align*}
so that $W_n$ has exactly $n$ sign changes in $[0, 1)$, as is easily proved by induction. This sequency ordering was introduced by Walsh [357] (cf. [179], [348]). The Walsh packet functions $W_{nlj}(x) = 2^{j/2} W_n(2^j x - l)$ are then thought of as functions localized on the rectangles $[l/2^j, (l+1)/2^j) \times [2^j n, 2^j(n+1))$ in the time–frequency plane. Literally, $W_{nlj}(x)$ is supported in $[l/2^j, (l+1)/2^j)$. $W_0$ is the Haar scaling function having zero oscillations on $[0, 1)$ while $W_1$ is the Haar wavelet which has one oscillation on $[0, 1)$. The parameter $j$ in $W_{nlj}(x)$ does not affect wavenumber, but rather wavelength. So, care must be taken when thinking of $[2^j n, 2^j(n+1))$ as some type of
Fourier support of $W_{nlj}$. Nevertheless, interpreting $[2^j n, 2^j(n+1))$ as a wave support allows one to associate a dyadic tile $P = P_{nlj} = [l/2^j, (l+1)/2^j) \times [2^j n, 2^j(n+1))$ with a unique Walsh packet $W_P = W_{nlj}$ and, thereby, the union $\mathbb{R} \times \mathbb{R}_+$ of these tiles can be called the Walsh phase plane. One calls $I_P = [l/2^j, (l+1)/2^j)$ the time interval of $P$ and $\omega_P = [2^j n, 2^j(n+1))$ the frequency interval of $P$. Figure 4.2 illustrates several Walsh packets. Figure 4.3 illustrates the normalized tiles associated with a pair of Walsh packets. The phase planes for these packets are normalized so that a fixed number of congruent tiles will fit inside the square. Disjoint tiles give rise to orthogonal Walsh packets.

Fig. 4.2. Walsh packets of different shifts and sequencies

Lemma 4.5.1. Two Walsh packets having disjoint tiles are orthogonal.

Proof. The statement is obvious when the tiles have disjoint time supports. Packets with disjoint tiles sharing a common time interval of unit length are orthogonal because they are integer time shifts of orthogonal Walsh functions $W_n$. In general, distinct Walsh packets sharing the same time interval are Walsh functions rescaled and shifted by the same factor, so orthogonality follows. If the time intervals $I_P$ and $I_{P'}$ of $W_P$ and $W_{P'}$ merely overlap then, since they are dyadic, one is a dyadic subinterval of the other, say $I_P \subset I_{P'}$. But then $W_{P'}\chi_{I_P}$ is a multiple of a Walsh packet $W_{P''}$: this follows from the recursive definition of Walsh functions. On the other hand, the frequency intervals $\omega_P$ and $\omega_{P'}$ must be disjoint since the tiles are. Since $|\omega_P| = |\omega_{P''}|$ it follows again from dyadic geometry that $\omega_P$ and $\omega_{P''}$ are disjoint and so, as before, $W_P$ is orthogonal to $W_{P''}$ and hence to $W_{P'}$.
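The sequency recursion and Lemma 4.5.1 lend themselves to a direct numerical check. The sketch below is an illustration with hypothetical helper names (`walsh`, `packet`, `sign_changes`, `tiles_disjoint`): it represents each Walsh packet by its constant values on the $2^L$ dyadic cells of $[0, 1)$, which is exact for the tiles contained in $[0, 1) \times [0, 2^L)$, and it labels a tile by the triple $(n, l, j)$.

```python
import numpy as np

def walsh(n, L):
    """Values of W_n on the 2**L dyadic cells of [0, 1); requires 0 <= n < 2**L."""
    if n == 0:
        return np.ones(2 ** L)
    w = walsh(n // 2, L - 1)                 # W_m at half resolution, m = n // 2
    sign = (-1) ** (n // 2)                  # W_{2m}: second half is (-1)^m * W_m
    if n % 2 == 1:
        sign = -sign                         # W_{2m+1}: second half is -(-1)^m * W_m
    return np.concatenate([w, sign * w])

def packet(n, l, j, L):
    """Values of W_{nlj}(x) = 2**(j/2) * W_n(2**j * x - l) on the 2**L cells of [0, 1)."""
    out = np.zeros(2 ** L)
    size = 2 ** (L - j)
    out[l * size:(l + 1) * size] = 2 ** (j / 2) * walsh(n, L - j)
    return out

def sign_changes(w):
    return int(np.count_nonzero(np.diff(w) != 0))

def tiles_disjoint(p, q):
    """Disjointness of the tiles [l/2^j, (l+1)/2^j) x [2^j n, 2^j (n+1)), in integer arithmetic."""
    (n1, l1, j1), (n2, l2, j2) = p, q
    time_overlap = l1 * 2 ** j2 < (l2 + 1) * 2 ** j1 and l2 * 2 ** j1 < (l1 + 1) * 2 ** j2
    freq_overlap = n1 * 2 ** j1 < (n2 + 1) * 2 ** j2 and n2 * 2 ** j2 < (n1 + 1) * 2 ** j1
    return not (time_overlap and freq_overlap)

L = 3
tiles = [(n, l, j) for j in range(L + 1) for l in range(2 ** j) for n in range(2 ** (L - j))]
# Largest inner product (as an integral over [0, 1)) between packets with disjoint tiles
worst = max(abs(np.mean(packet(*p, L) * packet(*q, L)))
            for p in tiles for q in tiles if tiles_disjoint(p, q))
print([sign_changes(walsh(n, L)) for n in range(2 ** L)], worst)
```

The inner product is computed as the mean over cells, which equals the integral over $[0, 1)$ for these piecewise constant functions.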
Fig. 4.3. Normalized Walsh packets and their associated tiles
In the Walsh plane, Walsh packet tiles have unit area. For reasons that will become apparent, it will be useful to associate tree structures to certain subsets of tiles. Toward this end, it is also useful to work with admissible bitiles. These are pairs of tiles, or tiles of area two, in which both the time and frequency intervals are dyadic. Not all adjacent pairs of tiles form admissible bitiles. Bitiles can be split into tiles either by splitting in frequency into upper and lower sibling tiles or in time into left and right sibling tiles. The relevance of bitiles stems from combination rules for the different siblings. If $B$ is a bitile with frequency siblings $B_+$ and $B_-$ and time siblings $B_l$ and $B_r$ then
\[ \begin{bmatrix} W_{B_-} \\ W_{B_+} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & (-1)^n \\ 1 & (-1)^{n+1} \end{bmatrix} \begin{bmatrix} W_{B_l} \\ W_{B_r} \end{bmatrix}, \]
where $n$ is the index of the Walsh function $W_n$ from which the time siblings are built. That is, time or frequency sibling pairs of Walsh packets span one another.

4.5.1 Subspaces spanned by finite sets of tiles

What follows is a sequence of lemmas concerning the geometry of tiles. The immediate goal is to show that the span of a finite collection of Walsh packets depends only on the region covered by their tiles. This will have consequences presently in terms of defining optimal bases for signal expansion and later, in Chapter 7, in proving boundedness of certain integral operators. Several geometric details will be left as exercises for the reader (see also [347]). The first observation offers a relationship between tiles and bitiles.

Lemma 4.5.2. Let $B$, $C$ be distinct bitiles with upper and lower siblings $B_+, B_-$ and $C_+, C_-$ respectively. Then either $B_+ \cap C_+ = \emptyset$ or $B_- \cap C_- = \emptyset$.
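The combination rule for sibling packets can be read off from the sequency recursion. The following short derivation is a sketch under a simplifying assumption: the bitile $B$ is taken to have time interval $[0, 1)$ and frequency interval $[2n, 2n+2)$, the general case following by rescaling and shifting. With this normalization the time siblings are $W_{B_l}(x) = \sqrt{2}\,W_n(2x)$ and $W_{B_r}(x) = \sqrt{2}\,W_n(2x-1)$, while the frequency siblings are $W_{B_-} = W_{2n}$ and $W_{B_+} = W_{2n+1}$.

```latex
\begin{align*}
W_{B_-}(x) = W_{2n}(x) &= W_n(2x) + (-1)^n W_n(2x-1)
  = \tfrac{1}{\sqrt{2}} \bigl( W_{B_l}(x) + (-1)^n W_{B_r}(x) \bigr), \\
W_{B_+}(x) = W_{2n+1}(x) &= W_n(2x) - (-1)^n W_n(2x-1)
  = \tfrac{1}{\sqrt{2}} \bigl( W_{B_l}(x) + (-1)^{n+1} W_{B_r}(x) \bigr).
\end{align*}
% The coefficient matrix (1/\sqrt{2}) [[1, (-1)^n], [1, (-1)^{n+1}]] is orthogonal,
% so the relation inverts by transposition:
\begin{align*}
W_{B_l} = \tfrac{1}{\sqrt{2}} \bigl( W_{B_-} + W_{B_+} \bigr), \qquad
W_{B_r} = \tfrac{(-1)^n}{\sqrt{2}} \bigl( W_{B_-} - W_{B_+} \bigr).
\end{align*}
```

Since the change of coordinates is orthogonal, each sibling pair is an orthonormal basis of the same two-dimensional space, which is the sense in which the pairs span one another.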
It is simple to put a partial order on tiles based on containment of their time intervals: one writes $P \preceq P'$ if $I_P \subset I_{P'}$ and $\omega_{P'} \subset \omega_P$. Given a set $\mathbf{P}$ of tiles one denotes by $\mathbf{P}^{\min}$ the set of minimal tiles in $\mathbf{P}$ and by $\bar{\mathbf{P}}$ the set of all tiles contained in $\mathrm{span}(\mathbf{P}) \equiv \bigcup_{P \in \mathbf{P}} P$. One has

Lemma 4.5.3. Given a finite set $\mathbf{P}$ of pairwise disjoint tiles, either $\mathbf{P} = \bar{\mathbf{P}}^{\min}$ or $\mathbf{P}$ contains a pair of frequency siblings.

Suppose that $\mathbf{P} \supsetneqq \bar{\mathbf{P}}^{\min}$ where $\bar{\mathbf{P}}^{\min}$ is the set of tiles contained in the span of the minimal tiles in $\mathbf{P}$. Choose a tile $P$ with time interval $I_P$ of maximum length among tiles in $\mathbf{P} \setminus \bar{\mathbf{P}}^{\min}$. The frequency sibling $Q$ of this tile must then also belong to $\mathbf{P}$. To see this, note that there is a tile $P' \prec P$ in $\bar{\mathbf{P}}$ and so $P' \prec Q$ as well. Since $P' \subset \mathrm{span}(\mathbf{P})$, there must be a $P'' \in \mathbf{P} \setminus \bar{\mathbf{P}}^{\min}$ that intersects $Q$ nontrivially. As the tiles in $\mathbf{P}$ are pairwise disjoint, $P \cap P'' = \emptyset$ so $Q \preceq P''$. But $|I_{P''}| \le |I_P| = |I_Q|$ so one must have $Q = P''$.

Corollary 4.5.4. If $\mathbf{P}$ is a pairwise disjoint set of tiles then $\mathrm{span}(\mathbf{P}) = \mathrm{span}(\bar{\mathbf{P}}^{\min}) = \mathrm{span}(\bar{\mathbf{P}}^{\max})$.

The key observation here is that if $\mathrm{span}(\mathbf{P})$ contains a bitile then the frequency and time siblings can be substituted for one another in $\mathrm{span}(\mathbf{P})$. The same observation also yields the following.

Corollary 4.5.5. If $\mathbf{P}$ is a pairwise disjoint set of tiles then each $R \in \bar{\mathbf{P}}$ belongs to a set of pairwise disjoint tiles having the same span as $\mathbf{P}$.

Bitiles can be given the same partial ordering by containment of time intervals as tiles have. One says that a collection $\mathbf{B}$ of bitiles is convex if $B, B'' \in \mathbf{B}$ and $B \preceq B' \preceq B''$ implies $B' \in \mathbf{B}$.

Lemma 4.5.6. The union of a finite convex set of bitiles can be decomposed into a disjoint union of tiles.

The lemma is proved by induction. The idea is to start with a minimal bitile $B$ in the convex collection $\mathbf{B}$ and to use convexity to show that if $P$ is one of the frequency siblings of $B$ then $P$ is either contained in or disjoint from the union of bitiles in the set $\mathbf{B} \setminus \{B\}$.
The thrust of the lemmas above is that there is a well-defined $L^2$-projection attached to any union of pairwise disjoint tiles.

Theorem 4.5.7. If $\mathbf{P}$ and $\mathbf{P}'$ are finite, pairwise disjoint sets of tiles having the same tile span then $\{W_P : P \in \mathbf{P}\}$ and $\{W_Q : Q \in \mathbf{P}'\}$ form orthonormal families spanning the same subspace of $L^2(\mathbb{R})$. In particular, the union of a finite collection of pairwise disjoint tiles defines an orthogonal projection operator onto the corresponding subspace.

The theorem follows from the fact that the transformation from $\mathbf{P}$ to $\bar{\mathbf{P}}^{\min}$ only requires converting pairs of frequency siblings to time siblings. This is where bitiles come in. Each such conversion is just an orthogonal transformation. The result also uses Lemma 4.5.1.
4.5.2 Tilings and the notion of best basis

As there is a large number of possible Walsh packet bases that can span the same disjoint union of tiles, it is important to have a criterion for choosing, among such bases, one that is best for performing a certain signal processing task, such as compressing a signal or image. One would also like an efficient algorithm for computing such an optimal signal representation.

Definition 4.5.8. Let $\mathcal{S}$ denote the set of all sequences defined on dyadic tiles. A functional $H : \mathcal{S} \to [0, \infty)$ is said to be additive if, whenever $\mathbf{P}$ is a pairwise disjoint set of tiles, one has

$$H(\{a_Q \chi_{\mathbf{P}}(Q)\}) = \sum_{P \in \mathbf{P}} H(\{a_Q \delta_{QP}\}).$$

Here $\chi_{\mathbf{P}}(Q) = 1$ if $Q \in \mathbf{P}$ and is zero otherwise, while $\delta_{QP}$ is the Kronecker delta for a pair of tiles. Examples include the entropy function $H(\{a_P\}) = -\sum_P a_P^2 \log a_P^2$ or $H_r(\{a_P\}) = \sum_P |a_P|^r$ if $r < 2$.

Fix an additive function $H$. If $A$ is a subset of the Walsh plane that can be written as a pairwise disjoint union of tiles $P \in \mathbf{P}$ then one sets

$$m_A(\{a_P\}) = \inf\{H(\{a_Q \chi_{\mathbf{P}}(Q)\}) : \mathbf{P} \text{ defines a pairwise disjoint cover of } A\}.$$

The mapping $A \mapsto m_A$ is subadditive. That is, if $A_1$ and $A_2$ are disjoint subregions spanned by pairwise disjoint collections of tiles, then $m_{A_1 \cup A_2} \le m_{A_1} + m_{A_2}$. For dyadic rectangles one actually has the following.

Lemma 4.5.9. Let $H$ be an additive functional and $D$ a dyadic rectangle in the Walsh plane with area at least 2. Let $D_+$, $D_-$, $D_l$, $D_r$ denote the top, bottom, left and right subrectangles, respectively. Then $m_D = \min\{m_{D_+} + m_{D_-},\ m_{D_l} + m_{D_r}\}$.

The important observation here is that if $\mathbf{P}$ is a tiling of $D$ by pairwise disjoint tiles then, from interval length considerations, every tile is either contained entirely in one of $D_+$, $D_-$ or else every tile is contained entirely in one of $D_l$, $D_r$. Suppose now that $\mathbf{P}$ satisfies $m_D(\{a_P\}) = H(\{a_Q \chi_{\mathbf{P}}(Q)\})$. Then $\mathbf{P}$ is the disjoint union of two pairwise disjoint tilings, either of $D_l$ and $D_r$ or of $D_+$ and $D_-$. The result now follows from additivity of $H$.
The lemma gives a divide and conquer strategy for computing optimal Walsh decompositions, but the actual algorithm for doing so is obtained from the following bottom-up approach. We are thinking of analyzing discrete signals of length $N = 2^J$. To such a signal we associate a function that is piecewise constant on dyadic subintervals of $[0, 1)$ of length $2^{-J}$. First compute the Walsh packet coefficients for each tile in $S_J = [0, 1) \times [0, 2^J)$. There are $O(J 2^J)$ of these. Next:

• compute $H(\{a_Q \delta_{QP}\})$ for each such tile.
• for $l = 1, \dots, J$ do: for each dyadic rectangle of area $2^l$, find a minimizing tiling built from subrectangle minimizers of size $2^{l-1}$ by choosing the vertical or horizontal subrectangles.

At the $J$th iteration, a global minimizing tiling has been found for the sequence of Walsh packet coefficients. The algorithm is $O(J 2^J)$.
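The recursion of Lemma 4.5.9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the encoding of a dyadic rectangle as a tuple `(j, k, i, n)`, the function name `best_tiling_cost`, and the user-supplied `tile_cost` (a stand-in for the per-tile values $H(\{a_Q \delta_{QP}\})$, which would have to be computed from actual Walsh packet coefficients) are all hypothetical and not taken from the text. The sketch uses memoized top-down recursion rather than the bottom-up sweep described above; both visit each dyadic rectangle once.

```python
from functools import lru_cache


def best_tiling_cost(J, tile_cost):
    """Minimal additive cost over tilings of S_J = [0,1) x [0,2^J).

    Hypothetical encoding: (j, k, i, n) stands for the dyadic rectangle with
    time interval [k 2^-j, (k+1) 2^-j) and frequency interval
    [n 2^i, (n+1) 2^i), so its area is 2^(i-j). Tiles are the rectangles
    with i == j. tile_cost(j, k, n) is assumed to return the cost of the
    tile at time level j, time index k, frequency index n.
    """

    @lru_cache(maxsize=None)
    def m(j, k, i, n):
        if i == j:
            # Area one: the rectangle is a single tile.
            return tile_cost(j, k, n)
        # Left and right halves: split the time interval.
        time_split = m(j + 1, 2 * k, i, n) + m(j + 1, 2 * k + 1, i, n)
        # Bottom and top halves: split the frequency interval.
        freq_split = m(j, k, i - 1, 2 * n) + m(j, k, i - 1, 2 * n + 1)
        # Lemma 4.5.9: the minimum is attained by one of the two splittings.
        return min(time_split, freq_split)

    return m(0, 0, J, 0)


# With unit cost per tile, every tiling of S_3 uses 2^3 tiles.
print(best_tiling_cost(3, lambda j, k, n: 1.0))
```

Recording which of the two splittings attains the minimum at each rectangle would recover the minimizing tiling itself, not just its cost.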
4.6 Phase planes for finite Abelian groups

Much of the theory of phase space extends to locally compact Abelian (LCA) groups. In this section we address just one aspect, namely that certain tiles in phase planes for finite Abelian groups $G$ define orthogonal elements of $L^2(G)$ just as disjoint sets of Walsh tiles define orthogonal functions in $L^2(\mathbb{R})$. The notion of a tile has a particularly natural interpretation in the setting of finite Abelian groups since optimally localized Fourier pairs are associated with such tiles, as we will see in Chapter 5. Further aspects of time–frequency analysis in the LCA group setting are addressed in [166]. Rudin [312] is a classical source for Fourier analysis on groups; see also Katznelson [225] for a brief introduction.

In what follows, $G$ is a finite Abelian group having $|G|$ elements. Examples having $2^N$ elements include the group $\prod_{k=1}^{N} \mathbb{Z}_2$ and the cyclic group of order $2^N$. The former can be identified, spatially, with the dyadic subintervals of $[0, 1)$ down to length $1/2^N$ and the Walsh functions $W_0, \dots, W_{2^N - 1}$ ($k < N$) in sequency order provide a representation. By $\widehat{G}$ we mean the homomorphisms $\gamma : G \to \mathbb{T}$. Products in $G$ and $\widehat{G}$ will be expressed by multiplication while the action of $\gamma \in \widehat{G}$ on $g \in G$ will be expressed as $\langle g, \gamma \rangle = \gamma(g)$. In particular, if $\gamma_1, \gamma_2 \in \widehat{G}$ and $g \in G$, we write $(\gamma_1 \gamma_2)(g) = \gamma_1(g)\gamma_2(g)$. By the phase plane of $G$ we mean the product $G \times \widehat{G}$.

To describe rectangles or phase cells first recall that the group algebra $\mathbb{C}(G)$, in the case of finite $G$, can be expressed as $\{\sum_{g \in G} c_g\, g : c_g \in \mathbb{C}\}$. As a vector space it is just $\mathbb{C}^{|G|}$ with standard basis identified with the elements of $G$. Elements $\gamma \in \widehat{G}$ can also be identified with elements $\sum_{g} \gamma(g)\, g$ of $\mathbb{C}(G)$ and we will not distinguish these two ways of expressing $\gamma$. In any case, this way of writing $\gamma$ does not lead to any problems when using $\langle \cdot, \cdot \rangle$ to express both the standard Hermitian inner product on $\mathbb{C}(G) = \ell^2(G)$ and the action of $\widehat{G}$ on $G$. Phase cells are described in terms of cosets of product subgroups as follows.

Definition 4.6.1. Let $G$ be a finite Abelian group with dual $\widehat{G}$. Let $H$, $\widehat{K}$ be subgroups of $G$, $\widehat{G}$, respectively, and let $g \in G$, $\gamma \in \widehat{G}$. The product $Hg \times \widehat{K}\gamma = \{(hg, \kappa\gamma) : h \in H,\ \kappa \in \widehat{K}\}$ is called a phase cell of $G \times \widehat{G}$. To such a phase cell one associates the vector space $V_{Hg \times \widehat{K}\gamma} = \mathrm{span}(Hg) \cap \mathrm{span}(\widehat{K}\gamma)$ as a linear subspace of the group algebra.

By $\mathrm{span}(\widehat{K}\gamma)$ we mean the linear span of the elements of $\mathbb{C}(G)$ associated with the corresponding elements $\kappa\gamma$, $\kappa \in \widehat{K}$, of $\widehat{G}$. The dimension or normalized area of a phase cell $Hg \times \widehat{K}\gamma$ is the vector space dimension of $V_{Hg \times \widehat{K}\gamma}$. A convenient expression for this area is desired. For a subgroup $H$ of $G$ one denotes by $H^\perp$ the subgroup of $\widehat{G}$ of those $\eta$ such that $\eta(h) = 1$ for all $h \in H$.

Theorem 4.6.2. $V_{Hg \times H^\perp \gamma}$ is one-dimensional and is spanned by

$$\frac{\overline{\gamma(g)}}{\sqrt{|H|}} \sum_{g' \in Hg} \gamma(g')\, g' = \frac{1}{\sqrt{|G|\,|H^\perp|}} \sum_{\gamma' \in H^\perp \gamma} \overline{\gamma'(g)}\, \gamma'. \tag{4.14}$$

If $H^\perp \subseteq \widehat{K}$ then $\dim(V_{Hg \times \widehat{K}\gamma}) = |H|\,|\widehat{K}|/|G|$. If $H^\perp \not\subseteq \widehat{K}$ then $V_{Hg \times \widehat{K}\gamma} = \{0\}$.
One can think of the left side of (4.14) as a standard basis expansion and the right side as a Fourier expansion.

Proof. The vectors in (4.14) are unit vectors. To show that they are equal, observe that

$$\Big\langle \frac{\overline{\gamma(g)}}{\sqrt{|H|}} \sum_{g' \in Hg} \gamma(g')\, g',\ \frac{1}{\sqrt{|G|\,|H^\perp|}} \sum_{\gamma' \in H^\perp \gamma} \overline{\gamma'(g)}\, \gamma' \Big\rangle = \frac{1}{|G|} \sum_{g' \in Hg}\ \sum_{\gamma' \in H^\perp \gamma} \overline{\gamma'(g')}\ \overline{\gamma(g)}\ \gamma'(g)\ \gamma(g') = \frac{1}{|G|} \sum_{g'' \in H}\ \sum_{\gamma'' \in H^\perp} \overline{\gamma''(g'')} = 1$$

since $|H|\,|H^\perp| = |G|$. The second statement (from which the one-dimensionality in the first part follows) is proved by counting. First, vector spaces corresponding to disjoint phase cells are orthogonal because the cells have disjoint support either in $G$ or in $\widehat{G}$ and because the separate time and frequency bases are orthogonal. Now assume that $H^\perp \subset \widehat{K}$. The coset $\widehat{K}\gamma$ is the disjoint union of $|\widehat{K}/H^\perp| = |\widehat{K}|\,|H|/|G|$ cosets of $H^\perp$, each giving rise to at least a one-dimensional subspace of $V_{Hg \times \widehat{K}\gamma}$. This gives the indicated dimension as a lower bound. On the other hand, there are $|G|/|H|$ cosets of $H$ and $|\widehat{G}|/|\widehat{K}|$ cosets of $\widehat{K}$ so, since $|G| = |\widehat{G}|$, there are $|G|^2/(|H|\,|\widehat{K}|)$ combinations of cosets in the product giving rise to the same number of mutually orthogonal vector spaces. Since the dimension of the group algebra is $|G|$, the indicated dimension is an upper bound as well.

Finally, if $H^\perp \not\subseteq \widehat{K}$ one can pick $\gamma' \in \widehat{K} \setminus H^\perp$ and, considering its action on the group algebra, the only vector in $\mathrm{span}(\widehat{K}\gamma)$ on which $\gamma'$ acts as a scalar is the zero vector, whereas it acts as a scalar on all of $\mathrm{span}(Hg)$ so the conclusion follows.

Corollary 4.6.3. Let a phase cell $C$ be the disjoint union of phase cells $P_\alpha$ each having normalized area one. Then choosing a unit vector out of each of the one-dimensional vector spaces $V_{P_\alpha}$ gives an orthonormal basis for $V_C$.

The corollary allows one to define a library of orthonormal bases that are amenable to best-basis protocols by choosing one basis vector for each partition of $G \times \widehat{G}$ into phase cells of normalized unit area. Fast algorithms are associated to sublibraries that allow for splittings just as in the Walsh case above. For this purpose one fixes a chain of subgroups $G_0 \subset G_1 \subset \cdots \subset G_n = G$. A tile is admissible with respect to this chain (e.g., generated by dyadic subintervals in the Walsh case) if it is a phase cell of the form $G_k g \times G_k^\perp \gamma$ for each $G_k$ in the chain.
An admissible tiling of any phase cell is then a partition of the cell into admissible tiles.
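Theorem 4.6.2 can be checked numerically in the simplest case $G = \mathbb{Z}_N$. The sketch below is illustrative only and rests on assumptions not made in the text: the group is written additively, the character with index $c$ is taken to be $x \mapsto e^{2\pi i c x/N}$, $H$ is the subgroup of multiples of `P`, and the function names are hypothetical. It builds the vector $\sum_{g' \in H+g} \gamma(g')\, g'$ supported on a coset of $H$ and lists the characters against which it has a nonzero inner product; these should be exactly the characters in the coset $H^\perp \gamma$.

```python
import cmath


def character(N, c):
    """Character of Z_N with index c, as the vector (exp(2 pi i c x / N))_x."""
    return [cmath.exp(2j * cmath.pi * c * x / N) for x in range(N)]


def cell_vector(N, P, g, c):
    """Vector sum_{x in H+g} gamma_c(x) e_x, where H = multiples of P in Z_N."""
    gamma = character(N, c)
    return [gamma[x] if (x - g) % P == 0 else 0 for x in range(N)]


def fourier_support(v, tol=1e-9):
    """Indices k for which the inner product <v, gamma_k> is nonzero."""
    N = len(v)
    support = []
    for k in range(N):
        gk = character(N, k)
        # Hermitian inner product, conjugate-linear in the second slot.
        coeff = sum(v[x] * gk[x].conjugate() for x in range(N))
        if abs(coeff) > tol:
            support.append(k)
    return support


# Z_12 with H = {0, 3, 6, 9}; then H-perp consists of indices {0, 4, 8}.
N, P, g, c = 12, 3, 1, 5
v = cell_vector(N, P, g, c)
print(fourier_support(v))  # expected: the coset {c + 0, c + 4, c + 8} mod 12
```

The vector is supported on $|H| = 4$ points in time and $|H^\perp| = 3$ points in frequency, so the cell has $|H|\,|H^\perp| = |G|$ as in the proof.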
Lemma 4.6.4. If a phase cell $C$ of normalized area greater than one has the form $G_k g \times G_l^\perp \gamma$ then an admissible tiling of this cell is the disjoint union of tilings either of $|G_k|/|G_{k-1}|$ phase cells of the form $G_{k-1} g_j \times G_l^\perp \gamma$ or of $|G_l^\perp|/|G_{l+1}^\perp|$ phase cells of the form $G_k g \times G_{l+1}^\perp \gamma_j$.

We omit the proof. The idea is that if the tiles do not have such a splitting then there must be a nonempty intersection among them.
4.7 Notes

From local trigonometric bases to Meyer wavelets. Here we explore the connection between the local trigonometric bases of Coifman and Meyer and the bandlimited Lemarié–Meyer [255] wavelets that were used in the proof of Theorem 2.1.2. We follow here the work of Auscher et al. [10]. As a special example of the construction of Section 4.2, consider the case in which $I = [1, 2]$, $\alpha = 1/3$ and $\beta = 2/3$. Let $P_I$ be the projection with polarity $(+, -)$. Let $b = b_I$ be a bell associated with $I$. Then the orthonormal set $\phi_n(\xi) = \sqrt{2}\, b(\xi) \cos((2n + 1)\pi(\xi - 1)/2)$, $n = 0, 1, 2, \dots$ forms a basis for the image of the projection $P_0 = P_I$. Now denote by $P_k = P_{[2^{-k},\, 2^{1-k})}$ the projections with compatible bells $b(2^k \xi)$. One obtains an orthonormal basis then for $L^2(0, \infty)$ by setting $\phi_{kn}(\xi) = 2^{k/2} \phi_n(2^k \xi)$. With a change of polarity one could just as well have chosen a local sine basis. What is important is that opposite polarity at the endpoints allows the adjacent projections to be orthogonal. In order to obtain an orthonormal basis for all of $L^2(\mathbb{R})$ one could take even extensions of functions comprising a local cosine basis together with odd extensions of functions comprising a local sine basis. In other words, set

$$\phi^e_{kn}(\xi) = 2^{k/2}\, b(2^k |\xi|) \cos\Big(\frac{2n + 1}{2}\, \pi (2^k |\xi| - 1)\Big),$$

$$\phi^o_{kn}(\xi) = 2^{k/2}\, \mathrm{sgn}(\xi)\, b(2^k |\xi|) \sin\Big(\frac{2n + 1}{2}\, \pi (2^k |\xi| - 1)\Big).$$

Theorem 4.7.1. The functions $\eta_{kn} = \phi^e_{kn} + i\phi^o_{kn}$; $\nu_{kn} = \phi^o_{kn} + i\phi^e_{kn}$ ($k \in \mathbb{Z}$, $n \in \mathbb{N}$) together form an orthonormal basis for $L^2(\mathbb{R})$.

Proof. Orthogonality follows from direct calculation. Let $f$ be real valued and let $f_e$ and $f_o$ denote its even and odd parts: $f_e(x) = (f(x) + f(-x))/2$ and $f_o(x) = (f(x) - f(-x))/2$. Then
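$$\sum_{k \in \mathbb{Z}} \sum_{n \in \mathbb{N}} \langle f, \eta_{kn} \rangle\, \eta_{kn} = \sum_k \sum_n \{\langle f, \phi^e_{kn} + i\phi^o_{kn} \rangle\}(\phi^e_{kn} + i\phi^o_{kn}) = \sum_k \sum_n \{\langle f_e, \phi^e_{kn} \rangle - i \langle f_o, \phi^o_{kn} \rangle\}(\phi^e_{kn} + i\phi^o_{kn}) = \frac{1}{2} f + i \sum_k \sum_n \{\langle f_e, \phi^e_{kn} \rangle\, \phi^o_{kn} - \langle f_o, \phi^o_{kn} \rangle\, \phi^e_{kn}\},$$

and

$$\sum_k \sum_n \langle f, \nu_{kn} \rangle\, \nu_{kn} = \frac{1}{2} f + i \sum_k \sum_n \{\langle f_o, \phi^o_{kn} \rangle\, \phi^e_{kn} - \langle f_e, \phi^e_{kn} \rangle\, \phi^o_{kn}\}.$$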
Adding the two equations gives the result.

As a simple consequence, we have

Corollary 4.7.2. The functions

$$\omega_{kn}(\xi) = 2^{k/2}\, e^{-2\pi i 2^k n \xi}\, \omega(2^k \xi) \qquad (k, n \in \mathbb{Z})$$

where $\omega(\xi) = \mathrm{sgn}(\xi)\, e^{\pi i \xi}\, b(2|\xi|)$ form an orthonormal basis for $L^2(\mathbb{R})$.

Proof. Set $\eta_{k,-n}(\xi) = \overline{\eta_{k,n-1}(\xi)} = -i\nu_{k,n-1}(\xi)$ whenever $k \in \mathbb{Z}$ and $n > 0$. By the previous theorem $\{\eta_{kn}\}_{k,n \in \mathbb{Z}}$ forms an orthonormal basis for $L^2(\mathbb{R})$. A simple calculation shows that $\eta_{1,0}(\xi) = (-1)^{n+1} i\, \omega(\xi)$. Then $\omega_{kn}(\xi) = \eta_{k+1,-n}(\xi)$. A similar calculation applies when $n < 0$ and the proof is complete.

If $\psi^b_{kn}$ is the inverse Fourier transform of $\omega_{kn}$, then we find that $\psi^b_{kn}(t) = 2^{-k/2}\, \check{\omega}(2^{-k} t - n)$, i.e., $\psi^b$ generates an orthonormal wavelet basis for $L^2(\mathbb{R})$. The bandlimited wavelet $\psi^b$ is of the type first discovered by Meyer [273, 274].
5 Fourier uncertainty principles
A mathematical uncertainty principle is an inequality or uniqueness theorem concerning the joint localization of a function or system and its spectrum. A Fourier uncertainty principle is thus a statement that a function and its Fourier transform cannot both decay too rapidly. The most familiar form is the Wiener–Heisenberg inequality,

$$\|(x - x_0) f(x)\|_2\, \|(\xi - \xi_0) \hat{f}(\xi)\|_2 \ge \frac{\|f\|_2^2}{4\pi}, \tag{5.1}$$
which becomes an equality only for multiples of Gaussians of the form $\exp(-\pi\alpha(x - x_0)^2 + 2\pi i \xi_0 x)$. In this chapter, $D$ will denote the normalized differentiation or momentum operator $D = (1/2\pi i)\, d/dx$. The inequality (5.1) is often written

$$\sigma_X(f)\, \sigma_D(f) \equiv \|Xf - \langle Xf, f \rangle f\|_2\, \|Df - \langle Df, f \rangle f\|_2 \ge \frac{1}{4\pi}, \tag{5.2}$$
where $X$ is the multiplication operator $f(x) \mapsto x f(x)$. Barnes [22] attributes the first proof of (5.1) to a lecture delivered by Wiener in Hilbert's mathematical physics seminar in Göttingen in 1925. In fact, Wiener discusses this lecture, though not the explicit proof of (5.1), at length in his autobiography (see [365], p. 105). What distinguished Wiener's approach, however, was his macroscopic interpretation of (5.1), a point of view that led to further important achievements, including the Landau, Pollak and Slepian "Bell Labs" uncertainty principles in Chapter 3.

The Bell Labs principles address the question of the extent to which a Fourier pair can be localized on respective intervals in time and frequency. It also makes sense to ask: to what extent can a pair be jointly localized on an arbitrary pair of sets of finite measure? The most basic fact here, that a nonzero $f$ cannot vanish outside of a set $S$ of finite measure if $\hat{f}$ also vanishes off a set $\Sigma$ of finite measure, was first proved by Benedicks (Theorem 5.1.1; see also [46]). Nazarov (Theorem 5.1.6) later quantified the dependence of $\|f \chi_{\mathbb{R} \setminus S}\|_2^2 + \|\hat{f} \chi_{\mathbb{R} \setminus \Sigma}\|_2^2$ on $|S|\,|\Sigma|$. We discuss these results, and one positive implication in terms of ability to reconstruct a signal from incomplete data, in Section 5.1.

Optimal joint time–frequency localization can also be cast in terms of limitations on pointwise decay of $f$ and $\hat{f}$. Hardy's theorem (Theorem 5.2.1) provides perhaps the most familiar form of this. It provides an additional meaning to optimal localization of Gaussians. Variations of this theme due to Beurling and to Gelfand and Shilov are also discussed in Section 5.2.

In Section 5.3 we consider uncertainty principles for data on finite sets, precisely, on the group $\mathbb{Z}_N$ of integers mod $N$. One purpose is to illustrate how uncertainty relations for "analog" signals are inherited by their discrete sampled counterparts. Beyond this, it is noteworthy that explicit optimizers for finite uncertainty inequalities can be computed. For example, the analogue of Benedicks' theorem says that the product of the counting measures of the supports of a data vector $x$ on $\mathbb{Z}_N$ and its DFT $\hat{x}$ is at least $N$. The optimizers for this condition are characteristic vectors of subgroups of $\mathbb{Z}_N$. A second meaningful type of finite Fourier inequality involves joint entropy of $x$ and $\hat{x}$. Such an inequality is derived from explicit knowledge of the norm of the DFT from $\ell^p(\mathbb{Z}_N)$ to $\ell^{p'}(\mathbb{Z}_N)$.

Although much more difficult, the corresponding sharp Hausdorff–Young inequality and entropy inequality for the Fourier transform on $\mathbb{R}^n$ was derived by Beckner [26] much earlier. The proof of this well-known result will be outlined in Section 5.4. Several other sharp Fourier inequalities, including sharp Sobolev inequalities and sharp versions of Pitt's inequality will also be outlined or proved here. An important theme linking the methods underlying these results is consideration of symmetries of Euclidean space and rearrangements of functions.

In Chapter 3 we considered Gabor systems as defining joint time–frequency energy distributions of $f$, analogously to the classical Wigner distribution.
It is natural to ask whether uncertainty inequalities involving $f$ and $\hat{f}$ possess analogous statements in terms of the STFT or Wigner distribution. In Section 5.5 we provide methods for transferring localization statements for Fourier pairs to corresponding statements about localization of time–frequency distributions (see also Gröchenig [169]). In some cases, the extra rotational symmetry of the plane provides more refined information than does separate consideration of $f$ and $\hat{f}$.

Operator norm inequalities for the Fourier transform (such as Hausdorff–Young or Pitt's inequalities) give rise to uncertainty principles. This suggests asking for which pairs of weights the Fourier transform is continuous from a given weighted $L^p$ space to a weighted $L^q$ space. For monotone weights, as in the case of Pitt's inequalities, the admissible weight pairs were remarkably characterized separately, but at around the same time, by Benedetto and Heinig [36], by Muckenhoupt [281] and by Jurkat and Sampson [218]. These characterizations are reviewed in Section 5.6.

In Section 5.7 we point out important connections between the uncertainty principle and the Poisson summation formula (PSF). Specifically, convergence of PSF requires sufficient decay of $f$ and $\hat{f}$. This decay can be formulated in terms of convergence of the sort of weighted norms of $f$ and $\hat{f}$ that arise in Section 5.6. Insofar as the classical Shannon sampling theorem is a consequence of PSF, this convergence can also be construed as determining validity of certain sampling methods (cf. [30]).

Further illumination of topics discussed here, as well as related topics not explicitly mentioned, can be found in the review article by Folland and Sitaram [145]. We start with uniqueness theorems concerning vanishing sets for Fourier pairs.
5.1 Fourier support properties

5.1.1 Benedicks' theorem

The Paley–Wiener theorem states that if $f$ is a distribution having compact support, then $\hat{f}$ has an extension to an entire function of exponential type (see [202] or [329]). In particular, the zeros of $\hat{f}$ must be isolated. However, if $f$ is known only to vanish outside of a set of finite measure then $\hat{f}$ need not be real-analytic. Indeed, Kargaev [221] (cf. also [145, 180]) constructed a set $S$ of finite measure whose characteristic function $\chi_S$ has a Fourier transform that vanishes on an interval. Thus, a first fundamental question is whether there can exist a nontrivial function $f$ such that $f$ and $\hat{f}$ both vanish outside of sets of finite measure. A result due to Benedicks [46] states that this is impossible.

Theorem 5.1.1. Suppose $f \in L^2(\mathbb{R})$ is such that $\{x : f(x) \ne 0\}$ and $\{\xi : \hat{f}(\xi) \ne 0\}$ have finite measure. Then $f = 0$.

Proof. Benedicks' proof relies on the following form of PSF: If $f \in L^1(\mathbb{R})$ then the series $\sum_k f(x + k)$ converges in $L^1(\mathbb{T})$ and has Fourier series $\sum_k \hat{f}(k)\, e^{2\pi i k x}$. Since dilating $f$ does not affect the hypothesis $|\{x : f(x) \ne 0\}| < \infty$, we can assume that $|\{x : f(x) \ne 0\}| < 1$. Then

$$\int_0^1 \sum_k \chi_{\{f \ne 0\}}(x + k)\, dx = \int \chi_{\{f \ne 0\}}(x)\, dx = |\{x : f(x) \ne 0\}| < 1$$

while

$$\int_0^1 \sum_k \chi_{\{\hat{f} \ne 0\}}(\xi + k)\, d\xi = \int \chi_{\{\hat{f} \ne 0\}}(\xi)\, d\xi = |\{\xi : \hat{f}(\xi) \ne 0\}| < \infty$$

and both $\sum_k \chi_{\{\hat{f} \ne 0\}}(\xi + k)$ and $\sum_k \chi_{\{f \ne 0\}}(x + k)$ have only finitely many nonzero terms almost everywhere on $[0, 1)$. Moreover, $\sum_k \chi_{\{f \ne 0\}}(x + k) = 0$ on a subset of $[0, 1)$ of positive measure. By PSF (5.64), $\sum_k f(x + k)\, e^{-2\pi i a (x + k)} \in L^1(\mathbb{T})$ has Fourier series $\sum_k \hat{f}(k + a)\, e^{2\pi i k x}$ which is a trigonometric polynomial for almost every $a$. But $|\sum_k f(x + k)\, e^{-2\pi i a (x + k)}| \le \sum_k |f(x + k)| = 0$ on a set of positive measure. Since a nonzero trigonometric polynomial can vanish only on a finite set, it follows that $\hat{f}(k + a) = 0$ for almost every $a \in [0, 1)$. Thus $\hat{f} = 0$ almost everywhere. By Fourier uniqueness, $f = 0$ as well.

5.1.2 Consequences of support properties

Benedicks' result may also be stated from the point of view of projections on a Hilbert space (e.g., [7]). The operators $Q_S : f \mapsto \chi_S f$ and $P_\Sigma : f \mapsto \mathcal{F}^{-1} \chi_\Sigma \mathcal{F} f$ define noncommuting, orthogonal projections. The self-adjoint operator $T_{S,\Sigma} = Q_S P_\Sigma Q_S = Q_S P_\Sigma (Q_S P_\Sigma)^*$ has operator norm strictly less than one and otherwise depends on $S$ and $\Sigma$. In fact, one has the following.

Proposition 5.1.2. If $|S|\,|\Sigma| < 1$ then $\|f\|_2^2 \le A \big( \|f \chi_{\mathbb{R} \setminus S}\|_2^2 + \|\hat{f} \chi_{\mathbb{R} \setminus \Sigma}\|_2^2 \big)$.

Lemma 5.1.3. $\|Q_S P_\Sigma\| \le (|S|\,|\Sigma|)^{1/2}$.

Proof (of Lemma 5.1.3). Suppose that $f \in L^2$ has compact support so that one can justify change of order of integration:

$$Q_S P_\Sigma f(x) = \chi_S\, (\hat{f} \chi_\Sigma)^\vee (x) = \chi_S(x) \int_\Sigma e^{2\pi i x \xi} \int e^{-2\pi i y \xi} f(y)\, dy\, d\xi = \chi_S(x) \int f(y) \int e^{2\pi i (x - y) \xi} \chi_\Sigma(\xi)\, d\xi\, dy = \int H(x, y)\, f(y)\, dy$$

where
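$$H(x, y) = \chi_S(x)\, (\chi_\Sigma)^\vee (x - y).$$

Then $Q_S P_\Sigma$ is Hilbert–Schmidt (e.g., [168], p. 331) with

$$\|Q_S P_\Sigma\|_{HS}^2 = \iint |H(x, y)|^2\, dx\, dy = \int_S \int_{\mathbb{R}} |(\chi_\Sigma)^\vee (x - y)|^2\, dy\, dx = |S|\,|\Sigma|$$

and the lemma follows from the fact that $\|Q_S P_\Sigma\|_{L^2 \to L^2} \le \|Q_S P_\Sigma\|_{HS}$.

Proof (of Proposition 5.1.2). If $|S|\,|\Sigma| < 1$ then $I - Q_S P_\Sigma$ is invertible with

$$\|(I - Q_S P_\Sigma)^{-1}\| \le \sum_{k=0}^{\infty} \|Q_S P_\Sigma\|^k \le \sum_{k=0}^{\infty} (|S|\,|\Sigma|)^{k/2} = \frac{1}{1 - (|S|\,|\Sigma|)^{1/2}}.$$

Since

$$I = Q_S + Q_{\mathbb{R} \setminus S} = Q_S P_\Sigma + Q_S P_{\mathbb{R} \setminus \Sigma} + Q_{\mathbb{R} \setminus S}, \tag{5.3}$$

the orthogonality of the operators in (5.3) gives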
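$$\|f\|_2^2 \le \|(I - Q_S P_\Sigma)^{-1}\|^2\, \|Q_S P_{\mathbb{R} \setminus \Sigma} f + Q_{\mathbb{R} \setminus S} f\|_2^2 \le \frac{1}{(1 - (|S|\,|\Sigma|)^{1/2})^2} \big( \|Q_S P_{\mathbb{R} \setminus \Sigma} f\|_2^2 + \|Q_{\mathbb{R} \setminus S} f\|_2^2 \big) \le \frac{1}{(1 - (|S|\,|\Sigma|)^{1/2})^2} \big( \|P_{\mathbb{R} \setminus \Sigma} f\|_2^2 + \|Q_{\mathbb{R} \setminus S} f\|_2^2 \big), \tag{5.4}$$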
which proves the proposition with $A = (1 - (|S|\,|\Sigma|)^{1/2})^{-2}$.

Definition 5.1.4. A function $f \in L^2(\mathbb{R})$ is $\varepsilon$-concentrated on a measurable set $S$ if $\|Q_{\mathbb{R} \setminus S} f\|_2 < \varepsilon$.

Inequality (5.4) gives a slightly sharper version of an inequality of Donoho and Stark [119] (cf. [168], p. 30).

Corollary 5.1.5. If $f$ of unit $L^2$-norm is $\varepsilon_S$-concentrated on $S$ and its Fourier transform $\hat{f}$ is $\varepsilon_\Sigma$-concentrated on $\Sigma$ with $|S|\,|\Sigma| < 1$ then $|S|\,|\Sigma| \ge (1 - (\varepsilon_S^2 + \varepsilon_\Sigma^2)^{1/2})^2$.

5.1.3 Uncertainty and missing data

The fact that a function and its Fourier transform cannot have arbitrarily small joint support has positive ramifications regarding recovery of missing values. Certainly, a bandlimited signal is real analytic, so it is enough to know values on a dense enough sample set to know all values. But how can one take advantage of nonlocalizability when $\hat{f}$ is merely concentrated on a set of finite measure? A typical way to set this problem is as follows. Let $s$ be an $L^2$ signal that is bandlimited to a set $\Omega$ of finite measure (i.e., $\hat{s}$ vanishes off $\Omega$). Suppose $s$ is transmitted and the received signal is corrupted by both noise $n$ and unregistered values. Thus the received (analog) signal can be taken to have the form

$$r(t) = (I - Q_T)\, s + n = \begin{cases} s(t) + n(t), & \text{if } t \in \mathbb{R} \setminus T, \\ 0, & \text{if } t \in T. \end{cases}$$

The goal is to reconstruct $s$ as nearly as possible, making use of the bandlimited hypothesis and the received data $r$. The uncertainty principle implies that stable signal recovery is possible so long as $|T|\,|\Omega| < 1$. By stable recovery we mean that there is a linear operator $Q$ such that $\|s - Qr\| \le C \|n\|$ where $C \le (1 - \sqrt{|T|\,|\Omega|})^{-1}$. Of course, here $Q$ is simply $(I - Q_T P_\Omega)^{-1}$ which is estimated by a Neumann series expansion as above. The error can still be problematic if $\|n\|$ is too large or if $|T|\,|\Omega|$ is close to one. The actual iterative reconstruction algorithm, which makes sense as long as $|T|\,|\Omega| < \infty$, is generally attributed to Papoulis [293] and Gerchberg [293].
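The Neumann series for $(I - Q_T P_\Omega)^{-1}$ can be tried out on a finite, noise-free toy problem. The sketch below is an illustration under assumptions that go beyond the text: the signal lives on $\mathbb{Z}_N$ rather than $\mathbb{R}$, the unitary DFT replaces the Fourier transform, the condition $|T|\,|\Omega| < 1$ is replaced by its discrete counterpart $|T|\,|\Omega| < N$ (the Hilbert–Schmidt computation of Lemma 5.1.3, repeated on $\mathbb{Z}_N$, gives $\|Q_T P_\Omega\| \le (|T|\,|\Omega|/N)^{1/2}$), and all function names are hypothetical. The iteration $s \leftarrow r + Q_T P_\Omega s$ accumulates the partial sums of the series.

```python
import cmath


def dft(x, inverse=False):
    """Unitary N-point DFT (or its inverse), computed directly in O(N^2)."""
    n = len(x)
    sign = 1 if inverse else -1
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        / n ** 0.5
        for k in range(n)
    ]


def band_project(x, omega):
    """P_Omega: keep only the DFT coefficients whose index lies in omega."""
    spectrum = dft(x)
    kept = [c if k in omega else 0 for k, c in enumerate(spectrum)]
    return dft(kept, inverse=True)


def recover(r, missing, omega, iterations=200):
    """Iterate s <- r + Q_T P_Omega s, starting from the received data r."""
    s = list(r)
    for _ in range(iterations):
        p = band_project(s, omega)
        # Keep the received samples; refill only the missing positions.
        s = [r[t] + (p[t] if t in missing else 0) for t in range(len(r))]
    return s


def demo():
    """Return the maximum reconstruction error for a small toy problem."""
    n, omega, missing = 16, {0, 1, 15}, {3, 7}  # |T||Omega| = 6 < 16
    spectrum = [0] * n
    spectrum[0], spectrum[1], spectrum[15] = 1.0, 0.5 - 0.25j, 0.3j
    s = dft(spectrum, inverse=True)                     # bandlimited signal
    r = [0 if t in missing else s[t] for t in range(n)]  # erased samples
    s_rec = recover(r, missing, omega)
    return max(abs(a - b) for a, b in zip(s, s_rec))


print(demo())
```

In this toy setting the error contracts by at least the factor $(|T|\,|\Omega|/N)^{1/2}$ per step, so convergence slows as $|T|\,|\Omega|$ approaches $N$, in line with the remark above about $|T|\,|\Omega|$ close to one.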
5.1.4 Nazarov's theorem

Corollary 5.1.5 does not imply Benedicks' theorem because it only states that $f$ and $\hat{f}$ cannot both be concentrated on very small sets. Norm estimates for operators $Q_S P_\Sigma Q_S$ for sets $S$, $\Sigma$ of larger measure require far more sophisticated techniques. The following estimate due to Nazarov requires an impressive combination of function theory and operator theory; details take up a significant portion of the text [180].

Theorem 5.1.6. There are absolute constants $A, C > 0$ such that

$$\int |f|^2 \le C e^{2A|S|\,|\Sigma|} \Big( \int_{\mathbb{R} \setminus S} |f|^2 + \int_{\mathbb{R} \setminus \Sigma} |\hat{f}|^2 \Big). \tag{5.5}$$

That the constant must grow exponentially in $|S|\,|\Sigma|$ is seen by taking $f$ to be a Gaussian. Nazarov's inequality has an interesting consequence pertaining to rearrangements $f^*$ and $(\hat{f})^*$ of $f$ and $\hat{f}$ (cf. [180], p. 462ff.): Let $S = \{|f| > a\}$ and $\Sigma = \{|\hat{f}| > b\}$. Nazarov's inequality then implies

$$\int_a^\infty |f|^{*2} + \int_b^\infty |\hat{f}|^{*2} \ge C e^{-2A(ab)} \|f\|^2. \tag{5.6}$$

Suppose now that $|f|^*(y) = o(e^{-A|y|^p})$ and $|\hat{f}|^*(y) = o(e^{-A|y|^q})$ with $p, q > 1$ and $1/p + 1/q = 1$. Choose $\lambda > 0$ such that $p = 1 + \lambda$ and $q = 1 + 1/\lambda$. Since $\int_x^\infty e^{-A|y|^p}\, dy = O(e^{-A|x|^p})$, (5.6) implies that

$$C e^{-2A(ab)} \|f\|^2 \le \int_a^\infty |f|^{*2} + \int_b^\infty |\hat{f}|^{*2} \le K(a, b)\, (e^{-2Aa^p} + e^{-2Ab^q}) \tag{5.7}$$

where $K(a, b) \to 0$ as $\max(a, b) \to \infty$. When $b = a^\lambda$, $b^q = a^p = a^{1+\lambda}$. The left-hand side of (5.7) is $C e^{-A(a^{1+\lambda})}$ while the right-hand side is $o(e^{-A(a^{1+\lambda})})$. Taking $a \to \infty$ one sees that $f = 0$. As a particular consequence,

$$\text{if } |f|^*(y) = O(e^{-A_1 |y|^p}),\ |\hat{f}|^*(y) = O(e^{-A_2 |y|^q}) \text{ and } \frac{1}{p} + \frac{1}{q} < 1, \text{ then } f = 0. \tag{5.8}$$

We shall investigate further uniqueness criteria in the next section. Those criteria are sharp, but depend on the actual pointwise decay of $f$ and $\hat{f}$, not just their value distributions.
5.2 Growth properties and Fourier uniqueness criteria

5.2.1 Hardy's theorem

Hardy's theorem [177] was perhaps the first quantification of Fourier uniqueness in terms of pointwise decay of $f$ and $\hat{f}$, that is, without some assumption on their zero sets.

Theorem 5.2.1. Suppose that for $f \in L^1(\mathbb{R})$ there is a constant $C$ such that

$$|f(x)| \le C e^{-\pi \alpha x^2} \quad \text{and} \quad |\hat{f}(\xi)| \le C e^{-\pi \xi^2/\alpha}. \tag{5.9}$$

Then $f$ is a multiple of $e^{-\pi \alpha x^2}$.

An immediate corollary of Hardy's theorem is that one cannot have both $|f(x)| \le C e^{-\pi \alpha x^2}$ and $|\hat{f}(\xi)| \le C e^{-\pi \beta \xi^2}$ where $\alpha\beta > 1$ unless $f = 0$. Morgan [276] (cf. [180], p. 134) generalized this fact to the case $|f(x)| \le C e^{-2\pi \alpha M(x)}$ and $|\hat{f}(\xi)| \le C e^{-2\pi \beta M^*(\xi)}$ where $M$, $M^*$ are dual Young's functions, for example, $M(x) = |x|^p/p$ and $M^*(\xi) = |\xi|^{p'}/p'$, $p' = p/(p - 1)$. A similar result, cf. (5.8), was proved by Laeng [241] using power series methods.

5.2.2 Beurling's theorem

The following variant of Hardy's theorem was discovered by Beurling and eventually reported by Hörmander [203].

Theorem 5.2.2. Suppose that $f$ and $\hat{f}$ belong to $L^1(\mathbb{R})$ and

$$\iint |f(x)\, \hat{f}(\xi)|\, e^{2\pi |x\xi|}\, dx\, d\xi < \infty. \tag{5.10}$$
Then $f = 0$.

The following proof of Theorem 5.2.2 was reported by Bonami et al. [56].

Proof. By Fourier uniqueness, it is enough to show that the function $g = f * e^{-\pi x^2}$ vanishes identically. Since $f \in L^1(\mathbb{R})$, $g$ extends to an entire function of order two in $\mathbb{C}$. By completing squares in the exponent, one has

$$\iint |g(x)\, \hat{g}(\xi)|\, e^{2\pi |x\xi|}\, dx\, d\xi \le \iiint |\hat{f}(\xi)\, f(y)|\, e^{-\pi(\xi^2 + (x - y)^2 - 2|x\xi|)}\, dx\, dy\, d\xi \le 2 \iint |f(x)\, \hat{f}(\xi)|\, e^{2\pi |x\xi|}\, dx\, d\xi \le C. \tag{5.11}$$

For $z \in \mathbb{C}$,

$$|g(z)| \le \int_{\mathbb{R}} |\hat{g}(\xi)|\, e^{2\pi |z|\,|\xi|}\, d\xi,$$

so by (5.11)

$$\int_{\mathbb{R}} |g(x)| \sup_{|z| = |x|} |g(z)|\, dx \le C. \tag{5.12}$$

One claims that the holomorphic function

$$G(z) = \int_0^z g(u)\, g(iu)\, du$$
is bounded by $C$. It is bounded by $C$ on the real axis by (5.12). To bound $G$ in the first quadrant, let $\theta \in [0, \pi/2)$, $\alpha \in (0, \theta)$, and set $G_\alpha(z) = \int_0^z g(e^{-i\alpha} u)\, g(iu)\, du$. Then $G_\alpha$ is entire of order 2 and bounded by $C$ on the $y$-axis and on the half-line $\rho e^{i\alpha}$. By the Phragmén–Lindelöf theorem it is bounded by $C$ in the intervening wedge as well. Boundedness in the first quadrant follows by continuity and independence of the bound on $\theta$. A similar argument yields boundedness in the other quadrants. Therefore $G$ is constant. Thus $g = f * e^{-\pi x^2} = 0$ and $f = 0$ as well. This proves Beurling's theorem.

The following variant of Beurling's theorem for $\mathbb{R}^n$, also due to Bonami et al. [56], emphasizes the role of products of polynomials with Gaussians as near optimizers of time–frequency localization.

Theorem 5.2.3. Let $f \in L^2(\mathbb{R}^n)$ and $N \ge 0$. Then

$$\iint \frac{|f(x)\, \hat{f}(\xi)|}{(1 + |x| + |\xi|)^N}\, e^{2\pi |x \cdot \xi|}\, dx\, d\xi < \infty$$

if and only if $f$ may be written as $f(x) = P(x)\, e^{-\pi (Ax) \cdot x}$ in which $A$ is a real positive definite symmetric matrix and $P$ is a polynomial of degree less than $(N - n)/2$.

Beurling's theorem amounts to the degenerate case $n = 1$, $N = 0$ in which one concludes that $f = 0$.

5.2.3 Gelfand–Shilov spaces

Beurling's theorem applies to joint time–frequency decay as measured by a conjugate pair $M(x)$ and $M^*(\xi)$ of Young's functions. Suppose that $m(t)$ is positive and strictly increasing on $(0, \infty)$. Define $M(t) = \int_0^{|t|} m(\tau)\, d\tau$ and $M^*(s) = \int_0^{|s|} m^{-1}(t)\, dt$ where $m^{-1}$ is the inverse function of $m$. $M^*$ is not to be confused with a rearrangement. The convex function $M$ is called a Young's function with Young's dual $M^*$. The pair satisfies Young's inequality $|xy| \le M(x) + M^*(y)$ with equality when $y = m(x)$. Beurling's theorem implies (see [203]; see also [56]) that

$$\text{if } \int |f(x)|\, e^{2\pi M(x)}\, dx < \infty \text{ and } \int |\hat{f}(\xi)|\, e^{2\pi M^*(\xi)}\, d\xi < \infty \text{ then } f = 0. \tag{5.13}$$

As an example, let $M(x) = |x|^p/p$ so that $M^*(y) = |y|^{p'}/p'$. According to (5.13), if $e^{|x|^p/p} f \in L^1$ and $e^{|\xi|^{p'}/p'} \hat{f} \in L^1$ then $f = 0$. This result is sharper than (5.8) except that it requires pointwise decay of $f$ and $\hat{f}$ rather than of their rearrangements. When $p = 2$ the conditions are akin to those of Hardy's theorem except that pointwise decay is replaced by convergence of an integral. Hörmander [203] cited this case as an illustration of the sharpness of
(5.13) and cited Gelfand–Shilov spaces as further illustrations of its sharpness. The Gelfand–Shilov spaces (see [153], vol. 2–3) of interest here are defined as follows. Given Young's functions $M$ and $N$ and $a, b > 0$, $W^{N,b}_{M,a}$ consists of those entire functions $\phi$ such that for any $\alpha < a$ and any $\beta > b$,

$$\|\phi\|_{\alpha\beta} = \sup_{z = x + iy} |\phi(z)|\, e^{(M(\alpha |x|) - N(\beta |y|))} < \infty. \tag{5.14}$$

Gelfand and Shilov proved the Fourier duality

$$\mathcal{F}(W^{N,b}_{M,a}) = W^{M^*,\, 2\pi/a}_{N^*,\, 2\pi/b}. \tag{5.15}$$

This duality boils down to the embedding $\mathcal{F}(W^{N,b}_{M,a}) \hookrightarrow W^{M^*,\, 2\pi/a}_{N^*,\, 2\pi/b}$ since $\mathcal{F}^2 f(x) = f(-x)$ and $(M^*)^* = M$. The embedding requires a use of Cauchy's theorem to optimize application of Young's inequality. The space $W^{M,b}_{M,a}$ is trivial when $a > b$ since, with $b < \alpha < a$, any $f \in W^{M,b}_{M,a}$ would satisfy

$$\iint |f(x)\, \hat{f}(\xi)|\, e^{2\pi |x\xi|}\, dx\, d\xi \le \int |\hat{f}(\xi)|\, e^{M^*(2\pi\xi/\alpha)}\, d\xi \int |f(x)|\, e^{M(\alpha x)}\, dx < \infty$$

which, just as in (5.13), implies $f = 0$. On the other hand, if $a > b$ then any $f \in W^{M,b}_{M,a}$ satisfies

$$\int |\hat{f}(\xi)|\, e^{M^*(2\pi\xi/\beta)}\, d\xi \int |f(x)|\, e^{M(\alpha x)}\, dx < \infty$$

when $a > \alpha$ and $\beta > b$. In this sense, elements of the space $W^{M,b}_{M,a}$ are nearly optimally time–frequency localized. The space $W^{M^*,b}_{M,a}$, on the other hand, is nontrivial when $M$ grows more slowly than $M^*$, as is the case when $m(t) = o(t)$ as $t \to \infty$ (cf. [215]). In particular, then $M(x)/x^2 \to 0$ while $M^*(x)/x^2 \to \infty$ so the Gaussian $e^{-\pi x^2} \in W^{M^*,b}_{M,a}$. While its elements are not necessarily well localized, the space is invariant under the Fourier transform when $b = 2\pi/a$.
5.3 Finite uncertainty principles

In the case of finite signals, growth properties are effectively meaningless, but one can still give meaning to localization. Donoho and Stark [119] formulated a finite, concrete version of Benedicks' qualitative uncertainty principle. They proved the following uncertainty relation for a finite signal and its discrete Fourier transform.

Theorem 5.3.1. Let $x = \{x_j\}$ be a signal of length $N$ with $N$-point discrete Fourier transform $\hat{x}_k = (1/\sqrt{N}) \sum_{j=0}^{N-1} x(j)\, e^{-2\pi i jk/N}$. Let $|\mathrm{supp}\, x|$ and $|\mathrm{supp}\, \hat{x}|$ denote the number of points at which $x$ and $\hat{x}$, respectively, do not vanish. Then: (i) $|\mathrm{supp}\, x|\, |\mathrm{supp}\, \hat{x}| \ge N$ while (ii) equality holds in (i) precisely when $x$ or $\hat{x}$ is a cyclic permutation of a multiple of the indicator function of a subgroup of $\mathbb{Z}_N$.
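A quick numerical illustration of Theorem 5.3.1 may help fix ideas. The sketch below is not part of the original argument: the helper names and the tolerance used to decide which entries count as nonzero are assumptions, and the DFT is computed directly from the formula in the theorem. It compares the support product for the indicator of a subgroup of $\mathbb{Z}_{12}$, for a single spike, and for a signal with no vanishing entries.

```python
import cmath


def dft(x):
    """Unitary N-point DFT: x_hat[k] = N^(-1/2) sum_j x[j] exp(-2 pi i jk/N)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)) / n ** 0.5
        for k in range(n)
    ]


def support_size(x, tol=1e-9):
    """Number of entries exceeding tol in modulus (tol is an assumed cutoff)."""
    return sum(1 for v in x if abs(v) > tol)


def support_product(x):
    """|supp x| * |supp x_hat|, the quantity bounded below by N."""
    return support_size(x) * support_size(dft(x))


N = 12
subgroup_indicator = [1 if j % 3 == 0 else 0 for j in range(N)]  # multiples of 3
spike = [1] + [0] * (N - 1)                                      # trivial subgroup
ramp = [j + 1 for j in range(N)]                                 # no zero entries

for name, signal in [("subgroup", subgroup_indicator), ("spike", spike), ("ramp", ramp)]:
    print(name, support_product(signal))
```

The first two signals are indicators of subgroups and attain the bound $N = 12$; the ramp has full support in both domains and gives the much larger product $N^2$.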
The proof provided by Donoho and Stark brings out the subgroup structure of the DFT. One thinks of $x = \{x_j\}$ as a function on the discrete group $\mathbb{Z}_N$ with addition modulo $N$. If $N = MP$ then $\mathbb{Z}_P$ can be identified as a subgroup with indicator sequence $I_M(k) = \delta_{0 \bmod M}(k)$ whose DFT $\widehat{I_M}$ satisfies $\widehat{I_M}(\omega) = P\, I_P(\omega)/\sqrt{N}$. Assume the following lemma for the moment.

Lemma 5.3.2. If $x = \{x_j\}$ has $M$ nonzero elements then no $M$ consecutive entries of $\hat{x}$ can vanish.

Proof (of Theorem 5.3.1). If $M = |\mathrm{supp}\, x|$ and $N = MP$ then one can partition $\{0, \dots, N - 1\}$ into $P$ interval cosets of $\mathbb{Z}_M \sim \{0, \dots, M - 1\}$. By the lemma, $\hat{x}$ cannot vanish identically on any of these cosets. Thus $\hat{x}$ has at least $N/M$ nonzero entries and Theorem 5.3.1 (i) holds in this case. When $|\mathrm{supp}\, x|$ does not divide $N$, it must hold that $|\mathrm{supp}\, x|\, |\mathrm{supp}\, \hat{x}| > N$ since there is no way to spread out fewer than $\lceil N/|\mathrm{supp}\, x| \rceil$ nonzero entries in $\hat{x}$ without leaving a gap larger than $|\mathrm{supp}\, x|$. This proves statement (i) of Theorem 5.3.1, modulo Lemma 5.3.2.

In order that $|\mathrm{supp}\, x|\, |\mathrm{supp}\, \hat{x}| = N$, the nonzero entries of $x$ and $\hat{x}$ must be equally spaced. In particular, one can assume that $x_k = 0$ unless $k = lP + l_0$ where $N = MP$ and $l_0$ is fixed. The cyclic permutation $y$, $y_k = x_{(k + l_0)}$ of $x$ satisfies $y_k = 0$ unless $k = lP$. Thus $y$ has the form $y = w I_P$ and $\hat{y} = (M/\sqrt{N})\, \hat{w} * I_M$ with convolution on $\mathbb{Z}_N$. Since $I_M$ is cycloperiodic with period $M$, so is $\hat{y}$. Since $\hat{y}$ is a modulate of $\hat{x}$, it has $P$ nonzero, evenly spaced entries. Therefore $\hat{y}$ must be a multiple of a cyclic permutation of $I_M$ and $\hat{x}$ must be, up to a modulation and cyclic permutation, a multiple of $I_P$. This proves statement (ii) of Theorem 5.3.1, modulo Lemma 5.3.2.

Proof (of Lemma 5.3.2). Suppose that $j_1, \dots, j_M$ are the distinct support indices of $x$. The lemma asserts that, for any $p$, the vector $w^{(p)}$ defined by

$$w^{(p)}_k = \hat{x}_{p+k} = \frac{1}{\sqrt{N}} \sum_{l=1}^{M} x_{j_l}\, e^{-2\pi i j_l (k + p)/N}, \qquad k = 0, \dots, M - 1,$$

is not identically zero. Equivalently, as an element of $\mathbb{C}^M$, $\{x_{j_1}, \dots, x_{j_M}\}$ is not in the kernel of the matrix $Z_{lk} = z_l^{p+k}$ where $z_l = e^{-2\pi i j_l/N}$. Since $Z_{lk}$ is a Vandermonde matrix and the $z_l$ are all distinct, it follows that $Z_{lk}$ is nonsingular. This proves the lemma and completes the proof of Theorem 5.3.1.

Theorem 5.3.1 (i) was extended to arbitrary locally compact Abelian (LCA) groups by Matolcsi and Szűcs [270] as follows.

Theorem 5.3.3. Let $G$ be an LCA group with dual group $\widehat{G}$. Let $m_G$ and $m_{\widehat{G}}$ denote Haar measures on $G$, $\widehat{G}$, respectively, normalized so that the Plancherel theorem is valid: $\int_G |f(x)|^2\, dm_G(x) = \int_{\widehat{G}} |\hat{f}(\omega)|^2\, dm_{\widehat{G}}(\omega)$ for all $f \in L^2(G)$. Then

$$m_G(\mathrm{supp}(f)) \cdot m_{\widehat{G}}(\mathrm{supp}(\hat{f})) \ge 1. \tag{5.16}$$
An elementary proof using only the Plancherel theorem and the Hausdorff–Young inequality was given in [194] and a version of Benedicks’ result (Theorem 5.1.1) for functions on LCA groups appeared in [193, 194]. This was extended to certain classes of non-Abelian groups in [195]. In [194, 195] it was shown that equality in (5.16) is attained by multiples of indicator functions of cosets of compact open subgroups. Kutyniok [238] showed that these examples are the only extremals of inequality (5.16).

It is worth pointing out that (5.16) can be sharpened in certain cases. Specifically, Tao [339] has proved the sharp inequality $|\mathrm{supp}\,(f)| + |\mathrm{supp}\,(\hat{f})| \ge p + 1$ when $f$ is a function on the finite cyclic group $\mathbb{Z}_p$ of prime order $p$.

The techniques for proving Theorem 5.3.1 also give rise to an analogue of Theorem 5.1.5 in the context of LCA groups which specializes to the case $G = \mathbb{Z}_N$ as follows.

Theorem 5.3.4. If $x \in \mathbb{C}^N$ has unit norm and is $\varepsilon_T$-concentrated on a subset $T$ of $\{0, \dots, N-1\}$ while $\hat{x}$ is $\varepsilon_\Omega$-concentrated on a set $\Omega$, then

$$|T|\,|\Omega| \ge N \bigl(1 - (\varepsilon_T^2 + \varepsilon_\Omega^2)^{1/2}\bigr)^2.$$

As before, $|T|$, $|\Omega|$ denote the cardinalities of $T$, $\Omega$, respectively.

Finite entropy inequalities. Entropy can be viewed as a measure of information concentration and, for a specific type of signal, it makes sense to seek a unitary transformation that concentrates information as much as possible. The Hirschman-type entropy inequality of Theorem 5.3.6 was reported by Dembo et al. [112]. The following lemma follows directly from Riesz–Thorin interpolation.

Lemma 5.3.5. If $U$ is a unitary $N \times N$ matrix then, for any $z \in \mathbb{C}^N$,

$$\|Uz\|_{p'} \le \|U\|_{\ell^1 \to \ell^\infty}^{(2-p)/p}\, \|z\|_p \qquad (1 \le p \le 2). \tag{5.17}$$

Theorem 5.3.6. Let $U$ be a unitary $N \times N$ matrix. Then for any unit vector $z \in \mathbb{C}^N$ one has

$$-\sum |z_i|^2 \ln |z_i| - \sum |(Uz)_i|^2 \ln |(Uz)_i| \ge -\ln \|U\|_{L^1 \to L^\infty}.$$

Proof. For $\|z\|_2 = 1$, set

$$H_p(z) = \frac{1}{1 - \frac{p}{2}} \ln \sum |z_i|^p = \frac{p}{1 - \frac{p}{2}} \ln \left( \frac{\|z\|_p}{\|z\|_2} \right).$$

Upon taking logarithms of both sides, (5.17) becomes $H_p(z) + H_{p'}(Uz) \ge -2 \ln \|U\|_{L^1 \to L^\infty}$. The entropy inequality is the limiting case of this inequality as $p \uparrow 2$, which involves taking logarithmic derivatives.
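The interpolation bound (5.17) can be checked on a small example. The sketch below is illustrative only: it takes $U$ to be a $2 \times 2$ rotation matrix (a choice made here, not in the text), uses the fact that the $\ell^1 \to \ell^\infty$ operator norm of a matrix is its largest entry in modulus, and evaluates the right side minus the left side of (5.17) for several exponents. The function names are hypothetical.

```python
import math

def lp_norm(v, p):
    """The l^p norm of a finite vector."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def apply_matrix(U, z):
    """Matrix-vector product with U given as a list of rows."""
    return [sum(U[i][j] * z[j] for j in range(len(z))) for i in range(len(U))]

def riesz_thorin_gap(U, z, p):
    """Right side minus left side of (5.17) for 1 < p <= 2; nonnegative when (5.17) holds."""
    p_conj = p / (p - 1.0)
    # The l^1 -> l^infty operator norm of a matrix is its largest entry in modulus.
    norm_1_to_inf = max(abs(u) for row in U for u in row)
    lhs = lp_norm(apply_matrix(U, z), p_conj)
    rhs = norm_1_to_inf ** ((2.0 - p) / p) * lp_norm(z, p)
    return rhs - lhs

phi = 0.3  # rotation angle, chosen arbitrarily for the illustration
U = [[math.cos(phi), -math.sin(phi)],
     [math.sin(phi), math.cos(phi)]]
z = [1.0, -2.5]

gaps = [riesz_thorin_gap(U, z, p) for p in (1.1, 1.5, 1.9, 2.0)]
print(gaps)
```

At $p = 2$ the gap vanishes up to rounding, reflecting unitarity of $U$.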
Example. When $U$ is the discrete $N$-point Fourier transform, $\|U\|_{L^1 \to L^\infty} = N^{-1/2}$ and one obtains

$$-\sum |z_i|^2 \ln |z_i| - \sum |\hat{z}_i|^2 \ln |\hat{z}_i| \ge \frac{1}{2} \ln N. \tag{5.18}$$

When $z$ is a standard basis vector, $|\hat{z}_i| = N^{-1/2}$ for all $i$ and both sides of (5.18) are equal. Whenever $\|z\|_2 = 1$,

$$-\sum |z_i|^2 \ln |z_i| \le \frac{1}{2} \ln |\{i : z_i \ne 0\}|.$$

With the corresponding inequality for $\hat{z}$, one concludes from (5.18) that $|\{i : z_i \ne 0\}|\,|\{j : \hat{z}_j \ne 0\}| \ge N$, thus providing an alternate proof of part (i) of Theorem 5.3.1.

Curiously, one can turn the approach of proving entropy inequalities for unitary matrices around and use entropy minimizers for the Hirschman inequality in the finite Fourier case to define certain Hirschman optimal unitary transforms as was done by Przebinda et al. [301]. The result makes use of the finite (polarized) Heisenberg group

$$H_N = \left\{ \begin{pmatrix} 1 & a & c \\ 0 & 1 & b \\ 0 & 0 & 1 \end{pmatrix} : a, b, c \in \mathbb{Z}_N \right\} \tag{5.19}$$

with ordinary matrix multiplication. The discrete Schrödinger representation $\rho_N$ (see Appendix) acts on $L^2(\mathbb{Z}_N)$ in the expected way. As in the Euclidean case one can associate a projective Hilbert space $P(\mathbb{Z}_N)$ in which unit vectors $u \in L^2(\mathbb{Z}_N)$ are equivalent if they differ by a unimodular factor.

Theorem 5.3.7. (i) If $u \in P(\mathbb{Z}_N)$ then $-\sum |u_i|^2 \ln |u_i| - \sum |\hat{u}_i|^2 \ln |\hat{u}_i| \ge \frac{1}{2} \ln N$, (ii) the set of vectors $u \in P(\mathbb{Z}_N)$ such that this inequality becomes an equality coincides with the set of orbits of normalized indicator functions of subgroups of $\mathbb{Z}_N$ under the action of $\rho_N$, (iii) each such orbit defines an orthonormal basis of $L^2(\mathbb{Z}_N)$.

The entropy minimizers in this case coincide with the joint support minimizers in Theorem 5.3.1. The third statement says something new. These last several results merit comparison with those of Section 4.6.
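The finite entropy inequality (5.18) and its equality cases can be illustrated as follows. This is a sketch under stated assumptions: the DFT is computed directly from its definition, the convention $0 \ln 0 = 0$ is implemented with a small threshold, and the three test vectors (a standard basis vector, a normalized subgroup indicator, and a ramp) are chosen here only for illustration.

```python
import cmath
import math

def dft(x):
    """Unitary N-point DFT computed directly from the definition."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)) / math.sqrt(n)
        for k in range(n)
    ]

def entropy_term(v, tol=1e-12):
    """-sum |v_i|^2 ln |v_i|, skipping numerically zero entries (0 ln 0 = 0)."""
    return -sum(abs(c) ** 2 * math.log(abs(c)) for c in v if abs(c) > tol)

def entropy_sum(z):
    """Left-hand side of (5.18) after normalizing z to unit l^2 norm."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in z))
    unit = [c / norm for c in z]
    return entropy_term(unit) + entropy_term(dft(unit))

N = 12
bound = 0.5 * math.log(N)

values = {
    "basis": entropy_sum([1.0] + [0.0] * (N - 1)),
    "subgroup": entropy_sum([1.0 if j % 3 == 0 else 0.0 for j in range(N)]),
    "ramp": entropy_sum([float(j + 1) for j in range(N)]),
}
print(bound, values)
```

The basis vector and the subgroup indicator attain the bound $\frac{1}{2}\ln N$, in line with Theorem 5.3.7 (ii), while the ramp lies strictly above it.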
5.4 Symmetry and sharp inequalities

5.4.1 The sharp Hausdorff–Young inequality

The Hausdorff–Young inequality, $\|\hat{f}\|_{L^{p'}} \le \|f\|_{L^p}$, $1 \le p \le 2$, was first proved as an application of the Riesz–Thorin interpolation theorem (e.g., [225], p. 98). In 1961, Babenko [13] used further properties of entire functions to show
that, at least for $p' \in 2\mathbb{Z}$, Gaussians are extremal functions for the Hausdorff–Young inequality in the sense of maximizing the ratio $\|\hat{f}\|_{L^{p'}}/\|f\|_{L^p}$. Beckner [26] extended this sharp form of the Hausdorff–Young inequality to all $p \in (1, 2]$ in the following form.

Theorem 5.4.1. Let $1 < p \le 2$. Then for all $f \in L^p(\mathbb{R}^n)$,

$$\|\hat{f}\|_{L^{p'}} \le \left( \frac{p^{1/p}}{p'^{\,1/p'}} \right)^{n/2} \|f\|_{L^p}. \tag{5.20}$$
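That Gaussians attain the constant in (5.20) can be confirmed numerically in the univariate case. The sketch below assumes the normalization $\hat{f}(\xi) = \int f(x) e^{-2\pi i x \xi}\, dx$, under which $f(x) = e^{-\pi x^2}$ is its own Fourier transform, so only $L^q$ norms of $f$ itself are needed. The trapezoid rule, the truncation interval, and the function names are choices made for this illustration.

```python
import math

def lp_norm_gaussian(q, half_width=8.0, steps=4000):
    """L^q(R) norm of exp(-pi x^2), by the trapezoid rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-math.pi * q * x * x)
    return (total * h) ** (1.0 / q)

def babenko_beckner(p):
    """The constant (p^{1/p} / p'^{1/p'})^{1/2} appearing in (5.20) for n = 1."""
    p_conj = p / (p - 1.0)
    return math.sqrt(p ** (1.0 / p) / p_conj ** (1.0 / p_conj))

comparison = {}
for p in (1.2, 1.5, 1.8, 2.0):
    p_conj = p / (p - 1.0)
    # f_hat = f for this Gaussian, so the ratio only involves norms of f.
    ratio = lp_norm_gaussian(p_conj) / lp_norm_gaussian(p)
    comparison[p] = (ratio, babenko_beckner(p))
print(comparison)
```

For each exponent the computed ratio agrees with the constant to quadrature accuracy, and the constant is strictly less than one when $p < 2$.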
We will outline the main steps of Beckner’s argument in the univariate case. The multivariate case follows in a straightforward manner from the product structure of $\mathbb{R}^n$. The univariate case involves two principal reductions. The first transfers the problem to that of bounding a corresponding Hermite multiplier operator. Define the Hermite functions $h_m$ and polynomials $H_m$, $m = 0, 1, 2, \dots$ on $\mathbb{R}$ via

$$h_m(x) = 2^{1/4-m}\, \frac{(-1)^m}{\sqrt{\pi^m\, m!}}\, e^{\pi x^2} \frac{d^m}{dx^m} \bigl(e^{-2\pi x^2}\bigr); \qquad H_m(x) = e^{\pi x^2} h_m(x). \tag{5.21}$$

With this normalization, $h_m$ is an eigenfunction of the Fourier transform with eigenvalue $(-i)^m$. Moreover, the $h_m$ are orthonormal in $L^2(\mathbb{R})$. Mehler’s formula (see also [144], p. 55 for more properties of Hermite functions) states that for $|\omega| < 1$, the kernel $\sum_{m \ge 0} \omega^m h_m(x) h_m(y)$ of the linear Hermite multiplier operator $h_m \mapsto \omega^m h_m$ can be written as

$$\left( \frac{2}{1 - \omega^2} \right)^{1/2} \exp\left( -\pi\, \frac{(1 + \omega^2)(x^2 + y^2) - 4\omega xy}{1 - \omega^2} \right). \tag{5.22}$$

The key to the first reduction just mentioned is to associate with the Fourier transform $\mathcal{F}$ a family of operators more readily adapted to $L^p$ dilations, noting that $\mathcal{F}$ is the Hermite multiplier in the limiting case $\omega = -i$. It will be more convenient to work in the polynomial setting, defining $T_\omega : H_m \mapsto \omega^m H_m$. Then $T_\omega$ is regarded as an operator on $L^2_\mu$ with respect to the measure $d\mu = \sqrt{2}\, e^{-2\pi x^2}\, dx$. It can be expressed then as integration against the kernel

$$\left( \frac{2}{1 - \omega^2} \right)^{1/2} \exp\left( -\pi\, \frac{2\omega^2 (x^2 + y^2) - 4\omega xy}{1 - \omega^2} \right) \tag{5.23}$$

with respect to the measure $d\mu$. The first reduction asserts that the Babenko–Beckner inequality, for $n = 1$, is equivalent to the inequality
$$\|T_\omega g\|_{L^{p'}(d\mu)} \le \sqrt{2}\, \|g\|_{L^p(d\mu)} \qquad (\omega = i\sqrt{p - 1}) \tag{5.24}$$

on $L^p$ with respect to the Gaussian measure $d\mu = \sqrt{2}\, e^{-2\pi x^2}\, dx$. (The factor $\sqrt{2}$ is related to the normalization of Hermite polynomials.) Beckner’s second principal reduction, a truly striking idea, utilizes the central limit theorem to build from a two-point space inequality to the inequality for $T_\omega$, suitably renormalized, by thinking of the Gaussian measure $e^{-x^2/2}/\sqrt{2\pi}$ as a convolution limit of the discrete measure $d\nu$ having point masses of weight $1/2$ at $x = \pm 1$.

First reduction: equivalence of (5.20) and (5.24). We will show that when $\omega = i\sqrt{p - 1}$ and $n = 1$, (5.24) is equivalent to (5.20) on a dense subspace of $L^p(\mathbb{R})$. By (5.23), the left-hand side of (5.24) can be written as

$$c_\omega \left( \int \left| \int \exp\left( -\pi\, \frac{2\omega^2 (x^2 + y^2) - 4\omega xy}{1 - \omega^2} \right) g(y)\, \frac{dy}{e^{2\pi y^2}} \right|^{p'} \frac{dx}{e^{2\pi x^2}} \right)^{1/p'}$$

where $c_\omega = 2^{1 + 1/(2p')}/|1 - \omega^2|^{1/2}$. When $\omega = i\sqrt{p - 1}$ the Gaussian terms involving $x$ cancel and the left-hand side of (5.24) reduces to

$$\frac{2^{1 + 1/(2p')}}{\sqrt{p}} \left( \int \left| \int \exp\left( \frac{4\pi i y \sqrt{p - 1}\, x}{p} \right) e^{-2\pi y^2/p}\, g(y)\, dy \right|^{p'} dx \right)^{1/p'}.$$

Setting $f(y) = e^{-2\pi y^2/p} g(y)$ this becomes

$$\frac{2^{1 + 1/(2p')}}{\sqrt{p}} \left\| f^\vee\!\left( \frac{2\sqrt{p - 1}}{p}\, x \right) \right\|_{p'} = \frac{2^{1 + 1/(2p')}}{\sqrt{p}} \left( \frac{p}{2\sqrt{p - 1}} \right)^{1/p'} \|f^\vee\|_{p'} = 2^{1/2 + 1/(2p)} \left( \frac{p'^{\,1/p'}}{p^{1/p}} \right)^{1/2} \|f^\vee\|_{p'}.$$

To complete the reduction, we need to compute the right-hand side of the inequality (5.24). As $g(y) = e^{2\pi y^2/p} f(y)$, this side is simply

$$\sqrt{2}\, \|g\|_{L^p(d\mu)} = \sqrt{2} \left( \int |g(x)|^p\, \sqrt{2}\, e^{-2\pi x^2}\, dx \right)^{1/p} = 2^{1/2 + 1/(2p)} \left( \int |e^{2\pi x^2/p} f(x)|^p\, e^{-2\pi x^2}\, dx \right)^{1/p} = 2^{1/2 + 1/(2p)} \left( \int |f(x)|^p\, dx \right)^{1/p}.$$

Cancelling the factor $2^{1/2 + 1/(2p)}$ then multiplying both sides of (5.24) by $\bigl(p^{1/p}/p'^{\,1/p'}\bigr)^{1/2}$ one concludes that (5.24) is equivalent to (5.20), at least whenever $f$ has the form $f(x) = e^{-2\pi x^2/p} g(x)$. For polynomial $g$, such $f$
comprise a dense subspace of $L^p$ so, by taking limits, the equivalence extends to all of $L^p$ with the respective measures.

Hermite and symmetric polynomials. The sharp Hausdorff–Young inequality now is reduced to proving (5.24). The passage from the basic two-point inequality below to (5.24) requires an asymptotic relationship between Hermite and elementary symmetric polynomials. It will be convenient to renormalize the Hermite functions and polynomials (5.21) with respect to the Gaussian measure $d\mu = e^{-x^2/2}/\sqrt{2\pi}$ at this stage. Set

$$k_m(x) = (-1)^m e^{x^2/4} \frac{d^m}{dx^m} \bigl(e^{-x^2/2}\bigr) = 2^{-1/4} (m!)^{1/2}\, h_m\!\left( \frac{x}{2\sqrt{\pi}} \right)$$

and

$$K_m(x) = e^{x^2/4} k_m(x) = \frac{d^m}{dt^m}\, e^{tx - t^2/2} \Big|_{t=0}.$$

To see that $K_m$ is the polynomial part of $k_m$, observe that

$$\left( \frac{d}{dt} \right)^m e^{-(x-t)^2/2} = (-1)^m \left( \frac{d}{dx} \right)^m e^{-(x-t)^2/2}$$

and, since $e^{-(x^2/4 - tx + t^2/2)} = e^{x^2/4} e^{-(x-t)^2/2}$, it follows from the Taylor expansion of $e^{-(x-t)^2/2}$ at $t = 0$ that

$$\sum_{m=0}^{\infty} K_m(x)\, \frac{t^m}{m!} = e^{tx - t^2/2} \tag{5.25}$$

(cf. [328], p. 571). Let $\sigma_{nl}$ and $\varphi_{nl}$ denote the elementary and normalized symmetric polynomials of $n$ variables:

$$\sigma_{nl}(x_1, \dots, x_n) = \sum_{i_1 < \cdots < i_l} x_{i_1} \cdots x_{i_l}; \qquad \varphi_{nl} = l!\, \sigma_{nl}.$$

The desired asymptotic relationship between $\varphi_{nl}$ and $K_l$ is as follows.

Lemma 5.4.2. If $x_1^2 = x_2^2 = \cdots = x_n^2 = 1/n$ then

$$\varphi_{nl}(x_1, \dots, x_n) = K_l(x_1 + \cdots + x_n) + \frac{1}{n} \sum_{r=1}^{[l/2]} a_{l,r}\, K_{l-2r}(x_1 + \cdots + x_n)$$

where $a_{l,r}$ is bounded independent of $n$.

Proof. The polynomials $K_m$ satisfy the recurrence relation

$$K_m(x) = x\, K_{m-1}(x) - (m - 1)\, K_{m-2}(x) \qquad (m \ge 2). \tag{5.26}$$

Also, the normalized symmetric polynomials $\varphi_{nm}$ satisfy the relation
$$\varphi_{nm}(x) = \varphi_{n1}(x)\, \varphi_{n,m-1}(x) - \frac{(m - 1)(n - (m - 2))}{n}\, \varphi_{n,m-2}(x), \tag{5.27}$$

provided $x = (x_1, x_2, \dots, x_n)$ satisfies $x_i^2 = 1/n$, $i = 1, \dots, n$. The recursion (5.26) follows, at least formally, from integrating by parts the identity $e^{t^2/2 - xt} = \int (d/dt)\, e^{t^2/2 - xt}\, dt$, setting $u = (t - x)$ and $dv = e^{t^2/2 - xt}\, dt$, then equating coefficients of $t^m$ in (5.25). To verify the recursion (5.27), first note that $\varphi_{nl}(x_1, \dots, x_n)$ is the $l$th coefficient of the Taylor expansion of $\Phi(x, t) = \prod_{k=1}^{n} (1 + t x_k)$ around $t = 0$. Since

$$\frac{\partial}{\partial t} \prod_{k=1}^{n} (1 + t x_k) = \sum_{k=1}^{n} x_k \prod_{j \ne k} (1 + t x_j) = \sum_{l=1}^{n} \frac{t^{l-1}}{(l - 1)!}\, \varphi_{nl},$$

the constraint $x_k^2 = 1/n$ yields

$$\begin{aligned}
\varphi_{n1}(x_1, \dots, x_n) \prod_{k=1}^{n} (1 + t x_k) &= \sum_{k=1}^{n} x_k (1 + t x_k) \prod_{j \ne k} (1 + t x_j) \\
&= \frac{t}{n} \sum_{k=1}^{n} \prod_{j \ne k} (1 + t x_j) + \sum_{k=1}^{n} x_k \prod_{j \ne k} (1 + t x_j) \\
&= \frac{t}{n} \sum_{k=1}^{n} \sum_{l=0}^{n-1} \frac{t^l}{l!}\, \varphi_{n-1,l}(x_1, \dots, \widehat{x_k}, \dots, x_n) + \frac{\partial}{\partial t} \prod_{k=1}^{n} (1 + t x_k) \\
&= \frac{t}{n} \sum_{l=0}^{n-1} \frac{t^l}{l!}\, (n - l)\, \varphi_{nl}(x_1, \dots, x_n) + \sum_{l=1}^{n} \frac{t^{l-1}}{(l - 1)!}\, \varphi_{nl}.
\end{aligned}$$

Here, $\widehat{x_k}$ refers to omission of the $k$th variable. The last identity follows since the sum over $k$ is symmetric with respect to $x_1, \dots, x_n$, hence is a multiple of $\sigma_{nl}$ with constant determined by computing the value when all $x_i$ are equal. The recursion (5.27) now follows from equating coefficients of powers of $t$. To complete the proof of the lemma, one verifies the identity directly for $l = 0, 1, 2$ then applies (5.27) together with an induction argument to verify that the $a_{l,r}$ are bounded.

Symmetric multipliers and limiting arguments. In analogy with the Hermite multiplier operators, consider now symmetric versions (again, $x = (x_1, \dots, x_n)$)

$$T_{n,\omega} : \varphi_{nl}(x) \mapsto \omega^l \varphi_{nl}(x).$$

Let $d\nu$ denote the measure $(\delta_{-1} + \delta_1)/2$ having a mass of $1/2$ at the points $\pm 1$. Denote by $X_n$ the product space with measure $d\nu_n = d\nu(\sqrt{n}\, x_1) \cdots d\nu(\sqrt{n}\, x_n)$. For each $n$, $\{\varphi_{nl}\}_{l=0}^{n}$ forms an orthogonal basis of $L^2(X_n)$. Assume for the time being that $T_{n,\omega}$ maps $L^p(X_n)$ to itself with norm one ($1 \le p \le 2$) for all $n$. One uses a limiting argument, comparing $T_{n,\omega}$ with $\bar{T}_\omega : K_m \mapsto \omega^m K_m$, to prove
the renormalized version of (5.24). Any polynomial is a sum of Hermite polynomials. For such a combination $g = \sum b_m K_m$, one has $\bar{T}_\omega g = \sum \omega^m b_m K_m$. By Minkowski’s inequality, the asymptotic relationship of Lemma 5.4.2 yields, on any polynomial,

$$\begin{aligned}
\int \Bigl| \bar{T}_\omega \Bigl( \sum_m b_m K_m(x_1 + \cdots + x_n) \Bigr) &- T_{n,\omega} \Bigl( \sum_m b_m \varphi_{nm}(x) \Bigr) \Bigr|^{p'} d\nu_n \\
&= \int \Bigl| \sum_m \omega^m b_m \bigl( K_m(x_1 + \cdots + x_n) - \varphi_{nm}(x) \bigr) \Bigr|^{p'} d\nu_n \\
&\le \Bigl( \sum_m |b_m| \Bigl( \int |K_m(x_1 + \cdots + x_n) - \varphi_{nm}(x)|^{p'}\, d\nu_n \Bigr)^{1/p'} \Bigr)^{p'} \\
&= \Bigl( \sum_m |b_m| \Bigl( \int \Bigl| \frac{1}{n} \sum_{r=1}^{[m/2]} a_{m,r} K_{m-2r}(x_1 + \cdots + x_n) \Bigr|^{p'} d\nu_n \Bigr)^{1/p'} \Bigr)^{p'} \to 0
\end{aligned}$$

as $n \to \infty$. By the triangle inequality, then

$$\Bigl\| \bar{T}_\omega \Bigl( \sum_m b_m K_m(x_1 + \cdots + x_n) \Bigr) \Bigr\|_{L^{p'}(X_n)} - \Bigl\| T_{n,\omega} \Bigl( \sum_m b_m \varphi_{nm}(x) \Bigr) \Bigr\|_{L^{p'}(X_n)}$$

also tends to zero as $n$ tends to infinity. Thinking of $d\nu_n$ now as a convolution product, the central limit theorem states that $d\nu_n$ converges weakly to $d\mu = e^{-x^2/2}\, dx/\sqrt{2\pi}$. That is, for $f \in C_0(\mathbb{R})$ one has

$$\langle f, d\nu_n \rangle = \int f(x_1 + \cdots + x_n)\, d\nu(\sqrt{n}\, x_1) \cdots d\nu(\sqrt{n}\, x_n) \to \langle f, d\mu \rangle = \frac{1}{\sqrt{2\pi}} \int f(x)\, e^{-x^2/2}\, dx.$$

This convergence extends to polynomials. It follows that, if $T_{n,\omega}$ is a uniformly bounded family of operators from $L^p(X_n)$ to $L^{p'}(X_n)$ of norm at most one then, since polynomials are dense, $T_\omega$ maps $L^p(d\mu)$ to $L^{p'}(d\mu)$ also with norm one. Rescaling then yields (5.24). Thus the boundedness of $T_\omega$ is reduced to that of the $T_{n,\omega}$.

Reduction to the two-point space. By means of an interpolation lemma for product measures, the uniform boundedness of $T_{n,\omega}$ reduces to a two-point estimate, namely the case $n = 1$. Since the $n$-fold product of $d\nu(\sqrt{n}\, x)$ is discrete, all functions on $X_n$ can be identified as products of functions linear in each coordinate. Define $C_\omega : a + bx \mapsto a + \omega bx$. The following two-point lemma asserts that $C_\omega$ is bounded with norm one.

Lemma 5.4.3. With $\omega = i\sqrt{p - 1}$, $C_\omega$ is bounded from $L^p(d\nu)$ to $L^{p'}(d\nu)$ with norm one. That is, for all $a$, $b$ in $\mathbb{C}$,
$$|a - \omega b|^{p'} + |a + \omega b|^{p'} \le 2 \left( \frac{|a - b|^p + |a + b|^p}{2} \right)^{p'/p}.$$

The lemma follows from elementary, though nontrivial, applications of Minkowski’s inequality and minimization arguments (see [26] for details). To pass to the product case, one defines operators

$$C_{nk} : a(\widehat{x_k}) + b(\widehat{x_k})\, x_k \mapsto a(\widehat{x_k}) + \omega\, b(\widehat{x_k})\, x_k$$

where $a(\widehat{x_k})$ and $b(\widehat{x_k})$ are functions of the $n - 1$ variables $x_i$, $i \ne k$. One also defines $\mathcal{T}_{n,\omega} = C_{n1} C_{n2} \cdots C_{nn}$. The boundedness of $\mathcal{T}_{n,\omega}$ follows from the following product lemma.

Lemma 5.4.4. If $T_1$ and $T_2$ are both operators with integral kernels and $T_i : L^p(d\rho_i) \to L^q(d\lambda_i)$, $i = 1, 2$, are continuous both with norm at most one where $d\rho_i$ and $d\lambda_i$ are sigma finite then, provided $p \le q$, the operator $T_1 \times T_2 : L^p(d\rho_1 \times d\rho_2) \to L^q(d\lambda_1 \times d\lambda_2)$ is bounded with norm at most one.

This lemma is a straightforward consequence of Minkowski’s inequality, which is where $q \ge p$ is required. Armed with these lemmas, one simply notes that $T_{n,\omega}$ agrees with the action of $\mathcal{T}_{n,\omega}$ on elementary symmetric polynomials, thus $T_{n,\omega}$ can be regarded as the restriction of $\mathcal{T}_{n,\omega}$ to symmetric functions. Hence, by Lemmas 5.4.3 and 5.4.4, $T_{n,\omega}$ has norm at most one as an operator from $L^p(X_n)$ to $L^{p'}(X_n)$. Assuming these lemmas, this completes the reduction and thus the proof of Theorem 5.4.1.
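The two-point inequality of Lemma 5.4.3 is simple to test numerically in its norm form. The following sketch is only a spot check on a handful of sample points, not a proof; the sample values of $a$, $b$ and $p$ and the function name `two_point_gap` are assumptions introduced for the illustration.

```python
import math

def two_point_gap(a, b, p):
    """Norm of a + b x in L^p(d nu) minus norm of a + omega b x in L^{p'}(d nu),
    where omega = i sqrt(p - 1) and d nu puts mass 1/2 at x = +1 and x = -1.
    Lemma 5.4.3 asserts that this gap is nonnegative."""
    p_conj = p / (p - 1.0)
    omega = 1j * math.sqrt(p - 1.0)
    lhs = ((abs(a - omega * b) ** p_conj + abs(a + omega * b) ** p_conj) / 2.0) ** (1.0 / p_conj)
    rhs = ((abs(a - b) ** p + abs(a + b) ** p) / 2.0) ** (1.0 / p)
    return rhs - lhs

# Sample coefficients, real and complex, chosen arbitrarily.
samples = [(1.0, 0.3), (1.0, 1.0), (0.2, 1.5), (1 + 1j, 0.5 - 2j), (0.0, 1.0), (2.0, 0.0)]
exponents = (1.25, 1.5, 1.75, 2.0)

gaps = {(p, a, b): two_point_gap(a, b, p) for p in exponents for a, b in samples}
print(min(gaps.values()))
```

For $p = 2$ the two sides coincide, since $|a - ib|^2 + |a + ib|^2 = |a - b|^2 + |a + b|^2$, and for $b = 0$ both sides reduce to $|a|$.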
5.4.2 Entropy and logarithmic Sobolev inequalities

Theorem 5.3.6 gives a lower bound on the entropy of a finite vector. A similar method leads to a lower bound for $f \in L^2(\mathbb{R})$, $\|f\|_2 = 1$, but here one must use Theorem 5.4.1 to recover the sharp estimate. Because $F(p) = \|\hat{f}\|_{L^{p'}}/\|f\|_{L^p}$ increases to one as $p \to 2^-$, its logarithmic derivative from the left is nonnegative there, as Hirschman [191] proved. Beckner’s theorem yields more precise information. Since, in fact, $F(p) \le (p^{1/p}/p'^{\,1/p'})^{1/2}$ and both sides tend to one as $p \to 2^-$:

$$\frac{d}{dp} \ln(F(p)) \Big|_{p=2} \ge \frac{d}{dp} \ln \left( \left( \frac{p^{1/p}}{p'^{\,1/p'}} \right)^{1/2} \right) \Big|_{p=2} = \frac{1}{4} \{1 - \ln 2\}. \tag{5.28}$$

Since $d\|f\|_{L^p}^p/dp = \int |f|^p \ln |f|$, differentiation under the integral yields

$$\frac{d}{dp} \ln(F(p)) = \frac{1}{p^2} \left( \ln \int |\hat{f}|^{p'} + \ln \int |f|^p \right) - \frac{1}{p} \left( \frac{1}{p - 1}\, \frac{\int |\hat{f}|^{p'} \ln |\hat{f}|}{\int |\hat{f}|^{p'}} + \frac{\int |f|^p \ln |f|}{\int |f|^p} \right).$$

Evaluating both sides at $p = 2$, where the right-hand side equals $-\frac{1}{2} \bigl( \int |\hat{f}|^2 \ln |\hat{f}| + \int |f|^2 \ln |f| \bigr)$, and using (5.28) gives

$$\int |\hat{f}|^2 \ln |\hat{f}| + \int |f|^2 \ln |f| \le \frac{1}{2} \{\ln 2 - 1\} \qquad (\|f\|_2 = 1), \tag{5.29}$$
as Hirschman originally conjectured as an alternate path to the Wiener–Heisenberg inequality.

Another application to logarithmic Sobolev inequalities is also worth mentioning (see [27, 28]). We will state only the univariate case, though the important multivariate case follows the same proof. Setting $d\mu = e^{-x^2/2}\, dx/\sqrt{2\pi}$ and assuming that $\|g\|_{L^2(d\mu)} = 1$, one has

Corollary 5.4.5. Let $\tilde{g}$ be defined through the linear extension of $\tilde{K}_m(t) = i^m K_m(t)$ on Hermite polynomials as in (5.25). Then

$$\int |g|^2 \ln |g|\, d\mu + \int |\tilde{g}|^2 \ln |\tilde{g}|\, d\mu \le \int |g'|^2\, d\mu.$$

The corollary follows from (5.29) essentially by a change of variables. It improves the more familiar logarithmic Sobolev inequality

$$\int |g|^2 \ln |g|\, d\mu \le \int |\nabla g|^2\, d\mu \tag{5.30}$$

which can be thought of as an infinite-dimensional Sobolev inequality (cf. [28] for its broad interpretations).

5.4.3 Other sharp inequalities

In [26], Beckner also proved a sharp form of Young’s inequality, namely

$$\|f * g\|_{L^r(\mathbb{R}^n)} \le (B_p B_q B_{r'})^n\, \|f\|_{L^p(\mathbb{R}^n)}\, \|g\|_{L^q(\mathbb{R}^n)}, \qquad \frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1 \tag{5.31}$$

where $p, q, r \in [1, \infty]$ and $B_p = (p^{1/p}/p'^{\,1/p'})^{1/2}$ is the Babenko–Beckner constant in (5.20). Inequality (5.31) follows readily from (5.20) when $r' \le 2$. In general the rearrangement inequality

$$\int h(x)\, (f_1 * \cdots * f_m)(x)\, dx \le \int h^\star(x)\, (f_1^\star * \cdots * f_m^\star)(x)\, dx \tag{5.32}$$

on $\mathbb{R}^n$ allows one to reduce to the case of symmetric decreasing functions. Here $f^\star$ denotes the equimeasurable symmetrically nonincreasing rearrangement of $f$. The idea behind (5.32) is simple enough: when the $f_i$ are symmetric, nonincreasing then so is $f_1 * \cdots * f_m$, while $\int fg \le \int f^\star g^\star$ is the integral analogue of the more familiar inequality $\sum a_k b_k \le \sum a_k^* b_k^*$ for rearranged sequences (e.g., [178]). Lieb [259] used these basic observations about symmetry and monotonicity to prove existence of extremals, i.e., equalizers for the Hardy–Littlewood–Sobolev inequality
$$\left| \iint f(x)\, |x - y|^{-\lambda}\, g(y)\, dx\, dy \right| \le N_{p,\lambda,n}\, \|f\|_{L^p}\, \|g\|_{L^q} \qquad \left( \frac{1}{p} + \frac{1}{q} + \frac{\lambda}{n} = 2 \right) \tag{5.33}$$

with norm

$$N_{p,\lambda,n} = \sup \left\{ \frac{\|\, |x|^{-\lambda} * f \|_q}{\|f\|_p}\, ;\ \frac{1}{p} + \frac{\lambda}{n} = 1 + \frac{1}{q} \right\} \qquad (f \ne 0). \tag{5.34}$$

For $q = p'$, $N_{p,\lambda,n}$ was identified as

$$\pi^{n/p'}\, \frac{\Gamma(n/p - n/2)}{\Gamma(n/p)} \left[ \frac{\Gamma(n)}{\Gamma(n/2)} \right]^{-1 + 2/p}.$$

Up to dilation, the maximizing function is a multiple of $(1 + |x|^2)^{-n/p}$. The heuristic that $f^\star$ is smoother than $f$ plays a role here in the form $\|\nabla f\|_p \ge \|\nabla f^\star\|_p$. The proof of this last fact is complicated except when $p = 2$. In that case, $t^{-1}(\|f\|_2^2 - \langle f, g_t * f \rangle) \to \|\nabla f\|_2^2$ as $t \to 0$ where $g_t = (4\pi t)^{-n/2} e^{-|x|^2/4t}$ (e.g., [328], p. 24). Since $\langle f, g_t * f \rangle \le \langle f^\star, g_t * f^\star \rangle$ and $\|f\|_2^2 = \|f^\star\|_2^2$, the estimate follows.

5.4.4 Pitt’s inequalities

In 1937, Pitt [298] proved the Fourier series inequality
$$\left( \sum_{n \in \mathbb{Z}} |\hat{f}(n)|^q\, |n|^{-\beta} \right)^{1/q} \le C \left( \int_0^1 |f(x)|^p\, |x|^\alpha\, dx \right)^{1/p}$$

for some constant $C = C(p, q, \alpha)$ when $1 < p \le q < \infty$, $1 < \alpha < p - 1$, $\beta = q(\alpha/p - 1/p') + 1 \ge 0$ and $\hat{f}(0) = 0$. The inequality was subsequently extended to functions on $\mathbb{R}^n$ and their Fourier transforms. The following sharp form of Pitt’s inequalities for $p = q = 2$ was proved by Beckner [27].

Theorem 5.4.6. For $f \in \mathcal{S}(\mathbb{R}^n)$ and $0 \le \alpha < n$:

$$\left( \int |\hat{f}(\xi)|^2\, |\xi|^{-\alpha}\, d\xi \right)^{1/2} \le \pi^{\alpha/2}\, \frac{\Gamma((n - \alpha)/4)}{\Gamma((n + \alpha)/4)} \left( \int |f(x)|^2\, |x|^\alpha\, dx \right)^{1/2}. \tag{5.35}$$

Here is an outline of the proof. Note that (5.35) is dilation invariant and its right-hand side is made smaller upon replacing $f$ by $f^\star$. The resulting rearranged version of (5.35) reduces to a convolution inequality on the multiplicative group $\mathbb{R}_+$ for which the optimal norm is one (see [26]). Let $C_\alpha$ be the optimal constant in (5.35), the goal being to show that $C_\alpha = \pi^{\alpha/2}\, \Gamma((n - \alpha)/4)/\Gamma((n + \alpha)/4)$. Upon replacing $f$ by $|x|^{\alpha/2} f(x)$, (5.35) is equivalent to the Stein–Weiss fractional integral inequality (cf. [327], p. 117; [259], Theorem 5.1):
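$$\left| \iint f(x)\, |x|^{-\alpha/2}\, |x - y|^{\alpha - n}\, |y|^{-\alpha/2}\, f(y)\, dy\, dx \right| \le C_\alpha\, \pi^{-\alpha + n/2}\, \frac{\Gamma(\alpha/2)}{\Gamma((n - \alpha)/2)} \int |f(x)|^2\, dx. \tag{5.36}$$

Changing to polar coordinates, one sees that it suffices to prove this for radial $f$. Setting $x = t(\sigma_1, \dots, \sigma_n)$ with $t = |x|$, $h(t) = |x|^{n/2} f(x)$, and $\eta(t) = \int_{\Sigma_{n-1}} (t + 1/t - 2\sigma_1)^{(\alpha - n)/2}\, d\sigma$, where $d\sigma$ is the element of area on the unit sphere, (5.36) becomes

$$\left| \iint_{\mathbb{R}_+ \times \mathbb{R}_+} h(t)\, \eta\!\left( \frac{s}{t} \right) h(s)\, \frac{ds}{s}\, \frac{dt}{t} \right| \le C_\alpha\, \pi^{-\alpha + n/2}\, \frac{\Gamma(\alpha/2)}{\Gamma((n - \alpha)/2)} \int_{\mathbb{R}_+} |h(t)|^2\, \frac{dt}{t}.$$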
Young’s inequality for convolution on $\mathbb{R}_+$ [26] states that $\|\eta * f\|_{L^p(\mathbb{R}_+)} \le \|\eta\|_{L^1(\mathbb{R}_+)}\, \|f\|_{L^p(\mathbb{R}_+)}$ with no extremal functions. One has (cf. [327], pp. 117-118)

$$\begin{aligned}
\|\eta\|_{L^1(\mathbb{R}_+)} &= \int_0^\infty \left[ \int_{\Sigma_{n-1}} \left( t + \frac{1}{t} - 2\sigma_1 \right)^{(\alpha - n)/2} d\sigma \right] \frac{dt}{t} \\
&= \frac{2\pi^{n/2}}{\Gamma(n/2)} \int |x - y|^{\alpha - n}\, |y|^{-(\alpha + n)/2}\, dy \qquad (|x| = 1) \\
&= \frac{\Gamma(n/2)\, \Gamma((n - \alpha)/4)^2}{2\, \Gamma((n - \alpha)/2)\, \Gamma((n + \alpha)/4)^2}.
\end{aligned}$$

This gives the desired value for $C_\alpha$. Pitt’s inequality (5.35) has no extremals because Young’s inequality for $\mathbb{R}_+$ does not. As a consequence of the sharp Pitt’s inequalities, one has:

Corollary 5.4.7. For any Schwartz function $f$,

$$\int |f(x)|^2 \ln |x|\, dx + \int |\hat{f}(\xi)|^2 \ln |\xi|\, d\xi \ge \left( \frac{\Gamma'(n/4)}{\Gamma(n/4)} - \ln \pi \right) \|f\|_2^2. \tag{5.37}$$

The corollary follows by much the same argument as Beckner used in proving Hirschman’s entropy inequality (5.29) above, that is, by differentiating inequality (5.35) at $\alpha = 0$.

Relationship with logarithmic Sobolev inequalities. Pitt’s inequalities also include the case $p < 2$. A sharp form of these inequalities was also established by Beckner [27] which depends in turn on the case $q = p$ and hence $\lambda = 2n/p'$ of the sharp Hardy–Littlewood–Sobolev inequality (5.33). Starting now from this case of (5.33) and putting $f = g$, one has

$$\int |\xi|^{-n(1 - 2/p')}\, |\hat{f}(\xi)|^2\, d\xi \le \pi^{n/p - n/2}\, \frac{\Gamma(n/p')}{\Gamma(n/p)} \left[ \frac{\Gamma(n)}{\Gamma(n/2)} \right]^{2/p - 1} \|f\|_{L^p}^2 \tag{5.38}$$
which reduces to the Plancherel identity when $p = 2$ and can be differentiated there as before, yielding:

Corollary 5.4.8. If $f$ is a Schwartz function on $\mathbb{R}^n$ with $\|f\|_2 = 1$, then

$$\frac{n}{2} \int |\hat{f}(\xi)|^2 \ln |\xi|\, d\xi \ge \int |f(x)|^2 \ln |f(x)|\, dx + \frac{n}{2}\, \frac{\Gamma'(n/2)}{\Gamma(n/2)} - \frac{1}{2} \ln \left( \frac{\pi^{n/2}\, \Gamma(n)}{\Gamma(n/2)} \right).$$

As the left-hand side is a measure of smoothness, the last inequality is a type of logarithmic Sobolev inequality and, in fact, inequality (5.30) can be deduced from it [27].

5.4.5 Rearrangements and spectral concentration

The heuristic that $f^\star$ is smoother than $f$ plays a quantitative role in the sharp bilinear fractional integral estimate (5.36). In the Fourier domain this heuristic should appear as a statement that $\mathcal{F}(f^\star)(\xi)$ is concentrated near $\xi = 0$. Donoho and Stark [120] showed the following.

Theorem 5.4.9. Let $f$ be supported on a set of measure $T$. If $\Omega T < 1.6$ then

$$\left| \int_{-\Omega/2}^{\Omega/2} \hat{f}\, d\xi \right| \le \left| \int_{-\Omega/2}^{\Omega/2} (|f|^\star)^\wedge\, d\xi \right|.$$

Proof. Since $|f|^\star$ vanishes off $[-T/2, T/2]$, the inequality $\int fg \le \int f^\star g^\star$ gives

$$\begin{aligned}
\left| \int_{-\Omega/2}^{\Omega/2} \hat{f}(\xi)\, d\xi \right| &= \Omega \left| \int f(t)\, \mathrm{sinc}\,(\Omega t)\, dt \right| \\
&\le \Omega \int |f(t)|\, |\mathrm{sinc}\,(\Omega t)|\, dt \\
&\le \Omega \int |f|^\star(t)\, |\mathrm{sinc}|^\star(\Omega t)\, dt \\
&= \Omega \int_{-T/2}^{T/2} |f|^\star(t)\, |\mathrm{sinc}|^\star(\Omega t)\, dt.
\end{aligned}$$

However, if $\Omega T < 1.6$ then $|\mathrm{sinc}|^\star(\Omega t) = \mathrm{sinc}\,(\Omega t)$ for $|t| < T/2$. Hence,

$$\left| \int_{-\Omega/2}^{\Omega/2} \hat{f}(\xi)\, d\xi \right| \le \Omega \int |f|^\star(t)\, \mathrm{sinc}\,(\Omega t)\, dt = \int_{-\Omega/2}^{\Omega/2} (|f|^\star)^\wedge(\xi)\, d\xi.$$

This proves the theorem.

The conclusion can be strengthened if $f$ is supported in $(-1, 1)$, since the sinc function truncated to $(-1, 1)$ is equal to its symmetrically decreasing rearrangement. It can also be strengthened to larger $\Omega T$ if either $f \ge 0$ or if $f$ is supported where sinc is non-negative. Otherwise, the condition on the
measure of the support is essential: let $g$ be of the form $\sum_i \chi_{[\alpha_i, \beta_i]}$ where $[\alpha_i, \beta_i]$ are disjoint intervals in $(0, \infty)$ and let $f(x) = g(x) + g(-x)$. Then $\hat{f}(\xi) = \sum_i (\sin 2\pi\beta_i \xi - \sin 2\pi\alpha_i \xi)/(\pi\xi)$ and

$$\left| \int_{-\Omega/2}^{\Omega/2} \hat{f}(\xi)\, d\xi \right| = \left| \frac{2}{\pi} \sum_i \int_{\pi\alpha_i \Omega}^{\pi\beta_i \Omega} \frac{\sin u}{u}\, du \right|$$

which can be made arbitrarily large by appropriate choice of $[\alpha_i, \beta_i]$ if no constraint is put on $T = 2\sum_i (\beta_i - \alpha_i)$. On the other hand, $|f|^\star = \chi_{[-T/2, T/2]}$ so

$$\left| \int_{-\Omega/2}^{\Omega/2} (|f|^\star)^\wedge\, d\xi \right| = \frac{1}{\pi} \int_{-\pi T\Omega/2}^{\pi T\Omega/2} \frac{\sin u}{u}\, du$$

converges as $T$ tends to infinity. An argument similar to the proof of Theorem 5.4.9, based this time on (5.32), gives

Theorem 5.4.10. Let $f$ be supported on a set of measure $T$ and $\Omega T \le 0.8$. Then

$$\int_{-\Omega/2}^{\Omega/2} |\hat{f}(\xi)|^2\, d\xi \le \int_{-\Omega/2}^{\Omega/2} \bigl| (|f|^\star)^\wedge(\xi) \bigr|^2\, d\xi.$$

The heuristic that $f^\star$ is smoother than $f$ does not hold, even for smooth functions, with otherwise arbitrarily distributed values: one can construct $C^\infty$ functions for which $f^\star$ is not $C^1$ (cf. [120]). Thus Theorem 5.4.10 cannot be extended to arbitrary values of $T\Omega$.
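The two closed-form expressions involving $\int \sin u/u\, du$ make Theorem 5.4.9 easy to illustrate for a symmetric pair of intervals. The sketch below is an assumption-laden example: it takes $f = \chi_{[a,b]} + \chi_{[-b,-a]}$ with $a = 0.5$, $b = 1$, $\Omega = 1$ (values chosen here so that $\Omega T = 1 < 1.6$) and evaluates the sine integral by Simpson's rule. The function names are hypothetical.

```python
import math

def sine_integral(x, steps=2000):
    """Si(x) = int_0^x sin(u)/u du by composite Simpson's rule (steps must be even)."""
    if x == 0.0:
        return 0.0
    h = x / steps

    def integrand(u):
        return 1.0 if u == 0.0 else math.sin(u) / u

    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

def band_mass_two_intervals(a, b, omega):
    """|int_{-omega/2}^{omega/2} f_hat| for f = chi_[a,b] + chi_[-b,-a], with 0 < a < b."""
    return abs(2.0 / math.pi
               * (sine_integral(math.pi * b * omega) - sine_integral(math.pi * a * omega)))

def band_mass_rearranged(T, omega):
    """int_{-omega/2}^{omega/2} of the transform of |f|* = chi_[-T/2, T/2]."""
    return 2.0 / math.pi * sine_integral(math.pi * T * omega / 2.0)

a, b, omega = 0.5, 1.0, 1.0
T = 2.0 * (b - a)  # measure of the support of f

lhs = band_mass_two_intervals(a, b, omega)
rhs = band_mass_rearranged(T, omega)
print(omega * T, lhs, rhs)
```

Here the rearranged function carries markedly more mass in the band $[-\Omega/2, \Omega/2]$ than $f$ itself, as the theorem predicts.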
5.5 Uncertainty inequalities in phase space

Our aim in this section is to illustrate the use of phase space as a means of sharpening and extending some of the known uncertainty principle inequalities. The first group of results use special tricks to get from a statement about $f$ and $\hat{f}$ to a statement about the Wigner distribution $W(f)$ of $f$. We then proceed to consider some uncertainty principles that make essential use of methods available in phase space that are not, at least readily, available in $\mathbb{R}^n$ by itself.

5.5.1 A Heisenberg inequality for the Wigner distribution

The Wigner distribution of a pair $f$, $g$ is

$$W(f, g)(x, \xi) = \int f\!\left( x + \frac{t}{2} \right) \overline{g\!\left( x - \frac{t}{2} \right)}\, e^{-2\pi i t \xi}\, dt \tag{5.39}$$

whenever the integral is defined. Among other desirable mathematical properties, $W(f) = W(f, f)$ has all of the properties of a time–frequency energy
distribution save, perhaps the most crucial of all: $W(f)$ typically fails to be non-negative (cf. [144], Section I.8).

Since $|f(x)|^2 = \int W f(x, \xi)\, d\xi$ and $|\hat{f}(\xi)|^2 = \int W f(x, \xi)\, dx$ for all $f \in L^2(\mathbb{R})$ (loc. cit.), the moments of $|f|^2$ and $|\hat{f}|^2$ may be computed from the Wigner distribution by

$$\int x^2\, |f(x)|^2\, dx = \iint x^2\, W(f)(x, \xi)\, dx\, d\xi, \qquad \int \xi^2\, |\hat{f}(\xi)|^2\, d\xi = \iint \xi^2\, W(f)(x, \xi)\, dx\, d\xi.$$

De Bruijn proved that

$$\iint (x^2 + \xi^2)^k\, W(f)(x, \xi)\, dx\, d\xi \ge \frac{k!}{(2\pi)^k}\, \|f\|_2^2 \qquad (k = 0, 1, \dots). \tag{5.40}$$
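For the normalized Gaussian $f(x) = 2^{1/4} e^{-\pi x^2}$, a direct computation from (5.39) gives $W(f)(x, \xi) = 2 e^{-2\pi(x^2 + \xi^2)}$; this closed form is an input to the sketch below rather than something stated in the text. Passing to polar coordinates in the time–frequency plane, the radial moments can be evaluated numerically and compared with the right-hand side of (5.40), which they match. The quadrature parameters and function name are choices made for the illustration.

```python
import math

def debruijn_moment_gaussian(k, r_max=6.0, steps=6000):
    """Integral of (x^2 + xi^2)^k * W(f) over the plane, for W(f) = 2 exp(-2 pi (x^2 + xi^2)),
    computed in polar coordinates by composite Simpson's rule on [0, r_max]."""
    h = r_max / steps

    def integrand(r):
        # r^(2k) * W(f) * (2 pi r), the last factor being the polar area element.
        return r ** (2 * k) * 2.0 * math.exp(-2.0 * math.pi * r * r) * 2.0 * math.pi * r

    total = integrand(0.0) + integrand(r_max)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

# Pairs (numerical moment, k!/(2 pi)^k); here ||f||_2 = 1.
moments = {
    k: (debruijn_moment_gaussian(k), math.factorial(k) / (2.0 * math.pi) ** k)
    for k in range(5)
}
print(moments)
```

The case $k = 0$ simply recovers $\|f\|_2^2 = 1$.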
The Heisenberg inequality (5.1) follows from the case $k = 1$ together with the inequality for arithmetic and geometric means. In this sense (5.40) with $k = 1$ improves (5.1). Indeed, the Wigner inequality is invariant under rotations of the time–frequency plane. Mustard [287] used this fact to sharpen the Heisenberg inequality, as we will review below. We will also use this rotational invariance to generalize (5.40) in replacing $(x^2 + \xi^2)^k$ by any radial weight.

The Wigner distribution is also closely related to the short-time Fourier transform $S(f, g)(x, \xi) = \langle f(u), e^{2\pi i u \xi} g(u - x) \rangle$ (see Chapter 3). In particular, $2 S(f, g)(x, \xi) = e^{-\pi i x \xi}\, W(f, \tilde{g})(x/2, \xi/2)$ where $\tilde{g}(x) = g(-x)$ while $\mathcal{F}_1 \mathcal{F}_2 W(f, g)(\zeta, y) = e^{-\pi i y \zeta}\, S(f, g)(-y, \zeta)$ where $\mathcal{F}_i$ denotes partial Fourier transform in the $i$th variable.

5.5.2 Wigner consequences of Hausdorff–Young

Lieb [260] (see also [168]) proved the following quadratic uncertainty principle for the short-time Fourier transform, thought of here as a bilinear operator $(f, g) \mapsto S(f, g)$.

Theorem 5.5.1. If $f, g \in L^2(\mathbb{R}^n)$ and $2 \le p < \infty$, then

$$\left( \iint |S(f, g)|^p \right)^{1/p} \le \left( \frac{2}{p} \right)^{n/p} \|f\|_2\, \|g\|_2. \tag{5.41}$$
The inequality is reversed for $1 \le p < 2$. Equipped with Beckner’s sharp Hausdorff–Young and Young’s inequalities the idea is simple: $S(f, g)(x, \xi) = \bigl( f \cdot \widetilde{(T_x g)} \bigr)^\wedge(\xi)$ where $T_x g(\cdot) = g(\cdot - x)$ and $\tilde{g}(\cdot) = \bar{g}(-\cdot)$. Applying Hausdorff–Young (5.20) in $\xi$, the left side of (5.41) is bounded by a convolution of powers of $|f|$ and $|g|$ to which Young’s convolution inequality is applied to give the result. The extremal functions
of this inequality for $p > 2$ (and $f = g$) are Gaussians as in the case of the separate Hausdorff–Young and Young’s inequalities. Also as in the case of Theorem 5.4.1, one can differentiate the inequality at $p = 2$ to obtain an inequality for the entropy of $S(f, g)$, namely:

$$-\iint |S(f, g)|^2 \ln |S(f, g)|^2 \ge 1 \qquad (\|f\|_2 = \|g\|_2 = 1). \tag{5.42}$$

By Cauchy–Schwarz, $\|S(f, g)\|_\infty \le \|f\|_2 \|g\|_2$ so $S(f, g)$ can have no peaks. On the other hand,

$$\|f\|_2^2\, \|g\|_2^2 = \iint |S(f, g)|^2 \le \|S(f, g)\|_\infty\, \|S(f, g)\|_1. \tag{5.43}$$

Lieb’s inequality at $p = 1$ yields the sharper bound $\|S(f, g)\|_{L^1} \ge 2 \|f\|_2 \|g\|_2$.

5.5.3 Benedicks’ theorem for the Wigner distribution

Mustard and Sitaram (see [145]) conjectured the following phase space analogue to Benedicks’ theorem: If $W(f)$ vanishes outside a set of finite measure then $f = 0$. Several proofs appeared quickly, independently due to Jaming [209], Wilczok [366] and Janssen [214]. Janssen’s proof in particular makes a striking use of the following consequence (stated for $\mathbb{R}$) of Moyal’s reproducing formula attributed to Hlawatsch [192] and others:

$$4\, e^{\pi i (ax - b\xi)} \iint W(f)(t, \omega)\, W(f)(x - t, \xi - \omega)\, e^{-2\pi i (at - b\omega)}\, dt\, d\omega = W(f)\!\left( \frac{2x + b}{4}, \frac{2\xi + a}{4} \right) W(f)\!\left( \frac{2x - b}{4}, \frac{2\xi - a}{4} \right). \tag{5.44}$$
To outline Janssen’s argument, fix $f \in L^2(\mathbb{R})$. Then $W(f) \in L^1 \cap C_0(\mathbb{R}^2)$ so $H_{(x,\xi)}(t, \omega) = W(f)(t, \omega)\, W(f)(x - t, \xi - \omega) \in L^1(\mathbb{R}^2)$ uniformly in $(x, \xi)$. If $W(f)$ vanishes off of a set of finite measure then so does $H_{(x,\xi)}(t, \omega)$. By (5.44), $\widehat{H}_{(x,\xi)}$ also vanishes off a set of finite measure. By Benedicks’ theorem, $f = 0$.

5.5.4 Hardy’s theorem for $S(f, g)$

The following version of Hardy’s theorem for the short-time Fourier transform was proved by Gröchenig and Zimmerman [171].

Theorem 5.5.2. Let $(g, f) \in \mathcal{S} \times \mathcal{S}'(\mathbb{R})$. Suppose that $|S(f, g)(x, \xi)| \le C e^{-\pi(ax^2 + b\xi^2)/2}$ where $a > 0$, $b > 0$ and $C$ are fixed. (i) If $ab = 1$ then $g$, $f$ are both multiples of a time–frequency shift of the Gaussian $\exp(-\pi a x^2)$ while (ii) if $ab > 1$ then $f = 0$ or $g = 0$.
Clearly (i) implies (ii). We will outline the proof of (i) because the technique is, in a sense, dual to that used by Janssen in proving Benedicks’ theorem for the Wigner distribution. Gröchenig and Zimmerman defined a function $F(x, \xi) = e^{2\pi i x \xi}\, S(f, g)(x, \xi)\, S(f, g)(-x, -\xi)$. Covariance properties of $S(f, g)$ imply that $\widehat{F}(\eta, y) = F(-y, \eta)$ where $\widehat{F}$ is the two-variable Fourier transform of $F$. Consequently, by Hardy’s theorem in $\mathbb{R}^2$ [345], $F$ itself cannot decay faster than a corresponding bivariate Gaussian in $\mathbb{R}^2$. From symmetry properties, one concludes that decay of $S(f, g)$ suffers the same limitation. Gröchenig and Zimmerman also derived Benedicks’ theorem for $S(f, g)$ based on this approach.

One has the identities $W(f, \tilde{g})(u, v) = 2 e^{4\pi i u v}\, S(f, g)(2u, 2v) = W(\tilde{g}, f)(u, v)$ where $\tilde{g}(x) = g(-x)$. As such, the special case of Hardy’s theorem in which $f$ is assumed to be symmetric or antisymmetric follows from Janssen’s approach. Bonami et al. [56] also proved the following analogue of Beurling’s theorem.

Theorem 5.5.3. If $f, g \in L^2(\mathbb{R}^n)$ satisfy

$$\iint |S(f, g)|^2\, \frac{e^{\pi |x|^2}}{(1 + |x|)^N}\, dx\, d\xi < \infty \quad \text{and} \quad \iint |S(f, g)|^2\, \frac{e^{\pi |\xi|^2}}{(1 + |\xi|)^N}\, dx\, d\xi < \infty$$

then $f(x) = P(x)\, e^{2\pi i w \cdot x}\, e^{-\pi |x - a|^2}$ and $g(x) = Q(x)\, e^{2\pi i w \cdot x}\, e^{-\pi |x - a|^2}$ for some fixed $a, w \in \mathbb{R}^n$ and polynomials $P$, $Q$.

5.5.5 Heisenberg’s inequality and phase plane rotations

The fractional Fourier transform (FRFT), discovered in turn by Condon [93] and Bargmann [21], and later by Namias [288], who coined its name, can be defined initially on $L^1(\mathbb{R})$-functions as

$$\begin{aligned}
f \mapsto \mathcal{F}_\theta f(\xi) = f_\theta(\xi) &= (-i e^{i\theta} \csc\theta)^{1/2} \int e^{\pi i \cot\theta\, (x^2 + \xi^2) - 2\pi i (\csc\theta)\, x\xi}\, f(x)\, dx \\
&= (-i e^{i\theta} \csc\theta)^{1/2}\, C_{\cot\theta}(\xi)\, \bigl( C_{\cot\theta}(\cdot) f \bigr)^\wedge(\xi \csc\theta) \qquad (\sin\theta \ne 0)
\end{aligned} \tag{5.45}$$

where $C_u$ is the chirp $C_u(x) = e^{\pi i u x^2}$. This representation (see [144], p. 193) can be derived from the metaplectic representation, an approach that makes the connection with rotations of the Wigner distribution clear (e.g., [144], Prop. 4.28, p. 180): $\mathcal{F}_\theta(f)$ amounts to a rotation of the Wigner distribution of $f$ through an angle of $-\theta$. Precisely, $W(f_\theta, g_\theta)(x, \xi) = W(f, g)(\rho_{-\theta}(x, \xi))$ where $\rho_\theta$ denotes rotation through an angle $\theta$. By Mehler’s formula (5.22), $\mathcal{F}_\theta$ can also be defined in terms of its action on the orthonormal Hermite basis (5.21):

$$f_\theta = \sum_{k=0}^{\infty} e^{-ik\theta}\, \langle f, h_k \rangle\, h_k. \tag{5.46}$$
Thus, curiously, rotating W (f ) in phase space amounts to rotating the eigenvalues in its Hermite expansion.
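The operators $Z$, $Z^*$ defined by
$$Z = (X + iD), \qquad Z^* = (X - iD)$$
(as before, $(Xf)(x) = xf(x)$, $Df(x) = (1/2\pi i)\,df/dx$) satisfy (see [144], p. 53)
$$Z h_k = \sqrt{\frac{k}{\pi}}\; h_{k-1}, \qquad Z^* h_k = \sqrt{\frac{k+1}{\pi}}\; h_{k+1}. \tag{5.47}$$
The product $ZZ^*$ satisfies $ZZ^* = D^2 + X^2 + I/(2\pi)$ so that
$$\pi ZZ^* h_k = (k+1)\,h_k, \qquad \pi Z^*Z h_k = k\,h_k, \qquad \pi(ZZ^* - Z^*Z)h_k = h_k. \tag{5.48}$$
Consequently, $ZZ^*$ commutes with $\mathcal{F}_\theta$, although $Z$ and $Z^*$ themselves do not. Property (5.48) yields $\pi(ZZ^* - Z^*Z) = I$ so that, by induction,
$$Z^k Z^{*k} = ZZ^*\Big(ZZ^* + \frac{1}{\pi} I\Big)\Big(ZZ^* + \frac{2}{\pi} I\Big)\cdots\Big(ZZ^* + \frac{k-1}{\pi} I\Big). \tag{5.49}$$
Thus $Z^k Z^{*k}$ also commutes with $\mathcal{F}_\theta$ and so
$$\|Z^{*k}\mathcal{F}_\theta f\|_2^2 = \langle Z^{*k}\mathcal{F}_\theta f,\, Z^{*k}\mathcal{F}_\theta f\rangle = \langle \mathcal{F}_{-\theta} Z^k Z^{*k} \mathcal{F}_\theta f,\, f\rangle = \|Z^{*k} f\|_2^2.$$
Similarly, $\|Z^k \mathcal{F}_\theta f\|_2^2 = \|Z^k f\|_2^2$. In fact, repeated application of (5.47) in (5.46) yields, upon substituting $\nu = m - k$ (and setting $h_\nu = 0$ when $\nu < 0$):
$$\begin{aligned}
Z^k \mathcal{F}_\theta f &= \sum_{m=0}^{\infty} e^{-im\theta}\,\langle f, h_m\rangle\, Z^k h_m \\
&= \sum_{m=0}^{\infty} e^{-im\theta}\,\langle f, h_m\rangle\, \sqrt{\frac{m!}{(m-k)!\,\pi^k}}\; h_{m-k} \\
&= e^{-ik\theta} \sum_{\nu=0}^{\infty} \sqrt{\frac{(\nu+k)!}{\nu!\,\pi^k}}\; e^{-i\nu\theta}\,\langle f, h_{\nu+k}\rangle\, h_\nu \\
&= e^{-ik\theta}\, \mathcal{F}_\theta Z^k f
\end{aligned}$$
since $\langle f, h_{\nu+k}\rangle \sqrt{(\nu+k)!/(\nu!\,\pi^k)}$ is the $\nu$th coefficient in the Hermite expansion of $Z^k f$. Unitarity of $\mathcal{F}_\theta$ then yields
$$\langle Z^k \mathcal{F}_\theta f,\, Z^{*k}\mathcal{F}_\theta f\rangle = e^{-2ik\theta}\,\langle Z^k f,\, Z^{*k} f\rangle. \tag{5.50}$$
Consequently, $|\langle Z^k f, Z^{*k} f\rangle|$ is also invariant under $\mathcal{F}_\theta$. Since $X = (Z + Z^*)/2$ and $D = i(Z^* - Z)/2$ one has
$$\|Xf\|_2^2 = \frac14 \|(Z + Z^*)f\|_2^2 = \frac14\Big\{\|Zf\|_2^2 + \|Z^*f\|_2^2 + 2\,\mathrm{Re}\,\langle Zf, Z^*f\rangle\Big\}$$
and
$$\|Df\|_2^2 = \frac14 \|(Z - Z^*)f\|_2^2 = \frac14\Big\{\|Zf\|_2^2 + \|Z^*f\|_2^2 - 2\,\mathrm{Re}\,\langle Zf, Z^*f\rangle\Big\}.$$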
The invariance of $\|Xf\|_2^2 + \|Df\|_2^2$ under $\mathcal{F}_\theta$ then follows from that of $\|Zf\|_2^2$ and $\|Z^*f\|_2^2$. It is clear, however, that the Heisenberg spread $\|Xf\|_2\|Df\|_2$ itself is not invariant under $\mathcal{F}_\theta$: in view of (5.50),
$$\|Xf_\theta\|_2^2\, \|Df_\theta\|_2^2 = \frac{1}{16}\Big\{\big(\|Zf\|_2^2 + \|Z^*f\|_2^2\big)^2 - 4\big(\mathrm{Re}\; e^{-2i\theta}\langle Zf, Z^*f\rangle\big)^2\Big\}$$
depends on $\theta$. In contrast, the quantity
$$\sigma_1^2(f) = \frac{1}{16\,\|f\|_2^4}\Big\{\big(\|Zf\|_2^2 + \|Z^*f\|_2^2\big)^2 - 4\,|\langle Zf, Z^*f\rangle|^2\Big\} \tag{5.51}$$
is independent of $\theta$. Moreover, by the Cauchy–Schwarz inequality and (5.48), $\sigma_1$ is related to the Heisenberg spread via
$$\|Xf_\theta\|_2^2\,\|Df_\theta\|_2^2 \;\ge\; \sigma_1^2(f)\,\|f\|_2^4 \;\ge\; \frac{1}{16}\big(\|Zf\|_2^2 - \|Z^*f\|_2^2\big)^2 = \frac{\|f\|_2^4}{16\pi^2}.$$
By applying much the same analysis as above one has the following (see [287]).

Theorem 5.5.4. For $k = 1, 2, \ldots$ the $k$th order quantity $\sigma_k$ defined by
$$\sigma_k^2(f) = \frac{1}{16\,\|f\|_2^4}\Big\{\big(\|Z^k f\|_2^2 + \|Z^{*k} f\|_2^2\big)^2 - 4\,|\langle Z^k f, Z^{*k} f\rangle|^2\Big\} \tag{5.52}$$
is invariant under $\mathcal{F}_\theta$ and satisfies the uncertainty inequality
$$\sigma_k^2(f) \;\ge\; \frac{1}{16\,\|f\|_2^4}\big(\|Z^k f\|_2^2 - \|Z^{*k} f\|_2^2\big)^2.$$
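The first-order case of the computation above can be checked numerically on a finite Hermite expansion. The following is a minimal sketch, not part of the original text: it represents $f = \sum_m c_m h_m$ by its coefficient list, applies (5.47) to get the coefficients of $Zf$ and $Z^*f$, and rotates coefficients as in (5.46). The function names (`ladder_norms`, `frft_coeffs`, `heisenberg_product`, `sigma1_sq`) and the sample coefficients are illustrative assumptions.

```python
import cmath
import math


def ladder_norms(c):
    """Return (||Zf||^2, ||Z*f||^2, <Zf, Z*f>) for f = sum_m c[m] h_m."""
    n = len(c)
    # (5.47): Z h_m = sqrt(m/pi) h_{m-1},  Z* h_m = sqrt((m+1)/pi) h_{m+1}.
    # Coefficient of h_v in Zf is c[v+1]*sqrt((v+1)/pi); in Z*f it is c[v-1]*sqrt(v/pi).
    zf = [c[v + 1] * math.sqrt((v + 1) / math.pi) if v + 1 < n else 0j
          for v in range(n + 1)]
    zsf = [c[v - 1] * math.sqrt(v / math.pi) if v >= 1 else 0j
           for v in range(n + 1)]
    a = sum(abs(z) ** 2 for z in zf)
    b = sum(abs(z) ** 2 for z in zsf)
    inner = sum(z * w.conjugate() for z, w in zip(zf, zsf))
    return a, b, inner


def frft_coeffs(c, theta):
    """Hermite coefficients of f_theta, following (5.46)."""
    return [cmath.exp(-1j * m * theta) * cm for m, cm in enumerate(c)]


def heisenberg_product(c):
    """||Xf||^2 ||Df||^2 written in terms of Z and Z*."""
    a, b, inner = ladder_norms(c)
    return ((a + b) ** 2 - 4 * inner.real ** 2) / 16


def sigma1_sq(c):
    """The rotation-invariant quantity sigma_1^2(f) of (5.51)."""
    a, b, inner = ladder_norms(c)
    norm4 = sum(abs(z) ** 2 for z in c) ** 2
    return ((a + b) ** 2 - 4 * abs(inner) ** 2) / (16 * norm4)


# Hypothetical sample coefficients, chosen only for illustration.
coeffs = [1.0 + 0j, 0.5 - 0.25j, -0.3j, 0.2 + 0.1j]

if __name__ == "__main__":
    for theta in (0.0, 0.4, 1.3):
        rotated = frft_coeffs(coeffs, theta)
        print(theta, heisenberg_product(rotated), sigma1_sq(rotated))
```

In this sketch the Heisenberg product changes with `theta` while `sigma1_sq` does not, and both stay above the bound $1/(16\pi^2)$ after normalization by $\|f\|_2^4$.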
5.5.6 DeBruijn’s inequalities

DeBruijn [111] referred to the Wigner distribution of a signal as its musical score. He then interpreted inequalities for the musical score as precise statements about the limitations of the musical score as a representation of pitch local in time. As noted, inequality (5.40) for $k = 1$ has the classical uncertainty principle (5.1) as a consequence. DeBruijn used power series methods to prove the scale of inequalities (5.40) for higher moments of the Wigner distribution. Although DeBruijn did not explicitly mention this fact, his estimates imply the following analogue of Beurling’s theorem (which also follows from Theorems 5.5.2 and 5.5.3): if $\iint e^{2\pi(x^2+\xi^2)}\, W(f)(x,\xi)\,dx\,d\xi < \infty$ then $f \equiv 0$. In this section we consider a new method [196] to extend (5.40) to the case in which $(x^2+\xi^2)^k$ is replaced by a radial weight function on the time–frequency plane. As with Theorems 5.5.2 and 5.5.3, we prove the corresponding result for the short-time Fourier transform $S(f,G)$ with $G$ the Gaussian $G(x) = e^{-\pi x^2}$, instead of $W(f)$, as the constants that arise are more tractable for $S(f,G)$. Henceforth, we refer to $S(f,G)$ as the Gabor transform of $f$.
In either case, one takes advantage of rotational invariance of the weight by using rotational covariance of the Wigner or Gabor transforms. Because $\mathcal{F}_1\mathcal{F}_2 W(f,g)(x,\xi) = e^{-\pi i x\xi}\, S(f,\tilde g)(-x,\xi)$, it follows that $|S(f_\theta, g_\theta)(x,\xi)| = |S(f,g)(\rho_{-\theta}(x,\xi))|$. Explicit calculations of Wigner distributions (cf. [144], p. 66) yield the identity
$$W(h_k, G)(x,\xi) = \frac{2^{k+1}\pi^{k/2}}{\sqrt{k!}}\,(x+i\xi)^k\, e^{-2\pi(x^2+\xi^2)}$$
so that for the Gabor transform one also has
$$S(h_k, G)(x,\xi) = e^{-\pi i x\xi}\,\frac{\pi^{k/2}}{\sqrt{k!}}\,(-1)^k\,(x+i\xi)^k\, e^{-\pi(x^2+\xi^2)/2}. \tag{5.53}$$
Since the Gabor transform is linear in $f$,
$$S(f_\theta, G) = \sum_{k=0}^{\infty} e^{-ik\theta}\,\langle f, h_k\rangle\, S(h_k, G).$$
By Plancherel’s theorem and (5.53),
$$\int_0^{2\pi} |S(f_\theta, G)(x,\xi)|^2\,d\theta = 2\pi \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2\, |S(h_k, G)(x,\xi)|^2 = 2\pi \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2\, \frac{\pi^k}{k!}\,(x^2+\xi^2)^k\, e^{-\pi(x^2+\xi^2)}.$$
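Suppose now that $\omega(x,\xi)$ is a radial weight function that can be written $\omega(x,\xi) = \omega(r) = \int_0^{2\pi} V(r\cos\varphi)\,d\varphi$ for all $r > 0$ for some non-negative $V$, as is the case when $\omega(x,\xi) = (x^2+\xi^2)^\alpha$. Then, setting $(x,\xi) = (r\cos\varphi, r\sin\varphi)$ one has
$$\begin{aligned}
\iint V(x) \int_0^{2\pi} |S(f_\theta, G)(x,\xi)|^2\,d\theta\,dx\,d\xi
&= 2\pi \iint V(x) \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2\, \frac{\pi^k}{k!}\,(x^2+\xi^2)^k\, e^{-\pi(x^2+\xi^2)}\,dx\,d\xi \\
&= 2\pi \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2\, \frac{\pi^k}{k!} \int_0^{\infty}\!\!\int_0^{2\pi} V(r\cos\varphi)\, r^{2k+1}\, e^{-\pi r^2}\,d\varphi\,dr \\
&= 2\pi \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2\, \frac{\pi^k}{k!}\, M(k,\omega)
\end{aligned}$$
where $M(k,\omega) = \int_0^\infty \omega(r)\, r^{2k+1}\, e^{-\pi r^2}\,dr$. On the other hand, since $\tilde G = G_\theta = G$, $2S(f,g)(x,\xi) = e^{-\pi i x\xi}\, W(f,\tilde g)(x/2,\xi/2)$, and since $W(f_\theta, g_\theta)(x,\xi) = W(f,g)(\rho_{-\theta}(x,\xi))$, we have $|S(f_\theta, G)(x,\xi)| = |S(f,G)(\rho_{-\theta}(x,\xi))|$. Hence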
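$$\begin{aligned}
\iint V(x) \int_0^{2\pi} |S(f_\theta, G)(x,\xi)|^2\,d\theta\,dx\,d\xi
&= \int_0^\infty\!\!\int_0^{2\pi} V(r\cos\varphi) \int_0^{2\pi} |S(f_\theta, G)(r\cos\varphi, r\sin\varphi)|^2\,d\theta\,d\varphi\; r\,dr \\
&= \int_0^\infty\!\!\int_0^{2\pi}\!\!\int_0^{2\pi} V(r\cos\varphi)\, |S(f_{\theta-\varphi}, G)(r,0)|^2\,d\theta\,d\varphi\; r\,dr \\
&= \int_0^\infty\!\!\int_0^{2\pi} |S(f_{-\theta}, G)(r,0)|^2 \Big(\int_0^{2\pi} V(r\cos(\theta+\varphi))\,d\varphi\Big)\,d\theta\; r\,dr \\
&= \int_0^\infty\!\!\int_0^{2\pi} |S(f, G)(r\cos\theta, r\sin\theta)|^2\, \omega(r)\,d\theta\; r\,dr \\
&= \iint \omega(x,\xi)\, |S(f,G)(x,\xi)|^2\,dx\,d\xi.
\end{aligned}$$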
Thus we have proved the following.

Theorem 5.5.5. Whenever $\omega(r) = \int_0^{2\pi} V(r\cos\theta)\,d\theta$ is a positive, radial function, $G$ is the Gaussian and $f$ is a suitably behaved function,
$$\iint \omega(x,\xi)\, |S(f,G)(x,\xi)|^2\,dx\,d\xi = 2\pi \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2\, \frac{\pi^k}{k!}\, M(k,\omega)$$
where $M(k,\omega) = \int_0^\infty \omega(r)\, r^{2k+1}\, e^{-\pi r^2}\,dr$.

Corollary 5.5.6. Under the hypotheses of the theorem,
$$\iint \omega(x,\xi)\, |S(f,G)(x,\xi)|^2\,dx\,d\xi \ge C_\omega\, \|f\|_2^2$$
where $C_\omega = 2\inf_k \pi^{k+1} M(k,\omega)/k!$.

Example. Let $V(x) = e^{\pi\alpha x^2}$. Then $\omega(r) = \int_0^{2\pi} V(r\cos\theta)\,d\theta \ge C e^{\pi\alpha r^2}/r$ and the moments $M(k,\omega)$ all diverge when $\alpha \ge 1$. Thus $S(f,G)(x,\xi)\, e^{\pi(x^2+\xi^2)/2}$ cannot be square integrable unless $f = 0$, an analogue of Beurling’s theorem for the Gabor transform.

Example. For $\alpha \ge 0$,
$$\iint (x^2+\xi^2)^\alpha\, |S(f,G)(x,\xi)|^2\,dx\,d\xi = \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2\, \frac{\Gamma(k+1+\alpha)}{\pi^\alpha\, k!}.$$
This follows immediately from taking $\omega(r) = r^{2\alpha}$ then noting that
$$\int_0^\infty r^{2k+2\alpha}\, e^{-\pi r^2}\, r\,dr = \frac{1}{2\pi}\,\frac{1}{\pi^{k+\alpha}} \int_0^\infty u^{k+\alpha+1}\, e^{-u}\,\frac{du}{u} = \frac{\Gamma(k+1+\alpha)}{2\pi^{\alpha+k+1}}.$$
By differentiating this scale of identities with respect to $\alpha$ at $\alpha = 0$ one obtains also the following logarithmic weighted identity.
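The moment computation in the last example is easy to confirm numerically. The sketch below is an illustration under stated assumptions, not the authors' method: it approximates $M(k, r^{2\alpha})$ with a composite Simpson rule on a truncated interval (the cutoff `upper` and the step count are arbitrary choices) and compares the result with the closed form $\Gamma(k+1+\alpha)/(2\pi^{k+\alpha+1})$. The function names are hypothetical.

```python
import math


def moment(k, alpha, upper=12.0, steps=20000):
    """Composite Simpson approximation of M(k, r^(2*alpha)) =
    integral over (0, inf) of r^(2*alpha + 2*k + 1) * exp(-pi r^2) dr,
    truncated to [0, upper] (the Gaussian tail beyond that is negligible)."""
    h = upper / steps

    def integrand(r):
        return r ** (2 * alpha + 2 * k + 1) * math.exp(-math.pi * r * r)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def moment_closed_form(k, alpha):
    """Gamma(k + 1 + alpha) / (2 pi^(k + alpha + 1)), as in the example."""
    return math.gamma(k + 1 + alpha) / (2 * math.pi ** (k + alpha + 1))


def gabor_weight(k, alpha):
    """Coefficient of |<f, h_k>|^2 in Theorem 5.5.5 for omega(r) = r^(2 alpha)."""
    return 2 * math.pi * math.pi ** k / math.factorial(k) * moment(k, alpha)


if __name__ == "__main__":
    for k, alpha in [(0, 0.0), (1, 0.5), (3, 1.0)]:
        print(k, alpha, moment(k, alpha), moment_closed_form(k, alpha))
```

For $\alpha = 0$ the weight returned by `gabor_weight` is approximately 1 for every $k$, matching the case $\omega \equiv 1$ of the theorem.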
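Corollary 5.5.7. $S(f,G)$ satisfies
$$\iint \ln(x^2+\xi^2)\, |S(f,G)(x,\xi)|^2\,dx\,d\xi = \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2 \Big(\frac{\Gamma'(k+1)}{\Gamma(k+1)} - \ln\pi\Big) = -(\gamma + \ln\pi)\,\|f\|_2^2 + \sum_{k=1}^{\infty} |\langle f, h_k\rangle|^2 \sum_{m=1}^{k} \frac{1}{m}$$
where $\gamma$ is Euler’s constant $\gamma = \lim_{m\to\infty}\big(\sum_{k=1}^{m} 1/k - \ln m\big) = 0.577\ldots$.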
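The second equality in the corollary rests on the standard identity $\Gamma'(k+1)/\Gamma(k+1) = -\gamma + \sum_{m=1}^{k} 1/m$. A minimal numerical sketch, assuming only the Python standard library, approximates the logarithmic derivative by a central difference of `math.lgamma`; the step size `h` and the function names are illustrative choices.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, to double precision


def digamma_numeric(x, h=1e-5):
    """Central-difference approximation of Gamma'(x)/Gamma(x) = d/dx ln Gamma(x)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)


def harmonic(k):
    """k-th harmonic number; the empty sum gives 0 for k = 0."""
    return sum(1.0 / m for m in range(1, k + 1))


def log_weight(k):
    """Coefficient of |<f, h_k>|^2 in Corollary 5.5.7."""
    return -EULER_GAMMA - math.log(math.pi) + harmonic(k)


if __name__ == "__main__":
    for k in range(6):
        print(k, digamma_numeric(k + 1) - math.log(math.pi), log_weight(k))
```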
Corresponding identities and uncertainty principles hold for the Wigner distribution. Because of sesquilinearity of $W$ one has
$$W(f_\theta) = \sum_k \sum_l \langle f, h_k\rangle\, \overline{\langle f, h_l\rangle}\; e^{-i(k-l)\theta}\; W(h_k, h_l)$$
which, upon integrating over $\theta$, yields
$$\int_0^{2\pi} W(f_\theta)(x,\xi)\,d\theta = 2\pi \sum_{k=0}^{\infty} |\langle f, h_k\rangle|^2\; W(h_k)(x,\xi).$$
A symmetrization argument yields an analogue of Theorem 5.5.5. In the case $\omega(r) = r^{2\alpha}$ one obtains
$$\iint (x^2+\xi^2)^\alpha\, |W(f)(x,\xi)|^2\,dx\,d\xi = \sum_{k=0}^{\infty} D_k\, |\langle f, h_k\rangle|^2$$
in which $D_k$ grows like $k^\alpha$ [196]. DeBruijn’s inequalities (5.40) are recovered by choosing $\alpha \in \mathbb{N}$.
5.6 Weighted Fourier inequalities and uncertainty

The following observation of Benedetto and Heinig (e.g., [30]) shows that weighted norm inequalities for the Fourier transform imply Fourier uncertainty inequalities (cf. Cowling and Price [95]).

Proposition 5.6.1. Suppose that $1 < p, q < \infty$ and $u, v$ are non-negative weight functions. Set $\|f\|_{L^p_v} = \big(\||f|^p v\|_{L^1}\big)^{1/p}$. If there is a constant $C > 0$ such that, for all $f$ in $L^p_v(\mathbb{R})$,
$$\|\hat f\|_{L^q_u} \le C\, \|f\|_{L^p_v} \tag{5.54}$$
then, for all such $f$,
$$\|f\|_{L^2}^2 \le 4\pi C\; \|x f(x)\|_{L^p_v}\; \|\xi \hat f(\xi)\|_{L^{q'}_{u^{1-q'}}}.$$
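In the unweighted case $p = q = 2$, $u = v = 1$, $C = 1$, the proposition reads $\|f\|_2^2 \le 4\pi \|xf\|_2 \|\xi\hat f\|_2$, and this can be sanity-checked by quadrature. The sketch below is illustrative only: it uses the transform pairs $e^{-\pi x^2} \mapsto e^{-\pi\xi^2}$ and $x e^{-\pi x^2} \mapsto -i\xi e^{-\pi\xi^2}$ (for the convention $\hat f(\xi) = \int f(x) e^{-2\pi i x\xi}\,dx$), and the integration window $[-8, 8]$ and helper names are assumptions.

```python
import math


def simpson(func, lo, hi, steps=4000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def l2_norm(func):
    """L^2 norm over the window [-8, 8]; adequate for Gaussian-type functions."""
    return math.sqrt(simpson(lambda t: abs(func(t)) ** 2, -8.0, 8.0))


def heisenberg_sides(f, f_hat):
    """Return (||f||^2, 4 pi ||x f|| ||xi f_hat||)."""
    lhs = l2_norm(f) ** 2
    rhs = 4 * math.pi * l2_norm(lambda x: x * f(x)) * l2_norm(lambda xi: xi * f_hat(xi))
    return lhs, rhs


def gauss(x):
    return math.exp(-math.pi * x * x)  # its own Fourier transform


def odd(x):
    return x * math.exp(-math.pi * x * x)


def odd_hat(xi):
    return -1j * xi * math.exp(-math.pi * xi * xi)  # Fourier transform of `odd`


if __name__ == "__main__":
    print(heisenberg_sides(gauss, gauss))   # the two sides agree for the Gaussian
    print(heisenberg_sides(odd, odd_hat))   # strict inequality here
```

The Gaussian gives equality, while for the odd function the right-hand side is three times the left, in line with the Hermite eigenvalue $2k+1$ for $k = 1$.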
Proof. Integration by parts yields:
$$\begin{aligned}
\|\hat f\|_{L^2}^2 &= -\int \xi\, \big(|\hat f(\xi)|^2\big)'\,d\xi \le \int \big|\xi\, \big(|\hat f(\xi)|^2\big)'\big|\,d\xi \le 2\int |\xi\, \hat f(\xi)\, \hat f'(\xi)|\,d\xi \\
&= 2\int |\xi\, \hat f(\xi)\, u(\xi)^{-1/q}|\; |\hat f'(\xi)\, u(\xi)^{1/q}|\,d\xi \\
&\le 2\Big(\int |\xi \hat f(\xi)|^{q'}\, u(\xi)^{-q'/q}\,d\xi\Big)^{1/q'} \Big(\int |\hat f'(\xi)|^{q}\, u(\xi)\,d\xi\Big)^{1/q} \\
&\le 2C\Big(\int |\xi \hat f(\xi)|^{q'}\, u(\xi)^{-q'/q}\,d\xi\Big)^{1/q'} \Big(\int |(\hat f')^{\vee}(x)|^{p}\, v(x)\,dx\Big)^{1/p} \\
&= 4\pi C\Big(\int |\xi \hat f(\xi)|^{q'}\, u(\xi)^{-q'/q}\,d\xi\Big)^{1/q'} \Big(\int |x f(x)|^{p}\, v(x)\,dx\Big)^{1/p}.
\end{aligned}$$
This proves the proposition.

When $p = q = 2$ and $u = v = 1$ the inequality (5.54) becomes Plancherel’s identity; in particular, $C = 1$ in that case and the proof above is that of Heisenberg’s uncertainty principle inequality (5.1) credited to Weyl. The problem of finding weights $u, v$ and exponents $p, q$ such that (5.54) holds is described by García-Cuerva and Rubio de Francia as “probably one of the most interesting and hard problems concerning weighted inequalities” (see [151], p. 468). Indeed, special cases of the two-weights Fourier problem appear in diverse and deep mathematical applications. The Hausdorff–Young and Pitt-type inequalities encountered above are special instances of (5.54) for which sharp bounds are known. Validity of (5.54) is characterized by effective weight conditions (though without sharp bounds) for a broad class of monotone weight pairs $(u, v)$. Rearrangements play a vital role. Suppose that $u$ and $1/v$ in (5.54) are radially decreasing away from the origin. Then the rearrangement weight inequality
$$\Big(\int |\hat f^{\star}|^{q}\, u\Big)^{1/q} \le C\Big(\int |f^{\star}|^{p}\, v\Big)^{1/p} \tag{5.55}$$
is stronger than (5.54) because the left-hand side is maximized and the right-hand side is minimized relative to all equimeasurable rearrangements of $f$ and $\hat f$. Thus (5.55) imposes an essential constraint on weight pairs, but one that allows an effective condition for verification. There are three independent proofs that condition (5.56) below is necessary and sufficient for (5.55), at least when $p \le q$. These are due to Benedetto and Heinig [36], to Jurkat and Sampson [218], and to Muckenhoupt [281] (see also [37] for illumination of subsequent developments).

Theorem 5.6.2. If $1 < p < \infty$ and $1 < q < \infty$ then
$$\sup_{s>0}\Big(\int_0^{s} u^*\Big)^{1/q} \Big(\int_0^{1/s} \Big(\Big(\frac{1}{v}\Big)^{*}\Big)^{p'/p}\Big)^{1/p'} = A < \infty \tag{5.56}$$
˜ and all r > 0 either if and only if, for some positive B, D, D µZ
¶µ Z
µ ¶p0 /p ¶q/p0 1 ≤ Dq (q ≤ p0 ) or 0 /q−1 ∗ v p p−1 ξ u >Br v p0 ). (5.57) u dx ≤D 0 −1 1 ∗ q/p v q/p >Br ru>1 x (( v ) ) u∗ dξ
In case $p \le q$, either of these conditions is equivalent to (5.55). The case $q > p'$ of (5.57) follows from the case $q \le p'$ essentially by interchanging the roles of the weights. For the sake of palatability we will restrict the proofs to the case in which $p = q = 2$. Whereas the proof that (5.59) implies (5.60) uses Plancherel’s theorem, the general case $p \le q$ invokes the Hausdorff–Young inequality at the same point. Though the result holds in $\mathbb{R}^n$, we continue to work in just one variable here. For convenience, we will also assume that all integrals written below converge. We refer to Muckenhoupt [281] for full details of the general case. Restated for $p = q = 2$, Theorem 5.6.2 says:
$$\sup_{s>0}\Big(\int_0^{s} u^*\Big)\Big(\int_0^{1/s} \Big(\frac{1}{v}\Big)^{*}\Big) = A^2 < \infty \tag{5.58}$$
if and only if, for some $B > 0$ and all $r > 0$,
$$\Big(\int_{u^* > Br} u^*\Big)\Big(\int_{v < r} \frac{1}{v}\Big) \le D^2. \tag{5.59}$$
Either condition is equivalent to the weighted inequality
$$\int |\hat f|^2\, u \le c \int |f|^2\, v. \tag{5.60}$$
We will show that (5.58) ⇒ (5.59) following Muckenhoupt [280], then proceed to show that (5.59) ⇒ (5.60) and (5.60) ⇒ (5.58). To prove that (5.58) implies (5.59), one sets $B = A^2$, fixes $r > 0$ and chooses $s$ so that $u^*(s) > Br$ (if there is no such $s$ then the first integral in (5.59) vanishes and (5.59) holds with $D = 0$). Since $u^*$ is nonincreasing,
$$\int_0^{s} u^* \ge B\,r\,s.$$
Since $(1/v)^*$ is nonincreasing on $[0, 1/s]$, one also has $(1/v)^*(1/s)/s \le \int_0^{1/s} (1/v)^*$ so (5.58) implies that
$$B\,r\,\Big(\frac{1}{v}\Big)^{*}\Big(\frac{1}{s}\Big) \le \int_0^{s} u^* \int_0^{1/s} \Big(\frac{1}{v}\Big)^{*} \le A^2 \quad\text{or}\quad \Big(\frac{1}{v}\Big)^{*}\Big(\frac{1}{s}\Big) \le \frac{1}{r}$$
since $B = A^2$.
Since $(1/v)^*$ is nonincreasing it follows that $x \in [0, 1/s]$ whenever $(1/v)^*(x) > 1/r$. By the definition of equimeasurable rearrangement, therefore,
$$\int_0^{s} u^* \int_{v<r} \frac{1}{v} \le A^2$$
for every $s$ with $u^*(s) > Br$, in particular in the limiting case $s = \sup\{x : u^*(x) > Br\}$. This shows that (5.59) holds with $A = D$.

Next we show that (5.59) implies (5.60). Local integrability of $1/v$ implies that $L^1 \cap L^p_v$ is dense in $L^p_v$ so one can assume that $f$ in (5.60) is integrable. It will help to assume that the level sets of $u$ and $v$ have measure zero. One can always find functions $\tilde u, \tilde v$ having essentially the same magnitudes as $u$ and $v$, respectively, satisfying this property. Moreover, we will assume to begin that $|\{x : u(x) > 0\}| < \infty$ and derive bounds independent of $|\{x : u(x) > 0\}|$. Fix $B$ and set $E_j = \{\xi : 2^j B < u(\xi) \le 2^{j+1} B\}$. Also set $V_j = \{x : v(x) \ge 2^{j-1}\}$ and set $f_j = f\chi_{V_j}$. Then
$$\int_{E_j} |\hat f|^2\, u \le 4\Big(\int_{E_j} |\hat f_j|^2\, u + \int_{E_j} |\hat f - \hat f_j|^2\, u\Big). \tag{5.61}$$
The result will follow by summing over $j$ once one shows that each of the corresponding terms on the right satisfies appropriate bounds. To estimate the sum of the first terms in (5.61) note that, by Plancherel’s theorem,
$$\begin{aligned}
\sum_j \int_{E_j} |\hat f_j(\xi)|^2\, u(\xi)\,d\xi &\le 2B \sum_j 2^j \int |\hat f_j(\xi)|^2\,d\xi \\
&= 2B \sum_j 2^j \int |f_j(x)|^2\,dx \\
&= 2B \int |f(x)|^2 \sum_j 2^j\, \chi_{\{v(x) \ge 2^{j-1}\}}(x)\,dx \\
&\le 4B \int |f(x)|^2\, v(x)\,dx.
\end{aligned}$$
The estimate of the sum of the second terms in (5.61) is trickier because v is small on the corresponding sets. First, ¶2 Z µZ XZ ¯ ¯ ¯(f χR\V )∧ (ξ)¯2 u(ξ) dξ ≤ |f (x)| dx u(ξ) dξ. j j
Ej
v(x)
As f is assumed to be integrable, let J be the least integer such that 2J ≥ kf k1 . Set rJ =R ∞. As the level sets of v have measure zero, one can choose rj (j < J) so that v
Z j
|f (x)| dx = 2 = 2 v
|f (x)| dx rj−1 ≤v
(j < J).
Then by (5.59), Z µZ
¶2 |f (x)| dx
u(ξ) dξ ≤
v(x)
j=−∞
j=−∞
j=−∞
| f v 1/2 | v −1/2 µZ
¶µ Z 2
u
|f | v
Brj−1 ≤ u
rj−1 ≤ v
µZ ∗
u
u∗ ≥Brj−1
v
µZ J X
≤ D2 c
j=−∞
v
rj−1 ≤ v
Z J X
≤ c
¶2 |f | u
¶2
u
Brj−1 ≤ u
Z J X
≤ c
rj−1 ≤ u/B
j=−∞
µZ
Z J X
≤ c
µZ
Z J X
1 v
rj−1 ≤ v
¶µ Z
¶
¶ 2
|f | v rj−1 ≤ v
¶ |f |2 v
1 v
µZ
¶ |f |2 v .
= c0
rj−1 ≤ v
Thus (5.59) is sufficient for (5.60). To prove necessity of (5.58), set $f = (1/v)\chi_{[-s,s]}(x)$. If (5.60) holds then
$$\begin{aligned}
\int_{-1/(8s)}^{1/(8s)} \Big|\int_{-s}^{s} \frac{1}{v}\Big|^2 u &\le 2\int_{-1/(8s)}^{1/(8s)} \Big|\int_{-s}^{s} \Big(\frac{1}{v}\Big)(x)\, e^{-2\pi i x\xi}\,dx\Big|^2 u(\xi)\,d\xi \\
&= 2\int_{-1/(8s)}^{1/(8s)} |\hat f|^2\, u \le 2\int_{-\infty}^{\infty} |\hat f|^2\, u \\
&\le c\int_{-\infty}^{\infty} |f|^2\, v = c\int_{-s}^{s} \frac{1}{v}.
\end{aligned}$$
Dividing both sides by $\int_{-s}^{s} 1/v$ yields
$$\int_{-1/(8s)}^{1/(8s)} u \int_{-s}^{s} \frac{1}{v} \le C.$$
Because (5.60) still holds when $u$ and $1/v$ are replaced by their symmetrically decreasing rearrangements, one can apply the same argument to $u^\star$ and $(1/v)^\star$. Since $u^\star$ is symmetrically decreasing one concludes (5.58).
5.7 Embeddings, uncertainty and Poisson summation

5.7.1 Weil’s approach to PSF and a generalized version

The PSF can be formulated as an identity relating values of the Zak transforms of $f$ and $\hat f$. As Auslander and Meyer [11] showed and as we review now, it then generalizes to identities relating, essentially, samples of the Zak transform. Let $\Gamma$ denote the integer coordinate subgroup of the (polarized) Heisenberg group $H$. Given $f \in \mathcal{S}(\mathbb{R})$, its Weil transform $Zf$ is defined on $H$ by
$$Zf(x,y,z) = e^{2\pi i z} \sum_{k\in\mathbb{Z}} f(x+k)\, e^{2\pi i k y}, \qquad (x,y,z) \in H,$$
and satisfies
$$\begin{aligned}
Zf((m,n,l)\cdot(x,y,z)) &= Zf(x+m,\, y+n,\, l+z+my) \\
&= e^{2\pi i (l+z+my)} \sum_k f(x+m+k)\, e^{2\pi i k(n+y)} \\
&= e^{2\pi i z} \sum_k f(x+k)\, e^{2\pi i k y} = Zf(x,y,z)
\end{aligned}$$
whenever $(m,n,l) \in \Gamma$. Further, $f \mapsto Zf$ extends to a unitary mapping from $L^2(\mathbb{R})$ to the subspace of $L^2(H/\Gamma)$ of functions satisfying $F(x,y,z+t) = e^{2\pi i t} F(x,y,z)$. Note that $Zf(x,y,0) = Zf(x,y)$, the Zak transform of $f$, and $Z$ is inverted similarly to Zak:
$$Z^{-1}(Zf)(x+m) = \int_0^1 Zf(x,y,0)\, e^{-2\pi i m y}\,dy.$$
A straightforward calculation shows that the symplectic action $JF(x,y,z) = F(y,-x,z-xy)$ satisfies the following.

Theorem 5.7.1. For $f \in L^2(\mathbb{R})$, $Z^{-1} J Z(f) = \hat f$.

Evaluating the identity $JZf = Z\hat f$ now at $(x,y,z) = (0,0,0)$ yields
$$\sum_{n\in\mathbb{Z}} f(n) = \sum_{l\in\mathbb{Z}} \hat f(l). \tag{5.62}$$
This proves the Poisson summation formula, modulo important considerations of convergence of the sums in question to which we will return. Standard dilation and shift properties of $f \mapsto \hat f$ allow one to rewrite (5.62) as
$$\sum_{k=0}^{N-1} \sum_{q\in\mathbb{Z}} f(k+qN) = \frac{1}{N} \sum_{k=0}^{N-1} \sum_{q\in\mathbb{Z}} e^{-2\pi i k q/N}\, \hat f\Big(\frac{q}{N}\Big),$$
expressing PSF in terms of oversampling $\hat f$ combined with a DFT. Applying oversampling and a DFT on both sides of PSF when $N = MP$, Auslander and Meyer [11] obtained the identity, for any $m, b \in \mathbb{Z}_M$ and $a, p \in \mathbb{Z}_P$:
$$\sum_{q\in\mathbb{Z}} f\Big(m + \frac{a}{P} + qM\Big) = P \sum_{k=0}^{N-1} e^{-2\pi i k b p/N} \sum_{q\in\mathbb{Z}} \hat f\Big(p + \frac{b}{M} + qP\Big). \tag{5.63}$$
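As a quick numerical illustration of (5.62), one can compare both sides for a dilated Gaussian, for which convergence is not an issue. This is a minimal sketch under the convention $\hat f(\xi) = \int f(x) e^{-2\pi i x\xi}\,dx$, so that $f(x) = e^{-\pi a x^2}$ has $\hat f(\xi) = a^{-1/2} e^{-\pi\xi^2/a}$; the function names and the truncation `terms` are illustrative assumptions.

```python
import math


def gaussian(x, a):
    """f(x) = exp(-pi a x^2), with a > 0."""
    return math.exp(-math.pi * a * x * x)


def gaussian_hat(xi, a):
    """Fourier transform of `gaussian` for the exp(-2 pi i x xi) convention."""
    return math.exp(-math.pi * xi * xi / a) / math.sqrt(a)


def psf_sides(a, terms=60):
    """Truncated sums of f(n) and of f_hat(l) over |n|, |l| <= terms."""
    lhs = sum(gaussian(n, a) for n in range(-terms, terms + 1))
    rhs = sum(gaussian_hat(l, a) for l in range(-terms, terms + 1))
    return lhs, rhs


if __name__ == "__main__":
    for a in (0.3, 1.0, 2.5):
        print(a, psf_sides(a))
```

For a wide Gaussian (small `a`) the left sum has many significant terms and the right sum is dominated by $l = 0$; for a narrow one the roles are exchanged.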
A direct, formal verification of (5.63) based on the standard PSF can be found in [11]. This mode of proof assumes convergence of PSF. It is also natural to seek a proof that brings out the interplay between the continuous and discrete Fourier transforms, so first we consider the analogue of PSF in the case of $\mathbb{Z}_N$. Let $H_N$ be the (polarized) Heisenberg group of $\mathbb{Z}_N$ as in (5.19). If $N = MP$ then we can also let $\Gamma = \Gamma(M,P)$ denote the subgroup of $H_N$ consisting of those elements of the form $(\alpha M, \beta P, c)$ with $(\alpha,\beta) \in \mathbb{Z}_P \times \mathbb{Z}_M$. Consider now the subspace $\mathcal{H}(M,P)$ of $L^2(H_N/\Gamma(M,P))$ consisting of functions $F$ satisfying $F(a,b,c+d) = e^{2\pi i d/N} F(a,b,c)$. By analogy with the continuous case, one defines the finite Weil transform (cf. [12]) $Z_{(M,P)} : L^2(\mathbb{Z}_N) \to L^2(H_N)$ by
$$Z_{(M,P)}(x)(a,b,c) = \frac{e^{2\pi i c/N}}{\sqrt{P}} \sum_{j\in\mathbb{Z}_P} x_{jM+a}\, e^{2\pi i j b/P}$$
which is left $\Gamma(M,P)$ invariant and unitary from $L^2(\mathbb{Z}_N)$ into $\mathcal{H}(M,P)$. The inversion formula takes the form $x_{a+dM} = (1/\sqrt{P}) \sum_{b\in\mathbb{Z}_P} X(a,b,c)\, e^{-2\pi i b d/P}$ when $X = Z_{(M,P)} x$. By analogy one now defines $J_N$ by $J_N X(a,b,c) = X(b,-a,c-ab)$. Then

Theorem 5.7.2. $Z_{(P,M)}^{-1} J_N Z_{(M,P)}\, x = \hat x$ where $\hat x$ denotes the $N$-point discrete Fourier transform of $x$.

As before, the theorem is left as a simple exercise. Evaluating $Z_{(M,P)} x$ and $Z_{(P,M)} \hat x$ at $(0,0,0)$ gives the following discrete version of PSF:
$$Z_{(P,M)}\hat x(0,0,0) = \frac{1}{\sqrt{M}} \sum_{k\in\mathbb{Z}_M} \hat x_{kP} = \frac{1}{\sqrt{P}} \sum_{j\in\mathbb{Z}_P} x_{jM}.$$
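The discrete PSF just stated can be verified directly on a small example. The sketch below assumes the unitary normalization $\hat x_k = N^{-1/2}\sum_j x_j e^{-2\pi i jk/N}$ for the $N$-point DFT (an assumption about the normalization, which is the one that makes the identity hold as written); the function names and sample data are illustrative.

```python
import cmath
import math


def unitary_dft(x):
    """N-point DFT with the unitary 1/sqrt(N) normalization (assumed here)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)) / math.sqrt(n)
        for k in range(n)
    ]


def discrete_psf_sides(x, M, P):
    """Return both sides of (1/sqrt M) sum_k xhat[kP] = (1/sqrt P) sum_j x[jM]."""
    if len(x) != M * P:
        raise ValueError("len(x) must equal M * P")
    x_hat = unitary_dft(x)
    lhs = sum(x_hat[k * P] for k in range(M)) / math.sqrt(M)
    rhs = sum(x[j * M] for j in range(P)) / math.sqrt(P)
    return lhs, rhs


if __name__ == "__main__":
    M, P = 3, 4
    sample = [complex((j * j) % 7, (3 * j) % 5) for j in range(M * P)]
    print(discrete_psf_sides(sample, M, P))
```

The identity holds for every vector of length $N = MP$, because summing $e^{-2\pi i jk/M}$ over $k \in \mathbb{Z}_M$ selects exactly the indices $j$ divisible by $M$.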
To relate the continuous and discrete theories, consider the coset space $H/\Gamma$ as a cross product of subrectangle cosets with $\mathbb{Z}_M \times \mathbb{Z}_P$. To do so, instead of starting with $\Gamma$, begin with $\Delta(1/P, 1/M) = \{(a/P, b/M, c/N) : a, b, c \in \mathbb{Z}\}$ and consider the homomorphism
$$\Phi : \Delta\Big(\frac{1}{P}, \frac{1}{M}\Big) \to H_N, \qquad \Big(\frac{a}{P}, \frac{b}{M}, \frac{c}{N}\Big) \mapsto (a \bmod N,\; b \bmod N,\; c \bmod N).$$
Then $\Phi$ is surjective with kernel $\Delta(M,P) = \{(\alpha M, \beta P, \gamma) \in \Delta(1/P, 1/M) : \alpha, \beta, \gamma \in \mathbb{Z}\}$. Moreover, $\Delta(1/P, 1/M)$ is sent to $\Delta(1/M, 1/P)$ by the transformation $(a,b,c) \mapsto (b,-a,c-ab)$ which also commutes with $\Phi$, while $\Phi(\Gamma) = \Gamma(P,M)$ so that $\Phi$ further induces a homomorphism $\Phi_N : \Delta(1/P, 1/M)/\Gamma \to H_N/\Gamma(P,M)$. In the end we need to reconcile the function theory with the group theory. If $F$ is a continuous function on $H/\Gamma$ then the samples along $\Delta(1/P, 1/M)$ are well defined. Denote these samples by $s(F)$. The subgroup $\Delta(1/P, 1/M)$ of $H$ then acts on $s(F)$ by right multiplication on sample indices. The stabilizer subgroup is $\Delta(P,M)$. Here is a summary of these facts.

Lemma 5.7.3. (i) $L^2(H_N)$ can be identified with functions on $\Delta(1/P, 1/M)$ that are $\Delta(P,M)$ invariant; (ii) if $F$ is $\Gamma$ invariant then $s(F)$ is left $\Gamma(P,M)$ invariant; and (iii) the samples of continuous functions on $H/\Gamma$ can be identified with elements of $L^2(H_N/\Gamma(P,M))$.

Given these facts, one then has the following relationship between the finite Weil transform and samples of the continuous Weil transform.

Theorem 5.7.4. Let $f \in L^2(\mathbb{R})$ be such that $Zf$ is continuous and let $x = x(f) \in L^2(\mathbb{Z}_N)$ be defined by $x_{mP+a} = \sqrt{M} \sum_{k=-\infty}^{\infty} f(m + a/P + kM)$ ($m \in \mathbb{Z}_M$, $a \in \mathbb{Z}_P$). Then $Z_{(P,M)}\, x = s(Zf)$.

Again, the proof is just an exercise in keeping track of definitions. To verify the generalized PSF that we are about to consider, one notes that $s(JF) = J_N(s(F))$ when the samples are defined. The generalized PSF now takes the following form.

Theorem 5.7.5. Let $f$ and $x$ be as in Theorem 5.7.4 and let $y_{b+pM} = \sqrt{P} \sum_{k=-\infty}^{\infty} \hat f(p + b/M + kP)$ ($b \in \mathbb{Z}_M$, $p \in \mathbb{Z}_P$). Then $y \in L^2(\mathbb{Z}_N)$ and $y = \hat x$, that is, (5.63) holds.

To prove the theorem, one has $Z_{(P,M)}\, x = s(Zf)$ while $Z_{(M,P)}\, y = s(Z\hat f)$. Since $JZf = Z\hat f$ and since $s(JZf) = J_N Z_{(P,M)}\, x$, it follows from Theorem 5.7.2 that
$$y = Z_{(M,P)}^{-1}\, s(Z\hat f) = Z_{(M,P)}^{-1}\, s(JZf) = Z_{(M,P)}^{-1} J_N Z_{(P,M)}\, x = \hat x.$$
This proves the theorem.

5.7.2 Some necessary and sufficient conditions for PSF

It has long been known that the conditions $f \in L^1(\mathbb{R})$ and $\hat f \in L^1(\mathbb{R})$ are insufficient to imply convergence of PSF. A counterexample in which both sides diverge can be found in Bourbaki [57]. Katznelson [224] first gave an example in which both sides converge but to unequal sums. The following rather simple example is due to J. Lei [254]. Define $\varphi(x)$ on $[0,\infty)$ as follows. Let $\varphi(x) = 1 - x$ on $[0,1)$ and $\varphi(x) = (k+1-x)^k (k-x)$ on $[k, k+1)$, $k = 1, 2, \ldots$. Extend $\varphi$ symmetrically to $\mathbb{R}$. Then $\varphi \in L^1(\mathbb{R})$, $\varphi(0) = 1$ and $\varphi(k) = 0$ for $k \neq 0$. On the other hand, set $\Phi(u) = \sum_{k=-\infty}^{\infty} \varphi(u+k)$. Then $\Phi$ has period one and $\Phi(0) = 1$ while if $u \in (0,1)$ then
$$\sum_{k=0}^{\infty} \varphi(u+k) = (1-u) - u\sum_{k=1}^{\infty} (1-u)^k = 0 \quad\text{while}\quad \sum_{k=-\infty}^{-1} \varphi(u+k) = u + \sum_{k=1}^{\infty} (u^{k+1} - u^k) = 0$$
and $\Phi(u) = 0$. As $\hat\varphi(n)$ is the $n$th Fourier coefficient of $\Phi$,
$$\sum_n \hat\varphi(n) = 0 \neq 1 = \sum_n \varphi(n).$$
Thus both sides of PSF converge but they are unequal. The sum $\sum_n \varphi(n)$ converges absolutely as does $\sum_n \hat\varphi(n)$ and, of course, it is only the single value $x = 0$ at which the full Poisson summation formula
$$\sum_k \varphi(x+k) = \sum_n \hat\varphi(n)\, e^{2\pi i n x} \tag{5.64}$$
fails due to discontinuity of $\Phi$. It is worthwhile to consider conditions under which PSF does converge uniformly as we shall do here. Clearly, both sides of PSF converge uniformly if both $\varphi$ and $\hat\varphi$ belong to the Wiener space $W = W(L^\infty, \ell^1)$ defined in Chapter 3 and consisting of those $f$ such that $\sum_k \|f\chi_{[k,k+1)}\|_\infty < \infty$. Convergence of (5.64) at every point is guaranteed, in fact, if either of the following asymmetric conditions holds: (i) $\{\hat\varphi(n)\} \in \ell^1$ and $\sum \varphi(x+k)$ converges absolutely and uniformly to a continuous sum, or (ii) $\{\hat\varphi(n)\} \in \ell^1$, $\varphi \in W$ and $\varphi$ is continuous. Both (i) and (ii) hold if (iii) $\varphi \in W$ and $\hat\varphi \in W$. Moreover, one has the following.

Lemma 5.7.6. If $\varphi \in W$ and $\hat\varphi \in W$ then both sides of (5.64) converge absolutely and uniformly and are equal.

The joint Wiener conditions of the lemma are rather strong and require knowledge of both $\varphi$ and $\hat\varphi$. This can be undesirable when one wants to deduce one side of (5.64) from the other. The following lemma provides a one-sided condition for verifying (5.64).

Lemma 5.7.7. If $\varphi \in L^1(\mathbb{R})$ and $\varphi' \in L^1(\mathbb{R})$ then (5.64) holds for all $x$.

Proof. One can show that $\|f\|_W \le C(\|f\|_{L^1} + \|f'\|_{L^1})$. On the other hand, $\Phi(x) = \sum \varphi(x+k)$ satisfies $\|\Phi\|_{BV[0,1]} \le \|\varphi'\|_{L^1}$. Dirichlet’s theorem (e.g., [31], p. 161) then shows that $\Phi$ is equal to its Fourier series at every point. The Wiener condition on $\varphi$ and this last observation then imply that
$$\sum_k \varphi(x+k) = \Phi(x) = \sum_n \hat\varphi(n)\, e^{2\pi i n x}$$
with absolute, uniform convergence of the sum on the left and pointwise convergence on the right. This proves the lemma.
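Returning to Lei’s example at the start of this subsection, the periodization $\Phi$ can be evaluated numerically to see the jump at the integers. The sketch below is illustrative: it implements $\varphi$ exactly as defined above and sums finitely many translates, with the truncation `terms` an arbitrary assumption that is adequate when $u$ is not extremely close to an integer.

```python
import math


def phi(x):
    """Lei's function: 1 - x on [0, 1), (k + 1 - x)^k (k - x) on [k, k + 1), even in x."""
    x = abs(x)
    k = math.floor(x)
    if k == 0:
        return 1.0 - x
    return (k + 1 - x) ** k * (k - x)


def periodization(u, terms=4000):
    """Truncated sum of phi(u + k) over |k| <= terms."""
    return sum(phi(u + k) for k in range(-terms, terms + 1))


if __name__ == "__main__":
    for u in (0.0, 0.1, 0.5, 0.9):
        print(u, periodization(u))
```

The printed values are 1 at $u = 0$ and numerically zero elsewhere in $(0,1)$, so $\Phi$ is discontinuous at the integers even though $\varphi$ itself is continuous and integrable.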
In Section 5.6 we saw that if $f$ belongs to a certain weighted $L^p$ space, and the Fourier transform maps this space to a corresponding weighted $L^q$ space, then $\|(Xf)\,v^{1/p}\|_{L^p}\, \|(\widehat{Df})\,u^{-1/q}\|_{L^{q'}} \ge C\,\|f\|_{L^2}^2$. Here we want to relate weight conditions underlying Fourier uncertainty, on the one hand, and the corresponding weight conditions needed to verify PSF, on the other. As just observed, some combination of regularity and decay on $f$ and $\hat f$ is needed to verify (5.64) so it is not completely surprising that PSF should bear some connection with Fourier uncertainty. The idea of sharing some regularity and decay among $f$ and $\hat f$ to guarantee (5.64) was formalized by Kahane and Lemarié-Rieusset [219]. A particular case of their results is that for $\|f\|_{L^p_a} = \||x|^a f\|_{L^p}$, if $f \in L^p_a(\mathbb{R})$ and $\hat f \in L^q_b(\mathbb{R})$ where $1 \le p, q \le 2$ and $(a - 1/p')(b - 1/q') > 1/(pq)$, then both sides of PSF converge absolutely and are equal. The methods of Kahane and Lemarié-Rieusset were refined in a sense by Gröchenig [164], as we discuss now.

5.7.3 $M^1$ and PSF in particular

The modulation space $M^1(\mathbb{R})$, often called the Feichtinger algebra, is defined as the space of those $f$ whose short-time Fourier transform $S(f,g)$ belongs to $L^1(\mathbb{R}\times\mathbb{R})$ for a suitable fixed $g$. Basic properties of modulation spaces, including their well-definedness in terms of $g$, are outlined in [168]. Among other desirable properties possessed by $M^1$, the PSF converges absolutely for its members since [164] $\|f\|_W + \|\hat f\|_W \le C\,\|f\|_{M^1}$. Embeddings into $M^1$ guarantee convergence of PSF but they also can be viewed as uncertainty principles in the sense of (5.43). Consider weight functions of the form $w_a = (1+|x|)^a$ and $w_{a,\alpha}(x) = (1+|x|)^a (\ln(e+|x|))^\alpha$. With these stronger weights, Gröchenig [164] strengthened the conclusions of the results of [219]; in particular, for $1 \le p, q \le 2$ and $(a - 1/p')(b - 1/q') = 1/(pq)$ and $\alpha > 1/p$, $\beta > 1/q$, Gröchenig proved that
$$\|f\|_{M^1} \le C\, \|f w_{a,\alpha}\|_p\, \|\hat f w_{b,\beta}\|_q. \tag{5.65}$$
The following result is a slight extension of (5.65) (see [196]).

Theorem 5.7.8. Let $1 < p, q \le 2$. Set $(a - 1/p')(b - 1/q') = 1/(pq)$. If $\alpha > 1/p'$ and $\beta > 1/q'$ then
$$\|f\|_{M^1} \le C\, \|f w_{a,\alpha}\|_p\, \|\hat f w_{b,\beta}\|_q \tag{5.66}$$
where $C = C(a,b,p,q,\alpha,\beta)$. The conclusion fails if $\alpha < 1/p'$ or $\beta < 1/q'$.

The embedding fails below the “critical range” of exponents. This follows from counterexamples to PSF due to Kahane and Lemarié-Rieusset. A corollary of Theorem 5.7.8 is that
$$\|f\|_{M^1} \le C\,\|f w_a\|_2\, \|\hat f w_a\|_2$$
provided $a > 1$.
The tools needed to prove this corollary are essentially those introduced by Gröchenig to prove (5.65). We include the slightly more complicated proof of the sharper result Theorem 5.7.8 (when $p < 2$ or $q < 2$) because it illustrates the role that rearrangement methods can play. The methods also point to possible connections to Proposition 5.6.1 and Theorem 5.6.2. For example, recall Pitt’s inequality for $p = q = 2$.

Example. Pitt’s inequality in one variable for $p = q = 2$ (5.35) states that $\|\hat f(\xi)|\xi|^{-s}\|_{L^2} \le C_s\, \|f(x)|x|^s\|_{L^2}$ for $0 \le s < 1/2$. Then Proposition 5.6.1 implies that
$$\|f\|_{L^2}^2 \le 4\pi C_s\, \||x|^{1+s} f(x)\|_{L^2}\, \||\xi|^{1+s} \hat f(\xi)\|_{L^2} \qquad (0 \le s < 1/2). \tag{5.67}$$
In comparison, according to [219], if the right-hand side of (5.67) is finite then the Poisson summation formula converges. On the other hand, by (5.65),
$$\|f\|_{M^1} \le C\, \|f(x)\,(1+|x|)^a\|_2\, \|\hat f(\xi)\,(1+|\xi|)^a\|_2 \qquad (a > 1). \tag{5.68}$$
One would like to replace the weight $(1+|x|)^a$ in (5.68) by $|x|^a$, at least when $a < 3/2$, so that (5.68) can be viewed as an uncertainty improvement of (5.67) (cf. (5.43)). The method of proof of Theorem 5.7.8 does not handle this possibility. Nonetheless, this example indicates a general relationship between monotone weight pairs that verify (5.54) and those that justify PSF.

5.7.4 More counterexamples to PSF

Failure of PSF precludes membership of $f$ in $M^1$ so counterexamples $f$ to PSF satisfying $\|f w_{a,\alpha}\|_p\, \|\hat f w_{b,\beta}\|_q < \infty$, $(a/n - 1/p')(b/n - 1/q') = 1/(pq)$ and $\alpha \le 1/p'$ or $\beta \le 1/q'$ will indicate that Theorem 5.7.8 is sharp. The counterexamples below follow the pattern established in [164], where $q = 2$ so that Plancherel’s theorem can be used. Let $g$ be a Schwartz function and set
$$h(x) = \sum_{k=2}^{\infty} \alpha_k\, D_{m(k)}(x)\, g(x-k) \equiv \sum_k H_k(x) \tag{5.69}$$
where $D_m(x) = \sum_{|l|\le m} e^{2\pi i l x} = (\sin 2\pi(m+1/2)x)/\sin \pi x$ is the Dirichlet kernel of order $m$. Then
$$\|D_m\|_p^p = \int_0^1 |D_m(x)|^p\,dx \le C_p\, m^{p-1} \qquad (1 < p \le 2). \tag{5.70}$$
To see this, for $p = 2$ Plancherel’s theorem gives $\|D_m\|_2^2 = 2m+1$. When $1 < p < 2$, more brutal estimates are employed. For example
$$\int_0^1 \frac{|\sin(2\pi(m+\frac12)x)|^p}{|\sin(\pi x)|^p}\,dx = \sum_{k=0}^{2m} \int_{k/(2m+1)}^{(k+1)/(2m+1)} \frac{|\sin(2\pi(m+\frac12)x)|^p}{|\sin(\pi x)|^p}\,dx.$$
Symmetry along with Jordan’s inequality ($\sin\theta \ge 2\theta/\pi$, $0 \le \theta \le \pi/2$) is used to bound the sum by $C_p\, m^{p-1}$ times a convergent $p$-series.
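The growth rate in (5.70) can be observed numerically. The following sketch is only an illustration: it evaluates $\int_0^1 |D_m|^p$ with a midpoint rule (the resolution `steps_per_lobe` is an arbitrary choice) and reports the ratio to $m^{p-1}$; it does not compute the constant $C_p$, and all function names are hypothetical.

```python
import math


def dirichlet(m, x):
    """Dirichlet kernel D_m(x) = sin(2 pi (m + 1/2) x) / sin(pi x)."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return 2.0 * m + 1.0  # limiting value at the integers
    return math.sin(2 * math.pi * (m + 0.5) * x) / s


def dirichlet_lp_power(m, p, steps_per_lobe=200):
    """Midpoint-rule approximation of the integral of |D_m|^p over [0, 1]."""
    n = steps_per_lobe * (2 * m + 1)
    h = 1.0 / n
    return h * sum(abs(dirichlet(m, (i + 0.5) * h)) ** p for i in range(n))


def growth_ratio(m, p):
    """Ratio of the p-th power norm to m^(p - 1), expected to stay bounded in m."""
    return dirichlet_lp_power(m, p) / m ** (p - 1)


if __name__ == "__main__":
    for m in (1, 4, 16, 64):
        print(m, dirichlet_lp_power(m, 2.0), growth_ratio(m, 1.5))
```

For $p = 2$ the computed values reproduce $2m + 1$, while for $p = 1.5$ the ratio to $m^{1/2}$ stays of moderate size as $m$ grows.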
Lemma 5.7.9. Let $g \in \mathcal{S}(\mathbb{R})$, $\{\alpha_k\} \subset \mathbb{R}$ and $P_k$ one-periodic. Then:
$$\Big\|\sum_{k=2}^{\infty} \alpha_k\, P_k(\cdot)\, g(\cdot - k)\, w_{a,\alpha}(\cdot)\Big\|_p^p \le C\Big(\sum_{k=2}^{\infty} |\alpha_k|^p\, \|P_k\|_p^p\, |k|^{ap}\, (\ln k)^{\alpha p}\Big).$$
The lemma follows from applying Minkowski’s inequality and the fact that $w_{a,\alpha}$ is approximately constant on unit intervals. Applying Lemma 5.7.9 to the case $P_k(x) = D_{m(k)}(x)$ one obtains the following.

Corollary 5.7.10. Let $g$ be a Schwartz function, $\{\alpha_k\} \subset \mathbb{R}$ and $1 < p \le 2$. Then
$$\Big\|\sum_{k=2}^{\infty} \alpha_k\, D_{m(k)}\, g(\cdot - k)\, w_{a,\alpha}\Big\|_p^p \le C\Big(\sum_{k=2}^{\infty} |\alpha_k|^p\, m(k)^{p-1}\, |k|^{ap}\, (\ln k)^{\alpha p}\Big).$$
The corollary provides a convenient test for membership of the functions $w_{a,\alpha} h$ from (5.69) in $L^p$. We want an equally simple test for membership of $w_{b,\beta}\hat h$ in $L^2$. With $h$ as in (5.69) we have
$$\hat h(\xi) = \sum_{k=2}^{\infty} \alpha_k\, e^{-2\pi i k\xi} \sum_{|l|\le m(k)} \hat g(\xi - l) = \sum_{l=0}^{\infty} \mu_l(\xi)\, \hat g(\xi \pm l)$$
where $\mu_l(\xi) = \sum_{m(k)\ge l} \alpha_k\, e^{-2\pi i k\xi}$. Applying Lemma 5.7.9 to this last sum and ignoring the first few terms, we have by Plancherel’s theorem:

Corollary 5.7.11. Fix $h$ in (5.69) such that $\sum_{l=2}^{\infty} \|\mu_l\|_2^2\, l^{2b}\, (\ln l)^{2\beta} < \infty$. Then $w_{b,\beta}\hat h \in L^2$.

In what follows, suppose that $g$ in (5.69) is a smooth, non-negative, compactly supported function satisfying $g(0) = 1$. Specialize the coefficients of $h$ to $\alpha_k = k^{-r-1}(\ln k)^{-t}$ where $r$ is as in the proof of Theorem 5.7.8 and $t$ is to be determined. Let $m(k)$ be the integer part of $k^r (\ln k)^{t-1}$ in (5.69) so that
$$h(x) = \sum_{k=2}^{\infty} \frac{1}{k^{r+1}(\ln k)^t}\; D_{[k^r (\ln k)^{t-1}]}(x)\; g(x-k). \tag{5.71}$$
It follows that
$$\sum_{n\in\mathbb{Z}} h(n) = \sum_{n\in\mathbb{Z}} \sum_{k=2}^{\infty} \alpha_k\, D_{m(k)}(n)\, g(n-k) \ge \sum_{n=2}^{\infty} \frac{1}{n\ln n} \tag{5.72}$$
so that PSF fails to converge for this choice of $h$. We show now that $t$ can be chosen so that $\|w_{a,\alpha} h\|_{L^p}\, \|w_{b,\beta}\hat h\|_{L^2} < \infty$. First, by Corollary 5.7.10,
$$\|w_{a,\alpha} h\|_p^p \le C \sum_{k=2}^{\infty} k^{p(a-1-r/p)}\, (\ln k)^{1-t+p(\alpha-1)} < \infty,$$
provided $(a - 1/p') = r/p$ as in the proof of Theorem 5.7.8, and $t > 2 + p(\alpha - 1)$. Next we determine when $w_{b,\beta}\hat h \in L^2$. If $l \ge 2$,
$$\|\mu_l\|_2^2 = \sum_{l \le [k^r (\ln k)^{t-1}]} k^{-2-2r}\, (\ln k)^{-2t} \le C\, l^{-2-1/r}\, (\ln l)^{(t-1)/r - 2}$$
so that
$$\sum_{l\ge 2} \|\mu_l\|_2^2\, l^{2b}\, (\ln l)^{2\beta} \le C \sum_{l\ge 2} l^{2(b-1-1/(2r))}\, (\ln l)^{(t-1)/r - 2 + 2\beta} < \infty$$
provided $(b - 1/2) = 1/(2r)$ as in the proof of Theorem 5.7.8 and $t < 1 + (1-2\beta)/(2b-1)$. We summarize as follows.

Corollary 5.7.12. Let $h$ be as in (5.71), $1 < p \le 2$, and let $(a - 1/p')(b - 1/2) = 1/(2p)$. Then one can choose $t$ in (5.71) so that $\|w_{a,\alpha} h\|_{L^p}\, \|w_{b,\beta}\hat h\|_{L^2} < \infty$ but the Poisson summation formula fails for $h$, provided $2 + p(\alpha - 1) < 1 + (1-2\beta)/(2b-1)$. In particular this happens if $\alpha = 1/p'$, $\beta < 1/2$ or $\alpha < 1/p'$, $\beta = 1/2$.

5.7.5 Proof of the embedding theorem

Recall that the Lorentz space $L^{p,s}(\mathbb{R})$ consists of those functions $f$ whose rearrangements $f^*$ satisfy $\int_0^\infty (t^{1/p-1/s} f^*(t))^s\,dt < \infty$. We need the following result.
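Lemma 5.7.13. If $\alpha > 1/s$ then $w(x) = (1+|x|)^{-1/p}\, \ln(e+|x|)^{-\alpha} \in L^{p,s}(\mathbb{R})$.

Proof (of Lemma 5.7.13). Note that $w^*$ is bounded since $w$ is bounded, while a simple estimate gives $w^*(t) \approx t^{-1/p}(\ln t)^{-\alpha}$ for large $t$. As $\int_2^\infty (\ln t)^{-\alpha s}\,dt/t < \infty$ when $\alpha > 1/s$, the lemma follows.

Proof (of Theorem 5.7.8). One sets $F(x,\xi) = S(f,g)(x,\xi)$. In computing the $L^1$-norm of $F$ we decompose the region of integration into the sets $A_r = \{(x,\xi) : |x| \ge |\xi|^{1/r}\}$ and $B_r = (\mathbb{R}\times\mathbb{R}) \setminus A_r$. We will only estimate $\iint_{A_r} |F|$ since the estimate for $B_r$ is similar. With $w_{a,\alpha}(x) = (1+|x|)^a (\ln(e+|x|))^\alpha$ and assuming that $(a - 1/p') = r/p$ we have:
$$\begin{aligned}
\iint_{A_r} |F| &= \int\!\!\int_{\{|x|\ge|\xi|^{1/r}\}} |F(x,\xi)|\, w_{a,\alpha}(x)\, w_{a,\alpha}(x)^{-1}\,dx\,d\xi \\
&\le \int \|F(\cdot,\xi)\, w_{a,\alpha}(\cdot)\|_p \Big(\int_{\{|x|\ge|\xi|^{1/r}\}} w_{a,\alpha}(x)^{-p'}\,dx\Big)^{1/p'} d\xi \\
&\le C_r \int \|F(\cdot,\xi)\, w_{a,\alpha}(\cdot)\|_p\, (\ln(e+|\xi|))^{-\alpha}\, (1+|\xi|)^{(1-ap')/(rp')}\,d\xi \\
&\le C\, \big\|\big(\|F(\cdot,\xi)\, w_{a,\alpha}(\cdot)\|_p\big)\big\|_{L^{p',s}}\; \big\|(\ln(e+|\xi|))^{-\alpha}\, (1+|\xi|)^{-1/p}\big\|_{L^{p,s'}} \\
&\le C\, \big\|\big(\|F(\cdot,\xi)\, w_{a,\alpha}(\cdot)\|_p\big)\big\|_{L^{p',s}} \qquad (\alpha s' > 1)
\end{aligned}$$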
by Lemma 5.7.13 (with $s$ replaced by $s'$). Since $\alpha p' > 1$, there exists $s > p$ with $\alpha s' > 1$. Fix such an $s$. We must show that $\big\| \big( \|F(\cdot, \xi)\, w_{a,\alpha}(\cdot)\|_p \big) \big\|_{L^{p',s}} \le C \|f w_{a,\alpha}\|_p$. We have
$$\begin{aligned}
\big\| \big( \|F(\cdot, \xi)\, w_{a,\alpha}(\cdot)\|_p \big) \big\|_{L^{p',s}}
&= \Big\| \int |F(x, \cdot)|^p\, w_{a,\alpha}^p(x)\, dx \Big\|_{L^{p'/p,\, s/p}}^{1/p} \\
&\le C_{p,s} \Big( \int \big\| |F(x, \cdot)|^p \big\|_{L^{p'/p,\, s/p}}\, w_{a,\alpha}^p(x)\, dx \Big)^{1/p} \\
&= C \Big( \int \|S(f, g)(x, \cdot)\|_{L^{p',s}}^p\, w_{a,\alpha}^p(x)\, dx \Big)^{1/p} \\
&\le C \Big( \int \|f(\cdot)\, g(\cdot + x)\|_{L^p}^p\, w_{a,\alpha}^p(x)\, dx \Big)^{1/p} \\
&\le C\, \|f w_{a,\alpha}\|_p .
\end{aligned}$$
Here we have employed Hölder's inequality, Hunt's Lorentz space version of the Hausdorff–Young inequality ($\|\hat{f}\|_{L^{p',s}} \le C \|f\|_{L^{p,s}}$, e.g., [329]), the fact that $\|f\|_{L^{p,s_2}} \le \|f\|_{L^{p,s_1}}$ whenever $s_1 \le s_2$, and the fact that $(|f|^r)^* = (f^*)^r$. We have also used a Lorentz space version of Minkowski's inequality, again using $s > p$. The last inequality required the submultiplicativity of the function $w_{a,\alpha}^p(x) = (1 + |x|)^{ap} (\ln(e + |x|))^{\alpha p}$ together with a requirement that $g w_{a,\alpha}$ be $p$-integrable. Thus we have shown that
$$\int_{A_r} |F| \le C\, \|f w_{a,\alpha}\|_p .$$
Under the hypotheses on $b$, $\beta$, the same techniques can be applied to the region $B_r$ to obtain the inequality
$$\int_{B_r} |F| \le C\, \|\hat{f}(\xi)\, w_{b,\beta}(\xi)\|_q .$$
Combining these inequalities with the inequality for arithmetic and geometric means finally proves Theorem 5.7.8.
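The exponent bookkeeping used in this proof and in Corollary 5.7.12 is easy to get wrong, so a small exact-arithmetic check may help. The following is only a sketch: the function names `embedding_exponents` and `t_window` are hypothetical, and the sample values p = 2, r = 1 are chosen purely for illustration.

```python
from fractions import Fraction


def embedding_exponents(p, r):
    """Return (a, b, p') with a - 1/p' = r/p and b - 1/2 = 1/(2r).

    Hypothetical helper: p > 1 and r > 0 are assumed to be rationals.
    """
    p, r = Fraction(p), Fraction(r)
    p_conj = p / (p - 1)                 # dual exponent p'
    a = 1 / p_conj + r / p               # so that a - 1/p' = r/p
    b = Fraction(1, 2) + 1 / (2 * r)     # so that b - 1/2 = 1/(2r)
    return a, b, p_conj


def t_window(p, alpha, beta, b):
    """Open interval of t with 2 + p(alpha - 1) < t < 1 + (1 - 2 beta)/(2b - 1).

    Returns None when the interval is empty.
    """
    lo = 2 + Fraction(p) * (Fraction(alpha) - 1)
    hi = 1 + (1 - 2 * Fraction(beta)) / (2 * Fraction(b) - 1)
    return (lo, hi) if lo < hi else None


a, b, p_conj = embedding_exponents(2, 1)
# Product condition of Corollary 5.7.12: (a - 1/p')(b - 1/2) = 1/(2p).
product = (a - 1 / p_conj) * (b - Fraction(1, 2))
# Exponent appearing in the estimate over A_r: (1 - a p')/(r p') = -1/p.
decay_exponent = (1 - a * p_conj) / (1 * p_conj)
```

For p = 2, r = 1 this gives a = b = 1, and with α = 1/2, β = 1/4 the admissible window for t is (1, 3/2). The window closes up when α = 1/p′ and β = 1/2, in line with the last sentence of the corollary.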
5.8 Time–scale uncertainty principles

The Balian–Low theorem (see Chapter 3) states that if the integer shifts and modulates of $g$ form a Riesz basis for $L^2(\mathbb{R})$ then $Xg \notin L^2$ or $Dg \notin L^2$. That is, independence plus completeness is incompatible with localization, at least for time–frequency shifts. Wavelets were observed early on to provide
bases having good time–frequency localization. Certainly there are orthonormal wavelet bases whose mother wavelets have polynomial decay of any order in both time and frequency. Nevertheless, there are still limitations on Heisenberg localization of wavelets. We will present some of Battle's work on this topic here. The first observation is that orthogonality between scales already imposes a limitation on localization.

Lemma 5.8.1. If the functions $\psi_{jk}(x) = 2^{j/2} \psi(2^j x - k) \in L^2(\mathbb{R})$ are mutually orthogonal and if $\hat{\psi}$ is continuous, bounded and integrable, then $\hat{\psi}(0) = 0$.

Since $\hat{\psi}$ is integrable, $\psi \in C_0(\mathbb{R})$. Since $\psi$ is nontrivial, one can choose $j_0, k_0$ so that $\psi(k_0/2^{j_0}) \ne 0$. By orthogonality and the Parseval relation,
$$0 = 2^{j_0} \int e^{2\pi i k_0 \xi}\, \hat{\psi}(2^{j_0} \xi)\, \overline{\hat{\psi}(2^{-(j - j_0)} \xi)}\, d\xi = \int e^{2\pi i k_0 \xi / 2^{j_0}}\, \hat{\psi}(\xi)\, \overline{\hat{\psi}(2^{-j} \xi)}\, d\xi \to \overline{\hat{\psi}(0)}\, \psi(k_0/2^{j_0})$$
as $j \to \infty$, by dominated convergence. It follows that $\hat{\psi}(0) = 0$. This proves the lemma.

Next we extend the lemma to higher-order moments under the hypothesis that $\hat{\psi} \in C^{N+1}$ and $(1 + |\xi|)^{N+1} \hat{\psi}(\xi) \in L^1$.

Lemma 5.8.2. Under the additional hypotheses on $\psi$ and the same orthogonality condition as before, $\psi$ is orthogonal to polynomials up to degree $N$.

The proof of Lemma 5.8.2 uses a Taylor expansion of $\hat{\psi}$ at the origin but is otherwise similar to that of Lemma 5.8.1 (cf. [24]).

Corollary 5.8.3. If $\{\psi_{jk}\}$ forms an orthogonal family then $\psi$ cannot have exponential localization in both time and frequency.

By Lemma 5.8.2 such a localization criterion would imply that $\hat{\psi}$ vanishes to infinite order at the origin. However, exponential localization of $\psi$ implies that $\hat{\psi}$ is real-analytic, hence can vanish only to finite order.

Next we wish to obtain quantitative Heisenberg bounds on orthogonal wavelets. Such bounds will make explicit use of the structure of the scaling operator $S$ given by
$$S = \frac{1}{2} (DX + XD). \tag{5.73}$$
Here, as before, $X$ and $D$ denote the usual multiplication and differentiation operators $Xf(x) = x f(x)$ and $Df(x) = (1/2\pi i)\, df/dx$.
Then $S$ is the infinitesimal generator of scaling in the sense that
$$e^{2\pi i \alpha S} f(x) = e^{\alpha/2} f(e^{\alpha} x). \tag{5.74}$$
This can be deduced from the metaplectic representation (e.g., [144] Section 4.3, [142]). The scaling operator will play an important role in sharpening Heisenberg bounds for orthogonal wavelets. At the same time, we continue to make use of moment properties and decay.
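As a quick numerical sanity check of (5.74), one can compare the derivative of $e^{\alpha/2} f(e^{\alpha} x)$ at $\alpha = 0$ with $2\pi i\, Sf(x) = x f'(x) + f(x)/2$, which follows from (5.73) and the definitions of $X$ and $D$. The sketch below assumes a Gaussian test function and a central difference quotient; both choices, and all function names, are illustrative only.

```python
import math


def f(x):
    """Gaussian test function (an arbitrary illustrative choice)."""
    return math.exp(-math.pi * x * x)


def f_prime(x):
    """Exact derivative of the Gaussian test function."""
    return -2.0 * math.pi * x * f(x)


def dilate(alpha, x):
    """Right-hand side of (5.74): e^{alpha/2} f(e^{alpha} x)."""
    return math.exp(alpha / 2.0) * f(math.exp(alpha) * x)


def generator(x):
    """2*pi*i*S f evaluated at x, i.e. x f'(x) + f(x)/2."""
    return x * f_prime(x) + f(x) / 2.0


def dilation_derivative(x, h=1e-5):
    """Central difference in alpha of the dilated function at alpha = 0."""
    return (dilate(h, x) - dilate(-h, x)) / (2.0 * h)


# Largest discrepancy over a few sample points; it should be tiny.
max_gap = max(abs(dilation_derivative(x) - generator(x))
              for x in (0.0, 0.3, 1.0, -0.7))
```

The two quantities agree up to discretization error, which is consistent with $S$ generating the unitary dilation group.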
Lemma 5.8.4. If $x^{1+\varepsilon} f \in L^2(\mathbb{R})$ for some $\varepsilon > 0$ and $\int f = 0$ then $f = Dg$ for some $g \in L^2(\mathbb{R})$.
Lemma 5.8.5. If $f \in L^2(\mathbb{R})$ and $f = Dg$ for some $g \in L^2(\mathbb{R})$ then
$$\|Df\|_2\, \|(X - \langle Xf, f \rangle) f\|_2 \ge \frac{3}{4\pi}\, \|f\|_2^2 .$$
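The constant in Lemma 5.8.5 can be probed numerically. As a sketch (not part of the argument), take $g(x) = e^{-\pi x^2}$, so that $f = Dg = i x e^{-\pi x^2}$; a direct computation with Gaussian moments shows that this $f$ gives equality. The code below, with hypothetical helper names and a plain trapezoid rule on a truncated interval, evaluates the ratio $\|Df\|_2 \|Xf\|_2 / \|f\|_2^2$ for this choice. Only the real profile $u(x) = x e^{-\pi x^2}$ is needed, since the factor $i$ does not affect the norms.

```python
import math


def trapezoid(func, lo=-8.0, hi=8.0, n=40000):
    """Composite trapezoid rule; the integrands below are negligible outside [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, n):
        total += func(lo + k * h)
    return total * h


def u(x):
    """Real profile of f = Dg for g(x) = exp(-pi x^2): |f(x)| = |u(x)|."""
    return x * math.exp(-math.pi * x * x)


def u_prime(x):
    """Derivative of u."""
    return (1.0 - 2.0 * math.pi * x * x) * math.exp(-math.pi * x * x)


norm_f_sq = trapezoid(lambda x: u(x) ** 2)
norm_Xf = math.sqrt(trapezoid(lambda x: (x * u(x)) ** 2))
# D = (1/(2 pi i)) d/dx, so ||Df||_2 = ||u'||_2 / (2 pi).
norm_Df = math.sqrt(trapezoid(lambda x: u_prime(x) ** 2)) / (2.0 * math.pi)

# <Xf, f> = 0 here because |f|^2 is even, so no recentering is needed.
ratio = norm_Df * norm_Xf / norm_f_sq
bound = 3.0 / (4.0 * math.pi)
```

Here `ratio` and `bound` coincide up to quadrature error, which suggests that the constant $3/(4\pi)$ cannot be improved within the class $f = Dg$.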
Lemma 5.8.5 is worth proving [25]. First, because the condition $f = Dg$ is invariant under translation, one can assume that $\langle Xf, f \rangle = 0$. Now $\|Xf\|_2 = \|XDg\|_2$. Since $[D, X] = I/(2\pi i)$, one sees that
$$S = \frac{1}{2} (DX + XD) = XD + \frac{1}{4\pi i}\, I \tag{5.75}$$
while
$$[D, S] = \frac{1}{2} [D^2, X] = \frac{1}{2\pi i}\, D. \tag{5.76}$$
Since $\|Tf\|_2 = \|T^* f\|_2$ where $T^*$ is the (Hermitian) adjoint of $T$ (valid here because $T = S - \frac{1}{4\pi i} I$ is normal), it follows from (5.75), (5.76) and Cauchy–Schwarz that
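$$\begin{aligned}
\|Df\|_2\, \|Xf\|_2 &= \|Df\|_2\, \|XDg\|_2 \\
&= \|Df\|_2\, \Big\| \Big( S - \frac{1}{4\pi i}\, I \Big) g \Big\|_2 \\
&= \|Df\|_2\, \Big\| \Big( S + \frac{1}{4\pi i}\, I \Big) g \Big\|_2 \\
&\ge \Big| \Big\langle Df, \Big( S + \frac{1}{4\pi i}\, I \Big) g \Big\rangle \Big| \\
&= \Big| \Big\langle f, \Big( SD + \frac{3}{4\pi i}\, D \Big) g \Big\rangle \Big| \\
&= \Big| \Big\langle f, \Big( S + \frac{3}{4\pi i} \Big) f \Big\rangle \Big| \ge \frac{3}{4\pi}\, \|f\|_2^2
\end{aligned}$$
where the last inequality follows from the fact that $\langle f, Sf \rangle$ is real since $S$ is self-adjoint. This proves the lemma.

Returning to the role of orthogonality, it follows from (5.74) that if $\psi(x)$ and $\psi(2x)$ are orthogonal, then $\langle e^{\ln \sqrt{2}}\, \psi(e^{\ln 2}\, \cdot), \psi(\cdot) \rangle = \langle e^{2\pi i (\ln 2) S} \psi, \psi \rangle = 0$ and hence, taking adjoints,
$$\langle e^{-2\pi i (\ln 2) S} \psi, \psi \rangle = 0. \tag{5.77}$$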
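On the other hand, by (5.75),
$$\begin{aligned}
\frac{d}{d\lambda} \big( e^{-\lambda/2} \langle e^{2\pi i \lambda S} \psi, \psi \rangle \big) &= 2\pi i\, e^{-\lambda/2} \Big\langle \Big( S - \frac{1}{4\pi i}\, I \Big) e^{2\pi i \lambda S} \psi, \psi \Big\rangle \\
&= 2\pi i\, e^{-\lambda/2} \langle XD\, e^{2\pi i \lambda S} \psi, \psi \rangle,
\end{aligned} \tag{5.78}$$
whereas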
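$$\begin{aligned}
e^{-2\pi i \lambda S} D\, e^{2\pi i \lambda S} \varphi(x) &= e^{-2\pi i \lambda S} D \big( e^{\lambda/2} \varphi(e^{\lambda} x) \big) \\
&= \frac{e^{3\lambda/2}}{2\pi i}\, e^{-2\pi i \lambda S} \varphi'(e^{\lambda} x) \\
&= e^{\lambda} D \varphi(x).
\end{aligned} \tag{5.79}$$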
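Then (5.78), (5.79) (both with $\lambda$ replaced by $-\lambda$), the fundamental theorem of calculus, the orthogonality condition and the unitarity of $e^{2\pi i \lambda S}$ yield, when $\|\psi\|_2 = 1$,
$$\begin{aligned}
1 &= |1 - \langle \psi(\cdot/2), \psi \rangle| \\
&= \big| 1 - \sqrt{2}\, \langle e^{-2\pi i (\ln 2) S} \psi, \psi \rangle \big| \\
&= \Big| \int_0^{\ln 2} \frac{d}{d\lambda} \big( e^{\lambda/2} \langle e^{-2\pi i \lambda S} \psi, \psi \rangle \big)\, d\lambda \Big| \\
&= \Big| \int_0^{\ln 2} 2\pi i\, e^{\lambda/2} \langle XD\, e^{-2\pi i \lambda S} \psi, \psi \rangle\, d\lambda \Big| \\
&= 2\pi\, \Big| \int_0^{\ln 2} e^{-\lambda/2} \langle e^{-2\pi i \lambda S} D \psi, X \psi \rangle\, d\lambda \Big| \\
&\le 2\pi\, \|D\psi\|_2\, \|X\psi\|_2 \int_0^{\ln 2} e^{-\lambda/2}\, d\lambda = 4\pi \big( 1 - 1/\sqrt{2} \big)\, \|X\psi\|_2\, \|D\psi\|_2 ,
\end{aligned}$$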
and we have proved the following.

Proposition 5.8.6. If $\psi(x)$ and $\psi(2x)$ are orthogonal then
$$\|X\psi\|_2\, \|D\psi\|_2 \ge \frac{1}{4\pi (1 - 2^{-1/2})} .$$
There is nothing special about the dilation factor 2 here: with a scaling factor $a > 1$ the same conclusion holds with $\sqrt{2}$ replaced by $\sqrt{a}$. In particular, as $a \downarrow 1$ the Heisenberg spread necessarily becomes larger. This is not surprising: if $f$ is orthogonal to a small dilate then either $f$ must be spread out spatially or $f$ must oscillate rapidly, hence be spread out in frequency. Since $\sqrt{2}/(\sqrt{2} - 1) > 3$, the orthogonality condition yields a stronger Heisenberg bound than does the moment condition $\int f = 0$ of Lemma 5.8.5.

On the other hand, there is a big difference in this context between inequalities for the product $\|X\psi\|_2 \|D\psi\|_2$ as opposed to the more general Heisenberg product $\sigma_\psi(X)\, \sigma_\psi(D)$ where $\sigma_\psi(T) = \|(T - \langle T\psi, \psi \rangle) \psi\|_2$. Consider the case of orthogonal wavelets. If $\psi$ is real-valued then $\int \psi D\psi = \int \hat{\psi}\, \widehat{D\psi} = -\int \xi |\hat{\psi}|^2 = 0$ since $|\hat{\psi}|$ is even. In fact, Balan [16] showed that if the wavelet family $\{\psi_{jk}\}$ forms a normalized Bessel family and if $\langle D\psi, \psi \rangle = 0$ then $\sigma_\psi(X)\, \sigma_\psi(D) \ge 3/(4\pi)$. This inequality then applies to each of the $\psi_{jk}$ as well. The Bessel inequality is used only to deduce the moment condition $\hat{\psi}(0) = 0$. The result then follows from Lemma 5.8.4 and Lemma 5.8.5. Inequalities for higher-order measures of spread were considered in [25].
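To make the comparison of constants concrete, the lower bound of Proposition 5.8.6 can be tabulated against the dilation factor. This is a minimal sketch: the name `orthogonal_dilate_bound` is hypothetical, the normalization $\|\psi\|_2 = 1$ is assumed, and the general-$a$ formula $1/(4\pi(1 - a^{-1/2}))$ is the one obtained by replacing $\sqrt{2}$ by $\sqrt{a}$ as described above.

```python
import math


def orthogonal_dilate_bound(a=2.0):
    """Lower bound for ||X psi|| ||D psi|| when psi(x) is orthogonal to psi(a x).

    Assumes ||psi||_2 = 1 and a dilation factor a > 1.
    """
    if a <= 1.0:
        raise ValueError("dilation factor must exceed 1")
    return 1.0 / (4.0 * math.pi * (1.0 - a ** -0.5))


MOMENT_BOUND = 3.0 / (4.0 * math.pi)       # constant of Lemma 5.8.5
HEISENBERG_BOUND = 1.0 / (4.0 * math.pi)   # classical bound in this normalization

# Bounds for a few dilation factors, from near 1 up to a large factor.
table = {a: orthogonal_dilate_bound(a) for a in (1.01, 1.1, 2.0, 16.0, 1e6)}
```

The dyadic bound exceeds the moment bound, the bound blows up as $a \downarrow 1$, and for very large $a$ it decreases toward the classical Heisenberg constant $1/(4\pi)$, as one would expect when the orthogonality constraint becomes weak.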
5.9 Notes

Spectral gaps. A classical theorem of Levinson [257] says that if (i) $W \ge 1$, (ii) the logarithmic integral $\int (1 + |x|^2)^{-1} \ln W(x)\, dx$ diverges, and (iii) $\int |f| W < \infty$, then vanishing of $\hat{f}$ on an interval implies that $f \equiv 0$. As a corollary, if $(a_k, b_k)$ are disjoint intervals such that $\sum [1 + (b_k - a_k)^2]/(a_k b_k) = \infty$ then any $f \in L^1(\mathbb{R})$ having $\hat{f} = 0$ on $\cup (a_k, b_k)$ must vanish identically. Benedicks [45] later used ideas of Beurling and Malliavin to prove that the condition on the intervals cannot be weakened. Another corollary of Levinson's theorem is that an $L^1$ function vanishing on a half-line cannot have a spectral gap. Shapiro [319] showed that, in $\mathbb{R}^n$ ($n \ge 2$), $f \in L^1$ can have such a gap if it is supported in a half-space, but not if supported in a cone of opening less than $\pi/2$. The books by Koosis [231, 232] provide a detailed study of the relationship between the logarithmic integral and Fourier uncertainty principles, Hardy spaces and other areas of analysis.

Concentration of bandlimited functions. If $f$ is bandlimited then $f$ cannot be concentrated on a thin set. Suppose that $E \subset \mathbb{R}$ satisfies: there exist $T > 0$ and $\gamma \in (0, 1]$ such that, for any interval $I$ of length $T$, $|E \cap I| \ge \gamma T$. The Logvinenko–Sereda theorem (see [180], p. 112) states that, under these conditions on $E$, one has $\|f \chi_E\|_p \ge \exp(-C(T\Omega + 1)/\gamma)\, \|f\|_p$ whenever $f$ is $\Omega$-bandlimited. Here $p \in [1, \infty]$ and $C = C(p, \alpha, \gamma, \Omega)$. The dependence of the growth on $\gamma$ was improved by Kovrijkine [235] to $\|f \chi_E\|_p \ge (\gamma/C_1)^{C_2 (T\Omega + 1)}\, \|f\|_p$. The latter depends on Nazarov's estimates for subset restrictions of trigonometric polynomials to arcs.

Hardy's theorem for rotations. Hardy's theorem characterizes Gaussian functions in terms of optimal joint decay of $f$ and of $\hat{f}$. Exponential decay of these functions implies that they have extensions to entire functions. Hogan and Lakey [199] extend Hardy's theorem to rotations of Gaussian functions in $\mathbb{C}$ as follows.

Theorem 5.9.1. Let $f \in L^1(\mathbb{R})$. Define $\hat{f}(\zeta) = \int f(x)\, e^{-2\pi i x \zeta}\, dx$ for any $\zeta \in \mathbb{C}$ for which the integral exists and define $f(z)$, $(z \in \mathbb{C})$ by Fourier inversion whenever possible. Suppose that for some $\theta_0$ there exists a constant $C$ such that $|f(\pm r e^{i\theta_0})| \le C e^{-\pi \alpha r^2}$ while, for some $\psi_0 \in (-\pi/2, \pi/2)$, $|\hat{f}(\pm r e^{i\psi_0})| \le C e^{-\pi r^2/\alpha}$. Then $f$ is a rotation of a multiple of $e^{-\pi \alpha x^2}$ through the angle $-\theta_0$ in the complex plane. That is, $f(z) = C e^{-\pi \alpha e^{-2i\theta_0} z^2}$.

There is no a priori relationship between the inverse Fourier transform of $\hat{f}$ and that of a complex rotation of its analytic extension. Establishing such a relationship under the decay hypotheses is the heart of the proof. This involves certain regularization and limiting arguments. Nothing is required a priori about the relationship between the angles $\theta_0$ and $\psi_0$, although $\theta_0 = -\psi_0$ follows a posteriori, while the condition $f \in L^1$ then implies $|\theta_0| < \pi/4$.
Hardy's theorem on groups. One can make sense of Gaussian decay of a function and its Fourier transform in a large class of semisimple Lie groups, including $SL(2, \mathbb{R})$ (see e.g., [9, 321]). Early versions for the Heisenberg group were considered by Thangavelu [345]. Morgan's theorem has also been extended to such settings (cf. [14]).

Uncertainty inequalities with norms other than $L^2$. The quantity $\Lambda(f) = \|x^2 f\|_1\, \|f'\|_2^2 / (\|f\|_1\, \|f\|_2^2)$ arises in minimizing $\mathrm{var}(p)\, \mathrm{var}(\tilde{p})$ taken over all probability density functions on $\mathbb{R}$ having non-negative, integrable characteristic functions. Here $\tilde{p}$ is a naturally defined adjoint of $p$. Other norm products involving $Xf$ and $Df$ arise in other optimization problems. Laeng and Morpurgo [242] showed that $\Lambda(f)$ attains a unique minimum on functions of the form $f(x) = \alpha f_0(\beta x)$ where $\alpha, \beta$ are nonzero reals and $f_0(x) = \{\cos x - \cos m_0 + (x^2 - m_0^2) \sin m_0 / m_0\}\, \chi_{\{|x| < m_0\}}$ in which $m_0$ is the unique minimum of $y(5 - 2y^2) \tan^2 y + 5(3 - y^2) \tan y - 15y = 0$. This result was proved independently by Ehm et al. [131]; cf. also [124].

Pointwise decay of Fourier pairs. Impossibility of joint pointwise decay of Fourier transform pairs was cast in general terms by Balila and Reyes [18] as follows.

Theorem 5.9.2. Let $V_i : \mathbb{R} \to (0, \infty)$ ($i = 0, 1$) be even, nondecreasing and satisfying
$$\int_0^1 \frac{dt}{V_i(t)} = \infty \quad \text{and} \quad \int_1^\infty \frac{dt}{V_i(t)} < \infty \qquad (i = 0, 1).$$
Suppose also that $V_i$ are moderate in the sense that $V_i(2t) \le C_i V_i(t)$. Then there exist homeomorphisms $\Phi$ and $\Psi$ of $[0, \infty)$ such that for all $f \in C(\mathbb{R}) \cap L^1(\mathbb{R})$,
$$\Phi\Big( \frac{\|V_0 f\|_\infty}{|f(0)|} \Big)\, \Psi\Big( \frac{\|V_1 \hat{f}\|_\infty}{|\hat{f}(0)|} \Big) \ge 1.$$

For example: $\|x^2 f\|_\infty\, \|\xi^2 \hat{f}\|_\infty \ge C |f(0)|\, |\hat{f}(0)|$ for all continuous $f \in L^1$. The decay of $V_i$ at $\infty$ is necessary. Poisson summation plays an important role in the proof. The integral conditions on $V_i$ are used to obtain a variant of the Wiener criterion $f, \hat{f} \in W$ justifying PSF.

Extensions of Pitt's theorems. Pitt's inequalities can be extended to higher powers of $|x|$ when restricted to functions having vanishing moments up to a certain order or, equivalently, by subtracting a term of the form $\sum_{k=0}^m (-2\pi i x \xi)^k / k!$ from the kernel of the Fourier transform.
Theorem 5.9.3. If $f \in S_{0,0}(\mathbb{R})$, $1 < p \le q < \infty$ and $k \notin \mathbb{Z}$ then
$$\Big( \int |\hat{f}(\xi)|^q\, |\xi|^{(q/p' - 1) - kq}\, d\xi \Big)^{1/q} \le C \Big( \int |f(x)|^p\, |x|^{kp}\, dx \Big)^{1/p} .$$
Here $S_{0,0}(\mathbb{R})$ consists of those Schwartz functions whose Fourier transforms have compact support away from the origin. This class is dense in $L^p_{|x|^k}$. No form of Pitt's inequalities can hold when $k$ is an integer and $q = p$: for $k = 1$ this is seen by considering $f_N(x) = \{\chi_{(1/N, 1)} - \chi_{(1, N)}\}/x$. An $A_p$-weighted version of this result was proved by Sadosky and Wheeden [313]. In fact, Pitt's inequality yields the following characterization of monotone $A_p$ weights due to Benedetto et al. [38] (cf. Heinig and Sinnamon [184] for an $\mathbb{R}^n$ version).

Theorem 5.9.4. Let $1 < p \le q \le p' < \infty$. Assume that $w(|t|) \ge 0$ is increasing on $(0, \infty)$. Then there is a constant $C > 0$ such that
$$\Big( \int |\hat{f}(\xi)|^q\, |\xi|^{q/p' - 1}\, w^{q/p}\Big( \frac{1}{|\xi|} \Big)\, d\xi \Big)^{1/q} \le C \Big( \int |f(x)|^p\, w(x)\, dx \Big)^{1/p}$$
holds for all $f$ if and only if $w \in A_p$.

The role of $A_p$ here is curious and indicates that, at least qualitatively, taking into account the role played by inversion, the methods of proving these inequalities are akin to those for proving boundedness of singular integrals.

More on rearrangement weighted Fourier inequalities. Theorem 5.6.2 can be extended to the case $q < p$ by means of a certain duality argument, but with a different looking condition on the weights. Namely, if
$$\int_0^\infty \Big( \int_0^{1/s} u^* \Big)^{r/q} \Big( \int_0^s \Big( \Big( \frac{1}{v} \Big)^{*} \Big)^{p' - 1} \Big)^{r/q'} \Big( \Big( \frac{1}{v} \Big)^{*} \Big)^{p' - 1}(s)\, ds < \infty \tag{5.80}$$
where $1/r = 1/q - 1/p$, then (5.55) holds. This was first proved in [39]. A different proof was given in [37], which also contains a different proof of Theorem 5.6.2. Here the authors invoke an a priori inequality $\int_0^s \hat{f}^*(t)^q\, dt \le K_q^q \int_0^s \big( \int_0^{1/t} f^* \big)^q\, dt$ ($q \ge 2$) due to Jodeit and Torchinsky [217]. This, together with Hardy's inequality (i.e., the $L^p$ norm, $p > 1$, of the averages of $f^*$ is equivalent to the $L^p$ norm of $f$ (e.g., [271])) proves the case $q \ge 2$ of Theorem 5.6.2 and also leads to (5.80). Fourier inequalities in weighted Lorentz space norms are also established in [37].

Local uncertainty inequalities. The Heisenberg uncertainty (5.1) does not exclude the possibility of $\hat{f}$ having large, localized, well-separated peaks. In [300], Price rules out this behavior by showing that if $E \subset \mathbb{R}^n$ is measurable and $\beta > 1$, then
$$\int_E |\hat{f}(\xi)|^2\, d\xi \le K\, |E|\, \|f\|_2^{2 - 2/\beta}\, \big\| |x|^{n\beta/2} f \big\|_2^{2/\beta}$$
for all $f \in L^2(\mathbb{R}^n)$. The sharp constant $K$ is computed.

Fourier restriction theorems. Fourier restriction theorems are inequalities of the form $\|\hat{f}\|_{L^q_\mu} \le C \|f\|_{L^p_v}$ in which $\mu$ is a singular measure, often associated with some submanifold of $\mathbb{R}^n$. They are surveyed at some length in Stein [328] where applications are also discussed. The following notes focus on necessary conditions for such inequalities. The Tomas–Stein inequality
$$\Big( \int_{\Sigma_{n-1}} |\hat{f}(\xi)|^2\, d\sigma_{n-1}(\xi) \Big)^{1/2} \le C_p\, \|f\|_p \qquad \Big( 1 \le p < \frac{2n + 2}{n + 3} \Big) \tag{5.81}$$
is proved using pointwise decay of the Fourier transform of the surface measure $d\sigma_{n-1}$ of the unit sphere which relies, in turn, on curvature. The product structure of the Fourier transform on $\mathbb{R}^n$ illustrates why curvature is needed, as is closeness of $p$ to $p = 1$. In $\mathbb{R}^2$, set $f = f_1 f_2$ where $\hat{f}_1(\xi_1) = \chi_{[-\delta, \delta]}(\xi_1)$ and $\hat{f}_2(\xi_2) = \chi_{[\sqrt{1 - \delta^2}, 1]}(\xi_2)$. Thus $\hat{f}$ is the characteristic function of a box about a circular cap having arclength $\approx 2\delta$. On the other hand, $f_1(x) = \sin(2\pi x \delta)/(\pi x)$ while, up to a modulation and a constant factor, $f_2(x) \approx \sin(2\pi x \delta^2)/(2\pi x)$. Since $\|\sin(2\pi x \delta)/(2\pi x \delta)\|_{L^p(\mathbb{R})} = C \delta^{-1/p}\, \|\sin(u)/u\|_{L^p(\mathbb{R})}$, one has $\|f\|_{L^p(\mathbb{R}^2)} \approx \delta^{3/p'}$ while $\|\hat{f}\|_{L^2_{d\sigma}} \approx \sqrt{\delta}$. Letting $\delta \to 0$ one sees that $1/2 \ge 3/p'$ or $p \le 6/5 = (2n + 2)/(n + 3)$ is necessary for (5.81). Similar reasoning shows that, in $\mathbb{R}^n$, the Fourier transform can only be continuous from $L^p$ to $L^q_{d\sigma_{n-1}}$ if $p' > q(n + 1)/(n - 1)$. One also requires $p \le 2$. The problem of determining sharp sufficient conditions on $p$ and $q$ for this Fourier restriction property is still open (e.g., [341]).

A necessary condition for restriction inequalities of the form $\|\hat{f}\|_{L^q_{d\mu}} \le C \|f\|_{L^p}$ in which $\mu$ is a canonical measure associated with some set of fractional Hausdorff dimension on $\mathbb{R}^n$ is readily obtained from the following approximation result of Beurling [51] (see also Herz [190], Newman [289], Kahane and Salem [220] and Benedetto and Lakey [40]).

Theorem 5.9.5. Let $f \in L^1 \cap L^2(\mathbb{R}^n)$ and denote by $C_f^p$ the $L^p$-closed linear span of the translates of $f$. Suppose that the set $E$ on which $\hat{f} = 0$ has Hausdorff dimension $\alpha < n$. Then there is a number $r \le 2n/(2n - \alpha)$ for which $C_f^p = L^p(\mathbb{R}^n)$ whenever $p > r$. In particular, for any such $f$ the span of the translates of $f$ is dense in $L^p$ whenever $p > 2n/(2n - \alpha)$.

If $f \in X_E = \{f \in L^1 \cap L^2 : \hat{f} \chi_E = 0\}$, then $\|\hat{f}\|_{L^q_\mu} = 0$ when $\mu$ is a measure supported on $E$. On the other hand, if $p > 2n/(2n - \alpha)$ where $\dim E = \alpha$ then $X_E$ is dense in $L^p$. If $g_k$ is a sequence in $X_E$ approximating $g$ then $\|\hat{g} - \hat{g}_k\|_{L^q_\mu} = \|\hat{g}\|_{L^q_\mu}$ while $\|g - g_k\|_{L^p} \to 0$. Thus the inequality
$\|\hat{f}\|_{L^q_\mu} \le C \|f\|_{L^p}$ cannot hold on all of $L^p$ for such nontrivial $\mu$. Thus, this theorem of Beurling's gives rise to necessary conditions for Fourier restriction. In the case of the unit sphere, $\alpha = n - 1$ so spherical restriction cannot hold unless $p \le 2n/(n + 1)$, no matter what $q$ is. The method for proving Beurling's theorem also gives rise to the following closure theorem for $L^2_\beta = \{f : \int |f(x)|^2 (1 + |x|^2)^\beta\, dx < \infty\}$ (see [40]).

Theorem 5.9.6. Let $E \subset \mathbb{R}$ be a subset of Hausdorff dimension $\alpha < 1$ which is the zero set of the Fourier transform of some $f \in L^1 \cap L^2(\mathbb{R})$. Then $\{g \in L^1 \cap L^2_\beta : \hat{g} \chi_E = 0\}$ is dense in $L^2_\beta$ whenever $\beta < 2\alpha$.

The same argument above gives necessary conditions for Sobolev restriction theorems (cf. notes of Chapter 7).

More on embeddings into $M^1$. The global decay required of $f$ and $\hat{f}$ in Theorem 5.7.8 can be relaxed at the expense of making $w_{a,\alpha}$ slightly larger.

Theorem 5.9.7. Let $1 < p, q \le 2$. If $(a - 1/2)(b - 1/2) = 1/(pq)$, $\alpha > 1/2$ and $\beta > 1/2$, then
$$\|f\|_{M^1(\mathbb{R})} \le C\, \|f w_{a,\alpha}\|_{W(L^p, \ell^2)}\, \|\hat{f} w_{b,\beta}\|_{W(L^q, \ell^2)} .$$

Here $W(L^p, \ell^q)$ is the Wiener amalgam space consisting of those $f$ such that $\{\|f \chi_{[k, k+1)}\|_{L^p}\}_{k \in \mathbb{Z}} \in \ell^q(\mathbb{Z})$. When $p < 2$, $L^p \cup L^2$ is properly contained in $W(L^p, \ell^2)$. Theorems 5.9.7 and 5.7.8 illustrate the tradeoffs between decay imposed by the exponents $p, q$ versus that imposed by the weights $w_{a,\alpha}$, $w_{b,\beta}$. Theorem 5.9.7 has a proof similar to that of Theorem 5.7.8, but requiring a form of the Hausdorff–Young theorem for amalgam spaces, e.g., [244].

Uncertainty principles and basis functions. Lemma 5.8.1 (cf. [24]) shows that orthogonality of wavelets implies a vanishing moment of the mother wavelet while additional regularity and decay of $\hat{\psi}$ imply more vanishing moments. On the other hand, the Balian–Low theorem 3.4.3 limits the possible localization of the generator of a Gabor basis. Bourgain [58] proved that an orthogonal basis of the form $g_{mn}(x) = f_m(x + n)$ could satisfy the joint finite variance estimates $\int |g_{mn}|^2 (1 + |x|)\, dx \le C$ and $\int |\hat{g}_{mn}|^2 (1 + |\xi|)\, d\xi \le C$. This is optimal in the sense of a result attributed to Steger: $L^2(\mathbb{R})$ does not admit an orthonormal basis $\{g_{mn}\}$ satisfying $\int |g_{mn}|^2 (1 + |x|)^{1 + \varepsilon}\, dx \le C$ and $\int |\hat{g}_{mn}|^2 (1 + |\xi|)\, d\xi \le C$. In general, if $\varphi_n$ is an infinite, orthonormal subset of $L^2(\mathbb{R})$, one can ask: what a priori joint time–frequency decay limitations must the functions suffer? H.S. Shapiro proved that there cannot exist an absolute constant $C$ and an $\varepsilon > 1/2$ such that both
$$|\varphi_n(x)| \le C (1 + |x|)^{-\varepsilon} \quad \text{and} \quad |\hat{\varphi}_n(\xi)| \le C (1 + |\xi|)^{-\varepsilon} . \tag{5.82}$$
Byrnes et al. [62] later proved Shapiro's condition is sharp in the sense that one can take $\varepsilon = 1/2$ for the basis functions in (5.82). This is achieved by constructing first an orthonormal basis for $L^2([0, 1])$ such that $|\hat{\varphi}_n(\xi)| \le C (1 + |\xi|)^{-1/2}$ uniformly in $n$. Then one builds a family of finite but increasing dimensional orthogonal transforms, based on much the same construction as that of the Rudin–Shapiro polynomials (e.g., [225]), and extends to other intervals by simultaneously shifting and dilating. This is what allows the uniform decay. One defines $\varphi_{nN}(x)$ by shifting $\varphi_n$ to $[1, 2]$ then dilating so that $\varphi_{nN}(x) = 2^{-N/2} \varphi_n(x/2^N)$ is a basis for $L^2[2^N, 2^{N+1}]$ and $|\varphi_{nN}(x)| \le C 2^{-N/2} \le C/|x|^{1/2}$ on its support. On the other hand, $|\hat{\varphi}_{nN}(\xi)| = |2^{N/2} \hat{\varphi}_n(2^N \xi)| \le C (2^N + |\xi|)^{-1/2} \le C (2^{-N} + |\xi|)^{-1/2}$.

Variance inequalities for scale transforms. Frequency can be identified as inverse scale in the context of $H^2_+(\mathbb{R})$, the Hardy space of functions in $L^2(\mathbb{R})$ whose Fourier transforms vanish on $(-\infty, 0)$. The scale transform $Sf$ of $f \in H^2_+(\mathbb{R})$ is given by
$$S(f)(s) = \int_0^\infty \hat{f}(\xi)\, \xi^{2\pi i s - 1/2}\, d\xi .$$
It is an isometry from $H^2_+(\mathbb{R})$ to $L^2(\mathbb{R}, ds)$. Let $\|f\|_2 = 1$. Flandrin [142] proved the following analogue of Heisenberg's inequality for the scaling operator. Define the standard arithmetic mean $m_{\mathrm{add}}(f) = \int x |f(x)|^2\, dx$ and variance $V_{\mathrm{add}} = \int (x - m_{\mathrm{add}}(f))^2 |f(x)|^2\, dx$. Then

Theorem 5.9.8. If $f \in H^2_+(\mathbb{R})$ then $V_{\mathrm{add}}(\hat{f})\, V_{\mathrm{add}}(Sf) \ge m_{\mathrm{add}}(\hat{f})^2 / (16\pi^2)$. Equality holds if and only if $\hat{f}$ has the form
$$\hat{f}(\xi) = C \exp(a \ln \xi - b\xi + i(c \ln \xi + d))\, \chi_{[0, \infty)}(\xi) .$$

The optimizer $f$ is often called the "Klauder wavelet." Flandrin also proved corresponding uncertainty inequalities for the scale operator involving geometric means.
6 Function spaces and operator theory
In the mathematics community, wavelets emerged as a refinement of classical Littlewood–Paley methods. Stromberg's proof that specific spline-type wavelets form an unconditional basis for the real Hardy space $\mathrm{Re}\, H^1(\mathbb{R}^n)$ (see [336]) was a major step in showing that wavelets might also lead to genuinely new mathematical results, although the existence of an explicit unconditional basis (just not exactly wavelets) had already been proved by Carleson [68]. Some feeling emerged that, as a mathematical tool, wavelets might add nothing more than technical simplification of known consequences of Littlewood–Paley theory. Tchamitchian's construction [342] of certain algebras of singular integrals that are not spectrally invariant finally marked a new achievement in operator theory truly attributable to the use of wavelets. It is not our goal to list specific theorems directly attributable to wavelets, though. It is more enlightening to consider new perspectives that have emerged in response to curmudgeonly challenges.

The first part of this chapter is about wavelet decompositions and approximations in Besov spaces. The second part investigates the use of wavelets and other time–frequency decompositions in the analysis of operators acting on these and other function spaces.

The variation of a signal, whether analog or sampled, provides a fundamental measure of the amount of "features" that one must retain when trying to approximate or compress it. That wavelets do a good job of capturing variation is an underlying theme of this chapter. A precise statement of the decay of wavelet coefficients of functions of bounded variation, however, will not be given until Chapter 7, because of certain analytic intricacies. In a sense, the variation norm lies at the end of the scale of Besov norms that will be discussed in Section 6.1.
Donoho developed a principle (discussed in Section 6.2) formulating a sense in which an unconditional basis that also happens to be an orthonormal basis for $L^2$ is the best type of basis to use when one wishes to approximate arbitrary elements of a function space by finite linear combinations. Wavelets, for example, furnish unconditional bases for Besov spaces, an observation first
formulated by Lemarié and Meyer [255]. This partly explains why wavelets might be a good tool for signal compression, at least if one is willing to believe that a Besov norm can quantify how much essential information a signal contains (cf. [121]). But one also must be able to encode that information efficiently if high-speed processing algorithms are to be employed. Cohen et al. have developed an algorithm for encoding and decoding wavelet coefficient trees effectively [81]. This coding scheme is reviewed in Section 6.5. The ability to approximate elements of Besov spaces in terms of wavelet trees is considered in Sections 6.3 and 6.4.

An operator mapping a function space into itself can be expressed in terms of its coefficient matrix with respect to a basis for that space. Donoho's heuristic says that a basis is optimal for a type of approximation when the norm of an element depends only on the magnitudes of its basis coefficients. In a parallel sense, a basis is well adapted to an operator acting on a space if basic properties such as boundedness of the operator on the space can be deduced from a test that only depends on the magnitudes of its matrix coefficients. Schur's lemma (6.6.1) provides such a criterion for boundedness when the function space norm is equivalent to a weighted $\ell^p$ norm on the basis coefficients. For much the same reasons as in Donoho's heuristic, this approach also leads to effective approximations of the operator via truncations of its matrix. These issues are addressed in the context of wavelets and Besov norms in Section 6.6. In Section 6.7 we apply much the same principle in the context of specific singular integral operators. It is shown that the Haar wavelet matrix of the Hilbert transform satisfies Schur's criterion.

Pseudodifferential operators, which generalize differential operators, are often categorized in terms of their symbols, that is, by how they are localized in time and frequency.
By choosing bases of Gabor functions or local trigonometric bases, etc., that are adapted to this localization, one can play the same game of applying Schur's lemma to deduce mapping properties. We show how this works in Section 6.8. Finally, in Section 6.9, rather than asking for which operators wavelets are adapted, we ask which operators can be built from components that are well localized on dyadic cubes. This localization is the only a priori condition that the building blocks need have in common with wavelets. A criterion called near weak orthogonality (see Rochberg and Semmes [310]) not only guarantees $L^2$-boundedness of the operators thus built, but also accounts for other approximation properties of the operators. Applications of these ideas will be considered in Chapter 7.

Several concepts discussed in the second part of this chapter, particularly in relation to wavelets and singular integrals, are the result of a long evolution of the theory of singular integral operators. Little or no effort will be made to summarize that history, many details of which can be found in García-Cuerva and Rubio de Francia [151] and Stein [327, 328]; cf. also Meyer [274] and Hernandez and Weiss [189] on the specific role of wavelets. The work of Frazier and Jawerth (see [146, 147]) is particularly noteworthy as it provided
a transition from classical Littlewood–Paley methods to the wavelet methods discussed here.
6.1 Besov spaces: history and wavelets

Although nearly all of the ideas and results in this chapter have parallel versions in $\mathbb{R}^n$, we will work primarily in one variable, mostly for notational simplicity. In Chapter 2 we saw that Hilbertian Sobolev spaces can be normed in terms of wavelet coefficients. Besov spaces are extensions of Sobolev spaces with respect to this property and they provide a natural setting for asking: what advantages do wavelets hold, asymptotically, over other bases?

Besov [50] introduced a family of function spaces characterized in terms of $L^p$-norms of moduli of smoothness and sharing many of the fundamental properties of Sobolev spaces, including embedding, restriction and extension, and interpolation properties. Descriptions of these spaces in terms of particular Littlewood–Paley decompositions evolved through the work of Taibleson [338], Fefferman and Stein around 1970 (see, e.g., [328]) and others. Peetre [295] contains an excellent historical account of these developments and a definition of Besov norms in terms of more general Littlewood–Paley decompositions. Working in $\mathbb{R}$, one fixes $\varphi$ and $\psi_j$, $j = 1, 2, \ldots$ such that $\hat{\varphi}$ is supported in $(-2, 2)$ and $\hat{\psi}_j$ is supported in $(-2^{j+1}, 2^{j+1}) \setminus (-2^{j-1}, 2^{j-1})$, while $\hat{\varphi} + \sum \hat{\psi}_j \equiv 1$ and $2^{-j\beta} |d^\beta \psi_j / dx^\beta| \le c_\beta$ for $\beta \in \mathbb{N}$. For $p \in [1, \infty]$, $q > 0$ and $s \in \mathbb{R}$ one defines the space
$$B_p^{s,q}(\mathbb{R}) = \Big\{ f \in S'(\mathbb{R}) : \|\varphi * f\|_{L^p} + \Big( \sum_{j=1}^\infty \big( 2^{sj} \|\psi_j * f\|_{L^p} \big)^q \Big)^{1/q} < \infty \Big\} . \tag{6.1}$$
In 1985, Frazier and Jawerth [146] provided a discrete description, essentially in terms of wavelet frames, followed shortly thereafter by Lemarié and Meyer's [255] characterization of $B_p^{s,q}$ by wavelet coefficient norms
$$\|f\|_{B_p^{s,q}(\mathbb{R}^n)} = \|\{\langle f, \varphi_k \rangle\}\|_{\ell^p} + \Big( \sum_{j=1}^\infty \big( 2^{js} \|\{\langle f, \psi_{jk,p'} \rangle\}_k\|_{\ell^p} \big)^q \Big)^{1/q} . \tag{6.2}$$
Here $\varphi$ is the scaling function of an MRA and $\psi$ is the corresponding mother wavelet, while $\psi_{jk,p} = 2^{j/p} \psi(2^j x - k)$, also written $\psi_{I,p} = |I|^{1/2 - 1/p} \psi_I$. To define $B_p^{s,\infty}$ one replaces the sum over $j$ by a corresponding supremum in (6.1) and (6.2). Originally the Lemarié–Meyer wavelets were used but, as in Theorem 2.1.2, any sufficiently regular orthogonal or biorthogonal wavelet will do [189]. The norm equivalence can be extended to $p = \infty$ and/or $q = \infty$. It is obvious from this characterization that wavelets form unconditional bases for the Besov spaces. That is, if $f = \sum_k \langle f, \varphi_k \rangle \varphi_k + \sum_{j,k} \langle f, \psi_{jk} \rangle \psi_{jk} \in B_p^{s,q}$ then $\sum_k \varepsilon_k \langle f, \varphi_k \rangle \varphi_k + \sum_{j,k} \varepsilon_{jk} \langle f, \psi_{jk} \rangle \psi_{jk} \in B_p^{s,q}$ whenever $|\varepsilon_k| = |\varepsilon_{jk}| = 1$.
Several approximation properties of Besov norms are easier to describe when functions are restricted to compact subsets of $\mathbb{R}$ or $\mathbb{R}^n$. One defines $B_p^{s,q}([0, 1])$ to consist of the restrictions to $[0, 1]$ of elements of $B_p^{s,q}(\mathbb{R})$, with quotient norm $\inf\{\|g\|_{B_p^{s,q}(\mathbb{R})} : g|_{[0,1]} = f\}$. In order to avoid entanglement in the intricacies of boundary conditions and different equivalent norms (cf. Section 2.3.3), we will consider a wavelet model $b_p^{s,q} = b_p^{s,q}(\mathcal{D}[0, 1])$ for localization of elements of $B_p^{s,q}$. Recall that $\mathcal{D}[0, 1]$ consists of the dyadic subintervals of $[0, 1]$. Fix an orthogonal wavelet basis whose mother wavelet $\psi$ is Hölder continuous of order at least $s$. We say that $f \in b_p^{s,q}$ if
$$f = \sum_{I \in \mathcal{D}[0,1]} c_I\, \psi_{I,p} \quad \text{and} \quad \|f\|_{b_p^{s,q}}^q = \sum_{j=1}^\infty \big( 2^{js}\, \|c_{I(j,\cdot)}\|_{\ell^p} \big)^q < \infty . \tag{6.3}$$
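The sequence norm in (6.3) is straightforward to evaluate once the coefficients are grouped by dyadic level. The following is a minimal sketch under stated assumptions: the names `level_norm` and `besov_sequence_norm` are hypothetical, the coefficients $c_I$ are assumed to be already computed and stored as a dictionary mapping the level $j$ to the (non-empty) list of coefficients at that level, and only finitely many levels are present.

```python
import math


def level_norm(coeffs, p):
    """l^p norm of the coefficients on one dyadic level (p may be math.inf)."""
    if p == math.inf:
        return max(abs(c) for c in coeffs)
    return sum(abs(c) ** p for c in coeffs) ** (1.0 / p)


def besov_sequence_norm(levels, s, p, q):
    """Norm of (6.3): ( sum_j (2^{js} ||c_{I(j,.)}||_{l^p})^q )^{1/q}.

    `levels` maps a level j to the list of coefficients c_I with |I| = 2^{-j}.
    For q = math.inf the sum over j is replaced by a supremum.
    """
    terms = [2.0 ** (j * s) * level_norm(c, p) for j, c in levels.items()]
    if q == math.inf:
        return max(terms)
    return sum(t ** q for t in terms) ** (1.0 / q)


coeffs = {1: [1.0, 0.0], 2: [0.5, 0.5, 0.0, 0.0]}
flipped = {1: [-1.0, 0.0], 2: [0.5, -0.5, 0.0, 0.0]}
norm = besov_sequence_norm(coeffs, s=1.0, p=2.0, q=2.0)
norm_flipped = besov_sequence_norm(flipped, s=1.0, p=2.0, q=2.0)
```

Because the norm depends only on the magnitudes $|c_I|$, changing the signs of the coefficients leaves it unchanged, which is the unconditionality property in this model.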
6.2 Unconditional bases as best bases

The singular-value decomposition (SVD) was once a standard for image compression. Treating an image as a matrix, computing its SVD, and truncating according to negligible singular values, one removes information in directions orthogonal to those being kept which, in turn, retain as much energy as possible for a given number of retained directions. This optimizes a naive measure of bit rate versus distortion so, in the sense of $L^2$ error of an $N$-term approximant, SVD gives the best basis for compressing a given image. However, SVD is unstable with respect to perturbations of the statistics from which it is computed: a small rotation of a matrix can change all of its eigenvectors. Moreover, the SVD may not provide an optimal basis when distortion is computed in terms of a norm designed to indicate perceptually significant features such as edges. Even more importantly, SVD is expensive. On the positive side, SVD does not require any a priori signal model.

Daubechies et al. [121] formulated the problem of finding efficient basis expansions as a 3-faceted grand challenge, namely to: (1) obtain accurate models of naturally occurring sources of data, (2) obtain "optimal representations" of such models and (3) rapidly compute such optimal representations. We will not address the first and most basic facet here. The authors of [82] suggest that there are deep reasons why such models should be amenable to efficient computational harmonic analysis. We will consider two such reasons here that tie in directly with facets (2) and (3) of the grand challenge. Later we will see that the same ideas underlie certain advances in operator theory.

Aside from fast algorithms, wavelet bases are good bases for signal compression when the signal size is measured in a Besov norm. We will refer to the underlying rationale as "Donoho's heuristic" [115]: unconditional bases are optimal in an asymptotic, minimax sense. Wavelets play no intrinsic role
in this heuristic. Extrinsically, they provide unconditional bases for important function classes, including Besov spaces. We will focus on wavelets as a tool for compression but the “optimality” in facet (2) has two other senses expressible via basis coefficients $\{\theta_k\}$ of a function $f$ representing a signal or image, these being statistical estimation and recovery. In contrast to SVD, the minimax approach supposes that the function of interest could be any member of a specific class of functions arising through facet (1). Minimax means that the optimal basis should perform well uniformly throughout that class while asymptotic refers to the rate of decay of expansion coefficients.

In the estimation problem one observes noisy data $y = x + \varepsilon z$. The components $z_i$ of $z$ are taken to be independent and identically normally distributed with mean zero and unit variance. The goal is to estimate $x$ via $\hat{x}$ with small MSE $\|\hat{x} - x\|_2^2$. Developing $y$ in an orthonormal basis amounts to replacing the problem of estimating $x$ directly by that of estimating its transform $\theta$ by $\hat{\theta}$. There is a subtle difference between the problem of estimation and that of optimal signal recovery, taking into account information based complexity. In the latter, one has $x = \mu + \varepsilon\eta$ in which the $\eta_i$ are nonstochastic nuisance terms satisfying $|\eta_i| \le 1$. One wishes to recover $\mu$ with the smallest worst case error $\sup \|\hat{\mu} - \mu\|_2$ in which the supremum is taken over all choices of $\eta_i$ satisfying $|\eta_i| \le 1$. The abstract model here is driven by inability to compute the orthogonal coefficients of a signal $f$ exactly, for example, because of machine arithmetic. Estimation and recovery issues both arise, even in applications where the primary goal is to compress data for storage and transmission purposes.
The compression problem itself is that of finding a best representative of a noiseless vector $x$ under the constraint that only $N$ machine words, each consisting of a floating point number and an integer (i.e., a coefficient and its index) can be used for the representation. The goal is to find $c_N(\theta) = \inf \|\hat{\theta}_N - \theta\|_2$ in which the infimum is taken over all approximations of $\theta$ by $N$-word representations. This corresponds to the practical problem of transform coding. The common objective of these approaches is to find, inside the nonlinear manifold of $N$-term expansions $\sum_{k=1}^{N} c_{\alpha(k)} \psi_{\alpha(k)}$ in a given countable basis $\psi_\alpha$, a minimizer of some quadratic error. What basis should one use? If relevant properties of the underlying signals are captured in terms of a function space norm then, in Donoho’s terms, one can hope to bracket the function norm between a pair of orthosymmetric norms that are characterized in terms of the rate of decay of coefficients in the basis in question. How Donoho’s ideas fall into the broader statistical framework is laid out in Mallat’s book [268]; cf. also [168].

Here is the setup. Denote by $F$ a function space with norm $\|\cdot\|_F$. Set $F(C) = \{f : \|f\|_F \le C\}$. Suppose that $\{\varphi_k\}$ is a countable orthonormal basis that is also a basis for $F$. One says that $\{\varphi_k\}$ is an unconditional basis for $F$ provided the mapping $f = \sum c_k \varphi_k \mapsto \sum m_k c_k \varphi_k$ is bounded from $\ell^\infty \times F$ to $F$, say with norm $M$. Set $\mathcal{C} = \mathcal{C}_F = \{\{c_k\} : \sum c_k \varphi_k \in F\}$ with $\|\{c_k\}\|_{\mathcal{C}} = \|\sum c_k \varphi_k\|_F$ as well as $\|\{c_k\}\|_* = \sup_{\|\{m_k\}\|_\infty \le 1} \|\{m_k c_k\}\|_{\mathcal{C}}$.
Unconditionality of $\{\varphi_k\}$ implies that the class $\mathcal{C}_*(1) = \{\{c_k\} : \|\{c_k\}\|_* \le 1\}$ is orthosymmetric and solid in the sense that $\|\{\mu_k c_k\}\|_* \le 1$ whenever $\|\{c_k\}\|_* \le 1$ and $\|\{\mu_k\}\|_\infty \le 1$, while $M^{-1}\mathcal{C}_*(1) \subset \mathcal{C}(1) \subset M\,\mathcal{C}_*(1)$.

One convenient measure of asymptotic decay of coefficients $\{c_k\}$ is a weak-$\ell^p$ or $\ell^{p,\infty}$ norm: $\{c_k\} \in \ell^{p,\infty}$ if $\#\{k : |c_k| > \lambda\} \le C\lambda^{-p}$ or, more intuitively, if $c_n^* \le C n^{-1/p}$ where $\{c_n^*\}$ denotes the decreasing rearrangement of $\{c_k\}$. For $\mathcal{C}$ defined as above, one defines $p^*(\mathcal{C}) = \inf\{p : \mathcal{C} \subset \ell^{p,\infty}\}$. One has [115] the following.

Theorem 6.2.1. Let $K$ be a bounded, orthosymmetric and solid subset of $\ell^2$. Then, for every orthogonal transformation $U : \ell^2 \to \ell^2$, $p^*(UK) \ge p^*(K)$.

This implies, in the sense of $\ell^{p,\infty}$ decay, that an unconditional basis is asymptotically optimal for approximation via thresholding. Given a nonnegative sequence $\{\alpha_k\}$, the hyperrectangle $K(\alpha)$ consists of those $\{c_k\}$ such that $|c_k| \le \alpha_k$ for all $k$. An application of Khintchine’s inequality (e.g., [375]) yields the following.

Lemma 6.2.2. Let $K(\alpha)$ be a hyperrectangle. Let $U : \ell^2 \to \ell^2$ be orthogonal and let $p \in (0, 2)$. Then there is a $\gamma(p)$ independent of $U$ such that
$$\sup_{\{c_k\} \in K(\alpha)} \|U\{c_k\}\|_p \ge \gamma(p)\, \|\{\alpha_k\}\|_p .$$
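Proof (of Theorem 6.2.1). Assuming that Lemma 6.2.2 holds, whenever $K$ is orthosymmetric and solid, and $U$ is orthogonal, one has
$$\sup_{\{c_k\} \in K(\alpha)} \|U\{c_k\}\|_p \ge \sup_{\{\sigma_k\} \in K(\{|c_k|\})} \|U\{\sigma_k\}\|_p \ge \gamma(p)\, \|\{c_k\}\|_p .$$
Consequently,
$$\sup_{\{c_k\} \in K} \|U\{c_k\}\|_p = \sup_{K(\alpha) \subset K}\ \sup_{\{c_k\} \in K(\alpha)} \|U\{c_k\}\|_p \ge \gamma(p) \sup_{K(\alpha) \subset K}\ \sup_{\{c_k\} \in K(\alpha)} \|\{c_k\}\|_p = \gamma(p) \sup_{\{c_k\} \in K} \|\{c_k\}\|_p .$$
Suppose now that for some $p_0$ and some orthogonal transformation $U$ one has $\sup_{\{c_k\} \in K} \|U\{c_k\}\|_{\ell^{p_0,\infty}} < \infty$. Then for each $\delta > 0$ and $p_1 = p_0 + \delta$ one has
$$\sup_{\{c_k\} \in K} \|U\{c_k\}\|_{\ell^{p_1}} \le C(p_0, \delta)^{-1} \sup_{\{c_k\} \in K} \|U\{c_k\}\|_{\ell^{p_0,\infty}}$$
whereas
$$\sup_{\{c_k\} \in K} \|\{c_k\}\|_{\ell^{p_1,\infty}} \le \sup_{\{c_k\} \in K} \|\{c_k\}\|_{\ell^{p_1}} \le \gamma(p_1)^{-1} \sup_{\{c_k\} \in K} \|U\{c_k\}\|_{\ell^{p_1}} .$$
Consequently, $p^*(K) \le p_0 + \delta$ for each $\delta > 0$. That is, $p^*(K) \le p_0$. Thus Theorem 6.2.1 follows from Lemma 6.2.2.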
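The proof of Lemma 6.2.2 reduces, once Khintchine's inequality has been applied, to the deterministic inequality $\sum_j (\sum_k U_{jk}^2 \alpha_k^2)^{p/2} \ge \sum_k \alpha_k^p$ for orthogonal $U$ and $0 < p < 2$. The following is a minimal numerical sketch of that one step only. The orthonormal Walsh-Hadamard matrix is merely a convenient choice of orthogonal matrix made for this illustration, and the names `hadamard` and `lower_bound_sides` are hypothetical.

```python
import math
import random


def hadamard(n):
    """Return the orthonormal 2**n x 2**n Walsh-Hadamard matrix as nested lists."""
    H = [[1.0]]
    for _ in range(n):
        # Sylvester construction: [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    scale = 1.0 / math.sqrt(len(H))
    return [[scale * x for x in row] for row in H]


def lower_bound_sides(U, alpha, p):
    """Return (lhs, rhs) with
    lhs = sum_j (sum_k U_jk^2 alpha_k^2)^(p/2) and rhs = sum_k alpha_k^p."""
    lhs = sum(sum((u * a) ** 2 for u, a in zip(row, alpha)) ** (p / 2.0) for row in U)
    rhs = sum(a ** p for a in alpha)
    return lhs, rhs


random.seed(0)
U = hadamard(3)                              # an 8 x 8 orthogonal matrix (illustrative choice)
alpha = [random.random() for _ in range(8)]  # sides of a hypothetical hyperrectangle
for p in (0.5, 1.0, 1.5):
    lhs, rhs = lower_bound_sides(U, alpha, p)
    print(p, lhs >= rhs)
```

The inequality holds because each row of $U$ has squared entries summing to one, so concavity of $t \mapsto t^{p/2}$ applies row by row, and each column also has squared entries summing to one.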
Proof (of Lemma 6.2.2). Fix $\{\alpha_k\}$ and consider the random vector $X_k = s_k \alpha_k$ in which $s_k \in \{-1, 1\}$ takes the values $\pm 1$ with equal likelihood. Then $\{X_k\} \in K(\alpha)$ with probability one, and $U\{X_k\}$ has expected $\ell^p$ norm equal to $E\big(\sum_j |\sum_k U_{jk} X_k|^p\big)^{1/p}$. By Khintchine’s inequality, Hölder’s inequality and orthogonality of $U$,
$$E\, \|U\{X_k\}\|_p^p \ge \gamma(p)^p \sum_j \Big(\sum_k |U_{jk}\alpha_k|^2\Big)^{p/2} \ge \gamma(p)^p \sum_j \sum_k \alpha_k^p\, U_{jk}^2 = \gamma(p)^p \sum_k \alpha_k^p .$$
This proves the lemma and hence the theorem.

What does Donoho’s heuristic say about Besov spaces? Let $B^{s,q}_p(1) = \{f : \|f\|_{B^{s,q}_p} \le 1\}$. Donoho [115] identified the critical exponent
$$p^*(B^{s,q}_p) = 1/(s + 1/2) \qquad (s > 0,\ p, q \in (0, \infty]) \tag{6.4}$$
which depends only on the smoothness of the Besov scale, which is also the only parameter of the Besov scale that delimits which wavelets furnish bases for $B^{s,q}_p$. This exemplifies one important aspect of Donoho’s heuristic, namely that $p^*(K)$ is a property of the basis $\{\varphi_k\}$ and only depends on the space $F$ insofar as $K$ does. Still, Donoho’s heuristic is sharp in that, for any $r < p^*$, there is an $f \in B^{s,q}_p$ such that $\{\langle f, \psi_I\rangle\} \in \ell^{p^*,\infty}$ but $\{\langle f, \psi_I\rangle\} \notin \ell^{r,\infty}$. That $p^*$ depends only on $s$ can be used advantageously. For example, for the periodic Besov spaces $B^{s,q}_p(\mathbb{T})$ with corresponding periodic wavelet bases, there are constants $C_1 \le C_2$ such that [115]
$$\frac{1}{C_1} \sum_{j=1}^{\infty} 2^{j/2}\, \|\{\langle f, \psi_{jk}\rangle\}_k\|_{\ell^1} \ge (\operatorname{Var} f) \ge \frac{1}{C_2} \sup_j 2^{j/2}\, \|\{\langle f, \psi_{jk}\rangle\}_k\|_{\ell^1} .$$
That is,
$$B^{1,1}_1(C_1) \subset BV(1) \subset B^{1,\infty}_1(C_2), \tag{6.5}$$
showing that $p^*(BV) = 2/3$. We will see later (Theorem 7.3.1) that this decay rate can be refined when the wavelet coefficients are weighted appropriately.
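To make the weak-$\ell^p$ decay behind the critical exponent concrete, here is a small sketch, under the assumption of a model coefficient sequence $c_n = n^{-(s+1/2)}$ chosen purely for illustration (it is not taken from [115]). It computes the decreasing rearrangement and the quantity $\sup_n n^{1/p} c_n^*$, which stays bounded at $p = 1/(s+1/2)$ and grows with the truncation length for smaller $p$. The function names are hypothetical.

```python
def decreasing_rearrangement(c):
    """Moduli of the entries of c sorted in nonincreasing order."""
    return sorted((abs(x) for x in c), reverse=True)


def weak_lp_quasinorm(c, p):
    """sup_n n^(1/p) c*_n for a finite sequence c."""
    cs = decreasing_rearrangement(c)
    return max((n ** (1.0 / p)) * x for n, x in enumerate(cs, start=1))


def count_above(c, lam):
    """#{k : |c_k| > lam}, the distribution function used in the weak-l^p definition."""
    return sum(1 for x in c if abs(x) > lam)


s = 1.0                          # smoothness parameter (illustrative)
p_star = 1.0 / (s + 0.5)         # critical exponent from (6.4), here 2/3
c = [n ** -(s + 0.5) for n in range(1, 2001)]   # model sequence, an assumption

print(weak_lp_quasinorm(c, p_star))   # stays of order one
print(weak_lp_quasinorm(c, 0.5))      # grows with the number of terms kept
print(count_above(c, 0.01), 0.01 ** (-p_star))
```

The last line compares the two equivalent forms of the weak-$\ell^p$ condition: the number of coefficients above a threshold $\lambda$ against $\lambda^{-p}$.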
6.3 Best nonlinear approximation

In some cases it is possible to determine a rate of nonlinear approximation of $f \in F(1)$ by partial sums of basis expansions that depends intrinsically on both the basis and the (quasi)-norm on $F$. Consider a basis $\mathcal{B} = \{e_k\}$ for $F$ and the nested families of manifolds
$$\Sigma(N, \mathcal{B}) = \Big\{ f \in F : f = \sum_{m=1}^{N} c_m e_{k_m} \Big\} \tag{6.6}$$
consisting of those elements of $F$ that can be written as a linear combination of at most $N$ elements of $\mathcal{B}$. The best $N$-term approximant $S_N(g, \mathcal{B}, F, \|\cdot\|)$ of $g \in F$ is then an element $S$ of $\Sigma(N, \mathcal{B})$ that minimizes $\|g - S\|$. Here $\|\cdot\|$ is a norm, though not necessarily the norm on $F$. We will explain this distinction below.

Consider now what a “best basis” for nonlinear approximation might mean. Donoho’s heuristic says that an unconditional basis provides the best rate of decay of coefficients of $F(1)$. A slightly stronger version of minimax is considered in [121] in which minimax, asymptotic optimality means that the Stečkin numbers
$$d_N(F, \mathcal{B}) = \sup_{f \in F(1)} \|f - S_N(f, \mathcal{B}, L^2)\|_{L^2}$$
decay at a rate that is optimal over all choices of orthonormal families that are bases for $F$. This sense of optimality should be distinguished from a notion of a best basis for a specific signal $f$ that minimizes, among a given family of bases, some notion of cost of expanding or approximating $f$, as was encountered in Chapter 4. Donoho’s heuristic says that an orthogonal basis that is an unconditional basis for $F$ is a best basis in the minimax sense. Here is a table taken from [121] that summarizes the situation for various function norms.

Name                 Space               Optimal basis         $d_n(F, \mathcal{B})$
$L^2$-Sobolev        $W_2^m$             Fourier or wavelet    $O(n^{-m})$
$L^p$-Sobolev        $W_p^m$             wavelet               $O(n^{-m})$
Hölder               $\dot{C}^\alpha$    wavelet               $O(n^{-\alpha})$
Bump algebra         $\dot{B}^1_{1,1}$   wavelet               $O(1/n)$
Bounded variation    $BV$                Haar                  $O(1/n)$
Segal algebra        $S$                 Wilson                $O(n^{-1/2})$

The Hölder and bump spaces are endpoint Besov spaces while, by (6.5), BV is intermediate to certain endpoint Besov spaces. The $L^p$-Sobolev spaces are Triebel–Lizorkin spaces. Wavelets form unconditional bases for all of these families. Decay of $d_N$ depends only on the smoothness parameter in these cases. The $L^2$-Sobolev spaces and Segal algebra are modulation spaces for which Wilson bases are unconditional bases (cf. [168]).

6.3.1 Nonlinear wavelet approximation in Besov norms

Fix an MRA for $L^2(\mathbb{R})$ and consider $\ell^p$-normalized wavelet expansions:
$$f = \sum_k \langle f, \varphi_k\rangle\, \varphi_k + \sum_{j=1}^{\infty} \sum_{I \in \mathcal{D}_j} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p} \qquad (\psi_{I,p} = |I|^{1/2 - 1/p}\, \psi_I).$$
In the following discussion, we will systematically ignore the role of the $V_0$ component $\sum_k \langle f, \varphi_k\rangle \varphi_k$ by assuming that this component is zero. The $N$-term wavelet approximation numbers
$$\sigma_N(f)_{L^p} = \inf\{\|f - f_N\|_{L^p} : f_N \in \Sigma_N\}, \qquad \Sigma_N :\ N\text{-term wavelet sums},$$
define approximation spaces $A^s_q(L^p)$ of functions whose wavelet approximation numbers belong to Lorentz $\ell^{s,q}$ spaces:
$$\|f\|_{A^s_q(L^p)} = \begin{cases} \Big(\sum_{N=1}^{\infty} \big(N^s \sigma_N(f)_p\big)^q / N\Big)^{1/q}, & 0 < q < \infty \\ \sup_N N^s \sigma_N(f)_p, & q = \infty . \end{cases}$$
The following characterization of $A^s_q(L^p)$ was proved by DeVore et al. [113], cf. [84], to which we also refer for further properties. It says that $A^s_q(L^p)$ is characterized by a rate of decay of wavelet coefficients.

Theorem 6.3.1. For $1 < p < \infty$, $s > 0$ and $1/r = s + 1/p$, $f \in A^s_q(L^p)$ if and only if $\{\langle f, \psi_{I,p'}\rangle\}_{I \in \mathcal{D}_+} \in \ell^{r,q}$. In this case one also has $\|f\|_{A^s_q(L^p)} \sim \|f\|_{L^p} + \|\{\langle f, \psi_{I,p'}\rangle\}\|_{\ell^{r,q}}$.

In what follows we will consider $N$-term wavelet approximations that take into account the properties of dyadic intervals. It is worth noting for now that if $p \le p_0$ and $s > 1/p - 1/p_0$ or $p > p_0$ and $s + 1/p - 1/2 > 0$ then $B^{s,q}_p \subset A^s_q(L^p)$ and when $q = p = r/(1 + rs)$, $A^s_p(L^r) = B^{s,p}_p$. Whenever $p > s/(1 + rs)$, the unit ball of $b^{s,q}_p$ is a compact subset of $L^r$ (Lemma 6.5.3).

6.3.2 Temlyakov’s theorem and wavelet approximation

Temlyakov [343] established the following uniform estimate on wavelet expansions (see [84] for the proof here).

Theorem 6.3.2. Let $f = \sum_{I \in \Lambda} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p}$ where $\Lambda \subset \mathcal{D}_+$ is finite. If $1 \le p < \infty$ then
$$\|f\|_{L^p} \le C_p\, (\#\Lambda)^{1/p} \max_{I \in \Lambda} |\langle f, \psi_{I,p'}\rangle| \tag{6.7}$$
where $C_p$ depends only on $p$. Moreover, if $1 < p \le \infty$ then
$$C'_p\, (\#\Lambda)^{1/p} \min_{I \in \Lambda} |\langle f, \psi_{I,p'}\rangle| \le \|f\|_{L^p} . \tag{6.8}$$
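Proof. The proof is based on a type of Calderón–Zygmund decomposition of $E(\Lambda) = \cup_{I \in \Lambda} I$. Let $I(x, \Lambda)$ be the smallest interval in $\Lambda$ that contains $x \in E(\Lambda)$. If $J \in \Lambda$, set $\tilde{J} = \{x : I(x, \Lambda) = J\} \subset J$. Then $E(\Lambda) = \cup_{J \in \Lambda} \tilde{J}$ is a pairwise disjoint union, so
$$\int_{E(\Lambda)} g = \sum_{J \in \Lambda} \int_{\tilde{J}} g . \tag{6.9}$$
By Lemma 6.10.2, the function $S(f)(x) = \big(\sum_{x \in I} |\langle f, \psi_I\rangle|^2 / |I|\big)^{1/2}$ satisfies the norm equivalence $\|S(f)\|_{L^p} \approx \|f\|_{L^p}$ [274], so for $f = \sum_{I \in \Lambda} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p}$,
$$\begin{aligned}
\|f\|_{L^p} &\le C\, \|S(f)\|_{L^p} \\
&\le C \max_{I \in \Lambda} |\langle f, \psi_{I,p'}\rangle|\ \Big\| \Big(\sum_{x \in I \in \Lambda} |I|^{-2/p}\Big)^{1/2} \Big\|_{L^p} \\
&\le C \max_{I \in \Lambda} |\langle f, \psi_{I,p'}\rangle|\ \big\| |I(x, \Lambda)|^{-1/p} \chi_{I(x,\Lambda)}(x) \big\|_{L^p} \\
&= C \max_{I \in \Lambda} |\langle f, \psi_{I,p'}\rangle|\ \big\| |I(x, \Lambda)|^{-1} \big\|_{L^1}^{1/p}
\end{aligned} \tag{6.10}$$
since $\sum_{x \in I \in \Lambda} |I|^{-2/p} \le c_p\, |I(x, \Lambda)|^{-2/p}$. By (6.9),
$$\int_{E(\Lambda)} |I(x, \Lambda)|^{-1}\, dx = \sum_{J \in \Lambda} \int_{\tilde{J}} |I(x, \Lambda)|^{-1}\, dx = \sum_{J \in \Lambda} \frac{|\tilde{J}|}{|J|} \le \#\Lambda \tag{6.11}$$
by the definition of $\tilde{J}$. Combining (6.10) and (6.11) yields (6.7).

To prove (6.8), $\chi_{I(x,\Lambda)} / |I(x, \Lambda)| \ge c \sum_{I \in \Lambda} \chi_I / |I|$ implies
$$\begin{aligned}
\|f\|_{L^p}^p \ge C\, \|S(f)\|_{L^p}^p &\ge C \min_{I \in \Lambda} |\langle f, \psi_{I,p'}\rangle|^p\ \Big\| \Big(\sum_{x \in I \in \Lambda} |I|^{-2/p}\Big)^{1/2} \Big\|_{L^p}^p \\
&\ge C \min_{I \in \Lambda} |\langle f, \psi_{I,p'}\rangle|^p\ \Big\| \sum_{x \in I \in \Lambda} \frac{\chi_I}{|I|} \Big\|_{L^1} = C \min_{I \in \Lambda} |\langle f, \psi_{I,p'}\rangle|^p\ \#\Lambda .
\end{aligned}$$
This proves (6.8) and completes the proof of the theorem.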
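As a sanity check on the normalization in Theorem 6.3.2, the following sketch evaluates $\|f\|_{L^p}$ for finite sums of $L^p$-normalized Haar functions on $[0,1)$. The Haar system is used here only as the simplest concrete wavelet, which is an assumption of this illustration. When the intervals in $\Lambda$ are pairwise disjoint and all coefficients equal one, $\|f\|_{L^p}$ equals $(\#\Lambda)^{1/p}$ exactly, so both (6.7) and (6.8) are attained with constant one in that special case. The names `haar_lp` and `lp_norm` are hypothetical.

```python
def haar_lp(j, m, p, x):
    """L^p-normalized Haar function on I = [m 2^-j, (m+1) 2^-j), evaluated at x."""
    length = 2.0 ** -j
    left = m * length
    if not (left <= x < left + length):
        return 0.0
    sign = 1.0 if x < left + length / 2.0 else -1.0
    return sign * length ** (-1.0 / p)      # |I|^(-1/p) gives unit L^p norm


def lp_norm(coeffs, p, depth=10):
    """L^p[0,1) norm of sum_I c_I psi_{I,p}, with coeffs a dict {(j, m): c_I}.

    The midpoint rule on 2**depth cells is exact when every level j < depth,
    because the sum is then piecewise constant on the cells."""
    n = 2 ** depth
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        value = sum(c * haar_lp(j, m, p, x) for (j, m), c in coeffs.items())
        total += abs(value) ** p
    return (total / n) ** (1.0 / p)


p = 3.0
disjoint = {(3, m): 1.0 for m in (0, 2, 5, 7)}          # four disjoint intervals
nested = {(0, 0): 1.0, (1, 0): 1.0, (2, 0): 1.0, (3, 0): 1.0}  # a nested chain

print(lp_norm(disjoint, p), len(disjoint) ** (1.0 / p))
# Ratio ||f||_p / ((#Lambda)^(1/p) max|c_I|) for the nested chain:
print(lp_norm(nested, p) / len(nested) ** (1.0 / p))
```

The nested chain is the configuration that the Calderón–Zygmund type decomposition in the proof is designed to handle; the printed ratio gives a feel for the size of the constant in (6.7) for that example.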
6.4 Nonlinear approximation, wavelets and trees

As we did in defining the spaces $b^{s,q}_p$ in (6.3), we will consider here only functions that can be expanded in wavelets $\psi_I$ where $I \in \mathcal{D}[0, 1]$. The notion of best nonlinear wavelet approximation can be refined to take organizational advantage of the dyadic tree structure of wavelets. By a tree we mean here a family $T$ of dyadic intervals in $\mathcal{D}[0,1)$ such that if $I \in T$ and $|I| < 1$ then the parent interval of $I$ is in $T$. Any such tree has maximal element $[0, 1)$. We use $\#T$ to denote the number of nodes in $T$. In analogy to the manifold $\Sigma_N$ of all $N$-term wavelet sums, one defines the nonlinear manifold
$$\Sigma_N^{tr} = \Big\{ f = \sum_{I \in T} c_I \psi_I :\ T \text{ a tree},\ \#T \le N \Big\} . \tag{6.12}$$
Then $\Sigma_N^{tr}$ is a submanifold of $\Sigma_N$. For a Banach function space $B$ one can define the $N$-term tree approximation error
$$\tau_N(f)_B = \inf_{S \in \Sigma_N^{tr}} \|f - S\|_B .$$
It is desirable to know whether or when the numbers $\tau_N$ compare favorably to the wavelet approximation numbers $\sigma_N$. Intuitively, Besov norms capture information about singularities that propagate down wavelet scales, so large wavelet terms should nearly have a dyadic tree structure. One should interpret this statement carefully, though: permuting wavelet terms at any fixed scale does not affect the norm in (6.3). Fix $p \ge 1$. For $\eta > 0$ set
$$\Lambda(f, \eta) = \{I \in \mathcal{D}([0, 1]) : |\langle f, \psi_{I,p'}\rangle| \ge \eta\}, \qquad T(f, \eta) = \text{the minimal spanning tree of } \Lambda(f, \eta).$$
With $T(f, \eta)$ one associates the $L^p$-approximant
$$T(f, \eta) = \sum_{I \in T(f,\eta)} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p} . \tag{6.13}$$
A rate of growth of $T(f, \eta)$ as $\eta \to 0$ defines a space
$$B_\lambda(L^p) = \Big\{ f \in L^p : \|f\|_{B_\lambda} = \sup_{0 < \eta < 1} \eta\, (\#T(f, \eta))^{1/\lambda} < \infty \Big\} . \tag{6.14}$$
Thus
$$\#\Lambda(f, \eta) \le \#T(f, \eta) \le \frac{C(f)}{\eta^\lambda} \qquad (f \in B_\lambda(L^p))$$
while, by Temlyakov’s theorem there exists $C'(f)$ such that:
$$\#\Lambda(f, \eta) \le \frac{\big(C'(f)\, \|T(f, \eta)\|_{L^p}\big)^p}{\eta^p} \qquad (f \in L^p). \tag{6.15}$$
Thus, typically $\#\Lambda(f, \eta)$ can grow faster in $L^p$ than in $B_\lambda$ when $\lambda < p$. While $B_\lambda(L^p)$ is a linear space, the natural quasinorm $\|f\|_{B_\lambda(L^p)} = C(f)^{1/\lambda}$ does not satisfy the triangle inequality. Nevertheless, $B_\lambda(L^p)$ provides some leverage for obtaining tree approximation estimates in Besov norms.

Theorem 6.4.1. Given $1 \le p < \infty$ and $0 < \lambda < p$ one has
$$\|f - T(f, \eta)\|_{L^p} \le C\, \|f\|_{B_\lambda(L^p)}^{\lambda/p}\, \eta^{1 - \lambda/p} \tag{6.16}$$
with $C$ independent of $f$ and $\eta$ while if, moreover, $s > \min(0, 1/r - 1/p)$ then for $0 < q \le \infty$, one has
$$\|f\|_{B_\lambda(L^p)} \le C'\, \|f\|_{b^{s,q}_r}, \qquad \Big(\frac{1}{\lambda} = s + 1/\max(p, r)\Big). \tag{6.17}$$
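Proof. To prove (6.16) let
$$T_l(f, \eta) = \sum_{I \in T(f, \eta/2^{l+1}) \setminus T(f, \eta/2^l)} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p} \tag{6.18}$$
denote the $l$th layer of $T(f, \eta)$. By Temlyakov’s inequality,
$$\|T_l(f, \eta)\|_{L^p} \le C\, \frac{\eta}{2^l} \Big(\#T\Big(f, \frac{\eta}{2^{l+1}}\Big)\Big)^{1/p} \le C\, \frac{\eta}{2^l} \Big(\|f\|_{B_\lambda(L^p)}^{\lambda} \Big(\frac{2^l}{\eta}\Big)^{\lambda}\Big)^{1/p} .$$
By Minkowski’s inequality one has
$$\|f - T(f, \eta)\|_{L^p} \le \sum_{l=0}^{\infty} \|T_l(f, \eta)\|_{L^p} \le C\, \eta^{1 - \lambda/p}\, \|f\|_{B_\lambda(L^p)}^{\lambda/p} \sum_{l=0}^{\infty} 2^{l(\lambda/p - 1)}$$
which yields (6.16) for $\lambda < p$.

For (6.17), to keep notation simple we consider only the case $r = p$. Let $\Lambda_j(f, \eta) = \Lambda(f, \eta) \cap \mathcal{D}_j$. For $f \in b^{s,q}_p$ one has
$$\begin{aligned}
\eta^p\, \#\Lambda_j(f, \eta) &\le \sum_{I \in \mathcal{D}_j} |\langle f, \psi_{I,p'}\rangle|^p && \text{by Chebyshev} \\
&\le C\, \Big\| \sum_{I \in \mathcal{D}_j} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p} \Big\|_{L^p}^p && \text{by properties of wavelets} \\
&\le C\, 2^{-jsp}\, \|f\|_{b^{s,q}_p}^p && \text{by (6.2).}
\end{aligned}$$
Set $T_j = T \cap \mathcal{D}_j$. If $I \in T_j(f, \eta)$ then $I \in \Lambda_j(f, \eta)$ or $I$ is an ancestor of an element of $\Lambda(f, \eta)$. Since $\#\mathcal{D}_j[0, 1] = 2^j$, the estimate for $\#\Lambda_j(f, \eta)$ gives
$$\#T_j(f, \eta) \le \sum_{k \ge j} \#\Lambda_k(f, \eta) \le C \min\Big\{ 2^j,\ \Big(\frac{\|f\|_{b^{s,q}_p}}{\eta\, 2^{js}}\Big)^p \Big\} .$$
Setting
$$j_0 = \max\Big\{ 0,\ \lambda \log_2 \Big(\frac{\|f\|_{b^{s,q}_p}}{\eta}\Big) \Big\},$$
so that $2^{j_0} = (\|f\|_{b^{s,q}_p}/\eta)^\lambda$ whenever $j_0 > 0$, one has
$$\begin{aligned}
\#T(f, \eta) &= \sum_{j \le j_0} \#T_j(f, \eta) + \sum_{j > j_0} \#T_j(f, \eta) \\
&\le \sum_{j \le j_0} 2^j + C\, \eta^{-p}\, \|f\|_{b^{s,q}_p}^p \sum_{j > j_0} 2^{-jsp} \\
&\le C \Big(\frac{\|f\|_{b^{s,q}_p}}{\eta}\Big)^{\lambda} + C \Big(\frac{\|f\|_{b^{s,q}_p}}{\eta}\Big)^{p(1 - \lambda s)} \\
&= C \Big(\frac{\|f\|_{b^{s,q}_p}}{\eta}\Big)^{\lambda} \qquad \text{when } \lambda = p(1 - \lambda s)
\end{aligned}$$
which proves the case $r = p$ of the theorem.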
Theorem 6.4.1 allows us to quantify the rate at which tree approximations converge to $f$ in Besov norm as follows.

Corollary 6.4.2. For $1 \le p < \infty$ and $0 < \lambda < p$ one has
$$\tau_N(f)_{L^p} \le C\, \|f\|_{B_\lambda(L^p)}\, N^{1/p - 1/\lambda}$$
while if $s > \min(0, 1/r - 1/p)$ then, for $0 < q \le \infty$ and $f \in b^{s,q}_r$, one has
$$\tau_N(f)_{L^p} \le C\, \|f\|_{b^{s,q}_r}\, N^{1/p - 1/\lambda - s} .$$
Proof. For the first inequality one has, by definition, $\eta^\lambda\, \#T(f, \eta) \le \|f\|_{B_\lambda(L^p)}^\lambda$ so, choosing $N$ so that $N \eta^\lambda = \|f\|_{B_\lambda(L^p)}^\lambda$ the approximant $T(f, \eta)$ will belong to $\Sigma_N^{tr}$ and the theorem then implies that
$$\tau_N(f)_{L^p} \le \|f - T(f, \eta)\|_{L^p} \le C\, \|f\|_{B_\lambda(L^p)}^{\lambda/p}\, \eta^{1 - \lambda/p} = C\, \|f\|_{B_\lambda(L^p)}\, \|f\|_{B_\lambda(L^p)}^{\lambda/p - 1}\, \eta^{1 - \lambda/p} = C\, \|f\|_{B_\lambda(L^p)}\, N^{1/p - 1/\lambda} .$$
The embedding (6.17) provides the second inequality.
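The passage from the threshold set $\Lambda(f, \eta)$ to its minimal spanning tree $T(f, \eta)$ can be sketched in a few lines. This is only an illustration under assumed conventions: dyadic intervals $I(j, m) = [m 2^{-j}, (m+1) 2^{-j})$ are stored as pairs `(j, m)`, the coefficient table is invented, and `parent` and `spanning_tree` are hypothetical helper names.

```python
def parent(node):
    """Parent of the dyadic interval I(j, m), assuming j >= 1."""
    j, m = node
    return (j - 1, m // 2)


def spanning_tree(coeffs, eta):
    """Return (Lambda, T): the set of nodes with |c_I| >= eta, and the smallest
    family containing it that is closed under taking parents up to [0, 1)."""
    big = {node for node, c in coeffs.items() if abs(c) >= eta}
    tree = set()
    for node in big:
        # Walk towards the root, stopping once an ancestor is already in the tree.
        while node not in tree:
            tree.add(node)
            if node[0] == 0:
                break
            node = parent(node)
    return big, tree


# Invented coefficient table {(j, m): <f, psi_{I,p'}>} for illustration only.
coeffs = {(0, 0): 1.0, (1, 1): 0.3, (2, 0): 0.05, (3, 5): 0.6}
big, tree = spanning_tree(coeffs, eta=0.5)
print(sorted(big))    # nodes above the threshold
print(sorted(tree))   # the threshold set together with all ancestors
```

The tree may contain nodes whose coefficients are small or zero; this overhead is what the comparison $\#\Lambda(f, \eta) \le \#T(f, \eta)$ measures and what the space $B_\lambda(L^p)$ controls.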
6.5 Wavelets and coding

The optimality of unconditional bases among all possible bases is one aspect of the grand challenge mentioned above. Facet (3) of the grand challenge addresses efficient computation. Beyond the fast wavelet transform, the dyadic structure of wavelets plays a decisive role when facet (3) is framed in terms of Kolmogorov’s entropy coding theory.

6.5.1 Kolmogorov entropy and coding

A fundamental result of approximation theory says that the Kolmogorov $\varepsilon$-entropy of a Besov space grows like a power of $1/\varepsilon$. The goal of entropy source coding is to optimize the tradeoff between precision and efficiency of representations of a class of signals taken from a compact subset $K$ of a metric space $(X, d)$. The $\varepsilon$-covering number $N(K, \varepsilon)$ of $K$ is the minimum number of balls of radius $\varepsilon$ required to cover $K$. The Kolmogorov $\varepsilon$-entropy of $K$ is $H(K, \varepsilon) = \log_2 N(K, \varepsilon)$. This abstract concept can be concretized in terms of approximate encoding and decoding of elements of $K$ by bitstreams. A (deterministic) encoder is a mapping $E : f \in K \mapsto E(f) = (f_1, f_2, \dots) \in \mathcal{B}$, the set of all terminating sequences of zeros and ones, with $L(f, E)$ denoting the number of binary digits of $E(f)$ (before termination). A decoder $D : \mathcal{B} \to X$ is a right inverse of $E$ but typically not a left inverse: $E$ rarely provides a faithful representation of $f$. One defines the distortion of the pair $(D, E)$ as
$$\operatorname{dist}(f, E, D) = d(f, DEf), \qquad \operatorname{dist}(K, E, D) = \sup_{f \in K} \operatorname{dist}(f, E, D).$$
The bitlength of $K$ is
$$L(K, E) = \max_{f \in K} L(f, E).$$
An efficient encoder will optimize the tradeoff between the length of bitstreams and the distortion of an encoding pair. The capacity to be encoded efficiently is a property intrinsic to $K$. One sets
$$\operatorname{dist}_n(K) = \inf_{(E, D)} \{\operatorname{dist}(K, E, D) : L(K, E) \le n\} .$$
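A toy instance may help fix these definitions. The sketch below takes $K = [0, 1]$ with $d(x, y) = |x - y|$, a choice made purely for illustration, covers $K$ by closed balls of radius $\varepsilon$, and uses the binary digits of the ball index as the bitstream. The helper name `make_codec` is hypothetical.

```python
import math


def make_codec(eps):
    """Encoder/decoder pair for K = [0, 1] with d(x, y) = |x - y|,
    built from a cover of K by closed balls of radius eps."""
    n_balls = max(1, math.ceil(1.0 / (2.0 * eps)))        # covering number of [0, 1]
    n_bits = max(1, math.ceil(math.log2(n_balls)))        # bitlength L(K, E)
    # Ball j is centred at (2j + 1) eps, clamped so that centres stay in K.
    centers = [min(1.0, (2 * j + 1) * eps) for j in range(n_balls)]

    def encode(x):
        j = min(n_balls - 1, int(x / (2.0 * eps)))        # index of a ball containing x
        return format(j, "0{}b".format(n_bits))           # fixed-width binary digits of j

    def decode(bits):
        return centers[int(bits, 2)]

    return encode, decode, n_bits


encode, decode, n_bits = make_codec(0.125)
x = 0.4
bits = encode(x)
print(bits, decode(bits), abs(decode(bits) - x) <= 0.125)
```

Here the decoder returns the centre of the ball selected by the encoder, so the distortion is at most $\varepsilon$ while the bitlength is the logarithm of the number of balls, rounded up.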
Determining the optimal rate-distortion tradeoff amounts to finding the Kolmogorov entropy. To see why, fix $\varepsilon > 0$ and find a covering of $K$ by balls $B(f_j, \varepsilon)$, $j = 1, 2, \dots, N(K, \varepsilon)$. For each $f \in K$ select $j$ such that $f \in B(f_j, \varepsilon)$ and let $E(f) = E(f_j)$ be the sequence of binary digits of $j$. The decoder maps the binary sequence of $j$ into $f_j$. Thus the encoding pair remembers $f$ only up to a distortion by an amount $\varepsilon$. The number of digits needed thus to describe any $f \in K$ to within an error $\varepsilon$ is at most $\log_2 N(K, \varepsilon)$. Conversely, given any encoding pair $E$, $D$ with distortion at most $\varepsilon$ the balls centered at the points $DEf$, $f \in K$ form a covering of $K$ and so $H(K, \varepsilon) \le L(K, E)$.

Constructing optimal encoders is, at least formally, the same as computing Kolmogorov entropy. However, determining an $\varepsilon$ cover of $K$ and assigning bitstreams to particular elements of $K$ can be a nontrivial matter. This is where harmonic analysis and (concrete) approximation theory become important in addressing facet (3) of the grand challenge. Suppose, for example, that a class of data of interest can be modelled by membership in a compact subset of a Besov space. One can use the FWT to compute coefficients and sort them easily in decreasing order of magnitude. This alone does not give a best approximant of $f$ but, by Donoho’s heuristic, Theorem 6.2.1, it gives a good one for Besov norms. However, a practical encoder requires still more, namely quantization and positions of the coefficients. The former is not so tricky as the latter, and this is where the dyadic tree organization of wavelets becomes critical. In what follows we review the wavelet based coding scheme due to Cohen et al. [81] that provides a concrete, optimal estimate for the Kolmogorov entropy of Besov balls.

6.5.2 Encoding

We continue to work in one variable noting, however, that coding of two-dimensional images is a primary application of these ideas. It will be assumed
throughout that $f$ is a superposition of wavelets $\psi_{I,p}$, $I \in \mathcal{D}[0, 1]$ and $p < \infty$. The case $p = \infty$ and expansions in all of $\mathbb{R}$ is found in [81]. Let $T(f, 1/2^k)$ be the minimal tree spanning those interval nodes $I$ such that $|\langle f, \psi_{I,p'}\rangle| > 2^{-k}$. Define the layers $L_0 = T_0$ and $L_k(f) = T(f, 1/2^k) \setminus T(f, 2/2^k)$, $k = 1, 2, \dots$ (not to be confused with the dyadic levels $T \cap \mathcal{D}_k$) and the layer projections $L_k f = \sum_{I \in L_k} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p}$ which give rise to the telescoping sum $f = \sum_{k=0}^{\infty} L_k f$. In what follows we will consider a method for encoding (and decoding) functions in $L^p(\mathbb{R})$ based on their tree decompositions up to layer $N$. The encoding is progressive, meaning that if $E_{N-1}(f) = (f_1, \dots, f_{\beta_{N-1}})$ then one only need compute the bits $f_{\beta_{N-1}+1}, \dots, f_{\beta_N}$ to obtain $E_N(f) = (f_1, \dots, f_{\beta_{N-1}}, \dots, f_{\beta_N})$. Several ingredients are involved in passing from $E_{N-1}$ to $E_N$, namely:

$P_N(f)$ : positions of intervals at the $N$th layer
$S_N(f)$ : signs of the coefficients $\langle f, \psi_{I,p}\rangle$, $I \in L_N$
$B_{k,N-k}(f)$ : $(N-k)$-th magnitude bits of $\langle f, \psi_{I,p}\rangle$, $I \in L_k$, $0 \le k \le N$.

To keep track of magnitude bits one normalizes in terms of $M(f)$, the size of the largest coefficient of $f$. In summary, starting from the left, the bitstream $E_N(f)$ takes the form
$$M(f);\ P_0(f), S_0(f), B_{0,0}(f);\ P_1(f), S_1(f), B_{0,1}(f), B_{1,0}(f);\ \dots;\ P_N(f), S_N(f), B_{0,N}(f), B_{1,N-1}(f), \dots, B_{N,0}(f). \tag{6.19}$$
Here are the particulars of the individual bitstreams.

The normalization bitstream $M(f)$. One sets $M(f) = m_0 \cdots m_{Q+1}$ where $Q = |\log_2 \max_I |\langle f, \psi_{I,p}\rangle||$. Here $m_0 = 1$ if $\max_I |\langle f, \psi_{I,p}\rangle| \ge 1$ and $m_0 = 0$ otherwise. One sets $m_k = 1$ if $1 \le k \le Q$ and $m_{Q+1} = 0$. Thus the string encodes the magnitude of the largest coefficient of $f$.

The position bitstreams $P_k(f)$. $\mathcal{D}[0, 1]$ is lexicographically ordered: $I(j, m)$ precedes $I(j', m')$ if $j < j'$ or if $j = j'$ and $m < m'$. This ordering induces a code for the positions of interval nodes in a dyadic tree as follows.

Proposition 6.5.1. (i) The positions of the interval nodes in a tree $T$ with top $[0, 1)$ and $\#T$ nodes indexed by $\mathcal{D}[0, 1]$ can be encoded in the lexicographical order by a bitstream having at most $1 + 2\#T$ bits. (ii) Given the positions of the nodes of a subtree $T'$ of $T$, the positions of the nodes in $T \setminus T'$ can be encoded lexicographically with a bitstream $P$ consisting of at most $1 + 2\#(T \setminus T')$ bits such that: (iii) If the bitstream $P$ is embedded in a larger bitstream then, whenever the position of the first bit of $P$ is known, the position of the last bit of $P$ can be determined.
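Part (i) of the proposition can be checked on small examples with a level-by-level encoder: one bit for the root, then two bits per tree node recording which of its children belong to the tree. The sketch below assumes trees are given as sets of pairs `(j, m)` standing for $I(j, m)$ and containing the root `(0, 0)`; `encode_tree` and `decode_tree` are hypothetical names, and the code is not claimed to be the implementation used in [81].

```python
def encode_tree(tree):
    """Position bitstream of a dyadic tree given as a set of (j, m) nodes
    that contains the root (0, 0) and is closed under taking parents."""
    bits = [1]                               # the root [0, 1) is present
    level = [(0, 0)]
    while level:
        next_level = []
        for (j, m) in level:                 # nodes of the current level, increasing m
            for child in ((j + 1, 2 * m), (j + 1, 2 * m + 1)):
                present = child in tree
                bits.append(1 if present else 0)
                if present:
                    next_level.append(child)
        level = next_level                   # an all-zero level ends the stream
    return bits


def decode_tree(bits):
    """Inverse of encode_tree; also returns the index just past the last bit used,
    so the end of the stream is known even if other bits follow."""
    tree = {(0, 0)}
    level = [(0, 0)]
    pos = 1
    while level:
        next_level = []
        for (j, m) in level:
            for child in ((j + 1, 2 * m), (j + 1, 2 * m + 1)):
                if bits[pos]:
                    tree.add(child)
                    next_level.append(child)
                pos += 1
        level = next_level
    return tree, pos


tree = {(0, 0), (1, 1), (2, 2), (3, 5)}
bits = encode_tree(tree)
print(bits, len(bits) == 1 + 2 * len(tree))
print(decode_tree(bits + [1, 1, 0])[0] == tree)   # trailing bits are ignored
```

Every node of the tree contributes exactly two child bits, which accounts for the count $1 + 2\#T$, and the decoder stops by itself once a whole level has no children, which is the self-delimiting property in part (iii).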
Proof. First, since $[0, 1) \in T$ one sends the bitstream 1. Next, one identifies those nodes of $T$ in $\mathcal{D}_1$. Each such node is a child of $[0, 1)$. Thus one assigns a zero to each such interval not in $T$ and a one to each interval in $T$. One arranges this pair of bits according to the natural ordering $I(1, 0) = [0, 1/2)$ and $I(1, 1) = [1/2, 1)$. This bitstream requires two bits. For example, 01 means that $[0, 1/2) \notin T$ while $[1/2, 1) \in T$. To identify the nodes in $\mathcal{D}_2 \cap T$ one needs $2\#(\mathcal{D}_1 \cap T)$ bits to identify which children of nodes in $\mathcal{D}_1 \cap T$ belong to $T$. In general one needs $2\#(\mathcal{D}_j \cap T)$ bits to identify which children of nodes in $\mathcal{D}_j \cap T$ belong to $T$. Termination of $T$ at level $J$ is identified by the appearance of $2\#(\mathcal{D}_J \cap T)$ consecutive zeros, signifying that no children of nodes in $\mathcal{D}_J \cap T$ belong to $T$. The argument is the same for (ii): instead of assigning a bit of 1 if a child is in $T$, one assigns a 1 when a child is in $T \setminus T'$. Finally, since the encoding keeps track of dyadic level $j$, a string of $2\#(\mathcal{D}_j \cap (T \setminus T'))$ zeros indicates termination of the bitstream.

The bitstream $P_k(f)$ now identifies the positions of the intervals in the $k$th layer $L_k$ of $T$. Several of these trees can be empty, depending on $M(f)$.

The sign bitstream $S_k(f)$. This bitstream encodes the signs of the coefficients at the $k$th layer: a zero bit is assigned if $\langle f, \psi_{I,p}\rangle \ge 0$ and a one-bit is assigned otherwise. The length of $S_k$, that is, the number of coefficients at layer $k$, is determined by $P_k$. The signs are assigned in lexicographical order on $L_k$.

Finally, one comes to the coefficients. Let $|\langle f, \psi_{I,p}\rangle| = \sum_{\nu=-\infty}^{\infty} b_\nu(f, I)\, 2^{-\nu}$.

The bitstream $B_{k,n-k}(f)$. If $I \in L_n(f)$ then $|\langle f, \psi_{I,p'}\rangle| \le 2^{1-n}$. Thus, $b_\nu(f, I) = 0$ if $\nu < n$. For such intervals, adding a single bit reduces error in quantizing $|\langle f, \psi_{I,p}\rangle|$ to $1/2^n$. One should update the coefficients at each layer $k < n$ as well. This requires adding a single bit $b_n(f, I)$ for each $I \in L_k$. Thus, for each $0 \le k \le n$ the bitstream $B_{k,n-k}$ consists of the bits $b_n(f, I)$, ordered lexicographically over those $I \in L_k$.

One final remark is in order. If each coefficient happens to be small, it could happen that the normalization bitstream already exceeds the allotted number of bits for the encoder. Then the decoded approximation will contain only one wavelet term, but the remaining terms of the original function will all be small in comparison.

6.5.3 Decoding

Recall that $E_N$ was defined to store the first $\beta_N$ bits of $f$. In our scenario, $\beta_N$ is meant to be chosen in such a way that if $f$ comes from a compact metric space such as the ball of some fixed size in a Besov space (to which best tree-based approximants will converge at a uniform rate) then $\beta_N$ is large enough to store all of the normalized, quantized best-tree wavelet coefficients of $f$ up to layer $N$. In this case the wavelet coefficients of $D_N E_N f$ satisfy
$$|\langle f, \psi_{I,p}\rangle - \langle D_N E_N f, \psi_{I,p}\rangle| \le 2^{-N} . \tag{6.20}$$
$E_N$ retains the quantized coefficients up to $T(f, 2^{-N})$ while, if $I \notin T(f, 2^{-N})$, then one already has $|\langle f, \psi_{I,p'}\rangle| < 2^{-N}$. The goal is to show that this encoder is efficient: it provides asymptotic minimax optimization of rate-distortion tradeoff in certain Besov norms.

6.5.4 Performance in Besov balls

Theorem 6.5.2. Let $1 \le p < \infty$ and $0 < \lambda < p$ and let $U$ denote the unit ball in $B_\lambda(L^p)$ as defined in (6.14). Then the maximum length $L(U, E_N, D_N)$ of those bitstreams used to encode elements of $U$ with an optimal encoder satisfies
$$L(U, E_N, D_N) \le C\, 2^{\lambda N} \tag{6.21}$$
while the distortion $\operatorname{dist}(U, E_N, D_N)$ of the pair $(E_N, D_N)$ above satisfies
$$\operatorname{dist}(U, E_N, D_N) \le C\, 2^{-N(1 - \lambda/p)} \tag{6.22}$$
with constants depending only on $p$ and $\lambda$. See [81] for improvements of these estimates in certain cases.

Proof. Recall that $B_\lambda(L^p) = \{f \in L^p : \#T(f, \eta) \le c\eta^{-\lambda}\}$. If $f \in U$ then none of its wavelet coefficients will exceed unit magnitude. If $Q \ge N$ then $E_N(f)$ will consist of $N + 2$ bits; hence the number $n_M$ of bits needed to encode $M(f)$ is $n_M \le N + 2$. By the definition of $U$ and the quasinorm on $B_\lambda(L^p)$, one automatically has $\#T(f, 2^{-N}) \le 2^{N\lambda}$. Therefore the total number of bits $n_P$ in all of the bitstreams $P_k(f)$ satisfies $n_P \le 2 + 2\,\#T(f, 2^{-N}) \le 2 + 2^{1 + \lambda N}$. Since there are at most 2 bits allotted to each sign, the total number $n_S$ of bits allotted to the bitstreams $S_k$ is at most $2^{1 + \lambda N}$. To estimate the number $n_B$ of bits allotted to encode the wavelet coefficient moduli, one counts $N + 1 - k$ bits for each node in $L_k$. But $\#L_k \le \#T(f, 2^{-k}) \le 2^{k\lambda}$. Thus
$$n_B \le \sum_{k=0}^{N} (N + 1 - k)\, 2^{k\lambda} \le C\, 2^{\lambda N}$$
where $C$ depends only on $\lambda$. Therefore $L(U, E_N, D_N) = n_M + n_P + n_S + n_B \le C\, 2^{N\lambda}$ with $C$ depending only on $\lambda$. This proves (6.21). By (6.20) together with the first Temlyakov inequality (6.7), one has
$$\big\| D_N E_N f - T(f, 2^{-N}) \big\|_{L^p} \le C\, 2^{-N} \big(\#T(f, 2^{-N})\big)^{1/p} \le C\, 2^{-N(1 - \lambda/p)} .$$
Together with the bound (6.16),
$$\big\| f - T(f, 2^{-N}) \big\|_{L^p} \le C\, 2^{-N(1 - \lambda/p)},$$
one achieves (6.22). This completes the proof of Theorem 6.5.2.

We finish this section with an estimate for Kolmogorov entropy of the unit ball of the Besov space $b^{s,q}_p$.

Lemma 6.5.3. The unit ball of $b^{s,q}_r$ is a compact subset of $L^p$ when $\delta = s - (1/\mu - 1/p) > 0$ and $\mu = \min(r, p)$.

To prove the lemma, for each $j = 0, 1, \dots$ one has
$$\begin{aligned}
2^{j\delta}\, \Big\| \sum_{I \in \mathcal{D}_j} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p} \Big\|_{L^p} &\sim 2^{j\delta}\, \big\| \{\langle f, \psi_{I,p'}\rangle\}_{I \in \mathcal{D}_j} \big\|_{\ell^p} \\
&\le 2^{j\delta}\, \big\| \{\langle f, \psi_{I,p'}\rangle\}_{I \in \mathcal{D}_j} \big\|_{\ell^\mu} = 2^{j\delta}\, \big\| \{2^{(1/\mu - 1/p)j} \langle f, \psi_{I,\mu'}\rangle\}_{I \in \mathcal{D}_j} \big\|_{\ell^\mu} \\
&= 2^{js}\, \big\| \{\langle f, \psi_{I,\mu'}\rangle\}_{I \in \mathcal{D}_j} \big\|_{\ell^\mu} \le \|f\|_{b^{s,\infty}_\mu} \le C\, \|f\|_{b^{s,q}_r} .
\end{aligned}$$
Therefore,
$$\Big\| f - \sum_{j=0}^{J} \sum_{I \in \mathcal{D}_j} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p} \Big\|_{L^p} \le \sum_{j=J+1}^{\infty} \Big\| \sum_{I \in \mathcal{D}_j} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p} \Big\|_{L^p} \le C\, 2^{-J\delta}\, \|f\|_{b^{s,q}_r} .$$
Since there are only finitely many terms at each scale it follows that the unit ball of $b^{s,q}_r$ is a compact subset of $L^p$. This proves the lemma.

Corollary 6.5.4. Let $1 \le p < \infty$, $0 < q \le \infty$ and let $s > 1/p_0 - 1/p$. If $U$ is the unit ball in $B^{s,q}_{p_0}$, then the Kolmogorov entropy satisfies $H_\varepsilon(U) \le C\, \varepsilon^{-1/s}$, where $C$ depends only on $p$ and $s$.

As in the case of Donoho’s $p^*$ in (6.4), the Kolmogorov rate depends, asymptotically, only on the smoothness $s$.

Proof. This follows from the fact that the Kolmogorov entropy is controlled by an optimal encoder. Fix $\lambda = p/(sp + 1)$ as in (6.17). Let $N$ be the smallest integer such that $C 2^{-N\lambda s} < \varepsilon$. Since $p - \lambda = \lambda s p$, $\operatorname{dist}(U, E_N, D_N) \le C 2^{-N\lambda s} \le \varepsilon$. By (6.21) this implies that the maximum length of the bitstreams used to encode $U$ does not exceed $C\varepsilon^{-1/s}$ and one obtains $H_\varepsilon(U) \le L(U, E_N, D_N) \le C\, \varepsilon^{-1/s}$.
6.6 Boundedness and compression of operators

6.6.1 Schur’s lemma

As in its original form, the applications of Schur’s lemma that concern us will involve the concrete setting of matrices acting on coefficient sequences, so we will state the basic lemma accordingly.

Lemma 6.6.1. (Schur’s lemma) Suppose that $M(\alpha, \beta) : A \times B \to \mathbb{C}$ satisfies
$$C_\beta = \sup_\beta \sum_\alpha |M(\alpha, \beta)|\, \mu(\alpha) < \infty; \qquad C_\alpha = \sup_\alpha \sum_\beta |M(\alpha, \beta)|\, \nu(\beta) < \infty .$$
Then multiplication by the matrix $M$ defines a bounded operator from $\ell^p_\nu$ to $\ell^p_\mu$, $1 \le p \le \infty$.

The standard proof follows directly from Hölder’s inequality and interchange of order of integration. In applications, one often associates to an operator $T$ on a function space $X$ its matrix with respect to a suitable basis $\mathcal{B}$, then uses Schur’s lemma to say something about boundedness of $T$ when $\mathcal{B}$ furnishes an isomorphism of $X$ with a suitable coefficient space.

6.6.2 Schur’s lemma and wavelet matrices

It will be simpler to work with the homogeneous Besov spaces here. In the special case $q = p$ the homogeneous version of the wavelet Besov norm in (6.2) is the $p$th root of
$$\|f\|_{\dot{B}^{s,p}_p(\mathbb{R})}^p = \sum_{I \in \mathcal{D}} \Big(\frac{|\langle f, \psi_{I,p'}\rangle|}{|I|^s}\Big)^p = \sum_{I \in \mathcal{D}} |\langle f, \psi_I\rangle|^p\, |I|^{1 - ps - p/2} . \tag{6.23}$$
Thus, when $p = q$ the wavelet-Besov norm is simply the $\ell^p$-norm with respect to the discrete measure $d\nu(I) = |I|^{1 - ps - p/2}$ in which the wavelet coefficients are $L^2$-normalized. At the level of coefficients, then, a sequence $\{c_I\}$ is the set of wavelet coefficients of an element of $\dot{B}^{s,p}_p$ precisely when $\{|I|^{-\gamma} c_I\} \in \ell^p$ where $\gamma = s + 1/2 - 1/p$. By the wavelet matrix of $T$ we mean the matrix
$$W_T(I, J) = \langle T\psi_I, \psi_J\rangle, \qquad (I, J \in \mathcal{D})$$
which satisfies $Tf = g$ when
$$\langle g, \psi_J\rangle = \sum_{I \in \mathcal{D}} \langle T\psi_I, \psi_J\rangle\, \langle f, \psi_I\rangle = \sum_{I \in \mathcal{D}} W_T(I, J)\, \langle f, \psi_I\rangle, \qquad J \in \mathcal{D} .$$
Thus, in wavelet coordinates $T$ is represented as multiplication of $\{\langle f, \psi_I\rangle\}$ by the matrix $W_T(I, J)$ to produce $\{\langle Tf, \psi_J\rangle\}$.
Then $T$ maps $\dot B^{s,p}_p$ to itself continuously provided the discrete operator $M_{-\gamma} W_T M_\gamma$ maps $\ell^p(\mathcal{D})$ continuously to itself, where $M_\gamma(c_I) = |I|^\gamma c_I$. Multiplication of a coefficient sequence $\{c_I\}$ by $M_{-\gamma} W_T M_\gamma$ is simply multiplication by the matrix
\[ W_T^\gamma(I,J) = \left( \frac{|I|}{|J|} \right)^{\gamma} W_T(I,J). \tag{6.24} \]
From Schur’s lemma then (see [147], Section 10) one has the following.

Corollary 6.6.2.
\[ \|T\|_{\dot B^{s,p}_p \to \dot B^{s,p}_p} \le \sup_I \sum_J |W_T(I,J)| \left( \frac{|I|}{|J|} \right)^{\gamma} + \sup_J \sum_I |W_T(I,J)| \left( \frac{|I|}{|J|} \right)^{\gamma}. \]
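As a numerical illustration of Schur’s criterion, the following minimal sketch checks the classical unweighted $\ell^2$ form of the test, namely that the operator norm of a finite matrix is at most $\sqrt{RC}$, where $R$ and $C$ are the largest absolute row and column sums. The function names, the sample matrix with off-diagonal decay, and the use of power iteration are assumptions made for illustration; they are not taken from the text.

```python
import math

def schur_bound(M):
    """Return (R, C, sqrt(R*C)) for a finite matrix M given as a list of rows.

    R and C are the largest absolute row and column sums; sqrt(R*C) is the
    classical unweighted Schur bound for the l^2 -> l^2 operator norm.
    """
    rows, cols = len(M), len(M[0])
    R = max(sum(abs(x) for x in row) for row in M)
    C = max(sum(abs(M[i][j]) for i in range(rows)) for j in range(cols))
    return R, C, math.sqrt(R * C)

def l2_norm(M, iters=200):
    """Estimate the l^2 -> l^2 operator norm by power iteration on M^T M."""
    rows, cols = len(M), len(M[0])
    v = [1.0 / math.sqrt(cols)] * cols              # unit starting vector
    sigma_sq = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # w = M v
        u = [sum(M[i][j] * w[i] for i in range(rows)) for j in range(cols)]  # u = M^T M v
        sigma_sq = math.sqrt(sum(x * x for x in u))  # ||M^T M v||
        if sigma_sq == 0.0:
            return 0.0
        v = [x / sigma_sq for x in u]                # renormalize
    return math.sqrt(sigma_sq)

# Hypothetical test matrix with polynomial off-diagonal decay.
M = [[1.0 / (1 + abs(i - j)) ** 2 for j in range(8)] for i in range(8)]
R, C, bound = schur_bound(M)
print(l2_norm(M), "<=", bound, "<=", R + C)
```

Since $\sqrt{RC} \le R + C$, the sum of the two suprema appearing in Corollary 6.6.2 also dominates the $\ell^2$ norm in this unweighted setting.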
6.6.3 Wavelet compression of operators

A basis that provides good nonlinear approximation for a given function space should also provide effective compression of operators that act continuously on that space. One defines the approximation numbers of an operator $T$ in the basis $\mathcal{B} = \{e_\alpha\}$ on the (quasi)-normed space $B$ by
\[ \sigma_N(T, B, \mathcal{B}) = \inf_{S_N} \|T - S_N\|_{B \to B}; \qquad S_N = \sum_{k=1}^{N} \langle T e_{\alpha_k}, e_{\beta_k}\rangle \, e_{\beta_k} \otimes e_{\alpha_k}. \]
When $B = \dot B^{s,p}_p$ and $\mathcal{B}$ is an orthonormal wavelet basis of regularity exceeding $s$, $\sigma_N(T, \dot B^{s,p}_p)$ measures the ability to approximate $T$ as an operator on $\dot B^{s,p}_p$ by rank $N$ wavelet-to-wavelet projections. In view of preceding observations, viz. (6.24), one also has
\[ \sigma_N(T, \dot B^{s,p}_p) = \inf_{B^{(N)}} \left\| W_T^\gamma - B^{(N)} \right\|_{\ell^p \to \ell^p}; \qquad B^{(N)} = \left\{ \left( \frac{|I_k|}{|J_k|} \right)^{\gamma} \langle T\psi_{I_k}, \psi_{J_k}\rangle \right\}_{k=1}^{N}. \]
To simplify matters, we will specialize now to the case $p = 1$, so that $\gamma = s - 1/2$. Then Schur’s criterion stipulates that
\[ \|W_T^\gamma\|_{\ell^\infty(\ell^1)} + \left\| (W_T^\gamma)^* \right\|_{\ell^\infty(\ell^1)} < \infty \qquad\text{where}\qquad \|W_T^\gamma\|_{\ell^\infty(\ell^1)} = \sup_I \sum_J |W_T^\gamma(I,J)|. \]
Abbreviate $W_T^\gamma$ now simply to $W$. For fixed $I$, let $W_I^*(k)$ be the nonincreasing rearrangement of $W(I,\cdot)$ in the parameter $J$. Set $\omega(I,k) = \sum_{l \ge k} W_I^*(l)$. Then, trivially, $\|W\|_{\ell^\infty(\ell^1)} = \|\{\omega(I,k)\}\|_{\ell^\infty(\mathcal{D} \times \mathbb{N})}$ defines an embedding of $\ell^\infty(\ell^1)(\mathcal{D} \times \mathcal{D})$ into $\ell^\infty(\mathcal{D} \times \mathbb{N})$. Taking $\omega^*(n)$ to be the nonincreasing rearrangement of $\omega(I,k)$, one has
\[ \inf_{W^{(N)}} \left\| W - W^{(N)} \right\|_{\ell^\infty(\ell^1)} = \omega^*(N). \]
Wavelet compressibility of T can then be expressed as follows [204].
Theorem 6.6.3. Let $\alpha > 0$ and $\gamma = s - 1/2$. Then
\[ \sum_N \left( \sigma_N(T, \dot B^{s,1}_1) \right)^\alpha \approx \sum_I \sum_{k=1}^{\infty} \left( k\, W_I^*(k) \right)^\alpha \]
where $W = W_T^\gamma = \{ \langle T\psi_I, \psi_J\rangle\, (|I|/|J|)^\gamma \}$ denotes the normalized wavelet coefficient matrix of $T$.

Proof. By the observations above, one has $\sup_I \sum_J |\langle T\psi_I, \psi_J\rangle| (|I|/|J|)^\gamma = \|W\|_{\ell^\infty(\ell^1)}$ so that $\sigma_N(T, \dot B^{s,1}_1) = \omega^*(N)$. Strictly speaking, $\sigma_N(T, \dot B^{s,1}_1)$ depends in the same way on decay of approximations of the adjoint of $W$. We take such decay for granted here. Thus,
\[ \sum_N \left( \sigma_N(T, \dot B^{s,1}_1) \right)^\alpha = \sum_N (\omega^*(N))^\alpha = \|\omega^*\|^\alpha_{\ell^\alpha(\mathbb{N})}. \]
But, by the definition of $\omega^*$ and an application of Hardy’s inequality (i.e., the $\ell^p$ norm, $p > 1$, of the averages of a nonincreasing sequence is equivalent to the $\ell^p$ norm of the sequence) this last expression is equivalent to the mixed Lorentz norm $\sum_I \|\{W_I^*\}\|^\alpha_{\ell^{(1+\alpha)/\alpha,\alpha}}$, which is precisely the right-hand side of the expression.
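The rearrangement bookkeeping used above can be made concrete for a finite matrix. The sketch below computes the row-wise tail sums $\omega(I,k)$, ranks them, and reads off the smallest $\ell^\infty(\ell^1)$ error attainable by retaining $n$ entries of $W$. The function names and the sample matrix are hypothetical. With 0-based indexing, `ranked[n]` is the $(n+1)$-st largest tail sum; matching this with $\omega^*(N)$ in the text depends on the indexing convention for $\omega^*$, which is an assumption here.

```python
def tail_sums(row):
    """Tail sums omega(I, k) = sum_{l >= k} W_I^*(l) of one matrix row.

    W_I^* is the nonincreasing rearrangement of the absolute values of the
    row; the returned list has tails[k - 1] = omega(I, k).
    """
    mags = sorted((abs(x) for x in row), reverse=True)
    tails, acc = [], 0
    for m in reversed(mags):      # accumulate from the smallest entry upward
        acc += m
        tails.append(acc)
    tails.reverse()
    return tails

def omega_star(W):
    """Nonincreasing rearrangement of all tail sums omega(I, k)."""
    return sorted((t for row in W for t in tail_sums(row)), reverse=True)

def best_error(W, n):
    """Smallest sup-over-rows l^1 error after retaining n entries of W."""
    ranked = omega_star(W)
    return ranked[n] if n < len(ranked) else 0

# Hypothetical 3 x 3 example.
W = [[4, -1, 2], [3, 3, 0], [1, 0, -5]]
print([best_error(W, n) for n in range(6)])
```

Retaining an entry here means subtracting it from $W$, so that the error matrix is $W$ with those $n$ entries set to zero; the greedy choice is optimal because each row's tail sums are nonincreasing in $k$.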
6.7 Boundedness and compression of singular integrals

6.7.1 Haar wavelets and the Hilbert transform

One of the major achievements in harmonic analysis in the 1970s was Coifman, McIntosh and Meyer’s proof of $L^2$-boundedness of the Cauchy integral on Lipschitz curves (see [88]; cf. [274]). Motivated by this problem, Zygmund earlier had asked: can one prove $L^2$-boundedness of the Hilbert transform
\[ H : f \mapsto \mathrm{p.v.} \int \frac{f(t)}{s - t}\, dt \tag{6.25} \]
without using Plancherel’s theorem? Much technology intervened, but eventually Semmes [317] found an elementary proof using Haar wavelets and Schur’s lemma. We will outline the proof here to illustrate the techniques which did, ultimately, lead to a short proof of $L^2$-boundedness of the Cauchy integral on Lipschitz curves (see [87]). The Haar wavelets are $h_I(x) = |I|^{-1/2} (\chi_{I_l} - \chi_{I_r})$ where $I_l$ and $I_r$ are the left and right halves of the dyadic interval $I$. We write $h = h_{[0,1)}$. By a rescaling argument and the fact that $H^* = -H$, Schur’s criterion reduces to the following.

Lemma 6.7.1. $\sum_{I \in \mathcal{D}} |\langle Hh, h_I\rangle| \le C$.
Lemma 6.7.1 will be reduced to a few further lemmas. In what follows, let $a_I$, $b_I$ and $m_I = (a_I + b_I)/2$ denote the left endpoint, right endpoint and midpoint, respectively, of $I$.

Lemma 6.7.2. If $x \notin 2I$ then $|H(h_I)(x)| \le C |x - m_I|^{-2} |I|^{3/2}$.

Since $\int h_I = 0$, one has $H(h_I)(x) = \mathrm{p.v.} \int \left( 1/(x-t) - 1/(x - m_I) \right) h_I(t)\, dt$ and Lemma 6.7.2 follows from straightforward estimates.

Lemma 6.7.3. If $x \in 2I$ then
\[ |H(h_I)(x)| \le \frac{C_1}{\sqrt{|I|}} \ln \frac{c_2 |I|}{\min\{ |x - a_I|, |x - b_I|, |x - m_I| \}}. \]
This lemma is reduced to the cases $x \in I$ versus $x \notin I$. In the latter case one chooses the endpoint to which $x$ is closest and argues as in Lemma 6.7.2. When $x \in I$ one splits into $I_l$ and $I_r$. The integral over the half of $I$ not containing $x$ is estimated as in the case $x \notin I$. For the half containing $x$, say $x \in I_l$, one computes
\[ \mathrm{p.v.} \int_{a_I}^{m_I} \frac{dt}{x - t} = \lim_{\varepsilon \to 0} \left( -\ln(t - x) \Big|_{t = a_I}^{t = x - \varepsilon} + \int_{-\pi}^{0} \frac{i \varepsilon e^{i\theta}}{\varepsilon e^{i\theta}}\, d\theta - \ln(t - x) \Big|_{t = x + \varepsilon}^{t = m_I} \right) = \ln\left( \frac{x - a_I}{m_I - x} \right) + i\pi \]
so if, say, $x \in I_l$ and $x - a_I \le m_I - x$ then
\[ \left| \mathrm{p.v.} \int_{I_l} \frac{h_I\, dt}{x - t} \right| = \frac{1}{\sqrt{|I|}} \left| \ln \frac{x - a_I}{m_I - x} + i\pi \right| \le \frac{C_1}{\sqrt{|I|}} \ln \frac{C_2 |I|}{|x - a_I|}. \]
Similar analysis applies to the other geometric cases, thus yielding the lemma.

To prove Lemma 6.7.1 one breaks the set of all $I \in \mathcal{D}$ into disjoint sets of intervals and estimates according to these cases. Thus we say $I \in \mathcal{D}_1$ if $|I| \ge 1/8$, $I \in \mathcal{D}_2$ if $|I| < 1/8$ and $I \cap [-1, 2] = \emptyset$, and $I \in \mathcal{D}_3$ if $|I| < 1/8$ and $I \subset [-2, 3]$.

Lemma 6.7.4. $\sum_{I \in \mathcal{D}_1} |\langle Hh, h_I\rangle| \le D_1$.

To prove this, first suppose that $2I \cap [0,1] \ne \emptyset$. To apply the estimate of Lemma 6.7.3 one may as well assume that $[0,1] \subset 2I$. In that case
\[ |\langle Hh, h_I\rangle| \le \frac{C_1}{\sqrt{|I|}} \int_0^{2|I|} \ln \frac{c_2 |I|}{\min\{ |x - a_I|, |x - b_I|, |x - m_I| \}}\, dx \le \frac{C}{\sqrt{|I|}}. \]
The number of intervals in $\mathcal{D}_1 \cap \mathcal{D}_k$ such that $2I \cap [0,1] \ne \emptyset$ is bounded independent of $k$, so the sum of the corresponding terms yields a convergent geometric series. Now suppose that $2I \cap [0,1] = \emptyset$. Then by Lemma 6.7.2 one has
\[ |\langle Hh, h_I\rangle| \le C |I|^{3/2} \int_0^1 |x - m_I|^{-2}\, dx \le C |I|^{3/2} |m_I|^{-2}. \]
Summing over such intervals also in $\mathcal{D}_k$ one has a convergent $p$-series ($p = 2$) with bound $C 2^{-k/2}$. Summing over $k$ then gives a convergent geometric series. This completes the proof of Lemma 6.7.4.

Lemma 6.7.5. $\sum_{I \in \mathcal{D}_2} |\langle Hh, h_I\rangle| \le D_2$.

Just as before, Lemma 6.7.2 yields $|\langle Hh, h_I\rangle| \le C |I|^{3/2} |m_I|^{-2}$ for such intervals. The condition on $I$ implies that $m_I \approx (n + 1/2)|I|$ for some integer $n$ such that $|n| > 1/|I|$. Summing first over $n$ yields a convergent $p$-series, then summing over $|I|$ yields a convergent geometric series.

Lemma 6.7.6. $\sum_{I \in \mathcal{D}_3} |\langle Hh, h_I\rangle| \le D_3$.

This lemma requires slightly more care. First one breaks down $\mathcal{D}_3$ into those intervals close to $0$ (called $\mathcal{D}_{3,0}$), those close to $1/2$, and those close to $1$. We will consider only $\mathcal{D}_{3,0}$; the arguments for the other cases are the same. Since $h = \chi_{[0,1/2]} - \chi_{[1/2,1]}$, one just needs to prove that
\[ \sum_{I \in \mathcal{D}_{3,0}} |\langle H\chi_{[0,1/2)}, h_I\rangle| \le D_3. \]
The argument for $\sum_{I \in \mathcal{D}_{3,0}} |\langle H\chi_{[1/2,1)}, h_I\rangle|$ is contained in the previous lemmas. Now if $I \subset [-2, 0)$ and $b_I \ne 0$ then $2I \cap [0,1) = \emptyset$ and
\[ |\langle \chi_{[0,1/2)}, H h_I\rangle| \le C |I|^{3/2} |a_I|^{-1} \le C |I|^{3/2} (|a_I| + |I|)^{-1} \tag{6.26} \]
while, if $b_I = 0$ then
\[ |\langle \chi_{[0,1/2)}, H h_I\rangle| \le C \left( \int_0^{|I|} + \int_{|I|}^{1/2} \right) |H h_I|. \]
The second integral is estimated just as above while, by Lemma 6.7.3,
\[ \int_0^{|I|} |H h_I| \le C_1 |I|^{-1/2} \int_0^{|I|} \log\left( \frac{c_2 |I|}{|x|} \right) \le C |I|^{1/2} \approx C |I|^{3/2} (|a_I| + |I|)^{-1}. \]
On the other hand, suppose that $I \subset [0, 3/8)$. Since $\int H(h_I) = 0$,
\[ \langle \chi_{[0,1/2)}, H h_I\rangle = -\langle \chi_{\mathbb{R} \setminus [0,1/2)}, H h_I\rangle. \]
By the same techniques as above, one obtains (6.26) in this case as well. As in the previous cases, the lemma now follows from summing first a $p$-series then a convergent geometric series. From Lemmas 6.7.4–6.7.6, Lemma 6.7.1 and thus the $L^2$-boundedness of the Hilbert transform follow.
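The off-support decay in Lemma 6.7.2 can be checked numerically, since $H(h_I)$ has a closed form: with the normalization of (6.25), $\mathrm{p.v.}\int_c^d dt/(x-t) = \ln|x-c| - \ln|x-d|$. The sketch below is illustrative only. It assumes that $2I$ denotes the interval with the same midpoint as $I$ and twice the length, so that $x \notin 2I$ means $|x - m_I| \ge |I|$; under that assumption an elementary convexity argument gives the admissible constant $C = \ln(4/3)$, which is a computed value and not one stated in the text. Function names are hypothetical.

```python
import math

def hilbert_haar(x, a, b):
    """H(h_I)(x) for the Haar function on I = [a, b).

    Uses p.v. int_c^d dt/(x - t) = ln|x - c| - ln|x - d| on each half of I.
    Valid for x different from a, b and the midpoint of I.
    """
    m = 0.5 * (a + b)
    left = math.log(abs(x - a)) - math.log(abs(x - m))    # left half, sign +1
    right = math.log(abs(x - m)) - math.log(abs(x - b))   # right half, sign -1
    return (left - right) / math.sqrt(b - a)

def decay_bound(x, a, b, C=math.log(4.0 / 3.0)):
    """Right-hand side C |x - m_I|^{-2} |I|^{3/2} of Lemma 6.7.2."""
    m = 0.5 * (a + b)
    return C * (b - a) ** 1.5 / (x - m) ** 2

# Compare the two sides for I = [0, 1/4) at a few points outside 2I.
a, b = 0.0, 0.25
for x in (0.375, 0.5, 1.0, 10.0):
    print(x, abs(hilbert_haar(x, a, b)), decay_bound(x, a, b))
```

The comparison is tight at $|x - m_I| = |I|$ and becomes slack as $x$ moves away from $I$, consistent with the quadratic decay used in Lemmas 6.7.4 and 6.7.5.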
6.7.2 Compression of Calderón–Zygmund operators

Suitable linear operators $T$ possess nonstandard wavelet representations
\[ T = \sum_{j=-\infty}^{\infty} P_j T Q_j + \sum_{j=-\infty}^{\infty} Q_j T P_j + \sum_{j=-\infty}^{\infty} Q_j T Q_j \tag{6.27} \]
where $P_j$ and $Q_j$ are the orthogonal projections onto the multiresolution and wavelet spaces $V_j$ and $W_j$ respectively. The specific case of $T = d/dx$ was considered in Chapter 2. Here we consider the role of (6.27) in a result concerning compression of Calderón–Zygmund operators due to Qiang (see [275]). The three components of $T$ in (6.27) give rise to matrices
\[ A_T = \{\langle T\psi_{jk}, \varphi_{jl}\rangle\}_{kl} = \{\alpha(j,k,l)\}_{kl}, \quad B_T = \{\langle T\varphi_{jk}, \psi_{jl}\rangle\}_{kl} = \{\beta(j,k,l)\}_{kl}, \quad C_T = \{\langle T\psi_{jk}, \psi_{jl}\rangle\}_{kl} = \{\gamma(j,k,l)\}_{kl}. \]
The kernel $K$ of $T$ can then be expressed as $K = K_A + K_B + K_C$ where, for example,
\[ K_A(x,y) = \sum_{j,k,l} \left( \alpha(j,k,l)\, \varphi_{jl}(x)\, \psi_{jl}(x) \right) \psi_{jk}(y) \]
and $K_B$ and $K_C$ are expressed correspondingly. When $T$ is an integral operator having an odd kernel $K(s,t) = -K(t,s)$, as is the case for the Hilbert transform (6.25), there are extra relations among the matrices, namely $\beta(j,k,l) = -\alpha(j,l,k)$ and $\gamma(j,k,l) = -\gamma(j,l,k)$. Consider a family of operators (cf. [274])
\[ Tf(x) = \mathrm{p.v.} \int K(x,y)\, f(y)\, dy \]
on $\mathbb{R}$ satisfying the standard Calderón–Zygmund kernel estimates:
\[ |K(x,y)| \le C_0 |x - y|^{-1} \quad\text{and}\quad |\partial^\alpha K(x,y) - \partial^\alpha K(x',y)| \le C_\alpha |x - x'|^r\, |y - y'|^{-1-|\alpha|-r} \quad\text{whenever } |\alpha| \le 1 \text{ and } |x - x'| \le \tfrac{1}{2} |x - y| \tag{6.28} \]
where $r \in (0,1)$ and $C$ is a fixed constant. The Hilbert transform is an example of such an operator. When it holds, $L^2$-boundedness of such an operator always depends on some cancellation condition. In our case this is the condition that $K(x,y) = -K(y,x)$. Qiang’s compression result relies on the nonstandard form to define approximants $T_m$. The operator naively defined by truncating all coefficients $\alpha(j,k,l)$ etc. when $|k - l| > m$ fails the cancellation that is vital in establishing a bound for $T : L^2 \to L^2$. For odd kernels, the antisymmetry is passed to nonstandard truncations by setting
\[ \alpha_m(j,k,l) = 0 \ \text{ if } |k - l| > m, \qquad \alpha_m(j,k,l) = \alpha(j,k,l) \ \text{ if } m \ge |k - l| \ge 1, \qquad \alpha_m(j,k,k) = \alpha(j,k,k) + \sum_{|k-l| > m} \alpha(j,k,l), \]
so that
\[ \sum_l \alpha_m(j,k,l) = \sum_l \alpha(j,k,l). \]
One defines the $\beta$ and $\gamma$ truncations similarly. The corresponding banded matrices $A_m$, $B_m$, $C_m$ define approximations of $T$ that satisfy the following.

Theorem 6.7.7. There is a constant $C = C(r)$ such that for any operator $T$ with kernel $K$ satisfying (6.28) one has, for all $m \ge 2$:
\[ \|T - T_m\|_{L^2 \to L^2} \le C\, m^{-r} \sqrt{\log m}. \]
This bound is optimal and extendible to certain unbounded operators (see [275]). The proof uses a special $\ell^2$ version of Schur’s lemma (6.10.3).
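The row-sum-preserving truncation defined above is easy to state in code. The following minimal sketch works on a finite section of the matrix $\{\alpha(j,k,l)\}_{kl}$ at one fixed scale $j$; restricting to a finite square matrix, the function name, and the sample antisymmetric matrix are all assumptions made for illustration.

```python
def truncate_row_preserving(alpha, m):
    """Banded truncation alpha_m of a square matrix alpha[k][l].

    Entries with |k - l| > m are set to zero and their sum is added to the
    diagonal entry of the same row, so every row sum is unchanged.
    """
    n = len(alpha)
    out = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            if abs(k - l) <= m:
                out[k][l] = alpha[k][l]          # keep the band as is
        # move the discarded mass of row k onto the diagonal
        out[k][k] += sum(alpha[k][l] for l in range(n) if abs(k - l) > m)
    return out

# Hypothetical antisymmetric sample with 1/(k - l) decay off the diagonal.
n, m = 9, 2
alpha = [[0.0 if k == l else 1.0 / (k - l) for l in range(n)] for k in range(n)]
alpha_m = truncate_row_preserving(alpha, m)
print(max(abs(sum(alpha_m[k]) - sum(alpha[k])) for k in range(n)))
```

Preserving each row sum is the discrete counterpart of keeping the cancellation that the naive truncation destroys; the quantitative gain is the content of Theorem 6.7.7, which this sketch does not attempt to verify.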
6.8 Schur’s lemma and symbol classes

Though the appearance of wavelets in Calderón–Zygmund theory is relatively recent, the ideas that make wavelets useful in operator theory originate, in one form or another, in Littlewood–Paley theory. In many respects, it is not so critical to this theory that wavelets form bases. Indeed, the work of Frazier and Jawerth [147] that unified much of the theory of Besov and Triebel–Lizorkin spaces and enabled one to phrase boundedness questions in terms of matrix boundedness, was carried out in the context of discrete wavelet frames. Gabor tight frames and local trigonometric expansions play a parallel role in the analysis of pseudodifferential operators.

6.8.1 Pseudodifferential operators

Let $b(x)$ be a smooth, non-negative, symmetric bell function (cf. Section 4.2) compactly supported in $[-1,1]$ such that $\sum_k |b(x - k)|^2 = 1$. The corresponding Gabor functions $b_{kn} = e^{\pi i n x}\, b(x - k)$ form a tight frame for $L^2(\mathbb{R})$. To see this, one has (cf. [99], p. 84):
\[ \begin{aligned}
\sum_{n,k} |\langle f, b_{kn}\rangle|^2 &= \sum_k \sum_n \left| \int_{-1}^{1} \sum_l f(s + 2l)\, b(s + 2l - k)\, e^{-\pi i n s}\, ds \right|^2 \\
&= 2 \sum_k \int_{-1}^{1} \left| \sum_l f(s + 2l)\, b(s + 2l - k) \right|^2 ds \quad \text{by Plancherel} \\
&= 2 \sum_{k,l} \int_{-1}^{1} \left| f(s + 2l)\, b(s + 2l - k) \right|^2 ds \quad \text{by support of } b \\
&= 2 \sum_k \int |f(s)|^2\, |b(s - k)|^2\, ds = 2\, \|f\|_2^2
\end{aligned} \]
since $\sum_k |b(x - k)|^2 = 1$. Let $\sigma(X, D)$ be a pseudodifferential operator defined in terms of the Weyl correspondence (cf. [144], p. 81) by
\[ \langle \sigma(X,D) f, g\rangle = \iint \sigma(x,\xi)\, W(f,g)(x,\xi)\, dx\, d\xi \tag{6.29} \]
in which $W$ denotes the Wigner distribution (5.39). Straightforward substitution shows that
\[ W(b_{kn}, b_{lm})(x,\xi) = i^p\, e^{\pi i (m-n) x}\, e^{-2\pi i (k-l)\xi}\, W(b)\left( x - \frac{k+l}{2},\ \xi + \frac{n+m}{4} \right) \]
where $p = p(n,k,m,l) \in \mathbb{Z}$. By (6.29) the Gabor matrix coefficients of $\sigma(X,D)$ satisfy
\[ \langle \sigma(X,D)\, b_{kn}, b_{lm}\rangle = i^p \iint \sigma(x,\xi)\, e^{\pi i ((m-n)x + 2(l-k)\xi)}\, W(b)\left( x - \frac{k+l}{2},\ \xi + \frac{n+m}{4} \right) dx\, d\xi. \]
Set $L_x = I - d^2/dx^2$, which has eigenfunctions $e^{\pi i \alpha x}$ with eigenvalues $(1 + \pi^2 \alpha^2)$. Define also $L_\xi = I - d^2/d\xi^2$. The eigenfunction property plus repeated integration by parts yield, for $R = 1, 2, \ldots$:
\[ \langle \sigma(X,D)\, b_{kn}, b_{lm}\rangle = \frac{i^p}{(1 + 4\pi^2 (k-l)^2)^R\, (1 + \pi^2 (n-m)^2)^R} \iint e^{\pi i ((m-n)x + 2(l-k)\xi)}\, L_x^R L_\xi^R \left\{ \sigma(x,\xi)\, W(b)\left( x - \frac{k+l}{2},\ \xi + \frac{n+m}{4} \right) \right\} dx\, d\xi \]
so that
\[ |\langle \sigma(X,D)\, b_{kn}, b_{lm}\rangle| \le \frac{C_R}{(1 + |k-l|)^{2R}\, (1 + |m-n|)^{2R}} \sum \iint |\partial_x^\alpha \partial_\xi^\beta \sigma(x,\xi)| \left| \partial_x^\gamma \partial_\xi^\delta W(b)\left( x - \frac{k+l}{2},\ \xi + \frac{n+m}{4} \right) \right| dx\, d\xi \tag{6.30} \]
where the sum is taken over those $\alpha, \beta, \gamma, \delta$ such that $\beta + \delta \le 2R$ and $\alpha + \gamma \le 2R$. The support and smoothness conditions on $b$ imply that $W(b)$ and its partials decay faster than any polynomial. To estimate the Gabor matrix coefficients of $\sigma(X,D)$ one needs bounds on the growth of the partials of $\sigma$. The initial goal here is to obtain matrix coefficient estimates, starting from (6.30), from which boundedness of $\sigma(X,D)$ on $L^2(\mathbb{R})$ and, perhaps, on Sobolev spaces, will follow from Schur’s lemma. A further goal is to estimate decay of singular values when $\sigma(X,D)$ is compact. Rochberg and Tachizawa [311] used the scheme just outlined to satisfy these goals under moderate growth conditions on $\sigma$. We will consider their results here.
6.8.2 Symbol conditions

Fix the recursion parameter $R$ in the integration by parts above and consider the following growth condition on the symbol $\sigma$ of $\sigma(X,D)$:
\[ |\partial_\xi^\alpha \partial_x^\beta \sigma(x,\xi)| \le C_{\alpha,\beta}\, M(x,\xi), \qquad (\alpha \le 2R,\ \beta \le 2R) \tag{6.31} \]
such that $M$ satisfies the $s$-moderate growth condition
\[ M(x + y, \xi + \eta) \le C\, (1 + |y| + |\eta|)^s\, M(x,\xi) \tag{6.32} \]
for some fixed $0 < s < 2R - 3$. For $\sigma(X,D)$ defined in (6.29) one has the following (see [311], p. 174).

Theorem 6.8.1. Suppose that $\sigma(x,\xi)$ satisfies (6.31) with $M$ as in (6.32). Then there is a $C > 0$ such that for all $k, n, l, m \in \mathbb{Z}$ one has
\[ |\langle \sigma(X,D)\, b_{kn}, b_{lm}\rangle| \le C\, \frac{\min\{ M(k,n), M(l,m) \}}{(1 + |k-l|)^{2R-s}\, (1 + |m-n|)^{2R-s}}. \]
Boundedness in $L^2$-Sobolev norms depending on $R$, $s$ follows immediately from Schur’s lemma (6.10.3). In particular, if $R = [(s+1)/2] + 1$ then $\sigma(X,D)$ is $L^2$-bounded (see [311], p. 174).

Proof. It follows from (6.31) and (6.32) that
\[ |\partial_x^\alpha \partial_\xi^\beta \sigma(x,\xi)| \le C \left( 1 + \left| x - \frac{k+l}{2} \right| + \left| \xi + \frac{m+n}{4} \right| \right)^{s} M\left( \frac{k+l}{2}, \frac{m+n}{4} \right). \]
The localization properties of $b$ then yield
\[ \left| \partial_x^\gamma \partial_\xi^\delta W(b)\left( x - \frac{k+l}{2},\ \xi + \frac{m+n}{4} \right) \right| \le C \left( 1 + \left| x - \frac{k+l}{2} \right| + \left| \xi - \frac{m+n}{4} \right| \right)^{-3-s}. \]
Integrating (6.30) then gives
\[ |\langle \sigma(X,D)\, b_{kn}, b_{lm}\rangle| \le C\, \frac{M\left( \frac{k+l}{2}, \frac{m+n}{4} \right)}{(1 + |k-l|)^{2R}\, (1 + |m-n|)^{2R}} \]
and the theorem follows from (6.32).

6.8.3 Estimates for singular values and compression of compact pseudodifferential operators

The moderate growth envelope $M$ also provides leverage for estimating decay of eigenvalues of $\sigma(X,D)$ when it is compact, as it will be when $M$ decays at infinity. Let $\lambda_1 \ge \lambda_2 \ge \cdots$ be the singular values of $\sigma(X,D)$ and $\mu_i$ the nonincreasing rearrangement of the sequence $\{M(k,n)\}$. One has the following (see [311], p. 174).
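Theorem 6.8.2. If $\sigma$ satisfies (6.31) where $\lim_{|x| + |\xi| \to \infty} M(x,\xi) = 0$ then $\sigma(X,D)$ is compact and there is a $C > 0$ such that $\lambda_k \le C \mu_k$.

Proof. When $\sigma(X,D)$ is compact, $\lambda_{N+1} = \inf\{ \|\mathrm{Proj}_{V^\perp}\| : \dim V \le N \}$. In particular, $\lambda_{N+1}$ is at most the norm of the projection onto the orthogonal complement of the span $V$ of the first $N$ in any enumeration of the Gabor functions $\beta_\nu = b_{k_\nu n_\nu}$. Since
\[ \sum_i |\langle \sigma(X,D)\beta_\nu, \beta_i\rangle| \le C\, M(k_\nu, n_\nu) \sum_i \frac{M(k_i, n_i)}{(1 + |k_i - k_\nu|)^{2R}\, (1 + |n_i - n_\nu|)^{2R}} \le C \mu_{N+1} \]
whenever $\nu \ge N + 1$, for $f \in V^\perp$ the tight frame property yields
\[ \begin{aligned}
\|\sigma(X,D) f\|^2 &\le C \sum_i |\langle \sigma(X,D) f, \beta_i\rangle|^2 \\
&= C \sum_i \left| \sum_{\nu = N+1}^{\infty} \langle f, \beta_\nu\rangle \langle \sigma(X,D)\beta_\nu, \beta_i\rangle \right|^2 \\
&\le C \sum_i \left( \sum_{\nu = N+1}^{\infty} |\langle f, \beta_\nu\rangle|^2\, |\langle \sigma(X,D)\beta_\nu, \beta_i\rangle| \right) \left( \sum_{\nu = N+1}^{\infty} |\langle \sigma(X,D)\beta_\nu, \beta_i\rangle| \right) \\
&\le C \mu_{N+1} \sum_{\nu = N+1}^{\infty} |\langle f, \beta_\nu\rangle|^2 \sum_i |\langle \sigma(X,D)\beta_\nu, \beta_i\rangle| \\
&\le C \mu_{N+1}^2 \sum_{\nu = N+1}^{\infty} |\langle f, \beta_\nu\rangle|^2 \le C \mu_{N+1}^2\, \|f\|^2.
\end{aligned} \]
The eigenvalue estimate follows.

Corollary 6.8.3. If $M \in L^{p,q}(\mathbb{R}^2)$ then $\sum_{k=1}^{\infty} k^{q/p - 1} \lambda_k^q < \infty$.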
6.8.4 Exotic symbols

The Weyl correspondence does not map the symbol $\sum a_k(x)\, \xi^k$ to the operator $\sum a_k(x)\, D^k$. As such, in applications to PDE one often prefers to work with the Kohn–Nirenberg correspondence which associates the operator
\[ T_\sigma f = \int e^{2\pi i x \xi}\, \sigma(x,\xi)\, \hat f(\xi)\, d\xi \tag{6.33} \]
to the symbol $\sigma(x,\xi)$ (see [144] for more details). For the Gabor functions $b_{kn} = e^{\pi i n x}\, b(x - k)$ and $b_{lm}$ one has
\[ \langle T_\sigma b_{kn}, b_{lm}\rangle = c \iint e^{2\pi i ((l-k)\xi + (n-m)x/2 + x\xi)}\, \hat b(\xi)\, b(x)\, \sigma\left( x + l,\ \xi + \frac{n}{2} \right) d\xi\, dx \equiv c\, I(k,n,l,m). \tag{6.34} \]
Because of assumptions on $b$, it follows that the integrand in (6.34) lies in a bounded set of $\mathcal{S}(\mathbb{R} \times \mathbb{R})$. One can estimate $I(k,n,l,m)$ by repeated integration by parts just as before. As in Theorem 6.8.1, one can deduce matrix boundedness. Hence, $L^2$-boundedness of $T_\sigma$ follows once appropriate estimates on the partials of $\sigma(x,\xi)$ are obtained. Consider the condition
\[ |\partial_\xi^\alpha \partial_x^\beta \sigma(x,\xi)| \le C_{\alpha,\beta}\, (1 + |\xi|)^{\delta(|\beta| - |\alpha|)} \tag{6.35} \]
for some $\delta \in [0,1)$ and $\alpha, \beta \in \mathbb{N}$. When $\delta = 0$ there is no problem applying the same integration by parts arguments as above to conclude that $|I(k,n,l,m)| \le C (1 + |k-l| + |n-m|)^{-3}$ so that Schur’s criterion holds. Symbols satisfying (6.35) are said to be exotic. In the particular case $\delta = 1/2$ such symbols arise naturally in the analysis of parabolic differential operators as well as in several complex variables as discussed in Stein (see [328], Chapter VII), where a proof of $L^2$-boundedness of operators $T_\sigma$ having such symbols, based on Cotlar’s lemma and the standard dyadic decomposition of frequency, is also provided. Meyer [275] argues that Schur’s lemma furnishes a more efficient route to this result. However, standard Gabor functions are no longer a good set of time–frequency atoms to use, nor are wavelets ideally adapted to this problem. Nevertheless, just as it can be used to produce Lemarié–Meyer wavelets, the local trigonometric frame construction can produce a frame of wavepackets naturally adapted to the growth of the symbol. We outline Meyer’s approach in the case $\delta = 1/2$ here.

The goal is to produce wavelet-like packets from the local trigonometric frame construction, working now in the frequency domain and so using “$j$” to keep track of supports of bells and “$k$” to count local oscillations. One begins with sequences of numbers $\{\omega_j\}$ and $\{\eta_j\}$ that are increasing and nonnegative, respectively, such that $\omega_{j+1} - \omega_j \ge \eta_{j+1} + \eta_j$. Let $b_j(\xi)$ be smooth bell functions satisfying
\[ \sum_j |b_j(\xi)|^2 = 1, \qquad \operatorname{supp} b_j \subset [\omega_j - \eta_j,\ \omega_{j+1} + \eta_{j+1}], \qquad \operatorname{supp}(1 - b_j) \subset \mathbb{R} \setminus (\omega_j + \eta_j,\ \omega_{j+1} - \eta_{j+1}). \tag{6.36} \]
For $L_j = \omega_{j+1} - \omega_j + \eta_j + \eta_{j+1}$ one defines local trigonometric functions
\[ b_{jk}(\xi) = \frac{1}{\sqrt{L_j}}\, b_j(\xi)\, \exp\left( \frac{2\pi i k \xi}{L_j} \right), \qquad j, k \in \mathbb{Z}. \]
As before one can verify that $\{b_{jk}\}$ forms a tight frame. To adapt this construction to the symbols at hand one sets $\omega_j = j^2\, \mathrm{sgn}\, j$ and $\eta_j = (1 + |j|)/10$ so that $L_j = 11(2|j| + 1)/10 + O(1)$; in particular, $L_j$ is comparable to $|j|$ if $j \ne 0$. The smoothness and support properties of $b_j$ are consistent with $|(d/d\xi)^m b_j(\xi)| \le C |j|^{-m}$. An integration by parts argument as before then shows that, if $\sigma$ satisfies (6.35) with $\delta = 1/2$ then
\[ \sup_{(j',k')} \sum_{j,k} |\langle T_\sigma b_{jk}, b_{j'k'}\rangle| \le C, \qquad \sup_{(j',k')} \sum_{j,k} |\langle b_{jk}, T_\sigma b_{j'k'}\rangle| \le C'. \]
The $L^2$-boundedness of $T_\sigma$ then follows from Schur’s lemma. For other values of $\delta$ one should adapt the bells accordingly.
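To make the adapted partition concrete, the following sketch builds bells $b_j$ for the choice $\omega_j = j^2\,\mathrm{sgn}\, j$, $\eta_j = (1 + |j|)/10$ and checks the first condition in (6.36) numerically. It is illustrative only: the sine and cosine ramps used in the transition zones are an assumption made for brevity and give bells that are merely continuous, whereas the text requires smooth ones. All function names are hypothetical.

```python
import math

def omega(j):
    return j * abs(j)              # omega_j = j^2 sgn j

def eta(j):
    return (1 + abs(j)) / 10       # eta_j = (1 + |j|)/10

def ramp(xi, j):
    """Rising profile across the zone [omega_j - eta_j, omega_j + eta_j]."""
    t = (xi - omega(j)) / eta(j)
    if t <= -1.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return math.sin(math.pi / 4 * (1.0 + t))

def bell(xi, j):
    """b_j: rises near omega_j, equals 1 in between, falls near omega_{j+1}."""
    rise = ramp(xi, j)
    up_next = ramp(xi, j + 1)
    fall = math.sqrt(max(0.0, 1.0 - up_next * up_next))   # complementary profile
    return rise * fall

def square_sum(xi, j_range=range(-12, 12)):
    """Sum of |b_j(xi)|^2 over a finite range of j covering xi."""
    return sum(bell(xi, j) ** 2 for j in j_range)

def window_length(j):
    """L_j = omega_{j+1} - omega_j + eta_j + eta_{j+1}."""
    return omega(j + 1) - omega(j) + eta(j) + eta(j + 1)

print([round(window_length(j), 1) for j in range(-3, 4)])
print(square_sum(3.7), square_sum(-8.95))
```

The transition zones do not overlap because $\omega_{j+1} - \omega_j \ge \eta_{j+1} + \eta_j$, which is why at most two bells are nonzero at any frequency and their squares sum to one.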
6.9 Dyadic structure and NWO sequences

In this section we return to analysis on $\mathbb{R}^n$ since some of the results will be needed later specifically in $\mathbb{R}^n$, $n \ge 3$. Geometry of dyadic cubes will play an important role. Denote by $\mathcal{Q}$ the dyadic cubes $Q$ in $\mathbb{R}^n$ with centers $x_Q$ and sidelength $l(Q)$, and by $\mathcal{Q}_j$ those $Q \in \mathcal{Q}$ with $l(Q) = 2^{-j}$. In [274], Meyer provides a proof of $L^2$-boundedness of certain Calderón–Zygmund operators based on the fact that those operators map wavelets to “vaguelettes.” This mapping property depends on regularity and cancellation properties of the wavelets. What can be deduced just from the dyadic structure, with as few extra hypotheses on the dyadic functions as possible? Rochberg and Semmes [307, 309, 310] addressed this question in an elegant way.

One calls a sequence $\{\psi_j\} \subset L^2$ weakly orthogonal (WO) if $\| \sum \alpha_j \psi_j \|_2^2 \le C \sum |\alpha_j|^2$ holds for any $\ell^2$-sequence $\{\alpha_j\}$. Let $\{\varphi_j\}$ be a Riesz basis for $L^2$ and let $T$ be a linear operator defined by $T\varphi_j = \psi_j$ where $\{\psi_j\}$ is WO. Then $T$ is $L^2$-bounded since, if $f = \sum c_j \varphi_j$ then
\[ \|Tf\|_2^2 = \left\| \sum c_j \psi_j \right\|_2^2 \le C \sum |c_j|^2 \le C \left\| \sum c_j \varphi_j \right\|_2^2 = C\, \|f\|_2^2. \]
In fact, $\{\psi_j\}$ is weakly orthogonal if and only if $\{\psi_j\}$ is the image of a Riesz basis under a bounded operator. Rochberg and Semmes [310] introduced a quadratic independence property weaker than WO, but particular to dyadic geometry, that they called near weak orthogonality. One says that a collection $\{f_Q\}_{Q \in \mathcal{Q}}$ is nearly weakly orthogonal (NWO) provided the nontangential maximal function
\[ N f(x) = N(f, \{f_Q\})(x) = \sup_{|x - x_Q| \le l(Q)} \frac{|\langle f, f_Q\rangle|}{|Q|^{1/2}} \tag{6.37} \]
maps $L^2(\mathbb{R}^n)$ to itself continuously. The mapping $f \mapsto Nf$ depends on the family $\{f_Q\}$. If $\{f_Q\}$ is WO then it is NWO since
\[ \int (Nf)^2 \le \sum_{Q \in \mathcal{Q}} \int \frac{|\langle f, f_Q\rangle|^2}{|Q|}\, \chi_{2Q} \le c_n \sum_{Q \in \mathcal{Q}} |\langle f, f_Q\rangle|^2. \]
The following lemma shows that the NWO property holds when the $e_Q$ are localized near $Q$ and are a little better than square-integrable.
Lemma 6.9.1. Suppose that for some $c, K > 0$ and $r \in (2, \infty]$ and for all $Q \in \mathcal{Q}$, $\operatorname{supp} e_Q \subset KQ$ and $\|e_Q\|_r \le c\, |Q|^{1/r - 1/2}$. Then $\{e_Q\}_{Q \in \mathcal{Q}}$ is NWO.

Proof. Given $h \in L^2(\mathbb{R}^n)$ and $Q \in \mathcal{Q}$, by Hölder’s inequality
\[ |Q|^{-1/2}\, |\langle h, e_Q\rangle| \le c\, |Q|^{-1/r'}\, \|h \chi_{KQ}\|_{r'} \le c \inf_{x \in KQ} M_{r'}(h)(x) \]
where $M_r(h)(x) = \sup_{x \in Q} \left( \frac{1}{|Q|} \int_Q |h|^r \right)^{1/r}$. That is,
\[ N h(x) = \sup_{|x - x_Q| \le l(Q)} \frac{|\langle h, e_Q\rangle|}{|Q|^{1/2}} \le c\, M_{r'}(h)(x). \]
Since $r' < 2$,
\[ \|M_{r'}(h)\|_2^2 = \int \left[ M_1(|h|^{r'}) \right]^{2/r'} \le C \int |h|^2 = C\, \|h\|_2^2 \]
by the Hardy–Littlewood maximal theorem. This proves the lemma.

Corollary 6.9.2. If $w^2 \in A_\infty$ then $\{ e_Q = w \chi_Q / \|w \chi_Q\|_2 \}$ is NWO.

This follows directly from the fact that $w^2$ satisfies a reverse Hölder condition which guarantees the condition of the lemma.

A natural connection between NWO and WO is provided by the quadratic Carleson class $QC(\mathcal{Q})$ consisting of those sequences $\{\lambda_Q\}_{Q \in \mathcal{Q}}$ such that
\[ \sup_{Q \in \mathcal{Q}} \left( \frac{1}{|Q|} \sum_{R \subset Q} |R|\, |\lambda_R|^2 \right)^{1/2} \equiv \|\{\lambda_Q\}\|_{QC} < \infty. \tag{6.38} \]
To any sequence $\{c_Q\}_{Q \in \mathcal{Q}}$ one can associate the maximal function
\[ S_N\{c_Q\}(x) = \sup_{|x - x_Q| \le l(Q)} |c_Q|. \]
A standard distribution function argument (e.g., [328], p. 60) shows that
\[ \sum |\lambda_Q|^2\, |Q|\, |c_Q|^2 = \int \sum_{x \in Q} |\lambda_Q|^2\, |c_Q|^2\, dx \le C\, \|S_N\{c_Q\}\|_2^2\, \|\{\lambda_Q\}\|_{QC}^2. \tag{6.39} \]
More precisely, set $O_\alpha = \{ x : S_N\{c_Q\}(x) > \sqrt{\alpha} \}$. Roughly, $O_\alpha$ is a union of pairwise disjoint cubes such that $|c_Q| > \sqrt{\alpha}$. Dividing $O_\alpha$ into pairwise disjoint maximal dyadic cubes one obtains $\sum_{Q' \subset O_\alpha} |\lambda_{Q'}|^2 \le C\, \|\{\lambda_Q\}\|_{QC}^2\, |O_\alpha|$. The estimate (6.39) follows from integrating over $\alpha$.

Now suppose that $\{\lambda_Q\} \in QC$ and that $\{\psi_Q\}$ is NWO. Set $c_Q = \langle f, \psi_Q\rangle / |Q|^{1/2}$. Then (6.39) yields
\[ \|\{\langle f, \lambda_Q \psi_Q\rangle\}\|_{\ell^2}^2 \le C\, \|S_N\{c_Q\}\|_2^2\, \|\{\lambda_Q\}\|_{QC}^2 = C\, \|Nf\|_2^2\, \|\{\lambda_Q\}\|_{QC}^2 \le C\, \|f\|_2^2\, \|\{\lambda_Q\}\|_{QC}^2. \]
Thus $f \mapsto \{\langle f, \lambda_Q \psi_Q\rangle\}$ is continuous from $L^2$ to $\ell^2$ so $\{\lambda_Q \psi_Q\}$ is WO. It turns out, conversely, that if $\{\lambda_Q \psi_Q\}$ is WO for all $\{\lambda_Q\} \in QC$ then $\{\psi_Q\}$ is NWO. Consequently, $\{\psi_Q\}$ is NWO precisely when $\{\lambda_Q \psi_Q\}$ is WO for each $\{\lambda_Q\} \in QC$.

A linear Carleson condition happens to be more suitable for proving $L^2$-boundedness of operators associated to NWO sequences. The (linear) Carleson class consists of those sequences $\{\lambda_Q\}_{Q \in \mathcal{Q}}$ such that
\[ \sup_{Q \in \mathcal{Q}} \frac{1}{|Q|} \sum_{R \subset Q} |R|\, |\lambda_R| \equiv \|\{\lambda_Q\}\|_{C} < \infty. \tag{6.40} \]
A slight adjustment of the proof of (6.39) yields
\[ \sum |\lambda_Q|\, |Q|\, |c_Q| = \int \sum_{x \in Q} |\lambda_Q|\, |c_Q|\, dx \le C\, \|S_N\{c_Q\}\|_1\, \|\{\lambda_Q\}\|_{C}. \tag{6.41} \]
Rochberg and Semmes [310] considered a class of operators built from NWO sequences $\{e_Q, f_Q\}$ and a coefficient sequence $\{a_Q\}$ having the form
\[ T : f \mapsto \sum_{Q \in \mathcal{Q}} a_Q\, \langle f, f_Q\rangle\, e_Q \qquad\text{or}\qquad T = \sum_{Q \in \mathcal{Q}} a_Q\, e_Q \otimes f_Q. \tag{6.42} \]
Operators that map wavelets to vaguelettes (2.1.4) are typical examples. Such a $T$ is bounded provided $\{a_Q\}$ is in the Carleson class.

Theorem 6.9.3. If $\{e_Q\}$ and $\{f_Q\}$ are NWO then for $T = \sum_{Q \in \mathcal{Q}} a_Q\, e_Q \otimes f_Q$, one has $\|T\|_{L^2 \to L^2} \le C\, \|\{a_Q\}\|_{C}$.

Proof. Given $f$ and $g$ in $L^2$, (6.41) and Cauchy–Schwarz imply:
\[ \begin{aligned}
|\langle Tf, g\rangle| &= \left| \sum_Q a_Q\, \langle f, f_Q\rangle\, \langle g, e_Q\rangle \right| \\
&\le \sum_Q |a_Q|\, |Q|\, |Q|^{-1}\, |\langle f, f_Q\rangle|\, |\langle g, e_Q\rangle| \\
&\le C\, \|\{a_Q\}\|_{C} \int \sup_{|x - x_Q| \le l(Q)} \frac{|\langle f, f_Q\rangle|\, |\langle g, e_Q\rangle|}{|Q|}\, dx \\
&\le C\, \|\{a_Q\}\|_{C} \int Nf(x)\, Ng(x)\, dx \\
&\le C\, \|\{a_Q\}\|_{C}\, \|N(f, \{f_Q\})\|_2\, \|N(g, \{e_Q\})\|_2 \le C\, \|\{a_Q\}\|_{C}\, \|f\|_2\, \|g\|_2
\end{aligned} \]
due to boundedness of the corresponding maximal functions.

Theorem 6.9.3 extends readily to $L^p$-boundedness with a slight renormalization of NWO to accommodate homogeneity in $L^p$ (cf. [310], p. 303). The theorem is of practical value only insofar as one can identify sequences having the NWO property. Examples include vaguelettes (2.1.4) that are,
in fact, WO, a fact whose proof requires some effort. The NWO functions can be reasonably considered as approximate eigenfunctions and the $\{a_Q\}$ as approximate singular values for a more general class than wavelet-vaguelette operators. Let $\{a_Q^*\}$ denote a nonincreasing rearrangement of $\{a_Q\}$ in (6.42). In what follows there will be no harm in assuming that $a_Q \ge 0$. If $a_Q^* \to 0$ then $T$ in (6.42) is compact and can be approximated by finite rank operators of the form $T_N = \sum_{k=1}^{N} a_{Q_k}\, e_{Q_k} \otimes f_{Q_k}$ in which $a_{Q_k} = a_Q^*(k)$. The values $a_Q^*(k)$ are not actual singular values, but they do control the rate of approximation of $T$ by $T_N$ as we shall see. The Carleson envelope
\[ M(\{\lambda_R\})(Q) = \frac{1}{|Q|} \sum_{R \subset Q} |R|\, |\lambda_R|, \tag{6.43} \]
by definition, satisfies $\|M\{\lambda_Q\}\|_\infty = \|\{\lambda_Q\}\|_{C}$. But $\{c_Q\} \mapsto M\{c_Q\}$ also enjoys other mapping properties.

Lemma 6.9.4. $\{c_Q\} \mapsto M\{c_Q\}$ is $\ell^p$-bounded ($0 < p < \infty$).

We only consider the case $p \ge 1$. For $p = 1$:
\[ \|\{M(c_Q)\}\|_1 = \sum_Q M(c_Q) = \sum_Q \sum_{R \subset Q} \frac{|R|}{|Q|}\, |c_R| = \sum_R |c_R| \sum_{Q : R \subset Q} \frac{|R|}{|Q|} = C_n \sum_R |c_R| = C_n\, \|\{c_Q\}\|_1. \]
When $p > 1$, by two applications of Hölder’s inequality,
\[ \begin{aligned}
\sum_Q (M(c_Q))^p &= \sum_Q \left( \sum_{R \subset Q} \frac{|R|}{|Q|}\, |c_R| \right)^p \\
&= \sum_k \sum_{Q \in \mathcal{Q}_k} \left( \sum_{j=k}^{\infty} 2^{n(k-j)} \sum_{R \subset Q,\, R \in \mathcal{Q}_j} |c_R| \right)^p \\
&\le C_p \sum_k \sum_{Q \in \mathcal{Q}_k} \sum_{j=k}^{\infty} (1 + j - k)^{2p}\, 2^{np(k-j)} \left( \sum_{R \subset Q,\, R \in \mathcal{Q}_j} |c_R| \right)^p \\
&\le C \sum_k \sum_{j=k}^{\infty} (1 + j - k)^{2p}\, 2^{np(k-j)}\, 2^{n(p-1)(j-k)} \sum_{R \in \mathcal{Q}_j} |c_R|^p \\
&\le C \sum_j \sum_{R \in \mathcal{Q}_j} |c_R|^p \sum_{k=-\infty}^{j} \frac{(1 + j - k)^{2p}}{2^{n(j-k)}} \le C \sum_R |c_R|^p.
\end{aligned} \]
It follows from the Marcinkiewicz interpolation theorem that $M$ is also bounded on $\ell^{p,q}$ when $1 < p < \infty$ and $1 \le q \le \infty$. In particular, there is a $C > 0$ such that, for all $\lambda > 0$,
\[ \#\{ R \in \mathcal{Q} : |M(c_Q)(R)| > \lambda \} \le C^p\, \lambda^{-p}\, \|\{c_Q\}\|_{\ell^{p,\infty}}^p. \]
Let $F = \{Q_1, \ldots, Q_N\}$ be a set of dyadic cubes and $T_N = \sum_{Q \in F} a_Q\, e_Q \otimes f_Q$ a finite rank approximation of $T$ in (6.42). In view of Theorem 6.9.3, to estimate $T - T_N = \sum_{Q \notin F} a_Q\, e_Q \otimes f_Q$ it suffices to estimate $\|\{\hat a_Q\}\|_{C}$ where $\hat a_Q = a_Q$ if $Q \notin F$ and $\hat a_Q = 0$ if $Q \in F$. The following lemma says that the Carleson sequence norm can be localized.

Lemma 6.9.5. Let $F \subset \mathcal{Q}$ be finite and set $\hat\lambda_Q = \lambda_Q$ if $Q \notin F$ and $\hat\lambda_Q = 0$ if $Q \in F$. Then
\[ \|M\{\hat\lambda_Q\}\|_\infty \le \sup_{Q \notin F} M\{\lambda_Q\}. \]
Proof. Let $F = \{Q_1, \ldots, Q_N\}$ where $l(Q_1) \ge l(Q_2) \ge \cdots \ge l(Q_N)$. Set $F_0 = \emptyset$ and set $F_j = F_{j-1} \cup Q_j$ and let $F_j' = F \setminus F_j$. Finally, set $\lambda_Q^j = \lambda_Q$ if $Q \notin F_j'$ and $\lambda_Q^j = 0$ if $Q \in F_j'$. Then $\hat\lambda_Q = \lambda_Q^0 \le \lambda_Q^1 \le \cdots \le \lambda_Q^N = \lambda_Q$ and it suffices to show that for $j = 0, 1, \ldots, N$ one has
\[ \sup_{Q \notin F_j} M\{\lambda_Q^j\} \le \sup_{Q \notin F_{j+1}} M\{\lambda_Q^j\} \le \sup_{Q \notin F_{j+1}} M\{\lambda_Q^{j+1}\}. \]
The right-hand side inequality is trivial because $\lambda_Q^j \le \lambda_Q^{j+1}$ and $\{\lambda_Q\} \mapsto M\{\lambda_Q\}$ is monotone. For the left-hand side inequality one checks that
\[ M\{\lambda_Q^j\}(Q_{j+1}) \le \sup_{Q \notin F_{j+1}} M\{\lambda_Q^j\}. \]
Since $\lambda^j_{Q_{j+1}} = 0$, if $Q^i_{j+1}$ are the bisection cubes of $Q_{j+1}$ then
\[ M\{\lambda_Q^j\}(Q_{j+1}) = \frac{1}{2^n} \sum_{i=1}^{2^n} M\{\lambda_Q^j\}(Q^i_{j+1}) \le \max_i M\{\lambda_Q^j\}(Q^i_{j+1}). \]
Since the $Q_j$ are ordered by size, none of the $Q^i_{j+1}$ belong to $F_{j+1}$. Thus the left-hand side inequality holds and the lemma is proved.

As a corollary, $\|M\{a_Q^\lambda\}\|_\infty \le \sup_{Q \notin F_\lambda} M\{a_Q\} \le \lambda$ provided $F_\lambda = \{ Q \in \mathcal{Q} : |a_Q| > \lambda \}$ is finite, where $a_Q^\lambda = a_Q$ if $|a_Q| > \lambda$ and is zero otherwise. Returning to the original rank $N$ approximant $T_N = \sum_{k=1}^{N} a_{Q_k}\, e_{Q_k} \otimes f_{Q_k}$, in which $a_{Q_k} = a_Q^*(k)$ is the $k$th largest element of $\{|a_Q|\}$, one has the following.

Corollary 6.9.6. $\|T - T_N\|_{L^2 \to L^2} \le C\, a^*(Q_{N+1})$.

Let $S_{p,q}$ denote the Schatten–Lorentz class of operators $T$ whose approximation numbers $\sigma_N(T) = \inf\{ \|T - S_N\|_{L^2 \to L^2} : \operatorname{rank} S_N \le N \}$ define elements of the Lorentz class $\ell^{p,q}$. One has the following.

Corollary 6.9.7. If $\{e_Q\}$ and $\{f_Q\}$ are NWO then for $T = \sum_{Q \in \mathcal{Q}} a_Q\, e_Q \otimes f_Q$ one has $T \in S_{p,q}$ whenever $\{a_Q\} \in \ell^{p,q}$.

In summary, the very simple NWO condition is enough to encode information about the decay of approximation numbers. Operators of the form $\sum_{Q \in \mathcal{Q}} a_Q\, e_Q \otimes f_Q$ will be considered as models for operators of interest in harmonic analysis in the next chapter.
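The Carleson envelope (6.43) and the $p = 1$ case of Lemma 6.9.4 can be illustrated on a finite dyadic tree. The sketch below works in dimension $n = 1$ with dyadic subintervals of $[0,1)$ down to a fixed depth; the dictionary keyed by (level, position), the function name, and the sample coefficients are assumptions made for illustration. In this setting the constant $C_n = \sum_{i \ge 0} 2^{-ni}$ equals $2$, and on a finite tree the $\ell^1$ identity becomes an inequality.

```python
def carleson_envelope(coeffs, depth):
    """Carleson envelope on dyadic subintervals of [0, 1) in dimension one.

    coeffs[(j, k)] is lambda_R for R = [k 2^-j, (k + 1) 2^-j), 0 <= j <= depth;
    missing keys count as zero. Returns a dict with
    M[(j, k)] = (1/|Q|) * sum over dyadic R contained in Q of |R| |lambda_R|.
    """
    mass = {}   # mass[Q] = sum_{R subset Q} |R| |lambda_R|
    for j in range(depth, -1, -1):              # finest scale first
        for k in range(2 ** j):
            own = 2.0 ** (-j) * abs(coeffs.get((j, k), 0.0))
            kids = 0.0
            if j < depth:
                kids = mass[(j + 1, 2 * k)] + mass[(j + 1, 2 * k + 1)]
            mass[(j, k)] = own + kids
    return {q: mass[q] * 2.0 ** q[0] for q in mass}   # divide by |Q| = 2^-j

# Hypothetical coefficients on a tree of depth 5.
depth = 5
coeffs = {(j, k): ((3 * j + 5 * k) % 7) / 7.0
          for j in range(depth + 1) for k in range(2 ** j)}
M = carleson_envelope(coeffs, depth)
carleson_norm = max(M.values())                  # the norm in (6.40)
print(carleson_norm, sum(M.values()), 2.0 * sum(coeffs.values()))
```

The bottom-up recursion uses the same identity as the proof of Lemma 6.9.5: the mass of a cube is its own term plus the masses of its bisection cubes.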
6.10 Notes

Matching pursuit. It is not always favorable to use a basis when considering best $N$-term approximations. Best-basis algorithms, for example, seek a signal representation that minimizes some notion of information cost, such as rate of approximation, over a family of bases. Mallat et al. (see [268]) considered dictionaries of time–frequency atoms, essentially modulated wavelet packets. Matching pursuit is a greedy algorithm that identifies a nearly optimal single-term approximant $a_1$ of a signal $s$ from among an overcomplete dictionary $\mathcal{D}$ of signal descriptors. One then iterates, identifying a near-optimal single-term approximant $a_2$ of $r_1 = s - a_1$, $a_3$ of $r_2 = r_1 - a_2$, etc. Formally, then, $s = a_1 + r_1 = \cdots = a_1 + \cdots + a_n + r_n$. It is more difficult to identify tradeoffs between algorithm complexity and rate of approximation here than in the case of basis approximants. The idea of projection pursuit goes back to Huber [205].

Basis pursuit. One of the advantages of matching pursuits and best basis algorithms is that diverse building blocks can be used to match an $N$-term approximant to a given signal in a Hilbert space $H$. Donoho suggested a method of basis pursuit in which the matching pursuit dictionary is a finite union of orthonormal bases. Among all possible signal expansions, an optimal one minimizes the $\ell^1$ norm of the expansion coefficients. The rationale for this choice is that $\ell^1$ is, in a sense, the closest convex norm to the $\ell^0$ “norm” that counts the number of nonzero expansion coefficients. Diversity is expressed in terms of the joint concentration of two subsets $S_1 = \{\phi_i\}$ and $S_2 = \{\psi_j\}$ of the unit sphere in $H$. A working, minimax definition of joint concentration is given by the Schur norm, namely
\[ \max_j \sum_i |\langle \phi_i, \psi_j\rangle| + \max_i \sum_j |\langle \phi_i, \psi_j\rangle|. \]
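The greedy iteration described under matching pursuit can be sketched in a few lines for finite-dimensional signals. The code below is a minimal illustration, not the algorithm of [268] in any detail: it assumes real vectors, unit-norm atoms, and exact maximization of the inner product at each step, and the dictionary (spikes together with a Haar basis of $\mathbb{R}^4$) is a hypothetical example of a finite union of orthonormal bases.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, dictionary, steps):
    """Greedy expansion s = a_1 + ... + a_n + r_n over unit-norm atoms.

    Returns (terms, residual), where terms is a list of (atom index,
    coefficient) pairs and the k-th approximant is coefficient * atom.
    """
    residual = list(signal)
    terms = []
    for _ in range(steps):
        # pick the atom most correlated with the current residual
        best = max(range(len(dictionary)),
                   key=lambda i: abs(dot(residual, dictionary[i])))
        c = dot(residual, dictionary[best])
        terms.append((best, c))
        residual = [r - c * g for r, g in zip(residual, dictionary[best])]
    return terms, residual

# Hypothetical dictionary: standard basis plus Haar basis of R^4.
s2 = math.sqrt(2.0)
spikes = [[1.0 if i == j else 0.0 for i in range(4)] for j in range(4)]
haar = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, -0.5, -0.5],
        [1 / s2, -1 / s2, 0.0, 0.0], [0.0, 0.0, 1 / s2, -1 / s2]]
D = spikes + haar
terms, r = matching_pursuit([4.0, 1.0, 1.0, 1.0], D, 3)
print(terms, math.sqrt(dot(r, r)))
```

Because each atom has unit norm, every step reduces the squared residual norm by the square of the selected coefficient, which is what makes the decomposition $s = a_1 + \cdots + a_n + r_n$ stable even though the dictionary is redundant.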
Investigations along these lines were initiated by Donoho and Huo [118]; see also Donoho and Elad [117], Gilbert et al. [157] and Tropp [350] for more recent developments.

Besov spaces and rearrangements. The heuristic that large coefficients of wavelet expansions of functions in Besov spaces are roughly organized along dyadic trees is subject to the following caveat: shuffling the wavelet coefficients at any fixed dyadic level does not affect the (wavelet) Besov norm. Baraniuk [19] has suggested a generalization of Besov norms in which coefficients can be weighted by their location as well as their scale.

Wavelet compression and trees. Theorems 6.4.1 and 6.5.2 show that tree-based wavelet approximation works well in Besov norms, while the companion approximation spaces $A^s_q(L^p)$ were shown to be natural spaces for considering the question of rate of convergence of nonlinear approximation. In particular,
(i) Theorem 6.3.1 characterizes the somewhat complicated norm on $A^s_q(L^p)$ in terms of a much simpler wavelet coefficient Lorentz norm while (ii) $A^s_q(L^p)$ turned out to be a familiar Besov space in a certain range. While good enough to make conclusions about optimality of wavelet tree encoding, $B_\lambda(L^p)$ (see (6.14)) cannot be called natural for tree approximation in the same way that $A^s_q(L^p)$ is for nonlinear approximation. One would like to characterize those functions whose wavelet tree approximations converge at a certain rate. The trouble here seems to be with $L^p$. One is unable to read off $L^p$ norms simply from the magnitude of coefficients when $p \ne 2$: spatial arrangement also plays a role. Baraniuk et al. [20] proposed using a different base space for tree approximation, namely a Besov space that is, in a sense, close to $L^p$. The resulting tree approximation space is nearly described as a wavelet coefficient Lorentz norm, but now a certain maximal function intervenes. We will just describe the result here and refer to [20] for the somewhat technical proof. In analogy with $A^s_q(B)$ one considers
\[ \|f\|_{B^s_q(B)} = \begin{cases} \left( \sum_{n=1}^{\infty} [n^s\, \tau_n(f)_B]^q / n \right)^{1/q}, & 0 < q < \infty \\ \sup_n n^s\, \tau_n(f)_B, & q = \infty. \end{cases} \]
Denote by $b_p = b_p^{0,p}$ the Besov space consisting of those distributions whose coefficients satisfy $\|\{\langle f, \psi_{I,p'}\rangle\}\|_{\ell^p(\mathcal{D}[0,1])} < \infty$. It is a normed space if $p \ge 1$, but rather different from $L^p$ when $p \ne 2$. The tree structure is useful for introducing the subtree maximal function
\[ I \mapsto \mu_I(f) = \sup_{T_I} \left( \frac{1}{\# T_I} \sum_{J \in T_I} |\langle f, \psi_{J,p'}\rangle|^p \right)^{1/p} \]
in which the supremum is taken over all finite subtrees $T_I$ of $\mathcal{D}_I$ (the dyadic subintervals of $I$). Then $m_I(f) = \inf_{I \subset J} \mu_J$ satisfies $m_I \le m_J$ whenever $I \subset J$ and $\{I : m_I(f) > \varepsilon\}$ is a tree. One defines a new function space $M_p^{s,q}$ to consist of those $f$ such that $\|f\|_{M_p^{s,q}} = \|\{m_I(f)\}\|_{\ell^{p/(sp+1),q}(\mathcal{D}[0,1])}$ is finite. One has the following characterization of approximation spaces $B^s_q(b_p)$ [20].

Theorem 6.10.1. For any $s, p, q > 0$,
\[ B^s_q(b_p) = M^s_{p,q} \]
with equivalent norms. Moreover, if $0 < p < \infty$ and $\varepsilon > 0$, one has
\[ \left\| f - \tilde T(f, \varepsilon) \right\|_{b_p} = \tau_{N_\varepsilon}(f)_{b_p} \]
where $\tilde T(f, \varepsilon) = \sum_{\{I : m_I(f) > \varepsilon\}} \langle f, \psi_{I,p'}\rangle\, \psi_{I,p}$ and $N_\varepsilon = \#\{I : m_I(f) > \varepsilon\}$.
The last statement says that thresholding based on values of the maximal function, rather than on values of the coefficients themselves, produces optimal $N$-tree approximants. The subtree maximal function norm accounts for the spatial organization of coefficient magnitudes more effectively than straight Besov norms.

Triebel–Lizorkin spaces. As in the case of Besov spaces, Triebel–Lizorkin (TL) spaces can be defined in terms of their wavelet coefficients. Namely, one defines a space $\dot{f}^{\alpha,q}_p$ of sequences $\{s_I\}_{I \in \mathcal{D}}$ for which the square function $\big(\sum_{I \ni x} (|I|^{-\alpha-1/2} |s_I|)^q\big)^{1/q}$ belongs to $L^p$. When $\{s_I = \langle f, \psi_I\rangle\} \in \dot{f}^{\alpha,q}_p$ for a sufficiently regular wavelet basis, one says that $f$ belongs to the Triebel–Lizorkin space $\dot{F}^{\alpha,q}_p(\mathbb{R})$. In view of the following lemma (see [274]), the TL spaces include the spaces $L^p(\mathbb{R})$ as the case $\alpha = 0$ and $q = 2$.

Lemma 6.10.2. There are constants $0 < c_{p,\psi} \leq C_{p,\psi} < \infty$ such that, whenever $f \in L^p(\mathbb{R})$, the square function $S(f)(x) = \big(\sum_{x \in I \in \mathcal{D}} |\langle f, \psi_I\rangle|^2 / |I|\big)^{1/2}$ satisfies
\[
c_{p,\psi}\, \|f\|_p \leq \|S(f)\|_p \leq C_{p,\psi}\, \|f\|_p .
\]
The space $\dot{F}^{0,2}_1(\mathbb{R})$ is the real Hardy space $\mathrm{Re}\, H^1(\mathbb{R})$ (e.g., [149, 158, 189]). Strömberg's proof [336] that (Franklin) wavelets form an unconditional basis for $\mathrm{Re}\, H^1$ stimulated much of the early mathematical interest in wavelets.

Curvelets. Wavelets are effective for representing objects with point singularities, whereas many images contain discontinuities across curves. Curvelets [65] provide efficient representations of objects that are $C^2$ everywhere in the plane except along a $C^2$ curve. Specifically, the best $N$-term curvelet approximant $f_N^C$ of such an object $f$ satisfies $\|f - f_N^C\|_2^2 \leq C N^{-2} (\log N)^3$, whereas the minimax $N$-term wavelet approximation numbers decay only like $1/N$. The key to this improvement is that curvelets are localized in space and look like needles with various orientations. They have the form
\[
\psi_{j,l,k}(x) = 2^{3j/2}\, \psi\big(D_j R_{\theta(j,l)}\, x - k_\delta\big)
\]
where $D_j$ is the dilation matrix $\mathrm{diag}\{4^j, 2^j\}$, $R_{\theta(j,l)}$ is the rotation through angle $2\pi l / 2^j$ ($l = 0, 1, \ldots, 2^j - 1$) and $k_\delta$ is translation by $(k_1\delta_1, k_2\delta_2)$, where $(\delta_1, \delta_2)$ is fixed and $(k_1, k_2) \in \mathbb{Z}^2$. The basic waveform $\psi$ is oscillatory in one direction and bell-shaped in the other. It is possible to choose $\psi$ real-valued in such a way that the $\psi_{j,l,k}$ form a tight frame for $L^2(\mathbb{R}^2)$. Their anisotropy also makes them appropriate frames for analysis of Fourier integral operators of the form
\[
T_{a,\Phi} f(x) = \int e^{i\Phi(x,\xi)}\, a(x,\xi)\, \hat{f}(\xi)\, d\xi. \tag{6.44}
\]
For nice symbols $a$ satisfying $|\partial_\xi^\alpha \partial_x^\beta a(x,\xi)| \leq C_{\alpha\beta} (1 + |\xi|)^{m - |\alpha|}$ and phase functions $\Phi$ satisfying the nondegeneracy condition $|\det \nabla_x \nabla_\xi \Phi(x,\xi)| > c > 0$, one has the decay estimates [64]
\[
|\langle T\psi_{j,l,k}, \psi_{j',l',k'}\rangle| \leq C_M\, 2^{-M|j-j'|} \Big(1 + 2^{\min(j,j')}\, d\big((j,l,k),(j',l',k')\big)\Big)^{-M}
\]
on the curvelet matrix coefficients of $T = T_{a,\Phi}$ for each $M > 0$. Here $d(\cdot,\cdot)$ is essentially the sum of the squared distance of the centers of the supports plus the angular distance. The argument follows much the same lines as for wavelet–vaguelette operators. Here, $T_{a,\Phi}$ maps curvelets to curvelet molecule analogues of vaguelettes.

Schur's lemma for $\ell^2$. A few results in this chapter are based on the following form of Schur's lemma that is specific to $\ell^2$ spaces.

Lemma 6.10.3. (Schur's lemma) Suppose that there are two sequences $\omega(j)$ and $\tilde{\omega}(j)$, $j \in J$, of positive reals such that:
\[
\text{for all } j \in J, \quad \sum_{k} |\alpha(j,k)|\, \omega(k) \leq \tilde{\omega}(j);
\]
\[
\text{for all } k \in J, \quad \sum_{j} |\alpha(j,k)|\, \tilde{\omega}(j) \leq \omega(k).
\]
Then the matrix $A = (\alpha(j,k))_{j,k \in J}$ is bounded on $\ell^2(J)$ with $\|A\| \leq 1$.

See, e.g., Meyer [274] for the simple proof. In many cases, for example when $A$ is symmetric, or even when $|\alpha(k,k)| \geq \delta > 0$, one can take $\omega = \tilde{\omega}$.

Pointwise convergence. Besides the rate of approximation by best $N$-term approximants, one can also argue the superiority of wavelet versus Fourier expansions in Besov or Triebel–Lizorkin spaces in terms of almost-everywhere convergence of series expansions. Pointwise convergence of wavelet expansions in $L^p$ ($1 < p < \infty$) is a far simpler matter (e.g., [158]) than for Fourier expansions. In fact, Körner [233] has shown that a Fourier series can diverge almost everywhere when partial sums are expressed in terms of the distribution of large coefficients. That is, there is an $f \in L^2(\mathbb{T})$ such that $\limsup_{\eta \to 0} \big| \sum_{|\hat{f}(n)| > \eta} \hat{f}(n)\, e^{2\pi i n t} \big| = \infty$ almost everywhere on $\mathbb{T}$. In fact, this behavior is typical (see [234]). In contrast, Tao [340] proved that for $f \in L^2(\mathbb{R})$, $\sum_{|\langle f, \psi_I\rangle| > \eta} \langle f, \psi_I\rangle\, \psi_I \to f$ a.e. on $\mathbb{R}$ as $\eta \to 0$.

Negative results for the Weyl correspondence. The Weyl correspondence (6.29) defines an isometry between symbols in $L^2(\mathbb{R} \times \mathbb{R})$ and Hilbert–Schmidt operators (e.g., [144]). There is an asymmetrical form of (6.29) that, in $\mathbb{R}^n$, assigns the operator $f(x)g(-2\pi i \nabla)$ to the symbol $\sigma(p,q) = f(q)g(p)$. It is known to satisfy $\|f(x)g(-2\pi i\nabla)\|_{S_p} \leq \|f\|_p \|g\|_p$ and, by a theorem of Cwikel, $\|f(x)g(-2\pi i\nabla)\|_{S_{p,\infty}} \leq \|f\|_{L^p} \|g\|_{L^{p,\infty}}$ when $p > 2$. In fact, the NWO picture provides a relatively straightforward proof for a wavelet model of this difficult fact [307]. This all might lead one to conjecture that $\|\sigma(X,D)\|_{S_p} \leq \|\sigma\|_p$ holds in general for nonseparable $\sigma$, with a similar bound for $\tilde{\sigma}(X,D)$. That these fail was proved by Simon [320].
Theorem 6.10.4. The estimate $\|\sigma(X,D)\|_{S_p} \leq C \|\sigma\|_p$ fails for $p > 2$.

Here is an outline for one variable. Since $\langle \psi, \sigma(X,D)\psi\rangle = \iint \sigma\, W(\psi)$, if $\sigma(x,\xi) \mapsto \sigma(X,D)$ is to be bounded from $L^p(\mathbb{R}^2)$ to the continuous operators on $L^p(\mathbb{R})$, then $W(\psi)$ must belong to $L^{p'}(\mathbb{R}^2)$ whenever $\psi \in L^{p'} \cap L^2(\mathbb{R})$. If $\psi$ is supported in $(-1,1)$, then by (5.39), $W(\psi)$ is supported in the strip defined by $|x| \leq 1$. By Hölder's inequality,
\[
\hat{\psi}(2\xi)\, \hat{\psi}(0) = \int W(\psi)(x,\xi)\, e^{2\pi i x \xi}\, dx \in L^q(d\xi)
\]
whenever $\iint |W(\psi)|^q < \infty$. Let $0 \leq \psi(x) \approx |x|^{-\alpha}$ for small $x$. Then $\hat{\psi}(0) \neq 0$ and $\hat{\psi}(\xi) \approx |\xi|^{\alpha - 1}$ for large $\xi$. This implies that $W(\psi) \notin L^q$ if $q(1-\alpha) < 1$. Thus, given $q \in [1,2)$, one has $\psi \in L^q$ but $W(\psi) \notin L^q$ when $\alpha$ is close to $1/2$. In short, $\psi \mapsto W(\psi)$ is unbounded from $L^q(\mathbb{R})$ to $L^q(\mathbb{R}^2)$ when $q < 2$.

Modulation spaces and Schur's criterion. Just as wavelets can be used to define the Besov spaces as weighted mixed-norm coefficient spaces, the local trigonometric functions $b^k_n$ defined in Section 6.8 can be used to define modulation spaces $M^{p,q}_w(\mathbb{R})$ to consist of those distributions $f$ such that
\[
\|f\|_{M^{p,q}_w} = \Big( \sum_n \Big( \sum_k \big( |\langle f, b^k_n\rangle|\, w(n,k) \big)^p \Big)^{q/p} \Big)^{1/q}
\]
is finite. When one of $p$ or $q$ is infinite the corresponding sum is replaced by a supremum. Multivariate versions can be defined in terms of multivariate extensions of $b^k_n$. Gröchenig [168] considers the modulation spaces in detail. There they are defined in terms of the short-time Fourier transform. Examples of such spaces include the Feichtinger algebra $M^1$, as well as the Sobolev spaces $H^s(\mathbb{R})$ when $p = q = 2$ and $w = w(n) = (1 + |n|^2)^s$. The space $M^{\infty,1}$ is of particular interest. Gröchenig and Heil proved that if the symbol $\sigma$ lies in $M^{\infty,1}(\mathbb{R}^2)$ then the operator $\sigma(X,D)$ is bounded on $M^{p,q}$, $1 \leq p, q \leq \infty$ (cf. [168]) and thus on $L^2(\mathbb{R})$. The symbol condition is tailor-made for an application of Schur's lemma for matrix boundedness. Setting this up requires a clever use of what amounts to Moyal's formula (cf. also (5.44)). This approach can be used to prove $L^2$-boundedness of $\sigma(X,D)$ under a minimal smoothness hypothesis on $\sigma$, as well as to recover and extend several of the results in Section 6.8.
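The hypotheses of Schur's lemma (Lemma 6.10.3) can be checked numerically on a finite matrix. The following is only an illustrative sketch, not taken from the text: the test matrix (a finite section of the Hilbert matrix scaled by $1/\pi$), the weights $\omega(k) = \tilde{\omega}(k) = (k + 1/2)^{-1/2}$, and all function names are assumptions chosen for the example. The operator norm is estimated from below by power iteration.

```python
import math

def schur_hypotheses_hold(alpha, omega, omega_tilde, tol=1e-12):
    """Check both weighted-sum conditions of Lemma 6.10.3 for a finite matrix."""
    n = len(alpha)
    rows_ok = all(
        sum(abs(alpha[j][k]) * omega[k] for k in range(n)) <= omega_tilde[j] + tol
        for j in range(n)
    )
    cols_ok = all(
        sum(abs(alpha[j][k]) * omega_tilde[j] for j in range(n)) <= omega[k] + tol
        for k in range(n)
    )
    return rows_ok and cols_ok

def operator_norm_estimate(alpha, iterations=200):
    """Power iteration on A^T A; gives a lower estimate of the l2 operator norm."""
    n = len(alpha)
    x = [1.0 / math.sqrt(n)] * n              # unit starting vector
    estimate = 0.0
    for _ in range(iterations):
        y = [sum(alpha[j][k] * x[k] for k in range(n)) for j in range(n)]  # y = A x
        z = [sum(alpha[j][k] * y[j] for j in range(n)) for k in range(n)]  # z = A^T y
        size = math.sqrt(sum(v * v for v in z))
        estimate = math.sqrt(size)            # ||A^T A x||^(1/2) with ||x|| = 1
        x = [v / size for v in z]
    return estimate

n = 40
# Assumed test matrix: n x n section of the Hilbert matrix, scaled by 1/pi.
alpha = [[1.0 / (math.pi * (j + k + 1)) for k in range(n)] for j in range(n)]
# Assumed weights, chosen by hand for this example (omega = omega_tilde).
omega = [(k + 0.5) ** -0.5 for k in range(n)]

hypotheses = schur_hypotheses_hold(alpha, omega, omega)
norm = operator_norm_estimate(alpha)
print(hypotheses, norm)
```

Because this matrix is symmetric, a single weight sequence serves for both conditions, as remarked after the lemma.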
7 Uncertainty principles in mathematical physics
In earlier chapters we have motivated mathematical analysis of Fourier uncertainty inequalities through problems in signal analysis. However, many deep ideas in time–frequency analysis have their origins in mathematical physics, specifically in quantum mechanics. So it makes sense, in turn, to consider advances in mathematical physics that have resulted from time–scale analysis and wave packet combinatorics. Sobolev inequalities comprise one of the most important mathematical forms of the uncertainty principle. The role of wavelets in proving such inequalities was considered in Chapter 2. Here we investigate the uncertainty principle in mathematical physics as it pertains to localization of particles and stability of matter. We thus seek qualitative information about eigenvalues and approximate eigenfunctions of Schrödinger operators $-\Delta + V$.

In Section 7.2 we review the fundamental work of Fefferman and Phong (see [136]) who provided estimates on the number of bound state solutions of a Schrödinger equation in terms of local averages or “large bumps” of the potential $V$ that, intuitively, define potential wells. Remarkably, their “approximate eigenfunctions” are nothing but piecewise linear wavelets localized on the large bumps. The statement that $-\Delta + V$ is a positive operator away from the bumps is a form of Sobolev's inequality.

At the other end of the spectrum, one considers “small” but nonlocalized potentials, for which $-\Delta + V$ is thought of as a perturbed Laplacian. Among other basic questions one can ask: when can a wavefunction solution of the scattering problem $(-\Delta + V)\psi = i\,(d\psi/dt)$ be expressed as superpositions of generalized eigenstates, just as wavefunction solutions of $d^2\psi/dx^2 = -i\, d\psi/dt$ can be expressed as superpositions of the eigenstates $e^{i\lambda x - i\lambda^2 t}$? This requires that the absolutely continuous spectrum of $-\Delta + V$ have no gaps, while the generalized eigenfunctions should behave, asymptotically, like sines and cosines.
In Section 7.4.3 we outline the work of Christ and Kiselev (e.g., [72]) that addresses this problem and leads to connections with other deep problems in harmonic analysis such as Carleson's theorem on almost-everywhere convergence of Fourier series and Lacey and Thiele's proof of boundedness of the bilinear Hilbert transform (BHT). The discrete Walsh phase plane discussed in Chapter 4 gives rise to a more tractable Walsh model BHT. Boundedness of this operator will be proved following the approaches of Thiele [346] and of Gilbert and Nahmod [160]. The full combinatorial power of the Walsh plane is brought to bear in the form of ideas originating in Fefferman's proof of Carleson's theorem [135]. Along the way, WKB methods and their role in providing understanding of uncertainty relations are discussed.

To tie together the circle of ideas running throughout the book, we finish with a discussion of the wavelet auditory model of Benedetto and Teolis. The appearance of WKB and the uncertainty principle in this setting was termed by Zweig et al. [373] as the cochlear compromise. Irregular sampling of the continuous wavelet transform along local maxima provides input data for iterative reconstruction.

Wavelets play a role in other deep and important results in this chapter. In Section 7.2.6 we review Rochberg's estimates for bound-state eigenvalues of $-\Delta + V$ in terms of the large eigenvalues of $K_\lambda = V^{1/2} (\lambda - \Delta)^{-1} V^{1/2}$. Rochberg (e.g., [307]) proposed a wavelet model for the latter operator and analyzed it using the concept of near weak orthogonality developed in Chapter 6. In Section 7.3, we present work of Cohen et al. [80] (and others, cf. [82]) that uses analysis on binary trees associated with wavelets, along with specific properties of the wavelets themselves, in a manner closely parallel to the Fefferman–Phong techniques, to produce a sharp “endpoint” Sobolev inequality.

Because most of the ideas presented here have their origins in quantum mechanics, we review the relevant concepts here, especially those from spectral theory, in a brief and superficial way: our purpose is merely to convey a sense of their historical development.
7.1 Wave mechanics and uncertainty

In classical or Newtonian mechanics the state or configuration of a system is determined completely from its initial configuration and constraints that can be described deterministically as a specific real-valued function of the positions $q_1, \ldots, q_n$ and momenta $p_1, \ldots, p_n$ of the constituents or particles of the system by means of the Hamiltonian function
\[
H(q,p) = \frac{1}{2} \sum_{j=1}^{n} \frac{p_j^2}{m_j} + V(q).
\]
Here $m_j$ is the mass of the $j$th particle. Transition among states is modelled then by the equations of motion
\[
\frac{dp_j}{dt} = -\frac{\partial H}{\partial q_j}; \qquad \frac{dq_j}{dt} = \frac{\partial H}{\partial p_j}.
\]
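As an illustration of this deterministic picture, the equations of motion can be integrated numerically. The sketch below is not from the text: it assumes a single particle of unit mass in the potential $V(q) = q^2/2$ and uses the standard leapfrog (kick-drift-kick) scheme; the function names and step size are illustrative choices.

```python
import math

def leapfrog(q, p, mass, grad_V, dt, steps):
    """Integrate dq/dt = dH/dp = p/m and dp/dt = -dH/dq = -V'(q)."""
    for _ in range(steps):
        p -= 0.5 * dt * grad_V(q)   # half step in momentum
        q += dt * p / mass          # full step in position
        p -= 0.5 * dt * grad_V(q)   # second half step in momentum
    return q, p

def hamiltonian(q, p, mass, V):
    """H(q, p) = p^2 / (2m) + V(q) for a single particle."""
    return p * p / (2.0 * mass) + V(q)

# Assumed test case: unit mass in the potential V(q) = q^2 / 2.
mass = 1.0
V = lambda q: 0.5 * q * q
grad_V = lambda q: q

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, mass, grad_V, dt=0.01, steps=1000)
energy_drift = abs(hamiltonian(q1, p1, mass, V) - hamiltonian(q0, p0, mass, V))
position_error = abs(q1 - math.cos(10.0))   # exact solution is q(t) = cos t
print(energy_drift, position_error)
```

Given the initial pair $(q_0, p_0)$, the later state is fixed up to discretization error, which is the sense in which the classical evolution is deterministic.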
The deterministic picture began to unravel around 1900, first with Planck's work on black body radiation, followed by Einstein's 1905 investigation of the photoelectric effect, and Niels Bohr's 1915 observation of discrete spectra. The problem of indeterminacy of position–momentum pairs was quantified after Compton's 1923 scattering experiments. As de Broglie's 1924 theory of matter waves gained acceptance, it became necessary to develop a mathematical model unifying these observations, with a hope of reconciling the quantum picture with the Hamiltonian picture. The first attempts at capturing noncommutativity of position and momentum measurements involved a matrix mechanics due to Dirac, Heisenberg and Pauli in which the coordinate functions $p_i$ and $q_j$ were replaced by matrices $Q_j$ and $P_i$ such that $Q_j Q_k = Q_k Q_j$; $P_i P_k = P_k P_i$ but $P_i Q_j - Q_j P_i = (\hbar / 2\pi i)\, \delta_{ij} I$. This mechanics failed to capture the operational nature of measurements in terms of infinitesimal generators of motion, i.e., differential operators. Thus, Hamiltonian mechanics was replaced by Schrödinger's wave mechanics, with waves expressed as solutions of
\[
H\Big(q;\, \frac{\hbar}{2\pi i} \nabla_q\Big)\, \psi(q) = \lambda\, \psi(q),
\]
referred to as wavefunctions or states. The basic tool for modelling phenomena such as refraction and interference is the wave operator
\[
\frac{d^2}{dt^2} - a^2 \Delta.
\]
With de Broglie's wave–particle duality it became accepted that the wave equation should play a role in describing quantum states as well. Solutions of the wave equation having the special separable form $\psi(q)\, e^{-i\omega t}$ in $\mathbb{R}^n \times \mathbb{R}$ must then be solutions of the Helmholtz equation
\[
\big(\Delta + k^2\big)\, \psi = 0.
\]
Here, the wavenumber $k$ satisfies
\[
k = \frac{\omega}{a} = \frac{2\pi\nu}{a} = \frac{2\pi}{\lambda}
\]
where $\nu$ denotes the frequency of the wave and $\lambda$ the wavelength. In de Broglie's scheme, $\lambda = \hbar / mv$ expresses the wavelength of a wave–particle as being inversely proportional to momentum $mv$, with proportionality expressed by Planck's constant $\hbar$. With these conventions, the Helmholtz equation takes the form
\[
\Big(\Delta + \frac{8\pi^2 m}{\hbar^2}\, \frac{m v^2}{2}\Big)\, \psi = 0.
\]
Expressing the sum of kinetic and potential energies as $E = mv^2/2 + V$ yields
\[
\Big(\Delta + \frac{8\pi^2 m}{\hbar^2}\, (E - V)\Big)\, \psi = 0 \quad \text{or} \quad \Big(-\frac{\hbar^2}{8\pi^2 m}\, \Delta + V\Big)\, \psi = E\, \psi, \tag{7.1}
\]
which is the time-independent Schrödinger equation. It suggests that, for a given potential, the possible energy levels of bound states of the system (ones remaining bounded and approaching zero at infinity) will depend on the spectrum of the Schrödinger operator on the left-hand side. In this sense, (7.1) offers an explanation of Bohr's theory for bound states of the hydrogen atom that can be extended in a consistent way to other systems. Then fundamental issues such as the problem of stability of matter can be cast in terms of the problem of estimating ground-state energies, that is, minimal elements of the discrete spectrum. Making sense of these issues requires some understanding of spectral theory. The material is standard now (e.g., [303]). The comments that follow are based largely on Mackey's observations [265] regarding its historical development.

7.1.1 Spectral theory

Hilbert's observation of the analogy between integral kernels and matrices led to his gem in this field: the spectral theorem for bounded self-adjoint operators in Hilbert space. The first observation is classical: any Hermitian matrix $A = \{a_{ij}\} = \{\bar{a}_{ji}\}$ is diagonalizable. If such an $A$ is the matrix of an operator $T = T_A$ expressed in standard coordinates on $\mathbb{C}^n$ then $T_A$ possesses an orthogonal basis of eigenfunctions. In this way one can write $A = U D U^{-1}$ in which the columns of $U$ are the eigenvectors of $T_A$ and the diagonal entries of $D$ are the corresponding eigenvalues. The obvious generalization of this fact fails in infinite-dimensional Hilbert space because then there is an important distinction between true eigenvalues (elements of the point spectrum, on which $(T - \lambda I)^{-1}$ does not exist) and those generalized eigenvalues $\lambda$ for which $(T - \lambda I)^{-1}$ merely fails to be bounded or to be densely defined. To formulate Hilbert's spectral theorem, it helps to reformulate the problem in what Mackey [265] calls a seemingly perverse fashion.
Suppose for a moment that $T$ has a discrete spectrum: all of its eigenvalues are true eigenvalues. For $E$ a bounded subset of $\mathbb{R}$ one lets $\Lambda(E,T)$ be those eigenvalues of $T$ lying in $E$ and defines $P(E) = P(E,T)$ as the projection onto the closed subspace spanned by those eigenvectors having eigenvalues in $E$. The mapping $E \mapsto P(E)$ is a projection-valued measure in the sense that (i) $P(\emptyset) = 0$, (ii) $P(E \cap F) = P(E)P(F)$ and (iii) $P(\cup E_k) = \sum P(E_k)$ when the $E_k$ are pairwise disjoint. The measure $E \mapsto P(E)$ is atomic and its atoms are the eigenvalues of $T$. Then, as a bilinear form, $T$ can be reconstructed from these projections by defining $\langle T\phi, \psi\rangle = \sum_\lambda \lambda\, \langle P(\lambda, T)\phi, \psi\rangle$.
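In finite dimensions, properties (i) through (iii) and the reconstruction of $T$ from its spectral projections can be verified directly. The sketch below is only an illustration: the operator on $\mathbb{R}^3$ is a hypothetical example specified by hand through its eigenvalues and orthonormal eigenvectors, the sets $E$ are restricted to half-open intervals $[a,b)$, and all helper names are assumptions.

```python
import math

s = 1.0 / math.sqrt(2.0)
# Assumed example: eigenvalues with orthonormal eigenvectors in R^3.
eigenpairs = [(1.0, [s, s, 0.0]), (3.0, [s, -s, 0.0]), (5.0, [0.0, 0.0, 1.0])]
DIM = 3

def zeros():
    return [[0.0] * DIM for _ in range(DIM)]

def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(DIM)] for i in range(DIM)]

def scale(c, A):
    return [[c * A[i][j] for j in range(DIM)] for i in range(DIM)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(DIM) for j in range(DIM))

def P(a, b):
    """Spectral projection P(E) for the interval E = [a, b)."""
    result = zeros()
    for lam, v in eigenpairs:
        if a <= lam < b:
            outer = [[v[i] * v[j] for j in range(DIM)] for i in range(DIM)]
            result = add(result, outer)
    return result

empty_ok = close(P(2.0, 2.0), zeros())                               # (i)
product_ok = close(matmul(P(0.0, 4.0), P(2.0, 6.0)), P(2.0, 4.0))   # (ii)
additive_ok = close(add(P(0.0, 2.0), P(2.0, 6.0)), P(0.0, 6.0))     # (iii)

# Reconstruct T as the sum over eigenvalues of lambda * P({lambda}).
T = zeros()
for lam, _ in eigenpairs:
    T = add(T, scale(lam, P(lam, lam + 0.5)))
print(empty_ok, product_ok, additive_ok, T)
```

Each eigenvalue is isolated here by a short interval $[\lambda, \lambda + 0.5)$, which plays the role of the atom $\{\lambda\}$ of the measure.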
The advantage of this formulation is that, with minor modifications, it extends to self-adjoint operators in any separable Hilbert space $H$. To quantify existence of generalized eigenvalues one must replace atomic measures by arbitrary sigma-finite ones. A mapping $E \mapsto P(E)$ defined from Borel sets of $\mathbb{R}$ with values in projection operators on $H$ that satisfies (i)–(iii) is called a projection-valued Borel measure. One says that the measure has bounded support if, for large enough $R$, $P(E) = 0$ whenever $E \cap [-R, R] = \emptyset$. Then for any fixed pair $\phi, \psi$ in $H$ one can define an ordinary complex Borel measure such that for any interval $[a,b)$, $\langle P([a,b))\phi, \psi\rangle = \int_a^b d\,\langle P(\lambda)\phi, \psi\rangle$. One can state Hilbert's spectral theorem as follows.

Theorem 7.1.1. Let $T$ be any bounded, self-adjoint operator on the separable Hilbert space $H$. Then there exists a unique projection-valued Borel measure with bounded support such that, for all $\phi, \psi$ in $H$, one has
\[
\langle T\phi, \psi\rangle = \int_{-\infty}^{\infty} \lambda\; d\,\langle P(\lambda, T)\phi, \psi\rangle .
\]
Conversely, any such measure defines a unique bounded self-adjoint operator on $H$. Moreover, the eigenvalues of $T$ correspond to the atoms of the measure.

Just as any complex Borel measure has a decomposition into absolutely continuous, singular continuous, and discrete parts, the operator-valued spectral measure $E \mapsto P(E)$ has a corresponding decomposition. Loosely speaking, those $\lambda$ supported by the absolutely continuous and discrete components define the absolutely continuous spectrum and discrete spectrum respectively. The theorem was extended by von Neumann to the case of unbounded operators (e.g., [265], p. 140); the only change needed in the statement is that the projection-valued measure no longer need have bounded support. We refer to Kreyszig [237] for an elementary approach to the spectral theorem and to Lax [253] for multiple treatments.

7.1.2 Measuring position and momentum

Among the most troubling quantum mechanics issues remaining at the time of von Neumann's work was that of interpretation in the quantum context of such fundamental classical concepts as position and momentum of a wave/particle. Von Neumann sought to quantify the emerging view that properties such as position and momentum can be defined only extrinsically through measurement, which amounts to performing some operation on the system. One thinks now in terms of projective Hilbert space, in which a state satisfying the Schrödinger equation
\[
-H\Big(q_1, \ldots, q_k, \frac{\hbar}{2\pi i}\frac{\partial}{\partial q_1}, \ldots, \frac{\hbar}{2\pi i}\frac{\partial}{\partial q_k}\Big)\, \psi(q_1, \ldots, q_k, t) = \frac{\hbar}{2\pi i}\, \frac{\partial \psi}{\partial t}(q_1, \ldots, q_k, t)
\]
evolves, in the absence of intervening disturbances, under the Schrödinger semigroup as $e^{-(2\pi i t/\hbar) H} \psi_0$. It makes sense then to interpret measurements as
expectations of possibly unbounded Hermitian operators, including position and momentum operators. Hitherto, operational calculus had been encumbered by matrix coordinates and von Neumann criticized Dirac's work for its inability to deal with unbounded operators properly: “In the case of those operators for which this is not actually the case, this requires the introduction of ‘improper’ functions, that is, eigenfunctions that do not belong to the Hilbert space, with self-contradictory properties” (see [354], Introduction; see also p. 223). This reservation motivated von Neumann to develop a rigorous theory of “observables” as unbounded operators. It is instructive to review “simple” aspects of von Neumann's inferences to get a glimpse of early developments.

7.1.3 Simultaneous observability

Von Neumann set up a logical correspondence between physical measurements/observations $\mathcal{S}$ having real-valued outcomes and self-adjoint operators $S$ on Hilbert space to which the spectral theorem could be applied and used as a fundamental tool for quantifying the extent to which simultaneous observations of a system could be made. The idea of measuring a system in a given “state” is quantified by an operator on an element $\phi$ of the Hilbert space that corresponds to this state. We will not review details of this correspondence (see [354], § IV.1) here except to say that it assigns a concrete mathematical meaning to the expectation of an observed real value. Namely, if $S$ is the operator for the quantity $\mathcal{S}$ then the expectation that the outcome of $\mathcal{S}$ “in state $\phi$” lies in an interval $I$ is $E(I,S)\phi = \int_I d\,\langle P(\lambda, S)\phi, \phi\rangle$. Joint observability of two physical quantities $\mathcal{R}, \mathcal{S}$ with operators $R, S$ then refers to the possibility of computing the joint expectation that simultaneous observations with operators $R$ and $S$, both in state $\phi$, will lie in a given rectangle $I \times J$. The point of departure for a theory of simultaneous observables is the following (see [354], p. 200).

Proposition 7.1.2.
Suppose that $R, S$ are self-adjoint operators on the separable Hilbert space $H$. The expectation that, in the state $\phi$, the quantities $\mathcal{R}, \mathcal{S}$ with the operators $R, S$ take on the values from the respective intervals $I, J$ is defined, for all pairs $I, J$, and all $\phi$, precisely when the operators commute. Then the expectation is $\|E(I,R)E(J,S)\phi\|^2$.

The proposition extends to any finite family of operators. Though its primary interest lies in the case of unbounded operators with continuous spectra, its proof is considerably simpler in the case of bounded operators having simple, discrete spectra as we outline here. Such operators possess complete, orthonormal bases of eigenfunctions.

Proof. If $S$ is such an operator, then
\[
E(J,S)\,\phi = \int_J dP(\lambda, S)\,\phi = \sum_{\lambda_k(S) \in J} \langle \phi, \phi_k\rangle\, \phi_k \quad \text{and} \quad \|E(J,S)\,\phi\|^2 = \sum_{\lambda_k(S) \in J} |\langle \phi, \phi_k\rangle|^2
\]
where $\phi_k$ is a normalized eigenfunction for $\lambda_k$. One is not concerned with any particular ordering of the $\{\lambda_k\}$ at this stage. Because the eigenfunctions for different eigenvalues form an orthonormal basis for $H$, $J \mapsto \langle E(J,S)\phi, \phi\rangle$ defines a probability measure for fixed $\phi$ ($\|\phi\| = 1$) while $\phi \mapsto E(J,S)\phi$ defines the orthogonal projection onto the subspace spanned by those eigenfunctions for the eigenvalues in $J$. The relevance of commutation is clear since, under the hypotheses, $R, S$ commute if and only if they possess a complete set of simultaneous eigenfunctions $\{\phi_k\}$. Thus $R\phi = \sum_k \lambda_k(R)\, \langle \phi, \phi_k\rangle\, \phi_k$ while $S\phi = \sum_k \lambda_k(S)\, \langle \phi, \phi_k\rangle\, \phi_k$. Then $f \mapsto E(I,R)f = \sum_{\lambda_k(R) \in I} \langle f, \phi_k\rangle\, \phi_k$ and
\[
E(J,S)\, E(I,R)\,\phi = \sum_{\lambda_k(S) \in J \text{ and } \lambda_k(R) \in I} \langle \phi, \phi_k\rangle\, \phi_k = E(I,R)\, E(J,S)\,\phi
\]
while
\[
\|E(J,S)\, E(I,R)\,\phi\|^2 = \sum_{\lambda_k(S) \in J \text{ and } \lambda_k(R) \in I} |\langle \phi, \phi_k\rangle|^2 .
\]
Thus if $R, S$ commute then $\mathcal{R}, \mathcal{S}$ are jointly observable. Conversely, suppose again that $R, S$ each have discrete spectra. Set
\[
R = \sum_k \lambda_k(R)\, \langle \cdot, \phi_k\rangle\, \phi_k \quad \text{while} \quad S = \sum_k \lambda_k(S)\, \langle \cdot, \psi_k\rangle\, \psi_k
\]
where $\{\phi_k\}$ and $\{\psi_k\}$ are, again, complete orthonormal bases of eigenfunctions, but now for $R, S$ respectively. Assume that $R, S$ do not commute. Then one can relabel the eigenvalues and eigenfunctions in such a way that $\phi_k$ now refer to those joint eigenfunctions (if any) with eigenvalues $\lambda_k(R)$ and $\lambda_k(S)$ while $\psi_k(R)$ and $\psi_k(S)$ refer to those distinct eigenfunctions (there is at least one pair) with eigenvalues $\mu_k(R)$ and $\mu_k(S)$ of $R$ and $S$, respectively. To make matters clear we shall take intervals $I$ and $J$ small enough to contain the unique eigenvalues $\mu_N(R)$ and $\mu_N(S)$, respectively. Then
\[
E(J,S)\, E(I,R)\,\phi = \langle \phi, \psi_N(R)\rangle\, \langle \psi_N(R), \psi_N(S)\rangle\, \psi_N(S) \neq \langle \phi, \psi_N(S)\rangle\, \langle \psi_N(S), \psi_N(R)\rangle\, \psi_N(R) = E(I,R)\, E(J,S)\,\phi
\]
unless $\phi$ is orthogonal to both $\psi_N(R)$ and $\psi_N(S)$. Therefore the quantities $\mathcal{R}, \mathcal{S}$ with operators $R, S$ are not jointly observable when $R, S$ fail to commute. This proves the proposition under the added assumptions.

The role of commutativity can be seen in the case in which $R, S$ are unbounded by considering the commutation, or lack thereof, of each of their
spectral projections (see [354], p. 225). A few extra observations can be made in the case of simple, discrete spectra. First, an exact measurement of an observable $\mathcal{R}$ with operator $R$ is possible if and only if $R$ has a discrete spectrum. Mathematically, the measurement $\mathcal{R}$ puts the state in the range of $R$. If the numerical measurement of $\mathcal{R}$ is made once again, immediately following the initial measurement, then, since $R\psi$ evolves continuously under the Schrödinger semigroup, the measured value, call it $\lambda^*$, will be the same, with certainty. Consequently, $\lambda^*$ must be an eigenvalue $\lambda_M$ and the state, upon measurement, must be a corresponding eigenstate $\phi_M$. If $R, S$ commute and $F$ is a function such that $F(\lambda_n) = \mu_n$ then one can write $S = F(R)$ in the sense of spectral calculus. Physically this means that the experimental outcome of $\mathcal{R}$ determines the outcome of $\mathcal{S}$. Reversing the roles of $\mathcal{R}$ and $\mathcal{S}$, one sees that simultaneous observability of $\mathcal{R}$ and $\mathcal{S}$ is characterized by commutativity of $R, S$. When $R, S$ have continuous spectra the situation is more complicated because exact measurement is no longer possible.

The extent to which self-adjoint operators $R, S$ fail to commute can be quantified in terms of uncertainty relations. When both $R, S$ have discrete spectra one can define the subspace $X_{RS}$ of $L^2$ spanned by their joint eigenvectors $\phi_1, \phi_2, \ldots$, which can be completed to an orthonormal basis for $L^2$ by appending functions $\psi_1, \psi_2, \ldots$. Fix distinct numbers $\lambda_1, \lambda_2, \ldots$ and $\mu_1, \mu_2, \ldots$ and define a linear operator $T$ by $T(\phi_k) = \lambda_k \phi_k$ and $T(\psi_k) = \mu_k \psi_k$. Now hypothesize an experiment $\mathcal{T}$ whose measurements produce one of the eigenstates $\phi_k$ or $\psi_k$. If $\mathcal{T}$ produces a $\phi_k$ then one can infer the outcome of immediately ensuing $\mathcal{R}$ and $\mathcal{S}$-experiments as above; however, if a $\psi_k$ is produced then one cannot determine the results of $\mathcal{R}, \mathcal{S}$-experiments.
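The dependence of joint observability on commutation of spectral projections can be seen already in two dimensions. The following sketch is illustrative only: the rank-one projections and the state are hypothetical choices, and `joint` is an assumed helper computing $\|E_2 E_1 \phi\|^2$ for projections applied in a given order.

```python
import math

def projector(v):
    """Orthogonal projection onto the line spanned by the real unit vector v."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def apply(M, x):
    return [M[i][0] * x[0] + M[i][1] * x[1] for i in range(2)]

def norm_sq(x):
    return x[0] * x[0] + x[1] * x[1]

def joint(first, second, state):
    """|| second * first * state ||^2: apply `first`, then `second`."""
    return norm_sq(apply(second, apply(first, state)))

s = 1.0 / math.sqrt(2.0)
E_R = projector([1.0, 0.0])        # plays the role of E(I, R)
E_S_comm = projector([1.0, 0.0])   # E(J, S) when S shares the eigenbasis of R
E_S_rot = projector([s, s])        # E(J, S) when the eigenbasis of S is rotated by 45 degrees

phi = [1.0, 0.0]                   # unit state vector

commuting = (joint(E_R, E_S_comm, phi), joint(E_S_comm, E_R, phi))
noncommuting = (joint(E_R, E_S_rot, phi), joint(E_S_rot, E_R, phi))
print(commuting, noncommuting)
```

In the commuting case both orders agree, so a joint expectation is well defined; in the rotated case the two orders give different values, so no order-independent joint expectation can be assigned.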
For the hypothetical experiment $\mathcal{T}$ just described, what one can say is that if $\mathcal{T}$ is performed on a system in state $\phi$ then the expectation that the ensuing state is one of the $\psi_k$ is given by $\|P_{X_{RS}^\perp}\phi\|^2$.

Uncertainty as a variational inequality. The operators $X : f(x) \mapsto x f(x)$ and $D : f(x) \mapsto f'(x)/(2\pi i)$ are both unbounded with empty discrete spectrum. Their noncommutativity is reflected quite clearly in the one-variable integration by parts formula $\int f = \int (xf)' - \int x f'$, which says that their commutator $[D, X]$ satisfies $[D, X] = DX - XD = I/(2\pi i)$. In general, having a nontrivial commutator implies that the joint variance of a pair of operators must be large. The expected values of the observables $\mathcal{R}, \mathcal{S}$ in state $\phi$ are $\langle R\phi, \phi\rangle$ and $\langle S\phi, \phi\rangle$, respectively, while their variances or dispersions are defined, respectively, as
\[
\mathrm{var}(R,\phi) = \|R\phi - \langle R\phi, \phi\rangle\, \phi\|^2 = \|R\phi\|^2 - |\langle R\phi, \phi\rangle|^2,
\]
\[
\mathrm{var}(S,\phi) = \|S\phi - \langle S\phi, \phi\rangle\, \phi\|^2 = \|S\phi\|^2 - |\langle S\phi, \phi\rangle|^2 .
\]
Theorem 7.1.3. Let $R, S$ be self-adjoint, not necessarily bounded linear operators on the Hilbert space $H$. If $\phi$ is a unit vector in $\mathrm{Dom}(R^2) \cap \mathrm{Dom}(S^2) \cap \mathrm{Dom}([R,S])$ then
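\[
|\langle [R,S]\,\phi, \phi\rangle|^2 \leq 4\, \mathrm{var}(R,\phi)\, \mathrm{var}(S,\phi).
\]
Direct estimation of the variance of $rR + isS$ yields
\[
r^2 \big[\|R\phi\|^2 - |\langle R\phi, \phi\rangle|^2\big] + s^2 \big[\|S\phi\|^2 - |\langle S\phi, \phi\rangle|^2\big] \geq 2rs\, \mathrm{Im}\, \langle R\phi, S\phi\rangle .
\]
The theorem follows from taking $s^2 = \mathrm{var}(R,\phi)$ and $r^2 = \mathrm{var}(S,\phi)$ and noting that $2i\, \mathrm{Im}\, \langle R\phi, S\phi\rangle = \langle [S,R]\phi, \phi\rangle$.

In [354], von Neumann was particularly interested in the case in which the Hermitian operators $P, Q$ are canonically conjugate, meaning that $[P,Q] = i\alpha I$ in which $\alpha$ is a real scalar. Naturally, this identity is only assumed to hold on the domain of the commutator, which may not even be dense when $P, Q$ are unbounded. Since $|2\, \mathrm{Im}\, \langle P\phi, Q\phi\rangle| = |\alpha|\, \|\phi\|^2$ whenever $\phi \in \mathrm{Dom}([P,Q])$, Cauchy–Schwarz implies that, for unit vectors $\phi$,
\[
\frac{\alpha^2}{4}\, \|\phi\|^2 \leq \|P\phi - \langle P\phi, \phi\rangle\, \phi\|^2\, \|Q\phi - \langle Q\phi, \phi\rangle\, \phi\|^2 . \tag{7.2}
\]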
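The inequality of Theorem 7.1.3 can be tested numerically for matrices, where domain questions do not arise. The sketch below is an illustration under stated assumptions: the operators are taken to be the Pauli matrices $\sigma_x$ and $\sigma_y$ (a hypothetical choice, not from the text), the unit states are sampled on a grid of angles, and the inner product is linear in its first argument, matching the convention $\langle R\phi, \phi\rangle$ used above.

```python
import cmath
import math

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def inner(u, v):
    """<u, v>, linear in u and conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def variance(M, phi):
    """var(M, phi) = ||M phi||^2 - |<M phi, phi>|^2 for a unit vector phi."""
    M_phi = apply(M, phi)
    return inner(M_phi, M_phi).real - abs(inner(M_phi, phi)) ** 2

def commutator_term(R, S, phi):
    """|<[R, S] phi, phi>|^2 with [R, S] = RS - SR."""
    RS_phi = apply(R, apply(S, phi))
    SR_phi = apply(S, apply(R, phi))
    diff = [a - b for a, b in zip(RS_phi, SR_phi)]
    return abs(inner(diff, phi)) ** 2

# Assumed test operators: the Pauli matrices sigma_x and sigma_y.
R = [[0, 1], [1, 0]]
S = [[0, -1j], [1j, 0]]

def both_sides(theta, beta):
    """Left and right sides of the inequality for phi = (cos theta, e^{i beta} sin theta)."""
    phi = [complex(math.cos(theta)), cmath.exp(1j * beta) * math.sin(theta)]
    return commutator_term(R, S, phi), 4.0 * variance(R, phi) * variance(S, phi)

samples = [both_sides(0.1 * a, 0.7 * b) for a in range(16) for b in range(9)]
print(all(lhs <= rhs + 1e-12 for lhs, rhs in samples))
```

For the state $\phi = (1, 0)$, the first sample, both sides equal 4, so the inequality is attained in this example.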
Inequality (7.2) is a special case of Theorem 7.1.3, just as Wiener's uncertainty principle is the special case in which $Q$ is the position operator and $P$ is the momentum operator. As von Neumann noted, one has equality in (7.2) precisely when $P\phi$ is a pure imaginary multiple of $Q\phi$. In the case of the position and momentum operators this happens precisely for Gaussian functions as Heisenberg observed.

7.1.4 Physical considerations of indeterminacy

Von Neumann himself was disinclined to proclaim that his correspondence between physical observables and self-adjoint operators by itself implied fundamental limitations regarding natural observations (see [354], p. 237): “With the foregoing considerations, we have comprehended only one phase of the uncertainty relations, that is, the formal one; for a complete understanding of these relations, it is still necessary to consider them from another point of view: from that of direct physical experience. For the uncertainty relations bear a more easily understandable and simpler relation to direct experience than many of the facts on which quantum mechanics was originally based, and therefore the above, entirely formal, derivation does not do them full justice.” See also Penrose [296] for comments on lingering reservations of de Broglie and others.

Although Einstein introduced the concept of photons in 1905 and postulated that their energy levels come in discrete quanta, it was not until 1922 that Arthur Holly Compton experimentally observed the collision of photons with electrons. The elastic scattering of the photon is called the Compton effect when the interaction can be regarded as the collision of two otherwise free particles, which is reasonable when the photon energy is at least on the order of magnitude of the rest energy of the electron. Compton observed a
discrepancy between wavelengths of x-rays scattered at various angles from thin targets of light elements, versus their incident wavelengths, taking the form ~ λscattered − λincident = (1 − cos θ) me c in which θ is the angle between the scattered and incident photon directions. Assuming conservation of momentum, the scattered photon energy ~ν 0 must be smaller than the incident energy ~ν, as is accounted for by electron recoil. To justify the mathematical formalism, von Neumann sought to quantify the uncertainty relations as they arise particularly in Compton’s experiment. Thus one shoots a photon stream P straight at an electron E. P is scattered, say, onto a photographic plate. One assumes that P has coherent frequency ν = c/λ, that is, its variation is infinitesimal, and momentum p = ~/λ, and that one has precise knowledge of the direction of firing. Then one can measure the position of E if one knows the direction of scattering of the photon. There would be uncertainty in the resulting momentum of E if the collision process were not known; however, one can infer its nature from the change in direction of P . Production of a coherent photon stream requires a focusing lens. To focus light of wavelength λ onto an element of surface of radius ε one should apply a lens of aperture φ such that λ/ε ∼ 2 sin φ/2. The direction of reflection of the light quanta through the lens then is only known to lie between −φ/2 and φ/2. Thus its momentum is only known to within an error of 2(sin φ/2) = (λ/ε)(~/λ) = ~/ε. That is, if the target position is known to within ε then the photon momentum is only known to within ~/ε. One still must account for the possibility of measuring momentum first then position with arbitrary precision. One can measure momentum without relying on position by means of the Doppler effect. 
If light of frequency $\nu_0$ is emitted from a particle $E$ moving at velocity $v$ then an observer at rest measures a frequency $\nu$ satisfying $(\nu - \nu_0)/\nu_0 = (v/c)\cos\theta$, where $\theta$ is the angle between the direction of motion and of emission. The velocity of $E$ can then be inferred if $\nu$ is measured and $\nu_0$ is known, for example, in terms of a spectral line of a particular element. Then the component of $v$ in the direction of the observation is measured as $v\cos\theta = c(\nu - \nu_0)/\nu_0$. Upon multiplication by the mass $m$ of $E$ this gives the component of momentum $p_\theta$. The dispersion $\eta$ of $p_\theta$ depends on that of $\nu$ via
\[ \eta \sim \frac{mc\,\Delta\nu}{\nu_0} \sim \frac{mc\,\Delta\nu}{\nu}. \]
The observed frequency ν will be sharply localized only in the case of pure monochromatic light, the electric component of whose field has the form a sin(2π((q/λ) − νt) + α) in which q is the spatial coordinate, t the time, a the amplitude and α the phase. Now one is faced with the problem of spatial extension of sinusoids. Setting λ = c/ν, the component of field strength should be damped to have the form Fν ((q/c) − t). The optimal damping will be
Gaussian. This brings one to more familiar mathematical territory because now the problem of joint position–momentum localization is recast in the language of Fourier transforms. If the monochromatic wave train modelled by $F_\nu$ has length $\tau$ in $t$ and $c\tau$ in $q$, then the Fourier transform shows that $\nu$ must have dispersion on the order of $1/\tau$.

Back to physical grounds, dispersion can be viewed as arising from recoil of $E$, on the order of $\hbar\nu/c$ in the direction of observation, due to photon emission. This results in a velocity change of $\hbar\nu/(mc)$. Since the emission process must take some time $\tau$ if it is to give rise to a well-defined wave train, one cannot localize the time during which this change in velocity occurred to an interval of length less than $\tau$. Hence an indeterminacy of position $\varepsilon \sim \hbar\nu\tau/(mc)$ results. Since $\eta \sim mc\,\Delta\nu/\nu$, the product of the two indeterminacies satisfies
\[ \varepsilon\eta \sim \frac{\hbar\nu\tau}{mc}\cdot\frac{mc\,\Delta\nu}{\nu} = \hbar\,\Delta\nu\,\tau \sim \hbar \]
as Heisenberg’s principle states. Pinning down momentum necessarily implies an indeterminate position.

Von Neumann’s uncertainty principle thus attended to the issue of simultaneous observability with a view toward explaining the necessarily probabilistic nature of localizing particles. Other very basic issues remained, including an attempt to explain the structure of large atoms and of bulk matter, which require other mathematical uncertainty inequalities.
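The Fourier-analytic step in this argument, that a wave train of duration $\tau$ has frequency dispersion on the order of $1/\tau$, can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the envelope is taken to be the Gaussian $e^{-t^2/(2\tau^2)}$, a complex exponential replaces the real sinusoid so that the spectrum has a single peak, and the function name `spreads`, the carrier frequency and the grid sizes are choices made here rather than anything prescribed in the text.

```python
import numpy as np

def spreads(tau, nu0=20.0, n=2**14, span=64.0):
    """Standard deviations in time and in frequency of a Gaussian wave train.

    The train is exp(-t^2 / (2 tau^2)) * exp(2 pi i nu0 t), sampled at n points
    over an interval of length `span` (all parameters are illustrative).
    """
    dt = span / n
    t = (np.arange(n) - n // 2) * dt
    f = np.exp(-t**2 / (2 * tau**2)) * np.exp(2j * np.pi * nu0 * t)
    nu = np.fft.fftfreq(n, d=dt)      # frequency grid matching np.fft.fft
    F = np.fft.fft(f)

    def std(x, weight):
        # standard deviation of x with respect to the normalized weight
        w = weight / weight.sum()
        mean = (x * w).sum()
        return float(np.sqrt(((x - mean) ** 2 * w).sum()))

    return std(t, np.abs(f) ** 2), std(nu, np.abs(F) ** 2)

for tau in (0.5, 1.0, 2.0):
    d_t, d_nu = spreads(tau)
    print(tau, d_t, d_nu, d_t * d_nu)
```

For each $\tau$ the product of the two standard deviations should be close to $1/(4\pi)$, the value attained by Gaussians, so that the frequency spread is proportional to $1/\tau$.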
7.2 Eigenvalue estimates for Schrödinger operators

7.2.1 Stability of the hydrogen atom

Perhaps the most basic stability issue concerning the nature of atoms can be phrased as: why does a bound electron not fall into the nucleus of an atom? A little more precisely, stability of an electron refers to the principle that its ground-state energy is finite. The hydrogen atom has Hamiltonian $H = -\Delta - k|x|^{-1}$ in $\mathbb{R}^3$ (cf. [261]). The mathematical assertion of its finite ground-state energy is $\langle\psi, H\psi\rangle \ge E_0\langle\psi,\psi\rangle$ for some $E_0 > -\infty$ and all $\psi \in L^2(\mathbb{R}^3)$.

The ostensible justification is Heisenberg’s inequality. Taking $\|\nabla\psi\|^2$ as the kinetic energy and $\|x\psi\|^2$ as the spatial variation, one has $\|\nabla\psi\|^2\,\|x\psi\|^2 \ge c$ with $c$ independent of $\psi$. The kinetic energy must grow like $1/R^2$ as the radius of the electron falls to $R$, implying that $\langle\psi, H\psi\rangle \gtrsim 1/R^2 - k/R$, which has minimum $-k^2/4$ at $R = 2/k$.

As Lieb [258] pointed out, this argument assumes without justification that $\psi$ takes the form of a well-localized wave packet. If $\psi = \psi_1 + \psi_2$ where $\int|\psi_1|^2 = 1/2 = \int|\psi_2|^2$ and $\psi_1$ is a narrow packet concentrated about radius $R$ while $\psi_2$ is spherically symmetric and supported in a shell of mean radius $L$ then $\int|x\psi_2|^2 \approx L^2/2$ while $\int|\psi|^2/|x| \approx 1/(2R)$. Heisenberg’s inequality then only implies that $\|\nabla\psi\|^2 \gtrsim c(2/L^2)$ so $\langle\psi, H\psi\rangle \gtrsim c(2/L^2) - k/R$. Assuming freedom to choose $L$, this does not preclude an arbitrarily large negative value $E_0$ by letting $R \to 0$. Thus, to argue for
stability of an atom one needs an inequality sharper than Heisenberg’s, one that reflects that spatial compression at any location must lead to an increase in kinetic energy. One form of Sobolev’s inequality in $n$ variables is $\|\psi\|_{L^{2n/(n-2)}} \le c_n\|\nabla\psi\|_{L^2}$ with a known sharp constant $c_n$. In $\mathbb{R}^3$ this implies a lower bound for $E_0$. With $H$ as above the Sobolev inequality implies that for any $\psi$,
\[ \langle\psi, H\psi\rangle \ge \frac{1}{c_3^2}\,\|\psi\|^2_{L^6(\mathbb{R}^3)} - k\int|\psi(x)|^2\,\frac{dx}{|x|}. \tag{7.3} \]
An extremal for this last inequality is easily computed and given by $|\psi|^2 = \alpha\,(1/|x| - 1/R)^{1/2}\chi_{(0,R)}(|x|)$ with $R = R(c_3, k)$. Though (7.3) does not yield the exact value for $E_0$, it does give the correct order of magnitude [261].

This leaves the problem of actually determining the possible states, that is, solutions of (7.1) depending on the potential $V$. Finding useful expressions for eigenfunctions for general $V$ is hopeless. Even estimating such functions is an extremely complicated matter (cf. Theorem 7.2.17). In this section we will review the work of Fefferman and Phong [136] who used basis functions (essentially piecewise linear wavelets, thought of very roughly as approximate eigenfunctions) to estimate the number of negative eigenvalues corresponding to bound states of certain Schrödinger operators. Although Fefferman and Phong also obtained eigenvalue estimates in the case of polynomial potentials such as the quantum harmonic oscillator, we will only review the case of potentials that are bounded at infinity in an appropriate sense.

Why should wavelets bear on eigenvalue estimates for Schrödinger operators? Folk wisdom held that the number of small eigenvalues is governed by the volume of the subset of $\mathbb{R}^n\times\mathbb{R}^n$ on which the symbol $\xi^2 + V$ is not too large. This is called volume counting. It works for many potentials, in particular for Coulomb potentials that arise in the most primitive statements regarding stability of matter. But there are very basic examples in which it fails to give accurate eigenvalue estimates.
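Volume counting can be tried out on the one-dimensional quantum harmonic oscillator $-d^2/dx^2 + x^2$, whose symbol is $\xi^2 + x^2$. The Python sketch below is only an illustration, not the method of [136]: it assumes a second-order finite-difference discretization on a truncated interval with Dirichlet ends, and it assumes the standard semiclassical normalization of one state per phase-space volume $2\pi$. The function names and grid parameters are hypothetical choices made here.

```python
import numpy as np

def count_eigenvalues_below(lam, half_width=10.0, m=800):
    """Number of eigenvalues below lam of a finite-difference model of -u'' + x^2 u."""
    x = np.linspace(-half_width, half_width, m)
    h = x[1] - x[0]
    main = 2.0 / h**2 + x**2              # diagonal: discrete -d^2/dx^2 plus potential
    off = -np.ones(m - 1) / h**2          # off-diagonals of the discrete Laplacian
    H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return int(np.sum(np.linalg.eigvalsh(H) < lam))

def phase_space_volume(lam):
    """Area of {(x, xi) : xi^2 + x^2 < lam}, a disc of radius sqrt(lam)."""
    return float(np.pi * lam)

lam = 20.0
print(count_eigenvalues_below(lam), phase_space_volume(lam) / (2 * np.pi))
```

The exact eigenvalues of this oscillator are $1, 3, 5, \dots$, so for this potential the eigenvalue count and the normalized volume are expected to agree; the particle in a box discussed in Section 7.2.2 shows that such agreement cannot be taken for granted.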
Piecewise linear (PL) wavelets are extraordinarily well localized in space. Consequently one can hope that they provide morally accurate eigenvalue estimates when those estimates are given in terms of spatial averages of $V$. This view pans out more or less, but only after some intricate analysis.

7.2.2 Volume counting and its deficiencies

One of the most basic questions that one can ask about a Schrödinger operator is: how many bound states can one associate to $H$? Such states are true eigenfunctions of $H$. When $V$ can be thought of as a short-range potential such as an attractive Coulomb force the corresponding eigenvalue is negative. Since $-\Delta$ is non-negative, in order that $H\psi = \lambda\psi$ for some $\lambda < 0$, $V$ must be
negative somewhere. In what follows, it will be convenient simply to assume that the potential is negative. Thus we will adopt the convention of writing $H = -\Delta - V$, thinking of $V$ as being always, or predominantly, non-negative.

Volume counting seeks a relationship between the number of eigenvalues of $H$ of a given size and the number of Heisenberg tiles needed to cover the region of $\mathbb{R}^n\times\mathbb{R}^n$ on which the symbol satisfies a corresponding size estimate. Because of the geometric nature of estimates involved it is necessary, and physically correct, to work in $\mathbb{R}^n$, $n \ge 3$. One denotes by $N(\lambda, V)$ the number of true eigenvalues smaller than $\lambda$ and by $\mathrm{vol}(\lambda, V) = |\{(x,\xi) : \xi^2 - V(x) < \lambda\}|$. The following is due to Cwikel–Lieb–Rozenblum (see [136]).

Theorem 7.2.1. If $n \ge 3$ then, in $\mathbb{R}^n$, $N(\lambda, V) \le C_n\,\mathrm{vol}(\lambda, V)$.

An immediate corollary is that if $\mathrm{vol}(\lambda, V) < 1/C_n$ then $H \ge \lambda I$. This particular consequence can be viewed, in turn, as a consequence of Sobolev’s inequality when $\lambda = 0$ and $V \ge 0$ since then
\[ \int |u|^2 V \le \|u\|^2_{L^{2n/(n-2)}}\,\|V\|_{L^{n/2}} \le c_n^2\,\|\nabla u\|_2^2\,\|V\|_{L^{n/2}} = \bigl(c_n/\omega_n^{1/n}\bigr)^2\,|\{(x,\xi) : \xi^2 < V(x)\}|^{2/n}\,\|\nabla u\|_2^2 \le \|\nabla u\|_2^2 \]
if we take $C_n = c_n^n/\omega_n$ where $c_n$ is the constant in the Sobolev inequality of Section 7.2.1 and $\omega_n$ is the volume of the unit ball in $\mathbb{R}^n$. Thus, sharpening Sobolev’s inequality is very much in the spirit of improving Theorem 7.2.1. However, volume counting itself does not always yield effective eigenvalue estimates as the following simple example illustrates. Further examples can be found in [136].

Particle in a box. Let $B$ be a product of intervals having lengths $\delta_1 \le \delta_2 \le \cdots \le \delta_n$ and set $H = -\Delta - E\chi_B$. We will see in Section 7.2.5 that if $E > E_0(\delta_1,\dots,\delta_n) \sim [\delta_1\delta_2\log(1 + \delta_3/\delta_2)]^{-1}$ then $H$ can have negative eigenvalues. However, $\mathrm{vol}\{(x,\xi) : \xi^2 < E\chi_B\} \sim E^{n/2}\prod\delta_i$. To make this product bigger than one, as volume counting predicts it should be, one requires $E > (\prod\delta_i)^{-2/n}$, a quantity that bears no relation to $E_0(\delta_1,\dots,\delta_n)$.

7.2.3 Fefferman–Phong eigenvalue estimates

In what follows we will continue to write $H = -\Delta - V$, $V \ge 0$. We will work in $\mathbb{R}^n$ for $n \ge 3$, denoting by $\mathcal{Q}$ the collection of dyadic cubes with generic element $Q \in \mathcal{Q}$ and by $\mathcal{Q}_j$ those $Q \in \mathcal{Q}$ with $|Q| = 2^{-nj}$. One is particularly interested in: (i) the magnitude of the smallest negative eigenvalue $\lambda_1(H)$ and (ii) how many negative eigenvalues $H$ can have. The largest mean value $\mu(V, B) = \int_B V/|B|$ of the potential provides a naive initial guess for the magnitude of $\lambda_1$. A safer bound is given by the maximum $p$-mean, $\mu_p(f, Q) = \bigl(|Q|^{-1}\int_Q |f|^p\bigr)^{1/p}$. For fixed $0 < c < C$ define
\[ E_{\rm sm} = \sup_{x_0,\delta}\bigl(\mu(V, B(x_0;\delta)) - C\delta^{-2}\bigr) \le \sup_{x_0,\delta}\bigl(\mu_p(V, B(x_0;\delta)) - c\,\delta^{-2}\bigr) = E_{\rm big}. \]
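To see how means and $p$-means compare with the threshold $\delta^{-2}$, the following Python sketch evaluates discrete analogues of $\mu$ and $\mu_p$ for a Coulomb-type potential $V(x) = 1/|x|$ on cubes centred at the origin in $\mathbb{R}^3$. It is a hedged illustration only: cubes are used in place of balls, the integrals are replaced by averages over a midpoint grid, and the names `p_mean` and `coulomb_samples` are hypothetical helpers introduced here.

```python
import numpy as np

def p_mean(values, p):
    """Discrete analogue of mu_p(f, Q) = (|Q|^{-1} int_Q |f|^p)^{1/p}."""
    return float(np.mean(np.abs(values) ** p) ** (1.0 / p))

def coulomb_samples(side, m=40):
    """Samples of V(x) = 1/|x| on a midpoint grid of the cube [-side/2, side/2]^3.

    With m even no grid point falls on the singularity at the origin.
    """
    pts = (np.arange(m) + 0.5) / m * side - side / 2
    X, Y, Z = np.meshgrid(pts, pts, pts, indexing="ij")
    return 1.0 / np.sqrt(X**2 + Y**2 + Z**2)

for side in (1.0, 0.1):
    V = coulomb_samples(side)
    # mean, p-mean with p = 3/2, and the scale-invariant quantity side^2 * mu_p
    print(side, p_mean(V, 1.0), p_mean(V, 1.5), side**2 * p_mean(V, 1.5))
```

The $p$-mean dominates the mean, in keeping with $E_{\rm sm} \le E_{\rm big}$, while $\delta^2\mu_p$ shrinks in proportion to the side length $\delta$: the Coulomb singularity by itself does not make small cubes large relative to $\delta^{-2}$.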
One has [136] the following.

Theorem 7.2.2. If $V \ge 0$ and $1 < p < \infty$ then there are constants $c, C$ depending only on $n$ and $p$ such that one has $cE_{\rm sm} \le -\lambda_1(H) \le CE_{\rm big}$. Consequently, if $E_{\rm big} < 0$ then $-\Delta - V \ge 0$.

The techniques for this estimate also have implications for estimating the number of negative eigenvalues as Fefferman and Phong showed, and as we review here. One says that $f$ belongs to the Morrey space $M^p_\rho(\mathbb{R}^n)$ ($0 \le \rho < n$) if
\[ \|f\|_{M^p_\rho} = \sup_Q\Bigl(|Q|^{-\rho/n}\int_Q |f|^p\Bigr)^{1/p} < \infty \tag{7.4} \]
where the supremum is taken over all cubes in $\mathbb{R}^n$. Morrey norms are analogous in some ways to Sobolev norms. In this section we are concerned with potentials that nearly belong to the Morrey space $M^p_{n-2p}$ for some $1 < p < n/2$ in the sense that $|Q|^{2/n}\mu_p(V, Q)$ is small except on some specific set of bad cubes. As such, the condition of membership in $M^p_{n-2p}$ is often referred to as the Fefferman–Phong $p$-bump condition and non-negative elements of $M^p_{n-2p}$ are called Fefferman–Phong weights.

Theorem 7.2.3. Let $V \ge 0$. If $Q_1,\dots,Q_N$ are cubes whose doubles $2Q_i$ are pairwise disjoint and if $\mu(V, Q_j) \ge c|Q_j|^{-2/n}$ for each cube $Q_j$, then $H = -\Delta - V$ has at least $N$ negative eigenvalues. Conversely, there is a constant $C > 0$ independent of $N$ such that, if $H$ has $CN$ negative eigenvalues, then there are cubes $Q_1,\dots,Q_N$ with disjoint doubles such that $\mu_p(V, Q_j) \ge c(n,p)|Q_j|^{-2/n}$ ($1 < p < \infty$).

The first statement is proved by defining an appropriate lattice of cubes over which the means satisfy $\mu(V, Q) \le c\lambda$ with the exception of a set of $N$ bad cubes [136]. We will only address the second, more difficult claim: negative eigenvalues imply disjoint cubes with large $p$-means. The starting point is a sort of wavelet Sobolev inequality using a multivariate version of the piecewise polynomial wavelet decomposition considered in Chapter 2.

To each cube $Q \in \mathcal{Q}$ one associates a space $V_1(Q)$ of functions living in $Q$ that are piecewise linear (PL) on each of the dyadic offspring of $Q$. One thinks, loosely, of $V_1(Q)$ as the space of functions concentrated on one of the tiles over which $|\xi|^2 - V$ has negative average. Then $V_1(Q)$ is the sum of $V_0(Q)$, those functions that are linear on all of $Q$, and $W(Q)$, the orthogonal complement of $V_0(Q)$ in $V_1(Q)$. Any $L^2$ function can be expanded in an orthogonal series $u = \sum_Q P_{W(Q)}(u)$ in which the sum of the projections $P_{W(Q)}$ extends over all dyadic cubes.
The PL-wavelets satisfy an inequality analogous to $\sum_{j,k} 2^{2j}|\langle f, \psi_{jk}\rangle|^2 \le C\|\nabla f\|_2^2$, the case $s = 1$ of Theorem 2.1.2. It says, very roughly, that the PL-wavelets almost diagonalize the Laplace operator.
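A one-dimensional analogue of this weighted inequality can be explored numerically. The Python sketch below is a hedged illustration, not the multivariate construction of [136]: it assumes $n = 1$, works only with dyadic intervals inside $[0,1)$, discretizes $L^2[0,1)$ on a uniform grid, and uses the identity $\|P_{W(Q)}u\|^2 = \|P_{V_1(Q)}u\|^2 - \|P_{V_0(Q)}u\|^2$, which holds because $V_0(Q) \subset V_1(Q)$. The helper names are hypothetical.

```python
import numpy as np

def linear_energy(u, dx, blocks):
    """Squared norm of the projection of u onto linear functions, per subinterval.

    The grid function u on [0, 1) is cut into `blocks` equal subintervals and
    projected onto span{1, t} with a discretely orthonormal basis on each.
    """
    v = u.reshape(blocks, -1)
    m = v.shape[1]
    t = np.arange(m) - (m - 1) / 2.0             # centred grid coordinate
    e0 = np.ones(m) / np.sqrt(m * dx)            # normalized constant
    e1 = t / np.sqrt(np.sum(t**2) * dx)          # normalized centred linear function
    return ((v @ e0) * dx) ** 2 + ((v @ e1) * dx) ** 2

def weighted_level_sum(u, dx, j):
    """Sum over dyadic Q of length 2^-j of |Q|^-2 * ||P_W(Q) u||^2 (here n = 1)."""
    coarse = linear_energy(u, dx, 2**j)                           # energy in V_0(Q)
    fine = linear_energy(u, dx, 2 ** (j + 1)).reshape(-1, 2).sum(axis=1)  # energy in V_1(Q)
    return float(np.sum(fine - coarse) * 4.0**j)

n = 2**12
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
u = np.sin(2 * np.pi * x)
grad_energy = float(np.sum((2 * np.pi * np.cos(2 * np.pi * x)) ** 2) * dx)
levels = [weighted_level_sum(u, dx, j) for j in range(9)]
print(levels)
print(sum(levels) / grad_energy)
```

Because $W(Q)$ annihilates linear functions, the contribution of a smooth $u$ at fine scales is controlled by second differences, so the weighted level sums decay and their total stays comparable to $\|u'\|_2^2$. With a Haar-type space that annihilates only constants the weighted sums would not decay, which is one way to see why piecewise linear functions are used.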
Lemma 7.2.4.
\[ \sum_Q |Q|^{-2/n}\,\bigl\|P_{W(Q)}u\bigr\|_2^2 \le C\,\|\nabla u\|_2^2. \]

In contrast to Theorem 2.1.2 in which the wavelets would be at least $C^1$, the PL-wavelets are discontinuous so the lemma requires some effort, including estimates of Littlewood–Paley projections.

Proof. One starts with a smooth Littlewood–Paley decomposition $u = \sum_j u * w_j = \sum_j u * w_j * \tilde w_j$ such that the $\hat w_j(\xi) = \hat w(2^{-j}|\xi|)$ are compactly supported about the annulus $|\xi| \sim 2^j$ and $\sum_j \hat w_j(\xi) = 1$ when $\xi \ne 0$ while $\tilde w = w_{-1} + w_0 + w_1$. Write $u_j = u * w_j$. Setting $\psi = P_{W(Q)}u_j/\|P_{W(Q)}u_j\|_2$ gives
\[ \|P_{W(Q)}u_j\|_2 = |\langle \tilde w_j * u_j, \psi\rangle| = \Bigl|\iint \tilde w_j(x-y)\,u_j(y)\,\psi(x)\,dx\,dy\Bigr| \]
\[ = \Bigl|\iint \bigl((\tilde w_j(x-y) - \tilde w_j(x_Q-y)) - (x-x_Q)\cdot\nabla\tilde w_j(x_Q-y)\bigr)\,u_j(y)\,\psi(x)\,dx\,dy\Bigr| \]
since $\psi$ annihilates linear functions. By using smooth cutoffs one may assume that $\tilde w_j$ satisfies the estimates
\[ |\partial^\alpha \tilde w_j(x)| \le C_{\alpha,\beta}\,2^{(n+|\alpha|)j}\Bigl(\frac{1}{1+2^j|x|}\Bigr)^{\beta} \tag{7.5} \]
for some $\beta > n$. By Minkowski’s inequality, $\|P_{W(Q)}u\|_2 \le \sum_j \|P_{W(Q)}u_j\|_2$. Application of (7.5) to the second derivatives of $\tilde w_j$ together with the estimate $\|\psi\|_1 \le C|Q|^{1/2}$ yields
\[ \|P_{W(Q)}u_j\|_2 \le C\,|Q|^{1/2}\,|Q|^{2/n}\,2^{(n+2)j}\int\Bigl(\frac{1}{1+2^j|x_Q-y|}\Bigr)^{\beta}|u_j(y)|\,dy \]
or
\[ \|P_{W(Q)}u_j\|_2^2 \le C\,2^{-4k}\,2^{2nj}\int_Q\Bigl(\int\Bigl(\frac{1}{1+2^j|x-y|}\Bigr)^{\beta}|u_j(y)|\,dy\Bigr)^2 dx \]
whenever $Q \in \mathcal{Q}_{j+k}$, since then the value of $(1+2^j|x-\cdot|)^{-\beta} * u_j$ at $x_Q$ is essentially equal to its average over $Q$. Summing this inequality over all $Q \in \mathcal{Q}_{j+k}$, it follows that
\[ 2^{4k}\sum_{Q\in\mathcal{Q}_{j+k}}\|P_{W(Q)}u_j\|_2^2 \le C\int\Bigl(\int 2^{nj}\Bigl(\frac{1}{1+2^j|x-y|}\Bigr)^{\beta}|u_j(y)|\,dy\Bigr)^2 dx \le C\,\|u_j\|_2^2 \]
since the functions $2^{nj}(1+2^j|x-y|)^{-\beta}$ have the same $L^1$-norm and so form a uniformly $L^2$-bounded family of convolution kernels. Thus,
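\[ \sum_j\sum_{k\ge0}\sum_{Q\in\mathcal{Q}_{j+k}}\frac{|Q|^{-2/n}}{2^j|Q|^{1/n}}\,\bigl\|P_{W(Q)}u_j\bigr\|_2^2 = \sum_j\sum_{k\ge0}\sum_{Q\in\mathcal{Q}_{j+k}} 2^{2j+3k}\,\bigl\|P_{W(Q)}u_j\bigr\|_2^2 \]
\[ \le C\sum_j\sum_{k\ge0} 2^{2j-k}\,\|u_j\|_2^2 = C\sum_j 2^{2j}\,\|u_j\|_2^2 \le C'\,\|\nabla u\|_2^2 \]
by Plancherel and the Littlewood–Paley decomposition. Since
\[ \Bigl(\sum_{j:\,2^{-j}\ge|Q|^{1/n}}\bigl\|P_{W(Q)}u_j\bigr\|_2\Bigr)^2 \le C\sum_{j:\,2^{-j}\ge|Q|^{1/n}} 2^{-j}|Q|^{-1/n}\,\bigl\|P_{W(Q)}u_j\bigr\|_2^2 \]
by the Cauchy–Schwarz inequality, it follows from rearranging sums that
\[ \sum_Q |Q|^{-2/n}\Bigl(\sum_{j:\,2^{-nj}\ge|Q|}\bigl\|P_{W(Q)}u_j\bigr\|_2\Bigr)^2 \le C\,\|\nabla u\|_2^2. \]
On the other hand, since the projections $P_{W(Q)}$ are orthogonal, for fixed $j, k$ one has $\sum_{Q\in\mathcal{Q}_{j-k}}\|P_{W(Q)}u_j\|_2^2 \le \|u_j\|_2^2$ and
\[ \sum_j\sum_{Q:\,2^{-nj}<|Q|}\bigl\|P_{W(Q)}u_j\bigr\|_2^2\,|Q|^{-2/n}\,\bigl(2^{j}|Q|^{1/n}\bigr) \le C\sum_j 2^{2j}\,\|u_j\|_2^2 \le C'\,\|\nabla u\|_2^2. \]
Finally, since
\[ \Bigl(\sum_{j:\,2^{-nj}<|Q|}\bigl\|P_{W(Q)}u_j\bigr\|_2\Bigr)^2 \le C\sum_{j:\,2^{-nj}<|Q|}\bigl\|P_{W(Q)}u_j\bigr\|_2^2\,\bigl(2^{j}|Q|^{1/n}\bigr) \]
it follows that
\[ \sum_Q\bigl\|P_{W(Q)}u\bigr\|_2^2\,|Q|^{-2/n} \le \sum_Q\Bigl(\sum_j\bigl\|P_{W(Q)}u_j\bigr\|_2\Bigr)^2|Q|^{-2/n} \le C\,\|\nabla u\|_2^2 \]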
and this proves Lemma 7.2.4.

We shall need a second technical lemma involving the wavelet square function $S(u)(x) = \bigl(\sum_{Q\ni x}\|P_{W(Q)}(u)\|^2/|Q|\bigr)^{1/2}$. As with the usual Littlewood–Paley square function, one can establish weighted norm inequalities for $S$ as follows.

Lemma 7.2.5. If $u \in V_0(R)^\perp$ then
\[ \int_R |u|^2 V \le C\int_R S^2(u)\,M_p(V\chi_R) \quad\text{where}\quad M_p(V)(x) = \sup_{x\in Q}\mu_p(V, Q) \quad (p > 1). \]
The proof requires results from weighted norm inequalities whose details lie beyond the scope of the present discussion. The main point is that $M_pV$ is in $A_\infty$ (see Appendix). Standard properties of such weights imply (e.g., [151]) that for $u \in V_0(R)^\perp$,
\[ \int_R |u|^2 V \le \int_R |M_{\rm dy}u|^2\,M_pV \le C\int_R S^2(u)\,M_pV \]
where $M_{\rm dy}$ denotes the dyadic maximal function and $C$ depends on the $A_\infty$ property of $M_pV$.

One proves the second statement of Theorem 7.2.3 by showing that if one can find at most $M \le 5N$ cubes $Q_1,\dots,Q_M$ that have pairwise disjoint doubles and satisfy $\mu_p(V, Q_j) \ge c|Q_j|^{-2/n}$, then $H = -\Delta - V$ has at most $CN$ negative eigenvalues. To do so one builds a subspace $\mathcal{H} \subset L^2(\mathbb{R}^n)$ of codimension at most $CN$ on which $H \ge 0$. One constructs $\mathcal{H}^\perp$ by associating the subspace $V_1(Q)$ to each of a carefully chosen set of cubes $Q_1,\dots,Q_M$. The positivity of $H$ on $\mathcal{H}$ will follow from combining Lemmas 7.2.4–7.2.6.

First one defines a collection $\mathcal{B}$ of bad cubes consisting of all $Q \in \mathcal{Q}$ such that $\mu_p(V, Q) \ge \gamma|Q|^{-2/n}$ where $\gamma$ is a small constant to be chosen below. To guarantee a well-defined subcollection of minimal bad cubes $Q_1,\dots,Q_N$ one starts out assuming that $V$ is bounded: then $\gamma|Q|^{-2/n} \ge \|V\|_\infty \ge \mu_p(V, Q)$ if $|Q|$ is small. The eigenvalue estimates will not depend on $\|V\|_\infty$. On the minimal bad cubes one has control of $H$ as follows.

Lemma 7.2.6. Suppose that $\mu_p(V, R) \le \gamma|R|^{-2/n}$ for all dyadic subcubes $R$ of a fixed dyadic cube $Q$. If $u$ is orthogonal to $V_0(Q)$ one has
\[ \int_Q |u|^2 V \le C\gamma\sum_R |R|^{-2/n}\,\bigl\|P_{W(R)}u\bigr\|^2 \le C\gamma\,\|\nabla(u\chi_Q)\|_2^2. \]

Proof. The second inequality follows from Lemma 7.2.4. To prove the first inequality, one claims that $\mu(M_p(V\chi_Q), R) \le C\gamma|R|^{-2/n}$ for each dyadic subcube $R$ of $Q$. To see this, note that
\[ M_p(V\chi_Q) \le M_p(V\chi_R) + \sup_{R\subset\tilde R\subset Q}\mu_p(V, \tilde R) \le M_p(V\chi_R) + \gamma|R|^{-2/n}. \tag{7.6} \]
The maximal theorem and Hölder’s inequality imply that
\[ \mu(M_p(V\chi_R), R) \le C_p\,\mu_p(V, R) \le C_p\,\gamma|R|^{-2/n}. \tag{7.7} \]
Next, if $u$ is supported in $Q$ and $u \in V_0(Q)^\perp$ then $u = \sum_{R\subset Q}P_{W(R)}u$ while, by Lemma 7.2.5 and (7.6) and (7.7),
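\[ \int_Q |u|^2 V \le C\int_Q S^2(u\chi_Q)\,M_p(V\chi_Q) = C\int_Q\Bigl(\sum_{R\subset Q}\frac{\chi_R\,\|P_{W(R)}(u)\|^2}{|R|}\Bigr)M_p(V\chi_Q) \]
\[ = C\sum_{R\subset Q}\|P_{W(R)}(u)\|^2\,\mu(M_p(V\chi_Q), R) \le C'\gamma\sum_{R\subset Q}\|P_{W(R)}(u)\|^2\,|R|^{-2/n}. \]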
This proves the first inequality and hence Lemma 7.2.6.

If $u \in V_0(Q)^\perp$ and if $\mu_p(V, R) \le \gamma|R|^{-2/n}$ with $\gamma$ small enough so that $C\gamma$ in Lemma 7.2.6 is at most one, then
\[ \langle H(u\chi_Q), u\rangle = \|\nabla(u\chi_Q)\|_2^2 - \int_Q |u|^2 V \ge \|\nabla(u\chi_Q)\|_2^2 - C\gamma\,\|\nabla(u\chi_Q)\|_2^2 \ge 0. \]
That is, $H$ is locally positive on $V_0(Q)^\perp$ whenever $Q$ is one of the minimal bad cubes $Q_1,\dots,Q_N$. However, this is not enough to guarantee positivity of $H$ on $(\sum V_0(Q))^\perp$. One needs to append to $Q_1,\dots,Q_N$ carefully chosen cubes $Q_{N+1},\dots,Q_M$, $M \le 5N$, to obtain the desired positivity. These additional $Q_j$ will be chosen in a bit.

To describe what one seeks of the appended cubes $Q_{N+1},\dots,Q_M$, let $E(Q_j)$ be the complement in $Q_j$ of those $Q_{j'} \in \{Q_1,\dots,Q_M\}$ that are properly contained in $Q_j$. One also sets $E(\mathbb{R}^n) = \mathbb{R}^n\setminus\bigcup_{j=1}^M Q_j$. These exceptional sets partition $\mathbb{R}^n$ into disjoint subsets. The property that one requires of $\{Q_1,\dots,Q_M\}$ is that, for any dyadic $R$, one has
\[ \mu_p(V\chi_E, R) \le C\gamma\,|R|^{-2/n} \tag{7.8} \]
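when $E$ is one of $E(Q_j)$ or $E(\mathbb{R}^n)$. Once this is accomplished, the technique used to prove Lemma 7.2.6 will also yield
\[ \int_{E(Q_j)} |u|^2 V \le C\gamma\sum_R\bigl\|P_{W(R)}u\bigr\|_2^2\,|R|^{-2/n} \quad (u \in V_1(Q_j)^\perp), \tag{7.9} \]
\[ \int_{E(\mathbb{R}^n)} |u|^2 V \le C\gamma\sum_R\bigl\|P_{W(R)}u\bigr\|_2^2\,|R|^{-2/n} \quad (u \in C_c^\infty(\mathbb{R}^n)). \tag{7.10} \]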
The estimates are restricted to $V_1(Q_j)^\perp$ as opposed to $V_0(Q_j)^\perp$ in Lemma 7.2.6 because one must bisect $Q_j$ to pass to $E(Q_j)$. Finally, one defines $\mathcal{H}^\perp = \sum_{j=1}^M V_1(Q_j)$. To ensure that each dyadic $R$ appears only once when summing the estimates (7.9) and (7.10), let $\Omega_j$, $j = 1,\dots,M$, consist of those dyadic $R$ such
that $R \subset Q_j$ and $R$ is not contained in any $Q_{j'}$ that is properly contained in $Q_j$. Also, let $\Omega_0$ consist of those $R$ that are not contained in any of the $Q_j$. Set $u^{(j)} = \sum_{R\in\Omega_j}P_{W(R)}u$, $j = 0, 1,\dots,M$. Since the $\Omega_j$ form a partition of $\mathcal{Q}$ it follows that $\sum_R\|P_{W(R)}u\|_2^2\,|R|^{-2/n} = \sum_j\sum_{R\in\Omega_j}\|P_{W(R)}u\|_2^2\,|R|^{-2/n}$. If $u \in \mathcal{H}$ then $u \in (V_1(Q_j))^\perp$ for each $j = 1,\dots,M$. Thus $u(x) = u^{(j)}(x)$ for $x \in E(Q_j)$ while $u(x) = u^{(0)}(x)$ for $x \in E(\mathbb{R}^n)$. Taking $\gamma$ small enough, it follows from Lemma 7.2.6 that
\[ \int |u|^2 V = \int_{E(\mathbb{R}^n)} |u^{(0)}|^2 V + \sum_{j=1}^M\int_{E(Q_j)} |u^{(j)}|^2 V \]
\[ \le C\gamma\Bigl(\sum_R\|P_{W(R)}u^{(0)}\|^2\,|R|^{-2/n} + \sum_{j=1}^M\sum_R\|P_{W(R)}u^{(j)}\|^2\,|R|^{-2/n}\Bigr) \]
\[ = C\gamma\sum_R\|P_{W(R)}u\|^2\,|R|^{-2/n} \le \|\nabla u\|^2 \quad (u \in \mathcal{H}). \tag{7.11} \]
To summarize, (7.11) will follow once the following tasks are performed to satisfaction: (i) the cubes $Q_{N+1},\dots,Q_M$ are defined such that $M \le 5N$ and (ii) the estimate (7.8) is verified whenever $E = E(Q_j)$ or $E(\mathbb{R}^n)$.

The first step is to choose the relative cubes $Q_{N+1},\dots,Q_M$. Recall that the bad cubes $R \in \mathcal{B}$ are those satisfying $\mu_p(V, R) \ge \gamma|R|^{-2/n}$. Here $\gamma$ is fixed. One can view $\mathcal{B}$ as a forest of badtrees indexed by the maximal cubes $\mathcal{B}_{\max}$ in $\mathcal{B}$. Note that $\mathcal{B}_{\max}$ is well defined if one temporarily assumes that $\lim_{r\to-\infty}\sup_{R\in\mathcal{Q}_r}|R|^{2/n}\mu_p(V, R) = 0$, that is, that sufficiently large cubes are not bad. Then $\#\mathcal{B}_{\max} \le N$ since the maximal cubes are pairwise disjoint and each of them contains at least one minimal bad cube. Not every dyadic cube intermediate to a pair of bad cubes is itself bad, so one defines a badtree as follows. Assign a node to each bad cube and assign an edge between a pair of nodes corresponding to $Q \subset Q' \in \mathcal{B}$ provided there is no other $Q'' \in \mathcal{B}$ with $Q \subset Q'' \subset Q'$. One continues to call $R$ an offspring of $Q$, denoted $R \in \mathcal{O}(Q)$, if $R$ is one level down from $Q$ in a badtree. One calls $R$ a bifurcation node of $\mathcal{B}$ provided $R$ has at least two offspring. Finally, for reference one denotes by $\mathcal{B}_{\min}$ the family of minimal cubes, $\mathcal{B}_{\max}$ the family of maximal cubes, $\mathcal{B}_\wedge$ the family of bifurcation cubes and $\mathcal{O}_\wedge$ the family of offspring of bifurcation cubes of $\mathcal{B}$. By definition, $\mathcal{B}_{\min} = \{Q_1,\dots,Q_N\}$. The set $\{Q_1,\dots,Q_M\}$ of chosen cubes will consist of $\mathcal{B}_{\min}\cup\mathcal{B}_{\max}\cup\mathcal{B}_\wedge\cup\mathcal{O}_\wedge$. One claims that $M \le 5N$. As $\#\mathcal{B}_{\max} \le \#\mathcal{B}_{\min}$, it is enough to prove the following.

Lemma 7.2.7. $\sum_{Q\in\mathcal{B}_\wedge}\{1 + \#\mathcal{O}(Q)\} \le 3\,\#\mathcal{B}_{\min}$.

Proof. Here $\mathcal{O}(Q)$ denotes the offspring of $Q$ in its badtree as defined above. The left-hand side then equals $\#\mathcal{B}_\wedge + \#\mathcal{O}_\wedge$. It suffices to prove the estimate for a single badtree. This is done by induction on the height of the tree. Clearly it holds when the tree has just a single node which is minimal. Suppose it
holds for any tree up to a given height. One observes what happens when one more level is added or grown. If a minimal cube $\bar Q$ in the original tree remains minimal then the inequality is unchanged. If a single offspring is added to $\bar Q$ in creating the new tree then $\bar Q$ is deleted from $\mathcal{B}_{\min}$ while its single offspring is added to $\mathcal{B}_{\min}$. Thus both sides of the inequality in the Lemma remain unchanged. Finally, if $\bar Q$ has two or more offspring in the new tree then $\bar Q$ becomes a bifurcation cube. In this case the value of $\#\mathcal{B}_{\min}$ is increased by $\#\mathcal{O}(\bar Q) - 1$ since $\bar Q$ is no longer minimal but its offspring are. Consequently the right-hand side of the inequality increases by $3(\#\mathcal{O}(\bar Q) - 1)$ while the left-hand side increases by only $\{1 + \#\mathcal{O}(\bar Q)\}$ (see Figure 7.1).
Fig. 7.1. Growing a tree
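The counting inequality of Lemma 7.2.7 is a statement about finite rooted trees, so it can be checked mechanically. The Python sketch below is only an illustration: it builds random rooted trees that stand in for a single badtree, with leaves playing the role of minimal cubes and nodes having at least two children playing the role of bifurcation cubes. The functions `random_tree` and `lemma_counts` are hypothetical helpers introduced here.

```python
import random

def random_tree(num_nodes, seed):
    """Random rooted tree given as a dict mapping each node to its list of offspring."""
    rng = random.Random(seed)
    children = {0: []}
    for node in range(1, num_nodes):
        parent = rng.randrange(node)      # attach the new node to an earlier node
        children[parent].append(node)
        children[node] = []
    return children

def lemma_counts(children):
    """Return (sum over bifurcation nodes of 1 + #offspring, number of minimal nodes)."""
    minimal = sum(1 for kids in children.values() if len(kids) == 0)
    lhs = sum(1 + len(kids) for kids in children.values() if len(kids) >= 2)
    return lhs, minimal

for seed in range(5):
    lhs, minimal = lemma_counts(random_tree(50, seed))
    print(seed, lhs, 3 * minimal, lhs <= 3 * minimal)
```

Nodes with exactly one offspring contribute to neither side, which mirrors the second case of the induction.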
Their difference is $2\,\#\mathcal{O}(\bar Q) - 4 \ge 0$ when $\#\mathcal{O}(\bar Q) \ge 2$. Consequently the right-hand side increases at least as fast as the left-hand side as the tree grows new levels. The lemma is proved.

One concludes that $M$, the number of chosen bad cubes, is at most five times $N$, the number of minimal bad cubes. It just remains to establish the estimate (7.8) where $E = E(Q_j)$ for one of the chosen cubes, or $E(\mathbb{R}^n)$. Let $R$ be a dyadic cube. If $R \notin \mathcal{B}$ there is nothing to estimate. So assume that $R \in \mathcal{B}$. First suppose that $E = E(\mathbb{R}^n)$. Since $R \in \mathcal{B}$ there is a maximal cube $Q \in \mathcal{B}_{\max}$ such that $R \subset Q$. But then $Q \in \{Q_1,\dots,Q_M\}$ so, by definition, $Q\cap E(\mathbb{R}^n) = \emptyset$ and $\mu_p(V\chi_E, R) = 0$. This proves the estimate for $E = E(\mathbb{R}^n)$.

Next suppose that $E = E(Q_j)$ and $R \in \mathcal{B}$. If, in fact, $R = Q_j$ then one can apply the mean value estimates for each of the $2^n$ offspring subcubes of $Q_j$ at the possible price of increasing the constant $C$ in (7.8) by a factor of $2^n$. If $R$ properly contains $Q_j$ then, since the desired estimate (7.8) can be rewritten as
\[ \int_{R\cap E(Q_j)} |V|^p\,dx \le (C\gamma)^p\,|R|^{1-(2p/n)} \tag{7.12} \]
and since E(Qj ) ⊂ Qj , the case Qj ⊂ R follows from the estimate for R = Qj provided p ≤ n/2. However, since µp (V, R) and hence B and N increase with p, the second statement of Theorem 7.2.3 gets stronger as p > 1 decreases.
So it is enough to take $1 < p \le n/2$ (this is where one needs $n \ge 3$). Thus, to prove the estimate it is enough to consider the case in which $R$ is a bad, proper subcube of $Q_j$. If $Q_j \in \mathcal{B}_{\min}$ then $R \in \mathcal{B}$ cannot be properly contained in $Q_j$. Thus one can assume that $Q_j$ is not minimal. Next, if $Q_j \in \mathcal{B}_\wedge$ then $E(Q_j) \subset Q_j\setminus\bigcup_{Q'\in\mathcal{O}(Q_j)}Q'$. Since any $R \in \mathcal{B}$ that is properly contained in $Q_j$ must be contained in one of the offspring of $Q_j$, it follows that $R\cap E(Q_j) = \emptyset$ for such $R$ and the estimate (7.12) is trivial. This was the purpose for including $\mathcal{O}_\wedge$ in the chosen cubes, though it also forces one to establish (7.12) in the case in which $Q_j$ is neither minimal nor bifurcating.

For the remaining $Q_j$ one must show that if $R$ is a proper subcube of $Q_j$ then, even though $R$ is bad, enough of the bad part of $R$ lies outside of $E(Q_j)$ to get the desired estimate (7.12). Since $Q_j$ is not minimal one can assign a bad cube $\bar Q$ that is of maximal size among those proper subcubes that are either minimal or bifurcating. Then $E(Q_j) \subset Q_j\setminus\bar Q$.

One claims that if $R \in \mathcal{B}$ is a proper subcube of $Q_j$ then it must intersect $\bar Q$. Suppose, to the contrary, that $R$ and $\bar Q$ are disjoint. Since $R$ and $\bar Q$ are properly contained in $Q_j$ one can find a cube $\tilde Q$ of minimal size among those bad cubes that contain both $R$ and $\bar Q$. As $\tilde Q \notin \mathcal{B}_{\min}$, it has at least one bad offspring. If $\tilde Q$ had just one bad offspring then that offspring would necessarily contain both $\bar Q$ and $R$, which would contradict the choice of $\tilde Q$. Thus $\tilde Q$ must be a bifurcation cube. Since $Q_j$ itself is not a bifurcation cube, $\tilde Q$ must be a proper subcube of $Q_j$. But this contradicts that $\bar Q$ is as large as possible. Thus $R$ and $\bar Q$ must overlap as claimed.

Because they are dyadic cubes, either $R \subset \bar Q$ or $\bar Q \subset R$. If $R \subset \bar Q$ then $E(Q_j)\cap R = \emptyset$ and the estimate (7.12) is trivial. So one may assume that $\bar Q \subset R$. Finally one decomposes $R$ into what Fefferman [136] calls a Calderón–Zygmund decomposition (see Figure 7.2).
Fig. 7.2. Calderón–Zygmund decomposition of R
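The repeated bisection depicted in Figure 7.2 can be written as a short routine. The Python sketch below is an illustration under assumptions made here, not a construction taken from [136]: a dyadic cube is encoded as a pair `(level, index)` with side length $2^{-\text{level}}$ and corner `index` times that side length, and the function `set_aside_cubes` is a hypothetical helper that returns the siblings discarded while bisecting a cube $R$ down to a given dyadic subcube.

```python
from itertools import product

def set_aside_cubes(R, Q_bar, n):
    """Dyadic cubes set aside while bisecting R down to its dyadic subcube Q_bar.

    Cubes are (level, index) pairs; index is a tuple of n integers.
    """
    level, index = R
    target_level, target = Q_bar
    ancestor = tuple(t >> (target_level - level) for t in target) if target_level >= level else None
    if ancestor != tuple(index):
        raise ValueError("Q_bar must be a dyadic subcube of R")
    pieces = []
    while level < target_level:
        level += 1
        # the unique child of the current cube that contains Q_bar
        keep = tuple(t >> (target_level - level) for t in target)
        for bits in product((0, 1), repeat=n):
            child = tuple(2 * i + b for i, b in zip(index, bits))
            if child != keep:
                pieces.append((level, child))     # sibling not containing Q_bar
        index = keep
    return pieces

pieces = set_aside_cubes((0, (0, 0, 0)), (3, (5, 2, 7)), 3)
print(len(pieces))
```

Each bisection step sets aside $2^n - 1$ cubes, and the cubes set aside together with the target subcube tile $R$, which is what the sum over bisection iterations in the estimate for (7.12) keeps track of.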
First bisect $R$ into its $2^n$ sibling subcubes. If one of these is $\bar Q$ then set aside the remaining $2^n - 1$ siblings and stop. Otherwise, set aside the siblings that do not contain $\bar Q$ and bisect the sibling subcube containing $\bar Q$. Repeat until $\bar Q$ is one of the sibling offspring. The subcubes set aside during this bisection process are disjoint from $\bar Q$ and from one another. They cannot be bad cubes: that would contradict that $\bar Q$ is of maximal size among those bad subcubes of $Q_j$ that are either bifurcating or minimal. Consequently,
\[ \int_{R\cap E(Q_j)} |V|^p \le \int_{R\cap[Q_j\setminus\bar Q]} |V|^p = \sum_{Q\ \text{set aside}}\int_Q |V|^p \le C\sum_{Q\ \text{set aside}}\gamma^p\,|Q|^{1-2p/n} \le c_n(2^n - 1)\,\gamma^p\sum_{2^{nk}\le|R|} 2^{k(n-2p)} \]
where the last sum keeps track of the bisection iterations. The geometric series converges when $p < n/2$, giving the desired estimate (7.12). This completes the proof of the second statement of Theorem 7.2.3.

The method of proof of Theorem 7.2.3 yields a stronger result. Assume as before that $V \ge 0$ in $H = -\Delta - V$. Let $N(E, V)$ be the number of eigenvalues of $H$ that are less than $-E$ while $N_{\rm up}(E, V) = \#\{Q \in \mathcal{B}_{\min} : |Q| \le CE^{-n/2}\}$. One then has the following.

Theorem 7.2.8. $N(E, V) \le C\,N_{\rm up}(E, V)$.

Integrating this estimate over $E > 0$ one finds that the sum of the absolute values of the negative eigenvalues of $H$ is at most $C\sum_{Q\in\mathcal{B}_{\min}}|Q|^{-2/n}$.

7.2.4 Thomas–Fermi theory and stability of matter

In Section 7.2.1 we discussed why a Sobolev inequality is needed in order to prove stability of the hydrogen atom. Here we sketch a consequence of the techniques of Fefferman and Phong pertaining to one aspect of stability of bulk matter.

Thomas–Fermi theory was introduced separately by Thomas [349] and Fermi [141] to address the $N$-body problem in quantum physics. One assumes that $N$ electrons are located at positions $x = (x_1,\dots,x_N)$ (really $x_i \in \mathbb{R}^3$). When they are distributed over $M$ atoms with nuclei at fixed loci $y = (y_1,\dots,y_M)$ carrying positive charges $z_j$, $\sum_{j=1}^M z_j = N$, the total (nonrelativistic) potential energy associated with the system of normalized attractive and repulsive Coulomb forces takes the form
\[ V(x_1,\dots,x_N) = -A + B + U = -\sum_{j,k}\frac{z_k}{|x_j - y_k|} + \sum_{j<k}\frac{1}{|x_j - x_k|} + \sum_{j<k}\frac{z_j z_k}{|y_j - y_k|} \tag{7.13} \]
denoting the electron–nucleus attractions, electron–electron repulsions and nucleus–nucleus repulsions, respectively. The state of the system is described by a wave function $\psi(x_1,\dots,x_N)$ with $\|\psi\|_2 = 1$. Neglecting spin, one takes $\psi$
to be scalar-valued. Even so, when $N$ is large $\psi$ is intractable, while one seeks to control more understandable quantities such as $E(\psi) = \|\nabla\psi\|^2 + \langle V\psi, \psi\rangle$. In the case of solid, uniform material as is assumed now, the charges $z_k$ are all $z = N/M$ where $N, M$ are large, on the order of $10^{26}$. Intuitively, the points $y_j$ are equally spaced and the electron orbitals are uniform as well. There are really two issues to address in explaining, in mathematical terms, why this should be the case: first, why is each molecule stable and, secondly, why do the molecules bind?

In the case of a single atom, the term $U$ in the potential (7.13) vanishes. Then, when $z = N$ is large, the term $A$ dominates $B$. Thus, neglecting $B$ one finds that the state minimizing the $N$-variate energy
\[ \tilde E(\psi) = \Bigl\langle\Bigl(-\Delta_N - \sum_{i=1}^N\frac{z}{|x_i|}\Bigr)\psi, \psi\Bigr\rangle \]
has the form $\Phi(x) = \prod_{i=1}^N\varphi(x_i)$ in which $\varphi = c\,e^{-z|x|/2}$ is a minimizer of the corresponding one particle energy. But then $\Phi$ lives on a scale of essentially $1/|z|$, implying that “large” atoms are smaller than “small” atoms. This troublesome conclusion is eliminated if one stipulates, as Pauli did in his famous exclusion principle, that only one electron is allowed in each orbital. Mathematically this principle stipulates, in turn, that the allowable rest states are antisymmetric functions (really tensors). A typical example is the determinantal function
\[ \psi(x) = \frac{1}{\sqrt{N!}}\,\det\{\varphi_i(x_j)\}_{i,j=1}^N \tag{7.14} \]
in which $\varphi_1,\dots,\varphi_N$ are mutually orthogonal and thus exclusive: this function captures abstractly the notion of having one electron per orbital. Henceforth one takes candidate minimizers of $E_{\rm QM}(\psi) = \langle H\psi, \psi\rangle$ to be antisymmetric functions called Fermions.

Because it is cumbersome to work directly with wave functions $\psi$ one considers the electron density $\rho(x) = N\int|\psi(x, x_2,\dots,x_N)|^2\,dx_2\cdots dx_N$. Since $|\psi|^2$ is symmetric, $\rho$ does not depend on the hyperplane of integration. Thomas–Fermi theory seeks to approximate $E_{\rm QM}(\psi)$ in terms of $\rho$ and to guess its minimum $E_{\rm QM}$ when $\psi$ is a ground-state eigenvector. The electrostatic potential associated to a classical charge density $\rho$ is
\[ V(\rho) = \frac{1}{2}\iint_{\mathbb{R}^3\times\mathbb{R}^3}\frac{\rho(x)\rho(y)}{|x - y|}\,dx\,dy + \sum_{j<k}\frac{z_j z_k}{|y_j - y_k|} - \int_{\mathbb{R}^3}\sum_k\frac{z_k}{|x - y_k|}\,\rho(x)\,dx. \tag{7.15} \]
The first term only approximates its relativistic counterpart (e.g., [137]).

The kinetic energy $\|\nabla\psi\|^2$ is a purely quantum-mechanical term in the sense that a classical charge distribution can remain stationary and hence possess zero kinetic energy. The following fundamental estimate of Lieb and
Thirring (cf. [136]) makes essential use of volume counting and antisymmetry to obtain a 5/3-law estimate for kinetic energy in terms of charge density.

Lemma 7.2.9. If $\psi$ is antisymmetric with density $\rho$ then there is a constant $h$ strictly smaller than $c$ in (7.16) such that $\|\nabla\psi\|^2 \ge h\int_{\mathbb{R}^3}\rho^{5/3}(x)\,dx$.

One calls $c\int_{\mathbb{R}^3}\rho^{5/3}$ the Thomas–Fermi kinetic energy of $\rho$. Here, $c$ corresponds to the minimum energy needed to pack a large number of Fermions in a box. Adding to it the potential terms (7.15) one obtains the Thomas–Fermi energy functional:
\[ E_{\rm TF}(\rho) = V(\rho) + c\int_{\mathbb{R}^3}\rho^{5/3}. \tag{7.16} \]
Then $E_{\rm TF}(\rho)$ provides a lower bound for $E_{\rm QM}(\psi)$. The following fundamental estimate of Dyson–Lenard [130] and Lieb–Thirring [258] is often referred to as stability of matter.

Theorem 7.2.10. Let $\psi(x_1,\dots,x_N)$ be an antisymmetric unit vector in $L^2$. Then (i) $E_{\rm QM}(\psi) \ge -CN$ while (ii) if $E_{\rm QM}(\psi) \le C_1N$ then there is a $c_2$ depending only on $C_1$ such that $|\Omega| > c_2N$ whenever $\Omega \subset \mathbb{R}^3$ satisfies $\int_\Omega\rho\,dx \ge N/2$.

Statement (i) is proved by estimating $E_{\rm TF}$ and applying Lemma 7.2.9. Statement (ii), which implies that $\rho$ cannot be arbitrarily localized, then just requires an application of Hölder’s inequality. In estimating $E_{\rm TF}$ the volume counting estimate of Theorem 7.2.1 is used along with the curious fact that
\[ E_{\rm TF}\{y_1,\dots,y_N\} \equiv \inf\Bigl\{E_{\rm TF}(\rho) : \rho \ge 0,\ \int\rho = N\Bigr\} \ge \sum_{i=1}^N E_{\rm TF}\{y_i\} \tag{7.17} \]
which says that, in Thomas–Fermi theory, atoms do not bind since separating the nuclei lowers ETF . Thus statement (ii) says that matter cannot be compressed too tightly but it does not account for the molecular bonding that occurs in bulk matter. Fefferman used the techniques of Theorem 7.2.3 to show that uniform bulk matter is, in fact, composed of atoms of a fixed size. To this end he postulated that on a cube Q of normalized unit volume ρ should satisfy: R (A1) Q ρ ∼ 1, (A2) ρ R ≥ c on a fixed portion of Q, (A3) Q ρ5/3 ≤ C, and (A4) at least one nucleus and at most C nuclei belong to Q. Though still not a precise description of a molecular lattice, these properties at least impose some regular structure on ρ. Fefferman [136] referred to the following fact as the atomic structure of matter.
7.2 Eigenvalue estimates for Schrödinger operators
Theorem 7.2.11. Suppose that $\psi(x_1,\dots,x_N)$ is a Fermion wave function with $E_{QM}(\psi)\le-\varepsilon N$. Then there is a collection of at least $cN$ pairwise disjoint cubes $Q$ whose sizes satisfy $c\le|Q|^{2/n}\le C$ on which $\rho=\rho(\psi)$ satisfies the conditions (A1)–(A4). In fact, (A3) can be replaced by $\int_Q\rho^3\le C$.

The full strength of the Fefferman–Phong estimates is needed here: volume counting is not enough. Discussion of subsequent developments c. 1990 can be found in Lieb [261].

7.2.5 Sharpening the Fefferman–Phong condition

Theorem 7.2.3 says that the negative eigenvalues of $-\Delta-V$ ($V\ge0$) are essentially the values $-|Q|^{-2/n}$ as $Q$ ranges over those minimal cubes for which $\mu(V,Q)$ or $\mu_p(V,Q)$ is large. As they are simple to estimate, $p$-means provide a useful tool for ensuing stability estimates. But there is potentially a large gap between $\mu(V,Q)$ and $\mu_p(V,Q)$, so the eigenvalue estimates may not be very precise. Another concern is the use of PL-wavelets as “approximate eigenfunctions.” They are, perhaps, too different from true eigenfunctions to serve as templates for sharp testing conditions. As discussed below, true $\lambda$-eigenfunctions should be in the range of $V^{1/2}(\lambda-\Delta)^{-1}V^{1/2}$.

For the sake of a concrete example illustrating these concerns we return to the case of a particle in a box. Recall that if $B=B_1\times\cdots\times B_n$ where $B_i$ are intervals with increasing lengths $\delta_i$ then volume counting predicts a required energy $E\gtrsim E_{\rm vol}(\delta_1,\dots,\delta_n)\sim(\prod\delta_i)^{-2/n}$ for $H=-\Delta-E\chi_B$ to have a negative eigenvalue. The Fefferman–Phong condition indicates that if $H$ has a negative eigenvalue then there is a cube $Q$ such that $E\ge c|Q|^{1/p-2/n}|Q\cap B|^{-1/p}$. If the side length of $Q$ is one of the $\delta_j$ then $|Q|^{1-2p/n}|Q\cap B|^{-1}\ge\delta_j^{(j-1)-2p}\big(\prod_{i=1}^{j-1}\delta_i\big)^{-1}$. This product is smallest when $j=3$, yielding
\[ E\ge(\delta_1\delta_2)^{-1}\Big(\frac{\delta_1\delta_2}{\delta_3^2}\Big)^{1/p'}\qquad(p>1). \]
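The arithmetic behind this bound can be checked numerically. The following Python sketch evaluates the products $\delta_j^{(j-1)-2p}(\prod_{i<j}\delta_i)^{-1}$ for a sample box in $\mathbb{R}^3$, takes the smallest one to the power $1/p$, and compares the result with the closed form displayed above and with the volume counting value $(\prod\delta_i)^{-2/n}$. The function names, the side lengths and the exponent $p$ are illustrative assumptions, and every unspecified constant is set to one.

```python
import math

def fp_products(deltas, p):
    """Return delta_j**((j-1) - 2p) / prod_{i<j} delta_i for j = 1..n.

    `deltas` holds the side lengths of the box B in increasing order.
    """
    products = []
    prefix = 1.0  # running product delta_1 * ... * delta_{j-1}
    for j, d in enumerate(deltas, start=1):
        products.append(d ** ((j - 1) - 2 * p) / prefix)
        prefix *= d
    return products

def fp_bound(deltas, p):
    """Smallest product raised to the power 1/p (constant c set to one)."""
    return min(fp_products(deltas, p)) ** (1.0 / p)

def closed_form(deltas, p):
    """(d1 d2)^(-1) * (d1 d2 / d3^2)^(1/p'), where 1/p' = 1 - 1/p."""
    d1, d2, d3 = deltas[:3]
    return (d1 * d2) ** -1 * (d1 * d2 / d3 ** 2) ** (1.0 - 1.0 / p)

def volume_energy(deltas):
    """Volume counting value (prod delta_i)^(-2/n), constants ignored."""
    return math.prod(deltas) ** (-2.0 / len(deltas))

deltas = (1.0, 2.0, 1000.0)  # hypothetical elongated box in R^3
p = 1.2                      # sample exponent p > 1

print("products for j = 1, 2, 3:", fp_products(deltas, p))
print("Fefferman-Phong-type bound:", fp_bound(deltas, p))
print("closed form               :", closed_form(deltas, p))
print("volume counting value     :", volume_energy(deltas))
```

For this sample box the smallest product occurs at $j=3$ and the two expressions for the bound agree up to rounding; since all constants were dropped, only the dependence on the side lengths is meaningful.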
The correct estimate is $E_{\rm critical}\sim1/(\delta_1\delta_2\log(1+\delta_3/\delta_2))$. Cube counting is premised upon eigenfunctions living essentially in cubes, whereas $V|_Q$ only affects that part of the particle/wave eigenfunction extending over $Q$. Replacing $-\Delta-V$ by its adjoint it makes more sense, perhaps, to test a fractional integral of order two over $B$. Suppose that $I_2(\chi_QV)$ is, at best, roughly evenly spread over $Q$. This is the case when $V=\chi_B$. Then the quantity
\[ \frac{1}{V(Q)}\int_Q\big(I_2(\chi_QV)\big)\,V, \tag{7.18} \]
in which $V(Q)=\int_QV$, is at most some multiple of $I_2(\chi_B)(x_B)$ where $x_B$ is the center of $B$. This value is essentially
\[ \int_B\frac{dy}{|x_B-y|^{n-2}}=c_n\int_{-\delta_1}^{\delta_1}\cdots\int_{-\delta_n}^{\delta_n}\frac{dy}{|y|^{n-2}}\approx\delta_1\delta_2\log\Big(1+\frac{\delta_3}{\delta_2}\Big) \]
as can be seen by breaking up the integral into appropriate regions. Thus (7.18) gives the correct energy estimate for a particle in a box.

Kerman and Sawyer [230] were able to sharpen the Fefferman–Phong eigenvalue estimates in this manner, counting the number of negative eigenvalues in terms of the number of bad cubes, where bad means that the mean (7.18) is too large. Their analogue of Theorem 7.2.3 is the following.

Theorem 7.2.12. Let $H=-\Delta-V$ where $V\ge0$ on $\mathbb{R}^n$ ($n\ge3$). Then there are $c,C$ depending only on $n$ such that for any fixed $\lambda\ge0$:
(i) If $Q_1,\dots,Q_N$ are cubes of side length at most $\lambda^{-1/2}$ having pairwise disjoint doubles then $N(-\lambda,H)\ge N$ provided
\[ \frac{1}{V(Q_j)}\int_{Q_j}\big(I_2(\chi_{Q_j}V)\big)\,V\ge C\qquad(1\le j\le N). \]
(ii) $N(-\lambda,H)\le CN$, provided there are at most $N$ pairwise disjoint dyadic cubes $Q_1,\dots,Q_N$ having side length at most $\lambda^{-1/2}$ such that
\[ \frac{1}{V(Q_j)}\int_{Q_j}\big(I_2(\chi_{Q_j}V)\big)\,V\ge c\qquad(1\le j\le N). \]

Thus the negative eigenvalues of $H$ are approximately given by $-|Q|^{-2/n}$ as $Q$ ranges over the minimal cubes over which (7.18) satisfies a given lower bound. The same testing condition now applies to set upper and lower bounds for the number of negative eigenvalues, though it is often less simple to compute than the Fefferman–Phong bump condition. The proof of the lower bound on $N(-\lambda,H)$ is a little more involved than that of Theorem 7.2.3. We will sketch the proof of the upper bound for the sake of comparing to that of Theorem 7.2.3. Complete details are found in [230].

As before, rather than estimating $N(-\lambda,H)$ one only estimates the number of negative eigenvalues. The bad cubes are defined as those for which the average (7.18) is at least $\gamma$. One defines the set of $M\le5N$ cubes that are either minimal, maximal, branching or badtree offspring of branching bad cubes, as well as the exceptional sets $E_j=E(Q_j)$ and $E(\mathbb{R}^n)$ just as before. The complemented subspace $\mathcal{H}$ is defined slightly differently. Because of additional complications that arise from using condition (7.18) one needs to generate $\mathcal{H}^\perp$ with functions having cancellation properties localized to the exceptional sets better than the PL-wavelets. The following lemma [70] allows one to conclude that not too many functions will be needed.

Lemma 7.2.13. If $R_1,\dots,R_k$ are pairwise disjoint dyadic subcubes of a dyadic cube $Q$ then there are cubes $P_1,\dots,P_m$ ($m\le C_nk$) that are not necessarily dyadic or disjoint but such that $E(Q)=Q\setminus\bigcup_{i=1}^kR_i=\bigcup_{\nu=1}^mP_\nu$. The conclusion applies with $Q$ replaced by $\mathbb{R}^n$ if the $P_\nu$ are also allowed to be products of semi-infinite intervals.
Armed with the lemma one can define the complemented subspace having the appropriate cancellation properties. The $R_i$ in the lemma are typically the offspring of $Q_j$. Thus, to belong to $\mathcal{H}$ one imposes on $u$ that it have mean zero on each of the cubes $P_\nu=P_\nu^j$ obtained in the lemma. This means that there are at most a fixed dimensional multiple of $M\le5N$ characteristic functions defining $\mathcal{H}^\perp$ to which $u$ must be orthogonal. A lemma of Fabes et al. [133] implies the pointwise estimate
\[ |u(x)-u_Q|\le C\,I_1(\chi_Q|\nabla u|)(x)\qquad(x\in Q) \tag{7.19} \]
where $u_Q$ is the average over the arbitrary cube $Q$ and $I_1$ now is the fractional integral of order one. Thus, for $u\in\mathcal{H}$ one has the pointwise estimate
\[ |u(x)|\le C\,I_1(\chi_{E_j}|\nabla u|)(x)\qquad(x\in E_j). \tag{7.20} \]
Applying the same type of badtree analysis as before one shows that if $V_j=V\chi_{E_j}$ then
\[ \frac{1}{V_j(Q)}\int_Q\big(I_2(\chi_QV_j)\big)\,V_j\le\gamma C \tag{7.21} \]
holds for any dyadic cube (one defines this average to be zero if $V_j=0$ a.e. on $Q$). The main adjustment at this stage is a Whitney decomposition of the nontrivial case when $Q$ is a bad cube that is intermediate to a pair of the chosen $Q_j$'s. The modification is needed because $V_j(Q)$ is not simply a power of $|Q|$ and, hence, (7.18) does not scale in the same sense as the Fefferman–Phong bump condition. If both (7.21) and (7.20) hold then for $u\in\mathcal{H}$,
\[
\begin{aligned}
\int|u|^2V&=\sum_{j=0}^{M}\int_{E_j}|u|^2V\\
&\le C^2\sum_{j=0}^{M}\int_{E_j}\big(I_1(\chi_{E_j}|\nabla u|)(x)\big)^2V&&\text{by (7.20)}\\
&\le\sum_{j=0}^{M}\int_{E_j}|\nabla u|^2&&\text{by the trace theorem}\\
&=\int_{\mathbb{R}^n}|\nabla u|^2
\end{aligned}
\tag{7.22}
\]
which is the desired conclusion. The last inequality is based on the estimate
\[ \int(I_1(f))^2\,V\le C_1\gamma\int|f|^2. \tag{7.23} \]
A trace theorem due to Kerman and Sawyer states, as a special case, that (7.23) holds for all $f\in L^2(\mathbb{R}^n)$ if and only if
\[ \int_Q\big(I_1(\chi_QV)\big)^2\le C\,V(Q) \tag{7.24} \]
holds for all dyadic cubes $Q$. On the other hand, the special form of fractional integration yields, up to a fixed constant,
\[ \int_Q\big(I_2(\chi_QV)\big)\,V=\langle I_1(\chi_QV),I_1(\chi_QV)\rangle=\|I_1(\chi_QV)\|_2^2 \tag{7.25} \]
(see e.g., [327] for composition of Riesz potentials, valid here since $n\ge3$). Therefore, (7.24) is equivalent to (7.18).

7.2.6 NWO eigenvalue estimates for Schrödinger operators

Having seen that wavelets with specific properties can improve fundamental inequalities of mathematical physics, it makes sense to sort out which properties of wavelets come into play in different operator estimates of interest. Dyadic geometry plays an important role throughout. For the improved Sobolev inequality, cancellation and Lipschitz properties are important as well. Cancellation plays, at least, a convenient role in eigenvalue estimates.

In Section 6.9 certain model operators (6.42) constructed from wavelet-like building blocks were considered, following [309, 310]. Subsequent work of Rochberg [306, 307] developed the theory further in the context of eigenvalue estimates for Schrödinger operators. It is worthwhile to compare Rochberg's techniques with those in Section 7.2. We will work in $\mathbb{R}^n$ ($n\ge3$).

Birman and Schwinger introduced the following method of replacing $H=-\Delta-V$ ($V\ge0$) by what is, in the case of non-negative $V$, a positive compact operator depending on a choice of eigenvalue for $H$. Let $-\lambda$ be an eigenvalue of $H$ and $\phi$ an eigenfunction for $-\lambda$. If $\psi=V^{1/2}\phi$ then $V^{-1/2}(\lambda-\Delta)V^{-1/2}\psi=\psi$. So, at least formally, the eigenfunctions of $H$ with eigenvalue $-\lambda$ correspond to the eigenfunctions of $K_\lambda=V^{1/2}(\lambda-\Delta)^{-1}V^{1/2}$ with eigenvalue one. Taking $M(\lambda,V)$ to be the number of eigenvalues of $K_\lambda$ that are at least one, Birman and Schwinger (e.g., [314]) proved that
\[ N(-\lambda,V)\le M(\lambda,V), \tag{7.26} \]
where, as before, $N(\lambda,V)$ denotes the number of eigenvalues of $H$ that are smaller than $\lambda$. Taking this Birman–Schwinger theorem as a starting point, Rochberg [307] reconsidered the problem of eigenvalue estimates for $H$ in terms of estimates for $K_\lambda$. To emphasize what properties of vaguelettes are required, one considers instead a slightly simplified model for $K_\lambda$ based on a Whitney decomposition of the kernel. Here, vaguelette-like building blocks are used to write the model operator in the generic form (6.42). Since $V^{1/2}$ will play a fundamental role, we set $V^{1/2}=W$. For convenience, we will assume that $V\in A_\infty$.
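The Birman–Schwinger correspondence behind (7.26) has an exact finite-dimensional analogue that is easy to test. In the Python sketch below, which is only an illustration and not part of Rochberg's argument, $-\Delta$ is replaced by a positive definite second-difference matrix $A$ and $V$ by a diagonal matrix with non-negative entries. By Sylvester's law of inertia, the number of eigenvalues of $A-V$ below $-\lambda$ equals the number of eigenvalues of $V^{1/2}(\lambda+A)^{-1}V^{1/2}$ above one, barring ties. The matrix size, the random potential and all function names are assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np

def discrete_laplacian(m):
    """Second-difference matrix with Dirichlet ends, a stand-in for -Delta."""
    return 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

def count_below(H, level):
    """Number of eigenvalues of the symmetric matrix H strictly below `level`."""
    return int(np.sum(np.linalg.eigvalsh(H) < level))

def birman_schwinger_count(A, v, lam):
    """Number of eigenvalues of K = V^{1/2} (lam + A)^{-1} V^{1/2} above one."""
    w = np.sqrt(v)
    resolvent = np.linalg.inv(lam * np.eye(len(v)) + A)
    K = w[:, None] * resolvent * w[None, :]   # diag(w) @ resolvent @ diag(w)
    K = 0.5 * (K + K.T)                       # remove rounding asymmetry
    return int(np.sum(np.linalg.eigvalsh(K) > 1.0))

A = discrete_laplacian(60)
rng = np.random.default_rng(0)
v = rng.uniform(0.0, 1.5, size=60)            # hypothetical potential values
H = A - np.diag(v)

results = []
for lam in (0.05, 0.2, 0.5):
    results.append((lam, count_below(H, -lam), birman_schwinger_count(A, v, lam)))
    print(results[-1])
```

Each printed triple lists $\lambda$ followed by the two counts. In the operator setting of the text only the inequality (7.26) is asserted; the equality seen here is a feature of the matrix model with strict inequalities on both sides.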
Though the hypothesis $V\in A_\infty$ discounts several important examples such as particles in boxes, it makes the analysis of this model much simpler. For example, the Fefferman–Phong $p$-means then can be replaced by ordinary means with no ill effects. To outline the strategy, consider the case $\lambda=0$. The operator $(-\Delta)^{-1}$ is given by convolution with a multiple of $|x|^{2-n}$ and $K=K_0=W(-\Delta)^{-1}W$ is given, at least formally, by integration against $k(x,y)=|x-y|^{2-n}W(x)W(y)$. Let $\{Q_\alpha\times Q'_\alpha\}$ be a dyadic Whitney decomposition of $\{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n:x\ne y\}$ (e.g., [327], p. 16) so that the diameter of $Q_\alpha\times Q'_\alpha$ is on the order of the distance of $Q_\alpha\times Q'_\alpha$ from the diagonal, which is also comparable to the side length $|Q_\alpha|^{1/n}$. The model operator for $K=K_0$ then can be expressed through the kernel
\[ \sum_\alpha\frac{W(x)\,W(y)}{|Q_\alpha|^{(n-2)/n}}\,\chi_{Q_\alpha}(x)\,\chi_{Q'_\alpha}(y). \]
If the Whitney cubes are essentially pairwise disjoint then this kernel is comparable to $k(x,y)$. Moreover, since the Whitney organization implies that $\operatorname{dist}(Q_\alpha,Q'_\alpha)\sim|Q_\alpha|^{1/n}$, there is no essential loss in considering an even simpler model in which the terms $Q_\alpha\ne Q'_\alpha$ are disregarded. Then writing
\[ h_Q=\|W\chi_Q\|^{-1}\,W(y)\,\chi_Q(y),\qquad e_Q=\|W\chi_Q\|^{-1}\,W(x)\,\chi_Q(x), \]
one has an operator $L$ with integral kernel
\[ k_L(x,y)=\sum_{Q\in\mathcal{Q}}\frac{\|W\chi_Q\|^2}{|Q|^{(n-2)/n}}\,h_Q(y)\,e_Q(x) \tag{7.27} \]
that is of the form (6.42) and is $L^2$-bounded if $V$ is also a Fefferman–Phong weight.

Theorem 7.2.14. If $V=W^2\in A_\infty$ then $L$ and $K$ are $L^2$-bounded provided the coefficients $|Q|^{2/n-1}\|W\chi_Q\|^2$ are bounded, i.e.,
\[ M=\sup_Q|Q|^{2/n}\mu(V,Q)<\infty. \]
Then the operator norms of $L$ and $K$ are at most fixed multiples of $M$.

Proof. Since $W^2=V\in A_\infty$, it follows directly from Corollary 6.9.2 that the sequences $\{e_Q\}$ and $\{h_Q\}$ are NWO. To prove Theorem 7.2.14, by Theorem 6.9.3 it suffices to prove that $a_Q=|Q|^{2/n-1}\|W\chi_Q\|^2$ defines a Carleson sequence (see (6.40)), that is, that $M(a_Q)=\sum_{R\subset Q}|R||a_R|/|Q|$ is bounded. Set $\mathcal{Q}_k(Q)=\{R\in\mathcal{Q}:R\subset Q,\ l(R)=2^{-k}l(Q)\}$. Then
\[
\begin{aligned}
M(a_Q)&=\sum_{R\subset Q}\frac{|R|}{|Q|}\,|a_R|=\sum_{k=0}^{\infty}2^{-nk}\sum_{R\in\mathcal{Q}_k(Q)}|a_R|\\
&=|Q|^{2/n}\sum_{k=0}^{\infty}2^{-2k}\,\frac{1}{|Q|}\sum_{R\in\mathcal{Q}_k(Q)}\int_RW^2=\frac43\,|Q|^{2/n}\mu(V,Q)<\infty.
\end{aligned}
\]
This shows that $M(a_Q)$ is bounded and hence $L$ is $L^2$-bounded. The proof can be extended to the operator $K$ by considering the full Whitney decomposition.

If $\sup_Q|Q|^{2/n}\mu(V,Q)$ is small enough then, by (7.26), the number of negative eigenvalues is less than one, hence zero. When $\lambda\ne0$, the Riesz potential $(-\Delta)^{-1}$ is replaced by the Bessel potential $(\lambda-\Delta)^{-1}$ whose kernel $q_\lambda$ satisfies
\[ q_\lambda(x,y)\sim\begin{cases}c_1|x-y|^{2-n},&(\lambda^{1/2}|x-y|\le1)\\ c_2|x-y|^{2-n}\exp(-\lambda^{1/2}|x-y|),&(\lambda^{1/2}|x-y|\ge1)\end{cases} \]
with constants $c_1,c_2$ not depending on $\lambda$ (e.g., [327], p. 131), which will be fixed henceforth. As before we have a dyadic model operator $L_\lambda$ with kernel of the form
\[ \bigg\{\sum_{|Q|\le\lambda^{-n/2}}\frac{1}{|Q|^{(n-2)/n}}+\sum_{|Q|\ge\lambda^{-n/2}}\frac{\exp(-\lambda^{1/2}|Q|^{1/n})}{|Q|^{(n-2)/n}}\bigg\}\,\|W\chi_Q\|^2\,h_Q\otimes e_Q. \]

Theorem 7.2.15. If $W=V^{1/2}\in A_\infty$ then the $L^2$-operator norm of $L_\lambda$ is at most a fixed multiple of
\[ \sup_{Q\in\mathcal{Q}}|Q|^{2/n}\mu(V,Q)<\infty. \]
Proof. Just as in the proof of Theorem 7.2.14, it suffices to show that the coefficient sequence has finite Carleson norm. Split $\{a_Q\}$ into $\{a_Q\chi_{\rm large}(Q)\}\cup\{a_Q\chi_{\rm small}(Q)\}$ where $\chi_{\rm large}(Q)=1$ if $|Q|\ge\lambda^{-n/2}$ and is zero otherwise, and $\chi_{\rm small}(Q)=1-\chi_{\rm large}(Q)$. The estimates for those small cubes $Q$ such that $|Q|\le\lambda^{-n/2}$ go precisely as in the previous theorem. Since $\{a_Q\}\mapsto\{M(a_Q)\}$ is sublinear, it is enough to estimate $\{M(a_Q\chi_{\rm large}(Q))\}$. To simplify notation, set $b_Q=a_Q\chi_{\rm large}(Q)$. For fixed large $Q$ choose $N$ so that $|Q|^{1/n}\sim2^N\lambda^{-1/2}$. Then $b_R=0$ if $|R|<2^{-nN}|Q|$ so, as before,
\[
\begin{aligned}
M(b_Q)&=\sum_{R\subset Q}\frac{|R|}{|Q|}\,|b_R|=\sum_{k=0}^{N}2^{-nk}\sum_{R\in\mathcal{Q}_k(Q)}|b_R|\\
&=\sum_{k=0}^{N}2^{-2k}\exp(-\lambda^{1/2}2^{-k}|Q|^{1/n})\,|Q|^{2/n}\mu(V,Q)\\
&\le C\,|Q|^{2/n}\mu(V,Q)\sum_{k=0}^{N}2^{-2k}\exp(-c\,2^{N-k}).
\end{aligned}
\]
Applying the estimate $\sum_{A>2^k}2^{-k\beta}\exp(-c2^{-k}A)\le CA^{-\beta}$ in the case $\beta=2$ and $A=2^N\sim\lambda^{1/2}|Q|^{1/n}$, one bounds the last term by $C|Q|^{2/n}\mu(V,Q)(\lambda^{1/2}|Q|^{1/n})^{-2}\le C|Q|^{2/n}\mu(V,Q)$. Thus $M(b_Q)\le C\sup_{Q\in\mathcal{Q}}|Q|^{2/n}\mu(V,Q)$. This proves the theorem.

Slightly more thorough analysis shows that, in fact, $\{a_Q\}$ satisfies $M(a_Q)\le C\min(\lambda^{-1},|Q|^{2/n})\,\mu(V,Q)$. As in the case of Theorem 7.2.14 one can estimate $K_\lambda=W(\lambda-\Delta)^{-1}W$ in the same way but the accounting is more technical.

We return to the problem of counting eigenvalues. The strategy is the same as that of Fefferman and Phong: identify those cubes over which the potential averages are large and associate these with the large eigenvalues. By (7.26), if $-\lambda$ is the $N$th smallest eigenvalue of $H$ then the $N$th approximation number $s_N(K_\lambda)$ is at least one. Consequently, there is a constant $c$ such that
\[ C\min(\lambda^{-1},|Q|^{2/n})\,\mu(V,Q)\ge M(a_Q)\ge c \tag{7.28} \]
for at least $N+1$ cubes. These cubes are not necessarily pairwise disjoint. Since $V\in A_\infty$, it follows that $\mu(V,Q)$ and $\mu(V,Q')$ have a bounded ratio (independent of $Q$) whenever $Q$ and $Q'$ share a common face. This observation makes it possible, with a slight decrease in $c$, to come up with $N$ pairwise disjoint cubes for which (7.28) persists. By passing again to subcubes, one can assume that each such $Q$ satisfies $|Q|\le\lambda^{-n/2}$. Now we can state the eigenvalue estimate. Set
\[ N_{\mathcal{Q}}(\lambda)=\sup\#\{Q:|Q|\le\lambda^{-n/2},\ |Q|^{2/n}\mu(V,Q)\ge d\} \]
where the sup is taken over all pairwise disjoint collections of dyadic cubes satisfying the given estimates. Then one has the following (cf. Theorem 7.2.8).

Theorem 7.2.16. If $W^2=V\in A_\infty$ then $N(-\lambda,V)\le c\,N_{\mathcal{Q}}(\lambda)$ ($\lambda>0$).

7.2.7 Eigenfunction estimates

Rochberg took full advantage of the dyadic paraproduct structure that NWO sequence operators provide to obtain eigenfunction estimates for Schrödinger potential operators. As we will see in a different context below, one can try to expand a resolvent in multilinear terms and sum them up. In the present case, the resolvent expansion is applied to $K_\lambda=W(\lambda-\Delta)^{-1}W$. The multilinear terms are further split into “near” and “far” components, combinations of which must be summed delicately. To describe the result, consider a decay envelope, defined for any given pair of dyadic cubes $Q,R$ as
\[ m_\varepsilon(Q,R)=\Big(1+\frac{|Q|}{|R|}\Big)^{\varepsilon}\Big(1+\frac{|x_Q-x_R|}{|Q|^{1/n}}\Big)^{-n/2+\varepsilon} \tag{7.29} \]
where $x_Q$ is the center of $Q$. Suppose that $u=u_\lambda$ satisfies $\|u\|_2=1$ and $K_\lambda u_\lambda=u_\lambda$ where $\lambda>0$ and the potential $V=W^2\in A_\infty$. Then the localizations $u_\lambda|_R$ of $u_\lambda$ to dyadic cubes $R$ satisfy the following [308].

Theorem 7.2.17. Given $\varepsilon>0$ there is a $\delta>0$ such that if $N_\delta<\infty$ then one can find constants $a,b,c,d$ depending only on $\varepsilon$, $\lambda$ and $\|V\|_{A_\infty}$ and $N=N_\delta$ dyadic cubes $Q_1,\dots,Q_N$ such that for any dyadic cube $R$ one has
\[ u_\lambda|_R\le a\Big(\sum_{k=1}^{N}m_\varepsilon(Q_k,R)\Big)\exp\big(-b\,(c\lambda-\Delta)^{-1}\big)(V\chi_{dR}). \]

In very rough terms, one thinks of the $N$th eigenfunction as being estimated by $O(N)$ islands on which it is controlled by the fractional integral estimate, plus a global term that decays exponentially away from the islands. Intuitive justification is provided in [307].
7.3 More on decay of wavelet coefficients

In Chapter 6 wavelets were shown to provide an optimal approximation rate for certain Besov norms bracketing the variation of a function. In the following several pages we outline work of Cohen et al. (cf. [82]) that provides an optimal weighted weak-type estimate on the decay of wavelet coefficients of functions in $BV(\mathbb{R}^n)$. The delicate analysis is reminiscent of the proof of Theorem 7.2.3 and the result itself leads to a sharpening of the endpoint Sobolev inequality $\|f\|_{L^{n'}}\le C\|\nabla f\|_{L^1}$ as we will see. Here it is necessary to work in $\mathbb{R}^n$, but now with $n\ge2$. As before, $\psi_Q=\psi_{jk}$ when $Q=Q(j,k)$.

7.3.1 Bounded variation and weak-$\ell^1$

Theorem 7.3.1. If $\nabla f\in L^1(\mathbb{R}^n)$ with $n\ge2$ then $\beta(j,k)=2^{j(1-n/2)}\langle f,\psi_{jk}\rangle$ defines a sequence in $\ell^{1,\infty}(\mathbb{Z}\times\mathbb{Z}^n)$, that is, for each $\lambda>0$,
\[ \#\{Q\in\mathcal{Q}:|\beta(Q)|>\lambda\}\le\frac{c(n)}{\lambda}\int|\nabla f|. \]
This weak-type estimate extends to all of $BV(\mathbb{R}^n)$. Restricting the proof to the case $\nabla f\in L^1$ does not circumvent the technical heart of the proof but it does allow one to avoid some measure-theoretic complications. As with Theorem 7.2.3, the proof boils down to identifying certain subcollections of bad cubes off which the average gradient controls some other
average. In the present case, the bad cubes are described in terms of the magnitude of $\int_Q|\nabla f|$. To prove the theorem one assumes for convenience that $f$ is supported in $[0,1)^n$ and that $\int|\nabla f|=1$. The theorem only makes sense for suitable wavelets. In the present context it is convenient to use wavelets that are Lipschitz and have compact support. One pretends as we did in Chapter 6 that the wavelet basis is generated by a single Lipschitz function of Lipschitz norm at most one and having compact support inside the cube $[0,1]^n$. In reality, by rescaling one can obtain a finitely generated orthogonal wavelet basis in which all of the generators are supported inside $[0,1]^n$ with the added cost that these generators have large Lipschitz norm. The difference between the pretend basis and a true wavelet basis can be accounted for by a large constant $c(n)$ in the theorem. With this convention one can write, unambiguously, $\beta(Q)=\beta(j,k)$ when $Q=Q(j,k)$. We start with the following.

Lemma 7.3.2. If $\nabla f\in L^1$ then $|\beta(Q)|\le n\int_Q|\nabla f|$.

The lemma follows from a simple integration-by-parts argument. It tells us that the number of large normalized wavelet coefficients is bounded by the number of bad cubes: ones for which $\int_Q|\nabla f|\,dx$ is large. Badness here is a matter of degree. One says that $Q\in\mathcal{B}^p$, $p=0,1,2,\dots$ if $\int_Q|\nabla f|\,dx>2^{-p}$. Evidently, the $\mathcal{B}^p$ form an increasing family of trees that are called the $p$-level trees of $f$. In what follows one fixes $\lambda=2^{-q}$ for some $q\in\mathbb{N}$. Thereby, $\mathcal{B}\equiv\mathcal{B}^q$ will be fixed. The lemma is not enough to deduce the theorem: it allows for too many bad cubes. In particular, it does not pinpoint the subset of $Q$ on which the variation is localized. Importantly, if the variation is localized on a very small subset of $Q$ then that variation will not back-propagate into the wavelet coefficients at too many longer scales. Making this intuition precise is the problem.

As a first pass, one notes that, by Chebychev's inequality and $\int|\nabla f|\,dx=1$, the minimal elements $\mathcal{B}_{\min}$ of $\mathcal{B}$ are at most $2^q$ in number.

Definition 7.3.3. The $2^n$ offspring of $Q$ are denoted $Q^{(1)},Q^{(2)},\dots,Q^{(2^n)}$ such that $\mu(\nabla f,Q^{(j+1)})\le\mu(\nabla f,Q^{(j)})$ where $\mu(f,S)$ denotes the average of $|f|$ over $S$. The offspring thus ordered will be called the first child, second child, etc., of $Q$. $\mathcal{B}_\wedge$ denotes those $Q\in\mathcal{B}$ having at least two offspring in $\mathcal{B}$.

Since $\#\mathcal{B}_{\min}\le2^q$ it follows that $\#\mathcal{B}_\wedge\le2^q$ as well. The elements of $\mathcal{B}_\wedge$ are, in a sense, the worst of the bad cubes. However, these cubes are not the only ones for which $\beta(Q)$ can be large. Thus one wishes to augment the $\mathcal{B}_\wedge$ in such a way that (i) $|\beta(Q)|\le C2^{-q}$ if $Q$ is not in the augmented family and (ii) there are still $O(2^q)$ cubes in the augmented family. An added complication is that one must also take into account the degree of badness of the augmented cubes. An element of $\mathcal{B}=\mathcal{B}^q$ could also be in $\mathcal{B}^p$ for some $p<q$, that is, it could also satisfy $\int_Q|\nabla f|>2^{-p}$. The “worst” elements of $\mathcal{B}^q$ are, in an important sense to be made precise presently, those cubes that are not only in $\mathcal{B}^p$ but, in fact, are not too many levels above a $p$-bifurcation.
Definition 7.3.4. $\Lambda_q^p$ denotes the subset of $\mathcal{B}^q$ of those $Q\in\mathcal{B}^p$ such that $\mathcal{B}^p$ has a bifurcation node at most $2(q-p)$ generations below $Q$. One sets $\Lambda_q=\cup\Lambda_q^p$.

The definition is illustrated in Figure 7.3.

Fig. 7.3. The black node belongs to $\Lambda^p_{p+1}$. $\mathcal{B}^p$ is indicated by solid edges; $\mathcal{B}^{p+1}$ by solid or dashed edges

By definition then $\mathcal{B}^q_\wedge=\Lambda_q^q$. It will suffice to show that (i) $\#\Lambda_q\le C2^q$ while (ii) $|\beta(Q)|\le C2^{-q}$ holds for any $Q\notin\Lambda_q$. The proof of (i) is not so difficult. It is harder to show that the wavelet coefficient of a $\Lambda_q^p$ cube is not too large. Since the gradient cannot be too large on distinct yet big subcubes, the wavelet is close to its average when $f$ is varying and this allows one to control the size of the coefficient. Nevertheless, precise estimates still require a type of Calderón–Zygmund decomposition. First we show that there are not too many cubes in $\Lambda_q$.
Lemma 7.3.5. There is a $C$ independent of $f$ such that $\#\Lambda_q\le C2^q$.

Proof. Since $\#\mathcal{B}^p_\wedge\le2^p$ and since any $R\in\Lambda_q^p$ sits at most $2(q-p)+1$ levels above such a bifurcation one has
\[ \#\Lambda_q\le\sum_{p=0}^{q}\#\Lambda_q^p\le\sum_{p=0}^{q}2^p\,(2(q-p)+1)\le2^q\sum_{r=0}^{\infty}2^{-r}(2r+1)=C\,2^q. \]
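The sum in this proof can be evaluated directly. Since $\sum_{r\ge0}2^{-r}(2r+1)=6$, the constant may be taken to be $C=6$, and the short Python sketch below (the function name is a hypothetical choice for this illustration) tabulates the finite sum against $2^q$.

```python
def lambda_q_count_bound(q):
    """Sum over p = 0..q of 2**p * (2*(q - p) + 1), the bound on #Lambda_q."""
    return sum(2 ** p * (2 * (q - p) + 1) for p in range(q + 1))

for q in range(0, 11):
    bound = lambda_q_count_bound(q)
    # bound / 2**q equals sum_{r=0}^{q} 2**-r * (2r + 1), which increases toward 6
    print(q, bound, bound / 2 ** q)
```

The printed ratios increase with $q$ and stay below 6, in line with the last inequality in the display.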
This proves the lemma. The problem of proving Theorem 7.3.1 now reduces to that of proving

Proposition 7.3.6. If $Q\notin\Lambda_q$ then $|\beta(Q)|\le C2^{-q}$.

First, one sets $\wedge(Q)=\min(+\infty,\inf\{p:Q\in\mathcal{B}^p_\wedge\})$. Then $\mathcal{B}^{\wedge(Q)}$ is the sparsest level tree, if there is one, in which $Q$ is a bifurcation node. Membership in the complement of $\Lambda_q$ can be rephrased as:

$Q\notin\Lambda_q$ if and only if for every descendent $R$ of $Q$ either: $\wedge(R)>q$ OR $Q$ precedes $R$ by more than $2(q-\wedge(R))$ generations. (7.30)

In short, either $\wedge(R)$ is “large” or $R$ is “small.” To get started, note that if $\wedge(Q)$ is large then $f$ is well behaved outside a small subset of $Q$. In fact, with $Q^{(1)}$ as in Definition 7.3.3, one has the following.

Lemma 7.3.7. $\int_{Q\setminus Q^{(1)}}|\nabla f|\le2\,(2^n-1)\,2^{-\wedge(Q)}$.

Proof. If $\wedge(Q)=p$ then $Q^{(1)}$ and $Q^{(2)}$ belong to $\mathcal{B}^p$. Since $p$ is minimal for this property of $Q$, $Q^{(2)}\notin\mathcal{B}^{p-1}$, that is, $\int_{Q^{(2)}}|\nabla f|\le2^{1-\wedge(Q)}$. The same estimate holds for $Q^{(j)}$, $j=3,\dots,2^n$ due to their rank ordering.
The next step toward proving Proposition 7.3.6 is to make precise the analogue of the Calderón–Zygmund decomposition used in the Fefferman–Phong estimates. If $Q\notin\Lambda_q$ due to $Q\notin\mathcal{B}^q$ then, by Lemma 7.3.2, $|\beta(Q)|\le n2^{-q}$. If, on the other hand, $Q\in\mathcal{B}^q$ then one can define a decreasing chain of cubes $Q_0\supseteq\cdots\supseteq Q_r$ by setting $Q_0=Q$ and letting $Q_j=Q_{j-1}^{(1)}$, the first offspring of $Q_{j-1}$. The chain terminates as soon as $Q_{r+1}=Q_r^{(1)}\notin\mathcal{B}^q$, which must happen eventually since $\nabla f\in L^1(\mathbb{R}^n)$. With this notation one has the following.

Proposition 7.3.8. If $n\ge2$ then there is a constant $C$ such that for any $Q\in\mathcal{B}^q$,
\[ |\beta(Q)|\le C\Big(2^{-r-q}+\sum_{j=0}^{r}2^{-\wedge(Q_j)-j}\Big). \]

Proof. By Lemma 7.3.7 and the choice of the $Q_j$ it follows that
\[ \int_{Q_j\setminus Q_{j+1}}|\nabla f|\le C\,2^{-\wedge(Q_j)}\quad(0\le j<r);\qquad\int_{Q_r}|\nabla f|\le C\,2^{-q}. \tag{7.31} \]
One can take $C=2^{n+1}$. Next one claims that
\[ |\beta(Q)|\le C\sum_{j=0}^{r}2^{-j}\int_{Q_j\setminus Q_{j+1}}|\nabla f|. \tag{7.32} \]
The proposition follows from (7.32) and (7.31). The proof of (7.32) requires wavelet estimates. Recalling that $\psi_Q$ is supported in $Q$, one sets
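\[
\begin{aligned}
\alpha(Q)\equiv\langle f,\psi_Q\rangle&=\sum_{j=0}^{r-1}\int_{Q_j\setminus Q_{j+1}}f\,\psi_Q+\int_{Q_r}f\,\psi_Q=\sum_{j=0}^{r}\alpha_j\\
&=\sum_{j=0}^{r-1}\int_{Q_j\setminus Q_{j+1}}(f-m_j)\,\psi_Q+\int_{Q_r}(f-m_r)\,\psi_Q\\
&\quad+\sum_{j=0}^{r-1}m_j\int_{Q_j\setminus Q_{j+1}}\psi_Q+m_r\int_{Q_r}\psi_Q\equiv\sum_{j=0}^{r}\gamma_j+\delta_j
\end{aligned}
\]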
where $m_j=m(f,Q_j\setminus Q_{j+1})$ is the average of $f$ over $Q_j\setminus Q_{j+1}$ ($j<r$) or over $Q_r$ ($j=r$). For $j<r$, the estimate $|\psi_Q|\le|Q|^{-1/2}$ and Hölder's inequality yield, for $n'=n/(n-1)$,
\[ |\gamma_j|\le\frac{1}{|Q|^{1/2}}\int_{Q_j\setminus Q_{j+1}}|f-m_jf|\le\frac{|Q_j\setminus Q_{j+1}|^{1/n}}{|Q|^{1/2}}\Big(\int_{Q_j\setminus Q_{j+1}}|f-m_jf|^{n'}\Big)^{1/n'}. \]
A form of Poincaré's inequality [86] yields
\[ \|f-m_jf\|_{L^{n'}(Q_j\setminus Q_{j+1})}\le K\int_{Q_j\setminus Q_{j+1}}|\nabla f|. \]
The constant $K$ in Poincaré's inequality depends on the domain of integration. The particular version that we have applied has best constant invariant under shifts and which scales like $1/a$ under $x\mapsto ax$. Moreover, up to such a renormalization $Q_j\setminus Q_{j+1}$ is one of a finite family of Lipschitz domains (cf. Lemma 7.2.13). Therefore, with a constant $C$ independent of $Q$ one has
\[ |\gamma_j|\le C\,|Q|^{-1/2}\,|Q_j|^{1/n}\int_{Q_j\setminus Q_{j+1}}|\nabla f|. \tag{7.33} \]
Finally one exploits cancellation of the wavelets. Writing $\theta_j=\int_{Q_j}\psi_Q$ (with $\theta_{r+1}=0$) one has $\int_{Q_j\setminus Q_{j+1}}\psi_Q=\theta_j-\theta_{j+1}$. Therefore, $\sum_{j=0}^{r}\delta_j=\sum_{j=0}^{r}m_j(\theta_j-\theta_{j+1})$. Using the vanishing integral property of $\psi$ and setting $m_0=0$, summation by parts yields
\[
\begin{aligned}
\sum_{j=0}^{r}\delta_j&=m_0(\theta_0-\theta_1)+m_1(\theta_1-\theta_2)+\cdots+m_r(\theta_r-\theta_{r+1})\\
&=\theta_1(m_1-m_0)+\theta_2(m_2-m_1)+\cdots+\theta_r(m_r-m_{r-1}).
\end{aligned}
\]
Estimation of the means $m_j$ allows one to conclude that
\[ |m_j-m_{j-1}|\le\frac{C}{|Q_{j-1}\setminus Q_{j+1}|}\int_{Q_{j-1}\setminus Q_{j+1}}|f-\eta_j| \]
where $\eta_j=m(f,Q_{j-1}\setminus Q_{j+1})$ denotes the mean value of $f$ over $Q_{j-1}\setminus Q_{j+1}$. As before, $Q_{j-1}\setminus Q_{j+1}$ is a rescaling of one of a finite collection of connected
Lipschitz domains. Consequently, an estimate parallel to that for $\gamma_j$ allows one to conclude that
\[
\begin{aligned}
|m_j-m_{j-1}|&\le C\,|Q_{j-1}\setminus Q_{j+1}|^{(1-n)/n}\int_{Q_{j-1}\setminus Q_{j+1}}|\nabla f|\\
&\le C'\,|Q|^{(1-n)/n}\,2^{j(n-1)}\Big(\int_{Q_{j-1}\setminus Q_j}|\nabla f|+\int_{Q_j\setminus Q_{j+1}}|\nabla f|\Big).
\end{aligned}
\]
The trivial estimate $|\theta_j|\le|Q_j|/|Q|^{1/2}$ then gives
\[ \Big|\sum_{j=0}^{r}\delta_j\Big|\le C''\,|Q|^{1/n-1/2}\sum_{j=1}^{r}2^{-j}\Big(\int_{Q_{j-1}\setminus Q_j}|\nabla f|+\int_{Q_j\setminus Q_{j+1}}|\nabla f|\Big). \]
Combining this with (7.33) we have, finally,
\[ |\alpha(Q)|\le C\,|Q|^{1/n-1/2}\sum_{j=0}^{r}2^{-j}\int_{Q_j\setminus Q_{j+1}}|\nabla f|. \]
The estimate (7.32) now follows from the definition of $\beta(Q)$. This completes the proof of Proposition 7.3.8.

Now we are prepared to finish the proof of Theorem 7.3.1. By Proposition 7.3.8 one has $|\beta(Q)|\le C\big(2^{-r-q}+\sum_{j=0}^{r}2^{-\wedge(Q_j)-j}\big)$. If $Q\in\mathcal{B}^q\setminus\Lambda_q$ then for $R\subset Q$, if $\wedge(R)\le q$ then $Q$ precedes $R$ by more than $2(q-\wedge(R))$ generations. This observation will allow one to get a desired estimate on the sum
\[ \sum_{j=0}^{r}2^{-\wedge(Q_j)-j}=\sum_{\{j:\wedge(Q_j)\le q\}}2^{-\wedge(Q_j)-j}+\sum_{\{j:\wedge(Q_j)>q\}}2^{-\wedge(Q_j)-j}\equiv S_1+S_2 \]
in Proposition 7.3.8. Clearly,
\[ S_2\le2^{-q}\sum_{j=0}^{\infty}2^{-j}=2^{1-q}. \]
For the terms in $S_1$ one has $j>2(q-\wedge(Q_j))$ so that
\[ \sum_{\{j:\wedge(Q_j)\le q\}}2^{-\wedge(Q_j)-j}\le\sum_{k=0}^{q}\sum_{j>2(q-k)}2^{-k-j}=\sum_{k=0}^{q}2^{-k-2(q-k)}\le C2^{-q}. \]
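The double sum bounding $S_1$ can be confirmed with exact rational arithmetic. The sketch below, in which the function name and the truncation length are assumptions made for illustration, sums $2^{-k-j}$ over $0\le k\le q$ and over finitely many $j>2(q-k)$, and compares the result with $2^{-2q}(2^{q+1}-1)$, the closed form of $\sum_{k=0}^{q}2^{-k-2(q-k)}$, and with $2\cdot2^{-q}$.

```python
from fractions import Fraction

def s1_majorant(q, terms=60):
    """Sum of 2**-(k+j) over k = 0..q and the first `terms` values j > 2(q-k)."""
    total = Fraction(0)
    for k in range(q + 1):
        start = 2 * (q - k) + 1          # first index j with j > 2(q - k)
        for j in range(start, start + terms):
            total += Fraction(1, 2 ** (k + j))
    return total

for q in range(6):
    closed = Fraction(2 ** (q + 1) - 1, 4 ** q)   # sum_k 2**(-k - 2(q-k))
    print(q, float(s1_majorant(q)), float(closed), float(Fraction(2, 2 ** q)))
```

Up to the truncation of the inner geometric series, the first two printed columns coincide, and both stay below the third, so the constant $C=2$ suffices in the bound for $S_1$.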
This, finally, completes the demonstration that if $Q\notin\Lambda_q$ then $|\beta(Q)|\le C2^{-q}$. Together with the estimate $\#\Lambda_q\le C'2^q$ this proves that $\#\{Q:|\beta(Q)|>C_12^{-q}\}\le C_22^q$ when $f$ is supported in $[0,1)^n$ and $\int|\nabla f|=1$. The extensions of these estimates to any $f$ such that $\|\nabla f\|_1<\infty$ and any $\lambda>0$ are routine. This completes the proof of Theorem 7.3.1.
7.3.2 Wavelets and an improved Sobolev inequality

Theorem 7.3.1 requires $n>1$ indirectly through the Sobolev inequality
\[ \|f\|_{n'}\le c_n\|\nabla f\|_1,\qquad n'=n/(n-1), \tag{7.34} \]
which is unaffected by dilations. However, $\|\nabla(e^{2\pi i\omega x}f(x))\|_1\sim2\pi|\omega|\,\|f\|_1+\|\nabla f\|_1$ whereas $\|e^{2\pi i\omega x}f(x)\|_{n'}=\|f\|_{n'}$, so (7.34) lacks precision when $f$ is modulated. This suggests the possibility of sharpening the Sobolev inequality by including a multiplicative “norm” on the right that is small for oscillating functions, to obtain a more precise inequality of the form
\[ \|f\|_{n'}\le c_n\,\|\nabla f\|_1^{1/n'}\,\|f\|_B^{1/n} \tag{7.35} \]
in which the right-hand side is still invariant under dilations. Roughly, one wants $\|e^{2\pi i\omega x}f(x)\|_B\sim(1/|\omega|)^{n-1}$ for large $\omega$. If $f$ is appropriately localized then one can pretend that $e^{2\pi i\omega x}f(x)$ represents a single bounded oscillation localized about an interval of length $1/|\omega|$. Hence, for $|\omega|\sim2^j$ and an appropriately placed wavelet at scale $j$ one might obtain $|\langle e^{2\pi i\omega x}f(x),\psi_{jk}\rangle|\sim2^{-nj/2}$, or $2^{j(1-n/2)}|\langle e^{2\pi i\omega x}f(x),\psi_{jk}\rangle|\sim2^{j(1-n)}\sim(1/|\omega|)^{n-1}$. This suggests a heuristic definition of a wavelet coefficient function norm defined as
\[ \|f\|_B=\sup_{j,k,\psi}2^{j(1-n/2)}\,|\langle f,\psi_{jk}\rangle| \]
where the sup is taken over all wavelet indices and over the mother wavelets in an appropriately chosen finitely generated orthonormal wavelet basis. One takes this as a working definition of $B$.

Definition 7.3.9. The space $B$ consists of those tempered distributions modulo polynomials such that, in a basis of $C^1$ compactly supported wavelets for $\mathbb{R}^n$, one has $\sup_{j,k,\psi}2^{j(1-n/2)}|\langle f,\psi_{jk}\rangle|<\infty$.

It turns out that $B$ fits into the Besov scale as the space $\dot B_\infty^{-(n-1),\infty}$ (cf. (6.2)). Different wavelet bases with suitable regularity define equivalent norms on $B$. The inequality (7.35) applies with this space and is a consequence of Theorem 7.3.1. While the latter applies to Haar wavelets, (7.35) requires some regularity. Further details are found in [86]; see also [82] for related results.
7.4 More on the spectrum of Schrödinger operators

7.4.1 WKB approximation

Once wave mechanics was placed on a firm mathematical basis, the next task was to apply it to some of the fundamental problems of potential theory, including tunnelling through a potential barrier and determining eigenstates
of a potential well. The idea was to approximate the wave function as an oscillatory wave depending on a phase integral; then several problems could be solved by quadrature. Three applications of this approach were published around the same time by Wentzel [362], Kramers [236], and Brillouin [61], ergo the term WKB, though many other names are partially attached to the idea that arguably dates to Liouville [262]. We will restrict the discussion to a single variable.

The WKB approximation arises as follows. One starts with the standard time-independent Schrödinger eigenvalue problem
\[ \psi''+(\lambda^2-V)\psi=0 \tag{7.36} \]
which can be expressed in phase form $\psi(x)=e^{i\phi(x)}$ as is suggested by the case of the free Hamiltonian $V=0$. Substituting this form into (7.36) and setting $W=W_\lambda=\lambda^2-V$ yields
\[ -(\phi')^2+i\phi''+W=0. \tag{7.37} \]
When φ00 is small one obtains an initial approximation √ (φ0 )2 ≈ W or φ0 ≈ ± W . As a contingency, the initial approximation suggests that 2φ00 ≈ ±(W )−1/2 W 0 which, substituting in (7.37) yields the second ODE i (φ0 )2 ≈ ± (W )−1/2 W 0 + W 2 ln |W 2 +(W 0 )2 /(4W )|1/2 +i tan−1 (W 0 /(2W 3/2 )) = e 0 3/2 ≈ eln |W |+i(W /(2W )) , assuming that (W 0 )2 ¿ W 3 . Taking complex square roots yields √ 0 3/2 i W0 φ0 ≈ eln |W |/2+i(W /(4W )) = ± W + or 4W Z x√ i W + ln W + C so that φ≈± 4 0 µ Z xp ¶ A 2 exp ±i λ −V . ψλ ≈ 2 (λ − V )1/4 0
(7.38)
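The quality of the approximation (7.38) can be probed numerically. The following is a minimal sketch, not part of the original text: it assumes the illustrative profile $W(x) = (1 + \varepsilon x)^2$ (so that the phase integral is explicit), takes the plus sign and $A = 1$, and measures how far the WKB function is from solving $\psi'' + W\psi = 0$ using a central second difference. The names `EPS`, `psi_wkb` and `relative_residual` are hypothetical. For this profile a direct calculation suggests a relative residual of about $\tfrac34\varepsilon^2(1+\varepsilon x)^{-4}$, which is small exactly when $(W')^2 \ll W^3$.

```python
import cmath

EPS = 0.01  # slowness parameter of the assumed profile W(x) = (1 + EPS*x)**2


def W(x):
    """Assumed illustrative choice of W = lambda^2 - V (slowly varying, positive)."""
    return (1.0 + EPS * x) ** 2


def psi_wkb(x):
    """WKB function W^(-1/4) * exp(+i * int_0^x sqrt(W)) with A = 1."""
    # For this W the phase integral is explicit: int_0^x (1 + EPS*t) dt.
    phase = x + EPS * x * x / 2.0
    return W(x) ** -0.25 * cmath.exp(1j * phase)


def relative_residual(x, h=1e-3):
    """|psi'' + W psi| / |W psi|, with psi'' from a central second difference."""
    second = (psi_wkb(x + h) - 2.0 * psi_wkb(x) + psi_wkb(x - h)) / (h * h)
    return abs(second + W(x) * psi_wkb(x)) / abs(W(x) * psi_wkb(x))


if __name__ == "__main__":
    for x in (0.0, 5.0, 20.0):
        print(x, relative_residual(x))
```

Increasing `EPS` makes $W$ vary faster and the residual grows like `EPS**2`, in line with the smallness assumption used in the derivation.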
The general approximate solution is obtained by linear combinations over the choices of sign and may then be expressed in terms of sines and cosines.

7.4.2 Turning points and connection formulas

The WKB approximation breaks down when $V(x) \approx \lambda^2$. A zero crossing of $\lambda^2 - V$ is called a turning point because, in the classical picture, this is a point at which an oscillator changes direction. But in the quantum picture
the wave/particle need not be constrained to the allowed region $x^2 \le \lambda^2$. One wishes to quantify wave function energy that is transmitted beyond, or reflected from, a turning point. Mathematically, one can continue a wave function beyond a turning point by choosing an appropriate branch of the square root when $V > \lambda^2$ in order that the approximate wave function decays exponentially beyond the turning point. More troubling is whether the WKB approximation even makes sense in that case, and one needs to justify passing from the allowed to the forbidden region. This is accomplished by so-called connection formulas. Detailed analysis of connection formulas can be found in Merzbacher [272]. What is relevant here is the approximation of the transmission and reflection of energy about a turning point. As a specific example, if one normalizes so that the turning point in question occurs at $x = 0$ and so that $V(x) - \lambda^2 \approx x$ near $x = 0$, then approximate solutions $v$ of the renormalized form of (7.36) take the form
$$v \approx a\cos y + b\sin y \ \text{ in the allowed region}, \qquad v \approx c\cosh|y| + d\sinh|y| \ \text{ in the forbidden region},$$
in a neighborhood of the turning point, while the WKB approximation is valid away from the turning point. The two approximations are assumed to connect somewhere in between.

7.4.3 Spectral estimates for Schrödinger operators with slowly decaying potentials

In Section 7.2 we considered estimates for eigenvalues or pure point spectra of Schrödinger operators in $\mathbb{R}^n$ in terms of large potentials. In this section we consider a complementary problem concerning absolutely continuous spectra. The potential $V$ will be thought of as a perturbation of the free Hamiltonian $H_0 = -\Delta + 0$. Positivity will not be important but decay will. We will work exclusively in a single variable and return to the standard convention of writing $H = H_V = -d^2/dx^2 + V$.
A first basic question is: Under what conditions on $V$ will the solutions $\psi = \psi_\lambda$ of (7.36) behave, asymptotically, like those of a free Hamiltonian, that is, like linear combinations of $e^{\pm i\lambda x}$ as $x \to \pm\infty$? It is well known that if $V \in L^1(\mathbb{R})$ then this happens, a fact that is tied in with the existence of $\lim_{x\to\pm\infty}\int_0^x V$. If one merely assumes that $V \in L^p$ for some $p > 1$ then these limits may not exist and it becomes reasonable to ask: how should the asymptotic behavior of $\psi_\lambda$ depend on $V$?

To answer this question one needs to consider scattering effects. One-variable scattering can be described in terms of transmitted and reflected energy in the direction of increasing $x$. Thus one seeks a solution $\psi_\lambda^+(x)$ of (7.36) having the form
$$\psi_\lambda^+(x) \approx \begin{cases} t(\lambda)\, e^{i\lambda x}, & x > 0 \text{ large} \\ e^{i\lambda x} + r(\lambda)\, e^{-i\lambda x}, & x < 0 \text{ large} \end{cases} \tag{7.39}$$
in which $t(\lambda)$ and $r(\lambda)$ are called transmission and reflection coefficients, respectively. The behavior of $\int_0^x V$ appears in the phase of $\psi_\lambda^+$ according to the following alternative WKB approximation "near infinity" due to Christ and Kiselev [73].

Theorem 7.4.1. Assume that $V \in (L^1 + L^p)(\mathbb{R}_+)$ for some $p \in (1, 2)$. Then, for almost every $\lambda \in \mathbb{R}$ there is a solution of (7.36) of the form
$$\psi_\lambda^+(x) = e^{\,i\lambda x - \frac{i}{2\lambda}\int_0^x V}\,(1 + o(1)) \quad\text{as } x \to \infty.$$
Thus transmission and reflection are tied directly to the behavior of $\int_0^x V$ as $x \to \infty$. One only considers behavior on $\mathbb{R}_+$ here. To motivate the appearance of $\int_0^x V$, one begins by assuming that $\psi_\lambda(x) = e^{i\Phi(x,\lambda)}$. The property (7.36) implies that
$$i\,\Phi'' - (\Phi')^2 = (V - \lambda^2) \quad\text{or}\quad (\Phi')^2 = \lambda^2 - V$$
by equating real parts. The present emphasis on asymptotics versus turning points merits an approximation method. If one can write $\Phi(x) = \lambda x + \phi_1(x) + \phi_2(x)$ such that $\phi_2'(x) = o(\phi_1'(x))$, then $\Phi'(x) \approx \lambda + \phi_1'(x)$ or
$$(\lambda + \phi_1'(x))^2 \approx \lambda^2 - V.$$
Expanding the square gives $2\lambda\phi_1' + (\phi_1')^2 \approx -V$. If $(\phi_1')^2$ is assumed to be small compared to $\phi_1'$ then $\phi_1' \approx -V/(2\lambda)$ and, at least for large $x$, one has the approximation
$$\Phi(x) \approx \phi(x, \lambda) = \lambda x - \frac{1}{2\lambda}\int_0^x V. \tag{7.40}$$
One refers to $\phi(x, \lambda)$ as the WKB phase. The approximations can be justified for $V$ in a dense subspace of $L^1 + L^p$ of functions whose derivatives decay with increasing rapidity.

A second fundamental question is that of so-called asymptotic completeness of the absolutely continuous spectrum of $H$. When $f$ can be expressed as $f(x) = (1/(2\pi))\int \hat f(\lambda/2\pi)\, e^{i\lambda x}\, d\lambda$ one can form a solution of the free Schrödinger equation $H_0\psi = i\, d\psi/dt$, $\psi(x, 0) = f(x)$ in terms of eigenfunction solutions via
$$\psi(x, t) = e^{-iH_0 t} f(x) = \frac{1}{2\pi}\int \hat f\Bigl(\frac{\lambda}{2\pi}\Bigr)\, e^{i\lambda x - i\lambda^2 t}\, d\lambda. \tag{7.41}$$
Almost all of the functions $\{e^{\pm i\lambda x}\}_{\lambda \ge 0}$ then are needed to describe the evolution of an initial state $\psi(x, 0)$ under $e^{iH_0 t}$. It is reasonable that the set of solutions to (7.36) in Theorem 7.4.1 that exist for almost all $\lambda > 0$ should generate initial states for solutions of $H_V\psi = i\, d\psi/dt$ in a corresponding way. This
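was verified by Christ and Kiselev [74] for $V \in (L^1 + L^p)(\mathbb{R}_+)$, $p \in (1, 2)$, using WKB methods. Specifically, with approximation multipliers
$$\rho_\pm(\lambda, t) = e^{-i\lambda^2 t \mp \frac{i}{2\lambda}\int_0^{\pm 2\lambda t} V},$$
one considers the function
$$e^{iH_{\rho_\pm} t}\psi = \frac{1}{2\pi}\int_0^\infty e^{-i\lambda^2 t \mp \frac{i}{2\lambda}\int_0^{\pm 2\lambda t} V}\, \sin(\lambda x)\, \hat\psi\Bigl(\frac{\lambda}{2\pi}\Bigr)\, d\lambda \tag{7.42}$$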
as a WKB approximate solution of $e^{iHt}\psi$. To justify calling this an approximation, consider the modified wave operators $\Omega_\pm = \lim_{t\to\mp\infty} e^{itH} e^{-itH_{\rho_\pm}}$. Christ and Kiselev proved the following.

Theorem 7.4.2. Let $V \in (L^1 + L^p)(\mathbb{R}_+)$ for some $p \in (1, 2)$. Then the modified wave operators $\Omega_\pm$ are both unitary bijections from $L^2(\mathbb{R}_+)$ onto $H_{ac}(\mathbb{R}_+)$, the maximal closed subspace of $L^2(\mathbb{R}_+)$ on which $H$ has purely absolutely continuous spectrum.

Significantly, this result does not require any regularity of $V$ as one would expect of a statement about WKB approximations. Delicate analysis is required to pass from the case of regular $V$ to arbitrary $V$. A key point in justifying both the asymptotic behavior of $\psi_\lambda$ ($\lambda$-a.e.) in Theorem 7.4.1 and the well-definedness of the modified wave operators of Theorem 7.4.2 is a type of uniform local integrability of $\psi_\lambda(\cdot)$ with respect to $\lambda$, as captured in the following proposition.

Proposition 7.4.3. If $V \in (L^1 + L^p)(\mathbb{R})$ for some $p \in (1, 2)$ then for any compact $\Lambda \subset \mathbb{R}\setminus\{0\}$ one has $\int_\Lambda \log\|1 + \psi_\lambda(\cdot)\|_\infty\, d\lambda < \infty$.

The proof of the proposition will be outlined in the next several pages. It uses a sort of formal Taylor series expansion of $\psi_\lambda(\cdot)$, yet another type of WKB approximation. The $\lambda$-a.e. asymptotic behavior predicted by Theorem 7.4.1 follows essentially from the method of proof of the proposition. A full account of the results of Christ and Kiselev and their physical ramifications can be found in the expositions [72] and [75]. The specific aim here is to outline techniques of time–frequency analysis that the results engender.

To begin to address the question of asymptotic behavior one needs first at least a formal way of writing down solutions. A possible first step is to replace (7.36) by the first-order system
$$\begin{bmatrix} \psi_\lambda' \\ \psi_\lambda'' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ V - \lambda^2 & 0 \end{bmatrix} \begin{bmatrix} \psi_\lambda \\ \psi_\lambda' \end{bmatrix}. \tag{7.43}$$
One seeks a solution that tends asymptotically to $e^{i\phi(x,\lambda)}$ with WKB phase $\phi$ in (7.40). Now substitute
$$\begin{bmatrix} \psi_\lambda \\ \psi_\lambda' \end{bmatrix} = \begin{bmatrix} e^{i\phi} & e^{-i\phi} \\ i\lambda e^{i\phi} & -i\lambda e^{-i\phi} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}. \tag{7.44}$$
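For fixed $\lambda$, the boundedness of $w_1, w_2$ as functions of $x$ is equivalent to the boundedness of $\psi_\lambda$ and $\psi_\lambda'$, while $\psi_\lambda(x) \to e^{i\phi(x,\lambda)}$ as $x \to \infty$ if and only if $\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \to \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Moreover, $w$ itself is a solution of the system
$$\begin{bmatrix} w_1' \\ w_2' \end{bmatrix} = \frac{i}{2\lambda} \begin{bmatrix} 0 & -V e^{-2i\phi} \\ V e^{2i\phi} & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}. \tag{7.45}$$
One can solve formally for $\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$ by starting with
$$w^{(0)} = \begin{bmatrix} w_1^{(0)} \\ w_2^{(0)} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
and iterating the system (7.45). A single integration of (7.45) starting with $w^{(0)}$ yields
$$w^{(1)} = \begin{bmatrix} w_1^{(1)} \\ w_2^{(1)} \end{bmatrix} = -\frac{i}{2\lambda}\int_x^\infty \begin{bmatrix} 0 \\ V(y_1)\, e^{2i\phi(y_1)} \end{bmatrix} dy_1. \tag{7.46}$$
A second integration yields
$$w^{(2)} = \frac{-1}{(2\lambda)^2}\int_x^\infty\!\!\int_{y_1}^\infty \begin{bmatrix} -V(y_1)\, V(y_2)\, e^{-2i(\phi(y_1) - \phi(y_2))} \\ 0 \end{bmatrix} dy_2\, dy_1.$$
A solution $\psi_\lambda^+$ such that $\psi_\lambda^+ - e^{i\lambda x} \to 0$ as $x \to \infty$ while $\psi_\lambda^+ \sim a(x, \lambda)e^{i\lambda x} + b(x, \lambda)e^{-i\lambda x}$ as $x \to -\infty$ can be obtained as the combination
$$\psi_\lambda^+ = e^{\,i\lambda x - \frac{i}{2\lambda}\int_0^x V(y)\,dy}\, w_1 + e^{-i\lambda x + \frac{i}{2\lambda}\int_0^x V(y)\,dy}\, w_2$$
and can be expressed formally in terms of a limit of the iterates $w^{(n)}$ as
$$\psi_\lambda^+ = e^{\,i\lambda x - \frac{i}{2\lambda}\int_0^x V(y)\,dy}\sum_{n=0}^\infty (-1)^n\, T_{2n}(V, \dots, V)(x, \lambda) + e^{-i\lambda x + \frac{i}{2\lambda}\int_0^x V(y)\,dy}\sum_{n=1}^\infty (-1)^n\, T_{2n-1}(V, \dots, V)(x, \lambda) \tag{7.47}$$
in which
$$T_n(f_1, \dots, f_n)(x, \lambda) = \Bigl(\frac{i}{2\lambda}\Bigr)^n \int_{x \le y_1 \le \cdots \le y_n}\ \prod_{k=1}^n e^{(-1)^{n-k}\, 2i\phi(y_k, \lambda)}\, f_k(y_k)\, dy_k.$$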
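The passage from (7.36) to the system (7.45) through the substitution (7.44) can be checked numerically. The sketch below is only an illustration under stated assumptions: the potential $V(x) = (1+x)^{-3/4}$ (which lies in $L^p(\mathbb{R}_+)$ for $p > 4/3$ but not in $L^1$), the value $\lambda = 2$, the initial data $w(0) = (1, 0)$, and a classical Runge–Kutta integrator are all choices made here, not taken from the text. Both systems are integrated from $x = 0$ and the solution of (7.36) is compared with the one rebuilt from $(w_1, w_2)$ via (7.44).

```python
import cmath

LAM = 2.0  # spectral parameter lambda (assumed value for illustration)


def V(x):
    """Assumed slowly decaying potential, in L^p(R_+) for p > 4/3."""
    return (1.0 + x) ** -0.75


def phase(x):
    """WKB phase (7.40): lambda*x - (1/(2*lambda)) * int_0^x V, explicit for this V."""
    return LAM * x - 4.0 * ((1.0 + x) ** 0.25 - 1.0) / (2.0 * LAM)


def rhs_psi(x, y):
    """First-order form (7.43) of psi'' + (lambda^2 - V) psi = 0."""
    psi, dpsi = y
    return (dpsi, (V(x) - LAM ** 2) * psi)


def rhs_w(x, y):
    """System (7.45) for (w1, w2)."""
    w1, w2 = y
    c = 1j * V(x) / (2.0 * LAM)
    e = cmath.exp(2j * phase(x))
    return (-c * w2 / e, c * e * w1)


def rk4(rhs, y, x0, x1, steps):
    """Classical fourth-order Runge-Kutta on tuples of complex numbers."""
    h = (x1 - x0) / steps
    x = x0
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k1)))
        k3 = rhs(x + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k2)))
        k4 = rhs(x + h, tuple(a + h * b for a, b in zip(y, k3)))
        y = tuple(a + h / 6 * (p + 2 * q + 2 * r + s)
                  for a, p, q, r, s in zip(y, k1, k2, k3, k4))
        x += h
    return y


def substitution_mismatch(x_end=10.0, steps=5000):
    """Largest discrepancy between (psi, psi') and its reconstruction from (w1, w2)."""
    # w(0) = (1, 0) corresponds through (7.44) to psi(0) = 1, psi'(0) = i*lambda.
    psi, dpsi = rk4(rhs_psi, (1.0 + 0j, 1j * LAM), 0.0, x_end, steps)
    w1, w2 = rk4(rhs_w, (1.0 + 0j, 0j), 0.0, x_end, steps)
    e = cmath.exp(1j * phase(x_end))
    psi_from_w = e * w1 + w2 / e
    dpsi_from_w = 1j * LAM * (e * w1 - w2 / e)
    return max(abs(psi - psi_from_w), abs(dpsi - dpsi_from_w))


if __name__ == "__main__":
    print(substitution_mismatch())
```

The mismatch should be at the level of the integrator's truncation error. The sketch integrates forward from $x = 0$ rather than backward from infinity as in (7.46), so it tests only the change of variables, not the asymptotic normalization.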
Thus (7.47) is a formal Taylor expansion for $\psi_\lambda^+$. Further physical grounds for this expression in terms of transmitted and reflected energy can be found in [72] and [75]. Proving Proposition 7.4.3 amounts to establishing summability
of (7.47). It requires techniques from harmonic analysis that we only outline briefly now. The operator $T_n = T_{n,\phi}$ defines a multilinear operator with input arguments $f_k$ when the phase $\phi$ is fixed. One associates maximal operators
$$M_{n,\phi}(f_1, \dots, f_n)(\lambda) = \sup_x\, |T_{n,\phi}(f_1, \dots, f_n)(x, \lambda)| \tag{7.48}$$
and seeks bounds on the $M_{n,\phi}$ strong enough to conclude that the series $\sum M_{n,\phi}(f)$ (where $M_{n,\phi}(f) = M_{n,\phi}(f, \dots, f)$) and hence the corresponding series of terms $T_{n,\phi}(f)$ converges in the metric defined by Proposition 7.4.3 as well as pointwise in $\lambda$ whenever $f$ belongs to a suitable dense subspace of $L^p$. Letting $f$ tend to $V$ from this subspace then allows one to conclude that the Taylor series (7.47) converges pointwise in $x$ for almost every $\lambda$ and the asymptotic behavior follows. In the case of a linear operator $T$, norm boundedness of its maximal operator plus pointwise convergence on a dense subspace allow one to deduce almost-everywhere convergence of $Tf$ on the whole space by a standard limiting argument (see [327], p. 45). For multilinear operators, one requires an extra telescoping sum argument (see [72] for further details) that is valid, given strong enough bounds on the $M_{n,\phi}$.

7.4.4 Adapted martingales and pointwise bounds

Let $T : L^p(\mathbb{R}, dx) \to L^q(\Lambda, d\lambda)$ be any bounded linear operator with locally integrable distribution kernel $K(\lambda, x)$ and define
$$M_{n,T}(f_1, \dots, f_n)(\lambda) = \sup_{x \le z}\ \Bigl|\int_{x \le y_1 \le \cdots \le y_n \le z}\ \prod_i K(\lambda, y_i)\, f_i(y_i)\, dy_i\Bigr|.$$
One abbreviates this to $M_{n,T}(f)$ when $f_k = f$ for all $k$. Then one has the following (see [72]).

Theorem 7.4.4. If $p < q$ and $2 \le q$ then, for every $n = 1, 2, \dots$,
$$\|M_{n,T}(f)(\lambda)\|_{L^{q/n}(\Lambda)} \le \frac{\bigl(B_{p,q}\, \|T\|_{L^p \to L^q}\, \|f\|_{L^p}\bigr)^n}{\sqrt{n!}} \tag{7.49}$$
where $B_{p,q}$ depends only on $p, q$. The bound still holds if $K$ is replaced by $\bar K$ and/or if $f$ is replaced by $\bar f$ in any subset of the arguments $y_i$ defining $M_{n,T}$.

The requirement $2 \le q$ is not essential but it appears from use of a certain square function defined below. The statement about conjugates is needed in order to make the theorem applicable to the operators in (7.48) since the phase in each integral involves a choice of $\pm 1$. When $f_k = f$ for all $k$ one takes advantage of the symmetry to write
$$\int_{x \le y_1 \le \cdots \le y_n \le z}\ \prod_k K(\lambda, y_k)\, f(y_k)\, dy_k = \frac{1}{n!}\Bigl(\int_x^z K(\lambda, y)\, f(y)\, dy\Bigr)^n$$
which is readily seen from the fact that the iterated integral is symmetric with respect to any permutation of the variables whereas the regions generated by the permutations fill up the cube $[x, z]^n$. This observation does not account for the possible conjugations of $K$ or $f$ that play a role in (7.47).

The key to proving (7.49) is the notion of a martingale structure adapted to $f$ in $L^p$. A martingale structure on $\mathbb{R}$ is a collection $\{E_{jk}\}$ of intervals $E_{jk} = [a_{jk}, b_{jk})$, pairwise disjoint for each fixed $j$, indexed by $j \in \{0, 1, 2, \dots\}$ and $1 \le k \le 2^j$ such that $b_{jk} = a_{j,k+1}$ and $E_{j,k} = E_{j+1,2k-1} \cup E_{j+1,2k}$. Such a structure is adapted to $f$ in $L^p$ provided
$$\int_{E_{jk}} |f|^p = 2^{-j}\int_{\mathbb{R}} |f|^p \quad\text{for all } j, k.$$
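An adapted structure can be built explicitly by cutting the line at the quantiles of the measure $|f|^p\,dx$. The following minimal sketch works under simplifying assumptions that are not in the text: $|f|^p$ is taken to be a strictly positive, piecewise-constant density on a uniform grid of $[0, 1)$ (so that the cumulative mass is piecewise linear and can be inverted exactly), and only finitely many levels $j$ are produced. The function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def cumulative_mass(density, x):
    """Integral over [0, x] of a piecewise-constant density on a uniform grid of [0, 1)."""
    n = len(density)
    i = min(int(x * n), n - 1)          # index of the grid cell containing x
    return (sum(density[:i]) + density[i] * (x * n - i)) / n


def adapted_structure(density, levels):
    """Return E[j][k-1] = (a_jk, b_jk), each carrying mass 2**-j of the total.

    `density` holds strictly positive values of |f|^p on the grid cells.
    """
    n = len(density)
    cum = [0.0] + list(accumulate(density))   # n * (mass up to each grid point)
    total = cum[-1]

    def inverse(mass):
        # Locate the cell where the (scaled) cumulative mass reaches `mass`,
        # then interpolate linearly inside that cell.
        i = min(bisect_right(cum, mass) - 1, n - 1)
        return (i + (mass - cum[i]) / density[i]) / n

    return [[(inverse(total * (k - 1) / 2 ** j), inverse(total * k / 2 ** j))
             for k in range(1, 2 ** j + 1)]
            for j in range(levels + 1)]


if __name__ == "__main__":
    density = [1.0, 4.0, 0.5, 2.5, 3.0, 0.25, 1.5, 2.0]
    for level in adapted_structure(density, 2):
        print(level)
```

Because the cut points at level $j$ are a subset of those at level $j+1$, the nesting $E_{j,k} = E_{j+1,2k-1} \cup E_{j+1,2k}$ holds by construction. Intervals are short where $|f|^p$ is large and long where it is small.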
The following proposition illustrates the use of adapted martingales in passing from boundedness of $T$ to boundedness of $M_T f(x) = \sup_s |T(f\chi_{(-\infty,s]})(x)|$. The bound obtained also explains the constraint $p < q$.

Proposition 7.4.5. Let $1 \le p < q \le \infty$ and suppose that $T$ is a bounded linear operator from $L^p(\mathbb{R})$ to $L^q(\mathbb{R})$ with bound $\|T\|_{L^p \to L^q}$. Then for all $f$ in $L^p$ one has
$$\|M_T f\|_{L^q} \le \frac{2^{1/p}}{2^{1/p} - 2^{1/q}}\, \|T\|_{L^p \to L^q}\, \|f\|_{L^p}.$$

Proof. To prove the proposition, fix $f \in L^p$ and let $\{E_{jk}\}$ be a martingale structure adapted to $f$. For any point $s$ of density of $f$ one can choose $k = k(j)$ such that $\alpha_j = a_{jk(j)} \to s$, inducing a partition of $(-\infty, s]$ so that
$$|T(f\chi_{(-\infty,s]})(x)| = \Bigl|\Bigl(\int_{-\infty}^{\alpha_0} \pm \int_{\alpha_0}^{\alpha_1} \pm \cdots\Bigr) K(x, y)\, f(y)\, dy\Bigr| \le \sum_{j=0}^\infty\ \sup_{1 \le k \le 2^j} \bigl|T(f\chi_{E_{jk}})(x)\bigr| \le \sum_{j=0}^\infty \Bigl(\sum_{k=1}^{2^j} \bigl|T(f\chi_{E_{jk}})(x)\bigr|^q\Bigr)^{1/q} \equiv Gf(x). \tag{7.50}$$
By Minkowski's inequality,
$$\|Gf\|_q \le \sum_{j=0}^\infty \Bigl(\int \sum_{k=1}^{2^j} \bigl|T(f\chi_{E_{jk}})(x)\bigr|^q\, dx\Bigr)^{1/q} \le \|T\|_{L^p \to L^q} \sum_{j=0}^\infty \Bigl(2^{j(1 - q/p)}\, \|f\|_p^q\Bigr)^{1/q} = \|T\|_{L^p \to L^q}\, \|f\|_p \sum_{j=0}^\infty 2^{j(1/q - 1/p)} = \frac{2^{1/p}}{2^{1/p} - 2^{1/q}}\, \|T\|_{L^p \to L^q}\, \|f\|_p.$$
Since G does not depend on s, (7.50) now gives the result.
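The constant in Proposition 7.4.5 is just the sum of the geometric series that appears at the end of the proof, and it degenerates as $q$ approaches $p$. A small sketch (function names are hypothetical, introduced only for illustration) compares the closed form with a truncated sum:

```python
def maximal_constant(p, q):
    """Closed form 2^(1/p) / (2^(1/p) - 2^(1/q)) from Proposition 7.4.5."""
    if not 1 <= p < q:
        raise ValueError("requires 1 <= p < q")
    inv_q = 0.0 if q == float("inf") else 1.0 / q
    return 2 ** (1 / p) / (2 ** (1 / p) - 2 ** inv_q)


def geometric_partial_sum(p, q, terms=400):
    """Truncation of sum_{j >= 0} 2^(j (1/q - 1/p)), the series from the proof."""
    inv_q = 0.0 if q == float("inf") else 1.0 / q
    return sum(2.0 ** (j * (inv_q - 1 / p)) for j in range(terms))


if __name__ == "__main__":
    for p, q in [(1.0, 2.0), (1.5, 2.0), (1.9, 2.0), (1.0, float("inf"))]:
        print(p, q, maximal_constant(p, q), geometric_partial_sum(p, q))
```

The ratio of the series is $2^{1/q - 1/p} < 1$ only when $p < q$, which is where the argument uses that constraint.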
We pause to consider the following maximal extension of the Hausdorff–Young inequality. Suppose that $V \in L^1_{\mathrm{loc}}$ vanishes weakly at infinity, meaning $\int_{x+[0,1]} |V| \to 0$ as $x \to \infty$. As before, set $\phi(x, \lambda) = \lambda x - \frac{1}{2\lambda}\int_0^x V$. One has

Theorem 7.4.6. If $V$ vanishes weakly at infinity then for any $1 \le p < 2$ and any compact set $\Lambda \subset \mathbb{R}\setminus\{0\}$, the mapping
$$f \mapsto \sup_x\ \Bigl|\int_0^x e^{i\phi(y,\lambda)}\, f(y)\, dy\Bigr|$$
is bounded from $L^p(\mathbb{R})$ to $L^{p'}(\Lambda)$.

The theorem follows from Proposition 7.4.5 and boundedness of $f \mapsto \int e^{i\phi(y,\lambda)} f(y)\, dy$. The latter follows from the same argument as the standard interpolation proof of Hausdorff–Young. The $L^2$ estimate uses the weak vanishing property. It is proved by duality and integration by parts.

Armed with Proposition 7.4.5 we can outline how Proposition 7.4.3 can be deduced from the formal Taylor expansion (7.47). First one has the following.

Lemma 7.4.7. There is a constant $B > 0$ such that when $\{E_{jk}\}$ is a martingale structure adapted to $f$ in $L^p$ one has
$$\sup_{x \le z}\ \Bigl|\int_{x \le y_1 \le \cdots \le y_n \le z}\ \prod_{i=1}^n f(y_i)\, dy_i\Bigr| \le \frac{B^n}{\sqrt{n!}} \Bigl(\sum_{j=0}^\infty j \Bigl(\sum_{k=1}^{2^j} \bigl|\langle f, \chi_{E_{jk}}\rangle\bigr|^2\Bigr)^{1/2}\Bigr)^n.$$

One considers the slightly simpler case in which the supremum is omitted, taking $x \to -\infty$ and $z \to \infty$. Then writing $|M_n(f)|$ for the left-hand side and partitioning the region $S$ of integration of $|M_n(f)|$ into $S = \cup S_m$ where $S_m = \{y \in S : y_i \in E_{11},\ i \le m \text{ and } y_i \in E_{12},\ i > m\}$ one then has
$$|M_n(f)| \le |\langle f, \chi_{E_{11}}\rangle|\, |M_{n-1}(f\chi_{E_{12}})| + |\langle f, \chi_{E_{12}}\rangle|\, |M_{n-1}(f\chi_{E_{11}})| + \Bigl(\sum_{l=2}^{n-2} |M_{n-l}(f\chi_{E_{11}})|\, |M_l(f\chi_{E_{12}})|\Bigr) + |M_n(f\chi_{E_{12}})| + |M_n(f\chi_{E_{11}})|.$$
The lemma follows from iterating and carefully estimating the resulting sums.

Set $G_T(f)(\lambda) = \sum_{j=0}^\infty j \Bigl(\sum_{k=1}^{2^j} \bigl|T(f\chi_{E_{jk}})(\lambda)\bigr|^2\Bigr)^{1/2}$. Similar analysis gives the following estimate.

Lemma 7.4.8. There is a constant $B$ such that for any $n, \lambda$ one has
$$M_{n,T}(f)(\lambda) \le \frac{B^n\, G_T(f)(\lambda)^n}{\sqrt{n!}}.$$

Arguing as in the proof of Proposition 7.4.5 also gives
Lemma 7.4.9. If $p < q$ and $q \ge 2$ then there is a $C$ such that if $T$ is a bounded linear operator from $L^p(\mathbb{R})$ to $L^q(\mathbb{R})$ then $\|G_T(f)\|_q \le C\, \|T\|_{L^p \to L^q}\, \|f\|_p$.

Theorem 7.4.4 follows directly from Lemmas 7.4.9 and 7.4.8. However, to prove Proposition 7.4.3 one prefers to work directly from Lemma 7.4.8. Indeed, this lemma justifies the formal Taylor expansion (7.47) because the latter is then majorized pointwise by the series $\sum_{n=0}^\infty B^n G_T(V)(\lambda)^n / \sqrt{n!}$ where $T$ is the operator with kernel $K(x, \lambda) = e^{i\phi(x,\lambda)}$. If $G = G_T(V)(\lambda)$ is essentially bounded on any compact subset $\Lambda \subset \mathbb{R}\setminus\{0\}$ then
$$\Bigl(\sum_{n=0}^\infty \frac{B^n G^n}{\sqrt{n!}}\Bigr)^2 = \sum_{n=0}^\infty \frac{(BG)^{2n}}{n!}\Bigl(1 + 2\sum_{k=1}^\infty \frac{B^k G^k}{\bigl(\prod_{l=n+1}^{n+k} l\bigr)^{1/2}}\Bigr) \le C \exp\bigl(C_{\Lambda,V}\, G_T(V)(\lambda)^2\bigr).$$
Taking logarithms, applying (7.47), (7.48) and the pointwise estimate Lemma 7.4.8 proves Proposition 7.4.3. In particular, $\psi_\lambda(x)$ is defined and nonvanishing for almost every $\lambda > 0$.

7.4.5 The endpoint p = 2 and Carleson-type operators

The martingale methods just outlined are adequate for addressing the subtle and difficult problem of existence, boundedness and asymptotic behavior of generalized eigenfunctions of a broad and important class of Schrödinger potentials. However, the methods fail at the conjectured limit $p = 2$ of spectral asymptotics as expressed, e.g., by Proposition 7.4.3. The analogues of Theorems 7.4.1 and 7.4.2 are known to fail for $p > 2$. One cannot tackle $p = 2$ because the martingale methods only extend $L^p \to L^q$ boundedness of $T$ to that of $M_T$ when $p < q$. One expects the multilinear operators $T_{n,\phi}$ and their maximal versions to be bounded in the full Hölder range, specifically, from $L^p \times \cdots \times L^p$ to $L^{p/n}$. This will undoubtedly put some restrictions on the phase $\phi$ (see below). In the remainder of this section we will review what is known in the simplest cases, pointing out the connection with multilinear singular integrals.

The simplest case of the operators $T_n$ arising in (7.47) corresponds to the free Hamiltonian, $V = 0$. Then
$$T_n(f_1, \dots, f_n)(x, \lambda) = \Bigl(\frac{i}{2\lambda}\Bigr)^n \int_{x \le y_1 \le \cdots \le y_n}\ \prod_{k=1}^n e^{(-1)^{n-k}\, 2i\lambda y_k}\, f_k(y_k)\, dy_k. \tag{7.51}$$
For the time being replace $y_k$ by $\xi_k$ and $\lambda$ by $\pi x$ and write $\hat f_k$ in place of $f_k$. Then one has
$$T_n(\hat f_1, \dots, \hat f_n)(\xi_0, x) = \Bigl(\frac{i}{2\pi x}\Bigr)^n \int_{\xi_0 \le \xi_1 \le \cdots \le \xi_n}\ \prod_{k=1}^n e^{(-1)^{n-k}\, 2\pi i x \xi_k}\, \hat f_k(\xi_k)\, d\xi_k.$$
By Fourier uniqueness one can think of these operators as being operators of inputs $f_k$ rather than of $\hat f_k$. To describe the problem of determining optimal $L^p$ mapping properties of these operators, consider $n = 1$, the simplest case of all. Thinking of $T_1(\hat f)$ as a function of $f$ rather than of $\hat f$, and disregarding the factor $i/(2\pi x)$, $T_1 = T$ takes the form
$$f \mapsto T(f)(\xi, x) = \int_\xi^\infty e^{2\pi i x \eta}\, \hat f(\eta)\, d\eta.$$
Taking suprema in $\xi$ gives rise to Carleson's operator (see (7.57) below). $L^p$ bounds on its maximal version amount to the Carleson–Hunt theorem (e.g., [135]) which implies the almost everywhere convergence of Fourier series in $L^p$, $1 < p < \infty$. When $n = 2$, one has the mapping
$$T_2(f, g)(\xi_0, x) = \int_{\xi_0}^\infty\!\!\int_\xi^\infty e^{2\pi i x(\eta - \xi)}\, \hat f(\xi)\, \hat g(\eta)\, d\eta\, d\xi$$
which, upon letting $\xi_0 \to -\infty$, takes the form
$$T_2(f, g)(x) = \int_{-\infty}^\infty\!\!\int_\xi^\infty e^{2\pi i x(\eta - \xi)}\, \hat f(\xi)\, \hat g(\eta)\, d\eta\, d\xi. \tag{7.52}$$
To express $T_2$ as an integral kernel operator one formally writes
$$\begin{aligned}
T_2(f, g)(x) &= \int_{-\infty}^\infty e^{-2\pi i x \xi}\, \hat f(\xi) \int_\xi^\infty e^{2\pi i x \eta}\, \hat g(\eta)\, d\eta\, d\xi \\
&= \int_{-\infty}^\infty e^{-2\pi i x \xi}\, \hat f(\xi) \int_0^\infty e^{2\pi i x(\eta + \xi)}\, \hat g(\eta + \xi)\, d\eta\, d\xi \\
&= \int_{-\infty}^\infty \hat f(\xi) \int_0^\infty e^{2\pi i x \eta}\, \hat g(\eta + \xi)\, d\eta\, d\xi \\
&= \frac{1}{2}\int_{-\infty}^\infty \hat f(\xi)\, \bigl((I + iH)(e^{-2\pi i \cdot \xi}\, g(\cdot))\bigr)(x)\, d\xi \quad\text{(see below)} \\
&= \frac{1}{2} f(-x)\, g(x) + \frac{i}{2}\ \mathrm{p.v.}\!\int_{-\infty}^\infty \hat f(\xi) \int \frac{e^{-2\pi i \xi(x - t)}\, g(x - t)}{t}\, dt\, d\xi \\
&= \frac{1}{2} f(-x)\, g(x) + \frac{i}{2}\ \mathrm{p.v.}\!\int \frac{g(x - t)\, f(t - x)}{t}\, dt \\
&= \frac{1}{2}(I + iH)\bigl(f(-\cdot)\, g(\cdot)\bigr)(x)
\end{aligned}$$
in which $H$ denotes the Hilbert transform (6.25). That is, $T_2$ defined by (7.52) is the Cauchy integral of the pointwise product $f(-x)g(x)$. Since $(f, g) \mapsto fg$ is bounded from $L^2 \times L^2$ into $L^1$ while $H$ is bounded from $L^1$ to $L^{1,\infty}$, one has $T_2 : L^2 \times L^2 \to L^{1,\infty}$.

The signs $\pm 1$ in the phases defining $T_{n,\phi}$ can have a dramatic effect on its mapping properties. The innocent looking replacement of $\eta - \xi$ by $\eta + \xi$ in the phase of $T_2$ defined by (7.52) results in the operator
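$$\widetilde{T}_2(f, g)(x) = \int_{-\infty}^\infty\!\!\int_\xi^\infty e^{2\pi i x(\eta + \xi)}\, \hat f(\xi)\, \hat g(\eta)\, d\eta\, d\xi.$$
Following the same formal pattern above one arrives at
$$\widetilde{T}_2(f, g)(x) = \frac{1}{2} f(x)\, g(x) + \frac{i}{2}\ \mathrm{p.v.}\!\int \frac{f(x + t)\, g(x - t)}{t}\, dt \tag{7.53}$$
$$= \frac{1}{2} f(x)\, g(x) + \frac{i}{2}\, \mathrm{BHT}(f, g) \tag{7.54}$$
where $\mathrm{BHT}(f, g)$ denotes the bilinear Hilbert transform
$$\mathrm{BHT}(f, g) = \iint -i\, \mathrm{sgn}(\xi - \eta)\, e^{2\pi i x(\eta + \xi)}\, \hat f(\xi)\, \hat g(\eta)\, d\eta\, d\xi. \tag{7.55}$$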
Significantly, in the case of $T_2$, the discontinuity of the symbol and the zero of the phase coincide along the line $\eta = \xi$. In the case of $\widetilde{T}_2$, the discontinuity along $\eta = \xi$ is perpendicular to the line of vanishing phase. The integral kernel p.v. $1/t$ still has a point singularity but translations now treat $f$ and $g$ differently, making $\mathrm{BHT}(f, g)$ much harder to analyze than $H(fg)$. Nevertheless, in the end $\mathrm{BHT}(f, g)$ is better behaved than $H(fg)$ as, in fact, it maps $L^2 \times L^2 \to L^1$ as Lacey and Thiele proved [239]. This fact was originally conjectured by Calderón as a first step of one possible means of assault on the problem of boundedness of the Cauchy integral on Lipschitz curves. However, proving boundedness of BHT turned out to be more difficult than proving continuity of the Cauchy integral. But the technical ingenuity required to prove boundedness of BHT offers promise for handling (7.47) in terms of its multilinear pieces.

Matters become increasingly complicated for larger $n$. The operator
$$T_3(f, g, h)(x) = \int_{-\infty}^\infty\!\!\int_\xi^\infty\!\!\int_\eta^\infty e^{2\pi i x(\gamma - \eta + \xi)}\, \hat f(\xi)\, \hat g(\eta)\, \hat h(\gamma)\, d\gamma\, d\eta\, d\xi$$
turns out to be a degenerate case of the trilinear family
$$(f, g, h) \mapsto \int_{-\infty}^\infty\!\!\int_\xi^\infty\!\!\int_\eta^\infty e^{2\pi i x(c_3\gamma + c_2\eta + c_1\xi)}\, \hat f(\xi)\, \hat g(\eta)\, \hat h(\gamma)\, d\gamma\, d\eta\, d\xi \tag{7.56}$$
that has been analyzed by Muscalu et al. [283, 286] for generic values of $(c_1, c_2, c_3)$: the case $(1, -1, 1)$ does not satisfy the mapping properties $L^p \times L^q \times L^r \to L^s$ in the Hölder range $1/p + 1/q + 1/r = 1/s$ as it does in the case $c_1 = c_2 = c_3 = 1$ when (7.56) defines a sort of trilinear version of the Hilbert transform. In the next section we will consider a Walsh model BHT and its boundedness properties.
7.5 Walsh models revisited

7.5.1 A Walsh model for the Carleson operator

In this section we continue to work in a single variable. We wish to consider a Walsh model for the Carleson operator
$$C(f)(x) = \sup_\xi\ \Bigl|\int_{-\infty}^\xi e^{2\pi i x \eta}\, \hat f(\eta)\, d\eta\Bigr|. \tag{7.57}$$
A corresponding model for the bilinear Hilbert transform will be presented in the next subsection. Carleson proved that $C$ maps $L^2$ to $L^{2,\infty}$ and Hunt later proved its $L^p$-boundedness, $1 < p < \infty$. The operators $C$ and BHT have models in the Walsh phase plane. In matters concerning multilinear operators with nonsmooth symbols (e.g., [161, 239, 283]) the Walsh setting consistently serves to provide a blueprint for the line of assault on the Euclidean problem, which is always more technical because of the uncertainty principle. The Walsh models for $C$ and BHT indicate the general pattern for the model operators corresponding to Christ and Kiselev's spectral asymptotics. Operator bounds for the Walsh BHT will be proved in Section 7.5.3.

The Walsh functions were introduced in Chapter 4. Because convergence properties of oscillatory expansions can depend on the order of terms, we will consider here an enumeration of the Walsh functions different from the sequency ordering. Thus we set
$$W_0(x) = \chi_{[0,1)}(x), \qquad W_{2n}(x) = W_n(2x) + W_n(2x - 1), \qquad W_{2n+1}(x) = W_n(2x) - W_n(2x - 1).$$
This ordering is called the natural or Paley ordering as opposed to the previously employed sequency ordering. One still wishes to think of $W_n$ as occupying the tile $[0, 1) \times [n, n + 1)$ in the upper half-plane. As before, the functions $W_P = W_{jkn}(x) = 2^{j/2} W_n(2^j x - k)$ are then thought of as wave packets (in fact they are wavelet packets for the Haar wavelets) supported in the time interval $[k/2^j, (k + 1)/2^j)$ and associated with the frequency interval $[2^j n, 2^j(n + 1))$. The product of these intervals forms a Heisenberg tile $P$ in the Walsh time–frequency plane. All of the combinatorial properties established for the Walsh phase plane under sequency ordering still remain true with the Paley ordering. In particular, there is an orthogonal transformation that converts any time sibling pair into a frequency sibling pair. We start by considering the Walsh model for the Carleson operator $C$.
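The Paley-ordered recursion above translates directly into code. The following is a minimal sketch (the function names are chosen here for illustration): it evaluates $W_n$ recursively, forms the packets $W_{jkn}$, and computes the Gram matrix of $W_0, \dots, W_{N-1}$ on $[0, 1)$ for $N$ a power of two, using the fact that these functions are constant on dyadic cells of length $1/N$ so that sampling at cell midpoints integrates them exactly.

```python
def walsh(n, x):
    """Paley-ordered Walsh function W_n(x), from W_0 = indicator of [0, 1)."""
    if n == 0:
        return 1 if 0 <= x < 1 else 0
    half, odd = divmod(n, 2)
    left, right = walsh(half, 2 * x), walsh(half, 2 * x - 1)
    # W_{2m} = W_m(2x) + W_m(2x - 1),  W_{2m+1} = W_m(2x) - W_m(2x - 1)
    return left - right if odd else left + right


def packet(j, k, n, x):
    """Wave packet W_{jkn}(x) = 2^(j/2) * W_n(2^j x - k)."""
    return 2.0 ** (j / 2) * walsh(n, 2.0 ** j * x - k)


def gram(count):
    """Gram matrix of W_0..W_{count-1} in L^2[0, 1); count must be a power of two."""
    mids = [(i + 0.5) / count for i in range(count)]
    return [[sum(walsh(a, x) * walsh(b, x) for x in mids) / count
             for b in range(count)]
            for a in range(count)]


if __name__ == "__main__":
    for row in gram(4):
        print(row)
    print([walsh(3, x) for x in (0.125, 0.375, 0.625, 0.875)])
```

The Gram matrix comes out as the identity, reflecting orthonormality of the first $N$ Walsh functions. The sign pattern of $W_3$ on quarter intervals is $+,-,-,+$, whereas in sequency ordering the function with that pattern has index 2.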
For bounded, measurable $\Phi$, the linearized Carleson operator $C_\Phi$ is defined by
$$C_\Phi(f)(x) = \int_{\xi < \Phi(x)} \hat f(\xi)\, e^{2\pi i x \xi}\, d\xi. \tag{7.58}$$
The Carleson operator can be thought of as $C(f)(x) = \sup_\Phi |C_\Phi(f)(x)|$. One proves that it is bounded by finding bounds on $C_\Phi$ that do not depend on $\Phi$.
Fig. 7.4. Walsh packet tiles contributing to $WC_\Phi(f)(x)$; the point $(x, \Phi(x))$ is marked in the figure.
A Walsh model operator $WC_\Phi(f)$ will have the form $\sum_{P \in \mathcal{P}_\Phi} \langle f, W_P\rangle\, W_P$ for an appropriate set $\mathcal{P}_\Phi$ of Walsh packet tiles. Since $C_\Phi(f)(x)$ consists of the superposition of all frequency components of $f$ below $\Phi(x)$, it makes sense that the sum defining $WC_\Phi(f)$ should only propagate up to $\Phi(x)$ as well. In order that a tile appearing in $WC_\Phi(f)$ provides optimal frequency resolution its time interval should be as large as possible. This can be arranged by including in $WC_\Phi(f)(x)$ those tiles $B_-$ such that $(x, \Phi(x)) \in B^+$, see Figure 7.4. As in Chapter 4, $B_-$ and $B^+$ denote the lower and upper frequency sibling tiles of a bitile $B$, a product of dyadic intervals having area two. With these observations in mind, one defines the linearized Walsh–Carleson operator:
$$WC_\Phi(f)(x) = \sum_{B:\ (x, \Phi(x)) \in B^+} \langle f, W_{B_-}\rangle\, W_{B_-}(x). \tag{7.59}$$
7.5.2 A Walsh quartile operator and the BHT

Next we turn to the problem of finding a Walsh model for the bilinear Hilbert transform in (7.55). The difficulty in estimating $\mathrm{BHT}(f, g)$ is the lack of smoothness of the frequency cut-off. The standard Littlewood–Paley method for establishing $L^p$-boundedness of the usual Hilbert transform $Hf(x) = \int -i\, \mathrm{sgn}(\xi)\, \hat f(\xi)\, e^{2\pi i x \xi}\, d\xi$, whose symbol $-i\, \mathrm{sgn}(\xi)$ is discontinuous at the origin, represents $Hf$ as $-i \sum_{j=0}^\infty \int \mathrm{sgn}(\xi)\, b_j(\xi)\, \hat f(\xi)\, e^{2\pi i x \xi}\, d\xi$ where the $b_j$ form a smooth resolution of unity supported around dyadic annuli. The idea of resolving frequencies this way underlies the Walsh model for the BHT as well. But now
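one must resolve pairs of frequencies with respect to the line of discontinuity, $\xi = \eta$, via
$$\iint_{\eta < \xi} \hat f(\xi)\, \hat g(\eta)\, e^{2\pi i x(\xi + \eta)}\, d\xi\, d\eta = \sum_\omega \iint \hat f(\xi)\, \hat g(\eta)\, e^{2\pi i x(\xi + \eta)}\, b_\omega(\xi)\, b_{\omega'}(\eta)\, d\xi\, d\eta \tag{7.60}$$
in which the $b_\omega$ form a decomposition of unity. The specific organization comes from a Whitney decomposition of the domain $\eta < \xi$ in $\mathbb{R} \times \mathbb{R}$ (see Figure A.1). The squares $\omega \times \omega'$ have the form $[2^j k, 2^j(k + 1)) \times [2^j(k - 2), 2^j(k - 1))$, $j, k \in \mathbb{Z}$ [347] so that, not only is the condition $\eta < \xi$ met but the two frequencies are resolved by their corresponding packets. In the Euclidean BHT the frequency component $\xi + \eta$ then lies in $[2^{j+1}(k - 1), 2^{j+1} k)$. The component $f * b_\omega^\vee$ has Walsh analogue $\sum_{I \text{ dyadic},\ |I||\omega| = 1} \langle f, W_{I \times \omega}\rangle\, W_{I \times \omega}$ and one might guess that the Walsh analogue of the $\omega$th summand in (7.60) might also be the product term
$$\sum_{I \text{ dyadic},\ |I||\omega| = 1} \langle f, W_{I \times \omega}\rangle\, \langle g, W_{I \times \omega'}\rangle\, W_{I \times \omega}\, W_{I \times \omega'}$$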
since $W_{I \times \omega}$ is supported in $I$. But here the Walsh case differs significantly from the Euclidean case: when multiplying two Walsh functions the oscillations tend to cancel rather than add. Arguably, if the oscillations cancel too much then the two Walsh packets did not do an adequate job of separating the frequency components of $f, g$ in the first place. It is reasonable to posit then that the Walsh packets appearing in a model BHT should have frequency interval intermediate to those of the coefficient packets. The resulting Walsh model operator would be a superposition of packets of the form $\langle f, W_{P_0}\rangle\, \langle g, W_{P_2}\rangle\, W_{P_1}$ in which the $P_i$ have adjacent but distinct frequency intervals. It is not really canonical so one proposes a slight alternative having essentially the same structure but a slightly more convenient organization.

One defines a quartile $Q$ (not to be confused with a cube in $\mathbb{R}^n$) to be a product $I \times \sigma$ of dyadic intervals in $\mathbb{R} \times [0, \infty)$. The time and frequency intervals of $Q$ are denoted $I_Q$ and $\omega_Q$, respectively. A quartile $Q$ will have four frequency sibling tiles denoted $Q_0, Q_1, Q_2, Q_3$ in order of increasing frequency. Its time siblings will be denoted $Q_{ll}, Q_{lr}, Q_{rl}, Q_{rr}$ in order of increasing $x$. With these conventions one can define an appropriately normalized Walsh model BHT via
$$WB(f, g)(x) = \sum_Q \frac{1}{|I_Q|^{1/2}}\, \langle f, W_{Q_0}\rangle\, \langle g, W_{Q_1}\rangle\, W_{Q_2}(x) \tag{7.61}$$
where the sum extends over all quartiles. The operator
$$WB_\pi(f, g) = \sum_Q \frac{1}{|I_Q|^{1/2}}\, \langle f, W_{Q_{\pi(0)}}\rangle\, \langle g, W_{Q_{\pi(1)}}\rangle\, W_{Q_{\pi(2)}},$$
where $\pi$ is any permutation of $\{0, 1, 2\}$, can be estimated by the same methods that apply to $WB$. More insights regarding Walsh packet models for BHT and related operators can be found in [160] and [347]. Walsh models for more complicated multilinear and maximal operators that arise in the WKB asymptotics of Christ and Kiselev have been considered by Muscalu et al. (e.g., [283] and [284]).

7.5.3 Estimates for the Walsh bilinear Hilbert transform

In this section we consider boundedness properties of $WB$.

Theorem 7.5.1. The Walsh bilinear Hilbert transform (7.61) is bounded from $L^p \times L^q(\mathbb{R})$ into $L^r(\mathbb{R})$ whenever $p, q, r$ are in the Hölder range $1/p + 1/q = 1/r$ with $r > 2/3$. In particular, $WB$ is bounded from $L^2 \times L^2(\mathbb{R}) \to L^1(\mathbb{R})$.

We will follow Thiele's approach [346], taking into account several technical adjustments made by Gilbert and Nahmod [160]. Many of the intricate combinatorial innovations to be used originated in Fefferman's proof of almost-everywhere convergence of Fourier series [135]. One defines good and bad sets having measure depending on a parameter $\kappa$ so that a weak-type estimate can be established. Interpolation then leads to the desired strong-type estimate. Forests and trees are defined in terms of density properties of $f$, while the bad set is defined in terms of the time intervals of those Walsh packets with respect to which $f$ is too dense too often. Properties of the Walsh phase plane outlined in Chapter 4, which apply equally to the Paley order used here, will be used as needed.

In view of the quartile structure of $WB$, we will think of Walsh packets associated with the quartile $Q = I \times \omega = I \times \omega_0 \cup \cdots \cup I \times \omega_3$ where $I = [4^j l, 4^j(l + 1))$ and $\omega = [4^{1-j} n, 4^{1-j}(n + 1))$ and $W_{Q_i} = 2^{-j}\, W_{4n+i}(4^{-j} x - l) = (\delta_{4^j} \circ \tau_l)\, W_{4n+i}$.

In order to prove strong-type estimates for $WB$ one first proves weak-type estimates then applies the following instance of a multilinear interpolation theorem due to Janson [210].
Suppose that the bilinear operator $B$, initially defined on some dense subspace of $L^p \times L^q(\mathbb{R})$ ($0 < p, q < \infty$), extends to a bounded operator from $L^{p_i} \times L^{p_j}(\mathbb{R}) \to L^{r_0,\infty}(\mathbb{R})$ for $(i, j) \in \{(0, 2), (2, 0)\}$ where $1/p_0 + 1/p_2 = 1/r_0$ and from $L^{p_i} \times L^{p_j}(\mathbb{R}) \to L^{r_1,\infty}(\mathbb{R})$ for $(i, j) \in \{(1, 2), (2, 1)\}$ where $1/p_1 + 1/p_2 = 1/r_1$. Then $B$ extends continuously to $L^p \times L^q(\mathbb{R}) \to L^r(\mathbb{R})$ whenever $1/p + 1/q = 1/r$.

The symmetry conditions on the exponent pairs $(p_i, p_j)$ indicate that it is necessary to establish the same weak-type bounds for $\widetilde{WB}(f, g) = WB(g, f)$ as are needed for $WB(f, g)$. By a certain duality argument, in order to establish $L^p \times L^q(\mathbb{R}) \to L^r(\mathbb{R})$ for the full Hölder range of exponents it is enough to establish the boundedness on a certain subset of the Hölder range.

Proposition 7.5.2. If $WB$ and $\widetilde{WB}$ are bounded from $L^p \times L^q(\mathbb{R}) \to L^{r,\infty}(\mathbb{R})$ whenever $1 < p < 2 \le q < \infty$ and $1/p + 1/q = 1/r$, then $WB$ extends to a bounded operator $WB : L^p \times L^q(\mathbb{R}) \to L^r(\mathbb{R})$ whenever $1/p + 1/q = 1/r < 3/2$.
One further reduction is required for which it will be convenient to assume that $f$ lies in the dense subspace of $L^2 \cap L^p$ consisting of those functions that can be written as finite linear combinations of Walsh packets. In the sequel we will assume that such an $f$ is fixed and proceed to establish estimates for the linear mapping $g \mapsto WB(f, g)$ with bounds proportional to $\|f\|_p$. When we wish to emphasize this implicit role of $f$ we will write $WB(g)$ in lieu of $g \mapsto WB(f, g)$. The following estimates, to which we will refer as the main lemma, enable one to verify the hypotheses of Proposition 7.5.2.

Lemma 7.5.3. (main lemma) There is a $C > 0$ such that for each $\kappa > 0$ there is a set $E \subset \mathbb{R}$ such that the estimates
$$|E| \le \Bigl(\frac{\|f\|_p}{\kappa}\Bigr)^p \tag{7.62}$$
and
$$\Bigl(\int_{\mathbb{R}\setminus E} |WB(f, g)|^2\Bigr)^{1/2} \le C \Bigl(\frac{\|f\|_p}{\kappa}\Bigr)^{-\alpha} \|f\|_p\, \|g\|_q \tag{7.63}$$
hold uniformly for all g ∈ Lq (R) where 2 ≤ q < ∞ and α = p(1/r − 1/2). Here is how weak-type boundedness follows from Lemma 7.5.3. Let λ > 0 p q and choose κ = (λq (kf kp / kgkq ))1/(p+q) . Then the estimate (7.62) and the H¨older condition r = pq/(p + q) imply that à ! µ ¶p µ ¶r p q p/(p+q) kf kp kf kp kgkq kf kp kgkq |E| ≤ = pq/(p+q) = p κ λ kf kp λ while (7.63) then gives Z
µ µ ¶¶2α/(p+q) kf kpp 2(1−α) 2 q kf kp kgkq |WB(f, g)| ≤ C λ q kgk q R\E µ ¶r kf k kgk p q = Cλ2 . λ 2
By Chebychev's inequality,
$$|\{x : |WB(f,g)(x)| > \lambda\}| \le \frac{1}{\lambda^2} \int_{\mathbb{R}\setminus E} |WB(f,g)|^2\,dx + |E| \le C' \Big(\frac{\|f\|_p \|g\|_q}{\lambda}\Big)^{r}.$$
Thus, if the conditions of the lemma are satisfied then $WB$ maps $L^p \times L^q$ continuously into $L^{r,\infty}$. Now we need to verify the conditions of Lemma 7.5.3: the existence of a bad set $E$ whose measure can be controlled and a complementary good set on which $L^2$ estimates can be established.
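The exponent bookkeeping that turns the two estimates of the main lemma into the weak-type bound is easy to get wrong by hand, so a small numerical check may be useful. The following is only a sketch: the function name `weak_type_check` is hypothetical, the constant $C$ in (7.63) is set to 1, and the norms are treated as plain positive numbers. It evaluates the right side of (7.62) and the square of the right side of (7.63) divided by $\lambda^2$ for the stated choice of $\kappa$, and compares both with $(\|f\|_p \|g\|_q / \lambda)^r$.

```python
import math

def weak_type_check(p, q, norm_f, norm_g, lam):
    """Return (bound on |E|, L2 bound squared / lam^2, target) for the kappa used in the text."""
    r = p * q / (p + q)                # Hoelder condition 1/p + 1/q = 1/r
    alpha = p * (1.0 / r - 0.5)        # exponent alpha from the main lemma
    kappa = (lam ** q * norm_f ** p / norm_g ** q) ** (1.0 / (p + q))
    bad_set_bound = (norm_f / kappa) ** p                              # right side of (7.62)
    l2_bound = (norm_f / kappa) ** (-alpha) * norm_f * norm_g          # right side of (7.63), C = 1
    target = (norm_f * norm_g / lam) ** r
    return bad_set_bound, l2_bound ** 2 / lam ** 2, target

bad, good, target = weak_type_check(p=1.5, q=4.0, norm_f=2.0, norm_g=3.0, lam=0.7)
assert math.isclose(bad, target) and math.isclose(good, target)
```

Both contributions to the Chebychev estimate therefore scale like $(\|f\|_p \|g\|_q / \lambda)^r$, which is exactly the weak-type $L^{r,\infty}$ bound.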
Denote by $\mathcal{P}$ the collection of all dyadic tiles and by $N_{\mathbf{P}}(x) = \#\{P \in \mathbf{P} : x \in P\}$ the counting function of a set $\mathbf{P}$ of pairwise disjoint tiles in $\mathcal{P}$. Let $\mathbf{P}_k = \mathbf{P}_k(f)$ denote a collection of pairwise disjoint tiles $P$ in $\mathcal{P}$ satisfying
$$|\langle f, W_P \rangle| \ge 2^k \sqrt{|I_P|}. \tag{7.64}$$
Because Walsh packets coming from disjoint tiles are orthogonal, one has the Carleson-type estimate
$$\int N_{\mathbf{P}_k}(x)\,dx = \sum_{P \in \mathbf{P}_k} |I_P| \le 2^{-2k} \|f\|_2^2. \tag{7.65}$$

Proposition 7.5.4. Fix $f \in L^p(\mathbb{R})$ ($1 < p < 2$). For each $\varepsilon > 0$ there is a constant $C = C(p, \varepsilon)$ such that
$$\big\| N_{\mathbf{P}_k}^{1/(p'+\varepsilon)} \big\|_p \le C\, 2^{-k} \|f\|_p.$$
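Proof (of Proposition 7.5.4). Let $\beta(x) = \{\langle f, W_P \rangle W_P(x)\}_{P \in \mathbf{P}_k}$. The $L^p(\ell^s)$ norm of $\beta$ is
$$\|\beta\|_{L^p(\ell^s)} = \Big(\int \Big(\sum_{P \in \mathbf{P}_k} |\langle f, W_P \rangle W_P(x)|^s\Big)^{p/s} dx\Big)^{1/p}.$$
Again, orthogonality of Walsh packets of disjoint tiles implies
$$\|\beta\|_{L^2(\ell^2)} = \Big(\sum_{P \in \mathbf{P}_k} |\langle f, W_P \rangle|^2\Big)^{1/2} \le \|f\|_2$$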
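whereas, since $|\langle f, W_P \rangle W_P(x)| \le Mf(x)$ where $Mf$ is the Hardy–Littlewood maximal function of $f$,
$$\|\beta\|_{L^{1+\varepsilon}(\ell^\infty)}^{1+\varepsilon} = \int \sup_{P \in \mathbf{P}_k} |\langle f, W_P \rangle W_P(x)|^{1+\varepsilon}\,dx \le \int Mf(x)^{1+\varepsilon}\,dx \le C_\varepsilon \|f\|_{1+\varepsilon}^{1+\varepsilon}$$
by the maximal theorem. Using complex interpolation (e.g., [47]) it follows that
$$\|\beta\|_{L^p(\ell^{p'+\tilde\varepsilon})} \le C_{\tilde\varepsilon,\varepsilon,p} \|f\|_p \qquad (1 < p < 2) \tag{7.66}$$
holds for $\tilde\varepsilon$ sufficiently small. At the same time, since $\chi_P = \chi_P^{1+\delta}$ for any $\delta > 0$, by (7.64),
$$N_{\mathbf{P}_k}(x) = \sum_{P \in \mathbf{P}_k} \chi_P(x) \le \sum_{P \in \mathbf{P}_k} \big(2^{-k} |\langle f, W_P \rangle W_P(x)|\big)^{p'+\tilde\varepsilon}$$
so that by (7.66) one has
$$\big\| N_{\mathbf{P}_k}^{1/(p'+\tilde\varepsilon)} \big\|_p^p \le \int \Big(\sum_{P \in \mathbf{P}_k} \big(2^{-k} |\langle f, W_P \rangle W_P(x)|\big)^{p'+\tilde\varepsilon}\Big)^{p/(p'+\tilde\varepsilon)} dx \le \frac{1}{2^{kp}} \int \big(\|\beta(x)\|_{\ell^{p'+\tilde\varepsilon}}\big)^p\,dx \le \frac{C}{2^{kp}} \|f\|_p^p.$$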
This proves the proposition.
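The Carleson-type estimate (7.65) can be checked numerically in a discrete setting. The sketch below is an illustration under stated assumptions, not the construction used in the text: signals are lists of $N = 2^n$ samples with unit spacing (so $|I_P|$ is a number of samples), only tiles of one fixed time length `block` are considered (these are pairwise disjoint and their packets are orthonormal), and the helper names `walsh_matrix`, `packet_coefficients` and `carleson_sides` are hypothetical.

```python
import random

def walsh_matrix(size):
    """Rows are the +/-1 Walsh (Hadamard) functions on `size` samples; `size` must be a power of 2."""
    rows = [[1]]
    while len(rows) < size:
        # Sylvester doubling: [[H, H], [H, -H]]
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows

def packet_coefficients(f, block):
    """Inner products <f, W_P> over all tiles P whose time interval has length `block`."""
    walsh = walsh_matrix(block)
    coeffs = []
    for start in range(0, len(f), block):
        segment = f[start:start + block]
        for row in walsh:
            # W_P equals block**-0.5 times a Walsh function on its time interval, zero elsewhere
            coeffs.append(sum(a * b for a, b in zip(segment, row)) / block ** 0.5)
    return coeffs

def carleson_sides(f, block, k):
    """Return (sum of |I_P| over tiles satisfying (7.64), 2^(-2k) * ||f||_2^2)."""
    selected = [c for c in packet_coefficients(f, block) if abs(c) >= 2 ** k * block ** 0.5]
    return len(selected) * block, 2.0 ** (-2 * k) * sum(x * x for x in f)

random.seed(0)
f = [random.gauss(0.0, 1.0) for _ in range(64)]
lhs, rhs = carleson_sides(f, block=8, k=-1)
assert lhs <= rhs
```

The inequality holds for every choice of `k` because each selected coefficient contributes at least $2^{2k}|I_P|$ to $\sum |\langle f, W_P \rangle|^2$, which orthonormality bounds by $\|f\|_2^2$.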
Fig. 7.5. A pair of trees. Dashed boxes are included to make trees convex.
Lemma 7.5.3 is based on a decomposition of $WB$ into sums over trees. The decomposition takes account of the quartile structure of the operator and the trees will be chosen accordingly. Quartiles $Q$ are partially ordered by containment of their time intervals, $Q \preceq Q'$ if $I_Q \subset I_{Q'}$ and $\omega_{Q'} \subset \omega_Q$; disjoint quartiles are not comparable. In order to extend the notion of density implicit in (7.64) to quartiles one needs to consider the operator $\Pi_{\mathbf{Q}} f$, the orthogonal projection onto the space spanned by the Walsh packets of tiles contained in $\cup_{Q \in \mathbf{Q}} Q$. When $\mathbf{Q}$ is a single quartile this subspace is four-dimensional. The density of a quartile with respect to $f$ is defined as
$$\delta(Q, f) = \sup_{Q' \succeq Q} \|\Pi_{Q'} f\|_\infty.$$
Since $x$ lies in only one of the four time siblings defining $\Pi_Q f(x)$, one has
$$\|\Pi_Q f\|_\infty = (4|I_Q|)^{-1/2} \max\{|\langle f, Q_{ll} \rangle|, |\langle f, Q_{lr} \rangle|, |\langle f, Q_{rl} \rangle|, |\langle f, Q_{rr} \rangle|\}.$$
Thus one can express $\|\Pi_Q f\|_\infty$ in terms of a single Walsh packet coefficient. Define now the quartiles of homogeneous density $2^k$ to be $\mathbf{Q}_k = \{Q : 2^k \le \delta(Q, f) < 2^{k+1}\}$. These quartiles provide an initial decomposition of $WB(g)$. A refinement of this decomposition comes from setting
$$\mathbf{Q}_{k,\nu} = \big\{Q \in \mathbf{Q}_k : 2^\nu \le \#\{Q' \in \mathbf{Q}_k^{\max} : Q' \succeq Q\} < 2^{\nu+1}\big\}.$$
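The two-parameter sorting of quartiles by density and by the number of dominating maximal quartiles amounts to a pair of dyadic binning steps. The following sketch illustrates only that bookkeeping: it assumes the density $\delta(Q, f)$ and the count $\#\{Q' \in \mathbf{Q}_k^{\max} : Q' \succeq Q\}$ have already been computed for each quartile, and the names `dyadic_index` and `sort_into_forests` are hypothetical.

```python
import math
from collections import defaultdict

def dyadic_index(value):
    """Return the integer j with 2**j <= value < 2**(j + 1), for value > 0."""
    if value <= 0:
        raise ValueError("value must be positive")
    _, exponent = math.frexp(value)    # value = m * 2**exponent with 0.5 <= m < 1
    return exponent - 1

def sort_into_forests(quartiles):
    """Group quartile labels by the pair (k, nu).

    `quartiles` maps a label to (density, count); both numbers are assumed
    to be precomputed and are not derived here.
    """
    forests = defaultdict(list)
    for label, (density, count) in quartiles.items():
        forests[(dyadic_index(density), dyadic_index(count))].append(label)
    return dict(forests)

example = {"Q1": (0.30, 1), "Q2": (0.45, 3), "Q3": (1.0, 2), "Q4": (1.9, 2)}
forests = sort_into_forests(example)
```

Each label lands in exactly one bin, which mirrors the statement that the collections for different pairs $(k, \nu)$ do not overlap.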
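The collections $\mathbf{Q}_{k,\nu}$ are pairwise disjoint for different $k, \nu$. Each collection $\mathbf{Q}_{k,\nu}$ will be called a forest since the partial ordering on quartiles induces a tree structure $\mathbf{T}_{\bar Q}$ on the collection of quartiles in $\mathbf{Q}_{k,\nu}$ that are bounded by (i.e., $\preceq$) a fixed maximal quartile $\bar Q$ in $\mathbf{Q}_{k,\nu}$ (see Figure 7.5). In this case $\bar Q$ will be called the tree-top of $\mathbf{T}_{\bar Q}$. With this terminology, $WB(g)$ can be expressed as a sum
$$WB(g) = \sum_{Q \in \mathbf{Q}} \frac{1}{\sqrt{|I_Q|}} \langle f, W_{Q_0} \rangle \langle g, W_{Q_1} \rangle W_{Q_2} = \sum_{k \in \mathbb{Z}} \sum_{\nu=0}^{\infty} \sum_{Q \in \mathbf{Q}_{k,\nu}} \frac{1}{\sqrt{|I_Q|}} \langle f, W_{Q_0} \rangle \langle g, W_{Q_1} \rangle W_{Q_2} \equiv \sum_{k \in \mathbb{Z}} \sum_{\nu=0}^{\infty} F_{k,\nu}(f, g)$$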
where the forest operators $F_{k,\nu}$ are linear in $g$. Forests $\mathbf{Q}_{k,\nu}$ are clearly disjoint for different $k$ and/or $\nu$ but one also wants to express each forest as a disjoint collection of trees. Fefferman [135] referred to this fact as the no-vee lemma.

Lemma 7.5.5. For $k, \nu$ fixed, any two trees in $\mathbf{Q}_{k,\nu}$ are disjoint.

Proof. The result is not true for arbitrary collections of quartiles since maximal elements could share the same time interval and have neighboring frequency intervals. One proves the lemma by contradiction. Suppose that $Q \in \mathbf{Q}_{k,\nu}$ is such that $Q \preceq \bar Q^1$ and $Q \preceq \bar Q^2$ where the $\bar Q^i$ are maximal elements of $\mathbf{Q}_{k,\nu}$. By definition of $\mathbf{Q}_{k,\nu}$ these $\bar Q^i$ each must be bounded by at least $2^\nu$ maximal quartiles in $\mathbf{Q}_k$. Moreover, those quartiles bounding $\bar Q^1$ must be disjoint from those bounding $\bar Q^2$ because the frequency intervals of $\bar Q^1$ and $\bar Q^2$ are minimal, hence disjoint. In this case $Q$ is bounded by $2^{\nu+1}$ maximal elements of $\mathbf{Q}_k$, which contradicts the assumption that $Q \in \mathbf{Q}_{k,\nu}$.

Lemma 7.5.5 tells us that each forest operator can be expressed as a sum of tree operators $T_{\bar Q}$:
$$g \mapsto F_{k,\nu}(f, g) = \sum_{\bar Q \in \mathbf{Q}_{k,\nu}^{\max}} T_{\bar Q}(f, g), \qquad T_{\bar Q}(f, g) = \sum_{Q \in \mathbf{Q}_{k,\nu} \cap \mathbf{T}_{\bar Q}} \frac{1}{\sqrt{|I_Q|}} \langle f, W_{Q_0} \rangle \langle g, W_{Q_1} \rangle W_{Q_2}. \tag{7.67}$$
These tree operators are the most basic constituents of $g \mapsto WB(f, g)$ and they are local in the sense that $T_{\bar Q}(f, g)(x)$ depends only on the restriction of $g$ to $I_{\bar Q}$. The tree operators are pairwise orthogonal because of Lemma 7.5.5. Both locality and orthogonality will play a role in estimating the forest operators but first one needs to use the finite packet hypothesis on $f$ to ensure that $WB(f, g)$ also can be written as a finite sum.
Lemma 7.5.6. The set of quartiles in $\mathbf{T}_{\bar Q}$ is a finite convex set of tiles.

Proof. Let $\mathbf{T}_{\bar Q}$ be a tree in $\mathbf{Q}_{k,\nu}$ with tree-top $\bar Q$ and suppose that $Q \preceq Q' \preceq Q''$ with $Q, Q''$ in $\mathbf{T}_{\bar Q}$. Then $Q' \preceq \bar Q$, so one just needs that $Q' \in \mathbf{Q}_{k,\nu}$. Since the density $\delta(Q, f)$ is a supremum over the quartiles dominating $Q$, it is monotone decreasing in $Q$ and one has
$$2^k \le \delta(Q'', f) \le \delta(Q', f) \le \delta(Q, f) < 2^{k+1}$$
so that $Q' \in \mathbf{Q}_k$. On the other hand, $\#\{\tilde Q \in \mathbf{Q}_k^{\max} : \tilde Q \succeq Q\}$ is monotone decreasing on $\mathbf{Q}_k$ and, since this number lies in $[2^\nu, 2^{\nu+1})$ for both $Q$ and $Q''$, one must also have $Q' \in \mathbf{Q}_{k,\nu}$ and the lemma follows.

One has the following pointwise estimate for trees.

Lemma 7.5.7. Let $\mathbf{T}_{\bar Q}$ be a tree in $\mathbf{Q}_{k,\nu}$ with tree-top $\bar Q$ and let $\Pi_{\mathbf{T}_{\bar Q}}$ denote the orthogonal projection onto the subspace of $L^2(\mathbb{R})$ spanned by $\{W_{Q_0}\}_{Q \in \mathbf{T}_{\bar Q}}$. Then, whenever $x \in I_{\bar Q}$ one has $|\Pi_{\mathbf{T}_{\bar Q}} f(x)| \le C\, 2^k$.

Proof. Since a tree is convex it can be decomposed into a disjoint set of tiles comprising the quartiles in $\mathbf{T}_{\bar Q}$ (Lemma 4.5.6). Let $\mathbf{P}(\bar Q)$ consist of those tiles and let $\tilde{\mathbf{P}}(\bar Q)$ consist of those tiles lying in the region covered by $\mathbf{P}(\bar Q)$. Then $\Pi_{\mathbf{T}_{\bar Q}}$ can also be realized as (cf. Theorem 4.5.7)
$$\Pi_{\mathbf{T}_{\bar Q}} f = \sum_{P \in \tilde{\mathbf{P}}(\bar Q)_{\min}} \langle f, W_P \rangle W_P.$$
Fix $x \in I_{\bar Q}$ and let $Q$ be the minimal quartile in $\mathbf{T}_{\bar Q}$ that contains $x$ in $I_Q$. Then $x$ belongs to one of the time siblings of $Q$. Without loss of generality, say, $x \in I_{Q_{ll}}$. If $Q_{ll} \in \tilde{\mathbf{P}}(\bar Q)_{\min}$ then, as observed above, $\Pi_Q f(x) = \Pi_{Q_{ll}} f(x) = \Pi_{\mathbf{T}_{\bar Q}} f(x)$ and the inequality $|\Pi_{\mathbf{T}_{\bar Q}} f(x)| = |\Pi_Q f(x)| \le C\, 2^k$ follows from $Q \in \mathbf{Q}_k$.

Suppose on the other hand that $Q_{ll} \notin \tilde{\mathbf{P}}(\bar Q)_{\min}$. Then one can find four time sibling tiles $R_{ll}, \ldots, R_{rr}$ such that $Q_{ll}$ is contained in their union and such that any of these tiles dominated by $Q_{ll}$ is an element of $\tilde{\mathbf{P}}(\bar Q)$. But then the union of these time siblings is a quartile $R$ such that $I_R = I_{Q_{ll}}$, so $R \preceq Q$. Convexity of $\mathbf{T}_{\bar Q}$ implies that $R \in \mathbf{T}_{\bar Q}$. But this contradicts minimality of $Q$ among those quartiles in $\mathbf{T}_{\bar Q}$ containing $x$. Thus it must be that $Q_{ll} \in \tilde{\mathbf{P}}(\bar Q)_{\min}$. This proves Lemma 7.5.7.
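Now we can prove the forest estimate needed to prove Lemma 7.5.3.

Proposition 7.5.8. There is a $C > 0$ such that the forest estimate $\|F_{k,\nu}(g)\|_2 \le C\, 2^k \|g\|_2$ holds for all $k, \nu$ and all $g \in L^2$.

Proof. Since the intervals $I_{\bar Q}$ are pairwise disjoint for different maximal elements of $\mathbf{Q}_{k,\nu}^{\max}$,
$$\|F_{k,\nu}(g)\|_2^2 = \sum_{\bar Q \in \mathbf{Q}_{k,\nu}^{\max}} \big\|T_{\bar Q}(g\, \chi_{I_{\bar Q}})\big\|_2^2$$
so it is sufficient to show that $\|T_{\bar Q}(g)\|_2 \le C\, 2^k \|g\|_2$. Fix $\xi \in \omega_{\bar Q}$ and set
$$\mathbf{T}^\xi_{\bar Q} = \{Q \in \mathbf{T}_{\bar Q} : \xi \in \omega_{Q_1}\}, \qquad \tilde{\mathbf{T}}^\xi_{\bar Q} = \mathbf{T}_{\bar Q} \setminus \mathbf{T}^\xi_{\bar Q}.$$
If two quartiles $Q, Q'$ overlap then at most one of their frequency interval pairs $\omega_{Q_i}, \omega_{Q'_i}$ can overlap. Since $Q, Q' \in \mathbf{T}^\xi_{\bar Q}$ implies $\xi \in \omega_{Q_1} \cap \omega_{Q'_1}$ it follows that $\omega_{Q_2} \cap \omega_{Q'_2} = \emptyset$ and the Walsh packets $W_{Q_2}$ and $W_{Q'_2}$ are orthogonal. Since one can also replace $f$ in (7.67) by $\Pi_{\mathbf{T}_{\bar Q}} f$ this orthogonality yields
$$\|T^\xi_{\bar Q}(f, g)\|_2^2 = \sum_{Q \in \mathbf{T}^\xi_{\bar Q}} \frac{1}{|I_Q|} \big|\langle \Pi_{\mathbf{T}_{\bar Q}} f, W_{Q_0} \rangle \langle g, W_{Q_1} \rangle\big|^2.$$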
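One can apply a maximal estimate to $T^\xi_{\bar Q}(f, g)$ as follows. One views $\mathbf{T}^\xi_{\bar Q}$ as a discrete measure space whose quartiles are assigned measure $\mu(\{Q\}) = |\langle \Pi_{\mathbf{T}_{\bar Q}} f, W_{Q_0} \rangle|^2$. For any $\lambda > 0$ one has the maximal function estimate
$$\{Q : |\langle g, W_{Q_1} \rangle|^2 \ge \lambda |I_Q|\} \subset \{x : Mg(x)^2 \ge \lambda\}.$$
Therefore, by Lemma 7.5.7, one has
$$\mu(\{Q : |\langle g, W_{Q_1} \rangle|^2 \ge \lambda |I_Q|\}) \le \int_{\{x :\, Mg(x)^2 \ge \lambda\}} \big|\Pi_{\mathbf{T}_{\bar Q}} f(x)\big|^2\,dx \le C\, 2^{2k} \big|\{x : Mg(x)^2 \ge \lambda\}\big|.$$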
Now one can proceed much as in Section 6.9, taking advantage of the local nature of $T^\xi_{\bar Q}$ to write the $L^2$ norm of $T^\xi_{\bar Q}(f, g)$ in terms of a distribution function as
$$\|T^\xi_{\bar Q}(f, g)\|_2^2 = \int_0^\infty \mu(\{Q : |\langle g, W_{Q_1} \rangle|^2 \ge \lambda |I_Q|\})\,d\lambda \le C\, 2^{2k} \int_0^\infty \big|\{x : Mg(x)^2 \ge \lambda\}\big|\,d\lambda = C\, 2^{2k} \int Mg(x)^2\,dx \le C\, 2^{2k} \|g\|_2^2$$
by the maximal theorem. On the other hand, set
$$\tilde T^\xi_{\bar Q}(f, g) = \sum_{Q \in \tilde{\mathbf{T}}^\xi_{\bar Q}} \frac{1}{\sqrt{|I_Q|}} \langle f, W_{Q_0} \rangle \langle g, W_{Q_1} \rangle W_{Q_2}.$$
Since $\omega_{Q_1} \cap \omega_{Q'_1} = \emptyset$ whenever $Q, Q' \in \tilde{\mathbf{T}}^\xi_{\bar Q}$, the set $\{W_{Q_1} : Q \in \tilde{\mathbf{T}}^\xi_{\bar Q}\}$ is orthonormal. It follows that the adjoint of $g \mapsto \tilde T^\xi_{\bar Q}(f, g)$ has the same structure as $T^\xi_{\bar Q}$, but with exposed packets of the form $W_{Q_1}$, and its $L^2$-operator norm will satisfy the same estimate. Combining the norm estimates for $T^\xi_{\bar Q}$ and $\tilde T^\xi_{\bar Q}$ one obtains the desired norm estimate $\|T_{\bar Q}(f, g)\|_2 \le C\, 2^k \|g\|_2$. This proves the proposition.

Now one can proceed with the proof of the main lemma (7.5.3). Given $\kappa > 0$ one chooses $k_0$ so that $2^{k_0} \le \kappa < 2^{k_0+1}$. Select any $s > p' + \varepsilon$ where $\varepsilon$ is the same as that appearing in Proposition 7.5.4. The Carleson estimate (7.65) will be needed in summing the estimates for forests appearing in the decomposition
$$WB(f, g) = \sum_{k \in \mathbb{Z}} \sum_{\nu=0}^{\infty} F_{k,\nu}(f, g).$$
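Here one sets $A = \{\mathbf{Q}_{k,\nu} : k < k_0,\ \nu < s(k_0 - k)\}$ and lets $B$ denote those remaining forests. Then
$$WB(f, g) = \sum_{\mathbf{Q}_{k,\nu} \in A} F_{k,\nu}(f, g) + \sum_{\mathbf{Q}_{k,\nu} \in B} F_{k,\nu}(f, g).$$
Because of the local nature of trees,
$$\operatorname{supp}(F_{k,\nu}(f, g)) \subset \bigcup \{I_Q : Q \in \mathbf{Q}_{k,\nu}\} \subset \{x : N_{\mathbf{Q}_k}(x) > 2^\nu\}$$
where $N_{\mathbf{Q}_k}(x)$ counts the number of quartiles in $\mathbf{Q}_k$ containing $x$. If $\mathbf{Q}_{k,\nu} \in B$ then, by its definition,
$$\operatorname{supp}(F_{k,\nu}(f, g)) \subset E_k \equiv \begin{cases} \{x : N_{\mathbf{Q}_k}(x) > 1\}, & k > k_0 \\ \{x : N_{\mathbf{Q}_k}(x) > 2^{s(k_0 - k)}\}, & k \le k_0. \end{cases}$$
Finally we can define the bad set $E = \cup E_k$. It follows from Chebychev's inequality, the definition of $\mathbf{Q}_k$, and from Proposition 7.5.4 that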
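$$|E_k| \le C \Big(\frac{\|f\|_p}{2^k}\Big)^{p} \quad (k > k_0) \qquad \text{while} \qquad |E_k| \le C \Big(\frac{\|f\|_p}{2^k\, 2^{s(k_0 - k)/(p'+\varepsilon)}}\Big)^{p} \quad (k \le k_0).$$
The choices of $2^{k_0} \le \kappa$ and $s > p' + \varepsilon$ then guarantee that
$$|E| \le \Big\{\sum_{k \le k_0} + \sum_{k > k_0}\Big\} |E_k| \le C \|f\|_p^p \Big(2^{-k_0 p} \sum_{k \ge 0} 2^{kp(1 - s/(p'+\varepsilon))} + \sum_{k > k_0} 2^{-kp}\Big) \le C\, 2^{-k_0 p} \|f\|_p^p \le C \Big(\frac{\|f\|_p}{\kappa}\Big)^{p}.$$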
This establishes the estimate on $|E|$ in the main lemma (7.5.3). For the quadratic estimate on $WB$ one uses the fact that if $I_Q \subset \mathbb{R}\setminus E$ then $Q$ resides in a forest in $A$. Therefore, if $g \in L^2 \cap L^q$ then using disjointness of forests one has
$$\int_{\mathbb{R}\setminus E} |WB(f, g)(x)|^2\,dx \le \sum_{\mathbf{Q}_{k,\nu} \in A} \|F_{k,\nu}(f, g)\|_2^2.$$
Since $\nu < s(k_0 - k)$ the forest estimate Proposition 7.5.8 gives
$$\sum_{\mathbf{Q}_{k,\nu} \in A} \|F_{k,\nu}(f, g)\|_2^2 = \sum_{k \le k_0} \sum_{\nu=0}^{s(k_0 - k)} \|F_{k,\nu}(f, g)\|_2^2 \le C \sum_{k \le k_0} s\,(k_0 - k) \big(2^k \|g\|_2\big)^2 \le C\, 2^{2k_0} \|g\|_2^2 \le C \kappa^2 \|g\|_2^2.$$
Since $\operatorname{supp}(F_{k,\nu}(f, g)) \subset \{x : N_{\mathbf{Q}_k}(x) \ge 1\}$ when $\mathbf{Q}_{k,\nu} \in A$, one also has $F_{k,\nu}(f, g) = F_{k,\nu}(f, g\, \chi_{\{N_{\mathbf{Q}_k} \ge 1\}})$. Then from Chebychev's and Hölder's inequalities ($q \ge 2$)
$$\|g\, \chi_{\{N_{\mathbf{Q}_k} \ge 1\}}\|_2 \le \|g\|_q\, |\{N_{\mathbf{Q}_k} \ge 1\}|^{1/2 - 1/q} \le C \|g\|_q \Big(\frac{\|f\|_p}{2^k}\Big)^{p(1/2 - 1/q)} \le C \|g\|_q \Big(\frac{\|f\|_p}{\kappa}\Big)^{p(1/2 - 1/q)}.$$
Summing over the forests in $A$ one obtains the estimate
$$\Big(\int_{\mathbb{R}\setminus E} |WB(f, g)(x)|^2\,dx\Big)^{1/2} \le C \|g\|_q \Big(\frac{\|f\|_p}{\kappa}\Big)^{p(1/2 - 1/q)}.$$
The main lemma (7.5.3) then follows from substituting
$$1 - \alpha = p\Big(\frac{1}{2} - \frac{1}{q}\Big) = p\Big(\frac{1}{2} - \frac{1}{r} + \frac{1}{p}\Big).$$
Since $f$ was assumed to lie in a dense subspace of $L^p$ and since the estimates for $g \mapsto WB(f, g)$ depended only on $\|f\|_p$, the proof that $WB$ maps $L^p \times L^q \to L^r$ in the Hölder range with $r > 2/3$ follows.
7.6 WKB and WAM

Haldane attributed to Compton the view that “life takes advantage of the uncertainty principle to make certain events more probable than they would otherwise have been” [175]. Instances of this view abound in the life sciences. Here we present one instance related to mammalian auditory processing. This is followed by an outline of a wavelet auditory model (WAM) due to Benedetto and Teolis [43]. We will not consider the mechanics of mammalian audition in any detail. Allen’s survey paper [5], from which many of the following observations are taken, is an excellent source for more details and references.

7.6.1 Cochlear modelling: early history

As is typical in modelling of complex biological systems, our knowledge of mammalian auditory processing is based largely on inference from indirect observation. Nevertheless, accurate yet simple auditory models are important for correcting faulty hearing, as well as for mimicking evolved auditory systems for such purposes as source separation and noise suppression.

Helmholtz’s cochlear model is believed to have first been presented in Bonn in 1857 [186]. Helmholtz correctly compared the cochlea to a bank of highly tuned resonators, selective to different input frequencies. However, lacking direct observation of the motion of the cochlear fluid, he incorrectly modelled the motion of the basilar membrane (BM) in terms of standing waves rather than travelling waves. The next major advance did not come until 1924 from Wegel and Lane and further observations of Fletcher, one of their experimental subjects (e.g., [143]). Though not deduced until later, travelling waves are a mathematical consequence of the Wegel and Lane model. The physical nature of these waves was uncovered by von Békésy through direct observations performed on human cadavers starting in 1928 and eventually meriting a Nobel prize in 1961 (e.g., [5]).

Von Békésy found that the cochlea behaves like a dispersive transmission line in which pure tones set up waves that travel in one direction from the stapes (basal end) toward the helicotrema (wider apical end) along the basilar membrane. Displacements corresponding to high frequencies are cut off at the basal end while low frequencies propagate to the apical end. In multicomponent signals, different frequency components travel along BM at different rates. Excitation of dead tissue requires sound levels well beyond the ordinary pain threshold. Consequently, von Békésy’s results did not reflect precisely the response characteristics of live tissue, which turns out to localize frequencies more sharply than von Békésy’s experiments indicated. In fact, this response can change in amplitude more than five orders of magnitude per millimeter along the membrane.

The Wegel and Lane model was cast in terms analogous to an electrical transmission line in 1948 by Zwislocki [374], who argued that the mass of fluid behaves as an inductor while membrane stiffness plays the role of a capacitor. The impedances vary continuously with the distance $x$ along BM, thus serving (i) to separate the input acoustic signal into overlapping frequency bands by means of differentiated propagation rates and (ii) to compress the broad acoustic intensity range into a much smaller mechanical and electrical dynamic range of the inner hair cell.

In speech processing, the stimulus (input signal) has a high information rate while the individual neurons (on the order of 30,000 of them in the human auditory nerve) that must transmit data to the auditory cortex are very low bandwidth channels. The cortex must have a robust way of piecing these components together. The decomposition of input into small frequency packets is the job of the inner hair cells (IHCs). After being filtered by the cochlea, a low-level pure tone induces excitation of the cilia of about 40 contiguous IHCs that convert mechanical displacement to voltage. Each IHC is connected to many neurons. In humans, neurons encode responses of about 3500 IHCs which form a single row along BM. IHC excitation amounts to a narrowband signal with a center frequency that depends on the IHC’s location along BM and each IHC voltage is a low-pass filtered representation of the detected IHC cilia displacement (e.g., [206]).
It is believed that the neuron information channel between the IHC and the cochlear nucleus is a combination of the mean firing rate and the relative timing between neural pulses.

7.6.2 The cochlear compromise

In Zwislocki’s transmission line model, no energy gets reflected as the wave propagates. Physically, this suppression of reflections or impedance matching implies that whatever power is available to excite the hair cells that convert mechanical energy into neural pulses is properly utilized. However, it also requires that mechanical properties of the cochlea change slowly along BM, a direct conflict with the requirement that the cochlea provide sharp localization of tones. Zweig et al. [373] termed this incarnation of the uncertainty principle the cochlear conflict. A resolution of this conflict allows one to solve approximately the equations of motion for BM via WKB methods.

In the transmission line model one associates an effectively pure imaginary series impedance $\zeta(x)\Delta x$ to motion of fluid in the $x$-direction while a parallel admittance $\nu(x)\Delta x$ is associated with longitudinal motion. Pressure differential $p(x)$ (analogous to voltage differential) acts as a driver while fluid velocity $v(x)$ (corresponding to current flow) allows one to build a mechanical model in the form
$$\frac{dv}{dx} = -p\,\nu, \qquad \frac{dp}{dx} = -v\,\zeta. \tag{7.68}$$
Setting $\mu^2 = \zeta/\nu$ and weighting distance with respect to impedance, i.e., setting $\Delta u = -i\zeta\Delta x$ (note that $-i\zeta$ is primarily positive real), (7.68) yields
$$\frac{d^2 p}{du^2} + \frac{1}{\mu^2}\, p = 0.$$
If $1/\mu$ varies slowly with $u$ then one can approximate $p$ by the WKB method, yielding the approximate solution
$$p \approx p_0(\omega)\, \mu^{1/2} \exp\Big(-i \int \frac{du}{\mu}\Big) \tag{7.69}$$
for $p$ as a function of angular frequency $\omega$, thinking of $2\pi\mu$ as a local wavelength relative to the metric $\Delta u$. This formal approximation raises the question: what compromise must the cochlea make in order to validate this approximation? If one thinks of motion of waves along BM as wave propagation in a medium of continuously varying refraction, then the hypothesis of small reflected energy requires that the index of refraction vary slowly. This is the same as having small changes in wavelength over distance. It translates into $|d(1/\mu)/du| \ll 1$.

One more important issue is that of the relationship between frequency and location along BM. According to von Békésy’s data the frequency $\omega_r(x)$ that resonates at locus $x$ along BM behaves like $\omega_m e^{-x/d}$ in which $\omega_m$ is the maximum audible frequency and $d$ is a characteristic length. One argues that displacement of BM is proportional to $\nu p/(i\omega)$ and, in view of the WKB approximation (7.69), this displacement can be expressed as a joint function of $x$ and $\omega$ with a maximum at the resonant frequency and exponential damping in the forbidden region above it. In this manner Zweig et al.’s cochlear compromise accounts for von Békésy’s observations, at least in the interior region of BM where $\omega_r(x) \approx \omega_m e^{-x/d}$.

7.6.3 Cochlear processing and WAM

The IHC conversion of mechanical displacement to neural input voltage has been viewed as a parallel bank of narrowband filters with central frequencies distributed approximately logarithmically along BM. It is convenient to model their transfer functions as being covariant with dilation, except for frequency translation along this axis.
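As a numerical aside on the WKB approximation (7.69) of Section 7.6.2, the slow-variation condition can be tested on a toy profile. The sketch below is illustrative only and is not a cochlear model: it assumes the hypothetical linear profile $\mu(u) = 1 + \epsilon u$ (so that $|d(1/\mu)/du| \le \epsilon$ and the phase integral equals $\log(\mu)/\epsilon$), takes $p_0 = 1$, and integrates $p'' + p/\mu^2 = 0$ with a hand-written classical Runge–Kutta step. The function names are hypothetical.

```python
import cmath

def mu(u, eps):
    """Toy slowly varying profile mu(u) = 1 + eps*u (an assumption, not a cochlear model)."""
    return 1.0 + eps * u

def wkb(u, eps):
    """WKB form (7.69) with p0 = 1; for this profile the phase integral is log(mu)/eps."""
    m = mu(u, eps)
    return m ** 0.5 * cmath.exp(-1j * cmath.log(m) / eps)

def integrate(eps, u_end, steps):
    """Classical RK4 for p'' + p/mu(u)^2 = 0, started from the WKB values at u = 0."""
    def rhs(u, p, dp):
        return dp, -p / mu(u, eps) ** 2
    h = u_end / steps
    u, p, dp = 0.0, 1.0 + 0j, eps / 2 - 1j     # p(0) and p'(0) of the WKB expression
    for _ in range(steps):
        k1 = rhs(u, p, dp)
        k2 = rhs(u + h / 2, p + h / 2 * k1[0], dp + h / 2 * k1[1])
        k3 = rhs(u + h / 2, p + h / 2 * k2[0], dp + h / 2 * k2[1])
        k4 = rhs(u + h, p + h * k3[0], dp + h * k3[1])
        p += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dp += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        u += h
    return p

def relative_error(eps, u_end, steps=4000):
    """Relative gap between the numerical solution and the WKB expression at u_end."""
    approx = wkb(u_end, eps)
    return abs(integrate(eps, u_end, steps) - approx) / abs(approx)

assert relative_error(0.05, 20.0) < 0.02
```

Increasing `eps` makes the profile vary faster and the gap between the two solutions grows, which is the quantitative content of the slow-variation hypothesis.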
In this way, the displacements $W$ due to the stimulus $y$ with the output of the cochlear filter bank having input responses given by convolution with dilates of a fixed function $\psi : \mathbb{R} \to \mathbb{C}$ can be modelled as a continuous wavelet transform
$$W = W_\psi(y)(t, s) = \langle y, s^{1/2} \psi(s \cdot - t) \rangle$$
(cf. [99], p. 6). One discretizes in scale by breaking the logarithmic axis into equi(log)-spaced intervals. Setting $s_m = a^m$ for a fixed $a > 1$, one obtains the discrete displacements
$$W_\psi(y)(t, s_m) = W_\psi(t, a^m), \qquad m \in \mathbb{Z}.$$
The shape of $|\hat\psi|$ plays a critical role in determining the effectiveness of the model. As a causal filter, $\hat\psi$ should be supported in $[0, \infty)$. It should also be fin-shaped so the high frequency edges of the dilates of $\hat\psi$ act as effective scale-delimiters: pure tones should propagate up to an appropriate scale then die out.

The auditory system does not receive the enormously redundant data $W_\psi(y)(t, s)$ directly. The output of each cochlear filter is high-passed by the velocity coupling between the membrane and the cilia of the hair cell transducers that initiate the electrical nervous activity by means of a shearing action on the tectorial membrane. It is reasonable to approximate this step by a time derivative, obtaining thus $\partial_t W_\psi y(t, s_m)$ whose zeros are the extrema of the wavelet transform. One organizes these zeros into:
$$Z_m = \{t_{nm} : \partial_t W_\psi y(t_{nm}, s_m) = 0\}.$$
The next step of the auditory process is the first stage of neural activity involving saturation and activation of auditory channels then leakage of current. Mathematically, these are modelled by an instantaneous sigmoidal nonlinearity (threshold) followed by a low-pass filter with impulse-response $h$. This cochlear output takes the mathematical form
$$C_{h,R}(t, s) = (R \circ \partial_t W_\psi y(\cdot, s)) * h(t), \qquad R_T(x) = \frac{e^{Tx}}{1 + e^{Tx}},$$
in which $T$ delimits neuronal firing rate. This output represents a planar auditory wave pattern sent to the auditory cortex along the scale-ordered array of auditory channels. For large $T$, $R$ represents a sigmoidal activation. In WAM this is modelled simply by taking $R = H$, the Heaviside function, and $h = \delta$, the Dirac measure, even though this hardly represents a low-pass filter. Thus the cochlear WAM output takes the form
$$C(t, s) = (H \circ \partial_t W_\psi y(\cdot, s)).$$

Lateral inhibitory network. As noted, speculation abounds around the manner in which the brain processes auditory nerve patterns modelled by $W$, $\partial_t W$ and $Z_m$. The lateral inhibitory network (LIN) is one such model that has been studied in hopes of extracting the spectral pattern of the acoustic stimulus [277], [318, 370]. LIN reflects proximate frequency channel behavior.
One models LIN by a derivative in scale at the zero crossings $Z_m$, defining
$$\Lambda_m = \{\partial_s \partial_t W_\psi y(t(n, s_m), s_m) : t(n, s_m) \in Z_m\}$$
(see [43] for further justification). This irregularly spaced set in the $(t, s)$ plane can be called WAM data. The WAM reconstruction problem is that of reconstituting $y$ from this data.

WAM and wavelet frames. By differentiating under the integral one has, for $t(n, s_m) \in Z_m$,
$$\partial_s \partial_t W_\psi y(t(n, s_m), s_m) \approx \frac{\partial_t W_\psi y(t(n, s_m), s_{m+1}) - \partial_t W_\psi y(t(n, s_m), s_m)}{s_{m+1} - s_m} = \frac{\partial_t W_\psi y(t(n, s_m), s_{m+1})}{s_{m+1} - s_m} = \frac{s_{m+1}\, W_{\partial_t \psi}\, y(t(n, s_m), s_{m+1})}{s_{m+1} - s_m} = -\frac{1}{a - 1} \big\langle y, \tau_{t(n, s_m)} \delta^{s_{m+1}} \partial_t \tilde\psi \big\rangle$$
in which $\tau_t f(x) = f(x - t)$ and $\delta^a f(x) = a^{1/2} f(ax)$. Define
$$\Psi_{m,n} = -\frac{1}{a - 1}\, \tau_{t(n, s_m)} \delta^{s_{m+1}} \partial_t \tilde\psi.$$
Then the WAM data approximately corresponds to the image $\{\langle y, \Psi_{m,n} \rangle\}$ of a linear operator $L$ mapping a closed subspace $H$ of $L^2(\mathbb{R})$ to $\ell^2(\mathbb{Z} \times \mathbb{Z})$. Reconstruction from the data then depends on properties of $L$.

Reconstruction from WAM data. The reconstruction problem is that of inverting the map $L$ that sends $y$ to the WAM data $\{\langle y, \Psi_{m,n} \rangle\}$, which is nothing other than a discretized wavelet transform of $y$. Consider the map $L^* : \ell^2(\mathbb{Z} \times \mathbb{Z}) \to L^2(\mathbb{R})$ that assigns to a sequence $c_{mn}$ the function $\sum c_{mn} \Psi_{mn}$. Suppose now that $S = L^* L$ is bounded and continuously invertible on $L^2(\mathbb{R})$. This is the same as saying that the functions $\Psi_{m,n}$ form a frame for $L^2(\mathbb{R})$. In this case one can write $y = (S^{-1} L^*) L y$. In what follows let $\lambda$ be the reciprocal of the average of the upper and lower frame bounds. One can write $S^{-1}$ in terms of the Neumann series $\lambda \sum_{k=0}^{\infty} (I - \lambda S)^k$. As $S$ is not a numerical operator, one would prefer to write $S^{-1}$ in terms of powers of the Grammian $L L^*$. This is done by writing
$$y = S^{-1} S y = \lambda \sum_{k=0}^{\infty} (I - \lambda L^* L)^k (L^* L)\, y = \lambda \sum_{k=0}^{\infty} L^* (I - \lambda L L^*)^k L y.$$
Thus, reconstructing $y$ from its WAM data boils down to applying the numerical iterates $(I - \lambda L L^*)^k$ to the WAM data, then applying $L^*$ at the end. Questions of stability hinge on convergence rates of the iterates. A fairly straightforward geometric series estimate shows that
$$\Big\| y - \lambda \sum_{k=0}^{N} L^* (I - \lambda L L^*)^k L y \Big\|_{L^2} \le \lambda \big(\|I - \lambda L L^*\|_{L(H)}\big)^N \|y\|_{L^2}.$$
One is not necessarily taking the sequence space $H$ to be all of $\ell^2$. For the problem of reconstructing from WAM data one only needs to consider the norm of $I - \lambda L L^*$ when restricted to the subspace spanned by some number of the iterates $Ly$, $L L^*(Ly)$, $(L L^*)^2 Ly, \ldots$. When concerned with compression and estimation, one should expand in the subspace spanned by truncated iterates; in the context of noise, one should include in the span all iterates of the possible noise.

Further issues with WAM. WAM is predicated on several simplifications that could arguably compromise its fidelity as a model of actual mammalian audition. The bottom line is whether WAM provides an effective and robust technique for speech processing. The book by Teolis [344] provides concrete computational evidence of WAM’s possibilities; see [42] for more recent observations. One question is: when will $\Psi_{m,n}$ form a frame for $L^2(\mathbb{R})$? It can be verified that this occurs when $\sum_m |\delta^{1/(s_{m+1})} (\partial_t \tilde\psi)^\wedge|^2$ is bounded above and below (cf. [42]) and this condition should be imposed when possible. In turn, it places constraints on the scale parameter $a$ as well as the basic filter $\psi$ which, in a sense, is how the cochlear compromise is embedded in WAM.
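The Neumann-series reconstruction described in the discussion of WAM data can be sketched in finite dimensions. The example below is a toy under stated assumptions rather than a WAM implementation: the "signal" is a vector in $\mathbb{R}^2$, the frame is a short hand-picked list of vectors standing in for the $\Psi_{m,n}$, and all function names are hypothetical. In two dimensions the frame bounds are the two eigenvalues of $S = L^* L$, so their average is half the trace of $S$, which is how $\lambda$ is computed here.

```python
def analysis(frame, y):
    """L: y -> (<y, Psi_j>)_j, the data."""
    return [sum(a * b for a, b in zip(psi, y)) for psi in frame]

def synthesis(frame, coeffs):
    """L*: (c_j) -> sum_j c_j Psi_j."""
    return [sum(c * psi[i] for c, psi in zip(coeffs, frame)) for i in range(len(frame[0]))]

def relaxation_parameter_2d(frame):
    """lambda = 2 / (A + B); for vectors in R^2, A + B is the trace of S = L*L."""
    return 2.0 / sum(psi[0] ** 2 + psi[1] ** 2 for psi in frame)

def reconstruct(frame, data, lam, n_terms):
    """Return lam * L* sum_{k=0}^{n_terms} (I - lam L L*)^k data."""
    term = list(data)           # (I - lam L L*)^0 applied to the data
    total = list(data)
    for _ in range(n_terms):
        gram_term = analysis(frame, synthesis(frame, term))      # L L* term
        term = [t - lam * g for t, g in zip(term, gram_term)]
        total = [s + t for s, t in zip(total, term)]
    return [lam * v for v in synthesis(frame, total)]

frame = [(1.0, 0.0), (0.5, 1.0), (-1.0, 0.8), (0.2, -0.3)]
y = [0.3, -1.2]
lam = relaxation_parameter_2d(frame)
approx = reconstruct(frame, analysis(frame, y), lam, n_terms=40)
assert all(abs(a - b) < 1e-9 for a, b in zip(approx, y))
```

Only the Grammian $L L^*$ acts on the coefficient sequence inside the loop; the synthesis operator $L^*$ is applied once more at the end to return to signal space, as in the series for $y$.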
7.7 Notes

The trace inequality and Fefferman–Phong estimates. In Section 7.2.5 we used the following trace theorem due to Kerman and Sawyer [230].

Theorem 7.7.1. Suppose that $\phi$ is a radially decreasing, non-negative function such that $\int_{|y| \ge r} \phi(y)^2\,dy < \infty$ for all $r > 0$. Let $T_\phi$ be the operator of convolution with $\phi$ and define the maximal function
$$M_\phi(f)(x) = \sup_{x \in Q} \Big(\int_{|y| < |Q|^{1/n}} \phi\Big) \Big(\frac{1}{|Q|} \int_Q |f|\Big).$$
For a locally finite Borel measure $\sigma$ on $\mathbb{R}^n$, the bound
$$\int |T_\phi f|^2\,d\sigma \le C \int |f|^2$$
is equivalent to each of the following:
(i) there is a $C'$ such that for all $Q \in \mathcal{Q}$ one has
$$\int (T_\phi(\chi_Q \sigma))^2 \le C' \sigma(Q);$$
(ii) there is a $C''$ such that for all $Q \in \mathcal{Q}$ one has
$$\int_Q (M_\phi(\chi_Q \sigma))^2 \le C'' \sigma(Q).$$
In Section 7.2.5 this was applied in the case of fractional integration $T_\phi = I_1$, that is, $\phi(x) = |x|^{1-n}$, valid when $n \ge 3$. For $\sigma = V$ and $\phi(x) = |x|^{1-n}$ the Kerman–Sawyer testing condition becomes
$$\int_Q M_1(\chi_Q V)^2 \le C\, V(Q)$$
where $M_\alpha(f)(x) = \sup_{x \in Q} |Q|^{(\alpha - n)/n} \int_Q |f|$ denotes the fractional maximal function of order $\alpha$. Fefferman and Phong effectively proved that $\mu(V, Q) \le C|Q|^{-2/n}$ is necessary and $\mu_p(V, Q) \le C|Q|^{-2/n}$ ($p > 1$) is sufficient for (7.23). One can verify directly that
$$c_1 |Q|^{2/n} \mu(V, Q) \le \frac{1}{V(Q)} \int_Q (I_2(\chi_Q V))\, V \le c_2 |Q|^{2/n} \mu_p(V, Q) \quad \text{(all cubes)} \tag{7.70}$$
as we shall do here, in lieu of proving the trace theorem itself. The left-hand inequality of (7.70) follows from observing that
$$|Q|^{2/n} \mu(V, Q) = |Q|^{2/n - 1} V(Q) \le C\, I_2(\chi_Q V)(x) \qquad (x \in Q)$$
since $|x - y|^{2-n} \ge c|Q|^{2/n - 1}$ when $x, y \in Q$. The right-hand inequality of (7.70) uses properties of $A_\infty$ weights. By the maximal theorem, $V_p^*(x) = \sup_{x \in Q} \mu_p(V, Q)$ is an $A_\infty$ weight such that
$$\sup_Q \mu_2(V_p^*, Q) \le C_p \sup_Q |Q|^{2/n} \mu_p(V, Q) \equiv C_p B_p.$$
Set $W = V_p^*$. Since $\|M_2(\chi_Q W)\|_\infty \le B_p$ for any cube $Q$,
$$\int_Q I_2(\chi_Q W)\, W \le \Big(\int_Q (I_2(\chi_Q W))^{p'}\Big)^{1/p'} \Big(\int_Q W^p\Big)^{1/p} \le C \Big(\int_Q [M_2(\chi_Q W)]^{p'}\Big)^{1/p'} \Big(\int_Q W^p\Big)^{1/p} \le C B_p |Q|^{1/p'} \Big(\int_Q W^p\Big)^{1/p} \le C B_p \int_Q W$$
provided that $p$ is sufficiently close to one. The second inequality above is due to Muckenhoupt and Wheeden [282] and the last uses the reverse Hölder
property of $A_\infty$. Thus (7.24) holds with $V$ replaced by $W$. Therefore the weighted norm inequality (7.18) holds with $V$ replaced by $W$. However, $W$ dominates $V$ so (7.18) holds for $V$ as well. Since Kerman and Sawyer’s trace theorem is a characterization, it follows then that $V$ must also satisfy (7.24) and passage through the trace theorem yields a bound of at most a fixed multiple of $B_p$ for (7.23). Thus we have the right side inequality in (7.70).

Morrey spaces and $A_1$. We have already noted that, by duality, (7.23) is equivalent to
$$\int |u|^2 V \le C_V \int |\nabla u|^2.$$
This in turn implies positivity of $-\Delta - V$ when $C_V \le 1$. As observed, a key step in Theorem 7.2.3 is the observation that if $V$ belongs to the Morrey space $M^r_{n-2r}$, that is, if $\mu_r(V, Q) \le C|Q|^{-2/n}$ for all cubes, then (7.7) holds. A simple alternative proof of this fact was found by Chiarenza and Frasca [71]. Their proof shows, moreover, that (7.7) holds in the full range $V \in M^r_{n-pr}$ ,1 0. For a Young’s function $\Phi$ one defines
$$\|f\|_{\Phi,B} = \inf\Big\{\lambda > 0 : \frac{1}{|B|} \int_B \Phi\Big(\frac{|f|}{\lambda}\Big)\,dy \le 1\Big\}$$
along with its maximal version $M_\Phi f(x) = \sup_{x \in B} \|f\|_{\Phi,B}$. It turns out that $M_\Phi$ is $L^p$-bounded if and only if $\Phi \in B_p$, meaning that for large enough $c$,
$$\int_c^\infty \frac{\Phi(t)}{t^p}\, \frac{dt}{t} < \infty.$$
Perez and Wheeden proved a generalization of the following in the setting of spaces of homogeneous type [297].

Theorem 7.7.2. Suppose that $T$, $\phi$ are as above, that $K(x, y)$ is essentially decreasing with respect to $|x - y|$ and $\phi$ satisfies (7.71). Suppose also that $\Phi$, $\Psi$ are Young’s functions whose Young’s duals $\tilde\Phi$, $\tilde\Psi$ belong to $B_2$. Suppose, moreover, that the weights $(V, W)$ satisfy
$$\phi(B)\, |B|\, \|V^{1/2}\|_{\Psi,B}\, \|W^{-1/2}\|_{\Phi,B} \le C$$
for all balls $B$. Then for all $f$,
$$\int |Tf|^2 V \le C' \int |f|^2 W.$$
Consider the special case $W = 1$. Then $\|W^{-1/2}\|_{\Phi,B}$ is a constant independent of $B$. A standard example of a Young’s function of type $B_p$ is $|x|^p (\log(1 + |x|))^{-1-\beta}$. For this $\Psi$, $T$ is bounded from $L^2$ to $L^2_V$ provided
$$\phi(B)\, |B| \Big(\frac{1}{|B|} \int_B V \log\Big(1 + \frac{V}{V(B)}\Big)^{1+\beta}\Big)^{1/2} \le C.$$
When $T = I_1$ this becomes
$$|B|^{2/n} \Big(\frac{1}{|B|} \int_B V \log\Big(1 + \frac{V}{V(B)}\Big)^{1+\beta}\Big) \le C$$
which, except for the factor $\beta$, is Fefferman’s conjectured best possible bump condition.

Uncertainty principles and weighted Fourier inequalities. In Chapter 5, weighted Fourier inequalities were linked to uncertainty principles of Heisenberg type. There also turns out to be a link between weighted Fourier norm inequalities and potential inequalities, as was pointed out by Kerman and Sawyer in a preprint draft of [230] and later extended as follows [243].

Theorem 7.7.3. Given a convexly decreasing weight function $w = v^{-1/2}$ and positive Borel measure $\sigma$ on $\mathbb{R}^n$, define the maximal function $M_{w,\sigma}(g)(\xi) = \sup_{\xi \in Q} \int_{|y| < |Q|^{-1/n}} w \int_Q |g|\,d\sigma$. There is a constant $C$ such that for all $f$,
$$\int |\hat f|^2\,d\sigma \le C \int |f|^2 v \tag{7.72}$$
if and only if there is a constant $C'$ such that for all cubes $Q$,
$$\int_Q [M_{w,\sigma}(\chi_Q)]^2 \le C' \sigma(Q). \tag{7.73}$$
Here w plays the role of the Fourier transform of the convolution kernel φ: writing g ∈ L2 as g = f w for some f ∈ L2v , one has gb = fb ∗ w b ≈ fb ∗ φ and the Fourier inequality follows essentially from the trace theorem by duality. The concern with (7.73) is that it can be difficult to apply in practice, even if w is well behaved, because the measure σ could have complicated gaps. If one is content with close to sharp sufficient conditions, one can interpolate weak type inequalities that are implied by much simpler conditions as follows. R Set W (Q) = |y|<|Q|−1/n w. A necessary and sufficient condition that Mw,σ maps Lpσ continuously into wkLpdx , which in turn implies (7.72) (see [243], p. 214), is: there is a C such that for all cubes Q, |Q| W (Q)2 µ(Q) ≤ C.
(7.74)
Because $\sigma$ now appears only on the left-hand side of the inequality, one no longer need worry about any holes that $\sigma$ might have, but only how concentrated it can be. Thus, in practice (7.74) is simpler to check than (7.73). To consider a specific example, one defines a measure $\sigma$ to be locally uniformly $\alpha$-dimensional provided $\sigma(B(x,r)) \le Cr^{\alpha}$ where $C$ does not depend on $x$ or on $r > 0$. If one sets $v(x) = (1+|x|)^{n-\beta}$ then one has

Corollary 7.7.4. If $\sigma$ is locally uniformly $\alpha$-dimensional, $0 \le \beta < \alpha \le n$ and $p = 2$, or if $\beta = \alpha$ and $1 < p < 2$, then for all $f$,
$$\left(\int |\hat f|^{p'}\,d\sigma\right)^{1/p'} \le C\left(\int_{\mathbb{R}^n} |f(x)|^p\,(1+|x|)^{n-\beta}\,dx\right)^{1/p}.$$
The corollary was first proved by Strichartz [335], who invoked the decay of quadratic means of $\hat\sigma$. The case $\beta = \alpha$ and $p = 2$ fails in general, as is the case for Cantor measure on $\mathbb{R}$ when $\beta = \alpha = \ln 2/\ln 3$. On the other hand, (7.72) is validated when $\sigma$ is Cantor measure provided $v(x) = (1+|x|)^{\ln(3/2)/\ln 3}\,\ln(e+|x|)^{\gamma}$, $\gamma > 1$, by following the same line of argument for establishing (7.74) (the techniques in [335] would require $\gamma > 2$).

Unique continuation of $-\Delta - V$. So far we have said little about uniqueness of solutions of Schrödinger equations. Kenig [227] contains a nice historical introduction to the topic of unique continuation. The problem (in $\mathbb{R}^n$) is to find conditions on a linear partial differential operator $P(x, d/dx)$ such that, if a solution $u$ of $Pu = 0$ vanishes to infinite order at some point $x_0$, then $u$ vanishes identically. As a typical result, Chanillo and Sawyer [69] proved that if $V$ is a Fefferman–Phong weight in $\mathbb{R}^n$, $n \ge 3$, satisfying, for any compact $K$, $\lim_{r\to 0}\sup_{x\in K} r^2\mu_p(V, B(x;r)) = 0$, then $-\Delta - V$ possesses this unique continuation property. A very basic question that one can ask about uniqueness is: Are there examples of $V \in L^1$ and nonzero, compactly supported solutions $u$ of $(-\Delta - V)u = 0$? Kenig and Nadirashvili [228] have constructed such examples in $\mathbb{R}^n$, $n \ge 2$, with $V$ having arbitrarily small $L^1$ norm.
Energy of a large atom. The work of Fefferman and Phong addressed the question of the number of bound states of a particle system. An even more basic, and in some ways more difficult, question is that of estimating the ground-state energy of an atom. Thomas–Fermi theory predicted that the ground-state energy $E$ of an atom of atomic number $Z$ should be $E(Z) \approx -c_0Z^{7/3} + c_1Z^2 - c_2Z^{5/3}$. Here $-c_0Z^{7/3}$ is the Thomas–Fermi energy, while $c_1 = 1/8$ and $c_2Z^{5/3} = c_D\int_{\mathbb{R}^3}\rho^{4/3}$ in which $c_D$ is a constant identified by Dirac. Schwinger later observed a correction leading to the replacement of $c_2$ by $c_3 = 10c_2/9$. A proof of the asymptotic formula $E(Z) \approx -c_0Z^{7/3} + c_1Z^2 - c_3Z^{5/3} + O(Z^{5/3-a})$ was announced by Fefferman and Seco [138]. The main steps of the argument include a reduction, within the asymptotic bound $O(Z^{5/3-a})$, to estimating $\sum_{k=1}^{N} E_k$ where $E_k$ is the $k$th eigenvalue of the single-electron Hamiltonian $-\Delta - Z/|x| + \int_{\mathbb{R}^3}\rho(y)/|x-y|\,dy$ in $L^2(\mathbb{R}^3)$. This sum is estimated explicitly using techniques of analytic number theory parallel to those required for counting the number of lattice points in a ball, along with refined WKB asymptotics.

More on multilinear singular integrals. What sort of smoothness is required of a multiplier or symbol in order that a corresponding singular integral is $L^p$-bounded? The symbol of the Hilbert transform is discontinuous only at the origin, and the Cauchy integral plays an important role in its analysis. More severe difficulties arise in higher dimensions, with examples such as the disk multiplier, which has a discontinuity on a one-dimensional manifold. In a different direction, one can ask what sort of discontinuity can arise in the symbol of a multilinear convolution type operator. The BHT can be viewed as an operator of the more general form
$$T_m(f_1,\dots,f_{n-1})^{\wedge}(-\xi_n) = \int_{\xi_1+\cdots+\xi_n=0} m(\xi_1,\dots,\xi_{n-1})\prod_{i=1}^{n-1}\hat f_i(\xi_i)\,d\xi_i$$
in the special case $m(\xi_1,\xi_2) = \pi i\,\mathrm{sgn}(\xi_2-\xi_1)$. Operators as in (7.7) whose symbols satisfy Hörmander estimates $|\partial_\xi^{\alpha} m(\xi)| \le C|\xi|^{-\alpha}$ for $\alpha$ up to an appropriate order off of the hyperspace $\Gamma = \{\xi \in \mathbb{R}^n : \xi_1+\cdots+\xi_n = 0\}$ will be bounded from $L^{p_1}\times\cdots\times L^{p_{n-1}}$ to $L^{p_n'}$ in the Hölder range $1/p_1+\cdots+1/p_n = 1$, $1 < p_i \le \infty$ ($i = 1,\dots,n-1$), and $1/(n-1) < p_n' < \infty$ (e.g., [229]). These symbol estimates are not satisfied in the case of BHT, but the operator is still bounded in the Hölder range [161], as are a family of multilinear extensions (cf. [285]).

BHT and Hankel operators. Besides the connection of BHT to spectral estimates for Schrödinger operators, its boundedness has other deep consequences. Here is one. One says that a matrix $A$ is a Hankel matrix provided $a_{ij} = a_{i+1,j+1}$ (indices mod $n$). Let $A_{\mathrm{triang}}$ denote the lower triangular matrix having the same entries as $A$ on and below the main diagonal and zeroes elsewhere. As an operator on $\ell^2(\mathbb{Z}_n)$, it is clear that the operator norm of
$A_{\mathrm{triang}}$ is bounded by a constant $c_n$ depending only on $n$ times the operator norm of $A$. One form of Peller's conjecture states that the constant can be taken independent of $n$, that is, $\|A_{\mathrm{triang}}\|_{\ell^2(\mathbb{Z}_n)\to\ell^2(\mathbb{Z}_n)} \le C\,\|A\|_{\ell^2(\mathbb{Z}_n)\to\ell^2(\mathbb{Z}_n)}$ with a constant independent of $A$ and $n$ for all Hankel matrices $A$. Gilbert and Gasch [152] and Bonami and Bruna [55] independently proved Peller's conjecture as a corollary of boundedness of BHT.
A Appendix
A.1 Notation

The symbols $\mathbb{Z}$, $\mathbb{R}$, $\mathbb{C}$ denote the integers and real and complex numbers, respectively. $\mathbb{R}^n$ is the Euclidean space of $n$-tuples of real numbers whose elements are generically denoted by symbols such as $x$, $y$. We reserve boldface $\mathbf{x}$, $\mathbf{y}$, etc., when thinking of vectors in $\mathbb{R}^N$ or $\mathbb{C}^N$ as functions on an $N$-point space except, in specific instances, to distinguish $\mathbb{R}^n$ variables or parameters from those of a single variable. The symbol $\mathbb{T}$ denotes the unit circle $\{z \in \mathbb{C} : |z| = 1\}$ identified with the unit interval via $z = e^{2\pi i\xi}$, $\xi \in [0,1)$. The greatest integer less than or equal to $t$ is denoted by $[t]$, with $\lceil t\rceil$ denoting the least integer greater than or equal to $t$.

The class of $m\times n$ matrices ($m$ columns, $n$ rows) with entries from the field $\mathbb{F}$ will be denoted by $M_{mn}(\mathbb{F})$ and by $M_n(\mathbb{F})$ when $m = n$. If $a_1,\dots,a_n \in \mathbb{F}$ then $D = \mathrm{diag}(a_1,\dots,a_n) \in M_n(\mathbb{F})$ has $(i,j)$-th entry $D_{ij} = a_i\delta_{ij}$. The $n\times n$ identity matrix is denoted $I_n$. When acted on by matrices in standard coordinates, elements of $\mathbb{R}^n$ or $\mathbb{C}^n$ are thought of as column vectors, with matrices acting on the left as $v \mapsto Av$ unless specifically stated otherwise.

All functions are assumed at least measurable with respect to the canonical measure (Lebesgue measure in the case of $\mathbb{R}^n$) on their domains. A property is said to hold almost everywhere on $\mathbb{R}^n$, or "a.e.," if it holds outside a set of Lebesgue measure zero. When sums and integrals are written without qualifying limits they extend over all elements for which the integrand or summand is defined. Thus if $s$ is a sequence on $\mathbb{Z}$ then $\sum s_k$ means $\sum_{k=-\infty}^{\infty} s_k$, while if $f$ is a function defined on $\mathbb{R}^n$ then $\int f$ means $\int_{\mathbb{R}^n} f(x)\,dx$. For a measurable subset $\Omega \subset \mathbb{R}^n$, $L^p(\Omega) = \{f : \|f\|_{L^p(\Omega)} = (\int_\Omega |f|^p)^{1/p} < \infty\}$. When the domain of integration is clear we often abbreviate $\|f\|_{L^p(\Omega)} = \|f\|_p$. When $p = \infty$, $L^\infty = \{f : \|f\|_\infty = \sup|f(x)| < \infty\}$. For $1 \le p < \infty$, the dual space of $L^p$ is $L^{p'}$ where $p' = p/(p-1)$ is the dual exponent of $p$.

By $\chi_S$ we mean the indicator or characteristic function of the set $S$, that is, $\chi_S(x) = 1$ if $x \in S$ and $\chi_S(x) = 0$ if $x \notin S$. We use $|S| = \int\chi_S$ to denote the Lebesgue measure
of $S$. We alternately use $|A|$ or $\#A$ to denote the counting measure of a finite set $A$, depending on context. Given a space $X$ of functions defined on $\mathbb{R}^n$, we denote by $X_{\mathrm{loc}}$ those functions $f$ such that $f\chi_{B(0;R)} \in X$ for every $R > 0$, where $B(x;R)$ denotes the ball centered at $x$ of radius $R > 0$. By a cube $Q \subset \mathbb{R}^n$ we shall always mean a product of intervals (its sides) having equal lengths denoted $l(Q)$.

Dyadic intervals and cubes play an important role. We denote by $\mathcal{D}$ the collection of all dyadic intervals of the form $I = I(j,k) = [k/2^j, (k+1)/2^j)$, $j, k \in \mathbb{Z}$. Also, $\mathcal{D}^+ = \{I \in \mathcal{D} : |I| \le 1\}$, $\mathcal{D}_j = \{I \in \mathcal{D} : |I| = 2^{-j}\}$ and $\mathcal{D}(I) = \{J \in \mathcal{D} : J \subset I\}$. By $\mathcal{Q}$ we denote the dyadic cubes in $\mathbb{R}^n$, that is, those cubes whose sides are in $\mathcal{D}$, with similar meanings for $\mathcal{Q}^+$, $\mathcal{Q}_j$ and $\mathcal{Q}(Q)$. Then $Q = Q(j,k) = \prod_i [2^{-j}k_i, 2^{-j}(k_i+1))$ when $k = (k_1,\dots,k_n) \in \mathbb{Z}^n$. We also use $Q$ to denote generic quartiles in phase space late in Chapter 7.

We have tried to assign symbols consistent with their use in the literature, relying on context to determine meaning in the case of conflicting assignments. For example, the pairing $\langle\cdot,\cdot\rangle$ denotes the Hermitian inner product in Hilbert spaces over $\mathbb{C}$ (sometimes we write $\langle\cdot,\cdot\rangle_H$ to make reference to a specific Hilbert space) but it also denotes the dual pairing between test functions and distributions. The symbol "$*$" is used to denote adjoints of operators ($\langle Tf, g\rangle = \langle f, T^*g\rangle$) as well as rearrangements of functions and sequences. The symbol "$\hat{\ }$" typically denotes a Fourier transform, but it is also used to denote imputed (Section 2.4) or estimated values (Section 6.2). The symbol $\delta$ is used for the Kronecker delta: $\delta_{\alpha\beta} = 1$ if $\alpha = \beta$ and $\delta_{\alpha\beta} = 0$ if $\alpha \ne \beta$. It is also used for the Dirac $\delta$ measure $\delta(f) = f(0)$ as well as for the unitary dilation $\delta_a f(x) = \sqrt{a}\,f(ax)$. As with other instances of notational conflict, its meaning should be clear from context.

The Fourier transform plays a fundamental role in this text. For $f \in L^1(\mathbb{R}^n)$, one sets $\mathcal{F}f(\xi) = \hat f(\xi) = \int f(x)e^{-2\pi ix\xi}\,dx$. Here $x\xi$ is shorthand for the dot product $x\cdot\xi$ of two vectors $x$, $\xi$ in $\mathbb{R}^n$. The inverse Fourier transform of $g(\xi)$ is $g^{\vee}(x) = \int g(\xi)e^{2\pi ix\xi}\,d\xi$ a.e. The Fourier inversion formula says that $(\hat f)^{\vee}(x) = f(x)$. This formula applies on a dense subspace of $L^2(\mathbb{R}^n)$ and extends by limiting arguments to larger function and distribution spaces.

The relationship between wavelets and scaling filters leads to some notational difficulty in assigning a Fourier series to a sequence $\{c_k\}_{k\in\mathbb{Z}}$. The standard convention is to write $C(\xi) = \sum_k c_k e^{2\pi ik\xi}$ and we usually do this. In wavelet theory, however, it is often more convenient to assign the filter $H(\xi) = \sum_k h_k e^{-2\pi ik\xi}$ to the sequence $\{h_k\}$ and we also do this in contexts emphasizing wavelets.

Finally, the letters $c$ and $C$ are generically used to denote constants that do not depend on elements of some class of functions under consideration but whose values will likely change through the course of a proof.
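The two Fourier series conventions described above differ only in the sign of the exponent, so a fixed coefficient sequence yields $H(\xi) = C(-\xi)$. The following is a minimal numerical sketch of that relationship, assuming a finitely supported sequence stored as a dictionary; the helper names `series_plus` and `series_minus` and the sample coefficients are illustrative choices, not taken from the text.

```python
import cmath

def series_plus(c, xi):
    """C(xi) = sum_k c_k e^{2 pi i k xi}: the 'standard' convention."""
    return sum(ck * cmath.exp(2j * cmath.pi * k * xi) for k, ck in c.items())

def series_minus(h, xi):
    """H(xi) = sum_k h_k e^{-2 pi i k xi}: the wavelet-filter convention."""
    return sum(hk * cmath.exp(-2j * cmath.pi * k * xi) for k, hk in h.items())

# Sample finitely supported real sequence (hypothetical data, index -> value).
coeffs = {0: 0.5, 1: 0.5}

xi = 0.3
C = series_plus(coeffs, xi)
H = series_minus(coeffs, xi)

# Same coefficients under both conventions: H(xi) = C(-xi).
assert abs(H - series_plus(coeffs, -xi)) < 1e-12
# For real coefficients this is also the complex conjugate of C(xi).
assert abs(H - C.conjugate()) < 1e-12
# Both series are 1-periodic in xi.
assert abs(series_plus(coeffs, xi + 1) - C) < 1e-12
```

In practice this means that identities stated for $C$ transfer to $H$ after replacing $\xi$ by $-\xi$, which is worth keeping in mind when comparing formulas across the two contexts.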
A.2 Miscellany from real and harmonic analysis

Hölder and Minkowski inequalities. Let $\|f\|_{L^p_\mu}$ denote the usual $L^p$ norm of $f$ with respect to the measure $\mu$. Hölder's inequality says that $\|fg\|_{L^1_\mu} \le \|f\|_{L^p_\mu}\|g\|_{L^q_\mu}$ where $p, q \ge 1$ are conjugate exponents, that is, $q = p/(p-1)$. The Cauchy–Schwarz inequality is the special case when $p = q = 2$. Minkowski's inequality boils down to some form of the statement that if $f(\cdot,y)$ lies in a normed space $X$ for each $y$ then $\|\int f(\cdot,y)\,d\mu(y)\|_X \le \int\|f(\cdot,y)\|_X\,d\mu(y)$. This inequality is often employed in the case $X = L^p_\nu$ for some measure $\nu$.

Rearrangements. The book by Stein and Weiss [329] is a standard reference for this material. We include only the most basic aspects here. A rearrangement $\{y_n\}$ of a sequence $\{x_k\}_{k\in\mathbb{Z}}$ is defined by a bijective map $\rho : \mathbb{Z} \to \mathbb{N}$ giving $y_{\rho(k)} = x_k$. A nonincreasing rearrangement $\{x_n^*\}$ of $\{x_k\}$ is a rearrangement under which $x_{n+1}^* \le x_n^*$. For functions on $\mathbb{R}^n$, rearrangements must be taken with respect to Lebesgue measure. One defines the distribution function $\lambda_f(s) = |\{x : |f(x)| > s\}|$, which is decreasing (i.e., nonincreasing) in $s$. Then one sets $f^*(t) = \inf\{s : \lambda_f(s) \le t\}$, which is nonincreasing on $(0,\infty)$. When $\lambda_f(s)$ is continuously strictly decreasing, $|\{t \in (0,\infty) : f^*(t) > s\}| = |\{x \in \mathbb{R}^n : |f(x)| > s\}|$. Thus $f^*$ is called the equimeasurable decreasing rearrangement of $f$. Notice that $\int_{\{u>\alpha\}} u \le \int_{\{u^*>\alpha\}} u^*$. Sometimes it is convenient to think of rearrangements as functions on $\mathbb{R}^n$. To do so, one sets $f^{\star}(x) = f^*(\gamma_n|x|^n)$. Here $\gamma_n$ is the volume of the unit ball in $\mathbb{R}^n$. Then $(f^{\star})^* = f^*$. When $n = 1$, $f^{\star}(x) = f^*(2|x|)$.

Lorentz spaces. Lorentz quasinorms are natural extensions of Lebesgue $L^p$ norms when the latter are expressed in terms of distribution functions. We write $\|f\|_{L^{p,q}} = \left(\int_0^\infty (t^{1/p}f^*(t))^q\,dt/t\right)^{1/q}$ when $1 \le p, q < \infty$. When $q = p$, $\|f\|_{L^{p,p}} = \|f^*\|_{L^p(0,\infty)} = \|f\|_{L^p(\mathbb{R}^n)}$. When $q \ne p$, however, the expression $\|\cdot\|_{L^{p,q}}$ fails Minkowski's inequality, though when $p > 1$ it is possible to define a bona fide norm on $L^{p,q}$ that is equivalent to $\|\cdot\|_{L^{p,q}}$. The weak-$L^p$ space $L^{p,\infty}$ is defined by finiteness of $\|f\|_{L^{p,\infty}} = \sup_{t>0} t^{1/p}f^*(t)$. Since $\lambda_f$ is an inverse of $f^*$ (when defined), $\|f\|_{L^{p,\infty}}$ is equivalent to $\sup_{s>0} s\,\lambda_f(s)^{1/p}$. The expression $\|f\|_{L^{p,q}}$ is decreasing in $q$ for fixed $p$. The discrete Lorentz spaces $\ell^{p,q}$ are defined by finiteness of $\sum_{n=1}^{\infty}(n^{1/p}x_n^*)^q/n$ when $q < \infty$ and by $\#\{k : |x_k| > \lambda\} \le C/\lambda^p$ when $q = \infty$.

The Hardy–Littlewood maximal function. As in Chapter 7, $\mu_p(f,S) = \left(\frac{1}{|S|}\int_S |f|^p\right)^{1/p}$ denotes the $p$-mean ($p > 0$) of $|f|$ over $S$, with $\mu(f,S) = \mu_1(f,S)$. The Hardy–Littlewood maximal function is defined by
$$Mf(x) = \sup_{x\in Q}\mu(f,Q) \tag{A.1}$$
where the supremum is taken over all cubes containing $x$. The Hardy–Littlewood maximal theorem says that the sublinear operator $f \mapsto Mf$ is bounded on $L^p(\mathbb{R}^n)$ whenever $p > 1$ and maps $L^1$ into $L^{1,\infty}$, i.e., is weak-type $(1,1)$ (e.g., [327], p. 5). It follows immediately that the $p$-maximal function $M_pf(x) = \sup_{x\in Q}\mu_p(f,Q)$ is $L^r$-bounded whenever $r > p$ and weak-type $(p,p)$. The dyadic maximal function $M_{\mathrm{dy}}$ is defined by restricting the supremum in (A.1) to dyadic cubes containing $x$.

$A_p$ weights. Two standard sources for properties of Muckenhoupt weights are [151] and [328]. A nonnegative weight function $w$ is said to belong to the class $A_\infty$ provided there is a constant $C > 0$ and exponent $\delta > 0$ such that for every cube $Q \subset \mathbb{R}^n$ and every measurable subset $A \subset Q$ one has
$$\frac{w(A)}{w(Q)} \le C\left(\frac{|A|}{|Q|}\right)^{\delta}, \tag{A.2}$$
where $w(A) = \int_A w$. Any $w \in A_\infty$ satisfies a reverse-Hölder inequality: there is some $\varepsilon > 0$ and fixed $C$ such that for all cubes one has (e.g., [151], p. 402)
$$\mu_{1+\varepsilon}(w,Q) \le C\mu(w,Q). \tag{A.3}$$
Any $A_\infty$ weight belongs to $A_p$ for some $1 < p < \infty$, where $A_p$ is defined by the existence of a $C > 0$ such that, for all cubes $Q$, one has $\mu(w,Q)\,\mu_{1/(p-1)}(1/w,Q) \le C$. The case $p = 1$ is somewhat special. It is specified by the condition that $Mw(x) \le Cw(x)$ where $M$ is the Hardy–Littlewood maximal function. The classes $A_p$, $p \in [1,\infty)$, are increasing with $p$ and are all contained in $A_\infty$. A typical element of $A_1$ has the form $w(x) = (Mf(x))^q$ for some $q \in (0,1)$ where $f$ is such that $Mf < \infty$ off a set of measure zero (cf. [328], p. 214). Since $A_1 \subset A_\infty$ it then follows that $M_rw \in A_\infty$ where $r = 1/q$.

Whitney decompositions. The Whitney decomposition can be stated as follows (see [327], p. 16): If $F$ is a closed subset of $\mathbb{R}^n$ then its complement $\Omega$ is the (a.e.) union of a sequence of disjoint cubes $Q_k$ whose diameters are proportional to their distances from $F$. We use an explicit case of this fact in which $\Omega$ is the region in $\mathbb{R}^n\times\mathbb{R}^n$ above the diagonal $\Delta = \{(x,x) : x \in \mathbb{R}^n\}$. In the plane this works as follows. For each $(j,k) \in \mathbb{Z}\times\mathbb{Z}$ consider the square $S_{jk}$ of side length $2^{-j}$ whose lower right vertex is $(k/2^j, k/2^j)$. Subdivide $S_{jk}$ into four equal squares. The three squares
$$S_{jk}^{(1)} = [(2k-2)/2^{j+1}, (2k-1)/2^{j+1}) \times [k/2^j, (2k+1)/2^{j+1}),$$
$$S_{jk}^{(2)} = [(2k-2)/2^{j+1}, (2k-1)/2^{j+1}) \times [(2k+1)/2^{j+1}, (k+1)/2^j),$$
$$S_{jk}^{(3)} = [(2k-1)/2^{j+1}, k/2^j) \times [(2k+1)/2^{j+1}, (k+1)/2^j)$$
then lie in $\Omega$. The distances of $S_{jk}^{(1)}$ and $S_{jk}^{(3)}$ to $\Delta$ are $1/(2^{j+1}\sqrt{2})$ while that of $S_{jk}^{(2)}$ to $\Delta$ is $\sqrt{2}/2^{j+1}$. These three cubes are Whitney cubes for $\Omega$. As $(j,k)$ range over $\mathbb{Z}\times\mathbb{Z}$ the cubes $S_{jk}^{(i)}$, $i = 1, 2, 3$, cover $\Omega$ a.e.
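The planar Whitney construction just described can be checked by direct computation. The sketch below is an illustration under stated assumptions rather than part of the original argument: it restricts attention to the triangle above the diagonal inside the unit square, uses exact rational arithmetic, and introduces the hypothetical helper `whitney_squares`. It verifies that the squares $S_{jk}^{(i)}$ with $1 \le j \le J$ lie above the diagonal, are pairwise disjoint, and fill the triangle (of area $1/2$) up to a gap that shrinks as $J$ grows.

```python
from fractions import Fraction as Fr
from itertools import combinations

def whitney_squares(j, k):
    """Three sub-squares S^(1), S^(2), S^(3) of S_jk, each as ((x0, x1), (y0, y1))."""
    h = Fr(2) ** (-(j + 1))                      # side length 2^-(j+1)
    x = [(2 * k - 2) * h, (2 * k - 1) * h, 2 * k * h]
    y = [2 * k * h, (2 * k + 1) * h, (2 * k + 2) * h]
    return [((x[0], x[1]), (y[0], y[1])),        # S^(1): left column, lower row
            ((x[0], x[1]), (y[1], y[2])),        # S^(2): left column, upper row
            ((x[1], x[2]), (y[1], y[2]))]        # S^(3): right column, upper row

def area(sq):
    (x0, x1), (y0, y1) = sq
    return (x1 - x0) * (y1 - y0)

def disjoint(a, b):
    """Half-open rectangles are disjoint iff they are separated in x or in y."""
    (ax0, ax1), (ay0, ay1) = a
    (bx0, bx1), (by0, by1) = b
    return ax1 <= bx0 or bx1 <= ax0 or ay1 <= by0 or by1 <= ay0

J = 5
# The squares S_jk contained in [0,1)^2 are those with j >= 1 and 1 <= k <= 2^j - 1.
squares = [s for j in range(1, J + 1)
             for k in range(1, 2 ** j)
             for s in whitney_squares(j, k)]

# Every square lies above the diagonal: its x-range ends before its y-range begins.
assert all(sq[0][1] <= sq[1][0] for sq in squares)
# The squares are pairwise disjoint.
assert all(disjoint(a, b) for a, b in combinations(squares, 2))

# Area of the triangle not yet covered by levels 1..J.
gap = Fr(1, 2) - sum(area(s) for s in squares)
print(gap)
```

Increasing `J` drives `gap` to zero, which is the a.e. covering statement restricted to the unit square.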
Fig. A.1. Whitney decomposition above diagonal, showing the squares $S_{jk}^{(1)}$, $S_{jk}^{(2)}$, and $S_{jk}^{(3)}$.
Sobolev and Poincaré inequalities. The space $L^p_k = L^p_k(\mathbb{R}^n)$ consists of those $f$ such that $\sum_{|\alpha|\le k}\|(\partial^\alpha/\partial x^\alpha)f\|_p < \infty$. Sobolev's theorem as follows is proved in Stein (see [327], p. 124 ff.):

Theorem A.2.1. (Sobolev's theorem) Let $k \in \mathbb{Z}$ and $1/q = 1/p - k/n$.
(i) If $q < \infty$ (i.e., $p < n/k$), then $L^p_k$ is continuously embedded in $L^q$.
(ii) If $q = \infty$ (i.e., $p = n/k$), then $L^p_k \subset L^r_{\mathrm{loc}}$ for any $r < \infty$.
(iii) If $p > n/k$, then every $f \in L^p_k$ can be modified on a set of measure zero so that the resulting function is continuous.

When $k = 1$ one has the endpoint inequality $\|f\|_{n'} \le C_n\|\nabla f\|_1$. This case requires special techniques (loc. cit.). As with other Sobolev inequalities, its analogue for a bounded Lipschitz domain $\Omega$ in $\mathbb{R}^n$, $\|f - f_\Omega\|_{L^{n'}(\Omega)} \le C_\Omega\|\nabla f\|_{L^1(\Omega)}$, is referred to as a Poincaré inequality. The constant $C_\Omega$ depends on $\Omega$, and $f_\Omega$ denotes the mean value of $f$ over $\Omega$. The need to subtract $f_\Omega$ is seen by taking $f$ to be constant on $\Omega$. A Lipschitz domain in $\mathbb{R}^n$ is one whose boundary can locally be expressed, up to a rotation, as the graph of a Lipschitz function from $\mathbb{R}^{n-1}$ to $\mathbb{R}$.

Interpolation theorems. We use two standard results from interpolation theory: the Riesz–Thorin theorem from complex interpolation and the Marcinkiewicz theorem from real interpolation. Bergh and Löfström [47] (cf. [329]) contains details and related results. All exponents $p_i$, $q_i$ below lie in $[1,\infty)$.

Theorem A.2.2. (Riesz–Thorin) Suppose that $T : L^{p_i}(X,\mu) \to L^{q_i}(Y,\nu)$ is a bounded linear operator with norm $M_i$, $i = 0, 1$. Let
$$\frac{1}{p} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1} \quad\text{and}\quad \frac{1}{q} = \frac{1-\theta}{q_0} + \frac{\theta}{q_1} \qquad (0 < \theta < 1). \tag{A.4}$$
Then $T : L^p(X,\mu) \to L^q(Y,\nu)$ is bounded with norm $M \le M_0^{1-\theta}M_1^{\theta}$.

Theorem A.2.3. (Marcinkiewicz) Suppose that $q_0 \ne q_1$, that $p_i \le q_i$, $i = 0, 1$, and that the (sub)-linear operator $T : L^{p_i}(X,\mu) \to L^{q_i,\infty}(Y,\nu)$ is bounded with norm $M_i$, $i = 0, 1$. If $p \le q$ are as in (A.4) then $T : L^p(X,\mu) \to L^q(Y,\nu)$ is bounded with norm $M \le C_\theta M_0^{1-\theta}M_1^{\theta}$.

These basic interpolation theorems have many important generalizations, including vector-valued inequalities and extensions to multilinear operators.

Heisenberg group. The Heisenberg group of $\mathbb{R}^n$ is the set $\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}$ with multiplication $(p,q,t)\cdot(p',q',t') = (p+p', q+q', t+t'+(pq'-qp')/2)$. For us it is more convenient to work with the polarized Heisenberg group, which has the same underlying set but with product $(p+p', q+q', t+t'+pq')$. We use $\mathbb{H}_n$ to denote the polarized Heisenberg group (usually one writes $\mathbb{H}_n^{\mathrm{pol}}$, e.g., [144]). The group may then be identified with the collection of $(n+2)\times(n+2)$ matrices
$$M(p,q,t) = \begin{pmatrix} 1 & p & t \\ 0 & I_n & q \\ 0 & 0 & 1 \end{pmatrix},$$
the group product reducing to the matrix product. One advantage of working with the polarized group is that the elements with integer entries form a subgroup. In fact, one can also define a finite Heisenberg group $\mathbb{H}_N$ by forming the matrices $M(p,q,t)$ from the underlying set $\mathbb{Z}_N\times\mathbb{Z}_N\times\mathbb{Z}_N$ in which $\mathbb{Z}_N$ denotes the group (additive and multiplicative) of integers modulo $N$. The Heisenberg group possesses a one-parameter family of representations on $L^2(\mathbb{R}^n)$ of which we refer to one particular normalization as the Schrödinger representation, $\rho(p,q,t) : f \mapsto e^{2\pi it+2\pi iqx+\pi ipq}f(x+p)$.

Distributions. The elements of $C_c^\infty(\mathbb{R}^n)$, those infinitely differentiable functions with compact support, are called test functions. The space is endowed with a topology in which a sequence $\{f_n\}$ converges to zero if, for any compact set $K \subset \mathbb{R}^n$, $f_n$ and all of its derivatives converge to zero uniformly on $K$. The space $\mathcal{D}'(\mathbb{R}^n)$ consists of those linear mappings $E : C_c^\infty(\mathbb{R}^n) \to \mathbb{C}$ such that $\langle E, f_n\rangle = E(f_n) \to 0$ whenever $f_n \to 0$ in $C_c^\infty(\mathbb{R}^n)$. A multi-index has the form $\alpha = (\alpha_1,\dots,\alpha_n) \in \{0,1,2,\dots\}^n$ with $|\alpha| = \alpha_1+\cdots+\alpha_n$. Also, $x^\beta = x_1^{\beta_1}\cdots x_n^{\beta_n}$, $\partial^\alpha = (\partial^{\alpha_1}/\partial x_1^{\alpha_1})\cdots(\partial^{\alpha_n}/\partial x_n^{\alpha_n})$ and $\phi^\alpha = \partial^\alpha\phi$ whenever the derivatives exist. The Schwartz space $\mathcal{S}(\mathbb{R}^n)$ consists of those functions $\phi : \mathbb{R}^n \to \mathbb{C}$ such that $|x^\beta\phi^\alpha(x)| \to 0$ as $|x| \to \infty$ for all multi-indices $\alpha$, $\beta$. The seminorms $\|\phi\|_{\alpha\beta} = \sup|x^\beta\phi^\alpha(x)|$ endow $\mathcal{S}(\mathbb{R}^n)$ with a Fréchet topology in which $f_n \to 0$ when $\|f_n\|_{\alpha\beta} \to 0$ for all multi-indices. The Schwartz or tempered distributions $\mathcal{S}'(\mathbb{R}^n)$ consist of those $E \in \mathcal{D}'(\mathbb{R}^n)$ satisfying $\langle E, \phi_n\rangle \to 0$ whenever $\phi_n \to 0$ in $\mathcal{S}(\mathbb{R}^n)$. Hörmander [202] is a standard reference for analysis of these spaces. Measures are special examples of distributions. The Dirac point mass $\delta$ is defined by $\langle\delta, f\rangle = f(0)$. Its derivatives can be defined by
$\langle\delta^\alpha, f\rangle = (-1)^{|\alpha|}f^\alpha(0)$. More generally, operations on distributions are defined through their adjoints $\langle TE, f\rangle = \langle E, T^*f\rangle$. For example, $\delta_x(f) = f(x)$ is defined as $\langle\tau_x\delta, f\rangle = \langle\delta, \tau_{-x}f\rangle = f(x)$ where $\tau_a$ denotes the unitary translation operator $\tau_a(f)(x) = f(x-a)$.
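The sign in the formula for derivatives of $\delta$ comes from integration by parts, and it can be seen numerically by replacing $\delta$ with a narrow Gaussian. The following is a rough one-dimensional sketch under stated assumptions: the Gaussian mollifier, the width `eps`, the grid, and the function name `pair_with_delta_prime` are all illustrative choices, and the Schwartz function $f(x) = \sin(x)e^{-x^2}$ (with $f'(0) = 1$) is used only as sample data.

```python
import numpy as np

def gaussian(x, eps):
    """Unit-mass Gaussian of width eps, an approximation to the Dirac delta."""
    return np.exp(-x**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

def pair_with_delta_prime(f, eps=0.01, half_width=1.0, num=200_001):
    """Approximate <delta', f> by integrating g_eps'(x) f(x) over a fine grid."""
    x = np.linspace(-half_width, half_width, num)
    dx = x[1] - x[0]
    g_prime = -x / eps**2 * gaussian(x, eps)   # exact derivative of the Gaussian
    # Riemann sum; the integrand is negligible at the endpoints of the grid.
    return np.sum(g_prime * f(x)) * dx

def f(x):
    return np.sin(x) * np.exp(-x**2)           # Schwartz function with f'(0) = 1

approx = pair_with_delta_prime(f)
# <delta', f> = (-1)^1 f'(0) = -1, up to an error of order eps^2.
assert abs(approx - (-1.0)) < 1e-3
```

Shrinking `eps` (with a correspondingly finer grid) brings the pairing closer to $-f'(0)$, consistent with $\langle\delta^\alpha, f\rangle = (-1)^{|\alpha|}f^\alpha(0)$ for $|\alpha| = 1$.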
A.3 Miscellany from functional analysis

We work mainly with the Hilbert spaces $L^2(\mathbb{R}^n)$ and $\ell^2(\mathbb{Z}^n)$. In the case of $L^2(\mathbb{R}^n)$, $\langle f, g\rangle = \int f\bar g$. For $\ell^2(\mathbb{Z}^n)$, $\langle x, y\rangle = \sum x_k\bar y_k$. The identity operator on a linear space $X$ will be denoted by $I$. Given a subset $S$ of $X$ we denote by $\mathrm{span}(S)$ the subspace of all finite linear combinations taken from $S$. The closure of this set in the topology on $X$ is denoted $\overline{\mathrm{span}}(S)$. Other function spaces ($L^p$-spaces, Besov spaces, etc.) are important, as are inequalities for operators on these spaces.

In several instances we work with operators that are unbounded on $L^2(\mathbb{R}^n)$. Standard examples include partial differentiation and multiplication by an unbounded function. It is convenient to work with dense subspaces. The Schwartz space, for example, is dense in $L^2(\mathbb{R}^n)$ and almost all of the function spaces on $\mathbb{R}^n$ that we consider. If one has an operator bound of the form $\|Tf\|_Y \le C\|f\|_X$ in which $T$ is linear on the Banach space $X$ and $Tf$ is well defined for all $f$ belonging to a dense subspace $V$ of $X$, then the inequality extends, with the same constant $C$ and an appropriate extension of $T$ to all of $X$, by means of a standard limiting argument. The operator norm of $T$, denoted $\|T\|_{X\to Y}$ or simply $\|T\|$, is then $\sup\{\|Tf\|_Y : \|f\|_X = 1\}$. The nature of this extension, of course, can depend intrinsically on $X$ and the dense subspace (see, e.g., [40]). In the case of an unbounded operator $T$ from $X$ to itself, the domain $\mathrm{dom}\,T$ of $T$ is defined as those $f \in X$ such that $Tf \in X$. If $S$, $T$ are unbounded then the domain of the composition $ST$ consists of those $f \in \mathrm{dom}\,T$ such that $Tf \in \mathrm{dom}\,S$. The commutator of two operators $S$ and $T$ is the operator $[S,T] = ST - TS$.
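As a concrete instance of the commutator just defined, take $S$ to be differentiation and $T$ to be multiplication by $x$, the two standard unbounded operators mentioned above. On polynomials (a convenient common domain chosen here purely for illustration) one has $[S,T] = I$. The sketch below checks this with NumPy's `Polynomial` class; the names `D`, `Mx`, and `commutator` are hypothetical helpers, not notation from the text.

```python
import numpy as np
from numpy.polynomial import Polynomial

X = Polynomial([0, 1])                 # the polynomial p(x) = x

def D(p):
    """Differentiation operator on polynomials."""
    return p.deriv()

def Mx(p):
    """Multiplication by the (unbounded) function x."""
    return X * p

def commutator(S, T):
    """Return the operator [S, T] = ST - TS."""
    return lambda p: S(T(p)) - T(S(p))

p = Polynomial([1, -2, 0, 5])          # sample polynomial 1 - 2x + 5x^3
q = commutator(D, Mx)(p)

# [D, Mx] acts as the identity: d/dx (x p) - x p' = p.
assert np.allclose(q.coef, p.coef)
```

The same identity, interpreted on a suitable dense domain in $L^2(\mathbb{R})$, is the algebraic relation behind Heisenberg-type uncertainty inequalities.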
Oblique projections. Although we assume familiarity with the elements of Hilbert space theory, one aspect, namely the nature of oblique projections, bears special mention as it arises in the context of biorthogonal wavelets and sampling. Recall that a projection operator $P$ on a Hilbert space $H$ is an operator that acts as the identity on its range. $P$ is orthogonal if $(PH)^\perp$ is contained in, hence equal to, the kernel of $P$. If $(PH)^\perp$ is not contained in $\ker P$ then the projection $P$ is called oblique. The operator $P : \mathbb{R}^2 \to \mathbb{R}^2$, $P(x_1,x_2) = (x_1+x_2, 0)$ provides a simple example satisfying $P^2 = P$ but $\ker P = \{(x,-x)\} \ne (\mathrm{Ran}\,P)^\perp = \{(0,x)\}$. For subspaces $V_1$ and $\tilde V_0$ of $H$, a projection into $V_1 \cap \tilde V_0^\perp$ need not agree with the orthogonal projection into $V_1$.

Some specific classes of operators. The spectral theorem for self-adjoint operators is outlined near the beginning of Chapter 7. Some specific classes
of operators on a separable Hilbert space $H$ are of particular relevance. A unitary operator $U : H_1 \to H_2$ is one such that $\langle Uf, Ug\rangle_{H_2} = \langle f, g\rangle_{H_1}$. The (Hermitian) adjoint $A^*$ of $A : H_1 \to H_2$ is defined via $\langle Af, g\rangle_{H_2} = \langle f, A^*g\rangle_{H_1}$ (we also write $A^*$ for the Banach space adjoint of $A$). Then $A : H \to H$ is self-adjoint if $A^* = A$. A compact, self-adjoint operator $A$ on $H$ has a particularly simple structure since, for such an operator, there is an orthonormal basis $\{e_k\}$ (the normalized eigenvectors of $A$) such that $A = \sum_{k=1}^{\infty}\lambda_k e_k\otimes e_k$. The Hilbert–Schmidt class consists of those operators $A$ on $H$ for which there is an orthonormal basis $\{e_k\}$ under which the sequence $\{\|Ae_k\|_H\}$ is square-summable (the sum is then independent of the basis since two such sequences differ by a unitary operator). When $A$ has an integral kernel $k(x,y)$, the Hilbert–Schmidt norm squared can also be expressed as the integral of $|k|^2$. More generally, define the approximation numbers $\sigma_N(A) = \inf\{\|A - A_N\| : \mathrm{rank}(A_N) \le N\}$ where the rank of $A_N$ is the dimension of its range. When $A$ is compact and self-adjoint, $\sigma_N(A)$ coincides with the $N$th largest eigenvalue of $A$. One defines the Schatten–Lorentz class $S_{p,q}$ to consist of those operators on $H$ such that the sequence of approximation numbers belongs to the Lorentz space $\ell^{p,q}(\mathbb{N})$.

Functional calculus. Let $A : H \to H$ be bounded and self-adjoint, and let $m = \inf_{\|\varphi\|=1}\langle A\varphi,\varphi\rangle$ and $M = \sup_{\|\varphi\|=1}\langle A\varphi,\varphi\rangle = \|A\|$ be the lower and upper bounds of $A$, respectively. Let $f$ be a continuous function on the interval $[m,M]$. In light of the spectral theorem (7.1.1) for self-adjoint operators, one can define $f(A) = \int_{-\infty}^{\infty} f(\lambda)\,dP(\lambda,A)\,d\lambda$ in the sense that $\langle f(A)\varphi,\psi\rangle = \int_{-\infty}^{\infty} f(\lambda)\,d\langle P(\lambda,A)\varphi,\psi\rangle\,d\lambda$ for all $\varphi, \psi \in H$. This definition satisfies $p(A) = \sum_{j=0}^{n} a_jA^j$ when $p(\lambda) = \sum_{j=0}^{n} a_j\lambda^j$ is a polynomial with real coefficients. In addition, $f(A)$ is self-adjoint whenever $f$ is real-valued. If $f(\lambda) \equiv 1$ then $f(A) = I$, while $\|\varphi\|^2 = \int_{-\infty}^{\infty} d\langle P(\lambda,A)\varphi,\varphi\rangle\,d\lambda$.
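Thus, if $f$ is real-valued,
$$\|f(A)\| = \sup_{\|\varphi\|=1}|\langle f(A)\varphi,\varphi\rangle| = \sup_{\|\varphi\|=1}\left|\int_{-\infty}^{\infty} f(\lambda)\,d\langle P(\lambda,A)\varphi,\varphi\rangle\,d\lambda\right| \le \sup_{m\le\lambda\le M}|f(\lambda)|.$$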
Neumann series: iterative methods. Let $T : H \to H$ be linear and bounded with $\|I - T\| = \gamma < 1$. Then $T$ has a bounded inverse $T^{-1} = \sum_{j=0}^{\infty}(I-T)^j$. The right-hand side of this equation is a Neumann series for $T^{-1}$. Each $f \in H$ admits the expansion $f = T^{-1}Tf = \sum_{j=0}^{\infty}(I-T)^jTf$. Let $f_n = \sum_{j=0}^{n}(I-T)^jTf$ be the $n$th partial sum of the series. Then $f_{n+1} = f_n + T(f - f_n)$, thus providing the iterative algorithm:
$$f_0 = Tf; \qquad f_{n+1} = f_n + T(f - f_n) \quad (n \ge 0).$$
The convergence $f_n \to f$ is geometric in the sense that $\|f - f_n\| \le c\gamma^n\|f\| \to 0$ as $n \to \infty$. In fact, for any initialization $f_0$, the iterates of the algorithm will converge to $f$ geometrically.
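The iteration above is straightforward to try in finite dimensions. The sketch below is a minimal illustration, assuming $H = \mathbb{R}^6$ and a randomly generated matrix $T$ scaled so that $\|I - T\| = 1/2$; these choices, and all variable names, are hypothetical. Note that the update $f_n + T(f - f_n)$ only requires the data $Tf$ and the ability to apply $T$, since $T(f - f_n) = Tf - Tf_n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Build T with ||I - T|| = gamma = 0.5 in the spectral (operator) norm.
E = rng.standard_normal((n, n))
E *= 0.5 / np.linalg.norm(E, 2)
T = np.eye(n) - E
gamma = np.linalg.norm(np.eye(n) - T, 2)

f = rng.standard_normal(n)        # the unknown, kept only to measure the error
Tf = T @ f                        # the data actually available

fn = Tf.copy()                    # f_0 = T f
errors = [np.linalg.norm(f - fn)]
for _ in range(30):
    fn = fn + (Tf - T @ fn)       # f_{n+1} = f_n + T(f - f_n)
    errors.append(np.linalg.norm(f - fn))

# Since f - f_n = (I - T)^(n+1) f, the error obeys ||f - f_n|| <= gamma^(n+1) ||f||.
norm_f = np.linalg.norm(f)
assert all(e <= gamma ** (m + 1) * norm_f + 1e-12 for m, e in enumerate(errors))
```

The smaller $\gamma$ is, the fewer iterations are needed; when $\gamma$ is close to 1 the same scheme still converges but slowly, which is one reason relaxation or preconditioning is often considered in practice.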
References
1. M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions. Dover, New York, 1970.
2. R. Adams. Sobolev Spaces. Academic Press, New York, 1975.
3. A. Aldroubi and K. Gröchenig. Nonuniform sampling and reconstruction in shift-invariant spaces. SIAM Review, 43:585–620, 2001.
4. A. Aldroubi and M. Unser. Families of wavelet transforms in connection with Shannon's sampling theory and Gabor transforms. In C.K. Chui, editor, Wavelets: A Tutorial in Theory and Applications, pages 509–528. Academic Press, San Diego, 1992.
5. J.B. Allen. Nonlinear cochlear signal processing. In A. Jahn and J. Santos-Sacchi, editors, Physiology of the Ear, pages 393–442. Singular Thompson, San Diego, 2nd edition, 2001.
6. B. Alpert, G. Beylkin, D. Gines, and L. Vozovoi. Adaptive solution of partial differential equations in multiwavelet bases. J. Comput. Phys., 182:149–190, 2002.
7. W.O. Amrein and A.M. Berthier. On support properties of $L^p$ functions and their Fourier transforms. J. Funct. Anal., 24:258–267, 1977.
8. X. Aragones, J.L. Gonzalez, and A. Rubio. Analysis and Solutions for Switching Noise in Coupling Mixed Signal ICs. Kluwer, Dordrecht, 1999.
9. F. Astengo, M. Cowling, B. Di Blasio, and M. Sundari. Hardy's uncertainty principle on certain Lie groups. J. London Math. Soc., 62:461–472, 2000.
10. P. Auscher, G. Weiss, and M.V. Wickerhauser. Local sine and cosine bases of Coifman and Meyer and the construction of smooth wavelets. In C.K. Chui, editor, Wavelets: A Tutorial in Theory and Applications, pages 237–256. Academic Press, San Diego, 1992.
11. L. Auslander and Y. Meyer. A generalized Poisson summation formula. Appl. Comput. Harmon. Anal., 3:372–376, 1996.
12. L. Auslander and R. Tolimieri. Is computing with the finite Fourier transform pure or applied mathematics? Bull. Amer. Math. Soc. (N.S.), 1:847–897, 1979.
13. K.I. Babenko. An inequality in the theory of Fourier integrals. Izv. Akad. Nauk. SSSR, Ser. Mat., 27:531–542, 1961. English transl. Amer. Math. Soc. Trans. (2) 44, 115–128.
14. S. Bagchi and S. Ray. Uncertainty principles like Hardy's theorem on some Lie groups. J. Austral. Math. Soc. (Series A), 65:289–302, 1999.
15. L.W. Baggett. Processing a radar signal and representations of the Heisenberg group. Colloq. Math., 60/61:195–203, 1990.
16. R. Balan. An uncertainty inequality for wavelet states. Appl. Comput. Harmon. Anal., 5:106–108, 1998.
17. R. Balian. Un principe d'incertitude fort en théorie du signal ou en mécanique quantique. C. R. Acad. Sci. Paris Sér. II, 292:1357–1362, 1981.
18. E. Balila and N. Reyes. Weighted uncertainty principles in $L^\infty$. J. Approx. Theory, 106:241–248, 2000.
19. R. Baraniuk. Information-theoretic interpretation of Besov spaces. In A. Aldroubi, A. Laine, and M. Unser, editors, Wavelet Applications in Signal and Image Processing VIII, volume 4119 of Proc. SPIE, pages 675–685, 2000.
20. R. Baraniuk, R. DeVore, G. Kyriazis, and X. M. Yu. Near best tree approximation. Adv. Comput. Math., 16:357–373, 2002.
21. V. Bargmann. On a Hilbert space of analytic functions and an associated integral transform, part I. Comm. Pure Appl. Math., 14:187–214, 1961.
22. J. Barnes. Laplace–Fourier transformation, the foundation for quantum information theory and linear physics. In Problems in Analysis (Papers dedicated to Salomon Bochner, 1969), pages 157–173. Princeton University Press, Princeton, 1970.
23. G. Battle. Heisenberg proof of the Balian–Low theorem. Lett. Math. Phys., 15:175–177, 1988.
24. G. Battle. Phase space localization theorem for ondelettes. J. Math. Phys., 30:2195–2196, 1989.
25. G. Battle. Heisenberg inequalities for wavelet states. Appl. Comput. Harmon. Anal., 4:119–146, 1997.
26. W. Beckner. Inequalities in Fourier analysis. Ann. Math., 102:159–182, 1975.
27. W. Beckner. Pitt's inequality and the uncertainty principle. Proc. Amer. Math. Soc., 123:1897–1905, 1995.
28. W. Beckner. Geometric asymptotics and the logarithmic Sobolev inequality. Forum Math., 11:105–137, 1999.
29. J. Benedetto. Spectral Synthesis. Academic Press, New York, 1971.
30. J. Benedetto. Frame decompositions, sampling, and uncertainty principle inequalities. In J. Benedetto and M. Frazier, editors, Wavelets: Mathematics and Applications, pages 247–304. CRC Press, Boca Raton, FL, 1994.
31. J. Benedetto. Harmonic Analysis and Applications. CRC Press, Boca Raton, FL, 1997.
32. J. Benedetto, W. Czaja, P. Gadziński, and A. Powell. The Balian–Low theorem and regularity of Gabor systems. J. Geom. Anal., 13:239–254, 2003.
33. J. Benedetto, W. Czaja, and A. Maltsev. The Balian–Low theorem for the symplectic form on $\mathbb{R}^{2d}$. J. Math. Phys., 44:1735–1750, 2003.
34. J. Benedetto and P.J.S.G. Ferreira. Introduction. In Modern sampling theory, pages 1–26. Birkhäuser, Boston, MA, 2001.
35. J. Benedetto, C. Heil, and D. Walnut. Differentiation and the Balian–Low theorem. J. Fourier Anal. Appl., 1:355–402, 1995.
36. J. Benedetto and H. Heinig. Weighted Hardy spaces and the Laplace transform. In G. Mauceri, F. Ricci, and G. Weiss, editors, Harmonic Analysis, Proceedings, Cortona 1982, pages 240–277. Springer-Verlag, New York, 1983.
37. J. Benedetto and H. Heinig. Weighted Fourier inequalities: new proofs and generalizations. J. Fourier Anal. Appl., 9:1–37, 2003.
38. J. Benedetto, H. Heinig, and R. Johnson. Fourier inequalities with $A_p$-weights. In General Inequalities, 5 (Oberwolfach, 1986), volume 80, pages 217–232. Birkhäuser, Basel, 1987.
39. J. Benedetto, H. Heinig, and R. Johnson. Weighted Hardy spaces and the Laplace transform, II. Math. Nachr., 132:29–55, 1987.
40. J. Benedetto and J. Lakey. The definition of the Fourier transform for weighted norm inequalities. J. Funct. Anal., 120:403–439, 1994.
41. J. Benedetto and S. Li. The theory of multiresolution analysis frames and applications to filter banks. Appl. Comput. Harmon. Anal., 5:389–427, 1998.
42. J. Benedetto and S. Scott. Frames, irregular sampling, and a wavelet auditory model. In F. Marvasti, editor, Sampling Theory and Practice. Plenum Press, New York, 2000.
43. J. Benedetto and A. Teolis. A wavelet auditory model and data compression. Appl. Comput. Harmon. Anal., 1:3–28, 1993.
44. J. Benedetto and O. Treiber. Wavelet frames: multiresolution analysis and extension principles. In L. Debnath, editor, Wavelet Transforms and Time–Frequency Analysis, pages 3–36. Birkhäuser, Boston, 2001.
45. M. Benedicks. The support of functions and distributions with a spectral gap. Math. Scand., 55:285–309, 1984.
46. M. Benedicks. On the Fourier transform of functions supported on sets of finite Lebesgue measure. J. Math. Anal. Appl., 106:180–183, 1985.
47. J. Bergh and J. Löfström. Interpolation Spaces. An Introduction. Springer-Verlag, Berlin, 1976.
48. R. Bernardini and J. Kovačević. Arbitrary tilings of the time–frequency plane using local bases. IEEE Trans. Signal Process., 47:2293–2304, 1998.
49. R. Bernardini and M. Vetterli. Discrete- and continuous-time local cosine bases with multiple overlapping. IEEE Trans. Signal Process., 46:3166–3180, 1998.
50. O. Besov. Investigation of a family of functional spaces connected with embedding and extension theorems. Trudy Mat. Inst. Steklov, 60:42–81, 1961.
51. A. Beurling. On a closure problem. Ark. Mat., 1:301–303, 1950.
52. G. Beylkin. On the fast Fourier transform of functions with singularities. Appl. Comput. Harmon. Anal., 12:363–381, 1995.
53. G. Beylkin and L. Monzón. On generalized Gaussian quadratures for exponentials and their applications. Appl. Comput. Harmon. Anal., 12:332–373, 2002.
54. K. Bittner. Biorthogonal Wilson bases. In A. Aldroubi, A. Laine, and M. Unser, editors, Wavelet Applications in Signal and Image Processing VII, volume 3813 of Proc. SPIE, pages 410–421, 1999.
55. A. Bonami and J. Bruna. On the truncation of Hankel and Toeplitz operators. Publ. Mat., 43:235–250, 1999.
56. A. Bonami, B. Demange, and P. Jaming. Hermite functions and uncertainty principles for the Fourier and the windowed Fourier transforms. Rev. Mat. Iberoamericana, 19:23–55, 2003.
57. N. Bourbaki. Théories Spectrales, Chapitre II. Hermann, Paris, 1967.
58. J. Bourgain. A remark on the uncertainty principle for Hilbertian basis. J. Funct. Anal., 79:136–143, 1988.
59. R. Bracewell. The Fourier Transform and its Applications. McGraw-Hill, New York, 1965.
60. L. Breiman, J. Friedman, R. Olshen, and C.J. Stone. On Classification and Regression Trees. Wadsworth, Belmont, CA, 1983.
61. L. Brillouin. Sur un type général de problèmes, permettant la séparation des variables dans la mécanique ondulatoire de Schrödinger. C. R. Acad. Sci. Paris, 183:270–271, 1926.
62. J.S. Byrnes, W. Moran, and B. Saffari. Smooth PONS. J. Fourier Anal. Appl., 6:663–674, 2000.
63. C.A. Cabrelli, C. Heil, and U.M. Molter. Self-similarity and multiwavelets in higher dimensions. Mem. Amer. Math. Soc., 170, 2004.
64. E. Candès and L. Demanet. Curvelets and Fourier integral operators. C. R. Math. Acad. Sci. Paris, 336:395–398, 2003.
65. E. Candès and D. Donoho. New tight frames of curvelets and optimal representations of objects with piecewise $C^2$ singularities. Comm. Pure Appl. Math., 57:219–266, 2004.
66. M. Cannone and Y. Meyer. Littlewood–Paley decomposition and Navier–Stokes equations. Meth. and Math. of Analysis, 2:307–319, 1995.
67. M. Cannone, F. Planchon, and M. Schonbek. Strong solutions to the incompressible Navier–Stokes equations in the half-space. Comm. Partial Differential Equations, 25:903–924, 2000.
68. L. Carleson. An explicit basis of $H^1$. Bull. Sci. Math., 104:405–416, 1980.
69. S. Chanillo and E. Sawyer. Unique continuation for $\Delta + v$ and the C. Fefferman–Phong class. Trans. Amer. Math. Soc., 318:275–300, 1990.
70. S. Chanillo and R. Wheeden. $L^p$ estimates for fractional integrals and Sobolev inequalities, with applications to Schrödinger operators. Comm. Partial Differential Equations, 10:1077–1116, 1985.
71. F. Chiarenza and M. Frasca. A remark on a paper by C. Fefferman. Proc. Amer. Math. Soc., 108:407–409, 1990.
72. M. Christ and A. Kiselev. One-dimensional Schrödinger operators with slowly decaying potentials: spectra and asymptotics. Lecture Notes, University of Arkansas, 2001.
73. M. Christ and A. Kiselev. WKB asymptotic behavior of almost all generalized eigenfunctions for one-dimensional Schrödinger operators with slowly decaying potentials. J. Funct. Anal., 179:426–447, 2001.
74. M. Christ and A. Kiselev. Scattering and wave operators for one-dimensional Schrödinger operators with slowly decaying nonsmooth potentials. Geom. Funct. Anal., 12(6):1174–1234, 2002.
75. M. Christ and A. Kiselev. One-dimensional Schrödinger operators and nonlinear Fourier analysis. preprint, 2003.
76. O. Christensen. An Introduction to Frames and Riesz Bases. Birkhäuser, Boston, 2003.
77. O. Christensen, B. Deng, and C. Heil. Density of Gabor frames. Appl. Comput. Harmon. Anal., 7:292–304, 1999.
78. C.K. Chui and J.A. Lian. A study of orthonormal multiwavelets. Appl. Numer. Math., 20:273–298, 1996.
79. A. Cohen. Biorthogonal wavelets. In C.K. Chui, editor, Wavelets: A Tutorial in Theory and Applications, pages 123–152. Academic Press, San Diego, 1992.
80. A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore. Nonlinear approximation and the space $BV(\mathbb{R}^2)$. Amer. J. Math., 121:587–628, 1999.
81. A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore. Tree approximation and optimal encoding. Appl. Comput. Harmon. Anal., 11:192–226, 2001.
82. A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore. Harmonic analysis of the space BV. Rev. Mat. Iberoamericana, 19:235–263, 2003.
83. A. Cohen, I. Daubechies, and P. Vial. Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal., 1:54–81, 1993.
84. A. Cohen, R. DeVore, and R. Hochmuth. Restricted nonlinear approximation. Constr. Approx., 16:85–113, 2000.
85. A. Cohen, R. DeVore, P. Petrushev, and H. Xu. Nonlinear approximation and the space $BV(\mathbb{R}^2)$. Amer. J. Math., 121:587–628, 1999.
86. A. Cohen, Y. Meyer, and F. Oru. Improved Sobolev embedding theorem. In Séminaire sur les Equations aux Dérivées Partials, 1997–1998, Paliseau, 1998. École Polytechnic, Paliseau.
87. R. Coifman, P. Jones, and S. Semmes. Two elementary proofs of the $L^2$ boundedness of Cauchy integrals on Lipschitz curves. J. Amer. Math. Soc., 2:553–564, 1989.
88. R. Coifman, A. McIntosh, and Y. Meyer. L’intégrale de Cauchy définit un opérateur borné sur $L^2$ pour les courbes Lipschitziennes. Ann. Math., 116:362–387, 1982.
89. R. Coifman and Y. Meyer. Remarques sur l’analyse de Fourier à fenêtre. C. R. Acad. Sci. Paris, Série I, 312:259–261, 1991.
90. R. Coifman and Y. Meyer. Gaussian bases. Appl. Comput. Harmon. Anal., 2:299–302, 1995.
91. R. Coifman, Y. Meyer, and V. Wickerhauser. Wavelet analysis and signal processing. In M.B. Ruskai, G. Beylkin, and R. Coifman et al., editors, Wavelets and Their Applications, pages 153–178. Jones and Bartlett, Boston, 1992.
92. D. Colella and C. Heil. Matrix refinement equations: existence and uniqueness. J. Fourier Anal. Appl., 2:363–377, 1996.
93. E.U. Condon. Immersion of the Fourier transform in a continuous group of functional transformations. Proc. Natl. Acad. Sci. USA, 23:158–164, 1937.
94. J.W. Cooley and J.W. Tukey. Algorithm for the machine computation of complex Fourier series. Math. Comp., 19:297–301, 1965.
95. M. Cowling and J.F. Price. Bandwidth versus time concentration: the Heisenberg–Pauli–Weyl inequality. SIAM J. Math. Anal., 15:151–165, 1984.
96. M. Croft and J.A. Hogan. Wavelet-based signal extrapolation. In Proc. Fourth Int. Symp. Sig. Proc. and Appl., Gold Coast, Australia, pages 752–755, 1996.
97. I. Daubechies. Orthonormal bases of compactly supported wavelets. Comm. Pure Appl. Math., 41:909–986, 1988.
98. I. Daubechies. The wavelet transform, time–frequency localization and signal analysis. IEEE Trans. Inform. Theory, 36:961–1005, 1990.
99. I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.
100. I. Daubechies and R. DeVore. Approximating a bandlimited function using very coarsely quantized data: a family of stable sigma–delta quantizers of arbitrary order. Ann. Math. (2), 158:679–710, 2004.
101. I. Daubechies, A. Grossmann, and Y. Meyer. Painless nonorthogonal expansions. J. Math. Phys., 27:1271–1283, 1986.
102. I. Daubechies, I. Guskov, P. Schröder, and W. Sweldens. Wavelets on irregular point sets. R. Soc. Lond. Philos. Trans. Ser. A Math. Phys. Eng. Sci., 357:2397–2413, 1999.
103. I. Daubechies, I. Guskov, and W. Sweldens. Regularity of irregular subdivision. Constr. Approx., 15:381–426, 1999.
104. I. Daubechies, I. Guskov, and W. Sweldens. Commutation for irregular subdivision. Constr. Approx., 17:479–514, 2001.
105. I. Daubechies, S. Jaffard, and J.L. Journé. A simple Wilson orthonormal basis with exponential decay. SIAM J. Math. Anal., 22:554–572, 1991.
106. I. Daubechies and A.J.E.M. Janssen. Two theorems on lattice expansions. IEEE Trans. Inform. Theory, 39:3–6, 1993.
107. I. Daubechies and J.C. Lagarias. Two-scale difference equations I. Existence and global regularity of solutions. SIAM J. Math. Anal., 22:1388–1410, 1991.
108. I. Daubechies and J.C. Lagarias. Two-scale difference equations II. Local regularity, infinite products of matrices and fractals. SIAM J. Math. Anal., 23:1031–1079, 1992.
109. I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl., 4:247–269, 1998.
110. G. Davis, V. Strela, and R. Turcajova. Multiwavelet construction via the lifting scheme. In T.-X. He, editor, Wavelet Analysis and Multiresolution Methods, pages 57–79. Marcel Dekker, New York, 2000.
111. N.G. DeBruijn. Uncertainty principles in Fourier analysis. In O. Shisha, editor, Inequalities, pages 55–71. Academic Press, New York, 1967.
112. A. Dembo, T. Cover, and J. Thomas. Information theoretic inequalities. IEEE Trans. Inform. Theory, 37:1501–1518, 1991.
113. R. DeVore, B. Jawerth, and V. Popov. Compression of wavelet decompositions. Amer. J. Math., 114:737–785, 1992.
114. I. Djokovic and P.P. Vaidyanathan. Generalized sampling theorems in multiresolution subspaces. IEEE Trans. Signal Process., 45:583–599, 1997.
115. D. Donoho. Unconditional bases are optimal bases for data compression and statistical estimation. Appl. Comput. Harmon. Anal., 1:100–115, 1993.
116. D. Donoho, N. Dyn, D. Levin, and T. Yu. Smooth multiwavelet duals of Alpert bases by moment-interpolating refinement. Appl. Comput. Harmon. Anal., 9:166–203, 2000.
117. D. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via $l^1$ minimization. Proc. Natl. Acad. Sci. USA, 100:2197–2202, 2003.
118. D. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory, 47:2845–2862, 2001.
119. D. Donoho and P. Stark. Uncertainty principles and signal recovery. SIAM J. Appl. Math., 49:906–931, 1989.
120. D. Donoho and P. Stark. A note on rearrangements and spectral concentration. IEEE Trans. Inform. Theory, 39:257–260, 1993.
121. D. Donoho, M. Vetterli, R. DeVore, and I. Daubechies. Data compression and harmonic analysis. IEEE Trans. Inform. Theory, 44:2436–2476, 1998.
122. G. Donovan, J. Geronimo, and D. Hardin. Squeezable orthogonal bases: accuracy and smoothness. SIAM J. Numer. Anal., 40:1077–1099, 2002.
123. G. Donovan, J. Geronimo, D. Hardin, and P. Massopust. Construction of orthogonal wavelets using fractal functions. SIAM J. Math. Anal., 27:1158–1192, 1996.
124. I. Dreier, W. Ehm, T. Gneiting, and D. Richards. Improved bounds for Laue’s constant and multivariate extensions. Math. Nachr., 228:109–122, 2001.
125. R.J. Duffin and A.C. Schaeffer. A class of nonharmonic Fourier series. Trans. Amer. Math. Soc., 72:341–366, 1952.
126. Dutt and Rokhlin. Fast Fourier transforms for nonequispaced data. SIAM J. Sci. Comput., 14:1368–1393, 1993.
127. H. Dym and H.P. McKean. Fourier Series and Integrals. Academic Press, San Diego, 1972.
128. N. Dyn. Subdivision schemes in computer aided graphic design. In Advances in Numerical Analysis II, Wavelets Subdivision Algorithms and Radial Basis Functions, pages 36–104. Clarendon Press, Oxford, 1992.
129. N. Dyn and D. Levin. Analysis of Hermite-interpolatory subdivision schemes. In S. Dubuc, editor, Spline Functions and the Theory of Wavelets, pages 105–113, Providence, RI, 1998. AMS.
130. F. Dyson and A. Lenard. Stability of matter, I. J. Math. Phys., 8:423–434, 1967.
131. W. Ehm, T. Gneiting, and D. Richards. On the uncertainty relation for positive definite functions, II. Statistics, 33:267–286, 1999.
132. D. Esteban and C. Galand. Application of quadrature mirror filters to split-band voice coding schemes. In Proc. IEEE Int. Conf. Acoust. Signal Speech Process., pages 191–195, 1977.
133. E. Fabes, C. Kenig, and R. Serapioni. The local regularity of solutions of degenerate elliptic equations. Studia Math., 51:241–250, 1974.
134. P. Federbush. Navier and Stokes meet the wavelet. Comm. Math. Phys., 155:219–248, 1993.
135. C. Fefferman. Pointwise convergence of Fourier series. Ann. Math., 98:551–571, 1973.
136. C. Fefferman. The uncertainty principle. Bull. Amer. Math. Soc. (NS), 9:129–206, 1983.
137. C. Fefferman and R. de la Llave. Relativistic stability of matter, I. Rev. Math. Iberoamericana, 2:119–215, 1985.
138. C. Fefferman and L. Seco. On the energy of a large atom. Bull. Amer. Math. Soc. (NS), 23:525–530, 1990.
139. H.G. Feichtinger and K. Gröchenig. Gabor wavelets and the Heisenberg group: Gabor expansions and the short time Fourier transform from the group theoretical point of view. In C.K. Chui, editor, Wavelets: A Tutorial in Theory and Applications, pages 359–398. Academic Press, Boston, 1992.
140. H.G. Feichtinger, K. Gröchenig, and T. Strohmer. Efficient numerical methods in nonuniform sampling theory. Numer. Math., 69:423–440, 1995.
141. E. Fermi. Un metodo statistico per la determinazione di alcune priorieta dell’atomo. Atti. Acad. Naz. Lincei., Rend., 6:602–607, 1927.
142. P. Flandrin. Inequalities in Mellin–Fourier signal analysis. In L. Debnath, editor, Wavelet Transforms and Time–Frequency Analysis, pages 289–319. Birkhäuser, Boston, 2001.
143. H. Fletcher. Physical measurements of audition and their bearing on the theory of hearing. J. Franklin Inst., 196:289–326, 1923.
144. G.B. Folland. Harmonic Analysis in Phase Space. Princeton University Press, Princeton, 1989.
145. G.B. Folland and A. Sitaram. The uncertainty principle: a mathematical survey. J. Fourier Anal. Appl., 3:207–238, 1997.
146. M. Frazier and B. Jawerth. Decomposition of Besov space. Indiana Univ. Math. J., 34:777–799, 1985.
147. M. Frazier and B. Jawerth. A discrete transform and decompositions of distribution spaces. J. Funct. Anal., 93:34–170, 1990.
148. M. Frazier and B. Jawerth. Applications of the ϕ and wavelet transforms to the theory of function spaces. In M.B. Ruskai, G. Beylkin, and R. Coifman et al., editors, Wavelets and Their Applications, pages 377–417. Jones and Bartlett, Boston, 1992.
149. M. Frazier, B. Jawerth, and G. Weiss. Littlewood–Paley Theory and the Study of Function Spaces. AMS, Providence, RI, 1991.
150. F.P. Gantmacher and M.G. Krein. Oscillation Matrices and Kernels and Small Vibrations of Mechanical Systems. AMS Chelsea Publishing, Providence, RI, 2002.
151. J. García-Cuerva and J.L. Rubio de Francia. Weighted Norm Inequalities and Related Topics. North–Holland, Amsterdam, 1985.
152. J. Gasch and J.E. Gilbert. Triangularizations of Hankel operators and the bilinear Hilbert transform. In The Functional and Harmonic Analysis of Wavelets and Frames (San Antonio, TX, 1999), pages 235–248. AMS, Providence, RI, 1999.
153. I.M. Gelfand and G.E. Shilov. Generalized Functions. Academic Press, New York, 1968.
154. R.W. Gerchberg. Super-resolution through error reduction. Optica Acta, 21:709–720, 1974.
155. J. Geronimo, D. Hardin, and P. Massopust. Fractal interpolation functions and wavelet expansions based on several scaling functions. J. Approx. Theory, 78:373–401, 1994.
156. A.C. Gilbert, S. Muthukrishnan, and M. Strauss. Improved time bounds for near-optimal sparse Fourier representations. preprint.
157. A.C. Gilbert, S. Muthukrishnan, and M. Strauss. Approximation of functions over redundant dictionaries using coherence. In Proceedings of the Fourteenth Annual ACM–SIAM Symposium on Discrete Algorithms (Baltimore, MD, 2003), pages 243–252, New York, 2003. ACM.
158. J.E. Gilbert, Y. Han, J. Hogan, J. Lakey, D. Weiland, and G. Weiss. Smooth molecular decompositions of functions and singular integral operators. Mem. Amer. Math. Soc., 156 (742), 2002.
159. J.E. Gilbert, J. Hogan, and J. Lakey. Wavelet subspaces for sampling and extrapolation. In P.J.S.G. Ferreira, editor, Proc. SampTA–97, pages 273–278, Aveiro, Portugal, 1997. University of Aveiro.
160. J.E. Gilbert and A. Nahmod. Hardy spaces and a Walsh model for bilinear cone operators. Trans. Amer. Math. Soc., 351:3267–3300, 1999.
161. J.E. Gilbert and A. Nahmod. Boundedness of bilinear operators with nonsmooth symbols. Math. Res. Lett., 7:767–778, 2000.
162. K. Gröchenig. Acceleration of the frame algorithm. IEEE Trans. Signal Process., 41:3331–3340, 1993.
163. K. Gröchenig. A discrete theory of irregular sampling. Linear Algebra Appl., 193:129–150, 1993.
164. K. Gröchenig. An uncertainty principle related to the Poisson summation formula. Studia Math., 121:87–104, 1996.
165. K. Gröchenig. Finite and infinite-dimensional models of nonuniform sampling. In P.J.S.G. Ferreira, editor, Proc. SampTA–97, pages 285–290, Aveiro, Portugal, 1997. University of Aveiro.
166. K. Gröchenig. Aspects of Gabor analysis on locally compact Abelian groups. In H. Feichtinger and T. Strohmer, editors, Gabor Analysis and Algorithms: Theory and Applications. Birkhäuser, Boston, 1998.
167. K. Gröchenig. Nonuniform sampling in higher dimensions: from trigonometric polynomials to bandlimited functions. In J. Benedetto and P.J.S.G. Ferreira, editors, Modern Sampling Theory: Mathematics and its Applications, pages 155–171. Birkhäuser, Boston, 2000.
168. K. Gröchenig. Foundations of Time–Frequency Analysis. Birkhäuser, Boston, 2001.
169. K. Gröchenig. Uncertainty principles for time–frequency representations. In Advances in Gabor Analysis, pages 11–30. Birkhäuser, Boston, 2003.
170. K. Gröchenig, D. Han, C. Heil, and G. Kutyniok. The Balian–Low theorem for symplectic lattices in higher dimensions. Appl. Comput. Harmon. Anal., 13:169–176, 2002.
171. K. Gröchenig and G. Zimmerman. Hardy’s theorem and the short-time Fourier transform of Schwartz functions. J. London Math. Soc., 63:2001, 2001.
172. C.S. Güntürk. One-bit sigma–delta quantization with exponential accuracy. Comm. Pure Appl. Math., 56:1608–1630, 2003.
173. C.S. Güntürk. Approximating a bandlimited function using very coarsely quantized data: improved error estimates in sigma–delta modulation. J. Amer. Math. Soc., 17:229–242, 2004.
174. L.A. Hageman and D.M. Young. Applied Iterative Methods. Academic Press, New York, 1981.
175. J.B.S. Haldane. Quantum mechanics as a basis for philosophy. J. Phil. Sci., 1:78–98, 1934.
176. D. Hardin and J. Marasovich. Biorthogonal multiwavelets on [−1, 1]. Appl. Comput. Harmon. Anal., 7:34–53, 1999.
177. G.H. Hardy. A theorem concerning Fourier transforms. J. London Math. Soc., 8:227–231, 1933.
178. G.H. Hardy, J.E. Littlewood, and G. Pólya. Inequalities. Cambridge University Press, Cambridge, 1952.
179. H.F. Harmuth. Transmission of Information by Orthogonal Functions. Springer-Verlag, New York, 1972.
180. V. Havin and B. Jöricke. The Uncertainty Principle in Harmonic Analysis. Springer-Verlag, Berlin, 1994.
181. C. Heil. Wiener Amalgam Spaces in Generalized Harmonic Analysis and Wavelet Theory. PhD thesis, University of Maryland, College Park, 1990.
182. C. Heil and D. Colella. Dilation equations and the smoothness of compactly supported wavelets. In J. Benedetto and M. Frazier, editors, Wavelets: Mathematics and Applications, pages 163–201. CRC Press, Boca Raton, FL, 1994.
183. C. Heil and D. Walnut. Continuous and discrete wavelet transforms. SIAM Review, 31:628–666, 1989.
184. H. Heinig and G. Sinnamon. Fourier inequalities and integral representations of functions in weighted Bergman spaces over tube domains. Ind. Univ. Math. J., 38:603–628, 1989.
185. P. Heller, H. Resnikoff, and R. Wells. Wavelet matrices and the representation of discrete functions. In C.K. Chui, editor, Wavelets: A Tutorial in Theory and Applications, pages 15–50. Academic Press, San Diego, 1992.
186. H.L.F. Helmholtz. Helmholtz’s Popular Scientific Lectures. Dover, New York, 1962.
187. C. Herley, J. Kovačević, K. Ramchandran, and M. Vetterli. Tilings of the time–frequency plane: construction of arbitrary orthogonal bases and fast tiling algorithms. IEEE Trans. Signal Process., 41:3341–3359, 1993.
188. C. Herley and P.-W. Wong. Minimum rate sampling and reconstruction of signals with arbitrary frequency support. IEEE Trans. Inform. Theory, 45:1555–1564, 1999.
189. E. Hernández and G. Weiss. A First Course in Wavelets. CRC Press, Boca Raton, FL, 1996.
190. C. Herz. A note on the span of translates in $L^p$. Proc. Amer. Math. Soc., 8:724–727, 1957.
191. I.I. Hirshman. A note on entropy. Amer. J. Math., 79:152–156, 1957.
192. F. Hlawatsch. Regularity and unitarity of bilinear time–frequency signal representations. IEEE Trans. Inform. Theory, 38:82–94, 1992.
193. J. Hogan. A qualitative uncertainty principle for locally compact Abelian groups. Proc. Centre Math. Anal. Austral. Nat. Univ., 16:133–142, 1988.
194. J. Hogan. Fourier Uncertainty on Groups. PhD thesis, University of New South Wales, 1991.
195. J. Hogan. A qualitative uncertainty principle for unimodular groups of type I. Trans. Amer. Math. Soc, 340:587–594, 1993.
196. J. Hogan and J. Lakey. Embeddings and uncertainty principles for generalized modulation spaces. In J. Benedetto and P.J.S.G. Ferreira, editors, Modern Sampling Theory: Mathematics and Its Applications, pages 73–105. Birkhäuser, Boston, 2000.
197. J. Hogan and J. Lakey. Sampling for shift-invariant and wavelet subspaces. In A. Aldroubi, A. Laine, and M. Unser, editors, Wavelet Applications in Signal and Image Processing VIII, volume 4119 of Proc. SPIE, pages 36–47, 2000.
198. J. Hogan and J. Lakey. Sampling and aliasing without translation-invariance. In A. Zayed, editor, Proc. SampTA–01, pages 61–66, Orlando, 2001. University of Central Florida.
199. J. Hogan and J. Lakey. Hardy’s theorem and rotations. preprint, 2003.
200. J. Hogan and J. Lakey. Periodic nonuniform sampling in shift-invariant spaces. In C. Heil, editor, Harmonic Analysis and Applications. Birkhäuser, Boston, 2004. to appear.
201. J. Hogan and J. Lakey. Sampling and oversampling in shift-invariant and multiresolution spaces I: validation of sampling schemes. Int. J. Wavelets Multiresolut. Inf. Process., 2004. to appear.
202. L. Hörmander. The Analysis of Linear Partial Differential Operators, I. Springer-Verlag, New York, 1983.
203. L. Hörmander. A uniqueness theorem of Beurling for Fourier transform pairs. Ark. Math., 29:237–240, 1991.
204. C.-C. Hsiao, B. Jawerth, B. Lucier, and X. Yu. Near optimal compression of orthonormal wavelet expansions. In J. Benedetto and M. Frazier, editors, Wavelets: Mathematics and Applications. CRC Press, Boca Raton, FL, 1994.
205. P.J. Huber. Projection pursuit. Ann. Stat., 13:435–475, 1985.
206. A.J. Hudspeth and D.P. Corey. Sensitivity, polarity and conductance change in the response of vertebrate hair cells to controlled mechanical stimuli. Proc. Natl. Acad. Sci. USA, 74:2407–2411, 1977.
207. N. Jacobson. Basic Algebra. W.H. Freeman, New York, 1985.
208. S. Jaffard. A density criterion for frames of complex exponentials. Michigan Math. J., 38:339–348, 1991.
209. P. Jaming. Principe d’incertitude qualitatif et reconstruction de phase pour la transformée de Wigner. C. R. Acad. Sci. Paris Sér I Math., 327:249–254, 1998.
210. S. Janson. On interpolation of multilinear operators. In Function Spaces and Applications (Lund, 1986), pages 290–302. Springer-Verlag, Berlin, 1988.
211. A.J.E.M. Janssen. The Zak transform: a signal transform for sampled time-continuous signals. Philips J. Res., 39:23–69, 1989.
212. A.J.E.M. Janssen. The Zak transform and sampling theorems for wavelet subspaces. IEEE Trans. Signal Process., 41:3360–3364, 1993.
213. A.J.E.M. Janssen. Signal analytic proofs of two basic results on lattice expansions. Appl. Comput. Harmon. Anal., 1:350–354, 1994.
214. A.J.E.M. Janssen. Proof of a conjecture on the supports of Wigner distributions. J. Fourier. Anal. Appl., 4:723–726, 1998.
215. A.J.E.M. Janssen and S.J.L. Van Eijndhoven. Spaces of type W, growth of Hermite coefficients, Wigner distribution and Bargmann transform. J. Math. Anal. Appl., 152:368–390, 1990.
216. B. Jawerth and W. Sweldens. Biorthogonal smooth trigonometric bases. J. Fourier Anal. Appl., 2:109–133, 1995.
217. M. Jodeit and A. Torchinsky. Inequalities for the Fourier transform. Studia Math., 37:245–276, 1971.
218. W. Jurkat and G. Sampson. On rearrangement and weight inequalities for the Fourier transform. Indiana Univ. Math. J., 33:257–270, 1984.
219. J.-P. Kahane and P.-G. Lemarié-Rieusset. Remarques sur la formule sommatoire de Poisson. Studia Math., 109:303–316, 1994.
220. J.-P. Kahane and R. Salem. Ensembles Parfaits et Séries Trigonométriques. Hermann, Paris, 1963.
221. P.P. Kargaev. The Fourier transform of the characteristic function of a set, vanishing on an interval. Mat. Sb. (N.S.), 117 (159):397–411, 432, 1982. Russian.
222. S. Karlin and W. J. Studden. Tchebycheff Systems: With Applications in Analysis and Statistics. John Wiley, New York, 1966.
223. N. Katz and N. Pavlović. Finite time blow-up for a dyadic model of the Euler equations. Trans. Amer. Math. Soc., 2004. to appear.
224. Y. Katznelson. Une remarque concernant la formule de Poisson. Studia Math., 19:107–108, 1967.
225. Y. Katznelson. An Introduction to Harmonic Analysis. John Wiley, New York, 1968. Dover reprint, 1974.
226. J. Kautsky and R. Turcajova. Pollen product factorization and construction of higher multiplicity wavelets. Linear Algebra Appl., 222:241–260, 1995.
227. C. Kenig. Restriction theorems, Carleman estimates, uniform Sobolev inequalities and unique continuation. In Harmonic Analysis and Partial Differential Equations (El Escorial, 1987), pages 69–90. Springer-Verlag, New York, 1989.
228. C. Kenig and N. Nadirashvili. A counterexample in unique continuation. Math. Res. Lett., 7:625–630, 2000.
229. C. Kenig and E. Stein. Multilinear estimates and fractional integration. Math. Res. Lett., 6:1–15, 1999.
230. R. Kerman and E. Sawyer. The trace inequality and eigenvalue estimates for Schrödinger operators. Ann. Inst. Fourier (Grenoble), 36:207–228, 1986.
231. P. Koosis. The Logarithmic Integral, I. Cambridge University Press, Cambridge, 1988.
232. P. Koosis. The Logarithmic Integral, II. Cambridge University Press, Cambridge, 1992.
233. T. Körner. Divergence of decreasing rearranged Fourier series. Ann. of Math., 144:167–180, 1996.
234. T. Körner. Decreasing rearranged Fourier series. J. Fourier Anal. Appl., 5:1–19, 1999.
235. O. Kovrijkine. Some results related to the Logvinenko–Sereda theorem. Proc. Amer. Math. Soc., 129:3037–3047, 2001.
236. H.A. Kramers. Wellenmechanik und habzahige quantisierung. Zeit. Phys., 39:828, 1926.
237. E. Kreyszig. Introductory Functional Analysis with Applications. John Wiley, New York, 1978.
238. G. Kutyniok. A qualitative uncertainty principle for functions generating a Gabor frame on LCA groups. J. Math. Anal. Appl., 279:580–596, 2003.
239. M. Lacey and C. Thiele. On Calderón’s conjecture. Ann. Math., 149:475–496, 1999.
240. E. Laeng. Une base orthonormal de $L^2(\mathbb{R})$ dont les éléments sont bien localisés dans l’espace de phase et leurs supports adaptés à toute partition symétrique de l’espace des fréquences. C. R. Acad. Sci. Paris, Sér. 2, 311:667–680, 1990.
241. E. Laeng. Power series, uncertainty principle inequalities and real zeros of Fourier transforms. Expositiones Mathematicae, 14:171–179, 1996.
242. E. Laeng and C. Morpurgo. An uncertainty principle inequality involving $L^1$ norms. Proc. Amer. Math. Soc., 127:3565–3572, 1999.
243. J. Lakey. Trace inequalities, maximal inequalities and weighted Fourier transform estimates. J. Fourier Anal. Appl., 1:202–232, 1994.
244. J. Lakey. Weighted Fourier transform inequalities via mixed-norm Hausdorff–Young inequalities. Can. J. Math., 46:586–601, 1994.
245. J. Lakey, S. Obeidat, and M.C. Pereyra. Multiwavelet characterization of function spaces adapted to the Navier–Stokes equations. In M. Unser, A. Aldroubi, and A. Laine, editors, Wavelet Applications in Signal and Image Processing VIII, volume 4119 of Proc. SPIE, pages 372–383, 2000.
246. J. Lakey and M.C. Pereyra. Divergence-free multiwavelets on rectangular domains. In T.-X. He, editor, Wavelet Analysis and Multiresolution Methods, pages 203–240. Marcel Dekker, New York, 2000.
247. H.J. Landau. Necessary density conditions for sampling and interpolation of certain entire functions. Acta Math., 117:37–52, 1967.
248. H.J. Landau. On the density of phase-space expansions. IEEE Trans. Inform. Theory, 39:1152–1156, 1993.
249. H.J. Landau and H.O. Pollak. Prolate spheroidal wavefunctions, Fourier analysis and uncertainty. II. Bell Syst. Tech. Jour., 40:65–84, 1961.
250. H.J. Landau and H.O. Pollak. Prolate spheroidal wavefunctions, Fourier analysis and uncertainty. III. The dimension of the space of essentially time- and band-limited signals. Bell Syst. Tech. Jour., 41:1295–1336, 1962.
251. W. Lawton. Tight frames of compactly supported wavelets. J. Math. Phys., 31:1898–1901, 1990.
252. W. Lawton. Necessary and sufficient conditions for constructing orthonormal wavelet bases. J. Math. Phys., 32:57–61, 1991.
253. P. Lax. Functional Analysis. John Wiley, New York, 2002.
254. J. Lei. Approximation by Multi-Integer Translates of Functions Having Global Support. PhD thesis, University of Oregon, 1991.
255. P.G. Lemarié and Y. Meyer. Ondelettes et bases hilbertiennes. Rev. Mat. Iberoamericana, 2:1–18, 1986.
256. P. G. Lemarié-Rieusset. Analyses multi-résolutions non orthogonales, commutation entre projecteurs et derivation et ondelettes vecteurs à divergence nulle. Rev. Mat. Iberoamericana, 8:222–237, 1992.
257. N. Levinson. Gap and Density Theorems. AMS, New York, 1940.
258. E. Lieb. The stability of matter. Rev. Mod. Phys., 48:553–569, 1976.
259. E. Lieb. Sharp constants in the Hardy–Littlewood–Sobolev and related inequalities. Ann. Math., 118:349–374, 1983.
260. E. Lieb. Integral bounds for radar ambiguity functions and Wigner distributions. J. Math. Phys., 31:594–599, 1990.
261. E. Lieb. The stability of matter: from atoms to stars. Bull. Amer. Math. Soc., 22:1–49, 1990.
262. J. Liouville. Troisiéme mémoire sur le développement des fonctions ou parties de fonctions en séries dont les divers termes sont assujettis à satisfaire à une même équation différentielle du second ordre, contenant un paramêtre variable. J. de Math., Sér. I, 2:418–436, 1837.
263. Q.H. Liu and N. Nguyen. An accurate algorithm for nonuniform fast Fourier transforms (NUFFTs). IEEE Micro. Guided Lett., 8:18–20, 1998.
264. F. Low. Complete sets of wave-packets. In C. DeTar, J. Finkelstein, and C.-I. Tan, editors, A Passion for Physics: Essays in Honor of Geoffrey Chew, pages 17–22. World Scientific, Singapore, 1985.
265. G. Mackey. Hermann Weyl and the application of group theory to quantum mechanics. In The Scope and History of Commutative and Noncommutative Harmonic Analysis. AMS, Providence, RI, 1992.
266. W.R. Madych. Some elementary properties of multiresolution. In C.K. Chui, editor, Wavelets: A Tutorial in Theory and Applications, pages 259–294. Academic Press, San Diego, 1992.
267. S. Mallat. Multiresolution approximations and wavelet orthonormal bases of $L^2(\mathbb{R})$. Trans. Amer. Math. Soc., 315:69–87, 1989.
268. S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, San Diego, 2nd edition, 1999.
269. H. Malvar. Lapped transforms for efficient transform/subband coding. IEEE Trans. Acoust. Speech Signal Process., 38:969–978, 1990.
270. T. Matolcsi and J. Szücs. Intersection des mesures spectrales conjugées. C. R. Acad. Sci. Paris Sér. A-B, 277:A841–A843, 1973.
271. V. G. Maz’ja. Sobolev Spaces. Springer-Verlag, New York, 1985.
272. E. Merzbacher. Quantum Mechanics. John Wiley, New York, 1961.
273. Y. Meyer. Principe d’incertitude, bases hilbertiennes et algèbres d’opérateurs. Seminaire Bourbaki, 662:209–223, 1985–1986.
274. Y. Meyer. Ondelettes et Opérateurs. I. Ondelettes. Hermann, Paris, 1990.
275. Y. Meyer. Wavelets and Operators. In I. Daubechies, editor, Different Perspectives on Wavelets, pages 35–58. AMS, Providence, RI, 1993.
276. G.W. Morgan. A note on Fourier transforms. J. London Math. Soc., 9:187–192, 1934.
277. I. Morishita and A. Yajima. Analysis and simulation of networks of mutually inhibiting neurons. Kybernetic, 11:154–165, 1972.
278. J. Morlet. Sampling theory and wave propagation. In NATO ASI Series, Vol I, Issues in Acoustic Signal/Image Processing and Recognition, pages 233–261, Berlin, 1983. Springer-Verlag.
279. J. Morlet, G. Arens, I. Fourgeau, and D. Giard. Wave propagation and sampling theory. Geophysics, 47:203–236, 1982.
280. B. Muckenhoupt. A note on two weight function conditions for a Fourier transform norm inequality. Proc. Amer. Math. Soc., 88:97–100, 1983.
281. B. Muckenhoupt. Weighted norm inequalities for the Fourier transform. Trans. Amer. Math. Soc., 276:739–742, 1983.
282. B. Muckenhoupt and R.L. Wheeden. Weighted norm inequalities for fractional integrals. Trans. Amer. Math. Soc., 192:251–275, 1974.
283. C. Muscalu, T. Tao, and C. Thiele. $L^p$ estimates for the biest II. The Fourier case. preprint.
284. C. Muscalu, T. Tao, and C. Thiele. On the bi-Carleson operator I. The Walsh case. preprint.
285. C. Muscalu, T. Tao, and C. Thiele. Multilinear operators given by singular multipliers. J. Amer. Math. Soc., 15:469–496, 2002.
286. C. Muscalu, T. Tao, and C. Thiele. A counterexample to a multilinear endpoint question of Christ and Kiselev. Math. Res. Lett., 10:237–246, 2003.
287. D. Mustard. Uncertainty principles invariant under the fractional Fourier transform. J. Austral. Math. Soc. Ser. B, 33:180–191, 1991.
288. V. Namias. The fractional order Fourier transform and its applications to quantum mechanics. J. Inst. Math. Appl., 25:241–265, 1980.
289. D.J. Newman. The closure of translates in $l^p$. Amer. J. Math., 86:651–667, 1964.
290. N. Nguyen and Q.H. Liu. The regular Fourier matrices and nonuniform fast Fourier transforms. SIAM J. Sci. Comput., 21:283–293, 1999.
291. S. Obeidat. Wavelet Techniques for the Navier-Stokes Equations. PhD thesis, New Mexico State University, 2002.
292. M. Papadakis, H. Sikić, and G. Weiss. The characterization of low pass filters and some basic properties of wavelets, scaling functions and related concepts. J. Fourier Anal. Appl., 5:495–521, 1999.
293. A. Papoulis. A new algorithm in spectral analysis and bandlimited extrapolation. IEEE Trans. Circuits and Systems, 22:735–742, 1975.
294. A. Papoulis. Signal Analysis. McGraw–Hill, New York, 1977.
295. J. Peetre. New Thoughts on Besov Spaces. Duke University Mathematics Series. Duke University, Raleigh–Durham, 1975.
296. R. Penrose. The Emperor’s New Mind. Oxford University Press, Oxford, 1989.
297. C. Perez and R. Wheeden. Uncertainty principle estimates for vector fields. J. Funct. Anal., 181:146–188, 2001.
298. H.R. Pitt. Theorems on Fourier series and power series. Duke Math. J., 3:747–755, 1937.
299. D. Pollen. $SU_I(2, F[z, 1/z])$ for $F$ a subfield of $\mathbb{C}$. J. Amer. Math. Soc., 3:611–624, 1990.
300. J.F. Price. Sharp local uncertainty inequalities. Studia Math., 85:37–45, 1987.
301. T. Przebinda, V. DeBrunner, and M. Özaydin. The optimal transform for the Hirschman uncertainty principle. IEEE Trans. Inform. Theory, 47:2086–2090, 2001.
302. J. Ramanathan and T. Steger. Incompleteness of sparse coherent states. Appl. Comput. Harmon. Anal., 2:148–153, 1995.
303. M. Reed and B. Simon. Methods of Modern Mathematical Physics I: Functional Analysis. Academic Press, New York, 1980.
304. M.A. Rieffel. Von Neumann algebras associated with pairs of lattices in Lie groups. Math. Ann., 257:403–418, 1981.
305. H.L. Resnikoff, J. Tian, and R.O. Wells. Biorthogonal wavelet space: parametrization and factorization. SIAM J. Math. Anal., 33:194–215, 2001.
306. R. Rochberg. NWO sequences, weighted potential operators, and Schrödinger eigenvalues. Duke Math. J., 72:187–215, 1993.
307. R. Rochberg. The use of decomposition theorems in the study of operators. In J. Benedetto and M. Frazier, editors, Wavelets: Mathematics and Applications, pages 547–570. CRC Press, Boca Raton, FL, 1994.
308. R. Rochberg. Size estimates for eigenvalues of singular integral operators and Schrödinger operators and for derivatives of quasiconformal mappings. Amer. J. Math., 117:711–771, 1995.
309. R. Rochberg and S. Semmes. A decomposition theorem for BMO and applications. J. Funct. Anal., 67:228–263, 1986.
310. R. Rochberg and S. Semmes. Nearly weakly orthogonal sequences, singular value estimates and Calderón–Zygmund operators. J. Funct. Anal., 86:237–306, 1989.
311. R. Rochberg and K. Tachizawa. Pseudodifferential operators, Gabor frames, and local trigonometric bases. In H.G. Feichtinger and T. Strohmer, editors, Gabor Analysis and Algorithms: Theory and Applications, pages 171–192. Birkhäuser, Boston, 1998.
312. W. Rudin. Fourier Analysis on Groups. John Wiley, New York, 1962.
313. C. Sadosky and R. Wheeden. Some weighted norm inequalities for the Fourier transform of functions with vanishing moments. Trans. Amer. Math. Soc., 300:521–533, 1987.
314. M. Schechter. Spectra of Partial Differential Operators. North–Holland, Amsterdam, 2nd edition, 1986.
315. H.C. Schweinler and E.P. Wigner. Orthogonalization methods. J. Math. Phys., 11:1693–1694, 1970.
316. K. Seip. On the connection between exponential bases and certain related sequences in $L^2(-\pi, \pi)$. J. Funct. Anal., 130:131–160, 1995.
317. S. Semmes. Nonlinear Fourier analysis. Bull. Amer. Math. Soc. (N.S.), 20:1–18, 1989.
318. S. Shamma. Speech processing in the auditory system, II. Lateral inhibition and the central processing of speech evoked activity in the auditory nerve. J. Acoust. Soc. Amer., 78:1622–1632, 1985.
319. H. S. Shapiro. Functions with a spectral gap. Bull. Amer. Math. Soc., 79:355–360, 1973.
320. B. Simon. The Weyl transform and $L^p$ functions on phase space. Proc. Amer. Math. Soc., 116:1045–1047, 1992.
321. A. Sitaram and M. Sundari. An analogue of Hardy’s theorem for very rapidly decreasing functions on semi-simple Lie groups. Pacific J. Math., 177:187–200, 1997.
322. D. Slepian. Prolate spheroidal wave functions, Fourier analysis and uncertainty IV. Extensions to many dimensions; generalized prolate spheroidal wave functions. Bell Syst. Tech. J., 43:3009–3057, 1964.
323. D. Slepian. On bandwidth. Proc. IEEE, 64:292–300, 1976.
324. D. Slepian and H.O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty. I. Bell Systems Tech. J., 40:43–64, 1961.
325. S. Smale and D.-X. Zhou. Shannon sampling and function reconstruction from point values. Bull. Amer. Math. Soc. (N.S.), 41:279–305, 2004.
326. G. Steidl. A note on fast Fourier transforms for nonequispaced grids. Adv. Comput. Math., 9:337–352, 1998.
327. E.M. Stein. Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton, 1970.
328. E.M. Stein. Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals. Princeton University Press, Princeton, 1993.
329. E.M. Stein and G. Weiss. Introduction to Fourier Analysis on Euclidean Spaces. Princeton University Press, Princeton, 1971.
330. G. Strang and G. Fix. Fourier analysis of the finite-element method in Ritz–Galerkin theory. Studies in Appl. Math., 48:265–273, 1969.
331. G. Strang and G. Fix. An Analysis of the Finite Element Method. Prentice-Hall, Englewood Cliffs, NJ, 1973.
332. G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley–Cambridge Press, Wellesley, MA, 1996.
333. V. Strela. Multiwavelets: Theory and Applications. PhD thesis, MIT, Cambridge, MA, 1996.
334. V. Strela and G. Plonka. Construction of multiscaling functions with approximation and symmetry. SIAM J. Math. Anal., 29:481–510, 1998.
335. R. Strichartz. Fourier asymptotics of fractal measures. J. Funct. Anal., 89:154–157, 1990.
336. J.O. Stromberg. A modified Franklin system and higher order spline systems on $\mathbb{R}^n$ as unconditional bases for Hardy spaces. In W. Beckner, A. Calderón, R. Fefferman, and P. Jones, editors, Conference in Honor of Antoni Zygmund, pages 475–493. Wadsworth, Belmont, CA, 1982.
337. G. Szegö. Orthogonal Polynomials. AMS, Providence, RI, 4th edition, 1975.
338. M. Taibleson. On the theory of Lipschitz spaces of distributions on Euclidean n-space, I-III. J. Math. Mech., 13, 14, 15:407–480, 821–840, 937–981, 1964, 1965, 1966.
339. T. Tao. An uncertainty principle for cyclic groups of prime order. preprint.
340. T. Tao. On almost everywhere convergence of wavelet summation methods. Appl. Comput. Harmon. Anal., 4:384–387, 1996.
341. T. Tao, A. Vargas, and L. Vega. A bilinear approach to the restriction and Kakeya conjectures. J. Amer. Math. Soc., 11:967–1000, 1998.
342. P. Tchamitchian. Biorthogonalité et théorie des opérateurs. Rev. Mat. Iberoamericana, 3:163–189, 1987.
343. V.N. Temlyakov. The best m-term approximation and greedy algorithms. Adv. Comput. Math., 8:249–265, 1998.
344. A. Teolis. Computational Signal Processing with Wavelets. Birkhäuser, Boston, 1998.
345. S. Thangavelu. Harmonic Analysis on the Heisenberg Group. Birkhäuser, Boston, 1998.
346. C. Thiele. The quartile operator and pointwise convergence of Walsh series. Trans. Amer. Math. Soc., 352:5745–5766, 2000.
347. C. Thiele. Time–frequency analysis in the discrete phase plane. In Topics in Analysis and its Applications, pages 99–152. World Scientific, River Edge, NJ, 2000.
348. C. Thiele and L. Villemoes. A fast algorithm for adapted time–frequency tilings. Appl. Comput. Harmon. Anal., 3:91–99, 1996.
349. L.H. Thomas. The calculation of atomic fields. Proc. Cambridge Phil. Soc., 23:542–548, 1927.
350. J. Tropp. Recovery of short, complex linear combinations via $\ell^1$ minimization. preprint, 2004.
351. M. Tygert, 2003. personal communication.
352. M. Unser and A. Aldroubi. Polynomial splines and wavelets: a signal processing perspective. In C.K. Chui, editor, Wavelets: A Tutorial in Theory and Applications, pages 91–122. Academic Press, San Diego, 1992.
353. M. Vetterli and C. Herley. Wavelets and filter banks: theory and design. IEEE Trans. Acoust. Speech and Signal Process., 40:2207–2232, 1992.
354. J. von Neumann. Mathematical Foundations of Quantum Mechanics. Princeton University Press, Princeton, 1955. Original German 1932.
355. D.F. Walnut. Continuity properties of the Gabor frame operator. J. Math. Anal. Appl., 165:479–504, 1992.
356. D.F. Walnut. An Introduction to Wavelet Analysis. Birkhäuser, Boston, 2002.
357. J.L. Walsh. A closed set of orthogonal functions. Amer. J. Math., 45:5–24, 1923.
358. G. Walter. A sampling theorem for wavelet subspaces. IEEE Trans. Inform. Theory, 38:881–884, 1992.
359. G. Walter and X. Shen. Sampling with prolate spheroidal wave functions. Sampl. Theory Signal Image Process., 2:25–52, 2003.
360. E.W. Weisstein. Legendre-Gauss quadrature. MathWorld–A Wolfram Web Resource, http://mathworld.wolfram.com/Legendre-GaussQuadrature.html.
361. R.O. Wells. Parametrizing smooth compactly supported wavelets. Trans. Amer. Math. Soc., 228:919–931, 1993.
362. G. Wentzel. Eine Verallgemeinerung der Quantenbedingungen für die Zwecke der Wellenmechanik. Zeit. Phys., 38:518, 1926.
363. J. Wexler and S. Raz. Discrete Gabor expansions. Signal Proc., 21:207–221, 1990.
364. M.V. Wickerhauser. Adapted Wavelet Analysis from Theory to Software. A.K. Peters, Wellesley, MA, 1994.
365. N. Wiener. I Am a Mathematician: The Later Life of a Prodigy. MIT Press, Cambridge, MA, 1956.
366. E. Wilczok. Zur Funktionalanalysis der Wavelet und Gabortransformation. PhD thesis, TU München, 1998.
367. The WUTAM Consortium. Basic properties of wavelets. J. Fourier Anal. Appl., 4:575–594, 1998.
368. X.-G. Xia and Z. Zhang. On sampling theorem, wavelets, and wavelet transforms. IEEE Trans. Signal Process., 41:3524–3535, 1993.
369. H. Xiao, V. Rokhlin, and N. Yarvin. Prolate spheroidal wavefunctions, quadrature and interpolation. Inverse Problems, 17:805–838, 2001.
370. X. Yang, K. Wang, and S. Shamma. Auditory representations of acoustic signals. IEEE Trans. Inform. Theory, 38:824–839, 1992.
371. J.L. Yen. On the nonuniform sampling of bandwidth-limited signals. IRE Trans. Circuit Theory, 3:251–257, 1956.
372. R.M. Young. An Introduction to Nonharmonic Fourier Series. Academic Press, San Diego, 2001.
373. G. Zweig, R. Lipes, and J.R. Pierce. The cochlear compromise. J. Acoust. Soc. Am., 59:975–982, 1976.
374. A. Zygmund. Trigonometric Series. Vol. I, II. Cambridge University Press, Cambridge, 1977.
Index
$A_\infty$, see $A_p$-weight
$A_p$-weight, 240, 301, 352, 362
$I(j, k)$, see dyadic interval
$L^p_a$, 45, 230
$L^p_v$, 221
M1, 230, 242
$Q(j, k)$, see dyadic cube
ΩT-theorem, 120
δ-oscillation $\omega_\delta(f)$, 146
$D$, $D_j$, see dyadic interval
τ-cycle condition, 5, 152
z-transform, 25
$Q$, $Q_j$, see dyadic cube
aliasing, 153
almost everywhere convergence, 282
analysis operator, 91, 134
approximation
  nonlinear, 251–254
  numbers, 253
  space $A^s_q(L^p)$, 253
  tree, 254, 255
bad cube, 298, 301
Balian–Low theorem, 136, 164, 242
bandlimited, 238
  locally, 173
bell, 269, 273
  over I, 170
Besov space, 247–257, 261
  $B^{s,q}_p$, 247
  $b^{s,q}_p$, 248, 262
  and BV, 251, 322
  homogeneous, 263
best basis
  algorithm, 185
  for nonlinear approximation, 252
Beurling density, 139
biorthogonality conditions, 45, 60, 67
bounded variation, 252, 316
Calderón–Zygmund
  decomposition, 254, 305
  operator, 268
cardinal, 3, 101
  scaling function, 12
  sine, 12, 111
Carleson
  class, 275, 276
  envelope, 277
  estimate, 339
  operator, 332
    Walsh model, 334, 335
  sequence, 314
cascade algorithm, 11
Cauchy integral, 265
Chebyshev
  polynomials, 95
  system, 121
commutator, 292, 365
compression, 249
  of operators, 264
connection coefficients, 43
DFT, 101, 180, 199
difference operator
  backward, 44
  forward divided, 84
dilation equation, see scaling
Dirichlet kernel, 231
distortion, 258, 261
Donoho’s heuristic, 248
dyadic cube, Q, 297, 360
dyadic interval, D, 360
encoder, 257
entropy, 201, 208
  Hirschman’s inequality, 209, 211
  Kolmogorov, 257
Euclidean algorithm, 37
exclusion principle, 307
fast Fourier transform
  Dutt-Rokhlin, 105
  FFT, 102
  nonuniform data, 102
  NUFFT, 102
  output, 102
Fefferman–Phong
  bump condition, 298, 353
  weight, 298, 313, 355
Feichtinger algebra, see M1
Fermion, 307
filter
  complementary, 38
  dual, 14
  finite impulse response, 10
  FIR, 10
  prefilter, 13
  primal, 14
  quadrature mirror, 4
  scaling, 3
    low-pass, 176
folding operator, 164
forest, 303, 341
  operator, 341
Fourier transform, 3, 360
  fractional, 216
frame, 91
  acceleration, 94
  adaptive, 109
  bounds, 91
  condition number, 93
  conjugate gradient acceleration, 94
  dual, 92
  operator, 91
  radius, 112
  relaxation parameter, 93
  tight, 93
Gabor
  function, 269
  system, 133
  transform, 219
Gaussian
  measure, 203, 204
  quadrature, 122
  window, 164
generator
  biorthogonal, 14
  dual, 14
  orthogonal, 3
grand challenge, 248
ground-state energy, 356
Hausdorff–Young inequality, 201–205, 214, 222, 223, 234
  for amalgam spaces, 242
Heisenberg
  box, 132, 170
  group, 202, 226, 239
  inequality, 213
  spread, 218
Hermite
  function, 203
  interpolant, 80
  multiplier, 203
  polynomial, 203
Hilbert transform, 333
  bilinear, 333, 356
  Walsh model, 335
Hilbert–Schmidt operator, 194, 366
Hölder continuous, 4, 11
interpolation
  fractal function, 58, 59
  fractal vectors, 58
  function, 101, 147
  Hermite, 77
  Lagrange polynomials, 55
  Marcinkiewicz, 363
  moment, 78
  Riesz–Thorin, 201, 363
intertwining, 44
inverse commutation, 84
lapped orthogonal transforms, 174
LCA group, 186, 200
Legendre polynomials, 54
lifting, 39
Littlewood–Paley theory, 247
local trigonometric basis, 173
matrix
  almost diagonal, 47
  change of wavelet, 46
  Vandermonde, 101, 200
maximal
  function, 361
    dyadic, 301, 362
    Hardy–Littlewood, 339
    nontangential, 274
    sequential, 275
  operator
    associated to multilinear operator, 328
    fractional, 351
  theorem, 362
Mehler’s formula, 203, 216
minimax, 248, 261
modulation space, 252, 283
momentum operator, 191
Morgan’s theorem, 197, 239
Morrey space, 298, 353
Moyal’s formula, 283
multiresolution analysis (MRA), 2
  biorthogonal, 13
  frame, 31
Navier–Stokes equations, 50
Nazarov’s theorem, 196
nearly weakly orthogonal, 274
Neumann series, 92
normalized area, 187
NWO, see nearly weakly orthogonal
operator
  adjoint, 360
  self-adjoint, 288
  unbounded, 289, 365
oversampling, 149
Paley–Wiener theorem, 193
Papoulis–Gerchberg algorithm, 195
paraconjugate, 26
paraunitary, 26, 33, 35
periodization, 9
phase
  cell, 186
  plane
    of G, 186
    Walsh model, 181, 334
phase space, 132
Poincaré inequality, 363
Poisson summation formula, see PSF, 140
polyphase, 33
principal shift-invariant space
  orthogonal generator, 144
  PSI, 144
prolate spheroidal wavefunctions (PSWFs), 115
pseudodifferential operator, 269
  Weyl correspondence, 270, 282
PSF, 169, 193, 225–234, 239
  counterexamples, 228
quadrature
  Gauss–Legendre, 54
quadrature mirror filter (QMF), 4, 27
  construction, 23
quantization
  oversampled, 160
  sigma–delta, 160
rearrangement, 209, 212, 222, 233, 240, 250, 264
refinement, see scaling
  equation, 16
  method, 16
  operator, 16
  vector, 54
Riesz basis, 2, 167
Rodrigues’ formula, 54
sampling
  critical, 149
  offset, 148
  periodic nonuniform (PNS), 149
  theorem, 110
    local, 175
scaling
  equation, 3
  filter, 3
  function, 3, 25
    cardinal, 12, 29
    Haar, 29, 32
    interpolating, 55
    Shannon, 12
  multiscaling, 60
  operator, 235
  vector, 56
Schrödinger
  equation, 288, 289, 325
  operator, 288, 324
  representation, 202, 364
  semigroup, 289
Schur’s lemma, 263, 265, 282
sequency, 181
shift-invariant, 6
short-time Fourier transform, 132, 214–221
singular value, 271
  approximate, 277
  decomposition, 248
Sobolev inequality, 296, 353
  endpoint, 322
  Hardy–Littlewood–Sobolev, 209
  logarithmic, 208, 209, 211
Sobolev space, 15, 45, 61, 247
spectrum
  absolutely continuous, 325
  spectral theorem, 288
spline, 14, 44
  B-spline, 14
square function
  sequential, 281
  wavelet, 254, 281, 300
stability of matter, 308
subband coding, 7
subdivision scheme, 82
  uniformly biorthogonal, 86
synthesis operator, 91
tile, 178, 188
  minimal, 184
  quartile, 336–344
time–frequency
  distribution, 214
  space, 132
transition operators, 78, 83
tree, 340
  bad, 303
  operator, 341
  top, 341
two-scale transform (TST), 64
uncertainty principle, 191, 292
  Donoho-Stark, 138
  Heisenberg, 132
unconditional basis, 247, 248
vaguelette, 47, 274
volume counting, 297
Walsh packet, 181, 335
wavefunction, 287
wavelet
  auditory model (WAM), 346
  bandlimited, 45, 188
  crime, 12
  Haar, 265
  Legendre, 56
  Lemarié–Meyer, 45
  nonstandard representation, 42, 268
  packet, 177
  transform, 349
    fast (FWT), 9
weak-$L^p$, 250, 333, 361
Weil transform, 226
Wexler-Raz relations, 135
Whitney decomposition, 311, 313
Wiener
  condition, 229
  space, 145, 229, 242
Wigner distribution, 213, 221, 270
Wilson basis, 164, 252
Wirtinger’s inequality, 109
WKB approximation, 323, 325, 348, 356
Young’s function, 197, 353
Young’s inequality
  for convex functions, 198
  for convolution, 209, 211
Zak transform, 24, 137, 226