ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 124
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in
Imaging and Electron Physics Edited by
PETER W. HAWKES CEMES-CNRS Toulouse, France
VOLUME 124
Amsterdam Boston London New York Oxford Paris San Diego San Francisco Singapore Sydney Tokyo
This book is printed on acid-free paper.

Copyright © 2002, Elsevier Science (USA).
All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (www.copyright.com), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2002 chapters are as shown on the title pages: If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2002 $35.00 Explicit permission from Academic Press is not required to reproduce a maximum of two figures or tables from an Academic Press chapter in another scientific or research publication provided that the material has not been credited to another source and that full credit to the Academic Press chapter is given.
Academic Press An imprint of Elsevier Science. 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.academicpress.com
Academic Press, 84 Theobald's Road, London WC1X 8RR, UK http://www.academicpress.com

International Standard Book Number: 0-12-014766-1

PRINTED IN THE UNITED STATES OF AMERICA
02 03 04 05 06 07  9 8 7 6 5 4 3 2 1
CONTENTS

Contributors  vii
Preface  ix
Future Contributions  xi

V-Vector Algebra and Volterra Filters
Alberto Carini, Enzo Mumolo, and Giovanni L. Sicuranza
  I. Introduction  2
  II. Volterra Series Expansions and Volterra Filters  4
  III. V-Vector Algebra  12
  IV. V-Vectors for Volterra and Linear Multichannel Filters  24
  V. A Novel Givens Rotation–Based Fast QR-RLS Algorithm  29
  VI. Nonlinear Prediction and Coding of Speech and Audio by Using V-Vector Algebra and Volterra Filters  42
  VII. Summary  54
  Appendix I: The Givens Rotations  55
  Appendix II: Some Efficient Factorization Algorithms  56
  References  59

A Brief Walk through Sampling Theory
Antonio G. García
  I. Starting Point  64
  II. Orthogonal Sampling Formulas  65
  III. Classical Paley–Wiener Spaces Revisited  92
  IV. Sampling Stationary Stochastic Processes  124
  V. At the End of the Walk  128
  References  132

Kriging Filters for Space–Time Interpolation
William S. Kerwin and Jerry L. Prince
  I. Introduction  140
  II. Data Model  141
  III. Review of Kriging Methods  143
  IV. Best Linear Unbiased Prediction  150
  V. Cokriging Filters  158
  VI. Space–Time Kriging Filters  164
  VII. Applications  171
  VIII. Discussion and Conclusion  184
  Appendix: Optimality of Filtering Algorithms  187
  References  192

Constructions of Orthogonal and Biorthogonal Scaling Functions and Multiwavelets Using Fractal Interpolation Surfaces
Bruce Kessler
  I. Introduction  195
  II. Scaling Function Constructions  204
  III. Associated Multiwavelets  209
  IV. Wavelet Constructions  218
  V. Applications to Digitized Images  226
  Appendix  232
  References  250

Diffraction Tomography for Turbid Media
Charles L. Matson
  I. Introduction  253
  II. Background  259
  III. Diffraction Tomography for Turbid Media: The Forward Model  268
  IV. Backpropagation in Turbid Media  281
  V. Signal-to-Noise Ratios  316
  VI. Concluding Remarks  338
  References  339

Tree-Adapted Wavelet Shrinkage
James S. Walker
  I. Introduction  343
  II. Comparison of TAWS and Wiener Filtering  345
  III. Wavelet Analysis  348
  IV. Fundamentals of Wavelet-Based Denoising  358
  V. Tree-Adapted Wavelet Shrinkage  366
  VI. Comparison of TAWS with Other Techniques  383
  VII. Conclusion  391
  References  391

Index  395
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contribution begins.
Alberto Carini (1), TELIT Mobile Terminals S.p.A., I-34010 Sgonico, Trieste, Italy
Antonio G. García (63), Department of Mathematics, Universidad Carlos III de Madrid, E-28911 Leganés (Madrid), Spain
William S. Kerwin (139), Department of Radiology, University of Washington, Seattle, Washington 98195
Bruce Kessler (195), Department of Mathematics, Western Kentucky University, Bowling Green, Kentucky 42101
Charles L. Matson (253), Directed Energy Directorate, Air Force Research Laboratory, Kirtland AFB, New Mexico 87117
Enzo Mumolo (1), Department of Electrical, Electronic and Computer Engineering (DEEI), University of Trieste, I-34127 Trieste, Italy
Jerry L. Prince (139), Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218
Giovanni L. Sicuranza (1), Department of Electrical, Electronic and Computer Engineering (DEEI), University of Trieste, I-34127 Trieste, Italy
James S. Walker (343), Department of Mathematics, University of Wisconsin–Eau Claire, Eau Claire, Wisconsin 54702
PREFACE
Four of the six contributions to this volume are concerned with aspects of signal filtering, which of course includes image filtering for noise suppression and quality improvement. The first, by A. Carini, E. Mumolo and G. L. Sicuranza, is concerned with the class of polynomial filters known as Volterra filters. In order to analyse these and to transfer algorithms already studied in the linear case to the nonlinear situation, an algebra known as V-vector algebra has been devised, and many new results have recently been obtained by these authors; these are included here, and this very full survey thus usefully complements the book on Polynomial Signal Processing by V. J. Mathews and G. L. Sicuranza.

Sampling theory is a vast subject with a long history, in which the Whittaker–Shannon–Kotel'nikov formula is the best known event. A. G. García surveys the various approaches to sampling and explains the results that have been obtained over the years. Although much of the material presented here can be found in the literature, this presentation by A. García brings together a mass of material in a single coherent account, and also indicates where further information on each topic can be found. Furthermore, some topics that are not well known, such as the use of Riesz bases and the use of frames, are explained in detail, thereby making these accessible to a wide audience.

The third chapter brings us back to filtering: here it is kriging that is examined. Originally, kriging was used for spatial interpolation of irregular geological samples but has subsequently been extended to permit interpolation in both space and time. W. S. Kerwin and J. L. Prince show that, with suitable assumptions, space–time kriging (and the related cokriging) can be performed by means of fast filtering techniques. Examples of the application of the technique to hydrogeology and to cardiac magnetic resonance images conclude this highly original contribution.

In his discussion of orthogonal and biorthogonal scaling functions and multiwavelets constructed with the aid of fractal interpolation surfaces, B. Kessler gives an account of the most recent results in an area in rapid development. Separable bases in two (or more) dimensions for image decomposition are easy to use but have the disadvantage that they are biased towards the natural vertical and horizontal directions in a rectangular image. Nonseparable bases do not exhibit such a bias but may not be so easy to put into practice. The author has made major contributions to this theory and he leads us systematically through the subject, from the construction of scaling functions, through the associated multiwavelets, finally arriving at families of wavelets for biorthogonal and for orthogonal construction. A final section describes applications to noise reduction and compression of images.

Although the most familiar technique for obtaining three-dimensional information about the human body is tomography using x-rays, this is not the only wavelength range that can be used. Optical diffusion tomography, in which light is used instead of x-rays, is attractive in that the radiation is not ionizing and the equipment is much less expensive; furthermore, the image may give functional information about the tissues and organs irradiated. The major drawback is that light is scattered in its passage through the body and image interpretation is much less direct. This is the subject of the chapter by C. L. Matson, on diffraction tomography in turbid media. The theoretical models are presented in detail and the use of backprojection in these difficult conditions is explained. This long contribution forms a short monograph on this subject, which is of great potential interest well beyond the medical applications that stimulated these ideas.

The final chapter brings us back to filtering, the objective here being noise reduction in images based on wavelet decomposition. J. S. Walker presents the technique pioneered by him that is known as tree-adapted wavelet shrinkage (TAWS). After a brief introduction to wavelet analysis and to noise reduction based on wavelets, the author describes tree-adapted wavelet shrinkage in detail and presents the associated algorithms. He concludes with very telling comparisons with other methods, from which it is clear that edges are well preserved by TAWS.

It only remains for me to thank all the contributors to this volume for going to such trouble to make their material accessible to a wide audience.

Peter W. Hawkes
FUTURE CONTRIBUTIONS
T. Aach: Lapped transforms
G. Abbate: New developments in liquid-crystal-based photonic devices
S. Ando: Gradient operators and edge and corner detection
A. Arnéodo, N. Decoster, P. Kestener and S. Roux (vol. 126): A wavelet-based method for multifractal image analysis
M. Barnabei and L. B. Montefusco (vol. 125): An algebraic approach to subband signal processing
C. Beeli (vol. 127): Structure and microscopy of quasicrystals
I. Bloch: Fuzzy distance measures in image processing
G. Borgefors: Distance transforms
B. L. Breton, D. McMullan and K. C. A. Smith (Eds): Sir Charles Oatley and the scanning electron microscope
A. Bretto: Hypergraphs and their use in image modelling
Y. Cho: Scanning nonlinear dielectric microscopy
E. R. Davies (vol. 126): Mean, median and mode filters
H. Delingette: Surface reconstruction based on simplex meshes
A. Diaspro (vol. 126): Two-photon excitation in microscopy
R. G. Forbes: Liquid metal ion sources
E. Förster and F. N. Chukhovsky: X-ray optics
A. Fox: The critical-voltage effect
L. Frank and I. Müllerová: Scanning low-energy electron microscopy
M. Freeman and G. M. Steeves (vol. 125): Ultrafast scanning tunneling microscopy
L. Godo & V. Torra: Aggregation operators
A. Hanbury: Morphology on a circle
P. W. Hawkes (vol. 127): Electron optics and electron microscopy: conference proceedings and abstracts as source material
M. I. Herrera: The development of electron microscopy in Spain
J. S. Hesthaven (vol. 127): Higher-order accuracy computational methods for time-domain electromagnetics
K. Ishizuka: Contrast transfer and crystal images
I. P. Jones (vol. 125): ALCHEMI
G. Kögel: Positron microscopy
W. Krakow: Sideband imaging
N. Krueger: The application of statistical and deterministic regularities in biological and artificial vision systems
A. Lannes (vol. 126): Phase closure imaging
B. Lahme: Karhunen–Loève decomposition
B. Lencová: Modern developments in electron optical calculations
M. A. O'Keefe: Electron image simulation
N. Papamarkos and A. Kesidis: The inverse Hough transform
M. G. A. Paris and G. d'Ariano: Quantum tomography
E. Petajan: HDTV
T.-C. Poon (vol. 126): Scanning optical holography
H. de Raedt, K. F. L. Michielsen and J. Th. M. Hosson (vol. 125): Aspects of mathematical morphology
E. Rau: Energy analysers for electron microscopes
H. Rauch: The wave-particle dualism
D. de Ridder, R. P. W. Duin, M. Egmont-Petersen, L. J. van Vliet and P. W. Verbeek (vol. 126): Nonlinear image processing using artificial neural networks
D. Saad, R. Vicente and A. Kabashima (vol. 125): Error-correcting codes
O. Scherzer: Regularization techniques
G. Schmahl: X-ray microscopy
S. Shirai: CRT gun design methods
T. Soma: Focus-deflection systems and their applications
I. Talmon: Study of complex fluids by transmission electron microscopy
M. Tonouchi: Terahertz radiation imaging
N. M. Towghi: lp norm optimal filters
Y. Uchikawa: Electron gun optics
D. van Dyck: Very high resolution electron microscopy
K. Vaeth and G. Rajeswaran: Organic light-emitting arrays
C. D. Wright and E. W. Hill: Magnetic force microscopy
F. Yang and M. Paindavoine (vol. 126): Pre-filtering for pattern recognition using wavelet transforms and neural networks
M. Yeadon (vol. 127): Instrumentation for surface studies
S. Zaefferer (vol. 125): Computer-aided crystallographic analysis in TEM
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 124
V-Vector Algebra and Volterra Filters

ALBERTO CARINI,¹ ENZO MUMOLO,² AND GIOVANNI L. SICURANZA²

¹TELIT Mobile Terminals S.p.A., I-34010 Sgonico, Trieste, Italy
²Department of Electrical, Electronic and Computer Engineering (DEEI), University of Trieste, I-34127 Trieste, Italy
I. Introduction  2
II. Volterra Series Expansions and Volterra Filters  4
   A. Volterra Series Expansions for Continuous Nonlinear Systems  4
   B. Volterra Series Expansions for Discrete Nonlinear Systems  5
   C. Properties of Discrete Volterra Series Expansions  8
      1. Linearity with Respect to the Kernel Coefficients  8
      2. Multidimensional Convolution Property  9
      3. Symmetry of the Volterra Kernels  9
      4. Impulse Responses of Volterra Filters  11
      5. Stability of Volterra Filters  11
      6. Existence and Convergence of Volterra Series Expansions  11
III. V-Vector Algebra  12
   A. The Time-Shift Property  12
   B. V-Vectors for Quadratic Homogeneous Filters  15
   C. Definitions and Properties of V-Vector Algebra  16
   D. Some Further Definitions and Fundamental Operations  23
IV. V-Vectors for Volterra and Linear Multichannel Filters  24
   A. V-Vectors for pth-Order Volterra Filters  25
   B. V-Vectors for Linear Multichannel Filters  27
V. A Novel Givens Rotation–Based Fast QR-RLS Algorithm  29
   A. Review of RLS Adaptive Filtering  30
   B. The Volterra Givens Rotation–Based Fast QR-RLS Filter  33
   C. Experimental Results  40
VI. Nonlinear Prediction and Coding of Speech and Audio by Using V-Vector Algebra and Volterra Filters  42
   A. Nonlinear Prediction of Speech by Using V-Vector Algebra  43
   B. Nonlinear Coding of Speech and Audio by Using V-Vector Algebra  47
   C. The Coding Algorithm  48
   D. Stability of the Proposed Coding Algorithm  50
   E. Sampling Frequency Issue  51
   F. Efficient Coding of the Side Information  52
   G. Experimental Results  52
VII. Summary  54
Appendix I: The Givens Rotations  55
Appendix II: Some Efficient Factorization Algorithms  56
References  59
Copyright 2002, Elsevier Science (USA). All rights reserved. ISSN 1076-5670/02 $35.00
I. Introduction

This article describes an algebraic structure which is usefully applied to the representation of the input–output relationship of the class of polynomial filters known as discrete Volterra filters. Such filters are essentially based on the truncated discrete Volterra series expansion, which is obtained by suitably sampling the continuous Volterra series expansion, widely applied for the representation and analysis of continuous nonlinear systems.

Vito Volterra, an Italian mathematician born in Ancona in 1860, introduced the concept of functionals and devised the series, named after him, as an extension of the Taylor series expansion. His first works on these topics were published in 1887. Besides devising the theory of functionals, he made relevant contributions to integral and integrodifferential equations and in other fields of the physical and biological sciences. A complete list of his 270 publications is reported in the book published in 1959 in which his works on the theory of functionals were reprinted in English (Volterra, 1959). Other seminal contributions related to the Volterra series expansion can be found in Fréchet (1910), where it is shown that the set of Volterra functionals is complete. The main result of all this work was the finding that every continuous functional of a signal x(t) can be approximated with arbitrary precision as a sum of a finite number of Volterra functionals in x(t). This result can be seen as a generalization of the Stone–Weierstrass theorem, which states that every continuous function of a variable x can be approximated with arbitrary precision by means of a polynomial operator in x.

The first use of Volterra's theory in nonlinear system theory was proposed by Norbert Wiener in the early 1940s. Wiener's method of analyzing continuous nonlinear systems employed the so-called G-functionals to determine the coefficients of the nonlinear model. The relevant property of G-functionals is that they are mutually orthogonal when the input signal to the system is white and Gaussian. An almost complete account of his work in this area is available in Wiener (1958). These works stimulated a number of studies on Volterra and Wiener theories. Complete accounts of the fundamentals of Volterra system theory and of the developments that occurred until the late 1970s can be found in the survey papers by Billings (1980) and Schetzen (1993) and in the books by Marmarelis and Marmarelis (1978), Rugh (1981), and Schetzen (1989). The first book is primarily devoted to applications in biomedical engineering.

The development of digital signal-processing techniques and the facilities offered by powerful computers and digital signal processors stimulated a number of studies on discrete nonlinear systems in the 1980s. The model used was often the discrete version of the Volterra series expansion. As a result, a new class of filters, polynomial filters, including Volterra filters, was introduced and widely applied. A number of applications were considered in different fields,
from system theory to communications and biology, to mention only a few. Particular interest was devoted to adaptive filters and adaptation algorithms because these devices are employed in many applications. An account of pertinent activities in digital signal processing can be found in the survey papers by Mathews (1991) and Sicuranza (1992) and in the book by Mathews and Sicuranza (2000). This book contains the first complete account of the discrete Volterra series expansion and the whole class of polynomial filters, together with a large number of references. As already mentioned, adaptive filters based on discrete Volterra models play a relevant role in nonlinear digital signal processing because they are used in many tasks such as nonlinear system identification, compensation for nonlinear distortions, equalization of communication channels, nonlinear echo cancellation, and so forth. In this respect, the V-vector algebra presented in this article constitutes a powerful tool for describing Volterra filters and their adaptation algorithms. In fact, adaptation algorithms for Volterra filters are usually obtained by extending classical algorithms proposed for linear filters. However, what makes this task complicated is the loss of the time-shift property in the input vector. This property is the key factor for deriving many fast adaptation algorithms. It consists of the fact that, in the linear case, passage from the vector collecting the N most recent samples of the input signal at time n to that at time n + 1 requires the last element of the vector to be discarded and then the new input sample to be added as the first element. This property does not apply to the input vector of Volterra filters, which is formed by different products of input samples. V-vector algebra has been accordingly designed to preserve the time-shift property of the input vectors of the linear case. V-vector algebra can thus be viewed as a simple formalism which is suitable for the derivation of adaptation algorithms for Volterra filters by simple reformulation of the well-known adaptation algorithms applied to linear filters. In particular, the vectors of linear algebra are replaced by V-vectors, which can be viewed as nonrectangular matrices. Using the V-vector formalism allows fast and numerically stable adaptation algorithms for Volterra filters to be easily derived from known linear theory. As an additional feature, it is possible to show that V-vector algebra can be usefully exploited to describe multichannel linear adaptive filters with channels of different memory lengths. The first part of this article provides a brief introductory account of the Volterra series expansions and discrete Volterra filters. The remaining sections are essentially based on a chapter of the doctoral thesis by Carini (1997) and a paper by Carini et al. (2000) that address the main definitions of V-vector algebra. Another paper by Carini et al. (1999) presents, in addition, some applications of V-vector algebra for the derivation of fast and stable adaptation algorithms for Volterra filters. New material in this area is presented in Sections V and VI.
II. Volterra Series Expansions and Volterra Filters

Volterra series expansions form the basis of the theory of polynomial nonlinear systems (or filters), including Volterra filters. In this section the Volterra series expansions for both continuous and discrete systems are introduced and their main properties reviewed. A complete account of these arguments can be found in Mathews and Sicuranza (2000).
A. Volterra Series Expansions for Continuous Nonlinear Systems

A continuous-time nonlinear system in which the output signal at time t, y(t), depends on only the input signal at time t, x(t), can be described, with some restrictions, by means of an appropriate power series expansion such as the Taylor series expansion. The output of such a system, which is called a memoryless system, is thus described by the input–output relation

y(t) = \sum_{p=0}^{\infty} c_p x^p(t)   (1)
A continuous-time nonlinear system in which the output signal at time t, y(t), depends also on the input signal at any time τ different from t, is said to be a system with memory. Such a system can be represented by means of an extension of expression (1) known as the Volterra series expansion (Rugh, 1981; Schetzen, 1989; Volterra, 1887, 1913, 1959):

y(t) = h_0 + \int_{-\infty}^{\infty} h_1(\tau_1)\, x(t-\tau_1)\, d\tau_1 + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h_2(\tau_1, \tau_2)\, x(t-\tau_1)\, x(t-\tau_2)\, d\tau_1\, d\tau_2 + \cdots + \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h_p(\tau_1, \ldots, \tau_p)\, x(t-\tau_1) \cdots x(t-\tau_p)\, d\tau_1 \cdots d\tau_p + \cdots   (2)
The continuous-time nonlinear system represented by a Volterra series expansion is completely characterized by the multidimensional functions h_p(t_1, \ldots, t_p), called the Volterra kernels. The kernel of order zero, h_0, is a constant. The first-order kernel, h_1(\tau_1), is the impulse response of a time-invariant linear system, and the corresponding term in the expansion in Eq. (2) is the well-known convolution integral, which describes the output of a continuous time-invariant linear system. The higher-order kernels can be assumed,
without loss in generality, as symmetric functions of their arguments so that any of the p! possible permutations of t_1, \ldots, t_p leaves h_p(t_1, \ldots, t_p) unchanged. It is worth noting that the integrals corresponding in Eq. (2) to the higher-order kernels have the form of multidimensional convolutions. A more compact expression can be given for Eq. (2) by defining the pth-order Volterra operator \bar{h}_p[x(t)] as

\bar{h}_p[x(t)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h_p(\tau_1, \ldots, \tau_p)\, x(t-\tau_1) \cdots x(t-\tau_p)\, d\tau_1 \cdots d\tau_p   (3)
Then, Eq. (2) can be written as

y(t) = h_0 + \sum_{p=1}^{\infty} \bar{h}_p[x(t)]   (4)
The system in Eq. (2) is said to be causal if and only if

h_p(t_1, \ldots, t_p) = 0 \quad \text{for any } t_i < 0,\ i = 1, \ldots, p   (5)
As a consequence, the lower limits of the integrals in Eq. (2) are equal to zero. Then, the upper limits of the integrals, given as ∞, indicate that the causal system may have infinite memory. On the contrary, if the upper limits are all finite, the system possesses finite memory. A truncated Volterra series expansion is obtained by setting the upper limit of the summation in Eq. (4) to a finite integer value P. The parameter P is called the order, or the degree, of the Volterra series expansion. Fréchet (1910) provided proof that any time-invariant, finite-memory system which is a continuous functional of its input can be uniformly approximated over a uniformly bounded and continuous set of input signals by a Volterra series expansion of appropriate finite order P.

As a consequence of its relationship with the Taylor series expansion, the Volterra series expansion suffers some shortcomings when it is used to model nonlinear systems. The main limitation is related to the convergence problems encountered when the nonlinear systems to be modeled include strong nonlinearities such as saturation effects. Therefore, Volterra series expansions offer their best performance in modeling mild nonlinearities.
B. Volterra Series Expansions for Discrete Nonlinear Systems

A discrete time-invariant nonlinear system with memory can be described in a manner similar to that of the continuous case by means of the discrete-time
Volterra series expansion

y(n) = h_0 + \sum_{p=1}^{\infty} \bar{h}_p[x(n)]   (6)
where y(n) and x(n) are the discrete output and input signals, respectively, and

\bar{h}_p[x(n)] = \sum_{m_1=-\infty}^{\infty} \cdots \sum_{m_p=-\infty}^{\infty} h_p(m_1, \ldots, m_p)\, x(n-m_1) \cdots x(n-m_p)   (7)
In the preceding definition, h_p(m_1, m_2, \ldots, m_p) is the pth-order Volterra kernel of the system. As in the continuous case, if

h_p(m_1, \ldots, m_p) = 0 \quad \text{for all } m_i < 0,\ i = 1, \ldots, p   (8)
then the discrete nonlinear system is said to be causal, and Eq. (7) becomes

\bar{h}_p[x(n)] = \sum_{m_1=0}^{\infty} \cdots \sum_{m_p=0}^{\infty} h_p(m_1, \ldots, m_p)\, x(n-m_1) \cdots x(n-m_p)   (9)
We can interpret the discrete-time Volterra kernels in a manner similar to that of the continuous-time systems. The constant h_0 is an offset term, whereas h_1(m_1) is the impulse response of a discrete linear time-invariant system (or filter). If we use the terminology of digital signal processing, this term corresponds to an infinite impulse-response (IIR) filter (i.e., a filter with infinite memory) because the upper limit of the corresponding summation in Eq. (9) is given as infinity. In a similar manner, the pth-order kernels h_p(m_1, \ldots, m_p) can be considered as generalized pth-order impulse responses characterizing the nonlinear behavior of the infinite-memory systems because the upper limits in the summations in Eq. (9) are still given as infinity.

It is worth noting that, in practice, the difficulties that arise because of the infinite summations in Eq. (9) may be avoided by using recursive polynomial system models in analogy with the recursive structures used for linear IIR filters. In recursive polynomial system models, the relationship between the input and output signals is described by using a nonlinear difference equation of finite order involving delayed values of the output signal as well as the current and delayed values of the input signal as

y(n) = f_i(y(n-1), y(n-2), \ldots, y(n-M+1), x(n), x(n-1), \ldots, x(n-N+1))   (10)

where f_i(\cdot) is an ith-order polynomial in the variables within the parentheses.
Such filters constitute one of the two main classes of polynomial filters: that is, the class formed by the infinite-memory filters. In contrast, when we are modeling nonlinear systems we can often resort to finite-memory filters. This simpler but still useful class of filters is obtained by limiting the upper values in the summations of Eq. (9). In such a case, a Volterra series expansion involving only the input signal is sufficient to model the system. Thus, h_1(m_1) is the impulse response of a linear finite impulse-response (FIR) filter, and the effect of the nonlinearity on the output depends on only the present and the past values of the input signals defined on the extent of the filter support. If the discrete Volterra series expansion is truncated by limiting the number of kernels present in the expansion to the first P + 1, a finite-memory, finite-order expansion is obtained as

y(n) = h_0 + \sum_{p=1}^{P} \bar{h}_p[x(n)]   (11)
where

\bar{h}_p[x(n)] = \sum_{m_1=0}^{N-1} \cdots \sum_{m_p=0}^{N-1} h_p(m_1, \ldots, m_p)\, x(n-m_1) \cdots x(n-m_p)   (12)
Let us note that the upper limits in all the summations of Eq. (12) are set to be equal only for convenience. They may be set to arbitrary values to obtain a more general expression. These nonrecursive models described by truncated Volterra series have been studied extensively because of the relative simplicity of their input–output relationship.

The filter represented by Eq. (7) is called a homogeneous filter of order p. In fact, in traditional system theory, a system S is said to be homogeneous if

S[cx(n)] = c\, S[x(n)]   (13)
where c is a constant. In the case of the Volterra filter of Eq. (7), the output corresponding to an input cx(n), where c is a constant, is given by c^p y(n), where y(n) is the response to x(n). Consequently, the definition of homogeneous Volterra filters is an extension of the traditional definition. The simplest polynomial filter of this class is the quadratic filter obtained by choosing p = 2 in Eq. (12). A causal nonhomogeneous quadratic filter may include the constant term and the linear term, as shown by choosing P = 2 in Eq. (11). The use of homogeneous or nonhomogeneous quadratic terms often offers very interesting effects. Therefore, extensive studies of the properties of quadratic filters can be found in the literature together with studies of their use in many applications.
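As a concrete illustration of the truncated expansion of Eqs. (11) and (12), the following Python sketch (not part of the original chapter) evaluates a quadratic filter, P = 2, directly from its kernels; the function name, the dense N × N representation of h_2, and the zero initial conditions are assumptions made here for clarity.

```python
import numpy as np

def volterra_quadratic(x, h0, h1, h2):
    """Direct evaluation of Eqs. (11)-(12) for P = 2.

    x  : 1-D input signal
    h0 : scalar offset term
    h1 : length-N first-order kernel h1(m1)
    h2 : N x N second-order kernel h2(m1, m2)
    """
    N = len(h1)
    y = np.zeros(len(x))
    for n in range(len(x)):
        # x_vec[m] = x(n - m), with samples before the signal start taken as zero
        x_vec = np.array([x[n - m] if n - m >= 0 else 0.0 for m in range(N)])
        y[n] = h0 + h1 @ x_vec + x_vec @ h2 @ x_vec
    return y

# Example with a short memory and a mild quadratic term
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
y = volterra_quadratic(x, h0=0.0, h1=np.array([1.0, 0.5, 0.25]), h2=0.1 * np.eye(3))
```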
C. Properties of Discrete Volterra Series Expansions

Volterra series expansions possess some very interesting properties which constitute the main reason for their popularity for modeling nonlinear systems. Although all these properties were first derived for continuous-time Volterra series expansions, in this subsection they are illustrated directly for discrete-time Volterra series expansions.

1. Linearity with Respect to the Kernel Coefficients

The linearity of the Volterra series expansions with respect to the kernel coefficients is evident from Eqs. (6) and (7). In other words, the nonlinearity of the expansions is due to the multiple products of the delayed input values, while the filter coefficients appear linearly in the output expression. Because of this property, many classes of polynomial filters can be defined as conceptually straightforward extensions of linear system models. As an example, the output of a linear FIR filter with a memory of N samples is computed as a linear combination of a set of N input samples. The output of a finite-memory Pth-order truncated Volterra filter with the same memory is a linear combination of all the possible products of up to P samples belonging to the same N input samples. When P = 2, a quadratic filter is obtained and the output signal is a linear combination of a bias term, the input samples, and product terms involving two input samples. Similarly, for a cubic, or third-order, truncated Volterra filter, P = 3, and the output is expressed as a linear combination of the bias term, samples of the input signals, products of two input samples, and products involving three samples of the input signals. This description can obviously be extended to any order of nonlinearity.

The preceding examples correspond to nonrecursive nonlinear models. As already mentioned, we can extend the notion of recursive, or IIR, linear systems by devising nonlinear models that include a feedback of the output samples, as shown by Eq. (10). In such cases the polynomial filters may have infinite memory. As an example, a simple recursive nonlinear filter is given by the input–output relationship

y(n) = a\, x(n)\, x(n-1) + b\, x(n)\, y(n-1)   (14)
where again the filter output is linear with respect to the two filter coefficients a and b. The linearity of the output with respect to the coefficients of the filters can be exploited to extend many concepts of linear system theory to nonlinear systems. Examples of such extensions include frequency-domain representation of polynomial systems, optimum polynomial filter theory, and, of special interest for the considerations in this article, the derivation of adaptive polynomial filters.
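A minimal sketch of the recursive example in Eq. (14), assuming zero initial conditions x(−1) = y(−1) = 0 (the function name is chosen here for illustration); note that the output is nonlinear in the input but still linear in the two coefficients a and b.

```python
def recursive_bilinear(x, a, b):
    """y(n) = a*x(n)*x(n-1) + b*x(n)*y(n-1), Eq. (14), with zero initial conditions."""
    y = []
    x_prev = y_prev = 0.0
    for x_n in x:
        y_n = a * x_n * x_prev + b * x_n * y_prev
        y.append(y_n)
        x_prev, y_prev = x_n, y_n
    return y

print(recursive_bilinear([1.0, 2.0, 3.0], a=0.5, b=0.1))   # [0.0, 1.0, 3.3]
```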
2. Multidimensional Convolution Property

Let us consider the pth-order term of the discrete-time Volterra series expansion given by Eq. (7) and define a p-dimensional signal

v(n_1, n_2, \ldots, n_p) = x(n_1)\, x(n_2) \cdots x(n_p)   (15)
Let us now consider the p-dimensional convolution

w(n_1, \ldots, n_p) = \sum_{m_1=-\infty}^{\infty} \cdots \sum_{m_p=-\infty}^{\infty} h_p(m_1, \ldots, m_p)\, v(n_1-m_1, \ldots, n_p-m_p)   (16)
Comparing Eq. (16) with Eq. (7), we see that the pth-order terms of a Volterra filter can be evaluated by performing a p-dimensional convolution and then keeping the output values for n_1 = n_2 = \cdots = n_p = n; that is,

\bar{h}_p[x(n)] = w(n, n, \ldots, n)   (17)
Even though such a realization is not very efficient, the characterization of Volterra filters by using multidimensional convolutions is useful for understanding their properties. In other words, the nonlinearity in a one-dimensional Volterra filter is mapped into a p-dimensional linear filter by using a constraint on the input signal. This interpretation leads to the transform-domain and frequency-domain representations of Volterra filters. Because these representations are not essential for the analysis presented in this article, their derivation is omitted. The interested reader may refer to the book by Mathews and Sicuranza (2000).

3. Symmetry of the Volterra Kernels

The pth-order term of a finite-memory Volterra filter, defined by Eq. (12), has N^p coefficients. In this representation each permutation of the indices m_1, m_2, \ldots, m_p is considered to result in a separate coefficient. However, because all such permutations multiply the same quantity, namely x(n-m_1), \ldots, x(n-m_p), it is possible to reduce the number of independent coefficients. A generic kernel h_p(m_1, \ldots, m_p) can thus be replaced by a symmetric kernel h_{p,\mathrm{sym}}(m_1, \ldots, m_p) by defining its elements as

h_{p,\mathrm{sym}}(m_1, \ldots, m_p) = \frac{1}{|\pi(m_1, m_2, \ldots, m_p)|} \sum_{\pi(\cdot)} h_p(m_{\pi(1)}, \ldots, m_{\pi(p)})   (18)
where the summation is over all distinct permutations π (·) of the indices m 1 , m 2 , . . . , m p , and |π(m 1 , m 2 , . . . , m p )| represents the number of such permutations. To evaluate |π (m 1 , m 2 , . . . , m p )|, let us denote the number of
distinct values in a specific set of (m_1, m_2, \ldots, m_p) as r. Let k_1, k_2, \ldots, k_r denote the number of times these values appear in (m_1, m_2, \ldots, m_p). Then

|\pi(m_1, m_2, \ldots, m_p)| = \frac{p!}{k_1!\, k_2! \cdots k_r!}   (19)
The elements of symmetric kernels can also be recast in what is called the triangular form as

\bar{h}_p[x(n)] = \sum_{m_1=0}^{N-1} \sum_{m_2=m_1}^{N-1} \cdots \sum_{m_p=m_{p-1}}^{N-1} h_{p,\mathrm{tri}}(m_1, m_2, \ldots, m_p)\, x(n-m_1)\, x(n-m_2) \cdots x(n-m_p)   (20)
The elements of the triangular kernels can be computed as the sum of the corresponding terms of the symmetric kernel; that is,

h_{p,\mathrm{tri}}(m_1, m_2, \ldots, m_p) = \begin{cases} |\pi(m_1, m_2, \ldots, m_p)|\, h_{p,\mathrm{sym}}(m_1, m_2, \ldots, m_p) & m_1 \le m_2 \le \cdots \le m_p \\ 0 & \text{otherwise} \end{cases}   (21)
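For instance, the following sketch (illustrative, not from the chapter) converts a symmetric second-order kernel into its triangular form according to Eq. (21); for p = 2 the permutation count |π| is 2 off the diagonal and 1 on it.

```python
import numpy as np

def triangular_from_symmetric_2d(h_sym):
    """Eq. (21) for p = 2: keep only entries with m1 <= m2, scaled by the
    number of distinct permutations (2 off the diagonal, 1 on the diagonal)."""
    N = h_sym.shape[0]
    h_tri = np.zeros_like(h_sym)
    for m1 in range(N):
        for m2 in range(m1, N):
            mult = 1 if m1 == m2 else 2
            h_tri[m1, m2] = mult * h_sym[m1, m2]
    return h_tri

h_sym = np.array([[1.0, 0.5], [0.5, 2.0]])
print(triangular_from_symmetric_2d(h_sym))   # [[1. 1.] [0. 2.]]
```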
The symmetry property can be usefully exploited to remarkably reduce the computational complexity of the Volterra series expansions because the number of independent coefficients in the corresponding triangular representation is strongly reduced. It is worth noting that the huge number of elements that are often present in the Volterra series expansion is one of the drawbacks of such a representation. The complexity of the model increases immensely with the length of the filter memory and the order of the nonlinearity. In principle, the generic pth-order kernel of a Volterra series expansion with a memory of N samples contains N^p coefficients. However, according to the symmetry property, the number of its independent coefficients is given by the number of permutations without repetitions, that is, the binomial factor

N_p = \binom{N + p - 1}{p}   (22)

This result also gives the number of nonzero elements in the triangular representation of the Volterra kernels. The advantage of using the symmetry condition is measured by the ratio N_p / N^p. Consequently, the reduction in the realization complexity may be significant even for short-memory, low-order Volterra filters. The advantage of using the symmetry condition clearly increases with the memory span and the order of the nonlinearity.
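The saving promised by Eq. (22) is easy to quantify; the short Python sketch below (an illustration added here, not from the chapter) compares the full count N^p with the number of independent coefficients for a given memory and order.

```python
from math import comb

def kernel_coefficient_counts(N, p):
    """Full count N**p versus the independent count of Eq. (22),
    i.e. the number of nonzero entries of the triangular representation."""
    return N ** p, comb(N + p - 1, p)

full, independent = kernel_coefficient_counts(N=10, p=3)
print(full, independent, full / independent)   # 1000 220 (a ratio of about 4.5)
```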
4. Impulse Responses of Volterra Filters

Unlike linear, time-invariant filters, Volterra filters cannot be fully characterized by the unit impulse-response signal. We can see this fact by calculating the filter response to a unit impulse function u_0(n), by using Eqs. (6) and (7), as

h(n) = h_0 + h_1(n) + h_2(n, n) + \cdots + h_p(n, \ldots, n) + \cdots   (23)
This result clearly shows that the impulse response is determined only by the diagonal elements in the kernels (i.e., by the samples of h_p at locations such that n_1 = n_2 = \cdots = n_p = n). Therefore, the impulse response alone is not sufficient to identify all the kernel elements. Schetzen (1989) has shown that for us to identify all the kernel elements of a continuous pth-order filter, we must find its response to a suitable set of p distinct impulse functions. Similar results can also be derived for the discrete-time case. Because the pth-order kernels can be completely determined by using p distinct impulses at the input of the system, they can also be considered generalized pth-order impulse responses.

5. Stability of Volterra Filters

A system is stable in the bounded input–bounded output (BIBO) sense if and only if every bounded input signal results in a bounded output signal. It is well known that a sufficient and necessary condition for a linear and time-invariant system to be BIBO stable is the absolute summability of the impulse response:

\sum_{m_1=-\infty}^{\infty} |h_1(m_1)| < \infty   (24)
The BIBO stability criterion can be extended to higher-order Volterra operators by applying similar constraints on the higher-order kernels. It has been shown (Mathews and Sicuranza, 2000) that the condition

\sum_{m_1=-\infty}^{\infty} \cdots \sum_{m_p=-\infty}^{\infty} |h_p(m_1, \ldots, m_p)| < \infty   (25)
is sufficient, even though not necessary in general, for the BIBO stability of homogeneous higher-order Volterra systems.

6. Existence and Convergence of Volterra Series Expansions

Much of the work on the existence and convergence of Volterra series expansions for nonlinear systems has been done for continuous-time systems.
However, the basic approaches used for continuous-time Volterra series expansions can be directly extended to the discrete-time case. Although the issues of existence and convergence are essential from the mathematical viewpoint, they are, in practice, relatively less relevant in the context of polynomial signal-processing applications. This is the case because often nonlinear filters with specific and stable structures are used to process given sets of input signals to obtain the desired behavior at the output, as is the case for the Volterra filters discussed in this article. The interested reader can find rigorous and detailed analyses of existence and convergence in Rugh (1981) and Sandberg (1992) and references therein.
III. V-Vector Algebra

In this section we first illustrate the time-shift property by using the vector representation of discrete-time FIR linear filters. This property is exploited to derive many efficient adaptation algorithms. To illustrate the motivations for the development of V-vector algebra, we consider its application to a quadratic homogeneous Volterra filter. Then, the main definitions of this new algebraic structure are introduced in a general context. The basic operations between V-vectors and V-matrices are also defined, and finally the linear algebra concepts of inverse, transposed, and triangular matrices are adapted to V-vector algebra.

It is worth noting that V-vectors and V-matrices in V-vector algebra play the same role as vectors and matrices in linear algebra. Moreover, all the elements of this new algebra are essential for deriving a simple and effective description of fast adaptation algorithms for both Volterra filters and multichannel linear filters, as shown in the following sections. To this purpose, further definitions are added at the end of this section, together with some fundamental operations on V-matrices exploited in these derivations. As far as the notation is concerned, vectors and V-vectors are indicated with boldface lowercase letters, whereas matrices and V-matrices are labeled with boldface uppercase letters.
A. The Time-Shift Property

An FIR linear filter is represented by the set of its N coefficients, usually arranged in an Nth-order vector w:

w = [h_0, h_1, \ldots, h_{N-1}]^T   (26)
Similarly, the N most recent samples of the input signal are arranged in a vector defined as

x_n = [x(n), x(n-1), \ldots, x(n-N+1)]^T   (27)

Therefore, the filter output signal is given by

y(n) = w^T x_n   (28)
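The following Python fragment (illustrative only; the zero-padding convention for samples before the start of the signal is an assumption made here) spells out Eqs. (26)-(28) and the time-shift update of the input vector discussed next.

```python
import numpy as np

def input_vector(x, n, N):
    """x_n = [x(n), x(n-1), ..., x(n-N+1)]^T of Eq. (27), zero-padded before the signal start."""
    return np.array([x[n - m] if n - m >= 0 else 0.0 for m in range(N)])

w = np.array([1.0, 0.5, 0.25, 0.125])          # the coefficient vector of Eq. (26)
x = np.array([0.1, -0.4, 0.9, 0.3, -0.2, 0.7])
n = 4

x_curr = input_vector(x, n, N=len(w))
y_n = float(w @ x_curr)                         # Eq. (28): y(n) = w^T x_n

# Time-shift property: x_n is x_{n-1} with the oldest sample dropped
# and the newest sample x(n) placed on top.
x_prev = input_vector(x, n - 1, N=len(w))
assert np.allclose(x_curr, np.concatenate(([x[n]], x_prev[:-1])))
```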
It is straightforward to see that the time-shift property of the input vector for linear filters simply means that at time n the element x(n) is added to the input vector x_{n-1} while the element x(n-N) is discarded. Many fast recursive least squares (RLS) adaptive algorithms use the notion of augmented or extended input vectors (Cioffi and Kailath, 1984; Haykin, 1991; Lee and Mathews, 1993; Slock and Kailath, 1993). The extended input vector \bar{x}_n can be defined as the vector obtained by adding the present sample x(n) on top of x_{n-1}:

\bar{x}_n = \begin{bmatrix} x(n) \\ x_{n-1} \end{bmatrix}   (29)

Alternatively, it can be defined by appending the input sample x(n-N) to the bottom of the vector x_n:

\bar{x}_n = \begin{bmatrix} x_n \\ x(n-N) \end{bmatrix}   (30)

According to the time-shift property, these two expressions are clearly equal:

\bar{x}_n = \begin{bmatrix} x(n) \\ x_{n-1} \end{bmatrix} = \begin{bmatrix} x_n \\ x(n-N) \end{bmatrix}   (31)

We next show that identity (31) is not valid for Volterra filters. For simplicity, we consider first the case of a quadratic homogeneous filter described by the equation

y(n) = \sum_{m_1=0}^{N-1} \sum_{m_2=m_1}^{N-1} c_{m_1 m_2}\, x(n-m_1)\, x(n-m_2)   (32)
where N is the length of the filter memory. The input vector at time n to a homogeneous quadratic filter can be defined as

x_n = [x(n)^2, \ldots, x(n-N+1)^2, x(n)x(n-1), \ldots, x(n-N+2)x(n-N+1), \ldots, x(n)x(n-N+1)]^T   (33)
To obtain x_n from the vector at time n-1, x_{n-1}, we must discard from x_{n-1} the N entries of the vector

r_{n-1} = [x(n-N)^2, x(n-N+1)x(n-N), \ldots, x(n-1)x(n-N)]^T   (34)

that is, all the products of couples of input samples including x(n-N). Then, the N elements of the vector

v_n = [x(n)^2, x(n)x(n-1), \ldots, x(n)x(n-N+1)]^T   (35)

that is, all the products of couples of input samples including x(n), must be added to the remaining elements. Two extended input vectors can now be defined at time n. The first vector, \tilde{x}_n, is obtained by adding v_n to the top of x_{n-1},

\tilde{x}_n = \begin{bmatrix} v_n \\ x_{n-1} \end{bmatrix}   (36)

whereas the second vector, \bar{x}_n, is obtained by appending r_{n-1} to the bottom of x_n,

\bar{x}_n = \begin{bmatrix} x_n \\ r_{n-1} \end{bmatrix}   (37)

We can immediately see, from the expressions of these two extended vectors, that they are not coincident, nor is it possible to make them coincident by appropriate element arrangements in x_n, v_n, and r_{n-1}. Thus, the time-shift property is lost. In fact, even though the augmented vectors contain the same elements, they differ by a permutation of their elements as a consequence of the loss of the time-shift property.

As mentioned before, the extension of the adaptation algorithms from linear filters to Volterra filters is granted, in theory, because the linearity property of the output of Volterra filters with respect to the kernel coefficients holds. However, in practice, the loss of the time-shift property makes this extension nontrivial. At the least, the aforementioned permutation must be applied. For example, Lee and Mathews (1993) extended a standard fast RLS adaptation algorithm to the Volterra case by taking into account a suitable permutation. The resulting algorithm was fast but not numerically stable. Conversely, fast and numerically stable algorithms can be obtained in the linear case, by means of triangular matrices, by using either QR decomposition (orthogonal matrix triangularization), as shown, for example, in Bellanger (1989), Cioffi (1990), Liu (1995), and Terré and Bellanger (1994), or QR-based lattice algorithms, as in Ling (1991), Proudler et al. (1991), and Regalia and Bellanger (1991). If we consider the adaptation algorithms which employ the concepts of both extended input vectors and triangular matrices, then the extension to the Volterra
case, by taking into account the aforementioned permutation, is a difficult task. In fact this permutation leads to the loss of the triangular structure of the matrices involved. To preserve the time-shift property and to avoid permutations, we must introduce a new algebra: V-vector algebra. To illustrate the motivations for the development of this novel algebraic structure, let us consider first its application to a quadratic homogeneous Volterra filter.
B. V-Vectors for Quadratic Homogeneous Filters

In this subsection we show how it is possible to define a new algebraic element, the V-vector, which allows finite-memory quadratic homogeneous filters to maintain the time-shift property of linear FIR filters. The extension to general pth-order Volterra filters is discussed in Section IV.

Let us take the entries of the input vector in Eq. (33) and arrange them in the nonrectangular matrix

x_n = \begin{matrix}
x(n)^2 & x(n-1)^2 & \cdots & x(n-N+1)^2 \\
x(n)x(n-1) & \cdots & x(n-N+2)x(n-N+1) \\
\vdots \\
x(n)x(n-N+1)
\end{matrix}   (38)

in which row i (counting from zero) collects the products x(n-j)\,x(n-j-i) for j = 0, \ldots, N-1-i, so that each row is one element shorter than the one above it.
The diagonal brackets in this equation emphasize the nonrectangular structure of this matrix, called the V-vector according to its V shape. It is worth noting again that the V descendant structure is fundamental for the extension of adaptation algorithms which use the concepts of both triangular matrices and extended input vectors. For the V-vector in Eq. (38) we can define left and right columns, as shown in Figure 1. It can clearly be seen that the first left column of xn is formed by
Figure 1. (a) The left and (b) the right columns of a V-vector.
the elements which have been added going from x_{n-1} to x_n, whereas the last right column of x_n is formed by the elements which will be discarded when going from x_n to x_{n+1}. The extended input V-vector can now be defined in one of two ways: by adding v_n to x_{n-1} as the first left column or by adding r_{n-1} to x_n as the last right column. Thus, the two definitions of the extended input V-vector give the same result:

\tilde{x}_n = \v_n\x_{n-1}/ = \bar{x}_n = \x_n/r_{n-1}/   (39)
The explicit expression of the extended input V-vector for the quadratic homogeneous filter is as follows:

\bar{x}_n = \begin{matrix}
x(n)^2 & x(n-1)^2 & \cdots & x(n-N)^2 \\
x(n)x(n-1) & \cdots & x(n-N+1)x(n-N) \\
\vdots \\
x(n)x(n-N+1) & x(n-1)x(n-N)
\end{matrix}   (40)
The coincidence of the two definitions for the extended input V-vectors, shown in Eq. (39), constitutes proof of the conservation of the time-shift property without elemental permutation. This fact is the main motivation for the introduction of the new algebraic structure. The different notation between \a\b/ and \c/d/ in Eq. (39) is worth noting. In the first case, a indicates the first left column and b the remaining columns of the V-vector, whereas in the second case, d stands for the last right column and c for the remaining columns. For simplicity, the first left column and the last right column are called, in what follows, first column and last column, respectively.
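The preserved time-shift property is easy to verify numerically. In the Python sketch below (an illustration added here; the chapter does not prescribe a data structure), the V-vector of Eq. (38) is stored simply as a list of rows of decreasing length, and the V-vector at time n is rebuilt from the one at time n-1 by prepending the new first column v_n and dropping the last column, which is exactly the update that makes the two extended V-vectors of Eq. (39) coincide.

```python
def sample(x, k):
    """x(k), with samples before the start of the signal taken as zero."""
    return x[k] if 0 <= k < len(x) else 0.0

def quadratic_v_vector(x, n, N):
    """V-vector of Eq. (38): row i holds the products x(n-j)*x(n-j-i),
    j = 0, ..., N-1-i, so the rows shorten from N entries down to one."""
    return [[sample(x, n - j) * sample(x, n - j - i) for j in range(N - i)]
            for i in range(N)]

x = [0.3, -1.0, 0.5, 2.0, -0.7, 1.2]
N, n = 3, 4

v_prev = quadratic_v_vector(x, n - 1, N)
v_curr = quadratic_v_vector(x, n, N)

# New first (left) column: the products involving x(n), i.e. the vector v_n of Eq. (35).
first_col = [sample(x, n) * sample(x, n - i) for i in range(N)]

# Prepend the new first column and drop the last (right) column of each row:
# this reproduces the V-vector at time n without any permutation of elements.
assert [[first_col[i]] + v_prev[i][:-1] for i in range(N)] == v_curr
```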
C. Definitions and Properties of V-Vector Algebra

In this subsection the general definitions of V-vector algebra are introduced together with the basic operations between V-vectors and V-matrices.

Definition III.1  A V-vector is a nonrectangular matrix in which the number of elements in each row does not increase going from the top to the bottom of the matrix.

Definition III.2  The first column of a V-vector is the array formed with the first elements in each row of the nonrectangular matrix.
Figure 2. Structure of a V-matrix.
Definition III.3  The last column of a V-vector is the array formed with the last elements in each row of the nonrectangular matrix.

Definition III.4  The type of a V-vector is the m-tuple of integers that defines the number of rows (m) and the number of elements in each row of the V-vector.

For example, the type of the V-vector in Eq. (38) is the N-tuple (N, N-1, \ldots, 1). For simplicity, in the following discussion the type of a V-vector is indicated with an uppercase script letter.

Definition III.5  A V-matrix A × B is a V-vector of type A whose elements are also V-vectors, called sub-V-vectors, of type B.

The structure of a V-matrix is illustrated in Figure 2. The elements of a V-vector \a_{ij}/ can be identified by a pair of indexes: the first index indicates the row, whereas the second indicates the column. Similarly, the elements of a V-matrix \\A_{ijlm}// are identified by two pairs of indexes: the first pair, ij, indicates the sub-V-vector, whereas the second, lm, identifies the element in the sub-V-vector. When necessary, V-matrices are identified with double diagonal brackets.

So that the characterization of the novel algebraic structure can be completed, it is necessary to define the basic operations between V-vectors and V-matrices:

Sum of Two V-Vectors  Let a and b indicate two V-vectors of the same type. Then, their sum is a V-vector of the same type whose elements are given by

c_{ij} = a_{ij} + b_{ij}
Sum of Two V-Matrices  Let A and B indicate two V-matrices A × B. Then, their sum is a V-matrix A × B whose elements are given by

C_{ijlm} = A_{ijlm} + B_{ijlm}

Inner Product of Two V-Vectors  Let a and b indicate two V-vectors of the same type. Then, their inner product is the scalar given by

a \cdot b = \sum_{ij} a_{ij}\, b_{ij}

Product of Two V-Matrices  Let A and B indicate an A × B V-matrix and a B × C V-matrix, respectively. Then, the product A · B is an A × C V-matrix whose elements are given by

P_{ijlm} = \sum_{hk} A_{ijhk}\, B_{hklm}

Product of a V-Matrix and a V-Vector  Let A indicate an A × B V-matrix and b indicate a V-vector of type B. Then, the product A · b is a V-vector of type A whose elements are given by

p_{ij} = \sum_{hk} A_{ijhk}\, b_{hk}
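As a small illustration of these operations (not part of the original chapter), a V-vector can be stored as a list of rows of non-increasing length; the sum and the inner product then read as follows, with the example type (3, 2, 1) chosen arbitrarily.

```python
def v_sum(a, b):
    """Sum of two V-vectors of the same type: elementwise over corresponding rows."""
    return [[ai + bi for ai, bi in zip(ra, rb)] for ra, rb in zip(a, b)]

def v_inner(a, b):
    """Inner product of two V-vectors of the same type: sum over all a_ij * b_ij."""
    return sum(ai * bi for ra, rb in zip(a, b) for ai, bi in zip(ra, rb))

a = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]   # a V-vector of type (3, 2, 1)
b = [[1.0, 0.0, 1.0], [0.5, 0.5], [2.0]]

print(v_sum(a, b))     # [[2.0, 2.0, 4.0], [4.5, 5.5], [8.0]]
print(v_inner(a, b))   # 20.5
```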
Moreover, the linear algebra concepts of identity, transposed, and inverse matrices can be extended to V-vector algebra.
Definition III.6  The identity V-matrix is an A × A V-matrix \\I_{ijlm}// whose elements with the pair of indexes ij equal to lm are set to one, while all the remaining entries are equal to zero.

Definition III.7  The transposed V-matrix of an A × B V-matrix A = \\a_{ijlm}// is a B × A V-matrix A^T with A^T = \\a_{lm ij}//. In other words, the transposed V-matrix has each sub-V-vector formed with the elements of A which occupy, in the different sub-V-vectors, the same positions as the sub-V-vector in A^T. Furthermore, each element is arranged in the same order as the corresponding sub-V-vector of A.

Definition III.8  The inverse V-matrix of an A × A V-matrix A is the A × A V-matrix which, pre- or postmultiplied by A, gives the identity V-matrix.

It is worth noting and easy to verify that if V-matrices or V-vectors reduce to matrices or vectors, respectively, all these definitions coincide with those of linear algebra. In the matrix analogy, the sub-V-vectors of V-matrices correspond
to the rows of matrices, whereas the sub-V-vectors of transposed V-matrices correspond to the columns of matrices.

A class of V-matrices of particular interest is formed by triangular V-matrices.

Definition III.9  A triangular V-matrix A × A is a V-matrix with embedded patterns formed by an increasing number of adjacent zeros.

The great freedom in arranging the zero elements allows the introduction of 12 different canonical triangular V-matrices, which can be grouped in three sets referred to as triangular V-matrices of kinds I, II, and III. These V-matrices are listed next and then defined accordingly.

Right upper triangular I (RUT I)
Right lower triangular I (RLT I)
Left upper triangular I (LUT I)
Left lower triangular I (LLT I)
Right upper triangular II (RUT II)
Right lower triangular II (RLT II)
Left upper triangular II (LUT II)
Left lower triangular II (LLT II)
Right upper triangular III (RUT III)
Right lower triangular III (RLT III)
Left upper triangular III (LUT III)
Left lower triangular III (LLT III)
Definition III.10 A triangular V-matrix of kind I \\A_{ijlm}// is
• Right upper triangular when all its elements are zero for m < j, and for l < i when m = j
• Right lower triangular when all its elements are zero for m < j, and for l > i when m = j
• Left upper triangular when all its elements are zero for m > j, and for l < i when m = j
• Left lower triangular when all its elements are zero for m > j, and for l > i when m = j
Definition III.11 The triangular V-matrices of kind II are obtained from V-matrices of kind I by a rotation around a vertical axis. In particular,
RUT II is obtained from LUT I
RLT II is obtained from LLT I
LUT II is obtained from RUT I
LLT II is obtained from RLT I

Definition III.12 A triangular V-matrix of kind III \\A_{ijlm}// is
• Right upper triangular when all its elements are zero for l < i, and for m < j when l = i
• Right lower triangular when all its elements are zero for l > i, and for m < j when l = i
• Left upper triangular when all its elements are zero for l < i, and for m > j when l = i
• Left lower triangular when all its elements are zero for l > i, and for m > j when l = i
The rotation around a vertical axis of a triangular V-matrix of kind III produces again a triangular V-matrix of kind III.
Examples of the three kinds of triangular V-matrices are shown in Figures 3 through 5. In the triangular V-matrices of kinds I and II a routing order by columns is applied, whereas in triangular V-matrices of kind III the routing order is by rows.

Figure 3. Right lower triangular V-matrix of kind I.
Figure 4. Right lower triangular V-matrix of kind II.
Figure 5. Right lower triangular V-matrix of kind III.

The arrangements in these figures allow us to explain the names given to the canonical triangular V-matrices. In fact, in left (right) triangular V-matrices the nonzero elements tend to occupy the positions on the left (right) part of each sub-V-vector. Finally, the attribute of upper (lower) is given to a triangular V-matrix according to its kind:
• In V-matrices of kind I, the matrix whose rows are equal to the first columns of sub-V-vectors of the first column is an upper (lower) triangular matrix.
• In V-matrices of kind II, the matrix whose rows are equal to the last columns of sub-V-vectors of the last column is an upper (lower) triangular matrix.
• In V-matrices of kind III, the matrix whose rows are equal to the first columns of sub-V-vectors of the first column and the matrix whose rows are equal to the last columns of sub-V-vectors of the last column are upper (lower) triangular matrices.
The following properties of triangular V-matrices can be derived by inspection:
• In right (left) strictly decreasing triangular V-matrices of kind I the matrix whose rows are equal to the last columns of sub-V-vectors of the last column is lower (upper) triangular.
• In right (left) strictly decreasing V-matrices of kind II the matrix whose rows are equal to the first columns of sub-V-vectors of the first column is upper (lower) triangular.
• The transposed V-matrix of a right (left) triangular V-matrix is a left (right) triangular V-matrix.
• The transposed V-matrix of an upper (lower) triangular V-matrix is a lower (upper) triangular V-matrix.
• The transposed V-matrix of a triangular V-matrix of kind I (II) (III) is a triangular V-matrix of kind I (II) (III).
• The product of two triangular V-matrices with the same triangular structure is still a V-matrix with the same triangular structure.
• The inverse V-matrix of a triangular V-matrix is a V-matrix with the same triangular structure.
The aforementioned V-matrices are not the unique triangular structures which satisfy these properties. In general, any routing order of the elements in a V-vector of type A defines a triangular structure for V-matrices A × A. In particular, for subsequent discussions, it is important to define the following triangular V-matrices.

Definition III.13 Given a triangular V-matrix with R rows, a "row k MOD R" triangular V-matrix is the V-matrix obtained by considering a routing order which starts from row k, instead of row 1, and scans the columns in a cyclic manner.

A "row k MOD R" triangular V-matrix can be defined for any canonical structure of kind I, II, and III. For example, the V-matrix \\a_{ijlm}// is a "row k MOD R" LUT III V-matrix if all its elements are zero for
$$\operatorname{mod}_R(l - k + 1) < \operatorname{mod}_R(i - k + 1), \qquad \text{and for } m > j \text{ when } l = i$$
An interesting property of such matrices is that the transposed V-matrix of a "row k MOD R" triangular V-matrix is still a "row k MOD R" triangular V-matrix.

D. Some Further Definitions and Fundamental Operations

The derivations of fast algorithms for adaptive Volterra filters, described in the next sections, make use of updating rules that require specific operations on additional V-matrix structures. In this subsection, first, some new notations are introduced. Second, how to execute the principal operations between these V-matrix structures is addressed.

Definition III.14 \a\i b/ is a V-vector where a is an element placed before the first element of the ith row of V-vector b.

Definition III.15 \a/i b/ is a V-vector where b is an element placed after the last element of the ith row of V-vector a.
Definition III.16
$$\begin{pmatrix} a & \mathbf{b}^T \\ \mathbf{c} & {}_i\, D \end{pmatrix}$$
is a V-matrix where c is a V-vector whose elements are placed before the first elements of the ith rows of the corresponding sub-V-vectors of D, and \a\i b^T/ is a sub-V-vector placed before the first sub-V-vector of the ith row of the V-matrix \\c\i D//.

Definition III.17
$$\begin{pmatrix} A & \mathbf{b} \\ \mathbf{c}^T & {}_i\, d \end{pmatrix}$$
is a V-matrix where b is a V-vector whose elements are placed after the last elements of the ith rows of the corresponding sub-V-vectors of A, and \c^T/i d/ is a sub-V-vector placed before the first sub-V-vector of the ith row of the V-matrix \\A/i b//.
The main operations on the V-matrix structures introduced by Definitions III.16 and III.17 are listed next.

Transposed V-matrices:
$$\begin{pmatrix} a & \mathbf{b}^T \\ \mathbf{c} & {}_i\, D \end{pmatrix}^{\!T} = \begin{pmatrix} a & \mathbf{c}^T \\ \mathbf{b} & {}_i\, D^T \end{pmatrix}
\qquad
\begin{pmatrix} A & \mathbf{b} \\ \mathbf{c}^T & {}_i\, d \end{pmatrix}^{\!T} = \begin{pmatrix} A^T & \mathbf{c} \\ \mathbf{b}^T & {}_i\, d \end{pmatrix}$$
These equations derive directly from the definition of a transposed V-matrix.

Products between two V-matrices:
$$\begin{pmatrix} a & \mathbf{b}^T \\ \mathbf{c} & {}_i\, D \end{pmatrix} \begin{pmatrix} e & \mathbf{f}^T \\ \mathbf{g} & {}_i\, H \end{pmatrix} = \begin{pmatrix} ae + \mathbf{b}^T\mathbf{g} & a\mathbf{f}^T + \mathbf{b}^T H \\ \mathbf{c}e + D\mathbf{g} & {}_i\; \mathbf{c}\mathbf{f}^T + DH \end{pmatrix}$$
$$\begin{pmatrix} A & \mathbf{b} \\ \mathbf{c}^T & {}_i\, d \end{pmatrix} \begin{pmatrix} E & \mathbf{f} \\ \mathbf{g}^T & {}_i\, h \end{pmatrix} = \begin{pmatrix} AE + \mathbf{b}\mathbf{g}^T & A\mathbf{f} + \mathbf{b}h \\ \mathbf{c}^T E + d\mathbf{g}^T & {}_i\; \mathbf{c}^T\mathbf{f} + dh \end{pmatrix}$$
These equations correspond to row × column products. These results are obtained by computing the product between every sub-V-vector of the first factor and of the transposed of the second factor.

Inverse V-matrices: The inverse V-matrices are obtained by solving the following equations:
$$\begin{pmatrix} a & \mathbf{b}^T \\ \mathbf{c} & {}_i\, D \end{pmatrix} \begin{pmatrix} u & \mathbf{v}^T \\ \mathbf{w} & {}_i\, Z \end{pmatrix} = \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & {}_i\, I \end{pmatrix}
\qquad
\begin{pmatrix} A & \mathbf{b} \\ \mathbf{c}^T & {}_i\, d \end{pmatrix} \begin{pmatrix} U & \mathbf{v} \\ \mathbf{w}^T & {}_i\, z \end{pmatrix} = \begin{pmatrix} I & \mathbf{0} \\ \mathbf{0}^T & {}_i\, 1 \end{pmatrix}$$
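In the degenerate case in which every row has length one, V-vectors and V-matrices reduce to ordinary vectors and matrices, and the bordered structures above reduce to ordinary block matrices. The following minimal numpy sketch (illustrative only, with arbitrary data) checks the bordered product formulas and the bordered inverse in that degenerate case.

import numpy as np

rng = np.random.default_rng(0)
a, b, c, D = 2.0, rng.normal(size=3), rng.normal(size=3), rng.normal(size=(3, 3))
e, f, g, H = -1.0, rng.normal(size=3), rng.normal(size=3), rng.normal(size=(3, 3))

M1 = np.block([[np.array([[a]]), b[None, :]], [c[:, None], D]])
M2 = np.block([[np.array([[e]]), f[None, :]], [g[:, None], H]])

# Block product assembled entry by entry, as in the product formulas above
P = np.block([[np.array([[a * e + b @ g]]), (a * f + b @ H)[None, :]],
              [(c * e + D @ g)[:, None], np.outer(c, f) + D @ H]])
assert np.allclose(P, M1 @ M2)

# The bordered inverse is simply the inverse of the bordered matrix
assert np.allclose(M1 @ np.linalg.inv(M1), np.eye(4), atol=1e-10)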
IV. V-Vectors for Volterra and Linear Multichannel Filters In this section we first extend the V-vector notation to Volterra filters of arbitrary order. Second, we show how V-vectors can be used to represent the input vectors to linear multichannel filters. As a consequence, both Volterra filters and linear multichannel filters can be considered as belonging to the same class of filters for which adaptation algorithms can be extended from adaptive linear filters by using V-vector algebra. More generally, such adaptation algorithms apply
to any filter whose output can be expressed as a linear function of an input V-vector that satisfies the time-shift property.
A. V-Vectors for pth-Order Volterra Filters

To define an input V-vector for pth-order homogeneous Volterra filters, we must arrange the products of p input samples in a suitable V decreasing structure. This structure is called the pth-order V-vector. This arrangement may be done in a simple manner by using a filter order/memory length recursion. In fact, we can demonstrate the following proposition.

Proposition IV.1 Passage from a pth-order input V-vector with a memory length of k − 1 samples to a pth-order input V-vector with a memory length of k samples requires the following steps:
1. Add to the (k − 1)th V-vector a right column of products of input samples according to the arranging rule used for the rows of this V-vector but translated one unit in time.
2. Add to the bottom of the (k − 1)th V-vector the vector $\mathbf{r}_n^{[k](p-1)} x(n)$ or the vector $\mathbf{v}_n^{[k](p-1)} x(n-k+1)$, where $\mathbf{r}_n^{[k](p-1)}$ indicates the last right column and $\mathbf{v}_n^{[k](p-1)}$ indicates the first left column of the (p − 1)th-order V-vector with a memory length of k samples. (Note that $\mathbf{r}_n^{[k](p-1)} x(n)$ and $\mathbf{v}_n^{[k](p-1)} x(n-k+1)$ contain the same elements.)

Proof. The products of input samples in a homogeneous Volterra filter with a memory length of k samples can be divided into three classes:
1. Products which belong to the V-vector with a memory length of (k − 1) samples
2. Products formed in the same manner used for the rows of the V-vector with a memory length of (k − 1) samples but translated one unit in time
3. Products which do not belong to the two previous classes and that, for this reason, must include both input samples x(n) and x(n − k + 1)
The proposed recursive procedure simply translates this class division into an element-arranging rule. In this way we must demonstrate only that the third class coincides with the collection of elements of $x(n)\mathbf{r}_n^{[k](p-1)}$ and $x(n-k+1)\mathbf{v}_n^{[k](p-1)}$. An element of these vectors cannot appear in the first two classes
because in every product both x(n) and x(n − k + 1) are present. Moreover, if we consider a product ξ of this third class, then ξ/x(n) is a (p − 1)th-order product with x(n − k + 1) as a factor, and for this reason it belongs to $\mathbf{r}_n^{[k](p-1)}$. In a similar way, ξ/x(n − k + 1) has x(n) as a factor and thus belongs to $\mathbf{v}_n^{[k](p-1)}$. ∎

Using the rule given in Proposition IV.1, we can easily form the pth-order input V-vector from the known (p − 1)th-order V-vector. An example is given in Figure 6, in which a third-order V-vector with a memory length of three samples, formed according to the previous arranging rule based on the vector $\mathbf{r}_n^{[k](p)}$, is shown.

Figure 6. Input-data V-vector of a third-order, three-sample memory length, homogeneous Volterra filter.

Naturally, the use of the vector $\mathbf{v}_n^{[k](p)}$ instead of $\mathbf{r}_n^{[k](p)}$ leads to a different formulation of the input V-vector. However, the resulting V-vector differs from the previous one only by a permutation of the elements of its rows. This permutation does not affect the development of the adaptation algorithms. It is also worth noting that the pth-order V-vector with a memory length of only one sample is equal to \x^p(n)/.
In practical applications we often must use nonhomogeneous Volterra filters. Thus, we are interested in finding the input V-vector for a general Pth-order Volterra filter with a memory length of N samples described by the following input–output relationship:
$$y(n) = \sum_{p=1}^{P}\,\sum_{m_1=0}^{N-1}\,\sum_{m_2=m_1}^{N-1}\cdots\sum_{m_p=m_{p-1}}^{N-1} h_{m_1,m_2,\ldots,m_p}\, x(n-m_1)\,x(n-m_2)\cdots x(n-m_p) \tag{41}$$
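A direct evaluation of Eq. (41) for the common second-order case (P = 2) is shown in the short Python sketch below; the kernels h1 and h2 are hypothetical placeholders, with h2 stored only on its upper-triangular support m2 ≥ m1 exactly as in Eq. (41).

def volterra2_output(x, n, h1, h2, N):
    y = 0.0
    for m1 in range(N):                      # linear kernel
        y += h1[m1] * x[n - m1]
    for m1 in range(N):                      # quadratic kernel, m2 >= m1
        for m2 in range(m1, N):
            y += h2[(m1, m2)] * x[n - m1] * x[n - m2]
    return y

N = 3
h1 = [2.0, 1.0, -0.5]
h2 = {(m1, m2): 0.1 for m1 in range(N) for m2 in range(m1, N)}
x = [0.0] * 10 + [1.0, 0.5, -0.2, 0.8]
print(volterra2_output(x, len(x) - 1, h1, h2, N))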
It is easy to show that the rules of V-vector algebra can also be applied to this general form of Volterra filters. In fact, the input V-vector for the nonhomogeneous Volterra filter in Eq. (41) can be derived by first forming the input V-vector of each of its homogeneous components and then arranging all the rows of these V-vectors in a unique V descendant structure. As an example, in Figure 7 the V-vector of a third-order Volterra filter with a memory length of three samples is represented.

Figure 7. Input-data V-vector of a third-order, three-sample memory length, nonhomogeneous Volterra filter.
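As a concrete illustration, the sketch below builds one possible quadratic (p = 2) input V-vector with memory length N: row j collects the products x(n − j)x(n − j), . . . , x(n − j)x(n − N + 1), so the row lengths decrease as N, N − 1, . . . , 1. This is only one arrangement consistent with the recursion of Proposition IV.1; as noted above, arrangements that differ by a permutation within each row are equally valid, and the exact layouts of Figures 6 and 7 may differ from this one.

def quadratic_v_vector(x, n, N):
    """One possible second-order input V-vector of type (N, N-1, ..., 1)."""
    return [[x[n - j] * x[n - m] for m in range(j, N)] for j in range(N)]

x = [0.1, -0.4, 0.9, 0.3, 0.7]
print(quadratic_v_vector(x, n=4, N=3))
# [[x(n)^2, x(n)x(n-1), x(n)x(n-2)], [x(n-1)^2, x(n-1)x(n-2)], [x(n-2)^2]]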
B. V-Vectors for Linear Multichannel Filters The essential feature offered by V-vector algebra is representation of the input samples to an adaptive filter in a way that permits us to satisfy the time-shift property which is at the basis of the derivation of many fast adaptation algorithms. This goal is reached by arranging the input samples in V-shaped matrices called V-vectors. Then, each row of the input V-vector can be considered as a channel of a filter bank. As a consequence, the updating of the filter coefficients can be divided into successive steps. At each step a different channel is considered and updated. The relevant aspect in this case is that all the filters whose output is linear with respect to the filter coefficients can be considered equivalent with respect to V-vector algebra. Such a class of filters includes the banks of linear filters in addition to Volterra filters. Therefore, V-vector algebra can also be used to deal with multichannel linear filters having, in particular, channels with different memory lengths. The input V-vector for the linear multichannel filter can be obtained by arranging the input vectors to the different channels in a unique V descendant structure according to the V-vector representation. The authors’ conclusion is that, because the developments exploiting the V-vector notation are the same for Volterra and
linear multichannel filters, the adaptation algorithms derived by using V-vector algebra apply to any filter of this class. These considerations make it possible to derive, in addition to the easy extension of adaptive algorithms already known for linear filters, new powerful adaptation techniques for Volterra and linear multichannel filters.
It is worth noting that various multichannel approaches have been proposed to deal with linear and nonlinear filters considered as filter banks. In the nonlinear framework, a multichannel technique has been exploited to extend classical adaptation algorithms used for linear filters to Volterra filters (Lee and Mathews, 1993; Mathews, 1991; Syed and Mathews, 1993, 1994). In this multichannel approach, the Volterra filter is simply realized by means of a linear filter bank where each filter processes a product of samples of the input signal.
Another application of the multichannel approach can be found in the paper by Giannakis and Serpedin (1997) on the blind equalization of nonlinear channels modeled as truncated Volterra filters. A general approach for blind equalization and identification of nonlinear single-input multiple-output truncated Volterra filters is presented in the aforementioned paper. Although blind deconvolution of truncated Volterra channels is impossible with a single output, it becomes possible when multiple outputs are available. The approach requires some conditions on the input sequence and the channel transfer matrix. The important result is that the nonlinear channels are equalized with linear FIR filters. This fact can be justified intuitively because the vector equalizer can be seen as a beam former which, because of its diversity, is capable of nulling the nonlinearities and equalizing the linear part.
Finally, a diagonal coordinate representation for Volterra filters which presents some analogies with V-vector notation was developed by Raz and Van Veen (1998). The diagonal coordinate representation allows a truncated Volterra filter to be characterized as a bank of linear filters whose coefficients are defined by the diagonal entries of its kernels. This representation was exploited to derive efficient realizations for band-limited input signals. Band-limited inputs frequently occur in applications of Volterra filters to problems such as equalization and linearization of nonlinear channels. The diagonal coordinate representation offers fast convolution-based implementations of Volterra filters together with insights into the relationship between the characteristics of the output in the frequency domain and the filter parameters. Efficient implementations for processing carrier-based input signals were presented by Raz and Van Veen, in which downsampling was used to reduce the computational complexity. The same approach is used to develop efficient implementations for processing continuous-time carrier-based signals, pulse amplitude modulated signals, and frequency division multiplexed input signals. In the authors' opinion, these and other applications of Volterra filters could benefit from the novel algebraic description based on V-vector notation.
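The multichannel view described above is easy to make concrete: each channel of a linear filter bank contributes one row of the input V-vector, so channels with different memory lengths simply produce rows of different lengths, and the filter output is again the inner product of a coefficient V-vector with the input V-vector. The following minimal Python sketch (with ad hoc names and invented data) illustrates this.

def multichannel_v_vector(channels, n, memories):
    """channels: per-channel sample sequences; memories: per-channel memory lengths."""
    return [[ch[n - k] for k in range(m)] for ch, m in zip(channels, memories)]

def filter_output(w, x_v):
    return sum(wi * xi for rw, rx in zip(w, x_v) for wi, xi in zip(rw, rx))

ch1 = [0.0, 0.1, 0.2, 0.3, 0.4]
ch2 = [1.0, 0.9, 0.8, 0.7, 0.6]
xv = multichannel_v_vector([ch1, ch2], n=4, memories=[3, 2])
w = [[0.5, 0.25, 0.125], [1.0, -1.0]]
print(xv, filter_output(w, xv))   # [[0.4, 0.3, 0.2], [0.6, 0.7]] 0.2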
V. A Novel Givens Rotation–Based Fast QR-RLS Algorithm Many real-time signal-processing problems, such as adaptive filtering and prediction as well as system identification, can be solved by means of RLS algorithms (Bierman, 1977; Haykin, 1991). However, sometimes RLS algorithms exhibit unacceptable numerical performances in limited-precision environments. The problem is exacerbated by polynomial filter structures. Numerical problems during recursion can particularly be experienced in “fast” RLS algorithms. The original RLS algorithm requires a computational complexity that grows with the square of the number of coefficients N T . On the contrary, the fast RLS algorithms for Volterra or linear multichannel filters require a computational complexity that grows proportionally with L N T , where L is the number of “channels” of the Volterra or the multichannel linear filter (i.e., the number of rows of the input-data V-vector). A number of algorithms which overcome the numerical instability problem of fast RLS algorithms have appeared in the literature (Ling, 1991; Liu, 1995; Proudler, 1994; Proudler et al., 1991; Regalia and Bellanger, 1991; Rontogiannis and Theodoridis, 1996, 1998; and Slock and Kailath, 1991; to cite just some first solutions). Popular fast and stable RLS algorithms such as the numerically stable fast transversal filter (SFTF) (Slock and Kailath, 1991) and the fast lattice QR decomposition algorithms (Proudler, 1994; Proudler et al., 1991; Regalia and Bellanger, 1991; Rontogiannis and Theodoridis, 1996, 1998) reduce the computational complexity to O(L N T ) operations per time instant. Other stable algorithms—such as the QR decomposition–based algorithms (Alexander and Ghirnikar, 1993; Frantzeskakis and Liu, 1994) and the square-root (SQR) Schur RLS adaptive filters (Strobach, 1991, 1994)— require, for their fast implementation, a systolic array of processing elements. In this section, by employing the V-vector formalism, we derive a novel fast QR-RLS algorithm based on Givens rotations. This novel algorithm belongs to the same family of SQR RLS algorithms developed by Carini (1996, 1997), Carini and Mumolo (1997), and Carini et al. (1999) but differs from these algorithms in that the adaptation is based on the a priori backward prediction error vector rather than the a posteriori backward prediction error vector. The algorithm is based on the derivation of two different Cholesky SQR factorizations of the autocorrelation matrix Ωn . Actually, in classical RLS algorithms, numerical instabilities arise because, as a result of the finite precision of processors and of error propagation, the autocorrelation matrix loses its properties—namely, its symmetry and positive definiteness. In fast RLS algorithms, instabilities may appear because the Kalman gain vector, derived directly from Ωn , at a certain moment cannot be associated with any positivedefinite autocorrelation matrix. However, in SQR algorithms, this problem is avoided by directly updating a SQR factor of Ωn or a quantity related to this
factor. In fact, in this way, we implicitly impose the symmetry and positive definiteness of the autocorrelation matrix. However, the SQR technique alone is not sufficient to achieve the numerical stability of the algorithm, which also depends on the numerically robust computation of each algorithm parameter. Extensive experimentations have shown that the proposed algorithm exhibits excellent robustness in limited-precision environments, and adaptive filtering with the mantissa rounded to 4 bits has been performed for more than 10 million samples without any instability. The proposed algorithm is closely connected with the classical fast QR and lattice QR algorithms (Alexander and Ghirnikar, 1993; Bellanger, 1989; Cioffi, 1990; Frantzeskakis and Liu, 1994; Haykin, 1991; Ling, 1991; Liu, 1995; Proudler, 1994; Proudler et al., 1991; Regalia and Bellanger, 1991; Rontogiannis and Theodoridis, 1996, 1998; Terr`e and Bellanger, 1994). As in QR algorithms, the QR-RLS algorithm has a Q Givens rotation matrix and an R triangular matrix, which is the Cholesky factor of the autocorrelation matrix. However, in contrast with QR algorithms, in QR-RLS algorithms the derivation of the filter is algebraic, based on the relationship between two different SQR factorizations of the extended autocorrelation matrix. It is worth stressing that the proposed algorithm does not determine the filter coefficients of a transversal filter but the coefficients of a lattice realization. Thus, the algorithm can be applied to system identification applications as well as to adaptive filtering and prediction applications. In particular, a direct dependency of the prediction error from the input signal makes the algorithm suitable for adaptive differential pulse code modulation (ADPCM) applications in signal coding. In what follows, V-vector algebra is used extensively in all derivations. First, the RLS adaptive filtering problem for Volterra filters is reviewed. Second, the novel algorithm is derived, and, third, some experimental results that substantiate the good numerical performance of the algorithm are presented.
A. Review of RLS Adaptive Filtering

The output of a time-varying Volterra filter is given by
$$d_n(k) = \mathbf{w}_n^T \mathbf{x}_k \tag{42}$$
where $\mathbf{w}_n$ is the Volterra filter coefficient V-vector and $\mathbf{x}_k$ is the input-data V-vector as computed in Proposition IV.1. Let d(k) be the desired response signal. The objective is to compute the coefficient V-vector $\mathbf{w}_n$ in such a way that the filter output is as close as possible to the desired response signal. This
leads to minimizing the exponentially weighted cost function
$$J_n = \sum_{k=0}^{n} \lambda^{n-k}\left|d(k) - \mathbf{w}_n^T \mathbf{x}_k\right|^2 \tag{43}$$
at each time instant n, where λ is a forgetting factor that controls the speed of tracking time-varying signals. The solution of the minimization problem is given by (Haykin, 1991)
$$\mathbf{w}_n = \mathbf{\Omega}_n^{-1}\,\mathbf{p}_n \tag{44}$$
where
$$\mathbf{\Omega}_n = \sum_{k=0}^{n} \lambda^{n-k}\,\mathbf{x}_k\mathbf{x}_k^T \qquad\text{and}\qquad \mathbf{p}_n = \sum_{k=0}^{n} \lambda^{n-k}\,\mathbf{x}_k\, d(k) \tag{45}$$
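For reference, a brute-force solution of Eqs. (43)-(45) is sketched below for a flattened input vector (the V-vector read out row by row). The autocorrelation and cross-correlation are accumulated recursively as Omega_n = lambda*Omega_{n-1} + x_n x_n^T and p_n = lambda*p_{n-1} + x_n d(n), and Eq. (44) is then solved directly; the O(N_T^2) cost of exactly this step is what the fast algorithms of this section avoid. All names and data below are illustrative only.

import numpy as np

def weighted_ls(xs, ds, lam=0.99, delta=1e-3):
    n_t = xs[0].size
    omega = delta * np.eye(n_t)           # small regularization of Omega_0
    p = np.zeros(n_t)
    for x, d in zip(xs, ds):
        omega = lam * omega + np.outer(x, x)
        p = lam * p + x * d
    return np.linalg.solve(omega, p)      # w_n = Omega_n^{-1} p_n, Eq. (44)

rng = np.random.default_rng(1)
xs = [rng.normal(size=4) for _ in range(200)]
w_true = np.array([1.0, -0.5, 0.25, 0.0])
ds = [x @ w_true + 0.01 * rng.normal() for x in xs]
print(weighted_ls(xs, ds))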
are the autocorrelation and cross-correlation V-matrices, respectively. Our objective is to develop a recursive version of Eq. (44) such that the number of operations per time instant is minimal and such that the recursion is numerically stable.
In this article, so that a fast algorithm with $L N_T$ operations per sample can be obtained, the L rows of the input-data V-vector (the "channels" of the Volterra or multichannel linear filter) are updated in a sequential manner as in Syed and Mathews (1993). Practically, the filter update is divided into L steps and at each step a different channel is taken under consideration and updated. The term $\mathbf{x}_{n,i}$ indicates the input-data V-vector in which only the first i rows/channels have been updated at time instant n; for i = L it is $\mathbf{x}_{n,L} = \mathbf{x}_n$ and for i = 0 it is $\mathbf{x}_{n,0} = \mathbf{x}_{n-1}$. All quantities with the subscript i are referred to the ith step and thus to the input-data V-vector $\mathbf{x}_{n,i}$. Particularly,
$$\mathbf{\Omega}_{n,i} = \sum_{k=0}^{n} \lambda^{n-k}\,\mathbf{x}_{k,i}\mathbf{x}_{k,i}^T \tag{46}$$
The time-shift property and the relation between the forward and backward predictors are often employed for the derivation of fast RLS algorithms. The forward predictor at step i, $\mathbf{a}_{n,i}$, is defined as the filter which estimates $v_{n,i}$ (the ith element of vector $\mathbf{v}_n$) from $\mathbf{x}_{n,i-1}$. Similarly, the backward predictor $\mathbf{b}_{n,i}$ estimates $r_{n-1,i}$ (the ith element of vector $\mathbf{r}_{n-1}$) from $\mathbf{x}_{n,i}$. The forward and backward prediction errors and the estimation error are given, respectively, by
$$f_{n,i}(k) = v_{k,i} + \mathbf{a}_{n,i}^T \mathbf{x}_{k,i-1} \tag{47}$$
$$b_{n,i}(k) = r_{k-1,i} + \mathbf{b}_{n,i}^T \mathbf{x}_{k,i} \tag{48}$$
$$e_n(k) = d(k) + \mathbf{w}_n^T \mathbf{x}_{k,L} \tag{49}$$
The Kalman gain vector or V-vector $\mathbf{c}_{n,i}$ plays a fundamental role in the development of classical fast algorithms (Haykin, 1991). The definition of the Kalman gain V-vector is as follows:
$$\mathbf{c}_{n,i} = \mathbf{\Omega}_{n,i}^{-1}\,\mathbf{x}_{n,i} \tag{50}$$
From Eq. (44), the gain V-vector can be viewed as a predictor which estimates the pinning sequence from $\mathbf{x}_{k,i}$ (Haykin, 1991). The corresponding prediction error, called the likelihood variable, is reported in Eq. (51):
$$\gamma_{n,i} = 1 - \mathbf{c}_{n,i}^T \mathbf{x}_{n,i} \tag{51}$$
The likelihood variable assumes a great importance in all fast transversal filter algorithms. In fact it monitors the numerical stability of the algorithm itself. According to Haykin (1991), $\gamma_{n,i}$ is a real value bounded by zero and one—$0 \le \gamma_{n,i} \le 1$—and instability arises when $\gamma_{n,i}$ exceeds these bounds as a result of the finite precision of processors and of error propagation. The likelihood variable is employed for the computation of the a posteriori forward and backward prediction or estimation errors from the a priori prediction or estimation errors. Indeed, the following equations hold (Haykin, 1991):
$$f_{n,i}(n) = \gamma_{n,i-1}\, f_{n-1,i}(n) \tag{52}$$
$$b_{n,i}(n) = \gamma_{n,i}\, b_{n-1,i}(n) \tag{53}$$
$$e_n(n) = \gamma_{n,L}\, e_{n-1}(n) \tag{54}$$
It is worth recalling that the forward and backward predictors and the estimation filter can be recursively estimated as (Haykin, 1991)
$$\mathbf{a}_{n,i} = \mathbf{a}_{n-1,i} - \mathbf{c}_{n,i-1}\, f_{n-1,i}(n) \tag{55}$$
$$\mathbf{b}_{n,i} = \mathbf{b}_{n-1,i} - \mathbf{c}_{n,i}\, b_{n-1,i}(n) \tag{56}$$
$$\mathbf{w}_n = \mathbf{w}_{n-1} - \mathbf{c}_{n,L}\, e_{n-1}(n) \tag{57}$$
The terms $v_{k,i}$ and $r_{k-1,i}$ are used in the definition of the augmented (extended) input V-vector $\bar{\mathbf{x}}_{n,i}$, which is obtained by placing $v_{n,i}$ in the left of the ith row of $\mathbf{x}_{n,i-1}$ or by placing $r_{n-1,i}$ in the right of the ith row of $\mathbf{x}_{n,i}$:
$$\bar{\mathbf{x}}_{n,i} = \backslash v_{n,i}\backslash_i\, \mathbf{x}_{n,i-1}/ \;=\; \backslash \mathbf{x}_{n,i}/_i\, r_{n-1,i}/ \tag{58}$$
Finally, the autocorrelations of the forward and backward prediction errors are given by (Haykin, 1991)
$$\alpha_{n,i} = \sum_{k=0}^{n} \lambda^{n-k} f_{n,i}^2(k) = \lambda\alpha_{n-1,i} + f_{n-1,i}(n)\, f_{n,i}(n) \tag{59}$$
$$\beta_{n,i} = \sum_{k=0}^{n} \lambda^{n-k} b_{n,i}^2(k) = \lambda\beta_{n-1,i} + b_{n-1,i}(n)\, b_{n,i}(n) \tag{60}$$
Fast RLS algorithms are usually derived by exploiting the relationships between the previously described quantities.
B. The Volterra Givens Rotation–Based Fast QR-RLS Filter

In this subsection, the authors propose a new algorithm for adaptive Volterra prediction and filtering. The fast QR algorithm is based on two different factorizations of the extended autocorrelation V-matrix. First, the two factorizations are derived. Then, the different quantities employed for development of the algorithm are introduced.

Proposition V.1 The extended autocorrelation matrix
$$\bar{\mathbf{\Omega}}_{n,i} = \sum_{k=0}^{n} \lambda^{n-k}\,\bar{\mathbf{x}}_{k,i}\bar{\mathbf{x}}_{k,i}^T \tag{61}$$
can be factorized in the following two ways:
$$\bar{\mathbf{\Omega}}_{n,i} = \mathbf{R}_{n,i}^T\mathbf{R}_{n,i} = \bar{\mathbf{R}}_{n,i}^T\bar{\mathbf{R}}_{n,i} \tag{62}$$
where
$$\mathbf{R}_{n,i} = \begin{pmatrix} \alpha_{n,i}^{1/2} & \mathbf{0}^T \\ -\mathbf{R}_{n,i-1}\mathbf{a}_{n,i} & {}_i\; \mathbf{R}_{n,i-1} \end{pmatrix} \tag{63}$$
$$\bar{\mathbf{R}}_{n,i} = \begin{pmatrix} \mathbf{R}_{n,i} & -\mathbf{R}_{n,i}\mathbf{b}_{n,i} \\ \mathbf{0}^T & {}_i\; \beta_{n,i}^{1/2} \end{pmatrix} \tag{64}$$
such that*
$$\mathbf{R}_{n,i}^{-T} = \begin{pmatrix} \alpha_{n,i}^{-1/2} & \alpha_{n,i}^{-1/2}\mathbf{a}_{n,i}^T \\ \mathbf{0} & {}_i\; \mathbf{R}_{n,i-1}^{-T} \end{pmatrix} \tag{65}$$
$$\bar{\mathbf{R}}_{n,i}^{-T} = \begin{pmatrix} \mathbf{R}_{n,i}^{-T} & \mathbf{0} \\ \beta_{n,i}^{-1/2}\mathbf{b}_{n,i}^T & {}_i\; \beta_{n,i}^{-1/2} \end{pmatrix} \tag{66}$$
*In what follows, the inverse transposed V-matrix of R is indicated by $\mathbf{R}^{-T}$.
CARINI ET AL.
Proof. Let us first prove Eq. (63). The extended autocorrelation matrix is given by n T ¯ n,i = Ω λn−k vk,i \i xk,i−1 · vk,i \i xk,i−1 k=0
=
n
λ
k=0
n−k
2 vk,i
xk,i−1 vk,i
T vk,i xk,i−1
i
T xk,i−1 xk,i−1
(67)
It is straightforward to demonstrate that (Carini and Mumolo, 1997) n k=0
n k=0
n k=0
Thus, ¯ n,i = Ω
2 T λn−k vk,i = αn,i + an,i Ωn,i−1 an,i
(68)
λn−k xk,i−1 vk,i = −Ωn,i−1 an,i
(69)
T λn−k xk,i−1 xk,i−1 = Ωn,i−1
(70)
T αn,i + an,i Ωn,i−1 an,i
−Ωn,i−1 an,i
T −an,i Ωn,i−1 i
Ωn,i−1
(71)
It is easy to verify that a SQR factorization of the extended autocorrelation T ¯ n,i = V-matrix Ω Rn,i is given by Rn,i 1/2 0T αn,i (72) Rn,i = −Rn,i−1 an,i i Rn,i−1 We can demonstrate Eq. (64) in the same way.
䊏
All the preceding relations are proved to be correct regardless of the type T M of the M × M V-matrix R. In the rest of this subsection, however, Rn,i is a row (i + 1) MOD L LUT II V-matrix with positive diagonal elements such that starting from a row 1 MOD L LUT II V-matrix, after L iterations we again obtain a row 1 MOD L LUT II V-matrix. Moreover, for construction the matrix ¯ T is a row (i + 1) MOD L LUT II V-matrix. R n,i The role played in the classical fast RLS algorithm by the Kalman gain V-vector and the input-data V-vector is played in this case by the V-vector dn,i defined as dn,i = R−T n−1,i xn,i
(73)
V-VECTOR ALGEBRA AND VOLTERRA FILTERS
35
Two different extended V-vectors d can be defined in correspondence to V-matrices (65) and (66), as described in Eqs. (74) and (75): −1/2 (74) vn,i xn,i−1 = αn−1,i f n−1,i (n) dn,i−1 R−T dn,i = n−1,i · i i −1/2 ¯ −T (75) xn,i rn−1,i = dn,i βn−1,i bn−1,i (n) d¯ n,i = R n−1,i · i
i
T ¯ T are row (i + 1) From Eq. (75) and from the hypothesis that Rn,i and R n,i MOD L LUT II V-matrices, we have dn,i as the normalized a priori backward prediction error V-vector.∗ ¯ n−1,i do not coincide but differ by a rotation The V-matrices Rn−1,i and R ¯ ¯ −T R−T V-matrix Qi (Qi QiT = I ), and even the couples n−1,i , Rn−1,i and dn,i , dn,i differ by the same V-matrix. If we estimate the Qi that allows the passage from Eq. (63) to Eq. (64), then it is trivial to update dn,i from dn,i−1 . This rotation matrix will be decomposed into Givens rotations (the Givens rotations are disRn−1,i is a row × column cussed in Appendix I). Because the product Qi product, we have to proceed on the columns of Rn−1,i ; that is, the rows T : (sub-V-vectors) of Rn−1,i
T 0 Rn−1,i ¯T = R 1/2 n−1,i −(Rn−1,i bn−1,i )T i βn−1,i
1/2 T αn−1,i −(Rn−1,i−1 an−1,i ) T T = Rn−1,i Qi = · QiT (76) T 0 R n−1,i−1 i
In particular we have to annihilate some elements of the sub-V-vector 1/2 αn−1,i −Rn−1,i−1 an−1,i (77) i
while preserving the row (i + 1) MOD L LUT II structure of the remaining 1/2 T T part of Rn−1,i determined by Rn−1,i−1 . For this purpose we use αn−1,i as a pivot element and we have to rotate on this pivot all the elements at its right. Thus, the V-vector of Eq. (77) is scanned by right columns from right to left and (in a cyclic manner) from the ith row up to the (i + L − 1)th MOD L row (note that we stop scanning when we encounter the pivot). T ¯ n−1,i the ithAfter we apply the Givens rotations, if we discard from R row, last-column sub-V-vector, and from every sub-V-vector the ith-row, lastT , which is a row (i + 1) MOD L LUT II column element, we obtain Rn−1,i ∗ The algorithm proposed in this article differs from the algorithm of Carini et al. (1999) in the definition of dn,i . In Carini et al., dn,i is the normalized a posteriori backward prediction error V-vector.
36
CARINI ET AL.
V-matrix. Furthermore we can see that the update of dn,i requires only the knowledge of the Qi V-matrix (i.e., of V-vector (77)), without the need to T . build up Rn−1,i From the knowledge of dn,i we can update all the other parameters of the algorithm. Proposition V.2 The a priori forward prediction error can be written as follows: T dn,i−1 f n−1,i (n) = vn,i + zn−1,i
(78)
where zn−1,i is defined as zn−1,i = Rn−1,i−1 an−1,i
(79)
which is the prediction V-vector that estimates vn,i from the normalized a priori backward prediction V-vector dn,i−1 . Moreover, the a priori estimation error can be written as follows: T dn,L en−1 (n) = d(n) + hn−1
(80)
hn−1 = Rn−1,L wn−1
(81)
where we have defined
which is the coefficient V-vector of the filter that estimates the desired signal d(n) from the normalized a priori backward prediction V-vector dn,L . Proof. Equation (78) can easily be proved as follows: T T Rn−1,i−1 R−T f n−1,i (n) = vn,i + an−1,i n−1,i−1 xn,i−1 T = vn,i + Rn−1,i−1 an−1,i dn,i−1 T = vn,i + zn−1,i dn,i−1
The proof of Eq. (80) can be similarly derived.
䊏
Before deriving the adaptation equations of zn,i and hn , we should prove some useful results that are later applied for the derivation of the adaptation equations. Proposition V.3
The SQR factor Rn,i−1 is equal to the expression of Eq. (82): √ (82) Rn,i−1 = λTn,i−1 Rn−1,i−1
T is a row i MOD L LUT II V-matrix such that where Tn,i−1 T T Tn,i−1 Tn,i−1 = I + λ−1 dn,i−1 dn,i−1
(83)
V-VECTOR ALGEBRA AND VOLTERRA FILTERS
37
Proof. We have T T T Rn,i−1 = λRn−1,i−1 Rn−1,i−1 + xn,i−1 xn,i−1 Rn,i−1 T T = λRn−1,i−1 I + λ−1 dn,i−1 dn,i−1 Rn−1,i−1 T T = λRn−1,i−1 Tn,i−1 Tn,i−1 Rn−1,i−1
Therefore, from the uniqueness of the SQR factorization when the factors are triangular V-matrices with positive diagonal elements, we immediately prove Eq. (82). 䊏 −1 Proposition V.4 The inverse likelihood variable γn,i is given by the following equation: −1 T γn,i = 1 + λ−1 dn,i dn,i
or equivalently by the recursive equation 2 −1/2 2 −1/2 −1 −1 γn,i = γn−1,i + λ−1 αn−1,i f n−1,i (n) − βn−1,i bn−1,i (n)
(84)
(85)
Proof. From Eqs. (50) and (51) we have T γn,i = 1 − xn,i Ω−1 n,i xn,i
−T T = 1 − xn,i R−1 n,i Rn,i xn,i
−T T = 1 − λ−1 dn,i T−1 n,i Tn,i dn,i T T −1 I + λ−1 dn,i dn,i = 1 − λ−1 dn,i dn,i T dn,i dn,i T I− = 1 − λ−1 dn,i dn,i T dn,i λ + dn,i
=
1 T dn,i 1 + λ−1 dn,i
T ¯ T dn,i . dn,i = which proves Eq. (84). Eq. (85) follows from the fact that d¯ n,i dn,i Therefore, from Eqs. (74) and (75) and Eq. (84) we immediately derive the result of Eq. (85). 䊏
Note that the update of γn,i is critical for the numerical stability of the algorithm. If γn,i is evaluated with Eq. (85), an error accumulation on the likelihood variable determines a long-term numerical instability of the algorithm. On the contrary, Eq. (84) does not determine any error accumulation. Therefore, it can be employed for the derivation of numerically robust adaptive filters. As for the adaptation of zn,i and hn , the following proposition holds.
38
CARINI ET AL.
Proposition V.5 The V-vectors zn,i and hn can be adapted with Eqs. (86) and (87), respectively: √ (86) zn,i = λTn,i−1 (zn−1,i − λ−1 dn,i−1 f n,i (n)) √ hn = λTn,L (hn−1 − λ−1 dn,L en (n)) (87) Proof. To prove these equations, we first must prove that −1/2 R−T Tn,i−1 dn,i−1 γn,i−1 n,i−1 xn,i−1 = λ
(88)
From Eq. (82) we have −1/2 −T R−T Tn,i−1 R−T n,i−1 xn,i−1 = λ n−1,i−1 xn,i−1
−T −T = λ−1/2 Tn,i−1 T−1 n,i−1 Tn,i−1 Rn−1,i−1 xn,i−1 −1 −T T = λ−1/2 Tn,i−1 I + λ−1 dn,i−1 dn,i−1 Rn−1,i−1 xn,i−1 T dn,i−1 dn,i−1 = λ−1/2 Tn,i−1 I − R−T n−1,i−1 xn,i−1 T λ + dn,i−1 dn,i−1
= λ−1/2 Tn,i−1 dn,i−1 γn,i−1 According to Eqs. (52), (55), (79), and (88) we have zn,i = Rn,i−1 (an−1,i − cn,i−1 f n−1,i (n)) √ = λTn,i−1 zn−1,i − R−T n,i−1 xn,i−1 f n−1,i (n) √ = λTn,i−1 zn−1,i − λ−1/2 Tn,i−1 dn,i−1 γn,i−1 f n−1,i (n) √ = λTn,i−1 (zn−1,i − λ−1 dn,i−1 f n,i (n)) which proves Eq. (88). Equation (87) can be proved in a similar manner, from Eqs. (54), (57), (81), and (88). 䊏 The desired signal estimation is produced by a joint process which acts only after a complete update cycle of the prediction scheme. Furthermore, because dn is the normalized a priori backward prediction error filter, the joint process part coincides with that of normalized lattice RLS and fast QR algorithms based on a priori prediction errors. So even if the filter coefficient vector is not directly evaluated, we can still apply this algorithm for system identification as well as for prediction and filtering. In particular a direct dependency of the forward a priori prediction error at time n from the input sample at the same time makes the algorithm suitable for the ADPCM application in signal coding, as is shown in Section VI, even if this direct dependency is paid for with a structure that cannot be pipelined because of the presence of sums of products.
V-VECTOR ALGEBRA AND VOLTERRA FILTERS
39
TABLE 1
Algorithm for Computing the y = Tx Product^a

z = 0
From the last right column of the V-vector to the first
    From the ith row to the (i + L − 1) MOD L row, or from the first to the last row if the column has less than i elements, compute:
        z_{hk} = z + x_{hk} d_{hk}
        z = z_{hk}
From the first right column of the V-vector to the last
    From the (i − 1)th row to the (i − L) MOD L row, or from the last to the first row if the column has less than i elements, compute:
        c d_{hk}
        l_{hk} = 1 + (c d_{hk}) d_{hk}
        y_{hk} = l_{hk}^{-1/2} (x_{hk} + (c d_{hk}) z_{hk})
        c = c / l_{hk}
^a $T^T T = I + c\,\mathbf{d}\mathbf{d}^T$, and $T^T$ is a row i MOD L LUT II V-matrix.
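A brute-force reference for the quantity computed by Table 1 in the degenerate case of an ordinary vector is sketched below: it forms a triangular factor of $I + c\,\mathbf{d}\mathbf{d}^T$ explicitly with a Cholesky factorization, which the O(N) recursion of Table 1 is designed to avoid. Because the chapter's T is the particular factor tied to the row i MOD L LUT II routing order, this reference reproduces the defining property $T^T T = I + c\,\mathbf{d}\mathbf{d}^T$ rather than the exact routing, so it is meant only for checking invariants such as $\|y\|^2$; names and data are illustrative.

import numpy as np

def apply_T_reference(x, d, c):
    m = np.eye(d.size) + c * np.outer(d, d)   # T^T T
    t = np.linalg.cholesky(m).T               # an upper-triangular T with T^T T = m
    return t @ x

x = np.array([1.0, 2.0, -1.0])
d = np.array([0.5, -0.2, 0.1])
c = 1.0 / 0.98                                # c plays the role of lambda^{-1} in Eqs. (86)-(87)
y = apply_T_reference(x, d, c)
print(y, y @ y - (x @ x + c * (d @ x) ** 2))  # the second value should be ~0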
An efficient procedure for computing the $T_{n,i-1}\mathbf{x}$ product of Eqs. (86) and (87) has been developed from the Agee–Turner algorithm (Bierman, 1977) and is presented in Table 1 (Carini, 1997). Appendix II explains how the algorithm of Table 1 can be derived.
Regarding the initialization of the algorithm, we can choose
$$\mathbf{d}_{1,0} = \mathbf{0}, \qquad \mathbf{z}_{0,i} = \mathbf{0}, \qquad \mathbf{h}_0 = \mathbf{0}, \qquad \gamma_{1,0} = 1, \qquad \alpha_{0,i} = \delta \ \text{ with } \ \delta \ll 1$$
This choice leads to a limited memory of initial conditions during the transitory convergence period, but also to sharply varying parameters in the same period, which can overflow the computational precision of processors, especially when there is limited word length. This problem can be avoided by taking $\alpha_{0,i} \gg 1$. This initialization gives slowly varying parameters during the transitory convergence period, which is extended proportionally to $\alpha_{0,i}$.
The final algorithm and the operation count of each equation are presented in Table 2. The total computational burden of the algorithm in the case of a strictly decreasing V descendant data vector is $(11 + \frac{1}{3})L N_T$ multiplications, $(3 + \frac{1}{3})L N_T$ divisions, and $(1 + \frac{2}{3})L N_T$ square roots. The addition count is comparable to the multiplication count. The computational complexity of the
CARINI ET AL. TABLE 2 Algorithm for Computing the Givens Rotation–Based Fast QR-RLS Filter Operationa Algorithm dn,0 = dn−1,L γn,0 = γn−1,L For i = 1 to L, compute T f n−1,i (n) = vn,i + zn−1,i dn,i−1
−1 f n,i (n) = f n−1,i (n)/γn,i−1 −1 mn,i = zn−1,i − λ dn,i−1 f n,i (n)
Qi
from
1/2
\αn−1,i \i −zn−1,i /
−1/2 \dn,i /i βn−1,i bn−1,i (n)/ −1/2 = Qi · \αn−1,i f n−1,i (n)\i dn,i−1 /
√ zn,i = λTn,i−1 mn,i αn,i = λαn−1,i + f n−1,i (n) f n,i (n) −1 T γn,i = 1 + λ−1 dn,i dn,i
√ End For λTn,L hn−1 T dn,L en−1 (n) = d(n) − hn−1 en (n) = γn,L en−1 (n) √ hn = λTn,L (hn−1 + λ−1 dn,L en (n)) Total
×
÷
√
L NT L NT 2 L NT 3
2 L NT 3
8 L NT 3
4 L NT 3
5L N T
2L N T
L NT
L NT 3N T NT
NT
NT 34 L NT 3
10 L NT 3
5 L NT 3
a Listed in the second, third, and fourth columns is the cost of the main term of each operation in terms of multiplications, divisions, and square roots.
algorithm presented in this section is similar to the computational complexity of the algorithms described in Syed and Mathews (1993) and in Rontogiannis and Theodoridis (1996). A different formulation of the algorithm which does not require any square root has been developed and is similar to the UDUT algorithms presented in Carini (1997). C. Experimental Results The numerical stability of the algorithm has been verified by several experiments with different types of data signals. A finite-precision arithmetic was simulated as in Proudler et al. (1991), by implementing a floating-point
Figure 8. Arithmetic mean of the a priori forward prediction mean square error as a function of time. The arithmetic mean was evaluated over 1000 different non-white-Gaussian-noise signals, whereas the mean square error was computed on data segments of 10 samples.
arithmetic with mantissa precisions of 16, 8, and 4 bits, respectively. The longest simulation, performed with a 4-bit mantissa, had more than 10 million samples, and in none of all the considered simulations was any instability observed. Figure 8 shows, as a function of time, the arithmetic mean of the a priori forward prediction mean square error of a second-order Volterra filter with the linear part having a memory length of 10 samples and the quadratic part having a memory length of 3 samples. The arithmetic mean was evaluated over 1000 different non-white-Gaussian-noise signals, whereas the mean square error was computed on data segments of 10 samples. All noise signals were obtained by filtering a zero mean, unit variance white Gaussian noise N (n) with the cascade of a linear filter and a Volterra filter given by x(n) = N (n) + 0.9x(n − 1)
(89)
$$y(n) = 2x(n) + x(n-1) - 0.5x(n-2) + 0.02x^2(n) + 0.01x(n)x(n-2) \tag{90}$$
and a unit variance white Gaussian noise was added to y(n). The different plots refer to different mantissa precisions of the processor. Figure 8 illustrates how the word length affects the performance of the algorithm and shows the
good convergence properties even with a low-mantissa precision. The same performance can be obtained from both an 8-bit mantissa word length and the standard floating-point precision. VI. Nonlinear Prediction and Coding of Speech and Audio by Using V-Vector Algebra and Volterra Filters The understanding of the physical processes which underlie the production of speech and audio signals is important in communication, medicine, consumer electronics, and computer science areas. However, many related mechanisms are still insufficiently known and much work has to be done to find better models of speech and audio signals. Some of the most relevant models for speech production are based on the acoustic theory of sound production. According to this theory, speech is considered as the output of a resonant network excited by a sound source. The main sections of the production mechanism—namely, the source, the resonant network, and the radiation effects—have been linearly modeled in a noncoupled cascade manner. Clearly, this is only an approximation, at least for the following reasons: there is strong theoretical and experimental evidence for the existence of a nonlinear coupling between the source and the resonant network; the air dynamic phenomena during speech production is described by the nonlinear Navier–Stokes equations (which imply that the produced airflow is turbulent); and the phonation system is controlled by nonlinear neuromuscular commands. Moreover, Rodet (1993) has reported that nonlinear processes are involved in the production of sound by several musical instruments. For the aforementioned reasons, there has been a growing research interest in nonlinear speech and audio processing. Among all the nonlinear methods for speech processing which have been developed so far, a few significant approaches are summarized next for the sake of pointing out some previous work in the field. Several authors used neural networks for the nonlinear analysis of speech, including radial basis functions approximations as reported in Moakes and Beet (1994) and multilayer perceptrons as described in Faundez (1999). Moreover, Haykin and Li (1995) performed nonlinear adaptive prediction of speech by using recurrent neural nets. Casdagli (1989) derived nonparametric nonlinear autoregressive methods, Gersho and Gray (1992) developed an algorithm for nonlinear predictive vector quantization, and Kumar and Gersho (1997) studied a technique based on codebook prediction. Other classes of nonlinear speech-processing methods include the technique of phase-space reconstruction, as described in Gibson et al. (1992), which was extended by other authors who analyzed some geometric
parameters that characterize the phase space. For example, the fractal dimension was analyzed by Accardo and Mumolo (1998) and Maragos and Potamianos (1999), and the Lyapunov exponents were analyzed by Banbrook et al. (1999). In a different case, Maragos et al. (1993) developed a nonlinear signalprocessing approach toward the detection and estimation of the modulations in speech resonances of the AM–FM type. Finally, some research developments based on fuzzy modeling, as reported by Mumolo and Costanzo (1997), are worth mentioning. Fuzzy models are able to deal with both high nonlinearities and partial uncertainty in the knowledge about the system.
A. Nonlinear Prediction of Speech by Using V-Vector Algebra

Many authors have pointed out that nonlinear prediction of speech greatly outperforms linear prediction in terms of prediction gain. In this subsection, we focus on nonlinear prediction implemented with discrete Volterra series truncated to the second term, as described in Section II. A quadratic Volterra predictor has a linear term, which is related to the vocal-tract resonances, and a quadratic term that can model the nonlinearities related to the mechanisms of speech production. Therefore, the Volterra predictor appears as a natural extension of the linear predictors well described by Markel and Gray (1976); in fact, the predictor is the following simple parametric model:
$$\hat{x}(n) = \sum_{i=1}^{N_1} h_1(i)\,x(n-i) + \sum_{i=1}^{N_2}\sum_{j=i}^{N_2} h_2(i,j)\,x(n-i)\,x(n-j) \tag{91}$$
where N1 and N2 are called linear and quadratic orders in the following discussion. In principle, we can thus define an analysis model,
$$e(n) = x(n) - \sum_{i=1}^{N_1} h_1(i)\,x(n-i) - \sum_{i=1}^{N_2}\sum_{j=i}^{N_2} h_2(i,j)\,x(n-i)\,x(n-j) \tag{92}$$
and a synthesis model,
$$x(n) = e(n) + \sum_{i=1}^{N_1} h_1(i)\,x(n-i) + \sum_{i=1}^{N_2}\sum_{j=i}^{N_2} h_2(i,j)\,x(n-i)\,x(n-j) \tag{93}$$
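Because the predictor of Eq. (91) is linear in its coefficients, the frame-based (block) identification discussed next reduces to an ordinary least-squares problem. The following short sketch illustrates this with invented data and ad hoc names: it stacks the linear and quadratic regressors of Eq. (91) into a design matrix and solves for h1 and h2 over one frame.

import numpy as np

def volterra_regressor(x, n, n1, n2):
    lin = [x[n - i] for i in range(1, n1 + 1)]
    quad = [x[n - i] * x[n - j] for i in range(1, n2 + 1) for j in range(i, n2 + 1)]
    return np.array(lin + quad)

def block_identify(x, n1, n2, start, stop):
    rows = [volterra_regressor(x, n, n1, n2) for n in range(start, stop)]
    coeffs, *_ = np.linalg.lstsq(np.array(rows), x[start:stop], rcond=None)
    return coeffs   # h1(1..N1) followed by h2(i, j) with j >= i

rng = np.random.default_rng(2)
x = rng.normal(size=400)
print(block_identify(x, n1=10, n2=2, start=20, stop=400)[:5])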
i=1 j=i
The prediction error shown in Eq. (92) is the instantaneous prediction error. Identification of the Volterra coefficients can be performed by means of the minimization of the mean squared prediction error over a frame of data; the related equations are simple to derive because the predictor is nonlinear in the signal values but it is linear in the filter coefficients. This problem thus
44
CARINI ET AL.
requires the solution of a linear system in which statistical moments up to the fourth order are involved. Such block-based approaches have been worked out by Mumolo and Francescato (1993), and some results are reported subsequently. The prediction gain is very good; however, the inversion problem— namely, the reconstruction of the input signal by using a quantized residual signal—is very critical, because even a soft quantization of the residual signal leads to an unstable inverted Volterra filter. Therefore, the block-based configuration is unsuitable for coding applications. Moreover, the numerical complexity of such block-based approaches is very high. Adaptive identification of the Volterra filter coefficients yields the possibility of reducing the computational burden. The algorithms can be divided into least mean square (LMS) and RLS approaches; a thorough discussion of these adaptive techniques applied to discrete Volterra filters can be found in Mathews and Sicuranza (2000). Although the LMS approach is a simple stochasticgradient adaptive technique, it only approximately solves the problem depicted in Eq. (44) of Section V, and its convergence to the final coefficient values is very slow. A much faster convergence is obtained by using RLS algorithms, which are recursive solutions of Eq. (44) of Section V. In these algorithms, care must be taken to ensure that the autocorrelation matrix does not lose its symmetry and positiveness during the adaptation, so that numerical instabilities can be avoided in limited-precision environments. Therefore, the problem is to derive low-complexity RLS algorithms which ensure numerical stability. Mumolo and Carini (1995) derived stable RLS algorithms for Volterra filters by using SQR techniques; however, their computational complexity is quite high. The RLS algorithm for Volterra filters described in Section V and based on V-vector algebra is very attractive as regards numerical stability and computational complexity. It is worth recalling that, as shown in Section V, besides computing the prediction error, the algorithm does not compute the Volterra filter coefficients but a lattice realization of the filter; the filter coefficients can be computed at the expense of additional computations. Therefore, it is better to use this algorithm in prediction-based applications, such as adaptive coding, which requires the computation of only a good prediction of the input sample, rather than in system identification applications. Moreover, two interpretations of the instantaneous prediction error are considered in the algorithm—namely, the forward a priori and a posteriori prediction errors. In any case, a direct dependency of the input signal to the Volterra filter is shown. The a priori prediction error is the error involved in the prediction of the ith channel input value vn,i before the coefficients of the lattice Volterra representation are updated. Similarly, the a posteriori prediction error arises from predicting the input value after the Volterra filter is updated. As shown in Section V, the two forms of prediction errors are related by the likelihood variable. Because the a priori prediction can be viewed as a tentative
prediction, it is not suitable to demonstrate that a Volterra model is able to describe speech nonlinearities. Rather, the a posteriori prediction error should be used. However, for coding purposes the question of which type of prediction error should be used is a matter of how well the quantizer is able to adaptively track the error; this topic is further discussed in Sections VI.E and VI.F.
The question of how well a Volterra filter can model speech nonlinearities can be answered only experimentally. The following discussion details a series of experimental investigations performed by the authors to assess the modeling capability of Volterra filters when they are applied to speech signals. The adaptive prediction algorithm, based on V-vector algebra and proposed in Section V, was used in these nonlinear prediction experiments. The algorithm is implemented according to the following pseudo-code using a scripting-like language (for better clarity, the pseudo-code can be compared with the algorithm description reported in Section V):

Initialize linear (N1) and quadratic orders (N2);
Initialize likelihood variable to unity;
Open input data samples;
Foreach sample x(n) in the input data Do
    Compute the input vector:
        vn(0) = x(n); vn(1) = x(n)*x(n), ..., vn(N2) = x(n)*x(n-N2+1);
    For i = 0 to N2 Do   /* for each channel */
        Compute the a-priori prediction as the inner product;
        Compute the a-posteriori prediction using the likelihood variable;
        If (i == 0) Then save the a-posteriori prediction error; Fi;
        Compute Q and update the D v-vector using Givens rotations;
        Update the Z(n) vector;
        Update the autocorrelation of the a-posteriori prediction error;
        Update the likelihood variable;
    Od;
Od
A data set composed of 10 different sentences, each spoken by 10 speakers, 5 males and 5 females, sampled at 48 kHz and downsampled at 8 kHz, was used in the authors’ experiments. The data set was large because it involved 10 speakers and more than 8 min of natural speech; for this reason significant
Figure 9. Mean squared a posteriori prediction error versus the total number of filter coefficients. See text for a description of the curves.
mean results could be obtained. Figure 9 shows the mean squared a posteriori error, averaged over all the sentences and the speakers, versus the total number of Volterra coefficients. The first curve, denoted with asterisks, is the a posteriori linear prediction error for all the linear orders from 10 to 46. The second curve, indicated with plus signs, is related to Volterra predictions with a linear order equal to 8; the first point is related to a quadratic order equal to 2 (11 coefficients total), the second to 3 (14 coefficients total), and so on, up to the 8th quadratic order. The mean squared prediction error with a nonlinear predictor is less than the corresponding linear predictor with the same total number of coefficients up to the 7th quadratic order. The third curve, denoted with multiplication signs, is related to a 10th linear order and quadratic orders from 1 to 8. The best prediction gain was obtained for a quadratic order equal to 2. The last curve, denoted by triangles, was obtained with a 12th linear order and quadratic orders from 1 to 7. The third and fourth curves show that the mean squared prediction error in the case of a quadratic Volterra model is always significantly less than in the linear case. In conclusion, the nonlinearities in speech are relevant and are well gathered by Volterra predictors. Moreover, a good choice of linear and quadratic orders is (10, 2); in any case, good values of the quadratic orders are small, in the range of 2–3. The performance of the nonlinear predictor depends on the characteristics of the nonlinear mechanisms involved in the signal production; thus they are
Figure 10. Prediction error for the sentence “Nanny may know my meaning” for (top) a 46th-order linear predictor and (bottom) a 10th/8th-order Volterra predictor.
not uniform during a sentence. In general, they are more evident for vowel sounds. For example, in Figure 10, the a posteriori prediction error for a vowellike segment extracted from the sentence “Nanny may know my meaning” is reported. In the upper and lower panels, respectively, the predictor error corresponding to a 46th-order linear predictor and the prediction error for a filter with the orders 10 for the linear part and 8 for the quadratic part (i.e., with the same number of coefficients) are reported. As is shown graphically, the error variance is greatly reduced for the Volterra model with the same number of predictor coefficients; this indicates that a Volterra predictor is able to model speech nonlinearities. B. Nonlinear Coding of Speech and Audio by Using V-Vector Algebra There has been a rapidly growing number of commercial applications which require exchange of audio information. One of the approaches for coding audio signals is based on ADPCM algorithms. Besides the International Organization for Standardization/Moving Picture Experts Group (ISO/MPEG) standard described by Dietz et al. (1996), there are currently several activities on ADPCM-based approaches—such as the Intel/DVI (digital visual interface), the Microsoft Wav-ADPCM, and the ODA (open document architecture) standards—for embedding audio signals into multimedia documents.
The authors' ADPCM algorithm uses RLS adaptive Volterra prediction switched with LMS. The use of switched predictors in signal coding is an old idea. Differently from early applications, however, the authors used a switched scheme for facing the stability problems which arise when RLS is used in an ADPCM framework. Apart from particular cases like those cited by Bershad and Macchi (1989), generally the convergence speed of RLS is much higher than that of LMS. This is one reason why the authors' ADPCM algorithm can be used with high-bandwidth signals, such as audio signals. The use of Volterra predictors with a switching mechanism leads to an algorithm for audio coding with the following main features:
• It is able to model the nonlinearities involved in speech production.
• It has no coding delay.
• As shown in Section V, it has an $O(L N_T)$ complexity, L being the number of channels. In terms of linear and quadratic orders, the computational complexity is $O(N_2^3)$, $N_2$ being the quadratic order; this compares very favorably with the $O(N_2^4)$ complexity of SQR techniques, as reported in Mumolo and Carini (1995).
• The algorithm's basic scheme leads to performance improvement over that of the G.723 coding algorithm.
• The switched mechanism yields a variable bit-rate coding system. Because, if the bit rate is lowered, the performance degradation is smooth, the algorithm can be used in embedded-coding applications.
C. The Coding Algorithm The block diagram of the coding algorithm is depicted in Figure 11. The Adaptive decision block chooses the best predictor on the basis of the minimum error between the input value and the reconstructed value. Hence, the algorithm generates two outputs: the quantized prediction error related to the best predictor and the information on which is the best predictor (this output is henceforth referred to as side info). Using only these two signals, the decoder reconstructs the input value and updates the lattice Volterra filter representation by using V-vector algebra. It is important to note that the algorithm uses the a priori prediction error. Moreover, as a way to avoid possible mistracking problems, the updating step of the quantizers is shared between the two subsystems. It is important to note that there are two stability issues in this system. One concerns the numerical stability of the RLS algorithm, and the other concerns the input-dependent stability of the coding system (i.e., the quantizer
Figure 11. Block diagram of the coding algorithm. RLS, recursive least squares; LMS, least mean square.
overloading). The switching mechanism is aimed at solving the latter type of instability. In the following the ADPCM coder is roughly described in terms of a pseudo-code (see also the algorithm description given in Section V):

Foreach input sample x(n) Do
    /* RLS predictor */
    Compute the a-priori prediction of the input sample;
    Compute the a-priori prediction error = x(n) + prediction;
    Quantize the prediction error = Inverse Quantize[Quantize[error]];
    Estimate the input signal = quantized error - a-priori prediction;
    /* LMS predictor */
    Compute the prediction using LMS;
    Compute the LMS prediction error;
    Quantize the LMS prediction error = Inverse Quantize[Quantize[error]];
    Estimate the input signal = quantized error + prediction;
    /* find the best predictor */
    Find the best predictor by comparing the estimated signals to x(n);
    Side info = predictor name;
    Efficient coding of the side info;
    Transmit the quantized error signal (of the best predictor) and the coded side info;
    /* Updating the predictors */
    If the best predictor was LMS then
        Save the LMS estimate of the input signal;
        Compute the RLS prediction error using the saved estimation;
    Else
        Save the RLS estimate of the input signal;
        Compute the LMS prediction error using the saved estimation;
    Fi;
    /* update the RLS predictor */
    Compute the input vector vn() using the estimated signal;
    For i = 0 to ordnl Do
        If ( i == 0 ) then
            A-priori prediction error = RLS a-priori prediction error;
            Compute a-posteriori prediction error;
        Else
            Compute the a-priori prediction error using vn(i);
            Compute a-posteriori prediction error;
        Fi;
        Update RLS;
    Od;
    /* update the LMS predictor */
    Update LMS Volterra filter using the quantized LMS error;
Od;
The decoder is already embedded in this pseudo-code.
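To make that remark concrete, the sketch below mirrors the encoder loop on the decoder side. It is only an illustration under stated assumptions: the predictor objects with predict()/update() methods and the inverse quantizer are hypothetical stand-ins, not the authors' implementation.

def adpcm_decode(codes, side_info, rls_pred, lms_pred, inverse_quantize):
    # codes: quantized prediction errors; side_info: True when the RLS branch was selected
    reconstructed = []
    for code, use_rls in zip(codes, side_info):
        error = inverse_quantize(code)
        prediction = rls_pred.predict() if use_rls else lms_pred.predict()
        x_hat = prediction + error          # reconstructed input sample
        # both predictors are updated from the reconstructed signal, as in the encoder,
        # so encoder and decoder stay synchronized using only the two transmitted signals
        rls_pred.update(x_hat)
        lms_pred.update(x_hat)
        reconstructed.append(x_hat)
    return reconstructed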
D. Stability of the Proposed Coding Algorithm

As shown in Section V, the RLS algorithm based on V-vector algebra is robust against numerical stability problems. However, its placement in a loop in the ADPCM coder makes it sensitive to quantizer overloading. In fact, it can be shown that for both LMS and RLS the coefficient perturbation depends recursively on the perturbation at the previous step, but the perturbation propagation for RLS is two orders of magnitude greater than in the LMS case. In conclusion, as pointed out by Watkins et al. (1995), quantizer
overloading for LMS is much less critical, by at least two orders of magnitude, than in the RLS case. This result can also be deduced from the convergence speed difference of the two adaptive algorithms. Thus, the system stabilization problem is of fundamental importance. Eleftheriou and Falconer (1986) used a simple stabilization scheme of the RLS algorithm based on periodically resorting to LMS. Instead, the authors used an adaptive switching mechanism based on a minimum error criterion, as shown in Figure 11; in this way the performance obtained is much higher than that obtained by using the approach of Eleftheriou and Falconer.

E. Sampling Frequency Issue

The first experimental observation with the coder using Volterra predictors and V-vector algebra was that the system performance increases at higher sampling frequencies. In fact, the following property can be demonstrated:

Proposition VI.1 If the quantizer of a generic ADPCM system is overloaded, then the following condition holds:

    M_S^I / f_S > M        (94)
where M is the maximum value which can be quantized without overloading, M_S^I is the first-order moment of the spectral signal distribution, and f_S is the sampling frequency.

Proof. The quantizer is overloaded when

    |e(n)| > (2^{n_bit} − 1)/2
given that e(n) is the error signal to quantize. Let us suppose that the prediction system was able to accurately follow the signal until the sample index (n − 1) and that there was a sudden increase of the error signal. This means that the reconstructed signal did not track the original signal, but it remained close to the reconstructed signal at the previous time instant. In other words, we can write the following relation:

    e(n) = s(n) − ŝ(n) ≈ s(n) − ŝ(n − 1) ≈ s(n) − s(n − 1)
Moreover, let us assume that the analog signal s(t) is differentiable. By turning to the discretization process of s(n), we can say that the time scale is divided into elementary time intervals dt and that the signal is divided into elementary signals ds. Because dt = 1/f_S, f_S being the sampling frequency, or dt · f_S = 1,

    e(n) ≈ s(n) − s(n − 1) = [s(n) − s(n − 1)]/dt · (1/f_S) = (ds/dt) · (1/f_S)
Hence the overloading condition is represented by the following condition: (ds/dt) · (1/f_S) > M. Recall from the sampling theorem that, for a generic signal x(t) with limited bandwidth and finite energy, the following condition holds: |dx/dt| < M_X^I, where M_X^I = (1/2π) ∫_{−ω_n}^{ω_n} |X(ω)| |ω| dω is the first-order moment of the spectral signal distribution. Therefore we can say that the condition (ds/dt) · (1/f_S) > M can also be expressed as M_S^I / f_S > M. ∎
From Proposition VI.1 it can be deduced that the higher the sampling frequency, the lower the probability of overloading the quantizer. In other words, the performance increases with the sampling frequency.
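A small numerical illustration of this effect, with arbitrary illustrative values (a 1 kHz sinusoid and an overload threshold M = 0.2), is sketched below; it only shows that the first difference of the samples, which roughly plays the role of the prediction error in the proof above, shrinks as the sampling frequency grows.

import numpy as np

def overload_rate(fs, f0=1000.0, M=0.2, duration=0.1):
    # fraction of samples whose first difference exceeds the overload threshold M;
    # |s(n) - s(n-1)| is roughly |ds/dt| / fs, so the rate drops as fs increases
    t = np.arange(0.0, duration, 1.0 / fs)
    s = np.sin(2 * np.pi * f0 * t)
    return np.mean(np.abs(np.diff(s)) > M)

# with these illustrative values, overload_rate(8000.0) equals 1.0
# while overload_rate(48000.0) equals 0.0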
F. Efficient Coding of the Side Information

The quantization of the side information requires 1 bit/sample. However, some schemes have been devised for reducing the overhead information sent to the receiver. From the authors' experiments, it emerges that one of the best schemes can be summarized as follows: If the difference between the two reconstructed errors is less than a threshold K_1, or if both errors are less than another threshold K_2, and if the quantizers are not overloaded, then at the current time instant use the same predictor as at the previous time instant. Otherwise, send 1-bit information. This scheme, which can be replicated at the receiver, greatly reduces the side info; the reduction can be tuned simply by varying the two thresholds. Some results are reported in the next subsection.
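Before turning to those results, the decision rule above can be summarized in a few lines of Python; the names below are illustrative only, and the reconstruction errors of the two branches stand in for the "reconstructed errors" mentioned in the text.

def need_side_info(err_rls, err_lms, previous_choice, overloaded, K1, K2):
    # keep the previous predictor (and send no bit) when the two branches are
    # nearly equivalent and neither quantizer is overloaded
    if not overloaded and (abs(err_rls - err_lms) < K1 or
                           (abs(err_rls) < K2 and abs(err_lms) < K2)):
        return False, previous_choice
    best = "RLS" if abs(err_rls) <= abs(err_lms) else "LMS"
    return True, best   # send one bit identifying the best predictor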
G. Experimental Results The goal of the experiments was to analyze the algorithm behavior, which depends on several factors, mainly the prediction orders for LMS and RLS, the coding of the side information (K1, K2 parameters), the number of quantization bits, and the sampling frequency. Many experimental verifications and performance measurements of the described algorithm were performed, with the data set described previously. Some results are reported in Figure 12, where the proposed system is compared with the G.723 ADPCM coding standard, proposed by the International Telecommunication Union (ITU) in 1988; it is worth noting that the G.723 coder works with 3, 4, and 5 quantization bits only. The results pertaining to the V-vector-based algorithm were obtained with a Volterra predictor with a 10th linear order and a quadratic order equal to 2, while the switched LMS part had a 12th order. Only two signal-to-noise ratio (SNR) measurements for the standard G.723 are shown in Figure 12:
Figure 12. Signal-to-noise ratio (SNR) performance of various coding algorithms versus quantization bits at sampling rates of 8 and 48 kHz.
one for 3 bits of quantization and one for 5 bits. Because the variable bit-rate coder also needs some bits for the side info, generally the total bit rate is not an integer; therefore, its performance cannot be directly compared with that of the G.723. In these experiments, the measurements closest to the G.723 standard for the authors' coder were obtained at 3.1 and 4.675 bits. It is evident that the advantage over the G.723 is significant. The curve at the top of Figure 12 is related to a sampling rate of 48 kHz; its SNR is much higher than that obtained at 8 kHz. This experimental observation verifies the consideration concerning convergence speed and quantizer overloading that is reported in Section VI.E. The performance obtained by varying the coding of the side information (K_1 and K_2 are the thresholds mentioned in Section VI.F) varies smoothly with the sampling rate. This opens up the possibility of using the coding algorithm in embedded-coding frameworks. Some final remarks are reported next. The use of the RLS algorithm for a Volterra filter using V-vector algebra in speech and audio coding has shown several important characteristics. First, the use of the RLS technique on its own leads to a significant gain in the SNR with respect to the classical LMS adaptation because of the nature of the adaptation. The second fundamental characteristic is that the nonlinear physical mechanisms involved in signal production are well modeled with a Volterra filter, which leads to an additional gain. Because the coding procedure is critically affected by the quantization
and, because resorting to LMS for stability reasons degrades the coding performance, the coding gain could be made much higher by improving the tracking capability of the adaptive quantizer. It is worth noting that the RLS algorithm based on V-vector algebra used in the ADPCM coder is robust to limited-precision computation, as shown experimentally. Finally, the coder was also used to code musical audio signals, and similar improvements were observed.

VII. Summary

In this article a new algebraic structure was described which can be usefully applied to the representation of the input–output relationships of the class of polynomial filters known as discrete Volterra filters. First, an introductory account of such filters and of the underlying theory of the continuous and discrete Volterra series expansions was given. Then, the main elements of V-vector algebra were introduced, together with their relevant properties. In principle, V-vectors can be defined as nonrectangular matrices, and V-matrices represent appropriate collections of V-vectors, replacing, respectively, the vectors and the matrices of linear algebra. The basic operations between V-vectors and V-matrices were defined, and the concepts of inverse, transposed, and triangular matrices of linear algebra were adapted to V-vector algebra. The main reason for interest in such an algebra is that it can be viewed as a formalism which is suitable for the derivation of adaptation algorithms for Volterra filters by simple reformulation of the well-known adaptation algorithms applied to linear filters. As an additional feature, it was shown how V-vector algebra can be usefully exploited to describe multichannel linear adaptive filters with channels of different memory lengths. The multichannel approach has been successfully applied to problems such as the blind equalization of nonlinear channels modeled as truncated Volterra filters or the representation of Volterra filters by means of particular coordinate systems useful for the description of Volterra filters with band-limited inputs. These and other similar applications of Volterra filters could benefit from the novel algebraic description based on V-vector notation. As an example of application of the V-vector formalism, we derived a novel, fast, and numerically stable QR-RLS algorithm based on Givens rotations. This algorithm belongs to the same family of SQR RLS algorithms proposed elsewhere by the authors of this article, but it differs from those algorithms in that the adaptation is based on the a priori backward prediction error vector rather than on the a posteriori backward prediction error vector. The algorithm is based on the derivation of two Cholesky SQR factorizations of the autocorrelation matrix. The specific application of this algorithm to nonlinear
prediction of speech was described, together with a nonlinear coding technique which, exploiting the robustness of the RLS algorithm based on V-vector algebra, offers very good performance with respect to existing standard methods, especially in limited-precision environments. Finally, it is worth noting that this coder was also successfully used to code musical audio signals, and similar improvements were observed.

Appendix I: The Givens Rotations

As noted in previous sections, one of the most successful approaches for deriving numerically stable algorithms is the SQR technique, in which the autocorrelation V-matrix is factorized as

    Ω_n = R_n^T R_n        (95)
This factorization is not unique: every V-matrix QR, with Q an orthogonal (or rotation) V-matrix (QQ^T = I), fulfills the same relationship. Nevertheless, by means of Givens rotations it is easy to determine the rotation V-matrix Q that relates two factorizations. The Givens rotations are widely used in QR-RLS and fast QR-RLS algorithms. Their success is due to the simplicity and the numerical robustness of the computations they perform.

Problem AI.1 Given a V-matrix R, we want to find a rotation V-matrix Q such that R̄^T = R^T Q^T is an LUT II V-matrix.
Solution. The V-matrix Q can be decomposed into K Givens rotation V-matrices Q_k; that is, Q = Q_K · Q_{K−1} · · · Q_1. A Givens rotation V-matrix Q_k is given in Figure 13, where c_k^2 + s_k^2 = 1. Therefore, the V-matrix Q_k rotates a couple of elements of every sub-V-vector of R^T. With a proper choice of c_k and s_k, at every Givens rotation we can annihilate one element of R^T. Particularly, if we want to rotate an element y on an element x, with x > 0; that is, if we want

    [ z ]   [  c_k   s_k ]   [ x ]
    [ 0 ] = [ −s_k   c_k ] · [ y ]        (96)

our choice must be

    c_k = x/z        (97)
    s_k = y/z        (98)
    z = √(x^2 + y^2)        (99)
Figure 13. A Givens rotation V-matrix.
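In scalar form, the choice of coefficients in Eqs. (96)–(99) can be sketched as follows (plain numbers rather than V-matrix elements; the function name is illustrative only):

import math

def givens_coefficients(x, y):
    # returns (c, s, z) with [[c, s], [-s, c]] applied to (x, y) giving (z, 0),
    # where z = sqrt(x^2 + y^2); the text assumes x > 0
    z = math.hypot(x, y)
    c = x / z
    s = y / z
    return c, s, z

# example: rotating y = 3 onto x = 4 gives z = 5 with c = 0.8 and s = 0.6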
Passage from a generic V-matrix R^T to an LUT II V-matrix R̄^T requires us to proceed on the sub-V-vectors of R^T. We scan the sub-V-vectors by right columns from the left to the right and from the last row to the first. For every sub-V-vector, we choose as a pivot the diagonal element and we rotate on the pivot all the elements at its right and at the top of its column. In a similar manner we can rotate the matrix R^T into a row (i + 1) MOD L LUT II V-matrix. In particular, in our algorithm the V-matrix R_{n,i} of Eq. (63) has most of its elements set to zero; therefore, we need to annihilate only some elements of the sub-V-vectors of R_{n−1,i}^T of Eq. (77) to derive the row (i + 1) MOD L LUT II V-matrix R̄_{n−1,i}^T and the V-vector d̄_{n,i}.

Appendix II: Some Efficient Factorization Algorithms

As was shown in the previous sections, V-matrices of the form

    Π = I + c d d^T        (100)
where I is an identity V-matrix, c is a positive constant, and d is a V-vector, are widely used in the adaptive filtering algorithms. Thus, the determination of SQR factorization matrices of Π, namely Π = T^T T with T a triangular V-matrix, plays a fundamental role. As shown in Eqs. (86) and (87), rather than computing T, we need to compute the Tx or T^T x products, where x is a given vector. According to the Agee–Turner algorithm (Bierman, 1977; Haykin, 1991), let us consider the quadratic form x^T Π x. The following relation can thus be
obtained when T^T is a row i MOD L LUT II V-matrix:

    x^T T^T T x = x^T (I + c_{11} d d^T) x
                = y_{11}^2 + [x′^T (I′ + c′ d′ d′^T) x′]        (101)
where I′ is an identity V-matrix, x′ and d′ are the V-vectors we obtain by removing the first element of the first right column of x and d (i.e., x_{11} and d_{11}, respectively), c_{11} = c, c′ = c_{11}/e_{11}, and e_{11} = 1 + c d_{11}^2. Moreover,

    y_{11} = e_{11}^{−1/2} (x_{11} + c · d_{11} Σ_{hk} x_{hk} d_{hk})        (102)
Therefore, if we recursively apply Eq. (101), by scanning x by right columns from the first to the last column and from the (i − 1) row to the (i − L) MOD L row, or from the last to the first row if the column has fewer than i elements, we obtain

    x^T T^T T x = y_{1,1}^2 + y_{1,2}^2 + · · · + y_{i,P(i)}^2
where P(i) is the number of elements in the ith row and y_{mn} is given by

    y_{mn} = e_{mn}^{−1/2} (x_{mn} + c · d_{mn} Σ_{hk∈V} x_{hk} d_{hk})        (103)
where V is the subset of indexes:
    V = {hk : k > n  OR  (k = n  AND  mod_L(h − i + 1) ≤ mod_L(m − i + 1))}
From Eq. (101) we have

    y = Tx = [ y_{11}  y_{12}  . . .  y_{1P(1)} ;
               y_{21}  . . .  y_{2P(2)} ;
               . . . ;
               y_{L1} ]        (104)
where L is the number of rows of the V-vector x or d. As regards the T^T x product, it can be noted from Eq. (102) that the T^T matrix is given by the sum of a row i MOD L LUT II V-matrix, whose generic element is T_{pqrs} = d_{pq} d_{rs} c_{rs} e_{rs}^{−1/2}, and a diagonal matrix whose diagonal elements
are

    [ e_{11}^{−1/2}  e_{12}^{−1/2}  . . .  e_{1P(1)}^{−1/2} ;  e_{21}^{−1/2}  . . .  e_{2P(2)}^{−1/2} ;  . . . ;  e_{L1}^{−1/2} ]

Therefore, the product

    u = T^T x = [ u_{11}  u_{12}  . . .  u_{1P(1)} ;  u_{21}  . . .  u_{2P(2)} ;  . . . ;  u_{L1} ]        (105)
is given by

    u_{mn} = e_{mn}^{−1/2} x_{mn} + d_{mn} Σ_{hk∈U} c_{hk} d_{hk} e_{hk}^{−1/2} x_{hk}
where U is the subset of indexes:
    U = {hk : k < n  OR  (k = n  AND  mod_L(h − i + 1) ≥ mod_L(m − i + 1))}
The preceding results can be summarized in the following two algorithms.

Algorithm AII.1 Let us consider the SQR factorization T^T T = I + c d d^T, where T^T is a row i LUT II V-matrix, c is a positive constant, and d is a given V-vector, whose generic element is d_{hk}. Furthermore, let x be a given V-vector of the same type as d, and x_{hk} its generic element. An efficient algorithm for the y = Tx product is the following:

Initialize z = 0
From the last right column of the V-vector to the first
    From the ith row to the (i + L − 1) MOD L row, or from the first to the last row if the column has fewer than i elements
        Compute:
            z_{hk} = z + x_{hk} d_{hk}
            z = z_{hk}
From the first right column of the V-vector to the last
    From the (i − 1)th row to the (i − L) MOD L row, or from the last to the first row if the column has fewer than i elements
        Compute:
            cd_{hk} = c · d_{hk}
            l_{hk} = 1 + (cd_{hk}) d_{hk}
            y_{hk} = l_{hk}^{−1/2} (x_{hk} + (cd_{hk}) z_{hk})
            c = c / l_{hk}
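For the ordinary single-channel (plain-vector) case, the two scans of Algorithm AII.1 reduce to the sketch below. It is an illustration under the stated assumption c > 0, not the authors' V-vector implementation; the commented check verifies the defining property x^T (I + c d d^T) x = ‖Tx‖^2.

import numpy as np

def agee_turner_Tx(x, d, c):
    # backward scan: partial sums z_k = sum_{j >= k} x_j * d_j
    n = len(x)
    z = np.zeros(n)
    acc = 0.0
    for k in range(n - 1, -1, -1):
        acc += x[k] * d[k]
        z[k] = acc
    # forward scan: fold in the rank-one update element by element
    y = np.zeros(n)
    ck = c
    for k in range(n):
        cd = ck * d[k]
        l = 1.0 + cd * d[k]
        y[k] = (x[k] + cd * z[k]) / np.sqrt(l)
        ck = ck / l
    return y

# quick check of T^T T = I + c d d^T through the quadratic form:
# x, d = np.random.randn(6), np.random.randn(6); c = 0.5
# y = agee_turner_Tx(x, d, c)
# np.allclose(y @ y, x @ x + c * (d @ x) ** 2)   # -> True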
Algorithm AII.2 Under the same conditions as in the previous algorithm, the u = T^T x product can be computed with the following algorithm:

Initialize z = 0
From the first right column of the V-vector to the last
    From the (i − 1)th row to the (i − L) MOD L row, or from the last to the first row if the column has fewer than i elements
        Compute:
            cd_{hk} = c · d_{hk}
            e_{hk} = 1 + (cd_{hk}) d_{hk}
            z = (cd_{hk}) (x_{hk} e_{hk}^{−1/2}) + z
            c = c / e_{hk}
            u_{hk} = x_{hk} e_{hk}^{−1/2} + d_{hk} z

In this appendix, we described some efficient algorithms for determining the row i MOD L LUT II factorization of V-matrices of the form I + c d d^T, inspired by the Agee–Turner factorization algorithm (Bierman, 1977). However, with respect to the Agee–Turner algorithm, the c constant is always positive, which leads to more-stable factorization algorithms. It is worth noting that by rearranging the expressions of the algorithms described in this appendix, we can trade divisions for multiplications. Because the operation count of the fast adaptive algorithms heavily depends on the operations required by these factorization algorithms, we can obtain slightly different derivations with a different number of products and divisions. This can be useful from an implementation viewpoint.
References

Accardo, A. P., and Mumolo, E. (1998). Comput. Biol. Med. 28, 75–89.
Alexander, S. T., and Ghirnikar, A. L. (1993). IEEE Trans. Signal Processing 41, 20–30.
Banbrook, M., McLaughlin, S., and Mann, I. (1999). IEEE Trans. Speech Audio Processing 7, 1–17.
Bellanger, M. G. (1989). Signal Processing 17, 291–304.
Bershad, N., and Macchi, O. (1989). In Proceedings of the ICASSP-89, International Conference on Acoustics, Speech and Signal Processing, Glasgow (England). pp. 896–899.
Bierman, G. J. (1977). Factorization Methods for Discrete Sequential Estimation. New York: Academic Press.
Billings, S. A. (1980). IEE Proc. 127(D), 272–285.
Carini, A. (1996). In Proceedings of the EUSIPCO-96, Eighth European Signal Processing Conference, Trieste (Italy). pp. 1235–1238.
Carini, A. (1997). Adaptive and nonlinear signal processing. Ph.D. thesis, University of Trieste, Italy.
Carini, A., and Mumolo, E. (1997). Signal Processing 57, 233–250.
Carini, A., Mumolo, E., and Sicuranza, G. L. (1999). IEEE Trans. Circuits Syst. II: Analog and Digital Signal Processing 46, 585–598.
Carini, A., Mumolo, E., and Sicuranza, G. L. (2000). Signal Processing 80, 549–552.
Casdagli, M. (1989). Phys. D 35, 335–356.
Cioffi, J. M. (1990). IEEE Trans. Acoustic Speech Signal Processing 38, 631–653.
Cioffi, J. M., and Kailath, T. (1984). IEEE Trans. Acoustic Speech Signal Processing 32, 304–337.
Dietz, M., Popp, H., Brandenburg, K., and Friedrich, R. (1996). J. Audio Eng. Soc. 44, 58–72.
Eleftheriou, E., and Falconer, D. (1986). IEEE Trans. Acoustic Speech Signal Processing 34, 1097–1110.
Faundez, M. (1999). Proc. Eurospeech 2, 763–766.
Frantzeskakis, E. N., and Liu, K. J. R. (1994). IEEE Trans. Signal Processing 42, 2455–2469.
Fréchet, M. (1910). Ann. Sci. L'École Normale Supérieure 27 (3rd ser.), 193–216.
Gersho, A., and Gray, R. M. (1992). Vector Quantization and Signal Compression. Boston: Kluwer Academic.
Giannakis, G. B., and Serpedin, E. (1997). IEEE Trans. Signal Processing 45, 67–81.
Gibson, J. F., Farmer, J., Casdagli, M., and Zubank, S. (1992). Phys. D 57, 1–30.
Haykin, S. (1991). Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice Hall International.
Haykin, S., and Li, L. (1995). Signal Processing 43, 526–535.
Kumar, A., and Gersho, A. (1997). IEEE Signal Processing Lett. 4, 89–91.
Lee, J., and Mathews, V. J. (1993). IEEE Trans. Signal Processing 41, 1087–1101.
Ling, F. (1991). IEEE Trans. Signal Processing 39, 1541–1551.
Liu, Z. S. (1995). IEEE Trans. Signal Processing 43, 720–728.
Maragos, P., Kaiser, J. F., and Quatieri, T. F. (1993). IEEE Trans. Signal Processing 41, 3024–3051.
Maragos, P., and Potamianos, A. (1999). J. Acoust. Soc. Am. 105, 1925–1932.
Markel, J. D., and Gray, A. H. (1976). Linear Prediction of Speech. Berlin/Heidelberg: Springer-Verlag.
Marmarelis, P. Z., and Marmarelis, V. Z. (1978). Analysis of Physiological Systems. New York: Plenum.
Mathews, V. J. (1991). IEEE Signal Processing Mag. 8, 10–26.
Mathews, V. J., and Sicuranza, G. L. (2000). Polynomial Signal Processing. New York: Wiley.
Moakes, P. A., and Beet, S. W. (1994). In Proceedings of the 1994 IEEE Workshop Neural Networks for Signal Processing IV. Ermioni, Greece, 319–328.
Mumolo, E., and Carini, A. (1995). Eur. Trans. Telecommun. 6, 685–693.
Mumolo, E., and Costanzo, W. (1997). IEE Electron. Lett. 33, 1012–1013.
Mumolo, E., and Francescato, D. (1993). In IEEE Winter Workshop on Nonlinear Digital Signal Processing. Tampere, Finland, 2.1-4.1–2.1-4.4.
Proudler, I. K. (1994). IEE Proc. Vision Image Signal Processing 141, 325–332.
Proudler, I. K., McWhirter, J. G., and Shepherd, T. J. (1991). IEE Proc. F 138, 341–353.
Raz, G. M., and Van Veen, B. D. (1998). IEEE Trans. Signal Processing 46, 103–114.
Regalia, P. A., and Bellanger, M. G. (1991). IEEE Trans. Signal Processing 39, 879–891.
Rodet, X. (1993). IEEE Trans. Circuits Syst. II 40, 696–701.
Rontogiannis, A. A., and Theodoridis, S. (1996). In Proceedings of EUSIPCO-96, Eighth European Signal Processing Conference, Trieste (Italy). pp. 1381–1384.
Rontogiannis, A. A., and Theodoridis, S. (1998). IEEE Trans. Signal Processing 46, 2862–2876.
Rugh, W. J. (1981). Nonlinear System Theory: The Volterra–Wiener Approach. Baltimore: Johns Hopkins Univ. Press.
Sandberg, I. W. (1992). IEEE Trans. Signal Processing 40, 1438–1442.
Schetzen, M. (1989). The Volterra and Wiener Theories of Nonlinear Systems, reprint ed. Malabar, FL: Krieger.
Schetzen, M. (1993). Proc. IEEE 69, 1557–1573.
Sicuranza, G. L. (1992). Proc. IEEE 80, 1262–1285.
Slock, D. T., and Kailath, T. (1991). IEEE Trans. Signal Processing 39, 92–113.
Strobach, P. (1991). In Proceedings of the ICASSP-91, International Conference on Acoustics Speech and Signal Processing, Toronto (Canada). pp. 1845–1848.
Strobach, P. (1994). IEEE Trans. Signal Processing 42, 1230–1233.
Syed, M. A., and Mathews, V. J. (1993). IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 40, 372–382.
Syed, M. A., and Mathews, V. J. (1994). IEEE Trans. Circuits Syst. II: Analog and Digital Signal Processing 41, 202–214.
Terré, M., and Bellanger, M. G. (1994). IEEE Trans. Signal Processing 42, 3272–3273.
Volterra, V. (1887). Rendiconti Regia Accademia dei Lincei. 2o Sem. 97–105, 141–146, 153–158.
Volterra, V. (1913). Leçons sur les fonctions de lignes. Paris: Gauthier-Villars.
Volterra, V. (1959). Theory of Functionals and of Integral and Integro-Differential Equations. New York: Dover.
Watkins, C. R., Bitmead, R. R., and Crisafulli, S. (1995). IEEE Trans. Speech Audio Processing 3, 137–141.
Wiener, N. (1958). Nonlinear Problems in Random Theory. New York: The Technology Press, MIT/Wiley.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 124
A Brief Walk Through Sampling Theory
ANTONIO G. GARCÍA
Department of Mathematics, Universidad Carlos III de Madrid, E-28911 Leganés (Madrid), Spain
I. Starting Point
II. Orthogonal Sampling Formulas
   A. Unified Approach
      1. A Related Approach
      2. Another Related Approach
   B. Putting the Theory to Work
      1. Classical Band-Limited Functions
      2. Band-Limited Functions in the Fractional Fourier Transform Sense
      3. Finite Sine and Cosine Transforms
      4. Classical Band-Limited Functions Revisited
      5. The ν-Bessel–Hankel Space
      6. The Continuous Laguerre Transform
      7. The Multidimensional WSK Theorem
      8. The Mellin–Kramer Sampling Result
   C. Finite Sampling
      1. Trigonometric Polynomials
      2. Orthogonal Polynomials
III. Classical Paley–Wiener Spaces Revisited
   A. Fourier Duality
   B. Undersampling and Oversampling
      1. Poisson Summation Formula
      2. Robust Reconstruction
   C. Sampling by Using Other Types of Samples
      1. Using Samples from the Derivative
      2. Using Samples from the Hilbert Transform
   D. Zeros of Band-Limited Functions
   E. Irregular Sampling
      1. Introducing Riesz Bases
      2. The Riesz Bases Setting
      3. A Unified Approach to Nonorthogonal Sampling Formulas
      4. Introducing Frames
      5. The Frame Setting
   F. Iterative Algorithms
IV. Sampling Stationary Stochastic Processes
V. At the End of the Walk
References
I. Starting Point Sampling theory deals with the reconstruction of functions (signals) through their values (samples) on an appropriate sequence of points by means of sampling expansions involving these values. The most famous result in this direction is the Whittaker–Shannon–Kotel’nikov formula, which allows us to reconstruct band-limited signals (i.e., signals containing no frequencies beyond a critical value ωc ) from an equidistant sequence of samples whose spacing depends on ωc . Concerning the exact discoverer of this sampling formula, there exists some historical controversy, which involves famous mathematicians such as A. L. Cauchy and E. Borel, among others. The interested reader can find some historical sources on this topic in the references mentioned in the introduction of Section II. In any case, there is no doubt as to when modern sampling theory began: in 1949, when Shannon published his famous paper “Communication in the Presence of Noise.” Although it is almost certain that Shannon was not the discoverer of “his formula,” his paper triggered an avalanche of works, which eventually produced a flourishing body of results on sampling methods and their applications. What started as a theorem for reconstructing band-limited signals from uniform samples has become, from a mathematical point of view, a whole branch of applied mathematics known as sampling theory. This new field has turned out to be very useful in many mathematical areas, such as approximation theory, harmonic analysis, theory of entire functions, theory of distributions, and stochastic processes, among others. The efforts to extend Shannon’s fundamental result point in various directions: nonuniform samples, other discrete data taken from the signal, multidimensional signals, and more. This, together with the technological impact of sampling in communication theory and signal processing, can provide a clearer idea of the current importance of this topic. As a consequence, a tremendous amount of material can be gathered under the title “Sampling Theory.” Therefore, any survey of this topic, in particular this one, must necessarily be summarized. The expression brief walk is used in the title of this article to indicate a personal choice to summarize an introduction to sampling theory. The main aim of this article is to serve as an introduction to sampling theory for the interested nonspecialist reader. Despite the introductory level, some hints about and motivations for more-advanced problems in sampling theory are given. The presentation of the article is selfcontained and mostly elementary. The only prerequisites are a good understanding of the fundamentals of Hilbert spaces and harmonic analysis, although a mastery of these theories is by no means required. Motivations and ideas are stressed at the expense of a formal mathematical presentation. As a result, the reader will not find the customary sequences of definitions, theorems, and
corollaries, although the author has striven to keep the mathematical rigor in all arguments. The structure of this article is as follows. In Section II a survey about orthogonal sampling formulas is given. The classical Whittaker–Shannon–Kotel'nikov formula is the leitmotiv used to introduce a general theory for orthogonal sampling formulas in the framework of orthonormal bases in a Hilbert space. Most of Section II stems from García (2000). The procedure, which is illustrated with a number of examples, closely parallels the theory of orthonormal bases in a Hilbert space and allows a quick immersion into orthogonal sampling results. Section III is devoted to a deeper study of the spaces of classical band-limited functions (i.e., the classical Paley–Wiener spaces). It includes sampling formulas which use other types of samples, such as derivatives or the Hilbert transform of a given signal, an idea proposed in Shannon's paper. Nonuniform sampling involving Riesz bases or frames is also addressed on an introductory level. For completeness, an introductory theory of these mathematical concepts is also included. In Section IV, a soupçon of sampling band-limited stationary stochastic processes is given from an abstract point of view. Finally, Section V provides a rapid overview of important sampling topics not included or mentioned in previous sections. This overview is accompanied by a suitable list of references for further reading. The main aim of this closing section is to point the interested reader to the appropriate references pertaining to more-advanced topics on sampling. Finally, most of the results stated throughout the article are well known, and the author claims originality only in the way of setting them out. He will be satisfied if this article contributes to making sampling theory better known to the scientific community.

II. Orthogonal Sampling Formulas∗

In 1949 Claude Shannon published a remarkable result: If a signal f(t) (with finite energy) contains no frequencies higher than w cycles per second, then f(t) is completely determined by its values f(n/2w) at a discrete set of points with spacing 1/2w, and can be reconstructed from these values by the formula

    f(t) = Σ_{n=−∞}^{∞} f(n/2w) · sin π(2wt − n) / [π(2wt − n)]        (1)

∗ Adapted from García, A. G. (2000). Orthogonal sampling formulas: A unified approach. SIAM Rev. 42(3), 499–512, with permission from the Society for Industrial and Applied Mathematics.
In engineering–mathematical terminology, the signal f is band limited to [−2πw, 2πw], which means that f(t) contains no frequencies beyond w cycles per second. Equivalently, its Fourier transform F is zero outside this interval:

    f(t) = (1/√(2π)) ∫_{−2πw}^{2πw} F(x) e^{ixt} dx        (2)

The engineering principle underlying Eq. (1) is that all the information contained in f(t) is stored in its samples {f(n/2w)}. The cutoff frequency determines the so-called Nyquist rate, the minimum rate at which the signal needs to be sampled for us to recover it at all intermediate times t. In the preceding case, 2w = 4πw/2π is the sampling frequency and 1/2w is the sampling period. This rate was named after the engineer H. Nyquist, who was the first to point out its importance in connection with telegraph transmission in 1928. The sampling functions used in the reconstruction (1) are

    S_n(t) = sin π(2wt − n) / [π(2wt − n)]
They satisfy the interpolatory property S_n(t_k) = δ_{n,k} at t_k = k/2w, k ∈ Z, where δ_{n,k} equals one if n = k, and zero if n ≠ k. A series as in Eq. (1) is known as a cardinal series because the sampling functions involve the cardinal sine function (or sinc function):

    sinc(t) = sin πt/(πt) if t ≠ 0;    sinc(t) = 1 if t = 0

These series owe their name to J. M. Whittaker (1935), whose work was cited by Shannon (1949). To be precise, J. M. Whittaker's work was a refinement of that of his father, the eminent British mathematician E. T. Whittaker (1915). However, it is unclear whether they were the first mathematicians to introduce these kinds of expansions. The discovery of these series has also been attributed to other famous mathematicians such as E. Borel, A. L. Cauchy, W. L. Ferrar, and K. Ogura. Some interesting historical notes concerning this controversy can be found in Butzer and Stens (1992), Higgins (1985, 1996), Lacaze (1998) and Zayed (1993). See also the master references: Borel (1897), Cauchy (1893), Ferrar (1926), and Ogura (1920). The Shannon sampling theorem provides the theoretical foundation for modern pulse code modulation communication systems, which were introduced, independently, by V. Kotel'nikov in 1933 (an English translation of the original Russian manuscript can be found in Benedetto and Ferreira, 2001b, Chap. 2) and by Shannon (1949). This sampling theorem is currently known in the mathematical literature as the Whittaker–Shannon–Kotel'nikov theorem, or WSK sampling theorem.
In general, the problem of sampling and reconstruction can be stated as follows: Given a set H of functions defined on a common domain Ω, is there a discrete set D = {t_n} ⊂ Ω such that every f ∈ H is uniquely determined by its values on D? And if this is the case, how can we recover such a function? Moreover, is there a sampling series of the form

    f(t) = Σ_n f(t_n) S_n(t)        (3)
valid for every f in H, where the convergence of the series is at least absolute and uniform on closed bounded intervals? In many cases of practical interest, the set H is related to some integral transform as in Eq. (2), and the sampling functions satisfy an interpolatory property. All this leads to the proposal of a general method to obtain some sampling theorems in a unified way. In Section II.A orthogonal sampling theorems are obtained by following these steps:
1. Take a set of functions {S_n(t)} interpolating at a sequence of points {t_n}.
2. Choose an orthonormal basis for an L² space.
3. Define an integral kernel involving {S_n(t)} and the orthonormal basis. Consider the corresponding integral transform in the L² space.
4. Endow the range space of this integral transform with a norm which provides an isometric isomorphism between the range space and the L² space by means of the integral transform.
5. Thus, any Fourier expansion in the L² space is transformed into a Fourier expansion in the range space whose coefficients are the samples of the corresponding function, computed at the sequence {t_n}.
6. Convergence in this norm of the range space implies pointwise convergence and, as a consequence, we obtain a sampling expansion which holds for all functions in the range space.
The idea underlying the whole procedure is borrowed from Hardy (1941), who first noticed that Eq. (1) is an orthogonal expansion.
This methodology is put to use in Section II.B, where several well-known sampling formulas are derived in this way. Thus the two main features of the author’s approach are the following: I. Placing the problem in a functional framework, common to many diverse situations, allows sampling theory to be introduced through the well-developed theory of orthonormal bases in a Hilbert space. A number of well-known sampling formulas are obtained in this unified way.
II. The functional setting chosen permits us, in principle, to derive only orthogonal sampling expansions. However, it can be enlarged to more general settings including Riesz bases or frames, as is pointed out in Section III.E.1.
A. Unified Approach

This subsection begins with a brief reminder of orthonormal bases in a separable Hilbert space H (i.e., a Hilbert space containing a countably dense set). This well-known concept is a basic tool in this subsection, and it will allow us to draw nontrivial consequences in sampling: An orthonormal basis for H is a complete and orthonormal sequence {e_n}_{n=1}^∞ in H; that is, ⟨e_n, e_m⟩ = δ_{n,m} (orthonormality), and the zero vector is the only vector orthogonal to every e_n (completeness). Given an orthonormal sequence {e_n}_{n=1}^∞ in H, the following three statements are equivalent (Naylor and Sell, 1982, p. 307):

1. For every x ∈ H we have the Fourier series expansion
       x = Σ_{n=1}^∞ ⟨x, e_n⟩ e_n        (4)
   in the H-norm sense.
2. For every x and y in H we have
       ⟨x, y⟩ = Σ_{n=1}^∞ ⟨x, e_n⟩ \overline{⟨y, e_n⟩}        (5)
3. For every x ∈ H Parseval's formula
       ‖x‖² = Σ_{n=1}^∞ |⟨x, e_n⟩|²        (6)
   holds.
In this subsection L²(I) spaces are dealt with; that is,

    L²(I) = { F : I → C measurable and ∫_I |F(x)|² dx < ∞ }

where I is an interval in R, bounded or not. As usual, the inner product in L²(I) is given by ⟨F, G⟩_{L²(I)} = ∫_I F(x) \overline{G(x)} dx. All these spaces are separable and, consequently, possess a countable orthonormal basis (Naylor and Sell, 1982,
p. 314). Throughout this subsection, {φ_n(x)}_{n=1}^∞ denotes an orthonormal basis for a fixed L²(I) space. Let {S_n}_{n=1}^∞ be a sequence of functions S_n : Ω ⊂ R → C, defined for all t ∈ Ω, and let {t_n}_{n=1}^∞ be a sequence in Ω satisfying conditions C1 and C2:
C1: S_n(t_k) = a_n δ_{n,k}, where δ_{n,k} denotes the Kronecker delta and a_n ≠ 0.
C2: Σ_{n=1}^∞ |S_n(t)|² < ∞ for each t ∈ Ω.
Let us define the function K(x, t) as

    K(x, t) = Σ_{n=1}^∞ S_n(t) \overline{φ_n(x)},    (x, t) ∈ I × Ω        (7)
Note that, as a function of x, K(·, t) belongs to L²(I) because {φ_n}_{n=1}^∞ is an orthonormal basis for L²(I) as well. Now, let us consider K(x, t) as an integral kernel and define on L²(I) the linear integral transformation which assigns

    f(t) := ∫_I F(x) K(x, t) dx        (8)

to each F ∈ L²(I). The integral transform (8) is well defined because both F and K(·, t) belong to L²(I) and the Cauchy–Schwarz inequality implies that f(t) is defined for each t ∈ Ω. Also, this transformation is one-to-one because {K(x, t_k) = a_k \overline{φ_k(x)}}_{k=1}^∞ is a complete sequence for L²(I); that is, the only function orthogonal to every K(x, t_k) is the zero function. Actually, if two functions f and g are equal on the sequence {t_k}_{k=1}^∞, they necessarily coincide on the whole set Ω. Indeed, let us suppose that f(t) = ∫_I F(x)K(x, t) dx and g(t) = ∫_I G(x)K(x, t) dx; then, f(t_k) = g(t_k) for every k can be written as

    ∫_I [F(x) − G(x)] K(x, t_k) dx = 0

and this implies F − G = 0 in L²(I). Hence, f(t) = g(t) for each t ∈ Ω. Now, let us define H as the range of the integral transform (8),

    H = { f : Ω → C | f(t) = ∫_I F(x)K(x, t) dx, F ∈ L²(I) }

endowed with the norm ‖f‖_H := ‖F‖_{L²(I)}. Recall that, in a Hilbert space H, the polarization identity (Naylor and Sell, 1982, p. 276) allows us to recover
the inner product from the norm by

    ⟨x, y⟩ = ¼ (‖x + y‖² − ‖x − y‖²),    x, y ∈ H

in the case of a real vector space, or by

    ⟨x, y⟩ = ¼ (‖x + y‖² − ‖x − y‖² + i‖x + iy‖² − i‖x − iy‖²),    x, y ∈ H

in the case of a complex vector space. Using the polarization identity, we have a first result:

(H, ‖·‖_H) is a Hilbert space, isometrically isomorphic to L²(I). For each f, g ∈ H,

    ⟨f, g⟩_H = ⟨F, G⟩_{L²(I)}        (9)

where f(t) = ∫_I F(x)K(x, t) dx and g(t) = ∫_I G(x)K(x, t) dx.
Because an isometric isomorphism transforms orthonormal bases into orthonormal bases, we derive the following important property for H by applying the integral transform (8) to the orthonormal basis {φ_n(x)}_{n=1}^∞: {S_n(t)}_{n=1}^∞ is an orthonormal basis for H. Next, we see that (H, ‖·‖_H) is a reproducing kernel Hilbert space, a crucial step for our sampling purposes. For more details on this topic, see Aronszajn's (1950) seminal paper or Higgins (1996), Saitoh (1997), Young (1980), and Zayed (1993). We recall that

A Hilbert space H of functions on Ω is said to be a reproducing kernel Hilbert space (hereafter, RKHS) if all the evaluation functionals E_t(f) := f(t), f ∈ H, are continuous for each fixed t ∈ Ω (or equivalently bounded because they are linear). Then, by the Riesz representation theorem (Naylor and Sell, 1982, p. 345), for each t ∈ Ω there exists a unique element k_t ∈ H such that f(t) = ⟨f, k_t⟩, f ∈ H, where ⟨·, ·⟩ is the inner product in H. Let k(t, s) = ⟨k_s, k_t⟩ = k_s(t) for s, t ∈ Ω. Then,

    ⟨f(·), k(·, s)⟩ = ⟨f, k_s⟩ = f(s)    for every s ∈ Ω        (10)
The function k(t, s) is called the reproducing kernel of H. Equivalently, an RKHS can be defined through the function k(t, s) instead of the continuity of the evaluation functionals. Namely, A functional Hilbert space H is an RKHS if there exists a function k : ×
−→ C such that for each fixed s ∈ , the function k(·, s) belongs to H, and the reproducing property (10) holds for every f ∈ H and s ∈ .
In this case, the continuity of E t follows from the Cauchy–Schwarz inequality. The reproducing property (10) looks somewhat strange because the knowledge of f at a point s ∈ requires the inner product f, k(·, s) which involves the whole f . However, this property has far-reaching consequences from a theoretical point of view, as can be seen subsequently. We can easily prove that the reproducing kernel in an RKHS is unique. Let k ′ (t, s) be another reproducing kernel for H. For a fixed s ∈ , consider ks′ (t) = k ′ (t, s). Then, for t ∈ we have ks′ (t) = ks′ , kt = kt , ks′ = kt (s) = kt , ks = ks (t) hence, k(s, t) = k ′ (s, t) for all t, s ∈ . basis for H, then the reproducing Finally, if {en (t)}∞ n=1 is an orthonormal " kernel can be expressed, as k(t, s) = ∞ n=1 en (t)en (s). Expanding kt in the , we have orthonormal basis {en }∞ n=1 kt =
∞ ∞ en (t)en kt , en en = n=1
n=1
and by using Eq. (5), we find that k(t, s) = ks , kt =
∞
en (s)en (t)
(11)
n=1
As a consequence of the preceding discussion about the RKHS, we obtain the following: (H, · H ) is an RKHS whose reproducing kernel is given by k(t, s) =
∞ n=1
Sn (s)Sn (t) = K (·, t), K (·, s) L 2 (I )
(12)
To prove this, let us use the Cauchy–Schwarz inequality in Eq. (8), obtaining for each fixed t ∈ |E t ( f )| = | f (t)| ≤ F L 2 (I ) K (·, t) L 2 (I ) = f H K (·, t) L 2 (I )
(13)
for every f ∈ H. As for the reproducing kernel formula (12), as a result of Eq. (11) we need to prove only the second equality. To this end, let us consider ′ 2 k (t, s) = K (·, t), K (·, s) L (I ) = K (x, t)K (x, s) d x I
Then, for a fixed s ∈ , k ′ (t, s) is the transform of K (x, s) by Eq. (8). Using the isometry (9), we have f, k ′ (·, s)H = F, K (x, s) L 2 (I ) = F(x)K (x, s) d x = f (s) I
The uniqueness of the reproducing kernel leads to the desired result. It is worth pointing out that inequality (13) has important consequences for the convergence in H. More precisely, and uniform Convergence in the norm · H implies pointwise convergence √ convergence on subsets of where K (·, t) L 2 (I ) = k(t, t) is bounded. At this point we have all the ingredients to obtain a sampling formula for all the functions in H. Expanding an arbitrary function f ∈ H in the orthonormal basis {Sn (t)}∞ n=1 , we have f (t) =
∞ n=1
f, Sn H Sn (t)
where the convergence is in the H-norm sense and hence pointwise in . Taking into account the isometry between H and L 2 (I ), we have f, Sn H = F, φn L 2 (I ) =
f (tn ) an
for each n ∈ N. Hence, we obtain the following sampling formula for H: Each function f in H can be recovered from its samples at the sequence {tn }∞ n=1 through the formula f (t) =
∞ n=1
f (tn )
Sn (t) an
(14)
The convergence of the series√in Eq. (14) is absolute and uniform on subsets of where K (·, t) L 2 (I ) = k(t, t) is bounded. Note that an orthonormal basis is an unconditional basis in the sense that, because of Parseval’s identity (6), any of its reorderings is again an orthonormal basis. Therefore, the sampling series (14) is pointwise unconditionally convergent for each t ∈ and hence absolutely convergent. The uniform convergence follows from inequality (13). We could also have obtained formula (14) "by applying the integral transform (8) to the Fourier series expansion F(x) = ∞ n=1 F, φn L 2 (I ) φn (x) of a function F in L 2 (I ).
A comment about the functional space H is in order. Any f ∈ H can be described by using the sequence of its values { f (tn )}∞ n=1 by means of formula (14). In particular, the inner product and the norm in H can be expressed as f, gH =
∞ f (tn )g(tn ) |an |2 n=1
f 2H =
∞ | f (tn )|2 n=1
|an |2
Some properties for the functional space H can easily be obtained by using the reproducing property (10). Namely, When H is a closed subspace of a larger Hilbert space H, the reproducing formula (10) applied to any f ∈ H gives its orthogonal projection, PH f, onto H; that is, PH f (s) = f, k(·, s)H
f ∈ H and
s∈
(15)
Let f = f 1 + f 2 be the orthogonal decomposition of f ∈ H with f 1 ∈ H (i.e., f 1 = PH f ). Then, f, k(·, s)H = f 1 + f 2 , k(·, s)H = f 1 , k(·, s)H = f 1 (s) because f 2 is orthogonal to any k(·, s), s ∈ . Next we solve, in a simple way, some extremal problems in an RKHS. The interested reader should consult Istratescu (1987), Saitoh (1997), and Yao (1967) for more complex results. Fixing t0 ∈ , E > 0, and M ∈ C, we have in H the following relations: | f (t0 )|2 = E × k(t0 , t0 ) max 2
f ≤E
√ k(s, t0 ) reached for f ∗ (s) = ± E √ k(t0 , t0 )
and min f 2 =
f (t0 )=M
M2 k(t0 , t0 )
reached for f ∗ (s) = M
k(s, t0 ) k(t0 , t0 )
In fact, both results come from the inequality | f (s)|2 = | f, k(·, s)|2 ≤ f 2 k(s, s) ,
s∈
where we have used the reproducing property and the Cauchy–Schwarz inequality. This subsection closes with two approaches to orthogonal sampling formulas which can easily be seen to be related to the sampling formula proposed in this section:
1. A Related Approach Note that given an integral kernel K (x, t), conditions C1 and C2 can be inter∞ preted as the existence of a sequence {tn }∞ n=1 ⊂ , such that {K (x, tn )}n=1 is 2 an orthogonal basis for L (I ). This was the approach originally suggested by Kramer (1957) to obtain orthogonal sampling theorems. Kramer’s result reads as follows: Let K (x, t) be a kernel belonging to L 2 (I ), I being an interval of the real line, for each fixed t ∈ ⊂ R. Assume that there exists a sequence of real numbers {tn }n∈Z such that {K (x, tn )}n∈Z is a complete orthogonal sequence of functions of L 2 (I ). Then for any f of the form f (t) = F(x)K (x, t) d x I
where F ∈ L 2 (I ), we have f (t) =
∞
f (tn )Sn (t)
(16)
n=−∞
with Sn (t) =
I
K (x, t)K (x, tn ) d x |K (x, tn )|2 d x I
The series (16) converges absolutely and uniformly wherever K (·, t) L 2 (I ) is bounded. One of the richest sources of Kramer kernels is in the subject of self-adjoint boundary value problems. For more details and references, see Everitt and Nasri-Roudsari (1999), Higgins (1996), Zayed (1991, 1993), and Zayed et al. (1990). By using orthonormal bases in ℓ2 spaces to define the kernel (7), we can easily arrive at sampling expansions associated with discrete transforms of the type f (t) = F(n)K (n, t) {F(n)} ∈ ℓ2 n
This leads to the discrete version of Kramer’s result. See Annaby et al. (1999) and Garc´ıa and Hern´andez-Medina (2001) for a more specific account of the theory and examples.
2. Another Related Approach Another similar formulation is that one given in Nashed and Walter (1991) and Saitoh (1997): Let H be an RKHS of functions defined on a subset of R with reproducing ∞ kernel k. Assume there exists a sequence {tn }∞ n=1 ⊂ such that {k(·, tn )}n=1 is an orthogonal basis for H. Then, any f ∈ H can be expanded as f (t) =
∞ n=1
f (tn )
k(t, tn ) k(tn , tn )
with convergence absolute and uniform on subsets of where k(t, t) is bounded. This result follows from the expansion of f in the orthonormal basis √ {k(·, tn )/ k(tn , tn )}∞ n=1 . Note that, in our construction, k(t, tn ) = K (·, t), K (·, tn ) L 2 (I ) = a¯ n Sn (t) and k(tn , tn ) = |an |2 . This approach is used in Section II.C (Finite Sampling). B. Putting the Theory to Work The main aim in this subsection is to derive some of the well-known sampling formulas by following the method exposed in the previous subsection. All the examples in this subsection are based on the knowledge of specific orthonormal bases for some L 2 spaces (see Naylor and Sell, 1982, pp. 322–329, and Zayed, 1996a, for accounts of bases and integral transforms, respectively). 1. Classical Band-Limited Functions √ 2 The set of functions {e−inx/ 2π }n∈Z is an orthonormal basis √ for L [−π, π]. it x Let us consider the Fourier integral kernel K (x, t) = e / 2π. For a fixed t ∈ R, we have ∞ einx 1 eit x eit x , einx L 2 [−π,π] √ = √ 2π n=−∞ 2π 2π
=
∞ sin π(t − n) einx √ π (t − n) 2π n=−∞
in
L 2 [−π, π ]
Therefore, taking Sn (t) = sin π (t − n)/π (t − n) and tn = n, n ∈ Z, we obtain the WSK sampling theorem: Any function of the form π 1 F(x)eit x d x f (t) = √ 2π −π
with
F ∈ L 2 [−π, π ]
(i.e., band limited to [−π, π ] in the classical sense) can be recovered from its samples at the integers by means of the cardinal series f (t) =
∞
f (n)
n=−∞
sin π (t − n) π (t − n)
(17)
The series converges absolutely and uniformly on R because in this case K (·, t)2L 2 [−π,π] = 1
for all t ∈ R
For the moment, let us denote as Hπ the corresponding H space. We reconsider this space, the so-called Paley–Wiener space, in a subsequent section. The reproducing kernel in Hπ space is given by kπ (t, s) = =
1 it x isx sin π(t − s) e , e L 2 [−π,π] = 2π π(t − s) ∞ sin π (t − n) sin π(s − n)
n=−∞
π(t − n)
π(s − n)
where we have used Eqs. (12) and (11), respectively. Actually, the sampling points need not be taken at the integers for us to recover functions in Hπ . For a fixed√real number α, we can easily check that the sequence of functions {e−i(n+α)x/ 2π}n∈Z is also an orthonormal basis for L 2 [−π, π ]. For a fixed t ∈ R, we have the expansion ∞ eit x sin π (t − n − α) ei(n+α)x = √ √ π (t − n − α) 2π 2π n=−∞
in
L 2 [−π, π]
Taking Sn (t) = sin π(t − n − α)/π (t − n − α) and tn = n + α , n ∈ Z, we obtain the following: Any function in Hπ can be recovered from its samples at the integers shifted by a real constant α by means of the cardinal series f (t) =
∞
n=−∞
f (n + α)
sin π(t − n − α) π (t − n − α)
(18)
The preceding result shows that in regular sampling the significance relies on the spacing of the sampling points and not on the sampling points themselves. √ −inx Note that {e / 2π }n∈Z is also an orthonormal basis for any L 2 [ω0 − π, ω0 + π ], with ω0 a fixed real number. We then obtain ∞ eit x 1 einx = eit x , einx L 2 [ω0 − π, ω0 + π ] √ √ 2π n=−∞ 2π 2π
=
∞
n=−∞
eiω0 (t−n)
sin π (t − n) einx √ π (t − n) 2π
in
L 2 [ω0 − π, ω0 + π]
As a consequence, the following sampling result for signals with a nonsymmetric band of frequencies with respect to the origin arises: Any function of the form ω0 +π 1 F(x)eit x d x f (t) = √ 2π ω0 −π
with
F ∈ L 2 [ω0 − π, ω0 + π]
can be recovered by means of the series f (t) =
∞
n=−∞
f (n)eiω0 (t−n)
sin π (t − n) π(t − n)
(19)
It is worth pointing out the following result concerning the band of frequencies of a band-limited real-valued signal f : if the Fourier transform F of a real-valued function f is zero outside an interval, then it must be symmetric with respect to the origin. Indeed, |F(x)|2 = F(x)F(x) = F(x)F(−x) is an even function. The choice of the interval [−π, π] is arbitrary. The same result applies to any compact interval [−π σ, π σ ] by taking the samples {√ f (n/σ )}n∈Z and replacing t with σ t in the cardinal series (17). Indeed, {e−inx/σ/ 2π σ }n ∈ Z is an orthonormal basis for L 2 [−π σ, π σ ]. For a fixed t ∈ R, we have the expansion ∞ eit x einx/σ 1 eit x , einx/σ L 2 [−πσ,πσ ] √ = √ √ 2π σ n=−∞ 2π 2π σ
∞ √ sin π (σ t − n) einx/σ σ in L 2 [−π σ, π σ ] √ π (σ t − n) 2π σ n=−∞ √ Therefore,√taking Sn (t) = σ [sin π(σ t − n)/π(σ t − n)], tn = n/σ , n ∈ Z, and an = σ , we obtain the following:
=
Any function of the form πσ 1 F(x)eit x d x f (t) = √ 2π −π σ
with
F ∈ L 2 [−π σ, π σ ]
can be expanded as the cardinal series ∞ # n $ sin π(σ t − n) f (t) = f σ π(σ t − n) n=−∞
(20)
We have the same convergence properties as in Eq. (17) because K (·, t)2L 2 [−πσ,πσ ] = σ . Moreover, the reproducing kernel for the corresponding space Hπσ is kπ σ (t, s) =
sin π σ (t − s) = σ sinc σ (t − s) π(t − s)
(21)
2. Band-Limited Functions in the Fractional Fourier Transform Sense √ The sequence {(1/ 2σ )e−iπnx/σ is an orthonormal basis for L 2 [−σ, σ ]. √ }n∈Z −iπnx/σ iax 2 e }n∈Z , with a ∈ R, is also an orIt is easy to prove that {(1/ 2σ )e thonormal basis for L 2 [−σ, σ ]. Let a and b be two nonzero real constants. For notational ease let us denote 2ab = 1/c. The meaning of these constants is discussed later. Direct calculations show that the expansion & ∞ % eiπnx/σ −iax 2 eiπ nx/σ −iax 2 2 2 2 2 e e e−ia(t +x −2bxt) = e−ia(t +x −2bxt) , √ √ 2σ 2σ L 2 [−σ,σ ] n=−∞ ∞ √
sin(σ/c)(t − nπ c/σ ) eiπ nx/σ −iax 2 e √ (σ/c)(t − nπc/σ ) 2σ n=−∞ √ 2 holds in the L 2 [−σ, σ ] sense. Let us set Sn (t) = 2σ e−iat [sin(σ/c)(t − nπc/σ )/(σ/c)(t − nπ c/σ ] and tn = nπ c/σ, n ∈ Z. Because Sn (tk ) = √ 2 2σ e−iatn δn,k , we obtain the following: =
2σ e−iat
2
For any function f of the form σ 2 2 f (t) = F(x)e−ia(t +x −2bxt) d x,
with
−σ
F ∈ L 2 [−σ, σ ]
(22)
the following sampling formula f (t) = holds.
∞
n=−∞
f (tn )e−ia(t
2
− nπ c/σ ) (σ/c)(t − nπc/σ )
−tn2 ) sin(σ/c)(t
(23)
In this case, the reproducing kernel obtained from Eq. (12) is kσ (t, s) = 2σ e−ia(t
2
− s) (σ/c)(t − s)
−s 2 ) sin(σ/c)(t
Because kσ (t, t) = 2σ , the series in Eq. (23) converges uniformly in R. Our next purpose is to see how formula (22) and the fractional Fourier transform (FRFT) are related. Recall that the FRFT with angle α ∈ {0, π } of a function f (t) is defined as ∞ Fα [ f ](x) = f (t)K α (x, t) dt −∞
where, apart from a normalization constant, the integral kernel K α (x, t) is given by ei[(cot α)/2] (t
2
+x 2 )−i[xt/(sin α)]
(24)
For α = 0 the FRFT is defined by F0 [ f ](x) = f (x), and for α = π, by Fπ [ f ](x) = f (−x). Whenever α = π/2, the kernel (24) coincides with the Fourier kernel. Otherwise, Eq. (24) can be rewritten as eia(α)[t
2
+x 2 −2b(α)xt]
where a(α) = (cot α)/2 and b(α) = sec α. The inversion formula of the FRFT (see Zayed, 1996c) is given by ∞ 1 Fα (x)K −α (x, t) d x f (t) = √ 2π −∞ Consequently, formula (23) is just the sampling expansion for a function band limited to [−σ, σ ] in the FRFT sense (22). Note that 2a(α)b(α) = 1/(sin α), and c = sin α in the sampling expansion (23). The FRFT has many applications in several areas, including quantum mechanics, optics, and signal processing (Almeida, 1994; Namias, 1980; Ozaktas and Mendlovic, 1993, 1995). In particular, the propagation of light can be viewed as a process of continual FRFT. This allows us to pose the FRFT as a tool for analyzing and describing some optical systems (Ozaktas and Mendlovic, 1995). For the FRFT properties and the relationship of the FRFT to sampling, see Xia (1996), Zayed (1996c, 1998a, 1998b), and Zayed and Garc´ıa (1999). 3. Finite Sine and Cosine Transforms In this subsection two transforms closely related to the Fourier transform are examined.
a. Finite Cosine Transform 2 Let us consider the orthogonal basis {cos nx}∞ n=0 in L [0, π ]. Note that 2 cos nx L 2 [0,π ] equals π/2 for n ≥ 1, and π for n = 0. For a fixed t ∈ R, we expand the function cos t x in this basis, obtaining & ∞ % cos nx cos nx cos t x, cos t x = cos nx L 2 [0,π] cos nx n=0 =
∞ sin πt (−1)n 2t sin π t + cos nx πt π (t 2 − n 2 ) n=1
L 2 [0, π ]
in
Therefore, choosing S0 (t) = sin π t/π t, Sn (t) = [(−1)n 2t sin πt]/[π (t 2 − n 2 )], and tn = n, n ∈ N ∪ {0}, we have the following: Any function of the form π f (t) = F(x) cos t x d x
F ∈ L 2 [0, π ]
with
0
can be expanded as ∞ sin πt 2 (−1)n t sin πt f (t) = f (0) + f (n) πt π n=1 t 2 − n2
The convergence of the series is absolute and uniform on R because sin 2tπ π + 2 4t is bounded for all t ∈ R. The reproducing kernel for the corresponding Hcos space is given by π 1 sin π(t − s) sin π(t + s) kcos (t, s) = + cos t x cos sx d x = 2 t −s t +s 0 K (·, t)2L 2 [0,π ] =
=
t2
1 [t sin tπ cos sπ − s cos tπ sin sπ ] − s2
b. Finite Sine Transform √ In a similar way, let us consider the orthonormal basis { 2/π sin nx}∞ n=1 in L 2 [0, π ]. For a fixed t ∈ R, we have sin t x = =
∞ 2 sin t x, sin nx L 2 [0,π] sin nx π n=1
∞ 2(−1)n n sin πt n=1
π(t 2 − n 2 )
sin nx
in
L 2 [0, π ]
81
SAMPLING THEORY
Taking Sn (t) = [2(−1)n n sin πt]/[π (t 2 − n 2 )] and tn = n, n ∈ N, we obtain the following: Any function of the form π F(x) sin t x d x f (t) =
with
0
F ∈ L 2 [0, π ]
can be expanded as f (t) =
∞ (−1)n n sin π t 2 f (n) π n=1 t 2 − n2
The convergence of the series is absolute and uniform on R because in this case π sin 2tπ K (·, t)2L 2 [0,π] = − 2 4t is bounded for all t ∈ R. The reproducing kernel for the corresponding Hsin space is given by π 1 sin π (t − s) sin π (t + s) − sin t x sin sx d x = ksin (t, s) = 2 t −s t +s 0 =
1 [−t cos tπ sin sπ + s sin tπ cos sπ ] t 2 − s2
The cardinal series (17) is absolutely convergent and hence unconditionally convergent. Therefore, it can be written, if terms are grouped, in the equivalent form ' ∞ f (−n) f (n) sin π t f (0) + + (−1)n f (t) = π t t −n t +n n=1
As a consequence, the sampling expansion associated with the finite cosine transform (finite sine transform) is nothing more than the cardinal series (17) for an even (odd) function. Moreover, it is easy to prove that the orthogonal sum
Hπ = Hsin ⊕ Hcos holds. In fact, using Euler formulas sin t x =
eit x − e−it x 2i
and
cos t x =
eit x + e−it x 2
we obtain Hsin ⊂ Hπ and Hcos ⊂ Hπ as sets, and f, gHsin = 1/π f, gHπ for f, g ∈ Hsin (the same occurs for f, g ∈ Hcos ). Then, having in mind the
ANTONIO G. GARC´IA
82
reproducing property (10) and Eq. (15), for s ∈ R and f ∈ Hπ we have f (s) = f, kπ (·, s)Hπ = π1 f, (ksin + kcos )(·, s)Hπ = f, ksin (·, s)Hsin + f, kcos (·, s)Hcos
f (s) + f (−s) f (s) − f (−s) + 2 2 Using an appropriate normalization, we could avoid the factor 1/π. =
4. Classical Band-Limited Functions Revisited Let us consider the product Hilbert space H = L 2 [0, π ] × L 2 [0, π ] endowed with the norm F2H = F1 2L 2 [0,π ] + F2 2L 2 [0,π ] for every F = (F1 , F2 ) ∈ H. √ The system of functions {(1/ π)(cos nx, sin nx)}n∈Z is an orthonormal basis for H. For a fixed t ∈ R we have & ∞ % 1 (cos t x, sin t x) = (cos t x, sin t x), √ (cos nx, sin nx) π H n=−∞ 1 × √ (cos nx, sin nx) π
∞ sin π (t − n) 1 √ √ (cos nx, sin nx) π(t − n) π n=−∞ √ in the H sense.√Taking Sn (t) = sin π(t − n)/ π(t − n) and tn = n ∈ Z, we have Sn (tk ) = π δn,k . As a consequence,
=
Any function of the form π {F1 (x) cos t x + F2 (x) sin t x} d x f (t) =
with
0
F1 , F2 ∈ L 2 [0, π ]
can be expanded as the cardinal series f (t) =
∞
n=−∞
f (n)
sin π (t − n) π (t − n)
The corresponding H space is the space Hπ in Section II.B.1. For f ∈ Hπ we have 0 π π 1 1 it x it x it x F(x)e d x = √ F(x)e d x + F(x)e d x f (t) = √ 2π −π 2π −π 0 0 1 F(x)(cos t x + i sin t x) d x =√ 2π −π
83
SAMPLING THEORY
+
π
0
π
F(x)(cos t x + i sin t x) d x
1 √ [F(x) + F(−x)] cos t x 2π 0 i + √ [F(x) − F(−x)] sin t x d x 2π π {F1 (x) cos t x + F2 (x)sin t x} d x = =
0
√ √ where F1 (x) = (1/ 2π )[F(x) + F(−x)] and F2 (x) = (i/ 2π)[F(x) − F(−x)] belong to L 2 [0, π ]. In particular, taking F1 = F2 = F ∈ L 2 [0, π ], we obtain the sampling expansion for a function f band limited to [0, π ] in the sense of the Hartley transform. More precisely, Any function of the form π f (t) = F(x)[cos t x + sin t x] d x 0
with
F ∈ L 2 [0, π ]
can be expanded as a cardinal series (17). Recall that the Hartley transform of a function F, defined as ∞ f (t) = F(x)[cos t x + sin t x] d x 0
was introduced by R. V. L. Hartley, an electrical engineer, as a way to overcome what he considered a drawback of the Fourier transform, namely, representing a real-valued function F(x) by a complex-valued one: ∞ g(t) = F(x)[cos t x − i sin t x] d x −∞
For more information about the Hartley transform see, for instance, Zayed (1996a, p. 265). 5. The ν-Bessel–Hankel Space √ The Fourier–Bessel set { x Jν (xλn )}∞ n=1 is known to be an orthogonal basis for L 2 [0, 1], where λn is the nth positive zero of the Bessel function Jν (t), ν > − 12
ANTONIO G. GARC´IA
84
(Watson, 1994, p. 580). The Bessel function of order ν is given by ( 2n ) ∞ (−1)n t tν 1+ Jν (t) = ν 2 Ŵ(ν + 1) n!(1 + ν) · · · (n + ν) 2 n=1 Jν satisfies the Bessel differential equation:
t 2 y ′′ + t y ′ + (t 2 − ν 2 )y = 0 Using special function formulas (Abramowitz and Stegun, 1972, Eq. 11.3.29), for a fixed t > 0, we have √ ∞ √ 2 tλn Jν (t) √ x Jν (xλn ) xt Jν (xt) = in L 2 [0, 1] ′ (λ ) t 2 − λ2 J n n n=1 ν
Hence,
The range of the integral transform 1 √ f (t) = F(x) xt Jν (xt) d x 0
F ∈ L 2 [0, 1]
is an RKHS Hν , and the sampling expansion √ ∞ 2 tλn Jν (t) f (t) = f (λn ) ′ Jν (λn ) t 2 − λ2n n=1
(25)
(26)
holds for f ∈ Hν .
Using a well-known integral (Watson, 1944, p. 134), we find that the reproducing kernel is √ st kν (s, t) = 2 {t Jν+1 (t)Jν (s) − s Jν+1 (s)Jν (t)} t − s2 Furthermore, 1 2 x|Jν (xt)|2 d x K (·, t) L 2 (0,1) = kν (t, t) = t 0
t ν2 1 ′ 2 2 = [Jν (t)] + 1 − 2 [Jν (t)] = t O 2 t t
as t goes to ∞ (Watson, 1944). As a consequence, the convergence of the series in Eq. (26) is absolute and uniform in any interval [t0 , ∞) with t0 > 0. Note that the integral kernel in Eq. (25) is that of the Hankel transform. Recall that the Hankel transform of a function F is defined as ∞ √ F(x) xt Jν (xt) d x t >0 ν > − 21 f (t) = 0
85
SAMPLING THEORY
It defines a unitary (i.e., a bijective isometry) operator L 2 [0, ∞) −→ L 2 [0, ∞) which is self-inverse (Naylor and Sell, 1982, p. 366). Therefore, functions in Hν are the functions in L 2 [0, ∞), band limited to [0, 1] in the Hankel transform sense, and Eq. (26) is the associated sampling formula. See Higgins (1972) and Zayed (1996a, p. 371) for more details about the Hankel transform and its associated sampling series. 6. The Continuous Laguerre Transform ∞ −x/2 is an orthonormal basis for L 2 [0, ∞), where The sequence "n {e k L n (x)}n=0 n k L n (x) = k=0 (−1) (1/k!)(k )x is the nth Laguerre polynomial. A continuous extension L t (x) of the Laguerre polynomials can be found in (Zayed, 1993, p. 144). It is given by
L t (x) = L t (x) is a C
∞
∞
L n (x)
n=0
sin π(t − n) π (t − n)
function that satisfies the Laguerre differential equation, x y ′′ + (1 − x)y ′ + t y = 0
which is the same differential equation satisfied by L n (x) when t is replaced by n. For our sampling purposes, the most important feature is that the expansion e−x/2 L t (x) = holds in L 2 [0, ∞). Therefore,
∞ sin π (t − n) −x/2 e L n (x) π(t − n) n=0
Any function of the form ∞ F(x)e−x/2 L t (x) d x f (t) =
with
0
F ∈ L 2 [0, ∞)
can be expanded as the sampling series f (t) =
∞ n=0
f (n)
sin π(t − n) π (t − n)
In a similar way, we can consider other families of special functions defining integral transforms and seek the associated sampling expansion. This is the case, for instance, of the continuous Legendre transform involving the Legendre function, the finite continuous Jacobi transform involving the Jacobi function, or more general versions of the continuous Laguerre transform considered in this example. See Zayed (1993, Chap. 4) for a complete discussion of this topic.
ANTONIO G. GARC´IA
86
7. The Multidimensional WSK Theorem The general theory in Section II.A can easily be adapted to higher dimensions. For simplicity let us consider the bidimensional case. The sequence {e−inx e−imy /2π } is an orthonormal basis for L 2 (R), where R denotes the square [−π, π ] × [−π, π]. For a fixed (t, s) ∈ R2 , we have 1 it x isy sin π(t − n) sin π(s − m) 1 inx imy e e = e e 2π π (t − n) π (s − m) 2π n,m
in
L 2 (R)
The functions Snm (t, s) = [sin π(t − n)/π(t − n)][sin π (s − m)/π(s − m)] and the sequence {tnm = (n, m)}, n, m ∈ Z, satisfy conditions C1 and C2 in Section II.A. Therefore, Any function of the form π π 1 f (t, s) = F(x, y)eit x eisy d x dy 2π −π −π
with
F ∈ L 2 (R)
can be recovered by means of the double series f (t, s) =
n,m
f (n, m)
sin π (t − n) sin π(s − m) π(t − n) π(s − m)
The series converges absolutely and uniformly on R2 . Similarly, we can get bidimensional versions of sampling formulas such as Eq. (18) or (19) by considering orthonormal bases in L 2 (R) obtained from orthonormal bases in each separate variable. Certainly, we can always find a rectangle enclosing the bounded support B of the bidimensional Fourier transform of a bidimensional band-limited signal f. Thus, we can use the bidimensional WSK formula to reconstruct f. However, this is clearly inefficient from a practical point of view because we are using more information than is strictly needed. In general, the support of the Fourier transform B is an irregularly shaped set. So, obtaining more efficient reconstruction procedures depends largely on the particular geometry of B. See Higgins (1996, Chap. 14) for a more specific account. In contrast, regular multidimensional sampling corresponds to a Cartesian uniform sampling grid which is used in signal and image processing whenever possible. However, the practice imposes other sampling grids, such as the polar grid used in computed tomography or the spiral grid used for fast magnetic resonance (see, for example, Bourgeois et al., 2001, and Stark, 1992). Consequently, in general, irregular sampling is more suitable than regular sampling for multidimensional signals.
87
SAMPLING THEORY
8. The Mellin–Kramer Sampling Result First, the ingredients necessary to understand the subsequent development of the Mellin–Kramer sampling result are introduced. They are taken, besides the main result, from Butzer and Jansche (1999). A function f : R+ −→ C is called c recurrent for c ∈ R, if f (x) = e2πc f (e2π x) for all x ∈ R+ where R+ stands for (0, +∞). The functional space 1 Yc2 := f : R+ −→ C ; f ∈ L loc (R+ ), c recurrent, and eπ dx <∞ |f (x)x c |2 x e−π is a Hilbert space under the inner product eπ dx f, gMc = f (x)g(x)x 2c −π x e
It is known that the sequence {e−ck x −c−ik }k∈Z forms an orthogonal basis for Yc2 . The same occurs for its conjugate sequence. Next, let us consider the kernel K c (t, x) = t −c x −c−i log t
t ∈ R+
x ∈ [e−π , eπ ]
For a fixed t ∈ R+ , we have the expansion K c (t, x) = Sc,k (t)e−ck x −c−ik k∈Z
in Yc2 , where
Sc,k (t) =
e2ck 1 K c (t, ·), K c (ek , ·)Mc = 2π 2π
π
(e−k t)−c−iu du
−π
For the proofs and more details, see Butzer and Jansche (1999). Therefore, taking tk = ek , k ∈ Z, as sampling points and taking into account that Sc,k (tm ) = δk,m , we obtain the following exponential sampling result: If f can be represented in the form eπ dx f (t) = F(x)K c (t, x) x 2c −π x e
t ∈ R+
for some c ∈ R and some F ∈ Yc2 , then f can be reconstructed by means of the exponential sampling f (t) =
∞
k=−∞
f (ek )Sc,k (t)
88
ANTONIO G. GARC´IA
This sampling result is valid for Mellin-band-limited functions—that is, functions f represented as 1 1 f (t) = M[ f ](c + iu) t −c−iu du t ∈ R+ 2π −1
Recall that the Mellin transform is defined by ∞ M[ f ](s) := f (u) u s−1 du 0
s = c + it ∈ C
whenever the integral exists. Again, see Butzer and Jansche (1999) for complete details. An application of exponential sampling in optics can be found in Gori (1992). Finally, in general, we can easily construct spaces H as in Section II.A having a sampling property at a sequence {tn }∞ n=1 as in "formula (14). To this end, let t1 , t2 , . . ., be distinct real numbers such that n 1/|tn |2 < ∞. There exists an analytic function P(t) with simple zeros at the sequence {tn }∞ n=1 (Marsden and Hoffman, 1987, p. 457). Specifically, the function P(t) is given by the canonical product ⎧* ∞ ∞ t ⎪ ⎪ 1 − |tn |−1 = ∞ exp(t/t ) if ⎪ n ⎨ t n n=1 n=1 P(t) = ∞ ∞ * ⎪ t ⎪ ⎪ 1− if |tn |−1 < ∞ ⎩ t n n=1 n=1 whenever tn = 0 for all n ∈ N, or by ⎧* ∞ t ⎪ ⎪ exp(t/tn ) 1− ⎪ ⎨t tn n=2 P(t) = ∞ * ⎪ t ⎪ ⎪ 1− ⎩t tn n=2
if
∞ n=2
if
∞ n=2
|tn |−1 = ∞ |tn |−1 < ∞
when, for instance, t1 = 0 (see Young, 1980, p. 55, for details). Taking Sn (t) = P(t)/(t − tn ) and any orthonormal basis {φn (x)}∞ n=1 for an 2 L (I ) space, we can follow the steps in Section II.A to construct an RKHS H with the sampling property at the given sequence {tn }∞ n=1 . Thus, if we take into account that Sn (tk ) = P ′ (tn )δn,k , formula (14) ensures that any function of the form (8) can be expanded as the Lagrange-type interpolation series f (t) =
∞ n=1
f (tn )
P(t) (t − tn )P ′ (tn )
This result was introduced, in connection with an inverse sampling problem, in Zayed (1996b).
89
SAMPLING THEORY
C. Finite Sampling Consider (H N , ·, ·H N ) a Euclidean finite-dimensional functional space comprising functions defined on ⊂ R. Let N be its dimension and let {ϕ1 , ϕ2 , . . . , ϕ N } be an orthonormal basis for H N . Because in a finite-dimensional space every linear functional is bounded, H N is an RKHS whose reproducing kernel, given by Eq. (11), is k N (t, s) =
N
ϕi (t)ϕi (s)
i=1
"N We can easily check that if f (t) = i=1 ai ϕi (t) ∈ H N , where ai ∈ C, then + , + , N N N f, ϕi (t)ϕi (s) ai ϕi (t), = ϕi (t)ϕi (s) i=1
i=1
i=1
HN
=
N i=1
HN
ai ϕi (s) = f (s)
∞ is When H N is a subspace of a larger Hilbert space H (e.g., when {ϕi }i=1 an orthonormal basis for H), by applying property (15), for every f ∈ H we obtain
f, k N (·, s)H N =
N i=1
f, ϕi ϕi (s)
(27)
(i.e., its orthogonal projection onto H N ). The reproducing formula for H N is a useful" tool to prove pointwise convergence of the generalized Fourier ∞ ai ϕi (t) whenever it holds (see Walter, 1994). expansion i=1 We can derive a finite sampling expansion for H N in the following way. N in such that Assume that there exists a finite sequence of points {sn }n=1 N {k N (t, sn )}n=1 are orthogonal in H N ; that is, k N (·, sm ), k N (·, sn )H N = k N (sn , sm )δnm N Then, expanding any function f ∈ H N in the orthogonal basis {k N (t, sn )}n=1 , we obtain the following finite sampling expansion:
f (t) =
N n=1
f (sn )
k N (t, sn ) k N (sn , sn )
In this context, two examples are of particular interest.
(28)
ANTONIO G. GARC´IA
90
1. Trigonometric Polynomials Consider H N the space of trigonometric polynomials of degree ≤ N and period 2π. H N is a subspace of L 2 [−π, π ] endowed with the usual inner product. An orthonormal basis for H N is given by the set of exponential complex √ N {eikt / 2π}k=−N . Therefore, the reproducing kernel for H N is k N (t, s) =
N 1 1 D N (t − s) eik(t−s) = 2π k=−N 2π
where D N denotes the N th Dirichlet kernel defined as N sin N + 12 t ikt e = D N (t) = sin (t/2) k=−N As a consequence of Eq. (27), we obtain the following well-known result for f ∈ L 2 [−π, π ]: f, k N (·, t) L 2 [−π,π] =
N 1 f, eikt eikt 2π k=−N
(i.e., its N th Fourier partial sum). Now we can obtain a sampling formula for the space H N of the trigonometric polynomials of degree ≤ N . To this end, let us consider the points sn = (2π n)/(2N + 1) ∈ [−π, π], n = −N , . . . , 0, . . . , N . Because k N (sm , sn ) =
sin π(m − n) 2N + 1 1 = δmn 2π sin π (m − n)/(2N + 1) 2π
a direct application of the sampling formula (28) gives N sin[(2N + 1)/2][t − 2πn/(2N + 1)] 2πn 1 p p(t) = 2N + 1 n=−N 2N + 1 sin 12 [t − 2π n/(2N + 1)]
"N for every trigonometric polynomial p(t) = k=−N ck eikt in H N . This interpolation formula, developed by Cauchy, dates to 1841 and is related to the finite version of Shannon’s sampling theorem. 2. Orthogonal Polynomials Another important class of examples is given by finite families of orthogonal polynomials on an interval of the real line. As an illustration, we restrict
SAMPLING THEORY
91
ourselves to the particular case of the Legendre polynomials {Pn }∞ n=0 defined, for instance, by means of their Rodrigues formula: Pn (t) =
1 dn 2 [(t − 1)n ] 2n n! dt n
It is known that they form an orthogonal basis for L 2 [−1, 1] and that Pn 2 = (n + 12 )−1 . Let us consider H N the finite subspace of L 2 [−1, 1] spanned by {P0 , P1 , . . . , PN }. For this space we have k N (t, s) =
N n=0
Pn (t)Pn (s) =
N + 1 PN +1 (t)PN (s) − PN (t)PN +1 (s) 2 t −s
where we have used the Christoffel–Darboux formula for Legendre polynomials. Note that k N (t, t) =
N +1 ′ [PN +1 (t)PN (t) − PN′ (t)PN +1 (t)] 2
N We seek points {sn }n=0 in [−1, 1] such that k N (sm , sn ) = 0 for m = n; that is,
PN +1 (sn ) PN +1 (sm ) = PN (sm ) PN (sn ) N the N + 1 simple roots of PN +1 in (−1, 1). In particular we can take for {sn }n=0 Thus, we obtain the finite sampling formula
f (t) = -
N n=0
f (sn )
PN +1 (t) (t − sn )PN′ +1 (sn )
"N ck (k + 21 )Pk (t). Note that this formula is nothing but for every f (t) = k=0 N the Lagrange interpolation formula for the samples { f (sn )}n=0 . In general, we N can take as sampling points {sn }n=0 the N + 1 simple roots of the polynomial PN +1 (t) − c PN (t) in (−1, 1), where c ∈ R. The sampling formula in this general case is f (t) =
N n=0
f (sn )
PN +1 (t)PN (sn ) − PN (t)PN +1 (sn ) (t − sn )PN (t)[PN′ +1 (sn ) − c PN′ (sn )]
General results about families of orthogonal polynomials can be found, for instance, in the classical works of Sansone (1991) and Szeg¨o (1991) or in Walter (1994). More examples and applications can be found in Annaby et al. (1999), Higgins (1996), and Plotkin et al. (1996).
ANTONIO G. GARC´IA
92
III. Classical Paley–Wiener Spaces Revisited In this section we extend the important example of the classical band-limited functions given in Section II.B.1 by exploring a little further the space (Hπ , · Hπ) and the isometry between Hπ and L 2 [−π, π ] obtained there. It is well known that the Fourier transform F is a unitary operator on L 2 (R); that is,
F : L 2 (R) −→ L 2 (R)
f −→ F ( f ) = . f
is a linear, bijective transform satisfying f L 2 (R) = . f L 2 (R) for every f ∈ 2 . L (R) (Naylor and Sell, 1982, p. 362). Whenever f or f are in L 1 (R) ∩ L 2 (R), the Fourier or inverse Fourier transform coincides with the parametric integrals ∞ ∞ 1 1 −itω . . f (ω)eitω dω f (ω) = √ f (t)e dt or f (t) = √ 2π −∞ 2π −∞
f in L 2 (R), respectively (Naylor and Sell, 1982, p. 335). For functions f , . the integrals must be understood as limits in the mean. Thus, for . f we have / /2 N / / 1 −itω /. f (t)e dt // dω −→ 0 / f (ω) − √2π −∞ −N
∞
as N → ∞ (Naylor and Sell, 1982, p. 362). As a consequence of this discussion, the space ' π 1 F(x)eit x d x , F ∈ L 2 [−π, π] Hπ = f : R −→ C | f (t) = √ 2π −π coincides with the closed subspace of L 2 (R) given by F −1 L 2 [−π, π] —that is, the classical Paley–Wiener space given by f ⊆ L 2 [−π, π ]} P Wπ := { f ∈ L 2 (R) ∩ C (R), supp .
where . f denotes the support of the Fourier transform of f . Hence, . f is zero outside [−π, π] for any f ∈ P Wπ . The isometry between Hπ and L 2 [−π, π ] is nothing more than the restriction of the Fourier transform to P Wπ , and the inner product is given by ∞ f, gHπ = f (x)g(x) d x = f, g L 2 (R) −∞
93
SAMPLING THEORY
A. Fourier Duality The Paley–Wiener space P Wπ can be expressed without resorting to the Fourier transform. Namely, any function f ∈ P Wπ can be extended to any z ∈ C as π 1 . f (z) = √ (29) f (ω)ei zω dω 2π −π
Thus we get a holomorphic (or analytic) function on C (i.e., an entire function). To this end, we first prove that f is a continuous function on C by using a standard argument allowing interchange of the limit with the integral. After this, we apply Morera’s theorem (Marsden and Hoffman, 1987, p. 173): whenever γ : [a, b] −→ C is a closed curve in C, the integral b π 1 iγ (t)ω . f (z) dz = √ dω γ ′ (t) dt f (ω)e 2π a γ −π
is shown to be zero by interchanging the order of the integrals. Moreover, f is a function of exponential type at most π (i.e., satisfies an inequality | f (z)| ≤ Aeπ|z| for all z ∈ C and some positive constant A). This follows from Eq. (29) by using the Cauchy–Schwarz inequality. For z = x + i y ∈ C we have π eπ|y| π . 1 |. f (ω)| e−yω dω ≤ √ | f (ω)|dω | f (x + i y)| ≤ √ 2π −π 2π −π ≤ eπ|z| f P Wπ
(30)
Conversely, the classical Paley–Wiener theorem (Young, 1980, p. 100) shows that P Wπ coincides with the space of entire functions of exponential type at most π with square integrable restriction to the real axis; that is, P Wπ = { f ∈ H(C) : | f (z)| ≤ Aeπ|z| , f |R ∈ L 2 (R)} The isometric isomorphism given by the Fourier transform
F P Wπ −→ L 2 [−π, π ] f −→ . f
1 f (z) = √ 2π
π
−π
. f (ω) ei zω dω
is called the Fourier duality between the spaces P Wπ and L 2 [−π, π ], and it has far-reaching consequences. Any expansion converging in L 2 [−π, π] is transformed by F −1 into another expansion which converges in the topology of P Wπ . This implies, by the reproducing kernel property, that it converges uniformly on R, as shown in Section II.B.1.
ANTONIO G. GARC´IA
94
The following eight nontrivial properties of P Wπ can easily be established by using the Fourier duality: 1. The energy of f ∈ P Wπ (i.e., its L 2 norm) is contained in that of its samthe coefficients of the Fourier expansion ples { f (n)}n∈Z . Because { f (n)}n∈Z are√ of . f in the orthonormal basis {e−inx / 2π}n∈Z , Parseval’s formula (6) gives f 2L 2 [−π,π ] = f 2P Wπ = .
∞
n=−∞
| f (n)|2 = { f (n)}2ℓ2 (Z)
2. The sequence {sinc (z − n)}n∈Z is an orthonormal basis in P Wπ . Expanding f ∈ P Wπ in this basis, we obtain its WKS expansion (17). Also, for each fixed α ∈ R, {sinc (z − n − α)}n∈Z is an orthonormal basis for P Wπ giving the sampling expansion (18). 3. P Wπ is an RKHS whose reproducing kernel is given by kπ (z, w) = sinc (z − w), whenever z, w in C. Recall that for real variables t, s we obtained kπ (t, s) = sinc (t − s). For complex variables it is necessary to conjugate the second variable in the cardinal sine function. Indeed, 1 f (w) = √ . f (x), e−iwx L 2 [−π,π ] = f (z), sinc (z − w) P Wπ 2π
w∈C
by using the Fourier duality. 4. Convergence in the norm of f ∈ P Wπ implies uniform convergence in horizontal strips of C. This is a consequence of the inequality | f (z)| ≤ eπ|y| f P Wπ , z = x + i y ∈ C, in Eq. (30). 5. For any f ∈ P Wπ , | f (x)| goes to zero as |x| → ∞, x ∈ R. This is a straightforward consequence of the Riemann–Lebesgue lemma (Bachman et al., 2000, p. 170). Furthermore, using the properties of the inverse Fourier transform with respect to the derivation, we see that the smoother . f is, the faster the decay of f is (Bachman et al., 2000, p. 334). 6. P Wπ is closed under derivation, and for every f ∈ P Wπ the following Bernstein-type inequality holds: f ′ P Wπ ≤ π f P Wπ This follows from 1 f (z) = √ 2π ′
π
−π
iω . f (ω) ei zω dω
by applying the Cauchy–Schwarz inequality. The classical Bernstein’s inequality also holds: for f ∈ P Wπ , f ′ ∞ ≤ π f ∞ , when we are using the supremun norm f ∞ = sup | f (t)| (Partington, 1997, p. 209). t∈R
95
SAMPLING THEORY
7. The orthogonal projection PP Wπ f of f ∈ L 2 (R) onto P Wπ is given by PP Wπ f (t) = F −1 (χ[−π,π] F f )(t) = f, sinc (· − t) P Wπ = ( f ∗ sinc) (t) for t ∈ R, where χ[−π,π ] denotes the characteristic function of the interval [−π, π] and ∗ the convolution operator. The first equality comes from the minimum norm property of the orthogonal projection (Naylor and Sell, 1982, p. 302). For f ∈ L 2 (R) and g ∈ P Wπ we have f −
g2L 2 (R)
∞ 1 2 . |. f (ω) − . g(ω)|2 dω = f −. g L 2 (R) = √ 2π −∞ ( π −π 1 2 . =√ | f (ω)| dω + |. f (ω) − . g(ω)|2 dω 2π −∞ −π +
π
∞
|. f (ω)| dω 2
)
f χ[−π,π ] , when the second summand equals zero. which is minimum for . g=. The other equalities come from Eq. (15) and the definition of the convolution. 8. In P Wπ , fixing t0 ∈ R, E > 0, and M ∈ C, we have | f (t0 )|2 = E max 2
f ≤E
√ sin π (s − t0 ) reached for f ∗ (s) = ± E π (s − t0 )
Similarly, min f 2 = M 2
f (t0 )=M
reached for f ∗ (s) = M
sin π(s − t0 ) π (s − t0 )
Using Fourier duality allows us to derive other expansions, not necessarily a sampling expansion. For instance, the use of the Legendre polynomials {Pn }∞ n=0 leads to the so-called Bessel–Neumann expansion in P W1 = { f ∈ L 2 (R) ∩ C (R), supp . f ⊆ [−1, 1]} 2 It is well known that { n + 12 Pn (x)}∞ n=0 is an orthonormal basis for L [−1, 1] and that ) n i F √ t −1/2 Jn+1/2 (t) (x) = Pn (x)χ[−1,1] (x) 2π
ANTONIO G. GARC´IA
96
Figure 1. Fourier duality and sampling.
for any n ∈ N ∪ {0}, where Jn+1/2 (t) is the Bessel function of half odd-integer order. For any f ∈ P W1 , we expand its Fourier transform . f as . f (x) =
∞
an Pn (x)
n=0
f , Pn L 2 [−1,1] an = n + 12 .
As a consequence of the inverse Fourier transform, we get the following: Any f ∈ P W1 can be expanded as f (z) =
∞
in an √ z −1/2 Jn+1/2 (z) 2π n=0
The convergence is absolute and uniform on horizontal strips of C. Next, let us explore the meaning of the Fourier duality in terms of signal theory by using the commutative diagram shown in Figure 1. All mappings included in this diagram are bijective isometries: the signal energy is preserved. S denotes the sampling mapping with sampling period Ts = 1. P is the 2π periodization mapping which extends a function . f in [−π, π] to the whole R with period 2π . The other two mappings are, respectively, the functional Fourier transform and the Fourier transform in ℓ2 (Z), defined as
F ({an })(ω) :=
∞
e−inω an √ 2π n=−∞
{an }n∈Z ∈ ℓ2 (Z)
Thus, we obtain the result that is well known by signal-processing engineers that states that sampling a signal (with a sampling period Ts = 1 in this case) matches a periodization of its spectrum (with period 2π in this case). The situation described by the diagram in Figure 1 is illustrated in Figure 2. In the next subsection we consider the general case (i.e., when we sample a signal in P Wπ with a sampling period Ts > 0). Finally, under minor changes, all the results in this subsection apply for any general Paley–Wiener space P Wπ σ defined by f ⊆ L 2 [−π σ, π σ ]} P Wπσ := { f ∈ L 2 (R) ∩ C (R), supp .
SAMPLING THEORY
97
Figure 2. Time–frequency interpretation of the Whittaker–Shannon–Kotel’nikov (WSK) sampling theorem.
or expressed in the form P Wπσ = { f ∈ H(C) : | f (z)| ≤ Aeπ σ |z| , f |R ∈ L 2 (R)} by using the classical Paley–Wiener theorem.
B. Undersampling and Oversampling As was mentioned in the preceding subsection, if we sample a signal f in P Wπ with a general sampling period Ts > 0, the question arises as to whether it is possible to reconstruct it from its samples { f (nTs )}. This is indeed possible in the case in which 0 < Ts ≤ 1 (i.e., sampling the signal at a frequency higher than that given by its bandwidth). For sampling periods Ts > 1, we cannot reconstruct the signal because of the aliasing phenomenon, which is explained later. In the next subsection, by using a version of the Poisson summation formula, we show that sampling a signal with a sampling period Ts is equivalent to periodizing its spectrum with a period 2π/Ts . 1. Poisson Summation Formula Consider the sequence of samples { f (nTs )}n∈Z taken from a signal f ∈ P Wπ with a sampling period Ts > 0. Let . f p be the 2π /Ts -periodized version of
98
. f ; that is,
ANTONIO G. GARC´IA
. f p (ω) =
2π . n f ω+ Ts n=−∞ ∞
Obviously, . f p is a 2π /Ts periodic function. Next,√we calculate its Fourier expansion with respect to the orthonormal basis { Ts /(2π)/ e−imTs ω }m∈Z of L 2 [0, 2π/Ts ]. The Fourier coefficient cm is calculated as 0 2π /Ts Ts . f p (ω)eimTs ω dω cm = 2π 0 0 2π /Ts ∞ Ts 2π . n eimTs ω dω = f ω+ 2π 0 Ts n=−∞ 0 ∞ 2π /Ts Ts 2π . n eimTs ω dω = f ω+ 2π n=−∞ 0 Ts The change of variable ω + (2π/Ts )n = x allows us to obtain 0 0 π ∞ (2π/Ts )(n+1) Ts Ts imTs x . . dx = cm = f (x)eimTs x d x f (x)e 2π n=−∞ (2π/Ts )n 2π −π = Ts f (mTs )
Therefore, the Fourier expansion for . f p is ∞ ∞ e−imTs ω 2π . . f (mTs ) √ n = Ts f p (ω) = f ω+ Ts 2π m=−∞ n=−∞
Thus we have obtained the Poisson summation formula applied to . f with period 2π/Ts . From this formula we deduce that the spectrum of the sequence { f (mTs )}m∈Z (i.e., the sampled signal) is precisely (up to a scale factor) the f of f . 2π/Ts -periodized version of the spectrum . As a consequence, in the oversampling case, where 0 < Ts ≤ 1, we can recover the spectrum of f from the spectrum of the sampled signal and hence recover the signal f . In terms of the WSK sampling theorem, the explanation is simple: if a signal is band limited to the interval [−π, π ], it is also band limited to any interval [−π σ, π σ ] with σ ≥ 1. This situation is depicted in Figure 3. In the undersampling case, where Ts > 1, we cannot obtain the spectrum of f from the spectrum of the sampled signal because the copies of . f overlap in . f p . Hence, it is impossible to recover the signal from its samples. The alluded overlap produces the aliasing phenomenon (i.e., some frequencies go under
SAMPLING THEORY
99
Figure 3. Oversampling.
the name of other ones). As pointed out by Hamming (1973, p. 14), this is a familiar phenomenon to watchers of television and western movies. As the stagecoach starts up, the wheels start going faster and faster, but then they gradually slow, stop, go backward, slow, stop, go forward, and so forth. This effect is due solely to the sampling the picture makes of the real scene. The undersampling situation is depicted in Figure 4. This undersampling–oversampling discussion clarifies the crucial role of the f ⊂ [−π σ, π σ ]. critical Nyquist period which is given by Ts = 1/σ whenever . Some comments about the Poisson summation formula are in order: r
r
The Poisson summation formula is a fundamental way to link a function f with its Fourier transform . f or vice versa. Namely, ∞ ∞ 2π T . f t+ n =√ (31) f (mT ) eimT t T 2π n=−∞ m=−∞
Whenever f ∈ L 1 (R), the left-hand side of Eq. (31) denotes a 2π /T periodic function belonging to L 1 [0, 2π/T ]. Expanding this √ function in Fourier series with respect to the orthonormal basis {eimT t / 2π /T }m∈Z , we obtain the right-hand side of Eq. (31) (see Gasquet and Witonski, 1990, and Higgins, 1996, for details). Under smooth hypotheses on f , it can be proved that Eq. (31) also holds pointwise (see, for instance, Gasquet and Witonski, 1990, and Partington, 1997). The following formalism, very familiar in the engineering literature, can be used to deduce the WSK sampling formula. Namely, common usage is to write the sampled signal { f (n)}n∈Z (we are assuming for simplicity
Figure 4. Undersampling.
ANTONIO G. GARC´IA
100
that Ts = 1) as f s (t) :=
∞
n=−∞
f (n)δ(t − n) = ( f ∗ )(t)
" where := ∞ n=−∞ δ(t − n) denotes the Dirac’s comb or train of deltas at the integers. We want to recover the signal f from its sampled signal f s by using an appropriate filtering device, that is, f (t) = ( f s ∗ g)(t) =
∞
n=−∞
f (n)g(t − n)
for an appropriate impulse response g. By taking the Fourier transform, we obtain ∞ . . g(ω) = . g(ω) f (ω) = . f s (ω). f (ω + 2π n) n=−∞
where we have used the Fourier transform for f s given by the Poisson summation formula. Whenever supp . f ⊂ [−π, π], the appropriate . g is χ[−π,π] and consequently g(t) = sinc (t) (i.e., an ideal low-pass filter). All the steps in the preceding reasoning can be made rigorous in light of the theory of distributions (Gasquet and Witonski, 1990). 2. Robust Reconstruction The actual computation of the cardinal series (17) presents some numerical difficulties because the cardinal sine function behaves like 1/t as |t| → ∞. A simple example is that given by the numerical calculation of f ( 12 ), for a function f in" P Wπ , from a noisy sequence of samples { f (n) + δn }. The error in this case | n (−1)n δn /π (n − 12 )|, even when |δn | ≤ δ, can be infinity. One way to overcome this difficulty is, again, to use the oversampling technique (i.e., sampling the signal at a frequency higher than that given by its bandwidth). In this way we obtain sampling functions converging to zero at infinity faster than the cardinal sine functions. Let us consider the band-limitedfunction πσ 1 F(ω)eiωt dω with F ∈ L 2 [−π σ, π σ ] and σ < 1 f (t) = √ 2π −π σ Extending F to be zero in [−π, π ] \ [−π σ, π σ ], we have F(ω) =
∞
n=−∞
e−inω f (n) √ 2π
in
L 2 [−π, π ]
101
SAMPLING THEORY
Let θ(ω) be a smooth function taking the value one on [−π σ, π σ ], and the value zero outside [−π, π ]. As a consequence, F(ω) = θ(ω)F(ω) =
∞
n=−∞
and the sampling expansion
f (t) =
e−inω f (n)θ(ω) √ 2π ∞
n=−∞
in
L 2 [−π, π]
f (n)S(t − n)
√ holds, where S is the inverse Fourier√transform F −1 of θ/ 2π and, consequently, S(t − n) = F −1 [θ(ω)e−inω / 2π ](t). Furthermore, using the properties of the Fourier transform, we see that the smoother θ is, the faster the decay of S is. However, the new sampling functions {S(t − n)}∞ n=−∞ are no longer orthogonal. Next, let us consider a particular example. Assume that σ = 1 − ǫ with 0 < ǫ < 1, and consider for θ the trapezoidal function ⎧ 1 if |ω| ≤ π (1 − ǫ) ⎪ ⎪ ⎨1 |ω| 1− if π(1 − ǫ) ≤ |ω| ≤ π θ(ω) = ⎪ ǫ π ⎪ ⎩0 if |ω| ≥ π
We can easily obtain S(t) = (sin ǫπt/ǫπ t)(sin πt/πt). Note that, in this case, S behaves like 1/t 2 as t → ∞. The corresponding sampling expansion takes the form ∞ sin ǫπ (t − n) sin π (t − n) f (t) = f (n) ǫπ (t − n) π(t − n) n=−∞
In this example, if each sample f (n) is subject to an error δn such that |δn | ≤ δ, then the total error in the f (t) just calculated is bounded by a constant depending on only δ and ǫ (Partington, 1997, p. 211). Thus we have obtained a robust reconstruction for f by using the oversampling technique.
C. Sampling by Using Other Types of Samples A sampling series in P Wπ may also contain samples from a transformed version of the signal such as, for instance, its derivative or its Hilbert transform. This is called the multichannel sampling setting: the signal is processed through various channels before being sampled. This idea is in Shannon’s (1949) famous paper, in which he suggested taking samples of the signal and
ANTONIO G. GARC´IA
102
its derivative. General methods for multichannel sampling date to Papoulis’s (1977a) work, as pointed out in the expository paper by Brown (1993). As this author says in the paper: “For certain applications, data about a given band limited signal can be available from several sources.” As can be seen in the following examples, in the multichannel case the sampling points can occur at a density below the Nyquist density, but with the overall “number” of samples maintained. 1. Using Samples from the Derivative Next we prove that it is possible to recover a signal f ∈ P Wπ by using its samples { f (2n)}n∈Z , taken at half the Nyquist rate, along with the samples { f ′ (2n)}n∈Z taken from its first derivative. Namely,
Any function f ∈ P Wπ can be recovered from the sequences of samples { f (2n)}n∈Z and { f ′ (2n)}n∈Z by means of the formula ∞ sin(π/2)(t − 2n) 2 ′ f (t) = { f (2n) + (t − 2n) f (2n)} (32) (π/2)(t − 2n) n=−∞
To this end, we consider F ∈ L 2 [−π, π] the Fourier transform of f . The following Fourier expansions in L 2 [−π, π] hold: F(ω) =
∞
n=−∞
e−inω f (n) √ 2π
F(ω − π) =
and
∞
e−inω (−1)n f (n) √ 2π n=−∞
As a consequence, the function S(ω) = 12 [F(ω) + F(ω − π )] admits the Fourier expansion S(ω) =
∞
n=−∞
e−i2nω f (2n) √ 2π
In a similar way, because
1 f (t) = √ 2π we obtain the Fourier expansion ′
iωF(ω) =
∞
n=−∞
in
L 2 [0, π ]
π
iωF(ω)eitω dω
−π
e−inω f ′ (n) √ 2π
in
L 2 [−π, π ]
Therefore, the function R(ω) = (i/2)[ωF(ω) + (ω − π)F(ω − π)] has the Fourier expansion R(ω) =
∞
n=−∞
e−i2nω f ′ (2n) √ 2π
in
L 2 [0, π ]
103
SAMPLING THEORY
Grouping both expansions, for ω ∈ [0, π ], we have 1 1 F(ω) 1 S(ω) = F(ω − π) R(ω) 2 iω i(ω − π ) or, inverting the matrix, we have 2i i(ω − π) F(ω) = −iω F(ω − π ) π
−1 1
S(ω) R(ω)
(33)
Therefore, introducing this splitting of F into Eq. (29) and after some calculations, we find π 1 F(ω)eitω dω f (t) = √ 2π −π −i2nω 0 ∞ 1 2i ′ 2 e =√ (ω + π) f (2n) + f (2n) √ eitω dω π 2π −π −∞ π 2π −i2nω π ∞ 2i ′ e 2 1 (π − ω) f (2n) − f (2n) √ eitω dω +√ π 2π 0 −∞ π 2π 0 ∞ π 1 |ω| 2 =√ 1− f (2n)ei(t−2n)ω dω π π 2π −∞ −π π (−i sgn ω) ′ 2 i(t−2n)ω f (2n)e dω + √ π −π 2π The desired result comes by using the Fourier duals (see, for instance, Higgins, 1996, p. 203) π (−i sgn ω) itω 1 πt t e dω (34) sin =√ sinc √ 2 2 2π −π 2π and
π0 1 |ω| itω t 2 =√ 1− e dω sinc 2 π 2π −π π 2
Some additional comments may elucidate this derivative sampling result: r
r
This case corresponds to a two-channel sampling: the signal f is filtered with two filters with transfer functions H1 (ω) = 1 and H2 (ω) = iω where ω ∈ [−π, π ], respectively, before sampling with a sampling period Ts = 2. The multichannel approach used in Higgins (1996, Chap. 12) and Rawn (1989) is implicitly in the proof of this simpler example. The matricial
104
ANTONIO G. GARC´IA
relation (33) can be understood as a linear, bijective, and bounded operator from the external direct sum L 2 [0, π ] ⊕ L 2 [0, π ] onto L 2 [−π, π ]. Recall that in this external direct sum the norm is given by (F, G)2 = F2L 2 [0,π ] + G2L 2 [0,π] The orthonormal basis in L 2 [0, π ] ⊕ L 2 [0, π ] given by √ √ {(e−i2nω / π , 0), (0, e−i2nω / π)}n∈Z
r
is transformed, by means of Eq. (33), into a Riesz basis of L 2 [−π, π], a concept that appears in Section III.E.1. Reconstruction formulas from samples of f ∈ P Wπ and its first p − 1 derivatives taken at the sampling period Ts = p can be obtained in a similar way. See Higgins (1999, p. 58) for the sampling formula equivalent to Eq. (32) in this more general setting.
2. Using Samples from the Hilbert Transform In this subsection we see that a function f ∈ P Wπ can be recovered f (n)} n∈Z or from both seeither from the samples of its Hilbert transform { f (2n)} n∈Z , where f stands for the Hilbert quences of samples { f (2n)} n∈Z and { transform of f . The Hilbert transform for functions in L 2 (R) is introduced by giving the following motivation: in the case of a real signal f ∈ L 2 (R), its Fourier transform . f can be written as . f (ω) = A(ω) + i B(ω) where A and B are even and odd functions, respectively. Therefore, . f , and consequently f, are determined by the values of . f on [0, ∞), that is, by . f · u where u denotes the Heaviside function 1 if t ≥ 0 u (t) = 0 if t < 0 The appropriate tool which allows us to take into account this feature about f is, as can be seen later, its associated analytic signal defined by means of its Hilbert transform. a. The Hilbert Transform in L 2 (R) The Hilbert transform in L 2 (R) can be defined as H : L 2 (R) −→ L 2 (R)
f −→ H ( f ) = f := F −1 [(−i sgn) . f]
where i stands for the imaginary unity and sgn denotes the signum function (i.e., sgn (t) = 1 if t > 0 and −1 if t < 0). It is straightforward to obtain the four main properties of the Hilbert transform in L 2 (R) by using those of the Fourier transform in L 2 (R). Namely,
SAMPLING THEORY
105
1. H is well defined (i.e., H ( f ) ∈ L 2 (R) for f ∈ L 2 (R)) and is linear by using the properties of the Fourier transform in L 2 (R). ( f )2 = . f 2 = 2. H is an isometry in L 2 (R) because H ( f )2 = H1 f 2 . 3. H is bijective and, moreover, H −1 = −H . Indeed,
F [H (H f )] = (−i sgn) 1 H f = (−i sgn)(−i sgn) . f = −. f
hence, H 2 = −I d. 4. If f is a real-valued signal in L 2 (R), the same occurs with its Hilbert transform f . We can write 1 f (t) = lim √ N →∞ 2π
N
−N
in the mean sense. Hence,
1 f (t) = lim √ N →∞ 2π
N
−N
(−i sgn ω) . f (ω) eiωt dω i sgn ω . f (−ω) e−iωt dω
The change of variable ξ = −ω allows us to conclude that f (t) = f (t) for almost all t in R. Another equivalent definition for the Hilbert transform is that given by 1 ∞ f (x) 1 1 p.v. ∗ f (t) = p.v. dx f (t) := π x π −∞ t − x 1 f (t − u) = lim+ du δ→0 π |u|>δ u
where p.v. denotes the Cauchy principal value of the integral. It allows us to enlarge the definition of the Hilbert transform to other functional spaces as the L p (R) with 1 < p < ∞. The Paley–Wiener space P Wπ is closed under the Hilbert transform; that f is also in P Wπ . Given f ∈ P Wπ we is, for f ∈ P Wπ its Hilbert transform can write its Hilbert transform as π 1 f (t) = √ (−i sgn ω) . f (ω)eitω dω (35) 2π −π Because |i sgn ω| = 1, it easily follows that the sequence {i sgn ω(e−inω/ √ 2π )}n∈Z is an orthonormal basis for L 2 [−π, π ]. For a fixed t ∈ R, we expand
ANTONIO G. GARC´IA
106
the Fourier kernel in L 2 [−π, π ] with respect to this basis, obtaining ∞ e−itω e−inω 1 e−itω , i sgn ω e−inω L 2 [−π,π ] i sgn ω √ = √ 2π n=−∞ 2π 2π
=−
∞
π e−inω 1 sinc (t − n) sin (t − n)i sgn ω √ 2 2 2π n=−∞
(36)
where we have used formula (Eq. 34) to derive the coefficients of the expansion. This expansion allows us to prove the following sampling result in P Wπ : Any function f ∈ P Wπ can be recovered from the sequence of samples { f (n)}n∈Z of its Hilbert transform by means of the formula f (t) = −
To this end, consider
∞
1 π f (n) sinc (t − n) sin (t − n) 2 2 n=−∞ %
e−itω f (t) = . f, √ 2π
&
(37)
L 2 [−π,π ]
Introducing the expansion obtained in Eq. (36) and taking into account the continuity of the inner product with respect to the L 2 [−π, π] convergence, we can take out the series. Thus, we obtain & % ∞ π 1 e−inω . f (t) = − sinc (t − n) sin (t − n) f , i sgn ω √ 2 2 2π L 2 [−π,π] n=−∞ =−
∞
π 1 sinc (t − n) sin (t − n) f (n) 2 2 n=−∞
This sampling formula can be interpreted as a single-channel sampling re(ω) = sult: the signal f is filtered with a filter whose transfer function is H −i sgn ω, ω ∈ [−π, π], before sampling with a sampling period Ts = 1. f ∈ P Wπ , we Having in mind that H 2 ( f ) = − f , if we apply Eq. (37) to obtain the dual sampling formula: f (t) =
∞
n=−∞
π 1 f (n) sinc (t − n) sin (t − n) 2 2
Next, the concept of an analytic signal associated with a real-valued signal f in L 2 (R) is introduced.
SAMPLING THEORY
107
b. Analytic Signal Given a real signal f ∈ L 2 (R), its associated analytic signal is the signal in L 2 (R) defined as f f a := f + i
The Fourier transform of the analytic signal f a satisfies . f + i(−i sgn ) . f = 2. f ·u fa = .
Consequently, supp . f a ⊆ [0, +∞). In signal processing, the analytic signal is used, for example, to define the instantaneous frequency of a real-valued signal f . We can write its analytic signal as f a (t) = A(t) eiϕ(t) to define its instantaneous frequency as ω := ϕ ′ (t). Thus, for a fixed time u, the Wigner–Ville time–frequency distribution of f a given by ∞ τ τ −iτ ξ PV f a (u, ξ ) := e fa u + dτ fa u − 2 2 −∞ is typically concentrated in a neighborhood of the instantaneous frequency ξ = ϕ ′ (u) because ∞ ξ PV f a (u, ξ ) dξ ′ ϕ (u) = −∞ ∞ P f (u, ξ ) dξ −∞ V a (see Mallat, 1999, p. 108). From now on, we confine ourselves to using analytic signals for sampling purposes. For instance, for sampling a bandpass signal efficiently,
A signal f ∈ L 2 (R) is a bandpass signal if the support of its Fourier transform satisfies supp . f ⊆ [−ω0 − σ π, −ω0 ] ∪ [ω0 , ω0 + σ π ], where ω0 > 0.
Without loss of generality we take σ = 1. Naturally, we can apply the WSK formula with the sampling period Ts = π/(ω0 + π) < 1 to recover f from its samples { f (nTs )}. Next, we show that we can recover a bandpass signal f by sampling the signal and its Hilbert transform with a sampling period Ts = 2. To this end, we use the following reasoning involving the analytic signal f a . Namely, the analytic signal f a of the bandpass signal f satisfies ω0 +π 1 2. f (ω)eiωt dω f a (t) = √ 2π ω0 As a consequence of the sampling formulas (19) and (20) we have f a (t) =
∞
n=−∞
f a (2n)eiω1 (t−2n)
sin π(t/2 − n) π(t/2 − n)
(38)
ANTONIO G. GARC´IA
108
where ω1 = ω0 + π/2. Having in mind that f = ℜ f a , we obtain the following: f ⊂ [−ω0 − π, −ω0 ] ∪ [ω, ω0 + Any real bandpass signal f such that supp . π] can be expanded as f (t) =
∞
n=−∞
! sin(π/2)(t − 2n) f (2n) cos ω1 (t − 2n) − f (2n) sin ω1 (t − 2n) (π/2)(t − 2n)
In particular, taking ω0 = 0 in Eq. (38), we obtain the following sampling result: Any real function f ∈ P Wπ can be recovered by using its samples { f (2n)}n∈Z and those { f (2n)}n∈Z of its Hilbert transform by means of the formula ∞ 2 3 π π f (t) = f (2n) sin (t − 2n) f (2n) cos (t − 2n) − 2 2 n=−∞ ×
sin(π/2)(t − 2n) (π/2)(t − 2n)
One final comment concerns the term analytic signal used in this section. An analytic signal f a is not an analytic function in the context of complex analysis. However, if we define the function ∞ 1 2. f (ω)ei zω dω z = t + iy ∈ C F(z) = √ 2π 0
we obtain an analytic function on the upper half-plane C+ = {z = t + i y ∈ C , y > 0}. Besides, its boundary function lim y→0 F(t + i y) coincides almost everywhere with f a ∈ L 2 (R), the analytic signal associated with f ∈ L 2 (R). The mathematical details, involving the Hardy space H 2 (C+ ), can be found in Duren (2000). For more information about the Hilbert transform and its uses for sampling purposes, the interested reader should see Brown (1967), Papoulis (1977b), and Zayed (1993, 1996a). A unified approach to sampling theorems for derivatives and Hilbert transforms can be found in Stens (1983).
D. Zeros of Band-Limited Functions The problem of signal recovery can also be considered from a different point of view. As we know, the signals in the space P Wπ are entire functions of
SAMPLING THEORY
109
exponential type at most π whose restriction to R belongs to L 2 (R). Although entire functions are not completely determined by the location of their zeros, as can be seen from the Hadamard factorization theorem (Young, 1980, p. 74), band-limited functions are, as can be deduced from Titchmarsh’s (1926) theorem, which assures that any band-limited function is uniquely determined by its zeros up to an exponential factor depending on its spectral interval [a, b]. If the spectral interval is of the form [−a, a], this exponential factor reduces to a constant. Recall that this is the case for real-valued band-limited signals, as pointed out in Section II.B.1. Titchmarsh’s (1926) theorem, which provides the needed mathematical foundation, reads as follows: Let F ∈ L 1 [a, b] and define the entire function f to be b f (z) = F(w) e zw dw a
Then f has infinitely many zeros, {z n }n∈N , with nondecreasing absolute values, such that ∞ * z 1− f (z) = f (0) e[(a+b)/2]z zn n=1 where the infinite product is conditionally convergent. In the preceding result, it is assumed that a and b are the effective lower and upper limits of the integral, in the sense that there are no numbers α > a and β < b such that F(ω) = 0 (a.e.) in [a, α] or [β, b]. If f is band-limited to [−a, a], then ∞ * z f (z) = f (0) 1− zn n=1 provided f (0) = 0, or
∞ * z f (z) = Az 1− zn n=1 m
if z = 0 is a zero of f of order m. Notice that the zeros in Titchmarsh’s theorem may be complex. This complexity poses a difficulty from a practical point of view because complex zeros are more difficult to detect than real zeros. Whenever the zeros are real, this theorem provides a useful tool for signal recovery, usually referred to as
110
ANTONIO G. GARC´IA
real-zero interpolation (Bond and Chan, 1958; Marvasti, 1987). One way to deal with only real zeros is by using the sine-wave crossings technique (Marvasti, 1987) involved in the following result from Duffin and Schaeffer (1938), which reads as follows: Let f be an entire function of exponential type at most γ such that | f (x)| ≤ 1 on the real axis. Then for every real α the function cos(γ z + α) − f (z) has only real zeros, or vanishes identically. Moreover, all the zeros are simple, except perhaps at points on the real axis where f (x) = ±1 For a deeper study of the oscillatory properties of Paley–Wiener functions, see Higgins (1996), Paley and Wiener (1934), and Walker (1994).
E. Irregular Sampling The WSK expansion in P Wπ (17) can also be written as f (z) =
∞
f (n)
n=−∞
G(z) G ′ (n)(z − n)
(39)
where G(z) = sin π z/π. The latter expression exhibits the Lagrange-type interpolatory character of the WSK result. It expresses the possibility of recovering a certain kind of signal from a sequence of regularly spaced samples. From a practical point of view it is interesting to have a similar result, but for a sequence of samples taken with a nonuniform distribution along the real line (a straightforward application of this result is the recovery of signals from samples affected by time-jitter error—that is, taken at points tn = n + δn , with δn some measurement uncertainty). Intuitively speaking, nonuniform sampling is the natural method for discrete representation of a signal. For example, let us assume there is a signal with high instantaneous frequency regions and low instantaneous frequency in other regions. It is more efficient to sample the low-frequency regions at a lower rate than the high-frequency regions. An appropriate question that can be used to obtain such a result is this: how close should the sample points be to the regular sample points so that an equation similar to Eq. (39) still holds? The first researchers to answer this question were Paley and Wiener (1934), who proved that if the sequence of sample points, {tn }n∈Z , satisfies D := sup |tn − n| < τ n∈Z
(40)
111
SAMPLING THEORY
where τ = 1/π 2 , and the sequence is symmetric (i.e., t−n = tn (n ≥ 1)), then any f ∈ P Wπ can be expressed as f (z) =
∞
f (tn )
n=−∞
G(z) − tn )
G ′ (tn )(z
where now ∞ * z2 G(z) = (z − t0 ) 1− 2 tn n=1 Later, Levinson (1940) extended condition (40) to τ = 41 and nonsymmetric sequences. This √ result is related to the “maximum” perturbation of2 the Hilbert basis {e−inω / 2π }n∈Z of the square-integrable function space L [−π, π], in √ −itn ω such a way that the perturbed sequence {e / 2π }n∈Z is a Riesz basis, a concept introduced next, of the same space. Kadec proved that Levinson’s result, τ = 41 , is optimal, in the sense that if D = 41 counterexamples can be found (see Young, 1980, pp. 42–44, for details). 1. Introducing Riesz Bases So that Riesz bases can be applied for irregular sampling purposes in Paley– Wiener spaces, the more important features of these bases are reviewed, with elementary proofs given when available. A Riesz basis {xn }∞ n=1 in a Hilbert space H is a basis obtained from an orthonormal basis {en }∞ n=1 by means of a bounded invertible operator T : H −→ H; that is, T (en ) = xn for each n ∈ N. Next the six most important properties of Riesz bases are listed: 1. For each x ∈ H there exists a unique sequence of scalars {cn }∞ n=1 such that x=
∞
cn x n
n=1
in the H norm sense. For each x ∈ H there is a unique y ∈ H such that x = T (y). Expanding y in the orthonormal basis {en }∞ n=1 , we obtain ∞ ∞ ∞ cn x n y, en T (en ) = y, en en = x = T (y) = T n=1
n=1
n=1
ANTONIO G. GARC´IA
112
with cn = y, en . As a consequence, the sequence {xn }∞ n=1 forms a complete set in H. 2. For each n ∈ N, the coefficient functional defined as f n : H −→ C
x −→ f n (x) = cn = y, en
is linear and bounded in H. It easily follows from the Cauchy–Schwarz inequality that |y, en | ≤ y en = T −1 (x) ≤ T −1 x 2 3. For every x ∈ H, its sequence of coefficients {cn }∞ n=1 belongs to ℓ (N). 4. As a consequence of the Riesz representation theorem, for each n ∈ N there exists a unique yn ∈ H such that f n (x) = x, yn for "every x ∈ H. Thus, for every x ∈ H we have the unique representation x = ∞ n=1 x, yn x n . ∞ and {y } are biorthonormal (i.e., xm , yn = 5. The sequences {xn }∞ n n=1 n=1 This follows from the uniqueness of the coefficients because δn,m ). " xm = ∞ n=1 x m , yn x n . 6. The sequence {yn }∞ n=1 also forms a Riesz basis for H, and the expansions
x=
∞ n=1
x, yn xn =
∞ x, xn yn n=1
hold for every x ∈ H. A proof of this result and a more specific account of Riesz bases can be found in Yound (1980, pp. 19–36). 2. The Riesz Bases Setting In what follows, {tn }n∈Z denotes a sequence of real numbers such that D := sup |tn − n| < n∈Z
1 4
√ As a consequence of Kadec’s 14 -theorem (Young, 1980, p. 42), {e−itn ω/ 2π}n∈Z is a Riesz basis for L 2 [−π, π√ ]. A necessary and sufficient condition about the −itn ω sequence {tn }n∈Z for {e / 2π}n∈Z to be a Riesz basis for L 2 [−π, π] was given by Pavlov (1979). Let us consider ∞ * z z 1− 1− (41) G(z) = (z − t0 ) tn t−n n=1
113
SAMPLING THEORY
an entire, well-defined function, whose set of zeros is {tn }n∈Z , as becomes clear with the proof of the following theorem, the so-called Paley–Wiener–Levinson (PWL) sampling theorem. Any f ∈ P Wπ can be recovered from its sample values { f (tn )}n∈Z by means of the Lagrange-type interpolation series f (z) =
∞
n=−∞
f (tn )
G(z) − tn )
G ′ (tn )(z
which is absolutely and uniformly convergent in horizontal strips of C (in particular in R). −itn ω / √ For the proof, let {h n (ω)}n∈Z be the unique biorthonormal basis of {e 2π }n∈Z ; that is, for every m, n ∈ Z, & % e−itm ω = δnm hn , √ 2π L 2 [−π,π]
Thus, every . f ∈ L 2 [−π, π] can be expressed as & ∞ % ∞ e−itn ω e−itn ω . . . f (ω) = h n (ω) f , h n L 2 [−π,π] √ f, √ = 2π 2π L 2 [−π,π] n=−∞ n=−∞ By using the Fourier duality in P Wπ , we get
−itn ω e . f , h n L 2 [−π,π] F −1 √ (z) 2π n=−∞ & ∞ % e−itn ω . F −1 (h n )(z) f, √ = 2π L 2 [−π,π] n=−∞
f (z) =
∞
By setting gn = F −1 (h n ) and√ taking into account that . f , h n L 2 [−π,π ] = f, gn P Wπ and that . f , e−itn ω / 2π L 2 [−π,π] = f (tn ), we can rewrite f (z) =
Now,
∞
n=−∞
f, gn P Wπ sinc (z − tn ) =
gn (z) = F
−1
1 (h n )(z) = √ 2π
∞
f (tn )gn (z)
n=−∞
π
h n (ω) ei zω dω
−π
is an entire function, band limited to [−π, π ], whose only zeros are {tm }m =n . Suppose on the contrary that s ∈ / {tm }m =n is a zero of gn . According to the
114
ANTONIO G. GARC´IA
classical Paley–Wiener theorem, the function g(z) =
z − tn gn (z) z−s
belongs to P Wπ and vanishes at every tn . If we take into account the completeness of a Riesz basis, this implies that g ≡ 0, a contradiction. Therefore, as a consequence of Titchmarsh’s theorem, we have gn (z) = An
G(z) z − tn
Notice that when we set n = 0, for instance, the preceding formula shows that G(z) is an entire function, as stated at the beginning of this result. Because gn (tn ) = 1, then An = 1/G ′ (tn ), and hence, f (z) =
∞
n=−∞
f (tn )
G ′ (t
G(z) n )(z − tn )
The series is convergent in the norm of P Wπ and, consequently, uniform in horizontal strips of C. The absolute convergence of the series comes from the fact that a Riesz basis is also an unconditional basis. Note that as a by-product of the proof of the PWL theorem, we deduce that the sequences {G(z)/[G ′ (tn )(z − tn )]}n∈Z and {sin π (z − tn )/π (z − tn )}n∈Z are biorthonormal Riesz bases in P Wπ . A general theory for nonorthogonal sampling formulas by using Riesz bases instead of orthonormal bases can be developed. The main steps involved in the theory are pointed out in next subsection. 3. A Unified Approach to Nonorthogonal Sampling Formulas The Riesz bases setting is the appropriate framework from which to obtain nonorthogonal sampling formulas while retaining the Riesz basis property in a unified way. The procedure closely parallels that given for orthogonal formulas in the preceding subsection. Because of this parallelism, only a sequence of the most important results is highlighted. ∞ ∗ Throughout this subsection, {φn (x)}∞ n=1 and {φn (x)}n=1 denote a pair of 2 biorthonormal Riesz bases for a fixed L (I ) space. Note that the sequences ∞ ∗ of their conjugate functions {φn (x)}∞ n=1 and {φn (x)}n=1 are also a pair of 2 biorthonormal Riesz bases for L (I ). ∞ Let {Sn }∞ n=1 be a sequence of functions Sn : ⊂ R −→ C and {tn }n=1 a sequence in verifying conditions C1 and C2 in Section II.A.
115
SAMPLING THEORY
Let us define the kernel K (x, t) as K (x, t) =
∞
Sn (t)φn∗ (x)
n=1
(x, t) ∈ I ×
(42)
Note that as a function of x, K (·, t) belongs to L 2 (I ) because {φn }∞ n=1 is a Riesz basis for L 2 (I ). By using this kernel K (x, t), let us define on L 2 (I ) the linear integral transform f (t) := F(x)K (x, t) d x for F ∈ L 2 (I ) (43) I
This transform is again a bijective isometry between L 2 (I ) and its range H, provided we endow this space with the norm f H := F L 2 (I ) . The following properties for H hold: r
r
{Sn (t)}∞ n=1 is a Riesz basis for H. It is the image by Eq. (43) of the Riesz ∞ 2 ∗ basis {φn (x)}∞ n=1 of L (I ). Besides, its biorthonormal basis {Sn (t)}n=1 is ∞ ∗ the corresponding image by Eq. (43) of {φn (x)}n=1 . (H, · H ) is an RKHS space whose reproducing kernel k(t, s) is given by k(t, s) = K (·, t), K (·, s) L 2 (I ) =
r
∞
Sn (t)Sn∗ (s)
n=1
• Expanding any f ∈ H in the Riesz basis {Sn(t)}n=1∞, that is,
  $$ f = \sum_{n=1}^{\infty} \langle f, S_n^* \rangle S_n = \sum_{n=1}^{\infty} \langle F, \varphi_n^* \rangle S_n = \sum_{n=1}^{\infty} \frac{f(t_n)}{a_n} S_n $$
  we obtain the (nonorthogonal) sampling expansion
  $$ f(t) = \sum_{n=1}^{\infty} f(t_n)\, \frac{S_n(t)}{a_n} \qquad (44) $$
  The convergence of the series (44) is absolute and uniform on subsets of Ω where ‖K(·, t)‖L2(I) = √k(t, t) is bounded.

Let us next illustrate the proposed method with two examples: First, let us consider a sequence {tn}n∈Z of real numbers satisfying Kadec's condition. It can be proved (Paley and Wiener, 1934) that for any fixed t ∈ R we can expand the Fourier kernel in L2[−π, π] as

$$ \frac{e^{itx}}{\sqrt{2\pi}} = \sum_{n=-\infty}^{\infty} \frac{G(t)}{(t - t_n)\, G'(t_n)}\, \frac{e^{i t_n x}}{\sqrt{2\pi}} \qquad (45) $$
where G stands for Eq. (41), the infinite product of the sequence {tn}n∈Z. Regarding expansion (45), see also the proof of the PWL sampling theorem in the previous subsection. If we take Sn(t) = G(t)/[(t − tn)G′(tn)] and the sampling points {tn}n∈Z, Eq. (44) is nothing more than the statement of the PWL sampling theorem in PWπ.
Second, now let {tn}n=1∞ be the sequence of positive roots of the equation sin 2πt = 1/t. It was proven in Mihailov (1962) that {cos[(x + π)tn]}n=1∞ forms a Riesz basis for L2[−π, π]. Its biorthonormal basis is given by {(1/αn) cos[(x − π)tn]}n=1∞, where the normalization constants are αn = sin 2πtn/(2tn) + π cos 2πtn > 0. For each fixed t ∈ R we get the expansion

$$ \cos[(x + \pi)t] = \sum_{n=1}^{\infty} \frac{t \sin 2\pi t - 1}{\alpha_n (t^2 - t_n^2)} \cos[(x + \pi)t_n] \quad \text{in } L^2[-\pi, \pi] $$

Therefore, taking Sn(t) = (t sin 2πt − 1)/[αn(t² − tn²)] and the sampling points {tn}n=1∞, we obtain the following nonorthogonal sampling result: Any function f of the form

$$ f(t) = \int_{-\pi}^{\pi} F(x) \cos[(x + \pi)t]\, dx, \qquad F \in L^2[-\pi, \pi] $$

can be expanded as the sampling formula

$$ f(t) = \sum_{n=1}^{\infty} f(t_n)\, \frac{t \sin 2\pi t - 1}{\alpha_n (t^2 - t_n^2)} $$

The corresponding RKHS H has the reproducing kernel

$$ k(t, s) = \frac{\sin 2\pi(t + s)}{2(t + s)} + \frac{\sin 2\pi(t - s)}{2(t - s)} $$
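Because the kernel of this transform is K(x, t) = cos[(x + π)t], the reproducing kernel above is just the inner product ⟨K(·, t), K(·, s)⟩ in L2[−π, π]. The following minimal Python sketch (our own illustration, not part of the original text; the test values of t and s are arbitrary) checks the closed form against numerical quadrature.

```python
import numpy as np
from scipy.integrate import quad

def k_closed(t, s):
    # sin 2pi(t+s)/(2(t+s)) + sin 2pi(t-s)/(2(t-s)), with the limit value pi at 0
    def term(u):
        return np.pi if u == 0 else np.sin(2 * np.pi * u) / (2 * u)
    return term(t + s) + term(t - s)

def k_quad(t, s):
    # <K(.,t), K(.,s)> in L^2[-pi, pi] with K(x, t) = cos((x + pi) t)
    val, _ = quad(lambda x: np.cos((x + np.pi) * t) * np.cos((x + np.pi) * s),
                  -np.pi, np.pi)
    return val

for t, s in [(0.3, 1.7), (2.0, 2.0), (1.25, -0.4)]:
    print(t, s, k_closed(t, s), k_quad(t, s))   # the last two columns should agree
```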
This subsection should be closed with a pertinent comment: The proposed constructive method is limited to integral transforms whose kernel K(x, t) can be written as in Eq. (42). However, it can be proved (see García and Szafraniec, 2002) that under plausible hypotheses the integral kernel adopts the required form (42). Namely, let us consider an integral transform

$$ f(t) = \int_I F(x)\, K(x, t)\, dx, \qquad F \in L^2(I) $$

where t ∈ Ω and the integral kernel belongs to L2(I) for every fixed t ∈ Ω. Let us assume that for the functions f in the range space of this integral transform, a sampling formula like Eq. (44) holds pointwise in Ω, with {f(tn)/an}n=1∞ ∈ ℓ2(N) and the sampling functions {Sn(t)}n=1∞ satisfying the following two conditions:

1. Σn=1∞ |Sn(t)|² < ∞ for each t ∈ Ω.
2. Σn=1∞ αn Sn(t) = 0 for every t ∈ Ω, with {αn} ∈ ℓ², implies αn = 0 for all n.

Then the kernel of the integral transform can be expressed as

$$ K(x, t) = \sum_{n=1}^{\infty} S_n(t)\, \varphi_n^*(x) $$

where {φn∗}n=1∞ is, in general, a Riesz basis for L2(I). This result includes the particular case in which {φn∗}n=1∞ is an orthonormal basis for L2(I).
Let us return to the case of irregular sampling in PWπ. The Kadec condition about the sampling points {tn}n∈Z can be relaxed by using exponential frames in L2[−π, π]. In the next subsection, frames in a separable Hilbert space are introduced, and an account of their most important properties is given.

4. Introducing Frames

The definition of a frame in a Hilbert space is as follows: A sequence {xn}n=1∞ in a Hilbert space H is said to be a frame if there exist constants 0 < A ≤ B, called the frame bounds, such that for any x ∈ H the frame inequality

$$ A \|x\|^2 \le \sum_{n=1}^{\infty} |\langle x, x_n \rangle|^2 \le B \|x\|^2 \qquad (46) $$

holds.
If A = B, then the frame is called a tight frame. If the removal of one element xm renders the sequence {xn}n≠m no longer a frame, then it is called an exact frame. The left-hand side of the frame inequalities (46) shows that a frame is a complete sequence in H. An orthonormal basis is a tight frame with A = B = 1. For our sampling purposes, the eight most important properties in frame theory are as follows:

1. For an arbitrary sequence {xn}n=1∞ in H, the following are equivalent:
   a. {xn}n=1∞ is a frame with frame bounds A and B.
   b. The frame operator defined as S(x) := Σn=1∞ ⟨x, xn⟩xn is a bounded positive operator in H with AI ≤ S ≤ BI, where I denotes the identity operator in H. Recall that T is a positive operator in H (T ≥ 0) if ⟨T(x), x⟩ ≥ 0 for all x ∈ H; T ≤ S means that S − T ≥ 0. See Bachman et al. (2000, p. 467) for a proof.
2. S⁻¹ exists and is positive in H, and B⁻¹I ≤ S⁻¹ ≤ A⁻¹I.
3. {S⁻¹(xn)}n=1∞ is also a frame, called the dual frame, with frame bounds B⁻¹ and A⁻¹ (Bachman et al., 2000, p. 468).
4. Any x ∈ H can be written in terms of the dual frame as
   $$ x = \sum_{n=1}^{\infty} \langle x, S^{-1}(x_n) \rangle x_n = \sum_{n=1}^{\infty} \langle x, x_n \rangle S^{-1}(x_n) \qquad (47) $$
   Because S⁻¹ is the frame operator for the dual frame, we have
   $$ x = S(S^{-1}(x)) = S\left( \sum_{n=1}^{\infty} \langle x, S^{-1}(x_n) \rangle S^{-1}(x_n) \right) = \sum_{n=1}^{\infty} \langle x, S^{-1}(x_n) \rangle x_n $$
   We get the other representation for x ∈ H by considering x = S⁻¹(S(x)).
5. If {xn}n=1∞ is a tight frame in H, then the frame operator is S = AI, and for every x ∈ H the representation
   $$ x = \frac{1}{A} \sum_{n=1}^{\infty} \langle x, x_n \rangle x_n \qquad (48) $$
   holds.
6. Suppose that there exists a sequence of scalars {bn}n=1∞ such that x = Σn=1∞ bn xn. Then,
   $$ \sum_{n=1}^{\infty} |b_n|^2 = \sum_{n=1}^{\infty} |a_n|^2 + \sum_{n=1}^{\infty} |a_n - b_n|^2 $$
   where an = ⟨x, S⁻¹(xn)⟩. That is, the coefficients obtained by means of the dual frame representation (47) have a minimum norm property in ℓ2(N) (Bachman et al., 2000, p. 468).
7. A sequence {xn}n=1∞ of a separable Hilbert space H is a Riesz basis if and only if it is an exact frame (see Bachman et al., 2000, p. 471, or Young, 1980, p. 188, for a proof).
8. If {xn}n=1∞ is an exact frame, then {xn}n=1∞ and {S⁻¹(xn)}n=1∞ are biorthonormal sequences; that is, ⟨xn, S⁻¹(xm)⟩ = δnm (Bachman et al., 2000, p. 471).
In the frame setting we retain the representation property (i.e., every x ∈ H may be written as x = Σn=1∞ ⟨x, S⁻¹(xn)⟩xn), but we sacrifice the uniqueness of the representation, unlike the case of orthonormal or Riesz bases. It is worth pointing out that in finite-dimensional spaces proper frames correspond to spanning sets that are not necessarily linearly independent. If {xn}n=1M is a spanning set for C^N where M > N, there exist constants A, B > 0 such that for all x ∈ C^N

$$ A \|x\|^2 \le \sum_{n=1}^{M} |\langle x, x_n \rangle|^2 \le B \|x\|^2 $$
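The finite-dimensional case is easy to experiment with: the frame operator is the matrix S = Σ xn xnᴴ, the optimal frame bounds are its extreme eigenvalues, and the dual frame yields the minimum-norm coefficients of property 6. The following short Python sketch (an added illustration, using an arbitrarily chosen random frame) verifies these facts numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 3, 7                                # M > N vectors spanning C^N
X = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))  # columns x_n

S = X @ X.conj().T                         # frame operator S(x) = sum <x, x_n> x_n
A, B = np.linalg.eigvalsh(S)[[0, -1]]      # optimal frame bounds
X_dual = np.linalg.inv(S) @ X              # dual frame S^{-1}(x_n)

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Frame inequality (46)
coeffs = X.conj().T @ x                    # <x, x_n>
print(A * np.linalg.norm(x)**2 <= np.linalg.norm(coeffs)**2 <= B * np.linalg.norm(x)**2)

# Dual-frame reconstruction (47): x = sum <x, S^{-1} x_n> x_n
a = X_dual.conj().T @ x
print(np.allclose(X @ a, x))

# Minimum-norm property 6: any other coefficients b with X b = x have larger norm
b = a + (np.eye(M) - np.linalg.pinv(X) @ X) @ rng.standard_normal(M)  # add a null-space vector
print(np.allclose(X @ b, x), np.linalg.norm(a) <= np.linalg.norm(b) + 1e-12)
```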
Frame theory dates to 1952, when a seminal paper by Duffin and Schaeffer, written in the context of Paley–Wiener spaces, was published. The theory was revived in the 1990s in connection with wavelet theory and has proved to be a fundamental tool in irregular sampling. The reader interested in deeper knowledge of frame theory should refer to Casazza (2000), Christensen and Jensen (2000), and Young (1980).

5. The Frame Setting
Let us assume that for a real sequence {tn}n∈Z, the family {e^{−itnω}/√2π}n∈Z is a frame in L2[−π, π]. Then, there exist two constants 0 < A ≤ B such that

$$ A \|\varphi\|^2_{L^2[-\pi,\pi]} \le \sum_{n=-\infty}^{\infty} \left| \left\langle \varphi, \frac{e^{-it_n\omega}}{\sqrt{2\pi}} \right\rangle_{L^2[-\pi,\pi]} \right|^2 \le B \|\varphi\|^2_{L^2[-\pi,\pi]} $$

for each φ ∈ L2[−π, π]. Taking f = F−1(φ) in PWπ, we obtain

$$ A \|\varphi\|^2_{L^2[-\pi,\pi]} \le \sum_{n=-\infty}^{\infty} |f(t_n)|^2 \le B \|\varphi\|^2_{L^2[-\pi,\pi]} $$

or

$$ A \|f\|^2_{PW_\pi} \le \sum_{n=-\infty}^{\infty} |f(t_n)|^2 \le B \|f\|^2_{PW_\pi} $$

by using the Fourier duality. Because f(tn) = ⟨f, sinc(· − tn)⟩ in PWπ, we deduce that {sin π(t − tn)/π(t − tn)}n∈Z is a frame in PWπ. Let {hn}n∈Z be its dual frame. Then, as a consequence of the representation property (47), for every f ∈ PWπ we have the sampling formula

$$ f(t) = \sum_{n=-\infty}^{\infty} f(t_n)\, h_n(t) $$
The problem with this sampling formula is that we do not know the dual frame {hn}n∈Z. We would like to have a method to recover f ∈ PWπ from the available information, that is, the sequence of samples {f(tn)}n∈Z, or equivalently the frame operator S(f) = Σn=−∞∞ f(tn) sinc(· − tn). In the next subsection an iterative algorithm, essentially the Richardson method, allows us to recover f from the frame operator evaluated at f, S(f).
An explanation of the oversampling technique seen in Section III.B.2 can be given in light of frame theory. Namely:

The sequence {σ sinc σ(t − n)}n∈Z is a tight frame with bound A = 1 for every Paley–Wiener space PWπσ with σ < 1.

To this end, let f be a function in PWπσ, and let F be its Fourier transform supported in [−πσ, πσ]. Extending F to be zero in [−π, π] \ [−πσ, πσ], we have

$$ F(\omega) = \sum_{n=-\infty}^{\infty} f(n)\, \frac{e^{-in\omega}}{\sqrt{2\pi}} \quad \text{in } L^2[-\pi, \pi] $$

Applying Parseval's equality in L2[−π, π] and Fourier's duality in PWπσ, we get

$$ \|f\|^2_{PW_{\pi\sigma}} = \|F\|^2_{L^2[-\pi,\pi]} = \sum_{n=-\infty}^{\infty} |f(n)|^2 = \sum_{n=-\infty}^{\infty} |\langle f, \sigma\, \operatorname{sinc} \sigma(\cdot - n) \rangle|^2 $$

which proves our assertion. Note that σ sinc σ(t − s) is the reproducing kernel in PWπσ (21). As a corollary:

Any signal f in PWπσ, with σ < 1, can be expanded by using the tight-frame representation (48) as

$$ f(t) = \sigma \sum_{n=-\infty}^{\infty} f(n)\, \operatorname{sinc} \sigma(t - n) $$
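A quick numerical illustration of this corollary (our own sketch; the test signal and the oversampling factor σ are arbitrary choices, and the cardinal series is truncated to finitely many terms) synthesizes a signal band limited to [−πσ, πσ] and compares it with its oversampled cardinal series.

```python
import numpy as np

sigma = 0.5                                    # band limit pi*sigma < pi
t = np.linspace(-8, 8, 1601)

# A signal in PW_{pi*sigma}: a few shifted sinc's at the oversampled band limit
def f(u):
    return (1.0 * np.sinc(sigma * u)
            + 0.7 * np.sinc(sigma * (u - 3.0))
            - 0.4 * np.sinc(sigma * (u + 5.0)))

n = np.arange(-200, 201)                       # truncation of the series
series = sigma * f(n) @ np.sinc(sigma * (t[None, :] - n[:, None]))

print("max truncation error:", np.max(np.abs(series - f(t))))
```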
Finally, sufficient conditions are given on the real sampling points {tn}n∈Z to guarantee that the sequence {e^{−itnω}/√2π}n∈Z is a frame in L2[−π, π] or, equivalently, that {sinc(t − tn)}n∈Z is a frame in PWπ. The first result in this direction came from Duffin and Schaeffer (1952) and reads as follows: Suppose that there exist constants 0 < ε < 1 and α, L > 0, so that the sampling sequence {tn}n∈Z satisfies

$$ |t_n - t_m| \ge \alpha \ \text{ for } n \ne m \qquad \text{and} \qquad \sup_{n\in\mathbb{Z}} |t_n - \varepsilon n| \le L $$

Then, {sinc(t − tn)}n∈Z is a frame in PWπ.
Condition |tn − tm| ≥ α for n ≠ m (the sampling set {tn}n∈Z is said to be separated or uniformly discrete) implies by itself the existence of a constant B > 0 such that Σn=−∞∞ |f(tn)|² ≤ B‖f‖²PWπ for every f in PWπ (Partington, 1997, p. 219). Both conditions together imply the existence of a constant A > 0 such that A‖f‖²PWπ ≤ Σn=−∞∞ |f(tn)|² for every f in PWπ.
The second result, proof of which can be found in Partington (1997, pp. 219–231), is the following: Suppose that a uniformly discrete set {tn}n∈Z satisfies the condition that there exists a constant k such that

$$ \|f\|_\infty \le k \sup_{n} |f(t_n)| \quad \text{for all } f \in PW_\pi $$

then the sequence {sinc(t − tn)}n∈Z is a frame in PWπ. (The renowned mathematician A. Beurling called this new condition balayage.)
F. Iterative Algorithms

The iterative method allowing us to recover f ∈ PWπ from the frame operator S(f) is, from a functional analysis point of view, the inversion of a linear operator by means of a Neumann series. Recall that if T is a continuous linear transformation of a Banach space E into itself such that ‖T‖ < 1, then (I − T)⁻¹ exists and is continuous. Moreover, it can be given by the series

$$ (I - T)^{-1} = I + T + T^2 + T^3 + \cdots = \sum_{n=0}^{\infty} T^n $$

which converges in the operator norm topology (see, for instance, Naylor and Sell, 1982, p. 431). Using the preceding result, let us prove a version of the so-called extrapolated Richardson method (i.e., an iterative method used to find the solution f of a linear system A f = h).

Let A be a bounded operator on a Banach space E such that ‖f − A(f)‖ ≤ γ‖f‖ for all f ∈ E with γ < 1. Then A is invertible on E and any f can be recovered from A(f) by the following iteration algorithm: set f₀ = A(f) and f_{n+1} = f_n + A(f − f_n) for n ≥ 0; then f = lim_{n→∞} f_n. After n iterations, the error estimate is given by ‖f − f_n‖ ≤ γ^{n+1}‖f‖.

Because ‖I − A‖ ≤ γ < 1, then I − (I − A) = A is invertible and A⁻¹ = Σk=0∞ (I − A)^k. Therefore,

$$ g_{n+1} := \sum_{k=0}^{n+1} (I - A)^k A(f) \longrightarrow f \quad \text{as } n \longrightarrow \infty $$
In contrast, we can write

$$ g_{n+1} = A(f) + \sum_{k=1}^{n+1} (I - A)^k A(f) = A(f) + (I - A) \sum_{k=0}^{n} (I - A)^k A(f) = g_n + A(f - g_n) $$

for n ≥ 0 and g₀ = A(f). Hence, we have obtained the convergence of the proposed iterative algorithm to f. Moreover, regarding its convergence rate we have

$$ \|f - g_n\| = \left\| \sum_{k=n+1}^{\infty} (I - A)^k A(f) \right\| = \|(I - A)^{n+1} A^{-1} A(f)\| \le \gamma^{n+1} \|f\| $$
obtaining the desired result.
Next we put to use this general iterative algorithm to recover band-limited signals from a frame in PWπ. Assume that {sinc(t − tn)}n∈Z is a frame in PWπ with frame bounds A and B. Let S be the frame operator given by

$$ S(f) = \sum_{n=-\infty}^{\infty} f(t_n)\, \operatorname{sinc}(\cdot - t_n) $$

and consider the new operator S′ := [2/(A + B)]S. We can prove that we can use this operator in the preceding iterative algorithm. To this end, because AI ≤ S ≤ BI, we have

$$ \frac{2A}{A+B} \|f\|^2 \le \frac{2}{A+B} \langle S(f), f \rangle \le \frac{2B}{A+B} \|f\|^2 $$

Therefore,

$$ \|f\|^2 - \frac{2}{A+B} \langle S(f), f \rangle \le \|f\|^2 - \frac{2A}{A+B} \|f\|^2 $$

As a consequence,

$$ \langle (I - S')(f), f \rangle \le \frac{B-A}{A+B} \|f\|^2 $$

In a similar way we can prove that

$$ -\frac{B-A}{A+B} \|f\|^2 \le \langle (I - S')(f), f \rangle $$

Because I − S′ is a bounded self-adjoint operator, we can deduce (Naylor and Sell, 1982, p. 371) that

$$ \|I - S'\| = \sup_{\|f\|=1} |\langle (I - S')(f), f \rangle| \le \frac{B-A}{A+B} = \gamma < 1 $$
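In finite dimensions the same argument can be checked directly: for a symmetric positive definite matrix S with extreme eigenvalues A and B, the relaxed operator S′ = [2/(A + B)]S satisfies ‖I − S′‖ = (B − A)/(A + B), and the iteration g_{k+1} = g_k + S′(f) − S′(g_k) recovers f from the data S′(f) alone. The short Python sketch below is an added illustration with an arbitrary random matrix, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
eigs = rng.uniform(1.0, 5.0, n)
S = Q @ np.diag(eigs) @ Q.T                  # symmetric positive definite "frame operator"
A, B = eigs.min(), eigs.max()

Sp = (2.0 / (A + B)) * S                     # relaxed operator S'
gamma = (B - A) / (A + B)
print(np.linalg.norm(np.eye(n) - Sp, 2), gamma)   # the two numbers coincide

f = rng.standard_normal(n)
data = Sp @ f                                # the only information used below
g = data.copy()                              # g_0 = S'(f)
for k in range(50):                          # g_{k+1} = g_k + S'(f) - S'(g_k)
    g = g + data - Sp @ g
# error bound from the text: ||f - g_k|| <= gamma^{k+1} ||f||
print(np.linalg.norm(f - g) <= gamma**51 * np.linalg.norm(f) + 1e-12)
```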
For more details about frames and irregular sampling, see Benedetto (1992, 1994), Benedetto and Heller (1990), and Feichtinger and Gröchenig (1994). Some comments about iterative algorithms for sampling purposes are in order:

• We can see the crucial role played by the frame bounds in the convergence of the preceding algorithm. Thus, it is of practical importance to obtain sharp estimates for A and B. If only a crude upper bound B and the existence of a lower bound A > 0 are known, the frame algorithm can still be used by means of a relaxation parameter λ > 0 (see Feichtinger and Gröchenig, 1994, and Gröchenig, 1993a, for more details).
• If we are able to construct an approximation of the identity operator in PWπ by using a sequence of samples {f(tn)}n∈Z, we can apply the iterative algorithm to recover f. For instance, let {tn}n∈Z be a strictly increasing real sequence with lim_{n→±∞} tn = ±∞. Consider δ = sup_{n∈Z}(t_{n+1} − t_n), the maximal gap between samples, and {z_n}n∈Z the sequence of midpoints (i.e., z_n = (t_n + t_{n+1})/2). When δ < 1, we can obtain an approximation of the identity operator in PWπ by setting
  $$ A(f) := P_{PW_\pi}\left( \sum_{n=-\infty}^{\infty} f(t_n)\, \chi_{[z_{n-1}, z_n)} \right) $$
  That is, we interpolate f by a step function first, followed by the orthogonal projection onto PWπ. Indeed, it can be proved that ‖f − A(f)‖ ≤ δ‖f‖ for every f ∈ PWπ (see Feichtinger and Gröchenig, 1994, and Partington, 1997, for the proof). In Feichtinger and Gröchenig (1994) we can find another approximation of the identity operator in PWπ. Let {tn}n∈Z be a sequence as in the preceding case, with maximal gap between samples δ. If we define w_n = (t_{n+1} − t_{n−1})/2, it is proved that the sequence {√w_n sinc(t − t_n)}n∈Z forms a frame for PWπ with frame bounds (1 − δ)² and (1 + δ)². Consequently, we can recover any function f ∈ PWπ from
  $$ A(f) := \frac{1}{1 + \delta^2} \sum_{n=-\infty}^{\infty} w_n f(t_n)\, \operatorname{sinc}(\cdot - t_n) $$
  by means of the aforementioned iterative algorithm with a rate of convergence γ = 2δ/(1 + δ²). The amplitude factor √w_n compensates for the nonuniformity of the density of samples (see Feichtinger and Gröchenig, 1994, for the proof).
• The standard frame algorithm can be used in combination with acceleration methods like Chebyshev acceleration or conjugate gradient acceleration, which allows a reduction in the number of iterations (Gröchenig, 1993a).
• The iterative techniques also work in higher-dimensional settings (Feichtinger and Gröchenig, 1992b).
The interested reader can also consult Feichtinger and Gr¨ochenig (1992a), Feichtinger et al. (1995), and Gr¨ochenig (1992, 1993a, 1993b, 1999).
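To make the preceding discussion concrete, here is a minimal numerical sketch (our own illustration, not from the original text) of the adaptive-weights operator and the iteration g_{k+1} = g_k + A(f − g_k). The sampling set, test signal, and iteration count are arbitrary choices, the sums are truncated to a finite window, and only the samples f(t_n) are used in the recovery.

```python
import numpy as np

rng = np.random.default_rng(0)

# Irregularly spaced sampling points with maximal gap delta < 1
n = np.arange(-80, 81)
t_n = 0.7 * n + 0.1 * (rng.random(n.size) - 0.5)
delta = np.max(np.diff(t_n))
w = np.empty_like(t_n)
w[1:-1] = (t_n[2:] - t_n[:-2]) / 2                 # adaptive weights w_n
w[0], w[-1] = t_n[1] - t_n[0], t_n[-1] - t_n[-2]

# A test signal in PW_pi: finite combination of integer-shifted sinc's
k = np.arange(-10, 11)
c = rng.standard_normal(k.size) / (1 + np.abs(k))
f = lambda t: np.sinc(t[..., None] - k) @ c

t_grid = np.linspace(-10, 10, 1001)                # where we evaluate the recovery
samples = f(t_n)                                   # the only data used below

def approx_identity(sample_vals, t_eval):
    # A(g) = (1/(1+delta^2)) * sum_n w_n g(t_n) sinc(. - t_n), truncated window
    return (sample_vals * w / (1 + delta**2)) @ np.sinc(t_eval[None, :] - t_n[:, None])

g_grid = approx_identity(samples, t_grid)          # g_0 = A(f) on the grid
g_tn = approx_identity(samples, t_n)               # g_0 at the sampling points
for _ in range(200):                               # g_{k+1} = g_k + A(f - g_k)
    resid = samples - g_tn                         # f - g_k is known only at the t_n
    g_grid += approx_identity(resid, t_grid)
    g_tn += approx_identity(resid, t_n)

print("delta =", delta, " max recovery error:", np.max(np.abs(g_grid - f(t_grid))))
```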
IV. Sampling Stationary Stochastic Processes

A stochastic process {X(t) : t ∈ R} defined on a probability space (Ω, A, p) is said to be a stationary (wide-sense) stochastic process continuous in mean square if it verifies the following assumptions:

• X(t) ∈ L2(A; C), that is, ‖X(t)‖² = E[|X(t)|²] < ∞, and E[X(t)] = 0, for each t ∈ R, where E denotes the expectation of a random variable.
• {X(t)} is stationary (wide sense), that is, R_X(t + u, t) = R_X(u) for all t ∈ R, where R_X stands for the autocorrelation function given by R_X(t, t′) = E[X(t)X(t′)].
• The mapping t ↦ X(t) from R to L2(A; C) is continuous when L2(A; C) is endowed with its usual norm ‖U‖² = E[|U|²].
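As a small numerical illustration of these assumptions (our own sketch, not part of the original text), the random-phase harmonic process X(t) = Σ_k A_k cos(ω_k t + θ_k), with independent phases θ_k uniform on [0, 2π), is wide-sense stationary: E[X(t)] = 0 and E[X(t + u)X(t)] = Σ_k (A_k²/2) cos(ω_k u), independent of t. The Monte Carlo estimate below checks this; the frequencies and amplitudes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
omega = np.array([0.6, 1.4, 2.9])         # fixed frequencies (arbitrary)
amp = np.array([1.0, 0.5, 0.8])           # fixed amplitudes

def realization(t, n_paths):
    # X(t) = sum_k amp_k cos(omega_k t + theta_k), theta_k i.i.d. uniform on [0, 2 pi)
    theta = rng.uniform(0, 2 * np.pi, size=(n_paths, omega.size))
    return np.cos(np.outer(t, omega)[None, :, :] + theta[:, None, :]) @ amp

u = 1.3                                    # a fixed lag
for t0 in (0.0, 2.0, -5.7):                # different absolute times
    x = realization(np.array([t0, t0 + u]), 200_000)
    est = np.mean(x[:, 0] * x[:, 1])       # estimate of E[X(t0) X(t0 + u)]
    print(t0, est)                          # roughly equal for every t0 ...
print("theory:", 0.5 * np.sum(amp**2 * np.cos(omega * u)))   # ... and close to R_X(u)
```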
It is known that such a process admits an integral representation where the function to be integrated is scalar and the measure takes values in the L 2 (A; C) space (Garc´ıa and Mu˜noz-Bouzo, 2000; Rozanov, 1967; Soize, 1993). Moreover, whenever the process is band limited, it can be expanded as a Shannon sampling series (Garc´ıa and Mu˜noz-Bouzo, 2000; Lloyd, 1959; Rozanov, 1967; Soize, 1993). The primary aim in this section is to capture the main features from the latter definition (i.e., stationarity and continuity) to obtain this class of results in an abstract Hilbert-space setting. Most of the ideas included were taken from Garc´ıa and Mu˜noz-Bouzo (2001). Let us begin by defining a generalized stationary process:
A generalized stationary process is a family {x_t}_{t∈R} ⊂ H satisfying the following two conditions:

1. The function r(u) = ⟨x_{u+t}, x_t⟩_H is well defined for all u ∈ R (stationarity).
2. The function r is continuous at 0 (continuity).

The function r is the autocorrelation function of the process. Observe that whenever condition 1 holds, then condition 2 implies that {x_t}_{t∈R} is a continuous process (in the H norm). Indeed,

$$ \|x_t - x_s\|_H^2 = \langle x_t - x_s, x_t - x_s \rangle = 2 r(0) - 2 \Re\, r(t - s) $$

Consequently, the continuity of {x_t}_{t∈R} in the H norm is equivalent to the continuity of r at zero. In particular, a generalized stationary process is weakly continuous and consequently r is continuous in R. On the other hand, condition 1 implies that r is a function of positive type because for all choices of N ∈ N, t₁, ..., t_N ∈ R, and c₁, ..., c_N ∈ C, we have

$$ \sum_{m,n=1}^{N} r(t_m - t_n)\, c_m \overline{c_n} = \sum_{m,n=1}^{N} \langle x_{t_m}, x_{t_n} \rangle c_m \overline{c_n} = \left\langle \sum_{m=1}^{N} c_m x_{t_m}, \sum_{m=1}^{N} c_m x_{t_m} \right\rangle = \left\| \sum_{m=1}^{N} c_m x_{t_m} \right\|^2 \ge 0 $$

Because r is a continuous function of positive type, by using Bochner's theorem (Riesz and Sz.-Nagy, 1990, p. 385) we find the following: There exists a positive finite measure μ on B_R, the Borel sets in R, such that

$$ r(u) = \int_{-\infty}^{\infty} e^{iu\omega}\, d\mu(\omega) $$
The term μ is the spectral measure associated with the process {x_t}_{t∈R}. Let H_X denote the Hilbert space spanned by the process {x_t}_{t∈R} in H, and consider the space L²μ of all complex-valued measurable functions f such that ∫_{−∞}^{∞} |f(ω)|² dμ(ω) < ∞. Then, there exists an isometric isomorphism between the spaces L²μ and H_X with corresponding elements e^{itω} and x_t.
To this end, let us define Φ : L²μ → H_X by

$$ \Phi(g) = \sum_{k=1}^{n} a_k\, x_{t_k} \quad \text{whenever} \quad g(\omega) = \sum_{k=1}^{n} a_k\, e^{it_k\omega} $$

Clearly, for g(ω) = Σ_{k=1}^n a_k e^{it_kω} and g′(ω) = Σ_{j=1}^m b_j e^{it_jω} we get

$$ \langle \Phi(g), \Phi(g') \rangle_H = \left\langle \sum_{k=1}^{n} a_k x_{t_k}, \sum_{j=1}^{m} b_j x_{t_j} \right\rangle_H = \sum_{k=1}^{n} \sum_{j=1}^{m} a_k \overline{b_j}\, r(t_k - t_j) = \sum_{k=1}^{n} \sum_{j=1}^{m} a_k \overline{b_j} \int_{-\infty}^{\infty} e^{i(t_k - t_j)\omega}\, d\mu(\omega) $$
$$ = \int_{-\infty}^{\infty} \left( \sum_{k=1}^{n} a_k e^{it_k\omega} \right) \left( \sum_{j=1}^{m} \overline{b_j}\, e^{-it_j\omega} \right) d\mu(\omega) = \int_{-\infty}^{\infty} g(\omega)\, \overline{g'(\omega)}\, d\mu(\omega) = \langle g, g' \rangle_{L^2_\mu} $$

A standard limit process allows us to extend Φ to an isometric linear map Φ̃ on the closed linear manifold generated by {e^{itω} : t ∈ R} (i.e., on all of L²μ). Clearly, it maps L²μ onto H_X.
Next we derive the Shannon sampling theorem for band-limited generalized stationary processes. A generalized stationary process is said to be band limited to [−π, π] if supp μ ⊆ [−π, π] (i.e., r(u) = ∫_{−π}^{π} e^{iuω} dμ(ω)).
Let {x_t}_{t∈R} be a generalized stationary process band limited to [−π, π] and suppose that μ({−π, π}) = 0. Then, the following sampling formula holds:

$$ x_t = \sum_{n=-\infty}^{\infty} x_n\, \frac{\sin \pi(t - n)}{\pi(t - n)} \qquad (49) $$

where the series converges in H for each t ∈ R.
For each t ∈ R, we have in L2[−π, π]

$$ e^{it\omega} = \sum_{n=-\infty}^{\infty} \frac{\sin \pi(t - n)}{\pi(t - n)}\, e^{in\omega} \qquad (50) $$
The Dirichlet–Jordan test (Zygmund, 1957, p. 57) ensures that convergence is also uniform on intervals [−π + δ, π − δ], with δ > 0. Consequently, the series in Eq. (50) converges everywhere in (−π, π), and μ-almost everywhere in [−π, π]. Besides, because the bounded function e^{itω} has Fourier coefficients O(1/n) as |n| → ∞, the partial sums in Eq. (50) are uniformly bounded in [−π, π] (Zygmund, 1957, p. 90). From the bounded convergence theorem for μ we get

$$ \int_{-\pi}^{\pi} \left| e^{it\omega} - \sum_{n=-N}^{N} \frac{\sin \pi(t - n)}{\pi(t - n)}\, e^{in\omega} \right|^2 d\mu(\omega) \longrightarrow 0 $$

when N goes to ∞. We have convergence in the L²μ sense, and by using the isometry Φ̃, we obtain the desired expansion.
In particular, when the measure μ is absolutely continuous with respect to the Lebesgue measure on [−π, π] (i.e., dμ = s(ω) dω with s ∈ L1[−π, π] the spectral density of the process), this implies that μ({−π, π}) = 0 and the following corollary holds: If the measure μ is absolutely continuous with respect to the Lebesgue measure on [−π, π], then the sampling formula (49) holds.
Finally, it is worth pointing out that formula (49) works for generalized stationary processes whose measure μ is not absolutely continuous with respect to the Lebesgue measure. A simple example is given by {x_t = e^{iat} h}_{t∈R}, where a ∈ (−π, π) and h ∈ H with ‖h‖ = 1. In this case, r(u) = e^{iau} and μ = δ_a, the Dirac delta at point a, which is not absolutely continuous with respect to the Lebesgue measure on [−π, π].
Before this section closes, two comments about how to go into more detail are warranted:
1. The first comment concerns the integral representation of a generalized stationary process {x_t}_{t∈R} by means of an orthogonal countably additive measure Z on B_R taking values in H such that

$$ x_t = \int_{-\infty}^{\infty} e^{it\omega}\, dZ(\omega), \qquad t \in \mathbb{R} \qquad (51) $$

Recall that a countably additive measure Z : B_R → H satisfies

$$ Z\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} Z(A_n) $$

in the norm of H, for every disjoint sequence {A_n}n=1∞ in B_R. The isometry Φ̃ defines the measure Z. Let B be a Borel set in R. Setting Z(B) = Φ̃(χ_B), where χ_B is the characteristic function of B, we obtain a countably additive measure. This measure takes orthogonal values for any disjoint Borel subsets because the following equality holds: ⟨Z(B), Z(B′)⟩_H = ⟨χ_B, χ_{B′}⟩ in L²μ.
2. In general, we can consider a process {x_t}_{t∈R} represented by Eq. (51) when the countably additive measure Z is not necessarily orthogonal. These processes are the harmonizable processes. In the case of band-limited
harmonizable processes, the sampling formula (49) remains valid whenever supp Z ⊆ [−π, π] and Z({−π}) = Z({π}) = 0 ∈ H. The convergence in Eq. (50) is Z-almost everywhere and bounded. The bounded convergence theorem for Z applied to the expansion (50) allows us to interchange the series with the integral and thus obtain the sampling expansion for the process. Indeed,

$$ x_t = \int_{-\pi}^{\pi} \left( \sum_{n=-\infty}^{\infty} \frac{\sin \pi(t - n)}{\pi(t - n)}\, e^{in\omega} \right) dZ(\omega) = \sum_{n=-\infty}^{\infty} x_n\, \frac{\sin \pi(t - n)}{\pi(t - n)} $$
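For the pure-tone example x_t = e^{iat} h mentioned above, formula (49) reduces to the scalar identity e^{iat} = Σ_n e^{ian} sinc(t − n), valid for a ∈ (−π, π); equivalently, it is expansion (50) evaluated at ω = a. A quick truncated check in Python (an added illustration; the values of a and t are arbitrary) is:

```python
import numpy as np

a, t = 2.1, 0.37                      # a in (-pi, pi), t an arbitrary non-integer time
n = np.arange(-20000, 20001)          # truncation of the cardinal series
approx = np.sum(np.exp(1j * a * n) * np.sinc(t - n))
print(approx, np.exp(1j * a * t))     # the truncated series approaches e^{iat}
```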
Technical details about the integral of a scalar function with respect to a vectorial measure were obviously omitted. The interested reader should consult Bartle (1956) for the details and proofs of convergence results. Finally, let us consider a note on harmonizable processes: Stationarity is an unacceptable restriction in many problems such as signal detection. Searching for a relaxation of stationarity while still retaining the methods of harmonic analysis led Lo`eve (1943) to introduce the concept of harmonizability. The historical evolution of this concept and its mathematical treatment can be found in Rao (1982). In Cambanis and Masry (1976), the importance of harmonizable stochastic processes in system analysis is stressed by showing that the output of a wide class of systems is a harmonizable process. See also Chang and Rao (1983) and Piranashvili (1967) for topics related to harmonizable processes and sampling. V. At the End of the Walk The author is indebted to all those who, with their books, papers, and surveys, have contributed to the revitalization of this beautiful and relevant topic in applied mathematics. Let me mention, as a sampling of references, the surveys by Benedetto and Ferreira (2001a), Butzer (1983), Higgins (1985), and Jerri (1977); the papers by Benedetto (1992, 1994), Butzer et al. (1988), Butzer and Stens (1992), Feichtinger and Gr¨ochenig (1994), Nashed and Walter (1991), and Unser (2000); and the books by Benedetto and Ferreira (2001b), Higgins (1996), Higgins and Stens (1999), Marks (1991, 1992), and Zayed (1993). In addition, reading books on related subjects, such as wavelets or harmonic analysis, is a highly recommended exercise that will place sampling theory in more general contexts. Such books, for example, are those by Benedetto (1997), Daubechies (1992), Mallat (1999), Meyer (1992), Ramanathan (1998), and Strang and Nguyen (1996).
To conclude this article, I venture to include a personal list of sampling topics or groups of topics not mentioned in previous sections. By no means should it be understood as an updated state-of-the-art in sampling theory: it is intended only to orient curious readers toward more-advanced sampling problems presented from different points of view.
Many band-limited signals encountered in practical applications do not have finite energy (they do not belong to any PWπσ) and the techniques in Section III do not apply. Naturally, in this case it is necessary to specify the exact meaning of the term band limited. Some generalizations of the concept of the band-limited signal have appeared in the literature. In particular, if we allow the Fourier transform to be taken in the sense of Schwartz distributions, then the class of band-limited signals can be enlarged tremendously. Any complex exponential signal e_ω(t) = e^{itω} can be regarded as a band-limited signal because its Fourier transform is essentially the Dirac delta function δ(x − ω), which is a generalized function with compact support at {ω}. Sampling theorems for signals that are band limited in the distributional sense can be found, for instance, in Campbell (1968), García et al. (1998), Hoskins and De Sousa Pinto (1984), and Walter (1988). Other generalizations of the concept of the band-limited signal can be found in Cambanis and Liu (1970), Lee (1976), Seip (1987), Zakai (1965), and Zayed (1993).
Another interesting issue is to enlarge the set of classical band-limited functions by considering new spaces where the WSK sampling theorem still applies. This leads to the study of Bernstein spaces Bσ^p, where σ > 0 and 1 ≤ p ≤ ∞, defined as the set of all entire functions of exponential type at most σ and whose restriction to R belongs to L^p(R). It also leads to the general Paley–Wiener classes PWσ^p, defined as the set of functions f with an integral representation

$$ f(z) = \int_{-\sigma}^{\sigma} F(x)\, e^{izx}\, dx \quad \text{with } F \in L^p[-\sigma, \sigma] $$
In the particular case p = 2, both classes coincide (i.e., PWσ² = Bσ²). More specific accounts of these spaces and their properties can be found in Boas (1954), Higgins (1996), Young (1980), and Zayed (1993).
Also, the strong relationship between the WSK sampling theorem and other fundamental results in mathematics, such as Poisson's summation formula or Cauchy's integral formula, is surprising. In recent years, many authors have drawn new relationships by showing the equivalence of the WSK sampling theorem, or any of its generalizations, to other important mathematical results like the Euler–MacLaurin formula, the Abel–Plana summation formula, Plancherel's theorem, the maximum modulus principle, and the Phragmén–Lindelöf principle, among others. The interested reader should
refer to Butzer and Nasri-Roudsari (1997), Butzer et al. (1988), Butzer and Stens (1983), Higgins (1996), Higgins et al. (2000), and Rachman and Schmeisser (1994).
In practice, sampling expansions incur several types of errors. Truncation error results when only a finite number of samples is used. Aliasing error occurs when the band-limiting condition is violated or when an inappropriate bandwidth is used for the signal. Amplitude error arises when we know only approximations of the samples of the signal. Time-jitter error is caused by sampling at instants which differ from the theoretical ones given by the corresponding sampling at hand. And, finally, information-loss error arises when some sampled data or fractions thereof are missing. Concerning this topic, see Butzer et al. (1988), Feichtinger and Gröchenig (1993), Higgins (1996), Jerri (1977, 1992), Marks (1991), and Zayed (1993) and references therein.
Band-limited functions cannot be time limited (i.e., they are defined for all t ∈ R). Any f ∈ PWπ is an entire function and, as a consequence of the isolated zeros principle, it cannot be zero on any interval of the real line unless f is the zero function. Also, for the same mathematical reason, a band-limited function can be extrapolated. As pointed out by Higgins in his book (1996, Chap. 17), band-limited signals are the "mathematical model" of a "real signal." In other words, a real signal is considered to be known only in so far as we can make measurements or observations of it. Although a Paley–Wiener function is not exactly time limited, it can be considered nearly time limited in the sense that most of its energy is concentrated on a bounded time interval. This leads to the study of the energy concentration of a signal, and, consequently, to the prolate spheroidal functions and the uncertainty principle in signal analysis. Further discussions and details about this topic can be found in Higgins (1996), Landau (1985), Slepian (1976, 1983), Slepian and Pollak (1961), and Unser (2000).
Another interesting question is that concerning the density of sampling points required to have a stable sampling in PW_B = {f ∈ L2(R) | supp f̂ ⊆ B}. A sequence of sampling points {tn} is a set of stable sampling for PW_B if there exists a constant K, independent of f ∈ PW_B, such that

$$ \|f\|_{L^2} \le K\, \|(f(t_n))\|_{\ell^2} $$
for every f ∈ P W B . Hence, errors in the output of a sampling-and-reconstruction process are bounded by errors in the input. Although band-limited functions are entire functions and, as a consequence, are completely determined by their values in a sequence of sampling points with an accumulation point (in particular, in any segment of the real line), sampling in practice is meaningless in the absence of the stable sampling condition. Note that whenever we are dealing with frames in P W B (which includes, in particular, orthonormal
and Riesz bases), the involved sampling set is stable. This is not the case when we are dealing with a set of uniqueness in PW_B (i.e., f(tn) = 0 for every n implies that f is the zero function). Notice that the set-of-uniqueness condition is equivalent to the sequence of complex exponentials {e^{itn x}} being a complete set in L2(B). Although samples taken at a set of uniqueness determine elements of PW_B uniquely, this does not lead to any process by which we can reconstruct a function from its samples. For example, any finite set of M vectors {xn}n=1M is always a frame in the space generated by their linear combinations. When M increases, the frame bounds may go respectively to 0 and +∞, and this illustrates the fact that in infinite-dimensional spaces, a family of vectors may be complete and still not yield a stable signal representation.
For a set of stable sampling for PW_B, its density D(tn), defined (when the limit exists) by

$$ D(t_n) := \lim_{r \to \infty} \frac{\#\{t_n : t_n \in [-r, r]\}}{2r} $$

with # denoting the cardinality of a set, satisfies D(tn) ≥ m(B)/2π, where m(B) stands for the Lebesgue measure of the set B. The critical density m(B)/2π is called the Nyquist–Landau sampling rate, below which stable reconstruction is not possible. When B = [−πσ, πσ], the Nyquist–Landau density coincides with the Nyquist density σ. In the multichannel setting, the Nyquist–Landau density is smaller than the Nyquist density. Furthermore, if {tn} is a set of interpolation for PW_B, then D(tn) ≤ m(B)/2π. Recall that {tn} is a set of interpolation for PW_B if the moment problem f(tn) = an for every n has a solution whenever {an} ∈ ℓ². This is the case for Riesz bases (Young, 1980, p. 169) and, as a consequence, the density D(tn) coincides with the Nyquist–Landau density in the Riesz bases setting. For more details, see Benedetto and Ferreira (2001a), Higgins (1996), Landau (1967a, 1967b), Partington (1997), Seip (1995), and Young (1980).
An extension of Shannon's model has been proposed: it is the sampling in shift-invariant or splinelike spaces. This is achieved by simply replacing the sinc function by another generating function ϕ. Accordingly, the basic approximation space V(ϕ) is specified as

$$ V(\varphi) := \left\{ s(t) = \sum_{k \in \mathbb{Z}} c_k\, \varphi(t - k) : \{c_k\} \in \ell^2 \right\} $$
As pointed out in Unser (2000), this allows for simpler and more realistic interpolation models in practice, which can be used in conjunction with a much wider class of antialiasing prefilters that are not necessarily ideal low pass. Measured signals in applications have frequency components that decay for higher frequencies, but these signals are not band limited in the strict sense.
As a consequence, sampling in shift-invariant spaces that are not band limited is a suitable and realistic model for many applications. See Unser (2000) and references therein for information on this topic. For irregular sampling in shift-invariant spaces, see Aldroubi and Feichtinger (1998) and Aldroubi and Gr¨ochenig (2000, in press). To close this article, I have one final comment: the coverage of sampling theory in this article is by no means intended to be exhaustive; I apologize for any important omission.
Acknowledgments The author thanks Professor Peter W. Hawkes for the opportunity to write this article on sampling theory. This work has been supported by grant BFM20000029 from the D.G.I. of the Spanish Ministerio de Ciencia y Tecnolog´ıa.
References Abramowitz, G., and Stegun, I. (1972). Handbook of Mathematical Functions. New York: Dover. Aldroubi, A., and Feichtinger, H. (1998). Exact iterative reconstruction algorithm for multivariate irregularly sampled functions in spline-like spaces: The L p -theory. Proc. Am. Math. Soc. 126, 2677–2686. Aldroubi, A., and Gr¨ochenig, K. (2000). Beurling–Landau type theorems for non-uniform sampling in shift invariant spline spaces. J. Fourier Anal. Appl. 6, 91–101. Aldroubi, A., and Gr¨ochenig, K. (in press). Non-uniform sampling and reconstruction in shiftinvariant spaces. SIAM Rev. Almeida, L. B. (1994). The fractional Fourier transform and time–frequency representations. IEEE Trans. Signal Processing 42, 3084–3091. Annaby, M. H., Garc´ıa, A. G., and Hern´andez-Medina, M. A. (1999). On sampling and second order difference equations. Analysis 19, 79–92. Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337–404. Bachman, G., Narici, L., and Beckenstein, E. (2000). Fourier and Wavelet Analysis. New York: Springer-Verlag. Bartle, R. G. (1956). A general bilinear vector integral. Studia Math. 15, 337–352. Benedetto, J. J. (1992). Irregular frames and sampling, in Wavelets—A Tutorial in Theory and Applications, edited by C. K. Chui. San Diego: Academic Press, pp. 445–507. Benedetto, J. J. (1994). Frame decompositions, sampling, and uncertainty principle inequalities, in Wavelets: Mathematics and Applications, edited by J. J. Benedetto and M. W. Frazier. Boca Raton, FL: CRC Press, Chap. 7. Benedetto, J. J. (1997). Harmonic Analysis and Applications. Boca Raton, FL: CRC Press. Benedetto, J. J., and Ferreira, P. J. S. G. (2001a). Introduction to modern sampling theory, in Modern Sampling Theory: Mathematics and Applications, edited by J. J. Benedetto and P. J. S. G. Ferreira. Cambridge, MA: Birkhauser Boston, Chap. 1. Benedetto, J. J., and Ferreira, P. J. S. G. Eds. (2001b). Modern Sampling Theory: Mathematics and Applications. Cambridge, MA: Birkhauser Boston.
Benedetto, J. J., and Heller, W. (1990). Frames and irregular sampling. Math. Note 10(Suppl. 1), 181–194. Boas, R. P. (1954). Entire Functions. New York: Academic Press. Bond, F. E., and Chan, C. R. (1958). On sampling the zeros of bandwidth limited signals. IRE Trans. Inf. Theory IT-4, 110–113. Borel, E. (1897). Sur l’interpolation. C. R. Acad. Sci. Paris 124, 673– 676. Bourgeois, M., Wajer, F., van Ormondt, D., and Graveron-Demilly, D. (2001). Reconstruction of MRI images from non-uniform sampling and its application to intrascan motion correction in functional MRI, in Modern Sampling Theory: Mathematics and Applications, edited by J. J. Benedetto and P. J. S. G. Ferreira. Cambridge, MA: Birkhauser Boston, Chap. 16. Brown, J. L., Jr. (1967). On the error in reconstructing non-bandlimited functions by means of the bandpass sampling theorem. J. Math. Anal. Appl. 18, 75–84. Brown, J. L., Jr. (1993). Sampling of bandlimited signals: Fundamental results and some extensions, in Handbook of Statistics, Vol. 10, edited by N. K. Bose and C. R. Rao. Amsterdam: Elsevier, pp. 59–101. Butzer, P. L. (1983). A survey of Whittaker–Shannon sampling theorem and some of its extensions. J. Math. Res. Exposition 3, 185–212. Butzer, P. L., and Jansche, S. (1999). A self-contained approach to Mellin transform analysis for square integrable functions: Applications. Integral Transform. Spec. Funct. 8(3–4), 175– 198. Butzer, P. L., and Nasri-Roudsari, G. (1997). Kramer’s sampling theorem in signal analysis and its role in mathematics, in Image Processing: Mathematical Methods and Applications, edited by J. M. Blackledge. London: Oxford Univ. Press, pp. 49–95. Butzer, P. L., Splettßt¨ober, W., and Stens, R. L. (1988). The sampling theorem and linear predictions in signal analysis. Jahresber. Deutsch. Math. Verein. 90, 1–70. Butzer, P. L., and Stens, R. L. (1983). The Euler–MacLaurin summation formula, the sampling theorem and approximate integration over the real axis. Linear Algebra Appl. 52–53, 141–155. Butzer, P. L., and Stens, R. L. (1992). Sampling theory for not necessarily band-limited functions: A historical overview. SIAM Rev. 34, 40–53. Cambanis, S., and Liu, B. (1970). On harmonizable stochastic processes. Inf. Control 17, 183– 202. Cambanis, S., and Masry, E. (1976). Zakai’s class of band-limited of functions and processes: Its characterization and properties. SIAM J. Appl. Math. 30, 10–21. Campbell, L. L. (1968). Sampling theorem for the Fourier transform of a distribution with compact support. SIAM J. Appl. Math. 16, 626– 636. Casazza, P. G. (2000). The art of frame theory. Taiwan. J. Math. 2, 129–201. Cauchy, A. L. (1841). M´emoire sur diverses formules d’analyse. C. R. Acad. Sci. Paris 12, 283–298. Cauchy, A. L. (1893). M´ethode pour d´evelopper des fonctions d’une ou plusieurs variables en s´eries compos´ees de fonctions de mˆeme esp`ece, in Oeuvres de Cauchy, s´erie II, tome VII. Paris: Gauthier-Villars, pp. 366–392. Chang, D. K., and Rao, M. M. (1983). Bimeasures and sampling theorems for weakly harmonizable processes. Stochastic Anal. Appl. 1, 21–55. Christensen, O., and Jensen, T. K. (2000). An introduction to the theory of bases, frames, and wavelets. Tech. Univ. of Denmark. Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia: Soc. for Industr. & Appl. Math. Duffin, R., and Schaeffer, A. (1938). Some properties of functions of exponential type. Bull. Am. Math. Soc. 44, 236–240. Duffin, R., and Schaeffer, A. (1952). A class of nonharmonic Fourier series. Trans. Am. Math. Soc. 
72, 341–366.
Duren, P. L. (2000). Theory of the H p Spaces. New York: Dover. Everitt, W. N., and Nasri-Roudsari, G. (1999). Interpolation and sampling theories, and linear ordinary boundary value problems, in Sampling Theory in Fourier and Signal Analysis: Advanced Topics, edited by J. R. Higgins and R. L. Stens. Oxford: Oxford Univ. Press, Chap. 5. Feichtinger, H. G., and Gr¨ochenig, K. (1992a). Irregular sampling theorems and series expansions of band-limited functions. J. Math. Anal. Appl. 167, 530–556. Feichtinger, H. G., and Gr¨ochenig, K. (1992b). Iterative reconstruction of multivariate bandlimited functions from irregular sampling values. SIAM J. Math. Anal. 23, 244–261. Feichtinger, H. G., and Gr¨ochenig, K. (1993). Error analysis in regular and irregular sampling theory. Appl. Anal. 50, 167–189. Feichtinger, H. G., and Gr¨ochenig, K. (1994). Theory and practice of irregular sampling, in Wavelets: Mathematics and Applications, edited by J. J. Benedetto and M. W. Frazier. Boca Raton, FL: CRC Press, Chap. 8. Feichtinger, H. G., Gr¨ochenig, K., and Strohmer, T. (1995). Efficient numerical methods in non-uniform sampling theory. Num. Math. 69, 423–440. Ferrar, W. L. (1926). On the cardinal function of interpolation theory. Proc. R. Soc. Edinb. 46, 323–333. Garc´ıa, A. G. (2000). Orthogonal sampling formulas: A unified approach. SIAM Rev. 42(3), 499–512. Garc´ıa, A. G., and Hern´andez-Medina, M. A. (2001). The discrete Kramer sampling theorem and indeterminate moment problems. J. Comp. Appl. Math. 134, 13–22. Garc´ıa, A. G., Moro, J., and Hern´andez-Medina, M. A. (1998). On the distributional Fourier duality and its applications. J. Math. Anal. Appl. 227, 43–54. Garc´ıa, A. G., and Mu˜noz-Bouzo, M. J. (2000). On sampling stationary stochastic processes. Appl. Anal. 75(1–2), 73–84. Garc´ıa, A. G., and Mu˜noz-Bouzo, M. J. (2001). Sampling generalized stationary processes, in SAMPTA 2001, edited by A. I. Zayed. Orlando, May 2001, Orlando: Univ. of Central Florida, pp. 107–110. Garc´ıa, A. G., and Szafraniec, F. H. (2002). A converse of the Kramer sampling theorem. Sampling Theory in Signal and Image Processing 1, 53– 61. Gasquet, C., and Witomski, P. (1990). Analyse de Fourier et Applications. Paris: Masson. Gori, F. (1992). Sampling in optics, in Advanced Topics in Shannon Sampling and Interpolation Theory, edited by R. J. Marks II. New York: Springer-Verlag, Chap. 2. Gr¨ochenig, K. (1992). Reconstruction algorithms in irregular sampling. Math. Comput. 59, 181– 194. Gr¨ochenig, K. (1993a). Acceleration of the frame algorithm. IEEE Trans. Signal Processing 41, 3331–3340. Gr¨ochenig, K. (1993b). A discrete theory of irregular sampling. Linear Algebra Appl. 193, 129– 150. Gr¨ochenig, K. (1999). Irregular sampling, Toeplitz matrices, and the approximation of exponential functions of exponential type. Math. Comput. 68, 749–765. Hamming, R. W. (1973). Numerical Methods for Scientists and Engineers. New York: Dover. Hardy, G. H. (1941). Notes on special systems of orthogonal functions, IV: The orthogonal functions of Whittaker’s cardinal. Proc. Camb. Philos. Soc. 37, 331–348. Higgins, J. R. (1972). An interpolation series associated with the Bessel–Hankel transform. J. Lond. Math. Soc. 5, 707–714. Higgins, J. R. (1985). Five short stories about cardinal series. Bull. Am. Math. Soc. 12, 45–89. Higgins, J. R. (1996). Sampling Theory in Fourier and Signal Analysis: Foundations. Oxford: Oxford Univ. Press.
Higgins, J. R. (1999). Derivative sampling—A paradigm example of multichannel methods, in Sampling Theory in Fourier and Signal Analysis: Advanced Topics, edited by J. R. Higgins and R. L. Stens. Oxford: Oxford Univ. Press, Chap. 3. Higgins, J. R., Schmeisser, G., and Voss, J. J. (2000). The sampling theorem and several equivalent results in analysis. J. Comp. Anal. Appl. 2, 333–371. Higgins, J. R., and Stens, R. L. Eds. (1999). Sampling Theory in Fourier and Signal Analysis: Advanced Topics. Oxford: Oxford Univ. Press. Hoskins, R. F., and De Sousa Pinto, J. (1984). Sampling expansions for functions band-limited in the distributional sense. SIAM J. Appl. Math. 44, 605– 610. Istratescu, V. I. (1987). Inner Product Structures. Dordrecht: Reidel. Jerri, A. (1977). The Shannon sampling theorem and its various extensions and applications: A tutorial review. Proc. IEEE 68(11), 1565–1596. Jerri, A. (1992). Integral and Discrete Transforms with Applications and Error Analysis. New York: Dekker. Kotel’nikov, V. (1933). On the carrying capacity of the “ether” and wire in telecommunications, Material for the first All-Union Conference on Questions of Communications (in Russian). Izd. Red. Upr. Svyazy RKKA. Kramer, H. P. (1957). A generalized sampling theorem. J. Math. Phys. 63, 68–72. Lacaze, B. (1998). La formule d’echantillonnage et A. L. Cauchy. Traitement du Signal 15(4), 289–295. Landau, H. L. (1967a). Necessary density conditions for sampling and interpolation of certain entire functions. Acta Math. 117, 37–52. Landau, H. L. (1967b). Sampling data transmission and the Nyquist rate. Proc. IEEE 55, 1701– 1706. Landau, H. J. (1985). An overview of time and frequency limiting, in Fourier Techniques and Applications, edited by J. F. Price. New York: Plenum. Lee, A. J. (1976). Characterization of band-limited functions and proceses. Inf. Control 31, 258–271. Levinson, N. (1940). Gap and Density Theorems, Vol. 26. New York: Am. Math. Soc. Lloyd, S. P. (1959). A sampling theorem for stationary (wide sense) stochastic processes. Trans. Am. Math. Soc. 44, 1–12. Lo`eve, M. (1943). Probability Theory. Princeton, NJ: Van Nostrand. Mallat, S. (1999). A Wavelet Tour of Signal Processing. San Diego: Academic Press. Marks, R. J., II. (1991). Introduction to Shannon Sampling and Interpolation Theory. New York: Springer-Verlag. Marks, R. J., II. Ed. (1992). Advanced Topics in Shannon Sampling and Interpolation Theory. New York: Springer-Verlag. Marsden, J. E., and Hoffman, M. J. (1987). Basic Complex Analysis. New York: Freeman. Marvasti, F. (1987). A Unified Approach to Zero-Crossings and Nonuniform Sampling of Single and Multidimensional Signals and Systems. Chicago: Department of Electrical Engineering, Illinois Institute of Technology. Meyer, Y. (1992). Ondelettes et algorithmes concurrents. Paris: Hermann. Mihailov, V. P. (1962). Riesz basis in L 2 (0, 1). Dokl. Math. Soc. 3, 851–855. Namias, V. (1980). The fractional order Fourier and its application to quantum mechanics. J. Inst. Math. Appl. 25, 241–265. Nashed, M. Z., and Walter, G. G. (1991). General sampling theorems in reproducing kernel Hilbert spaces. Math. Control Signals Syst. 4, 373–412. Naylor, A. W., and Sell, G. R. (1982). Linear Operator Theory in Engineering and Science. New York: Springer-Verlag.
Nyquist, H. (1928). Certain topics in telegraph tranmission theory. AIEE Trans. 47, 617– 644. Ogura, K. (1920). On a certain transcendental integral function in the theory of interpolation. Tˆohoku Math. J. 17, 64–72. Ozaktas, H. M., and Mendlovic, D. (1993). Fourier transforms of fractional order and their optical interpretation. Optics Commun. 101, 163–169. Ozaktas, H. M., and Mendlovic, D. (1995). Fractional Fourier optics. J. Opt. Soc. Am. A 12(4), 743–751. Paley, R. E. A. C., and Wiener, N. (1934). Fourier Transforms in the Complex Domain, Vol. 19, New York: Am. Math. Soc. Papoulis, A. (1977a). Generalized sampling expansion. IEEE Trans. Circuits Syst. 24, 652– 654. Papoulis, A. (1977b). Signal Analysis. New York: McGraw-Hill. Partington, J. R. (1997). Interpolation, Identification and Sampling. Oxford: Clarendon. Pavlov, B. S. (1979). Basicity of an exponential system and Muckenhoupt’s condition. Math. Dokl. 20, 655– 659. Piranashvili, Z. A. (1967). On the problem of interpolation of random processes. Theor. Probl. Appl. 7, 647– 657. Plotkin, E. I., Romero, J., and Swamy, M. N. S. (1996). Reproducing kernels and the use of root loci of specific functions in the recovery of signals from nonuniform samples. Signal Processing 49, 11–23. Rachman, Q. I., and Schmeisser, G. (1994). The summation formulae of Poisson, Plana, Euler– MacLaurin and their relationship. J. Math. Sci. 28, 151–171. Ramanathan, J. (1998). Methods of Applied Fourier Analysis. Cambridge, MA: Birkhauser Boston. Rao, M. M. (1982). Harmonizable processes: Structure theory. Enseign. Math. (no. 3– 4) 28, 295–351. Rawn, M. D. (1989). A stable nonuniform sampling expansion involving derivatives. IEEE Trans. Inf. Theory 35, 1223–1227. Riesz, F., and Sz.-Nagy, B. (1990). Functional Analysis. New York: Dover. Rozanov, Y. A. (1967). Stationary Random Processes. San Francisco: Holden-Day. Saitoh, S. (1997). Integral Transforms, Reproducing Kernels and Their Applications. Essex, England: Longman. Sansone, G. (1991). Orthogonal Functions. New York: Dover. Seip, K. (1987). An irregular sampling theorem for functions bandlimited in a generalized sense. SIAM J. Appl. Math. 47, 1112–1116. Seip, K. (1995). On the connection between exponential bases and certain related sequences in l 2 (−π, π). J. Funct. Anal. 130, 131–160. Shannon, C. E. (1949). Communication in the presence of noise. Proc. IRE 137, 10–21. Slepian, D. (1976). On bandwidth. Proc. IEEE 64, 292–300. Slepian, D. (1983). Some comments on Fourier analysis, uncertainty and modelling. SIAM Rev. 28, 389–393. Slepian, D., and Pollak, H. O. (1961). Prolate spheroidal wave functions, Fourier analysis and uncertainty. Bell Syst. Tech. J. 40, 43– 64. Soize, C. (1993). M´ethodes math´ematiques en analyse du signal. Paris: Masson. Stark, H. (1992). Polar, spiral, and generalized sampling and interpolation, in Advanced Topics in Shannon Sampling and Interpolation Theory, edited by R. J. Marks II. New York: SpringerVerlag, Chap. 6. Stens, R. L. (1983). A unified approach to sampling theorems for derivatives and Hilbert transforms. Signal Processing 5, 139–151. Strang, G., and Nguyen, T. (1996). Wavelets and Filter Banks. Wellesley, MA: Wellesley– Cambridge Press.
Szeg¨o, G. (1991). Orthogonal Polynomials, Vol. 23. Providence, RI: Am. Math Soc. Titchmarsh, E. C. (1926). The zeros of certain integral functions. Proc. Lond. Math. Soc. 26, 283–302. Unser, M. (2000). Sampling—50 years after Shannon. Proc. IEEE 88(4), 569–587. Walker, W. J. (1994). Oscillatory properties of Paley–Wiener functions. Indian J. Pure Appl. Math. 25, 1253–1258. Walter, G. G. (1988). Sampling bandlimited functions of polynomial growth. SIAM J. Math. Anal. 19, 1198–1203. Walter, G. G. (1994). Wavelets and Other Orthogonal Systems with Applications. Boca Raton, FL: CRC Press. Watson, G. N. (1944). A Treatise on the Theory of Bessel Functions. Cambridge, UK: Cambridge Univ. Press. Whittaker, E. T. (1915). On the functions which are represented by the expansion on the interpolation theory. Proc. R. Soc. Edinb. A 35, 181–194. Whittaker, J. M. (1935). Interpolatory Function Theory. Cambridge, UK: Cambridge Univ. Press. Xia, X. (1996). On bandlimited signals with fractional Fourier transform. IEEE Signal Processing Lett. 3(3), 72–74. Yao, K. (1967). Applications of reproducing kernel Hilbert spaces—Bandlimited signal models. Inf. Control 11, 429– 444. Young, R. M. (1980). An Introduction to Nonharmonic Fourier Series. New York: Academic Press. Zakai, M. (1965). Band-limited functions and sampling theorem. Inf. Control 8, 143–158. Zayed, A. I. (1991). On Kramer sampling theorem associated with general Sturm–Liouville problems and Lagrange interpolation. SIAM J. Appl. Math. 51, 575– 604. Zayed, A. I. (1993). Advances in Shannon’s Sampling Theory. Boca Raton, FL: CRC Press. Zayed, A. I. (1996a). Function and Generalized Function Transformations. Boca Raton, FL: CRC Press. Zayed, A. I. (1996b). A generalized sampling theorem with the inverse of an arbitrary square summable sequence as sampling points. J. Fourier Anal. Appl. 2(3), 303–314. Zayed, A. I. (1996c). On the relationship between the Fourier and fractional Fourier transform. IEEE Signal Processing Lett. 5, 310–311. Zayed, A. I. (1998a). A convolution and product theorem for the fractional Fourier transform. IEEE Signal Processing Lett. 5(4), 101–103. Zayed, A. I. (1998b). Hilbert transform associated with the fractional Fourier transform. IEEE Signal Processing Lett. 5(8), 206–208. Zayed, A. I., and Garc´ıa, A. G. (1999). New sampling formulae for the fractional Fourier transform. Signal Processing 77, 111–114. Zayed, A. I., Hinsen, G., and Butzer, P. L. (1990). On Lagrange interpolation and Kramer-type sampling theorems associated with Sturm–Liouville problems. SIAM J. Appl. Math. 50, 893– 909. Zygmund, A. (1957). Trigonometric Series. Cambridge, UK: Cambridge Univ. Press.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 124
Kriging Filters for Space–Time Interpolation

WILLIAM S. KERWIN¹ AND JERRY L. PRINCE²

¹Department of Radiology, University of Washington, Seattle, Washington 98195
²Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218
I. Introduction . . . 140
II. Data Model . . . 141
III. Review of Kriging Methods . . . 143
   A. Spatial Kriging . . . 143
   B. Cokriging . . . 144
   C. Space–Time Kriging . . . 146
   D. Comparison of Kriging, Space–Time Kriging, and Cokriging . . . 146
IV. Best Linear Unbiased Prediction . . . 150
   A. Projected Orthogonality Theorem . . . 154
   B. Alternate Derivation of Kriging . . . 157
V. Cokriging Filters . . . 158
   A. Temporal Filter for Cokriging . . . 160
   B. Temporal Smoother for Cokriging . . . 163
VI. Space–Time Kriging Filters . . . 164
   A. Temporal Filter for Space–Time Kriging . . . 165
      1. Initialization . . . 166
      2. The Filter . . . 168
   B. Temporal Smoother for Space–Time Kriging . . . 169
VII. Applications . . . 171
   A. Groundwater Data . . . 171
   B. Cardiac MRI . . . 175
      1. Tag Surface Model . . . 176
      2. Observation Model . . . 179
      3. Cokriging . . . 180
      4. Tracking Method . . . 181
      5. Results and Discussion of Cokriging Filter for Cardiac MRI . . . 183
VIII. Discussion and Conclusion . . . 184
Appendix: Optimality of Filtering Algorithms . . . 187
   Proof of Algorithm V.1 . . . 187
   Proof of Algorithm V.2 . . . 188
   Proof of Algorithm VI.1 . . . 190
   Proof of Algorithm VI.2 . . . 191
References . . . 192
I. Introduction The need for data interpolation is a pervasive problem in many scientific disciplines. For example, it arises in image processing when the pixel size must be reduced or gaps need to be filled in the spatial coverage of imaging devices. When data are obtained over both space and time, the ideal interpolation technique should incorporate both spatial and temporal information. However, space–time interpolation faces several challenges, including the potential for overwhelming amounts of data, the fact that future observations are unavailable, and the need to relate temporal measures to spatial measures. The purpose of this article is to address these challenges with a set of space–time interpolation techniques based on kriging. When originally proposed, kriging was a purely spatial interpolator for estimating mineral reserves from scattered core samples (Christensen, 1991; Cressie, 1990; Matheron, 1969). The spatial distribution of ore was assumed to consist of two components—a large-scale trend and small-scale fluctuation around the trend. The kriging equations were then derived by using best linear unbiased prediction (BLUP), the basic machinery of which predates the origin of kriging (Goldberger, 1962; Henderson, 1950; Malley, 1986; Robinson, 1991). The kriging method has since been extended to interpolate data with both spatial variation and temporal variation, which has led to space–time kriging and space–time cokriging (Bogaert, 1996). In space–time kriging, time is treated as an additional data dimension (Bilonick, 1985; Rouhani and Hall, 1989). In space–time cokriging, samples obtained at discrete time points are assumed to arise from separate but correlated spatial distributions (Papritz and Fluhler, 1994). Both techniques have two drawbacks in terms of computation time. First, both require inversion of a large matrix, with dimensions that depend on the total number of observations in both space and time. Second, if the data are processed on line, newly obtained observations can be incorporated only by repeating the entire matrix inversion process. These drawbacks have led other researchers to propose space–time interpolation techniques that combine aspects of kriging and Kalman filtering (Berke, 1998; Huang and Cressie, 1996; Kerwin and Prince, 1999a). Kalman filtering is a well-known technique that reduces computation time for estimating temporal processes (Kalman and Bucy, 1961). Specifically, Kalman filtering requires the inversion of matrices with dimensions that depend only on the number of observations at a single time. Also, new observations are incorporated simply by updating past estimates. Thus, methods that combine kriging and Kalman filtering provide both space–time interpolation and fast computation. In this article, we show that under appropriate assumptions, both space– time kriging and cokriging can be accomplished by using fast filtering techniques. Our focus is on interpolating space–time sequences of functions z n (x),
where n = 1, 2, . . . is a discrete time index and x is a position in any multidimensional space. Each function is assumed to consist of a zero-mean random component ψ_n(x) plus an unknown combination of trend basis functions f_1(x), . . . , f_d(x). Given observations of each function at positions x_1, . . . , x_p that are fixed in time, we seek to reconstruct the complete sequence by using BLUP. We call the resulting algorithms the space–time kriging filter and the space–time cokriging filter. Both filters are presented in two forms: first, for on-line use when only the latest function in the series is of interest, and, second, for off-line use when each function is interpolated by using data from all time frames.

The key to developing fast filtering methods is to assume that the temporal correlation in ψ_n(x) is determined by the stochastic state model

ψ_n(x) = q ψ_{n−1}(x) + ν_n(x)    (1)
where q is a known scalar, and ν_n(x) is a temporally uncorrelated, zero-mean random input. We refer to Eq. (1) as the kriging update model (Kerwin and Prince, 1999a). We interpret q as a relaxation parameter, nominally between zero and one, that determines the rate at which the trend component is approached in the absence of additional input. For example, when applied to water reserves, q may be related to evaporation and ν_n(x) may be related to new rainfall.

On the basis of the kriging update model, we develop the space–time kriging and cokriging filters as follows. In Section II, we present the general space–time data model common to all kriging methods under consideration. In Section III, we demonstrate the various assumptions regarding the data model that lead to spatial kriging, space–time kriging, and cokriging. Section IV presents the required theoretical framework of BLUP, illustrated with a unique derivation of the spatial kriging equations. In Section V, we develop the space–time cokriging filter, and in Section VI, the corresponding space–time kriging filter. The methods are illustrated in Section VII with applications in groundwater reserves and magnetic resonance imaging (MRI) of heart motion. We close with a discussion of extensions of these methods and some concluding remarks in Section VIII.
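To make the kriging update model concrete, the following minimal sketch (ours, not the authors') simulates Eq. (1) at a fixed set of observation locations. The exponential covariance for ν_n, the value q = 0.9, and the implicit choice λ = 1 (the initial field drawn from the same covariance as the updates) are purely illustrative assumptions.

```python
import numpy as np

def simulate_update_model(locs, q=0.9, n_frames=12, seed=0):
    """Simulate psi_n(x) = q * psi_{n-1}(x) + nu_n(x) at fixed locations.

    An exponential covariance for nu_n and lambda = 1 (psi_0 drawn from the
    same covariance) are assumed only to make the sketch self-contained.
    """
    rng = np.random.default_rng(seed)
    p = len(locs)
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    Lc = np.linalg.cholesky(np.exp(-d) + 1e-10 * np.eye(p))   # factor of k(x1, x2)
    psi = Lc @ rng.standard_normal(p)                         # psi_0
    frames = [psi]
    for _ in range(n_frames - 1):
        psi = q * psi + Lc @ rng.standard_normal(p)           # Eq. (1)
        frames.append(psi)
    return np.array(frames)                                   # shape (n_frames, p)

# Example: 50 fixed "well" locations on a 20 x 20 grid
locs = np.random.default_rng(1).uniform(0.0, 20.0, size=(50, 2))
print(simulate_update_model(locs).shape)
```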
II. Data Model

Throughout this article, we use lowercase italics to indicate scalar quantities and functions, boldface letters to indicate vectors, and uppercase letters to indicate matrices. The space–time sequences are assumed to obey the general kriging model

z_n(x) = ψ_n(x) + f_n^T(x) m_n    (2)
where ψ_n(x) is a zero-mean, random variation and f_n^T(x) m_n is the large-scale trend in the data. Furthermore, f_n(x) is a d × 1 vector of known basis functions, and m_n is a d × 1 vector of unknown, deterministic weights. Assuming a general set of trend basis functions leads to the universal kriging model. The simplified assumption that f_n(x) = 1 (i.e., the trend is an unknown constant) leads to the ordinary kriging model (Cressie, 1990).

The small-scale variation ψ_n(x) is assumed to be characterized by a known cross-covariance function,

k_{ij}(x_1, x_2) = E{ψ_i(x_1) ψ_j(x_2)}

with a spatial dependence that is usually determined only by the distance between points:

r = ‖x_1 − x_2‖

In kriging, the semivariogram

γ_{ij}(x_1, x_2) = (1/2) E{(ψ_i(x_1) − ψ_j(x_2))²}

is often used in place of the covariance because it may be easier to approximate. Equivalently, we can use

k_{ij}(x_1, x_2) = −γ_{ij}(x_1, x_2)

which we interpret as the generalized covariance of an intrinsic random function (Matheron, 1973). Because kriging with intrinsic random functions is equivalent to kriging with semivariograms (Christensen, 1990), we present all subsequent equations in terms of the covariance, with the knowledge that it may be a generalized covariance.

Observations of z_n(x) at time n are placed in the p × 1 vector y_n. Often, the observations are assumed to be exact, but more generally, the observations have been corrupted by noise so that

y_n = z_n + η_n    (3)

where z_n = [z_n(x_1) ··· z_n(x_p)]^T and η_n is a vector of zero-mean random noise with covariance matrix

Σ_n = E{η_n η_n^T}

In the kriging literature, observation noise is referred to as the nugget effect, owing to mining applications in which the presence of a large nugget in a core sample can skew the apparent mineral concentration (Christensen, 1991).
Example II.1   The kriging model is motivated by Brownian motion. By definition, the mean of a one-dimensional Brownian-motion process is a constant m and the semivariogram is proportional to the distance between points, so that γ(r) = ar for some constant a. However, for an arbitrary segment of a Brownian-motion process the mean is unknown. The segment can thus be assumed to consist of a zero-mean portion ψ(x) with semivariogram ar and an unknown trend m f(x), where f(x) = 1 is the basis function and m is the unknown coefficient.

III. Review of Kriging Methods

All kriging methods can utilize the preceding data model to predict a scalar function at an unobserved location by using BLUP. The resulting prediction ẑ_n(x) is “best” in the sense of minimizing mean square error, “linear” in that ẑ_n(x) is a linear combination of y_1, y_2, . . . , and “unbiased,” meaning that the expected value of the prediction is equal to the trend. The differences between the methods arise in the assumed data models. In spatial kriging, neither ψ_n(x) nor m_n exhibits any temporal relationships. In cokriging, ψ_n(x) is temporally correlated and m_n varies with time. Finally, in space–time kriging ψ_n(x) is temporally correlated and m_n is a fixed constant for all time. In the following sections, the resulting prediction equations are delineated and some of the advantages and disadvantages of each method are highlighted.

Before delving into the methods, we point out that our use of the term prediction can cause some confusion because prediction is often associated with temporal forecasting. However, in this case, prediction is defined as “estimating a random process at a location that has not been observed.” This traditional definition has been adopted by the spatial kriging community, in which time is not considered. However, within a space–time context, it is entirely possible to discuss “predicting the present” or, worse still, “predicting the past.” Despite these odd notions, we maintain the traditional terminology for consistency with the kriging literature.
A. Spatial Kriging

If we assume that k_{ij}(x_1, x_2) = 0 for i ≠ j and that m_n varies with n, then data obtained at other times have no bearing on the prediction of z_n(x). Thus, ẑ_n(x) will be a linear combination of the components of y_n only. Applying BLUP
leads to the original spatial kriging formulation (cf. Christensen, 1991)

ẑ_n(x) = k_{nn}^T(x) w_n + f_n^T(x) m̂_n    (4)

where

w_n = (K_{nn} + Σ_n)^{−1} [I − F_n^T (F_n (K_{nn} + Σ_n)^{−1} F_n^T)^{−1} F_n (K_{nn} + Σ_n)^{−1}] y_n    (5)

m̂_n = (F_n (K_{nn} + Σ_n)^{−1} F_n^T)^{−1} F_n (K_{nn} + Σ_n)^{−1} y_n    (6)

In these equations, I is the identity matrix and we have defined

k_{nn}(x) = [k_{nn}(x, x_1) ··· k_{nn}(x, x_p)]^T    (7)

K_{nn} = [k_{nn}(x_i, x_j)]_{i,j=1}^{p}    (8)

F_n = [f_n(x_1) ··· f_n(x_p)]    (9)
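Equations (4)–(6) translate almost line for line into code. The sketch below is our transcription, not the authors' implementation; it assumes (K_{nn} + Σ_n) and F_n(K_{nn} + Σ_n)^{−1}F_n^T are invertible, and the covariance, basis, and data used in the usage lines are invented for illustration (a positive definite exponential covariance is used there so that the plain matrix inverses are safe).

```python
import numpy as np

def spatial_krige(y, X, x0, cov, basis, Sigma=None):
    """Universal (spatial) kriging prediction of z(x0), Eqs. (4)-(6)."""
    p = len(y)
    K = np.array([[cov(a, b) for b in X] for a in X])        # K_nn
    S = np.zeros((p, p)) if Sigma is None else Sigma          # Sigma_n
    F = np.column_stack([basis(x) for x in X])                # F_n, d x p
    k0 = np.array([cov(x0, a) for a in X])                    # k_nn(x)
    A = np.linalg.inv(K + S)
    G = np.linalg.inv(F @ A @ F.T)
    m_hat = G @ F @ A @ y                                     # Eq. (6)
    w = A @ (np.eye(p) - F.T @ G @ F @ A) @ y                 # Eq. (5)
    return k0 @ w + basis(x0) @ m_hat                         # Eq. (4)

# Illustrative usage with an exponential covariance and a linear trend
cov = lambda a, b: np.exp(-np.linalg.norm(np.asarray(a) - np.asarray(b)))
basis = lambda x: np.array([1.0, x[0], x[1]])
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 20.0, size=(30, 2))
y = rng.normal(size=30)                                       # placeholder data
print(spatial_krige(y, X, np.array([10.0, 10.0]), cov, basis, Sigma=25.0 * np.eye(30)))
```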
B. Cokriging

If we apply BLUP assuming that ψ_n(x) is temporally correlated and that m_n varies with n, we can incorporate all available data up to the last time frame N. The result is cokriging, which was originally developed for jointly kriging separate but correlated functions, such as concentrations of multiple mineral species over the same area (Journel and Huijbrechts, 1978; Myers, 1982; Wackernagel, 1994). Likewise, cokriging can be used in our space–time context to interpolate each function in the sequence over space. Under these assumptions, the cokriging solution for predicting z_n(x) is given by (Myers, 1982)

ẑ_n(x) = k_n^T(x) w + f_n^T(x) m̂_n    (10)

where

w = (K + Σ)^{−1} [I − F^T (F (K + Σ)^{−1} F^T)^{−1} F (K + Σ)^{−1}] y    (11)

[m̂_1^T ··· m̂_N^T]^T = (F (K + Σ)^{−1} F^T)^{−1} F (K + Σ)^{−1} y    (12)
and we have defined the following: first,

y = [y_1^T ··· y_N^T]^T

second,

k_n(x) = [k_{n1}^T(x) ··· k_{nN}^T(x)]^T

where

k_{nj}(x) = [k_{nj}(x, x_1) ··· k_{nj}(x, x_p)]^T    (13)

third,

K = [K_{ij}]_{i,j=1}^{N}    (14)

where

K_{ij} = [k_{ij}(x_l, x_m)]_{l,m=1}^{p}    (15)

fourth,

F = diag(F_1, . . . , F_N)

a block diagonal matrix with F_n defined as in Eq. (9); and finally, the block diagonal matrix

Σ = diag(Σ_1, . . . , Σ_N)    (16)
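The bookkeeping of Eqs. (13)–(16) can be error prone, so the following sketch (not from the chapter) assembles the stacked matrices with NumPy. A time-invariant basis f(x) and a constant per-frame noise variance are assumed; both are simplifications.

```python
import numpy as np

def assemble_blocks(X, cov, basis, noise_var, N):
    """Assemble the stacked matrices of Eqs. (14)-(16) for N time frames.

    cov(i, j, x1, x2) returns k_ij; basis(x) returns f(x). A time-invariant
    basis and a constant per-frame noise variance are assumed.
    """
    p = len(X)
    K = np.block([[np.array([[cov(i, j, a, b) for b in X] for a in X])
                   for j in range(1, N + 1)] for i in range(1, N + 1)])   # Eq. (14)
    Fn = np.column_stack([basis(x) for x in X])                           # Eq. (9)
    F_cokrige = np.kron(np.eye(N), Fn)     # block diagonal F of Section III.B
    F_stacked = np.tile(Fn, (1, N))        # F = [F_1 ... F_N] of Eq. (20)
    Sigma = noise_var * np.eye(N * p)      # Eq. (16) with Sigma_n = noise_var * I
    return K, F_cokrige, F_stacked, Sigma
```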
C. Space–Time Kriging

Finally, space–time kriging is obtained if we assume that ψ_n(x) is temporally correlated and that m_n = m (i.e., it is fixed for all n). Any temporal variation is assumed to be captured in the known basis vector f_n(x), which can be a function of n. The space–time kriging assumptions then permit us to use data from all time frames to produce the prediction (Rouhani and Hall, 1989)

ẑ_n(x) = k_n^T(x) w + f_n^T(x) m̂    (17)

where

w = (K + Σ)^{−1} [I − F^T (F (K + Σ)^{−1} F^T)^{−1} F (K + Σ)^{−1}] y    (18)

m̂ = (F (K + Σ)^{−1} F^T)^{−1} F (K + Σ)^{−1} y    (19)

The matrices in these equations are defined as in cokriging except that F is replaced with the matrix

F = [F_1 ··· F_N]    (20)
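Equations (17)–(19), and Eqs. (10)–(12) when the block diagonal F of Section III.B is substituted, amount to a single large linear solve. A minimal, unoptimized transcription is sketched below under the assumption that (K + Σ) and F(K + Σ)^{−1}F^T are invertible; all names are ours.

```python
import numpy as np

def batch_space_time_krige(y, K, Sigma, F, k_target, f_target):
    """Batch BLUP of Eqs. (17)-(19); with the block diagonal F of Section III.B
    (and a correspondingly stacked f_target) the same code yields cokriging."""
    A = np.linalg.inv(K + Sigma)
    G = np.linalg.inv(F @ A @ F.T)
    m_hat = G @ F @ A @ y                                 # Eq. (19)
    w = A @ (np.eye(len(y)) - F.T @ G @ F @ A) @ y        # Eq. (18)
    return k_target @ w + f_target @ m_hat                # Eq. (17)
```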
D. Comparison of Kriging, Space–Time Kriging, and Cokriging

Reviewing the formulations for kriging, space–time kriging, and cokriging shows them to be similar. The final prediction equations (4), (10), and (17) are all linear combinations of the covariance functions plus linear combinations of the trend basis functions. Furthermore, the coefficients in the linear combinations are computed by nearly identical equations: compare Eq. (5) with Eqs. (11) and (18), and Eq. (6) with Eqs. (12) and (19). Nevertheless, there are important differences in these equations that have implications for the use of kriging, space–time kriging, and cokriging as space–time interpolators.

Let us consider kriging first. Its spatial formulation limits its use to interpolating data obtained at a single observation time. Thus, spatial kriging forgoes the benefits of using temporal information and generally produces worse predictions than those of the space–time methods. The advantage of kriging is that it is substantially faster than the space–time methods because fewer data points are incorporated into each prediction. In kriging, computation time is dominated by the inversion of the p × p matrix (K_{nn} + Σ_n), whereas the space–time methods are dominated by the inversion of the Np × Np matrix (K + Σ). Because the latter matrix is N times larger, it takes substantially longer to invert. For example, if 10 time points are obtained, individually kriging the data at each of the 10 times will be 100 to 1000 times faster than applying either space–time kriging or cokriging to the entire data set. For
densely sampled, low-noise data, the original spatial kriging formulation can be entirely satisfactory. Conversely, if the data are characterized by high noise levels or sparse samples, a space–time method such as space–time kriging or cokriging is often preferable.

One advantage of space–time kriging is that it can be formulated to interpolate not just over space, but also between observation times. The main disadvantage of space–time kriging, aside from its computation time, is difficulties in model specification because both temporal and spatial dependencies of the covariance and the trend must be gleaned from the data (Rouhani and Myers, 1990). For example, spatial kriging often assumes isotropic covariances or semivariograms. However, in space–time kriging, isotropy requires units of time to be equated to units of distance. Rouhani and Hall (1989) proposed to avoid this issue by considering separable trends and covariances. Specifically, they assumed that the trend basis functions can be grouped as

f_n(x) = [ f_s(x) ]
         [ f_t(n) ]

where the two terms determine, respectively, the spatial trend and the temporal trend. For the covariance function, they assumed that

k_{ij}(r) = k_s(r) + k_t(|i − j|)

where the two terms are, respectively, the isotropic spatial covariance depending on the separation r between locations and the temporal covariance depending on the time separation |i − j|.

Finally, cokriging has similar advantages and disadvantages to those of space–time kriging. The main difference is that cokriging is capable of predicting a function only at an observed time and therefore cannot be used to interpolate over time. Nevertheless, this disadvantage comes with the benefit that the temporal trend of the data does not need to be known. Thus, cokriging can be effective for data that undergo sudden jumps or otherwise lack a clear temporal trend.

Example III.1   To illustrate the differences among kriging, space–time kriging, and cokriging, let us examine a hypothetical set of data based on the model of Rouhani and Hall (1989) for monthly water table elevations. In this model, first, the trend is assumed to be quadratic over space and linear over time, so that

f_s(x) = [1  x  y  x²  y²  xy]

where x and y are the two components of position x, and

f_t(n) = n
Second, the spatial and temporal components of the covariance are given by

k_s(r) = a_s r
k_t(|i − j|) = a_t |i − j|

where a_s and a_t are model parameters less than or equal to zero. (This restriction is in accordance with the description of intrinsic random fields by Matheron, 1973.) Finally, the measurements are subject to independent noise with standard deviation σ.

Given this model, spatial kriging can be performed for individual months by using the kriging model parameters

k_{nn}(x_1, x_2) = a_s ‖x_2 − x_1‖
f_n(x) = [1  x  y  x²  y²  xy]^T
Σ_n = σ² I

Cokriging can be performed by using

k_{ij}(x_1, x_2) = a_s ‖x_2 − x_1‖ + a_t |i − j|
f_n(x) = [1  x  y  x²  y²  xy]^T
Σ_n = σ² I

which adds the temporal component of the covariance. Space–time kriging can be performed by using the model parameters

k_{ij}(x_1, x_2) = a_s ‖x_2 − x_1‖ + a_t |i − j|
f_n(x) = [1  x  y  x²  y²  xy  n]^T
Σ_n = σ² I

where the difference from the cokriging model is the addition of the time variable n to the trend basis functions.

To demonstrate, we simulated a set of water table data with parameters a_s = −40, a_t = −2, σ = 5, m_t = [1], and

m_s = [100  3  4  −0.1  −0.15  −0.1]^T
The data were generated over a 20 × 20-mile grid and for 12 monthly time points. The water levels were then sampled with noise at 50 randomly placed “wells,” which were at fixed positions over time. Figure 1 shows a contour plot of the simulated data from month 7. The most prominent feature is a peak greater than 170, located below and to the right of center. The locations of the wells are also shown in this figure.
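As a quick numerical check of the specification above, the short sketch below (ours, not part of the original simulation) evaluates the separable covariance and the trend surface f_n(x)^T m at the stated parameter values; it does not attempt to reproduce the random component of the simulated data.

```python
import numpy as np

a_s, a_t, sigma = -40.0, -2.0, 5.0
m = np.array([100.0, 3.0, 4.0, -0.1, -0.15, -0.1, 1.0])    # [m_s; m_t]

def k_ij(i, j, x1, x2):
    """Separable space-time generalized covariance of Example III.1."""
    return a_s * np.linalg.norm(np.asarray(x1) - np.asarray(x2)) + a_t * abs(i - j)

def trend(x, y, n):
    """Quadratic-in-space, linear-in-time trend f_n(x)^T m."""
    return np.array([1.0, x, y, x * x, y * y, x * y, n]) @ m

print(trend(10.0, 10.0, 7))                  # trend value near the grid center in month 7
print(k_ij(3, 7, (0.0, 0.0), (5.0, 5.0)))    # covariance between two samples
```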
Figure 1. Simulated water table elevations in feet for month 7 with wells marked by × marks.
The data samples were interpolated by using spatial kriging, cokriging, and space–time kriging, and the results for month 7 are depicted in Figures 2, 3, and 4, respectively. These figures show that spatial kriging and cokriging failed to reconstruct the presence of the peak, but it is clearly visible in the space–time kriging result. Also, the space–time kriging result remained almost entirely within 15 ft of the actual levels, whereas the other methods differed by more than 30 ft in places. The better error performance of space–time kriging is also clear in Figure 5, which shows that the smallest root-mean-square (RMS) differences occurred in every month for space–time kriging. The highest RMS differences occurred for spatial kriging, although cokriging showed only a marginal improvement. However, the better performance of the space–time methods comes with a substantial penalty in computation time, as shown in Table 1. Spatial kriging was performed with more than 1000 times fewer floating-point operations than the space–time methods.

Finally, we note that these results do not indicate that space–time kriging is always superior to the other methods in error performance. These data were simulated on the basis of the space–time kriging model. Which method has the lowest error in a given application depends on the underlying data behavior. Nevertheless, with regard to computation time, spatial kriging will always substantially outperform the traditional formulations of space–time kriging and cokriging.
Figure 2. (Top) Water table elevations for month 7 predicted by kriging and (bottom) difference from truth, where gray corresponds to errors larger than 15 ft in absolute value and black corresponds to errors greater than 30 ft.
IV. Best Linear Unbiased Prediction

The focus of the remainder of this article is on computationally fast methods for space–time kriging and cokriging. Our goal is to interpolate functions more accurately than is possible with spatial kriging, but to minimize any increased computational burden. All methods are developed by using BLUP, a key characterization of which is the projected orthogonality theorem. Before
Figure 3. (Top) Water table elevations for month 7 predicted by cokriging and (bottom) difference from truth, where gray corresponds to errors larger than 15 ft and black corresponds to errors greater than 30 ft.
presenting this theorem, we review the basics of BLUP as applied to kriging models. BLUP is a linear predictor of the form

ẑ = Σ_{i=1}^{N} a_i^T y_i
Figure 4. (Top) Water table elevations for month 7 predicted by space–time kriging and (bottom) difference from truth, where gray corresponds to errors larger than 15 ft and black corresponds to errors greater than 30 ft.
where a_i are vectors of coefficients. The prediction is constrained to be unbiased,

E{ẑ} = E{z}

which leads to constraints on the coefficients of the form

F_i a_i = f_i,    i = 1, . . . , N    (21)
Figure 5. Root-mean-square (RMS) errors between the true groundwater data and the predicted values from (solid line) kriging, (dashed line) cokriging, and (dotted line) space–time kriging for each month.
or

Σ_{i=1}^{N} F_i a_i = f_n    (22)

depending on the assumed form of the trend. Finally, BLUP produces minimum variance predictors, so that E{(z − ẑ)²} is minimized. Finding the optimal coefficient vectors a_1, . . . , a_N is traditionally accomplished by using the method of Lagrange multipliers.

TABLE 1
Total Number of Floating-Point Operations Required to Interpolate Water Table Data^a by Using Various Kriging Methods

  Method                 Floating-point operations
  Spatial kriging        7.25 × 10^5
  Cokriging              1.04 × 10^9
  Space–time kriging     8.83 × 10^8

^a From 50 samples times 12 months.
Example IV.1   To demonstrate the derivation of an unbiasedness condition, we examine the cokriging model. To predict z_n(x), we must satisfy the unbiasedness condition E{ẑ_n(x)} = E{z_n(x)}, where

ẑ_n(x) = Σ_{i=1}^{N} a_i^T(x) y_i

By definition, the expectation of z_n(x) is the trend f_n^T(x) m_n. Using this fact and the assumption that E{η_i} = 0 gives

E{y_i} = F_i^T m_i

Thus, the unbiasedness condition gives

Σ_{i=1}^{N} a_i^T(x) F_i^T m_i = f_n^T(x) m_n

Without knowledge of m_1, . . . , m_N, we can satisfy this condition only if

F_n a_n(x) = f_n(x)
F_i a_i(x) = 0,    i ≠ n

These equations are the desired unbiasedness constraints; subject to these constraints, the minimization of

E{(Σ_{i=1}^{N} a_i^T(x) y_i − z_n(x))²}

leads to the cokriging equations.
A. Projected Orthogonality Theorem

The key to deriving fast filtering solutions for space–time kriging and cokriging is a property we call projected orthogonality (Kerwin and Prince, 1999a). It generalizes the well-known property of linear minimum mean square error (LMMSE) prediction that the data and the prediction error are statistically orthogonal for the optimal LMMSE prediction. However, in BLUP, the error and data are, in general, correlated. Nevertheless, the error and the data have
the useful property that the error is statistically orthogonal to the data projected onto any zero-mean subspace.

Projected orthogonality is established by a theorem that is related to a theorem attributed to Lehmann and Scheffe (Malley, 1986). They showed that a linear unbiased estimate of a fixed parameter is of minimum variance if and only if the estimate is statistically orthogonal to all zero-mean linear combinations of the data. The projected orthogonality theorem shows that the linear prediction of a random variable is of minimum error variance if and only if the error satisfies the same orthogonality condition. This fact is shown as follows.

Theorem IV.1 (Projected Orthogonality)   Let ẑ be a linear predictor

ẑ = Σ_{i=1}^{N} a_i^T y_i

that satisfies

F_i a_i = f_i,    i = 1, . . . , N    (23)

Then, the value of E{(z − ẑ)²} is minimized if and only if

E{(z − ẑ) b^T y_l} = 0    (24)

for every b satisfying F_l b = 0 and every l, 1 ≤ l ≤ N.

Proof.   First, we show the necessity of Eq. (24) by contradiction, and then we prove that it is also sufficient. Assume that a_1, . . . , a_N are the coefficients for the BLUP solution, but that they do not satisfy Eq. (24). Then, for some l ≤ N and b satisfying F_l b = 0,

E{(z − Σ_{i=1}^{N} a_i^T y_i) b^T y_l} = α

where α ≠ 0. Now, let us replace a_l with ă_l = a_l + (α/β²) b, where β² = E{(b^T y_l)²}. Note that

F_l ă_l = f_l
so unbiasedness is preserved. With this change, the error variance becomes

E{(z − ẑ)²} = E{(z − Σ_{i=1, i≠l}^{N} a_i^T y_i − ă_l^T y_l)²}
            = E{(z − Σ_{i=1}^{N} a_i^T y_i)²} − α²/β²
            < E{(z − Σ_{i=1}^{N} a_i^T y_i)²}

This inequality shows that a_1, . . . , a_N do not provide the minimum variance solution, which contradicts our initial assumption. Therefore, the necessity of Eq. (24) is proven.

To prove sufficiency, let a_1, . . . , a_N satisfy Eqs. (23) and (24). Any unbiased linear combination must then have the form

ẑ = Σ_{i=1}^{N} (a_i + b_i)^T y_i

where F_i b_i = 0 for all i. Computing the associated error variance and applying Eq. (24) yields

E{(z − ẑ)²} = E{(z − Σ_{i=1}^{N} a_i^T y_i)²} + E{(Σ_{i=1}^{N} b_i^T y_i)²}
            ≥ E{(z − Σ_{i=1}^{N} a_i^T y_i)²}

with equality if and only if b_i = 0 for all i. This proves that a_1, . . . , a_N must produce the minimum variance solution, which thereby proves the theorem.

Theorem IV.1 applies to unbiasedness conditions of the form of Eq. (21), but we also encounter unbiasedness conditions expressed as summations, as in Eq. (22). Such a summation also appears in the related Lehmann and Scheffe theorem. Thus, the following corollary of the projected orthogonality theorem is useful.
Corollary IV.1   Let ẑ be a linear predictor

ẑ = Σ_{i=1}^{N} a_i^T y_i

that satisfies

Σ_{i=1}^{N} F_i a_i = f

Then, the value of E{(z − ẑ)²} is minimized if and only if

E{(z − ẑ) Σ_{i=1}^{N} b_i^T y_i} = 0    (25)

for every set b_1, . . . , b_N satisfying

Σ_{i=1}^{N} F_i b_i = 0

Proof.   Define

y = [y_1^T ··· y_N^T]^T,    a = [a_1^T ··· a_N^T]^T,    b = [b_1^T ··· b_N^T]^T,    F = [F_1 ··· F_N]

Given these definitions, we need to show that ẑ = a^T y is the minimum variance predictor of z, subject to F a = f, if and only if

E{(z − a^T y) b^T y} = 0

for all b satisfying F b = 0. This statement is true by Theorem IV.1.

B. Alternate Derivation of Kriging

The power of the projected orthogonality theorem can be illustrated with a derivation of the kriging equations (4)–(6). Ordinarily, the kriging equations
are derived by using the method of Lagrange multipliers (cf. Cressie, 1990). Use of the projected orthogonality theorem quickly leads to the same solution. For simplicity, we consider the noise-free case, in which Σ_n = 0. Given a vector of data y_n, the general form of the kriging solution is assumed to be ẑ_n(x) = a_n^T(x) y_n, where the coefficients satisfy the unbiasedness constraint

F_n a_n(x) = f_n(x)    (26)

By the projected orthogonality theorem, the solution must satisfy

E{[z_n(x) − a_n^T(x) y_n] b^T y_n} = 0

for all b satisfying F_n b = 0. Evaluating the expectation and applying the unbiasedness condition yields

[k_{nn}^T(x) − a_n^T(x) K_{nn}] b = 0

Then, we define

b = [I − K_{nn}^{−1} F_n^T (F_n K_{nn}^{−1} F_n^T)^{−1} F_n] c

noting that for every vector c, F_n b = 0 under this definition. Therefore, the projected orthogonality condition is equivalently

[k_{nn}^T(x) − a_n^T(x) K_{nn}] [I − K_{nn}^{−1} F_n^T (F_n K_{nn}^{−1} F_n^T)^{−1} F_n] = 0

where we drop c because the relation must hold for every c. Multiplying terms, applying Eq. (26), and solving for a_n(x) yields

a_n(x) = K_{nn}^{−1} [I − F_n^T (F_n K_{nn}^{−1} F_n^T)^{−1} F_n K_{nn}^{−1}] k_{nn}(x) + K_{nn}^{−1} F_n^T (F_n K_{nn}^{−1} F_n^T)^{−1} f_n(x)

If we then compute ẑ_n(x) = a_n^T(x) y_n and regroup terms, we obtain Eqs. (4)–(6) (with Σ_n = 0).

V. Cokriging Filters

The projected orthogonality theorem gives us a valuable tool for exploring the effects of the kriging update model,

ψ_n(x) = q ψ_{n−1}(x) + ν_n(x)    (27)
on space–time interpolation. Let us first consider its effect on the cokriging equations.
As discussed in the Introduction, q is a known parameter. The random update ν_n(x) comes from a sequence of independent, identically distributed, zero-mean random fields with known covariance function

k(x_1, x_2) = E{ν_n(x_1) ν_n(x_2)}

which is independent of n. We let the initial (n = 0) covariance for the sequence be

E{ψ_0(x_1) ψ_0(x_2)} = λ k(x_1, x_2)

for some known parameter λ, and we assume that ψ_0(x) is uncorrelated with ν_n(x) for all n. Of particular importance is the case in which

λ = 1/(1 − q²)    (28)

for which E{ψ_n(x_1) ψ_n(x_2)} = λ k(x_1, x_2) for all n. That is, ψ_n(x) is wide-sense stationary (in time).

The trend is given by f_n^T(x) m_n as in traditional cokriging, where m_n varies with n. For simplicity, we make the usual assumption that the basis functions are the same for all time frames, so that f_n(x) = f(x) for all n. Our goal is to predict

z_n(x) = ψ_n(x) + f^T(x) m_n    (29)

given the observed values in the p × 1 vectors y_1, . . . , y_N defined by Eq. (3), where the p observation locations are fixed. The resulting model is fully compatible with the method of cokriging. Specifically, we note that

k_{ij}(x_1, x_2) = q^{|i−j|} [q^{2(i∨j)} λ + Σ_{l=1}^{i∨j} q^{2l−2}] k(x_1, x_2)    (30)

where i ∨ j is the smaller of i or j. If the wide-sense stationarity condition Eq. (28) is satisfied, Eq. (30) reduces to

k_{ij}(x_1, x_2) = q^{|i−j|} λ k(x_1, x_2)    (31)
Either of these definitions combined with the equations in Section III.B permit cokriging to be used to predict the sequence of functions.
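The temporal factor in Eqs. (30) and (31) is easy to verify numerically. The sketch below is an illustration, not the authors' code; it computes the factor and confirms that the wide-sense-stationary choice of λ in Eq. (28) collapses Eq. (30) to Eq. (31).

```python
import numpy as np

def temporal_factor(i, j, q, lam):
    """Factor multiplying k(x1, x2) in Eq. (30); i v j denotes the smaller index."""
    m = min(i, j)
    return q ** abs(i - j) * (q ** (2 * m) * lam
                              + sum(q ** (2 * l - 2) for l in range(1, m + 1)))

q = 0.8
lam = 1.0 / (1.0 - q ** 2)                      # wide-sense-stationary choice, Eq. (28)
print(temporal_factor(3, 5, q, lam))            # equals q^{|i-j|} * lam ...
print(q ** abs(3 - 5) * lam)                    # ... as Eq. (31) predicts
```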
However, in this article, our goal is to develop a computationally efficient method for cokriging. The result is a pair of algorithms that together compute the same cokriging result by means of fast filtering techniques. The first algorithm computes the cokriging prediction of z n (x) by using only the observations obtained up to time frame n. That is, the algorithm is causal; no future observations are used. We refer to this as temporal filtering, which is ideal for on-line processing, in which only the current spatial variation is of interest. The second algorithm refines the results of the first algorithm on the basis of future observations. The result is the true cokriging prediction incorporating observations from the first time frame to the final time frame N. This algorithm is best for off-line use after all observations have been collected. We refer to this as temporal smoothing, which is the terminology adopted by the Kalman filtering community (e.g., Lindquist, 1968).
A. Temporal Filter for Cokriging

For the development of the temporal filtering and smoothing algorithms for cokriging, we use the definitions

K = [k(x_i, x_j)]_{i,j=1}^{p}    (32)

k(x) = [k(x, x_1) ··· k(x, x_p)]^T    (33)

F = [f(x_1) ··· f(x_p)]    (34)

Then, beginning with the temporal filtering algorithm, the key to formulating ẑ_n(x) as a temporal filter is that we can write it in the form of an update equation as follows.

Theorem V.1   Let z_n(x) be given by Eq. (29), where ψ_n(x) obeys Eq. (27). Then, the BLUP of z_n(x) given y_1, . . . , y_n can be put into the form

ẑ_n(x) = q ψ̂_{n−1}(x) + a_n^T(x)(y_n − q ψ̂_{n−1})    (35)

where ψ̂_{n−1}(x) is the BLUP of ψ_{n−1}(x) given observations through time n − 1, ψ̂_{n−1} is a vector of predictions at the observation locations, and a_n(x) is a p × 1 vector of weights.
Proof.   Any general linear predictor can be written as

ẑ_n(x) = q ψ̂_{n−1}(x) + a_n^T(x)(y_n − q ψ̂_{n−1}) + Σ_{i=1}^{n−1} b_i^T(x) y_i

To satisfy the unbiasedness constraint E{ẑ_n(x)} = f^T(x) m_n, the coefficient vector a_n(x) must satisfy

F a_n(x) = f(x)    (36)

and the remaining coefficients must satisfy

F b_i(x) = 0,    i = 1, . . . , n − 1

Next, we observe that for any b_i(x) satisfying the unbiasedness constraint,

E{[ψ_{n−1}(x) − ψ̂_{n−1}(x)] b_i^T(x) y_i} = 0

by the projected orthogonality theorem. On the basis of this observation, we find that

E{[z_n(x) − q ψ̂_{n−1}(x) − a_n^T(x)(y_n − q ψ̂_{n−1})] Σ_{i=1}^{n−1} b_i^T(x) y_i} = 0

which can be verified by expanding z_n(x) and y_n into their component parts and applying our correlation assumptions regarding the individual terms. Thus, the error variance is given by

E{[z_n(x) − ẑ_n(x)]²} = E{[z_n(x) − q ψ̂_{n−1}(x) − a_n^T(x)(y_n − q ψ̂_{n−1})]²} + E{[Σ_{i=1}^{n−1} b_i^T(x) y_i]²}

Then, minimizing the error variance can be accomplished by individually minimizing both terms in this equation. Minimizing the second term leads to b_1(x) = ··· = b_{n−1}(x) = 0, which proves the theorem.

As in Kalman filtering, Theorem V.1 frames the prediction of z_n(x) as the weighted sum of a past estimate ψ̂_{n−1}(x) and the innovation (Kailath, 1968) in the latest observation, defined as y_n − q ψ̂_{n−1}. The concept of innovation is explained by writing y_n in terms of its components:

y_n = q ψ_{n−1} + ν_n + F^T m_n + η_n

where ν_n is a p × 1 vector with ith component ν_n(x_i). In this expression, q ψ_{n−1} is the only component that can be predicted from data prior to time n. Thus,
y_n − q ψ̂_{n−1} represents all new or innovative information. Unlike Kalman filtering, the optimal weight vector must also satisfy an unbiasedness constraint, Eq. (36).

From this discussion, we see that the solution of the cokriging filter requires the prediction of ψ_{n−1}(x) and the calculation of a single weight vector a_n(x). The fact that only one p × 1 weight vector is calculated is the source of the computational savings over traditional cokriging. Derivation of expressions for ψ̂_{n−1}(x) and a_n(x) then leads to the temporal cokriging filter.

Algorithm V.1 (The Temporal Cokriging Filter)
1. Initialize H_0 = λK and w_0 = 0 (p × 1). Set n = 1.
2. Calculate the matrices

   L_n = (K + Σ_n + q² H_{n−1})^{−1}
   M_n = (F L_n F^T)^{−1} F L_n
   A_n = (K + q² H_{n−1}) L_n (I − F^T M_n)
   H_n = (K + q² H_{n−1})(I − A_n^T)

3. Calculate the coefficient vectors

   m̂_n = M_n (y_n − q K w_{n−1})
   w_n = q w_{n−1} + K^{−1} A_n (y_n − q K w_{n−1})

4. Calculate the prediction, using

   ẑ_n(x) = k^T(x) w_n + f^T(x) m̂_n    (37)

5. Increment n and go to 2.

A proof of the optimality of this algorithm is outlined in the Appendix. The algorithm can be related to the update equation (35) if we define

a_n(x) = A_n^T K^{−1} k(x) + M_n^T f(x)    (38)

and

ψ̂_{n−1}(x) = k^T(x) w_{n−1}

(thus ψ̂_{n−1} = K w_{n−1}). The equivalence of Eq. (35) and the final estimate equation (37) can then be seen by appropriately expanding and regrouping terms. The ability to regroup the terms is critically important because the original update equation (35) implies that an infinite number of values must be fed forward from ψ̂_{n−1}(x), whereas in the final algorithm, only the p × 1 vector w_{n−1} is fed forward.
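Algorithm V.1 maps directly onto a few lines of linear algebra. The following sketch is our transcription of the steps above, not the authors' implementation; plain inverses are used for readability, and K is assumed invertible, which a generalized covariance need not guarantee without additional care.

```python
import numpy as np

def cokriging_filter(Y, K, F, Sigmas, q, lam):
    """Temporal cokriging filter (Algorithm V.1), transcribed from the steps above.

    Y      : iterable of p-vectors y_1, ..., y_N
    K, F   : Eqs. (32) and (34); F has shape (d, p)
    Sigmas : per-frame noise covariances Sigma_n
    Returns (w_n, m_hat_n) for each frame; the prediction at any x is then
    z_hat_n(x) = k(x)^T w_n + f(x)^T m_hat_n, Eq. (37).
    """
    p = K.shape[0]
    I = np.eye(p)
    H = lam * K                                   # step 1: H_0 = lam K
    w = np.zeros(p)                               # step 1: w_0 = 0
    out = []
    for y, Sig in zip(Y, Sigmas):
        L = np.linalg.inv(K + Sig + q * q * H)    # step 2
        M = np.linalg.inv(F @ L @ F.T) @ F @ L
        A = (K + q * q * H) @ L @ (I - F.T @ M)
        H_next = (K + q * q * H) @ (I - A.T)
        innov = y - q * (K @ w)                   # step 3
        m_hat = M @ innov
        w = q * w + np.linalg.solve(K, A @ innov)
        H = H_next
        out.append((w.copy(), m_hat))
    return out
```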
B. Temporal Smoother for Cokriging

Algorithm V.1 thus produces the cokriging prediction of z_n(x) given y_1, . . . , y_n. However, in situations in which data up to y_N (N > n) are available, cokriging has the capability of using the future data, whereas the temporal cokriging filter does not. This limitation can be overcome with a second smoothing algorithm that rapidly incorporates future data.

To formulate the temporal cokriging smoother, we note that future observations can be incorporated into the prediction by means of the update equation

ẑ_{n|i}(x) = ẑ_{n|i−1}(x) + a_{n|i}^T(x)(y_i − q ψ̂_{i−1})    (39)

where ẑ_{n|i}(x) is the BLUP of z_n(x) given observations through time i (i > n). Thus, the estimate can be updated at time i given only the innovation at time i. The proof of this statement is essentially the same as that for Theorem V.1. We also find that a_{n|i}(x) is bound by the unbiasedness constraint

F a_{n|i}(x) = 0

The smoothing solution is then found by solving for a_{n|i}(x) for i = n + 1, . . . , N and combining the results to compute ẑ_{n|N}(x). The resulting temporal cokriging smoother is given by the following algorithm.

Algorithm V.2 (The Temporal Cokriging Smoother)
1. Initialize v_N = 0 (p × 1). Set n = N − 1.
2. Calculate the vector

   v_n = q(I − A_{n+1}^T) v_{n+1} + q L_{n+1}(I − F^T M_{n+1})(y_{n+1} − q K w_n)

3. Update the coefficient vectors

   m̂_{n|N} = m̂_n + M_n (Σ_n A_n^T − H_n) v_n
   w_{n|N} = w_n + K^{−1} H_n v_n

4. Calculate the prediction, using

   ẑ_{n|N}(x) = k^T(x) w_{n|N} + f^T(x) m̂_{n|N}    (40)

5. If n = 1, stop; otherwise, decrement n and go to 2.

A proof of this algorithm is outlined in the Appendix. Note that the smoothing algorithm requires L_n, M_n, A_n, H_n, w_n, and m̂_n as inputs from the filtering algorithm. Thus, these matrices and vectors must be stored for all time. However, the method does not require any additional matrix inversions and is therefore much faster than even the temporal cokriging filter. Like the filter, the temporal cokriging smoother operates recursively in a single pass through
the data, in this case working backward from the last time frame to the first. Therefore, the complete cokriging prediction given all time frames is computed by using Algorithm V.1 in the forward direction followed by Algorithm V.2 in the reverse direction.

The relationship of the algorithm to the update equation (39) is that

a_{n|i}(x) = q^{i−n} (I − M_i^T F) L_i (I − A_{i−1}) ··· (I − A_{n+1}) u_n(x)    (41)

for i > n + 1 and

a_{n|n+1}(x) = q (I − M_{n+1}^T F) L_{n+1} u_n(x)    (42)

where

u_n(x) = (A_n Σ_n − H_n) M_n^T f(x) + H_n K^{−1} k(x)

The algorithm arises from a convenient rearrangement of these terms.
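Continuing the previous sketch, the backward pass of Algorithm V.2 can be written as a single loop, assuming the filter has been extended to store L_n, M_n, A_n, H_n, w_n, and m̂_n for every frame, as the text requires. Zero-based list position n holds frame n + 1; all names are ours.

```python
import numpy as np

def cokriging_smoother(filter_out, Y, K, F, Sigmas, Ls, Ms, As, Hs, q):
    """Backward pass of Algorithm V.2 over stored filter quantities.

    filter_out[n] = (w, m_hat) for frame n + 1; Ls, Ms, As, Hs, Sigmas hold the
    corresponding per-frame matrices. Returns smoothed (w_{n|N}, m_hat_{n|N}).
    """
    p, N = K.shape[0], len(Y)
    I = np.eye(p)
    v = np.zeros(p)                                   # step 1: v_N = 0
    smoothed = [None] * N
    smoothed[N - 1] = filter_out[N - 1]               # frame N needs no smoothing
    for n in range(N - 2, -1, -1):                    # frames N-1 down to 1
        w_n, m_n = filter_out[n]
        innov = Y[n + 1] - q * (K @ w_n)
        v = q * (I - As[n + 1].T) @ v \
            + q * Ls[n + 1] @ (I - F.T @ Ms[n + 1]) @ innov          # step 2
        m_sm = m_n + Ms[n] @ (Sigmas[n] @ As[n].T - Hs[n]) @ v       # step 3
        w_sm = w_n + np.linalg.solve(K, Hs[n] @ v)
        smoothed[n] = (w_sm, m_sm)                    # step 4: weights of Eq. (40)
    return smoothed
```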
VI. Space–Time Kriging Filters

Next, with one small change in the model, we obtain a set of space–time kriging filters instead of cokriging filters. We assume that

z_n(x) = ψ_n(x) + f_n^T(x) m    (43)

where m is now unknown, but fixed for all time, and the vector of known coefficients f_n(x) varies with n in addition to x. Typically, only some of the components of f_n(x) are determined by n, and the remaining components are determined by spatial position x alone. The number that vary with n is defined as d_t. Thus, there are d − d_t trend basis functions that vary only with x. As before, we assume that

ψ_n(x) = q ψ_{n−1}(x) + ν_n(x)    (44)
where q is a known parameter and the random update νn (x) comes from a sequence of independent, identically distributed random fields with known covariance function k(x1 , x2 ) = E{νn (x1 )νn (x2 )} independent of n. Again, we let the initial (n = 0) covariance for the sequence be E{ψ0 (x1 )ψ0 (x2 )} = λk(x1 , x2 ) for some known parameter λ and assume that ψ0 (x) is uncorrelated with νn (x) for all n.
Also, as in the cokriging filter, our goal is to predict z n (x) given the observed values in the p × 1 vectors y1 , . . . , y N defined by Eq. (3), where the p observation locations are fixed. The resulting model is now fully compatible with the method of space–time kriging, with the cross-covariance determined by Eq. (30) or (31). The predictions can be determined as in Section III.C. However, in this section, we present a computationally efficient method for space–time kriging with this model.
A. Temporal Filter for Space–Time Kriging

For the development of the temporal filtering and smoothing algorithms for space–time kriging, we use Eqs. (32) and (33) as the definitions for K and k(x). We also use the matrix F_n defined by Eq. (9). Then, beginning with the temporal filtering algorithm, the key to formulating ẑ_n(x) as a temporal filter is that we can write it in the form of an update equation as follows.

Theorem VI.1   For z_n(x) given by Eq. (43), the BLUP of z_n(x) given y_1, . . . , y_n can be put into the form

ẑ_n(x) = ẑ_{n|n−1}(x) + a_n^T(x)(y_n − ẑ_{n|n−1})    (45)

where ẑ_{n|n−1}(x) is the BLUP of z_n(x) given observations through time n − 1, ẑ_{n|n−1} is a vector of those predictions at the observation locations, and a_n(x) is a p × 1 vector of weights.

Proof.   The proof exactly parallels the proof of Theorem V.1. Any general linear prediction can be written as

ẑ_n(x) = ẑ_{n|n−1}(x) + a_n^T(x)(y_n − ẑ_{n|n−1}) + Σ_{i=1}^{n−1} b_i^T(x) y_i

To satisfy the unbiasedness constraint E{ẑ_n(x)} = f_n^T(x) m, the coefficient vectors must satisfy

Σ_{i=1}^{n−1} F_i b_i(x) = 0

Next, we observe that we must have

E{[z_n(x) − ẑ_{n|n−1}(x)] Σ_{i=1}^{n−1} b_i^T(x) y_i} = 0
by the projected orthogonality theorem. On the basis of this observation and our assumption of uncorrelated processes, we find that the error variance is
given by

E{[z_n(x) − ẑ_n(x)]²} = E{[z_n(x) − ẑ_{n|n−1}(x) − a_n^T(x)(y_n − ẑ_{n|n−1})]²} + E{[Σ_{i=1}^{n−1} b_i^T(x) y_i]²}

Again, minimizing the error variance can be accomplished by individually minimizing both terms in this equation, and minimizing the second term leads to b_1(x) = ··· = b_{n−1}(x) = 0, which proves the theorem.

The difference between space–time kriging and cokriging that is evident in Theorem VI.1 is that we are now able to predict z_n(x) without bias by using only previous observations. This is a direct result of the assumption of a fixed vector of mean coefficients m. Furthermore, because ẑ_{n|n−1}(x) is unbiased, the innovation y_n − ẑ_{n|n−1} is zero mean. Thus, there is no explicit unbiasedness constraint on a_n(x). Deriving a_n(x) is accomplished by direct minimization of the error variance.
1. Initialization

Before the solution is presented, the issue of initialization must be resolved. The update equation requires an unbiased prediction ẑ_{n|n−1}(x), but we are able to produce an unbiased prediction only for n > d_t + 1 (i.e., for time frames after the temporal trend has become apparent; recall that d_t is the number of time-dependent terms in f_n(x)). This issue is overcome by initializing the filter at time d_t + 2.

To generate the initial prediction ẑ_{n|n−1}(x) for n = d_t + 2, we perform traditional space–time kriging on the observations up to time n − 1. Although the use of traditional space–time kriging defeats the purpose of filtering somewhat, we note that this value of n − 1 is typically much smaller than N, so that significant computational savings is still possible. For example, for a trend that is linear in time, n − 1 = 2 and space–time kriging is performed on only the first two observation vectors. Thereafter, the filter is used.

We therefore find the space–time kriging weights w and m̂ by using Eqs. (18) and (19) with N = n − 1. To generate K, we use the assumed covariance function Eq. (30). We then let

m̂_{n−1} = m̂    (46)

where the subscript n − 1 indicates that m̂_{n−1} is a prediction using data only
up to time n − 1. Next, we define

k_{n|j}(x) = [k_{n1}^T(x) ··· k_{nj}^T(x)]^T

where k_{nj}(x) is defined by Eq. (13). With these definitions, the prediction of z_n(x) given observations up to time n − 1 is

ẑ_{n|n−1}(x) = k_{n|n−1}^T(x) w + f_n^T(x) m̂_{n−1}

For its use in the filtering algorithm, some further modifications are necessary. Specifically, we note that

k_{nj}(x) = q^{n−j} [q^{2j} λ + Σ_{l=1}^{j} q^{2l−2}] k(x)

by Eq. (30), when n > j. We can then write

ẑ_{n|n−1}(x) = q k^T(x) w_{n−1} + f_n^T(x) m̂_{n−1}    (47)

where

w_{n−1} = Σ_{j=1}^{n−1} q^{n−1−j} [q^{2j} λ + Σ_{l=1}^{j} q^{2l−2}] w^j    (48)

and w^j is the p × 1 subvector of w corresponding to time frame j. In other words,

w = [(w^1)^T ··· (w^{n−1})^T]^T

The result is a convenient rearrangement of terms that simplifies the computation of ẑ_{n|n−1}(x). Specifically, k^T(x) w_{n−1} is a product of p × 1 vectors, whereas k_{n|n−1}^T(x) w is a product of [(n − 1)p] × 1 vectors. In addition, we obtain the convenient result that

ẑ_{n−1}(x) = k^T(x) w_{n−1} + f_{n−1}^T(x) m̂_{n−1}

As a final initialization step, we require the error variance in the estimate ẑ_{n|n−1}, which can be written as

E{(z_n − ẑ_{n|n−1})(z_n − ẑ_{n|n−1})^T} = K + F_n^T G_{n−1} F_n + q² H_{n−1} + q C_{n−1}^T F_n + q F_n^T C_{n−1}
where

G_{n−1} = E{(m − m̂_{n−1})(m − m̂_{n−1})^T}
H_{n−1} = E{(ψ_{n−1} − K w_{n−1})(ψ_{n−1} − K w_{n−1})^T}
C_{n−1} = E{(m − m̂_{n−1})(ψ_{n−1} − K w_{n−1})^T}

Evaluating the first expression by using Eq. (19) for m̂_{n−1} yields

G_{n−1} = [F (K + Σ)^{−1} F^T]^{−1}    (49)

where K, Σ, and F are defined by Eqs. (14), (16), and (20) with N = n − 1. To evaluate H_{n−1}, we define K_{i|j} = [K_{i1} ··· K_{ij}] by using Eq. (15) for K_{ij}, and we note that K w_{n−1} = K_{n−1|n−1} w. Using this definition to solve for H_{n−1}, we find

H_{n−1} = [q^{2(n−1)} λ + Σ_{l=1}^{n−1} q^{2l−2}] K − K_{n−1|n−1}(K + Σ)^{−1} K_{n−1|n−1}^T
          + K_{n−1|n−1}(K + Σ)^{−1} F^T G_{n−1} F (K + Σ)^{−1} K_{n−1|n−1}^T    (50)

Finally, a similar derivation yields

C_{n−1} = −G_{n−1} F (K + Σ)^{−1} K_{n−1|n−1}^T    (51)
which completes the initialization.

2. The Filter

With ẑ_{n|n−1}(x) initialized, we need only derive an expression for a_n(x) to fulfill the update equation (45). The derivation leads to the following algorithm.

Algorithm VI.1 (The Space–Time Kriging Filter)
1. Set n = d_t + 2 and initialize m̂_{n−1}, w_{n−1}, G_{n−1}, H_{n−1}, and C_{n−1}, using Eqs. (46), (48), (49), (50), and (51).
2. Calculate the matrices

   L_n = (K + Σ_n + q² H_{n−1} + F_n^T G_{n−1} F_n + q F_n^T C_{n−1} + q C_{n−1}^T F_n)^{−1}
   M_n = (G_{n−1} F_n + q C_{n−1}) L_n
   A_n = (K + q² H_{n−1} + q C_{n−1}^T F_n) L_n
   H_n = (I − A_n)(K + q² H_{n−1}) − q A_n F_n^T C_{n−1}
   G_n = (I − M_n F_n^T) G_{n−1} − q M_n C_{n−1}^T
   C_n = q(I − M_n F_n^T) C_{n−1} − M_n (K + q² H_{n−1})

3. Calculate the coefficient vectors

   m̂_n = m̂_{n−1} + M_n (y_n − q K w_{n−1} − F_n^T m̂_{n−1})
   w_n = q w_{n−1} + K^{−1} A_n (y_n − q K w_{n−1} − F_n^T m̂_{n−1})

4. Calculate the prediction, using

   ẑ_n(x) = k^T(x) w_n + f_n^T(x) m̂_n

5. Increment n and go to 2.

The proof of this algorithm is outlined in the Appendix. In this case, we can see that the algorithm is related to the update equation (45) by noting first that Eq. (47) applies for any value of n, given the weight vectors w_{n−1} and m̂_{n−1} from the algorithm. Second, we find that

a_n(x) = A_n^T K^{−1} k(x) + M_n^T f_n(x)    (52)

which is the same relationship we found for the cokriging filter. However, we have used different definitions of A_n and M_n in this case. Also as in the cokriging filter, we have regrouped terms to generate an algorithm that does not feed forward the infinite number of values represented by ẑ_{n|n−1}(x), as implied by Eq. (45). Instead, only the weight vectors w_{n−1} and m̂_{n−1} are fed forward.

B. Temporal Smoother for Space–Time Kriging

Next we develop an efficient means for incorporating future observations into each prediction to generate ẑ_{n|N}(x) for each n. To formulate the space–time kriging smoother, we note that future observations can be incorporated into the prediction by means of the update equation

ẑ_{n|i}(x) = ẑ_{n|i−1}(x) + a_{n|i}^T(x)(y_i − ẑ_{i|i−1})    (53)
where zˆ n|i (x) is the BLUP of z n (x) given observations through time i (i > n). Thus, the prediction can be updated at time i given only the innovation at time i. The proof of this statement is nearly identical to the proof of Theorem VI.1. The coefficient vector in Eq. (53) is found by minimizing the error variance. In this case, no explicit unbiasedness constraint is required because zˆ n|i−1 (x) is
already unbiased and the innovation y_i − ẑ_{i|i−1} is zero mean. Solving for a_{n|i}(x) for i = n + 1, . . . , N leads to the following algorithm.

Algorithm VI.2 (The Space–Time Kriging Smoother)   For each n:
1. Set i = n + 1 and initialize w_{n|i−1} = w_n, H_{n|i−1} = H_n, and C_{n|i−1} = C_n.
2. Update the coefficient vector

   w_{n|i} = w_{n|i−1} + K^{−1}(q H_{n|i−1} + C_{n|i−1}^T F_i) L_i (y_i − q K w_{i−1} − F_i^T m̂_{i−1})

3. Calculate the matrices

   H_{n|i} = q H_{n|i−1}(I − A_i^T) − C_{n|i−1}^T F_i A_i^T
   C_{n|i} = (I − M_i F_i^T) C_{n|i−1} − q M_i H_{n|i−1}^T

4. Calculate the prediction, using

   ẑ_{n|i}(x) = k^T(x) w_{n|i} + f_n^T(x) m̂_i

5. Increment i and go to 2.

The proof of this algorithm is outlined in the Appendix. This algorithm requires w_n, H_n, and C_n as inputs from the space–time kriging filter from time n, and w_i, m̂_i, L_i, and A_i as inputs from time i. The relationship of the algorithm to the update equation (53) is established by noting that

a_{n|i}(x) = L_i (q H_{n|i−1}^T + F_i^T C_{n|i−1}) K^{−1} k(x) + M_i^T f_n(x)

The term M_i^T f_n(x) arises because m̂_i is incorporated into the prediction from the filter.

In practice, parallel versions of the smoothing algorithm are running for n = 1, . . . , i − 1. The innovation at time i is then used to update w_{1|i−1}, . . . , w_{i−1|i−1}, and an additional space–time kriging smoother is spawned for n = i. When i = N is reached, the complete space–time kriging prediction is available for all time frames. This approach is fundamentally different from that of the cokriging smoother, in which the smoother operates in a single reverse pass through the data. In the case of space–time kriging, the solution cannot be rearranged to produce a single-pass algorithm. Conversely, the cokriging smoother can be formulated as a set of parallel smoothing algorithms similar to those of the space–time kriging smoother.

One final issue regarding this approach is that we initialized the filter algorithm at time d_t + 2 and therefore do not have w_n, H_n, and C_n for n ≤ d_t. Thus, Algorithm VI.2 cannot be used for these values of n. This issue can be overcome by initializing w_{n|i−1}, H_{n|i−1}, and C_{n|i−1} for n ≤ d_t and i = d_t + 2 by using the same traditional space–time kriging solution used to initialize Algorithm VI.1. The smoothing algorithms for n ≤ d_t are then begun all at
once at time d_t + 2. Solving for w_{n|i−1} by using the same procedure as in the filter initialization yields

w_{n|i−1} = Σ_{j=1}^{i−1} q^{|n−j|} [q^{2(n∨j)} λ + Σ_{l=1}^{n∨j} q^{2l−2}] w^j

To generate the initialization of H_{n|i−1} and C_{n|i−1}, we need the definitions of these matrices, which are (see Appendix, Section D)

H_{n|i−1} = E{(ψ_n − K w_{n|i−1})(ψ_{i−1} − K w_{i−1})^T}

and

C_{n|i−1} = E{(m − m̂_{i−1})(ψ_n − K w_{n|i−1})^T}

Evaluating these expressions as in the initialization of the filter yields

H_{n|i−1} = q^{i−1−n} [q^{2n} λ + Σ_{l=1}^{n} q^{2l−2}] K − K_{n|i−1}(K + Σ)^{−1} K_{i−1|i−1}^T
            + K_{n|i−1}(K + Σ)^{−1} F^T G_{i−1} F (K + Σ)^{−1} K_{i−1|i−1}^T

and

C_{n|i−1} = −G_{i−1} F (K + Σ)^{−1} K_{n|i−1}^T

With these definitions, the complete space–time kriging prediction can be calculated for all n.
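For completeness, one recursion of Algorithm VI.1 is sketched below. It assumes the initialization quantities of Eqs. (46) and (48)–(51) have already been produced by a traditional space–time kriging solve, and it simply transcribes steps 2–4 in our own notation; the parallel smoothers of Algorithm VI.2 would consume the per-frame state returned here.

```python
import numpy as np

def st_kriging_filter_step(y_n, F_n, Sigma_n, K, q, state):
    """One pass of steps 2-4 of Algorithm VI.1.

    state = (m_hat, w, G, H, C) from the previous frame; F_n is the d x p trend
    matrix of Eq. (9) for the current frame. Plain inverses are used for clarity.
    """
    m_hat, w, G, H, C = state
    p, d = K.shape[0], F_n.shape[0]
    Ip, Id = np.eye(p), np.eye(d)
    L = np.linalg.inv(K + Sigma_n + q * q * H
                      + F_n.T @ G @ F_n + q * F_n.T @ C + q * C.T @ F_n)
    M = (G @ F_n + q * C) @ L
    A = (K + q * q * H + q * C.T @ F_n) @ L
    H_new = (Ip - A) @ (K + q * q * H) - q * A @ F_n.T @ C
    G_new = (Id - M @ F_n.T) @ G - q * M @ C.T
    C_new = q * (Id - M @ F_n.T) @ C - M @ (K + q * q * H)
    innov = y_n - q * (K @ w) - F_n.T @ m_hat              # step 3 innovation
    m_hat_new = m_hat + M @ innov
    w_new = q * w + np.linalg.solve(K, A @ innov)
    # step 4: z_hat_n(x) = k(x)^T w_new + f_n(x)^T m_hat_new
    return (m_hat_new, w_new, G_new, H_new, C_new)
```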
VII. Applications

A. Groundwater Data

To demonstrate the filtering algorithms, we return to the groundwater example presented in Section III.D. In that example, we had

k_{ij}(x_1, x_2) = −40 ‖x_2 − x_1‖ − 2|i − j|
f_n(x) = [1  x  y  x²  y²  xy  n]^T
Σ_n = 25I

and we simulated data with

m = [100  3  4  −0.1  −0.15  −0.1  1]^T
Unfortunately, this covariance function is not compatible with the kriging update model. However, by fitting our kriging update model to the data, we arrive at the approximate model

k(x_1, x_2) = −0.1 ‖x_2 − x_1‖
λ = 400
q = 0.9987

for which ψ_n(x) is wide-sense stationary. With these values, if we evaluate Eq. (31), we find that the covariance function is correct for i = j but differs for i ≠ j.

Despite the difference in covariance functions, the filtering and smoothing algorithms are able to almost exactly reproduce the space–time kriging and cokriging results from Section III.D. Figure 6 shows the predicted water table elevations for the month 7 data obtained from the space–time kriging and cokriging smoothers. The corresponding predictions obtained by using traditional cokriging and space–time kriging were shown in Figures 3 and 4. The results are almost indistinguishable.

The similarities between the original predictions and those from the filtering and smoothing algorithms are also evident in the RMS error performance. Figure 7 shows the error performance of the temporal cokriging filter and smoother. The error in the smoothed estimate is indistinguishable from that for traditional cokriging shown in Figure 5. The filtered version has slightly worse performance as a result of limiting the predictions to past and present observations. In Figure 8, the RMS errors are shown for the space–time kriging filter and smoother. In this case, the error performance of the smoother is slightly different from the original prediction, but it is not obvious which is better. For the filtered error, a distinct downward trend is visible, which indicates improved performance as more observations are incorporated into the prediction.

The comparable error performance of the filters and smoothers comes with the benefit of a huge reduction in computation time compared with that of the traditional approaches. Table 2 summarizes the number of computations required for each method. These range from 7.4 to 60 times fewer computations than those of the traditional methods summarized in Table 1. The largest improvement is for the cokriging methods. The space–time kriging filter and smoother require more computation than the cokriging filter and smoother because of the need to perform traditional space–time kriging to initialize the filter and the need to run several parallel smoothers. Still, they produce a considerable computational savings over traditional space–time kriging.
Figure 6. Water table elevations for month 7 predicted by (top) the temporal cokriging smoother and (bottom) the space–time kriging smoother.
Figure 7. RMS errors between the true groundwater data and the predicted values from (dashed line) the temporal cokriging filter and (solid line) the temporal cokriging smoother. The error for the smoother is indistinguishable from that for traditional cokriging shown in Figure 5.
Figure 8. RMS errors between the true groundwater data and the predicted values from (dashed line) the space–time kriging filter and (solid line) the space–time kriging smoother. For reference, the error from traditional space–time kriging in Figure 5 is also shown (dotted line).
TABLE 2
Total Number of Floating-Point Operations Required to Interpolate Water Table Data^a by Using Various Kriging Filters and Smoothers

  Method                        Floating-point operations
  Cokriging filter              1.72 × 10^7
  Cokriging smoother            2.65 × 10^7
  Space–time kriging filter     3.57 × 10^7
  Space–time kriging smoother   1.19 × 10^8

^a From 50 samples times 12 months.
B. Cardiac MRI

In this subsection, an actual application of one of the filtering algorithms, the temporal cokriging filter, demonstrates how valuable the computational savings can be. The original motivation for developing the temporal cokriging filter was for the analysis of cardiac magnetic resonance (MR) images in conjunction with a technique called tagging (Zerhouni et al., 1988). Briefly, tagging noninvasively introduces temporary features that can be tracked as the heart contracts, as illustrated in Figure 9. Tags are then used to determine the three-dimensional (3D) motion of the heart wall, which can be used for diagnostic procedures such as stress testing (Saito et al., 2000), assessment of treatment options such as pacemaker lead placement (McVeigh et al., 1998), or heart modeling (Park et al., 1996). However, determination of 3D wall motion from tagged MR images has proven difficult to automate. The cokriging filter has been instrumental in the following automated HeartMark approach to extracting 3D motion measurements of the left ventricle (LV) wall.

The basic methodology we use for tracking 3D motion of the LV wall with HeartMark is called MR markers (Kerwin and Prince, 1998). The MR markers method uses initially planar, parallel tag surfaces, which are applied and sampled at several spatial locations by orthogonal image planes, as in Figure 9. Images are then obtained at a sequence of time frames during contraction, and motion is indicated by the distortion of the dark tag patterns within the images. Typically, there are four to eight image planes spanning the length of the LV wall and 10–12 time frames spanning the duration of heart contraction.

Acquiring such images for three orthogonal sets of tag surfaces allows us to reconstruct 3D motion, as illustrated in Figure 10. The three sets of orthogonal tag surfaces define a regular 3D grid (Fig. 10a). As the grid of tag surfaces deforms with heart motion, the points where three tag surfaces intersect correspond to fixed points in the tissue. Thus, by tracking the grid intersections, we
Figure 9. Magnetic resonance (MR) tagging: (a) Parallel planes of tissue are tagged (left), where the planes are orthogonal to the stack of images. This tagging produces dark parallel lines in each image (right). (b) When the heart contracts, the tag surfaces are deformed, as evidenced by the deformation of the pattern in each image.
obtain 3D motion for these points, which are called the MR markers (Fig. 10b). The key to tracking the points is to reconstruct the full grid of deforming tag surfaces at each time frame given the image data, a problem in space–time interpolation.

1. Tag Surface Model

Using the cokriging filter to interpolate the deforming tag surfaces requires us to characterize each surface as a sequence of functions. We use the coordinate
Figure 10. MR markers (Kerwin and Prince, 1998): (a) Three sets of orthogonal tag surfaces are reconstructed for all time frames. (b) If “markers” are placed at the resulting grid intersections that lie within the left ventricle (LV) wall, those markers track wall motion in 3D (shown in a view down the center of the LV cavity).
systems shown in Figure 11, in which the z axis is oriented perpendicular to the initial tag planes, and the x and y axes are parallel to the tag planes, which completes a standard 3D coordinate system. If we consider just one tag surface, the function z_n(x, y) defines the tag surface configuration at time frame n. The initial tag surface is

z_0(x, y) = z_0

a known constant defining a plane.
Figure 11. Orientation of image planes and tag surfaces. Additional tag surfaces correspond to each tag line in the images.
As the heart contracts, the deformation of a tag surface in the LV wall has been shown to be dominated by a quadratic trend (Kerwin and Prince, 1999b). Therefore, we use the model

z_n(x, y) = ψ_n(x, y) + f^T(x, y) m_n

where

f(x, y) = [1  x  y  x²  y²  xy]^T

The fine-scale variation ψ_n(x, y) is assumed to obey the update model

ψ_n(x, y) = ψ_{n−1}(x, y) + ν_n(x, y)

where ν_n(x, y) are independent, zero-mean random update functions. Their covariance E{ν_n(x_i, y_i) ν_n(x_j, y_j)} is given by the function

k(r_{ij}) = r_{ij}² log r_{ij}

where

r_{ij} = √((x_i − x_j)² + (y_i − y_j)²)

Several aspects of this model are notable. First, k(r_{ij}) is a generalized covariance function chosen because it is the kernel of the thin-plate spline, which means the interpolated surface will have minimum bending energy properties (Kent and Mardia, 1994; Meginguet, 1984). Second, for this model, q = 1,
which reflects the fact that the surface is not statistically stationary but becomes increasingly distorted with time. Finally, for this model, λ = 0 because the initial planar surface is known exactly.

2. Observation Model

Given the chosen coordinate system, the set of images can be broken down into individual strips of pixels that are parallel to the z axis. Each strip has a unique position (x_i, y_i) and encounters a brightness minimum where it intersects this tag surface. The z coordinate of this minimum is an observation of z_n(x_i, y_i). Figure 12a shows the set of all observations for one tag surface. Note that only the observations that lie within the LV wall are kept. These observations are placed into the observation vector y_n.

The observation vector also includes noise that results from uncertainty in the position of the brightness minimum. The nature of this noise depends on the method used to identify the brightness minimum, the physics of MR tagging, and the noise in the image (Atalar and McVeigh, 1994). The method we use to identify the brightness minimum is to first find the discrete pixel location l_0 that is closest to the true tag position. This pixel is a local brightness minimum. Then, the preceding two and subsequent two pixels in the strip are used in the formula

z_observed(x_i, y_i) = l_0 + [Σ_{l=−2}^{2} l (I_0 − I(l + l_0))] / [Σ_{l=−2}^{2} (I_0 − I(l + l_0))]    (54)

where I_0 is the untagged brightness of the LV wall and I(l) is the brightness of the lth pixel in the strip. This equation identifies asymmetry in the brightness profile around l_0 and appropriately shifts the observed position to subpixel precision.

The error variance in the observed position at time n is then

σ_n² = (1.78 σ_I² / I_0²) e^{t_n/250}

where σ_I² is the noise variance within the image and t_n is the time in milliseconds after tag application (Kerwin, 1999; Kerwin and Prince, 1999b). The error variance is thus inversely proportional to the image signal-to-noise ratio I_0²/σ_I². The exponential term e^{t_n/250} arises because the tag patterns are only temporary and fade exponentially. Finally, the error in each observation is independent, which results in the error covariance matrix

Σ_n = σ_n² I

where I is the identity matrix with dimensions equal to the number of observations.
180
KERWIN AND PRINCE
Figure 12. Sampling and reconstructing a tag surface: (a) A three-demensional (3D) depiction of all observations corresponding to one tag surface. The observation locations x1 , . . . , x p are denoted by × marks, and the corresponding z values, denoted by ∗ marks, are placed into the vector yn . Each row of observations comes from a different image plane. (b) The complete surface is reconstructed by the cokriging filter.
3. Cokriging The models of tag surface deformation and observations permit us to use the cokriging filter to interpolate the tag surface at each time frame. Specifically, we have defined λ, q, f(x, y), k(r ), and n . Thus, as each new set of observations becomes available, the new tag surface configuration is predicted by the filter. Figure 12b shows a typical result. One issue that arises is that as the heart contracts, the sets of pixel strips that intersect the LV wall invariably change. Although many strips are common to
KRIGING FILTERS FOR SPACE–TIME INTERPOLATION
181
all sets, some pixel strips are lost at each time frame and others begin to intersect. Therefore, the observation locations are not fixed over time. Fortunately, the set of all possible locations is finite and the changing observation locations can be handled by assigning infinite variance to the missing observations. The effect on the algorithm is minor and is presented in Kerwin and Prince (1999b). 4. Tracking Method To appreciate the benefit of the cokriging filter for identifying 3D heart wall motion, we examine the complete HeartMark tracking method. First, in the only step requiring human interaction, the original grid intersections are displayed on the first set of images in the sequence. Second, the user selects the subset of intersections that lie within the LV wall by using a series of mouse clicks. These are the MR markers that will be tracked by the HeartMark algorithm. Once the MR markers have been selected, they are tracked over time by using a recursive procedure. At each time in the sequence, the tag surfaces and MR markers reconstructed at the previous time are available. These previous surfaces and markers are used to extract observations from the next images in the sequence. The observations are then used to reconstruct the new grid of tag surfaces, from which the new MR marker positions are identified. The basic steps of the HeartMark algorithm are illustrated in Figure 13. First, for each of the three tag orientations, the MR markers from the previous time (Figure 13a) are used to produce initial guesses of the tag positions in the images. These guesses, called prototags, are generated by evaluating the previous tag surface predictions zˆ n−1 (x) for every strip of pixels within one tag separation (nominally 5 pixels) of an MR marker. Figure 13b illustrates the initial prototags for one image and one tag orientation. The prototags are then refined to better align with the dark tag features in the images. Refinement is accomplished by adjusting the estimated trend ˆ n in each tag surface estimate. The optimal coefficients are those coefficients m that minimize the sum of the image intensity beneath the prototags. Finding this minimum is accomplished by using gradient descent. The refined prototags from Figure 13b are displayed in Figure 13c. From the refined prototags, observations of the tag position are made for each pixel strip by using Eq. (54). The central pixel l0 is assumed to be that closest to the prototag. The full set of observations that results is shown in Figure 13d. Unfortunately, our method for identifying pixel strips that intersect the LV wall is imperfect and many of these observations lie outside the wall. Also, other observations have been perturbed by noise and are invalid. To eliminate these invalid observations, we take the 5 pixels used to generate the observation and
182
KERWIN AND PRINCE
Figure 13. The HeartMark algorithm: (a) Previous marker positions; (b) initial prototags; (c) refined prototags; (d) initial observations; (e) retained observations; and (f) deformed tag grid reconstructed from all images and orientations.
minimize 2 6
l=−2
2 72 I (l + l0 ) − a − be−(l+l0 −zobserved (x,y))
(55)
over a and b. Essentially, this fits a Gaussian profile that closely approximates actual tag profiles to the observed pixel intensities. If the value of Eq. (55) exceeds a user-defined threshold, then the observation is not “taglike” and is discarded. In addition, the value of b defines the depth of the tag. If it is smaller than some user-defined threshold, then the observation is also
KRIGING FILTERS FOR SPACE–TIME INTERPOLATION
183
discarded. Finally, any observation that is farther than 1 pixel from the prototag is discarded. The final observations shown in Figure 13e correspond well with the tag features visible in the image. In addition to the observations shown in Figure 13e, observations come from each image and each tag surface orientation. For each tag surface, an independent cokriging filter is used to reconstruct it from its corresponding observations until the deformed grid is complete (Fig. 13f). The intersections of the tag surfaces corresponding to MR markers are then determined by an alternating projections algorithm (Kerwin and Prince, 1998). These new marker positions are fed back to the HeartMark algorithm to compute the marker positions at the next time frame. 5. Results and Discussion of Cokriging Filter for Cardiac MRI The HeartMark algorithm was programmed in Matlab (The MathWorks, Natick, MA) and was run on a 300-MHz PC. The MR markers shown in Figure 10b are the first and last time frames for a typical data set from a normal subject, which consisted of 10 time frames and a 10 × 10 × 6 grid of tag surfaces. There were a total of 146 MR markers being tracked within the LV wall. These positions were obtained from original MR images within 20 min, including all user interaction. The number of floating-point operations required was 1.5 × 1010 . For comparison, the original MR markers method required the tag observations to be extracted from each image before reconstruction of the tag surfaces (Kerwin and Prince, 1998). We used a tag recognition program that required prior identification of the wall boundaries and significant user interaction to correct errors (Guttman et al., 1994). This method thus required several hours of user participation to identify the boundaries and then correct tracking errors before the MR markers could be extracted. Conversely, in the HeartMark method, the markers are computed within minutes. Several aspects of the HeartMark method are critical to its success. Foremost, the method does not require the boundaries of the LV wall to be known in order to track 3D motion, which sets HeartMark apart from most other techniques for processing tagged MR images (Denney and McVeigh, 1997; O’Dell et al., 1995); Park et al., 1996). However the fact that the boundaries are unknown leads to some erroneous observations. Furthermore, in an effort to reject these errors, many other observations are rejected, which leads to missing data. The use of the cokriging filter addresses both issues by effectively smoothing over erroneous observations and filling in missing observations by using both spatial and temporal information. Thus, the cokriging filter is critical for the robustness of the HeartMark method. In addition, the recursive cokriging filter naturally fits into the recursive technique used to track MR markers, in which the previous positions are used
184
KERWIN AND PRINCE
Figure 14. Comparison of the number of floating–point operations needed by (solid line) the cokriging filter and (dashed line) traditional cokriging to predict each tag surface, as a function of the number of time frames available.
to identify the new positions. Traditional cokriging could be used to reconstruct the tag surfaces individually on the basis of all available observations at each time. However, the use of traditional cokriging would take considerably longer. Figure 14 shows the number of floating-point operations per time frame that are required to evaluate the traditional cokriging equations for just one tag surface. The number of floating-point operations increases exponentially with time, as the number of observations grows. By the seventh time frame, 2.1 × 1010 operations are required just to evaluate the cokriging equations (8.3 × 109 operations per surface × 26 surfaces). This exceeds the total computation required by the HeartMark algorithm for all 10 time frames. In fact, the number of floating-point operations required by the cokriging filter is approximately 6 × 107 per tag surface, independent of the time frame. If not for the cokriging filter, the HeartMark algorithm would require hours to run as opposed to minutes. VIII. Discussion and Conclusion In summary, assumptions regarding the temporal structure of the covariance function were shown to greatly reduce the computational burden in both space– time kriging and cokriging. This is similar to the goal of Long and Myers (1997), who proposed breaking the matrix inversion in Eq. (11) into a set of smaller matrix inverses. In contrast, our approach was to assume that the
KRIGING FILTERS FOR SPACE–TIME INTERPOLATION
185
covariance resulted from an update model similar to that used in Kalman filtering. Such update models have proven extremely useful in the analysis of widely varying phenomena, in fields from economics to space travel. Furthermore, update models lend themselves to natural systems in which a random input, such as rainfall, is present. Given the importance of update models, a number of methods that combine kriging methods and Kalman filtering have been proposed (Berke, 1998; Huang and Cressie, 1996; Kerwin and Prince, 1999a). Most have relied on modeling the trend coefficients in mn as arising from a random update process themselves, of the form mn = mn−1 + µn Although appropriate for many applications, this assumption loses the principle assumption of kriging that the trend coefficients are deterministic but unknown. We have preserved this assumption and produced filtering algorithms that are completely equivalent to the original space–time kriging and cokriging formulations. This article is the first to present a complete set of algorithms for both space–time kriging and cokriging realized as filters and smoothers. All predictors were formulated to conform to the equation ˆn zˆ n (x) = kT (x)wn + fTn (x)m ˆ n are computed by the algorithms. Whether where the weight vectors wn and m to use filters based on cokriging or space–time kriging to compute the weights depends on the application. However, the importance of computation time in this decision is greatly reduced by the filtering formulations. An additional level of flexibility is introduced by the ability to choose between filtering and smoothing formulations. In evaluating groundwater data, the ability to incorporate all observations past and future into each prediction was important for best interpolation results. Conversely, for tracking LV motion, the ability to base each prediction on only past and present observations was required because future observations cannot be extracted before reconstruction of the past tag surfaces. Although these algorithms provide a comprehensive set of prediction equations, they are limited to the assumptions of the kriging update model. A number of other related algorithms could be developed by modifying these assumptions. For example, we could assume that the observation equation is yn = Bn zn + η n where Bn is a known matrix. Also, we could allow the observation locations to change at every time frame. The basic prediction update equations (35) and
186
KERWIN AND PRINCE
(45) do not change, but the resulting algorithms must change to reflect the changed assumptions. Another possible modification is to assume that there are multiple related functions z n1 (x), . . . z ln (x) varying together in time, which are arranged in the x-dependent vector zn (x) = ψ n (x) + FnT (x)mn For this model, we might assume a vector update equation of the form ψ n (x) = Qψ n−1 (x) + ν n (x) where Q is now an l × l matrix rather than a scalar. In particular, the multifunction model could be used to generate a more sophisticated temporal covariance in a single function. Suppose that the current value of ψn (x) depends on several past values as in ψn (x) = νn (x) +
L
ql ψn−l (x)
l=1
For example, when q1 = 2 and q2 = −1, we obtain a system with inertia. Such systems could equivalently be framed as a vector function and vector update equation if we let zn (x) = [z n−L (x) · · · z n (x)]T
ψ n (x) = [ψn−L (x) · · · ψn (x)]T ν n (x) = [0 · · · 0 νn (x)]T
and ⎡
0 1 0 ⎢0 0 1 ⎢ .. ⎢ Q=⎢ . ⎢ ⎣0 0 0 q L q L−1 q L−2
⎤ 0 0 0 0⎥ .. ⎥ ⎥ .⎥ ⎥ ··· 0 1⎦ · · · q1 1 ··· ··· .. .
Thus, solving the vector prediction problem also opens up a number of options for more sophisticated temporal covariances. A final option for extending the filtering algorithms is to consider space– time kriging between observation times. One advantage of traditional space– time kriging that is lost by the filter and smoother presented in this article is the ability to predict between observation times. Temporal interpolation can be accomplished within a filtering environment because the prediction update
KRIGING FILTERS FOR SPACE–TIME INTERPOLATION
187
equation zˆ n (x, t) = zˆ n−1 (x, t) + an (x) [yn − zˆ n−1 (x, tn )] holds for any time, where zˆ n (x, t) is the prediction of z(x, t) at some arbitrary time t given observations 1 through n. To solve for an (x) and generate an algorithm, we must specify the complete temporal covariance of ψ(x). The various extensions were not considered in this article, in part because the space–time kriging and cokriging algorithms are complicated even when only the basic assumptions are used. Allowing more generalized assumptions leads to even more complicated notation. Nevertheless, many of these extensions can be undertaken by using the same techniques discussed in this article. Developing the extended algorithms is left to future work.
Appendix: Optimality of Filtering Algorithms A rigorous derivation of the cokriging filter algorithm was previously presented in the literature (Kerwin and Prince, 1999a). The remaining algorithms—the cokriging smoother, the space–time kriging filter, and the space–time kriging smoother—can be derived in a similar fashion. We therefore presented these algorithms without proof in the preceding text. For the interested reader, we provide the following outlines of proofs, which show that each algorithm satisfies the projected orthogonality theorem for its model assumptions. Therefore, all algorithms are optimal. Many of the steps in these outlines involve considerable algebra and application of the model assumptions, but they are otherwise straightforward.
Proof of Algorithm V.1 The key to proving the optimality of Algorithm V.1 is to first recognize the following: E{wn } = 0
E{(ψ n − K wn )(ψ n − K wn )T } = Hn
E{(ψ n − K wn )(ψn (x) − kT (x)wn )} = Hn K −1 k(x) Each of these can be shown through induction, by assuming that they are true for n − 1 and showing that they must therefore be true for n. Note that the initialization of Algorithm V.1 guarantees that these equations hold for n = 0.
188
KERWIN AND PRINCE
Induction can then be used to show that ! E (ψn (x) − kT (x)wn ) biT yi = 0
(56)
for all i ≤ n and any bi satisfying Fbi = 0. This is straightforward for i ≤ n − 1. For i = n, we can equivalently show that E{(ψn (x) − kT (x)wn )(yn − q K wn−1 )T }bn = 0 where addition of the term q K wn−1 is permissible because Eq. (56) holds for i ≤ n − 1. In this form, the definitions of wn , An , L n , Hn−1 , and Mn lead directly to the desired result. This next leads to the conclusion that ! E (z n (x) − zˆ n (x)) biT yi = 0 for all i ≤ n and any bi satisfying Fbi = 0. This is shown by writing ˆ n) z n (x) − zˆ n (x) = [ψn (x) − kT (x)wn ] + fT (x)(mn − m The term in brackets is uncorrelated with biT yi by Eq. (56). Similarly, evaluating ! ˆ n ) biT yi E (mn − m
for i ≤ n shows that the second term is also uncorrelated with biT yi . Thus, Algorithm V.1 satisfies projected orthogonality. Finally, we must also show that Algorithm V.1 is unbiased so that E{ˆz n (x)} = fT (x)mn Using the fact that E{wn } = 0 leads to ˆ n} E{ˆz n (x)} = fT (x)E{m and ˆ n } = mn E{m so the algorithm is unbiased. Because it is unbiased and satisfies projected orthogonality, it must be the optimal cokriging predictor.
Proof of Algorithm V.2 To demonstrate the optimality of Algorithm V.2, we first note that if E{vn+1 } = 0, then E{vn } = 0
KRIGING FILTERS FOR SPACE–TIME INTERPOLATION
189
Because we start with v N = 0, we see that vn must be zero mean for all n < N . The algorithm thus adds a zero mean component to an unbiased prediction, which preserves unbiasedness. To prove that Algorithm V.2 satisfies projected orthogonality, we define ˆ n|i = m ˆ n|i−1 + q Rn|i−1 L i (I − F T Mi )(yi − K wi−1 ) m
wn|i = wn|i−1 + q K −1 Hn|i−1 L i (I − F T Mi )(yi − K wi−1 )
where Hn|n = Hn and we recursively define
Rn|n = Mn n ATn − Hn
Hn|i = q Hn|i−1 I − AiT Rn|i = q Rn|i−1 I − AiT
for i > n. We then show that
zˆ n|i (x) = kT (x)wn|i + fT (x)m ˆ n|i
(57)
is the BLUP of z n (x) given y1 , . . . , yi . This also proves the optimality of Algorithm V.2 because the preceding definitions lead to the same solution as Algorithm V.2 when i = N . The optimality of Eq. (57) can be proven by first showing by induction that Hn|i = E{(ψ n − K wn|i )(ψ i − K wi )T } ˆ n|i )(ψ i − K wi )T } Rn|i = E{(mn − m
and then that T K −1 k(x) E{(ψ i − K wi )(ψn (x) − kT (x)wn|i )} = Hn|i
Induction can then be used to show that ! T E z n (x) − z n|i (x) bTj y j = 0
for all j ≤ i and any b j satisfying Fb j = 0, in a manner similar to proving Eq. (56). Thus, projected orthogonality is proven and Algorithm V.2 must be optimal.
190
KERWIN AND PRINCE
Proof of Algorithm VI.1 The proof of Algorithm VI.1 follows a path similar to the proof of Algorithm V.1. First, induction leads to E{wn } = 0
ˆ n} = m E{m This immediately establishes that zˆ n (x) is unbiased because its expectation is fTn (x)m. Next, the following definitions hold Hn = E{(ψ n − K wn )(ψ n − K wn )T }
ˆ n )(m − m ˆ n )T } G n = E{(m − m
ˆ n )(ψ n − K wn )T } Cn = E{(m − m
which can be shown by assuming that they are true for n − 1 and showing that they must therefore be true for n. A similar induction argument establishes that E{(ψ n − K wn )(ψn (x) − kT (x)wn )} = Hn K −1 k(x) ˆ n )(ψn (x) − kT (x)wn )} = Cn K −1 k(x) E{(m − m
That these equations hold for n = dt + 1 (i.e., when the space–time kriging filter is started) was established in the initialization step. Finally, these facts can be used to show that
E (z n (x) − zˆ n (x))
n
biT yi
i=1
'
=0
(58)
for any b1 , . . . , bn satisfying E
n i=1
biT yi
'
=0
This establishes projected orthogonality and, therefore, the optimality of the algorithm. To show that Eq. (58) holds, we must first consider the special case in which bn = 0. Then, if we assume that Eq. (58) held for n − 1, it must also
KRIGING FILTERS FOR SPACE–TIME INTERPOLATION
191
hold for this special case. Next, we write n i=1
T ˆ n−1 bn biT yi = yn − q K wn−1 − FnT m +
n−1 i=1
T ˆ n−1 bn biT yi + q K wn−1 + FnT m
and observe that both terms must be zero mean. The second term is effectively the special case because it does not depend on yn and is therefore uncorrelated with the prediction error. Thus, to prove that projected orthogonality holds, we need only to show that the first term is uncorrelated with the prediction error. Evaluating T ! ˆ n−1 bn E (z n (x) − zˆ n (x)) yn − q K wn−1 − FnT m establishes that it is zero.
Proof of Algorithm VI.2 To prove Algorithm VI.2, we first recognize that E{wn|i } = 0 ˆ i } = m; therefore, zˆ n|i (x) is unbiased. Second, As previously established, E{m we note that Hn|i = E{(ψ n − K wn|i )(ψ i − K wi )T } ˆ i )(ψ n − K wn|i )T } Cn|i = E{(m − m
which can be shown by assuming that they are true for i − 1 and showing that they must therefore be true for i. A similar argument establishes that T K −1 k(x) E{(ψ i − K wi )(ψn (x) − kT (x)wn|i )} = Hn|i
ˆ i )(ψn (x) − kT (x)wn|i )} = Cn|i K −1 k(x) E{(m − m That these equations hold for i = n is guaranteed by the space–time kriging filter (note that zˆ n|n (x) = zˆ n (x)). Finally, these facts can be used to show that ' i T E (z n (x) − zˆ n|i (x)) =0 bj yj j=1
192
KERWIN AND PRINCE
for any b1 , . . . , bi satisfying E
i j=1
bTj y j
'
=0
which establishes projected orthogonality and, therefore, the optimality of the algorithm. Using the same strategy as in the proof of Algorithm VI.1, we can demonstrate the validity of this statement by showing that T ! ˆ i−1 bi = 0 E (z n (x) − zˆ n|i (x)) yi − q K wi−1 − FnT m References Atalar, E., and McVeigh, E. (1994). Optimization of tag thickness for measuring position with magnetic resonance imaging. IEEE Trans. Med. Imaging 13, 152–160. Berke, O. (1998). On spatiotemporal prediction for on-line monitoring data. Commun. Stat. Theory Methods 27, 2343–2369. Bilonick, R. (1985). The space–time distribution of sulfate distribution in the northeastern United States. Atmos. Environ. 19, 1829–1845. Bogaert, P. (1996). Comparison of kriging techniques in a space–time context. Math. Geol. 28, 73–86. Christensen, R. (1990). The equivalence of predictions from universal kriging and intrinsic random-function kriging. Math. Geol. 22, 655–664. Christensen, R. (1991). Linear Models for Multivariate, Time Series, and Spatial Data. Berlin: Springer-Verlag. Cressie, N. (1990). The origins of kriging. Math. Geol. 22, 239–252. Denney, T. S., and McVeigh, E. R. (1997). Model-free reconstruction of three-dimensional myocardial strain from planar tagged MR images. J. Magn. Reson. Imaging 7, 799–810. Goldberger, A. (1962). Best linear unbiased prediction in the generalized linear regression model. J. Am. Stat. Assoc. 57, 369–375. Guttman, M., Prince, J., and McVeigh, E. (1994). Tag and contour detection in tagged MR images of the left ventricle. IEEE Trans. Med. Imaging 13, 74–88. Henderson, C. (1950). Estimation of genetic parameters. Ann. Math. Stat. 21, 309–310. Huang, H., and Cressie, N. (1996). Spatio-temporal prediction of snow water equivalent using the Kalman filter. Comp. Stat. Data Anal. 22, 159–175. Journel, A., and Huijbrechts, C. (1978). Mining Geostatistics. London: Academic Press. Kailath, T. (1968). An innovations approach to least-squares estimation part I: Linear filtering in additive white noise. IEEE Trans. Automat. Control AC-13, 646–655. Kalman, R., and Bucy, R. (1961). New results in linear filtering and prediction theory. J. Basic Eng. 83, 95–108. Kent, J., and Mardia, K. (1994). The link between kriging and thin-plate splines, in Probability, Statistics and Optimisation, edited by F. Kelly. New York: Wiley, pp. 325–339. Kerwin, W. (1999). Space–time estimation of left ventricular motion from tagged magnetic resonance images. Ph.D. thesis, Johns Hopkins University, Baltimore, MD. Kerwin, W., and Prince, J. (1998). Cardiac material markers from tagged MR images. Med. Image Anal. 2, 339–353.
KRIGING FILTERS FOR SPACE–TIME INTERPOLATION
193
Kerwin, W., and Prince, J. (1999a). The kriging update model and recursive space–time function estimation. IEEE Trans. Signal Processing 47, 2942–2952. Kerwin, W., and Prince, J. (1999b). Tracking MR tag surfaces using a spatiotemporal filter and interpolator. Int. J. Imaging Syst. Technol. 10, 128–142. Lindquist, A. (1968). An optimal stochastic control with smoothed information. Inf. Sci. 1, 55–85. Long, A., and Myers, D. (1997). A new form of the cokriging equations. Math. Geol. 29, 685–703. Malley, J. (1986). Optimal Unbiased Estimation of Variance Components. Vol. 39, Lecture Notes in Statisitics. Berlin: Springer-Verlag. Matheron, G. (1969). Le krigeage universal. Technical Report Fascicule 1, Cahiers du Centre de Morphologie Math´ematique, Fontainebleau, France. Matheron, G. (1973). The intrinsic random functions and their applications. Adv. Appl. Probl. 5, 439–468. McVeigh, E., Prinzen, F., Wyman, B., Tsitlik, J., Halperin, H., and Hunter, W. (1998). Imaging asynchronous mechanical activation of the paced heart with tagged MRI. Magn. Reson. Med. 39, 507–513. Meinguet, J. (1984). Surface spline interpolation: Basic theory and computational aspects, in Approximation Theory and Spline Functions, edited by S. Singh. Dordrecht: Reidel, pp. 227– 142 Myers, D. (1982). Matrix formulation of co-kriging. Math. Geol. 14, 249–257. O’Dell, W., Moore, C., Hunter, W., Zerhouni, E., and McVeigh, E. (1995). Three-dimensional myocardial deformations: Calculation with field fitting to tagged MR images. Radiology 195, 829–835. Papritz, A., and Fluhler, H. (1994). Temporal change of spatially autocorrelated soil properties: Optimal estimation by cokriging. Geoderma 62, 29–43. Park, J., Metaxas, D., and Axel, L. (1996). Analysis of left ventricular wall motion based on volumetric deformable models and MRI-SPAMM. Med. Image Anal. 1, 53–71. Robinson, G. (1991). That BLUP is a good thing: The estimation of random effects. Stat. Sci. 6, 15–32. Rouhani, S., and Hall, T. (1989). Space–time kriging of groundwater data. Geostatistics 2, 639– 650. Rouhani, S., and Myers, D. (1990). Problems in space–time kriging of geohydrological data. Math. Geol. 22, 611–623. Saito, I., Watanabe, S., and Masuda, Y. (2000). Detection of viable myocardium by dobutamine stress tagging magnetic resonance imaging with three-dimensional analysis by automatic trace method. Jpn. Circ. J. 64, 487–494. Wackernagel, H. (1994). Cokriging versus kriging in regionalized multivariate data analysis. Geoderma 62, 83–92. Zerhouni, E., Parish, D., Rogers, W., Yang, A., and Shapiro, E. (1988). Human heart: Tagging with MR imaging—A method for noninvasive assessment of myocardial motion. Radiology 169, 59–63.
This Page Intentionally Left Blank
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 124
Constructions of Orthogonal and Biorthogonal Scaling Functions and Multiwavelets Using Fractal Interpolation Surfaces BRUCE KESSLER Department of Mathematics, Western Kentucky University, Bowling Green, Kentucky 42101
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . A. Notation and Definitions . . . . . . . . . . . . . . . . . . . . B. Fractal Interpolation Surfaces . . . . . . . . . . . . . . . . . . C. Main Results . . . . . . . . . . . . . . . . . . . . . . . . II. Scaling Function Constructions . . . . . . . . . . . . . . . . . . A. Biorthogonal Construction . . . . . . . . . . . . . . . . . . . B. Orthogonal Construction . . . . . . . . . . . . . . . . . . . . III. Associated Multiwavelets . . . . . . . . . . . . . . . . . . . . . IV. Wavelet Constructions . . . . . . . . . . . . . . . . . . . . . . A. Wavelets for the Biorthogonal Construction . . . . . . . . . . . . ˜f . . . . . . . . . . . . . . . . . . . 1. Wavelets in W f and W ˜g . . . . . . . . . . . . . . . . . . . 2. Wavelets in Wg and W ˜h . . . . . . . . . . . . . . . . . . . 3. Wavelets in Wh and W B. Wavelets for the Orthogonal Construction . . . . . . . . . . . . . 1. Wavelets in W f . . . . . . . . . . . . . . . . . . . . . . 2. Wavelets in Wg . . . . . . . . . . . . . . . . . . . . . . 3. Wavelets in Wh . . . . . . . . . . . . . . . . . . . . . . V. Applications to Digitized Images . . . . . . . . . . . . . . . . . . A. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . B. Image Compression . . . . . . . . . . . . . . . . . . . . . . C. Denoising . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Coefficients for the Biorthogonal Scaling Function Construction in Section II.A . . . . . . . . . . . . . . . . . . . . . . . . B. Coefficients for the Orthogonal Scaling Function Construction in Section II.B . . . . . . . . . . . . . . . . . . . . . . . . C. Coefficients for the Biorthogonal Wavelet Construction in Section IV.A D. Coefficients for the Orthogonal Wavelet Construction in Section IV.B . References . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . .
195 197 199 203 204 204 208 209 218 218 218 219 222 223 223 223 225 226 227 229 230 232
. . .
232
. . . .
233 234 245 250
. . . .
. . . .
I. Introduction Fourier analysis, the decomposition of a signal into the different-frequency sine and cosine waves necessary to build the signal, has been a standard tool in signal processing. This approach is particularly useful when analog sound 195 Copyright 2002, Elsevier Science (USA). All rights reserved. ISSN 1076-5670/02 $35.00
196
BRUCE KESSLER
signals are being analyzed. Sounds of a particular frequency can be identified in the signal and then adjusted or even removed from the signal. However, when digital images are being analyzed, standard Fourier analysis has some distinct weaknesses: r Digitized images frequently have a number of sharp edges, whereas sound signals are typically smooth and wavy. Rapid changes in the data are reflected in a greater range of frequencies detected in the Fourier analysis of the signal and a larger number of nonzero Fourier coefficients. r Because the sine and cosine waves used in Fourier analysis have global support, changing or omitting a Fourier coefficient will cause a change in the entire image. Also, although Fourier analysis can detect the presence and size of sharp changes in the image, it cannot identify where they are located. The introduction of wavelet theory has helped to address these weaknesses. In a wavelet analysis, the sine and cosine waves of Fourier analysis are replaced with a set of compactly supported functions whose translates and dilates form a complete orthonormal system. Frequencies are determined by applying the bases at different resolutions. With bases of compact support, a nonzero basis coefficient gives an indication of both the presence and the size of a sharp change in the signal, as well as an idea of where the change took place. In addition, the basis being used can be chosen to best suit the type of signal being analyzed and the particular goals of the analysis. A great introduction to wavelets, with a comparison and contrast of Fourier analysis and wavelet analysis, can be found in Hubbard (1998). The majority of work on wavelets has involved the use of a single analysis function defined over a one-dimensional domain. (The most notable of these is Daubechies’ D4 scaling function. See Daubechies, 1992, for complete details. For other constructions, see Donovan et al., 1996c, Hardin et. al., 1992, and Strang and Strela, 1994.) By using tensor products, researchers can easily adapt bases of this type to image data defined over two-dimensional domains. Useful functions φ1 (x) and φ2 (x) can be used to construct a useful function φ(x, y) by defining φ(x, y) = φ1 (x)φ2 (y). Such bases are said to be separable. Many researchers have replaced the single scaling function with a set of functions, which allows greater freedom in the basis design. (A notable example of such a construction is the GHM (Geronimo–Hardin–Massopust) scaling vector. See Geronimo et al., 1994, for complete details.) Also, the condition that the bases be orthogonal has been relaxed. For instance, Hardin and Marasovich (1999) built biorthogonal counterparts to the GHM scaling functions. Likewise, a separable biorthogonal basis is being used by the U.S. government’s Federal Bureau of Investigation to compress images of fingerprints and is a part of the new Joint Photographic Experts Group (JPEG) standard. See Daubechies (1992) for a discussion of the role of orthogonality with a single scaling function.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
197
This article outlines the work of Donovan, Geronimo, Hardin, and the author in constructing nonseparable (i.e., not separable) orthogonal and biorthogonal scaling vectors by using well-developed theory in fractal interpolation surfaces. (For other approaches in constructing nonseparable bases, see Belogay and Wang, 1999, and Donovan et. al., 2000.) Separable bases are easy to apply (as long as the data are rectangular) but favor horizontal and vertical changes in the data, whereas nonseparable bases may not. Also, the bases constructed in this article can be adapted to arbitrary triangulations (the subject of an upcoming paper by Hardin and the author), which may be better suited to some data sets and applications. The author is hopeful that research in this area will lead to even more useful bases for the analysis of digitized images.
A. Notation and Definitions Let ǫ1 and ǫ2 be linearly independent vectors in R2 and let us define ǫ0 := (0, 0). Let T be the three-directional mesh with directions ǫ1 , ǫ2 , and ǫ2 − ǫ1 . Let us define △0 ∈ T as the triangular region with vertices ǫ0 , ǫ1 , and ǫ2 , and ▽0 ∈ T as the triangular region with vertices ǫ1 , ǫ2 , and ǫ1 + ǫ2 . Let us also define the translation function ti, j (x) := x − iǫ1 − jǫ2 and the dilation function di, j (x) := N x − iǫ1 − jǫ2 for some fixed integer dilation N > 1. Furthermore, let us define the affine reflection function r : ▽0 → △0 which maps the vertices ǫ1 , ǫ2 , and ǫ1 + ǫ2 to vertices ǫ2 , ǫ0 , and ǫ1 , respectively. The notation fˇ := f ◦ r is used for any f supported in △0 . Definition I.1 A multiresolution analysis (MRA) of L 2 (R2 ) of multiplicity r is a set of closed linear subspaces such that 1. A · · · ⊂ V−2 ⊂ V−1 ⊂ V0 ⊂ V1 ⊂ V2 ⊂ · · · V j = {0} 2. 5 j∈Z 3. V j = L 2 (R2 ) j∈Z
4. f ∈ V j ⇔ f (N − j ·) ∈ V0 , j ∈ Z 5. There exists a set of functions {φ 1 , φ 2 , . . . , φ r } such that {φ k ◦ ti : k = 1, . . . , r, i ∈ Z2 } forms a Riesz basis of V0 .
The r vector := (φ 1 , φ 2 , . . . , φ r )T is referred to as a scaling vector and the individual φ k as scaling functions.
Conditions 1, 4, and 5 imply that a scaling vector with compactly supported φ k satisfies the dilation equation gi ◦ di (1) (x) = N i ∈ Z2
for a finite number of r × r constant matrices gi .
198
BRUCE KESSLER
Definition I.2 A vector of r linearly independent functions on R2 is refinable at dilation N if it satisfies Eq. (1) for some sequence of r × r constant matrices gi . A simple example of an MRA of L 2 (R2 ) over the mesh T is constructed by defining the “hat” function h as the piecewise linear function that satisfies h(iǫ1 + jǫ2 ) = δ0,i δ0, j and letting = {h}. Using the notation S(H ) := clos L 2 span{ f ◦ ti : i ∈ Z2 , f ∈ H }
for H ⊂ L 2 (R2 )
let us then define V0 := S(). It is easily verified that the scaling vector is refinable for any integer dilation N > 1, and that (V p ) is an MRA, where V p := S((N p ·)). For function vectors Ŵ and with elements in L 2 (R2 ), let us define Ŵ, = Ŵ(x)(x)T d x R2
Definition I.3 If , ◦ ti, j = δ0,i δ0, j I , then let us say that is an orthogonal scaling vector. If the φ k are compactly supported, then the MRA generated by is said to be orthogonal. Let us define Wn to be the orthogonal complement of Vn in Vn+1 , so that Vn+1 = Vn ⊕ Wn
for n ∈ Z
The Wn , referred to as wavelet spaces, are necessarily pairwise orthogonal and are spanned by the orthogonal dilations and translations of a set of functions {ψ 1 , ψ 2 , . . . , ψ t }, referred to as wavelets, that satisfy the equation (x) = N h i ◦ di (2) i∈Z2
for some t × r constant matrices h i , where is the t-vector (ψ 1 , ψ 2 , . . . , ψ t )T , called a multiwavelet. ˜ is said to Definition I.4 A pair of n-dimensional function vectors and be biorthogonal if ˜ ◦ ti, j = δ0,i δ0, j I ,
i, j ∈ Z.
A necessary and sufficient condition for the construction of biorthogonal vectors was given in Hardin and Marasovich (1999) and is stated next without proof. Lemma I.1 Suppose U and W are m-dimensional subspaces of Rn . There exist dual (biorthogonal) bases for U and W if and only if U ∩ W ⊥ = {0}.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
199
If the criteria of Lemma I.1 are met, then the Gram–Schmidt orthogonalization process can be modified to provide biorthogonal sets in the following fashion: 1. Consider the two sets {x1 , . . . , xn } and {y1 , . . . , yn } where xi , yi = 0, i = 1, . . . , n. Let u 1 = x1 and v1 = y1 . 2. Let i−1 i−1 xi , v j yi , u j uj vj and vi = yi − u i = xi − u j , v j u j , v j j=1 j=1 for i = 2, . . . , n
3. Let zi = u i
and
z˜ i =
vi u i , vi
for i = 1, . . . , n
Let us suppose that X and Y are biorthogonal function vectors. Then let us define the projection operator PXY such that ker PXY = Y ⊥ and range PXY = X . If X := S(X ) and Y := S(Y ) are finite shift-invariant spaces, then n f, yi ◦ t j PXY f := xi ◦ t j xi , yi j ∈ Z2 i=1
where xi ∈ X and yi ∈ Y .
B. Fractal Interpolation Surfaces The construction of fractal interpolation surfaces is outlined in Geronimo and Hardin (1993) and Massopust (1990). See Barnsley (1988) for an introduction to fractals in general. The following is a brief introduction to fractal interpolation surfaces. Let D be a closed triangular region in R2 and let {qn }rn=1 be a set of points in N be a triangulation D such that q1 , q2 , and q3 are the vertices of D. Let {△i }i=1 of {qn } such that the graph has chromatic number 3. (The chromatic number of a graph is the fewest number of symbols needed to cover the vertices of the graph so that any two adjacent vertices have distinct symbols. It is important to note that not all triangulations have chromatic number 3.) Let us assign a symbol k(n) ∈ {1, 2, 3} to each of the qn so that each subdomain △i has three distinct symbols at its vertices. Let {z n } be a set of real values associated with the {qn }. There exists a unique mapping u i : D → △i for i = 1, 2, . . . , N of the form mi a bi u i (x) = i (3) x+ ni ci di
200
BRUCE KESSLER
where ai , bi , ci , di , m i , and n i are uniquely determined by u i qk(n) = qn
(4)
vi (x, z) = [ei
(5)
for all vertices qn of △i . Also, let us define a mapping vi : D × R → R for i = 1, 2, . . . , N of the form f i ]x + si z + pi
where |si | < 1 and where ei , f i , and pi are uniquely determined by vi qk(n) , z k(n) = z n
(6)
for all vertices qn of △i . Let C0 (D) denote the space of continuous functions on R2 with support in D. Let us define a function Ŵ : C0 (D) → C0 (D) piecewise by (7) Ŵ( f ) := vi u i−1 , f ◦ u i−1
for f ∈ C0 (D). Then the function Ŵ is contractive in the supremum norm with contractivity |s| = maxi=1 , . . . , N |si |. By the contraction mapping theorem, there exists an f ∗ ∈ C0 (D) such that Ŵ( f ∗ ) = f ∗ . This function interpolates the points (qn , z n ) and is referred to as a fractal interpolation surface (FIS). Example I.1 Let us define an FIS over the right triangle with vertices (0, 1), (0, 0), and (1, 0) and additional triangulation points (0, 12 ), ( 21 , 12 ), and ( 21 , 0). The triangulation and chromatic mappings used are shown in Figure 1. After the various unknowns are solved, progressively finer approximations of the FIS are drawn by repeatedly applying the union of the domain mappings, starting with the linear surface that interpolates the given data. The FIS being approximated through successive iterations in Figure 2 interpolates the points (0, 1,0), (0, 12 , 41 ), ( 21 , 12 , 34 ), (0, 0, 0), ( 12 , 0, 12 ), and (1, 0, 1 ) and has vertical scaling si = 53 for all i ∈ {1, 2, 3, 4}. 4
Figure 1. Triangulation and domain mappings for Example I.1.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
201
Figure 2. Successive approximations of the fractal interpolation surface (FIS) in Example I.1.
Under certain circumstances, the matchup conditions along the edges of the subdomains are more easily met and the construction of the FIS is greatly simplified. If the interpolation points along the boundary of D are coplanar, then the requirement that the triangulation have chromatic number 3, along with the requirement that the mappings u i take vertices of D only to “appropriate”
202
BRUCE KESSLER
Figure 3. Triangulation and domain mappings for Example I.2.
vertices of △i , may be dropped, and we may express the fixed point f ∗ as f ∗ (x) = (x) + si f ∗ ◦ u i−1 (x) (8)
where is the piecewise linear function defined by (x) = [ei
f i ]u i−1 (x) + pi
for x ∈ △i
(9)
Example I.2 This example√shows an FIS constructed on the equilateral trian1 3 gle with vertices (0, 0), √ √ √ ( 2 , 2 ), and √ (1, 0), with additional triangulation points 1 1 1 3 3 3 ( 6 , 6 ), ( 3 , 3 ), ( 2 , 10 ), and ( 43 , 43 ). The triangulation is shown in Figure 3. Notice that the triangulation has a chromatic number of 4. √ Let the surface be zero along the boundary and let us interpolate ( 12 , 103 , 12 ), with si = 21 for i ∈ {1, 2, 3, 4, 5, 6}. The orientation of the mappings u i and vi determine the resulting FIS, but not the continuity of the surface. Approximations to the FIS are shown in Figure 4.
Figure 4. Approximations to the FIS constructed in Example I.2.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
203
Notice that if all the interpolation points (qn , z n ) are coplanar, then the resulting FIS is merely the plane containing the points over the domain D. Therefore, the hat function h defined in Section I.A is a union of six FISs. C. Main Results The following is an extension of ideas which first appeared in Donovan et al. (1996a) and later in Donovan et al. (1996b). Let us define h i := h(· − ǫi )|△0 and let C0 (R2 ) denote the bounded, continuous functions over R2 . Then we have the following result. Theorem I.1 Suppose there are function vectors B := {w1 , . . . , w t , wˇ 1 , . . . , wˇ t } and B˜ := {w˜ 1 , . . . , w˜ t , w˜ˇ 1 , . . . , w˜ˇ t } with functions in C0 (R2 ) ∩ L 2 (R2 ) such that 1. B and B˜ are biorthogonal. 2. B and B˜ each extend {h}. 3. supp(wi ), supp(w˜ i ) ⊆ △0 , i = 1, . . . , t. ˜ 4. (I − PBB )h i ⊥ (I − PB˜B )h j , i = j, i, j ∈ {0, 1, 2}.
˜ of length q := 2t + 1 Then there exist biorthogonal scaling vectors and ˜ each contain the piecewise linears on such that V0 := S() and V˜ 0 := S() the mesh T . Proof. The main issue is finding compactly supported functions φ i and φ˜ j that satisfy the biorthogonality conditions φ i , φ˜ j = δi, j . Let us define the following: for i = 1, . . . , t φ i := wi for i = 1, . . . , t φ := wˇi ˜ W h φ q := α1 I − PW t+i
and
φ˜ i := w˜ i for i = 1, . . . , t φ := w˜ˇi for i = 1, . . . , t 1 W q ˜ φ := β I − PW ˜ h ˜ t+i
˜
W W where α, β are constants such that αβ := (I − PW )h, (I − PW ˜ )h. Let := 1 q T 1 q T ˜ ˜ ˜ (φ , . . . , φ ) and := (φ , . . . , φ ) . Then let us set V p := S((N p ·)) and ˜ p ·)). V˜ p := S((N Condition 1 of Theorem I.1 guarantees that
φ i , φ˜ j = δi, j φ i , φ˜ j = δi, j
for i, j = 1, . . . , t for i, j = t + 1, . . . , 2t
Condition 3 of Theorem I.1 guarantees that φ i , φ˜ j = 0
φ i , φ˜ j = 0
for i = 1, . . . , t, j = t + 1, . . . , 2t
for i = t + 1, . . . , 2t, j = 1, . . . , t
204
BRUCE KESSLER
Condition 4 of Theorem I.1 establishes the remaining orthogonality conditions: φ q , φ˜ i = 0
φ i , φ˜ q = 0
for i = 1, . . . , 2t
for i = 1, . . . , 2t
˜ are refinable and Condition 2 of Theorem I.1 guarantees that both and that Vn ⊂ Vn+1 and V˜ n ⊂ V˜ n+1 . The requirements that ∩ j∈Z V j = 0, ∩ j∈Z V˜ j = ˜ 0, ∪ j∈Z V j = L 2 (R), and ∪ j∈Z V˜ j = L 2 (R), and that the translates of and form Reisz bases, are trivially met by compactly supported scaling vectors. Therefore, both (V p ) and (V˜ p ) are MRA’s. 䊏 ˜ f , Wg , Section III gives a detailed definition of the wavelet spaces W f , W ˜ h . W f and W ˜ f have generators supported on triangles, Wg and ˜ g , Wh , and W W ˜ h have gen˜ g have generators supported on parallelograms, and Wh and W W erators supported on hexagons. The main theorem on the construction of the q(N 2 − 1) wavelets is stated next and proven in Section III. Theorem I.2 Let (V p ) and (V˜ p ) be biorthogonal MRA of multiplicity q ˜ f , Wg , W ˜ g , Wh , in R2 constructed from Theorem I.1. Let us define W f , W ˜ ˜ ˜ ˜ and Wh as previously. Then V1 = V0 + W0 and V 1 = V 0 + W0 where W0 = ˜0 = W ˜ f +W ˜g +W ˜ h , and W0 and W ˜ 0 each have q(N 2 − 1) W f + W g + Wh , W generators. In Section V, a useful prefilter for the orthogonal scaling functions constructed in Section II.B is presented. Examples are given that use the prefilter and bases for image compression and denoising.
II. Scaling Function Constructions Let us set dilation factor N := 3, and let G = {(0, 0), (1, 0), (2, 0), (0, 1), ˇ = {(0, (1, 1), (0, 2)} and G √ 0), (1, 0), (0, 1)}. Let us set the direction vectors ǫ1 = (1, 0) and ǫ2 = ( 12 , 23 ). A. Biorthogonal Construction ˜ that satisfy Theorem I.1, we For us to construct scaling vectors and must let w and u be continuous functions with (nonempty) support in △0 and let wˇ := w ◦ r and uˇ := u ◦ r . With condition 2 of Theorem I.1 in mind,
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
205
Figure 5. The domain points and scalings used in the dilation-3 construction of biorthogonal scaling functions.
let us require that w and u satisfy the following dilation equations for some α, β, si , sˇi , qi , and qˇ i : w = αh ◦ d1,1 + sˇi wˇ ◦ di (10) si w ◦ di + ˇ i∈G
i∈G
u = βh ◦ d1,1 +
i∈G
qi u ◦ di +
ˇ i∈G
qˇ i uˇ ◦ di
(11)
Notice that both Eq. (10) and Eq. (11) have the same format as the fixed-point equation (8). Therefore, the functions w and u are FIS’s, with interpolation points located uniformly over △0 as illustrated in Figure 5, provided |si | < 1 and |qi | < 1 for all i ∈ Z2 (this is necessary if w and u are to be continuous). For us to construct w, w, ˇ u, and uˇ with rotational symmetry about the centroid of their support triangle, let s0,0 = s2,0 = s0,2 := s1 s1,0 = s0,1 = s1,1 := s2 sˇ0,0 = sˇ1,0 = sˇ0,1 := s3
and
q0,0 = q2,0 = q0,2 := q1 q1,0 = q0,1 = q1,1 := q2 qˇ 0,0 = qˇ 1,0 = qˇ 0,1 := q3
where |si |, |qi | < 1 for i = 1, 2, 3. Then the only free parameters are the scaling variables si , qi , and α and β, the values of the functions w and u at the centroid of △0 , respectively. Let us set α, β := 1 for this construction. Recall that h i = h(· − ǫi )|△0 , where i = 0, 1, 2. Because of the rotational invariance of both w and the set of h i , the six orthogonality conditions needed to satisfy condition 4 of Theorem I.1 reduce to just ˜ W W I − PW (12) h 0 ⊥ I − PW ˜ h1
206
BRUCE KESSLER
˜ = S(u). Because where W = S(w) and W ˜
W PW h0 =
h 0 , u w w, u
and
W PW ˜ h1 =
h 1 , w u w, u
then Eq. (12) reduces to h 0 , h 1 =
h 0 , uh 1 , w w, u
(13)
Because h 0 , w = h 1 , w = h 2 , w and h 0 + h 1 + h 2 = 1 on △0 , we can calculate w, 1 by using Eq. (10): w, 1 = h ◦ d1,1 1 + sˇi wˇ ◦ di , 1 si w ◦ di , 1 + ˇ i∈G
i∈G
h, 1 = "3 3 3 − i=1 si
Likewise, from Eq. (11),
h, 1 u, 1 = "3 3 3 − i=1 qi
√
= 13 w, 1, and h 1 , u = 31 u, 1, then √ √ 3 3 and h 1 , u = h 0 , w = "3 "3 18 3 − i=1 si 18 3 − i=1 qi
Because h, 1 =
3 , h 0 , w 2
(14)
Again, if we use both Eqs. (10) and (11), w, u =
Because h, h =
h, h + 3(s2 + s3 )h 0 , w + 3(q2 + q3 )h 0 , u "3 3 3 − i=1 si qi
√ 3 , 4
then if we use Eq. (14),
w, u $ $ # $ # $# √ # "3 "3 "3 "3 si qi + 2(q2 + q3 ) 3 − i=1 qi + 2(s2 + s3 ) 3 − i=1 si 3 − i=1 3 3 3 − i=1 # $# $# $ = "3 "3 "3 36 3 − i=1 si 3 − i=1 qi 3 − i=1 si qi
(15)
√
If we substitute Eq. (14), Eq. (15), and h 0 , h 1 = 483 into Eq. (13) and require that Eq. (15) be nonzero, we get the following necessary conditions on the si
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
207
and qi : 27s1 + 9s2 + 9s3 + 27q1 + 9q2 + 9q3 − 25s1 q1 − 3s1 q2 − 3s1 q3 − 3s2 q1 − 13s2 q2 + 3s2 q3 − 3s3 q1 + 3s3 q2 − 13s3 q3 − 33 = 0
(16)
− q3 s2 − 3s3 + q1 s3 − q2 s3 − q3 s3 = 0
(17)
27 − 9q1 − 3q2 − 3q3 − 9s1 + 3q1 s1 + q2 s1 + q3 s1 − 3s2 + q1 s2 − q2 s2 If we let si := 0 for i = 1, 2, 3, w becomes piecewise linear and Eqs. (16) and (17) reduce to 3(9q1 + 3q2 + 3q3 − 11) = 0
and
3(9 − 3q1 − q2 − q3 ) = 0 (18)
Furthermore, if we let q =: qi for i = 1, 2, 3, Eq. (18) reduces to 45q − 33 = 11 . Let us define the scaling 0 and 27 − 15q = 0, with the solution q = 15 functions w φ 1 := √ w, u wˇ w, u ( ) h, u ◦ ti h, uˇ ◦ ti 1 3 w ◦ ti − wˇ ◦ ti h− φ := α w, u w, u i∈H ˇ φ 2 := √
i∈ H
u φ˜ 1 := √ w, u
uˇ φ˜ 2 := √ w, u ( ) h, w h, w ◦ ti ˇ ◦ ti 1 3 ˜ u ◦ ti − h− uˇ ◦ ti φ := α w, u w, u i∈H ˇ i∈ H
where H = {(0, 0), (0, −1), (−1, 0)} and Hˇ = {(0, −1), (−1, 0), (−1, −1)} and B h 0 , wh 0 , u α := 6 h 0 , h 0 − w, u ˜ := (φ˜ 1 , φ˜ 2 , φ˜ 3 )T are biorthogonal scaling Then := (φ 1 , φ 2 , φ 3 )T and ˜ p (x, y))). vectors that generate the MRA V p = S((3 p (x, y))) and V˜ p = S((3 ˜ Note that both V0 and V 0 contain piecewise linears on the triangulation T and
208
BRUCE KESSLER
Figure 6. Scaling functions φ 1 and φ 3 with si = 0.
with si = 0 for i = 1, 2, 3, V0 is the set of piecewise linears on a uniform subdivision of T . This set of scaling functions and their biorthogonal counterparts for i = 1, 2, 3 first appeared in Kessler (2002) and are illustrated with qi = 11 15 in Figures 6 and 7. B. Orthogonal Construction If we let si = qi for i = 1, 2, 3, w = u and we can construct an orthogonal scaling vector. Equation (16) reduces to 54s1 − 25s12 + 18s2 − 6s1 s2 − 13s22 + 18s3 − 6s1 s3 + 6s2 s3 − 13s32 − 33 = 0. (19)
Figure 7. Approximations to scaling functions φ˜ 1 and φ˜ 3 with qi =
11 . 15
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
Figure 8. Scaling functions φ 1 and φ 3 with si = qi =
209
11 . 19
If we let s = si for i = 1, 2, 3, Eq. (19) reduces to with the solution s =
11 . 19
3(1 − s)(19s − 11) = 0
(20)
Let us define the scaling functions
w φ 1 := √ w, w
wˇ φ 2 := √ w, w ( ) h, w ◦ ti h, w ˇ ◦ ti 1 3 w ◦ ti − wˇ ◦ ti h− φ := α w, w w, w i∈H ˇ i∈ H
where H = {(0, 0), (0, −1), (−1, 0)} and Hˇ = {(0, −1), (−1, 0), (−1, −1)} and B h 0 , w2 α := 6 h 0 , h 0 − w, w These scaling functions first appeared in Donovan et al. (1995) and are illustrated in Figure 8. III. Associated Multiwavelets ˜ be the scaling vectors constructed in Theorem I.1 and let (V p ) Let and ˜ and (V p ) be the corresponding MRA. Recall that supp φ i , supp φ˜ i ⊆ △0 = △(ǫ0 , ǫ1 , ǫ2 ) ∈ T for i = 1, . . . , t and that supp φ i , supp φ˜ i ⊆ ▽0 = △(ǫ1 , ǫ2 ,
210
BRUCE KESSLER
ǫ1 + ǫ2 ) ∈ T for i = t + 1, . . . , 2t. For function space V and compact set K ⊂ R2 , let V (K ) := { f ∈ V : supp( f ) ⊆ K }
(21)
First let us consider wavelets supported in △ ∈ T . Let us consider the (t N 2 + ((N − 1)(N − 2))/2)-dimensional spaces V1 (△0 ) and V˜ 1 (△0 ), with the bases consisting of t dilated scaling functions on each of the N 2 subtriangles and ((N − 1)(N − 2))/2 dilated φ q and φ˜ q . Let us define the functions ˜
gi := PVV1 1(△(△0 0) ) (φ q (· − ǫi ))
(△0 ) ˜ q g˜ i := PVV˜ 11(△ (φ (· − ǫi )) 0)
and
(22)
for i = 0, 1, 2. Then let us define the subsets X of V1 (△0 ) and X˜ of V˜ 1 (△0 ) by X := {gi : i = 0, 1, 2} ∪ {φ i : i = 1, . . . , t} X˜ := {g˜ i : i = 0, 1, 2} ∪ {φ˜i : i = 1, . . . , t} ˜
Let B be a basis for the space (I − PXX )V1 (△0 ) and let B˜ be a basis for the space (I − PXX˜ )V˜ 1 (△0 ). Note that the elements of B are orthogonal to V˜ 0 and the elements of B˜ are orthogonal to V0 by definition. Also notice that because of their support, the elements of both B and B˜ are orthogonal to their own translates. A small lemma is needed before we proceed. Lemma III.1 B ∩ B˜ ⊥ = {0}.
˜ := {φ˜ i : i = 1, . . . , t} and noProof. Let W := {φ i : i = 1, . . . , t} and W ˜ are biorthogtice that, from the construction of the scaling functions, W and W ⊥ ˜ onal sets. Let us consider ψ ∈ B ∩ B . Then supp(ψ) ⊂ △0 and ψ ∈ PXX˜ V1 (△0 ). Then ψ is a linear combination of elements in X˜ orthogonal to X . Let us consider ψ, φ i , i = 1, . . . , t. Because {g˜ i : i = 0, 1, 2} ⊥ W , ψ is ˜ ∩ W⊥ = ˜ . However, by Lemma I.1, W a linear combination of elements in W 0, so ψ = 0. 䊏 Then we may construct, using the modified Gram–Schmidt process, dual ˜ denoted △f and ˜ △f , respectively. Recall the biorthogonal bases for B and B, notation fˇ = f ◦ r , where r is the affine transformation from ▽0 to △0 and ˜ ▽f := {ψˇ˜ : ψ˜ ∈ ˜ △f }. f ∈ L 2 (R2 ). Let us define ▽f := {ψˇ : ψ ∈ △f } and △ ▽ △ ▽ ˜ f := S( ˜ 0. ˜ f ∪ ˜f)⊂W Let us define W f := S( f ∪ f ) ⊂ W0 and W ˜ The spaces W f and W f each have (N − 1)(N − 2) 2 − (t + 3) = q(N 2 − 1) − 3N − 3 2 tN + 2
generators.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
211
Next, we consider wavelets supported on adjacent triangles. Lemma III.2 For gi and g˜ i , i = 0, 1, 2, as defined in Eq. (22), gi , g˜ j < 0 for i = j.
Proof. Let us define z i := φ q (· − ǫi ) | △0 and z˜ i := φ˜ q (· − ǫi ) | △0 for i = 0, 1, 2. Recall than φ q and φ˜ q are the only scaling functions with support larger than one △ ∈ T . Notice that the z i and z˜ i are still linear and nonnegative on all edges of △0 and that z 0 , z˜ 1 = 0. Let us consider g0 , g˜ 1 , and let us express both z 0 and z˜ 1 in terms of basis functions for V1 | △0 and the g˜ i : z 0 = φ q ◦ d0,0 | △0 + + z˜ 1 = φ˜ q ◦ d N ,0 | △0 +
N −1 1 (N − i)φ q ◦ d0,i | △0 N i=1
N −1 1 (N − i)φ q ◦ di,0 | △0 + g0 N i=1
N −1 N −1 1 1 i φ˜ q ◦ di,0 | △0 + i φ˜ q ◦ di,N −i | △0 + g˜ 1 N i=1 N i=1
Recall that φ q is orthogonal to the translates of φ˜ q , even when restricted to bounded domains, so the same is true for φ q ◦ d0,0 and φ˜ q ◦ d0,0 . Therefore, many of the terms of z 0 , z˜ 1 vanish: z 0 , z˜ 1 =
N −1 1 i(N − i)φ q ◦ di,0 | △0 , φ˜ q ◦ di,0 | △0 + g0 , g˜ 1 . N 2 i=1
Because φ q ◦ di,0 | △0 , φ˜ q ◦ di,0 | △0 = K φ q ◦ di,0 φ˜ q ◦ di,0 =
K K q ˜q φ φ = 2 2 N N
for some 0 < K < 1, then z 0 , z˜ 1 =
N −1 K i(N − i) + g0 , g˜ 1 N 4 i=1
If we use the identities n i=1
i=
n(n + 1) 2
and
n i=1
i2 =
n(n + 1)(2n + 1) 6
212
BRUCE KESSLER
then z 0 , z˜ 1 =
K (N 2 − 1) + g0 , g˜ 1 = 0 6N 3
(23)
Therefore, for N > 1, g0 , g˜ 1 < 0 It is easily verified that the same result holds for the remaining gi , g˜ j , i = j. 䊏
Lemma III.3 For gi and g˜ i , i = 0, 1, 2, as defined in Eq. (22), the sets {g0 , g1 , g2 } and {g˜ 0 , g˜ 1 , g˜ 2 } are each linearly independent.
Proof. This proof hinges on the linear algebra result that for an n-dimensional space A and a space B where A ∩ B = {0}, then (I − PB )A is an n-dimensional space. Recall the linear polynomials h i , i = 0, 1, 2, supported on △0 and let us define the three-dimensional space H := span{h 0 , h 1 , h 2 }. Let H ∗ := PV1 (△0 ) H . Because H ∩ (H − H ∗ ) = {0}, then H ∗ = (I − (I − PV1 (△0 ) ))H is a three-dimensional space. Recall the space W used in the construction of φ q . Because H ∗ ∩ W = {0}, then G := span{g0 , g1 , g2 } = (I − PW )H ∗ is a three-dimensional space. An analogous proof holds for {g˜ 0 , g˜ 1 , g˜ 2 }.
䊏
Lemma III.4 For gi and g˜ i , i = 0, 1, 2, as defined in Eq. (22), there exist σi and σ˜ i , i = 0, 1, 2, such that 1. 2. 3. 4.
span{σ0 , σ1 , σ2 } = span{g0 , g1 , g2 }. span{σ˜ 0 , σ˜ 1 , σ˜ 2 } = span{g˜ 0 , g˜ 1 , g˜ 2 }. {σ0 , σ1 , σ2 } and {σ˜ 0 , σ˜ 1 , σ˜ 2 } are biorthogonal sets. σi ⊥ g˜ i and σ˜ i ⊥ gi for i = 0, 1, 2.
Proof. Note that, from Lemmas III.2 and III.3, the sets G := span{g0 , g1 , g2 } ˜ := span{g˜ 0 , g˜ 1 , g˜ 2 } are each three-dimensional but not biorthogonal. Let and G ˜ us define the following biorthogonal bases for G and G: v˜ 2 := g˜ 0
v2 := g0
g1 , v˜ 2 v0 := g1 − v2 v2 , v˜ 2 g2 , v˜ 2 g2 , v˜ 0 v2 − v0 v1 := g2 − v2 , v˜ 2 v0 , v˜ 0
g˜ 1 , v2 v˜ 2 v2 , v˜ 2 g˜ 2 , v2 g˜ 2 , v0 v˜ 2 − v˜ 0 v˜ 1 := g˜ 2 − v2 , v˜ 2 v0 , v˜ 0
v˜ 0 := g˜ 1 − and
Because each v˜ i is easily replaced with its additive inverse, let us assume without loss of generality that vi , v˜ i > 0.
213
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
√ √ Let u i := vi / vi , v˜ i and u˜ i := v˜ i / vi , v˜ i for i = 0, 1, 2. Let us define ˜ → R3 by the transformations T : G → R3 and T˜ : G T ( f ) := ( f, u˜ 0 , f, u˜ 1 , f, u˜ 2 )T and T˜ ( f ) := ( f, u 0 , f, u 1 , f, u 2 )T
so that T (g0 ) = (0, 0, g0 , u˜ 2 )T T (g1 ) = (g1 , u˜ 0 , 0, g1 , u˜ 2 )T T (g2 ) = (g2 , u˜ 0 , g2 , u˜ 1 , g2 , u˜ 2 )T
and
T˜ (g˜ 0 ) = (0, 0, g˜ 0 , u 2 )T T˜ (g˜ 1 ) = (g˜ 1 , u 0 , 0, g˜ 1 , u 2 )T T˜ (g˜ 2 ) = (g˜ 2 , u 0 , g˜ 2 , u 1 , g˜ 2 , u 2 )T
˜ by T (ω0 ) := (cos θ, sin θ, 0)T and T (ω˜ 0 ) := Let us define ω0 ∈ G and ω˜ 0 ∈ G T ˜ (cos θ˜ , sin θ˜ , 0) , respectively, so that ω0 ⊥ g˜ 0 for all θ and ω˜ 0 ⊥ g0 for all θ. ˜ by Then let us define ω1 ∈ G and ω˜ 1 ∈ G T (ω1 ) := T (ω˜ 0 ) × T (g˜ 1 )
˜ −g˜ 1 , u 2 cos θ, ˜ −g˜ 1 , u 0 sin θ) ˜ T = (g˜ 1 , u 2 sin θ,
T (ω˜ 1 ) := T (ω0 ) × T (g1 )
= (g1 , u˜ 2 sin θ, −g1 , u˜ 2 cos θ, −g1 , u˜ 0 sin θ)T
so that ω1 ⊥ ω˜ 0 , ω1 ⊥ g˜ 1 , ω˜ 1 ⊥ ω0 , and ω˜ 1 ⊥ g1 . Also, let us define ω2 ∈ G ˜ by and ω˜ 2 ∈ G T (ω2 ) := T (ω˜ 0 ) × T (g˜ 2 )
˜ −g˜ 2 , u 2 cos θ, ˜ g˜ 2 , u 1 cos θ˜ − g˜ 2 , u 0 sin θ) ˜ T = (g˜ 2 , u 2 sin θ,
T (ω˜ 2 ) := T (ω0 ) × T (g2 )
= (g2 , u˜ 2 sin θ, −g2 , u˜ 2 cos θ, g2 , u˜ 1 cos θ − g2 , u˜ 0 sin θ)T
so that ω2 ⊥ ω˜ 0 , ω2 ⊥ g˜ 2 , ω˜ 2 ⊥ ω0 , and ω˜ 2 ⊥ g2 . Then ω1 ⊥ ω˜ 2 and ω˜ 1 ⊥ ω2 provided that there exist θ and θ˜ such that T (ω1 ), T (ω˜ 2 ) = 0 and T (ω˜ 1 ), T (ω2 ) = 0; that is, (g˜ 1 , u 0 g2 , u˜ 0 + g˜ 1 , u 2 g2 , u˜ 2 )sin θ sin θ˜ + g˜ 1 , u 2 g2 , u˜ 2 cos θ cos θ˜ − g˜ 1 , u 0 g2 , u˜ 1 sin θ˜ cos θ = 0
(24)
− g1 , u˜ 0 g˜ 2 , u 1 sin θ cos θ˜ = 0
(25)
(g1 , u˜ 0 g˜ 2 , u 0 + g1 , u˜ 2 g˜ 2 , u 2 )sin θ sin θ˜ + g1 , u˜ 2 g˜ 2 , u 2 cos θ cos θ˜ respectively. Let K i, j := √ −gi , g˜ j > 0 for i = j from Lemma III.2 and let Mi := gi , g˜ i and Vi := vi , v˜ i for i = 0, 1, 2. Note that K 0,1 = K 1,2 = K 2,0 and
214
BRUCE KESSLER
K 0,2 = K 1,0 = K 2,1 as a result of the rotational invariance of the inner product. Then V2 = M0 B M0 M1 − K 0,1 K 0,2 V0 = M0 B 3 3 M0 M1 M2 − K 0,1 K 0,2 (M0 + M1 + M2 ) − K 0,1 − K 0,2 V1 = M0 M1 − K 0,1 K 0,2 and so g1 , u˜ 0 = V2
K 0,2 g1 , u˜ 2 = − V2 2 + K 0,2 V22 K 0,1 g2 , u˜ 0 = − V0 V22 g2 , u˜ 1 = V1 K 0,1 g2 , u˜ 2 = − V2
g˜ 1 , u 0 = V2
K 0,1 V2 2 + K 0,1 V22 K 0,2 g˜ 2 , u 0 = − V0 V22 g˜ 2 , u 1 = V1 K 0,2 g˜ 2 , u 1 = − V2 g˜ 1 , u 2 = −
and
Then Eqs. (24) and (25) reduce to −K 0,2 sin θ sin θ˜ +
2 K 0,1 cos θ cos θ˜ − V0 V1 cos θ sin θ˜ = 0 V22
(26)
−K 0,1 sin θ sin θ˜ +
2 K 0,2 cos θ cos θ˜ − V0 V1 sin θ cos θ˜ = 0 V22
(27)
respectively. Solving Eqs. (26) and (27) for θ in terms of θ˜ yields the conditions tan θ =
2 cos θ˜ − V0 V1 V22 sin θ˜ K 0,1 K 0,2 V22 sin θ˜
and tan θ =
2 cos θ˜ K 0,2 ˜ V22 (K 0,1 sin θ˜ + V0 V1 cos θ)
(28)
respectively. Combining the two equations in (28) yields the single condition 3 3 2 K 0,1 V0 V1 V22 tan2 θ˜ + K 0,2 − K 0,1 + V02 V12 V22 tan θ˜ − K 0,1 V0 V1 = 0
a quadratic in tan θ˜ with real solutions. Once θ and θ˜ are found that satisfy Eqs. (26) and (27), let us find the ωi and ω˜ i , i = 0, 1, 2, by the inverse transformations T −1 : R3 → G and T˜ −1 :
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
˜ defined by R3 → G and
215
T −1 (c0 , c1 , c2 ) = c0 u 0 + c1 u 1 + c2 u 2
T˜ −1 (c0 , c1 , c2 ) = c0 u˜ 0 + c1 u˜ 1 + c2 u˜ 2 √ √ As a final step, let us define σi := ωi / ωi , ω˜ i and σ˜ i := ω˜ i / ωi , ω˜ i for i = 0, 1, 2. In the orthogonal case, let g˜ i = gi and θ˜ = θ. Then the conditions (24) and (25) reduce to the one equivalent condition (g1 , u 0 g2 , u 0 + g1 , u 2 g2 , u 2 ) sin2 θ + g1 , u 2 g2 , u 2 cos2 θ − g1 , u 0 g2 , u 1 sin θ cos θ = 0
(29)
Let K := K 0,1 = K 0,2 . Then Eq. (29) yields the condition −K tan2 θ − V0 V1 tan θ + a quadratic in tan θ with real solutions.
K2 =0 V22
䊏
Lemma III.4 shows that each gi is a linear combination of two σ j where j = i, and likewise for the g˜ i . Now, consider the spaces Y0 := span{σ0 , σˇ 1 , φ q ◦ d N −i,i : i = 1, . . . , N − 1} Y˜ 0 := span{σ˜ 0 , σˇ˜ 1 , φ˜ q ◦ d N −i,i : i = 1, . . . , N − 1}
˜ f and all translates of φ˜ i , i = 1, . . . , q − 1 Functions in Y0 are orthogonal to W q q q ˜ ˜ ˜ and φ except φ ◦ t1,0 and φ ◦ t0,1 . Likewise, functions in Y˜ 0 are orthogonal to W f and all translates of φ i , i = 1, . . . , q − 1 and φ q except φ q ◦ t1,0 and ˜ φ q ◦ t0,1 . Let us define X 0 := PYY00 V0 and X˜ 0 := PY˜Y00 V˜ 0 as two-dimensional ˜ 0 be biorthogonal bases for subspaces of Y0 and Y˜ 0 , respectively. Let 0 and X˜ 0 the (N − 1)-dimensional complements (I − PX 0 )Y0 and (I − PXX˜ 00 )Y˜ 0 . The elements of 0 satisfy all existing orthogonality conditions necessary to belong ˜ 0. to the wavelet space W0 , and likewise for The same construction can be used across the other two edges of △0 by using the spaces Y1 := span{σ2 , σˇ 2 ◦ t0,−1 , φ q ◦ di,0 : i = 1, . . . , N − 1} Y˜ 1 := span{σ˜ 2 , σˇ˜ 2 ◦ t0,−1 , φ˜ q ◦ di,0 : i = 1, . . . , N − 1}
Y2 := span{σ1 , σˇ 0 ◦ t−1,0 , φ q ◦ d0,i : i = 1, . . . , N − 1} Y˜ 2 := span{σ˜ 1 , σˇ˜ 0 ◦ t−1,0 , φ˜ q ◦ d0,i : i = 1, . . . , N − 1}
216
BRUCE KESSLER
˜ h. Figure 9. Domains used in the construction of h and
and analogous subspaces X 1 , X˜ 1 , X 2 , and X˜ 2 to build biorthogonal pairs 1 ˜ 1 and also 2 and ˜ 2 . Let us define g := 0 ∪ 1 ∪ 2 and ˜ g := and ˜ ˜ ˜ 0 ∪ 1 ∪ 2 . The wavelets in g and their translates are orthogonal to the ˜ g and their translates because of the biorthogonality of the σi and wavelets in ˜ g := S( ˜ 0 . The spaces Wg ˜ g) ⊂ W σ˜ i . Let us define Wg := S(g ) ⊂ W0 and W ˜ and Wg each have 3(N − 1) generators. Let D0 , . . . , D5 be the parallelogram-shaped regions of R2 defined in Figure 9. Let us define ˜
νi := PVV1 1(D(Di )i ) φ q
and
(Di ) ˜ q φ ν˜ i := PVV˜ 11(D i)
for i = 0 , . . . , 5 and let us consider for the moment ν0 . Notice that ν0 meets several orthogonality conditions required of wavelets in W0 : ν0 ⊥ φ˜ j for j = 1, . . . , (q − 1)/2, ν0 ⊥ (φ˜ j ◦ t−1,0 ) for j = (q + 1)/2 , . . . , q − 1, ν0 ⊥ (φ˜ q ◦ ˜ f . Also, ν0 ⊥ generators of W ˜ g that are built across the edges t0,1 ), and ν0 ⊥ W (ǫ0 , ǫ2 ), (ǫ1 , ǫ2 ), and (ǫ2 , ǫ2 − ǫ1 ). Similar results can be found for the other νi and ν˜ i . The goal is to alter the νi and ν˜ i in such a way that these orthogonalities are maintained while the other necessary orthogonalities are also achieved. Let us define $ # ˜ (D ) W μi := I − PWgg(Dii) νi + ci φ q ◦ d0,0 and
# $ W (D ) μ ˜ i := I − PW˜ gg(Dii) ν˜ i + c˜ i φ˜ q ◦ d0,0
for i = 0, . . . , 5, where ci and c˜ i satisfy μi , φ˜ q = 0 and μ ˜ i , φ q = 0, respectively. From Lemma III.4, there exist biorthogonal sets = {σ0 , σ1 , σ2 }
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
217
˜ = {σ˜ 0 , σ˜ 1 , σ˜ 2 } such that span() = span{g0 , g1 , g2 }, span() ˜ = and span{g˜ 0 , g˜ 1 , g˜ 2 }, and σ j ⊥ g˜ j and σ˜ j ⊥ g j for j = 0, 1, 2. Then μ0 := ν0 − ν0 , σ˜ 2 σ2 − ν0 , σ˜ 1 ◦ t−1,0 σˇ 1 ◦ t−1,0 + c0 φ q ◦ d0,0 = c0 φ q ◦ d0,0 + g˜ 0 , σ˜ 1 σ1 + g2 , σ˜ 0 σˇ 0 +
N −1 1 (N − j)φ q ◦ d0, j N j=1
(30)
μ ˜ 1 := ν˜ 1 − ˜ν1 , σ1 σ˜ 1 − ˜ν1 , σ1 ◦ t0,−1 σˇ˜ 1 ◦ t0,−1 + c˜ 1 φ˜ q ◦ d0,0
N −1 1 (N − j)φ˜r ◦ d j,0 = c˜ 1 φ˜ q ◦ d0,0 + g˜ 0 , σ2 σ˜ 2 + g˜ 0 , σ2 σˇ˜ 2 ◦ t0,−1 + N j=1
(31)
˜ i maintain the orthogonalities of the νi and ν˜ i . Also, Notice that the μi and μ ˜ g , and μ ˜ i ⊥ φ q , μi ⊥ W ˜ i ⊥ Wg . Finally, note from by definition, μi ⊥ φ˜ q , μ Eqs. (30) and (31) that ˜ 1 = g2 , σ˜ 0 g˜ 0 , σ2 σˇ 0 , σˇ˜ 2 = 0 μ0 ◦ t1,−1 , μ
μ0 , μ ˜ 1 = c0 c˜ 1 + g0 , σ˜ 1 g˜ 0 , σ2 σ1 , σ˜ 2 = c0 c˜ 1 = 0
˜ 1 and (μ0 ◦ t0,1 ) ⊥ μ ˜ 1 . SimAlso, it is trivially established that (μ0 ◦ t1,0 ) ⊥ μ ˜ i satisfy the condition μi ⊥ (μ ˜ j ◦ tm,n ) ilarly, it is established that the μi and μ ˜ i } satisfy for m, n = 0, i, j = 0, . . . , 5, i = j, and that the sets {μi } and {μ ˜ h be biorthogonal bases for span{μi : i = 0 , . . . , 5} Lemma I.1. Let h and and span{μ ˜ i : i = 0 , . . . , 5}, respectively. Let us define Wh := S(h ) ⊂ W0 ˜ h := S( ˜ 0. ˜ h) ⊂ W and W Theorem III.1 Let (V p ) and (V˜ p ) be biorthogonal MRA of multiplicity q in ˜ f , Wg , W ˜ g , Wh , and W ˜h R2 constructed from Theorem I.2. Let us define W f , W ˜ 0 where W0 = W f + as previously. Then V1 = V0 + W0 and V˜ 1 = V˜ 0 + W ˜0 = W ˜ f +W ˜g +W ˜ h , and W0 and W ˜ 0 each have q(N 2 − 1) W g + Wh , W generators. ˜ := W ˜ f +W ˜g +W ˜ h , V := Proof. Let us define W := W f + Wg + Wh , W ˜ ˜ V1 (△0 ), and V := V1 (△0 ). Certainly V1 ⊇ V0 + W by nature of the wavelet constructions. At issue is whether V1 ⊆ V0 + W . For N > 2, generators φ i ◦ d0,0 , i = 1, . . . , q − 1, of V1 can be found in the space V . Notice that dim V = t N 2 +
(N − 1)(N − 2) 2
˜ f , Wg where t = (q − 1)/2. The scaling functions and the definitions of W f , W
218
BRUCE KESSLER
˜ g , along with Lemma III.4, provide biorthogonal bases and W {φ 1 , . . . , φ t } ∪ f ∪ {σ0 , σ1 , σ2 }
˜ f ∪ {σ˜ 0 , σ˜ 1 , σ˜ 2 } {φ˜ 1 , . . . , φ˜ t } ∪
and
of V and V˜ , each with cardinality (N − 1)(N − 2) (N − 1)(N − 2) 2 t + tN + − t − 3 + 3 = t N2 + 2 2
Because the linear systems have full rank, each f ∈ V is a linear combination of elements of V0 + W and each f˜ ∈ V˜ is a linear combination of ele˜ . Thus, φ i ◦ d0,0 ∈ V0 + W and φ˜ i ◦ d0,0 ∈ V˜ 0 + W ˜ for i = ments of V˜ 0 + W 1, . . . , q − 1. Also notice that 5 5 q ci φ q ◦ d0,0 μi = 1 − φ − i=0
i=0
Thus, V1 ⊆ V0 + W f + Wg + Wh and W = W0 . The analogous results hold ˜ . The number of generators is the sum of the generators for W f , Wg , and for W Wh : (q(N 2 − 1) − 3N − 3) + (3(N − 1)) + 6 = q(N 2 − 1)
Corollary III.1 Let (V p ) and (V˜ p ) be biorthogonal MRA of multiplicity r ˜ f , Wg , and W ˜ g as in R2 constructed from Theorem I.2. Let us define W f , W previously. Let D be the hexagonal support of φ q , and let X := (V0 + W f + ˜ f +W ˜ g )(D). Then Wh and W ˜ h are generated by Wg )(D) and X˜ := (V˜ 0 + W ˜ biorthogonal bases for PXX V1 (D) and PXX˜ V˜ 1 (D). ˜ h provide an explicit construction, Although the definitions of Wh and W ˜ f , Wg and W ˜ g are found, Corollary III.1 says that after the generators of W f , W the generators of Wh are whatever is left in V1 with the support of φ q , and ˜ h. likewise for W IV. Wavelet Constructions A. Wavelets for the Biorthogonal Construction This biorthogonal wavelet construction first appeared in Kessler (in press). ˜f 1. Wavelets in W f and W ˜ f each have 12 generators, 6 supported on △0 and 6 By definition, W f and W supported on ▽0 . Let us define the 10-dimensional spaces V := V1 (△0 ) and
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
219
Figure 10. Wavelets ψ 1 through ψ 6 : biorthogonal construction.
V˜ := V˜ 1 (△0 ) and the 4-dimensional spaces ˜
X := PVV (span{φ 1 , φ 3 , φ 3 ◦ t1,0 , φ 3 ◦ t0,1 })
and
X˜ := PVV˜ (span{φ˜ 1 , φ˜ 3 , φ˜ 3 ◦ t1,0 , φ˜ 3 ◦ t0,1 }) The ψ i , i = 1 , . . . , 6, illustrated in Figure 10, were chosen as a spanning ˜ set for (I − PXX )V that met some symmetry conditions. The associated ψ˜ i , illustrated in Figure 11, were chosen so that ψ˜ i ∈ ker(span{ψ j : j = i} ∪ X ) and ψ i , ψ˜i > 0 for i = 1 , . . . , 6. Each of the preceding is “normalized” by the factor ψ i , ψ˜ i . ˜ f |χ▽0 . Let us define These wavelets reflected onto ▽0 span W f |χ▽0 and W ψ i+6 := ψ i ◦ r ◦ t−1,0
and
ψ˜ i+6 := ψ˜ i ◦ r ◦ t−1,0
for i = 1 , . . . , 6. ˜g 2. Wavelets in Wg and W ˜ g each have six generators. Following the construction By definition, Wg and W in the proof of Lemma III.4, biorthogonal sets {σ0 , σ1 , σ2 } and {σ˜ 0 , σ˜ 1 , σ˜ 2 } can be found such that σi ⊥ g˜ i and σ˜ i ⊥ gi , i = 0, 1, 2. Then, following the construction in Section III, functions in Wg with support on the parallelogram
220
BRUCE KESSLER
Figure 11. Wavelets ψ˜ 1 through ψ˜ 6 : biorthogonal construction.
(ǫ0 , ǫ1 , ǫ2 , ǫ2 − ǫ1 ) are linear combinations of σ1 , σˇ 0 ◦ t−1,0 , φ 3 ◦ d0,1 , and φ 3 ◦ d0,2 . These functions are orthogonal to all translates of φ 1 , φ 2 , and ψ i for i = 1 , . . . , 12. Also, σ2 ⊥ φ 3 ◦ t1,0 and σˇ 0 ◦ t−1,0 ⊥ φ 3 ◦ t−1,1 . Only four other orthogonality conditions must be met. It is possible to construct symmetric–antisymmetric pairs of wavelets. Let us define ν1 := σ1 + σˇ 0 ◦ t−1,0 + c1 φ 3 ◦ d0,1 + c2 φ 3 ◦ d0,2 and solve the system of equations ν1 , φ˜ 3 =0 3 ˜ ν1 , φ ◦ t0,1 = 0 for c1 and c2 . Likewise, let us define ν˜ 1 := σ˜ 1 + σˇ˜ 0 ◦ t−1,0 + c˜ 1 φ˜ 3 ◦ d0,1 + c˜ 2 φ˜ 3 ◦ d0,2 and solve the system of equations =0 ˜ν1 , φ 3 3 ˜ν1 , φ ◦ t0,1 = 0
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
221
for c˜ 1 and c˜ 2 . If ν1 , ν˜ 1 < 0, then let us change ν˜ 1 to its additive inverse. Also, let us define ν2 and ν˜ 2 by ν2 := σ1 − σˇ 0 ◦ t−1,0
and
ν˜ 2 := σ˜ 1 − σˇ˜ 0 ◦ t−1,0
Then ν1 ⊥ ν˜ 2 and ν˜ 1 ⊥ ν2 by nature of their symmetry properties. rotations of ν1 and ν2 The remaining wavelets generating Wg are merely 2π 3 about ǫ0 , denoted τ . Let us define ω13 ω14 ω15 ω16 ω17 ω18
:= ν1 := ν2 := ν1 ◦ τ := ν2 ◦ τ := ν1 ◦ τ ◦ τ := ν2 ◦ τ ◦ τ
ω˜ 13 ω˜ 14 ω˜ 15 ω˜ 16 ω˜ 17 ω˜ 18
:= ν˜ 1 := ν˜ 2 := ν˜ 1 ◦ τ := ν˜ 2 ◦ τ := ν˜ 1 ◦ τ ◦ τ := ν˜ 2 ◦ τ ◦ τ .
and Let us normalize by defining ψ i := ωi / ωi , ω˜ i and ψ˜ i := ω˜ i / ωi , ω˜ i for i = 13 , . . . , 18. Wavelets ψ 13 , ψ 14 , ψ˜ 13 , and ψ˜ 14 are illustrated in Figure 12.
Figure 12. Wavelets ψ 13 , ψ 14 , ψ˜ 13 , and ψ˜ 14 : biorthogonal construction.
222
BRUCE KESSLER
˜h 3. Wavelets in Wh and W ˜ h each have six generators. Following the construction By definition, Wh and W ˜ h, ˜ i , i = 0 , . . . , 5, that span Wh and W in Section III, let us construct μi and μ respectively. It can be verified that c0 = c2 = c4 , c1 = c3 = c5 , c˜ 0 = c˜ 2 = c˜ 4 , and c˜ 1 = c˜ 3 = c˜ 5 because of the rotational invariance of both the gi and the g˜ i . To construct biorthogonal sets with some symmetric properties, let us first define the following: γ1 γ2 γ3 γ4 γ5 γ6
"5 := i=0 μi "5 := i=0 (−1)i μi := μ0 − μ2 := μ1 − μ3 := μ0 + μ2 := μ1 + μ3
γ˜1 γ˜2 γ˜3 γ˜4 γ˜5 γ˜6
"5 := i=0 μ ˜i "5 := i=0 (−1)i μ ˜i := μ ˜0 −μ ˜2 := μ ˜1 −μ ˜3 := μ ˜0 +μ ˜2 := μ ˜1 +μ ˜3
Then let us construct the biorthogonal sets {ω1 , . . . , ω6 } and {ω˜ 1 , . . . , ω˜ 6 } ω˜ i > 0, i = by using the biorthogonal Gram–Schmidt process so that ωi ,√ √ 1 , . . . , 6. Let us define ψ i+18 := ωi / ωi , ω˜ i and ψ˜ i+18 := ω˜ i / ωi , ω˜ i for i = 1, . . . , 6. These wavelets are illustrated in Figures 13 and 14, respectively. The sets S({ψ i : i = 1, . . . , 24}) and S({ψ˜ i : i = 1 , . . . , 24}) form biorthog˜ 0. onal bases for W0 and W
Figure 13. Wavelets ψ 19 through ψ 24 : biorthogonal construction.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
223
Figure 14. Wavelets ψ˜ 19 through ψ˜ 24 : biorthogonal construction.
B. Wavelets for the Orthogonal Construction This orthogonal wavelet construction first appeared in Donovan et al. (1996b) and later in Kessler (2000). 1. Wavelets in W f By its definition, W f has 12 generators, 6 supported on △0 and 6 supported on ▽0 . Let us define the 10-dimensional space V := V1 (△0 ) and the 4-dimensional space X := PV (span{φ 1 , φ 3 , φ 3 ◦ t1,0 , φ 3 ◦ t0,1 })
It is possible to find an orthonormal basis {ψ 1 , . . . , ψ 6 } for (I − PX )V with some symmetry properties. The set illustrated in Figure 15 has two functions with rotation and reflection symmetry, two more with reflection symmetry, and two with reflection antisymmetry. The same set of wavelets reflected onto ▽ span W f χ▽ . Let us define ψ i+6 := ψ i ◦ r ◦ t−1,0
for i = 1 , . . . , 6
2. Wavelets in Wg From its definition, Wg has six generators. Recall that gi := PV1 (△) φ q (· − ǫi ) for i = 0, 1, 2. Following the construction in the proof of Lemma III.4, an
224
BRUCE KESSLER
Figure 15. Wavelets ψ 1 through ψ 6 : orthogonal construction.
orthonormal set σ0 , σ1 , and σ2 can be found such that σi ⊥ gi , i = 0, 1, 2. Then, following the construction in Section III, functions in Wg with support on the parallelogram (ǫ0 , ǫ1 , ǫ2 , ǫ2 − ǫ1 ) are linear combinations of σ1 , σˇ 0 ◦ t−1,0 , φ 3 ◦ d0,1 , and φ 3 ◦ d0,2 . These functions are orthogonal to all translates of φ 1 , φ 2 , and ψ i for i = 1 , . . . , 12. Also, σ2 ⊥ φ 3 ◦ t1,0 and σˇ 0 ◦ t−1,0 ⊥ φ 3 ◦ t−1,1 . Only two other orthogonality conditions must be met. It is possible to construct symmetric–antisymmetric pairs of wavelets that are automatically orthogonal. Let us define ν1 by ν1 := σ1 + σˇ 0 ◦ t−1,0 + c1 φ 3 ◦ d0,1 + c2 φ 3 ◦ d0,2 and solve the system of equations ν1 , φ 3 =0 3 ν1 , φ ◦ t0,1 = 0 for c1 and c2 . Also, let us define ν2 by ν2 := σ1 − σˇ 0 ◦ t−1,0 Then ν1 ⊥ ν2 by nature of their symmetry properties.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
225
Figure 16. Wavelets ψ 13 and ψ 14 : orthogonal construction.
The remaining wavelets generating Wg are merely 120◦ rotations of ν1 and ν2 about ǫ0 , denoted τ . Let us define ω13 := ν1 ω14 := ν2 ω15 := ν1 ◦ τ
ω16 := ν2 ◦ τ
ω17 := ν1 ◦ τ ◦ τ
ω18 := ν2 ◦ τ ◦ τ Let us normalize by defining ψ i := ωi / ωi , ωi for i = 13 , . . . , 18. Wavelets ψ 13 and ψ 14 are illustrated in Figure 16. 3. Wavelets in Wh By definition, Wh has six generators. To explicitly construct wavelets with symmetry, let us first follow the construction in Section III, constructing μi , i = 0, . . . , 5. It can be verified that c0 = c2 = c4 and c1 = c3 = c5 because of the rotational invariance of the gi . One possible set with symmetry properties is found by defining the rotation and reflection symmetric pair γ 19 :=
5
μi
i=0
γ 20 :=
5 (−1)i μi i=0
226
BRUCE KESSLER
Figure 17. Wavelets ψ 19 through ψ 24 : orthogonal construction.
the reflection symmetric pair γ 21 := 2μ0 − μ2 − μ4
γ 22 := 2μ3 − μ1 − μ5 and the reflection antisymmetric pair γ 23 := μ2 − μ4
γ 24 := μ1 − μ5 Let us normalize by defining ψ i := γ i / γ i , γ i for i = 19, . . . , 24. This set of wavelets is illustrated in Figure 17. The set S({ψ i : i = 1, . . . , 24}) forms an orthonormal basis for W0 . V. Applications to Digitized Images Two common uses of wavelet decompositions are their applications to digitized images as a compression tool and as a denoising tool. In this section, both applications are examined, with 512 × 512 gray-scale Lena and Goldhill images used as examples, and with the use of the orthogonal bases constructed in Sections II.B and IV.B. (For a more general discussion on the use of wavelets in analyzing digitized images, see Mallat, 1998.)
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
227
Figure 18. The triangular lattice over rectangular data.
A. Preliminaries To use these bases on a rectangular grid of data, we must reset the direction vectors ǫ1 and ǫ2 so that there is a one-to-one correspondence between the data values and the scaling function in V0 . The author has chosen ǫ1 = (1, 1) and ǫ2 = (−1, 2), as illustrated in Figure 18. The process of converting the data into scaling function coefficients is called prefiltering. (For a more thorough introduction to prefiltering, see Hardin and Roach, 1998.) For an MRA of multiplicity r (i.e., r scaling functions), a sequence of r × r matrices q(m, n) is called a prefilter if it takes the data z to the scaling function coefficients c through the convolution g(m − s, n − t)z(s, t) c(m, n) = q(m, n) ∗ z(s, t) = (s,t)∈R2
" The prefilter q is said to be orthogonal if its z-transform Q(z 1 , z 2 ) = (m,n) q(m, n)z 1−m z 2−n satisfies the condition Q(z 1 , z 2 )Q(z 1−1 , z 2−1 )T = I . If so, then ||q ∗ z|| = ||c||. From Jia (1995), we have the result that a compactly supported scaling vector has approximation order p if and only if there are vectors αm,n (i, j) such that αm,n (i, j)T ◦ ti, j for m + n = 0, . . . , p − 1 (32) x m yn = i, j
Both the biorthogonal and the orthogonal scaling vectors constructed in this article have approximation order 2. The prefilter q is said to be pth-order preserving if it takes uniformly sampled values from a polynomial of degree less than p to a polynomial of the same degree in V0 .
228
BRUCE KESSLER
Figure 19. Sampling points for am,n .
Prefiltering is a nonissue with an MRA with a single scaling function because the data can be used as the scaling function coefficients. This is called the identity prefilter because Q(z 1 , z 2 ) = q(0, 0) = I , and it is both orthogonal and order preserving in that case. However, the identity prefilter is not an order-preserving prefilter with the scaling vectors constructed in this article. An interpolation prefilter maps data values to the function in V0 that interpolates the data. The interpolation prefilter is second-order preserving with our scaling vectors, but it is not orthogonal. It is possible to build a prefilter that consists of a single 3 × 3 matrix Q := Q(z 1 , z 2 ) = q(0, 0) that is orthogonal and first-order preserving, and that is very close to being second-order preserving. Let am,n be a unit 3-vector (if nonzero) formed from the uniform sampling of the polynomial x m y n over the three data locations illustrated in Figure 19. Then a0,0 = Notice that
(1, 1, 1)T √ 3
a1,0 = (0, 0, 0)T
a0,1 =
(1, 2, 3)T √ 14
√ √ √ ( 3, 3, 2)T α0,0 (0, 0) = α1,0 (0, 0) = (0, 0, 0)T √ 2 2 √ (1, 2, 6)T α0,1 (0, 0) = √ 11 are normalized partial solutions to Eq. (32) if we use the orthogonal constructed in Section II.B. Solving for the nine free parameters in Q that satisfy Q Q T = I3×3 , Qa0,0 = α0,0 (0, 0), and trivially Qa1,0 = α1,0 (0, 0) gives two
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
229
separate one-parameter families of solutions for Q. None of these solutions satisfies Qa0,1 = α0,1 (0, 0). However, we may solve numerically for the single parameter that minimizes Qa0,1 − α0,1 (0, 0) to approximately 0.00004775. The prefilter matrix Q, accurate to 12 decimal places, is given next and is used in all of the applications that follow. ⎡ ⎤ 0.999044216095 0.033297082996 0.028318872689 ⎢ ⎥ 0.994066005812 0.102634607524⎦ Q = ⎣−0.036040441557 −0.024733395618 −0.103557135694 0.994315935097
The peak signal-to-noise ratio (PSNR) is used to measure error introduced into a digitized photo. The PSNR is found by first computing the root-meansquare error (RMSE) given by B "rows "columns (originali, j − newi, j )2 j=1 i=1 RMSE = rows ∗ columns Then, the PSNR is given in decibels by PSNR = 20 log10
255 db. RMSE
B. Image Compression The following 512 × 512 images in Figures 20 and 21 were first prefiltered by using the prefilter Q from Section V.A and then decomposed into its smoother approximations in V1 , V2 , and V3 . The wavelet coefficients from W1 , W2 , and W3 were quantized uniformly; that is, they were grouped into equal-sized bins and replaced with a representative character for the bin (in this case, the midpoint of the bin). The compression ratio was calculated by using the theoretical entropy of the signal of the quantized coefficients from W1 , W2 , and W3 and the coefficients from V3 . The entropy of a signal is given by entropy = −
N
p(i) log2 p(i)
i=1
where N is the number of distinct characters in the signal and p(i) is the relative probability of the i th character. The entropy times the length of the signal gives the minimal number of bits in which the signal can be stored without losing information. The image was then reconstructed from the coefficients and postfiltered with Q T . The original images require 256 kB of storage, one byte per pixel. The examples shown in Figures 20 and 21 are reconstructed from signals requiring only 8 kB of storage, a 32:1 compression rate.
230
BRUCE KESSLER
Figure 20. (Left) Original Lena image and (right) the 32 : 1 compression (PSNR = 28.0 db).
C. Denoising Gaussian white noise was added to the original images in Figures 20 and 21, and the noisy image was decomposed into V1 and W1 . The standard deviation of the noise was known, but in most applications, it is not, so it was approximated by using the formula σ˜ =
Median(d1 (i)) 0.6745
Figure 21. (Left) Original Goldhill image and (right) the 32 : 1 compression (PSNR = 26.6 db).
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
231
Figure 22. (Left) Noisy Lena image (PSNR = 14.5 db) and (right) denoised version (PSNR = 26.3 db).
where d1 (i) represents the wavelet coefficients from W1 . Hard thresholding was applied at the level 3σ˜ ; that is, wavelet coefficients below 3σ˜ were converted to zeros. The denoised image was reconstructed as previously. Examples are shown in Figures 22 and 23.
Figure 23. (Left) Noisy Goldhill image (PSNR = 14.7 db) and (right) denoised (PSNR = 22.1 db).
232
BRUCE KESSLER
Appendix Mathematica files containing the following coefficients can be downloaded from the author’s web page: http://www.wku.edu/˜bruce.kessler
A. Coefficients for the Biorthogonal Scaling Function Construction in Section II.A The coefficients presented satisfy the dilation equations ˜ ◦ di ˜ g˜ i gi ◦ di and (x) =3 (x) = 3 i∈Z2
˜ = {φ˜ 1 , φ˜ 2 , φ˜ 3 }T . where = {φ 1 , φ 2 , φ 3 }T and
i∈Z2
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
B. Coefficients for the Orthogonal Scaling Function Construction in Section II.B The coefficients presented satisfy the dilation equation gi ◦ di (x) = 3 i∈Z2
where = {φ 1 , φ 2 , φ 3 }T .
233
234
BRUCE KESSLER
C. Coefficients for the Biorthogonal Wavelet Construction in Section IV.A The coefficients presented satisfy the dilation equations (x) = 3
i∈Z2
h i ◦ di
and
˜ (x) =3
i∈Z2
˜ ◦ di h˜ i
˜ = {φ˜ 1 , φ˜ 2 , φ˜ 3 }T , = {ψ 1 , . . . , ψ 24 }T , and ˜ = where = {φ 1 , φ 2 , φ 3 }T , 1 24 T ˜ ˜ {ψ , . . . , ψ } .
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
235
236
BRUCE KESSLER
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
237
The functions ψ 15 and ψ 16 are 120◦ rotations of ψ 13 and ψ 14 about the origin, respectively. The functions ψ 17 and ψ 18 are 240◦ rotations of ψ 13 and ψ 14 about the origin, respectively.
238
BRUCE KESSLER
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
239
240
BRUCE KESSLER
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
241
242
BRUCE KESSLER
The functions ψ˜ 15 and ψ˜ 16 are 120◦ rotations of ψ˜ 13 and ψ˜ 14 about the origin, respectively. The functions ψ˜ 17 and ψ˜ 18 are 240◦ rotations of ψ˜ 13 and ψ˜ 14 about the origin, respectively.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
243
244
BRUCE KESSLER
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
245
D. Coefficients for the Orthogonal Wavelet Construction in Section IV.B The coefficients presented satisfy the dilation equation h i ◦ di (x) = 3 i∈Z2
where = {φ 1 , φ 2 , φ 3 }T and = {ψ 1 , . . . , ψ 24 }T .
246
BRUCE KESSLER
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
247
The functions ψ 15 and ψ 16 are 120◦ rotations of ψ 13 and ψ 14 about the origin, respectively. The functions ψ 17 and ψ 18 are 240◦ rotations of ψ 13 and ψ 14 about the origin, respectively.
248
BRUCE KESSLER
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
249
250
BRUCE KESSLER
References Barnsley, M. (1988). Fractals Everywhere. San Diego: Academic Press. Belogay, E., and Wang, Y. (1999). Arbitrarily smooth orthogonal nonseparable wavelets in R2. SIAM J. Math. Anal. 30(3), 678–697. Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia: Soc. for Industr. & Appl. Math. Donovan, G. C., Geronimo, J. S., and Hardin, D. P. (1995). A class of orthogonal multiresolution analyses in 2D, in Mathematical Methods for Curves and Surfaces, edited by M. Daehlen, T. Lyche, and L. L. Schumaker. Nashville, TN: Vanderbilt Univ. Press, 99–110. Donovan, G. C., Geronimo, J. S., and Hardin, D. P. (1996a). Intertwining multiresolution analyses and the construction of piecewise polynomial wavelets. SIAM J. Math. Anal. 27(6), 1791–1815. Donovan, G. C., Geronimo, J. S., and Hardin, D. P. (2000). Compactly supported, piecewise affine scaling functions on triangulations. Constr. Approx. 16, 201–219. Donovan, G. C., Geronimo, J. S., Hardin, D. P., and Kessler, B. (1996b). Construction of twodimensional multiwavelets on a triangulation, in Wavelet Application in Signal and Image Processing IV, edited by M. A. Unser, A. Aldroubi, and A. F. Laine. Denver: SPIE—Int. Soc. Opt. Eng., 98–108. Donovan, G. C., Geronimo, J. S., Hardin, D. P., and Massopust, P. R. (1996c). Construction of orthogonal wavelets using fractal interpolation functions. SIAM J. Math. Anal. 27, 1158–1192. Geronimo, J. S., and Hardin, D. P. (1993). Fractal interpolation surfaces and a related 2-D multiresolution analysis. J. Math. Anal. Appl. 176, 561–586. Geronimo, J. S., Hardin, D. P., and Massopust, P. R. (1994). Fractal functions and wavelet expansions based on several scaling functions. J. Approx. Theory 78(3), 373–401. Hardin, D. P., Kessler, B., and Massopust, P. R. (1992). Multiresolution analyses and fractal functions. J. Approx. Theory 71, 104–120.
SCALING FUNCTION AND MULTIWAVELET CONSTRUCTION
251
Hardin, D. P., and Marasovich, J. A. (1999). Biorthogonal multiwavelets on [−1, 1]. Appl. Comput. Harmonic Anal. 7, 34–53. Hardin, D. P., and Roach, D. W. (1998). Multiwavelet prefilters I: Orthogonal prefilters preserving approximation order p ≤ 2. IEEE Trans. Circuits Syst. II: Analog Digital Signal Processing 45(8), 1106–1112. Hubbard, B. B. (1998). The World According to Wavelets. Wellesley, MA: A K Peters. Jia, R. (1995). Refinable shift-invariant spaces: From splines to wavelets. Approx. Theory VIII 2, 179–208. Kessler, B. (2000). A construction of orthogonal compactly-supported multiwavelets on R2. Appl. Comput. Harmonic Anal. 9, 146–165. Kessler, B. (2002). A construction of compactly-supported biorthogonal scaling vectors and multiwavelets on R2. J. Approx. Theory 117(2), 229–254. Mallat, S. G. (1998). A Wavelet Tour of Signal Processing. San Diego: Academic Press. Massopust, P. R. (1990). Fractal surfaces. J. Math. Anal. Appl. 151, 275–290. Strang, G., and Strela, V. (1994). Short wavelets and matrix dilation equations. IEEE Trans. Signal Processing 33, 2104–2107.
This Page Intentionally Left Blank
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 124
Diffraction Tomography for Turbid Media CHARLES L. MATSON Directed Energy Directorate, Air Force Research Laboratory, Kirtland AFB, New Mexico 87117
I. Introduction . . . . . . . . . . . . . . . . . . . . . . II. Background . . . . . . . . . . . . . . . . . . . . . . A. Computed Tomography . . . . . . . . . . . . . . . . 1. Forward Model . . . . . . . . . . . . . . . . . . 2. Filtered Backprojection . . . . . . . . . . . . . . . B. Standard Diffraction Tomography . . . . . . . . . . . . 1. Forward Model . . . . . . . . . . . . . . . . . . 2. Filtered Backpropagation . . . . . . . . . . . . . . III. Diffraction Tomography for Turbid Media: The Forward Model . A. Absorptive Objects . . . . . . . . . . . . . . . . . . B. Scattering Objects . . . . . . . . . . . . . . . . . . C. Absorptive and Scattering Objects . . . . . . . . . . . . IV. Backpropagation in Turbid Media . . . . . . . . . . . . . A. Single-View Backpropagation . . . . . . . . . . . . . B. Resolution Enhancement . . . . . . . . . . . . . . . C. Object Localization . . . . . . . . . . . . . . . . . . D. Laboratory Data Reconstruction Examples . . . . . . . . 1. Frequency-Domain Data . . . . . . . . . . . . . . 2. CW Data . . . . . . . . . . . . . . . . . . . . . E. Multiple-View Backpropagation . . . . . . . . . . . . V. Signal-to-Noise Ratios . . . . . . . . . . . . . . . . . . A. SNR Derivations . . . . . . . . . . . . . . . . . . . 1. Assumptions . . . . . . . . . . . . . . . . . . . 2. SNR Derivation for CW Illumination . . . . . . . . . 3. SNR Derivation for Modulated Illumination . . . . . . 4. Comparison of Modulated and CW Illumination SNRs . . 5. Laboratory Data Validation . . . . . . . . . . . . . B. SNR Example . . . . . . . . . . . . . . . . . . . . VI. Concluding Remarks . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
253 259 260 260 262 263 263 266 268 271 274 278 281 282 288 295 303 304 310 313 316 317 317 320 322 327 329 334 338 339
I. Introduction Optical diffusion tomography (ODT) is an emerging technology that uses optical radiation (typically in the 700- to 1300-nm regime) to image and determine the material properties of the heterogeneous structure of turbid (i.e., scattering) media such as human tissue. Other technologies used to image the human body 253 Copyright 2002, Elsevier Science (USA). All rights reserved. ISSN 1076-5670/02 $35.00
254
CHARLES L. MATSON
include X-ray tomography, emission computed tomography, ultrasonic computed tomography, and magnetic resonance imaging (Kak and Slaney, 1988). Currently, X-ray tomography, also known as computed tomography (CT), is the dominant technology employed to image the interior of the human body (Kak and Slaney, 1988). X-ray radiation has many significant benefits, including providing very high resolution as a result of low levels of scattering and diffraction that are easily masked, excellent selectivity as to which parts of the body to image, and well-understood physics relating the measured data to the underlying physical properties. Even though CT is well established as an imaging modality, there are at least three reasons to consider using optical radiation instead of X-ray radiation for imaging the human body. One reason is that optical radiation can provide functional information about the body’s processes, so that the images obtained can be directly related to the health of tissues and organs. A second reason is that optical radiation, unlike X-ray radiation, is nonionizing and thus potentially safer. A third reason is that ODT systems can be manufactured for a fraction of the cost of CT systems. There are several disadvantages to using ODT instead of CT for imaging. One is that ODT uses scattered light for imaging, whereas CT uses unscattered light. The use of scattered light results in markedly decreased spatial resolution in the reconstructed images (Ishimaru, 1978a). Another disadvantage is that the complexity underlying the conversion of measured data to images and material properties is significantly greater with ODT (Gutman and Klibanov, 1994; Ishimaru, 1989; Profio, 1989). Despite these disadvantages, however, the advantages of ODT have motivated years of research on how to maximize its benefits while minimizing its disadvantages. Research into using optical radiation to image portions of the human body has been conducted for decades, including efforts to diagnose breast lesions by using continuous-wave (CW) illumination, where CW illumination is defined as light that has a constant intensity level as a function of time (Catler, 1929). A major failing in these early attempts was their emphasis on trying to interpret the data, which consisted primarily of scattered light, by applying concepts meaningful for unscattered light. Because of the amount of scattering that light undergoes in tissue, these earlier approaches did not work well. A significant new mind-set began appearing in the 1970s and 1980s, when researchers started exploring the possibility of using the scattered light itself to generate images of tissue by modeling the scattering processes in tissue (Groenhuis et al., 1983; Hemenger, 1977; Jacques and Prahl, 1987; Wilson and Adam, 1983; Yoon et al., 1987). These new models were based on describing the temporal and spatial properties of photons as they migrate in turbid media. The foundational theory for this photon migration is scalar transport theory with appropriate simplifications (Ishimaru, 1978b). The most widely used simplification to date is the diffusion equation (Ishimaru, 1978b).
DIFFRACTION TOMOGRAPHY FOR TURBID MEDIA
255
These approaches to modeling use macroscopic descriptions of turbid media, typically characterized by the absorption and scattering properties of the media. Another approach to modeling light scattering in tissue is the Monte Carlo approach (Flock et al., 1989; Haselgrove et al., 1991). In this approach, each particle in a turbid medium is modeled statistically, and each photon passing through the medium is tracked while its interaction with individual particles is accounted for. The most general types of models are those based on Monte Carlo methods, whose benefits include the straightforward inclusion of arbitrary boundaries and turbid medium properties. One disadvantage of these methods is that they are computationally intensive. Another disadvantage is that it is difficult to build image-reconstruction algorithms based on them. Models that use the macroscopic properties of turbid media without requiring closed-form solutions have benefits that include much faster computation times than those for Monte Carlo methods while the ability to incorporate somewhat general boundary conditions is retained (Arridge, 1999). They can be used to generate images by using partial differential equation solution methods (Hielscher, Klose, et al., 1998; Paulsen and Jiang, 1995). A disadvantage to these algorithms is that the time required to generate an image can be unacceptably long for clinical applications. Another disadvantage is that because no closed-form solution is possible without further simplifications, using these models to answer such questions as the uniqueness of solutions and the effects of noise on spatial resolution is cumbersome and requires numerous computer simulations (Boas, O’Leary, et al., 1997; Pogue and Patterson, 1996). As a result, further model simplifications, which are described next, have been explored to mitigate these disadvantages. The highest level of simplification in models discussed in this article is the simplification of the diffusion equation that assumes that the heterogeneities in the turbid medium only weakly perturb the illuminating radiation. There are two primary approaches to simplifying the diffusion equation under this assumption: the Born approximation and the Rytov approximation (Kak and Slaney, 1988). It is generally understood that the Rytov approximation holds for stronger levels of perturbations than those for which the Born approximation holds, but both are often used (O’Leary et al., 1995b). Making either of these assumptions allows a closed-form solution to the diffusion equation to be generated and used to derive fast image-reconstruction algorithms. In addition, these solutions can be used to develop closed-form solutions that permit important fundamental image-reconstruction issues to be explored as mentioned previously: uniqueness of solutions and the effect that noise has on spatial resolution in the reconstructed images. Along with the weak perturbation assumption required for this class of models, simplified boundary conditions are needed, such as no boundaries, a single planar boundary, or two
256
CHARLES L. MATSON
Figure 1. Descriptions of three fundamental ways to model light propagation in tissue.
parallel planar boundaries. The relationships among these modeling methods is summarized in Figure 1. In addition to modeling the scattered light to obtain higher-resolution images, researchers have explored illumination methods besides CW illumination. The two main categories of illumination methods are called time-domain imaging and frequency-domain imaging. Time-domain imaging methods use short (picosecond or femtosecond) pulses of light and correspondingly fast detection systems. These techniques use the time-resolved scattered detected light as a function of location as inputs to reconstruction algorithms that use models relating the measured data to the underlying spatially resolved tissue heterogeneities (Alfano et al., 1994; Benaron et al., 1995; Cai et al., 1999; Das et al., 1993; Patterson et al., 1989; Schmidt et al., 2000). Most often the model used to reconstruct images by using time-domain data is the time-dependent diffusion equation. Another time-domain imaging method seeks to use the unscattered (or ballistic) light by gating out scattered light by using a variety of discriminating methods including time gating, polarization gating, angular gating, and others (Feng et al., 1994; Gayen and Alfano, 1996; Hee et al., 1993; Lyubimov, 1994; Mills et al., 2001; Winn et al., 1998). Because the amount of unscattered light becomes too small to be useful for tissues whose optical depths are greater than about 30 scattering lengths, the use of unscattered light to image the human body is restricted to optically thin parts of the body. The second way of illuminating tissue, called frequency-domain imaging, uses light that is amplitude modulated, typically at tens or hundreds of megahertz (Boas, Campbell, et al., 1995; Jiang et al., 1997a, 1997b; Kn¨uttel et al., 1993; O’Leary et al., 1995a; Pogue et al., 1999; Yao et al., 1997). This type of illumination generates a time-varying sinusoidal distribution of photons in the turbid medium called a diffuse photon density wave (DPDW). Although the
DIFFRACTION TOMOGRAPHY FOR TURBID MEDIA
257
DPDW consists of scattered light, it nevertheless maintains well-defined phase fronts created by the amplitude-modulated illumination. At a given spatial location in the detected data, it is the relative amplitude and phase of the DPDW that carries the information about the underlying heterogeneous structure of the medium. Frequency-domain imaging methods also use the time-dependent diffusion equation as the model from which to generate image-reconstruction algorithms. Because DPDWs are typically modulated at a single temporal frequency for a given data set, the time dependence of the diffusion equation is purely sinusoidal and thus the diffusion equation can be simplified to the Helmholtz equation (Kak and Slaney, 1988). Most approaches to generating images by using the Helmholtz equation have focused on using finite element methods. However, this problem has also been explored in the context of diffraction tomography (DT), which is the main focus of this article. DT theory was first proposed by Wolf (1969) as a means to determine the structure of three-dimensional semitransparent objects. In this article, this version of DT is called standard DT to distinguish it from DT for turbid media. Standard DT methods seek to reconstruct the inhomogeneous structure of the medium through which the illuminating radiation passes. The mathematical foundation for both types of DT theory is the Helmholtz equation with the additional assumption that either the Born approximation or the Rytov approximation is valid. In addition, it is usually assumed that the inhomogeneous structure of the medium to be imaged has finite support (i.e., is contained in a bounded region of space) and is imbedded in an otherwise homogeneous background that is usually assumed to be infinite. For simplicity, the inhomogeneous structure is called the object to be imaged. Finally, standard DT theory assumes that the properties of the background medium in which the object is imbedded are known or can be determined so that the homogeneous contribution to the measured data can be subtracted to leave only the radiation scattered by the object. The range of application of standard DT has been extended beyond its initial concept to include many other areas, including geophysical and medical imaging. Standard DT theory has also been developed for scenarios including arbitrary illumination and detection geometries (Devaney, 1986), near-field effects (Schatzberg and Devaney, 1992), objects imbedded in dispersive attenuating backgrounds (Devaney, 1987), for limited views (Devaney, 1989), and for limited data (Chen et al., 2000). There are two primary image-reconstruction methods in standard DT. The first is Fourier interpolation, in which the Fourier transform of the underlying object is reconstructed from regions of Fourier domain data obtained from a series of projection data collected by the experimenter by illuminating the medium from various angles (Pan and Kak, 1983). Once the complete Fourier transform is obtained (or as much of it as possible), the desired image is reconstructed by inverse Fourier transforming the result.
258
CHARLES L. MATSON
The second method is filtered backpropagation, in which each projection data set is individually reconstructed back into the image domain. A key aspect of standard DT is the assumption that the radiation propagating through the turbid medium in the absence of an imbedded object undergoes only phase delays, not attenuation. Mathematically, this means that the wave number of the radiation is real. DT for turbid media, developed in this article, is an extension of standard DT that also uses the Helmholtz equation to model the effects of the interaction of a DPDW with a turbid medium, albeit with significant differences. One of these differences is that for standard DT the Helmholtz equation models the wave nature of the illuminating radiation itself and attempts to reconstruct an image of an object that scatters the illuminating radiation. Without an imbedded object, no radiation is scattered. In DT for turbid media, all the illuminating radiation that creates the DPDW is scattered by the turbid medium, whether or not an object is inside. However, when frequency-domain imaging is being implemented, the DPDW formed by the amplitude-modulated illumination maintains well-defined phase fronts even though the DPDW itself consists of light that is scattered with respect to the wave nature of the illuminating radiation itself. When no object is imbedded in the medium, the DPDW does not undergo scattering. When an object is imbedded, the DPDW is scattered by the object. A second difference between standard DT and DT for turbid media is that the DPDW has a wave number that is complex; that is, the DPDW undergoes attenuation as well as phase shifts as it propagates through the medium. A third difference is that in standard DT the wave number corresponds to the wavelength of the illuminating radiation in the medium, whereas in DT for turbid media, the wave number corresponds to the wavelength of the DPDW. Because of the complex wave number, the mathematical formulation of DT for turbid media is more complicated. A final difference is that the DPDW oscillates about an average value that is greater than or equal to the amplitude of the DPDW so that the light intensities in the medium are nonnegative at all times. This fact affects the signal-to-noise ratio (SNR) properties of the measured data, as is discussed later. For standard DT, the illumination wave oscillates about zero. Some of the first publications proposing the use of DT for turbid media were by Hielscher Tittel et al. (1994, 1995), which discussed modeling the effects of spherical objects on DPDWs and the possibility of solving the associated inverse problem. Other published work in this area includes papers on backpropagation algorithms (Braunstein and Levine, 2000; Liu et al., 1999; Matson, Clark, et al., 1997; Matson and Liu, 1999b; Pattanayak and Yodh, 1999), analytic inversion algorithms (Chen et al., 1998; Cheng and Boas, 1998; Durduran et al., 1999; Li, Durduran, et al., 1997; Li, Pattanayak, et al., 2000; Schotland, 1997), a Fourier inversion algorithm (Norton and
DIFFRACTION TOMOGRAPHY FOR TURBID MEDIA
259
Vo-Dinh, 1998), and more-comprehensive analyses of the forward model (Matson, 1997; Matson and Liu, 1999a). As already mentioned, in this article, DT for turbid media is developed and demonstrated. The outline of the article is as follows: In Section II, brief mathematical descriptions of CT and standard DT are presented to show the evolution of tomography theory from CT to standard DT to DT for turbid media. In Section III, the forward model for DT for turbid media is presented. The Fourier diffraction theorem in turbid media is presented and compared with the Fourier diffraction theorem for standard DT. A backpropagation algorithm valid for use in turbid media is developed in Section IV and compared with the filtered backpropagation algorithm in standard DT. It is shown that the turbid media backpropagation algorithm can be used to increase the spatial resolution in the measured data as well as to locate an object three dimensionally from a single view angle. The dependence of the spatial resolution improvement on the noise levels in the data is quantified in this section. Because noise plays such a dominant role in the amount of spatial resolution in the reconstructed images, SNR expressions are derived in Section V for two types of illuminating radiation: DPDWs and CW illumination. Two separate expressions are needed because two different types of imaging systems are used. These SNR expressions can be used to determine the effects of system, background, and object parameters on the quality of the measured and reconstructed data. An example is given to illustrate that high-frequency DPDWs produce higher spatial resolution in measured data, but when deconvolution is employed in the reconstruction process, CW illumination produces higher spatial resolution in the reconstructed images. Because deconvolution is incorporated into virtually all existing ODT imaging algorithms, CW illumination is preferred when the highest possible spatial resolution is desired. Finally, conclusions are presented in Section VI. II. Background In this section, two types of tomographic forward models are outlined and their corresponding inverse solutions are presented. The first is CT, which uses nondiffracting sources to probe a medium in order to reconstruct the interior of the medium. The second is standard DT, which uses diffracting illumination sources whose propagation must be described by using the wave equation and which seeks to image objects that perturb the illumination. Because DT for turbid media is an extension of standard DT, which itself is an extension of CT, the background presented in this section will prove useful in generating insight into the complexities encountered in DT for turbid media.
260
CHARLES L. MATSON
The inverse solutions to CT and standard DT presented in this section are called filtered backprojection and filtered backpropagation, respectively. Although these terms look similar, they refer to two distinct processes, as is described in more detail in this section. Briefly, the filtered backprojection algorithm applies to data collected by using nondiffracting illumination sources, whereas the filtered backpropagation algorithm applies to data collected by using diffracting sources. The word filtered in both cases refers to functions in the inverse solutions that multiply the Fourier transform of the measured data as part of the reconstruction processes, although these functions are different for the two processes.
A. Computed Tomography In this subsection, the forward model of CT and its associated filtered backprojection inverse solution are presented. Both the forward model and the filtered backprojection inverse solution are presented in two-dimensional notation because many applications of CT use one-dimensional sources and detectors to create two-dimensional slices of a medium’s interior structure. Because the illumination sources can be modeled accurately as nondiffracting, a twodimensional model works well even when the medium being probed is threedimensional because a one-dimensional source distribution will illuminate only a two-dimensional slice of the medium. 1. Forward Model Consider the scenario shown in Figure 2. The object o(x, z) is illuminated by a line source that is at an angle φ to the x axis and emits collimated radiation (rays) orthogonal to the line source. There are other geometries such as fan beam and cone beam geometries (Kak and Slaney, 1988), but line source geometry is commonly used to discuss the principles of CT and leads directly to an important theorem called the Fourier slice theorem. In CT, the object function o(x, z) is the two-dimensional spatial distribution of the attenuation of the X-ray radiation due to the photoelectric and Compton effects. It can be shown (Kak and Slaney, 1988) that the detected radiation uφ (ρ) is related to o(x, z) by the following equation: o(x, z) ds (1) u φ (ρ) = A exp − rays
where ρ = x cos φ + z sin φ, A is the intensity of the line source, and ds is a
DIFFRACTION TOMOGRAPHY FOR TURBID MEDIA
261
Figure 2. System geometry for computed tomography (CT) imaging.
differential element along the ray direction. From Eq. (1), it follows that o(x, z) ds pφ (ρ) ≡ rays
A = ln u φ (ρ)
(2)
where pφ (ρ) is called the projection of o(x, z) onto the line ρ. It can be seen from Eq. (2) that the desired information o(x, z) is encoded in the measured data pφ (ρ) by means of line integrals of o(x, z). Fourier transforming pφ (ρ) yields the Fourier slice theorem: o(x, z) exp[−iωρ (x cos φ + y sin φ)] d x dz Pφ (ωρ ) = (3) = O(ωρ cos φ, ωρ sin φ) where ωρ is the radian Fourier domain variable corresponding to ρ and i = √ −1. In this subsection and elsewhere in this article, spatial-domain quantities are denoted by lowercase variables and their Fourier transforms are denoted by the corresponding uppercase variables. In words, the Fourier slice theorem states that the Fourier-transformed one-dimensional projection of the twodimensional object is related to the two-dimensional Fourier transform of the
262
CHARLES L. MATSON
object. In particular, the Fourier-transformed projection is equal to a onedimensional slice of the Fourier transform of the object. This slice is along a line in the object’s Fourier space that passes through the origin and is oriented at the same angle φ to the horizontal Fourier axis as the angle at which the projection data are oriented to the horizontal axis in image space. The Fourier slice theorem has a number of implications. One concerns uniqueness of solutions given a set of projection data. Because each projection provides only a one-dimensional slice of the two-dimensional Fourier transform of o(x, z), a set of projections around the object is necessary to obtain a unique solution. The number of views and their spacing are determined by the Nyquist sampling theorem (Matson, Magee, et al., 1995; Stark, 1979; Stark and Wengrovitz, 1983). A second implication relates to image-reconstruction methods. From Eq. (3), it can be seen that the object can be reconstructed by Fourier transforming all the projections, interpolating their values onto a rectangular grid, and inverse Fourier transforming the result. A more popular approach is to take each projection separately, reconstruct its contribution in the image domain, and sum all contributions from all projections. This approach, called filtered backprojection, is discussed next. 2. Filtered Backprojection The filtered backprojection approach to reconstructing an image from its projections is a popular approach because it provides accurate reconstructions and can be implemented at high speed. It is derived by taking the definition of the Fourier transform of o(x, z) in rectangular coordinates and converting it to polar coordinates, which gives o(x, z) =
0
π
∞
−∞
Pφ (ωρ )|ωρ | exp(iωρ ρ) dωρ
(4)
The algorithm is called backprojection because it backprojects the Fourier transform of the measured data into image space. It is referred to as the filtered backprojection algorithm because of the |ωρ | factor in the integrand that filters the Fourier transform of the projection data. The |ωρ | factor appears as a result of the change of variables procedure that converted a rectangular coordinate description of O(ωx , ωz) to a polar coordinate description. In Fourier space, Eq. (4) indicates that the measured projection data are Fourier transformed and multiplied (filtered) by the |ωρ | factor before being inverse Fourier transformed to complete the filtered backprojection operation. In contrast, the filtered backpropagation algorithm that is the DT equivalent of the filtered backprojection algorithm requires that the filtered Fourier transform of the measured data be multiplied by a backpropagation term as well as a filtering term prior to
DIFFRACTION TOMOGRAPHY FOR TURBID MEDIA
263
the inverse Fourier transform operation. This is discussed in more detail in Section II.B.
B. Standard Diffraction Tomography

In this subsection, the forward model and the associated filtered backpropagation reconstruction algorithm of standard DT are presented. The types of radiation used in DT applications include ultrasonic, optical, and microwave sources. The key aspect with DT, as compared with CT, is that the wave nature of the illumination source as well as the scattering and diffraction undergone by the radiation as a result of interacting with the object must be taken into account in the forward model and thus also in the reconstruction algorithms. This necessitates the use of the (scalar) wave equation (or its associated Helmholtz equation) to build the forward model instead of the use of line integrals. The theory in this subsection is presented for a two-dimensional geometry as is commonly found in standard DT literature. The extension to three dimensions is straightforward. Unlike for CT, when DT theory in two dimensions is applied to a three-dimensional problem, the object's properties to be reconstructed should not vary along the dimension not reconstructed because the illuminating radiation fills a three-dimensional volume.

1. Forward Model

The geometry for which standard DT theory is developed in this subsection is shown in Figure 3. The object to be imaged is imbedded in a background medium such as free space or the earth. Often, the desired image is not of an imbedded object per se but rather of the medium's spatially varying properties. In the latter case, the object to be imaged is the difference between the medium's properties and its spatially averaged values. In general, the illuminating sources can have any desired spatial structure, but they are assumed to produce monochromatic radiation. For clarity, the two-dimensional coordinate system is chosen so that the detection line is orthogonal to the z axis. The geometry shown in Figure 3 is known as a transmission geometry because the illumination sources and the detection system are on opposite sides of the object to be imaged. The mathematical foundation for the description of the forward model for standard DT in this subsection is the first Born approximation to the Helmholtz equation, which describes the radiation us(x, z) scattered by an object (Kak and Slaney, 1988):

$$\left(\nabla^2 + k_o^2\right) u_s(x, z) = -o(x, z)\, u_o(x, z) \tag{5}$$
Figure 3. System geometry for standard diffraction tomography (DT) imaging.
where k = 2π/λ is the wave number of the illuminating radiation, λ is the wavelength of the illumination radiation, uo(x, z) is the illuminating radiation field, and o(x, z) is the spatially dependent object property to be reconstructed. More specifically, us(x, z) is the difference between the data that were measured with the object imbedded in the medium and uo(x, z), the data that would have been measured if the object were not present. The relationship of o(x, z) to the underlying physical properties of the object depends on the particular application. As an example, when the variations of the complex index of refraction n(x, z) from the background value of one are the properties of interest, o(x, z) is given by

$$o(x, z) = k^2\left[n^2(x, z) - 1\right] \tag{6}$$
The first Born approximation to the scattered field is used to provide a simplified integral equation solution for us(x, z) and requires that the radiation scattered by the object be small compared with uo(x, z). Another approximation that has been used in standard DT is the Rytov approximation, which appears to be valid for stronger levels of scattering (Kak and Slaney, 1988). The Born approximation is used in this discussion for clarity in understanding.
The integral equation solution to Eq. (5) is given by

$$u_s(x, z) = \int\!\!\int o(x', z')\, u_o(x', z')\, g(x - x', z - z')\, dx'\, dz' \tag{7}$$
where g(x, z) is the Green's function associated with Eq. (5) and the appropriate boundary conditions. It can be seen from Eq. (7) that if the Green's function is known as well as the illumination function for all space, us(x, z) can be calculated everywhere. Let us consider now the case in which us(x, z) is measured by a linear detector, which makes available the values of us(x, z) on the line z = zo. In addition, it is assumed that uo(x, z) is a plane wave. This latter assumption permits development of the Fourier diffraction theorem, the standard DT equivalent of the Fourier slice theorem. The development begins by applying these assumptions to Eq. (7) and Fourier transforming the one-dimensional measured data to produce the following mathematical description of the Fourier diffraction theorem:

$$U_s(\omega_x; z_o) = \frac{i}{2\sqrt{k^2 - \omega_x^2}}\, \exp\!\left(i z_o \sqrt{k^2 - \omega_x^2}\right) O\!\left(\omega_x,\, \sqrt{k^2 - \omega_x^2} - k\right) \tag{8}$$
where it is typically assumed that k ≥ |ωx| and ωx is the spatial-frequency variable corresponding to x. When the inequality is reversed, evanescent waves result but are usually neglected because their contribution to the measured data is negligible when the separation of the object and the detection plane is greater than approximately 10λ. The significant aspect of Eq. (8) is that the one-dimensional Fourier transform of the measured data is algebraically related to a portion of the two-dimensional Fourier transform of the imbedded object. It can be shown that in the limit as the radiation wavelength becomes very small, the Fourier diffraction theorem simplifies to the Fourier slice theorem (Kak and Slaney, 1988). Examination of the arguments of O(ωx, √(k² − ωx²) − k) shows that the Fourier transform of the measured data contains information about the Fourier transform of the object properties on a circle. In three dimensions when a two-dimensional planar measurement is made, the Fourier transform values lie on a sphere. This indicates, once again, that multiple view angles of the object are necessary to uniquely reconstruct the full three-dimensional object values. As for CT, there are two main approaches to reconstructing the object values: Fourier domain interpolation and filtered backpropagation. The Fourier domain interpolation approach is similar to that carried out in CT, albeit more complicated because of the more complicated shape of the support of the Fourier transform values. Likewise, the filtered backpropagation approach is more complicated than the filtered backprojection approach, which reflects the more complicated forward model needed for diffracting sources and objects.
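As a simple numerical illustration of the Fourier diffraction theorem, the short sketch below evaluates the locus of object-space spatial frequencies (ωx, ωz) = (ωx, √(k² − ωx²) − k) supplied by a single view; the wavelength and sampling values are arbitrary assumptions chosen only to show the semicircular shape of the covered support.

```python
import numpy as np

# Assumed illustrative parameters (not from the text)
wavelength = 0.5e-3          # illumination wavelength in meters
k = 2 * np.pi / wavelength   # wave number of the illuminating radiation

# Propagating (non-evanescent) spatial frequencies satisfy |omega_x| <= k
omega_x = np.linspace(-k, k, 501)

# Fourier diffraction theorem, Eq. (8): one view supplies O on the arc
# omega_z = sqrt(k^2 - omega_x^2) - k, a semicircle passing through the origin
omega_z = np.sqrt(k**2 - omega_x**2) - k

# The covered arc stays within a low-pass region of radius sqrt(2) * k
radius = np.sqrt(omega_x**2 + omega_z**2)
print(f"maximum |Omega| on the arc: {radius.max():.3e}  (sqrt(2)*k = {np.sqrt(2)*k:.3e})")
```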
2. Filtered Backpropagation

The filtered backpropagation algorithm was originally developed by Devaney (1982). As for the filtered backprojection algorithm, the filtered backpropagation algorithm is derived by describing o(x, z) in terms of its Fourier transform on a rectangular coordinate system and making a change of Fourier variables to most naturally accommodate the region of Fourier space that contains the Fourier-transformed measured data. For the backprojection algorithm, the change of Fourier variables results in a conversion of the rectangular coordinate system to a polar coordinate system. For the backpropagation algorithm, the change-of-variables process is more complicated and results in (Kak and Slaney, 1988)

$$o(\xi, \eta) = \frac{ik}{4\pi^2}\int_0^{2\pi}\!\!\int_{-k}^{k} |\omega_\xi|\, U_s(\omega_\xi; \eta_o)\, \exp\!\left[-i(\eta_o - \eta)\sqrt{k^2 - \omega_\xi^2}\right] \exp(-ik\eta)\, \exp(i\omega_\xi \xi)\, d\omega_\xi\, d\phi \tag{9}$$

The geometry underlying Eq. (9) is shown in Figure 4, where (ξ, η) is a rotated version of the (x, z) axis, ωξ is the spatial-frequency variable corresponding to ξ, and where φ is the rotation angle. The detector array is parallel to the
Figure 4. System geometry for filtered backpropagation in standard DT.
ξ axis. Similar to the filtered backprojection algorithm, the filtered backpropagation algorithm consists conceptually of two pieces. The first piece is the filtered portion that arises from carrying out the change of variables to convert from a rectangular coordinate system to one that naturally fits the shape of the support of the measured Fourier data. The second piece is the backpropagation portion where, for each angle φ, the data measured at that angle are backpropagated throughout the reconstructed image plane. It can be shown (Devaney, 1986) that the filtered backpropagation operation can be carried out by Fourier transforming the measured data, multiplying the result by a filter and a backpropagation transfer function, and then inverse Fourier transforming the filtered and backpropagated Fourier data. One term in Eq. (9) is called the backpropagation transfer function, Hb(ωξ; ηo − η). It is given by

$$H_b(\omega_\xi; \eta_o - \eta) = \exp\!\left[-i(\eta_o - \eta)\sqrt{k^2 - \omega_\xi^2}\right] \tag{10}$$
where ηo is the distance along the η axis from the origin to where the data were measured. Notice that the backpropagation operation in standard DT corresponds to modifying the Fourier phase of the Fourier transform of the measured data as a function of the distance (η − ηo) of the reconstructed data from the measurement line. Because only the phase is modified in the backpropagation operation and because the backpropagation operation is carried out only over a low-pass region in Fourier space, the backpropagation operation is inherently a well-posed inversion with respect to reconstructing a low-passfiltered version of o(x, z). It is shown in Section IV that the backpropagation operation in turbid media exponentially amplifies the Fourier amplitudes as well as altering the Fourier phases of the measured data and is therefore a much more poorly conditioned inverse problem. Significant differences exist between the filtered backprojection and the filtered backpropagation algorithms. One difference is that the filter function in the filtered backpropagation algorithm consists of more than just the absolute value of the relevant spatial frequency, as is the case in filtered backprojection. There is a second factor, exp(−ikη), which backpropagates the phase of the illuminating plane wave. Another difference is that there is an actual backpropagation transfer function, Eq. (10), in the filtered backpropagation algorithm, whereas in filtered backprojection there is no backprojection transfer function; equivalently, the backprojection transfer function is unity everywhere. A final significant difference between the filtered backprojection algorithm and the filtered backpropagation algorithm is that the filtered backprojection algorithm’s assumption of plane-wave illumination, linear or planar detectors, and a complete set of projections for all 360◦ around the object can be relaxed
while a straightforward generalized filtered backprojection algorithm can still be developed. In contrast, equivalent modifications to the filtered backpropagation algorithm for non-plane-wave illumination, as well as nonlinear or nonplanar detector arrays, are much more complicated and can restrict or even make the filtered backpropagation algorithm invalid. In addition, there are applications of standard DT in which objects can be viewed from only a limited number of angles, such as imaging objects imbedded in the ground. For these cases, the backpropagation part of the algorithm, using Eq. (10), can be used to generate an approximate image of the object. For further refinement of the image, nonlinear image-processing algorithms that can incorporate these generalizations as well as additional prior information such as the support of the object can be used. It is shown in the next section that the forward model for standard DT for a turbid medium is sufficiently complicated that no filtered backpropagation algorithm has yet been developed. Instead, only the backpropagation portion of the algorithm is available.

III. Diffraction Tomography for Turbid Media: The Forward Model

In this section, the forward model for DT for turbid media is developed in some detail (Matson, 1997; Matson and Liu, 1999a). The corresponding solution to the inverse problem by using backpropagation is developed in Section IV. As discussed in Section I, DT for turbid media is used to model the wave behavior of a DPDW inside a turbid medium. In particular, DT for turbid media emphasizes modeling the portion of the illuminating DPDW scattered by an imbedded object. It is important to remember that in the context of DT for turbid media the term scattered refers to the portion of the DPDW that is perturbed by the presence of an imbedded object, not the scattering that is undergone by the illumination with respect to the wave nature of the light. As a way to minimize confusion, the term scattered light is used to refer to the latter case, and the term scattered DPDW to refer to the former case. In addition, the term measured scattered DPDW refers to the portion of the scattered DPDW that is measured by the detector, whereas the term backpropagated scattered DPDW refers to the result of applying the turbid media backpropagation algorithm to the measured scattered DPDW (see Section IV). The scattered DPDW is the difference between the data measured with an object imbedded in the background medium and the data measured without an imbedded object. In a manner similar to that for standard DT, the Helmholtz equation is used to describe the spatial properties of the scattered DPDW inside the turbid medium. Although the development in this section assumes that the illumination source is amplitude modulated, the results also apply to CW illumination and are obtained by setting the modulation frequency variable equal to zero. For this
latter case, the term DPDW is still used to describe the illumination because the same equations are used, but only amplitude information is available because the illumination is not modulated. For nonzero modulation frequencies, the Helmholtz equations in this article are used to model just the modulated part of the DPDW, not its time-averaged component. This latter component is neglected because when modulated illumination is used, only the amplitude and phase of the modulated portion of the scattered DPDW are used in the reconstruction process. The time-averaged component is removed, either by ac-coupled detectors or with postprocessing.

The forward model is developed in three steps in this section: first for objects whose absorption properties differ from the background turbid medium value but whose scattering properties do not, second for objects whose scattering properties differ from the background turbid medium value but whose absorption properties do not, and third for objects whose absorption and scattering properties both differ from the background turbid medium values. The result of the development in this section is an expression for the two-dimensional Fourier transform of the scattered DPDW measured in a plane. This expression is then simplified for the case of plane-wave illumination to derive the turbid media version of the Fourier diffraction theorem. The development is carried out by starting with the general Helmholtz equation that describes the spatial behavior of the photons in the turbid medium (the photon fluence), then simplifying it by using the Born approximation, and then carrying out further manipulations to obtain the Fourier transform of the scattered DPDW measured in a plane. The general Helmholtz equation used to model the wave nature of a DPDW in an inhomogeneous medium is given by (Ye et al., 1998)

$$\nabla\cdot[D(x, y, z)\nabla u(x, y, z)] + \left[-\mu_a(x, y, z) + i\,\frac{2\pi f_t}{v}\right] u(x, y, z) = -S(x, y, z) \tag{11}$$
where u(x, y, z) is the total photon fluence in the medium due to the DPDW, D(x, y, z) is the diffusion coefficient and is given by [3μ′s (x, y, z)]−1 , μa (x, y, z) and μ′s (x, y, z) are the absorption and reduced scattering coefficients of the medium, ft is the modulation frequency of the DPDW, v is the speed of light in the medium, and S(x, y, z) is the source term that is assumed to be isotropic. The reduced scattering coefficient μ′s (x, y, z) = (1 − g)μs (x, y, z), where μs (x, y, z) is the scattering coefficient of the medium and g is the anisotropy factor, which is typically greater than 0.9 for human tissue (Ishimaru, 1978b). Equation (11) holds for strong or weak scattering. When the scattering and absorption coefficients are independent of x, y, and z, the medium is said
to be homogeneous and Eq. (11) simplifies to

$$\left(\nabla^2 + k^2\right) u(x, y, z) = -\frac{1}{D}\, S(x, y, z) \tag{12}$$
where k is the complex wave number of the DPDW and is defined by

$$k^2 = \frac{-v\mu_a + i\,2\pi f_t}{vD} \tag{13}$$
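The following short sketch evaluates Eq. (13) numerically and checks which of the two limiting regimes discussed next applies; the tissue-like parameter values are assumptions chosen only for illustration.

```python
import cmath
import math

# Assumed illustrative background-medium parameters (not from the text)
mu_a = 0.05          # absorption coefficient, 1/cm
mu_s_prime = 10.0    # reduced scattering coefficient, 1/cm
v = 2.14e10          # speed of light in the medium, cm/s
f_t = 100e6          # DPDW modulation frequency, Hz

D = 1.0 / (3.0 * mu_s_prime)                                  # diffusion coefficient
k_squared = (-v * mu_a + 1j * 2 * math.pi * f_t) / (v * D)    # Eq. (13)
k = cmath.sqrt(k_squared)                                     # complex DPDW wave number

# Low-frequency DPDW: |Im(k^2)| << |Re(k^2)|; high-frequency DPDW: the converse
regime = "low frequency" if abs(k_squared.imag) < abs(k_squared.real) else "high frequency"
print(f"k = {k.real:.3f} {k.imag:+.3f}j 1/cm  ({regime} regime)")
```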
Two limiting categories of DPDWs can be determined from Eq. (13). When the modulation frequency of the DPDW, ft , is small enough that the imaginary term is much less than the real term in the numerator of Eq. (13), the DPDW is said to be low frequency. When ft is large enough that the converse is true, the DPDW is said to be high frequency. For typical values of human tissue, the modulation frequency must be tens or hundreds of megahertz to create a high-frequency DPDW. The physical geometry for the mathematical development is shown in Figure 5. It is a transmission mode geometry, where the illumination sources are on one side of the object and the detectors are on the other side. The object is assumed to have compact support inside the turbid medium, and the background turbid medium is assumed to be infinite. The absorption and scattering properties of the object are assumed to be small perturbations from the background medium, which allows a Born approximation to the solution of the Helmholtz equation to be applied. A discussion of how small the perturbations
Figure 5. Conceptual diagram showing the geometry for the turbid media DT development. The object is represented by the cube, which is assumed to be imbedded in an infinite homogeneous turbid medium. The front of the object is located at x = 0, y = 0, z = z 2 , and the detection plane is located at zo. The illumination apparatus and the detection apparatus are not shown. Depth in the medium is measured relative to the detection plane and, for a given value of z, is equal to zo − z.
must be for the Born approximation to be valid can be found in Kak and Slaney (1988). When the perturbations are not sufficiently small, the distorted Born approximation or higher levels of the Born approximation can be used. The Rytov approximation can be used instead of the Born approximation (Kak and Slaney, 1988); however, the Born approximation is again used in this case for ease of understanding. In addition, the scattered DPDW is assumed to be measured in a plane. These assumptions permit a particularly simple form for the forward model and parallel the assumptions underlying the development of standard DT. The assumptions can be generalized to nonplanar detection and noninfinite media with a resulting increase in complexity and decrease in conceptual understanding. The spatial distribution of the illumination sources can have any arbitrary shape, in general. Later in this section, the illumination source spatial structure is simplified to a plane wave, which permits development of the Fourier diffraction theorem for turbid media. In this case, the amplitude and phase of the plane wave are assumed to be one and zero, respectively, at z = z1. The detection plane is located at z = zo and the largest value of z inside the support of the object is z = z2.

The focus of the analysis in this section is on deriving expressions for the two-dimensional Fourier transform of the measured scattered DPDW. There are several reasons for deriving expressions in the Fourier domain. The first is that information on the uniqueness of solutions can easily be seen when we look at the support of the Fourier transform of the measured scattered DPDW. A second reason is that spatial resolutions in both the measured scattered DPDW and the backpropagated scattered DPDW can be determined from the Fourier transform of the measured scattered DPDW. A third reason is that the backpropagation algorithm, to be developed in Section IV, is best understood by using a Fourier domain transfer function approach. The solution for absorptive objects is derived in Section III.A, for scattering objects in Section III.B, and for objects that have both nonzero absorption and nonzero scattering properties in Section III.C. The approach followed in deriving the integral equation solutions to the Helmholtz equations in this section is attributed to O'Leary (1996).

A. Absorptive Objects

When the object is assumed to have only absorptive properties that are different from that of the background medium, the inhomogeneous Helmholtz equation, Eq. (11), can be written as

$$D\nabla^2 u(x, y, z) + \left[-\mu_a - \delta\mu_a(x, y, z) + i\,\frac{2\pi f_t}{v}\right] u(x, y, z) = -S(x, y, z) \tag{14}$$
where μa is the absorption coefficient and D is the diffusion coefficient of the background medium, and μa + δμa(x, y, z) is the absorption coefficient of the object. So that we can proceed, Eq. (14) is rearranged so that the terms associated with δμa(x, y, z) are combined with the source term on the right-hand side. This produces

$$\left(\nabla^2 + k^2\right) u(x, y, z) = -\frac{1}{D}\, S(x, y, z) + \frac{\delta\mu_a(x, y, z)}{D}\, u(x, y, z) \tag{15}$$
Now let

$$u(x, y, z) = u_o(x, y, z) + u_{s1}(x, y, z) \tag{16}$$
where uo(x, y, z) is the DPDW photon fluence in the medium without the object (the homogeneous portion of the fluence) and us1(x, y, z) is the scattered portion of the DPDW photon fluence due to δμa(x, y, z). Substituting Eq. (16) into Eq. (15) gives

$$\left(\nabla^2 + k^2\right)[u_o(x, y, z) + u_{s1}(x, y, z)] = -\frac{1}{D}\, S(x, y, z) + \frac{\delta\mu_a(x, y, z)}{D}\,[u_o(x, y, z) + u_{s1}(x, y, z)] \tag{17}$$

Because only the scattered DPDW is of interest, the homogeneous portion of the DPDW photon fluence is subtracted from the total DPDW fluence. Mathematically, this results in removing the homogeneous portion of the Helmholtz equation as described by Eq. (12) from Eq. (17), which gives

$$\left(\nabla^2 + k^2\right) u_{s1}(x, y, z) = \frac{\delta\mu_a(x, y, z)}{D}\,[u_o(x, y, z) + u_{s1}(x, y, z)] \tag{18}$$

The Born approximation states that |us1(x, y, z)| ≪ |uo(x, y, z)|, which permits the replacement of uo(x, y, z) + us1(x, y, z) in Eq. (18) with uo(x, y, z). Carrying out this approximation produces the desired form of the Helmholtz equation:

$$\left(\nabla^2 + k^2\right) u_{s1}(x, y, z) = -o_a(x, y, z)\, u_o(x, y, z) \tag{19}$$
where

$$o_a(x, y, z) = -\frac{\delta\mu_a(x, y, z)}{D} \tag{20}$$

The integral solution form of Eq. (19) is given by (Kak and Slaney, 1988)

$$u_{s1}(x, y, z) = \int\!\!\int\!\!\int o_a(x', y', z')\, u_o(x', y', z')\, g(x - x', y - y', z - z')\, dx'\, dy'\, dz' \tag{21}$$
where g(x, y, z), the Green's function for the infinite background medium, is given by

$$g(x, y, z) = \frac{\exp\!\left(ik\sqrt{x^2 + y^2 + z^2}\right)}{4\pi\sqrt{x^2 + y^2 + z^2}} = \frac{1}{8\pi^2}\int\!\!\int \frac{1}{\sqrt{\alpha_x^2 + \alpha_y^2 - k^2}}\, \exp\!\left[-|z|\sqrt{\alpha_x^2 + \alpha_y^2 - k^2} + i x\alpha_x + i y\alpha_y\right] d\alpha_x\, d\alpha_y \tag{22}$$
The second equality in Eq. (22) is the angular spectrum form of g(x, y, z) and is used to derive Us1(ωx, ωy; z), the two-dimensional Fourier transform of us1(x, y, z) with respect to x and y, and where ωx and ωy are radian spatial-frequency variables associated with x and y, respectively. Substituting Eq. (22) into Eq. (21) produces

$$u_{s1}(x, y, z) = \frac{1}{8\pi^2}\int\!\cdots\!\int o_a(x', y', z')\, u_o(x', y', z')\, \frac{1}{\gamma_\alpha}\, \exp\!\left[i(x - x')\alpha_x + i(y - y')\alpha_y - |z - z'|\gamma_\alpha\right] d\alpha_x\, d\alpha_y\, dx'\, dy'\, dz' \tag{23}$$

where
$$\gamma_\alpha = \sqrt{\alpha_x^2 + \alpha_y^2 - k^2} \tag{24}$$
The square-root operation in Eq. (24) is defined so that the real part of γα is positive. The two-dimensional Fourier transform of us1(x, y; zo) is obtained by Fourier transforming Eq. (23) with respect to the x and y spatial variables and setting z = zo:

$$U_{s1}(\omega_x, \omega_y; z_o) = \frac{1}{8\pi^2}\int\!\cdots\!\int \frac{1}{\gamma_\alpha}\, \exp(i x\alpha_x + i y\alpha_y)\, \exp(-i x\omega_x - i y\omega_y)\, o_a(x', y', z')\, u_o(x', y', z')\, \exp(-|z_o - z'|\gamma_\alpha)\, \exp(-i x'\alpha_x - i y'\alpha_y)\, dx'\, dy'\, dz'\, d\alpha_x\, d\alpha_y\, dx\, dy \tag{25}$$
This expression can be simplified by integrating first over x and y and then over αx and αy, which gives

$$U_{s1}(\omega_x, \omega_y; z_o) = \frac{\exp[-i z_o \gamma_{\omega i}]}{2\gamma_\omega}\int\!\!\int\!\!\int o_a(x', y', z')\, u_o(x', y', z')\, \exp[-(z_o - z')\gamma_{\omega r}]\, \exp[-i(x'\omega_x + y'\omega_y - z'\gamma_{\omega i})]\, dx'\, dy'\, dz' \tag{26}$$
where γω has been expanded into its real (γωr ) and imaginary (γωi ) parts and |z o − z ′ | has been replaced with (z o − z ′ ) because zo is greater than any z value in the support of the object (see Fig. 5). Notice that the triple integral is a scaled three-dimensional Fourier transform, evaluated at (ωx , ω y , −γωi ), of the absorbing object function multiplied by both the illumination function and an exponentially decaying function. The exponential decay is due to the attenuating nature of the turbid medium. An important property of Eq. (26) is that the only mathematical operations occurring in the integrand are multiplications of functions. As is shown in Section III.B, scattering objects result in derivatives of functions in the integrand.
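As a numerical illustration of this attenuation, the sketch below evaluates γω for a complex DPDW wave number and the factor exp[−(zo − z′)γωr] that multiplies the object function in Eq. (26); the parameter values are assumptions used only to show how quickly high spatial frequencies are suppressed with depth.

```python
import numpy as np

# Assumed illustrative values (not from the text)
k = 0.35 + 1.27j                  # complex DPDW wave number, 1/cm (as computed from Eq. (13))
depth = 3.0                       # z_o - z', separation of object plane and detector, cm
omega = np.linspace(0.0, 5.0, 6)  # radial spatial frequency sqrt(wx^2 + wy^2), 1/cm

# gamma_omega = sqrt(wx^2 + wy^2 - k^2), with the root chosen so that Re(gamma) > 0
gamma = np.sqrt(omega**2 - k**2)
gamma = np.where(gamma.real < 0, -gamma, gamma)

# Attenuation of the object's Fourier components caused by forward propagation
attenuation = np.exp(-depth * gamma.real)
for w, a in zip(omega, attenuation):
    print(f"|omega| = {w:4.1f} 1/cm   exp[-(zo - z')*gamma_r] = {a:.3e}")
```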
B. Scattering Objects

In this subsection, an expression for the two-dimensional Fourier transform of the scattered DPDW in the z = zo plane, Us2(ωx, ωy; zo), is developed for the case in which the imbedded object consists of only weak scattering perturbations when compared with that of the background medium. In this situation, the inhomogeneous Helmholtz equation, Eq. (11), can be written as

$$\nabla\cdot\{[D + \delta D(x, y, z)]\nabla u(x, y, z)\} + \left[-\mu_a + i\,\frac{2\pi f_t}{v}\right] u(x, y, z) = -S(x, y, z) \tag{27}$$
where D + δD(x, y, z) is the diffusion coefficient of the object. Similar to the case of absorbing objects, Eq. (27) is rearranged so that the terms associated with the perturbation δD(x, y, z) are combined with the source term on the right-hand side of the equation. This produces

$$\left(\nabla^2 + k^2\right) u(x, y, z) = -\frac{1}{D}\, S(x, y, z) - \nabla\cdot\!\left[\frac{\delta D(x, y, z)}{D}\,\nabla u(x, y, z)\right] = -\frac{1}{D}\, S(x, y, z) - \frac{\nabla[\delta D(x, y, z)]}{D}\cdot\nabla u(x, y, z) - \frac{\delta D(x, y, z)}{D}\,\nabla^2 u(x, y, z) \tag{28}$$

Notice that Eq. (28) is considerably more complicated than the equation associated with absorbing objects, Eq. (15). Let u(x, y, z) be given by
$$u(x, y, z) = u_o(x, y, z) + u_{s2}(x, y, z) \tag{29}$$
where u o (x, y, z) is the homogeneous portion of the DPDW photon fluence and u s2 (x, y, z) is the portion of the DPDW photon fluence scattered by the
object. Substituting Eq. (29) into Eq. (28) yields

$$\left(\nabla^2 + k^2\right)[u_o(x, y, z) + u_{s2}(x, y, z)] = -\frac{1}{D}\, S(x, y, z) - \frac{\nabla[\delta D(x, y, z)]}{D}\cdot\nabla[u_o(x, y, z) + u_{s2}(x, y, z)] - \frac{\delta D(x, y, z)}{D}\,\nabla^2[u_o(x, y, z) + u_{s2}(x, y, z)] \tag{30}$$

Because only the scattered DPDW photon fluence is of interest, the homogeneous solution is subtracted from Eq. (30), which removes the uo(x, y, z) term from the left-hand side of Eq. (30) and the source term from the right-hand side. The result after the subtraction is

$$\left(\nabla^2 + k^2\right) u_{s2}(x, y, z) = -\frac{\nabla[\delta D(x, y, z)]}{D}\cdot\nabla[u_o(x, y, z) + u_{s2}(x, y, z)] - \frac{\delta D(x, y, z)}{D}\,\nabla^2[u_o(x, y, z) + u_{s2}(x, y, z)] \tag{31}$$

Simplification of Eq. (31) involves implementing the Born approximation by assuming that |us2(x, y, z)| ≪ |uo(x, y, z)|, which permits the replacement of the sum of these two terms on the right-hand side of Eq. (31) with just uo(x, y, z). This simplifies Eq. (31) to the following form:

$$\left(\nabla^2 + k^2\right) u_{s2}(x, y, z) = -\frac{\nabla[\delta D(x, y, z)]}{D}\cdot\nabla u_o(x, y, z) - \frac{\delta D(x, y, z)}{D}\,\nabla^2 u_o(x, y, z) \tag{32}$$

The integral solution to Eq. (32) is given by
$$u_{s2}(x, y, z) = \int\!\!\int\!\!\int \left\{\frac{\nabla[\delta D(x', y', z')]}{D}\cdot\nabla u_o(x', y', z') + \frac{\delta D(x', y', z')}{D}\,\nabla^2 u_o(x', y', z')\right\} g(x - x', y - y', z - z')\, dx'\, dy'\, dz' \tag{33}$$
This form of the solution is more cumbersome than necessary. In particular, it is desired to have the perturbation term δD(x, y, z), not its gradient, in the integrand. Achieving this requires the first Green's identity (Taylor and Mann, 1972)

$$\int_V \varphi\,\nabla^2\psi\, dv = \int_S \varphi\,\hat{n}\cdot\nabla\psi\, ds - \int_V \nabla\varphi\cdot\nabla\psi\, dv \tag{34}$$
to be used in Eq. (33). The integrals denoted by V are integrals over a three-dimensional volume, the integral denoted by S is over the surface of the volume,
and n̂ is the outward normal to the surface S. If we let

$$\varphi = \frac{\delta D(x', y', z')}{D}\, g(x - x', y - y', z - z') \tag{35}$$

$$\psi = u_o(x', y', z') \tag{36}$$
and if we substitute Eqs. (34)–(36) into Eq. (33), the following expression is obtained:

$$\begin{aligned} u_{s2}(x, y, z) &= \int\!\!\int\!\!\int \frac{\nabla[\delta D(x', y', z')]}{D}\cdot\nabla u_o(x', y', z')\, g(x - x', y - y', z - z')\, dx'\, dy'\, dz' \\ &\quad + \int_S \frac{\delta D(x', y', z')}{D}\, g(x - x', y - y', z - z')\, \hat{n}\cdot\nabla u_o(x', y', z')\, ds \\ &\quad - \int\!\!\int\!\!\int \nabla\!\left[\frac{\delta D(x', y', z')}{D}\, g(x - x', y - y', z - z')\right]\cdot\nabla u_o(x', y', z')\, dx'\, dy'\, dz' \end{aligned} \tag{37}$$
The second integral in Eq. (37) is equal to zero because the surface S can be taken to be at infinity and both uo(x, y, z) and g(x, y, z) decay exponentially to zero. As a way to combine the remaining integral expressions in Eq. (37), the third integral can be expanded as follows:

$$\int\!\!\int\!\!\int \nabla\!\left[\frac{\delta D(x', y', z')}{D}\, g(x - x', y - y', z - z')\right]\cdot\nabla u_o(x', y', z')\, dx'\, dy'\, dz' = \int\!\!\int\!\!\int \nabla u_o(x', y', z')\cdot\left\{\frac{\delta D(x', y', z')}{D}\,\nabla g(x - x', y - y', z - z') + \frac{\nabla[\delta D(x', y', z')]}{D}\, g(x - x', y - y', z - z')\right\} dx'\, dy'\, dz' \tag{38}$$

The portion of the integral in Eq. (38) corresponding to the second term in the braces cancels the first integral term in Eq. (37). Once the cancellation is carried out, the integral solution to the Helmholtz equation, Eq. (32), is in its final form:

$$u_{s2}(x, y, z) = \int\!\!\int\!\!\int o_s(x', y', z')\,\nabla u_o(x', y', z')\cdot\nabla g(x - x', y - y', z - z')\, dx'\, dy'\, dz' \tag{39}$$
where os(x, y, z) is given by

$$o_s(x, y, z) = -\frac{\delta D(x, y, z)}{D} \tag{40}$$
The next step in deriving Us2(ωx, ωy; zo) is to calculate the gradient of the Green's function by using its angular spectrum form as given in Eq. (22). The desired gradient is

$$\nabla g(x, y, z) = \frac{1}{8\pi^2}\int\!\!\int \frac{1}{\gamma_\alpha}\left(\hat{x}\,\frac{\partial}{\partial x} + \hat{y}\,\frac{\partial}{\partial y} + \hat{z}\,\frac{\partial}{\partial z}\right)\exp(i x\alpha_x + i y\alpha_y - |z|\gamma_\alpha)\, d\alpha_x\, d\alpha_y = \frac{1}{8\pi^2}\int\!\!\int \frac{1}{\gamma_\alpha}\left(i\hat{x}\,\alpha_x + i\hat{y}\,\alpha_y - \hat{z}\,\gamma_\alpha\right)\exp(i x\alpha_x + i y\alpha_y - z\gamma_\alpha)\, d\alpha_x\, d\alpha_y \tag{41}$$

where x̂, ŷ, and ẑ are unit vectors in their respective directions. Obtaining the second equality involves using the fact that the z values in the Green's function are positive. Substituting Eq. (41) into Eq. (39) produces

$$u_{s2}(x, y, z) = \frac{1}{8\pi^2}\int\!\cdots\!\int o_s(x', y', z')\left(i\alpha_x\,\frac{\partial}{\partial x'} + i\alpha_y\,\frac{\partial}{\partial y'} - \gamma_\alpha\,\frac{\partial}{\partial z'}\right) u_o(x', y', z')\, \frac{1}{\gamma_\alpha}\,\exp[i(x - x')\alpha_x + i(y - y')\alpha_y - (z - z')\gamma_\alpha]\, d\alpha_x\, d\alpha_y\, dx'\, dy'\, dz' \tag{42}$$
Comparing Eq. (42) with Eq. (23) reveals that the expressions for us1(x, y, z) and us2(x, y, z) are identical except that the expression for us2(x, y, z) contains scaled partial derivatives of uo(x, y, z). As a result, an expression for Us2(ωx, ωy; zo) can be written by inspection, with the use of Eq. (26):

$$U_{s2}(\omega_x, \omega_y; z_o) = \frac{\exp[-i z_o \gamma_{\omega i}]}{2\gamma_\omega}\int\!\!\int\!\!\int o_s(x', y', z')\left(i\omega_x\,\frac{\partial}{\partial x'} + i\omega_y\,\frac{\partial}{\partial y'} - \gamma_\omega\,\frac{\partial}{\partial z'}\right) u_o(x', y', z')\, \exp[-(z_o - z')\gamma_{\omega r}]\, \exp[-i(x'\omega_x + y'\omega_y - z'\gamma_{\omega i})]\, dx'\, dy'\, dz' \tag{43}$$
Equation (43) shows that the triple integral expression is a scaled Fourier transform of the scattering object function multiplied by both an exponentially decaying function and a scaled gradient version of u o (x, y, z).
C. Absorptive and Scattering Objects

In this subsection, the Fourier transform expressions from Sections III.A and III.B are combined to obtain the Fourier transform of the portion of the DPDW scattered by an object that has both absorptive and scattering properties differing from those of the background medium. The resulting equation is valid for illumination sources with arbitrary spatial structure. This result is then specialized to the case of plane-wave illumination to derive the turbid medium version of the Fourier diffraction theorem, which is then compared with the Fourier diffraction theorem for standard DT to highlight the similarities and differences. The desired general expression for the two-dimensional Fourier transform Us(ωx, ωy; zo) of the scattered DPDW is given by the sum of Us1(ωx, ωy; zo) and Us2(ωx, ωy; zo) from Eqs. (26) and (43):

$$U_s(\omega_x, \omega_y; z_o) = \frac{\exp[-i z_o \gamma_{\omega i}]}{2\gamma_\omega}\int\!\!\int\!\!\int \left\{o_a(x', y', z') + o_s(x', y', z')\left(i\omega_x\,\frac{\partial}{\partial x'} + i\omega_y\,\frac{\partial}{\partial y'} - \gamma_\omega\,\frac{\partial}{\partial z'}\right)\right\} u_o(x', y', z')\, \exp[-(z_o - z')\gamma_{\omega r}]\, \exp[-i(x'\omega_x + y'\omega_y - z'\gamma_{\omega i})]\, dx'\, dy'\, dz' \tag{44}$$
Equation (44) is the most general solution, under the Born approximation to the Helmholtz equation, for the forward problem. For further insight into the structure of the forward problem, a plane-wave illumination source is now assumed. This assumption permits the development of the turbid media version of the Fourier diffraction theorem, which demonstrates the similarities and differences between standard DT and DT for turbid media. To this end, let us assume that the illumination is a plane wave with unit amplitude and zero phase in the z = z1 plane. Mathematically, the plane wave is given by

$$u_o(x', y', z') = \exp[i(z' - z_1)k] \tag{45}$$
for z′ > z1. Substituting Eq. (45) into Eq. (44) and simplifying gives

$$U_s(\omega_x, \omega_y; z_o) = \frac{\exp(-i z_o \gamma_{\omega i})\,\exp[-(z_o - z_1)k_i]\,\exp(-i z_1 k_r)}{2\gamma_\omega}\int\!\!\int\!\!\int \left[o_a(x', y', z') - ik\gamma_\omega\, o_s(x', y', z')\right] \exp[-(z_o - z')(\gamma_{\omega r} - k_i)]\, \exp\{-i[x'\omega_x + y'\omega_y + z'(-\gamma_{\omega i} - k_r)]\}\, dx'\, dy'\, dz' \tag{46}$$
where k has been expanded into its real and imaginary parts, denoted by the subscripts r and i, respectively. Equation (46) is a statement of the Fourier diffraction theorem for turbid media and is a generalization of the standard Fourier diffraction theorem. In Eq. (8) in Section II.B.1, the standard Fourier diffraction theorem was given in terms of √(k² − ωx²). Rewriting Eq. (8) by expressing √(k² − ωx²) and the Green's function in three dimensions and in terms of γω to directly compare the standard version of this theorem to the turbid media version gives

$$U_s(\omega_x, \omega_y; z_o) = \frac{\exp(-i z_o \gamma_{\omega i})}{2\gamma_\omega}\, O(\omega_x, \omega_y, -\gamma_{\omega i} - k_r) \tag{47}$$
Noting that the triple integral term in Eq. (46) is the Fourier transform of the object function oa (x, y, z) − ik γω os (x, y, z) multiplied by a spatial-frequencydependent attenuating exponential and comparing Eqs. (46) and (47), we can see that the turbid media version of the Fourier diffraction theorem differs from the standard Fourier diffraction theorem primarily by additional exponential terms. There is an additional overall exponential attenuation term outside the integral in Eq. (46) which is a function of the separation of the illumination and detection planes. There is also an additional phase term outside the integral which is due to the definition of the illuminating plane wave. In the limit of no turbidity, the attenuating exponential function inside the integral becomes unity because both ki and γωr become zero. This can be seen by referring to Eq. (24), as is next discussed. If the background medium is not turbid, the imaginary part of k is zero and the real part of k is positive (Kak and Slaney, 1988). In turn, this makes the term inside the radical in Eq. (24) negative for ωx2 + ω2y < kr2 (the usual assumption in standard DT), which forces γωr to zero. Thus the triple integral term becomes the Fourier transform of the object, the attenuating exponential term outside the integral becomes unity, and Eq. (46) becomes equal to Eq. (47) with the addition of the phase term resulting from our definition of the plane-wave illumination. The Fourier diffraction theorem can now be used to approach another issue of interest, which is the possibility of developing a turbid media version of the standard DT filtered backpropagation algorithm, based on Eq. (46), that can be used to reconstruct an image of the imbedded object. However, two main difficulties have impeded this development. As mentioned previously, from Eq. (46) it can be seen that there is a spatial-frequency-dependent exponential attenuation in the integrand, unlike for standard DT. As a result, there is not a direct Fourier transform relationship between the measured data and the object properties. Efforts have been made to accommodate this exponential attenuation term by viewing the transform as a Laplace transform, not a Fourier transform (Norton and Vo-Dinh, 1998; Schotland, 1997).
A second difficulty, even if the Laplace transform method works well, is that the scattering nature of the object is multiplied by a spatial-frequency-dependent function, which complicates the Fourier domain interpretation of the data. As a result, in Section IV, a backpropagation algorithm for use with turbid media data is developed instead of a filtered backpropagation algorithm. The backpropagation algorithm can be used to increase spatial resolution in the measured image of an imbedded object. In addition, it can be used to locate an object in three dimensions from a single two-dimensional data set. For objects that satisfy certain additional constraints that are described in Section IV, the algorithm can be used as part of a direct inversion algorithm to determine the material properties of the object as well (Li, Pattanayak, et al., 2000). A multiple-view backpropagation algorithm is also developed in Section IV, but a filtered backpropagation algorithm for turbid media has yet to be derived. The final issue to explore in this section is the determination of which values of the three-dimensional Fourier transform of the object are contained in the Fourier transform of the detected image. For simplicity, this issue is explored in two dimensions by setting ωy = 0 so that it is straightforward to plot the intersection of this region with the ωx –ωz plane. Looking again at Eq. (46), we can see that the Fourier transform of the object (multiplied by the attenuating exponential function) is evaluated at ωx = ωx and ωz = −γωi − kr when ω y = 0. In Figure 6, plots of this region in the ωx–ωz plane are shown
Figure 6. Plots of the two-dimensional projections of the surface on which the Fourier transforms of the measured scattered diffuse photon density waves (DPDWs) are obtained for plane-wave illumination. The dashed line is for Re{k 2 } ≫ Im{k 2 } and the solid line is for Re{k 2 } ≪ Im{k 2 }. For comparison, the corresponding support plot for standard DT is shown with the dotted line. The horizontal axis is ωx and the vertical axis is ωz.
for two limiting values of k: |Re(k²)| ≫ |Im(k²)| and |Re(k²)| ≪ |Im(k²)|. The former case is the low-frequency DPDW regime, whereas the latter case is the high-frequency DPDW regime. For the low-frequency case, it can be seen that the region is essentially the ωz = 0 axis. For the high-frequency case, the region is a curved region whose shape is a function of the relative magnitude of the real and imaginary parts of k². Extending these results to three dimensions indicates that a plane of the three-dimensional Fourier transform of the object is measured in the low-frequency DPDW regime while a curved surface is measured in the high-frequency DPDW regime. In both cases, only a two-dimensional region of the object's three-dimensional Fourier transform is obtained. A key implication of this fact is that multiple looks at different angles are needed to uniquely reconstruct an image of the three-dimensional object. Exactly how many looks are needed is a function of the Nyquist sampling theorem. Also shown in Figure 6 is a plot of the region in standard DT for which Fourier data are available. As stated in Section II.B.1, the region is a circle for two dimensions and a sphere for three dimensions.

IV. Backpropagation in Turbid Media

In this section, backpropagation theory for image reconstruction in turbid media is developed. As mentioned in Section III, there does not exist a straightforward relationship between the Fourier transform of the measured scattered DPDW and the object material and shape properties, even in the case of plane-wave illumination and planar detection, unlike for standard DT. For this reason, a filtered backpropagation algorithm for turbid media data has not yet been developed. Even in standard DT, the use of a filtered backpropagation algorithm requires plane-wave illumination and a full set of views around the object, criteria that are difficult to achieve in many standard DT applications. As a result, it has been proposed (Devaney, 1986) in standard DT that the general problem be solved by first creating an approximation to the true image by using just the backpropagation operation as described in Section II.B.2, and then refining the backpropagation image with the use of nonlinear imaging algorithms such as algebraic reconstruction or similar techniques if desired (Kak and Slaney, 1988). In a similar vein, the backpropagation algorithm for turbid media can be used to reconstruct approximate images of objects imbedded in turbid media, as is shown in this section. The outline of this section is as follows: In Section IV.A, backpropagation theory is developed first for arbitrary illumination and detection geometries for a single view angle. This theory is then simplified to planar detection to generate a backpropagation transfer function that is the turbid media version of the standard DT
backpropagation transfer function—Eq. (10). The theory is then further specialized to the reconstruction of images of “thin” objects, where the thin dimension of the object is parallel to the detection plane (Li, Durduran, et al., 1997; Li, Pattanayak, et al., 2000). This additional simplification can permit recovery of the object’s absorption and scattering properties as well as its shape. In Section IV.B, the resolution-enhancing properties of the backpropagation algorithm are described and quantified. In Section IV.C, the ability of the backpropagation algorithm to localize objects in three dimensions is described and quantified. Examples using laboratory data that demonstrate the localizing and resolution-enhancing properties of the backpropagation algorithm are presented in Section IV.D. Finally, in Section IV.E, a multiple-view version of the backpropagation algorithm is derived.
A. Single-View Backpropagation

In this subsection, the single-view backpropagation algorithm for turbid media is developed by using the forward model expressions derived in Section III (Matson, Clark, et al., 1997; Matson and Liu, 1999b). The single-view backpropagation algorithm reconstructs a three-dimensional representation of the DPDW scattered by an object from a data set obtained for a given source/detector geometry. This backpropagated scattered DPDW is uniquely defined by the measurements and appropriate assumptions. As is shown, the backpropagated scattered DPDW can be quickly reconstructed by using fast Fourier domain noniterative methods when planar detection is employed. Several expressions for the single-view backpropagated scattered DPDW are developed in this section: a general expression permitting arbitrary placement of sources and detectors, an expression specializing this to planar detection, and further specialization of the planar detection case when the objects being imaged are thin in the depth dimension. Depth is measured along the axis normal to the detection plane (the z axis) where a depth of zero corresponds to the detection plane and positive depth values are along the axis toward the illumination source (see Fig. 5).

The development of the single-view turbid media backpropagation algorithm in this section follows the approach used for nonturbid media (Devaney, 1986); that is to say, it is desired to find a unique solution to the homogeneous and source-free Helmholtz equation (Eq. (12) with S(x, y, z) = 0) that is equal to the measured data on the measurement surface. Because the object structure or material properties are not assumed to be known, only the material properties of the background medium are used to define the appropriate Helmholtz equation. The requirement that the backpropagated scattered DPDW is an incoming wave as it propagates away from the measurement
surface is also imposed. This last requirement ensures that the backpropagated scattered DPDW reverses the effect of the forward propagation. Under these conditions, the backpropagated scattered DPDW, urec(x, y, z), is defined as follows:

$$u_{rec}(x, y, z) = \int u_s(x', y', z')\, \frac{\partial}{\partial \hat{n}}\, g_b(x - x', y - y', z - z')\, dx'\, dy'\, dz' \tag{48}$$
where the integral is over the surface on which the data were measured, gb(x, y, z) is the Green's function corresponding to the homogeneous Helmholtz equation with the boundary conditions set by the measurement surface and the restriction on the behavior of the backpropagated scattered DPDW as it propagates away from the measurement surface, ∂/∂n̂ is the partial derivative operator with respect to the primed variables in the outward-facing (away from the object) direction normal to the measurement surface, and us(x, y, z) = us1(x, y, z) + us2(x, y, z), where us1(x, y, z) and us2(x, y, z) are given in Eqs. (21) and (39), respectively. Equation (48) is the most general form for urec(x, y, z), permitting arbitrary source and detector placement in the background medium. The form of the integrand in Eq. (48) indicates that the integral is the space-invariant convolution of ∂gb/∂n̂ with us. As a result, it is shown that, for planar detection, the convolution indicated in Eq. (48) can be carried out quickly by multiplication in the Fourier domain.

However, Eq. (48) is not well defined for turbid media, in general. Because of the exponential decay that an outgoing DPDW experiences, the amplitude of the backpropagated scattered DPDW exponentially increases as the distance from the measurement surface increases. As long as the measurement surface encloses a finite volume, Eq. (48) is still well posed in the classical sense. However, for open measurement surfaces, the backpropagation algorithm must be modified to make the problem well posed. Because a planar detection surface is an open surface, this issue needs to be addressed for detection in a plane. This problem is encountered even in the backpropagation algorithm for nonturbid media because of evanescent waves, as is discussed shortly.

It is next assumed that the measurement surface is a plane parallel to the x–y plane and located at z = zo as described in Section III and shown in Figure 5. For this situation, the Green's function gb can be determined by the method of images (Barton, 1989; Morse and Feshbach, 1953) and is given by

$$g_b(x - x', y - y', z - z') = \frac{1}{8\pi^2}\int\!\!\int \frac{F(\omega_x, \omega_y)}{\gamma_\omega}\,\{\exp[(z' - z)\gamma_\omega] - \exp[(2z_o - z' - z)\gamma_\omega]\}\, \exp[i(x - x')\omega_x + i(y - y')\omega_y]\, d\omega_x\, d\omega_y \tag{49}$$
where the plane-wave form of the Green's function has been used (Baños,
1966). In writing Eq. (49), we have assumed that z′ > z, which is justified because the reconstruction geometry is such that the backpropagation is into the half-plane z < zo and z′, the integration variable, is set equal to zo in Eq. (48) after the derivative operation is carried out. A low-pass-filter function has been included in Eq. (49), F(ωx, ωy), to ensure that gb is well defined, which ensures that Eq. (48) is well posed. The ill-posed nature of Eq. (49) results from the exponential terms inside the braces. Because gb was chosen to correspond to an incoming wave in the half-plane z < zo as z decreases, the amplitudes of the exponential terms are increasing functions of ω because γω has a positive real part. It is these increasing amplitudes that cause Eq. (49) to be ill posed. For Eq. (49) to be well posed, it must be modified to limit the amplitudes of the exponentials. The typical way to accomplish this is to low-pass filter the integrand because the exponential amplitudes become unbounded only as spatial frequency approaches infinity. In standard DT, the exponential terms have unit amplitude for all frequencies inside a circular region centered at zero spatial frequency and are increasing functions of frequency outside this circular region. The former case corresponds to propagating waves and the latter case corresponds to evanescent waves. Therefore, the standard DT problem is regularized (i.e., made well posed) by choosing F(ωx, ωy) to be an ideal low-pass filter with a spatial-frequency cutoff chosen to pass just the propagating waves. In turbid media, the arguments of these exponentials are always complex with γωr a positive and increasing function of ω, which means that the amplitudes of the exponentials are increasing functions of ω for all spatial frequencies. As a result, the low-pass-filter spatial-frequency cutoff should be chosen by using an SNR criterion. An SNR analysis that can be used to select values for the spatial-frequency cutoffs is presented in Section V. In addition, the type of filter that results in best performance is not necessarily an ideal low-pass filter. In Section IV.C several representative filters are chosen and their behaviors analyzed to see how well they work in the backpropagation process. For now, the specific type of low-pass filter in Eq. (49) remains unspecified.

As a way to simplify Eq. (48) for the planar measurement scenario, Eq. (49) is substituted into Eq. (48). Because the measurement surface is parallel to the x–y plane and to the right of the object, ∂/∂n̂ becomes a derivative with respect to z′ in the positive z direction. Carrying out the derivative operation gives

$$u_{rec}(x, y, z) = \frac{1}{4\pi^2}\int\!\!\int u_s(x', y', z_o)\int\!\!\int F(\omega_x, \omega_y)\, \exp[(z_o - z)\gamma_\omega]\, \exp[i(x - x')\omega_x + i(y - y')\omega_y]\, d\omega_x\, d\omega_y\, dx'\, dy' \tag{50}$$
Equation (50) expresses the relationship between the measured scattered DPDW and the backpropagated scattered DPDW as a two-dimensional
convolution of the measured scattered DPDW with the Green's function from Eq. (49). Efficiently solving this equation requires Eq. (50) to be Fourier transformed with respect to x and y to convert the convolution into a multiplication. Because the inner double integral, divided by 4π², is the inverse Fourier transform of F(ωx, ωy) exp[(zo − z)γω] with respect to ωx and ωy, this gives

$$U_{rec}(\omega_x, \omega_y; z) = F(\omega_x, \omega_y)\, \exp[(z_o - z)\gamma_\omega]\, U_s(\omega_x, \omega_y; z_o) \tag{51}$$
From Eq. (51) it is apparent that the backpropagation operation consists of multiplying the two-dimensional Fourier transform of the measured data by a low-pass filter and a backpropagation transfer function Hb(ωx, ωy; zo − z) defined by

$$H_b(\omega_x, \omega_y; z_o - z) = \exp[(z_o - z)\gamma_\omega] \tag{52}$$
The backpropagated scattered DPDW, as given in Eq. (51), is calculated by using Fourier transforms and multiplications and thus can be carried out quickly. It can also be seen from Eq. (51) that the single-view backpropagation algorithm is most efficiently implemented by reconstructing planes of the backpropagated scattered DPDW that are parallel to the detection plane because the backpropagation transfer function is a function of distance away from the detection plane. Notice that the backpropagation transfer function for turbid media is notationally identical to the backpropagation transfer function for standard DT (see Eq. (10)), as can be verified by substituting the definition for γω from Eq. (24) into Eq. (52). The significant difference between the two backpropagation transfer functions is that k is real in standard DT and complex in DT for turbid media. Thus the backpropagation operation backpropagates the phase in standard DT but backpropagates both the phase and the amplitude in DT for turbid media.

The seven steps necessary to implement the backpropagation algorithm are as follows:

Step 1 Collect the data by using either a planar detector or a detector scanned in a plane. The algorithm runs fastest if the data are collected on a square grid with equally spaced samples because of the efficiencies of the fast Fourier transform operation. If the illuminating DPDW has a nonzero modulation frequency, the amplitude and the phase of the DPDW must be measured. This can conveniently be done either by using a lock-in amplifier or by collecting a time sequence of the data at each pixel and processing the time sequences in a computer. If the illuminating DPDW is CW, the amplitude is the measurement and the phase is zero.
Step 2 Subtract the portion of the measured data that is due to the homogeneous background. If a background measurement can be made (e.g., in optical mammography, the non-tumor-bearing breast may be able to be used to collect such a measurement), the subtraction is carried out with this measurement. If no measurement is available, the background must be estimated by estimating the background material properties and calculating what the background measurement would be. The subtraction produces the portion of the DPDW scattered by the object.

Step 3 Fourier transform the measured scattered DPDW estimate.

Step 4 Determine the z locations for which planar reconstructions of the scattered DPDW in the turbid medium are desired.

Step 5 For each of these z locations, calculate a backpropagation transfer function by using Eq. (52) and multiply the Fourier transform of the measured scattered DPDW estimate by the transfer function. This produces a two-dimensional Fourier-transformed backpropagated scattered DPDW at each z location.

Step 6 Multiply each of these two-dimensional Fourier-transformed backpropagated scattered DPDWs by a regularizing filter to reduce noise levels. The regularizing filter can be different for each depth, if desired (Pogue et al., 1999).

Step 7 Inverse Fourier transform each of the regularized and Fourier-transformed backpropagated scattered DPDWs.

The result of this seventh step is the desired three-dimensional backpropagated scattered DPDW throughout the turbid media volume. (A minimal numerical sketch of steps 3 through 7 is given at the end of this subsection.) The full three-dimensional backpropagated scattered DPDW can be used to locate an object three dimensionally from the single two-dimensional measurement used in the reconstruction process, as is discussed in Section IV.C. It can also be used to obtain an image of the object that has higher spatial resolution than is present in the measured data. This topic is discussed in Section IV.B.

The final topic discussed in this subsection is the use of the backpropagation algorithm to reconstruct images and material properties of objects whose z dimension is thin relative to the z dependence of the functions in the integrand of the forward model (Eq. (44)) (Cheng and Boas, 1998; Durduran et al., 1999; Li, Durduran, et al., 1997; Li, Pattanayak, et al., 2000). For example, consider an object at a location z = z3 whose scattering properties are the same as that of the background medium but whose absorptive properties satisfy the Born approximation and whose width in the z direction is Δz. In this case, Eq. (26), a mathematical description of the Fourier transform of the scattered DPDW, can be simplified by replacing the z integral with the product of the integrand
evaluated at z = z3 and Δz, which gives

$$U_s(\omega_x, \omega_y; z_o) = \frac{\Delta z\, \exp[-(z_o - z_3)\gamma_\omega]}{2\gamma_\omega}\int\!\!\int o_a(x', y', z_3)\, u_o(x', y', z_3)\, \exp[-i(x'\omega_x + y'\omega_y)]\, dx'\, dy' \tag{53}$$
Equation (53) can be inverted directly to solve for oa(x, y, z3) by exploiting the fact that the double integral is the Fourier transform of oa(x′, y′, z3)uo(x′, y′, z3). The result of the inversion produces

$$o_a(x, y, z_3) = \frac{1}{2\pi^2\, \Delta z\, u_o(x, y, z_3)}\int\!\!\int \frac{U_s(\omega_x, \omega_y; z_o)\, \gamma_\omega}{\exp[-(z_o - z_3)\gamma_\omega]}\, \exp[i(x\omega_x + y\omega_y)]\, d\omega_x\, d\omega_y \tag{54}$$
As the width Δz increases, this approximation gets worse. For objects thin enough that all the terms in the integrand of Eq. (26) are essentially constant with respect to the z variable, the approximation is good. The key aspect in using Eq. (54) to solve for oa(x, y, z3) is that the value of z3 must be known. If z3 is known, then the solution for oa(x, y, z3) is a direct inversion of the forward model and no backpropagation algorithm is needed. Additional orthogonal measurements have been suggested as a means to provide the desired depth information (Li, Pattanayak, et al., 2000); however, as demonstrated in Section IV.C and by other researchers (Cheng and Boas, 1998; Durduran et al., 1999), the backpropagation algorithm can also be used to determine the depth of the object without additional measurements. Thus the backpropagation algorithm is not used to reconstruct the object information per se, but instead can be used to provide the depth information to support the direct inversion of the forward model. Using this same approach is more difficult when the object has scattering properties different from that of the background medium because, as can be seen in Eq. (44), the integrand contains a spatial-frequency-weighted gradient of uo(x, y, z). One approach to a solution in this case is to make two additional assumptions (Li, Pattanayak, et al., 2000). The first assumption is that the illumination source is a point source, and the second assumption is that the spatial variation of the object and background-medium scattering properties is smooth enough that the portion of the integrand for scattering objects in Eq. (44) can be rewritten to involve just uo(x, y, z) and not its gradient. With these assumptions, a solution for os(x, y, z3) is carried out in a manner similar to the derivation of Eq. (54). For details on these inversion algorithms for thin objects, the validity of the assumptions, and the applicability of the algorithms to objects with both scattering and absorption properties different from those of the background medium, see Li, Pattanayak, et al. (2000).
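The following is a minimal NumPy sketch of steps 3 through 7 of the single-view backpropagation algorithm, built around Eqs. (51) and (52), assuming the background-subtracted scattered DPDW has already been sampled on a square grid in the detection plane. The function name, grid spacing, and the Gaussian choice of regularizing filter are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def backpropagate(us_measured, dx, k, depths, cutoff):
    """Backpropagate a measured scattered DPDW to a set of depths (steps 3-7).

    us_measured : (N, N) complex array, background-subtracted scattered DPDW at z = zo
    dx          : sample spacing in the detection plane (cm)
    k           : complex DPDW wave number of the background medium (1/cm)
    depths      : iterable of zo - z values (cm) at which to reconstruct planes
    cutoff      : spatial-frequency scale (1/cm) of the regularizing low-pass filter
    """
    n = us_measured.shape[0]
    # Radian spatial frequencies of the measurement grid
    wx = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    WX, WY = np.meshgrid(wx, wx)

    # gamma_omega = sqrt(wx^2 + wy^2 - k^2), root chosen with positive real part
    gamma = np.sqrt(WX**2 + WY**2 - k**2)
    gamma = np.where(gamma.real < 0, -gamma, gamma)

    # Step 3: Fourier transform the measured scattered DPDW
    Us = np.fft.fft2(us_measured)

    # Step 6 filter (here a Gaussian low-pass; the text leaves the filter choice open)
    F = np.exp(-(WX**2 + WY**2) / (2 * cutoff**2))

    planes = []
    for d in depths:                       # Step 4: chosen reconstruction depths (d = zo - z)
        Hb = np.exp(d * gamma)             # Step 5: backpropagation transfer function, Eq. (52)
        Urec = F * Hb * Us                 # Steps 5-6: backpropagate and regularize
        planes.append(np.fft.ifft2(Urec))  # Step 7: inverse Fourier transform
    return np.array(planes)                # backpropagated scattered DPDW, one plane per depth
```

Each reconstructed plane is the inverse Fourier transform of F(ωx, ωy) Hb(ωx, ωy; zo − z) Us(ωx, ωy; zo), in direct correspondence with Eq. (51).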
B. Resolution Enhancement

In this subsection it is shown how the backpropagation algorithm can be used to increase spatial resolution in the measured data (Matson, 2001). The mechanism for the increase in resolution can be seen by looking at Eq. (51), the mathematical description of the single-view backpropagation algorithm. The backpropagation transfer function boosts the Fourier amplitudes of the measured data because its amplitudes increase exponentially with respect to the spatial-frequency variables because zo > z, which therefore increases spatial resolution. It is shown in this subsection that the backpropagation transfer function, Eq. (52), is actually a deconvolution operator that removes the Fourier amplitude attenuation caused by forward propagation through the turbid media. Because noise limits the amount of deconvolution that can be implemented in a stable manner, the effects of noise on the deconvolution process are described and quantified. It is shown that the amount of spatial resolution increase brought about by the backpropagation algorithm, as a function of the integrated SNR of the data, is remarkably independent of the material, object, and system properties.

To more easily understand the deconvolution nature of the backpropagation transfer function, let us employ a mathematical description of the forward model that emphasizes the z-dependent two-dimensional convolution that the scattered DPDW undergoes as it propagates from the object to the detection plane. To this end, the three-dimensional space in which the object resides is decomposed into two half-spaces. These half-spaces are separated by the plane z = z2, where z2 is determined by the support of the object as follows (see Fig. 5): Let z2 be such that o(x, y, z) = 0 for all z ≥ z2 and o(x, y, z2 − ε) ≠ 0 for ε > 0 arbitrarily small and for some measurable set in the (x, y) plane. As a result, the object resides in the half-space z ≤ z2, which is called the object half-space and abuts the plane z = z2; the detector resides in the other half-space, which is called the detection plane half-space. In the object half-space, the scattered DPDW is modeled, as before, with Eq. (44). In the detection plane half-space, because the medium is homogeneous, the scattered DPDW is modeled by using the source-free homogeneous Helmholtz equation (Eq. (12) with the source term equal to zero) with the following appropriate boundary conditions: the solution must equal the scattered DPDW on the boundary z = z2, and the solution must go to zero as x, y, and z become arbitrarily large, because the physics of light propagation in turbid media requires this.

The measured scattered DPDW can now be modeled with the use of these two half-spaces and mathematical models. Let Us(ωx, ωy; z2) be the Fourier transform of the scattered DPDW in the plane z = z2, modeled by using Eq. (44). With the use of the homogeneous Helmholtz equation and the
boundary conditions just described, the Fourier transform of the measured scattered DPDW can be calculated in terms of Us(ωx, ωy; z2) by using the same approach that was used to calculate the backpropagated wave, Eq. (51), which produces the following result:

    Us(ωx, ωy; zo) = H(ωx, ωy; zo − z2) Us(ωx, ωy; z2)                (55)

where H(ωx, ωy; zo − z2), the forward propagation transfer function, is given by

    H(ωx, ωy; zo − z2) = exp[−(zo − z2) γω]                           (56)

It can be seen from Eq. (56) that H(ωx, ωy; zo − z2) possesses a number of important properties. First, because the real part of γω is positive, the amplitude of H(ωx, ωy; zo − z2) decreases exponentially with respect to the spatial-frequency variables (recall that γω is a function of ωx and ωy). As a result, it attenuates nonzero spatial frequencies, which causes blurring in the measured data. Second, the amount of attenuation increases as the separation of the object and the detection plane, zo − z2, increases. This means that deeply imbedded objects undergo more blurring than do shallower objects. Third, not unexpectedly, the backpropagation transfer function Hb(ωx, ωy; zo − z2) defined in Eq. (52) is the inverse of the forward propagation transfer function H(ωx, ωy; zo − z2). Thus the backpropagation algorithm behaves as a deconvolution algorithm when the measured scattered DPDW is propagated back through the medium to the object location.

Correctly deconvolving the effects of forward propagation from the measured scattered DPDW requires the depth of the object to be known. Choosing a depth parameter that is smaller than the true object depth causes the blurring to be only partially removed from the measured data, which leaves residual blurring that obscures image detail. In contrast, choosing a depth parameter that is too large causes the deconvolution process to overcompensate for the forward propagation, which produces side-lobe artifacts in the reconstructed image. As is discussed in Section IV.C, these two properties give the backpropagation algorithm the ability to locate an object in depth. This depth localization property of the backpropagation algorithm permits accurate deconvolution of measured data because accurate depth estimates can be made. With a knowledge of the true depth of the object, the optimum deconvolution filter can be derived.

The forward propagation transfer function, given in Eq. (56), models the propagation of the scattered DPDW from the z = z2 plane into the detection-plane half-space. If the object's depth dimension is less than the diameter of the residual blur in the backpropagated data, then the backpropagation algorithm can remove the blurring for all portions of the object for one value of
z inside the object support. If the object's depth dimension is greater than the diameter of the residual blur in the backpropagated data, backpropagating the measured scattered DPDW to the z = z2 plane removes all the blurring in the portions of the object adjacent to that plane, but only partially removes the blurring in the portions of the object that are deeper in the object half-space. For this reason, a multiple-view backpropagation algorithm is necessary, in general, to remove blur for three-dimensional objects.

For the discussion in this subsection, the scattered DPDW in the z = z2 plane is assumed to be due to a point object located immediately adjacent to the z = z2 plane. As a result, the Fourier transform of the scattered DPDW in the z = z2 plane is constant at all spatial frequencies and thus, by Eq. (55), the Fourier transform of the measured scattered DPDW in the detection plane is just the forward propagation transfer function. This assumption permits the discussion in this subsection to be independent of the object being imaged. The effects of non-point objects can be included with the use of Eq. (55) by multiplying the measured scattered DPDW Fourier amplitudes in the point-object case by the Fourier transform of the particular non-point object under consideration.

The ability of the backpropagation algorithm to carry out the desired deconvolution depends on the shape of H(ωx, ωy; zo − z2) because noise ultimately limits the highest spatial frequency in the data that can be deconvolved. First, the shape of the forward propagation transfer function is discussed, followed by an explanation of how noise limits spatial resolution. The shape of H(ωx, ωy; zo − z2) is determined by a number of factors: the depth of the object, the modulation frequency of the DPDW, the background medium's absorption and reduced scattering coefficients, and the index of refraction. These parameters can produce vastly differing versions of H(ωx, ωy; zo − z2). As a demonstration of this fact, plots of the amplitudes of H(ωx, ωy; zo − z2) as a function of spatial frequency, normalized to one at zero spatial frequency, are displayed in Figure 7 for four combinations of these parameters that can be encountered in ODT applications. Notice that the forward propagation transfer function decays much less rapidly for an object that is closer to the detection plane than for an object that is farther away. Also notice that increasing any of the other parameters (the absorption and reduced scattering coefficients and the modulation frequency of the DPDW) causes the forward propagation transfer function's amplitudes to stay near one longer before exponentially decaying. Objects at shallower depths decrease the decay rate because the argument of the exponential is proportional to the object depth. Larger values of any of the other parameters increase the spatial frequency where the forward propagation transfer function starts exponentially decaying because k is proportional to the square root of these values, and the corner spatial frequency of the forward propagation transfer function is proportional to k.
Figure 7. Fourier amplitude plots of the normalized forward propagation transfer functions: (solid line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 0, and zo − z2 = 6 cm; (dotted line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 1 GHz, and zo − z2 = 6 cm; (dashed line) μa = 0.03 cm−1, μ′s = 25 cm−1, ft = 0, and zo − z2 = 6 cm; (dash-dot line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 0, and zo − z2 = 1 cm. For all plots, n = 1.333. Also plotted is a typical noise Fourier amplitude plot (dash-dot-dot-dot line).
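The behavior just described is easy to explore numerically. The short sketch below is illustrative only: it assumes the standard diffusion-approximation dispersion relation k^2 = 3μ′s(−μa + iωn/c) for the DPDW wave number and takes γω as the square root of (ωx^2 + ωy^2 − k^2) with a nonnegative real part; the chapter defines these quantities through Eq. (12), so the exact constants may differ. The parameter sets correspond to the four curves of Figure 7.

```python
import numpy as np

def gamma_omega(w_rho, mu_a, mu_sp, f_mod, n_index=1.333):
    """Decay constant of a plane-wave DPDW component (assumed dispersion relation)."""
    v = 2.998e10 / n_index                        # speed of light in the medium, cm/s
    k_sq = 3.0 * mu_sp * (-mu_a + 1j * 2.0 * np.pi * f_mod / v)   # assumed k^2, cm^-2
    return np.sqrt(w_rho**2 - k_sq)               # principal root has Re(gamma) >= 0

def forward_tf(w_rho, dz, mu_a, mu_sp, f_mod):
    """Forward propagation transfer function H = exp[-(zo - z2) * gamma_omega], Eq. (56)."""
    return np.exp(-dz * gamma_omega(w_rho, mu_a, mu_sp, f_mod))

w = np.linspace(0.0, 6.0, 601)                    # spatial-frequency magnitude, cm^-1
cases = {                                         # (mu_a, mu_s', f_t, zo - z2) as in Figure 7
    "ft = 0, dz = 6 cm":      (0.03, 15.0, 0.0, 6.0),
    "ft = 1 GHz, dz = 6 cm":  (0.03, 15.0, 1.0e9, 6.0),
    "mu_s' = 25, dz = 6 cm":  (0.03, 25.0, 0.0, 6.0),
    "ft = 0, dz = 1 cm":      (0.03, 15.0, 0.0, 1.0),
}
for label, (mu_a, mu_sp, f_mod, dz) in cases.items():
    amp = np.abs(forward_tf(w, dz, mu_a, mu_sp, f_mod))
    amp /= amp[0]                                 # normalize to one at zero spatial frequency
    print(f"{label}: amplitude at 2 cm^-1 = {amp[np.searchsorted(w, 2.0)]:.2e}")
```

Making the object shallower (smaller dz) or increasing μa, μ′s, or ft moves the exponential rolloff to higher spatial frequencies, which is the behavior described above and plotted in Figure 7.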
Slices of the forward propagation point spread functions (PSFs) corresponding to the forward propagation transfer functions in Figure 7 are plotted in Figure 8. It can be seen from these slices that the widths of the PSFs, and thus the resolutions in the measured scattered DPDW corresponding to these PSFs, depend on these same parameters.

In the noise-free case, the backpropagation algorithm can be used to completely remove the blurring in the measured scattered DPDW because it backpropagates the measured scattered DPDW by dividing its Fourier transform by the forward propagation transfer function, which is nonzero for all spatial frequencies. However, when noise is present, the backpropagation algorithm produces reconstructions of good quality only when the division is restricted to the spatial frequencies for which the amplitude of the noise is less than the amplitude of the noise-free data. Therefore, to understand how noise limits the spatial resolution in the backpropagated scattered DPDW, it is necessary to characterize the Fourier transform of the noise in the measured scattered DPDW. In ODT, the two fundamental limiting noise sources are photon noise, due to the random arrival rates of photons at the detector, and amplifier noise, due to the amplification of the detected photons by the detection circuitry. These noise sources are discussed
Figure 8. Normalized point spread function (PSF) plots corresponding to the normalized Fourier amplitude plots in Figure 7: (solid line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 0, and zo − z2 = 6 cm; (dotted line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 1 GHz, and zo − z2 = 6 cm; (dashed line) μa = 0.03 cm−1, μ′s = 25 cm−1, ft = 0, and zo − z2 = 6 cm; (dash-dot line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 0, and zo − z2 = 1 cm. For all plots, n = 1.333. (Reprinted with permission from C. L. Matson, 2001. Deconvolution-based spatial resolution in optical diffusion tomography. Applied Optics, 40, 5791–5801.)
in greater detail in Section V, in which SNR expressions are derived for ODT. In this section, it is sufficient to note that both types of noise have Fourier transforms whose amplitude spectra are constant with respect to spatial frequency and whose amplitudes are typically two to six orders of magnitude lower than the noise-free Fourier amplitudes at zero spatial frequency. In Figure 7 a plot of the Fourier amplitudes of a typical noise realization in ODT is displayed along with the noise-free Fourier amplitude plots discussed previously. Notice that both the decay rate of the forward propagation transfer functions and their corner frequencies determine the value of the spatial frequency for which the noise Fourier amplitudes equal the noise-free Fourier amplitudes. Notice also that the exponential decays of the forward propagation transfer functions result in a swift transition from the region in Fourier space where the signal is larger than the noise to the region where the converse is true. For this reason, the image quality in the backpropagated scattered DPDW is fairly sensitive to the choice of the spatial-frequency cutoff of the regularization low-pass filter. The amount of resolution improvement possible with the use of the backpropagation algorithm is next quantified. First, it is necessary to choose a metric with which to define resolution. A number of definitions have been used, including the full width at half maximum value of the forward propagation PSF (Moon et al., 1996; Ripoll et al., 1999), the reciprocal of the steepest slope of the edge response function (Wabnitz and Rinneberg, 1997), and the reciprocal
of the width of the forward propagation transfer function after it has fallen to some desired value (Hebden, 1992). In this article, spatial resolution is defined as the distance from the peak of the forward propagation PSF to the point at which it falls to 10% of its peak value (Matson and Liu, 2000). This definition is chosen because it approximates the Rayleigh resolution criterion often used to define spatial resolution in images. The Rayleigh criterion actually defines the resolution of a PSF as the distance between its maximum and its first minimum. Because h is a monotonically decreasing function, it never reaches a minimum, so the Rayleigh definition was modified as just described so that it applies to this type of PSF. Second, it is necessary to decide which specific quantities to compare in order to determine the amount of resolution enhancement possible with the use of the backpropagation algorithm. In this chapter, the amount that the resolution is increased by the backpropagation algorithm is determined by comparing the widths of the appropriate PSFs before and after application of the backpropagation algorithm. The narrower the PSF, the greater the spatial resolution in the data. After application of the backpropagation algorithm, the appropriate PSF to use in the comparison process is the PSF of the regularizing filter in the backpropagation algorithm (F(ωx , ωy) in Eq. (51)). This is the correct PSF to use because the backpropagation algorithm divides out the forward propagation transfer function completely so that there is no blurring in the backpropagated scattered DPDW due to the forward propagation PSF, and then regularizes the deconvolution procedure by multiplying the deconvolved data with an appropriate regularization filter. Thus the only blurring in the backpropagated scattered DPDW is due to the regularizing filter. For the measured scattered DPDW, the appropriate PSF is the forward propagation PSF convolved with the regularizing filter’s PSF. The reason that the convolution of these two PSFs is used for the comparison is again due to noise. If there is noise in the measured scattered DPDW, the data should be processed to remove the spatial frequencies where the noise is greater than the signal. As a way to keep the specific type of regularizing filter from affecting the results of the comparison, the same regularizing filter is used for both the measured scattered DPDW and the backpropagated scattered DPDW. Finally, the resolution enhancement is quantified by dividing the width of the backpropagated scattered DPDW PSF by the width of the measured scattered DPDW PSF. Although the amount of spatial resolution in the backpropagated scattered DPDW depends on the type of regularizing filter used (see Section IV.C), the ratio of the regularized backpropagated and measured scattered DPDW PSF widths does not. This ratio is called the PSF width scale factor, for which smaller values are better because they indicate that the PSFs in the backpropagated scattered DPDW are narrower than in the measured scattered DPDW.
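The PSF width scale factor can be estimated numerically along the lines just described. The sketch below is a minimal illustration under the same assumed dispersion relation as in the previous sketch; the pillbox regularizer, its cutoff, the grid, and the helper names (psf_width and so on) are choices made here for demonstration, not the chapter's implementation.

```python
import numpy as np

def gamma_omega(wx, wy, mu_a=0.03, mu_sp=15.0, f_mod=0.0, n_index=1.333):
    v = 2.998e10 / n_index
    k_sq = 3.0 * mu_sp * (-mu_a + 1j * 2.0 * np.pi * f_mod / v)   # assumed dispersion
    return np.sqrt(wx**2 + wy**2 - k_sq)

def psf_width(tf, dx):
    """Distance from the PSF peak to the point where it first falls to 10% of the peak."""
    psf = np.abs(np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(tf))))
    row = psf[psf.shape[0] // 2]                   # slice through the PSF center
    peak = np.argmax(row)
    below = np.nonzero(row[peak:] < 0.1 * row[peak])[0]
    return (below[0] if below.size else row.size - peak) * dx

npix, dw = 256, 0.05                               # frequency grid: 256 samples, 0.05 cm^-1 apart
w = (np.arange(npix) - npix // 2) * dw
wx, wy = np.meshgrid(w, w)
dx = 2.0 * np.pi / (npix * dw)                     # matching image-domain sample spacing

H = np.exp(-6.0 * gamma_omega(wx, wy))             # forward propagation over zo - z2 = 6 cm
F = (np.hypot(wx, wy) <= 1.0).astype(float)        # pillbox regularizing filter, cutoff 1 cm^-1

measured = psf_width(H * F, dx)                    # forward-propagated, then regularized
backprop = psf_width(F, dx)                        # propagation divided out; only F remains
print("PSF width scale factor:", backprop / measured)
```

Only the ratio of the two widths matters here, so the absolute scaling of dx does not affect the result; sweeping the cutoff of F to follow an assumed noise floor reproduces the qualitative dependence on SNR shown in Figure 9.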
Figure 9. PSF width scale factors produced by deconvolution and regularization as a function of the signal-to-noise ratio (SNR) at zero spatial frequency: (solid line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 0, and zo − z2 = 6 cm; (dotted line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 1 GHz, and zo − z2 = 6 cm; (dashed line) μa = 0.03 cm−1, μ′s = 25 cm−1, ft = 0, and zo − z2 = 6 cm; (dash-dot line) μa = 0.03 cm−1, μ′s = 15 cm−1, ft = 0, and zo − z2 = 1 cm. For all plots, n = 1.333. (Reprinted with permission from C. L. Matson, 2001. Deconvolution-based spatial resolution in optical diffusion tomography. Applied Optics, 40, 5791–5801.)
Plots of the PSF width scale factor are shown in Figure 9 for the forward propagation PSFs plotted in Figure 8. The plots are displayed as a function of the SNR in the measured scattered DPDW at zero spatial frequency. Unsurprisingly, the resolution improvements achieved by deconvolution with the backpropagation algorithm increase as the SNR increases. What is surprising is that, despite the large variations in the background, system, and object parameters for the various plots, the resolution improvements for all the cases have approximately the same functional dependence on the SNR. This contrasts strongly with the significant differences in the Fourier amplitude plots seen in Figure 7. Notice also that the PSF width scale factor decreases by approximately a factor of 5 for an SNR of 10^6. Values this large for the SNR at zero spatial frequency are not unreasonable in optical tomography data, as is discussed in Section V.

The relative insensitivity of the deconvolution-based improvement in spatial resolution to the background, system, and object parameters is next discussed in more detail. Recall that each of the plots in Figure 9 is the result of dividing the deconvolved and regularized PSF widths by the regularized (but not deconvolved) PSF widths. These PSF widths are generated from the forward propagation transfer functions whose Fourier amplitude plots are shown in Figure 7, and although these plots vary greatly in how they decay as a function of spatial frequency, the division process underlying the plots in Figure 9 normalizes out most of this spatial-frequency dependence. However, the normalization process does not remove all the differences. Notice that the plot in Figure 9 corresponding
to the 1-GHz modulation case shows a greater improvement as a result of deconvolution for lower SNR levels than for the other plots. This is because the low-spatial-frequency region of the 1-GHz forward propagation transfer function is broader than for the other plots. Conversely, the plot in Figure 9 corresponding to the depth of 1 cm shows greater improvements by means of deconvolution for higher SNR levels because the high-spatial-frequency region of the corresponding Fourier amplitude plot decays less rapidly than in all the other cases.

As seen in this subsection, the backpropagation algorithm can be used to increase the spatial resolution in the measured scattered DPDW. In the next subsection, this property is shown to play a key role in the ability of the backpropagation algorithm to locate an object in three dimensions from a single two-dimensional measurement.

C. Object Localization

In this subsection the use of the backpropagation algorithm to localize an object in three dimensions by using a single two-dimensional dataset is explained (Cheng and Boas, 1998; Durduran et al., 1999; Matson and Liu, 1999b). To begin, let us consider again the mathematical expression of the Fourier transform of the measured scattered DPDW as given by Eq. (44). Substituting this expression into Eq. (51), the mathematical description of the backpropagation operation, gives

    Urec(ωx, ωy; z3) = −K1(z3) [F(ωx, ωy) exp(−i z3 γωi) / (2γω)]
        × ∫∫∫ { oa(x′, y′, z′) + os(x′, y′, z′) [ iωx ∂/∂x′ + iωy ∂/∂y′ − γω ∂/∂z′ ] }
        × uo(x′, y′, z′) exp[−(z3 − z′) γωr] exp[−i(x′ωx + y′ωy − z′γωi)] dx′ dy′ dz′      (57)
where, to enhance the clarity of the following discussion, the location of the x–y plane in which the measured scattered DPDW has been reconstructed is denoted by z = z3. K1(z3) is a normalization factor to be discussed shortly. A comparison of Eqs. (44) and (57) shows that the mathematical expression for the backpropagated scattered DPDW in the z = z3 plane is the same as the measured scattered DPDW (except for F(ωx , ωy) and K1(z3)), with zo in Eq. (44) replaced by z3 in Eq. (57). Because of this equivalence, the reconstructed result is identical to what would have been obtained if the scattered DPDW had been measured in the z = z3 plane. As a result, the backpropagation operation can
be viewed as effectively moving the detection plane back toward the object. Because resolution increases as the distance between the object and the detection plane decreases (see Section IV.B), the backpropagation algorithm effectively increases resolution in the image. However, this equivalence is valid only for planes in the volume outside the object. In terms of Eq. (57), this means that the equivalence holds for values of z3 greater than the largest value of z inside the object, which is denoted by z2 in Figure 5. For z3 < z2, the exponential term exp[−(z3 − z′)γωr] in the integrand becomes an increasing exponential as a function of ω for values of z′ inside the support of the object because γωr is a positive and increasing function of ω. The increasing exponential causes side-lobe artifacts to appear in the reconstructed image for these smaller values of z3 because a low-pass-filtered increasing exponential in the Fourier domain is an approximation of the Fourier transform of a cosine function. These side-lobe artifacts for z3 < z2, together with the increased resolution achieved for z2 < z3 < zo, can be exploited to localize an object, as is discussed next. Object localization has also been considered in standard DT, assuming that all aspects of the object are known except for its location (Devaney and Tsihrintzis, 1991; Schatzberg et al., 1994; Tsihrintzis and Devaney, 1991).

Let us consider the scenario in which a turbid medium containing an object is probed with a DPDW and the measured scattered DPDW is backpropagated throughout the medium. The image of the object as seen in the measured scattered DPDW is blurred a certain amount because of the separation of the object and the detection plane. As z3 is decreased, but kept larger than z2, the backpropagated image of the object becomes sharper because the image resolution increases as the separation between the reconstruction plane and the object decreases. One effect of this increased resolution is that the intensity in the image becomes more concentrated in the vicinity of the object because blurrier images have their intensities spread out more by the blurring PSF. If the integrated intensities of all the planes in the backpropagated scattered DPDW are the same, the amplitude of the backpropagated scattered DPDW increases in the region of the object as z3 gets closer to the object. The normalization factor K1(z3) in Eq. (57) is set equal to the integrated intensity at z3 for all values of z3 used in the backpropagation algorithm in order to achieve this desired normalization. Next, as z3 becomes smaller than z2, the aforementioned side-lobe artifacts start appearing in the reconstructed image. These side-lobe artifacts spread energy out from the object region into the rest of the image and thus lower the amplitude of the backpropagated scattered DPDW inside the object. As a result, the peak of the amplitude of the backpropagated scattered DPDW is contained inside the object. The backpropagation algorithm can therefore localize an object three dimensionally from a single two-dimensional measurement: the scattered DPDW is reconstructed throughout the entire medium between the detection and illumination planes, and the location of the peak of its amplitude is taken as the object location.
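Expressed in code, the localization procedure is a loop over candidate reconstruction depths. The sketch below assumes the measured scattered DPDW is already available as a two-dimensional array of Fourier coefficients at the detection plane, uses the same assumed dispersion relation as in the earlier sketches, and treats the sum of each plane's amplitude as the integrated intensity that plays the role of K1(z3); the synthetic point-object data and all function names are illustrative, not the chapter's code.

```python
import numpy as np

def gamma_omega(wx, wy, mu_a, mu_sp, f_mod, n_index=1.333):
    v = 2.998e10 / n_index
    k_sq = 3.0 * mu_sp * (-mu_a + 1j * 2.0 * np.pi * f_mod / v)   # assumed dispersion
    return np.sqrt(wx**2 + wy**2 - k_sq)

def localize_depth(U_meas, wx, wy, z_o, z_candidates, F, mu_a, mu_sp, f_mod):
    """Backpropagate to each candidate plane z3; return the depth with the largest normalized peak."""
    gamma = gamma_omega(wx, wy, mu_a, mu_sp, f_mod)
    best_z, best_peak = None, -np.inf
    for z3 in z_candidates:
        Hb = np.exp((z_o - z3) * gamma)            # backpropagation transfer function (cf. Eq. (52))
        u3 = np.fft.ifft2(np.fft.ifftshift(Hb * F * U_meas))
        amp = np.abs(u3)
        amp /= amp.sum()                           # equalize integrated intensity (role of K1(z3))
        if amp.max() > best_peak:
            best_peak, best_z = amp.max(), z3
    return best_z

# Synthetic check: a point-like object at z = 2 cm, detection plane at z = 8 cm
npix, dw = 128, 0.125
w = (np.arange(npix) - npix // 2) * dw
wx, wy = np.meshgrid(w, w)
mu_a, mu_sp, f_mod, z_o, z_true = 0.015, 14.0, 1.0e9, 8.0, 2.0
U_meas = np.exp(-(z_o - z_true) * gamma_omega(wx, wy, mu_a, mu_sp, f_mod))  # point object, Eq. (55)
F = (np.hypot(wx, wy) <= 1.5).astype(float)        # pillbox regularizing filter
z_hat = localize_depth(U_meas, wx, wy, z_o, np.arange(0.5, 8.0, 0.25), F, mu_a, mu_sp, f_mod)
print("estimated object depth:", z_hat, "cm (true depth 2.0 cm)")
```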
The type of low-pass filter, F(ωx, ωy), and the spatial-frequency cutoff play critical roles in how effectively the backpropagation algorithm can localize an object. For a fixed object location, filters that have higher-amplitude side lobes produce reconstructions that show the object closer to the detection plane, whereas those with lower-amplitude side lobes produce reconstructions that show the object farther from the detection plane. So that the effect of side-lobe amplitudes on the accuracy of the localization process can be analyzed, three types of filters are used as regularizing filters in the backpropagation algorithm: an ideal low-pass (or pillbox) filter, a modified Hamming filter, and a Hanning filter. Mathematically, these filters are described in the Fourier domain by the following equations:

Pillbox:
    H1(|ω|) = 1 for 0 ≤ |ω| ≤ Ro, and 0 otherwise                              (58)

Modified Hamming:
    H2(|ω|) = 0.625 + 0.375 cos(π|ω|/Ro) for 0 ≤ |ω| ≤ Ro, and 0 otherwise     (59)

Hanning:
    H3(|ω|) = 0.5 + 0.5 cos(π|ω|/Ro) for 0 ≤ |ω| ≤ Ro, and 0 otherwise         (60)
where the spatial-frequency cutoff for all three filters is denoted by Ro and |ω| is the magnitude of the spatial frequency. In the Fourier domain, these filters differ in how much they attenuate the higher spatial frequencies, as can be seen in Figure 10. In the image domain, the width of the central lobes and the height of the side lobes of their corresponding PSFs are their distinguishing differences, as shown in Figure 11.
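Equations (58) through (60) translate directly into code. The following sketch is a plain transcription of the three filter definitions; the cutoff value of 0.7 cm−1 is an arbitrary example.

```python
import numpy as np

def pillbox(w, R_o):
    """Ideal low-pass (pillbox) filter, Eq. (58)."""
    return np.where(np.abs(w) <= R_o, 1.0, 0.0)

def modified_hamming(w, R_o):
    """Modified Hamming filter, Eq. (59)."""
    return np.where(np.abs(w) <= R_o, 0.625 + 0.375 * np.cos(np.pi * np.abs(w) / R_o), 0.0)

def hanning(w, R_o):
    """Hanning filter, Eq. (60)."""
    return np.where(np.abs(w) <= R_o, 0.5 + 0.5 * np.cos(np.pi * np.abs(w) / R_o), 0.0)

R_o = 0.7                           # example cutoff, cm^-1
for f in (pillbox, modified_hamming, hanning):
    print(f.__name__, "value at the cutoff:", f(np.array([R_o]), R_o)[0])
```

At |ω| = Ro the modified Hamming filter retains a value of 0.25 whereas the Hanning filter falls to zero, which is consistent with the ordering of their PSF side-lobe amplitudes discussed below.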
Figure 10. Slices of the frequency-domain responses of low-pass filters used to regularize the backpropagation algorithm: (solid line) pillbox filter, (dashed line) modified Hamming filter, and (dotted line) Hanning filter. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254–1265.)
Figure 11. Slices of the PSFs corresponding to the low-pass filters shown in Figure 10: (solid line) PSF corresponding to the pillbox filter, (dashed line) PSF corresponding to the modified Hamming filter, and (dotted line) PSF corresponding to the Hanning filter. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254–1265.)
The pillbox filter has the highest-amplitude side lobes and the narrowest main lobe, the modified Hamming filter has lower-amplitude side lobes and a wider main lobe, and the Hanning filter has the lowest-amplitude side lobes and the widest main lobe for the same spatial-frequency cutoff. It can be seen that the less attenuation the filter imposes in its passband, the higher the side lobes and the narrower the central lobe of the PSF. The behavior of the single-view backpropagation algorithm for these filters and for the spatial-frequency cutoffs chosen for them is explored next. First, the effect of the size of the spatial-frequency cutoff is demonstrated by backpropagating a measured scattered DPDW for two values: one chosen to be in the region of Fourier space where the signal is greater than the noise, and a second value chosen to be in the region where the noise is greater than the signal. All three filters are used for both values of the spatial-frequency cutoff. The resulting reconstructions are analyzed to determine how accurately the estimated object location corresponds to the actual object location. This analysis is then carried out in a more quantitative manner and tested with another set of data. Both sets of data are computer-simulated data generated with a previously developed and validated computer simulation software package called Photon Migration Imaging (PMI). The PMI software was developed by D. A. Boas, M. A. O'Leary, X. Li, B. Chance, A. G. Yodh, M. A. Ostermeyer, and S. L. Jacques and is available from David Boas at the Harvard Medical School through his web site at http://www.nmr.mgh.harvard.edu/DOT. A schematic of the simulation geometry is shown in Figure 12. The background-medium
Figure 12. Schematic of a system and target used for the single-view backpropagation reconstructions and analyses shown in Figures 13 through 18. The background material properties are μa = 0.015 cm−1 and μ′s = 14 cm−1, the 1-cm-diameter sphere's material properties are μa = 0.5 cm−1 and μ′s = 12 cm−1, the illumination source is located at x = y = z = 0, and the detection plane is located at z = 8 cm. The sphere is located in the plane x = y = 0 and at varying z locations. The y = 0 plane is shown.
absorption and reduced scattering coefficients for these results are 0.015 and 14 cm−1, respectively, which correspond to previously published values for breast tissue (Peters et al., 1990). The object is a 1-cm-diameter sphere whose center is located at x = 0, y = 0, and z = 2 cm, and whose absorption and scattering coefficients are 0.5 and 12 cm−1, respectively (Peters et al., 1990). For these tissue values, δμa = 0.485 cm−1 and δμ′s = − 2 cm−1, where δμa and δμ′s are the differences in the object’s absorption and reduced scattering coefficients from the background medium. The illumination is provided by a single point source modulated at 1 GHz that is located at x = 0, y = 0, and z = 0. The detection plane is located at z = 8 cm. The first set of backpropagated reconstructions is shown in Figure 13, where planar slices of the three-dimensional backpropagated scattered DPDW amplitudes are shown that are perpendicular to the detection plane and that pass through the center of the object. For these reconstructions, the spatial-frequency cutoff Ro of all the filters was set to 15 pixels, where 1 pixel corresponded to a spatial-frequency step of 0.125 cm−1. Notice that the side-lobe structure is
Figure 13. Planar slices of the backpropagated scattered DPDW amplitudes in a turbid medium volume with a 1-cm-diameter object located at z = 2 cm. The background material properties are μa = 0.015 cm−1 and μ′s = 14 cm−1, and the object material properties are μa = 0.5 cm−1 and μ′s = 12 cm−1. The illumination source is at the bottom of each slice, the detection plane is at the top of each slice perpendicular to each slice, and the slice shown contains the object center. The image size is 8 × 8 cm. Clockwise from the upper left: true object location, backpropagated scattered DPDW using a pillbox filter, backpropagated scattered DPDW using a Hanning filter, backpropagated scattered DPDW using a modified Hamming filter. All three filters have a spatial-frequency cutoff of 15 pixels. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254–1265.)
greatest for the pillbox filter and least for the Hanning filter, just as for the filters themselves (see Fig. 11). Notice also that the point of maximum amplitude differs for all three reconstructions. Therefore, it can be seen that the choice of filter affects the accuracy of the estimate of the object’s location. This point is readdressed shortly. The second set of reconstructions is shown in Figure 14, where the spatial-frequency cutoff was set to 40 pixels. It is easily seen that all three reconstructions are highly inaccurate. As will be shown, this inaccuracy is due to noise that is passed by the filter at high spatial frequencies. The combination of Figures 13 and 14 validate the claim that the filter type and the spatial-frequency cutoff play significant roles in the performance of the single-view backpropagation algorithm. Next, the accuracies of the estimated object depths as determined from the backpropagated scattered DPDW amplitudes are analyzed for all the
Figure 14. Planar slices of the backpropagated scattered DPDW amplitudes in a turbid medium volume with a 1-cm-diameter object located at z = 2 cm. The material and system properties are as described in Figure 13. Clockwise from the upper left: true object location, backpropagated scattered DPDW using a pillbox filter, backpropagated scattered DPDW using a Hanning filter, backpropagated scattered DPDW using a modified Hamming filter. All three filters have a spatial-frequency cutoff of 40 pixels. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254–1265.)
spatial-frequency cutoff values ranging from the smallest to the largest nonzero values permitted by the discrete Fourier transform of the measured scattered DPDW. It is assumed that the localization in x and y has already occurred, so the focus of this discussion is on the z dimension. In Figure 15, plots of the estimated z location of the object obtained by using the backpropagation algorithm for all three filters as a function of the filter spatial-frequency cutoff are shown. All material and system parameters are the same as used to generate Figure 13. Notice that the modified Hamming filter produces the most accurate estimates for the object location. Several other interesting phenomena can be seen in Figure 15 as well. The first is that the estimated object location obtained for all three filters diverges greatly from the true location when the filter spatial-frequency cutoff is larger than approximately 25. As a way to determine why this happens, radial slices of the Fourier transform of the measured scattered DPDW are plotted in Figure 16. Notice that the plot becomes dominated by noise for spatial-frequency values greater than
Figure 15. Estimated object locations in the z direction for a 1-cm-diameter object located at z = 2 cm. The material and system properties are as described in Figure 13. (Solid line) Pillbox filter, (dotted line) Hanning filter, and (dashed line) modified Hamming filter. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254 –1265.)
approximately 25, precisely where the estimates diverge. Although noise was not explicitly added in the simulated data, the finite precision of the computer produced the noise seen in the figure. Another phenomenon seen in Figure 15 is that the modified Hamming filter estimate does not become accurate until the spatial-frequency cutoff is greater than approximately 5 pixels. This behavior occurs because the spatial-frequency cutoff must be large enough to include the initial decay of the Fourier transform of the measured scattered DPDW.
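The divergence just described suggests the rule used repeatedly in the rest of this section: place the regularizing filter's cutoff at the spatial frequency where the data's Fourier amplitude falls to the noise floor (the SNR = 1 point). The sketch below estimates that point from a radially averaged amplitude spectrum; estimating the noise floor from the highest-frequency bins is an assumption made here for illustration, and measured_scattered_dpdw is a placeholder array name.

```python
import numpy as np

def radial_profile(spectrum_amp, dw):
    """Average a 2-D Fourier amplitude (zero frequency at the center) over annuli of |omega|."""
    n = spectrum_amp.shape[0]
    idx = np.arange(n) - n // 2
    r = np.hypot(*np.meshgrid(idx, idx)) * dw
    bins = np.arange(0.0, (n // 2) * dw, dw)
    prof = np.array([spectrum_amp[(r >= b) & (r < b + dw)].mean() for b in bins])
    return bins, prof

def snr_one_cutoff(U_meas, dw, noise_fraction=0.1):
    """Spatial frequency at which the radial Fourier amplitude first drops to the noise floor."""
    bins, prof = radial_profile(np.abs(U_meas), dw)
    n_tail = max(1, int(noise_fraction * len(prof)))
    noise_floor = np.median(prof[-n_tail:])        # noise spectrum assumed flat (see Section V)
    below = np.nonzero(prof <= noise_floor)[0]
    return bins[below[0]] if below.size else bins[-1]

# usage sketch:
# U = np.fft.fftshift(np.fft.fft2(measured_scattered_dpdw))
# R_o = snr_one_cutoff(U, dw=0.125)   # cutoff in the same units as dw (cm^-1 per frequency sample)
```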
Figure 16. Slice of the measured scattered DPDW’s Fourier amplitude for a 1-cm-diameter object located at z = 2 cm. The material and system properties are as described in Figure 13. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254 –1265.)
Figure 17. Estimated object locations in the z direction for a 1-cm-diameter object located at z = 6 cm. The material and system properties are as described in Figure 13. (Solid line) Pillbox filter, (dotted line) Hanning filter, and (dashed line) modified Hamming filter. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254 –1265.)
As a way to test these conclusions, the simulation was next run to produce a scattered DPDW for all the same parameters except that the object was located at z = 6 cm. The calculations used to generate Figure 15 were rerun for this new location of the object. The resulting estimated depths of the object produced by the three filters as a function of spatial-frequency cutoff are shown in Figure 17. Once again, notice that the modified Hamming filter estimates become accurate only for spatial-frequency cutoffs greater than approximately 5 pixels. Also, all the estimates diverge for spatial-frequency cutoffs greater than approximately 45 pixels. This is where the Fourier transform of the measured scattered DPDW becomes dominated by noise, as can be seen in Figure 18. Once again, the modified Hamming filter provides the most accurate estimates of the object's z location. However, the modified Hamming filter does not always produce the best results, even though it did for these simulation parameters. It can be seen in Section IV.E that the pillbox filter produced better results in the example that is used to demonstrate the multiple-view backpropagation algorithm. More research is needed to determine how to pick optimal filters for ODT.

D. Laboratory Data Reconstruction Examples

In the previous two subsections, the single-view backpropagation algorithm was shown to be able to locate an object three dimensionally by using just a single two-dimensional measurement and to increase the spatial resolution
Figure 18. Slice of the measured scattered DPDW’s Fourier amplitude for an object located at z = 6 cm. The material and system properties are as described in Figure 13. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254 –1265.)
in the measured scattered DPDW. In this subsection, these theoretical results are demonstrated by using laboratory data. The laboratory data were collected by using two systems: one that employed modulated illumination and one that employed CW illumination. In Section IV.D.1, the frequency-domain system is described and results obtained from this system are presented. In Section IV.D.2, the CW system and results are presented.

1. Frequency-Domain Data

The frequency-domain system described in this subsection was used to obtain data for two values of modulation frequency: 20 MHz and 10 kHz. The 20-MHz modulation frequency resulted in data that had both amplitude and phase components, whereas the 10-kHz modulation frequency produced a DPDW with a wavelength sufficiently long that only amplitude data were measured. The 20-MHz system was the first system built by the author and his collaborators to test the backpropagation algorithm and was chosen specifically to obtain data that had measurable phase values in order to see how both amplitude and phase are used by the backpropagation algorithm. Subsequently an SNR analysis, presented in Section V, was carried out that predicted that the backpropagation algorithm would produce backpropagated scattered DPDWs with higher spatial resolutions by using CW illumination than was possible by using high-frequency modulated illumination. For this reason, the frequency-domain system was then modified to use 10-kHz illumination (essentially CW for this purpose). A schematic of the frequency-domain imaging
Figure 19. Frequency-domain system schematic. The data branch of the schematic is shaded in gray.
system for both modulation frequencies is shown in Figure 19. The electronic components of the frequency-domain system for the 20-MHz illumination included two radio frequency (RF) signal generators (Rhode & Schwarz, SMY01), two 50% power splitters (Mini-Circuits, ZFSC-2-1), two RF amplifiers (Mini-Circuits, ZFL-1000H), a 3-mW 780-nm laser diode (LaserMax Inc., LSX-3500) that could be modulated at frequencies up to 100 MHz, an avalanche photodiode (APD) (Hamamatsu, C5331), two frequency mixers (Mini-Circuits, ZLW-2), two low-pass filters (Mini-Circuits, SLP-1.9), a lock-in amplifier (Stanford Research Systems, SR850), and a computercontrolled, two-dimensional positioning stage (Aerotech, Inc., ATS100). For the 10-kHz illumination, the power splitters, amplifiers, frequency mixers, and low-pass filters were switched to these kilohertz-compatible components: power splitters (Mini-Circuits, ZFSC-2-6), amplifiers (Mini-Circuits, ZHL6A), frequency mixers (Mini-Circuits, ZAD-6), and low-pass filters (Rockland
Systems, Model 452). For both modulation frequencies, the heterodyne frequency was chosen to be 25 kHz. The low-pass filters are used to isolate the heterodyne signal and are not part of the backpropagation algorithm. The system operates by first choosing the desired modulation frequency. The computer controls the two signal generators to produce this modulation frequency for the data branch and the modulation frequency plus 25 kHz for the reference branch. The output from the data-signal generator is split evenly between the laser diode branch and the branch that provides a reference signal to the lock-in amplifier. In the laser diode branch, the signal is amplified to a level appropriate to drive the laser diode, which is used to illuminate a turbid medium phantom that is described subsequently. The DPDW emerging from the turbid media phantom is detected by the APD, which is mounted on a movable stage. The output from the APD is amplified and mixed with the output from the reference-signal generator to produce the desired 25-kHz heterodyne frequency. The heterodyne data-signal is low-pass filtered to remove the up-converted signal out of the mixer and then input it into the lock-in amplifier. The lock-in amplifier also takes as input a reference 25-kHz signal that is produced by mixing the data and reference generator outputs and low-pass filtering the result to obtain just the heterodyne signal. Comparing the data and reference inputs yields amplitude and phase information that is provided to the control computer. The computer automatically records the data and also controls the sensitivity and time constant settings for the lock-in amplifier on the basis of the signal strength. In addition, the computer drives the APD movable stage to collect data at all the desired spatial locations. Turbid medium phantoms were made out of plastic resin that was hardened in molds by adding a catalyst. The resin was combined with TiO2 powder to obtain the desired scattering coefficient and with dye to obtain the desired absorption coefficient (Firbank et al., 1993). The optical properties of homogeneous phantoms were determined by using the slope algorithm (Fishkin and Gratton, 1993) in transmission geometry. Because the backpropagation algorithm requires information on the homogeneous background medium, multiple samples were made as a set at the same time to ensure that the samples of the set had the same background optical properties. The first reconstruction example used the 20-MHz modulation-frequency value. The imbedded object was a 0.9-cm spherical absorber placed halfway between the two faces of a turbid medium phantom. The phantom’s depth in the z direction was 4.7 cm and its transverse dimensions were 14 ×14 cm. The phantom’s absorption and reduced scattering coefficients were 0.04 and 9.6 cm−1, respectively, and were determined by using the homogeneous phantom. The spherical absorber was placed off center with respect to the location where the 20-MHz illuminating laser diode entered the medium. The APD
Figure 20. Planar slice of the backpropagated scattered DPDW amplitude in a turbid medium volume with a 0.9-cm-diameter absorbing sphere located at z = 2.3 cm. The background material properties are μa = 0.04 cm−1 and μ′s = 9.6 cm−1. The illumination source is at the bottom of the slice, the detection plane is at the top of the slice perpendicular to the slice, and the slice shown contains the object center. The image size is 8 × 4.7 cm. The data were collected with the frequency-domain system shown in Figure 19.
was scanned over the center 8 × 8-cm portion of the phantom with a pixel separation size of 0.25 cm. The SNR at zero spatial frequency was determined from the data to be 1000, which implies that the backpropagation algorithm should be able to increase the resolution in the measured scattered DPDW by a factor of 3 (see Fig. 9). The regularizing filter used in the backpropagation algorithm was the modified Hamming filter whose spatial-frequency cutoff was set to 0.7 cm−1, which corresponded to the SNR = 1 point in the Fourier amplitude spectrum. A two-dimensional slice of the amplitude of the backpropagated scattered DPDW in the x–z plane that passed through the center of the spherical absorber is shown in Figure 20. The peak of this plane is located at a depth of 2.4 cm from the detection plane, which lies within a millimeter of the true depth location. The transverse location of the spherical absorber was determined with similar accuracy. The backpropagated scattered DPDW amplitudes are displayed in three planes parallel to the detection plane in Figure 21: the detection plane, the plane corresponding to the best image quality, and the plane where the illumination enters the phantom. Figure 21 also contains a diagram indicating the true location and size of the spherical absorber. Notice that the best image quality was obtained in the plane where the object resided. After the spatial resolution in the best reconstruction plane was analyzed and compared with the resolution in the detection plane, it was determined that the amount of resolution increase was only a factor of 2, not 3. It was suspected that this could be due to incomplete background subtraction or to excess RF noises, which were noticed to be a problem when the data
Figure 21. Clockwise from the upper left: true object, measured scattered DPDW amplitude, backpropagated scattered DPDW amplitude in a plane parallel to the detection plane at the true depth (2.3 cm) of the absorbing sphere, and backpropagated scattered DPDW amplitude at a depth of 4.7 cm. The background material properties are μa = 0.04 cm−1 and μ′s = 9.6 cm−1, the modulation frequency is 20 MHz, and the separation of the illumination and detection plane is 4.7 cm. The data were collected with the frequency-domain system shown in Figure 19.
were taken. In addition, notice that the reconstructed shape of the spherical absorber is oval, not spherical. This shape is probably due to a mismatch between the true background DPDW in the phantom containing the object and the background DPDW used for the subtraction that was measured by using a different phantom. The second reconstruction example used the 10-kHz modulation-frequency value. The object was an airplane model imbedded in a turbid medium phantom. The phantom’s absorption and reduced scattering coefficients were μa = 0.01 cm−1 and μ′s = 18 cm−1, respectively. The depth of the phantom was 5.5 cm and its transverse dimensions were 20 ×14 cm. The airplane dimensions were approximately 6.5 × 7.5 cm and it was opaque to light and thus behaved approximately as an absorbing object. The illumination was provided by four point sources whose locations were chosen to approximately bracket the spatial extent of the airplane model. The airplane was imbedded in the middle of the phantom. Similar to the case of the spherical absorber, the background measurement was obtained with a phantom made at the same time with the
Figure 22. Clockwise from the upper left: true object, measured scattered DPDW amplitude, backpropagated scattered DPDW amplitude in a plane parallel to the detection plane at the true depth (2.5 cm) of the airplane, and backpropagated scattered DPDW amplitude at a depth of 5.5 cm. The background material properties are μa = 0.01 cm−1 and μ′s = 18 cm−1, the modulation frequency is 10 kHz, and the separation of the illumination and detection plane is 5.5 cm. The data were collected with the frequency-domain system shown in Figure 19.
same optical properties and dimensions. The SNR at zero spatial frequency in the data was 10^5, which implies that the amount of resolution increase brought about by the backpropagation algorithm should be approximately a factor of 4. The regularizing filter in the backpropagation algorithm was a pillbox filter whose spatial-frequency cutoff was chosen to be 0.5 cm−1, the frequency at which the SNR dropped to one in the Fourier data. In Figure 22, an image of the true object is shown along with reconstructions at three planes parallel to the detection plane: the detection plane, the plane corresponding to the best image quality, and the plane where the illumination enters the phantom. Notice that the blurring in the measured scattered DPDW is sufficiently strong that it is not possible to determine what object is imbedded in the phantom. The best image quality was obtained for the reconstruction that corresponds to the correct depth location of the airplane. Notice that the increased spatial resolution permits identification of the orientation of the airplane because its wings, tail, and fuselage are clearly seen. The reconstruction for the plane where the illumination enters the phantom shows that the backpropagation
algorithm overcompensated for the turbid media blurring, which resulted in a reconstruction in which noise dominates. Thus it can be seen that the correct object depth, as for the spherical absorber, can be determined by finding the depth location where the best image quality is obtained. In addition, after analysis of the relative spatial resolutions in the measured scattered DPDW and in the plane with the best reconstruction, it was determined that the resolution was increased by approximately a factor of 4 by the backpropagation algorithm (Matson and Liu, 2000). This is what theory predicts, unlike for the spherical absorber example. Because the illumination was modulated at 10 kHz for the airplane reconstruction and at 20 MHz for the spherical absorber reconstruction, it is likely that the poor match between theory and experiment for the spherical absorber was due to the RF noise noticed during the data collection.

2. CW Data

In this subsection, a system that uses CW illumination to image objects imbedded in turbid media is described, along with results obtained by using it. A schematic of the system is shown in Figure 23. A 20-mW 632.8-nm HeNe laser (Melles Griot, 05-LHP-925) is used to illuminate the turbid medium phantom. Optics are used to collimate the beam to the desired diameter. The spatial coherence of the laser can be destroyed to improve the SNR in the data (see the discussion in Section V) by using a rotating ground-glass diffuser directly in front of the phantom. The light emerging from the far side of the phantom is imaged onto an unintensified 16-bit charge-coupled device (CCD) camera (SpectraSource, Orbis 1) that uses a 512 × 512 CCD chip (SITe,
Figure 23. CW system schematic. CCD, charge-coupled device.
SI-502A) and has a read noise of 25 electrons root-mean-square (RMS). The camera is controlled by a computer that is also used to collect the data. Unlike the frequency-domain system described in the previous section, the CW system collects information for all the spatial locations in a single snapshot, which greatly speeds up the data collection process.

The first reconstruction example was obtained for an object consisting of two 0.8-cm cubes whose absorption and reduced scattering coefficients are 0.12 and 15 cm−1, respectively. These cubes are imbedded at a depth halfway between the two faces of a turbid medium phantom whose depth is 4.5 cm, whose transverse dimensions are 14 × 14 cm, and whose absorption and reduced scattering coefficients are 0.02 and 15 cm−1, respectively. A square center region of the phantom, measuring 8.75 cm on a side, was imaged onto the CCD camera. The laser beam was collimated to a pencil beam and illuminated the phantom at a transverse location directly between the two cubes. The exposure time for the data was chosen to fill the CCD wells at the brightest location in the image to more than 90% of their maximum value. The SNRs at zero spatial frequency for the inhomogeneous and homogeneous data sets are on the order of 10^4; however, because the optical properties of the cubes are close to those of the background medium, the SNR at zero spatial frequency in the measured scattered DPDW is only 100. As a result, the theoretically predicted improvement in spatial resolution brought about by the backpropagation algorithm is a factor of 2. The modified Hamming filter used to regularize the backpropagation algorithm had a spatial-frequency cutoff of 0.45 cm−1, the location in spatial-frequency space where the SNRs went to one. In Figure 24, a diagram indicating the true transverse locations of the two cubes is shown along with reconstructions at three planes parallel to the detection plane: the detection plane, the plane corresponding to the best image quality, and the plane where the illumination enters the phantom. Notice that the amount of spatial resolution improvement is approximately a factor of 2, as predicted by theory.

The second reconstruction example using CW illumination is of an airplane model located outside and against a turbid medium phantom. This geometry was chosen for two reasons: one is that it is of interest to see whether images can be obtained of objects obscured by but not imbedded in a turbid medium, and the second is that all the reconstructions shown in this subsection so far are of objects imbedded halfway between the two faces of the turbid media phantoms. Carrying out a reconstruction for an object not imbedded at the halfway point ensures that the backpropagation algorithm actually locates objects in depth instead of just creating the best reconstruction at the halfway point. The airplane model was placed directly against the face of the phantom where the illumination entered the medium. The length of the airplane model along the fuselage is 12.5 cm. The phantom is essentially infinite in extent in
Figure 24. Clockwise from the upper left: true object, measured scattered DPDW amplitude, backpropagated scattered DPDW amplitude in a plane parallel to the detection plane at the true depth (2.3 cm) of the imbedded cubes, and backpropagated scattered DPDW amplitude at a depth of 4.5 cm. The background material properties are μa = 0.02 cm−1 and μ′s = 15 cm−1, and the separation of the illumination and detection plane is 4.5 cm. The data were collected with the CW system shown in Figure 23.
the transverse dimensions and is 2.5 cm in depth. Its absorption and reduced scattering coefficients are 0.01 and 18 cm−1, respectively. A square center region of the phantom, measuring 17.5 cm on a side, was imaged onto the CCD camera. The laser beam, which has a Gaussian intensity profile, was collimated to a diameter of 20 cm. Again the exposure time for the data was chosen to fill the CCD wells at the brightest location in the image to more than 90% of their maximum value. As for imbedded objects, the measured scattered DPDW was created by subtracting the measured DPDW without the airplane from the measured DPDW with the airplane in place. This subtraction produced a measured scattered image of the airplane instead of the light not blocked by the airplane. The measured scattered image consists entirely of negative values because it is the negative of the light blocked by the airplane; however, the backpropagation algorithm is not adversely affected by negative values. The SNR at zero spatial frequency of the measured scattered light is approximately 10^3, which results in an expected resolution improvement in the optimal reconstructed image of a factor of 3. The modified Hamming filter used to regularize the
Figure 25. Clockwise from the upper left: true object, measured scattered DPDW, backpropagated scattered DPDW in a plane parallel to the detection plane at a depth of 1.2 cm, and backpropagated scattered DPDW at a depth of 2.5 cm, the edge of the turbid medium phantom against which the true object is resting. The background material properties are μa = 0.01 cm−1 and μ′s = 18 cm−1, and the separation of the illumination and detection plane is 2.5 cm. The data were collected with the CW system shown in Figure 23.
backpropagation algorithm had a spatial-frequency cutoff of 0.5 cm−1, the location in spatial-frequency space where the SNRs went to one. In Figure 25, a picture of the noise-free and unblurred image is shown along with reconstructions at three planes parallel to the detection plane: the detection plane, the plane corresponding to the middle of the phantom, and the plane where the illumination enters the phantom. The best reconstruction is seen to be at the phantom face where the illumination enters the phantom, which is where the airplane model is located. Also, the improvement in resolution is approximately a factor of 3, as predicted by theory.
E. Multiple-View Backpropagation

In this subsection, a multiple-view backpropagation algorithm is presented and it is shown that structural information about the object can be obtained from images reconstructed by using this algorithm (Matson and Liu, 1999b).
It was seen in Section IV.C that the single-view backpropagation algorithm can be used to locate an object in depth by reconstructing the scattered DPDW throughout the turbid medium and finding the location of the peak of the absolute value in the reconstruction. However, a single view does not, in general, provide sufficient information to reconstruct an object with a significant depth dimension. As discussed in Section III.C, a single view contains information about the object in only a two-dimensional region of the three-dimensional Fourier transform volume. As a result, multiple views are needed to fill in the information throughout the entire three-dimensional Fourier volume. When fewer than a complete set of views is available, assumed or known prior knowledge about the properties of the object is often used to improve the quality of the reconstruction (Devaney, 1989). Typical kinds of prior knowledge include knowledge that the material properties are constant inside the object and knowledge about the shape of the object. Following the approach of standard DT (Devaney, 1986), the multiple-view backpropagation algorithm is defined in two steps. In the first step, the measured scattered DPDW for each view is backpropagated through the volume. In the second step, these backpropagated scattered DPDWs are coherently summed, which gives

    urec,m(x, y, z) = Σφ urec(x, y, z; φ)                 (61)
where φ is an index denoting the different views of the object, urec(x, y, z; φ) is the backpropagated scattered DPDW for the view of index φ, and urec,m(x, y, z) is the coherent sum of all the single-view backpropagated scattered DPDWs. In standard DT, the coherent sum of all the single-view backpropagated waves, as expressed in Eq. (61), can produce images that provide image structural information (Devaney, 1986). As is shown with an example in this subsection, image structural information can also be obtained by using the turbid media version of the backpropagation algorithm. Often the ability to produce such an image of the object is important, especially if it can be accomplished in near real time to provide quick-look information. In addition, the image can be used as the starting point of a quantitative material properties algorithm. For these reasons it is next demonstrated, with computer-simulated data, that the coherent sum of the single-view backpropagated scattered DPDWs can provide such quick-look information with reasonable image quality. The PMI software was used to generate the simulated measured scattered DPDWs for the reconstructions. Two 1-cm-diameter spheres are used as the imbedded objects, arranged as shown in Figure 26, where the plane shown in Figure 26 corresponds to the x–z plane in Figure 5. The upper object, located at x = 2, y = 0, and z = 6 cm, has absorption and reduced scattering coefficients of 0.5 and 12 cm−1, respectively. The lower object, located at x = 0, y = 0, and z = 2 cm, has absorption and reduced scattering coefficients
Figure 26. Schematic of a system and target used for the multiple-view backpropagation reconstructions. The background material properties are μa = 0.015 cm−1 and μ′s = 14 cm−1, the upper sphere’s material properties are μa = 0.5 cm−1 and μ′s = 12 cm−1, and the lower sphere’s material properties are μa = 0.7 cm−1 and μ′s = 9 cm−1. Both spheres have a diameter of 1 cm. The upper sphere is located at x = 2, y = 0, and z = 6 cm; the lower sphere is located at x = 0, y = 0, and z = 2 cm; the illumination source is located at x = y = z = 0; and the detection plane is located at z = 8 cm. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254 –1265.)
of 0.7 and 9 cm−1, respectively. The illumination and detection apparatus are modeled as rotating about the rotation axis in 15◦ increments through a full 360◦ , which results in 24 views of the object. The detector array consists of 64 × 64 elements that are spaced 0.25 cm from each other, and a 200-MHz point illumination source was used that is 8 cm behind and centered on the detection plane. The background absorption and reduced scattering coefficients are 0.015 and 14 cm−1, respectively, and the index of refraction is 1.333. For the first reconstruction, a pillbox filter with a spatial-frequency cutoff of 11 pixels was used as the regularizing filter in the backpropagation algorithm. A plane of the amplitude of the three-dimensional backpropagated scattered DPDWs is shown in Figure 27. This plane is the same plane as that displayed in Figure 26; that is, it is an x–z plane that contains the centers of the spheres. Both spheres can be seen clearly and are located within 2–3 mm of their true locations. In addition, the reconstructed sizes of the spheres are within a few millimeters of their true sizes. For the second reconstruction, also shown in Figure 27, the modified Hamming filter also with a spatial-frequency cutoff
Figure 27. Multiple-view backpropagation reconstruction of two spheres described in Figure 26 using (left) the pillbox filter and (right) the modified Hamming filter. (Reprinted with permission from C. L. Matson and H. Liu, 1999. Backpropagation in turbid media. Journal of the Optical Society of America A, 16, 1254 –1265.)
of 11 pixels was used. The spheres are also clearly visible, but the accuracy of their locations is slightly worse than for the pillbox filter. In addition, the pillbox filter produced a sharper image, which was expected because the PSF corresponding to the pillbox filter has a narrower central lobe than does the PSF for the modified Hamming filter (see Fig. 11). Therefore, in this multiple-view example, it can be seen that the pillbox filter produced better results than were obtained with the modified Hamming filter, unlike for the single-view examples in Section IV.C, for which the modified Hamming filter produced the best results. Consequently, more work is needed to determine how the filters affect the algorithm's performance and thus how to choose optimal filters. Another result that can be seen from these two figures is that the spherical object shapes are reconstructed much more faithfully in the multiple-view reconstructions than in the single-view reconstructions (compare Fig. 13 with Fig. 27). This result is a visual demonstration of the need for multiple views to obtain accurate object reconstructions. The algorithm, implemented by using the Interactive Data Language (IDL) data analysis package (Research Systems, Boulder, CO), took approximately 2 s per view to reconstruct a 64 × 64 × 32-pixel volume when run on an IBM RS6000 Model 591. In addition, the speed of the algorithm can be increased significantly by implementing the algorithm in a compiled language such as C or Fortran.

V. Signal-to-Noise Ratios

In Section IV, the ability of the backpropagation algorithm to both localize an object and increase resolution in the measured scattered DPDW was discussed, and this ability was shown to be a function of the SNR of the Fourier amplitudes of the measured scattered DPDW. In particular, the SNRs of the Fourier amplitudes of
the measured scattered DPDW determined how to choose the spatial-frequency cutoff of the regularizing filter to maximize spatial resolution and optimally locate an object’s depth in the medium. Because Fourier domain SNRs play such a strong role in the performance of the backpropagation algorithm, it is important to develop expressions that analytically describe the SNRs in the Fourier transform of the measured scattered DPDW as a function of all the relevant system, object, and material properties. Such expressions can be used to predict the performance of the backpropagation and other algorithms so that the performance of the system can be determined before an experiment or ODT system is designed. In this section, these SNR expressions are developed and analyzed (Matson, 2002). In Section V.A, the SNR expressions are developed for two types of ODT systems: those that use CW illumination and those that use modulated illumination. In Section V.B, these expressions are compared with laboratory data SNRs to determine their accuracy.
A. SNR Derivations

In this subsection, the system geometries and other assumptions underlying the SNR developments are presented and explained. In addition, assumptions on the noise sources corrupting the measured data are described and explained, as is the form of the turbid medium in which the object is imbedded. Then the desired SNR expressions are derived, first for systems using CW illumination and second for systems using modulated illumination.

1. Assumptions

For both frequency-domain and CW imaging systems, a transmission geometry as shown in Figure 5 is assumed; however, the SNR expressions do not require an infinite homogeneous medium in which the object is imbedded. The assumption of an infinite homogeneous medium in the previous sections of this article was used to permit the development of the backpropagation algorithm. Because the SNR expressions derived in this subsection involve only the measured scattered DPDW, the assumption of a homogeneous and infinite medium can be relaxed to an inhomogeneous medium that has finite extent. The surface of the turbid medium facing the detector that is contained within the field of view of the detector is assumed to be planar and parallel to the detector surface. Other than the planar detection surface, the turbid medium boundaries can have any shape. The data are assumed to be measured in a plane by using a rectangular grid of detectors. The reason for assuming a planar detection surface and a rectangular grid of data in each measurement is to facilitate the analysis of spatial resolution in the measured data and in the postprocessed data. In
this geometry, each data set is an image in and of itself and thus does not require image-reconstruction methods to generate an image from the data. This simplification helps make the underlying spatial resolution properties more transparent. In addition, this system geometry models the compressed-breast optical mammography geometry very well (Franceschini et al., 1997), although other optical mammography geometries also exist (Colak et al., 1999). As is common in ODT, the properties of the turbid medium and the imbedded object are assumed to be independent of time during the data-collection period. For the CW imaging system model, it is assumed that the detector collecting the light is an unintensified CCD camera. CCD cameras have been used to collect spatially resolved ODT data from which bulk material properties have been estimated (Gobin et al., 1999; Kienle et al., 1996), as well as to image the interior of turbid media (Cheng and Boas, 1998). Because the light levels associated with ODT are generally reasonably high, an intensifier is typically not needed. The light emerging from the turbid medium is imaged onto the CCD camera by using a simple lens system. It is assumed that the only two noise sources corrupting the data are photon noise, arising from the detected light, and amplifier (read) noise. There are other noise sources that are not included in the theory development because they can be either removed by proper experimental procedure or folded into the amplifier noise term. For example, if coherent laser illumination is used to probe the turbid medium, noise occurs as a result of the laser’s coherence properties, even when the turbid medium is dense. The noise levels can be effectively removed by destroying the laser’s coherence with a rotating ground-glass diffuser prior to the beam’s entering the medium. Background light can also contribute photon noise. In a well-shielded environment, the background-light levels should be minimal. However, if background-light levels are sufficiently high to generate noise levels that are noticeable in the measured data, their effects can be included in the amplifier-noise term. Another potentially significant noise source is caused by the nonuniform pixel responsivities of the CCD that at high light levels cause the noise variances to be proportional to the square of the signal level (Janesick, 2001) (as compared with photon-noise variances that are proportional to the signal level). This noise source can be removed by performing a flat field correction on the data (Janesick, 2001). For the frequency-domain imaging system, it is assumed that either a photomultiplier tube (PMT) or an avalanche photodiode (APD) is used to measure the light. These types of detectors are commonly employed in ODT. The desired phase and amplitude information are extracted either by using a lock-in amplifier (Liu et al., 1999; O’Leary et al., 1992) or by digitizing the detected DPDW directly and carrying out the amplitude and phase calculations in a computer (Fantini et al., 1995; Jiang et al., 1996; Pogue and Patterson, 1996). For the SNR derivations, it will be assumed either that a lock-in amplifier is used
to estimate the in-phase and quadrature components of the detected DPDW or that these quantities are estimated from the digitized detected DPDW in a manner identical to that for a lock-in amplifier. The detected light is assumed to be collected either by the detector directly adjacent to the turbid medium or by optical fibers directly adjacent to the turbid medium that transport the light to the detector. As for the CCD noise source assumptions, only photon noise and a single additive amplifier noise source are modeled. The amplifier-noise model contains all the non-signal-dependent noise sources in the detector. Although a single amplifier-noise term is used in the derivation, all amplifier-noise sources (in the detector, in intermediate amplifiers, and in the lock-in amplifier) can be included in this term by adding their variances, scaled appropriately to account for amplifier gains. As described in the CCD-noise paragraph, laser coherence effects and background light can contribute noise and can be dealt with as previously described. If multiple photodiodes are used to detect the light, their gains must be normalized in a manner similar to the CCD flat field correction. Another noise source that is not considered in this discussion but which has been considered elsewhere is detector misalignment (Boas, O'Leary, et al., 1997). This type of noise can be significant during image reconstruction because of mismatches between the actual and modeled detector positions. However, these mismatches come into play only in the reconstruction process, not in the measurement process. As a result, this type of noise is not considered in this subsection. The definition of the SNR of the Fourier transform of the measured scattered DPDW is given by

$$\mathrm{SNR}(\omega_x, \omega_y) = \frac{|E[\hat{U}_s(\omega_x, \omega_y)]|}{\{\mathrm{var}[\hat{U}_s(\omega_x, \omega_y)]\}^{1/2}} \qquad (62)$$
where Û_s(ωx, ωy) is the Fourier transform of the measured scattered DPDW estimate û_s(x, y), E[ ] denotes the expected value (or mean), and var[ ] denotes the variance of the bracketed quantities. The dependence of variables on z in this subsection is not explicitly indicated in the argument lists because all the interest in this subsection is on the x and y coordinates. The definition of SNR given in Eq. (62) is often used in the image-processing community (Roggemann and Welsh, 1996). The "caret" denotes that the quantity is an estimate of the true quantity. As described in Section III, the estimate û_s(x, y) of the measured scattered DPDW is formed by subtracting the measurement of the DPDW for a homogeneous medium from the measurement of the DPDW for the medium containing the imbedded object (Cheng and Boas, 1998; Li, Pattanayak, et al., 2000; O'Leary et al., 1995a). As a result, the estimator used to estimate the DPDW scattered by the imbedded object is given by

$$\hat{u}_s(x, y) = u_{\mathrm{inh}}(x, y) - u_{\mathrm{hom}}(x, y) \qquad (63)$$
where u_inh(x, y) is the measured inhomogeneous DPDW and u_hom(x, y) is the measured homogeneous DPDW. The subscripts inh, s, and hom are used throughout this subsection to denote inhomogeneous, scattered, and homogeneous DPDW quantities. The units of û_s(x, y) depend on the detection system used. For CCD cameras, the units are usually dimensionless analog-to-digital units (ADUs), whereas for lock-in amplifiers the units are often volts. A separate measurement of the turbid medium without an imbedded object is a requirement that can be difficult to fulfill; for this reason, methods for removing the background contribution without requiring a separate measurement have been explored (Cheng and Boas, 1998; Li, Pattanayak, et al., 2000). However, these methods also introduce errors into the estimate of the measured scattered DPDW in ways that may be difficult to quantify. For this reason, the SNR analyses carried out in this discussion assume that a second measurement has been made of a homogeneous medium that is identical to the inhomogeneous medium except that no object is imbedded inside.

2. SNR Derivation for CW Illumination

The SNR expression for the Fourier transform of the measured scattered DPDW using CW illumination, SNRcw(ωx, ωy), is derived first. The subscript cw for all terms denotes CW illumination quantities. In this case, because CW light is used for illumination, all the quantities in Eq. (63) are real. The approach used to derive the SNR expression is to calculate the mean and variances separately and then substitute these expressions into Eq. (62). In addition, a property of spatially independent noises is used to simplify the variance calculations; to wit, the variance of the Fourier transform of a spatially independent random process is a constant for all spatial frequencies and is equal to the sum of the variances in the image domain (Papoulis, 1965). Both photon noise and amplifier noise are spatially and temporally independent, so this property can be exploited for the calculations. The numerator of Eq. (62) is derived first. Because the expectation operator and the Fourier transform operator are both linear, the Fourier transform of the expected value of û_s(x, y) is the same as the expected value of the Fourier transform of û_s(x, y). Therefore, E[û_s(x, y)] is derived and then Fourier transformed to get the desired mean value. Taking the expected value of Eq. (63) produces

$$E[\hat{u}_s(x, y)] = E[u_{\mathrm{inh}}(x, y)] - E[u_{\mathrm{hom}}(x, y)] \qquad (64)$$

where

$$u_{\mathrm{inh}}(x, y) = s_{\mathrm{inh}}(x, y) + n_a(x, y) \qquad (65)$$

$$u_{\mathrm{hom}}(x, y) = s_{\mathrm{hom}}(x, y) + n_a(x, y) \qquad (66)$$
and where s_inh(x, y) and s_hom(x, y) are the inhomogeneous and homogeneous Poisson processes (Papoulis, 1965) associated with the measured DPDWs and n_a(x, y) is the amplifier noise whose variance is σ²(x, y). The amplifier noise is zero mean because the CCD camera biases are assumed to have already been removed. Also, the scale factor relating detected photons to ADUs is assumed to be one, for simplicity. This assumption does not affect the SNR expression because the ADU scale factor disappears when the ratio of the mean and square root of the variance is taken. Equation (64) can be evaluated by inspection. Note that the amplifier noise is zero mean and the expected values of the Poisson processes are just the underlying deterministic light intensities. Thus,

$$E[\hat{u}_s(x, y)] = t \alpha_{\mathrm{cw}} \eta_{\mathrm{cw}} \, p_{s,\mathrm{cw}}(x, y) / h\nu \qquad (67)$$
where p_s,cw(x, y) is the difference between the inhomogeneous radiance p_inh(x, y) and the homogeneous radiance p_hom(x, y) emerging from the turbid medium at location (x, y), η_cw is the quantum efficiency of the detector, t is the exposure time, h is Planck's constant, ν is the frequency of the light, and α_cw is a scale factor that accounts for the area of a detector pixel, the acceptance angle of the detector, and the magnification when the DPDW emerging from the turbid medium is reimaged onto the detector. Note that because p_s,cw(x, y) is the result of subtraction, it can easily be a negative quantity (e.g., if the imbedded object is an absorber). Fourier transforming Eq. (67) yields the numerator of Eq. (62):

$$|E[\hat{U}_s(\omega_x, \omega_y)]| = t \alpha_{\mathrm{cw}} \eta_{\mathrm{cw}} |P_{s,\mathrm{cw}}(\omega_x, \omega_y)| / h\nu \qquad (68)$$
In writing Eqs. (67) and (68), it is assumed that the detected-light levels are approximately constant over the detector pixel area. If this is not the case, these two equations can be generalized by writing the detected light as an integral of the light levels over the pixel areas. Next, the denominator of Eq. (62) will be calculated. Recall that because photon and amplifier noises are spatially independent,

$$\mathrm{var}[\hat{U}_s(\omega_x, \omega_y)] = \sum_{x,y} \mathrm{var}[\hat{u}_s(x, y)] \qquad (69)$$
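This property is easy to verify numerically. The short Python check below (a sketch, not from the original work) draws many realizations of a spatially independent noise field with a nonuniform per-pixel variance map, takes the unnormalized two-dimensional DFT of each, and confirms that the variance of every Fourier coefficient is approximately the same constant, equal to the sum of the image-domain variances.

import numpy as np

rng = np.random.default_rng(0)
nx, ny, n_trials = 32, 32, 4000

# Arbitrary (nonuniform) per-pixel variance map for the independent noise field.
var_map = 1.0 + 0.5 * rng.random((nx, ny))

# Accumulate the variance of each Fourier coefficient over many realizations.
acc = np.zeros((nx, ny))
for _ in range(n_trials):
    noise = rng.normal(0.0, np.sqrt(var_map))   # spatially independent pixels
    F = np.fft.fft2(noise)                      # unnormalized DFT
    acc += np.abs(F) ** 2                       # noise is zero mean, so |F|^2 estimates the variance
fourier_var = acc / n_trials

print("sum of image-domain variances :", var_map.sum())
print("Fourier-domain variance (min) :", fourier_var.min())
print("Fourier-domain variance (max) :", fourier_var.max())
# Both extremes cluster around var_map.sum(): the variance is flat across
# spatial frequency, which is the property exploited in Eq. (69).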
In addition, because all image-domain noises are independent, the variance at each spatial location is the sum of the variances of the photon noise and the amplifier noise. For photon noise, the variance is just the mean, and for amplifier noise, the variance is given by σ²(x, y). When these properties of the noises are used, Eq. (69) becomes

$$\mathrm{var}[\hat{U}_s(\omega_x, \omega_y)] = \sum_{x,y} \left\{ t \alpha_{\mathrm{cw}} \eta_{\mathrm{cw}} \left[ p_{\mathrm{inh}}(x, y) + p_{\mathrm{hom}}(x, y) \right] / h\nu + 2\sigma^2(x, y) \right\} \qquad (70)$$
Finally, after substituting Eqs. (68) and (70) into Eq. (62), the SNR of the measured scattered DPDW for CW illumination using a CCD camera for detection is given by

$$\mathrm{SNR}_{\mathrm{cw}}(\omega_x, \omega_y) = \frac{t \alpha_{\mathrm{cw}} \eta_{\mathrm{cw}} |P_{s,\mathrm{cw}}(\omega_x, \omega_y)| / h\nu}{\left\{ \sum_{x,y} \left( t \alpha_{\mathrm{cw}} \eta_{\mathrm{cw}} \left[ p_{\mathrm{inh}}(x, y) + p_{\mathrm{hom}}(x, y) \right] / h\nu + 2\sigma^2(x, y) \right) \right\}^{1/2}} \qquad (71)$$
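As an illustration of how Eq. (71) might be evaluated in practice, the following Python sketch computes the CW Fourier-domain SNR from simulated inhomogeneous and homogeneous detection-plane radiance maps. The function arguments, the photon energy, and the toy radiance maps are illustrative assumptions; only the structure of Eq. (71) is taken from the text.

import numpy as np

H_NU = 3.0e-19          # photon energy in joules (illustrative, roughly 660-nm light)

def snr_cw(p_inh, p_hom, sigma2, t, alpha_cw, eta_cw, h_nu=H_NU):
    """Fourier-domain SNR of the CW measured scattered DPDW, Eq. (71).

    p_inh, p_hom : radiance maps (W per pixel) emerging from the medium with
                   and without the imbedded object, on the detection grid.
    sigma2       : per-pixel amplifier-noise variance map.
    t            : exposure time (s); alpha_cw, eta_cw: collection and
                   quantum-efficiency factors as defined in the text.
    """
    # Numerator: |FT| of the scattered DPDW estimate, scaled to detected photons.
    p_s = p_inh - p_hom
    numerator = np.abs(np.fft.fft2(t * alpha_cw * eta_cw * p_s / h_nu))
    # Denominator: a single constant, the summed photon- and amplifier-noise variances.
    var_sum = np.sum(t * alpha_cw * eta_cw * (p_inh + p_hom) / h_nu + 2.0 * sigma2)
    return numerator / np.sqrt(var_sum)

# Toy example: a weak absorbing perturbation on top of a smooth homogeneous background.
x, y = np.meshgrid(np.linspace(-4, 4, 64), np.linspace(-4, 4, 64))
p_hom = 1e-12 * np.exp(-(x**2 + y**2) / 8.0)                 # W per pixel
p_inh = p_hom * (1.0 - 0.02 * np.exp(-(x**2 + y**2) / 0.5))  # absorbing object
snr = snr_cw(p_inh, p_hom, sigma2=np.full_like(p_hom, 25.0),
             t=0.1, alpha_cw=1e-3, eta_cw=0.8)
print("SNR at zero spatial frequency:", snr[0, 0])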
Several interesting properties can be seen from Eq. (71). The first is that the variances in the denominator contain contributions not just from the inhomogeneous light but also from the homogeneous light because of the need to subtract the homogeneous light from the inhomogeneous light to get the measured scattered DPDW. Another interesting property is that the photon noise terms are due to all of the collected light, not just the portion of the light scattered by the object. As a result, the photon-noise variance is typically much larger than would be produced by the measured scattered DPDW itself. In fact, when the light scattered by the object satisfies the Born approximation, p_inh(x, y) ≈ p_hom(x, y) and the photon-noise variance is approximately independent of the measured scattered DPDW light level. In many experiments, the amplifier noise is much less than the photon noise. Therefore, it is useful to simplify Eq. (71) to the photon-noise limit by setting σ²(x, y) = 0:

$$\mathrm{SNR}_{\mathrm{cw,ph}}(\omega_x, \omega_y) = \left( \frac{t \alpha_{\mathrm{cw}} \eta_{\mathrm{cw}}}{h\nu} \right)^{1/2} \frac{|P_{s,\mathrm{cw}}(\omega_x, \omega_y)|}{\left\{ \sum_{x,y} \left[ p_{\mathrm{inh}}(x, y) + p_{\mathrm{hom}}(x, y) \right] \right\}^{1/2}} \qquad (72)$$

3. SNR Derivation for Modulated Illumination

The derivation of the SNR when modulated light is used to illuminate the turbid medium, SNRmod(ωx, ωy), is similar to the derivation for CW light but with two important differences. The first is that the detected signal is time varying and must be temporally processed at each spatial location to produce the desired single-pixel information. This processing is assumed to be accomplished with a lock-in amplifier. The second difference is that the resulting image is complex, in general, because both the amplitude information and the phase information about the measured scattered DPDW are used in the inverse problem. The approach taken in this case to derive SNRmod(ωx, ωy) is to calculate the mean and variance at a pixel location and then use the results from Section V.A.2 to calculate the two-dimensional SNR. The subscript mod for all terms denotes modulated light quantities. The mean and variance of the complex number at a single spatial location that is output from the lock-in amplifier is next derived. For simplicity, it is
assumed that the output of the detector is fed directly into the lock-in amplifier. In practice, it is often necessary to accomplish an intermediate heterodyne mixing to change the frequency of the modulated light (typically in the megahertz range) down to a frequency within the range of the lock-in amplifier (typically less than 100 kHz). This assumption merely removes mathematical complexity in the derivation—it does not affect the SNR expression so long as the mixing process does not introduce additional noise in the data. If it does introduce noise, the additional noise can be folded into the amplifier noise term. This assumption also ignores other amplifiers between the detector and the lock-in amplifier. The noises associated with these other amplifiers, scaled appropriately, can easily be included in the amplifier noise term in the following derivation. In addition, although many detectors remove the dc component from their output signal, it is assumed in this case that the dc component has not been removed. The purpose of this assumption is also to remove mathematical complexity without loss of generality. In the derivation of the SNR expression, the high-pass nature of the detector output is not modeled because it has no effect so long as the frequency of the DPDW is within the temporal bandpass of the detector. Let v(x, y, t) be the time-varying voltage output from the detector at location (x, y). In terms of the detected photon flux s(x, y, t) (a random process), the gain G from photon flux to output voltage, and zero-mean amplifier noise n_a(x, y, t) whose variance σ²(x, y) is assumed to be independent of time, v(x, y, t) is given by

$$v(x, y, t) = G s(x, y, t) + n_a(x, y, t) \qquad (73)$$
Because the photon flux s(x, y, t) is a random process, v(x, y, t) is also a random process that is a function of the deterministic radiance p(x, y, t) emerging from the turbid medium. This radiance consists of both a dc component and an ac component that is modulated at a frequency f_t. In addition, if there is an object imbedded in the medium, both the ac and the dc light are scattered by the object. Mathematically, p(x, y, t) is given by

$$p(x, y, t) = p_{\mathrm{dc}}(x, y) + p_{\mathrm{ac}}(x, y) \cos[2\pi f_t t + \theta(x, y)] \qquad (74)$$
where the terms p_dc(x, y), p_ac(x, y), and θ(x, y) are functions of the position of the detector. To get the real part v_re(x, y) of the complex number [v_re(x, y), v_im(x, y)] that describes the amplitude and phase of the detected modulated light at location (x, y), the lock-in amplifier processes the time-varying voltage by multiplying v(x, y, t) by 2cos(2πf_t t) and low-pass-filtering the result. As a result, the estimator v̂_re(x, y) used to estimate v_re(x, y) is given by

$$\hat{v}_{\mathrm{re}}(x, y) = lp(t) * \{ 2\cos(2\pi f_t t) [G s(x, y, t) + n_a(x, y, t)] \} \qquad (75)$$
where lp(t) is the time response of the low-pass filter used to isolate the dc-centered signal and the asterisk denotes the convolution operation. Similarly, the estimator for v_im(x, y), v̂_im(x, y), is given by

$$\hat{v}_{\mathrm{im}}(x, y) = lp(t) * \{ 2\sin(2\pi f_t t) [G s(x, y, t) + n_a(x, y, t)] \} \qquad (76)$$
where the mixing function is a sine instead of a cosine to get the quadrature component of the voltage. Because the DPDW amplitude and phase are assumed to be constant with respect to time, and because the properties of the turbid medium and imbedded object are assumed to be constant during the data-collection period, both v̂_re(x, y) and v̂_im(x, y) are independent of time except for noise fluctuations. Now that the estimators v̂_re(x, y) and v̂_im(x, y) have been defined, it is necessary to find their means and variances. The mean of v̂_re(x, y), E[v̂_re(x, y)], is calculated from Eq. (75) as follows:

$$\begin{aligned}
E[\hat{v}_{\mathrm{re}}(x, y)] &= lp(t) * \left( 2\cos(2\pi f_t t) \{ E[G s(x, y, t)] + E[n_a(x, y, t)] \} \right) \\
&= lp(t) * \left( 2\cos(2\pi f_t t) \, G \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \{ p_{\mathrm{dc}}(x, y) + p_{\mathrm{ac}}(x, y) \cos[2\pi f_t t + \theta(x, y)] \} / h\nu \right) \\
&= \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} G p_{\mathrm{ac}}(x, y) \cos[\theta(x, y)] / h\nu
\end{aligned} \qquad (77)$$

where the first equality is obtained by using the linearity of the expectation operator, the second equality by noting that the amplifier noise is zero mean and the expected value of the Poisson process is just the underlying deterministic radiance given in Eq. (74) converted to photon units, and the third equality by carrying out the low-pass-filter operation. In addition, α_mod is a scale factor that accounts for the acceptance angle of the detector, the size of the detector, and the magnification of the imaging optics (if any), and η_mod is the effective quantum efficiency of the detector (including coupling losses if fiber optics are used to transport the light to the detector). Using the same steps as for Eq. (77) allows E[v̂_im(x, y)] to be calculated by using Eq. (76):

$$E[\hat{v}_{\mathrm{im}}(x, y)] = \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} G p_{\mathrm{ac}}(x, y) \sin[\theta(x, y)] / h\nu \qquad (78)$$
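To make the lock-in processing of Eqs. (73)–(78) concrete, the Python sketch below simulates a single detector pixel: Poisson photon counts whose rate has dc and ac components as in Eq. (74) are converted to a voltage, demodulated against the modulation frequency, and averaged. The photon rates, gain, and noise values are illustrative assumptions, and the demodulation is written with a complex reference, which is equivalent to the in-phase/quadrature mixing of Eqs. (75) and (76) up to the sign convention of the phase.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative single-pixel parameters (not from the original experiments).
f_t = 20e6                          # modulation frequency (Hz)
r_dc, r_ac, theta = 1e9, 4e8, 0.6   # detected photon rates (photons/s) and phase (rad)
G = 3e-13                           # volts per detected photon
sigma_v = 1e-7                      # per-sample amplifier-noise standard deviation (V)

dt = 1.0 / (64 * f_t)               # 64 samples per modulation period
n_periods = 2000
t = np.arange(n_periods * 64) * dt

# Poisson photon counts per sample, converted to a detector voltage (Eq. 73).
rate = r_dc + r_ac * np.cos(2 * np.pi * f_t * t + theta)
v = G * rng.poisson(rate * dt) / dt + rng.normal(0.0, sigma_v, t.size)

# Lock-in style demodulation: mix with the reference and average (low-pass filter).
demod = 2.0 * np.mean(v * np.exp(-1j * 2 * np.pi * f_t * t))

print("recovered amplitude :", abs(demod), " expected :", G * r_ac)
print("recovered phase     :", np.angle(demod), " expected :", theta)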
Next, the variance of [v̂_re(x, y), v̂_im(x, y)] is derived. As for the mean calculations, the variance calculation is carried out only for v̂_re(x, y). The result for v̂_im(x, y) follows directly and is only stated. By definition, the variance of v̂_re(x, y) is given by

$$\mathrm{var}[\hat{v}_{\mathrm{re}}(x, y)] \equiv E[\hat{v}_{\mathrm{re}}(x, y)^2] - E[\hat{v}_{\mathrm{re}}(x, y)]^2 \qquad (79)$$
Substituting Eqs. (75) and (77) into Eq. (79) gives

$$\begin{aligned}
\mathrm{var}[\hat{v}_{\mathrm{re}}(x, y)] &= \iint lp(t - \alpha)\, lp(t - \beta)\, E\big( \{ 2\cos(2\pi f_t \alpha) [G s(x, y, \alpha) + n_a(x, y, \alpha)] \} \\
&\qquad \times \{ 2\cos(2\pi f_t \beta) [G s(x, y, \beta) + n_a(x, y, \beta)] \} \big)\, d\alpha\, d\beta - E[\hat{v}_{\mathrm{re}}(x, y)]^2 \\
&= \iint lp(t - \alpha)\, lp(t - \beta)\, 4\cos(2\pi f_t \alpha) \cos(2\pi f_t \beta) \\
&\qquad \times \{ G^2 E[s(x, y, \alpha)\, s(x, y, \beta)] + E[n_a(x, y, \alpha)\, n_a(x, y, \beta)] \}\, d\alpha\, d\beta - E[\hat{v}_{\mathrm{re}}(x, y)]^2
\end{aligned} \qquad (80)$$
where the first equality is obtained by explicitly writing out the convolution operation and interchanging the order of integration and expectation, and the second equality is obtained by exploiting the property that the photon and amplifier noises are independent of each other. To continue, the expectations in the second equality in Eq. (80) must be calculated. Papoulis (1965) has shown that the first expectation, for a Poisson process, is given by

$$E[s(x, y, \alpha)\, s(x, y, \beta)] = \alpha_{\mathrm{mod}}^2 \eta_{\mathrm{mod}}^2 \, p(x, y, \alpha)\, p(x, y, \beta) / h^2\nu^2 + \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \, p(x, y, \alpha)\, \delta(\alpha - \beta) / h\nu \qquad (81)$$
where p(x, y, t) is given by Eq. (74) and δ( ) denotes the Dirac delta function. However, for APDs and PMTs, there is an excess-noise factor Γ incurred in the amplification process that multiplies the second term in Eq. (81) and must be included in this derivation (Kingston, 1979). The Γ factor is typically <1.5 for PMTs and is given by M^x with 0 < x < 0.5 for APDs, where M is the current gain of the detector. Because the amplifier noise is independent of time, the second expectation in the second equality in Eq. (80) is given by

$$E[n_a(x, y, \alpha)\, n_a(x, y, \beta)] = \sigma^2(x, y)\, \delta(\alpha - \beta) \qquad (82)$$
Substituting Eqs. (81) and (82) into Eq. (80) and simplifying gives

$$\begin{aligned}
\mathrm{var}[\hat{v}_{\mathrm{re}}(x, y)] &= 4 G^2 \Gamma \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \int lp^2(t - \alpha) \cos^2(2\pi f_t \alpha)\, p(x, y, \alpha)\, d\alpha / h\nu \\
&\quad + 4\sigma^2(x, y) \int lp^2(t - \alpha) \cos^2(2\pi f_t \alpha)\, d\alpha \\
&\quad + \left[ 2 G \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \int lp(t - \alpha) \cos(2\pi f_t \alpha)\, p(x, y, \alpha)\, d\alpha / h\nu \right]^2 - E[\hat{v}_{\mathrm{re}}(x, y)]^2
\end{aligned} \qquad (83)$$
It is straightforward to show that the third and fourth terms in Eq. (83) are the same and thus cancel each other out. To continue, the definition of p(x, y, t)
from Eq. (74) is substituted in Eq. (83), trigonometric identities are used to simplify the integrands, and the fact that non-base-band terms are not passed by the low-pass filter is employed. This produces

$$\mathrm{var}[\hat{v}_{\mathrm{re}}(x, y)] = \left[ 2 \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \Gamma G^2 p_{\mathrm{dc}}(x, y) / h\nu + 2\sigma^2(x, y) \right] \int lp^2(t)\, dt \qquad (84)$$
All that remains to be evaluated is the integral. By Parseval's theorem (McGillem and Cooper, 1991),

$$\int lp^2(t)\, dt = \int LP^2(f)\, df \qquad (85)$$
where f is a temporal frequency variable. If the low-pass filter is assumed to be ideal with a cutoff frequency of B_lp, the integral in Eq. (85) is equal to 2B_lp, and the final result is given by

$$\mathrm{var}[\hat{v}_{\mathrm{re}}(x, y)] = 4 \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \Gamma G^2 B_{lp}\, p_{\mathrm{dc}}(x, y) / h\nu + 4 B_{lp}\, \sigma^2(x, y) \qquad (86)$$
The variance of the imaginary part of [v̂_re(x, y), v̂_im(x, y)] is calculated by using the same procedures and is given by

$$\mathrm{var}[\hat{v}_{\mathrm{im}}(x, y)] = 4 \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \Gamma G^2 B_{lp}\, p_{\mathrm{dc}}(x, y) / h\nu + 4 B_{lp}\, \sigma^2(x, y) \qquad (87)$$
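The variance expression in Eqs. (86) and (87) can be checked numerically in the photon-noise limit. The sketch below (illustrative parameters, Γ = 1, no amplifier noise) repeats the single-pixel lock-in simulation from the previous example many times and compares the sample variance of the in-phase output with 4G²B_lp λ_dc, where λ_dc is the detected dc photon rate and B_lp = 1/(2T) is the effective bandwidth of an integrating low-pass filter of duration T (see the discussion following Eq. (91)).

import numpy as np

rng = np.random.default_rng(2)

# Illustrative single-pixel parameters (photon-noise limit, excess-noise factor of one).
f_t = 20e6                          # modulation frequency (Hz)
r_dc, r_ac, theta = 1e9, 4e8, 0.6   # detected photon rates (photons/s) and phase (rad)
G = 3e-13                           # volts per detected photon

dt = 1.0 / (64 * f_t)
n_samples = 200 * 64                # 200 modulation periods per integration
t = np.arange(n_samples) * dt
T = n_samples * dt                  # integration (low-pass) time
B_lp = 1.0 / (2.0 * T)              # effective bandwidth of the integrating filter

rate = r_dc + r_ac * np.cos(2 * np.pi * f_t * t + theta)
ref = 2.0 * np.cos(2 * np.pi * f_t * t)

# Many independent integrations of the in-phase lock-in output.
n_trials = 2000
v_re = np.empty(n_trials)
for k in range(n_trials):
    v = G * rng.poisson(rate * dt) / dt        # photon-noise-limited detector voltage
    v_re[k] = np.mean(ref * v)

print("sample variance of v_re    :", v_re.var())
print("Eq. (86), photon-noise term:", 4.0 * G**2 * B_lp * r_dc)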
From Eqs. (86) and (87) it can be seen that the variances of the real and imaginary parts are identical. Notice that the variances are a function of p_dc(x, y), not p_ac(x, y). This has significant implications for the SNR properties of the modulated scattered DPDW as compared with the scattered DPDW obtained by using CW illumination, as is discussed in Section V.A.4. Using an ac-coupled detector does not remove this noise because the noise variance, although a function of the deterministic dc light level, is a constant nonzero value at all spatial frequencies. The pointwise means and variances of the temporally processed signals just derived can be used to obtain the desired SNR function SNRmod(ωx, ωy) by using the procedures followed in Section V.A.2. Let û_inh(x, y) be the estimator of the measured inhomogeneous DPDW and û_hom(x, y) be the estimator of the measured homogeneous DPDW, where their real and imaginary parts are defined by using Eqs. (77) and (78). Then the numerator of SNRmod(ωx, ωy) is given by the absolute value of the Fourier transform of E[û_inh(x, y) − û_hom(x, y)], denoted by E[Û_s(ωx, ωy)]:

$$|E[\hat{U}_s(\omega_x, \omega_y)]| = \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} G |P_{s,\mathrm{mod}}(\omega_x, \omega_y)| / h\nu \qquad (88)$$
where Ps,mod(ωx , ωy) is the Fourier transform of the difference of the inhomogeneous and homogeneous radiances emerging from the turbid medium. As before, the denominator of SNRmod(ωx , ωy) is the square root of the sum of the
single-pixel variances in the spatial domain. Because the image is complex, the summation is over the real and imaginary parts of the image. Because the real and imaginary variances are the same, this means that the variance is double that for a real image. Using this fact and Eqs. (69), (86), and (87) gives

$$\mathrm{var}[\hat{U}_s(\omega_x, \omega_y)] = \sum_{x,y} \left\{ 8 B_{lp} \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \Gamma G^2 \left[ p_{\mathrm{inh,dc}}(x, y) + p_{\mathrm{hom,dc}}(x, y) \right] / h\nu + 16 B_{lp}\, \sigma^2(x, y) \right\} \qquad (89)$$
where p_inh,dc(x, y) and p_hom,dc(x, y) are the inhomogeneous and homogeneous dc radiances emerging from the turbid medium. Finally, the desired SNR expression SNRmod(ωx, ωy) is given by

$$\mathrm{SNR}_{\mathrm{mod}}(\omega_x, \omega_y) = \frac{\alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} G |P_{s,\mathrm{mod}}(\omega_x, \omega_y)| / h\nu}{\left\{ 8 B_{lp} \sum_{x,y} \left( \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \Gamma G^2 \left[ p_{\mathrm{inh,dc}}(x, y) + p_{\mathrm{hom,dc}}(x, y) \right] / h\nu + 2\sigma^2(x, y) \right) \right\}^{1/2}} \qquad (90)$$
In the photon-noise limit, σ²(x, y) = 0, and Eq. (90) becomes

$$\mathrm{SNR}_{\mathrm{mod,ph}}(\omega_x, \omega_y) = \left( \frac{\alpha_{\mathrm{mod}} \eta_{\mathrm{mod}}}{8 B_{lp} \Gamma h\nu} \right)^{1/2} \frac{|P_{s,\mathrm{mod}}(\omega_x, \omega_y)|}{\left\{ \sum_{x,y} \left[ p_{\mathrm{inh,dc}}(x, y) + p_{\mathrm{hom,dc}}(x, y) \right] \right\}^{1/2}} \qquad (91)$$
So that SNRmod,ph(ωx, ωy) can be more easily compared with SNRcw,ph(ωx, ωy), the low-pass filter used by the lock-in amplifier is assumed to be an integrating filter with sampling time t. This type of filter, which closely mimics the integration properties of a CCD camera, has an effective bandwidth B_lp = 1/(2t) (Kingston, 1979). Substituting this expression into Eq. (91) gives the final desired modulated SNR result

$$\mathrm{SNR}_{\mathrm{mod,ph}}(\omega_x, \omega_y) = \left( \frac{t \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}}}{4 \Gamma h\nu} \right)^{1/2} \frac{|P_{s,\mathrm{mod}}(\omega_x, \omega_y)|}{\left\{ \sum_{x,y} \left[ p_{\mathrm{inh,dc}}(x, y) + p_{\mathrm{hom,dc}}(x, y) \right] \right\}^{1/2}} \qquad (92)$$
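A compact way to see how Eqs. (72) and (92) compare in practice is to evaluate both on the same simulated detection-plane data. The Python sketch below is illustrative only: it assumes the scattered-DPDW maps and dc radiances are already available (for example, from a diffusion-model forward calculation) and simply applies the two photon-noise-limited formulas, so that their ratio exhibits the (4Γ)^1/2 penalty discussed in the next subsection.

import numpy as np

H_NU = 3.0e-19   # photon energy (J), illustrative

def snr_cw_ph(p_s_cw, p_inh, p_hom, t, alpha, eta, h_nu=H_NU):
    """Photon-noise-limited CW SNR, Eq. (72)."""
    num = np.sqrt(t * alpha * eta / h_nu) * np.abs(np.fft.fft2(p_s_cw))
    return num / np.sqrt(np.sum(p_inh + p_hom))

def snr_mod_ph(p_s_mod, p_inh_dc, p_hom_dc, t, alpha, eta, gamma, h_nu=H_NU):
    """Photon-noise-limited modulated SNR, Eq. (92)."""
    num = np.sqrt(t * alpha * eta / (4.0 * gamma * h_nu)) * np.abs(np.fft.fft2(p_s_mod))
    return num / np.sqrt(np.sum(p_inh_dc + p_hom_dc))

# Toy maps: identical scattered signals and dc light levels for both systems,
# so only the detection differences (the 4*gamma factor here) matter.
x, y = np.meshgrid(np.linspace(-4, 4, 64), np.linspace(-4, 4, 64))
p_hom = 1e-12 * np.exp(-(x**2 + y**2) / 8.0)
p_s = -0.02 * p_hom * np.exp(-(x**2 + y**2) / 0.5)
p_inh = p_hom + p_s

snr_cw = snr_cw_ph(p_s, p_inh, p_hom, t=0.1, alpha=1e-3, eta=0.8)
snr_mod = snr_mod_ph(p_s, p_inh, p_hom, t=0.1, alpha=1e-3, eta=0.8, gamma=1.0)
print("ratio at zero frequency:", snr_mod[0, 0] / snr_cw[0, 0])   # ~ 1/sqrt(4*gamma) = 0.5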
4. Comparison of Modulated and CW Illumination SNRs

In this subsection, the SNRs associated with modulated and CW illumination are compared and contrasted. The photon-noise-limited SNRs are compared because they provide the ultimate limit to achievable SNRs and are typically attainable when the light is detected by using a CCD camera or a PMT. To assist in the SNR comparison, the ratio of the modulated SNR expression to
the CW SNR expression is created. Under the reasonable assumptions that the dc intensity in the modulated data is the same as the intensity in the CW data and that the effective exposure times for the two systems are equal, the ratio is created by dividing Eq. (92) by Eq. (72), which produces

$$\mathrm{SNR}_{\mathrm{ratio}}(\omega_x, \omega_y) \equiv \frac{\mathrm{SNR}_{\mathrm{mod,ph}}(\omega_x, \omega_y)}{\mathrm{SNR}_{\mathrm{cw,ph}}(\omega_x, \omega_y)} = \left( \frac{1}{4\Gamma} \frac{\alpha_{\mathrm{mod}} \eta_{\mathrm{mod}}}{\alpha_{\mathrm{cw}} \eta_{\mathrm{cw}}} \right)^{1/2} \frac{|P_{s,\mathrm{mod}}(\omega_x, \omega_y)|}{|P_{s,\mathrm{cw}}(\omega_x, \omega_y)|} \qquad (93)$$
A number of important facts can be derived from Eq. (93). The first is that the SNR of modulated-light data is decreased by a factor of (4Γ)^1/2 compared with that of CW light. The factor of 4 comes from two factors of 2—the first from detecting a signal centered about a nonzero temporal frequency, and the second from creating a complex image with modulated light to retrieve amplitude and phase information about the measured scattered DPDW. The excess-noise factor Γ is device dependent; CCD cameras can accumulate detected photons with only photon-noise effects and thus have a Γ factor of 1, whereas multiplication devices such as PMTs and APDs have Γ factors associated with their charge multiplication physics that can range from approximately 1.2 (PMTs and low-gain APDs) to 5 or more (high-gain APDs). A significant factor in the SNR ratio is the ratio of α_mod to α_cw. Recall that α describes the amount of light that can be collected by a detector in a fixed amount of time and is a function of the detector's area, the magnification brought about by the optics in the detection system, and the acceptance angle of the detector. A CCD camera has an advantage over PMTs and APDs in that it has the ability to simultaneously image the whole extent of the phantom, which thus permits short data-collection times. In contrast, PMT- and APD-based systems usually place the detector immediately adjacent to the phantom, which permits light to be collected from larger angles than are possible with a CCD camera; this increases the SNR in the data. Another advantage of PMTs and APDs is that they can continuously collect light over time, whereas a CCD camera can collect light only until its wells are filled. After this, the CCD camera must be read out, which slows the collection of light. Ultimately, the system design determines which type of detector will collect more light and thus have a higher overall SNR if the SNR is photon-noise limited. In Section V.A.5, results from using an APD-based system and a CCD-based system are presented. These results show that the APD-based system produced data with higher SNRs than in the CCD data because the APD-based system collected much more light than that collected by the CCD-based system. The SNR ratio is also a function of the ratio of the systems' quantum efficiencies. Solid-state detectors, such as APDs and CCDs, can have quantum
efficiencies approaching one, whereas PMTs that use photocathodes are typically limited to quantum efficiencies of less than one-half. Another important fact is that the SNR ratio is a function of the ratio of the Fourier transform of the modulated measured scattered DPDW to the Fourier transform of the CW measured scattered DPDW. If the photon-noise variance terms in the SNR expressions defined in Eqs. (72) and (92) were due to the measured scattered DPDWs, the SNR ratio would be a function of the square root of the ratio of the Fourier transform of the modulated measured scattered DPDW to the Fourier transform of the CW measured scattered DPDW. However, this linear dependence on the ratio indicates that using modulated light to image the imbedded object can noticeably decrease the Fourier domain SNRs, compared with those obtained by using CW light, because light modulated at frequencies greater than 100 MHz undergoes significantly more attenuation in turbid media. For modulation frequencies on the order of 10 MHz or less, this is not a significant factor.

5. Laboratory Data Validation

In this subsection, comparisons of the SNR expressions derived in Sections V.A.2 and V.A.3 to SNR estimates from data collected by using both types of illumination (modulated and CW) are presented so that the accuracy of these expressions can be evaluated. As a way to simplify the evaluation process, the SNRs in just the homogeneous measurements are compared. The purpose for this simplification is that, in order to accurately compare the SNRs of data and theory by using Eqs. (71) and (90) for light scattered from objects imbedded in a turbid medium, the inhomogeneous and homogeneous phantoms must be sufficiently well modeled so that the model uncertainties are less than the data-noise levels. It is much easier to mathematically model the homogeneous medium than it is to model the inhomogeneous medium; therefore, this is the approach taken in this subsection. The SNR expressions in Eqs. (71) and (90) are simplified to just the homogeneous case by removing the inhomogeneous terms (p_inh in Eq. (71) and p_inh,dc in Eq. (90)) and the factors of 2 multiplying the amplifier noise terms in the denominators. In addition, the Fourier transforms of the measured scattered DPDWs in both equations are replaced with the Fourier transforms of the measured homogeneous DPDWs. For the modulated-illumination SNR expression, these simplifications result in

$$\mathrm{SNR}_{\mathrm{hom,mod}}(\omega_x, \omega_y) = \frac{\alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} G |P_{\mathrm{hom,mod}}(\omega_x, \omega_y)| / h\nu}{\left\{ 8 B_{lp} \sum_{x,y} \left[ \alpha_{\mathrm{mod}} \eta_{\mathrm{mod}} \Gamma G^2 p_{\mathrm{hom,dc}}(x, y) / h\nu + \sigma^2(x, y) \right] \right\}^{1/2}} \qquad (94)$$
Similarly for the CW illumination SNR expression, the simplified expression is given by

$$\mathrm{SNR}_{\mathrm{hom,cw}}(\omega_x, \omega_y) = \frac{t \alpha_{\mathrm{cw}} \eta_{\mathrm{cw}} |P_{\mathrm{hom,cw}}(\omega_x, \omega_y)| / h\nu}{\left\{ \sum_{x,y} \left[ t \alpha_{\mathrm{cw}} \eta_{\mathrm{cw}} p_{\mathrm{cw}}(x, y) / h\nu + \sigma^2(x, y) \right] \right\}^{1/2}} \qquad (95)$$
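In the comparisons that follow, the theory curves come from evaluating expressions such as Eq. (95) with a model of the homogeneous DPDW, while the data SNRs are estimated by dividing the measured Fourier amplitudes by a noise floor taken from the high-spatial-frequency region where only noise remains (the procedure is described later in this subsection). The Python sketch below illustrates both steps; the model radiance map, grid, and noise parameters are placeholders, not the values of the actual experiments.

import numpy as np

H_NU = 3.0e-19   # photon energy (J), illustrative

def snr_hom_cw_theory(p_hom, sigma2, t, alpha, eta, h_nu=H_NU):
    """Theoretical homogeneous CW SNR, Eq. (95), from a model radiance map."""
    num = t * alpha * eta * np.abs(np.fft.fft2(p_hom)) / h_nu
    den = np.sqrt(np.sum(t * alpha * eta * p_hom / h_nu + sigma2))
    return num / den

def snr_from_data(u_meas, noise_frac=0.25):
    """Empirical SNR estimate: divide the data Fourier amplitudes by a noise
    standard deviation estimated from a high-frequency corner of the shifted
    spectrum, where the signal has died out."""
    F = np.fft.fftshift(np.fft.fft2(u_meas))
    edge = int(F.shape[0] * noise_frac)
    noise_std = np.std(np.abs(F[:edge, :edge]))
    return np.abs(F) / noise_std

# Placeholder homogeneous measurement: smooth model radiance plus detection noise.
x, y = np.meshgrid(np.linspace(-4, 4, 64), np.linspace(-4, 4, 64))
p_hom = 1e-12 * np.exp(-np.sqrt(x**2 + y**2) / 1.5)       # model radiance (W per pixel)
t, alpha, eta, sigma2 = 0.1, 1e-3, 0.8, 25.0
photons = t * alpha * eta * p_hom / H_NU
rng = np.random.default_rng(3)
u_meas = rng.poisson(photons) + rng.normal(0.0, np.sqrt(sigma2), photons.shape)

theory = snr_hom_cw_theory(p_hom, sigma2, t, alpha, eta)
data = snr_from_data(u_meas)
print("theory SNR at zero frequency:", theory[0, 0])
print("data SNR near zero frequency:", data[32, 32])      # spectrum center after fftshift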
In both cases the illumination was provided by a narrow collimated laser beam that can be accurately modeled as a diffuse point source at a distance of one reduced scattering length in the turbid medium. As a result, the light emerging from the turbid medium can be modeled by using the point-source-illumination solution to the diffusion equation (Fishkin and Gratton, 1993). For the results in this subsection, the infinite-medium solution was used because it provided sufficiently accurate solutions; however, for even more accurate results, the method of images can be employed (Barton, 1989; Morse and Feshbach, 1953). First, the modulated-light SNR expression given in Eq. (94) is evaluated by using both theoretical predictions and experimental data. The data were collected by using the frequency-domain system described in Section IV.D.1. The two components of the system that contributed to the amplifier noise in the measured data were the APD and the RF amplifier following the APD, which resulted in an overall noise standard deviation of σ(x, y) = 0.32 μV at the input to the lock-in amplifier. So that the data SNR can be compared with theoretical predictions, it is necessary to determine the parameters to be used in Eq. (94). To calculate the theoretical prediction of the noise-free Fourier-transformed homogeneous DPDW, P_hom,mod(ωx, ωy), the modulation frequency of the illumination and the material properties must be known. For the data shown here, f_t = 20 MHz, μa = 0.014 cm−1, μ′s = 12 cm−1, n = 1.5, and the turbid medium phantom depth is 4.5 cm. The product of α_mod and η_mod was calculated from the ratio of the theoretically predicted Fourier transform to the data Fourier transform at zero spatial frequency. The excess noise factor Γ is equal to 2.77 and the system gain from detected photons to output voltage G is equal to 2.81 × 10−13 V/photon. The total dc light level p_hom,dc(x, y) could not be measured by the APD because it has a high-pass filter in its amplifier. Instead, the dc photon level was calculated from the modulated data by using knowledge of the modulation depth of the laser diode (50%) and of the relative attenuation of 20-MHz modulated light to CW light (0.932) from theory. Finally, the bandwidth of the lock-in amplifier B_lp for these measurements was equal to 4.2 Hz. With all these system and material parameter values, the comparison between theory and data can proceed. Data were collected by the APD on a 64 × 64 square grid that was 8.2 cm on one side, and its Fourier amplitudes were then calculated. The preceding values and calculations in Eq. (94) were used to generate the theoretically predicted SNR, which is plotted in Figure 28.
Figure 28. SNR plots using modulated light: (solid line) theory and (asterisks) data. (Reprinted with permission from C. L. Matson, 2002. Signal-to-noise ratios in optical diffusion tomography. Journal of the Optical Society of America A, 19, 961–972.)
In addition, the data SNRs for all spatial frequencies were obtained by estimating the noise standard deviation in the high-spatial-frequency regions where the signal was so low that only noise remained and then dividing the Fourier transform of the measured data by the estimated noise standard deviation. A slice of the two-dimensional data SNR is also plotted in Figure 28. Notice that theory and data match well at low spatial frequencies but deviate at high spatial frequencies. This phenomenon results because when the data-estimated SNR drops to one or less, essentially only noise remains, and dividing the absolute value of zero-mean noise realizations by their standard deviation results in values that cluster around one. Notice also that the SNR at zero spatial frequency is 105—a reasonably high number that permits close to a factor-of-5 increase in spatial resolution when deconvolution techniques are used, as discussed in Section IV.B. An alternative way to compare theory and data is to plot the theoretically predicted average noisy Fourier amplitudes along with the data Fourier amplitudes. The theoretical prediction is generated by taking the noise-free Fourier amplitudes calculated for the numerator of Eq. (94) and adding the denominator of Eq. (94) to it. The theory and data plots of the Fourier amplitudes are shown in Figure 29. Notice the excellent agreement between theory and data. An interesting feature of the noise in this data set is that the amplifier-noise variance is larger than the photon-noise variance by a factor of 5. As a result, if a photon-noise-limited APD had been used to collect the data, the SNRs in the data would have increased by more than a factor of 2. Next, the CW SNR expression given in Eq. (95) is evaluated by using both theoretical predictions and experimental data. The data were collected by using the CW system described in Section IV.D.2. The laser's spatial coherence was destroyed as much as possible by passing the laser beam through a rotating
Figure 29. Fourier amplitude plots using modulated light: (solid line) theory and (asterisks) data. (Reprinted with permission from C. L. Matson, 2002. Signal-to-noise ratios in optical diffusion tomography. Journal of the Optical Society of America A, 19, 961–972.)
ground-glass diffuser prior to the beam’s entering the turbid medium. The camera’s exposure time was set to 100 ms to fill the camera’s wells to more than 90% with the measured light. The product of α cw and ηcw was determined, as for the modulated-light case, from the ratio of the theoretically predicted Fourier transform to the data Fourier transform at zero spatial frequency. The theoretical expression for Phom,cw(ωx , ωy) was generated in a manner similar to that in the modulated-light case, as were the data SNRs. The total photon counts phom(x, y) were obtained by adding all the pixel values in the image. The plots of the SNRs are shown in Figure 30, where it can be seen that the
Figure 30. SNR plots using CW light: (solid line) theory and (asterisks) data. (Reprinted with permission from C. L. Matson, 2002. Signal-to-noise ratios in optical diffusion tomography. Journal of the Optical Society of America A, 19, 961–972.)
Figure 31. Fourier amplitude plots using CW light with corrections for laser speckle and nonuniform camera responsivity: (solid line) theory and (asterisks) data. (Reprinted with permission from C. L. Matson, 2002. Signal-to-noise ratios in optical diffusion tomography. Journal of the Optical Society of America A, 19, 961–972.)
theoretically predicted SNR is higher at zero spatial frequency than is the data SNR. The reason for this discrepancy can be seen more clearly in Figure 31, where the theoretically predicted noisy Fourier amplitudes are plotted, as are the data Fourier amplitudes. Notice that the data-noise level is higher than that predicted by theory. An analysis of the entire frame of data showed that the noise level is a factor of 2 higher than that predicted by theory. This increase over the theoretically predicted noise level is suspected to be due to both residual laser speckle effects and high-frequency structure in the camera bias frame. Research is being conducted to characterize these effects more completely with the goal of developing a method to completely remove them. In addition, the CW illumination SNR at zero spatial frequency is a factor of 3 less than the modulated-light illumination SNR (Fig. 28). This is because of the larger amount of light collected by the APD detector. Dividing the zero-spatial-frequency value of the Fourier amplitude in Figure 29 by the photon-to-voltage gain value shows that the APD-based system detected on the order of 10^13 photons per second, which is a factor of 1000 more per second than was detected with the CCD camera. If the APD-based system had been truly photon-noise limited, this difference in photon levels would have resulted in an improvement in SNR over the CCD camera results by a factor of approximately 30. Because the APD was read-noise limited at these light levels, the use of PMTs might be preferred, even though their quantum efficiencies are not as high. An issue raised in Section V.A.1 was that laser speckle and flat field corrections are necessary to minimize noise levels in the data. These corrections were not needed for the modulated data results presented in this subsection because
Figure 32. Fourier amplitude plots using CW light without corrections for laser speckle and nonuniform camera responsivity: (solid line) theory and (asterisks) data. (Reprinted with permission from C. L. Matson, 2002. Signal-to-noise ratios in optical diffusion tomography. Journal of the Optical Society of America A, 19, 961–972.)
the same detector was used for each pixel of data, and the APD active area is 4000 times larger than that of a single CCD camera pixel, which permits significant amounts of laser speckle averaging to occur in the APD data. This averaging process tends to remove laser speckle noise. However, for the CCD camera, both flat field corrections and laser speckle corrections are needed. For the data in Figures 30 and 31, a flat field frame was collected and used to even out the nonuniform camera pixel responsivity. In addition, the laser illumination was passed through a rotating ground-glass diffuser prior to the beam's entering the phantom. As a way to show the importance of these corrections, theoretical and data Fourier amplitudes are plotted in Figure 32 for data for which these two corrections were not carried out. Notice the increase in the noise level. Without the laser speckle and flat field correction, the noise level is a factor of 3 higher than that predicted by theory, as compared with a factor of 2 with the corrections. This increase in noise has been seen to be as high as a factor of 4.

B. SNR Example

The SNR expressions derived in Section V.A can be used to evaluate the achievable SNRs in ODT imaging systems that satisfy the assumptions described in Section V.A.1. In turn, these SNR expressions can be used to predict the detectability and resolvability of imbedded objects. An imbedded object is considered detectable if the SNR at zero spatial frequency is greater than one, which implies that there is enough light scattered by the imbedded object to distinguish the difference between the light levels from a homogeneous medium
and from a medium that contains an imbedded object in the presence of noise. The resolvability of an imbedded object is determined by the spatial frequency at which the SNR has decreased to one. It is this spatial frequency that produces the highest level of detail in the image and thus is a measure of how resolvable the object is (Matson, DeLarue, et al., 1992). The higher the spatial frequency at which the SNR is one, the more detail present in the image. In this subsection, the SNR expressions are used to evaluate the effect of the magnitude of just one system parameter, the modulation frequency of the DPDW (Matson, 2001). This parameter is used for two reasons. First, the parameter is under the control of the experimenter, so choosing this parameter correctly is important. Second, it is commonly believed that higher modulation frequencies produce higher spatial resolution. This is true for spatial resolutions in the measured data, as discussed in Section IV.B and shown in Figure 8, but it is not true for spatial resolutions in the reconstructed data, as is shown in this section. The example in this section uses computer-simulated data that are corrupted with photon noise only. As a result, the photon-noise-limited SNR expression in Eq. (91) is used to predict the SNRs in the measured scattered DPDWs. For this example, the object to be imaged is the resolution chart shown in Figure 33. This chart is modeled as a perfect absorber and is 50 cm on one side. Although the size of the resolution chart is much larger than objects typically imaged in biomedical applications, it was intentionally chosen to be large so that the resolution differences between high and low modulation frequencies could easily be seen. The object is imbedded 2 cm deep in a turbid medium phantom whose depth is 5 cm. The material properties of the phantom are μa = 0.03 cm−1, μ′s = 15 cm−1, and n = 1.5. It is assumed that a frequency-domain system similar to that described in Section IV.D.1 is used to image the imbedded object. The system parameters α_mod, B_lp, and η_mod are chosen so
Figure 33. Unblurred and noise-free resolution chart. (Reprinted with permission from C. L. Matson, 2001. Deconvolution-based spatial resolution in optical diffusion tomography. Applied Optics, 40, 5791–5801.)
Figure 34. Measured scattered DPDW amplitudes produced when (left) the 10-kHz modulation frequency and (right) the 1-GHz modulation frequency are used. (Reprinted with permission from C. L. Matson, 2001. Deconvolution-based spatial resolution in optical diffusion tomography. Applied Optics, 40, 5791–5801.)
that approximately 10^15 photons are detected during the effective measurement time determined by B_lp. The excess-noise factor Γ is assumed to be unity. The two modulation frequencies are chosen to be 10 kHz and 1 GHz. The 10-kHz modulation frequency is sufficiently low so that the SNRs and resolution in the measured scattered DPDW are the same as for CW illumination, whereas the 1-GHz modulation frequency is at the upper end of the values used in ODT. The amplitudes of the measured scattered DPDWs for the 10-kHz and 1-GHz modulation frequencies are shown in Figure 34. Notice that the spatial resolution in the measured data corresponding to the 1-GHz modulation frequency is better than that for the 10-kHz modulation frequency. This result has been noticed or expected by a number of authors (Boas, Campbell, et al., 1997; Ripoll et al., 1999; Wabnitz and Rinneberg, 1997). The difference was analyzed and determined to be about 20%. As a way to determine how well the backpropagation algorithm would function to increase spatial resolution and to calculate the spatial-frequency cutoffs of the regularizing filters for both data sets, SNR plots for both modulation frequencies were calculated by using Eq. (91) and are displayed in Figure 35 as a function of spatial-frequency magnitude. Notice that the 1-GHz SNRs are quite a bit lower than the 10-kHz SNRs and that the frequency at which the 1-GHz SNRs drop to one is almost half that for the 10-kHz SNRs. As a result, it is expected that the backpropagation algorithm should produce an image of the resolution chart from the 10-kHz data with almost twice the resolution of the reconstructed image from the 1-GHz data. The modified Hamming filter was used in the backpropagation algorithm with its spatial-frequency cutoff set to where the SNRs in the data are equal to one, and the amplitudes of the backpropagated scattered DPDWs in the plane of the resolution chart for both modulation frequencies are shown in Figure 36. As expected, the resolution in the reconstructed data for the 10-kHz data is much better than that for the 1-GHz data.
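The resolvability comparison just described amounts to locating the spatial frequency at which each SNR curve falls to one. A minimal Python helper for that step is sketched below; the two example curves are synthetic stand-ins for the 10-kHz and 1-GHz SNR plots of Figure 35, with decay constants chosen only to illustrate the roughly factor-of-2 difference in cutoff frequency.

import numpy as np

def cutoff_frequency(freqs, snr):
    """Return the lowest spatial frequency at which the SNR curve drops to one.

    freqs : 1-D array of spatial-frequency magnitudes (cm^-1), increasing.
    snr   : SNR values at those frequencies.
    """
    below = np.nonzero(snr <= 1.0)[0]
    return freqs[below[0]] if below.size else freqs[-1]

# Synthetic radially averaged SNR curves (illustrative decay rates only).
freqs = np.linspace(0.0, 2.0, 400)               # cm^-1
snr_10khz = 1.0e4 * np.exp(-freqs / 0.08)        # higher SNR, slower decay
snr_1ghz = 1.0e3 * np.exp(-freqs / 0.05)         # lower SNR, faster decay

f10 = cutoff_frequency(freqs, snr_10khz)
f1g = cutoff_frequency(freqs, snr_1ghz)
print("10-kHz cutoff (cm^-1):", f10)
print("1-GHz cutoff (cm^-1) :", f1g)
print("expected resolution ratio (10 kHz vs 1 GHz):", f10 / f1g)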
Figure 35. Fourier-domain SNR plots for (solid line) the 10-kHz data and (dashed line) the 1-GHz data.
This example shows that CW illumination can generate data which, when deconvolved, have significantly better spatial resolution than do data generated by using illumination that is modulated at very high temporal frequencies. However, other factors come into play when the choice of CW illumination over modulated illumination is made for a given experiment. It has been speculated that boundary conditions may be more easily accommodated in deconvolution algorithms using frequency-domain and/or time-domain systems than with CW systems. Work has been carried out by others to determine optimal ways to generate data in optical tomography by using a variety of criteria (Schweiger and Arridge, 1997) in addition to spatial resolution. In addition, there are conflicting reports on whether or not the material properties of the imbedded
Figure 36. Backpropagated scattered DPDW amplitudes in the object plane generated from (left) the 10-kHz data and (right) the 1-GHz data. (Reprinted with permission from C. L. Matson, 2001. Deconvolution-based spatial resolution in optical diffusion tomography. Applied Optics, 40, 5791–5801.)
object can be recovered from data obtained by using CW illumination (Arridge and Lionheart, 1998; Iftimia and Jiang, 2000).

VI. Concluding Remarks

Optical diffusion tomography is an emerging technology that is used to image the interior of the human body and that provides a unique combination of benefits, including information on the body's functional status, the use of nonionizing radiation, and lower cost than other imaging modalities such as CT and MRI. In this article, an approach called diffraction tomography for turbid media was presented that models the interaction of optical radiation with turbid media such as the human body. The theory describes the behavior of diffuse photon density waves scattered by an object imbedded in a turbid medium to relate measurements made on the surface of a turbid medium volume to the properties of the object (shape and location). The forward model for DT for turbid media was derived for illumination sources and detectors with arbitrary spatial structure. The theory was then specialized to plane-wave illumination and detection in a plane to permit the development of the turbid media version of the Fourier diffraction theorem. Comparison of the turbid media Fourier diffraction theorem and the standard Fourier diffraction theorem revealed that the turbid media version was so much more complicated that an algebraic relationship between the two-dimensional Fourier transform of the measured data and the three-dimensional Fourier transform of the imbedded object was not possible, unlike for standard DT. The forward model for DT for turbid media is the foundation for the development of the inverse problem using a backpropagation approach. A single-view backpropagation algorithm was developed that reconstructs a diffuse photon density wave throughout the volume from a planar measurement at a single view angle. It was shown that the backpropagation algorithm can be used to increase spatial resolution in the measured data as well as to locate an object in depth from a single view. Examples were presented using both computer-simulated and laboratory data to illustrate the theoretical developments. Because a single view does not contain enough information to completely reconstruct the three-dimensional structure of an object, a multiple-view backpropagation algorithm was described that can be used to reconstruct the spatial structure of an object imbedded in a turbid medium volume. The behavior of the backpropagation algorithm depends critically on the noise corrupting the measured data. For this reason, SNR expressions were developed for measurements of diffuse photon density waves collected in a plane for both frequency-domain and CW imaging systems. These expressions were
validated with laboratory data and used to predict the relative performance of systems using very-high-modulation-frequency (1-GHz) illumination sources to those using essentially CW (10-kHz) illumination sources. It was shown, both theoretically and with computer-simulated data, that the 1-GHz illumination produces higher spatial resolution in the measured data, but the essentially CW illumination produces higher resolution in the backpropagated data.

Work still to be accomplished includes developing, if at all possible, a turbid media version of the filtered backpropagation algorithm to generate more-accurate reconstructions. Because of the more complicated relationship between the measured data and an imbedded object in turbid media, such an algorithm may not be possible. Another area of future research is to extend the SNR expressions developed for data collected in a plane to other common geometries, such as a scanning geometry and a cylindrical geometry.
Acknowledgments

The author gratefully acknowledges the fruitful collaboration with Hanli Liu: her leadership in setting up and running many of the laboratory experiments that produced much of the data presented in this article, as well as her many stimulating discussions, has been invaluable. The author also thanks Kurt Stoltenberg, Kelly Lau, Lauren Ferguson, and Ryan Christopher for their assistance in generating the laboratory data and producing the reconstructions.
References

Alfano, R. R., Liang, X., Wang, L., and Ho, P. P. (1994). Science 264, 1913–1915.
Arridge, S. R. (1999). Inverse Probl. 15, R41–R93.
Arridge, S. R., and Lionheart, W. R. B. (1998). Opt. Lett. 23, 882–884.
Baños, A., Jr. (1966). Dipole Radiation in the Presence of a Conducting Half-Space. Oxford: Pergamon.
Barton, G. (1989). Elements of Green's Functions and Propagation. Oxford: Oxford Univ. Press.
Benaron, D. A., van Houten, J. P., Cheong, W. F., Kermit, E. L., and King, R. A. (1995). In Optical Tomography, Photon Migration, and Spectroscopy of Tissue and Model Media: Theory, Human Studies, and Instrumentation, edited by B. Chance and R. R. Alfano. Vol. 2389, Proc. SPIE. Bellingham, WA: SPIE—Int. Soc. Opt. Eng., pp. 582–597.
Boas, D. A., Campbell, L. E., and Yodh, A. G. (1995). In Optical Tomography, Photon Migration, and Spectroscopy of Tissue and Model Media: Theory, Human Studies, and Instrumentation, edited by B. Chance and R. R. Alfano. Vol. 2389, Proc. SPIE. Bellingham, WA: SPIE—Int. Soc. Opt. Eng., pp. 220–227.
Boas, D. A., O'Leary, M. A., Chance, B., and Yodh, A. G. (1997). Appl. Opt. 36, 75–92.
Braunstein, M., and Levine, R. Y. (2000). J. Opt. Soc. Am. A 17, 11–20.
Cai, W., Gayen, S. K., Xu, M., Zevallos, M., Alrubaiee, M., Lax, M., and Alfano, R. R. (1999). Appl. Opt. 38, 4237–4246.
Cutler, M. (1929). Surg. Gynecol. Obstet. 48, 721–729.
Chen, B., Stamnes, J. J., and Stamnes, K. (1998). Pure Appl. Opt. 7, 1161–1180.
Chen, B., Stamnes, J. J., and Stamnes, K. (2000). Appl. Opt. 39, 2904–2911.
Cheng, X., and Boas, D. A. (1998). Opt. Express 3, 118–123.
Colak, S. B., van der Mark, M. B., Hooft, G. W., Hoogenraad, J. H., van der Linden, E. S., and Kuijpers, F. A. (1999). IEEE J. Select. Topics Quantum Electron. 5, 1143–1158.
Das, B. B., Yoo, K. M., and Alfano, R. R. (1993). Opt. Lett. 18, 1092–1094.
Devaney, A. J. (1982). Ultrasonic Imaging 4, 336–350.
Devaney, A. J. (1986). Inverse Probl. 2, 161–183.
Devaney, A. J. (1987). Inverse Probl. 3, 389–397.
Devaney, A. J. (1989). Inverse Probl. 5, 501–521.
Devaney, A. J., and Tsihrintzis, G. A. (1991). IEEE Trans. Signal Processing 39, 672–682.
Durduran, T., Culver, J. P., Holboke, M. J., Li, X. D., Zubok, L., Chance, B., Pattanayak, D. N., and Yodh, A. J. (1999). Opt. Express 4, 247–262.
Fantini, S., Franceschini-Fantini, M. A., Maier, J. S., Walker, S. A., Barbieri, B., and Gratton, E. (1995). Opt. Eng. 34, 32–42.
Feng, L., Yoo, K. M., and Alfano, R. R. (1994). Opt. Lett. 19, 740–742.
Firbank, M., Hiraoka, M., and Delpy, D. T. (1993). In Photon Migration and Imaging in Random Media and Tissues, edited by B. Chance and R. R. Alfano. Vol. 1888, Proc. SPIE. Bellingham, WA: SPIE—Int. Soc. Opt. Eng., pp. 264–270.
Fishkin, J. B., and Gratton, E. (1993). J. Opt. Soc. Am. A 10, 127–140.
Flock, S. T., Patterson, M. S., Wilson, B. C., and Wyman, D. R. (1989). IEEE Trans. Biomed. Eng. 36, 1162–1168.
Franceschini, M. A., Moesta, K. T., Fantini, S., Gaida, G., Gratton, E., Jess, H., Mantulin, W. W., Seeber, M., Schlag, P. M., and Kaschke, M. (1997). Proc. Nat. Acad. Sci. USA 94, 6468–6473.
Gayen, S. K., and Alfano, R. R. (1996). Opt. Photon. News 17–22, 52.
Gobin, L., Blanchot, L., and Saint-Jalmes, H. (1999). Appl. Opt. 38, 4217–4227.
Groenhuis, R. A. J., Ferwerda, H. A., and ten Bosch, J. J. (1983). Appl. Opt. 22, 2456–2467.
Gutman, S., and Klibanov, M. (1994). Inverse Probl. 10, L39–L46.
Haselgrove, J. C., Leigh, J. S., Yee, C., Wang, N. G., Maris, M. B., and Chance, B. (1991). In Time-Resolved Spectroscopy and Imaging of Tissues, edited by B. Chance and A. Katzir. Vol. 1431, Proc. SPIE. Bellingham, WA: SPIE—Int. Soc. Opt. Eng., pp. 30–41.
Hebden, J. C. (1992). Med. Phys. 19, 1081–1087.
Hee, M. R., Izatt, J. A., Swanson, E. A., and Fujimoto, J. G. (1993). Opt. Lett. 18, 1107–1109.
Hemenger, R. P. (1977). Appl. Opt. 16, 2007–2012.
Hielscher, A. H., Klose, A., Catarious, D., Jr., and Hanson, K. M. (1998). In Advances in Optical Imaging and Photon Migration, edited by J. G. Fujimoto and M. S. Patterson. Vol. 21, OSA Trends in Optics and Photonics. Washington, DC: OSA Press, pp. 156–161.
Hielscher, A. H., Tittel, F. K., and Jacques, S. L. (1994). In Advances in Optical Imaging and Photon Migration, edited by R. R. Alfano. Vol. 21, OSA Proceedings. Washington, DC: OSA Press, pp. 78–82.
Hielscher, A. H., Tittel, F. K., and Jacques, S. L. (1995). In Photon Transport in Highly Scattering Tissue, edited by S. Avrillier, B. Chance, G. J. Mueller, A. V. Priezzhev, and V. V. Tuchin. Vol. 2326, Proc. SPIE. Bellingham, WA: SPIE—Int. Soc. Opt. Eng., pp. 75–85.
Iftimia, N., and Jiang, H. (2000). Appl. Opt. 39, 5256–5261.
Ishimaru, A. (1978a). Appl. Opt. 17, 348–352.
Ishimaru, A. (1978b). Wave Propagation and Scattering in Random Media. San Diego: Academic Press.
Ishimaru, A. (1989). Appl. Opt. 28, 2210–2215.
Jacques, S. L., and Prahl, S. A. (1987). Lasers Surg. Med. 6, 494–503.
Janesick, J. R. (2001). Scientific Charge-Coupled Devices. Bellingham, WA: SPIE—Int. Soc. Opt. Eng.
Jiang, H., Paulsen, K. D., Osterberg, U. L., and Patterson, M. S. (1997a). Appl. Opt. 36, 52–63.
Jiang, H., Paulsen, K. D., Osterberg, U. L., and Patterson, M. S. (1997b). Appl. Opt. 36, 2995–2996.
Jiang, H., Paulsen, K. D., Osterberg, U. L., Pogue, B. W., and Patterson, M. S. (1996). J. Opt. Soc. Am. A 13, 253–266.
Kak, A. C., and Slaney, M. (1988). Principles of Computerized Tomographic Imaging. New York: IEEE Press.
Kienle, A., Lilge, L., Patterson, M. S., Hibst, R., Steiner, R., and Wilson, B. C. (1996). Appl. Opt. 35, 2304–2314.
Kingston, R. H. (1979). Detection of Optical and Infrared Radiation. Berlin: Springer-Verlag.
Knüttel, A., Schmitt, J. M., and Knutson, J. R. (1993). Appl. Opt. 32, 381–389.
Li, X. D., Durduran, T., Yodh, A. G., Chance, B., and Pattanayak, D. N. (1997). Opt. Lett. 22, 573–575.
Li, X., Pattanayak, D. N., Durduran, T., Culver, J. P., Chance, B., and Yodh, A. G. (2000). Phys. Rev. E 61, 4295–4309.
Liu, H., Matson, C. L., Lau, K., and Mapakshi, R. R. (1999). IEEE J. Select. Topics Quantum Electron. 5, 1049–1057.
Lyubimov, V. V. (1994). Optik. Spektrosk. 76, 814–815.
Matson, C. L. (1997). Opt. Express 1, 6–11.
Matson, C. L. (2001). Appl. Opt. 40, 5791–5801.
Matson, C. L. (2002). J. Opt. Soc. Am. A 19, 961–972.
Matson, C. L., Clark, N., McMackin, L., and Fender, J. S. (1997). Appl. Opt. 36, 214–220.
Matson, C. L., DeLarue, I. A., Gray, T. M., and Drunzer, I. E. (1992). Comput. Elect. Eng. 18, 485–497.
Matson, C. L., and Liu, H. (1999a). J. Opt. Soc. Am. A 16, 455–466.
Matson, C. L., and Liu, H. (1999b). J. Opt. Soc. Am. A 16, 1254–1265.
Matson, C. L., and Liu, H. (2000). Opt. Express 6, 168–174.
Matson, C. L., Magee, E. P., and Holland, D. E. (1995). Opt. Eng. 34, 2811–2820.
McGillem, C. D., and Cooper, G. R. (1991). Continuous and Discrete Signal and System Analysis, 3rd ed. Philadelphia: Holt, Rinehart & Winston.
Mills, K. D., Deslaurier, L., Dilworth, D. S., Grannell, S. M., Hoover, B. G., Athey, B. D., and Leith, E. N. (2001). Appl. Opt. 40, 2282–2289.
Moon, J. A., Battle, P. R., Bashkansky, M., Mahon, R., Duncan, M. D., and Reintjes, J. (1996). Phys. Rev. E 53, 1142–1155.
Morse, P. M., and Feshbach, H. (1953). Methods of Theoretical Physics. New York: McGraw-Hill.
Norton, S. J., and Vo-Dinh, T. (1998). J. Opt. Soc. Am. A 15, 2670–2677.
O'Leary, M. A. (1996). Imaging with diffuse photon density waves. Ph.D. dissertation, Univ. of Pennsylvania, Philadelphia.
O'Leary, M. A., Boas, D. A., Chance, B., and Yodh, A. G. (1992). Phys. Rev. Lett. 69, 2658–2661.
O'Leary, M. A., Boas, D. A., Chance, B., and Yodh, A. G. (1995a). Opt. Lett. 20, 426–428.
O'Leary, M. A., Boas, D. A., Chance, B., and Yodh, A. G. (1995b). In Optical Tomography, Photon Migration, and Spectroscopy of Tissue and Model Media: Theory, Human Studies, and Instrumentation, edited by B. Chance and R. R. Alfano. Vol. 2389, Proc. SPIE. Bellingham, WA: SPIE—Int. Soc. Opt. Eng., pp. 320–327.
Pan, S. X., and Kak, A. C. (1983). IEEE Trans. Acoust. Speech Signal Processing ASSP-31, 1262–1275.
Papoulis, A. (1965). Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill.
Pattanayak, D. N., and Yodh, A. G. (1999). Opt. Express 4, 231–240.
Patterson, M. S., Chance, B., and Wilson, B. C. (1989). Appl. Opt. 28, 2331–2336.
Paulsen, K. D., and Jiang, H. (1995). Med. Phys. 22, 691–701.
Peters, V. G., Wyman, D. R., Patterson, M. S., and Frank, G. L. (1990). Phys. Med. Biol. 35, 1317–1334.
Pogue, B. W., McBride, T. O., Prewitt, J., Osterberg, U. L., and Paulsen, K. D. (1999). Appl. Opt. 38, 2950–2961.
Pogue, B. W., and Patterson, M. S. (1996). J. Biomed. Opt. 1, 311–323.
Profio, A. E. (1989). Appl. Opt. 28, 2216–2221.
Ripoll, J., Nieto-Vesperinas, M., and Carminati, R. (1999). J. Opt. Soc. Am. A 16, 1466–1476.
Roggemann, M. C., and Welsh, B. (1996). Imaging Through Turbulence. Boca Raton, FL: CRC Press.
Schatzberg, A., and Devaney, A. J. (1992). Inverse Probl. 8, 149–164.
Schatzberg, A., Devaney, A. J., and Witten, A. J. (1994). Signal Processing 40, 227–237.
Schmidt, F. E. W., Fry, M. E., Hillman, E. M. C., Hebden, J. C., and Delpy, D. T. (2000). Rev. Sci. Instrum. 71, 256–265.
Schotland, J. C. (1997). J. Opt. Soc. Am. A 14, 275–279.
Schweiger, M., and Arridge, S. R. (1997). Lecture Notes Comput. Sci. 1230, 71–84.
Stark, H. (1979). J. Opt. Soc. Am. 69, 1519–1525.
Stark, H., and Wengrovitz, M. (1983). IEEE Trans. Acoust. Speech Signal Processing ASSP-31, 1329–1331.
Taylor, A. E., and Mann, W. R. (1972). Advanced Calculus, 2nd ed. New York: Wiley.
Tsihrintzis, G. A., and Devaney, A. J. (1991). IEEE Trans. Signal Processing 39, 1466–1470.
Wabnitz, H., and Rinneberg, H. (1997). Appl. Opt. 36, 64–74.
Wilson, B. C., and Adam, G. (1983). Med. Phys. 10, 824–830.
Winn, J. N., Perelman, L. T., Chen, K., Wu, J., Dasari, R. R., and Feld, M. S. (1998). Appl. Opt. 37, 8085–8091.
Wolf, E. (1969). Opt. Commun. 1, 153–156.
Yao, Y., Wang, Y., Pei, Y., Zhu, W., and Barbour, R. L. (1997). J. Opt. Soc. Am. A 14, 325–342.
Ye, J. C., Millane, R. P., Webb, K. J., and Downar, T. J. (1998). Opt. Lett. 23, 1423–1425.
Yoon, G., Welch, A. J., Motamedi, M., and van Gemert, M. (1987). IEEE J. Quantum Electron. QE-23, 1721–1733.
Tree-Adapted Wavelet Shrinkage

JAMES S. WALKER

Department of Mathematics, University of Wisconsin–Eau Claire, Eau Claire, Wisconsin 54702-4004
I. Introduction
II. Comparison of Taws and Wiener Filtering
III. Wavelet Analysis
   A. Wavelet Series
   B. Scaling Functions
   C. Discretization of Wavelet Analysis of Analog Signals
   D. Discrete Wavelet Analysis
   E. Wavelet Analysis of Images
IV. Fundamentals of Wavelet-Based Denoising
   A. Wavelet Shrinkage: VisuShrink
   B. Wavelet Shrinkage: SureShrink
   C. Cycle-Spin Thresholding
V. Tree-Adapted Wavelet Shrinkage
   A. The Theory of Taws: Tree Structure of Wavelet Transforms
   B. The Theory of Taws: Selection Principles
   C. The Aswdr Algorithm
   D. The Taws Algorithm
   E. The Taws-spin Algorithm
   F. The Taws-Comp Algorithm
   G. Other Transforms for Taws
VI. Comparison of Taws with Other Techniques
   A. Objective: SNR Comparison
   B. Subjective: Visual Comparison
VII. Conclusion
References
I. Introduction

During the 1990s, several new methods emerged for removing Gaussian random noise from images. These new methods, which use wavelet transform techniques, are generally superior to the classic Wiener filtering method of denoising. The goal of this article is to describe one of these new wavelet-based techniques, called tree-adapted wavelet shrinkage (Taws). The basic theory and implementation of Taws is explained, and it is compared with several other algorithms.
As an introduction to the capabilities of Taws, in Section II it is compared with the classic Wiener filtering method. This comparison shows that Taws is clearly superior.

So that this article is understandable to as wide an audience as possible, in Section III the fundamentals of wavelet analysis of images are described. There are two interrelated approaches to wavelet theory: the wavelet analysis of analog signals and the wavelet analysis of discrete signals. The interaction between these two modes of analysis is a fundamental aspect of wavelet theory. This brief account of wavelet theory stresses the aspects which are needed for understanding the theory behind the various wavelet denoising methods, including Taws. Readers who are familiar with wavelet analysis should feel free to skip (or perhaps skim) Section III.

Section IV provides a short discussion of three wavelet-based techniques for image denoising: VisuShrink, SureShrink, and cycle-spin thresholding. Taws builds on the basic concepts provided by these three approaches, so it is important to briefly discuss them. First, the now-classic technique of VisuShrink is outlined. VisuShrink was one of the first methods to provide an alternative to Wiener filtering. In many cases it can outperform Wiener filtering, especially for images with a low signal-to-noise ratio (SNR). However, VisuShrink does suffer from the fact that it generally produces images that appear too smooth and do not have sharp edges. Two methods superior to VisuShrink are SureShrink and cycle-spin thresholding. They produce perceptually sharper denoised images and higher SNRs. Taws is built on the foundation provided by these three earlier methods. It provides fundamental improvements to the basic technique of VisuShrink, somewhat analogous to SureShrink. In addition, there is a Taws-adapted variant of cycle spinning, Taws-spin, which provides relatively high SNRs and excellent perceptual denoisings. Unlike cycle-spin thresholding, Taws-spin is a simple procedure requiring no complicated coding or memory bookkeeping.

Section V covers in detail the theory of Taws and its implementation. Taws makes use of relations between edges within an image at multiple resolution levels and the image's wavelet transform. By utilizing these relations Taws is able to ameliorate the problem of blurred edges that is endemic to classical wavelet shrinkage. An important feature of Taws is its compatibility with image compression. In particular, Taws was discovered in relation to a particular image-compression algorithm called Aswdr (adaptively scanned wavelet difference reduction). The Aswdr algorithm is briefly outlined in Section V and its connection with Taws is highlighted. The Taws algorithm was first reported in Walker and Chen (2000). Since then, the Taws algorithm has been improved to the point that all parameters can be chosen automatically. This improvement is described as well.
A detailed comparison of Taws with other denoising methods is carried out in Section VI, which is the final section of this article. The comparison is based on SNRs for denoising various test images with simulated Gaussian noise, and on perceptual comparisons of some of the denoised images. The Taws method is shown to be competitive with state-of-the-art methods, such as SureShrink, cycle-spin thresholding, and hidden Markov tree methods.
II. Comparison of Taws and Wiener Filtering

Before the theory behind Taws is discussed, its superiority over Wiener filtering is illustrated. Wiener filtering is the classic, Fourier-based method of noise removal. It is hoped that this comparison will provide sufficient motivation for the reader to further study the Taws method.

The type of noise considered in this article is additive Gaussian random noise. Given a discrete image F, the noisy image G is related to it by the equation G = F + N, where the noise values $N_{j,k}$ are independent random variables with underlying distributions that are all zero-mean Gaussian normal of variance $\sigma^2$. Wiener filtering is well known to provide the best linear method for removing such noise (see, e.g., Mallat, 1999). That is, if the denoised image is obtained by applying a linear transformation to G, then Wiener filtering provides the best such denoising—“best” in the sense of least error when it is measured by using a sum of squares of differences over all pixels. In this article, tradition is followed and the objective measure of error between images used is the SNR, which in decibels is $10\log_{10}(\|F\|_2^2 / \|F - G\|_2^2)$. In other words,

$$\mathrm{SNR} = 10\log_{10}\frac{\sum_{j,k}|F_{j,k}|^2}{\sum_{j,k}|F_{j,k}-G_{j,k}|^2}$$
where each sum is over all pixels. Thus, among all linear transformations of the noisy image G, Wiener filtering is the one which maximizes the SNR. The commercial implementation of Wiener filtering from the MATLAB image-processing toolbox (the wiener2 procedure; Lee, 1980) is used.

To see how Wiener and Taws denoisings perform, let us consider the standard test image Peppers shown in Figure 1a and the noisy version of it in Figure 1b. The noisy image was produced by adding Gaussian noise with σ = 32 and has an SNR of 11.8 db. Figures 1c and 1d show the Wiener filtering and Taws-spin denoisings of the noisy image. The Taws-spin denoising is superior both in terms of SNR and perceptually. The Taws-spin denoising has an SNR of 21.6 db, which is 3.0 db higher than the SNR for the Wiener denoising. Perceptually, the Wiener filtering retains a grainy appearance that strikes
Figure 1. Wiener and Taws-spin denoisings of a noisy Peppers image. (a) Original image. (b) Noisy image, SNR = 11.8 db. (c) Wiener method, SNR = 18.6 db. (d) Taws-spin method, SNR = 21.6 db.
observers as a noisy image. In contrast, the Taws-spin denoising appears to be essentially noise free, albeit slightly out of focus. It is important to expand on the remarks just made about the Taws-spin denoised image. Like all wavelet-based denoisings, a Taws denoising aims to produce an image free of all contamination by random noise and having a greatly increased SNR. Different wavelet-based approaches utilize different theoretical models for imaging in order to achieve this goal. Later, these models are briefly outlined for some of the other wavelet-based methods, and the model used to develop Taws is described in more detail. The Taws-spin denoising in Figure 1d appears to be a slightly out-of-focus version of the original image. We shall see that this is a kind of oversmoothing that is endemic to all waveletbased denoising methods, although Taws suffers from it to a lesser degree than the other wavelet methods discussed in this article do.
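For concreteness, the SNR measure defined above and the additive noise model G = F + N can be computed directly. The following is a minimal sketch only (Python with NumPy is assumed; the placeholder gradient image and the function name snr_db are illustrative, not part of the original article):

```python
import numpy as np

def snr_db(original, estimate):
    """SNR in decibels: 10 log10( sum |F|^2 / sum |F - G|^2 ), summed over all pixels."""
    original = np.asarray(original, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(original**2) / np.sum((original - estimate)**2))

# Noisy image G = F + N with i.i.d. zero-mean Gaussian noise of standard deviation sigma = 32,
# as in the Peppers example; F here is only a placeholder gradient image.
rng = np.random.default_rng(seed=0)
F = np.tile(np.linspace(0.0, 255.0, 256), (256, 1))
G = F + rng.normal(scale=32.0, size=F.shape)
print(snr_db(F, G))
```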
This comparison is concluded by examining the SNR performance of Wiener filtering and Taws on a suite of test images. Like all the images considered in this article, these images are gray-level images with 256 intensity levels (sometimes called 8-bit gray-level images). Gaussian noise of various standard deviations was added to five test images, known as Lena, Goldhill, Boats, Barbara, and Peppers. All of the original and noisy images are available at the following web site:

http://www.uwec.edu/academic/curric/walkerjs/denoisings/

The noisy images were denoised by using Wiener filtering and Taws-spin. Table 1 contains the results. The data in Table 1 provide strong evidence that Taws-spin is far superior to Wiener filtering. Its superiority is particularly great at the lower SNRs, for noise with σ = 32 and 64.

It should also be noted that Taws-spin is a relatively fast procedure. The average time for a Taws-spin denoising of a 256 × 256 image is about 3 s, whereas the average time for a 512 × 512 image is about 12 s (on a 1-GHz machine with 256 MB RAM).

TABLE 1
Denoisings by Wiener Filtering and Taws-spin(a)

Image, noise σ     Noisy SNR    Wiener SNR    Taws-spin SNR
Lena, 8            24.4         28.9          29.6
Goldhill, 8        23.7         26.1          26.8
Boats, 8           25.2         28.1          29.3
Barbara, 8         24.2         24.5          26.4
Peppers, 8         23.5         27.7          28.0

Lena, 16           18.4         25.0          26.1
Goldhill, 16       17.7         23.3          23.6
Boats, 16          19.2         25.0          25.6
Barbara, 16        18.2         22.1          22.1
Peppers, 16        17.6         24.0          25.0

Lena, 32           12.5         19.7          23.0
Goldhill, 32       11.9         18.6          20.8
Boats, 32          13.4         20.2          22.2
Barbara, 32        12.4         18.2          18.5
Peppers, 32        11.8         18.6          21.6

Lena, 64           7.2          14.5          19.0
Goldhill, 64       6.5          13.6          18.1
Boats, 64          8.1          15.1          19.0
Barbara, 64        7.1          13.6          16.1
Peppers, 64        6.4          13.3          17.3

(a) SNR, signal-to-noise ratio.
For optimum speed and efficiency, a Taws denoising (without cycle spinning) is preferable, but at the cost of a reduced SNR and more visible noise artifacts. More examples of both Taws and Taws-spin denoisings and comparisons with other denoising methods are provided in Section VI.

For readers who are not familiar with wavelet analysis, a brief summary is provided in the next section. Even those who are conversant with wavelets might find it useful to skim this section because some remarks are made pertaining to the theory of Taws.

III. Wavelet Analysis

The Taws denoising method, like all wavelet-based denoising methods, depends on first performing a wavelet transform of the noisy image. Wavelet transforms facilitate separating noise from the transformed image. In this section the principal aspects of wavelet analysis are briefly summarized. Thorough treatments can be found in Burrus et al. (1998), Chui (1997), Daubechies (1992), Hernandez and Weiss (1996), Meyer (1992), Resnikoff and Wells (1998), Strang and Nguyen (1996), and Vetterli and Kovačević (1995). Particular emphasis on discrete wavelet analysis is given in Walker (1997, 1999).

The wavelet analysis of one-dimensional (1D) analog signals is summarized first. The notation for this theory is simpler, and the main ideas for two-dimensional (2D) images are captured in the simpler 1D setting. Because the aim is to explain denoising of digital images contaminated by random noise, we need to examine the connection between wavelet analysis of analog signals and wavelet analysis of discrete signals. This connection is mediated by an appropriate signal model. The signal model used is that a noise-free signal is a continuous, piecewise smooth function. “Piecewise smooth” means that pieces of the graph of the function are continuously differentiable to some order. This model has the virtue of simplicity. It is also fairly realistic because signals are typically obtained by means of some measuring process. Therefore, a noise-free signal has any discontinuities averaged out and irregularities smoothed out by the measuring instrument.*

* The author is alluding to convolution by an instrument function. Discussion of wavelet-based techniques for deconvolution can be found in Mallat (1999) and references therein.

A. Wavelet Series

A wavelet basis for functions in $L^2(\mathbb{R}) = \{ f : \int_{-\infty}^{\infty} |f(x)|^2\,dx < \infty \}$ is a set of functions $\{\psi_j^m(x) = 2^{m/2}\psi(2^m x - j)\}_{m,j\in\mathbb{Z}}$ which is an orthonormal basis for $L^2(\mathbb{R})$.
Each basis function $\psi_j^m$ is a dilation by $2^{-m}$ and translation by $j2^{-m}$ of the function $\psi$. This function $\psi$ is the generating wavelet (or simply wavelet) for this wavelet basis. Because $\{\psi_j^m\}$ is an orthonormal basis, the following wavelet series expansion holds for each $f \in L^2(\mathbb{R})$:

$$f(x) = \sum_{m\in\mathbb{Z}} \sum_{j\in\mathbb{Z}} \beta_j^m \psi_j^m(x) \qquad (1)$$

with wavelet coefficients $\{\beta_j^m\}$ defined by

$$\beta_j^m = \int_{-\infty}^{\infty} f(x)\,\psi_j^m(x)\,dx \qquad (2)$$
For the given wavelet system $\{\psi_j^m\}$, the map $f \mapsto \{\beta_j^m\}$ is the wavelet transform defined by the wavelet system.

There are many wavelet bases. A classic example is the Haar basis (Haar, 1910). To define the Haar basis, let $1_S(x)$ denote the indicator function for the set S; that is, $1_S(x) = 1$ if $x \in S$ and $1_S(x) = 0$ if $x \notin S$. The wavelet for the Haar basis is $\psi(x) = 1_{[0,1/2)}(x) - 1_{[1/2,1)}(x)$. Because this Haar wavelet $\psi$ is a step function, any partial sum of the series in Eq. (1) is a step function. Unfortunately, approximation of an analog signal by step functions is not consistent with the continuous, piecewise smooth signal model.

Fortunately, there are wavelet bases which are consistent with this signal model, for example, the Daubechies wavelet bases (Daubechies, 1988). A Daubechies wavelet $\psi$ is continuous and compactly supported (zero outside some finite interval). Furthermore, many Daubechies wavelets are smooth; they are continuously differentiable to some order. To obtain smoothness, a Daubechies wavelet $\psi$ is required to have a finite number of zero moments:

$$\int_{-\infty}^{\infty} x^n \psi(x)\,dx = 0, \qquad n = 0, 1, \ldots, N \qquad (3)$$
Note that Eq. (3) is satisfied by the Haar wavelet if N = 0. The Daubechies wavelet satisfying Eq. (3) for a given N ≥ 1 is supported on the interval [0, 2N + 1]. Each basis function $\psi_j^m$ is then supported on the interval $[\,j2^{-m}, (j + 2N + 1)2^{-m}]$. These intervals vary in length depending on m, and in location depending on m and j. The basis functions $\psi_j^m(x) = 2^{m/2}\psi(2^m x - j)$ can zoom in on particular areas of a signal. As m increases, the supports of these basis functions decrease in length, and varying j allows us to examine particular areas of the signal. The wavelet coefficients $\beta_j^m$ encode information about local aspects of the signal f by measuring its correlation with $\psi_j^m$. Furthermore, the terms in the series in Eq. (1) are multiples of the smooth basis functions $\psi_j^m$, so partial sums are smooth. When the signal makes rapid transitions—for example, when transitioning from one smooth piece to another—the wavelet coefficients $\beta_j^m$ that reflect this most prominently (have largest magnitude)
are supported in intervals located around the transition region. Because the corresponding terms $\beta_j^m \psi_j^m(x)$ are supported around the transition region, a partial sum of the series in Eq. (1) that includes these terms produces a smooth transition between pieces. For our signal model, the zero-moment requirement (3) implies that there are many small-magnitude wavelet coefficients. If f is closely approximated over the support of $\psi_j^m$ by a polynomial of degree N (as in a truncated Taylor expansion), then the wavelet coefficient $\beta_j^m$ will be approximately zero. This leads to many small-magnitude wavelet coefficients located in regions where the signal is smooth.

The combination of orthonormality, compact support, smoothness, and zero moments yields a series in Eq. (1) that is well adapted to the signal model of continuous, piecewise smooth functions. It also produces an excellent system for performing denoising of signals contaminated by Gaussian noise. Because $\{\psi_j^m\}$ is an orthonormal system, the wavelet transform $f \mapsto \{\beta_j^m\}$ is orthogonal. The combination of an orthogonal transform (conservation of energy) and large numbers of small wavelet coefficients (compaction of energy) is the principal reason that Gaussian random noise can be separated from the signal. This topic is discussed further in Section IV.
B. Scaling Functions

The series expansion in Eq. (1) can be viewed as a limit of a sequence of functions $f_M$ defined by

$$f_M(x) = \sum_{m=-\infty}^{M-1} \sum_{j=-\infty}^{\infty} \beta_j^m \psi_j^m(x)$$

Each $f_M$ belongs to a subspace $V_M$ of $L^2(\mathbb{R})$ with basis $\{\psi_j^m\}_{m<M,\,j\in\mathbb{Z}}$. For all the wavelet systems discussed, there exists a function $\phi$ such that, for every M, the set $\{\phi_j^M(x) = 2^{M/2}\phi(2^M x - j)\}_{j\in\mathbb{Z}}$ is an orthonormal basis for $V_M$. The function $\phi$ is called the scaling function for the wavelet system. Because $\{\phi_j^M\}$ is an orthonormal basis for $V_M$, there is the following scaling series expansion of $f_M$:

$$f_M(x) = \sum_{j=-\infty}^{\infty} \alpha_j^M \phi_j^M(x) \qquad (4)$$

with scaling coefficients $\{\alpha_j^M\}$ defined by

$$\alpha_j^M = \int_{-\infty}^{\infty} f(x)\,\phi_j^M(x)\,dx \qquad (5)$$

Formulas (4) and (5) express $f_M$ as an orthogonal projection of f into $V_M$.
The scaling function for the Haar system is $\phi(x) = 1_{[0,1)}(x)$. The Haar scaling function and wavelet satisfy the following identities:

$$\phi(x) = \frac{1}{\sqrt{2}}\left[\sqrt{2}\,\phi(2x) + \sqrt{2}\,\phi(2x-1)\right], \qquad \psi(x) = \frac{1}{\sqrt{2}}\left[\sqrt{2}\,\phi(2x) - \sqrt{2}\,\phi(2x-1)\right] \qquad (6)$$

Similar identities hold for Daubechies wavelets and scaling functions. For each Daubechies system, there are two finite sets of constants $\{c_k\}_{k=0}^{2N+1}$ and $\{d_k\}_{k=0}^{2N+1}$ such that

$$\phi(x) = \sum_{k=0}^{2N+1} c_k\,\sqrt{2}\,\phi(2x-k), \qquad \psi(x) = \sum_{k=0}^{2N+1} d_k\,\sqrt{2}\,\phi(2x-k) \qquad (7)$$

For example, if N = 1, Daubechies found that the constants

$$c_0 = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad c_1 = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad c_2 = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad c_3 = \frac{1-\sqrt{3}}{4\sqrt{2}}$$
$$d_0 = \frac{1-\sqrt{3}}{4\sqrt{2}}, \quad d_1 = \frac{\sqrt{3}-3}{4\sqrt{2}}, \quad d_2 = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad d_3 = \frac{-1-\sqrt{3}}{4\sqrt{2}} \qquad (8)$$

provide a solution of Eq. (7). They define the Daub4 wavelet system. Notice that for the Haar system, the pair of equations in (6) are a special case of those in (7), obtained by setting N = 0. For the Haar case, the constants in (7) are $c_0 = c_1 = 1/\sqrt{2}$ and $d_0 = 1/\sqrt{2}$, $d_1 = -1/\sqrt{2}$.

There is a close relation between scaling coefficients and wavelet coefficients. From the identities in Eq. (7), we obtain

$$\alpha_j^{m-1} = \sum_{k=0}^{2N+1} c_k\,\alpha_{k+2j}^{m}, \qquad \beta_j^{m-1} = \sum_{k=0}^{2N+1} d_k\,\alpha_{k+2j}^{m} \qquad (9)$$

The second equation in (9) shows how the wavelet coefficients $\{\beta_j^{m-1}\}$ are obtained from the scaling coefficients $\{\alpha_j^m\}$. The first equation in (9) allows us
to iterate this procedure. By iteration, all wavelet coefficients $\{\beta_j^k\}_{k<m}$ can be obtained from the scaling coefficients $\{\alpha_j^m\}$ at scale m. For all wavelet systems, the scaling function $\phi(x)$ is supported on the interval [0, 2N + 1] and satisfies

$$\int_{-\infty}^{\infty} \phi(x)\,dx = 1 \qquad (10)$$

This equation proves useful in deriving the discrete version of wavelet theory.

C. Discretization of Wavelet Analysis of Analog Signals
The scaling coefficients $\{\alpha_j^M\}$, for large enough M, provide a connection between the world of analog signals and the world of discrete signals. For example, given an analog signal f(x), we can generate discrete data $\{f_j\}_{j\in\mathbb{Z}}$ by defining each $f_j$ to be $f(x_j)$, where $x_j = j\,\Delta x$ for some fixed step-size $\Delta x$. The $\{f_j\}$ are referred to as uniform samples, or just samples, of f(x). Formula (5) yields the following approximation for sufficiently large M (provided f is uniformly continuous):

$$\alpha_j^M = \int_{j2^{-M}}^{(2N+1+j)2^{-M}} f(x)\,2^{M/2}\phi(2^M x - j)\,dx \approx f(j2^{-M}) \int_{j2^{-M}}^{(2N+1+j)2^{-M}} 2^{M/2}\phi(2^M x - j)\,dx$$

Because of Eq. (10) we then have*

$$\alpha_j^M \approx 2^{-M/2} f(j2^{-M}) \qquad (11)$$
The approximation in Eq. (11) is justification for assuming, in the discrete setting, that $\alpha_j^M = 2^{-M/2} f_j$. In practice, the constant factor $2^{-M/2}$ is usually dropped, and it is just assumed that $\alpha_j^M = f_j$ for each j. The reason for dropping the constant factor is that the equations in (9) then define an orthogonal transformation on the samples $\{f_j\}$. This orthogonal transformation is a discrete wavelet transform.

* Further discussion of Eq. (11) can be found in Walker (1997).

D. Discrete Wavelet Analysis

The equations in (9), starting from initial data $\{\alpha_j^M = f_j\}$, provide a discrete wavelet transform. To see how this works let us first consider the simplest case, the Haar transform.
For Haar wavelets, N = 0, $c_0 = c_1 = 1/\sqrt{2}$, and $d_0 = 1/\sqrt{2}$, $d_1 = -1/\sqrt{2}$. Equation (9) then yields the following transformation:

$$\alpha_j^{M-1} = \frac{f_{2j} + f_{2j+1}}{\sqrt{2}}, \qquad \beta_j^{M-1} = \frac{f_{2j} - f_{2j+1}}{\sqrt{2}} \qquad (12)$$

In practice, these formulas are applied to a finite set of initial values. Let us begin with a vector $\mathbf{f} = (f_0, f_1, \ldots, f_{J-1})$, where we assume J is even. Applying Eq. (12) to f yields the vector

$$(\mathbf{a}^1 \mid \mathbf{b}^1) = \left(\alpha_0^{M-1}, \alpha_1^{M-1}, \ldots, \alpha_{J/2-1}^{M-1} \mid \beta_0^{M-1}, \beta_1^{M-1}, \ldots, \beta_{J/2-1}^{M-1}\right)$$

The mapping $\mathbf{f} \mapsto (\mathbf{a}^1 \mid \mathbf{b}^1)$ is the first-level Haar transform of f. The vector a1 is called the trend of f, and the vector b1 is called the fluctuation of f. It is easy to see that the first-level Haar transform is orthogonal. It preserves the energy of vectors, as measured by sums of squares. If J can be divided several times by 2, then several levels of Haar transform can be performed. For example, if J is divisible by 4, then f has a second-level Haar transform. This is found by iterating the first-level Haar transform, by applying it to the trend vector a1. This is just a special instance of the formulas in (9) for the Haar case.

As an example of these ideas, let us consider the following analog signal:

$$f(x) = (\sin 25\pi x)\,1_{[0,0.25]}(x) + (4 + \cos 45\pi x)\,1_{(0.25,0.75]}(x) + (\sin 35\pi x)\,1_{(0.75,1)}(x)$$
The graph of the vector f, which consists of J = 16,384 samples of the signal f(x) over the interval [0, 1), is shown in Figure 2a. The first-level Haar transform of this signal is shown at the top of Figure 2b. The trend a1 is graphed over
Figure 2. (a) Test signal. (b, top) First-level Haar transform. (b, bottom) Second-level Haar transform. (c, top) Second-level trend. (c, middle) Second-level fluctuation. (c, bottom) First-level fluctuation.
[0, 0.5) and the fluctuation b1 is graphed over [0.5, 1). Notice that the trend a1 closely resembles the original vector f. The trend also contains 99.992% of the energy of f. The fluctuation b1 shows only two noticeable values, which correspond to the jump discontinuities of f(x). Because the fluctuation contains such a small amount of energy, what little energy it has is confined mostly to these two peak values. Similar remarks apply to the second-level transform shown in Figure 2b (bottom). This second-level transform is also displayed in Figure 2c, where each of the vectors a2, b2, and b1 are graphed over the interval [0, 1). Comparing the graph of a2 with the graph of f clearly shows that they are closely related in form, although a2 has only a quarter of the values of f, spaced four times as far apart. The second-level trend a2 contains 99.990% of the energy of f. Notice that the fluctuations b2 and b1, when graphed over the interval [0, 1), provide excellent locators for the positions of the jump discontinuities of f(x). Similar results can be found for images. However, with images, instead of jump discontinuities at isolated points, there are edges along curves.

For Daubechies wavelets, there are similar wavelet transformations. For example, if we use the coefficients in Eq. (8), the first-level Daub4 discrete wavelet transform is defined by*

$$\alpha_j^{m-1} = c_0 f_{2j} + c_1 f_{2j+1} + c_2 f_{2j+2} + c_3 f_{2j+3}, \qquad \beta_j^{m-1} = d_0 f_{2j} + d_1 f_{2j+1} + d_2 f_{2j+2} + d_3 f_{2j+3} \qquad (13)$$

These equations define an orthogonal transform $\mathbf{f} \mapsto (\mathbf{a}^1 \mid \mathbf{b}^1)$. Higher-level Daub4 transforms, as with the higher-level Haar transforms, are defined by iteration on the trend vectors a1, a2, and so forth. One advantage of Daubechies wavelets is that the moment conditions in Eq. (3) imply corresponding discrete moment conditions on the constants $\{d_k\}$ used for generating wavelet coefficients in the discrete transform. These discrete moment conditions are (let $0^0 = 1$ to enable a single statement):

$$\sum_{k=0}^{2N+1} k^n d_k = 0, \qquad n = 0, 1, \ldots, N \qquad (14)$$
Given our signal model, these discrete moment conditions imply that there will be many small-magnitude wavelet coefficients. In fact, if the vector f consists of samples from a signal that can be closely approximated over the support of $\psi_j^{m-1}$ by a polynomial of degree N (as in a truncated Taylor expansion), then Eq. (14) implies that $\beta_j^{m-1}$ will be approximately zero.

* The equations in (13) do not apply when j = J/2 − 1 because the values $f_{2j+2}$ and $f_{2j+3}$ are undefined. In this case, the values of $f_0$ and $f_1$, respectively, are substituted for these undefined values. This corresponds to a periodic extension of the vector f.
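To make Eqs. (12) and (13) concrete, the following sketch computes one level of the Haar or Daub4 transform of a vector, using the periodic extension described in the footnote. It is an illustrative implementation (Python with NumPy assumed), not the code used by the author; the helper name level1_transform is hypothetical.

```python
import numpy as np

SQ2, SQ3 = np.sqrt(2.0), np.sqrt(3.0)
HAAR_C, HAAR_D = np.array([1, 1]) / SQ2, np.array([1, -1]) / SQ2       # Haar case, Eq. (12)
DAUB4_C = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4 * SQ2)    # Eq. (8)
DAUB4_D = np.array([1 - SQ3, SQ3 - 3, 3 + SQ3, -1 - SQ3]) / (4 * SQ2)

def level1_transform(f, c, d):
    """One level of Eq. (12)/(13): returns the trend a and the fluctuation b.
    Indices past the end wrap around (periodic extension, as in the footnote)."""
    f = np.asarray(f, dtype=float)
    J = f.size                                   # assumed even
    a, b = np.empty(J // 2), np.empty(J // 2)
    for j in range(J // 2):
        window = f[(2 * j + np.arange(c.size)) % J]
        a[j], b[j] = np.dot(c, window), np.dot(d, window)
    return a, b

# Checks: energy conservation (orthogonality) and the discrete moment conditions of Eq. (14).
f = np.random.default_rng(1).normal(size=1024)
a, b = level1_transform(f, DAUB4_C, DAUB4_D)
assert np.isclose(np.sum(f**2), np.sum(a**2) + np.sum(b**2))
assert np.isclose(np.sum(DAUB4_D), 0) and np.isclose(np.sum(np.arange(4) * DAUB4_D), 0)
```

Iterating level1_transform on the trend vector a produces the higher-level transforms described above.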
Notice how this last statement for wavelet coefficients of discrete signals corresponds to a similar statement made previously for wavelet coefficients of analog signals. On the basis of this correspondence, the following basic properties for the discrete Daubechies wavelet transforms (assuming our signal model) can be stated:

Energy Conservation. The wavelet transform is an orthogonal transform.

Energy Compaction. There are large numbers of small-magnitude wavelet coefficients. Most of the energy is concentrated in the trend.

Two Populations. The larger wavelet coefficients are clustered around sharp transition regions (edges in images). Smaller wavelet coefficients reside in smoother regions.

Clustering. Large-magnitude wavelet coefficients tend to have some large-magnitude coefficients located near them.

Note that the clustering property was not discussed previously. It is not difficult to see that it holds, however. It follows from the large amount of overlap of the supports of the Daubechies wavelets. Consequently, when a wavelet coefficient $\beta_j^m$ is relatively large, there is a tendency for one of its neighbors $\beta_{j-1}^m$ or $\beta_{j+1}^m$ to be large as well.

E. Wavelet Analysis of Images

Now that wavelet analysis has been summarized for 1D signals, let us turn to the case of 2D images. Because the 2D theory is essentially the same as for 1D, only its new features are briefly summarized. Generalizing our signal model for 1D, the model for 2D images is continuous, piecewise smooth functions F(x, y). Discrete images consist of J × K matrices $F = (F_{j,k})$ obtained from samples from these functions.

Wavelet series for 2D images are a simple generalization from 1D series. The wavelet basis functions consist of three kinds: $\psi_j^m(x)\phi_k^m(y)$, $\phi_j^m(x)\psi_k^m(y)$, and $\psi_j^m(x)\psi_k^m(y)$. For these three kinds of basis functions, the zero-moment condition (3) generalizes to powers of both x and y. A 2D wavelet series for F(x, y) is expressible as

$$F(x,y) = \sum_{m=-\infty}^{\infty} \sum_{I\in\mathcal{I}} \beta_I^m\,\Psi_I^m(x,y) \qquad \text{with} \qquad \beta_I^m = \iint_{\mathbb{R}^2} F(x,y)\,\Psi_I^m(x,y)\,dx\,dy$$

where $\Psi_I^m(x,y)$ stands for any one of the three types of basis functions, and $\mathcal{I}$ is the index set needed for labeling all these basis functions. There is also a scaling series, using $\{\phi_j^M(x)\phi_k^M(y)\}$ as the basis, which is an obvious generalization of Eq. (4). The theory of wavelet series for 2D images is essentially the same
as the preceding 1D theory, with only slight notational modifications, so let us now turn to a discussion of 2D discrete wavelet transforms. Such transforms are applied to matrices, just as in 1D they are applied to vectors.

A discrete wavelet transform of a J × K matrix F, where J and K are both even, is obtained in two steps (Mallat, 1989): (1) transform each row of F by a 1D wavelet transform, obtaining a matrix $\tilde{F}$; (2) transform each column of $\tilde{F}$ by the same 1D transform. Steps 1 and 2 are independent and may be performed in either order. Step 1 of this transform process produces J rows of 1D transforms:

$$F \mapsto \begin{pmatrix} \mathbf{a}_1^1 \mid \mathbf{b}_1^1 \\ \mathbf{a}_2^1 \mid \mathbf{b}_2^1 \\ \vdots \\ \mathbf{a}_J^1 \mid \mathbf{b}_J^1 \end{pmatrix}$$

Step 2 then produces the following first-level transform:

$$F \mapsto \begin{pmatrix} A^1 & V^1 \\ H^1 & D^1 \end{pmatrix}$$
where A1, V1, H1, and D1 are each J/2 × K/2 submatrices. The trend A1 consists of scaling coefficients, whereas the fluctuations V1, H1, and D1 consist of wavelet coefficients for each of the three kinds of wavelet basis functions. The trend A1 contains scaling coefficients for the scaling basis $\{\phi_j^{M-1}(x)\phi_k^{M-1}(y)\}$. Hence, by a 2D version of Eq. (11), it is a lower resolution version of F. For example, let us consider the image of an octagonal ring shown in Figure 3a. Its first-level Daub4 wavelet transform is shown in Figure 3b. The trend A1 occupies the upper left quadrant of the transform, and it is clearly a lower resolution version of the original image. The vertical fluctuation V1 contains wavelet coefficients for the basis elements $\psi_j^{M-1}(x)\phi_k^{M-1}(y)$ (i.e.,
Figure 3. (a) Image. (b) First-level transform. (c) Second-level transform.
fluctuations along rows and trends along columns). Wherever there are vertical edges in an image, the fluctuations along rows are able to detect these edges. This tends to emphasize vertical edges, or edges containing a vertical component. This can be seen clearly in Figure 3b, where V1 appears in the upper right quadrant. Notice also that horizontal edges, where the octagonal ring image is constant over long stretches, are removed from V1. The horizontal fluctuation H1 is similar to V1, except that the roles of horizontal and vertical are reversed. In Figure 3b, the horizontal fluctuation H1 is shown in the lower left quadrant. Finally, there is the diagonal fluctuation D1, which contains wavelet coefficients for the basis elements $\psi_j^{M-1}(x)\psi_k^{M-1}(y)$. This fluctuation tends to emphasize diagonal features, as can clearly be seen in Figure 3b, where it occupies the lower right quadrant. Diagonal features are emphasized because fluctuations along rows and columns tend to erase horizontal and vertical edges.

It is interesting to note that this decomposition of the image into a lower resolution subimage, along with several subimages reflective of responses to different edge orientations, is analogous to operations performed by mammalian vision systems. Watson (1987) was perhaps the first to establish a close connection between wavelets and vision. A precursor to Watson's work in the field of image processing was Burt and Adelson (1983). Several papers by Field (1993, 1994, 1999) and by Field and Brady (1997) explored this connection more fully. A good summary of wavelets and vision can be found in Wandell (1995).

As with 1D transforms, higher levels of 2D wavelet transforms are computed by iterating the first-level transform on the trends. For example, Figure 3c shows a second-level transform of the octagonal ring image. The trend A1 from Figure 3b has been transformed into a trend A2 and fluctuations V2, H2, and D2. The trend A2 is an even lower resolution version of the original octagon image, whereas the fluctuations reflect edge details in the lower resolution image A1. An example of a third-level transform is shown in Figure 4.
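The two-step row/column procedure just described can be sketched directly on top of the 1D routine given earlier (level1_transform is the hypothetical helper from the sketch in Section III.D); again, this is only an illustration of the idea, not the author's implementation:

```python
import numpy as np

def level1_transform_2d(F, c, d):
    """First-level 2D transform: 1D-transform every row, then every column.
    Returns the J/2 x K/2 subimages A (trend) and V, H, D (fluctuations)."""
    F = np.asarray(F, dtype=float)

    def transform_rows(M):
        pairs = [level1_transform(row, c, d) for row in M]   # (trend, fluctuation) per row
        return np.hstack([np.stack([p[0] for p in pairs]),
                          np.stack([p[1] for p in pairs])])

    G = transform_rows(F)        # step 1: rows
    G = transform_rows(G.T).T    # step 2: columns
    J, K = F.shape
    A, V = G[:J // 2, :K // 2], G[:J // 2, K // 2:]
    H, D = G[J // 2:, :K // 2], G[J // 2:, K // 2:]
    return A, V, H, D
```

Iterating the same function on the returned trend A yields the second- and higher-level transforms of the kind shown in Figures 3c and 4b.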
Figure 4. (a) Boats image. (b) Third-level Daub 9/7 transform. (c) Locations of larger values of V1 coefficients.
The transform used in Figure 4 is based on the Daub 9/7 wavelet system (Cohen et al., 1992). This system is very popular in wavelet-based image processing. For instance, it is used in the Joint Photographic Experts Group (JPEG) 2000 image-compression standard (Gormish et al., 2000).

Figure 4 illustrates the four basic properties of wavelet transforms. For instance, the clustering property is apparent from Figures 4b and 4c. Figure 4c also provides a good illustration of the two populations property. Comparing it with the image in Figure 4a, we can see that larger wavelet coefficients are clustered around edges, such as the boat masts and rigging, and smaller coefficients reside in smoother regions, such as the sky and the clouds. The energy compaction property is illustrated by the fact that the third-level trend, which is shown in the upper left corner of Figure 4b, accounts for 98.7% of the total energy of the transform coefficients. Finally, the Daub 9/7 system provides good energy conservation. Although it is not an orthogonal transform, the Daub 9/7 transform preserves 99.7% of the energy of the original Boats image. This example illustrates how close it is to being orthogonal—close enough that we can use it for all intents and purposes as if it were orthogonal.

The Daub 9/7 transform has some important advantages for image processing. For example, its wavelet basis functions used for reconstruction are twice continuously differentiable and symmetric. One advantage of symmetry is that images can be symmetrically extended past their boundaries, which results in fewer boundary artifacts in denoised images than with periodic extension (see the preceding footnote). The Daub 9/7 system was used for the Taws-spin denoisings in Section II. The smoothness of its wavelet basis functions is reflected in the smooth appearance of the Taws-spin-denoised image in Figure 1d because that image corresponds to a wavelet series partial sum.

IV. Fundamentals of Wavelet-Based Denoising

In this section the fundamental concepts underlying wavelet-based denoising are examined. This background provides the right context for the description of Taws in the next section. Taws is based on improvements to two basic denoising techniques: wavelet shrinkage and cycle-spin thresholding.
A. Wavelet Shrinkage: VisuShrink

A major breakthrough in denoising was achieved with the now-classic methods of wavelet shrinkage (Donoho and Johnstone, 1994, 1995). Their methods are referred to as VisuShrink and SureShrink. Because Taws has some ideas in common with VisuShrink, this method is described first. The 1D theory for
VisuShrink captures the main ideas and allows concepts to be explained more easily, so the results for 1D signals are stated. The extension of these results to 2D images is straightforward. All the results stated next for VisuShrink are proved in Donoho and Johnstone (1994). See also Donoho (1993) and Donoho et al. (1995).

In this discussion, let us assume that the equation g = f + n describes the 1D noisy signal g obtained from the original signal f by the addition of the noise n. The original signal f consists of samples of an analog signal from our piecewise smooth signal model, and the noise values $n_j$ are assumed to be realizations of independent zero-mean Gaussian normal random variables with standard deviation σ; that is, they are i.i.d. of type $N(0, \sigma^2)$. Wavelet transforms of these signals are denoted by $\hat{\mathbf{g}}$, $\hat{\mathbf{f}}$, and $\hat{\mathbf{n}}$. Because wavelet transforms are linear, $\hat{\mathbf{g}} = \hat{\mathbf{f}} + \hat{\mathbf{n}}$. The orthogonality of wavelet transforms implies that the transformed noise values $\hat{n}_j$ are also i.i.d. of type $N(0, \sigma^2)$.

In VisuShrink, the values $\hat{g}_j$ of the transformed noisy signal are subjected to the following shrinkage function:

$$S(t) = \begin{cases} 0 & \text{if } |t| < \tau \\ t - \tau & \text{if } t \ge \tau \\ t + \tau & \text{if } t \le -\tau \end{cases} \qquad (15)$$

where the threshold τ is assigned the value $\tau_V = \sigma\sqrt{2\log J}$. After shrinkage, the inverse wavelet transform is applied to produce the denoised signal. For a J × K image, the VisuShrink threshold is $\tau_V = \sigma\sqrt{2\log M}$, where M is the larger of J and K. The current treatment is confined to square J × J images, so $\tau_V = \sigma\sqrt{2\log J}$.

To be more precise, the shrinkage function is applied to only the wavelet coefficients; the scaling coefficients are left unchanged. With an orthogonal transform of L levels, the noise energy in the trend $A^L$ is greatly reduced—as a fraction of total noise energy—when L is large enough. In an image, the noise energy reduction is by a factor of $1/4^L$; hence L ≥ 3 is usually sufficient to greatly reduce the fraction of noise energy in the trend. This is in stark contrast to the energy compaction property for the original image, which greatly magnifies the sizes of image transform values in the trend. Because the scaling functions corresponding to trend coefficients are widely spread over the image when L ≥ 3, it follows that relatively small-amplitude noise terms are much less visible in the reconstructed image. This is illustrated with an example later in this subsection.

In wavelet shrinkage, all wavelet coefficients $\hat{g}_j$ of magnitude less than some threshold τ are set to zero. This is called thresholding. The fact that the transformed noise values are i.i.d. of type $N(0, \sigma^2)$, combined with the energy compaction and two populations properties for the original signal transform $\hat{\mathbf{f}}$,
provides the rationale for this thresholding. If τ is several times larger than σ, as it is in VisuShrink, then $|\hat{g}_j| < \tau$ almost certainly guarantees that $\hat{g}_j$ is noise dominated. In fact, for the VisuShrink threshold $\tau_V$, the following limit holds for all noise wavelet coefficients:

$$\mathrm{Prob}\bigl[\max_j |\hat{n}_j| < \tau_V\bigr] \longrightarrow 1 \qquad \text{as } J \longrightarrow \infty \qquad (16)$$

Hence $|\hat{n}_j|$ is almost certainly less than $\tau_V$. Furthermore, the energy compaction and two populations properties imply that $|\hat{f}_j| \approx 0$ for most indices j. Therefore, asymptotically, the threshold $\tau_V$ guarantees that VisuShrink-denoised images will be essentially noise free.

To apply VisuShrink, we must estimate the standard deviation σ of the noise. Fortunately, it can be estimated from the highest-level wavelet coefficients by means of the following median estimate:

$$\sigma \approx \frac{\text{median of the magnitudes of the highest-level coefficients}}{0.6745} \qquad (17)$$
This approximation is reasonably accurate because it is derived from an exact formula for $N(0, \sigma^2)$ random variables and the highest-level wavelet coefficients are mostly noise. A median estimate is used, rather than, say, a mean estimate, because it is relatively insensitive to the existence of a few high-magnitude outliers. Typically there are such outliers, which result from the high-magnitude wavelet coefficients near the edges in the original image.

VisuShrink produces nearly optimal denoising results. This optimality can be expressed in terms of its achieving maximum SNRs. Equivalently, VisuShrink minimizes the risk error, where the risk $R[\tilde{\mathbf{f}}, \mathbf{f}]$ between two signals $\tilde{\mathbf{f}}$ and $\mathbf{f}$ is defined by the following expected value:

$$R[\tilde{\mathbf{f}}, \mathbf{f}] = E\left(\frac{1}{J}\sum_{j=0}^{J-1} |\tilde{f}_j - f_j|^2\right) \qquad (18)$$

If we let $\mathbf{f}_V$ stand for the VisuShrink denoising of the noisy signal, the risk $R[\mathbf{f}_V, \mathbf{f}]$, which compares $\mathbf{f}_V$ with the original signal f, satisfies (for a sufficiently smooth wavelet system)

$$R[\mathbf{f}_V, \mathbf{f}] \sim C\,\frac{(\log J)^2}{J}\,\sigma \qquad (19)$$

where C is a constant (dependent only on the choice of wavelet system). This asymptotic result is significantly better than that for a Wiener filtering $\mathbf{f}_W$ of the noisy image. For Wiener filtering, the following holds (see Mallat, 1999):

$$R[\mathbf{f}_W, \mathbf{f}] \sim \frac{A}{\sqrt{J}}\,\sigma \qquad (20)$$
for some constant A. Comparing Eqs. (19) and (20), we see that VisuShrink is asymptotically superior to Wiener filtering as J → ∞. Because VisuShrink is a nonlinear method, this superiority for VisuShrink does not contradict the optimality of Wiener filtering among linear methods. The near optimality of VisuShrink is expressed by the following asymptotic result:

$$R[\mathbf{f}_V, \mathbf{f}] \sim C(\log J)^2\,R[\mathbf{f}_I, \mathbf{f}] \qquad (21)$$
where R[fI, f] is the risk of an ideal denoising fI of a piecewise polynomial signal. Other optimality results, from the use of different signal models and alternative measures of risk error, are described in detail in Donoho et al. (1995). Although the results just described provide ample evidence for the near optimality of VisuShrink, it is important to examine its performance on denoising actual images. Figure 5 shows four VisuShrink denoisings of the noisy
Figure 5. VisuShrink denoisings of the noisy image in Figure 1b using four different levels of the Daub 9/7 transform. (a) Second level, SNR = 19.6 db. (b) Third level, SNR = 19.1 db. (c) Fourth level, SNR = 17.9 db. (d) Fifth level, SNR = 17.3 db.
Peppers image shown in Figure 1b. The wavelet system used for these denoisings was the Daub 9/7 system. The number of levels used for the wavelet transform ranged from two to five. The second- and third-level denoisings both have greater SNRs than that of the Wiener filtering shown in Figure 1c. The second-level VisuShrink denoising has a 1-db-higher SNR than that of the Wiener filtering. By carefully examining these four typical denoisings, we can learn a lot about VisuShrink, both its strengths and its weaknesses.

First, observe that although the second-level denoising has the highest SNR, it also exhibits a significant amount of “mottling” artifacts. This mottling is due to the noise left unaffected in the second-level trend. As we noted previously, the reconstruction of this second-level trend noise consists of low-amplitude scaling functions spread over wide areas within the image. Because a noise standard deviation of σ = 32 produces relatively high-energy noise (hence a relatively low SNR), the reduction of noise energy by a factor of 1/16 in the second-level trend does not reduce the amplitudes of the reconstructed noise terms sufficiently to render them invisible. However, in the fourth- and fifth-level denoisings, the reductions of trend noise energy by factors of 1/256 and 1/1024, respectively, are sufficient to render the reconstructed trend noise invisible. The third-level denoising exhibits trend noise that is just barely visible.

The most striking weakness of the VisuShrink denoisings is the oversmoothing of the reconstructed images. This oversmoothing is particularly extreme in Figures 5c and 5d, where it is so great that it causes a significant reduction in SNRs. Oversmoothing is caused by the VisuShrink thresholds being set too high to capture the higher-level wavelet coefficients needed for producing a sharp image. Because successive trends of f are repeated averagings, there is a blurring of edges in successive trends, which produces a decrease in amplitudes of significant wavelet coefficients near these blurred edges. Consequently, the single threshold $\tau_V$ set by VisuShrink tends to threshold out some wavelet coefficients at higher levels that are needed for producing sharp edges in the reconstructed image. This is clearly shown in the images in Figure 5, which become progressively more blurred as higher-level transforms are used.

It is interesting to compare the Taws-spin denoising in Figure 1d with these VisuShrink denoisings. The Taws-spin denoising is significantly less blurred and does not exhibit any mottling. A fifth-level Daub 9/7 transform was used for this Taws-spin denoising, but the threshold was equal to $\tau_V/8$. Using a much lower threshold than that of VisuShrink allows Taws-spin to circumvent the oversmoothing problem. Thus Taws-spin is able to employ a fifth-level transform, which completely eliminates any mottling in its denoising. Finally, like the VisuShrink denoisings, the Taws-spin denoising appears to be essentially free of any random noise: the image appears to be produced from our piecewise smooth image model.
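As a summary of the recipe in Eqs. (15)–(17), here is a compact sketch of a VisuShrink-style denoiser. It assumes the PyWavelets package (pywt), and its 'bior4.4' filters are used here only as a stand-in for the Daub 9/7 system; this is an illustrative assumption, not the implementation used to produce Figure 5.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def visushrink(noisy, wavelet="bior4.4", level=3):
    """Soft-threshold all wavelet (detail) coefficients with the universal threshold
    tau_V = sigma * sqrt(2 log J); the trend (scaling) coefficients are left unchanged."""
    J = max(noisy.shape)
    coeffs = pywt.wavedec2(np.asarray(noisy, dtype=float), wavelet, level=level)
    # Eq. (17): median-based noise estimate from the finest-scale detail coefficients.
    finest = np.concatenate([d.ravel() for d in coeffs[-1]])
    sigma = np.median(np.abs(finest)) / 0.6745
    tau = sigma * np.sqrt(2.0 * np.log(J))
    denoised = [coeffs[0]]                      # scaling (trend) coefficients untouched
    for details in coeffs[1:]:
        denoised.append(tuple(pywt.threshold(d, tau, mode="soft") for d in details))
    return pywt.waverec2(denoised, wavelet)
```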
The defects in VisuShrink denoisings, as well as the “empirical gap” present in the size of the $(\log J)^2$ factor in Eq. (21), have led to continuing efforts to improve the performance of wavelet-based denoising. Major improvements over VisuShrink were obtained with the SureShrink method and cycle-spin thresholding.
B. Wavelet Shrinkage: SureShrink

The SureShrink method of Donoho and Johnstone (1995) is also based on wavelet shrinkage. However, SureShrink uses independently chosen thresholds for each fluctuation at each level of the wavelet transform. This typically results in different shrinkage thresholds for each of these fluctuations. By using different shrinkage thresholds for each fluctuation, SureShrink is able to include more significant wavelet coefficients, which thus alleviates the blurring problem of VisuShrink and produces more-detailed images. For example, Figure 6 shows two SureShrink denoisings of the noisy Peppers image in Figure 1b. These images show the typical kinds of improvements obtained with SureShrink. The denoised images are considerably sharper than the corresponding VisuShrink denoisings with much higher SNRs. In particular, notice that the fourth-level SureShrink denoising exhibits essentially no loss of detail and is slightly better perceptually than the third-level denoising.

To select the SureShrink thresholds, Donoho and Johnstone (1995) made use of an estimator of risk developed by Stein (1981). This risk estimator ensures that a threshold can be chosen for each fluctuation which, on average, minimizes the risk among all shrinkage thresholds. By minimizing these risks,
Figure 6. SureShrink denoisings of the noisy image in Figure 1b, using two levels of the Daub 9/7 transform: (a) third level, SNR = 20.7 db; (b) fourth level, SNR = 20.7 db.
SureShrink produces a significantly lower total risk (and thus a higher SNR) than that of VisuShrink. For instance, in Johnstone (1999) it is shown that the following holds for the risk error of a SureShrink denoising fS of a noisy signal:
$$R[\,f_S, f\,] \sim C(\log J)\, R[\,f_I, f\,] \qquad (22)$$
This asymptotic result compares favorably with Eq. (21); the (log J)^2 factor has been replaced by just log J. As will be seen in Section V, the Taws method builds on the SureShrink method in that various thresholds are used for different levels and fluctuations of the wavelet transform. However, unlike SureShrink, Taws uses different thresholds for single wavelet coefficients and correlates the choice of threshold with the location of edges within the image. This helps Taws to better capture edge details and thus produce better-resolved denoisings than those of SureShrink.
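To make the level-by-level threshold selection concrete, the following minimal Python sketch picks a soft threshold for a single fluctuation subband by minimizing Stein's unbiased risk estimate; it assumes the noise standard deviation σ is known, and it is only an illustration of the idea, not the exact SureShrink implementation (which, for example, falls back to a universal threshold on very sparse subbands).

```python
import numpy as np

def sure_threshold(coeffs, sigma):
    """Choose a soft threshold for one fluctuation subband by minimizing
    Stein's unbiased risk estimate (SURE):
      SURE(t) = N*sigma^2 - 2*sigma^2 * #{|y_i| <= t} + sum_i min(|y_i|, t)^2."""
    y = np.sort(np.abs(np.asarray(coeffs).ravel()))    # candidate thresholds
    n = y.size
    counts = np.arange(1, n + 1)                       # #{|y_i| <= t} for t = y[k]
    cum_sq = np.cumsum(y ** 2)                         # sum of squares at or below t
    penalty = cum_sq + (n - counts) * y ** 2           # sum_i min(|y_i|, t)^2
    risk = n * sigma ** 2 - 2 * sigma ** 2 * counts + penalty
    return y[np.argmin(risk)]

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(900), rng.normal(0.0, 50.0, 100)])  # a sparse subband
noisy = clean + rng.normal(0.0, 8.0, clean.size)
t = sure_threshold(noisy, sigma=8.0)
denoised = soft_threshold(noisy, t)
print("SURE threshold:", round(float(t), 2))
```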
C. Cycle-Spin Thresholding

Besides SureShrink, another significant improvement over VisuShrink is the method of cycle-spin thresholding. Cycle-spin thresholding was first described in Coifman and Donoho (1995) and in Lang et al. (1995). Although it has not been established on as firm a theoretical foundation as VisuShrink (see, however, Chambolle and Lucier, 2001, for some initial work), its basic features are fairly well understood and, in practice, it produces far superior denoisings.
Cycle-spin thresholding addresses two related shortcomings of VisuShrink. One shortcoming is the lack of shift-invariance of wavelet transforms. From Eq. (2), it is easy to see that the wavelet transform $f(x) \mapsto \{\beta_j^m\}$ is not shift invariant. This lack of shift-invariance carries over to the discrete wavelet transform as well. For example, Figure 7 shows V1 wavelet coefficients for Daub 9/7 wavelet transforms of the Boats image in Figure 4a and a shifted version of this image. At the threshold of τ = 32, these wavelet transforms are not simply shifts of each other (as would be the case with a shift-invariant transform). Notice, in particular, that there are many more significant coefficients near the central mast of the boat in Figure 7b. The goal of cycle-spin thresholding is to include such new significant coefficients in shifted-image transforms, which thus produces a sharper denoised image that includes more edge details. By including these new significant coefficients, cycle-spin thresholding is able to ameliorate the other shortcoming of VisuShrink: its oversmoothing.
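The lack of shift invariance is easy to reproduce numerically. The short sketch below assumes the PyWavelets package (its 'bior4.4' filters are the usual stand-in for the Daub 9/7 system) and an arbitrary step signal; it compares the significant first-level coefficients of the signal and of its one-sample cyclic shift.

```python
import numpy as np
import pywt  # PyWavelets; the 'bior4.4' filters stand in for the Daub 9/7 system

x = np.zeros(256)
x[128:] = 100.0                      # a step edge: a piecewise smooth test signal

def significant_level1_indices(signal, tau=20.0):
    # cD1 is the first-level fluctuation (detail) of the wavelet transform.
    cD1 = pywt.wavedec(signal, "bior4.4", level=1)[1]
    return np.flatnonzero(np.abs(cD1) >= tau)

print("original :", significant_level1_indices(x))
print("shifted  :", significant_level1_indices(np.roll(x, 1)))
# In general the two index sets are not simply shifts of each other,
# which is the lack of shift invariance discussed above.
```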
Figure 7. Comparison of V1 wavelet coefficients for the Boats image in Figure 4a and a shift of the Boats image. The images show locations of wavelet coefficients with magnitudes at least as great as the threshold τ = 32. (a) Original. (b) One-pixel shift to the right.
Cycle-spin thresholding achieves shift invariance by averaging all distinct cycle shifts of the noisy image. More precisely, every cycle-shift $(G_{j-j_0,\,k-k_0})$, for $j_0 = 0, \pm 1, \ldots, \pm(J/2 - 1)$ and $k_0 = 0, \pm 1, \ldots, \pm(K/2 - 1)$, is denoised and all these denoisings are averaged (after reverse shifting) by means of a simple arithmetic mean.∗ In practice, it has been found that—rather than using shrinkage for denoising—it is better to use simple thresholding because it yields higher SNRs and perceptually sharper denoisings. “Simple thresholding” means that each transform’s values are subjected to the thresholding function

$$T(t) = \begin{cases} 0 & \text{if } |t| < \tau \\ t & \text{if } |t| \geq \tau \end{cases}$$

Thus, the coefficient is retained if its magnitude is at least as large as τ; otherwise the coefficient is set to zero. Generally, the VisuShrink threshold τ V is used. Although carrying out so many denoisings to produce the average denoised image may appear to be inordinately time consuming, it has been shown that the whole process can be performed in O(P log P) operations, where P is the number of pixels in the noisy image. Because all distinct cycle shifts are used, the operation of cycle-spin thresholding is shift invariant.
As an example of how well cycle-spin thresholding performs, let us consider the two images shown in Figure 8. These are cycle-spin thresholdings of the

∗ The term cycle shift is used because when $j - j_0$ or $k - k_0$ fall outside the range of indices for the rows or columns of the image matrix, a wraparound is used to define $G_{j-j_0,\,k-k_0}$.
Figure 8. Cycle-spin thresholdings of the noisy image in Figure 1b, using two levels of the Daub 9/7 transform: (a) fourth level, SNR = 21.2 db; (b) fifth level, SNR = 21.2 db.
noisy image in Figure 1b. These images show the typical kinds of improvements that cycle-spin thresholding provides over VisuShrink. The SNR in each case is significantly higher than the SNR for the same-level VisuShrink denoising. There is also much less degradation of quality and less oversmoothing. Consequently it is feasible to use fourth- and fifth-level transforms with cycle-spin thresholding. The Taws method of denoising also benefits from cycle-spin averaging. The cycle-spin version of Taws, called TAWS-SPIN, was used for the denoisings described in Section II. Taws-spin is described in the next section, where the theory and the application of tree-adapted wavelet shrinkage are explained. V. Tree-Adapted Wavelet Shrinkage In this section the theory of Taws and the details of its implementation are described. Taws is designed to exploit, in a computationally simple way, the four basic properties of wavelet transforms that were discussed in Section III. These four properties are elaborated in this section, and it is shown how they lead to the Taws approach to separating image-dominated wavelet coefficients from noise-dominated coefficients. Taws performs this separation by means of three selection principles. These selection principles enable Taws to select image-dominated wavelet coefficients at much lower thresholds than the VisuShrink threshold, which thus supplies more details, especially near edges. Capturing more image details gives Taws denoisings a particularly sharp, focused appearance. After a theoretical justification for Taws is provided, a detailed description of its algorithm is given. The Taws algorithm is a modification of an
image-compression algorithm called ASWDR (adaptively scanned wavelet difference reduction) (Walker, 2000). Improvements to Aswdr were made by Walker and Nguyen (2000a). These improvements led to corresponding improvements in Taws, described in Walker (2001). This newer version of Taws is discussed in this section instead of the older version introduced in Walker and Chen (2000). Software which executes the Taws algorithm (including source code) can be found at the web site listed in Section II. The close relationship between Taws and the image-compression algorithm Aswdr has allowed for the development of a combined image compressor plus denoiser called TAWS-COMP. Because combining image compression with denoising is tangential to the main subject, it is only briefly examined in this article. More details concerning Taws-Comp can be found in Walker (2001) and Walker (2002).
A. The Theory of TAWS: Tree Structure of Wavelet Transforms

The Taws method of removing random noise from images is based on the four properties of wavelet transforms—energy conservation, energy compaction, two populations, and clustering—which were discussed in Section III. Section IV covered how energy compaction implies that noise can be effectively removed from trend coefficients. How VisuShrink essentially removes all noise-dominated wavelet coefficients by rejecting all coefficients having a magnitude less than the threshold τ V was also discussed. As a way to improve VisuShrink denoisings, the Taws method makes use of the two populations and clustering properties to distinguish image-dominated from noise-dominated wavelet coefficients at thresholds that are smaller than τ V. Taws uses three selection principles to perform this identification of image-dominated wavelet coefficients. Before these selection principles can be stated, the concepts of child coefficients (children) and parent coefficients (parents) must be discussed.
For 1D signals, the wavelet coefficient $\beta_j^m$, corresponding to $\psi_j^m(x) = 2^{-m/2}\psi(2^m x - j)$, has two child coefficients $\beta_{2j}^{m+1}$ and $\beta_{2j+1}^{m+1}$. These children correspond to the wavelets $\psi_{2j}^{m+1}(x)$ and $\psi_{2j+1}^{m+1}(x)$. The dilation and translation structure of a wavelet system implies that the support of $\psi_j^m$ contains the supports of both $\psi_{2j}^{m+1}$ and $\psi_{2j+1}^{m+1}$. In other words, the children $\beta_{2j}^{m+1}$ and $\beta_{2j+1}^{m+1}$ encode information about the signal within the same spatial region as the parent $\beta_j^m$. For example, if the signal is smooth over the support of $\psi_j^m$, then the parent and its children will all have small magnitudes (this concept is made more precise subsequently). Applying the parent and child definitions to all levels allows us to obtain a binary tree structure connecting parents and children.
The generalization of the notions of parents and children to 2D images is simple. For instance, a vertical coefficient $V_{j,k}^m$ is the parent of four children $V_{2j,2k}^{m+1}$, $V_{2j+1,2k}^{m+1}$, $V_{2j,2k+1}^{m+1}$, and $V_{2j+1,2k+1}^{m+1}$. Similar definitions apply to the horizontal and diagonal coefficients. Thus we obtain a four-leaved tree connecting parents with children. As for the 1D case, the supports of the 2D wavelets corresponding to children are contained within the supports of the 2D wavelets corresponding to their parents. Hence, if an image is smooth over the support of a wavelet corresponding to a parent, then it will be smooth over the supports of all the wavelets corresponding to its children. Consequently, over smooth regions, away from edges, both parents and children will have small magnitude.
Wavelet coefficients of small magnitude are less significant, contributing less energy to the image, than wavelet coefficients of larger magnitude. To make this notion of significance precise—and to relate it to the multiresolution, parent–child tree structure of image transforms—let us use the following definitions of insignificant and significant coefficients. Given a threshold T > 0, a parent is insignificant if its magnitude is less than T, and a child is insignificant if its magnitude is less than the half-threshold T/2. Likewise, a parent is significant when its magnitude is greater than or equal to T, and a child is significant when its magnitude is greater than or equal to T/2. It is important to note that insignificance and significance are relative notions which depend on the size of the given threshold T. Furthermore, the half-threshold T/2 is used for testing for significance of children. We will see later that with piecewise smooth images it is correct to use this half-threshold for children. Moreover, we shall see that using these different thresholds for parents and children closely corresponds to the tree-based coding structure of the Aswdr image-compression algorithm. By using the coding structure of this algorithm, we can determine the significance of parents and children in a logically consistent manner.
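A few lines of Python make this quadtree indexing concrete; the only assumption is the (j, k) indexing of subbands used above.

```python
def children(j, k):
    """Level-(m+1) indices of the four children of the level-m coefficient
    at position (j, k), following the quadtree indexing described above."""
    return [(2 * j, 2 * k), (2 * j + 1, 2 * k),
            (2 * j, 2 * k + 1), (2 * j + 1, 2 * k + 1)]

def parent(j, k):
    """Level-m index of the parent of a level-(m+1) coefficient at (j, k)."""
    return (j // 2, k // 2)

# The coefficient at (5, 3) has parent (2, 1) and children
# (10, 6), (11, 6), (10, 7), (11, 7).
print(parent(5, 3), children(5, 3))
```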
B. The Theory of TAWS: Selection Principles

All the background needed for stating the Taws selection principles has been provided. For wavelet coefficients having magnitudes below the VisuShrink threshold, Taws uses the following three principles to distinguish image-dominated coefficients from noise-dominated coefficients:
1. Accept only significant children with significant parents.
2. Reject a significant parent if all its children are insignificant.
3. Accept only significant coefficients with at least one adjacent significant coefficient.
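A minimal numpy sketch of how these principles can be applied to the magnitude arrays of one parent subband and its child subband is given below; it assumes the child subband is exactly twice the size of the parent subband in each direction, as in the quadtree indexing above, and it is meant only to show the bookkeeping, not the full Taws algorithm.

```python
import numpy as np

def apply_principles_1_and_2(parent_mag, child_mag, T):
    """parent_mag: |coefficients| at level m, shape (J, K);
    child_mag:  |coefficients| at level m+1, shape (2J, 2K)."""
    sig_parent = parent_mag >= T               # significant parents
    sig_child = child_mag >= T / 2             # significant children (half-threshold)
    up = np.ones((2, 2), dtype=np.uint8)
    parent_up = np.kron(sig_parent.astype(np.uint8), up).astype(bool)
    # Principle 1: accept only significant children whose parent is significant.
    accepted_children = sig_child & parent_up
    # Principle 2: reject a significant parent if all of its children are insignificant.
    J, K = sig_parent.shape
    has_sig_child = sig_child.reshape(J, 2, K, 2).any(axis=(1, 3))
    accepted_parents = sig_parent & has_sig_child
    return accepted_parents, accepted_children

def apply_principle_3(mag, T):
    """Keep significant coefficients that have at least one significant
    4-neighbor (np.roll wraps around at the borders, which is acceptable
    for a sketch)."""
    sig = mag >= T
    neighbor = (np.roll(sig, 1, 0) | np.roll(sig, -1, 0) |
                np.roll(sig, 1, 1) | np.roll(sig, -1, 1))
    return sig & neighbor
```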
Figure 9. Relation between parent and child coefficients (V2 and V1 coefficients). (a) Peppers image. (b) Locations (white pixels) of significant parents, threshold = 24. (c) Locations (white pixels) of significant children, threshold = 12.
A good illustration of these selection principles can be seen in Figure 9. In Figure 9b and 9c we can see the locations of significant parents and children, with threshold T = 24, for a Daub 9/7 transform of the Peppers test image. The gray pixels in both images indicate insignificant coefficients, and the white pixels indicate significant coefficients. The similarity of the regions made up of gray pixels in the two images indicates that insignificant parents tend to have insignificant children, which implies selection principle 1. Likewise, the similarity of the regions made up of white pixels indicates that significant parents tend to have some significant children, which implies selection principle 2. Finally, the clustering of significant coefficients (in either image) illustrates the validity of selection principle 3.
Let us now see how these selection principles are derived from basic ideas of wavelet analysis. Selection principle 1 follows from the fact that in most instances the children of an insignificant parent are all insignificant. More precisely, within a smooth region of an image, the implication

$$|\text{parent}| < T \;\Longrightarrow\; |\text{child}| < T/2 \qquad (23)$$

holds for thresholds T that are not too small. To see why Eq. (23) holds, let us consider the 1D case. Suppose that the wavelet ψ has K ≥ 1 zero moments and that the 1D signal f(x) has an n-term Taylor expansion (for 1 ≤ n ≤ K) about the point $x_j = j2^{-m}$. That is,

$$f(x) = \sum_{k=0}^{n-1} \frac{1}{k!} f^{(k)}(x_j)\,(x - x_j)^k + \frac{1}{n!} f^{(n)}(t_x)\,(x - x_j)^n \qquad (24)$$

where $t_x$ lies between x and $x_j$. Equation (24) is consistent with our piecewise smooth signal model, provided x lies within a smooth region of the image.
It is also consistent with our model to assume that $|f^{(n)}(x)|$ is bounded by a constant B, for all x within a smooth region and lying a fixed distance from the transition points between different smooth regions. We may also suppose that the wavelet ψ is supported within the finite interval [−a, a].∗ The wavelet coefficient $\beta_j^m$ is then bounded as follows (using zero moments and the bound on $|f^{(n)}(x)|$):

$$\left|\beta_j^m\right| = \left|\int_{-\infty}^{\infty} f(x)\,\psi_j^m(x)\,dx\right|
= \left|\int_{(j-a)/2^m}^{(j+a)/2^m} \frac{1}{n!}\, f^{(n)}(t_x)\,(x - x_j)^n\,\psi_j^m(x)\,dx\right|
\le \frac{B}{n!}\int_{(j-a)/2^m}^{(j+a)/2^m} |x - x_j|^n \left|\psi_j^m(x)\right| dx$$

Applying the Schwarz inequality to the last integral yields

$$\left|\beta_j^m\right| \le \frac{B}{n!\,\sqrt{n+\tfrac{1}{2}}}\left(\frac{a}{2^m}\right)^{n+1/2} \qquad (25)$$

Inequality (25) is the key to proving Eq. (23). To use Eq. (25) to prove Eq. (23), let us first note that the right side of Eq. (25) decreases toward zero at an exponential rate, as n increases and as m increases. Therefore, if the threshold T is not too small, we may assume that

$$\frac{B}{n!\,\sqrt{n+\tfrac{1}{2}}}\left(\frac{a}{2^m}\right)^{n+1/2} < \frac{T}{2} \qquad (26)$$

Inequality (26) holds under a range of conditions: (1) if T is moderately large, (2) if n is sufficiently large (sufficient smoothness of the signal), and (3) if m is sufficiently large (coefficients in lower levels). From Eqs. (25) and (26) we obtain

$$\left|\beta_j^m\right| < T, \qquad \left|\beta_{2j}^{m+1}\right| < T/2, \qquad \left|\beta_{2j+1}^{m+1}\right| < T/2 \qquad (27)$$
Because some mixture of conditions 1–3 generally holds for large numbers of wavelet coefficients—provided they are isolated from sharp transition regions between different smoothness regions—it follows that Eq. (23) is valid in a statistical sense (valid for most cases, provided sharp transition regions are avoided and the threshold T is not too small). The general validity of Eq. (23) establishes selection principle 1. ∗ A symmetric interval about zero is needed for Daub 9/7 wavelets, but this assumption is also valid for all the Daubechies wavelets and all other compactly supported wavelets.
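A quick numeric check of the bound in Eq. (25) shows how conditions 1–3 come into play. The values of B, a, and T below are arbitrary illustrative choices, not values taken from the text; the point is simply that the bound drops below T/2 as n or m grows.

```python
import math

def coefficient_bound(B, a, n, m):
    """Right side of Eq. (25): B / (n! * sqrt(n + 1/2)) * (a / 2**m)**(n + 1/2)."""
    return B / (math.factorial(n) * math.sqrt(n + 0.5)) * (a / 2 ** m) ** (n + 0.5)

B, a, T = 100.0, 4.0, 32.0          # illustrative (assumed) values only
for n in (1, 2, 3):
    for m in (2, 4, 6):
        bound = coefficient_bound(B, a, n, m)
        print(f"n={n}, m={m}: bound = {bound:9.4f}   < T/2 = {T/2}? {bound < T / 2}")
```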
The general validity of selection principle 1 has now been established for 1D piecewise smooth signals. Similar reasoning applies to the 2D case of piecewise smooth images. So long as the supports of the wavelets corresponding to parent coefficients lie within a smoothness region (away from sharp transitions near edges) and the threshold T is not too small, then Eq. (23) is valid; hence selection principle 1 is also valid. Notice that the argument for selection principle 1 was based on smoothness of the image and does not apply to random noise.
Demonstration of selection principle 1 established the validity of the implication in Eq. (23). In other words, an insignificant parent has all insignificant children. Applying this last statement to all descendants of an insignificant parent, we obtain a zero tree of insignificant coefficients. These zero trees have been used extensively in image-compression algorithms, such as the Embedded Zerotree Wavelet (EZW) algorithm of Shapiro (1993) and the Set Partitioning in Hierarchical Trees (SPIHT) algorithm of Said and Pearlman (1996b).∗
Selection principle 2 can be verified by using statistics assembled from natural images. In the Taws algorithm described in Section V.D, selection principle 2 is used as a criterion for choosing newly emergent significant children when the threshold T is decreased to a half-threshold T/2. If a parent coefficient was significant when the threshold is T, then we define the conditional probability P(new | old) by

$$P(\text{new} \mid \text{old}) = \text{Prob}(\text{new significant child} \mid \text{old significant parent}) \qquad (28)$$
Thus P(new | old) is the probability that a new significant child, whose magnitude satisfies T > |child| ≥ T/2, has an old significant parent, whose magnitude satisfies |parent| ≥ T. Table 2 gives the fraction of new significant children having old significant parents at several Daub 9/7 transform levels for four test images and for random noise. The values for the random noise were obtained by averaging five realizations of Gaussian random noise with mean 0 and standard deviation 28. (Although only five realizations were used, the results in Table 2 for the noise were stable, showing deviations of no more than ±0.01 for all values.) The data in this table clearly show that the probability in Eq. (28) is much greater, for moderately high threshold values, for the test images than for the random noise. In other words, for the random noise only, it is highly unlikely that a newly significant child will have an old significant parent when the threshold is greater than the standard deviation for all the children. This provides a statistical validation of selection principle 2 as a criterion for choosing newly emergent significant children when the threshold T is decreased to a half-threshold T/2.
∗
See also Walker and Nguyen (2000b) for a description of both algorithms.
TABLE 2
Fraction of New Significant Children of Old Significant Parents^a

Parent level / Threshold:     512      256      128       64       32       16        8
Lena, 4th, σ = 37^b         (0.30)    0.37     0.46     0.57     0.66     0.68     0.68
Goldhill, 4th, σ = 34       (0.00)    0.31     0.42     0.48     0.59     0.69     0.68
Boats, 4th, σ = 47          (0.13)    0.33     0.45     0.53     0.61     0.69     0.74
Barbara, 4th, σ = 38        (0.06)    0.43     0.54     0.60     0.63     0.68     0.77
Noise, 4th, σ = 25             —        —      0.00     0.02     0.25     0.56     0.77
Lena, 3rd, σ = 15              —     (0.07)    0.31     0.50     0.56     0.55     0.50
Goldhill, 3rd, σ = 15          —     (0.04)    0.24     0.37     0.45     0.54     0.65
Boats, 3rd, σ = 19             —     (0.14)    0.40     0.47     0.54     0.65     0.66
Barbara, 3rd, σ = 22           —      0.01     0.09     0.26     0.38     0.51     0.60
Noise, 3rd, σ = 26             —     (0.00)    0.00     0.03     0.26     0.58     0.78
Lena, 2nd, σ = 19              —        —     (0.95)    0.51     0.54     0.49     0.38
Goldhill, 2nd, σ = 6           —        —     (0.07)    0.32     0.35     0.40     0.46
Boats, 2nd, σ = 6              —        —     (0.41)    0.43     0.50     0.57     0.55
Barbara, 2nd, σ = 12           —        —      0.03     0.21     0.37     0.51     0.51
Noise, 2nd, σ = 24             —        —      0.00     0.03     0.29     0.59     0.79

^a From Walker, J. S., and Chen, Y.-J. (2000). Image denoising using tree-based wavelet subband correlations and shrinkage. Opt. Eng. 39, 2904. Reprinted with the permission of the International Society for Optical Engineering (SPIE) and Y.-J. Chen.
^b The standard deviations σ are for the child coefficients of each level. The noise values were obtained from averaging values for wavelet transforms of five realizations of Gaussian random noise with mean 0 and standard deviation 28. A fraction in parentheses indicates that it may be an unreliable estimate of the conditional probability P(new | old) in Eq. (28) because the number of new significant values was too small (<200).
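The fractions in Table 2 are easy to estimate for any image. The sketch below computes the statistic for one parent/child subband pair of coefficient magnitudes (for example, V2 and V1 coefficients of a Daub 9/7 transform); the array shapes and the 2×2 block expansion are the same assumptions as in the earlier selection-principle sketch.

```python
import numpy as np

def fraction_new_with_old_parent(parent_mag, child_mag, T):
    """Fraction of new significant children (T/2 <= |child| < T) whose
    parent was already significant (|parent| >= T), as in Table 2."""
    new_child = (child_mag >= T / 2) & (child_mag < T)
    old_parent_up = np.kron((parent_mag >= T).astype(np.uint8),
                            np.ones((2, 2), dtype=np.uint8)).astype(bool)
    n_new = new_child.sum()
    if n_new < 200:                        # too few values for a reliable estimate
        return float("nan")
    return float((new_child & old_parent_up).sum()) / n_new
```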
Selection principle 2 is a consequence of using wavelets that are continuous and have compact support. Consequently, sharp transitions in image values near edges produce relatively higher values for the inner products of the image with wavelet basis functions supported in regions overlapping these edges. Thus relatively higher transform values occur near edges. Detailed theoretical verifications of this fact are described in Mallat (1999), Mallat and Hwang (1992), and Wang (1995). Furthermore, because the support of the wavelet corresponding to a parent contains the supports of the wavelets corresponding to its children, it follows that a significant value of a parent occurring near an edge will, at least statistically, imply that some of its children will also be significant. Table 2 provides supporting data for this last assertion. Finally, a brief discussion of the validity of selection principle 3 is given, which relies on statistical data compiled by other researchers. Selection principle 3 follows from the large amount of overlap of supports of adjacent wavelet
basis functions. Because of this overlap, if a wavelet coefficient is large near an edge, then there is a high probability that adjacent wavelet coefficients will also be large. Huang and Mumford (1999) have assembled statistics, for a large number of test images, which show a high correlation between magnitudes of adjacent coefficients. They have also developed theoretical models explaining this correlation. Liu and Moulin (2001) have analyzed the degree of mutual information contained in neighboring coefficients versus parent–child coefficients. They found that neighboring coefficients are, in general, slightly more closely correlated than parent–child coefficients are. Buccigrossi and Simoncelli (1999) have reported similar findings. Because correlation between parent–child coefficients is expressed by the validity of selection principles 1 and 2, it follows that selection principle 3 must be valid as well.

C. The ASWDR Algorithm

The Taws algorithm combines the three Taws selection principles with the Aswdr image-compression algorithm. Hence, before Taws is described, Aswdr must first be summarized.
The Aswdr image-compression algorithm consists of five parts, as shown in Figure 10. In the initialization part of Aswdr, a wavelet transform of the
Figure 10. Block diagram for Aswdr and Taws algorithms.
image is computed. An initial threshold value T0 is chosen so that all transform values have magnitudes that are less than 2T0 and at least one has magnitude greater than or equal to T0. The purpose of the loop indicated in Figure 10 is to encode significant transform values by the method of bit-plane encoding. A binary expansion, relative to the quantity 2T0, is computed for each transform value. The loop constitutes the procedure by which these binary expansions are calculated. As the threshold is successively reduced by half, the parts labeled significance pass and refinement pass compute the next bit in the binary expansions of the transform values.∗ It will be seen that replacing the threshold T by its half-value T/2, along with the looping through the significance pass, results in a logically consistent method for testing the significance of parents and children.
Each part of the Aswdr algorithm is next described in more detail. The initialization part, as described previously, involves wavelet transforming the image and choosing an initial threshold T = T0. One other task in initialization is assigning a scan order. For an image with P pixels, a scan order is a one-to-one and onto mapping, $\hat{g}_{i,j} = x_k$, for k = 1, . . . , P, between the transform values $(\hat{g}_{i,j})$ and a linear ordering (xk). This initial scan order is a zigzag from higher to lower levels (Shapiro, 1993), with row-based scanning through the horizontal coefficients, column-based scanning through the vertical coefficients, and zigzag scanning through the diagonal coefficients.
The next part of the algorithm is the significance pass. In this part, new significant transform values xm satisfying T ≤ |xm| < 2T are identified. Their index values m are encoded by using the difference reduction method of Tian and Wells (1996). The difference reduction method essentially consists of a binary encoding of the number of steps to go from the index of the last significant value to the index of the present significant value. More details can be found in Tian and Wells (1998) or Walker and Nguyen (2000b). The quantized value qm = T sgn(xm) is assigned to the index m at this point.
Following the significance pass, there is a refinement pass. The refinement pass is a process of refining the precision of old quantized transform values qn, which satisfy |qn| ≥ 2T. Each refined value is a better approximation of an exact transform value. The precision of quantized values is increased to make them at least as accurate as the present threshold. For example, if an old significant transform value’s magnitude |xn| lies in the interval [32, 48), say, and the present threshold is 8, then whether its magnitude lies in [32, 40) or [40, 48) will be determined. In the latter case, the new quantized value becomes 40 sgn(xn), and in the former case, the quantized value remains 32 sgn(xn). The refinement pass adds another bit of precision in the binary expansions of the scaled transform values {xk/(2T0)}.
See Walker and Nguyen (2000b) for a more complete description of bit-plane encoding.
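The following minimal numpy sketch shows the arithmetic of the significance and refinement passes for a list of transform values already placed in scan order; the encoding of indices and bits is omitted, so this is only the quantization bookkeeping, not Aswdr itself.

```python
import numpy as np

def bitplane_quantize(x, n_passes):
    """Significance/refinement-pass arithmetic for transform values `x`
    listed in scan order (index and bit encoding omitted)."""
    x = np.asarray(x, dtype=float)
    T = 2.0 ** np.floor(np.log2(np.max(np.abs(x))))   # T0 <= max|x| < 2*T0
    q = np.zeros_like(x)
    for _ in range(n_passes):
        # Significance pass: T <= |x| < 2T and not yet quantized; q = T*sgn(x).
        new = (q == 0) & (np.abs(x) >= T)
        q[new] = T * np.sign(x[new])
        # Refinement pass: old significant values gain one more bit of precision.
        old = (q != 0) & ~new
        upper_half = old & (np.abs(x) >= np.abs(q) + T)
        q[upper_half] += T * np.sign(x[upper_half])
        T /= 2.0
    return q

print(bitplane_quantize([45.0, -3.0, 12.0, -37.0, 7.0], n_passes=4))
# -> [ 44.   0.  12. -36.   4.]  (each value is now known to within the final threshold)
```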
Following the refinement pass, the new scan order part is performed. This is the part of the Aswdr algorithm where ideas similar to the Taws selection principles are employed. A new scan order is created by a bootstrap process proceeding from higher to lower levels of the wavelet transform. This bootstrap process is called the Aswdr new scan order procedure:
1. At the highest level (which contains the trend coefficients), use the indices of the remaining insignificant values as the scan order at that level.
2. Assuming that the new scan order is already created at level r, a new scan order is created at level r − 1 in the following way. Use the old scan order to scan through the significant wavelet coefficients at level r in the transform. The first part of the new scan order at level r − 1 contains the insignificant children of these significant wavelet coefficients.
3. Rescan through the insignificant wavelet coefficients at level r. The second part of the new scan order at level r − 1 contains the insignificant children, at least one of whose siblings is significant, of these insignificant wavelet coefficients.
4. Rescan a second time through the insignificant wavelet coefficients at level r. The third part of the scan order at level r − 1 contains the insignificant children, none of whose siblings are significant, of these insignificant wavelet coefficients. (Although this description is phrased in terms of a three-scan process, it can be performed in one scan by linking three separate chains at the end of one scan.)
5. Use this new scanning order for level r − 1 to create the new scanning order for level r − 2, and so on, until all levels are exhausted.
The rationale for the creation of a new scan order is to reduce the number of steps between the new significant values which emerge when the threshold T is reduced to T/2. For example, let us consider Figures 9b and 9c. If the value of the threshold T is 24 for the Peppers image, then the first part of the new scan order for the transform coefficients in Figure 9c consists of the insignificant children of the significant parent locations shown as white pixels in Figure 9b. This captures a high percentage of new significant children within the first part of the new scan order, which thus greatly reduces the number of steps and hence the number of bits needed for encoding (a short sketch of this three-part ordering for a single level is given below).
Notice also how the locations of significant values are highly correlated with the locations of edges within the Peppers image. The scanning order of Aswdr dynamically adapts to the locations of these edge details in an image, and this enhances the resolution of edge details in Aswdr-compressed images.∗ Because the Taws algorithm makes use of a similar dynamically adapted scanning order, it also enjoys high resolution of edges. Consequently, as seen in Section VI, Taws-denoised images are sharper, more in focus, than denoisings obtained with other wavelet methods.

∗ The importance of edges for human vision was emphasized by Marr (1982). A wavelet analysis of the key role played by edges in image formation was developed by Mallat and Zhong (1992).
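The following is a simplified sketch of the three-part ordering for a single level; it is array-based and operates on boolean significance maps rather than the linked-chain implementation described above, so it illustrates the rule but not the one-scan efficiency of Aswdr.

```python
import numpy as np

def new_scan_order_one_level(sig_parent, sig_child):
    """Return the (row, col) indices of the insignificant children at level r-1
    in the three-part order described above.
    sig_parent: (J, K) bool significance map at level r;
    sig_child:  (2J, 2K) bool significance map of their children."""
    J, K = sig_parent.shape
    up = np.ones((2, 2), dtype=np.uint8)
    parent_sig_up = np.kron(sig_parent.astype(np.uint8), up).astype(bool)
    # A child "has a significant sibling" if its 2x2 block contains a significant child.
    block_has_sig = sig_child.reshape(J, 2, K, 2).any(axis=(1, 3))
    sibling_sig_up = np.kron(block_has_sig.astype(np.uint8), up).astype(bool)
    insig = ~sig_child
    part1 = np.argwhere(insig & parent_sig_up)                     # children of significant parents
    part2 = np.argwhere(insig & ~parent_sig_up & sibling_sig_up)   # a sibling is significant
    part3 = np.argwhere(insig & ~parent_sig_up & ~sibling_sig_up)  # all remaining children
    return np.concatenate([part1, part2, part3], axis=0)
```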
The Aswdr procedure continues either until the threshold T is less than some preassigned value τ, or until a preassigned number of bits (a bit budget) has been used for encoding. The first stopping criterion is used by the Taws algorithm, whereas the second stopping criterion is used by the Taws-Comp simultaneous compression and denoising algorithm.

D. The Taws Algorithm

The Taws denoising algorithm combines the three Taws selection principles with the Aswdr image-compression method just summarized. So that this combination can be achieved, three parameters are used. One parameter is the descent index D, which is a nonnegative integer. The Taws threshold τ T is then set at ατ V/2^D, where the second parameter, the height index α, satisfies 1 ≤ α ≤ 2. For all the Taws denoisings reported in this article, the value of this second parameter α was set as √2. Note that if D = 0 and α = 1, then τ T = τ V and Taws reduces to the VisuShrink method. The third parameter is a depth index 𝒟 (distinct from the descent index D), which is an integer lying between 1 and L, where L is the number of levels in the wavelet transform. The nature of the depth index 𝒟 is clarified subsequently.
The basic structure of Taws is diagrammed in Figure 10. The implementation of the Taws algorithm is described in detail next:
Step 1 (Initialization) As described for the Aswdr algorithm, in this step the image is transformed, a scanning order {xk} is defined for searching through the transformed image, and an initial threshold T = T0 is chosen. For later use, the integer K is defined by the equation (2T0)/2^K = ατ V. That K is an integer can be arranged by multiplying the transform values by an appropriate constant; this rescaling of transform values is needed to ensure that T = τ T after K + D cycles through steps 2–5.
Step 2 (Significance Pass) Determine new significant index values, the indices m for which xm has a magnitude satisfying T ≤ |xm| < 2T. Assign qm = T sgn(xm) as the quantized value corresponding to xm.
Step 3 (Refinement Pass) Refine the quantized transform values corresponding to old significant transform values. Each refined value is a better approximation to an exact transform value. The refinement process corresponds to computing the bits in the binary expansions of the scaled transform values {xm/(2T0)}.
Step 4 (New Scan Order) Create a new scanning order as follows. For the first K cycles through steps 2–5, during which T ≥ τ V, produce a new scanning order by following the Aswdr new scan order procedure. For cycles K + 1 to K + D, the threshold T satisfies T < τ V. For these cycles, proceed as follows. If the level r is larger than 𝒟 + 1, then use the Aswdr new scan order procedure to produce the new scan order for level r − 1.
For each level r from 𝒟 + 1 to 2, produce the new scan order at level r − 1 as follows. Use the old scan order to scan through the significant wavelet coefficients in level r. Include in the new scan order all of the insignificant children of xm. (This implements selection principle 1, and implements selection principle 2 as a searching procedure for new significant coefficients.)
Step 5 (Divide Threshold by 2) Replace the threshold T by half its value and repeat steps 2–4 until this new threshold T is less than τ T.
When the procedure is finished, selection principle 3 is invoked by setting to zero all quantized transform values of magnitude less than τ V which do not have a nonzero adjacent value. Furthermore, as a way to ensure greater accuracy of nonzero quantized transform values, the refinement pass is executed several more times (five further refinements usually provide a sufficient increase in SNR). Finally, shrinkage is applied by using τ = τ T, the quantized transform values are divided by the rescaling multiplier used in step 1, and an inverse transform is performed to produce a denoised image.
One important feature of the Taws algorithm is its fast execution. The whole procedure can be performed in O(P) operations, where P is the number of pixels in the noisy image. In fact, it takes only 1.3 s to denoise a 512 × 512 image on a 1-GHz machine with 256 MB RAM. Moreover, this speed is attained without any coding optimizations (such as expressing the rescaled transform in pure binary form to allow for replacing divisions by bit shifts and for faster comparisons).
In the original description of Taws in Walker and Chen (2000), the choices of the parameters L, D, and 𝒟 were not automatically specified. With the improvements of Taws described in this subsection, it is now possible to specify choices for these parameters which are set automatically by the algorithm. For all images tested, the following parameter values have been found to generally produce the best denoisings (highest SNRs and good visual characteristics). The level parameter is set as L = 5. Using five levels in the Daub 9/7 transform greatly reduces the fraction of noise energy in the fifth-level trend. The descent index is set as D = 3. The lowest possible threshold is then √2 τ V/8, which is typically about equal to the standard deviation σ of the noise. The depth parameter 𝒟 is given one of two values, depending on the size of σ. If σ ≤ 25.6, then the depth parameter is set as 𝒟 = 2; if σ > 25.6, then the depth parameter is set as 𝒟 = 3. The reason for the difference is that when σ is fairly large (>25.6), more noise values contribute significant energy at higher levels of the transform. Using 𝒟 = 3 instead of 𝒟 = 2 allows for a more stringent application of the Taws selection principles, including at one higher level of the transform. Consequently, more noise is generally removed and a higher SNR is generally obtained when 𝒟 = 3 is used with higher values of σ.
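The threshold schedule implied by Step 1 and Step 5 is easy to tabulate. The sketch below uses the automatic settings just described (α = √2, descent index D = 3) and an arbitrary illustrative value of τ V; it assumes the transform has been rescaled so that K is an integer, as in Step 1.

```python
import math

def taws_threshold_schedule(T0, tau_V, alpha=math.sqrt(2), descent=3):
    """Thresholds visited by the Taws loop (steps 2-5), assuming the transform
    was rescaled so that K = log2(2*T0 / (alpha*tau_V)) is an integer.
    The final threshold equals tau_T = alpha*tau_V / 2**descent."""
    K = round(math.log2(2 * T0 / (alpha * tau_V)))
    return [T0 / 2 ** i for i in range(K + descent)]

# Example with assumed numbers: tau_V = 128 and T0 chosen so that K = 5.
tau_V = 128.0
T0 = 2 ** 5 * math.sqrt(2) * tau_V / 2        # makes 2*T0 = 2**5 * sqrt(2)*tau_V
schedule = taws_threshold_schedule(T0, tau_V)
print(len(schedule), "cycles; final threshold", round(schedule[-1], 2),
      "= tau_T =", round(math.sqrt(2) * tau_V / 8, 2))
```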
Figure 11. Effectiveness of Taws selection principles. (a) Boats image. (b) Transform. (c) Noisy transform. (d) VisuShrink. (e) τT shrinkage. (f) Taws.
To appreciate the effectiveness of the Taws algorithm, let us consider the images shown in Figures 11 and 12. Figure 11a shows the Boats test image. A portion of the Daub 9/7 transform of this image is shown in Figure 11b. To be precise, the first-level vertically oriented coefficients are shown. Only significant coefficients, those whose magnitude is at least as large as τ T = √2 τ V/8, are shown in Figure 11b and in Figures 11c through 11f. The gray background in these images corresponds to insignificant coefficients; darker pixels indicate negatively valued significant coefficients and lighter pixels indicate positively valued significant coefficients. The effect of adding random noise with σ = 20, shown in Figure 12a, is illustrated by the noisy transform in Figure 11c. Many of the details apparent in Figure 11b are obscured in Figure 11c. When VisuShrink is applied to the noisy transform in Figure 11c, then the transform shown in Figure 11d is obtained. We can see that almost all the details from Figure 11b are lost. This loss of detail leads to the blurred, oversmoothed denoising shown in Figure 12b. If shrinkage is applied with the smaller threshold τ T, then we obtain the transform shown in Figure 11e. A large amount of residual noise remains.
Figure 12. Comparison of Taws with VisuShrink and τ T shrinkage. (a) Noisy image, SNR = 17.3 db. (b) VisuShrink, SNR = 20.8 db. (c) τ T shrinkage, SNR = 21.1 db. (d) Taws, SNR = 23.9 db.
The denoised image in Figure 12c corresponding to this transform retains a correspondingly large amount of residual noise. Finally, Figure 11f shows the transform resulting from using the Taws algorithm. There is far less residual noise, compared with that in Figure 11e, and considerably more detail than in Figure 11d. Consequently, the Taws-denoised image in Figure 12d, resulting from the transform in Figure 11f, is a sharper, more focused image than the VisuShrink denoising and yet it does not contain residual noise as in Figure 12c. The Taws denoising is superior, both perceptually and in terms of SNR, to both the VisuShrink denoising and the denoising obtained by shrinkage with τ T. This example illustrates the superiority of Taws over VisuShrink. However, much-higher-performance denoisers have superseded VisuShrink. In Section VI, Taws is compared with such state-of-the-art denoisers.
E. The TAWS-SPIN Algorithm

Because it utilizes thresholding, the Taws algorithm can be improved by an averaging of shifted versions of the noisy image, as discussed in Section IV for cycle-spin thresholding. This averaging algorithm, called TAWS-SPIN, simply consists of averaging Taws denoisings of a finite number of cyclic shifts of the noisy image: $(G_{i-k,\,j-m})$ for k, m = 0, ±1, . . . , ±N. It is important to remember that although cycle-spin thresholding is an O(P log P) method, it still requires considerable memory resources—and fairly complicated bookkeeping—to store parts of previously computed transforms of shifted images. The Taws-spin method requires only a modest increase in memory resources—just two extra arrays equal in size to the image array for holding the image shifts and for storing partial averages—and there is no extra bookkeeping needed because complete denoisings are performed to create the averages.
Experiments with Taws-spin show that SNR values rapidly converge as the number of shifts increases, and that N = 1 yields a good compromise between increased SNR values and increased time and memory consumption needed to perform averages. For N = 1, just nine denoisings are averaged, hence Taws-spin still performs with O(P) complexity. As mentioned in Section II, the time for a Taws-spin denoising with N = 1 of a 512 × 512 image is about 12 s. Using N = 2 provides almost the highest possible SNR values (at about a three-times higher cost in calculation time than for N = 1). The Taws-spin denoisings in Section II were obtained by using N = 1, whereas those in Section VI were obtained with N = 2. Experiments have also determined that a height index of α = 2, a descent index of D = 4, and a depth index of either 𝒟 = 2 (when σ ≤ 25.6) or 𝒟 = 3 (when σ > 25.6) provide excellent Taws-spin denoisings. All the Taws-spin denoisings reported in this article used these parameter settings. Use of higher values for α and D allows the Taws-spin algorithm to apply the Taws selection principles over a greater range of threshold values (beginning with a higher value of ατ V and ending with a smaller value of τ T) and thus to be able to select out more residual noise values than Taws alone can.
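The shift-averaging step itself is only a few lines. In the sketch below, taws_denoise is a hypothetical placeholder standing in for any implementation of the Taws algorithm of Section V.D; it is not a function defined in this article.

```python
import numpy as np

def taws_spin(noisy, taws_denoise, N=1):
    """Average Taws denoisings over the (2N+1)**2 cyclic shifts
    (k, m = 0, ±1, ..., ±N); taws_denoise is a placeholder for an
    implementation of the Taws algorithm of Section V.D."""
    acc = np.zeros_like(noisy, dtype=float)
    for k in range(-N, N + 1):
        for m in range(-N, N + 1):
            shifted = np.roll(np.roll(noisy, k, axis=0), m, axis=1)
            denoised = taws_denoise(shifted)
            acc += np.roll(np.roll(denoised, -k, axis=0), -m, axis=1)  # reverse the shift
    return acc / (2 * N + 1) ** 2
```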
F. The TAWS-COMP Algorithm One major attribute of Taws is its close connection to the image-compression procedure Aswdr. This close connection allows the Taws algorithm to be transformed into a simultaneous compressing and denoising algorithm called TAWS-COMP. The essential idea behind this algorithm is to simply transmit bits during the significance pass and refinement pass—as described for the Aswdr compression algorithm—which encode index values for new
significant coefficients and for how to refine old significant coefficients. These bits constitute the compressed image. They describe how to rebuild the denoised image transform. During compression, however, any noise-dominated transform values that would be removed by selection principle 3 (which is invoked after steps 1–5 in Taws) should not be encoded. Therefore, selection principle 3 must be invoked during steps 1–5. Moreover, removal of isolated noisy transform values—which contribute to isolated noisy artifacts in Taws-denoised images—allows both the perceptual quality and the compression ratio of the denoised images to be improved. Most of these isolated noisy transform values can be removed by using selection principle 3—and by using principle 2 to remove significant parents without significant children—as part of step 4.
A description of the Taws-Comp simultaneous compression and denoising algorithm follows. In this description the compressing parts of the algorithm are highlighted in italics.
Step 1 (Initialization) As described for the Aswdr algorithm, in this step the image is transformed, a scanning order {xk} is defined, and an initial threshold T = T0 is chosen.
Step 2 (Significance Pass) Determine new significant index values, the indices m for which xm has a magnitude satisfying T ≤ |xm| < 2T. Assign qm = T sgn(xm) as the quantized value corresponding to xm. Encode these new significant indices by using the difference reduction method described by Tian and Wells (1996, 1998) or Walker and Nguyen (2000b).
Step 3 (Refinement Pass) Refine the quantized transform values by successively computing the bits in the binary expansions of scaled transform values {xm/(2T0)}. Encode the next bit in the binary expansion for each pass.
Step 4 (New Scan Order) Create a new scanning order as follows. For the first K cycles through steps 2–5, during which T ≥ τ V, produce a new scanning order by following the Aswdr new scan order procedure. For cycles K + 1 to K + D, the threshold T satisfies T < τ V. For these cycles, proceed as follows. If the level r is larger than 𝒟 + 1, then use the Aswdr new scan order procedure to produce the new scan order for level r − 1. For each level r from 𝒟 + 1 to 2, produce the new scan order at level r − 1 as follows. Use the old scan order to scan through the significant wavelet coefficients in level r. If such a significant coefficient, xm, satisfies |xm| < τ V and has no significant children, then set xm = 0 and qm = 0 (thus invoking selection principle 2). Conversely, if |xm| < τ V and xm has some significant children, or if |xm| ≥ τ V, then include in the new scan order all of the insignificant children of xm that have at least one significant sibling. (This implements selection principle 1, and partially implements selection principle 3, for these lower levels.)
Step 5 (Divide Threshold by 2) Replace the threshold T by half its value and repeat steps 2–4 until this new threshold T is less than τ T.
Either integer-to-integer (Calderbank et al., 1998) or floating-point wavelet transforms can be used with the Taws-Comp procedure. When an integer-to-integer transform is used, then a rescaling of the transform is performed during the initialization step which approximates an orthogonal transform (Said and Pearlman, 1996a). (So that an integer-to-integer transform can be used with Taws, the rescaling of transform values in the initialization step of Taws should be replaced by the rescaling just described for Taws-Comp.)
When the Taws-Comp procedure is finished, then decompression can be performed on the compressed data. This decompression consists of the following four steps:
1. Recapitulate steps 1–5 to obtain the quantized transform.
2. Set to zero all quantized transform values having magnitudes less than τ V which do not have a nonzero adjacent value (this implements selection principle 3).
3. Apply shrinkage to the quantized transform using threshold τ T.
4. Invert the quantized transform (and round to 8-bit precision).
The description of Taws-Comp makes it appear as if the only exit point of the compression procedure is when T < τ T occurs in step 5. However, Taws-Comp also allows for checking the cumulative total of bits output throughout the compression process, and exiting may occur when this bit total exhausts a prescribed bit budget. Thus Taws-Comp can match preassigned bit rates. A similar exiting criterion within the decompression process allows Taws-Comp to decompress at any bit rate up to the total compressed rate. A more complete description of Taws-Comp can be found in Walker (2002).
An example of a Taws-Comp compression plus denoising is given in Section VI. This example illustrates that Taws-Comp can perform denoising nearly as well as Taws while simultaneously compressing the image.
G. Other Transforms for TAWS The Taws procedures can be used with various different transforms. In this article, the Daub 9/7 transform was used exclusively. Other transforms that can be used are: (a) complex wavelet transforms (Kingsbury, 1998), (b) steerable wavelet transforms (Simoncelli and Freeman, 1995), (c) integer-to-integer wavelet transforms (Calderbank et al., 1998), and (d) the GenLOT transforms (de Queiroz et al., 1996) when mapped to a four-leaved tree structure (Tran, 1999; Tran and Nguyen, 1999).
Further research is needed on the effectiveness of these transforms for Taws denoising and how they compare. The complex and steerable wavelet transforms have already proven effective in other denoising algorithms (Choi et al., 2000; Strela et al., 2000). VI. Comparison of Taws with Other Techniques In this last section Taws is compared with other prominent denoising methods. This comparison is performed in two ways. First is an objective comparison using the SNR. The increases in the SNR, obtained by using Taws and other denoising methods, are compared on a suite of test images contaminated with Gaussian random noise having a range of standard deviations. This comparison shows that Taws produces SNRs essentially equal to those produced by SureShrink and that Taws-spin produces the highest SNRs of all the methods examined. Second, because the SNR does not always accord well with human visual perception, a subjective, visual comparison is made of several denoisings. This visual comparison illustrates the superior ability of Taws to produce denoisings that are sharply focused with well-defined edges. A. Objective: SNR Comparison Table 3 lists the SNRs for denoisings of five standard test images. These images were contaminated with Gaussian random noise having standard deviations of σ = 8, 16, 32, and 64. These standard deviations ranged from fairly low (σ = 8) to moderate (σ = 16), to moderately high (σ = 32) and extremely high (σ = 64). Seven denoising methods were used to produce the SNRs in Table 3. These methods included Wiener filtering, SureShrink, Taws, cycle-spin thresholding, and Taws-spin, which were discussed previously. In addition, Table 3 includes SNRs from two other prominent denoising methods, the Hmt method (Romberg et al., 1999a) and the Hmt-spin method (Romberg et al., 1999b). These Hmt methods use a Bayesian probabilistic approach in connection with Markov relations for the significance states, relative to thresholding, of the nodes in trees of wavelet coefficients. Complete details can be found in the papers just cited. Hmt methods bear some similarity to the Taws procedure, but they are based on a probabilistic decision theory, whereas the decision theory for Taws is deterministic. Columns 3–6 in Table 3 contain the SNRs for denoisings using Wiener filtering, the Hmt method, SureShrink, and Taws, respectively. These methods do not employ spin averaging. The data in Table 3 clearly indicate that among these non-spin-averaged methods, SureShrink generally produces the
TABLE 3
Comparison of SNRs for Various Denoising Methods^a

                           Non-spin-averaged methods        Spin-averaged methods
Image, σ      Noisy SNR     W      Hmt    SuSh   Taws        C-S    H-S    T-S
Lena, 8         24.4       28.9    28.9   28.9   29.0        28.5   29.9   29.8
Goldhill, 8     23.7       26.1    26.0   26.6   26.2        26.3   26.9   26.8
Boats, 8        25.2       28.1    28.4   28.8   28.8        28.4   29.3   29.4
Barbara, 8      24.2       24.5    23.4   26.4   25.7        23.5   24.1   26.4
Peppers, 8      23.5       27.7    26.5   27.1   27.3        26.9   27.1   28.1
Lena, 16        18.4       25.0    25.5   25.6   25.6        25.7   26.4   26.2
Goldhill, 16    17.7       23.3    22.8   23.3   23.1        23.2   23.5   23.8
Boats, 16       19.2       25.0    24.6   25.1   25.0        25.0   25.3   25.9
Barbara, 16     18.2       22.1    20.0   22.3   22.1        22.0   20.5   22.3
Peppers, 16     17.6       24.0    23.6   24.2   24.4        24.4   24.2   25.2
Lena, 32        12.5       19.7    22.3   22.3   22.2        22.1   22.9   23.3
Goldhill, 32    11.9       18.6    20.2   20.5   20.1        20.5   20.8   20.9
Boats, 32       13.4       20.2    21.4   21.7   21.5        21.8   21.9   22.5
Barbara, 32     12.4       18.2    17.7   19.0   18.5        18.5   18.1   18.6
Peppers, 32     11.8       18.6    20.5   20.7   20.8        21.2   20.9   21.8
Lena, 64         7.2       14.5    19.2   18.8   18.7        18.8   19.6   19.6
Goldhill, 64     6.53      13.6    17.7   17.8   17.3        17.5   18.1   18.3
Boats, 64        8.1       15.1    18.6   18.5   18.2        18.3   18.9   19.1
Barbara, 64      7.1       13.6    16.0   16.0   15.7        15.9   16.2   16.3
Peppers, 64      6.4       13.3    17.0   17.0   16.7        17.3   17.2   17.4

^a W, Wiener filtering; SuSh, SureShrink; C-S, cycle-spin thresholding; H-S, Hmt-spin; T-S, Taws-spin.
highest SNRs. However, the differences in SNRs among the three wavelet-based methods (Hmt, SureShrink, and Taws) are not very significant—generally less than 0.5 db. In terms of SNRs, these three methods provide approximately equivalent performance. These wavelet-based methods all outperform Wiener filtering, particularly at the higher noise levels (σ = 32 and 64).
The remaining columns in Table 3, columns 7–9, contain the SNRs for spin-averaged denoisings. These spin-averaged denoisings are cycle-spin thresholding, Hmt-spin, and Taws-spin. All these methods are averages of denoisings of cyclic shifts, $(G_{i-k,\,j-m})$ for k, m = 0, ±1, . . . , ±N, of the noisy image. In each case, a certain type of denoising method—either thresholding, Hmt, or Taws—is applied to each cyclic shift, then all denoisings are averaged (after reverse shifting). For each method, it is possible to carry out this work in O(P log2 P) operations. However, it is important to remember that although cycle-spin methods have O(P log2 P) complexity, it still requires considerable memory resources to store parts of previously computed transforms of shifted images. Experiments
show that SNR values rapidly converge as the number of shifts increases and that N = 2 (averaging 25 different shiftings) yields a good compromise between increased SNR values and increased time and memory consumption needed for performing averages. The results reported in Table 3 for each of the three spin-averaged methods are for the case of N = 2. These results show that Taws-spin generally produces the highest SNRs, and sometimes its SNRs are significantly higher (at least 0.5 db higher) than one or both of the other spin-averaged methods. Taws-spin also generally provides the highest SNR values of all seven of the denoising methods. B. Subjective: Visual Comparison Although the SNR provides an objective standard for measuring the effectiveness of denoising, it does not always accord well with human visual perception. For example, although the VisuShrink denoisings in Figure 5 all have much higher SNRs than the noisy image in Figure 1b, most people would prefer the noisy image to the denoisings in Figures 5b through 5d. The noisy images are preferred because there is considerable loss of detail in the denoisings.
Figure 13. Denoisings of noisy Boats image, with SNR values. The noisy image in (b) has σ = 20. (a) Original Boats image. (b) Noisy image, SNR = 17.3 db. (c) SureShrink, SNR = 24.0 db. (d) Taws, SNR = 23.9 db. (e) Hmt, SNR = 23.5 db. (f) Wiener, SNR = 23.6 db.
Figure 5a is the exception that proves the rule. It is preferred by some people because it retains enough detail while simultaneously having much-reduced noise. Thus we can see that the SNR cannot be used as the only measure of denoising effectiveness. We must also consider our subjective visual perception when we are comparing denoisings. One important point to consider is this: Human visual perception has been refined over millions of years of evolution, which has thus produced an excellent system for distinguishing image details from noise. Distinguishing image details is vitally important: the survival of individual humans, and our mammalian ancestors, has depended on it. Our perceptual ability to separate image details from noise explains the VisuShrink example discussed in the previous paragraph. When a denoising appears blurred and many image details are lost, human observers prefer the original noisy image. The ability of Taws to retain edge details, and the importance of such details for the focused appearance of images in our visual perception, enables Taws denoisings to appear more sharply focused than denoisings with other methods. The examples discussed next confirm this last statement. The first example is a denoising of the noisy Boats image shown in Figure 13b. This noisy image was obtained by adding random noise with
Figure 14. Magnifications of images in Figure 13: (a) Boats image, (b) Noisy image, (c) SureShrink, (d) Taws, (e) Hmt, (f) Wiener.
Figure 15. Denoisings of noisy Barbara image, with SNR values. The noisy image in (b) has σ = 16. (a) Original Barbara image. (b) Noisy image, SNR = 18.2 db. (c) SureShrink, SNR = 22.3 db. (d) Taws, SNR = 22.1 db. (e) Hmt, SNR = 20.0 db. (f) Wiener, SNR = 22.1 db.
σ = 20 to the Boats image in Figure 13a. Figures 13c through 13f show denoisings of this noisy image using SureShrink, Taws, Hmt, and Wiener filtering. Although the SureShrink denoising has the highest SNR, the Taws denoising is slightly more focused than any of the other denoisings (except perhaps the Wiener filtering). The Hmt denoising appears particularly blurred. Although the Wiener filtering might appear slightly more focused than the other denoisings, it retains considerable amounts of residual noise. These observations are even more clearly evident in the magnifications shown in Figure 14.
As a second example, these same denoising methods were used on the Barbara image, contaminated with random noise having σ = 16. Because it is nearly impossible to distinguish between the full-size denoisings,∗ Figure 15 shows magnifications of the various denoisings. Although SureShrink has a slightly higher SNR, it retains some annoying isolated noise artifacts. The Hmt denoising is somewhat blurred and also exhibits some isolated noise artifacts. The Wiener filtering, as with the last example, contains a large amount of
The full-size denoisings can be found at the web site listed in Section II.
Figure 16. Spin denoisings of noisy Boats image, with SNR values. (a) Noisy image, SNR = 17.3 db. (b) Cycle-spin thresholding, SNR = 23.7 db. (c) Taws-spin, SNR = 24.7 db. (d) Hmt-spin, SNR = 24.1 db.
residual noise (hence no more examples of Wiener filtering are considered). Finally, the Taws denoising appears the sharpest of all the wavelet-based denoisings. Note, in particular, the sharpness of the stripes in Barbara’s scarf and the faithful reconstruction of Barbara’s lips. The Taws denoising also seems the most free of noise artifacts. Let us now consider an example of spin-averaged denoising. In Figure 16, we see a cycle-spin thresholding, a Taws-spin, and an Hmt-spin denoising of the noisy Boats image with σ = 20. In this case, the Taws-spin denoising has the highest SNR and appears the sharpest, most clearly focused of the denoisings. The magnifications of these images in Figure 17 confirm the superiority of the Taws-spin denoising. In particular, the letters in the boat’s name and the boat’s masts and rigging are more sharply defined for the Taws-spin denoising than for either of the other spin denoisings.
Figure 17. Magnifications of images in Figure 16. The original Boats image is in Figure 13a. (a) Noisy image. (b) Cycle-spin thresholding. (c) Taws-spin. (d) Hmt-spin.
Figure 18. (a) Underwater camera image and (b) histogram showing the approximately Gaussian noise.
Figure 19. Denoisings of underwater camera image in Figure 18: (a) SureShrink; (b) Taws; (c) Taws-Comp, 32:1; (d) Taws-spin; (e) Hmt; (f) Hmt-spin.
The last comparison is an example of denoising a real image. The previous examples used test images that were then contaminated by adding random noise. This final example considers the denoising of an image acquired by an underwater camera under very noisy conditions; the image is shown in Figure 18a. The histogram of its D1 wavelet coefficients, shown in Figure 18b, illustrates the Gaussian nature of the noise contaminating this image. Denoising was performed by applying six wavelet-based methods, with the results shown in Figure 19. It is interesting that, for this real example, the SureShrink denoising is clearly the worst because of its very blurred appearance. The Taws denoising is much less blurred and retains far more structure in the seaweed on the lower right. However, there are some isolated noise artifacts in the Taws denoising. These artifacts are removed in the Taws-spin denoising, which is also more sharply defined than either the Hmt or the Hmt-spin denoising.

In this underwater camera application, it is necessary to transmit the acquired image over a very-low-capacity channel, which requires transmitting a compressed image. Figure 19c shows a Taws-Comp denoising that has been simultaneously compressed at a rate of 32:1. Although this Taws-Comp denoising is not as good as the Taws or Hmt denoising, it retains more detail than the SureShrink denoising despite the 32:1 compression. Further examples and a more detailed discussion of Taws-Comp can be found in Walker (2002).
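The near-Gaussian histogram of D1 coefficients in Figure 18b also points to the standard way of estimating the unknown noise level that shrinkage methods require: apply the robust median rule of Donoho and Johnstone (1994) to the finest-scale detail coefficients. A minimal sketch follows; the one-level Haar diagonal subband is used as a stand-in for the transform actually employed in the article, and the function name estimate_sigma is illustrative.

```python
import numpy as np

def estimate_sigma(noisy):
    """Robust noise estimate: sigma ~ median(|finest diagonal details|) / 0.6745."""
    x = noisy.astype(np.float64)
    d = (x[0::2, :] - x[1::2, :]) / np.sqrt(2.0)    # pairwise row differences
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2.0)   # then pairwise column differences
    return np.median(np.abs(hh)) / 0.6745

# Sanity check on pure Gaussian noise of standard deviation 16.
rng = np.random.default_rng(1)
noise_only = rng.normal(0.0, 16.0, size=(256, 256))
print(f"estimated sigma: {estimate_sigma(noise_only):.1f}")   # close to 16
```

For a real image rather than pure noise, the diagonal subband is dominated by noise except at strong edges, which is why the median, rather than the mean, gives a reliable estimate.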
VII. Conclusion

In this article, the theory behind Taws was described, and Taws was compared with other denoising methods both in terms of SNR and perceptually. The Taws method is particularly good at preserving edge details and thus producing sharply resolved denoisings; such sharply resolved denoisings have not been achieved with other wavelet-based methods. Future research is needed to examine the properties of Taws denoisings using different wavelet transforms, such as the complex wavelet transforms, which have been shown to improve the performance of other wavelet-based denoisers. An important and long-unsolved problem is to find a measure of error that is in better accord with human visual perception than the SNR. Perhaps such a measure would objectively demonstrate that Taws is superior to competing methods.

References

Buccigrossi, R. W., and Simoncelli, E. P. (1999). Image compression via joint statistical characterization in the wavelet domain. IEEE Trans. Image Processing 8, 1688–1701.
Burrus, C. S., Gopinath, R. H., and Guo, H. (1998). Introduction to Wavelets and Wavelet Transforms: A Primer. Englewood Cliffs, NJ: Prentice Hall.
Burt, P. J., and Adelson, E. H. (1983). The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31, 532–540.
Calderbank, A. R., Daubechies, I., Sweldens, W., and Yeo, B.-L. (1998). Wavelet transforms that map integers to integers. Appl. Comput. Harmonic Anal. 5, 332–369.
Chambolle, A., and Lucier, B. (2001). Interpreting translation-invariant wavelet shrinkage as a new image smoothing scale space. IEEE Trans. Image Processing 10, 993–1000.
Chang, S. G., Yu, B., and Vetterli, M. (2000). Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Processing 9, 1532–1546.
Choi, H., and Baraniuk, R. (1999). Wavelet statistical models and Besov spaces, in Proceedings of the SPIE Conference on Wavelet Applications in Signal and Image Processing VII, Denver, CO, pp. 489–501.
Choi, H., Romberg, J., Baraniuk, R., and Kingsbury, N. (2000). Hidden Markov tree modeling of complex wavelet transforms, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’00, Istanbul, Turkey.
Chui, C. K. (1997). Wavelets: A Mathematical Tool for Signal Analysis. Philadelphia: Soc. for Industr. & Appl. Math.
Cohen, A., Daubechies, I., and Feauveau, J.-C. (1992). Biorthogonal bases of compactly supported wavelets. Commun. Pure Appl. Math. 45, 485–560.
Coifman, R., and Donoho, D. (1995). Translation-invariant denoising, in Wavelets and Statistics (Lecture Notes in Statistics). Berlin/New York: Springer-Verlag.
Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 41, 909–996.
Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia: Soc. for Industr. & Appl. Math.
Donoho, D. (1993). Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data, in Different Perspectives on Wavelets, edited by I. Daubechies. Providence, RI: Am. Math. Soc., pp. 173–205.
Donoho, D., and Johnstone, I. (1994). Ideal spatial adaptation via wavelet shrinkage. Biometrika 81, 425–455.
Donoho, D., and Johnstone, I. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 90, 1200–1224.
Donoho, D., Johnstone, I., Kerkyacharian, G., and Picard, D. (1995). Wavelet shrinkage: Asymptopia? J. R. Stat. Soc. B 57, 301–369.
Field, D. J. (1993). Scale invariance and self-similar “wavelet” transforms: An analysis of natural scenes and mammalian visual systems, in Wavelets, Fractals and Fourier Transforms, edited by M. Farge, J. C. R. Hunt, and J. C. Vassilicos. Oxford: Clarendon, pp. 151–193.
Field, D. J. (1994). What is the goal of sensory coding? Neural Comput. 6, 559–601.
Field, D. J. (1999). Wavelets, vision and the statistics of natural scenes. Philos. Trans. R. Soc. 357, 2527–2542.
Field, D. J., and Brady, N. (1997). Wavelets, blur and the sources of variability in the amplitude spectra of natural scenes. Vision Res. 37, 3367–3383.
Gormish, M. J., Lee, D., and Marcellin, M. W. (2000). JPEG 2000: Overview, architecture, and applications, in IEEE International Conference on Image Processing, Vancouver, Sept. 2000, Vol. 2, pp. 29–32.
Haar, A. (1910). Zur Theorie der orthogonalen Funktionensysteme. Math. Ann. 69, 331–371.
Hernandez, E., and Weiss, G. (1996). A First Course on Wavelets. Boca Raton, FL: CRC Press.
Huang, J. (2000). Statistics of natural images and models. Ph.D. thesis, Brown University, Providence, RI.
Huang, J., and Mumford, D. (1999). Statistics of natural images and models, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1999, pp. 541–547.
Johnstone, I. (1999). Function estimation and wavelets, in Lecture Notes, Dept. of Statistics, Stanford University, Palo Alto, CA.
Kingsbury, N. (1998). The dual-tree complex wavelet transform: A new efficient tool for image restoration and enhancement, in Proceedings of the European Signal Processing Conference, EUSIPCO 98, Rhodes, pp. 319–322.
Lang, M., Guo, H., Odegard, J. E., and Burrus, C. S. (1995). Nonlinear processing of a shift-invariant DWT for noise reduction, in Proceedings of SPIE Conference 2491, Wavelet Applications II, Orlando, Apr. 1995, pp. 640–651.
Lee, J. S. (1980). Digital image enhancement and noise filtering by use of local statistics. IEEE Trans. Pattern Anal. Machine Intell. 2, 165–168.
Liu, J., and Moulin, P. (2001). Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients. IEEE Trans. Image Processing 10, 1647–1658.
Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Machine Intell. 11, 674–693.
Mallat, S. (1999). A Wavelet Tour of Signal Processing, 2nd ed. New York: Academic Press.
Mallat, S., and Hwang, W. L. (1992). Singularity detection and processing with wavelets. IEEE Trans. Inf. Theory 38, 617–643.
Mallat, S., and Zhong, S. (1992). Characterization of signals from multiscale edges. IEEE Trans. Pattern Anal. Machine Intell. 14, 710–732.
Marr, D. (1982). Vision. San Francisco: Freeman.
Meyer, Y. (1992). Wavelets and Operators. Cambridge, UK: Cambridge Univ. Press.
de Queiroz, R. L., Nguyen, T. Q., and Rao, K. R. (1996). The GenLOT: Generalized linear-phase lapped orthogonal transform. IEEE Trans. Signal Processing 44, 497–507.
Resnikoff, H. L., and Wells, R. O. (1998). Wavelet Analysis: The Scalable Structure of Information. New York: Springer-Verlag.
Romberg, J. K., Choi, H., and Baraniuk, R. G. (1999a). Bayesian tree-structured image modeling using wavelet-domain hidden Markov models, in Proceedings of the SPIE Technical Conference on Mathematical Modeling, Bayesian Estimation, and Inverse Problems, July 1999, Denver, pp. 31–44.
Romberg, J. K., Choi, H., and Baraniuk, R. G. (1999b). Shift-invariant denoising using wavelet-domain hidden Markov trees, in Proceedings of the Thirty-third Asilomar Conference, Oct. 1999, Pacific Grove, CA.
Said, A., and Pearlman, W. A. (1996a). An image multi-resolution representation for lossless and lossy image compression. IEEE Trans. Image Processing 5, 1303–1310.
Said, A., and Pearlman, W. A. (1996b). A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. Circuits Syst. Video Technol. 6, 243–250.
Shapiro, J. M. (1993). Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Signal Processing 41, 3445–3462.
Simoncelli, E. P., and Freeman, W. T. (1995). The steerable pyramid: A flexible architecture for multi-scale derivative computation, in Second International Conference on Image Processing, IEEE Signal Proc. Soc., Washington, DC, Oct. 1995, Vol. III, pp. 444–447.
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Stat. 9, 1135–1151.
Strang, G., and Nguyen, T. Q. (1996). Wavelets and Filter Banks. Boston: Wellesley–Cambridge Press.
Strela, V., Portilla, J., and Simoncelli, E. P. (2000). Image denoising using a local Gaussian scale mixture model in the wavelet domain, in Proceedings of the SPIE 45th Annual Meeting, San Diego, CA.
Tian, J., and Wells, R. O. (1996). A lossy image codec based on index coding, in IEEE Data Compression Conference, DCC’96, pp. 456–460.
Tian, J., and Wells, R. O. (1998). Embedded image coding using wavelet-difference-reduction, in Wavelet Image and Video Compression, edited by P. Topiwala. Norwell, MA: Kluwer Academic, pp. 289–301.
Tran, T. D. (1999). Linear phase perfect reconstruction filter banks: Theory, structure, design, and applications in image compression. Ph.D. thesis, Univ. of Wisconsin–Madison.
Tran, T. D., and Nguyen, T. Q. (1999). A progressive transmission image coder using linear phase uniform filter banks as block transforms. IEEE Trans. Image Processing 8, 1493–1507.
Vetterli, M., and Kovačević, J. (1995). Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice Hall.
Walker, J. S. (1997). Fourier analysis and wavelet analysis. Notices Am. Math. Soc. 44, 658–670.
Walker, J. S. (1999). A Primer on Wavelets and Their Scientific Applications. Boca Raton, FL: CRC Press.
Walker, J. S. (2000). Lossy image codec based on adaptively scanned wavelet difference reduction. Opt. Eng. 39, 1891–1897.
Walker, J. S. (2001). New methods in wavelet-based image denoising, in Proceedings of the Third International Conference of the International Society for Analysis, Its Applications and Computations, Berlin, Aug. 2001. Singapore: World Scientific.
Walker, J. S. (2002). Combined image compressor and denoiser based on tree-adapted wavelet shrinkage. Opt. Eng. 41, 1520–1527.
Walker, J. S., and Chen, Y.-J. (2000). Image denoising using tree-based wavelet subband correlations and shrinkage. Opt. Eng. 39, 2900–2908.
Walker, J. S., and Nguyen, T. Q. (2000a). Adaptive scanning methods for wavelet difference reduction in lossy image compression, in IEEE International Conference on Image Processing, Vancouver, Sept. 2000, Vol. 3, pp. 182–185.
Walker, J. S., and Nguyen, T. Q. (2000b). Wavelet-based image compression, in Handbook of Transforms and Data Compression, edited by K. R. Rao and P. C. Yip. Boca Raton, FL: CRC Press, pp. 267–311.
Wandell, B. A. (1995). Foundations of Vision. Sunderland, MA: Sinauer.
Wang, Y. (1995). Jump and sharp cusp detection by wavelets. Biometrika 82, 385–397.
Watson, A. B. (1987). Efficiency of a model human image code. J. Opt. Soc. Am. A 4, 2401–2417.
Index
A Abel–Plana summation formula, 129 Acoustic theory of sound production, 42 Adaptation algorithms, 3 Adaptive decision block, 48 Adaptive differential pulse code modulation (ADPCM), 30, 38, 47–52, 54 Adaptive filters, 3 Agee–Turner algorithm, 39, 56–59 Aliasing error, 130 Aliasing phenomenon, 97, 98–99 Amplifier noise, 291, 320, 322 Amplitude error, 130 Analog signals, wavelet analysis of, 352 Analytic signal, 107–108 Angular gating, 256 Arithmetic mean, 41 ASWDR (adaptively scanned wavelet difference reduction), 367, 373–376 Autocorrelation function, 125 Autocorrelation matrix, extended, 33–34 Avalanche photodiode (APD), 305, 306–307, 318, 328, 333–334
B Backprojection See also Filtered backprojection use of term, 262 Backpropagated scattered DPDW, 268, 282–287
Backpropagation See also Filtered backpropagation transfer function, 267, 285, 289 Backpropagation, in turbid media, 281 laboratory data reconstruction examples, 303–313 multiple-view, 313–316 object localization, 295–303 resolution enhancement, 288–295 single-view, 282–287 Balayage, 121 Banach space, 121 Band-limited, use of term, 129 Band-limited functions See also Paley–Wiener spaces classical, 75–78, 82–83 in fractional Fourier transform sense, 78–79 zeros of, 108–110 Bandpass signal, 107–108 Barbara images, 347, 384, 387 Bernstein’s inequality, 94 Bernstein spaces, 129 Bessel function, 83–84 Bessel–Neumann expansion, 95 Best linear unbiased prediction (BLUP), 140 alternate derivation of kriging, 157–158 basics of, 150–154 cokriging and, 144–145 comparison of the types of kriging and, 146–149 projected orthogonality theorem, 154–157
Best linear unbiased prediction (Cont.) space-time kriging and, 146 spatial kriging and, 143–144 Beurling, A., 121 Biorthogonal, 198 Biorthogonal scaling function construction, 204–208 coefficients for, 232–233 Biorthogonal wavelet construction coefficients for, 234–244 wavelets for, 218–223 Bit-plane encoding, 374 Blind equalization, 28 BLUP. See Best linear unbiased prediction Boas, D. A., 298 Boat images, 347, 357–358, 364, 365, 378–379, 384, 385, 386, 388–389 Bochner’s theorem, 125 Borel, E., 64, 66 Born approximation, 255, 257, 263–264, 269, 270–271, 272, 275, 278 Bounded input-bounded output (BIBO), 11 Brownian motion, 143
C Cardinal series, 66, 76, 78, 81, 82, 83, 100 Cardinal sine function, 66, 100 Cauchy, A. L., 64, 66, 90 Cauchy–Schwarz inequality, 69, 71, 73, 93, 94, 112 Chance, B., 298 Chebyshev acceleration, 124 Child coefficients, 367–368 Children, significant, 368–372 Christoffel–Darboux formula, 91
Cokriging, 158–159 See also Space-time cokriging BLUP and prediction, 144–145 compared with kriging and space-time kriging, 146–149 groundwater data application, 171–175 magnetic resonance imaging application, 175–184 projected orthogonality theorem, 154–157 temporal filtering, 160–162, 187–188 temporal smoother, 163–164, 188–189 “Communication in the Presence of Noise” (Shannon), 64 Complex wavelet transforms, 382 Compression, image, 229–230 Compton effects, 260 Computed tomography (CT), 254 filtered backprojection, 262–263 forward model, 260–262 Conjugate gradient acceleration, 124 Continuous Volterra filters. See Volterra filters Continuous-wave (CW) illumination, 254, 310–313 comparison of SNR of modulated illumination with, 327–329 SNR derivation for, 320–322 Contraction mapping theorem, 200 Convolution integral, 4 Cycle-spin thresholding, 364–366, 383, 384
D Daubechies wavelet bases, 349, 351, 354, 355 Daub4 wavelet system, 351, 354
Daub 9/7 wavelet system, 358, 362, 364, 369, 371, 377, 378, 382 Denoising, 230–231 See also Signal-to-noise ratio (SNR); Tree-adapted wavelet shrinkage (TAWS) cycle-spin thresholding, 364–366 wavelet shrinkage, 358–364 Depth index, 376, 377 Descent index D, 376, 377 Detection plane half-space, 288 D4 scaling function, 196 Difference reduction method, 374 Diffraction tomography (DT), standard See also Signal-to-noise ratio applications of, 257 development of, 257 differences between DT for turbid media and, 258 filtered backpropagation, 266–268 forward model, 263–265 image-reconstruction methods, 257–258 mathematical foundation for, 257 Diffraction tomography (DT), turbid media and See also Signal-to-noise ratio absorptive and scattering objects, 278–281 absorptive objects, 271–274 differences between standard DT and, 258 forward model, 268–281 mathematical foundation for, 257, 258 scattering objects, 274–277 Diffuse photon density wave (DPDW), 256–257, 258 See also Signal-to-noise ratio absorptive and scattering objects, 278–281
absorptive objects, 271–274 backpropagated scattered, 268, 282–287 high frequency, 270, 281 low frequency, 270, 281 measured scattered, 268, 271, 284–285, 288–289, 295, 296 object localization, 295–303 resolution enhancement, 288–295 scattered, 268, 271, 272 scattering objects, 274–277 Diffusion equation, 254–257 Digital signal processing, 2–3 Dirac delta function, 129 Dirac’s comb, 100 Dirichlet–Jordan test, 126 Dirichlet kernel, 90 Discrete signals, wavelet analysis of, 352–355 Discrete Volterra filters. See Volterra filters DPDW. See Diffuse photon density wave
E 8-bit gray-level images, 347 Entropy, 229 Errors, sampling, 130–131 Estimation error, 31, 32 Euler formulas, 81 Euler–MacLaurin formula, 129 Exponential sampling, 87, 88 EZW algorithm, 371
F Ferrar, W. L., 66 Filtered backprojection, 260 computed tomography and, 262–263
Filtered backpropagation, 258, 260 See also Backpropagation, in turbid media standard diffraction tomography and, 266–268 Filtering See also Temporal filtering Kalman, 140, 161 Finite cosine transform, 80 Finite impulse-response (FIR) filter, 7, 8 time-shift property, 12–15 Finite memory, 5, 7 quadratic homogeneous filters, 15–16 Finite sampling, 89–91 Finite sine transform, 80–82 First column, 16 Forward propagation transfer function, 289 Fourier analysis, weaknesses of, 196 Fourier diffraction theorem, 265, 279 Fourier domain interpolation, 265 Fourier duality, 93–97, 120 Fourier expansion, 67, 68 Fourier interpolation, 257 Fourier slice theorem, 260, 261–262, 265 Fourier transform sense, band-limited functions in fractional, 78–79 Fractal dimension, 43 Fractal interpolation surfaces (FIS), 199–203 Fractional Fourier transform sense (FRFT), 78–79 Frames bounds, 117 defined, 117 exact, 117 operator, 118 properties of, 117–118
setting, 119–121 tight, 117, 120 Frequency-domain imaging, 256–257, 304–310 Frequency-domain representation, 8, 9 Functionals G-, 2 introduction of concept, 2 Fuzzy modeling, 43
G Gating out scattered light, methods of, 256 GenLOT transforms, 382 G-functionals, 2 GHM (Geronimo–Hardin– Massopust) scaling vector, 196 Givens rotation. See QR-RLS algorithm, Givens rotations and Goldhill images, 229–231, 347, 384 GPIB interface, 306 Gram–Schmidt orthogonalization process, 199, 210, 222 Green’s function, 265, 273, 275, 277, 279, 283–284 Groundwater data application, 171–175
H Haar basis/transform, 349, 351, 352–354 Hadamard factorization theorem, 109 Hamming filter, 297, 298, 301, 302, 303, 307, 312–313, 315–316 Hankel transform, 84–85 Hanning filter, 297, 298, 300
Harmonic analysis, 64 Harmonizability, 128 Harmonizable processes, 127–128 Hartley, R. V. L., 83 Hartley transform, 83 HeartMark and MR markers, 175–176 Heaviside function, 104 Height index, 376 Helmholtz equation, 257, 258, 263, 268–269, 271–272, 274, 276–277, 288–289 Hilbert spaces, 64, 65, 68–75 Hilbert transform, samples from, 104–108 HMT, 383, 384 HMT-SPIN, 383, 384 Homogeneous filter, 7
I Identity prefilter, 228 Image compression, 229–230 Image human body, techniques for, 253–254 Images, wavelet analysis of, 355–358 Imaging frequency-domain, 256–257 time-domain, 256 Impulse responses, 11 Infinite impulse-response (IIR) filter, 6, 8 Infinite memory, 5, 6, 7 Information-loss error, 130 Initialization, 166–168 Innovation, 161 Input-output relation, 4, 7, 8 Input vector, 3 augmented or extended, 13 Instantaneous frequency, 107 Integer-to-integer wavelet transforms, 382
Interactive Data Language (IDL), 316 Interpolation prefilter, 228 Inverse matrices, 18, 24 Iterative algorithms, 121–124
J Jacobi transform, finite continuous, 85 Jacques, S. L., 298
K Kalman filtering, 140, 161 Kalman gain vector, 29, 32, 34 Kotel’nikov, V., 66 Kramer kernels, 74 Kriging See also Cokriging; Space-time cokriging; Space-time kriging comparison of the types of, 146–149 review of methods, 143–149 role of, 140 spatial, 143–144 Kriging update model, 141 general versus universal versus ordinary, 141–142 Kronecker delta, 69
L Lagrange multipliers, BLUP and, 153, 158 Lagrange-type interpolation, 110 Laguerre transform, continuous, 85 Laplace transform, 279–280 Last column, 16, 17 Least mean square (LMS), 44, 46, 48, 50–51, 52, 53, 54
Lebesgue measure, 127, 131 Legendre polynomials, 91, 95 Lena images, 229–231, 347, 384 Li, X., 298 Light propagation in tissue, methods for modeling, 254–256 Likelihood variable, 32 inverse, 37 Linearity and kernel coefficients, 6 Linear minimum mean square error (LMMSE), 154 Lyapunov exponents, 43
M Magnetic resonance imaging (MRI) applications cokriging, 180–181 HeartMark and MR markers, 175–176 observation model, 179–180 results, 183–184 tagging, 175 tag surface model, 176–179 tracking method, 181–183 Mapping, sampling and periodization, 96 MatLabTM, 345 Maximum modulus principle, 129 Measured scattered DPDW, use of term, 268 Mellin–Kramer sampling result, 87–88 Memoryless system, 4 Minimization problem, 31, 43 Modulated illumination, SNR derivation for, 322–327 comparison of SNR of CW illumination with, 327–329 Monte Carlo approach, 255 Morera’s theorem, 93
MR markers, 175–176 Multichannel filters, V-vectors for, 27–28 Multichannel sampling setting, 101–102 Multidimensional convolution property, 9 Multiple-view backpropagation, 313–316 Multiresolution analysis (MRA), 197 Multiwavelets, 198 associated, 209–218
N Navier–Stokes equations, 42 Neumann series, 121 Neural networks, 42 Noise. See Signal-to-noise ratio (SNR); Tree-adapted wavelet shrinkage (TAWS) Nonlinear prediction and coding of speech and audio coding algorithm, 48–50 coding algorithm stability, 50–51 experimental results, 52–54 research on, 42–43 sampling frequency issue, 51–52 side information, 48, 52, 53 V-vector algebra for nonlinear coding, 47–48 V-vector algebra for nonlinear prediction, 43–47 Nonorthogonal sampling formulas, 114–117 Nonseparable bases, 197 Nonuniform sampling, 110 Nugget effect, 142 Nyquist, H., 66 Nyquist derivative, samples from, 102–104
Nyquist–Landau sampling rate, 131 Nyquist rate, 66, 102 Nyquist sampling theorem, 262
O Object half-space, 288 Object localization, 295–303 Ogura, K., 66 O’Leary, M. A., 298 Optical diffusion tomography (ODT) See also Diffraction tomography (DT) advantages of, 254 applications of, 253 disadvantages of, 254 Orthogonality, single scaling function and, 196 Orthogonal polynomials, 90–91 Orthogonal sampling formulas. See Sampling formulas, orthogonal Orthogonal scaling function construction, 208–209 coefficients for, 233–234 Orthogonal scaling vector, 198 Orthogonal wavelet construction coefficients for, 245–250 wavelets for, 223–226 Orthonormal bases, 68–75 Ostermeyer, M. A., 298 Oversampling, 97–101
P Paley–Wiener classes, 129 Paley–Wiener–Levinson (PWL) sampling theorem, 113 Paley–Wiener (PW) spaces, 76, 92 analytic signal, 107–108 Fourier duality, 93–97
frames, 117–121 Hilbert transform, samples from, 104–108 irregular sampling, 110–121 iterative algorithms, 121–124 nonorthogonal sampling formulas, 114–117 Nyquist derivative, samples from, 102–104 Riesz bases, 111–114 sampling by using other types of samples, 101–108 undersampling and oversampling, 97–101 zeros of band-limited functions, 108–110 Parent coefficients, 367–368 Parents, significant, 368–372 Parseval’s equality, 120 Parseval’s formula, 68, 94, 326 Peak signal-to-noise ratio (PSNR), 229 Pepper images, 345–346, 347, 361–362, 363, 365–366, 369, 384 Periodization mapping, 96 Phase-space reconstruction, 42–43 Photo-multiplier tube (PMT), 318, 328–329 Photon Migration Imaging (PMI) software, 298, 314 Photon noise, 291, 320, 321, 322 Phragmén–Lindelöf principle, 129 Pillbox filter, 297, 298, 300, 315 Plancherel’s theorem, 129 Poisson summation formula, 97–100 Polarization gating, 256 Polarization identity, 69–70 Polynomial filters, 2 adaptive, 8 theory, 8
Polynomials Legendre, 91, 95 orthogonal, 90–91 trigonometric, 90 Prediction See also Best linear unbiased prediction (BLUP) backward, 31, 32, 44, 45, 46, 47 error, 43 forward, 31, 32, 36, 44–45, 48 kriging and use of term, 143 Prefiltering, 227–228 Projected orthogonality theorem, 154–157 PSF, 298 width scale factor, 293–284 pth-order preserving, 227 V-vector, 25
Q Quadratic filters, 7 V-vector algebra for homogeneous, 15–16 QR-based lattice algorithms, 14, 29 QR decomposition, 14 QR-RLS algorithm, Givens rotations and derivation of, 33–40 description of Givens rotations, 55–56 development of, 29–30 experimental results, 40–42 review of RLS adaptive filtering, 30–33
R Rayleigh resolution, 293 Real-zero interpolation, 110
Recursive least squares (RLS) adaptive algorithms, 13, 14 See also QR-RLS algorithm, Givens rotations and nonlinear prediction of speech, 44, 50–51, 54 problems with fast, 29–30 Recursive least squares adaptive filtering, review of, 30–33 Refinement pass, 374, 376, 381 Reproducing kernel Hilbert space (RKHS), 70–71, 73, 94 Resolution enhancement, 288–295 Richardson method, 120, 121 Riemann–Lebesgue lemma, 94 Riesz bases, 111–114 Riesz representation theorem, 70 Rodrigues formula, 91 Root-mean-square (RMS) differences, kriging and, 149, 172 Root-mean-square error (RMSE), 229 Rytov approximation, 255, 257, 264, 271
S Samples, 352 Sampling errors, 130–131 expansion, 67 exponential, 87, 88 finite, 89–91 frequency, 66 functions, 66 mapping, 96 period, 66 set of stable, 130–131 undersampling and oversampling, 97–101
Sampling formulas, orthogonal approaches to, 74–75 band-limited functions, classical, 75–78, 82–83 band-limited functions in fractional Fourier transform sense, 78–79 Bessel functions, 83–84 finite, 89–91 finite cosine transform, 80 finite sine transform, 80–82 Hankel transform, 84–85 Laguerre transform, continuous, 85 Mellin–Kramer sampling result, 87–88 multidimensional WSK theorem, 86 orthonormal bases and Hilbert spaces, 68–75 Sampling theory See also Paley–Wiener spaces applications of, 64 defined, 64 development of, 64 Poisson summation formula, 97–100 stationary stochastic processes, 124–128 steps of orthogonal, 67 Whittaker–Shannon–Kotel’nikov (WSK), 64, 65, 66, 75–76, 86 Scalar transport theory, 254–255 Scaling coefficients, 350 Scaling functions applications to digitized images, 226–231 biorthogonal construction, 204–208 defined, 197 definitions and notations, 197–199 D4, 196
fractal interpolation surfaces, 199–203 main results, 203–204 orthogonal construction, 208–209 wavelet analysis and, 350–352 Scaling series, 350 Scaling vector defined, 197 GHM, 196 orthogonal, 198 Scan order, 374, 375, 376, 381 Scattered DPDW, use of term, 268 Scattered light, use of term, 268 Schwartz inequality, 370 Self-adjoint boundary value problems, 74, 122 Separable bases, 196, 197 Set of interpolation, 131 Set of stable sampling, 130–131 Set of uniqueness, 131 Shannon, C. E., 64, 66, 90, 101–102 Shift-invariant spaces, 131–132 Siblings, significant, 375 Side information, 48, 52, 53 Signal-to-noise ratio (SNR), 52–53 assumptions, 317–320 compared with TAWS, 383–391 comparison of modulated and CW illumination, 327–329 derivation for CW illumination, 320–322 derivation for modulated illumination, 322–327 diffraction tomography and, 258, 316–338 example, 334–338 laboratory data validation, 329–334 TAWS and Wiener filtering and, 345–348 Significance pass, 374, 376, 381 Sinc function, 66, 131
Sine-wave crossings technique, 110 Sound production See also Nonlinear prediction and coding of speech and audio acoustic theory of, 42 Space-time cokriging drawbacks of, 140 forms of, 141 Space-time interpolation, challenges of, 140 Space-time kriging See also Kriging compared with kriging and cokriging, 146–149 drawbacks of, 140 forms of, 141 groundwater data application, 171–175 initialization, 166–168 prediction and, 146 projected orthogonality theorem, 154–157 temporal filtering, 164–169, 190–191 temporal smoother, 169–171, 191–192 Spatial kriging, 143–144 Spatial resolution, 293 Spectral measure, 125 Speech production. See Nonlinear prediction and coding of speech and audio SPIHT algorithm, 371 Splinelike spaces, 131–132 Stability, 11, 50–51 Stable fast transversal filter (SFTF), 29 Standard diffraction tomography. See Diffraction tomography (DT), standard Steerable wavelet transforms, 382
Stochastic processes, sampling stationary, 124–128 Stone–Weierstrass theorem, 2 Sub-V-vectors, 17 SureShrink, 363–364, 383, 384 Symmetry of Volterra kernels, 9–10 System with memory, 4
T Tagging, 175 TAWS. See Tree-adapted wavelet shrinkage TAWS-COMP, 367, 380–382 TAWS-SPIN, 380, 383, 384 Taylor series expansion, 2, 4 Temporal filtering for cokriging, 160–162, 187–188 groundwater data application, 171–175 magnetic resonance imaging application, 175–184 for space-time kriging, 164–169, 190–191 Temporal smoother for cokriging, 163–164, 188–189 groundwater data application, 171–175 for space-time kriging, 169–171, 191–192 Thresholding cycle-spin, 364–366 use of term, 359–360 Time-domain imaging, 256–257 Time gating, 256 Time-jitter error, 130 Time-shift property, 3, 12–15, 31 Titchmarsh’s theorem, 109, 114 Train of deltas, 100
Transform-domain representation, 9 Transmission geometry, 263 Transposed V-matrices, 18, 24 Tree-adapted wavelet shrinkage (TAWS) See also Wavelet analysis algorithm, 376–379 ASWDR (adaptively scanned wavelet difference reduction), 367, 373–376 compared with SNR, 383–391 comparison of Wiener filtering with, 345–348 cycle-spin thresholding, 364–366 principles of, 368–373 spin algorithm, 380 TAWS-COMP, 367, 380–382 TAWS-SPIN, 380 theory of, 367–368 transforms, 382–383 wavelet shrinkage, 358–364 Triangular form, 10 Triangular V-matrices, 19–23 Trigonometric polynomials, 90 Truncation error, 130 Turbid media. See Backpropagation, in turbid media; Diffraction tomography (DT), turbid media and
U Undersampling, 97–101 Uniform samples, 352
V VisuShrink, 358–363 V-matrix (matrices) defined, 17 identity, 18 inverse, 18, 24
main operations, 24 product of two, 18, 24 product of V-vector and, 18 sum of two, 18 transposed, 18, 24 triangular, 19–23 Volterra, Vito, 2 Volterra filters origin of, 2 V-vectors for, 25–27 Volterra kernels continuous-time, 4 discrete-time, 6 linearity and kernel coefficients, 6 symmetry of, 9–10 Volterra series expansion defined, 4 order or degree of, 5 truncated, 5, 7 Volterra series expansion, continuous nonlinear description of, 4–5 Volterra series expansion, discrete nonlinear description of, 5–7 Volterra series expansion, properties of discrete, 2 existence and convergence, 11–12 impulse responses, 11 linearity and kernel coefficients, 6 multidimensional convolution property, 9 stability, 11 symmetry of kernels, 9–10 V-vector(s) defined, 15, 16 inner product of two, 18 for linear multichannel filters, 27–28 product of V-matrix and, 18 sum of two, 17 for Volterra filters, 25–27
V-vector algebra definitions and fundamental operations, 23–24 definitions and properties, 16–23 for nonlinear coding of speech and audio, 47–48 for nonlinear prediction of speech, 43–47 quadratic homogeneous filters and, 15–16 role of, 3 time-shift property, 12–15
W Wavelet(s), 198 See also Tree-adapted wavelet shrinkage (TAWS) applications to digitized images, 226–231 basis, 348–349 for biorthogonal construction, 218–223 coefficients, 349 for orthogonal construction, 223–226 spaces, 198 theory, role of, 196 transform, 349 Wavelet analysis of analog signals, 352 of discrete signals, 352–355 of images, 355–358
scaling functions, 350–352 wavelet series, 348–350 Wavelet shrinkage, 358–364 See also Tree-adapted wavelet shrinkage Whittaker, E. T., 66 Whittaker, J. M., 66 Whittaker–Shannon–Kotel’nikov (WSK) sampling theorem, 64, 65, 66, 75–76 See also Sampling; Sampling formulas, orthogonal; Sampling theory multidimensional, 86 Poisson summation formula, 97–100 Wiener, Norbert, 2 Wiener filtering, 343, 360–361, 383, 384 comparison of TAWS with, 345–348
X X-ray tomography, 254
Y Yodh, A. G., 298
Z Zeros of band-limited functions, 108–110