ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 126
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Amsterdam Boston London New York Oxford Paris San Diego San Francisco Singapore Sydney Tokyo
This book is printed on acid-free paper. Copyright © 2003, Elsevier Science (USA).
All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2003 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2003 $35.00 Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
[email protected]. You may also complete your request on-line via the Elsevier Science homepage (http://elsevier.com), by selecting “Customer Support” and then “Obtaining Permissions.”
Academic Press An Elsevier Science Imprint. 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.academicpress.com
Academic Press 84 Theobald’s Road, London WC1X 8RR, UK http://www.academicpress.com
International Standard Book Number: 0-12-014768-8
PRINTED IN THE UNITED STATES OF AMERICA
03 04 05 06 07 08 9 8 7 6 5 4 3 2 1
CONTENTS
Contributors
Preface
Future Contributions

A Wavelet-Based Method for Multifractal Image Analysis: From Theoretical Concepts to Experimental Applications
A. Arnéodo, N. Decoster, P. Kestener, and S. G. Roux
    I. Introduction
    II. Image Processing with the 2D Continuous Wavelet Transform
    III. Test Applications of the WTMM Method to Monofractal and Multifractal Rough Surfaces
    IV. Multifractal Analysis of High-Resolution Satellite Images of Cloud Structure
    V. Multifractal Analysis of 3D Turbulence Simulation Data
    VI. Multifractal Analysis of Digitized Mammograms
    VII. Conclusion
    References

An Analysis of the Geometric Distortions Produced by Median and Related Image Processing Filters
E. R. Davies
    I. Introduction
    II. Image Filters
    III. Shifts Produced by Median Filters in Continuous Images
    IV. Shifts Produced by Median Filters in Digital Images
    V. Shifts Produced by Mean Filters
    VI. Shifts Produced by Mode Filters
    VII. Shifts Produced by Rank-Order Filters
    VIII. Rank-Order Filters—a Didactic Example
    IX. A Problem with Closing
    X. A Median-Based Corner Detector
    XI. Boundary Length Measurement Problem
    XII. Concluding Remarks
    References

Two-Photon Excitation Microscopy
Alberto Diaspro and Giuseppe Chirico
    I. Introduction
    II. Historical Notes
    III. Basic Principles of Two-Photon Excitation of Fluorescent Molecules
    IV. Behavior of Fluorescent Molecules under TPE Regime
    V. Optical Consequences and Resolution Aspects
    VI. Architecture of Two-Photon Microscopy
    VII. Application Gallery
    VIII. Conclusions
    References

Phase Closure Imaging
André Lannes
    I. Introduction
    II. Phase Space and Integer Lattices
    III. Phase Closure Operator, Phase Closure Projection, and Related Properties
    IV. Variance–Covariance Matrix of the Closure Phases
    V. Spectral Phase Closure Projection
    VI. Reference Algebraic Framework
    VII. Statement of the Phase Calibration Problem
    VIII. Phase Calibration Discrepancy and Related Results
    IX. Optimal Model Phase Shift and Related Results
    X. Special Cases
    XI. Simulated Example
    XII. Concluding Comments
    Appendix 1
    Appendix 2
    Appendix 3
    Appendix 4
    References

Three-Dimensional Image Processing and Optical Scanning Holography
Ting-Chung Poon
    I. Introduction
    II. Two-Pupil Optical Heterodyne Scanning
    III. Three-Dimensional Imaging Properties
    IV. Optical Scanning Holography
    V. Concluding Remarks
    References

Nonlinear Image Processing using Artificial Neural Networks
Dick de Ridder, Robert P. W. Duin, Michael Egmont-Petersen, Lucas J. Van Vliet, and Piet W. Verbeek
    I. Introduction
    II. Applications of ANNs in Image Processing
    III. Shared Weight Networks for Object Recognition
    IV. Feature Extraction in Shared Weight Networks
    V. Regression Networks for Image Restoration
    VI. Inspection and Improvement of Regression Networks
    VII. Conclusions
    References

Index
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
A. Arnéodo (1), Centre de Recherche Paul Pascal, 33600 Pessac, France
Giuseppe Chirico (195), LAMBS-INFM and Department of Physics, University of Milano Bicocca, 20126 Milano, Italy
E. R. Davies (93), Machine Vision Group, Department of Physics, Royal Holloway College, University of London, Egham, Surrey TW20 0EX, United Kingdom
N. Decoster (1), Noveltis, Parc Technologique du Canal, 31520 Ramonville Saint Agne, France
Alberto Diaspro (195), LAMBS-INFM and Department of Physics, University of Genoa, 16146 Genova, Italy
Robert P. W. Duin (351), Pattern Recognition Group, Department of Applied Physics, Delft University of Technology, 2628 CJ Delft, The Netherlands
Michael Egmont-Petersen (351), Decision Support Systems Group, Institute of Information and Computing Sciences, Utrecht University, 3508 TB Utrecht, The Netherlands
P. Kestener (1), Centre de Recherche Paul Pascal, 33600 Pessac, France
André Lannes (287), Sciences de l’Univers du Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (Suc-Cerfacs), F-31057 Toulouse cedex, France
Ting-Chung Poon (329), Optical Image Processing Laboratory, Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061
Dick de Ridder (351), Pattern Recognition Group, Department of Applied Physics, Delft University of Technology, 2628 CJ Delft, The Netherlands
S. G. Roux (1), Laboratoire de Physique, Ecole Normale Supérieure de Lyon, 69364 Lyon cedex 07, France
Lucas J. Van Vliet (351), Pattern Recognition Group, Department of Applied Physics, Delft University of Technology, 2628 CJ Delft, The Netherlands
Piet W. Verbeek (351), Pattern Recognition Group, Department of Applied Physics, Delft University of Technology, 2628 CJ Delft, The Netherlands
PREFACE
This latest volume of these Advances is dominated by image processing and by a major contribution on microscopy, which is also the object of much of the image processing.

The collection opens with a contribution by A. Arnéodo, N. Decoster, P. Kestener, and S. G. Roux on multifractal analysis, inspired by the need to find a consistent way of characterizing surface roughness. One of the authors of this chapter has shown that use of the continuous wavelet transform has many advantages here and that the method can be extended to the two-dimensional situation, which is of course of most practical interest. This chapter provides a full account of the method and of many realistic applications.

The second contribution is by E. R. Davies, whose work on median and rank-order filters is well known in the field of image processing. Despite their attractive features, these nonlinear filters also create distortions and these are analysed very thoroughly in this chapter. The author examines median and rank-order filters, mode filters and morphological filters and discusses the shifts that they are liable to create. A knowledge of the artefacts that can be generated is essential for anyone using these filters.

This brings us to a chapter by A. Diaspro and G. Chirico, who present a form of microscopy that is attracting great interest, namely, two-photon excitation microscopy. Alberto Diaspro is one of the leaders of this discipline and here, the principles of the technique and a range of applications are set out very clearly. Although two-photon microscopy is a fairly recent development, the basic physics has a long history, which is summarized at the beginning of the chapter with a wealth of historic illustrations.

The problem of phase calibration is particularly acute in multi-antenna radio imaging. A radically new approach based on graph-theoretic reasoning has been pioneered by A. Lannes, who explains the procedure in the fourth chapter, entitled ‘Phase closure imaging’. The mathematical fundamentals are first presented and related to realistic practical situations, after which a number of special cases are explored. This very full statement of Lannes’ solution to an important and difficult problem is thus very welcome here.

Optical scanning holography was introduced by T.-C. Poon, author of the fifth chapter. This technique is explained in terms of the two-pupil optical heterodyne scanning image processor, which leads on to the notion
of three-dimensional point-spread functions. The sine- and cosine-Fresnel zone plate hologram and the complex hologram are then introduced and finally, the use of these for three-dimensional reconstruction is explained. This is an authoritative account of a very exciting development in imaging. We close with a long contribution on nonlinear image processing in which the operations are performed on artificial neural networks. Here, D. de Ridder and colleagues from the Pattern Recognition Group in Delft University and the Institute of Information and Computing Sciences in Utrecht University first explain how image processing is performed by neural networks. They describe the various kinds of network and then discuss in detail object recognition, feature extraction, image restoration and finally, the inspection and improvement of regression networks. This chapter has the status of a monograph on the subject and will, I am sure, be heavily used. In conclusion, I thank most sincerely all the contributors for taking so much trouble to make their specialized knowledge available to a wider audience and list the contributions planned for future volumes. Peter W. Hawkes
FUTURE CONTRIBUTIONS
T. Aach (vol. 128)
    Lapped transforms
G. Abbate
    New developments in liquid-crystal-based photonic devices
S. Ando
    Gradient operators and edge and corner detection
C. Beeli
    Structure and microscopy of quasicrystals
I. Bloch (vol. 128)
    Fuzzy distance measures in image processing
G. Borgefors
    Distance transforms
B. L. Breton, D. McMullan and K. C. A. Smith (Eds)
    Sir Charles Oatley and the scanning electron microscope
A. Bretto
    Hypergraphs and their use in image modelling
Y. Cho (vol. 127)
    Scanning nonlinear dielectric microscopy
H. Delingette
    Surface reconstruction based on simplex meshes
R. G. Forbes
    Liquid metal ion sources
E. Förster and F. N. Chukhovsky
    X-ray optics
A. Fox
    The critical-voltage effect
L. Frank and I. Müllerová (vol. 128)
    Scanning low-energy electron microscopy
L. Godo and V. Torra
    Aggregation operators
A. Gölzhäuser
    Recent advances in electron holography with point sources
A. M. Grigoryan and S. S. Agaian
    Transform-based image enhancement algorithms with performance measure
A. Hanbury (vol. 128)
    Morphology on a circle
H. F. Harmuth and B. Meffert
    Calculus of finite differences in quantum electrodynamics
P. W. Hawkes (vol. 127)
    Electron optics and electron microscopy: conference proceedings and abstracts as source material
M. I. Herrera
    The development of electron microscopy in Spain
J. S. Hesthaven (vol. 127)
    Higher-order accuracy computational methods for time-domain electromagnetics
D. Hitz
    Recent progress on HF ECR ion sources
K. Ishizuka
    Contrast transfer and crystal images
G. Kögel
    Positron microscopy
W. Krakow
    Sideband imaging
N. Krueger
    The application of statistical and deterministic regularities in biological and artificial vision systems
B. Lahme
    Karhunen-Loeve decomposition
B. Lencová
    Modern developments in electron optical calculations
M. A. O’Keefe
    Electron image simulation
N. Papamarkos and A. Kesidis
    The inverse Hough transform
M. G. A. Paris and G. d’Ariano (vol. 128)
    Quantum tomography
K. S. Pedersen, A. Lee and M. Nielsen
    The scale-space properties of natural images
E. Petajan
    HDTV
M. Petrou
    Image registration
M. Rainforth
    Recent developments in the microscopy of ceramics, ferroelectric materials and glass
E. Rau
    Energy analysers for electron microscopes
H. Rauch
    The wave-particle dualism
J. J. W. M. Rosink and N. van der Vaart
    HEC sources for the CRT
O. Scherzer (vol. 128)
    Regularization techniques
G. Schmahl
    X-ray microscopy
S. Shirai
    CRT gun design methods
T. Soma
    Focus-deflection systems and their applications
J.-L. Starck
    The curvelet transform
I. Talmon
    Study of complex fluids by transmission electron microscopy
M. Tonouchi
    Terahertz radiation imaging
N. M. Towghi
    lp norm optimal filters
Y. Uchikawa
    Electron gun optics
D. van Dyck
    Very high resolution electron microscopy
K. Vaeth and G. Rajeswaran
    Organic light-emitting arrays
C. D. Wright and E. W. Hill
    Magnetic force microscopy
F. Yang and M. Paindavoine (vol. 127)
    Pre-filtering for pattern recognition using wavelet transforms and neural networks
M. Yeadon
    Instrumentation for surface studies
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 126
A Wavelet-Based Method for Multifractal Image Analysis: From Theoretical Concepts to Experimental Applications

A. ARNÉODO,1 N. DECOSTER,2 P. KESTENER,1 AND S. G. ROUX3

1 Centre de Recherche Paul Pascal, Avenue Schweitzer, 33600, Pessac, France
2 Noveltis, Parc Technologique du Canal, 2 avenue de l’Europe, 31520, Ramonville Saint Agne, France
3 Laboratoire de Physique, Ecole Normale Supérieure de Lyon, 46 allée d’Italie, 69364, Lyon cedex 07, France
I. Introduction
II. Image Processing with the 2D Continuous Wavelet Transform
    A. Analyzing Wavelets for Multiscale Edge Detection
    B. Characterizing the Local Regularity Properties of Rough Surfaces with the Wavelet Transform Modulus Maxima
        1. Isotropic Dilations
        2. Anisotropic Dilations
    C. The 2D Wavelet Transform Modulus Maxima (WTMM) Method
        1. Definition
        2. Methodology
        3. Remark
        4. Numerical Implementation
III. Test Applications of the WTMM Method to Monofractal and Multifractal Rough Surfaces
    A. Fractional Brownian Surfaces
    B. Multifractal Rough Surfaces Generated by Random Cascades on Separable Wavelet Orthogonal Basis
        1. Remark
    C. Distinguishing “Multiplicative from Additive” Processes Underlying the Scale Invariance Properties of Rough Surfaces from Space-Scale Correlation Analysis
    D. Using the 2D WTMM Method to Perform Image Processing Tasks
IV. Multifractal Analysis of High-Resolution Satellite Images of Cloud Structure
    A. Landsat Data of Marine Stratocumulus Cloud Scenes
    B. Application of the 2D WTMM Method to Landsat Images of Stratocumulus Clouds
        1. Numerical Computation of the Multifractal τ(q) and D(h) Spectra
        2. WTMMM Probability Density Functions
    C. Space-Scale Correlation Function Analysis of Radiance Landsat Images
    D. Comparative WTMM Multifractal Analysis of Landsat Radiance Field and Velocity and Temperature Fields in Fully Developed Turbulence
V. Multifractal Analysis of 3D Turbulence Simulation Data
    A. Multifractal Description of Intermittency
        1. Intermittency Based on the Velocity Field
        2. Intermittency Based on the Energy Dissipation Field
        3. Intermittency Based on the Enstrophy Field
    B. Application of the 2D WTMM Method to 2D Cuts of a Turbulent 3D Dissipation Field
        1. Remark
        2. Numerical Computation of the τ(q) and f(q) Multifractal Spectra
        3. WTMMM Probability Density Functions
        4. Space-Scale Correlation Function Analysis
    C. Application of the 2D WTMM Method to 2D Cuts of a Turbulent 3D Enstrophy Field
        1. Numerical Computation of the Multifractal τ(q) and f(q) Spectra
        2. WTMMM Probability Density Functions
        3. Space-Scale Correlation Function Analysis
    D. Discussion
VI. Multifractal Analysis of Digitized Mammograms
    A. Application of the 2D WTMM Method to Mammographic Tissue Classification: Dense and Fatty Tissues
    B. Detecting Microcalcifications through WT Skeleton Segmentation
VII. Conclusion
References

Copyright 2003 Elsevier Science (USA). All rights reserved. ISSN 1076-5670/03
I. Introduction

Ever since the explosive propagation of fractal ideas [1,2] throughout the scientific community in the late 1970s and early 1980s, there have been numerous applications to surface science [3–13]. Both real space imaging techniques (including scanning tunneling microscopy, atomic force microscopy, transmission electron microscopy, secondary electron microscopy, and optical imaging techniques) and diffraction techniques (including electron, atom, light, and X-ray scattering) have been extensively used to study rough surfaces [12]. The characterization of surface roughness is an important problem from a fundamental point of view as well as for the wealth of potential applications in applied sciences. Indeed, a wide variety of natural and technological processes lead to the formation of complex interfaces [1–18]. Assigning a fractal dimension to those irregular surfaces has now become routine in various fields including topography, defect and fracture studies, growth phenomena, erosion and corrosion processes, catalysis, and many other areas in physics, chemistry, biology, geology, meteorology, and material sciences [1–18]. For isotropic and self-similar interfaces when magnified equally in all directions, algorithms (e.g., box-counting algorithms, fixed-size and fixed-mass
correlation algorithms) were designed and shown to provide a good estimate of the fractal dimension $D_F$ [19–27]. For rough surfaces that are well described by self-affine fractals displaying anisotropic scale invariance [1,2,4,5,7,28–31], various methods (e.g., divider, box, triangle, slit-island, power spectral, variogram, and distribution methods) of computing $D_F$ were shown to give different results [32–36]. Limited resolution as well as finite-size effects are well known for introducing biases in the estimate of $D_F$, which are indeed method dependent [32,36,37]. For a documented discussion of the possible reasons for these differences in fractal dimension measurements, we refer the reader to the review article of Lea-Cox and Wang [38]. An alternative strategy consists in computing the so-called roughness exponent $H$ [1,2,4,7] that describes the scaling of the width (or thickness) of the rough interface with respect to measurement scale. Different methods (e.g., height–height correlation function, variance and power spectral methods, detrended fluctuation analysis, first return and multireturn probability distributions) [33–36,39–42] are available to estimate this exponent that is supposed to be related to the fractal dimension $D_F = d - H$ of self-affine surfaces embedded in a $d$-dimensional space. Again a number of artifacts may pollute the estimate of the roughness exponent [36]. Since sensitivity and accuracy are method dependent, the usual recommendation is to simultaneously use different tools in order to appreciate, in a quantitative way, the level of confidence in the measured exponent. But beyond some practical algorithmic limitations, there exists a more fundamental intrinsic insufficiency of fractal dimension measurement in the sense that the fractal dimension $D_F$ as well as the roughness exponent $H$ are global quantities that do not account for possible fluctuations (from point to point) of the local regularity properties of a fractal surface.
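As an illustration of the box-counting estimate of the fractal dimension mentioned above, here is a minimal sketch in Python. It is not taken from the article: the test set (a discrete Sierpinski triangle built from a bitwise rule), the function names, and the choice of box sizes are all assumptions made for this example. The set is chosen because its dimension, log 3 / log 2, is known exactly.

```python
import math


def sierpinski_points(level):
    """Lattice points (x, y) in [0, 2**level)**2 whose coordinates share no set bit.

    This bitwise rule yields a discrete Sierpinski triangle (illustrative
    test set, assumed here; its fractal dimension is log(3)/log(2)).
    """
    size = 2 ** level
    return [(x, y) for x in range(size) for y in range(size) if x & y == 0]


def count_boxes(points, box_size):
    """Number of square boxes of side box_size containing at least one point."""
    return len({(x // box_size, y // box_size) for x, y in points})


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(eps) against log(1/eps)."""
    xs = [math.log(1.0 / s) for s in box_sizes]
    ys = [math.log(count_boxes(points, s)) for s in box_sizes]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


points = sierpinski_points(7)
dimension = box_counting_dimension(points, [1, 2, 4, 8, 16, 32])
print(round(dimension, 3))  # prints 1.585
```

On this exactly self-similar set the fit is perfect. For measured rough surfaces, the limited resolution and finite-size effects discussed above would be expected to bias the slope, which is why the range of box sizes matters in practice.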
Box-counting and correlation algorithms were successfully adapted to resolve multifractal scaling for isotropic self-similar fractals by computation of the generalized fractal dimensions $D_q$ [20–26]. As to self-affine fractals, Parisi and Frisch [43] proposed, in the context of the analysis of fully developed turbulence data, an alternative multifractal description based on the investigation of the scaling behavior of the so-called structure functions [18,44]: $S_p(l) = \langle (\delta f_l)^p \rangle \sim l^{\zeta_p}$ ($p$ integer $> 0$), where $\delta f_l(x) = f(x + l) - f(x)$ is an increment of the recorded signal over a distance $l$. Then, after reinterpreting the roughness exponent as a local quantity [43,45–49]: $\delta f_l(x) \sim l^{h(x)}$, the D(h) singularity spectrum is defined as the Hausdorff dimension of the set of points x where the local roughness (or Hölder) exponent h(x) of f is h. In principle, D(h) can be attained by Legendre transforming the structure function scaling exponents $\zeta_p$ [43,48,49]. Unfortunately, as noticed by Muzy et al. [50], there are some fundamental drawbacks to the structure
function method. Indeed, it generally fails to fully characterize the D(h) singularity spectrum since only the strongest singularities of the function f itself (and not the singularities present in the derivatives of f) are a priori amenable to this analysis. Even though one can extend this study from integer to real positive p values by considering the increment absolute value, the structure functions generally do not exist for p < 0. Moreover, singularities corresponding to h > 1, as well as regular behavior, bias the estimate of $\zeta_p$ [48–50]. In previous work [47–50], one of the authors (A.A.), in collaboration with Bacry and Muzy, has shown that there exists a natural way of performing a multifractal analysis of self (multi)affine functions, which consists in using the continuous wavelet transform [51–66]. By using wavelets instead of boxes, as in classic multifractal formalism [24,67–71], one can take advantage of freedom in the choice of these “generalized oscillating boxes” to get rid of possible smooth behavior that might either mask singularities or perturb the estimation of their strength h [47–50]. The other fundamental advantage of using wavelets is that the skeleton defined by the wavelet transform modulus maxima (WTMM) [72,73] provides an adaptive space-scale partitioning from which one can extract the D(h) singularity spectrum via the scaling exponents $\tau(q)$ of some partition functions defined on the skeleton. The so-called WTMM method [47–50] therefore provides access to the entire D(h) spectrum via the usual Legendre transform $D(h) = \min_q [qh - \tau(q)]$. We refer the reader to Refs. [74,75] for rigorous mathematical results. Since the WTMM method is mainly devoted to practical applications to stochastic systems, let us point out that the theoretical treatment of random multifractal functions requires special attention. A priori, there is no reason that all the realizations of the same stochastic multifractal process correspond to a unique D(h) curve.
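The structure-function approach discussed above can be sketched numerically as follows. This is an illustrative Python example, not the authors' implementation: the test signal (an ordinary Brownian path, for which the scaling exponent of order p is expected to be p/2), the use of the absolute value of the increments, the lag range, and all function names are assumptions made for this sketch.

```python
import math
import random


def structure_function(f, lag, p):
    """Average of |f(x + lag) - f(x)|**p over all admissible positions x."""
    n = len(f) - lag
    return sum(abs(f[i + lag] - f[i]) ** p for i in range(n)) / n


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def scaling_exponent(f, p, lags):
    """Estimate the order-p exponent as the slope of log S_p(l) versus log l."""
    log_l = [math.log(l) for l in lags]
    log_s = [math.log(structure_function(f, l, p)) for l in lags]
    return fit_slope(log_l, log_s)


def brownian_path(n, seed=0):
    """Cumulative sum of independent Gaussian steps (monofractal, H = 1/2)."""
    rng = random.Random(seed)
    path, x = [], 0.0
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path


signal = brownian_path(2 ** 14)
lags = [1, 2, 4, 8, 16, 32, 64]
zeta = {p: scaling_exponent(signal, p, lags) for p in (1, 2, 3)}
for p in sorted(zeta):
    print(p, round(zeta[p], 2))
```

For this monofractal signal the estimated exponents should grow roughly linearly with p. A nonlinear dependence on p would be the signature of multifractality, subject to the limitations of the method noted in the text (no access to p < 0, bias from singularities with h > 1).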
Each realization has its own unique distribution of singularities and one crucial issue is to relate these distributions to some averaged versions computed experimentally. As emphasized by Hentschel [76], one can take advantage of the analogy that links the multifractal description to statistical thermodynamics [24,49,67,68,77], by using methods created specifically to study disorder in spin-glass theory [78]. When carrying out replica averages of the random partition function associated with a stochastic function, one gets multifractal spectra $\tau(q, n)$ that generally depend on the number of members n in the replica average (let us note that $n = 0$ and $n = 1$, respectively, correspond to commonly used quenched and annealed averaging [76]). Then, by Legendre transforming $\tau(q, n)$, some type of average D(h) spectra is found [76]. Some care is thus required when interpreting these average spectra in order to avoid some misunderstanding of the underlying physics.
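The Legendre transform that takes a spectrum of scaling exponents to a D(h) curve, D(h) = min over q of [qh - tau(q)], can be illustrated with a short numerical sketch. The quadratic form of tau(q) used below, its parameter values c1 and c2, and the q grid are hypothetical choices made for this example only. For this form the transform has the closed-form result D(h) = 1 - (h - c1)^2 / (2 c2), which the code can be checked against.

```python
def tau(q, c1=0.5, c2=0.05):
    """Hypothetical concave (parabolic) spectrum: tau(q) = -1 + c1*q - c2*q**2/2."""
    return -1.0 + c1 * q - 0.5 * c2 * q * q


def legendre_transform(tau_fn, h, q_values):
    """Numerical D(h) = min over the sampled q values of [q*h - tau(q)]."""
    return min(q * h - tau_fn(q) for q in q_values)


# Sampled q range; the minimiser q = (c1 - h)/c2 must fall inside it.
q_grid = [-10.0 + 0.01 * k for k in range(2001)]
h_values = [0.3, 0.4, 0.5, 0.6, 0.7]
spectrum = {h: legendre_transform(tau, h, q_grid) for h in h_values}

for h in h_values:
    exact = 1.0 - (h - 0.5) ** 2 / (2 * 0.05)
    print(h, round(spectrum[h], 4), round(exact, 4))
```

The minimum is taken over a finite q range, so values of h whose minimiser falls outside that range would be misestimated. With measured exponents, the accessible q range is limited in the same way by statistical convergence.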
Applications of the WTMM method to one-dimensional (1D) signals have already provided insight into a wide variety of outstanding problems [62], e.g., the validation of the cascade phenomenology of fully developed turbulence [47–49,79–87], the discovery of a Fibonacci structural ordering in 1D cuts of diffusion-limited aggregates (DLA) [88–92], the characterization and the understanding of long-range correlations in DNA sequences [93–98], and the demonstration of the existence of a causal cascade of information from large to small scales in financial time series [99,100]. Let us also note that from a fundamental point of view, the WTMM multifractal formalism [47–50,74] has been recently revisited [101–104] in order to incorporate in this statistical “canonical” description (which applies for cusp-like singularities only), the possible existence of oscillating singularities [73,101,105]. This new “grand canonical” description [102–104] allows us to compute the singularity spectrum $D(h, \beta)$, which accounts for the statistical contribution of singularities of Hölder exponent $h$ and oscillation exponent $\beta$ (where $\beta$ characterizes the local power-law divergence of the instantaneous frequency). In a recent work [106–110], we have generalized the canonical WTMM method from 1D to two-dimensional (2D), with the specific goal of achieving multifractal analysis of rough surfaces with fractal dimension $D_F$ anywhere between 2 and 3. During the past few years, increasing interest has been paid to the application of the wavelet transform (WT) to image processing [26,61,62,65,111–113]. In this context, Mallat and collaborators [72,73] have extended the WTMM representation in 2D in a manner inspired from Canny’s multiscale edge detectors commonly used in computer vision [114]. Our strategy [107,108] consists of using this representation to define a three-dimensional (3D) WT skeleton from which one can compute partition functions and ultimately extract multifractal spectra.
This article is mainly devoted to a detailed description of the 2D WTMM methodology with some test applications to random monofractal and multifractal self-affine surfaces displaying isotropic as well as anisotropic (with respect to space variables) scale similarity properties. As an illustration of the efficiency and reliability of this method, we will report the main results of its application to experimental 2D data in various domains, namely geophysics, hydrodynamics, and medicine. The article is organized as follows. In Section II, we describe the 2D WTMM representation introduced by Mallat et al. [72,73] as the equivalent of multiscale Canny edge detection. We present the continuous WT as a mathematical microscope that is well suited for characterizing the local regularity of rough surfaces. For practical purposes, the WTMM representation is emphasized as a very efficient and accurate numerical tool for scanning the singularities of fractal landscapes. We then describe the
2D WTMM method as a natural generalization of box-counting algorithms and structure function techniques previously used for multifractal analysis of isotropic self-similar interfaces and multiaffine surfaces [107,108]. Section III is devoted to the application of the 2D WTMM method to fractional Brownian surfaces [1,2,4,28] that display isotropic (with respect to space variables) scaling properties. For this class of isotropic homogeneous random rough surfaces, we address the issues of statistical convergence and finite-size effects [108]. We illustrate the ability of the 2D WTMM method to reveal and to master anisotropic scale invariance hidden in the roughness fluctuations of a random surface. We also report the results of test applications to synthetic random multifractal rough surfaces generated with a random W-cascade process on a separable wavelet orthogonal basis [109]. On a more general ground, we show that the 2D WTMM method can be used for many purposes in image processing including edge detection, pattern recognition, and image denoising. The next sections are devoted to the description of the most significant results obtained when applying the 2D WTMM method to three different experimental situations. In Section IV, we review the outcomes of the statistical analysis of high-resolution LANDSAT satellite images of cloudy scenes. This study brings into light the underlying multiplicative structure of marine stratocumulus clouds [107,110]. The multifractal properties of the stratocumulus radiance fields are further compared to previous experimental estimates performed on velocity and temperature fluctuations in high Reynolds number turbulence. 
In Section V, we report the preliminary results of the application of the 2D WTMM method to 2D cuts of the dissipation and enstrophy fields computed from direct high-resolution numerical simulations of statistically stationary 3D homogeneous and isotropic fully developed turbulent flows at a Reynolds number around 1000 ($R \simeq 1150$). This study reveals that both fields display log-normal multifractal properties but that the enstrophy field turns out to be much more intermittent than the dissipation field. From a comparison with previous experimental investigations of 1D data, we comment on the reliability of the results obtained when using 1D surrogate dissipation data. In Section VI, we apply the 2D WTMM method to perform a multifractal analysis of digitized mammograms [115]. We show that this method can be used to classify fatty and dense areas of breast tissue. We further demonstrate that this method provides a very efficient way to detect tumors as well as microcalcifications, which correspond to much stronger singularities than those involved in the background tissue roughness fluctuations. These preliminary results indicate that the texture discriminatory power of the 2D WTMM method may lead to significant improvement in computer-assisted diagnosis in digitized mammograms. We conclude in Section VII.
MULTIFRACTAL IMAGE ANALYSIS
II. Image Processing with the 2D Continuous Wavelet Transform

A. Analyzing Wavelets for Multiscale Edge Detection

The edges of the different structures that appear in an image are often the most important features for pattern recognition. Hence, in computer vision [116,117], a large class of edge detectors looks for points where the gradient of the image intensity has a modulus that is locally maximum in its direction. As originally noticed by Mallat and collaborators [72,73], with an appropriate choice of the analyzing wavelet, one can reformulate Canny's multiscale edge detector [114] in terms of a 2D wavelet transform. The general idea is to start by smoothing the discrete image data by convolving it with a filter and then to compute the gradient of the smoothed signal. Let us consider two wavelets that are, respectively, the partial derivatives with respect to x and y of a 2D smoothing function $\phi(x, y)$:

$$\psi_1(x, y) = \frac{\partial \phi(x, y)}{\partial x} \quad \text{and} \quad \psi_2(x, y) = \frac{\partial \phi(x, y)}{\partial y} \tag{1}$$
We will assume that $\phi$ is a well-localized (around $x = y = 0$) isotropic function that depends on $|\mathbf{x}|$ only. In this work, we will mainly use the Gaussian function:

$$\phi(x, y) = e^{-(x^2 + y^2)/2} = e^{-|\mathbf{x}|^2/2} \tag{2}$$

as well as the isotropic Mexican hat:

$$\phi(\mathbf{x}) = (2 - \mathbf{x}^2)\, e^{-|\mathbf{x}|^2/2} \tag{3}$$
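The corresponding analyzing wavelets $\psi_1$ and $\psi_2$ are illustrated in Figure 1. They have one and three vanishing moments when using, respectively, the Gaussian function [Eq. (2)] and the Mexican hat [Eq. (3)] as smoothing function. For any function $f(x, y) \in L^2(\mathbb{R}^2)$, the wavelet transform with respect to $\psi_1$ and $\psi_2$ has two components and therefore can be expressed in a vectorial form:

$$\mathbf{T}_\psi[f](\mathbf{b}, a) = \begin{pmatrix} T_{\psi_1}[f] = a^{-2} \displaystyle\int d^2\mathbf{x}\; \psi_1\big(a^{-1}(\mathbf{x} - \mathbf{b})\big)\, f(\mathbf{x}) \\[2ex] T_{\psi_2}[f] = a^{-2} \displaystyle\int d^2\mathbf{x}\; \psi_2\big(a^{-1}(\mathbf{x} - \mathbf{b})\big)\, f(\mathbf{x}) \end{pmatrix} \tag{4}$$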
Figure 1. The analyzing wavelets $\psi_1$ and $\psi_2$ defined in Eq. (1). First-order analyzing wavelets obtained from a Gaussian smoothing function $\phi$ [Eq. (2)]: (a) $\psi_1$; (b) $\psi_2$. Third-order analyzing wavelets obtained from the isotropic Mexican hat smoothing function $\phi$ [Eq. (3)]: (c) $\psi_1$; (d) $\psi_2$.
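Then, after a straightforward integration by parts, one gets:

$$\mathbf{T}_\psi[f](\mathbf{b}, a) = a^{-2}\, \nabla \left\{ \int d^2\mathbf{x}\; \phi\big(a^{-1}(\mathbf{x} - \mathbf{b})\big)\, f(\mathbf{x}) \right\} = \nabla \big\{ T_\phi[f](\mathbf{b}, a) \big\} = \nabla \big\{ \phi_{\mathbf{b},a} * f \big\} \tag{5}$$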
If $\phi(\mathbf{x})$ is simply a smoothing filter like the Gaussian function [Eq. (2)], then Eq. (5) amounts to defining the 2D wavelet transform as the gradient vector of $f(\mathbf{x})$ smoothed by dilated versions $\phi(a^{-1}\mathbf{x})$ of this filter. If $\phi(\mathbf{x})$ has some vanishing moments, then $T_\phi[f](\mathbf{b}, a)$ in Eq. (5) is nothing but the continuous 2D wavelet transform of $f(\mathbf{x})$ as originally defined by Murenzi [118,119], provided $\phi(\mathbf{x})$ is an isotropic analyzing wavelet so that the integration over the angle $\theta$ becomes trivial. As far as notation is concerned, we will mainly use the representation involving the modulus and the argument of the wavelet transform:

$$\mathbf{T}_\psi[f](\mathbf{b}, a) = \big( M_\psi[f](\mathbf{b}, a),\; A_\psi[f](\mathbf{b}, a) \big) \tag{6}$$

with

$$M_\psi[f](\mathbf{b}, a) = \Big\{ \big[ T_{\psi_1}[f](\mathbf{b}, a) \big]^2 + \big[ T_{\psi_2}[f](\mathbf{b}, a) \big]^2 \Big\}^{1/2} \tag{7}$$

and

$$A_\psi[f](\mathbf{b}, a) = \mathrm{Arg}\big( T_{\psi_1}[f](\mathbf{b}, a) + i\, T_{\psi_2}[f](\mathbf{b}, a) \big) \tag{8}$$
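As a rough illustration of Eqs. (4) to (8), the following Python sketch evaluates the two components of the transform as the gradient of the Gaussian-smoothed image, together with the modulus and the argument. It is only a sketch under stated assumptions: it relies on NumPy, treats the image as periodic (as implied by the FFT), uses unit pixel spacing, and adopts the convention $\mathbf{T} = a \nabla(\phi_a * f)$ with $\phi_a(\mathbf{x}) = a^{-2}\phi(\mathbf{x}/a)$. The function name `wavelet_transform_2d`, the prefactor, and the sign convention are choices made for this example and are not taken from the text.

```python
import numpy as np

def wavelet_transform_2d(f, a):
    """Gradient of the image f smoothed by a Gaussian of scale a (in pixels).

    Returns (t1, t2, modulus, argument), where t1 and t2 are the x and y
    components of a * grad(phi_a * f), with phi_a(x) = a**-2 * phi(x / a)
    and phi the Gaussian of Eq. (2). Boundaries are treated as periodic.
    """
    ny, nx = f.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx)   # angular frequencies along x (columns)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny)   # angular frequencies along y (rows)
    KX, KY = np.meshgrid(kx, ky)            # both of shape (ny, nx)

    # Fourier transform of phi_a: 2*pi*exp(-a^2 |k|^2 / 2)
    phi_hat = 2.0 * np.pi * np.exp(-0.5 * a**2 * (KX**2 + KY**2))
    smooth_hat = phi_hat * np.fft.fft2(f)

    # Differentiation is a multiplication by i*k in Fourier space
    t1 = np.fft.ifft2(1j * a * KX * smooth_hat).real
    t2 = np.fft.ifft2(1j * a * KY * smooth_hat).real

    modulus = np.hypot(t1, t2)              # Eq. (7)
    argument = np.arctan2(t2, t1)           # Eq. (8), in [-pi, pi]
    return t1, t2, modulus, argument
```

A quick sanity check is a pure sinusoid $f = \sin(k_0 x)$ whose period divides the image size: under the assumptions above, the first component should equal $2\pi\, a k_0\, e^{-a^2 k_0^2/2} \cos(k_0 x)$ and the second should vanish.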
B. Characterizing the Local Regularity Properties of Rough Surfaces with the Wavelet Transform Modulus Maxima

In the present work, we will use the term rough surface for an irregular surface on which there are no overhanging regions. This means that the surface can be correctly described by a single-valued self-affine function satisfying: $\forall \mathbf{x}_0 = (x_0, y_0) \in \mathbb{R}^2$, $\forall \mathbf{x} = (x, y) \in \mathbb{R}^2$ in the neighborhood of $\mathbf{x}_0$, $\exists H \in \mathbb{R}$ such that, for any $\lambda > 0$, one has [1,2,4,5,7,28–30]:

$$f(x_0 + \lambda x,\; y_0 + \lambda^{\alpha} y) - f(x_0, y_0) \simeq \lambda^{H} \big[ f(x_0 + x,\; y_0 + y) - f(x_0, y_0) \big] \tag{9}$$
If f is a stochastic process, this identity holds in law for fixed $\lambda$ and $\mathbf{x}_0$. According to the value of the exponent $\alpha$, this self-affine function will display either isotropic scale invariance with respect to the space variables ($\alpha = 1$) or anisotropic scale invariance ($\alpha \neq 1$) [36,120–123]. The Hurst exponent H characterizes the global regularity of the function f. Let us note that if $H < 1$, then f is nowhere differentiable and that the smaller the exponent H, the more singular f. For $H = 1$ and $\alpha = 1$, the rough surface defined by f in $\mathbb{R}^3$ is a self-similar fractal in the sense that it is invariant under some isotropic dilations [1,2,36,121–123]. In various contexts [1–18], several methods have been used to estimate the Hurst exponent of self-affine functions. In most studies, isotropic scale invariance was used as a prerequisite for the application of commonly used methods to the analysis of 1D fractal landscapes, e.g., the height–height correlation function, the variance and power spectral methods, the detrended fluctuation analysis, and the first return and multireturn probability distributions [33–36,39–42]. The strategy followed in these studies reduces the analysis of rough surfaces to the investigation of self-affine (1D) profiles obtained through 2D cuts in a three-dimensional representation. As long as the estimate of the Hurst exponent H is independent of the intersection plane, there is no inconsistency in the methodology. When H is found to be sensitive to the orientation of the intersecting plane, this means that the isotropic scale invariance hypothesis does not apply and that one needs to have recourse to methods fully adapted to the characterization of rough surfaces. Unfortunately, to our knowledge, most of the methods listed above have been extended to self-affine functions from $\mathbb{R}^2$ to $\mathbb{R}$ under the implicit assumption of isotropic scaling.
But fractal functions generally display multiaffine properties in the sense that their roughness (or regularity) fluctuates from point to point [43,45–49]. To describe these multifractal functions, one thus needs to change slightly the definition of the Hurst regularity of f so that it becomes a local quantity $h(\mathbf{x}_0)$. A rigorous definition of the Hölder exponent (as the strength of a singularity of a function f at the point $\mathbf{x}_0$) is given by the largest exponent $h(\mathbf{x}_0)$ such that there exists a polynomial of degree $n < h(\mathbf{x}_0)$ and a constant $C > 0$, so that for any point $\mathbf{x}$ in the neighborhood of $\mathbf{x}_0$ one has [72,73,106–108]

$$|f(\mathbf{x}) - P_n(\mathbf{x} - \mathbf{x}_0)| \le C\, |\mathbf{x} - \mathbf{x}_0|^{h(\mathbf{x}_0)} \tag{10}$$
If f is n times continuously differentiable at the point $\mathbf{x}_0$, then one can use for the polynomial $P_n(\mathbf{x} - \mathbf{x}_0)$ the order-n Taylor series of f at $\mathbf{x}_0$ and thus prove that $h(\mathbf{x}_0) > n$. Thus $h(\mathbf{x}_0)$ measures how irregular the function f is at the point $\mathbf{x}_0$. The higher the exponent $h(\mathbf{x}_0)$, the more regular the function f. In this work, we will mainly consider fractal functions of two variables that possess only cusp-like singularities. (We refer the reader to Ref. [124] for rigorous mathematical results concerning 2D oscillating singularities or chirps.) But the situation is a little more tricky than in 1D. Indeed one has to distinguish two main cases depending on whether scale invariance is under isotropic or anisotropic dilations [1,2,36,108,121–123,125].

1. Isotropic Dilations

Local scale invariance under isotropic dilations means that locally, around the point $\mathbf{x}_0$, the function f behaves as

$$f(\mathbf{x}_0 + \lambda \mathbf{u}) - f(\mathbf{x}_0) \simeq \lambda^{h(\mathbf{x}_0)} \big[ f(\mathbf{x}_0 + \mathbf{u}) - f(\mathbf{x}_0) \big] \tag{11}$$
where $\lambda > 0$ and $\mathbf{u}$ is a unit vector. If the scaling exponent $h(\mathbf{x}_0)$ does not depend upon the direction of $\mathbf{u}$, then f displays isotropic local scale invariance around $\mathbf{x}_0$ and the corresponding singularity is of Hölder exponent $h(\mathbf{x}_0)$. If, on the contrary, the scaling exponent depends upon the direction of $\mathbf{u}$, then the Hölder exponent is the minimum value of h over all the possible orientations of $\mathbf{u}$. Thus f displays anisotropic scale invariance around $\mathbf{x}_0$ with one, several, or a continuum of privileged directions along which the variation of f defines the Hölder exponent of the singularity located at $\mathbf{x}_0$.

2. Anisotropic Dilations

Local scale invariance under anisotropic dilations means that locally around the point $\mathbf{x}_0$, the function f behaves as [120–123,125]

$$f\big[\mathbf{x}_0 + \Lambda_\alpha(\lambda)\, r_\theta\, \mathbf{u}\big] - f(\mathbf{x}_0) \simeq \lambda^{h(\mathbf{x}_0)} \big[ f(\mathbf{x}_0 + \mathbf{u}) - f(\mathbf{x}_0) \big] \tag{12}$$

where $\lambda > 0$ and $\mathbf{u}$ is a unit vector. $r_\theta$ is a rotation matrix and $\Lambda_\alpha(\lambda)$ is a positive diagonal $2 \times 2$ matrix that accounts for anisotropic self-affine scale transformation in the $\theta$-rotated referential with origin $\mathbf{x}_0$:

$$\Lambda_\alpha(\lambda) = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda^{\alpha} \end{pmatrix} \tag{13}$$
The function f thus displays anisotropic scale invariance around $\mathbf{x}_0$ and the Hölder exponent is given by the behavior of f in the direction $\theta$ ($\alpha < 1$) or $\theta + \pi/2$ ($\alpha > 1$).

Very much like the wavelet transform analysis of cusp singularities in 1D [47–49,74], in order to recover the Hölder exponent $h(\mathbf{x}_0)$ of a function f from $\mathbb{R}^2$ to $\mathbb{R}$, one needs to study the behavior of the wavelet transform modulus inside a cone $|\mathbf{x} - \mathbf{x}_0| < Ca$ in the (space-scale) half space [106,108,126]. As originally proposed by Mallat and collaborators [72,73], a very efficient way to perform point-wise regularity analysis is to use the wavelet transform modulus maxima. In the spirit of Canny edge detection [114], at a given scale a, the WTMM are defined as the points $\mathbf{b}$ where the wavelet transform modulus $M_\psi[f](\mathbf{b}, a)$ [Eq. (7)] is locally maximum along the gradient direction given by the wavelet transform argument $A_\psi[f](\mathbf{b}, a)$ [Eq. (8)]. These modulus maxima are inflection points of $f * \phi_a(\mathbf{x})$. As illustrated in the examples just below, these WTMM lie on connected chains hereafter called maxima chains [106–108]. In theory, one only needs to record the position of the local maxima of $M_\psi$ along the maxima chains together with the value of $M_\psi[f]$ and $A_\psi[f]$ at the corresponding locations. At each scale a, our wavelet analysis thus reduces to storing those WTMM maxima (WTMMM) only. They indicate locally the direction where the signal has the sharpest variation. This orientation component is the main difference between 1D and 2D wavelet transform analysis. These WTMMM are disposed along connected curves across scales called maxima lines [107,108]. We will define the WT skeleton as the set of maxima lines that converges to the (x, y)-plane in the limit $a \to 0^+$. This WT skeleton is likely to contain all the information concerning the local Hölder regularity properties of the function f under consideration [108].

Example 1. Isotropic singularity interacting with a localized smooth structure.

Let us first illustrate the above definitions on the function $f_1$ shown in Figure 2:

$$f_1(\mathbf{x}) = A\, e^{-(\mathbf{x} - \mathbf{x}_1)^2 / 2\sigma^2} + B\, |\mathbf{x} - \mathbf{x}_0|^{0.3} \tag{14}$$
This function is $C^\infty$ everywhere except at $\mathbf{x} = \mathbf{x}_0$ where $f_1$ is isotropically singular with a Hölder exponent $h(\mathbf{x}_0) = 0.3$. Its 2D wavelet transform [Eq. (4)] with a first-order analyzing wavelet [the smoothing function $\phi(\mathbf{x})$ is
Figure 2. Three-dimensional representation of the function $f_1(\mathbf{x}) = A\, e^{-(\mathbf{x} - \mathbf{x}_1)^2/2\sigma^2} + B\, |\mathbf{x} - \mathbf{x}_0|^{0.3}$. The isotropic singularity S is located at $\mathbf{x}_0 = (-256, -256)$. The Gaussian localized structure G of width $\sigma = 128$ is located at $\mathbf{x}_1 = (256, 256)$. The parameter values are $A = 1$ and $B = -1$.
Figure 3. Wavelet transform [Eq. (4)] of the function $f_1$ shown in Figure 2, with a first-order analyzing wavelet ($\phi$ is the isotropic Gaussian function). (a) $T_{\psi_1}[f_1]$; (b) $T_{\psi_2}[f_1]$ coded using 32 gray levels from white (min $T_\psi$) to black (max $T_\psi$). (c) $M_\psi[f_1]$ coded from white ($M_\psi = 0$) to black (max $M_\psi$). (d) $|A_\psi[f_1]|$ coded from white ($|A_\psi| = 0$) to black ($|A_\psi| = \pi$). The considered scale is $a = 2^3 \sigma_W$ where $\sigma_W = 13$ (pixels) is the characteristic size of $\psi$ at the smallest resolved scale.
the isotropic Gaussian function] is shown in Figure 3 for a given scale $a = 2^3 \sigma_W$, where $\sigma_W = 13$ is the width (in pixel units) of the analyzing wavelet at the smallest scale where it is still well enough resolved. Indeed, $\sigma_W$ is the smallest scale (or the highest resolution) accessible to our wavelet transform microscope. $T_{\psi_1}[f_1]$ and $T_{\psi_2}[f_1]$ [Eq. (4)] are shown in Figure 3a and b,
respectively. The corresponding modulus $M_\psi[f_1]$ and argument $A_\psi[f_1]$ are represented in Figure 3c and d. From a simple visual inspection of Figure 3c, one can convince oneself that the modulus is radially symmetric around $\mathbf{x}_0$ where the singularity S is located. This is confirmed in Figure 3d where $A_\psi[f_1]$ rotates uniformly from 0 to $2\pi$ around $\mathbf{x}_0$. The WTMM as well as the WTMMM are shown in Figure 4 for various values of the scale parameter a ranging from $a = 2^{3.5} \sigma_W$ (Fig. 4a) to $2^{7.5} \sigma_W$ (Fig. 4f). At small scale, there exist mainly two maxima chains. One is a closed curve around $\mathbf{x}_0$ at which the
Figure 4. Maxima chains (solid line) defined by the WTMM of the function $f_1$ (Fig. 3). The local maxima (respectively minima) along these chains are indicated by (●) [respectively (○)] from which originates an arrow whose length is proportional to $M_\psi[f_1]$ and whose direction (with respect to the x-axis) is given by the WTMM argument $A_\psi[f_1]$. The scale parameter is $a = 2^{3.5}$ (a), $2^{4.7}$ (b), $2^{5.5}$ (c), $2^{6.3}$ (d), $2^{6.8}$ (e), and $2^{7.5}$ (f) in $\sigma_W$ units. Same first-order analyzing wavelet as in Figure 3.
Figure 5. Three-dimensional representation of the topological evolution of the WTMM chains of $f_1$ in the space-scale half-hyperplane. The WTMMM (●) are disposed on connected curves called maxima lines. These maxima lines are obtained by linking each WTMMM computed at a given scale to the nearest WTMMM computed at the scale just above. There exist two maxima lines, $\mathcal{L}_{\mathbf{x}_0}(a)$ and $\mathcal{L}_{\mathbf{x}_1}(a)$, pointing, respectively, to the singularity S and to the smooth localized structure G in the limit $a \to 0^+$.
singularity S is located. The other one is an open curve that partially surrounds G. On each of these maxima chains, one finds only one WTMMM (●) whose corresponding arguments are such that the gradient vector points to S and G, respectively. As far as the singularity S is concerned, this means that the direction of largest variation of $f_1$ around S is given by $\theta_{\mathbf{x}_0} = A_\psi[f_1] + \pi$, where $A_\psi[f_1]$ is the argument of the corresponding WTMMM. When increasing the scale parameter, the maxima chains evolve; in particular the closed maxima chain around S swells (its characteristic size behaves like a) until it connects with the maxima chain associated with G (Fig. 4d) to form a single closed curve surrounding both S and G (Fig. 4f). The topological evolution of the maxima chains in the space-scale half-hyperplane is illustrated in Figure 5. This three-dimensional representation highlights the existence of two maxima lines obtained by linking the WTMMM step by step (i.e., as continuously as possible) from small to large scales. One of these maxima
Figure 6. Evolution of $M_\psi[f_1]$ and $A_\psi[f_1]$ when following, from large scale to small scale, the maxima lines $\mathcal{L}_{\mathbf{x}_0}(a)$ and $\mathcal{L}_{\mathbf{x}_1}(a)$ pointing, respectively, to the singularity S [(a) and (c), respectively] and to the localized smooth structure G [(b) and (d), respectively]. The symbols (●) and (○) have the same meaning as in Figure 4. Same first-order analyzing wavelet as in Figure 3.
lines points to the singularity S in the limit $a \to 0^+$. As shown in Figure 6a, along this maxima line [$\mathcal{L}_{\mathbf{x}_0}(a)$], the wavelet transform modulus behaves as [72,73]

$$M_\psi[f_1]\big[\mathcal{L}_{\mathbf{x}_0}(a)\big] \sim a^{h(\mathbf{x}_0)}, \quad a \to 0^+ \tag{15}$$
where $h(\mathbf{x}_0) = 0.3$ is the Hölder exponent of S. Moreover, along this maxima line, the wavelet transform argument evolves toward the value (Fig. 6c):

$$A_\psi[f_1]\big(\mathcal{L}_{\mathbf{x}_0}(a)\big) = \pi + \theta_{\mathbf{x}_0} \tag{16}$$
in the limit $a \to 0^+$, where $\theta_{\mathbf{x}_0}$ is nothing but the direction of the largest variation of $f_1$ around $\mathbf{x}_0$, i.e., the direction to follow from $\mathbf{x}_0$ to cross the maxima line at a given (small) scale. From the maxima line $\mathcal{L}_{\mathbf{x}_0}(a)$, one thus gets the required amplitude as well as directional information to characterize the local Hölder regularity of $f_1$ at $\mathbf{x}_0$. Note that along the other maxima line $\mathcal{L}_{\mathbf{x}_1}(a)$ that points to $\mathbf{x}_1$ where the smooth localized structure G is located, the wavelet transform modulus behaves as (Fig. 6b)

$$M_\psi[f_1]\big(\mathcal{L}_{\mathbf{x}_1}(a)\big) \sim a^{n_\psi}, \quad a \to 0^+ \tag{17}$$

where $n_\psi = 1$ is the order of the analyzing wavelet.
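The scaling law of Eq. (15) can be checked numerically on an isolated cusp. The sketch below is only illustrative and rests on several assumptions: it uses NumPy, a 512 × 512 periodic grid, and a single singularity $f(\mathbf{x}) = -|\mathbf{x}|^{0.3}$ (the Gaussian structure G of Eq. (14) is left out). It also replaces the tracking of the maxima line by the global maximum of the modulus at each scale, which for this radially symmetric test case lies on the maxima chain surrounding the singularity. The convention $\mathbf{T} = a\nabla(\phi_a * f)$ and all names are choices made for this example.

```python
import numpy as np

def gradient_modulus(f, a):
    """Modulus of a * grad(phi_a * f) for the Gaussian phi of Eq. (2), via FFT.

    The overall 2*pi normalization of phi is dropped: it does not affect
    the scaling exponent.
    """
    ny, nx = f.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny)
    KX, KY = np.meshgrid(kx, ky)
    smooth_hat = np.exp(-0.5 * a**2 * (KX**2 + KY**2)) * np.fft.fft2(f)
    t1 = np.fft.ifft2(1j * a * KX * smooth_hat).real
    t2 = np.fft.ifft2(1j * a * KY * smooth_hat).real
    return np.hypot(t1, t2)

# Isolated isotropic cusp of Hoelder exponent 0.3 at the center of the grid
n = 512
y, x = np.mgrid[0:n, 0:n] - n // 2
h_true = 0.3
f = -np.hypot(x, y) ** h_true

# Largest modulus at each scale (scales in pixels, well inside the grid size)
scales = np.array([4.0, 8.0, 16.0, 32.0])
m_max = np.array([gradient_modulus(f, a).max() for a in scales])

# Slope of ln M versus ln a estimates h(x0), as in Eq. (15)
h_est = np.polyfit(np.log(scales), np.log(m_max), 1)[0]
```

Under these assumptions the fitted slope `h_est` should come out close to the prescribed exponent 0.3, up to discretization and finite-size effects.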
Figure 7. WTMM analysis of the function $f_2(\mathbf{x})$ defined in Eq. (18). (a) $f_2(\mathbf{x})$ as coded using 32 gray levels from white (min $f_2$) to black (max $f_2$). The maxima chains (solid line) and the WTMMM (●) are shown for the following values of the scale parameter $a = 2$ (b), $2^8$ (c), and $2^{11}$ (d) in $\sigma_W$ units. Same first-order analyzing wavelet as in Figure 3.
Example 2. Anisotropic singularity.

Let us illustrate with a specific example the possibility for a function $f_2(\mathbf{x})$ to display anisotropic local scale invariance with respect to isotropic dilations. In Figure 7a the following function is represented:

$$f_2(\mathbf{x}) = f_2(\rho, \theta) = -\rho^{h(\theta)} \tag{18}$$

with

$$h(\theta) = 0.3 \sin(\theta - 2\pi/3) + 0.5 \tag{19}$$

The exponent $h(\theta)$ is nothing but the Hölder exponent at $\rho = 0$ of the 1D profile obtained when intersecting the image in Figure 7a along the direction $\theta$. As far as the whole 2D problem is concerned, the Hölder exponent of the singularity S is $h(\mathbf{x}_0) = \min_\theta h(\theta) = 0.2$. It quantifies the sharpest variation of $f_2(\mathbf{x})$, which occurs in the direction $\theta_{\mathbf{x}_0} = \pi/6$. As shown in Figure 7b–d for different zooms, there exists at each scale only one WTMMM, which belongs to a unique maxima line $\mathcal{L}_{\mathbf{x}_0}(a)$ pointing to the singularity S. Note that this WTMMM is located in the direction $\theta_{\mathbf{x}_0} = \pi/6$ from the origin. When following $\mathcal{L}_{\mathbf{x}_0}(a)$ from large to small scales, $M_\psi[f_2][\mathcal{L}_{\mathbf{x}_0}(a)]$ behaves as a power law with an exponent $h(\mathbf{x}_0) = 0.2$ (Fig. 8a), in remarkable agreement with the theoretical prediction for the Hölder exponent of S. Moreover, when investigating $A_\psi[f_2][\mathcal{L}_{\mathbf{x}_0}(a)]$, one further gets directional
Figure 8. Evolution of (a) $M_\psi[f_2]$ and (b) $A_\psi[f_2]$ when following, from large to small scales, the maxima line $\mathcal{L}_{\mathbf{x}_0}(a)$ (●), which points to the singularity S. Same first-order analyzing wavelet as in Figure 7.
information: $A_\psi[f_2] = -5\pi/6 = \theta_{\mathbf{x}_0} - \pi$, from which one learns about the possible existence of some preferential direction as far as the Hölder regularity properties are concerned. We will not treat here the case of local scale invariance with respect to anisotropic self-affine dilations. We refer the reader to Arneodo et al. [108] where the 2D WTMM method has been applied to random self-affine rough surfaces.

C. The 2D Wavelet Transform Modulus Maxima (WTMM) Method

Before describing the methodology to be used to perform a multifractal analysis of rough surfaces, we need to define the notion of singularity spectrum of a fractal function from $\mathbb{R}^2$ into $\mathbb{R}$ [108].

1. Definition

Let f be a function from $\mathbb{R}^2$ into $\mathbb{R}$ and $S_h$ the set of all the points $\mathbf{x}_0$ so that the Hölder exponent [Eq. (10)] of f at $\mathbf{x}_0$ is h. The singularity spectrum D(h) of f is the function that associates with any h the Hausdorff dimension of $S_h$:

$$D(h) = d_H \big\{ \mathbf{x} \in \mathbb{R}^2,\; h(\mathbf{x}) = h \big\} \tag{20}$$
In the previous section, we have seen that the maxima lines defined from the WTMMM computed at different scales can be used as a scanner of singularities. They allow us to detect the positions where the singularities are located as well as to estimate their strength h. A rather naive way to compute the D(h) singularity spectrum would thus consist in identifying the
set of maxima lines along which the wavelet transform modulus behaves with a power-law exponent h [Eq. (15)] and then to use classical box-counting techniques [19–27] to compute the fractal dimension D(h) of the set of points $\{\mathbf{x}_n\} \subset \mathbb{R}^2$ to which these maxima lines converge. Unfortunately, when investigating deterministic as well as random fractal functions, the situation is somewhat more intricate than when dealing with isolated singularities. The characteristic feature of these singular functions is the existence of a hierarchical distribution of singularities [47–50,62–65]. Locally, the Hölder exponent $h(\mathbf{x}_0)$ is then governed by the singularities that accumulate at $\mathbf{x}_0$. This results in unavoidable oscillations around the expected power-law behavior of the wavelet transform modulus [47–50,79]. The exact determination of h from log–log plots on a finite range of scales is therefore somewhat uncertain [127,128]. Note that there have been many attempts to circumvent these difficulties in 1D [79,129]. But in 2D (rough surfaces) as well as in 1D (multiaffine profiles), there exist fundamental limitations (which are not intrinsic to the wavelet technique) to the local measurement of the Hölder exponents of a fractal function. Therefore, the determination of statistical quantities like the D(h) singularity spectrum requires a method that is more feasible and more appropriate than a systematic investigation of the wavelet transform local scaling behavior as experienced [127,128].
2. Methodology

Our strategy will consist in mapping the methodology developed [47–50] for multifractal analysis of irregular 1D landscapes to the statistical characterization of roughness fluctuations of 2D surfaces [107,108]. The 2D WTMM method relies upon the space-scale partitioning given by the wavelet transform skeleton. As discussed in Section II.B, this skeleton (see Fig. 12) is defined by the set of maxima lines that points to the singularities of the considered function and therefore is likely to contain all the information concerning the fluctuations of point-wise Hölder regularity. Let us define $\mathcal{L}(a)$ as the set of all maxima lines that exist at the scale a and that contain maxima at any scale $a' \le a$. The important feature is that each time the analyzed image has a Hölder exponent $h(\mathbf{x}_0) < n_\psi$, there is at least one maxima line pointing toward $\mathbf{x}_0$ along which Eq. (15) is expected to hold. In the case of fractal functions, we thus expect that the number of maxima lines will diverge in the limit $a \to 0^+$, as the signature of the hierarchical organization of the singularities. The WTMM method consists in defining the following partition functions directly from the WTMMM that belong to the wavelet transform skeleton:

$$Z(q, a) = \sum_{\mathcal{L} \in \mathcal{L}(a)} \left[ \sup_{(\mathbf{x}, a') \in \mathcal{L},\; a' \le a} M_\psi[f](\mathbf{x}, a') \right]^q \tag{21}$$
where $q \in \mathbb{R}$. As compared to classic box-counting techniques [19–27], the analyzing wavelet $\psi$ plays the role of a generalized "oscillating box", the scale a defines its size, while the WTMM skeleton indicates how to position our oscillating boxes to obtain a partition (of $S = \cup S_h$) at the considered scale. Without the "sup" in Eq. (21), one would have implicitly considered a uniform covering with wavelets of the same size a. As emphasized [47–50,74], the "sup" can be regarded as a way of defining a "Hausdorff-like" scale-adaptive partition that will prevent divergences from showing up in the calculation of $Z(q, a)$ for $q < 0$. Now, from the analogy that links the multifractal formalism to thermodynamics [48,49,67–69,76,77], one can define the exponent $\tau(q)$ from the power-law behavior of the partition function:

$$Z(q, a) \sim a^{\tau(q)}, \quad a \to 0^+ \tag{22}$$
where q and $\tau(q)$ play, respectively, the role of the inverse temperature and the free energy. The main result of the wavelet-based multifractal formalism is that in place of the energy and the entropy (i.e., the variables conjugated to q and $\tau$), one has the Hölder exponent h [Eq. (10)] and the singularity spectrum D(h) [Eq. (20)]. This means that the D(h) singularity spectrum of f can be determined from the Legendre transform of the partition function scaling exponent $\tau(q)$:

$$D(h) = \min_q \big[ qh - \tau(q) \big] \tag{23}$$
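For a differentiable and concave $\tau(q)$, the minimum in Eq. (23) is reached at the value of q where $h = d\tau/dq$, so that $D(h) = qh - \tau(q)$ can be evaluated parametrically in q. The following Python sketch illustrates this with NumPy finite differences. The quadratic $\tau(q)$ and its coefficients `c0`, `c1`, `c2` are purely hypothetical test values chosen so that the result can be compared with a closed form; no smoothing of the $\tau(q)$ curve is attempted.

```python
import numpy as np

def legendre_spectrum(q, tau):
    """Parametric Legendre transform of tau(q).

    Returns h(q) = d tau / d q and D(h(q)) = q * h(q) - tau(q),
    which coincides with Eq. (23) when tau(q) is concave.
    """
    h = np.gradient(tau, q, edge_order=2)   # finite-difference derivative
    D = q * h - tau
    return h, D

# Hypothetical concave quadratic tau(q), used only as a test case
c0, c1, c2 = 2.0, -0.5, 0.05
q = np.linspace(-4.0, 4.0, 81)
tau = -c0 - c1 * q - 0.5 * c2 * q**2

h, D = legendre_spectrum(q, tau)
# Closed form for this test case: D(h) = c0 - (h + c1)**2 / (2 * c2)
```

With a linear $\tau(q) = qH - 2$ the same function returns a constant $h = H$ and $D = 2$, i.e., a spectrum reduced to a single point.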
From the properties of the Legendre transform, it is easy to convince oneself that homogeneous (monofractal) fractal functions that involve singularities of unique Hölder exponent $h = \partial\tau/\partial q$ are characterized by a $\tau(q)$ spectrum that is a linear function of q. On the contrary, a nonlinear $\tau(q)$ curve is the signature of nonhomogeneous functions that display multifractal properties, in the sense that the Hölder exponent $h(\mathbf{x})$ is a fluctuating quantity that depends upon the spatial position $\mathbf{x}$ (in other words the local roughness exponent is fluctuating from point to point).

3. Remark

The exponents $\tau(q)$ are much more than simply some intermediate quantities of a rather easy experimental access. For some specific values of q, they have well-known meaning [48].
$q = 0$: From Eqs. (21) and (22), one deduces that the exponent $\tau(0)$ accounts for the divergence of the number of maxima lines in the limit $a \to 0^+$. This number basically corresponds to the number of wavelets of size a required to cover the set S of singularities of f. In full analogy with standard box-counting arguments [19–27], $-\tau(0)$ can be identified with the fractal dimension (capacity) of this set:

$$-\tau(0) = d_F \big\{ \mathbf{x},\; h(\mathbf{x}) < +\infty \big\} \tag{24}$$
$q = 1$: As pointed out [48], the value of the exponent $\tau(1)$ is related to the fractal dimension (capacity) of the rough surface S defined by the function f. More precisely [130]:

$$d_F(S) = \max\big[ 2,\; 1 - \tau(1) \big] \tag{25}$$
$q = 2$: It is easy to show that the exponent $\tau(2)$ is intimately related to the scaling exponent $\beta$ of the spectral density:

$$S(k) = \frac{1}{2\pi} \int d\theta\; |\hat{f}(k, \theta)|^2 \sim k^{-\beta} \tag{26}$$

where

$$\beta = 4 + \tau(2) \tag{27}$$
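As a consistency check of Eqs. (23) to (27), consider the idealized case (an assumption made here purely for illustration) of a monofractal surface whose Hölder exponent takes the same value H, with $0 < H < 1$, at every point of the plane. The singularities then fill a set of dimension 2, $\tau(q)$ is linear in q with slope H, and the three particular values of q discussed above give:

```latex
\begin{aligned}
\tau(q) &= qH - 2, \\
-\tau(0) &= 2, \\
d_F(S) &= \max[\,2,\; 1 - \tau(1)\,] = \max[\,2,\; 3 - H\,] = 3 - H, \\
\beta &= 4 + \tau(2) = 2H + 2, \\
D(h = H) &= \min_q \,[\,qH - \tau(q)\,] = 2 .
\end{aligned}
```

The rougher the surface (the smaller H), the larger its fractal dimension and the slower the decay of its spectral density.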
From a practical point of view, the computation of the D(h) singularity spectrum, via the Legendre transform defined in Eq. (23), first requires a smoothing of the $\tau(q)$ curve. This procedure has one main disadvantage: the smoothing operation prevents the observation of any nonanalyticity in the curves $\tau(q)$ and D(h), and the interesting physics of phase transitions [49,71,131,132] in the scaling properties of fractal functions can be completely missed. As suggested [49,131,133–137], one can avoid directly performing the Legendre transform by considering the quantities h and D(h) as mean quantities defined in a canonical ensemble, i.e., with respect to their Boltzmann weights computed from the WTMMM [49,79]:

$$W_\psi[f](q, \mathcal{L}, a) = \frac{\Big| \sup_{(\mathbf{x}, a') \in \mathcal{L},\; a' \le a} M_\psi[f](\mathbf{x}, a') \Big|^q}{Z(q, a)} \tag{28}$$

where $Z(q, a)$ is the partition function defined in Eq. (21). Then one computes the expectation values:

$$h(q, a) = \sum_{\mathcal{L} \in \mathcal{L}(a)} \ln \Big| \sup_{(\mathbf{x}, a') \in \mathcal{L},\; a' \le a} M_\psi[f](\mathbf{x}, a') \Big| \; W_\psi[f](q, \mathcal{L}, a) \tag{29}$$

and

$$D(q, a) = \sum_{\mathcal{L} \in \mathcal{L}(a)} W_\psi[f](q, \mathcal{L}, a)\, \ln \big( W_\psi[f](q, \mathcal{L}, a) \big) \tag{30}$$

from which one extracts

$$h(q) = \lim_{a \to 0^+} h(q, a) / \ln a \tag{31}$$

$$D(q) = \lim_{a \to 0^+} D(q, a) / \ln a \tag{32}$$
and therefore the D(h) singularity spectrum.

4. Numerical Implementation

In this section, we briefly review the main steps of the numerical implementation of the 2D WTMM method. Let us consider an $n \times n$ digitized image of a rough surface.

Step 1: Computation of the 2D wavelet transform. We compute the two components $T_{\psi_1}$ and $T_{\psi_2}$ of the wavelet transform [Eq. (4)] in the Fourier domain, using 2D Fast Fourier Transform (FFT) [138] and inverse FFT. We start our analysis by choosing the analyzing wavelet among the class of radially isotropic wavelets defined in Section II.A (Fig. 1). To master edge effects we focus only on the $n/2 \times n/2$ central part of the image where our wavelet coefficients can be shown to be not affected by the boundary of the original image. This means that we will be careful not to increase the scale parameter a above a critical value $a_{\max}$ so that the $n/2 \times n/2$ central wavelet coefficients remain free of finite-size effects. In the opposite limit, we will define a lower bound $a_{\min}$ to the accessible range of scales so that the analyzing wavelet is still well resolved at that scale. (We refer the reader to Section 1.3.3 of Decoster's Ph.D. thesis [139] for a detailed practical definition of the accessible [$a_{\min}$, $a_{\max}$] range of scales.) Under those precautions, one can be confident in our wavelet transform microscope as far as the investigation of the scale invariance properties in the range $a \in [a_{\min}, a_{\max}]$ is concerned.

Step 2: Computation of the wavelet transform skeleton. As explained in Section II.B, at a given scale a, we identify the wavelet transform modulus maxima as the points where $M_\psi[f](\mathbf{b}, a)$ [Eq. (7)] is locally maximum along the gradient direction given by $A_\psi[f](\mathbf{b}, a)$ [Eq. (8)]. Then we chain the points that are nearest neighbors (which actually have compatible arguments). Along each of these maxima chains, we locate the local maxima previously called WTMMM.
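The first part of Step 2, locating the WTMM at a given scale, can be sketched as a non-maximum suppression along the gradient direction. The version below is a simplified sketch and not the procedure used for the results reported here: it assumes NumPy, quantizes the direction given by the argument to the eight neighboring pixels, ignores the one-pixel border of the image, and does not perform the chaining or the search for WTMMM along the chains. The function name `wtmm_mask` is hypothetical.

```python
import numpy as np

def wtmm_mask(modulus, argument):
    """Boolean mask of the pixels where the modulus is locally maximum
    along the direction given by the argument (in radians).

    Arrays are indexed as [row = y, column = x]; the border is left False.
    """
    # Quantize the gradient direction to one of the 8 neighboring pixels
    dx = np.rint(np.cos(argument)).astype(int)[1:-1, 1:-1]
    dy = np.rint(np.sin(argument)).astype(int)[1:-1, 1:-1]

    ny, nx = modulus.shape
    rows, cols = np.mgrid[1:ny - 1, 1:nx - 1]    # interior pixels only

    center = modulus[1:-1, 1:-1]
    forward = modulus[rows + dy, cols + dx]      # neighbor along the gradient
    backward = modulus[rows - dy, cols - dx]     # neighbor against the gradient

    mask = np.zeros(modulus.shape, dtype=bool)
    # Strict inequality on one side avoids marking both pixels of a tie
    mask[1:-1, 1:-1] = (center > backward) & (center >= forward)
    return mask
```

Applied to a synthetic modulus that peaks on a circle, with an argument pointing radially, the mask selects a thin ring of pixels along that circle, which plays the role of a closed maxima chain.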
Note that the two ends of an open maxima chain are not allowed positions for the WTMMM. Once the set of WTMMM has been computed for a finite number of scales ranging from $a_{\min}$ to $a_{\max}$, one proceeds to the connection of these WTMMM from scale to scale. One starts at the smallest scale $a_{\min}$ and links each WTMMM to its nearest neighbor found at the next scale just above. We proceed iteratively from scale to scale up to $a_{\max}$. All the WTMMM that then remain isolated are suppressed. All the WTMMM that are connected on a curve across scales that does not originate from the smallest scale $a_{\min}$ are also suppressed. We then store the modulus $M_\psi$ and the argument $A_\psi$ of the WTMMM that belong to the so-called maxima lines. Those lines are supposed to converge, in the limit $a \to 0^+$, to the points where the singularities of the image under study are located. As explained in Section II.B, to define the wavelet transform skeleton, one has to separate the maxima lines that satisfy Eq. (15) from those that satisfy Eq. (17) and that are wavelet dependent. This is done by increasing the order $n_\psi$ of the analyzing wavelet; for $n_\psi$ large enough, the spurious maxima lines are suppressed by a simple thresholding on $M_\psi$ at the smallest scale $a_{\min}$. Their roots are definitely rejected as misleading singularity locations.

Step 3: Computation of the multifractal spectrum. According to Eq. (21), one uses the wavelet transform skeleton to compute the partition function $Z(q, a)$ on the discrete set of considered scales $a_{\min} \le a \le a_{\max}$. Then, for a given value of $q \in [q_{\min}, q_{\max}]$, one extracts the exponent $\tau(q)$ [Eq. (22)] from a linear regression fit of $\ln Z(q, a)$ vs. $\ln a$. As a test of the robustness of our measurement, we examine the stability of our estimate of $\tau(q)$ with respect to the range of scales, a sub-interval of $[a_{\min}, a_{\max}]$, over which the linear regression fit is performed. After estimating the exponent $\tau(q)$ for a discrete set of q-values, we smooth the $\tau(q)$ curve using a standard procedure. Then, one determines the D(h) singularity spectrum by Legendre transforming the $\tau(q)$ curve according to Eq. (23).
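Once a skeleton is available, the regression of Step 3 and the canonical estimates of Eqs. (28) to (32) reduce to a few array operations. The sketch below assumes NumPy and a hypothetical input format: for each scale a, a 1D array holding, for every maxima line that exists at that scale, the supremum of the modulus along the line for $a' \le a$. It leaves out the smoothing of the $\tau(q)$ curve and the stability test with respect to the fitting range, and the function name `multifractal_exponents` is not taken from the text.

```python
import numpy as np

def multifractal_exponents(sup_modulus, scales, q_values):
    """Estimate tau(q), h(q) and D(q) from a wavelet transform skeleton.

    sup_modulus[i] is a 1D array with one entry per maxima line existing at
    scales[i]: the supremum of the modulus along that line for a' <= a.
    """
    log_a = np.log(np.asarray(scales, dtype=float))
    tau, h, D = [], [], []
    for q in q_values:
        log_z, h_qa, d_qa = [], [], []
        for m in sup_modulus:
            log_m = np.log(np.asarray(m, dtype=float))
            shift = np.max(q * log_m)                # avoids overflow in exp
            weights = np.exp(q * log_m - shift)
            z = weights.sum()
            weights = weights / z                    # Boltzmann weights, Eq. (28)
            log_z.append(shift + np.log(z))          # ln Z(q, a), Eq. (21)
            h_qa.append(np.sum(weights * log_m))     # Eq. (29)
            nz = weights > 0                         # skip underflowed weights
            d_qa.append(np.sum(weights[nz] * np.log(weights[nz])))  # Eq. (30)
        # Slopes versus ln a estimate the exponents of Eqs. (22), (31), (32)
        tau.append(np.polyfit(log_a, log_z, 1)[0])
        h.append(np.polyfit(log_a, h_qa, 1)[0])
        D.append(np.polyfit(log_a, d_qa, 1)[0])
    return np.array(tau), np.array(h), np.array(D)
```

A simple way to exercise the function is a synthetic monofractal skeleton in which the number of maxima lines grows like $a^{-2}$ and every line carries the modulus $a^H$; the estimates should then satisfy $\tau(q) = qH - 2$, $h(q) = H$, and $D(q) = 2$.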
As a check of the reliability of our results, we use the alternative strategy defined in Eqs. (28) to (32) to estimate the D(h) singularity spectrum without performing explicitly the Legendre transform. When dealing specifically with a stochastic process, we generally have several images at our disposal, corresponding to different realizations of this process. In this case, we will mainly proceed to two different averagings corresponding to the following:

Quenched averaging: We extract the $\tau(q)$ curve from averaging $\langle \ln Z(q, a) \rangle$ over the number of images:

$$e^{\langle \ln Z(q, a) \rangle} \sim a^{\tau(q)}, \quad a \to 0^+ \tag{33}$$
In other words, the $\tau(q)$ spectrum is obtained by averaging over the $\tau(q)$ curves extracted from each individual image.
Annealed averaging: One can alternatively compute the $\tau(q)$ spectrum after averaging the partition functions obtained for each image:

$$ \langle Z(q, a) \rangle \sim a^{\tau(q)}, \qquad a \to 0^+ \tag{34} $$
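The difference between the two averagings is only the order in which the logarithm and the average over images are taken. The sketch below illustrates this at a single $(q, a)$; the function names and the four sample values are hypothetical.

```python
import math

def quenched_log_z(z_images):
    """<ln Z>: average of the logarithms over images (quenched averaging)."""
    return sum(math.log(z) for z in z_images) / len(z_images)

def annealed_log_z(z_images):
    """ln <Z>: logarithm of the average over images (annealed averaging)."""
    return math.log(sum(z_images) / len(z_images))

# Hypothetical values of Z(q, a) at one (q, a), one entry per image.
z_images = [0.8, 1.1, 0.95, 1.3]
quenched = quenched_log_z(z_images)
annealed = annealed_log_z(z_images)
```

By Jensen's inequality `annealed >= quenched`, with equality when all images give the same value; the two $\tau(q)$ estimates then follow from regressing either quantity against $\ln a$.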
Note that in most of the examples discussed in this work, we have not observed any significant discrepancy between the $\tau(q)$ spectra obtained using either one of these averagings. Consequently, in the following we will mainly show the results obtained when estimating the $\tau(q)$ and $D(h)$ multifractal spectra using annealed averaging.

Step 4: Computation of the WTMMM probability density functions. From the computation of the joint probability density function $P_a(M, A)$, we first test the possible independence of $M$ and $A$. If they are independent, we then investigate separately the scale dependence of $P_a(M)$ and $P_a(A)$. From the shape of $P_a(A)$, and from its possible evolution when varying $a$, one can then quantify a possible departure from isotropic scaling as well as the existence of possible privileged directions. When $P_a(M, A)$ does not factorize, $M$ and $A$ are intimately related. In this case, one can try to compute the $\tau_A(q)$ and $D_A(h)$ multifractal spectra by conditioning the statistics of the modulus fluctuations to a given value of the argument. The $A$-dependence of these spectra quantifies what one could call anisotropic multifractal scaling properties.
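One simple way to probe the factorization of the joint pdf in Step 4 is to compare a binned estimate of $P_a(M, A)$ with the product of its marginals. The sketch below is only an illustration under that assumption; the function `factorization_gap`, the binning functions, and the toy samples are hypothetical and are not taken from the original work.

```python
import math
from collections import Counter

def factorization_gap(samples, m_bin, a_bin):
    """Largest |P(M, A) - P(M) P(A)| over all pairs of occupied marginal bins.

    samples: list of (modulus, argument) pairs for the WTMMM at one scale.
    m_bin, a_bin: functions mapping a value to a bin label.
    """
    n = len(samples)
    joint = Counter((m_bin(m), a_bin(a)) for m, a in samples)
    p_m = Counter(m_bin(m) for m, _ in samples)
    p_a = Counter(a_bin(a) for _, a in samples)
    gap = 0.0
    for i in p_m:
        for j in p_a:
            p_joint = joint.get((i, j), 0) / n
            p_prod = (p_m[i] / n) * (p_a[j] / n)
            gap = max(gap, abs(p_joint - p_prod))
    return gap

def m_bin(m):
    return int(m)                      # unit-width modulus bins

def a_bin(a):
    return int(a / (math.pi / 2))      # four angular sectors over [0, 2*pi)

# Toy samples: every modulus paired with every angle (independent),
# versus a modulus that is tied to the angle (dependent).
moduli = [0.5, 1.5, 2.5]
angles = [0.1, 1.7, 3.3, 4.9]
independent = [(m, a) for m in moduli for a in angles]
dependent = [(0.5, 0.1), (1.5, 1.7), (2.5, 3.3)]

gap_ind = factorization_gap(independent, m_bin, a_bin)
gap_dep = factorization_gap(dependent, m_bin, a_bin)
```

A gap close to zero is compatible with independence; with real data the gap has to be compared with the statistical noise expected from the number of WTMMM in each bin.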
III. Test Applications of the WTMM Method to Monofractal and Multifractal Rough Surfaces

A. Fractional Brownian Surfaces

Since its introduction by Mandelbrot and Van Ness [140], fractional Brownian motion (fBm) has become a very popular model in signal and image processing [1–18,28–30]. In one dimension, fBm has proved useful for modeling various physical phenomena with long-range dependence, e.g., “1/f” noises. The fBm exhibits a power spectral density $S(\omega) \sim 1/\omega^{\beta}$, where the spectral exponent $\beta = 2H + 1$ is related to the Hurst exponent $H$. 1D fBm has been extensively used as a test stochastic signal for Hurst exponent measurements. The performance of classic methods [33–36,39–42,141–143] (e.g., height–height correlation function, variance and power spectral methods, first return and multireturn probability distributions, maximum likelihood techniques) has recently been challenged by wavelet-based
techniques [144–157]. Comparative analyses of different wavelet-based estimators for the self-similarity parameter $H$ of fBm can be found in the literature [152–154]. FBm's are homogeneous random self-affine functions that have been specifically used to calibrate the 1D WTMM methodology [47–49,79]. This method was shown to be a very efficient tool to diagnose the monofractal scaling properties of fBm. Moreover, it provides very accurate new estimators of the Hurst exponent with remarkable performance [158]. The purpose of this section is to carry out a test application of the 2D WTMM methodology described in Section II on several realizations of 2D fBm [108]. The generalization of Brownian motion to more than one dimension was first considered by Levy [159]. The generalization of fBm follows along similar lines. A 2D fBm $B_H(\mathbf{x})$ indexed by $H \in [0, 1]$ is a process with stationary zero-mean Gaussian increments whose correlation function is given by [1,2,28,159,160]

$$ \langle B_H(\mathbf{x}) B_H(\mathbf{y}) \rangle = \frac{\sigma^2}{2} \left( |\mathbf{x}|^{2H} + |\mathbf{y}|^{2H} - |\mathbf{x} - \mathbf{y}|^{2H} \right) \tag{35} $$

where $\langle \cdots \rangle$ represents the ensemble mean value. The variance of such a process is

$$ \mathrm{var}(B_H(\mathbf{x})) = \sigma^2 |\mathbf{x}|^{2H} \tag{36} $$

from which one recovers the classic behavior $\mathrm{var}[B_{1/2}(\mathbf{x})] = \sigma^2 |\mathbf{x}|$ for uncorrelated Brownian motion with $H = 1/2$. 2D fBms are self-affine processes that are statistically invariant under isotropic dilations [Eq. (11)]:

$$ B_H(\mathbf{x}_0 + \lambda \mathbf{u}) - B_H(\mathbf{x}_0) \simeq \lambda^H \left[ B_H(\mathbf{x}_0 + \mathbf{u}) - B_H(\mathbf{x}_0) \right] \tag{37} $$

where $\mathbf{u}$ is a unit vector and $\simeq$ stands for equality in law. The index $H$ corresponds to the Hurst exponent; the higher the exponent $H$, the more regular the fBm surface. Since Eq. (37) holds for any $\mathbf{x}_0$ and any direction $\mathbf{u}$, almost all realizations of the fBm process are continuous, everywhere nondifferentiable, and isotropically scale-invariant, as characterized by a unique Hölder exponent $h(\mathbf{x}) = H$, $\forall \mathbf{x}$ [1,2,28,158]. Thus fBm surfaces are the representation of homogeneous stochastic fractal functions characterized by a singularity spectrum that reduces to a single point:

$$ D(h) = \begin{cases} 2 & \text{if } h = H \\ -\infty & \text{if } h \neq H \end{cases} \tag{38} $$
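The self-affinity of the increments can be checked directly from the covariance of a 2D fBm, $\langle B_H(\mathbf{x}) B_H(\mathbf{y}) \rangle = \frac{\sigma^2}{2}(|\mathbf{x}|^{2H} + |\mathbf{y}|^{2H} - |\mathbf{x}-\mathbf{y}|^{2H})$. The sketch below expands the variance of an increment in terms of that covariance and verifies that it scales as $\lambda^{2H}$ under a dilation by $\lambda$. The function names and the numerical values of $\mathbf{x}_0$, $\mathbf{u}$, and $\lambda$ are hypothetical.

```python
import math

def fbm_covariance(x, y, H, sigma=1.0):
    """<B_H(x) B_H(y)> for 2D points x and y."""
    nx = math.hypot(x[0], x[1])
    ny = math.hypot(y[0], y[1])
    nxy = math.hypot(x[0] - y[0], x[1] - y[1])
    return 0.5 * sigma ** 2 * (nx ** (2 * H) + ny ** (2 * H) - nxy ** (2 * H))

def increment_variance(x0, x1, H, sigma=1.0):
    """Var[B_H(x1) - B_H(x0)], expanded with the covariance above."""
    return (fbm_covariance(x1, x1, H, sigma)
            + fbm_covariance(x0, x0, H, sigma)
            - 2.0 * fbm_covariance(x0, x1, H, sigma))

H = 1.0 / 3.0
x0 = (0.3, -1.2)
u = (math.cos(0.7), math.sin(0.7))      # unit vector
lam = 2.5                               # dilation factor

x_lam = (x0[0] + lam * u[0], x0[1] + lam * u[1])
x_one = (x0[0] + u[0], x0[1] + u[1])
ratio = increment_variance(x0, x_lam, H) / increment_variance(x0, x_one, H)
# ratio equals lam**(2*H), independently of x0 and of the direction u.
```

Because the increments are Gaussian with zero mean, matching variances is enough for the equality in law of the two increments.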
By Legendre transforming D(h) according to Eq. (23), one gets the following expression for the partition function exponent [Eq. (22)]:
Figure 9. FBm surfaces (128 × 128) generated with the Fourier transform filtering synthesis method. (a) $H = 1/3$; (b) $H = 1/2$; (c) $H = 2/3$. In the top panels, $B_H(\mathbf{x})$ is coded using 32 gray levels from white (min $B_H$) to black (max $B_H$).
$$ \tau(q) = qH - 2 \tag{39} $$
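For completeness, the linear spectrum $\tau(q) = qH - 2$ can be checked in one line. The derivation below assumes that the Legendre transform of Eq. (23) has the usual form $D(h) = \min_q [qh - \tau(q)]$, so that its inverse reads $\tau(q) = \min_h [qh - D(h)]$; this inverse form is written here as an assumption since Eq. (23) is not reproduced in this section.

```latex
\tau(q) \;=\; \min_{h}\,\bigl[\, q h - D(h) \,\bigr],
\qquad
D(h) = \begin{cases} 2 & h = H \\ -\infty & h \neq H \end{cases}
\quad\Longrightarrow\quad
\tau(q) \;=\; qH - 2 ,
\qquad
h(q) = \frac{d\tau}{dq} = H,
\qquad
q\,h(q) - \tau(q) = 2 .
```

Since $qh - D(h) = +\infty$ for every $h \neq H$, the minimum is reached at $h = H$ for all $q$, which is why a single Hölder exponent translates into a straight line of slope $H$.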
$\tau(q)$ is a linear function of $q$, the signature of monofractal scaling, with a slope given by the index $H$ of the fBm. We have tested the 2D WTMM method described in Section II on fBm surfaces generated by the so-called Fourier transform filtering method [28,29,160]. We have used this particular synthesis method because of its simplicity of implementation. Indeed, it amounts to a fractional integration of a 2D “white noise” and is therefore expected to reproduce quite faithfully the isotropic scaling invariance properties [Eqs. (37)–(39)]. From a visual inspection of Figure 9a ($H = 1/3$), 9b ($H = 1/2$), and 9c ($H = 2/3$), one can convince oneself that the fBm surfaces become less and less irregular when increasing the index $H$. This simply reflects the fact that the fractal dimension of fBm surfaces decreases from 3 to 2 when $H$ covers [0,1] [Eq. (25)]:

$$ d_F(\mathrm{fBm\ S}) = 1 - \tau(1) = 3 - H \tag{40} $$
When increasing $H$, an fBm surface becomes more and more similar to a smooth Euclidean 2D surface. Figure 10 reports the results of a power-spectral analysis of a (1024 × 1024) image of an fBm rough surface with Hurst exponent $H = 1/3$.
Figure 10. Power spectrum analysis of a (1024 × 1024) image of an fBm surface $B_{1/3}(\mathbf{x})$. (a) $\ln|\hat{B}_{1/3}(\mathbf{k})|$ as coded using 32 gray levels from white (min $\ln|\hat{B}_{1/3}|$) to black (max $\ln|\hat{B}_{1/3}|$). (b) The spectral density $S(|\mathbf{k}|)$ vs. $|\mathbf{k}|$ in a logarithmic representation. The solid line corresponds to the theoretical power-law prediction with exponent $\beta = 2H + 2 = 8/3$ [Eq. (41)].
In Figure 10a, the Fourier transform of $B_{1/3}(\mathbf{x})$ does not display any significant departure from radial symmetry. Isotropic scaling is actually confirmed when averaging $\hat{B}_{1/3}(\mathbf{k})$ over several such images. In Figure 10b, the power spectral density is shown to behave as a power law as a function of the wavevector modulus $|\mathbf{k}|$, with an exponent that is in perfect agreement with the theoretical prediction for the spectral exponent [Eq. (27)]:

$$ \beta = 4 + \tau(2) = 2 + 2H \tag{41} $$
Along the lines of the numerical implementation procedure described in Section II.C, we have wavelet transformed 32 (1024 × 1024) images of $B_{H=1/3}$ with an isotropic first-order analyzing wavelet. To limit edge effects, we then restrict our analysis to the 512 × 512 central part of the wavelet transform of each image. Figure 11 illustrates the computation of the maxima chains and the WTMMM for an individual image at three different scales. Figure 11b shows the convolution of the original image (Fig. 11a) with the isotropic Gaussian smoothing filter [Eq. (5)]. According to the definition of the wavelet transform modulus maxima, the maxima chains correspond to well-defined edge curves of the smoothed image. The local maxima of $M$ along these curves are located at the points where the sharpest intensity variation is observed. The corresponding arrows clearly indicate that locally, the gradient vector points in the direction (as given by $A$) of maximum change of the intensity surface. When going from large scale (Fig. 11d) to small scale (Fig. 11c), the characteristic average distance between two nearest neighbor WTMMM decreases like $a$. This means that the number of WTMMM and, in turn, the number of maxima lines proliferate across scales like $a^{-2}$. The corresponding wavelet transform skeleton is shown in Figure 12. As confirmed just
Figure 11. 2D wavelet transform analysis of $B_{H=1/3}(\mathbf{x})$. $\psi$ is a first-order radially symmetric analyzing function (see Fig. 1). (a) Thirty-two gray-scale coding of the central 512 × 512 portion of the original image. In (b) $a = 2\sigma_W$, (c) $a = 2^{0.1}\sigma_W$, and (d) $a = 2^{1.9}\sigma_W$ are shown the maxima chains; the local maxima of $M$ along these chains are marked by symbols from which originate arrows whose length is proportional to $M$ and whose direction (with respect to the x-axis) is given by $A$. In (b), the smoothed image $\phi_{\mathbf{b},a} * B_{1/3}$ [Eq. (5)] is shown as a gray-scale coded background from white (min) to black (max).
below, when extrapolating the arborescent structure of this skeleton to the limit $a \to 0^+$, one recovers the theoretical result that the support of the singularities of a 2D fBm has a dimension $d_F = 2$, i.e., $B_{H=1/3}(\mathbf{x})$ is nowhere differentiable [1,2,28,29,159]. The local scale invariance properties of an fBm rough surface are investigated in Figure 13. When looking at the behavior of $M$ along some maxima lines belonging to the wavelet transform skeleton, despite some superimposed fluctuations, one observes a rather convincing power-law decrease with an exponent $h(\mathbf{x}_0)$ that does not seem to depend upon the spatial location $\mathbf{x}_0$. Moreover, the theoretical value for the Hölder exponent $h(\mathbf{x}_0) = H = 1/3$ provides a rather good fit of the slopes obtained at small scale in a logarithmic representation of $M$ vs. $a$ [Eq. (15)]. When looking at the simultaneous evolution of $A$ along the same maxima lines, one observes random fluctuations. Unfortunately, because of the rather limited range of scales accessible to our mathematical microscope, $a \in [\sigma_W, 2^4 \sigma_W]$, there is no hope of demonstrating numerically that $A$ actually performs a random walk over $[0, 2\pi]$.
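The maxima chains discussed in connection with Figure 11 are the points where the modulus of the gradient of the smoothed image is locally maximum along the gradient direction. The sketch below illustrates that idea on an already smoothed image, using centered finite differences and a nearest-neighbor comparison along the gradient direction. These discretization choices, and the names `gradient` and `modulus_maxima`, are assumptions made for illustration; they are not the implementation of Section II.C, which works with the wavelet transform itself.

```python
import math

def gradient(img):
    """Centered finite-difference gradient of a 2D list (interior points only)."""
    ny, nx = len(img), len(img[0])
    tx = [[0.0] * nx for _ in range(ny)]
    ty = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            tx[j][i] = 0.5 * (img[j][i + 1] - img[j][i - 1])
            ty[j][i] = 0.5 * (img[j + 1][i] - img[j - 1][i])
    return tx, ty

def modulus_maxima(smoothed):
    """Return {(i, j): (M, A)} where M is maximal along the gradient direction."""
    tx, ty = gradient(smoothed)
    ny, nx = len(smoothed), len(smoothed[0])
    mod = [[math.hypot(tx[j][i], ty[j][i]) for i in range(nx)]
           for j in range(ny)]
    maxima = {}
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            m = mod[j][i]
            if m == 0.0:
                continue
            arg = math.atan2(ty[j][i], tx[j][i])
            # Nearest grid neighbors along +/- the gradient direction.
            di = int(round(math.cos(arg)))
            dj = int(round(math.sin(arg)))
            if m >= mod[j + dj][i + di] and m >= mod[j - dj][i - di]:
                maxima[(i, j)] = (m, arg)
    return maxima

# Toy smoothed image: a vertical edge centered on column 5.
edge = [[math.tanh((i - 5) / 1.5) for i in range(11)] for j in range(11)]
maxima = modulus_maxima(edge)
```

On this toy edge the detected maxima form a single vertical chain along column 5, with an argument of zero (gradient pointing along the x-axis); the WTMMM would then be the local maxima of `M` along such a chain.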
Figure 12. Wavelet transform skeleton of the 2D f Bm image shown in Figure 11a. This skeleton is defined by the set of maxima lines obtained after linking the WTMMM detected at different scales. Same analyzing wavelet as in Figure 11.
Figure 13. Characterizing the local Hölder regularity of $B_{H=1/3}(\mathbf{x})$ from the behavior of the WTMMM along the maxima lines. Three maxima lines are investigated. (a) $\log_2 M$ vs. $\log_2 a$; (b) $A$ vs. $\log_2 a$. Same analyzing wavelet as in Figure 11. The solid line in (a) corresponds to the theoretical slope $h = H = 1/3$. $a$ is expressed in $\sigma_W$ units.
Figure 14 reports the results of the computation of the $\tau(q)$ and $D(h)$ spectra using the 2D WTMM method described in Section II. As shown in Figure 14a, the annealed average partition function $Z(q, a)$ [over 32 images of $B_{1/3}(\mathbf{x})$] displays a remarkable scaling behavior over more than three octaves when plotted versus $a$ in a logarithmic representation
Figure 14. Determination of the $\tau(q)$ and $D(h)$ spectra of 2D fBm with the 2D WTMM method. (a) $\log_2 Z(q, a)$ vs. $\log_2 a$; the solid lines correspond to the theoretical predictions $\tau(q) = qH - 2$ [Eq. (39)] with $H = 1/3$. (b) $h(q, a)$ vs. $\log_2 a$; the solid lines correspond to the theoretical slope $H = 1/3$. (c) $\tau(q)$ vs. $q$ for $H = 1/3$, 1/2, and 2/3 (one symbol per value of $H$); the solid lines correspond to linear regression fit estimates of $H$. (d) $D(h)$ vs. $h$ as obtained from the scaling behavior of $D(q, a)$ vs. $\log_2 a$ [Eq. (30)]; the symbols have the same meaning as in (c). Same analyzing wavelet as in Figure 11. These results correspond to annealed averaging over 32 (1024 × 1024) fBm images. $a$ is expressed in $\sigma_W$ units.
[Eqs. (21) and (22)]. Moreover, for a wide range of values of $q \in [-4, 6]$, the data are in good agreement with the theoretical $\tau(q)$ spectrum [Eq. (39)]. When proceeding to a linear regression fit of the data over the first two octaves, one gets the $\tau(q)$ spectra shown in Figure 14c for three values of the fBm index $H = 1/3$, 1/2, and 2/3. Whatever $H$, the data systematically fall on a straight line, the signature of homogeneous (monofractal) scaling properties. However, the slope of this straight line provides a slight
Figure 15. Pdfs of the WTMMM coefficients of $B_{1/3}(\mathbf{x})$ as computed at different scales $a = 1, 2, 4$, and 8 (in $\sigma_W$ units). (a) $P_a(M)$ vs. $M$. (b) $P_a(A)$ vs. $A$. $\psi$ is the first-order analyzing wavelet shown in Figure 1. These results correspond to averaging over 32 (1024 × 1024) fBm images.
underestimate of the corresponding Hurst exponent $H$. Let us point out that an underestimate of a few percent has also been reported when performing a similar analysis of 1D fBm [47–49,98]. A theoretical investigation of finite-size effects and statistical convergence has recently been performed to explain this experimental observation [98]. Figure 15 shows the pdfs $P_a(M) = \int dA\, P_a(M, A)$ and $P_a(A) = \int dM\, P_a(M, A)$, computed for four different values of the scale parameter for $B_{1/3}(\mathbf{x})$. As seen in Figure 15a, $P_a(M)$ is not a Gaussian [in contrast to the pdf of the continuous 2D wavelet coefficients when using Eq. (7)], but decreases fast to zero at zero. This explains why, when concentrating on the wavelet transform skeleton, the discrete sum on the r.h.s. of Eq. (21) no longer diverges when considering negative $q$ values. This remark is at the heart of the 2D WTMM method; by allowing us to compute the $\tau(q)$ spectrum for negative as well as positive $q$ values, the 2D WTMM method is a definite step beyond the 2D structure function method, which is intrinsically restricted to positive $q$ values. The corresponding pdfs $P_a(A)$ are represented in Figure 15b. $P_a(A)$ clearly does not evolve across scales. Moreover, except for some small amplitude fluctuations observed at the largest scale, $P_a(A) = 1/2\pi$ is a flat distribution, as expected for statistically isotropic scale-invariant rough surfaces. The results reported in Figure 16 not only corroborate statistical isotropy but also bring unambiguous evidence for the independence of $M$ and $A$. For two different scales, the pdf of $M$, when conditioned by the argument $A$, is shown to be shape invariant. We refer the reader to Arneodo et al. [108] for a similar detailed discussion of the results of the application of the 2D WTMM method to anisotropic monofractal self-affine rough surfaces.
Figure 16. Pdf of $M$ as conditioned by $A$. The different curves correspond to fixing $A$ (mod $\pi$) to $0 \pm \pi/8$, $\pi/4 \pm \pi/8$, $\pi/2 \pm \pi/8$, and $3\pi/4 \pm \pi/8$. (a) $a = 1$; (b) $a = 2$ (in $\sigma_W$ units). Same 2D WTMM computations for $B_{1/3}$ as in Figure 15.
B. Multifractal Rough Surfaces Generated by Random Cascades on Separable Wavelet Orthogonal Basis

This section is devoted to the application of the 2D WTMM method to multifractal functions synthesized from W-cascades on a separable wavelet orthogonal basis, as defined in Decoster et al. [109]. A 2D random W-cascade is built recursively on the two-dimensional square grid of the separable wavelet orthogonal basis, involving only scales that range between a given large scale $L$ and the scale 0 (excluded). Thus the corresponding fractal function $f(\mathbf{x})$ will not involve scales greater than $L$. For that purpose, we will use compactly supported wavelets defined by Daubechies [58,109]. Moreover, we will mainly concentrate here on multifractal rough surfaces that display isotropic scaling and that are generated with a 2D log-normal W-cascade. If $m$ and $\sigma^2$ are, respectively, the mean and the variance of $\ln W$, where $W$ is a multiplicative random variable with log-normal probability distribution, then, as shown in Decoster et al. [109], a straightforward computation leads to the following $\tau(q)$ spectrum:

$$ \tau(q) = -\log_2 \langle W^q \rangle - 2 = -\frac{\sigma^2}{2 \ln 2}\, q^2 - \frac{m}{\ln 2}\, q - 2, \qquad \forall q \in \mathbb{R} \tag{42} $$

where $\langle \cdots \rangle$ means ensemble average. The corresponding $D(h)$ singularity spectrum is obtained by Legendre transforming $\tau(q)$ [Eq. (23)]:

$$ D(h) = -\frac{(h + m/\ln 2)^2}{2\sigma^2/\ln 2} + 2 \tag{43} $$
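The “straightforward computation” behind the log-normal spectra can be written out as follows. The only inputs are the moments of a log-normal variable and the Legendre transform, assumed here to take the usual form $D(h) = \min_q [qh - \tau(q)]$.

```latex
\langle W^q \rangle = \exp\!\Bigl( m q + \tfrac{1}{2}\sigma^2 q^2 \Bigr)
\;\Longrightarrow\;
\tau(q) = -\log_2 \langle W^q \rangle - 2
        = -\frac{\sigma^2}{2\ln 2}\, q^2 - \frac{m}{\ln 2}\, q - 2 ,
\\[6pt]
h(q) = \frac{\partial \tau}{\partial q}
     = -\frac{\sigma^2}{\ln 2}\, q - \frac{m}{\ln 2}
\;\Longrightarrow\;
q(h) = -\frac{\ln 2}{\sigma^2}\Bigl( h + \frac{m}{\ln 2} \Bigr),
\\[6pt]
D(h) = q(h)\, h - \tau\bigl(q(h)\bigr)
     = -\frac{(h + m/\ln 2)^2}{2\sigma^2/\ln 2} + 2 .
```

The maximum $D = 2$ is reached at $h = -m/\ln 2$, i.e., at $q = 0$, and the parabola has curvature set by $\sigma^2$ alone.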
According to the convergence criteria established in 1D [161], we will consider only parameter values that satisfy the conditions:

$$ m < 0 \quad \text{and} \quad \frac{|m|}{\sigma} > 2\sqrt{\ln 2} \tag{44} $$

Moreover, by solving $D(h) = 0$, one gets the extremal values $h_{\min}$ and $h_{\max}$:

$$ h_{\min} = -\frac{m}{\ln 2} - \frac{2\sigma}{\sqrt{\ln 2}}, \qquad h_{\max} = -\frac{m}{\ln 2} + \frac{2\sigma}{\sqrt{\ln 2}} \tag{45} $$
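As a numerical sanity check, the log-normal formulas can be evaluated for the parameter values used for the synthetic surfaces of this section, $m = -0.38 \ln 2$ and $\sigma^2 = 0.03 \ln 2$ (the negative sign of $m$ is the one required by the condition $m < 0$). The function names below are hypothetical, and the admissibility test assumes the condition $|m|/\sigma > 2\sqrt{\ln 2}$, which is equivalent to $h_{\min} > 0$.

```python
import math

LN2 = math.log(2.0)

def tau(q, m, sigma2):
    """Log-normal tau(q) = -(sigma^2 / (2 ln 2)) q^2 - (m / ln 2) q - 2."""
    return -sigma2 / (2.0 * LN2) * q * q - m / LN2 * q - 2.0

def D(h, m, sigma2):
    """Log-normal D(h) = -(h + m / ln 2)^2 / (2 sigma^2 / ln 2) + 2."""
    return -(h + m / LN2) ** 2 / (2.0 * sigma2 / LN2) + 2.0

def h_range(m, sigma2):
    """Roots of D(h) = 0: -m / ln 2 -/+ 2 sigma / sqrt(ln 2)."""
    center = -m / LN2
    half_width = 2.0 * math.sqrt(sigma2) / math.sqrt(LN2)
    return center - half_width, center + half_width

def is_admissible(m, sigma2):
    """m < 0 and |m| / sigma > 2 sqrt(ln 2) (assumed form of the criterion)."""
    return m < 0.0 and abs(m) / math.sqrt(sigma2) > 2.0 * math.sqrt(LN2)

m = -0.38 * LN2
sigma2 = 0.03 * LN2
h_min, h_max = h_range(m, sigma2)
```

With these values `h_min` and `h_max` round to 0.034 and 0.726, the most frequent exponent is $h = -m/\ln 2 = 0.38$ with $D = 2$, and the parameters pass the admissibility test.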
Figure 17 illustrates the computation of the maxima chains and the WTMMM for an individual image of a multifractal rough surface generated with the log-normal W-cascade model with parameter values $m = -0.38 \ln 2$ and $\sigma^2 = 0.03 \ln 2$. Again Figure 17b illustrates perfectly the fact that the maxima chains correspond to edge curves of the original
Figure 17. 2D wavelet transform analysis of a multifractal rough surface generated with the log-normal W-cascade model with parameter values $m = -0.38 \ln 2$ and $\sigma^2 = 0.03 \ln 2$. $\psi$ is the first-order radially symmetric analyzing wavelet shown in Figure 1. (a) Thirty-two gray-scale coding of the original (1024 × 1024) image. In (b) $a = 2^{2.9}\sigma_W$, (c) $a = 2^{1.9}\sigma_W$, and (d) $a = 2^{3.9}\sigma_W$ are shown the maxima chains and the WTMMM for the central (512 × 512) part of the original image [dashed square in (a)]. In (b), the smoothed image $\phi_{\mathbf{b},a} * f$ is shown as a gray-scale coded background from white (min) to black (max).
image after smoothing by a Gaussian filter $\phi$. From the WTMMM defined on these maxima chains, one constructs the WT skeleton according to the procedure described in Section II.C. From the WT skeleton of 32 (1024 × 1024) images like the one in Figure 17a, one computes the annealed average of the partition functions $Z(q, a)$. As shown in Figure 18a, when plotted versus the scale parameter $a$ in a logarithmic representation, these annealed average partition functions display a rather impressive scaling
Figure 18. Determination of the $\tau(q)$ and $D(h)$ spectra of multifractal rough surfaces generated with the log-normal random W-cascade models, using the 2D WTMM method. $\psi$ is the first-order radially symmetric analyzing wavelet shown in Figure 1. (a) $\log_2 Z(q, a)$ vs. $\log_2 a$; the solid lines correspond to linear regression fits of the data over the first four octaves. (b) $h(q, a)$ vs. $\log_2 a$; the solid lines correspond to linear regression fit estimates of $h(q)$. (c) $\tau(q)$ vs. $q$ as obtained from linear regression fits of the data in (a) over the first four octaves. (d) $D(h)$ vs. $h$, after Legendre transforming the $\tau(q)$ curve in (c). In (c) and (d), the solid lines represent the theoretical log-normal spectra given by Eqs. (42) and (43), respectively.
behavior over a range of scales of about four octaves (i.e., $\sigma_W < a < 16\sigma_W$, where $\sigma_W = 13$ pixels). Let us point out that scaling of quite good quality is found for a rather wide range of values of $q$: $-6 < q < 8$. When proceeding to a linear regression fit of the data over the first four octaves, one gets the $\tau(q)$ spectrum shown in Figure 18c. For the range of $q$ values where scaling is operating, the numerical data are in remarkable agreement with the theoretical nonlinear $\tau(q)$ spectrum given by Eq. (42). Similar quantitative agreement is observed on the $D(h)$ singularity spectrum in Figure 18d. Let us note that consistent parabolic shapes are obtained when using either the Legendre transform of the $\tau(q)$ data [Eq. (23)] or Eqs. (31) and (32) to compute $h(q)$ and $D(q)$. Figure 18b reports the results for the expectation values $h(q, a)$ [Eq. (29)] vs. $\log_2 a$; it is clear from this figure that the slope $h(q)$ depends upon $q$, the hallmark of multifractal scaling. Note that again, the theoretical predictions $h(q) = \partial\tau/\partial q = -\sigma^2 q/\ln 2 - m/\ln 2$ provide very satisfactory fits of the numerical data. From Eq. (45), the multifractal rough surfaces under study display intermittent fluctuations corresponding to Hölder exponent values ranging from $h_{\min} = 0.034$ to $h_{\max} = 0.726$. Unfortunately, to capture the strongest and weakest singularities, one needs to compute the $\tau(q)$ spectrum for very large values of $|q|$. This requires the processing of many more images of much larger size, which is not within current computer capabilities. Note that with the statistical sample studied here, one has $D(h(q = 0) = 0.38) = 2.00 \pm 0.02$, which allows us to conclude that the rough surfaces under consideration are singular everywhere.
From the construction rule of these synthetic log-normal rough surfaces [109], the multifractal nature of these random functions is expected to be contained in the way the shape of the WT modulus pdf $P_a(M)$ evolves when varying the scale parameter $a$, as shown in Figure 19a. Indeed, the joint probability distribution $P_a(M, A)$ is expected to factorize as the signature of the implicit decoupling of $M$ and $A$ in the construction process. This decoupling is numerically retrieved in Figure 20 where, for two different scales, the pdf of $M$, when conditioned by the argument $A$, is shown to be shape invariant. When varying the scale parameter $a$, no significant angular-dependent evolution is observed in the distribution of the WTMMM. As seen in Figure 19b, $P_a(A)$ does not exhibit any significant change when increasing $a$, except for some loss in regularity at large scales due to the rarefaction of the maxima lines. Let us point out that even though $P_a(A)$ looks globally rather flat, one can notice some small amplitude, almost periodic oscillations at the smallest scales that reflect the existence of privileged directions in the wavelet cascading process. These oscillations are maximum for $A = 0$, $\pi/2$, $\pi$, and $3\pi/2$, as a witness to the square lattice anisotropy underlying the 2D wavelet tree decomposition.
Figure 19. Pdfs of the WTMMM coefficients of synthetic multifractal rough surfaces generated with the log-normal W-cascade model ($m = -0.38 \ln 2$ and $\sigma^2 = 0.03 \ln 2$). (a) $P_a(M)$ vs. $M$. (b) $P_a(A)$ vs. $A$. $\psi$ is a first-order radially symmetric analyzing wavelet. Four different scales $a = 1, 2, 4, 8$ (in $\sigma_W$ units) are shown. These results correspond to averaging over 32 (1024 × 1024) images.
Figure 20. Pdfs of $M$ when conditioned by $A$. The different curves correspond to fixing $A$ (mod $\pi$) to $0 \pm \pi/8$, $\pi/4 \pm \pi/8$, $\pi/2 \pm \pi/8$, and $3\pi/4 \pm \pi/8$. (a) $a = 2^{0.1}$; (b) $a = 2^{1.1}$ (in $\sigma_W$ units). Same 2D WTMM computations as in Figure 19.
1. Remark

We have reported results obtained with the first-order radially symmetric analyzing wavelet shown in Figure 1. Possibly because of the range of Hölder exponent values that is restricted to $h \in [0, 1]$, but more probably because of the underlying multiplicative structure of the multifractal surface itself, a first-order analyzing wavelet leads to numerical multifractal spectra that are in remarkable agreement with the theoretical predictions. Let us point out that quite robust results are obtained with the third-order analyzing wavelet used in the previous subsection.
C. Distinguishing “Multiplicative from Additive” Processes Underlying the Scale Invariance Properties of Rough Surfaces from Space-Scale Correlation Analysis

Correlations in multifractals have already been investigated in the literature [162–164]. However, all these studies rely upon the computation of the scaling behavior of some partition functions involving different points; they thus mainly concentrate on spatial correlations of the local singularity exponents. The approach recently developed [85,100,165] is different since it does not focus on (or suppose) any scaling property but rather consists in studying the correlations of the logarithms of the amplitude of a space-scale decomposition of the signal. More specifically, if $\chi(\mathbf{x})$ is a bump function such that $\|\chi\|_1 = 1$, then by taking

$$ \varepsilon^2(\mathbf{x}, a) = a^{-4} \int \chi\bigl((\mathbf{x} - \mathbf{y})/a\bigr)\, \bigl|\mathbf{T}_\psi[f](\mathbf{y}, a)\bigr|^2 \, d^2\mathbf{y} \tag{46} $$

one has

$$ \|f\|_2^2 = \iint \varepsilon^2(\mathbf{x}, a)\, d^2\mathbf{x}\, da \tag{47} $$
Thus, $\varepsilon^2(\mathbf{x}, a)$ can be interpreted as the local space-scale energy density of the considered multifractal function $f(\mathbf{x})$. Since $\varepsilon^2(\mathbf{x}, a)$ is a positive quantity, we can define the magnitude of the function $f$ at the point $\mathbf{x}$ and scale $a$ as

$$ \omega(\mathbf{x}, a) = \frac{1}{2} \ln \varepsilon^2(\mathbf{x}, a) \tag{48} $$
We have shown [109] that a multiplicative process can be revealed and characterized through the correlations of its space-scale magnitudes:

$$ C(\mathbf{x}_1, \mathbf{x}_2; a_1, a_2) = \langle \tilde{\omega}(\mathbf{x}_1, a_1)\, \tilde{\omega}(\mathbf{x}_2, a_2) \rangle \tag{49} $$

where $\langle \cdots \rangle$ stands for ensemble average and $\tilde{\omega}$ for the centered process $\omega - \langle \omega \rangle$. When using a W-cascade process, one can compute analytically the “two-scale” correlation function $C(\Delta x, a_1, a_2)$ between the magnitude at scale $a_1$ and the magnitude at scale $a_2$. The function displays a logarithmic behavior as long as $\Delta x$ is greater than the supremum of $a_1$ and $a_2$, namely [109,161,165]:

$$ C(\Delta x, a_1, a_2) = \sigma^2 \left( \log_2 \frac{L}{\Delta x} - 2 + 2\, \frac{\Delta x}{L} \right) \tag{50} $$

when $\sup(a_1, a_2) \le \Delta x < L$.
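The magnitude correlation analysis can be sketched numerically as follows. This is a minimal illustration with hypothetical function names: the estimator works along a 1D cut of the magnitude field with a spatial average standing in for the ensemble average, and the cascade prediction is coded under the assumption that the logarithmic law reads $\sigma^2(\log_2(L/\Delta x) - 2 + 2\Delta x/L)$.

```python
import math

def magnitude(energy_density):
    """omega = (1/2) ln(eps^2) for a positive local energy density eps^2."""
    return 0.5 * math.log(energy_density)

def magnitude_correlation(omega_1, omega_2, lag):
    """Sample estimate of <w1~(x) w2~(x + lag)> along a 1D cut.

    omega_1, omega_2: magnitudes at scales a1 and a2 sampled on the same grid.
    The tilde denotes centering by the sample mean.
    """
    mean_1 = sum(omega_1) / len(omega_1)
    mean_2 = sum(omega_2) / len(omega_2)
    n = len(omega_1) - lag
    return sum((omega_1[i] - mean_1) * (omega_2[i + lag] - mean_2)
               for i in range(n)) / n

def cascade_prediction(dx, sigma2, L):
    """Assumed logarithmic law: sigma^2 (log2(L / dx) - 2 + 2 dx / L)."""
    return sigma2 * (math.log2(L / dx) - 2.0 + 2.0 * dx / L)

# Illustrative evaluation with sigma^2 = 0.03 ln 2 and L = 1024 pixels.
sigma2 = 0.03 * math.log(2.0)
predicted = [cascade_prediction(dx, sigma2, 1024.0) for dx in (16, 64, 256)]
```

For an “additive” process such as fBm, the estimator is expected to return values compatible with zero once the lag exceeds the larger of the two scales, whereas a cascade gives a slow decay that is linear in $\log_2 \Delta x$ at small lags.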
Figure 21. Magnitude correlation function $C(\Delta x, a_1, a_2)$ vs. $\log_2(\Delta x)$, as computed from the continuous wavelet transform of 32 (1024 × 1024) images. The analyzing wavelet is the radially symmetric first-order wavelet shown in Figure 1. (a) Log-normal W-cascades for parameter values $m = -0.38 \ln 2$ and $\sigma^2 = 0.03 \ln 2$. (b) Fractional Brownian surfaces $B_{H=1/3}(\mathbf{x})$. One symbol is used per pair of scales: $a_1 = a_2 = 2$; $a_1 = 1$, $a_2 = 2$ (△); $a_1 = 1$, $a_2 = 2^2$ (□); and $a_1 = 2$, $a_2 = 2^2$, in $\sigma_W$ units. In (a) the solid line represents the theoretical prediction given by Eq. (50). No data points are shown for $\Delta x$ below $\sigma_W$ (13 pixels).
Thus, the ultrametric structure of the wavelet representation of multifractal rough surfaces generated with the random W-cascade model implies that the cross-correlation functions (across scales) decrease very slowly, independently of $a_1$ and $a_2$, as a logarithmic function of the spatial distance $\Delta x$. Figure 21a shows the results of the computation of $C(\Delta x, a_1, a_2)$ when averaging over 32 (1024 × 1024) images of multifractal rough surfaces generated with the log-normal W-cascade model for the same parameter values as in Figure 17a. One can see that for $\Delta x > \sup(a_1, a_2)$, all the data points fall onto a unique curve when plotted versus $\log_2(\Delta x)$, independently of the considered pair of scales $(a_1, a_2)$. Moreover, although the analyzing wavelet is different from the one used in the construction process of the W-cascade, these numerical data are in strikingly good agreement with the theoretical prediction given by Eq. (50) for $\sigma^2 = 0.03 \ln 2$ and $L = 1024$. The observed slow (logarithmic) decay of the space-scale correlation functions is thus a clear indication that magnitudes in random cascades are correlated over very long distances [100,109,161,165–167]. Note that both the scale independence and the logarithmic decay are features that are not observed in “additive” models like fractional Brownian motions, whose long-range correlations originate from the sign of their variations rather than from the amplitudes. Figure 21b shows the correlation functions $C(\Delta x, a_1, a_2)$ computed from 32 (1024 × 1024) images of isotropic fractional Brownian surfaces with index $H = 1/3$ (see Fig. 9a). When compared with Figure 21a, the difference is
impressive: for $\Delta x > \sup(a_1, a_2)$, the magnitudes of $B_{H=1/3}(\mathbf{x})$ are found to be uncorrelated.
D. Using the 2D WTMM Method to Perform Image Processing Tasks

We now want to discuss the ability to use the WTMM method for specific purposes in image processing. We refer the reader to previous work [108] for edge detection and image denoising applications, and also to the work of Levy-Vehel [168,169] for previous attempts to use multifractal concepts for image analysis. In this subsection, we want to address a specific image processing segmentation problem that will be helpful when dealing with medical applications in Section VI. Indeed, in the past 20 years, much signal and image processing work has been devoted to medical research, especially mammography [170,171]. A major point is the detection and the characterization of clusters of microcalcifications, which are early signs of breast cancer (Section VI). Our goal is to demonstrate the ability of the 2D WTMM method to perform such a task on both synthetic surfaces and genuine mammographic scenes. Here by cluster or aggregate we mean a set of small objects in which the distances between them are small as compared to the size of the aggregate itself; otherwise one cannot speak of an aggregate, and all we have are isolated objects. Indeed, the WTMM method allows us to discriminate two classes of singularities from the space-scale information embedded in the WT skeleton, and then to characterize separately the two resulting subsets by computing the corresponding partition functions and multifractal spectra [115]. In Figure 22, we show synthetic images of clusters of small spots of various heights over a background 2D fBm $B_H(\mathbf{x})$ rough surface of Hurst exponent $H = 0.6$. The trivial case of a single isolated spot is shown in Figure 22a. Figure 22b–d displays small spots located on a straight line, on the border of a filled-in Julia set, and on a dense area, respectively.
Let us recall that Julia sets are beautiful objects that arise in the study of the iteration of rational functions on the complex plane [1,2,28]. Here we use the well-known example of a quadratic polynomial as iterating function $f_c: z \to z^2 + c$, with $c = 0.85 + 0.20i$. The filled-in Julia set is the set of initial seeds $z_0$ such that the iterated sequence $(z_n)_{n \in \mathbb{N}}$, $z_{n+1} = z_n^2 + c$, does not go to infinity. Figure 23 shows the filled-in Julia set that we have used to compute Figure 22c. Each of these clusters has a known fractal dimension, respectively 0, 1, 1.68, and 2 for the point, line, Julia, and dense clusters. Although these spots are not singularities but localized structures with Gaussian shape of width $\sigma = 3$ pixels and random heights, the 2D WTMM
Figure 22. Synthetic rough surfaces (512 × 512) with a 2D fBm background of Hurst exponent $H = 0.6$ and containing a cluster of localized spots. (a) The cluster contains only one spot in the middle of the image; (b) the spots are located on a straight line; (c) the spots are located on a Julia set; (d) the spots are randomly distributed in a square. The spots are modeled by a Gaussian of width $\sigma = 3$ pixels and height randomly chosen in the range [1.2, 1.8] in $B_{H=0.6}$ units. Same gray coding as in Figure 9.
Figure 23. Filled-in Julia set with parameter $c = 0.85 + 0.20i$.
method can be used in a very efficient way to identify them and characterize the geometric properties of the aggregate to which they belong. As shown in Section II.B, in the WT skeleton, one expects the maxima lines pointing to the background texture to display local scaling properties corresponding to a 2D fBm surface, i.e., $h(\mathbf{x}) = H = 0.6$ [Eq. (15)], whereas maxima lines pointing to clustered spots are expected to display different local scaling
Figure 24. Scaling behavior of the WT modulus along some maxima lines of the WT skeletons computed from the images shown in Figure 22a, 22b, 22c, and 22d, respectively, using the first-order radially symmetric analyzing wavelet shown in Figure 1. Maxima lines pointing to background $B_{H=0.6}$ texture singularities and maxima lines pointing to clustered small spots (△) are plotted with different symbols. The solid (resp. dashed) line corresponding to the scaling exponent $h = 0.6$ (resp. $h = -1$) is drawn to guide the eye.
properties with exponent $h = -1$ since they are seen (at scales $a \gtrsim \sigma_W$ by our $\sigma_W$-resolved WT microscope) as Dirac singularities. Notice that because these spots are quite smooth localized structures, one expects the WTMM on these maxima lines to display a crossover at small scales ($a \gtrsim \sigma_W$) toward the behavior $M_\psi[f](\mathbf{x}_0, a) \sim a^{n_\psi}$, $a \to 0^+$ [Eq. (17)], dictated by the number of vanishing moments of the analyzing wavelet. Figure 24 shows, in a logarithmic representation, the WT modulus versus the scale parameter $a$ for various maxima lines belonging to the WT skeletons computed from the four images in Figure 22. For each of the analyzed images, maxima lines pointing to small spots clearly display a crossover from some increase of $M$ at small scales to a clear power-law decrease at large scales, with a local scaling exponent $h \simeq -1$ that is negative and thus can be easily distinguished from the monotonic power-law increase $M \sim a^{0.6}$ observed along the maxima line pointing to an $h = H = 0.6$ background singularity. Now if one proceeds to the computation of the partition functions $Z(q, a)$ on the subskeleton corresponding to identified small spots, one gets the results reported in Figure 25. Even though there are quite few maxima lines in this
MULTIFRACTAL IMAGE ANALYSIS
Figure 25. Determination of the fractal dimension $D_F = -\tau(0)$ of the cluster of localized spots in Figures 22a ( : isolated spot), 22b (△: linear cluster), 22c (□: Julia cluster), and 22d (*: dense cluster). $\log_2 Z(q = 0, a)$ vs. $\log_2(a)$ as computed with the 2D WTMM method after discriminating the WT subskeleton corresponding to clustered spots as illustrated in Figure 24. The solid lines correspond to the theoretical fractal dimensions $D_F = 0$, 1, 1.68, and 2, respectively.
subskeleton, one gets a rather nice scaling behavior for small values of $q$. In particular, the estimate of the exponent $\tau(0)$ for $q = 0$ achieves our aim of classifying these clusters of localized spots geometrically. Within numerical uncertainties, one obtains the following estimates of the fractal dimensions: $D_F = -\tau(0) = 0 \pm 0.02$, $1 \pm 0.02$, $1.7 \pm 0.04$, and $2 \pm 0.02$ for the single spot, linear cluster, Julia cluster, and dense cluster, respectively. These results are in quite good agreement with the theoretical $D_F$ values. They illustrate the ability of the WTMM methodology to extract clustered objects from a nontrivial background and to retrieve a geometric characterization of the cluster via the estimate of its fractal dimension $D_F$.
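The estimate of $D_F$ described above amounts to a linear regression of $\log_2 Z(q = 0, a)$ against $\log_2 a$, since $Z(0, a)$ simply counts the maxima lines present at scale $a$. The following is a minimal sketch of that regression, not the authors' implementation; the function names and the line counts are hypothetical and chosen only to illustrate the case of a linear cluster.

```python
import math

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def fractal_dimension(scales, line_counts):
    """Estimate D_F = -tau(0), assuming Z(q=0, a) ~ a**tau(0).

    line_counts[i] is the number of maxima lines present at scales[i],
    which is the value of the partition function for q = 0.
    """
    log_a = [math.log2(a) for a in scales]
    log_z = [math.log2(z) for z in line_counts]
    return -fit_slope(log_a, log_z)

# Hypothetical counts: the number of lines halves at each octave,
# as one would expect for spots distributed along a line.
scales = [1.0, 2.0, 4.0, 8.0]
counts = [64, 32, 16, 8]
d_f = fractal_dimension(scales, counts)
```

With these made-up counts the fitted slope is $\tau(0) = -1$, so that `d_f` equals 1; an isolated spot (constant count) would give 0 and a dense cluster (count divided by 4 per octave) would give 2.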
IV. Multifractal Analysis of High-resolution Satellite Images of Cloud Structure

The problem of nonlinear variability over a wide range of scales has been considered for a long time with respect to the highly intermittent nature of turbulent flows in fluid dynamics [18,44]. Special attention has been paid to their asymptotic and possibly universal behavior when the dissipation length goes to zero, i.e., when the Reynolds number goes to infinity. Besides wind-tunnel and laboratory (grid, jet, etc.) experiments, the atmosphere is a huge natural laboratory in which high Reynolds number (fully developed) turbulent dynamics can be studied. Clouds, which are at the source of the
hydrological cycle, are the most obvious manifestation of the earth’s turbulent atmospheric dynamics [10,172,173]. By modulating the input of solar radiation, they play a critical role in the maintenance of the earth’s climate [174]. They are also one of the main sources of uncertainty in current climate modeling [175], where clouds are assumed to be homogeneous media lying parallel to the earth’s surface; at best, a linear combination of cloudy and clear portions according to cloud fraction is used to account for horizontal inhomogeneity when predicting radiative properties. For many years, the lack of data hindered our understanding of cloud microphysics and cloud–radiation interactions. It is now well recognized that clouds are variable in all directions and that fractal [172,173,176–181] and multifractal [10,182–184] concepts are likely to be relevant to the description of the complex 3D geometry of clouds. Until quite recently, the internal structure of clouds was probed by balloons or aircraft that penetrated the cloud layer, revealing an extreme variability of 1D cuts of some cloud fields [184–192]. In particular, in situ measurements of cloud liquid water content (LWC) were performed during many intensive field programs (FIRE [193], ASTEX [194], SOCEX [195], etc.). Indeed, during the past 15 years, vast amounts of data on the distribution of atmospheric liquid water from a variety of sources were collected and analyzed in many different ways. All these data contain information on spatial and/or temporal correlations in cloudiness, enabling the investigation of scale invariance over a range from a few centimeters to hundreds of kilometers. An attractive alternative to in situ probing is to use high-resolution satellite imagery that now provides direct information about the fluctuations in liquid water concentration in the depths of clouds [177,179–181,196–202].
These rather sophisticated remote sensing systems called “millimeter radars” are actually sensitive not only to precipitating raindrops but also to suspended cloud droplets. Spectral analysis of the recorded 2D radiance field [196–202] confirms previous 1D findings that make it likely that cloud scenes display scaling over a wide range of scales. One has to give credit to Lovejoy and co-workers [120–123,182,183,203–206] for applying the multifractal description to atmospheric phenomena. Using trace moment and double trace moment techniques [120–123,204–206], they have brought experimental evidence for multiple scaling (in other words, the existence of a continuum of scaling exponent values) in various geophysical fields. More recently, Davis and co-workers [184,192,202] have used the structure function method to study LWC data recorded during the ASTEX and FIRE programs. Both these analyses converge to the conclusion that the internal marine stratocumulus (Sc) structure is multifractal over at least three decades in scales. Similar multifractal behavior has been reported by Wiscombe et al. [201] when analyzing liquid
water path (LWP) data (i.e., column-integrated LWC) from the Atmospheric Radiation Measurement (ARM) archives. Even though all these studies seem to agree, at least as far as their common diagnostic of multifractal scaling of the cloud structure is concerned, they all concern 1D data. To our knowledge, the structure function method has also been applied to 1D cuts of high-resolution satellite images [197,207], but we are not aware of any results coming from a specific 2D analysis. Our goal here is to take advantage of the 2D WTMM method to carry out a multifractal analysis of high-resolution satellite images of Sc cloudy scenes [106,107,110]. Beyond the issue of improving the statistical characterization of in situ and remotely sensed data, there is a most challenging aspect, which consists in extracting structural information to constrain stochastic cloud models, which in turn will be used for radiative transfer simulations [180,182,202,208–215]. Then, by comparing the multifractal properties of the numerically generated artificial radiation fields with those of actual measurements, one can hope to achieve some degree of closure.
A. Landsat Data of Marine Stratocumulus Cloud Scenes

Over the past 15 years, Landsat imagery has provided the remote sensing community at large with a very attractive and reliable tool for studying the Earth’s environment [177,179–181,196–202,216,217]. One of the main advantages of high-resolution satellite imagery is its rather low effective cost as compared to outfitting and flying research aircraft. Moreover, this instrument is well calibrated and it offers the possibility of reaching unusually high spatial, spectral, and radiometric resolutions [197,216]. Mainly two types of statistical analysis have been applied so far to Landsat imagery: spectral analysis of the 2D radiance field [196–200,216] and joint area and perimeter distributions for ensembles of individual clouds [177,179–181] defined by some threshold in radiance. One of the most remarkable properties of Landsat cloud scenes is their statistical scale invariance over a rather large range of scales, which explains why fractal and multifractal concepts have progressively gained more acceptance in the atmospheric scientific community [10]. Of all cloud types, marine stratocumulus (Sc) are without doubt the ones that have attracted the most attention, mainly because of their first-order effect on the Earth’s energy balance [10,173,197,216,218]. Being at once very persistent and horizontally extended, marine Sc layers carry considerable weight in the overall reflectance (albedo) of the planet and, from there, command a strong effect on its global climate [174]. Furthermore, with respect to climate modeling [175] and the major problem of cloud–radiation
interaction [182,196,197,208–211], they are presumably at their simplest in marine Sc that are relatively thin ($\approx 300$–$500$ m), with well-defined (quasiplanar) top and bottom, thus approximating the plane-parallel geometry in which radiative transfer theory is well developed [173,182,197,209,210,213]. However, because of its internal homogeneity assumption, plane-parallel theory shows systematic biases in large-scale average reflectance [210,219] relevant to Global Circulation Model (GCM) energetics and large random errors in small-scale values [213,220] relevant to remote-sensing applications. Indeed, marine Sc have huge internal variability [184,192], not necessarily apparent to the remote observer. In this section we challenge previous analyses [177,179–181,196–202,216,217] of Landsat imagery using the 2D WTMM methodology [106–110] with the specific goal of improving the statistical characterization of the highly intermittent radiance fluctuations of marine Sc, a prerequisite for developing better models of cloud structure and, in turn, furthering our understanding of cloud–radiation interaction. For that purpose, we analyze [110] an original ($\simeq 196 \times 168$ km$^2$) cloudy Landsat 5 scene captured with the TM camera (1 pixel = 30 m) in the 0.6–0.7 $\mu$m channel (i.e., reflected solar photons as opposed to their counterparts emitted in the thermal infrared) during the first ISCCP (International Satellite Cloud Climatology Project) Research Experiment (FIRE) field program [193], which took place over the Pacific Ocean off San Diego in the summer of 1987. For computational convenience, we actually select 32 overlapping $1024 \times 1024$ pixel subscenes in this cloudy region. The overall extent of the explored area is about 7840 km$^2$.
Figure 26a shows a typical ($1024 \times 1024$) portion of the original image where the eight-bit gray-scale coding of the quasinadir viewing radiance clearly reveals the presence of some anisotropic texture induced by convective structures that are generally aligned to the wind direction.
B. Application of the 2D WTMM Method to Landsat Images of Stratocumulus Clouds

We systematically follow the numerical implementation procedure described in Section II.C. We first wavelet transform the 32 overlapping ($1024 \times 1024$) images, cut out of the original image, with the first-order ($n_\psi = 1$) radially symmetric analyzing wavelet defined in Figure 1. From the wavelet transform skeleton defined by the WTMMM, we compute the partition functions from which we extract the $\tau(q)$ and $D(h)$ multifractal spectra. We systematically test the robustness of our estimates with respect to some change in the shape of the analyzing wavelet, in particular when increasing the number of zero moments.
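To make the step from the WT skeleton to the partition functions concrete, here is a minimal sketch. It assumes the usual WTMM definition in which $Z(q, a)$ sums, over the maxima lines that exist at scale $a$, the $q$-th power of the supremum of the modulus along each line for scales $a' \le a$; this definition [Eq. (21)] is not restated in this section, so it should be read as an assumption. The skeleton layout (a list of lines, each a list of `(scale, modulus)` pairs) and the function name are hypothetical.

```python
def partition_function(skeleton, q, a):
    """Z(q, a) from a WT skeleton.

    skeleton: list of maxima lines; each line is a list of (scale, modulus)
    pairs. A line contributes only if it reaches scale a, and its
    contribution is (sup of the modulus over scales a' <= a) ** q.
    """
    total = 0.0
    for line in skeleton:
        if max(s for s, _ in line) < a:
            continue  # this maxima line ends below scale a
        moduli_below = [m for s, m in line if s <= a]
        if not moduli_below:
            continue  # no sample of this line at or below scale a
        total += max(moduli_below) ** q
    return total

# Toy skeleton: two lines with modulus ~ a**0.6 up to a = 4, one up to a = 2.
long_line = [(1.0, 1.0), (2.0, 2.0 ** 0.6), (4.0, 4.0 ** 0.6)]
short_line = [(1.0, 1.0), (2.0, 2.0 ** 0.6)]
skeleton = [long_line, long_line, short_line]

z0_at_2 = partition_function(skeleton, 0, 2.0)   # counts lines alive at a = 2
z0_at_4 = partition_function(skeleton, 0, 4.0)   # counts lines alive at a = 4
z1_at_4 = partition_function(skeleton, 1, 4.0)
```

Taking the supremum along each line is what keeps the sum well behaved for $q < 0$, because it prevents arbitrarily small modulus values from dominating. The exponents $\tau(q)$ then follow from the slope of $\log_2 Z(q, a)$ versus $\log_2 a$.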
Figure 26. 2D wavelet transform analysis of a Landsat image of marine Sc clouds [110]. $\psi(\mathbf{x})$ is the first-order radially symmetric analyzing wavelet shown in Figure 1. (a) A 256 gray-scale coding of a ($1024 \times 1024$) portion of the original radiance image. In (b) $a = 2^{2.9}\sigma_W$, (c) $a = 2^{1.9}\sigma_W$, and (d) $a = 2^{3.9}\sigma_W$ (where $\sigma_W = 13$ pixels $\simeq 390$ m), are shown the maxima chains; the local maxima of $\mathcal{M}_\psi$ along these chains are indicated by ( ), from which originates an arrow whose length is proportional to $\mathcal{M}_\psi$ and whose direction (with respect to the x-axis) is given by $\mathcal{A}_\psi$; only the central ($512 \times 512$) part delimited by a dashed square in (a) is taken into account to define the WT skeleton. In (b), the smoothed image $\phi_{\mathbf{b},a} * I$ is shown as a gray-scale coded background from white (min) to black (max).
1. Numerical Computation of the Multifractal $\tau(q)$ and $D(h)$ Spectra

Figure 26 illustrates the computation of the maxima chains and the WTMMM for the marine Sc subscene. After linking these WTMMM across scales, one constructs the WT skeleton from which one computes the partition functions $Z(q, a)$ [Eq. (21)]. As reported in Figure 27a, the annealed average partition functions ( ) display some well-defined scaling behavior over the first three octaves, i.e., over the range of scales $390\ \mathrm{m} \lesssim a \lesssim 3120\ \mathrm{m}$, when plotted versus $a$ in a logarithmic representation. Indeed, the scaling deteriorates progressively from the large-scale side when one goes to large values of $|q| \gtrsim 3$. As discussed [110], besides the fact that we are suffering from insufficient sampling, the presence of localized Dirac-like structures likely explains the fact that the observed crossover to a steeper power-law decay occurs at smaller and smaller scales when one increases $q > 0$. Actually, for $q \gtrsim 3$, the crossover scale $a^* \lesssim 1200$ m becomes significantly smaller than the so-called integral scale, which is approximately
Figure 27. Determination of the $\tau(q)$ and $D(h)$ spectra of radiance Landsat images of marine Sc. The 2D WTMM method is used with either a first-order ( ) or a third-order ( ) radially symmetric analyzing wavelet (see Fig. 1). (a) $\log_2 Z(q, a)$ vs. $\log_2 a$; the solid lines correspond to linear regression fits of the data over the first octave and a half. (b) $\tau(q)$ vs. $q$ as obtained from a linear regression fit of the data in (a). (c) $D(h)$ vs. $h$, after Legendre transforming the $\tau(q)$ curve in (b). In (b) and (c), the solid lines correspond to the theoretical multifractal spectra for log-normal W-cascades with parameter values $m = -0.38 \ln 2$ and $\sigma^2 = 0.07 \ln 2$ [Eqs. (42) and (43)]. The $D(h)$ singularity spectra of velocity (dotted line) and temperature (dashed line) fluctuations in fully developed turbulence are shown for comparison in (c).
given by the characteristic width $\simeq 5$–$6$ km of the convective rolls (Fig. 26a). When proceeding to a linear regression fit of the data in Figure 27a over the first octave and a half (in order to avoid any bias induced by the presence of the observed crossover at large scales), one gets the $\tau(q)$ spectrum ( ) shown in Figure 27b. In contrast to the fractional Brownian rough surfaces studied in Section III.A [108], this $\tau(q)$ spectrum unambiguously deviates from a straight line. When Legendre transforming this nonlinear $\tau(q)$ curve, one gets the $D(h)$ singularity spectrum reported in Figure 27c. Its characteristic single-humped shape over a finite range of Hölder exponents is a clear indication of the multifractal nature of the marine Sc radiance fluctuations. We have checked [110] that the estimate of the $D(h)$ singularity spectrum from the scaling behavior of the partition functions $h(q, a)$ [Eq. (29)] and $D(q, a)$ [Eq. (30)] yields similar quantitative results.
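The Legendre transform step can be sketched numerically as follows. The quadratic form of $\tau(q)$ used here is the one usually associated with log-normal W-cascades, $\tau(q) = -\frac{\sigma^2}{2 \ln 2} q^2 - \frac{m}{\ln 2} q - 2$; Eqs. (42) and (43) are not restated in this section, so both this form and the sign convention $m = -0.38 \ln 2$ (which gives $h(q = 0) = 0.38$) are assumptions. The function names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def tau_lognormal(q, m=-0.38 * LN2, sigma2=0.07 * LN2):
    """Assumed quadratic tau(q) of a 2D log-normal W-cascade."""
    return -sigma2 / (2.0 * LN2) * q * q - m / LN2 * q - 2.0

def legendre_transform(tau, qs, dq=1e-4):
    """Return (h, D) pairs with h = d tau / dq and D = q * h - tau(q).

    The derivative is taken by central differences, so tau only needs
    to be a callable; a fitted spectrum could be interpolated first.
    """
    spectrum = []
    for q in qs:
        h = (tau(q + dq) - tau(q - dq)) / (2.0 * dq)
        spectrum.append((h, q * h - tau(q)))
    return spectrum

qs = [k / 2.0 for k in range(-6, 7)]          # q from -3 to 3
spectrum = legendre_transform(tau_lognormal, qs)
h0, d0 = spectrum[qs.index(0.0)]              # top of the D(h) curve
```

For a concave $\tau(q)$ such as this one, evaluating $qh - \tau(q)$ at $h = \partial\tau/\partial q$ is equivalent to taking the minimum over $q$. With the assumed parameters, the maximum of $D(h)$ sits at $h \simeq 0.38$ with $D \simeq 2$, and negative $q$ values populate the decreasing right-hand branch.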
Figure 27 also shows, for comparison, the results ( ) obtained when applying the 2D WTMM method with a third-order ($n_\psi = 3$) radially symmetric analyzing wavelet (the smoothing function being the isotropic 2D Mexican hat). As seen in Figure 27a, the use of a wavelet that has more zero moments seems to somehow improve scaling. For the range of $q$-values investigated, the crossover scale turns out to be pushed to a larger scale, enlarging by some amount the range of scales over which scaling properties can be measured, especially for the largest values of $|q|$. The fact that one improves scaling when increasing the order of the analyzing wavelet suggests that some smooth behavior may deteriorate our statistical estimate of the multifractal spectra of the original Landsat radiance image. Let us recall that, as explained in Section II.B, smooth $C^\infty$ behavior may give rise to maxima lines along which $\mathcal{M}_\psi \sim a^{n_\psi}$ (Fig. 6b); hence the larger $n_\psi$, the smaller the overall contribution of those “spurious” maxima lines in the partition function summation over the WT skeleton. As seen in Figure 26a, the anisotropic texture induced by the convective streets or rolls might well be at the origin of the relative lack of well-defined scale invariance. When looking at the corresponding $\tau(q)$ spectrum ( ) in Figure 27b, one gets quantitatively the same estimates for $q \gtrsim -1$. For more negative values of $q$, the data obtained with the third-order analyzing wavelet clearly depart from the previous estimates with the first-order wavelet. The slope of the new $\tau(q)$ spectrum is somehow weakened, which implies, from the Legendre transform properties, that the corresponding values of $h(q) = \partial\tau/\partial q$ are reduced.
The computation of the $D(h)$ singularity spectrum ( ) in Figure 27c sheds light on this phenomenon: while the increasing left-hand branch (which corresponds to the strongest singularities) of the $D(h)$ curve appears to be quite robust with respect to the choice of $\psi$, the decreasing right-hand branch (associated with the weakest singularities) is modified when increasing the number of zero moments of $\psi$. As shown in Figure 27b and c, the $D(h)$ spectrum as well as the $\tau(q)$ spectrum data are very well fitted by the theoretical quadratic spectra predicted for log-normal random W-cascades [Eqs. (42) and (43)]. However, with the first-order analyzing wavelet, the best fit is obtained with the parameter values $m = -0.38 \ln 2 = -0.263$ and $\sigma^2 = 0.07 \ln 2 = 0.049$, while for the third-order wavelet these parameters take slightly different values, namely $m = -0.366 \ln 2 = -0.254$ and $\sigma^2 = 0.06 \ln 2 = 0.042$. The variance parameter $\sigma^2$ that characterizes the intermittent nature of marine Sc radiance fluctuations is therefore somehow reduced when going from $n_\psi = 1$ to $n_\psi = 3$. The lack of statistical convergence because of insufficient sampling is actually the main reason for this uncertainty in the estimate of $\sigma^2$ [110]. As previously experienced [109] for synthetic multifractal rough surfaces, an accurate
Figure 28. Pdfs of the WTMMM coefficients of the 32 ($1024 \times 1024$) radiance Landsat images as computed with the first-order radially symmetric analyzing wavelet. (a) $P_a(\mathcal{M})$ vs. $\mathcal{M}$; (b) $P_a(\mathcal{A})$ vs. $\mathcal{A}$; the symbols correspond to the following scales: $a = 2^{0.3}\sigma_W = 480$ m ( ), $2^{1.3}\sigma_W = 960$ m ( ), and $2^{2.3}\sigma_W = 1920$ m ( ). The solid lines in (a) correspond to log-normal distributions.
estimate of the exponents $\tau(q)$ for $q \lesssim -3$ requires more than 32 ($1024 \times 1024$) images. With the statistical sample of Landsat images we have at our disposal, one gets $D(h(q = 0) = 0.37 \pm 0.02) = 2.00 \pm 0.01$, which is a strong indication that the radiance field is singular everywhere. From the estimate $\tau(q = 2) = -1.38 \pm 0.02$, one gets the following estimate of the spectral exponent: $\beta = \tau(2) + 4 = 2.62 \pm 0.02$, i.e., a value in good agreement with previous estimates [185–189,191,196–200,216].

2. WTMMM Probability Density Functions

This subsection is mainly devoted to the analysis of the joint probability distribution function $P_a(\mathcal{M}, \mathcal{A})$ [108–110] as computed from the wavelet transform skeletons of the 32 ($1024 \times 1024$) radiance images with the first-order radially symmetric analyzing wavelet ($n_\psi = 1$). Figure 28a and b respectively show the pdfs $P_a(\mathcal{M}) = \int d\mathcal{A}\, P_a(\mathcal{M}, \mathcal{A})$ and $P_a(\mathcal{A}) = \int d\mathcal{M}\, P_a(\mathcal{M}, \mathcal{A})$ for three different values of the scale parameter: $a = 2^{0.3}\sigma_W$ (480 m), $2^{1.3}\sigma_W$ (960 m), and $2^{2.3}\sigma_W$ (1920 m). First let us focus on the results shown in Figure 28b for $P_a(\mathcal{A})$. This distribution is clearly scale dependent, with some evidence of anisotropy enhancement when going from small to large scales, in particular when one reaches scales that become comparable to the characteristic width of the convective structures (i.e., a few kilometers wide). Two peaks around the values $\mathcal{A} \simeq -\pi/6$ and $5\pi/6$ become more and more pronounced as the signature of a privileged direction in the analyzed images. As one can check from a visual inspection of Figure 26a, this direction is nothing but the perpendicular to the mean direction of the convective rolls that are generally aligned to the wind direction. This is another clear indication that at large scales, the wavelet
Figure 29. Pdfs of the WTMMM coefficients of the 32 ($1024 \times 1024$) radiance Landsat images as computed with a first-order radially symmetric analyzing wavelet. Pdfs of $\mathcal{M}$ when conditioned by $\mathcal{A}$. The different symbols correspond to fixing $\mathcal{A}$ (mod $\pi$) to $0 \pm \pi/8$ ( ), $\pi/4 \pm \pi/8$ (□), $\pi/2 \pm \pi/8$ (△), and $3\pi/4 \pm \pi/8$ (&). (a) $a = 2^{0.3}\sigma_W = 480$ m; (b) $a = 2^{1.3}\sigma_W = 960$ m.
transform microscope is sensitive to the convective roll texture, a rather regular modulation superimposed on the background radiance fluctuations [107–110]. Another important message that comes out of our analysis is illustrated in Figure 29. When conditioning the pdf of $\mathcal{M}$ by the argument $\mathcal{A}$, the shape of this pdf is shown to be independent of the considered value of $\mathcal{A}$, as long as the value of the scale parameter $a$ remains small compared to the characteristic width of the convective structures. The observation that the joint probability distribution actually factorizes, i.e., $P_a(\mathcal{M}, \mathcal{A}) = P_a(\mathcal{M}) P_a(\mathcal{A})$, indicates that $\mathcal{M}$ and $\mathcal{A}$ are likely to be independent [107,110]. This implies that all the multifractal properties of the marine Sc radiance fluctuations are contained in the way the shape of the pdf of $\mathcal{M}$ evolves when one decreases the scale parameter $a$. This evolution is illustrated in Figure 28a when using a first-order radially symmetric analyzing wavelet. Since by definition the WTMMM are different from zero, $P_a(\mathcal{M})$ decreases exponentially fast to zero at zero. As previously emphasized [108], this observation is at the heart of the 2D WTMM method, which, for this reason, does not suffer from any divergence problem when estimating the $\tau(q)$ spectrum for $q < 0$. As shown in Figure 28a, for any scale significantly smaller than the integral scale ($\approx 5$–$6$ km, as given by the characteristic width of the convective structures), all the data points fall, to a good approximation, on a log-normal curve [106,110]. As shown [110], this experimental feature is not specific to some particular shape of the analyzing wavelet, since log-normal pdfs are also found when using a third-order radially symmetric analyzing wavelet.
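The factorization test described above can be sketched on a binned joint histogram: one builds $P_a(\mathcal{M}, \mathcal{A})$, derives the two marginals by summation, and looks at the largest departure from the product of the marginals. The following is only an illustration on synthetic data (independent log-normal modulus and uniform argument); the bin edges, sample size, and function names are hypothetical and are not taken from the analysis reported here.

```python
import math
import random

def bin_index(value, edges):
    """Index i such that edges[i] <= value < edges[i + 1], or None if outside."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None

def joint_pdf(samples, m_edges, a_edges):
    """Normalized joint histogram: one row per M bin, one column per A bin."""
    counts = [[0] * (len(a_edges) - 1) for _ in range(len(m_edges) - 1)]
    for m, a in samples:
        i, j = bin_index(m, m_edges), bin_index(a, a_edges)
        if i is not None and j is not None:
            counts[i][j] += 1
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def factorization_gap(joint):
    """Largest |P(M, A) - P(M) P(A)| over all bins (small if M, A independent)."""
    p_m = [sum(row) for row in joint]            # marginal over A
    p_a = [sum(col) for col in zip(*joint)]      # marginal over M
    return max(abs(joint[i][j] - p_m[i] * p_a[j])
               for i in range(len(p_m)) for j in range(len(p_a)))

# Synthetic stand-in for WTMMM data at one scale.
random.seed(0)
samples = [(random.lognormvariate(0.0, 0.5), random.uniform(-math.pi, math.pi))
           for _ in range(20000)]
m_edges = [0.0, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0, 3.0, float("inf")]
a_edges = [-math.pi + k * math.pi / 4 for k in range(9)]
joint = joint_pdf(samples, m_edges, a_edges)
gap = factorization_gap(joint)
```

On real data, the gap should be compared with the statistical fluctuation expected for the available number of WTMMM, and one would expect it to grow at scales approaching the width of the convective rolls, where the argument pdf becomes strongly anisotropic.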
Figure 30. Magnitude correlation function $C(\Delta x, a_1, a_2)$ vs. $\log_2(\Delta x)$, as computed from the 32 ($1024 \times 1024$) radiance Landsat images using a first-order radially symmetric analyzing wavelet. (a) WTMMM magnitude: $\omega(\mathbf{x}, a) = \ln \mathcal{M}_\psi[f](\mathcal{L}_{\mathbf{x}}(a))$. (b) Continuous WT magnitude: $\omega(\mathbf{x}, a) = \frac{1}{2} \ln \varepsilon^2(\mathbf{x}, a)$ [Eq. (48)]. The symbols have the following meaning: $a_1 = a_2 = 2\sigma_W = 780$ m ( ); $a_1 = \sigma_W = 390$ m, $a_2 = 2\sigma_W = 780$ m (△); $a_1 = \sigma_W = 390$ m, $a_2 = 4\sigma_W = 1560$ m (□); $a_1 = 2\sigma_W = 780$ m, $a_2 = 4\sigma_W = 1560$ m ( ). The solid (dashed) lines correspond to the theoretical prediction [Eq. (50)] for multifractal rough surfaces generated with the random W-cascade model with parameters $\sigma^2 = 0.08 \ln 2$ ($0.16 \ln 2$) and $L = 220$ pixels $= 6.6$ km.
C. Space-Scale Correlation Function Analysis of Radiance Landsat Images

As pointed out in Section III.C, the real demonstration of the existence of an underlying multiplicative structure consists in taking advantage of the space-scale unfolding provided by the continuous wavelet transform to compute the cross-scale correlation functions. Figure 30 reports the results of the computation of $C(\Delta x, a_1, a_2)$ when averaging over the 32 ($1024 \times 1024$) radiance Landsat images, using either the WTMMM (Fig. 30a) or the continuous WT (Fig. 30b) definition of the magnitude of $f$ (Section III.C [110]). One can see that for $\Delta x > \sup(a_1, a_2)$, all the data points fall, to a good approximation, onto a unique curve when plotted versus $\log_2(\Delta x)$, independently of the considered pair of scales ($a_1$, $a_2$). Moreover, a straight line of slope $\sigma^2 = 0.012$ provides a rather reasonable fit of the data up to a separation distance $\Delta x \simeq 2^7$ pixels $\simeq 3.8$ km, where decorrelation seems to be attained. Note that using the WTMMM instead of the continuous WT does not make any difference; this is a strong indication of the existence of some ultrametric properties underlying the branching structure of the space-scale wavelet representation of the radiance fluctuations. On top of the data in both Figure 30a and b, we have shown, for comparison, the theoretical prediction [Eq. (50)] for the “two-scale” correlation function of multifractal rough surfaces generated by the random W-cascade model. This formula provides a reasonable fit of the data when adjusting the model parameters to $\sigma^2 = 0.16 \ln 2$ and $L = 220$
pixels $= 6.6$ km. Although the estimate of the integral scale seems to be of the right order of magnitude with regard to the characteristic width ($\approx 5$–$6$ km) of the convective rolls, the value obtained for the intermittency parameter $\sigma^2$ is about twice as large as previous estimates derived from the WTMM computation of the $\tau(q)$ and $D(h)$ multifractal spectra in Figure 27. At this point, let us emphasize that a similar discrepancy has been previously noticed in the WTMM analysis of wind tunnel turbulent velocity fields [85,152]. It may suggest that simple scale-invariant self-similar cascades as pictured by the random W-cascade model are not sophisticated enough to account for the space-scale structure of the radiance fluctuations in marine Sc clouds. The interpretation of this feature in terms of correlations between weights at a given cascade step or in terms of a more complex geometry of the tree underlying the multiplicative structure of the radiance field is underway. The possible importance of the intermittently distributed localized downward spike structures is also under consideration. Before drawing definite conclusions, there is clearly a need to repeat the “two-point” correlation function analysis on the background radiance fluctuations, once all the maxima lines corresponding to those Dirac-like singularities are removed from the WT skeleton.
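For readers who wish to reproduce this kind of analysis, the following sketch shows how a magnitude correlation function can be estimated on a 1D cut. It assumes the common definition $C(\Delta x, a_1, a_2) = \langle \tilde\omega(x, a_1)\, \tilde\omega(x + \Delta x, a_2) \rangle$ with $\tilde\omega = \omega - \langle\omega\rangle$, which is not restated in this section; the function name and the toy input are hypothetical.

```python
def magnitude_correlation(omega_1, omega_2, lag):
    """Estimate <w1(x) w2(x + lag)> for centered magnitudes w = omega - <omega>.

    omega_1, omega_2: equally sampled magnitudes (e.g. the logarithm of the
    WT modulus) at scales a1 and a2 along the same 1D cut; lag in samples.
    """
    mean_1 = sum(omega_1) / len(omega_1)
    mean_2 = sum(omega_2) / len(omega_2)
    n = len(omega_1) - lag
    return sum((omega_1[i] - mean_1) * (omega_2[i + lag] - mean_2)
               for i in range(n)) / n

# Toy input with a known answer: a period-2 sequence.
omega = [0.0, 1.0] * 4
c_lag0 = magnitude_correlation(omega, omega, 0)
c_lag1 = magnitude_correlation(omega, omega, 1)
c_lag2 = magnitude_correlation(omega, omega, 2)
```

For a multiplicative cascade, one would plot such estimates against $\log_2(\Delta x)$ for several pairs of scales and check whether they collapse onto a single straight line up to the integral scale; the toy sequence above only verifies the bookkeeping of lags and means.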
D. Comparative WTMM Multifractal Analysis of Landsat Radiance Field and Velocity and Temperature Fields in Fully Developed Turbulence

Let us point out that a similar 1D WTMM analysis of the velocity fluctuations in high Reynolds number turbulence has come to conclusions very close to those of the present study [81–87,221]. Besides the presence of rather localized Dirac-like structures that witness the probing of vorticity filaments [62,84,127,221], the multifractal nature of turbulent velocity is likely to be understood in terms of a log-normal cascading process that is expected to be scale-invariant in the limit of very high Reynolds numbers [81–87]. Figure 27c shows, for comparison, the results obtained for the $D(h)$ singularity spectrum of the radiance Landsat images together with the $D(h)$ data extracted from the 1D analysis of a turbulent velocity signal recorded at the Modane wind tunnel ($R_\lambda \simeq 2000$) [82,85] [indeed, $D(h) + 1$ is represented for the latter in order to compare 1D to 2D data]. The turbulent velocity $D(h)$ spectrum significantly differs from the results obtained for the marine Sc cloud. They nevertheless have a common feature: the Hölder exponent most frequently encountered in the radiance field, $h = -m/\ln 2 = h(q = 0) = \partial\tau/\partial q|_{q=0} = 0.38 \pm 0.01$, is indistinguishable from the corresponding exponent $h = h(q = 0) = 0.39 \pm 0.01$ found for the turbulent velocity field. Note that these values are significantly larger than the theoretical value
$h = 1/3$ predicted by Kolmogorov in 1941 [222] to account for the observed $k^{-5/3}$ power-spectrum behavior. The main difference comes from the intermittency parameter, which is much larger for the cloud, $\sigma^2/\ln 2 = 0.07 \pm 0.01$ ($n_\psi = 1$) or $\sigma^2/\ln 2 = 0.06 \pm 0.01$ ($n_\psi = 3$), than for the turbulent velocity, $\sigma^2/\ln 2 = 0.036 \pm 0.004$. This indicates that the radiance field is much more intermittent than the velocity field: the $D(h)$ singularity spectrum for the former is unambiguously wider than the corresponding spectrum for the latter. For the sake of comparison, in Figure 27c we have also reported the multifractal $D(h)$ spectrum of the temperature fluctuations recorded in an $R_\lambda = 400$ turbulent flow [223]. The corresponding single-humped curve is definitely much wider than the velocity $D(h)$ spectrum and it is rather close to the data corresponding to the marine Sc radiance field. It is well recognized, however, that liquid water is not really passive and that its identification with a passive component in atmospheric dynamics offers limited insight into cloud structure since, by definition, near-saturation conditions prevail and latent heat production affects buoyancy [202]. So cloud microphysical processes are expected to interact with the circulation at some, if not all, scales [224]. Nevertheless, our results in Figure 27c indicate that from a multifractal point of view, the intermittency captured by the Landsat satellite looks statistically equivalent to the intermittency of a passive scalar in fully developed 3D turbulence. The fact that the internal structure of Sc cloud somehow reflects some statistical properties of atmospheric turbulence is not such a surprise in this highly turbulent environment. The investigation of different sets of Landsat data is urgently required in order to test the degree of generality of the results reported in this first WTMM analysis of high-resolution satellite images.
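To see how the quoted intermittency parameters translate into spectrum widths, one can use the quadratic $D(h)$ of a log-normal cascade. The form below is an assumption (the log-normal spectra are not restated in this section), with $d = 2$ for the radiance image and $d = 1$ for the 1D velocity signal, the latter being plotted as $D(h) + 1$ in Figure 27c.

```latex
% Assumed log-normal singularity spectrum in dimension d
D(h) = d - \frac{(h - h_0)^2}{2\,\sigma^2/\ln 2},
\qquad h_0 = -\frac{m}{\ln 2}

% Half-width \Delta h at one unit below the maximum, D(h_0 \pm \Delta h) = d - 1
\Delta h = \sqrt{2\,\sigma^2/\ln 2}

% Radiance, n_\psi = 1:  \Delta h = \sqrt{2 \times 0.07}  \simeq 0.37
% Radiance, n_\psi = 3:  \Delta h = \sqrt{2 \times 0.06}  \simeq 0.35
% Velocity:              \Delta h = \sqrt{2 \times 0.036} \simeq 0.27
```

Under this assumption, the radiance spectrum is wider than the velocity spectrum by a factor of about $\sqrt{0.07/0.036} \simeq 1.4$ at any fixed level below the maximum, which is consistent with the qualitative statement made above.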
In particular, one may wonder to what extent the marine Sc Landsat data collected off the coast of San Diego on July 7, 1987 under specific observation conditions actually reflect the specific internal structure of Sc clouds. Work in this direction is currently in progress. Finally, with respect to the issue of cloud modeling, the WTMM analysis of marine Sc Landsat data indicates that the 2D random W-cascade models introduced [109] are much more realistic hierarchical models than commonly used multifractal models such as the fractionally integrated singular cascade [120,123,200,205,216] or the bounded cascade models [218,225]. We are quite optimistic about using the log-normal W-cascade models with realistic parameter values for radiation transfer simulations. In our opinion, random W-cascade models are a real breakthrough, not only for the general purpose of image synthesis, but more specifically for cloud modeling. It is likely that better cloud modeling will enable further progress in our understanding of cloud–radiation interactions.
V. Multifractal Analysis of 3D Turbulence Simulation Data

A. Multifractal Description of Intermittency

1. Intermittency Based on the Velocity Field

Since Kolmogorov’s founding work in 1941 (K41) [222], fully developed turbulence has been intensively studied theoretically, numerically, and experimentally [18,44,226–229]. A standard way of analyzing a turbulent flow is to look for some universal statistical properties of the fluctuations of the velocity increments over a distance $\ell$:

$$\delta\mathbf{v}(\mathbf{r}, \ell\mathbf{e}) = \mathbf{v}(\mathbf{r} + \ell\mathbf{e}) - \mathbf{v}(\mathbf{r}) \tag{51}$$
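where $\mathbf{e}$ is an arbitrary unit vector. For instance, investigating the scaling properties of the longitudinal structure functions

$$S_p(\ell) = \langle (\mathbf{e} \cdot \delta\mathbf{v}(\mathbf{r}, \ell\mathbf{e}))^p \rangle \sim \ell^{\zeta_p}, \qquad p > 0 \tag{52}$$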
where $\langle \cdot \rangle$ stands for ensemble average, leads to a spectrum of scaling exponents $\zeta_p$ that has been widely used as a statistical characterization of turbulent fields [18,44,226,228,229]. Based upon assumptions of statistical homogeneity, isotropy, and constant mean energy dissipation per unit mass $\epsilon$, K41 asymptotic theory predicts the existence of an inertial range $\eta \ll \ell \ll L$ for which the structure functions behave as

$$S_p(\ell) \sim \epsilon^{p/3} \ell^{p/3} \tag{53}$$
where $\eta$ is the Kolmogorov dissipative scale and $L$ the so-called integral scale. Although these assumptions are usually considered to be correct, there has been increasing numerical [18,226,230,231] and experimental [18,44,226,228,229,232–240] evidence that $\zeta_p$ deviates substantially from the K41 prediction $\zeta_p = \frac{1}{3} p$ at large $p$. The observed nonlinear behavior of the $\zeta_p$ spectrum actually characterizes some evolution of the longitudinal velocity increment probability density function (pdf) in the inertial range, from a Gaussian shape at large scales to stretched exponential tails toward smaller scales [228,234,235,241–245]. This evolution of the longitudinal velocity increment statistics across scales is at the heart of the multifractal description of the intermittency of small scales, pioneered by Parisi and Frisch in 1985 [43]. K41 theory is actually based on the assumption that at each point $\mathbf{r}$ of the fluid, the velocity field has the same scaling behavior $\mathbf{e} \cdot \delta\mathbf{v}(\mathbf{r}, \ell\mathbf{e}) \sim \ell^{1/3}$, which yields the well-known $E(k) \sim k^{-5/3}$ energy spectrum [18]. By interpreting the nonlinear behavior of $\zeta_p$ as a direct consequence of the existence of spatial fluctuations in the local regularity of the velocity field, namely
$$\mathbf{e}\cdot\delta\mathbf{v}(\mathbf{r}, l\mathbf{e}) \sim l^{h(\mathbf{r})} \tag{54}$$
where the exponent $h$ depends upon $\mathbf{r}$, Parisi and Frisch [43] proposed to capture intermittency in a geometric framework. For each $h$, if one calls $D(h)$ the fractal dimension of the set of spatial points $\mathbf{r}$ for which $h(\mathbf{r}) = h$, then by suitably inserting this local scaling behavior [Eq. (54)] into Eq. (52), one can bridge the so-called singularity spectrum $D(h)$ and the set of scaling exponents $\zeta_p$ by a Legendre transform:

$$D(h) = \min_p \left( ph - \zeta_p + d \right) \tag{55}$$
where $d = 3$ is the dimension of the velocity field. From the properties of the Legendre transform, a nonlinear $\zeta_p$ spectrum is equivalent to the assumption that there is more than a single exponent $h$. But as already mentioned in the introduction (Section I), Eq. (55) is valid for positive (integer) $p$ values only, which precludes the computation of the entire $D(h)$ spectrum (in particular, its right decreasing part, corresponding to the weakest singularities, is inaccessible to the structure function method) [50]. In the early 1990s, the 1D WTMM method [47–50] was introduced to overcome the insufficiencies of the numerical techniques commonly used to perform multifractal analysis (e.g., the structure function method and the box-counting techniques). The use of wavelets (instead of increments or boxes) actually allows us to compute partition functions that scale like $Z(q, a) \sim a^{\tau(q)}$, where the exponents $\tau(q)$ are nothing but a generalization of the exponents $\zeta_p$ in the sense that $q$ is now a real number going from $-\infty$ to $+\infty$. Then, as demonstrated in Refs. [74,75], by Legendre transforming the $\tau(q)$ spectrum, one gets both the increasing ($q > 0$) and the decreasing ($q < 0$) parts of the $D(h)$ singularity spectrum. Preliminary results obtained for high Reynolds wind tunnel experimental data with the 1D WTMM method have confirmed the nonlinearity of the $\tau(q)$ spectrum and consequently the multifractal nature of the longitudinal velocity fluctuations [47–49]. Let us note that from low to moderate Reynolds number turbulence, the inertial range revealed in numerical simulations as well as in experiments is rather small, which makes the estimate of the scaling exponents $\zeta_p$ and $\tau(q)$ not very accurate. Actually, the existence of scaling laws such as Eq. (52) for the structure functions [240,246,247], as well as for the WTMM partition functions [81–83,85], is not clear experimentally, even at the highest accessible Reynolds numbers.
Indeed, there is a persistent curvature when one plots $\ln[S_p(l)]$ vs. $\ln(l)$, which means that, rigorously speaking, there is no scale invariance. This observation somehow questions the validity of the multifractal description. Benzi et al. [248–250] proposed some remedy to the observed departure from scale invariance by looking at the scaling behavior of one structure function against another. More precisely, $\zeta_p$ can be
estimated from the behavior $S_p(l) \sim S_3(l)^{\zeta_p}$, if one assumes that $\zeta_3 = 1$ [18]. The so-called extended self-similarity (ESS) hypothesis improves the scaling behavior and extends it further toward the dissipative range [230,248–250]. From the application of ESS, some broad consensus among European researchers was reached in 1996 [240], at least as far as isotropic homogeneous turbulence is concerned. In this context, the ESS hypothesis has received strong support from the ‘‘propagator (across scales)’’ approach originally developed by Castaing and co-workers [235,246,251–257] and recently revisited with the wavelet transform methodology [81–83,85,221,258]. Let us notice that Castaing’s approach can be linked to the recently proposed Fokker–Planck/Langevin description of intermittency [259–261]. According to this description, the velocity field is a Markov process across scales, which suggests that the velocity increment pdf at different scales obeys a Fokker–Planck differential equation characterized by a drift and a diffusion coefficient. Even though this description remains, to a large extent, formal from a mathematical point of view and very phenomenological, it can be interesting because of its great versatility as far as scaling behavior is concerned [262]. Let us note that some theoretical works have tried to build some bridge between the Fokker–Planck approach and the Navier–Stokes dynamics [263,264]. Very recently, a systematic computation of the cumulants of the magnitude $\ln|\mathbf{e}\cdot\delta\mathbf{v}(\mathbf{r}, l\mathbf{e})|$ of 1D longitudinal velocity profiles stemming from three different experimental setups and covering a broad range of Taylor-scaled Reynolds numbers from $R_\lambda = 89$ to 2500 has clearly revealed some inconsistency with the ESS hypothesis [87].
Indeed this study shows that the breaking of scale invariance is mainly contained in the first-order cumulant, which is found to strongly depend on Reynolds number and experimental conditions, whereas, surprisingly, the second-order cumulant displays universal scale invariance behavior from $R_\lambda$ values as low as $R_\lambda \simeq 100$. Furthermore, when extrapolating these results to the limit of infinite Reynolds number, this study confirms the asymptotic validity of the log-normal multifractal description of the intermittency phenomenon; the $\zeta_p$ spectrum is quadratic:

$$\zeta_p = \tau(p) + d = -C_1 p - C_2 \frac{p^2}{2} \tag{56}$$
with a well-defined intermittency parameter $C_2 = 0.025 \pm 0.003$ [87]. Note that a plausible explanation for the scale invariance symmetry breaking observed in the magnitude first-order cumulant at finite Reynolds number [and which turns out to pollute the scaling behavior of $S_p(l)$ for every $p$] is the presence of anisotropic velocity fluctuations in the inertial range that are likely to originate from large-scale boundary and forcing effects. We refer
the reader to Refs. [265–268], which show how to master these anisotropic effects using the irreducible representations of the rotation group.

2. Intermittency Based on the Energy Dissipation Field

A central quantity in the K41 theory [222] is the mean energy dissipation $\epsilon$, which is supposed to be constant [Eq. (53)]. The observed nonlinear behavior of the $\zeta_p$ spectrum [Eq. (56)] is generally interpreted as a direct consequence of the intermittency phenomenon displayed by $\epsilon$, which is not spatially homogeneous but undergoes local intermittent fluctuations [18,226,227,235]. Under the so-called Kolmogorov refined similarity hypothesis (RSH) [269,270], the velocity structure functions can be rewritten as

$$S_p(l) \sim \langle \epsilon_l(\mathbf{r})^{p/3} \rangle\, l^{p/3} \sim l^{\tau(p/3) + p/3} \tag{57}$$
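where $\epsilon_l(\mathbf{r})$ is the spatial average of the energy dissipation over a ball of radius $l$ centered at the point $\mathbf{r}$ and of volume $V_l \sim l^d$:

$$\epsilon_l(\mathbf{r}) = \frac{1}{V_l}\int_{V_l} \epsilon(\mathbf{r}')\, d^d r' \tag{58}$$

Note that the dissipation rate $\epsilon$ is related to the symmetric part of the strain tensor ($i, j = 1, 2, 3$):

$$\epsilon = \frac{\nu}{2}\sum_{i,j} (\partial_j v_i + \partial_i v_j)^2 = 2\nu \sum_{i,j} S_{ij} S_{ji} \tag{59}$$

where

$$S_{ij} = \frac{1}{2}(\partial_j v_i + \partial_i v_j) \tag{60}$$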
According to Eq. (57), the scaling exponents of $S_p$ are thus related to those of $\epsilon_l(\mathbf{r})$ [18]:

$$\zeta_p = \tau(p/3) + p/3 \tag{61}$$
By Legendre transforming both sides of this equation, one gets the following relationship between the singularity spectra of $\epsilon$ and $\mathbf{v}$:

$$\alpha = 3h, \qquad f(\alpha) = D(h) \tag{62}$$
where $f(\alpha)$ is the Hausdorff dimension of the set of spatial points such that $\epsilon_l(\mathbf{r})$ behaves like

$$\epsilon_l(\mathbf{r}) \sim l^{\alpha - 1} \quad \text{as} \quad l \to 0 \tag{63}$$

Considered as a measure, the dissipation has singularities of exponent $\alpha - 1$ on sets of dimension:

$$f(\alpha) = \min_q \left( q(\alpha - 1) - \tau(q) + d \right) \tag{64}$$
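The step from Eq. (61) to Eq. (62) can be made explicit. The following is a sketch of the argument, assuming only the Legendre-transform definitions of Eqs. (55) and (64) and the change of variable $q = p/3$ (treated here as a continuous variable):

```latex
\begin{aligned}
D(h) &= \min_p \left( p h - \zeta_p + d \right)
      && \text{[Eq. (55)]} \\
     &= \min_p \left( p h - \tau(p/3) - p/3 + d \right)
      && \text{[Eq. (61)]} \\
     &= \min_q \left( q\,(3h - 1) - \tau(q) + d \right),
      && q = p/3 \\
     &= f(\alpha)\big|_{\alpha = 3h}
      && \text{[Eq. (64)]}
\end{aligned}
```

Identifying $\alpha - 1$ with $3h - 1$ in the third line gives $\alpha = 3h$ and $f(\alpha) = D(h)$, as stated in Eq. (62).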
Several experimental and numerical works have tested various facets of the RSH [164,221,227,250,251,271–282]. The support for the RSH is strong but not unequivocal. In the experiments, besides some possible artifact that may result from the use of Taylor’s hypothesis (which consists in substituting time derivatives for space derivatives) [18], the so-called surrogacy issue concerns the shortcoming of replacing $\epsilon$ by its surrogate:

$$\epsilon' = 15\nu \left( \frac{\partial u}{\partial x} \right)^2 \tag{65}$$

where $u$ is the recorded longitudinal velocity component. Indeed the necessity of working with the surrogate dissipation amounts to assuming that the local dissipation is well approximated by an isotropic form, which is strictly valid in an ensemble-averaged sense in high Reynolds number flows and not obviously satisfied in real experimental conditions [283,284]. In direct Navier–Stokes simulations (DNS), there are strong indications that the detailed structures of the pdfs of the energy dissipation and its 1D surrogate are different and that the velocity increments conditioned on $\epsilon'_l$ do not follow the RSH to the same degree as those conditioned on $\epsilon_l$ [282]. Since Richardson’s pioneering cascade picture [285], multiplicative cascade models have enjoyed a lot of interest as the paradigm of methods for obtaining multifractal dissipation measures [1,2,18,67–71,76,122,123,164,206,227]. The notion of cascade actually refers to a self-similar process whose properties are defined multiplicatively from coarse to fine scales. In that respect, it occupies a central place in the statistical theory of turbulence [18,69,226–228].
Over the past 40 years, since the log-normal model proposed by Kolmogorov [269] and Obukhov [270] (KO62) to account for the correction to K41 theory, refined cascade models have flourished in the literature, such as the random $\beta$-model, the $\alpha$-model, the p-model (for reviews see [18,69,227]), the log-stable models [120–123,286], and more recently the log-infinitely divisible cascade models [254,287–291], including the rather popular log-Poisson model advocated by She and Leveque [292]. Very generally, a self-similar cascade is defined by the way the scales are refined and by the statistics of the multiplicative factor at each step of the process [76,123,167,227,293,294]. One can thus distinguish
discrete cascades that involve discrete scale ratios leading to log-periodic corrections to scaling (discrete scale invariance [295,296]) from continuous cascades without preferable scale factors (continuous scale invariance). As far as the fragmentation process is concerned, one can specify whether some conservation laws are operating or not [76]; in particular one can discriminate between conservative (the measure is conserved at each cascade step) and nonconservative (only some fraction of the measure is transferred at each step) cascades. More fundamentally, there are two main classes of self-similar cascade processes: deterministic cascades that generally correspond to solvable models [69,227] and random cascades that are likely to provide more realistic models but for which some theoretical care is required as far as their multifractal limit and some basic multifractal properties (including multifractal phase transitions) are concerned [76]. As a notable member of the latter class, the independent random cascades introduced by Mandelbrot [297,298] as a general model of random curdling in fully developed turbulence have a special status since they are the main random cascade model for which deep mathematical results have been obtained [299,300]. Recently, these multiplicative random cascade models have been recast in a Fokker–Planck/Langevin description of the pdf of $\ln(\epsilon_l)$ across scales [301,302]. There have been early experimental attempts to measure the $f(\alpha)$ singularity spectrum [Eq. (64)] of the dissipation rate with the specific goal of discriminating between the most popular multiplicative cascade models [69,227,303–305]. Surprisingly, the simplest version of the weighted curdling models proposed by Mandelbrot [297,298], namely the binomial model, turns out to account reasonably well (at least at a certain level of description) for the observed multifractal $\tau(q)$ and $f(\alpha)$ spectra (see Ref. [296] for a recent analysis).
Indeed, all the existing cascade models appeal to adjustable parameters that are difficult to determine by plausible physical arguments and that generally provide enough freedom to account for the experimental data. Moreover, a quantitative validation of any model seems rather illusory since various technical difficulties may have disturbed the measurement of the dissipation multifractal spectra. We refer the reader to Ref. [306] for a review of the possible problems involved in the experimental process. We will mention only two main experimental limitations. The first one results from the fact that the multifractal model of turbulence implies a dependence of the viscous cutoff on the singularity exponent, $\eta(\alpha)/L = Re^{-3/(3+\alpha)}$ [18,306–310]. It is thus a crucial question whether the current hot-wire probes can resolve the scales implied by exponents $\alpha$ significantly less than 1, i.e., those that correspond to the strongest singularities of the dissipation measure. The second one is the fact that single probe measurement of the longitudinal velocity requires the use of the 1D surrogate dissipation $\epsilon'$
approximation [Eq. (65)], which may introduce severe bias in the estimate of the multifractal spectra mainly because of the presence of global and local anisotropic effects. A genuine 3D multifractal processing of turbulence dissipation data is at the moment feasible only for numerically simulated flows. But there is a price to pay for the additional gain of not using Taylor’s frozen flow hypothesis; these simulations are still somehow limited in Reynolds number to regimes where scaling just begins to manifest itself, thus making reliable measurements of multifractal properties difficult [247,274,282,311]. Nevertheless several numerical studies [274,312] agree that, at least at low and moderate Reynolds numbers, the 1D-surrogate energy dissipation is in general more intermittent than the full field, which is found nearly log-normal in the inertial range [274,313]. Note that some departure from log-normality can be observed for high-order moments (large $q > 0$) [274] and is likely to define local anisotropic effects induced by strongly localized events [314,315]. Besides the experimental difficulties of measuring the energy dissipation field, there is some additional intrinsic limitation to the multifractal analysis of turbulent fields that comes from the numerical techniques commonly used in the literature to process the experimental as well as the numerical data. For instance, the multifractal spectra $\zeta_p$ of the longitudinal velocity and $\tau(q)$ of the energy dissipation are commonly computed using, respectively, the structure functions [18,43,44] and the box-counting [24,67,227] methods. The fact that the former method allows us to compute the longitudinal velocity exponents $\zeta_p$ for positive $p$ values only explains why for many years the validity of the RSH relationships (61) and (62) has been only partially tested [227,274]. More recent checks using the 1D WTMM method [221] and an alternative two-scale method [316,317] have clearly revealed the failure of Eq. (61) for negative $p$ values when identifying $\epsilon$ with its surrogate $\epsilon'$. This means that the decreasing part of the singularity spectra $f_{\epsilon'}(\alpha)$ and $D(h)$ (corresponding to the weakest singularities of both fields) significantly differs with respect to numerical uncertainty. Moreover, there is an implicit normalization constraint inherent to the box-counting technique, namely $\tau_\epsilon(1) = \tau_{\epsilon'}(1) = 0$, which makes this method quite inappropriate for studying nonconservative multiplicative cascade processes. Indeed a blind use of box-counting algorithms will always yield multifractal spectra that can be misleading compared to the theoretical spectra of the conservative cascading process.

3. Intermittency Based on the Enstrophy Field

An important step in the understanding of small-scale turbulence driven by expectations of universality is to proceed to a comparative statistical
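analysis of dissipation and enstrophy in isotropic turbulence. Note that the enstrophy $\Omega$ is related to the antisymmetric part of the strain tensor:

$$\Omega = \frac{\nu}{2}\sum_{i,j} (\partial_j v_i - \partial_i v_j)^2 = -2\nu \sum_{i,j} \omega_{ij}\,\omega_{ji} \tag{66}$$

where

$$\omega_{ij} = \frac{1}{2}(\partial_j v_i - \partial_i v_j) \tag{67}$$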
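The relationship between $\epsilon = 2\nu S^2$ and $\Omega = 2\nu\omega^2$ ($\boldsymbol{\omega} = \nabla \wedge \mathbf{v}$ is the usual vorticity pseudovector) is

$$\epsilon = \Omega + 2\nu \sum_{i,j} (\partial_j v_i\, \partial_i v_j) \tag{68}$$

From the incompressibility condition, one can show that the global averages of dissipation and enstrophy are related:

$$\langle \epsilon \rangle = \langle \Omega \rangle \tag{69}$$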
But this does not imply that their local averages $\epsilon_l(\mathbf{r})$ [Eq. (58)] and $\Omega_l(\mathbf{r})$ scale identically, where

$$\Omega_l(\mathbf{r}) = \frac{1}{V_l}\int_{V_l} \Omega(\mathbf{r}')\, d^d r' \tag{70}$$

Nevertheless, if they do, this will imply that the power-law scaling of $\langle \epsilon_l^q \rangle$ and $\langle \Omega_l^q \rangle$ in the inertial range must be the same, $\tau_\epsilon(q) = \tau_\Omega(q)$, and in turn the corresponding singularity spectra, $f_\epsilon(\alpha) = f_\Omega(\alpha)$. There has been interesting recent controversy concerning the relative scaling properties of enstrophy and dissipation densities. Different theoretical studies have converged to the conclusion that the asymptotic scaling exponents must be equal in the limit of infinite Reynolds number [265,266,318–320]. Pioneering numerical DNS studies [321,322] have shown that the $\Omega$ field is more intermittent than the $\epsilon$ field. The 1D measurements of the streamwise components of $\epsilon$ and $\omega$, obtained at both high and low Reynolds numbers [323,324], conclude that the degrees of intermittency in the dissipation and the enstrophy fields are not the same. This observation is corroborated by the analysis of circulation data [325]. More recent DNS studies at moderate Reynolds number ($R_\lambda = 216$) [326] confirm that there are differences between the two scalings. As suggested by Chen et al. [327], this difference is likely to result from the difference observed in the scaling exponents $\zeta_p^L$ and $\zeta_p^T$ of longitudinal and transverse structure functions, respectively
[256,328–333]. More precisely, Chen et al. [327] reported numerical results that demonstrate the possible validity of a different RSH for the transverse direction (RSHT) that connects the statistics of the transverse velocity increments with the locally averaged enstrophy in the inertial range. The important implication of RSHT is the possible existence of two independent sets of scaling exponents related, respectively, to the symmetric (dissipation physics) and antisymmetric (vortex dynamics) parts of the strain rate. But some caution should be taken when extrapolating these results to high Reynolds numbers. The statistical analysis of the dissipation and enstrophy fields induced by a set of Burgers vortices in He et al. [319] is very eloquent in that respect. For this model system, finite-range scaling exponents for $\epsilon$ and $\Omega$ are different but the asymptotic scaling exponents can be shown to be equal in the limit of infinite Reynolds number.
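The normalization constraint $\tau(1) = 0$ of the box-counting technique discussed in Section V.A.2 can be illustrated on the binomial model mentioned there. What follows is a minimal sketch, not the procedure used in the cited studies: it builds a deterministic, conservative binomial cascade on the unit interval and estimates the box-counting exponents from $Z(q, l) = \sum_i \mu_i(l)^q \sim l^{\tau_{BC}(q)}$. The 1D setting, the weight `p = 0.7`, the number of cascade levels, and all function names are illustrative assumptions. For this measure the exact result is $\tau_{BC}(q) = -\log_2(p^q + (1-p)^q)$, which vanishes at $q = 1$.

```python
import numpy as np

def binomial_measure(p, levels):
    """Deterministic binomial cascade on [0, 1]: every dyadic interval passes
    the fractions p and 1 - p of its mass to its left and right halves."""
    mu = np.array([1.0])
    for _ in range(levels):
        mu = np.column_stack((p * mu, (1.0 - p) * mu)).ravel()
    return mu

def box_counting_tau(mu, qs):
    """Estimate tau_BC(q) as the slope of log2 Z(q, l) versus log2 l,
    with Z(q, l) = sum_i mu_i(l)**q over dyadic boxes of size l = 2**-k."""
    levels = int(np.log2(mu.size))
    log2_l, log2_z = [], []
    boxes = mu.copy()
    for k in range(levels, 0, -1):
        log2_l.append(-k)
        log2_z.append([np.log2(np.sum(boxes ** q)) for q in qs])
        boxes = boxes.reshape(-1, 2).sum(axis=1)  # merge neighbouring boxes
    # polyfit accepts a 2D y array: one straight-line fit per value of q
    return np.polyfit(log2_l, np.array(log2_z), 1)[0]

p = 0.7                                    # illustrative weight (assumption)
qs = np.array([-2.0, 0.0, 1.0, 2.0, 3.0])
mu = binomial_measure(p, levels=12)
tau_est = box_counting_tau(mu, qs)
tau_exact = -np.log2(p ** qs + (1.0 - p) ** qs)
```

Because mass is conserved at every step, $Z(1, l) = 1$ at all scales and the estimate at $q = 1$ is zero by construction. A nonconservative cascade has $\tau(1) \neq 0$, which a normalized box-counting estimate cannot reproduce.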
B. Application of the 2D WTMM Method to 2D Cuts of a Turbulent 3D Dissipation Field

In this section, we revisit previous multifractal analyses of the dissipation field $\epsilon(\mathbf{r})$ in isotropic turbulence using the 2D WTMM methodology described in Section II.C. Given the uncertain nature of the existing knowledge, it is important to study the scaling properties of both the dissipation and enstrophy fields without resorting to the artifacts mentioned in Section V.A. We thus employ the numerical data from DNS of isotropic turbulence carried out by Meneguzzi [334] with the same numerical code as previously developed by Vincent and Meneguzzi [231] but at a higher resolution. The DNS were performed using $512^3$ mesh points in a 3D periodic box and a viscosity of $5 \times 10^{-4}$. A statistically steady state was obtained by forcing low Fourier modes in a deterministic way. The Taylor microscale Reynolds number $R_\lambda = 216$ is close to the value attained in the DNS reported in Refs. [274,282,326,327]. Here we will examine only one snapshot of both dissipation and enstrophy 3D spatial fields. Indeed we will mainly proceed to a comparative multifractal analysis of 2D cuts of both fields using classical box-counting techniques and the 2D WTMM method. The corresponding $\tau_\epsilon(q)$, $\tau_\Omega(q)$, and $f_\epsilon(\alpha)$, $f_\Omega(\alpha)$ spectra will result from an annealed averaging over 512 ($512 \times 512$) 2D cuts in a $512^3$ cube. Figure 31a is a picture of the fluctuations of the local dissipation as seen on an arbitrary 2D cut when using a 256 gray-level coding. The highly intermittent nature of these fluctuations is striking and further illustrated in Figure 31c on an arbitrary 1D spatial profile. We systematically follow the numerical implementation procedure described in Section II.C. We first wavelet transform the 512 ($512 \times 512$) images of $\epsilon$ with the first-order
Figure 31. $512^3$ DNS of the dissipation and enstrophy fields at $R_\lambda = 216$ [334]. Dissipation field: (a) 2D cut of $\epsilon(\mathbf{r})$; (b) 2D cut of $\ln \epsilon(\mathbf{r})$; (c) 1D cut of $\epsilon(\mathbf{r})$. Enstrophy field: (d) 2D cut of $\Omega(\mathbf{r})$; (e) 2D cut of $\ln \Omega(\mathbf{r})$; (f) 1D cut of $\Omega(\mathbf{r})$. In (a), (b), (d), and (e), $\epsilon$ and $\Omega$ are represented using a 256 gray scale coding from black (min) to white (max).
($n = 1$) and the third-order ($n = 3$) radially symmetric analyzing wavelets defined in Figure 1. From the wavelet transform skeleton defined by the WTMMM, we compute the partition functions $Z(q, a)$ from which we extract the $\tau(q)$ and $f(\alpha)$ multifractal spectra.

1. Remark

Let us point out that the WTMM definition of the $\tau_{WT}(q)$ spectrum [Eq. (22)] is slightly different from the one defined in Eq. (57) from the moments of the $\epsilon_l$ pdf ($\langle \epsilon_l^q \rangle \sim l^{\tau_\epsilon(q)}$) and from the ‘‘standard’’ box-counting definition $\tau_{BC}(q)$ found in the literature, for example in Meneveau and Sreenivasan [227]:
Figure 32. 2D wavelet transform analysis of the 2D cuts of the dissipation and enstrophy fields shown in Figure 31a and d, respectively. $\psi(\mathbf{x})$ is the first-order radially symmetric analyzing wavelet shown in Figure 1. Dissipation field: (a) $a = 2^2 \sigma_W$; (b) $a = 2^4 \sigma_W$. Enstrophy field: (c) $a = 2^2 \sigma_W$; (d) $a = 2^4 \sigma_W$. The local maxima of $M_\psi$ along the maxima chains are indicated by () from which originates an arrow whose length is proportional to $M_\psi$ and whose direction (with respect to the x-axis) is given by $A_\psi$.
$$\tau_{WT}(q) = \tau_\epsilon(q) - d = \tau_{BC}(q) - dq = (q - 1)D_q - dq \tag{71}$$
where $d = 2$ when investigating 2D cuts of the 3D dissipation field and $D_q$ are the generalized fractal dimensions defined in Refs. [20–26]. Note that the Legendre transforms used in the three different cases lead to the same estimate of the $f(\alpha)$ singularity spectrum.

2. Numerical Computation of the $\tau(q)$ and $f(\alpha)$ Multifractal Spectra

Figure 32a and b illustrates the computation of both the maxima chains and the WTMMM of the 2D cut of $\epsilon$ shown in Figure 31a when using the first-order analyzing wavelet at two different scales. After linking these WTMMM across scales, one constructs the WT skeleton from which one computes the partition functions $Z(q, a)$ [Eq. (21)]. As shown in Figure 33a, the annealed average of $a^2 Z(q, a)$ () displays some well-defined scaling behavior over the range of scales $2\sigma_W \lesssim a \lesssim 2^4 \sigma_W$ (where $\sigma_W$ is the characteristic size of $\psi$ at the smallest scale), when plotted versus $a$ in a logarithmic representation and this for values of $q$ in the interval $[-2, 4]$ for
Figure 33. Determination of the $\tau(q)$ and $f(\alpha)$ spectra of 512 2D cuts of the dissipation field. The 2D WTMM method is used with either a first-order () or a third-order ( ) radially symmetric analyzing wavelet (see Fig. 1). Results obtained with box-counting techniques (△) are shown for comparison. (a) $\log_2[a^2 Z(q, a)]$ vs. $\log_2 a$; (b) $\tau(q)$ vs. $q$; (c) $f(\alpha)$ vs. $\alpha$, after Legendre transforming the $\tau(q)$ curve in (b). In (a) the different data curves have been arbitrarily vertically shifted for the sake of clarity. In (b) and (c), the solid lines correspond, respectively, to the theoretical log-normal multifractal spectra (72) and (74) for the parameter values $C_1 = 0.11$ and $C_2 = 0.18$ [Eq. (73)]. In (c), the dashed line corresponds to the average $f(\alpha)$ spectrum obtained by Meneveau and Sreenivasan from the analysis of surrogate dissipation data using a box-counting algorithm [227].
which statistical convergence turns out to be achieved. Indeed some curvature can be observed in this logarithmic representation as the indication of some scale symmetry breaking as previously observed for the longitudinal velocity [81–83,85,221,246–258,267]. The extension of this statistical analysis to time averaging over a few turnover times is currently in progress. When proceeding to a linear regression fit of the data in Figure 33a over the range $2^{1.1} \sigma_W \le a \le 2^{3.6} \sigma_W$, one gets the $\tau(q)$ spectrum () shown in Figure 33b. In this accessible range of $q$ values, the $\tau(q)$ spectrum obtained unambiguously deviates from a monofractal linear spectrum. Actually the data are remarkably well fitted by a parabola, the hallmark of log-normal multifractal spectra:
$$\tau(q) = -C_1 q - C_2 \frac{q^2}{2} \tag{72}$$
with

$$C_1 = 0.11 \pm 0.01, \qquad C_2 = 0.18 \pm 0.01 \tag{73}$$
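The parabolic fit of Eq. (72) amounts to a linear least-squares problem in $C_1$ and $C_2$. The sketch below is only an illustration of that step, under stated assumptions: it uses synthetic, noise-free $\tau(q)$ values generated from the parameters of Eq. (73) rather than the DNS data, samples $q$ on $[-2, 4]$, and the function name `fit_lognormal_tau` is hypothetical.

```python
import numpy as np

def fit_lognormal_tau(q, tau):
    """Least-squares fit of tau(q) = -C1*q - C2*q**2/2 (no constant term).
    Returns the pair (C1, C2)."""
    design = np.column_stack((-q, -0.5 * q ** 2))
    (c1, c2), *_ = np.linalg.lstsq(design, tau, rcond=None)
    return c1, c2

# Synthetic stand-in for the measured spectrum (assumption, not DNS data)
q = np.linspace(-2.0, 4.0, 25)
tau_synthetic = -0.11 * q - 0.18 * q ** 2 / 2

c1_fit, c2_fit = fit_lognormal_tau(q, tau_synthetic)
```

Fitting without a constant term enforces $\tau(0) = 0$, as in Eq. (72). With real data one would also want uncertainty estimates, which this sketch does not provide.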
By Legendre transforming this quadratic $\tau(q)$ spectrum, one gets data for the $f(\alpha)$ singularity spectrum that are well parameterized by the corresponding parabolic log-normal singularity spectrum:

$$f(\alpha) = 2 - \frac{(\alpha - 1 + C_1)^2}{2 C_2} \tag{74}$$
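The correspondence between Eqs. (72) and (74) can be checked numerically. The following sketch assumes the Legendre transform convention of Eq. (64) with $d = 2$ and evaluates the minimum over $q$ by brute force on a dense grid. The grid bounds, the range of $\alpha$, and the function names are illustrative choices.

```python
import numpy as np

def f_alpha_numeric(alpha, c1, c2, d=2):
    """Evaluate f(alpha) = min_q [q*(alpha - 1) - tau(q) + d] on a dense q grid."""
    q = np.linspace(-20.0, 20.0, 40001)     # search grid (assumption)
    tau = -c1 * q - c2 * q ** 2 / 2         # log-normal tau(q), Eq. (72)
    return np.min(np.outer(alpha - 1.0, q) - tau + d, axis=1)

def f_alpha_parabola(alpha, c1, c2, d=2):
    """Closed form of Eq. (74), written for a general dimension d."""
    return d - (alpha - 1.0 + c1) ** 2 / (2.0 * c2)

alpha = np.linspace(0.3, 1.5, 25)
f_num = f_alpha_numeric(alpha, 0.11, 0.18)
f_ref = f_alpha_parabola(alpha, 0.11, 0.18)
max_gap = np.max(np.abs(f_num - f_ref))     # discretization error only
```

The minimum is reached at $q = -(\alpha - 1 + C_1)/C_2$, so the top of the parabola ($q = 0$) sits at $\alpha = 1 - C_1$ with $f = d = 2$.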
We have checked that the estimate of the $f(\alpha)$ singularity spectrum from the scaling behavior of the partition functions $\alpha(q, a) = h(q, a) + 1$ [Eq. (29)] and $f(q, a) = D(q, a)$ [Eq. (30)] yields similar quantitative results. Figure 33 also shows for comparison the results ( ) obtained when applying the 2D WTMM method with a third-order ($n = 3$) radially symmetric analyzing wavelet (the smoothing function being the isotropic 2D Mexican hat). An overall comparison with the previous results shows a remarkable robustness of the estimates of the $\tau(q)$ and $f(\alpha)$ spectra with respect to the order of the analyzing wavelet. Both spectra are still strikingly well fitted by the log-normal multifractal model predictions [Eqs. (72) and (74), respectively] with the parameter values

$$C_1 = 0.07 \pm 0.01, \qquad C_2 = 0.19 \pm 0.01 \tag{75}$$
which, up to the numerical uncertainty, are quite consistent with the previous values in Eq. (73). Figure 33 also shows for comparison the results (△) obtained when using classical box-counting techniques (indeed we use boxes with Gaussian shape in order to take advantage of part of our 2D WT software). It is clear in Figure 33a that the data obtained for $a^2 Z(q, a)$ with the box-counting method significantly differ from those obtained with the 2D WTMM methodology. Actually, as reported in Figure 33b, the $\tau(q)$ data are still reasonably well accounted for by the theoretical log-normal spectrum [Eq. (72)], but with significantly different parameter values:

$$C_1 = -0.09 \pm 0.01, \qquad C_2 = 0.20 \pm 0.01 \tag{76}$$
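A quick way to see what separates the three parameter sets is to evaluate $\tau(1)$, the quantity that the box-counting normalization forces to zero. The sketch below assumes the sign convention of Eq. (72), $\tau(q) = -C_1 q - C_2 q^2/2$, and reads the box-counting $C_1$ as negative, which is what the constraint $\tau(1) = 0$ requires. The dictionary keys and function name are illustrative.

```python
def tau_lognormal(q, c1, c2):
    """Log-normal spectrum tau(q) = -C1*q - C2*q**2/2, as in Eq. (72)."""
    return -c1 * q - c2 * q ** 2 / 2

# Central values quoted in Eqs. (73), (75), and (76); signs as assumed above
params = {
    "WTMM n=1": (0.11, 0.18),
    "WTMM n=3": (0.07, 0.19),
    "box-counting": (-0.09, 0.20),
}

tau_at_1 = {name: tau_lognormal(1.0, c1, c2) for name, (c1, c2) in params.items()}
for name, value in tau_at_1.items():
    print(f"{name}: tau(1) = {value:+.3f}")
```

Under these assumptions the first-order WTMM estimate gives $\tau(1) = -0.20$, while the box-counting estimate is compatible with $\tau(1) = 0$ within the quoted uncertainties.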
Note that the difference is not so much in the intermittency coefficient $C_2$, which is found to be robust to the method used to estimate it and in good agreement with the results of previous DNS studies [231,274,282,313,326,327]. Let us emphasize that the $C_2$ values in Eqs. (73), (75), and (76) are at the lower bound of the range of values (0.20 to 0.28) found in
experimental measurements based on surrogate dissipation data [227,277,303–305,323,324]. The main difference between the 2D WTMM and the box-counting results concerns the estimate of the coefficient $C_1$ of the linear term in $\tau(q)$. This is a direct consequence of the normalization constraint $\tau(1) = 0$ intrinsic to the box-counting method, which implies the relationship $C_1 = -C_2/2$ between the two parameters of the log-normal $\tau(q)$ spectrum [Eq. (72)]. The results reported in Figure 33 dramatically reveal the failure of commonly used box-counting algorithms when the considered measure results from a nonconservative log-normal multifractal process as characterized by a negative cancellation exponent [335–337] $\tau(1) = -0.20 \pm 0.01$, the signature of a signed measure (i.e., a distribution that varies in sign on small scales). As a consequence, the $f(\alpha)$ spectrum is misleadingly shifted to the right when using box-counting techniques as illustrated in Figure 33c (this shift is also present when studying 1D cuts of the dissipation field as reported in Roux’s thesis [221]). This observation seriously questions the validity of most of the experimental and numerical box-counting estimates of the $f(\alpha)$ singularity spectrum reported so far in the literature. Figure 33c shows for comparison some average $f(\alpha)$ spectrum obtained by Meneveau and Sreenivasan [227] from the analysis of surrogate dissipation data; the agreement with our box-counting estimate is very good for the left increasing ($q > 0$) branch while the right decreasing ($q < 0$) branch departs somehow to a larger value of $\alpha$ as an indication of a slightly larger intermittency coefficient $C_2 = 0.25$ as compared to the value $C_2 = 0.20$ in Eq. (76). This is an additional indication that surrogate dissipation is likely to be more intermittent than real dissipation [282].

3. WTMMM Probability Density Functions

This subsection is mainly devoted to the analysis of the joint probability density function $P_a(M, A)$ as computed from the WT skeletons of 512 2D cuts of the dissipation field with the first-order radially symmetric analyzing wavelet ($n = 1$). Figure 34a and b show the pdfs $P_a(M) = \int dA\, P_a(M, A)$ and $P_a(A) = \int dM\, P_a(M, A)$ for three different values of the scale parameter $a$ in the scaling range. First let us focus on the results shown in Figure 34b for $P_a(A)$. $P_a(A)$ does not evolve across scales and is almost flat. Actually some oscillations are observed with maxima for $A = 0$, $\pi/2$, $\pi$, and $3\pi/2$ as an indication of some anisotropy induced by the cubic lattice discretization in the DNS. All the multifractal properties of 2D cuts are thus contained in the way the shape of $P_a(M)$ evolves when one decreases the scale parameter $a$ as shown in Figure 34a. Actually, for the three selected scales, all the data points fall, within a good approximation, on a log-normal curve (see for comparison the pdfs in Fig. 28a), which is a strong indication
Figure 34. Pdfs of the WTMMM coefficients of 512 2D cuts of the dissipation and enstrophy fields as computed with the first-order radially symmetric analyzing wavelet. Dissipation field: (a) $P_a(M)$ vs. $M$; (b) $P_a(A)$ vs. $A$. Enstrophy field: (c) $P_a(M)$ vs. $M$; (d) $P_a(A)$ vs. $A$. The symbols correspond to the following scales: $a = 2^1 \sigma_W$ (), $2^2 \sigma_W$ ( ), and $2^3 \sigma_W$ ().
that the WTMMM have a log-normal distribution in the inertial range. This observation of log-normal statistics strengthens the previous estimates of log-normal quadratic $\tau(q)$ and $f(\alpha)$ spectra. (We refer the reader to Wang et al. [282] for similar conclusions on the entire 3D dissipation field when using box-counting techniques.)

4. Space-Scale Correlation Function Analysis

As pointed out in Section III.C, to go from log-normal diagnosis to the demonstration of the existence of an underlying multiplicative structure in the 2D fluctuations of the dissipation field, one can take advantage of the space-scale unfolding provided by the WT skeleton to compute the cross-scale correlation functions. Figure 35a shows the results of the computation of $C(\Delta x, a_1, a_2)$ when averaging over the 512 2D cuts of $\epsilon$. One can see that for $\Delta x > \sup(a_1, a_2)$, all the data points fall onto a unique curve when plotted versus $\log_2(\Delta x)$, independently of the considered pair of scales
Figure 35. Magnitude correlation function $C(\Delta x, a_1, a_2)$ [Eq. (49)] vs. $\log_2(\Delta x)$, as computed from the WT skeleton of 512 images. The analyzing wavelet is the radially symmetric first-order wavelet shown in Figure 1. The symbols have the following meaning: $a_1 = 1$, $a_2 = 2$ ( ); $a_1 = 2$, $a_2 = 3$ (□); and $a_1 = 1$, $a_2 = 3$ (△) in $\sigma_W$ units. (a) Dissipation field $\epsilon$; the solid line represents the theoretical prediction given by Eq. (50) with $\sigma^2 = C_2 \ln 2 = 0.12$ [$C_2 = 0.18$ as given by Eq. (73)]. (b) Enstrophy field $\Omega$; the solid line has the same meaning as in (a); the dashed line represents the theoretical curve given by Eq. (50) when fixing $\sigma^2 = C_2 \ln 2 = 0.20$ [$C_2 = 0.29$ as given by Eq. (78)].
$(a_1, a_2)$. Moreover this curve is in remarkable agreement with the theoretical prediction [Eq. (50)] for the random cascading process when plugging into this equation the value $\sigma^2 = C_2 \ln 2 = 0.12$ as previously estimated in Eq. (73). These consistent observations strongly suggest that a 2D nonconservative log-normal cascading process provides a reasonable model for the intermittent fluctuations observed along 2D cuts of the dissipation field.

C. Application of the 2D WTMM Method to 2D Cuts of a Turbulent 3D Enstrophy Field

Let us now proceed to a comparative statistical analysis of the corresponding numerical enstrophy field $\Omega(\mathbf{r})$. Figure 31d and e illustrates a 2D cut of $\Omega(\mathbf{r})$ in linear and in semilogarithmic representations, respectively. The intermittent aspect of $\Omega(\mathbf{r})$ is highlighted on the 1D cut shown in Figure 31f. We proceed, as for the dissipation field in Section V.B, by applying the 2D WTMM method described in Section II.C to 512 ($512 \times 512$) images of $\Omega$ with analyzing wavelets of different orders and we compare the $\tau(q)$ and $f(\alpha)$ multifractal spectra obtained with the corresponding estimates from box-counting computations.

1. Numerical Computation of the Multifractal $\tau(q)$ and $f(\alpha)$ Spectra

Figure 32c and d illustrates the maxima chains and the WTMMM of the 2D cut of $\Omega$ shown in Figure 31d as computed with the first-order ($n = 1$)
Figure 36. Determination of the $\tau_\Omega(q)$ and $f_\Omega(\alpha)$ spectra of 512 2D cuts of the enstrophy field. The 2D WTMM method is used with either a first-order ( ) or a third-order ( ) radially symmetric analyzing wavelet (see Fig. 1). Results obtained with box-counting techniques (△) are shown for comparison. (a) $\log_2[a^2 Z(q, a)]$ vs. $\log_2 a$; (b) $\tau_\Omega(q)$ vs. $q$; (c) $f_\Omega(\alpha)$ vs. $\alpha$, after Legendre transforming the $\tau_\Omega(q)$ curve in (b). In (a) the different data curves have been arbitrarily vertically shifted for the sake of clarity. In (b) and (c), the solid lines correspond to the theoretical log-normal multifractal spectra (77) and (79) for the parameter values $C_1^\Omega = 0.19$ and $C_2^\Omega = 0.29$ [Eq. (78)].
analyzing wavelet (Fig. 1) at two different scales. After linking these WTMMM across scales, one constructs the WT skeleton from which one computes the partition functions $Z(q, a)$ [Eq. (21)]. As shown in Figure 36a, the annealed average of $a^2 Z(q, a)$ ( ) displays some well-defined scaling behavior over the range of scales $2\sigma_W < a < 2^4\sigma_W$ for $-2 < q < 4$. Indeed, some slight but systematic curvature can be noticed in the log–log plots, very much like what has been observed for the dissipation in Figure 33a. If we proceed as in Section V.B to a linear regression fit of the data over the range $2^{1.0}\sigma_W \le a \le 2^{3.4}\sigma_W$, one gets the $\tau_\Omega(q)$ spectrum ( ) shown in Figure 36b, which is again in quite good agreement with a parabolic log-normal spectrum:

$$\tau_\Omega(q) = -C_1^\Omega\, q - C_2^\Omega\, \frac{q^2}{2} \tag{77}$$

with

$$C_1^\Omega = 0.19 \pm 0.01, \qquad C_2^\Omega = 0.29 \pm 0.01. \tag{78}$$
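As a quick numerical check of Eqs. (77) and (78), the following minimal sketch evaluates the parabolic log-normal spectrum at $q = 1$. It assumes the sign convention $\tau(q) = -C_1 q - C_2 q^2/2$, which is the reading consistent with the value $\tau_\Omega(1) \simeq -0.34$ quoted later in this section; the function name is illustrative only.

```python
def tau_lognormal(q, c1, c2):
    """Parabolic log-normal spectrum, assuming tau(q) = -c1*q - c2*q**2/2."""
    return -c1 * q - 0.5 * c2 * q * q


# WTMM estimates for the enstrophy field, Eq. (78).
C1_ENSTROPHY, C2_ENSTROPHY = 0.19, 0.29

# Value at q = 1 with the WTMM parameters (nonzero for a nonconservative cascade).
tau_at_1 = tau_lognormal(1.0, C1_ENSTROPHY, C2_ENSTROPHY)

# The box-counting normalization tau(1) = 0 corresponds to c1 = -c2/2.
tau_normalized = tau_lognormal(1.0, -0.5 * C2_ENSTROPHY, C2_ENSTROPHY)

print(f"tau(1) with WTMM parameters: {tau_at_1:.3f}")
print(f"tau(1) with c1 = -c2/2:      {tau_normalized:.3f}")
```

With the WTMM parameters, $\tau(1)$ stays clearly below zero, which is the signature of nonconservativity discussed in the text.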
Consistently, we find in Figure 36c that the corresponding $f_\Omega(\alpha)$ singularity spectrum is remarkably well fitted by the parabolic log-normal curve:

$$f_\Omega(\alpha) = 2 - \frac{(\alpha - 1 + C_1^\Omega)^2}{2 C_2^\Omega}. \tag{79}$$
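The link between Eqs. (77) and (79) can be checked numerically. The sketch below assumes that $\tau(q) = -C_1 q - C_2 q^2/2$ and that the Legendre transform is offset by the dimension 2 of the support, so that $\alpha = 1 + d\tau/dq$ and $f = 2 + q\, d\tau/dq - \tau(q)$. It generates points $(\alpha, f)$ parametrically in $q$ and compares them with the closed form of Eq. (79). The offset convention and the helper names are assumptions made for illustration, not statements taken from the text.

```python
def tau(q, c1, c2):
    """Assumed form of Eq. (77): tau(q) = -c1*q - c2*q**2/2."""
    return -c1 * q - 0.5 * c2 * q * q


def legendre_point(q, c1, c2):
    """Return (alpha, f) for a given q via the offset Legendre transform."""
    h = -c1 - c2 * q                  # h(q) = d tau / dq
    alpha = h + 1.0                   # alpha = h + 1
    f = 2.0 + q * h - tau(q, c1, c2)  # offset by the support dimension 2
    return alpha, f


def f_parabola(alpha, c1, c2):
    """Closed form of Eq. (79)."""
    return 2.0 - (alpha - 1.0 + c1) ** 2 / (2.0 * c2)


C1, C2 = 0.19, 0.29                          # Eq. (78)
q_values = [k / 2.0 for k in range(-4, 9)]   # q from -2 to 4 in steps of 0.5

points = [legendre_point(q, C1, C2) for q in q_values]
max_gap = max(abs(f - f_parabola(alpha, C1, C2)) for alpha, f in points)
alpha_peak, f_peak = legendre_point(0.0, C1, C2)

print(f"largest mismatch with Eq. (79): {max_gap:.2e}")
print(f"peak at alpha = {alpha_peak:.2f}, f = {f_peak:.2f}")
```

Under these assumptions the parametric curve and the parabola coincide up to rounding error, and the maximum of the spectrum sits at $q = 0$.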
Figure 36b and c also show for comparison the results ( ) obtained when using the third-order ($n = 3$) analyzing wavelet. The estimates of the $\tau_\Omega(q)$ and $f_\Omega(\alpha)$ spectra are in very good agreement with the results obtained previously with the first-order ($n = 1$) analyzing wavelet. These spectra are still remarkably well approximated by a parabola [Eqs. (77) and (79)] with the following parameter values:

$$C_1^\Omega = 0.18 \pm 0.01, \qquad C_2^\Omega = 0.28 \pm 0.01 \tag{80}$$

which are within the error bars of the values reported in Eq. (78). The robustness of these multifractal spectra estimates with respect to some change in the shape of the analyzing wavelet is even more striking when one compares these estimates to those extracted from a box-counting algorithm (△). Very much like what we have observed for $\epsilon$, this standard technique also yields parabolic spectra but with significantly different parameter values (mainly for $C_1^\Omega$):

$$C_1^\Omega = -0.13 \pm 0.02, \qquad C_2^\Omega = 0.29 \pm 0.01 \tag{81}$$

because of the normalization requirement $\tau_\Omega(1) = 0$, i.e., $C_1^\Omega = -C_2^\Omega/2$, inherent to this method. Let us point out that whatever the technique, the estimate of the intermittency parameter $C_2^\Omega$ of the enstrophy [Eqs. (78), (80), (81)] is much larger than the corresponding value found for the dissipation [Eqs. (73), (75), (76)]. This confirms that the enstrophy field is likely to be more intermittent than the dissipation field, as previously suggested [321–327]. However, the WTMM method reveals that the $f_\Omega(\alpha)$ spectrum is noticeably shifted toward smaller $\alpha$ values (corresponding to stronger singularities) as compared to the box-counting estimate (Fig. 36c). We will come back to this point as well as to the possible nonconservative nature [$\tau_\Omega(1) \simeq -0.34 < 0$] of the underlying log-normal multiplicative structure.

2. WTMMM Probability Density Functions

The pdfs $P_a(\mathcal{M})$ and $P_a(\mathcal{A})$ of the WTMMM modulus and argument of the enstrophy field $\Omega(\mathbf{r})$ are shown in Figure 34c and d, respectively.
Quantitatively one recovers results similar to those previously observed for $\epsilon(\mathbf{r})$ (Fig. 34a and b). In Figure 34d, $P_a(\mathcal{A})$ is rather flat (with some small-amplitude oscillations induced by the cubic lattice discretization in the DNS) and does not evolve across scales. When looking at $P_a(\mathcal{M})$ in Figure 34c, one sees that at each scale, the data points fall on a curve that is well approximated by a log-normal pdf and that evolves across scales as governed by the log-normal $\tau_\Omega(q)$ spectrum computed just above [Eqs. (77) and (78)].

3. Space-Scale Correlation Function Analysis

In Figure 35b the results of the computation of the magnitude correlation function $C_\Omega(\Delta x, a_1, a_2)$ when averaging over the 512 2D cuts of $\Omega(\mathbf{r})$ are reported. One can see that, consistently with a multiplicative cascade structure, all the data points fall onto a unique curve when plotted versus $\log_2(\Delta x)$ for $\Delta x > \sup(a_1, a_2)$, independently of the considered pair of scales $(a_1, a_2)$. As far as the pertinence of Eq. (50) for modeling the numerical data is concerned, it seems that when plugging into this theoretical prediction the parameter value $\sigma^2 = C_2^\Omega \ln 2 = 0.20$, according to the previous estimate of the intermittency parameter $C_2^\Omega$ in Eq. (78), one gets a poorer agreement than with the theoretical curve predicted for the dissipation field ($\sigma^2 = 0.12$, $C_2^\epsilon = 0.18$). Actually, if one focuses on spatial distances $\Delta x$ that are not too large (i.e., smaller than the integral scale $L$), for which the linear term $\sigma^2 \log_2(L/\Delta x)$ becomes dominant in Eq. (50), then the observed slope of $C_\Omega(\Delta x, a_1, a_2)$ is in good agreement with the expected value $\sigma^2 = C_2^\Omega \ln 2 = 0.20$. The results in Figure 35b are thus an additional indication that a 2D nonconservative log-normal multiplicative process can be used to model the intermittent fluctuations observed in 2D cuts of the enstrophy field.

D. Discussion

We have used the 2D WTMM method to characterize statistically the multifractal properties of 2D cuts of both the dissipation and the enstrophy fields issued from $(512)^3$ DNS at $R_\lambda = 216$ [334]. As a general result, we find that the intermittent nature of the corresponding spatial landscape can be well modeled by a 2D nonconservative log-normal multiplicative process. To some extent this result is not so surprising, since it is most likely that dissipation and enstrophy are not conserved along 2D cuts. We hope that the generalization in 3D of the WTMM method will allow us to decide whether this nonconservativity is a 2D cut effect that is likely to disappear when increasing the Reynolds number or if it is an intrinsic property of the
underlying 3D multiplicative spatial structures of both fields. Moreover, averaging over several turnover times will allow us to investigate larger values of $|q|$ (i.e., higher-order moments) and possibly to reveal some departure from the theoretical log-normal multifractal spectra, as suggested in Refs. [274,314,315], as an indication of some local anisotropy induced by strongly localized events (e.g., vorticity filaments). Work in this direction is in progress.

One of the most disturbing results reported in this section is the numerical demonstration that most of the numerical and experimental estimates of the multifractal spectra of the $\epsilon$ and $\Omega$ fields previously reported in the literature are strongly biased by the normalization constraint $\tau(q = 1) = 0$ inherent to the commonly used box-counting techniques, which turn out to be quite inappropriate for studying nonconservative multiplicative cascading processes. These techniques yield $f_{\epsilon,\Omega}(\alpha)$ spectra that have almost the right width, as given by the intermittency exponent $C_2$, but that are significantly shifted to the right (i.e., to larger $\alpha$ values corresponding to weaker singularities), with an estimate of the most frequent singularity $h(q = 0) = \alpha(q = 0) - 1 = -C_1$ that is misleadingly found positive instead of negative, as revealed by our 2D WTMM analysis (Figs. 33c and 36c).

Finally, our comparative 2D WTMM multifractal analysis of the dissipation and enstrophy fields shows an unambiguous quantitative difference between the $f_\epsilon(\alpha)$ and $f_\Omega(\alpha)$ singularity spectra. The width of the latter is significantly larger than that of the former, as given by the respective values of the intermittency parameter: $C_2^\Omega = 0.29 \pm 0.01 > C_2^\epsilon = 0.19 \pm 0.01$. Moreover, $f_\Omega(\alpha)$ is maximum for $\alpha_\Omega(q = 0) = h_\Omega(q = 0) + 1 = 1 - C_1^\Omega \simeq 0.80$, a value that is smaller than $\alpha_\epsilon(q = 0) = h_\epsilon(q = 0) + 1 = 1 - C_1^\epsilon \simeq 0.90$ for which $f_\epsilon(\alpha)$ is maximum. These results demonstrate that the enstrophy spatial landscape is more intermittent than the dissipation spatial landscape, in the sense that the support of its singularity spectrum is wider and that it reaches smaller values of $h = \alpha - 1$, corresponding to stronger singularities. Note that for both fields the maximum of the $f_{\epsilon,\Omega}(\alpha)$ curves is equal to 2 [$\tau_\epsilon(q = 0) = \tau_\Omega(q = 0) = 0$], which means that the corresponding 2D spatial landscapes are singular everywhere. These results confirm the conclusions of preliminary comparative box-counting studies of the dissipation and enstrophy fields [321–327]. We hope to extend this 2D WTMM analysis to the current highest accessible Reynolds number DNS with the specific goal of investigating the validity of several theoretical studies [265,266,318–320] that predict the asymptotic ($R_\lambda \to +\infty$) equality of the multifractal spectra of both fields, namely $\tau_\epsilon(q) = \tau_\Omega(q)$ and $f_\epsilon(\alpha) = f_\Omega(\alpha)$.
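The comparison of the two singularity spectra made in this discussion can be illustrated with a few lines of arithmetic. The sketch below takes the peak positions ($\alpha \simeq 0.90$ and $0.80$) and intermittency parameters ($C_2 = 0.19$ and $0.29$) quoted above, and assumes that the parabolic form of Eq. (79) may be extrapolated down to $f = 0$, which goes beyond the finite range of $q$ actually accessible in the analysis. The dictionary layout and function name are illustrative assumptions.

```python
import math


def parabola_support(alpha_peak, c2):
    """Interval of alpha where f = 2 - (alpha - alpha_peak)**2 / (2*c2) >= 0."""
    half_width = 2.0 * math.sqrt(c2)   # solves (alpha - alpha_peak)**2 = 4*c2
    return alpha_peak - half_width, alpha_peak + half_width


# Peak positions and intermittency parameters quoted in the discussion.
FIELDS = {
    "dissipation": {"alpha_peak": 0.90, "c2": 0.19},
    "enstrophy": {"alpha_peak": 0.80, "c2": 0.29},
}

supports = {name: parabola_support(**params) for name, params in FIELDS.items()}

for name, (alpha_min, alpha_max) in supports.items():
    width = alpha_max - alpha_min
    print(f"{name:12s} alpha in [{alpha_min:.2f}, {alpha_max:.2f}], width {width:.2f}")
```

Under this extrapolation the enstrophy spectrum is both wider and extends to smaller $\alpha$, in line with the statement that the enstrophy landscape is the more intermittent of the two.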
VI. Multifractal Analysis of Digitized Mammograms

Breast cancer, the most common cancer among women in western countries, has become a major public health problem. Statistics indicate that in the United States approximately 1 in 10 women will develop breast cancer during her lifetime [338]. Each year, breast cancer kills about 10,000 women in France (120,000 in the world); it is still the leading cause of cancer-related death in women. It is a slowly evolving disease; the average duration of tumor growth needed to obtain a palpable mass is about 10 to 15 years. Mammography (X-ray examination) is widely regarded as the most effective method for early detection of breast cancer. In the past 20 years, several national mass screening mammography programs [e.g., Health Insurance Plan of Greater New York (1982) and the Swedish 2-county Program of Mammography Screening for Breast Cancer (1992)] have shown that early diagnosis can significantly decrease breast cancer mortality, by about 23 to 31% in women aged 49 to 69 (see Dilhuydy and Barreau [339] for a complete discussion of the pros and cons of mass mammography). Because no way to prevent breast cancer (as opposed to lung cancer, for example) has been found so far, mammography plays a vital role in diagnosis of the disease as well as in pretherapeutic management and control during and after treatment, whereas MRI (magnetic resonance imaging) and echography are helpful only when the mammogram is questionable. However, the radiological interpretation of mammograms is a rather difficult task, since the mammographic appearance of normal tissue is highly variable. In the context of breast cancer screening, abnormalities have to be detected at an early stage in a large number of asymptomatic women. For this reason, independent reading of screening mammograms by two expert radiologists is required to reduce the number of interpretation errors. In spite of this, about 10% to 30% of cancers that could have been detected are missed, and a high percentage of patients called back at screening turn out not to have cancer.

Recently, much research has been devoted to developing reliable computer-aided diagnosis (CAD) methods (see Doi et al. [340] for a general review). Many of these methods are based on multiresolution analysis [341–344], difference image technique and global and local thresholding [345–349], statistical approaches [350–354], neural networks [355–360], fuzzy logic [361–363], and the wavelet transform (WT) and related techniques [342–344,360,362,364–371]. Currently these methods are often combined to detect and classify clusters of microcalcifications (MC), which are an important mammographic sign of early (in situ) breast cancer despite the fact that several benign diseases show MC as well [347,353,354,359,
360,362,366,367,369–373]. In the mid-1990s, fractal methods were applied to the analysis of radiographic images with some success in improving the performance of previous CAD schemes [352,374–379]. But most of these methods rest on the prerequisite that the background roughness fluctuations of normal breast texture are statistically homogeneous (i.e., monofractal) and uncorrelated. Regions that contain statistical aberrations deviating from this monofractal picture are considered abnormal regions in which tumors or MC are likely to be found. Our goal here is to propose the 2D WTMM method as an alternative method to perform multifractal analysis of digitized mammograms [115]. As we want to study scaling properties of digitized mammograms, we chose to use full-breast images from the Digital Database for Screening Mammography (DDSM) project [380], which provides online more than 2600 studies¹ sorted into three categories: normal, cancer, and benign. Mammograms were digitized using a 12-bit scanner with a good spatial resolution of 43.5 μm. Full-breast images enable us to select about 50 overlapping $512 \times 512$ pixel squares; indeed, to master edge effects, only cores of the images were used for the computation of the WT skeleton and partition functions.
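To make the subscene selection concrete, the following sketch lists the top-left corners of a regular grid of $512 \times 512$ squares spread over a full image. The $7 \times 7$ grid (49 squares) and the image size used in the example are hypothetical choices made for illustration, and the sketch does not implement the restriction to image cores mentioned above.

```python
def tile_origins(height, width, tile=512, n_rows=7, n_cols=7):
    """Top-left (row, col) corners of an n_rows x n_cols grid of square tiles.

    Tiles are spread evenly over the image, so neighbouring tiles overlap
    whenever the image is smaller than n * tile along an axis.
    """
    def starts(length, count):
        if length < tile:
            raise ValueError("image is smaller than one tile")
        if count == 1:
            return [0]
        span = length - tile
        return [round(i * span / (count - 1)) for i in range(count)]

    return [(r, c) for r in starts(height, n_rows) for c in starts(width, n_cols)]


# Hypothetical full-breast image size in pixels (not a value from the text).
IMAGE_HEIGHT, IMAGE_WIDTH = 3000, 2000
origins = tile_origins(IMAGE_HEIGHT, IMAGE_WIDTH)

print(len(origins), "tiles; first", origins[0], "last", origins[-1])
```

Each returned corner can then be used to slice one subscene out of the image array before computing its wavelet transform.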
¹ http://marathon.csee.usf.edu/Mammography/Database.html.

A. Application of the 2D WTMM Method to Mammographic Tissue Classification: Dense and Fatty Tissues

Several studies in the mid-1970s showed that an association existed between mammographic parenchymal patterns and the risk of developing breast cancer [381–383]. However, it appears that very few image processing works [363] have been devoted to automatic breast tissue density measurement since Boyd et al. [383] studied the relation between mammographic densities and breast cancer risk using both radiological classification and semiautomatic user-assisted computer measurement based on gray-level histogram thresholding. Here we analyze normal mammary parenchyma with our multifractal 2D WTMM method with the specific goal of proposing a computerized method to calculate a breast density fluctuations index. We have selected a set of 10 images in the DDSM database according to the ACR breast density rating, with an index ranging from 1 to 4 as assigned by an experienced mammographer: five fatty (rated 1 on the ACR density scale) and five dense (rated 4) breasts. The main steps of the 2D WTMM computations are illustrated in Figure 37 on two full-breast images selected, respectively, to be representative of dense-glandular and fatty breasts. Figure 37a and e show the original images, respectively, with the (superimposed) grid used to cut out 49 subscenes ($512 \times 512$ pixels). Figure 37b and f represent a zoom on the respective central subscenes. The corresponding WT modulus landscape and WT maxima chains computed
Figure 37. 2D wavelet transform analysis of two mammograms: (a–d) dense breast tissue and (e–h) fatty breast tissue. The analyzing wavelet is the first-order isotropic wavelet ($\phi$ is the isotropic Gaussian function) shown in Figure 1. (a) and (e) are the two full-breast images. (b) and (f) represent some zooming in the central part of the two original images. (c) and (g) show the WT modulus at the scale $a = 3\sigma_W$ with the same gray-level coding as in Figure 3c; the maxima chains are shown for comparison. In (d) and (h) only the maxima chains and the local maxima of $\mathcal{M}_\psi$ along these chains are represented ( ) at the scale $a = 2.5\sigma_W$.
at the scale $a = 39$ pixels are shown in Figure 37c and g, respectively. Figure 37d and h represent, at a smaller scale, the location of the WTMMM ( ), from each of which originates an arrow that represents the WT vector $\mathbf{T}_\psi[f](\mathbf{b}, a)$. Figure 38 shows the results of the computation of the partition functions $Z(q, a)$ [Eq. (21)], $h(q, a)$ [Eq. (29)], and $D(q, a)$ [Eq. (30)] obtained when averaging over 49 nonoverlapping ($512 \times 512$) images cut out of the original dense and fatty mammograms. As shown in Figure 38a and b, both dense and fatty tissues display rather good scaling properties over two and a half octaves. The scaling actually deteriorates progressively when considering large scales, due to finite size effects. When proceeding to a linear regression fit of $\log_2[Z(q, a)]$ vs. $\log_2(a)$ over the range of scales extending from $a_{\min} = 1.6\sigma_W$ to $a_{\max} = 4\sigma_W$, one obtains the $\tau(q)$ spectra reported in Figure 38c. From a simple visual inspection, one realizes that dense and fatty breast tissues display quite different scaling properties. The latter presents a $\tau(q)$ spectrum that is remarkably linear in the range $q \in [-3, 3]$ with a slope $H = 0.25 \pm 0.05$, while the former presents a larger slope $H = 0.65 \pm 0.05$ with some possible nonlinear departure, which might indicate multifractality. This monofractal vs. multifractal discrimination between fatty and dense breast tissues is also evidenced by the computation of the corresponding $D(h)$ singularity spectra in Figure 38d. However, the multifractal diagnosis for dense tissues requires further numerical analysis to ensure statistical convergence of the $\tau(q)$ exponents for large values of $|q|$. Nevertheless, what seems to be robust, considering the whole set of processed images, is the fact that fatty tissues display monofractal scaling behavior with a Hurst exponent $H$ taking a value in the range [0.20, 0.35], as an indication of antipersistent roughness fluctuations, while dense tissues display (possibly multifractal) scaling with $H \in [0.55, 0.75]$, as an indication of persistent long-range correlations. Furthermore, in the most general case, we have shown that in any full-breast mammogram, those two kinds of tissue are present and only those two. In particular, one can assign a color (e.g., blue or red) to each square of the working grid according to its dense or fatty area identification. Work is in progress to make this segmentation independent of the square grid used to cut out subscenes. Finally, let us note that in previous work, Heine et al. [379,384] already used self-similarity (fractal) analysis to study mammographic density, using the Fourier power spectrum method to extract the scaling exponent $\beta = 2H + 2$ [Eq. (41)]. They obtained a histogram of $\beta$ values with an average $H$ of 0.469 and a rather small standard deviation of 0.045. This finding may be interpreted in light of our results. Indeed, we may think that most of the images analyzed by Heine et al. clearly contain both fatty ($H \in [0.20, 0.35]$) and dense ($H \in [0.55, 0.75]$) areas, so that the power spectrum exponent is an average of two distinct behaviors.
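The regression and labeling steps described above can be sketched as follows. The example fits $\log_2 Z(q, a)$ against $\log_2 a$ to obtain $\tau(q)$, fits $\tau(q)$ against $q$ to obtain the slope $H$, and then labels the result using the ranges [0.20, 0.35] and [0.55, 0.75] quoted in the text. The synthetic partition functions assume an ideal monofractal with $Z(q, a) = a^{qH - 2}$, which is an assumption introduced only to have data to fit; the function names are likewise illustrative and do not come from the authors' software.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def tau_from_partition(scales, z_values):
    """Slope of log2 Z(q, a) vs. log2 a for one value of q."""
    return ols_slope([math.log2(a) for a in scales],
                     [math.log2(z) for z in z_values])


def label_tissue(hurst):
    """Label a subscene from its H estimate, using the ranges quoted in the text."""
    if 0.20 <= hurst <= 0.35:
        return "fatty"
    if 0.55 <= hurst <= 0.75:
        return "dense"
    return "unclassified"


# Synthetic monofractal data (assumption): Z(q, a) = a**(q*H - 2).
H_TRUE = 0.25
scales = [1.6 + 0.3 * k for k in range(9)]   # 1.6 to 4.0, in sigma_W units
q_values = [-3, -2, -1, 0, 1, 2, 3]

taus = [tau_from_partition(scales, [a ** (q * H_TRUE - 2.0) for a in scales])
        for q in q_values]
hurst = ols_slope(q_values, taus)

print(f"estimated H = {hurst:.3f} -> {label_tissue(hurst)}")
```

On real data the fit would of course carry an uncertainty (the text quotes $\pm 0.05$), so estimates falling between the two ranges are left unclassified here rather than forced into a class.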
Figure 38. Determination of the $\tau(q)$ and $D(h)$ spectra of dense ( ) and fatty ( ) breasts with the 2D WTMM method. (a) $\log_2 Z(q, a)$ vs. $\log_2 a$. (b) $h(q, a)$ vs. $\log_2 a$. (c) $\tau(q)$ vs. $q$. (d) $D(h)$ vs. $h$ obtained from Eqs. (31) and (32). Same analyzing wavelet as in Figure 37. These results correspond to annealed averaging over 49 ($512 \times 512$) squares cut out of full-breast images. $a$ is expressed in $\sigma_W$ units. In (a) and (b), q goes from 1 to 3 from bottom to top.
B. Detecting Microcalcifications through WT Skeleton Segmentation

The presence of clustered MC is one of the most important, and sometimes the only, sign of cancer in a mammogram. As a potential computer-aided diagnostic tool, let us show how our WT methodology can identify MC, which are small calcium deposits in tissue appearing as clusters of bright spots. Figure 39 illustrates how one can actually detect MC by inspecting the WT maxima chains. Indeed, at the smallest scale resolved by our WT microscope ($\sigma_W = 13$ pixels), MC, which can be considered as strong singularities, are
Figure 39. Detection and characterization of microcalcifications. (a) Original $726 \times 726$ image of dense breast tissue containing MC. (b) Scaling behavior of the WT modulus $\mathcal{M}_\psi$ along some maxima lines pointing toward dense tissue background ( ) and microcalcifications (□). The solid (respectively dashed) straight line corresponds to the slope $h = 0.65$ (respectively $-1$) characteristic of background tissue roughness fluctuations (respectively MC). (c) and (d) show the maxima chains obtained after eliminating background tissue maxima chains at scales $a = \sigma_W$ (c) and $2.5\sigma_W$ (d), when using the WT skeleton space-scale information.
outlined by some maxima chains. Because the average size of MC is about 200 μm (5 pixels), these singularities are seen by our mathematical microscope as Dirac singularities; thus the corresponding maxima lines pointing to the MC are likely to display scaling properties with a local Hölder exponent $h = -1$ ($\mathcal{M}_\psi[f] \sim a^{-1}$) down to scales of the order of the MC size, where one should observe a crossover to the value $h = 0$ ($\mathcal{M}_\psi[f] \sim \mathrm{cst}$) as an indication of the discontinuity induced by the MC boundary. The behavior of the WT modulus along several maxima lines pointing to background points and to MC is illustrated in Figure 39b. One can thus classify these lines according to the behavior of $\mathcal{M}_\psi[f]$ along them, and then separate MC ($h \simeq -1$) from dense background tissue ($h \simeq 0.65 \pm 0.05$), as experienced on synthetic images in Section III.D. Figure 39c and d show the maxima chains that are found to correspond to MC at two different scales. We see that these maxima chains can be used not only to detect MC at the smallest resolved scale (Fig. 39c), but also to perform MC clustering when investigating larger scales (Fig. 39d).
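The classification of maxima lines described above can be sketched in a few lines. The example estimates a local exponent $h$ for each line as the slope of $\log_2$ of the WT modulus against $\log_2 a$, and separates lines with $h$ close to $-1$ from lines with $h$ close to $0.65$. The threshold value, the synthetic power-law data, and all names are assumptions made for illustration; in particular the sketch ignores the crossover to $h = 0$ expected at scales of the order of the MC size.

```python
import math


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def holder_exponent(scales, moduli):
    """Slope of log2(modulus) vs. log2(scale) along one maxima line."""
    return ols_slope([math.log2(a) for a in scales],
                     [math.log2(m) for m in moduli])


def classify_line(h, threshold=-0.2):
    """Hypothetical rule: exponents below the threshold are attributed to MC."""
    return "microcalcification" if h < threshold else "background"


# Synthetic maxima lines (assumption): pure power laws in the scale a.
scales = [1.0, 1.5, 2.0, 2.5, 3.0]           # in sigma_W units
lines = {
    "line_a": [3.0 * a ** 0.65 for a in scales],    # background-like scaling
    "line_b": [40.0 * a ** -1.0 for a in scales],   # Dirac-like scaling
}

exponents = {name: holder_exponent(scales, m) for name, m in lines.items()}
labels = {name: classify_line(h) for name, h in exponents.items()}

for name in lines:
    print(f"{name}: h = {exponents[name]:+.2f} -> {labels[name]}")
```

Keeping only the lines labeled as microcalcifications yields a subskeleton of the kind used in the rest of this section.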
As pointed out in Section III.D, the MC WT subskeleton can be used to compute the corresponding partition functions, and thus to fully characterize the fractal geometry of the MC cluster. Figure 40 shows the results of the computation of the partition functions from the subskeleton of WT maxima lines pointing toward MC ($h \simeq -1$). Let us recall that in this case $h(q, a)$ (Fig. 40a) is simply the average scaling behavior (see Fig. 39b) along all the maxima lines of this subskeleton. As expected, one retrieves a crossover between small-scale scaling properties ($h \simeq 0$, induced by the MC boundaries) and larger-scale ($2\sigma_W \le a \le 3.7\sigma_W$) scaling properties
Figure 40. Determination of the $\tau(q)$ and $D(h)$ spectra of the MC cluster shown in Figure 39a. The partition functions are computed from the MC WT skeleton obtained after eliminating the background tissue maxima lines (see Figure 39). (a) $h(q, a)$ vs. $\log_2 a$. (b) $D(q, a)$ vs. $\log_2 a$. (c) $\tau(q)$ vs. $q$. (d) $D(h)$ vs. $h$ obtained from Eqs. (31) and (32). Same analyzing wavelet as in Figure 37. $a$ is expressed in $\sigma_W$ units. The solid line in (a) corresponds to the slope $h = -0.45$ and in (b) to $D_F = 1.2$.
($h \simeq -0.4$), since maxima lines pointing toward MC have not all reached the asymptotic ($h = -1$) Dirac singularity behavior because of finite size effects. In the same finite range of scales, $D(q, a)$ (Fig. 40b) displays good scaling properties for q values between 1 and 3, which results, to a good approximation, in a $D(h)$ singularity spectrum that reduces to a single point $h \simeq -0.4$ (Fig. 40d). This result is consistent with the slope of the corresponding $\tau(q)$ spectrum (Fig. 40c), which is found to be linear up to numerical uncertainty. Moreover, from the slope of $D(q = 0, a)$ vs. $\log_2 a$, as well as from the estimate of $-\tau(0) = D_F$, one can assign with no ambiguity the fractal dimension $D_F = 1.2 \pm 0.05$ to the MC cluster, which is definitely larger than 1 and smaller than 2, the hallmark of fractal geometry. We have also applied our methodology to a small number of benign and malignant clusters; work is in progress to determine to what extent the fractal dimension of an MC cluster can be used as a discriminating index between a benign state and malignancy.

We have presented a new space-scale methodology for studying, within the same algorithmic framework, background tissue properties and abnormal singularities associated with breast cancer. Because of its ability to reveal and distinguish persistent and nonpersistent long-range correlations, the 2D WTMM method looks very promising for classifying tissues by quantifying breast density fluctuations accurately. Furthermore, we plan to improve detection and segmentation of MC by combining the 2D WTMM method with neural network techniques to assist in the diagnosis of digitized mammograms.
VII. Conclusion

To summarize, we have presented a first step toward a statistical theory of multifractal images based on wavelet theory. The 2D WTMM method [106–110] relies on the computation of partition functions from the WT skeleton defined by the wavelet transform modulus maxima. This skeleton provides an adaptive space-scale partition of the fractal distribution under study, from which one can extract the $\tau(q)$ and $D(h)$ [or $f(\alpha)$] multifractal spectra as the equivalent of thermodynamic functions. With some appropriate choice of the analyzing wavelet, we have shown that the WTMM method provides a natural but necessary generalization of the classical box-counting and structure function techniques, which both have intrinsic and fundamental limitations. Indeed we believe that the 2D WTMM method for characterizing the roughness fluctuations of a fractal landscape, a rough surface, a turbulent flow, or the image of a fractal object
is likely to become as useful as the well-known phase portrait reconstruction, Poincaré section, and first return map techniques for the analysis of chaotic time series [385–388]. Besides the new concepts involved in this methodology and its potential theoretical interest, there is a more concrete and technical contribution [139] that is likely to have a strong impact on future research. For both image analysis [107,108,139] and image synthesis [109,139] purposes, we have implemented new algorithms and developed new software that can be routinely used to analyze as well as to model experimental data. In particular, some of these numerical tools take advantage of the space-scale information contained in the WT skeleton to go beyond the classical (one-point) multifractal description via the estimate of (two-point) space-scale correlation functions. Prior to experimental applications, all these numerical tools were calibrated via systematic test applications on random self-affine surfaces (e.g., isotropic fractional Brownian surfaces and anisotropic monofractal rough surfaces [108]) as well as on synthetic multifractal rough surfaces [109]. To illustrate the wide range of potential applications of this wavelet-based image processing method, we have reported the most significant results obtained when applying the 2D WTMM methodology to three rather different experimental situations, namely the statistical analysis of high-resolution satellite images of the cloud structure, of 2D cuts of the dissipation and enstrophy fields in 3D direct numerical simulations of homogeneous and isotropic turbulence, and of digitized mammograms. We are convinced that this methodology will lead to significant progress in the understanding of the multiscale mechanisms that underlie the formation of rough surfaces and the spatiotemporal evolution of intermittent fields in various domains of fundamental as well as applied sciences such as erosion and corrosion processes, deposition and growth phenomena, catalysis, fracture propagation, turbulence, medical imaging, and many other areas in physics, astrophysics, chemistry, biology, geology, meteorology, and material sciences.

Acknowledgments

We are very grateful to E. Bacry, R. F. Cahalan, A. Davis, M. H. Dilhuydy, S. Jaffard, L. Lalonde, J. M. Lina, A. Marshak, J. F. Muzy, and P. Saint-Jean for very interesting and helpful discussions. We are very indebted to Y. Gagne, Y. Malécot, and S. Ciliberto for permission to use their experimental turbulent signals and to M. Meneguzzi for allowing us access to his DNS numerical data. We want to acknowledge M. H. Dilhuydy and L. Lalonde for many helpful and illuminating conversations relevant to
mammography and for providing additional training mammograms. The work concerning the analysis of Landsat satellite images of cloud structure was supported by NATO (Grant CRG 960176) and was performed while S. G. Roux held a National Research Council–NASA/GSFC Research Associateship. The work concerning the analysis of DNS turbulent dissipation and enstrophy fields is currently supported by the Centre National de la Recherche Scientifique under GDR ‘‘Turbulence.’’
References

1. B. B. Mandelbrot, Fractals: Form, Chance and Dimensions (Freeman, San Francisco, 1977).
2. B. B. Mandelbrot, The Fractal Geometry of Nature (Freeman, San Francisco, 1982).
3. Random Fluctuations and Pattern Growth, edited by H. E. Stanley and N. Ostrowski (Kluwer Academic, Dordrecht, 1988).
4. J. Feder, Fractals (Pergamon, New York, 1988).
5. T. Vicsek, Fractal Growth Phenomena (World Scientific, Singapore, 1989).
6. The Fractal Approach to Heterogeneous Chemistry: Surfaces, Colloids, Polymers, edited by D. Avnir (John Wiley & Sons, New York, 1989).
7. F. Family and T. Vicsek, Dynamics of Fractal Surfaces (World Scientific, Singapore, 1991).
8. Fractals and Disordered Systems, edited by A. Bunde and S. Havlin (Springer-Verlag, Berlin, 1991).
9. Fractals in Natural Science, edited by T. Vicsek, M. Schlesinger, and M. Matsushita (World Scientific, Singapore, 1994).
10. Fractals in Geoscience and Remote Sensing, Image Understanding Research Series, Vol. 1, ECSC-EC-EAEC, edited by G. G. Wilkinson, J. Kanellopoulos, and J. Megier (Brussels, Luxemburg, 1995).
11. A. L. Barabási and H. E. Stanley, Fractal Concepts in Surface Growth (Cambridge University Press, Cambridge, 1995).
12. Fractal Aspects of Materials, Material Research Society Symposium Proceeding, Vol. 367, edited by F. Family, P. Meakin, B. Sapoval, and R. Wool (Pittsburgh, 1995).
13. B. Sapoval, Les Fractales (Aditech, Paris, 1988).
14. On Growth and Form: Fractal and Non-Fractal Patterns in Physics, edited by H. E. Stanley and N. Ostrowski (Martinus Nijhoff, Dordrecht, 1986).
15. Fractals in Physics, edited by L. Pietronero and E. Tosatti (North-Holland, Amsterdam, 1986).
16. Fractals in Physics, Essays in honour of B. B. Mandelbrot, Physica D, Vol. 38, edited by A. Aharony and J. Feder (North-Holland, Amsterdam, 1989).
17. B. J. West, Fractal Physiology and Chaos in Medicine (World Scientific, Singapore, 1990).
18. U. Frisch, Turbulence (Cambridge University Press, Cambridge, 1995).
19. J. D. Farmer, E. Ott, and J. A. Yorke, Physica D 7, 153 (1983).
20. P. Grassberger and I. Procaccia, Phys. Rev. Lett. 50, 346 (1983).
21. P. Grassberger and I. Procaccia, Physica D 9, 189 (1983).
22. R. Badii and A. Politi, Phys. Rev. Lett. 52, 1661 (1984).
23. R. Badii and A. Politi, J. Stat. Phys. 40, 725 (1985).
24. P. Grassberger, R. Badii, and A. Politi, J. Stat. Phys. 51, 135 (1988).
25. G. Grasseau, Ph.D. thesis, University of Bordeaux I, 1989.
26. F. Argoul, A. Arneodo, J. Elezgaray, G. Grasseau, and R. Murenzi, Phys. Rev. A 41, 5537 (1990).
27. L. V. Meisel, M. Johnson, and P. J. Cote, Phys. Rev. A 45, 6989 (1992).
28. The Science of Fractal Images, edited by H. O. Peitgen and D. Saupe (Springer-Verlag, New York, 1987).
29. R. F. Voss, Physica D 38, 362 (1989).
30. G. A. Edgar, Measure, Topology, and Fractal Geometry (Springer-Verlag, Berlin, 1990).
31. S. Davies and P. Hall, Technical Report No. SRR 96-008, School of Mathematical Sciences, Australian National University (1996).
32. B. Dubuc, J. F. Quiniou, C. Roques-Carmes, C. Tricot, and S. W. Zucker, Phys. Rev. A 39, 1500 (1989).
33. T. Higuchi, Physica D 46, 254 (1990).
34. N. P. Greis and H. P. Greenside, Phys. Rev. A 44, 2324 (1991).
35. W. Li, Int. J. Bifurcation Chaos 1, 583 (1991).
36. J. Schmittbuhl, J. P. Violette, and S. Roux, Phys. Rev. E 51, 131 (1995).
37. A. Scotti, C. Meneveau, and S. G. Saddoughi, Phys. Rev. E 51, 5594 (1995).
38. B. Lea-Cox and J. S. Y. Wang, Fractals 1, 87 (1993).
39. C. K. Peng, S. V. Buldyrev, M. Simons, H. E. Stanley, and A. L. Goldberger, Phys. Rev. E 49, 1685 (1994).
40. M. S. Taqqu, V. Teverovsky, and W. Willinger, Fractals 3, 785 (1995).
41. A. R. Mehrabi, H. Rassamdana, and M. Sahimi, Phys. Rev. E 56, 712 (1997).
42. B. Pilgram and D. T. Kaplan, Physica D 114, 108 (1998).
43. G. Parisi and U. Frisch, in Turbulence and Predictability in Geophysical Fluid Dynamics and Climate Dynamics, Proc. of Int. School, edited by M. Ghil, R. Benzi, and G. Parisi (North-Holland, Amsterdam, 1985), p. 84.
44. A. S. Monin and A. M. Yaglom, Statistical Fluid Mechanics (MIT Press, Cambridge, MA, 1975), Vol. 2.
45. A. L. Barabási and T. Vicsek, Phys. Rev. A 44, 2730 (1991).
46. A. L. Barabási, P. Szépfalusy, and T. Vicsek, Physica A 178, 17 (1991).
47. J. F. Muzy, E. Bacry, and A. Arneodo, Phys. Rev. Lett. 67, 3515 (1991).
48. J. F. Muzy, E. Bacry, and A. Arneodo, Int. J. Bifurcation Chaos 4, 245 (1994).
49. A. Arneodo, E. Bacry, and J. F. Muzy, Physica A 213, 232 (1995).
50. J. F. Muzy, E. Bacry, and A. Arneodo, Phys. Rev. E 47, 875 (1993).
51. A. Grossmann and J. Morlet, SIAM J. Math. Anal. 15, 723 (1984).
52. A. Grossmann and J. Morlet, in Mathematics and Physics, Lectures on Recent Results, edited by L. Streit (World Scientific, Singapore, 1985), p. 135.
53. P. Goupillaud, A. Grossmann, and J. Morlet, Geoexploration 23, 85 (1984).
54. Wavelets, edited by J. M. Combes, A. Grossmann, and P. Tchamitchian (Springer-Verlag, Berlin, 1989).
55. Y. Meyer, Ondelettes (Hermann, Paris, 1990).
56. Les Ondelettes en 1989, edited by P. G. Lemarié (Springer-Verlag, Berlin, 1990).
57. Wavelets and Applications, edited by Y. Meyer (Springer, Berlin, 1992).
58. I. Daubechies, Ten Lectures on Wavelets (SIAM, Philadelphia, 1992).
59. Wavelets and Their Applications, edited by M. B. Ruskai, G. Beylkin, R. Coifman, I. Daubechies, S. Mallat, Y. Meyer, and L. Raphael (Jones and Bartlett, Boston, 1992).
60. C. K. Chui, An Introduction to Wavelets (Academic Press, Boston, 1992).
84
´ ODO ET AL. ARNE
61. Progress in Wavelets Analysis and Applications, Progress in Wavelets Analysis and Applications, edited by Y. Meyer and S. Roques (Editions frontie`res, Gif-sur-Yvette, 1993). 62. A. Arneodo, F. Argoul, E. Bacry, J. Elezgaray, and J. F. Muzy, Ondelettes, Multifractales et Turbulences: de l’ADN aux croissances cristallines (Diderot Editeur, Art et Sciences, Paris, 1995). 63. Wavelets: Theory and Applications, Wavelets: Theory and Applications, edited by G. Erlebacher, M. Y. Hussaini, and L. M. Jameson (Oxford University Press, Oxford, 1996). 64. M. Holschneider, Wavelets: An Analysis Tool (Oxford University Press, Oxford, 1996). 65. S. Mallat, A Wavelet Tour in Signal Processing (Academic Press, New York, 1998). 66. B. Torresani, Analyse Continue par Ondelettes (Editions de Physique, Les Ulis, 1998). 67. T. C. Halsey, M. H. Jensen, L. P. Kadanoff, I. Procaccia, and B. I. Shraiman, Phys. Rev. A 33, 1141 (1986). 68. P. Collet, J. Lebowitz, and A. Porzio, J. Stat. Phys. 47, 609 (1987). 69. G. Paladin A. Vulpiani, Phys. Rep. 156, 148 (1987). 70. B. B. Mandelbrot, Fractals and Multifractals: Noise, Turbulence and Galaxies, Vol. 1 of Selecta (Springer-Verlag, Berlin, 1989). 71. D. Rand, Ergod. Th. Dyn. Sys. 9, 527 (1989). 72. S. Zhong and S. Mallat, IEEE Trans. Pattern Anal. Machine Intelligence 14, 710 (1992). 73. S. Hwang and W. L. Mallat, IEEE Trans. Inform. Theory 38, 617 (1992). 74. E. Bacry, J. F. Muzy, and A. Arneodo, J. Stat. Phys. 70, 635 (1993). 75. S. Jaffard, SIAM J. Math. Anal. 28, 944 (1997). 76. H. G. E. Hentschel, Phys. Rev. E 50, 243 (1994). 77. T. Bohr and T. Te`l, in Direction in Chaos, Vol. 2, edited by B. L. Hao (World Scientific, Singapore, 1988), p. 194. 78. S. F. Edwards and P. W. Anderson, J. Phys. F 5, 965 (1975). 79. A. Arneodo, in Ref. [63], p. 349. 80. J. F. Muzy, E. Bacry, and A. Arneodo, in Ref. [61], p. 323. 81. A. Arneodo, J. F. Muzy, and S. G. Roux, J. Phys. II France 7, 363 (1997). 82. A. Arneodo, S. Manneville, and J. F. Muzy, Eur. 
Phys. J. B 1, 129 (1998). 83. A. Arneodo, B. Audit, E. Bacry, S. Manneville, J. F. Muzy, and S. G. Roux, Physica A 254, 24 (1998). 84. S. G. Roux, J. F. Muzy, and A. Arneodo, Eur. Phys. J. B 8, 301 (1999). 85. A. Arneodo, S. Manneville, J. F. Muzy, and S. G. Roux, Phil. Trans. R. Soc. London A 357, 2415 (1999). 86. A. Arneodo, J. Delour, and J. F. Muzy, in Wavelet Applications in Signal and Image Processing VIII, edited by A. Aldroubi, A. F. Laine, and M. A. Unser p. 58 (2000). 87. J. Delour, J. F. Muzy, and A. Arneodo, Eur. Phys. J. B 23, 243 (2001). 88. A. Arneodo, F. Argoul, E. Bacry, J. F. Muzy, and M. Tabard, Phys. Rev. Lett. 68, 3456 (1992). 89. A. Arneodo, F. Argoul, J. F. Muzy, M. Tabard, and E. Bacry, Fractals. 1, 629 (1993). 90. A. Arneodo, F. Argoul, J. F. Muzy, and M Tabard, Phys. Lett. A 171, 31 (1992). 91. A. Arneodo, F. Argoul, J. F. Muzy, and M. Tabard, Physica A 188, 217 (1992). 92. A. Kuhn, F. Argoul, J. F. Muzy, and A. Arneodo, Phys. Rev. Lett. 73, 2998 (1994). 93. A. Arneodo, E. Bacry, P. V. Graves, and J. F. Muzy, Phys. Rev. Lett. 74, 3293 (1995). 94. A. Arneodo, Y. Daubenton-Carafa, E. Bacry, P. V. Graves, J. F. Muzy, and C. Thermes, Physica D 96, 291 (1996). 95. A. Arneodo, Y. Daubenton-Carafa, B. Audit, E. Bacry, J. F. Muzy, and C. Thermes, Eur. Phys. J. B 1, 259 (1998). 96. A. Arneodo, Y. Daubenton-Carafa, B. Audit, E. Bacry, J. F. Muzy, and C. Thermes, Physica A 249, 439 (1998).
MULTIFRACTAL IMAGE ANALYSIS
85
97. B. Audit, C. Thermes, C. Vaillant, Y. Daubenton-Carafa, J. F. Muzy, and A. Arneodo, Phys. Rev. Lett. 86, 2471 (2001). 98. B. Audit, C. Vaillant, A. Arneodo, Y. Daubenton-Carafa, and C. Thermes, J. Mol. Biol. 316, 903 (2002). 99. A. Arneodo, J. P. Bouchaud, R. Cont, J. F. Muzy, M. Potters, and D. Sornette, preprint cond-mat/9607120 at http://xxx.lanl.gov. 100. A. Arneodo, J. F. Muzy, and D. Sornette, Eur. Phys. J. B 2, 277 (1998). 101. A. Arneodo, E. Bacry, and J. F. Muzy, Phys. Rev. Lett. 74, 4823 (1995). 102. A. Arneodo, E. Bacry, S. Jaffard, and J. F. Muzy, J. Stat. Phys. 87, 179 (1997). 103. A. Arneodo, E. Bacry, S. Jaffard, and J. F. Muzy, J. Fourier Anal. Appl. 4, 159 (1998). 104. A. Arneodo, E. Bacry, S. Jaffard, and J. F. Muzy, CRM Proc. Lecture Notes. 18, 315 (1999). 105. J. C. Vassilicos and J. C. Hunt, Proc. R. Soc. London. 435, 505 (1991). 106. J. Arrault, A. Arneodo, A. Davis, and A. Marshak, Phys. Rev. Lett. 79, 75 (1997). 107. A. Arneodo, N. Decoster, and S. G. Roux, Phys. Rev. Lett. 83, 1255 (1999). 108. A. Arneodo, N. Decoster, and S. G. Roux, Eur. Phys. J. B 15, 567 (2000). 109. N. Decoster, S. G. Roux, and A. Arneodo, Eur. Phys. J. B 15, 739 (2000). 110. S. G. Roux, A. Arneodo, and N. Decoster, Eur. Phys. J. B 15, 765 (2000). 111. J. P. Antoine, P. Carette, R. Murenzi, and B. Piette, Signal Process. 31, 241 (1993). 112. E. Freysz, B. Pouligny, F. Argoul, and A. Arneodo, Phys. Rev. Lett. 64, 745 (1990). 113. A. Arneodo, F. Argoul, J. F. Muzy, B. Pouligny, and E. Freysz, in Ref. [59], p. 241. 114. J. Canny, IEEE Trans. Pattern Anal. Machine Intelligence. 8, 679 (1986). 115. P. Kestener, J. Lina, P. Saint-Jean, and A. Arneodo, Image Anal. Stereol. 20, 169 (2001). 116. D. Marr, Vision (W. H. Freemann and Co, San Francisco, 1982). 117. A. Rosenfeld M. Thurston, IEEE Trans. Comput. C 20, 562 (1971). 118. R. Murenzi, Ph.D. thesis, University of Louvain la Neuve, 1990. 119. R. Murenzi, in Ref. [54], p. 239. 120. D. Schertzer and S. Lovejoy, J. 
Geophys. Res. 92, 9693 (1987). 121. D. Schertzer and S. Lovejoy, Phys. Chem. Hyd. J. 6, 623 (1985). 122. S. Lovejoy and D. Schertzer, in Ref. [10], p. 102. 123. D. Schertzer, S. Lovejoy, F. Schmitt, Y. Ghigisinskaya, and D. Marsan, Fractals. 5, 427 (1997). 124. S. Jaffard and Y. Meyer, Memoirs A.M.S. 123, n.587 (1996). 125. M. Ben Slimane, Ph.D. thesis, E.N.P.C., France, 1996. 126. S. Jaffard, Pub. Math. 35, 155 (1991). 127. E. Bacry, A. Arneodo, U. Frisch, Y. Gagne, and E. Hopfinger, in Turbulence and Coherent Structures, edited by M. Lesieur and O. Metais (Kluwer, Dordrecht, 1991), p. 203. 128. M. Vergassola, R. Benzi, L. Biferale, and D. Pisarenko, J. Phys. A 26, 6093 (1993). 129. M. Vergassola and U. Frisch, Physica D 54, 58 (1991). 130. S. Jaffard, C. R. Acad. Sci. Paris, Serie I: Math. 326, 555 (1998). 131. R. Badii, Ph.D. thesis, University of Zurich, 1987. 132. P. Cvitanovic, in Proceedings Group Theoretical Methods in Physics, edited by R. Gilmore (World Scientific, Singapore, 1987). 133. M. J. Feigenbaum, M. H. Jensen, and I. Procaccia, Phys. Rev. Lett. 57, 1503 (1986). 134. M. H. Jensen, L. P. Kadanoff, and I. Procaccia, Phys. Rev. A 36, 1409 (1987). 135. A. B. Chhabra, R. V. Jensen, and K. R. Sreenivasan, Phys. Rev. A 40, 4593 (1989). 136. A. B. Chhabra and R. V. Jensen, Phys. Rev. Lett. 62, 1327 (1989). 137. A. B. Chhabra, C. Meneveau, R. V. Jensen, and K. R. Sreenivasan, Phys. Rev. A 40, 5284 (1989). 138. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes (Cambridge University Press, Cambridge, 1992).
86 139. 140. 141. 142. 143. 144. 145. 146. 147. 148. 149.
150. 151. 152. 153. 154. 155. 156. 157. 158. 159. 160. 161. 162. 163. 164. 165. 166. 167. 168. 169. 170. 171.
172. 173. 174. 175.
´ ODO ET AL. ARNE N. Decoster, Ph.D. thesis, University of Bordeaux I, 1999. B. B. Mandelbrot and J. W. Van Ness, S.I.A.M. Rev. 10, 422 (1968). J. Beran, Statistics for Long-Memory Process (Chapman & Hall, New York, 1994). G. Wornell and A. V. Oppenheim, IEEE Trans. Signal Proc. 40, 611 (1992). R. F. Peltier and J. Levy Ve´hel, INRIA Report No. 2396 (1994). P. Flandrin, IEEE Trans. Inform. Theory. 35, 197 (1989). P. Flandrin, IEEE Trans. Inform. Theory. 38, 910 (1992). P. Flandrin, Temps-Fre´quence (Herme`s, Paris, 1993). E. Masry, IEEE Trans. Inform. Theory. 39, 260 (1993). P. Abry, P. Goncalve`s, and P. Flandrin, Lectures Note Statistics. 105, 15 (1995). P. Abry, Ondelettes et Turbulence—Multire´solution, Algorithmes de De´composition, Invariance d’Echelles et Signaux de Pression (Diderot Editeur, Arts et Sciences, Paris, 1997). P. Abry and D. Veitch, IEEE Trans. Inform. Theory. 44, 2 (1998). P. Abry, D. Veitch, and P. Flandrin, J. Time Ser. Anal. 19, 253 (1998). L. Abry and P. Delbeke, Stoch. Proc. Applic. 86, 177 (2000). D. Veitch and P. Abry, IEEE Trans. Inform. Theory. 45, 878 (1999). P. Abry, P. Flandrin, M. S. Taqqu, and D. Veitch, in Self-Similarity in Network Traffic, edited by K. Parks and W. Willinger (John Wiley & Sons, New York, 1998). A. H. Tewfik and M. Kim, IEEE Trans. Inform. Theory. 38, 904 (1992). J. Pando and L. Z. Fang, Phys. Rev. E 57, 3593 (1998). J. Simonsen, A. Hansen, and O. M. Nes, Phys. Rev. E 58, 2779 (1998). B. Audit, E. Bacry, J. F. Muzy, and A. Arneodo, IEEE Trans. Inform. Theory 48, 2938 (2002). P. Le´vy, Processus Stochastiques et Mouvement Brownien (Gauthier-Villars, Paris, 1965). R. F. Voss, in Fundamental Algorithms for Computer Graphics, edited by R. A. Earnshaw (Springer-Verlag, Heidelberg, 1985), p. 805. A. Arneodo, E. Bacry, and J. F. Muzy, J. Math. Phys. 39, 4142 (1998). M. E. Cates and J. M. Deutsch, Phys. Rev. A 35, 4907 (1987). A. P. Siebesma, in Universality in Condensed Matter, edited by R. Julien, L. Peliti, R. 
Rammal, and N. Boccara (Springer-Verlag, Heidelberg, 1988), p. 188. J. O’Neil and C. Meneveau, Phys. Fluids A 5, 158 (1993). A. Arneodo, E. Bacry, S. Manneville, and J. F. Muzy, Phys. Rev. Lett. 80, 708 (1998). M. Greiner, J. Geisemann, P. Lipa, and P. Carruthers, Z. Phys. C 69, 305 (1996). M. Greiner, J. Geisemann, and P. Lipa, Phys. Rev. E 56, 4263 (1997). J. Le´vy-Ve´hel, Fractals. 3, 755 (1995). J. Le´vy-Ve´hel, in Ref. [10], p. 85. M. Unser and A. Aldroubi, Proc. IEEE 84, 626 (1996). Wavelet Applications in Signal and Image Processing VIII., Wavelet Applications in Signal and Image Processing VIII., Vol. 4119 of SPIE Conference Proceedings edited by A. Aldroubi, A. F. Laine, and M. A. Unser (2000). S. Lovejoy, Science. 216, 185 (1982). R. F. Cahalan, in Advances in Remote Sensing and Retrieval Methods, edited by A. Deepak, H. Fleming, and J. Theon (Deepak Pub, Hampton, 1989), p. 371. V. Ramanatahn, R. D. Cess, E. F. Harrison, P. Minnis, B. R. Barkston, E. Ahmad, and D. Hartmann, Science 243, 57 (1989). R. D. Cess, G. L. Potter, J. P. Blanchet, G. J. Boer, S. J. Ghan, J. T. Kiehl, M. Le Treut, Z.-X. Li, X.-Z. Lang, J. F. B. Mitchell, J.-J. Morcrette, D. A. Randall, M. R. Riches, E. Roeckner, U. Schlese, A. Slingo, K. E. Taylor, W. M. Washington, R. T. Wetherald, and I. Yagai, Science 245, 513 (1989).
MULTIFRACTAL IMAGE ANALYSIS
87
176. F. S. Rys and A. Waldvogel, in Fractal in Physics, edited by L. Pietronero and E. Tosatti (North-Holland, Amsterdam, 1986), p. 461. 177. R. M. Welch and B. A. Wielicki, Clim. Appl. Meteorol. 25, 261 (1986). 178. J. I. Yano and Y. Takeuchi, J. Meteorol. Soc. Jpn. 65, 661 (1987). 179. R. M. Welch, K. S. Kuo, B. A. Wielicki, S. K. Sengupta, and L. Parker, J. Appl. Meteorol. 27, 341 (1988). 180. R. F. Cahalan and J. H. Joseph, Mon. Weather Rev. 117, 261 (1989). 181. G. Se`ze and L. Smith, in Proceedings of the Seventh Conference on Atmospheric Radiation, American Meteorological Society, San Francisco, CA (1990), p. 47. 182. A. Davis, S. Lovejoy, and D. Schertzer, in Scaling, Fractals and Nonlinear Variability in Geophysics, edited by S. Lovejoy and D. Schertzer (Kluwer, Dordrecht, 1991), p. 303. 183. Y. Tessier, S. Lovejoy, and D. Schertzer, J. Appl. Meteorol. 32, 223 (1993). 184. A. Davis, A. Marshak, W. J. Wiscombe, and R. F. Cahalan, J. Geophys. Res. 99, 8055 (1994). 185. W. D. King, C. T. Maher, and G. A. Hepburn, J. Appl. Meteorol. 20, 195 (1981). 186. C. Duroure and B. Guillemet, Atmos. Res. 25, 331 (1990). 187. B. Baker, J. Atmos. Sci. 49, 387 (1992). 188. S. P. Malinowski and I. Zawadski, J. Atmos. Sci. 50, 5 (1993). 189. A. V. Korolev and I. P. Mazin, J. Appl. Meteorol. 32, 760 (1993). 190. S. P. Malinowski, M. Y. Leclerc, and D. G. Baumgardner, J. Atmos. Sci. 51, 397 (1994). 191. A. Davis, A. Marshak, W. J. Wiscombe, and R. F. Cahalan, J. Atmos. Sci. 53, 1538 (1996). 192. A. Marshak, A. Davis, W. J. Wiscombe, and R. F. Cahalan, J. Atmos. Sci. 54, 1423 (1997). 193. S. Cox, D. McDougal, D. Randall, and R. Schiffer, Bull. Am. Meteorol. Soc. 68, 114 (1987). 194. B. A. Albrecht, C. S. Bretherton, D. Jonhson, W. H. Schubert, and A. S. Frisch, Bull. Am. Meteorol. Soc. 76, 889 (1995). 195. R. Boers, J. B. Jensen, P. B. Krummel, and H. Gerber, Quart. J. R. Meteorol. Soc. 122, 1307 (1996). 196. H. W. Baker and J. A. Davies, Remote Sens. Environ. 42, 51 (1992). 
197. A. Davis, A. Marshak, R. F. Cahalan, and W. J. Wiscombe, J. Atmos. Sci. 54, 241 (1997). 198. R. F. Cahalan and J. B. Snider, Remote Sens. Environ. 28, 95 (1989). 199. S. Lovejoy, D. Schertzer, P. Silas, Y. Tessier, and D. Lavalle´e, Ann. Geophys. 11, 119 (1993). 200. S. M. Gollmer, M. Harshvardan, R. F. Cahalan, and J. S. Snider, J. Atmos. Sci. 52, 3013 (1995). 201. W. J. Wiscombe, A. Davis, A. Marshak, and R. F. Cahalan, Proceedings of the Fourth Atmospheric Radiation Measurement (ARM) Science Team Meeting, Charleston, U.S. Department of Energy 11 (1995). 202. A. Davis, A. Marshak, H. Gerber, and W. J. Wiscombe, J. Geophys. Res (1998) Unpublished. 203. D. Lovejoy and S. Schertzer, in Turbulence and Chaotic Phenomena in Fluids, edited by T. Tatsumi (North-Holland, Amsterdam, 1984), p. 505. 204. D. Lovejoy and S. Schertzer, in Fractals: Their Physical Origin and Properties, edited by L. Pietronero (Plenum, New York, 1989), p. 49. 205. J. Wilson, D. Schertzer, and S. Lovejoy, in Scaling, Fractals and Nonlinear Variability in Geophysics, edited by D. Schertzer and S. Lovejoy (Kluwer, Dordrecht, 1991), p. 185. 206. D. Schertzer and S. Lovejoy, in Ref. [10], p. 11.
88
´ ODO ET AL. ARNE
207. A. Davis, A. Marshak, W. J. Wiscombe, and R. F. Cahalan, Proceedings of the 2nd Workshop on Nonstationary Random Processes and Their Applications (1995), preprint. 208. L. M. Romanova, Izv. Acad. Sci. USSR Atmos. Oceanic Phys. 11, 509 (1975). 209. A. Davis, Ph.D. thesis, McGill University, Montreal, 1992. 210. R. F. Cahalan, W. Ridgway, W. J. Wiscombe, T. L. Bell, and J. B. Snider, J. Atmos. Sci. 51, 2434 (1994). 211. R. D. Cess, M. H. Zhang, Y. Zhou, X. Jing, and V. Dvortsov, J. Geophys. Res. 101, 23299 (1996). 212. K. Stamnes, S.-C. Tsay, W. J. Wiscombe, and K. Jayaweera, Appl. Opt. 27, 2502 (1988). 213. R. F. Cahalan, W. Ridgway, W. J. Wiscombe, S. Gollmer, and M. Harshvardan, J. Atmos. Sci. 51, 3776 (1994). 214. A. Marshak,A.Davis,W.J. Wiscombe, andR.F. Cahalan,J.Geophys.Res. 100,26247(1995). 215. M. Tiedke, Mon. Weather Res. 124, 745 (1996). 216. A. Davis, A. Marshak, W. J. Wiscombe, and R. F. Cahalan, in Current Topics in Nonstationary Analysis, edited by G. Trevin˜o et al. (World Scientific, Singapore, 1996), p. 97. 217. M. Harshvardan, B. A. Wielicki, and K. M. Ginger, J. Climate. 7, 1987 (1994). 218. R. F. Cahalan, M. Nestler, W. Ridgway, W. J. Wiscombe, and T. L. Bell, in Proceedings the 4th International Meeting on Statistical Climatology, edited by J. Sansom (New Zealand Meteorological Service, Wellington, 1990), p. 28. 219. A. Davis, S. Lovejoy, and D. Schertzer, SPIE Proc. 1558, 37 (1991). 220. A. Marshak, A. Davis, W. J. Wiscombe, and G. Titov, Remote Sens. Environ. 52, 72 (1995). 221. S. G. Roux, Ph.D. thesis, University of Aix-Marseille II, 1996. 222. A. N. Kolmogorov, C. R. Acad. Sci. USSR. 30, 301 (1941). 223. G. Ruiz-Chavarria, C. Baudet, and S. Ciliberto, Physica D 99, 369 (1996). 224. C. H. Meong, W. R. Cotton, C. Bretherton, A. Chlond, M. Khairoutdinov, S. Krueger, W. S. Lewellen, M. K. McVean, J. R. M. Pasquier, H. A. Rand, A. P. Siebesma, B. Stevens, and R. I. Sykes, Bull. Am. Meteorol. Soc. 77, 261 (1996). 225. A. Marshak, A. 
Davis, R. F. Cahalan, and W. J. Wiscombe, Phys. Rev. E 49, 55 (1994). 226. U. Frisch and S. A. Orszag, Phys. Today 24, (1990). 227. C. Meneveau and K. R. Sreenivasan, J. Fluid Mech. 224, 429 (1991). 228. Turbulence: A Tentative Dictionary, Turbulence: A Tentative Dictionary, edited by P. Tabeling and O. Cardoso (Plenum, New York, 1995). 229. K. R. Sreenivasan and R. A. Antonia, Annu. Rev. Fluid Mech. 29, 435 (1997). 230. M. Briscolini, P. Santangelo, S. Succi, and R. Benzi, Phys. Rev. E 50, R1745 (1994). 231. A. Vincent and M. Meneguzzi, J. Fluid Mech. 225, 1 (1995). 232. C. W. Van Atta and W. Y. Chen, J. Fluid Mech. 44, 145 (1970). 233. F. Anselmet, Y. Gagne, E. J. Hopfinger, and R. A. Antonia, J. Fluid Mech. 140, 63 (1984). 234. Y. Gagne, Ph.D. thesis, University of Grenoble, 1987. 235. B. Castaing, Y. Gagne, and E. J. Hopfinger, Physica D 46, 177 (1990). 236. C. Baudet, S. Ciliberto, and Phan Nhan Tien, J. Phys. II France 3, 293 (1993). 237. G. Stolovitzky and K. R. Sreenivasan, Phys. Rev. E 48, R33 (1993). 238. J. Maurer, P. Tabeling, and G. Zocchi, Europhys. Lett. 26, 31 (1994). 239. J. Herweijer and W. Van de Water, Phys. Rev. Lett. 74, 4651 (1995). 240. A. Arneodo et al., Europhys. Lett. 34, 411 (1996). 241. R. Benzi, L. Biferale, G. Paladin, A. Vulpiani, and M. Vergassola, Phys. Rev. Lett. 67, 2299 (1991). 242. P. Kailasnath, K. R. Sreenivasan, and G. Stolovitzky, Phys. Rev. Lett. 68, 2766 (1992). 243. A. Praskovsky and S. Oncley, Phys. Rev. Lett. 7, 3999 (1994).
MULTIFRACTAL IMAGE ANALYSIS
89
244. P. Tabeling, G. Zocchi, F. Belin, J. Maurer, and H. Willaime, Phys. Rev. E 53, 1613 (1996). 245. F. Belin, P. Tabeling, and H. Willaime, Physica D 93, 52 (1996). 246. B. Castaing, Y. Gagne, and M. Marchand, Physica D 68, 387 (1993). 247. G. Pedrizetti, E. Novikov, and A. Praskovsky, Phys. Rev. E 53, 475 (1996). 248. R. Benzi, S. Ciliberto, R. Tripiccione, C. Baudet, F. Massaioli, and S. Succi, Phys. Rev. E 48, R29 (1993). 249. R. Benzi, S. Ciliberto, C. Baudet, G. R. Chavarria, and R. Tripiccione, Europhys. Lett. 24, 275 (1993). 250. R. Benzi, S. Ciliberto, C. Baudet, and G. R. Chavarria, Physica D 80, 385 (1995). 251. Y. Gagne, M. Marchand, and B. Castaing, J. Phys. II France 4, 1 (1994). 252. A. Naert, L. Puech, B. Chabaud, J. Peinke, and B. Castaing B. Hebral, J. Phys. II France 4, 215 (1994). 253. B. Chabaud, A. Naert, J. Peinke, F. Chilla`, B. Castaing, and B. Hebral, Phys. Rev. Lett. 73, 3227 (1994). 254. B. Dubrulle and B. Castaing, J. Phys. II France 5, 895 (1995). 255. F. Chilla`, J. Peinke, and B. Castaing, J. Phys. II France 6, 455 (1996). 256. Y. Male´cot, C. Auriault, H. Kahalerras, Y. Gagne, O. Chanal, B. Chabaud, and B. Castaing, Eur. Phys. J. B 16, 549 (2000). 257. O. Chanal, B. Chabaud, B. Castaing, and B. Hebral, Eur. Phys. J. B 17, 309 (2000). 258. A. Arneodo, S. Manneville, J. F. Muzy, and S. G. Roux, Appl. Comput. Harmonic Anal. 6, 374 (1999). 259. R. Peinke and J. Friedrich, Phys. Rev. Lett. 78, 863 (1997). 260. R. Peinke and J. Friedrich, Physica D 102, 147 (1997). 261. P. O. Amblard and J. M. Brossier, Eur. Phys. J. B 12, 335 (1999). 262. P. Marcq and A. Naert, Phys. Fluids 13, 2590 (2001). 263. J. Davoudi and M. R. R. Tabar, Phys. Rev. Lett. 82, 1680 (1999). 264. J. P. Laval, B. Dubrulle, and S. Nazarenko, Phys. Fluids 13, 1995 (2001). 265. I. Arad, B. Dhruva, S. Kurien, V. S. L’vov, I. Procaccia, and K. R. Sreenivasan, Phys. Rev. Lett. 81, 5330 (1998). 266. I. Arad, V. S. L’vov, and I. Procaccia, Phys. Rev. E 59, 6753 (1999). 267. S. 
Kurien and K. R. Sreenivasan, Phys. Rev. E 62, 2206 (2000). 268. L. Biferale and F. Toschi, Phys. Rev. Lett. 86, 4831 (2001). 269. A. N. Kolmogorov, J. Fluid Mech. 13, 82 (1962). 270. A. M. Obukhov, J. Fluid Mech. 13, 77 (1962). 271. I. Hosokawa and K. Yamamoto, Phys. Fluids A 4, 457 (1992). 272. A. A. Praskovsky, Phys. Fluids A 4, 2589 (1992). 273. S. T. Thoroddsen and C. W. Van Atta, Phys. Fluids A 4, 2592 (1992). 274. S. Chen, G. D. Doolen, R. H. Kraichnan, and Z. S. She, Phys. Fluids A 5, 458 (1992). 275. G. Stolovitzky, P. Kailasnath, and K. R. Sreenivasan, Phys. Rev. Lett. 69, 1178 (1992). 276. G. Sreenivasan and K. R. Stolovitzky, Rev. Mod. Phys. 66, 229 (1994). 277. A. A. Praskovsky and S. Oncley, Europhys. Lett. 28, 635 (1994). 278. S. T. Thoroddsen, Phys. Fluids 7, 691 (1995). 279. S. Chen, G. D. Doolen, R. H. Kraichnan, and L. P. Wang, Phys. Rev. Lett. 74, 1755 (1995). 280. V. Borue and S. A. Orszag, Phys. Rev. E 53, R21 (1996). 281. R. Benzi, R. Struglia, and R. Tripiccione, Phys. Rev. E 53, R5565 (1996). 282. L. P. Wang, S. Chen, J. G. Brasseur, and J. C. Wyngaard, J. Fluid Mech. 309, 113 (1996). 283. A. Tsinober, E. Kit, and T. Dracos, J. Fluid Mech. 242, 169 (1992). 284. L. Shtilman, M. Spector, and A. Tsinober, J. Fluid Mech. 247, 65 (1993).
90 285. 286. 287. 288. 289. 290. 291. 292. 293. 294. 295. 296. 297. 298. 299. 300. 301. 302. 303. 304. 305. 306. 307. 308. 309. 310. 311. 312. 313. 314. 315. 316. 317. 318. 319. 320. 321. 322. 323. 324. 325. 326. 327. 328. 329. 330.
´ ODO ET AL. ARNE L. Richardson, Proc. R. Soc. London Ser. A 110, 709 (1926). S. Kida, J. Phys. Soc. Jpn. 60, 5 (1990). E. A. Novikov, Physica A 2, 814 (1990). E. A. Novikov, Phys. Rev. E 50, 3303 (1995). B. Dubrulle, Phys. Rev. Lett. 73, 959 (1994). Z. S. She and E. C. Waymire, Phys. Rev. Lett. 74, 262 (1995). B. Dubrulle, J. Phys. II France 6, 1825 (1996). Z. S. She and E. Leveque, Phys. Rev. Lett. 72, 336 (1994). A. B. Chhabra and K. R. Sreenivasan, Phys. Rev. Lett. 68, 2762 (1992). B. Jouault, P. Lipa, and M. Greiner, Phys. Rev E 59, 2451 (1999). D. Sornette, in Scale Invariance and Beyond, edited by B. Dubrulle, F. Graner, and D. Sornette (EDP Sciences, Les Ulis, 1997), p. 235. W. X. Zhou and D. Sornette, Physica D 165, 94 (2002). B. B. Mandelbrot, C. R. Acad. Sci. Paris Ser. A 278, 289, 355 (1974). B. B. Mandelbrot, J. Fluid Mech. 62, 331 (1974). J. P. Kahane and J. Peyrie`re, Adv. Math. 22, 131 (1976). G. M. Molchan, Commun. Math. Phys. 179, 681 (1996). A. Naert, R. Friedrich, and J. Peinke, Phys. Rev. E 56, 6719 (1997). P. Naert and A. Marcq, Physica D 124, 368 (1998). C. Meneveau and K. R. Sreenivasan, Nucl. Phys. B2, 49 (1987). C. Meneveau and K. R. Sreenivasan, Phys. Rev. Lett. 59, 1424 (1987). C. Meneveau and K. R. Sreenivasan, Phys. Lett. A 137, 103 (1989). E. Aurell, U. Frisch, J. Lutsko, and M. Vergassola, J. Fluid Mech. 238, 467 (1992). G. Paladin and A. Vulpiani, Phys. Rev. A 35, 1971 (1987). C. Meneveau and M. Nelkin, Phys. Rev. A 39, 3732 (1989). U. Frisch and M. Vergassola, Europhys. Lett. 14, 439 (1991). W. Van de Water, B. Van der Vorst, and E. Van de Wetering, Europhys. Lett. 16, 443 (1991). J. Molenaar, J. Herweijer, and W. Van de Water, Phys. Rev. E 52, 496 (1995). I. Hosokawa, S. Oide, and K. Yamamoto, Phys. Rev. Lett. 77, 4548 (1996). A. Bershadskii, T. Nakano, D. Fukayama, and T. Gotoh, Eur. Phys. J. B 18, 95 (2000). A. Bershadskii and A. Tsinober, Phys. Rev. E 48, 282 (1993). A. Bershadskii, E. Kit, A. Tsinober, and H. 
Vaisburd, Fluid Dyn. Res. 14, 71 (1994). R. Badii and P. Talkner, Phys. Rev. E 59, 6715 (1999). R. Badii and P. Talkner, Phys. Rev. E 60, 4138 (1999). V. L’vov and I. Procaccia, Phys. Fluids 8, 2565 (1996). G. He, S. Chen, R. H. Kraichnan, R. Zhang, and Y. Zhou, Phys. Rev. Lett. 81, 4636 (1998). M. Nelkin, Phys. Fluids 11, 2202 (1999). E. Siggia, J. Fluid Mech. 107, 375 (1981). R. Kerr, J. Fluid Mech. 153, 31 (1985). C. Meneveau, K. R. Sreenivasan, G. P. Kailasnath, and M. S. Fan, Phys. Rev. A 41, 894 (1990). M. S. Shafi, Y. Zhu, and R. A. Antonia, Phys. Fluids 8, 2245 (1996). N. Cao, S. Chen, and K. R. Sreenivasan, Phys. Rev. Lett. 76, 616 (1996). S. Chen, K. R. Sreenivasan, and M. Nelkin, Phys. Rev. Lett. 79, 1253 (1997). S. Chen, K. R. Sreenivasan, M. Nelkin, and N. Cao, Phys. Rev. Lett. 79, 2253 (1997). W. Van de Water and J. Herweijer, Bull. Am. Phys. Soc. 41, 1782 (1996). R. Camussi and R. Benzi, Phys. Fluids 9, 257 (1997). O. N. Boratov and R. B. Pelz, Phys. Fluids 9, 1400 (1997).
MULTIFRACTAL IMAGE ANALYSIS 331. 332. 333. 334. 335. 336. 337. 338. 339. 340. 341. 342. 343. 344. 345. 346. 347. 348. 349.
350. 351. 352. 353. 354. 355. 356. 357. 358. 359. 360. 361. 362. 363.
91
S. Grossmann, D. Lohse, and A. Reeh, Phys. Fluids 9, 3817 (1997). R. A. Antonia and B. R. Pearson, Europhys. Lett. 40, 123 (1997). B. Dhruva, Y. Tsuji, and K. R. Sreenivasan, Phys. Rev. E 56, R4928 (1997). M. Meneguzzi, Private communication. E. Ott, Y. Du, K. R. Sreenivasan, A. Juneja, and A. K. Suri, Phys. Rev. Lett. 69, 2654 (1992). Y. Du and E. Ott, Physica D 67, 387 (1993). A. L. Bertozzi and A. B. Chhabra, Phys. Rev. E 49, 4716 (1994). M. J. Broeders and A. L. Verbeek, in Radiological Diagnosis of Breast Diseases, edited by M. Friedrich and E. Sickles (Springer-Verlag, Berlin, 1997), p. 1. M. H. Dilhuydy and B. Barreau, Eur. J. Radiol. 24, 86 (1997). K. Doi, M. L. Giger, R. M. Nishikawa, K. R. Hoffmann, H. MacMahon, R. A. Schmidt, and K. G. Chua, Acta Radiol. 34, 426 (1993). A. F. Laine, S. Schuler, J. Fan, and W. Huda, IEEE Trans. Med. Imaging 13, 725 (1994). W. Qian, M. Kallergi, L. P. Clarke, H.-D. Li, P. Venugopal, D. Song, and R. A. Clark, Med. Phys. 22, 1247 (1995). R. A. Devore, B. Lucier, and Z. Yang, in Wavelets in Medicine and Biology, edited by A. Aldroubi and M. Unser (CRC Press, Boca Raton, FL, 1996), p. 145. J. J. Heine, S. R. Deans, D. K. Cullers, R. Stauduhar, and L. P. Clarke, IEEE Trans. Med. Imaging 16, 503 (1997). H. P. Chan, K. Doi, C. J. Vyborny, R. A. Schmidt, C. E. Metz, K. L. Lam, T. Ogura, Y. Wu, and H. MacMahon, Invest. Radiol. 25, 1102 (1990). D. R. Davies and D. H. Dance, Phys. Med. Biol. 35, 1111 (1990). J. Dengler, S. Behrens, and J. F. Desage, IEEE Trans. Med. Imaging 12, 634 (1993). R. M. Nishikawa, M. L. Giger, K. Doi, C. J. Vyborny, and R. A. Schmidt, Med. Phys. 20, 1161 (1993). A. Bazzani, A. Bevilacqua, D. Bollini, R. Campanini, N. Lanconelli, A. Riccardi, and D. Romani, in Digital Mammography: IWDM 2000, 5th International Workshop, edited by M. Yaffe (Medical Physics Publishing, Madison, 2001). N. 
Karssemeijer, in Proceedings of the 12th International Conference on Information Processing Medical Imaging, (Springer-Verlag, Berlin, 1991), p. 227. N. Karssemeijer, Int. J. Pattern Recog. Artificial Intell. 7, 1357 (1993). C. E. Priebe, J. L. Solka, R. A. Lorey, G. W. Rogers, W. L. Poston, M. Kallergi, W. Qian, L. P. Clarke, and R. A. Clark, Cancer Lett. 77, 183 (1994). H. D. Li, M. Kallergi, L. P. Clarke, V. K. Jain, and R. A. Clark, IEEE Trans. Med. Imaging 14, 565 (1995). W. J. H. Veldkamp, N. Karssemeijer, J. D. M. Otten, and J. H. C. L. Hendricks, Med. Phys. 27, 2600 (2000). Y. Wu, K. Doi, M. L. Giger, and R. M. Nishikawa, Med. Phys. 19, 555 (1992). Y. Wu, M. L. Giger, K. Doi, C. J. Vyborny, and R. A. Schmidt, Radiology 187, 81 (1993). H. P. Chan, S. C. B. Lo, B. Sahiner, K. L. Lam, and M. A. Helvie, Med. Phys. 22, 1555 (1995). W. Zhang, K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, Med. Phys. 23, 595 (1996). B. Zheng, W. Qian, and L. P. Clarke, IEEE Trans. Med. Imaging 15, 589 (1996). S. Yu L. Guan, IEEE Trans. Med. Imaging 19, 115 (2000). H. Cheng, Y. M. Lui, and R. I. Feiimanis, IEEE Trans. Med. Imaging 17, 442 (1998). M. A. Gavrielides, J. Y. Lo, R. Vargas-Voracek, and C. E. FloydJr., Med. Phys. 27, 13 (2000). P. K. Saha, J. K. Udupa, E. F. Conant, D. P. Chakraborty, and D. Sullivan, IEEE Trans. Med. Imaging 20, 792 (2001).
92
´ ODO ET AL. ARNE
364. H. Yoshida, K. Doi, R. M. Nishikawa, K. Muto, and M. Tsuda, Acad. Rep. Tokyo Inst. Polytech. 17, 24 (1994). 365. H. Yoshida, R. M. Nishikawa, M. L. Giger, and K. Doi, Proc. SPIE. 2825, 805 (1996). 366. H. Yoshida, K. Doi, R. M. Nishikawa, M. L. Giger, and R. A. Schmidt, Acad. Radiol. 3, 621 (1996). 367. R. N. Strickland and H. I. I. Hahn, IEEE Trans. Med. Imaging 15, 218 (1996). 368. W. Zhang, H. Yoshida, R. M. Nishikawa, and K. Doi, Med. Phys. 25, 949 (1998). 369. J. M. Lado, P. G. Tahoces, A. J. Mendez, M. Souto, and J. J. Vidal, Med. Phys. 26, 1294 (1999). 370. T. Netsch and H. O. Peitgen, IEEE Trans. Med. Imaging 18, 774 (1999). 371. W. Qian, L. Li, X. Sun, and R. A. Clark, in Wavelet Applications in Signal and Image Processing VIII, SPIE Conference Proceedings, edited by A. Aldroubi, A. F. Laine, and M. A. Unser , p. 596–604 (2000). 372. J. K. Park and H. W. Kim, IEEE Trans. Med. Imaging 18, 231 (1999). 373. S. K. Lee, C. S. Lo, C. M. Wang, P. C. Chung, C. I. Chang, C. W. Yang, and P. C. Hsu, Int. J. Med. Informatics 60, 29 (2000). 374. C. B. Caldwell, S. J. Stapleton, D. W. Holdsworth, R. A. Jong, W. J. Weiser, G. Cooke, and M. J. Yaffe, Phys. Med. Biol. 35, 235 (1990). 375. F. Lefebvre, H. Benali, R. Gilles, E. Kahn, and R. Di Paola, Med. Phys. 22, 381 (1995). 376. D. L. Thiele, C. Kimme-Smith, T. D. Johnson, M. McCombs, and L. W. Bassett, Med. Phys. 23, 549 (1996). 377. H. Guillemet, H. Benali, E. Kahn, and R. Di Paola, Acta Stereol. 15/2, 125 (1996). 378. V. Velanovich, Am. J. Med. 311, 211 (1996). 379. J. J. Heine, S. R. Deans, R. P. Velthuizen, and L. P. Clarke, Med. Phys. 26, 2254 (1999). 380. M. Heath, K. W. Bowyer, and D. Kopanset al., in Digital Mammography, (Kluwer Academic, Dordrecht, 1998), p. 457. 381. J. N. Wolfe, Cancer. 37, 2486 (1976). 382. A. M. Boyd and N. F. Oza, Epidemiol. Rev. 15, 196 (1993). 383. N. F. Boyd, J. W. Byng, R. A. Jong, E. K. Fishell, L. E. Little, A. B. Miller, G. A. Lockwood, D. L. Tritchler, and M. J. Yaffe, J. Natl. 
Cancer Inst. 87, 670 (1995). 384. J. J. Heine and R. P. Velthuizen, Med. Phys. 27, 2644 (2000). 385. Universality in Chaos, Universality in Chaos, edited by P. Cvitanovic (Hilger, Bristol, 1984). 386. Chaos, Chaos, edited by B. L. Hao (World Scientific, Singapore, 1984). 387. H. G. Schuster, Deterministic Chaos (Physik Verlag, Weimheim, 1984). 388. P. Berge´, Y. Pomeau, and C. Vidal, Order within Chaos (Wiley, New York, 1986).
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 126
An Analysis of the Geometric Distortions Produced by Median and Related Image Processing Filters
E. R. DAVIES
Machine Vision Group, Department of Physics, Royal Holloway College, University of London, Egham, Surrey, TW20 0EX, United Kingdom
I. Introduction
II. Image Filters
    A. Noise Suppression Filters
    B. Mode Filters
    C. Morphological Filters
    D. In-Depth Study of Median Filters
III. Shifts Produced by Median Filters in Continuous Images
    A. Theory of Edge Shifts Produced by Median Filters in Continuous Binary Images
    B. Extension to Continuous Gray-Scale Images
    C. Extension to Discrete Neighborhoods
    D. Experimental Results for Discrete Binary Images
    E. Experimental Results for Discrete Gray-Scale Images
    F. Edge Shifts Arising with Hybrid Median Filters
IV. Shifts Produced by Median Filters in Digital Images
    A. Using a Discrete Model to Explain Median Shifts
    B. Theoretical Shifts for a 3 × 3 Neighborhood
    C. More General Calculation of Edge Shifts
    D. Experimental Results for a 3 × 3 Neighborhood
    E. Numerical Computations for 5 × 5 Neighborhoods
    F. Numerical Computations for 7 × 7 Neighborhoods
    G. Tests of the Theory for 5 × 5 and 7 × 7 Neighborhoods
    H. Discussion
    I. Trends for Large Neighborhoods
    J. Effect of Sampling at the Center of a Pixel
    K. Case of Median Filter with Small Circles
V. Shifts Produced by Mean Filters
    A. Shifts for Step Edges
    B. Shifts for Linear Slant Edges
    C. Discussion
VI. Shifts Produced by Mode Filters
    A. Shifts for Step Edges
    B. Shifts for Slant Edges
    C. Discussion
    D. Case of Mode Filter with Small Circles
VII. Shifts Produced by Rank-Order Filters
    A. Shifts in Rectangular Neighborhoods
    B. Shifts in Circular Neighborhoods
    C. Case of High Curvature
    D. Test of the Model in a Discrete Case
    E. Mean Distance from Center of Neighborhood to a Tangent Line
    F. Discussion
VIII. Rank-Order Filters: a Didactic Example
    A. Analysis of the Situation
    B. Discussion
IX. A Problem with Closing
    A. Detailed Analysis
    B. Discussion
X. A Median-Based Corner Detector
    A. Analyzing the Operation of the Median Detector
    B. Practical Results
XI. Boundary Length Measurement Problem
    A. Detailed Analysis
    B. Discussion
XII. Concluding Remarks
References

Copyright 2003 Elsevier Science (USA). All rights reserved. ISSN 1076-5670/03
. . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . .
160 163 165 168 169 170 170 173 174 175 177 178 179 181 183 184 187 188 190
I. Introduction

Over the past forty years or so, image processing has become a key science that is applied in a great many areas, ranging from photography, cinematography, and television to space and forensic science, medicine, and even the recovery of ancient manuscripts. The general idea of using image processing in these areas is to convert one image into another, with the aim of improving or modifying the output in some way. One example is the elimination of noise from images. Another is the enhancement of images, as in the case of chest radiographs, which might be expected to become easier for a clinician to use for diagnosis. Clearly, modification of images so that they may be viewed and interpreted more easily by human operators constitutes one class of image processing: this might have been expected, and “improvement” is very much a subjective term that pertains naturally to human judgment, though perhaps not so readily to scientific analysis. A further example of image processing is the restoration of images to some ideal form that they would have had before transmission over some degrading medium. In the case of television, degradation can involve speckling and ghosting, as well as inclusion of the sometimes annoying scan lines that are characteristic of this originally analogue process. In some applications, degradation can take the form of blurring caused by a
GEOMETRIC DISTORTIONS PRODUCED BY IMAGE PROCESSING FILTERS
turbulent atmosphere, by motion of an object in the scene being viewed, or by motion of the camera. But there are totally different uses of image processing. Images can be processed to locate particular objects as part of a more general process of image analysis. Here we are less interested in retaining the image representation as such, and far more interested in describing or measuring the content of the images. Indeed, if a suitably detailed description of the image content can be obtained, the image data and format may be thrown away. This happens when a human is driving a car down a motorway, and when a robot vision system is guiding a missile by analyzing a sequence of received images and deducing exactly where the target is. Less mundane tasks involving image analysis include automated visual inspection, where defective products or contaminants have to be located with a view to rejection. There are cogent reasons why locating objects in images is computation intensive. In particular, any object template will have to be applied at every location within an image. It will also have to be applied in all directions and perhaps at many scales. Furthermore, any variations in shade, color, texture, or other characteristics may necessitate many more tests, and the multidimensionality of the search process means that it will involve a combinatorial explosion of possibilities. As a result, it is generally far easier to locate objects from their features, as small features are much easier to locate, not least because they are subject to fewer variations including fewer distinguishable orientations. Proceeding in this way is not without its own problems, as locating the features does not then uniquely locate the objects. In fact, the presence of the objects has to be inferred, and special algorithms are needed to carry out this process—though that is another story (Davies, 1997a). Here the important point is that features have to be located in digital images. 
Algorithms that perform this sort of task are called filters, as their action is analogous to that of sieves in sorting the various parts of the image and extracting only the relevant parts, in this case the features that are needed for object recognition. Interestingly, the algorithms that are needed for performing many of the other image processing functions mentioned previously are also called filters, typically because they remove noise, scan lines, and speckle, or because they filter out blur or even act so as to locate the most meaningful parts of the image, thereby enhancing it. In the next section we enquire in more detail how the various filtering operations are carried out and how filters may be designed for specific tasks. Meanwhile we note that many of these operations are carried out by well-known types of filter: mean and Gaussian filters, median filters, mode filters, and the rather large class of morphological filters. Several of these filters have been in wide use for decades, and it might be thought that the
E. R. DAVIES
subject of image processing would by now have reached a state of maturity such that the properties of these sorts of filter would be well known and fully documented. However, it has become increasingly clear for some years that although capable of performing certain image processing tasks extremely effectively, several of these filters also inadvertently distort the images they are processing. Furthermore, the extent of these distortions is often not known accurately and may not even be suspected by workers. As a result, it may occasionally be the case (especially when about to make specific measurements from images as part of inspection or related processes) that the best advice to those about to employ image filters is ‘‘Don’t!’’ This article is aimed at the analysis and elucidation of the various distorting processes that occur when applying common image processing filters. The median filter has a special position in the hierarchy, as it is widely used—particularly as it does not blur images in the way that mean or Gaussian filters are known to—and a whole variety of other filters are derived from or closely related to it, so that they tend to have similar properties. Thus the distortions exhibited by median filters apply in one form or another to many other types of filter. In Section II we describe some of the standard image processing filters, and present their basic properties. Then in Section III we consider the origins of the distortions produced by median filters, and make quantitative estimates of them on the assumption of a continuous analogue image space. In Section IV we proceed to extend these estimates by making due allowance for the discrete nature of the digital lattice within the image space. In Section V we apply the continuum model to the mean filter, and in Section VI we apply it to the mode filter. 
Then in Section VII we generalize the continuum model to cover all types of rank-order filter, with the median filter and certain morphological filters as special cases. This work is extended in Section VIII with a didactic example relating to rank-order filters, in Section IX by study of a morphological problem, in Section X by discussion of a median-based corner detector, and in Section XI by consideration of a boundary length measurement problem. Section XII presents some concluding remarks about a whole range of problems relating to continuous versus discrete representations.
II. Image Filters

Perhaps the simplest type of filter is the convolution. In fact, the convolution is the most general spatially invariant linear filter that can be applied to an image. It has many realizations, but all are in the form of a weighted set of mask
coefficients that are multiplied by the corresponding pixel intensities within the filter support region (commonly known as the “neighborhood”), the products then being added to produce the final local output value. Not only does this paradigm include spatial matched filtering (a well-known procedure for enhancing desired signals of any known intensity profile), but it also includes local spatial averaging, which smooths the image locally, thereby helping to eliminate noise (Davies, 1997a). Other convolution operations include edge and feature enhancement, thus leading to the possibility of performing edge and feature detection. Although convolution filters form a very important class of filter, they are limited by their very linearity. Nonlinear filters are even more powerful as they embody no such restriction, though arguably this makes them more difficult to design as considerably more freedom is available to the algorithm designer. Nevertheless, a simple technique exists for creating useful nonlinear filters. This is to employ a two-stage procedure in which (1) a convolution is applied to enhance some particular set of features in an image, and (2) the features are located by a process such as thresholding or nonmaximum suppression, or by a combination of these, or by more sophisticated measures. The important point is that the features are “detected” only by the final nonlinear process. Indeed, it is the nonlinear process that provides much of the power: it is also the stage that corresponds to a set of decisions being made about where the features are. Edge detection constitutes a typical instance of the use of convolution followed by a nonlinear decision-making process. Line segment detection and corner detection are further instances in which this technique can be applied (Davies, 1997a). Two further approaches to the design of nonlinear filters are in wide use: one is the rank-order filter, typified by the median filter. The other is the morphological filter.
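The two-stage procedure described above can be illustrated with a minimal 1D sketch. The function names, the difference mask [1, 0, -1], the threshold value, and the test signal are all illustrative assumptions rather than anything specified in the text; the point is only that the convolution enhances the feature while the thresholding and nonmaximum suppression make the decision.

```python
def convolve1d(signal, kernel):
    """True 1D convolution (kernel flipped); border outputs are left at zero."""
    half = len(kernel) // 2
    out = [0] * len(signal)
    for i in range(half, len(signal) - half):
        out[i] = sum(kernel[k] * signal[i + half - k] for k in range(len(kernel)))
    return out


def detect_features(signal, kernel, threshold):
    """Stage 1: linear enhancement. Stage 2: threshold plus nonmaximum suppression."""
    response = convolve1d(signal, kernel)
    hits = []
    for i in range(1, len(response) - 1):
        r = abs(response[i])
        is_peak = r >= abs(response[i - 1]) and r > abs(response[i + 1])
        if r >= threshold and is_peak:
            hits.append(i)
    return hits


# Hypothetical blurred step edge; the mask [1, 0, -1] yields s[i+1] - s[i-1].
signal = [0, 0, 1, 5, 9, 10, 10]
response = convolve1d(signal, [1, 0, -1])
edges = detect_features(signal, kernel=[1, 0, -1], threshold=4)
print(response, edges)
```

The enhanced response is spread over several samples, and it is only the nonlinear second stage that commits to a single edge position.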
The principle employed in rank-order filters is to take all the intensity values in a given neighborhood, to place these in order of increasing value, and finally to select the rth of the n values and return this value as the local output of the filter. Clearly, n rank-order filters can be specified in terms of the value r that is used, but these filters all have the characteristic that they are intrinsically nonlinear, i.e., the output intensity cannot be expressed as a linear sum of the component intensities within the neighborhood. In particular, the median filter (for which r = (n + 1)/2, and which is defined only if n is odd¹) does not normally give the same output image as a mean filter: indeed, it is well known that the mean and median of a distribution are in general only coincident for symmetrical distributions. Note that minimum and maximum filters (corresponding to r = 1 and r = n respectively) are also often classed as morphological filters (see below and Section II.C).

¹ If n is even, it is usual to take the mean of the central two values in the distribution as representing the median.

Morphological filters constitute another large class of nonlinear filters. Originally, the basic concept was to analyse and filter object shapes in binary images. However, the mathematical foundation of the subject has been considerably developed in recent years (Haralick et al., 1987, being a landmark paper), and currently the aim is to analyse both intensity variations and shapes in tandem, the mathematics being necessary to understand in a profound way the possible shapes and intensity patterns, how they are related to each other, and how they may be processed to derive further shapes and intensity patterns.
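The rank-order selection rule described above (sort the n neighborhood values and return the rth) can be sketched in a few lines. The function name and the sample neighborhood are hypothetical; the sample is deliberately asymmetric so that the median and the mean differ.

```python
def rank_order(values, r):
    """Return the r-th smallest of the n values, with r counted from 1."""
    ordered = sorted(values)
    if not 1 <= r <= len(ordered):
        raise ValueError("r must lie between 1 and n")
    return ordered[r - 1]


# Hypothetical 3 x 3 neighborhood (n = 9) containing one outlier.
window = [3, 1, 4, 1, 5, 9, 2, 6, 40]
n = len(window)

minimum = rank_order(window, 1)            # r = 1
median = rank_order(window, (n + 1) // 2)  # r = (n + 1)/2, n odd
maximum = rank_order(window, n)            # r = n
mean = sum(window) / n

print(minimum, median, maximum, round(mean, 2))
```

Here the single outlier pulls the mean well above the median, which is the asymmetric-distribution case mentioned in the text.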
A. Noise Suppression Filters

In this section we consider the nature of the noise removal process, and how an ideal noise removal filter might be constructed. A priori, a good way of removing noise from images is to average a number of nominally identical images. In particular, if a camera is pointing at a still-life scene, averaging a sequence of off-camera images will gradually lead to the signal-to-noise ratio (SNR) being boosted, and by the time N images have been averaged, the SNR will be √N times larger (the total signal will be multiplied by N, and the noise power will be multiplied by N, so that the noise amplitude is multiplied by √N; hence the SNR will be multiplied by N/√N, which equals √N). The problem with this approach is that there must be no camera shake and no motion in the scene. To overcome this problem, we can approximate by averaging locally within a single image, though this introduces a further problem: the image will become blurred (Davies, 1997a). The blurring problem is also manifest if we consider the action of the filter in the spatial frequency domain, where removal of high spatial frequency noise also suppresses the rapidly varying signal components, and ultimately this must introduce blurring. In fact, optimal low-pass noise suppression filters need to employ mathematically well-behaved functions that are smoothly varying both in the spatial domain and in the spatial frequency domain, and in this sense the Gaussian smoothing kernel is optimal. However, whether the Gaussian smoothing filter or the mean (simple local averaging) filter is used, blurring is bound to occur and the underlying signal will not be exactly preserved. Fortunately, the median filter largely overcomes this problem. At the same time it has quite different noise suppression characteristics. In particular, it preferentially eliminates the outliers in any distribution, and thus has excellent impulse noise elimination characteristics. The validity of
this statement follows from the fact that the median is the center value of the local intensity distribution. This means that it can be considered to work by eliminating the two most extreme intensities at the lower and upper ends of the distribution, and also the next two, and the next two; and so on until only the central one remains. To understand how the median filter can remove noise without blurring the image, note that any monotonically increasing (or decreasing) function of intensity is unaffected by a median operation. More precisely, if an intensity function is monotonically increasing over the whole neighborhood of a median filter, the median operation will leave it unchanged. This one-dimensional (1D) property is found to extend unchanged to two-dimensional (2D) images. A further observation about the median is that although it is commonly considered not to blur images, it nonetheless appears to “soften” them. Although this is a subjective impression, a possible scientific explanation is the following: the median will not affect any monotonically increasing signal that increases right across the whole median filter neighborhood, so there is clearly no blurring at this level. However, if any “texture” appears within the neighborhood, i.e., any high-frequency signal components, these will appear like noise and will be suppressed. This will result in a rather “flat” set of image regions, and in this sense the image will appear “softened.” So large-scale edges will be unaffected but any fine alternating edges will be smoothed out of existence. Indeed, there is a general tendency for signals that have been processed by a median filter to have “runs” of pixels of identical intensity. This is illustrated by the following case in which all alternating components have been eliminated by processing with a three-element 1D median filter²:

    input:     00010111212235445565
    output:    00001111122234445555

In a digital lattice of pixels, the median filter holds several surprises, a basic one being that an exactly alternating signal will repeatedly be inverted by multiple application of a three-element median filter:

    input:     01010101010101010101
    output 1:  00101010101010101011
    output 2:  00010101010101010111

² In this and subsequent examples, the problem of neighborhoods that are partially outside the available input region is handled by assuming that the next element is equal to the immediately adjacent element within the neighborhood, a commonly used procedure.
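The two 1D examples above can be reproduced with a short sketch of a three-element median filter that uses the border rule of footnote 2 (the missing element is taken equal to the adjacent one). The helper names are illustrative only.

```python
def median3(signal):
    """Three-element 1D median; each end is padded by repeating the edge value."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [sorted(padded[i:i + 3])[1] for i in range(len(signal))]


def digits(text):
    """Turn a string of digits into a list of integer intensities."""
    return [int(ch) for ch in text]


def as_text(values):
    """Turn a list of single-digit intensities back into a string."""
    return "".join(str(v) for v in values)


smoothed = as_text(median3(digits("00010111212235445565")))
once = as_text(median3(digits("01010101010101010101")))
twice = as_text(median3(digits(once)))

print(smoothed)
print(once)
print(twice)
```

Running the sketch gives the output sequences quoted in the text; in the alternating case the border rule pins the two ends, and each pass grows the constant runs inward by one pixel while inverting the interior.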
Perhaps more relevant to the main thrust of this article is the fact that not only do median filters eliminate impulse noise highly effectively, but in the process they sometimes introduce shifts, as indicated by the following rather simple 1D example:

    input:     00000001011111111111
    output:    00000000111111111111
In 2D, this would clearly cause a bump to appear on the boundary, adjacent to the position of the original noise spike. This effect has caused worries in the image processing community (Yang and Huang, 1981; Bovik et al., 1987), these worries being exacerbated by the difficulty of mathematically analyzing the properties of the highly nonlinear median operation. However, in this article, we are less worried by the possibility of shifts caused by noise than those that are intrinsic to the median filter. The following example serves to illustrate the problem, and at the same time acts as an existence theorem for the work described in subsequent sections of this article:

\[
\begin{matrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{matrix}
\quad \longrightarrow \quad
\begin{matrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{matrix}
\]
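The corner example above can be checked with a minimal sketch of a 3 × 3 median filter. Two things are assumptions here: the arrays are read as 5 rows of 7 pixels, and pixels outside the array are taken equal to the nearest pixel inside it (the 2D analogue of the border rule used for the 1D examples). The function name is illustrative.

```python
def median3x3(image):
    """3 x 3 median filter; pixels outside the image repeat the nearest edge pixel."""
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so that out-of-range positions map to the border.
        return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = sorted(pixel(r + dr, c + dc)
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            row.append(window[4])  # fifth of nine values is the median
        out.append(row)
    return out


corner = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
filtered = median3x3(corner)
changed = [(r, c) for r in range(5) for c in range(7)
           if filtered[r][c] != corner[r][c]]
print(changed)
```

Only the corner pixel changes: its neighborhood holds four 1s and five 0s, so the median is 0, whereas every pixel along the straight parts of the boundary sees at least six values equal to its own.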
Interestingly, this leads to an idea for a median-based corner detector, which is discussed in more detail in Section X (see also Davies, 1997a). A final point worth noting is that in situations of unvarying underlying signal, it is mathematically provable (Davies, 1997a) that the mean is optimal for coping with Gaussian noise, and that the median is optimal for coping with double exponential noise (i.e., noise subject to a distribution of the form exp(−|r|)). As impulse noise is expected to appear in the wings of the distribution rather than near its center, and the double exponential type of distribution has wider wings than a Gaussian, this again helps to clarify the improved capability of the median filter for coping with impulse noise.
B. Mode Filters

Although the discussion in the previous section suggests that the median filter will generally be superior to the mean filter, it also leads to the question of whether the mode filter might have even better characteristics. Indeed, an obvious argument is that the mode is the highest point of any
distribution and represents the most probable value in that distribution. This seems to imply that the mode filter ought to be closer to optimal than the median filter. In fact, tests of this idea (Davies, 1988a) showed that the mode filter is closer to optimal, but only in a particular sense: that the mode is the most likely intensity value of the signal within a neighborhood, but replacing the value at the center of the neighborhood by the mode value tends to widen intensity plateaus in image space. On the other hand, if two intensity plateaus, one dark and one light, are adjacent to each other, the tendency to widen will tend to make the border between them narrower. Clearly, the mode filter is not acting as a pure noise removal filter, and the best way of describing its operation is as an enhancement operator, which tends to make edges crisper by widening adjacent plateau regions. These arguments show that simple analyses based on verbal descriptions are insufficient, as they tend to lead to the idea that the mode filter will generally be superior to the median filter, whereas these filters are actually quite different in what they are able to do. We also note in passing that the mode filter will generally be quite good at eliminating impulse noise, assuming the mode is not in the wings of the local intensity distribution. In fact, it will not be optimal in this respect: on the other hand, the median is optimal in the sense that it can cope with 50 % impulse noise at both ends of the distribution before it starts giving totally erroneous results. For further in-depth analysis of, and interesting insights into, the properties of mode filters, see Griffin (2000). Before leaving this topic, some remarks about the implementation and use of mode filters are appropriate. First, the mode filter has been used relatively little in image processing (see, for example, Coleman and Andrews, 1979; Davies, 1992c; Evans and Nixon, 1995; Griffin, 1997). 
Part of the reason for this must reside in the fact that the filter is difficult to implement because the sparsity of the local intensity distribution in a small neighborhood makes it difficult to define the mode: indeed, the obvious mode—the highest point in the distribution—may not be statistically significant. Hence special algorithms are required to locate the most accurate ‘‘underlying’’ mode. This problem was tackled by Davies (1988a), and a reasonably accurate and effective solution was obtained by using the median to locate the part of the local intensity distribution corresponding to any minor mode, which could then be eliminated by truncating the distribution. In fact, all the tests on mode filters described in Section VI relate to application of this ‘‘truncated median’’ filter: however, the predicted properties of the mode, such as capability for enhancement and moderate resistance to noise, have been found to apply to the truncated median filter (Davies, 1988a).
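As a rough illustration of the truncation idea just described, the sketch below shows one plausible reading of it; it is not claimed to be the algorithm of Davies (1988a). The assumptions are that the median lies between the mode and the longer tail of the local distribution, that the longer tail is cut back until the median is equidistant from both ends, and that the median of what remains is taken as the estimate of the underlying mode. The function names, the use of the lower median, and the `passes` parameter are all hypothetical.

```python
def lower_median(values):
    """Median of the values, taking the lower of the two central ones if n is even."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]


def truncated_median(values, passes=1):
    """Cut back the longer tail of the distribution, then take the median again."""
    kept = sorted(values)
    for _ in range(passes):
        med = lower_median(kept)
        lower_span = med - kept[0]
        upper_span = kept[-1] - med
        if lower_span == upper_span:
            break  # already symmetric about the median; nothing to truncate
        if upper_span > lower_span:
            kept = [v for v in kept if v <= med + lower_span]
        else:
            kept = [v for v in kept if v >= med - upper_span]
    return lower_median(kept)


# Hypothetical neighborhood: main mode at 1, minor mode at 9.
window = [1, 1, 1, 1, 2, 2, 3, 9, 9]
print(lower_median(window), truncated_median(window))
```

In this example the plain median is 2, whereas truncating the upper tail (which holds the minor mode at 9) moves the estimate to 1, the position of the main mode.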
Figure 1. Effect of applying a morphological erosion operation to a binary object. The region inside the outer boundary is the original object. The inner dark shaded regions constitute the processed image: note that the erosion operation has broken the single original object into two smaller objects.
C. Morphological Filters

The early morphological filters, which operate on binary images, include two particularly important operations: erosion and dilation. In fact, these operations have isotropic and directional variants, but here we shall concentrate on the former. The properties of these operations are defined in terms of structuring elements,³ which in the isotropic case amount to circles of specified radius b: applying such a circle as a structuring element for erosion leads to all objects in the image being eroded in all directions through the distance b (Fig. 1). Naturally, this means that at some points on the boundary, where the object is quite thin, the object will be eroded away completely: this may lead to the object being broken into several parts. Similarly, when an object is dilated using the same circular structuring element B, it will be expanded in all directions through the distance b, and this may mean that the object will become joined to another part of itself or to another object; it can also mean that any holes or inlets are filled in. Thus erosion and dilation can cause quite serious modifications to object shapes. However, morphology takes these ideas further in that it permits such operations to be used in sequence and combined into more useful operations (Haralick et al., 1987; Bangham and Marshall, 1998). One such operation is closing; another is opening. Closing is defined as dilation by B followed by erosion by B, and opening employs the same two operations in the reverse order. The general tendency is for such well-matched operations to cancel out, the reason being that expanding an object in all directions through a distance b and then contracting it through the same distance should be a null operation. However, although it is null for a shape such as a large circle, for a C shape the ends of the C might become joined by the dilation (or the C could even be filled in), and once this has happened the erosion will be unable to reverse the situation. We can summarize the overall situation as follows:

\[ (A \oplus B) \ominus B \supseteq A \supseteq (A \ominus B) \oplus B \tag{1} \]

where A is the original shape, and ⊕ and ⊖ represent, respectively, dilation and erosion; or alternatively, expressing the combined operations as closing (•) and opening (∘):

\[ A \bullet B \supseteq A \supseteq A \circ B \tag{2} \]

³ In morphology the modifying element is known as a “structuring element”: the term “kernel” is reserved for convolution.
When the opening operation is applied to an image containing convex objects, all objects that are narrower than a certain critical width will disappear, and the remainder will be returned to something approaching their original size and shape. Thus the operator can be envisaged as a filter that filters objects by size, eliminating the small ones. This type of operation can be repeated with opening operations using circular structuring elements of various radii b. This makes it clear that shapes with any range of widths can be eliminated, and any others retained, as long as additional image subtraction operations are permitted (in fact, the required operations are set subtractions in the image space). Clearly, then, morphological operations can be used for filtering. So far we have investigated morphological operations only in binary images. However, there are several ways in which these operations and results can be extended to gray-scale images: perhaps the most obvious is the case when a structuring element B is applied independently to all gray levels of a gray-scale image. In that case it may be shown that dilation corresponds to a local maximum operation, and erosion to a local minimum operation, in each case using a neighborhood of size equal to that of B. Next, it is of some interest that gray-scale edge detection algorithms can be designed by methods such as subtraction of eroded gray-scale images from dilated gray-scale images, the latter being generally larger if image intensity is measured in the positive z-direction:

\[ E = (A \oplus B) - (A \ominus B) \tag{3} \]
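The gray-scale case can be made concrete with a small 1D sketch in which dilation is a local maximum and erosion a local minimum over a flat, symmetric window. The window half-width, the truncation of the window at the ends of the signal, the function names, and the sample signal are all illustrative assumptions.

```python
def dilate(signal, half_width):
    """Gray-scale dilation: local maximum over a flat, symmetric window."""
    n = len(signal)
    return [max(signal[max(i - half_width, 0):min(i + half_width + 1, n)])
            for i in range(n)]


def erode(signal, half_width):
    """Gray-scale erosion: local minimum over the same window."""
    n = len(signal)
    return [min(signal[max(i - half_width, 0):min(i + half_width + 1, n)])
            for i in range(n)]


def closing(signal, half_width):
    return erode(dilate(signal, half_width), half_width)


def opening(signal, half_width):
    return dilate(erode(signal, half_width), half_width)


def edge_strength(signal, half_width):
    """Dilated signal minus eroded signal, as in Eq. (3)."""
    return [d - e for d, e in zip(dilate(signal, half_width),
                                  erode(signal, half_width))]


A = [0, 0, 5, 0, 0, 3, 3, 3, 0, 0]   # hypothetical profile: narrow spike, wider plateau
opened = opening(A, 1)
closed = closing(A, 1)
E = edge_strength(A, 1)
print(opened, closed, E, sep="\n")
```

With this signal the opening removes the one-pixel spike but keeps the three-pixel plateau, the closing fills the two-pixel gap between them, and pointwise the closed signal is never below A while the opened signal is never above it, which is the ordering expressed by Eqs. (1) and (2).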
It will also be of interest that there is a whole science of image processing that is based on morphological set operations instead of convolution operations, and that these are very much dual approaches that are able to solve similar sorts of problem, each in its own way, albeit with characteristics that are minutely different. Optimality depends on the individual problem and the criteria that are adopted for judging optimality. Note that the important theorem about matched filtering giving the optimum SNR under conditions of white noise (Davies, 1993) will not be valid for morphological set operations, and it is unknown what alternative
analogous form could apply for such operations. Thus it seems that many feature detectors would be closer to optimal when based on convolution rather than morphology. On the other hand, it is always useful to have alternative tools and algorithms, not least so that increased adaptability can be achieved as the data and the task demand. There are many other morphological filters that involve combinations of these basic operations. Further morphological filters may be designed by generalizing the ways in which gray-scale operations may be made, and by incorporating conditional processes (Haralick and Shapiro, 1992).

D. In-Depth Study of Median Filters

This article cannot do justice to all the sorts of filter that have been developed, noise-suppression, mode, rank-order, morphological, or otherwise. Instead it will be necessary to concentrate on a few (particularly rank-order filters and filters that are able to remove noise from images) and to explore their properties quite closely. Some of the original motivation for this line of work was the observation that median filters often tend to be used more because they are known to exist (to some extent having a well-known name tends to encourage use) than because they provide provably optimal means for solving imaging problems; a further part of the motivation arises because they are known not to cause the blurring associated with mean filters, and this leads to the supposition that they do not shift edges, even though quite simple models (see Section II.A) show that this cannot be the case. Accordingly, further in-depth investigation of the properties of median filters and other associated filters seemed to be called for. In particular, (1) other rank-order filters and (2) other noise removal filters needed to be investigated.
Table 1 indicates in which sections of the article different filters are first described and also where the shifts and distortions they give rise to are analyzed.

TABLE 1
The Filters Dealt with in This Article

Filter                   Described        Shifts Analyzed
Mean filter              Section II.A     Section V
Median filter            Section II.A     Sections III, IV
Mode filter              Section II.B     Section VI
Gaussian filter          Section II.A     Section V
Hybrid median filter     Section III.F    Section III.F
Minimum filter           Section II.C     Section VII
Maximum filter           Section II.C     Section VII
Rank-order filters       Section II       Sections VII, VIII
Morphological filters    Section II.C     Sections VII, IX
III. Shifts Produced by Median Filters in Continuous Images

A. Theory of Edge Shifts Produced by Median Filters in Continuous Binary Images

We start by considering a continuous image (i.e., a nondiscrete lattice), assuming first (1) that the image is binary, (2) that the neighborhoods are exactly circular, and (3) that the image is noise free. To proceed we notice that binary edges have symmetrical cross sections, whereas straight edges extend this symmetry into 2D: hence applying a median filter in a (symmetrical) circular neighborhood cannot pull a straight edge to one side or the other. Now consider what happens when the filter is applied to an edge that is not straight. If, for example, the edge is circular, the local intensity distribution will contain two peaks whose relative sizes will vary with the precise position of the neighborhood (Fig. 2). At some position the sizes of the two peaks will be identical. This happens when the center of the neighborhood is at a unique distance from the center of a circular object: this is the position at which the output of the median filter changes from dark to light (or vice versa). It is clear that the median filter produces an inward shift toward the center of a circular object (or the center of curvature), whether the object is dark on a light background or light on a dark background. Next suppose that the edge is irregular with several “bumps” (i.e., prominences or indentations) within the filter neighborhood: clearly, the filter will now tend to average out the bumps and straighten the edge, since it acts in such a way as to form a boundary on which the amounts of dark and light within the neighborhood are equalized (Fig. 3). This means that the edge will be locally biased but only by a reduced amount, since the various bumps will tend to pull the final edge in opposite directions.
On the other hand, if there is one gross bump within the neighborhood (i.e., if the curvature has the same sign and is roughly constant at all points on the edge within the filter neighborhood), then all these parts of the edge will act in concert and it will be pulled sideways a significant amount by the filter. Thus a circular section of the boundary constitutes a "worst-case" situation, for which the filter produces the largest bias in the position of the edge. It is clearly worth finding the size of the worst-case shift and for this reason we concentrate attention below on circular objects, in
E. R. DAVIES
Figure 2. Variation in local intensity distribution with position of neighborhood. (a) Neighborhood of radius a overlapping a dark circular object of radius b. (b)–(d) Intensity distributions f when the separations of the centers are, respectively, less than, equal to, or greater than the center separation d for which the object bisects the area of the neighborhood. From Davies (1989).
the knowledge that all other shapes will give less serious shifts and distortions. The worst-case calculation is a matter of elementary geometry: we need to find at what distance from the center of a circular object (of radius b) the area of a circular neighborhood (of radius a) is bisected by the object boundary. One way of estimating this is to determine the integrated area within the neighborhood that lies outside the object boundary (Davies, 1989). We start by taking the boundary as being a circle centered at (b, 0) and passing through the origin (Fig. 4):
Figure 3. Edge smoothing property of the median filter. (a) Original 128 × 128 pixel image with 6-bit gray scale. (b) Effect of median filter smoothing of irregularities, in particular those around the boundaries, using a 21-element filter operating within a 5 × 5 neighborhood. Notice how the threads on the screws are virtually eliminated, although detail larger in scale than half the filter area is preserved. (c) Effect of 2LH+ "detail-preserving" filter. From Davies (1989).
$$(x - b)^2 + y^2 = b^2 \qquad (4)$$
The integrated area outside the boundary is now
$$A = \frac{1}{2}\pi a^2 + \int_{-a}^{a} x\,dy = \frac{1}{2}\pi a^2 + \int_{-a}^{a}\left[b - (b^2 - y^2)^{1/2}\right]dy \qquad (5)$$
whereas that inside the boundary is $\pi a^2 - A$. To equalize these areas we need to institute a boundary shift of $D$ (though this is necessarily an
Figure 4. Geometry for calculating edge shifts. The large circle (radius b) is the boundary of the object. The small circle (radius a) is the neighborhood. The shaded portion corresponds to the integral in Eq. (5).
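approximation, as the shift will have a small effect on the intersection points, so the integration limits will no longer be exactly $-a$ and $a$). This gives the following equation for $D$:
$$A - 2aD \approx (\pi a^2 - A) + 2aD \qquad (6)$$
$$\therefore\quad 2aD \approx A - \frac{1}{2}\pi a^2 \qquad (7)$$
giving
$$D \approx \frac{1}{2a}\int_{-a}^{a}\left[b - (b^2 - y^2)^{1/2}\right]dy \qquad (8)$$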
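To perform the integration, make the substitution
$$y = b\sin\theta \qquad (9)$$
This leads to the formula
$$D \approx b - \frac{b}{2}\left(1 - a^2/b^2\right)^{1/2} - \frac{b^2}{2a}\sin^{-1}(a/b) \qquad (10)$$
To proceed, we use the power series expansion of the inverse sine function:
$$\sin^{-1}\alpha = \alpha + \frac{1}{6}\alpha^3 + \frac{3}{40}\alpha^5 + \cdots \qquad (11)$$
whence
$$D \approx \frac{a^2}{6b} + \frac{a^4}{40b^3} \qquad (12)$$
or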
Figure 5. Geometry for calculating neighborhood and object overlap. From Davies (1989).
$$D \approx \frac{1}{6}\kappa a^2 + \frac{1}{40}\kappa^3 a^4 \qquad (13)$$
$\kappa = 1/b$ being the local curvature. This equation simplifies to the following form for low values of $\kappa$:
$$D \approx \frac{1}{6}\kappa a^2 \qquad (14)$$
Although Eq. (14) is often quite useful, Eq. (13) is especially valuable as it turns out to be far more accurate over the range $0 \le a \le b$ than might be expected (see below). However, to find the exact situation, we need a more rigorous theory. This is attained as follows (Davies, 1989). From Figure 5 the area of the sector of angle $2\beta$ is $\beta b^2$, whereas the area of the triangle of angle $2\beta$ is $b^2\sin\beta\cos\beta$. Hence the area of the segment shown shaded is
$$B = b^2(\beta - \sin\beta\cos\beta) \qquad (15)$$
Making a similar calculation of the area $A$ of a circular segment of radius $a$ and angle $2\alpha$, the area of overlap (Fig. 5) between the circular neighborhood of radius $a$ and the circular object of radius $b$ may be deduced as
$$C = A + B \qquad (16)$$
For a median filter this is equal to $\pi a^2/2$. Hence
$$F = a^2(\alpha - \sin\alpha\cos\alpha) + b^2(\beta - \sin\beta\cos\beta) - \pi a^2/2 = 0 \qquad (17)$$
where
$$a^2 = b^2 + d^2 - 2bd\cos\beta \qquad (18)$$
and
$$b^2 = a^2 + d^2 - 2ad\cos\alpha \qquad (19)$$
To solve this set of equations, we take a given value of $d$, deduce values of $\alpha$ and $\beta$, calculate the value of $F$, and then adjust the value of $d$ until $F = 0$. Since $d$ is the modified value of $b$ obtained after filtering, the shift produced by the filtering process is
$$D = b - d \qquad (20)$$
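The numerical procedure just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used by Davies (1989): the function names, the bisection strategy, and the bracketing interval for $d$ are assumptions introduced here. It evaluates the closed form of Eq. (10), the series of Eq. (13), and the exact shift obtained by driving $F$ of Eq. (17) to zero.

```python
import math


def shift_closed_form(a, b):
    """Approximate edge shift from Eq. (10); meaningful only for a <= b."""
    root = math.sqrt(1.0 - (a / b) ** 2)
    return b - 0.5 * b * root - (b * b / (2.0 * a)) * math.asin(a / b)


def shift_series(a, b):
    """Two-term series of Eq. (13), with curvature kappa = 1/b."""
    kappa = 1.0 / b
    return kappa * a ** 2 / 6.0 + kappa ** 3 * a ** 4 / 40.0


def overlap_excess(a, b, d):
    """F of Eq. (17): overlap of the two circles minus half the neighborhood area."""
    cos_alpha = (a * a + d * d - b * b) / (2.0 * a * d)  # rearranged Eq. (19)
    cos_beta = (b * b + d * d - a * a) / (2.0 * b * d)   # rearranged Eq. (18)
    alpha = math.acos(max(-1.0, min(1.0, cos_alpha)))
    beta = math.acos(max(-1.0, min(1.0, cos_beta)))
    seg_a = a * a * (alpha - math.sin(alpha) * math.cos(alpha))
    seg_b = b * b * (beta - math.sin(beta) * math.cos(beta))
    return seg_a + seg_b - math.pi * a * a / 2.0


def shift_exact(a, b, iterations=100):
    """Solve F(d) = 0 by bisection (an assumed method) and return D = b - d, Eq. (20)."""
    if a > math.sqrt(2.0) * b:
        raise ValueError("object is eliminated: a exceeds sqrt(2) * b")
    lo = max(abs(b - a), 1e-12)  # one circle just inside the other: F >= 0
    hi = a + b                   # circles just separated: F < 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if overlap_excess(a, b, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return b - 0.5 * (lo + hi)


if __name__ == "__main__":
    b = 1.0
    for a in (0.25, 0.5, 0.75, 1.0):
        print(a, round(shift_series(a, b), 3), round(shift_exact(a, b), 3))
```

With $b = 1$, the series and exact values can be compared against the entries of Table 2 for the same $a/b$; the two agree closely up to about $a = b$ and diverge as $a$ approaches $\sqrt{2}\,b$.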
The results of doing this computation numerically have been found by Davies (1989) and are shown in Table 2. As expected, $D \to 0$ as $b \to \infty$ or as $a \to 0$. Conversely, the shift becomes very large as $a$ first approaches and then exceeds $b$. Note, however, that when $a > \sqrt{2}\,b$ the object is ignored, being small enough to be regarded as irrelevant noise by the filter: beyond this point it has no effect on the final image. The maximum edge shift before the object finally disappears is $(2 - \sqrt{2})b \approx 0.586b$.
B. Extension to Continuous Gray-Scale Images
To extend these results to gray-scale images, first consider the effect of applying a median filter near a smooth step edge in 1D. Here the median filter gives zero shift, since for equal distances from the center to either end of the neighborhood there are equal numbers of higher and lower intensity values and hence equal areas under the corresponding portions of the intensity histogram. Clearly this is always valid where the intensity increases monotonically from one end of the neighborhood to the other, a property first pointed out by Gallagher and Wise (1981) [for later discussions on related "root" (invariance) properties of signals under median filtering, see Fitch et al., 1985; Heinonen and Neuvo, 1987]. Next, it is clear that for 2D images, the situation is again unchanged in the vicinity of a straight edge, since the situation remains highly symmetrical. Hence the median filter gives zero shift, as in the binary case. For curved edges, it again turns out that circular boundaries constitute a worst case that should be considered carefully. However, gray-scale edges are unlike binary edges in that they have finite slope. This means that it is necessary to take account of the exact form of the intensity function within the neighborhood. When boundaries are roughly circular, contours of constant intensity often appear as in Figure 6.
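The 1D zero-shift property for monotonic intensity profiles mentioned above is easy to check directly. The following is a minimal sketch: the helper name `median_filter_1d`, the sample values, and the choice of replicating the end samples at the borders are assumptions made for illustration.

```python
from statistics import median


def median_filter_1d(signal, half_width):
    """Running median with window 2*half_width + 1; end samples are replicated."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - half_width, i + half_width + 1)]
        out.append(median(window))
    return out


# A smooth, monotonically increasing step edge (illustrative values).
ramp_edge = [0, 0, 0, 1, 3, 6, 8, 9, 9, 9]
filtered_edge = median_filter_1d(ramp_edge, 2)

# An isolated impulse, for contrast: it is narrower than half the window.
impulse = [0, 0, 5, 0, 0]
filtered_impulse = median_filter_1d(impulse, 1)
```

Because the window contents of a monotonic signal are already sorted, the median is always the center sample, so the edge is returned unchanged, whereas the impulse is removed.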
To find how a median filter acts we merely need to identify the contour of median intensity (in 2D the median intensity value labels a whole contour), which divides the area of the neighborhood into
TABLE 2
Estimated Edge Shifts for Filtering Circles in a Continuum^a

a/b      D/b      D'/b
0.000    0.000    0.000
0.050    0.000    0.000
0.100    0.002    0.002
0.150    0.004    0.004
0.200    0.007    0.007
0.250    0.011    0.010
0.300    0.015    0.015
0.350    0.021    0.021
0.400    0.027    0.027
0.450    0.035    0.035
0.500    0.043    0.043
0.550    0.053    0.052
0.600    0.063    0.063
0.650    0.075    0.074
0.700    0.088    0.087
0.750    0.102    0.101
0.800    0.117    0.116
0.850    0.133    0.132
0.900    0.151    0.151
0.950    0.171    0.170
1.000    0.192    0.192
1.050    0.214    0.216
1.100    0.238    0.242
1.150    0.264    0.272
1.200    0.292    0.305
1.250    0.321    0.342
1.300    0.353    0.387
1.350    0.387    0.443
1.400    0.423    0.528
1.414    0.433    0.586

^a From Davies (1989).
two equal parts. The geometry of the situation is identical to that already examined in Section III.A: the main difference here is that for every position of the neighborhood, there is a corresponding median contour with its own particular value of shift depending on the curvature. Intriguingly, the formulas already deduced may immediately be applied for calculating the shift for each contour. Figure 6 shows an idealized case in which the contours of constant intensity have similar curvature, so that they are all moved inward by similar amounts. This means that to a first approximation, the edges of the object retain their cross-sectional profile as it becomes smaller.
Figure 6. Contours of constant intensity on the edge of a large circular object, as seen within a small circular neighborhood. From Davies (1989).
We next consider the effects of noise. For simplicity we assume that the noise is additive and of symmetrical (nonskew) intensity distribution: this is valid for Gaussian noise and is also likely to be true for many types of impulse noise. Now recall that the median contour divides the neighborhood into two equal parts. Hence, adding noise of symmetrical intensity distribution will on average not change the area on either side of the original median contour: this means that noise will not on average cause edges to shift any differently as a result of applying the median filter—i.e., the shifts of edges caused by noise or by applying a median filter are, to first order, additive. In particular, noise does not affect the general conclusions presented above concerning the shifts of edges introduced by median filters. Though specific experiments have not been performed in this work to introduce noise and check this result quantitatively, it is generally supported by observations on real images containing noise. This section has generalized the results of Section III.A to cover grayscale images. It has also shown that the effects of noise should not materially affect the conclusion that median filters shift edges inward toward local centers of curvature, the worst-case situation arising for circular objects. Finally, it has shown that straight edges remain unshifted for any symmetrically shaped neighborhood. However, the detailed shift calculations (Section III.A) assumed a circular neighborhood: though this is not general, it will produce an ideal isotropic response that could not be guaranteed for any other shape of neighborhood. This fact justifies our concentration on circular neighborhoods in the above analysis. For neighborhoods of other shapes it seems simplest to confirm experimentally that the theory makes substantially correct predictions of edge shifts.
Figure 7. Edge shift curves for various sizes of neighborhood. This diagram indicates how the edge shift curves would be expected to change as $p$ moves between the two limiting values of 1 and infinity: (a) $p = 1$; (b) $p = 3$; (c) $p = 5$; (d) $p = \infty$. From Davies (1989).
C. Extension to Discrete Neighborhoods
In the previous two sections we have developed theory showing how median filters introduce edge shifts in a continuous space. Here we consider the effect of applying median filters in discrete lattices of pixels: specifically, median filters need to be applied in square neighborhoods of $p \times p$ pixels. In the continuous case covered earlier, $p$ was essentially infinite. Unfortunately, it is difficult to see how to extend the theory accurately to typical cases such as 3 × 3 and 5 × 5 neighborhoods. However, it is trivial to cover the case of $p = 1$, since in this case the median filter leaves the image unchanged. For intermediate values of $p$ we expect that edge shifts will fall between these two cases as upper and lower bounds, and indeed that there will be a steady progression from the one to the other bound as $p$ varies (Fig. 7). As we shall see below, this situation is generally confirmed by the experimental data.
D. Experimental Results for Discrete Binary Images
In this section we present experimental results for binary images. For reasons outlined earlier, we concentrate on the worst-case situation of small circular objects. Clearly, in a binary image it is only possible to approximate to (filled) circles, and in these tests radii ranged from 0.5 to about 9 pixels. It seemed sufficient to perform experiments using standard median filters in a 3 × 3 (square) neighborhood. In these experiments, a problem arose since
Figure 8. Binary circles before and after filtering. (a) Set of binary circles of radii ranging from 0.5 to 9 pixels. (b) Result of applying a median filter operating in a 3 × 3 (square) neighborhood. From Davies (1989).
the effective radius of such a neighborhood is not known accurately: for the present purpose we assume it to be such as to equalize the areas of discrete and idealized neighborhoods. Hence we took the radius as $(9/\pi)^{1/2} = 1.693$ pixels. The results are shown in Figures 8 and 9. When compared with the results of Table 2 they show some interesting features. In particular:
1. There is a limited number of possible radius values.
2. For a large proportion of radius values no change in radius occurs on applying a median filter.
3. For very small radius values the circle disappears completely.
4. For other small radius values the circle becomes much smaller.
5. For certain isolated larger radius values there is a reduction in circle size, but (a) the number of instances becomes rarer as radius increases, and (b) the reduction in circle size becomes smaller as radius increases.
6. In general, repeated application of a median filter to circles above a certain critical size must lead to a small reduction in size followed by stability, whereas repeated application to circles below the critical size must lead to their elimination: the critical radius is 2.5 pixels.
Ignoring result (1) as obvious, we interpret these results as follows. When applied to discrete binary images, the median filter has the properties predicted in Section III.A. However, median filters in small neighborhoods
Figure 9. Edge shifts for 3 × 3 median filter applied to binary circles. In this graph the plots represent the experimental results and the continuous curve is derived from theory (see Section III.A). It is also of interest to compare the experimental plots with the model of Section III.E (see lower curve in Fig. 10). From Davies (1989).
do not have the resolution to detect accurately the curvature of large circles: hence these either become resistant to any change in their size and shape or seem rather unstable and ready to shed their outermost pixels. The latter situation clearly happens for those circles whose boundaries are irregular, since they have some relatively sharp corners that are eliminated by the median filter. Such corners become increasingly rare as circle size increases (see Fig. 8). We return to this point in Section III.E, with the aim of building a more realistic model of the action of the median filter in the discrete case. The stability properties we have observed are related to the root behavior noticed by other workers when median filters are applied repeatedly (mainly to one-dimensional signals) until no further change occurs (Gallagher and Wise, 1981; Fitch et al., 1985; Heinonen and Neuvo, 1987). However, we are here less interested in root behavior than in mean edge shifts, for a single application of a median filter, as curvatures vary. Hence it is instructive to average out the rather random responses that occur for various radii $b$. It is seen that the resulting curve (Fig. 9) is similar in shape to the theoretical curve of Section III.A, but lies below it and indeed between it and the identity curve corresponding to the null case $p = 1$, as predicted in Section III.C: in no case does the stability effect cause the predicted change in size of an object to be reversed in sign, though it is frequently reduced to zero.
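An experiment of this kind can be imitated with a short pure-Python sketch. It is not a reproduction of the original tests: the way the discrete circles are generated here (pixels whose centers lie within the stated radius), the set-based image representation, and the area-based effective radius are all assumptions made for illustration.

```python
import math


def binary_disc(radius):
    """Pixels (x, y) whose centers lie within `radius` of the origin (assumed model)."""
    r = int(math.ceil(radius))
    return {(x, y)
            for x in range(-r, r + 1)
            for y in range(-r, r + 1)
            if x * x + y * y <= radius * radius}


def median3x3_binary(pixels):
    """3 x 3 binary median on a zero background: keep a pixel iff >= 5 of 9 are set."""
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    candidates = {(x + dx, y + dy) for (x, y) in pixels for dx, dy in offsets}
    return {(x, y) for (x, y) in candidates
            if sum((x + dx, y + dy) in pixels for dx, dy in offsets) >= 5}


def effective_radius(pixels):
    """Radius of the ideal circle having the same area as the pixel set."""
    return math.sqrt(len(pixels) / math.pi)


if __name__ == "__main__":
    for tenths in range(5, 95, 5):
        radius = tenths / 10.0
        before = binary_disc(radius)
        after = median3x3_binary(before)
        print(radius, round(effective_radius(before), 2), round(effective_radius(after), 2))
```

Under these assumptions a single-pixel "circle" vanishes and the five-pixel cross of radius 1 collapses to a single pixel, which matches features 3 and 4 of the list in this section; a filled square loses only its four corner pixels, illustrating how sharp corners are the parts that get eroded.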
E. Experimental Results for Discrete Gray-Scale Images
The results of the previous section immediately suggest using better approximations to circles, with the jagged binary edges interpolated by appropriate gray-scale values. For each size of circle this was achieved by permitting the intensity to vary linearly from black to white over a range of 1 pixel, and then smoothing the resulting shapes using the following well-known convolution mask:
$$\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$
This procedure was successful in giving a realistic approximation to a smoothed step-edge. (Note, however, that other edge models, such as edges that vary linearly over 1 or 2 pixels, seemed to give essentially the same edge shifts; i.e., the edge shift behavior was relatively insensitive to the exact type of edge model chosen.) In the experiments described here, circular objects again varied between 1 and 9 pixels, and pixel intensities were permitted to vary over an 8-bit range. The original and the modified radii were measured by taking the integrated intensities over the circle region and deducing the radii, this approach being used because it overcame problems due to irregularities in circle boundaries. Finally, the experiment was in this case performed for 3 × 3 and 5 × 5 neighborhoods, though in the latter case an attempt was made at approximating to the more ideal circular neighborhood by omitting the four corner pixels, and using the pattern:
[Note that in some earlier work on edge detection operators, it was found that this pattern not only required less computation than a full 5 × 5 operator, but also gave increased edge orientation accuracy (Davies, 1984).] The results of these experiments are shown in Figures 10 and 11. In these cases certain features that were present in the previous experiment on binary circles are now absent. In particular, the variation of edge shift $D$ with initial radius $b$ is very smooth, and there is no evidence of stability of large circles against the median filter: i.e., the median filter is able to reduce
Figure 10. Edge shifts for 3 × 3 median filter applied to gray-scale circles. In this graph the plots represent the experimental results and the upper continuous curve is derived from theory (see Section III.A). The lower continuous curve is derived from the model of Section III.E. From Davies (1989).
the size of all circles by an amount that tends to zero much as expected as radius increases. It seems important to analyze this case reasonably thoroughly, as it constitutes a common practical situation. Hence it was compared quantitatively with the theory of Section III.A. Again the problem arose that the effective radius of a discrete median filter in a 3 × 3 or 5 × 5 neighborhood is not known accurately: adopting the equal areas strategy of the previous section, we obtain the respective radii as $(9/\pi)^{1/2} = 1.693$ and $(21/\pi)^{1/2} = 2.585$. The resulting theoretical graphs are shown in Figures 10 and 11. These show that there are some odd results for very low values of circle radius. On the whole these can be explained rather neatly by appealing to the $d/a$ vs. $b/a$ curve (Fig. 12). Here we see that the effect of having an edge profile that varies from black to white over several pixels is to bring in a range of radius values: hence it is necessary to average the graph over a suitable range of values. As a result the experimental curve goes smoothly down to zero below the critical radius, and in the 5 × 5 case cuts slightly across the theoretical upper bound curve above the critical radius. Other minor inaccuracies can be explained as due to the particular (noncircular) shape of the filter neighborhood and remanent stability effects. Finally, the expected progression from $p = 1$ through $p = 3$ and $p = 5$ to $p = \infty$ is obeyed, though the intermediate edge shift curves both appear to
Figure 11. Edge shifts for 5 × 5 median filter applied to gray-scale circles. The upper set of plots represent the experimental results and the upper continuous curve is derived from theory (see Section III.A). The lower continuous curve is derived from the model of Section III.E. The lower set of plots represents the much reduced shifts obtained with the detail-preserving type of filter. From Davies (1989).
approach zero more rapidly than would have been predicted on the basis of simple interpolation. We conclude that discrete neighborhoods impart an additional stability to the edge positions. This can be explained in general terms as follows. For a 3 × 3 median filter, there are essentially only 8 positions around the boundary of a large object that the median filter can erode. [This is approximately correct for binary images where large "circles" tend to be octagonal (see Fig. 8); however, it must also be approximately correct for gray-level images, since we can consider each gray-level outline as a separate binary image that is then eroded on its own.] Hence the effectiveness of the median filter at eroding large objects will in principle fall off by an additional factor proportional to $a/b$ (i.e., relative to the upper bound given by the theory of Section III.A): note, however, that this additional factor will not apply for small objects, so the result can never be larger than the upper bound value of $D$. For the 5 × 5 median filter, which is more sensitive to the curvature of large objects, the same general model seems to apply, but with a different constant of proportionality. However, to fully explain the observed variations we need to model the $p$-variation. Following Section III.C and considering vertical lines through the $D/a$ vs. $b/a$ graph (Fig. 7), we try modeling the vertical variation as $(1/a_0 - 1/a)$, since this quantity
Figure 12. Method of averaging required for small circles. This diagram shows how averaging over the various contour radii appearing within the neighborhood should be performed. The two main effects are (1) raising of the circle size (reduction of the edge shift) for circular objects below the critical size; (2) lowering of the circle size (increase in the edge shift) for objects above the critical size. From Davies (1989).
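The semi-empirical discrete model of this section, Eq. (21), combines the continuum shift $D$ with the equal-area effective radii quoted in the text. The sketch below is illustrative only: the function names are invented here, and for brevity the continuum shift is approximated by the series of Eq. (13) rather than by the exact solution of Eq. (17), which is an assumption that is reasonable only for $a \le b$.

```python
import math


def effective_radius(n_pixels):
    """Radius of the circle whose area equals that of n_pixels unit pixels."""
    return math.sqrt(n_pixels / math.pi)


def continuum_shift(a, b):
    """Series approximation of Eq. (13), standing in for the continuum shift D."""
    kappa = 1.0 / b
    return kappa * a ** 2 / 6.0 + kappa ** 3 * a ** 4 / 40.0


def discrete_shift(b, n_pixels, c=1.0):
    """Eq. (21): D' = min[D, c * D * (a/a0 - 1) / b], with c = 1.0 as in the text."""
    a = effective_radius(n_pixels)  # 9 pixels for 3 x 3, 21 for the 5 x 5 pattern
    a0 = effective_radius(1)        # effective radius of a 1 x 1 neighborhood
    d = continuum_shift(a, b)
    return min(d, c * d * (a / a0 - 1.0) / b)


if __name__ == "__main__":
    for b in (2.0, 3.0, 4.0, 6.0, 8.0):
        print(b, round(discrete_shift(b, 9), 4), round(discrete_shift(b, 21), 4))
```

For the 3 × 3 case $a/a_0 = 3$, so with $c = 1$ the extra factor is $2/b$: the model coincides with the continuum bound for $b \le 2$ and falls below it for larger objects.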
automatically approaches a constant value as $a$ tends to infinity, and approaches zero as $a$ approaches $a_0$. [Here $a_0$ is the effective radius of a 1 × 1 neighborhood, assumed here to be equal to $(1/\pi)^{1/2} = 0.564$.] Taking both variations into account gives the overall model⁴:
$$D' = \min[D,\ cD(1/a_0 - 1/a)(a/b)] = \min[D,\ cD(a/a_0 - 1)/b] \qquad (21)$$
where the constant $c$ has to be found empirically. As will be clear from Figures 10 and 11, this formula gives very good agreement with the observed results for 3 × 3 and 5 × 5 neighborhoods, when $c$ is made equal to 1.0. Thus it is now known with fair accuracy (even if partly on a semiempirical basis) how the upper bound form (Section III.A) adapts to a discrete lattice. At this stage there can be no doubt that the median filter gives definite and measurable edge shifts via a specific type of neighborhood averaging process. Overall, for gray-scale images the shifts predicted by this theory agree with experimental shifts within approximately 10% for a large range of circle sizes in a discrete lattice (see Figs. 10 and 11). The agreement is less perfect for binary images, since circles of certain sizes show stability effects (akin to median root behavior): these effects tend to average out for gray-scale images,
⁴ It will be seen in Section IV that the additional factor $1/b$ that appears here presages a rigorous derivation of an overall variation proportional to $\kappa^2$ at low values of curvature $\kappa = 1/b$: this was not anticipated in Davies (1989).
owing to the presence of many contours of different sizes at different gray levels. Overall, it appears that the edge shifts obtained with median filters are now quite well understood. Figures 3 and 13 give some indication of the magnitudes of these shifts in practical situations. Note that once image detail such as a small hole or screw thread has been eliminated by a filter, it is not possible to apply any edge shift correction formula to recover it, although for larger features such formulas are useful for deducing true edge positions.
F. Edge Shifts Arising with Hybrid Median Filters
Although median filters preserve edges in digital images, they are also known to remove fine image detail such as lines. For example, 3 × 3 median filters remove lines one pixel wide, and 5 × 5 median filters remove lines two pixels wide. In many applications such as remote sensing and X-ray imaging this is exceedingly important and efforts have been made to develop filters that overcome the problem. In 1987 Nieminen et al. reported a new class of "detail preserving" filters: these employ linear subfilters whose outputs are combined by median operations. There is a great variety of such filters, employing different subfilter shapes and having the possibility of several layers of median operations. Hence it is not possible to describe them fully here in the space available. Although these filters are aimed particularly at retention of line detail, and are readily understood in this context, they turn out to have some corner-preserving properties and to be resistant to the edge shifts that arise when the curvature is nonzero. Perhaps the best of the filters in the new class, from the point of view of preserving edge position, is the two-level "bidirectional" linear-median hybrid filter termed "2LH+" (Nieminen et al., 1987). Its operation in a 5 × 5 neighborhood may be illustrated as follows. It employs the subfilters A to I in the 5 × 5 region:

E  -  D  -  C
-  E  D  C  -
F  F  A  B  B
-  G  H  I  -
G  -  H  -  I

pixels marked as being in the same subfilter having their intensities averaged, and dashed pixels being ignored. Nonlinear filtering then proceeds using two levels of median filtering, the final center-pixel intensity being taken as
$$A' = \mathrm{med}[A,\ \mathrm{med}(A, B, D, F, H),\ \mathrm{med}(A, C, E, G, I)] \qquad (22)$$
Figure 13. Circular holes in metal objects before and after filtering. (a) Original 128 × 128 pixel image with 6-bit gray scale. (b) 5 × 5 median-filtered image: the diminution in size of the holes is clearly visible and such distortions would have to be corrected for when taking measurements from real filtered images of this type. (c) Result of using a detail-preserving filter: some distortions are present although the overall result is much better than in (b). From Davies (1989).
We here ignore the line-preserving properties of this filter and concentrate on its corner-preserving, low-edge-shift characteristics. It is quite easy to see that the 5 × 5 regions

0 0 0 0 0        0 0 0 0 1
0 0 0 0 0        0 0 0 1 1
0 0 1 1 1        0 0 1 1 1
0 0 1 1 1        0 0 0 1 1
0 0 1 1 1        0 0 0 0 1

are preserved by this filter, although these examples represent limiting cases that could be disrupted by minor amounts of noise or slight changes of orientation. Thus the filter seems guaranteed to preserve corners only if the internal angle is greater than 135°. This figure should be compared with the 180° obtained using similar arguments for the normal median filter in 5 × 5 regions such as

0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
1 1 1 1 1
1 1 1 1 1
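The behavior of the 2LH+ filter on these regions can be checked with a small sketch of Eq. (22). The subfilter offsets below are read from the lettered 5 × 5 layout given with Eq. (22); the function names, the row-by-row reading of the binary regions, and the use of a full 25-element median for comparison are assumptions made for illustration.

```python
from statistics import median

# (row, column) offsets from the center pixel A for each two-pixel subfilter.
SUBFILTERS = {
    "B": [(0, 1), (0, 2)],      # east
    "D": [(-1, 0), (-2, 0)],    # north
    "F": [(0, -1), (0, -2)],    # west
    "H": [(1, 0), (2, 0)],      # south
    "C": [(-1, 1), (-2, 2)],    # north-east
    "E": [(-1, -1), (-2, -2)],  # north-west
    "G": [(1, -1), (2, -2)],    # south-west
    "I": [(1, 1), (2, 2)],      # south-east
}


def two_level_hybrid(region):
    """2LH+ output for the center of a 5 x 5 region, following Eq. (22)."""
    a = region[2][2]
    mean = {name: sum(region[2 + dr][2 + dc] for dr, dc in offsets) / len(offsets)
            for name, offsets in SUBFILTERS.items()}
    plus = median([a, mean["B"], mean["D"], mean["F"], mean["H"]])
    cross = median([a, mean["C"], mean["E"], mean["G"], mean["I"]])
    return median([a, plus, cross])


def plain_median(region):
    """Ordinary median over all 25 pixels of the region."""
    return median([value for row in region for value in row])


right_angle = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 1, 1, 1],
               [0, 0, 1, 1, 1], [0, 0, 1, 1, 1]]
diagonal_corner = [[0, 0, 0, 0, 1], [0, 0, 0, 1, 1], [0, 0, 1, 1, 1],
                   [0, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
near_straight = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 1, 1, 1],
                 [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
```

In this sketch the center pixel of both corner regions survives the 2LH+ filter but is removed by the ordinary 25-element median, whereas the nearly straight edge of the third region survives the ordinary median as well.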
Figure 11 shows plots obtained with this filter under the same conditions as for the 5 × 5 median filters. It always gives at least a 4-fold improvement (reduction) in edge shift over that for the median filter, and this performance improves with increasing radius of curvature $b$ until there is zero shift for $b > 4$ (note that $b = 4$ is approximately the figure that would be expected from the corner angle of 135° noted above, within a 5 × 5 neighborhood). Hence such detail-preserving filters improve the situation dramatically but do not completely overcome the underlying problem described earlier. In addition, this improvement may not have been obtained without cost, since in some cases the filter seems to insert structure where none exists (Davies, 1989). The result is to cast some doubt on the usefulness of this type of filter in all possible situations. Nevertheless, its effect on real images appears to be generally very good (see Figs. 3c and 13c).
IV. Shifts Produced by Median Filters in Digital Images
A. Using a Discrete Model to Explain Median Shifts
To produce a discrete model we need to recognize explicitly the positions of the pixels within a $p \times p$ neighborhood. We approximate by assuming
Figure 14. Idealized intensity functions for calculation of contour shifts. This figure shows idealized intensity variations within a circular neighborhood C of radius a. (a) Top: circular contours of radius b; bottom: linear variation in intensity. (b) Single step edge with a circular boundary of radius b. From Davies (1999b).
that the intensity of any pixel is the mean intensity over the whole pixel and is represented by a sample positioned at the center of the pixel. In addition, we take the underlying analogue intensity variation to have contours of curvature $\kappa$, as shown in Figure 14a. Following what happened in the continuum case, it will not matter whether the contours of constant intensity are those of a step edge or those of a slowly varying slant edge: it is what happens at the median contour that determines the shift that arises. The starting point is that zero shift occurs for $\kappa = 0$. Next, if $\kappa$ is even minutely greater than zero, the center pixel will not necessarily be the median pixel. Consider first a situation when the circular median intensity contour does not pass through the center of the central pixel but passes symmetrically through the centers of two other pixels as shown in Figure 15. (With no loss of generality at this stage of the calculation, the two pixels are assumed to lie along the same vertical line.) If the separation of the two pixels is $2\delta$, then the geometry of a circle of radius $b$ leads to
$$\delta^2 = D(2b - D) \qquad (23)$$
$$\therefore\quad D^2 - 2bD + \delta^2 = 0 \qquad (24)$$
$$\therefore\quad D = b \pm [b^2 - \delta^2]^{1/2} \qquad (25)$$
where we have to take the minus sign. Approximating for small $\delta$ and large $b\ (= \kappa^{-1})$ leads to the result
Figure 15. Geometry for calculation of shift when the median contour passes through the centers of two pixels.
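$$D = b - b\left[1 - (\delta/b)^2\right]^{1/2} \approx b - b\left[1 - \frac{1}{2}(\delta/b)^2 - \frac{1}{8}(\delta/b)^4\right] = \frac{1}{2}\kappa\delta^2 + \frac{1}{8}\kappa^3\delta^4 \approx \frac{1}{2}\kappa\delta^2 \qquad (26)$$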
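The quality of the small-$\delta$ approximation in Eq. (26) is easy to gauge numerically. The following sketch compares the exact root of Eq. (25) (minus sign) with the one- and two-term expansions; the function names and the sample values of $b$ and $\delta$ are chosen here purely for illustration.

```python
import math


def exact_shift(b, delta):
    """Exact solution of Eq. (25) with the minus sign: D = b - sqrt(b^2 - delta^2)."""
    return b - math.sqrt(b * b - delta * delta)


def two_term_shift(b, delta):
    """Expansion in Eq. (26): kappa*delta^2/2 + kappa^3*delta^4/8, with kappa = 1/b."""
    kappa = 1.0 / b
    return 0.5 * kappa * delta ** 2 + 0.125 * kappa ** 3 * delta ** 4


def leading_term_shift(b, delta):
    """Leading term only: kappa*delta^2/2."""
    return 0.5 * delta ** 2 / b


if __name__ == "__main__":
    for b in (2.0, 5.0, 10.0):
        d = exact_shift(b, 1.0)
        print(b, round(d, 6), round(two_term_shift(b, 1.0), 6),
              round(leading_term_shift(b, 1.0), 6))
```

The exact value also satisfies the chord relation of Eq. (23), $\delta^2 = D(2b - D)$, which provides an independent check on the algebra.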
We shall now follow situations similar to that shown in Figure 16, where the circular median intensity contour passes close to the center of the neighborhood at a small angle $\theta$ to the positive x-axis, and passes only through the upper of the two pixel centers shown in Figure 15. In that case, the filter will produce a shift:
$$D \approx \frac{1}{2}\kappa\delta^2 - \delta\theta \qquad (27)$$
B. Theoretical Shifts for a 3 × 3 Neighborhood
Next, we specialize to the case of a 3 × 3 neighborhood, and proceed (Davies, 1999a,b) by taking two pixels adjacent to the central pixel, along the y-axis, as shown in Figures 16 and 17a. In that case we have $\delta = a_0$, so
$$D \approx \frac{1}{2}\kappa a_0^2 - a_0\theta = \frac{1}{2}\kappa - \theta \qquad (28)$$
where we have again taken the interpixel separation $a_0$ to be unity. If $\theta$ is close to 45°, the shift will (see Fig. 17c) be obtained using the new value $\delta = \sqrt{2}a_0$ in Eq. (26), though this time the result is best expressed in terms of $\varphi$, where
Figure 16. Geometry for calculation of median shifts on the discrete model. From Davies (1999b).
$$\varphi = \frac{\pi}{4} - \theta \qquad (29)$$
Thus we obtain:
$$D_\varphi \approx \frac{1}{2}\kappa(\sqrt{2}a_0)^2 - (\sqrt{2}a_0)\varphi = \kappa - \sqrt{2}\varphi \qquad (30)$$
As indicated above, Eqs. (28) and (30) are approximate: Section IV.C gives an exact calculation with a general solution, from which these and other special cases may be derived. However, the above derivations and solutions provide useful insight into the situation. In particular, Eqs. (28) and (30) show that at the ends of the range $0 \le \theta \le \pi/4$, D varies in proportion to $\kappa$. The next problem is understanding what happens when D falls to zero at intermediate values of $\theta$. In fact, D remains at zero in this range, the reason being that the median contour reverts to passing through the central pixel center in the neighborhood (Fig. 17b). The resulting approximately piecewise-linear variation in D (Fig. 18a) is far from what would be expected on the continuum model. To make a realistic comparison we must average over all $\theta$. In that case we obtain the result

$$
\begin{aligned}
\bar{D} &= \left\{ \int_0^{\kappa/2} (\kappa/2 - \theta)\,d\theta + \int_0^{\kappa/\sqrt{2}} (\kappa - \sqrt{2}\,\varphi)\,d\varphi \right\} \Big/ (\pi/4) \\
&= \frac{4}{\pi}\left\{ \left[-\tfrac{1}{2}(\kappa/2 - \theta)^{2}\right]_0^{\kappa/2} + \left[-\frac{1}{2\sqrt{2}}(\kappa - \sqrt{2}\,\varphi)^{2}\right]_0^{\kappa/\sqrt{2}} \right\} \\
&= \left(\frac{1 + 2\sqrt{2}}{2\pi}\right)\kappa^{2} \approx 0.6092\,\kappa^{2}
\end{aligned}
\tag{31}
$$
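As a numerical cross-check of Eq. (31), the following minimal sketch averages the piecewise approximation of Eqs. (28) and (30) over $0 \le \theta \le \pi/4$ and compares it with the closed form. The function names are hypothetical, $a_0 = 1$ is assumed, and the piecewise form is taken to be zero wherever both approximate expressions are negative; it applies only while the zero range exists (low $\kappa$).

```python
import math

def approx_shift(theta, kappa):
    """Piecewise approximation to D(theta) from Eqs. (28) and (30), a0 = 1."""
    phi = math.pi / 4 - theta                 # Eq. (29)
    d_theta = 0.5 * kappa - theta             # Eq. (28), applies near theta = 0
    d_phi = kappa - math.sqrt(2) * phi        # Eq. (30), applies near theta = 45 deg
    return max(d_theta, d_phi, 0.0)           # D stays at zero in between

def mean_shift(kappa, steps=100_000):
    """Midpoint-rule average of approx_shift over 0 <= theta <= pi/4."""
    h = (math.pi / 4) / steps
    total = sum(approx_shift((i + 0.5) * h, kappa) for i in range(steps))
    return total * h / (math.pi / 4)

kappa = 0.4
closed_form = (1 + 2 * math.sqrt(2)) / (2 * math.pi) * kappa ** 2
print(mean_shift(kappa), closed_form)
```

The two printed values should agree closely, which illustrates that the quadratic law arises purely from integrating the two linear end segments.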
Figure 17. Geometry for calculation of median shifts at low $\kappa$. These three diagrams show the positions of the median pixels and the ranges of orientations of circular intensity contours for which they apply, (a) for low $\theta$, (b) for intermediate $\theta$, and (c) for high $\theta$. From Davies (1999b).
This shows that $\bar{D}$ follows a quadratic rather than a linear law at low values of $\kappa$, unlike the situation for the continuum model. However, for high values of $\kappa$, the variation would be expected to revert to a linear model: this should occur when $\kappa$ reaches such high values that the range of values of $\theta$ for which D = 0 falls to zero. At that stage the whole variation of D with $\theta$ should rise bodily as $\kappa$ increases further (Fig. 18). Equating to zero the values of $D_\theta$ and $D_\varphi$ given by the approximate Eqs. (28) and (30) gives $\theta = \kappa/2$ and $\varphi = \kappa/\sqrt{2}$; requiring these to sum to $\pi/4$, we deduce that this should happen for values of $\kappa$ above about $(\sqrt{2} - 1)\pi/2 \approx 0.65$ (the accurate value is 0.632; see below). Perhaps the most surprising thing is that this hardly happens for a 3 × 3 neighborhood (Figs. 18 and 19): for the necessary high values of $\kappa$, the median contour fails to reach all the outermost pixels in the neighborhood and there are orientations for which the contour represents an object that is eliminated entirely by the median filter. Averaging over all $\theta$ is then not meaningful: here we do not consider such cases further.

In fact, there is one problem with the above interpretation: for quite high values of $\kappa$, one pixel separation other than $a_0$ and $\sqrt{2}\,a_0$ becomes important. This value is $\delta = a_0/\sqrt{2}$ (see Fig. 20c). This leads to the following equation taking over from Eq. (30) at high values of $\kappa$:

$$D_\varphi \approx \frac{a_0}{\sqrt{2}} + \tfrac{1}{2}\kappa\left(\frac{a_0}{\sqrt{2}}\right)^{2} - \frac{a_0}{\sqrt{2}}\,\varphi = \frac{1}{\sqrt{2}} + \tfrac{1}{4}\kappa - \frac{1}{\sqrt{2}}\,\varphi \tag{32}$$

Figure 18. Angular variations of 3 × 3 median shifts. (a) Graph showing the situation for $\kappa = 0.4$: note that the $\theta$-axis constitutes the middle part of the variation. (b) Graph for higher $\kappa$ (0.632) when the middle part of the variation just vanishes. (c) Graph for a slightly higher value of $\kappa$ (0.66). (d) Graph for the highest value of $\kappa$ (0.7071) for which a valid value of D exists for all $\theta$. Note that (b) is the highest graph for which no change of gradient occurs at high $\theta$. All the graphs presented in this figure are calculated from the exact formulas in Section IV.C. From Davies (1999b).
To understand in detail when this happens we should compare Figures 17c and 20c. The changeover from one situation to the other occurs when an extreme intensity contour passes through the following three pixel centers: (−1, 0), (0, −1), (1, −1). Such a contour will have a radius $b = [(3/2)^{2} + (1/2)^{2}]^{1/2} = \sqrt{2.5} \approx 1.581$, leading to a curvature $\kappa \approx 0.632$. Curiously, this is the same value as that (noted above) for which the value zero for D drops out of consideration, as is seen by considering Figure 17b, when we find that both extreme intensity contours pass through (−1, 0), (0, 0), (1, 1). Finally, the approximate results of Eqs. (28)–(32), and predictions made from them, are superseded by the exact formulas obtained in the following section: the latter formulas are the ones used to produce the D graphs in Figure 18 and the average graph in Figure 19.
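The critical radius quoted above can be checked with a short circumcircle calculation. This is only a sketch: the helper name is hypothetical, and the pixel coordinates are assumed to be (−1, 0), (0, −1), (1, −1) for the changeover contour and (−1, 0), (0, 0), (1, 1) for the contour at which the zero range vanishes.

```python
import math

def circumcircle(p, q, r):
    """Centre and radius of the circle through three non-collinear points."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

# Assumed pixel triples (see the lead-in); both should give b = sqrt(2.5).
for triple in [((-1, 0), (0, -1), (1, -1)), ((-1, 0), (0, 0), (1, 1))]:
    centre, b = circumcircle(*triple)
    print(triple, "centre", centre, "b = %.3f" % b, "kappa = %.3f" % (1.0 / b))
```

Both triples lie on circles of the same radius, which is why the two critical curvatures coincide.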
Figure 19. Comparisons of 3 × 3 median shifts. The lower solid curve shows the nonapproximated results of the discrete model (cf. the exact formulas in Section IV.C): the upper solid curve shows the results of the experiments on gray-scale circles. The dotted curve depicts earlier experimental data (Section III.E). The gray line shows the predictions of the original continuum model [see Eq. (13)]. From Davies (1999b).

C. More General Calculation of Edge Shifts

This section derives exact formulas for edge shifts that correct and extend the calculations of Section IV.B, and in particular lead to the D graphs in Figure 18 and the average graph in Figure 19. First we refer to Figure 16, and generalize it (Davies, 1999b) so that (1) the center C $(x_c, y_c)$ of the median contour lies on the line

$$y = x\tan\theta \tag{33}$$

where $\theta$ need not be assumed to be small, and (2) the median contour passes through a general pixel center at $(\alpha, \beta)$ rather than (0, 1). The origin O is taken to be the center of the neighborhood. The median shift D will be obtained by determining how far the median contour is from O. The equation of the median contour is

$$(x - x_c)^{2} + (y - y_c)^{2} = b^{2} \tag{34}$$

Noting that it passes through $(\alpha, \beta)$, and eliminating $y_c$ using Eq. (33), we find

$$x_c^{2}\sec^{2}\theta - x_c(2\alpha + 2\beta\tan\theta) + \alpha^{2} + \beta^{2} - b^{2} = 0 \tag{35}$$

Solving for $x_c$ and taking the appropriate solution gives

$$x_c = \cos^{2}\theta\left\{(\alpha + \beta\tan\theta) + \left[(\alpha + \beta\tan\theta)^{2} - (\alpha^{2} + \beta^{2} - b^{2})\sec^{2}\theta\right]^{1/2}\right\} \tag{36}$$
Figure 20. Geometry for calculation of median shifts at high $\kappa$. These three diagrams show the positions of the median pixels and the ranges of orientations of circular intensity contours for which they apply, (a) for low $\theta$, (b) for intermediate $\theta$, and (c) for high $\theta$. From Davies (1999b).
The nearest point to O on the circle is N $(x_n, y_n)$, which also lies along the line given by Eq. (33). We can now write down an expression for the median shift:

$$D = -x_n\sec\theta = b - x_c\sec\theta \tag{37}$$
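Equations (36) and (37) translate directly into a few lines of code. The sketch below is illustrative only: the function name is hypothetical, the "appropriate solution" of Eq. (35) is assumed to be the larger root (circle center on the far side of the neighborhood), and pixel coordinates are assumed to have y increasing upward.

```python
import math

def median_shift(alpha, beta, theta, kappa):
    """Shift D for a circular contour of curvature kappa passing through the
    pixel centre (alpha, beta), with the circle centre on y = x tan(theta)."""
    b = 1.0 / kappa                      # contour radius
    t = math.tan(theta)
    sec2 = 1.0 + t * t                   # sec^2(theta)
    p = alpha + beta * t
    disc = p * p - (alpha ** 2 + beta ** 2 - b * b) * sec2
    x_c = (p + math.sqrt(disc)) / sec2   # Eq. (36), larger root (assumed)
    return b - x_c * math.sqrt(sec2)     # Eq. (37)

# For theta = 0 and the pixel (0, 1) the exact shift is b - sqrt(b^2 - 1),
# which is close to the approximate value kappa / 2 of Eq. (28).
print(median_shift(0, 1, 0.0, 0.4), 0.4 / 2)
```

At the critical curvature $\kappa = 1/\sqrt{2.5}$ and $\tan\theta = 1/3$, the same function returns zero (to rounding) for each of the pixels (0, 1), (0, 0), and (1, −1), consistent with a single contour passing through all three.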
The final step in this general part of the calculation is to substitute for $x_c$ in Eq. (37) using Eq. (36). Section IV.B has shown that there are three candidates for $(\alpha, \beta)$ at low $\kappa$, and a different set of three candidates at high $\kappa$ (see also Figs. 17 and 20):

low $\kappa$: (0, 1), (0, 0), (1, −1)
high $\kappa$: (0, 1), (1, −1), (0, −1)

It is now a question of substituting for $(\alpha, \beta)$ in the final equation for D to give the required variations with $\theta$ for any value of $\kappa = 1/b$.

D. Experimental Results for a 3 × 3 Neighborhood

In this section we consider the results of experiments carried out to check the predictions of the discrete theory in the case of 3 × 3 neighborhoods. We started with the experimental results obtained earlier (Davies, 1989). When these were obtained they did not agree especially well with the continuum model. More important, they did not agree at all well with the new discrete theory (Fig. 19). In part, the motivation of the present work was to provide an exact explanation of the earlier results, so further experimentation was called for.

The earlier experimental work (Davies, 1989) involved taking a number of gray-scale circles of different sizes and measuring how these sizes were altered by a 3 × 3 median filter. Our new experimental work showed that there is a need not only to take all possible sizes of gray-scale circle but also all possible subpixel positions. Though obvious in retrospect, it was not clear earlier that this factor had been lacking statistically in the data of Davies (1989), and it turned out to be the crucial factor that permitted almost exact agreement between theory and experiment to be achieved in the new work (Fig. 19). Curiously, the agreement was within about 1% up to $\kappa \approx 0.6$, and thereafter it diverged significantly. However, this is readily explained, as the theory is based on edge shifts, where the edges are taken to correspond simply to single edges within a 3 × 3 neighborhood, whereas the experimental results corresponded to gray-scale circles with dome-shaped intensity profiles. Thus, for high values of $\kappa$, even if a circle was not located entirely within the neighborhood, some of its gray levels might appear entirely within the neighborhood: a number of these would then be eliminated (in a process that we shall refer to as "dome-slicing"), and the remaining gray levels would be subject to different shifts that would be combined in a nonlinear manner by the measurement process.
This meant that agreement would be expected only where $\kappa$ was sufficiently small that all shifts produced within a given neighborhood would be in very much the same direction, as indicated by the intensity paradigm of Figure 14a. For reference, a set of gray-scale circles at various positions before and after processing is shown in Figure 21 for the critical value of $\kappa \approx 0.6$ at which disagreement starts to occur: it will be seen that the intensities of the circles have already started dropping below the maximum value of 1.00, but not yet sufficiently to affect the agreement between theory and experiment.

Finally, it should be explained how the measurement of circle radius was carried out. This was achieved by integrating the gray-scale area of each circle, and then deducing the radius. A precision of 256 gray levels was employed, and a spatial precision of 1/15 pixel was adopted, i.e., each pixel was subdivided into an array of 15 × 15 subpixels to determine whether each subpixel was within the required distance of the center, and then the subpixel intensities were averaged over each pixel to obtain the initial circles. This process was repeated for all positions of the circles, the positions being varied over a 16 × 16 subarray of positions⁵ ranging from 0.0 to 0.5 pixels in each direction. This amount of averaging was sufficient to give agreement between theory and experiment of around 1%, as stated above.

Figure 21. Result of application of median filter to small circles. Here the unprocessed circles (left column) have radius 1.667 pixels, and $\kappa = 0.6$; in each direction the center positions range between 0.0 and 0.5 pixels relative to the center of a central pixel: e.g., (a) center at (0.00, 0.00); (b) center at (0.25, 0.25); (c) center at (0.50, 0.50). Note that in all cases the unprocessed circles have the full gray-scale value of 1.000 (corresponding to 256 gray levels) at the center, whereas in some cases the processed circles have centers well below 1.000 (this effect is called "dome-slicing"): nevertheless, at $\kappa = 0.6$ every 3 × 3 neighborhood still sees an edge rather than a circle. For larger values of $\kappa$ this is not always so. From Davies (1999b).
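The measurement procedure described above (subpixel rendering, 3 × 3 median filtering, radius from integrated gray-scale area, averaging over subpixel positions) can be sketched as follows. This is a rough illustration rather than the code used for the experiments: all function names are hypothetical, pixel centers are assumed to lie at integer coordinates, gray levels are kept as floating-point fractions instead of being quantized to 256 levels, border pixels are simply left at zero, and a coarse `positions` grid is used to keep the run time small.

```python
import math
from statistics import median

def render_circle(size, cx, cy, radius, sub=15):
    """Gray-scale disc: each pixel value is the fraction of its sub x sub
    subpixel centres lying within `radius` of (cx, cy)."""
    img = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for i in range(size):
            hits = 0
            for v in range(sub):
                for u in range(sub):
                    x = i - 0.5 + (u + 0.5) / sub
                    y = j - 0.5 + (v + 0.5) / sub
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        hits += 1
            img[j][i] = hits / (sub * sub)
    return img

def median3x3(img):
    """3 x 3 median filter; border pixels are left at zero (an assumption)."""
    n = len(img)
    out = [[0.0] * n for _ in range(n)]
    for j in range(1, n - 1):
        for i in range(1, n - 1):
            out[j][i] = median(img[j + dj][i + di]
                               for dj in (-1, 0, 1) for di in (-1, 0, 1))
    return out

def radius_from_area(img):
    """Radius of the disc whose area equals the integrated gray-scale."""
    return math.sqrt(sum(map(sum, img)) / math.pi)

def mean_radius_shift(radius, size=17, positions=4, sub=15):
    """Average reduction in measured radius over a positions x positions grid
    of centre offsets between 0.0 and 0.5 pixels in each direction."""
    c0 = size // 2
    shifts = []
    for iy in range(positions):
        for ix in range(positions):
            dx = 0.5 * ix / (positions - 1)
            dy = 0.5 * iy / (positions - 1)
            before = render_circle(size, c0 + dx, c0 + dy, radius, sub)
            after = median3x3(before)
            shifts.append(radius_from_area(before) - radius_from_area(after))
    return sum(shifts) / len(shifts)

print(mean_radius_shift(5.0, positions=2))   # kappa = 1/5 = 0.2
```

Increasing `positions` toward 16 and keeping `sub=15` approaches the sampling density described in the text, at the cost of a much longer run in pure Python.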
⁵ The spacings of the subarray of positions and those of the subpixels are independent: each spacing was made as small as necessary to achieve the required degree of accuracy in the results. However, they correspond to equivalent accuracies, as the subpixel boundaries form a subarray of size 16 × 16.

E. Numerical Computations for 5 × 5 Neighborhoods

Following the successful calculation of median shifts for 3 × 3 neighborhoods, it was felt worthwhile to attempt calculation for larger neighborhoods, and even to consider the situation for such large neighborhoods that the lattice would become a continuum. At the same time, it would prove possible to move toward the more ideal circular configuration, which should lead to much more isotropic characteristics than for a square configuration, and to this end a set of truncated neighborhoods that would map reasonably well to the circular format was devised: these are illustrated in Figure 22.

First, the case of 5 × 5 neighborhoods was tackled (Davies, 1998). Although square 5 × 5 neighborhoods would correspond more closely to the situation for 3 × 3 neighborhoods, the truncated 5 × 5 neighborhood of Figure 22b was also examined. The pattern of the earlier 3 × 3 calculation immediately showed that at very low curvature $\kappa$ a square 5 × 5 neighborhood would result in the following pixels acting in turn as the median pixel, for orientation $\theta$ increasing gradually from 0° to 45°:

(0, −1), (0, 2), (0, 0), (1, −2), (−1, 2), (0, 0), (2, −2), (−1, 1)

whereas for the truncated 5 × 5 neighborhood the following slightly different sequence of pixels would apply, the (2, −2) pixel now being absent from the neighborhood:

(0, −1), (0, 2), (0, 0), (1, −2), (−1, 2), (0, 0), (1, −1)

Note that in both of these cases, the first pixel is below the central pixel rather than above it (as happens in the case of a 3 × 3 neighborhood); this means that D will increase rather than decrease as $\theta$ increases from 0°. The significance of these sequences is that the median arc rotates about each of the pixels in turn until it contacts the next one in the sequence, and the rotation around any individual pixel corresponds to a segment being drawn in the angular variation: n pixels correspond to n segments, and for the truncated 5 × 5 neighborhood n = 7 for the lowest few angular variations shown in Figure 23. For a square 5 × 5 neighborhood n = 8 and the variations for the lowest curvatures have eight segments.
Although this simple picture is accurate for low curvatures, the situation becomes much more complex for higher curvature values, and neither the pixels that act as turning points nor their numbers n can be determined without detailed calculation. Accordingly, it was necessary to make numerical tests for each curvature value and for each orientation, in order to determine which pixel was acting as the median turning point in that situation. Curvature $\kappa$ was increased in steps of 0.05 from 0.00 to 0.35, and orientation $\theta$ was increased in 0.5° steps from 0° to 45°. This was found to provide a sufficiently detailed picture of the situation, and to lead to sufficiently accurate graphs for useful measurements to be made (in the previous 3 × 3 case, the calculation was needed only for the latter purpose).
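One way such a numerical test might be organized is sketched below. It is not the procedure of Davies (1998), only an assumed brute-force variant: for each candidate pixel, the contour of curvature $\kappa$ through that pixel is constructed with Eq. (36), and the candidate is accepted if it is then the middle-ranked pixel by distance from the contour center. All names are hypothetical, y is assumed to increase upward, and the truncated 5 × 5 neighborhood is assumed to be the square one with its four corner pixels removed (21 pixels).

```python
import math

def square_neighborhood(half):
    """Pixel offsets of a (2*half + 1) x (2*half + 1) neighborhood."""
    return [(x, y) for x in range(-half, half + 1)
                   for y in range(-half, half + 1)]

def truncated_5x5():
    """Assumed truncated 5 x 5 neighborhood: corners removed, 21 pixels."""
    return [p for p in square_neighborhood(2) if abs(p[0]) + abs(p[1]) < 4]

def contour_center(pixel, kappa, theta):
    """Centre of the circle of curvature kappa through `pixel`, Eq. (36)."""
    alpha, beta = pixel
    b = 1.0 / kappa
    t = math.tan(theta)
    sec2 = 1.0 + t * t
    p = alpha + beta * t
    disc = p * p - (alpha ** 2 + beta ** 2 - b * b) * sec2
    if disc < 0.0:
        return None
    x_c = (p + math.sqrt(disc)) / sec2
    return (x_c, x_c * t)

def median_pixel(pixels, kappa, theta):
    """Return (pixel, D) for the self-consistent median pixel, or None."""
    b = 1.0 / kappa
    mid = len(pixels) // 2
    for cand in pixels:
        c = contour_center(cand, kappa, theta)
        if c is None:
            continue
        ranked = sorted(pixels,
                        key=lambda q: math.hypot(q[0] - c[0], q[1] - c[1]))
        if ranked[mid] == cand:
            return cand, b - math.hypot(c[0], c[1])   # Eq. (37)
    return None

# Scan orientation in 0.5 degree steps at one low curvature value.
pixels = truncated_5x5()
sequence = []
for k in range(0, 91):
    found = median_pixel(pixels, 0.05, math.radians(0.5 * k))
    if found and (not sequence or sequence[-1] != found[0]):
        sequence.append(found[0])
print(sequence)
```

Repeating the scan for each curvature from 0.05 to 0.35 gives the turning-point pixels and the corresponding D values from which angular variations like those in Figure 23 could be plotted.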
Figure 22. Placement of pixels within 3 × 3 and larger neighborhoods. (a) 3 × 3 neighborhood containing 9 pixels. (b) Truncated 5 × 5 neighborhood containing 21 pixels. (c) Truncated 7 × 7 neighborhood containing 37 pixels. (d) Truncated 9 × 9 neighborhood containing 69 pixels. (e) Truncated 11 × 11 neighborhood containing 97 pixels. (f) Truncated 13 × 13 neighborhood containing 137 pixels. All neighborhoods are octagonal except (a) and (f), and approximate as closely as possible to the circular formats shown.

Figure 23. Angular variations of median shifts for truncated 5 × 5 neighborhood. The graphs show the variations in steps of 0.05 from $\kappa = 0.05$ (lowest) to $\kappa = 0.35$ (highest). From Davies (1998).

The angular variations in shift that were obtained for the truncated 5 × 5 neighborhood are shown in Figure 23. Figure 24 contrasts the variations for square and truncated 5 × 5 neighborhoods by taking two specific curvature values, $\kappa = 0.20$ and 0.35. In particular, the highest curvature value $\kappa = 0.35$ shows the considerably higher anisotropy that exists for the square neighborhood. The mean shifts for the two types of neighborhood are shown in Figures 25 and 26a. The gradual trend from a quadratic toward a linear variation is apparent in both cases. Notice that, as in the 3 × 3 case, the quadratic variation arises from integration of the approximately linear segments in the angular variations [see Eq. (31)]. The closer adherence of the mean shift to linearity for the truncated neighborhood is by no means surprising.

Figure 24. Comparison of angular variations for square and truncated 5 × 5 neighborhoods. The angular variations for square and truncated neighborhoods are shown in gray and black, respectively, in two cases: (a) $\kappa = 0.20$ and (b) $\kappa = 0.35$. Notice the higher anisotropy exhibited by the square neighborhood.

Figure 25. Variations in median shifts for a square 5 × 5 neighborhood. The results for simulated circles are distinguishable only at the right-hand end of the upper curve, where they are shown dotted.

F. Numerical Computations for 7 × 7 Neighborhoods

Following on from the theory and numerical calculations for 5 × 5 neighborhoods, numerical calculations were carried out for 7 × 7 neighborhoods. The aim was to clarify the point at which the changeover from quadratic to linear variations occurs in the mean shifts, and to get some idea of how the isotropy improves for larger neighborhoods. The improvement in isotropy is demonstrated by Figure 27, which compares the angular variations in three cases: 3 × 3, truncated 5 × 5, and truncated 7 × 7 neighborhoods. The three graphs were produced for curvature values selected to give similar mean shifts, and the steady increase in isotropy through the series is apparent. Indeed, the effect is so marked that the 7 × 7 neighborhood variation never meets the axis, and there is no difficulty in imagining that the anisotropy will tend to zero for infinitely large neighborhoods.

Next, the mean shift is plotted in Figure 26b. Here it is rather surprising that though the initial quadratic variation appears to give way to a linear variation, further quadratic variations follow at higher curvatures. This means that further experiments are required to determine at exactly what point the expected linear mean shift variation finally takes over.
Figure 26. Variations in median shifts for truncated 5 × 5 and 7 × 7 neighborhoods. (a) Variation for truncated 5 × 5 neighborhood. (b) Variation for truncated 7 × 7 neighborhood. The results for simulated circles give such good agreement with these results that the variations are not distinguishable at this scale.
Figure 27. Comparison of angular variations for 3 × 3 and truncated 5 × 5 and 7 × 7 neighborhoods. The black graph shows the variation for a 3 × 3 neighborhood when $\kappa = 0.30$. The dark gray graph shows the variation for a truncated 5 × 5 neighborhood when $\kappa = 0.20$. The light gray graph shows the variation for a truncated 7 × 7 neighborhood when $\kappa = 0.15$. These values of $\kappa$ are chosen as they give comparable mean shifts: the figure illustrates the progressive improvement in isotropy as neighborhoods increase in size.

G. Tests of the Theory for 5 × 5 and 7 × 7 Neighborhoods

In Section IV.D, the methodology for measuring the median shifts with simulated circles was described in the case of 3 × 3 neighborhoods. In this section we describe the results of similar measurements made for 5 × 5 and 7 × 7 neighborhoods. In the 5 × 5 case each pixel was subdivided into arrays of 11 × 11 rather than 15 × 15 subpixels, both when defining the local intensities of the circles and when setting their precise positions (Davies, 1998); correspondingly lower levels of subdivision were also applied for even larger neighborhoods (see below), it being found that less exacting specifications led to sufficient accuracy when matching theory and experimental results.

The tests were again carried out for a range of curvature values, with the results shown in Figures 25 and 26. Figure 25 shows that in the case of square 5 × 5 neighborhoods, the agreement between theory and experiment is essentially exact up to $\kappa \approx 0.25$, and deviates only slightly at the upper end of the range. In addition, Figure 26 shows that the agreement for truncated 5 × 5 and 7 × 7 neighborhoods is exact over the whole range (the theoretical and experimental graphs cannot be separated at the given scale).

The improvements in agreement relative to the original work of Davies (1989) are due to two factors: one is that the discrete theory makes accurate predictions rather than approximate estimates; the other is that the circle simulations were carried out more precisely: in particular, circle data were obtained for, and averaged over, all possible positions relative to the discrete lattice. As stated in Section IV.D, this averaging process proved crucial to achieving high overall accuracy in the 3 × 3 case. The limiting value of $\kappa$ at which deviation started to occur for the square 5 × 5 neighborhoods was shown to arise because for higher values of $\kappa$ other parts of the circle boundary start to appear within the neighborhood: this corresponds to what happened in the 3 × 3 case (see Fig. 21).

H. Discussion

At this point we have acquired a fairly sound understanding of the median shifts as they apply for neighborhoods of various sizes. In particular, we have found that the angular variations become more isotropic as neighborhoods increase in size, and we have achieved almost perfect agreement between theory and experiment for the mean shifts. In the latter case, the remaining disagreements can be understood in terms of the dome-slicing effect in which the tops of the circles (intensity maxima for light circles or intensity minima for dark circles) are progressively cut off (recall that in the circle simulation experiments, the integrated intensity profile is used as the primary parameter from which the radius values are computed).

Nevertheless, one aspect of the median shifts remains unexplained. This is the nonadherence of the 7 × 7 mean shift variation to a form composed of a quadratic followed immediately by a linear variation. One obvious explanation is that the linear segments in the angular variation will, when integrated, lead to positive and negative quadratic sections in the mean variation, and these will ultimately give way to the expected linear variation: in that case only the position at which the variation becomes linear remains unknown.
Accordingly, we must attempt to determine this position, or else find for what size of neighborhood it finally becomes possible for a true linear variation to emerge. To this end, some sort of trend curve should be drawn from which this critical point can be deduced. This is attempted in the next section.

I. Trends for Large Neighborhoods

Following the discussion in the previous section, we now make some investigations of the trends in the mean shifts as the neighborhood size is increased. To achieve this we take normalized measurements both of shifts and of deviations from linearity, and determine how the deviation factor varies with the size of the neighborhood.
The first step is to determine a convenient working point at the upper end of each mean shift variation: this working point should if possible not be subject to deviations such as those shown in Figures 19 and 25, which are due to dome-slicing (see Fig. 21). To guarantee this we work at a level slightly below the circle size at which the object is liable to be eliminated by the median filter. This size is determined by the equation

$$\pi b^{2} = \tfrac{1}{2}\pi a^{2} \tag{38}$$

which leads to the curvature $\kappa = b^{-1} = \sqrt{2}\,a^{-1}$. Somewhat arbitrarily, the safe level was taken to be $\kappa = a^{-1}$ in all cases. (Note that if this turned out not to be an optimal level, the resulting trend curve should show obvious inconsistencies, thereby highlighting the problem.)

The second step is to find a precise value for the radius a of any neighborhood: as none of the neighborhoods presented in Figure 22 is exactly circular, a consistent set of values has to be assigned to the individual radii, and in principle these values should have an absolute significance. The working approximation that was adopted was identical to that assumed in Section III: the radius is defined in terms of the total area of the pixels in the neighborhood. Hence for a neighborhood containing n pixels, the radius a is given by $\pi a^{2} = n$, and we obtain

$$a = (n/\pi)^{1/2} \tag{39}$$

In principle this is a realistic approximation as it tends to the correct value as n tends to infinity. The resulting values of a are listed in Table 3.

TABLE 3
Neighborhood Parameters

Size        n      a
1 × 1       1    0.564
3 × 3       9    1.693
5 × 5      21    2.585
7 × 7      37    3.432
9 × 9      69    4.687
11 × 11    97    5.557
13 × 13   137    6.604

The third step is to calculate the safe curvature value $\kappa = a^{-1}$ by applying the deduced value of a for each neighborhood, and substituting for a and $\kappa$ in Eq. (13)⁶; at that stage an additional pixel sampling correction is included in $D_{th}$ as a result of theory presented in Section IV.J. The final theoretical shifts $D_{th}$ can then be plotted against the experimentally observed values $D_u$, as in Figure 28. This graph also includes experimental plots obtained for truncated 9 × 9, 11 × 11, and 13 × 13 neighborhoods.

The graph shown in Figure 28 is remarkably consistent and also shows a high absolute level of agreement between the theoretical linear shifts and the observed values. This adds support to the assignment of radius values given by Eq. (39), any variations from this equation at the large neighborhood end being less than about 3%. The relatively large discrepancy for a 3 × 3 neighborhood can almost certainly be ascribed to the linear region of the graph not being quite reachable before the circles are eliminated rather than merely reduced in size, as is indeed relatively obvious from Figure 19. Overall, the continuum model seems able to explain the shifts accurately at their upper reaches. Thus the remaining discrepancy between theory and practice relates to the deviations from linearity of the mean shift variations previously noted in the 7 × 7 case.

To proceed further we need a measure of the deviation from linearity of the mean shift variations. The method that was adopted was to determine the maximum difference in shift between the observed and linear models (Fig. 29), and to express this difference as a proportion $E_{rel}$ of the selected upper shift value (at $\kappa = a^{-1}$). In fact, $E_{rel}$ expresses the relative error for each neighborhood in a dimensionless form that should decline with increase in neighborhood size. The results obtained when plotting values of $E_{rel}$ against neighborhood radius a are shown in Figure 30. It is clear that the discrepancy tends to zero as a tends to infinity. In fact, the continuous graph shown in Figure 30 represents a best fit $a^{-1}$ variation, and, if anything, the tendency to zero is even more rapid than this. (A priori, an $a^{-1}$ variation would have been expected because the fractional displacement error in representing a curve by discrete pixels should be inversely proportional to the size of the pixels.) Thus we have shown with a fair degree of rigor that the quadratic variations tend to zero for infinitely large neighborhoods, and that the continuum model then becomes accurate over the whole range of $\kappa$.

⁶ To be precise, Eq. (13) is not linear in $\kappa$, as it contains a cubic correction term; however, we shall still refer to the variation as being "linear" in spite of this.

Figure 28. Agreement between observed ($D_u$) and theoretical ($D_{th}$) shifts. The crosses correspond to a theoretical model that corrects for intrapixel shifts. The straight line through the origin would correspond to exact agreement between $D_u$ and $D_{th}$. The individual plots run from 1 × 1 at the lower left to 13 × 13 at the upper right.

Figure 29. Construction needed to calculate relative absolute error $E_{rel}$.

Figure 30. Trend for relative absolute error $E_{rel}$. The continuous curve is the best fit of the form $a^{-1}$ to the experimental plots and indicates the rapid tendency to zero with increase in neighborhood size. The individual plots run from 3 × 3 at the upper left to 13 × 13 at the lower right.
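The neighborhood radii of Table 3 follow directly from Eq. (39), and the two working curvatures follow from Eq. (38). The short sketch below recomputes them; the dictionary and function names are hypothetical, and the pixel counts are those quoted in the caption of Figure 22 plus the degenerate 1 × 1 case.

```python
import math

# Pixel counts n per neighborhood, as listed in Table 3.
pixel_counts = {"1 x 1": 1, "3 x 3": 9, "5 x 5": 21, "7 x 7": 37,
                "9 x 9": 69, "11 x 11": 97, "13 x 13": 137}

def effective_radius(n):
    """Radius a of the circle whose area equals n unit pixels, Eq. (39)."""
    return math.sqrt(n / math.pi)

for size, n in pixel_counts.items():
    a = effective_radius(n)
    safe_kappa = 1.0 / a               # working point kappa = 1/a
    limit_kappa = math.sqrt(2.0) / a   # elimination limit from Eq. (38)
    print(f"{size:>7}  n={n:3d}  a={a:.3f}  "
          f"safe kappa={safe_kappa:.3f}  limit kappa={limit_kappa:.3f}")
```

The printed values of a can be compared line by line with the third column of Table 3.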

J. Effect of Sampling at the Center of a Pixel

In Section IV.I, it was stated that an additional pixel sampling correction was included in $D_{th}$ before constructing the graph in Figure 28. To understand this, recall that modern CCD sensors operate by averaging light intensity over the whole of the pixel area (ignoring a small percentage of blank area between one physical pixel and the next). This means that if the intensity contours are curved, similar shift effects will occur as for square neighborhoods containing many pixels, though naturally the geometric details will change. Here we concentrate on the case of a step edge profile, and consider a circular step edge passing close to the center of a single pixel (Fig. 31).

To calculate the shift produced by within-pixel averaging, it is only necessary to find the position at which a circular arc of curvature $\kappa$ and orientation $\theta$ bisects the area of the neighborhood (here a degenerate 1 × 1 neighborhood). Taking the effective width of the neighborhood in this direction as $2\tilde{a}$, it can be seen (see Section III.A⁷) that to first order the shift must be $\tfrac{1}{6}\kappa\tilde{a}^{2}$. Clearly, in this special case the effective width of the neighborhood will vary with orientation $\theta$, being a minimum of $2a_0$ when $\theta = 0°$ and a maximum of $2\sqrt{2}\,a_0$ when $\theta = 45°$. In fact, the formula for the half-width $\tilde{a}$ is

$$\tilde{a} = a_0\sec\theta \tag{40}$$

and the shift is therefore $\tfrac{1}{6}\kappa a_0^{2}\sec^{2}\theta$. Now the mean value of $\sec^{2}\theta$ is

$$\overline{\sec^{2}\theta} = \left(\int_0^{\pi/4}\sec^{2}\theta\,d\theta\right)\Big/(\pi/4) = \frac{4}{\pi}\left[\tan\theta\right]_0^{\pi/4} = \frac{4}{\pi} \tag{41}$$

Thus the mean shift is

$$D_1 \approx \frac{1}{6}\kappa a_0^{2}\cdot\frac{4}{\pi} = \frac{2}{3\pi}\kappa a_0^{2} \tag{42}$$

⁷ Section V.A provides a much fuller justification.
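The orientation average in Eq. (41) and the resulting coefficient in Eq. (42) are easy to confirm numerically. In the sketch below the function names are hypothetical, and the values of $\kappa$ and $a_0$ passed in the example call are arbitrary illustrative inputs rather than values taken from the text.

```python
import math

def mean_sec2(steps=100_000):
    """Midpoint-rule mean of sec^2(theta) over 0 <= theta <= pi/4, Eq. (41)."""
    h = (math.pi / 4) / steps
    total = sum(1.0 / math.cos((i + 0.5) * h) ** 2 for i in range(steps))
    return total * h / (math.pi / 4)

def mean_within_pixel_shift(kappa, a0):
    """Orientation-averaged within-pixel shift (1/6) kappa a0^2 <sec^2>, Eq. (42)."""
    return kappa * a0 ** 2 * mean_sec2() / 6.0

print(mean_sec2(), 4 / math.pi)
print(mean_within_pixel_shift(0.2, 0.5), 2 * 0.2 * 0.5 ** 2 / (3 * math.pi))
```

Each printed pair should agree to many decimal places, confirming the factor $2/(3\pi)$.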
Note, however, that the fact that a curve is being considered means that there are intrinsically two parts to the calculation: one is the part carried out already, which assumes that the curve intersects opposite sides of the pixel (Fig. 31); the other arises because for a small range of orientations near $\theta = 45°$ the curve will pass through two adjacent sides of the pixel. This latter situation has quite different geometry and will cause $\tilde{a}$ to have an altered orientation dependence. However, for low curvatures $\kappa$, this will lead to only a small correction to the above formula: this will not be considered further here, as the shift is very small. Indeed the shift is necessarily a factor of about n smaller than the normal median shift for an n-pixel neighborhood (in the case of a 3 × 3 neighborhood, the factor is about 7.1, and for a 5 × 5 neighborhood it is about 16.5). Hence it will be permissible to ignore this effect in most cases.

Note that when median filtering is carried out and the median pixel in a neighborhood is not the central pixel, an additional small orientation effect will arise: this occurs because the within-pixel shift will have to be added vectorially to the pure median shift, and the effect will be a small reduction in the effective within-pixel shift. Note also that the effect described above arises from pure averaging, and emulates the situation for mean filtering rather than median filtering.

Finally, we consider the interesting case of a 1 × 1 neighborhood. In general terms, a median filter acting within a 1 × 1 neighborhood must act as an identity operator, for which the shift is zero. However, it should still be subject to the pixel sampling correction described above: just as a larger neighborhood gives rise to an intrinsic shift to which the sampling correction has to be added, so in the 1 × 1 case the sampling correction should still be included when the intrinsic shift happens to be zero. In fact, such an effect will not be observed using the simulated circle type of test, as the integrated intensity of any circle is necessarily identical to that of the idealized circle (within an experimental error that can be made vanishingly small by employing enough subpixels in the simulation). This means that because of the way in which the measurement is made, experiment and theory cannot agree in this case: the lowest plot in Figure 28 shows this discrepancy. Nevertheless, the theory corresponds to the actual shift that will be observed when individual edge points rather than whole circles are examined.

Overall, these considerations indicate that when full account is taken of the limitations of the method by which the edge shifts have been measured, the true situation is that there can only be a slight trend away from perfect agreement at the lower end of the graph in Figure 28. However, the really important point is that there is almost perfect agreement in the upper reaches of the graph, so it is now understood exactly how the discrete lattice results approach a continuum as $n \to \infty$.

Figure 31. Close-up view of situation within the central pixel, with the median contour passing close to, but not through, the center of the central pixel.
K. Case of Median Filter with Small Circles In previous sections we have considered only the case of relatively large circles that will not be eliminated by the median filter. However, no theory of median filtering can be complete without some consideration being given to the case of small circles. We attempt to eliminate this deficiency in the present section. First, we recall that circles that will be eliminated by a median filter are those whose areas b2 are smaller than half thepffiffiarea a2 of the ffi neighborhood, the limiting case being given by b ¼ a= 2. In practice it is found that some larger circles (with lower curvatures) are liable to be eliminated because of discrete effects and spacings in a digital lattice. As a result the limit is closer to b a than to the theoretical limit given above (see also Section IV.I). Figure 32 shows the expected situation in terms of curvature. Above a critical value of curvature, the circle suddenly disappears, and consequently the shift suddenly increases to b ¼ 1=; the curve 1= is sketched in as the asymptotic graph for all such cases. Next, we note that tests with gray-scale circles do not yield exactly this variation, but rather seem to show a
Figure 32. Expected edge shifts for median filter applied to small circles. (a) Basic variation. (b) $1/\kappa$ variation. (c) Nature of observed variation.
continuation of the approximately linear ($\frac{1}{6}\kappa a^2$) behavior, until eventually the graph meets the $1/\kappa$ curve (Fig. 32c). It seems unlikely that discrete effects are required to explain this general situation, since these are mainly manifest up to about $\kappa = 1/a$ (Section IV.B). Hence this is more likely to be a continuum effect. However, a simple explanation of this behavior is that the gray-scale circles used in the simulation approximate not to step-edge circles but to circles with a rapidly varying slant edge, which we will here take to be linear in form, as indicated in Figure 33. In that case, the circle will not disappear completely until the lowest intensity component disappears. Likewise, it will start disappearing when the highest intensity component disappears. These two events occur respectively for

$$b_1 = b + t/2 \tag{43}$$

$$b_2 = b - t/2 \tag{44}$$

where $t$ is the overall width of the slant edge: note that $b_1$ leads to the low curvature value $\kappa_1 = 1/b_1$ and $b_2$ leads to the high curvature value $\kappa_2 = 1/b_2$. ($\kappa = 1/b$ is the nominal curvature corresponding to a point half-way up the slant edge.) Next, the value of $b_2$ can be measured from the observed variation. This has been carried out for neighborhoods varying in size from $5 \times 5$ to $13 \times 13$ pixels, the results giving a remarkably consistent value of $t$: 1.422, 1.492, 1.433, 1.407, 1.496; the mean is $1.450 \pm 0.041$, whereas the median of these values is 1.433. Theoretically we would expect $t$ to lie between 1 and $\sqrt{2}$
(1.414) and to have a value of about 1.20, as a result of averaging the effects of different edge orientations. However, the measurements on the simulated circles are given by the highest value of $\kappa$, where the $D$ variation finally joins the $1/\kappa$ curve in Figure 34: this corresponds to places where the edge attains its maximum width of $\sqrt{2}$, thus leading to the value $t \approx \sqrt{2}$. Considering the accuracy of the circle simulations, 1.450 agrees very well with this value, and there is no real discrepancy between theory and experiment. Nevertheless, we adhere to the experimental value of $t$ in subsequent calculations (see, for example, Section VI.D).
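The summary statistics quoted for $t$ can be checked directly. The following is a minimal sketch using only Python's standard library; it assumes that the quoted uncertainty is the sample standard deviation of the five measured values, which the text does not state explicitly.

```python
import statistics

# Values of t measured for neighborhoods from 5 x 5 to 13 x 13, as listed in the text.
t_values = [1.422, 1.492, 1.433, 1.407, 1.496]

t_mean = statistics.mean(t_values)
t_spread = statistics.stdev(t_values)    # assumption: sample (n - 1) standard deviation
t_median = statistics.median(t_values)

print(f"mean = {t_mean:.3f}, spread = {t_spread:.3f}, median = {t_median:.3f}")
```

Under that assumption the three numbers round to the mean, uncertainty, and median quoted above.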
Figure 33. Linear slant edge model of a circular object.
Figure 34. Expected edge shifts for median filter using linear slant edge model.
TABLE 4
Curvature Breakpoints for Median Filter

$n$     $p$     $a$       $\kappa_0$   $\kappa_1$   $\kappa_2$
3       9       1.693     0.836        0.520        2.119
5       21      2.585     0.547        0.392        0.906
7       37      3.432     0.422        0.317        0.588
9       69      4.687     0.302        0.248        0.386
11      97      5.557     0.255        0.215        0.312
13      137     6.604     0.214        0.185        0.254
Table 4 lists the curvature values at which circles should start and finish being eliminated by the median filter, for various neighborhoods up to $13 \times 13$ pixels. $\kappa_0$ is the value of $\kappa$ at which the circle nominally disappears, and this value should essentially be stretched out by the effect of the slant width $t$, so that the circle will disappear over the range $\kappa_1$ to $\kappa_2$ (Fig. 34). However, the observed variation is not jerky as in Figure 34, and actually merges gradually with the $\frac{1}{6}\kappa a^2$ type of variation (Fig. 32c): as a result it is not possible to estimate the value of $\kappa_1$ from the observed variation. The reason for the gradual variation is that the rate of change of integrated intensity of a circular object as it moves into or out of the neighborhood is necessarily slow. Finally, it is pertinent to note that although this effect is explainable on a continuum model, its origins still lie in the discrete model, as is illustrated by the fact that $t$ is quite close to 1 pixel in value. However, the principles presented here should still apply if wider slant edges arise in practical situations.
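The entries of Table 4 can be regenerated from the relations of this section. The sketch below is illustrative only and rests on two assumptions that are inferred rather than stated in the text: that $a$ is the radius of the circle whose area equals the pixel count $p$, and that the breakpoints are $\kappa_0 = \sqrt{2}/a$, $\kappa_1 = 1/(b_0 + t/2)$ and $\kappa_2 = 1/(b_0 - t/2)$ with $b_0 = a/\sqrt{2}$ and the experimental value $t = 1.450$. The function name `median_breakpoints` is hypothetical.

```python
import math

T_SLANT = 1.450  # experimental slant-edge width t adopted in the text

def median_breakpoints(p, t=T_SLANT):
    """Return (a, kappa0, kappa1, kappa2) for a neighborhood of p pixels.

    Assumption: a is the area-equivalent radius, pi * a**2 = p.
    """
    a = math.sqrt(p / math.pi)
    b0 = a / math.sqrt(2.0)          # limiting radius for elimination by the median
    return a, 1.0 / b0, 1.0 / (b0 + t / 2.0), 1.0 / (b0 - t / 2.0)

# (n, p) pairs taken from Table 4.
for n, p in [(3, 9), (5, 21), (7, 37), (9, 69), (11, 97), (13, 137)]:
    a, k0, k1, k2 = median_breakpoints(p)
    print(f"{n:2d} {p:4d} {a:6.3f} {k0:6.3f} {k1:6.3f} {k2:6.3f}")
```

Under these assumptions the sketch agrees with the tabulated values to about three decimal places, with one exception: for $n = 7$ it gives $\kappa_0 \approx 0.412$ rather than the tabulated 0.422.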
V. Shifts Produced by Mean Filters

In this section we consider the shifts produced by mean filters in continuous images. As in the case of median filters, straight edges with symmetrical edge profiles cannot be shifted by mean filters, on account of symmetry. Hence we proceed directly to the two paradigm cases, step edges with circular boundaries and slant edges with circular boundaries. In both cases, the effects of noise will be ignored as we are considering the intrinsic rather than the noise-induced behavior of the mean filtering operation.

A. Shifts for Step Edges

To understand the situation for a curved step edge, we appeal again to Figure 14b, which shows the local intensity distributions that occur for
various displacements of the boundary. It is quickly seen that the result for the mean has to be identical to that for the median, because the local intensity distribution is exactly symmetric and bimodal at the point where the median filter is just switching from a left hand to a right hand decision: at that point the mean must give the same answer, since the median and the mean are coincident for a symmetric distribution. Hence we have shown that both median and mean give a shift of $\frac{1}{6}\kappa a^2$ for a curved step edge.
B. Shifts for Linear Slant Edges

We now proceed to calculate edge shifts where smoothly varying intensity functions exist. Basically we follow the methodology of an earlier section that studied edge shifts produced by median filters. However, the median calculation focused on the position of the median intensity contour, and was able to ignore the intensity pattern in the remainder of the neighborhood, as long as the intensities on one side of the median intensity contour are above and on the other side below that of the median. Here, the situation is different, as the mean takes account, through weighting, of all the intensity values in the neighborhood. Hence we adopt (Davies, 1991a) the simplest paradigm that will permit a rigorous calculation to be performed: the chosen paradigm is a linearly increasing intensity profile with curved contours of equal radius, and a curved step edge of known radius. Using the geometry of Figure 35, we find the mean intensity within a circular neighborhood $C$ using the equation

$$\bar{I} = \frac{1}{\pi a^2} \iint_C I(x, y)\, dx\, dy \tag{45}$$

If $(x_0, 0)$ is the position on the x-axis and on the same intensity contour as the general point $(x, y)$, then we have

$$(x - x_0 - b)^2 + y^2 = b^2 \tag{46}$$

where we are taking all relevant intensity contours as having the same radius $b$ (Fig. 14a). In fact, the intensity at $(x, y)$ will be given by $\gamma x_0$, where $\gamma$ denotes the constant intensity gradient, since we assume that the basic intensity profile is linear, as stated above. Thus

$$I(x, y) = I(x_0, 0) = \gamma x_0 = \gamma \left[ x - b + (b^2 - y^2)^{1/2} \right] \tag{47}$$

where we have ignored the solution with the negative square root, since this would correspond to a position $(x, y)$ outside the neighborhood.

Figure 35. Geometry for calculation of contour shifts using the mean filter. From Davies (1991a).

A series expansion now gives

$$I(x, y) \approx \gamma \left[ x - b + b(1 - y^2/2b^2 + \cdots) \right] \approx \gamma (x - y^2/2b) \tag{48}$$

$$\therefore\ \bar{I} \approx \frac{\gamma}{\pi a^2} \iint_C \left( x - \frac{y^2}{2b} \right) dx\, dy \tag{49}$$

By symmetry the first term integrates to zero, and we integrate the remaining term by converting to polar coordinates ($x = r \cos\theta$, $y = r \sin\theta$):

$$\therefore\ \bar{I} \approx -\frac{\gamma}{2\pi a^2 b} \int_0^{2\pi}\!\!\int_0^a r^2 \sin^2\theta \; r\, dr\, d\theta = -\frac{\gamma a^2}{8b} \tag{50}$$

Now this intensity value would normally have arisen at location

$$x_0 = \bar{I}/\gamma = -\frac{a^2}{8b} \tag{51}$$
i.e., we have deduced that there is an effective right shift of the edge by $a^2/8b$, or more generally by $\frac{1}{8}\kappa a^2$, where $\kappa$ is the local curvature on the intensity contours. It is a simple matter (Davies, 1991a) to modify the above result to Gaussian instead of uniform weighting. We start by writing

$$\bar{I} = \int_0^{2\pi}\!\!\int_0^{\infty} I(x, y) \exp(-r^2/2\sigma^2)\, r\, dr\, d\theta \Big/ \int_0^{2\pi}\!\!\int_0^{\infty} \exp(-r^2/2\sigma^2)\, r\, dr\, d\theta \tag{52}$$
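where $I(x, y)$ is still given by Eq. (48), and the resulting shift is given by

$$D = -\bar{I}/\gamma \tag{53}$$

with $\gamma$ the constant intensity gradient of the linear profile. In this case the term in $x$ [Eq. (48)] again vanishes identically, by symmetry. Hence

$$D = \frac{1}{2b} \cdot \frac{\int_0^{\infty} r^2 \exp(-r^2/2\sigma^2)\, r\, dr}{\int_0^{\infty} \exp(-r^2/2\sigma^2)\, r\, dr} \cdot \frac{\int_0^{2\pi} \sin^2\theta\, d\theta}{\int_0^{2\pi} d\theta} = \frac{1}{2b} \cdot 2\sigma^2 \cdot \frac{1}{2} = \sigma^2/2b \tag{54}$$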
i.e., we have a right shift of $\frac{1}{2}\kappa\sigma^2$. We may note that the two results obtained above for the right shift are comparable, since they coincide when $\sigma = a/2$ (equating $\frac{1}{8}\kappa a^2$ with $\frac{1}{2}\kappa\sigma^2$). This is reasonable, since a Gaussian drops fairly rapidly around the value $r = 2\sigma$. Returning now to consider the mean, it is of interest to compare the results for the mean and median filters. Note that these are nearly identical, differing only slightly in a numerical factor: $\frac{1}{8}$ for the mean and $\frac{1}{6}$ for the median. The fact that the median filter gives marginally higher shift is understandable since the median focuses on the median contour, rather than taking account of the precise intensity values over the whole neighborhood.
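Both shift formulas of this section can be checked numerically without the series expansion. The sketch below averages the exact contour position $x - b + (b^2 - y^2)^{1/2}$ over the neighborhood by a midpoint rule in polar coordinates, taking the intensity gradient as unity; the function names and the particular values of $a$, $b$, and $\sigma$ are illustrative assumptions, and the Gaussian integral is truncated at $8\sigma$.

```python
import math

def contour_origin(x, y, b):
    """Position x0 on the x-axis of the contour of radius b through (x, y)."""
    return x - b + math.sqrt(b * b - y * y)

def weighted_shift(b, weight, r_max, n_r=400, n_theta=360):
    """Rightward shift: minus the weighted mean of x0 over a disc of radius r_max."""
    dr = r_max / n_r
    dtheta = 2.0 * math.pi / n_theta
    num = den = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        w = weight(r) * r                      # r is the polar-coordinate Jacobian
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            num += w * contour_origin(r * math.cos(theta), r * math.sin(theta), b)
            den += w
    return -num / den

a, b, sigma = 1.0, 10.0, 0.5                   # illustrative values with b >> a
uniform = weighted_shift(b, lambda r: 1.0, r_max=a)
gaussian = weighted_shift(b, lambda r: math.exp(-r * r / (2.0 * sigma ** 2)),
                          r_max=8.0 * sigma)
print(uniform, a * a / (8.0 * b))              # compare with a^2 / 8b
print(gaussian, sigma ** 2 / (2.0 * b))        # compare with sigma^2 / 2b
```

For contours that are only gently curved relative to the neighborhood, the numerical averages should lie within a fraction of a percent of the first-order formulas; the residual is the neglected higher-order term of the expansion.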
C. Discussion

Section V has investigated the shifts on intensity contours and edges that are caused by application of median, mean, and Gaussian filters. Mean filters were found to produce shifts similar to, but generally slightly smaller than, those for median filters, and in both cases these can be interpreted by the form

$$\text{shift} = \varphi\, S\, c \tag{55}$$

where $\varphi = 2a\kappa$ is the angle through which the contours turn within the neighborhood, $S$ is a parameter indicating the linear dimensions of the neighborhood, and $c$ is a numerical factor depending on the particular filter employed. That these filters should differ only in a numerical factor is reasonable since they all operate by averaging mechanisms that have similar fundamental effects. [Note the general feature of Eq. (55) that straight edges have $\varphi = 0$ and give zero shift.]
An important observation is that edge shifts are not automatically avoided merely by choosing an alternative method of filtering, since they arise as a fundamental consequence of whole-neighborhood averaging operations. It is intriguing that the median always gives the same shift, whereas the mean gives a shift varying from the value $\frac{1}{6}\kappa a^2$ to the value $\frac{1}{8}\kappa a^2$ as we go from a step edge to a linearly rising intensity variation. Since these are two extremes within a continuum, we can expect that the shift for a mean filter will always be between these limiting values. Finally, to test this theory, the methods of Section III should be suitable. However, the small shifts produced by the mean filter will largely be masked by the blurring it introduces, and will therefore be difficult to measure accurately.
VI. Shifts Produced by Mode Filters

In this section we consider the shifts produced by mode filters in continuous images. As in the cases of median and mean filters, straight edges with symmetrical edge profiles cannot be shifted by mode filters, because of symmetry. Again we proceed to the two paradigm cases: step edges and slant edges with circular boundaries. Again, the effects of noise will be ignored as we are considering the intrinsic rather than the noise-induced behavior of the mode filtering operation.

A. Shifts for Step Edges

The situation for a curved step edge can again be understood by appealing to Figure 14b. The result for the mode also has to be identical to that for the median, because the local intensity distribution is exactly symmetric and bimodal at the point where the median filter switches from a left hand to a right hand decision: at that point the mode must give the same result, since the median and the mode are coincident for a symmetric distribution. Hence we conclude that the mode also gives a shift of $\frac{1}{6}\kappa a^2$ for a curved step edge.

B. Shifts for Slant Edges

In this section we calculate edge shifts in a simple case where smoothly varying intensity functions exist. Basically we follow the methodology of earlier sections that studied edge shifts produced by median, mean, and Gaussian filters.
In this case the calculation is especially simple (Davies, 1997b). Using the geometry of Figure 14a, we consider the intensity pattern within a circular neighborhood $C$. Of all the circular intensity contours appearing within $C$, the one possessing the most frequently occurring intensity, as selected by a mode filter, is the longest. Clearly, this is the one ($M$) whose ends are at opposite ends of a diameter of $C$. To estimate the shift in this case, all we need to do is to calculate the position of $M$, and determine its distance from the center of $C$. To proceed, we use the well-known formula relating the lengths of parts of intersecting chords of a circle, which in this case gives (see also Section IV.A)

$$a^2 = D(2b - D) \tag{56}$$

$$\therefore\ D^2 - 2bD + a^2 = 0 \tag{57}$$

Hence

$$D = b - \sqrt{b^2 - a^2} = b - b\left[ 1 - \tfrac{1}{2}(a/b)^2 - \tfrac{1}{8}(a/b)^4 - \cdots \right] \approx \tfrac{1}{2} a^2/b + \tfrac{1}{8} a^4/b^3 \tag{58}$$

where we have chosen the negative square root to ensure getting a solution within $C$. Writing $b = 1/\kappa$, where $\kappa$ is the curvature of the contours appearing within $C$, we find

$$D \approx \tfrac{1}{2}\kappa a^2 + \tfrac{1}{8}\kappa^3 a^4 \tag{59}$$

or to fair accuracy

$$D = \tfrac{1}{2}\kappa a^2 \tag{60}$$

i.e., there is a right shift of the contour, toward the local center of curvature, of $\frac{1}{2}\kappa a^2$. If we regard this set of contours as forming part of a gray-scale edge profile, then the mode filter shifts the edge through $\frac{1}{2}\kappa a^2$ toward the center of curvature.
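The quality of the approximations in Eqs. (59) and (60) is easy to gauge numerically. The following sketch compares the exact negative-root solution of Eq. (57) with the two-term and one-term expansions; the function names and the sample radii are illustrative choices, not values from the text.

```python
import math

def mode_shift_exact(a, b):
    """Negative-root solution of Eq. (57): D = b - sqrt(b^2 - a^2), valid for b >= a."""
    return b - math.sqrt(b * b - a * a)

def mode_shift_series(a, kappa):
    """Two-term expansion of Eq. (59)."""
    return 0.5 * kappa * a ** 2 + 0.125 * kappa ** 3 * a ** 4

a = 1.0
for b in (2.0, 5.0, 20.0):                     # illustrative contour radii
    kappa = 1.0 / b
    exact = mode_shift_exact(a, b)
    series = mode_shift_series(a, kappa)
    leading = 0.5 * kappa * a ** 2             # Eq. (60)
    print(f"kappa={kappa:.3f}  exact={exact:.6f}  series={series:.6f}  leading={leading:.6f}")
```

The leading term always slightly underestimates the exact shift, since every neglected term of the expansion is positive.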
C. Discussion

Some comment on the marked difference between the cases of step edges and linear intensity profiles is called for. This is all the more interesting as the median filter produces identical shifts, of $\frac{1}{6}\kappa a^2$, for the two profiles (see
Table 5). In fact, of all the cases listed in Table 5, the outstanding one is the large shift for a mode filter operating on a linear intensity profile: what is special in this case is that the result relies on a single extreme contour length rather than an average of lengths amounting to an area measure. Hence it is not surprising that the mode filter gives an exceptionally large shift in this case. Next, when a mode filter is applied to a nonlinear case such as an edge with a sigmoidal intensity profile, an interesting situation arises. To understand this properly, note that variations in intensity gradient within the neighborhood affect the distribution of intensities, and that the distribution will be highest where the gradient is lowest. This means that if the gradient drops to zero at any point in the neighborhood, we revert to the type of situation that applies in the step edge case. However, another factor is also relevant: this is the fact that the neighborhood is circular. This means that what happens in the outermost (low and high intensity) reaches of the neighborhood will be less important, as relatively few pixels will be involved. As a result, the shift of $\frac{1}{2}\kappa a^2$ that applies for a linear slant edge will remain approximately correct until distinct plateaus of intensity start to encroach upon the central section of the neighborhood (Fig. 36). This is the sense in which the term "intermediate" should be understood in the mode column of Table 5.

This section has investigated the shifts of intensity contours and edges that are introduced by application of mode filters. Mode filters are found to produce similar but generally larger shifts than those produced by median filters, and these can again be interpreted in terms of the angle through which the intensity contours turn within the neighborhood [see Eq. (55)].
Finally, we note again that edge shifts are not avoided merely by choosing an alternative method of neighborhood averaging, but rather that they are intrinsic to the averaging process, and can be avoided only by specially designed operators (e.g., see Greenhill and Davies, 1994).
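TABLE 5
Summary of Edge Shifts for Neighborhood Averaging Filtersᵃ

Edge type       Mean                          Median                      Mode
Step            $\frac{1}{6}\kappa a^2$       $\frac{1}{6}\kappa a^2$     $\frac{1}{6}\kappa a^2$
Intermediate    $\sim\frac{1}{7}\kappa a^2$   $\frac{1}{6}\kappa a^2$     $\frac{1}{2}\kappa a^2$
Linear          $\frac{1}{8}\kappa a^2$       $\frac{1}{6}\kappa a^2$     $\frac{1}{2}\kappa a^2$

ᵃ From Davies (1999b).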
Figure 36. Position of mode within a circular neighborhood. (a) Models of slant edge. (b) Ranges of values of inverse intensity gradient. (c) Spatial distributions of intensities for a neighborhood of circular shape. (Left) Results for an "intermediate" slant edge, leading to a single mode. (Right) Results for an almost sigmoidal slant edge, leading to two modes. The two situations lead to quite different values of shift.
D. Case of Mode Filter with Small Circles

This section echoes Section IV.K, which discussed how the median filter copes with small circles that are liable to be eliminated by the filter. In fact, the mode filter shows much more rigorous elimination of small circles than the median filter. However, it exhibits this property only above a certain critical value of curvature $\kappa_l$ (Fig. 37), adhering closely to the median characteristics below the critical level. Furthermore, there are actually two critical values, one ($\kappa_l$) at which the gradient of the shift characteristic increases markedly, and the other ($\kappa_u$) at which the characteristic merges rapidly with the asymptotic $1/\kappa$ curve. It is possible to explain these properties in very much the same way as for the median case (Section IV.K). We again assume that the gray-scale circles used in the simulation approximate not to step-edge circles but to circles with a rapidly varying slant edge, which are approximately linear in form, as indicated in Figure 33. However, the detailed explanation of how the mode shift occurs differs markedly from that for the median. In fact, the mode operates by concentrating on the plateaus and determining which of these has the larger area: this is an obvious generalization of the step edge case discussed in Section VI.A. There are two general cases, as shown in
Figures 38 and 39. As the curvature of the circle increases there is a position at which the area of the inner (highest intensity) circle falls completely within the neighborhood: that is the stage at which the gradient of the shift characteristic suddenly increases, and is a limiting case of each of the two cases shown in Figures 38 and 39. Finally, when the outer circle falls completely within the neighborhood (another limiting case of Fig. 39), there is a point at which no adjustment of the position will prevent the circular object from disappearing, and this corresponds to the point at which the characteristic meets the $1/\kappa$ curve, and again the gradient changes rapidly.
Figure 37. Expected edge shifts for mode filter using linear slant edge model. (a) and (b) Mode variation. (c) Continued median variation.
Figure 38. One general case of circular object with slant edges intersecting neighborhood n. The two shaded regions have equal area at the point at which the mode filter switches between output values.
Figure 39. Another general case of circular object with slant edges intersecting neighborhood n. The two shaded regions have equal area at the point at which the mode filter switches between output values.
Next we calculate the two breakpoints at which the mode characteristic changes gradient, using the notation of Section IV.K and Figure 33. The upper mode breakpoint in the variation is relatively easy to calculate, using the following formula:

$$\pi a^2 - \pi (b + t/2)^2 = \pi (b - t/2)^2 \tag{61}$$

which results from the fact that the area outside the outer circle (but within the neighborhood) has to equal the area inside the inner circle, at the point at which the mode filter is about to eliminate the remainder of the object. Simplifying, we find that

$$a^2 = 2b^2 + t^2/2 \tag{62}$$

so the upper mode breakpoint is

$$\kappa_u = b^{-1} = (a^2/2 - t^2/4)^{-1/2} \tag{63}$$

It is not possible to obtain a closed formula for the lower mode breakpoint $\kappa_l$. Its value has to be estimated numerically by adjusting the position of the gray-scale circle until the two crucial areas are equal (consider the limiting case lying between Figs. 38 and 39). Figure 40 gives the values of the breakpoints obtained in this way, and also shows the values obtained from the mode curves in Figure 37. In view of the simplicity of the model, the degree of agreement between estimated and observed values is good. Finally, note that the mode filter breakpoints have been obtained using no assumptions other than those made in developing the corresponding model for a median filter: the single parameter $t$ relates to fitting the median rather than the mode. Overall, the general features of the behavior of the mode filter now appear to be understood quite well, and with reasonable numerical accuracy. Indeed, it is perhaps surprising that so much has been achieved using a
Figure 40. Curvature breakpoints for mode filter. The continuous curves show the upper and lower mode breakpoints estimated using the model described in the text. The plots show the observed values of these breakpoints. Agreement between model and experiment is within about 4%, but is closer to 1% if the differences between upper and lower breakpoints are compared.
continuum model, though creating a discrete model would in this case be rather difficult.
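As a small illustration of Eqs. (61) to (63), the sketch below evaluates the upper mode breakpoint and confirms that the resulting radius satisfies the area balance of Eq. (61). The function name is hypothetical, and the sample values of $a$ and $t$ (the $5 \times 5$ entry of Table 4 and the experimental slant width) are chosen only for illustration.

```python
import math

def upper_mode_breakpoint(a, t):
    """Eq. (63): kappa_u = (a^2/2 - t^2/4)^(-1/2)."""
    b_squared = a * a / 2.0 - t * t / 4.0
    if b_squared <= 0.0:
        raise ValueError("slant width t is too large for this neighborhood radius")
    return 1.0 / math.sqrt(b_squared)

a, t = 2.585, 1.450                 # illustrative: 5 x 5 neighborhood radius and slant width
kappa_u = upper_mode_breakpoint(a, t)
b = 1.0 / kappa_u

# Area balance of Eq. (61): annulus outside the outer circle vs. inner circle.
outside_outer = math.pi * a ** 2 - math.pi * (b + t / 2.0) ** 2
inside_inner = math.pi * (b - t / 2.0) ** 2
print(kappa_u, outside_outer, inside_inner)
```

The lower breakpoint has no closed form, as noted above, so it is not attempted here.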
VII. Shifts Produced by Rank-Order Filters

This section is particularly concerned with rank-order filters (Bovik et al., 1983), which form a whole family of filters that can be applied to digital images, often in combination with other filters of the family, in order to give a variety of effects (Goetcherian, 1980; Hodgson et al., 1985): other notable members of the family are max and min filters. Because rank-order filters generalize the concept of the median filter, it is relevant to study the types of distortion they produce on straight and curved intensity contours. It should also be pointed out that these filters are of central importance in the design of filters for morphological image analysis and measurement. In addition, it has been pointed out that they have some advantages when used for this purpose in that they help to suppress noise (Harvey and Marshall, 1995) (though note that the effect vanishes in the special cases of max and min filters).

Section VII.A examines the reasons underlying the shifts produced by rank-order filters and makes calculations of their extent for rectangular neighborhoods. Sections VII.B and C generalize these results to circular neighborhoods. Section VII.D examines the extent to which the theoretical predictions of the previous sections are borne out in practice by
measurements of the shifts produced by $5 \times 5$ rank-order filters on circular discs of varying sizes. It will be taken as axiomatic that the application of rank-order filters produces edge shifts on real images (they are well attested in the case of max, min, and median filters): the main question to be answered here is the exact numerical extent of these shifts and how they may be modeled for general rank-order filters.
A. Shifts in Rectangular Neighborhoods

In common with previous work in this area (see Sections III and IV), we here concentrate on the ideal noiseless case, in which the filter operates within a small neighborhood, over which the signal is basically a monotonically increasing intensity function in some direction. The most complex intensity variation that will be considered is that in which the intensity contours are curved with curvature $\kappa$. In spite of this simplified configuration, it will be found that valuable statements can be made about the level of distortion likely to be produced in practice by rank-order filters. Because of the complexity of the calculations that arise in the case of rank-order filters, which involve an additional parameter vis-à-vis the median filter, it is worth studying their properties first for the simple case of rectangular neighborhoods (Davies, 2000d). Let us presume that a rank-order filter is being applied in a situation in which straight intensity contours are aligned parallel to the short sides of a rectangular neighborhood that we initially take to be a $1 \times n$ array of pixels (Fig. 41). In this case, we can assume without loss of generality that the successive pixels within the neighborhood will have increasing values of intensity. We next take the basic property of the rank-order filter as being (effectively or in fact) to construct an intensity histogram of the local intensity distribution and return the value of the $r$th of the $n$ intensity values within the neighborhood. This means that the rank-order filter selects an intensity that has physical separation $B$ from the lowest intensity pixel of the neighborhood and $C$ from the highest intensity pixel, where

$$B = r - 1 \tag{64}$$

$$C = n - r \tag{65}$$

$$A = B + C = n - 1 \tag{66}$$

These definitions emphasize that a rank-order filter will in general produce a $D$-pixel shift, whose value is
Figure 41. Basic situation for a rank-order filter in a rectangular neighborhood. This figure illustrates the problem of applying a rank-order filter within a rectangular neighborhood consisting of a $1 \times n$ array of pixels. The intensity is taken to increase monotonically from left to right, as in (b); the intensity contours in (a) are taken to be parallel to the short sides of the neighborhood. From Davies (2000d).
$$D = \tfrac{1}{2}(n + 1) - r \tag{67}$$

Before proceeding further, it will be useful to introduce a parameter $\eta$ that is more symmetric than $r$, and has value $+1$ at $r = 1$ and $-1$ at $r = n$:

$$\eta = (n - 2r + 1)/(n - 1) \tag{68}$$

With this notation, which we will use in preference to $r$ throughout the remainder of the article, we can write down new formulas for $B$, $C$, $D$:

$$B = \tfrac{1}{2} A (1 - \eta) \tag{69}$$

$$C = \tfrac{1}{2} A (1 + \eta) \tag{70}$$

$$D = \tfrac{1}{2} (n - 1)\eta \tag{71}$$

The properties of the three paradigm filters are summarized in Table 6 in terms of these parameters. We now proceed to a continuum model, assuming a large number of pixels in any neighborhood (i.e., $n \to \infty$). The main difference will be that we shall specify distance in terms of the half-length $a$ of the neighborhood rather than in terms of numbers of pixels:
$$D = \eta a \tag{72}$$
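The discrete relations (64) to (71) can be exercised with exact rational arithmetic. The sketch below is illustrative: the function name is hypothetical, and the choice $n = 9$ is arbitrary. It evaluates the parameters for the median, max, and min ranks and checks that the formulas in terms of $\eta$ agree with those in terms of $r$.

```python
from fractions import Fraction

def rank_order_parameters(n, r):
    """Return (A, B, C, eta, D) for the r-th of n values, Eqs. (64)-(68)."""
    A = n - 1
    B = r - 1
    C = n - r
    eta = Fraction(n - 2 * r + 1, n - 1)
    D = Fraction(n + 1, 2) - r
    return A, B, C, eta, D

n = 9                                            # illustrative 1 x 9 neighborhood
for name, r in [("median", (n + 1) // 2), ("max", n), ("min", 1)]:
    A, B, C, eta, D = rank_order_parameters(n, r)
    # Eqs. (69)-(71) must reproduce the same B, C, D from eta alone.
    assert B == Fraction(A, 2) * (1 - eta)
    assert C == Fraction(A, 2) * (1 + eta)
    assert D == Fraction(n - 1, 2) * eta
    print(f"{name:6s} r={r} eta={eta} B={B} C={C} D={D}")
```

In the continuum limit the last relation becomes $D = \eta a$, with the half-length $a$ playing the role of $\frac{1}{2}(n - 1)$.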
Next note that this formulation is independent of the width of the neighborhood, so long as the latter is rectangular. We now generalize the situation by taking the neighborhood to be rectangular and of dimensions $2a$ by $2\tilde{a}$ (Fig. 42). The next task is to determine the result of a curvature $\kappa = 1/b$ in the intensity contours. Here we adopt a simplified form of the calculation of Section III.A, approximating the equation of a circle of radius $b$, with its diameter on the positive x-axis and passing through the origin, as

$$x = y^2/2b \tag{73}$$

We can integrate the area under an intensity contour (see Fig. 42) as follows:

$$K = \int_{-\tilde{a}}^{\tilde{a}} x\, dy = (1/2b) \int_{-\tilde{a}}^{\tilde{a}} y^2\, dy = (1/2b) \left[ y^3/3 \right]_{-\tilde{a}}^{\tilde{a}} = (1/6b)\, 2\tilde{a}^3 = \tfrac{1}{3}\tilde{a}^3/b = \tfrac{1}{3}\kappa\tilde{a}^3 \tag{74}$$
TABLE 6
Properties of the Three Paradigm Filtersᵃ

Filter    $r$                   $\eta$   $B$               $C$               $D$
Median    $\frac{1}{2}(n+1)$    $0$      $\frac{1}{2}A$    $\frac{1}{2}A$    $0$
Max       $n$                   $-1$     $A$               $0$               $-\frac{1}{2}(n-1)$
Min       $1$                   $1$      $0$               $A$               $\frac{1}{2}(n-1)$

ᵃ From Davies (2000d).
Figure 42. Geometry of a rectangular neighborhood with curved intensity contours. Here the neighborhood is a general rectangular neighborhood of dimensions $2a \times 2\tilde{a}$. Again, the intensity is taken to increase monotonically from left to right; the intensity contours are taken to be parallel and in this case are curved with identical curvature $\kappa$. The x and y axes needed for area calculations are also shown. $B$ and $C$ represent the areas of the two shaded regions on either side of the thick intensity contour. From Davies (2000d).
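We deduce that the shift $D$ is given by

$$B = 2\tilde{a}(a - D) + \tfrac{1}{3}\kappa\tilde{a}^3 \tag{75}$$

$$C = 2\tilde{a}(a + D) - \tfrac{1}{3}\kappa\tilde{a}^3 \tag{76}$$

$$\therefore\ \eta = (C - B)/A = 4\tilde{a}D/A - \tfrac{2}{3}\kappa\tilde{a}^3/A \tag{77}$$

where

$$A = 4a\tilde{a} \tag{78}$$

Hence

$$D = \eta A/4\tilde{a} + \tfrac{1}{6}\kappa\tilde{a}^2 = \eta a + \tfrac{1}{6}\kappa\tilde{a}^2 \tag{79}$$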
What is important about this equation is that it shows that the effects of rank order and of curvature can be calculated and summed separately, the first term being that obtained above for the case of zero curvature, and the second term being exactly that calculated for a median filter when the intensity contour is of length $2\tilde{a}$ [the earlier calculation (Davies, 1989) related to a circular neighborhood]. Thus in principle we merely need to recompute the first term for any appropriate shape of neighborhood. However, there is a complication, in that the value of $\tilde{a}$ depends on the value of $\eta$ for any neighborhood other than a rectangle: we shall show below (Davies, 2000d) how to allow for this.

B. Shifts in Circular Neighborhoods

For a circular neighborhood we first calculate the shift $D$ for zero $\kappa$. Referring to Figure 43, the areas $B$ and $C$ are given by

$$B = a^2\theta - (a \sin\theta)(a \cos\theta) \tag{80}$$

$$C = \pi a^2 - B \tag{81}$$

Hence

$$\eta = \left[ \pi a^2 - 2a^2(\theta - \sin\theta \cos\theta) \right]/\pi a^2 = 1 - (2/\pi)(\theta - \sin\theta \cos\theta) \tag{82}$$

Also

$$D = a \cos\theta \tag{83}$$

This relation can be used to eliminate $\theta$:

$$\eta = 1 - (2/\pi)\left\{ \cos^{-1}(D/a) - (D/a)\left[ 1 - D^2/a^2 \right]^{1/2} \right\} \tag{84}$$
There is no simple way of reformulating this equation to give $D$ in terms of $\eta$. However, it is exact and can be computed numerically. We next turn our attention to the curvature term. It is a simple matter to rewrite $\tilde{a}$ in terms of $D$:

$$\tilde{a}^2 = a^2 - D^2 \tag{85}$$

Hence the curvature term is

$$\tfrac{1}{6}\kappa\tilde{a}^2 = \tfrac{1}{6}\kappa(a^2 - D^2) \tag{86}$$

Unfortunately, this gives only an approximate estimate of the value of this term, since the integral we performed earlier to obtain the numerical coefficient assumed that the upper and lower ends of the intensity contour were parallel to the x-axis, and this will not be the case when $D$ is not equal to zero. Nevertheless, a reasonable approximation should be possible if we substitute $D - \frac{1}{6}\kappa\tilde{a}^2$ for $D$ everywhere in Eq. (84) for $\eta$, i.e., use

$$\eta = 1 - (2/\pi)\left\{ \cos^{-1}(D'/a) - (D'/a)\left[ 1 - D'^2/a^2 \right]^{1/2} \right\} \tag{87}$$

where

$$D' = D - \tfrac{1}{6}\kappa\tilde{a}^2 = D - \tfrac{1}{6}\kappa(a^2 - D^2) \tag{88}$$

The results of solving this equation numerically are shown in Figure 44a for $\kappa = 0.8/a$. We remarked above that the solution for the model given above must be approximate, so it is useful to compare it with an exact numerical solution. We obtain the latter by the following computation, with reference to Figure 45. In this case we have

$$B = a^2(\theta - \sin\theta \cos\theta) - b^2(\psi - \sin\psi \cos\psi) \tag{89}$$

$$C = \pi a^2 - B \tag{90}$$

$$D = a \cos\theta + b(1 - \cos\psi) \tag{91}$$

$$a \sin\theta = b \sin\psi \tag{92}$$
Figure 43. Geometry of a circular neighborhood for area calculations. This diagram defines the angle $\theta$ and the local width $2\tilde{a}$ needed for area calculations. $B$ and $C$ represent the areas of the two shaded regions on either side of the thick straight intensity contour. From Davies (2000d).
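Eliminating $\psi$, we can express $D$ and $\eta$ in terms of $\theta$:

$$\eta = \left[ \pi a^2 - 2a^2(\theta - \sin\theta \cos\theta) + 2b^2(\psi - \sin\psi \cos\psi) \right]/\pi a^2$$

$$\quad = 1 - (2/\pi)\left\{ [\theta - \sin\theta \cos\theta] - (b^2/a^2)\left[ \cos^{-1}\left( 1 - (a^2/b^2)\sin^2\theta \right)^{1/2} - (a/b)\left( 1 - (a^2/b^2)\sin^2\theta \right)^{1/2} \sin\theta \right] \right\} \tag{93}$$

$$D = a \cos\theta + b\left[ 1 - \left( 1 - (a^2/b^2)\sin^2\theta \right)^{1/2} \right] \tag{94}$$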
Hence a graph of $D$ against $\eta$ can be drawn using $\theta$ as an independent variable.⁸ The results obtained using the exact numerical result and that obtained with the model are compared in Figure 44 for the case $\kappa = 0.8/a$. For this and lower values of $\kappa$, the results agree quite closely. However, when $\kappa = 0.5/a$ or less (which would be the case for most practical intensity contours), the agreement is almost exact. Thus the model is entirely adequate except for quite high curvatures.
⁸ In the case when $b < a$, taking $\theta$ as the independent variable leads to problems because a real value of $\psi$ does not occur for all values of $\theta$: hence in this case it is better to take $\psi$ as the independent variable, and deduce a value of $\theta$ in the range $\pi/2 \le \theta \le \pi$, using the equation $\theta = \sin^{-1}[(b/a) \sin\psi]$.
Figure 44. Graphs of shift $D$ against rank-order parameter $\eta$ for $\kappa = 0.8/a$. (a) The graph of $D$ against $\eta$ for the model described in Section III relating to application of a rank-order filter in a circular neighborhood, $\kappa = 0.8/a$ being a moderately high curvature of the intensity contours. (b) The results of an exact numerical computation (see text). From Davies (2000d).
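The comparison between the model and the exact computation can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the procedure actually used for Figure 44: it implements the straight-contour relation of Eq. (84), the curvature-corrected model of Eqs. (87) and (88), a bisection inverse (an added convenience, since the text only says the equation can be solved numerically), and the exact parametric form of Eqs. (93) and (94) for the case in which the contour radius is at least the neighborhood radius. The function names and the second-angle name `psi` are hypothetical.

```python
import math

def eta_straight(D, a=1.0):
    """Eq. (84): rank-order parameter for a straight contour at offset D."""
    u = max(-1.0, min(1.0, D / a))
    return 1.0 - (2.0 / math.pi) * (math.acos(u) - u * math.sqrt(1.0 - u * u))

def eta_model(D, kappa, a=1.0):
    """Eqs. (87)-(88): approximate model for contours of curvature kappa."""
    D_prime = D - kappa * (a * a - D * D) / 6.0
    return eta_straight(D_prime, a)

def shift_from_eta(eta, kappa=0.0, a=1.0, tol=1e-12):
    """Invert eta_model for D by bisection on [-a, a] (moderate kappa assumed)."""
    lo, hi = -a, a
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eta_model(mid, kappa, a) < eta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_point(theta, kappa, a=1.0):
    """Eqs. (93)-(94): exact (eta, D) at neighborhood angle theta, for b >= a."""
    b = 1.0 / kappa
    s = (a / b) * math.sin(theta)          # sin(psi), from Eq. (92)
    c = math.sqrt(1.0 - s * s)             # cos(psi)
    psi = math.asin(s)
    eta = 1.0 - (2.0 / math.pi) * (
        (theta - math.sin(theta) * math.cos(theta))
        - (b * b) / (a * a) * (psi - s * c))
    D = a * math.cos(theta) + b * (1.0 - c)
    return eta, D

kappa = 0.5                                # illustrative curvature, in units of 1/a
for k in range(1, 10):
    theta = k * math.pi / 10.0
    eta_exact, D_exact = exact_point(theta, kappa)
    print(f"theta={theta:.3f}  D={D_exact:+.4f}  eta_exact={eta_exact:+.4f}  "
          f"eta_model={eta_model(D_exact, kappa):+.4f}")
```

For the median case ($\eta = 0$) the bisection inverse returns a shift close to $\frac{1}{6}\kappa a^2$, as expected from the second term of Eq. (79).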
C. Case of High Curvature

It is worth noting that when the curvatures are very high, they may arise from spots that are entirely within the neighborhood, and then there is the possibility that they will be completely eliminated by the rank-order filter (note that noise points are entirely eliminated by a median filter, which indeed is the prime practical use of that type of filter). More important, the assumptions of both our model and the exact numerical solution break down when there is no intersection of the circular neighborhood and the intensity contour of radius $b = 1/\kappa$. The limiting situation is given (Davies, 2000d) by writing

$$D_{\lim} = 2b_{\lim} - a \tag{95}$$

For that situation we also have

$$B_{\lim} = \pi a^2 - \pi b_{\lim}^2 \tag{96}$$

$$C_{\lim} = \pi b_{\lim}^2 \tag{97}$$

Hence

$$\eta_{\lim} = (2\pi b_{\lim}^2 - \pi a^2)/\pi a^2 = 2b_{\lim}^2/a^2 - 1 \tag{98}$$
164
E. R. DAVIES
Figure 45. Geometry for exact numerical computation of D graphs. This diagram defines the angles and needed for exact area calculations. From Davies (2000d).
This gives blim ¼ a½ðlim þ 1Þ=21=2 Finally, substituting in D gives n o Dlim ¼ a ½ð2ðlim þ 1Þ1=2 1
ð99Þ ð100Þ
We can now deduce limiting values for various filters: for a median filter, $\eta_{\mathrm{lim}} = 0$, $b_{\mathrm{lim}} = a/\sqrt{2}$, and $D_{\mathrm{lim}} = a(\sqrt{2} - 1)$; for a max filter, $\eta_{\mathrm{lim}} = -1$, $b_{\mathrm{lim}} = 0$, and $D_{\mathrm{lim}} = -a$; and for a min filter, $\eta_{\mathrm{lim}} = 1$, $b_{\mathrm{lim}} = a$, and $D_{\mathrm{lim}} = a$. It should be noted that the limiting case represented by Eq. (100) is not indicated explicitly in Figure 44, since it is a locus of limiting points as we run over all values of $\kappa$: thus the relevance to Figure 44 is that the uppermost point of a valid curve must lie on the locus. This point is made more forcibly by considering the case shown in Figure 46, where $b = a/2$ and $\kappa = 2/a$: outside the confines of the allowed region, any spot will be eliminated by the rank-order filter; within the confines of the allowed region, the shift $D$ is determined by the value of $\eta$, where $D \le D_{\mathrm{lim}} = 0$ and $\eta \le \eta_{\mathrm{lim}} = -0.5$. By way of example, as $\eta_{\mathrm{lim}} < 0$, a median filter is one rank-order filter that will completely eliminate spots with $b = a/2$.

We now return to discuss various aspects of the exact results. In particular, the result for a median filter is the special case that arises when $\eta = 0$, and is in agreement with the calculations of Section III. Next, the max and min filters are also special cases and occur for $\eta = -1$ and $1$, respectively. In these limiting cases, the shifts are $D = -a$ and $a$, respectively, the results being independent of $\kappa$: this is as might be expected since the value of $\tilde{a}$ is zero in each case. Between the max and min filters and the median filter, there is a continuous gradation of performance, with very significant but opposite shifts for the max and min filters, and the two basic effects cancelling out for median filters, though the cancellation is exact only for straight contours. The full situation is summarized in Figure 47.

Figure 46. Graph of shift $D$ against rank-order parameter $\eta$ for $\kappa = 2/a$. This graph shows the results of an exact numerical computation for the case $\kappa = 2/a$. The region within which a circular spot of radius $b = 1/\kappa$ is eliminated rather than shifted by the filter appears shaded. From Davies (2000d).
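The limiting values quoted above follow directly from Eqs. (98) to (100), and can be checked with a few lines of code. This is only an illustrative sketch: the function names are hypothetical, and the elimination test `spot_is_eliminated` encodes one reading of the text (a spot with $b < a$ is removed when $\eta$ lies above $\eta_{\mathrm{lim}}$), which is an assumption rather than a formula given in the source.

```python
import math

def eta_lim(b, a):
    """Eq. (98): limiting rank-order parameter for a spot of radius b."""
    return 2.0 * b * b / (a * a) - 1.0

def b_lim(eta, a):
    """Eq. (99): limiting spot radius for rank-order parameter eta."""
    return a * math.sqrt((eta + 1.0) / 2.0)

def d_lim(eta, a):
    """Eq. (100): limiting shift for rank-order parameter eta."""
    return a * (math.sqrt(2.0 * (eta + 1.0)) - 1.0)

def spot_is_eliminated(eta, b, a):
    """Assumed reading of the text: spots with b < a vanish when eta > eta_lim."""
    return b < a and eta > eta_lim(b, a)

a = 1.0
median_case = (b_lim(0.0, a), d_lim(0.0, a))    # (a / sqrt(2), a * (sqrt(2) - 1))
max_case = (b_lim(-1.0, a), d_lim(-1.0, a))     # (0, -a)
min_case = (b_lim(1.0, a), d_lim(1.0, a))       # (a, a)
half_radius_limit = eta_lim(0.5 * a, a)         # -0.5, the case of Figure 46
```

With $b = a/2$ the limiting shift `d_lim(half_radius_limit, a)` evaluates to zero, in agreement with Eq. (95).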
D. Test of the Model in a Discrete Case

This section is devoted to testing the model to determine how accurate a representation it provides with the discrete pixel lattices that arise in practice. Median and other rank-order filters are normally applied in neighborhoods ranging in size from 3 × 3 to 19 × 19, with more applications at the lower end of the scale because of the well-known computational cost of such filters. It is not the purpose of this section to test in detail all situations that can arise: in particular, it was considered best to make a rigorous test for a small discrete neighborhood, as the larger neighborhoods should approximate with ever improved accuracy to the continuum case that is assumed in the theory. Nevertheless, square neighborhoods are unlikely to match the theory well, and therefore a truncated 5 × 5 neighborhood with the four corner pixels excluded (Fig. 22b) was selected in an effort to make the shape a somewhat closer approximation to circular.

Figure 47. Graphs of shift $D$ against rank-order parameter $\eta$ for various $\kappa$. This diagram summarizes the operation of rank-order filters, with graphs, bottom to top, respectively, for $\kappa = 0$, $0.2/a$, $0.5/a$, $1/a$, $2/a$, $5/a$. Note that graphs for which $b < a$ ($\kappa > 1/a$) apply for restricted ranges of $\eta$ and $D$ (see Section VII.C). From Davies (2000d).

The test was carried out by constructing circular discs of various radii and testing them by applying 5 × 5 rank-order filters. The discs were obtained by applying ideal circular discs of the required size to an image, and testing each subpixel within the image space to determine whether its center was within the allotted disc area; all subpixels within this area were taken to contribute equally to the gray-scale intensity of the pixel containing them, and in this way a gray-scale image of each disc was built up; for this purpose each pixel was initially divided into an array of 11 × 11 subpixels. In addition, to prevent undesirable effects due to specific placement of the discs, the obtained shifts were averaged over all values obtained when the discs were moved by fractions of a pixel in the two axis directions: 11 × 11 subpixel locations were employed for this purpose. Finally, the shifts were measured by integrating the intensities of the discs in the processed images, and computing the equivalent area and hence the radius and the overall shift. These setup and computation procedures were found to give sufficient accuracy for the required purpose, and led to the shift variations shown in Figure 48.

Figure 48. Shifts obtained for a typical discrete neighborhood. These shifts were obtained for rank-order filters operating within a truncated 5 × 5 neighborhood when applied to eight discrete circular discs with radii ranging from 10.0 down to 1.25 pixels, the mean curvatures being 0.1 to 0.8 in steps of 0.1; the lowest curve was obtained by averaging the responses from circular discs of radius 20.0 pixels, with curvatures $\pm 0.05$, and to the given scale is indistinguishable from the result that would be obtained with zero curvature. The uppermost curve represents the theoretical limiting value given by Eq. (100). However, because of the directional effects that occur in the discrete case, the upper limit is actually lower than indicated by this curve (see text). From Davies (2000d).

To avoid developing a separate algorithm for use when $\kappa = 0$ (when the disc radius would have been infinite and edge effects would have had to be allowed for), the shifts for curvatures $\kappa = \pm 0.05$ were averaged: at the given scale the resulting curve was deduced to be indistinguishable from the result that would be obtained with zero curvature. The uppermost curve represents the theoretical limiting value given by Eq. (100) (except that the scale has been adjusted to meet the observed values at $\eta = \pm 1$). However, it was found that directional effects occur in the discrete case, with the result that sometimes the smaller circular discs are eliminated by the filters, or else are partially eliminated in their higher intensity (lower radius) reaches. In such cases the accuracy of the shift variations becomes low: where it becomes totally unreliable the graphs are shown broken off. This explains why the individual variations do not meet the theoretical limiting curve, except near $\eta = \pm 1$.

The variations shown in Figure 48 are very close to what would be expected from Figure 47. The upward and downward curl at the ends of the curves (especially that for $\kappa = 0$) is not as pronounced in Figure 48 as it is in Figure 47; and the overall shape, although similar, is by no means identical. On the other hand, it is extremely close considering that Figure 47 results from the continuum model, whereas Figure 48 results from a discrete model employing a small neighborhood. It is doubtful whether a more detailed correspondence could be produced without attempting a full discrete model of the shifts. In the present context it would appear sufficient to demonstrate that the theoretical shifts for rank-order filters for $\kappa = 0$ with $\eta = \pm 1$ are close to those actually observed in Figure 47. Examining the truncated 5 × 5 neighborhood in Figure 48, we see that the outermost radius should approximate to

$$D_{\eta=1} \approx \tfrac{1}{3}(2^2 + 0^2)^{1/2} + \tfrac{2}{3}(2^2 + 1^2)^{1/2} = 2.157 \tag{101}$$

This value should be compared with the observed value of 2.17, which is well within 1% in spite of the approximations evident in both the model and the subpixel approximation to it. Indeed, it is possible to envisage a better approximation to $D_{\eta=1}$ than afforded by the above equation, by determining the mean distance from the neighborhood of a tangent line obtained by averaging over all orientations of such a line; this gives the improved result:

$$D_{\eta=1} = \frac{2\sqrt{2}}{\pi}(\sqrt{2} + 1) = 2.174 \tag{102}$$

The proof of this result is presented in the next section.
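The disc construction and measurement procedure described in this section can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used to produce Figure 48: it uses NumPy, places each disc at a single position (the averaging over 11 × 11 subpixel offsets is omitted), takes rank 0 to mean the minimum of the sorted neighborhood values, and reports the change in equivalent radius of a bright disc rather than the signed shift $D$. All function names are hypothetical.

```python
import numpy as np

# Offsets (dy, dx) of the truncated 5 x 5 neighborhood: 25 pixels minus 4 corners.
OFFSETS = [(dy, dx) for dy in range(-2, 3) for dx in range(-2, 3)
           if not (abs(dy) == 2 and abs(dx) == 2)]

def make_disc(radius, size, cy, cx, sub=11):
    """Gray-scale disc: each pixel holds the fraction of its sub x sub
    subpixel centers that fall inside the ideal disc."""
    coords = (np.arange(size * sub) + 0.5) / sub - 0.5   # subpixel centers
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return inside.reshape(size, sub, size, sub).mean(axis=(1, 3))

def rank_filter(img, rank):
    """Rank-order filter over OFFSETS; rank 0 is the minimum and
    len(OFFSETS) - 1 the maximum of the sorted neighborhood values."""
    padded = np.pad(img, 2, mode="edge")
    h, w = img.shape
    stack = np.stack([padded[2 + dy:2 + dy + h, 2 + dx:2 + dx + w]
                      for dy, dx in OFFSETS])
    return np.sort(stack, axis=0)[rank]

def equivalent_radius(img):
    """Radius of the disc whose area equals the integrated intensity."""
    return np.sqrt(img.sum() / np.pi)

def radius_change(radius, rank, size=41):
    """Change in equivalent radius of a centered bright disc after filtering."""
    disc = make_disc(radius, size, size // 2, size // 2)
    return equivalent_radius(rank_filter(disc, rank)) - equivalent_radius(disc)

growth_max = radius_change(10.0, len(OFFSETS) - 1)   # max filter
growth_min = radius_change(10.0, 0)                  # min filter
growth_med = radius_change(10.0, len(OFFSETS) // 2)  # median filter
```

For a disc of radius 10 pixels the max and min filters should change the radius by roughly two pixels in opposite directions, which is the order of magnitude of the value 2.17 discussed above, while the median filter should change it only slightly.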
E. Mean Distance from Center of Neighborhood to a Tangent Line

In this section we calculate the mean distance from the center of a truncated 5 × 5 neighborhood to a line that is just in contact with it, as the orientation of the line is varied from 0 to $2\pi$. The first part of the proof involves noting that the neighborhood tangent line (properly, it is a tangent to the convex hull of the neighborhood) in general passes through points such as (1, 2). Its distance $\rho$ from the origin is given by the length of the normal from the origin to the tangent. In addition, the foot of the normal lies on a circle whose diameter is the line joining (0, 0) to (1, 2): the geometry is shown in Figure 49. We immediately see that $\rho = d\cos\psi$, where $d = \sqrt{5}$, and because of the high degree of symmetry, the mean value of $\rho$ is obtained by averaging over the range $-\alpha$ to $\beta$ (Fig. 49):

$$\bar{\rho} = \int \rho\, d\psi \Big/ \int d\psi = \sqrt{5}\int_{-\alpha}^{\beta} \cos\psi\, d\psi \Big/ (\alpha + \beta) = \sqrt{5}(\sin\alpha + \sin\beta)/(\alpha + \beta) \tag{103}$$

Figure 49. Geometry for calculation of distance of line from the center of the neighborhood. From Davies (2000d).

To proceed further, note that $\sin\alpha = 1/\sqrt{5}$, and $\beta = \pi/4 - \alpha$. Hence

$$\bar{\rho} = \frac{4\sqrt{5}}{\pi}\left[\frac{1}{\sqrt{5}} + \frac{1}{\sqrt{2}}\left(\frac{2}{\sqrt{5}} - \frac{1}{\sqrt{5}}\right)\right] = \frac{2\sqrt{2}}{\pi}(\sqrt{2} + 1) = 2.174 \tag{104}$$

as quoted in Section IV.
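The closed-form mean distance can be cross-checked by brute force: for each orientation, the tangent distance is the largest projection of any pixel center onto the line normal, and these distances are averaged over a full turn. The sketch below is an independent numerical check, not part of the original derivation; the names `tangent_distance` and `mean_tangent_distance` and the midpoint-rule step count are choices made for this illustration.

```python
import math

# Pixel centers of the truncated 5 x 5 neighborhood (four corners removed).
POINTS = [(x, y) for x in range(-2, 3) for y in range(-2, 3)
          if not (abs(x) == 2 and abs(y) == 2)]

def tangent_distance(angle):
    """Distance from the origin to the tangent line whose normal points along angle."""
    nx, ny = math.cos(angle), math.sin(angle)
    return max(x * nx + y * ny for x, y in POINTS)

def mean_tangent_distance(steps=20000):
    """Midpoint-rule average of the tangent distance over orientations 0 to 2*pi."""
    total = sum(tangent_distance(2.0 * math.pi * (k + 0.5) / steps)
                for k in range(steps))
    return total / steps

closed_form = 2.0 * math.sqrt(2.0) / math.pi * (math.sqrt(2.0) + 1.0)   # Eqs. (102), (104)
weighted_estimate = math.hypot(2, 0) / 3.0 + 2.0 * math.hypot(2, 1) / 3.0  # Eq. (101)
numerical_mean = mean_tangent_distance()
```

The numerical mean should agree with the closed form to several decimal places, while the weighted estimate of Eq. (101) is slightly lower.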
F. Discussion

Section VII has considered the shifts produced by rank-order filters on curved boundaries and contours. It has derived a generalized continuum model of these shifts, showing the existence of two intrinsic boundary shifting mechanisms: one corresponding to rank-order induced neighborhood area division, and the other corresponding to the curvature shifting effect already known to apply to median filters. The model makes close predictions of shifts for moderate curvature values, but for high curvature values exact numerical calculations are required. The model and numerical calculations cover as special cases median, max, and min filters. It is of interest that the curvature shifting effect tends naturally to zero for high and low rank filters, so that max and min filters produce the intuitively expected shifts of $\pm a$, where $a$ is the effective radius of the neighborhood.

The finding that general rank-order filters produce shifts even of straight intensity contours has some interesting consequences. First, rank-order filters can in principle be designed to give zero shift for approximately circular objects of known size. Second, they can in principle be designed to cancel out the shifts of previously applied rank-order filters, at least on certain objects of known size. Clearly, such tools are somewhat limited, as they cannot cause cancellation for noncircular objects or those of unknown size.

For median filters, linear intensity variations and step edges are two extreme situations whose shifts are nevertheless predicted by the same formulas. This statement also applies to all rank-order filters, but no specific proof is provided here. Suffice it to say that median, max, and min options apply equally to binary intensity functions.

The theory presented in this section should be valuable in leading to greater understanding of the properties of this important class of filter, since it covers the whole range from filters where shifts are an embarrassment (the median filter) to shifts that constitute a major part of the desired behavior (max and min filters), as in morphological image analysis.

The next section covers another interesting aspect of rank-order filters: the degree of isotropy they offer when implemented in a square neighborhood. However, note that this example is valuable more for the light it throws on the operation of these types of filter than for its immediate applicability.
VIII. Rank-Order Filters: A Didactic Example

This section studies the characteristics of the whole family of rank-order filters, taking the case of a square neighborhood as a didactic example. With rank-order filters, the shifts have been seen (Section VII) to arise from two causes: one is due to the curvature of the intensity contours and the other is characteristic of rank-order filters and varies with the rank parameter $r$. Since the curvature-induced shifts are generally noticeable only when the characteristic rank-order shifts are small, as happens particularly with the median filter, we shall ignore them here. Instead we concentrate on finding how straight edges of arbitrary orientation are affected by rank-order filters. The theoretical problem is simplified by adopting a continuum approximation: the pixellation of the digital lattice will be ignored until a later stage.

A. Analysis of the Situation

Figure 50 shows the basic situation, in which parallel straight intensity contours of orientation $\theta$ impinge upon a square neighborhood of side $a$. It will not be necessary to specify the intensity profile, except to the extent that it should be monotonically increasing in one direction: rank-order filters are known not to affect the shapes of monotonically increasing intensity profiles composed of parallel straight intensity contours (Hodgson et al., 1985).

Figure 50. Geometry for calculation of edge shifts.

To proceed, we follow the methods of Section VII.A by converting from rank $r$ to a more symmetric parameter $\eta$ which is $+1$ for $r = 1$ and $-1$ for $r = n$:

$$\eta = (n - 2r + 1)/(n - 1) \tag{105}$$

Next, we note that the rank-order filter will divide the area of the neighborhood into two parts such that

$$\eta = \left(\tfrac{1}{2}A_0 - A\right) \Big/ \tfrac{1}{2}A_0 \tag{106}$$

where $A_0 = a^2$ is the area of the neighborhood, and $A$ is the area in the part above the rank $r$ intensity contour. Equation (106) follows as $\eta = 1$ at the very top of the neighborhood, where $A = 0$, and $\eta = -1$ where $A = A_0$. Assuming that maximum intensity occurs at the top of the image, and if the $r$th intensity contour is above the marked contour (Case 1), $A$ is just the area $A_1$ down to the $r$th contour. Hence we have

$$A_1 = \frac{a^2 u^2}{2d^2}\tan\theta = u^2 \operatorname{cosec} 2\theta \tag{107}$$

On the other hand, if the $r$th intensity value is below the marked contour (Case 2), $A$ will be the area $A_{1\max}$ plus another area $A_2$, where

$$A_{1\max} = \tfrac{1}{2}a^2\tan\theta \tag{108}$$

$$A_2 = a\,\delta\sec\theta \tag{109}$$

In the two cases, the distance $w$ of the contour of pixels of rank-order intensity $r$ from the top corner of the neighborhood will be $u$ (Case 1) or $d + \delta$ (Case 2). To determine the shift produced by the rank-order filter, we also need to know the projection $p$ of the distance from the top corner to the center of the neighborhood in a direction normal to the contours:

$$p = \frac{1}{\sqrt{2}}\,a\cos(\pi/4 - \theta) = \tfrac{1}{2}a(\cos\theta + \sin\theta) \tag{110}$$

These formulas suffice to determine (1) $\eta$ from $r$; (2) $A$ from $\eta$; (3) $A_1$ and $A_2$ from $A$ and $A_{1\max}$; (4) $u$ and $\delta$ from $A_1$ and $A_2$; and (5) $w$ and hence the shift $D$ produced by the rank-order filter from

$$D = p - w = \begin{cases} p - u & \text{(Case 1)} \\ p - d - \delta & \text{(Case 2)} \end{cases} \tag{111}$$
$D$ is plotted for various values of $\eta$ in Figure 51. It is seen that there is considerable anisotropy when $\eta = 1$, though it is extremely rapidly attenuated as $\eta$ is reduced. Indeed, the anisotropy is, as might have been expected, exactly zero for a median filter ($\eta = 0$). In addition, there is a value of $\eta$ around 0.8 for which anisotropy drops to a very low value. In fact, further theory shows that when $\eta = 2(\sqrt{2} - 1) = 0.828$, the shift is identical at $\theta = 0$ and $\theta = \pi/4$, and has the value $(\sqrt{2} - 1)a = 0.414a$. In principle there is some advantage in choosing $\eta$ to be around 0.8284: not only will this give a substantial reduction in artifacts arising from noise, relative to the case of a minimum filter (Harvey and Marshall, 1995), but also it will markedly increase isotropy, at the same time as giving a substantial edge shift: this is indeed the very purpose of a low rank ($r$) filter when used for morphological applications. Naturally, the increase in rank (reduction in $\eta$) relative to a minimum filter causes a reduction in edge shift, but only (as further calculation shows) from a mean value of $0.637a$ (i.e., $2a/\pi$) to $0.414a$. Similar comments apply for a maximum filter.
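The five-step recipe of Section VIII.A translates directly into code, which makes it easy to confirm the special values quoted above. The following is a minimal sketch under stated assumptions: orientations are restricted to $0 \le \theta \le \pi/4$, negative $\eta$ is handled by assuming the antisymmetry $D(-\eta) = -D(\eta)$ (which follows from the point symmetry of the square but is not stated in the text), the symbol `delta` stands for the Case 2 strip width, and the function name `edge_shift` is hypothetical.

```python
import math

def edge_shift(eta, theta, a=1.0):
    """Shift D for a square neighborhood of side a, for 0 <= theta <= pi/4.
    Negative eta is handled through the assumed antisymmetry D(-eta) = -D(eta)."""
    if eta < 0:
        return -edge_shift(-eta, theta, a)
    area = 0.5 * a * a * (1.0 - eta)               # Eq. (106) solved for A
    d = a * math.sin(theta)                        # distance to the second corner
    a1_max = 0.5 * a * a * math.tan(theta)         # Eq. (108)
    if area <= a1_max:                             # Case 1: corner triangle only
        w = math.sqrt(area * math.sin(2.0 * theta))    # Eq. (107) solved for u
    else:                                          # Case 2: triangle plus strip
        delta = (area - a1_max) * math.cos(theta) / a  # Eq. (109) solved for the strip width
        w = d + delta
    p = 0.5 * a * (math.cos(theta) + math.sin(theta))  # Eq. (110)
    return p - w

eta_star = 2.0 * (math.sqrt(2.0) - 1.0)            # approximately 0.828
shift_axis = edge_shift(eta_star, 0.0)             # contours parallel to a side
shift_diagonal = edge_shift(eta_star, math.pi / 4) # contours along a diagonal
```

Both `shift_axis` and `shift_diagonal` should equal $(\sqrt{2} - 1)a$, and averaging `edge_shift(1.0, theta)` over $0 \le \theta \le \pi/4$ should reproduce the mean value $2a/\pi$ quoted for the extreme filter.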
Figure 51. Variations in edge shift. The curves represent edge shifts for values of $\eta$ in steps of 0.1 ranging from 0 (coincident with the $\theta$-axis) to 1.0 (topmost curve). Notice that the curve for $\eta = 0.8$ is almost isotropic, the optimal value of $\eta$ being 0.828.
B. Discussion

Although the above calculation shows that a suitably selected rank-order filter will give an almost isotropic shift when a square neighborhood is employed, this does not mean that this method should automatically be used in practice. This is because neighborhoods are normally quite small (usually in the range 3 × 3 to 19 × 19), so pixellation will cause substantial deviations from the predictions of the continuum model presented above. In fact, the degree of isotropy achievable will depend closely on how precisely pixel centers fall into circles of various radii. Furthermore, if the central core of pixels determining the shift is large enough to permit a higher degree of isotropy, then it will also be possible to make the original neighborhood far more isotropic than a square.

Nevertheless, the above example is valuable more in giving further insight into the action of the various rank-order filters. In addition, there might be occasions when a square morphological operation (i.e., one with a square structuring element) needs to be applied⁹: the above example shows that if it is required both to apply a square morphological operation using a rank-order filter and to reduce noise by employing a nonextreme rank, this will not give the desired effect, because the square neighborhood will act in an almost isotropic manner. It seems that in such cases the best that can be done is to decompose the operation into two 1D mutually perpendicular rank-order operations, so that the square neighborhood effect can be achieved.

⁹ It is by no means the case that morphological operations have to be isotropic in their effect: highly directional morphological operations are also often needed for specific purposes.
IX. A Problem with Closing

Texture analysis is an important area of machine vision, and is relevant not only for segmenting one region of an image from another (as in many remote sensing applications), but also for characterizing regions absolutely, as is necessary when performing surface inspection (for example, when assessing the paint finish on automobiles). Many methods have been employed for texture analysis. These range from the widely used gray-level co-occurrence matrix approach to Laws' texture energy approach, and from use of Markov random fields to fractal modeling (Davies, 1997a). One of the least computation intensive is Laws' method, which involves application of a number of convolution filters to extract spots, edges, lines, waves, ripples, and other microfeatures, and then combines them using smoothing operations (Laws, 1979).

In fact, there are approaches that involve even less computation and that are applicable when the textures are particularly simple and the shapes of the basic texture elements are not especially critical. For example, if it is required to locate regions containing small objects, simple morphological operations applied to thresholded versions of the image are often appropriate (Fig. 52) (Haralick and Shapiro, 1992; Bangham and Marshall, 1998). Such approaches can be used for locating regions containing seeds, grains, nails, sand, or other materials, either for assessing the overall quantity or spread or for determining whether there are regions that have not yet been covered.

Figure 52. Idealized grouping of small objects into regions, such as might be attempted using closing operations. From Davies (2000b).

The basic operation to be applied is the dilation operation, which combines the individual particles into fully connected regions. This method is suitable not only for connecting individual particles but also for separating regions containing high and low densities of such particles. The expansion characteristic of the dilation operation can be largely cancelled by a subsequent erosion operation, using the same morphological kernel. Indeed, if the particles are always convex and well separated, the erosion should exactly cancel the dilation, though in general the combined closing operation is not a null operation, and this is relied upon in the above connecting operation.

We have applied closing operations to images of cereal grains containing dark rodent droppings in order to consolidate the droppings (which contain significant speckle, and therefore holes when the images are thresholded) and thus to make them more readily recognizable from their shapes. However, the result has been rather unsatisfactory as dark patches on the grains tend to combine with the dark droppings: this has the effect of distorting the shapes and also makes the objects larger. We have been able to partially overcome this problem by performing a subsequent erosion operation, so that the overall procedure is dilate + erode + erode. Initially, this seemed to be an ad hoc procedure, but on analysis it was found (Davies, 2000b) that the size increase actually applies quite generally when segmentation of textures containing different densities of particles is carried out. It is this general effect that we now consider.
A. Detailed Analysis

Let us take two regions containing small particles with occurrence densities $\rho_1, \rho_2$, where $\rho_1 > \rho_2$. In region 1 the mean distance between particles will be $d_1$ and in region 2 the mean distance will be $d_2$, where $d_1 < d_2$. If we dilate using a kernel of radius $a$, where $d_1 < 2a < d_2$, this will tend to connect the particles in region 1 but should leave the particles in region 2 separate. To ensure connecting the particles in region 1, we can make $2a$ larger than $\tfrac{1}{2}(d_1 + d_2)$, but this may risk connecting the particles in region 2 (the risk will be reduced when the subsequent erosion operation is taken into account). Selecting an optimum value of $a$ clearly depends not only on the mean distances $d_1, d_2$ but also on their distributions. Space prevents us from entering into a detailed discussion of this: we merely assume that a suitable selection of $a$ is made, and that it is effective. The problem that is tackled here is whether the size of the final regions matches the a priori desired segmentation, i.e., whether any size distortion takes place. We start by taking this to be an essentially 1D problem, which can be modeled as in Figure 53 (the 1D particle densities will now be given an $x$ suffix).

Figure 53. 1D particle distribution. $z$ indicates the presence of a particle, and $\rho_x$ shows the densities in the two regions. From Davies (2000b).

Suppose first that $\rho_{2x} = 0$. Then in region 2 the initial dilation will be counteracted exactly (in 1D) by the subsequent erosion. Next take $\rho_{2x} > 0$: when dilation occurs, a number of particles in region 2 will be enveloped, and the erosion process will not exactly reverse the dilation. If a particle in region 2 is within $2a$ of an outermost particle in region 1, they will merge, and will remain merged when erosion occurs. The probability $P$ that this will happen is the integral over a distance $2a$ of the probability density for particles in region 2. In addition, when the particles are well separated we can take the probability density as being equal to the mean particle density $\rho_{2x}$. Hence

$$P = \int_0^{2a} \rho_{2x}\, dx = 2a\rho_{2x} \tag{112}$$

If such an event occurs, then region 1 will be expanded by amounts ranging from $a$ to $3a$, or 0 to $2a$ after erosion, though these figures must be increased by $b$ for particles of width $b$. Thus the mean increase in size of region 1 after dilation + erosion is $2a\rho_{2x}(a + b)$, where we have assumed that the particle density in region 2 remains uniform right up to region 1. We next consider what additional erosion operation will be necessary to cancel this increase in size. In fact, we just make the radius $\tilde{a}_{1D}$ of the erosion kernel equal to the increase in size:

$$\tilde{a}_{1D} = 2a\rho_{2x}(a + b) \tag{113}$$

Finally, we must recognize that the required process is 2D rather than 1D, and take $y$ to be the lateral axis, normal to the original (1D) $x$-axis. For simplicity we assume that the dilated particles in region 2 are separated laterally, and are not touching or overlapping (Fig. 54). As a result, the change of size of region 1 given by Eq. (113) will be diluted relative to the 1D case by the reduced density along the direction ($y$) of the border between the two regions: i.e., we must multiply the right-hand side of Eq. (113) by $b\rho_{2y}$. We now obtain the relevant 2D equation:

$$\tilde{a}_{2D} = 2ab\rho_{2x}\rho_{2y}(a + b) = 2ab\rho_2(a + b) \tag{114}$$

where we have finally reverted to the appropriate 2D area particle density $\rho_2$.

Figure 54. Model of the incidence of particles in two regions. Region 2 has sufficiently low density that the dilated particles will not touch or overlap. From Davies (2000b).

Clearly, for low values of $\rho_2$ an additional erosion will not be required, whereas for high values of $\rho_2$ substantial erosion will be necessary, particularly if $b$ is comparable to or larger than $a$. If $\tilde{a}_{2D} < 1$, it will be difficult to provide an accurate correction by applying an erosion operation, and all that can be done is to bear in mind that any measurements made from the image will require correction. (Note that if, as often happens, $a > 1$, $\tilde{a}_{2D}$ could well be at least 1.)

B. Discussion

This work was motivated by analysis of cereal grain images containing rodent droppings, which had to be consolidated by dilation operations to eliminate speckle, followed by erosion operations to restore size¹⁰. It has been found that if the background contains a low density of small particles that tend, upon dilation, to increase the sizes of the foreground objects, additional erosion operations will in general be required to accurately represent the sizes of the regions. The effect would be similar if impulse noise were present, though theory shows what is observed in practice, that the effect is enhanced if the particles in the background are not negligible in size. The increases in size are proportional to the occurrence density of the particles in the background, and the kernel for the final erosion operation is calculable, the overall process being a necessary measure rather than an ad hoc technique.

¹⁰ For further background on this application see Davies et al. (1998) and Davies (2000a).
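The correction formulas of Eqs. (112) to (114) are simple enough to evaluate directly when choosing the radius of the final erosion. The snippet below is a small illustrative helper; the function names and the numerical values (a dilation radius of 3 pixels, particles 2 pixels wide, and the densities shown) are hypothetical and chosen only to show the arithmetic.

```python
import math

def merge_probability(a, rho_2x):
    """Eq. (112): chance that a region-2 particle lies within 2a of region 1."""
    return 2.0 * a * rho_2x

def extra_erosion_1d(a, b, rho_2x):
    """Eq. (113): mean 1D growth of region 1 after dilation + erosion."""
    return merge_probability(a, rho_2x) * (a + b)

def extra_erosion_2d(a, b, rho_2):
    """Eq. (114): 2D growth, with rho_2 the area density of region-2 particles."""
    return 2.0 * a * b * rho_2 * (a + b)

# Hypothetical values: a = 3 pixels, b = 2 pixels.
growth_1d = extra_erosion_1d(3.0, 2.0, rho_2x=0.1)    # 2 * 3 * 0.1 * 5
growth_2d = extra_erosion_2d(3.0, 2.0, rho_2=0.01)    # 2 * 3 * 2 * 0.01 * 5
```

With these illustrative numbers the 2D growth is below one pixel, which is the regime in which the text notes that an erosion cannot provide an accurate correction and measurements must instead be corrected afterwards.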
X. A Median-Based Corner Detector

It may be thought that the edge shifts discussed at length in this article always present problems, but there is one case in which they have been turned to advantage: this is a novel strategy for detecting corners, developed by Paler et al. (1984). It adopts an initially surprising approach based on the properties of the median filter. The technique involves applying a median filter to the input image, and then forming another image that is the difference between the input and the filtered images. This difference image contains a set of signals that are interpreted as local measures of corner strength.

It may seem risky to apply such a technique since its origins suggest that far from giving a correct indication of corners, it may instead unearth all the noise in the original image and present this as a set of "corner" signals. Fortunately, analysis shows that these worries may not be too serious. First, in the absence of noise, strong signals are not expected in areas of background; nor are they expected near straight edges, since median filters do not shift or modify such edges significantly. However, if a neighborhood is moved gradually from a background region until its central pixel is just over a convex object corner, there is no change in the output of the median filter: hence there is a strong difference signal indicating a corner (see Section III.F). Paler et al. (1984) analyzed the operator in some depth and concluded that the signal strength obtained from it is proportional to (1) the local contrast, and (2) the "sharpness" of the corner. The definition of sharpness they used was that of Wang et al. (1983), meaning the angle through which the boundary turns.
Since it is assumed here that the boundary turns through a significant angle within the filter neighborhood, the difference from the second-order intensity variation type of approach (based on modeling the local image intensity function in a Taylor series expansion) (Davies, 1997a) is a major one. Indeed, it is an implicit assumption in the latter approach that first- and second-order coefficients describe the local intensity characteristics reasonably rigorously, the intensity function being inherently continuous and differentiable. Thus the second-order methods may give unpredictable results with pointed corners where
directions change within the range of a few pixels. Nevertheless, it is worth looking at the similarities between the two approaches to corner detection before considering the differences. We proceed with this in the next subsection.
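The basic operation of the detector (median filter, then difference from the input) can be illustrated on a synthetic image. The sketch below is not the implementation of Paler et al. (1984): it assumes NumPy, a square 5 × 5 window with edge replication at the image border, and an idealized noise-free bright square; the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img, size=5):
    """Median over a size x size window, replicating edge pixels at the border."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(2, 3))

def corner_signal(img, size=5):
    """Absolute difference between the input and its median-filtered version."""
    return np.abs(img - median_filter(img, size))

# Synthetic test image: a bright 9 x 9 square of contrast 100 on a dark background.
image = np.zeros((21, 21))
image[6:15, 6:15] = 100.0
signal = corner_signal(image)
```

In this idealized case the signal equals the full contrast at the corner pixel `signal[6, 6]` and is zero in the background, in the interior of the square, and at the middle of a straight edge such as `signal[6, 10]`.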
A. Analyzing the Operation of the Median Detector

This subsection considers the performance of the median corner detector under conditions in which the gray-scale intensity varies by only a small amount within the median filter neighborhood. This permits the performance of the corner detector to be related to low-order derivatives of the intensity variation, so that comparisons can be made with second-order corner detectors. To proceed we assume a continuous analogue image and a median filter operating in an idealized circular neighborhood. For simplicity, since we are attempting to relate signal strengths and differential coefficients, noise is ignored.

Next, recall that for an intensity function that increases monotonically with distance in some arbitrary direction $\tilde{x}$ but that does not vary in the perpendicular direction $\tilde{y}$, the median within the circular neighborhood is equal to the value at the center of the neighborhood. This means that the median corner detector gives zero signal if the curvature is locally zero. If there is a small curvature $\kappa$, the situation can be modeled by envisaging a set of constant-intensity contours of roughly circular shape and approximately equal curvature, within the circular neighborhood that will be taken to have radius $a$ (Fig. 55). Consider the contour having the median intensity value. The center of this contour does not pass through the center of the neighborhood but is displaced to one side along the negative $\tilde{x}$-axis. Furthermore, the signal obtained from the corner detector depends on this displacement. If the displacement is $D$, it is easy to see that the corner signal is $D g_{\tilde{x}}$ since $g_{\tilde{x}}$ allows the intensity change over the distance $D$ to be estimated (Fig. 55). The remaining problem is to relate $D$ to the curvature $\kappa$. A formula giving this relation has already been obtained. The required result is

$$D = \tfrac{1}{6}\kappa a^2 \tag{115}$$

so the corner signal is

$$K = D g_{\tilde{x}} = \tfrac{1}{6}\kappa g_{\tilde{x}} a^2 \tag{116}$$
Figure 55. Geometry for estimation of corner signals from median-based detectors. (a) Contours of constant intensity within a small neighborhood: ideally, these are parallel, circular, and of approximately equal curvature; (b) cross section of intensity variation, indicating how the displacement D of the median contour leads to an estimate of corner strength. From Davies (1988b).
Note that $K$ has the dimensions of intensity (contrast), and that the equation may be re-expressed in the form

$$K = \tfrac{1}{12}(g_{\tilde{x}} a)(2a\kappa) \tag{117}$$

so that, as in the formulation of Paler et al. (1984), corner strength is closely related to corner contrast and corner sharpness. To summarize, the signal from the median-based corner detector is proportional to curvature and to intensity gradient. Thus this corner detector gives an identical response to second-order intensity variation detectors such as the Kitchen and Rosenfeld (1982) (KR) detector.
However, this comparison is valid only when second-order variations in intensity give a complete description of the situation. Clearly the situation might be significantly different where corners are so pointed that they turn through a large proportion of their total angle within the median neighborhood. In addition, the effects of noise might be expected to be rather different in the two cases, as the median filter is particularly good at suppressing impulse noise. Meanwhile, for small curvatures, there ought to be no difference in the positions at which median and second-order derivative methods locate corners, and accuracy of localization should be identical in the two cases.
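The relation $D = \kappa a^2/6$ of Eq. (115) can be checked numerically. The Python sketch below is illustrative only: the function names, the midpoint integration rule, and the test values ($a = 1$, $\kappa = 0.05$, $g_{\tilde{x}} = 10$) are assumptions introduced here and are not part of the original analysis. It models the constant-intensity contours as concentric circles of radius close to $1/\kappa$, finds by bisection the contour that divides the circular neighborhood into two equal areas (the median contour), and compares its offset from the neighborhood center with $\kappa a^2/6$. The sign of the offset depends on how the axes are oriented, so only its magnitude is compared.

```python
import math

def area_inside_contour(d, a, kappa, n=2000):
    """Area of the disc of radius a (centered at the origin) that lies inside
    the circular contour of radius R + d centered at (-R, 0), with R = 1/kappa.
    Midpoint rule over horizontal chords; assumes kappa * a is well below 1."""
    R = 1.0 / kappa
    dy = 2.0 * a / n
    total = 0.0
    for i in range(n):
        y = -a + (i + 0.5) * dy
        h = math.sqrt(a * a - y * y)                # half-length of the chord at height y
        x_c = -R + math.sqrt((R + d) ** 2 - y * y)  # x-position of the contour at height y
        total += min(max(x_c + h, 0.0), 2.0 * h) * dy  # part of the chord inside the contour
    return total

def median_contour_offset(a, kappa):
    """Offset d of the contour that halves the neighborhood area (bisection)."""
    # A very large d puts the whole disc inside the contour, giving the disc
    # area under the same quadrature rule, so quadrature errors largely cancel.
    half = 0.5 * area_inside_contour(10.0 * a, a, kappa)
    lo, hi = -a, a
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if area_inside_contour(mid, a, kappa) < half:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, kappa, g = 1.0, 0.05, 10.0           # hypothetical test values
D_numeric = median_contour_offset(a, kappa)
D_formula = kappa * a * a / 6.0         # Eq. (115)
K = D_formula * g                       # Eq. (116)
print(D_numeric, D_formula, K)
```

For small $\kappa a$ the two values of $D$ should agree closely; the agreement is expected to degrade as $\kappa a$ grows, which is the regime in which the second-order description itself ceases to apply.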
B. Practical Results

Experimental tests with the median approach to corner detection have shown that it is a highly effective procedure (Paler et al., 1984; Davies, 1988b). Corners are detected reliably and signal strength is indeed roughly proportional both to local image contrast and to corner sharpness (see Fig. 56). Noise is more apparent for 3 × 3 implementations, and this makes it better to use 5 × 5 or larger neighborhoods to give good corner discrimination. However, the fact that median operations are slow in large neighborhoods, and that background noise is still evident even in 5 × 5 neighborhoods, means that the basic median-based approach gives poor performance by comparison with the second-order methods. Both of these disadvantages, however, are virtually eliminated by using a "skimming" procedure, in which edge points are first located by thresholding the edge gradient, and the edge points are then examined with the median detector to locate the corner points (Davies, 1988b). With this improved method, performance is found to be generally superior to that for (say) the KR method in that corner signals are better localized and accuracy is enhanced. Indeed, the second-order methods appear to give rather fuzzy and blurred signals that contrast with the sharp signals obtained with the improved median approach (Fig. 57). Next, we note that the sharpness of signals obtained by the KR method may be improved by nonmaximum suppression (Kitchen and Rosenfeld, 1982; Nagel, 1983). However, this technique can also be applied to the output of median-based corner detectors. Thus overall, the latter seem to be at least as effective as detectors based on finding second-order intensity variations in the input intensity function. Finally, see Davies (1992a) for a paper covering a fast median filtering algorithm with application to corner detection.

Figure 56. Result of applying median-based corner detector. (a) Original off-camera 128 × 128 6-bit gray-scale image; (b) result of applying the median-based corner detector in a 5 × 5 neighborhood. Note that corner signal strength is roughly proportional both to corner contrast and to corner sharpness. From Davies (1997a).

Figure 57. Comparison of the median and KR corner detectors. (a) Original 128 × 128 gray-scale image; (b) result of applying a median detector; (c) result of including a suitable gradient threshold; (d) result of applying a KR detector. The considerable amount of background noise is saturated out in (a) but is evident from (b). To give a fair comparison between the median and KR detectors, 5 × 5 neighborhoods are employed in each case, and nonmaximum suppression operations are not applied: the same gradient threshold is used in (c) and (d). From Davies (1988b).
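The skimming procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited work: the corner signal is taken to be the absolute difference between the center intensity and the local median, the gradient is estimated by central differences, and the function names, the threshold value, and the synthetic test image are all hypothetical.

```python
import math
from statistics import median

def median_corner_signal(img, half=2, grad_threshold=20.0):
    """Corner signal |I - median(I)| in a (2*half + 1)-square neighborhood,
    computed only at pixels whose gradient magnitude reaches grad_threshold
    (the skimming step). Pixels closer than `half` to the border are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            # Central-difference gradient estimate (an assumption made here;
            # any edge operator could be substituted).
            gx = 0.5 * (img[r][c + 1] - img[r][c - 1])
            gy = 0.5 * (img[r + 1][c] - img[r - 1][c])
            if math.hypot(gx, gy) < grad_threshold:
                continue  # not an edge point: skip the costly median
            window = [img[rr][cc]
                      for rr in range(r - half, r + half + 1)
                      for cc in range(c - half, c + half + 1)]
            out[r][c] = abs(img[r][c] - median(window))
    return out

def make_square_image(size=20, lo=6, hi=14, dark=10, bright=100):
    """Synthetic test image: a bright square on a dark background."""
    return [[bright if lo <= r < hi and lo <= c < hi else dark
             for c in range(size)] for r in range(size)]

image = make_square_image()
signal = median_corner_signal(image)
# Corner pixel of the square versus a pixel midway along a straight edge.
print(signal[6][6], signal[6][10])
```

On this noise-free image the detector responds at the corner of the square and gives no signal along the straight edges or in the uniform regions, where the median equals the center value. Because the median is evaluated only at edge points, most pixels incur no sorting cost, which is the motivation for skimming given in the text.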
XI. Boundary Length Measurement Problem

At first sight, this section may seem off the main track of this article. However, it is actually quite strongly linked to the central theme, as it is fundamentally involved with the relation between a continuum and measurements made in a discrete lattice of pixels. There are many recognition schemes that involve tracking around the boundaries of objects. They include the "centroidal profile" or polar plot $(r, \theta)$ method, the $(r, s)$ method, the boundary orientation $(s, \psi)$ method, and the boundary curvature $(s, \kappa)$ method, $s$ being the boundary distance measured from some convenient point on the object boundary. These methods are described in some detail in Davies (1997a) and will not be considered further here. Simpler methods of recognizing objects also exist. One that has long been used is the "circularity" or "compactness" measure $C = \text{area}/(\text{perimeter})^2$, which also involves measurement along the object boundary. The existence of a family of recognition schemes involving boundary distance $s$ makes it worthwhile to develop accurate means for estimating $s$. Probably the simplest measure of boundary distance takes all eight neighbors of a given pixel as being one unit of distance away. However, it is clearly more accurate to take the diagonally adjacent neighbors as being $\sqrt{2}$ times further away than the other four neighbors (Freeman, 1970), a procedure that had become quite universal by 1977. At that stage Kulpa (1977) noted that this approach systematically overestimates the analogue boundary distance (that is, distance measured in the original analogue space, before digitization) by a small factor, and he calculated a correction. Thus the Freeman measure

$$L_F = n_e + \sqrt{2}\,n_o \tag{118}$$

was replaced by the measure

$$L_K = 0.948\,n_e + 1.341\,n_o \tag{119}$$

$n_e$ and $n_o$ being, respectively, the number of relevant even (nondiagonal) and odd (diagonal) Freeman chain code elements (Freeman, 1970). These measures are of the general form

$$L_G = \alpha n_e + \beta n_o \tag{120}$$

where Kulpa assumed that $\beta/\alpha$ remains equal to $\sqrt{2}$. Later Proffitt and Rosen (1979) showed that this is valid, though the proof is purely mathematical and the validity of the result is not obvious. Here we study the problem with a view to clarifying the situation (Davies, 1991b).

Figure 58. Possible variations of $L_F$ with $\theta$. These sketches show possible a priori variations of $w = L_F/L$ with $\theta$, $L$ being an ideal boundary distance measure. From Davies (1991b).

Figure 59. Geometry for calculating the variation of $L_G$ with $\theta$. OP and PQ are line segments with orientations 0° and 45° that represent the horizontal and diagonal sections of a line OQ with orientation $\theta$. From Davies (1991b).

A. Detailed Analysis

First we note that the point of the measure $L_F$ is that it is exactly correct in the two limiting cases in which we have straight edge boundaries aligned at angles $\theta = 0^\circ$ and $45^\circ$ to the pixel axes frame. However, between these limits $L_F$ varies with $\theta$ in an initially unknown way (Fig. 58). Next we follow Kulpa's method for calculating the length of a segment of boundary consisting of horizontal and diagonal sections, where the overall horizontal displacement is $a$ and the overall vertical displacement is $b$ (Fig. 59). Then the true (Euclidean) displacement is

$$L = \sqrt{a^2 + b^2} \tag{121}$$
and the length measure is

$$L_F = (a - b) + \sqrt{2}\,b \tag{122}$$

We now wish to generalize $L_F$ to the form

$$L_G = \alpha(a - b) + \beta b = \alpha a + (\beta - \alpha) b \tag{123}$$

where $\alpha$ and $\beta$ are to be determined. Proceeding to polar coordinates (Fig. 59), we find

$$L = r \tag{124}$$

$$L_G = r[\alpha\cos\theta + (\beta - \alpha)\sin\theta] \tag{125}$$

so that the ratio (ideally equal to unity) is

$$w = L_G/L = \alpha\cos\theta + (\beta - \alpha)\sin\theta \tag{126}$$

We now note that $w$ can be rewritten in the form of a single cosine function:

$$w = \gamma\cos(\theta - \xi) \tag{127}$$

where

$$\gamma = \sqrt{\alpha^2 + (\beta - \alpha)^2} \tag{128}$$

and

$$\tan\xi = (\beta - \alpha)/\alpha \tag{129}$$
However, we do not need to proceed with this detailed calculation, since it is our purpose here to point out some characteristics of the solution. In particular we note that $w$ is a symmetrical function and that it must be centered symmetrically at $\theta = 22.5^\circ$ for the original case $\alpha = 1$, $\beta = \sqrt{2}$, since we know that $w = 1$ for $\theta = 0^\circ$ and $45^\circ$ [a formal proof can easily be obtained by substituting for $\alpha$ and $\beta$ in Eq. (129)]. This itself is a remarkable result, since it shows an interesting symmetry between the cases of lines near to 0° and 45° (see below). In fact our a priori arguments led only to Figure 58a and b and certainly did not predict such a symmetry.
We next calculate the mean value of $w$:

$$
\begin{aligned}
\bar{w} &= \int_0^{\pi/4} [\alpha\cos\theta + (\beta - \alpha)\sin\theta]\,d\theta \Big/ \int_0^{\pi/4} d\theta \\
&= \frac{4}{\pi}\Big[\alpha\sin\theta - (\beta - \alpha)\cos\theta\Big]_0^{\pi/4} \\
&= \frac{4}{\pi}\Big[\alpha/\sqrt{2} - (\beta - \alpha)/\sqrt{2} + (\beta - \alpha)\Big] \\
&= \frac{2\sqrt{2}(\sqrt{2} - 1)}{\pi}\Big[\sqrt{2}\,\alpha + \beta\Big]
\end{aligned}
\tag{130}
$$

Clearly we have to adjust $\sqrt{2}\,\alpha + \beta$ to make $\bar{w}$ equal to unity, but we also have to adjust the relative values of $\alpha$ and $\beta$ to minimize errors. (The reason it is necessary to do this when only $\sqrt{2}\,\alpha + \beta$ appears to matter is that we have to attempt to minimize the deviation in $w$ that can occur in any specific practical instance, i.e., for a specific value of $\theta$.) Proffitt and Rosen do this by adjusting the relative values so that the standard deviation of the $w(\theta) - 1$ distribution is minimized (Proffitt and Rosen, 1979). However, we proceed differently. We note that our starting values of $\alpha$ and $\beta$ make $w$ symmetric. Furthermore, adjusting $\sqrt{2}\,\alpha + \beta$ to make $\bar{w} = 1$ cannot alter the lateral placing of the function (the phase angle $\xi$ defined by Eq. (129)) if $\alpha$, $\beta$ are maintained in the same ratio. Hence it cannot alter the symmetry. On the other hand, adjusting the relative values of $\alpha$ and $\beta$ will destroy the symmetry. Now it is easy to see that the symmetrical placing of $w$ minimizes the maximum error, the mean square error, and a number of other possible error measures. Hence it is clear that the relative values of $\alpha$ and $\beta$ must remain unchanged. We assert that this was not obvious a priori, but it confirms and puts a new gloss on previous work. Since we have now deduced that $\beta = \sqrt{2}\,\alpha$, Eq. (130) and the condition $\bar{w} = 1$ combine to give

$$\alpha = \frac{\pi}{8(\sqrt{2} - 1)} = 0.948 \tag{131}$$

$$\beta = \frac{\sqrt{2}\,\pi}{8(\sqrt{2} - 1)} = 1.341 \tag{132}$$
(Note that various other approximate versions of these values appear in the literature, several of them presumably having been produced by rounding or typographical errors.)
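The values in Eqs. (131) and (132) can be reproduced with a short numerical check. The Python sketch below is illustrative only; the function names and the midpoint-rule integration are choices made here. It evaluates $w(\theta)$ from Eq. (126), compares a numerical mean over 0° to 45° with the closed form of Eq. (130), and confirms that the Freeman values $\alpha = 1$, $\beta = \sqrt{2}$ give a mean ratio above unity while the corrected values bring it back to 1.

```python
import math

SQRT2 = math.sqrt(2.0)

def w(theta, alpha, beta):
    """Ratio L_G / L of Eq. (126) for a line at orientation theta (radians)."""
    return alpha * math.cos(theta) + (beta - alpha) * math.sin(theta)

def mean_w(alpha, beta, n=10000):
    """Numerical mean of w over 0..pi/4 (midpoint rule), cf. Eq. (130)."""
    step = (math.pi / 4.0) / n
    return sum(w((i + 0.5) * step, alpha, beta) for i in range(n)) / n

def mean_w_closed_form(alpha, beta):
    """Closed form given in the last line of Eq. (130)."""
    return 2.0 * SQRT2 * (SQRT2 - 1.0) / math.pi * (SQRT2 * alpha + beta)

alpha_k = math.pi / (8.0 * (SQRT2 - 1.0))   # Eq. (131)
beta_k = SQRT2 * alpha_k                    # Eq. (132)

print(round(alpha_k, 3), round(beta_k, 3))   # 0.948 1.341
print(round(mean_w(1.0, SQRT2), 4))          # Freeman measure: 1.0548
print(round(mean_w(alpha_k, beta_k), 4))     # corrected measure: 1.0
```

The Freeman mean of about 1.055 is the systematic overestimate that the factor 0.948 compensates. With the corrected coefficients, $w$ equals $\alpha \approx 0.948$ at 0° and 45°, so individual lines can still be in error by a few percent even though the mean is exact.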
B. Discussion

The above proof is based on the symmetry of the function $w(\theta)$. However, no reason has been given explaining physically why this symmetry occurs. Take the case of a straight almost horizontal line (Fig. 60a). In this case the step contributes $\sqrt{2}$ to $L_F$, whereas ideally it would only contribute about 1: a clear overestimate. This interchange of the values 1 and $\sqrt{2}$ suggests that for a line near to 45° a horizontal step will contribute 1 to $L_F$ when ideally it would contribute $\sqrt{2}$, thereby leading to an underestimate. However, careful consideration (Fig. 60b) shows that this argument is fallacious, since the amount ideally contributed by the horizontal step is $1/\sqrt{2}$, so in fact we get an overestimate by the same factor as before. Thus the symmetry between the two limiting cases is quite subtle. The true situation is that in both cases, the amount contributed by the step should be $1/\sqrt{2}$ (= cos 45°) of the amount actually contributed: it is only the resolved component of the step distance along the general direction of the line that should actually count.

This section has studied the Kulpa boundary distance measure with a view to obtaining a better understanding of the mechanisms underlying choice of boundary distance calibration parameters. It is found that an interesting symmetry exists between the two limiting orientations, and that this explains why the parameters $\alpha$ and $\beta$ should be exactly in the ratio $1 : \sqrt{2}$. Further insight may be obtained by referring to the papers by Dorst and Smeulders (1987), Beckers and Smeulders (1989), and Koplowitz and Bruckstein (1989).
Figure 60. Special cases of straight lines with orientations close to 0° and 45°. (a) The special case of a nearly diagonal line. In (b) note that the projection of the step along the general direction of the line is $1/\sqrt{2}$ of the length of the step, so taking the step as contributing a length of 1 pixel gives an overestimate by this factor. From Davies (1991b).
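The Freeman and Kulpa measures of Eqs. (118) and (119) can be compared on digitized straight lines. The Python sketch below is a hypothetical illustration: the digitization rule (rounding the ideal line to the nearest row in each column), the function names, and the sample end points are assumptions made here. For a line from (0, 0) to ($a$, $b$) with $0 \le b \le a$, the chain code contains $b$ diagonal and $a - b$ horizontal elements, as assumed in the analysis of Section XI.A.

```python
import math

def chain_code_line(a, b):
    """Freeman chain code (0 = east, 1 = north-east) of a digitized straight
    line from (0, 0) to (a, b), for integers 0 <= b <= a and a > 0."""
    codes, y = [], 0
    for x in range(1, a + 1):
        y_new = (2 * x * b + a) // (2 * a)   # nearest row to the ideal line at column x
        codes.append(1 if y_new > y else 0)
        y = y_new
    return codes

def chain_lengths(codes, alpha=0.948, beta=1.341):
    """Return (Freeman length, Kulpa length) for a list of Freeman codes."""
    n_e = sum(1 for c in codes if c % 2 == 0)   # even (nondiagonal) elements
    n_o = len(codes) - n_e                      # odd (diagonal) elements
    freeman = n_e + math.sqrt(2.0) * n_o        # Eq. (118)
    kulpa = alpha * n_e + beta * n_o            # Eq. (119)
    return freeman, kulpa

# Ratio of each estimate to the true Euclidean length at three orientations.
for a, b in [(100, 0), (100, 41), (100, 100)]:
    freeman, kulpa = chain_lengths(chain_code_line(a, b))
    true_length = math.hypot(a, b)
    print(a, b, round(freeman / true_length, 3), round(kulpa / true_length, 3))
```

At 0° and 45° the Freeman estimate is exact and the Kulpa estimate is low by the factor 0.948, whereas near 22.5° the Freeman estimate is at its largest overestimate. The corrected coefficients trade a small error at the limiting orientations for a zero mean error over all orientations.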
XII. Concluding Remarks This article has attempted to provide an understanding of the edge shifts that arise when certain types of filter are applied to digital images. Initial calculations and experiments related to median filters, but it was soon shown that the shifts are not avoided by applying alternative types of filter such as mean and mode filters. Indeed, the amount of shift appeared very similar in all three cases. In retrospect this is not too surprising, as each of these filters represents an averaging process that seems bound to produce a shift of the same approximate magnitude. However, it is possible to design filters that largely eliminate this problem, and among these is the hybrid median type of filter (though careful tests show that this type of filter reduces the shift only by a factor of around four, and does not eliminate it completely). Another filter that exhibited considerably reduced levels of shift distortion was a specially trained artificial neural network filter employing multilayer perceptrons (Greenhill and Davies, 1994): this showed especially good performance in inhibiting the chopping or filling in of corners (dark corners are better described as ‘‘chopped,’’ whereas light corners are better described as ‘‘filled in’’), though it was also good at preventing noise from causing bumps in boundaries. In fact, this type of filter was found to be susceptible to distortions in the training images, a factor that might affect the generality of this otherwise powerful approach to filter design: on the other hand, its capability for solving the problem at some level provides a useful existence theorem that satisfactory filters must exist, and indicates that more conventional filters could be designed with the right properties. One indication of this is provided by Davies (1992b), which showed the design and properties of a filter that is able to avoid edge bias in the vicinity of noise impulses. 
Although edge shifts are generally disadvantageous, they are turned to good advantage in general rank-order filters and morphological filters, where they are used for processing shapes to create other shapes and in particular to filter objects by size, shape, and other detailed characteristics. Such filters can be made sufficiently general to be able to cope with a great variety of intensity profiles, so the filtering action has to be regarded as not merely binary but also gray scale and even color processing. In this article, space has not permitted color to be discussed; for similar reasons morphological filters have been restricted to what can be achieved with rank-order filters. The latter form a scale on which each individual filter is characterized by the rank-order parameter r, and the shifts for the whole range of rank-order filters for any neighborhood of n pixels form an orderly progression from −a to +a, where a is the radius of the neighborhood. In
general, the shifts depend more on r than on the intrinsic neighborhood shift, but for the median filter, the rank-order shift is identically zero, so the relatively small intrinsic neighborhood shift is readily observed. Several attempts have been made to calculate and measure the intrinsic median shifts. The theory was first developed on a continuum approximation, i.e., suppressing any knowledge that the image lattice is discrete in nature. However, this led to difficulties in obtaining exact agreement with experiment, so ultimately a discrete theory of median shifts had to be devised. This not only demonstrated highly accurate agreement with experimental measurements of shift but also showed that the shifts produced by median filters are very far from isotropic. However, it is possible that this is an overly harsh judgment, as rank-order filters give much larger shifts, and the anisotropy of the median shifts is small compared with the large shifts of these other types of filter. Although excellent agreement has been obtained for median filters, mean filters lead to blurring, which largely masks the shifts, and no attempt has been made to derive a discrete model in this case. [Note, however, an interesting discrete calculation and experimental results for noise-induced edge shifts and edge orientation estimation for Sobel-like edge detectors that employ integral mean filtering (Davies, 1987).] The same situation applies for mode filters, though for general rank-order filters some attempts have been made to envisage the discrete shifts that exist. However, the fact that rank-order shifts are generally large means that there is little need to refine the continuum approach and create a detailed discrete model in that case. One further factor has been found to be of great importance when calculating edge shifts: this is the intensity profile of the edges being investigated.
Binary edges constitute a nice concept, but in real gray-scale images, the edge is bound to be gradual and to occur over a distance of about a pixel. It proved possible to measure this effect for both median and mode filters, and in all cases examined nominal step-edges appeared to have widths of about 1.45 pixels. At the other extreme from step edges lie linear slant edges. However, in the case of mode filters the curvature of the edge profile became important, and the most important parts of the characteristic were the edge plateaus. In fact, it appears that different types of filter seek out different parts of the intensity profile and act on it in different ways. This explains the detailed differences in edge shift that arise for mean, median, and mode filters. In particular, note that
1. Mean filters blur images and optimally suppress Gaussian noise.
2. Median filters do not blur images but are excellent at suppressing impulse noise.
3. Mode filters sharpen up images and are quite good at suppressing impulse noise. In both the latter cases, note that the words ‘‘small irrelevant signals’’ could be used to replace ‘‘impulse noise,’’ thereby emphasizing the underlying (signal-oriented) characteristics of these filters. So many different edge (intensity) profiles and so many shapes of edge boundary are possible that it is difficult to provide a full account of all the edge shifts that may arise in practice. Suffice it to say that the step edge and linear slant edge profiles provide useful extreme cases, whereas the circular edge boundary assumed consistently throughout the article represents a ‘‘worst case’’ scenario, i.e., one leading to the largest shifts. If the edge shifts are a nuisance rather than an advantage, there are three possible courses of action: (1) employ an alternative filter that minimizes or eliminates the effect; (2) do not apply any filter at all; (3) estimate the extent of the shift and allow for it in any subsequent measurements. In this article, we feel that the last approach is generally preferable, and to this end we have provided the clearest guidance that is currently available on the magnitude of the shifts that can arise in a number of important cases. Table 1 lists these cases and indicates where in the article each is discussed. It is hoped that the analysis of the situation provided in this article will prove of some value to those who are working with filters in the area of image measurement. Finally, some problems arose in trying to relate the shifts that arise for continua and discrete lattices of pixels. Such problems are omnipresent in image analysis and make themselves evident in a variety of ways. Another major example of this is in the estimation of boundary length for what was originally an analogue picture and then became a digital image. 
The work of Kulpa and others in this area has been outlined in Section XI, and leads to the idea that a digital image with square tessellation will systematically overestimate length by a factor 1.055, so multiplication of boundary distance by the factor 0.948 is necessary to compensate for this. Related topics include the design of fiducial marks to permit maximum accuracy of location measurement (Bruckstein et al., 1998), and the partitioning of digital curves into maximal straight line segments (Lindenbaum and Bruckstein, 1993). For a recent tutorial review of the problems of achieving accuracy and robustness in low-level vision, see Davies (2000c).
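The contrast drawn in this section between mean and median filters can be illustrated on a one-dimensional signal. The Python sketch below is a toy example under stated assumptions: the function name, the window width of five samples, the replication of end samples at the borders, and the test signal (a step edge carrying a single impulse) are all choices made here. Mode filters are not covered.

```python
from statistics import mean, median

def sliding_filter(signal, reducer, half=2):
    """Apply reducer (e.g. mean or median) over a window of 2*half + 1 samples,
    replicating the end samples at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - half, i + half + 1)]
        out.append(reducer(window))
    return out

clean = [0] * 10 + [10] * 10     # ideal step edge
noisy = list(clean)
noisy[4] = 50                    # a single impulse

mean_out = sliding_filter(noisy, mean)
median_out = sliding_filter(noisy, median)

print(median_out == clean)       # True: impulse removed, step preserved
print(mean_out[9], mean_out[10]) # 4 6: the step is spread over several samples
```

The median output reproduces the clean step exactly, while the mean output both smears the impulse over five samples and replaces the step by a ramp. On a straight one-dimensional edge neither filter shifts the edge position; the shifts analyzed in this article arise for curved boundaries in two dimensions.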
Acknowledgments The author is grateful to Derek Charles for help in measuring edge shifts in large neighborhoods and for small circles (Sections IV.I and K; VI.D). In
addition, he would like to credit the following sources for permission to reproduce tables, figures, and extracts of text from his earlier publications: Academic Press for permission to reprint portions of Chapters 3 and 13 of the following book as text in Sections III and X; and as Figure 56: Davies (1997a). Elsevier Science for permission to reprint portions of the following paper as text in Section III; as Table 2; and as Figures 2, 3, and 5–13: Davies (1989). EURASIP for permission to reprint portions of the following paper as text in Section IV; and as Figure 23: Davies (1998). The IEE for permission to reprint portions of the following papers as text in Sections IV, V, VI, IX, and XI; as Table 5; and as Figures 14, 16–21, 35, 52–54, and 58–60: Davies (1991a,b, 1997b, 1999b, 2000b). Professional Engineering Publishing Ltd. and the Royal Photographic Society for permission to reprint portions of the following paper as text in Section VII; as Table 6; and as Figures 41–49: Davies (2000d). Springer-Verlag (Heidelberg) for permission to reprint portions of the following paper as text in Section X; and as Figures 55 and 57: Davies (1988b).
References

Bangham, J. A., and Marshall, S. (1998). Image and signal processing with mathematical morphology. IEE Electron. Commun. Eng. J. 10(3), 117–128.
Beckers, A. L. D., and Smeulders, A. W. M. (1989). A comment on "A note on 'Distance transformations in digital images'". Comput. Vision Graph. Image Process. 47, 89–91.
Bovik, A. C., Huang, T. S., and Munson, D. C. (1983). A generalization of median filtering using linear combinations of order statistics. IEEE Trans. Acoustics, Speech Signal Process. 31(6), 1342–1349.
Bovik, A. C., Huang, T. S., and Munson, D. C. (1987). The effect of median filtering on edge estimation and detection. IEEE Trans. Pattern Anal. Mach. Intell. 9(2), 181–194.
Bruckstein, A. M., O'Gorman, L., and Orlitsky, A. (1998). Design of shapes for precise image registration. IEEE Trans. Inform. Theory 44(7), 3156–3162.
Coleman, G. B., and Andrews, H. C. (1979). Image segmentation by clustering. Proc. IEEE 67, 773–785.
Davies, E. R. (1984). Circularity: a new principle underlying the design of accurate edge orientation operators. Image Vision Comput. 2, 134–142.
Davies, E. R. (1987). The effect of noise on edge orientation computations. Pattern Recogn. Lett. 6(5), 315–322.
Davies, E. R. (1988a). On the noise suppression and image enhancement characteristics of the median, truncated median and mode filters. Pattern Recogn. Lett. 7(2), 87–97.
Davies, E. R. (1988b). Median-based methods of corner detection. In Proceedings of the 4th BPRA International Conference on Pattern Recognition, Cambridge (28–30 March), edited by J. Kittler, Lecture Notes in Computer Science. Berlin: Springer-Verlag, Vol. 301, pp. 360–369.
Davies, E. R. (1989). Edge location shifts produced by median filters: Theoretical bounds and experimental results. Signal Process. 16(2), 83–96.
Davies, E. R. (1991a). Median and mean filters produce similar shifts on curved boundaries. Electron. Lett. 27(10), 826–828.
Davies, E. R. (1991b). Insight into operation of Kulpa boundary distance measure. Electron. Lett. 27(13), 1178–1180.
Davies, E. R. (1992a). Simple fast median filtering algorithm, with application to corner detection. Electron. Lett. 28(2), 199–201.
Davies, E. R. (1992b). Accurate filter for removing impulse noise from one- or two-dimensional signals. IEE Proc. E 139(2), 111–116.
Davies, E. R. (1992c). Simple two-stage method for the accurate location of Hough transform peaks. IEE Proc. E 139(3), 242–248.
Davies, E. R. (1993). Electronics, Noise and Signal Recovery. London: Academic Press.
Davies, E. R. (1997a). Machine Vision: Theory, Algorithms, Practicalities. 2nd ed. London: Academic Press.
Davies, E. R. (1997b). Shifts produced by mode filters on curved intensity contours. Electron. Lett. 33(5), 381–382.
Davies, E. R. (1998). From continuum model to a detailed discrete theory of median shifts. Proc. EUSIPCO'98, Rhodes, Greece, 8–11 Sept., pp. 805–808.
Davies, E. R. (1999a). High precision discrete model of median shifts. Proc. 7th IEE Int. Conf. Image Process. Appl., Manchester (13–15 July), IEE Conf. Publication No. 465, pp. 197–201.
Davies, E. R. (1999b). Image distortions produced by mean, median and mode filters. IEE Proc. Vision Image Signal Process. 146(5), 279–285.
Davies, E. R. (2000a). Image Processing for the Food Industry. Singapore: World Scientific.
Davies, E. R. (2000b). Resolution of problem with use of closing for texture segmentation. Electron. Lett. 36(20), 1694–1696.
Davies, E. R. (2000c). Low-level vision requirements. Electron. Commun. Eng. J. 12(5), 197–210.
Davies, E. R. (2000d). A generalized model of the geometric distortions produced by rank-order filters. Imaging Sci. J. 48(3), 121–130.
Davies, E. R., Bateman, M., Chambers, J., and Ridgway, C. (1998). Hybrid non-linear filters for locating speckled contaminants in grain. IEE Digest No. 1998/284, Colloquium on Non-Linear Signal and Image Processing. IEE (22 May), pp. 12/1–5.
Dorst, L., and Smeulders, A. W. M. (1987). Length estimators for digitized contours. Comput. Vision Graph. Image Process. 40, 311–333.
Evans, A. N., and Nixon, M. S. (1995). Mode filtering to reduce ultrasound speckle for feature extraction. IEE Proc. Vision Image Signal Process. 142(2), 87–94.
Fitch, J. P., Coyle, E. J., and Gallagher, N. C. (1985). Root properties and convergence rates of median filters. IEEE Trans. Acoust. Speech Signal Process. 33, 230–239.
Freeman, H. (1970). Boundary encoding and processing. In Picture Processing and Psychopictorics, edited by B. S. Lipkin and A. Rosenfeld. New York: Academic Press, pp. 241–266.
Gallagher, N. C., and Wise, G. L. (1981). A theoretical analysis of the properties of median filters. IEEE Trans. Acoust. Speech Signal Process. 29, 1136–1141.
Goetcherian, V. (1980). From binary to grey tone image processing using fuzzy logic concepts. Pattern Recogn. 12, 7–15.
Greenhill, D., and Davies, E. R. (1994). Relative effectiveness of neural networks for image noise suppression. In Pattern Recognition in Practice IV, edited by E. S. Gelsema and L. N. Kanal. Amsterdam: Elsevier Science B. V., pp. 367–378.
Griffin, L. D. (1997). Scale-imprecision space. Image Vision Comput. 15(5), 369–398.
Griffin, L. D. (2000). Mean, median and mode filtering of images. Proc. R. Soc. 456(2004), 2995–3004.
Haralick, R. M., and Shapiro, L. G. (1992). Computer and Robot Vision, Vol. 1. Reading, MA: Addison Wesley.
Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987). Image analysis using mathematical morphology. IEEE Trans. Pattern Anal. Mach. Intell. 9(4), 532–550.
Harvey, N. R., and Marshall, S. (1995). Rank-order morphological filters: A new class of filters. Proc. IEEE Workshop on Nonlinear Signal and Image Processing, Halkidiki, Greece, June, pp. 975–978.
Heinonen, P., and Neuvo, Y. (1987). FIR-median hybrid filters. IEEE Trans. Acoust. Speech Signal Process. 35, 832–838.
Hodgson, R. M., Bailey, D. G., Naylor, M. J., Ng, A. L. M., and McNeill, S. J. (1985). Properties, implementations and applications of rank filters. Image Vision Comput. 3(1), 4–14.
Kitchen, L., and Rosenfeld, A. (1982). Gray-level corner detection. Pattern Recogn. Lett. 1, 95–102.
Koplowitz, J., and Bruckstein, A. M. (1989). Design of perimeter estimators for digitized planar shapes. IEEE Trans. Pattern Anal. Mach. Intell. 11(6), 611–622.
Kulpa, Z. (1977). Area and perimeter measurement of blobs in discrete binary pictures. Comput. Graph. Image Process. 6, 434–451.
Laws, K. I. (1979). Texture energy measures. Proc. Image Understanding Workshop, Nov., pp. 47–51.
Lindenbaum, M., and Bruckstein, A. M. (1993). On recursive, O(N) partitioning of a digitized curve into digital straight segments. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 949–953.
Nagel, H.-H. (1983). Displacement vectors derived from second-order intensity variations in image sequences. Comput. Vision Graph. Image Process. 21, 85–117.
Nieminen, A., Heinonen, P., and Neuvo, Y. (1987). A new class of detail-preserving filters for image processing. IEEE Trans. Pattern Anal. Mach. Intell. 9(1), 74–90.
Paler, K., Föglein, J., Illingworth, J., and Kittler, J. (1984). Local ordered grey levels as an aid to corner detection. Pattern Recogn. 17, 535–543.
Proffitt, D., and Rosen, D. (1979). Metrication errors and coding efficiency of chain-encoding schemes for the representation of lines and edges. Comput. Graph. Image Process. 10, 318–332.
Wang, C., Sun, H., Yada, S., and Rosenfeld, A. (1983). Some experiments in relaxation image matching using corner features. Pattern Recogn. 16, 167–182.
Yang, G. J., and Huang, T. S. (1981). The effect of median filtering on edge location estimation. Comput. Graph. Image Process. 15, 224–245.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 126
Two-Photon Excitation Microscopy

ALBERTO DIASPRO¹ AND GIUSEPPE CHIRICO²

¹LAMBS-INFM and Department of Physics, University of Genoa, 16146 Genova, Italy
²LAMBS-INFM and Department of Physics, University of Milano Bicocca, 20126 Milano, Italy

I. Introduction
II. Historical Notes
III. Basic Principles of Two-Photon Excitation of Fluorescent Molecules
IV. Behavior of Fluorescent Molecules under TPE Regime
V. Optical Consequences and Resolution Aspects
VI. Architecture of Two-Photon Microscopy
   A. General Considerations
   B. Laser Sources
   C. Lens Objectives
   D. Example of the Practical Realization of a TPE Microscope
VII. Application Gallery
VIII. Conclusions
References
‘‘If I have seen further it is by standing on the shoulders of giants.’’ (Isaac Newton in a letter to Robert Hooke, 5 February, 1676)
I. Introduction Microscopes offer a key to pursuing the goal of opening nature, providing clues as in a secret garden. As recently noted by Colin Sheppard (2002), a microscope is an instrument that magnifies objects by means of a specific interaction—more commonly by means of lenses—so as to capture details invisible to the naked eye. Microscopes transmit information based on image formation, which renders visible previously hidden objects. A ‘‘primary’’ observer is then required to interpret the image (Rochow and Tucker, 1994). Since Hooke’s ornate microscopes (Hooke, 1961) (Fig. 1) and van Leeuwenhoek’s single lens magnifiers (Ford, 1991) (Fig. 2), the development of the optical microscope has undergone a secure and continuous evolution marked by relevant and revolutionary passages in the past 350 years. Inventions in microscopy, stimulated by the needs of scientists, and technology contributed to the evolution of the microscope in its very different modern forms (Beltrame et al., 1985; Fay et al., 1989; 195 Copyright 2003 Elsevier Science (USA). All rights reserved. ISSN 1076-5670/03
DIASPRO AND CHIRICO
Figure 1. Drawing of Hooke’s microscope by Cock, 1665 from Micrographia (Hooke, 1961). Hooke did not make his own microscopes; they were made by London instrument maker Christopher Cock, to whom Hooke gave much advice on design. In return, the success of Hooke’s book made Cock a very famous microscope maker, and popularized the side-pillar design (see also http://www.utmem.edu/~thjones/hist).
Figure 2. Al Shinn’s homemade replica of van Leeuwenhoek’s microscope. Antony van Leeuwenhoek (1632–1723) was a microscopist and a microscope maker: he made more than 400 microscopes. Other information can be found at http://www.sirius.com/~alshinn. (Courtesy of Al Shinn.)
TWO-PHOTON EXCITATION MICROSCOPY
Benedetti, 1998; Amos, 2000). Despite the fact that all far-field light microscopes, including conventional, confocal, and two-photon microscopes, are limited in achievable diffraction-limited resolution (Abbe, 1910), light microscopy still occupies a unique niche. Its favorable position, especially for applications in medicine and biology, comes from the peculiar ability of the optical microscope to image living systems at relatively good spatial resolution. Well-established three-dimensional (3D) optical methods such as computational optical sectioning microscopy (Agard et al., 1989; Bianco and Diaspro, 1989; Diaspro et al., 1990; Carrington et al., 1995; Carrington, 2002) and confocal laser scanning microscopy (Brakenhoff et al., 1979; Sheppard and Wilson, 1980; Wilson and Sheppard, 1984; Carlsson et al., 1985; White et al., 1987; Brakenhoff et al., 1989; Wilson, 2002) have been widespread since the 1970s (Weinstein and Castleman, 1971; Shotton, 1993). To penetrate the delicate and complex relationship between structure and function, three-dimensional imaging helps to offset the major shortcoming of diffraction-limited resolution, which is about 200 nm in the focal plane and about 500 nm along the optical axis. For the past 10 years, confocal microscopes have proved to be extremely useful research tools, notably in the life sciences. This mature and powerful technique has now evolved to 3D (x–y–z) and 4D (x–y–z–t) analysis, allowing researchers to probe even deeper into the intricate mechanisms of living systems (Cheng, 1994; Pawley, 1995; Masters, 1996; Sheppard and Shotton, 1997; Periasamy, 2001; Diaspro, 2002). Within this scenario, two-photon excitation (TPE) microscopy (Denk et al., 1990) is probably the most relevant advancement in fluorescence optical microscopy since the introduction of confocal imaging in the 1980s (Wilson and Sheppard, 1984; Pawley, 1995; Webb, 1996; Sheppard and Shotton, 1997; Diaspro, 2002).
It is worth noting that its fast and increasing spread has been strongly influenced by the availability of ultrafast pulsed lasers (Gratton and van de Vende, 1995; Svelto, 1998) as well as the advances in fluorescence microscopy that can be also ascribed to the availability of efficient and specific fluorophores (Haughland, 2002). Now, to place TPE microscopy in the framework of modern microscopy, consider that harm to a large portion of the specimen by fluorescence excitation is a very unfavorable condition affecting ‘‘classic’’ 3D optical schemes. Because of this experimental condition some potentially interesting biological experiments are defeated by photobleaching of the fluorescent label and phototoxicity. This fact applies in particular when there is the need for 3D imaging together with the use of ultraviolet excitable fluorochromes. The advent of two-photon excitation laser scanning microscopy mitigates these concerns, opening new perspectives to the application of microscopic techniques to the study of biological systems and related phenomena, and
providing further attractive advantages over classic fluorescence microscopy. In addition to its intrinsic three-dimensional ability, two-photon excitation microscopy is endowed with five other interesting capabilities. (1) TPE greatly reduces photointeractions and allows imaging of living specimens over long time periods. (2) TPE operates in a high-sensitivity, background-free acquisition scheme. (3) TPE microscopy can image turbid and thick specimens down to a depth of a few hundred micrometers. (4) TPE allows simultaneous excitation of different fluorescent molecules, reducing colocalization errors. (5) TPE can prime photochemical reactions within a subfemtoliter volume inside solutions, cells, and tissues. Moreover, the use of infrared (IR) radiation to excite ultraviolet (UV) or visible transitions allows better discrimination between Rayleigh and Raman scattering, which again falls in the IR, and the fluorescence signal. Thus, TPE fluorescence microscopy is revolutionary not only in its ability to provide optical sections, together with other practical advantages, but also in its elegance and effectiveness as applied quantum physics (Loudon, 1983; Feynman, 1985; Shih et al., 1998).
Furthermore, this form of nonlinear microscopy also favors the development and application of other investigative techniques, such as three-photon excited fluorescence (Hell et al., 1996; Maiti et al., 1997), second-harmonic generation (Gannaway and Sheppard, 1978; Campagnola et al., 1999; Diaspro et al., 2002d; Zoumi et al., 2002), third-harmonic generation (Mueller et al., 1998; Squier et al., 1998), fluorescence correlation spectroscopy (Berland et al., 1995; Schwille et al., 1999, 2000), image correlation spectroscopy (Wiseman et al., 2000, 2002), lifetime imaging (Konig et al., 1996c; French et al., 1997; Sytsma et al., 1998; Straub and Hell, 1998), single-molecule detection schemes (Mertz et al., 1995; Xie and Lu, 1999; Sonnleitner et al., 1999; Chirico et al., 2001), photodynamic therapies (Bhawalkar et al., 1997), and others (Diaspro, 1998; White and Errigton, 2000; Masters, 2002; Periasamy, 2001). For these and other reasons, TPE has become an important and relevant technique among biophysicists and biologists.
II. Historical Notes
In 1990 Denk and colleagues opened a new chapter in optical microscopy demonstrating the practical application of TPE to optical microscopy of biological systems (Denk et al., 1990). Notwithstanding this, the TPE story dates back to 1931 and its roots are in the theory originally developed by Maria Göppert-Mayer (1931) (Fig. 3) and later reprised by Axe (1964). The
Figure 3. Cover of the prestigious scientific journal Annalen der Physik and first page of the famous article published by Maria Göppert-Mayer (1931). (Image obtained by scanning from the Antonio Borsellino library collection, Department of Physics, University of Genoa.)
first page of her historical article from Göppert-Mayer’s doctoral thesis, predicting the phenomenon of two-photon absorption, is shown in Figure 4. The keystone of TPE theory lies in the prediction that one atom or molecule can simultaneously absorb two photons in the very same quantum event, as originally sketched in Figure 5. Now, to understand the rarity of the event, consider that the adverb “simultaneously” here implies “within a temporal window of $10^{-16}$–$10^{-15}$ s.” As recalled by Denk and Svoboda (1997), in bright daylight a good one- or two-photon excitable fluorescent molecule absorbs a photon through one-photon interaction about once a second and a photon pair by two-photon simultaneous interaction every 10 million years. To increase the probability of the event a very high density of photons is needed, i.e., a very bright and efficient light source.
Figure 4. Photograph of Maria Göppert-Mayer biking with colleagues. (Reproduced with permission from AIP Emilio Segrè Visual Archives, http://www.aip.org/history/esva.)
Figure 5. Quantum physics two-photon absorption rules as originally reported by Maria Göppert-Mayer (1931). (Image obtained by scanning from the Antonio Borsellino library collection, Department of Physics, University of Genoa.)
In fact, it was only in the 1960s, after the development of the first laser sources (Wise, 1999; Svelto, 1998), that it was possible to find experimental evidence of Maria Göppert-Mayer’s prediction. Kaiser and Garret (1961) reported two-photon excitation of fluorescence in CaF$_2$:Eu$^{2+}$ and Singh and Bradley (1964) were able to estimate the three-photon absorption cross section for naphthalene crystals. These two results consolidated other related experimental achievements obtained by Franken et al. (1961) of second-harmonic generation in a quartz crystal using a ruby laser. Later, Rentzepis and colleagues (1970) observed three-photon excited fluorescence from organic dyes, and Hellwarth and Christensen (1974) collected second-harmonic generation signals from ZnSe polycrystals at a microscopic level. In 1976, Berns reported a probable two-photon effect as a result of focusing
Figure 6. First page of the revolutionary paper by Denk and colleagues on TPE microscopy of biological samples (Denk et al., 1990). (Image obtained by scanning from the Antonio Borsellino library collection, Department of Physics, University of Genoa.)
an intense pulsed laser beam onto chromosomes of living cells (Berns, 1976), and such interactions form the basis of modern nanosurgery (Konig, 2000). However, the original idea of generating 3D microscopic images by means of such nonlinear optical effects was first suggested and attempted in the 1970s by Sheppard, Kompfner, Gannaway, and Choudhury of the Oxford group (Sheppard et al., 1977; Gannaway and Sheppard, 1978; Sheppard and Kompfner, 1978). It was the Oxford group that realized the ability to do optical sectioning based on the event being confined at the focal plane of the objective, because the image intensity had a quadratic dependence on the illumination power (Wilson and Sheppard, 1984). It should be emphasized that for many years the application of two-photon absorption was mainly related to spectroscopic studies (Friedrich and McClain, 1980; Friedrich, 1982; Birge, 1986; Callis, 1997). The real ‘‘TPE boom’’ took place at the beginning of the 1990s at the W. W. Webb Laboratories (Cornell University, Ithaca, NY). In fact, as previously mentioned, it was the excellent and effective work done by Winfried Denk and colleagues (1990) that produced the major impact for spreading of the technique and that revolutionized fluorescence microscopy imaging. Figure 6 reproduces the first page of the cited paper from Science that revolutionized the microscopic approach to study biological systems at the cellular and molecular level. The potential of two-photon excited fluorescence imaging in a scanning microscope was rapidly coupled with the availability of ultrafast pulsed lasers. 
It was the development of mode-locked lasers, providing high peak power femtosecond pulses with a repetition rate around 100 MHz (Spence et al., 1991; Gosnell and Taylor, 1991; Gratton and van de Vende, 1995; Fisher et al., 1997; Wise, 1999), that made possible in practice the fast dissemination of TPE laser scanning microscopy and the flourishing of related techniques in a sort of avalanche effect (Hell, 1996; Diaspro, 1998, 1999a,b, 2002; Konig, 2000; Periasamy, 2001; Gratton et al., 2001). The technological advances that made two-photon excitation microscopy successful can be found in four continuously evolving areas, namely, the development of laser scanning microscopy, of ultrafast laser sources, of highly sensitive and fast acquisition devices, and of digital electronic tools (Shotton, 1993; Piston, 1999; Tan et al., 1999; Robinson, 2001).
III. Basic Principles of Two-Photon Excitation of Fluorescent Molecules
Fluorescence microscopy is a very popular contrast mechanism for imaging in biology since fluorescence is highly specific either as exogenous labeling or endogenous autofluorescence (Periasamy, 2001). Fluorescent molecules
allow us to obtain both spatial and functional information through specific absorption, emission, lifetime, anisotropy, photodecay, diffusion, and other contrast mechanisms (Diaspro, 2002; Zoumi et al., 2002). This means that one can efficiently study, for example, the distribution of proteins, organelles, and DNA as well as ion concentration, voltage, and temperature within living cells (Chance, 1989; Tsien, 1998; Robinson, 2001). Two-photon excitation of fluorescent molecules is a nonlinear process related to the simultaneous absorption of two photons whose total energy equals the energy required for conventional, one-photon, excitation (Birks, 1970; Denk et al., 1995; Callis, 1997). In any case the energy required to prime fluorescence is the energy sufficient to produce a molecular transition to an excited electronic state. Conventional techniques for fluorescence excitation use UV or visible radiation, and excitation occurs when the absorbed photons are able to match the energy gap between the ground and the excited state. Then the excited fluorescent molecules decay to an intermediate state giving off a photon of light having an energy lower than the one needed to prime excitation. This means that the energy ($E$) provided by photons should equal the molecule energy gap ($E_g$), and considering the relationship between photon energy ($E$) and radiation wavelength ($\lambda$) it follows that
$$E_g = E = hc/\lambda \tag{1}$$
where $h = 6.6 \times 10^{-34}$ J s is Planck’s constant and $c = 3 \times 10^{8}$ m s$^{-1}$ is the speed of light (taken in a vacuum, to a reasonable approximation). Due to energetic aspects, the fluorescence emission is shifted toward a wavelength longer than the one used for excitation. This shift typically ranges from 50 to 200 nm (Birks, 1970; Cantor and Schimmel, 1980). For example, a fluorescent molecule that absorbs one photon at 340 nm, in the ultraviolet region, exhibits fluorescence at 420 nm in the blue region, as sketched in Figure 7. In an almost classic three-dimensional fluorescence optical microscope such as the confocal one the fluorescence process is such that the excitation photons are focused into a diffraction-limited spot scanned on the specimen (Wilson and Sheppard, 1984; Webb, 1996). The three-dimensional ability, i.e., the confocal effect, is obtained by confining both the illuminated focal region and the detected area of the emitted light (Sheppard, 2002; Wilson, 2002). Thus, the light emitted from the specimen is imaged by the objective lens of the microscope into the image plane. Here a circular aperture (pinhole) is placed in front of a light detector, as depicted in Figure 8. This pinhole is responsible for rejection of the axial out-of-focus light and of the lateral overlapping diffraction patterns. This produces an improvement of spatial resolution by a factor of 1.4 along each direction, resulting in a volume
Figure 7. Jablonski’s fluorescence selection rules for one-photon excitation. The fluorescent molecule is brought to an excited state and relaxes back by emitting fluorescence. (Diagram labels: one-photon UV or visible excitation at 340 nm; visible fluorescence emission at 420 nm.) (Courtesy of Ammasi Periasamy, W. M. Keck Center for Cellular Imaging, University of Virginia.)
Figure 8. Confocal basic setup. Fluorescence coming from the geometric focal plane (green) can reach the detector module unlike out-of-focus fluorescence above (red) and below (blue) the actual focal plane that is blocked by a pinhole. (Courtesy of Perkin Elmer.)
selectivity 2.7 times better than in the wide-field case (Brakenhoff et al., 1979; Wilson and Sheppard, 1984; Diaspro et al., 1999a; Jonkman and Stelzer, 2002; Torok and Sheppard, 2002). It is the physical suppression of the contributions from out-of-focus layers to image formation that produces the so-called optical sectioning effect. Unfortunately, a drawback is that during the excitation process of the fluorescent molecules the whole thickness of the specimen is harmed by every scan, within an hourglass-shaped region (Bianco and Diaspro, 1989). This means that even though out-of-focus fluorescence is not detected, it is generated, with the negative effect of potentially inducing the photobleaching and phototoxicity phenomena previously mentioned. The situation becomes particularly serious when there is the need for three-dimensional and temporal imaging coupled with the use of fluorochromes that require excitation in the ultraviolet regime (Stelzer et al., 1994; Denk, 1996). As earlier reported by Konig and colleagues (1996a), even UVA (320–400 nm) photons may modify the activity of the biological system. DNA breaks, giant cell production, and cell death can be induced at radiant exposures on the order of J/cm$^2$, accumulable during 10 scans with a 5-mW laser scanning beam at approximately 340 nm and a 50-ms pixel dwell time. In this context, two-photon excitation of fluorescent molecules provides an immediate practical advantage over confocal microscopy (Denk et al., 1990; Potter, 1996; Centonze and White, 1998; Gu and Sheppard, 1995; Piston, 1999; Squirrel et al., 1999; Diaspro and Robello, 2000; So et al., 2000; Wilson, 2002). In fact, reduced overall photobleaching and photodamage are generally acknowledged as major advantages of two-photon excitation in laser scanning microscopy of biological specimens (Brakenhoff et al., 1996; Denk and Svoboda, 1997; Patterson and Piston, 2000).
However, excitation intensity has to be kept low, the normal operation mode being a regime under 10 mW of average power. When laser power is increased above 10 mW some nonlinear effects might arise, evidenced by an abrupt rise of the signals (Hopt and Neher, 2001). Moreover, photothermal effects may be induced, especially when focusing on single-molecule detection schemes (Chirico et al., 2002). In TPE, two low-energy photons are involved in the interaction with absorbing molecules. The excitation process of a fluorescent molecule can take place only if two low-energy photons are able to interact simultaneously with the very same fluorophore. As mentioned in the introduction, the time scale for simultaneity is the time scale of molecular energy fluctuations at photon energy scales, as determined by the Heisenberg uncertainty principle, i.e., $10^{-16}$–$10^{-15}$ s (Louisell, 1973). These two photons do not necessarily have to be identical but their wavelengths, $\lambda_1$ and $\lambda_2$, have to be such that
$$\lambda_{1P} \cong \left( \frac{1}{\lambda_1} + \frac{1}{\lambda_2} \right)^{-1} \tag{2}$$
where $\lambda_{1P}$ is the wavelength needed to prime fluorescence emission in a conventional one-photon absorption process according to the energy request outlined in Eq. (1). This situation, compared to the conventional one-photon excitation process shown in Figure 7, is illustrated in Figure 9 using a Jablonski-like diagram. It is worth noting that for practical reasons the experimental choice is usually such that (Denk et al., 1990; Diaspro, 2001; Girkin and Wokosin, 2002)
$$\lambda_1 = \lambda_2 \approx 2\lambda_{1P} \tag{3}$$
and
$$E_g = 2hc/\lambda_1 = hc/\lambda_{1P} \tag{4}$$
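The wavelength condition of Eqs. (2) and (3) can be checked numerically. The following is a minimal sketch in Python; the function name and the 600/800 nm pair are illustrative assumptions, not values taken from the text, while the 680 nm pair corresponds to the 340 nm one-photon example used above.

```python
def one_photon_equivalent_nm(lambda1_nm, lambda2_nm):
    """Eq. (2): one-photon wavelength matched by two photons.

    Photon energies add, so the reciprocal wavelengths add.
    (Hypothetical helper name, used only for this illustration.)
    """
    return 1.0 / (1.0 / lambda1_nm + 1.0 / lambda2_nm)


# Degenerate case of Eq. (3): two identical 680 nm photons.
degenerate_nm = one_photon_equivalent_nm(680.0, 680.0)

# Non-degenerate case: the two photons need not be identical
# (illustrative wavelengths, assumed for this example).
mixed_nm = one_photon_equivalent_nm(600.0, 800.0)

print(f"680 nm + 680 nm -> {degenerate_nm:.1f} nm")
print(f"600 nm + 800 nm -> {mixed_nm:.1f} nm")
```

In the degenerate case the result is simply half the excitation wavelength, which is the practical choice expressed by Eq. (3).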
Considering this as a nonresonant process and the existence of a virtual intermediate state, one can estimate the residence time, $\tau_{virt}$, in this intermediate state using the time-energy uncertainty consideration for TPE:
$$E_g \, \tau_{virt} \cong \hbar/2 \tag{5}$$
where $\hbar = h/2\pi$.
Figure 9. Jablonski’s fluorescence selection rules for two- and three-photon excitation. When the fluorescent molecule is brought to the excited state it relaxes emitting the same fluorescence as in the one-photon excitation case. (Diagram labels: two-photon IR excitation at 680 nm and three-photon IR excitation at 1020 nm through virtual states (VS), each followed by visible fluorescence emission at 420 nm.) (Courtesy of Ammasi Periasamy, W. M. Keck Center for Cellular Imaging, University of Virginia.)
It follows that
$$\tau_{virt} \cong 10^{-15}\text{–}10^{-16}\ \mathrm{s} \tag{6}$$
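As a rough numerical check of Eqs. (1), (5), and (6), the sketch below evaluates the residence time for the 340 nm one-photon transition used as an example earlier. The constants are standard rounded values and the function names are hypothetical; the choice of 340 nm is an assumption made only for illustration.

```python
import math

H = 6.626e-34              # Planck's constant, J s
C = 2.998e8                # speed of light in vacuum, m/s
HBAR = H / (2.0 * math.pi) # reduced Planck's constant, J s


def energy_gap_joule(lambda_1p_m):
    """Eq. (1): energy gap matched by a one-photon wavelength."""
    return H * C / lambda_1p_m


def virtual_state_time_s(energy_gap_j):
    """Eq. (5): residence time from E_g * tau_virt ~ hbar / 2."""
    return HBAR / (2.0 * energy_gap_j)


e_gap = energy_gap_joule(340e-9)        # UV transition at 340 nm
tau_virt = virtual_state_time_s(e_gap)  # falls near the 1e-16 s end of Eq. (6)

print(f"E_g      = {e_gap:.3e} J")
print(f"tau_virt = {tau_virt:.3e} s")
```

Longer one-photon wavelengths (smaller energy gaps) give proportionally longer residence times, so visible transitions move the estimate toward the upper end of the range in Eq. (6).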
This is the temporal window available to two photons to coincide in the virtual state. As will be more evident in the following sections, the TPE process requires high photon flux densities that can typically be obtained by tightly focusing a laser beam. Thus, in a TPE process it is crucial to combine sharp spatial focusing with temporal confinement of the excitation beam. The process can be extended to n photons, requiring higher photon densities temporally and spatially confined (Fig. 9). In this way, near-infrared (about 680–1100 nm) photons can be used to excite UV and visible electronic transitions producing fluorescence. The typical photon flux densities are of the order of more than $10^{24}$ photons cm$^{-2}$ s$^{-1}$, which implies intensities around MW–TW cm$^{-2}$ (Göppert-Mayer, 1931; Konig et al., 1996a). Such a high photon flux density can be achieved by focusing continuous near-infrared laser beams with high numerical aperture objectives (Hanninen and Hell, 1994; Konig et al., 1995). Liu and colleagues (1995) showed that cellular heating due to mW intensities at near-infrared wavelengths is of the order of 20 mK/mW (see also Konig and Tirlapur, 2002). However, the design and realization of ultrafast laser sources further improve the situation (Konig, 2000). Figure 10 shows the main factors in the application of this phenomenon in microscopy, namely, high numerical aperture lenses and ultrafast infrared laser sources. A treatment in terms of quantum theory for two-photon transition has been elegantly proposed by Nakamura (1999) using perturbation theory. He
Figure 10. Technical ingredients for two- and multiphoton excitation microscopy.
clearly describes the process by a time-dependent Schroedinger equation, where the Hamiltonian contains electric dipole interaction terms. Using perturbative expansion one finds that the first-order solution is related to one-photon excitation and higher order solutions are related to n-photon ones (Faisal, 1987; Callis, 1997). It is worth noting that the dipole operator has odd parity, and the one-photon transition moment reflects that the initial and final states have opposite parity, whereas in the two-photon case the two states have the same parity (So et al., 2000). Now, let us try to discuss TPE on the basis of the following simple assumption: the probability of a molecule undergoing n-photon absorption is proportional to the probability of finding n photons within the volume it occupies at any moment in time (Louisell, 1973; Andrews, 1985). Alternatively, what is the probability of finding two photons within the interval of time the molecule spends in a virtual state (Moscatelli, 1986)? Here we will refer to the first case discussed earlier by Andrews (1985): what is the probability $p_n$ that n photons are in the same molecular volume? We consider that all the molecules are endowed with a suitable set of energy levels such that all n-photon transitions are possible. We then consider the relationship between the mean number of photons, $M$, at any time within a molecular volume and the intensity, $I$, of the laser beam, that is, energy per unit area per unit time. Consider a cube of side $S$ through which the photons are delivered within a beam of width much larger than $S$. The mean energy in this cubic box, for a certain wavelength $\lambda$, is
$$E_M = Mhc/\lambda \tag{7}$$
Since the cross-sectional area is $S^2$, and the time needed for each photon to cross the box is $S/c$, then
$$I = E_M/(S^2 \cdot S/c) = Mhc^2/(\lambda S^3) \tag{8}$$
Recalling that $V = S^3 = V_m/N_a$, where for a molecule the mean volume occupied is the molar volume $V_m$ divided by Avogadro’s constant, $N_a = 6.022 \times 10^{23}$ mol$^{-1}$, we have
$$M = I V_m \lambda/(N_a h c^2) \tag{9}$$
As an example, considering a wavelength of 780 nm delivered at peak intensities of the order of GW cm$^{-2}$ into a reasonable molecular volume of the order of $10^{-4}$ m$^3$ mol$^{-1}$, the resulting value for $M$ is of the order of $10^{-5}$. Using a Poisson distribution to determine $p_n$ (Louisell, 1973) we get
$$p_n = (M^n/n!)\,e^{-M} \tag{10}$$
The resulting probability for TPE, $n = 2$, expanding the exponential term in Taylor series for $M$ small and truncating at the first term, setting the exponential value to unity, is given by
$$p_2 \cong M^2/2 \propto I^2 \tag{11}$$
where $\propto$ denotes proportionality.
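The estimate of $M$ from Eq. (9) and the quadratic scaling of Eq. (11) can be reproduced with a few lines of Python. This is a minimal sketch under the example values quoted above (780 nm, 1 GW cm$^{-2}$, $10^{-4}$ m$^3$ mol$^{-1}$); the function names are hypothetical and the constants are standard rounded values.

```python
import math

H = 6.626e-34       # Planck's constant, J s
C = 2.998e8         # speed of light, m/s
N_AVOGADRO = 6.022e23  # Avogadro's constant, 1/mol


def mean_photon_number(intensity_w_m2, molar_volume_m3_mol, wavelength_m):
    """Eq. (9): mean number of photons inside one molecular volume."""
    return (intensity_w_m2 * molar_volume_m3_mol * wavelength_m
            / (N_AVOGADRO * H * C ** 2))


def poisson_probability(n, mean):
    """Eq. (10): probability of finding exactly n photons."""
    return mean ** n / math.factorial(n) * math.exp(-mean)


intensity = 1e9 * 1e4   # 1 GW/cm^2 converted to W/m^2
m_mean = mean_photon_number(intensity, 1e-4, 780e-9)
p_two = poisson_probability(2, m_mean)

# Doubling the intensity should roughly quadruple p_2, as in Eq. (11).
p_two_doubled = poisson_probability(
    2, mean_photon_number(2 * intensity, 1e-4, 780e-9))

print(f"M       = {m_mean:.2e}")
print(f"p_2     = {p_two:.2e}  (M^2/2 = {m_mean ** 2 / 2:.2e})")
print(f"p_2(2I) / p_2(I) = {p_two_doubled / p_two:.4f}")
```

Because $M$ is so small, the exact Poisson value and the truncated form $M^2/2$ agree to within a few parts in $10^5$, which is why setting the exponential to unity is harmless here.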
Here, the dependence of TPE on $I^2$ should be evident. Because we have shown that TPE is a process that has a quadratic dependence on the instantaneous intensity of the excitation beam, we can introduce the molecular cross section, as the propensity of the molecule to absorb, in a TPE event, photons having a certain energy or wavelength, and refer the fluorescence emission as a function of the temporal characteristics of the light, $I(t)$, to it. Thus, the fluorescence intensity per molecule, $I_f(t)$, can be considered proportional to the molecular cross section $\delta_2(\lambda)$ and to $I(t)^2$ as
$$I_f(t) \propto \delta_2 \, I(t)^2 \propto \delta_2 \, P(t)^2 \left[ \pi \frac{(NA)^2}{hc\lambda} \right]^2 \tag{12}$$
where $P(t)$ is the laser power and NA is the numerical aperture of the focusing objective lens. The last term of Eq. (12) simply takes care of the distribution in time and space of the photons by using the paraxial approximation in an ideal optical system (Born and Wolf, 1980). It follows that the time-averaged two-photon fluorescence intensity per molecule within an arbitrary time interval $T$, $\langle I_f(t) \rangle$, can be written as
$$\langle I_f(t) \rangle = \frac{1}{T} \int_0^T I_f(t)\,dt \propto \delta_2 \left[ \pi \frac{(NA)^2}{hc\lambda} \right]^2 \frac{1}{T} \int_0^T P(t)^2\,dt \tag{13}$$
in the case of continuous wave (CW) laser excitation. Now, because the present experimental situation for TPE is related to the use of ultrafast lasers, we consider that for a pulsed laser $T = 1/f_p$, where $f_p$ is the pulse repetition rate (Svelto, 1998). For a CW laser beam, where $P(t) = P_{ave}$, Eq. (13) becomes
$$\langle I_{f,cw}(t) \rangle \propto \delta_2 \, P_{ave}^2 \left[ \pi \frac{(NA)^2}{hc\lambda} \right]^2 \tag{14}$$
Now, for a pulsed laser beam with pulse width $\tau_p$, repetition rate $f_p$, and average power
$$P_{ave} = D \cdot P_{peak}(t), \quad \text{where } D = \tau_p f_p \tag{15}$$
the approximated $P(t)$ profile can be described as
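$$P(t) = \frac{P_{ave}}{\tau_p f_p} \ \text{for } 0 < t < \tau_p, \qquad P(t) = 0 \ \text{for } \tau_p < t < \frac{1}{f_p} \tag{16}$$
Thus, Eq. (13) becomes (So et al., 2001)
$$\langle I_{f,p}(t) \rangle \propto \delta_2 \, \frac{P_{ave}^2}{\tau_p^2 f_p^2} \left[ \pi \frac{(NA)^2}{hc\lambda} \right]^2 \frac{1}{T} \int_0^{\tau_p} dt = \delta_2 \, \frac{P_{ave}^2}{\tau_p f_p} \left[ \pi \frac{(NA)^2}{hc\lambda} \right]^2 \tag{17}$$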
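Comparing Eq. (14) with Eq. (17) at equal average power shows that the pulsed beam gains a factor $1/(\tau_p f_p)$ in fluorescence, so a CW beam needs an average power higher by $1/\sqrt{\tau_p f_p}$ to match it. The sketch below evaluates these factors for the typical 100 fs, 100 MHz source mentioned in this chapter; the function names and the 30 mW input are assumptions chosen for illustration.

```python
import math


def duty_cycle(pulse_width_s, repetition_rate_hz):
    """Eq. (15): D = tau_p * f_p."""
    return pulse_width_s * repetition_rate_hz


def pulsed_over_cw_gain(pulse_width_s, repetition_rate_hz):
    """Ratio of Eq. (17) to Eq. (14) at the same average power."""
    return 1.0 / duty_cycle(pulse_width_s, repetition_rate_hz)


def equivalent_cw_power_w(pulsed_power_w, pulse_width_s, repetition_rate_hz):
    """CW average power giving the same fluorescence as a pulsed beam."""
    return pulsed_power_w / math.sqrt(
        duty_cycle(pulse_width_s, repetition_rate_hz))


TAU_P = 100e-15   # pulse width, s
F_P = 100e6       # repetition rate, Hz

d = duty_cycle(TAU_P, F_P)
gain = pulsed_over_cw_gain(TAU_P, F_P)
p_cw = equivalent_cw_power_w(30e-3, TAU_P, F_P)  # 30 mW pulsed beam

print(f"duty cycle D        = {d:.1e}")
print(f"fluorescence gain   = {gain:.1e}")
print(f"equivalent CW power = {p_cw:.2f} W")
```

With these parameters the equivalent CW power comes out a little under 10 W, which matches the order of magnitude used in the comparison that follows.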
The conclusion here is that CW and pulsed lasers operate at the very same excitation efficiency, i.e., fluorescence intensity per molecule, if the average power of the CW laser is kept higher by a factor of $1/\sqrt{\tau_p f_p}$. This means that 10 W delivered by a CW laser, allowing the same efficiency of conventional excitation performed at approximately $10^{-1}$ mW, is nearly equivalent to 30 mW for a pulsed laser. This leads to the most popular relationship reported below, which is related to the practical situation of a train of beam pulses focused through a high numerical aperture objective, with a duration $\tau_p$ and repetition rate $f_p$. In this case, the probability, $n_a$, that a certain fluorophore simultaneously absorbs two photons during a single pulse, in the paraxial approximation, is given by (Denk et al., 1990)
$$n_a \propto \frac{\delta_2 \, P_{ave}^2}{\tau_p f_p^2} \left( \frac{NA^2}{2\hbar c \lambda} \right)^2 \tag{18}$$
where $P_{ave}$ is the time-averaged power of the beam and $\lambda$ is the excitation wavelength. Introducing 1 GM (Göppert-Mayer) $= 10^{-58}$ m$^4$ s, for a $\delta_2$ of approximately 10 GM per photon (Denk et al., 1990; Xu, 2002), focusing through an objective of NA = 1.2–1.4, an average incident laser power of 1–50 mW, operating at a wavelength ranging from 680 to 1100 nm with 100-fs pulse width and 100-MHz repetition rate, would saturate the fluorescence output as for one-photon excitation. This suggests that for optimal fluorescence generation, the desirable repetition time of pulses should be on the order of a typical excited-state lifetime, which is a few nanoseconds for commonly used fluorescent molecules. For this reason the typical repetition rate is around 100 MHz. A further condition that makes Eq. (18) valid is that the probability of each fluorophore being excited during a single pulse has to be smaller than one. The reason lies in the observation that during the pulse time ($10^{-13}$ s of duration and a typical excited-state lifetime in the $10^{-9}$ s range) the molecule has insufficient time to relax to the ground state. This can be considered a prerequisite for absorption of another photon pair. Therefore, whenever $n_a$ approaches
Figure 11. Pictorial (not to scale) representation of typical time scales related to two- and multiphoton excitation processes, namely, laser beam repetition rate (100 MHz), laser beam pulse width (100 fs), and fluorescence decay (ns).
unity saturation effects start to occur. The use of Eq. (18) allows one to choose optical and laser parameters that maximize excitation efficiency without saturation. Figure 11 depicts (not to scale) the practical time scale condition typically used. It is also evident that the optical parameter for enhancing the process in the focal plane is the lens numerical aperture, NA, even if the total fluorescence emitted is independent of this parameter as shown by Xu (2002). This value was confined to around 1.3–1.4 as a maximum value until the recent introduction of two new objectives by Olympus and Zeiss with numerical apertures of 1.65 and 1.45, respectively. Unfortunately there is no information about transmission properties in the UV–IR regime at this moment. One can now estimate $n_a$ for a common fluorescent molecule such as fluorescein that possesses a two-photon cross section of 38 GM at 780 nm (So et al., 2001). For this purpose, we can use NA = 1.4, a repetition rate of 100 MHz, and a pulse width of 100 fs, with $P_{ave}$ assumed to be 1, 10, 20, and 50 mW. Substituting the proper values in Eq. (18) we obtain
$$n_a \cong 38 \times 10^{-58} \, \frac{P_{ave}^2}{100 \times 10^{-15} \, (100 \times 10^{6})^2} \left[ \frac{(1.4)^2}{2 \times 1.054 \times 10^{-34} \times 3 \times 10^{8} \times 780 \times 10^{-9}} \right]^2 \cong 5930 \, P_{ave}^2$$
The final results as a function of 1, 10, 20, and 50 mW are $5.93 \times 10^{-3}$, $5.93 \times 10^{-1}$, 1.86, and 2.965, respectively. It is evident that saturation begins to occur at 10 mW (Koester et al., 1999; So et al., 2001). The very same calculation can be made for rhodamine B by changing the cross-sectional value from 38 to 210 and considering 840 nm instead of
TABLE 1
Values of g(2) in Relation to Pulse Shape

Pulse shape                  g(2)
Rectangular                  1.0
Gaussian                     0.66
Hyperbolic-secant-squared    0.59
780 nm as the excitation wavelength. This leads to an $n_a$ approximately 4.76 times greater than in the case of fluorescein. This sets the saturation average power for rhodamine B around 5 mW instead of 10 mW. The related rate of photon emission per molecule, at a nonsaturation excitation level, in the absence of photobleaching (Patterson and Piston, 2000; So et al., 2001), is given by $n_a$ multiplied by the repetition rate of the pulses. This means approximately $5 \times 10^{7}$ photons s$^{-1}$ in both cases. It is worth noting that when considering the effective fluorescence emission one should consider a further factor given by the so-called quantum efficiency of the fluorescent molecules. In the next section we will report data related to the fluorochrome action cross section, which combines absorption cross section and quantum efficiency. It has been demonstrated that the fluorophore emission spectrum is independent of the excitation mode (Xu et al., 1995; Xu, 2002). Thus, the quantum efficiency value is known from conventional one-photon excitation data (Pawley, 1995). Still referring to Eq. (18), one should also consider a further proportionality factor, g(2), that is related to the pulse shape of the laser beam (Svelto, 1998). Values for this form factor are reported in Table 1. All calculations have been made considering a rectangular pulse shape. From Eq. (18) it should be clear that there are some key parameters implicated in TPE that should be considered and controlled, namely, the cross section of the fluorescent molecule, the numerical aperture of the objective, and excitation beam characteristics. The next section will focus on the behavior of fluorescent molecules and on the optical consequences of working under a TPE regime, before moving to considerations related to excitation beam characteristics, practical architectures, performances, and applications.
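The numerical estimates based on Eq. (18) can be reproduced with the short Python sketch below. It treats the proportionality in Eq. (18) as an equality with a rectangular pulse, g(2) = 1, as the text does, and uses the rounded constants that appear in the worked expression; the function and variable names are hypothetical. Values of $n_a$ approaching or exceeding unity fall outside the validity range of Eq. (18) and only indicate the onset of saturation.

```python
import math

HBAR = 1.054e-34   # reduced Planck's constant, J s (rounded as in the text)
C = 3e8            # speed of light, m/s (rounded as in the text)
GM = 1e-58         # 1 Goeppert-Mayer unit, m^4 s


def absorption_probability(delta_gm, p_ave_w, pulse_width_s,
                           repetition_rate_hz, na, wavelength_m):
    """Eq. (18), read as an equality: photon pairs absorbed per pulse."""
    optics = (na ** 2 / (2.0 * HBAR * C * wavelength_m)) ** 2
    return (delta_gm * GM * p_ave_w ** 2
            / (pulse_width_s * repetition_rate_hz ** 2) * optics)


TAU_P, F_P, NA = 100e-15, 100e6, 1.4

# Fluorescein: 38 GM at 780 nm; rhodamine B: 210 GM at 840 nm.
fluorescein_1mw = absorption_probability(38, 1e-3, TAU_P, F_P, NA, 780e-9)
fluorescein_10mw = absorption_probability(38, 10e-3, TAU_P, F_P, NA, 780e-9)
rhodamine_1mw = absorption_probability(210, 1e-3, TAU_P, F_P, NA, 840e-9)
ratio = rhodamine_1mw / fluorescein_1mw

# Average power at which the unsaturated formula reaches n_a = 1.
p_unity_fluorescein_w = 1e-3 / math.sqrt(fluorescein_1mw)
p_unity_rhodamine_w = 1e-3 / math.sqrt(rhodamine_1mw)

# Emission rate per molecule for a given n_a: n_a times the repetition rate.
rate_at_half = 0.5 * F_P

print(f"fluorescein, 1 mW : n_a = {fluorescein_1mw:.2e}")
print(f"fluorescein, 10 mW: n_a = {fluorescein_10mw:.2e}")
print(f"rhodamine B / fluorescein = {ratio:.2f}")
print(f"n_a = 1 at {p_unity_fluorescein_w * 1e3:.1f} mW (fluorescein), "
      f"{p_unity_rhodamine_w * 1e3:.1f} mW (rhodamine B)")
print(f"emission rate at n_a = 0.5: {rate_at_half:.1e} photons/s")
```

With these rounded constants the prefactor comes out within about 1% of the value of 5930 quoted above, and the quadratic dependence on $P_{ave}$ means that a tenfold increase in power raises $n_a$ a hundredfold.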
IV. Behavior of Fluorescent Molecules under TPE Regime

In TPE microscopy, several common fluorescent molecules can be used despite the fact that the quantum-mechanical selection rules are different
from those for the one-photon excitation condition (Loudon, 1983; Birge, 1979; Wang and Herman, 1996; Haughland, 2002; Xu, 2002). As a starting point, fluorescent molecules can be excited under a TPE regime at twice their one-photon excitation wavelength (Lakowicz, 1999). Figure 12 shows a simplified Jablonski diagram illustrating this type of guideline. Because this is extended to any fluorescent molecule, there are a variety of autofluorescent molecules that can be effectively exploited without the need for ultraviolet excitation (Herman and Tanke, 1998; Lakowicz, 1999). Figure 13 represents the spectral distribution of the autofluorescence exhibited by some interesting biological molecules and macromolecules. To characterize fluorescent molecules in terms of their response to excitation there are two specific parameters that have to be calculated or measured (Harper, 2001; Berland, 2001): the molecule absorption cross section and the quantum efficiency. The former is related to the propensity of a molecule to absorb photons at a certain wavelength (Cantor and Schimmel, 1980). The latter is more directly connected to the fluorescence process and to the molecule itself: it is a measure of the yield in the conversion of absorbed energy into light emission. This last parameter is also known as quantum yield and can be considered as an indicator of the probability that a given excited molecule
Figure 12. Jablonski diagram illustrating one-photon (a) and two-photon (b) excitation and deexcitation pathways for a fluorescent molecule. [Diagram labels: λ1, 2λ1, λ2.]
Figure 13. Autofluorescence spectral distribution of some interesting biological molecules. (See Color Insert.)
will produce a fluorescence photon. It is clear that both these parameters influence the detectable intensity of fluorescence from one or more specific fluorescent molecules. Moreover, their behavior is also influenced by environmental conditions such as pH and temperature. In general, the quantum yield of a fluorescent molecule under conventional, i.e., one-photon, excitation is preserved in a two- or multiphoton excitation regime. Unfortunately, knowledge of one-photon absorption cross sections does not permit any precise quantitative prediction of the two-photon ones. This means that cross sections for TPE or higher orders of excitation have to be measured. Table 2 reports measured data for both intrinsic and extrinsic fluorescent molecules, including green fluorescent protein (Tsien, 1998; Xu, 2002). Nevertheless, the practical “rule of thumb” mentioned at the beginning of this section can still be used, even if it does not guarantee optimal TPE conditions. This simple but effective rule works especially well with symmetrical molecules (Lakowicz, 1999). Figure 14 compares one- and two-photon performances for two common fluorescent molecules. TPE offers a distinctive variety of excitation peaks that allows more flexibility in the choice of excitation wavelength, as depicted in Figure 15. The cross section parameter has been measured for a wide range of dyes (Xu et al., 1995; Albota et al., 1998b; Xu, 2002). It is worth noting that, also because of the increasing dissemination of TPE microscopy, new “ad hoc” organic molecules endowed with large engineered two-photon absorption cross sections have recently been developed (Albota et al., 1998a). The TPE cross section not only tells how well a specific fluorescent molecule is excited by light in the infrared spectral region but also indicates the position of the two-photon absorption peak, which is normally unpredictable. 
As can be seen from the cross-section data and graphs, there is a very interesting and useful variety of “relative peaks.” The practical consequence is that, unlike one-photon excitation, TPE allows one to find a “good wavelength” that excites the fluorescence of several different molecules at once. The relevance of this fact is obvious, for example, with respect to colocalization problems. One can try to find an optimal excitation wavelength for simultaneously priming fluorescence of different fluorochromes. This means that a real and effective multiple fluorescence colocalization of biological molecules, macromolecules, organelles, etc. can be obtained. Figure 16 shows an example of multiple fluorescence realized by means of one- and two-photon excitation. Special mention is due to excitation of green fluorescent protein (GFP), an important molecular marker for gene expression (Chalfie et al., 1994; Chalfie and Kain, 1998; Potter, 1996; Tsien, 1998). GFP cross sections are around 6 GM (800 nm) and 7 GM (960 nm) in the case of wild type and
TABLE 2
Intrinsic and Extrinsic Fluorescent Molecules

Fluorophores                                λ (nm)            Action cross section (GM)   Absorption cross section (GM)

Intrinsic fluorophores
GFP wt                                      800–850           6
GFP S65T                                    960               7
BFP                                         780/820
CFP                                         780/840
YFP                                         860/900
EGFP                                        900–950
Flavine                                     700               0.8
NADH                                        700               0.02
Phycoerythrin                               1064              322 ± 110

Extrinsic fluorophores
Bis-MSB                                     691/700           6.0 ± 1.8                   6.3 ± 1.8
Bodipy                                      920               17 ± 4.9
Calcium green 1                             800 (780, 820)
Calcofluor                                  780/820
Cascade blue                                750               2.1 ± 0.6
Coumarin 307                                776               19 ± 5.5
CY2                                         780/800
CY3                                         780
CY5                                         780/820
DAPI (free)                                 700/720           0.16 ± 0.05
Dansyl                                      700               1
Dansyl hydrazine                            700               0.72 ± 0.2
Dil                                         700               95 ± 28
Filipin                                     720
FITC                                        740/780/820
Fluorescein (pH 11)                         780               —                           38 ± 9.7
Fura-2 (free)                               700               11
Fura-2 (high Ca)                            700               12
Hoechst                                     780/820
Indo-1 (free)                               700               4.5 ± 1.3                   12 ± 4
Indo-1 (high Ca)                            590/700           1.2 ± 0.4                   2.1 ± 0.6
Lucifer yellow                              860               0.95 ± 0.3
Nile red                                    810
Oregon green bapta 1                        800
Rhodamine B                                 840               —                           210 ± 55
Rhodamine 123                               780–560
Syto 13                                     810
Texas red                                   780
Triple probe (DAPI, FITC, and rhodamine)    720/740
TRITC (rhodamine)                           800–840
Figure 14. Comparison of one-photon (lines) and two-photon (solid circles) fluorescence excitation spectra. The abscissa reports excitation wavelengths in nanometers, which have to be divided by two for one-photon excitation. The ordinate values represent the two-photon action cross section for Cascade blue in water (right) and the two-photon absorption cross section for fluorescein in water, pH 13 (left). Values are given in Göppert-Mayer units, GM (1 GM = 10^-50 cm^4 s), and are reported on a logarithmic scale. It is worth noting that the “twice wavelength” rule of thumb works almost perfectly for Cascade blue, whereas fluorescein exhibits a more complicated and interesting behavior as a function of wavelength. Nevertheless, fluorescein also respects the rule: there is a relative maximum near the one-photon excitation maximum (Xu et al., 1995; Xu, 2002).
S65T type, respectively. For comparison, one should consider that the cross section for NADH, at the excitation maximum of 700 nm, is approximately 0.02 GM (So et al., 2000). The combination of GFP and TPE is one of the most promising scientific fields; unfortunately, it is too vast to be treated here. In discussing fluorescent molecules, another very important issue is TPE-induced photobleaching. Although the TPE scheme reduces overall photobleaching of the sample by limiting excitation to a small volume element, photobleaching within the excited area is not reduced (Brakenhoff et al., 1996). In fact, accelerated rates of photobleaching have been observed using TPE compared with conventional one-photon excitation (Patterson and Piston, 2000). Although two-photon excitation has several advantages for spectroscopic and imaging applications, very little is known about photobleaching and about similar effects on the stability of the molecules, especially when moving to the single-molecule detection field of application. The studies in the literature mostly refer to bulk measurements. Several definitions of bleaching can be given (Lakowicz, 1999), and we can envision two main sources. The molecule may convert from the excited state, usually with a radiative decay constant in the tens of nanoseconds range, to a second excited metastable state with a vanishing radiative constant. Another possibility is that the molecule changes its structure in such a way that the molecular ground state assumes
Figure 15. Two-photon action cross sections for several common fluorescent molecules, namely, Cascade blue (CB), Lucifer yellow (LY), Bodipy (BD), DAPI free (DP), dansyl (DN), pyrene (PY), coumarin (CU) (above), Indo-1 calcium bound (IC), Indo-1 free (IF), Fura-2 calcium bound (FC), Fura-2 free (FF), calcium green calcium bound (CG), calcium orange calcium bound (CO), calcium crimson calcium bound (CC), and Fluo-3 calcium bound (F3). The solid circle marks the wavelength that is twice the optimal one-photon excitation wavelength. Axes as in Figure 14. More details about fluorochromes can be found at the Molecular Probes web site, www.probes.com (Haughland, 2002). (See Color Insert.)
a vanishing cross section for the excitation light. This modification may be induced by isomerization or thermal absorption. In both cases the molecular fluorescence emission drops to zero. Irreversible photobleaching and blinking are usually ascribed to the first type of transition. Mertz et al. (1995) have compared single- and two-photon excitation with particular regard to saturation and higher level transitions. More recently, Patterson and Piston (2000) provided data on bulk solutions of dyes that indicate an enhanced photobleaching in two-photon spectroscopy, probably due to a three-photon process.
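A common way to probe the photon order of a process such as the one mentioned above is to fit the slope of the rate against excitation power on a log-log scale: a slope near 2 is expected for two-photon excited fluorescence and a slope near 3 would be compatible with a three-photon contribution to bleaching. The following is a minimal sketch of that fit; the data are synthetic and the prefactors are hypothetical, so it illustrates the procedure only, not any measured result.

```python
import math


def loglog_slope(powers, rates):
    """Least-squares slope of log(rate) versus log(power)."""
    xs = [math.log(p) for p in powers]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free data (hypothetical prefactors).
powers_mw = [2.0, 4.0, 8.0, 16.0]
fluorescence = [0.01 * p ** 2 for p in powers_mw]   # quadratic in power
bleach_rate = [1e-4 * p ** 3 for p in powers_mw]    # cubic in power

fluorescence_order = loglog_slope(powers_mw, fluorescence)
bleaching_order = loglog_slope(powers_mw, bleach_rate)
```

With real data the fitted exponent would carry an uncertainty, and saturation at high power would bend the log-log curve, so the fit should be restricted to the nonsaturating range.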
Figure 16. Bovine pulmonary artery endothelial cells (F-147780, Molecular Probes) marked with three different fluorophores for mitochondria, F-actin, and DAPI. Image on the left shows mitochondria (red) and F-actin (green)-labeled molecules. Here a sort of black hole is shown in the center at the position of the nucleus due to the fact that DAPI, UV excitable at one-photon excitation, was not excited. On the right a false color picture obtained by means of TPE at 720 nm displays the nuclear portion (pink). Using TPE, simultaneous excitation of the three fluorophores occurred at 720 nm. (Image acquired at LAMBS.) (See Color Insert.)
We have very recently studied the effect of two-photon excitation on the total amount of fluorescence that can be collected from a single immobilized molecule at the high excitation intensities required for single-molecule studies with two-photon excitation (F. Federici, A. Gerbi, and A. Diaspro, unpublished data; Chirico et al., 2002). Four dyes were considered: indo-1, rhodamine 6G, fluorescein, and pyrene. The choice of these dyes is motivated also by the different complexity of the chemical structure that increases from pyrene to indo-1. For these molecules we have evaluated the total amount of fluorescence light that can be recovered from each dye versus the excitation intensity, the temperature, and the duration and the nature of the excitation. The main result of this research is the characterization of the thermally induced bleaching of the dyes and a clear correlation of the bleaching time and its dependence on the excitation intensity, with the photophysics parameters of the molecules (Chirico et al., 2002). These conclusions were also based on numerical simulations of the local temperature increase during laser excitation. Further studies on the features of fluorescent molecules, in particular on single-molecule detection of both isolated and ‘‘in situ functioning’’ fluorescent molecules, are needed.
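The search for a single excitation wavelength that primes several fluorochromes at once, discussed earlier in this section, can be sketched as a simple intersection test. The wavelength entries for DAPI (free) and FITC are read from Table 2; the 20 nm tolerance, the 10 nm candidate grid, and the function name are hypothetical, and treating the tabulated wavelengths as the only usable ones is a deliberate simplification, since real two-photon spectra are broad.

```python
def common_wavelengths(reported_nm, tolerance_nm, candidates_nm):
    """Return candidates lying within the tolerance of a reported value for every dye."""
    return [
        lam
        for lam in candidates_nm
        if all(
            any(abs(lam - w) <= tolerance_nm for w in values)
            for values in reported_nm.values()
        )
    ]


# Excitation wavelengths (nm) as listed in Table 2.
reported_nm = {
    "DAPI (free)": [700, 720],
    "FITC": [740, 780, 820],
}

# Hypothetical search grid and tolerance.
shared = common_wavelengths(reported_nm, 20, range(700, 841, 10))
```

Under these assumptions the shared candidates are 720, 730, and 740 nm, which is in line with the 720 nm used for Figure 16 and with the 720/740 nm entry for the triple probe in Table 2.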
V. Optical Consequences and Resolution Aspects

A misconception about TPE microscopy is that optical resolution is enhanced. This is not true in terms of strict optical resolution because, as a first step toward TPE, wavelengths longer than in the conventional case must be used. However, it is commonly felt that optical resolution in microscopy, at least for people involved in microscopy measurements, is a mix of different parameters including signal-to-noise ratio (see also Section VI and Fig. 24). Thus, because background fluorescence is dramatically reduced in TPE, objects comparable with or smaller than the optical resolution attainable in conventional microscopy may appear brighter or better defined when using a double excitation wavelength. It is important always to remember that far-field TPE microscopy, as discussed in this article, is not the way to surpass the limit described by Abbe (1910). However, in microscopy, one is also interested in obtaining complete spatial information about the sample or in performing three-dimensional imaging. Here, a very important optical consequence of using TPE is the confinement of the spatial region where fluorescence takes place within a subfemtoliter volume. The practical consequence of this feature is that optical sectioning is an intrinsic ability of TPE microscopy. Within the one-photon excitation optical sectioning scheme depicted in Figure 17 (Agard, 1984; Bianco and Diaspro, 1989; Diaspro et al., 1990; Castleman, 1996), the observed image O at a plane j, where the focus of the lens is mechanically positioned and our main interest is concentrated, can be described by the following relationship, expressed for the sake of simplicity in the Fourier transform domain under the condition of a spatially invariant linear system (Castleman, 2002):

\[
O_j = I_j S_j + \sum_{k} I_k S_k + N \qquad (19)
\]
The first term accounts for the optical distortion S, given by the so-called point spread function of the microscope, acting on the true distribution of fluorescence intensity, I, at the plane j. The second term contains the defocused contributions from the k adjacent planes above and below. In fact, fluorescent molecules in the adjacent planes are also excited by the proper wavelength (see Section IV), even if they are more sparse with respect to those at the focal plane or volume (Jonkman and Stelzer, 2002). The third term, N, is the noise, considered additive to a reasonable approximation (Castleman, 1996; Agard, 1984). Now, noise can be easily modeled or measured, as can S, the distortion introduced by the optical system. In wide-field
Figure 17. Sketch of the optical sectioning scheme (a) obtained by exploiting the spatial confinement of TPE depicted in double-cone excitation geometry (b). Only fluorescent molecules positioned at the double-cone apex have a non-negligible probability of being excited under the TPE regime.
microscopy, after digital acquisition of data, the system can be solved and the best estimate of I can be found. In confocal microscopy the situation is better because the second term is dramatically reduced by the insertion of a pinhole, and S is less disturbing (it is tempting to say that S tends to unity, but this is not strictly true). Under TPE the second term does not exist at all, due to the confinement properties of the excitation process. The 3D confinement of the two-photon excitation volume can be understood on the basis of optical diffraction theory (Born and Wolf, 1980). Using excitation light with wavelength λ, the intensity distribution at the focal region of an objective with numerical aperture NA = sin(α) is described, in the paraxial regime, by (Born and Wolf, 1980; Sheppard and Gu, 1990)
\[
I(u, v) = \left| 2 \int_0^1 J_0(v\rho)\, e^{-\frac{i}{2} u \rho^2}\, \rho \, d\rho \right|^2 \qquad (20)
\]
where J_0 is the zeroth-order Bessel function, ρ is a radial coordinate in the pupil plane, and u and v, defined as

\[
u = \frac{8\pi \sin^2(\alpha/2)\, z}{\lambda}, \qquad v = \frac{2\pi \sin(\alpha)\, r}{\lambda} \qquad (21)
\]

are dimensionless axial and radial coordinates, respectively, normalized to the wavelength (Wilson and Sheppard, 1984). This implies that the intensity of fluorescence distribution within the focal region has an I(u, v) behavior for the one-photon case and an I^2(u/2, v/2) behavior for the TPE case, as shown earlier. The arguments of I^2(u/2, v/2) take into account the fact that in the latter case one uses wavelengths that are approximately twice those used for one-photon excitation. These distributions are called the point spread functions (PSF) of the microscope (Born and Wolf, 1980; Jonkman and Stelzer, 2002; Castleman, 2002; Bertero and Boccacci, 2002). As compared with the one-photon PSF, the TPE PSF is axially confined (Nakamura, 1993; Gu and Sheppard, 1995; Jonkman and Stelzer, 2002). In fact, considering the integral over v, keeping u constant, its behavior is constant along z for one-photon excitation and has a half-bell shape for TPE. This behavior, better discussed in Wilson (2002), Torok and Sheppard (2002), and Jonkman and Stelzer (2002), explains the three-dimensional discrimination property of TPE. In general, two-photon microscopy has a radial resolution comparable with one-photon conventional microscopes, due to a better signal-to-noise ratio, and an effective and narrow depth of focus that makes it well suited for three-dimensional optical sectioning. Figure 18 shows the PSF shape for wide-field, confocal, and TPE conditions (Periasamy et al., 1999). Table 3 gives the calculated values of the half-width of the 3D intensity PSF in the transverse and axial directions (Gu and Sheppard, 1995). The comparison of the 3D intensity PSF for confocal one-photon and two-photon imaging reveals that resolution in both cases is almost the same (Jonkman and Stelzer, 2002; Torok and Sheppard, 2002). Now, the most interesting aspect, also predicted by Eq. (18) or Eq. (20), is that the excitation intensity falls off with the square of the distance from the lens focal point, within the approximation of a conical illumination geometry. In practice this means that the quadratic relationship between the excitation power and the fluorescence intensity results in TPE falling off
Figure 18. Point spread function shapes for conventional digital deconvolution microscopy, confocal microscopy, and TPE microscopy, from the left to the right. (Courtesy of Ammasi Periasamy, modified from Periasamy et al., 1999.)
TABLE 3
Values of the Half-Width of the 3D Intensity Point Spread Function in the Transverse and Axial Directions, v1/2 and u1/2

                   v1/2    u1/2
Conventional 1P    1.62    5.56
Confocal 1P        1.17    4.01
Conventional 2P    2.34    8.02
Confocal 2P        1.34    4.62
with the fourth power of distance from the focal point of the objective. This implies that the point spread function or the geometric resolution parameters allow a sort of volume of event for TPE to be defined, as sketched in Figure 19a. Therefore, those regions away from the focal volume of the objective lens, whose size is directly related to the numerical aperture of the objective itself, do not suffer photobleaching or phototoxicity effects and do not contribute to the signal detected when a TPE scheme is used. Because these regions are simply not involved in the excitation process, a confocal-like effect is obtained without the need for a confocal pinhole. This situation is presented in Figure 19b. Figure 20 shows the spatial extension of the fluorescence emission from a solution containing fluorescent molecules and subjected to one- and two-photon excitation regimes. Consequently, photodamaging and photobleaching effects are extremely localized, as demonstrated in Figure 21. Figure 22 shows a further demonstration of the three-dimensional localization effect attainable by means of TPE. Photobleaching was induced within a large fluorescent sphere, 22 μm in diameter, using the confocal and TPE modes. The latter not only exhibited the expected features but also pointed out the potential of such a technique as an active photodevice, as will be seen in more detail in the following sections. In
Figure 19. (a) PSF or more general optical resolution parameters can be used to determine the extent of the volume of the TPE event (modified from Pawley, 1995). (b) In conventional excitation (left) all photons carry the “right energy” for priming fluorescence in any fluorescent molecule encountered within the double cone of excitation, whereas in TPE (right) only photons confined in the volume of event prime fluorescence, due to the high temporal and spatial concentration. Under TPE, low-density distributed photons within the double cone of excitation do not possess enough energy for priming fluorescence. (See Color Insert.)
Figure 20. Fluorescence emission from a solution containing fluorescent molecules under one- (double cone, above) and two-photon (bright dot, below) excitation. (Picture courtesy of John Girkin from Bio-Rad web page.) (See Color Insert.)
Figure 21. Effect of TPE localization. Comparison between one- (above) and two-photon (below) induced photobleaching visualized along the z-axis in an x–z view. Scanning was performed in the volume defined by the rectangle within the double-cone excitation volume. (Courtesy of David Piston; adapted from Pawley, 1995.)
TPE over 80% of the total intensity of fluorescence comes from a 700- to 1000-nm-thick region about the focal point for objectives with numerical apertures in the range from 1.2 to 1.4 (Brakenhoff et al., 1979; Wilson and Sheppard, 1984; Wilson, 2002; Jonkman and Stelzer, 2001; Torok and
Figure 22. Three-dimensional side views (y–z plane cut) of a large fluorescent sphere, 22 μm in diameter, where photobleaching has been induced in a central x–y section in single-photon confocal (left) and TPE (right) mode using 488 and 720 nm excitation wavelengths, respectively. (Adapted from Diaspro, 2001; image realized at LAMBS.)
Sheppard, 2002). This also implies a reduction in background that compensates for the reduction in spatial resolution due to the wavelength. The use of infrared wavelengths instead of UV-visible ones also allows deeper penetration than in the conventional case (So et al., 2000; Periasamy et al., 2002; Konig and Tirlapur, 2002). In fact, Rayleigh scattering produced by small particles is proportional to the inverse fourth power of the wavelength. Thus the longer wavelengths used in TPE, or in general in multiphoton excitation, will be scattered less than the ultraviolet-visible wavelengths used for conventional excitation, so that deeper targets within a thick sample can be reached. It is worth noting that when considering deep imaging in thick samples optical aberrations should be properly considered (de Grauw and Gerritsen, 2002; Centonze and White, 1998). Of course, for the fluorescence light on the way back, scattering can be overcome by acquiring the emitted fluorescence using a large area detector and collecting not only ballistic photons (Soeller and Cannell, 1999; Bueheler et al., 1999; Girkin and Wokosin, 2002). Because several factors influence whether a particular sample should be imaged with a confocal, multiphoton, or even wide-field camera imaging system, the highest priced option, in buying or building a two-photon microscope, should not automatically be assumed to be the best for every biological imaging challenge.
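The half-widths listed in Table 3 can be approximated directly from Eq. (20). The sketch below integrates Eq. (20) numerically and finds, by bisection, where each profile falls to half of its peak value. Several choices are assumptions made only for this illustration: the confocal one-photon response is taken as I^2(u, v), the confocal two-photon response as I^2(u/2, v/2) multiplied by I(u, v) (that is, detection through a point detector at half the excitation wavelength), and the Bessel function is evaluated with a truncated power series. Function names and step counts are hypothetical.

```python
import cmath


def bessel_j0(x: float) -> float:
    """Truncated power series for J0; adequate for the small arguments used here."""
    term, total = 1.0, 1.0
    for m in range(1, 25):
        term *= -((x / 2.0) ** 2) / (m * m)
        total += term
    return total


def psf(u: float, v: float, steps: int = 200) -> float:
    """Paraxial intensity I(u, v) of Eq. (20), midpoint-rule integration over rho."""
    h = 1.0 / steps
    acc = 0j
    for k in range(steps):
        rho = (k + 0.5) * h
        acc += bessel_j0(v * rho) * cmath.exp(-0.5j * u * rho * rho) * rho
    return abs(2.0 * acc * h) ** 2


def half_width(profile, upper: float) -> float:
    """Bisection for the point where a decreasing profile with profile(0) = 1 reaches 0.5."""
    lo, hi = 0.0, upper
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if profile(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed imaging responses for the four cases of Table 3.
profiles = {
    "Conventional 1P": lambda u, v: psf(u, v),
    "Confocal 1P": lambda u, v: psf(u, v) ** 2,
    "Conventional 2P": lambda u, v: psf(u / 2, v / 2) ** 2,
    "Confocal 2P": lambda u, v: psf(u / 2, v / 2) ** 2 * psf(u, v),
}

# (transverse half-width v1/2, axial half-width u1/2) for each case.
half_widths = {
    name: (
        half_width(lambda v: f(0.0, v), 3.0),
        half_width(lambda u: f(u, 0.0), 10.0),
    )
    for name, f in profiles.items()
}
```

Under these assumptions the computed half-widths should agree with the entries of Table 3 to within a few percent; small differences can arise from rounding and from the exact definition of the confocal response.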
VI. Architecture of Two-Photon Microscopy

A. General Considerations

Two-photon microscopes and architectures are now commercially available, but are very expensive. Table 4 presents an overview of market availability.
TABLE 4
Overview of Market Availability

Model               Company  Dimension       Pulse Width Regime  Wavelength Range (nm)  Average Power (mW)                    Laser Coupling    Acquisition             Other Features
LSM 510 NLO (META)  Zeiss    Compact/normal  fs                  700–900                50                                    Direct-box/fiber  Descanned/nondescanned  Simultaneous confocal
MRC 1024 MP         Bio-Rad  Normal/large    fs                  690–1000               Not reported                          Direct-box        Descanned/nondescanned  None relevant
Radiance 2000 MP    Bio-Rad  Compact/normal  fs                  690–1000               Not reported                          Direct-box        Descanned/nondescanned  Faster scanning (>750 Hz)
RTS 2000 MP         Bio-Rad  Large           fs                  690–1000               Not reported                          Direct-box        Descanned/nondescanned  130 frames/s video rate
TCS SP2             Leica    Normal/large    ps                  720–900                Not reported (120 max at the sample)  Fiber             Descanned/nondescanned  Spectral capability
However, a TPE microscope can also be constructed from components or, as a very efficient compromise, by modifying an existing confocal laser scanning microscope. In the authors' opinion this last option is still the best, allowing an effective mix of operational flexibility and good quality-to-cost ratio. The basic designs for the above-mentioned three solutions are very similar. The main ingredients to perform two-photon excitation microscopy and related techniques are a high peak-power laser delivering moderate average power (femtosecond or picosecond pulsed at a relatively high repetition rate) emitting infrared or near infrared wavelengths (650–1100 nm), a laser beam scanning system, a high numerical aperture objective (>1), a high-throughput microscope pathway, and a high-sensitivity detection system (Denk et al., 1995; Konig et al., 1996b; So et al., 1996; Soeller and Cannell, 1996; Wokosin and White, 1997; Centonze and White, 1998; Potter et al., 1996; Wolleschensky et al., 1998; Diaspro et al., 1999b; Wier et al., 2000; Soeller and Cannell, 1999; Tan et al., 1999; Mainen et al., 1999; Majewska et al., 2000; Diaspro, 2002; Girkin and Wokosin, 2002; Iyer et al., 2002). Figure 23 shows a general scheme for a two-photon excitation microscope that also includes conventional excitation ability. In typical TPE or confocal microscopes, images are built by raster scanning the x–y mirrors of a galvanometric-driven mechanical scanner (Webb, 1996). This implies that the image formation speed is mainly determined by the mechanical properties of the scanner. In this case, the time needed for single line scanning is of the order of milliseconds. Faster beam-scanning schemes can be realized, even if the “eternal triangle of compromise” should be taken into proper account. As shown in Figure 24, in agreement with Shotton (1995) and Pawley (1995), the triangle refers to sensitivity, spatial resolution, and temporal resolution. 
Ideally, one wishes to maximize all three of these criteria. Unfortunately, the limitations of practical instrument design do not permit this, and the best choice is the one satisfying the majority of the needs of the specific application. Within the galvanometric mirrors framework, however, in TPE setups particular attention should be given to the surfaces of the mirrors and to the way they are mounted on the scanners in order to obtain the best reflection efficiency and scanning stability. Enhanced silver coating of the mirrors is frequently used to optimize reflectivity of the infrared excitation wavelengths (Wokosin and White, 1997). Then, the excitation light should reach the microscope objective passing through the minimum number of optical components and possibly along the shortest path. Typically, high-numerical-aperture objectives with high infrared transmission are used to maximize TPE efficiency (Benham and Schwartz, 2002). As also reported by Girkin and Wokosin (2002), signal detection efficiency can be further enhanced by using an additional reflector in the condenser assembly.
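The link between line time, frame rate, and pixel dwell time in a galvanometric raster scan can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 512 × 512 pixel raster and a line time of 1 ms (the text only states that it is of the order of milliseconds), and it ignores mirror flyback and turnaround time; the function names are illustrative.

```python
def frame_time_s(lines_per_frame: int, line_time_s: float) -> float:
    """Time to scan one frame, ignoring flyback between lines and frames."""
    return lines_per_frame * line_time_s


def pixel_dwell_time_s(line_time_s: float, pixels_per_line: int) -> float:
    """Time the beam spends on each pixel of a line."""
    return line_time_s / pixels_per_line


# Hypothetical raster: 512 x 512 pixels, 1 ms per line.
frame_s = frame_time_s(512, 1e-3)          # 0.512 s per frame
dwell_s = pixel_dwell_time_s(1e-3, 512)    # about 2 microseconds per pixel
frames_per_s = 1.0 / frame_s               # just under 2 frames per second
```

With these numbers the dwell time falls in the microsecond range, which illustrates why mechanical scanners trade temporal resolution against sensitivity in the triangle of Figure 24.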
Figure 23. Schematic of a typical two-photon scanning microscope in which the ability to use the microscope as a confocal laser scanning microscope is retained. [Diagram blocks: ultrafast laser source, beam control, OD filter, laser source, sample, z-axis control, laser scanning head, hardware control.]
Figure 24. The ‘‘eternal triangle of compromise.’’
Although the x–y scanners provide lateral focal-point scanning, axial scanning can be achieved by means of different positioning devices, the most popular being a belt-driven system using a DC motor and a single objective piezo nanopositioner. Usually, it is possible to switch between the
Figure 25. Photograph of a new generation compact confocal laser scanning microscope architecture, Nikon C1. The portable confocal scanning head is plugged into the side port of the Nikon inverted microscope. The advantage of such a compact confocal scanning head is the reduced optical pathway, resulting in increased sensitivity. (Courtesy of Cristiana Ricci, Nikon SpA, Florence, Italy.)
one-photon and two-photon modes retaining x–y–z positioning on the sample being imaged. Figure 25 shows a new generation compact confocal laser scanning microscope easily convertible into a TPE one. Acquisition and visualization are generally completely computer controlled by dedicated software that allows different key parameters to be controlled as can be seen from the captured screen shown in Figure 26. Let us now consider two popular approaches that can be used to perform TPE microscopy, namely, the descanned and nondescanned mode. The former uses the very same optical pathway and mechanism employed in confocal laser scanning microscopy. The latter mainly optimizes the optical pathway by minimizing the number of optical elements encountered on the way from the sample to detectors, and increases the detector area. Figure 27 illustrates these two approaches, also including the conventional confocal scheme with a pinhole along with the descanned pathway. The nondescanned detection scheme is in tune with Pawley’s axiom, also reported by Girkin and Wokosin (2002), which states that the single most important
Figure 26. Example of a software acquisition window. The main controllable parameters are photomultiplier tube gain (linear or logarithmic), dwell time or speed, field of view or zooming factor, channel port selection, and optical sectioning data. (EZ2000 software, courtesy of Kees van Oord and Nikon Europe; www.coord.nl.) (See Color Insert.)
aspect of fluorescence microscopy is to collect every emitted photon possible (Pawley, 1995), as well as with John White's statement that “The best optics are no optics!” (Girkin and Wokosin, 2002). Now, when working with point-scanning laser excitation systems, short pixel dwell times (microseconds) are often used, which necessitate very high source intensities for sufficient signal-to-noise imaging. These high intensities carry a correspondingly high risk of fluorophore bleaching and saturation. This requires that every emission photon possible be included in the final image in order to maximize the signal-to-noise ratio and the signal-to-toxicity balance. This requirement is in contrast with the achievement of good spatial resolution, especially along the z-axis. However, considering an overall balance in terms of image contrast, the situation is not so bad. For imaging at large depth into thick samples, TPE clearly outperforms the confocal microscope. The TPE nondescanned mode allows very good performance, providing superior signal-to-noise ratio inside strongly
Figure 27. Simplified optical schemes for descanned and nondescanned detection. A confocal pinhole can be used or fully opened. (Courtesy of Mark Cannell, adapted from Soeller and Cannell, 1999.)
scattering samples (Masters et al., 1997; Daria et al., 1998; Centonze and White, 1998; So et al., 2000). In the descanned approach pinholes are removed or set to their maximum aperture and the emission signal is captured using the excitation scanning device on the back pathway. For this reason it is called the descanned mode. In the nondescanned approach, the confocal architecture is modified in order to increase collection efficiency: pinholes are removed and the emitted radiation is collected using dichroic mirrors on the emission path or external detectors, without passing through the galvanometric scanning mirrors. A high-sensitivity detection system is another critical issue (Wokosin et al., 1998; So et al., 2000; Girkin and Wokosin, 2002). The fluorescence emitted is collected by the objective and transferred to the detection system through
a dichroic mirror along the emission path. Due to the high excitation intensity, an additional barrier filter is needed to avoid mixing of the excitation and emission light at the detection system, which is placed differently depending on the acquisition scheme being used. Photodetectors that can be used include photomultiplier tubes, avalanche photodiodes, and charge-coupled device (CCD) cameras (Denk et al., 1995; Murphy, 2001). Photomultiplier tubes are the most commonly used. This is due to their low cost, good sensitivity in the blue-green spectral region, high dynamic range, large size of the sensitive area, and single-photon counting mode availability (Hamamatsu Photonics, 1999). They have a quantum efficiency around 20–40% in the blue-green spectral region that drops to <1% when moving to the red region. This is advantageous in TPE because one wants to reject as much as possible the wavelengths above 680 nm that are mainly used for excitation. Another advantage is that the large size of the sensitive area of photomultiplier tubes allows efficient collection of signal in the nondescanned mode within a dynamic range of the order of 10⁸. Avalanche photodiodes are excellent in terms of sensitivity, exhibiting quantum efficiency close to 70–80% in the visible spectral range. Unfortunately their cost is high and the small active photosensitive area, <1 mm in size, could introduce drawbacks in the detection scheme and require special descanning optics (Farrer et al., 1999). CCD cameras are used in video rate multifocal imaging (Fujita and Takamatsu, 2002; Girkin and Wokosin, 2002). As a further general consideration, to obtain a better spatial resolution it is also possible to retain the confocal pinhole, as shown in Figure 27 and as discussed in the previous section (Soeller and Cannell, 1999; Periasamy et al., 1999; Gauderon et al., 1999; Torok and Sheppard, 2002).
Unfortunately, in some practical experimental situations, the low efficiency of the TPE fluorescence process may rule out such a solution. However, when pinhole insertion is possible, the major advantage is that the axial resolution can be improved by approximately 40%. Torok and Sheppard (2002) analyzed the theoretical dependence of the point spread function on pinhole size. The effect of the confocal pinhole is experimentally demonstrated in Figure 28 (Gauderon et al., 1999). It can be seen that the resolution, particularly in the axial direction, is improved by using a confocal pinhole. Because the chromosomes used as the test sample are dispersed in 3D, they are well suited to demonstrate the better spatial selectivity attainable, which results in a relevant 3D image enhancement. Figure 29 shows two three-dimensional views of a ‘‘spiky’’ pollen grain reconstructed from fluorescence optical sections acquired by means of confocal and TPE microscopy (Potter, 1996). The advantage of TPE here lies in its better signal-to-noise ratio. This is particularly evident for good fluorescent samples. As usual for weak fluorescence, more complex considerations have
Figure 28. Optical sectioning x–y views of two groups of DAPI-stained onion root chromosomes in a three-dimensional volume imaged by two-photon excited fluorescence. Left: Confocal pinhole almost fully open. Right: Optimized confocal pinhole size. Using a confocal pinhole the chromosomes in the focal plane are better selected than in the pinhole open condition. (Courtesy of Colin Sheppard, adapted from Gauderon et al., 1999.) (See Color Insert.)
Figure 29. Spiky pollen grain images acquired by means of confocal and TPE three-dimensional imaging. The background-free acquisition property of TPE imaging results in a better signal-to-noise ratio. (After Potter, 1996.)
TABLE 5
Comparison of TPE and Confocal Imaging Systems
Feature | TPE | Confocal
Excitation source | Laser, IR, fs–ps pulsed, 80–100 MHz repetition rate, tunable 680–1050 nm | Laser VIS/UV CW (365, 488, 514, 543, 568, 633, 647 nm)
Excitation/emission separation | Wide | Close
Detectors | PMT (typical), CCD, APD | PMT (typical), CCD
Volume selectivity | Intrinsic (fraction of femtoliter) | Pinhole required
Image formation | Beam scanning (or rotating disks) | Beam scanning (or rotating disks)
Deep imaging | >500 µm (problems related to pulse shape modifications and scattering) | Approx. 200 µm (problems related to shorter wavelength scattering)
Spatial resolution | Less than confocal because of the focusing of IR radiation, compensated by the higher signal-to-noise ratio; pinhole increases resolution, good for high fluorescence | Diffraction limited depending on pinhole size
Real time imaging | Possible | Possible
Signal-to-noise ratio | High (especially in nondescanned mode) | Good
Fluorophores | All available for conventional excitation plus those newly designed specifically for TPE | Selected fluorophores depending on laser lines in use
Photobleaching | Only in the focus volume defined through resolution parameters | Within all the double cone of excitation defined by the lens characteristics
Contrast mechanisms | Fluorescence, high-order harmonic generation, higher order n-photon excitation, autofluorescence | Fluorescence, reflection, transmission
Commercially available | Yes (but still not mature and too expensive) | Yes (very affordable)
to be discussed (Brakenhoff et al., 1996). Table 5 compares one-photon confocal imaging features with TPE descanned ones. However, once the best quality image possible has been obtained then sophisticated mathematical algorithms can be applied to enhance the features of interest to the biological researcher and to improve the quality of data to be used for three-dimensional modeling (Brakenhoff et al., 1989; Shotton, 1995; Diaspro et al., 1990, 2000; Boccacci and Bertero, 2002;
Carrington, 2002). Recently, an image restoration web service has been established to get the best quality 3D data set from a wide-field, confocal, or TPE optically sectioned sample. This tool, called ‘‘Power-up your microscope’’ (Diaspro et al., 2002c), is available for free at www.powermicroscope.com. Now, let us focus on three further aspects, namely, laser sources, lens objectives, and an example of a practical realization.
B. Laser Sources
Laser sources, as has often happened in optical microscopy, represent an important resource, especially in fluorescence microscopy (Gratton and van de Ven, 1995; Svelto, 1998). Within the nonresonant TPE framework, owing to the comparatively low cross sections of fluorophores, high photon flux densities are required, >10²⁴ photons cm⁻² s⁻¹ (Konig, 2000). As already discussed, using radiation in the spectral range of 600–1100 nm, excitation intensities in the MW–GW cm⁻² range are required. This spatial concentration can be obtained by the combined use of focusing lens objectives (see the next section) and CW (Hanninen and Hell, 1994; Konig et al., 1995) or pulsed (Denk et al., 1990) laser radiation of 50 mW mean power or less (Girkin and Wokosin, 2002; Diaspro and Sheppard, 2002). In fact, two-photon excitation microscopes have been realized using CW, femtosecond, and picosecond laser sources (Periasamy, 2001; Diaspro, 2002; Masters, 2002). Since the original successful experiments in TPE microscopy, advances have been made in the technology of ultrashort pulsed lasers. Even if prices, in general, are still very high, efforts have been made to lower the operative technical complexity and to produce systems that are more compact and simpler to maintain. The originally used argon-pumped dye lasers were rapidly replaced with argon-pumped Ti-sapphire lasers (Fisher et al., 1997), and more recently they have been surpassed by all-solid-state sources requiring only a standard mains electrical power supply and minimal cooling systems (Wokosin et al., 1996). Laser sources suitable for TPE can now be described as ‘‘turnkey’’ systems. Figure 30 shows the emission range for different laser sources combined with the cross-sectional behavior of some popular fluorophores. It is evident that the range 700–1050 nm is well addressed by Ti-sapphire lasers.
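The link between the photon flux density quoted above and the MW–GW cm⁻² intensity range can be checked with a short calculation. The following is a minimal sketch, not taken from the chapter: it assumes monochromatic light and uses the standard relation I = Φ·hc/λ; the function names are illustrative only.

```python
# Minimal sketch: convert a photon flux density into an optical intensity.
# Assumption (not from the chapter): monochromatic light, I = flux * h * c / wavelength.

H_PLANCK = 6.62607015e-34  # Planck constant, J s
C_LIGHT = 2.99792458e8     # speed of light in vacuum, m/s


def photon_energy_joule(wavelength_nm: float) -> float:
    """Energy of a single photon at the given vacuum wavelength."""
    return H_PLANCK * C_LIGHT / (wavelength_nm * 1e-9)


def intensity_w_per_cm2(flux_photons_cm2_s: float, wavelength_nm: float) -> float:
    """Intensity in W cm^-2 carried by a flux given in photons cm^-2 s^-1."""
    return flux_photons_cm2_s * photon_energy_joule(wavelength_nm)


if __name__ == "__main__":
    # Threshold flux quoted in the text, evaluated at a few near-IR wavelengths.
    for wavelength in (700.0, 800.0, 1000.0):
        mw_per_cm2 = intensity_w_per_cm2(1e24, wavelength) / 1e6
        print(f"{wavelength:.0f} nm: {mw_per_cm2:.2f} MW cm^-2")
```

At 800 nm a flux of 10²⁴ photons cm⁻² s⁻¹ corresponds to roughly 0.25 MW cm⁻², which is the lower end of the quoted intensity range.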
This range of wavelengths is very common because a variety of fluorophores have an excitation range in the conventional one-photon excitation regime within 350–600 nm. So, under the ‘‘twice wavelength’’ rule of thumb, the Ti-sapphire laser appears the best choice. Other laser sources used for TPE are Cr-LiSAF, pulse-compressed Nd-YLF in the femtosecond regime, and mode-locked Nd-YAG and
Figure 30. Cross sections of common fluorophores compared with the emission wavelength range available by different commercial laser systems. (After Xu et al., 1995.)
picosecond Ti-sapphire lasers in the picosecond regime (Gratton and van de Ven, 1995; Wokosin et al., 1996). Moreover the absorption coefficients of most biological samples, cells, and tissues are minimized within this spectral window (So et al., 2000). Figures 31 and 32 show a practical setup for an all-solid-state Ti-sapphire laser. Table 6 presents some data about the most commonly used Ti-sapphire laser sources for applications in microscopy and spectroscopy. These lasers operate in the mode-locked regime. Mode locking provides the ability to generate a train of very short pulses by modulating the gain or excitation of a laser at a frequency whose period equals the round-trip time of a photon within the laser cavity. This frequency is inversely proportional to the cavity length (Fisher et al., 1997; Svelto, 1998). The resulting pulse width is in the 50 to 150 fs regime. Figure 33 shows a photograph of the open cavity of a Tsunami (Spectra Physics, CA) Ti-sapphire laser. The chromatic beauty is provided by the green light of the solid-state pump and by the red fluorescence of the Ti-sapphire crystal. Measured values for pulse width and average power as a function of the operating wavelength are shown in the graph of Figure 34. This graph is restricted to the 680–830 nm range because, even if Ti-sapphire emits in the 680–1050 nm range, the cavity mirrors have to be wavelength selected to obtain stable behavior. In terms of wavelengths, two-photon and multiphoton excitation microscopy take place with a ‘‘comb’’ of wavelengths. This fact has positive and negative effects. For a 1050-nm source three-photon events at the 350-nm
Figure 31. Solid-state laser pump for a Ti-sapphire crystal laser cavity. The picture shows the open cavity of a Millennia V (Spectra Physics, Mountain View, CA) emitting in the green at 532 nm and delivering 5 W. Recently more compact solid-state pumps have been introduced, such as the Millennia X by Spectra Physics and the Verdi by Coherent. (Courtesy of Alessandro Esposito; picture taken at LAMBS.)
equivalent wavelength (potential for phototoxic UV transitions) can occur together with 525 nm (two-photon equivalent excitation) ones. Moreover, for a 720-nm laser beam the comb effect may be worse. In fact, for 720 nm, one should consider possible effects at 360 nm and 180 nm with the potential to induce DNA damage. The final choice is as usual a compromise dictated by the specificity of TPE microscope applications. However, the parameters that are more relevant in the selection of the laser source are average power, pulsewidth and repetition rate, and wavelength, also according to Eq. (18). The most popular features for an infrared pulsed laser are 700 mW–1 W average power, 80–100 MHz repetition rate, and 100–150 fs pulse width. So far, the use of short pulses and small duty cycles is mandatory to allow image acquisition in a
Figure 32. Coupling of the solid-state laser pump (Millennia V, Spectra Physics) with the Ti-sapphire unit (Tsunami, Spectra Physics). Another popular commercial combination is the Verdi and Mira by Coherent. In the background of the picture the only cooling system needed for both commercial systems, i.e., a chiller, is visible. (Courtesy of Alessandro Esposito; picture taken at LAMBS.)
reasonable time while using power levels that are biologically tolerable (Denk et al., 1994; Denk, 1996; Koester et al., 1999; Konig et al., 1996a,c; Konig et al., 1998; Konig, 2000; Konig and Tirlapur, 2002). The 100-fs pulses used for TPE microscopy have bandwidths of the order of 10–15 nm, and when these pulses are passed through optical elements, mainly objective lenses, dispersion takes place. This means that the original pulse is stretched in time (Fig. 35), reducing its peak power and consequently the potential fluorescence signal (Soeller and Cannell, 1996; Wokosin and White, 1997; Wolleschensky et al., 1998, 2002). Compensating for such dispersion is not easy to implement or to maintain, especially in a multiuser TPE microscopy facility. Such compensation is also required if an optical fiber is used to deliver the excitation beam to the microscope scanning head. With optical fibers the problem is further complicated by power limitations. For example, when operating at high power, nonlinear effects can occur within the fiber and the nonpropagating portion of the beam can damage the fiber coupling zone. To minimize dispersion problems Konig (2000) suggests working with pulses of around 150–200 fs. This seems a very good compromise both for pulse stretching and sample viability. It is necessary to keep in mind that a shorter pulse broadens more than a longer one. Until new optical fibers, such as the ones outlined by Warren Zipfel at the 2002 SPIE meeting on Multiphoton Microscopy, are designed and produced, it is preferable to
TABLE 6
Ti-Sapphire Laser Sources
Company/Model | Tuning Range | Wavelength (nm) | Pulse Width | Average Power (mW) for 5 W Pump | Dimension | Pump
Spectra Physics / Tsunami | Wide | 680–1050 | 25 fs, <100–130 fs up to 100 ps | 100–800 | Normal/large | Solid-state 5–10 W compact
Coherent / Mira | Wide | 680–1000 | <100 fs, 5–10 ps | 100–700 | Large | Solid-state 5–10 W compact
Spectra Physics / Mai Tai | 100 nm selectable | 750–850, 780–920 | <100 fs, 25 fs | 750 | Compact | Solid-state integrated
Coherent / Chameleon | 210 nm selectable | 720–930 | 140 fs | <1000 | Compact | Solid-state integrated
Figure 33. Photograph of the open Tsunami cavity. Green is the light from the laser pump; red is the Ti-sapphire crystal emitted fluorescence lasing into the aligned cavity. (Picture taken at LAMBS.)
have the laser beam directly coupled to the scanning head. However, for interested readers, Wolleschensky and colleagues (2002) reported data, benefits, and problems in using an optical fiber to launch the excitation into the scanning head. Direct coupling allows the maximum flexibility, together with the minimum level of ‘‘extra’’ problems at the excitation stage. In fact, even if the fiber-coupling solution seems at first glance the best in terms of practicality and safety, in use, especially in a multiuser facility, it can give rise to practical problems of its own. For example, due to high-order nonlinearities in the fiber, there is a limit on how much power can be put through the fiber (60–100 mW). Moreover, because the fiber coupler needs some devices for maintaining pulse width and beam shape, it is not totally turnkey, demanding further control and alignment procedures, especially when wavelength changes are required or laser output beam changes occur, due, for example, to variations in room temperature. Pulse width measurement is another very delicate issue. In fact, because it is not very easy to measure it at the focal volume within the sample, little can be definitely said about it (Hanninen and Hell, 1994; Guild et al., 1997; Wolleschensky et al., 2002). Although users do not perform measurement of the pulse width at the sample when they use two-photon microscopy, which would require a specific procedure that, even if
Figure 34. Power and pulse width characteristics as function of beam wavelength for a typical laser used for TPE microscopy. As can be seen at the edges of the wavelength range the features are poor. This is mainly due to the characteristics of the set of mirrors mounted into the cavity and typically optimized for a fixed range of wavelengths. (Measurements made at LAMBS on a Millennia V-Tsunami system.)
Figure 35. (a) The broadening of the pulse width at the exit of the laser after passing through microscope optics. (b) An oscilloscope can be typically used for monitoring the pulsing of the laser beam through the output spectrum of the laser. The full width half maximum of the spectral distribution can provide information about pulse width (see text). Pulse widths ranging from 80 to 150 fs at the laser output window are optimal for TPE microscopy applications.
not too complex for a researcher in the field, could be irksome for the majority of users, it is a reasonable approximation to assume that at the focal volume a 1.5- to 2-fold temporal pulse broadening occurs using high quality optics (Wolleschensky et al., 2002; Girkin and Wokosin, 2002). As an example, for a measured laser pulse width of about 100 fs, an estimate at the sample is about 150–180 fs under favorable experimental conditions, sample characteristics included. Sample properties are mentioned because for thick samples the role played by thickness, in terms of pulse width broadening, is not so obvious (de Grauw and Gerritsen, 2002; So et al., 2001; Centonze and White, 1998; Gu et al., 2000; Saloma et al., 1998). As a sort of summary on laser parameters, one should consider keeping the peak power low and the wavelength long. For the majority of practical imaging conditions, pulse width does not constitute an excessively critical parameter. One can plan to select direct beam delivery rather than using optical fibers, and a pulse width of 150–200 fs provides an excellent compromise between minimal dispersion loss in the optics and within the sample and a sufficient quantity of peak power.
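To make the trade-off between average power, repetition rate, and pulse width concrete, the sketch below estimates duty cycle, pulse energy, and peak power for the typical laser figures quoted in this section. It is an illustration under a stated assumption, not the chapter's Eq. (18): the peak power is approximated as average power divided by duty cycle (rectangular-pulse approximation, pulse-shape factor ignored), and all function names are hypothetical.

```python
# Minimal sketch: order-of-magnitude pulse parameters of a mode-locked laser.
# Assumption (not from the chapter): rectangular-pulse approximation,
# peak power ~ average power / (repetition rate * pulse width).


def duty_cycle(rep_rate_hz: float, pulse_width_s: float) -> float:
    """Fraction of time during which the laser is actually emitting."""
    return rep_rate_hz * pulse_width_s


def pulse_energy_j(avg_power_w: float, rep_rate_hz: float) -> float:
    """Energy carried by each pulse."""
    return avg_power_w / rep_rate_hz


def peak_power_w(avg_power_w: float, rep_rate_hz: float, pulse_width_s: float) -> float:
    """Approximate peak power of each pulse."""
    return avg_power_w / duty_cycle(rep_rate_hz, pulse_width_s)


if __name__ == "__main__":
    # Typical source figures from the text: 1 W average, 80 MHz, 100 fs.
    print(f"duty cycle   : {duty_cycle(80e6, 100e-15):.1e}")
    print(f"pulse energy : {pulse_energy_j(1.0, 80e6) * 1e9:.1f} nJ")
    print(f"peak power   : {peak_power_w(1.0, 80e6, 100e-15) / 1e3:.0f} kW")
    # Stretching the pulse from 100 fs to 150 fs at fixed average power
    # lowers the peak power by the same factor of 1.5.
    ratio = peak_power_w(1.0, 80e6, 150e-15) / peak_power_w(1.0, 80e6, 100e-15)
    print(f"peak power after 1.5x stretch: {ratio:.2f} of original")
```

Under these assumptions a 1 W, 80 MHz, 100 fs source has a duty cycle of about 10⁻⁵ and a peak power of the order of 100 kW, which is why a modest temporal broadening translates directly into a loss of peak power at the sample.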
C. Lens Objectives
The choice of an objective lens influences the performance of a TPE microscope. New technological requisites have to be considered with respect to conventional excitation fluorescence microscopes. Adequate transmission in the IR has to be coupled with good collection efficiency toward the ultraviolet region. Moreover, the number of components should be minimized without affecting resolution properties in order to reduce pulse width distortions. Although the collection efficiency of the time-averaged photon flux is dependent on the numerical aperture of the collecting lens, it was demonstrated that total fluorescence generation is independent of the numerical aperture of the focusing lens when imaging thick samples (Xu, 2002). This is due to the fact that the increase of intensity, obtained by sharper focusing (high NA), is counterbalanced by the shrinking of the excitation volume. Thus the total amount of fluorescence summed over the entire space remains constant. The very relevant practical consequence of this fact is that in TPE measurements on thick samples, assuming no aberrations, the generated fluorescence is insensitive to the size of the focal spot. As a positive consequence, a moderate variation of the laser beam size would not affect the measurements. This is a very efficient condition due to the fact that using an appropriate (nondescanned) acquisition scheme it is possible to collect all the generated fluorescence.
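The independence of the total generated fluorescence from the focal spot size can be illustrated with a short worked calculation. The derivation below is a sketch under assumptions that are not part of the chapter: a paraxial Gaussian beam of power $P$ and waist $w_0$ in a homogeneous, non-absorbing, thick sample of refractive index $n$, with the two-photon signal taken as proportional to the volume integral of the squared intensity. It is not the treatment given by Xu (2002).

```latex
% Gaussian beam (assumed): intensity profile, beam radius, Rayleigh range
I(r,z) = \frac{2P}{\pi w(z)^2}\,\exp\!\left(-\frac{2r^2}{w(z)^2}\right),
\qquad
w(z)^2 = w_0^2\left(1+\frac{z^2}{z_R^2}\right),
\qquad
z_R = \frac{\pi n w_0^2}{\lambda}

% Radial integral of I^2 in a plane at depth z
\int_0^\infty I^2\, 2\pi r\,dr
  = \frac{4P^2}{\pi^2 w(z)^4}\cdot 2\pi \cdot \frac{w(z)^2}{8}
  = \frac{P^2}{\pi\, w(z)^2}

% Axial integral over a thick sample
\int_{-\infty}^{\infty} \frac{P^2}{\pi w_0^2}\,\frac{dz}{1+z^2/z_R^2}
  = \frac{P^2}{\pi w_0^2}\,\pi z_R
  = \frac{\pi n P^2}{\lambda}
```

The waist $w_0$, and hence the numerical aperture, cancels out: tighter focusing raises $I^2$ as $1/w_0^4$ while the excitation volume shrinks as $w_0^4$, which is the compensation described in the text. The cancellation relies on the sample being thick compared with the Rayleigh range and on the absence of aberrations.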
However, the amount of the light delivered or collected depends critically on the cone of light accepted into the objective lens (Bianco and Diaspro, 1989). For point scanning systems, this light cone is related to the numerical aperture of the lens relative to the index of refraction of the immersion medium. The total area of light collected by the lens is then proportional to the square of this ratio. Girkin and Wokosin (2002) reported on the light collection performance of a range of lenses compared to a 1.4 NA 60× oil lens. The 1.4 NA 60× oil and 1.2 NA 60× water lenses have the best collection potential; the 1.3 NA oil lenses can collect 85% as much light compared to the best two lenses, whereas the 0.75 NA air lenses and 1.0 NA water dipping lens collect only 66% as much light. As usual, a compromise must be reached, as the higher numerical aperture lenses (recently 1.65 by Olympus and 1.45 by Nikon and Zeiss have become available) gain light collection power at a loss of working distance. Typical working distances for oil immersion lenses with a numerical aperture of 1.4 are around 250 µm (which includes the thickness of a cover slip, most commonly 0.17 mm). This gives a practical working distance of less than 100 µm, making them unsuitable for really deep imaging of intact tissue. As a general rule one selects a lens with as large a numerical aperture as possible giving the working distance required for the preparation under investigation. The 1.4 NA lenses provide a good starting point for TPE lens selection. In fact, the numerical aperture of a lens also determines the ultimate resolution of the optical system (Abbe, 1910). In addition, NA contributes significantly to determining the signal-to-background and signal-to-noise ratios.
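The scaling of collected light with the square of NA/n can be evaluated for the lenses mentioned above. The sketch below is illustrative only: the refractive indices of the immersion media (1.515 for oil, 1.33 for water, 1.0 for air) are assumed typical values that do not appear in the chapter, and the function names are hypothetical.

```python
# Minimal sketch: relative light collection of objectives, taken as (NA / n)^2
# and normalized to a 1.4 NA oil lens.
# Assumption (not from the chapter): typical refractive indices of the media.

N_MEDIUM = {"oil": 1.515, "water": 1.33, "air": 1.0}


def collection_factor(na: float, medium: str) -> float:
    """Square of the ratio between numerical aperture and medium index."""
    n = N_MEDIUM[medium]
    if na > n:
        raise ValueError("numerical aperture cannot exceed the medium index")
    return (na / n) ** 2


def relative_collection(na: float, medium: str,
                        ref_na: float = 1.4, ref_medium: str = "oil") -> float:
    """Collection relative to the reference lens (default: 1.4 NA oil)."""
    return collection_factor(na, medium) / collection_factor(ref_na, ref_medium)


if __name__ == "__main__":
    lenses = [
        ("1.4 NA oil", 1.4, "oil"),
        ("1.2 NA water", 1.2, "water"),
        ("1.3 NA oil", 1.3, "oil"),
        ("1.0 NA water dipping", 1.0, "water"),
        ("0.75 NA air", 0.75, "air"),
    ]
    for name, na, medium in lenses:
        print(f"{name:22s} {100 * relative_collection(na, medium):5.1f} %")
```

With these assumed indices the 1.3 NA oil lens comes out at about 86% and the 0.75 NA air and 1.0 NA water dipping lenses at about 66% of the reference, in line with the figures quoted from Girkin and Wokosin (2002).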
These factors play a large role in determining whether a particular sample can be successfully imaged, and in particular the maximum depth that can be visualised (Girkin and Wokosin, 2002; Jonkman and Stelzer, 2002; de Grauw and Gerritsen, 2002). Table 7 gives transmission data for the Nikon CFI60 series of objectives (Benham and Schwartz, 2002; Girkin and Wokosin, 2002). Konig reported interesting data related to pulse broadening due to microscope optics including lens objectives. Considering an ideal chirp-free pulse of width $\tau_{in}$ at the entrance of an optical system, the pulse broadening $B$ can be calculated as $\tau_{out}/\tau_{in}$, where $\tau_{out}$ is the pulse width at the focal plane (Konig, 2000) (Fig. 35):
\[ B = \frac{\tau_{out}}{\tau_{in}} = \sqrt{1 + 7.68\left(\frac{DL}{\tau_{in}^{2}}\right)^{2}} \qquad (22) \]
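Equation (22) is straightforward to evaluate numerically. The sketch below implements it as written, with DL expressed in fs² and the input pulse width in fs; the function names are illustrative, and the value DL = 5000 fs² is the whole-microscope figure attributed to Konig (2000) in the text.

```python
# Minimal sketch: pulse broadening factor from Eq. (22),
# B = sqrt(1 + 7.68 * (DL / tau_in^2)^2), with DL in fs^2 and tau_in in fs.
import math


def broadening_factor(dl_fs2: float, tau_in_fs: float) -> float:
    """Ratio between the pulse width at the focal plane and at the input."""
    return math.sqrt(1.0 + 7.68 * (dl_fs2 / tau_in_fs ** 2) ** 2)


def stretched_width_fs(dl_fs2: float, tau_in_fs: float) -> float:
    """Pulse width at the focal plane for a chirp-free input pulse."""
    return tau_in_fs * broadening_factor(dl_fs2, tau_in_fs)


if __name__ == "__main__":
    dl = 5000.0  # fs^2, whole microscope with a high NA oil objective
    for tau_in in (100.0, 150.0, 200.0):
        b = broadening_factor(dl, tau_in)
        print(f"tau_in = {tau_in:.0f} fs: B = {b:.2f}, "
              f"tau_out = {stretched_width_fs(dl, tau_in):.0f} fs")
```

For DL = 5000 fs² a 100-fs pulse is broadened by a factor of about 1.7, whereas a 200-fs pulse is broadened by only about 6%, which illustrates why a shorter pulse broadens more than a longer one.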
Typical values for the optical dispersion parameter D for glass objectives at 800 nm are reported in Table 8. Considering the optical path length factor L, a typical DL value of the whole microscope including an oil immersion
TABLE 7
Nikon CFI60 Series of Objectives
Objective | Magnification | NA | Working Distance (mm) | T% at 350 nm | T% at 550 nm | T% at 900 nm
CFI Plan Apochromat | 100× oil IR | 1.40 | 0.13 | <15 | >71 | 51–70
CFI Plan Fluor | 100× oil | 1.30 | 0.20 | 51–70 | >71 | >71
CFI Fluor Water dipping | 60× | 1.00 | 2.00 | 51–70 | >71 | >71
CFI Plan Fluor | 60× oil | 1.25 | 0.20 | 31–50 | >71 | 51–70
CFI Fluor Water dipping | 40× | 0.80 | 2.00 | 51–70 | >71 | >71
CFI Super (S) Fluor | 40× oil | 1.30 | 0.22 | 51–70 | >71 | >71
CFI Super (S) Fluor | 20× | 0.75 | 1.00 | 51–70 | >71 | >71
CFI Fluor Water dipping | 10× | 0.30 | 2.00 | 51–70 | >71 | 51–70
CFI Super (S) Fluor | 10× | 0.50 | 1.20 | 51–70 | >71 | >71
TABLE 8
Values of D at 800 nm
Glass Type | D at 800 nm (fs² cm⁻¹)
CaF2 | +251
Quartz | +300
FK-3 | +389
BK7 | +445
SF2 | +1030
SF10 | +1600
high NA objective is 5000 fs² (Konig, 2000). Wolleschensky et al. (2002) summarized dispersion parameters for Zeiss microscope objective lenses measured at 800 nm. D values in fs² are 1714, 1494, 2398, and 1531 (within an error of about 10%) for the 40×/0.8 water IR Achroplan, 63×/0.9 water IR Achroplan, 40×/1.3 oil Plan Neofluar, and 20×/0.75 Plan Apochromat, respectively. The pulse broadening factor for a 100-fs pulse was estimated to be between 1.14 and 1.23.
D. Example of the Practical Realization of a TPE Microscope
This section describes the practical realization of a TPE microscope achieved through minor modifications of a commercial confocal laser scanning microscope (CLSM), in which the ability to operate as a standard CLSM has been preserved (Diaspro, 2001; Diaspro et al., 2001). This microscope has been established at LAMBS (Laboratory for Advanced
Figure 36. A schematic drawing of the TPE microscope developed at LAMBS (Diaspro, 2001). (See Color Insert.)
Bioimaging, Microscopy, and Spectroscopy) under the auspices of and with grants from the National Institute for the Physics of Matter (INFM, Istituto Nazionale per la Fisica della Materia) as the first Italian TPE architecture (Diaspro et al., 1999b; Diaspro, 2001). A scheme of the architecture is sketched in Figure 36. Figure 37 shows an overall picture of the laboratory including the TPE microscope. The core of the architecture is a mode-locked Ti-sapphire infrared pulsed laser (Tsunami 3960, Spectra Physics Inc., Mountain View, CA), pumped by a high-power (5 W at 532 nm) solid-state laser (Millennia V, Spectra Physics Inc., Mountain View, CA). This Ti-sapphire laser output can be tuned across two ranges, namely, from 680 to 830 nm and from 730 to 900 nm, depending on the set of mirrors actually mounted into the laser cavity. These two sets allow the two-photon excitation of a variety of fluorescent molecules normally excited by visible and ultraviolet radiation, including the green fluorescent protein family (Xu, 2002). The restriction of the tunable range is determined by the set of mirrors installed. Power and wavelength measurements are performed using an RE201 model ultrafast laser spectrum analyzer (Ist-Rees, UK) and an AN2/10A-P model thermopile detector power meter (Ophir, Israel) that
Figure 37. Photograph of the TPE microscope realized at LAMBS within the strategic framework of a national project of the National Institute of the Physics of Matter (INFM) (Diaspro et al., 1999a; Diaspro, 2001). On the left the open Tsunami cavity is visible. The microscope and the PCM2000 scanning head mounted on its lateral port are visible; on the right is the video unit. Also visible (left) is part of the beam diagnostics, performed using an ultrafast laser spectrum analyzer RE201 (Ist-Rees, UK) and a thermopile detector power meter AN2/10A-P (Ophir, Israel). The picture also shows the author (right) and Mirko Corosu (left), the first student working at LAMBS on the TPE microscopy project.
constitute the beam diagnostics module of the system. A model 409-08 scanning autocorrelator (Spectra Physics, Mountain View, CA) has been occasionally used for precise pulse width evaluation, but it is not within routine beam diagnostics. We are currently utilizing a compact optical autocorrelator, which allows measurement of femtosecond laser pulses on the microscope objective plane based on a Michelson interferometer and fluorescence signal (Cannone et al., 2002). A special dichroic mirror set (Stanley, 2001), optimized for high-power ultrashort infrared pulses (CVI, USA), is used to bring the Tsunami beam directly into the scanning head. Before entering the scanning head, beam average power is brought to desired values using a neutral density rotating wheel (Melles Griot, USA). For an average power of 20 mW at the entrance of the scanning head, the average power before the microscope objective is about 8–12 mW and at the sample is estimated between 2 and 6 mW. We found that at the focal volume a 1.5- to 1.8-fold broadening occurs using a high numerical aperture objective and a reduced amount of optics within the optical path (Soeller and Cannell, 1996; Hanninen and Hell, 1994). For example, this means that
for a measured laser pulse width of about 100 fs at the Tsunami output window, the estimate at the sample is about 150 fs. During measurement sessions, we continuously display the pulse condition by means of an oscilloscope connected to the output of the spectrum analyzer. The pulse condition can also be tested using a simple reflective grating. In this case the reflected image on a screen will be sharp for quasicontinuous emission and blurred for pulsed emission. This is because the laser output is spectrally broader in the case of pulsed emission. This spectrum is visible on the screen of the above-mentioned oscilloscope in the architecture pictures. For a transform-limited sech² pulse the relationship between pulse width (dT) and frequency width (df) is as follows: dT · df = 0.315. Unfortunately, the pulse is generally not transform limited, so this product can exceed 0.315. The laser beam is aligned using a conventional laser source of the scanning head by marking some reference positions inside the scanning head itself. The scanning and acquisition system for microscopic imaging is based on a commercial single-pinhole scanning head Nikon PCM2000 (Nikon Instr., Florence, Italy) mounted on the lateral port of a common inverted microscope, Nikon Eclipse TE300 (Fig. 38). The Nikon PCM2000 has a
Figure 38. Photograph of the confocal laser scanning head currently operating at LAMBS and modified for TPE imaging, i.e., Nikon PCM2000. (Courtesy of Alessandro Esposito.)
simple and compact light path that makes it very appropriate for conversion to a two-photon scope (Diaspro et al., 1999b). The optical resolution performance of this microscope when operating in conventional confocal mode, and using a 100×/1.3 NA oil immersion objective, has been reported in detail elsewhere and is 178 ± 21 nm laterally and 509 ± 49 nm axially (Diaspro et al., 1999b). Under TPE the scanning head operates in the ‘‘open pinhole’’ condition, i.e., a wide-field descanned detection scheme is used (Diaspro et al., 1999a). Figure 39 illustrates the optical pathways available on the microscope. Figure 40 shows in detail the optical pathway within the laser scanning head and the beam delivery input port. Figure 41 shows the simple but effective optical path of the PCM2000 scanning head. A dichroic mirror (1st) has been substituted in the original scanning head to allow
Figure 39. Rear view of the LAMBS TPE architecture: (1) Tsunami laser; (2–3) optical mounts for beam splitting dichroics; (4) reflected beam stops; (5–6) spectrum analyzer section including neutral density filter (5) and spectrum analyzer head (6); (7) neutral density filter for average power control along the TPE microscopy pathway (right-side beam line); (8) mobile power meter, including measuring unit and display; (9) microscopy beam line input port at the confocal scanning head; (10) scanning lens coupling the confocal scanning head with the side port of the inverted optical microscope; (11) epifluorescence originating port sacrificed for TPE beam delivery for spectroscopic applications of the left-side beam line (Diaspro et al., 2001). (Photo courtesy of Alessandro Esposito.) (See Color Insert.)
Figure 40. Scanning head input (left) and scanning head components (right) including modified dichroics for TPE microscopy (D1, D2). (See Color Insert.)
excitation from 680 to 1050 nm (Chroma Inc., Brattleboro, VT). The substituted dichroic mirror reflects very efficiently (>95%) from 680 to 1050 nm. The 50% cut-off is around 640 nm. At its best performance (>90%) the mirror transmits from 410 to 620 nm. The neutral density filter at the open-pinhole location has been removed. The galvanometer mirrors are metal coated (silver) on fused silica and exhibit a high damage threshold. The minimum pixel residence time is 3 µs and it is related to the mechanical response of the scanners. A series of custom-made emission filters that block infrared radiation (>650 nm) to an optical density of 6–7 within 50 mW of beam power incident on the filters themselves have been utilized, namely, E650P, HQ 460/50, HQ 535/50, HQ 485/30, and HQ 405/30 (Chroma Inc., Brattleboro, VT). The E650P filter was initially tested to check its blocking performance with respect to the IR/NIR reflections coming from stray rays within the scanning head or from the sample, and it constitutes the base for the other HQ filters. Switching between the one-photon and two-photon modes is accomplished simply by changing from the single-mode optical fiber (one photon), coupled to a module containing conventional laser sources (Ar-Ion, He-Ne green), to the optical path in air, delivering the Tsunami laser beam (two photon). To minimize architectural changes of the PCM2000 scanning head, a lens having a numerical aperture close to 0.11 is used, which matches the numerical aperture of the optical fiber used for conventional excitation laser delivery. Figure 42 shows the attachment developed at LAMBS for laser coupling. It is a device that can be directly plugged into the scanning head. Switching from
DIASPRO AND CHIRICO
Figure 41. Optical scheme of the confocal scanning head Nikon PCM2000 shown in Figure 40. The excitation beam enters the PCM2000 scanning head through an optical coupler (1) in order to reach the sample on the x–y–z stage (5). The beam passes through the pinhole holder (2) kept in an open position, the galvanometric mirrors (3), and the scanning lens (4). Fluorescence generated from the sample (5) is delivered to the PMT through acquisition channels directed by two selectable mirrors (6, 8) via optical fiber (7, 9) coupling. Switching between the one-photon and two-photon modes is accomplished simply by changing from the single-mode optical fiber (one photon), coupled to a module containing conventional laser sources (Ar-Ion, He-Ne green), to the two-photon optical coupler (TPOC), which delivers the Tsunami laser beam (two photon). Axial scanning for confocal and TPE three-dimensional imaging is actuated by means of two different positioning devices depending on the experimental circumstances and axial accuracy needed, namely, a belt-driven system using a DC motor (RFZ-A, Nikon, Japan) and a single-objective piezo nanopositioner (PIFOC P-721-17, Physik Instrumente, Germany). The piezoelectric axial positioner allows an axial resolution of 10 nm within a motion range of 1000 nm at 100 nm steps and utilizes a linear variable differential transformer (LVDT) integrated feedback sensor (Diaspro, 2001).
conventional to two-photon excitation is simple. Moreover, the switching operation allows the focus and position on the sample to be maintained, as demonstrated in Figures 43 and 44. A high-throughput optical fiber delivers the emitted fluorescence from the scanning head to the PCM2000 control unit, where the photomultiplier tubes (R928, Hamamatsu, Japan) are physically plugged in. This solution is particularly useful for three main reasons: (1) electrical noise is reduced, (2) background light noise is reduced, and (3) it is possible to verify optical conditions directly by keeping the scanning head without an enclosure. Axial scanning for confocal and TPE three-dimensional imaging is actuated by means of two different positioning devices depending on the
Figure 42. TPOC details. (A) Aluminum tube containing a low magnification coupling objective (Edmund Scientific, USA); (B) TPOC plugged into the scanning head input port after removing the optical fiber for conventional excitation beam delivery; (C) optical fiber delivering conventional excitation connected through the TPOC to the scanning head. The solution adopted in (C) makes it possible to switch simply and precisely from the one- to the two-photon excitation mode (Diaspro, 2001). (Photograph courtesy of Federico Federici.)
experimental circumstances and axial accuracy needed, namely, a belt-driven system using a DC motor (RFZ-A, Nikon, Japan) and a single-objective piezo nanopositioner (PIFOC P-721-17, Physik Instrumente, Germany). The piezoelectric axial positioner allows an axial resolution of 10 nm within a motion range of 1000 nm at 100 nm steps and utilizes a linear variable differential transformer (LVDT) integrated feedback sensor. Acquisition and visualization are completely computer controlled by dedicated software, EZ2000 (Coord, Apeldoorn, The Netherlands; http://www.coord.nl). The main available controls are PMT voltage, pixel dwell time, frame dimensions (1024 × 1024 maximum), and field of scan (from 1 to 140 μm using a 100× objective). Remember that decreasing the size of the field of scan increases the radiation exposure time when the resulting pixel dimension is smaller than one-half the dimension of the diffraction-limited spot, i.e., < 200 nm, as shown in Figure 45. By zooming over a specific area of the sample and simultaneously increasing the dwell time, it is possible to destroy selected regions of the sample. This micropatterning effect can be software controlled, as reported in Figure 46. Specific patterns can be obtained by utilizing the microscope as an active device, as shown in Figure 47.
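The relation between field of scan, frame size, and exposure can be made concrete with a small sketch. The function names are hypothetical, the 200 nm threshold is the half-spot dimension quoted above, and the top-hat spot of 400 nm used for the exposure estimate is an assumption made only for illustration; this is not part of the EZ2000 software.

```python
def pixel_size_nm(field_of_scan_um: float, pixels_per_line: int) -> float:
    """Pixel size in nm for a square field of scan sampled with pixels_per_line pixels."""
    if field_of_scan_um <= 0 or pixels_per_line <= 0:
        raise ValueError("field of scan and pixel count must be positive")
    return field_of_scan_um * 1000.0 / pixels_per_line


def is_oversampled(field_of_scan_um: float, pixels_per_line: int,
                   half_spot_nm: float = 200.0) -> bool:
    """True when the pixel is smaller than half the diffraction-limited spot."""
    return pixel_size_nm(field_of_scan_um, pixels_per_line) < half_spot_nm


def exposure_per_line_us(field_of_scan_um: float, pixels_per_line: int,
                         dwell_time_us: float, spot_nm: float = 400.0) -> float:
    """Rough time (in microseconds) a sample point spends inside the spot
    during one line sweep.

    Assumption: the spot is a top-hat of width spot_nm, so the point is
    illuminated for as many pixels as fit across the spot (at least one).
    """
    pixels_in_spot = max(1.0, spot_nm / pixel_size_nm(field_of_scan_um, pixels_per_line))
    return pixels_in_spot * dwell_time_us


if __name__ == "__main__":
    for field_um in (140.0, 18.0):
        size = pixel_size_nm(field_um, 512)
        print(f"{field_um:6.1f} um field, 512 pixels: {size:6.1f} nm/pixel, "
              f"oversampled={is_oversampled(field_um, 512)}, "
              f"exposure per line ~{exposure_per_line_us(field_um, 512, 17.0):.0f} us")
```

Under these assumptions, shrinking the field at a fixed frame size and dwell time lengthens the time each point of the sample spends under the spot, which is the effect exploited for selective photodestruction.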
Figure 43. Optical sectioning demonstrated in the confocal and TPE mode after switching from one mode to the other using TPOC developed at LAMBS. (See Color Insert.)
To evaluate the performance of the microscope some basic measurements have to be performed, namely, a fluorescence quadratic behavior check and a point spread function (PSF) measurement. PSF measurements refer to a planachromatic Nikon 100×, 1.4 NA oil-immersion objective with enhanced transmission in the infrared region. Blue fluorescent carboxylate-modified microspheres 100 nm in diameter (F-8797, Molecular Probes, OR) were used. A drop of a dilute bead suspension was spread between two cover slips of nominal thickness 0.17 mm. These microspheres constitute a very good compromise between the need for subresolution point scatterers and acceptable fluorescence emission. The geometry used is sketched in Figure 48. An object plane field of 18 × 18 μm was imaged in a 512 × 512 frame, at a pixel dwell time of 17 μs. Axial scanning was performed and 21 consecutive, parallel optical slices were collected at steps of 100 nm. The x–y scan step was 35 nm. The scanning head pinhole
Figure 44. Same cells as Figure 16 to demonstrate switching from the one- to two-photon excitation mode. In the TPE mode the internal structure of the nucleus is clearly visible, i.e., chromatin DNA marked by DAPI. Along with Figure 43 this figure clearly shows the control of positioning after mode switching. (See Color Insert.)
Figure 45. Selective photodestruction of cells after selective zooming and average power increase. When zooming, the residence time increases because the spot remains diffraction limited, exactly as in scanning without zoom, while the motion of the scanning point becomes finer and slower. (See Color Insert.)
was set to the open position. The 3D data sets of several specimens were analyzed in the form presented in Figure 49. The measured full width at half maximum (FWHM) lateral and axial resolutions were 210 ± 40 nm and 700 ± 50 nm, respectively (Diaspro, 2001). Intensity profiles along the x–y–z directions of experimental data and theoretical expectations are reported in Figure 50. To be sure of operating in the TPE regime the quadratic behavior of the fluorescence intensity versus excitation power has
Figure 46. It is possible to perform software control of the scanning beam in the x–y–z frame. This picture shows the programming of the scanners through the graphic realization (upper left window) of the desired pathway. After this, training commands are sent to EZ2000 and an example of selective photobleaching in a fluorescent sphere is shown (upper right window). (Courtesy of Alessandro Esposito, who developed this software tool, named ‘‘Stealth,’’ which directly interfaced with EZ2000 acquisition software.)
been demonstrated. Figure 51 shows the TPE trend obtained from a solution of fluorescein. Moreover, during any fluorescence acquisition a simple and effective test for the TPE condition can be performed by delivering continuous instead of pulsed radiation. This can be accomplished by interrupting the pumping of the Ti-sapphire laser for a while and switching off the pulse control. When the pump is reactivated, if there are not too many vibrations, it is possible to get a quasicontinuous beam that is not appropriate for TPE even though it has the very same average power as the pulsed one. When pulsing is restored at any moment during scanning, fluorescence becomes visible, confirming the TPE imaging condition. Figure 52 shows the 3D ability of the system. Two different views of a mature sperm head of the octopus Eledone cirrhosa (Diaspro et al., 1997), reconstructed from optical sections, are shown.
Figure 47. Examples of controlled selective photobleaching (above) resulting in writing the characters INFM within the central plane of a fluorescent sphere, and of selective ablation of a cell layer from a three-dimensional multilayer sample (Diaspro, 1999c, 2001). (See Color Insert.)
Figure 48. Geometry of the acquisition conditions for measuring the point spread function. Subresolution fluorescent spherical beads are dried and mounted between two 0.17 mm cover slips optimized for refractive index homogeneity (Diaspro et al., 2002a).
Figure 49. Fluorescence signals from subresolution optical fluorescent beads collected plane by plane by means of optical sectioning (Diaspro et al., 1999a,b).
Figure 50. Radial and axial intensity profiles of the point spread function. (Adapted from Diaspro et al., 1999b.)
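A full width at half maximum of the kind reported for the lateral and axial profiles can be read from a sampled intensity profile by linear interpolation at the half-maximum level. The following is a minimal sketch, assuming a single-peaked, background-subtracted profile; the function names are hypothetical and the Gaussian test profile is synthetic, not the measured data.

```python
import math


def _crossing(x0, y0, x1, y1, level):
    """Position where the straight line through (x0, y0) and (x1, y1) reaches level."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def fwhm(positions, intensities):
    """Estimate the FWHM of a single-peaked, background-subtracted profile."""
    if len(positions) != len(intensities) or len(positions) < 3:
        raise ValueError("need at least three matching samples")
    peak = max(intensities)
    half = peak / 2.0
    i_peak = intensities.index(peak)

    # Walk left from the peak to the first sample below half maximum.
    i = i_peak
    while i > 0 and intensities[i] >= half:
        i -= 1
    # Walk right from the peak to the first sample below half maximum.
    j = i_peak
    last = len(intensities) - 1
    while j < last and intensities[j] >= half:
        j += 1
    if intensities[i] >= half or intensities[j] >= half:
        raise ValueError("profile does not fall below half maximum on both sides")

    left = _crossing(positions[i], intensities[i],
                     positions[i + 1], intensities[i + 1], half)
    right = _crossing(positions[j - 1], intensities[j - 1],
                      positions[j], intensities[j], half)
    return right - left


if __name__ == "__main__":
    # Synthetic axial profile: 21 slices at 100 nm steps, Gaussian with FWHM 700 nm.
    sigma = 700.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    z = [-1000.0 + 100.0 * k for k in range(21)]
    profile = [math.exp(-zi ** 2 / (2.0 * sigma ** 2)) for zi in z]
    print(f"estimated axial FWHM: {fwhm(z, profile):.0f} nm")
```

With samples spaced 100 nm apart, linear interpolation introduces an error of a few nanometers on a 700 nm wide Gaussian, well inside the experimental uncertainty; a fit to a model profile would be the natural refinement.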
[Plot for Figure 51: fluorescence intensity (a.u., 0 to 3) versus average power Paverage (mW, 0 to 55).]
Figure 51. Quadratic behavior check for imaging under the TPE regime. (Courtesy of Mirko Corosu; measurements made at LAMBS.)
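One way to quantify the quadratic behavior check is to fit a straight line to log(intensity) versus log(power): a slope close to 2 is expected under the TPE regime, and a slope close to 1 for one-photon excitation. The sketch below uses an ordinary least-squares fit in pure Python; the function name and the data are illustrative assumptions, and the numbers are synthetic rather than read from Figure 51.

```python
import math


def loglog_slope(powers_mw, intensities):
    """Least-squares slope of log(intensity) versus log(power).

    All powers and intensities must be strictly positive.
    """
    if len(powers_mw) != len(intensities) or len(powers_mw) < 2:
        raise ValueError("need at least two matching data points")
    xs = [math.log(p) for p in powers_mw]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic example: intensity proportional to the square of the power.
    powers = [5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
    signal = [1.0e-3 * p ** 2 for p in powers]
    print(f"log-log slope: {loglog_slope(powers, signal):.2f}")
```

In practice the fit should be restricted to the power range below saturation and photobleaching, where the quadratic law is expected to hold.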
VII. Application Gallery
Two-photon excitation microscopy has found applications in many areas of biology, medicine, physics, and engineering. Areas such as neurobiology and embryology, tissue engineering, and proteomics are only the tip of the
Figure 52. Three-dimensional views (a, b) of the mature sperm head of the octopus Eledone cirrhosa, loaded with DAPI, from 12 optical sections. (Courtesy of Silvia Scaglione; image processing and visualization by Fabio Mazzone, Francesco Di Fato, and Silvia Scaglione at LAMBS and BioLab, University of Genoa, Italy.)
iceberg. Never, since the Dutchman van Leeuwenhoek constructed his simple microscope in 1683, has there been such a vast, rapid, and widespread flourishing of applications based on a microscopic technique. As will be seen, applications are predominantly in the neurosciences, the field of Denk and co-workers (1990), with whom the modern TPE story started. Here we will try to show different applications, mixing the various areas impacted by the TPE revolution. Unfortunately it is impossible to cover the vast extent of TPE applications. For this reason we refer the interested reader to web-based search engines. Starting from neuroscience, Yuste et al. (2000) provide a wide-ranging collection of outstanding applications of two-photon excitation imaging. Figure 53 shows the complex organizational motifs of a special neuronal cell, the Purkinje cell, revealed by means of Oregon Green labeling. This fluorescent molecule binds calcium ions in the cytoplasm. Through specific experimental procedures it is possible to obtain quantitative information within a three-dimensional and temporal framework. In this context TPE is also relevant because it permits long-term imaging sessions and makes it possible to perform these studies in intact tissues. In Figure 54 an optical section of rat cerebellar granule cells is shown. In this case Indo-1 AM fluorescence is the mechanism of contrast for calcium ion distribution. This marker is conventionally excited in the UV regime and can give quantitative information about calcium concentration. TPE microscopy allows dynamic events to be followed quantitatively without perturbing the delicate and
Figure 53. Purkinje cell labeled with Oregon green. Calcium ion concentration is mapped by means of a color scale from blue (low concentration level) to red (maximum concentration level). (Courtesy of Prof. Cesare Usai, Institute of Biophysics, National Research Council, Genoa, Italy. Image acquired at LAMBS.) (See Color Insert.)
Figure 54. Rat cerebellar granule cell loaded with Indo-1 AM calcium binding dye. This UV-excitable fluorescent molecule has been excited at 720 nm at a moderate average power of 2 mW at the focal plane. (Courtesy of Alessandro Esposito, DIFI, University of Genoa, Italy. Image acquired at LAMBS.) (See Color Insert.)
complex relationships within neuronal cell networks, a perturbation that in the UV excitation case would significantly limit the duration of the experiment. The possibility of following dynamic events allowed us to demonstrate that living cells, after encapsulation into fuzzy nanostructured polyelectrolyte matrices, preserved their morphology, metabolic activity, and duplication function (Diaspro et al., 2002c). This is shown in Figure 55. Here the polyelectrolyte capsule was bound to fluorescein and DAPI was used to reveal mitochondrial and nuclear DNA distribution. For these dyes TPE
Figure 55. Demonstration of cell duplication ability after polyelectrolyte encapsulation by coupling transmission imaging (A) with TPE imaging (B) of fluorescein and DAPI, mapping the capsule wall (green) and the DNA distribution of a duplicating mother cell (blue). (Reprinted with permission from Langmuir, June 25, 2002, 18, 5047-5050. Copyright 2002 American Chemical Society.) Image acquired at LAMBS (Diaspro et al., 2002c). (See Color Insert.)
Figure 56. This image illustrates the peculiarity of TPE (right) with respect to conventional fluorescence excitation (left). This ability is a keystone for TPE applications. TPE takes place only within a diffraction-limited volume of event whereas conventional excitation takes place everywhere photons at a proper energy meet excitable fluorescent molecules. The volume of event, marked through the bright ellipsoid in the center of the excitation volume, can be roughly quantified using the resolution parameters of the system, as discussed in Section V. (See Color Insert.)
allowed simultaneous excitation at 720 nm at moderate average power (around 5 mW) without perturbing the hybrid cell–polyelectrolyte system. Such perturbation could occur under a conventional confocal excitation regime, for which 360 and 488 nm excitation wavelengths are required to excite DAPI and fluorescein, respectively. This very same ability to perform dynamic imaging is at the core of a recent note published by Ott (2002) on the ability of TPE microscopy to reveal tumor development. Figure 56 shows
Figure 57. Optical sections from a sea urchin egg marked by DAPI, TPE at 720 nm. In this case heterochromatin distribution within the female pronucleus is visible. The whole egg has a diameter of 80 μm whereas the nucleus is 10 μm (as reference this is the maximum visible diameter). In conventional wide-field microscopy we could see only a confused bright spot from the nucleus. (Preparation of the sample made by Carla Falugi, DIBISAA, University of Genoa; images acquired at LAMBS.) (See Color Insert.)
that the key feature of TPE is a strong spatial selectivity in exciting extrinsic and intrinsic fluorophores. This property is fundamental in three-dimensional imaging of thick samples. Excitation scattering is greatly reduced and at the same time scattered emission can, in principle, be fully collected since it comes from a unique well-defined subvolume within the sample that is located at the actual scanning position. This is a dramatic improvement over conventional excitation in the UV regime. Figure 57 shows the three-dimensional heterochromatin distribution within the nucleus of a sea urchin egg, which constitutes a comparatively thick biological sample. Also in this case DAPI was used to reveal DNA, with the consequence that under conventional excitation DNA distribution details
Figure 58. Optical sections of Figure 57 have been mounted in a topographic image. The image shows EZ2000 (Coord, NL) rendering using the ‘‘volume height function’’ that allows us to map the position of the maximum fluorescence along the optical axis. (See Color Insert.)
Figure 59. Spongy mesophyll of rice plant. TPE allowed simultaneous visualization of rice plant autofluorescence (red) and nonspecific DAPI binding to plant cell walls (blue). TPE at 790 nm. (Courtesy of Kirk J. Czymmek, Department of Biological Sciences, University of Delaware. Details on the project can be found at http://www.udel.edu/bio/people/faculty/kczymmek.html.) (See Color Insert.)
are generally lost. This is due to the thickness and turbidity of the sample coupled with the need for UV excitation and the demand for three-dimensional imaging. Such a high-resolution imaging modality allows accurate topographical information to be obtained (Fig. 58), which can be used to monitor environmental effects on sea urchin egg development (C. Falugi, 2002, private communication). Another very interesting field of application of TPE microscopy is plant biology. Figure 59 shows the spongy mesophyll of a rice plant, combining chloroplast autofluorescence and DAPI binding fluorescence. It was recently observed that excitation with ultrashort
Figure 60. Top-down projection of senile plaques in the brain of a living transgenic mouse (Tg2576). This image is from an x–y–z volume of 500 × 615 × 200 μm³ (Christie et al., 2001). (Image by B. J. Bacskai, [email protected]; downloaded from Bio-Rad site http://microscopy.bio-rad.com/gallery7.htm.) (See Color Insert.)
90- and 170-fs NIR laser pulses at λ = 740, 760, 780, 800, 820, 840, 860, 880, and 900 nm (at mean power 1 mW) invariably induces red chlorophyll autofluorescence of the chloroplasts present in the mesophyll cells (Tirlapur and Konig, 2002). As recently reported by Tirlapur and Konig (2002), the progress made in realizing TPE in plant biology indicates relevant contributions to the following topics: (1) signal transduction and ion dynamics, (2) protein–protein interactions, (3) symplastic communication, (4) basic aspects of organelle and cell division, (5) tip growth, and (6) plant morphogenesis as a whole. Hence TPE in planta is likely to have an enormous impact on our basic thinking about structure–function relationships in three as well as in four dimensions. Figure 60 recalls the penetration properties of TPE microscopy, showing in red amyloid angiopathy and senile plaques in the brain of a living transgenic mouse. A fluorescent angiogram is shown in green. The image, captured from the Bio-Rad web site, is from an outstanding work by Christie and coworkers (2001), and is rendered as a top-down projection of a large volume 0.2 mm deep. The ability to image at a depth of 0.2 mm and deeper is unique to the two-photon approach. A comparison study by Centonze and White (1998) has convincingly demonstrated that TPE microscopy is a superior method for thick specimen analysis. Moreover, the excellent work published by Squirrel’s group on long-term imaging of mammalian embryos without compromising viability (Squirrel et al., 1999) definitively demonstrated the usefulness and relevance of TPE imaging in the noninvasive and high-resolution study of living specimens. This feature of
Figure 61. Mouse ear tissue structures visualized by means of two-photon excitation microscopy. Three-dimensional images of epidermal keratinocytes (a), basal cells (b), elastin/collagen fibers (c), and cartilage structure (d) (above). (Adapted from So et al., 1998, 2000.) In vivo imaging of human skin: basal layers and stratum corneum can be distinguished (below). (Adapted from Masters and So, 1999; So et al., 2000; courtesy of Peter So and Barry Masters.) (See Color Insert.)
TPE is of critical importance for applying this technique in optical biopsy. Figure 61a shows three-dimensional reconstructed TPE images of dermal and subcutaneous structures in a mouse ear tissue specimen (So et al., 1998). Two-photon skin images obtained from the forearm of a human volunteer allowed distinct visualization of the stratum corneum and of the basal layers, as reported in Figure 61b. This implies that pathological states such as atypical changes in cellular morphology, as well as the penetration of intradermally delivered drugs, can be monitored. Notwithstanding these results, it should be mentioned that some technological limitations remain. As accurately analyzed by Gu and co-workers (2000), the penetration depth under TPE can be limited by the strength of primed fluorescence and is not necessarily larger than that under single-photon excitation. In fact, for a turbid tissue medium where Mie scattering is dominant, multiple scattering not only reduces the illumination power in the forward direction but also produces an anisotropic distribution of scattered photons. It is worth noting that in cells and tissue it is possible to perform high-resolution DNA analysis of specific sequences using two-photon excitation fluorescence in situ hybridization (FISH), more specifically three-dimensional two-photon multicolor FISH (Konig et al., 2000). Moving again to brain imaging and neuroscience applications within the framework of a study of anatomical features in whole animals (Denk et al., 1994), Yoder and Kleinfeld (2002), in an effort to image the brain with subcellular spatial resolution, designed and applied a method to image directly through thinned mouse skull using TPE microscopy and a stainless steel headframe (Kleinfeld and Denk, 2000). Figure 62 shows a cerebral vascular angiogram visualized through a thinned skull and Figure 63 shows the related red blood cell motion.
Although the images shown here were used to examine the cerebral vasculature of NIH Swiss mice, these methods are applicable to any preparation that involves fluorescence imaging in mouse brain, such as intracellularly injected fluorescence or genetically encoded fluorescence (e.g., green fluorescent protein). If the mean power in 100-MHz femtosecond laser TPE microscopes with a high numerical aperture is increased to light intensities of the order of magnitude of TW/cm², the instrument can switch from an imaging modality to active processes useful for material processing or localized photochemistry, as previously shown in Section VI (Diaspro, 1999c; Diaspro et al., 2001). The groups of Tetsuro Takamatsu and Satoshi Kawata recently communicated the achievement of TPE-induced waves of calcium ion concentration in live biological cells (Smith et al., 2001). Calcium waves were precisely induced by femtosecond pulsed-laser illumination, exposing living HeLa cells to focused 140-fs pulses of 780 nm wavelength at 30 mW average power. The waves were imaged by fluorescence and were observed to propagate from
Figure 62. Cerebral vascular angiogram visualized through a thinned skull using 800 nm excitation and 90 mW average power. The focal plane is located 150 μm beneath the base of the skull. (Courtesy of Elizabeth Yoder. Reprinted with permission from Microscopy Research and Technique, 56, 305, 2002.)
Figure 63. Red blood cell motion within the capillary segment indicated in Figure 62. In this x-temporal view the unlabeled cells appear as dark bands that stand out against the fluorescent blood serum. A 40× water immersion objective was used. (Courtesy of Elizabeth Yoder. Reprinted with permission from Microscopy Research and Technique, 56, 305, 2002.)
Figure 64. Adapted view of the microbull, about the size of a red blood cell. This smallest bull in the world, realized by Satoshi Kawata’s group, demonstrates the power of two-photon photopolymerization, which exploits the three-dimensional capability and high spatial resolution of TPE microscopy (Kawata et al., 2001). (Image adapted from the web.)
Figure 65. Three-dimensional montage of drilled holes and cut structures in human chromosomes with a precision below the diffraction limit. Nanoprocessing has been performed using an 80-MHz ultrafast NIR laser source at 30–50 mW average power at the focal spot. (Courtesy of Karsten Konig; adapted from Konig and Tirlapur, 2002.)
the laser focal point inside the cell. In 1997 Kawata’s group developed a two-photon polymerization technique that recently led to the realization of the smallest bull in the world (Kawata et al., 2001). Here two-photon absorption of light was used to cause a polymer to solidify, allowing the creation of a microbull in a block of commercially available resin. By using two-photon photopolymerization, Kawata’s team was able to overcome the diffraction limit and create structures with a spatial resolution of about 120 nm, even though the laser used had a wavelength more than six times longer, by exploiting the nonlinear relationship between the
Figure 66. Chromosome dissection within living PTK cells with a precision of 110 nm using the femtosecond NIR laser of a TPE microscope without loss of viability. The cells finished cell division after laser surgery (König et al., 1999b, 2000). (Courtesy of Karsten König; adapted from König, 2000.)
polymerization reaction and the light intensity. Figure 64 shows a microbull about the size of a red blood cell. The exposure source employed was a 780-nm mode-locked Ti-sapphire laser, capable of producing laser pulses of 150 fs at a repetition rate of 76 MHz, which was focused into a sample of SCR 500 resin by a high-NA (1.4) oil-immersion objective lens (Tanaka et al., 2002). The laser spot was scanned in the focal plane by a two-galvanomirror set, and along the optical axis by a piezo stage, both controlled by a computer. The ‘‘microbull’’, together with the smallest functional micromechanical system ever made (a spring with a diameter of only 300 nm), illustrates the potential of a new microfabrication technique that could be used to make optoelectronic devices, micromachines, and drug-delivery systems. As an extension of this nanofabrication ability, by fine tuning the laser power within a TPE microscopy architecture it was possible to realize a noncontact nanoscalpel for surgery inside the living cell, cell nucleus, or organelle without affecting other cellular compartments. Karsten Konig and his group were able to cut chromosomes within a living cell (Konig et al., 2000). Figure 65 shows three-dimensional views of human chromosome
Figure 67. Control measurements were performed to verify that the bright spot signals consisted of second-harmonic generation. As the first check, used also for two-photon excitation autofluorescence, the laser was taken out of mode locking and the signals vanished. This indicates that the signals originated from nonlinear processes, which was also verified by a quadratic dependence on the laser power. Moreover, the laser was scanned between 750 and 830 nm keeping the 405-nm emission filter fixed. No signal was detected at 405 nm within a range of approximately 5 nm around 810 nm. Finally, the potential SHG image appeared bleach resistant. Diaspro’s group is indebted to Colin Sheppard, Tony Wilson, and Guy Cox for critical and useful discussions about the still unclear origin of such a signal on the backscattering pathway. (Sample prepared by Paola Ramoino, DIPTERIS, University of Genoa; image acquired at LAMBS; Diaspro et al., 2002d.) (See Color Insert.)
nanoscalpeling. Figure 66 demonstrates that the cells remained alive and completed cell division after TPE-based nanosurgery. From the examples reported above, it should be clear that promising directions for TPE applications include not only clinical diagnosis, for which optical biopsy can be considered a new paradigm, but also clinical treatment based on photodynamic therapy and nanosurgery. To conclude this necessarily non-exhaustive section, let us switch to two more technical topics that can greatly increase the large potential of TPE applications, namely, second-harmonic generation imaging and single-molecule detection imaging. Second-harmonic generation (SHG), as primed by TPE nonlinear light–matter interaction (Sheppard and Kompfner, 1978), has only recently been used for biological imaging applications (Campagnola et al., 1999; Moreaux et al., 2000; Zoumi et al., 2002; Diaspro et al., 2002d). A powerful advance is
Figure 68. Single and multiple fluorescent molecule and molecular aggregate trends. Comparison between single molecule and aggregate of molecules intensity decay. (Courtesy of Fabio Cannone, LAMBS and INFM Milano Bicocca.)
Figure 69. Distribution of the intensity of the spots of an image of a glass slide prepared by spin coating a C = 310 nM rhodamine 6G solution. The image was acquired with residence time 3 μs, 35 × 35 μm² field of view, and average excitation power 7 mW. Top inset: Fluorescence of the peaks in the distribution in order of increasing intensity. (Adapted from Chirico et al., 2001.)
obtained in coupling TPE and SHG imaging on the same detection optical path, which involves different contrast mechanisms usable to obtain complementary information regarding biological system structure and functioning. TPE fluorescence is generally measured in epi-illumination geometry, but the forward-propagating nature of SHG seemed to restrict
Figure 70. Average fluorescence of the dimmest spot measured on slides spin coated with rhodamine 6G (scatter red), fluorescein (scatter green), and pyrene (scatter navy). The solid lines are square law best fit curves showing clear evidence for the prevailing second-order process. (Courtesy of Fabio Cannone, LAMBS and INFM Milano Bicocca; data acquired at LAMBS.)
SHG microscopy to a transmission mode of detection. This hampered several potential experiments, especially with thick samples or in optical configurations where it is not possible to place forward detectors. Recently, reflected SHG signals were collected by Bruce Tromberg’s and Alberto Diaspro’s groups (Zoumi et al., 2002; Diaspro et al., 2002d), opening new application perspectives. Figure 67 shows an autofluorescence and SHG signal from Paramecium primaurelia, a unicellular organism. The bright spots are forming vesicles and vacuoles according to cellular morphology and related positions. In this case, the autofluorescence signal was used as a cellular landmark. Background autofluorescence and bright spots allow us to image details of the samples without the need for staining (Diaspro et al., 2002d).
The study of single molecules by spectroscopic techniques has recently become of major interest, and fluorescence has been used, among other techniques, to identify and characterize the properties of single-molecular entities. Xie and Lu, and Petra Schwille, are the authors of two very useful and excellent reviews on the subject, including outstanding developments of fluorescence correlation spectroscopy, first introduced by Magde et al. (1972), that for evident reasons could have been included in this review (Xie and Lu, 1999; Schwille, 2001). However, we focus on single-molecule imaging (far-field) using simple two-photon optical configurations (Diaspro et al., 2001; Sonneleitner et al., 1999). Following the pioneering work by Sanchez et al. (1997) on two-photon imaging of single rhodamine B glass-immobilized molecules, spatially resolved applications of ultrasensitive TPE fluorescence have shown promising results (Sonneleitner et al., 2000; Chirico et al., 2001). Two basic issues in these studies are to diminish the background signal, either residual scattering or fluorescence, and to discriminate between the signals arising from single molecules and those that correspond to small molecular aggregates. Apart from an elaborate and elegant method based on the observation of the anticorrelation effect due to the saturation of the ground level of a single molecule, the chaotic time behavior of the fluorescence signal on a millisecond range is taken most of the time as a fingerprint of the ‘‘single-molecule’’ spot. However, these observations can be performed only by following the time evolution of the fluorescence emission of molecular aggregates, which may be degraded by the prolonged exposure to the exciting radiation. Moreover, they are performed mainly with sensitive and costly avalanche photodiodes in a single-photon counting regime.
Recently we imaged the fluorescence signal of different fluorophores spread on glass substrates by means of the scanning head adapted to two-photon excitation (see Section VI) in the range of about 650–900 kW/cm² of excitation intensity (Chirico et al., 2001). It was possible to show that in this range of excitation intensity single molecules can be imaged even with analog detection and, more interestingly, that the distributions of the pixel content on the images show discrete peaks at specific levels that are found to be multiples of a reference basic fluorescence level, the latter corresponding to the dimmest spot revealed in the substrates. The main difference with respect to other single-molecule detection schemes lies in the employment of a simple analog detection scheme and the use of a commercial scanning head to quantitatively discriminate between single entities and aggregates on single snapshots of the spin-coated glasses. Figure 68 sketches single and multiple fluorescent molecule behavior under a TPE regime. In Figure 69 the number density of the spots per mm² versus concentration of the rhodamine 6G fluorescent molecule is shown. Images were taken at microsecond residence time per pixel. The spin coating on the glass slide
of the fluorescent molecules has been made from a solution of rhodamine 6G at C = 312 nM, and the excitation power is 7 mW at the entrance of the scanning head (Chirico et al., 2001). As a further control and single-molecule level characterization step, Figure 70 demonstrates the expected quadratic dependence of intensity on power under the TPE regime for single-molecule image spots. This was a first step in the study of the behavior of single molecules; a further step is related to photothermal effects and blinking (Chirico et al., 2002).
VIII. Conclusions
The rapid spreading of two-photon excitation microscopy, since Denk’s report at the beginning of the 1990s, has brought dramatic changes in the design of experiments that utilize fluorescent molecules, and more specifically in fluorescence optical microscopy. We are both spectators and actors of an unprecedented revolution that is leading us to exciting new discoveries as well as to a look back at decades of use of fluorescence microscopy. Not only are incredible new experiments being designed and performed, but past results are also being critically reread by comparing one- and two-photon experiments. TPE offers real progress in science with its intrinsic three-dimensional resolution, the absence of background fluorescence, and the attractive possibility of exciting UV-excitable fluorescent molecules, thus increasing sample penetration. In fact, in a TPE scheme two 720-nm photons combine to produce the same fluorescence conventionally excited at, say, 360 nm. The excitation of the fluorescent molecules bound to the biological system being studied mainly takes place (80%) in an excitation volume of the order of magnitude of 1 fl or smaller. This implies an intrinsic 3D optical sectioning effect.
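The wavelength arithmetic and the quadratic power dependence mentioned above can be checked with a short numerical sketch. It is only an illustration: the function names and the power and signal values are hypothetical and are not taken from the measurements of Chirico et al. (2001). The first function sums the energies of two identical photons and converts the total back to a wavelength; the second estimates the slope of a log-log plot of signal versus power, which should be close to 2 under a TPE regime.

```python
import math

PLANCK_H = 6.62607015e-34   # Planck constant, J s
LIGHT_C = 2.99792458e8      # speed of light in vacuum, m/s


def photon_energy_joule(wavelength_nm):
    """Energy of one photon at the given vacuum wavelength."""
    return PLANCK_H * LIGHT_C / (wavelength_nm * 1e-9)


def equivalent_one_photon_wavelength_nm(wavelength_nm, n_photons=2):
    """Wavelength of a single photon carrying the summed energy
    of n_photons identical photons."""
    total_energy = n_photons * photon_energy_joule(wavelength_nm)
    return PLANCK_H * LIGHT_C / total_energy * 1e9


def log_log_slope(powers, signals):
    """Least-squares slope of log(signal) versus log(power)."""
    xs = [math.log(p) for p in powers]
    ys = [math.log(s) for s in signals]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical, noise-free data: signal proportional to power squared.
powers_mw = [2.0, 4.0, 7.0, 10.0]
signals = [0.5 * p ** 2 for p in powers_mw]

print(equivalent_one_photon_wavelength_nm(720.0))  # about 360 nm
print(log_log_slope(powers_mw, signals))           # about 2 for TPE
```

With real single-molecule spots the fitted slope would deviate from 2 because of noise, saturation, or bleaching, so the value is better read as a consistency check than as proof of a two-photon process.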
What is invaluable for cell imaging, and in particular for live-cell imaging, is the fact that weak endogenous one-photon absorption and highly localized spatial confinement of the TPE process dramatically reduce phototoxic stress. To the best of our knowledge the situation is advantageous compared with the damage induced by conventional fluorescence excitation. Notwithstanding this, some care has to be used and some experimental parameters need to be carefully controlled, such as average power, acquisition dwell time, zooming factor, and beam pulse width. The following points summarize the unique characteristics and distinct advantages of TPE:
1. Spatially confined fluorescence excitation in the focal plane of the specimen is the hallmark of TPE microscopy. It is one of the advantages over confocal microscopy, where fluorescence emission occurs across
the entire thickness of the sample being excited by the scanning laser beam. A strong implication is that there is no photon signal from sources outside the geometric position of the optical focus within the sample. Therefore, the signal-to-noise ratio increases, photodegradation effects decrease, and optical sectioning is immediately available without the need for a pinhole or deconvolution algorithms. Besides, efficient acquisition schemes can be implemented, such as the nondescanned one realized by placing the detector near the specimen and outside the conventional confocal fluorescence pathway.
2. The use of near-IR/IR wavelengths permits examination of thick specimens in depth. This is due to the fact that, apart from special cases such as pigmented samples and the absorption spectral window of water, cells and tissues absorb poorly in the near-IR/IR region. So, cellular damage is minimized, thus allowing cell viability during image acquisition to be prolonged. Moreover, scattering is reduced and deeper targets can be reached without incurring the drawbacks of one-photon excitation, i.e., more excitation intensity needed at the expense of photodamage and signal-to-noise ratio. The depth of penetration can be up to 0.5 mm. Whereas in one-photon excitation the emission wavelength is comparatively close to the excitation one (about 50–200 nm longer), in TPE the fluorescence emission occurs at a wavelength substantially shorter and at a larger spectral distance than in the one-photon excitation case.
Now, despite the advantages, there are still some practical limitations and open questions that remain to be examined closely. A severe limitation is the high cost of laser sources and of maintenance, primarily because of the limited and unpredictable lifetime of laser pump diodes. As other researchers have pointed out, once the technology becomes less expensive and simpler, every confocal microscope will also be a two- or multiphoton microscope.
Other matters under study involve local heating from absorption of IR light by water at high laser power (Schonle and Hell, 1998) and photothermal effects on fluorescent molecules (Chirico et al., 2002); phototoxicity from long-wavelength IR excitation and short-wavelength fluorescence emission (Tyrrel and Keyse, 1990; Konig, 2000; Hopt and Neher, 2001; Konig and Tirlapur, 2002); and development of new fluorochromes better suited for TPE and multiphoton excitation (Albota et al., 1998a).
In agreement with Gratton et al. (2001), it is our opinion that one of the major benefits in setting up a TPE microscope is the flexibility in choosing the measurement modality favored by the simplification of the optical
design. In fact, a TPE microscope offers a number and variety of measurement options without changing any optics or hardware. This means that during the same experiment one can get real multimodal information from the specimen being studied. The recent work done by Bruce Tromberg’s group is a clear demonstration of this and a brilliant and outstanding application of TPE (Zoumi et al., 2002). We think that this is a unique feature of the TPE microscope. In fact, the usefulness of the TPE scheme is already well documented for spectroscopic and lifetime studies (So et al., 1996; Sytsma et al., 1998; Schwille et al., 2000; Diaspro et al., 2001; Wiseman et al., 2002), for optical data storage and microfabrication (Cumpston et al., 1999; Kawata et al., 2001), and for single-molecule detection (Mertz et al., 1995; Farrer et al., 1999; So et al., 2000; Chirico et al., 2001). Moreover, very interesting applications involve the study of impurities affecting the growth of protein crystals (Caylor et al., 1999), TPE imaging in the field of plant biology (Tirlapur and Konig, 2002), and measurements in living systems (Squirrel et al., 1999; Yoder and Kleinfeld, 2002; Diaspro et al., 2002d). This growing area of microscopy is also related to the applications of TPE as an active biomedical device for nanosurgery (Konig, 2000) and photodynamic therapy (Bhawalkar et al., 1997; So et al., 2000). Recently TPE microscopy, even in an evanescent-field-induced configuration, has been extended to large-area structures of the order of square centimeters (Duveneck et al., 2001). This can open the way for further improving the sensitivity of biosensing platforms such as genomic and proteomic microarrays based upon large planar waveguides. Besides, we expect that important and dramatic future developments will be in areas such as neurobiology, physiology, embryology, and tissue engineering.
It is easy to predict that the range of applicability of TPE and multiphoton laser scanning microscopes is branching intensively into the biomedical, biotechnological, and biophysical sciences as well as toward clinical applications. It is appropriate to end with this citation: “There are more things in Heaven and Earth, Horatio, Than are dreamt of in our philosophy” (“Hamlet,” by William Shakespeare, approx. 1601–1608).
Acknowledgments
The authors are indebted to their co-workers at LAMBS (Laboratory for Advanced Microscopy, Bioimaging, and Spectroscopy), namely (in random order) Andrea Gerbi, Fabio Mazzone, Francesco Difato, Silvia Scaglione, Federico Federici, Fabio Cannone, Sabrina Beretta, Giancarlo Baldini,
Marco Scotto, Cesare Usai, Paola Ramoino, and Alessandro Esposito. Moreover, we are grateful to Salvatore Cannistraro, Alessandra Gliozzi, and Enrico Gratton for believing in the TPE project. A.D. is indebted to Peter Hawkes for infinite patience, and to his wife Teresa for lost sunny weekends and help during hard days; without her this chapter could not have been written. A.D. dedicates this chapter to the memory of Mario Arace, who purchased his first oscilloscope, still in use for TPE (see figures), and Ivan Krekule, more than a father. This research was performed under the auspices of and grants from INFM, the National Institute for the Physics of Matter, Italy.
References
Abbe, E. (1910). edited by O. Lummer and F. Reiche. Braunschweig. Agard, D. A. (1984). Optical sectioning microscopy: Cellular architecture in three dimensions. Annu. Rev. Biophys. 13, 191–219. Agard, D. A., Hiraoka, Y., Shaw, P. J., and Sedat, J. W. (1989). Fluorescence microscopy in three-dimensions. Methods Cell Biol. 30, 353–378. Albota, M. et al. (1998a). Design of organic molecules with large two-photon absorption cross sections. Science 281, 1653–1656. Albota, M. A., Xu, C., and Webb, W. W. (1998b). Two-photon fluorescence excitation cross sections of biomolecular probes from 690 to 960 nm. Appl. Opt. 37, 7352–7356. Amos, B. (2000). Lessons from the history of light microscopy. Nat. Cell Biol. 2, E151–E152. Andrews, D. L. (1985). A simple statistical treatment of multiphoton absorption. Am. J. Phys. 53, 1001–1002. Axe, J. D. (1964). Two-photon processes in complex atoms. Phys. Rev. 136, 42–45. Beltrame, F., Bianco, B., Castellaro, G., and Diaspro, A. (1985). Fluorescence, absorption, phase-contrast, holographic and acoustical cytometries of living cells, in Interactions between Electromagnetic Fields and Cells, edited by A. Chiabrera and H. P. Schwan. NATO ASI Series. Vol. 97. New York: Plenum Press, pp. 483–498. Benedetti, P. (1998). From the histophotometer to the confocal microscope: The evolution of analytical microscopy. Eur. J. Histochem. 42, 11–17. Benham, G. S., and Schwartz, S. (2002). Suitable microscope objectives for multiphoton digital imaging, in Multiphoton Microscopy in the Biomedical Sciences II, edited by A. Periasamy and P. T. C. So. Proc. SPIE. 4620, pp. 36–47. Berland, K. (2001). Basics of fluorescence, in Methods in Cellular Imaging, edited by A. Periasamy. New York: Oxford University Press, pp. 5–19. Berland, K. M., So, P. T. C., and Gratton, E. (1995). Two-photon fluorescence correlation spectroscopy: Method and application to the intracellular environment. Biophys. J. 68, 694–701. Berns, M. W. (1976).
A possible two-photon effect in vitro using a focused laser beam. Biophys. J. 16, 973–977. Bertero, M., and Boccacci, P. (1998). Introduction to Inverse Problems in Imaging. Bristol and Philadelphia: IOP Publishing. Bhawalkar, J. D., Kumar, N. D., Zhao, C. F., and Prasad, P. N. (1997). Two-photon photodynamic therapy. J. Clin. Laser Med. Surg. 15, 201–204.
Bianco, B., and Diaspro, A. (1989). Analysis of the three dimensional cell imaging obtained with optical microscopy techniques based on defocusing. Cell Biophys. 15(3), 189–200. Birge, R. R. (1979). A theoretical analysis of the two-photon properties of linear polyenes and the visual chromophores. J. Chem. Phys. 70, 165–169. Birge, R. R. (1986). Two-photon spectroscopy of protein-bound fluorophores. Accounts Chem. Res. 19, 138–146. Birks, J. B. (1970). Photophysics of Aromatic Molecules. London: Wiley Interscience. Boccacci, P., and Bertero, M. (2002). Image restoration methods: Basics and algorithms, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc, pp. 253–270. Born, M., and Wolf, E. (1980). Principles of Optics. 6th ed., Cambridge, UK: Cambridge University Press. Brakenhoff, G. J., Blom, P., and Barends, P. (1979). Confocal scanning light microscopy with high aperture immersion lenses. J. Microsc. 117, 219–232. Brakenhoff, G. J., van Spronsen, E. A., van der Voort, H. T., and Nanninga, N. (1989). Three-dimensional confocal fluorescence microscopy. Methods Cell. Biol. 30, 379–398. Brakenhoff, G. J., Muller, M., and Ghauharali, R. I. (1996). Analysis of efficiency of two-photon versus single-photon absorption for fluorescence generation in biological objects. J. Microsc. 183, 140–144. Buehler, C., Kim, K. H., Dong, C. Y., Masters, B. R., and So, P. T. C. (1999). Innovations in two-photon deep tissue microscopy. IEEE Eng. Med. Biol. 18, 23–30. Callis, P. R. (1997). Two-photon-induced fluorescence. Annu. Rev. Phys. Chem. 48, 271–297. Campagnola, P., Mei-de, Wei, Lewis, A., and Loew, L. (1999). High-resolution nonlinear optical imaging of live cells by second harmonic generation. Biophys. J. 77, 3341–3351. Cannell, M. B., and Soeller, C. (1997). High resolution imaging using confocal and two-photon molecular excitation microscopy. Proc. R. Microsc. Soc. 32, 3–8.
Cannone, F., Chirico, G., Scotto, M., and Diaspro, A. (2003) In preparation. Cantor, C. R., and Schimmel, P. R. (1980). Biophysical Chemistry. Part II: Techniques for the Study of Biological Structure and Function. New York: Freeman and Co. Carlsson, K., Danielsson, P. E., Lenz, R., Liljeborg, A., Majlof, L., and Aslund, N. (1985). Three-dimensional microscopy using a confocal laser scanning microscope. Opt. Lett. 10, 53–55. Carrington, W. (2002). Imaging live cells in 3-d using wide field microscopy with image restoration, in Confocal and Two-Photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc, pp. 333–346. Carrington, W. A., Lynch, R. M., Moore, E. D. W., Isenberg, G., Fogarty, K. E., and Fay, F. S. (1995). Super resolution in three-dimensional images of fluorescence in cells with minimal light exposure. Science. 268, 1483–1487. Castleman, K. R. (1996). Digital Image Processing. Englewood Cliffs, NJ: Prentice Hall. Castleman, K. (2002). Sampling, resolution and digital image processing in spatial and Fourier domain: Basic principles, in Confocal and Two-Photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 237–252. Caylor, C. L., Dobrianov, I., Kimmer, C., Thorne, R. E., Zipfel, W., and Webb, W. W. (1999). Two-photon fluorescence imaging of impurity distributions in protein crystals. Phys. Rev. E 59, 3831–3834. Centonze, V. E., and White, J. G. (1998). Multiphoton excitation provides optical sections from deeper within scattering specimens than confocal imaging. Biophys. J. 75, 2015–2024. Chalfie, M. and Kain, S. Eds. (1998). Green Fluorescent Protein: Properties, Applications and Protocols. New York: Wiley-Liss, Inc.
Chalfie, M., Tu, Y., Euskirchen, G., Ward, W. W., and Prasher, D. C. (1994). Green fluorescent protein as a marker for gene expression. Science 263, 802–805. Chance, B. (1989). Cell Structure and Function by Microspectrofluorometry. New York: Academic Press. Cheng, P. C. Ed. (1994). Computer Assisted Multidimensional Microscopies. New York: Springer-Verlag. Chirico, G., Cannone, F., Beretta, S., Baldini, G., and Diaspro, A. (2001). Single molecule studies by means of the two-photon fluorescence distribution. Microsc. Res. Tech. 55, 359–364. Chirico, G., Cannone, F., Baldini, G., and Diaspro, A. (2002). Two-photon thermal bleaching of single fluorescent molecules. Biophys. J. (in press). Christie, R. H., Backsai, B. J., Zipfel, W. R., et al. (2001). Growth arrest of individual senile plaques in a model of Alzheimer’s disease observed by in vivo multiphoton microscopy. J. Neurosci. 21(3), 858–864. Cox, I. J. (1984). Scanning optical fluorescence microscopy. J. Microsc. 133, 149–153. Cox, I. J., and Sheppard, C. J. R. (1983). Digital image processing of confocal images. Image Vision Comput. 1, 52–56. Cumpston, B. H. et al. (1999). Two-photon polymerization initiators for three-dimensional optical storage and microfabrication. Nature 398, 51–54. Daria, V., Blanca, C. M., Nakamura, O., Kawata, S., and Saloma, C. (1998). Image contrast enhancement for two-photon fluorescence microscopy in a turbid medium. Appl. Opt. 37, 7960–7967. de Grauw, K., and Gerritsen, H. (2002). Aberrations and penetration depth in confocal and two-photon microscopy, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc, pp. 153–170. Denk, W. (1996). Two-photon excitation in functional biological imaging. J. Biomed. Opt. 1, 296–304. Denk, W., and Svoboda, K. (1997). Photon upmanship: Why multiphoton imaging is more than a gimmick. Neuron 18, 351–357. Denk, W., Strickler, J. H., and Webb, W. W. (1990).
Two-photon laser scanning fluorescence microscopy. Science 248, 73–76. Denk, W., Delaney, K. R., Gelperin, A., Kleinfeld, D., Strowbridge, B. W., Tank, D. W., and Yuste, R. (1994). Anatomical and functional imaging of neurons using 2-photon laser scanning microscopy. J. Neurosci. Methods 54, 151–162. Denk, W., Piston, D., and Webb, W. W. (1995). Two-photon molecular excitation in laser scanning microscopy, in Handbook of Confocal Microscopy, edited by J. B. Pawley. New York: Plenum Press, pp. 445–457. Diaspro, A. (1998). Two-photon fluorescence excitation. A new potential perspective in flow cytometry. Minerva Biotechnol 11(2), 87–92. Diaspro, A. (1999a). (guest editor) Two-photon microscopy. Microsc. Res. Tech. 47, 163–212. Diaspro, A. (1999b). (guest editor) Two-photon excitation microscopy. IEEE Eng. Med. Biol. 18(5), 16–99. Diaspro, A (1999). Two-photon excitation of fluorescence in three-dimensional microscopy. Eur. J. Histochem. 43, 169–178. Diaspro, A. (2001). Building a two-photon microscope using a laser scanning confocal architecture, in Methods in Cellular Imaging, edited by A. Periasamy. New York: Oxford University Press, pp. 162–179. Diaspro, A. Ed. (2002). Confocal and Two-Photon Microscopy: Foundations, Applications, and Advances. New York: Wiley-Liss, Inc.
Diaspro, A., and Robello, M. (2000). Two-photon excitation of fluorescence for three-dimensional optical imaging of biological structures. J. Photochem. Photobiol. B 55, 1–8. Diaspro, A., and Sheppard, C. J. R. (2002). Two-photon excitation microscopy: Basic principles and architectures, in Confocal and Two-Photon Microscopy: Foundations, Applications, and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 34–74. Diaspro, A., Sartore, M., and Nicolini, C. (1990). Three-dimensional representation of biostructures imaged with an optical microscope: I. Digital optical sectioning. Image Vision Comput. 8, 130–141. Diaspro, A., Beltrame, F., Fato, M., Palmeri, A., and Ramoino, P. (1997). Studies on the structure of sperm heads of Eledone cirrhosa by means of CLSM linked to bioimage-oriented devices. Microsc. Res. Tech. 36, 159–164. Diaspro, A., Annunziata, S., Raimondo, M., and Robello, M. (1999a). Three-dimensional optical behaviour of a confocal microscope with single illumination and detection pinhole through imaging of subresolution beads. Microsc. Res. Tech. 45(2), 130–131. Diaspro, A., Corosu, M., Ramoino, P., and Robello, M. (1999b). Adapting a compact confocal microscope system to a two-photon excitation fluorescence imaging architecture. Microsc. Res. Tech. 47, 196–205. Diaspro, A., Annunziata, S., and Robello, M. (2000). Single-pinhole confocal imaging of subresolution sparse objects using experimental point spread function and image restoration. Microsc. Res. Tech. 51, 464–468. Diaspro, A., Chirico, G., Cannone, F., et al. (2001). Two-photon microscopy and spectroscopy based on a compact confocal scanning head. J. Biomed. Opt. 6, 300–310. Diaspro, A., Federici, F., and Robello, M. (2002a). Influence of refractive-index mismatch in high-resolution three-dimensional confocal microscopy. Appl. Opt. 41, 685–690. Diaspro, A., Silvano, D., Krol, S., Cavalleri, O., and Gliozzi, A. (2002b).
Single living cell encapsulation in nano-organized polyelectrolyte shells. Langmuir. 18, 5047–5050. Diaspro, A., Boccacci, P., Bonetto, P, Scarito, M., Davolio, M., and Epifani, M. (2002c). ‘‘Power-up your Microscope,’’ www.powermicroscope.com. Diaspro, A., Fronte, P., Raimondo, M., Fato, M., De Leo, G., Beltrame, F., Cannone, F., Chirico, G., and Ramoino, P. (2002d). Functional imaging of living paramecium by means of confocal and two-photon excitation fluorescence microscopy, in Functional Imaging, edited by D. Farkas. Proc. SPIE. 4622, pp. 47–53. Dong, C. Y., Yu, B., Hsu, L. L., and So, P. T. C. (2002). Characterization of two-photon point spread function in skin imaging applications, in Multiphoton Microscopy in the Biomedical Sciences II, edited by A. Periasamy and P. T. C. So. Proc. SPIE. 4620, 1–8. Duveneck, G. L., Bopp, M. A., Ehrat, M., Haiml, M., Keller, U., Bader, M. A., Marowsky, G., and Soria, S. (2001). Evanescent-field-induced two-photon fluorescence: Excitation of macroscopic areas of planar waveguides. Appl. Phys. B. 73, 869–871. Faisal, F. H. M. (1987). Theory of Multiphoton Processes. New York: Plenum Press. Farrer, R. A., Previte, M. J. R., Olson, C. E., Peyser, L. A., Fourkas, J. T., and So, P. T. C. (1999). Single molecule detection with a two-photon fluorescence microscope with fast scanning capabilities and polarization sensitivity. Opt. Lett. 24, 1832–1834. Fay, F. S., Carrington, W., and Fogarty, K. E. (1989). Three-dimensional molecular distribution in single cells analyzed using the digital imaging microscope. J. Microsc. 153, 133–149. Feynman, R. P. (1985). QED: The Strange Theory of Light and Matter. Princeton, NJ: Princeton University Press. Fisher, W. G., Watcher, E. A., Armas, M., and Seaton, C. (1997). Titanium: sapphire laser as an excitation source in two-photon spectroscopy. Appl. Spectrosc. 51, 218–226. Ford, B. J. (1991). The Leeuwenhoek Legacy. Bristol and London: Biopress and Parrand.
Franken, P. A., Hill, A. E., Peters, C. W., and Weinreich, G. (1961). Generation of optical harmonics. Phys. Rev. Lett. 7, 118–119. French, T., So, P. T. C., Weaver, D. J., Coelho-Sampaio, T., and Gratton, E. (1997). Two-photon fluorescence lifetime imaging microscopy of macrophage-mediated antigen processing. J. Microsc. 185, 339–353. Friedrich, D. M. (1982). Two-photon molecular spectroscopy. J. Chem. Educ. 59, 472–483. Friedrich, D. M., and McClain, W. M. (1980). Two-photon molecular electronic spectroscopy. Annu. Rev. Phys. Chem. 31, 559–577. Fujita, K., and Takamatsu, T. (2001). Real-time in situ calcium imaging with single and two-photon confocal microscopy, in Confocal and Two-Photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc, pp. 483–498. Gannaway, J. N., and Sheppard, C. J. R. (1978). Second harmonic imaging in the scanning optical microscope. Opt. Quant. Electron. 10, 435–439. Gauderon, R., Lukins, R. B., and Sheppard, C. J. R. (1999). Effects of a confocal pinhole in two-photon microscopy. Microsc. Res. Tech. 47, 210–214. Girkin, J., and Wokosin, D. (2002). Practical multiphoton microscopy, in Confocal and Two-Photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 207–236. Göppert-Mayer, M. (1931). Über Elementarakte mit zwei Quantensprüngen. Ann. Phys. 9, 273–295. Gosnell, T. R., and Taylor, A. J., Eds. (1991). Selected Papers on Ultrafast Laser Technology, SPIE Milestone series. Bellingham, WA: SPIE Press. Gratton, E., and van de Ven, M. J. (1995). Laser sources for confocal microscopy, in Handbook of Confocal Microscopy, edited by J. B. Pawley. New York: Plenum Press, pp. 69–97. Gratton, E., Barry, N. P., Beretta, S., and Celli, A. (2001). Multiphoton fluorescence microscopy. Methods. 25, 103–110. Gu, M., and Sheppard, C. J. R. (1995).
Comparison of three-dimensional imaging properties between two-photon and single-photon fluorescence microscopy. J. Microsc. 177, 128–137. Gu, M., Gan, X., Kisteman, A., and Xu, M. G. (2000). Comparison of penetration depth between two-photon excitation and single-photon excitation in imaging through turbid tissue media. Appl. Phys. Lett. 77(10), 1551–1553. Guild, J. B., Xu, C., and Webb, W. W. (1997). Measurement of group delay dispersion of high numerical aperture objective lenses using two-photon excited fluorescence. Appl. Opt. 36, 397–401. Hamamatsu Photonics, K. K. (1999). Photomultiplier Tubes: Basics and Applications, 2nd ed. Japan: Hamamatsu Photonics K. K. Hanninen, P. E., and Hell, S. W. (1994). Femtosecond pulse broadening in the focal region of a two-photon fluorescence microscope. Bioimaging. 2, 117–121. Harper, I. S. (2001). Fluorophores and their labeling procedures for monitoring various biological signals, in Methods in Cellular Imaging, edited by A. Periasamy. New York: Oxford University Press, pp. 20–39. Haughland, P. R. Ed. (2002). Handbook of Fluorescent Probes and Research Chemicals. Eugene, OR: Molecular Probes. Hell, S. W. (guest editor) (1996). Nonlinear optical microscopy. Bioimaging. 4, 121–172. Hell, S. W., Bahlmann, K., Schrader, M., Soini, A., Malak, H., Gryczynski, I., and Lakowicz, J. R. (1996). Three-photon excitation in fluorescence microscopy. J. Biomed. Opt. 1, 71–74. Hellwarth, R., and Christensen, P. (1974). Nonlinear optical microscopic examination of structures in polycrystalline ZnSe. Opt. Commun. 12, 318–322. Herman, B., and Tanke, H. J. (1998). Fluorescence Microscopy. New York: Springer-Verlag. Hooke, R. (1961). Micrographia (facsimile). New York: Dover.
Hopt, A., and Neher, E. (2001). Highly nonlinear photodamage in two-photon fluorescence microscopy. Biophys. J. 80, 2029–2036. Iyer, V., Hoogland, T. M., Losavio, B. E., McQuiston, A. R., and Saggau, P. (2002). Compact two-photon laser scanning microscope made from minimally modified commercial components, in Multiphoton Microscopy in the Biomedical Sciences II, edited by A. Periasamy and P. T. C. So. Proc. SPIE. pp. 274–280. Jonkman, J., and Stelzer, E. (2002). Resolution and contrast in confocal and two-photon microscopy, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 101–126. Kaiser, W., and Garrett, C. G. B. (1961). Two-photon excitation in CaF2:Eu2+. Phys. Rev. Lett. 7, 229–231. Kawata, S., Sun, H.-B., Tanaka, T., and Takada, K. (2001). Finer features for functional microdevices. Nature. 412, 697–698. Kleinfeld, D., and Denk, W. (2000). Two-photon imaging of neocortical microcirculation, in Imaging Neurons, edited by R. Yuste, F. Lanni and A. Konnerth. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, pp. 23.1–23.15. Koester, H. J., Baur, D., Uhl, R., and Hell, S. W. (1999). Ca2+ fluorescence imaging with pico- and femtosecond two-photon excitation: Signal and photodamage. Biophys. J. 77, 2226–2236. Konig, K. (2000). Multiphoton microscopy in life sciences. J. Microsc. 200, 83–104. Konig, K., and Tirlapur, U. K. (2002). Cellular and subcellular perturbations during multiphoton microscopy, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 191–206. Konig, K., Liang, H., Berns, M. W., and Tromberg, B. J. (1995). Cell damage by near-IR microbeams. Nature. 377, 20–21. Konig, K., Krasieva, T., Bauer, E., Fiedler, U., Berns, M. W., Tromberg, B. J., and Greulich, K. O. (1996a).
Cell damage by UVA radiation of a mercury microscopy lamp probed by autofluorescence modifications, cloning assay and comet assay. J. Biomed. Opt. 1, 217–222. Konig, K., Simon, U., and Halbhuber, K. J. (1996b). 3D resolved two-photon fluorescence microscopy of living cells using a modified confocal laser scanning microscope. Cell. Mol. Biol. 42, 1181–1194. Konig, K., So, P. T. C., Mantulin, W. W., Tromberg, B. J., and Gratton, E. (1996c). Two-photon excited lifetime imaging of autofluorescence in cells during UVA and NIR photostress. J. Microsc. 183, 197–204. Konig, K., Liang, H., Berns, M. W., and Tromberg, B. J. (1996). Cell damage in near infrared multimode optical traps as a result of multiphoton absorption. Opt. Lett. 21, 1090–1092. Konig, K., So, P. T. C., Mantulin, W. W., and Gratton, E. (1997). Cellular response to near-red femtosecond laser pulses in two-photon microscopes. Opt. Lett. 22, 135–136. Konig, K., Boehme, S., Leclerc, N., and Ahuja, R. (1998). Time-gated autofluorescence microscopy of motile green microalga in an optical trap. Cell. Mol. Biol. 44, 763–770. Konig, K., Becker, T. W., Fischer, P., Riemann, I., and Halbhuber, K. J. (1999a). Pulse-length dependence of cellular response to intense near-infrared laser pulses in multiphoton microscopes. Opt. Lett. 24, 113–115. König, K., Riemann, I., Fischer, P., and Halbhuber, K. J. (1999b). Intracellular nanosurgery with near infrared femtosecond laser pulses. Cell. Mol. Biol. 45, 195–201. Konig, K., Gohlert, A., Liehr, T., Loncarevic, I. F., and Riemann, I. (2000). Two-photon multicolor FISH: A versatile technique to detect specific sequences within single DNA molecules in cells and tissues. Single Mol. 1, 41–51. Kriete, A. Visualization in Biomedical Microscopies. Weinheim: VCH. Lakowicz, J. R. (1999). Principles of Fluorescence Spectroscopy. New York: Plenum Press.
Lakowicz, J. R., and Gryczynski, I. (1992). Tryptophan fluorescence intensity and anisotropy decays of human serum albumin resulting from one-photon and two-photon excitation. Biophys. Chem. 45, 1–6. Lemons, R. A., and Quate, C. F. (1975). Acoustic microscopy: Biomedical applications. Science. 188, 905–911. Liu, Y., Cheng, D., Sonek, G. J., Berns, M. W., Chapman, C. F., and Tromberg, B. J. (1995). Evidence of focalized cell heating induced by infrared optical tweezers. Biophys. J. 68, 2137–2144. Loudon, R. (1983). The Quantum Theory of Light. London: Oxford University Press. Louisell, W. H. (1973). Quantum Statistical Properties of Radiation. New York: Wiley. Magde, D., Elson, E., and Webb, W. W. (1972). Thermodynamic fluctuations in a reacting system: Measurement by fluorescence correlation spectroscopy. Phys. Rev. Lett. 29, 705–708. Mainen, Z. F., Malectic-Savic, M., Shi, S. H., Hayashi, Y., Malinow, R., and Svoboda, K. (1999). Two-photon imaging in living brain slices. Methods. 18, 231–239. Maiti, S., Shear, J. B., Williams, R. M., Zipfel, W. R., and Webb, W. W. (1997). Measuring serotonin distribution in live cells with three-photon excitation. Science. 275, 530–532. Majewska, A., Yiu, G., and Yuste, R. (2000). A custom-made two-photon microscope and deconvolution system. Pflugers Arch. 441(2/3), 398–408. Manders, E. M. M., Stap, J., Brakenhoff, G. J., van Diel, R., and Aten, J. A. (1992). Dynamics of three-dimensional replication patterns during the s-phase analyzed by double labelling of DNA and confocal microscopy. J. Cell. Sci. 103, 857–862. Masters, B. R. (1996). Selected Papers on Confocal Microscopy. SPIE Milestone Series. Bellingham, WA: SPIE Press. Masters, B. R. (2002). Selected Papers on Multiphoton Excitation Microscopy. SPIE Milestone Series. Bellingham, WA: SPIE Press. Masters, B. R., and So, P. T. C. (1999). Multiphoton excitation microscopy and confocal microscopy imaging of in vivo human skin: A comparison. Microsc. Microanal. 5, 282–289.
Masters, B. R., So, P. T. C., and Gratton, E. (1997). Multiphoton excitation fluorescence microscopy and spectroscopy of in vivo human skin. Biophys. J. 72, 2405–2412. Mertz, J., Xu, C., and Webb, W. W. (1995). Single molecule detection by two-photon excited fluorescence. Opt. Lett. 20, 2532–2534. Minsky, M. (1961). Memoir of inventing the confocal scanning microscope. Scanning. 10, 128–138. Moreaux, L., Sandre, O., and Mertz, J. (2000). J. Opt. Soc. Am. B 17, 1685–1694. Moscatelli, F. A. (1986). A simple conceptual model for two-photon absorption. Am. J. Phys. 54, 52–54. Mueller, M., Squier, J., Wilson, K. R., and Brakenhoff, G. J. (1998). 3D microscopy of transparent objects using third-harmonic generation. J. Microsc. 191, 266–274. Murphy, D. B. (2001). Fundamentals of Light Microscopy and Electronic Imaging New York: Wiley-Liss, Inc., pp. 1–367. Nakamura, O. (1993). Three-dimensional imaging characteristics of laser scan fluorescence microscopy: Two-photon excitation vs. single-photon excitation. Optik 93, 39–42. Nakamura, O. (1999). Fundamentals of two-photon microscopy. Microsc. Res. Tech. 47, 165–171. Ott, D. (2002). Two-photon microscopy reveals tumor development. Biophotonics Int. January/ February, 46–48. Patterson, G. H., and Piston, D. W. (2000). Photobleaching in two-photon excitation microscopy. Biophys. J. 78, 2159–2162. Pawley, J. B. Ed. (1995). Handbook of Biological Confocal MicroscopyNew York: Plenum Press. Periasamy, A. Methods in Cellular Imaging. New York: Oxford University Press.
TWO-PHOTON EXCITATION MICROSCOPY
283
Periasamy, A., Skoglund, P., Noakes, C., and Keller, R. (1999). An evaluation of two-photon excitation versus confocal and digital deconvolution fluorescence microscopy imaging in Xenopus morphogenesis. Microsc. Res. Tech. 47, 172–181. Periasamy, A., Noakes, C., Skoglund, P., Keller, R., and Sutherland, A. E. (2002). Two-photon excitation fluorescence microscopy imaging in Xenopus and transgenic mouse embryos, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 271–284. Pike, R. (2002). Superresolution in fluorescence confocal microscopy and in DVD optical storage, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 499–524. Piston, D. W. (1999). Imaging living cells and tissues by two-photon excitation microscopy. Trends Cell Biol. 9, 66–69. Piston, D. W., Masters, B. R., and Webb, W. W. (1995). Three-dimensionally resolved NAD(P)H cellular metabolic redox imaging of the in situ cornea with two-photon excitation laser scanning microscopy. J. Microsc. 178, 20–27. Potter, S. M. (1996). Vital imaging: Two-photons are better than one. Curr. Biol. 6, 1596–1598. Potter, S. M., Wang, C. M., Garrity, P. A., and Fraser, S. E. (1996). Intravital imaging of green fluorescent protein using two-photon laser-scanning microscopy. Gene. 173, 25–31. Rentzepis, P. M., Mitschele, C. J., and Saxman, A. C. (1970). Measurement of ultrashort laser pulses by three-photon fluorescence. Appl. Phys. Lett. 17, 122–124. Robinson, J. P. (2001). Current Protocols in Cytometry. New York: John Wiley & Sons. Rochow, G. T., and Tucker, P. A. (1994). Introduction to Microscopy by Means of Light, Electrons, X-Rays, or Acoustics. New York: Plenum Press. Saloma, C., Saloma-Palmes, C., and Kondoh, H. (1998). Site-specific confocal fluorescence imaging of biological microstructures in a turbid medium. Phys. Med. Biol. 43, 1741. Sanchez, E. 
J., Novotny, L., Holtom, G. R., and Xie, X. S. (1997). Room-temperature fluorescence imaging and spectroscopy of single molecules by two-photon excitation. J. Phys. Chem. 101, 7019–7023. Schonle, A., and Hell, S. W. (1998). Heating by absorption in the focus of an objective lens. Opt. Lett. 23, 325–327. Schrader, M., Hell, S. W., and van der Voort, H. T. M. (1996). Potential of confocal microscope to resolve in the 50–100 nm range. Appl. Phys. Lett. 69, 3644–3646. Schwille, P. (2001). Fluorescence correlation spectroscopy and its potential for intracellular applications. Cell Biochem. Biophys. 34, 383–405. Schwille, P., Haupts, U., Maiti, S., and Webb, W. W. (1999). Molecular dynamics in living cells observed by fluorescence correlation spectroscopy with one- and two-photon excitation. Biophys. J. 77, 2251–2265. Schwille, P., Kummer, S., Heikal, A. A., Moerner, W. E., and Webb, W. W. (2000). Fluorescence correlation spectroscopy reveals fast optical excitation-driven intramolecular dynamics of yellow fluorescent proteins. Proc. Natl. Acad. Sci. USA. 97, 151–156. Sheppard, C. J. R. (1977). The use of lenses with annular aperture scanning optical microscopy. Optik 48, 329–334. Sheppard, C. J. R. (1989). Axial resolution of confocal fluorescence miroscopy. J. Microsc. 154, 237–241. Sheppard, C. J. R. (2002). The generalized microscope, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 1–18. Sheppard, C. J. R., and Choudhury, A. (1977). Image formation in the scanning microscope. Opt. Acta. 24, 1051–1073.
284
DIASPRO AND CHIRICO
Sheppard, C. J. R., and Gu, M. (1990). Image formation in two-photon fluorescence microscopy. Optik. 86, 104–106. Sheppard, C. J. R., and Kompfner, R. (1978). Resonant scanning optical microscope. Appl. Opt. 17, 2879–2882. Sheppard, C. J. R., and Shotton, D. M. (1997). Confocal Laser Scanning Microscopy. Oxford, UK: BIOS. Sheppard, C. J. R., and Wilson, T. (1980). Image formation in confocal scanning microscopes. Optik. 55, 331–342. Sheppard, C. J. R., Kompfner, R., Gannaway, J., and Walsh, D. (1977). The scanning harmonic optical microscope. IEEE/OSA Conf. Laser Eng. Appl. Washington, DC. Shih, Y. H., Strekalov, D. V., Pittman, T. D., and Rubin, M. H. (1998). Why two-photon but not two photons? Fortschr. Phys. 46, 627–641. Shotton, D. M. Ed. (1993). Electronic Light Microscopy. Techniques in Modern Biomedical Microscopy. New York: Wiley-Liss, Inc. Shotton, D. M. (1995). Electronic light microscopy—present capabilities and future prospects. Histochem. Cell Biol. 104, 97–137. Singh, S., and Bradley, L. T. (1964). Three-photon absorption in naphthalene crystals by laser excitation. Phys. Rev. Lett. 12, 162–164. Smith, N. I., Fujita, K., Kaneko, T., Katoh, K., Nakamura, O., Kawata, S., and Takamastu, T. (2001). Generation of calcium waves in living cells by pulsed-laser-induced photodisruption. Appl. Phys. Lett. 79, 1208–1210. So, P. T. C., Berland, K. M., French, T., Dong, C. Y., and Gratton, E. (1996). Two photon fluorescence microscopy: Time resolved and intensity imaging, in Fluorescence Imaging Spectroscopy and Microscopy, edited by X. F. Wang and B. Herman. Chemical Analysis Series. New York: John Wiley & Sons, pp. 351–373. So, P. T. C., Kim, H., and Kochevar, I. E. (1998). Two-photon deep tissue ex vivo imaging of mouse dermal and subcutaneous structures. Opt. Express. 3, 339–350. So, P. T. C., Dong, C. Y., Masters, B. R., and Berland, K. M. (2000). Two-photon excitation fluorescence microscopy. Annu. Rev. Biomed. Eng. 2, 399–429. So, P. T. C., Kim, K. 
H., Buehler, C., Masters, B. R., Hsu, L., and Dong, C. Y. (2001). Basic principles of multi-photon excitation microscopy, in Methods in Cellular Imaging, edited by A. Periasamy. New York: Oxford University Press, pp. 152–161. Soeller, C., and Cannell, M. B. (1996). Construction of a two-photon microscope and optimisation of illumination pulse duration. Pfluegers Arch. 432, 555–561. Soeller, C., and Cannell, M. B. (1999). Two-photon microscopy: Imaging in scattering samples and three-dimensionally resolved flash photolysis. Microsc. Res. Tech. 47, 182–195. Sonnleitner, M., Schutz, G. J., and Schmidt, T. (1999). Imaging individual molecules by twophoton excitation. Chem. Phys. Lett. 300, 221–226. Sonnleitner, M., Schutz, G., Kada, G., and Schindler, H. (2000). Imaging single lipid molecules in living cells using two-photon excitation. Single Mol. 1, 182–183. Spence, D. E., Kean, P. N., and Sibbett, W. (1991). 60-fsec pulse generation from a self-modelocked Ti:sapphire laser. Opt. Lett. 16, 42–45. Squier, J. A., Muller, M., Brakenhoff, G. J., and Wilson, K. R. (1998). Third harmonic generation microscopy. Opt. Express. 3, 315–324. Squirrel, J. M., Wokosin, D. L., White, J. G., and Barister, B. D. (1999). Long-term two-photon fluorescence imaging of mammalian embryos without compromising viability. Nat. Biotechnol. 17, 763–767. Stanley, M. (2001). Improvements in Optical Filter Design, edited by A. Periasamy and P. T. C. So. Proc. SPIE. 4262, 52–61.
TWO-PHOTON EXCITATION MICROSCOPY
285
Stelzer, E. H. K., Hell, S., Lindek, S., Pick, R., Storz, C., Stricker, R., Ritter, G., and Salmon, N. (1994). Non-linear absorption extends confocal fluorescence microscopy into the ultraviolet regime and confines the illumination volume. Opt. Commun. 104, 223–228. Straub, M., and Hell, S. W. (1998). Fluorescence lifetime three-dimensional microscopy with picosecond precision using a multifocal multiphoton microscope. Appl. Phys. Lett. 73, 1769–1771. Straub, M., Lodemann, P., Holroyd, P., Jahn, R., and Hell, S. W. (2000). Live cell imaging by multifocal multiphoton microscopy. Eur. J. Cell Biol. 79, 726–734. Svelto, O. (1998). Principles of Lasers. 4th ed. New York: Plenum Press. Sytsma, J., Vroom, J. M., De Grauw, C. J., and Gerritsen, H. C. (1998). Time-gated fluorescence lifetime imaging and microvolume spectroscopy using two-photon excitation. J. Microsc. 191, 39–51. Tan, Y. P., Llano, I., Hopt, A., Wurriehausen, F., and Neher, E. (1999). Fast scanning and efficient photodetection in a simple two-photon microscope. J. Neurosci. Methods. 92, 123–135. Tanaka, T., Sun, H. B., and Kawata, S. (2002). Rapid sub-diffraction-limit laser micro’nanoprocessing in a threshold material system. Appl. Phys. Lett. 80, 312–314. Tirlapur, U. K., and Konig, K. (2002). Two-photon near infrared femtosecond laser scanning microscopy in plant biology, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 449–468. Torok, P., and Sheppard, C. J. R. (2002). The role of pinhole size in high aperture two and three-photon microscopy, in Confocal and Two-photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 127–152. Tsien, R. Y. (1998). The green fluorescent protein. Annu. Rev. Biochem. 67, 509–544. Tyrrell, R. M., and Keyse, S. M. (1990). The interaction of UVA radiation with cultured cells. J. Photochem. Photobiol. B. 4, 349–361. Wang, X. 
F., and Herman, B. (1996). Fluorescence Imaging Spectroscopy and Microscopy. New York: Wiley-Liss, Inc. Webb, R. H. (1996). Confocal optical microscopy. Rep. Prog. Phys. 59, 427–471. Weinstein, M., and Castleman, K. R. (1971). Reconstructing 3-D specimens from 2-D section images. Proc. SPIE. 26, 131–138. White, J. G., Amos, W. B., and Fordham, M. (1987). An evaluation of confocal versus conventional imaging of biological structures by fluorescence light microscopy. J. Cell Biol. 105, 41–48. White, N. S., and Errington, R. J. (2000). Improved laser scanning fluorescence microscopy by multiphoton excitation. Adv. Imag. Elect. Phys. 113, 249–277. Wier, W. G., Balke, C. W., Michael, J. A., and Mauban, J. R. (2000). A custom confocal and two-photon digital laser scanning microscope. Am. J. Physiol. 278, H2150–H2156. Wilson, T. (1989). Optical sectioning in confocal fluorescent microscope. J. Microsc. 154, 143–156. Wilson, T. Confocal Microscopy London: Academic Press. Wilson, T. (2002). Confocal microscopy: Basic principles and architectures, in Confocal and Two-Photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 19–38. Wilson, T., and Sheppard, C. J. R. (1984). Theory and Practice of Scanning Optical Microscopy. London: Academic Press. Wise, F. (1999). Lasers for two-photon microscopy, in Imaging: A Laboratory Manual, edited by R. Yuste, F. Lanni and A. Konnerth. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, pp. 18.1–18.9.
286
DIASPRO AND CHIRICO
Wiseman, P. W., Squier, J. A., Ellisman, M. H., and Wilson, K. R. (2000). Two photon image correlation spectroscopy and image cross-correlation spectroscopy. J. Microsc. 200, 14–25. Wiseman, P. W., Capani, F., Squier, J. A., and Martone, M. E (2002). Counting dendritic spines in brain tissue slices by image correlation spectroscopy analysis. J. Microsc. 205, 177–186. Wokosin, D. L., and White, J. G. (1997). Optimization of the design of a multiple-photon excitation laser scanning fluorescence imaging system, in Three-Dimensional Microscopy: Image, Acquisition and Processing IV. Proc. SPIE. 2984, 25–29. Wokosin, D. L., Centonze, V. E., White, J., Armstrong, D., Robertson, G., and Ferguson, A. I. (1996). All-solid-state ultrafast lasers facilitate multiphoton excitation fluorescence imaging. IEEE J. Sel. Top. Quant. Elect. 2, 1051–1065. Wokosin, D. L., Amos, W. B., and White, J. G. (1998). Detection sensitivity enhancements for fluorescence imaging with multiphoton excitation microscopy. Proc. IEEE Eng. Med. Biol. Soc. 20, 1707–1714. Wolleschensky, R., Feurer, T., Sauerbrey, R., and Simon, U. (1998). Characterization and optimization of a laser scanning microscope in the femtosecond regime. Appl. Phys. B 67, 87–94. Wolleschensky, R., Dickinson, M., and Fraser, S. E. (2002). Group velocity dispersion and fiber delivery in multiphoton laser scanning microscopy, in Confocal and Two-Photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 171–190. Xie, X. S., and Lu, H. P. (1999). Single molecule enzymology. J. Biol. Chem. 274, 15967–15970. Xu, C. (2002). Cross-sections of fluorescence molecules used in multiphoton microscopy, in Confocal and Two-Photon Microscopy: Foundations, Applications and Advances, edited by A. Diaspro. New York: Wiley-Liss, Inc., pp. 75–100. Xu, C., Guild, J., Webb, W. W., and Denk, W. (1995). 
Determination of absolute two-photon excitation cross sections by in situ second-order autocorrelation. Opt. Lett. 20, 2372–2374. Yoder, E. J., and Kleinfeld, D. (2002). Cortical imaging through the intact mouse skull using two-photon excitation laser scanning microscopy. Microsc. Res. Tech. 56(4), 304–305. Yuste, R., Lanni, F., and Konnerth, A. (2000). Imaging Neurons: A Laboratory Manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press. Zoumi, A., Yeh, A., and Tromberg, B. J. (2002). Imaging cells and extracellular matrix in vivo by using second-harmonic generation and two-photon excited fluorescence. Proc. Natl. Acad. Sci. USA 99(17), 11014–11019. in press.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 126
Phase Closure Imaging

ANDRÉ LANNES

Sciences de l'Univers au Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (SUC-CERFACS), F-31057 Toulouse cedex, France
I. Introduction
   A. Interferometric Graphs
   B. Phase Closure
   C. Phase Calibration
   D. Image Reconstruction
   E. Contents
II. Phase Spaces and Integer Lattices
   A. Pupil Phase Space
   B. Baseline Phase Space
   C. Unknown-Spectral Phase Space
   D. Bias Phase Space
   E. Loop-Entry Phase Space
III. Phase Closure Operator, Phase Closure Projection, and Related Properties
IV. Variance–Covariance Matrix of the Closure Phases
V. Spectral Phase Closure Projection
   A. Smith Normal Form of the Spectral Phase Closure Matrix
   B. Examples
      1. Weakly Redundant Case
      2. Strongly Redundant Case
VI. Reference Algebraic Framework
VII. Statement of the Phase Calibration Problem
VIII. Phase Calibration Discrepancy and Related Results
IX. Optimal Model Phase Shift and Related Results
   A. Optimal Bias Phase
   B. Optimal Pupil Phase
X. Special Cases
   A. Special Case Where $m_1 = p$
   B. Special Case Where $m_1 = m$ with $m < p$
   C. Special Case Where $m_1 = m$ with $m = p$
XI. Simulated Example
XII. Concluding Comments
Appendix 1. Useful Property
Appendix 2. Smith Normal Form of Integral Matrices
Appendix 3. Reference Projections
Appendix 4. Closest Point Search in Lattices
References
Copyright 2003 Elsevier Science (USA). All rights reserved. ISSN 1076-5670/03
I. Introduction

Phase calibration is the key operation of phase closure imaging. In the general case of redundant arrays, the corresponding analysis is based on the Smith normal form of the spectral phase closure matrix. This mathematical representation, well known in integral matrix theory, has not been exploited so far in phase closure imaging. New results are thus exhibited. In this theoretical framework, the optimal model phase shift is obtained by successively solving two integer ambiguity problems. This study is illustrated with the aid of a simulation built on a particular redundant interferometric graph. The potential instabilities of a phase calibration operation can thus be well understood.

In this article, $\mathcal{A}$ is an interferometric array observing an incoherent source of small angular size (see Fig. 1); $\mathcal{A}$ includes $n$ pupil elements: $n$ telescopes in optical interferometry (Reasonberg, 1998) or $n$ antennas in radio imaging (Hunt and Payne, 1997). Relative to the tracking center, the source is characterized by some two-dimensional angular brightness distribution $s_o(\xi)$. Let $r(j)$ denote the position vector of the $j$th pupil element projected onto a plane normal to the tracking axis (see Fig. 1). According to the Van Cittert–Zernike theorem (Born and Wolf, 1970), the data set, consisting of the experimental "complex visibilities" $V_e(j,k)$, corresponds to a certain sampling of the Fourier transform of $s_o$,

\[ \hat{s}_o(u) := \int_{\mathbb{R}^2} s_o(\xi)\, \exp(-2i\pi\, u \cdot \xi)\, d\xi \tag{1} \]
Each baseline $(j,k)$ defines a Fourier sampling point; the corresponding angular spatial frequency is defined by

\[ u(j,k) = \frac{r(j) - r(k)}{\lambda} \tag{2} \]
where $\lambda$ is the wavelength of the electromagnetic field under consideration. In the absence of errors, one thus has $V_e(j,k) = \hat{s}_o[u(j,k)]$. Within reasonable, well-defined limits, inversion of this basic relationship yields an approximation to $s_o$. The corresponding operation is associated with the notion of aperture synthesis.

In most cases encountered in practice, the relationship between $s_o$ and $V_e$ is not a simple Fourier sampling operation. In particular, residual optical path differences often blur the basic observational principle. More precisely, we then have

\[ V_e(j,k) = \hat{s}_o[u(j,k)]\, \exp[i\beta_e(j,k)] + \text{error terms} \tag{3} \]
in which the $\beta_e(j,k)$ are bias phases of the form

\[ \beta_e(j,k) = \alpha_e(j) - \alpha_e(k) \tag{4} \]

The $\alpha_e(j)$ are unknown pupil phases. All the complex-valued functions involved in the observational Eq. (3) are Hermitian. For example, $V_e(k,j) = \overline{V_e(j,k)}$. In this article, we consider the situations in which the bias phases $\beta_e(j,k)$ cannot be calibrated in an experimental manner. The phase of $\hat{s}_o[u(j,k)]$, an antisymmetric function denoted by $\varphi_o(j,k)$, is therefore not directly accessible.

Figure 1. Interferometric observational principle. Each pair $(j,k)$ of pupil elements defines a Fourier sampling point $u(j,k)$ of the Fourier transform of the angular brightness distribution of the object source: $u(j,k) = \overrightarrow{M_k M_j}/\lambda$ [see Eq. (2)].

A. Interferometric Graphs

Let $\mathcal{B}_c$ be the set of the $n(n-1)/2$ baselines $(j,k)$ generated by $\mathcal{A}$. The graph $(\mathcal{A}, \mathcal{B}_c)$ (see Fig. 2 and Biggs, 1996), whose vertices are the pupil elements of $\mathcal{A}$, and whose edges are the baselines of $\mathcal{B}_c$, is said to be complete. In practice, one may be led to consider the values of the phase of $V_e$ only on a subset $\mathcal{B}$ of $\mathcal{B}_c$: $\mathcal{B} \subseteq \mathcal{B}_c$. For example, this may result from the fact that $|V_e|$ is negligible on $\mathcal{B}_c \setminus \mathcal{B}$. The number of baselines of graph $(\mathcal{A}, \mathcal{B})$ is denoted by $q$. Clearly,

\[ q \le \frac{n(n-1)}{2} \tag{5} \]
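As a rough illustration of Eqs. (2) and (5), the following Python sketch enumerates the baselines of a complete graph and groups them by spatial frequency, so that baselines sharing a frequency become visible. The four-element layout, the wavelength value, and the names `positions` and `spatial_frequency` are hypothetical choices made for this example only; they are not taken from Figure 2.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical projected positions r(j) of n = 4 pupil elements (arbitrary unit).
positions = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0), 4: (0.0, 1.0)}
wavelength = 0.5  # hypothetical wavelength, same unit as the positions


def spatial_frequency(j, k):
    """Eq. (2): u(j, k) = (r(j) - r(k)) / wavelength."""
    (xj, yj), (xk, yk) = positions[j], positions[k]
    return ((xj - xk) / wavelength, (yj - yk) / wavelength)


n = len(positions)
complete = list(combinations(sorted(positions), 2))  # baselines (j, k) with j < k
q = len(complete)                                    # here q = n(n - 1)/2

# Group the baselines by the frequency they generate (exact comparison is
# acceptable here only because the hypothetical coordinates are simple).
groups = defaultdict(list)
for j, k in complete:
    groups[spatial_frequency(j, k)].append((j, k))

redundant = {u: bl for u, bl in groups.items() if len(bl) > 1}
```

With this layout, baselines (1, 2) and (2, 3) generate the same frequency, so the six baselines of the complete graph yield only five distinct sampling points.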
Figure 2. Top: redundant array $\mathcal{A}$; bottom: corresponding complete graph $(\mathcal{A}, \mathcal{B}_c)$. By definition, $\mathcal{B}_c$ is the set of all the baselines generated by $\mathcal{A}$.
According to the very principle of interferometry, $\mathcal{A}$ and $\mathcal{B}$ are defined so that $(\mathcal{A}, \mathcal{B})$ is connected (Biggs, 1996); one then speaks of the interferometric graph. The condition "$\alpha(j) - \alpha(k) = 0$ for all $(j,k) \in \mathcal{B}$" is therefore equivalent to "$\alpha$ is constant on $\mathcal{A}$." According to Eq. (2), distinct baselines may generate the same angular frequency; $\hat{s}_o[u(j,k)]$ takes the same value on these baselines. Whenever this situation occurs, the interferometric graph is said to be redundant or partly redundant (see Fig. 2). To stress the fact that $\varphi_o$ is constant on the subsets of $\mathcal{B}$ defined by the list of distinct angular frequencies, one then says that $\vartheta_o \equiv \varphi_o$ is a spectral (baseline) phase.

B. Phase Closure

A subgraph of $(\mathcal{A}, \mathcal{B})$ with $n$ vertices, $n - 1$ edges, and no loop (i.e., no cycle) in it is said to be a spanning tree of $(\mathcal{A}, \mathcal{B})$ (see Fig. 3 and Biggs, 1996). Let $(j_i, k_i)$ now be a baseline of $\mathcal{B}$ that does not lie in the set of baselines of the selected spanning tree. As illustrated in Figure 3 and specified in Lannes (1999), a baseline of this type defines a certain directed loop. The number of loops defined via a given spanning tree (fixed in an arbitrary manner) is therefore given by the formula

\[ p = q - (n - 1) \tag{6} \]
Figure 3. Example of interferometric graph ($n = 6$). Baselines (2, 3), (2, 4), (2, 5), and (4, 5) are lacking so that $q = 11$. The thick lines correspond to the selected spanning tree. Here, such a tree includes five baselines; the remaining baselines define as many loops: $p = 6$ (see text).

For example, in Figure 3, the selected spanning tree includes five elements: baselines (1, 2), (1, 3), (1, 4), (1, 5), and (1, 6). Baselines (2, 3), (2, 4), (2, 5), and (4, 5) are lacking so that $q = 11$. The remaining baselines $(j_i, k_i)$, the six baselines (2, 6), (3, 4), (3, 5), (3, 6), (4, 6), and (5, 6), define as many loops ($p = 6$). Note that here, all these loops are of order 3. By definition, the closure phases of $\beta$ are the sums of the values of $\beta$ along the directed loops defined through a given spanning tree. For example, in Figure 3, for the directed loop induced by the first loop-entry baseline $(j_1, k_1) \equiv (2, 6)$, the closure phase of $\beta$ is

\[ \beta^{(1)} := \beta(2,6) + \beta(6,1) + \beta(1,2) \]
Note that the closure phases of any bias phase are equal to zero.
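The counting argument and the last remark can be checked numerically on the graph of Figure 3. The Python sketch below rebuilds that graph from the baselines listed in the text, takes the spanning tree made of the baselines (1, k), and evaluates the closure phases of a bias phase. The variable and function names, as well as the integer pupil phase values (read as degrees), are hypothetical and chosen only so that the arithmetic is exact.

```python
n = 6
missing = {(2, 3), (2, 4), (2, 5), (4, 5)}        # baselines lacking in Figure 3
baselines = [(j, k) for j in range(1, n + 1) for k in range(j + 1, n + 1)
             if (j, k) not in missing]
tree = [(1, k) for k in range(2, n + 1)]          # selected spanning tree
loop_entries = [b for b in baselines if b not in tree]

q, p = len(baselines), len(loop_entries)          # Eq. (6): p = q - (n - 1)


def antisym(values):
    """Extend values given on (j, k), j < k, to an antisymmetric function."""
    def beta(j, k):
        return values[(j, k)] if (j, k) in values else -values[(k, j)]
    return beta


def closure_phases(beta):
    """Sum beta along the directed loop j -> k -> 1 -> j of each loop entry."""
    return [beta(j, k) + beta(k, 1) + beta(1, j) for (j, k) in loop_entries]


# Hypothetical pupil phases (integers, e.g. degrees) and the bias phase
# beta(j, k) = alpha(j) - alpha(k) that they induce on the baselines.
alpha = {1: 10, 2: -35, 3: 80, 4: 5, 5: -120, 6: 47}
bias = antisym({(j, k): alpha[j] - alpha[k] for (j, k) in baselines})
bias_closures = closure_phases(bias)
```

Each loop sum telescopes, which is why `bias_closures` contains only zeros whatever pupil phases are chosen.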
C. Phase Calibration

Let $s_m$ (where $m$ stands for model) be an approximation to $s_o$. On each baseline $(j,k) \in \mathcal{B}$, the phase of $V_e$, the baseline phase $\varphi_e(j,k)$, is related to that of $\hat{s}_m$, the spectral phase $\vartheta_m(j,k)$, by a relationship of the form

\[ \varphi_e = (\vartheta_m + \vartheta) + \beta + 2\pi\kappa + \varepsilon_e \tag{7} \]

in which $\varepsilon_e$ is an error term. Here, $\vartheta$ is a spectral phase, whereas $\beta$ is a bias phase: the $\vartheta(j,k)$ satisfy the redundancy constraint, whereas the $\beta(j,k)$ are of the form $\alpha(j) - \alpha(k)$. Clearly, $\kappa(j,k)$ is an integer-valued function. In the phase calibration operation, the quantities $\vartheta$, $\alpha$, and $\kappa$ have to be chosen so as to minimize the size of the error term. The model is then constrained through an update of the form $\vartheta_m \leftarrow \vartheta_m + \vartheta_\star$. In what follows, $\vartheta_\star$ is referred to as the "optimal model phase shift."
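To illustrate the role of the integer term in Eq. (7), the following Python sketch treats a single baseline with the spectral and bias terms held fixed. This is a deliberate simplification: in the actual calibration problem the integers are coupled through the unknown pupil and spectral phases, which is what leads to the lattice problems studied later. The function name and the numerical values are hypothetical.

```python
import math


def nearest_integer_ambiguity(phi_e, theta_model, beta):
    """For fixed theta_model and beta on one baseline, return (kappa, residual)
    with phi_e = theta_model + beta + 2*pi*kappa + residual and |residual| <= pi."""
    gap = phi_e - theta_model - beta
    kappa = round(gap / (2.0 * math.pi))   # nearest integer number of turns
    return kappa, gap - 2.0 * math.pi * kappa


# Hypothetical phases in radians.
kappa, residual = nearest_integer_ambiguity(phi_e=7.0, theta_model=0.4, beta=0.2)
```

Under these assumptions the residual plays the part of the error term on that baseline, and rounding to the nearest integer is what keeps it within half a turn.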
D. Image Reconstruction

At any step of the image reconstruction procedure, the object model $s_m$ may be refined by performing a phase calibration operation followed by a Fourier synthesis process. The latter is performed by using as input the Fourier data of the refined model

\[ \hat{s}_m[u(j,k)] \leftarrow \hat{s}_m[u(j,k)]\, \exp[i\vartheta_\star(j,k)] \qquad ((j,k) \in \mathcal{B}) \tag{8} \]

Examples of Fourier synthesis methods can be found in Lannes et al. (1994, 1996, 1997). As will be clarified in this article, the notion of phase closure imaging is associated with the fact that the optimal model phase shift $\vartheta_\star$ can be expressed in terms of the closure phases of $\varphi_e - \vartheta_m$.

Since the original work by Cornwell and Wilkinson (1981) on how to make maps with interferometers, radio astronomers have been well aware of the critical part played by the phase calibration operation (Hunt and Payne, 1997). Instabilities were observed, but never well understood until the analysis presented much later by Lannes (1999, 2001a) became available. By stating the problem at the level of the phase (instead of the phasor), it was then established that in the case of nonredundant arrays, a phase calibration operation amounted to solving a certain "nearest lattice point" problem. The related instabilities could then be well understood. The present study can be regarded as an extension of the paper by Lannes (2001a) to the case of redundant arrays. New aspects, which were hidden when concentrating on the nonredundant case, are thus revealed, hence providing better knowledge of the matter.
E. Contents

We first present the algebraic framework in which the analysis of the phase calibration problem can be developed. Phase spaces and their integer lattices are then introduced (Section II). Some properties related to the notion of phase closure are stated in Section III. The new results essentially concern the variance–covariance matrix of the closure phases (Section IV), and especially, the Smith normal form (SNF) of the spectral phase closure matrix (Section V). As the reader may not be familiar with the notion of SNF, Section V is illustrated with the aid of two examples. The first one concerns a weakly redundant interferometric graph, and the second a strongly redundant graph. Section VI is devoted to the reference algebraic framework resulting from this analysis. The phase calibration problem is thoroughly stated and solved in Sections VII to IX. In the general case of redundant arrays, two integer ambiguity problems must then be successively solved: P1 and P2. Important special cases are examined in Section X. Section XI is devoted to a simulation built on a particular redundant array. As indicated in the concluding comments (Section XII), the present study can be extended to any interferometric device.

II. Phase Spaces and Integer Lattices

In what follows, we identify the $n$-element array $\mathcal{A}$ with $A := \{1, 2, \ldots, n\}$, and denote by $\bar{\mathcal{B}} := \{(j,k), (k,j) : (j,k) \in \mathcal{B}\}$ the set of directed baselines.

A. Pupil Phase Space

By definition, the pupil phase space is the space $H \equiv H(\mathbb{R})$ of real-valued functions $\alpha$ defined on $\mathcal{A}$ or, equivalently, on $A$. Endowed with the inner product

\[ (\alpha_1 \mid \alpha_2)_H := \sum_{j \in A} \alpha_1(j)\, \alpha_2(j) \]

$H$ is a Euclidean space of dimension $n$. The subset of $H$ whose elements are functions with values in $\mathbb{Z}$ is denoted by $H(\mathbb{Z})$. This subset is a lattice of $H$ (Cohen, 1996); its elements are the nodes of this lattice. The set $\{a_k : k \in A\}$ in which $a_k(j) = \delta_{jk}$ (the Kronecker symbol) is the standard basis of $H$, as well as of $H(\mathbb{Z})$, which is therefore of rank $n$. Given $r$ in $A$, $H_r$ is the subspace of $H$ with standard basis $\{a_\ell : \ell \in A \setminus \{r\}\}$; $H_r(\mathbb{Z})$ is the corresponding lattice.

B. Baseline Phase Space

The baseline phase space is the space $G \equiv G(\mathbb{R})$ of antisymmetric real-valued functions $\beta$ defined on $\bar{\mathcal{B}}$: $\forall (j,k) \in \bar{\mathcal{B}}$, $\beta(k,j) = -\beta(j,k)$. Clearly (see Section I.A),

\[ \dim G = q \]

The subset of $G$ whose elements are functions with values in $\mathbb{Z}$ is denoted by $G(\mathbb{Z})$. This subset is a lattice of $G$; its elements are the nodes of this lattice. The set of baseline phase functions

\[ b_{j'k'}(j,k) := \begin{cases} 1 & \text{if } j = j' \text{ and } k = k' \\ -1 & \text{if } j = k' \text{ and } k = j' \\ 0 & \text{otherwise} \end{cases} \qquad ((j',k') \in \mathcal{B}) \]

is the standard basis of $G$, as well as of $G(\mathbb{Z})$, which is therefore of rank $q$.
Let $\varpi$ be a given symmetric weight function: $\varpi(j,k) = \varpi(k,j) > 0$. Endowed with the inner product

\[ (\beta_1 \mid \beta_2)_G := \frac{1}{2} \sum_{(j,k) \in \bar{\mathcal{B}}} \varpi(j,k)\, \beta_1(j,k)\, \beta_2(j,k) = \sum_{(j,k) \in \mathcal{B}} \varpi(j,k)\, \beta_1(j,k)\, \beta_2(j,k) \tag{9} \]

$G$ is a real Hilbert space. In the absence of any ambiguity, the subscript $G$ will be omitted. In other terms, $(\cdot \mid \cdot)$ and $\|\cdot\|$ stand for $(\cdot \mid \cdot)_G$ and $\|\cdot\|_G$, respectively.

C. Unknown-Spectral Phase Space

Whenever redundant interferometric graphs are considered, one is led to introduce an important subspace of $G$: the spectral phase space $G_s \equiv G_s(\mathbb{R})$. By definition, $G_s$ is the set of baseline phases $\vartheta \in G$ that satisfy the redundancy constraint: $\vartheta$ takes the same value on all the baselines that generate the same spatial frequency. As already mentioned, such a phase function is said to be a spectral phase. The weight function $\varpi$ involved in the definition of the inner product [Eq. (9)] satisfies the redundancy constraint.

The object spectral phase $\vartheta_o$ is often approximately known on a given subset $\mathcal{B}_r$ of $\mathcal{B}$. (The subscript $r$ stands for reference.) In practice, $\mathcal{B}_r$ corresponds to a given set of low frequencies. By definition, the unknown-spectral phase space $K \equiv K(\mathbb{R})$ is the space of spectral phases that vanish on the reference set in question. Let $m$ be the number of spectral phase components to be determined, and $\{u_k\}_{k=1}^{m}$ be the corresponding set of distinct angular spatial frequencies. Clearly,

\[ \dim K = m \tag{10} \]

For example, for the array shown in Figure 2, when $\vartheta(1,2)$ is assumed to be known a priori, the dimension of $K$ is equal to 4. The subset of $K$ whose elements are functions with values in $\mathbb{Z}$ is denoted by $K(\mathbb{Z})$. This subset is a lattice of $K$; its elements are the nodes of this lattice. The standard basis of $K$ is the set of the spectral phases

\[ \zeta_k(i,j) := \begin{cases} 1 & \text{if } u(i,j) = u_k \\ -1 & \text{if } u(i,j) = -u_k \\ 0 & \text{otherwise} \end{cases} \qquad (k = 1, \ldots, m) \]

This basis, $\{\zeta_k\}_{k=1}^{m}$, is also the standard basis of lattice $K(\mathbb{Z})$. By construction, the latter is of rank $m$.
D. Bias Phase Space

In the process of stating the phase calibration problem, one is led to introduce the bias phase operator

\[ B : H \to G, \qquad (B\alpha)(j,k) := \alpha(j) - \alpha(k) \]

By definition, the bias phase space $L$ is the range of $B$: $L := BH$. As the graph $(\mathcal{A}, \mathcal{B})$ is connected, the space of functions $\alpha$ that are constant on $\mathcal{A}$ is the kernel (also called the null space) of $B$. This subspace of $H$ is of dimension unity. As a result,

\[ \dim L = n - 1 \tag{11} \]

Given $r$ in $A$, the set $\{\eta_\ell := B a_\ell : \ell \in A \setminus \{r\}\}$ is a basis of $L$. This basis generates a lattice of $L$ denoted by $L(\mathbb{Z})$. By construction, this lattice is of rank $n - 1$. Note that $L(\mathbb{Z})$ is the subset of $L$ whose elements are functions with values in $\mathbb{Z}$:

\[ L(\mathbb{Z}) = G(\mathbb{Z}) \cap L \]

Its elements are the nodes of this lattice. The orthogonal complement of $L$ in $G$, $M$, is referred to as the bias-free phase space. In what follows, the orthogonal projections onto $L$ and $M$ are denoted by $R$ and $S$, respectively (see Fig. 4). In practice, their action does not raise any particular difficulty [see the context of Eq. (37) in Lannes, 1999].
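The dimension count of Eq. (11) can be checked on the graph of Figure 3 by writing the bias phase operator as a matrix with one row per baseline and one column per pupil element. The Python sketch below is only an illustration under that assumed matrix representation; the helper `rank` (exact Gaussian elimination over the rationals) and all variable names are hypothetical.

```python
from fractions import Fraction

n = 6
missing = {(2, 3), (2, 4), (2, 5), (4, 5)}        # baselines lacking in Figure 3
baselines = [(j, k) for j in range(1, n + 1) for k in range(j + 1, n + 1)
             if (j, k) not in missing]

# Row (j, k) of the bias phase operator: +1 in column j, -1 in column k.
B = [[(1 if c == j else -1 if c == k else 0) for c in range(1, n + 1)]
     for (j, k) in baselines]


def rank(matrix):
    """Rank computed by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


rank_B = rank(B)                                   # dimension of the range of B
constant = [1] * n                                 # a constant pupil phase
image_of_constant = [sum(b * a for b, a in zip(row, constant)) for row in B]
```

For this connected graph the rank is $n - 1 = 5$, and the constant pupil phase is mapped to zero, in line with the kernel described above.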
Figure 4. Main decomposition of the baseline phase space G. The range of B is the bias phase space L. Its orthogonal complement M is the bias-free phase space; R and S are the corresponding orthogonal projections. The spanning-tree phase space E is the space induced by the selected spanning tree. Its orthogonal complement F is the loop-entry phase space; P and Q are the corresponding orthogonal projections. Operator C, which is the oblique projection of G onto F along L, is referred to as the phase closure projection.
A. LANNES
E. Loop-Entry Phase Space

Let E be the subspace of G formed by the functions with support in the selected spanning tree; E is referred to as the spanning-tree phase space; its dimension is equal to n − 1. Its orthogonal complement, F, is the loop-entry phase space. Clearly, $\dim F = p$. The loop-entry phase functions
$$\eta_i(j, k) := \begin{cases} 1 & \text{if } j = j_i \text{ and } k = k_i \\ -1 & \text{if } j = k_i \text{ and } k = j_i \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, \ldots, p) \qquad (12)$$
form the standard basis of F. This basis generates a lattice of F denoted by $F(\mathbb{Z})$. Note that
$$F(\mathbb{Z}) = G(\mathbb{Z}) \cap F$$
By construction, this lattice is of rank p. As shown in Figure 4, the projections onto E and F are denoted by P and Q, respectively.
III. Phase Closure Operator, Phase Closure Projection, and Related Properties

The closure phases $\beta^{(1)}, \ldots, \beta^{(p)}$ of a function $\beta$ lying in G are the sums of the values of $\beta$ along the directed loops defined through a given spanning tree of (A, B) (see Section I.B). These closure phases are the components of a vector $\mathbf{b}$ lying in $\mathbb{R}^p$. In this context, the operator
$$C : G \to \mathbb{R}^p, \qquad C\beta := \mathbf{b} = \sum_{i=1}^{p} \beta^{(i)} \mathbf{e}_i$$
is said to be the ‘‘phase closure operator.’’ Note that $\{\mathbf{e}_i = C\eta_i\}_{i=1}^{p}$ is the standard basis of $\mathbb{R}^p$. This explicitly shows that C is surjective. We therefore have
$$\dim(\ker C) = \dim G - \dim F = q - p$$
hence, from Eq. (6): $\dim(\ker C) = n - 1$. Clearly, the range of B is contained in ker C. As this range is of dimension n − 1 [see Eq. (11)], it follows that
$$\ker C = L$$
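As a hedged illustration of the dimension counts above, the following Python sketch builds the matrices of the bias phase operator and of the phase closure operator for the complete four-element graph, taking the baselines (1, k) as spanning tree. The sign convention (B alpha)(j, k) = alpha(j) - alpha(k), the loop orientation j -> k -> 1 -> j, and the helper names `closure_matrix`, `bias_matrix`, and `rank` are assumptions made for this example, not notation taken from the text.

```python
from fractions import Fraction
from itertools import combinations

def closure_matrix(baselines, root=1):
    """One row per loop (j, k, root); one column per baseline.

    Assumes every vertex is linked to `root`, so that the baselines
    (root, k) form the spanning tree (an assumption of this sketch).
    """
    col = {b: c for c, b in enumerate(baselines)}
    rows = []
    for (j, k) in baselines:
        if root in (j, k):
            continue                      # tree baseline, not a loop entry
        row = [0] * len(baselines)
        row[col[(j, k)]] = 1              # loop-entry baseline j -> k
        row[col[(root, k)]] = -1          # k -> root is minus (root, k)
        row[col[(root, j)]] = 1           # root -> j
        rows.append(row)
    return rows

def bias_matrix(n, baselines):
    """Matrix of alpha -> alpha(j) - alpha(k), one row per baseline."""
    rows = []
    for (j, k) in baselines:
        row = [0] * n
        row[j - 1], row[k - 1] = 1, -1
        rows.append(row)
    return rows

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in matrix]
    rk = 0
    for c in range(len(m[0])):
        piv = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][c] / m[rk][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

n = 4
baselines = list(combinations(range(1, n + 1), 2))    # q = 6 baselines
C = closure_matrix(baselines)                          # p = 3 loops
B = bias_matrix(n, baselines)
q, p = len(baselines), len(C)

# Closure phases of any bias phase B(alpha) vanish: range(B) lies in ker C.
CB = [[sum(C[i][b] * B[b][v] for b in range(q)) for v in range(n)]
      for i in range(p)]

print(p, rank(C), rank(B), q - p)
```

In this small case the product of the two matrices is the zero matrix and both ranks agree with the counts p and n − 1, which is the finite-dimensional content of the statement that the kernel of the phase closure operator is the bias phase space.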
Consider the operator C : Rp ! F ;
C :¼
p X
½i i
i¼1
[i]
where denotes the ith component of . Clearly, the operator C :¼ C C is such that p X
C ¼
ðiÞ i
i¼1
As Ci ¼ i , we have C 2 ¼ C. Furthermore, L is the kernel of C. This operator, which is therefore the oblique projection of G onto F along L (see Fig. 4), is said to be the ‘‘phase closure projection.’’ Note that :¼ C lies in ker C and therefore in L. Any in G can therefore be uniquely decomposed in the form ¼ þ C
2 L and C 2 F
with
This explicitly shows that G can be regarded as the direct sum of L and F:
$$G = L + F \quad \text{with} \quad L \cap F = \{0\}$$
Let us now concentrate on the oblique projection of a node of GðZ): C ¼
p X
ðiÞ i
i¼1
As the (i) are rational integers, C is a node of F ðZÞ (see Fig. 5). It is then clear that any in GðZÞ can be decomposed in the form (see Fig. 5) ¼ þ C
with
2 LðZÞ and C 2 F ðZÞ
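This explicitly shows that $G(\mathbb{Z})$ can be regarded as the direct sum of $L(\mathbb{Z})$ and $F(\mathbb{Z})$:
$$G(\mathbb{Z}) = L(\mathbb{Z}) + F(\mathbb{Z}) \quad \text{with} \quad L(\mathbb{Z}) \cap F(\mathbb{Z}) = \{0\}$$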
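The lattice decomposition above can be checked on a small case. The sketch below, written under the same assumptions as before (complete four-element graph, spanning tree made of the baselines (1, k), convention alpha(j) - alpha(k) for the bias phase), splits an arbitrary integer-valued baseline function into a loop-entry part and a remainder, and then exhibits an integer pupil function that reproduces the remainder. The numerical values and the variable names are illustrative only.

```python
from itertools import combinations

ROOT = 1
baselines = list(combinations(range(1, 5), 2))

# An arbitrary node of the integer lattice of baseline functions.
beta = dict(zip(baselines, [3, -1, 4, 1, -5, 9]))

# Loop-entry part: the closure phase of each loop, carried by its
# loop-entry baseline (j, k); zero on the spanning-tree baselines.
closure_part = {b: 0 for b in baselines}
for (j, k) in baselines:
    if ROOT not in (j, k):
        closure_part[(j, k)] = beta[(j, k)] - beta[(ROOT, k)] + beta[(ROOT, j)]

# Remainder, expected to be a bias phase with integer values.
remainder = {b: beta[b] - closure_part[b] for b in baselines}

# Integer pupil function fixed by alpha(ROOT) = 0 and the tree baselines.
alpha = {ROOT: 0}
for k in range(2, 5):
    alpha[k] = -beta[(ROOT, k)]

is_bias_phase = all(remainder[(j, k)] == alpha[j] - alpha[k]
                    for (j, k) in baselines)
print(closure_part, alpha, is_bias_phase)
```

Both parts take integer values by construction, so the remainder is a node of the bias phase lattice and the loop-entry part a node of the loop-entry lattice.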
As Sð CÞ ¼ 0 (since C lies in L, and S is the projection of G onto M ), we have p X S ¼ ðiÞ i i¼1
where
i :¼ Si
Figure 5. Canonical decomposition of lattice G(Z). The intersection of G(Z) with the bias phase space L, L(Z), is a lattice of rank n 1. The intersection of G(Z) with the loop-entry phase space F, F(Z), is a lattice of rank p. For a given choice of spanning tree, any 2 G(Z) can be decomposed in the nonorthogonal form ¼ þ C with 2 LðZÞ and C 2 F ðZÞ; GðZÞ can therefore be regarded as the direct sum of L(Z) and F(Z)
Note that Ci ¼ i (see Fig. 4). As G is the direct sum of L and F, and the is form a basis of F, it follows from the relation above that the is form a basis of M (see Appendix 1). Let us now introduce the operator p X ½i i C : ¼ C : Rp ! M; i¼1
By construction, S ¼ C C. Note that Ci ¼ Ci , hence p X ðiÞ Ci CC ¼ i¼1
¼
p X i¼1
ðiÞ Ci ¼
p X i¼1
ðiÞ ji ¼ b
As ker C ¼ L; C is therefore the Moore–Penrose pseudoinverse of C: C ¼ Cþ
Now, since C is surjective, we have
Cþ ¼ C ðCC Þ1
hence ðCþ Cþ ÞðCC Þ ¼ Ip
ðthe identity on Rp Þ
As a result, C C ¼ ðCC Þ1
ð14Þ
IV. Variance–Covariance Matrix of the Closure Phases Let [C] and [ ] be the matrices of C and CC in the standard bases of G and Rp , and [ ] be the diagonal matrix whose elements are the inverses of the baseline weights $ð j; kÞ [see Eq. (9)]. Denote by [C]t the transpose matrix of [C]. As ðC j ÞRp
¼
½Ct ½ ¼ ½t ½Ct ½
¼
ð j C ÞG ¼ ½t ½V 1 ½C ¼ ½t ½V 1 ½C ½
we have ½C ¼ ½V ½Ct , hence
½V ¼ ½C½V ½Ct
Consequently, when [V] is regarded as the variance–covariance matrix of the baseline phases ð j; kÞ; ½V is the variance–covariance matrix of the ðiÞ closure phases e . According to Eq. (14), we have ½C C ¼ ½V 1 Note that the matrix elements of [C C ] are the inner products (i j i0 Þ:
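The propagation rule of this section, in which the variance–covariance matrix of the closure phases is obtained as [C][V][C]^t, can be sketched numerically. The example below assumes the complete four-element graph with the loops (2, 3, 1), (2, 4, 1), and (3, 4, 1), and equal baseline weights summing to one; these choices, and the variable names, are assumptions of the sketch rather than values from the text.

```python
from fractions import Fraction

# Rows: loops (2,3,1), (2,4,1), (3,4,1).
# Columns: baselines (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
C = [
    [1, -1, 0, 1, 0, 0],
    [1, 0, -1, 0, 1, 0],
    [0, 1, -1, 0, 0, 1],
]

# Diagonal of [V]: inverses of the baseline weights (equal weights here).
weights = [Fraction(1, 6)] * 6
V = [1 / w for w in weights]

# [V_hat] = [C][V][C]^t, written out for a diagonal [V].
V_hat = [[sum(C[i][b] * V[b] * C[j][b] for b in range(6)) for j in range(3)]
         for i in range(3)]

for row in V_hat:
    print([int(x) for x in row])
```

Even with uncorrelated baseline phases, the off-diagonal entries are nonzero because the loops share spanning-tree baselines; the closure phases are therefore correlated.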
V. Spectral Phase Closure Projection The operator from K into F induced by C is denoted by CK and referred to as the spectral phase closure projection. Its kernel K0 is the intersection of K with L: K0 :¼ ker CK ¼ K \ L We set m0 ¼ dim K0 ;
m1 :¼ m m0
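A. Smith Normal Form of the Spectral Phase Closure Matrix

Let $[C_K]$ now be the matrix of $C_K$ in the standard bases of K and F: $\{\psi_k\}_{k=1}^{m}$, $\{\eta_i\}_{i=1}^{p}$. By construction, the matrix elements of $[C_K]$ lie in $\mathbb{Z}$. Note that $[C_K]$ has p rows and m columns. According to the theorem introduced in Appendix 2, there exist a basis $\{\psi_{0,k}\}_{k=1}^{m_0}, \{\psi_{1,j}\}_{j=1}^{m_1}$ of $K(\mathbb{Z})$, a basis $\{\eta_{1,j}\}_{j=1}^{m_1}, \{\eta_{2,i}\}_{i=1}^{p-m_1}$ of $F(\mathbb{Z})$, and positive integers $c_1, c_2, \ldots, c_{m_1}$, with $c_j$ dividing $c_{j+1}$ for $1 \le j < m_1$, such that $C\psi_{1,j} = c_j \eta_{1,j}$ (for $1 \le j \le m_1$) and $C\psi_{0,k} = 0$ (for $1 \le k \le m_0$), in other words such that the matrix of $C_K$ in these bases is of Smith normal form. More precisely, there then exist two matrices $[\Psi]$ and $[\Phi]$ (of order m and p, respectively) with coefficients in $\mathbb{Z}$ and determinant $\pm 1$ such that
$$[\Phi]^{-1} [C_K] [\Psi] = [C_K]_S$$
with
$$[C_K]_S = \begin{pmatrix} c_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & c_{m_1} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix}$$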
Clearly, $C_K$ is of rank $m_1$. The components of $\psi_{1,j}$ and $\psi_{0,k}$ in the standard basis of $K(\mathbb{Z})$ form the jth and $(m_1 + k)$th column vectors of $[\Psi]$, whereas the components of $\eta_{1,j}$ and $\eta_{2,i}$ in the standard basis of $F(\mathbb{Z})$ form the jth and $(m_1 + i)$th column vectors of $[\Phi]$. As illustrated in the following section, in most cases encountered in practice, the elementary divisors of $[C_K]$, $c_1, c_2, \ldots, c_{m_1}$, prove to be equal to unity.

B. Examples

In this section we present two examples. The first one concerns a weakly redundant interferometric graph (Section V.B.1), and the second a strongly redundant graph (Section V.B.2).

1. Weakly Redundant Case

Let us consider the four-element array shown in Figure 6 and the corresponding interferometric graph. This graph is complete, with n = 4, q = 6, and p = 3, and weakly redundant: only two baselines are redundant, baselines (1, 2) and (2, 3).

Figure 6. Top: an example of a weakly redundant array; bottom: corresponding complete interferometric graph.

The following vectors form the standard basis of $G_s$:
$$\psi_1 = b_{12} + b_{23}, \quad \psi_2 = b_{13}, \quad \psi_3 = b_{14}, \quad \psi_4 = b_{24}, \quad \psi_5 = b_{34}$$
We now identify K with $G_s$ so that m = 5. Note that here, m is strictly greater than p. The vectors $\eta_1 = b_{23}$, $\eta_2 = b_{24}$, and $\eta_3 = b_{34}$, which are the loop-entry vectors of the directed loops (2, 3, 1), (2, 4, 1), and (3, 4, 1), form the standard basis of F. We then have
$$C\psi_1 = 2\eta_1 + \eta_2, \quad C\psi_2 = -\eta_1 + \eta_3, \quad C\psi_3 = -\eta_2 - \eta_3, \quad C\psi_4 = \eta_2, \quad C\psi_5 = \eta_3$$
Matrix $[C_K]$ is therefore of the form
$$[C_K] = \begin{pmatrix} 2 & -1 & 0 & 0 & 0 \\ 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & -1 & 0 & 1 \end{pmatrix}$$
Its Smith normal form is then as follows:
$$[C_K]_S = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}$$
Clearly, $C_K$ is then of rank 3: $m_0 = 2$ and $m_1 = 3$. Here,
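$$[\Psi] = \begin{pmatrix} 1 & 0 & 1 & 1 & -1 \\ 0 & 1 & 2 & 2 & -2 \\ 0 & 0 & 1 & 2 & -1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
The first three columns of $[\Psi]$ yield the components of $\psi_{1,1}$, $\psi_{1,2}$, and $\psi_{1,3}$ in the standard basis of $K(\mathbb{Z})$; the last two columns yield those of $\psi_{0,1}$ and $\psi_{0,2}$. If need be, the reader may explicitly verify that $C\psi_{0,k} = 0$ for k = 1, 2. The routines that give the Smith normal form also yield $[\Psi]^{-1}$. Here,
$$[\Psi]^{-1} = \begin{pmatrix} 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & -2 & 2 & 0 \\ 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Likewise, we then get
$$[\Phi] = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}$$
The columns of $[\Phi]$ yield the components of $\eta_{1,1}$, $\eta_{1,2}$, and $\eta_{1,3}$ in the standard basis of $F(\mathbb{Z})$. The reader may verify that $C\psi_{1,j} = \eta_{1,j}$ for j = 1, 2, 3. Here,
$$[\Phi]^{-1} = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 2 & 0 \\ 1 & -2 & 1 \end{pmatrix}$$
The weakest weakly redundant situation corresponds to the nonredundant case. Then, K = G, m = q, $\psi_{0,k} = \beta_k$ for $1 \le k \le m_0$ with $m_0 = n - 1$, and $\psi_{1,j} = \eta_j$ for $1 \le j \le m_1$ with $m_1 = p$. We then have $C\psi_{1,j} = \eta_j$ since $C\eta_j = \eta_j$.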
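The spectral phase closure matrix of the four-element example can be rebuilt mechanically from the graph. The Python sketch below assumes the spanning tree made of the baselines (1, k), the loop orientation j -> k -> 1 -> j, and the redundancy classes read from the example (baselines (1, 2) and (2, 3) grouped together); the helper names and the two candidate kernel vectors are assumptions of the sketch.

```python
from itertools import combinations

ROOT = 1
baselines = list(combinations(range(1, 5), 2))
loop_entries = [b for b in baselines if ROOT not in b]   # (2,3), (2,4), (3,4)

def closure_column(b):
    """Closure phases (one per loop) of the unit function on baseline b."""
    column = []
    for (j, k) in loop_entries:
        value = 0
        if b == (j, k):
            value += 1        # loop-entry baseline itself
        if b == (ROOT, k):
            value -= 1        # travelled backwards, k -> root
        if b == (ROOT, j):
            value += 1        # travelled forwards, root -> j
        column.append(value)
    return column

# Redundancy classes: each spectral basis vector is a sum of unit baselines.
classes = [[(1, 2), (2, 3)], [(1, 3)], [(1, 4)], [(2, 4)], [(3, 4)]]
columns = [[sum(vals) for vals in zip(*(closure_column(b) for b in cls))]
           for cls in classes]
C_K = [list(row) for row in zip(*columns)]

def apply_matrix(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Candidate integer vectors spanning the kernel (m0 = 2).
kernel_candidates = [[1, 2, 2, 1, 0], [-1, -2, -1, 0, 1]]

for row in C_K:
    print(row)
print([apply_matrix(C_K, v) for v in kernel_candidates])
```

Each column of the matrix is simply the sum of the closure columns of the baselines in one redundancy class, which is why the entry 2 appears for the class containing both (1, 2) and (2, 3).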
2. Strongly Redundant Case

We now consider the six-element array shown in Figure 7 and the corresponding interferometric graph (the same as the one shown in Fig. 3). This graph is incomplete: baselines (2, 3), (2, 4), (2, 5), and (4, 5) are lacking. In this case, n = 6, q = 11, and p = 6. This graph is strongly redundant in the sense that many baselines are redundant.

Figure 7. Top: an example of a strongly redundant array; bottom: the interferometric graph to be taken into consideration, the same as the one shown in Figure 3.

The following vectors form the standard basis of $G_s$:
$$\psi_1 = b_{12} + b_{34} + b_{56}, \quad \psi_2 = b_{13} + b_{35} + b_{46}, \quad \psi_3 = b_{14} + b_{36}, \quad \psi_4 = b_{15} + b_{26}, \quad \psi_5 = b_{16}$$
We now define the unknown-spectral phase space K as the subspace of $G_s$ generated by the vectors $\psi_2$, $\psi_3$, $\psi_4$, and $\psi_5$ (m = 4). Note that m is then strictly less than p. The vectors $\eta_1 = b_{26}$, $\eta_2 = b_{34}$, $\eta_3 = b_{35}$, $\eta_4 = b_{36}$, $\eta_5 = b_{46}$, and $\eta_6 = b_{56}$, which are the loop-entry vectors of the directed loops (2, 6, 1), (3, 4, 1), (3, 5, 1), (3, 6, 1), (4, 6, 1), and (5, 6, 1), form the standard basis of F. We then have
$$C\psi_2 = \eta_2 + 2\eta_3 + \eta_4 + \eta_5, \quad C\psi_3 = -\eta_2 + \eta_4 + \eta_5, \quad C\psi_4 = \eta_1 - \eta_3 + \eta_6, \quad C\psi_5 = -\eta_1 - \eta_4 - \eta_5 - \eta_6$$
Matrix $[C_K]$ is therefore of the form
$$[C_K] = \begin{pmatrix} 0 & 0 & 1 & -1 \\ 1 & -1 & 0 & 0 \\ 2 & 0 & -1 & 0 \\ 1 & 1 & 0 & -1 \\ 1 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix}$$
Its Smith normal form is then as follows:
$$[C_K]_S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
Clearly, $C_K$ is then of rank 3: $m_0 = 1$ and $m_1 = 3$. Here,
$$[\Psi] = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 2 \end{pmatrix}$$
The first three columns of $[\Psi]$ yield the components of $\psi_{1,1}$, $\psi_{1,2}$, and $\psi_{1,3}$ in the standard basis of $K(\mathbb{Z})$; the last column yields those of $\psi_{0,1}$. If need be, the reader may explicitly verify that $C\psi_{0,1} = 0$. Here,
$$[\Psi]^{-1} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & -2 & 1 & 0 \\ 0 & -2 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$
Likewise, we then get
$$[\Phi] = \begin{pmatrix} 0 & 1 & -1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 2 & -1 & 0 & 0 & 0 & 0 \\ 1 & 0 & -1 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 1 & -1 & 0 & 0 & 1 \end{pmatrix}$$
The first three columns of $[\Phi]$ yield the components of $\eta_{1,1}$, $\eta_{1,2}$, and $\eta_{1,3}$ in the standard basis of $F(\mathbb{Z})$. The reader may verify that $C\psi_{1,j} = \eta_{1,j}$ for j = 1, 2, 3. The last three columns yield the components of $\eta_{2,1}$, $\eta_{2,2}$, and $\eta_{2,3}$. Thus, in this case, $\eta_{2,1} = \eta_4$, $\eta_{2,2} = \eta_5$, and $\eta_{2,3} = \eta_6$. Here,
$$[\Phi]^{-1} = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 & 0 \\ -1 & 1 & -1 & 1 & 0 & 0 \\ -1 & 1 & -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
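The elementary divisors quoted in both examples can be cross-checked without a full Smith normal form routine. The sketch below uses the classical determinantal-divisor characterization (the kth elementary divisor is the ratio of the gcds of all k-by-k and (k−1)-by-(k−1) minors); this is a different technique from the routines mentioned in the text, chosen here only because it fits in a few lines. The two matrices are the spectral phase closure matrices of the four-element and six-element examples as derived from the relations given above; the function names are assumptions of the sketch.

```python
from itertools import combinations
from math import gcd

def det(m):
    """Integer determinant by cofactor expansion (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([r[:c] + r[c + 1:] for r in m[1:]])
               for c in range(len(m)) if m[0][c])

def elementary_divisors(a):
    """Nonzero elementary divisors via gcds of k-by-k minors."""
    rows, cols = len(a), len(a[0])
    divisors, previous = [], 1
    for k in range(1, min(rows, cols) + 1):
        g = 0
        for rs in combinations(range(rows), k):
            for cs in combinations(range(cols), k):
                g = gcd(g, det([[a[r][c] for c in cs] for r in rs]))
        if g == 0:            # all k-by-k minors vanish: rank is k - 1
            break
        divisors.append(g // previous)
        previous = g
    return divisors

C_K_weak = [
    [2, -1, 0, 0, 0],
    [1, 0, -1, 1, 0],
    [0, 1, -1, 0, 1],
]
C_K_strong = [
    [0, 0, 1, -1],
    [1, -1, 0, 0],
    [2, 0, -1, 0],
    [1, 1, 0, -1],
    [1, 1, 0, -1],
    [0, 0, 1, -1],
]

divs_weak = elementary_divisors(C_K_weak)
divs_strong = elementary_divisors(C_K_strong)
print(divs_weak, divs_strong)
```

The number of divisors returned is the rank, so the same call also recovers the rank of the spectral phase closure matrix in each example. This approach enumerates all minors and is only practical for very small matrices.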
Figure 8. Geometric representation of the reference algebraic framework. Here, L is the bias phase space, and M its orthogonal complement in the baseline phase space G. The unknown spectral phase space K is the direct sum of K0 and K1, where K0 is the intersection of K with L. (K0 is not represented in this figure.) Likewise, the loop-entry phase space F is the direct sum of F1 and F2, where F1 is the image of K1 by the spectral phase closure projection CK. The Smith normal decomposition of CK provides bases for K0, K1, F1, and F2. The phase closure projection C maps K1 onto F1. The projection S of G onto M maps K1 and F1 onto M1. The orthogonal complement of K + L in G, M+, is the orthogonal complement of M1 in M.
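VI. Reference Algebraic Framework

Let $K_0(\mathbb{Z})$, $K_1(\mathbb{Z})$, $F_1(\mathbb{Z})$, and $F_2(\mathbb{Z})$ be the lattices generated by the bases $\{\psi_{0,k}\}_{k=1}^{m_0}$, $\{\psi_{1,j}\}_{j=1}^{m_1}$, $\{\eta_{1,j}\}_{j=1}^{m_1}$, and $\{\eta_{2,i}\}_{i=1}^{p-m_1}$, respectively. The linear spaces generated by the same bases are denoted by $K_0$, $K_1$, $F_1$, and $F_2$, respectively. Clearly, $K(\mathbb{Z})$ can be regarded as the direct sum of $K_0(\mathbb{Z})$ and $K_1(\mathbb{Z})$:
$$K(\mathbb{Z}) = K_0(\mathbb{Z}) + K_1(\mathbb{Z}) \quad \text{with} \quad K_0(\mathbb{Z}) \cap K_1(\mathbb{Z}) = \{0\} \qquad (15)$$
Likewise, $F(\mathbb{Z})$ can be regarded as the direct sum of $F_1(\mathbb{Z})$ and $F_2(\mathbb{Z})$:
$$F(\mathbb{Z}) = F_1(\mathbb{Z}) + F_2(\mathbb{Z}) \quad \text{with} \quad F_1(\mathbb{Z}) \cap F_2(\mathbb{Z}) = \{0\} \qquad (16)$$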
As a corollary, K ¼ K0 þ K1 with K0 \ K1 ¼ f0g and F ¼ F1 þ F2 with F1 \ F2 ¼ f0g. Furthermore, as C1; j ¼ cj 1; j for j ¼ 1; . . . ; m1 , C maps K1 onto F1 (see Fig. 8). The image of K1(Z) by C is a lattice of rank m1. This lattice coincides with F1(Z) iff (if and only if ) all the elementary divisors of CK are equal to unity. According to Eq. (15), any in K(Z) can be decomposed in the form in which
¼ 0 þ 1
0 ¼
m0 X
1 ¼
0; k 0; k ;
k¼1
m1 X
1; j 1; j
j¼1
The integers 0;knfor 1 k m 0 and 1; j for 1 j m1 are the components o of in the basis 0;k mk¼1 ; 1; j mj¼1 . Likewise [see Eq. (16)], any in F (Z) can be decomposed in the form: 0
1
¼ 1 þ 2
in which 1 ¼
m1 X
2 ¼
1; j 1; j ;
j¼1
pm X1
2; i 2; i
i¼1
The integers 1; j for 1 j n m1 and 2;i o for 1 i p m1 are the 1 components of in the basis 1; j mj¼11 ; 2;i pm . In this notation i¼1 C ¼
p X i¼1
ðiÞ i ¼
m1 X
cj 1; j j1; j
j¼1
Let us now introduce the vectors of M (see Fig. 8): 1; j :¼ S1; j ; n
2;i :¼ S2;i n
m1 pm1 o ; 2;i i¼1 1; j j¼1
o
pm1 is a basis of M (see is a basis of F, 1; j mj¼11 ; 2;i i¼1 As Appendix 1). Furthermore, since cj S1; j ¼ SC1; j ¼ S1; j , we have
1; j ¼
1 S1; j cj
ð1 j m1 Þ
S therefore maps K1 onto M1: = SK1. Clearly, the 1, j form a basis of M1 (see Fig. 8). Let K+ be the orthogonal complement of K0 in K. Denoting by T+ the projection of K onto K+, we set þ;1; j :¼ Tþ 1; j
ð17Þ
The +;1, j (for 1 j m1) form a basis of K+ (see Appendix 1). As S1; j ¼ STþ 1; j ¼ Sþ;1; j , we also have 1; j ¼
1 Sþ;1; j cj
From its definition, M1 proves to be the orthogonal complement of L in K + L. As a result, Mþ :¼ ðK þ LÞ? is the orthogonal complement of M1 in M (see Fig. 8). Denoting by U1 and U+ the projections of G onto M1 and M+, respectively, we thus have S ¼ U1 þ Uþ
ð18Þ
In this context, we set (see Fig. 8) 1;1; j :¼ U1 1; j ¼ 1; j ;
þ;1; j :¼ Uþ 1; j ¼ 0
ð1 j m1 Þ
ð19Þ
and 1;2;i :¼ U1 2;i ;
þ;2;i :¼ Uþ 2;i
ð1 i p m1 Þ
ð20Þ
The þ;2;i (for 1 i p m1 Þ form a basis of M+ (see Appendix 1). As 1; j ¼ 1=cj S1; j ; 1; j ¼ U1 1; j and U1 S ¼ U1 , we also have 1; j ¼
1 U1 1; j cj
ð1 j m1 Þ
ð21Þ
As specified in the following sections, in the general case of redundant interferometric graphs, the statement of the phase calibration problem leads us to consider two integer ambiguity problems, successively: P1 and P2. The vectors þ;2;i ðfor 1 i p m1 Þ are the canonical basis vectors of the Z-lattice involved in the nearest lattice node problem P1, whereas the vectors þ;1; j ðfor 1 j m1 Þ are the canonical basis vectors of the Z-lattice involved in the nearest lattice node problem P2. Furthermore, the vectors 1;2;i ðfor 1 i p m1 Þ also play an important part in the statement of P1. All these vectors can be explicitly obtained as indicated in Appendix 3.
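VII. Statement of the Phase Calibration Problem

Let $\lfloor x \rceil$ be the nearest rational integer to x, and $\{x\}$ be the discrepancy between x and this integer: $\{x\} := x - \lfloor x \rceil$. The value of $\theta$ wrapped into the interval $(-\pi, \pi)$ is denoted by $\operatorname{arc}(\theta)$. Thus,
$$\operatorname{arc}(\theta) = 2\pi\{\dot\theta\} \quad \text{where} \quad \dot\theta := \frac{\theta}{2\pi} \qquad (22)$$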
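The wrapping operation of Eq. (22) is short enough to write out. The following Python sketch assumes that ties in the nearest-integer operation are rounded upward, so that the wrapped value falls in the half-open interval from −π (included) to π (excluded); this tie rule and the function names are assumptions of the sketch.

```python
import math

def nearest_int(x):
    """Nearest integer to x; ties are rounded upward (an assumption)."""
    return math.floor(x + 0.5)

def discrepancy(x):
    """Discrepancy {x} between x and the nearest integer."""
    return x - nearest_int(x)

def arc(theta):
    """Value of theta (in radians) wrapped around zero, as in Eq. (22)."""
    return 2 * math.pi * discrepancy(theta / (2 * math.pi))

samples = [0.3, 3 * math.pi / 2, 10 * math.pi + 0.1, -4.0]
wrapped = [arc(t) for t in samples]
print(wrapped)
```

Applied to a baseline phase function, the same operation acts baseline by baseline, since the nearest lattice node is obtained by rounding each component.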
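Let $\beta$ be any function in the baseline phase space G; the discrepancy between $\dot\beta := \beta/(2\pi)$ and the nearest lattice node of $G(\mathbb{Z})$ (for the distance defined in G) is the function
$$\{\dot\beta\} = \dot\beta - \lfloor \dot\beta \rceil \qquad (23)$$
where $\lfloor \dot\beta \rceil (j, k) := \lfloor \dot\beta(j, k) \rceil$. Note that
$$\operatorname{arc}(\beta) = 2\pi\{\dot\beta\} \qquad (24)$$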
In the process of stating the phase calibration problem, the guiding idea is to minimize the functional [see Eq. (7)]
f1 : K L ! R;
f1 ð#; Þ :¼ k arcfðe Þ ð#m þ #Þgk2
ð25Þ
Setting :¼ e #m
we have
f1 ð#; Þ :¼ k arcf ð# þ Þgk2
ð26Þ ð27Þ
Let (#1, 1) now be a point of K L at which the minimum of f1 is attained. The quantity :¼ arcfðe 1 Þ ð#m þ #1 Þg ¼ arcf ð#1 þ 1 Þg
ð28Þ
is then referred to as the ‘‘phase calibration discrepancy.’’ In the general case where K0 (the intersection of K with L) is not reduced to {0}, let us consider the set S :¼ fð#1 ’ 2 ; 1 þ ’ 2 Þ : ’ 2 K0 ; 2 KðZÞ; 2 LðZÞg With regard to the minimum of f1, all the points of S are equivalent. In this context, the points of physical interest are those for which the size of #1 ’ 2 is minimum. To define the final solution(s), the idea is therefore to minimize the functional f2 : K0 ! R;
f2 ð’Þ :¼ k arcð#1 ’Þk2
ð29Þ
Denoting by ’ a bias phase for which the minimum of f2 is obtained, the quantity #m :¼ #m þ #
ð30Þ
# :¼ arcð#1 ’ Þ
ð31Þ
in which
is the ‘‘constrained model phase’’; # is the ‘‘optimal model phase shift’’ (see Fig. 9). The ‘‘calibrated phase’’ is then defined by the formula e :¼ e
ð32Þ
¼ 1 þ ’
ð33Þ
where
is said to be an ‘‘optimal bias phase.’’ This phase distribution is defined modulo 2 in L. This means that for any 2 LðZÞ; 2 is a solution in .
Figure 9. Phase calibration terminology in the general case of redundant arrays: # is the optimal model phase shift, is an optimal bias phase, #m is the constrained model phase, e is the calibrated phase, and is the phase calibration discrepancy. These functions take their values on the set of baselines of the interferometric graph. A trigonometric representation of this type can thus be associated with each baseline.
Note that the phase calibration discrepancy , as defined in Eq. (28), is also equal to arc(e #m ) (see Fig. 9). Given r in A, a pupil phase 2 Hr such that B ¼ is said to be an ‘‘optimal pupil phase.’’ Clearly, an optimal pupil phase is defined modulo 2 in Hr: for any 2 Hr (Z), 2 is a solution in . The problem of minimizing f1 is first considered (Section VIII), and then that of minimizing f2 (Section IX).
VIII. Phase Calibration Discrepancy and Related Results According to Eqs. (25), (26), and (27), the search for the phase calibration discrepancy (28) leads us to consider the functional f11 : K þ L ! R;
f11 ðÞ :¼ k arcð Þk2
is of the form # + with # 2 K and 2 L. As arc ð Þ ¼ 2{ _ _ } where _ :¼ =ð2Þ and _ :¼ =ð2Þ [see Eqs. (24), (23), and (22)], the problem of minimizing f11 is equivalent to the one of minimizing 2 f12 : K þ L ! R; f12 ð_ Þ :¼ { _ _ }
Note that { _ _ } is the discrepancy between _ _ and the nearest lattice node of G (Z). The problem is therefore to identify the nodes of G(Z) closest
to the affine space parallel to K + L and passing through _ . We therefore have to minimize in G(Z) the norm of the projection of _ onto ðK þ LÞ? . The nodes at which the minimum is attained are defined up to a node of K(Z) + L(Z). The bulk of the problem is therefore to find a minimum of the functional f13 ðÞ :¼ kUþ ð _ Þk2 ¼ kUþ ðS _ SÞk2
f13 : GðZÞ ! R;
in which U+ is the orthogonal projection onto Mþ :¼ ðK þ LÞ? . Let _ ðiÞ for 1 i p be the closure terms of _ in the standard basis of F. In the notation adopted here, we denote by _ 1; j (for 1 j n m1 ) and by _ 2;io pm1 (for 1 i p m1 Þ the closure terms of _ in the basis 1;j mj¼11 ; 2;i i¼1 (see Section V.A). In matrix form, these closure vectors are explicitly related as follows: 1 0 ½ _ 1 C B 1 ð34Þ A ¼ ½ ½ _ @ ½ _ 2
As
Uþ S _ ¼ Uþ
m1 X j¼1
_ 1; j 1; j þ
pm X1 i¼1
_ 2;i 2;i
!
we have, since U+ 1, j ¼ 0 and þ;2;i :¼ Uþ 2; i [see Eqs. (19) and (20)], Uþ S _ ¼
pm X1
_ 2;i þ;2;i
Uþ S ¼
pm X1
2;i þ;2;i
i¼1
Likewise,
i¼1
Minimizing f13 therefore leads to minimizing the functional 2 pm 1 X _ pm1 ! R; f14 ðm2 Þ :¼ ð 2;i 2;i Þþ;2;i f14 : Z i¼1
In this integer ambiguity problem, referred to as P1, m2 is the vector of Zpm1 whose components are the 2;i . Denoting by c_ 2 the vector of Rpm1 with components _ 2;i , we have f14 ðm2 Þ ¼ q1 ðm2 c_ 2 Þ
ð35Þ
where q1 is the quadratic form pm1
q1 : R
2 pm 1 X ðiÞ q1 ðzÞ :¼ þ;2;i i¼1
! R;
ð36Þ
In the standard basis of Rpm1 , the matrix elements of q1 are the inner products ðþ;2;i j þ;2;i0 Þ. Let m2 be the solution of P1, i.e., the point of ðZpm1 ; q1 Þ closestPto c_ 2 (see Appendix 4). Clearly, according to the pm1 definition of f14 ; i¼1 2;i þ;2;i is the node of lattice U+G(Z) closest to Uþ _ . Let us now set _ 2;i :¼ _ 2;i 2;i _ :¼
ð1 i p m1 Þ
pm X1 i¼1
_ 2;i þ;2;i
ð37Þ ð38Þ
and 2 :¼
pm X1
2;i 2;i
i¼1
ð39Þ
Vector _ , which lies in M+, is none other than the phase calibration discrepancy up to a factor 2 : ¼ 2_ [see Eqs. (7), (28), and the successive definitions of f11, f12, f13, f14]. The nodes of G(Z) at which the minimum of f13 is attained are equal to 2 up to a node of K(Z) + L(Z). Denoting by _ the value of _ corresponding to 2 , we have _ :¼ { _ _ } ¼ ð _ _ Þ 2 hence _ ¼ _ _ 2
ð40Þ
As K + L can be regarded as the direct sum of K1 and L (see Section VI), _ can be uniquely decomposed in the form _ ¼ #_ 1 þ _ 1
ð41Þ
with #_ 1 in K1 and _ 1 in L. The point ð2#_ 1 ; 2_ 1 Þ is therefore a point ð#_ 1 ; 1 Þ of K L at which the minimum of f1 is attained: #1 ¼ 2#_ 1 and 1 ¼ 2_ 1 . It therefore remains to perform decomposition [Eq. (41)]. First note that U1, the orthogonal projection onto M1 (see Section VI), is equal to U1S. It then follows from Eq. (41) that
U1 #_ 1 ¼ U1 _ hence, from Eq. (40) (since _ is orthogonal to M1), U1 #_ 1 ¼ U1 _ U1 2 But (see Section VI), U1 _ ¼ U1 S _ ¼ U1 ¼
m1 X j¼1
m1 X j¼1
_ 1; j 1; j þ
_ 1; j 1; j þ
pm X1
pm X1
_ 2;i 2;i
i¼1
!
_ 2;i 1;2;i
i¼1
Furthermore, from Eq. (39), U1 2 ¼ U1 S2 ¼ U1 ¼
pm X1
2;i 2;i
i¼1
pm X1
2;i 1;2;i
i¼1
As a result [see Eq. (37)], U1 #_ 1 ¼
m1 X j¼1
_ 1; j 1; j þ
pm X1
_ 2;i 1;2;i
i¼1
As 1;2,i can be expressed as a linear combination of the 1; j (see Appendix 3), it follows that m1 X _ 12; j 1; j U1 #_ 1 ¼ j¼1
in which the _ 12; j can be explicitly determined. But, 1; j ¼ U1 1; j =cj , where the cj s are the elementary divisors of [CK] [Eq. (21)]. Consequently, m1 X _ 12; j with #1; j ¼ #_ 1 ¼ ð42Þ #_ 1; j 1; j cj j¼1 The component _ 1 immediately follows from Eqs. (41) and (40): _ 1 ¼ _ _ 2 #_ 1
ð43Þ
IX. Optimal Model Phase Shift and Related Results According to Eqs. (29), (24), (23), and (22), the search for the optimal model phase shift [Eq. (31)] leads us to minimize the objective functional 2 f22 : K0 ! R; f22 ð’_ Þ :¼ {#_ 1 ’_ }
As a result, we have to identify the nodes of K(Z) closest to the affine space parallel to K0 and passing through #_ 1 . We therefore have to minimize in K(Z) the norm of the projection of #_ 1 onto K+ (the orthogonal complement of K0 in K ). The nodes at which the minimum is attained are defined up to a node of K0(Z). The bulk of the problem is therefore to find a minimum of the functional 2 f23 : KðZÞ ! R; f23 ð Þ :¼ Tþ ð#_ 1 Þ
in which T+ is the projection of K onto K+ (see Section VI). Let 0 and 1 be the components of on K0 (Z) and K1(Z). Expanding #_ 1 and 1 in the forms m1 m1 X X
1; j 1; j
1 :¼ #_ 1; j 1; j #_ 1 :¼ j¼1
j¼1
we have, since þ;1; j :¼ Tþ 1; j [Eq. (17)], m1 X Tþ ð#_ 1 Þ ¼ ð#_ 1; j 1; j Þþ;1; j j¼1
The problem is therefore to minimize the functional 2 m1 X m1 _ f24 : Z ! R; f24 ðr1 Þ :¼ ð#1; j 1; j Þþ;1; j j¼1
In this integer ambiguity problem, referred to as P2, r1 is the vector of Zm1 whose components are the 1, j. Denoting by q_ 1 the vector of Rm1 with components #_ 1; j , we have f24 ðr Þ ¼ q2 ðr q_ 1 Þ ð44Þ 1
1
where q2 is the quadratic form q2 : Rm1 ! R;
2 m1 X q2 ðzÞ :¼ ð jÞ þ;1; j j¼1
ð45Þ
In the standard basis of Rm1 , the matrix elements of q2 are the inner products ðþ;1; j j þ;1; j 0 Þ. Let r1 be the solution of P2, i.e., the point of (Zm1 ; q2 Þ closestPto q_ 1 (see Appendix 4). Clearly, according to the definition 1 _ of f24 ; m j¼1 1; j þ;1; j is the node of lattice Tþ KðZÞ closest to Tþ #1 .
Let us now set #_ 1; j :¼ #_ 1; j 1; j #_ 1 :¼
m1 X j¼1
and
1 :¼
ð1 j m1 Þ
#_ 1; j þ;1; j
m1 X
1; j 1; j
j¼1
ð46Þ ð47Þ
ð48Þ
The nodes of K(Z) at which the minimum of f23 is attained are equal to 1* up to a node of K0(Z). Denoting by ’_ 1 the value of ’_ corresponding to 1*, we have #_ 1 :¼ {#_ 1 ’_ 1 } ¼ ð#_ 1 ’_ 1 Þ 1
hence
’_ 1 ¼ #_ 1 #_ 1 1 ð49Þ The optimal model phase shift # and the bias phase ’ are then, respectively, given by the formulas: # ¼ 2#_ 1 ’ ¼ 2’_ 1 A. Optimal Bias Phase The optimal bias phase is defined as ¼ 2ð_ 1 þ ’_ 1 Þ
in which _ 1 ¼ _ _ 2 #_ 1 and ’_ 1 ¼ #_ 1 #_ 1 1 [see Eqs. (33), (43), and (49)]. As _ 1 þ ’_ 1 ¼ _ #_ 1 _ ð 1 þ 2 Þ, it follows that ¼
# 2ð 1 þ 2 Þ
ð50Þ
B. Optimal Pupil Phase Given r in A, let Br be the operator from Hr into E induced by B (see Sections II.A, II.D, and II.E); Br is invertible. Indeed, Br is injective, and dim Hr ¼ dim E ¼ n 1 The optimal pupil phase, which is defined modulo 2 in Hr, is then given by the formula
¼ B1 r P Note that P*, the projection of * onto E, is none other than the restriction of * to the directed baselines of the corresponding spanning tree. From Eq. (50), we therefore have P ¼ Pð # 2 1 Þ As clarified below, the inverse of Br may be obtained by performing the Smith normal decomposition of Br. Let [Br] be the matrix of Br in the standard bases of Hr and E (see Section II). Note that its entries are equal to 1 or 0. As Br maps Hr onto E, the column vectors of [Br] form a basis of E. The elementary divisors of [Br] are therefore equal to unity (see Appendix 4). As a result, the Smith normal form of [Br] is the identity matrix on Rn1 : In1 . The related decomposition 0 0 is therefore of the form ½In1 ¼ ½Dr ½Br ½Dr with ½Dr ¼ ½In1 , hence 1 ½Br ¼ ½Dr . X. Special Cases In this section, we successively consider the special cases where (1) problem 1 disappears (Section X.A), (2) problem 2 is trivial (Section X.B), and (3) problem 1 disappears and problem 2 is trivial (Section X.C). A. Special Case Where m1 ¼ p As m1 is the rank of CK, m1 is less than or equal to p. In this section, we consider the special case where m1 ¼ p. Note that this is typically the case for nonredundant arrays with K ¼ G. Then, K0 ¼ L and K1 ¼ F , hence m0 ¼ n 1 and therefore m1 ¼ m m0 ¼ q ðn 1Þ ¼ p
The condition m1 ¼ p may also be satisfied in the more general case where m is simply greater than p, i.e., in the case of weakly redundant arrays (see the example given in Section V.B.1). When m1 ¼ p; K þ L coincides with G. Indeed, K þ L ¼ K1 þ L with K1 ¼ F , and G is the direct sum of L and F (see Section III). The phase calibration discrepancy is therefore reduced to zero: ¼ 0 As a result, the integer ambiguity problem P1 disappears, and Eq. (34) collapses to
½
1
¼ ½ 1 ½
ð51Þ
The #-solution in K1 is therefore of the form [compare with Eq. (42)] #1 ¼
p X
1; j
cj
j¼1
1; j
ð52Þ
Solving the integer ambiguity problem P2 then yields #* and 1*. Modulo 2 in L, the optimal bias phase is then given by the formula [see Eq. (50)] ¼
# 2 1
Let us finally note that in the special case of nonredundant arrays with K ¼ G, we have 1; j ¼ j for 1 j p. As the elementary divisors of [CK] are then equal to unity, Eq. (52) then collapses to #1 ¼
p X
ð jÞ
j
j¼1
Furthermore, Kþ ¼ M with þ;1; j ¼ Sj ¼ j . It then follows from the analysis presented in Section IV that, in the standard basis of Rp , the matrix of the quadratic form q2 is the inverse of the variance–covariance matrix [V p] of the closure phases. The search for a reduced basis of lattice (Zp ; q2 ) then corresponds to a decorrelation process (see Appendix 4).
B. Special Case Where m1 ¼ m with m < p When, for a given choice of K, the spectral phase closure operator CK is injective, one says that the interferometric device is of ‘‘full phase,’’ and one speaks of redundant spacing calibration (RSC: Lannes and Anterrieu, 1999). This situation arises when operating on strongly redundant arrays (m < p). Then, K0 ¼ f0g; K1 ¼ K; m0 ¼ 0; m1 ¼ m. In this special case, once the integer ambiguity problem P1 has been solved, the particular solution #1 proves to be of the form #1 ¼
m X j¼1
12; j
cj
1; j
As K0 is reduced to {0}, we then have # ¼ arcð#1 Þ. The integer ambiguity problem P2 is therefore trivial: 2 1 ¼ #1 # . Modulo 2 in L, the optimal bias phase is then given by the formula [see Eq. (50)] ¼
#1 22
C. Special Case Where m1 ¼ m with m ¼ p This situation corresponds to what is called ‘‘critical redundancy’’: a fullphase situation with m ¼ p. In this case, the integer ambiguity problem P1 disappears, and P2 is trivial. We then have # ¼ arcð#1 Þ where (see Sections X.A and X.B) m X 1; j #1 ¼ 1; j c j j¼1 Modulo 2 in L, the optimal bias phase is then given by the formula ¼ #1
XI. Simulated Example The simulation presented in this section concerns the six-element array and the corresponding interferometric graph introduced in Section V.B.2 (see Fig. 7). The object spectral phase was assumed to be known on baselines (1, 2), (3, 4), and (5, 6): #r ¼ 0. The numbers of degrees of freedom of the integer ambiguity problems P1 and P2 are then equal to 3: p m1 ¼ 3; m1 ¼ 3. As specified in Sections II.B and II.C, the weight function $ involved in the definition of the inner product must satisfy the redundancy constraint. In the simulation presented in this section, $ was defined by the following components: $ð1; 2Þ ’ 0:21 $ð1; 5Þ ’ 0:03
$ð1; 3Þ ’ 0:06 $ð1; 6Þ ’ 0:01
$ð1; 4Þ ’ 0:05
P Here, these components are normalized so that ð j;kÞ2B $ð j; kÞ ¼ 1 for the graph shown in Figure 7. All the elements involved in the integer ambiguity problems P1 and P2 can then be easily computed. The basic components of the object spectral phase #o were set equal to the following values:
#o ð1; 2Þ ¼ 0 #o ð1; 5Þ ¼ 10
#o ð1; 3Þ ¼ 172 #o ð1; 6Þ ¼ 15
#o ð1; 4Þ ¼ 40
The experimental baseline phases e ( j, k) were simulated by referring to Eq. (7) with #m ¼ 0, and # ¼ #o . The pupil phases ( j ) were randomly distributed on the trigonometric circle, and the error term e was taken into account by adding Gaussian phase noise; its standard deviation was set equal to 3.9 . The values thus obtained were
e ð1; 2Þ ’ 3:1
e ð1; 3Þ ’ 67:5
e ð1; 5Þ ’ 102:1
e ð1; 6Þ ’ 138:6
e ð3; 5Þ ’ 153:9
e ð5; 6Þ ’ 101:3
hence, for #m 0, the closure phases of ð1Þ
ð4Þ
e ð3; 4Þ ’ 147:8
e ð4; 6Þ ’ 40:5
ð2Þ
’ 19:3
ð5Þ
’ 199:5
2;1
’ 241:9
1;2
ð6Þ
’ 7:4
2;2
’ 468:0
1;3
’ 1:2
e ð3; 6Þ ’ 128:3
The change of variable (34) then gave 1;1
e ð2; 6Þ ’ 161:1
ð3Þ
’ 241:9 ’ 205:7
e ð1; 4Þ ’ 26:6
2;3
’ 15:7
’ 342:1
’ 448:7
’ 361:3
The solution of the integer ambiguity problem P1 proved then to be m2 ¼ bc_ 2 e [see Eqs. (36) and (35)] 2;1 ¼ 0;
2;2 ¼ 0;
2;2 ¼ 1
The norm of the phase calibration discrepancy * was then of the order ^1 was of 0.99 . Solving the integer ambiguity problem P2 was not so easy: r different from bq_ 1 e [see Eqs. (45) and (44)]. However, as P2 was of small dimension (m1 ¼ 3), it was not necessary to search for a reduced basis of (Zm1 ; q2 ) (see Appendix 4): the discrete search algorithm was simply applied to the matrix of q2 in the standard basis of R3 . The ambiguities 1, j thus resolved were the following:
1;1 ¼ 1;
1;2 ¼ 2;
1;2 ¼ 2
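The remark that the solution of P2 differed from the componentwise rounding can be illustrated on a toy problem. The sketch below is not the discrete search algorithm of Appendix 4: it is a brute-force search over a small box of integer points around the rounded vector, which is only adequate for very small dimensions. The Gram matrix and the target point are made-up values chosen to show the effect; they are not taken from the simulation.

```python
from itertools import product

def quad(gram, z):
    """Quadratic form q(z) = z^t [gram] z."""
    n = len(z)
    return sum(gram[i][j] * z[i] * z[j] for i in range(n) for j in range(n))

def nearest_node(gram, x, radius=2):
    """Integer point k minimizing q(k - x) within a box around round(x)."""
    base = [round(v) for v in x]
    best, best_q = None, float("inf")
    for shift in product(range(-radius, radius + 1), repeat=len(x)):
        k = tuple(b + s for b, s in zip(base, shift))
        value = quad(gram, [ki - xi for ki, xi in zip(k, x)])
        if value < best_q:
            best, best_q = k, value
    return best, best_q

# Strongly correlated basis vectors (hypothetical Gram matrix).
gram = [[1.0, 0.9], [0.9, 1.0]]
x = [0.45, 0.40]

rounded = tuple(round(v) for v in x)
rounded_q = quad(gram, [r - xi for r, xi in zip(rounded, x)])
best, best_q = nearest_node(gram, x)
print(rounded, rounded_q, best, best_q)
```

When the basis vectors of the lattice are strongly correlated, the rounded point can be far from optimal for the distance induced by the quadratic form, which is why a search (preferably in a reduced basis for larger dimensions) is needed.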
The optimal model phase shift #* was then found to be characterized by the following spectral components:
# ð1; 2Þ ’ 0:0
# ð1; 5Þ ’ 50:0
# ð1; 3Þ ’ 17:4
# ð1; 4Þ ’ 105:2
# ð1; 6Þ ’ 68:9
Its norm was of the order of 37.8 . In this case, as #m is equal to 0, the constrained model phase #m* coincides with #*. This simulation was completed by computing an optimal bias phase and an optimal calibration phase. Modulo 2 in L,
ð1; 2Þ ’ 3:18 ð1; 5Þ ’ 771:7 ð3; 4Þ ’ 147:3 ð4; 6Þ ’ 382:9
ð1; 3Þ ’ 276:9 ð1; 6Þ ’ 512:5 ð3; 5Þ ’ 494:8 ð5; 6Þ ’ 259:2
ð1; 4Þ ’ 129:6 ð2; 6Þ ’ 509:3 ð3; 6Þ ’ 235:6
and modulo 2 in H1,
ð1Þ ’ 0:0
ð4Þ ’ 129:6
ð2Þ ’ 3:2
ð5Þ ’ 51:7
ð3Þ ’ 83:1
ð6Þ ’ 152:5
In this simulation, as CK was not injective, and the discrepancy between #m and #o was large, the discrepancy between #m* and #o was also large. This situation was selected precisely to illustrate the fact that the phase calibration operation could be performed in any situation (here, obviously, a situation without any physical interest). As a general rule, the situations of physical interest are those for which m2 ¼ bc_ 2 e and r_ 1 ¼ bq_ 1 e [see Eqs. (36), (35) and (45), (44)].
XII. Concluding Comments The problems of integer ambiguity resolution arising in phase closure imaging had been analyzed previously in two extreme situations, nonredundant arrays (Lannes, 2001a) and full-phase arrays (Lannes and Anterrieu, 1999). The present study completes the results already obtained in this field. The corresponding theoretical framework is based on the Smith normal form of the spectral phase closure matrix (see Sections V and VI). In the general case of redundant interferometric graphs, two nearest lattice point problems P must be successively solved: the integer ambiguity problems P1 and P2 [see the context of Eqs. (36), (35) and (45), (44)]. As specified in Appendix 4, a problem such as P is to find the point k of Z closest to a point x of R , the distance being the one induced by a given quadratic form q. One then says that is the number of degrees of freedom of P. In the situations where there exist several k such that qðk x) is of the order of qðk x), phase calibration instabilities may occur. As illustrated in Lannes (1999) and in Lannes and Anterrieu (1999), the problem is then unstable. The number of degrees of freedom of P1 is equal to p m1 , where p denotes the number of loops defined through a given spanning tree of the interferometric graph; m1 is the difference between m, the dimension of the unknown spectral phase space K, and m0, the one of the intersection of K with the bias phase space L. Note that m is the number of spectral phase components to be determined. The number of degrees of freedom of P2 is equal to m1. In the case of full-phase arrays, m1 is equal to m; P2 then proves to be trivial. In the case of nonredundant arrays, m1 is equal to p, so that P1
disappears. With regard to P2, there then exists a particular initialization procedure for the search for the nearest lattice point. As specified in Lannes (2001b), this procedure benefits from the fact that the notion of graph and the related algebra are basically involved in the statement of the problem. This technique can of course be applied directly to weakly redundant situations. The less redundant the array is, the more efficient this initialization procedure. The main result of a phase calibration operation is the optimal model phase shift #* (see Fig. 9). It is important to note that the #* ( j, k) for ( j, k) 2 B depend only on the differences between the closure phases of the data and those of the model. The object model sm involved in Eq. (8) may result from a global image reconstruction process based, for example, on the maximum entropy principle. The data are then the moduli %e( j, k) of the experimental complex visibilities Ve ð j; kÞ %e ð j; kÞ exp½ie ð j; kÞ, and the closure phases ðiÞ e . The phase calibration operation followed by a Fourier synthesis process is then simply used as a refinement technique. When the phase data are the experimental baseline phases e( j, k), as is typically the case in radio imaging, the calibrated phase is then defined by the formula e :¼ e , in which * is an optimal bias phase. The optimal bias phases *( j, k) can be computed (modulo 2), as well as related pupil phases, the optimal calibration phases ( j). An interferometric device is a set of arrays independently observing the same source. The present analysis can easily be extended to such devices.
Appendix 1. Useful Property

Let H be a real Hilbert space, and $\{e_i\}_{i=1}^{n}$ be a basis of H. Given r < n, we denote by $V$ the subspace of H with basis $\{e_i\}_{i=1}^{r}$, by $V_+$ the orthogonal complement of $V$ in H, and by $P_+$ the orthogonal projection of H onto $V_+$.

Property. $\{P_+ e_i\}_{i=r+1}^{n}$ is a basis of $V_+$.

Proof. As $V_+$ is of dimension $n - r$, we simply have to show that $\{P_+ e_i\}_{i=r+1}^{n}$ is a free set of $V_+$. The condition $\sum_{i=r+1}^{n} a_i P_+ e_i = 0$ implies $P_+ \sum_{i=r+1}^{n} a_i e_i = 0$. The vector $\sum_{i=r+1}^{n} a_i e_i$ then lies in $V$. This means that

$$\sum_{i=r+1}^{n} a_i e_i = \sum_{j=1}^{r} b_j e_j$$

for some $b_1, \ldots, b_r$. Setting $a_j := -b_j$ for $1 \le j \le r$, we therefore have $\sum_{i=1}^{n} a_i e_i = 0$. As $\{e_i\}_{i=1}^{n}$ is a basis of H, all the $a_i$ are equal to 0, and in particular $a_{r+1}, \ldots, a_n$. As a result, $\{P_+ e_i\}_{i=r+1}^{n}$ is a free set of $V_+$. ∎
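The property can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: H is taken to be $\mathbb{R}^n$ with the usual inner product, the basis is drawn at random, and the helper name `projected_vectors` is hypothetical.

```python
import numpy as np

def projected_vectors(E, r):
    """Columns of E form a basis of R^n; return P_+ e_i for i = r+1, ..., n."""
    Q, _ = np.linalg.qr(E[:, :r])            # orthonormal basis of V = span(e_1..e_r)
    P_plus = np.eye(E.shape[0]) - Q @ Q.T    # orthogonal projection onto V_+
    return P_plus @ E[:, r:]

rng = np.random.default_rng(0)
n, r = 6, 2
E = rng.normal(size=(n, n))                  # a random basis (invertible almost surely)
F = projected_vectors(E, r)

rank = np.linalg.matrix_rank(F)              # n - r if the projected set is free
in_V_plus = np.allclose(E[:, :r].T @ F, 0.0) # projected vectors are orthogonal to V
```

With these choices, `rank` equals `n - r` and every projected vector is orthogonal to V, which is what the property states.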
Appendix 2. Smith Normal Form of Integral Matrices

Let A be a Z-linear operator from $\mathbb{Z}^{n'}$ into $\mathbb{Z}^{n}$, and [A] be its matrix in the corresponding standard bases; [A] is an $n \times n'$ matrix with coefficients in Z. The proof of the following theorem can be found in many textbooks (see, e.g., Newman, 1972; van der Waerden, 1967).

Theorem. There then exist a basis $E' := \{e'_1, e'_2, \ldots, e'_{n'}\}$ of $\mathbb{Z}^{n'}$ and a basis $E := \{e_1, e_2, \ldots, e_n\}$ of $\mathbb{Z}^{n}$, some integer $r \ge 0$ and positive integers $a_1, a_2, \ldots, a_r$ in Z, with $a_j$ dividing $a_{j+1}$ for $1 \le j < r$, such that $A e'_j = a_j e_j$ for $1 \le j \le r$ and $A e'_j = 0$ for $j > r$; in other words, such that the matrix of A in the bases $E'$ and $E$ is of Smith normal form. More precisely, there then exist two matrices $[W']$ and $[W]$ (of order $n'$ and $n$, respectively) with coefficients in Z and determinant $\pm 1$ such that

$$[W][A][W'] = \begin{pmatrix} \operatorname{diag}(a_1, a_2, \ldots, a_r) & 0 \\ 0 & 0 \end{pmatrix}$$

where the zero blocks complete the matrix to size $n \times n'$.
The aj are called the ‘‘elementary divisors’’ of [A]. Clearly, r is the rank of A, i.e., the dimension of its range. The coefficients of the jth column of 0 [W 0 ] are the components of e0j in the standard basis of Zn , whereas the coefficients of the j th column of ½W 1 are the components of ej in the standard basis of Zn . The constructive processes that perform the Smith normal decomposition of [A] are based on the Euclid extended algorithm (see Cohen, 1996). They provide the elementary divisors of [A], the matrix elements of ½W ; ½W 1 ; ½W 0 , and ½W 0 1 . (All these matrix elements lie in Z.) Appendix 3. Reference Projections In this appendix, we show how to compute 1;2;i and þ;2;i for 1 i p m1 , and þ;1; j for 1 j m1 . For each i, the components of 1;2,i and +;2,i are obtained by minimizing the functional 2 m1 X m1 g1;i : R ! R; g1;i ðxÞ :¼ 2;i xj 1; j j¼1
Clearly, the xj are the components of x in the standard basis of Rm1 . Denoting by x1* the vector x for which the minimum of g1,i is obtained, we then have 1;2;i ¼
m1 X
þ;2;i ¼ 2;i 1;2;i
x1; j 1; j
j¼1
Likewise, for each j, +;1, j can be determined by minimizing the functional 2 m0 X m0 g0; j : R ! R; xk 0;k g0; j ðxÞ :¼ 1; j k¼1
Denoting by x0* the vector x for which the minimum of g0, j is obtained, we then have 0;1; j ¼
m0 X
þ;1; j ¼ 1; j 0;1; j
x0;k 0;k
k¼1
Remark. Let $\psi$ be a vector in a real Hilbert space H with inner product $(\cdot \mid \cdot)$, and $\{e_i\}_{i=1}^{n}$ be a free subset of H. The projection of $\psi$ onto the subspace generated by this subset, $\varphi := P\psi$, is obtained by minimizing the functional

$$g : \mathbb{R}^n \to \mathbb{R}, \qquad g(x) := \Big\| \psi - \sum_{i=1}^{n} x_i e_i \Big\|^2$$

Indeed, denoting by $x^*$ the vector for which the minimum of g is obtained, we have

$$\varphi = \sum_{i=1}^{n} x^*_i e_i$$

Let A be the operator

$$A : \mathbb{R}^n \to H, \qquad Ax = \sum_{i=1}^{n} x_i e_i$$

Minimizing g amounts to solving the normal equation $A^*A\,x = A^*\psi$. The ith component of $A^*\psi$ is then given by the formula

$$(A^*\psi)_i = (e_i \mid \psi)$$

Note that the matrix elements of $A^*A$ are the inner products $(e_i \mid e_{i'})$:

$$a_{i,i'} = (e_i \mid e_{i'})$$
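The normal-equation procedure of the Remark can be sketched in a few lines. This is a minimal illustration, assuming that H is $\mathbb{R}^3$ with the usual inner product and that the free set is stored as the columns of a matrix; the function name `project_onto_span` and the numerical values are hypothetical.

```python
import numpy as np

def project_onto_span(E, psi):
    """Project psi onto the span of the columns of E via the normal equation."""
    G = E.T @ E                      # Gram matrix of inner products (e_i | e_i')
    b = E.T @ psi                    # right-hand side (e_i | psi)
    x_star = np.linalg.solve(G, b)   # minimizer of g
    return E @ x_star, x_star

E = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])           # two free vectors spanning the (x, y) plane
psi = np.array([2.0, 3.0, 5.0])
proj, x_star = project_onto_span(E, psi)
```

Here the projection is the vector (2, 3, 0), obtained with coefficients (-1, 3), and the residual `psi - proj` is orthogonal to both columns of `E`.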
Appendix 4. Closest Point Search in Lattices

The notion of integer ambiguity resolution is associated with the problem of finding the point $\kappa^*$ of $\mathbb{Z}^\nu$ closest to a point $\hat{x}$ of $\mathbb{R}^\nu$, the distance being the one induced by a given quadratic form q. In what follows, $[\kappa]$ and $[\hat{x}]$ denote the column matrices of $\kappa$ and $\hat{x}$ in the standard basis of $\mathbb{R}^\nu$, and [Q] the matrix of q in this basis. The problem is therefore to minimize in $\kappa$ the quantity

$$q(\kappa - \hat{x}) = [\kappa - \hat{x}]^t\,[Q]\,[\kappa - \hat{x}]$$

The definitions of $\nu$, $\kappa$, $\hat{x}$, and q of course depend on the particular problem to be solved. For example, in the integer ambiguity problems stated in Sections VIII and IX, we have, for P1 [see Eqs. (36) and (35)],

$$\nu \equiv p - m_1, \qquad \kappa \equiv m_2, \qquad \hat{x} \equiv \dot{c}_2, \qquad q \equiv q_1$$

and for P2 [see Eqs. (45) and (44)],

$$\nu \equiv m_1, \qquad \kappa \equiv r_1, \qquad \hat{x} \equiv \dot{q}_1, \qquad q \equiv q_2$$

Let $\lfloor \hat{x} \rceil$ denote the vector whose components are the closest rational integers to the components of $\hat{x}$. In many cases, the standard basis of lattice $\mathbb{Z}^\nu$ is far from being orthogonal for the quadratic form q; [Q] is therefore far from being diagonal. As a result, in general, the integer ambiguity vector $\hat{\kappa} := \lfloor \hat{x} \rceil$ is not the solution of the problem: $\kappa^* \ne \hat{\kappa}$.

A. Search for a Reduced Basis

To circumvent this difficulty, one may be led to search for a basis of $\mathbb{Z}^\nu$, $\{e'_i\}_{i=1}^{\nu}$, in which the matrix of q, $[Q']$, is as diagonal as possible. This amounts to exhibiting what is referred to as a reduced basis of lattice $(\mathbb{Z}^\nu, q)$. This operation can be performed by using a well-known algorithm of computational number theory: the Lenstra–Lenstra–Lovász (LLL) algorithm (see Section 2.6 in Cohen, 1996). One is then led to minimize in $\kappa'$ the quantity

$$q(\kappa - \hat{x}) = [\kappa' - \hat{x}']^t\,[Q']\,[\kappa' - \hat{x}']$$

The relationship between $[Q']$ and [Q] is of the form

$$[Q'] = [W]^t\,[Q]\,[W]$$

in which [W] is a matrix (of order $\nu$) with coefficients in Z and determinant $\pm 1$. Then, $[\hat{x}] = [W][\hat{x}']$ and $[\kappa] = [W][\kappa']$. (The current implementations of the LLL algorithm provide [W] and its inverse.) Despite this reduction operation, the minimum of $[\kappa' - \hat{x}']^t [Q'] [\kappa' - \hat{x}']$ may not be attained at $\hat{\kappa}' := \lfloor \hat{x}' \rceil$. One then proceeds as specified below.

B. Discrete Search Process

For clarity, we now omit the primes, and note that

$$\kappa - \hat{x} = \omega - \hat{\omega} \tag{A4.1}$$

with

$$\omega := \kappa - \hat{\kappa}, \qquad \hat{\omega} := \hat{x} - \hat{\kappa} \tag{A4.2}$$

Set

$$\phi(x) := [x - \hat{\omega}]^t\,[Q]\,[x - \hat{\omega}] \tag{A4.3}$$

and consider the ellipsoid

$$E_0 := \{x \in \mathbb{R}^\nu : \phi(x) \le \phi_0\}, \qquad \phi_0 := \phi(0) \tag{A4.4}$$

As $q(\kappa - \hat{x}) = \phi(\omega)$, the problem is to search for the integer ambiguity vector(s) $\omega^*$ at which the minimum of $\phi$ in $\mathbb{Z}^\nu$ is attained. From Eq. (A4.2), $\kappa^*$ is then given by the formula

$$\kappa^* = \hat{\kappa} + \omega^* \tag{A4.5}$$

Clearly, the search for $\omega^*$ can be confined to the points of $\mathbb{Z}^\nu$ contained in $E_0$. Let us now consider the Cholesky factorization of Q:

$$[Q] = [U]^t\,[U]$$

where [U] is an upper triangular matrix with matrix elements $u_{ij}$. It then follows from Eq. (A4.3) that

$$\phi(\omega) = \big([U][\omega - \hat{\omega}]\big)^t \big([U][\omega - \hat{\omega}]\big)$$
We therefore have

$$\phi(\omega) = \sum_{i=1}^{\nu} r_i^2(\omega) \tag{A4.6}$$

in which $r_i^2$ is the contribution of the ith row of [U]:

$$r_i(\omega) := \sum_{j=i}^{\nu} u_{ij}\,\delta^{(j)}, \qquad \delta^{(j)} := \omega^{(j)} - \hat{\omega}^{(j)} \tag{A4.7}$$

Here, $\omega^{(j)}$ is the jth component of $\omega$ (in the selected basis), i.e., the jth integer ambiguity (in this basis), whereas $\hat{\omega}^{(j)}$ is the jth component of $\hat{\omega}$ (in the same basis). Ellipsoid $E_0$ is now searched for candidates for the optimal ambiguity vector $\omega^*$. Following the ideas of the method presented in de Jonge (1998; see also Lannes, 2001b), we first show how to exhibit bounds for ambiguity $\omega^{(i)}$, with the ambiguities $\omega^{(\nu)}$ through $\omega^{(i+1)}$ being already conditioned; the ambiguities $\omega^{(i-1)}$ through $\omega^{(1)}$ are not yet conditioned. In other words, they are implicitly set equal to 0.

Ambiguity Bounds. For clarity, let us set $r_i := r_i(\omega)$. According to Eqs. (A4.4) and (A4.6), the following condition must be satisfied:

$$\begin{cases} r_i^2 + \sum_{\ell=i+1}^{\nu} r_\ell^2 \le \phi_0 & \text{if } i < \nu \\ r_i^2 \le \phi_0 & \text{if } i = \nu \end{cases}$$

Denoting by $y_i$ and $z_i$ the quantities

$$y_i := r_i^2, \qquad z_i := \begin{cases} \phi_0 - \sum_{\ell=i+1}^{\nu} r_\ell^2 & \text{if } i < \nu \\ \phi_0 & \text{if } i = \nu \end{cases} \tag{A4.8}$$

we therefore have

$$y_i \le z_i \tag{A4.9}$$

For $i < \nu$, $z_i$ can be written in the form [see Eq. (A4.8)]

$$z_i = \Big(\phi_0 - \sum_{\ell=i+2}^{\nu} r_\ell^2\Big) - r_{i+1}^2$$

hence the recurrence formula:

$$z_i = z_{i+1} - y_{i+1} \tag{A4.10}$$
Note that

$$y_\nu = \big[u_{\nu\nu}\,\delta^{(\nu)}\big]^2, \qquad z_\nu = \phi_0 \tag{A4.11}$$

As $|r_i| \le \sqrt{z_i}$ [see Eqs. (A4.8) and (A4.9)], we have

$$-\sqrt{z_i} \le r_i \le \sqrt{z_i} \tag{A4.12}$$

Let us now expand $r_i$ in the form [see Eq. (A4.7)]

$$r_i = u_{ii}\,\delta^{(i)} + s_i \tag{A4.13}$$

where

$$s_i := \begin{cases} \sum_{j=i+1}^{\nu} u_{ij}\,\delta^{(j)} & \text{if } i < \nu \\ 0 & \text{if } i = \nu \end{cases} \tag{A4.14}$$

It then follows from Eq. (A4.12) that

$$-\frac{1}{u_{ii}}\big(\sqrt{z_i} + s_i\big) \le \delta^{(i)} \le \frac{1}{u_{ii}}\big(\sqrt{z_i} - s_i\big)$$

hence from Eq. (A4.7):

$$\hat{\omega}^{(i)} - \frac{1}{u_{ii}}\big(\sqrt{z_i} + s_i\big) \le \omega^{(i)} \le \hat{\omega}^{(i)} + \frac{1}{u_{ii}}\big(\sqrt{z_i} - s_i\big) \tag{A4.15}$$

Discrete Search Algorithm. Set $i := \nu$. We then have from Eqs. (A4.15), (A4.14), and (A4.8)

$$\hat{\omega}^{(\nu)} - \frac{\sqrt{\phi_0}}{u_{\nu\nu}} \le \omega^{(\nu)} \le \hat{\omega}^{(\nu)} + \frac{\sqrt{\phi_0}}{u_{\nu\nu}}$$

For each integer ambiguity $\omega^{(\nu)}$ in this interval, one successively computes $r_\nu$, $y_\nu$, and $z_{\nu-1}$. One then uses a program that sets $i := i - 1$, computes $s_i$, and defines the bounds for ambiguity $\omega^{(i)}$. For each possible value of this ambiguity, one then computes $r_i$, $y_i$, and $z_{i-1}$. If i is greater than 1, and $\phi_0 - z_{i-1}$ is smaller than the smallest value of $\phi$ computed so far on the ambiguity tree [see Eq. (A4.8) with $i := i - 1$], one then uses the same program for defining the bounds for ambiguity $\omega^{(i-1)}$. All the ambiguity vectors $\omega$ of interest can thus be identified through the recursive call of the same program.
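The recursive search described above can be sketched as follows. This is a minimal illustration rather than the author's implementation: it assumes that [Q] is symmetric positive definite, skips the LLL reduction step, uses 0-based indices (level `i` in the code is level i + 1 of the text), and the names `closest_lattice_point` and `search` as well as the numerical example are hypothetical.

```python
import math
import numpy as np

def closest_lattice_point(Q, x_hat):
    """Minimize (kappa - x_hat)^t Q (kappa - x_hat) over integer vectors kappa."""
    Q = np.asarray(Q, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    nu = len(x_hat)
    U = np.linalg.cholesky(Q).T               # Q = U^t U with U upper triangular
    kappa_hat = np.rint(x_hat)                # rounded vector
    omega_hat = x_hat - kappa_hat             # second relation of Eq. (A4.2)
    phi0 = float(omega_hat @ Q @ omega_hat)   # phi(0), bound of the ellipsoid
    best = {"phi": phi0, "omega": np.zeros(nu)}
    omega = np.zeros(nu)
    tol = 1e-12

    def search(i, z_i):
        # s_i: contribution of the ambiguities already conditioned, Eq. (A4.14)
        s_i = float(U[i, i + 1:] @ (omega[i + 1:] - omega_hat[i + 1:]))
        root = math.sqrt(max(z_i, 0.0))
        lo = math.ceil(omega_hat[i] - (root + s_i) / U[i, i] - tol)   # Eq. (A4.15)
        hi = math.floor(omega_hat[i] + (root - s_i) / U[i, i] + tol)
        for w in range(lo, hi + 1):
            omega[i] = w
            r_i = U[i, i] * (w - omega_hat[i]) + s_i                  # Eq. (A4.13)
            z_next = z_i - r_i ** 2                                   # Eq. (A4.10)
            partial = phi0 - z_next      # sum of r_l^2 over the conditioned levels
            if partial > best["phi"] + tol:
                continue                 # cannot improve on the best value so far
            if i == 0:
                if partial < best["phi"]:
                    best["phi"], best["omega"] = partial, omega.copy()
            else:
                search(i - 1, z_next)
        omega[i] = 0.0

    search(nu - 1, phi0)
    return (kappa_hat + best["omega"]).astype(int), best["phi"]

# Strongly correlated example in which simple rounding fails.
Q = np.array([[4.0, 3.8], [3.8, 4.0]])
x_hat = np.array([0.45, 0.40])
kappa, phi = closest_lattice_point(Q, x_hat)
```

In this example the rounded vector is (0, 0), whereas the search returns (1, 0), for which the quadratic form takes a much smaller value.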
At level i = 1, we have $\omega_m^{(1)} \le \omega^{(1)} \le \omega_M^{(1)}$ with $\omega_m^{(1)}$ and $\omega_M^{(1)}$ in Z. One then computes

$$r_1 = u_{11}\big(\omega_m^{(1)} - \hat{\omega}^{(1)}\big) + s_1$$

and $y_1 = r_1^2$. According to Eqs. (A4.6) and (A4.8), the value of $\phi$ at the integer ambiguity vector $\omega$ thus conditioned is given by the formula

$$\phi(\omega) = \phi_0 - (z_1 - y_1) \tag{A4.16}$$

When $\omega_M^{(1)}$ is strictly greater than $\omega_m^{(1)}$, i.e., when the remaining candidates are of the form $\omega + n e^{(1)}$ (with $\omega^{(1)} = \omega_m^{(1)}$, $e^{(1)}$ the first vector of the selected basis, and n a positive integer), the corresponding values of q are obtained through the variational formula

$$q\big((\omega + n e^{(1)}) - \hat{\omega}\big) = \phi(\omega) + 2(n u_{11})\,r_1 + (n u_{11})^2 \tag{A4.17}$$

Indeed,

$$q\big[(\omega - \hat{\omega}) + n e^{(1)}\big] = q(\omega - \hat{\omega}) + 2n\,[\omega - \hat{\omega}]^t [Q] [e^{(1)}] + n^2 q(e^{(1)})$$

in which $q(\omega - \hat{\omega}) = \phi(\omega)$, $[\omega - \hat{\omega}]^t [Q] [e^{(1)}] = u_{11} r_1$, and $q(e^{(1)}) = u_{11}^2$.

References

Biggs, N. (1996). Algebraic Graph Theory. 2nd ed. Cambridge: Cambridge University Press.
Born, M., and Wolf, E. (1970). Principles of Optics. Oxford: Pergamon Press.
Cohen, H. (1996). A Course in Computational Algebraic Number Theory. Berlin: Springer-Verlag.
Cornwell, T. J., and Wilkinson, P. N. (1981). A new method for making maps with unstable interferometers. Mon. Not. R. Astron. Soc. 196, 1067–1086.
de Jonge, P. J. (1998). A processing strategy for the application of the GPS in networks. Publication on Geodesy, Vol. 46. Delft: NCG Nederlandse Commissie voor Geodesie (Netherlands Geodetic Commission).
Hunt, G., and Payne, H. E. (1997). Astronomical Data Analysis Software and Systems VI. San Francisco: Astronomical Society of the Pacific.
Lannes, A. (1999). Phase calibration on interferometric graphs. J. Opt. Soc. Am. A 16, 443–454.
Lannes, A. (2001a). Integer ambiguity resolution in phase closure imaging. J. Opt. Soc. Am. A 18, 1046–1055.
Lannes, A. (2001b). Résolution d'ambiguïtés entières sur graphes interférométriques et GPS. C. R. Acad. Sci. Paris 333, Sér. I, 707–712.
Lannes, A., and Anterrieu, E. (1999). Redundant spacing calibration: Phase restoration methods. J. Opt. Soc. Am. A 16, 2866–2879.
Lannes, A., Anterrieu, E., and Bouyoucef, K. (1994). Fourier interpolation and reconstruction via Shannon-type techniques; Part I: Regularization principle. J. Mod. Opt. 41, 1537–1574.
Lannes, A., Anterrieu, E., and Bouyoucef, K. (1996). Fourier interpolation and reconstruction via Shannon-type techniques; Part II: Technical developments and applications. J. Mod. Opt. 43, 105–138.
Lannes, A., Anterrieu, E., and Maréchal, P. (1997). Clean and wipe. Astron. Astrophys. Suppl. Ser. 123, 183–198.
Newman, M. (1972). Integral Matrices. New York: Academic Press.
Reasenberg, R. D. (1998). Proceedings of the SPIE Meeting on Astronomical Interferometry, Kona (Hawaii), SPIE 3350.
van der Waerden, B. L. (1967). Algebra. Berlin: Springer-Verlag.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 126
Three-Dimensional Image Processing and Optical Scanning Holography TING-CHUNG POON Optical Image Processing Laboratory, Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061
I. Introduction
II. Two-Pupil Optical Heterodyne Scanning
    A. Heterodyning Theory
    B. Coherency Considerations
    C. Special Cases: Fluorescent Specimens or Incoherently Reflecting Rough Surfaces
    D. Detection Schemes
III. Three-Dimensional Imaging Properties
IV. Optical Scanning Holography
    A. Cosine, Sine, and Complex Hologram
    B. 3D Image Reconstruction
V. Concluding Remarks
References
I. Introduction

Optical scanning holography (OSH) (Poon, 1985) was invented as a clever application of the pupil interaction processing technique (Poon and Korpel, 1979), which is unique in extending incoherent image processing to include the implementations of bipolar or even complex point-spread functions (Lohmann and Rhodes, 1978; Mait, 1987; Poon, 1985; Poon and Korpel, 1979; Stoner, 1978). One of the two-pupil processing techniques, namely the use of a pupil interaction scheme in a scanning illumination mode, has been developed extensively (Indebetouw and Poon, 1992). The pupil interaction scheme has been implemented by optical heterodyne scanning (Poon and Korpel, 1979) and has been used for many interesting applications such as textural edge extraction and tunable and three-dimensional (3D) filtering (Poon et al., 1988, 1990). When we drastically modify one of the pupils relative to the other (specifically, one of the pupils is an open mask and the other is a pinhole mask) and defocus the optical system, we end up with an optical scanning system capable of holographic recording of the object being scanned, and thus the invention of OSH (Poon, 1985). Indeed, OSH was invented to acquire holographic information through active two-dimensional optical scanning. Scanning holographic microscopy (Indebetouw et al., 2000; Poon, 1985; Poon et al., 1996; Schilling et al., 1997; Swoger et al., 2002), optical recognition of 3D objects (Kim and Poon, 2000; Poon and Kim, 1999), 3D holographic display (Poon, 2002), and 3D optical remote sensing (Kim et al., 2002; Klysubun et al., 2000; Schilling and Templeton, 2001) are some of its most recent developments. Among its many applications, holographic microscopy has been developed quite extensively owing to its important applications in 3D imaging of biological specimens (Kim, 1999; Poon et al., 1995; Zhang and Yamaguchi, 1998). Indeed, some properties of a scanning holographic microscope have been outlined recently (Indebetouw et al., 2000), and numerical simulations have shown that point-spread functions leading to different imaging functionalities (e.g., enhanced spatial resolution, extended depth of focus, or optical sectioning) can be expected with proper choices of pupil functions (Indebetouw, 2002).

The purpose of this article is to introduce OSH through the development of a two-pupil optical heterodyne scanning image processor, which is discussed in Section II. In Section III, 3D imaging properties in terms of the two pupils are developed, and 3D point-spread functions (PSFs) are subsequently derived. We then compare the developed PSFs with those obtained with conventional 3D image processing. In Section IV, we discuss OSH as a simple example of the optical heterodyne scanning image processor. We then introduce the so-called sine-Fresnel zone plate (FZP) hologram, cosine-FZP hologram, and complex hologram and subsequently, in that section, discuss 3D reconstruction. Finally, in Section V, we make some concluding remarks.
II. Two-Pupil Optical Heterodyne Scanning

A. Heterodyning Theory

A typical two-pupil heterodyne optical scanning image processor is shown in Figure 1. We model the 3D object as a stack of transverse slices spanning a longitudinal range $\Delta z \ll z_0$, where $\Delta z$ is the thickness of the 3D object. Each slice of the object is represented by an amplitude transmittance $T(x, y; z)$, which is thin and weakly scattering. We place the 3D object in front of the Fourier transform lens L2. The 3D object is then two-dimensionally (2D) scanned by two copropagating beams that have been modified by the two pupils located in the pupil planes as shown. The two optical beams have different temporal frequencies $\omega_0$ and $\omega_0 + \Omega$, as indicated in Figure 1. $p_1(x, y)$ is the impulse response, located on the back focal plane of lens L1, due to the beam of frequency $\omega_0$, and $p_2(x, y)$ is the impulse response due to the beam of frequency $\omega_0 + \Omega$. The two impulse responses are combined by beamsplitter BS1 and propagate over a distance $z_0 + z$ to scan over the amplitude transmittance $T(x, y; z)$, as shown in Figure 1. Two-dimensional scanning can be done by moving the 3D object along the x and y directions while keeping the two beams fixed, or by moving the beams.

Figure 1. Two-pupil heterodyne scanning image processor.

Mathematically, the amplitude distribution of the two beams just before the slice at position z is given by

$$P_1(x, y; z + z_0)\exp(j\omega_0 t) + P_2(x, y; z + z_0)\exp[j(\omega_0 + \Omega)t]$$

where $P_i(x, y; z + z_0) = p_i(x, y) \ast h(x, y; z + z_0)$ with i = 1 or 2, $\ast$ denotes the 2D convolution, defined as $g_1(x, y) \ast g_2(x, y) = \iint g_1(x', y')\,g_2(x - x', y - y')\,dx'\,dy'$, and

$$h(x, y; z) = \exp(-jk_0 z)\,\frac{jk_0}{2\pi z}\exp\!\Big[-\frac{jk_0}{2z}\big(x^2 + y^2\big)\Big] = \exp(-jk_0 z)\,h_e(x, y; z) \tag{1}$$

is the free-space spatial impulse response in Fourier optics (Goodman, 1996; Poon and Banerjee, 2001), where $k_0$ is the wavenumber of the light. The field just after the object slice is $\{P_1(x', y'; z + z_0)\exp(j\omega_0 t) + P_2(x', y'; z + z_0)\exp[j(\omega_0 + \Omega)t]\}\,T(x' + x, y' + y; z)$, where $x = x(t)$ and $y = y(t)$ represent the instantaneous 2D position of the object. This field then propagates through the Fourier transform lens L2 and reaches the mask, $M(x, y)$, located in the back focal plane of lens L2. Hence the field distribution exiting from the mask, due to all the slices of the object $T(x, y; z)$, is

$$\psi(x, y; x_m, y_m) \propto \Big\{\int \{P_1(x', y'; z + z_0)\exp(j\omega_0 t) + P_2(x', y'; z + z_0)\exp[j(\omega_0 + \Omega)t]\}\,T(x' + x, y' + y; z)\,\exp\!\Big[j\frac{k_0}{f}(x' x_m + y' y_m)\Big]dx'\,dy'\,\exp\!\Big[-j\frac{k_0 z}{2f^2}\big(x_m^2 + y_m^2\big)\Big]dz\Big\}\,M(x_m, y_m) \tag{2}$$

where $x_m$ and $y_m$ are the coordinates in the plane of the mask, and z is the distance of the object slice measured from the front focal plane of lens L2. The integration over z represents the volumetric effect due to the 3D object. Finally, the transmission photodetector, PD, which responds to intensity, gives the current output $i(x, y)$ by spatially integrating the intensity:

$$i(x, y) \propto \int |\psi(x, y; x_m, y_m)|^2\,dx_m\,dy_m \tag{3}$$

where the integration is over the area of the photodetector. $i(x, y)$ consists of a baseband current and a heterodyne current at frequency $\Omega$. After some manipulations, the heterodyne current $i_\Omega(x, y)$ is given by

$$\begin{aligned} i_\Omega(x, y) \propto \int &\big[P_1(x', y'; z' + z_0)P_2^*(x'', y''; z'' + z_0)\exp(-j\Omega t) + P_2(x', y'; z' + z_0)P_1^*(x'', y''; z'' + z_0)\exp(j\Omega t)\big] \\ &\times \exp\!\Big\{j\frac{k_0}{f}\big[x_m(x' - x'') + y_m(y' - y'')\big]\Big\}\exp\!\Big[-j\frac{k_0(z' - z'')}{2f^2}\big(x_m^2 + y_m^2\big)\Big] \\ &\times T(x' + x, y' + y; z')\,T^*(x'' + x, y'' + y; z'')\,|M(x_m, y_m)|^2\,dx'\,dy'\,dx''\,dy''\,dz'\,dz''\,dx_m\,dy_m \end{aligned} \tag{4}$$

This heterodyne current contains the processed output of the two-pupil heterodyning image processor. The information to be processed is the 3D object, $T(x, y; z)$. Different processing operations can be expected with choices of the pupils, $p_1(x, y)$ and $p_2(x, y)$, as well as the mask, $M(x, y)$, located on the back focal plane of lens L2. The expression in Eq. (4) is formidable, and we shall break it down and analyze its consequences by manipulating the pupils and the mask.

B. Coherency Considerations

As in conventional optical scanning image processing (Indebetouw, 1981), the coherency of the two-pupil scanning system can be modified by changing the mask, $M(x_m, y_m)$. To see that this is the case, we rearrange Eq. (4) as

$$\begin{aligned} i_\Omega(x, y) \propto \int &\big[P_1(x', y'; z' + z_0)P_2^*(x'', y''; z'' + z_0)\exp(-j\Omega t) + P_2(x', y'; z' + z_0)P_1^*(x'', y''; z'' + z_0)\exp(j\Omega t)\big] \\ &\times T(x' + x, y' + y; z')\,T^*(x'' + x, y'' + y; z'')\,\Gamma(x' - x'', y' - y''; z' - z'')\,dx'\,dy'\,dx''\,dy''\,dz'\,dz'' \end{aligned} \tag{5}$$

where
$$\Gamma(x' - x'', y' - y''; z' - z'') = \int |M(x_m, y_m)|^2 \exp\!\Big\{j\frac{k_0}{f}\big[x_m(x' - x'') + y_m(y' - y'')\big]\Big\}\exp\!\Big[-j\frac{k_0(z' - z'')}{2f^2}\big(x_m^2 + y_m^2\big)\Big]dx_m\,dy_m \tag{6}$$

The function $\Gamma(x' - x'', y' - y''; z' - z'')$ plays the same role in scanning imaging as the coherence function does in conventional partially coherent imaging. It can thus be interpreted as a measure of the correlation of the field at $(x', y', z')$ and the field at $(x'', y'', z'')$. We can study the coherency of the processor by investigating a special case of Eq. (5) in greater detail. For a pinhole mask centered on the axis, we let $M(x, y) = \delta(x, y)$, leading to $\Gamma(x' - x'', y' - y''; z' - z'') = 1$. Equation (5) then becomes

$$\begin{aligned} i_\Omega(x, y) \propto \int &\big[P_1(x', y'; z' + z_0)P_2^*(x'', y''; z'' + z_0)\exp(-j\Omega t) + P_2(x', y'; z' + z_0)P_1^*(x'', y''; z'' + z_0)\exp(j\Omega t)\big] \\ &\times T(x' + x, y' + y; z')\,T^*(x'' + x, y'' + y; z'')\,dx'\,dy'\,dx''\,dy''\,dz'\,dz'' \end{aligned} \tag{7}$$

Rearranging Eq. (7), we have

$$i_\Omega(x, y) \propto \mathrm{Re}\Big\{\Big[\int P_2(x', y'; z' + z_0)\,T(x' + x, y' + y; z')\,dx'\,dy'\,dz'\Big]\Big[\int P_1^*(x'', y''; z'' + z_0)\,T^*(x'' + x, y'' + y; z'')\,dx''\,dy''\,dz''\Big]\exp(j\Omega t)\Big\} \tag{8}$$

where Re[·] denotes the real part. For simplicity, we let $p_1(x, y) = 1$ and leave $p_2(x, y)$ as is. In this situation, $\int P_1^*(x'', y''; z'' + z_0)\,T^*(x'' + x, y'' + y; z'')\,dx''\,dy''\,dz''$ is some constant, and Eq. (8) becomes

$$i_\Omega(x, y) \propto \mathrm{Re}\big[i_{\Omega p}(x, y)\exp(j\Omega t)\big] \tag{9}$$

where $i_{\Omega p}(x, y) = \int P_2(x', y'; z' + z_0)\,T(x' + x, y' + y; z')\,dx'\,dy'\,dz'$. We can thus process the object's amplitude transmittance by the pupil $p_2(x, y)$. If we now let the mask become an open mask, i.e., $M(x_m, y_m) = 1$, we obtain, from Eq. (6),

$$\Gamma(x' - x'', y' - y''; z' - z'') = \int \exp\!\Big\{j\frac{k_0}{f}\big[x_m(x' - x'') + y_m(y' - y'')\big]\Big\}\exp\!\Big[-j\frac{k_0(z' - z'')}{2f^2}\big(x_m^2 + y_m^2\big)\Big]dx_m\,dy_m \tag{10}$$
The quadratic term in Eq. (10) represents a spherical wave with a radius of curvature $R = f^2/(z' - z'')$, which can be made arbitrarily large. Thus,

$$\Gamma(x' - x'', y' - y''; z' - z'') \sim \frac{1}{z' - z''}\exp\!\Big\{\frac{jk_0\big[(x' - x'')^2 + (y' - y'')^2\big]}{2(z' - z'')}\Big\} \sim \delta(x' - x'', y' - y'', z' - z'') \tag{11}$$

By substituting this result into Eq. (5), we have, after some manipulations,

$$i_\Omega(x, y) \propto \mathrm{Re}\Big[\int P_1^*(x', y'; z + z_0)\,P_2(x', y'; z + z_0)\,|T(x' + x, y' + y; z)|^2\,dx'\,dy'\,dz\,\exp(j\Omega t)\Big] \tag{12}$$

This corresponds to an incoherent processing system, as only the intensity values are processed. Note, however, that the intensity can be processed by the two pupils $p_1(x, y)$ and $p_2(x, y)$.

C. Special Cases: Fluorescent Specimens or Incoherently Reflecting Rough Surfaces

For the special but important cases of self-luminous objects, such as fluorescent specimens in biology, or incoherently reflecting rough surfaces as encountered in remote sensing, the beamsplitter BS2 and the reflection detector, as shown in Figure 1, are used for collecting energy scattered by the object. This energy, integrated by the detector, is thus equivalent to the integrated energy immediately behind the object. In this situation, the scanned current at the output of the reflection detector becomes

$$i_r(x, y) \propto \int \big|\{P_1(x', y'; z + z_0)\exp(j\omega_0 t) + P_2(x', y'; z + z_0)\exp[j(\omega_0 + \Omega)t]\}\,T(x' + x, y' + y; z)\big|^2\,dz\,dx'\,dy' \tag{13}$$

The integration is over the photodetector's active surface, which is assumed sufficiently large. The heterodyne current from Eq. (13) is given by

$$i_{r,\Omega}(x, y) \propto \mathrm{Re}\Big[\int P_1^*(x', y'; z + z_0)\,P_2(x', y'; z + z_0)\,|T(x' + x, y' + y; z)|^2\,dx'\,dy'\,dz\,\exp(j\Omega t)\Big] \tag{14}$$

Note that from Eq. (14), only the intensity distribution is processed, and the result is identical to that of Eq. (12). In summary, Eqs. (9) and (14) are the core results of the heterodyne scanning image processor and these currents
represent scanned and processed [by $p_1(x, y)$ and/or $p_2(x, y)$] versions of the 3D object $T(x, y; z)$ or $|T(x, y; z)|^2$, depending on the situation encountered. However, Eq. (14) has been applied for most of the applications encountered thus far with the heterodyne scanning processor, e.g., 3D fluorescence microscopy and 3D remote sensing.

D. Detection Schemes

The processed information can be obtained from Eq. (9) or (12) using a number of demodulation or filtering schemes. Using the incoherent case as an example, we can rewrite Eq. (12) as

$$i_\Omega(x, y) \propto \mathrm{Re}\big[i_{\Omega p}(x, y)\exp(j\Omega t)\big] = |i_{\Omega p}(x, y)|\cos(\Omega t + \theta) \tag{15}$$

where $i_{\Omega p}(x, y) = \int P_1^*(x', y'; z + z_0)\,P_2(x', y'; z + z_0)\,|T(x' + x, y' + y; z)|^2\,dx'\,dy'\,dz$ and $\theta$ denotes the phase of $i_{\Omega p}(x, y)$. When $i_\Omega(x, y)$ is demodulated according to the scheme shown in Figure 2, where LPF denotes electronic lowpass filtering, we have two demodulated currents, $i_d(x, y) = |i_{\Omega p}(x, y)|\cos\theta$ and $i_q(x, y) = -|i_{\Omega p}(x, y)|\sin\theta$. When these currents are displayed in synchronization with the xy-scanner, we have two processed images of $|T|^2$. If we further perform the complex addition of the two currents $i_d(x, y)$ and $i_q(x, y)$, we can fully recover the processed complex information:

$$i_d(x, y) - j\,i_q(x, y) = i_{\Omega p}(x, y) \tag{16}$$

Figure 2. Electronic demodulation system.

The complex addition is conveniently done by a digital computer. Interestingly, the complex conjugate of $i_{\Omega p}(x, y)$, i.e., $i_{\Omega p}^*(x, y)$, can be obtained if we use $+j$ instead of $-j$ in the above addition. This leads to a different physical interpretation of the processed images, as is shown in the following sections.
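The demodulation and complex addition can be illustrated numerically for a single scan position. The sketch below rests on several assumptions that are not taken from the text: the heterodyne frequency, the sampling grid, and the pixel value are hypothetical, the lowpass filter is replaced by an average over a whole number of periods, and the quadrature channel is taken to mix with $\sin\Omega t$, so that $i_q = -|i_{\Omega p}|\sin\theta$.

```python
import numpy as np

def demodulate(i_omega, t, Omega):
    """Mix with cos and sin references; averaging over full periods acts as the LPF."""
    i_d = 2.0 * np.mean(i_omega * np.cos(Omega * t))   # in-phase output
    i_q = 2.0 * np.mean(i_omega * np.sin(Omega * t))   # quadrature output
    return i_d, i_q

Omega = 2 * np.pi * 1.0e3                  # hypothetical heterodyne frequency (rad/s)
t = np.arange(20000) * 1.0e-6              # exactly 20 periods, 1000 samples each
i_p = 0.7 * np.exp(1j * 0.9)               # hypothetical complex value at one pixel
i_omega = np.real(i_p * np.exp(1j * Omega * t))   # |i_p| cos(Omega t + theta)

i_d, i_q = demodulate(i_omega, t, Omega)
recovered = i_d - 1j * i_q                 # complex addition
conjugate = i_d + 1j * i_q                 # opposite sign yields the conjugate
```

Under these assumptions `recovered` reproduces `i_p` and `conjugate` reproduces its complex conjugate, up to rounding error.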
III. Three-Dimensional Imaging Properties

In two-pupil processing systems, we have a choice of using $p_1(x, y)$ and $p_2(x, y)$ to process our information (the 3D object), and for simplicity, let us choose $p_2(x, y) = 1$ and leave $p_1(x, y)$ as an arbitrary distribution. For the case of processing intensity objects, we use Eq. (15); the processed information $i_{\Omega p}(x, y)$ then becomes

$$\begin{aligned} i_{\Omega p}(x, y) &= \int \big[p_1(x', y') \ast h(x', y'; z + z_0)\big]^* \exp[-jk_0(z + z_0)]\,|T(x' + x, y' + y; z)|^2\,dx'\,dy'\,dz \\ &= \int \big[p_1(x, y) \ast h_e(x, y; z + z_0)\big] \otimes |T(x, y; z)|^2\,dz \end{aligned} \tag{17}$$

where $h_e$ has been defined in Eq. (1) and $\otimes$ denotes correlation involving coordinates x and y, defined as

$$g_1(x, y) \otimes g_2(x, y) = \iint g_1^*(x', y')\,g_2(x + x', y + y')\,dx'\,dy' \tag{18}$$

Again, $i_{\Omega p}(x, y)$ is a 2D record of the processed 3D information. If we display this using a spatial light modulator (SLM), upon illumination by a plane wave we have the diffracted field distribution at a distance z away, given by

$$\begin{aligned} i_{\Omega p}(x, y) \ast h(x, y; z) &= \int \big\{\big[p_1(x, y) \ast h_e(x, y; z' + z_0)\big] \otimes |T(x, y; z')|^2\big\}\,dz' \ast h(x, y; z) \\ &= \int p_1^*(-x, -y) \ast h_e^*(x, y; z' + z_0) \ast |T(x, y; z')|^2\,dz' \ast h(x, y; z) \end{aligned} \tag{19}$$

In writing the last line, we have changed the correlation integral to the convolution integral by the relation $f_1(x, y) \otimes f_2(x, y) = f_1^*(-x, -y) \ast f_2(x, y)$ and used the fact that $h_e(x, y)$ is an even function in x and y. Now by carrying out the convolution between $h_e^*$ and h in Eq. (19), we have
$$\begin{aligned} i_{\Omega p}(x, y) \ast h(x, y; z) &= \int p_1^*(-x, -y) \ast \exp(-jk_0 z)\,h_e(x, y; z - z' - z_0) \ast |T(x, y; z')|^2\,dz' \\ &= \int p_1^*(-x, -y) \ast h(x, y; z - z' - z_0) \ast |T(x, y; z')|^2 \exp[-jk_0(z' + z_0)]\,dz' \\ &= \exp(-jk_0 z_0)\Big\{\big[p_1^*(-x, -y) \ast h(x, y; z - z_0)\big] \circledast \big[|T(x, y; z)|^2 \exp(-jk_0 z)\big]\Big\} \end{aligned} \tag{20}$$

where we have recognized the convolution in z, and $\circledast$ denotes a 3D convolution involving x, y, and z, defined as

$$g_1(x, y, z) \circledast g_2(x, y, z) = \iiint g_1(x', y', z')\,g_2(x - x', y - y', z - z')\,dx'\,dy'\,dz'$$

Note that the 3D convolution involves $|T(x, y; z)|^2 \exp(-jk_0 z)$, which is called the effective object function (Gu, 2000). The term $\exp(-jk_0 z)$ represents a defocus phase factor due to the thickness of the object. The 3D PSFs can be found by taking $|T(x, y; z)|^2 = \delta(x, y, z)$. From Eq. (20), we find, aside from some constant, the 3D intensity or incoherent PSF (IPSF):

$$\mathrm{IPSF}(x, y, z) = p_1^*(-x, -y) \ast h(x, y; z - z_0) \tag{21}$$

Similarly, if we use $i_{\Omega p}^*(x, y)$ instead to be displayed on the SLM, we obtain

$$i_{\Omega p}^*(x, y) \ast h^*(x, y; z) = \exp(jk_0 z_0)\Big\{\big[p_1(-x, -y) \ast h^*(x, y; z - z_0)\big] \circledast \big[|T(x, y; z)|^2 \exp(jk_0 z)\big]\Big\} \tag{22}$$

and its corresponding IPSF is

$$\mathrm{IPSF}(x, y, z) = p_1(-x, -y) \ast h^*(x, y; z - z_0) \tag{23}$$

For the coherent case, we return to Eq. (9), which is repeated below for convenience:

$$i_\Omega(x, y) \propto \mathrm{Re}\big[i_{\Omega p}(x, y)\exp(j\Omega t)\big] \tag{24}$$

where

$$\begin{aligned} i_{\Omega p}(x, y) &= \int P_2(x', y'; z' + z_0)\,T(x' + x, y' + y; z')\,dx'\,dy'\,dz' \\ &= \int \big[p_2(x', y') \ast h(x', y'; z' + z_0)\big]\,T(x' + x, y' + y; z')\,dx'\,dy'\,dz' \\ &= \int \big[p_2(x, y) \ast h(x, y; z + z_0)\big]^* \otimes T(x, y; z)\,dz \end{aligned}$$
3D IMAGE PROCESSING AND OPTICAL SCANNING HOLOGRAPHY
339
This equation takes a similar form of i p ðx; yÞ by inspecting Eq. (17). Hence, if i p ðx; yÞ is the recorded information on SLM, upon illuminatin by a plane wave, we find i p ðx; yÞ h ðx; y; zÞ ¼ ½ p2 ðx; yÞ h ðx; y; z z0 Þ Tðx; y; zÞ ð25Þ and hence the 3D coherent PSF (CPSF) is given by, for T (x, y; z) ¼ (x, y; z), CPSF ðx; y; zÞ ¼ p2 ðx; yÞ h ðx; y; z z0 Þ
ð26Þ
At this point, it is instructive to compare these results with those obtained from a conventional 3D imaging system. Figure 3 shows the imaging of a 3D object with a 4f optical system, where s (x, y) represents a pupil function on the pupil plane. The object is assumed to be a dilute (weakly scattering) transparent object, or a diffusely reflecting surface without shadowing or vignetting. In this situation, the imaging is linear (or bilinear if partially coherent), and a pointspread function can be defined (Thompson, 1969). To obtain the 3D PSF, we let an on-axis point source in the front focal plane of lens L1 and investigate the spread function at the back focal plane of lens 2. Because the point source will give a uniform plane wave on the pupil plane, and hence the field distribution on the back focal plane of lens 2 is proportional to F fsðx; yÞgkx ¼k0 x=f ; ky ¼k0 y=f ¼ pðx; yÞ, where ZZ uðx; yÞ expð jkx x þ jky yÞdx dy F fuðx; yÞgkx ; ky ¼ denotes the Fourier transform of u(x, y) with kx and ky denoting spatial frequencies (Poon and Banerjee, 2001). p(x, y) is the so-called in-focus pointspread function of the optical system. To find the 3D CPSF, we let p(x, y) proprogate a distance z, and the CPSF is
Figure 3. Conventional 4f 3D imaging system.
$$\mathrm{CPSF}(x, y; z) = p(x, y) \ast h(x, y; z) \tag{27}$$
If the illumination light source is incoherent, we take the squared magnitude of the CPSF to obtain a 3D incoherent or intensity point-spread function (IPSF), i.e., $\mathrm{IPSF}(x, y; z) = |\mathrm{CPSF}(x, y; z)|^2$. It is important to note that the CPSF of the two-pupil optical heterodyne scanning system [see Eq. (26)] is very similar to the conventional coherent PSF [see Eq. (27)], and that the IPSFs of the two-pupil optical heterodyne scanning system, as given by Eqs. (21) and (23), also have a form similar to the 3D coherent PSF in Eq. (27). Because these PSFs are complex in general, we can perform complex 3D filtering on incoherent 3D objects. In fact, this is one of the most important attributes of two-pupil processing, as the resulting 3D PSFs are no longer restricted to being real and positive.
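The relation between the pupil and the in-focus PSFs of the conventional 4f system can be checked numerically. The following is a minimal sketch, assuming a circular pupil of arbitrary radius on a 128 x 128 grid; the function name `in_focus_psf` and all parameter values are illustrative and do not come from the text. NumPy's FFT uses the opposite sign in the exponent from the transform defined above, which makes no difference for a symmetric pupil.

```python
import numpy as np

def in_focus_psf(pupil):
    """In-focus coherent PSF p(x, y): 2D Fourier transform of the pupil s(x, y)."""
    # ifftshift/fftshift keep the optical axis at the centre of the array.
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))

N = 128                                            # grid size (assumption)
yy, xx = np.mgrid[-N // 2:N // 2, -N // 2:N // 2]
pupil = (xx**2 + yy**2 <= 16**2).astype(float)     # hypothetical circular aperture

cpsf0 = in_focus_psf(pupil)     # CPSF at z = 0, i.e. p(x, y)
ipsf0 = np.abs(cpsf0) ** 2      # incoherent PSF: squared magnitude of the CPSF

peak = tuple(int(i) for i in np.unravel_index(np.argmax(ipsf0), ipsf0.shape))
```

The resulting `ipsf0` is real and non-negative by construction, which is the restriction that the two-pupil system removes.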
IV. Optical Scanning Holography

A. Cosine, Sine, and Complex Hologram

In OSH, the object is scanned by choosing $p_1(x, y) = \delta(x, y)$ and $p_2(x, y) = 1$. We shall discuss the incoherent case as an example, and hence Eq. (15) will be used. With these choices of $p_1(x, y)$ and $p_2(x, y)$, $i_d(x, y)$ and $i_q(x, y)$, i.e., the outputs of the electronic system shown in Figure 2, become
$$\begin{aligned}
i_d(x, y) &= \mathrm{Re}\left[ \int h_e^*(x, y; z + z_0) \otimes |T(x, y; z)|^2\, dz \right] \\
&= \int \frac{k_0}{2\pi(z + z_0)} \sin\left[ \frac{k_0}{2(z + z_0)} (x^2 + y^2) \right] \otimes |T(x, y; z)|^2\, dz \\
&= H_{\sin}(x, y)
\end{aligned} \tag{28}$$
and
$$\begin{aligned}
i_q(x, y) &= \mathrm{Im}\left[ \int h_e^*(x, y; z + z_0) \otimes |T(x, y; z)|^2\, dz \right] \\
&= -\int \frac{k_0}{2\pi(z + z_0)} \cos\left[ \frac{k_0}{2(z + z_0)} (x^2 + y^2) \right] \otimes |T(x, y; z)|^2\, dz \\
&= -H_{\cos}(x, y)
\end{aligned} \tag{29}$$
$H_{\sin}(x, y)$ and $H_{\cos}(x, y)$ are called the sine-Fresnel zone plate (FZP) hologram and cosine-FZP hologram of the object $|T(x, y; z)|^2$, respectively (Poon and Kim, 1999). If we let $|T(x, y; z)|^2 = \delta(x, y; z)$, i.e., a point source at the origin of the axis as shown in Figure 1, then $H_{\sin}(x, y) \propto \sin\left[\frac{k_0}{2 z_0}(x^2 + y^2)\right]$ and $H_{\cos}(x, y) \propto \cos\left[\frac{k_0}{2 z_0}(x^2 + y^2)\right]$. These are the well-known zone plate expressions, which are the holograms of a point source (Collier et al., 1971). Figure 4a and b shows the sine and cosine holograms of the white text "Virginia Tech" on a black background, respectively. In the simulations, the text has been modeled as $|T(x, y; z)|^2 = I(x, y)\delta(z)$, i.e., it is located at z = 0, or $z_0$ away from the scanning point source (see Fig. 1), where I(x, y) denotes the intensity distribution of the text. Figure 4c and d shows their reconstructions, respectively. Reconstructions are simply done by convolving the holograms with the free-space impulse response matched to the depth parameter $z_0$, $h(x, y; z_0)$. We notice the so-called twin-image noise in these reconstructions. However, if we construct a complex hologram $H_c(x, y)$ according to Eq. (16), we have
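$$\begin{aligned}
H_c(x, y) &= i_d(x, y) - j\, i_q(x, y) = i_{\Omega_p}(x, y) = H_{\sin}(x, y) + j H_{\cos}(x, y) \\
&= \int \frac{j k_0}{2\pi(z + z_0)} \exp\left[ -\frac{j k_0}{2(z + z_0)} (x^2 + y^2) \right] \otimes |T(x, y; z)|^2\, dz \\
&= \int h_e(x, y; z + z_0) \otimes |T(x, y; z)|^2\, dz = h_e(x, y; z_0) \otimes I(x, y)
\end{aligned} \tag{30}$$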
When the complex hologram is reconstructed by convolving with $h(x, y; z_0)$, we obtain a real-image reconstruction, located at $z = z_0$ from the hologram, free of the twin-image noise, as shown in Figure 4e. Twin-image elimination has been a topic of current interest, as holography is again seriously considered for 3D image display and 3D movies (Poon et al., 2000a; Poon, 2002; Kim et al., 1997; Piestun et al., 1997; Korecki et al., 2001). If we instead construct a complex hologram according to
$$H_c(x, y) = i_d(x, y) + j\, i_q(x, y) = i^*_{\Omega_p}(x, y) = H_{\sin}(x, y) - j H_{\cos}(x, y) = h_e^*(x, y; z_0) \otimes I(x, y) \tag{31}$$
this hologram will reconstruct a virtual image at $z = -z_0$, and its real image at $z = z_0$ is shown in Figure 4f, which is basically a defocused image of I(x, y).
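The twin-image argument can be illustrated with a short numerical experiment. The sketch below is not the simulation behind Figure 4: the grid size, wavelength, depth, bar-shaped object, and the helper names `fresnel_tf` and `propagate` are all assumptions. It also works with the paraxial transfer function (constant phase dropped) and FFT-based circular convolution in place of the impulse response $h(x, y; z_0)$, and it fixes its own sign convention. It records a complex hologram of a planar intensity object, then reconstructs once from the full complex hologram and once from a single quadrature (the real part) only.

```python
import numpy as np

def fresnel_tf(n, dx, wavelength, z):
    """Paraxial free-space transfer function exp(j z (kx^2 + ky^2) / (2 k0))."""
    k0 = 2 * np.pi / wavelength
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky = np.meshgrid(k, k, indexing="xy")
    return np.exp(1j * z * (kx**2 + ky**2) / (2 * k0))

def propagate(field, tf):
    """Circular convolution with the impulse response whose spectrum is tf."""
    return np.fft.ifft2(np.fft.fft2(field) * tf)

# Hypothetical sampling and geometry (not taken from the text).
n, dx, wavelength, z0 = 128, 10e-6, 0.6328e-6, 0.005
intensity = np.zeros((n, n))
intensity[48:80, 60:68] = 1.0                      # planar object I(x, y): a bright bar

tf = fresnel_tf(n, dx, wavelength, z0)
h_complex = propagate(intensity, np.conj(tf))      # complex hologram (both quadratures)
h_single = h_complex.real                          # one quadrature only

rec_complex = propagate(h_complex, tf)             # reconstruction matched to z0
rec_single = propagate(h_single, tf)               # same, from the single-quadrature hologram

err_complex = np.abs(rec_complex - intensity).max()
err_single = np.abs(rec_single - intensity).max()
```

Under these assumptions `err_complex` is at the level of floating-point round-off, whereas `rec_single` contains half of the focused image plus a defocused twin, so `err_single` is far larger.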
Figure 4. (a) Sine-Fresnel zone plate hologram. (b) Cosine-Fresnel zone plate hologram. (c) Reconstruction of the hologram shown in (a). (d) Reconstruction of the hologram shown in (b). (e) Reconstruction of the complex hologram (free of twin-image noise). (f) Reconstruction of the complex hologram (defocused image).

B. 3D Image Reconstruction

Let us now take a simple 3D object as an example in OSH. Again, the pupil functions for scanning holography are $p_1(x, y) = \delta(x, y)$ and $p_2(x, y) = 1$, and hence the 3D IPSFs, according to Eqs. (21) and (23), are $h(x, y; z - z_0)$ and $h^*(x, y; z - z_0)$, respectively, for the two incoherent cases. Let us look at the first case. We assume that we record three point sources as shown in Figure 5. The 3D object is then given by $|T(x, y; z)|^2 = \delta(x, y, z) + \delta(x, y, z - z_1) + \delta(x + x_1, y, z - z_1)$. What is recorded as a hologram is then given by Eq. (17). Upon plane wave illumination of the hologram displayed on an SLM, we have, from Eq. (20),
$$i_{\Omega_p}(x, y) \ast h(x, y; z) = h(x, y; z - z_0) + \exp(-j k_0 z_1)\, h(x, y; z - z_1 - z_0) + \exp(-j k_0 z_1)\, h(x + x_1, y; z - z_1 - z_0) \tag{32}$$
Physically, this corresponds to the reconstruction of the three point sources as shown in Figure 6a. For the other case, we have
$$i^*_{\Omega_p}(x, y) \ast h^*(x, y; z) = h^*(x, y; z - z_0) + \exp(j k_0 z_1)\, h^*(x, y; z - z_1 - z_0) + \exp(j k_0 z_1)\, h^*(x + x_1, y; z - z_1 - z_0) \tag{33}$$
and this corresponds to the reconstruction of three virtual point sources as shown in Figure 6b. These two conjugate reconstructions correspond to the real and pseudoscopic reconstructions of a conventional hologram, respectively.
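The three-point example can also be sketched numerically. The code below is an illustration under stated assumptions rather than the author's procedure: it uses the paraxial transfer function with the constant phase factors $\exp(\pm j k_0 z)$ dropped, hypothetical depths and a hypothetical lateral offset in pixels, and the helper names `fresnel_tf` and `reconstruct` are invented for this sketch. Each point is recorded with the conjugate transfer function for its own depth, and the hologram is then propagated to two reconstruction planes.

```python
import numpy as np

def fresnel_tf(n, dx, wavelength, z):
    """Paraxial free-space transfer function exp(j z (kx^2 + ky^2) / (2 k0))."""
    k0 = 2 * np.pi / wavelength
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky = np.meshgrid(k, k, indexing="xy")
    return np.exp(1j * z * (kx**2 + ky**2) / (2 * k0))

n, dx, wavelength = 128, 10e-6, 0.6328e-6     # hypothetical sampling
z0, z1, shift = 0.010, 0.005, 20              # depths in metres, offset x1 in pixels
c = n // 2

# (row, column, depth) of delta(x, y, z), delta(x, y, z - z1), delta(x + x1, y, z - z1)
points = [(c, c, 0.0), (c, c, z1), (c, c - shift, z1)]

hologram = np.zeros((n, n), dtype=complex)
for row, col, depth in points:
    delta = np.zeros((n, n))
    delta[row, col] = 1.0
    record_tf = np.conj(fresnel_tf(n, dx, wavelength, z0 + depth))
    hologram += np.fft.ifft2(np.fft.fft2(delta) * record_tf)

def reconstruct(holo, z):
    """Propagate the hologram a distance z (plane-wave illumination assumed)."""
    return np.fft.ifft2(np.fft.fft2(holo) * fresnel_tf(n, dx, wavelength, z))

plane_a = np.abs(reconstruct(hologram, z0))        # plane where the first point focuses
plane_b = np.abs(reconstruct(hologram, z0 + z1))   # plane where the other two focus
```

In `plane_a` only the first point is sharp; in `plane_b` the two points at depth $z_1$ are sharp and the first appears as a weak defocused blur.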
Figure 5. Optical scanning holographic recording of a three-point source.
Figure 6. (a) Holographic real reconstruction when $i_d(x, y) - j\, i_q(x, y)$ is used for constructing a complex transmittance from an SLM display. (b) Holographic pseudoscopic reconstruction when $i_d(x, y) + j\, i_q(x, y)$ is used for constructing a complex transmittance from an SLM display.
V. Concluding Remarks

Historically, OSH was invented as a clever application of a pupil interaction processing technique. To gain some perspectives and insights into scanning holography, we first developed a 3D imaging theory for the two-pupil optical heterodyne scanning image processor and have shown that the 3D imaging properties of the processor can be expressed in terms of the two pupils in the system. The theory is applicable to dilute transparent 3D objects or diffusely reflecting surfaces without shadowing or vignetting. The processor can be used to process 3D objects coherently or incoherently, depending on the detection scheme used. Although the incoherent mode of operation has been applied in most applications, such as 3D fluorescence microscopy, optical recognition of 3D objects, and most recently optical remote sensing, the coherent mode of operation so far has been limited to theory development. Coherent 3D imaging, nevertheless, is an important extension of the processor in biological imaging for areas such as quantitative phase-contrast imaging (Indebetouw et al., 2000). We also have derived 3D coherent and incoherent PSFs of the two-pupil system and have shown that these PSFs have the same form as the 3D coherent PSF in the conventional 4f imaging system. This is an important departure from standard 3D imaging, as we can process intensity values optically using not only real and positive PSFs but also bipolar or even complex 3D PSFs (Poon and Indebetouw, 2002). Applications using this important attribute of scanning holography remain to be explored. In the last part of the chapter, we discussed how the two pupils in the optical system should be modified to obtain holographic recording by active optical scanning. This technique to record holographic information in real time is now called optical scanning holography.
We have introduced the so-called sine- and cosine-FZP holograms and shown that a complex hologram can be formed from these two holograms to reject the well-known twin-image noise in holography. Finally, we have illustrated an important 3D imaging example in OSH and shown that two reconstructions can be obtained, corresponding to the real and pseudoscopic reconstructions of a conventional hologram. Although OSH is achieved by selecting a specific pair of pupils in the two-pupil system, we want to point out that, by proper manipulation of the pupils, the scanned object can result in holograms having unique properties. In fact, this is considered a type of preprocessing of holographic information in real time (Molesini et al., 1982; Ozkul et al., 1986; Schilling and Poon, 1995; Vikram and Billet, 1983). It is worth looking into this aspect of preprocessing of holographic information, as it is important for the transmission of holographic information in applications such as 3D holographic display and movies (Poon, 2002). In this article we have provided some important mathematical developments in 3D image processing and optical scanning holography through a two-pupil optical system. Much important and interesting work remains to be done, as pointed out in this section. We submit that two-pupil heterodyne scanning image processing is simple and yet powerful for 3D imaging and hope that this article will stimulate further research in 3D image processing and holography, and its various novel applications.
Acknowledgments T.-C. Poon acknowledges the support of the National Science Foundation (Grant ECS-9810158) and many useful discussions throughout the years with Adrian Korpel (Professor Emeritus, University of Iowa) and Guy Indebetouw (Physics Department, Virginia Tech).
References Collier, R. J., Burckhardt, C. B., and Lin, L. H. (1971). Optical Holography. London: Acad. Press. Goodman, J. W. (1996). Introduction to Fourier Optics. 2nd ed. New York: McGraw-Hill. Gu, M. (2000). Advanced Optical Imaging Theory. Berlin: Springer-Verlag. Indebetouw, G. (1981). Scanning optical data processor. Opt. Laser Tech. 8, 197–201. Indebetouw, G. (2002). Properties of a scanning holographic microscope: improved resolution, extended depth-of-focus, and/or optical sectioning. J. Mod. Opt. 49, 1479–1500. Indebetouw, G., and Poon, T.-C. (1992). Novel approaches of incoherent image processing with emphasis on scanning methods. Opt. Eng. 31, 2159–2167. Indebetouw, G., Klysubun, P., Kim, T., and Poon, T.-C. (2000). Imaging properties of scanning holographic microscopy. J. Opt. Soc. Am. A. 17, 380–390. Kim, M. K. (1999). Wavelength scanning digital interference holography for optical section imaging. Opt. Lett. 24, 1693–1695. Kim, S.-G., Lee, B., and Kim, E.-S. (1997). Removal of bias and the conjugate image in incoherent on-axis triangular holography and real-time reconstruction of the complex hologram. Appl. Opt. 36, 4784–4791. Kim, S.-G., Lee, B., Kim, E.-S., and Yi, C.-W. (2001). Resolution analysis of incoherent triangular holography. Appl. Opt. 40, 4672–4678. Kim, T., and Poon, T.-C. (2000). Three-dimensional matching by use of phase-only holographic information and the Wigner distribution. J. Opt. Soc. Am. A. 17, 2520–2528.
Kim, T., Poon, T.-C., and Indebetouw, G. (2002). Depth detection and image recovery in remote sensing by optical scanning holography. Opt. Eng. 44, 1331–1338. Klysubun, P., Indebetouw, G., Kim, T., and Poon, T.-C. (2000). Accuracy of 3-D remote target location using scanning holographic correlation. Opt. Commun. 184, 357–366. Korecki, P., Materlik, G., and Korecki, J. (2001). Complex γ-ray hologram: Solution to twin images problem in atomic resolution imaging. Phys. Rev. Lett. 86, 1534–1537. Lohmann, A. W., and Rhodes, W. T. (1978). Two-pupil synthesis of optical transfer functions. Appl. Opt. 17, 1141–1150. Mait, J. N. (1987). Pupil-function design for complex incoherent spatial filtering. J. Opt. Soc. Am. A 4, 1185–1193. Molesini, G., Bertani, D., and Certica, M. (1982). In-line holography with interference filters as Fourier processors. Opt. Acta 29, 479–484. Ozkul, C., Allano, D., and Trinite, M. (1986). Filtering effects in far-field in-line holography. Opt. Eng. 25, 1142–1148. Piestun, R., Shamir, L., Wesskamp, B., and Bryngdahl, O. (1997). On-axis computer generated holograms for three-dimensional display. Opt. Lett. 22, 922–924. Poon, T.-C. (1985). Scanning holography and two-dimensional image processing by acousto-optic two-pupil synthesis. J. Opt. Soc. Am. A 2, 621–627. Poon, T.-C. (2002). Three-dimensional television using optical scanning holography. J. Inform. Display 3, 12–16. Poon, T.-C., and Banerjee, P. P. (2001). Contemporary Optical Image Processing with Matlab. Oxford: Elsevier Science. Poon, T.-C., and Indebetouw, G. (2002). Three-dimensional point spread functions of an optical heterodyne scanning image processor. Appl. Opt. Accepted for publication. Poon, T.-C., and Kim, T. (1999). Optical image recognition of three-dimensional objects. Appl. Opt. 38, 370–381. Poon, T.-C., and Korpel, A. (1979). Optical transfer function of an acousto-optic heterodyning image processor. Opt. Lett. 4, 317–319. Poon, T.-C., Park, J., and Indebetouw, G. (1988). Optical realization of textural edge extraction. Opt. Commun. 65, 1–6. Poon, T.-C., Park, J., and Indebetouw, G. (1990). Real-time tunable incoherent spatial filtering: Two-pupil processing technique. Opt. Eng. 29, 1507–1510. Poon, T.-C., Doh, K., Schilling, B. W., Wu, W., Shinoda, K., and Suzuki, Y. (1995). Three-dimensional microscopy by optical scanning holography. Opt. Eng. 34, 1338–1344. Poon, T.-C., Wu, M., Shinoda, K., and Suzuki, Y. (1996). Optical scanning holography. Proc. IEEE 84, 753–764. Poon, T.-C., Kim, T., Indebetouw, G., Wu, M. H., Shinoda, K., and Suzuki, Y. (2000a). Twin-image elimination experiments for three-dimensional images in optical scanning holography. Opt. Lett. 25, 215–217. Poon, T.-C., Schilling, B., Indebetouw, G., and Storrie, B. (2000b). Three-dimensional holographic fluorescence microscope. U.S. Patent # 6,038,041. Schilling, B. W., and Poon, T.-C. (1995). Real-time preprocessing of holographic information. Opt. Eng. 34, 3174–3180. Schilling, B. W., Poon, T.-C., Indebetouw, G., Storrie, B., Shinoda, K., and Wu, M. (1997). Three-dimensional holographic fluorescence microscopy. Opt. Lett. 22, 1506–1508. Stoner, W. (1978). Incoherent optical processing via spatially offset pupil masks. Appl. Opt. 17, 2454–2466. Schilling, B. W., and Templeton, G. C. (2001). Three-dimensional remote sensing by optical scanning holography. Appl. Opt. 40, 5474–5481.
Swoger, J., Martinez-Corral, M., Huisken, J., and Stelzer, E. (2002). Optical scanning holography as a technique for high-resolution three-dimensional biological microscopy. J. Opt. Soc. Am. A 19, 1910–1918. Thompson, B. J. (1969). Image formation with partially coherent light, in Progress in Optics. Vol. 2, edited by E. Wolf, Amsterdam: North Holland Pub. Co. Vikram, C. S., and Billet, M. (1983). Gaussian beam effects in far-field in-line holography. Appl. Opt. 22, 2830–2935. Zhang, T., and Yamaguchi, I. (1998). Three-dimensional microscopy with phase-shifting digital holography. Opt. Lett. 23, 1221–1223.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 126
Nonlinear Image Processing using Artificial Neural Networks DICK DE RIDDER,1 ROBERT P. W. DUIN,1 MICHAEL EGMONT-PETERSEN,2 LUCAS J. VAN VLIET,1 AND PIET W. VERBEEK1 1 Pattern Recognition Group, Department of Applied Physics, Delft University of Technology, 2628 CJ Delft, The Netherlands 2 Decision Support Systems Group, Institute of Information and Computing Sciences, Utrecht University, 3508 TB Utrecht, The Netherlands
I. Introduction
   A. Image Processing
   B. Artificial Neural Networks (ANNs)
   C. ANNs for Image Processing
II. Applications of ANNs in Image Processing
   A. Feedforward ANNs
      1. Classification
      2. Regression
   B. Other ANN Types
   C. Applications of ANNs
      1. Preprocessing
      2. Enhancement and Feature Extraction
      3. Segmentation
      4. Object Recognition
      5. Image Understanding
      6. Optimization
   D. Discussion
      1. Problems with Data-Oriented Approaches
      2. Problems with ANNs
III. Shared Weight Networks for Object Recognition
   A. Shared Weight Networks
      1. Architecture
      2. Other Implementations
   B. Handwritten Digit Recognition
      1. The Data Set
      2. Experiments
      3. Feature Extraction
   C. Discussion
IV. Feature Extraction in Shared Weight Networks
   A. Edge Recognition
      1. A Sufficient Network Architecture
      2. Training
      3. Discussion
   B. Two-Class Handwritten Digit Classification
      1. Training
      2. Decorrelating Conjugate Gradient Descent
      3. Training ANN3 Using DCGD
   C. Discussion
V. Regression Networks for Image Restoration
   A. Kuwahara Filtering
   B. Architectures and Experiments
      1. Modular Networks
      2. Standard Networks
      3. Data Sets and Training
      4. Results
   C. Investigating the Error
   D. Discussion
VI. Inspection and Improvement of Regression Networks
   A. Edge-Favoring Sampling
      1. Experiments
   B. Performance Measures for Edge-Preserving Smoothing
      1. Smoothing versus Sharpening
      2. Experiments
      3. Discussion
      4. Training Using Different Criteria
   C. Inspection of Trained Networks
      1. Standard Networks
      2. Modular Networks
   D. Discussion
VII. Conclusions
   A. Applicability
   B. Prior Knowledge
   C. Interpretability
   D. Conclusions
References

Copyright 2003 Elsevier Science (USA). All rights reserved. ISSN 1076-5670/03
I. Introduction A. Image Processing Image processing is the field of research concerned with the development of computer algorithms working on digitized images (e.g., Pratt, 1991; Gonzalez and Woods, 1992). The range of problems studied in image processing is large, encompassing everything from low-level signal enhancement to high-level image understanding. In general, image processing problems are solved by a chain of tasks. This chain, shown in Figure 1, outlines the possible processing needed from the initial sensor data to the outcome (e.g., a classification or a scene description). The pipeline consists of the steps of preprocessing, data reduction, segmentation, object recognition, and image understanding. In each step, the input and output
Figure 1. The image processing chain.
data can either be images (pixels), measurements in images (features), decisions made in previous stages of the chain (labels), or even object relation information (graphs). There are many problems in image processing for which good, theoretically justifiable solutions exist, especially for problems for which linear solutions suffice. For example, for preprocessing operations such as image restoration, methods from signal processing such as the Wiener filter can be shown to be the optimal linear approach. However, these solutions often work only under ideal circumstances; they may be highly computationally intensive (e.g., when large numbers of linear models have to be applied to approximate a nonlinear model); or they may require careful tuning of parameters. Where linear models are no longer sufficient, nonlinear models will have to be used. This is still an area of active research, as each problem will require specific nonlinearities to be introduced. That is, a designer of an algorithm will have to weigh the different criteria and come to a good choice, based partly on experience. Furthermore, many algorithms quickly become intractable when nonlinearities are introduced. Problems further in the image processing chain, such as object recognition and image understanding, cannot even (yet) be solved using standard techniques. For example, the task of recognizing any of a number of objects against an arbitrary background calls for human capabilities such as the ability to generalize, associate etc. All this leads to the idea that nonlinear algorithms that can be trained, rather than designed, might be valuable tools for image processing. To explain why, a brief introduction into artificial neural networks will be given first. B. Artificial Neural Networks (ANNs) In the 1940s, psychologists became interested in modeling the human brain. This led to the development of a model of the neuron as a thresholded
summation unit (McCulloch and Pitts, 1943). They were able to prove that (possibly large) collections of interconnected neuron models, neural networks, could in principle perform any computation, if the strengths of the interconnections (or weights) were set to proper values. In the 1950s neural networks were picked up by the growing artificial intelligence community. In 1962, a method was proposed to train a subset of a specific class of networks, called perceptrons, based on examples (Rosenblatt, 1962). Perceptrons are networks having neurons grouped in layers, with only connections between neurons in subsequent layers. However, Rosenblatt could prove convergence only for single-layer perceptrons. Although some training algorithms for larger neural networks with hard threshold units were proposed (Nilsson, 1965), enthusiasm waned after it was shown that many seemingly simple problems were in fact nonlinear and that perceptrons were incapable of solving these (Minsky and Papert, 1969). Interest in artificial neural networks (ANNs) increased again in the 1980s, after a learning algorithm for multilayer perceptrons was proposed, the backpropagation rule (Rumelhart et al., 1986). This allowed nonlinear multilayer perceptrons to be trained as well. However, feedforward networks were not the only type of ANN under research. In the 1970s and 1980s a number of different biologically inspired learning systems were proposed. Among the most influential were the Hopfield network (Hopfield, 1982; Hopfield and Tank, 1985), Kohonen’s self-organizing map (Kohonen, 1995), the Boltzmann machine (Hinton et al., 1984), and the Neocognitron (Fukushima and Miyake, 1982). The definition of what constitutes an ANN is rather vague. 
In general, it would at least require a system to consist of (a large number of) identical, simple processing units; have interconnections between these units; possess tunable parameters (weights) that define the system's function; and lack a supervisor that tunes each individual weight. However, not all systems that are called neural networks fit this description. There are many possible taxonomies of ANNs. Here, we concentrate on learning and functionality rather than on biological plausibility, topology, etc. Figure 2 shows the main subdivision of interest: supervised versus unsupervised learning. Although much interesting work has been done in unsupervised learning for image processing (see, e.g., Egmont-Petersen et al., 2002), we will restrict ourselves to supervised learning in this article. In supervised learning, there is a data set L containing samples $x \in \mathbb{R}^d$, where d is the number of dimensions of the data set. For each x a dependent variable
Figure 2. Adaptive method types discussed in this article.
$y \in \mathbb{R}^m$ has to be supplied as well. The goal of a regression method is then to predict this dependent variable based on x. Classification can be seen as a special case of regression, in which only a single variable $t \in \mathbb{N}$ is to be predicted, the label of the class to which the sample x belongs. In Section II, the application of ANNs to these tasks will be discussed in more detail.

C. ANNs for Image Processing

As was discussed above, dealing with nonlinearity is still a major problem in image processing. ANNs might be very useful tools for nonlinear image processing:
- instead of designing an algorithm, one could construct an example data set and an error criterion, and train ANNs to perform the desired input–output mapping;
- the network input can consist of pixels or measurements in images; the output can contain pixels, decisions, labels, etc., as long as these can be coded numerically; no assumptions are made. This means adaptive methods can perform several steps in the image processing chain at once;
- ANNs can be highly nonlinear; the amount of nonlinearity can be influenced by design, but also depends on the training data (Raudys, 1998a,b);
- some types of ANN have been shown to be universal classification or regression techniques (Funahashi, 1989; Hornik et al., 1989).

However, it is not to be expected that application of any ANN to any given problem will give satisfactory results. This article therefore studies the possibilities and limitations of the ANN approach to image processing. It tries to answer the following main questions:
Can image processing operations be learned by ANNs? To what extent can ANNs solve problems that are hard to solve using standard techniques? Is nonlinearity really a bonus? How can prior knowledge be used, if available? Can, for example, the fact that neighboring pixels are highly correlated be used in ANN design or training? What can be learned from ANNs trained to solve image processing problems? If one finds an ANN to solve a certain problem, can one learn how the problem should be approached using standard techniques? Can one extract knowledge from the solution? In particular, the last question is intriguing. One of the main drawbacks of many ANNs is their black-box character, which seriously impedes their application in systems in which insight in the solution is an important factor, e.g., medical systems. If a developer can learn how to solve a problem by analyzing the solution found by an ANN, this solution may be made more explicit. It is to be expected that for different ANN types, the answers to these questions will be different. This article is therefore laid out as follows: first, in Section II, a brief literature overview of applications of ANNs to image processing is given; in Sections III and IV, classification ANNs are applied to object recognition and feature extraction; in Sections V and VI, regression ANNs are investigated as nonlinear image filters. These methods are not only applied to real-life problems, but also studied to answer the questions outlined above. It is not the goal to obtain better performance than using traditional methods; instead, the goal is to find the conditions under which ANNs could be applied. II. Applications of ANNs in Image Processing This section will first discuss the most widely used type of ANN, the feedforward ANN, and its use as a classifier or regressor. Afterward, a brief review of applications of ANNs to image processing problems will be given. A. 
Feedforward ANNs This article will deal mostly with feedforward ANNs (Hertz et al., 1991; Haykin, 1994) (or multilayer perceptrons, MLPs). They consist of
Figure 3. A feedforward ANN for a three-class classification problem. The center layer is called the hidden layer.
interconnected layers of processing units or neurons (see Fig. 3). In Figure 3, the notation of weights and biases follows Hertz et al. (1991): weights of connections between layer p and layer q are indicated by $w^{qp}$; the bias, input, and output vectors of layer p are indicated by $b^p$, $I^p$, and $O^p$, respectively. Basically, a feedforward ANN is a (highly) parameterized, adaptable vector function, which may be trained to perform classification or regression tasks. A classification feedforward ANN performs the mapping
$$N : \mathbb{R}^d \to \langle r_{\min}, r_{\max} \rangle^m \tag{1}$$
with d the dimension of the input (feature) space, m the number of classes to distinguish, and $\langle r_{\min}, r_{\max} \rangle$ the range of each output unit. The following feedforward ANN with one hidden layer can realize such a mapping:
$$N(x; W, B) = f\left( {w^{32}}^{T} f\left( {w^{21}}^{T} x - b^2 \right) - b^3 \right) \tag{2}$$
W is the weight set, containing the weight matrix connecting the input layer with the hidden layer ($w^{21}$) and the vector connecting the hidden layer with the output layer ($w^{32}$); B ($b^2$ and $b^3$) contains the bias terms of the hidden and output nodes, respectively. The function f(a) is the nonlinear activation function with range $\langle r_{\min}, r_{\max} \rangle$, operating on each element of its input vector. Usually, one uses either the sigmoid function, $f(a) = 1/(1 + e^{-a})$, with range $\langle r_{\min} = 0, r_{\max} = 1 \rangle$; the double sigmoid function, $f(a) = 2/(1 + e^{-a}) - 1$; or the hyperbolic tangent function, $f(a) = \tanh(a)$, both with range $\langle r_{\min} = -1, r_{\max} = 1 \rangle$.

1. Classification

To perform classification, an ANN should compute the posterior probabilities of given vectors x, $P(\omega_j | x)$, where $\omega_j$ is the label of class
$j$, $j = 1, \ldots, m$. Classification is then performed by assigning an incoming sample x to the class for which this probability is highest. A feedforward ANN can be trained in a supervised way to perform classification when presented with a number of training samples $L = \{(x, t)\}$, with $t_l$ high (e.g., 0.9) indicating the correct class membership and $t_k$ low (e.g., 0.1), $\forall k \neq l$. The training algorithm, for example backpropagation (Rumelhart et al., 1986) or conjugate gradient descent (Shewchuk, 1994), tries to minimize the mean squared error (MSE) function
$$E(W, B) = \frac{1}{2|L|} \sum_{(x^i, t^i) \in L} \sum_{k=1}^{c} \left( N(x^i; W, B)_k - t^i_k \right)^2 \tag{3}$$
by adjusting the weights and bias terms. For more details on training feedforward ANNs see, e.g., Hertz et al. (1991) and Haykin (1994). Richard and Lippmann (1991) showed that feedforward ANNs, when provided with enough nodes in the hidden layer, an infinitely large training set, and 0–1 training targets, approximate the Bayes posterior probabilities
$$P(\omega_j \mid x) = \frac{P(\omega_j)\, p(x \mid \omega_j)}{p(x)}, \qquad j = 1, \ldots, m \qquad (4)$$
with $P(\omega_j)$ the prior probability of class $j$, $p(x \mid \omega_j)$ the class-conditional probability density function of class $j$ and $p(x)$ the probability of observing $x$.
2. Regression
Feedforward ANNs can also be trained to perform nonlinear multivariate regression, where a vector of real numbers should be predicted:
$$R : \mathbb{R}^d \rightarrow \mathbb{R}^m \qquad (5)$$
with $m$ the dimensionality of the output vector. The following feedforward ANN with one hidden layer can realize such a mapping:
$$R(x; W, B) = {w^{32}}^{T} f\left( {w^{21}}^{T} x - b^2 \right) - b^3 \qquad (6)$$
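The mappings of Eqs. (2) and (6) differ only in whether the activation function is applied in the last layer. The following is a minimal sketch of both in plain Python; the function names, the weight layout (`weights[i][j]` connects input `i` to unit `j`), and the example numbers are assumptions made for illustration, not part of the original text.

```python
import math

def sigmoid(a):
    """Logistic sigmoid f(a) = 1 / (1 + exp(-a)), with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def affine(weights, x, bias):
    """Compute w^T x - b for every unit (biases are subtracted, as in Eq. (2))."""
    return [sum(weights[i][j] * x[i] for i in range(len(x))) - bias[j]
            for j in range(len(bias))]

def forward(x, w21, b2, w32, b3, f=sigmoid, output_activation=None):
    """One-hidden-layer feedforward ANN.

    With output_activation=None this is the regression mapping of Eq. (6);
    with output_activation=f it is the classification mapping of Eq. (2).
    """
    hidden = [f(a) for a in affine(w21, x, b2)]
    out = affine(w32, hidden, b3)
    if output_activation is not None:
        out = [output_activation(a) for a in out]
    return out

# Hypothetical toy network: 2 inputs, 2 hidden units, 1 output.
w21 = [[1.0, -1.0], [0.5, 0.5]]
b2 = [0.0, 0.0]
w32 = [[1.0], [1.0]]
b3 = [0.0]

regression_out = forward([0.0, 0.0], w21, b2, w32, b3)
classification_out = forward([0.0, 0.0], w21, b2, w32, b3,
                             output_activation=sigmoid)
print(regression_out)        # [1.0]
print(classification_out)    # one value strictly between 0 and 1
```

For a zero input both hidden units output 0.5, so the regression output is 1.0, and the classification output is the sigmoid of that value.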
The only difference between classification and regression ANNs is that in the latter, application of the activation function is omitted in the last layer, allowing the prediction of values in $\mathbb{R}^m$. However, this last-layer activation function can be applied when the desired output range is limited. The desired output of a regression ANN is the conditional mean (assuming continuous input $x$):
$$E(y \mid x) = \int_{\mathbb{R}^m} y\, p(y \mid x)\, dy \qquad (7)$$
A training set $L$ containing known pairs of input and output values $(x, y)$ is used to adjust the weights and bias terms such that the mean squared error between the predicted value and the desired value,
$$E(W, B) = \frac{1}{2|L|} \sum_{(x_i, y_i) \in L} \sum_{k=1}^{m} \left( R(x_i; W, B)_k - y_{ik} \right)^2 \qquad (8)$$
(or the prediction error) is minimized. Several authors showed that, under some assumptions, regression feedforward ANNs are universal approximators: if the number of hidden nodes is allowed to increase toward infinity, they can approximate any continuous function with arbitrary precision (Funahashi, 1989; Hornik et al., 1989). When a feedforward ANN is trained to approximate a discontinuous function, two hidden layers are sufficient for obtaining an arbitrary precision (Sontag, 1992). However, this does not make feedforward ANNs perfect classification or regression machines. There are a number of problems:
- there is no theoretically sound way of choosing the optimal ANN architecture or number of parameters. This is called the bias-variance dilemma (Geman et al., 1992): for a given data set size, the more parameters an ANN has, the better it can approximate the function to be learned; at the same time, the ANN becomes more and more susceptible to overtraining, i.e., adapting itself completely to the available data and losing generalization;
- for a given architecture, learning algorithms often end up in a local minimum of the error measure $E$ instead of a global minimum (footnote 1);
- they are nonparametric, i.e., they do not specify a model and are less open to explanation. This is sometimes referred to as the black-box problem. Although some work has been done in trying to extract rules from trained ANNs (Tickle et al., 1998), in general it is still impossible to specify exactly how an ANN performs its function. For a rather polemic discussion on this topic, see the excellent paper by Green (1998).
1 Although current evidence suggests this is actually one of the features that makes feedforward ANNs powerful: the limitations the learning algorithm imposes actually manage the bias-variance problem (Raudys, 1998a,b).
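To make the error measure of Eq. (8) concrete, the sketch below evaluates it for an arbitrary predictor and minimizes it by plain gradient descent. The one-parameter linear model, the learning rate, and the toy data are assumptions chosen only so that the result can be checked by hand; they stand in for a full ANN and are not taken from the text.

```python
def mean_squared_error(predict, training_set):
    """E = 1/(2|L|) * sum over samples and output components of squared error."""
    total = 0.0
    for x, y in training_set:
        prediction = predict(x)
        total += sum((p - t) ** 2 for p, t in zip(prediction, y))
    return total / (2 * len(training_set))

def fit_slope(training_set, learning_rate=0.1, steps=200):
    """Gradient descent on E for the hypothetical model R(x) = [w * x[0]]."""
    w = 0.0
    n = len(training_set)
    for _ in range(steps):
        # dE/dw = 1/|L| * sum((w*x - y) * x); the factor 2 cancels the 1/2 in E.
        gradient = sum((w * x[0] - y[0]) * x[0] for x, y in training_set) / n
        w -= learning_rate * gradient
    return w

# Toy data generated by y = 2x, so the error minimum is at w = 2.
training_set = [([1.0], [2.0]), ([2.0], [4.0]), ([3.0], [6.0])]

initial_error = mean_squared_error(lambda x: [0.0], training_set)
w = fit_slope(training_set)
final_error = mean_squared_error(lambda x: [w * x[0]], training_set)
print(initial_error, w, final_error)
```

This error surface has a single minimum, so gradient descent finds it; the local-minimum problem listed above only arises for models, such as ANNs with hidden layers, whose error surface is not convex.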
B. Other ANN Types
Two other major ANN types are:
- the self-organizing map (SOM; Kohonen, 1995; also called topological map), a kind of vector quantization method. SOMs are trained in an unsupervised manner with the goal of projecting similar d-dimensional input vectors to neighboring positions (nodes) on an m-dimensional discrete lattice. Training is called competitive: at each time step, one winning node gets updated, along with some nodes in its neighborhood. After training, the input space is subdivided into q regions, corresponding to the q nodes in the map. An important application of SOMs in image processing is therefore unsupervised cluster analysis, e.g., for segmentation.
- the Hopfield ANN (HNN; Hopfield, 1982), which consists of a number of fully interconnected binary nodes that at each given time represent a certain state. Connected to a state is an energy level, the output of the HNN’s energy function given the state. The HNN maps binary input sets on binary output sets; it is initialized with a binary pattern and, by iterating an update equation, changes its state until the energy level is minimized. HNNs are thus not trained in the same way that feedforward ANNs and SOMs are: the weights are usually set manually. Instead, the power of the HNN lies in running it. Given a rule for setting the weights based on a training set of binary patterns, the HNN can serve as an autoassociative memory (given a partially completed pattern, it will find the nearest matching pattern in the training set). Another application of HNNs, which is quite interesting in an image processing setting (Poggio and Koch, 1985), is finding the solution to nonlinear optimization problems. This entails mapping a function to be minimized on the HNN’s energy function.
However, the application of this approach is limited in the sense that the HNN minimizes just one energy function, whereas most problems are more complex in the sense that the minimization is subject to a number of constraints. Encoding these constraints into the energy function takes away much of the power of the method, by calling for a manual setting of various parameters that again influence the outcome.
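The autoassociative use of an HNN described above can be sketched in a few lines. The text does not name a weight-setting rule or an update equation, so the Hebbian outer-product rule, the bipolar (+1/-1) coding of the binary states, and the asynchronous sign update used here are assumptions; the stored patterns are made up for the example.

```python
def train_hebbian(patterns):
    """Set the weights from bipolar patterns with the outer-product rule (no self-connections)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def energy(w, s):
    """Energy of state s: E = -1/2 * sum_ij w_ij s_i s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(w, state, max_sweeps=100):
    """Update nodes one at a time until no node changes (a minimum of the energy)."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new = s[i] if field == 0 else (1 if field > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

# Two stored patterns (orthogonal, so they do not interfere with each other).
patterns = [[1, 1, 1, 1, -1, -1, -1, -1],
            [1, -1, 1, -1, 1, -1, 1, -1]]
w = train_hebbian(patterns)

corrupted = [-1, 1, 1, 1, -1, -1, -1, -1]   # first pattern with one node flipped
restored = recall(w, corrupted)
print(restored == patterns[0], energy(w, corrupted), energy(w, restored))
```

Running the network lowers the energy from the corrupted state to the stored pattern, which is the sense in which the power of the HNN lies in running it rather than in training it.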
C. Applications of ANNs
The image processing literature contains numerous applications of the above types of ANNs and various other, more specialized models. Below, we will
give a broad overview of these applications, without going into specific ones. Furthermore, we will only discuss application of ANNs directly to pixel data (i.e., not to derived features). For a more detailed overview, see, e.g., Egmont-Petersen et al. (2002).
1. Preprocessing
Preprocessing an image can consist of image reconstruction (building up an image from a number of indirect sensor measurements) and/or image restoration (removing aberrations introduced by the sensor, including noise). To perform preprocessing, ANNs have been applied in the following ways:
- optimization of an objective function specified by a traditional preprocessing approach;
- approximation of a mathematical transformation used in reconstruction, by regression;
- general regression/classification, usually directly on pixel data (neighborhood input, pixel output).
To solve the first type of problem, HNNs can be used for the optimization involved in traditional methods. However, mapping the actual problem to the energy function of the HNN can be difficult. Occasionally, the original problem will have to be modified. Having managed to map the problem appropriately, the HNN can be a useful tool in image preprocessing, although convergence to a good result is not guaranteed. For image reconstruction, regression (feedforward) ANNs can be applied. Although some successful applications are reported in the literature, it would seem that these applications call for more traditional mathematical techniques, because a guaranteed performance of the reconstruction algorithm is essential. Regression or classification ANNs can also be trained to perform image restoration directly on pixel data. In the literature, nonadaptive ANNs were used for a large number of applications. Where ANNs are adaptive, their architectures usually differ much from those of the standard ANNs: prior knowledge about the problem is used to design them (e.g., in cellular neural networks, CNNs). This indicates that the fast, parallel operation of ANNs, and the ease with which they can be embedded in hardware, can be important factors in choosing a neural implementation of a certain preprocessing operation. However, their ability to learn from data is apparently of less importance.
We will return to this in Sections V and VI.
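The "neighborhood input, pixel output" scheme mentioned above amounts to sliding a window over the image and letting a trained ANN map each window to one output pixel. A minimal sketch follows; the function names are hypothetical, the border handling (border pixels are simply copied) is an assumption, and a plain window mean stands in for the trained ANN.

```python
def extract_windows(image, half_width):
    """Yield (row, col, window) for every pixel whose full neighborhood lies inside the image."""
    rows, cols = len(image), len(image[0])
    for r in range(half_width, rows - half_width):
        for c in range(half_width, cols - half_width):
            window = [image[rr][cc]
                      for rr in range(r - half_width, r + half_width + 1)
                      for cc in range(c - half_width, c + half_width + 1)]
            yield r, c, window

def apply_pixelwise(image, half_width, predict):
    """Replace each interior pixel by predict(window); border pixels are copied unchanged."""
    output = [row[:] for row in image]
    for r, c, window in extract_windows(image, half_width):
        output[r][c] = predict(window)
    return output

def window_mean(window):
    """Stand-in for a trained regression ANN: the mean of the window."""
    return sum(window) / len(window)

# A 4 x 4 linear ramp; the mean of each 3 x 3 window equals its center pixel.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
windows = list(extract_windows(image, half_width=1))
filtered = apply_pixelwise(image, 1, window_mean)
print(len(windows), len(windows[0][2]), filtered[1][1])   # 4 9 5.0
```

In a real application `predict` would be the forward pass of a regression or classification ANN whose input dimension equals the window size.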
2. Enhancement and Feature Extraction
After preprocessing, the next step in the image processing chain is extraction of information relevant to later stages (e.g., subsequent segmentation or object recognition). In its most generic form, this step can extract low-level information such as edges, texture characteristics, etc. This kind of extraction is also called image enhancement, as certain general (perceptual) features are enhanced. As enhancement algorithms operate without a specific application in mind, the goal of using ANNs is to outperform traditional methods either in accuracy or computational speed. The best-known enhancement problem is edge detection, which can be approached using classification feedforward ANNs. Some modular approaches, including estimation of edge strength or denoising, have been proposed. Morphological operations have also been implemented on ANNs, which were equipped with shunting mechanisms (neurons acting as switches). Again, as in preprocessing, prior knowledge is often used to restrict the ANNs. Feature extraction entails finding more application-specific geometric or perceptual features, such as corners, junctions, and object boundaries. For particular applications, even higher-level features may have to be extracted, e.g., eyes and lips for face recognition. Feature extraction is usually tightly coupled with classification or regression; which variables are informative depends on the application, e.g., object recognition. Some ANN approaches therefore consist of two stages, possibly coupled, in which features are extracted by the first ANN and object recognition is performed by the second ANN. If the two are completely integrated, it can be hard to label a specific part as a feature extractor (see also Section IV). Feedforward ANNs with bottlenecks (autoassociative ANNs) and SOMs are useful for nonlinear feature extraction.
They can be used to map high-dimensional image data onto a lower number of dimensions, preserving as well as possible the information contained. A disadvantage of using ANNs for feature extraction is that they are not by default invariant to translation, rotation, or scale, so if such invariances are desired they will have to be built in by the ANN designer.
3. Segmentation
Segmentation is partitioning an image into parts that are coherent according to some criterion: texture, color, or shape. When considered as a classification task, the purpose of segmentation is to assign labels to individual pixels or voxels. Classification feedforward ANNs and variants can perform segmentation directly on pixels, when pixels are represented by windows extracted around their position. More complicated modular
approaches are possible as well, with modules specializing in certain subclasses or invariances. Hierarchical models are sometimes used, even built of different ANN types, e.g., using an SOM to map the image data to a smaller number of dimensions and then using a feedforward ANN to classify the pixel. Again, a problem here is that ANNs are not naturally invariant to transformations of the image. Either these transformations will have to be removed beforehand, the training set will have to contain all possible transformations, or invariant features will have to be extracted from the image first. For a more detailed overview of ANNs applied to image segmentation, see Pal and Pal (1993).
4. Object Recognition
Object recognition consists of locating the positions and possibly orientations and scales of instances of classes of objects in an image (object detection) and classifying them (object classification). Problems that fall into this category are, e.g., optical character recognition, automatic target recognition, and industrial inspection. Object recognition is potentially the most fruitful application area of pixel-based ANNs, as using an ANN approach makes it possible to roll several of the preceding stages (feature extraction, segmentation) into one and train it as a single system. Many feedforward-like ANNs have been proposed to solve these problems. Again, invariance is a problem, leading to the proposal of several ANN architectures in which connections were restricted or shared corresponding to desired invariances (e.g., Fukushima and Miyake, 1982; Le Cun et al., 1989a). More involved ANN approaches include hierarchical ANNs, to tackle the problem of rapidly increasing ANN complexity with increasing image size, and multiresolution ANNs, which include context information.
5. Image Understanding
Image understanding is the final step in the image processing chain, in which the goal is to interpret the image content.
Therefore, it couples techniques from segmentation or object recognition with the use of prior knowledge of the expected image content (such as image semantics). As a consequence, there are only a few applications of ANNs on pixel data. These are usually complicated, modular approaches. A major problem when applying ANNs for high-level image understanding is their black-box character. Although there are proposals for explanation facilities (Egmont-Petersen et al., 1998a) and rule extraction (Tickle et al., 1998), it is usually hard to explain why a particular image interpretation is the most likely one. Another problem in image understanding relates to the
amount of input data. When, e.g., rarely occurring images are provided as input to a neural classifier, a large number of images are required to establish statistically representative training and test sets.
6. Optimization
Some image processing (sub)tasks such as stereo matching can best be formulated as optimization problems, which may be solved by HNNs. HNNs have been applied to optimization problems in reconstruction and restoration, segmentation, (stereo) matching and recognition. Mainly, HNNs have been applied for tasks that are too difficult to realize with other neural classifiers because the solutions entail partial graph matching or recognition of 3D objects. A disadvantage of HNNs is that training and use are both of high computational complexity.
D. Discussion
One of the major advantages of ANNs is that they are applicable to a wide variety of problems. There are, however, still caveats and fundamental problems that require attention. Some problems are caused by using a statistical, data-oriented technique to solve image processing problems; other problems are fundamental to the way ANNs work.
1. Problems with Data-Oriented Approaches
A problem in the application of data-oriented techniques to images is how to incorporate context information and prior knowledge about the expected image content. Prior knowledge could be knowledge about the typical shape of objects one wants to detect, knowledge of the spatial arrangement of textures or objects, or of a good approximate solution to an optimization problem. According to Perlovsky (1998), the key to restraining the highly flexible learning algorithms that ANNs are lies in combining them with prior knowledge. However, most ANN approaches do not even use the prior information that neighboring pixel values are highly correlated. The latter problem can be circumvented by extracting features from images first, by using distance or error measures on pixel data that do take spatial coherency into account (e.g., Hinton et al., 1997; Simard et al., 1993), or by designing an ANN with spatial coherency (e.g., Le Cun et al., 1989a; Fukushima and Miyake, 1982) or contextual relations between objects in mind. On a higher level, some methods, such as hierarchical object recognition ANNs, can provide context information.
In image processing, classification and regression problems quickly involve a very large number of input dimensions, especially when the algorithms are applied directly on pixel data. This is problematic, as ANNs to solve these problems will also grow, which makes them harder to train. However, the most interesting future applications (e.g., volume imaging) promise to deliver even more input. One way to cope with this problem is to develop feature-based pattern recognition approaches; another way would be to design an architecture that quickly and adaptively downsamples the original image. Finally, there is a clear need for thorough validation of the developed image processing algorithms (Haralick, 1994; De Boer and Smeulders, 1996). Unfortunately, only a few of the publications about ANN applications ask the question whether an ANN really is the best way of solving the problem. Often, comparison with traditional methods is neglected.
2. Problems with ANNs
Several theoretical results regarding the approximation capabilities of ANNs have been proven. Although feedforward ANNs with two hidden layers can approximate any (even discontinuous) function to an arbitrary precision, theoretical results on, e.g., convergence are lacking. The combination of initial parameters, topology, and learning algorithm determines the performance of an ANN after its training has been completed. Furthermore, there is always a danger of overtraining an ANN, as minimizing the error measure occasionally does not correspond to finding a well-generalizing ANN. Another problem is how to choose the best ANN architecture. Although there is some work on model selection (Fogel, 1991; Murata et al., 1994), no general guidelines exist that guarantee the best trade-off between model bias and variance (see Section II.A.2) for a particular size of the training set. Training unconstrained ANNs using standard performance measures such as the mean squared error might even give very unsatisfactory results.
This, we assume, is the reason why, in a number of applications, ANNs were not adaptive at all or were heavily constrained by their architecture. ANNs suffer from what is known as the black-box problem: the ANN, once trained, might perform well but offers no explanation of how it works. That is, given any input a corresponding output is produced, but it cannot be easily explained why this decision was reached, how reliable it is, etc. In some image processing applications, e.g., monitoring of (industrial) processes, electronic surveillance, biometrics, etc., a measure of the reliability is essential to prevent costly false alarms. In such areas, it might be
preferable to use other, less well-performing methods that do give a statistically well-founded measure of reliability. As was mentioned in Section I, this article will focus both on actual applications of ANNs to image processing tasks and the problems discussed above: the choice of ANN architecture; the use of prior knowledge about the problem in constructing both ANNs and training sets; the black-box character of ANNs. In the next section, an ANN architecture developed specifically to address these problems, the shared weight ANN, will be investigated.
III. Shared Weight Networks for Object Recognition
In this section, some applications of shared weight neural networks will be discussed. These networks are more commonly known in the literature as time delay neural networks (TDNNs) (Bengio, 1996), since the first applications of this type of network were in the field of speech recognition (footnote 2). Sejnowski and Rosenberg (1987) used a slightly modified feedforward ANN in their NETtalk speech synthesis experiment. Its input consisted of an alphanumeric representation of a text; its training target was a representation of the phonetic features necessary to pronounce the text. Sejnowski took the input of the ANN from the “stream” of text with varying time delays, each neuron effectively implementing a convolution function (see Fig. 4). The window was seven frames wide and static. The higher layers of the ANN were just of the standard feedforward type. Two-dimensional TDNNs later developed for image analysis really are a generalization of Sejnowski’s approach: they used the weight-sharing technique not only after the input layer, but for two or three layers. To avoid confusion, the general term “shared weight ANNs” will be used. This section will focus on just one implementation of shared weight ANNs, developed by Le Cun et al. (1989a). This ANN architecture is interesting in that it incorporates prior knowledge of the problem to be solved (object recognition in images) into the structure of the ANN itself.
2 The basic mechanisms employed in TDNNs, however, were known long before. In 1962, Hubel and Wiesel introduced the notion of receptive fields in mammalian brains. Rumelhart et al. (1986) proposed the idea of sharing weights for solving the T-C problem, in which the goal is to classify a 3 × 3 pixel letter T and a 3 × 2 pixel letter C, independent of translation and rotation (Minsky and Papert, 1969).
Figure 4. The operation of the ANN used in Sejnowski’s NETtalk experiment. The letters (and three punctuation marks) were coded by 29 input units using place coding: that is, the ANN input vector contained all zeroes with one element set to one, giving 7 × 29 = 203 input units in total. The hidden layer contained 80 units and the output layer 26 units, coding the phoneme.
The first few layers act as convolution filters on the image, and the entire ANN can be seen as a nonlinear filter. This also allows us to try to interpret the weights of a trained ANN in terms of image processing operations. First, the basic shared weight architecture will be introduced, as well as some variations. Next an application to handwritten digit recognition will be shown. The section ends with a discussion on shared weight ANNs and the results obtained.
A. Shared Weight Networks
The ANN architectures introduced by Le Cun et al. (1989a) use the concept of sharing weights, that is, a set of neurons in one layer using the same incoming weights (see Fig. 5). The use of shared weights leads to all these neurons detecting the same feature, though at different positions in the input image (receptive fields); i.e., the image is convolved with a kernel defined by the weights. The detected features are combined at a higher level, to obtain shift-invariant feature detection. This is combined with layers implementing a subsampling operation to decrease resolution and sensitivity to distortions. Le Cun et al. (1989b) actually describe several different architectures, though all of these use the same basic techniques. Shared weight ANNs have been applied to a number of other recognition problems, such as word recognition (Bengio et al., 1994), cursive script recognition (Schenkel et al., 1995), face recognition (Lawrence et al., 1997; Fogelman Soulie et al., 1993; Viennet, 1993), automatic target recognition (Gader et al., 1995), and hand tracking (Nowlan and Platt, 1995). Other architectures employing the same ideas can be found as well. In Fukushima and Miyake (1982), an ANN architecture specifically suited to object
recognition is proposed, the Neocognitron. It is based on the workings of the visual nervous system, and uses the technique of receptive fields and of combining local features at a higher level to more global features (see also Section II.C.4). The ANN can handle positional shifts and geometric distortion of the input image. Others have applied standard feedforward ANNs in a convolution-like way to large images. Spreeuwers (1992) and Greenhil and Davies (1994) trained ANNs to act as filters, using pairs of input–output images.
Figure 5. The LeCun shared weight ANN.
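The statement that neurons sharing their incoming weights detect the same feature at different positions can be illustrated with a small sketch of one feature map. The function name, the tiny 2 × 2 kernel, the identity activation used in the demonstration, and the absence of any border handling are assumptions made to keep the example checkable by hand; biases are subtracted, following the sign convention of Eq. (2).

```python
import math

def feature_map(image, kernel, stride=1, biases=None, f=math.tanh):
    """Apply one shared kernel at every (strided) position of the image.

    Every output neuron uses the same kernel weights but looks at a different
    receptive field; biases, if given, are per neuron (one value per output).
    """
    k = len(kernel)
    out_rows = (len(image) - k) // stride + 1
    out_cols = (len(image[0]) - k) // stride + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            a = sum(kernel[i][j] * image[r * stride + i][c * stride + j]
                    for i in range(k) for j in range(k))
            bias = biases[r][c] if biases is not None else 0.0
            row.append(f(a - bias))
        output.append(row)
    return output

def single_dot(rows, cols, r, c):
    """Test image: zeros with a single one at (r, c)."""
    return [[1.0 if (i, j) == (r, c) else 0.0 for j in range(cols)]
            for i in range(rows)]

def identity(a):
    return a

kernel = [[1.0, 1.0], [1.0, 1.0]]
out = feature_map(single_dot(6, 6, 2, 2), kernel, f=identity)
out_shifted = feature_map(single_dot(6, 6, 2, 3), kernel, f=identity)

# Shifting the input by one pixel shifts the response by one neuron.
print(all(out_shifted[r][c] == out[r][c - 1]
          for r in range(5) for c in range(1, 5)))   # True
```

With `stride=2` neighboring neurons look at receptive fields two pixels apart, so the map has roughly half the resolution of the input; the number of free weights stays at the kernel size regardless of how many neurons the map contains.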
1. Architecture
The LeCun ANN, shown in Figure 5, comprises at least five layers, including input and output layers:
The input layer consists of a gray-level image of 16 × 16 pixels.
The second layer contains the so-called feature maps (see Fig. 6). Each neuron in such a feature map has the same 5 × 5 set of incoming weights, but is connected to a square at a unique position in the input image. This set can be viewed as a convolution filter, or template; that is, if a neuron in a feature map has high output, this corresponds to a match with the template. The place of the match in the input image corresponds to the place of the neuron in the feature map. The image is
undersampled, as the receptive field for two neighboring neurons is shifted two pixels in the input image. The rationale behind this is that, while high resolution is important for detecting a feature, it is not necessary to know its position in the image with the same precision. Note that the number of connections between the input and feature map layer is far greater than the number of weights, due to the weight sharing. However, neurons do not share their bias. Figure 5 shows the number of neurons, connections, and weights for each layer.
The third layer consists of subsampling maps (Fig. 6). This layer is included mainly to reduce the number of free parameters. The principle is the same as for the feature maps: each neuron in a subsampling map is connected to a 5 × 5 square and all neurons in one subsampling map share the same set of 25 weights. Here, too, the feature map is undersampled, again losing some of the information about the place of detected features. The main difference, however, is that each neuron in a subsampling map is connected to more than one feature map. This mapping of feature maps onto subsampling maps is not trivial; Le Cun et al. use different approaches in their articles. In Le Cun et al., only the number of feature maps connected to each subsampling map, 8, is mentioned; it is not clear which feature maps are linked to which subsampling maps. In Le Cun et al. (1989b), however, Table 1 is given. Again, due to the use of shared weights, there are significantly fewer weights than connections (although biases are not shared). See Figure 5 for an overview.
Figure 6. A feature map and a subsampling map.
TABLE 1. Connections between the Feature Map Layer and Subsampling Map Layer in the LeCun Architecture. [Table body: feature maps 1 to 12 against subsampling maps 1 to 12; the connection entries are not preserved.]
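As a rough check on the layer description, the neuron, connection, and free-parameter totals of this architecture can be tallied with a few lines of arithmetic. The map sizes used below (12 feature maps of 8 × 8 neurons and 12 subsampling maps of 4 × 4 neurons) are assumptions taken from the architecture of Le Cun et al. (1989a) rather than from the text here, as is the convention of counting each bias as one connection; the hidden and output layers have 30 and 10 neurons.

```python
def shared_layer(n_maps, map_side, fan_in):
    """Counts for a layer of maps with shared weights and one bias per neuron."""
    neurons = n_maps * map_side * map_side
    connections = neurons * (fan_in + 1)      # +1: bias counted as a connection
    parameters = n_maps * fan_in + neurons    # shared weights + unshared biases
    return neurons, connections, parameters

def full_layer(n_units, n_inputs):
    """Counts for a fully connected layer: every connection is a free parameter."""
    connections = n_units * (n_inputs + 1)
    return n_units, connections, connections

layers = [
    shared_layer(12, 8, 5 * 5),        # feature maps: 5 x 5 receptive fields
    shared_layer(12, 4, 8 * 5 * 5),    # subsampling maps: 5 x 5 fields in 8 feature maps
    full_layer(30, 12 * 4 * 4),        # hidden layer, fed by all subsampling neurons
    full_layer(10, 30),                # output layer
]

input_neurons = 16 * 16
total_neurons = input_neurons + sum(layer[0] for layer in layers)
total_connections = sum(layer[1] for layer in layers)
total_parameters = sum(layer[2] for layer in layers)
print(total_neurons, total_connections, total_parameters)   # 1256 64660 9760
```

Under these assumptions weight sharing reduces the number of free parameters by a factor of more than six relative to the number of connections.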
The output of the subsampling map is propagated to a hidden layer. This layer is fully connected to the subsampling layer. The number of neurons is 30. The output layer is fully connected to the hidden layer. It contains 10 neurons, and uses place coding for classification; the neurons are numbered 0 . . . 9, and the neuron with the highest activation is chosen. The digit recognized is equal to the neuron number. The total number of neurons in the ANN is 1256. Without weight sharing, the total number of parameters would be 64,660, equal to the number of connections. However, the total number of unique parameters (weights and biases) is only 9760. Shared weight ANNs can be trained by any standard training algorithm for feedforward ANNs (Hertz et al., 1991; Haykin, 1994), provided that the derivative of the cost function with respect to a shared weight is defined as the sum of the derivatives with respect to the nonshared weights (Viennet, 1993). The individual weight updates are used to update the bias for each neuron, since biases are not shared. Clearly, the architecture presented uses prior knowledge (recognizing local features, combining them at a higher level) about the task to solve (i.e., object recognition), thus addressing the problem discussed in Section II.D. In Solla and Le Cun (1991), the authors show that this approach indeed gives better performance. They compare three simple architectures: a
standard backpropagation ANN, an ANN with one feature map and one subsampling map and an ANN with two feature maps, each mapped onto one subsampling map. It is shown that the more prior knowledge is put into the ANN, the higher its generalization ability (footnote 3).
2. Other Implementations
Although the basics of other ANN architectures proposed by Le Cun et al. and others are the same, there are some differences from the one discussed above (Le Cun et al., 1989a). In Le Cun et al. (1990), an extension of the architecture is proposed with a larger number of connections, but a number of unique parameters even lower than that of the LeCun ANN. The “LeNotre” architecture is a proposal by Fogelman Soulie et al. (1993) and, under the name Quick, by Viennet (1993). It was used to show that the ideas that resulted in the construction of the ANNs described above can be used to make very small ANNs that still perform reasonably well. In this architecture, there are only two feature map layers of two maps each; the first layer contains two differently sized feature maps.
B. Handwritten Digit Recognition
This section describes some experiments using the LeCun ANNs in a handwritten digit recognition problem. For a more extensive treatment, see de Ridder (2001). The ANNs are compared to various traditional classifiers, and their effectiveness as feature extraction mechanisms is investigated.
1. The Data Set
The data set used in the experiments was taken from Special Database 3 distributed on CD-ROM by the U.S. National Institute of Standards and Technology (NIST) (Wilson and Garris, 1992). Currently, this database is discontinued; it is now distributed together with Database 7 as Database 19. Of each digit, 2500 samples were used. After randomizing the order per class, the set was split into three parts: a training set of 1000 images per class, a testing set of 1000 images per class, and a validation set of 500 images per class.
3 Generalization ability is defined as the probability that a trained ANN will correctly classify an arbitrary sample, distinct from the training samples. It can therefore be estimated as one minus the test error, for sufficiently large testing sets drawn from the same distribution as the training set.
The latter set was used in the ANN experiments for early stopping: if the error on the validation set increased for more than 50 cycles continuously, training was stopped and the ANN with minimum error on
the validation set was used. This early stopping is known to prevent overtraining. The binary digit images were then preprocessed in the following steps (de Ridder, 1996):
- shearing, to put the digit upright;
- scaling of line width, to normalize the number of pixels present in the image;
- segmenting the digit by finding the bounding box, preserving the aspect ratio;
- converting to floating point and scaling down to 16 × 16 using low-pass filtering and linear interpolation.
Figure 7 shows an example.
2. Experiments
Instances of the LeCun ANN were trained on subsets of the training set containing 10, 25, 50, 100, 250, 500, and 1000 samples per class. Following Le Cun et al. (1989a), weights and biases were initialized randomly using a uniform distribution in the range $[-2.4/F, 2.4/F]$, where $F$ was the total fan-in of a unit (i.e., the number of incoming weights). Backpropagation was used for training, with a learning rate of 0.5 and no momentum. Training targets were set to 0.9 for the output neuron coding the right digit class, and 0.1 for the other output neurons. After training, the testing set was used to find the error. For comparison, a number of traditional classifiers were trained as well: the nearest mean linear classifier (which is denoted nm in the figures), the linear and quadratic Bayes plug-in classifiers (lc and qc; see footnote 4), and the 1-nearest neighbor classifier (1nn) [see, e.g., Devijver and Kittler (1982) and Fukunaga (1990) for a discussion on these statistical pattern classifiers]. For the Bayes plug-in classifiers, regularization was used in calculating the 256 × 256 element covariance matrix $C$:
$$C' = (1 - r - s)\, C + r\, \mathrm{diag}(C) + \frac{s}{256}\, \mathrm{tr}(C)\, I \qquad (9)$$
where diag(C) is the matrix containing only the diagonal elements of C, tr(C) is the trace of matrix C, and r = s = 0.1.
4 The Bayes classifier assumes models for each of the classes are known; that is, the models can be ‘‘plugged in.’’ Plugging in normal densities with equal covariance matrices leads to a linear classifier; plugging in normal densities with different covariance matrices per class leads to a quadratic classifier.
Figure 7. A digit before (a) and after (b) preprocessing.
Furthermore, two
standard feedforward ANNs were trained, containing one hidden layer of 256 and 512 hidden units, respectively. Finally, support vector classifiers (SVMs; Vapnik, 1995) were trained with polynomial kernels of various degrees and with radial basis kernels, for various values of σ. Results are shown in Figure 8. The LeCun ANN performs well, better than most traditional classifiers. For small sample sizes the LeCun ANN performs better than the standard feedforward ANNs. The 1-nearest neighbor classifier and the standard feedforward ANNs perform as well as the LeCun ANN or slightly better, as do the SVMs. In general, classifiers performing better also have many more parameters and require more calculation in the testing phase. For example, when trained on 1000 samples per class the LeCun ANN (2.3% error) performs slightly worse than the 1-nearest neighbor classifier (1.8% error) and the best performing SVMs (e.g., radial basis kernels, σ = 10: 1.4% error), but slightly better than the 256 hidden unit feedforward ANN (2.4% error). The LeCun ANN has 64,660 parameters, requiring as many FLOPs (floating point operations) to test one sample. In contrast, the 1-nearest neighbor rule, trained on 1000 samples per class, requires 10,000 distance calculations in 256 dimensions, i.e., roughly 5,120,000 FLOPs. Similarly, the SVM uses a total of 8076 support vectors in its 10 classifiers, requiring 4,134,912 FLOPs. However, the fully connected feedforward ANN with 256 hidden units requires 256 × 256 + 256 × 10 = 68,096 FLOPs, a number comparable to the LeCun ANN. In conclusion, the LeCun ANN seems to perform well given its limited number of parameters, but a standard feedforward ANN performs equally well using the same amount of computation. This indicates that the restrictions placed on the shared weight ANNs are not strictly necessary to obtain good performance. It also contradicts the finding in Solla and Le Cun (1991) that the use of shared weights leads to better performance.
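The operation counts quoted above can be checked with a few lines of arithmetic. The sketch below assumes that one distance or kernel evaluation in d dimensions costs about 2d floating point operations, which is what the quoted figures imply; the variable names are illustrative and do not come from the original experiments.

```python
# Per-sample test cost of the classifiers compared above.
# Assumption: one distance or kernel evaluation in d dimensions costs
# roughly 2 * d FLOPs (one multiply or subtract plus one add per dimension).
DIM = 16 * 16                    # 256 input pixels
FLOPS_PER_EVAL = 2 * DIM         # 512

lecun_flops = 64_660                        # parameter count quoted in the text
nn_flops = 10 * 1000 * FLOPS_PER_EVAL       # 1-NN: 10 classes x 1000 prototypes
svm_flops = 8076 * FLOPS_PER_EVAL           # 8076 support vectors in total
mlp_flops = DIM * 256 + 256 * 10            # 256 hidden units, 10 output units

for name, flops in [("LeCun ANN", lecun_flops), ("1-NN", nn_flops),
                    ("SVM", svm_flops), ("MLP, 256 hidden", mlp_flops)]:
    print(f"{name:16s} {flops:>10,d} FLOPs")
```

Under this assumption the 1-nearest neighbor rule and the SVM are almost two orders of magnitude more expensive per test sample than either ANN.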
Figure 8. Classification errors on the testing set, for (a) the LeCun and standard ANNs; (b) the nearest mean classifier (nm), linear and quadratic Bayes plug-in rules (lc, qc), and the 1-nearest neighbor classifier (1nn); (c) SVMs with a polynomial kernel function of degrees 1, 2, 4, and 6; (d) SVMs with a radial basis kernel function, σ = 5, 10, 20.
3. Feature Extraction
In Figure 9, an image of the LeCun ANN trained on the entire training set is shown. Some feature maps seem to perform operations similar to low-level image processing operators such as edge detection. It is also noteworthy that the extracted features, the outputs of the last subsampling layer, are nearly binary (either high or low). However, visual inspection of the feature and subsampling masks in the trained shared weight ANNs in general does not give much insight into the features extracted. Gader et al. (1995), in their work on automatic target recognition, inspected trained feature maps and claimed they were ‘‘suggestive of a diagonal edge detector with a somewhat weak response’’ and ‘‘of a strong horizontal edge detector with some ability to detect corners as well’’; however, in our opinion these maps can be
Figure 9. The LeCun ANN trained on the handwritten digit set, 1000 samples/class. Note: for each map in the third layer, only the first set of weights (the first filter) is depicted. Bias is not shown in the figure. In this representation, the bottom layer is the input layer.
interpreted to perform any of a number of image processing primitives. In the next section, a number of simpler problems will be studied in order to learn about the feature extraction process in shared weight ANNs. Here, another approach is taken to investigate whether the shared weight ANNs extract useful features: the features were used to train other classifiers. First, the architecture was cut halfway, after the last layer of subsampling maps, so that the first part could be viewed to perform feature extraction only. The original training, testing, and validation sets were then mapped onto the new feature space by using each sample as input and finding the output of this first part of the ANN. This reduced the number of features to 192. In experiments, a number of classifiers were trained on this data set: the nearest mean linear classifier (nm), the Bayes plug-in linear and quadratic classifier (lc and qc), and the 1-nearest neighbor classifier (1nn).
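The regularized covariance estimate of Eq. (9), which the Bayes plug-in classifiers rely on, can be sketched as follows. This is a minimal illustration, not the original implementation: the function name is hypothetical, and the constant 256 in Eq. (9) is generalized to the dimensionality d of the feature space as an assumption, so that the same code applies to the 192 extracted features.

```python
import numpy as np

def regularized_covariance(C, r=0.1, s=0.1):
    """Shrink a covariance estimate toward its diagonal and toward a
    scaled identity, following the form of Eq. (9).
    Assumption: the factor 1/256 in Eq. (9) is written as 1/d here."""
    d = C.shape[0]
    return ((1.0 - r - s) * C
            + r * np.diag(np.diag(C))
            + (s / d) * np.trace(C) * np.eye(d))

# Toy example: fewer samples than dimensions gives a singular estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
C = np.cov(X, rowvar=False)
C_reg = regularized_covariance(C)

print("rank before:", np.linalg.matrix_rank(C))
print("smallest eigenvalue after:", np.linalg.eigvalsh(C_reg).min())
```

The trace of the matrix is preserved, while the identity term keeps the estimate invertible even when only a few samples per class are available.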
Figure 10. Performance of various classifiers trained on data sets extracted from the feature extraction parts of the LeCun ANN.
For the Bayes plug-in classifiers, the estimate of the covariance matrix was regularized in the same way as before (Eq. 9), using r = s = 0.1. Figure 10 shows the results. In all cases the 1-nearest neighbor classifier performed better than the classification parts of the ANNs themselves. The Bayes plug-in quadratic classifier performed nearly as well as the ANN (compare Fig. 8a with Fig. 10). Interestingly, the LeCun ANN does not seem to use its 30-unit hidden layer to implement a highly nonlinear classifier, as the difference between this ANN’s performance and that of the Bayes plug-in quadratic classifier is very small. Clearly, for all shared weight ANNs, most of the work is performed in the shared weight layers; after the feature extraction stage, a quadratic classifier suffices to give good classification performance. Most traditional classifiers trained on the features extracted by the shared weight ANNs perform better than those trained on the original feature set (Fig. 8b). This shows that the feature extraction process has been useful. In all cases, the 1-nearest neighbor classifier performs best, even better than on the original data set (1.7% vs. 1.8% error for 1000 samples/class).
C. Discussion
A shared weight ANN architecture was implemented and applied to a handwritten digit recognition problem. Although some nonneural classifiers (such as the 1-nearest neighbor classifier and some support vector classifiers) perform better, they do so at a larger computational cost. However,
standard feedforward ANNs seem to perform as well as the shared weight ANNs and require the same amount of computation. The LeCun ANN results obtained are comparable to those found in the literature. Unfortunately, it is very hard to judge visually what features the LeCun ANN extracts. Therefore, it was tested on its feature extraction behavior, by using the output of the last subsampling map layer as a new data set in training a number of traditional classifiers. The LeCun ANN indeed acts well as a feature extractor, as these classifiers performed well; however, performance was at best only marginally better than that of the original ANN. To gain a better understanding, either the problem will have to be simplified, or the goal of classification will have to be changed. The first idea will be worked out in the next section, in which simplified shared weight ANNs will be applied to toy problems. The second idea will be discussed in Sections V and VI, in which feedforward ANNs will be applied to image restoration (regression) instead of feature extraction (classification).
IV. Feature Extraction in Shared Weight Networks
This section investigates whether ANNs, in particular shared weight ANNs, are capable of extracting ‘‘good’’ features from training data. In the previous section the criterion for deciding whether features were good was whether traditional classifiers performed better on features extracted by ANNs. Here, the question is whether sense can be made of the extracted features by interpretation of the weight sets found. There is not much literature on this subject, as authors tend to research the way in which ANNs work from their own point of view, as tools to solve specific problems. Gorman and Sejnowski (1988) inspect what kind of features are extracted in an ANN trained to recognize sonar profiles. Various other authors have inspected the use of ANNs as feature extraction and selection tools (e.g., Egmont-Petersen et al., 1998b; Setiono and Liu, 1997), compared ANN performance to known image processing techniques (Ciesielski et al., 1992), or examined decision regions (Melnik and Pollack, 1998). Some effort has also been invested in extracting (symbolic) rules from trained ANNs (Setiono, 1997; Tickle et al., 1998) and in investigating the biological plausibility of ANNs (e.g., Verschure, 1996). An important subject in the experiments presented in this section will be the influence of various design and training choices on the performance and feature extraction capabilities of shared weight ANNs. The handwritten digit experiment showed that although the LeCun ANN performed well, its complexity and that of the data set made visual inspection of a trained ANN
impossible. For interpretation therefore it is necessary to bring both data set and ANN complexity down to a bare minimum. Of course, many simple problems can be created (de Ridder, 1996); here, two classification problems will be discussed: edge recognition and simple two-class handwritten digit recognition.
A. Edge Recognition
The problem of edge recognition is treated here as a classification problem: the goal is to train an ANN to give high output for image samples containing edges and low output for samples containing uniform regions. This makes it different from edge detection, in which localization of the edge in the sample is important as well. A data set was constructed by drawing edges at 0°, 15°, . . ., 345° angles in a 256 × 256 pixel binary image. These images were rescaled to 16 × 16 pixels using bilinear interpolation. The pixel values were −1 for background and +1 for the foreground pixels; near the edges, intermediate values occurred due to the interpolation. In total, 24 edge images were created. An equal number of images just containing uniform regions of background (−1) or foreground (+1) pixels were then added, giving a total of 48 samples. Figure 11a shows the edge samples in the data set. The goal of this experiment is not to build an edge recognizer performing better than traditional methods; it is to study how an ANN performs edge recognition. Therefore, first a theoretically optimal ANN architecture and weight set will be derived, based on a traditional image processing approach. Next, starting from this architecture, a series of ANNs with an increasing number of restrictions will be trained, based on experimental observations.
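A data set of this kind can be approximated in a few lines. The sketch below is not the original construction: it draws each edge as a half-plane through the image center, replaces bilinear interpolation by simple block averaging, and assumes the 24 uniform samples are split evenly between background and foreground. All function names are hypothetical.

```python
import numpy as np

def edge_image(angle_deg, size=256):
    """Binary half-plane image (-1 background, +1 foreground) whose
    boundary passes through the image center at the given angle."""
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    t = np.deg2rad(angle_deg)
    # Signed distance to a line through the center with normal (cos t, sin t).
    d = (x - c) * np.cos(t) + (y - c) * np.sin(t)
    return np.where(d >= 0, 1.0, -1.0)

def downscale(img, out=16):
    """Block averaging, used here as a stand-in for bilinear interpolation."""
    k = img.shape[0] // out
    return img.reshape(out, k, out, k).mean(axis=(1, 3))

angles = np.arange(0, 360, 15)                      # 0, 15, ..., 345 degrees
edges = np.stack([downscale(edge_image(a)) for a in angles])
# Assumption: 12 uniform background and 12 uniform foreground samples.
uniform = np.concatenate([-np.ones((12, 16, 16)), np.ones((12, 16, 16))])

X = np.concatenate([edges, uniform])                # 48 samples of 16 x 16
is_edge = np.concatenate([np.ones(24, bool), np.zeros(24, bool)])
print(X.shape, is_edge.sum())
```

As in the text, pixels away from the boundary are exactly −1 or +1, while pixels straddling an oblique edge take intermediate values.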
Figure 11. (a) The edge samples in the edge data set. (b) The Laplacian edge detector. (c) The magnitude of the frequency response of the Laplacian edge detector.
In each trained ANN, the weights will be inspected and compared to the calculated optimal set.
1. A Sufficient Network Architecture
To implement edge recognition in a shared weight ANN, it should consist of at least three layers (including the input layer). The input layer contains 16 × 16 units. The 14 × 14 unit hidden layer will be connected to the input layer through a 3 × 3 weight receptive field, which should function as an edge recognition template. The hidden layer should then, using bias, shift the high output of a detected edge into the nonlinear part of the transfer function, as a means of thresholding. Finally, a single output unit is needed to sum all outputs of the hidden layer and rescale to the desired training targets. The architecture described here is depicted in Figure 12. This approach consists of two different subtasks. First, the image is convolved with a template (filter) that should give some high output values when an edge is present and low output values overall for uniform regions. Second, the output of this operation is (soft-)thresholded and summed, which is a nonlinear neighborhood operation. A simple summation of the convolved image (which can easily be implemented in a feedforward ANN)
Figure 12. A sufficient ANN architecture for edge recognition. Weights and biases for hidden units are indicated by $w^{po}$ and $b^{p}$, respectively. These are the same for each unit. Each connection between the hidden layer and the output layer has the same weight $w^{qp}$ and the output unit has a bias $b^{q}$. Below the ANN, the image processing operation is shown: convolution with the Laplacian template $f_L$, pixel-wise application of the sigmoid $f(\cdot)$, (weighted) summation, and another application of the sigmoid.
will not do. Since convolution is a linear operation, for any template the sum of a convolved image will be equal to the sum of the input image multiplied by the sum of the template. This means that classification would be based on just the sum of the inputs, which (given the presence of both uniform background and uniform foreground samples, with sums smaller and larger than the sum of an edge image) is not possible. The data set was constructed like this on purpose, to prevent the ANN from finding trivial solutions. As the goal is to detect edges irrespective of their orientation, a rotation-invariant edge detector template is needed. The first-order edge detectors known from image processing literature (Pratt, 1991; Young et al., 1998) cannot be combined into one linear rotation-invariant detector. However, the second-order Laplacian edge detector can be. The continuous Laplacian,
\[ f_L(I) = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} \qquad (10) \]
can be approximated by the discrete linear detector shown in Figure 11b. It is a high-pass filter with a frequency response as shown in Figure 11c. Note that in well-sampled images only frequencies between $-\pi/2$ and $\pi/2$ can be expected to occur, so the filter’s behavior outside this range is not critical. The resulting image processing operation is shown below the ANN in Figure 12. Using the Laplacian template, it is possible to calculate an optimal set of weights for this ANN. Suppose the architecture just described is used, with double sigmoid transfer functions. Reasonable choices for the training targets then are t = 0.5 for samples containing an edge and t = −0.5 for samples containing uniform regions. Let the 3 × 3 weight matrix ($w^{po}$ in Fig. 12) be set to the values specified by the Laplacian filter in Figure 11b. Each element of the bias vector of the units in the hidden layer, $b^{p}$, can be set to, e.g., $b^{p}_{\mathrm{opt}} = 1.0$. Given these weight settings, optimal values for the remaining weights can be calculated. Note that since the DC component⁵ of the Laplacian filter is zero, the input to the hidden units for samples containing uniform regions will be just the bias, 1.0. As there are 14 × 14 units in the hidden layer, each having an output of $f(1) \approx 0.4621$, the sum of all outputs $O^{p}$ will be approximately $196 \times 0.4621 = 90.5750$. Here $f(\cdot)$ is the double sigmoid transfer function introduced earlier. For images that do contain edges, the input to the hidden layer will look like this:
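\[
\begin{pmatrix}
-1 & -1 & -1 & -1 & -1 & -1 \\
-1 & -1 & -1 & -1 & -1 & -1 \\
-1 & -1 & -1 & -1 & -1 & -1 \\
 1 &  1 &  1 &  1 &  1 &  1 \\
 1 &  1 &  1 &  1 &  1 &  1 \\
 1 &  1 &  1 &  1 &  1 &  1
\end{pmatrix}
*
\begin{pmatrix}
0 & 1 & 0 \\
1 & -4 & 1 \\
0 & 1 & 0
\end{pmatrix}
=
\begin{pmatrix}
 0 &  0 &  0 &  0 \\
 2 &  2 &  2 &  2 \\
-2 & -2 & -2 & -2 \\
 0 &  0 &  0 &  0
\end{pmatrix}
\qquad (11)
\]
5 The response of the filter at frequency 0, or equivalently, the scaling in average pixel value in the output image introduced by the filter.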
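The filtering step of Eq. (11) is easy to reproduce numerically. The sketch below assumes the 4-neighbor discrete Laplacian (center weight −4) and a small 6 × 6 step edge with background above and foreground below; the helper `filter_valid` is a hypothetical name. Because the template is symmetric, correlation and convolution give the same result.

```python
import numpy as np

# 4-neighbor discrete Laplacian template (assumed form of Figure 11b).
LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])

def filter_valid(image, template):
    """Slide the template over the image, keeping only the positions where
    it fits entirely inside the image (no border padding)."""
    th, tw = template.shape
    oh, ow = image.shape[0] - th + 1, image.shape[1] - tw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + th, j:j + tw] * template)
    return out

# Horizontal step edge: three rows of background (-1), three of foreground (+1).
edge = np.vstack([-np.ones((3, 6)), np.ones((3, 6))])
response = filter_valid(edge, LAPLACIAN)

print(response)            # rows of 0, 2, -2, 0
print(LAPLACIAN.sum())     # DC component of the template
```

The template weights sum to zero, so any uniform region yields a zero response and only the two rows adjacent to the edge are nonzero.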
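There are 14 × 14 = 196 units in the hidden layer. Therefore, the sum of the output $O^{p}$ of that layer for a horizontal edge will be
\[
\begin{aligned}
\sum_i O_i^{p} &= 14\, f(2 + b^{p}_{\mathrm{opt}}) + 14\, f(-2 + b^{p}_{\mathrm{opt}}) + 168\, f(b^{p}_{\mathrm{opt}}) \\
&= 14\, f(3) + 14\, f(-1) + 168\, f(1) \\
&\approx 14 \cdot 0.9051 + 14 \cdot (-0.4621) + 168 \cdot 0.4621 = 82.0278
\end{aligned}
\qquad (12)
\]
These values can be used to find the $w^{qp}_{\mathrm{opt}}$ and $b^{q}_{\mathrm{opt}}$ necessary to reach the targets. Using the inverse of the transfer function,
\[
f(x) = \frac{2}{1 + e^{-x}} - 1 = a \;\Rightarrow\; f^{-1}(a) = \ln\!\left(\frac{1 + a}{1 - a}\right) = x, \qquad a \in \langle -1, 1 \rangle \qquad (13)
\]
the input to the output unit, $I^{q} = \sum_i O_i^{p} w_i^{qp} + b^{q} = \sum_i O_i^{p}\, w^{qp}_{\mathrm{opt}} + b^{q}_{\mathrm{opt}}$, should be equal to $f^{-1}(t)$, i.e.:
\[
\begin{aligned}
\text{edge:} \quad & t = 0.5 \;\Rightarrow\; I^{q} = 1.0986 \\
\text{uniform:} \quad & t = -0.5 \;\Rightarrow\; I^{q} = -1.0986
\end{aligned}
\qquad (14)
\]
This gives:
\[
\begin{aligned}
\text{edge:} \quad & 82.0278\, w^{qp}_{\mathrm{opt}} + b^{q}_{\mathrm{opt}} = 1.0986 \\
\text{uniform:} \quad & 90.5750\, w^{qp}_{\mathrm{opt}} + b^{q}_{\mathrm{opt}} = -1.0986
\end{aligned}
\qquad (15)
\]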
Solving these equations gives $w^{qp}_{\mathrm{opt}} = -0.2571$ and $b^{q}_{\mathrm{opt}} = 22.1880$. Note that the bias needed for the output unit is quite high, i.e., far away from the usual weight initialization range. However, the values calculated here are all interdependent. For example, choosing lower values for $w^{po}$ and $b^{p}_{\mathrm{opt}}$ will lead to lower required values for $w^{qp}_{\mathrm{opt}}$ and $b^{q}_{\mathrm{opt}}$. This means there is not one single optimal weight set for this ANN architecture, but a range.
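The last step of this derivation is a two-by-two linear system and can be checked numerically. The sketch below takes the two hidden-layer output sums as quoted in the text (82.0278 for an edge, 90.5750 for a uniform region) rather than recomputing them, and assumes targets of 0.5 for edges and −0.5 for uniform regions; the variable names are illustrative.

```python
import numpy as np

def f(x):
    """Double sigmoid transfer function, with outputs in (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def f_inv(a):
    """Inverse of the transfer function for a in (-1, 1)."""
    return np.log((1.0 + a) / (1.0 - a))

# Sums of the hidden-layer outputs, taken as quoted in the text.
sum_edge, sum_uniform = 82.0278, 90.5750

# Required inputs to the output unit for the two targets.
rhs = np.array([f_inv(0.5), f_inv(-0.5)])

# Each row encodes: sum * w_qp + 1 * b_q = required input.
A = np.array([[sum_edge, 1.0],
              [sum_uniform, 1.0]])
w_qp, b_q = np.linalg.solve(A, rhs)

print(f"f(1) = {f(1.0):.4f}, f_inv(0.5) = {f_inv(0.5):.4f}")
print(f"w_qp = {w_qp:.4f}, b_q = {b_q:.4f}")
```

Small differences in the last digits relative to the values in the text stem from rounding of intermediate results. The large bias follows directly from the fact that the two sums differ by less than ten percent, so a steep, strongly offset mapping is needed to separate them.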
2. Training
Starting from the sufficient architecture described above, a number of ANNs were trained on the edge data set. The weights and biases of each of
these ANNs can be compared to the optimal set of parameters calculated above. An important observation in all experiments was that as more restrictions were placed on the architecture, it became harder to train. Therefore, in all experiments the conjugate gradient descent (CGD; Shewchuk, 1994; Hertz et al., 1991; Press et al., 1992) training algorithm was used. This algorithm is less prone to finding local minima or diverging than backpropagation, as it uses a line minimization technique to find the optimal step size in each iteration. The method has only one parameter, the number of iterations for which the directions should be kept conjugate to the previous ones. In all experiments, this was set to 10. Note that the property that makes CGD a good algorithm for avoiding local minima also makes it less fit for ANN interpretation. Standard gradient descent algorithms, such as backpropagation, will take small steps through the error landscape, updating each weight in proportion to the corresponding component of the error gradient. CGD, due to the line minimization involved, can take much larger steps. In general, the danger is overtraining: instead of finding templates or feature detectors that are generally applicable, the weights are adapted too much to the training set at hand. In principle, overtraining could be prevented by using a validation set, as was done in Section III. However, here the interest is in what feature detectors are derived from the training set rather than obtaining good generalization. The goal actually is to adapt to the training data as well as possible. Furthermore, the artificial edge data set was constructed specifically to contain all possible edge orientations, so overtraining cannot occur. Therefore, no validation set was used.
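The benefit of line minimization can be illustrated on a toy problem. The sketch below is not the nonlinear CGD used for ANN training: it applies the linear conjugate gradient method to a hypothetical ill-conditioned quadratic error surface, $E(x) = \frac{1}{2} x^{T} A x - b^{T} x$, and compares it with fixed-step gradient descent. The matrix, step size, and iteration counts are arbitrary choices for illustration.

```python
import numpy as np

def gradient_descent(A, b, x0, lr, steps):
    """Fixed-step gradient descent on E(x) = 0.5 x^T A x - b^T x."""
    x = x0.copy()
    for _ in range(steps):
        x -= lr * (A @ x - b)          # gradient of E is A x - b
    return x

def conjugate_gradient(A, b, x0, steps):
    """Linear conjugate gradient with exact line minimization."""
    x = x0.copy()
    r = b - A @ x                      # negative gradient
    d = r.copy()                       # first search direction
    for _ in range(steps):
        rr = r @ r
        if rr < 1e-30:                 # already at the minimum
            break
        alpha = rr / (d @ A @ d)       # optimal step along d
        x += alpha * d
        r_new = r - alpha * (A @ d)
        beta = (r_new @ r_new) / rr    # keeps the new direction conjugate
        d = r_new + beta * d
        r = r_new
    return x

# A long, narrow valley: curvature differs by a factor of 1000.
A = np.diag([1.0, 1000.0])
b = np.array([1.0, 1.0])
x0 = np.zeros(2)
x_star = np.linalg.solve(A, b)

x_gd = gradient_descent(A, b, x0, lr=1.0 / 1000.0, steps=100)
x_cg = conjugate_gradient(A, b, x0, steps=2)

print("gradient descent error after 100 steps:", np.linalg.norm(x_gd - x_star))
print("conjugate gradient error after 2 steps:", np.linalg.norm(x_cg - x_star))
```

With the step size limited by the steep direction, gradient descent barely moves along the valley floor, whereas the conjugate gradient method reaches the minimum of this two-dimensional quadratic in two line minimizations.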
All weights and biases were initialized by setting them to a fixed value of 0.01, except where indicated otherwise.⁶ Although one could argue that random initialization might lead to better results, for interpretation purposes it is best to initialize the weights with small, equal values.
a. ANN1: The Sufficient Architecture.
The first ANN used the shared weight mechanism to find $w^{po}$. The biases of the hidden layer, $b^{p}$, and the weights between hidden and output layer, $w^{qp}$, were not shared. Note that this ANN already is restricted, as receptive fields are used for the hidden layer instead of full connectivity. However, interpreting weight sets of unrestricted, fully connected ANNs is quite hard due to the excessive number of weights: there would be a total of 50,569 weights and biases in such an ANN.
6 Fixed initialization is possible here because units are not fully connected. In fully connected ANNs, fixed value initialization would result in all weights staying equal throughout training.
Training this first ANN did not present any problem; the MSE quickly dropped, to $1 \times 10^{-7}$ after 200 training cycles. However, the template weight set found, shown in Figure 13a and b, does not correspond to a Laplacian filter, but rather to a directed edge detector. The detector does have a zero DC component. Noticeable is the information stored in the bias weights of the hidden layer $b^{p}$ (Fig. 13c) and the weights between the hidden layer and the output layer, $w^{qp}$ (Fig. 13d). Note that in Figure 13 and other figures in this section, individual weight values are plotted as gray values. This facilitates interpretation of weight sets as feature detectors. Presentation using gray values is similar to the use of Hinton diagrams (Hinton et al., 1984). Inspection showed how this ANN solved the problem. In Figure 14, the different processing steps in ANN classification are shown in detail for three input samples (Fig. 14a). First, the input sample is convolved with the template (Fig. 14b). This gives pixels on and around edges high values, i.e., highly negative (−10.0) or highly positive (+10.0). After addition of the hidden layer bias (Fig. 14c), these values dominate the output. In contrast, for uniform regions the bias itself is the only input of the hidden layer units, with values approximately in the range [−1, 1]. The result of application of the transfer function (Fig. 14d) is that edges are widened, i.e., they become bars of pixels with values +1.0 or −1.0. For uniform regions, the output contains just the two pixels diagonally opposite at the center, with significantly smaller values. The most important region in these outputs is the center. Multiplying this region by the diagonal +/− weights in the center and summing gives a very small input to the output unit (Fig. 14e); in other words, the weights cancel the input.
In contrast, as the diagonal −/+ pair of pixels obtained for uniform samples is multiplied by a diagonal pair of weights of the opposite sign, the input to the output unit will be negative. Finally, the bias of the output unit (not shown) shifts the input in order to obtain the desired target values t = 0.5 and t = −0.5.
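The processing stages shown in Figure 14 correspond to a short forward pass. The sketch below is a hypothetical re-implementation of those stages, not the code used in the experiments; for the usage example it plugs in the hand-derived Laplacian weight set from Section IV.A.1 rather than the trained ANN1 weights, which are not listed in the text.

```python
import numpy as np

def f(x):
    """Double sigmoid transfer function."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def forward(image, template, b_hidden, w_out, b_out):
    """Return the stages (b)-(e) of Figure 14 plus the final output.
    b_hidden and w_out may be scalars (shared) or 2-D arrays (not shared)."""
    th, tw = template.shape
    oh, ow = image.shape[0] - th + 1, image.shape[1] - tw + 1
    conv = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Weighted sum over the receptive field (a correlation; for a
            # symmetric template this equals convolution).
            conv[i, j] = np.sum(image[i:i + th, j:j + tw] * template)
    hidden_in = conv + b_hidden            # (c) total input to the hidden layer
    hidden_out = f(hidden_in)              # (d) output of the hidden layer
    weighted = hidden_out * w_out          # (e) hidden output times output weights
    output = f(weighted.sum() + b_out)
    return conv, hidden_in, hidden_out, weighted, output

laplacian = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])
uniform = -np.ones((16, 16))               # uniform background sample

stages = forward(uniform, laplacian, b_hidden=1.0, w_out=-0.2571, b_out=22.1880)
hidden_sum, output = stages[2].sum(), stages[4]
print(f"sum of hidden outputs: {hidden_sum:.4f}, network output: {output:.4f}")
```

Passing 14 × 14 arrays for `b_hidden` and `w_out` gives the unshared configuration of ANN1, in which place-specific information can be stored in exactly those two arrays.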
Figure 13. (a) The template and (b) the magnitude of its frequency response, (c) hidden layer bias weights, and (d) weights between the hidden layer and output layer, as found in ANN1.
Figure 14. Stages in ANN1 processing, for three different input samples: (a) the input sample; (b) the input convolved with the template; (c) the total input to the hidden layer, including bias; (d) the output of the hidden layer; and (e) the output of the hidden layer multiplied by the weights between hidden and output layer.
This analysis shows that the weight set found is quite different from the optimal one calculated in Section IV.A.1. As all edges pass through the center of the image, the edge detector need not be translation invariant: information on where edges occur is coded in both the hidden layer bias and the weights between the hidden layer and the output layer.
b. ANN2: Sharing More Weights.
To prevent the ANN from coding place-specific information in biases and weights, the architecture will have to be simplified further. As a restriction, in the next ANN architecture the weights between the hidden layer and output layer were shared. That is, there was one single weight shared among all 196 connections between the hidden units and the output unit. Training took more time, but converged to an MSE of $1 \times 10^{-6}$ after 2400 cycles. Still, the network does not find a Laplacian; however, the template found (Fig. 15a and b) has a clearer function than the one found before. It is a strong detector for edges along one diagonal and a weak detector for edges along the other (slopes of ±45°).
Figure 15. (a) The template, (b) the magnitude of its frequency response, and (c) hidden layer bias weights as found in ANN2.
Figure 16. (a) The template found in ANN3 and (b) the magnitude of its frequency response. (c) The template found in ANN4 and (d) the magnitude of its frequency response.
In the bias weights of the hidden layer (Fig. 15c), place-specific information is now stored for edges that are not amplified well by this detector. Bias weight values are also significantly higher than before (an average of 1.2144). This allows the ANN to use the transfer function as a threshold operation, by scaling large positive pixel values differently from large negative pixel values. In conclusion, responsibility for edge recognition is now shared between the template and the bias weights of the hidden layer.
c. ANN3: Sharing Bias.
As the biases of hidden layer units are still used for storing place-dependent information, in the next architecture these biases were shared too.⁷ Training became even harder; the ANN would not converge using the initialization used before, so weights were initialized to a fixed value of 0.1. After 1000 episodes, the MSE reached $8 \times 10^{-4}$, just slightly higher than the minimal error possible (at $3 \times 10^{-4}$, larger than zero due to the interpolation used in scaling the edge samples). The template found is shown in Figure 16a and b.
7 Sharing biases would have required a major rewrite of the simulation package used, SPRLIB/ANNLIB (Hoekstra et al., 1996). Therefore, biases were shared by replacing all biases by their average after each training cycle.
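The averaging workaround described in footnote 7 amounts to a single projection step after every training cycle. The sketch below is a hypothetical illustration with made-up bias values; it does not use SPRLIB/ANNLIB, and the function name is an assumption.

```python
import numpy as np

def share_by_averaging(params):
    """Tie a group of parameters together by replacing each entry
    with the mean of the group."""
    return np.full_like(params, params.mean())

# Hypothetical hidden-layer biases after one training cycle: they started
# from the common value 0.1 and drifted apart slightly.
rng = np.random.default_rng(0)
b_hidden = 0.1 + 0.01 * rng.normal(size=(14, 14))
mean_before = b_hidden.mean()

b_hidden = share_by_averaging(b_hidden)   # applied after each training cycle
print(b_hidden.min(), b_hidden.max())
```

After the projection all 196 biases are identical again, so the hidden layer can no longer encode position in its biases.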
Note that the template now looks like a Laplacian edge detector; its frequency response is similar to that of the Laplacian in the range $[-\pi/2, \pi/2]$. However, there are still small differences between various weights that are equal in the true Laplacian. In fact, the filter seems to be slightly tilted, with the top left corner containing weights with higher magnitude. Also, the frequency response shows that the filter gives a bandpass response in diagonal directions. To obtain a more Laplacian-like template, further restrictions will have to be placed on the ANN.
d. ANN4: Enforcing Symmetry.
In the last ANN, the prior knowledge that the goal is to obtain a rotation-invariant filter was used as well, by sharing weights in the filter itself. The mask used for this purpose was
\[
\begin{pmatrix}
A & B & A \\
B & C & B \\
A & B & A
\end{pmatrix}
\qquad (16)
\]
i.e., connections with identical mask letters used shared weights. Note that in this ANN there are only six free parameters left: the three weights in the mask, a bias weight for both the hidden and output layer, and one weight between the hidden and output layer. Training was again more cumbersome, but after initializing weights with a fixed value of 0.1 the ANN converged after 1000 episodes to an MSE of $3 \times 10^{-4}$. The filter found is shown in Figure 16c and d. Finally, a solution similar to the optimal one was found: its frequency response is like that of the Laplacian in the range $[-\pi/2, \pi/2]$ and the weights are symmetric.
3. Discussion
The experiments described in this section show that ANNs can be used as edge detectors. However, the presence of receptive fields in the architecture in itself does not guarantee that shift-invariant feature detectors will be found, as claimed by some (Le Cun et al., 1989b, 1990; Viennet, 1993). Also, the mere fact that performance is good (i.e., the MSE is low) does not imply that such a feature extraction process is used. An important observation in ANN1 and ANN2 was that the ANN will use weights and biases in later layers to store place-dependent information. In such a network, where edge positions are stored, in principle any template will suffice. Obviously, this makes interpretation of these templates dubious: different observers may find the ANN has learned different templates. One reason for the ease with
which ANNs store place-dependent information might be the relative simplicity of the dataset: the fact that edges all passed through the center of the image makes this possible. Therefore, in the next section similar ANNs will be trained on a real-world dataset. When the ANNs were further restricted by sharing biases and other weights (ANN3), convergence became a problem. The explanation for this is that the optimal weight set is rather special in ANN terms, as the template has to have a zero DC component (i.e., its weights have to add up to zero). Although this seems to be a trivial demand, it has quite large consequences for ANN training. Optimal solutions correspond to a range of interdependent weights, which will result in long, narrow valleys in the MSE ‘‘landscape.’’ A small perturbation in one of the template weights will have large consequences for the MSE. Simple gradient descent algorithms such as backpropagation will fail to find these valleys, so the line-optimization step used by CGD becomes crucial. The last ANN, ANN4, was able to find an edge detector very similar to the Laplacian. However, this architecture was restricted to such an extent that it can hardly be seen as representative for practical application of ANNs. This indicates there is a trade-off between complexity and the extent to which experiments are true to life on the one hand, and the possibility of interpretation on the other. This effect might be referred to as a kind of ANN interpretability trade-off.8 If an unrestricted ANN is trained on a real-world data set, the setup most closely resembles the application of ANNs in everyday practice. However, the subtleties of the data set and the many degrees of freedom in the ANN prevent gaining a deeper insight into the operation of the ANN. 
On the other hand, once an ANN is restrained, e.g., by sharing or removing weights, lowering the number of degrees of freedom, or constructing architectures only specifically applicable to the problem at hand, the situation is no longer a typical one. The ANN may even become too constrained to learn the task at hand. The same holds for editing a data set to influence its statistics or to enhance more preferable features with regard to ANN training, which will be discussed in Section VI.
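How quickly the successive restrictions remove degrees of freedom can be tallied directly from the architecture descriptions of ANN1 to ANN4. The sketch below is simple bookkeeping under the assumption that each ANN has one 3 × 3 receptive field, 196 hidden units, and a single output unit; only the totals for the fully connected ANN and for ANN4 are stated explicitly in the text.

```python
n_in = 16 * 16        # input units
field = 3 * 3         # weights in the shared receptive field
n_hidden = 14 * 14    # hidden units

free_parameters = {
    # weights in->hidden, hidden biases, weights hidden->out, output bias
    "fully connected": n_in * n_hidden + n_hidden + n_hidden + 1,
    "ANN1 (shared template)": field + n_hidden + n_hidden + 1,
    "ANN2 (shared output weight)": field + n_hidden + 1 + 1,
    "ANN3 (shared hidden bias)": field + 1 + 1 + 1,
    "ANN4 (symmetric template)": 3 + 1 + 1 + 1,
}

for name, count in free_parameters.items():
    print(f"{name:30s} {count:6d}")
```

The count falls from tens of thousands to six, which makes concrete how far the final, interpretable architecture is from an ANN as it would be used in practice.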
8 Note that this is not precisely the same issue as addressed by the bias-variance trade-off (see page 359), which is concerned with the relation between model complexity and error. The concern here is with the specificity of the model with respect to interpretation, which, in principle, is unrelated to complexity: making a model more specific need not introduce a bias.
Figure 17. The two-class handwritten digit data set.
B. Two-Class Handwritten Digit Classification To construct a more real-life data set while still maintaining the expectation that weights can be interpreted, experiments with a small NIST subset were performed. This subset consisted of 10 samples each of the classes ‘‘1’’ and ‘‘7,’’ shown in Figure 17. The 16 16 pixel values were scaled linearly between 1:0 (background) and 1.0 (foreground). Training targets were set to t ¼ 0:5 for class ‘‘1’’ and t ¼ 0:5 for class ‘‘7.’’ For this problem, it is already impossible to find an architecture and weight set by hand that will give minimal error. The receptive fields in the ANNs are expected to act as feature detectors, extracting characteristic shapes from the data. Beforehand, it is quite hard to indicate by hand which weight sets will detect the most salient features. However, as the width of the strokes in the digit images lies in the range 3–5 pixels, feature detectors should have widths and heights roughly in the range 3–7 pixels. The starting point therefore will be the ANN used for edge recognition, shown in Figure 12. However, three different architectures will be used. The first has a 3 3 pixel receptive field and 14 14 ¼ 196 units in the hidden layer, the second contains a 5 5 pixel receptive field and 12 12 ¼ 144 hidden units, and the last contains a 7 7 pixel receptive field and 10 10 ¼ 100 hidden units. As for this data set it is to be expected that using more than one feature map will increase performance; architecture using two feature maps were trained as well. In this case, the number of hidden units doubles. 1. Training Most ANNs were rather hard to train, again due to the restrictions placed on the architecture. CGD was used with 10 steps during which directions were kept conjugate. All ANN weights and biases were initialized using a fixed value of 0.01, except where indicated otherwise. For most restricted architectures, reaching an MSE of exactly 0 proved to be impossible. 
Therefore, training was stopped when the MSE reached a sufficiently low value, $1.0 \times 10^{-6}$.
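As a quick check of the hidden layer sizes quoted above: a receptive field of r × r pixels shifted one pixel at a time over a 16 × 16 image, without padding, fits in (16 − r + 1)² positions. The sketch below reproduces the three counts and the doubled totals for two feature maps. The function name `hidden_units` is illustrative and does not come from the original text.

```python
def hidden_units(image_size: int, field_size: int, feature_maps: int = 1) -> int:
    """Number of hidden units when a square receptive field is shifted over
    a square image one pixel at a time, without padding."""
    positions_per_side = image_size - field_size + 1
    return feature_maps * positions_per_side ** 2


# Receptive field width, units with one feature map, units with two feature maps.
for r in (3, 5, 7):
    print(r, hidden_units(16, r), hidden_units(16, r, feature_maps=2))
```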
a. ANN1: Unrestricted. The first ANNs were identical to the one shown in Figure 12, except for the fact that three different ANNs were trained with 3 × 3 ($\mathrm{ANN}_1^{3\times3}$), 5 × 5 ($\mathrm{ANN}_1^{5\times5}$), and 7 × 7 ($\mathrm{ANN}_1^{7\times7}$) pixel receptive fields, respectively. These ANNs quickly converged to a nearly zero MSE: after 250 training cycles, the MSE was in the order of $1 \times 10^{-10}$. The feature detectors found, shown in Figure 18a, are not very clear, however. The frequency responses (Figure 18b) give more information. The filters most
Figure 18. (a) Feature detectors found in the receptive fields of $\mathrm{ANN}_1^{3\times3}$, $\mathrm{ANN}_1^{5\times5}$, and $\mathrm{ANN}_1^{7\times7}$. (b) The corresponding frequency response magnitudes. (c) Weights between hidden layer and output layer.
closely resemble horizontal edge detectors: note the basic shape returning for the three sizes of feature detector. As was the case in the edge recognition ANNs, the weights between the hidden layer and the output unit have been used to store positions of the digits. Figure 18c illustrates this. Positive weights indicate pixel positions where typically only class "7" samples have high values; negative weights indicate positions where class "1" is present. Although noisy, these same basic shapes are present for each size of the receptive field. In contrast to what was found for the edge recognition ANNs, the bias weights in the second layer were not used heavily. Bias values fell roughly in the range $[-2 \times 10^{-2}, 2 \times 10^{-2}]$, i.e., negligible in comparison to feature detector weight values.

b. ANN2: Fully Restricted. In the next architecture, the number of weights was restricted by sharing weights between hidden layer and output layer and by sharing the bias weights in the second layer (i.e., the basic architecture was the same as ANN3, for edge recognition, on page 385). As a consequence,
Figure 19. (a) The feature detectors found in the receptive fields of $\mathrm{ANN}_2^{3\times3}$, $\mathrm{ANN}_2^{5\times5}$, and $\mathrm{ANN}_2^{7\times7}$. (b) The corresponding frequency response magnitudes.
Figure 20. The output of the hidden layer of $\mathrm{ANN}_2^{5\times5}$, for two samples of class "1" (left) and two samples of class "7" (right).
there were far fewer parameters left in the ANNs: the number of weights in the feature detector plus two biases and one weight between hidden and output layer (e.g., 25 + 2 + 1 = 28 parameters for the 5 × 5 pixel receptive field). Training became quite a bit harder. It did not converge for the ANN with the 3 × 3 pixel receptive field; the MSE oscillated around $1.5 \times 10^{-2}$. For the other two ANNs, training was stopped when the MSE fell below $1 \times 10^{-6}$, which took 2000 cycles for the 5 × 5 pixel receptive field ANN and 1450 cycles for the 7 × 7 pixel receptive field ANN.

The feature detectors found are shown in Figure 19. Note that since the 3 × 3 receptive field ANN did not converge, the resulting filter cannot be interpreted. Since the weights between hidden layer and output layer can no longer be used to store digit positions, the feature detectors of the other two look rather different. The 5 × 5 pixel feature detector is the most pronounced: it is a detector of 3-pixel-wide bars with a slope of 45°. Evidence for this can also be found by inspecting the output of the hidden layer for various inputs, as shown in Figure 20. In the location of the stem of the "7"s, output values are much higher than those in the location of the stem of the "1"s. Finally, the function of the 7 × 7 pixel feature detector is unclear.

From these results, it is clear that a feature detector size of 3 × 3 pixels is too small. On the other hand, although the 7 × 7 pixel feature detector gives good performance, it cannot be interpreted well. The 5 × 5 pixel feature detector seems to be optimal. Therefore, from here on only 5 × 5 pixel feature detectors will be considered.

c. ANN3: Two Feature Maps. Although the frequency response of the 5 × 5 pixel feature detector is clearer than the others, the filter itself is still noisy, i.e., neighboring weights have quite different values. There is no clear global feature (within a 5 × 5 pixel region) that corresponds to this detector. The reason for this might be that in fact several features are detected (either amplified or attenuated) using this one set of weights.
Therefore, ANN3 contained two feature maps instead of one. In all other respects, the ANN was the same as $\mathrm{ANN}_2^{5\times5}$, as shown in Figure 21.
Figure 21. ANN3, with two 5 × 5 pixel feature detectors. Biases and weights between the hidden layer and output layer have not been indicated.
If this ANN is initialized using a fixed value, the two feature detectors will always remain identical, as each corresponding weight in the two detectors is equally responsible for the error the ANN makes. Therefore, random initialization is necessary. This frustrates interpretation, as different initializations will lead to different final weight sets. To illustrate this, four ANNs were trained in which weights were initialized using values drawn from a uniform distribution with range $[-0.01, 0.01]$. Figure 22 shows four resulting template pairs. The feature detector found before in $\mathrm{ANN}_2^{5\times5}$ (Fig. 19) often returns as one of the two feature maps. The other feature detector, however, shows far more variation. The instantiation in the second row of Figure 22b looks like the horizontal edge detector found in ANN1 (Fig. 18a and b), especially when looking at its frequency response (in the fourth column). However, in other ANNs this shape does not return. The first and fourth ANN indicate that actually multiple feature detectors may be distributed over the two feature maps.

To allow inspection of weights, initialization with fixed values seems to be a prerequisite. To allow this, the training algorithm itself should allow initially identical weights to take on different values during training. The next section will introduce a training algorithm developed specifically for training ANNs with the goal of weight interpretation.

2. Decorrelating Conjugate Gradient Descent

The last experiment on the NIST subset showed that interpretation of ANNs with multiple feature detectors is difficult. The main causes are the
random weight initialization required and a tendency of the ANNs to distribute features to be detected in a nonobvious way over receptive fields. To address the latter problem, hidden units learning identical functions, a modular approach has been suggested (Jacobs et al., 1991). However, this is not applicable in cases in which there is no clear decomposition of a task's input space into several domains.

To allow fixed value weight initialization and still obtain succinct feature detectors, a new training algorithm will be proposed. The algorithm is based on CGD, but has as a soft constraint the minimization of the squared covariance between receptive fields. In this way, the symmetry between feature detectors due to fixed value initialization can be broken, and receptive field weight sets are forced to become orthogonal while still minimizing the ANN's MSE.

a. Decorrelation. Note that, in trained ANNs, weight sets belonging to different receptive fields need not be exactly the same for the feature maps to perform the same function. This is because weights are interdependent, as was already noted in Section IV.A.1. As an example, consider the weight vectors $w^{po,A}$ and $w^{po,B}$ (from here on, $w^A$ and $w^B$) in ANN3 (Fig. 21). As long as $w^A = c_1 w^B + c_2$, biases in the hidden and output layer and the weights between these layers can correct the differences between the two weight sets, and their functionality can be approximately identical.9 The conclusion is that to compare weight sets, one has to look at their correlation.

9 Up to a point, naturally, due to the nonlinearity of the transfer functions in the hidden and output layer. For this discussion it is assumed the network operates in that part of the transfer function that is still reasonably linear.

Suppose that for a certain layer in an ANN (as in Fig. 21) there are two incoming weight vectors $w^A$ and $w^B$, both with $K > 2$ elements and $\mathrm{var}(w^A) > 0$ and $\mathrm{var}(w^B) > 0$. The correlation coefficient $C$ between these vectors can be calculated as

$$C(w^A, w^B) = \frac{\mathrm{cov}(w^A, w^B)}{\sqrt{\mathrm{var}(w^A)\,\mathrm{var}(w^B)}} \tag{17}$$

The correlation coefficient $C(w^A, w^B)$ is a number in the range $[-1, 1]$. For $C(w^A, w^B) = \pm 1$, there is a strong correlation; for $C(w^A, w^B) = 0$ there is no correlation. Therefore, the squared correlation $C(w^A, w^B)^2$ can be minimized to minimize the likeness of the two weight sets.

Although this seems a natural thing to do, a problem is that squared correlation can be minimized either by minimizing the squared covariance or by maximizing the variance of either weight vector. The latter is undesirable, as for interpretation the variance of one of the weight vectors
Figure 22. Feature detector pairs found in ANN3, for four different random weight initializations (a–d).
should not be unnecessarily increased just to lower the squared correlation. Ideally, both weight vectors should have comparable variance. Therefore, a better measure to minimize is just the squared covariance. To do this, the derivative of the squared covariance with respect to a single weight $w_i^A$ has to be computed:

$$\frac{\partial\, \mathrm{cov}(w^A, w^B)^2}{\partial w_i^A} = \frac{\partial}{\partial w_i^A} \left[ \frac{1}{K} \sum_{k=1}^{K} (w_k^A - \bar{w}^A)(w_k^B - \bar{w}^B) \right]^2 = \frac{2}{K}\, \mathrm{cov}(w^A, w^B)\, (w_i^B - \bar{w}^B) \tag{18}$$
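The correlation coefficient of Eq. (17) and the squared covariance derivative of Eq. (18) can be checked numerically. The sketch below is a minimal illustration, assuming the 1/K normalization of the covariance used in Eq. (18); the function names and the random test vectors are illustrative choices, not part of the original experiments. It compares the analytic derivative with central finite differences and shows that an affine relation between two weight vectors gives a correlation of 1.

```python
import numpy as np


def cov(w_a, w_b):
    """Covariance with the 1/K normalization of Eq. (18)."""
    return np.mean((w_a - w_a.mean()) * (w_b - w_b.mean()))


def corr(w_a, w_b):
    """Correlation coefficient C of Eq. (17)."""
    return cov(w_a, w_b) / np.sqrt(cov(w_a, w_a) * cov(w_b, w_b))


def sq_cov_grad(w_a, w_b):
    """Analytic gradient of cov(w_a, w_b)**2 with respect to w_a, Eq. (18)."""
    return 2.0 / w_a.size * cov(w_a, w_b) * (w_b - w_b.mean())


def numeric_grad(w_a, w_b, eps=1e-6):
    """Central finite differences, used only to check the analytic form."""
    grad = np.zeros_like(w_a)
    for i in range(w_a.size):
        step = np.zeros_like(w_a)
        step[i] = eps
        grad[i] = (cov(w_a + step, w_b) ** 2 - cov(w_a - step, w_b) ** 2) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
# Two flattened 5 x 5 receptive fields with arbitrary (hypothetical) weights.
w_a, w_b = rng.normal(size=25), rng.normal(size=25)

print(np.allclose(sq_cov_grad(w_a, w_b), numeric_grad(w_a, w_b), atol=1e-8))
print(corr(w_a, 3.0 * w_a + 2.0))  # affine copy: correlation of 1 up to rounding
```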
This derivative can then be used in combination with the derivative of the MSE with respect to the weights to obtain a training algorithm minimizing both MSE and squared covariance (and therefore squared correlation, because the variance of the weight vectors will remain bounded since the ANN still has to minimize the MSE).

Correlation has been used before in neural network training. In the cascade correlation algorithm (Fahlman and Lebiere, 1990), it is used as a tool to find an optimal number of hidden units by taking the correlation between a hidden unit's output and the error criterion into account. However, it has not yet been applied on weights themselves, to force hidden units to learn different functions during training.

b. A Decorrelating Training Algorithm. Squared covariance minimization was incorporated into the CGD method used before. Basically, CGD iteratively applies three stages: calculation of the derivative of the error with respect to the weights, $d^E = \frac{\partial}{\partial w} E(w)$; deriving a direction $h$ from $d^E$ that is conjugate to previously taken directions; and a line minimization of $E$ from $w$ along $h$ to find a new weight vector $w'$. The squared covariance term was integrated into the derivative of the error function as an additive criterion, as in weight regularization (Bishop, 1995). A problem is how the added term should be weighted (cf. choosing the regularization parameter). The MSE can start very high but usually drops rapidly. The squared covariance part also falls in the range [0, 1], but it may well be the case that it cannot be completely brought down to zero, or only at a significant cost to the error. The latter effect should be avoided: the main training goal is to reach an optimal solution in the MSE sense. Therefore, the covariance information is used in the derivative function only, not in the line minimization. The squared covariance gradient, $d^{cov}$, is
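normalized to the length of the ordinary gradient $d^E$ (just its direction is used) and weighed with a factor $\lambda$; i.e.,

$$d = d^E + \lambda \frac{\| d^E \|}{\| d^{cov} \|}\, d^{cov}$$

where

$$d^{cov} = \frac{2}{K(K-1)} \sum_{k=1}^{K-1} \sum_{l=k+1}^{K} \frac{\partial}{\partial w_k}\, \mathrm{cov}\big(w_k(0), w_l(0)\big)^2$$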
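A minimal sketch of how the combined gradient could be assembled is given below. It rests on several assumptions: each receptive field is a flat NumPy vector, the covariance uses 1/K normalization, the argument "(0)" in the formula is ignored, and the ordinary gradient is returned unchanged when the covariance gradient vanishes (as it does when all weights share one fixed value). The names `decorrelation_gradient` and `dcgd_gradient` are hypothetical; the conjugate direction and line minimization stages of CGD are not shown.

```python
import numpy as np


def sq_cov_grad(w_k, w_l):
    """Gradient of cov(w_k, w_l)**2 with respect to w_k (1/K-normalized covariance)."""
    c = np.mean((w_k - w_k.mean()) * (w_l - w_l.mean()))
    return 2.0 / w_k.size * c * (w_l - w_l.mean())


def decorrelation_gradient(weight_sets):
    """d_cov: each pair (k, l) with k < l contributes only to the gradient of set k."""
    n = len(weight_sets)
    parts = [np.zeros_like(w) for w in weight_sets]
    for k in range(n - 1):
        for l in range(k + 1, n):
            parts[k] += sq_cov_grad(weight_sets[k], weight_sets[l])
    scale = 2.0 / (n * (n - 1))
    return np.concatenate([scale * p for p in parts])


def dcgd_gradient(d_e, d_cov, lam):
    """d = d_E + lam * (||d_E|| / ||d_cov||) * d_cov.

    Assumption of this sketch: fall back to d_E when d_cov is exactly zero.
    """
    norm_cov = np.linalg.norm(d_cov)
    if norm_cov == 0.0:
        return d_e.copy()
    return d_e + lam * np.linalg.norm(d_e) / norm_cov * d_cov


# Two identical, non-constant 5 x 5 detectors: only the first receives a
# decorrelating term, which is what breaks the symmetry between them.
w = np.linspace(-1.0, 1.0, 25)
d_cov = decorrelation_gradient([w.copy(), w.copy()])
print(np.linalg.norm(d_cov[:25]) > 0, np.allclose(d_cov[25:], 0.0))
```

Because the covariance term is rescaled to the length of the ordinary gradient, only its direction and the factor passed as `lam` determine how strongly it bends the search direction.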
Note that the derivative of the squared covariance is calculated only once for each pair of weight sets and attributed to only one of the weight sets. This allows one weight set to learn a globally optimal function, while the second set is trained to both lower the error and avoid covariance with the first set. It also allows initialization with fixed values, since the asymmetrical contribution of the squared covariance term provides a symmetry breaking mechanism (which can even improve performance in some classification problems, see de Ridder et al., 1999).

However, the outcome of the DCGD training process is still dependent on the choice of a number of parameters. DCGD even introduces a new one (the weight factor $\lambda$). If the parameters are chosen poorly, one will still not obtain understandable feature detectors. This is a problem of ANNs in general, which cannot be solved easily: a certain amount of operator skill in training ANNs is a prerequisite for obtaining good results. On the other hand, experiments with DCGD are reproducible due to the possibility of weight initialization with fixed values.

The DCGD algorithm is computationally expensive, as it takes covariances between all pairs of receptive fields into account. Due to this $O(n^2)$ complexity in the number of receptive fields, application of this technique to large ANNs is not feasible. A possible way to solve this problem would be to take only a subset of covariances into account.

3. Training ANN3 Using DCGD

ANN3 was trained using DCGD. Weights and biases were initialized to a fixed value of 0.01 (i.e., $\mu = 0.01$, $\sigma = 0.0$) and $N = 10$ directions were kept conjugate at a time. The only parameter varied was the weighting factor of the squared covariance gradient, $\lambda$, which was set to 0.5, 1, 2, and 5. Training converged but was slow. The MSE eventually reached the values obtained using CGD ($1.0 \times 10^{-6}$, cf. Section IV.B.1); however, DCGD training was stopped when the MSE reached about $1.0 \times 10^{-5}$, after about 500–1000 cycles, to prevent overtraining. In all cases, classification was perfect.
Figure 23 shows the feature detectors found in ANN3 trained using DCGD. Squared correlations $C^2$ between them are very small, showing that the minimization was successful (the squared covariance was, in all cases, nearly 0). For $\lambda = 1$ and $\lambda = 2$, the feature detectors are clearer than those found using standard CGD, in Section IV.B.1. Their frequency responses resemble those of the feature detectors shown in Figure 22b and, due to the fixed weight initialization, are guaranteed to be found when training is repeated. However, $\lambda$ should be chosen with some care; if it is too small ($\lambda = 0.5$), the squared covariance term will have too little effect; if it is too large ($\lambda = 5$), minimization of the squared covariance term becomes too important and the original functionality of the network is no longer clearly visible.

The features detected seem to be diagonal bars, as seen before, and horizontal edges. This is confirmed by inspecting the output of the two feature maps in ANN3 trained with DCGD, $\lambda = 1$, for a number of input samples (see Fig. 24). For samples of class "1," these outputs are lower than for class "7," i.e., features specific for digits of class "7" have been found. Furthermore, the first feature detector clearly enhances the stem of "7" digits, whereas the second detector amplifies the top stroke.

Finally, versions of ANN3 with three and four feature maps were also trained using DCGD. Besides the two feature detectors found before, no clear new feature detectors were found.
C. Discussion

The experiments in this section were performed to determine whether training ANNs with receptive field mechanisms leads to the ANN finding useful, shift-invariant features and if a human observer could interpret these features. In general, it was shown that the mere presence of receptive fields in an ANN and a good performance do not mean that shift-invariant features are detected. Interpretation was possible only after severely restricting the ANN architecture, data set complexity, and training method.

One thing all experiments had in common was the use of ANNs as classifiers. Classification is a "derived" goal, i.e., the task is assigning (in principle arbitrary) outputs, representing class labels, to input samples. The ANN is free to choose which features to use (or not) to reach this goal. Therefore, to study the way in which ANNs solve problems, moving to regression problems might yield results more fit for interpretation, especially when a regression problem can be decomposed into a number of independent subproblems. The next sections will study the use of ANNs as nonlinear filters for image enhancement.
V. Regression Networks for Image Restoration

This section will study whether standard regression feedforward ANNs can be applied successfully to a nonlinear image filtering problem. If so, what are the prerequisites for obtaining a well-functioning ANN? A second question (as in the previous section) is whether these ANNs correspond to classic image processing approaches to solve such a task. Note that again the goal here is not to simply apply ANNs to an image processing problem, nor to construct an ANN that will perform better at it than existing techniques. Instead, the question is to what extent ANNs can learn the nonlinearities needed in some image processing applications.

To investigate the possibilities of using feedforward ANNs and the problems one might encounter, the research concentrates on a single example of a nonlinear filter: the Kuwahara filter for edge-preserving smoothing (Kuwahara et al., 1976). Since this filter is well-understood and the training goal is exactly known, it is possible to investigate to what extent ANNs are capable of performing this task. The Kuwahara filter also is an excellent object for this study because of its inherent modular structure, which allows the problem to be split into smaller parts. This is known to be an advantage in learning (Anand et al., 1995) and provides the opportunity to study subproblems in isolation. Pugmire et al. (1998) looked at the application of ANNs to edge detection and found that structuring learning in this way can improve performance; however, they did not investigate the precise role this structuring plays.

ANNs have previously been used as image filters, as discussed in Section II.C.1. However, the conclusion was that in many applications the ANNs were nonadaptive. Furthermore, where ANNs were adaptive, a lot of prior knowledge of the problem to be solved was incorporated in the ANN's architectures.
Therefore, in this section a number of modular ANNs will be constructed and trained to emulate the Kuwahara filter, incorporating prior knowledge in various degrees. Their performance will be compared to standard feedforward ANNs. Based on results obtained in these experiments, in Section VI it is shown that several key factors influence ANN behavior in this kind of task.

A. Kuwahara Filtering

The Kuwahara filter is used to smooth an image while preserving the edges (Kuwahara et al., 1976). Figure 25a illustrates its operation. The input of the filter is a $(2k - 1) \times (2k - 1)$ pixel neighborhood around the central pixel. This neighborhood is divided into four overlapping subwindows
Figure 23. Feature detector pairs found in ANN3 using DCGD with various values of weight factor $\lambda$ (a–d). $C^2$ is the squared correlation between the feature detectors after training.
Figure 24. The output of (a) the first and (b) the second feature map of ANN3 trained with DCGD ($\lambda = 1$), for two samples of class "1" (left) and two samples of class "7" (right). The samples used were, for both digits, the leftmost two in Figure 17.
$W_i$, $i = 1, 2, 3, 4$, each of size $k \times k$ pixels. For each of these subwindows, the average $\mu_i$ and the variance $\sigma_i^2$ of the $k^2$ gray values are calculated. The output of the filter is then found as the average $\mu_m$ of the subwindow $W_m$ having the smallest gray value variance ($m = \arg\min_i \sigma_i^2$). This operation can be applied in a scan-wise manner to filter an entire image. For an example of the effect of the filter, see Figure 26.

The filter is nonlinear. As the selection of the subwindow based on the variances is data-driven, edges are not blurred as in normal uniform
Figure 25. (a) The Kuwahara filter: $k \times k$ subwindows in a $(2k - 1) \times (2k - 1)$ window; here $k = 3$. (b) Kuwahara filter operation as a sequence of operations.
Figure 26. Images used for (a) training and (b–c) testing purposes. The top images are the originals; the bottom images are the Kuwahara filtered versions (for image A, the training target). For presentation purposes, the contrast of the images has been stretched (Young et al., 1998).
filtering. Because a straight edge will always lie in at most three subwindows, there will always be at least one subwindow that does not contain an edge and therefore has low variance. For neighboring pixels in edge regions, different subwindows will be selected (due to the minimum operation), resulting in sudden large differences in gray value. Typically, application of the Kuwahara filter to natural images will result in images that have an artificial look but that may be more easily segmented or interpreted.

This filter was selected for this research for two reasons. First, it is nonlinear. If ANNs can be put to use in image processing, the most rewarding applications will be in nonlinear rather than linear image processing. ANNs are most often used for learning (seemingly) highly complex, nonlinear tasks with many parameters using only a relatively small number of samples. Second, it is modular (Fig. 25b illustrates this). This means the operation can be split into subtasks that can perhaps be learned more easily than the whole task at once. It will be interesting to see whether an ANN will need this modularity and complexity in order to approximate the filter's operation. Also, it offers the opportunity to study an ANN's operation in terms of the individual modules.
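The filter described in this section can be written down directly. The following is a minimal, unoptimized sketch and not the implementation used in the experiments: it assumes a 2-D gray value array, handles image borders by edge replication (the text does not specify border handling), and resolves ties between equal variances by taking the first subwindow.

```python
import numpy as np


def kuwahara(image: np.ndarray, k: int = 3) -> np.ndarray:
    """Kuwahara filter with a (2k-1) x (2k-1) window and four k x k subwindows.

    Border pixels are handled by edge replication, an assumption of this sketch.
    """
    pad = k - 1
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.empty(image.shape, dtype=float)
    # Top-left corners of the four overlapping subwindows inside the window;
    # every subwindow contains the central pixel.
    offsets = [(0, 0), (0, pad), (pad, 0), (pad, pad)]
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            window = padded[r:r + 2 * k - 1, c:c + 2 * k - 1]
            subs = [window[dr:dr + k, dc:dc + k] for dr, dc in offsets]
            variances = [s.var() for s in subs]
            # Output the mean of the subwindow with the smallest variance.
            out[r, c] = subs[int(np.argmin(variances))].mean()
    return out


# An ideal vertical step edge passes through unchanged, unlike with uniform smoothing.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
print(np.array_equal(kuwahara(step), step))
```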
B. Architectures and Experiments

In the previous section, it was shown that when studying ANN properties, such as internal operation (which functions are performed by which hidden units) or generalization capabilities, one often encounters a phenomenon that could be described as an ANN interpretability trade-off (Section IV.A.3). This trade-off, controlled by restricting the architecture of an ANN, is between the possibility of understanding how a trained ANN operates and the degree to which the experiment is still true to life. To cover the spectrum of possibilities, a number of modular ANNs with varying degrees of freedom were constructed. The layout of such a modular ANN is shown in Figure 27. Of the modular ANNs, four types were created, $\mathrm{ANN}_1^M \ldots \mathrm{ANN}_4^M$. These are discussed below in descending order of artificiality; i.e., the first is completely hand-designed, with every weight set to an optimal value, whereas the last consists of only standard feedforward modules.

1. Modular Networks

Each modular ANN consists of four modules. In the four types of modular ANN, different modules are used.
Figure 27. A modular ANN. MODAvg, MODVar, MODPos, and MODSel denote the ANN modules, corresponding to the operations shown in Figure 25b. The top layer is the input layer. In this figure, shaded boxes correspond to values transported between modules, not units.

For $\mathrm{ANN}_1^M$, the modules were hand-designed for the tasks they are to perform. In some cases, this meant using other than standard (i.e., sigmoid, linear) transfer functions and very unusual weight settings. Figure 28 shows the four module designs and the weights assigned to their connections.

— The average module (MODAvg, Fig. 28a) uses only linear transfer functions in units averaging the inputs. Four of these modules can be used to calculate $\mu_1, \ldots, \mu_4$.

— The variance module (MODVar, Fig. 28b) uses a submodule (on the left) to calculate the average of the subwindow it is presented. The other submodule (on the right) just transports the original data to lower layers.10 The calculated averages are then subtracted from the original inputs, followed by a layer of units using an $f(a) = \tanh^2(a)$ transfer function to approximate the square of the input11 (see Fig. 29a). Four of these modules can be used to find $\sigma_1^2, \ldots, \sigma_4^2$.

10 This part is not strictly necessary, but was incorporated since links between nonadjacent layers are difficult to implement in the software package used (Hoekstra et al., 1996).

11 This function is chosen since it approximates $a^2$ well on the interval it will be applied to, but is bounded: it asymptotically reaches 1 as the input grows to $\pm\infty$. The latter property is important for training the ANN, as unbounded transfer functions will hamper convergence.

— The position-of-minimum module for selecting the position of the minimum of four inputs (MODPos, Fig. 28c) is the most complicated one. Using the logarithm of the sigmoid as a transfer function,

$$f(a) = \ln \frac{1}{1 + \exp(-a)} \tag{19}$$

(see Fig. 29b), units in the first three hidden layers act as switches comparing their two inputs. Alongside these switches, linear transfer function units are used to transport the original values to deeper layers. Weights $w_A$ and $w_B$ are very high to enable the units to act as switches. If the input connected using weight $w_A$ (input $I_A$) is greater than the input connected using weight $w_B$ (input $I_B$), the sum will be large and negative, the output of the sigmoid will approach 0.0, and the output of the unit will be $-\infty$. If $I_B > I_A$, on the other hand, the sum will be large and positive, the output of the sigmoid part will approach 1.0, and the final output of the unit will be 0.0. This output can be used as an inhibiting signal, by passing it to units of the same type in lower layers. In this way, units in the third hidden layer have as output, if the inputs are denoted as $\sigma_1$, $\sigma_2$, $\sigma_3$, and $\sigma_4$:

$$s_i = \begin{cases} 0.0 & \sigma_i < \min_{m = 1, \ldots, 4 \,\wedge\, m \neq i} \sigma_m \\ 0.5 & \text{otherwise} \end{cases} \tag{20}$$

Weights $w_A$ and $w_B$ are slightly different to handle cases in which two inputs are exactly the same but one (in this case arbitrary) minimum position has to be found. The fourth and fifth hidden layers ensure that exactly one output unit will indicate that the corresponding input was minimal, by setting the output of a unit to 0.0 if another unit to the right has an output $\neq 0.0$. The units perform an xor-like function, giving high output only when exactly one of the inputs is high. Finally, biases (indicated by $b_A$, $b_B$, and $b_C$ next to the units) are used to let the outputs have the right value (0.0 or 0.5).

— The selection module (MODSel, Fig. 28d) uses large weights coupled to the position-of-minimum module outputs (inputs $s_1$, $s_2$, $s_3$, and $s_4$) to suppress the unwanted average values $\mu_i$ before adding these. The small weights with which the average values are multiplied and the large incoming weight of the output unit are used to avoid the nonlinearity of the transfer function.
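The two nonstandard transfer functions used in the hand-designed modules are easy to inspect numerically. The sketch below evaluates the squared hyperbolic tangent used in MODVar and the logarithm of the sigmoid of Eq. (19) used in MODPos, and shows the switching behavior of a single comparison unit. The `gain` parameter stands in for the "very high" weights; its value and the use of one common gain for both inputs are assumptions made for illustration only.

```python
import numpy as np


def tanh_squared(a):
    """MODVar transfer function: close to a**2 for small |a|, bounded by 1."""
    return np.tanh(a) ** 2


def log_sigmoid(a):
    """MODPos transfer function, Eq. (19): ln(1 / (1 + exp(-a))), computed stably."""
    return -np.logaddexp(0.0, -a)


def switch(i_a, i_b, gain=1e3):
    """Close to 0.0 when i_a < i_b, strongly negative when i_a > i_b.

    The gain plays the role of the large weights; its value is illustrative.
    """
    return log_sigmoid(gain * (i_b - i_a))


a = np.linspace(-0.5, 0.5, 5)
print(np.round(tanh_squared(a), 4), a ** 2)   # approximation of the square
print(switch(0.2, 0.3), switch(0.3, 0.2))     # "pass" versus "inhibit"
```

The unbounded negative branch of the second function is what makes the switch an effective inhibiting signal for later layers, and it is also the property that makes such units hard to train.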
Figure 28. The modules for (a) calculating the average, (b) calculating the variance, (c) finding the position of the minimum variance, and (d) selecting the right average. In all modules, the top layer is the input layer. Differently shaded boxes correspond to units with different transfer functions.
Since all weights were fixed, this ANN was not trained.

$\mathrm{ANN}_2^M$ modules have the same architectures as those of $\mathrm{ANN}_1^M$. However, in this case the weights were not fixed, hence the modules could be trained. These modules were expected to perform poorly, as some of the optimal weights (as set in $\mathrm{ANN}_1^M$) were very high and some of the transfer functions are unbounded (see Fig. 29b).

In $\mathrm{ANN}_3^M$ modules, nonstandard transfer functions were no longer used. As a result, the modules MODVar and MODPos had to be replaced by standard ANNs. These ANNs contained two layers of 25 hidden units, each of which had a double sigmoid transfer function. This number of hidden units was thought to give the modules a sufficiently large number of parameters while keeping training times feasible.

In the final type, $\mathrm{ANN}_4^M$, all modules consisted of standard ANNs with two hidden layers of 25 units each.
Figure 29. The nonstandard transfer functions used in (a) MODVar and (b) MODPos.
With these four types, a transition is made from a fixed, hard-wired type of ANN ($\mathrm{ANN}_1^M$), which is a hard-wired implementation of the Kuwahara filter, to a free type ($\mathrm{ANN}_4^M$) in which only the prior knowledge that the filter consists of four subtasks is used. The goal of the exercise is to see a gradual change in behavior and performance.

Note that the $\mathrm{ANN}_1^M$ architecture is probably not the only error-free implementation possible using ANN units. It should be clear from the discussion, though, that any architecture should resort to using nonstandard transfer functions and unconventional weight settings to perform the nonlinear operations error-free over a large range of input values. In this respect, the exact choices made here are less important.
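The modular decomposition itself can be sketched with ordinary functions in place of ANN modules. The code below is an illustration under stated assumptions, not the hand-designed network: it uses exact arithmetic instead of transfer functions, and `mod_pos` returns a plain one-hot indicator rather than the 0.0/0.5 coding of the original module. All function names are hypothetical. It processes a single 5 × 5 window (k = 3) through the four stages in sequence.

```python
import numpy as np


def mod_avg(sub):
    """MODAvg: mean gray value of one subwindow."""
    return sub.mean()


def mod_var(sub):
    """MODVar: variance of one subwindow (exact square instead of tanh^2)."""
    return ((sub - sub.mean()) ** 2).mean()


def mod_pos(variances):
    """MODPos: one-hot indicator of the smallest variance (first one on ties)."""
    s = np.zeros(len(variances))
    s[int(np.argmin(variances))] = 1.0
    return s


def mod_sel(averages, s):
    """MODSel: keep the average flagged by MODPos, suppress the others."""
    return float(np.dot(averages, s))


def modular_kuwahara(window, k=3):
    """Filter output for one (2k-1) x (2k-1) window, built from the four modules."""
    pad = k - 1
    subs = [window[r:r + k, c:c + k] for r in (0, pad) for c in (0, pad)]
    averages = np.array([mod_avg(s) for s in subs])
    variances = np.array([mod_var(s) for s in subs])
    return mod_sel(averages, mod_pos(variances))


# A window containing a vertical edge; the central pixel lies on the dark side.
window = np.zeros((5, 5))
window[:, 3:] = 9.0
print(modular_kuwahara(window))  # a uniform dark subwindow is selected
```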
2. Standard Networks

As shown in Section III, the use of prior knowledge in ANN design will not always guarantee that such ANNs will perform better than standard architectures. To validate results obtained with the ANNs described in the previous section, experiments were also performed with standard, fully connected feedforward ANNs. Although one hidden layer should theoretically be sufficient (Funahashi, 1989; Hornik et al., 1989), the addition of a layer may ease training or lower the number of required parameters (although there is some disagreement on this). Therefore, ANNs having one or two hidden layers of 1, 2, 3, 4, 5, 10, 25, 50, 100, or 250 units each were used. All units used the double sigmoid transfer function. These ANNs will be referred to as $\mathrm{ANN}^S_{L \times U}$, where L indicates the number of hidden layers
(1 or 2) and U the number of units per hidden layer. $\mathrm{ANN}^S_L$ will be used to denote the entire set of ANNs with L hidden layers.

3. Data Sets and Training

To train the ANNs, a training set was constructed by drawing samples randomly, using a uniform distribution, from image A (input) and its Kuwahara filtered version (output), both shown in Figure 26a. The original 8-bit 256-gray value image was converted to a floating point image and rescaled to the range $[-0.5, 0.5]$. Three data sets were constructed, containing 1000 samples each: a training set, a validation set, and a testing set. The validation set was used to prevent overtraining: if the error on the validation set did not drop below the minimum error found so far on that set for 1000 cycles, training was stopped. Because in all experiments only $k = 3$ Kuwahara filters were studied, the input to each ANN was a 5 × 5 region of gray values and the training target was 1 value. For the modular ANNs, additional data sets were constructed from these original data sets to obtain the mappings required by the individual ANNs (average, variance, position-of-minimum, and selection).

For training, the standard stochastic backpropagation algorithm (Rumelhart et al., 1986) was used. Weights were initialized to random values drawn from a uniform distribution in the range $[-0.1, 0.1]$. The learning rate was set to 0.1; no momentum was used. Training was stopped after 25,000 cycles or if the validation set indicated overtraining, whichever came first. All experiments were repeated five times with different random initializations; all results reported are averages over five experiments. Wherever appropriate, error bars indicate standard deviations.

4. Results

Results are given in Figures 30 and 31. These will be discussed here for the different architectures.

a. Modules. The different modules show rather different behavior (Fig. 30). Note that in these figures the MSE was calculated on a testing set of 1000 samples.
As was to be expected, the MSE is lowest for the hand-constructed ANN$^M_1$ modules: for all ANNs except MODPos, it was 0. The error remaining for the MODPos module may look quite high, but is caused mainly by the ANN choosing a wrong minimum when two or more of its input values are very similar. Although the effect on the behavior of the final module (MODSel) will be negligible, the MSE is quite high since one output that should have been 0.5 is incorrectly set to 0.0 and vice versa, leading to an MSE of 0.25 for that input pattern. For the other ANNs, it seems that if the
Figure 30. Performance of the individual modules on the testing set in each of the modular ANNs, ANN$^M_1$ . . . ANN$^M_4$.
manually set weights are dropped (ANN$^M_2$), the modules are not able to learn their function as well as possible (i.e., as well as ANN$^M_1$). Nonetheless, the MSE is quite good and comparable to ANN$^M_3$ and ANN$^M_4$.

When the individual tasks are considered, the average is obviously the easiest function to approximate. Only for ANN$^M_4$, in which standard modules with two hidden layers were used, is the MSE larger than 0.0; apparently these modules generalize less well than the hand-constructed, linear MODAvgs. The variance too is not difficult: MSEs are $O(10^{-5})$. Clearly, the position-of-minimum task is the hardest. Here, almost all ANNs perform poorly. Performances on the selection problem, finally, are quite good. What is interesting is that the more constrained modules (ANN$^M_2$, ANN$^M_3$) perform less well than the standard ones. Here again the effect that the construction is closely connected to the optimal set of weights plays a role. Although there is an optimal weight set, the training algorithm did not find it.

b. Modular Networks. When the modules are concatenated, the initial MSEs of the resulting ANNs are poor: for ANN$^M_2$, ANN$^M_3$, and ANN$^M_4$ they are $O(1)$, $O(10^{-1})$, and $O(10^{-2})$, respectively. The MODPos module is mainly responsible for this; it is the hardest module to learn due to the nonlinearity involved (see the discussion in Section V.B.4.a). If the trained MODPos in ANN$^M_2$ . . . ANN$^M_4$ is replaced by the constructed ANN$^M_1$ module, the overall MSE always decreases significantly (see Table 2). This is an indication that although its MSE seems low [$O(10^{-2})$], this module does not perform well. Furthermore, it seems that the overall MSE is highly sensitive to the error this module makes. However, when the complete ANNs are trained a little further with a low learning rate (0.1), the MSE improves rapidly: after only 100–500 learning cycles training can be stopped. In Pugmire et al. (1998), the same effect occurs.
The MSEs of the final ANNs on the entire image are shown in Figure 31a, e, and i for images A, B, and C, respectively. Images B and C were preprocessed in the same way as image A: the original 8-bit (B) and 5-bit (C) 256-gray value images were converted to floating point images, with gray values in the range [-0.5, 0.5]. To get an idea of the significance of these results, reinitialized versions of the same ANNs were also trained. That is, all weights of the concatenated ANNs were initialized randomly without using the prior knowledge of modularity. The results of these training runs are shown in Figure 31b, f, and j. Note that only ANN$^M_2$ cannot be trained well from scratch, due to the nonstandard transfer functions used. For ANN$^M_3$ and ANN$^M_4$ the MSE is comparable to the other ANNs. This would indicate that modular training is not beneficial, at least according to the MSE criterion.
Figure 31. Performance of all ANN$^M$s and ANN$^S$s on the three images used: (a–d) on image A (Fig. 26a), (e–h) on image B (Fig. 26b), and (i–l) on image C (Fig. 26c). For the ANN$^S$s, the x-axis indicates the number of hidden units per layer.
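TABLE 2
Dependence of Performance, in MSE on the Image A Testing Set, on the MODPos Module (a)

Type         MSE                                             MSE with MODPos of ANN$^M_1$
ANN$^M_2$    $9.2 \times 10^{-1} \pm 5.2 \times 10^{-1}$     $8.7 \times 10^{-4} \pm 1.7 \times 10^{-4}$
ANN$^M_3$    $1.2 \times 10^{-1} \pm 1.2 \times 10^{-1}$     $1.0 \times 10^{-3} \pm 2.0 \times 10^{-4}$
ANN$^M_4$    $3.6 \times 10^{-2} \pm 1.7 \times 10^{-2}$     $1.2 \times 10^{-3} \pm 2.4 \times 10^{-4}$

(a) Values given are average MSEs and standard deviations.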
The ANNs seem to generalize well, in that nearly identical MSEs are reached for each network on all three images. However, the variance in MSE is larger on image B and image C than it is for image A. This indicates that the modular networks may have become slightly too adapted to the content of image A.

c. Standard Networks. Results for the standard ANNs, ANN$^S$s, are shown in Figure 31c–d, g–h, and k–l for images A, B, and C. In each case, the first figure gives the results for ANNs with one hidden layer and the second figure for ANNs with two hidden layers. What is most striking is that for almost all sizes of the ANNs the MSEs are more or less the same. Furthermore, this MSE is nearly identical to the one obtained by the modular ANNs ANN$^M_2$ . . . ANN$^M_4$. It also seems that the smaller ANNs, which give a slightly larger MSE on image A and image B, perform a bit worse on image C. This is due to the larger number of edge pixels in image C; the next section will discuss this further.

C. Investigating the Error

The experiments in the previous section indicate that no matter which ANN is trained (except for ANN$^M_1$), the MSE it will be able to reach on the images is equal. However, visual inspection shows small differences between images filtered by various ANNs; see, e.g., the left and center columns of Figure 32. To gain more insight into the actual errors the ANNs make, a technique can be borrowed from the field of Bayesian learning, which allows the calculation of error bars for each output of the ANN (Bishop, 1995). The computation is based on the Hessian of the ANN output with respect to its weights w, $H = \nabla^2_w R(x; w)$, which needs to be found first. Using H, for each input x a corresponding variance $\sigma^2_{tot}$ can be found. This makes it possible to create an image in which each pixel corresponds to $2\sigma_{tot}$, i.e., the gray value
equals half the width of the error bar on the ANN output at that location. Conversely, the inverse of $\sigma_{tot}$ is sometimes used as a measure of confidence in an ANN output for a certain input.

For a number of ANNs, the Hessian was calculated using a finite differencing approximation (Bishop, 1995). To calculate the error bars, this matrix has to be inverted first. Unfortunately, for the ANN$^M$s, inversion was impossible as their Hessian matrices were too ill-conditioned because of the complicated architectures, containing fixed and shared weights. Figure 32b and c shows the results for two standard ANNs, ANN$^S_{1\times25}$ and ANN$^S_{2\times25}$. In the left column the ANN output for image A (Fig. 26a) is shown. The center column shows the absolute difference between this output and the target image. In the third column the error bars calculated using the Hessian are shown.

The figures show that the error the ANN makes is not spread out evenly over the image. The highest errors occur near the edges in image A, as can be seen by comparing the center column of Figure 32 with the gradient magnitude $|\nabla I_A|$ of image A, shown in Figure 33a. This gradient magnitude is calculated as (Young et al., 1998)

$$|\nabla I_A| = \sqrt{\left(\frac{\partial I_A}{\partial x}\right)^2 + \left(\frac{\partial I_A}{\partial y}\right)^2} \qquad (21)$$

where $\partial I_A / \partial x$ is approximated by convolving image A with a [-1 0 1] mask, and $\partial I_A / \partial y$ by convolving image A with its transpose.

The error bar images, in the right column of Figure 32, show that the standard deviation of ANN output is also highest on and around the edges. Furthermore, although the output of the ANNs looks identical, the error bars show that the ANNs actually behave differently. These results lead to the conclusion that the ANNs have learned fairly well to approximate the Kuwahara filter in flat regions, where it operates like a local average filter.
However, on and around edges they fail to give the correct output; most edges are sharpened slightly, but not nearly as much as they would be by the Kuwahara filter. In other words, the linear operation of the Kuwahara filter is emulated correctly, but the nonlinear part is not. Furthermore, the error bar images suggest there are differences between ANNs that are not expressed in their MSEs.
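The gradient magnitude of Eq. (21), against which the error images are compared, can be sketched in a few lines of NumPy. This is only an illustration: the function name `gradient_magnitude` is hypothetical, and replicating the border pixels is an assumption, since the text does not state how image borders were treated.

```python
import numpy as np


def gradient_magnitude(image):
    """Gradient magnitude using [-1 0 1] differences along x and y."""
    image = np.asarray(image, dtype=float)
    # Assumption: replicate border pixels so the output has the input's shape.
    padded = np.pad(image, 1, mode="edge")
    # Difference of right and left neighbours: the [-1 0 1] mask (up to sign).
    dx = padded[1:-1, 2:] - padded[1:-1, :-2]
    # The transposed mask: difference of lower and upper neighbours.
    dy = padded[2:, 1:-1] - padded[:-2, 1:-1]
    return np.sqrt(dx ** 2 + dy ** 2)
```

The sign convention of the mask does not matter here, because both derivatives are squared before they are summed.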
D. Discussion

The most noticeable result of the experiments above is that whatever ANN is trained, be it a simple one-hidden-unit ANN or a specially constructed
Figure 32. (a) The original image A. (b) and (c), from left to right: outputs of two ANN$^S$s on image A; absolute differences between target image and ANN output; and ANN output error bar widths plotted as gray values.
Figure 33. (a) The gradient magnitude of image A, $|\nabla I_A|$. (b) Performance of ANN$^S_{1\times50}$ for various training set sample sizes.
modular ANN, approximately the same performance (measured in MSE) can be reached. Modular training does not seem to boost performance at all. However, inspection of error images and standard deviation of ANN outputs suggests that there are differences between ANNs. Furthermore, the errors made by ANNs are concentrated around edges, i.e., in the part where the Kuwahara filter’s nonlinearity comes into play. There are a number of hypotheses as to what causes all ANNs to seemingly perform equally well, some of which will be investigated in the next section:
- the problem may simply be too hard to be learned by a finite-size ANN. This does not seem plausible, since even for a two-hidden layer ANN with 250 hidden units per layer, resulting in a total of 69,000 free parameters, the MSE is no better than for very simple ANNs. One would at least expect to see some enhancement of results;
- it is possible that the sample size of 1000 is too small, as it was rather arbitrarily chosen. An experiment was performed in which ANN$^S_{1\times50}$ was trained using training sets with 50, 100, 250, 500, 1000, and 2000 samples. The results, given in Figure 33b, show, however, that the chosen sample size of 1000 seems sufficient. The decrease in MSE when using 2000 samples in the training set is rather small;
- the training set may not be representative for the problem, i.e., the nature of the problem may not be well reflected in the way the set is sampled from the image;
- the error criterion may not be fit for training the ANNs or assessing their performance. It is very well possible that the MSE criterion used is of limited use in this problem, since it weighs both the interesting parts of the image, around the edges, and the less interesting parts equally;
- the problem may be of such a nature that local minima are prominently present in the error surface, while the global minima are very hard to reach, causing suboptimal ANN operation.
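The parameter count quoted in the first hypothesis can be checked with a short calculation. The sketch below assumes a fully connected feed-forward ANN with 25 inputs (one 5 × 5 window), one output, and one bias per hidden and output unit; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs=1):
    """Number of weights and biases in a fully connected feed-forward ANN."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # Each unit has one weight per unit in the previous layer, plus one bias.
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


# Two hidden layers of 250 units on a 5 x 5 input window.
n_large = count_parameters(25, [250, 250])
# A single hidden unit, the smallest standard ANN considered.
n_small = count_parameters(25, [1])
```

Under these assumptions the large network has 26 · 250 + 251 · 250 + 251 = 69,501 parameters, consistent with the figure of roughly 69,000 in the text, against 28 for the one-hidden-unit network.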
VI. Inspection and Improvement of Regression Networks

This section tries to answer the questions raised by the experiments in the previous section, by investigating the influence of the data set, the appropriateness of the MSE as a performance measure, and the trained ANNs themselves.

A. Edge-Favoring Sampling

Inspection of the ANN outputs and the error bars on those outputs led to the conclusion that the ANNs had learned to emulate the Kuwahara filter well in most places, except in regions near edges (Section V.C). A problem in sampling a training set from an image$^{12}$ for this particular application is that such interesting regions, i.e., the regions where the filter is nonlinear, are very poorly represented. Edge pixels constitute only a very small percentage of the total number of pixels in an image [as a rule of thumb, $O(\sqrt{n})$ edge pixels on $O(n)$ image pixels] and will therefore not be represented well in the training set when sampling randomly using a uniform distribution.

12 From here on, the term sampling will be used to denote the process of constructing a data set by extracting windows from an image with coordinates sampled from a certain distribution on the image grid.

To learn more about the influence of the training set on performance, a second group of data sets was created by sampling from image A (Fig. 26a) with a probability density function based on its gradient magnitude image $|\nabla I_A|$ [Eq. (21)]. If $|\nabla I|$ is scaled by a factor c such that $\int_x \int_y c\,|\nabla I(x, y)|\,dy\,dx = 1$, and used as a probability density function when sampling, edge regions have a much higher probability of being included in the data set than pixels from flat regions. This will be called edge-favoring sampling, as opposed to normal sampling.

1. Experiments

Performances (in MSE) of ANNs trained on this edge-favoring set are given in Figures 34 and 35. Note that the results obtained on the normal training set (first shown in Fig. 31) are included again to facilitate comparison. The sampling of the data set clearly has an influence on the results. Because the edge-favoring set contains more samples taken from regions around edges, the task of finding the mean is harder to learn due to the larger variation. At the same time, it eases training the position-of-minimum and selection modules. For all tasks except the average, the final MSE on the edge-favoring testing set (Fig. 34b, d, f, and h) is better than that of ANNs trained using a normal training set. The MSE is, in some cases, even lower on the normal testing set (Fig. 34e and g).

Overall results for the modular and standard ANNs (Fig. 35) suggest that performance decreases when ANNs are trained on a specially selected data set (i.e., the MSE increases).
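The edge-favoring sampling scheme of Section VI.A can be sketched as follows. The sketch takes a precomputed gradient magnitude image, scales it so that it sums to 1 over the pixel grid (the discrete counterpart of the factor c), and draws window coordinates from the resulting distribution. The function name `sample_edge_favoring` is hypothetical, and the treatment of an image without any gradient is an assumption.

```python
import numpy as np


def sample_edge_favoring(grad_mag, n_samples, seed=None):
    """Draw pixel coordinates with probability proportional to |grad I|."""
    rng = np.random.default_rng(seed)
    grad_mag = np.asarray(grad_mag, dtype=float)
    total = grad_mag.sum()
    if total <= 0:
        # Assumption: a completely flat image has no edges to favor.
        raise ValueError("gradient magnitude image is zero everywhere")
    pdf = grad_mag / total  # the scaling by c: probabilities sum to 1
    flat = rng.choice(grad_mag.size, size=n_samples, p=pdf.ravel())
    rows, cols = np.unravel_index(flat, grad_mag.shape)
    return rows, cols
```

Replacing `pdf` by a constant array would give back the normal, uniform sampling, so the two kinds of data set differ only in the distribution from which coordinates are drawn.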
However, when the quality of the filtering operation is judged by looking at the filtered images (see, e.g., Fig. 36), one finds that these ANNs give superior results in approximating the Kuwahara filter. Clearly, there is a discrepancy between performance as indicated by the MSE and visual perception of filter quality. Therefore, below we will investigate the possibility of finding another way of measuring performance.

B. Performance Measures for Edge-Preserving Smoothing

The results given in Section VI.A.1 show that it is very hard to interpret the MSE as a measure of filter performance. Although the MSEs differ only
Figure 34. Performance of the individual modules in each of the modular ANNs, ANN$^M_1$ . . . ANN$^M_4$, on the normal testing set (a, c, e, g) and edge-favoring testing set (b, d, f, h).
Figure 35. Performance of all ANN$^M$s and ANN$^S$s on the three images used: (a–d) on image A (Fig. 26a), (e–h) on image B (Fig. 26b), and (i–l) on image C (Fig. 26c).
slightly, visually the differences are quite large. If images filtered by various ANNs trained on the normal and edge-favoring data sets are compared, it seems clear which ANN performs better. As an example, Figure 36 shows two filtered images. The left image was filtered by ANN$^M_4$ trained on an edge-favoring training set. The image on the right is the output of ANN$^S_{1\times100}$ trained on a normal data set. Although the MSEs are nearly equal ($1.48 \times 10^{-3}$ for the left image versus $1.44 \times 10^{-3}$ for the right one), in the left image the edges seem much crisper and the regions much smoother than in the image on the right; that is, one would judge the filter used to produce the left image to perform better.

One would like to find a measure for filter performance that bears more relation to this qualitative judgment than the MSE. The reason why the MSE is so uninformative is that by far the largest number of pixels do not lie on edges. Figure 37a illustrates this: it shows that the histogram of the gradient magnitude image is concentrated near zero, i.e., most pixels lie in flat regions. Because the MSE averages over all pixels, it may be quite low for filters that preserve edges poorly. Vice versa, the visual quality of the images produced by the ANNs trained using the edge-favoring data set may be better while their MSE is worse, due to a large number of small errors made in flat regions.

The finding that the MSE does not correlate well with perceptual quality judgment is not a new one. A number of alternatives have been proposed, among which the mean absolute error (MAE) seems to be the most prominent one. There is also a body of work on performance measures for edge detection, e.g., Pratt's Figure of Merit (FOM) (Pratt, 1991) or Average Risk (Spreeuwers, 1992). However, none of these captures the dual goals of edge sharpening and region smoothing present in this problem.
Figure 36. Two ANN output images with details. For the left image, output of ANN$^M_4$ trained on the edge-favoring set, the MSE is $1.48 \times 10^{-3}$; for the right image, output of ANN$^S_{1\times100}$ trained on a normally sampled set, it is $1.44 \times 10^{-3}$. The details in the middle show the target output of the Kuwahara filter; the entire target image is shown in Figure 26a.
Figure 37. (a) Histograms of gradient magnitude values $|\nabla I|$ of image A (Fig. 26a) and a Kuwahara filtered version ($k = 3$). (b) Scattergram of the gradient magnitude image pixel values with estimated lines.
1. Smoothing versus Sharpening

In edge-preserving smoothing, two goals are pursued: on the one hand the algorithm should preserve edge sharpness, and on the other hand it should smooth the image in regions that do not contain edges. In other words, the gradient of an image should remain the same in places where it is high$^{13}$ and decrease where it is low. If the gradient magnitude $|\nabla I|$ of an image I is plotted versus $|\nabla f(I)|$ of a Kuwahara-filtered version f(I), for each pixel I(i, j), the result will look like Figure 37b. In this figure, the two separate effects can be seen: for a number of points, the gradient is increased by filtering while for another set of points the gradient is decreased. The steeper the upper cloud, the better the sharpening; the flatter the lower cloud, the better the smoothing. Note that the figure gives no indication of the density of both clouds: in general, by far the most points lie in the lower cloud, since more pixels lie in smooth regions than on edges. The graph is reminiscent of the scattergram approach discussed (and denounced) in Katsulai and Arimizu (1981), but here the scattergram of the gradient magnitude images is shown.
13 Or even increase. If the regions divided by the edge become smoother, the gradient of the edge itself may increase, as long as there was no overshoot in the original image. Overshoot is defined as the effect of artificially sharp edges, which may be obtained by adding a small value to the top part of an edge and subtracting a small value from the lower part (Young et al., 1998).
To estimate the slope of the trend of both clouds, the point data is first separated into two sets:

$$A = \left\{ \left( |\nabla I|_{(i,j)},\; |\nabla f(I)|_{(i,j)} \right) \;\middle|\; |\nabla I|_{(i,j)} \geq |\nabla f(I)|_{(i,j)} \right\} \qquad (22)$$

$$B = \left\{ \left( |\nabla I|_{(i,j)},\; |\nabla f(I)|_{(i,j)} \right) \;\middle|\; |\nabla I|_{(i,j)} < |\nabla f(I)|_{(i,j)} \right\} \qquad (23)$$

Lines $y = ax + b$ can be fitted through both sets using a robust estimation technique, minimizing the absolute deviation (Press et al., 1992), to get a density-independent estimate of the factors with which edges are sharpened and flat regions are smoothed:

$$(a_A, b_A) = \arg\min_{(a,b)} \sum_{(x,y) \in A} |y - (ax + b)| \qquad (24)$$

$$(a_B, b_B) = \arg\min_{(a,b)} \sum_{(x,y) \in B} |y - (ax + b)| \qquad (25)$$

The slope of the lower line found, $a_A$, gives an indication of the smoothing induced by the filter f. Likewise, $a_B$ gives an indication of the sharpening effect of the filter. The offsets $b_A$ and $b_B$ are discarded, although it is necessary to estimate them to avoid a bias in the estimates of $a_A$ and $a_B$. It is required that $a_A \leq 1$ and $a_B \geq 1$, so the values are clipped at 1 if necessary; because the estimated trends are not forced to go through the origin, this may occur.

To account for the number of pixels actually used to estimate these values, the slopes found are weighted by the relative number of points in the corresponding cloud. Therefore, the numbers

$$\mathrm{Smoothing}(f, I) = \frac{|A|}{|A| + |B|} \left( a'_A - 1 \right) \qquad (26)$$

and

$$\mathrm{Sharpening}(f, I) = \frac{|B|}{|A| + |B|} \left( a_B - 1 \right) \qquad (27)$$

are used, where $a'_A = 1/a_A$ was substituted to obtain numbers in the same range $[0, \infty)$. These two values can be considered to be an attenuation factor of flat regions and an amplification factor of edges, respectively.

Note that these measures cannot be used as absolute quantitative indications of filter performance, since a higher value does not necessarily mean a better performance, i.e., there is no absolute optimal value. Furthermore, the measures are highly dependent on image content and
scaling of f(I) with respect to I. The scaling problem can be neglected, however, since the ANNs were trained to give output values in the correct range. Thus, for various filters f(I) on a certain image, these measures can now be compared, giving an indication of relative filter performance on that image. To get an idea of the range of possible values, smoothing and sharpening values for some standard filters can be calculated, like the Kuwahara filter, a Gaussian filter

$$f_G(I, \sigma) = I \otimes \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right) \qquad (28)$$

for$^{14}$ $\sigma = 0.0, 0.1, \ldots, 2.0$; and an unsharp masking filter

$$f_U(I, \kappa) = I - \kappa \left[ I \otimes \begin{pmatrix} -1 & 2 & -1 \\ 2 & -4 & 2 \\ -1 & 2 & -1 \end{pmatrix} \right] \qquad (29)$$

which subtracts $\kappa$ times the Laplacian$^{15}$ from an image, $\kappa = 0.0, 0.1, \ldots, 2.0$.

14 For $\sigma \leq 0.5$ the Gaussian is ill-sampled; in this case, a discrete approximation is used that is not strictly speaking a Gaussian.
15 This is an implementation of the continuous Laplacian edge detector mentioned in Section IV.A.1, different from the discrete detector shown in Figure 11.

2. Experiments

Smoothing and sharpening performance values were calculated for all ANNs discussed in Section VI.A.1. The results are shown in Figure 38. First, lines of performance values for the Gaussian and unsharp masking filters give an indication of the range of possible values. As expected, the Gaussian filter on images A and B (Fig. 26a and b) gives high smoothing values and low sharpening values, while the unsharp masking filter gives low smoothing values and high sharpening values. The Kuwahara filter scores high on smoothing and low on sharpening. This is exactly as it should be: the Kuwahara filter should smooth while preserving the edges; it should not necessarily sharpen them. If ANNs have a higher sharpening value, they are usually producing overshoot around the edges in the output images.

The measures calculated for image C (Fig. 26c) show the limitations of the method. In this image there is a large number of very sharp edges in an otherwise already rather smooth image. For this image the Gaussian filter gives only very low smoothing values and the unsharp masking filter gives no sharpening value at all. This is due to the fact that for this image, subtracting the Laplacian from an image produces a very small sharpening
value, together with a negative smoothing value, caused by the Laplacian greatly enhancing the amount of noise in the image. Because the values were clipped at 0, the results are not shown in the figure.

Regarding the ANNs, some things become clear. First, the hand-constructed ANN (ANN$^M_1$) almost perfectly mimics the Kuwahara filter, according to the new measures. However, as soon as the hand-set weights are dropped (ANN$^M_2$), performance drops drastically. Apparently the nonstandard transfer functions and special architecture inhibit the ANN too much. ANN$^M_3$ and ANN$^M_4$ perform better and generalize well to other images. However, besides ANN$^M_1$, no other ANN in this study seems to be able to approximate the Kuwahara filter well. The best trained ANN still performs much worse.

Second, edge-favoring sampling has a strong influence. Most of the architectures discussed perform reasonably only when trained on a set with a significantly larger number of edge samples than acquired by random sampling, especially the ANN$^S$s. This indicates that although the MSE suggests that ANNs trained on an edge-favoring set perform worse, sampling in critical areas of the image is a prerequisite for obtaining a well-performing, nonlinear approximation to the Kuwahara filter.

Most standard ANNs perform poorly. Only for ANN$^S_{2\times10}$, ANN$^S_{2\times25}$, and ANN$^S_{2\times50}$ is performance reasonable. In retrospect, this concurs with the drop in the MSE that can be seen in Figure 35d, although the differences there are very small. ANN$^S_{2\times50}$ clearly performs best. A hypothesis is that this depends on the training of the ANNs, since training parameters were not optimized for each ANN. To verify this, the same set of standard ANNs was trained in experiments in which the weights were initialized using random values drawn from a uniform distribution over the range [-1.0, 1.0], using a learning rate of 0.5. Now, the optimal standard ANN was found to be ANN$^S_{2\times25}$, with all other ANNs performing very poorly.
Generalization is, for all ANNs, reasonable. Even on image C (Fig. 26c), which differs substantially from the training image (image A, Fig. 26a), performance is quite good. The best standard ANN, ANN$^S_{2\times50}$, seems to generalize a little better than the modular ANNs.
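The smoothing and sharpening measures of Eqs. (22)–(27) can be sketched as below, given the gradient magnitude images of the input and of the filtered output. Two points are assumptions of this sketch rather than the method used in the experiments: the absolute-deviation line fit is approximated by iteratively reweighted least squares instead of the routine from Press et al. (1992), and a non-positive slope for the lower cloud is replaced by a small positive floor. The function names are hypothetical.

```python
import numpy as np


def fit_line_lad(x, y, n_iter=50, eps=1e-8):
    """Approximate least-absolute-deviation fit of y = a*x + b (IRLS)."""
    design = np.column_stack([x, np.ones_like(x)])
    weights = np.ones_like(y)
    for _ in range(n_iter):
        root = np.sqrt(weights)
        coef, *_ = np.linalg.lstsq(design * root[:, None], y * root, rcond=None)
        # Reweight by 1/|residual| so the squared loss mimics the absolute loss.
        weights = 1.0 / np.maximum(np.abs(y - design @ coef), eps)
    return coef[0], coef[1]


def smoothing_sharpening(grad_in, grad_out):
    """Smoothing and sharpening values from |grad I| and |grad f(I)|."""
    x = np.ravel(grad_in).astype(float)
    y = np.ravel(grad_out).astype(float)
    in_a = x >= y   # set A: gradient not increased by the filter (lower cloud)
    in_b = ~in_a    # set B: gradient increased by the filter (upper cloud)
    smoothing = sharpening = 0.0
    if in_a.sum() >= 2:
        a_a, _ = fit_line_lad(x[in_a], y[in_a])
        # Clip at 1; the positive floor is an assumption of this sketch.
        a_a = min(max(a_a, 1e-12), 1.0)
        smoothing = in_a.mean() * (1.0 / a_a - 1.0)
    if in_b.sum() >= 2:
        a_b, _ = fit_line_lad(x[in_b], y[in_b])
        a_b = max(a_b, 1.0)  # clip at 1
        sharpening = in_b.mean() * (a_b - 1.0)
    return smoothing, sharpening
```

For an identity filter both values are zero, which illustrates why the measures are only meaningful when comparing filters on the same image.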
3. Discussion

In Dijk et al. (1999), it is shown that the smoothing and sharpening performance measures proposed here correlate well with human perception. It should be noted that in this study, subjects had fewer problems in discerning various levels of smoothing than they had with levels of sharpening. This indicates that the two measures proposed are not equivalently spaced.
Figure 38. Performance of standard filters, all ANN$^M$s, and ANN$^S$s on the three images used: (a–d) on image A (Fig. 26a), (e–h) on image B (Fig. 26b), and (i–l) on image C (Fig. 26c). In the legends, ef stands for ANNs trained on edge-favoring data sets, as opposed to normally sampled data sets (nrm); further indicates ANNs initialized by training the individual modules as opposed to ANNs trained from scratch (over); and 10, 25, and so on denote the number of units per hidden layer.
The fact that the measures show that edge-favoring sampling in building a training set increases performance considerably suggests possibilities for extensions. Pugmire et al. (1998) claim that learning should be structured, i.e., start with the general problem and then proceed to special cases. This can be easily accomplished in training set construction, by adding a constant to each pixel in the gradient magnitude image before scaling and using it as a probability density function from which window coordinates are sampled. If this constant is gradually lowered, edge pixels become better represented in the training set. Another, more general possibility would be to train ANNs on normally sampled data first and calculate an error image (such as those shown in the center column of Fig. 32). Next, the ANN could be trained further (or retrained) on a data set sampled using the distribution of the errors the ANN made, a new error image can be calculated, and so on. This is similar to boosting and arcing approaches in classification (Schapire, 1990). An advantage is that this does not use the prior knowledge that edges are important, which makes it more generally applicable.

4. Training Using Different Criteria

Ideally, the sharpening and smoothing performance measures discussed in the previous section should be used to train ANNs. However, this is infeasible as they are not differentiable. This means they could be used only in learning procedures that do not need the criterion function to be differentiable, such as reinforcement learning (Gullapalli, 1990). This falls outside the scope of the experiments in this section.

However, the previous section showed that ANNs did learn to emulate the Kuwahara filter better when trained using the edge-favoring data set. Note that constructing a data set in this way is equivalent to using a much larger data set and weighting the MSE with the gradient magnitude.
Therefore, this approach is comparable to using an adapted error criterion in training the ANN. However, this weighting is quite specific to this problem.

In the literature, several more general alternatives to the MSE [Eq. (8)] have been proposed (Hertz et al., 1991; Burrascano, 1991). Among these, a very flexible family of error criteria based on the $L_p$ norm is

$$E_p(W, B) = \frac{1}{2|L|} \sum_{(x_i, y_i) \in L} \sum_{k=1}^{m} \left| R(x_i; W, B)_k - y_{ik} \right|^p \qquad (30)$$

where $p \in \mathbb{Z}$ with $p \geq 0$. Note that for p = 2, this criterion is equal to the MSE. For p = 0, each error is considered equally bad, no matter how small or large it is. For p = 1, the resulting error criterion is known as the mean absolute error or MAE. The MAE is more robust to outliers than the MSE, as larger
errors are given relatively smaller weights than in the MSE. For p > 2, larger errors are given more weight, i.e., the data are considered not to contain outliers. In fact, which p to use should be decided by assuming a noise model for the target data (Burrascano, 1991). The $L_1$ norm (robust to outliers) corresponds to a noise distribution with large tails, a Laplacian distribution, under which outliers are probable. At the other extreme, $L_\infty$ corresponds to a uniform noise distribution.

As discussed before, the Kuwahara filter is most interesting around the edges in an image, where the filter behaves nonlinearly. It was also shown that exactly around these edges most ANNs make the largest errors (Fig. 32). Therefore, it makes sense to use an error criterion that puts more emphasis on larger errors, i.e., the $L_p$ norm for p > 2. To this end, a number of experiments were run in which different norms were used.

Although implementing these criteria in the backpropagation algorithm is trivial (only the gradient calculation at the output units changes), the modified algorithm does not converge well using standard settings. The learning rate and initialization have to be adapted for each choice of norm, to avoid divergence. Therefore, the norms were used in the CGD training algorithm, which is less sensitive to initialization and choice of criterion due to the line minimization involved.

The best performing ANN found in Section VI.B, ANN$^S_{2\times50}$, was trained using CGD with the $L_p$ norm. The parameter p was set to 1, 2, 3, 5, and 7, and both the normal and the edge-favoring training sets were used. The ANN was trained using the same settings as before; in the CGD algorithm, directions were kept conjugate for 10 iterations.

Figure 39 shows the results. Clearly, using the $L_p$ norm helps the ANN trained on the normal set to achieve better performance (Fig. 39a). For increasing p, the sharpening performance becomes higher.
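The remark that only the gradient calculation at the output units changes can be made concrete with a small sketch of the $L_p$ criterion of Eq. (30) and its derivative with respect to the ANN outputs. The function names are hypothetical, arrays are assumed to hold one sample per row, and the derivative is only meaningful for p ≥ 1.

```python
import numpy as np


def lp_error(outputs, targets, p=2):
    """L_p error criterion: sum of |output - target|^p over 2 |L| samples."""
    diff = np.asarray(outputs, dtype=float) - np.asarray(targets, dtype=float)
    n_samples = diff.shape[0]  # |L|, the number of samples in the data set
    return np.sum(np.abs(diff) ** p) / (2 * n_samples)


def lp_error_gradient(outputs, targets, p=2):
    """Derivative of lp_error with respect to each ANN output (p >= 1)."""
    diff = np.asarray(outputs, dtype=float) - np.asarray(targets, dtype=float)
    n_samples = diff.shape[0]
    # d|e|^p / de = p |e|^(p-1) sign(e); for p = 2 this reduces to 2e.
    return p * np.abs(diff) ** (p - 1) * np.sign(diff) / (2 * n_samples)
```

For p = 2 the gradient is proportional to the error itself, as in standard backpropagation; for larger p the factor |e|^(p-1) makes large errors dominate the weight update, which is one way to see why the learning rate has to be adapted for each choice of norm.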
However, the smoothing performance still lags behind that of the ANN trained using the MSE on the edge-favoring training set (Fig. 38d).

When ANN$^S_{2\times50}$ is trained using the $L_p$ norm on the edge-favoring data set, smoothing performance actually decreases (Fig. 39b). This is caused by the fact that the training set and error criterion in concert stress errors around edges so much that the smoothing operation in flat regions suffers. Figure 40 illustrates this by showing the output of ANN$^S_{2\times50}$ as well as the absolute difference between this output and the target image, for various values of p. For increasing p, the errors become less localized around the edges; for $p \geq 3$ the error in flat regions becomes comparable to that around edges.

In conclusion, using different $L_p$ norms instead of the MSE can help in improving performance. However, it does not help as much as edge-favoring sampling from the training set, since only the latter influences the error criterion exactly where it matters, around edges. Furthermore, it requires choosing a
value for the parameter p, for which an optimal setting is not clear beforehand. Finally, visual inspection still shows p = 2 to be the best choice.

C. Inspection of Trained Networks

1. Standard Networks

To gain insight into the relatively poor performance of most of the standard ANNs according to the performance measure introduced in Section VI.B, a very simple architecture was created, containing only a small number of weights (see Fig. 41a). Because the Kuwahara filter should be isotropic, a
Figure 39. Performance of ANNS250 on image A (Fig. 26a), trained using different Lp norm error criteria and (a) the normal training set and (b) the edge-favoring training set.
Figure 40. Top row: output of ANNS250 trained using the Lp norm on the edge-favoring data set, for various p (a–e). Bottom row: absolute difference between output and target image.
symmetric weight mask was imposed on the weights (cf. Section IV.A.2.d). Furthermore, linear transfer functions were used to avoid the complications introduced in the analysis by the use of sigmoids. No bias was used. This ANN was trained on the normal data set, using a validation set. The learned weight set is shown in Figure 42a. In filtering terms, the main component looks like a negative Laplacian-of-Gaussian (i.e., the negative values around the center and the slightly positive values in the four corners). Further analysis showed that this filter closely resembles a linear combination of a normal Gaussian and a Laplacian-of-Gaussian. To confirm the hypothesis that standard ANNs learned such linear approximations to the Kuwahara filter, a simple standard ANN was trained in the same way ANNK was, using the DCGD training algorithm (Section IV.B.2). This ANN, ANNS12, is shown in Figure 41b. All weights were initialized to a fixed value of 0.01, λ was set to 1, and the number of directions to be kept conjugate was set to 10. After training, the MSE on the testing set was 1.43 × 10⁻³, i.e., comparable to other standard ANNs (Fig. 31), and C2 was 5.1 × 10⁻³. The resulting weight sets show that the filter can indeed be decomposed into a Gaussian-like and a negative Laplacian-like filter. Adding more hidden units and training using DCGD, for which results are not shown here, did not cause any new filters to be found. This decomposition can be explained by looking at the training objective. The Kuwahara filter smooths images while preserving the edges. The Gaussian is a smoothing filter, while its second derivative, the Laplacian, emphasizes edges when subtracted from the original. Therefore, the following model for the filter found by the ANN was set up:
Figure 41. (a) ANNK, the simplest linear ANN to perform a Kuwahara filtering: a 5 × 5 unit input layer and one output unit without bias. The ANN contains six independent weights indicated in the mask by the letters A through F. (b) ANNS12: two hidden units, no mask (i.e., no restrictions).
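A symmetric 5 × 5 mask with six independent weights, as described in the caption above, can be sketched as follows. The assignment of the letters A through F to distance classes is an assumption (the figure itself is not reproduced here), as are the function names; the sketch only shows that imposing mirror and diagonal symmetry on a 5 × 5 mask leaves exactly six free parameters.

```python
def symmetric_mask(weights):
    """Build a 5x5 mask with 8-fold symmetry from six free weights.

    Positions are grouped by the sorted pair (min(|dx|, |dy|), max(|dx|, |dy|)).
    The order of the classes below (and hence of the letters) is hypothetical.
    """
    classes = [(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2)]
    if len(weights) != len(classes):
        raise ValueError("expected six weights")
    lookup = dict(zip(classes, weights))
    mask = []
    for dy in range(-2, 3):
        row = []
        for dx in range(-2, 3):
            a, b = sorted((abs(dx), abs(dy)))
            row.append(lookup[(a, b)])
        mask.append(row)
    return mask


def linear_output(window, mask):
    """Output of a single linear unit without bias applied to a 5x5 window."""
    return sum(w * v for mrow, wrow in zip(mask, window) for w, v in zip(mrow, wrow))


if __name__ == "__main__":
    for row in symmetric_mask(list("ABCDEF")):
        print(" ".join(row))
```

Tying the 25 connections to six shared values in this way is what makes the learned weight set directly readable as a filter kernel.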
Figure 42. (a) Weights found in ANNK (Fig. 41a). (b) Weights generated by the fitted model [Eq. (31): c1 = 10.21, σ1 = 2.87, c2 = 3.41, σ2 = 0.99]. (c) A cross section of this model at x = 0. (d, e) Weight matrices found in ANNS12 (Fig. 41b) trained using DCGD.
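\[
f(c_1, \sigma_1, c_2, \sigma_2) = c_1 f_G(\sigma_1) - c_2 f_L(\sigma_2)
= c_1 \frac{1}{2\pi\sigma_1^2} \exp\left(-\frac{x^2 + y^2}{2\sigma_1^2}\right)
- c_2 \frac{(x^2 + y^2) - \sigma_2^2}{2\pi\sigma_2^6} \exp\left(-\frac{x^2 + y^2}{2\sigma_2^2}\right)
\qquad (31)
\]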
in which c1 and σ1 are parameters to be estimated for the Gaussian and c2 and σ2 are parameters for the Laplacian. Figure 42c shows these two functions. A Gauss-Newton fitting procedure (Mathworks Inc., 2000) was used to find the parameters of f(c1, σ1, c2, σ2) given the weights shown in Figure 42a. The resulting model weights are shown in Figure 42b and a cross section is shown in Figure 42c. Although the fit (c1 = 10.21, σ1 = 2.87, c2 = 3.41, σ2 = 0.99) is not perfect, with a model fit MSE εf = 2.5 × 10⁻³, the correlation between the model and the actual weights is quite high (C = 0.96). The hypothesis was that this solution, i.e., applying a Gaussian and a Laplacian, was a local minimum to which the ANNSs had converged. To test this, the model fitting procedure was applied to each of the units in the first hidden layer of each of the ANNSs. This resulted in a model fit error εf and correlation C between the actual weights and the model weights for each unit. The results, given in Figure 43, show that, at least for the smaller ANNs, the hypothesis is supported by the data. For the ANNs trained on the normal data set, over a large range of sizes (i.e., 1–5, 10, and 25 hidden units) the model closely fits each hidden unit. Only for larger numbers of hidden units does the fit become worse. The reason for this is that in these ANNs many units have an input weight distribution that is very hard to interpret. However, these units do not play a large role in the final ANN output, since they are weighted by small weights in the next layer. For the ANNs trained on the edge-favoring set the fit is poorer, but still gives a reasonable correlation. Note however that ANNs that have high performance with respect to the smoothing and sharpening measures (Section VI.B.2) do not necessarily show the lowest correlation: ANNSs with more hidden units give even lower correlation. An opposite effect is playing a role here: as ANNs become too large, they are harder to train.
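The model of Eq. (31) and the two fit statistics used above (model fit MSE and correlation) can be sketched as follows. This is not the Gauss-Newton procedure itself, only an evaluation of the model on a 5 × 5 grid together with a comparison against a given weight mask; the function names are assumptions, and the parameter values in the usage example are the fitted values quoted in the text.

```python
import math


def model_weight(x, y, c1, s1, c2, s2):
    """Evaluate the Gaussian-minus-Laplacian model at offset (x, y)."""
    r2 = x * x + y * y
    gauss = math.exp(-r2 / (2 * s1 ** 2)) / (2 * math.pi * s1 ** 2)
    laplace = (r2 - s2 ** 2) / (2 * math.pi * s2 ** 6) * math.exp(-r2 / (2 * s2 ** 2))
    return c1 * gauss - c2 * laplace


def model_mask(c1, s1, c2, s2, half=2):
    """Sample the model on a (2*half + 1) x (2*half + 1) grid of offsets."""
    offsets = range(-half, half + 1)
    return [[model_weight(x, y, c1, s1, c2, s2) for x in offsets] for y in offsets]


def fit_quality(weights, model):
    """Return (MSE, Pearson correlation) between two equally sized masks."""
    w = [v for row in weights for v in row]
    m = [v for row in model for v in row]
    n = len(w)
    mse = sum((a - b) ** 2 for a, b in zip(w, m)) / n
    mean_w, mean_m = sum(w) / n, sum(m) / n
    cov = sum((a - mean_w) * (b - mean_m) for a, b in zip(w, m))
    var_w = sum((a - mean_w) ** 2 for a in w)
    var_m = sum((b - mean_m) ** 2 for b in m)
    return mse, cov / math.sqrt(var_w * var_m)


if __name__ == "__main__":
    mask = model_mask(10.21, 2.87, 3.41, 0.99)
    for row in mask:
        print(" ".join(f"{v:7.3f}" for v in row))
```

In a full fitting procedure, the four parameters would be adjusted to minimize the MSE returned by `fit_quality` for the learned weights of each hidden unit; the correlation then indicates how well the shape of the mask is reproduced regardless of scale.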
The conclusion is that many of the standard ANNs have learned a linear approximation to the Kuwahara filter. Although this approximation performs well in uniform regions, its output does not correspond to that of the Kuwahara filter near edges.

2. Modular Networks

It is interesting to see whether the modular ANNs still use their initialization. Remember that to obtain good performance, the ANNMs had to either be trained further after the modules were concatenated, or reinitialized and trained over (Section V.B.4.b). The question is whether the
Figure 43. A comparison between the actual weights in ANNSs and the fitted models, for both ANNS1s and ANNS2s. The median εf is shown in (a) and (b), as the average εf is rather uninformative due to outliers.
modules are still performing the functions they were initially trained on, or whether the ANN, after being trained further for a while, has found a better solution. To inspect the ANNs, the modules were first evaluated on the sets with which they were trained. Next, the concatenated ANNMs were taken apart and the modules were evaluated on the same sets. Figures 44 and 45 show some examples of plots of module outputs before concatenation against the outputs of the same modules after concatenation. Unfortunately, detailed inspection is hard. Ideally, if each module were performing exactly the function it was trained to perform, each plot would show a straight line y = x. The plots show that this is, in most cases, not true. However, it is possible to make some general remarks about the differences between the various ways of training the ANNs. These differences are most clear for the mean and selection modules:
• for well-performing ANNs, the mapping in each module is no longer evident. Instead, it seems these modules make rather good use of their nonlinearity (Fig. 44c). The poorly performing ANNs still show a reasonably linear behavior (Fig. 45a);
• there is a progressive increase in nonlinearity for ANNM2, ANNM3, and ANNM4 (Figs. 44a–c and 45a–c and d–f). The added complexity allows the modules more flexibility when they are trained further. Note however that the basic mapping is still preserved, i.e., the trend is still visible for all units;
• there is an increase in nonlinearity when ANNs are trained on the edge-favoring set instead of the normal set (Fig. 45a–c vs. d–f);
• as was to be expected, ANNMs trained from scratch generally do not find the modular structure (Fig. 44d–e).
This leads to the conclusion that although the initialization by training modules individually was useful, the modules of the better performing ANNs are no longer performing their original function. This is likely to be caused by the modules being trained individually on ideal, noiseless data. Therefore, modules have not learned to deal with errors made by other modules.
This is corrected when they are trained further together in the concatenated ANNs. The larger the correction, the better the final performance of the concatenated ANN. For the MODVars and MODPoss, the differences are less clear. Most of these modules seem to have no function left in the final ANNs: the outputs are clamped at a certain value or vary a little in a small region around a value. For MODVar, only ANNM4 modules have enough flexibility. Here too, training on the edge-favoring set increases the nonlinearity of the output (Fig. 46a–c). MODPos, finally, is clamped in almost all architectures. Only ANNM4 modules give some variation in output (Fig. 46d–e). Networks trained from scratch are always clamped too.
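The visual comparison of module outputs before and after concatenation could be complemented by a few summary numbers per output unit. The sketch below is an illustration only: the function name, the returned quantities, and in particular the `clamp_range` threshold are hypothetical and do not come from the experiments described here. It reports how far the scatter plot lies from the line y = x, how strongly the original trend is preserved, and whether the output after concatenation is effectively clamped.

```python
import math


def compare_module_outputs(before, after, clamp_range=0.05):
    """Summarize a before/after scatter plot for one module output unit.

    before, after: outputs of the same unit on the same inputs.
    clamp_range:   hypothetical threshold on the output spread below which
                   the unit is reported as clamped.
    """
    n = len(before)
    mean_b = sum(before) / n
    mean_a = sum(after) / n
    var_b = sum((b - mean_b) ** 2 for b in before)
    var_a = sum((a - mean_a) ** 2 for a in after)
    cov = sum((b - mean_b) * (a - mean_a) for b, a in zip(before, after))
    # Correlation is undefined for a constant output; report 0.0 in that case.
    corr = cov / math.sqrt(var_b * var_a) if var_b > 0 and var_a > 0 else 0.0
    rms = math.sqrt(sum((a - b) ** 2 for b, a in zip(before, after)) / n)
    spread = max(after) - min(after)
    return {"correlation": corr, "rms_from_identity": rms, "clamped": spread < clamp_range}


if __name__ == "__main__":
    before = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    print(compare_module_outputs(before, [0.02, 0.09, 0.25, 0.33, 0.37, 0.48]))
    print(compare_module_outputs(before, [0.25] * 6))
```

A unit that still performs its original function gives a small deviation from the identity line; a unit whose trend is preserved but used nonlinearly keeps a high correlation with a larger deviation; a clamped unit shows almost no spread at all.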
Figure 44. Plots of outputs of the four MODAvgs before concatenation against outputs of the same modules after concatenation and training further or over. Different markers indicate different output units. The plots show progressively more freedom as the modules become less restricted (a–c) and an increase in nonlinearity when modules are trained on the edge-favoring data set (a–c vs. d–e).
In conclusion, it seems that in most ANNs, the modules on the right-hand side (MODVar and MODPos, see Fig. 27) are disabled. However, the ANNs that do show some activity in these modules are the ANNs that perform best, indicating that the modular initialization to a certain extent is useful. All results indicate that although the nature of the algorithm can be used to construct and train individual modules, the errors these modules make are such that the concatenated ANNs perform poorly (see Section V.B.4.b). That is, modules trained separately on perfect data (e.g., precalculated positions of the minimal input) are ill-equipped to handle errors in their input, i.e., the output of preceding modules. For the concatenated ANNs, the training algorithm leaves its modular initialization to lower the overall MSE; trained as a whole, different weight configurations are optimal. The fact that a trained MODPos has a very specific weight configuration (with
Figure 45. Plots of MODSel outputs before concatenation against MODSel outputs after concatenation and training further or over. The plots show progressively more freedom as the modules become less restricted (a–c, d–f) and an increase in nonlinearity when modules are trained on the edge-favoring data set (a–c vs. d–f).
large weights) to be able to perform its function means it is more susceptible to weight changes than other modules and will easily lose its original functionality. In other words, the final concatenated ANN has “worked around” the errors made by MODPos by disabling it.
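The modular decomposition referred to above (averaging, variance, position of the minimum, selection) can be made concrete with a plain, non-neural sketch of the Kuwahara filter for one pixel. The module names follow the text; the choice of a 5 × 5 window with four overlapping 3 × 3 corner regions is the common textbook form of the filter and is assumed here rather than taken from this section.

```python
def mod_avg(region):
    """Mean of one subregion (the role of MODAvg)."""
    return sum(region) / len(region)


def mod_var(region):
    """Variance of one subregion (the role of MODVar)."""
    mean = mod_avg(region)
    return sum((v - mean) ** 2 for v in region) / len(region)


def mod_pos(variances):
    """Index of the smallest variance (the role of MODPos)."""
    return min(range(len(variances)), key=variances.__getitem__)


def mod_sel(averages, position):
    """Select the average at the given position (the role of MODSel)."""
    return averages[position]


def kuwahara_pixel(window):
    """Kuwahara output for the center of a 5x5 window (list of 5 rows)."""
    corners = [(0, 0), (0, 2), (2, 0), (2, 2)]  # top-left of each 3x3 region
    regions = [
        [window[r + i][c + j] for i in range(3) for j in range(3)]
        for r, c in corners
    ]
    averages = [mod_avg(region) for region in regions]
    variances = [mod_var(region) for region in regions]
    return mod_sel(averages, mod_pos(variances))


if __name__ == "__main__":
    # A vertical step edge: the center pixel lies on the dark side.
    edge = [[0.0, 0.0, 0.0, 1.0, 1.0] for _ in range(5)]
    plain_mean = sum(sum(row) for row in edge) / 25
    print(kuwahara_pixel(edge), plain_mean)
```

The hard minimum in `mod_pos` is the step that requires very specific, large weights in a neural implementation, which is consistent with the observation that MODPos is the module most easily lost during further training.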
D. Discussion

The previous section discussed a number of experiments, in which modular and standard feedforward ANNs were trained to mimic the Kuwahara filter. The main result was that all ANNs, from very simple to complex, reached the same MSE. A number of hypotheses were proposed for this phenomenon: that the data set and error measure may not accurately represent the finer points of this particular problem or that all ANNs have
Figure 46. Plots of MODVar (a–c) and MODPos (d, e) outputs before concatenation against the same outputs after concatenation and training further or over. Different markers indicate different output units. The plots show many module outputs in the concatenated ANNs are clamped at certain values. Note that in the latter two figures, the original output is either 0.0 or 0.5; a small offset has been added for the different units for presentation purposes.
reached local minima, simply because the problem is too hard. Testing these hypotheses in this section showed that
• using a different way of constructing training sets, i.e., by mainly sampling from regions around the edges, is of great benefit;
• using performance measures that do not average over all pixels, but take the two goals of edge-preserving smoothing into account, gives better insight into relative filter performance;
• by the proposed smoothing and sharpening performance measures, which correspond better to visual perception, modular ANNs performed better than standard ANNs;
• using the Lp norm to train ANNs, with p > 2, improves performance, albeit not dramatically;
• the smaller ANNSs have learned a linear approximation of the Kuwahara filter; i.e., they have reached a local minimum;
• in the poorly performing modular ANNs, the modules still perform the functions they were trained on. The better performing modular ANNs retain some of their initialization, but have adapted further to a point that the function of individual modules is no longer clear. The better the performance of the final ANN (according to the new measure), the less clearly the initialization is retained.
In attempting to understand the operation of an ANN instead of treating it like a black box, the interpretability trade-off (discussed in Section IV) again played a role. For the modular ANNs, as soon as some of the constraints were dropped, ANN performance became much worse: there was no graceful degradation. It was shown too that it is hard to interpret the operation of the modular ANN after training it further; the operation of the ANN is distributed differently than in the original modular initialization. The one advantage of using the prior knowledge of the modular nature of the problem (for example, as in ANNM4) is that it helps to avoid painstaking optimization of the number of hidden layers and units, which was shown to be quite critical in standard ANNs. Of course, for different problems this prior knowledge may not be available. The main conclusion is that, in principle, ANNs can be put to use as nonlinear image filters. However, careful use of prior knowledge, selection of ANN architecture, and sampling of the training set are prerequisites for good operation. In addition, the standard error measure used, the MSE, will not indicate an ANN performing poorly. Unimportant deviations in the output image may lead to the same MSE as significant ones, if there is a large number of unimportant deviations and a smaller number of important ones.
Consequently, standard feedforward ANNs trained by minimizing the traditional MSE are unfit for designing adaptive nonlinear image filtering operations; other criteria should be developed to facilitate easy application of ANNs in this field. Unfortunately, such criteria will have to be specified for each application (see also Spreeuwers, 1992). In this light it is not surprising to find a large number of nonadaptive, application-specific ANNs in the literature. Finally, although all performance measures used in this section suggest that ANNs perform poorly in edge-preserving smoothing, the perceptual quality of the resulting filtered images is quite good. Perhaps it is the very fact that these ANNs have only partially succeeded in capturing the nonlinearity of the Kuwahara filter that causes this. In some cases this could be considered an advantage: constrained nonlinear parametric approximations to highly nonlinear filtering algorithms may give better perceptual results than the real thing, which is, after all, only a means to an end.
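The remark that many unimportant deviations can give the same MSE as a few important ones is easily checked numerically. The following sketch uses made-up error values (a uniform error of 0.01 on every pixel versus an error of 0.1 on 1% of the pixels); all names and numbers are illustrative assumptions, not data from the experiments.

```python
def mean_power_error(errors, p):
    """Mean of |e|**p over all pixels; p = 2 gives the MSE."""
    return sum(abs(e) ** p for e in errors) / len(errors)


def max_abs_error(errors):
    """Largest absolute deviation in the error image."""
    return max(abs(e) for e in errors)


def make_error_images(n_pixels=10000):
    """Two hypothetical error images with (nearly) identical MSE."""
    many_small = [0.01] * n_pixels                      # slight error everywhere
    few_large = [0.1] * (n_pixels // 100) + [0.0] * (n_pixels - n_pixels // 100)
    return many_small, few_large


if __name__ == "__main__":
    many_small, few_large = make_error_images()
    for name, err in (("many small", many_small), ("few large", few_large)):
        print(name, mean_power_error(err, 2), mean_power_error(err, 4), max_abs_error(err))
```

Both error images have an MSE of about 10⁻⁴, yet the second concentrates all of its error in a small set of pixels, the situation that arises around edges. A criterion with larger p, or one evaluated separately on edge and flat regions, distinguishes the two cases where the MSE does not.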
VII. Conclusions

This article discussed the application of neural networks in image processing. Three main questions were formulated in the introduction:
• Applicability: can (nonlinear) image processing operations be learned by adaptive methods?
• Prior knowledge: how can prior knowledge be used in the construction and training of adaptive methods?
• Interpretability: what can be learned from adaptive methods trained to solve image processing problems?
Below, answers will be formulated to each of the questions.
A. Applicability

The overview in Section II discussed how many researchers have attempted to apply artificial neural networks (ANNs) to image processing problems. To a large extent, it is an overview of what can now perhaps be called the “neural network hype” in image processing: the approximately 15-year period following the publications of Kohonen, Hopfield, and Rumelhart et al. Their work inspired many researchers to apply ANNs to their own problem in any of the stages of the image processing chain. In some cases, the reason was biological plausibility; however, in most cases the goal was simply to obtain well-performing classification, regression, or clustering methods. In some of these applications the most interesting aspect of ANNs, the fact that they can be trained, was not (or only partly) used. This held especially for applications to the first few tasks in the image processing chain: preprocessing and feature extraction. Another advantage of ANNs often used to justify their use is the ease of hardware implementation; however, in most publications this did not seem to be the reason for application. These observations, and the fact that often researchers did not compare their results to established techniques, cast some doubt on the actual advantage of using ANNs. In the remainder of the article, ANNs were therefore trained on two tasks in the image processing chain, object recognition (supervised classification) and preprocessing (supervised regression), and, where possible, compared to traditional approaches. The experiment on supervised classification, in handwritten digit recognition, showed that ANNs are quite capable of solving difficult object recognition problems. They performed (nearly) as well as some traditional
pattern recognition methods, such as the nearest neighbor rule and support vector classifiers, but at a fraction of the computational cost. As supervised regressors, a number of ANN architectures were trained to mimic the Kuwahara filter, a nonlinear edge-preserving smoothing filter used in preprocessing. The experiments showed that careful construction of the training set is very important. If filter behavior is critical only in parts of the image represented by a small subset of the training set, this behavior will not be learned. Constructing training sets using the knowledge that the Kuwahara filter is at its most nonlinear around edges improved performance considerably. This problem is also due to the use of the mean squared error (MSE) as a training criterion, which will allow poor performance if it occurs only for a small number of samples. Another problem connected with the use of the MSE is that it is insufficiently discriminative for model choice; in first attempts, almost all ANN architectures showed identical MSEs on test images. Criteria that were proposed to measure smoothing and sharpening performance showed larger differences. Unfortunately, these results indicate that the training set and performance measure will have to be tailored for each specific application, with which ANNs lose much of their attractiveness as all-round methods. The findings also explain why, in the literature, many ANNs applied to preprocessing were nonadaptive. In conclusion, ANNs seem to be most applicable for problems requiring a nonlinear solution, for which there is a clear, unequivocal performance criterion. This means ANNs are more suitable for high-level tasks in the image processing chain, such as object recognition, rather than low-level tasks. For both classification and regression, the choice of architecture, the performance criterion, and data set construction play a large role and will have to be optimized for each application.
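Constructing a training set that favors edge regions, as summarized above, could be done along the following lines. This is a sketch under stated assumptions and not the sampling procedure used in the experiments: the gradient-magnitude weighting, the `edge_bias` and `floor` parameters, and the function names are all hypothetical. Window centers are drawn with a probability that grows with the local gradient magnitude, so that samples near edges are overrepresented.

```python
import random


def gradient_magnitude(image):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2
            gy = (image[y + 1][x] - image[y - 1][x]) / 2
            grad[y][x] = (gx * gx + gy * gy) ** 0.5
    return grad


def sample_positions(image, n_samples, edge_bias=1.0, floor=0.01, seed=0):
    """Draw centers of 5x5 windows, favoring pixels with a large gradient.

    edge_bias = 0 gives uniform ("normal") sampling; larger values put more
    samples near edges. floor keeps flat regions from being excluded entirely.
    """
    grad = gradient_magnitude(image)
    h, w = len(image), len(image[0])
    positions = [(y, x) for y in range(2, h - 2) for x in range(2, w - 2)]
    weights = [floor + edge_bias * grad[y][x] for y, x in positions]
    rng = random.Random(seed)
    return rng.choices(positions, weights=weights, k=n_samples)


if __name__ == "__main__":
    # Hypothetical test image: a vertical step edge between columns 5 and 6.
    image = [[1.0 if x >= 6 else 0.0 for x in range(12)] for _ in range(12)]
    picked = sample_positions(image, 1000)
    print(sum(1 for _, x in picked if x in (5, 6)) / len(picked))
```

Keeping a nonzero `floor` matters: as noted earlier in the article, stressing edges too strongly makes the smoothing behavior in flat regions suffer.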
B. Prior Knowledge

In many publications on the use of ANNs in image processing, prior knowledge was used to constrain ANNs. This is to be expected; unconstrained ANNs contain large numbers of parameters and run a high risk of being overtrained. Prior knowledge can be used to lower the number of parameters in a way that does not restrict the ANN to such an extent that it can no longer perform the desired function. One way to do this is to construct modular architectures, in which use is made of the knowledge that an operation is best performed as a number of individual suboperations. Another way is to use the knowledge that neighboring pixels are related and should be treated in the same way, e.g., by using receptive fields in shared weight ANNs.
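The reduction in the number of parameters obtained by weight sharing can be illustrated by simple counting. The sketch below assumes one feature map with square receptive fields shifted one pixel at a time over a square input, and ignores biases; the 16 × 16 input and 5 × 5 field in the usage example are hypothetical sizes chosen for illustration.

```python
def feature_map_counts(input_size, field_size, shared=True):
    """Count units, connections and free weights of one feature map.

    Assumes a square input, a square receptive field shifted by one pixel
    (no padding), and no bias terms.
    """
    out_size = input_size - field_size + 1
    units = out_size * out_size
    connections = units * field_size * field_size
    weights = field_size * field_size if shared else connections
    return {"units": units, "connections": connections, "weights": weights}


if __name__ == "__main__":
    print(feature_map_counts(16, 5, shared=True))
    print(feature_map_counts(16, 5, shared=False))
```

The number of connections, and hence the amount of computation, is the same in both cases; only the number of independent weights differs. This is the sense in which a standard ANN can be comparable in connections to a shared weight ANN while having many more weights.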
The latter idea was tested in supervised classification, i.e., object recognition. The shared weight ANNs used contain several layers of feature maps (detecting features in a shift-invariant way) and subsampling maps (combining information gathered in previous layers). The question is to what extent this prior knowledge was truly useful. Visual inspection of trained ANNs revealed little. Standard feedforward ANNs comparable in the number of connections (and therefore the amount of computation involved), but with a much larger number of weights, performed as well as the shared weight ANNs. This shows that the prior knowledge was indeed useful in lowering the number of parameters without affecting performance. However, it also indicates that training a standard ANN with more weights than required does not necessarily lead to overtraining. For supervised regression, a number of modular ANNs were constructed. Each module was trained on a specific subtask in the nonlinear filtering problem to which the ANN was applied. Furthermore, different versions of each module were created, ranging from architectures specifically designed to solve the problem (using hand-set weights and tailored transfer functions) to standard feedforward ANNs. According to the proposed smoothing and sharpening performance measures, the fully hand-constructed ANN performed best. However, when the hand-constructed ANNs were (gradually) replaced by more standard ANNs, performance quickly decreased and became level with that of some of the standard feedforward ANNs. Furthermore, in the modular ANNs that performed well the modular initialization was no longer visible (see also the next section). The only remaining advantage of a modular approach is that careful optimization of the number of hidden layers and units, as for the standard ANNs, is not necessary. These observations lead to the conclusion that prior knowledge can be used to restrict adaptive methods in a useful way.
However, various experiments showed that feedforward ANNs are not natural vehicles for doing so, as this prior knowledge will have to be translated into a choice for ANN size, connectivity, transfer functions, etc., parameters that do not have any physical meaning related to the problem. Therefore, such a translation does not necessarily result in an optimal ANN. It is easier to construct a (rough) model of the data and allow model variation by allowing freedom in a number of well-defined parameters. Prior knowledge should be used in constructing models rather than in molding general approaches.

C. Interpretability

Throughout this article, strong emphasis was placed on the question whether ANN operation could be inspected after training. Rather than just
applying ANNs, the goal was to learn from the way in which they solved a problem. This plays a large role in few publications, although it would seem to be an important issue when ANNs are applied in mission-critical systems, e.g., in medicine, process control, or defensive systems. Supervised classification ANNs were inspected with respect to their feature extraction capabilities. As feature extractors, shared weight ANNs were shown to perform well, since standard pattern recognition algorithms trained on extracted features performed better than on the original images. Unfortunately, visual inspection of trained shared weight ANNs revealed nothing. The danger here is of overinterpretation, i.e., reading image processing operations into the ANN that are not really there. To be able to find out what features are extracted, two smaller problems were investigated: edge recognition and two-class handwritten digit recognition. A range of ANNs was built, which showed that ANNs need not comply with our ideas of how such applications should be solved. The ANNs took many “short cuts,” using biases and hidden layer-output layer weights. Only after severely restricting the ANN did it make sense in terms of image processing primitives. Furthermore, in experiments on an ANN with two feature maps the ANN was shown to distribute its functionality over these maps in an unclear way. An interpretation tool, the decorrelating conjugate gradient algorithm (DCGD), can help in distributing functionality more clearly over different ANN parts. The findings lead to the formulation of the interpretability trade-off, between realistic yet hard-to-interpret experiments on the one hand and easily interpreted yet nonrepresentative experiments on the other. This interpretability trade-off returned in the supervised regression problem.
Modular ANNs constructed using prior knowledge of the filtering algorithm performed well, but could not be interpreted anymore in terms of the individual suboperations. In fact, retention of the modular initialization was negatively correlated with performance. ANN error evaluation was shown to be a useful tool in gaining understanding of where the ANN fails; it showed that filter operation was poorest around edges. The DCGD algorithm was then used to find out why: most of the standard feedforward ANNs found a suboptimal linear approximation to the Kuwahara filter. The conclusion of the experiments on supervised classification and regression is that as long as a distributed system such as an ANN is trained on a single goal, i.e., minimization of prediction error, the operation of subsystems cannot be expected to make sense in terms of traditional image processing operations. This held for both the receptive fields in the shared weight ANNs and the modular setup of the regression ANNs: although they are there, they are not necessarily used as such. This also supports the conclusion of the previous section, that the use of prior knowledge in ANNs is not straightforward.
This article showed that interpretation of supervised ANNs is hazardous. As large distributed systems, they can solve problems in a number of ways, not all of which necessarily correspond to human approaches to these problems. Simply opening the black box at some location where one expects the ANN to exhibit certain behavior does not give insight into the overall operation. Furthermore, knowledge obtained from the ANNs cannot be used in any other systems, as it makes sense only in the precise setting of the ANN itself.
D. Conclusions

We believe that in the past few years there has been an attitude change toward ANNs, in which ANNs are not automatically seen as the best solution to any problem. The field of ANNs has to a large extent been reincorporated in the various disciplines that inspired it: machine learning, psychology, and neurophysiology. In machine learning, researchers are now turning toward other, nonneural adaptive methods, such as the support vector classifier. For them the ANN has become a tool, rather than the tool it was originally thought to be. So when are ANNs useful in image processing? First, they are interesting tools when there is a real need for a fast parallel solution. Second, biological plausibility may be a factor for some researchers. But most importantly, ANNs trained based on examples can be valuable when a problem is too complex to construct an overall model based on knowledge only. Often, real applications consist of several individual modules performing tasks in various steps of the image processing chain. A neural approach can combine these modules, control each of them, and provide feedback from the highest level to change operations at the lowest level. The price one pays for this power is the black-box character, which makes interpretation difficult, and the problematic use of prior knowledge. If prior knowledge is available, it is better to use this to construct a model-based method and learn its parameters; performance can be as good, and interpretation comes naturally.
References

Anand, R., Mehrotra, K., Mohan, C., and Ranka, S. (1995). Efficient classification for multiclass problems using modular neural networks. IEEE Trans. Neural Netw. 6(1), 117–124.
Bengio, Y. (1996). Neural Networks for Speech and Sequence Recognition. Boston, MA: International Thomson Computer Press.
Bengio, Y., Le Cun, Y., and Henderson, D. (1994). Globally trained handwritten word recognizer using spatial representation, space displacement neural networks and hidden Markov models, in Advances in Neural Information Processing Systems 6, edited by J. Cowan, G. Tesauro, and J. Alspector. Cambridge, MA: Morgan Kaufmann.
Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford: Oxford University Press.
Burrascano, P. (1991). A norm selection criterion for the generalized delta rule. IEEE Trans. Neural Netw. 2(1), 125–130.
Ciesielski, V., Zhu, J., Spicer, J., and Franklin, C. (1992). A comparison of image processing techniques and neural networks for an automated visual inspection problem, in Proceedings of the 5th Joint Australian Conference on Artificial Intelligence, edited by A. Adams, and L. Sterling. Singapore: World Scientific, pp. 147–152.
De Boer, C., and Smeulders, A. (1996). Bessi: An experimentation system for vision module evaluation, in Proceedings of the 13th IAPR International Conference on Pattern Recognition (ICPR’96), Vol. C. Los Alamitos, CA: IAPR, IEEE Computer Society Press, pp. 109–113.
de Ridder, D. (1996). Shared weights neural networks in image analysis. Master’s thesis, Pattern Recognition Group, Faculty of Applied Physics, Delft University of Technology, Delft.
de Ridder, D. (2001). Adaptive methods of image processing. Ph.D. thesis, Delft University of Technology, Delft.
de Ridder, D., Duin, R., Verbeek, P., and van Vliet, L. (1999). A weight set decorrelating training algorithm for neural network interpretation and symmetry breaking, in Proceedings of the 11th Scandinavian Conference on Image Analysis (SCIA’99), Vol. 2, edited by B. Ersbøll, and P. Johansen. Copenhagen, Denmark: DSAGM (The Pattern Recognition Society of Denmark), pp. 739–746.
Devijver, P., and Kittler, J. (1982). Pattern Recognition, a Statistical Approach. London: Prentice-Hall.
Dijk, J., de Ridder, D., Verbeek, P., Walraven, J., Young, I., and van Vliet, L. (1999). A new measure for the effect of sharpening and smoothing filters on images, in Proceedings of the 11th Scandinavian Conference on Image Analysis (SCIA’99), Vol. 1, edited by B. Ersbøll, and P. Johansen. Copenhagen, Denmark: DSAGM (The Pattern Recognition Society of Denmark), pp. 213–220.
Egmont-Petersen, M., Dassen, W., Kirchhof, C., Heijmeriks, J., and Ambergen, A. (1998a). An explanation facility for a neural network trained to predict arterial fibrillation directly after cardiac surgery, in Computers in Cardiology 1998. Cleveland: IEEE, pp. 489–492.
Egmont-Petersen, M., Talmon, J., Hasman, A., and Ambergen, A. (1998b). Assessing the importance of features for multi-layer perceptrons. Neural Netw. 11(4), 623–635.
Egmont-Petersen, M., de Ridder, D., and Handels, H. (2002). Image processing with neural networks: a review. Pattern Recog. 35(10), 2279–2301.
Fahlman, S., and Lebiere, C. (1990). The cascade-correlation learning architecture, in Advances in Neural Information Processing Systems 2, edited by D. Touretzky. Los Altos, CA: Morgan Kaufmann, pp. 524–532.
Fogel, D. (1991). An information criterion for optimal neural network selection. IEEE Trans. Neural Netw. 2(5), 490–497.
Fogelman Soulie, F., Viennet, E., and Lamy, B. (1993). Multi-modular neural network architectures: Applications in optical character and human face recognition. Int. J. Pattern Recog. Artificial Intel. 7(4), 721–755.
Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. 2nd ed. New York: Academic Press.
Fukushima, K., and Miyake, S. (1982). Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recog. 15(6), 455–469.
Funahashi, K.-I. (1989). On the approximate realization of continuous mappings by neural networks. Neural Netw. 2(3), 183–192. Gader, P., Miramonti, J., Won, Y., and Coffield, P. (1995). Segmentation free shared weight networks for automatic vehicle detection. Neural Netw. 8(9), 1457–1473. Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias-variance dilemma. Neural Comp. 4(1), 1–58. Gonzalez, R., and Woods, R. (1992). Digital Image Processing. Reading, MA: Addison-Wesley. Gorman, R., and Sejnowski, T. (1988). Analysis of the hidden units in a layered network trained to classify sonar targets. Neural Netw. 1(1), 75–89. Green, C. (1998). Are connectionist models theories of cognition? Psycoloquy. 9(4), (http://psycprints.ecs.soton.ac.uk). Greenhill, D., and Davies, E. (1994). Relative effectiveness of neural networks for image noise suppression, in Pattern Recognition in Practice IV, edited by E. Gelsema, and L. Kanal. North-Holland: Vlieland, pp. 367–378. Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Netw. 3(6), 671–692. Haralick, R. (1994). Performance characterization in computer vision. Comput. Vision Graph. Image Process. 60(2), 245–249. Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. New York: Macmillan College Publishing Co. Hertz, J., Krogh, A., and Palmer, R. G. (1991). Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley. Hinton, G., Sejnowski, T., and Ackley, D. (1984). Boltzmann machines: Constraint satisfaction networks that learn. Technical Report CMU-CS-84-119, Carnegie-Mellon University, Pittsburgh, PA. Hinton, G., Dayan, P., and Revow, M. (1997). Modelling the manifolds of images of handwritten digits. IEEE Trans. Neural Netw. 8(1), 65–74. Hoekstra, A., Kraaijveld, M., de Ridder, D., and Schmidt, W. (1996). The Complete SPRLIB & ANNLIB. 
Pattern Recognition Group, Faculty of Applied Physics, Delft University of Technology, Delft. Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. USA 81, 3088–3092. Hopfield, J., and Tank, D. (1985). Neural computation of decisions in optimization problems. Biol. Cybern. 52(3), 141–152. Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366. Hubel, D., and Wiesel, T. (1962). Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154. Jacobs, R., Jordan, M., and Barto, A. (1991). Task decomposition through competition in a modular connectionist architecture: The what and where vision tasks. Cognit. Sci. 15, 219–250. Katsulai, H., and Arimizu, N. (1981). Evaluation of image fidelity by means of the fidelogram and level meansquare error. IEEE Trans. Pattern Anal. Mach. Intel. 3(3), 337–347. Kohonen, T. (1995). Self-Organizing Maps, Springer Series in Information Sciences. Berlin: Springer-Verlag. Kuwahara, M., Hachimura, K., Eiho, S., and Kinoshita, M. (1976). Digital processing of biomedical images. New York: Plenum Press, pp. 187–203. Lawrence, S., Giles, C., Tsoi, A., and Back, A. (1997). Face recognition—a convolutional neural-network approach. IEEE Trans. Neural Netw. 8(1), 98–113.
Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989a). Backpropagation applied to handwritten zip code recognition. Neural Comp. 1, 541–551. Le Cun, Y., Jackel, L. J., Boser, B., Denker, J. S., Graf, H. P., Guyon, I., Henderson, D., Howard, R. E., and Hubbard, W. (1989b). Handwritten digit recognition: Applications of neural network chips and automatic learning. IEEE Commun. 27(11), 41–46. Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network, in Advances in Neural Information Processing Systems 2, edited by D. Touretzky. San Mateo, CA: Morgan Kaufmann. Mathworks Inc. (2000). Matlab release 11.1. McCulloch, W., and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133. Melnik, O., and Pollack, J. (1998). Exact representations from feed-forward networks. Technical Report CS-99-205, Dept. of Computer Science, Volen National Center for Complex Systems, Brandeis University, Waltham, MA. Minsky, M. L., and Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press. Murata, N., Yoshizawa, S., and Amari, S. (1994). Network information criterion—determining the number of hidden units for an artificial neural network model. IEEE Trans. Neural Netw. 5(6), 865–872. Nilsson, N. (1965). Learning Machines. New York: McGraw-Hill. Nowlan, S., and Platt, J. (1995). A convolutional neural network hand tracker, in Advances in Neural Information Processing Systems 7, edited by G. Tesauro, D. Touretzky, and T. Leen. Cambridge, MA: MIT Press, pp. 901–908. Pal, N., and Pal, S. (1993). A review on image segmentation techniques. Pattern Recog. 26(9), 1277–1294. Perlovsky, L. (1998). Conundrum of combinatorial complexity. IEEE Trans. Pattern Anal. Machine Intel. 20(6), 666–670. Poggio, T., and Koch, C. (1985). 
Ill-posed problems in early vision: From computational theory to analogue networks. Proc. R. Soc. London B(226), 303–323. Pratt, W. K. (1991). Digital Image Processing. 2nd ed. New York: John Wiley & Sons. Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1992). Numerical Recipes in C. 2nd ed., Cambridge, MA: Cambridge University Press. Pugmire, R., Hodgson, R., and Chaplin, R. (1998). The properties and training of a neural network based universal window filter developed for image processing tasks, in Brain-like Computing and Intelligent Information Systems, edited by S. Amari, and N. Kasabov. Singapore: Springer-Verlag, pp. 49–77. Raudys, S. (1998a). Evolution and generalization of a single neurone: I. Single-layer perceptron as seven statistical classifiers. Neural Netw. 11(2), 283–296. Raudys, S. (1998b). Evolution and generalization of a single neurone: II. Complexity of statistical classifiers and sample size considerations. Neural Netw. 11(2), 297–313. Richard, M. D., and Lippmann, R. P. (1991). Neural network classifiers estimate Bayesian posterior probabilities. Neural Comp. 3(4), 461–483. Rosenblatt, F. (1962). Principles of Neurodynamics. New York: Spartan Books. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by D. Rumelhart, and J. McClelland. Cambridge, MA: MIT Press, Vol. I, pp. 319–362.
Schenkel, M., Guyon, I., and Henderson, D. (1995). On-line cursive script recognition using time delay neural networks and hidden Markov models, in Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’95). Vol. 2, 637. Sejnowski, T., and Rosenberg, C. (1987). Parallel networks that learn to pronounce English text. Complex Syst. 1, 145–168. Setiono, R. (1997). Extracting rules from neural networks by pruning and hidden-unit splitting. Neural Comp. 9(1), 205–225. Setiono, R., and Liu, H. (1997). Neural network feature selector. IEEE Trans. Neural Netw. 8(3), 645–662. Schapire, R. (1990). The strength of weak learnability. Mach. Learn. 5(2), 197–227. Shewchuk, J. R. (1994). An introduction to the conjugate gradient method without the agonizing pain, Technical Report CMU-CS-94-125, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Simard, P., LeCun, Y., and Denker, J. (1993). Efficient pattern recognition using a new transformation distance, in Advances in Neural Information Processing Systems 5, edited by S. Hanson, J. Cowan, and L. Giles. San Mateo, CA: Morgan Kaufmann. Solla, S. A., and Le Cun, Y. (1991). Constrained neural networks for pattern recognition, in Neural Networks: Concepts, Applications and Implementations, edited by P. Antognetti, and V. Milutinovic. Englewood Cliffs, NJ: Prentice Hall. Sontag, E. D. (1992). Feedback stabilization using two-hidden-layer nets. IEEE Trans. Neural Netw. 3(6), 981–990. Spreeuwers, L. J. (1992). Image filtering with neural networks, applications and performance evaluation. Ph.D. thesis, Universiteit Twente, Enschede. Tickle, A., Andrews, R., Golea, M., and Diederich, J. (1998). The truth will come to light: Directions and challenges in extracting the knowledge embedded within trained artificial neural networks. IEEE Trans. Neural Netw. 9(6), 1057–1068. Vapnik, V. (1995). The Nature of Statistical Learning Theory. Berlin: Springer-Verlag. Verschure, P. (1996). 
Connectionist explanation: Taking positions in the mind-brain dilemma, in Neural Networks and a New AI. London: Thompson, pp. 133–188. Viennet, E. (1993). Architectures Connexionistes Multi-Modulaires, Application à l’Analyse de Scène. Ph.D. thesis, Université de Paris-Sud, Centre d’Orsay. Wilson, C. L., and Garris, M. D. (1992). Handprinted character database 3. National Institute of Standards and Technology, Advanced Systems division. Young, I., Gerbrands, J., and Van Vliet, L. (1998). Image processing fundamentals, in The Digital Signal Processing Handbook. Boca Raton, FL: CRC Press/IEEE Press, pp. 51/1–51/81.
Index
2D wavelet transform modulus maxima (WTMM) method, 17–23 definition, 17–18 methodology, 18–19 numerical implementation, 21–23 remark, 19–21 to perform image processing tasks, 38–41 2D continuous wavelet transform, 7–22 computation of, 21 3D image processing, 329–348 coherency considerations, 333–336 concluding remarks, 347–348 detection schemes, 336–337 special cases, 335–336 3D imaging properties, 337–340 3D image reconstruction, 341–346
A Acquisition scheme, 198 Additive processes, 36 Algorithm, discrete search, 326–327 Analysis of the geometric distortions, 93–192 concluding remarks, 187–190 introduction, 94–96 problem with closing, 174–178 Anisotropic dilations, 10–17 Anisotropic scale invariance, 3 Annealed averaging, 23 Ambiguity bounds, 347 Argument (A), 34 Artificial neural networks (ANN), 353–355 applications, 356 architecture, 366 classification, 357–358 feedforward, 356–357 image processing, 355 problems, 365–366 regression, 358–359 types, 360 Atmospheric dynamics, 42 Autoassociative artificial neural networks, 362 Avalanche photodiodes, 230 Average module, 405
B Baseline phase space, 293–294 Belt-driven system, 249 Bias phase space, 295 Bias-variance dilemma, 359 Binary images, 105 continuous, 110 discrete, 113–115 Black box problem, 359 Boundary length measurement problem, 183–187 detailed analysis, 184–186 discussion, 186–187 Brownian surfaces, 23
C Canonical description, 5 Charge-coupled device (CCD) cameras, 230 Classification, 355 Cloud structure, 41–59 liquid water content (LWC), 42 liquid water path (LWP), 43 Colocalization errors, 198
Competitive training, 340–346 Complex hologram, 298–299 Computer-aided diagnosis (CAD) methods, 73 Continuous images, 105 binary, 105 laser excitation, 209 shifts produced by median filters in, 105–122 Continuous wave (CW) laser excitation, 209 Continuous wavelet transform, 4 2D, 7–22 computation of, 21 image processing, 7–22 Convolution, 96–97 Corner detector, 178 Cosine, 340–341 Cosine-FZP hologram, 342–343 Cusp-like singularities, 5
D D(h) singularity spectrum, 3 Data-oriented approaches, problems, 364–365 Decorrelating conjugate gradient descent, 392–397 Decorrelating training algorithm, 396 Decorrelation, 393 Dense tissues in mammography, 74–77 Diffusion-limited aggregates (DLA), 5 Digital Database for Screening Mammography (DDSM) project, 74 Digital images, 122 Digitized mammograms, 73–80 Distance (l), 3 Dome-slicing, 130
E Edge detection, 378 Edge-favoring sampling, 418–419 Edge-preserving smoothing, 423–432
discussion, 430 experiments, 426 Edge recognition, 378–379 discussion, 386–387 training, 381 Edge shifts, 105, 121 arising with hybrid median filters, 121–122 general calculations, 128–129 theory of, 105–110 Energy dissipation field, 56 Enstrophy field, 59–61 Exponent Hölder, 10 Extensions continuous gray-scale images, 110–112 discrete neighborhoods, 113
F Fast Fourier Transform (FFT), 21 Fatty tissues in mammography, 74–77 Feature detectors, 388 Feature extraction, 377–398 Feature maps, 368 Feedforward artificial neural networks, 356–357 applications, 360–361 classification, 357–358 preprocessing, 361 regression, 358–359 types, 360 Field dissipation energy, 56 turbulent, 61–68 enstrophy, 59 turbulent 3D, 68 radiance, 51–52 receptive, 367 temperature, 51–52 velocity, 51–56 Filters Gaussian, 95 image, 96–105
mean, 95 median, 104–105 mode, 100–101 modular, 404 morphological, 102–104 noise suppression, 98–100 nonlinear, 404 rank-order, 157 truncated median, 101 First ISCCP Research Experiment (FIRE), 44 Floating point operations (FLOP), 373 Fluorescence in situ hybridization (FISH), 266 Fluorescent molecules, 202 behavior under TPE regime, 212–218 two-photon excitation, 202–212 under TPE regime, 212–218 Fluorescent specimens, 293 Fokker-Planck approach, 55 Fourier domain, 21 Fractals homogeneous functions, 19 self-affine, 3 Fractal dimension (DF), 3 Fractional Brownian motion (fBm), 23 Fractional Brownian surfaces, 23–31 Full width at half maximum (FWHM) resolutions, 251 Function Gaussian, 7–8 linear, 19 point spread, 251 probability density, 53 space-space correlation, 50–51, 67–68, 71 WTMM probability density, 48–50, 66–67, 70–71 xor-like, 406
G Gaussian filters, 95 Generalized fractal dimension (Dq), 3
Geometric distortions, 93 Global Circulation Model (GCM), 44 Grand canonical description, 5 Graphs, interferometric, 311 Gray-scale images, 110, 116 continuous, 110–112 discrete gray-scale images, 116–121
H Handwritten digit recognition, 371 data set, 371 experiments, 372 feature extraction, 374 two-class, 388 Heterodyne scanning, 330–337 two-pupil optical, 330–337 Heterodyning theory, 330–333 High curvature, 163–165 High-resolution satellite images, 41–59 Hölder exponent, 10 Hologram, complex, 340–346 Holography, 329–337, 340–346 Homogeneous (monofractal) fractal functions, 19 Hybrid median filters, 121
I Images continuous, 105–122 digital, 122 gray-scale, 110, 116 restoration, 399–418 Image enhancement, 362 Image filters, 96–105 in-depth study of median filters, 104–105 mode filters, 100–101 morphological filter, 102–104 noise suppression filters, 98–100 Image processing, 7–22, 329–337, 352–353 filters, 93 Image understanding, 363–364
Input layer, 368 Input-output mapping, 355 Integer lattices, 293 baseline phase space, 293–294 pupil phase space, 293 unknown-spectral, 294 Integral scale (L), 53 Interfaces isotropic, 2 self-similar, 2 Interferometric graphs, 289–290 International Satellite Cloud Climatology Project (ISCCP), 44 Interpretability trade-off, 387 Isotropic dilations, 10 Isotropic interfaces, 2 Isotropic Mexican hat, 7
K Kolmogorov dissipative scale (η), 53 Kuwahara filtering, 399–404
L Laboratory for Advanced Bioimaging, Microscopy, and Spectroscopy (LAMBS), 244 Landsat images, 44–51 marine stratocumulus cloud scenes, 43–44 radiance fields and velocity and temperature fields, 51–52 Laser excitation, 209 sources, 233–241 Lattices, closest point search, 345 Layer hidden, 370 input, 368 output, 370 Lens objectives, 242–244 TPE microscope, 244–253
Linear function, 19 Linear slant edge, 147–149 Linear variable differential transformer (LVDT), 249 Liquid water content (LWC), 42 Liquid water path (LWP), 43 Loop-entry phase space, 296
M Magnitude, 36 Mammograms, digitized, 73 application of 2D WTMM method, 74–77 detecting microcalcifications, 77–79 WT skeleton segmentation, 77–79 Mammographic tissues, 79 dense, 74 fatty, 74 Mammography application of the 2D WTMM method to tissue classification, 74–77 multifractal analysis, 73–80 Maps feature, 368 subsampling, 369 Maxima chains, 11 Maxima lines, 11 Mean energy dissipation (ε), 56 Mean filters, 146 Median filters continuous images, 105–122 theory of edge shifts, 105–110 Median shifts, 122–124 Median-based corner detector, 178–182 Median filters, 104–105 with small circles, 143–146 hybrid, 121 shifts produced by, 105–110, 122 Mexican hat, isotropic, 7 Microcalcifications in mammography, 77–80 Microscope, 195
Millimeter radars, 42 Mode filters, 100–101, 153 Modular filter, 404 Modular networks, 404, 411, 437 Modules, 410–411 average, 405 position-of-minimum, 405 selection, 406 variance, 405 Monofractal rough surfaces, 23 Morphological filters, 102–104 Multifractal analysis 2D WTMM method, 68 discussion, 71–72 mammographic tissue classification, 74–77 multifractal spectra, 68–70 numerical computation of the multifractal spectra, 63–66 remark, 62–63 space-space correlation function analysis, 50–51, 67–68, 71 WTMMM probability density functions, 43–44, 66–67, 70–71 3D turbulence simulation data, 53–72 description of intermittency, 53–56 high-resolution satellite images of cloud structure, 41–59 Multifractal properties, 19 Multifractal rough surfaces, 23, 31–35 Multifractal scaling, 3 Multifractal spectrum computation of, 22 numerical computation of, 63–66, 68–70 Multiplicative processes, 36 Multiscale edge detection, 7
N National Institute for the Physics of Matter (INFM), 244 Navier-Stokes dynamics, 55
Neighborhoods 3 x 3, 124, 129 5 x 5, 131, 136 7 x 7, 136 circular, 161 discrete, 113 p x p, 122 rectangular, 157 trends for large, 137–141 Network architecture, 379 Neural networks, artificial, 351–450 Nonlinear curve, 19 Nonlinear filters, 97, 404 Nonlinear image processing, 351–450 Nonmaximum suppression, 97 Nonparametric, 359 Noise power, 98 Noise suppression filters, 98–100 Normal sampling, 419 Numerical aperture (NA), 211
O Object recognition, 363, 366 Optical consequences and resolution aspects, 219–224 Optical scanning holography (OSH), 329–337 Optical heterodyne scanning, 330–337 Optimal model phase shift, 313 bias phase, 314 pupil phase, 314 Optimization, 364 Oscillating singularities, 5 Overtraining, 359
P Phase calibration, 288, 291 discrepancy and related results, 309–312 problem, 307–309 Phase closure, 290–291 operator, 296 projection, 296
Phase closure imaging, 287–327 appendices, 320–327 concluding comments, 319–320 contents, 287–288 simulated example, 317–319 special cases, 315–317 Phase closure operator, 296–299 Phase closure projection, 296–299 spectral, 299–304 Phase space baseline, 293–294 bias, 295 integer lattices, 293 loop-entry, 296 pupil, 293 unknown-spectral, 294 Phase transitions, 20 Photochemical reactions, 198 Photodetectors, 230 Photointeractions, 198 Photomultiplier tubes, 230 Pixel, 141–143 Point spread function (PSF) measurement, 251 Position-of-minimum module, 405 Power, 98 Probability density function (pdf), 53 Problem with closing, 174–178 detailed analysis, 175–177 discussion, 175–178 Pupil phase space, 293
Q Quenched averaging, 22
R Random cascades, 31–35 Rank-order filters, 157, 170–174 analysis of the situation, 170–172 discussion, 173–174 Receptive fields, 367 Redundant case strongly, 302–304 weakly, 300–302
Reference algebraic framework, 305–307 Reference projections, 321 Regression method, 355 Regression networks, 399 architecture and experiments, 404–415 experiments, 419 inspection and improvement, 418–442 Relative filter performance, 426 Reynolds number, 6 Rough surfaces, 9 scale invariance properties, 36–38 incoherently reflecting, 335–336 local regularity properties of, 9–17 test applications, 23–41 Roughness exponent (H), 3
S Satellite images, high-resolution, 41 Scale invariance properties, 36–38 Scaling exponents, 4 Second-harmonic generation (SHG), 271 Selection module, 406 Self-affine fractals, 3 Self-organizing map (SOM), 360 Self-similar interfaces, 2 Shared weight networks, 367–377 architecture, 368 discussion, 375–377 feature extraction, 377–398 handwritten digit recognition, 371 Sharpening, 424 Shifts mean filters, 146–150 discussion, 149–150 median filters in continuous images, 105–122 median filters in digital images, 122–146 discussion, 137 mode filters, 150–156
discussion, 151–153 rank-order filters, 156–170 discussion, 169–170 Signal-to-noise ratio (SNR), 98 Sine, 340–341 Sine-Fresnel zone plate (FZP) hologram, 342–343 Single-objective piezo nanopositioner, 249 Singularities cusp-like, 5 oscillating, 5 Smith normal form, 300, 321 Smoothing, 423 Soft-threshold, 379 Space-space correlation function, 50–51, 67–68, 71 Spectral phase closure projection, 299 closure matrix, 300 examples, 300–304 weakly redundant case, 300–302 Smith normal form, 300 strongly redundant case, 302–304 Standard networks, 433 Step edges, 146–147 Stratocumulus cloud (Sc) scenes, 43–44 Landsat data, 43 application of 2D WTMM method, 44 Strength (h), 4 Subsampling maps, 369 Switches, 405
T Temperature fields, 51 Template, 379 Threshold, 97, 379 Time delay neural networks (TDNN), 366 Trained networks, 433 Transport, 405 Truncated median filter, 101 Turbulence, fully developed, 51
Turbulent 3D enstrophy field, 68 Turbulent dissipation field, 61–68 Two-class handwritten digit classification, 388–398 training, 388–392 Two-photon excitation (TPE) microscope, 244 Two-photon excitation microscopy, 195–273 application gallery, 253–273 architecture, 225 basic principles, 202–212 conclusions, 273 general considerations, 225–233 historical notes, 198–202 optical consequences, 219–225 resolution aspects, 219–225 Two-photon excitation (TPE) regime, 212–219 behavior of fluorescent molecules, 212 Two-pupil optical heterodyne scanning, 330–337 coherency considerations, 333–335 detection schemes, 336–337 special cases, 335–336
U Unknown-spectral phase space, 294 Useful property, 320
V Variance-covariance matrix, 299 Variance module, 405 Velocity field, 53–56
W Wavelets, 4 analyzing for multiscale edge detection, 7–9
Wavelet-based method for multifractal image analysis, 1–92 conclusion, 80–81 introduction, 2–7 Wavelet orthogonal basis, 31–35 Wavelet transform (WT), 4, 5 2D, 7 continuous, 4 image processing, 5 Wavelet transform skeleton computation, 21–22 segmentation, 77–80 Wavelet transform modulus maxima (WTMM), 4
definition, 17 methodology, 18 numerical implementation, 21 remark, 19 Wavelet transform modulus maxima maxima (WTMMM), 11 probability density functions, 48–50, 70–71
X Xor-like function, 406